Get Accurate Scikit-learn Models Using Optuna: A Hyperparameter Framework

Hyper-parameter frameworks have been a hot topic of discussion over the past couple of months. With several packages developed and still in progress, it has become a tough choice to pick one. Such frameworks not only help fit an accurate model but can boost a data scientist's efficiency to the next level. Here I show how Optuna, a recently popular framework, can be used to get the best parameters for any Scikit-learn model. I have only implemented Random Forest and Logistic Regression as examples, but other algorithms can be implemented in the same way shown here.
Why Optuna?
Optuna can become one of your workhorse tools if integrated into everyday experimentation. I was deeply impressed when I implemented Logistic Regression using Optuna with such minimal effort. Here are a couple of reasons why I like Optuna:
Easy-to-use API
Great documentation
Flexibility to accommodate any algorithm
Features like pruning and great in-built visualization modules (a minimal pruning sketch follows below)
Documentation:
GitHub:
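Since pruning is called out above, here is a minimal sketch of what it looks like in practice. This is my own illustration rather than part of the original walkthrough: it swaps in scikit-learn's SGDClassifier, because pruning needs a model trained iteratively so that intermediate scores can be reported via trial.report, with trial.should_prune deciding when to abandon a poor trial.

import optuna
from sklearn import datasets, linear_model, model_selection

X, y = datasets.load_breast_cancer(return_X_y=True)
X_train, X_valid, y_train, y_valid = model_selection.train_test_split(X, y, random_state=0)

def objective(trial):
    alpha = trial.suggest_float('alpha', 1e-5, 1e-1, log=True)
    clf = linear_model.SGDClassifier(alpha=alpha)
    for step in range(100):
        # Train incrementally so there are intermediate results to report:
        clf.partial_fit(X_train, y_train, classes=[0, 1])
        score = clf.score(X_valid, y_valid)
        # Report the intermediate score; the pruner compares it to past trials:
        trial.report(score, step)
        if trial.should_prune():
            raise optuna.TrialPruned()
    return score

study = optuna.create_study(direction='maximize', pruner=optuna.pruners.MedianPruner())
study.optimize(objective, n_trials=20)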
Before we start looking at the functionalities, we need to make sure that we have installed the pre-requisite packages:
1. Optuna
2. Plotly
3. Pandas
4. Scikit-Learn
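All four are available on PyPI, so a single pip install optuna plotly pandas scikit-learn should cover them.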
Basic parameters and definitions:
Setting up the basic framework is pretty simple and straightforward. It can be divided broadly into 4 steps:
1. Define an objective function (Step 1)
2. Define a set of hyperparameters to try (Step 2)
3. Define the variable/metric you want to optimize (Step 3)
4. Finally, run the function (Step 4). Here you need to mention:
- whether the scoring function/variable you are trying to optimize is to be maximized or minimized
- the number of trials you want to make. The higher the number of hyper-parameters and the more trials you define, the more computationally expensive the search becomes (unless you have a beefy machine or a GPU!)

In the Optuna world, the term Trial is a single call of the objective function, and multiple such Trials together are called a Study.
Following is a basic implementation of Random Forest and Logistic Regression from the scikit-learn package:
# Importing the packages:
import optuna
import pandas as pd
from sklearn import linear_model
from sklearn import ensemble
from sklearn import datasets
from sklearn import model_selection

# Grabbing a sklearn classification dataset:
X, y = datasets.load_breast_cancer(return_X_y=True, as_frame=True)

# Step 1. Define an objective function to be maximized.
def objective(trial):
    classifier_name = trial.suggest_categorical("classifier", ["LogReg", "RandomForest"])

    # Step 2. Setup values for the hyperparameters:
    if classifier_name == "LogReg":
        logreg_c = trial.suggest_float("logreg_c", 1e-10, 1e10, log=True)
        classifier_obj = linear_model.LogisticRegression(C=logreg_c)
    else:
        rf_n_estimators = trial.suggest_int("rf_n_estimators", 10, 1000)
        rf_max_depth = trial.suggest_int("rf_max_depth", 2, 32, log=True)
        classifier_obj = ensemble.RandomForestClassifier(
            max_depth=rf_max_depth, n_estimators=rf_n_estimators
        )

    # Step 3: Scoring method:
    score = model_selection.cross_val_score(classifier_obj, X, y, n_jobs=-1, cv=3)
    accuracy = score.mean()
    return accuracy

# Step 4: Running it
study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=100)
When you run the above code, the output will be something like below:
[Image: output in a terminal or Jupyter notebook]
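Optuna's default logger prints one line per finished trial, roughly of this shape (paraphrased from memory; the trial number, value, and parameters here are taken from the best trial shown below):

[I 2020-08-16 14:24:38,407] Trial 18 finished with value: 0.9631114824097281 and parameters: {'classifier': 'RandomForest', 'rf_n_estimators': 153, 'rf_max_depth': 21}. Best is trial 18 with value: 0.9631114824097281.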
As we can see above, the selection of Logistic Regression or Random Forest, along with their respective parameters, varies in each run. Each Trial can be a different algorithm with different parameters. The study object stores a variety of outputs, which can be retrieved as follows:
# Getting the best trial:
print(f"The best trial is : \n{study.best_trial}")
# >> Output:
# The best trial is :
# FrozenTrial(number=18, value=0.9631114824097281, datetime_start=datetime.datetime(2020, 8, 16, 14, 24, 37, 407344), datetime_complete=datetime.datetime(...), distributions={'classifier': CategoricalDistribution(choices=('LogReg', 'RandomForest')), 'rf_n_estimators': IntUniformDistribution(high=1000, low=10, step=1), ...}
# Getting the best score:
print(f"The best value is : \n{study.best_value}")
# >> Output:
# 0.9631114824097281
# Getting the best parameters:
print(f"The best parameters are : \n{study.best_params}")
# >> Output:
# {'classifier': 'RandomForest', 'rf_n_estimators': 153, 'rf_max_depth': 21}
As we can see here, a Random Forest with n_estimators of 153 and max_depth of 21 works best for this dataset.
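Beyond the single best trial, Optuna can also dump every trial into a pandas DataFrame for side-by-side comparison. Here is a small sketch of mine using the trials_dataframe helper (the column selection below assumes Optuna's default column names):

# Inspecting all trials at once:
df = study.trials_dataframe()
# Each row is one Trial: its number, objective value, parameters, and state.
print(df[["number", "value", "state"]].head())
# Sort to see the top-scoring trials first:
print(df.sort_values("value", ascending=False).head())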
Defining parameter spaces:
If we look at Step 2 (basic_optuna.py), we defined our hyper-parameter C as a float sampled on a log scale. Similarly, for Random Forest we defined max_depth and n_estimators as parameters to optimize. Optuna supports five ways in which we can define the parameters:
def objective(trial):
    # Categorical parameter
    optimizer = trial.suggest_categorical('rf_criterion', ['gini', 'entropy'])
    # Int parameter
    num_estimators = trial.suggest_int('rf_n_estimators', 10, 1000)
    # Uniform parameter
    dropout_rate = trial.suggest_uniform('rf_min_weight_fraction_leaf', 0.0, 1.0)
    # Loguniform parameter
    learning_rate = trial.suggest_loguniform('rf_parameter_x', 1e-5, 1e-2)
    # Discrete-uniform parameter
    drop_path_rate = trial.suggest_discrete_uniform('rf_parameter_y', 0.0, 1.0, 0.1)
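One caveat for readers on newer Optuna releases: suggest_uniform, suggest_loguniform, and suggest_discrete_uniform have since been deprecated in favor of suggest_float with the log and step keyword arguments. Rough equivalents of the last three calls above:

# Modern equivalents (assuming a recent Optuna release):
dropout_rate = trial.suggest_float('rf_min_weight_fraction_leaf', 0.0, 1.0)   # uniform
learning_rate = trial.suggest_float('rf_parameter_x', 1e-5, 1e-2, log=True)   # log-uniform
drop_path_rate = trial.suggest_float('rf_parameter_y', 0.0, 1.0, step=0.1)    # discrete-uniform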
Historical Studies:
I feel one of the essential needs of a data scientist is to keep track of all their experiments. This helps not only to compare any two, three, or more of them, but also to understand how the model behaves with a change in hyper-parameters, the addition of new features, and so on. Optuna has in-built functionality to keep a record of all the experiments. Before accessing old experiments we need to store them. The code below shows how to do both:
# Importing the packages:
import optuna
import joblib

# Create a study name:
study_name = 'experiment-C'

# Store in a DB:
study = optuna.create_study(study_name=study_name, storage='sqlite:///tmp/experiments.db', load_if_exists=True)

# Alternatively, store and load using joblib:
# joblib.dump(study, 'experiments.pkl')
# study = joblib.load('experiments.pkl')

# Optimize:
study.optimize(objective, n_trials=3)
1. You can create an experiment with a name of your choice
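To pick a stored study back up in a later session, and to make use of the built-in Plotly visualizations mentioned at the start, something like this sketch should work (it assumes the sqlite file created above exists):

import optuna

# Reload the study from the sqlite storage created above:
study = optuna.load_study(study_name='experiment-C', storage='sqlite:///tmp/experiments.db')
print(study.best_params)

# One of the built-in Plotly visualization helpers:
fig = optuna.visualization.plot_optimization_history(study)
fig.show()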
