Get Accurate Scikit-learn Models Using Optuna: A Hyperparameter Framework

Hyper-parameter frameworks have been a hot topic of discussion over the past couple of months. With several packages developed and still in progress, it has become a tough choice to pick one. Such frameworks not only help fit an accurate model but can boost a data scientist's efficiency to the next level. Here I show how Optuna, a recently popular framework, can be used to get the best parameters for any Scikit-learn model. I have only implemented Random Forest and Logistic Regression as examples, but other algorithms can be implemented in the same way shown here.
Why Optuna?
Optuna can become one of your workhorse tools if integrated into everyday experimentation. I was deeply impressed when I implemented Logistic Regression using Optuna with such minimal effort. Here are a couple of reasons why I like Optuna:
Easy-to-use API
Great documentation
Flexibility to accommodate any algorithm
Features like pruning and great in-built visualization modules (a minimal pruning sketch follows below)
Documentation:
GitHub:
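Since pruning is called out above, here is a minimal sketch of what it looks like in practice. This is my own illustration rather than part of the original walkthrough: it swaps in scikit-learn's SGDClassifier, because pruning needs a model trained iteratively so that intermediate scores can be reported via trial.report, with trial.should_prune deciding when to abandon a poor trial.

import optuna
from sklearn import datasets, linear_model, model_selection

X, y = datasets.load_breast_cancer(return_X_y=True)
X_train, X_valid, y_train, y_valid = model_selection.train_test_split(X, y, random_state=0)

def objective(trial):
    alpha = trial.suggest_float('alpha', 1e-5, 1e-1, log=True)
    clf = linear_model.SGDClassifier(alpha=alpha)
    for step in range(100):
        # Train incrementally so there are intermediate results to report:
        clf.partial_fit(X_train, y_train, classes=[0, 1])
        score = clf.score(X_valid, y_valid)
        # Report the intermediate score; the pruner compares it to past trials:
        trial.report(score, step)
        if trial.should_prune():
            raise optuna.TrialPruned()
    return score

study = optuna.create_study(direction='maximize', pruner=optuna.pruners.MedianPruner())
study.optimize(objective, n_trials=20)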
Before we start looking at the functionalities, we need to make sure that we have installed the pre-requisite packages:
1. Optuna
2. Plotly
3. Pandas
4. Scikit-Learn
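All four are available on PyPI, so a single pip install optuna plotly pandas scikit-learn should cover them.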
Basic parameters and definitions:
Setting up the basic framework is pretty simple and straightforward. It can be divided broadly into 4 steps:
1. Define an objective function (Step 1)
2. Define a set of hyperparameters to try (Step 2)
3. Define the variable/metric you want to optimize (Step 3)
4. Finally, run the function (Step 4). Here you need to mention:
- whether the scoring function/variable you are trying to optimize is to be maximized or minimized
- the number of trials you want to make. The higher the number of hyper-parameters and the more trials you define, the more computationally expensive the search becomes (unless you have a beefy machine or a GPU!)

In the Optuna world, the term Trial is a single call of the objective function, and multiple such Trials together are called a Study.
Following is a basic implementation of Random Forest and Logistic Regression from the scikit-learn package:
# Importing the packages:
import optuna
import pandas as pd
from sklearn import linear_model
from sklearn import ensemble
from sklearn import datasets
from sklearn import model_selection

# Grabbing a sklearn classification dataset:
X, y = datasets.load_breast_cancer(return_X_y=True, as_frame=True)

# Step 1. Define an objective function to be maximized.
def objective(trial):
    classifier_name = trial.suggest_categorical("classifier", ["LogReg", "RandomForest"])

    # Step 2. Setup values for the hyperparameters:
    if classifier_name == "LogReg":
        logreg_c = trial.suggest_float("logreg_c", 1e-10, 1e10, log=True)
        classifier_obj = linear_model.LogisticRegression(C=logreg_c)
    else:
        rf_n_estimators = trial.suggest_int("rf_n_estimators", 10, 1000)
        rf_max_depth = trial.suggest_int("rf_max_depth", 2, 32, log=True)
        classifier_obj = ensemble.RandomForestClassifier(
            max_depth=rf_max_depth, n_estimators=rf_n_estimators
        )

    # Step 3: Scoring method:
    score = model_selection.cross_val_score(classifier_obj, X, y, n_jobs=-1, cv=3)
    accuracy = score.mean()
    return accuracy

# Step 4: Running it
study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=100)
When you run the above code, the output will be something like below:
[Image: output in a terminal or Jupyter notebook]
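Optuna's default logger prints one line per finished trial, roughly of this shape (paraphrased from memory; the trial number, value, and parameters here are taken from the best trial shown below):

[I 2020-08-16 14:24:38,407] Trial 18 finished with value: 0.9631114824097281 and parameters: {'classifier': 'RandomForest', 'rf_n_estimators': 153, 'rf_max_depth': 21}. Best is trial 18 with value: 0.9631114824097281.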
As we can see above, the selection of Logistic Regression or Random Forest, along with their respective parameters, varies in each run. Each Trial can be a different algorithm with different parameters. The study object stores a variety of outputs, which can be retrieved as follows:
# Getting the best trial:
print(f"The best trial is : \n{study.best_trial}")
# >> Output:
# The best trial is :
# FrozenTrial(number=18, value=0.9631114824097281, datetime_start=datetime.datetime(2020, 8, 16, 14, 24, 37, 407344), datetime_complete=datetime.datetime(...), distributions={'classifier': CategoricalDistribution(choices=('LogReg', 'RandomForest')), 'rf_n_estimators': IntUniformDistribution(high=1000, low=10, step=1), ...}
# Getting the best score:
print(f"The best value is : \n{study.best_value}")
# >> Output:
# 0.9631114824097281
# Getting the best parameters:
print(f"The best parameters are : \n{study.best_params}")
# >> Output:
# {'classifier': 'RandomForest', 'rf_n_estimators': 153, 'rf_max_depth': 21}
As we can see here, a Random Forest with n_estimators of 153 and max_depth of 21 works best for this dataset.
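Beyond the single best trial, Optuna can also dump every trial into a pandas DataFrame for side-by-side comparison. Here is a small sketch of mine using the trials_dataframe helper (the column selection below assumes Optuna's default column names):

# Inspecting all trials at once:
df = study.trials_dataframe()
# Each row is one Trial: its number, objective value, parameters, and state.
print(df[["number", "value", "state"]].head())
# Sort to see the top-scoring trials first:
print(df.sort_values("value", ascending=False).head())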
Defining parameter spaces:
If we look at Step 2 (basic_optuna.py), we defined our hyper-parameter C as a float sampled on a log scale. Similarly, for Random Forest we defined max_depth and n_estimators as parameters to optimize. Optuna supports five ways in which we can define the parameters:
def objective(trial):
    # Categorical parameter
    optimizer = trial.suggest_categorical('rf_criterion', ['gini', 'entropy'])
    # Int parameter
    num_estimators = trial.suggest_int('rf_n_estimators', 10, 1000)
    # Uniform parameter
    dropout_rate = trial.suggest_uniform('rf_min_weight_fraction_leaf', 0.0, 1.0)
    # Loguniform parameter
    learning_rate = trial.suggest_loguniform('rf_parameter_x', 1e-5, 1e-2)
    # Discrete-uniform parameter
    drop_path_rate = trial.suggest_discrete_uniform('rf_parameter_y', 0.0, 1.0, 0.1)
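One caveat for readers on newer Optuna releases: suggest_uniform, suggest_loguniform, and suggest_discrete_uniform have since been deprecated in favor of suggest_float with the log and step keyword arguments. Rough equivalents of the last three calls above:

# Modern equivalents (assuming a recent Optuna release):
dropout_rate = trial.suggest_float('rf_min_weight_fraction_leaf', 0.0, 1.0)   # uniform
learning_rate = trial.suggest_float('rf_parameter_x', 1e-5, 1e-2, log=True)   # log-uniform
drop_path_rate = trial.suggest_float('rf_parameter_y', 0.0, 1.0, step=0.1)    # discrete-uniform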
Historical Studies:
I feel one of the essential needs of a data scientist is to keep track of all their experiments. This helps not only to compare any two, three, or more of them, but also to understand how the model behaves with a change in hyper-parameters, the addition of new features, and so on. Optuna has in-built functionality to keep a record of all the experiments. Before accessing old experiments we need to store them. The code below shows how to do both:
# Importing the packages:
import optuna
import joblib

# Create a study name:
study_name = 'experiment-C'

# Store in a DB:
study = optuna.create_study(study_name=study_name, storage='sqlite:///tmp/experiments.db', load_if_exists=True)

# Alternatively, store and load using joblib:
# joblib.dump(study, 'experiments.pkl')
# study = joblib.load('experiments.pkl')

# Optimize:
study.optimize(objective, n_trials=3)
1. You can create an experiment with a name of your choice
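To pick a stored study back up in a later session, and to make use of the built-in Plotly visualizations mentioned at the start, something like this sketch should work (it assumes the sqlite file created above exists):

import optuna

# Reload the study from the sqlite storage created above:
study = optuna.load_study(study_name='experiment-C', storage='sqlite:///tmp/experiments.db')
print(study.best_params)

# One of the built-in Plotly visualization helpers:
fig = optuna.visualization.plot_optimization_history(study)
fig.show()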
