
Tuning XGBoost Hyperparameters with Hyperopt (Pitfalls I Ran Into)

Updated: 2023-05-15 08:06:29


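What is a hyperparameter?

A hyperparameter is a parameter that the algorithm cannot learn from the data.

Hyperparameters have to be set by hand before training, and each set of values produces a structurally different model.

Hyperparameters need some tuning to suit each application.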

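Ways to choose hyperparameters

Grid search evaluates every combination of values on a predefined grid and keeps the combination that best optimizes your criterion (for example, the lowest loss).

Random search samples the hyperparameter space at random. It is faster and, most of the time, also gives better results.

Bayesian optimization sets a prior over the hyperparameter distribution and updates it step by step as the results of different trials are observed. This fits the hyperparameter space better and so gives a better chance of finding the minimum.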
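
To make the difference between the first two strategies concrete, here is a minimal sketch that spends the same budget of 16 evaluations on a grid and on random samples. It uses only the standard library, not Hyperopt, and the function names grid_search and random_search are introduced here for illustration.

```python
import itertools
import random


def q(x, y):
    """Toy objective with its minimum of 0 at (0, 0)."""
    return x ** 2 + y ** 2


def grid_search(points_per_axis):
    # Evaluate every combination on an evenly spaced grid over [-1, 1] x [-1, 1].
    step = 2 / (points_per_axis - 1)
    axis = [-1 + step * i for i in range(points_per_axis)]
    return min((q(x, y), x, y) for x, y in itertools.product(axis, axis))


def random_search(n_trials, seed=0):
    # Evaluate the same number of points, drawn uniformly at random.
    rng = random.Random(seed)
    samples = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n_trials)]
    return min((q(x, y), x, y) for x, y in samples)


grid_best = grid_search(points_per_axis=4)  # 4 x 4 = 16 evaluations
random_best = random_search(n_trials=16)    # 16 evaluations
print("grid   (loss, x, y):", grid_best)
print("random (loss, x, y):", random_best)
```

The grid can never do better than its nearest grid point, whereas random samples may land anywhere in the box. Which one wins on a given run depends on the seed and on the objective.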

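The interface

fmin: you define a probability distribution for each hyperparameter so that the search can find a trustworthy value within the range it varies over. The interface resembles optimize.minimize in scipy, so you also have to supply the objective function to optimize. Many optimization algorithms assume they are searching for the best value in a vector space; hyperopt lets you describe the search space in finer detail, for example by drawing values on a log scale or from a uniform distribution.

Hyperopt needs four things to be defined:

1. The objective function to minimize

2. The search space to optimize over

3. A database that stores the results computed during the search (so they can be analyzed)

4. The search algorithm to use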
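
How the four pieces fit together can be sketched without Hyperopt at all. The loop below is a toy stand-in, not Hyperopt's implementation: toy_fmin, random_suggest and the lambda-based space format are all hypothetical names and conventions made up for this illustration.

```python
import random


def objective(params):
    # 1. The function to minimize.
    return params["x"] ** 2 + params["y"] ** 2


# 2. The search space: one sampler per hyperparameter (made-up format).
space = {
    "x": lambda rng: rng.uniform(0, 1),
    "y": lambda rng: rng.gauss(0, 1),
}


def random_suggest(space, rng):
    # 4. The search algorithm: here, plain random sampling.
    return {name: draw(rng) for name, draw in space.items()}


def toy_fmin(fn, space, suggest, max_evals, seed=0):
    rng = random.Random(seed)
    trials = []  # 3. The store of every evaluated point and its loss.
    for tid in range(max_evals):
        params = suggest(space, rng)
        trials.append({"tid": tid, "params": params, "loss": fn(params)})
    best = min(trials, key=lambda t: t["loss"])
    return best["params"], trials


best, trials = toy_fmin(objective, space, random_suggest, max_evals=50)
print(best)
```

Swapping random_suggest for a function that looks at the stored trials before proposing the next point is, roughly, what a smarter algorithm such as TPE does.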

1. Define the objective function

For example, to optimize a quadratic function:

def q(args):
    x, y = args
    return x ** 2 + y ** 2

2. Define a search space

For example, search for x in [0, 1] and draw y from a standard normal distribution:

from hyperopt import hp

space = [hp.uniform("x", 0, 1), hp.normal("y", 0, 1)]

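Search space expressions

These come mainly from the hyperopt.hp module, for example:

from hyperopt import fmin, hp, tpe
fmin(lambda a: a ** 2, space=hp.uniform('a', 0, 1), algo=tpe.suggest, max_evals=100)

'a' is the label. In a custom space, every hyperparameter must have a label like this one.

The other expressions are: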

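hp.choice(label, options):

Returns one of options, so options should be a list or a tuple.

hp.pchoice(label, p_options):

Returns one of the options with the given probabilities; p_options is a list of (prob, option) pairs.

hp.uniform(label, low, high):

Draws a value uniformly between low and high.

hp.quniform(label, low, high, q):

Draws a value as round(uniform(low, high) / q) * q.

hp.loguniform(label, low, high):

Draws a value as exp(uniform(low, high)), so the result is confined to [exp(low), exp(high)].

hp.normal(label, mu, sigma):

Draws a value from a normal distribution; mainly suited to unbounded variables.

hp.lognormal(label, mu, sigma):

Draws a value as exp(normal(mu, sigma)), so the variable is restricted to positive values.

hp.randint(label, upper):

Draws a random integer from [0, upper).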
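
What these expressions draw is easier to see with a few samples. The sketch below reimplements the sampling rules with the standard library only; it mirrors the formulas, is not Hyperopt's own code, and the function names are simply reused for readability.

```python
import math
import random

rng = random.Random(0)


def quniform(low, high, q):
    # Uniform draw snapped to the nearest multiple of q.
    # (Hyperopt itself returns this value as a float.)
    return round(rng.uniform(low, high) / q) * q


def loguniform(low, high):
    # The logarithm of the result is uniform on [low, high].
    return math.exp(rng.uniform(low, high))


def lognormal(mu, sigma):
    # The logarithm of the result is normal, so the result is always positive.
    return math.exp(rng.gauss(mu, sigma))


def randint(upper):
    # Integer in [0, upper).
    return rng.randrange(upper)


draws = {
    "quniform(2, 8, 1)": [quniform(2, 8, 1) for _ in range(5)],
    "loguniform(0, 1)": [loguniform(0, 1) for _ in range(5)],
    "lognormal(0, 1)": [lognormal(0, 1) for _ in range(5)],
    "randint(5)": [randint(5) for _ in range(5)],
}
for name, values in draws.items():
    print(name, values)
```

Note that the bounds of loguniform are given on the log scale: loguniform(0, 1) yields values between 1 and e, not between 0 and 1.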

3. Choose a search algorithm

Two algorithms are mainly available at present:

algo=hyperopt.tpe.suggest
algo=hyperopt.rand.suggest

Use them as follows:

from hyperopt import hp, fmin, rand, tpe, space_eval

best = fmin(q, space, algo=rand.suggest, max_evals=100)
print(best)
# => XXX
print(space_eval(space, best))
# => XXX
best = fmin(q, space, algo=tpe.suggest, max_evals=100)
print(best)

When several hyperparameters need optimizing, the space can be written in different ways, for example:

from hyperopt import hp

list_space = [
    hp.uniform('a', 0, 1),
    hp.loguniform('b', 0, 1)]
# or
tuple_space = (
    hp.uniform('a', 0, 1),
    hp.loguniform('b', 0, 1))
# or
dict_space = {
    'a': hp.uniform('a', 0, 1),
    'b': hp.loguniform('b', 0, 1)}

You can draw samples from a search space with hyperopt.pyll.stochastic, for example:

from hyperopt.pyll.stochastic import sample

print(sample(list_space))
# => [0.13, .235]
print(sample(nested_space))  # nested_space is not defined in this post
# => [[{'ca': 1, 'a': 0.12}, {'ca': 2, 'b': 2.3}],
#     'extra_literal_string',
#     3]

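You can also put expressions in the search space. That way you can construct the values in your space freely instead of being confined to a vector space:

from hyperopt import hp
from hyperopt.pyll import scope

def foo(x):
    return str(x) * 3

expr_space = {
    'a': 1 + hp.uniform('a', 0, 1),
    'b': scope.minimum(hp.loguniform('b', 0, 1), 10),
    'c': scope.call(foo, args=(hp.randint('c', 5),)),
}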

# You can also register a function on scope with a decorator, for example:
from hyperopt import hp
from hyperopt.pyll import scope

@scope.define
def foo(x):
    return str(x) * 3

# -- This will print "000"; foo is called as usual.
print(foo(0))

expr_space = {
    'a': 1 + hp.uniform('a', 0, 1),
    'b': scope.minimum(hp.loguniform('b', 0, 1), 10),
    'c': scope.foo(hp.randint('cba', 5)),
}

How to use it with a few common classifiers:

from hyperopt import hp
from hyperopt.pyll import scope
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier as DTree

scope.define(GaussianNB)  # Gaussian naive Bayes, one of the three naive Bayes models
scope.define(SVC)  # support vector machine
scope.define(DTree, name='DTree')  # decision tree

C = hp.lognormal('svm_C', 0, 1)  # penalty parameter

# Choose which estimator is trained on the data
space = hp.pchoice('estimator', [
    (0.1, scope.GaussianNB()),
    (0.2, scope.SVC(C=C, kernel='linear')),  # SVM with a linear kernel
    # sklearn's SVC has no width argument; gamma controls the RBF kernel
    (0.3, scope.SVC(C=C, kernel='rbf',
                    gamma=hp.lognormal('svm_rbf_width', 0, 1))),
    (0.4, scope.DTree(
        criterion=hp.choice('dtree_criterion', ['gini', 'entropy']),
        max_depth=hp.choice('dtree_max_depth',
                            [None, hp.qlognormal('dtree_max_depth_N', 2, 2, 1)]))),
])

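4. Store the search results

Use the Trials structure, mainly as follows:

from hyperopt import hp, fmin, tpe, space_eval, Trials

trials = Trials()  # just create a Trials object
best = fmin(q, space, algo=tpe.suggest, max_evals=100, trials=trials)
trials.trials

# Each entry in trials.trials is a dict. Its main keys are:

tid: an integer that identifies the trial (the id of each tuning step)

result: a dict holding loss, status and so on

misc: a dict holding idxs and vals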
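
As a rough sketch of how such records can be analyzed, the snippet below picks the best trial out of a list laid out as described. The three records are written by hand with made-up numbers, not taken from a real run, and best_trial and flatten_vals are helper names introduced here.

```python
# Hand-written records that mimic the layout described above.
# The numbers are invented for illustration only.
example_trials = [
    {"tid": 0, "result": {"loss": 0.52, "status": "ok"},
     "misc": {"idxs": {"x": [0], "y": [0]}, "vals": {"x": [0.72], "y": [0.05]}}},
    {"tid": 1, "result": {"loss": 0.0521, "status": "ok"},
     "misc": {"idxs": {"x": [1], "y": [1]}, "vals": {"x": [0.11], "y": [-0.2]}}},
    {"tid": 2, "result": {"status": "fail"},
     "misc": {"idxs": {"x": [2], "y": [2]}, "vals": {"x": [0.93], "y": [1.4]}}},
]


def best_trial(trials):
    # Only trials that finished with status "ok" carry a usable loss.
    finished = [t for t in trials if t["result"]["status"] == "ok"]
    return min(finished, key=lambda t: t["result"]["loss"])


def flatten_vals(trial):
    # Each label maps to a list of values; take the single element.
    return {label: values[0]
            for label, values in trial["misc"]["vals"].items() if values}


winner = best_trial(example_trials)
print(winner["tid"], winner["result"]["loss"], flatten_vals(winner))
```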

Import the libraries

import pandas as pd
import numpy as np
import xgboost as xgb
from xgboost.sklearn import XGBClassifier
from sklearn import metrics
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
from hyperopt import hp, tpe
from hyperopt.fmin import fmin

The data set

data = pd.read_csv("C:\\Users\\Nihil\\Documents\\pythonlearn\\data\\train_modified.csv")
target = 'Disbursed'
IDcol = 'ID'
x_columns = [x for x in data.columns if x not in [target, IDcol]]
X = data[x_columns]
y = data[target]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)

Training

def objective(params):
    params = {
        'max_depth': int(params['max_depth']),
        'gamma': "{:.3f}".format(params['gamma']),
        'colsample_bytree': "{:.3f}".format(params['colsample_bytree'])
    }
    clf = xgb.XGBClassifier(
        n_estimators=250,
        learning_rate=0.05,
        **params
    )
    y_pred = clf.predict(X_test)
    score = roc_auc_score(y_test, y_pred)
    print("score is {}".format(score))

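n_estimators: the number of trees

learning_rate: the learning rate; a smaller value means each later tree has less influence

max_depth: the depth of each tree

gamma: the minimum loss reduction required to split a node; larger values curb overfitting

colsample_bytree: the fraction of columns sampled for each tree; lowering it reduces overfitting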

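The first time I ran this code, it raised an error:

TypeError: 'NoneType' object is not iterable

Cause: there is no return statement, so the function returns nothing. The return value is the objective that fmin optimizes.

So I modified it as follows (also fitting the classifier before predicting):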

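def objective(params):
    params = {
        'max_depth': int(params['max_depth']),
        'gamma': "{:.3f}".format(params['gamma']),
        'colsample_bytree': "{:.3f}".format(params['colsample_bytree'])
    }
    clf = xgb.XGBClassifier(
        n_estimators=250,
        learning_rate=0.05,
        **params
    )
    clf.fit(X_train, y_train)
    # AUC should be computed from the predicted probability of the positive class
    y_score = clf.predict_proba(X_test)[:, 1]
    score = roc_auc_score(y_test, y_score)
    print("score is {}".format(score))
    # fmin minimizes the returned value, so return the negated AUC;
    # returning score itself would make the search look for the lowest AUC
    return -score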
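
A related pitfall: fmin always minimizes whatever the objective returns, so a higher-is-better score such as AUC has to be turned into a loss first. The sketch below uses made-up AUC values and two hypothetical helpers, score_to_result and pick_best. It negates the score and wraps it in the dictionary form that Hyperopt also accepts as a return value: a 'loss' plus a 'status' of 'ok', which is the value of hyperopt.STATUS_OK.

```python
def score_to_result(score):
    """Wrap a higher-is-better score as a result dict for a minimizer."""
    return {"loss": -score, "status": "ok"}


def pick_best(scores):
    # Mimic what a minimizer does: keep the entry with the smallest loss.
    results = [score_to_result(s) for s in scores]
    best_index = min(range(len(results)), key=lambda i: results[i]["loss"])
    return best_index, results[best_index]


auc_values = [0.71, 0.84, 0.50]  # made-up scores, not from a real run
best_index, best_result = pick_best(auc_values)
print(best_index, best_result)
```

With the negation in place, the smallest loss belongs to the highest AUC. Without it, the minimizer would settle on the 0.50 entry.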

Search for the optimum

space = {
    'max_depth': hp.quniform('max_depth', 2, 8, 1),
    'colsample_bytree': hp.uniform('colsample_bytree', 0.3, 1.0),
    'gamma': hp.uniform('gamma', 0.0, 0.5)
}

best = fmin(
    fn=objective,
    space=space,
    algo=tpe.suggest,
    max_evals=10
)
print('Hyperopt estimated optimum {}'.format(best))

# Output
Hyperopt estimated optimum {'colsample_bytree':0.8557201590065211,'gamma':0.394402672688065,'max_depth':4.0}

Define a new optimization function

def optimize():
    # Note: objective only reads max_depth, gamma and colsample_bytree,
    # so the other entries are sampled but never reach the classifier.
    space = {
        'n_estimators': hp.quniform('n_estimators', 100, 1000, 1),
        'eta': hp.quniform('eta', 0.025, 0.5, 0.025),
        'max_depth': hp.quniform('max_depth', 2, 8, 1),
        'min_child_weight': hp.quniform('min_child_weight', 1, 6, 1),
        'subsample': hp.quniform('subsample', 0.5, 1, 0.05),
        'gamma': hp.uniform('gamma', 0.0, 0.5),
        'colsample_bytree': hp.uniform('colsample_bytree', 0.3, 1.0),
        'eval_metric': 'auc',
        'objective': 'binary:logistic',
    }
    best = fmin(objective, space, algo=tpe.suggest, max_evals=250)
    return best

best_hyperparams = optimize()
print("the best hyperparameter are:{}".format(best_hyperparams))

# Output (the objective in this run returned the raw AUC, so "best loss" is the lowest AUC found, not the highest):

100%|██████████|250/250[37:44<00:00,9.77s/it, best loss:0.4997457627118644]

the best hyperparameter are:{'colsample_bytree':0.997458730977807,'eta':0.05,'gamma':0.33274644882347476,'max_depth':8.0,'min_child_weight': 5.0,'n_estimators':731.0,'subsample':0.9}
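
quniform returns floats, which is why max_depth and n_estimators come back as 8.0 and 731.0 above. Before handing the result to a model, the integer-valued entries need casting. A minimal sketch, assuming those two are the only keys that must be integers; cast_int_params is a name introduced here:

```python
# The dictionary printed above, copied as-is.
best_hyperparams = {
    'colsample_bytree': 0.997458730977807,
    'eta': 0.05,
    'gamma': 0.33274644882347476,
    'max_depth': 8.0,
    'min_child_weight': 5.0,
    'n_estimators': 731.0,
    'subsample': 0.9,
}


def cast_int_params(params, int_keys=("max_depth", "n_estimators")):
    # Return a copy in which the listed keys are converted to int.
    return {key: int(value) if key in int_keys else value
            for key, value in params.items()}


model_params = cast_int_params(best_hyperparams)
print(model_params)
```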

Comparing the results

