LightGBM Parameter Tuning: A Code Walkthrough

Updated: 2023-05-20 08:16:04

1. Overall tuning strategy

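For tree-based models, the tuning procedure is much the same everywhere. It generally goes like this:

1. Start with a fairly high learning rate, around 0.1, to speed up convergence. This matters a lot for keeping the tuning loop fast.

2. Tune the basic decision-tree parameters.

3. Tune the regularization parameters.

4. Finally, lower the learning rate to squeeze out the last bit of accuracy.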
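The staged procedure above amounts to a greedy, stage-by-stage grid search: tune one group of parameters while the rest stay fixed, then carry the best values forward. Below is a minimal sketch of that idea. The names `staged_grid_search` and `toy_score` are hypothetical, and the toy objective merely stands in for a cross-validated RMSE (lower is better); nothing here is part of LightGBM or sklearn.

```python
from itertools import product


def staged_grid_search(score_fn, base_params, stages):
    """Greedy stage-wise grid search; a lower score is better.

    stages is a list of dicts mapping parameter name -> candidate values.
    Each stage is searched exhaustively with all other parameters fixed.
    """
    best_params = dict(base_params)
    best_score = score_fn(best_params)
    for grid in stages:
        names = list(grid)
        for values in product(*(grid[name] for name in names)):
            candidate = {**best_params, **dict(zip(names, values))}
            score = score_fn(candidate)
            if score < best_score:
                best_params, best_score = candidate, score
    return best_params, best_score


def toy_score(params):
    # Hypothetical stand-in for a CV metric, minimized at depth 7, fraction 0.7.
    return (params["max_depth"] - 7) ** 2 + (params["feature_fraction"] - 0.7) ** 2


stages = [
    {"max_depth": [3, 5, 7]},
    {"feature_fraction": [0.5, 0.6, 0.7, 0.8, 0.9]},
]
best_params, best_score = staged_grid_search(
    toy_score, {"max_depth": 6, "feature_fraction": 0.8}, stages
)
print(best_params, best_score)
```

The trade-off is the usual one for greedy search: it is far cheaper than a full grid over every parameter at once, but it can miss interactions between parameters tuned in different stages.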

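The dataset used here has shape (4400+, 1000+) and contains only numeric features. Since this is a regression problem, the metric is root mean squared error (RMSE).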

2. Learning rate and number of estimators

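I usually fix the learning rate at a fairly high value first, learning_rate = 0.1, and then choose the booster type via boosting/boost/boosting_type, although the default gbdt is what nearly everyone uses.

Next comes the number of estimators, that is, the number of boosting iterations, or equivalently the number of residual trees. The parameter goes by the names n_estimators/num_iterations/num_round/num_boost_round. We can set it to a large value first and then read the best iteration count off the cv results, as the code further down shows.

Before that, the other important parameters need initial values. The exact values matter little; they are only a starting point so that the remaining parameters can be tuned one group at a time. Here are the initial values: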

These parameters are dictated by the project:

'boosting_type'/'boosting': 'gbdt'

'objective': 'regression'

'metric': 'rmse'

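These are the initial values I chose; pick your own to suit your situation:

'max_depth': 6 ### depends on the problem; my dataset is not very large, so I picked a moderate value. Anything from 4 to 10 would do.

'num_leaves': 50 ### LightGBM grows trees leaf-wise; the official advice is to keep this below 2^max_depth.

'subsample'/'bagging_fraction': 0.8 ### row (sample) subsampling

'colsample_bytree'/'feature_fraction': 0.8 ### feature subsampling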

Below I use LightGBM's cv function to demonstrate:

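import lightgbm as lgb

# Initial parameters; df_train / y_train are the training features and target.
params = {
    'boosting_type': 'gbdt',
    'objective': 'regression',
    'learning_rate': 0.1,
    'num_leaves': 50,
    'max_depth': 6,
    'subsample': 0.8,
    'colsample_bytree': 0.8,
}

data_train = lgb.Dataset(df_train, y_train, silent=True)

# This call uses the pre-4.0 LightGBM API (silent, early_stopping_rounds, verbose_eval).
# LightGBM 4.x takes lgb.early_stopping() / lgb.log_evaluation() callbacks instead.
cv_results = lgb.cv(
    params, data_train, num_boost_round=1000, nfold=5, stratified=False, shuffle=True,
    metrics='rmse', early_stopping_rounds=50, verbose_eval=50, show_stdv=True, seed=0)

# With early stopping, the returned history is truncated at the best iteration.
print('best n_estimators:', len(cv_results['rmse-mean']))
print('best cv score:', cv_results['rmse-mean'][-1])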

best n_estimators: 43

best cv score: 1.3838664241

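Because the dataset is not very large, the best iteration count at a learning rate of 0.1 is only 43. We now carry (0.1, 43) into the tuning of the other parameters. That said, if your hardware allows it, a smaller learning rate (with correspondingly more iterations) is still preferable.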
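How a best iteration count like 43 falls out of early stopping can be sketched in plain Python. This is only an illustration of the stopping rule, not LightGBM's implementation; `best_iteration` and the `cv_rmse` values are made up for the example, and `patience` plays the role of early_stopping_rounds.

```python
def best_iteration(scores, patience):
    """Return (1-based best round, best score) under early stopping.

    Training stops once `patience` rounds in a row fail to beat the best score.
    """
    best_round, best_score = 0, float("inf")
    for round_idx, score in enumerate(scores, start=1):
        if score < best_score:
            best_round, best_score = round_idx, score
        elif round_idx - best_round >= patience:
            break
    return best_round, best_score


# Hypothetical per-round CV RMSE values: improving, then degrading.
cv_rmse = [1.60, 1.50, 1.42, 1.39, 1.40, 1.41, 1.43, 1.38]
print(best_iteration(cv_rmse, patience=3))
print(best_iteration(cv_rmse, patience=50))
```

With a patience of 3 the loop stops at round 7 and reports round 4, never seeing the later improvement at round 8; a larger patience finds it. That is the price of a small early_stopping_rounds on a noisy metric.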

2. max_depth and num_leaves

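These are the most important parameters for improving accuracy.

max_depth: sets the tree depth; the deeper the tree, the more likely it is to overfit.

num_leaves: because LightGBM uses a leaf-wise growth algorithm, tree complexity is controlled with num_leaves rather than max_depth. The rough conversion is num_leaves = 2^(max_depth), but the value should be set smaller than 2^(max_depth), otherwise the model may overfit.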
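The relationship between the two parameters is simple arithmetic: a binary tree of depth d has at most 2^d leaves, so when max_depth is set it acts as a ceiling on how many leaves a tree can actually reach. A small sketch (the helper names are hypothetical, chosen for this example):

```python
def max_leaves(max_depth):
    """Upper bound on the leaf count of a binary tree with the given depth."""
    return 2 ** max_depth


def check_leaves(num_leaves, max_depth):
    """Compare a num_leaves setting against the ceiling implied by max_depth."""
    cap = max_leaves(max_depth)
    return {
        "cap": cap,
        "reachable_leaves": min(num_leaves, cap),
        "below_cap": num_leaves < cap,
    }


print(check_leaves(50, 6))  # the initial values chosen earlier
print(check_leaves(50, 3))  # a shallow tree cannot use 50 leaves
```

For the initial values (num_leaves = 50, max_depth = 6) the ceiling is 64, so the setting respects the "smaller than 2^(max_depth)" advice. With max_depth = 3 the ceiling is 8, and any larger num_leaves setting cannot be reached.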

We can also tune the two parameters together: a coarse search first, then a fine one.

Here we bring in sklearn's GridSearchCV() to run the search. For whatever reason, this function is very hungry for memory, time, and patience.

from sklearn.model_selection import GridSearchCV

# Build the sklearn-style LightGBM model with the (learning rate, n_estimators) chosen above.
# Note: bagging_fraction only takes effect when bagging_freq > 0, which is not set here.
model_lgb = lgb.LGBMRegressor(objective='regression', num_leaves=50,
                              learning_rate=0.1, n_estimators=43, max_depth=6,
                              metric='rmse', bagging_fraction=0.8, feature_fraction=0.8)

params_test1 = {
    'max_depth': range(3, 8, 2),
    'num_leaves': range(50, 170, 30)
}

gsearch1 = GridSearchCV(estimator=model_lgb, param_grid=params_test1,
                        scoring='neg_mean_squared_error', cv=5, verbose=1, n_jobs=4)
gsearch1.fit(df_train, y_train)
gsearch1.best_params_, gsearch1.best_score_

({'max_depth': 7, 'num_leaves': 80}, -1.8602436718814157)

Here we ran 12 parameter combinations, and the best one is max_depth = 7 with num_leaves = 80, scoring -1.860.
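The count of 12, and part of why the search feels so expensive, can be checked by enumerating the grid. The sketch below uses only itertools, with the same ranges and fold count as the search above; the variable names are local to this example.

```python
from itertools import product

param_grid = {
    "max_depth": range(3, 8, 2),       # 3, 5, 7
    "num_leaves": range(50, 170, 30),  # 50, 80, 110, 140
}
cv_folds = 5

combinations = list(product(*param_grid.values()))
n_fits = len(combinations) * cv_folds  # one model fit per combination per fold

print(len(combinations), "combinations,", n_fits, "cross-validation fits")
for max_depth, num_leaves in combinations:
    print({"max_depth": max_depth, "num_leaves": num_leaves})
```

So this coarse search alone trains 60 models, plus one final refit on the full data when GridSearchCV's refit option is left at its default of True. The cost multiplies with every extra value added to either list.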

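One thing needs pointing out: the scoring parameter in sklearn's model evaluation always follows the convention that higher return values are better than lower return values.

My metric, however, is root mean squared error (rmse), where lower is better. sklearn therefore provides the neg_mean_squared_error scorer, which returns the negated mean squared error, so that a larger (less negative) value is better.

So the best score of -1.860 converts to an RMSE of np.sqrt(-(-1.860)) = 1.3639, a clear improvement over the score from the first step (1.3839).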
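The conversion is a one-liner. Here it is as a small standard-library helper; the function name is made up for this example.

```python
import math


def neg_mse_to_rmse(neg_mse):
    """Convert a neg_mean_squared_error score (<= 0) into an RMSE."""
    return math.sqrt(-neg_mse)


print(round(neg_mse_to_rmse(-1.8602436718814157), 4))
```

One caveat when comparing against the lgb.cv number: GridSearchCV's best_score_ is the mean of the per-fold MSE values, so its square root is not exactly the same quantity as a mean of per-fold RMSE values. The two are close but should not be expected to match digit for digit.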

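We now carry this best result into the next step. I only did a coarse search here; for better results, try a few more values of max_depth around 7 and of num_leaves around 80. Don't be put off by the hassle, even though it really is a hassle.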

params_test2 = {
    'max_depth': [6, 7, 8],
    'num_leaves': [68, 74, 80, 86, 92]
}

gsearch2 = GridSearchCV(estimator=model_lgb, param_grid=params_test2,
                        scoring='neg_mean_squared_error', cv=5, verbose=1, n_jobs=4)
gsearch2.fit(df_train, y_train)
gsearch2.best_params_, gsearch2.best_score_

({'max_depth': 7, 'num_leaves': 68}, -1.8602436718814157)

3. min_data_in_leaf and min_sum_hessian_in_leaf

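At this point it is time to rein in overfitting.

min_data_in_leaf is a very important parameter, also known as min_child_samples. Its value depends on the number of training samples and on num_leaves. Setting it larger avoids growing an overly deep tree, but may cause underfitting.

min_sum_hessian_in_leaf, also known as min_child_weight, is the minimum sum of hessians in one leaf required to allow a split, which is quite a mouthful (Minimum sum of hessians in one leaf to allow a split. Higher values potentially decrease overfitting).

We proceed the same way as above: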

params_test3 = {
    'min_child_samples': [18, 19, 20, 21, 22],
    'min_child_weight': [0.001, 0.002]
}

model_lgb = lgb.LGBMRegressor(objective='regression', num_leaves=80,
                              learning_rate=0.1, n_estimators=43, max_depth=7,
                              metric='rmse', bagging_fraction=0.8, feature_fraction=0.8)

gsearch3 = GridSearchCV(estimator=model_lgb, param_grid=params_test3,
                        scoring='neg_mean_squared_error', cv=5, verbose=1, n_jobs=4)
gsearch3.fit(df_train, y_train)
gsearch3.best_params_, gsearch3.best_score_

({'min_child_samples': 20, 'min_child_weight': 0.001}, -1.8602436718814157)

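This is the result of a fine search following a coarse one. The best min_data_in_leaf is 20, while min_sum_hessian_in_leaf has almost no effect on the final score. The score did not improve after this round of tuning, which means the best values coincide with the defaults already in use, 20 and 0.001.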
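One plausible reason min_sum_hessian_in_leaf makes no difference here: under the squared-error objective each unweighted sample contributes a hessian of 1, so a leaf's hessian sum equals its sample count, and any non-empty leaf clears a threshold of 0.001 or 0.002 by a wide margin. The sketch below is a simplified model of the two constraints, not LightGBM's actual split-finding code, and `split_allowed` is a hypothetical helper.

```python
def split_allowed(left_hessians, right_hessians,
                  min_data_in_leaf=20, min_sum_hessian_in_leaf=1e-3):
    """Check both child leaves against the two minimum-size constraints."""
    for hessians in (left_hessians, right_hessians):
        if len(hessians) < min_data_in_leaf:
            return False
        if sum(hessians) < min_sum_hessian_in_leaf:
            return False
    return True


# Squared-error loss: every unweighted sample has a hessian of 1.0.
left = [1.0] * 25
print(split_allowed(left, [1.0] * 19))  # right child has too few samples
print(split_allowed(left, [1.0] * 20))
print(split_allowed(left, [1.0] * 20, min_sum_hessian_in_leaf=0.002))
```

In this setting the sample-count constraint is the only one that ever binds. The hessian threshold matters more for objectives whose hessians can be very small, such as logistic loss on confidently classified samples.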

4. feature_fraction and bagging_fraction

Both of these parameters serve to reduce overfitting.

feature_fraction subsamples the features. It can be used to prevent overfitting and to speed up training.
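As a rough picture of what feature subsampling means for a dataset with about 1000 features, the sketch below draws a random subset of column indices for each tree. The helper name, the seeds, and the rounding rule are all assumptions for illustration; LightGBM's internal sampling may differ in detail.

```python
import random


def sample_features(n_features, feature_fraction, seed):
    """Pick the column indices one tree is allowed to split on."""
    # The rounding rule is an assumption; LightGBM's internal rule may differ.
    n_used = max(1, round(n_features * feature_fraction))
    rng = random.Random(seed)
    return sorted(rng.sample(range(n_features), n_used))


tree_0 = sample_features(1000, 0.7, seed=0)
tree_1 = sample_features(1000, 0.7, seed=1)
print(len(tree_0), len(tree_1), len(set(tree_0) & set(tree_1)))
```

Each tree sees a different 700-column view of the data, which decorrelates the trees and cuts the split-search work per tree by roughly the same fraction.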

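bagging_fraction and bagging_freq must be set together. bagging_fraction is equivalent to subsample (row sampling); it makes each iteration faster and also reduces overfitting. bagging_freq defaults to 0 and gives the bagging frequency: 0 means bagging is disabled, and k means bagging is performed every k iterations.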
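The "must be set together" rule can be captured in a few lines. This is a sketch of the documented behavior rather than LightGBM code; both helper names are made up for the example.

```python
def bagging_enabled(bagging_fraction, bagging_freq):
    """Bagging only takes effect with a positive frequency and a fraction below 1."""
    return bagging_freq > 0 and bagging_fraction < 1.0


def resample_iterations(num_iterations, bagging_freq):
    """0-based iterations at which a fresh row subsample would be drawn."""
    if bagging_freq <= 0:
        return []
    return [i for i in range(num_iterations) if i % bagging_freq == 0]


print(bagging_enabled(0.8, 0))  # fraction set, frequency left at its default
print(bagging_enabled(0.8, 5))
print(resample_iterations(43, 5))
```

By this rule, the earlier models in this walkthrough that pass bagging_fraction=0.8 without a bagging_freq were effectively training on all rows. With bagging_freq=5 and 43 trees, a new subsample would be drawn nine times.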

Different parameters, same method.

params_test4 = {
    'feature_fraction': [0.5, 0.6, 0.7, 0.8, 0.9],
    'bagging_fraction': [0.6, 0.7, 0.8, 0.9, 1.0]
}

# bagging_freq is set here so that bagging_fraction actually takes effect.
model_lgb = lgb.LGBMRegressor(objective='regression', num_leaves=80,
                              learning_rate=0.1, n_estimators=43, max_depth=7,
                              metric='rmse', bagging_freq=5, min_child_samples=20)

gsearch4 = GridSearchCV(estimator=model_lgb, param_grid=params_test4,
                        scoring='neg_mean_squared_error', cv=5, verbose=1, n_jobs=4)
gsearch4.fit(df_train, y_train)
gsearch4.best_params_, gsearch4.best_score_

({'bagging_fraction': 1.0, 'feature_fraction': 0.7}, -1.8541224387666373)

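This shows that the ideal values of bagging_fraction and feature_fraction are 1.0 and 0.7 respectively. A major reason is that my sample count is fairly small (4000+) while the feature count is large (1000+). So next we take a smaller step size and search feature_fraction more finely.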

params_test5 = {
    'feature_fraction': [0.62, 0.65, 0.68, 0.7, 0.72, 0.75, 0.78]
}

model_lgb = lgb.LGBMRegressor(objective='regression', num_leaves=80,
                              learning_rate=0.1, n_estimators=43, max_depth=7,
                              metric='rmse', min_child_samples=20)

gsearch5 = GridSearchCV(estimator=model_lgb, param_grid=params_test5,
                        scoring='neg_mean_squared_error', cv=5, verbose=1, n_jobs=4)
gsearch5.fit(df_train, y_train)
gsearch5.best_params_, gsearch5.best_score_

({'feature_fraction': 0.7}, -1.8541224387666373)

All right, feature_fraction stays at 0.7.

5. Regularization parameters

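The regularization parameters lambda_l1 (reg_alpha) and lambda_l2 (reg_lambda) are, without question, there to reduce overfitting; they correspond to L1 and L2 regularization respectively. Let's try them out as well.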
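To see what the two penalties do, it helps to look at the standard second-order leaf-value formula used in gradient boosting: the L1 term soft-thresholds the gradient sum, and the L2 term enlarges the denominator. The sketch below implements only that textbook formula; LightGBM's actual computation has further options (for example max_delta_step and path smoothing) that are left out, and `leaf_value` is a hypothetical helper.

```python
def leaf_value(sum_grad, sum_hess, lambda_l1=0.0, lambda_l2=0.0):
    """Regularized leaf output: soft-threshold the gradient sum, damp by L2."""
    shrunk = max(abs(sum_grad) - lambda_l1, 0.0)
    sign = 1.0 if sum_grad > 0 else -1.0
    return -sign * shrunk / (sum_hess + lambda_l2)


print(leaf_value(-10.0, 5.0))                 # no regularization
print(leaf_value(-10.0, 5.0, lambda_l1=2.0))  # L1 shrinks the gradient sum
print(leaf_value(-10.0, 5.0, lambda_l2=5.0))  # L2 enlarges the denominator
```

Both penalties pull leaf values toward zero, and L1 can zero out a leaf entirely when the gradient sum is small. With penalties as small as those in the grid below (at most 0.5) and leaves holding at least 20 samples, the effect on each leaf is tiny, which is consistent with the search settling on 0 for both.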

params_test6 = {
    'reg_alpha': [0, 0.001, 0.01, 0.03, 0.08, 0.3, 0.5],
    'reg_lambda': [0, 0.001, 0.01, 0.03, 0.08, 0.3, 0.5]
}

model_lgb = lgb.LGBMRegressor(objective='regression', num_leaves=80,
                              learning_rate=0.1, n_estimators=43, max_depth=7,
                              metric='rmse', min_child_samples=20, feature_fraction=0.7)

gsearch6 = GridSearchCV(estimator=model_lgb, param_grid=params_test6,
                        scoring='neg_mean_squared_error', cv=5, verbose=1, n_jobs=4)
gsearch6.fit(df_train, y_train)
gsearch6.best_params_, gsearch6.best_score_

({'reg_alpha': 0, 'reg_lambda': 0}, -1.8541224387666373)

Haha, looks like that was wasted effort.

6. Lowering learning_rate

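We used a fairly high learning rate earlier because it converges faster, but its accuracy is usually not as good as taking it slow and steady. Finally, we use a lower learning rate together with more trees (n_estimators) and see whether the score can be improved further.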
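Why a lower learning rate needs so many more trees can be seen in a deliberately idealized model: if every tree fitted the current residual perfectly and was then shrunk by the learning rate, the residual would shrink by a factor of (1 - learning_rate) per round. That assumption is a toy one, far from real boosting behavior, and `rounds_to_tolerance` is a made-up helper; the point is only the scaling.

```python
import math


def rounds_to_tolerance(learning_rate, tolerance):
    """Rounds until a residual shrinking by (1 - lr) per round drops below tolerance."""
    return math.ceil(math.log(tolerance) / math.log(1.0 - learning_rate))


for lr in (0.1, 0.005):
    print(lr, rounds_to_tolerance(lr, tolerance=0.01))
```

Going from 0.1 to 0.005 multiplies the required rounds by roughly twenty in this toy model, which is why the final run below raises num_boost_round from 1000 to 10000 and relies on early stopping to pick the actual count.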

We can now go back to LightGBM's cv function, plugging in the parameters tuned so far.

# Tuned parameters from the previous steps, with a much lower learning rate.
params = {
    'boosting_type': 'gbdt',
    'objective': 'regression',
    'learning_rate': 0.005,
    'num_leaves': 80,
    'max_depth': 7,
    'min_data_in_leaf': 20,
    'subsample': 1,
    'colsample_bytree': 0.7,
}

data_train = lgb.Dataset(df_train, y_train, silent=True)

# Pre-4.0 LightGBM API; LightGBM 4.x uses lgb.early_stopping() / lgb.log_evaluation() callbacks.
cv_results = lgb.cv(
    params, data_train, num_boost_round=10000, nfold=5, stratified=False, shuffle=True,
    metrics='rmse', early_stopping_rounds=50, verbose_eval=100, show_stdv=True)

print('best n_estimators:', len(cv_results['rmse-mean']))
print('best cv score:', cv_results['rmse-mean'][-1])

And that is roughly the whole process.

Original article: https://www.wtabcd.cn/fanwen/fan/90/115558.html
