Model Fusion with Stacking (Theory + Python Code)

The data comes from the Tianchi competition 零基础入门数据挖掘 - 二手车交易价格预测 (Data Mining for Beginners: Used-Car Transaction Price Prediction).
I. Principles
In data mining, a single model's generalization ability is often limited, whereas model fusion combines the strengths of several models to improve predictive accuracy. Typical fusion methods include weighted averaging, Stacking/Blending, and boosting trees. Below, Stacking is introduced in detail.
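For reference, the simplest of these methods, weighted averaging, just blends the predictions of several models. A minimal sketch (illustrative only; the predictions and weights below are made up, not from this post):

import numpy as np

pred_a = np.array([10.0, 20.0, 30.0])  # hypothetical predictions from model A
pred_b = np.array([12.0, 18.0, 33.0])  # hypothetical predictions from model B

# Weights are normally tuned on a validation set; 0.6/0.4 is assumed here
fused = 0.6 * pred_a + 0.4 * pred_b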
Stacking is a multi-layer model: several already-trained models serve as base learners, and their predictions form a new training set on which a further learner is trained.
In other words, Stacking can be viewed as a combination strategy that uses another machine-learning algorithm to combine the outputs of the individual learners.
The first-layer learners are called the base (first-level) learners, and the second-layer learner is called the meta (second-level) learner.
To guard against overfitting, the meta-learner is usually a simple model: for regression problems, linear regression; for classification problems, logistic regression. A worked sketch of the two-layer structure follows below.
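As an aside, scikit-learn ships a built-in implementation of this two-layer idea. Below is a minimal sketch (on synthetic data, not the used-car data; the choice of estimators and parameters is an assumption for illustration) with base learners in the first layer and a simple linear meta-learner in the second:

from sklearn.datasets import make_regression
from sklearn.ensemble import StackingRegressor, GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression

# Synthetic regression data standing in for a real training set
X, y = make_regression(n_samples=500, n_features=4, noise=10.0, random_state=0)

stack = StackingRegressor(
    estimators=[
        ('gbdt', GradientBoostingRegressor()),  # first-layer base learner
        ('rf', RandomForestRegressor()),        # first-layer base learner
    ],
    final_estimator=LinearRegression(),  # simple second-layer meta-learner
    cv=5,  # meta-features come from 5-fold out-of-fold predictions
)
stack.fit(X, y)

The hand-rolled version in the next section makes each step of the same procedure explicit.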
II. Code Implementation
# Load the required modules
import warnings
warnings.filterwarnings('ignore')
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import cross_val_score, train_test_split, GridSearchCV
from sklearn.metrics import mean_absolute_error, make_scorer
from xgboost.sklearn import XGBRegressor
from lightgbm.sklearn import LGBMRegressor
from sklearn.ensemble import GradientBoostingRegressor
import lightgbm as lgb
import xgboost as xgb
from sklearn import linear_model
# Read the data
Train_data = pd.read_csv('F:/data/used_car_train_20200313.csv', sep=' ')
TestA_data = pd.read_csv('F:/data/used_car_testA_20200313.csv', sep=' ')
# Keep the features selected during the earlier feature-engineering step
Train_data = Train_data[['v_12','v_10','v_9','v_11','price']]
TestA_data = TestA_data[['v_12','v_10','v_9','v_11']]
The selection process for these features is described in an earlier post in this series.
# Split features and target
X_data = Train_data.drop('price', axis=1)  # drop the target column
Y_data = Train_data['price']
X_test = TestA_data
def build_model_gbdt(x_train, y_train):
    # GBDT regressor; grid-search the learning rate
    # note: loss='ls' is called 'squared_error' in newer scikit-learn releases
    estimator = GradientBoostingRegressor(loss='ls', subsample=0.85, max_depth=5, n_estimators=100)
    param_grid = {
        'learning_rate': [0.05, 0.08, 0.1, 0.2],
    }
    gbdt = GridSearchCV(estimator, param_grid, cv=3)
    gbdt.fit(x_train, y_train)
    print(gbdt.best_params_)
    # print(gbdt.best_estimator_)
    return gbdt
def build_model_xgb(x_train, y_train):
    # XGBoost regressor with fixed hyperparameters
    model = xgb.XGBRegressor(n_estimators=120, learning_rate=0.08, gamma=0, subsample=0.8,
                             colsample_bytree=0.9, max_depth=5)  # , objective='reg:squarederror'
    model.fit(x_train, y_train)
    return model
def build_model_lgb(x_train, y_train):
    # LightGBM regressor; grid-search the learning rate
    estimator = lgb.LGBMRegressor(num_leaves=63, n_estimators=100)
    param_grid = {
        'learning_rate': [0.01, 0.05, 0.1],
    }
    gbm = GridSearchCV(estimator, param_grid)
    gbm.fit(x_train, y_train)
    return gbm
def build_model_lr(x_train, y_train):
    # Plain linear regression (used later as the meta-learner)
    reg_model = linear_model.LinearRegression()
    reg_model.fit(x_train, y_train)
    return reg_model
# Cross-validation
# Split into a training set and a validation set
x_train, x_val, y_train, y_val = train_test_split(X_data, Y_data, test_size=0.3)
# Train the base models
print('Predict GBDT...')
model_gbdt = build_model_gbdt(x_train, y_train)
val_gbdt = model_gbdt.predict(x_val)
subA_gbdt = model_gbdt.predict(X_test)
print('predict XGB...')
model_xgb = build_model_xgb(x_train, y_train)
val_xgb = model_xgb.predict(x_val)
subA_xgb = model_xgb.predict(X_test)
print('predict lgb...')
model_lgb = build_model_lgb(x_train, y_train)
val_lgb = model_lgb.predict(x_val)
subA_lgb = model_lgb.predict(X_test)
Predict GBDT...
{'learning_rate': 0.1}
predict XGB...
predict lgb...
Three models are built as first-layer base learners: LightGBM, GBDT, and XGBoost.
For the theory behind the XGBoost model and its code implementation, see an earlier post in this series.
# First layer: the base models' predictions become the new feature set
train_lgb_pred = model_lgb.predict(x_train)
train_xgb_pred = model_xgb.predict(x_train)
train_gbdt_pred = model_gbdt.predict(x_train)
# Meta-features for the training set
Strak_X_train = pd.DataFrame()
Strak_X_train['Method_1'] = train_lgb_pred
Strak_X_train['Method_2'] = train_xgb_pred
Strak_X_train['Method_3'] = train_gbdt_pred
# Meta-features for the validation set
Strak_X_val = pd.DataFrame()
Strak_X_val['Method_1'] = val_lgb
Strak_X_val['Method_2'] = val_xgb
Strak_X_val['Method_3'] = val_gbdt
# Meta-features for the test set
Strak_X_test = pd.DataFrame()
Strak_X_test['Method_1'] = subA_lgb
Strak_X_test['Method_2'] = subA_xgb
Strak_X_test['Method_3'] = subA_gbdt
Here, linear regression serves as the meta-learner.
# Second layer
model_lr_Stacking = build_model_lr(Strak_X_train, y_train)
# Training set
train_pre_Stacking = model_lr_Stacking.predict(Strak_X_train)
print('MAE of Stacking-LR:', mean_absolute_error(y_train, train_pre_Stacking))
# Validation set
val_pre_Stacking = model_lr_Stacking.predict(Strak_X_val)
print('MAE of Stacking-LR:', mean_absolute_error(y_val, val_pre_Stacking))
# Test-set predictions
print('Predict Stacking-LR...')
subA_Stacking = model_lr_Stacking.predict(Strak_X_test)
MAE of Stacking-LR: 914.5652539316941
MAE of Stacking-LR: 961.6758318716319
Predict Stacking-LR...
III. Interpreting the Results
The results show that with Stacking the training-set MAE reaches 914.5652539316941, lower than the mean absolute error of the single XGBoost model used earlier in this series, so Stacking does improve accuracy to a degree.
On the validation set the MAE is 961.6758318716319, slightly higher than on the training set, which indicates mild overfitting; reducing it is one direction for improving the model.
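One common refinement points in exactly this direction: in the code above, the first-layer training features are base-model predictions on the same rows the base models were fit on, which makes the training MAE optimistic. Building the meta-features from out-of-fold predictions instead gives the meta-learner honest inputs. A minimal sketch (an assumed refinement, not from the original post; it reuses x_train/y_train and the base-model hyperparameters from the code above):

from sklearn.model_selection import cross_val_predict

# Each row's meta-feature is predicted by a model trained on the other folds
oof = pd.DataFrame({
    'Method_1': cross_val_predict(lgb.LGBMRegressor(num_leaves=63, n_estimators=100),
                                  x_train, y_train, cv=5),
    'Method_2': cross_val_predict(xgb.XGBRegressor(n_estimators=120, learning_rate=0.08, subsample=0.8,
                                                   colsample_bytree=0.9, max_depth=5),
                                  x_train, y_train, cv=5),
    'Method_3': cross_val_predict(GradientBoostingRegressor(subsample=0.85, max_depth=5, n_estimators=100),
                                  x_train, y_train, cv=5),
})

# Fit the meta-learner on the out-of-fold predictions
meta_model = linear_model.LinearRegression().fit(oof, y_train)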
