Updated: 2022-11-12 12:41:44

Posted on 2022-11-12

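Random Forest Model Prediction and Cross-Validation

Train a random forest model on the dataset and evaluate it with cross-validation.

Data matrix: allmatrix

Labels: target

Random forest training

Preliminary training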

>>> from sklearn.ensemble import RandomForestClassifier
>>> from sklearn.model_selection import train_test_split
>>> from sklearn.model_selection import cross_val_score
>>> from sklearn.metrics import confusion_matrix as CM
>>> from sklearn.metrics import accuracy_score as ACCS
>>> # Hold out 30% of the samples as a test set
>>> Xtrain, Xtest, Ytrain, Ytest = train_test_split(allmatrix, target, test_size=0.3, random_state=420)
>>> rfc = RandomForestClassifier(n_estimators=100, random_state=90, n_jobs=-1)
>>> rfc = rfc.fit(Xtrain, Ytrain)
>>> pred_rfc = rfc.predict(Xtest)
>>> score = ACCS(Ytest, pred_rfc)
>>> print("RFC 1st training test score: {}".format(score))
RFC 1st training test score: 0.84704
>>> # Accuracy on the training data itself
>>> print("Training data score: {}".format(rfc.score(Xtrain, Ytrain)))
Training data score: 0.999946644308091
>>> cm_rfc = CM(Ytest, pred_rfc)
>>> cm_rfc
array([[23340,  4507],
       [ 4389, 23991]])
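The accuracy score and the confusion matrix describe the same predictions: accuracy is the sum of the diagonal of the matrix divided by the total number of test samples. The following is a minimal sketch of that relationship, assuming a synthetic binary dataset from `make_classification` as a stand-in for `allmatrix` and `target` (which are not available here) and a smaller forest so that it runs quickly.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic stand-in for allmatrix / target (assumption, not the original data).
X, y = make_classification(n_samples=600, n_features=20, random_state=0)
Xtrain, Xtest, Ytrain, Ytest = train_test_split(
    X, y, test_size=0.3, random_state=420
)

rfc = RandomForestClassifier(n_estimators=50, random_state=90)
rfc.fit(Xtrain, Ytrain)
pred = rfc.predict(Xtest)

cm = confusion_matrix(Ytest, pred)
# Correct predictions sit on the diagonal of the confusion matrix.
acc_from_cm = np.trace(cm) / cm.sum()
test_acc = accuracy_score(Ytest, pred)
train_acc = rfc.score(Xtrain, Ytrain)

print("test accuracy:", test_acc)
print("accuracy from confusion matrix:", acc_from_cm)
print("training accuracy:", train_acc)
```

Comparing the training accuracy with the test accuracy, as the transcript does, is a quick way to see how much the forest memorizes the training set.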

Cross-validation with cross_val_score (cv=5)

>>> score_val = cross_val_score(rfc, allmatrix, target, cv=5)
>>> print("RFC n_estimator=100 cv=5 cross validation score: {}".format(score_val))
RFC n_estimator=100 cv=5 cross validation score: [0.82998986 0.84630776 0.86231459 0.81141287 0.82240423]
>>> score_val.mean()
0.8344858607420459
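For a classifier with binary or multiclass labels, passing an integer `cv` to `cross_val_score` uses stratified, unshuffled folds. The sketch below spells out that loop by hand with `StratifiedKFold` and checks that it reproduces the scores. The synthetic data and the small forest are assumptions made so the example runs on its own.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in for allmatrix / target (assumption).
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
rfc = RandomForestClassifier(n_estimators=30, random_state=90)

# One call: five fits, five held-out accuracy scores.
auto_scores = cross_val_score(rfc, X, y, cv=5)

# The same procedure written out fold by fold.
manual_scores = []
for train_idx, test_idx in StratifiedKFold(n_splits=5).split(X, y):
    model = clone(rfc)  # fresh, unfitted copy for every fold
    model.fit(X[train_idx], y[train_idx])
    manual_scores.append(model.score(X[test_idx], y[test_idx]))
manual_scores = np.array(manual_scores)

print(auto_scores)
print(manual_scores)
print(auto_scores.mean())
```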

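Increasing n_estimators to 200

>>> # The refit with n_estimators=200 is not shown in the source; presumably:
>>> rfc = RandomForestClassifier(n_estimators=200, random_state=90, n_jobs=-1)
>>> rfc = rfc.fit(Xtrain, Ytrain)
>>> score = rfc.score(Xtest, Ytest)
>>> print("RFC n_estimator=200 training test score: {}".format(score))
RFC n_estimator=200 training test score: 0.8479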

Using criterion='entropy'

>>> rfc = RandomForestClassifier(n_estimators=100, criterion='entropy', random_state=90, n_jobs=-1)
>>> rfc = rfc.fit(Xtrain, Ytrain)
>>> rfc.score(Xtest, Ytest)
0.8427445888985718
>>> entr_val = cross_val_score(rfc, allmatrix, target, cv=5)
>>> entr_val
array([0.83004322, 0.84561413, 0.8643688 , 0.81197311, 0.81944296])
>>> entr_val.mean()
0.834288442607874

Accuracy is slightly lower than with the default parameters (criterion='gini').
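To compare split criteria under identical conditions, both can be run through the same cross-validation loop. This is a minimal sketch on synthetic data (an assumption, since `allmatrix` and `target` are not available here). The numbers it prints say nothing about the original dataset.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for allmatrix / target (assumption).
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

results = {}
for criterion in ("gini", "entropy"):
    rfc = RandomForestClassifier(
        n_estimators=30, criterion=criterion, random_state=90
    )
    # Mean accuracy over the same five folds for each criterion.
    results[criterion] = cross_val_score(rfc, X, y, cv=5).mean()

for name, mean_acc in results.items():
    print(f"{name}: {mean_acc:.4f}")
```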

Tuning max_depth

>>> scorel = []
>>> for i in range(2, 22, 2):
...     rfc = RandomForestClassifier(n_estimators=130,
...                                  n_jobs=-1,
...                                  max_depth=i,
...                                  random_state=90)
...     score = cross_val_score(rfc, allmatrix, target, cv=5).mean()
...     scorel.append(score)
...
>>> print("max-depth optimization and its value")
max-depth optimization and its value
>>> # Best mean score, and the depth it belongs to (index 0 is depth 2)
>>> print(max(scorel), ((scorel.index(max(scorel)) + 1) * 2))
0.834336482169649 20

The effect is small.
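The expression `(index + 1) * 2` maps the position of the best score back to a depth, and it only works because the depths start at 2 and step by 2. A less fragile variant keeps the candidate values in a list and indexes it with `np.argmax`. The sketch below assumes synthetic data and a small forest so that it finishes quickly.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for allmatrix / target (assumption).
X, y = make_classification(n_samples=400, n_features=20, random_state=0)

depths = list(range(2, 22, 2))  # same candidates as the loop above: 2, 4, ..., 20
scores = []
for depth in depths:
    rfc = RandomForestClassifier(
        n_estimators=20, max_depth=depth, random_state=90
    )
    scores.append(cross_val_score(rfc, X, y, cv=5).mean())

# Look the best depth up directly instead of reconstructing it from the index.
best = int(np.argmax(scores))
best_depth = depths[best]
print(scores[best], best_depth)
```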

Tuning max_features

now()

>>> score2 = []
>>> features = range(35, 71, 5)
>>> for i in features:
...     rfc = RandomForestClassifier(n_estimators=100, n_jobs=-1, max_features=i, random_state=90)
...     score = cross_val_score(rfc, allmatrix, target, cv=5).mean()
...     score2.append(score)
...
>>> # Best mean score and the max_features value that produced it
>>> print(max(score2), (features[score2.index(max(score2))]))
0.8365165
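Tuning one parameter at a time misses interactions between parameters. scikit-learn's `GridSearchCV` cross-validates every combination in a grid, so it can search `max_depth` and `max_features` jointly. The grid values, the synthetic data and the small forest below are illustrative assumptions, not the settings used in the transcript.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for allmatrix / target (assumption).
X, y = make_classification(n_samples=400, n_features=20, random_state=0)

# Hypothetical grid: 3 depths x 3 feature counts = 9 candidates.
param_grid = {
    "max_depth": [4, 8, None],
    "max_features": [2, 5, 10],
}
search = GridSearchCV(
    RandomForestClassifier(n_estimators=20, random_state=90),
    param_grid,
    cv=5,
    scoring="accuracy",
)
search.fit(X, y)

print(search.best_params_)
print(search.best_score_)  # mean cross-validated accuracy of the best candidate
```

On a dataset as large as the one in the transcript, a full grid multiplies the training cost by the number of candidates, so a coarse grid is a sensible starting point.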

Using standardized data

No noticeable improvement in model performance.

>>> rfc = RandomForestClassifier(n_estimators=100, random_state=90, n_jobs=-1)
>>> Xstd_val = cross_val_score(rfc, X_std, target, cv=5)
>>> Xstd_val
array([0.83001654, 0.84582755, 0.86226123, 0.81221321, 0.8228044 ])
>>> Xstd_val.mean()
0.834624586313739
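The transcript uses `X_std` without showing how it was built. One plausible construction, stated here as an assumption, is scikit-learn's `StandardScaler`, which rescales each column to zero mean and unit variance. Tree splits depend on the ordering of feature values rather than their scale, which is consistent with the near-identical scores reported above. The sketch uses synthetic data in place of `allmatrix`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for allmatrix / target (assumption).
X, y = make_classification(n_samples=400, n_features=20, random_state=0)

# Assumed construction of X_std: per-column zero mean, unit variance.
X_std = StandardScaler().fit_transform(X)

rfc = RandomForestClassifier(n_estimators=30, random_state=90)
raw_mean = cross_val_score(rfc, X, y, cv=5).mean()
std_mean = cross_val_score(rfc, X_std, y, cv=5).mean()

print("raw features:         ", raw_mean)
print("standardized features:", std_mean)
```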

Using min-max normalized data

No noticeable improvement in model performance.

>>> from sklearn.preprocessing import MinMaxScaler
>>> scaler = MinMaxScaler()
>>> X_mm = scaler.fit_transform(allmatrix)
/homes/xiaohuizou/anaconda3/lib/python3.7/site-packages/sklearn/utils/:595: DataConversionWarning: Data with input dtype int16 was converted to float64 by MinMaxScaler.
  warnings.warn(msg, DataConversionWarning)
>>> Xstd_mm = cross_val_score(rfc, X_mm, target, cv=5)
>>> Xstd_mm
array([0.82974977, 0.84718813, 0.86228791, 0.81135951, 0.82240423])
>>> Xstd_mm.mean()
0.8345979111111592

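Conclusion

With a random forest model, accuracy tops out at about 84% on this dataset. Changing the number of trees, the split criterion, max_depth, max_features, or the feature scaling did not raise it meaningfully. Better performance will require a different model.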
