Random Forest Model Prediction and Cross-Validation
Train a random forest model on the dataset and evaluate it with cross-validation.
Data matrix: allmatrix
Labels: target
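The session below assumes allmatrix and target already exist. To experiment without the original data, here is a hypothetical stand-in built with scikit-learn's make_classification. The shape choices are assumptions inferred from the text, not the real dataset: binary labels (the confusion matrix is 2x2), at least 70 features (max_features is tried up to 70), and int16 values (the MinMaxScaler warning reports that dtype).

```python
import numpy as np
from sklearn.datasets import make_classification

# Hypothetical stand-in for the real data (assumption, not the original dataset):
# 1000 samples, 70 features, binary labels.
X_float, y = make_classification(
    n_samples=1000, n_features=70, n_informative=20, random_state=0
)
# Quantize to int16 to mimic the dtype reported by the MinMaxScaler warning.
allmatrix = np.round(X_float * 100).astype(np.int16)
target = y

print(allmatrix.shape, allmatrix.dtype, np.unique(target))
```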
Random forest training
Preliminary training
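>>> from sklearn.ensemble import RandomForestClassifier
>>> from sklearn.model_selection import train_test_split
>>> from sklearn.model_selection import cross_val_score
>>> from sklearn.metrics import confusion_matrix as CM
>>> from sklearn.metrics import accuracy_score as ACCS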
>>> Xtrain, Xtest, Ytrain, Ytest = train_test_split(allmatrix, target, test_size=0.3, random_state=420)
>>> rfc = RandomForestClassifier(n_estimators=100, random_state=90, n_jobs=-1)
>>> rfc = rfc.fit(Xtrain, Ytrain)
>>> pred_rfc = rfc.predict(Xtest)
>>> score = ACCS(Ytest, pred_rfc)
>>> print("RFC 1st training test score: {}".format(score))
RFC 1st training test score: 0.84704
>>> print("Training data score: {}".format(rfc.score(Xtrain, Ytrain)))
Training data score: 0.999946644308091
>>> cm_rfc = CM(Ytest, pred_rfc)
>>> cm_rfc
array([[23340,  4507],
       [ 4389, 23991]])
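A minimal, self-contained sketch of the same hold-out workflow, on synthetic data (make_classification is a stand-in for allmatrix/target, and n_estimators=50 only keeps it fast). It also shows how accuracy relates to the confusion matrix: correct predictions sit on the diagonal, so accuracy = trace / total. The training score near 1.0 in the session above reflects that fully grown trees fit the training set almost perfectly, which is why the test score is the number to watch.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption: binary labels, like the 2x2 matrix above).
X, y = make_classification(n_samples=600, n_features=20, random_state=0)
Xtrain, Xtest, Ytrain, Ytest = train_test_split(X, y, test_size=0.3, random_state=420)

rfc = RandomForestClassifier(n_estimators=50, random_state=90, n_jobs=-1).fit(Xtrain, Ytrain)
pred = rfc.predict(Xtest)

cm = confusion_matrix(Ytest, pred)
# Accuracy is the diagonal (correct predictions) divided by all test samples.
acc_from_cm = np.trace(cm) / cm.sum()

train_acc = rfc.score(Xtrain, Ytrain)  # typically close to 1.0 for fully grown trees
test_acc = accuracy_score(Ytest, pred)
print(f"train={train_acc:.3f} test={test_acc:.3f} from_cm={acc_from_cm:.3f}")
```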
Cross-validation with cross_val_score (cv=5)
>>> score_val = cross_val_score(rfc, allmatrix, target, cv=5)
>>> print("RFC n_estimators=100 cv=5 cross validation score: {}".format(score_val))
RFC n_estimators=100 cv=5 cross validation score: [0.82998986 0.84630776 0.86231459 0.81141287 0.82240423]
>>> score_val.mean()
0.8344858607420459
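To make explicit what cross_val_score(rfc, allmatrix, target, cv=5) computes: for a classifier, an integer cv means an unshuffled StratifiedKFold with 5 folds, and each fold is scored by a fresh clone of the estimator using its own score method (accuracy). The sketch rebuilds that loop by hand on synthetic stand-in data and should reproduce cross_val_score's numbers. Because the folds are not shuffled, any ordering in the rows carries straight into the folds, which may contribute to the spread seen above (0.811 to 0.862).

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in data (assumption, not the original allmatrix/target).
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
rfc = RandomForestClassifier(n_estimators=30, random_state=90)

# Manual equivalent of cross_val_score(rfc, X, y, cv=5) for a classifier:
# unshuffled stratified folds, a fresh clone fitted on each training split.
manual = []
for train_idx, test_idx in StratifiedKFold(n_splits=5).split(X, y):
    model = clone(rfc).fit(X[train_idx], y[train_idx])
    manual.append(model.score(X[test_idx], y[test_idx]))  # accuracy on the held-out fold
manual = np.array(manual)

auto = cross_val_score(rfc, X, y, cv=5)
print(manual.round(4), auto.round(4), auto.mean())
```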
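Increase n_estimators to 200
>>> rfc = RandomForestClassifier(n_estimators=200, random_state=90, n_jobs=-1)
>>> rfc = rfc.fit(Xtrain, Ytrain)
>>> score = rfc.score(Xtest, Ytest)
>>> print("RFC n_estimators=200 training test score: {}".format(score))
RFC n_estimators=200 training test score: 0.8479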
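When comparing forest sizes, scikit-learn's warm_start=True lets an existing forest grow instead of being refit from scratch: raising n_estimators and calling fit() again trains only the additional trees. This is a different technique from the full refit in the session above, shown as a sketch on synthetic stand-in data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption, not the original allmatrix/target).
X, y = make_classification(n_samples=600, n_features=20, random_state=0)
Xtrain, Xtest, Ytrain, Ytest = train_test_split(X, y, test_size=0.3, random_state=420)

# warm_start=True keeps the trees already fitted and only adds new ones
# each time n_estimators is raised and fit() is called again.
rfc = RandomForestClassifier(n_estimators=100, warm_start=True, random_state=90, n_jobs=-1)
scores = {}
for n in (100, 150, 200):
    rfc.set_params(n_estimators=n)
    rfc.fit(Xtrain, Ytrain)
    scores[n] = rfc.score(Xtest, Ytest)  # hold-out accuracy at this forest size
print(scores, len(rfc.estimators_))
```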
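Use criterion='entropy'
>>> rfc = RandomForestClassifier(n_estimators=100, criterion='entropy', random_state=90, n_jobs=-1)
>>> rfc = rfc.fit(Xtrain, Ytrain)
>>> rfc.score(Xtest, Ytest)
0.8427445888985718
>>> entr_val = cross_val_score(rfc, allmatrix, target, cv=5)
>>> entr_val
array([0.83004322, 0.84561413, 0.8643688 , 0.81197311, 0.81944296])
>>> entr_val.mean()
0.834288442607874
Both the test accuracy (0.8427 vs. 0.8470) and the mean CV score (0.8343 vs. 0.8345) are slightly lower than with the default criterion ('gini').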
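A sketch of comparing the two split criteria side by side under identical folds and seeds, on synthetic stand-in data. In the session above, the gap between them (0.8345 vs. 0.8343 mean CV accuracy) is far smaller than the spread between individual folds (about 0.81 to 0.86), so it is probably noise rather than a real difference.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data (assumption, not the original allmatrix/target).
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Same seed, same (unshuffled) folds: only the split criterion changes.
cv_means = {}
for criterion in ("gini", "entropy"):
    rfc = RandomForestClassifier(n_estimators=50, criterion=criterion, random_state=90, n_jobs=-1)
    cv_means[criterion] = cross_val_score(rfc, X, y, cv=5).mean()

best = max(cv_means, key=cv_means.get)
print(cv_means, "best:", best)
```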
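Tune max_depth
>>> scorel = []
>>> for i in range(2, 22, 2):
...     rfc = RandomForestClassifier(n_estimators=130,
...                                  n_jobs=-1,
...                                  max_depth=i,
...                                  random_state=90)
...     score = cross_val_score(rfc, allmatrix, target, cv=5).mean()
...     scorel.append(score)
...
>>> print("max_depth optimization and its value")
max_depth optimization and its value
>>> print(max(scorel), (scorel.index(max(scorel)) + 1) * 2)
0.834336482169649 20
Little effect: the best mean CV score, 0.8343 at max_depth=20 (the largest depth tried), is no better than the default model's 0.8345.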
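The same depth sweep can be written with scikit-learn's validation_curve, which returns per-fold training and validation scores for each parameter value. Looking up the best value by position in the parameter list avoids the (index + 1) * 2 arithmetic, which silently breaks if the range changes. Synthetic stand-in data and n_estimators=30 are assumptions to keep the sketch fast.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import validation_curve

# Synthetic stand-in data (assumption, not the original allmatrix/target).
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
depths = list(range(2, 22, 2))  # same grid as the loop above: 2, 4, ..., 20

train_scores, test_scores = validation_curve(
    RandomForestClassifier(n_estimators=30, random_state=90, n_jobs=-1),
    X,
    y,
    param_name="max_depth",
    param_range=depths,
    cv=5,
)
mean_cv = test_scores.mean(axis=1)  # one mean CV accuracy per depth
best_depth = depths[int(np.argmax(mean_cv))]  # look the value up instead of (index + 1) * 2
print(dict(zip(depths, mean_cv.round(4))), "best max_depth:", best_depth)
```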
Tune max_features
>>> score2 = []
>>> features = range(35, 71, 5)
>>> for i in features:
...     rfc = RandomForestClassifier(n_estimators=100, n_jobs=-1, max_features=i, random_state=90)
...     score = cross_val_score(rfc, allmatrix, target, cv=5).mean()
...     score2.append(score)
...
>>> print(max(score2), features[score2.index(max(score2))])
0.83651 65
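GridSearchCV wraps the same loop: it cross-validates every value in param_grid, reports the winner in best_params_ and its mean CV score in best_score_, and by default refits the best model on all the data. The sketch assumes 70 features in the synthetic stand-in data so that every max_features value from 35 to 70 is valid; n_estimators=30 only keeps it fast.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Assumption: stand-in data with 70 features, so max_features up to 70 is valid.
X, y = make_classification(n_samples=400, n_features=70, n_informative=20, random_state=0)

grid = GridSearchCV(
    RandomForestClassifier(n_estimators=30, random_state=90, n_jobs=-1),
    param_grid={"max_features": list(range(35, 71, 5))},  # 35, 40, ..., 70
    cv=5,
)
grid.fit(X, y)
print(grid.best_score_, grid.best_params_)
```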
Use standardized data
No noticeable improvement in model performance.
>>> rfc = RandomForestClassifier(n_estimators=100, random_state=90, n_jobs=-1)
>>> Xstd_val = cross_val_score(rfc, X_std, target, cv=5)
>>> Xstd_val
array([0.83001654, 0.84582755, 0.86226123, 0.81221321, 0.8228044 ])
>>> Xstd_val.mean()
0.834624586313739
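X_std is not defined in the session above; this sketch assumes it came from StandardScaler and uses synthetic stand-in data. Putting the scaler in a Pipeline refits it on each training fold, so the validation folds do not influence the scaling. Tree splits depend only on how values are ordered within each feature, so a per-feature linear rescaling like this is expected to leave a random forest's scores essentially unchanged, which matches the result above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption, not the original allmatrix/target).
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
rfc = RandomForestClassifier(n_estimators=50, random_state=90, n_jobs=-1)

raw_mean = cross_val_score(rfc, X, y, cv=5).mean()
# The scaler is refit inside each training fold, so no validation-fold statistics leak in.
std_mean = cross_val_score(make_pipeline(StandardScaler(), rfc), X, y, cv=5).mean()
print(f"raw={raw_mean:.4f} standardized={std_mean:.4f}")
```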
Use min-max normalized data
No noticeable improvement in model performance.
>>> from sklearn.preprocessing import MinMaxScaler
>>> scaler = MinMaxScaler()
>>> X_mm = scaler.fit_transform(allmatrix)
/homes/xiaohuizou/anaconda3/lib/python3.7/site-packages/sklearn/utils/:595: DataConversionWarning: Data with input dtype int16 was converted to float64 by MinMaxScaler.
  warnings.warn(msg, DataConversionWarning)
>>> Xstd_mm = cross_val_score(rfc, X_mm, target, cv=5)
>>> Xstd_mm
array([0.82974977, 0.84718813, 0.86228791, 0.81135951, 0.82240423])
>>> Xstd_mm.mean()
0.8345979111111592
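Min-max scaling maps each column to [0, 1] with x' = (x - min) / (max - min), using that column's own min and max. The sketch checks this by hand against MinMaxScaler on a hypothetical int16 matrix, and casts to float64 first, which should avoid a dtype-conversion warning like the one printed above. As with standardization, this is a monotonic per-feature transform, so the forest's splits are unaffected, consistent with the unchanged score.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)
# Hypothetical int16 matrix standing in for allmatrix.
allmatrix = rng.integers(-500, 500, size=(100, 5)).astype(np.int16)
# Casting to float64 up front leaves MinMaxScaler no dtype conversion to perform.
X = allmatrix.astype(np.float64)

X_mm = MinMaxScaler().fit_transform(X)

# The same transform by hand, column by column: x' = (x - min) / (max - min)
col_min = X.min(axis=0)
col_max = X.max(axis=0)
X_manual = (X - col_min) / (col_max - col_min)

print(np.allclose(X_mm, X_manual), X_mm.min(axis=0), X_mm.max(axis=0))
```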
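Conclusion
Across every variation tried here (200 trees, the entropy criterion, max_depth and max_features tuning, standardized and min-max normalized data), the random forest's mean 5-fold CV accuracy stays between about 0.834 and 0.837, and its hold-out test accuracy around 0.84 to 0.85. Tuning does not push the random forest meaningfully past roughly 84%; better performance will likely require a different model.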
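One way to act on this conclusion is to score candidate models with the same 5-fold CV and compare their means. The models below (gradient boosting and a scaled logistic regression) are illustrative choices, not ones evaluated in this post, and the data is a synthetic stand-in.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption, not the original allmatrix/target).
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Candidate models are illustrative choices, not ones tested in the original post.
candidates = {
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=90, n_jobs=-1),
    "gradient_boosting": GradientBoostingClassifier(random_state=90),
    "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
}
# Same cv=5 protocol for every model, so the means are directly comparable.
results = {name: cross_val_score(model, X, y, cv=5).mean() for name, model in candidates.items()}
for name, score in sorted(results.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:.4f}")
```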
Published: 2022-11-12 12:41:44
Article link: http://www.wtabcd.cn/fanwen/fan/88/4592.html