Data Mining in Practice (4): Model Evaluation
Task: record a score table of accuracy, precision, recall, F1-score, and AUC for five models (logistic regression, SVM, decision tree, random forest, XGBoost), and plot the ROC curve for each.
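I. Model evaluation methods
Example 1:
Suppose we have data on 1000 patients and want to classify which of them have cancer and which do not. Assume 990 of them do not have cancer and 10 do.
(1) Accuracy: feed all 1000 samples into the model. A model that simply predicts "no cancer" for everyone is right on 990 of them, so accuracy = 990/1000 = 99%, even though it finds no cancer patient at all.
(2) Recall: this measures performance against the class we actually care about. Among the 10 patients who really have cancer, if the model finds only 2, the recall is 2/10 = 0.2.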
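The arithmetic in Example 1 can be checked with a few lines of plain Python. This is only a minimal sketch: the label lists, the two "models", and the helper names accuracy and recall are assumptions constructed to match the numbers above (990 healthy patients, 10 with cancer); they are not data or code from the original post.

```python
# Hypothetical labels matching Example 1: 1 = cancer, 0 = no cancer.
y_true = [1] * 10 + [0] * 990

# A trivial "model" that predicts "no cancer" for every patient.
y_pred_all_negative = [0] * 1000

# A second hypothetical model that finds 2 of the 10 cancer patients.
y_pred_two_found = [1] * 2 + [0] * 998


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def recall(y_true, y_pred):
    """Fraction of actual positives that were predicted positive."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return tp / (tp + fn)


print(accuracy(y_true, y_pred_all_negative))  # 0.99
print(recall(y_true, y_pred_all_negative))    # 0.0
print(recall(y_true, y_pred_two_found))       # 0.2
```

The first model reaches 99% accuracy with a recall of 0, which is why accuracy alone is misleading on heavily imbalanced data.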
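Example 2:
Let TP, TN, FP, and FN be the numbers of true positives, true negatives, false positives, and false negatives.
(1) Accuracy: A = (TP + TN) / (TP + TN + FP + FN)
(2) Precision: P = TP / (TP + FP)
(3) Recall: R = TP / (TP + FN)
(4) F1-score: F1 = 2 * P * R / (P + R) (the F1 value is the harmonic mean of precision and recall)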
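The four formulas above translate directly into code. The following is a small sketch in plain Python; the function name classification_metrics, the example counts, and the choice to return 0.0 when a denominator is zero are all assumptions made for illustration.

```python
def classification_metrics(tp, tn, fp, fn):
    """Return (accuracy, precision, recall, f1) from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    # Assumption: report 0.0 when a ratio is undefined (zero denominator).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return accuracy, precision, recall, f1


# Hypothetical counts: 2 cancer patients found, 8 missed, 5 false alarms.
acc, p, r, f1 = classification_metrics(tp=2, tn=985, fp=5, fn=8)
print(round(acc, 3), round(p, 3), round(r, 3), round(f1, 3))
# 0.987 0.286 0.2 0.235
```

Accuracy stays near 99% while precision, recall, and F1 are all low, so on this kind of data the last three are the more informative numbers.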
II. Code implementation
# Import libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.metrics import recall_score, precision_score, f1_score, accuracy_score, roc_curve, roc_auc_score
def plot_roc_curve(fpr_train, tpr_train, fpr_test, tpr_test, name=None):
    # Draw the train (red) and test (blue) ROC curves on the same axes.
    plt.plot(fpr_train, tpr_train, linewidth=2, c='r', label='train')
    plt.plot(fpr_test, tpr_test, linewidth=2, c='b', label='test')
    # Diagonal reference line: the ROC curve of a random classifier.
    plt.plot([0, 1], [0, 1], 'k--')
    plt.axis([0, 1, 0, 1])
    plt.xlabel('False Positive Rate')
    plt.ylabel('True Positive Rate')
    plt.title(name)
    plt.legend(loc='best')
    plt.show()
def metrics(models, X_train_scaled, X_test_scaled, y_train, y_test):
    # One row per model; the columns hold the five scores.
    columns = ['recall_score', 'precision_score', 'f1_score', 'accuracy_score', 'AUC']
    results_test = pd.DataFrame(columns=columns)
    results_train = pd.DataFrame(columns=columns)
    for key in models:
        name = str(key)
        result_train = []
        result_test = []
        model = models[key]
        model.fit(X_train_scaled, y_train)
        y_pre_test = model.predict(X_test_scaled)
        y_pre_train = model.predict(X_train_scaled)
        # scikit-learn metric functions take (y_true, y_pred), in that order.
        result_test.append(round(recall_score(y_test, y_pre_test), 2))
        result_test.append(round(precision_score(y_test, y_pre_test), 2))
        result_test.append(round(f1_score(y_test, y_pre_test), 2))
        result_test.append(round(accuracy_score(y_test, y_pre_test), 2))
        result_test.append(round(roc_auc_score(y_test, y_pre_test), 2))
        result_train.append(round(recall_score(y_train, y_pre_train), 2))
        result_train.append(round(precision_score(y_train, y_pre_train), 2))
        result_train.append(round(f1_score(y_train, y_pre_train), 2))
        result_train.append(round(accuracy_score(y_train, y_pre_train), 2))
        result_train.append(round(roc_auc_score(y_train, y_pre_train), 2))
        # Note: hard 0/1 predictions give a ROC curve with a single interior point;
        # continuous scores (predict_proba or decision_function) trace the full curve.
        fpr_train, tpr_train, thresholds_train = roc_curve(y_train, y_pre_train)
        fpr_test, tpr_test, thresholds_test = roc_curve(y_test, y_pre_test)
        plot_roc_curve(fpr_train, tpr_train, fpr_test, tpr_test, name)
        results_test.loc[name] = result_test
        results_train.loc[name] = result_train
    return results_test, results_train

# models is assumed to be a dict mapping each model's name to an unfitted estimator.
results_test, results_train = metrics(models, X_train_scaled, X_test_scaled, y_train, y_test)
The results are as follows:
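The metrics function in section II feeds hard 0/1 predictions into roc_curve and roc_auc_score. A ROC curve is normally built by sweeping a threshold over continuous scores, so it may help to see what difference that makes. The sketch below is plain Python, not the scikit-learn implementation; the function names roc_points and auc_trapezoid, the toy labels, and the toy scores are all assumptions made for illustration, and labels are assumed to be 0/1.

```python
def roc_points(y_true, scores):
    """Return (fpr, tpr) lists from sweeping a threshold over the scores.

    Assumes binary labels encoded as 0 and 1, with both classes present.
    """
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    # Rank samples from the highest score to the lowest.
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last sample in a group of tied scores.
        if i == len(ranked) - 1 or ranked[i + 1][0] != score:
            fpr.append(fp / n_neg)
            tpr.append(tp / n_pos)
    return fpr, tpr


def auc_trapezoid(fpr, tpr):
    """Area under the piecewise-linear ROC curve (trapezoidal rule)."""
    return sum((fpr[i] - fpr[i - 1]) * (tpr[i] + tpr[i - 1]) / 2
               for i in range(1, len(fpr)))


# Toy data (hypothetical): true labels and continuous model scores.
y_true = [0, 0, 1, 1, 1, 0]
scores = [0.1, 0.6, 0.55, 0.8, 0.9, 0.3]
# Hard labels obtained by thresholding the scores at 0.5.
hard = [1 if s >= 0.5 else 0 for s in scores]

fpr_s, tpr_s = roc_points(y_true, scores)
fpr_h, tpr_h = roc_points(y_true, hard)
print(len(fpr_s), round(auc_trapezoid(fpr_s, tpr_s), 3))
print(len(fpr_h), round(auc_trapezoid(fpr_h, tpr_h), 3))
```

On this toy data the score-based curve has seven points and an area of about 0.89, while the hard-label curve collapses to three points with an area of about 0.83. With scikit-learn estimators, continuous scores typically come from predict_proba or decision_function, depending on what the model provides.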