Python-matplotlib-决策曲线分析（DecisionCurveAnalysis）

更新时间:2023-08-08 01:52:13 阅读：评论：0

Python-matplotlib-决策曲线分析（DecisionCurveAnalysis ）

⽂章⽬录

⼀、决策曲线分析概念

预测模型（predictive models）被⼴泛地应⽤于诊断（diagnosis）或预后预测（prognosis）。通常，这些模型的价值是通过统计学指标如敏感性、特异性、ROC曲线下⾯积、校准度来评估的，⽽这些指标⽆法考虑特定模型的临床实⽤性（clinical utility）。决策曲线分析（decision curve analysis, DCA）是衡量临床实⽤性的⼀种⼴泛使⽤的⽅法。

1. 阈值概率

⼀个预测模型的输出通常为介于0到1之间的⼀个值（pi），根据事前确定的阈值概率（cutoff value, probability threshold, pt），当pi >pt时，判断为阳性；当pi < pt时，判断为阴性。因此，患者被分成了预测阳性⽽施加⼲预和预测阴性⽽不施加⼲预的两组。在预测阳性组中，存在着真阳性病⼈（TP）和假阳性病⼈（FP）。显然，治疗真阳性病⼈会带来受益（benefits），⽽治疗假阳性病⼈会造成伤害（harms）。选择不同的阈值概率，会改变TP和FP的⽐值，从⽽受益和伤害的改变。

2. 净获益

为了同时考虑受益和伤害，决策曲线分析中，将模型的临床效⽤量化为净获益（net benefit）。

对于⼀个总样本量为 n ，阈值为pt的诊断试验，可以画出四格表：

⾦标准（+）

⾦标准（-）模型（+）

TP FP 模型（-）FN TN

阳性组的净获益为：

阴性组的净获益为：

决策曲线定义了这样⼀种关系：

因此，可以计算得到treat all策略（即⽆论预测模型结果如何，所以病⼈都进⾏⼲预）的净获益为：

对于treat none策略，所有病⼈⽆论模型结果如果，都不进⾏⼲预，其净获益恒为0。

net benefit treated =−n TP ∗n FP安卓机顶盒

()

1−p t p t net benefit untreated =−n TN ∗n FN

()

p t 1−p t =()1−p t p t net benefit treated −net benefit treat all net benefit untreated

岜夯鸡net benefit treat all =−n TP +FN ∗n

美智子

TN +FP

()1−p t p t

所谓决策曲线，即是以不同的probability threshold为横坐标，其所对应的net benefit为纵坐标，画出的曲线。

理论成⽴，实践开始！

⼆、matplotlib实现

绘制模型的决策曲线，我们只需要模型输出的每个样本的预测概率（y_pred_score）和每个样本真实的分类（y_label）。1. 计算模型带来的净获益

模型带来的获益即是模型预测出阳性的部分，因为只有预测阳性的部分会施加和原本不同的⼲预，因此net benefit treated即为net benefit of model：

def calculate_net_benefit_model(thresh_group, y_pred_score, y_label):

net_benefit_model = np.array([])

for thresh in thresh_group:女人胡子

y_pred_label = y_pred_score > thresh

tn, fp, fn, tp = confusion_matrix(y_label, y_pred_label).ravel()

n =len(y_label)

net_benefit =(tp / n)-(fp / n)*(thresh /(1- thresh))

net_benefit_model = np.append(net_benefit_model, net_benefit)

return net_benefit_model

2. 计算treat all策略带来的净获益

def calculate_net_benefit_all(thresh_group, y_label):

net_benefit_all = np.array([])

tn, fp, fn, tp = confusion_matrix(y_label, y_label).ravel()

total = tp + tn

for thresh in thresh_group:

net_benefit =(tp / total)-(tn / total)*(thresh /(1- thresh))

net_benefit_all = np.append(net_benefit_all, net_benefit)

return net_benefit_all

3. 绘制决策曲线

def plot_DCA(ax, thresh_group, net_benefit_model, net_benefit_all):

农业项目招商#Plot

ax.plot(thresh_group, net_benefit_model, color ='crimson', label ='Model')

ax.plot(thresh_group, net_benefit_all, color ='black',label ='Treat all')

ax.plot((0,1),(0,0), color ='black', linestyle =':', label ='Treat none')

#Fill，显⽰出模型较于treat all和treat none好的部分

y2 = np.maximum(net_benefit_all,0)

y1 = np.maximum(net_benefit_model, y2)

ax.fill_between(thresh_group, y1, y2, color ='crimson', alpha =0.2)

#Figure Configuration，美化⼀下细节

ax.t_xlim(0,1)

ax.t_ylim(net_benefit_model.min()-0.15, net_benefit_model.max()+0.15)#adjustify the y axis limitation

ax.t_xlabel(

xlabel ='Threshold Probability',

fontdict={'family':'Times New Roman','fontsize':15}

)

ax.t_ylabel(

ylabel ='Net Benefit',

fontdict={'family':'Times New Roman','fontsize':15}

)

ax.spines['right'].t_color((0.8,0.8,0.8))

ax.spines['top'].t_color((0.8,0.8,0.8))

ax.legend(loc ='upper right')

return ax

三、完整代码

import numpy as np

import matplotlib.pyplot as plt

ics import confusion_matrix

def calculate_net_benefit_model(thresh_group, y_pred_score, y_label):

net_benefit_model = np.array([])

for thresh in thresh_group:

y_pred_label = y_pred_score > thresh

tn, fp, fn, tp = confusion_matrix(y_label, y_pred_label).ravel()

n =len(y_label)

net_benefit =(tp / n)-(fp / n)*(thresh /(1- thresh))

net_benefit_model = np.append(net_benefit_model, net_benefit)

return net_benefit_model

def calculate_net_benefit_all(thresh_group, y_label):

net_benefit_all = np.array([])

tn, fp, fn, tp = confusion_matrix(y_label, y_label).ravel()

total = tp + tn

for thresh in thresh_group:

net_benefit =(tp / total)-(tn / total)*(thresh /(1- thresh))

net_benefit_all = np.append(net_benefit_all, net_benefit)

return net_benefit_all

def plot_DCA(ax, thresh_group, net_benefit_model, net_benefit_all):

#Plot

ax.plot(thresh_group, net_benefit_model, color ='crimson', label ='Model')

ax.plot(thresh_group, net_benefit_all, color ='black',label ='Treat all')

ax.plot((0,1),(0,0), color ='black', linestyle =':', label ='Treat none')

#Fill，显⽰出模型较于treat all和treat none好的部分

y2 = np.maximum(net_benefit_all,0)

y1 = np.maximum(net_benefit_model, y2)

ax.fill_between(thresh_group, y1, y2, color ='crimson', alpha =0.2)

#Figure Configuration，美化⼀下细节

ax.t_xlim(0,1)

ax.t_ylim(net_benefit_model.min()-0.15, net_benefit_model.max()+0.15)#adjustify the y axis limitation

ax.t_xlabel(

xlabel ='Threshold Probability',

fontdict={'family':'Times New Roman','fontsize':15}

)

ax.t_ylabel(

ylabel ='Net Benefit',

fontdict={'family':'Times New Roman','fontsize':15}

)

字幕英文

ax.spines['right'].t_color((0.8,0.8,0.8))

ax.spines['top'].t_color((0.8,0.8,0.8))

ax.legend(loc ='upper right')

return ax

if __name__ =='__main__':

#构造⼀个分类效果不是很好的模型

y_pred_score = np.arange(0,1,0.001)

y_label = np.array([1]*25+[0]*25+[0]*450+[1]*25+[0]*25+[1]*25+[0]*25+[1]*25+[0]*25+[1]*25+[0]*25+[1]*25+[0]*25+[1]*25+[0]*25+[1]*25+ [0]*50+[1]*125)

thresh_group = np.arange(0,1,0.01)

net_benefit_model = decision_curve_analysis(thresh_group, y_pred_score, y_label)

net_benefit_all = calculate_net_benefit_all(thresh_group, y_label)

fig, ax = plt.subplots()

ax = plot_DCA(ax, thresh_group, net_benefit_model, net_benefit_all)

# fig.savefig('fig1.png', dpi = 300)

plt.show()

四、拓展

由于存在抽样误差，单次建模的结果可能存在偏倚。通常情况下，可以采⽤bootstrapping或者k折交叉验证的⽅法来对净获益进⾏校正。同时，还可以⽤这种⽅法获得净获益的置信区间。

1. bootstrapping法校正净获益

1. ⽤原始数据进⾏拟合建模，获得⼀组净获益（未校正的净获益）

2. 从原始数据中采取有放回的随机抽样，得到⼀组⼦数据，⽤这批数据拟合建模

一周计划3. 计算步骤2的模型在步骤2数据集中不同阈值概率的净获益

4. 计算步骤2的模型在原始数据集中不同阈值概率的净获益

5. 计算步骤3和步骤4中两组净获益的差值

6. 重复步骤2-5 n次（通常n = 1000），并可计算n次净效益的差值的平均值

7. ⽤步骤1中未校正的净效益减去步骤6中获得的平均差，即为修正净获益

2. k折交叉验证法校正净获益

1. ⽤原始数据进⾏拟合建模，获得⼀组净获益（未校正的净获益）

2. 将原始样本随机分成k个（例如，k = 5）⼤⼩相等的⼦集

3. 取出⼀个⼦集作为验证集，其他⼦集（k-1个）作为训练集

4. 使⽤训练集拟合建模，再⽤模型来预测验证集，得到预测概率pi

水煮耗儿鱼

5. 重复步骤2-4 k次，所有样本都获得了pi，再计算整个原始数据的净获益

6. 重复分组n次，重复步骤2-5，可计算n次净获益的平均值，即为修正净获益

3. 计算净获益的置信区间

原理：根据bootstrapping法或k折交叉验证法得到的净获益结果，可以根据中⼼极限定理通过正态近似的⽅法求得置信区间

本文发布于:2023-08-08 01:52:13，感谢您对本站的认可！

本文链接：https://www.wtabcd.cn/fanwen/fan/89/1106665.html

上一篇：读历史书的益处英语作文

下一篇：外贸开发邮件模板

标签：模型获益预测曲线阳性概率决策阈值

留言与评论（共有 0 条评论）