
XGBoost feature importance metrics: weight, gain, cover

Updated: 2023-05-20 08:42:52

Official explanation

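In Python, xgboost exposes feature importance through get_score (get_fscore is the older variant and reports only weight). Here is the official description of the method:

get_score(fmap='', importance_type='weight')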

Get feature importance of each feature. Importance type can be defined as:

'weight': the number of times a feature is used to split the data across all trees.

'gain': the average gain across all splits the feature is used in.

'cover': the average coverage across all splits the feature is used in.

'total_gain': the total gain across all splits the feature is used in.

'total_cover': the total coverage across all splits the feature is used in.

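The definitions alone are not very intuitive, so below we train a simple model, print these importance metrics, and interpret them against the definitions.

Hands-on code

First build a data set of 10 samples, each with two features and a label of 0 or 1 (a binary classification problem):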

import numpy as np

sample_num = 10
feature_num = 2

# Reseed before each draw so data and label are both reproducible
np.random.seed(0)
data = np.random.randn(sample_num, feature_num)
np.random.seed(0)
label = np.random.randint(0, 2, sample_num)

Print data and label:

# data:
array([[ 1.76405235,  0.40015721],
       [ 0.97873798,  2.2408932 ],
       [ 1.86755799, -0.97727788],
       [ 0.95008842, -0.15135721],
       [-0.10321885,  0.4105985 ],
       [ 0.14404357,  1.45427351],
       [ 0.76103773,  0.12167502],
       [ 0.44386323,  0.33367433],
       [ 1.49407907, -0.20515826],
       [ 0.3130677 , -0.85409574]])

# label:
array([0, 1, 1, 0, 1, 1, 1, 1, 1, 1])

Train the model. To keep the calculations below simple, the tree depth is set to 3 ('max_depth': 3) and only one tree is built (num_boost_round=1):

import xgboost as xgb

train_data = xgb.DMatrix(data, label=label)
params = {'max_depth': 3}
bst = xgb.train(params, train_data, num_boost_round=1)

Print the importance metrics:

for importance_type in ('weight', 'gain', 'cover', 'total_gain', 'total_cover'):
    print('%s: ' % importance_type, bst.get_score(importance_type=importance_type))

Result:

weight: {'f0': 1, 'f1': 2}
gain: {'f0': 0.265151441, 'f1': 0.375000015}
cover: {'f0': 10.0, 'f1': 4.0}
total_gain: {'f0': 0.265151441, 'f1': 0.75000003}
total_cover: {'f0': 10.0, 'f1': 8.0}

Plot the single tree:

xgb.to_graphviz(bst, num_trees=0)

[Figure: plot of the single trained tree]

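The metrics can now be explained with reference to this tree:

1. weight: {'f0': 1, 'f1': 2}

The number of times a feature is used to split a node, counted over all trees. In this example the first node splits on f0 and the second and third nodes split on f1, so weight_f0 = 1 and weight_f1 = 2.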

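2. total_cover: {'f0': 10.0, 'f1': 8.0}

At the first node, f0 splits all 10 samples, and f0 is not used again in any later node, so the total_cover of f0 is 10.0. Five samples have f0 >= 0.855563045 and fall into the right subtree.

At the second node, f1 splits the 5 samples that fell into that right subtree. Three of them have f1 >= -0.178257734 and fall into the next right subtree.

At the third node, f1 splits those 3 samples.

To summarize: f0 splits 10 samples at the first node, so total_cover_f0 = 10. f1 splits 5 and 3 samples at the second and third nodes, so total_cover_f1 = 5 + 3 = 8. total_cover is therefore the total number of samples a feature handles (covers) across every node it splits, over all trees. Strictly speaking, xgboost computes cover as the sum of the second-order gradients (hessians) of the samples reaching a node; with the default squared-error objective used here each hessian is 1, so cover equals the sample count.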

3. cover: {'f0': 10.0, 'f1': 4.0}

cover = total_cover / weight. In this example, cover_f0 = 10 / 1 = 10 and cover_f1 = 8 / 2 = 4.
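The counting above can be checked by hand without xgboost. The following is a minimal sketch, assuming the two thresholds quoted in the text and the data printed earlier; the variable names (node1, node2, node3, splits) are illustrative and not part of any library.

```python
# Toy data set copied from the output printed earlier (columns: f0, f1).
data = [
    (1.76405235, 0.40015721), (0.97873798, 2.2408932),
    (1.86755799, -0.97727788), (0.95008842, -0.15135721),
    (-0.10321885, 0.4105985), (0.14404357, 1.45427351),
    (0.76103773, 0.12167502), (0.44386323, 0.33367433),
    (1.49407907, -0.20515826), (0.3130677, -0.85409574),
]

# Samples reaching each split node, using the thresholds quoted in the text.
node1 = data                                               # root: every sample
node2 = [row for row in node1 if row[0] >= 0.855563045]    # right child of the f0 split
node3 = [row for row in node2 if row[1] >= -0.178257734]   # right child of the first f1 split

# (feature used at the node, number of samples reaching the node)
splits = [("f0", len(node1)), ("f1", len(node2)), ("f1", len(node3))]

weight, total_cover = {}, {}
for feature, n_samples in splits:
    weight[feature] = weight.get(feature, 0) + 1
    total_cover[feature] = total_cover.get(feature, 0.0) + float(n_samples)

# Average cover is the total divided by the number of splits.
cover = {f: total_cover[f] / weight[f] for f in weight}

print("weight:", weight)
print("total_cover:", total_cover)
print("cover:", cover)
```

Counting samples is only valid here because every sample contributes a hessian of 1; with an objective such as binary:logistic the per-sample contribution would differ.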

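4. total_gain: {'f0': 0.265151441, 'f1': 0.75000003}

The total gain a feature contributes across every node it splits, over all trees. By analogy with entropy or Gini impurity: if the impurity before and after a split is i0 and i1, the gain is (i0 - i1). XGBoost itself measures the gain as the reduction in its regularized training loss produced by the split.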
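The total_gain values can also be reproduced by hand. The sketch below is an illustration under several assumptions rather than xgboost's actual code: a squared-error objective (gradient = prediction - label, hessian = 1), an initial prediction of 0.5, L2 regularization lambda = 1, and a node score of G^2 / (H + lambda) with no factor of 1/2, a convention chosen because it matches the printed numbers. The third threshold is not given in the text, so THIRD_THRESHOLD = 1.0 is a hypothetical value that separates the one positive sample left at that node. Newer xgboost releases may choose the initial prediction differently, in which case the printed gains would differ.

```python
# Toy data and labels copied from the output printed earlier (columns: f0, f1).
data = [
    (1.76405235, 0.40015721), (0.97873798, 2.2408932),
    (1.86755799, -0.97727788), (0.95008842, -0.15135721),
    (-0.10321885, 0.4105985), (0.14404357, 1.45427351),
    (0.76103773, 0.12167502), (0.44386323, 0.33367433),
    (1.49407907, -0.20515826), (0.3130677, -0.85409574),
]
label = [0, 1, 1, 0, 1, 1, 1, 1, 1, 1]

BASE_SCORE = 0.5        # assumed initial prediction
REG_LAMBDA = 1.0        # assumed L2 regularization strength
THIRD_THRESHOLD = 1.0   # hypothetical: the text does not give this threshold


def score(rows):
    """Node score G^2 / (H + lambda) for the sample indices in `rows`."""
    g = sum(BASE_SCORE - label[i] for i in rows)  # sum of gradients
    h = float(len(rows))                          # sum of hessians (1 per sample)
    return g * g / (h + REG_LAMBDA)


def split_gain(rows, feature, threshold):
    """Return (gain, left_rows, right_rows) for splitting `rows` on a feature."""
    left = [i for i in rows if data[i][feature] < threshold]
    right = [i for i in rows if data[i][feature] >= threshold]
    return score(left) + score(right) - score(rows), left, right


root = list(range(len(data)))
gain1, _, right1 = split_gain(root, 0, 0.855563045)     # node 1 splits on f0
gain2, _, right2 = split_gain(right1, 1, -0.178257734)  # node 2 splits on f1
gain3, _, _ = split_gain(right2, 1, THIRD_THRESHOLD)    # node 3 splits on f1

total_gain = {"f0": gain1, "f1": gain2 + gain3}
print(total_gain)
```

Under these assumptions the result agrees with the total_gain values reported above up to floating-point precision.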

5. gain: {'f0': 0.265151441, 'f1': 0.375000015}

gain = total_gain / weight. In this example, gain_f0 = 0.265151441 / 1 = 0.265151441 and gain_f1 = 0.75000003 / 2 = 0.375000015.

In everyday use, total_gain is the metric most often used to rank features by importance.
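As a small illustration of such a ranking, the sketch below sorts the total_gain dictionary printed earlier and reports each feature's share of the overall gain. The dictionary is copied from the result above; the names ranked and share are illustrative only.

```python
# total_gain scores copied from the result printed earlier.
total_gain = {"f0": 0.265151441, "f1": 0.75000003}

# Sort features from most to least important.
ranked = sorted(total_gain.items(), key=lambda item: item[1], reverse=True)

# Express each score as a fraction of the summed gain.
overall = sum(total_gain.values())
share = {name: value / overall for name, value in ranked}

for name, value in ranked:
    print(f"{name}: total_gain={value:.6f}, share={share[name]:.1%}")
```

The same pattern works on the dictionary returned by get_score for any of the five importance types.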

By The Way

There is another way to build an xgboost classifier. It mirrors the classifiers in sklearn and trains the model through the fit/predict interface:

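from xgboost import XGBClassifier

# Arguments as printed by an older xgboost release; newer releases replace
# silent, nthread and seed with verbosity, n_jobs and random_state.
cls = XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
                    colsample_bytree=1, gamma=0, learning_rate=0.07, max_delta_step=0,
                    max_depth=3, min_child_weight=1, missing=None, n_estimators=300,
                    n_jobs=1, nthread=None, objective='binary:logistic', random_state=0,
                    reg_alpha=0, reg_lambda=1, scale_pos_weight=1, seed=None,
                    silent=True, subsample=1)

# Train the model (required before the importance scores can be read)
# cls.fit(data, label)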

Retrieve the feature importance metrics as follows:

for importance_type in ('weight', 'gain', 'cover', 'total_gain', 'total_cover'):
    print('%s: ' % importance_type, cls.get_booster().get_score(importance_type=importance_type))
