Lasso Feature Selection in Python: [Machine Learning] A Concise Guide to Feature Selection

Updated: 2023-06-17 09:17:59

Introduction

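Data engineering projects tend to follow the RIRO (rubbish in, rubbish out) principle strictly, which is why we often say that data preprocessing makes up 80% of the work of a data engineer or data scientist: it guarantees the quality of the raw material. Feature engineering in turn accounts for at least half of data preprocessing. In practical data engineering work, whether the goal is to interpret the data or to prevent overfitting, feature selection is a very common task. Its central concern is finding, among hundreds or thousands of features, the ones that most influence the outcome, and then using them to build reliable machine learning models. After going through this process many times, and drawing on textbooks, online resources such as Kaggle, and discussions with other data engineers, I decided to write a concise summary of the common feature selection methods and their Python implementations.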

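Broadly speaking, feature selection can take two routes:

Filter methods: do not depend on any particular learning algorithm; simple and fast; typically used during preprocessing

Wrapper methods: wrap feature selection inside a learning algorithm; typically used during the learning stage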

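In scikit-learn, feature selection has its own module, sklearn.feature_selection, which contains feature selection algorithms for different levels of the preprocessing and learning stages.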

A. Filter methods

(1) Variance threshold

One of the simplest feature selection approaches: remove every feature whose variance is below a set value.

Implementation in sklearn:

from sklearn.feature_selection import VarianceThreshold

VarianceThreshold is a simple baseline approach to feature selection. It removes all features whose variance doesn't meet some threshold. By default, it removes all zero-variance features, i.e. features that have the same value in all samples.
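As a small illustration of the variance threshold idea, the sketch below applies VarianceThreshold to a toy matrix. The data and the threshold value 0.01 are made up for demonstration and are not from the original post.

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold

# Toy data (hypothetical): column 0 is constant, column 1 barely varies,
# column 2 varies a lot.
X = np.array([
    [1.0, 0.0, 10.0],
    [1.0, 0.0, 20.0],
    [1.0, 0.1, 30.0],
    [1.0, 0.0, 40.0],
])

# The default threshold of 0.0 removes only zero-variance (constant) columns.
selector_default = VarianceThreshold()
X_default = selector_default.fit_transform(X)

# An arbitrary, stricter threshold also removes the nearly constant column.
selector_strict = VarianceThreshold(threshold=0.01)
X_strict = selector_strict.fit_transform(X)

# get_support() returns a boolean mask of the columns that were kept.
print(selector_default.get_support())
print(selector_strict.get_support())
print(selector_strict.variances_)
```

Because variance depends on the scale of each feature, a single threshold is only meaningful when the features are on comparable scales.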

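(2) Univariate feature selection

This approach selects features based on univariate hypothesis tests. The chi-square test, for example, is a common way to check whether two variables are related (there is a good blog post for reviewing it), so it is natural to use the chi-square statistic for dimensionality reduction and keep the variables with the strongest association. Univariate feature selection works by selecting the best features based on univariate statistical tests. It can be seen as a preprocessing step to an estimator.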

from sklearn.feature_selection import SelectKBest, chi2
X_new = SelectKBest(chi2, k=2).fit_transform(X, y)
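To make the one-liner concrete, here is a minimal end-to-end sketch. Using the iris dataset is an assumption for illustration; the original post does not say which data X and y come from. Note that chi2 expects non-negative feature values, which iris satisfies.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2

# Iris: 150 samples, 4 non-negative numeric features, 3 classes.
X, y = load_iris(return_X_y=True)

# Keep the k=2 features with the highest chi-square statistic against y.
selector = SelectKBest(chi2, k=2)
X_new = selector.fit_transform(X, y)

print(X.shape, X_new.shape)
print(selector.scores_)                    # chi-square score per feature
print(selector.get_support(indices=True))  # indices of the kept features
```

Other scoring functions in sklearn.feature_selection, such as f_classif or mutual_info_classif, can be passed in place of chi2 when the features are not non-negative.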

B. Wrapper methods

Wrapper methods typically rely on algorithms that can compute a weight for each feature during training and use those weights to select features. sklearn provides a dedicated class, SelectFromModel, to help with this process. SelectFromModel is a meta-transformer that can be used along with any estimator that has a coef_ or feature_importances_ attribute after fitting. The features are considered unimportant and removed if the corresponding coef_ or feature_importances_ values are below the provided threshold parameter. Apart from specifying the threshold numerically, there are built-in heuristics for finding a threshold using a string argument. Available heuristics are "mean", "median" and float multiples of these like "0.1*mean".
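The threshold heuristics can be compared with a short sketch. The synthetic data from make_regression and the choice of LinearRegression as the underlying estimator are assumptions made only for this illustration; any estimator exposing coef_ or feature_importances_ would work the same way.

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LinearRegression

# Synthetic data (hypothetical): 10 features on the same scale, 3 informative.
X, y = make_regression(
    n_samples=200, n_features=10, n_informative=3, noise=5.0, random_state=0
)

# Count how many features survive under each string heuristic.
# For a linear model the importance of a feature is |coef_|.
counts = {}
for threshold in ("mean", "median", "0.1*mean"):
    selector = SelectFromModel(LinearRegression(), threshold=threshold)
    selector.fit(X, y)
    counts[threshold] = int(selector.get_support().sum())

print(counts)
```

Raw coefficients are only comparable as importances when the features share a scale, so standardizing the inputs first is usually a sensible step before thresholding on coef_.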

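(1) Feature selection with Lasso

Before describing how to use Lasso for feature selection, here is a brief introduction to what Lasso is.

For a linear regression problem (with feature matrix $X$, target vector $y$, coefficient vector $\beta$ and noise $\varepsilon$)

$$y = X\beta + \varepsilon$$

the basic task is to estimate the parameters $\beta$ such that

$$\lVert y - X\beta \rVert_2^2$$

is minimized. This is the classic Ordinary Least Squares (OLS) problem.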

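In practice, however, regression with OLS alone easily overfits: noise receives too much attention, and small differences in the training data can produce very different models. The main cause is collinearity in the data, which tends to make the matrix ill-conditioned and sensitive to perturbation, so the closed-form solution for the regression coefficients becomes unstable; for a more detailed treatment, see the linked reference.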


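To counter overfitting, we often use a cost function with a regularization term. The version with an L1 penalty is the Lasso method:

$$\min_{\beta} \; \lVert y - X\beta \rVert_2^2 + \lambda \lVert \beta \rVert_1$$

where $\beta$ is the coefficient vector and $\lambda \ge 0$ sets the strength of the penalty.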

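The parameters obtained with Lasso are often sparse, meaning that many features end up with a coefficient of exactly zero. This is what makes feature selection possible: we can train a Lasso model and then remove the features whose coefficients are zero.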
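The idea of training a Lasso model and dropping zero-coefficient features can be sketched as follows. The synthetic dataset and the value alpha=1.0 are arbitrary assumptions for illustration (alpha is scikit-learn's name for the penalty strength).

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# Synthetic data (hypothetical): 20 features, only 5 of which drive y.
X, y = make_regression(
    n_samples=200, n_features=20, n_informative=5, noise=10.0, random_state=0
)

# Fit a Lasso model; alpha=1.0 is an arbitrary choice for this sketch.
lasso = Lasso(alpha=1.0)
lasso.fit(X, y)

# Features with a coefficient of exactly zero are dropped.
keep = lasso.coef_ != 0
X_selected = X[:, keep]

print(int(keep.sum()), "of", X.shape[1], "features kept")
print(np.flatnonzero(keep))  # indices of the kept features
```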

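In practice, the larger Lasso's lambda parameter (scikit-learn calls it alpha), the sparser the solution and the fewer features are selected. So how large should lambda be? A fairly safe approach is to compute the model's RMSE by cross-validation over a range of lambda values and pick the lambda at which the RMSE is lowest (there is a good example on Kaggle). Linear models penalized with the L1 norm have sparse solutions: many of their estimated coefficients are zero. When the goal is to reduce the dimensionality of the data to use with another classifier, they can be used along with feature_selection.SelectFromModel to select the non-zero coefficients. With Lasso, the higher the alpha parameter, the fewer features selected.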

For the implementation in scikit-learn, see the scikit-learn documentation.
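One possible way to combine cross-validated selection of the penalty with SelectFromModel is sketched below. This is not the Kaggle example mentioned in the text; the synthetic data and the use of LassoCV with 5 folds are assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LassoCV

# Synthetic data (hypothetical): 20 features, only 5 of which drive y.
X, y = make_regression(
    n_samples=200, n_features=20, n_informative=5, noise=10.0, random_state=0
)

# LassoCV fits a grid of alphas with 5-fold cross-validation and keeps the
# alpha with the lowest mean cross-validated squared error.
selector = SelectFromModel(LassoCV(cv=5))
X_selected = selector.fit_transform(X, y)
fitted = selector.estimator_

# Reproduce the choice by hand: RMSE per alpha, averaged over the folds.
rmse_per_alpha = np.sqrt(fitted.mse_path_.mean(axis=1))
best_idx = int(np.argmin(rmse_per_alpha))

print("chosen alpha:", fitted.alpha_)
print("alpha at the RMSE minimum:", fitted.alphas_[best_idx])
print("features kept:", X_selected.shape[1], "of", X.shape[1])
```

For Lasso-type estimators, SelectFromModel defaults to a very small threshold, so in effect it keeps the features whose coefficients are non-zero.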

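(2) Tree-based feature selection

In a decision tree, features used at shallow nodes carry more information (intuitively, such a feature separates more of the samples). Building on this property, many tree-based algorithms, such as random forests, expose a feature_importances_ attribute directly on the fitted model. The main idea is to train a series of different decision trees, each using a random subset of the features (with samples drawn by bootstrap or a similar method), and then derive a weight for each feature from statistics such as how often it is used, at what depth, how many samples it splits, and the model's accuracy. By setting a threshold, we can use such tree-based algorithms for feature selection. Tree-based estimators (see the sklearn.tree module and forest of trees in the sklearn.ensemble module) can be used to compute feature importances, which in turn can be used to discard irrelevant features (when coupled with the sklearn.feature_selection.SelectFromModel meta-transformer).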

For the implementation in scikit-learn, see the scikit-learn documentation.
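A minimal sketch of tree-based selection, assuming a random forest classifier on synthetic data. The dataset, the number of trees and the "mean" threshold are illustrative choices, not values from the original post.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

# Synthetic data (hypothetical): 20 features, only 4 of which are informative.
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=4, n_redundant=0, random_state=0
)

# Fit a forest and keep the features whose importance is at least the mean.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
selector = SelectFromModel(forest, threshold="mean")
X_selected = selector.fit_transform(X, y)

# The fitted forest exposes one importance value per input feature.
importances = selector.estimator_.feature_importances_

print(importances.round(3))
print("features kept:", X_selected.shape[1], "of", X.shape[1])
```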


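Summary

This short article gave a brief introduction to some commonly used feature processing methods. It should be pointed out that besides feature selection, feature transformation, including dimensionality reduction methods such as PCA, can also reduce the number of features and curb overfitting.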



Article link: https://www.wtabcd.cn/fanwen/fan/89/1042287.html
