A Summary of Using the Machine Learning Tool WEKA: Algorithm Selection, Parameter Optimization, and Attribute Selection
I. Attribute Selection
1. Theoretical background:
See the following two papers:
"A Survey of Feature Selection Algorithms in Data Mining and a Performance Comparison Based on WEKA" (Chen Lianglong)
"Research on Reduction Techniques and Attribute Selection in Data Mining" (Liu Hui)
2. Attribute selection in WEKA
2.1 Evaluation strategies (attribute evaluator)
Broadly, these fall into filter and wrapper methods: the former evaluate individual attributes, while the latter evaluate feature subsets.
Wrapper methods: CfsSubsetEval, WrapperSubsetEval
Filter methods: CorrelationAttributeEval, GainRatioAttributeEval, InfoGainAttributeEval, OneRAttributeEval, PrincipalComponents, ReliefFAttributeEval, SymmetricalUncertAttributeEval
2.1.1 Wrapper methods:
(1)CfsSubsetEval
Evaluates an attribute subset based on the predictive ability of each individual feature and the redundancy among them; subsets whose features are individually strong predictors of the class but have low correlation with one another score well. Evaluates the worth of a subset of attributes by considering the individual predictive ability of each feature along with the degree of redundancy between them. Subsets of features that are highly correlated with the class while having low intercorrelation are preferred. For more information see:
M. A. Hall (1998). Correlation-based Feature Subset Selection for Machine Learning. Hamilton, New Zealand.
(2)WrapperSubsetEval
In the wrapper approach, the downstream learning algorithm is embedded in the feature selection process: a feature subset is judged by the predictive performance that algorithm achieves on it, and little attention is paid to the predictive power of each individual feature. The features in the optimal subset are therefore not required to be individually optimal. (A code sketch covering both subset evaluators follows at the end of this subsection.)
Evaluates attribute sets by using a learning scheme. Cross validation is used to estimate the accuracy of the learning scheme for a set of attributes.
For more information see:
Ron Kohavi, George H. John (1997). Wrappers for feature subset selection. Artificial Intelligence. 97(1-2):273-324.
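As a concrete illustration (not part of the original article), here is a minimal sketch of driving both subset evaluators from the WEKA Java API; the file name data.arff, the class being the last attribute, and the choice of J48 as the wrapped learner are assumptions.

import weka.attributeSelection.AttributeSelection;
import weka.attributeSelection.BestFirst;
import weka.attributeSelection.CfsSubsetEval;
import weka.attributeSelection.WrapperSubsetEval;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class SubsetSelectionSketch {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("data.arff");    // placeholder path
        data.setClassIndex(data.numAttributes() - 1);     // assume the class is the last attribute

        // CfsSubsetEval + BestFirst: correlation-based subset selection.
        AttributeSelection cfsSel = new AttributeSelection();
        cfsSel.setEvaluator(new CfsSubsetEval());
        cfsSel.setSearch(new BestFirst());
        cfsSel.SelectAttributes(data);
        System.out.println("CFS subset: " + java.util.Arrays.toString(cfsSel.selectedAttributes()));

        // WrapperSubsetEval: judge each subset by the cross-validated accuracy of J48.
        WrapperSubsetEval wrapperEval = new WrapperSubsetEval();
        wrapperEval.setClassifier(new J48());
        wrapperEval.setFolds(5);
        AttributeSelection wrapSel = new AttributeSelection();
        wrapSel.setEvaluator(wrapperEval);
        wrapSel.setSearch(new BestFirst());
        wrapSel.SelectAttributes(data);
        System.out.println("Wrapper subset: " + java.util.Arrays.toString(wrapSel.selectedAttributes()));
    }
}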
2.1.2 Filter methods:
If one of these evaluation strategies is chosen, the search method must be Ranker. (A code sketch follows at the end of this list.)
(1)CorrelationAttributeEval
Selects attributes based on the correlation between each individual attribute and the class.
Evaluates the worth of an attribute by measuring the correlation (Pearson's) between it and the class.
Nominal attributes are considered on a value by value basis by treating each value as an indicator. An overall correlation for a nominal attribute is arrived at via a weighted average.
(2)GainRatioAttributeEval
Selects attributes by their gain ratio with respect to the class.
Evaluates the worth of an attribute by measuring the gain ratio with respect to the class.
GainR(Class, Attribute) = (H(Class) - H(Class | Attribute)) / H(Attribute).
(3)InfoGainAttributeEval
Selects attributes by their information gain with respect to the class.
Evaluates the worth of an attribute by measuring the information gain with respect to the class.
InfoGain(Class,Attribute) = H(Class) - H(Class | Attribute).
(4)OneRAttributeEval
Evaluates attributes using the OneR classifier.
Class for building and using a 1R classifier; in other words, uses the minimum-error attribute for prediction, discretizing numeric attributes. For more information, see:
R.C. Holte (1993). Very simple classification rules perform well on most commonly used datasets. Machine Learning. 11:63-91.
(5)PrincipalComponents
Principal component analysis (PCA).
Performs a principal components analysis and transformation of the data. Use in conjunction with a Ranker search. Dimensionality reduction is accomplished by choosing enough eigenvectors to account for some percentage of the variance in the original data---default 0.95 (95%). Attribute noise can be filtered by transforming to the PC space, eliminating some of the worst eigenvectors, and then transforming back to the original space.
(6)ReliefFAttributeEval
Evaluates attributes by their ReliefF score.
Evaluates the worth of an attribute by repeatedly sampling an instance and considering the value of the given attribute for the nearest instance of the same and different class. Can operate on both discrete and continuous class data.
For more information e:
Kenji Kira, Larry A. Rendell: A Practical Approach to Feature Selection. In: Ninth International Workshop on Machine Learning, 249-256, 1992.
Igor Kononenko: Estimating Attributes: Analysis and Extensions of RELIEF. In: European Conference on Machine Learning, 171-182, 1994.
Marko Robnik-Sikonja, Igor Kononenko: An adaptation of Relief for attribute estimation in regression. In: Fourteenth International Conference on Machine Learning, 296-304, 1997.
(7)SymmetricalUncertAttributeEval
Evaluates attributes by their symmetrical uncertainty with respect to the class.
Evaluates the worth of an attribute by measuring the symmetrical uncertainty with respect to the class.
SymmU(Class, Attribute) = 2 * (H(Class) - H(Class | Attribute)) / (H(Class) + H(Attribute)).
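For the single-attribute evaluators above, the same Java API can be used with Ranker as the search method. A minimal sketch (not part of the original article), again assuming a placeholder data.arff and an arbitrary cut-off of the 10 top-ranked attributes:

import weka.attributeSelection.AttributeSelection;
import weka.attributeSelection.InfoGainAttributeEval;
import weka.attributeSelection.Ranker;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class RankerSketch {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("data.arff");   // placeholder path
        data.setClassIndex(data.numAttributes() - 1);

        AttributeSelection sel = new AttributeSelection();
        sel.setEvaluator(new InfoGainAttributeEval());   // any of the filter evaluators above fits here
        Ranker ranker = new Ranker();
        ranker.setNumToSelect(10);                       // arbitrary cut-off; by default all attributes are kept, only ranked
        sel.setSearch(ranker);
        sel.SelectAttributes(data);
        System.out.println(sel.toResultsString());       // full ranking plus the selected indices
    }
}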
2.2 Search strategies (Search Method)
2.2.1 Search methods paired with the wrapper-style evaluators
(1)BestFirst
Best-first search, a greedy search strategy with backtracking.
Searches the space of attribute subsets by greedy hillclimbing augmented with a backtracking facility. Setting the number of consecutive non-improving nodes allowed controls the level of backtracking done. Best first may start with the empty set of attributes and search forward, or start with the full set of attributes and search backward, or start at any point and search in both directions (by considering all possible single attribute additions and deletions at a given point).
(2)ExhaustiveSearch
Exhaustively searches all possible attribute subsets.
Performs an exhaustive search through the space of attribute subsets starting from the empty set of attributes. Reports the best subset found.
(3)GeneticSearch
Search based on the simple genetic algorithm proposed by Goldberg in 1989.
Performs a search using the simple genetic algorithm described in Goldberg (1989).
For more information see:
David E. Goldberg (1989). Genetic algorithms in search, optimization and machine learning. Addison-Wesley.
(4)GreedyStepwise
Stepwise forward or backward search.
Performs a greedy forward or backward search through the space of attribute subsets. May start with no/all attributes or from an arbitrary point in the space. Stops when the addition/deletion of any remaining attributes results in a decrease in evaluation. Can also produce a ranked list of attributes by traversing the space from one side to the other and recording the order that attributes are selected.
(5)RandomSearch
Random search.
Performs a random search in the space of attribute subsets. If no start set is supplied, RandomSearch starts from a random point and reports the best subset found. If a start set is supplied, RandomSearch searches randomly for subsets that are as good as or better than the start point with the same or fewer attributes. Using RandomSearch in conjunction with a start set containing all attributes equates to the LVF algorithm of Liu and Setiono (ICML-96).
For more information see:
H. Liu, R. Setiono: A probabilistic approach to feature selection - A filter solution. In: 13th International Conference on Machine Learning, 319-327, 1996.
(6)RankSearch
Uses an evaluator to compute a merit value for each attribute and produces a ranking.
Uses an attribute/subset evaluator to rank all attributes. If a subset evaluator is specified, then a forward selection search is used to generate a ranked list. From the ranked list of attributes, subsets of increasing size are evaluated, i.e. the best attribute, the best attribute plus the next best attribute, and so on. The best attribute set is reported. RankSearch is linear in the number of attributes if a simple attribute evaluator is used such as GainRatioAttributeEval. For more information see:
Mark Hall, Geoffrey Holmes (2003). Benchmarking attribute selection techniques for discrete class data mining. IEEE Transactions on Knowledge and Data Engineering. 15(6):1437-1447.
2.2.2 Search method paired with the filter-style evaluators
(1)Ranker:
Ranks attributes by their merit values; used together with the filter-style evaluation strategies.
Ranks attributes by their individual evaluations. Use in conjunction with attribute evaluators (ReliefF, GainRatio, Entropy etc).
3. My takeaways
For a given algorithm and parameter setting, combining the WrapperSubsetEval evaluation strategy with the ExhaustiveSearch search strategy guarantees finding the attribute subset that is optimal for that algorithm and configuration. The computation is expensive, however, and grows exponentially with the number of attributes: an exhaustive search over n attributes must examine 2^n subsets.
II. Parameter Optimization
For a given algorithm, there are three ways to optimize its parameters: CVParameterSelection, GridSearch, and MultiSearch.
1. CVParameterSelection
Uses cross-validation to select optimal parameter values.
Advantages:
Any number of parameters can be optimized.
Disadvantages:
① With many parameters, the number of parameter combinations can grow explosively. ② Only a classifier's direct parameters can be optimized, not those of an embedded scheme; for example, parameter C of weka.classifiers.functions.SMO can be optimized, but not parameter C of an SMO embedded inside weka.classifiers.meta.FilteredClassifier.
Example: optimizing the confidence factor C of J48 (a Java sketch of the same steps follows the list).
① Load the dataset.
② Choose weka.classifiers.meta.CVParameterSelection as the classifier.
③ Choose weka.classifiers.trees.J48 as the base classifier of ②.
④ Set the parameter-optimization string to: C 0.1 0.5 5 (optimize parameter C over 5 values from 0.1 to 0.5, i.e. a step of 0.1).
⑤ Run the evaluation; the last line of the output reports the optimized parameter value.
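The same steps can be scripted against the WEKA Java API. A minimal sketch (not part of the original article); data.arff is a placeholder and the class is assumed to be the last attribute:

import java.util.Random;

import weka.classifiers.Evaluation;
import weka.classifiers.meta.CVParameterSelection;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class CVParamSketch {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("data.arff");   // placeholder path
        data.setClassIndex(data.numAttributes() - 1);

        CVParameterSelection ps = new CVParameterSelection();
        ps.setClassifier(new J48());
        ps.addCVParameter("C 0.1 0.5 5");   // same string as step 4: 5 values of -C between 0.1 and 0.5
        ps.buildClassifier(data);
        System.out.println("Best options: " + weka.core.Utils.joinOptions(ps.getBestClassifierOptions()));

        // Optional: 10-fold cross-validation of the tuned setup.
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(ps, data, 10, new Random(1));
        System.out.println(eval.toSummaryString());
    }
}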
2. GridSearch
Selects parameters with a grid search rather than by trying every possible parameter combination.
Advantages:
① In theory, with the same optimization ranges and settings, GridSearch should be faster than CVParameterSelection. ② It is not limited to the classifier's direct parameters; parameters of an embedded scheme can also be optimized. ③ One of the two optimized parameters may belong to a filter, so the property expressions must carry the prefix classifier. or filter.. ④ It supports automatic extension of the search ranges.
Disadvantages:
At most two parameters can be optimized.
Example: optimizing the parameters of SMO with an RBFKernel (a Java sketch follows these steps).
① Load the dataset.
② Choose GridSearch as the classifier.
③ Set GridSearch's classifier to weka.classifiers.functions.SMO and its kernel to weka.classifiers.functions.supportVector.RBFKernel.
④ Configure the X parameter. XProperty: classifier.c, XMin: 1, XMax: 16, XStep: 1, XExpression: I. This means parameter c is searched from 1 to 16 in steps of 1.
⑤ Configure the Y parameter. YProperty: classifier.kernel.gamma, YMin: -5, YMax: 2, YStep: 1, YBase: 10, YExpression: pow(BASE,I). This means parameter kernel.gamma is searched over 10^-5, 10^-4, ..., 10^2.
⑥ Run it; the last line of the output reports the optimized parameter pair.
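A rough Java equivalent of these GUI steps (not part of the original article). It assumes the GridSearch package is installed (in recent WEKA releases it is a separate package) and that the setter methods mirror the GUI properties listed above (setXProperty, setYBase, and so on); data.arff is a placeholder:

import weka.classifiers.functions.SMO;
import weka.classifiers.functions.supportVector.RBFKernel;
import weka.classifiers.meta.GridSearch;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class GridSearchSketch {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("data.arff");   // placeholder path
        data.setClassIndex(data.numAttributes() - 1);

        SMO smo = new SMO();
        smo.setKernel(new RBFKernel());

        GridSearch grid = new GridSearch();
        grid.setClassifier(smo);
        // X axis: SMO's complexity constant c, searched linearly from 1 to 16.
        grid.setXProperty("classifier.c");
        grid.setXMin(1);
        grid.setXMax(16);
        grid.setXStep(1);
        grid.setXExpression("I");
        // Y axis: the RBF kernel's gamma, searched over 10^-5 ... 10^2.
        grid.setYProperty("classifier.kernel.gamma");
        grid.setYMin(-5);
        grid.setYMax(2);
        grid.setYStep(1);
        grid.setYBase(10);
        grid.setYExpression("pow(BASE,I)");

        grid.buildClassifier(data);
        System.out.println(grid);   // the model summary includes the best parameter pair found
    }
}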
3. MultiSearch
Similar to GridSearch, but more general and simpler.
Advantages:
① Not limited to the classifier's direct parameters; parameters of embedded schemes or of filters can also be optimized. ② Any number of parameters can be optimized.
Disadvantages:
Automatic extension of the search bounds is not supported.
4. My takeaways
① If no more than two parameters need to be optimized, use GridSearch and enable automatic bound extension.
② If more than two parameters need to be optimized, use MultiSearch.
③ If only the classifier's direct parameters are optimized and there are no more than two of them, CVParameterSelection is also worth considering.
III. Algorithms in WEKA's meta package
1. Algorithms and descriptions
LocalWeightedLearning: locally weighted learning;
AdaBoostM1: the AdaBoost method;
AdditiveRegression: GBRT (Gradient Boosted Regression Trees). A boosting algorithm that cascades several models: each later model concentrates on the residual between the predictions of all previous models and the actual values and is trained on that residual, and the final prediction is the cascaded sum of the residual predictions.
AttributeSelectedClassifier: couples attribute selection with a classifier; attribute selection is performed first, then classification or regression;
Bagging: the bagging method;
ClassificationViaRegression: performs classification by means of regression;
LogitBoost: a boosting algorithm that performs classification via regression.
MultiClassClassifier: handles multi-class problems with two-class classifiers.
RandomCommittee: averages the results of randomized base classifiers.
RandomSubSpace;
FilteredClassifier: couples a filter with a classifier; filtering is applied first, then classification or regression (not available in Auto-WEKA);
MultiScheme: selects the best among several specified classifiers or parameter configurations, much like the Experimenter (not available in Auto-WEKA);
RandomizableFilteredClassifier: a variant of FilteredClassifier that is useful for ensemble classifiers such as RandomCommittee; it requires both the filter and the classifier to implement the Randomizable interface (not available in Auto-WEKA);
Vote;
Stacking.
2. My takeaways
The meta package provides many methods that take a base classifier as input. Among them:
① AdaBoostM1 and Bagging are commonly used meta methods;
② MultiScheme offers functionality similar to the Experimenter;
③ AttributeSelectedClassifier conveniently couples attribute selection with a classifier (see the sketch below).
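A minimal sketch of the two recommendations above, combining Bagging with an attribute-selected J48 (not part of the original article); data.arff is a placeholder and the class is assumed to be the last attribute:

import java.util.Random;

import weka.attributeSelection.BestFirst;
import weka.attributeSelection.CfsSubsetEval;
import weka.classifiers.Evaluation;
import weka.classifiers.meta.AttributeSelectedClassifier;
import weka.classifiers.meta.Bagging;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class MetaSketch {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("data.arff");   // placeholder path
        data.setClassIndex(data.numAttributes() - 1);

        // AttributeSelectedClassifier: CFS + BestFirst selection in front of a J48 tree.
        AttributeSelectedClassifier asc = new AttributeSelectedClassifier();
        asc.setEvaluator(new CfsSubsetEval());
        asc.setSearch(new BestFirst());
        asc.setClassifier(new J48());

        // Bagging over the attribute-selected J48.
        Bagging bagging = new Bagging();
        bagging.setClassifier(asc);
        bagging.setNumIterations(10);

        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(bagging, data, 10, new Random(1));
        System.out.println(eval.toSummaryString());
    }
}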
IV. Auto-WEKA
Auto-WEKA supports automatic selection of attributes, algorithms, and parameters.
1. Attribute selection
Attribute selection is run as a data preprocessing step before classification or regression.
The evaluation and search strategies available for attribute selection in Auto-WEKA are listed in the figure above; the entries marked with * are search strategies, and the rest are evaluation strategies. Note that the complete search obtained by combining the WrapperSubsetEval evaluation strategy with the ExhaustiveSearch search strategy is not included.
2. Algorithm selection
The figure above lists the classification and regression algorithms included in Auto-WEKA, 39 in total: 27 base classifiers, 10 meta classifiers, and 2 ensemble classifiers. A meta classifier can take any one of the base classifiers as input, and an ensemble classifier can take up to 5 base classifiers as input.
The 27 base classifiers include:
3 from bayes: BayesNet, NaiveBayes, and NaiveBayesMultinomial;