Top Ten Machine Learning Algorithms: CART


Chapter 10
CART: Classification and Regression Trees
Dan Steinberg
Contents
10.1 Antecedents
10.2 Overview
10.3 A Running Example
10.4 The Algorithm Briefly Stated
10.5 Splitting Rules
10.6 Prior Probabilities and Class Balancing
10.7 Missing Value Handling
10.8 Attribute Importance
10.9 Dynamic Feature Construction
10.10 Cost-Sensitive Learning
10.11 Stopping Rules, Pruning, Tree Sequences, and Tree Selection
10.12 Probability Trees
10.13 Theoretical Foundations
10.14 Post-CART Related Research
10.15 Software Availability
10.16 Exercises
References
The 1984 monograph, "CART: Classification and Regression Trees," coauthored by Leo Breiman, Jerome Friedman, Richard Olshen, and Charles Stone (BFOS), represents a major milestone in the evolution of artificial intelligence, machine learning, nonparametric statistics, and data mining. The work is important for the comprehensiveness of its study of decision trees, the technical innovations it introduces, its sophisticated examples of tree-structured data analysis, and its authoritative treatment of large sample theory for trees. Since its publication the CART monograph has been cited some 3,000 times according to the science and social science citation indexes; Google Scholar reports about 8,450 citations. CART citations can be found in almost any domain, with many appearing in fields such as credit risk, targeted marketing, financial markets modeling, electrical engineering, quality control, biology, chemistry, and clinical medical research. CART has also strongly influenced image compression via tree-structured vector quantization. This brief account is intended to introduce CART basics, touching on the major themes treated in the CART monograph, and to encourage readers to return to the rich original source for technical details, discussions revealing the thought process of the authors, and examples of their analytical style.
10.1 Antecedents
CART was not the first decision tree to be introduced to machine learning, although it is the first to be described with analytical rigor and supported by sophisticated statistics and probability theory. CART explicitly traces its ancestry to the automatic interaction detection (AID) tree of Morgan and Sonquist (1963), an automated recursive method for exploring relationships in data intended to mimic the iterative drill-downs typical of practicing survey data analysts. AID was introduced as a potentially useful tool without any theoretical foundation. This 1960s-era work on trees was greeted with profound skepticism amidst evidence that AID could radically overfit the training data and encourage profoundly misleading conclusions (Einhorn, 1972; Doyle, 1973), especially in smaller samples. By 1973 well-read statisticians were convinced that trees were a dead end; the conventional wisdom held that trees were dangerous and unreliable tools, particularly because of their lack of a theoretical foundation. Other researchers, however, were not yet prepared to abandon the tree line of thinking.

The work of Cover and Hart (1967) on the large sample properties of nearest neighbor (NN) classifiers was instrumental in persuading Richard Olshen and Jerome Friedman that trees had sufficient theoretical merit to be worth pursuing. Olshen reasoned that if NN classifiers could reach the Cover and Hart bound on misclassification error, then a similar result should be derivable for a suitably constructed tree, because the terminal nodes of trees could be viewed as dynamically constructed NN classifiers. Thus, the Cover and Hart NN research was the immediate stimulus that persuaded Olshen to investigate the asymptotic properties of trees. Coincidentally, Friedman's algorithmic work on fast identification of nearest neighbors via trees (Friedman, Bentley, and Finkel, 1977) used a recursive partitioning mechanism that evolved into CART. One predecessor of CART appears in the 1975 Stanford Linear Accelerator Center (SLAC) discussion paper (Friedman, 1975), subsequently published in a shorter form by Friedman (1977). While Friedman was working out key elements of CART at SLAC, with Olshen conducting mathematical research in the same lab, similar independent research was under way in Los Angeles by Leo Breiman and Charles Stone (Breiman and Stone, 1978). The two separate strands of research (Friedman and Olshen at Stanford, Breiman and Stone in Los Angeles) were brought together in 1978 when the four CART authors formally began the process of merging their work and preparing to write the CART monograph.
10.2 Overview
The CART decision tree is a binary recursive partitioning procedure capable of processing continuous and nominal attributes as targets and predictors. Data are handled in their raw form; no binning is required or recommended. Beginning in the root node, the data are split into two children, and each of the children is in turn split into grandchildren. Trees are grown to a maximal size without the use of a stopping rule; essentially the tree-growing process stops when no further splits are possible due to lack of data. The maximal-sized tree is then pruned back to the root (essentially split by split) via the novel method of cost-complexity pruning. The next split to be pruned is the one contributing least to the overall performance of the tree on training data (and more than one split may be removed at a time). The CART mechanism is intended to produce not one tree, but a sequence of nested pruned trees, each of which is a candidate to be the optimal tree. The "right sized" or "honest" tree is identified by evaluating the predictive performance of every tree in the pruning sequence on independent test data. Unlike C4.5, CART does not use an internal (training-data-based) performance measure for tree selection. Instead, tree performance is always measured on independent test data (or via cross-validation) and tree selection proceeds only after test-data-based evaluation. If testing or cross-validation has not been performed, CART remains agnostic regarding which tree in the sequence is best. This is in sharp contrast to methods such as C4.5 or classical statistics that generate preferred models on the basis of training data measures.
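The grow-maximal, prune-to-a-sequence, select-on-test-data workflow just described can be sketched with scikit-learn's minimal cost-complexity pruning. The synthetic dataset and parameter values below are illustrative assumptions, not the chapter's data:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=830, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Grow the maximal tree implicitly and recover the nested pruning sequence:
# each ccp_alpha corresponds to one tree in the cost-complexity sequence.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(
    X_train, y_train)

# Fit one pruned tree per alpha; the "honest" tree is the one that performs
# best on independent test data, never on a training-data measure.
scores = []
for alpha in path.ccp_alphas:
    t = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha)
    t.fit(X_train, y_train)
    scores.append(t.score(X_test, y_test))
best_alpha = path.ccp_alphas[int(np.argmax(scores))]
```

In practice one would use cross-validation rather than a single split when the test set is small, exactly as the chapter notes.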
The CART mechanism includes (optional) automatic class balancing and automatic missing value handling, and allows for cost-sensitive learning, dynamic feature construction, and probability tree estimation. The final reports include a novel attribute importance ranking. The CART authors also broke new ground in showing how cross-validation can be used to assess performance for every tree in the pruning sequence, given that trees in different cross-validation folds may not align on the number of terminal nodes. It is useful to keep in mind that although BFOS addressed all these topics in the 1970s, in some cases the BFOS treatment remains the state of the art. The literature of the 1990s contains a number of articles that rediscover core insights first introduced in the 1984 CART monograph. Each of the major features is discussed separately below.
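Two of these features, class balancing and attribute importance ranking, have loose analogues in modern toolkits. The scikit-learn sketch below is an approximation in spirit, not the CART authors' exact procedures, and the dataset is a synthetic assumption:

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Imbalanced two-class problem, loosely echoing a 15% response rate.
X, y = make_classification(n_samples=830, n_features=10,
                           weights=[0.85, 0.15], random_state=0)

# class_weight="balanced" plays a role similar to CART's automatic class
# balancing: each class is reweighted to contribute equally to the
# splitting criterion regardless of its raw frequency.
clf = DecisionTreeClassifier(class_weight="balanced", max_depth=4,
                             random_state=0)
clf.fit(X, y)

# Impurity-based importance ranking, analogous in spirit to CART's
# attribute importance report (CART additionally credits surrogate splits).
ranking = sorted(enumerate(clf.feature_importances_), key=lambda t: -t[1])
```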
10.3 A Running Example
To help make the details of CART concrete we illustrate some of our points using an easy-to-understand real-world example. (The data have been altered to mask some of the original specifics.) In the early 1990s the author assisted a telecommunications company in understanding the market for mobile phones. Because the mobile phone was a new technology at that time, we needed to identify the major drivers of adoption of this then-new technology and to identify demographics that might be related to price sensitivity. The data consisted of a household's response (yes/no) to a market test offer of a mobile phone package; all prospects were offered an identical package of a handset and service features, with one exception: the pricing for the package was varied randomly according to an experimental design. The only choice open to the households was to accept or reject the offer.

TABLE 10.1 Example Data Summary Statistics

Attribute   N    N Missing  % Missing  N Distinct  Mean      Min  Max
AGE         813  18         2.2        9           5.059     1    9
CITY        830  0          0          5           1.769     1    5
HANDPRIC    830  0          0          4           145.3     60   235
MARITAL     822  9          1.1        3           1.9015    1    3
PAGER       825  6          0.7        2           0.076364  0    1
RENTHOUS    830  0          0          3           1.7906    1    3
RESPONSE    830  0          0          2           0.1518    0    1
SEX         819  12         1.4        2           1.4432    1    2
TELEBILC    768  63         7.6        6           54.199    8    116
TRAVTIME    651  180        22         5           2.318     1    5
USEPRICE    830  0          0          4           11.151    10   30

MARITAL = Marital Status (Never Married, Married, Divorced/Widowed)
TRAVTIME = estimated commute time to major center of employment
AGE is recorded as an integer ranging from 1 to 9
A total of 830 households were approached and 126 of the households agreed to subscribe to the mobile phone service plan. One of our objectives was to learn as much as possible about the differences between subscribers and nonsubscribers. A set of summary statistics for select attributes appears in Table 10.1. HANDPRIC is the price quoted for the mobile handset, USEPRICE is the quoted per-minute charge, and the other attributes are provided with common names.
A CART classification tree was grown on the data to predict the RESPONSE attribute using all the other attributes as predictors. MARITAL and CITY are categorical (nominal) attributes. A decision tree is grown by recursively partitioning the training data using a splitting rule to identify the split to use at each node. Figure 10.1 illustrates this process beginning with the root node splitter at the top of the tree. The root node at the top of the diagram contains all our training data, including 704 nonsubscribers (labeled with a 0) and 126 subscribers (labeled 1). Each of the 830 instances contains data on the 10 predictor attributes, although there are some missing values. CART begins by searching the data for the best splitter available, testing each predictor attribute-value pair for its goodness-of-split. In Figure 10.1 we see the results of this search: HANDPRIC has been determined to be the best splitter using a threshold of 130 to partition the data. All instances presented with a HANDPRIC less than or equal to 130 are sent to the left child node and all other instances are sent to the right. The resulting split yields two subsets of the data with substantially different response rates: 21.9% for those quoted lower prices and 9.9% for those quoted the higher prices. Clearly both the root node splitter and the magnitude of the difference between the two child nodes are plausible. Observe that the split always results in two nodes: CART uses only binary splitting.

Figure 10.1 Root node split.
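Under CART's default Gini rule, the "goodness-of-split" being compared is the decrease in node impurity. A rough illustration using the root-node figures quoted above follows; the left-node size of roughly 365 is inferred here from the reported response rates (0.219 n_left + 0.099 n_right ≈ 126) and is an assumption, not a figure from the chapter:

```python
def gini(p):
    """Two-class Gini impurity of a node whose class-1 fraction is p."""
    return 2.0 * p * (1.0 - p)

def split_improvement(n_parent, p_parent, n_left, p_left, p_right):
    """Decrease in weighted Gini impurity produced by a binary split."""
    n_right = n_parent - n_left
    children = ((n_left / n_parent) * gini(p_left)
                + (n_right / n_parent) * gini(p_right))
    return gini(p_parent) - children

# Root node: 830 cases, 126 subscribers (15.2% response).
# HANDPRIC <= 130 sends ~365 cases left (21.9%) and ~465 right (9.9%).
gain = split_improvement(830, 126 / 830, 365, 0.219, 0.099)
```

CART evaluates this quantity for every attribute-value pair and keeps the split with the largest improvement.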
To generate a complete tree CART simply repeats the splitting process just described in each of the two child nodes to produce grandchildren of the root. Grandchildren are split to obtain great-grandchildren and so on until further splitting is impossible due to a lack of data. In our example, this growing process results in a "maximal tree" consisting of 81 terminal nodes: nodes at the bottom of the tree that are not split further.
10.4 The Algorithm Briefly Stated
A complete statement of the CART algorithm, including all relevant technical details, is lengthy and complex; there are multiple splitting rules available for both classification and regression, separate handling of continuous and categorical splitters, special handling for categorical splitters with many levels, and provision for missing value handling. Following the tree-growing procedure there is another complex procedure for pruning the tree, and finally, there is tree selection. In Figure 10.2 a simplified algorithm for tree growing is sketched out. Formal statements of the algorithm are provided in the CART monograph. Here we offer an informal statement that is highly simplified.
Observe that this simplified algorithm sketch makes no reference to missing values, class assignments, or other core details of CART. The algorithm sketches a mechanism for growing the largest possible (maximal) tree.
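Since the figure itself does not survive in this copy, the same simplified idea can be sketched in code: at each node, search every attribute-value pair for the split with the best Gini improvement, and recurse until no data-supported split remains. All names and the min_size stopping parameter are illustrative assumptions, not CART's actual implementation:

```python
from dataclasses import dataclass
from typing import Optional
import numpy as np

@dataclass
class Node:
    prediction: float                   # class-1 fraction at this node
    feature: Optional[int] = None       # split attribute (None for a leaf)
    threshold: Optional[float] = None   # send x[feature] <= threshold left
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def gini(y):
    p = y.mean()
    return 2.0 * p * (1.0 - p)

def grow(X, y, min_size=5):
    """Grow a maximal binary tree: stop only when a node is pure or
    too small for any further data-supported split."""
    node = Node(prediction=float(y.mean()))
    if len(y) < min_size or gini(y) == 0.0:
        return node
    best_gain, best = 0.0, None
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j])[:-1]:       # candidate thresholds
            mask = X[:, j] <= t
            gain = gini(y) - (mask.mean() * gini(y[mask])
                              + (1 - mask.mean()) * gini(y[~mask]))
            if gain > best_gain:
                best_gain, best = gain, (j, t, mask)
    if best is None:                            # no improving split exists
        return node
    j, t, mask = best
    node.feature, node.threshold = j, float(t)
    node.left = grow(X[mask], y[mask], min_size)
    node.right = grow(X[~mask], y[~mask], min_size)
    return node
```

The real algorithm layers class assignment, priors, costs, and surrogate splits for missing values on top of this skeleton, and the pruning and selection stages then operate on the maximal tree this produces.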
