An introduction to ROC analysis
Tom Fawcett
Institute for the Study of Learning and Expertise, 2164 Staunton Court, Palo Alto, CA 94306, USA
Available online 19 December 2005
Abstract
Receiver operating characteristics (ROC) graphs are useful for organizing classifiers and visualizing their performance. ROC graphs are commonly used in medical decision making, and in recent years have been used increasingly in machine learning and data mining research. Although ROC graphs are apparently simple, there are some common misconceptions and pitfalls when using them in practice. The purpose of this article is to serve as an introduction to ROC graphs and as a guide for using them in research. © 2005 Elsevier B.V. All rights reserved.
Keywords: ROC analysis; Classifier evaluation; Evaluation metrics
1. Introduction
A receiver operating characteristics (ROC) graph is a technique for visualizing, organizing and selecting classifiers based on their performance. ROC graphs have long been used in signal detection theory to depict the tradeoff between hit rates and false alarm rates of classifiers (Egan, 1975; Swets et al., 2000). ROC analysis has been extended for use in visualizing and analyzing the behavior of diagnostic systems (Swets, 1988). The medical decision making community has an extensive literature on the use of ROC graphs for diagnostic testing (Zou, 2002). Swets et al. (2000) brought ROC curves to the attention of the wider public with their Scientific American article.
One of the earliest adopters of ROC graphs in machine learning was Spackman (1989), who demonstrated the value of ROC curves in evaluating and comparing algorithms. Recent years have seen an increase in the use of ROC graphs in the machine learning community, due in part to the realization that simple classification accuracy is often a poor metric for measuring performance (Provost and Fawcett, 1997; Provost et al., 1998). In addition to being a generally useful performance graphing method, they have properties that make them especially useful for domains with skewed class distributions and unequal classification error costs. These characteristics have become increasingly important as research continues into the areas of cost-sensitive learning and learning in the presence of unbalanced classes.
ROC graphs are conceptually simple, but there are some non-obvious complexities that arise when they are used in research. There are also common misconceptions and pitfalls when using them in practice. This article attempts to serve as a basic introduction to ROC graphs and as a guide for using them in research. The goal of this article is to advance general knowledge about ROC graphs so as to promote better evaluation practices in the field.

2. Classifier performance
We begin by considering classification problems using only two classes. Formally, each instance I is mapped to one element of the set {p, n} of positive and negative class labels. A classification model (or classifier) is a mapping from instances to predicted classes. Some classification models produce a continuous output (e.g., an estimate of an instance's class membership probability) to which different thresholds may be applied to predict class membership. Other models produce a discrete class label indicating only the predicted class of the instance. To distinguish between
the actual class and the predicted class we use the labels {Y, N} for the class predictions produced by a model.
Given a classifier and an instance, there are four possible outcomes. If the instance is positive and it is classified as positive, it is counted as a true positive; if it is classified as negative, it is counted as a false negative. If the instance is negative and it is classified as negative, it is counted as a true negative; if it is classified as positive, it is counted as a false positive. Given a classifier and a set of instances (the test set), a two-by-two confusion matrix (also called a contingency table) can be constructed representing the dispositions of the set of instances. This matrix forms the basis for many common metrics.
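To make the four outcomes concrete, here is a minimal sketch (an illustration of the definitions, not code from the paper) that tallies the two-by-two confusion matrix from actual classes in {p, n} and predicted classes in {Y, N}:

```python
def confusion_matrix(actual, predicted):
    """Count the four outcomes for paired actual/predicted class labels."""
    tp = fp = tn = fn = 0
    for a, y in zip(actual, predicted):
        if a == 'p':
            if y == 'Y':
                tp += 1  # positive classified as positive: true positive
            else:
                fn += 1  # positive classified as negative: false negative
        else:
            if y == 'Y':
                fp += 1  # negative classified as positive: false positive
            else:
                tn += 1  # negative classified as negative: true negative
    return tp, fp, tn, fn

print(confusion_matrix(['p', 'p', 'n', 'n'], ['Y', 'N', 'Y', 'N']))  # (1, 1, 1, 1)
```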
Fig. 1 shows a confusion matrix and equations of several common metrics that can be calculated from it. The numbers along the major diagonal represent the correct decisions made, and the numbers off this diagonal represent the errors (the confusion) between the various classes. The true positive rate¹ (also called hit rate and recall) of a classifier is estimated as

\[
\text{tp rate} \approx \frac{\text{Positives correctly classified}}{\text{Total positives}}
\]

The false positive rate (also called false alarm rate) of the classifier is

\[
\text{fp rate} \approx \frac{\text{Negatives incorrectly classified}}{\text{Total negatives}}
\]

Additional terms associated with ROC curves are

\[
\text{sensitivity} = \text{recall}
\]
\[
\text{specificity} = \frac{\text{True negatives}}{\text{False positives} + \text{True negatives}} = 1 - \text{fp rate}
\]
\[
\text{positive predictive value} = \text{precision}
\]

¹ For clarity, counts such as TP and FP will be denoted with upper-case letters and rates such as tp rate will be denoted with lower-case.

3. ROC space
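As a concrete companion to these definitions, the following sketch (my own illustration, not code from the paper) computes the metrics above from the four confusion-matrix counts; it assumes the test set contains at least one positive and one negative instance so the main denominators are nonzero.

```python
def rates(tp, fp, tn, fn):
    """Compute the Fig. 1 metrics from confusion-matrix counts.

    Assumes tp + fn > 0 and fp + tn > 0 (at least one instance of each class).
    """
    tp_rate = tp / (tp + fn)            # hit rate, recall, sensitivity
    fp_rate = fp / (fp + tn)            # false alarm rate
    specificity = tn / (fp + tn)        # equal to 1 - fp_rate
    precision = tp / (tp + fp) if tp + fp else 0.0  # positive predictive value
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return tp_rate, fp_rate, specificity, precision, accuracy
```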
ROC graphs are two-dimensional graphs in which tp rate is plotted on the Y axis and fp rate is plotted on the X axis. An ROC graph depicts relative tradeoffs between benefits (true positives) and costs (false positives). Fig. 2 shows an ROC graph with five classifiers labeled A through E.
A discrete classifier is one that outputs only a class label. Each discrete classifier produces an (fp rate, tp rate) pair corresponding to a single point in ROC space. The classifiers in Fig. 2 are all discrete classifiers.
Several points in ROC space are important to note. The lower left point (0, 0) represents the strategy of never issuing a positive classification; such a classifier commits no false positive errors but also gains no true positives. The opposite strategy, of unconditionally issuing positive classifications, is represented by the upper right point (1, 1). The point (0, 1) represents perfect classification. D's performance is perfect as shown.
Informally, one point in ROC space is better than another if it is to the northwest (tp rate is higher, fp rate is lower, or both) of the first. Classifiers appearing on the left-hand side of an ROC graph, near the X axis, may be thought of as "conservative": they make positive classifications only with strong evidence, so they make few false positive errors, but they often have low true positive rates as well. Classifiers on the upper right-hand side of an ROC graph may be thought of as "liberal": they make positive classifications with weak evidence, so they classify nearly all positives correctly, but they often have high false positive rates. In Fig. 2, A is more conservative than B. Many real-world domains are dominated by large numbers of negative instances, so performance in the far left-hand side of the ROC graph becomes more interesting.
3.1. Random performance
The diagonal line y = x represents the strategy of randomly guessing a class. For example, if a classifier randomly guesses the positive class half the time, it can be expected to get half the positives and half the negatives correct; this yields the point (0.5, 0.5) in ROC space. If it guesses the positive class 90% of the time, it can be expected to get 90% of the positives correct, but its false positive rate will increase to 90% as well, yielding (0.9, 0.9) in ROC space. Thus a random classifier will produce an ROC point that "slides" back and forth on the diagonal based on the frequency with which it guesses the positive class. In order to get away from this diagonal into the upper triangular region, the classifier must exploit some information in the data. In Fig. 2, C's performance is virtually random. At (0.7, 0.7), C may be said to be guessing the positive class 70% of the time.
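A quick simulation (my own sketch, with arbitrary sample sizes) shows why such a guesser lands on the diagonal: guessing Y with probability p yields a tp rate and an fp rate both near p.

```python
import random

def random_guess_point(p_guess, n_pos=10000, n_neg=10000, seed=0):
    """ROC point of a classifier that answers Y with probability p_guess."""
    rng = random.Random(seed)
    tp = sum(rng.random() < p_guess for _ in range(n_pos))
    fp = sum(rng.random() < p_guess for _ in range(n_neg))
    return fp / n_neg, tp / n_pos  # (fp rate, tp rate)

print(random_guess_point(0.5))  # approximately (0.5, 0.5)
print(random_guess_point(0.9))  # approximately (0.9, 0.9)
```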
Any classifier that appears in the lower right triangle performs worse than random guessing. This triangle is therefore usually empty in ROC graphs. If we negate a classifier (that is, reverse its classification decisions on every instance), its true positive classifications become false negative mistakes, and its false positives become true negatives. Therefore, any classifier that produces a point in the lower right triangle can be negated to produce a point in the upper left triangle. In Fig. 2, E performs much worse than random, and is in fact the negation of B. Any classifier on the diagonal may be said to have no information about the class. A classifier below the diagonal may be said to have useful information, but it is applying the information incorrectly (Flach and Wu, 2003).
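The geometry of negation is easy to verify: swapping every decision turns TP counts into FN counts and FP counts into TN counts, which complements both rates. A two-line sketch (illustration only):

```python
def negate(fp_rate, tp_rate):
    """ROC point of the negated classifier: both rates are complemented."""
    return 1.0 - fp_rate, 1.0 - tp_rate

print(negate(0.8, 0.3))  # a below-diagonal point (0.8, 0.3) maps to (0.2, 0.7)
```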
Given an ROC graph in which a classifier's performance appears to be slightly better than random, it is natural to ask: "is this classifier's performance truly significant or is it only better than random by chance?" There is no conclusive test for this, but Forman (2002) has shown a methodology that addresses this question with ROC curves.
4. Curves in ROC space
Many classifiers, such as decision trees or rule sets, are designed to produce only a class decision, i.e., a Y or N on each instance. When such a discrete classifier is applied to a test set, it yields a single confusion matrix, which in turn corresponds to one ROC point. Thus, a discrete classifier produces only a single point in ROC space.
Some classifiers, such as a Naive Bayes classifier or a neural network, naturally yield an instance probability or score, a numeric value that represents the degree to which an instance is a member of a class. These values can be strict probabilities, in which case they adhere to standard theorems of probability; or they can be general, uncalibrated scores, in which case the only property that holds is that a higher score indicates a higher probability. We shall call both a probabilistic classifier, in spite of the fact that the output may not be a proper probability.²

² Techniques exist for converting an uncalibrated score into a proper probability, but this conversion is unnecessary for ROC curves.

Such a ranking or scoring classifier can be used with a threshold to produce a discrete (binary) classifier: if the classifier output is above the threshold, the classifier produces a Y, else an N. Each threshold value produces a different point in ROC space. Conceptually, we may imagine varying a threshold from −∞ to +∞ and tracing a curve through ROC space. Computationally, this is a poor way of generating an ROC curve, and the next section describes a more efficient and careful method.
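The conceptual sweep can be written directly, at the cost of rescanning the test set for every threshold; this deliberately naive sketch (names are my own) is exactly what the more efficient method improves upon.

```python
def roc_points(scores, labels, thresholds):
    """One ROC point per threshold; predict Y when score >= threshold.

    Assumes labels contain at least one 'p' and one 'n'.
    """
    pos = sum(1 for l in labels if l == 'p')
    neg = len(labels) - pos
    points = []
    for t in sorted(thresholds, reverse=True):
        tp = sum(1 for s, l in zip(scores, labels) if s >= t and l == 'p')
        fp = sum(1 for s, l in zip(scores, labels) if s >= t and l == 'n')
        points.append((fp / neg, tp / pos))
    return points  # traces the step function from near (0, 0) toward (1, 1)
```

Using each distinct classifier score as a threshold (plus one value above the maximum) recovers every point of the step function; the repeated rescan makes this quadratic in the size of the test set, which is why it is a poor way to generate the curve.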
Fig. 3 shows an example of an ROC "curve" on a test set of 20 instances. The instances, 10 positive and 10 negative, are shown in the table beside the graph. Any ROC curve generated from a finite set of instances is actually a step function, which approaches a true curve as the number of instances approaches infinity. The step function in Fig. 3 is taken from a very small instance set so that each point's derivation can be understood. In the table of Fig. 3, the instances are sorted by their scores, and each point in the ROC graph is labeled by the score threshold that produces it. A threshold of +∞ produces the point (0, 0). As we lower the threshold to 0.9 the first positive instance is classified positive, yielding (0, 0.1). As the threshold is further reduced, the curve climbs up and to the right, ending up at (1, 1) with a threshold of 0.1. Note that lowering this threshold corresponds to moving from the "conservative" to the "liberal" areas of the graph.
Although the test set is very small, we can make some tentative observations about the classifier. It appears to perform better in the more conservative region of the graph; the ROC point at (0.1, 0.5) produces its highest accuracy (70%). This is equivalent to saying that the classifier is better at identifying likely positives than at identifying likely negatives. Note also that the classifier's best accuracy occurs at a threshold of ≥0.54, rather than at ≥0.5 as we might expect with a balanced distribution. The next section discusses this phenomenon.
4.1. Relative versus absolute scores
An important point about ROC graphs is that they measure the ability of a classifier to produce good relative instance scores. A classifier need not produce accurate, calibrated probability estimates; it need only produce relatively accurate scores that serve to discriminate positive and negative instances.
Consider the simple instance scores shown in Fig. 4, which came from a Naive Bayes classifier. Comparing the hypothesized class (which is Y if score > 0.5, else N) against the true class, we can see that the classifier gets instances 7 and 8 wrong, yielding 80% accuracy. However, consider the ROC curve on the left side of the figure. The curve rises vertically from (0, 0) to (0, 1), then horizontally to (1, 1). This indicates perfect classification performance on this test set. Why is there a discrepancy?
The explanation lies in what each is measuring. The ROC curve shows the ability of the classifier to rank the positive instances relative to the negative instances, and it is indeed perfect in this ability. The accuracy metric imposes a threshold (score > 0.5) and measures the resulting classifications with respect to the scores. The accuracy measure would be appropriate if the scores were proper probabilities, but they are not. Another way of saying this is that the scores are not properly calibrated, as true probabilities are. In ROC space, the imposition of a 0.5 threshold results in the performance designated by the circled "accuracy point" in Fig. 4. This operating point is suboptimal. We could use the training set to estimate a prior for p(p) = 6/10 = 0.6 and use this as a threshold, but it would still produce suboptimal performance (90% accuracy).
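The discrepancy is easy to reproduce. The sketch below uses hypothetical scores (the actual Fig. 4 values are not reproduced in this text): six positives ranked strictly above four negatives, so the ranking is perfect, yet the 0.5 threshold and the prior-based 0.6 threshold both misclassify instances.

```python
# Hypothetical, perfectly ranked but poorly calibrated scores (not Fig. 4's values).
scores = [0.99, 0.90, 0.80, 0.70, 0.65, 0.55, 0.52, 0.51, 0.40, 0.30]
labels = ['p',  'p',  'p',  'p',  'p',  'p',  'n',  'n',  'n',  'n']

def accuracy_at(t):
    """Accuracy when the hypothesized class is Y for score > t, else N."""
    return sum((s > t) == (l == 'p') for s, l in zip(scores, labels)) / len(labels)

print(accuracy_at(0.50))  # 0.8: two negatives score above the 0.5 threshold
print(accuracy_at(0.60))  # 0.9: the prior-based threshold is still suboptimal
print(accuracy_at(0.53))  # 1.0: a threshold between the two classes is perfect
```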
One way to eliminate this phenomenon is to calibrate the classifier scores. There are some methods for doing this (Zadrozny and Elkan, 2001). Another approach is to use an ROC method that chooses operating points based on their relative performance, and there are methods for doing this as well (Provost and Fawcett, 1998, 2001). These latter methods are discussed briefly in Section 6.
A consequence of relative scoring is that classifier scores should not be compared across model classes. One model class may be designed to produce scores in the range [0, 1] while another produces scores in [−1, +1] or [1, 100]. Comparing model performance at a common threshold will be meaningless.
4.2. Class skew
ROC curves have an attractive property: they are insensitive to changes in class distribution. If the proportion of positive to negative instances changes in a test set, the ROC curves will not change. To see why this is so, consider the confusion matrix in Fig. 1. Note that the class distribution (the proportion of positive to negative instances) is the relationship of the left (+) column to the right (−) column. Any performance metric that uses values from both columns will be inherently sensitive to class skews. Metrics such as accuracy, precision, lift and F score use values from both columns of the confusion matrix. As a class distribution changes, these measures will change as well, even if the fundamental classifier performance does not. ROC graphs are based upon tp rate and fp rate, in which each dimension is a strict columnar ratio, so they do not depend on class distributions.
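A small numeric check (my own illustration, with made-up counts) makes the columnar argument concrete: duplicating every negative instance ten times leaves tp rate and fp rate unchanged, while precision and accuracy, which mix the two columns, shift substantially.

```python
def metrics(tp, fp, tn, fn):
    return {
        'tp_rate':   tp / (tp + fn),                   # positive column only
        'fp_rate':   fp / (fp + tn),                   # negative column only
        'precision': tp / (tp + fp),                   # mixes both columns
        'accuracy':  (tp + tn) / (tp + fp + tn + fn),  # mixes both columns
    }

print(metrics(tp=80, fp=10,  tn=90,  fn=20))   # original class balance
print(metrics(tp=80, fp=100, tn=900, fn=20))   # every negative duplicated 10x:
# tp_rate and fp_rate are identical; precision and accuracy change markedly.
```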
To some researchers, large class skews and large changes in class distributions may seem contrived and unrealistic. However, class skews of 10^1 and 10^2 are very common in real world domains, and skews up to 10^6 have been observed in some domains (Clearwater and Stern, 1991; Fawcett and Provost, 1996; Kubat et al., 1998; Saitta and Neri, 1998). Substantial changes in class distributions are not unrealistic either. For example, in medical decision making epidemics may cause the incidence of a disease to increase over time. In fraud detection, proportions of fraud varied significantly from month to month and place to place (Fawcett and Provost, 1997). Changes in a manufacturing practice may cause the proportion of defective units
produced by a manufacturing line to increase or decrease. In each of these examples the prevalence of a class may change drastically without altering the fundamental characteristic of the class, i.e., the target concept.
Precision and recall are common in information retrieval for evaluating retrieval (classification) performance (Lewis, 1990, 1991). Precision-recall graphs are commonly used where static document sets can sometimes be assumed; however, they are also used in dynamic environments such as web page retrieval, where the number of pages irrelevant to a query (N) is many orders of magnitude greater than P and probably increases steadily over time as web pages are created.
To see the effect of class skew, consider the curves in Fig. 5, which show two classifiers evaluated using ROC curves and precision-recall curves. In Fig. 5a and b, the test