LungCAD: A Clinically Approved, Machine Learning System for Lung Cancer Detection
R. Bharat Rao, Jinbo Bi, Glenn Fung, Marcos Salganicoff
Siemens Medical Solutions
51 Valley Stream Parkway, Malvern, PA 19355

Nancy Obuchowski
Quantitative Health Sciences
The Cleveland Clinic Foundation
9500 Euclid Ave., Cleveland, OH 44195

David Naidich
Department of Radiology
New York University Medical Center
400 East 34 Street, New York, NY 10016
ABSTRACT
We present LungCAD, a computer aided diagnosis (CAD) system that employs a classification algorithm for detecting solid pulmonary nodules from CT thorax studies. We briefly describe some of the machine learning techniques developed to overcome the real world challenges in this medical domain. The most significant hurdle in transitioning from a machine learning research prototype that performs well on an in-house dataset into a clinically deployable system is the requirement that the CAD system be tested in a clinical trial. We describe the clinical trial in which LungCAD was tested: a large scale multi-reader, multi-case (MRMC) retrospective observational study to evaluate the effect of CAD in clinical practice for detecting solid pulmonary nodules from CT thorax studies. The clinical trial demonstrates that every radiologist who participated in the trial had a significantly greater accuracy with LungCAD, both for detecting nodules and identifying potentially actionable nodules; this, along with other findings from the trial, has resulted in FDA approval for LungCAD in late 2006.
Categories and Subject Descriptors
I.5.m [Pattern Recognition]: Miscellaneous
General Terms
Algorithms
Keywords
computer aided detection, lung cancer prognosis, classification, clinical trial
1. INTRODUCTION
Lung cancer is the most commonly diagnosed cancer worldwide, accounting for 1.2 million new cases annually. Lung cancer is an exceptionally deadly disease: 6 out of 10 people will die within one year of being diagnosed. The expected 5-year survival rate for all patients with a diagnosis of lung cancer is only 15%, compared to 65% for colon, 89% for breast and 99.9% for prostate cancer. In the United States, lung cancer is the leading cause of cancer death for both men and women, causing more deaths than the next three most common cancers combined, and costs $9.6 billion to treat annually. However, lung cancer prognosis varies greatly depending on how early the disease is diagnosed; as with all cancers, early detection provides the best prognosis. At one extreme are the patients diagnosed with metastatic tumors (that have spread far from the lung), for whom the 5-year survival rate is just 2%. On the other hand, when diagnosed at an early stage, when the disease is still localized within the lung, the 5-year survival rate is 49%, and many treatment options (surgery, radiotherapy, chemotherapy) are viable. Today, only 24% of lung cancer cases are diagnosed at an early stage [1, 10].

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
KDD'07, August 12-15, 2007, San Jose, California, USA. Copyright 2007 ACM 978-1-59593-609-7/$5.00.
The recent development of multidetector computed tomography (MDCT) scanners has made it feasible, in principle, to detect lung cancer at very early stages. Despite the advances in technology, many potentially clinically significant lesions still remain undetected [13]. One contributing factor is the explosion of generated data: the state-of-the-art 64-slice dual-source CT acquires up to 3,687 axial images in 30 seconds for each patient (each image must then be carefully examined by a radiologist). There is a growing consensus among clinical experts that the use of computer-aided diagnosis (CAD) software, when used as a second reader (i.e., in conjunction with the radiologist), not only offers the potential to improve the detection accuracy of a radiologist, but also to reduce mistakes related to misinterpretation [2, 11].

In order for a CAD system to be used in clinical practice in the United States, it must first receive approval from the Food and Drug Administration (FDA). All CAD systems must go through a rigorous clinical trial to receive approval (in much the same way as a new drug). A handful of CAD systems have received approval for detecting breast cancer lesions in the past 8 years. To be approved, CAD systems must show satisfactory performance in two areas. First, the principal value of CAD is determined not by its stand-alone performance, but rather by carefully measuring the incremental value of using computer-aided diagnosis in normal clinical practice with the radiologist in the loop. Second, CAD systems must not have a negative impact on patient management (for instance, false positives which cause the radiologist to recommend unnecessary, and potentially dangerous, follow-ups).

Additionally, designing a trial for lung cancer detection is considerably more challenging than for breast cancer. One factor is the relative difficulty in obtaining ground truth (correct labeling) for lung cancer related lesions. Whereas in breast cancer virtually all suspicious lesions are routinely biopsied (providing definitive histological ground truth), a lung biopsy is a dangerous procedure, with a 2% risk of serious complications (including death); this makes obtaining definitive ground truth infeasible, particularly for patients being evaluated for early signs of lung cancer.
Section 2 describes some of the machine learning challenges involved in learning a classifier for detecting lung cancer. We review some of our previous solutions. Section 3 describes the clinical trial design for our LungCAD system, which includes a fairly complex mechanism for determining ground truth and measuring incremental improvement. Section 4 summarizes the experimental results of the clinical trial that has resulted in granting clinical approval for LungCAD. We conclude in Section 5 with some discussion about CAD in general and future challenges.
2. MACHINE LEARNING CHALLENGES
The LungCAD system consists of 5 stages:
1. lung segmentation, to identify the lung area within the chest;
2. candidate generation, which identifies suspicious unhealthy candidate regions of interest (ROI) from a medical image;
3. feature extraction, which computes descriptive features for each candidate so that each candidate is represented by a vector x of numerical values or attributes [15];
4. classification, which differentiates candidates based on candidate feature vectors;
5. visual presentation of CAD findings to the radiologist, so that he or she can accept or reject the CAD findings.
In this section, we focus on learning the classifier in Step 4.

Automatic learning technologies greatly reduce the time required to develop algorithms that act as “second readers”, besides improving the diagnostic accuracy. Many standard algorithms (such as support vector machines (SVM), back-propagation neural nets, kernel Fisher discriminants) have been used to learn classifiers for detecting malignant structures [2, 11]. However, these general-purpose learning methods either make implicit assumptions that are commonly violated in CAD applications, or cannot effectively address the difficulties that arise when learning a CAD system.
Non-IID Data. Traditional learning methods almost universally assume that the training samples are independently drawn from an identical albeit unobservable underlying distribution (the IID assumption), which is often not the case in CAD systems. Due to spatial adjacency of the regions identified by a candidate generator, both the features and the class labels of several adjacent candidates are highly correlated. This is true both in the training set and in the testing data. A batch-classification algorithm in [14] derives a probabilistic classification model by specifying an a priori guess on the candidate labels with a covariance matrix Σ that encodes the spatial-proximity-based correlations within an image. Multiple-instance learning methods [9, 3] optimize the classifier design by taking into account the fact that multiple candidates can be associated with a single malignant structure. Random effects may exist in patient images from the same hospital, or in different candidates extracted from the same patient. The approach in [7] proposes to use additional mixed-effect parameters, one for each hospital or for each patient. All these algorithms improve the classification accuracy significantly.
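The multiple-instance view can be illustrated by how detections are counted: if several candidates point at the same structure, the structure counts as found when any one of them is flagged. The following is a minimal sketch of that counting rule only, not the training algorithms of [9, 3]; the function name, the `lesion_ids` grouping and the threshold are hypothetical.

```python
from collections import defaultdict


def lesion_level_sensitivity(scores, lesion_ids, threshold):
    """Fraction of lesions with at least one candidate scoring >= threshold.

    scores     -- classifier outputs, one per positive candidate (assumed input)
    lesion_ids -- identifier of the lesion each candidate belongs to (assumed input)
    """
    # Keep only the best-scoring candidate for every lesion.
    best = defaultdict(lambda: float("-inf"))
    for score, lesion in zip(scores, lesion_ids):
        best[lesion] = max(best[lesion], score)
    detected = sum(1 for value in best.values() if value >= threshold)
    return detected / len(best)
```

With five candidates covering three lesions, a classifier that flags only two candidates can still find two of the three lesions, which is why candidate-level accuracy understates what matters clinically.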
Unbalanced Data and Speed. In the candidate identification stage, high sensitivity (ideally close to 100%) is essential, because any cancers missed at this stage can never be found by the CAD system. The candidate generator therefore potentially produces many false positives (less than 1% of the candidates are positive), making the classification problem highly unbalanced. Moreover, a CAD system has to satisfy the real-time requirement that it finishes running during the radiologist's first read. These issues were addressed by employing effective cascaded classification frameworks, as shown in [4, 5]. The method in [4] investigates a cascaded classification approach that solves a sequence of linear programs, each constructing a sparse hyperplane (linear) classifier. It incorporates the computational complexity of various features into the cascade design for time efficiency. A more recent work [5] does not follow the standard cascade procedure, in which individual classifiers are optimized for one specific stage given the candidates that survived the earlier stages. Instead, it uses a novel AND-OR cascade training strategy which optimizes all of the classifiers in the cascade in parallel by minimizing the regularized risk of the entire system and providing implicit mutual feedback to individual classifiers to adjust parameter design. These cascaded approaches have been compared with the well-known cascade AdaBoost, and are superior with many additional advantages.
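To make the cascade idea concrete, the sketch below shows inference only: each stage is a linear classifier that looks at a subset of features, a candidate is kept only if every stage accepts it, and rejected candidates are never passed to later (presumably more expensive) stages. It is an illustration under assumptions, not the linear-programming training of [4] or the joint AND-OR training of [5]; the stage format `(feature_idx, w, b)` is hypothetical.

```python
import numpy as np


def cascade_predict(X, stages):
    """Run candidates through a cascade of linear classifiers.

    X      -- array of shape (n_candidates, n_features)
    stages -- list of (feature_idx, w, b); a candidate survives a stage
              when w . x[feature_idx] + b >= 0 (hypothetical stage format)
    Returns a boolean array: True for candidates accepted by all stages.
    """
    alive = np.ones(X.shape[0], dtype=bool)
    for feature_idx, w, b in stages:
        idx = np.flatnonzero(alive)
        if idx.size == 0:
            break  # nothing left to evaluate
        # Only surviving candidates pay the cost of this stage's features.
        scores = X[np.ix_(idx, feature_idx)] @ w + b
        alive[idx[scores < 0]] = False
    return alive
```

Ordering stages from cheap to expensive features means that most of the abundant negative candidates are discarded early, which is one way to keep the running time within the radiologist's first read.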
Irrelevant and Redundant Features. When searching for descriptive features, researchers often deploy a large number of experimental image features to describe the identified candidates, which consequently introduces irrelevant and redundant features. Feature selection is essential in CAD systems. A previous LungCAD system [15] utilizes a greedy forward selection approach to select one feature at a time from the feature set according to a certain discriminant score ranking. Recent research has focused more on general sparsity treatments to construct sparse estimates of classifier parameters, such as in [6, 4]. These models control the classifier complexity by sparsity-favoring regularization terms, such as the 1-norm regularization $\|w\|_1 = \sum_i |w_i|$ for a linear classifier of the form $\mathrm{sign}(w^T x)$.
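The effect of a 1-norm penalty can be seen in a small sketch: a linear classifier sign(w^T x) is fit by minimizing a loss plus lam * ||w||_1, and weights of uninformative features are driven exactly to zero. The version below uses logistic loss and proximal gradient descent (soft-thresholding) purely for illustration; this is an assumption of the sketch and not the formulations of [6, 4]. The names `fit_l1_logistic`, `lam` and `step` are hypothetical.

```python
import numpy as np


def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1: shrink each entry towards zero by t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def fit_l1_logistic(X, y, lam=0.1, step=0.1, n_iter=500):
    """Minimize mean logistic loss + lam * ||w||_1; labels y are in {-1, +1}."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iter):
        margins = y * (X @ w)
        # Gradient of mean log(1 + exp(-margin)) with respect to w.
        grad = -(X.T @ (y / (1.0 + np.exp(margins)))) / n
        w = soft_threshold(w - step * grad, step * lam)
    return w


def predict(X, w):
    """Linear classifier of the form sign(w^T x), with ties mapped to +1."""
    return np.where(X @ w >= 0, 1, -1)
```

Features whose weight ends at exactly zero can be skipped at run time, so sparsity serves both feature selection and speed.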
3. LUNGCAD TRIAL DESIGN
The clinical trial design is illustrated in Figure 1. The principal challenges we faced in designing the clinical trial are described below:
Measure incremental improvement: The principal value of CAD is determined not by its stand-alone performance, but rather by carefully measuring the incremental value of computer-aided diagnosis in normal clinical practice, as reflected in an objectively measured improvement in the radiologist's accuracy.
Patient management impact: It is not enough that LungCAD improves the detection of lung cancer. It must result in a net improvement in patient management, since false positive findings lead to unnecessary follow-ups.
Ground truth: As discussed earlier, due to the unavailability of lung biopsies, an alternative method had to be devised for determining ground truth.
We retrospectively collected MDCT studies from 200 consecutive patients (mean age: 61.5 y, 56% male) who had been referred for evaluation of potential pulmonary nodules from 4 clinical sites: NYU, Univ. of Pennsylvania, Univ. of Maryland and the Cleveland Clinic. The studies were processed by an independent Contract Research Organization (CRO), BioImaging, Inc., Yardley, PA. Four studies were excluded due to respiratory or cardiac motion, or image artifacts.

Figure 1: A multicenter, Multi-Reader Multi-Case (MRMC) retrospective clinical study to assess the incremental value of LungCAD in the identification of pulmonary nodules on thoracic CT examinations (CRO = contract research organization, GR = general radiologist, CR = chest radiologist).
All 196 studies were initially evaluated by 17 board-certified general radiologists (GR) in active community practice, each using a predetermined randomized order, to detect potential nodules of diameter ≥ 3 mm. The GR's were required to score potential nodules on a “nodule” scale, from 1 (“unlikely”) to 10 (“definite”). GR's were also required to determine if each nodule could be identified as “actionable”, again on a 10 point scale (0−2 denoting “no followup needed”, 3−6 “indeterminate”, >6 “definite need for followup”). To illustrate, a benign calcified granuloma would be represented as a true (10), non-actionable (<3) nodule.
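The actionability bands can be written down as a small lookup, shown here as a sketch. The band boundaries are taken from the text; the function name and the decision to accept any integer from 0 to 10 are assumptions of the sketch.

```python
def actionability_category(score):
    """Map an actionability score to the band described in the trial design."""
    if not 0 <= score <= 10:
        raise ValueError("score must lie between 0 and 10")
    if score <= 2:
        return "no followup needed"
    if score <= 6:
        return "indeterminate"
    return "definite need for followup"


# The benign calcified granuloma example: a definite nodule (nodule score 10)
# that is nevertheless non-actionable (actionability score below 3).
granuloma = {"nodule_score": 10, "actionability_score": 1}
granuloma_band = actionability_category(granuloma["actionability_score"])
```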
Then CAD-identified potential nodules were presented to the GR's (after eliminating nodules that had already been found by the GR), and were assessed using the same two scales. The blinded, independent reviews were re-sent to the CRO, where findings were examined by an independent fellowship-trained chest radiologist to consolidate any nodules independently found by more than one GR.
The results were then reviewed separately by 5 fellowship-trained expert chest radiologists (CR) randomly chosen from a panel of 10, each interpreting 100 studies. Expert CR's were required to evaluate each nodule separately, without knowledge of whether it had been identified by radiologists or by CAD, and to assess each one on both a “nodule” and an “actionability” binary decision and its rating. Further, the nodule size and the lung lobe in which each nodule was seen were also recorded. For nodule candidates to be considered true nodules (ground truth), a minimum consensus of 3 out of 5 experts was necessary.
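The consensus rule amounts to a vote count per candidate. The following sketch assumes each candidate comes with one binary decision per expert CR; the data layout and the names `consensus_truth` and `min_votes` are hypothetical.

```python
def consensus_truth(votes, min_votes=3):
    """Label each candidate as a true nodule when enough experts agree.

    votes     -- dict mapping a candidate id to a list of booleans,
                 one per expert chest radiologist (assumed layout)
    min_votes -- minimum number of positive votes required (3 of 5 in the trial)
    """
    return {cid: sum(decisions) >= min_votes for cid, decisions in votes.items()}
```

Keeping `min_votes` as a parameter makes it straightforward to rerun an analysis under a stricter or looser definition of expert truth.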
A note on sample size: Based on pilot studies, we assumed that at least 60% of patients would have a nodule in an average of 3 lobes, that the CR's would have average ROC area without CAD of 0.80 with moderate inter-reader variability, and that CAD would improve the ROC area by 0.025. To yield 80% power in the trial, we estimated that 17 readers and 200 patients would suffice.
4. CLINICAL TRIAL RESULTS
Ground truth was defined as having at least 3 of 5 expert chest radiologists identifying at least one nodule in a lobe (affected lobe); otherwise, lobes were labeled normal. Similarly, an actionable lobe was one in which 3 or more CR's identified one or more actionable nodules.
A total of 1320 (≥ 3 mm) nodules were identified in 196 patients, of which 863 (65.4%) were interpreted by expert CR's as actionable. (Unless specified otherwise, from here on all nodules will be assumed to be in the clinically relevant range of ≥ 3 mm in diameter.) 181 patients had at least 1 nodule (prevalence rate of 92.3%); only 15 patients were interpreted as normal (all lobes were normal). The 1320 nodules were detected in 525 (53.6%) of 980 (= 196 × 5) potentially evaluable lobes, and 397 lobes (40.5% of the 980) had at least one actionable nodule.
The primary measurement for the diagnostic accuracy of the 17 general radiologists (GR), both with and without CAD, for detecting solid pulmonary nodules, is the area under the ROC curve, using lobes as the unit of analysis. A nonparametric estimator was used to adjust for the clustered data, as described in [12]. Sensitivity was defined as the probability that a GR identified at least one nodule in an affected lobe; specificity was defined as the probability that a GR did not identify a nodule in a normal lobe (i.e., correctly identified it as nodule-free).
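These lobe-level definitions can be sketched in a few lines. The code below computes sensitivity and specificity from per-lobe truth and reader calls, and a plain Mann-Whitney estimate of the area under the ROC curve from per-lobe reader scores. It is a simplified illustration: it treats lobes as independent and therefore omits the adjustment for clustering of lobes within patients that the estimator of [12] provides. All function and variable names are hypothetical.

```python
def sensitivity_specificity(affected, called):
    """Lobe-level sensitivity and specificity for one reader.

    affected -- list of booleans, True when the lobe is affected (ground truth)
    called   -- list of booleans, True when the reader marked a nodule in the lobe
    """
    tp = sum(1 for a, c in zip(affected, called) if a and c)
    fn = sum(1 for a, c in zip(affected, called) if a and not c)
    tn = sum(1 for a, c in zip(affected, called) if not a and not c)
    fp = sum(1 for a, c in zip(affected, called) if not a and c)
    return tp / (tp + fn), tn / (tn + fp)


def auc_mann_whitney(scores_affected, scores_normal):
    """Nonparametric AUC: probability that an affected lobe outscores a normal one.

    Ties count one half. Clustering within patients is ignored here.
    """
    wins = 0.0
    for p in scores_affected:
        for n in scores_normal:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_affected) * len(scores_normal))
```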
Figure 2 shows that the 17 GR's accuracy for identifying nodules ranged from 0.704 to 0.853 without CAD, and from 0.738 to 0.883 with CAD. The most important result was that every one of the 17 GR's had statistically significantly greater accuracy with CAD for detecting lung nodules. Assessed collectively, the GR's mean accuracies were 0.780 and 0.828, without and with CAD, respectively (p < 0.001; 95% CI for the difference of 0.036 to 0.059), as shown in Figure 3.

Figure 2: Area under the receiver operating characteristic curve with and without CAD for each of the 17 readers, for actionable solid nodules.
Figure 3: Average nonparametric ROC curve of all 17 readers for detecting nodules without and with CAD.
Similar results were achieved for the clinically significant actionable nodules: the 17 GR's accuracy ranged from 0.699 to 0.854 without CAD, and from 0.760 to 0.880 with CAD. Again, every one of the 17 GR's had statistically significantly greater accuracy with CAD for identifying actionable lung nodules. We stress these findings because most CAD trials demonstrate a statistically significant increase for the readers considered as a group, with only some of the readers individually having statistically significantly greater accuracy. These results are particularly significant because every GR showed statistically significant improvement for both tasks: detecting nodules, and identifying actionable nodules. Figure 4 shows the ROC performance for all 17 readers without and with CAD for identifying actionable nodules.

Figure 4: Average nonparametric ROC curve of all 17 readers without and with CAD for identifying actionable nodules.
We varied the definition of expert truth by changing the number of expert confirmations required for acceptance from any 1 CR to 2, 3, 4, or 5 expert CR's, for both nodules and actionable nodules. With one exception, every one of the 17 GR's showed statistically significant improvement with CAD, both for detection of nodules and for identification of actionable nodules. (The sole exception was the case in which all 5 expert CR's must agree about an actionable nodule, which tended to happen for fewer and more obvious actionable nodules, thus making it harder to show statistically significant improvement; the trend was nevertheless towards improvement with CAD.) In another analysis, statistically significant improvement in GR's accuracy was achieved for all nodules regardless of size (≥ 3 mm).
To determine the patient management impact, we estimated the number of patients for whom CAD led to a positive management change (i.e., a recommendation for additional imaging studies and/or biopsy in an actionable lobe which was missed without CAD), and the number of patients for whom CAD led to a negative management change (a recommendation for additional studies and/or biopsy in a normal lobe which was correctly diagnosed without CAD). As this is a patient-level analysis, patients with both positive and negative management changes were labelled as a positive change, under the assumption that detecting a missed nodule is more beneficial to a patient than the risk of an unnecessary follow-up (typically another imaging exam).

The average number of patients with a positive management change resulting from using CAD was 24.8 (averaged across the GR's), meaning that 7.9 patients (= 196/24.8) must be evaluated with CAD for one positive management change to result, on average. On the other hand, 12 patients had negative management changes (averaged across the 17 GR's), meaning that 16.3 patients must be evaluated with CAD for one negative management change to result. As the positive management changes exceeded the negative management changes on average, this was sufficient to demonstrate a net improvement in patient management, even without considering that on average positive management changes are more beneficial than negative management changes are harmful.
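The "patients evaluated per management change" figures follow from dividing the number of patients by the average number of patients with a change. A short check of that arithmetic, using only the numbers reported above (the function and variable names are illustrative):

```python
def patients_per_change(n_patients, mean_patients_changed):
    """Average number of patients read with CAD per one management change."""
    return n_patients / mean_patients_changed


N_PATIENTS = 196
per_positive = patients_per_change(N_PATIENTS, 24.8)  # about 7.9
per_negative = patients_per_change(N_PATIENTS, 12)    # about 16.3

# Positive changes occurred roughly twice as often as negative ones.
positive_to_negative_ratio = 24.8 / 12                 # about 2.07
```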
Additional details on the multi-reader, multi-case (MRMC) statistical methodology used are provided in our submission and in [16]. The LungCAD clinical trial summary of safety and effectiveness [8] (which is available on the FDA's web site) contains many more results and analyses, including patient-level analysis of GR's increase in accuracy with CAD and bootstrap sampling to estimate variability of expert CR's.
5. DISCUSSION
To summarize our clinical results, CAD is an effective second reader, both for detecting nodules and for identifying potentially actionable nodules. The false positive rate is acceptably low given the increased rate of positive management changes. These findings have resulted in LungCAD being granted clinical approval by the FDA for detecting solid pulmonary nodules from CT thorax studies. Although some debate remains about the precise value of screening (for breast cancer, and now for lung cancer), all experts agree that early detection is key for improvement of cancer cure rates. Many efforts are ongoing to pave the way for MDCT to be used for identifying lung cancer at early stages.

However, much remains to be done in this area. First, our study focused on solid pulmonary nodules; in high risk patients, part-solid and ground-glass nodules (GGN) are also seen on chest MDCTs. GGNs are defined as nodules with hazy attenuation without obscuration of underlying vascular markings, and will necessitate the development of improved machine learning and image processing methods to detect.

Our focus in this study has been to detect pulmonary nodules. However, the eventual goal is not just to detect nodules, or even to detect actionable nodules, but to detect lung cancer in early stages, and thereby intervene, treat the patient and improve survival. Therefore, CAD needs to move, in the next few years, from detecting nodules to classifying nodules as benign or malignant. A first step could be to report the probability of malignancy, although the clinical and regulatory challenges of designing a trial to prove the efficacy of such a system would be daunting (larger sample size is not the answer; our study took nearly two years to complete, and the FDA is already taking steps to reduce the regulatory burden, while ensuring the safety and efficacy of CAD). An even more intriguing notion would be to identify lesions that are currently benign, but would have a high probability of turning malignant (pre-cancerous lesions), to move from a reactive paradigm of treating cancer to a more proactive paradigm of prevention.
We have described some machine learning challenges in the lung CAD domain and reviewed some of our previous machine learning work. Our methods are not specific to lung cancer only, and have shown equivalent or superior performance on other data sets. For instance, the PECAD (Pulmonary Embolism) problem (that formed the basis of the 2006 KDD Cup) is very different in its evaluation criteria; treatment of PE is systemic (as opposed to localized in lung cancer) and the goal is to identify patients as having one or more PE's or being PE-free. In the ColonCAD problem, the goal is to detect all pre-cancerous polyps; the cost of a false positive is not very high, and the treatment of choice is to remove all potentially suspicious lesions. Yet, despite the very different optimization criteria and the vastly different medical domain knowledge, many of the machine learning methods described here also translate to these and other CAD problems.
6. REFERENCES
[1] American Lung Association. Trends in lung cancer morbidity and mortality report. 2006.
[2] S. G. Armato-III, M. L. Giger, and H. MacMahon. Automated detection of lung nodules in CT scans: preliminary results. Medical Physics, 28(8):1552–1561, 2001.
[3] J. Bi and J. Liang. Multiple instance learning of pulmonary embolism detection with geodesic distance along vascular structure. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 2007.
[4] J. Bi, S. Periaswamy, K. Okada, T. Kubota, G. Fung, M. Salganicoff, and R. B. Rao. Computer aided detection via asymmetric cascade of sparse hyperplane classifiers. In Proceedings of ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2006.
[5] M. Dundar and J. Bi. Joint optimization of cascaded classifiers for computer aided detection. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 2007.
[6] M. Dundar, G. Fung, J. Bi, S. Sandilya, and R. B. Rao. Sparse Fisher discriminant analysis for computer aided detection. In Proceedings of SIAM International Conference on Data Mining, 2005.
[7] M. Dundar, B. Krishnapuram, J. Bi, and R. B. Rao. Learning classifiers when the training data is not IID. In Proceedings of the 20th International Joint Conference on Artificial Intelligence, 2007.
[8] Food and Drug Administration. Siemens Syngo lung CAD summary of safety and effectiveness, PMA No. 0500022. October 2006.
[9] G. Fung, M. Dundar, B. Krishnapuram, and R. B. Rao. Multiple instance algorithms for computer aided diagnosis. In Advances in Neural Information Processing Systems, 2006.
[10] A. Jemal, R. Siegel, E. Ward, T. Murray, J. Xu, and M. J. Thun. Cancer statistics. CA Cancer J. Clin., 57:43–66, 2007.
[11] D. P. Naidich, J. P. Ko, and J. Stoechek. Computer aided diagnosis: Impact on nodule detection amongst community level radiologist. A multi-reader study. In Proceedings of CARS 2004 Computer Assisted Radiology and Surgery, pages 902–907, 2004.
[12] N. A. Obuchowski. Nonparametric analysis of clustered ROC curve data. Biometrics, 53:170–180, 1997.
[13] S. J. Swensen, J. R. Jett, T. E. Hartman, D. E. Midthun, S. J. Mandrekar, S. L. Hillman, A.-M. Sykes, G. L. Aughenbaugh, A. O. Bungum, and K. L. Allen. CT screening for lung cancer: five-year prospective experience. Radiology, 235(1):259–265, 2005.
[14] V. Vural, G. Fung, B. Krishnapuram, J. Dy, and R. B. Rao. Batch-wise classification with applications to computer aided diagnosis. In Proceedings of European Conference on Machine Learning, 2006.
[15] M. Wolf, A. Krishnan, M. Salganicoff, J. Bi, M. Dundar, G. Fung, J. Stoeckel, S. Periaswamy, H. Shen, P. Herzog, and D. P. Naidich. CAD performance analysis for pulmonary nodule detection on thin-slice MDCT scans. In H. Lemke, K. Inamura, K. Doi, M. Vannier, and A. Farman, editors, Proceedings of CARS 2005 Computer Assisted Radiology and Surgery, pages 1104–1108, 2005.
[16] X.-H. Zhou, N. A. Obuchowski, and D. K. McClish. Statistical Methods in Diagnostic Medicine. Wiley, New York, NY, 2002.