Identifying Anatomical Phrases in Clinical Reports by Shallow Semantic Parsing Methods
Vijayaraghavan Bashyam, Ricky K Taira
Medical Imaging Informatics Group,
University of California, Los Angeles
924 Westwood Blvd, Suite 420, Los Angeles, CA 90024
vijay | rtaira @ mii·ucla·edu
Abstract- Natural Language Processing (NLP) is being applied to several information extraction tasks in the biomedical domain. The unique nature of clinical information calls for an NLP system designed specifically for the clinical domain. We describe a method to identify semantically coherent phrases within clinical reports, an important step towards full syntactic parsing within a clinical NLP system. We use this semantic phrase chunker to identify anatomical phrases within radiology reports related to the genitourinary domain. A discriminative classifier based on support vector machines was used to classify words into one of five phrase classification categories. The classifier was trained on 1000 hand-tagged sentences from a corpus of genitourinary radiology reports. Features used by the classifier include n-grams, syntactic tags and semantic labels. Evaluation was conducted on a blind test set of 250 sentences from the same domain. The system achieved overall scores of 0.87 (precision), 0.91 (recall) and 0.89 (balanced F-score). Anatomical phrase extraction can be accomplished rapidly and accurately.
Keywords- Natural language processing, shallow semantic parsing, anatomy phrases, radiology reports, support vector machines
I. INTRODUCTION
The adoption of the electronic medical record by hospitals in the United States has resulted in the generation of large volumes of textual data every day as a result of routine clinical care. This information is largely in the form of unstructured natural language [1, 2]. Structuring such narrative reports is vital for using the rich information they contain, such as descriptions of the state of disease. The need for tools to extract such information from biomedical text has often been stressed [3]. We have developed a method to identify semantically coherent phrases within medical reports as an important step towards full syntactic parsing. Using this technique, we attempted to mine anatomical phrases from radiology reports within the genitourinary domain. Anatomical phrase identification is of utmost importance for clinical natural language processing (NLP) because clinical reports consist primarily of anatomical concepts associated with other concepts. For example, a radiology report contains descriptions of findings in anatomical locations, and a surgery report contains a description of actions performed on anatomical parts. Correctly identifying anatomy phrases is also an important step towards coding concepts to a standard vocabulary.
This paper reports the performance of this NLP system in identifying anatomical phrases in urology-related radiology reports. The remainder of this paper is organized as follows. Section II provides a brief background on the need for semantic phrasal chunking and reviews previous methods used for similar tasks. Section III describes the problem formalization, data collection and implementation of the methods. Section IV summarizes the results of this experiment, Section V presents an error analysis and future directions for this project, and Section VI concludes.
II. BACKGROUND
A. Need for Semantic Phrase Chunking
‘Phrase chunking’ can be defined as the identification of logically coherent, non-overlapping sequences of words within a sentence. It is an intermediate step towards full syntactic parsing [4] and is an effective method of reducing the dimensionality of the overall NLP task. Traditionally, phrase chunking has been primarily syntactic in nature; that is, the boundaries of the extracted phrase are marked according to a grammar so that the resulting phrase conforms to an established syntactic structure. The most frequently occurring phrases in this scheme are noun phrases, verb phrases and prepositional phrases [5]. The phrase boundaries are usually marked according to the conventions of the Penn Treebank [6].
The disadvantage of using syntactic chunking, at least in the clinical domain, is the difficulty of obtaining grammatically correct sentences in medical reports. This problem has been acknowledged in the domain of clinical pediatric literature [7] and, to a lesser extent, even in general language [8, 9]. Physicians often dictate their diagnoses to a speech recognition system that transcribes the dictation to text. Although the physician manually inspects the report to correct transcription errors, it is uncommon to find medical reports with strict punctuation. Peculiarities such as homophones and abbreviations are sources of noise in automated and manual transcription. Physicians often use partial sentences (e.g., “5cm mass seen,”)
This work was supported in part by the following grants: 1. National Institute of Biomedical Imaging and Bioengineering NIBIB P01-EB00216, 2. National Institutes of Health R01-EB002247
precision of 88% for the task of identifying the specified anatomical terminology.
To summarize, most previous methods for syntactic chunking used a small feature set limited to words, their POS tags, and the contextual words with their POS tags, and were able to achieve good performance.
III. METHODS
A. Problem Formalization
We model the problem of chunking as a classification problem in which each word is tagged with a label indicating whether or not it is part of an anatomical phrase. We use the five-class tagging scheme described by Kudo and Matsumoto [22]. The goal of our classifier is to tag each word token in the sentence with one of the following five outcomes: a) Begin [B], b) End [E], c) Inside [I], d) Single [S], and e) Outside [O]. The working definition of each outcome is given in Table I. For example, the markup of the anatomy description phrases in the sentence “A chest mass in the right upper lobe is seen” is shown in Fig. 2.
TABLE I. DEFINITION OF CLASSIFIER OUTCOMES
Class  Definition
B      Token is the beginning of a phrase consisting of more than one token
E      Token is the end of a phrase consisting of more than one token
I      Token is between the start and end of a phrase consisting of more than two tokens
S      Token is the lone token of a phrase consisting of only one token
O      Token is outside of the phrase
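Once the phrase boundaries are known, the definitions in Table I fully determine the tag of every token. The minimal Python sketch below illustrates this, assuming phrases are supplied as hypothetical non-overlapping (start, end) token spans with an exclusive end; it reproduces the markup of Fig. 2 and is only an illustration of the scheme, not the annotation tool used in this study.

```python
from typing import List, Tuple


def bieso_tags(n_tokens: int, spans: List[Tuple[int, int]]) -> List[str]:
    """Assign B/I/E/S/O tags (Table I) from non-overlapping (start, end) spans.

    Spans are token offsets with an exclusive end; tokens covered by no span stay "O".
    """
    tags = ["O"] * n_tokens
    for start, end in spans:
        if end - start == 1:
            tags[start] = "S"          # lone token of a one-token phrase
        else:
            tags[start] = "B"          # first token of a multi-token phrase
            for i in range(start + 1, end - 1):
                tags[i] = "I"          # tokens strictly between B and E
            tags[end - 1] = "E"        # last token of a multi-token phrase
    return tags


tokens = "A chest mass in the right upper lobe is seen".split()
# "chest" (token 1) and "right upper lobe" (tokens 5-7) are the anatomy phrases.
print(list(zip(tokens, bieso_tags(len(tokens), [(1, 2), (5, 8)]))))
```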
A  chest  mass  in  the  right  upper  lobe  is  seen
O  S      O     O   O    B      I      E     O   O
Figure 2. Class markup for individual tokens in a sentence
We used support vector machines (SVMs) as the classifier for our task. SVMs are inherently binary classifiers but can also be applied to multi-class problems. SVMs have previously been shown to be highly accurate for syntactic chunking [22], dependency parsing [26] and text categorization [27, 28].
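The paper does not state how the binary SVMs were combined for the five-class task. One common strategy, shown here purely as an assumed illustration, is one-vs-rest: train one binary "class versus all others" SVM per outcome and assign the class whose decision value w·x + b is largest. The weights, bias values and feature strings below are hypothetical toy numbers, not trained models.

```python
from typing import Dict, List, Tuple

CLASSES = ["B", "I", "E", "S", "O"]

# A linear model is a (weights, bias) pair; weights map feature names to floats.
LinearModel = Tuple[Dict[str, float], float]


def decision_value(model: LinearModel, features: List[str]) -> float:
    """Linear SVM decision function w.x + b for binary (present/absent) features."""
    weights, bias = model
    return sum(weights.get(f, 0.0) for f in features) + bias


def predict_one_vs_rest(models: Dict[str, LinearModel], features: List[str]) -> str:
    """Pick the class whose 'class vs. rest' model gives the highest decision value."""
    return max(CLASSES, key=lambda c: decision_value(models[c], features))


# Hypothetical toy models: every class starts with an empty weight vector.
models: Dict[str, LinearModel] = {c: ({}, -1.0) for c in CLASSES}
models["S"] = ({"w=chest": 2.0, "sem=physobj.anatomy": 1.5}, -1.0)
models["O"] = ({}, -0.5)  # slightly favour "outside" when no other model fires

print(predict_one_vs_rest(models, ["w=chest", "sem=physobj.anatomy"]))  # S
print(predict_one_vs_rest(models, ["w=is"]))  # O
```

Other combination schemes, such as pairwise (one-vs-one) voting, are equally possible with SVMlight's binary models.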
B. Data Collection
In any pattern classification task, it is desirable to compile a large number of high-quality training examples that reflect the underlying statistical distribution of the data pool. A representative sample of training data is important because the training examples determine how the classifier will behave. Any inconsistencies or errors in tagging can cause significant performance degradation. Thus, decisions have to be made on how to handle somewhat ambiguous tagging assignments such as: “Left native kidney,” “Right true pelvis,” “Loops of presumed colon,” and “On the right side, the femur.”
medical text of partial descriptions (ellipsis) that require some prior knowledge, either expressed within a previous portion of the text or simply understood within the domain. For example, the words “tip,” “end,” and “apex” may or may not refer to some landmark on an anatomical organ.
The following steps were used to identify sentences with anatomical phrases:
1. Over twelve thousand radiology reports related to urology were captured from the existing hospital database at our institution using an XML-based gateway [29]. The reports covered all radiological modalities, including magnetic resonance imaging, computed tomography, ultrasound, fluoroscopy, and plain film radiography.
2. Section boundary detection was performed to break each report into individual sections such as HEADER, HISTORY, FINDINGS, CONCLUSIONS, etc. Sentence boundary detection was then performed on the sections. Both modules have been previously tested, with recall and precision of over 99% within the domain of radiology [30].
3. A lexical analyzer processed each sentence, performing tokenization, part-of-speech tagging and semantic class tagging. Our lexicon categorizes tokens into twenty syntactic categories and over three hundred semantic categories, providing improved discrimination for tasks such as syntactic parsing and semantic interpretation.
4. Word-sense disambiguation was performed on commonly occurring words with very different meanings (e.g., ‘T1’ as an anatomy entity or as a signal type in magnetic resonance imaging). Dates, measurements, and special symbols (e.g., tumor staging classifications) were also recognized in this step. The reports were deidentified at this stage [31].
5. Using a high-recall sentence-level phrase parser, all possible anatomical phrase instances within a sentence were conservatively identified, and sentences that obviously have no candidates were rejected. For example, the filter would reject sentences that do not contain at least one word from the following semantic classes: selfReferenceLocation (e.g., neck) and physobj.anatomy (e.g., lung).
6. A domain expert familiar with NLP examined the sentences and hand-tagged the anatomy phrases. The tagged set was verified by a second expert and then stored in a training database to serve as the gold standard for testing and development. For ambiguous terms, the domain experts reached a consensus.
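The candidate filter of step 5 amounts to a lexicon lookup. The sketch below assumes a hypothetical toy lexicon that maps lower-cased tokens to semantic classes (the study's lexicon has over three hundred categories) and keeps a sentence only if at least one token carries one of the two anatomy-related classes named in step 5. The actual step also identifies the candidate phrase instances, which this sketch does not attempt.

```python
from typing import Dict, List, Set

# Semantic classes named in step 5.
ANATOMY_CLASSES: Set[str] = {"selfReferenceLocation", "physobj.anatomy"}

# Hypothetical toy lexicon: lower-cased token -> semantic classes.
TOY_LEXICON: Dict[str, Set[str]] = {
    "neck": {"selfReferenceLocation"},
    "lung": {"physobj.anatomy"},
    "kidney": {"physobj.anatomy"},
    "mass": {"finding"},  # hypothetical non-anatomy class
}


def has_anatomy_candidate(tokens: List[str], lexicon: Dict[str, Set[str]]) -> bool:
    """True if any token belongs to at least one anatomy-related semantic class."""
    return any(lexicon.get(t.lower(), set()) & ANATOMY_CLASSES for t in tokens)


def filter_sentences(sentences: List[List[str]],
                     lexicon: Dict[str, Set[str]]) -> List[List[str]]:
    """High-recall filter: drop only sentences with no anatomy-class token at all."""
    return [s for s in sentences if has_anatomy_candidate(s, lexicon)]


sentences = [["No", "mass", "is", "seen", "."],
             ["The", "left", "kidney", "is", "normal", "."]]
print(filter_sentences(sentences, TOY_LEXICON))
```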
With this approach, we collected 1250 sentences and tagged them for anatomy phrases. We then set aside 250 randomly selected sentences for testing and used the remaining 1000 sentences for training the classifier. This 80-20 ratio is in accordance with the standard followed in the CoNLL-2000 shared task on chunking [13].
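The random hold-out split described above can be sketched as follows. The fixed random seed is an assumption added for reproducibility; the paper states only that the 250 test sentences were randomly selected.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_train_test(items: Sequence[T], n_test: int,
                     seed: int = 0) -> Tuple[List[T], List[T]]:
    """Randomly hold out n_test items for testing; the rest are used for training."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # local RNG: does not touch global state
    return shuffled[n_test:], shuffled[:n_test]


# 1250 tagged sentences (represented here by their indices) -> 1000 train, 250 test.
train, test = split_train_test(range(1250), n_test=250)
print(len(train), len(test))  # 1000 250
```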
C. Implementation
We used the SVMlight implementation of SVMs, freely available from the website svmlight.joachims/. The input to the classifier is the word to be labeled followed by a set of features. In this case, the features included syntactic tags, semantic labels, and contextual words with their syntactic and semantic labels.
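The paper lists the feature types but not their exact encoding. The sketch below assumes a symmetric context window of two tokens on each side (a hypothetical choice) and binary features, and writes each token as one line of SVMlight's sparse text format, "<target> <index>:<value> ...", with feature indices in increasing order and a +1/-1 target for one binary "class versus rest" problem. The POS tags and semantic labels in the example are hypothetical.

```python
from typing import Dict, List


def window_features(words: List[str], pos: List[str], sem: List[str],
                    i: int, width: int = 2) -> List[str]:
    """String features for token i: word, POS tag and semantic label of every
    token within +/- width positions (offset 0 is the token itself)."""
    feats: List[str] = []
    for off in range(-width, width + 1):
        j = i + off
        if 0 <= j < len(words):
            feats += [f"w[{off}]={words[j].lower()}",
                      f"pos[{off}]={pos[j]}",
                      f"sem[{off}]={sem[j]}"]
        else:
            feats.append(f"w[{off}]=<pad>")  # position falls outside the sentence
    return feats


def to_svmlight(target: int, feats: List[str], index: Dict[str, int]) -> str:
    """Format one example as an SVMlight line with binary feature values.

    `index` maps feature strings to integer ids (starting at 1) and grows as
    new features are seen; SVMlight expects ids in increasing order.
    """
    ids = sorted({index.setdefault(f, len(index) + 1) for f in feats})
    return f"{target:+d} " + " ".join(f"{k}:1" for k in ids)


words = "A chest mass in the right upper lobe is seen".split()
pos = ["DT", "NN", "NN", "IN", "DT", "JJ", "JJ", "NN", "VBZ", "VBN"]  # hypothetical
sem = ["-", "physobj.anatomy", "finding", "-", "-", "-", "-",
       "physobj.anatomy", "-", "-"]  # hypothetical labels
index: Dict[str, int] = {}
# Token 1 ("chest") is an [S] token: a positive example for the S-vs-rest model.
print(to_svmlight(+1, window_features(words, pos, sem, 1), index))
```

Sharing one `index` across all examples keeps a feature's id stable between the training and test files.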
The classifier categorized each word into one of the five target categories. The output of the classifier was compared to the gold standard previously created by the domain experts.
IV. RESULTS
Table II summarizes the performance of the individual class assignments output by the phrase chunker on the 250 test sentences from a corpus of genitourinary radiology reports from our institution. Performance on this task is quantified with three rates: 1) precision, the percentage of detected phrases that are correct; 2) recall, the percentage of phrases in the data that were found by the chunker; and 3) balanced F-measure, the harmonic mean of precision and recall with the two weighted equally. The measures are related to the true positive (TP), false negative (FN), and false positive (FP) counts as follows:
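$$\text{Precision} = \frac{TP}{TP + FP} \tag{1}$$

$$\text{Recall} = \frac{TP}{TP + FN} \tag{2}$$

$$F = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{3}$$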
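Equations (1) to (3) translate directly into code. The minimal sketch below applies them to the phrase-level counts of Table III; substituting (1) and (2) into (3) also shows that F = 2TP / (2TP + FP + FN).

```python
from typing import Tuple


def precision_recall_f(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Precision, recall and balanced F-score from raw counts, Eqs. (1)-(3)."""
    precision = tp / (tp + fp)                               # Eq. (1)
    recall = tp / (tp + fn)                                  # Eq. (2)
    f_score = 2 * precision * recall / (precision + recall)  # Eq. (3)
    return precision, recall, f_score


p, r, f = precision_recall_f(tp=350, fp=51, fn=31)  # phrase-level counts, Table III
print(f"{p:.4f} {r:.4f} {f:.4f}")  # 0.8728 0.9186 0.8951
```

Truncated to two decimal places, these values are the 0.87, 0.91 and 0.89 listed in Table III.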
TABLE II. INDIVIDUAL CLASS PERFORMANCE MEASURES
TABLE III. PHRASE IDENTIFICATION PERFORMANCE MEASURES
No. of Phrases  TP   FP  FN  Recall  Precision  F-score
423             350  51  31  0.91    0.87       0.89
Of the 423 anatomical phrases present in the test set, 350 were identified correctly. Of these, 263 were multiword phrases and 87 were single-word phrases. The overall precision for identifying anatomical phrases was 87% and the recall was 91%, as shown in Table III. The overall precision and recall for assigning class labels to tokens were both 91%. The individual class assignment performances are shown in Table II.
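Phrase-level scores such as those in Table III require turning token tags back into phrases and counting matches. The paper does not describe its decoding rules, so the sketch below assumes a simple policy: a phrase is either a single [S] token or a [B] ... [E] run with only [I] tokens in between, ill-formed runs are discarded, and a predicted phrase counts as a true positive only if both of its boundaries match a gold phrase exactly.

```python
from typing import List, Optional, Set, Tuple

Span = Tuple[int, int]  # (start, end) token offsets, end exclusive


def tags_to_spans(tags: List[str]) -> Set[Span]:
    """Recover phrase spans from B/I/E/S/O tags; ill-formed runs are dropped."""
    spans: Set[Span] = set()
    start: Optional[int] = None  # index of the currently open [B], if any
    for i, tag in enumerate(tags):
        if tag == "S":
            spans.add((i, i + 1))
            start = None
        elif tag == "B":
            start = i
        elif tag == "E" and start is not None:
            spans.add((start, i + 1))
            start = None
        elif tag != "I":
            start = None  # an [O], or an [E] with no open [B], discards any open run
    return spans


def phrase_counts(gold: List[str], pred: List[str]) -> Tuple[int, int, int]:
    """Exact-match TP, FP and FN counts between gold and predicted tag sequences."""
    g, p = tags_to_spans(gold), tags_to_spans(pred)
    return len(g & p), len(p - g), len(g - p)


gold = ["O", "S", "O", "O", "O", "B", "I", "E", "O", "O"]  # Fig. 2
pred = ["O", "S", "O", "O", "O", "B", "E", "O", "O", "O"]  # truncated "right upper"
print(phrase_counts(gold, pred))  # (1, 1, 1)
```

Under exact matching, a single boundary error costs both a false positive and a false negative, which is one reason boundary tags weigh heavily on phrase-level scores.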
V. DISCUSSION
We present a system based on discriminative pattern recognition methods that accurately locates anatomical phrases in clinical text reports. Examples of false positive errors included phrases that were abnormally truncated (e.g., by ellipsis), such as the following:
“The residual barium which was previously seen in dilated loops of small has appeared to have passed.”
Examples of false negative errors included phrases that were part of a conjunctive phrase, such as the following:
“Following catheterization of both the urinary bladder
and a vaginal orifice, the urinary bladder and vaginal
were opacified.”
We also note that system performance for the class assignments [B], [E] and [S] is lower than for the other classes. This is expected because it is more difficult to tag the phrase boundaries than the inner words of a phrase. However, since a large number of tokens are either inside [I] or outside [O] a phrase, the overall token-level measures still show high performance. We recognize that the individual class performance scores are more informative than the overall performance scores.
Future directions for the project include incorporating more contextual features and training the classifier to recognize other types of phrases, such as findings, spatial relations, temporal relations, causal relations and existential relations. We will also explore the system’s adaptability to new clinical domains outside of radiology and urology.
VI. CONCLUSION
A fast and accurate anatomy phrase parser has been developed within the application area of genitourinary radiology reports. High system accuracy is achieved by a combination of a large number of domain-specific training examples, a rich set of discriminating features, and a powerful discriminative classifier. This system will be used both as part of an NLP system and as a standalone application to mine anatomical phrases.
Class  No.   TP    FP   FN   Recall  Precision  F-score
B      307   263   45   51   0.83    0.85       0.84
I      601   592   64   33   0.94    0.90       0.92
E      307   293   48   61   0.82    0.85       0.84
O      1596  1467  94   107  0.93    0.93       0.93
S      116   87    11   13   0.87    0.88       0.87
Total  2927  2702  262  265  0.91    0.91       0.91
PERMISSIONS
This project has been approved by the University of California, Los Angeles, Institutional Review Board (IRB) under IRB approval number G0012001-13.
ACKNOWLEDGMENT
The authors thank Carol Demi and the domain experts for their effort in preparing the anatomy training corpus. The authors especially thank Drs. Hooshang Kangarloo and Paul S. Cho for their guidance in this project.
REFERENCES
[1] H. J. Tange, H. C. Schouten, A. D. M. Kester, and A. Hasman, "The Granularity of Medical Narratives and Its Effect on the Speed and Completeness of Information Retrieval," J Am Med Inform Assoc, vol. 5, pp. 571-82, 1998.
[2] F. Hall, "Language of the radiology report," American Journal of Roentgenology, vol. 175, pp. 1239-1241, 2000.
[3] S. Ananiadou, C. Friedman, and J. Tsujii, "Introduction: named entity recognition in biomedicine," Journal of Biomedical Informatics, vol. 37, pp. 393-5, 2004.
[4] S. Abney, "Parsing by chunks," in Principle-Based Parsing. Kluwer Academic Publishers, 1991.
[5] E. F. T. K. Sang and S. Buchholz, "Introduction to the CoNLL-2000 shared task: chunking," Proceedings of the 2nd workshop on Learning language in logic and the 4th conference on Computational natural language learning-Volume 7, pp. 127-132, 2000.
[6] M. P. Marcus, M. A. Marcinkiewicz, and B. Santorini, "Building a large annotated corpus of English: the Penn Treebank," Computational Linguistics, vol. 19(2), pp. 313-330, 1993.
[7] J. Pestian, L. Itert, and W. Duch, "Development of a Pediatric Text-Corpus for Part-of-Speech Tagging," in Intelligent Information Processing and Web Mining, M. A. Klopotek, S. T. Wierzchon, and K. Trojanowski, Eds., 2004, pp. 219-226.
[8] E. Sapir, Language: An Introduction to the Study of Speech. Dover Publications, 2004.
[9] C. D. Manning and H. Schütze, Foundations of Statistical Natural Language Processing. MIT Press, 1999.
[10] Y. Huang, H. J. Lowe, D. Klein, and R. J. Cucina, "Improved Identification of Noun Phrases in Clinical Radiology Reports Using a High-Performance Statistical Natural Language Parser Augmented with the UMLS Specialist Lexicon," Journal of the American Medical Informatics Association, vol. 12, pp. 275-285, 2005.
[11] D. Klein and C. D. Manning, "Fast exact inference with a factored model for natural language parsing," Advances in Neural Information Processing Systems, vol. 15, pp. 3-10, 2003.
[12] D. Klein and C. D. Manning, "Accurate unlexicalized parsing," Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics, pp. 423-430, 2003.
[13] E. Sang and S. Buchholz, "Introduction to the CoNLL-2000 shared task: chunking," in Proceedings of the 2nd workshop on Learning language in logic and the 4th conference on Computational natural language learning, Lisbon, Portugal, 2000, pp. 127-132.
[14] D. Bourigault, "Surface grammatical analysis for the extraction of terminological noun phrases," Proceedings of the 14th conference on Computational linguistics-Volume 3, pp. 977-981, 1992.
[15] A. Voutilainen, "NPtool, a detector of English noun phrases," Proceedings of the Workshop on Very Large Corpora, pp. 48-57, 1993.
[16] K. W. Church, "A stochastic parts program and noun phrase parser for unrestricted text," Proceedings of the Second Conference on Applied Natural Language Processing, vol. 136, 1988.
[17] L. A. Ramshaw and M. P. Marcus, "Text chunking using transformation-based learning," in Proceedings of the Third ACL Workshop on Very Large Corpora, June 1995, pp. 82-94.
[18] J. Veenstra and A. van den Bosch, "Single-Classifier Memory-Based Phrase Chunking," in Proceedings of the Fourth Workshop on Computational Natural Language Learning (CoNLL 2000), Lisbon, Portugal, 2000, pp. 157-159.
[19] M. Osborne, "Shallow parsing as part-of-speech tagging," Proceedings of the 2nd workshop on Learning language in logic and the 4th conference on Computational natural language learning-Volume 7, pp. 145-147, 2000.
[20] A. Ratnaparkhi, "A maximum entropy part-of-speech tagger," in Proceedings of the Empirical Methods in Natural Language Processing, University of Pennsylvania, 1996, pp. 133-142.
[21] C. Johansson, "A Context Sensitive Maximum Likelihood Approach to Chunking," in Proceedings of the 4th Conference on Computational Natural Language Learning (CoNLL-2000), Lisbon, Portugal, 2000, pp. 136-138.
[22] T. Kudo and Y. Matsumoto, "Chunking with support vector machines," in Proceedings of the 2nd Meeting of the North American Chapter of the Association for Computational Linguistics, Carnegie Mellon University, Pittsburgh, Pennsylvania, 2001, pp. 192-199.
[23] R. Koeling, "Chunking with Maximum Entropy Models," Proceedings of CoNLL-2000 and LLL-2000, pp. 139-141, 2000.
[24] T. Zhang, F. Damerau, and D. Johnson, "Text chunking based on a generalization of winnow," Journal of Machine Learning Research, vol. 2, pp. 615-638, 2002.
[25] C. A. Sneiderman, T. C. Rindflesch, and C. A. Bean, "Identification of anatomical terminology in medical text," in Proceedings of the American Medical Informatics Association Symposium, 1998, p. 32.
[26] H. Yamada and Y. Matsumoto, "Statistical dependency analysis with support vector machines," in Proceedings of the 8th International Workshop on Parsing Technologies (IWPT), 2003, pp. 195-206.
[27] S. Dumais, J. Platt, D. Heckerman, and M. Sahami, "Inductive learning algorithms and representations for text categorization," in Proceedings of ACM-CIKM98, 1998, pp. 148-155.
[28] T. Joachims, Learning to Classify Text Using Support Vector Machines: Methods, Theory and Algorithms. Kluwer Academic Publishers, 2002.
[29] A. A. T. Bui, J. D. N. Dionisio, C. A. Morioka, U. Sinha, R. K. Taira, and H. Kangarloo, "DataServer: An Infrastructure to Support Evidence-based Radiology," Acad Radiology, vol. 9, pp. 670-678, 2002.
[30] R. K. Taira and S. G. Soderland, "A statistical natural language processor for medical reports," in Proceedings of the American Medical Informatics Association Symposium, 1999, p. 4.
[31] R. K. Taira, A. A. T. Bui, and H. Kangarloo, "Identification of patient name references within medical documents using semantic selectional restrictions," in Proceedings of the American Medical Informatics Association Symposium, 2002, p. 61.