Summaries with SumUM and its Expansion for the Document Understanding Conference (DUC 2002)

Atefeh Farzindar and Guy Lapalme
Département d'informatique et de recherche opérationnelle
RALI / Université de Montréal
{Farzinda,Lapalme}@iro.umontreal.ca

Horacio Saggion
Department of Computer Science, Natural Language Processing Group
University of Sheffield
H.Saggion@dcs.shef.ac.uk
Abstract
We present the results of the DUC 2002 evaluation of an adaptation of SumUM, which had previously been used for producing abstracts of scientific and technical papers. Although we did not modify the original manually developed algorithms or dictionary of concepts for this evaluation, the results are among the best of all systems that participated in DUC 2002, which was run on a completely different domain.
1 Introduction
SumUM is a text summarization system developed by Horacio Saggion (2002), which produces short automatic abstracts of long scientific and technical documents. Abstracts are produced in two steps: the reader is first presented with an indicative abstract, which identifies the topics of the document. After that, an informative abstract is presented that elaborates some topics selected by the reader. The implementation is based on shallow syntactic and semantic analysis, conceptual identification and text re-generation. For a complete description of the system, the reader is referred to (Saggion and Lapalme, 2002).
For participation in DUC 2002, we adapted SumUM without changing the algorithm or the templates (developed manually from the study of a corpus of abstracts written by professional abstractors). We did not change the algorithm because we wanted to evaluate the performance of the actual system before modifying it in our future investigations.

2 Description of the system
SumUM is a dynamic summarization system for producing indicative-informative summaries of long technical documents. The architecture of the system is shown in Figure 1.

The system is implemented in SICStus Prolog (Release 3.7.1) and Perl, running on Sun workstations (5.6) and Linux machines (RH 6.0).
The implementation of our method relies on the following: the selection of particular types of information from the source text; the instantiation of different types of templates; the selection of templates in order to produce an indicative abstract; the re-generation of a short but novel text which indicates the topics of the document; and the expansion of the indicative text with topic elaboration.
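Although SumUM itself is implemented in SICStus Prolog and Perl, the flow of these steps can be sketched in a few lines of Python. Everything below (function names, cue phrases, the naive sentence splitter) is an invented simplification for illustration, not the actual SumUM code:

```python
"""A minimal sketch of SumUM's stages (hypothetical names; the real
system is written in SICStus Prolog and Perl)."""

def preprocess(raw_text):
    # Naive sentence splitting; the real pre-processing also tags
    # words, titles, paragraphs and sections.
    return [s.strip() for s in raw_text.split('.') if s.strip()]

def select_indicative(sentences):
    # Keep sentences that announce topics (toy cue-phrase criterion).
    cues = ('this paper', 'we present', 'we describe')
    return [s for s in sentences if s.lower().startswith(cues)]

def select_informative(sentences, topic):
    # Elaborate a topic chosen by the reader (toy criterion).
    return [s for s in sentences if topic.lower() in s.lower()]

def generate(selected):
    # The real system re-generates novel sentences from templates;
    # here we simply join the selected material.
    return '. '.join(selected) + '.'

text = ("We present SumUM. The system uses templates. "
        "Templates are instantiated from the source text")
sentences = preprocess(text)
print(generate(select_indicative(sentences)))                # indicative
print(generate(select_informative(sentences, 'templates')))  # informative
```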
The input to the system is a technical article in English: a single document without any markup, but with the following structural elements: title of the article, author information and affiliation, introduction section, main section without prototypical structure, references, and, optionally, author abstract, list of keywords, and acknowledgements.

2.1 Pre-processing and Interpretation
The raw text is tagged and transformed into a structured representation, allowing the following processes to access the structure of the text (words, groups of words, titles, sentences, paragraphs, sections, and so on). Domain-specific transducers are applied in order to identify possible concepts of the discourse domain (such as the authors, the paper, references to other authors, institutions and so on), and linguistic transducers are applied in order to identify noun groups and verb groups. Afterwards, semantic tags marking discourse-domain relations and concepts are added to the different elements of the structure.

[Figure 1: SumUM architecture. The raw text goes through pre-processing and interpretation, producing a text representation with a conceptual index, topical structure, term tree and acronym information, supported by a conceptual dictionary; indicative selection computes the indicative content and potential topics, from which generation produces the indicative abstract; the user selects topics, and informative selection followed by generation produces the informative abstract.]
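As a rough illustration of such transducers, the following Python sketch marks concepts with regular expressions. The patterns and the tag inventory are invented for the example and are far simpler than the real conceptual dictionary:

```python
import re

# Toy domain-specific transducers: each pattern marks a possible
# concept of the discourse domain (patterns invented for the sketch).
DOMAIN_PATTERNS = [
    (re.compile(r'\bthis (?:paper|article|study)\b', re.I), 'PAPER'),
    (re.compile(r'\bthe authors?\b', re.I), 'AUTHOR'),
    (re.compile(r'\b[A-Z][a-z]+ et al\.'), 'REFERENCE'),
]

# Toy linguistic transducer: a crude noun-group pattern
# (determiner followed by one or two lower-case words).
NOUN_GROUP = re.compile(r'\b(?:the|a|an)(?: [a-z]+){1,2}')

def tag(sentence):
    """Return (label, matched text) annotations for one sentence."""
    annotations = [(label, m.group(0))
                   for pattern, label in DOMAIN_PATTERNS
                   for m in pattern.finditer(sentence)]
    annotations += [('NG', m.group(0)) for m in NOUN_GROUP.finditer(sentence)]
    return annotations

print(tag('In this paper the authors extend the summarization method.'))
```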
2.2 Indicative Selection
Its function is to identify sentences of the indicative type for constructing the content of the indicative abstract; it extracts the relevant information of the text and identifies potential topics of the document.

2.3 Informative Selection
This process determines the sentences of the informative type. It elaborates the potential topics computed by the indicative selection according to the interest of the reader.
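A toy illustration of the two selection steps, with invented template types (the real templates were derived from a corpus of abstracts written by professional abstractors):

```python
import re

# Invented template types and patterns, far simpler than SumUM's.
TOPIC_TEMPLATES = {
    'PRESENTATION': re.compile(
        r'(?:we|this (?:paper|article)) presents? (?P<topic>.+)', re.I),
    'DESCRIPTION': re.compile(
        r'(?:we|this (?:paper|article)) describes? (?P<topic>.+)', re.I),
}

def indicative_selection(sentences):
    """Instantiate a template for each indicative-type sentence."""
    filled = []
    for s in sentences:
        for ttype, pattern in TOPIC_TEMPLATES.items():
            m = pattern.match(s)
            if m:
                filled.append({'type': ttype,
                               'topic': m.group('topic').rstrip('.'),
                               'source': s})
    return filled

def informative_selection(sentences, chosen_topics):
    """Keep sentences elaborating the topics the reader asked about."""
    return [s for s in sentences
            if any(t.lower() in s.lower() for t in chosen_topics)]

sents = ['We present a summarization system for technical articles.',
         'The summarization system relies on templates.']
print(indicative_selection(sents)[0]['topic'])
print(informative_selection(sents, ['summarization system']))
```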
2.4 Generation

The output is an indicative abstract composed of complete re-generated sentences. The text is presented to the reader with a list of topics available for expansion, which the reader can decide to elaborate.
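Continuing the sketch above, re-generation can be pictured as rendering each instantiated template through an invented surface form; again, this only illustrates the idea, not SumUM's actual generator:

```python
# Toy surface forms for rendering instantiated templates as short,
# well-formed sentences (invented for the sketch).
SURFACE = {
    'PRESENTATION': 'This document presents {topic}.',
    'DESCRIPTION': 'This document describes {topic}.',
}

def generate(filled_templates):
    """Re-generate one complete sentence per instantiated template."""
    return ' '.join(SURFACE[t['type']].format(topic=t['topic'])
                    for t in filled_templates)

print(generate([{'type': 'PRESENTATION',
                 'topic': 'a summarization system for technical articles'}]))
# -> This document presents a summarization system for technical articles.
```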
3 Evaluation of SumUM in DUC 2002
We participated with SumUM in the DUC 2002 "fully automatic summarization of a single document" track. SumUM produced abstracts of 100 words or less.
The National Institute of Standards and Technology of the U.S. (NIST) produced 60 reference sets of approximately 10 documents each. The document sets were produced using data from the TREC disks used in the question-answering track of TREC-9. Each set contained short documents defined by one of four types of criteria: a single natural disaster event, a single event in any domain, multiple distinct events of a single type, and biographical information mainly about a single individual; there were no long scientific documents. Nevertheless, SumUM is designed for technical articles, and the sources of information that we used for implementing our system are based on the linguistic and conceptual patterns of technical documents. Obviously, the linguistic information in the DUC corpus is different from the information usually found in the technical domain. But before considering any changes, we needed to test the general quality of the actual system on DUC 2002.
4 Expansion of SumUM for DUC 2002
In order to produce abstracts with SumUM for DUC 2002, we redesigned the input to our system. We had to format the source texts in order to be able to recognize the structure of the documents. We then changed the interactive mode of the system and its interface. In the first version of SumUM, at each step of pre-processing, interpretation, indicative selection, informative selection and generation, the user must guide the system and choose an option proposed by the system. When SumUM produces an indicative abstract and presents the potential topics, the user has to decide whether to continue elaborating the topics or to exit the system. In the expanded version of SumUM, all the processes are automatic: it takes a DUC 2002 document set as input and produces abstracts of all the files included in the set as output.
For each set of 10 documents, we execute a single program that performs the following steps automatically and saves the results in the submission format requested by DUC 2002:
Input processing: a Perl program takes a single DUC document, removes the markup and adapts it to the input structure recognizable by SumUM (a sketch of this step together with the overall driver is given after this list).
Pre-processing and interpretation: this process identifies the structure of the document and interprets the sentences of the text.
The main process of SumUM: SumUM generates a short abstract of the text. For the DUC tests, the system is not interactive; it automatically abstracts and elaborates the topics without user intervention.
Output abstract: a Perl program takes the abstract generated by SumUM and adapts it to the submission format requested for DUC 2002.
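The two Perl wrappers are not reproduced here. Purely as an illustration (in Python rather than Perl, with assumed TREC-style tag names, hypothetical paths and a trivial stub in place of the main SumUM process), the batch chain might look like this:

```python
import os
import re

def strip_markup(trec_document):
    """Input processing: keep only the running text of a TREC-style
    SGML document (tag names are the usual TREC ones, assumed here)."""
    bodies = re.findall(r'<TEXT>(.*?)</TEXT>', trec_document, re.S)
    text = re.sub(r'<[^>]+>', ' ', ' '.join(bodies))  # drop inner tags
    return ' '.join(text.split())

def summarize(text):
    """Stand-in for the main SumUM process (pre-processing,
    interpretation, selection and generation); here, a trivial stub
    returning the first 100 words."""
    return ' '.join(text.split()[:100])

def process_set(set_directory, output_directory):
    """Run the whole chain on every file of a document set, without
    user intervention, and save one abstract per document."""
    os.makedirs(output_directory, exist_ok=True)
    for name in sorted(os.listdir(set_directory)):
        with open(os.path.join(set_directory, name)) as f:
            text = strip_markup(f.read())
        with open(os.path.join(output_directory, name + '.abs'), 'w') as out:
            out.write(summarize(text))
```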
5 Evaluation Results
For single-document summaries there were two categories of evaluation: one done by humans (for the abstracts) and one done automatically (for the extracts) by NIST.
To compute the quantitative measure between the summaries generated by the summarization systems (peer abstracts) and the reference (model abstract), the human-created summary was segmented by hand into model units (MUs), and the summaries generated by the summarization systems were segmented into peer units (PUs), which are always sentences.
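The judgements of how much of each model unit is expressed by the peer units were made by human assessors. Purely to illustrate the bookkeeping involved, a crude automatic word-overlap stand-in could look as follows (an invented simplification, not the DUC procedure):

```python
def unit_overlap(model_unit, peer_unit):
    """Toy word-overlap score between one MU and one PU."""
    m = set(model_unit.lower().split())
    p = set(peer_unit.lower().split())
    return len(m & p) / len(m) if m else 0.0

def mean_coverage(model_units, peer_units):
    """Average, over model units, of the best-matching peer unit."""
    return sum(max(unit_overlap(mu, pu) for pu in peer_units)
               for mu in model_units) / len(model_units)

mus = ['a storm hit the coast', 'thousands were evacuated']
pus = ['A storm hit the coast on Monday.']
print(round(mean_coverage(mus, pus), 2))  # -> 0.5
```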
To evaluate the quality of the system-generated summaries, DUC 2002 used twelve quality questions. The quality questions concern aspects like the grammaticality, coherence and organization of abstracts. For each document set, the evaluator reads the peer summary and then makes overall judgments of the peer summary's quality, independent of the model. Table 1 lists the results obtained for the evaluation of the 13 systems that participated in the Single-Document Summarization task at DUC 2002. In the table, Human denotes the result of a human subject and Baseline is the result of the baseline, which merely takes the first 100 words of the document. For the mean score and the count of quality questions, a lower score means better performance.
SumUM was ranked first according to the mean score for questions about the quality of the produced output. This shows the interest of having SumUM regenerate well-formed sentences instead of putting together arbitrary parts of text extracts.
Rank  Mean Quality Questions       Error Count of Quality Questions   Mean Length-Adjusted Coverage
 1    Our system        0.408      Microsoft         0.582            BBN               0.339
 2    Microsoft         0.425      LCC               0.698            Our system        0.299
 3    LCC               0.448      Our system        0.758            LCC               0.293
 4    CCS-NSA           0.537      U. Nijmegen       0.885            Microsoft         0.272
 5    U. Ottawa         0.551      U. Ottawa         0.986            NTT               0.272
 6    NTT               0.552      Imperial College  0.997            CCS-NSA           0.261
 7    U. Nijmegen       0.561      CCS-NSA           1.013            U. Leuven         0.251
 8    Imperial College  0.565      NTT               1.014            U. Nijmegen       0.247
 9    U. Michigan       0.644      U. Lethbridge     1.153            U. Lethbridge     0.240
10    U. Leuven         0.660      U. Leuven         1.210            U. Ottawa         0.232
11    U. Lethbridge     0.676      U. Michigan       1.441            Imperial College  0.228
12    BBN               1.040      BBN               2.637            ISI/GLEANS        0.220
13    ISI/GLEANS        1.281      ISI/GLEANS        3.200            U. Michigan       0.214
      Human             0.354      Human             0.505            Human             0.336
      Baseline          0.490      Baseline          0.718            Baseline          0.255

Table 1: Evaluation results on single-document summarization.
The coverage metric measures how much of a model summary's content was expressed by a system-generated peer summary. For DUC 2002 there was also a desire to:

1) look at the ability of a system to produce a summary shorter than the predefined target length;

2) devise a combined measure of coverage and compression (an illustrative sketch of such a combination is given below).
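This paper does not reproduce the exact combined formula, so the following sketch is only an assumption illustrating the idea: reward content coverage while also rewarding summaries that come in under the target length. The brevity term and the equal weighting are invented for the example:

```python
def length_adjusted_coverage(coverage, peer_length, target_length=100,
                             weight=0.5):
    """Illustrative combination of coverage and compression (an
    assumption, not the official DUC 2002 definition): summaries
    shorter than the target earn a brevity bonus."""
    brevity = max(0.0, 1.0 - peer_length / target_length)
    return (1 - weight) * coverage + weight * brevity

print(length_adjusted_coverage(coverage=0.4, peer_length=80))  # -> 0.3
```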
For mean coverage, a higher score means better performance. For mean length-adjusted coverage, our system was ranked in second place. We think the reason for the good performance of SumUM is that its templates were well designed and regenerate short, complete sentences.
Table 2 lists the results of the evaluation of SumUM in DUC 2002 on the twelve quality questions and the rank of our system among the 13 participating single-document systems. Table 3 lists further experimental results we obtained in this competition.
6 Conclusions
We presented SumUM, a text summarization system which participated in the Single-Document Summarization task, and the benefit of our experience in DUC 2002 for its expansion. Our system was designed for the summarization of long scientific and technical documents, which are very different from the DUC corpus: the test documents were very short for SumUM, with linguistic information different from that of scientific papers. Unfortunately, in DUC 2002 we ended up with many empty outputs, where the patterns and templates of SumUM could not match the new types of information in non-scientific documents. We evaluated only the automatically regenerated abstracts produced by the templates of the system, without replacing the empty outputs with the baseline abstract. This is the reason why we obtained low scores in precision and recall. However, the abstracts automatically generated by SumUM were considered comparable to those of the other summarization systems that participated in DUC 2002 and that had been developed especially for this kind of data. The results so far indicate good general quality of the system when compared with other summarization technologies.
We will continue to enhance our system and to widen its semantic analysis and conceptual identification in order to develop new patterns and templates for different types of documents in the text re-generation.
Acknowledgement
The completion of this research was made possible thanks to Bell Canada's support through its Bell University Laboratories R&D program.
Quality question used in DUC 2002               Result   Rank of SumUM
Q1:  Capitalization errors                       0.06      1
Q2:  Incorrect word order                        0.03      7
Q3:  Subject-verb agreement                      0.00      1
Q4:  Missing component                           0.12      5
Q5:  Unrelated fragments joined                  0.05      5
Q6:  Missing articles                            0.02      8
Q7:  Pronouns lacking antecedents                0.06      5
Q8:  Nouns with unclear referents                0.22     10
Q9:  Noun/noun phrase-pronoun replacement        0.01      9
Q10: Out-of-place conjunctions                   0.02      7
Q11: Unnecessary information repetition          0.04      2
Q12: Sentences in the wrong place                0.01      3

Table 2: Results of the evaluation of SumUM on the twelve quality questions, and the rank of our summarizer among the 13 participating single-document systems.
Measure                                               Result   Rank of SumUM
Score for quality questions with non-0 answers         0.31      2
Count of quality questions with non-0 answers          0.76      3
Mean score for quality questions with non-0 answers    0.41      1
Unmodified mean coverage                               0.08     12
Mean length-adjusted coverage                          0.30      2

Table 3: Evaluation results of SumUM.
References
R. Barzilay and M. Elhadad. 1997. Using lexical chains for text summarization. In Proceedings of the Intelligent Scalable Text Summarization Workshop (ISTS'97), ACL. Madrid, Spain.

E. Hovy and C. Lin. 1997. Automated text summarization in SUMMARIST. In Mani and Maybury (1997), 18-24. Cambridge: MIT Press.

K. Knight and D. Marcu. 2000. Statistics-Based Summarization - Step One: Sentence Compression. In The 17th National Conference of the American Association for Artificial Intelligence, AAAI'2000, pages 703-710, Austin, Texas, July 30-August 3, 2000. Outstanding Paper Award.

C. Lin. 2000. The automated acquisition of topic signatures for text summarization. In Proceedings of the COLING Conference, 2000, Saarbrücken, Germany.

C. Lin. 2001. SEE - Summary Evaluation Environment. www.isi.edu/cyl/SEE/.

Dragomir R. Radev and Kathleen McKeown. 1998. Generating natural language summaries from multiple on-line sources. Computational Linguistics, 24(3):469-500, September 1998.

H. Saggion and G. Lapalme. 2002. Generating informative and indicative summaries with SumUM. To appear in Computational Linguistics, Special Issue on Automatic Summarization.