Sentence Extraction for Legal Text Summarisation
Ben Hachey and Claire Grover
University of Edinburgh
School of Informatics
2 Buccleuch Place
Edinburgh EH8 9LW, UK
{bhachey,grover}@inf.ed.ac.uk
Abstract
We describe a system for generating extractive summaries of texts in the legal domain, focusing on the relevance classifier, which determines which sentences are abstract-worthy. We experiment with naïve Bayes and maximum entropy estimation toolkits and explore methods for selecting abstract-worthy sentences in rank order. Evaluation using standard accuracy measures and using correlation confirms the utility of our approach, but suggests different optimal configurations.
1 Introduction
In the SUM project we are developing a system for summarising legal judgments that is generic and portable and which maintains a mechanism to account for the rhetorical structure of the argumentation of a case. Following Teufel and Moens [2002], we are developing a text extraction system that retains a flavour of the fact extraction approach. This is achieved by combining sentence selection with information about why a certain sentence is extracted, e.g. is it part of a judge's argumentation, or does it contain a decision regarding the disposal of the case? In this way we are able to produce flexible summaries of varying length and for various audiences. Sentences can be reordered, since they have rhetorical roles associated with them, or they can be suppressed if a user is not interested in certain types of rhetorical roles.
We have prepared a new corpus of UK House of Lords judgments (HOLJ) for this work which contains two layers of manual annotation: rhetorical role and relevance. The rhetorical roles represent the sentence's contribution to the overall communicative goal of the document. In the case of HOLJ texts, the communicative goal for each lord is to convince their peers of the soundness of their argument. In the current version of the corpus there are 69 judgments which have been annotated for rhetorical role. The second manual layer is annotation of sentences for 'relevance' as measured by whether they match sentences in hand-written summaries. In the current version of the corpus, 47 of the 69 judgments which have been annotated for rhetorical role have also been annotated for relevance. A third layer of annotation is automatic linguistic annotation, which provides the features which are used by the rhetorical role and relevance classifiers.

2 Classification and Relevance
Following from [Kupiec et al., 1995], machine learning has been the standard approach to text extraction summarisation as it provides an empirical method for combining different information sources about the textual unit under consideration. For relevance prediction, we performed experiments with publicly available naïve Bayes (NB) and maximum entropy (ME) estimation toolkits. The NB implementation, found in the Weka toolkit, is based on John and Langley's [1995] algorithm incorporating statistical methods for nonparametric density estimation of continuous variables. The ME estimation toolkit, written by Zhang Le, contains a C++ implementation of the LMVM [Malouf, 2002] estimation algorithm. For ME, we use the Weka implementation of Fayyad and Irani's [1993] MDL algorithm to discretise numeric features.

The features that we have been experimenting with for the HOLJ corpus are broadly similar to those used by Teufel and Moens [2002]. They consist of location features encoding the position of the sentence in document, speech and paragraph; a thematic words feature encoding the average tf*idf weight of the sentence terms; a sentence length feature encoding the number of tokens in the sentence; quotation features encoding the percentage of sentence tokens inside an in-line quote and whether or not the sentence is inside a block quote; entity features encoding the presence or absence of named entities in the sentence; and cue phrase features.
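To make the thematic words feature concrete, the sketch below computes the average tf*idf weight of each sentence's terms. The function name, the tokenised input format and the choice to compute idf over the sentences of the document are illustrative assumptions; the paper does not spell out the exact weighting scheme.

    import math
    from collections import Counter

    def thematic_words(sentences):
        """Average tf*idf weight of the terms in each sentence.

        `sentences` is a document given as a list of token lists.
        Here idf is computed over sentences, one plausible choice
        among several."""
        n = len(sentences)
        df = Counter()                     # per-term sentence frequency
        for sent in sentences:
            df.update(set(sent))

        scores = []
        for sent in sentences:
            tf = Counter(sent)             # within-sentence term frequency
            weights = [tf[t] * math.log(n / df[t]) for t in sent]
            scores.append(sum(weights) / len(weights) if weights else 0.0)
        return scores

    doc = [["the", "court", "allowed", "the", "appeal"],
           ["it", "seems", "to", "me", "that", "the", "appeal", "fails"]]
    print(thematic_words(doc))             # one feature value per sentence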
The term 'cue phrase' covers the kinds of stock phrases which are frequently good indicators of rhetorical status (e.g. phrases such as The aim of this study in the scientific article domain and It seems to me that in the HOLJ domain). Teufel and Moens invested a considerable amount of effort in building hand-crafted lexicons where these cue phrases are assigned to one of a number of fixed categories. A primary aim of the current research is to investigate whether this information can be encoded using automatically computable linguistic features. If it can, then this helps to relieve the burden involved in porting systems such as these to new domains. Our preliminary cue phrase feature set includes syntactic features of the main verb (voice, tense, aspect, modality, negation). We also use sentence initial part-of-speech and sentence initial word features to roughly approximate formulaic expressions which are sentence-level adverbial or prepositional phrases. Subject features include the head lemma, entity type, and entity subtype. These features approximate the hand-coded agent features of Teufel and Moens. A main verb lemma feature simulates Teufel and Moens's type of action and a feature encoding the part-of-speech after the main verb is meant to capture basic subcategorisation information.
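As a rough illustration of how some of these features might be read off a POS-tagged sentence, consider the sketch below. The Penn Treebank tag conventions, the auxiliary list and the naive main-verb and voice heuristics are assumptions made for exposition; the SUM system derives these features from fuller automatic linguistic annotation.

    def cue_phrase_features(tagged):
        """Sketch of cue phrase features over a POS-tagged sentence,
        given as [(word, tag), ...] with Penn Treebank tags."""
        words = [w.lower() for w, _ in tagged]
        tags = [t for _, t in tagged]

        feats = {
            "initial_word": words[0],      # approximates formulaic openers
            "initial_pos": tags[0],
            "modal": "MD" in tags,         # modality of the clause
            "negated": any(w in ("not", "n't") for w in words),
        }

        # Naive main verb: the first verb token that is not an auxiliary.
        aux = {"be", "is", "are", "was", "were", "been", "being",
               "have", "has", "had"}
        for i, (w, t) in enumerate(tagged):
            if t.startswith("VB") and w.lower() not in aux:
                feats["main_verb_pos"] = t  # crude tense/aspect signal
                # Passive voice: past participle preceded by a form of "be".
                feats["passive"] = t == "VBN" and i > 0 and words[i - 1] in aux
                # POS after the main verb: crude subcategorisation signal.
                feats["pos_after_verb"] = tags[i + 1] if i + 1 < len(tags) else None
                break
        return feats

    sent = [("It", "PRP"), ("seems", "VBZ"), ("to", "TO"), ("me", "PRP"),
            ("that", "IN"), ("the", "DT"), ("appeal", "NN"), ("fails", "VBZ")]
    print(cue_phrase_features(sent))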
                  NB                  ME
              P     R     F       P     R     F
Cue          34.9  21.5  26.6    66.6  15.2  24.8
Entities     30.7  26.4  28.4    66.8  15.4  25.1
Them. Words  32.2  26.9  29.3    68.6  15.7  25.5
Location     31.6  27.2  29.2    73.4  16.4  26.9
Quotations   31.2  27.7  29.4    71.7  17.4  28.0
Sent. Length 31.7  29.4  29.8    71.4  16.9  27.3

Table 1: Accuracy measures for yes predictions.

3 Experimental Results
Table 1 contains cumulative precision (P), recall (R) and f-scores (F) for the naïve Bayes (NB) and maximum entropy (ME) classifiers on the relevance classification task.[1] Though only the cue phrase feature set performs well individually, all feature sets contribute positively to the cumulative scores, with the exception of sentence length for ME and quotation for NB. Both classifiers perform significantly better than a baseline created by selecting sentences from the end of the document, which obtains P, R and F scores of 46.7, 16.0 and 23.8. F-scores for the best feature combinations are similar to the partial results reported in Teufel and Moens [2002]. Taking the f-score as the best metric to optimise would lead us to choose NB.

[1] Note that this is a strict evaluation that counts only yes predictions. Micro- and macro-averaging over yes and no predictions give f-scores of 87.6 and 67.3 respectively for ME.
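For concreteness, the strict yes-only scoring used in Table 1 amounts to the following computation; a minimal sketch, assuming gold and predicted labels are given as parallel lists of 'yes'/'no' strings.

    def yes_prf(gold, pred):
        """Precision, recall and f-score counting only yes predictions."""
        tp = sum(1 for g, p in zip(gold, pred) if g == "yes" and p == "yes")
        fp = sum(1 for g, p in zip(gold, pred) if g == "no" and p == "yes")
        fn = sum(1 for g, p in zip(gold, pred) if g == "yes" and p == "no")
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        return prec, rec, f

    print(yes_prf(["yes", "no", "yes", "no"],
                  ["yes", "yes", "no", "no"]))   # (0.5, 0.5, 0.5)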
However, a basic aspect of summarisation system design, especially for a system that needs to be flexible enough to suit various user types, is that the size of the summary will be variable. For instance, students may need a 20 sentence summary containing, for example, quite detailed background information, to get the same information a judge would get from a 10 sentence summary. Furthermore, any given user might want to request a longer summary for a certain document. So, what we actually want to do is rate how relevant/extract-worthy a sentence is in such a way that will allow us to select sentences in rank order. Bearing this in mind, precision is probably the more important metric, given that recall will be controlled by the size of the summary. So, ME with all but sentence length features actually appears to be the better approach for sentence extraction.
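With a probabilistic classifier, a variable-length summary then amounts to a cut-off on the ranked sentence list, along these lines. This is a sketch: the probability values would come from the NB or ME classifier, and restoring document order is the simplest of the reordering options discussed in the introduction.

    def extract_summary(sentences, p_yes, k):
        """Return the k highest-scoring sentences, restored to
        document order so the extract reads coherently."""
        ranked = sorted(range(len(sentences)),
                        key=lambda i: p_yes[i], reverse=True)
        chosen = sorted(ranked[:k])        # back to document order
        return [sentences[i] for i in chosen]

    sents = ["s1", "s2", "s3", "s4"]
    probs = [0.2, 0.9, 0.4, 0.7]           # p(y=yes|x) per sentence
    print(extract_summary(sents, probs, 2))    # ['s2', 's4']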
Since we need a ranking rather than a yes/no classification, this might actually be considered a regression task. However, due to the way the corpus was annotated, the target attribute is in fact binary. As both of our classifiers are probabilistic, we use p(y = yes|x) as a way to rank sentences. To evaluate the ranking methods with respect to our binary gold standard, we use the point-biserial correlation coefficient (r_pb). Table 2 contains correlation coefficients between the gold standard yes/no classification and p(y = yes|x) for naïve Bayes (NB) and maximum entropy (ME).[2]

[2] It has been argued that this is actually a better evaluation than standard accuracy measures, which do not account for degree of agreement [Wolf and Gibson, 2004].
                  NB                   ME
               I      C           I       C
Cue          0.187  0.187       0.208   0.208
Entities     0.103  0.211       0.056   0.219
Them. Words  0.016  0.211       0.000   0.227
Location     0.104  0.229      -0.031   0.166
Quotations   0.092  0.233       0.093   0.187
Sent. Length 0.069  0.235       0.000   0.175

Table 2: Point-biserial correlation coefficients.
The I column has scores for the individual feature sets and the C column has cumulative scores. The correlation results are strikingly different for NB and ME. While NB successfully incorporates all features (r_pb = 0.235), ME performs best using only cue phrase, entity and thematic word features (r_pb = 0.227). For ME, the location feature set actually gives a negative correlation. Judging by these results, we would again be likely to choose NB.
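For reference, the point-biserial coefficient can be computed directly from the binary gold labels and the ranked probabilities. A minimal sketch, assuming both classes are present and using the population standard deviation convention:

    import math

    def point_biserial(gold, scores):
        """r_pb between a binary gold standard (True for yes) and
        continuous scores such as p(y=yes|x)."""
        n = len(scores)
        s1 = [s for g, s in zip(gold, scores) if g]
        s0 = [s for g, s in zip(gold, scores) if not g]
        n1, n0 = len(s1), len(s0)
        mean = sum(scores) / n
        sd = math.sqrt(sum((s - mean) ** 2 for s in scores) / n)
        m1, m0 = sum(s1) / n1, sum(s0) / n0
        return (m1 - m0) / sd * math.sqrt(n1 * n0 / (n * n))

    gold = [True, False, True, False, False]
    scores = [0.9, 0.3, 0.7, 0.4, 0.2]
    print(point_biserial(gold, scores))    # roughly 0.94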
4 Conclusions and Future Work
In this paper, we have presented work on the automatic summarisation of legal texts, for which we have compiled a new corpus with annotation of rhetorical status, relevance and linguistic markup. We presented sentence extraction results in classification and ranking frameworks. Naïve Bayes and maximum entropy classifiers achieve significant improvements over the baseline according to standard accuracy measures. We have also used the point-biserial correlation coefficient for quantitative evaluation of our extraction system, the results of which suggest different optimal configurations. In current work, we are developing a user study that will help determine empirically whether correlation coefficients are a better evaluation metric than precision and recall accuracy measures.
References
[Fayyad and Irani, 1993] U. Fayyad and K. Irani. Multi-interval discretization of continuous-valued attributes for classification learning. In IJCAI, 1993.

[John and Langley, 1995] G. H. John and P. Langley. Estimating continuous distributions in Bayesian classifiers. In UAI, 1995.

[Kupiec et al., 1995] J. Kupiec, J. Pedersen, and F. Chen. A trainable document summarizer. In SIGIR, pages 68-73, 1995.

[Malouf, 2002] R. Malouf. A comparison of algorithms for maximum entropy parameter estimation. In CoNLL, 2002.

[Teufel and Moens, 2002] S. Teufel and M. Moens. Summarising scientific articles: experiments with relevance and rhetorical status. Computational Linguistics, 28(4):409-445, 2002.

[Wolf and Gibson, 2004] F. Wolf and E. Gibson. Paragraph-, word-, and coherence-based approaches to sentence ranking: A comparison of algorithm and human performance. In ACL, 2004.