

Supervised and Unsupervised Learning for Sentence Compression
Jenine Turner and Eugene Charniak
Department of Computer Science
Brown Laboratory for Linguistic Information Processing (BLLIP)
Brown University
Providence, RI 02912
{jenine|ec}@cs.brown.edu
Abstract
In Statistics-Based Summarization - Step One: Sentence Compression, Knight and Marcu (Knight and Marcu, 2000) (K&M) present a noisy-channel model for sentence compression. The main difficulty in using this method is the lack of data; Knight and Marcu use a corpus of 1035 training sentences. More data is not easily available, so in addition to improving the original K&M noisy-channel model, we create unsupervised and semi-supervised models of the task. Finally, we point out problems with modeling the task in this way. These suggest areas for future research.
1 Introduction
Summarization in general, and sentence compression in particular, are popular topics. Knight and Marcu (henceforth K&M) introduce the task of statistical sentence compression in Statistics-Based Summarization - Step One: Sentence Compression (Knight and Marcu, 2000). The appeal of this problem is that it produces summarizations on a small scale. It simplifies general compression problems, such as text-to-abstract conversion, by eliminating the need for coherency between sentences. The model is further simplified by being constrained to word deletion: no rearranging of words takes place. Others have performed the sentence compression task using syntactic approaches to this problem (Mani et al., 1999) (Zajic et al., 2004), but we focus exclusively on the K&M formulation. Though the problem is simpler, it is still pertinent to current needs; generation of captions for television and audio scanning services for the blind (Grefenstette, 1998), as well as compressing chosen sentences for headline generation (Angheluta et al., 2004), are examples of uses for sentence compression. In addition to simplifying the task, K&M's noisy-channel formulation is also appealing.
In the following sections, we discuss the K&M noisy-channel model. We then present our cleaned-up and slightly improved noisy-channel model. We also develop unsupervised and semi-supervised (our term for a combination of supervised and unsupervised) methods of sentence compression with inspiration from the K&M model, and create additional constraints to improve the compressions. We conclude with the problems inherent in both models.
2 The Noisy-Channel Model
2.1 The K&M Model
The K&M probabilistic model, adapted from machine translation to this task, is the noisy-channel model. In machine translation, one imagines that a string was originally in English, but that someone adds some noise to make it a foreign string. Analogously, in the sentence compression model, the short string is the original sentence and someone adds noise, resulting in the longer sentence. Using this framework, the end goal is, given a long sentence l, to determine the short sentence s that maximizes P(s|l). By Bayes Rule,

    P(s|l) = P(l|s) P(s) / P(l)    (1)

Since P(l) is fixed for any given long sentence, it suffices to maximize P(l|s) P(s): a source model P(s) over short sentences and a channel model P(l|s) describing how short sentences expand into long ones. The channel model is built from expansion probabilities over aligned parse-tree rules, estimated from the training pairs as

    P_expand(l|s) = count(joint(l,s)) / count(s)    (2)

where count(joint(l,s)) is the count of alignments of the long rule and the short. Many compressions do not align exactly. Sometimes the parses do not match, and sometimes there are deletions that are too complex to be modeled in this way. In these cases sentence pairs, or sections of them, are ignored. The K&M model creates a packed parse forest of all possible compressions that are grammatical with respect to the Penn Treebank (Marcus et al., 1993). Any compression given a zero expansion probability according to the training data is instead assigned a very small probability. A tree extractor (Langkilde, 2000) collects the short sentences with the highest score for P(s|l).
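To make Equation 2 concrete, here is a minimal sketch of how the expansion probabilities could be estimated from aligned rule pairs; the pair-harvesting step, the function names, and the smoothing floor (standing in for the "very small probability" given to unseen expansions) are our own assumptions rather than details from K&M.

from collections import defaultdict

# A "rule" is a flattened CFG production, e.g. ("NP", ("NP", ",", "PP")).
# joint_events is assumed to hold aligned (long_rule, short_rule) pairs
# harvested from parses of the parallel corpus; pairs that do not align
# cleanly are simply skipped, as described above.

def estimate_expansion_probs(joint_events, floor=1e-6):
    """P_expand(l|s) = count(joint(l, s)) / count(s), with a small floor
    standing in for the very small probability given to unseen expansions."""
    joint_counts = defaultdict(int)   # (long_rule, short_rule) -> count
    short_counts = defaultdict(int)   # short_rule -> count

    for long_rule, short_rule in joint_events:
        joint_counts[(long_rule, short_rule)] += 1
        short_counts[short_rule] += 1

    def p_expand(long_rule, short_rule):
        if short_counts[short_rule] == 0:
            return floor
        p = joint_counts[(long_rule, short_rule)] / short_counts[short_rule]
        return p if p > 0 else floor

    return p_expand

# Toy usage: three expansions of NP into "NP , PP" out of ten NP rules.
pairs = [(("NP", ("NP", ",", "PP")), ("NP", ("NP",)))] * 3 + \
        [(("NP", ("NP",)), ("NP", ("NP",)))] * 7
p_expand = estimate_expansion_probs(pairs)
print(p_expand(("NP", ("NP", ",", "PP")), ("NP", ("NP",))))  # 0.3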
2.2 Our Noisy-Channel Model
Our starting implementation is intended to follow the K&M model fairly closely. We use the same 1067 pairs of sentences from the Ziff-Davis corpus, with 32 used as testing and the rest as training. The main difference between their model and ours is that instead of using the rather ad-hoc K&M language model, we substitute the syntax-based language model described in (Charniak, 2001).
We slightly modify the channel model equation to be P(l|s) = P_expand(l|s) · P_deleted, where P_deleted is the probability of adding the deleted subtrees back into s to get l. We determine this probability also using the Charniak language model.
We require an extra parameter to encourage compression, and we create a development corpus of 25 sentences from the training data in order to adjust this parameter. It is odd that we need a parameter to encourage compression when K&M required a parameter to discourage it, but we address this point in the penultimate section.
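To make the role of this parameter concrete, the following sketch combines the source model, the channel model, and a tunable per-word bonus that rewards deletion; the log-linear form, the function name, and the default weight are our assumptions for illustration only.

def compression_score(short_words, log_p_source, log_p_expand, log_p_deleted,
                      length_weight=0.5):
    """Score a candidate compression.  log_p_source is the syntax-based
    language model score of the short sentence, log_p_expand + log_p_deleted
    form the channel model, and the final term is the compression-encouraging
    parameter tuned on the 25-sentence development corpus."""
    log_channel = log_p_expand + log_p_deleted
    length_bonus = -length_weight * len(short_words)  # fewer words, higher score
    return log_p_source + log_channel + length_bonus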
Another difference is that we only generate short versions for which we have rules. If we have never before seen the long version, we leave it alone, and in the rare case when we never see the long version as an expansion of itself, we allow only the short version. We do not use a packed tree structure, because we make far fewer sentences. Additionally, as we are traversing the list of rules to compress the sentences, we keep the list capped at the 100 compressions with the highest P_expand(l|s). We eventually truncate the list to the best 25, still based upon P_expand(l|s).
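The candidate bookkeeping in this paragraph might look roughly like the sketch below; apply_rule, the joint-rule representation, and p_expand are hypothetical stand-ins, since the paper does not spell out these data structures.

import heapq

def generate_candidates(long_tree, joint_rules, p_expand, apply_rule,
                        cap=100, keep=25):
    """Generate compressions only from joint rules seen in training.  While
    traversing the rules, keep at most `cap` candidates ranked by
    P_expand(l|s); afterwards truncate to the best `keep`, still by
    P_expand, before the full noisy-channel score is computed.
    apply_rule(long_tree, long_rule, short_rule) is assumed to return a
    compressed tree, or None if the rule does not apply to this tree."""
    heap = []       # min-heap of (score, tiebreak, tree); weakest on top
    tiebreak = 0
    for long_rule, short_rule in joint_rules:
        short_tree = apply_rule(long_tree, long_rule, short_rule)
        if short_tree is None:
            continue
        score = p_expand(long_rule, short_rule)
        tiebreak += 1
        heapq.heappush(heap, (score, tiebreak, short_tree))
        if len(heap) > cap:
            heapq.heappop(heap)          # drop the weakest candidate
    best = heapq.nlargest(keep, heap)    # truncate to the best `keep`
    return [tree for _, _, tree in best]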
2.3 Special Rules
One difficulty in the use of training data is that so many compressions cannot be modeled by our simple method. The rules it does model, immediate constituent deletion, as in taking out the ADVP of S → ADVP , NP VP ., are certainly common, but many good deletions are more structurally complicated. One particular type of rule arises when the parent has at least one child with the same label as itself, as in NP(1) → NP(2) CC NP(3), and the resulting compression is one of the matching children, here NP(2). There are several hundred rules of this type, and it is very simple to incorporate them into our model.
There are other structures that may be common enough to merit adding, but we limit this experiment to the original rules and our new "special rules."
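A small sketch of how these special rules could be harvested from treebank-style productions: whenever a parent's label matches one of its children, that child alone is proposed as the compressed right-hand side. The (parent, children) tuple representation is our own convention, not the paper's.

def special_rules(rule):
    """Given a production like ('NP', ('NP', 'CC', 'NP')), yield joint rules
    that compress the right-hand side down to a single child whose label
    matches the parent, e.g. NP(1) -> NP(2) CC NP(3) compressed to NP(2)."""
    parent, children = rule
    for i, child in enumerate(children):
        if child == parent:
            long_side = (parent, tuple(children))
            short_side = (parent, (child,))
            yield (long_side, short_side, i)  # i records which child survives

# Example: either conjunct of an NP coordination is a candidate compression.
for joint in special_rules(("NP", ("NP", "CC", "NP"))):
    print(joint)
# (('NP', ('NP', 'CC', 'NP')), ('NP', ('NP',)), 0)
# (('NP', ('NP', 'CC', 'NP')), ('NP', ('NP',)), 2)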
3 Unsupervised Compression
One of the biggest problems with this model of sentence compression is the lack of appropriate training data. Typically, abstracts do not seem to contain short sentences matching long ones elsewhere in a paper, and we would prefer a much larger corpus. Despite this lack of training data, very good results were obtained both by the K&M model and by our variant. We create a way to compress sentences without parallel training data, while sticking as closely to the K&M model as possible.
The source model stays the same, and we still pay a probability cost in the channel model for every subtree deleted. However, the way we determine P_expand(l|s) changes because we no longer have a parallel text. We create joint rules using only the first … of the Penn Treebank. We count all probabilistic context-free grammar (PCFG) expansions, and then match up similar rules as unsupervised joint events.
We change Equation 2 to calculate P_expand(l|s) without parallel data. First, let us define svo (shorter version of) to be: r1 svo r2 iff the right-hand side of r1 is a subsequence of the right-hand side of r2. Then define

    P_expand(l|s) = count(l) / Σ_{r : s svo r} count(r)    (3)

so that the probability of expanding a short rule s into a long rule l is the treebank count of l, normalized over all rules of which s is a shorter version.
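A minimal sketch of this estimate, assuming PCFG expansion counts have already been collected from treebank parses as described above; the subsequence test implements the svo relation, and restricting the normalization to rules that share a left-hand side is our assumption.

from collections import Counter

def is_subsequence(short_rhs, long_rhs):
    """svo: short_rhs is a shorter version of long_rhs if its symbols appear
    in long_rhs in order (not necessarily contiguously)."""
    it = iter(long_rhs)
    return all(symbol in it for symbol in short_rhs)

def unsupervised_p_expand(rule_counts):
    """rule_counts: Counter over (lhs, rhs) PCFG expansions from the treebank.
    Returns P_expand(l|s) = count(l) / sum of counts of all rules r with the
    same left-hand side such that s svo r."""
    def p_expand(long_rule, short_rule):
        lhs, long_rhs = long_rule
        s_lhs, short_rhs = short_rule
        if lhs != s_lhs or not is_subsequence(short_rhs, long_rhs):
            return 0.0
        denom = sum(c for (r_lhs, r_rhs), c in rule_counts.items()
                    if r_lhs == lhs and is_subsequence(short_rhs, r_rhs))
        return rule_counts[long_rule] / denom if denom else 0.0
    return p_expand

# Toy usage with made-up counts:
counts = Counter({("NP", ("NP", ",", "PP")): 4, ("NP", ("NP",)): 6})
p = unsupervised_p_expand(counts)
print(p(("NP", ("NP", ",", "PP")), ("NP", ("NP",))))  # 4 / (4 + 6) = 0.4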
Because NPs that can be deleted are not labeled with their syntactic role, we add another constraint that disallows deletion of NPs.
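As an illustration, a constraint of this kind could be enforced by filtering proposed subtree deletions by their root label, as in the sketch below; everything beyond the NP restriction itself is hypothetical.

def deletion_allowed(subtree_label, constrained_labels=("NP",)):
    """Reject a proposed deletion if its root label is in the constrained set.
    Because candidate NPs are not labeled with their syntactic role, NPs are
    never deleted; other labels could be added to the tuple as needed."""
    return subtree_label not in constrained_labels

# Example: a PP may be proposed for deletion, an NP may not.
print(deletion_allowed("PP"))  # True
print(deletion_allowed("NP"))  # False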
6 Evaluation
As with Knight and Marcu's (2000) original work, we use the same 32 sentence pairs as our Test Corpus, leaving us with 1035 training pairs. After adjusting the supervised weighting parameter, we fold the development set back into the training data.
We presented four judges with nine compressed versions of each of the 32 long sentences: a human-generated short version, the K&M version, our first supervised version, our supervised version with our special rules, our supervised version with special rules and additional constraints, our unsupervised version, our supervised version with additional constraints, our semi-supervised version, and our semi-supervised version with additional constraints. The judges were asked to rate the sentences in two ways: the grammaticality of the short sentences on a scale from 1 to 5, and the importance of the short sentence, or how well the compressed version retained the important words from the original, also on a scale from 1 to 5. The short sentences were randomly shuffled across test cases.
The results in Table 1 show compression rates, as well as average grammar and importance scores across judges.
There are two main ideas to take away from these results. First, we can get good compressions without paired training data. Second, we achieved a good boost by adding our additional constraints in two of the three versions.
Note that importance is a somewhat arbitrary distinction, since according to our judges, all of the computer-generated versions do as well in importance as the human-generated versions.
6.1 Examples of Results
In Figure 1, we give four examples of most compression techniques in order to show the range of performance that each technique spans. In the first two examples, we give only the versions with constraints, because there is little or no difference between the versions with and without constraints.
Example 1 shows the additional compression obtained by using our special rules. Figure 2 shows the parse trees of the original pair of short and long versions. The relevant expansion is NP → NP1 , PP in the long version and simply NP1 in the short version. The supervised version that includes the special rules learned this particular common special joint rule from the training data and could apply it to the example case. This supervised version compresses better than either version of the supervised noisy-channel model that lacks these rules. The unsupervised version does not compress at all, whereas the semi-supervised version is identical with the better supervised version.
Example 2 shows how unsupervised and semi-supervised techniques can be used to improve compression. Although the final length of the sentences is roughly the same, the unsupervised and semi-supervised versions are able to take the action of deleting the parenthetical. Deleting parentheses was never seen in the training data, so it would be extremely unlikely to occur in this case. The unsupervised version, on the other hand, sees both PRN → -LRB- NP -RRB- and PRN → NP in its training data, and the semi-supervised version capitalizes on this particular unsupervised rule.
Example 3 shows an instance of our initial supervised versions performing far worse than the K&M model. The reason is that currently our supervised model only generates compressions that it has seen before, unlike the K&M model, which generates all possible compressions. The rule S → S , … never occurs in the training data, and so a good compression does not exist. The unsupervised and semi-supervised versions do better in this case, and the supervised version with the added constraints does even better.

Example 4 gives an example of the K&M model being outperformed by all of our other models.
7 Problems with Noisy Channel Models of Sentence Compression
To this point our presentation has been rather normal; we draw inspiration from a previous paper, and work at improving on it in various ways. We now deviate from the usual by claiming that while the K&M model works very well, there is a technical problem with formulating the task in this way.
We start by making our noisy-channel notation a …
Figure 1: Example compressions.

Example 1
original: Many debugging features, including user-defined break points and variable-watching and message-watching windows, have been added.
human: Many debugging features have been added.
K&M: Many debugging features, including user-defined points and variable-watching and message-watching windows, have been added.
supervised: Many features, including user-defined break points and variable-watching and windows, have been added.
super (+extra rules, constraints): Many debugging features have been added.
unsuper (+constraints): Many debugging features, including user-defined break points and variable-watching and message-watching windows, have been added.
semi-supervised (+constraints): Many debugging features have been added.
Example 2
original: The faster transfer rate is made possible by an MTI-proprietary data buffering algorithm that off-loads lock-manager functions from the Q-bus host, Raimondi said.
human: The algorithm off-loads lock-manager functions from the Q-bus host.
K&M: The faster rate is made possible by a MTI-proprietary data buffering algorithm that off-loads lock-manager functions from the Q-bus host, Raimondi said.
supervised: Raimondi said.
super (+extra rules): Raimondi said.
super (+extra rules, constraints): The faster transfer rate is made possible by an MTI-proprietary data buffering algorithm, Raimondi said.
unsuper (+constraints): The faster transfer rate is made possible, Raimondi said.
semi-supervised (+constraints): The faster transfer rate is made possible, Raimondi said.
Table 1: Compression rates and average grammar and importance scores across judges.

System                                        Compression  Grammar  Importance
humans                                        70.37%       4.96     3.85
supervised                                    67.41%       4.64     3.66
supervised with extra rules and constraints   79.11%       4.77     3.93
unsupervised with constraints                 81.19%       4.51     4.18
semi-supervised with constraints              --           4.75     --
Figure 2: Partial parse trees for Example 1. The long version's parse contains the subject NP (NP (JJ Many) (JJ debugging) (NNS features)) followed by the PP (PP (VBG including) (NP (NP (JJ user-defined) (NN break) (NNS points)) (CC and) (NP (JJ message-watching) (NNS windows)))); the short version contains only the subject NP.
