AllenNLP Series, Part 4: Coreference Resolution
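Coreference resolution is one of the major tasks in natural language processing and an indispensable part of information extraction. In information extraction, the events and inter-entity semantic relations a user cares about are often scattered across different positions in a text, and the entities involved can usually be expressed in several different ways; for example, an entity in a semantic relation may appear as a pronoun. To extract the relevant information accurately and without omissions, the coreference phenomena in the text must be resolved. Coreference resolution matters not only for information extraction; it is also critical in applications such as machine translation, text summarization, and question answering.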
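Take the first sentence of this article as an example: "Coreference resolution is one of the major tasks in natural language processing, and it is an indispensable part of information extraction." Here "it" refers to "coreference resolution".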
One nice thing about AllenNLP is that it provides coreference resolution out of the box. Its own introduction reads:
Coreference Resolution
Coreference resolution is the task of finding all expressions that refer to the same entity in a text. It is an important step for many higher level NLP tasks that involve natural language understanding, such as document summarization, question answering and information extraction. Our implementation is based on a neural model which considers all possible spans in the document as potential mentions and learns distributions over possible antecedents for each span. This approach achieved state-of-the-art results on the dataset in early 2017. The AllenNLP implementation achieves 63.0% F1 on the CoNLL test set. Please note that this model does not include speaker features (impractical for general use), variational dropout (currently difficult to implement in Pytorch) or data augmentation and considers 100 antecedents rather than 250 due to memory constraints.
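The basic principles behind coreference resolution are covered in Lecture 15 of Stanford's CS224n course. The core idea is to find all mentions in a sentence, pair them up two by two, and score each pair, as illustrated in the course slides: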
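Because the machine does not know in advance which mentions will end up in the same coreference cluster, every pair of mentions has to be formed and then scored.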
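After scoring, the mentions are clustered according to their pair scores, and the resulting clusters are the resolved coreference chains.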
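The pair, score, and cluster idea described above can be sketched in a few lines of Python. This is only an illustration, not the CS224n or AllenNLP implementation: the mention list, the `toy_score` function, and the 0.5 threshold are all hypothetical stand-ins for a learned model.

```python
from itertools import combinations


def cluster_mentions(mentions, score_pair, threshold=0.5):
    """Link every mention pair scoring above `threshold`, then group the links."""
    parent = list(range(len(mentions)))  # union-find forest over mention indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Score all pairs: the machine cannot know the clusters in advance.
    for i, j in combinations(range(len(mentions)), 2):
        if score_pair(mentions[i], mentions[j]) > threshold:
            parent[find(j)] = find(i)

    clusters = {}
    for idx, mention in enumerate(mentions):
        clusters.setdefault(find(idx), []).append(mention)
    # Singletons are not coreference clusters, so drop them.
    return [group for group in clusters.values() if len(group) > 1]


# Hypothetical hand-written link table standing in for a trained scorer.
TOY_LINKS = {("The woman", "her")}


def toy_score(a, b):
    return 1.0 if (a, b) in TOY_LINKS or (b, a) in TOY_LINKS else 0.0


mentions = ["The woman", "a newspaper", "the bench", "her", "her dog"]
print(cluster_mentions(mentions, toy_score))
```

With n mentions this loop scores n(n-1)/2 pairs, which is why the number of candidate mentions matters so much for speed.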
1. Principles of the paper
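AllenNLP integrates the coreference resolution algorithm from the 2017 paper End-to-end Neural Coreference Resolution (EMNLP 2017). The problem it addresses is the one described above: the number of pairings grows very quickly with document length (quartically, if every span is treated as a candidate mention). The paper therefore adopts strategies to reduce the number of pairings and improve speed, while also improving accuracy.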
Scoring all span pairs in our end-to-end model is impractical, since the complexity would be quartic in the document length. Therefore we factor the model over unary mention scores and pairwise antecedent scores, both of which are simple functions of the learned span embedding. The unary mention scores are used to prune the space of spans and antecedents, to aggressively reduce the number of pairwise computations.
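To make the quartic blow-up and the effect of pruning concrete, here is a minimal sketch. It is not the paper's code: `toy_mention_score` is a hypothetical unary scorer, the `keep_ratio` and `max_antecedents` values are illustrative assumptions, and details such as the handling of crossing spans are omitted.

```python
def enumerate_spans(num_tokens, max_width):
    """All inclusive (start, end) spans that are at most max_width tokens wide."""
    return [(s, e)
            for s in range(num_tokens)
            for e in range(s, min(s + max_width, num_tokens))]


def prune_spans(spans, mention_score, num_tokens, keep_ratio=0.4):
    """Keep the top keep_ratio * num_tokens spans by unary mention score."""
    budget = max(1, int(keep_ratio * num_tokens))
    top = sorted(spans, key=mention_score, reverse=True)[:budget]
    return sorted(top)  # back to document order


def candidate_pairs(spans, max_antecedents):
    """Pair each span with at most max_antecedents spans that precede it."""
    pairs = []
    for i, span in enumerate(spans):
        for antecedent in spans[max(0, i - max_antecedents):i]:
            pairs.append((antecedent, span))
    return pairs


def toy_mention_score(span):
    # Hypothetical scorer: prefers shorter spans. A real model learns this.
    start, end = span
    return -(end - start)


num_tokens = 12
all_spans = enumerate_spans(num_tokens, max_width=num_tokens)
unpruned = candidate_pairs(all_spans, max_antecedents=len(all_spans))
kept = prune_spans(all_spans, toy_mention_score, num_tokens)
pruned = candidate_pairs(kept, max_antecedents=2)
print(len(all_spans), len(unpruned))  # spans grow as O(T^2), pairs as O(T^4)
print(len(kept), len(pruned))         # after pruning, both grow linearly in T
```

Since the number of kept spans is proportional to the document length and each span sees a bounded number of antecedents, the pairwise scoring cost becomes linear in the document length rather than quartic.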
Its technical framework is as follows:
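The framework consists of two steps. The input is word embeddings (including character embeddings); from these the model produces each candidate mention together with its score, and a head attention mechanism is introduced to improve the pairing.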
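The head attention step can be illustrated with a small pure-Python sketch: within a span, each token receives a score, the scores are normalized with a softmax over the span, and the span's "head" vector is the weighted sum of the token vectors. The vectors and scores below are made-up toy values; in the paper the scores come from a learned network, which is not reproduced here.

```python
import math


def head_attention(token_vectors, token_scores, start, end):
    """Softmax-weighted average of the token vectors in span [start, end]."""
    scores = token_scores[start:end + 1]
    vectors = token_vectors[start:end + 1]
    peak = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    dim = len(vectors[0])
    head = [sum(w * vec[d] for w, vec in zip(weights, vectors))
            for d in range(dim)]
    return weights, head


# Toy inputs (hypothetical): 2-d vectors and one attention score per token.
tokens = ["The", "woman", "reading", "a", "newspaper"]
vectors = [[0.1, 0.0], [0.9, 0.2], [0.3, 0.8], [0.0, 0.1], [0.4, 0.7]]
scores = [0.0, 3.0, 1.0, 0.0, 1.5]

weights, head = head_attention(vectors, scores, start=0, end=4)
for token, weight in zip(tokens, weights):
    print(f"{token:10s} {weight:.3f}")
print(head)
```

With these toy scores most of the weight falls on "woman", so the span representation is dominated by its syntactic head, which is the behavior the mechanism is meant to learn.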
2. The paper in practice
(1) Test example: The woman reading a newspaper sat on the bench with her dog.
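The result shows one cluster that pairs two spans, [0-4] and [10]. Counting tokens from zero, these are "The woman reading a newspaper" and "her".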
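The span indices are token offsets, so they are easy to map back to text. The helper below is a small sketch: `decode_clusters` is a hypothetical name, the token list is written out by hand, and the cluster is assumed to be given as inclusive [start, end] pairs, with [10] written as [10, 10].

```python
def decode_clusters(tokens, clusters):
    """Turn clusters of inclusive [start, end] token spans into text."""
    return [[" ".join(tokens[start:end + 1]) for start, end in cluster]
            for cluster in clusters]


tokens = ["The", "woman", "reading", "a", "newspaper", "sat", "on",
          "the", "bench", "with", "her", "dog", "."]
clusters = [[[0, 4], [10, 10]]]  # the cluster reported for the test sentence

print(decode_clusters(tokens, clusters))
# [['The woman reading a newspaper', 'her']]
```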
The test result is visualized on the web page as shown: