An Intro to ROUGE, and How to Use It to Evaluate Summaries

Updated: 2023-06-24 11:21:36

by Kavita Ganesan

ROUGE stands for Recall-Oriented Understudy for Gisting Evaluation. It is essentially a set of metrics for evaluating automatic summarization of texts as well as machine translations.

It works by comparing an automatically produced summary or translation against a set of reference summaries (typically human-produced). Let's say that we have the following system and reference summaries:

System Summary (what the machine produced):

the cat was found under the bed

Reference Summary (gold standard, usually produced by humans):

the cat was under the bed

If we consider just the individual words, the number of overlapping words between the system summary and reference summary is 6. This raw count, however, does not tell you much as a metric. To get a good quantitative value, we can compute the precision and recall using the overlap.

Simply put, recall (in the context of ROUGE) refers to how much of the reference summary the system summary is recovering or capturing. If we are just considering the individual words, it can be computed as:

Recall = number of overlapping words / total words in reference summary

In this example, the recall would thus be:

Recall = 6 / 6 = 1.0

This means that all the words in the reference summary have been captured by the system summary, which indeed is the case for this example. Voila!
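The recall calculation above can be sketched in a few lines of Python. This is only an illustrative sketch, not the official ROUGE implementation: it assumes lowercasing and whitespace tokenization, and the function names `unigram_overlap` and `unigram_recall` are made up for this example. Repeated words are counted at most as often as they appear in both summaries.

```python
from collections import Counter


def unigram_overlap(system: str, reference: str) -> int:
    """Count words shared by both summaries, clipped by per-word frequency."""
    sys_counts = Counter(system.lower().split())
    ref_counts = Counter(reference.lower().split())
    # Counter & Counter keeps the minimum count of each shared word.
    return sum((sys_counts & ref_counts).values())


def unigram_recall(system: str, reference: str) -> float:
    """Fraction of the reference summary's words found in the system summary."""
    ref_len = len(reference.split())
    return unigram_overlap(system, reference) / ref_len if ref_len else 0.0


system_summary = "the cat was found under the bed"
reference_summary = "the cat was under the bed"

print(unigram_overlap(system_summary, reference_summary))  # 6
print(unigram_recall(system_summary, reference_summary))   # 1.0
```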

This looks really good for a text summarization system. But it does not tell you the other side of the story. A machine-generated summary (system summary) can be extremely long, capturing all words in the reference summary. But many of the words in the system summary may be useless, making the summary unnecessarily verbose.

This is where precision comes into play. In terms of precision, what you are essentially measuring is how much of the system summary was in fact relevant or needed. Precision is measured as:

Precision = number of overlapping words / total words in system summary

In this example, the precision would thus be:

Precision = 6 / 7 = 0.86

This simply means that 6 out of the 7 words in the system summary were in fact relevant or needed. Suppose that, as opposed to the example above, we had the following system summary (System Summary 2):

the tiny little cat was found under the big funny bed

The precision now becomes:

Precision = 6 / 11 = 0.55
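To make the comparison between the two system summaries concrete, here is a minimal sketch of the precision calculation. It assumes lowercasing and whitespace tokenization, and `unigram_precision` is a hypothetical helper name used only for this illustration.

```python
from collections import Counter


def unigram_precision(system: str, reference: str) -> float:
    """Fraction of the system summary's words that also occur in the reference."""
    sys_tokens = system.lower().split()
    ref_tokens = reference.lower().split()
    # Clipped overlap: each word counts at most as often as it appears in both.
    overlap = sum((Counter(sys_tokens) & Counter(ref_tokens)).values())
    return overlap / len(sys_tokens) if sys_tokens else 0.0


reference_summary = "the cat was under the bed"
system_summary_1 = "the cat was found under the bed"
system_summary_2 = "the tiny little cat was found under the big funny bed"

print(round(unigram_precision(system_summary_1, reference_summary), 2))  # 0.86
print(round(unigram_precision(system_summary_2, reference_summary), 2))  # 0.55
```

Both system summaries recover all six reference words, so their recall is identical; only precision exposes the extra words in the second one.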

Now, this doesn't look so good, does it? That is because we have quite a few unnecessary words in the summary. The precision aspect becomes really crucial when you are trying to generate summaries that are concise in nature. Therefore, it is always best to compute both the precision and recall and then report the F-Measure.

If your summaries are in some way forced to be concise through some constraints, then you could consider using just the recall, since precision is of less concern in this scenario.
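As a rough illustration of the F-Measure mentioned above, the sketch below combines precision and recall with a harmonic mean. It assumes the balanced F1 variant, which weights precision and recall equally; other weightings are possible, and the function name `f_measure` is chosen just for this example.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values taken from the two system summaries discussed above.
print(round(f_measure(6 / 7, 1.0), 2))   # 0.92 for the first system summary
print(round(f_measure(6 / 11, 1.0), 2))  # 0.71 for the wordier System Summary 2
```

Even though both summaries have perfect recall, the F-Measure drops for the second one because its precision is lower.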

ROUGE-N, ROUGE-S, and ROUGE-L can be thought of as different granularities of text being compared between the system summaries and reference summaries.

ROUGE-N: measures unigram, bigram, trigram and higher-order n-gram overlap.

ROUGE-L: measures the longest matching sequence of words using LCS (longest common subsequence). An advantage of using LCS is that it does not require consecutive matches, only in-sequence matches that reflect sentence-level word order. Since it automatically includes the longest in-sequence common n-grams, you don't need a predefined n-gram length.

ROUGE-S: considers any pair of words in a sentence, in order, allowing for arbitrary gaps. This is also called skip-gram co-occurrence. For example, skip-bigram measures the overlap of word pairs that can have a maximum of two gaps in between words. As an example, for the phrase "cat in the hat" the skip-bigrams would be "cat in, cat the, cat hat, in the, in hat, the hat".
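The two less familiar building blocks described above, the longest common subsequence behind ROUGE-L and the skip-bigrams behind ROUGE-S, can be sketched as follows. This is an illustrative sketch only: the function names and the `max_gap` parameter are assumptions made for this example, and tokenization is plain whitespace splitting.

```python
from itertools import combinations


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    # prev[j] is the LCS length of the tokens of a processed so far and b[:j].
    prev = [0] * (len(b) + 1)
    for tok_a in a:
        curr = [0]
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def skip_bigrams(tokens, max_gap=None):
    """In-order word pairs; max_gap caps the number of words skipped between them."""
    pairs = []
    for i, j in combinations(range(len(tokens)), 2):
        if max_gap is None or j - i - 1 <= max_gap:
            pairs.append((tokens[i], tokens[j]))
    return pairs


system = "the cat was found under the bed".split()
reference = "the cat was under the bed".split()

lcs = lcs_length(system, reference)
print(lcs)                             # 6
print(lcs / len(reference))            # 1.0 (LCS-based recall)
print(round(lcs / len(system), 2))     # 0.86 (LCS-based precision)

print(skip_bigrams("cat in the hat".split()))
# [('cat', 'in'), ('cat', 'the'), ('cat', 'hat'), ('in', 'the'), ('in', 'hat'), ('the', 'hat')]
```

Because the reference is an in-order subsequence of the system summary (only "found" is extra), the LCS covers the whole reference even though the match is not consecutive.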

For example, ROUGE-1 refers to the overlap of unigrams between the system summary and reference summary. ROUGE-2 refers to the overlap of bigrams between the system and reference summaries.

Let's take the example from above. Let us say we want to compute the ROUGE-2 precision and recall scores.

System Summary:

the cat was found under the bed

Reference Summary:

the cat was under the bed

System Summary Bigrams:

the cat, cat was, was found, found under, under the, the bed

Reference Summary Bigrams:

the cat, cat was, was under, under the, the bed

Based on the bigrams above, the ROUGE-2 recall is as follows:

ROUGE-2 Recall = number of overlapping bigrams / total bigrams in reference summary = 4 / 5 = 0.8

Essentially, the system summary has recovered 4 bigrams out of 5 bigrams from the reference summary, which is pretty good! Now the ROUGE-2 precision is as follows:

ROUGE-2 Precision = number of overlapping bigrams / total bigrams in system summary = 4 / 6 = 0.67

The precision here tells us that out of all the system summary bigrams, there is a 67% overlap with the reference summary. This is not too bad either. Note that as the summaries (both system and reference summaries) get longer and longer, there will be fewer overlapping bigrams. This is especially true in the case of abstractive summarization, where you are not directly re-using sentences for summarization.
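The ROUGE-2 numbers above can be reproduced with a small generic n-gram routine. This is a simplified sketch for a single reference summary, assuming whitespace tokenization; the helper names `ngrams` and `rouge_n` are hypothetical and not taken from any particular ROUGE package.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all consecutive n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def rouge_n(system: str, reference: str, n: int):
    """Return (precision, recall) of n-gram overlap for one reference summary."""
    sys_ngrams = Counter(ngrams(system.split(), n))
    ref_ngrams = Counter(ngrams(reference.split(), n))
    # Clipped overlap: an n-gram counts at most as often as it appears in both.
    overlap = sum((sys_ngrams & ref_ngrams).values())
    precision = overlap / max(sum(sys_ngrams.values()), 1)
    recall = overlap / max(sum(ref_ngrams.values()), 1)
    return precision, recall


system_summary = "the cat was found under the bed"
reference_summary = "the cat was under the bed"

precision, recall = rouge_n(system_summary, reference_summary, 2)
print(round(precision, 2), round(recall, 2))  # 0.67 0.8
```

Calling the same function with `n=1` gives the ROUGE-1 scores from the first example (precision 6/7, recall 1.0).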

The reason one would use ROUGE-2 (or other higher-order ROUGE-N measures) over, or in conjunction with, ROUGE-1 is to also capture the fluency of the summaries or translations. The intuition is that if you more closely follow the word ordering of the reference summary, then your summary is actually more fluent.

For more in-depth information about the evaluation metrics, you can refer to . Which measure to use depends on the specific task that you are trying to evaluate. If you are working on extractive summarization with fairly verbose system and reference summaries, then it may make sense to use ROUGE-1 and ROUGE-L. For very concise summaries, ROUGE-1 alone may suffice, especially if you are also applying stemming and stop word removal.
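To illustrate how stop word removal changes the scores, here is a small sketch that filters tokens before computing ROUGE-1. The stop word list is a tiny hypothetical one invented for this example (real lists are much longer), stemming is left out, and the function names are assumptions.

```python
from collections import Counter

# Hypothetical, deliberately tiny stop word list for illustration only.
STOP_WORDS = {"the", "a", "an", "is", "was", "of"}


def content_tokens(text: str):
    """Lowercase, split on whitespace, and drop stop words."""
    return [tok for tok in text.lower().split() if tok not in STOP_WORDS]


def rouge_1_filtered(system: str, reference: str):
    """Return (precision, recall) of unigram overlap after stop word removal."""
    sys_tokens = content_tokens(system)
    ref_tokens = content_tokens(reference)
    overlap = sum((Counter(sys_tokens) & Counter(ref_tokens)).values())
    precision = overlap / max(len(sys_tokens), 1)
    recall = overlap / max(len(ref_tokens), 1)
    return precision, recall


print(content_tokens("the cat was found under the bed"))
# ['cat', 'found', 'under', 'bed']
print(rouge_1_filtered("the cat was found under the bed", "the cat was under the bed"))
# (0.75, 1.0)
```

With the function words removed, the one extra content word ("found") weighs more heavily, so precision falls from 0.86 to 0.75 while recall stays at 1.0.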
