Evaluation Metrics for Cross-Modal Retrieval (evaluations of cross-modal retrieval)

Updated: 2023-06-07 01:55:09


1: Background

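Cross-modal retrieval has many real-world applications: for example, entering a passage of text and finding the matching image, or finding images that roughly match a spoken description. These are examples of retrieval across modalities (a modality being text, image, speech, and so on). This article focuses on the evaluation metrics for cross-modal retrieval; if the basic concept of cross-modal retrieval is unfamiliar, please read up on it first.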

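In a recent interview, an interviewer asked me: Google can also search with images, so what is the difference between cross-modal retrieval and what Google does? My view is as follows. First, both are retrieval problems, so they are clearly related: both rely on a metric to retrieve the desired results. But they also differ. If you give Baidu an image, it can currently only return similar images, which is really content-based image retrieval (CBIR). Google can return similar images and also some related articles, but those articles all contain the image itself or a similar image (you can try this yourself). So the "cross-modal" search done by search engines is really content-based retrieval: whether the query is text or an image, they only search for similar content within the same modality.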

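The strength of cross-modal retrieval is that it can compare text and images directly (and, of course, images with images). Given an image, if we can extract its semantic content well, the retrieved texts (articles) may not contain the image at all, yet contain exactly a description of it. Such results are out of reach for ordinary same-modality retrieval, which is why cross-modal retrieval can be more accurate.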

2: Evaluation metrics for cross-modal retrieval

Confusion matrix:

TP (True Positive): actual 1, predicted 1

FN (False Negative): actual 1, predicted 0

FP (False Positive): actual 0, predicted 1

TN (True Negative): actual 0, predicted 0

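Accuracy, the overall rate of correct judgments made by a classification model (covering all classes):

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$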
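A minimal NumPy sketch of counting the four confusion-matrix cells and computing accuracy for binary 0/1 labels. The function names and the toy label vectors are hypothetical, chosen only for illustration.

```python
import numpy as np

def confusion_counts(y_true, y_pred):
    """Count TP, FN, FP, TN for binary labels (1 = positive, 0 = negative)."""
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(y_pred).astype(bool)
    tp = int(np.sum(y_true & y_pred))    # actual 1, predicted 1
    fn = int(np.sum(y_true & ~y_pred))   # actual 1, predicted 0
    fp = int(np.sum(~y_true & y_pred))   # actual 0, predicted 1
    tn = int(np.sum(~y_true & ~y_pred))  # actual 0, predicted 0
    return tp, fn, fp, tn

def accuracy(y_true, y_pred):
    """Fraction of all samples whose prediction matches the label."""
    tp, fn, fp, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fn + fp + tn)

y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 1, 0, 0, 0, 1]
print(confusion_counts(y_true, y_pred))  # (3, 1, 1, 3)
print(accuracy(y_true, y_pred))          # 0.75
```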

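Precision: the fraction of returned (predicted-positive) items that are actually positive, $\text{Precision} = \frac{TP}{TP + FP}$. For example, suppose an image contains 12 dogs and we run object detection, which reports 8 dogs, of which 5 are really dogs and 3 are cats; then precision = 5/8.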

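Recall: the fraction of all actual positives that are found, $\text{Recall} = \frac{TP}{TP + FN}$. In the same example (12 dogs in the image, 8 reported, 5 of them real dogs), recall = 5/12.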

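As another example, suppose a search engine returns 30 pages, of which 20 are truly relevant, and there are another 40 relevant pages that it did not return. Then precision = 20/30 = 2/3 and recall = 20/(20+40) = 1/3.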

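The figure above illustrates the relationship between precision and recall. The central circle represents the returned items (those predicted positive; in the examples above, the 8 reported dogs and the 30 returned pages). Precision is the proportion of returned items that are correct (returned positives that are actually positive), and recall is the proportion of all positives that are correctly returned.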
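To check the arithmetic of the dog and search-engine examples, here is a small sketch that computes precision and recall from counts. It uses `fractions.Fraction` so the results print as exact ratios; the helper names are illustrative, not from any library.

```python
from fractions import Fraction

def precision(tp, fp):
    """Share of returned (predicted-positive) items that are truly positive."""
    return Fraction(tp, tp + fp)

def recall(tp, fn):
    """Share of all truly positive items that were returned."""
    return Fraction(tp, tp + fn)

# Dog example: 8 detections = 5 real dogs (TP) + 3 cats (FP); 12 dogs in total, so FN = 12 - 5 = 7
print(precision(5, 3), recall(5, 7))      # 5/8 5/12
# Search example: 30 returned = 20 relevant (TP) + 10 irrelevant (FP); 40 relevant pages missed (FN)
print(precision(20, 10), recall(20, 40))  # 2/3 1/3
```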

Wikipedia puts it this way: Precision can be seen as a measure of exactness or quality, whereas recall is a measure of completeness or quantity. In information retrieval, a perfect precision score of 1.0 means that every result retrieved by a search was relevant (but says nothing about whether all relevant documents were retrieved) whereas a perfect recall score of 1.0 means that all relevant documents were retrieved by the search (but says nothing about how many irrelevant documents were also retrieved).

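Considering only precision or only recall is ill-posed. If we look only at precision, we can return as few positives as possible: for the search engine, if we return just one page and that page is indeed relevant, precision = 1, but we cannot call this a good search engine. If we look only at recall, we can label every sample positive, so false negatives = 0 and recall = TP/(TP+FN) = 1; for the search engine, this means returning every page, which gives recall = 1 but again says nothing good about the engine.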
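The two degenerate strategies can be made concrete with sets. This sketch assumes a hypothetical collection of 1000 pages, of which pages 0 to 59 are relevant (the 20 + 40 relevant pages of the search example); the collection size is an invented number for illustration.

```python
from fractions import Fraction

def precision_recall(returned, relevant):
    """Precision and recall of a non-empty returned set against the set of relevant items."""
    tp = len(returned & relevant)
    return Fraction(tp, len(returned)), Fraction(tp, len(relevant))

all_pages = set(range(1000))  # hypothetical collection of 1000 pages
relevant = set(range(60))     # pages 0..59 are relevant

# Return a single page that happens to be relevant: perfect precision, tiny recall
print(precision_recall({0}, relevant))        # (Fraction(1, 1), Fraction(1, 60))
# Return every page: perfect recall, poor precision
print(precision_recall(all_pages, relevant))  # (Fraction(3, 50), Fraction(1, 1))
```

Each strategy maxes out one metric while the other collapses, which is why the two are combined below.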

In general, precision and recall trade off against each other. The following surgery example illustrates this relationship:

Often, there is an inverse relationship between precision and recall, where it is possible to increase one at the cost of reducing the other. Brain surgery provides an illustrative example of the tradeoff. Consider a brain surgeon tasked with removing a cancerous tumor from a patient's brain. The surgeon needs to remove all of the tumor cells since any remaining cancer cells will regenerate the tumor. Conversely, the surgeon must not remove healthy brain cells since that would leave the patient with impaired brain function. The surgeon may be more liberal in the area of the brain she removes to ensure she has extracted all the cancer cells. This decision increases recall but reduces precision. On the other hand, the surgeon may be more conservative in the brain she removes to ensure she extracts only cancer cells. This decision increases precision but reduces recall. That is to say, greater recall increases the chances of removing healthy cells (negative outcome) and increases the chances of removing all cancer cells (positive outcome). Greater precision decreases the chances of removing healthy cells (positive outcome) but also decreases the chances of removing all cancer cells (negative outcome).

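The general idea: to increase precision, we return as few positives as possible, which lowers the chance of mistakes; but recall then naturally drops, because recall wants the returned positives to cover as large a share of all positives as possible. Likewise, to increase recall we can return everything as positive, but then more mistakes occur (negatives judged as positives), and precision naturally drops.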

2.1 F-measure

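For this reason, precision and recall are rarely used on their own as evaluation metrics. Metrics built from both, such as the F-measure, are generally used to evaluate algorithms.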

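Under the F1-measure, recall and precision are equally important (equal weights):

$$F_1 = \frac{2 \cdot P \cdot R}{P + R}$$

Other F-measures weight recall and precision differently; in general,

$$F_\beta = \frac{(1 + \beta^2) \cdot P \cdot R}{\beta^2 \cdot P + R}$$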

With β = 2, the F2-measure weights recall more heavily; conversely, with β = 0.5, precision carries more weight than recall. β can be any non-negative real number.
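A short sketch of $F_\beta = (1+\beta^2)PR/(\beta^2 P + R)$, evaluated on the search-engine example (precision 2/3, recall 1/3). Returning 0 when both inputs are 0 is a convention assumed here to avoid division by zero.

```python
def f_beta(p, r, beta=1.0):
    """Weighted harmonic mean of precision p and recall r.

    beta > 1 weights recall more heavily, beta < 1 weights precision more heavily.
    """
    b2 = beta ** 2
    denom = b2 * p + r
    return 0.0 if denom == 0 else (1 + b2) * p * r / denom

p, r = 2 / 3, 1 / 3  # the search-engine example
for beta in (0.5, 1.0, 2.0):
    print(beta, round(f_beta(p, r, beta), 4))
```

Because recall is the weaker of the two here, F2 (about 0.370) comes out below F1 (4/9, about 0.444), while F0.5 (5/9, about 0.556) comes out above it.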

2.2 P-R curve (precision-recall curve)

In particular, if true negatives are not very valuable to the problem, or negative examples are abundant, then the PR curve is typically more appropriate. For example, if the classes are highly imbalanced and positive samples are very rare, use the PR curve. One example may be fraud detection, where there may be 10000 non-fraud samples and fewer than 100 fraud samples. In other cases, the ROC curve will be more helpful.

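In other words, with imbalanced classes where positive samples are very rare but important, the PR curve is more suitable. In fraud detection, for example, most transactions are normal, yet the few abnormal ones are what matters. Similarly, in search-engine retrieval the total number of pages is enormous while relevant pages are few, and it is the relevant ones that matter, so the PR curve is the better choice. The underlying reason is that neither precision nor recall involves TN, whereas the ROC curve's false positive rate FP/(FP+TN) stays small whenever TN is huge, which can make a weak model look good. For ROC curves, see the related article.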

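To draw a P-R curve, we sort the samples by the learner's predictions (those at the top are the ones the learner considers "most likely" positive) and compute precision and recall at every position. That is, after sorting, we treat the first item as positive (the learner returns 1 positive), compute precision and recall, and plot a point; then we treat the first 2 items as positive, then 3, and so on, which traces out a curve. In general, the method whose PR curve lies on top performs better, as the accompanying figure illustrates.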
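The procedure above (rank by score, then take the top 1, 2, ..., n items as positive) can be sketched with cumulative sums. The scores and labels below are made-up toy data; plotting `r` on the x-axis against `p` on the y-axis gives the P-R curve.

```python
import numpy as np

def pr_curve(scores, labels):
    """Precision and recall after treating the top 1, 2, ..., n items as positive.

    scores: higher means "more likely positive"; labels: 1 = positive, 0 = negative.
    """
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")  # descending score
    rel = np.asarray(labels)[order]
    tp = np.cumsum(rel)                # true positives among the top k
    k = np.arange(1, len(rel) + 1)     # number of items returned so far
    precision = tp / k
    recall = tp / rel.sum()            # divide by the total number of positives
    return precision, recall

scores = [0.9, 0.8, 0.7, 0.6, 0.5]
labels = [1, 0, 1, 1, 0]
p, r = pr_curve(scores, labels)
print(p)  # precision at k = 1..5
print(r)  # recall at k = 1..5
```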

2.3 Average precision

Precision and recall are single-value metrics based on the whole list of documents returned by the system. For systems that return a ranked sequence of documents, it is desirable to also consider the order in which the returned documents are presented. By computing a precision and recall at every position in the ranked sequence of documents, one can plot a precision-recall curve, plotting precision $p(r)$ as a function of recall $r$. Average precision computes the average value of $p(r)$ over the interval from $r = 0$ to $r = 1$:

$$\operatorname{AveP} = \int_0^1 p(r)\,dr$$

Average precision can be read directly off the PR curve: its value equals the area under the PR curve.

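The integral is hard to compute directly, so it is approximated by a finite sum:

$$\operatorname{AveP} = \sum_{k=1}^{n} P(k)\,\Delta r(k)$$

where $k$ is the rank in the sequence of retrieved documents, $n$ is the number of retrieved documents, $P(k)$ is the precision at cut-off $k$ in the list, and $\Delta r(k)$ is the change in recall from items $k-1$ to $k$.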

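Because each relevant item at rank $k$ raises recall by exactly one over the number of relevant documents, $\Delta r(k) = \operatorname{rel}(k) / (\text{number of relevant documents})$, so the sum above is equivalent to:

$$\operatorname{AveP} = \frac{\sum_{k=1}^{n} P(k)\,\operatorname{rel}(k)}{\text{number of relevant documents}}$$

where $\operatorname{rel}(k)$ is an indicator function equaling 1 if the item at rank $k$ is a relevant document, zero otherwise. Note that the average is over all relevant documents and the relevant documents not retrieved get a precision score of zero. The denominator is TP+FN, i.e. the number of all positives, which follows from the formula. Much code, however, uses n or count(rel(k)==1) in place of TP+FN; is that valid? Dividing by count(rel(k)==1) agrees with TP+FN only when every relevant document appears somewhere in the ranked list, for example when the whole database is ranked, as in the calc_map code of section 2.4, whose tsum counts all relevant items in the database. On a truncated list (top-k), count(rel(k)==1) counts only the relevant items that were retrieved, so the resulting AP is at least as large as the one obtained by dividing by TP+FN.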
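A NumPy sketch of the two AP formulas on a hypothetical ranked relevance list. Both function names are illustrative. The last call shows how the denominator choice discussed above changes the result when one relevant document never appears in the list.

```python
import numpy as np

def average_precision(rel, num_relevant=None):
    """AveP = sum_k P(k) * rel(k) / (number of relevant documents).

    rel: 0/1 relevance of the ranked list (rank 1 first).
    num_relevant: total relevant documents (TP + FN); defaults to count(rel(k)==1),
    which equals TP + FN only if every relevant document appears in the list.
    """
    rel = np.asarray(rel, dtype=float)
    if num_relevant is None:
        num_relevant = rel.sum()
    if num_relevant == 0:
        return 0.0
    precision_at_k = np.cumsum(rel) / np.arange(1, len(rel) + 1)  # P(k)
    return float(np.sum(precision_at_k * rel) / num_relevant)

def average_precision_delta_recall(rel, num_relevant):
    """The same quantity written as sum_k P(k) * delta r(k)."""
    rel = np.asarray(rel, dtype=float)
    precision_at_k = np.cumsum(rel) / np.arange(1, len(rel) + 1)
    recall_at_k = np.cumsum(rel) / num_relevant
    delta_recall = np.diff(recall_at_k, prepend=0.0)  # r(k) - r(k-1), with r(0) = 0
    return float(np.sum(precision_at_k * delta_recall))

ranked = [1, 0, 1, 1, 0]
print(average_precision(ranked))                  # 3 relevant docs, all retrieved
print(average_precision_delta_recall(ranked, 3))  # same value, via delta recall
print(average_precision(ranked, num_relevant=4))  # a 4th relevant doc was never retrieved
```

The first two calls both give 29/36 (about 0.806); with four relevant documents in total the score drops to 29/48 (about 0.604), because the missing document contributes a precision of zero.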

2.4 Mean Average Precision

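Run Q trials (for example, Q retrieval queries) and average their average precisions to obtain mAP:

$$\operatorname{MAP} = \frac{1}{Q} \sum_{q=1}^{Q} \operatorname{AveP}(q)$$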

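mAP can be computed over the complete list of retrieval results. For example, in image retrieval with a database of 10,000 images and 32 query images of unknown content, we use these 32 images to search the 10,000 for similar images. mAP can then be computed on the full ranked list of 10,000 results for each query, as follows: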

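```python
import numpy as np

def calc_hammingDist(B1, B2):
    # Helper assumed here (its definition is not shown in the post):
    # Hamming distance between one {-1,+1} code B1 (length q) and every row of B2 (n x q)
    q = B2.shape[1]
    return 0.5 * (q - np.dot(B2, B1))

def calc_map(qB, rB, query_L, retrieval_L):
    # qB: {-1,+1}^{m x q}, hash codes of the m queries (q = code length)
    # rB: {-1,+1}^{n x q}, hash codes of the n database items
    # query_L: {0,1}^{m x l}, multi-hot labels of the queries (l = number of classes)
    # retrieval_L: {0,1}^{n x l}, multi-hot labels of the database items
    num_query = query_L.shape[0]
    map_score = 0.0
    for i in range(num_query):
        # a database item is relevant if it shares at least one label with the query
        gnd = (np.dot(query_L[i, :], retrieval_L.transpose()) > 0).astype(np.float32)
        tsum = int(np.sum(gnd))  # number of relevant items in the whole database (TP + FN)
        if tsum == 0:
            continue  # no relevant items: this query contributes AP = 0
        hamm = calc_hammingDist(qB[i, :], rB)
        ind = np.argsort(hamm)  # rank the database by Hamming distance, nearest first
        gnd = gnd[ind]
        count = np.linspace(1, tsum, tsum)      # 1, 2, ..., tsum
        tindex = np.where(gnd == 1)[0] + 1.0    # 1-based ranks of the relevant items
        map_score += np.mean(count / tindex)    # AP = mean of P(k) at the relevant ranks
    map_score /= num_query
    return map_score
```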

One can also use top-k mAP, that is, compute mAP using only the first k of the 10,000 ranked results for each query, for example top-50 mAP.
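A minimal sketch of a top-k variant, assuming the same inputs as `calc_map` above. `calc_map_topk` and `hamming_dist` are hypothetical names, and normalising each AP by the number of relevant items inside the top k (the count(rel(k)==1) convention from section 2.3) is one common choice, not the only one; dividing by min(k, all relevant) is another. The toy codes and labels are invented.

```python
import numpy as np

def hamming_dist(b, B):
    """Hamming distances between one {-1,+1} code b (length q) and every row of B (n x q)."""
    q = B.shape[1]
    return 0.5 * (q - B @ b)  # for +/-1 codes, <b1, b2> = q - 2 * hamming(b1, b2)

def calc_map_topk(qB, rB, query_L, retrieval_L, topk=50):
    """Hypothetical top-k variant of calc_map: AP uses only the first topk results.

    AP is normalised by the number of relevant items inside the top k.
    """
    num_query = query_L.shape[0]
    map_sum = 0.0
    for i in range(num_query):
        # relevant = shares at least one label with the query
        gnd = (retrieval_L @ query_L[i] > 0).astype(np.float64)
        order = np.argsort(hamming_dist(qB[i], rB), kind="stable")  # nearest first
        gnd = gnd[order][:topk]
        tsum = int(gnd.sum())
        if tsum == 0:
            continue  # no relevant item in the top k: this query contributes AP = 0
        count = np.arange(1, tsum + 1)           # 1, 2, ..., tsum
        tindex = np.flatnonzero(gnd == 1) + 1.0  # 1-based ranks of the relevant items
        map_sum += np.mean(count / tindex)       # mean of P(k) at the relevant ranks
    return map_sum / num_query

# Toy data (made up): one query, four database items, 4-bit codes, 2 classes
qB = np.array([[1, 1, -1, -1]])
rB = np.array([[1, 1, -1, -1],   # distance 0
               [1, -1, -1, -1],  # distance 1
               [-1, -1, 1, 1],   # distance 4
               [1, 1, 1, -1]])   # distance 1
query_L = np.array([[1, 0]])
retrieval_L = np.array([[1, 0], [0, 1], [1, 1], [1, 0]])
print(calc_map_topk(qB, rB, query_L, retrieval_L, topk=2))
print(calc_map_topk(qB, rB, query_L, retrieval_L, topk=4))
```

Here top-2 mAP is 1.0 while the full-list value (topk=4) is about 0.806 (29/36): with this normalisation, truncating the list ignores relevant items ranked beyond k, so top-k scores can exceed the full-list mAP.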


Article link: https://www.wtabcd.cn/fanwen/fan/82/890085.html
