Evaluation metrics for cross-modal retrieval
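Cross-modal retrieval has many practical applications: given a passage of text, find the matching images; or given a spoken description, retrieve images that roughly match it. Both are examples of retrieval across modalities (text, image, speech, and so on). This post covers the evaluation metrics used for cross-modal retrieval; if the basic concepts of cross-modal retrieval are unfamiliar, please read up on them first.
In a recent interview, the interviewer asked me: Google can also search by image, so how does cross-modal retrieval differ from what Google does? My view is as follows. First, both are retrieval problems, so they share something: both rely on a metric to retrieve the desired results. But they also differ. Given an image, Baidu currently returns only similar images, which is content-based image retrieval. Google can retrieve similar images and also return some related articles, but those articles are ones that contain the same image or similar images (you can try this yourself). So what search engines offer as "cross-modal" retrieval is in fact content-based retrieval: whether for text or images, they only search for similar content within the same modality.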
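TP (True Positive): ground truth is 1, prediction is 1
FN (False Negative): ground truth is 1, prediction is 0
FP (False Positive): ground truth is 0, prediction is 1
TN (True Negative): ground truth is 0, prediction is 0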
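From these four counts the usual definitions follow: precision = TP/(TP+FP) and recall = TP/(TP+FN). A minimal sketch of that bookkeeping is given here; the function names `confusion_counts` and `precision_recall` and the toy labels are illustrative and not part of the original post.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, FN, FP, TN) for binary labels in {0, 1}."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fn, fp, tn


def precision_recall(tp: int, fn: int, fp: int) -> Tuple[float, float]:
    """Precision = TP/(TP+FP), recall = TP/(TP+FN); 0.0 when a denominator is zero."""
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return precision, recall


# Made-up ground truth and predictions.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]
tp, fn, fp, tn = confusion_counts(y_true, y_pred)
print(tp, fn, fp, tn)  # 2 1 1 2
print(precision_recall(tp, fn, fp))
```

Note that TN never enters either formula, which is why both metrics stay informative when negatives vastly outnumber positives.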
Wikipedia says: Precision can be seen as a measure of exactness or quality, whereas recall is a measure of completeness or quantity. In information retrieval, a perfect precision score of 1.0 means that every result retrieved by a search was relevant (but says nothing about whether all relevant documents were retrieved) whereas a perfect recall score of 1.0 means that all relevant documents were retrieved by the search (but says nothing about how many irrelevant documents were also retrieved).
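Considering only precision or only recall is ill-posed. If only precision counts, I can return as few positives as possible: for a search engine, if I return a single page and that page is indeed relevant, then precision = 1, yet we would not call this a good search engine. If only recall counts, I can label every sample as positive, so false negatives = 0 and recall = TP/(TP+FN) = 1; for a search engine this means returning every page, so recall = 1, but that does not make the search engine good either.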
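The two degenerate strategies can be made concrete on a ranked list. The sketch below uses a hypothetical helper `precision_recall_at_k` and a made-up list of ten pages; it only illustrates the argument and is not taken from the original post.

```python
from typing import List, Tuple


def precision_recall_at_k(relevance: List[int], k: int, total_relevant: int) -> Tuple[float, float]:
    """Precision and recall when only the top-k results of a ranked list are returned."""
    hits = sum(relevance[:k])
    return hits / k, hits / total_relevant


# Toy ranked list of 10 pages: 1 = relevant, 0 = irrelevant (made-up data).
ranked = [1, 0, 1, 0, 0, 1, 0, 0, 0, 1]
total = sum(ranked)  # 4 relevant pages in the whole collection

print(precision_recall_at_k(ranked, 1, total))   # (1.0, 0.25): one correct page only
print(precision_recall_at_k(ranked, 10, total))  # (0.4, 1.0): every page returned
```

Returning one page maximizes precision at the cost of recall, and returning everything does the opposite, so neither number alone says whether the ranking is good.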
Often, there is an inverse relationship between precision and recall, where it is possible to increase one at the cost of reducing the other. Brain surgery provides an illustrative example of the tradeoff. Consider a brain surgeon tasked with removing a cancerous tumor from a patient's brain. The surgeon needs to remove all of the tumor cells since any remaining cancer cells will regenerate the tumor. Conversely, the surgeon must not remove healthy brain cells since that would leave the patient with impaired brain function. The surgeon may be more liberal in the area of the brain she removes to ensure she has extracted all the cancer cells. This decision increases recall but reduces precision. On the other hand, the surgeon may be more conservative in the brain she removes to ensure she extracts only cancer cells. This decision increases precision but reduces recall. That is to say, greater recall increases the chances of removing healthy cells (negative outcome) and increases the chances of removing all cancer cells (positive outcome). Greater precision decreases the chances of removing healthy cells (positive outcome) but also decreases the chances of removing all cancer cells (negative outcome).
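The rough idea: to increase precision, return as few positives as possible, so the chance of error shrinks; but recall then naturally drops, because recall wants the returned positives to cover as much of the total set of positives as possible. Likewise, to increase recall, more items have to be returned as positives, which tends to pull precision down.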
2.1 F-measure
When β = 2, the F2-measure weights recall more heavily; conversely, when β = 0.5, precision is weighted more than recall. β can be any non-negative real number.
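The formula itself does not appear in the text above. Assuming the standard definition F_β = (1 + β²)·P·R / (β²·P + R), a minimal sketch shows how β shifts the weight between precision P and recall R; the function name `f_beta` and the sample values are illustrative.

```python
def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """F_beta = (1 + beta^2) * P * R / (beta^2 * P + R); returns 0.0 when undefined."""
    denom = beta ** 2 * precision + recall
    if denom == 0:
        return 0.0
    return (1 + beta ** 2) * precision * recall / denom


# A system with low precision but perfect recall (made-up numbers).
p, r = 0.5, 1.0
print(f_beta(p, r, beta=1.0))  # balanced F1
print(f_beta(p, r, beta=2.0))  # recall weighted more, so the score goes up
print(f_beta(p, r, beta=0.5))  # precision weighted more, so the score goes down
```

With recall higher than precision, F2 > F1 > F0.5, which matches the description of β above.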
2.2 P-R curve (precision-recall curve)
In particular, if true negatives are of little value to the problem, or negative examples are abundant, the PR curve is typically more appropriate. For example, if the classes are highly imbalanced and positive samples are very rare, use the PR curve. One example is fraud detection, where there may be 10000 non-fraud samples and fewer than 100 fraud samples. In other cases, the ROC curve will be more helpful.
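The points of a PR curve can be obtained by sorting the samples by score and recording precision and recall after each position. The sketch below is one plain-Python way to do this; `pr_curve` is a hypothetical helper and the scores and labels are made up.

```python
from typing import List, Sequence, Tuple


def pr_curve(scores: Sequence[float], labels: Sequence[int]) -> Tuple[List[float], List[float]]:
    """Precision and recall after each position of the list sorted by descending score."""
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive label")
    # sorted() is stable, so ties keep their original order.
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    precision, recall, hits = [], [], 0
    for rank, idx in enumerate(order, start=1):
        hits += labels[idx]
        precision.append(hits / rank)
        recall.append(hits / total_pos)
    return precision, recall


scores = [0.9, 0.8, 0.7, 0.6, 0.5]
labels = [1, 0, 1, 0, 0]
precision, recall = pr_curve(scores, labels)
for p, r in zip(precision, recall):
    print(f"recall={r:.2f}  precision={p:.2f}")
```

Only the positives and the items ranked above them affect these points; adding more low-scored negatives at the tail changes nothing until they are reached, which is why the curve suits imbalanced problems.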
2.3 Average precision
Precision and recall are single-value metrics based on the whole list of documents returned by the system. For systems that return a ranked sequence of documents, it is desirable to also consider the order in which the returned documents are presented. By computing a precision and recall at every position in the ranked sequence of documents, one can plot a precision-recall curve, plotting precision $p(r)$ as a function of recall $r$. Average precision computes the average value of $p(r)$ over the interval from $r=0$ to $r=1$:
$$\operatorname{AveP} = \int_0^1 p(r)\,dr$$
We can read the average precision directly off the PR curve: its value equals the area under the PR curve.
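In practice the integral is replaced by a finite sum over every position in the ranked sequence of documents:
$$\operatorname{AveP} = \sum_{k=1}^{n} P(k)\,\Delta r(k)$$
where $k$ is the rank in the sequence of retrieved documents, $n$ is the number of retrieved documents, $P(k)$ is the precision at cut-off $k$ in the list, and $\Delta r(k)$ is the change in recall from items $k-1$ to $k$.
This finite sum is equivalent to:
$$\operatorname{AveP} = \frac{\sum_{k=1}^{n} P(k)\,\operatorname{rel}(k)}{\text{number of relevant documents}}$$
where $\operatorname{rel}(k)$ is an indicator function equaling 1 if the item at rank $k$ is a relevant document, zero otherwise.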
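The indicator-function form of average precision translates almost directly into code. The sketch below is an illustration under stated assumptions: `average_precision` is a hypothetical name, the relevance list is made up, and the optional `num_relevant` argument stands for the denominator (the total number of relevant documents), defaulting to the number of relevant items found in the list.

```python
from typing import Optional, Sequence


def average_precision(relevance: Sequence[int], num_relevant: Optional[int] = None) -> float:
    """AP = sum_k P(k) * rel(k) / num_relevant over a ranked 0/1 relevance list.

    If num_relevant is None, the count of relevant items inside the list is used.
    """
    if num_relevant is None:
        num_relevant = sum(relevance)
    if num_relevant == 0:
        return 0.0
    hits, total = 0, 0.0
    for k, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            total += hits / k  # P(k) at a relevant position
    return total / num_relevant


full = [1, 0, 1, 0, 0, 1]  # whole ranked list, 3 relevant items in total
print(average_precision(full))  # (1/1 + 2/3 + 3/6) / 3

top3 = full[:3]  # the same ranking truncated to the top 3
print(average_precision(top3))                  # denominator: relevant items seen in the top 3
print(average_precision(top3, num_relevant=3))  # denominator: all relevant items in the collection
```

On the full list both denominators coincide; on the truncated list they give different values, so the choice has to be stated when reporting top-k results.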
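The denominator is TP+FN, that is, the total number of positives, which can be computed from the formula. However, a lot of code does not use TP+FN and instead uses n or count(rel(k)==1); I am not sure whether that is acceptable. (When the entire retrieval set is ranked, count(rel(k)==1) equals TP+FN, so the two agree; they differ only when the list is truncated.) For example, the following: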
2.4 Mean Average Precision
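Run Q trials (for example, Q queries) and average the resulting average precisions to get mAP:
$$\operatorname{mAP} = \frac{1}{Q}\sum_{q=1}^{Q}\operatorname{AveP}(q)$$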
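import numpy as np


def calc_hammingDist(B1, B2):
    # Assumed helper, not shown in the original post: Hamming distance
    # between {-1,+1} codes of length q, computed via the inner product.
    q = B2.shape[1]
    return 0.5 * (q - np.dot(B1, B2.transpose()))


def calc_map(qB, rB, query_L, retrieval_L):
    # qB: {-1,+1}^{mxq}  query hash codes
    # rB: {-1,+1}^{nxq}  retrieval-set hash codes
    # query_L: {0,1}^{mxl}  query labels
    # retrieval_L: {0,1}^{nxl}  retrieval-set labels
    num_query = query_L.shape[0]
    mean_ap = 0.0
    for i in range(num_query):
        # An item is relevant if it shares at least one label with the query.
        gnd = (np.dot(query_L[i, :], retrieval_L.transpose()) > 0).astype(np.float32)
        tsum = int(np.sum(gnd))
        if tsum == 0:
            continue  # no relevant item for this query
        hamm = calc_hammingDist(qB[i, :], rB)
        ind = np.argsort(hamm)
        gnd = gnd[ind]  # relevance in ranked order
        count = np.arange(1, tsum + 1)  # 1..tsum: hits so far at each relevant rank
        tindex = np.where(gnd == 1)[0] + 1.0  # 1-based ranks of the relevant items
        mean_ap += np.mean(count / tindex)  # average precision of this query
    return mean_ap / num_query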
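The routine above can be sanity-checked on toy data. The sketch below restates the same computation in plain Python so that it runs on its own; `hamming` and `toy_map` are illustrative names, and the codes and labels are made up with no ties in Hamming distance.

```python
from typing import Sequence


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions where two {-1, +1} codes differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def toy_map(q_codes, r_codes, q_labels, r_labels) -> float:
    """mAP over queries; an item is relevant if it shares at least one label with the query."""
    ap_sum = 0.0
    for code, label in zip(q_codes, q_labels):
        relevant = [int(any(a and b for a, b in zip(label, rl))) for rl in r_labels]
        if sum(relevant) == 0:
            continue  # no relevant item for this query
        # Rank the retrieval set by ascending Hamming distance to the query code.
        order = sorted(range(len(r_codes)), key=lambda j: hamming(code, r_codes[j]))
        hits, ap = 0, 0.0
        for rank, j in enumerate(order, start=1):
            if relevant[j]:
                hits += 1
                ap += hits / rank
        ap_sum += ap / hits  # divide by the number of relevant items in the full list
    return ap_sum / len(q_codes)


q_codes = [[1, 1, 1, 1], [-1, -1, -1, -1]]
r_codes = [[1, 1, 1, 1], [1, 1, 1, -1], [1, 1, -1, -1], [-1, -1, -1, -1]]
q_labels = [[1, 0], [0, 1]]
r_labels = [[1, 0], [0, 1], [1, 0], [0, 1]]
print(toy_map(q_codes, r_codes, q_labels, r_labels))  # both queries have AP = (1 + 2/3) / 2
```

Because every item of the retrieval set is ranked here, the per-query denominator is the total number of relevant items, that is TP+FN.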
You can also use top-k mAP, that is, compute mAP over only the top k of the returned retrieval results (say, out of 10000). The following code computes top-50 mAP: