Triplet Loss Source Code Walkthrough
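It's the last day of 2021, so let's squeeze in some study. Triplet loss matters a lot, and the code is easy to forget without review, so I'm keeping notes here.
The paper is here:
Let's get started!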
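I. Triplet loss
As the figure above shows, a triplet consists of an anchor, a positive, and a negative, written <a, p, n>. The anchor is a reference training sample, the positive is a sample of the same class as the anchor, and the negative is a sample of a different class.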
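What triplet loss does: it reduces the distance between the positive and the anchor and enlarges the distance between the negative and the anchor. From a triplet we can build one positive pair <a, p> and one negative pair <a, n>. The goal of triplet loss is to separate the positive pair from the negative pair by a certain distance (the margin). So we want $d(a, p) < d(a, n)$, and more strictly we want this to hold with a margin, $d(a, p) + \text{margin} < d(a, n)$, which gives the loss:
$L = [\, d(a, p) - d(a, n) + \text{margin} \,]_+$
where $[\cdot]_+ = \max(0, \cdot)$ is the hinge function, which is the MarginRankingLoss that appears in the code shortly.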
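To make the link between the hinge form and MarginRankingLoss concrete, here is a minimal sketch. The distances and the margin of 0.3 are made-up values for illustration only, not taken from the original code.

```python
import torch
import torch.nn as nn

# Hypothetical distances for three triplets (illustrative values only).
dist_ap = torch.tensor([0.5, 1.2, 0.9])  # d(a, p)
dist_an = torch.tensor([1.5, 1.0, 1.0])  # d(a, n)
margin = 0.3                             # assumed margin

# Hinge form written out by hand: max(0, d(a,p) - d(a,n) + margin), averaged.
manual = torch.clamp(dist_ap - dist_an + margin, min=0).mean()

# MarginRankingLoss(x1, x2, y) computes max(0, -y * (x1 - x2) + margin).
# With x1 = dist_an, x2 = dist_ap and y = 1 this is the same expression.
ranking = nn.MarginRankingLoss(margin=margin)
y = torch.ones_like(dist_an)
builtin = ranking(dist_an, dist_ap, y)

print(manual.item(), builtin.item())  # both are about 0.2333
```

The first triplet already satisfies the margin and contributes 0. The other two contribute 0.5 and 0.2, so the mean is 0.7 / 3.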
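II. Code implementation
One thing to get straight first: triplet loss is only a loss function. It does not change how samples pass through the network, so the implementation is entirely about how the loss is written and what each line means.
1. loss.py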
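import torch


def normalize(x, axis=-1):
    # L2-normalize x along `axis`; 1e-12 guards against division by zero
    x = 1. * x / (torch.norm(x, 2, axis, keepdim=True).expand_as(x) + 1e-12)
    return x


def euclidean_dist(x, y):
    # pairwise Euclidean distances between the rows of x [m, d] and y [n, d]
    m, n = x.size(0), y.size(0)
    xx = torch.pow(x, 2).sum(1, keepdim=True).expand(m, n)
    yy = torch.pow(y, 2).sum(1, keepdim=True).expand(n, m).t()
    dist = xx + yy
    # dist = 1 * dist - 2 * (x @ y.t()); older PyTorch wrote this as dist.addmm_(1, -2, x, y.t())
    dist.addmm_(x, y.t(), beta=1, alpha=-2)
    dist = dist.clamp(min=1e-12).sqrt()
    return dist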


def hard_example_mining(dist_mat, labels, return_inds=False):
    assert len(dist_mat.size()) == 2
    assert dist_mat.size(0) == dist_mat.size(1)
    N = dist_mat.size(0)

    # shape [N, N]
    is_pos = labels.expand(N, N).eq(labels.expand(N, N).t())
    is_neg = labels.expand(N, N).ne(labels.expand(N, N).t())

    # `dist_ap` means distance(anchor, positive)
    # both `dist_ap` and `relative_p_inds` with shape [N, 1]
    dist_ap, relative_p_inds = torch.max(
        dist_mat[is_pos].contiguous().view(N, -1), 1, keepdim=True)
    # `dist_an` means distance(anchor, negative)
    # both `dist_an` and `relative_n_inds` with shape [N, 1]
    dist_an, relative_n_inds = torch.min(
        dist_mat[is_neg].contiguous().view(N, -1), 1, keepdim=True)
    # shape [N]
    dist_ap = dist_ap.squeeze(1)  # drop the size-1 dimension
    dist_an = dist_an.squeeze(1)

    # calculate the indices of hard positive and hard negative in dist_mat
    if return_inds:
        # shape [N, N]
        ind = (labels.new().resize_as_(labels)
               .copy_(torch.arange(0, N).long())
               .unsqueeze(0).expand(N, N))
        # shape [N, 1]
        p_inds = torch.gather(
            ind[is_pos].contiguous().view(N, -1), 1, relative_p_inds.data)
        n_inds = torch.gather(
            ind[is_neg].contiguous().view(N, -1), 1, relative_n_inds.data)
        # shape [N]
        p_inds = p_inds.squeeze(1)
        n_inds = n_inds.squeeze(1)
        return dist_ap, dist_an, p_inds, n_inds

    return dist_ap, dist_an


def global_loss(tri_loss, global_feat, labels, normalize_feature=True):
    if normalize_feature:
        global_feat = normalize(global_feat, axis=-1)
    # shape [N, N]
    dist_mat = euclidean_dist(global_feat, global_feat)
    dist_ap, dist_an, p_inds, n_inds = hard_example_mining(
        dist_mat, labels, return_inds=True)
    loss = tri_loss(dist_ap, dist_an)
    return loss, p_inds, n_inds, dist_ap, dist_an, dist_mat
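How should we read this code? Start where the debugger enters, namely def global_loss(tri_loss, global_feat, labels, normalize_feature=True):
    if normalize_feature:
        global_feat = normalize(global_feat, axis=-1)
In this debug run the branch is skipped, meaning it is not executed (normalize_feature is False here). The next step is dist_mat = euclidean_dist(global_feat, global_feat), which computes the Euclidean distance between global_feat and itself. Execution jumps into euclidean_dist(x, y), where x and y are both global_feat.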
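Before stepping into euclidean_dist, a brief aside on the normalize branch that was skipped: it rescales every feature vector to unit L2 length. The sketch below repeats that helper on two made-up feature rows and compares it with torch.nn.functional.normalize, which is used here only as a cross-check and is not part of the original code.

```python
import torch
import torch.nn.functional as F

def normalize(x, axis=-1):
    # Same computation as the helper in loss.py: divide by the L2 norm along `axis`.
    return 1. * x / (torch.norm(x, 2, axis, keepdim=True).expand_as(x) + 1e-12)

feat = torch.tensor([[3.0, 4.0],
                     [0.0, 2.0]])   # illustrative features, not real data
unit = normalize(feat)              # rows become (0.6, 0.8) and (0.0, 1.0)
row_norms = unit.norm(dim=1)        # every row now has length 1
reference = F.normalize(feat, p=2, dim=-1)
```

With unit-length features, every pairwise Euclidean distance lies between 0 and 2, which makes a fixed margin easier to interpret.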
def euclidean_dist(x, y):
    """
    Args:
      x: pytorch Variable, with shape [m, d]
      y: pytorch Variable, with shape [n, d]
    Returns:
      dist: pytorch Variable, with shape [m, n]
    """
    m, n = x.size(0), y.size(0)
    xx = torch.pow(x, 2).sum(1, keepdim=True).expand(m, n)
    yy = torch.pow(y, 2).sum(1, keepdim=True).expand(n, m).t()
    dist = xx + yy
    # dist = 1 * dist - 2 * (x @ y.t()); older PyTorch wrote this as dist.addmm_(1, -2, x, y.t())
    dist.addmm_(x, y.t(), beta=1, alpha=-2)
    dist = dist.clamp(min=1e-12).sqrt()
    return dist
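Let's see how euclidean_dist(global_feat, global_feat) is computed. Suppose each sample in global_feat has d features (the d in the docstring shapes). The i-th sample is then:
$x_i = (x_{i1}, x_{i2}, \dots, x_{id})$
and the sample matrix is:
$X = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1d} \\ x_{21} & x_{22} & \cdots & x_{2d} \\ \vdots & \vdots & \ddots & \vdots \\ x_{m1} & x_{m2} & \cdots & x_{md} \end{pmatrix}$
xx = torch.pow(x, 2).sum(1, keepdim=True).expand(m, n)
This line first applies pow(x, 2), an element-wise square, so the sample matrix becomes:
$\begin{pmatrix} x_{11}^2 & \cdots & x_{1d}^2 \\ \vdots & \ddots & \vdots \\ x_{m1}^2 & \cdots & x_{md}^2 \end{pmatrix}$
It then sums each row while keeping the dimension, sum(dim=1, keepdim=True), and expands the result to an m x n matrix (in this code m = n = 128):
$xx_{ij} = \sum_{k=1}^{d} x_{ik}^2 = \|x_i\|^2$ for every column j.
yy is built the same way from y and then transposed, so $yy_{ij} = \|y_j\|^2$ for every row i.
dist = xx + yy
Since x and y are the same tensor here, this adds xx to its own transpose (in this code m = n = 128). The resulting dist matrix has entries:
$dist_{ij} = \|x_i\|^2 + \|x_j\|^2$
dist.addmm_(x, y.t(), beta=1, alpha=-2)
This line (written as dist.addmm_(1, -2, x, y.t()) in older PyTorch) evaluates dist = 1 * dist - 2 * (x @ $y^T$), where @ is matrix multiplication.
Simplifying, the final result is:
$dist_{ij} = \|x_i\|^2 + \|x_j\|^2 - 2\, x_i \cdot x_j$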
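The steps above can be traced on a tiny input. The following sketch uses two made-up 2-D samples (so m = n = 2, rather than the 128 of the real batch) and the out-of-place torch.addmm in keyword form.

```python
import torch

# Two illustrative 2-D samples (m = n = 2, d = 2); values are made up.
x = torch.tensor([[1.0, 2.0],
                  [3.0, 4.0]])
m, n = x.size(0), x.size(0)

xx = torch.pow(x, 2).sum(1, keepdim=True).expand(m, n)      # row i holds ||x_i||^2
yy = torch.pow(x, 2).sum(1, keepdim=True).expand(n, m).t()  # column j holds ||x_j||^2
dist = xx + yy                                              # ||x_i||^2 + ||x_j||^2
# Keyword form of addmm: computes 1 * dist - 2 * (x @ x.t()).
sq_dist = torch.addmm(dist, x, x.t(), beta=1, alpha=-2)

print(xx)       # rows: [5, 5] and [25, 25]
print(yy)       # rows: [5, 25] and [5, 25]
print(sq_dist)  # rows: [0, 8] and [8, 0]
```

The off-diagonal value 8 matches the direct computation (1 - 3)^2 + (2 - 4)^2 = 8.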
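This may not look like much yet, so let's verify it. Take the i-th sample $x_i$ and the j-th sample $x_j$, each with d features. Their difference is:
$x_i - x_j = (x_{i1} - x_{j1},\; x_{i2} - x_{j2},\; \dots,\; x_{id} - x_{jd})$
and its L2 norm (that is, the Euclidean distance) is:
$\|x_i - x_j\|_2 = \sqrt{(x_{i1} - x_{j1})^2 + (x_{i2} - x_{j2})^2 + \dots + (x_{id} - x_{jd})^2}$
that is:
$\|x_i - x_j\|_2 = \sqrt{\sum_{k=1}^{d} (x_{ik} - x_{jk})^2}$
Squaring it and rearranging, we easily get:
$\|x_i - x_j\|_2^2 = \sum_{k=1}^{d} x_{ik}^2 + \sum_{k=1}^{d} x_{jk}^2 - 2 \sum_{k=1}^{d} x_{ik} x_{jk} = \|x_i\|^2 + \|x_j\|^2 - 2\, x_i \cdot x_j$
What does this mean? It is exactly the squared distance between the i-th and j-th samples, so at this point every entry of dist is a squared distance.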
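The identity can also be checked numerically. This is a minimal sketch on random data (6 hypothetical samples with 8 features each, chosen arbitrarily) that compares the explicit pairwise differences with the expanded form.

```python
import torch

torch.manual_seed(0)            # fixed seed so the run is repeatable
feats = torch.randn(6, 8)       # 6 hypothetical samples with 8 features each

# Left-hand side: squared distance from explicit pairwise differences.
diff = feats.unsqueeze(1) - feats.unsqueeze(0)    # shape [6, 6, 8]
direct = diff.pow(2).sum(dim=2)                   # shape [6, 6]

# Right-hand side: ||x_i||^2 + ||x_j||^2 - 2 * x_i . x_j
sq_norms = feats.pow(2).sum(dim=1, keepdim=True)  # shape [6, 1]
expanded = sq_norms + sq_norms.t() - 2 * (feats @ feats.t())

# The two agree up to floating-point rounding.
max_gap = (direct - expanded).abs().max().item()
```

The expanded form needs only one matrix multiplication and never builds the [m, n, d] difference tensor, which is presumably why the code is written this way.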
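dist = dist.clamp(min=1e-12).sqrt()
Finally, the values are clamped to a minimum of 1e-12 and an element-wise square root is taken. This introduces a small discrepancy: any entry whose squared distance is below 1e-12, such as the diagonal, comes out as $\sqrt{10^{-12}} = 10^{-6}$ rather than 0.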
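The effect of the clamp is easy to see on two made-up points that are exactly 5 apart. The sketch below repeats the function from loss.py, using the out-of-place keyword form of addmm.

```python
import torch

def euclidean_dist(x, y):
    # Same steps as loss.py, with addmm in keyword form.
    m, n = x.size(0), y.size(0)
    xx = torch.pow(x, 2).sum(1, keepdim=True).expand(m, n)
    yy = torch.pow(y, 2).sum(1, keepdim=True).expand(n, m).t()
    dist = torch.addmm(xx + yy, x, y.t(), beta=1, alpha=-2)
    return dist.clamp(min=1e-12).sqrt()

x = torch.tensor([[0.0, 0.0],
                  [3.0, 4.0]])      # illustrative points
dist_mat = euclidean_dist(x, x)

# Off-diagonal entries are the true distance, 5.
# Diagonal entries are sqrt(1e-12) = 1e-6 instead of 0.
print(dist_mat)
```

A likely reason for the clamp is numerical safety: rounding can leave tiny negative values where the true squared distance is 0, and the gradient of sqrt is unbounded at 0.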
This step, dist_mat = euclidean_dist(global_feat, global_feat), is now finished, and we have a 128 x 128 dist_mat matrix holding the Euclidean distances between the rows of global_feat. The debugger next reaches dist_ap, dist_an, p_inds, n_inds = hard_example_mining(dist_mat, labels, return_inds=True) and jumps into def hard_example_mining().
def hard_example_mining(dist_mat, labels, return_inds=False):
    assert len(dist_mat.size()) == 2
    assert dist_mat.size(0) == dist_mat.size(1)
    N = dist_mat.size(0)

    # shape [N, N]
    is_pos = labels.expand(N, N).eq(labels.expand(N, N).t())
    is_neg = labels.expand(N, N).ne(labels.expand(N, N).t())

    # `dist_ap` means distance(anchor, positive)
    # both `dist_ap` and `relative_p_inds` with shape [N, 1]
    dist_ap, relative_p_inds = torch.max(
        dist_mat[is_pos].contiguous().view(N, -1), 1, keepdim=True)
    # `dist_an` means distance(anchor, negative)
    # both `dist_an` and `relative_n_inds` with shape [N, 1]
    dist_an, relative_n_inds = torch.min(
        dist_mat[is_neg].contiguous().view(N, -1), 1, keepdim=True)
    # shape [N]
    dist_ap = dist_ap.squeeze(1)  # drop the size-1 dimension
    dist_an = dist_an.squeeze(1)

    if return_inds:
        # shape [N, N]
        ind = (labels.new().resize_as_(labels)
               .copy_(torch.arange(0, N).long())
               .unsqueeze(0).expand(N, N))
        # shape [N, 1]
        p_inds = torch.gather(
            ind[is_pos].contiguous().view(N, -1), 1, relative_p_inds.data)
        n_inds = torch.gather(
            ind[is_neg].contiguous().view(N, -1), 1, relative_n_inds.data)
        # shape [N]
        p_inds = p_inds.squeeze(1)
        n_inds = n_inds.squeeze(1)
        return dist_ap, dist_an, p_inds, n_inds

    return dist_ap, dist_an
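This block does the example mining, that is, it finds the hardest positive and the hardest negative for each anchor. Before walking through it, we should be clear about the inputs and outputs of hard_example_mining(dist_mat, labels, return_inds=False).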
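Inputs:
1. dist_mat: the distance matrix, shape (batch_size, batch_size). (Note that the code uses P=32, K=4, while the paper uses P=18, K=4.)
2. labels: the person IDs of the feature vectors in this batch, shape (batch_size)
3. return_inds: whether to also return the indices, within the distance matrix, of the least similar positive and the most similar negative. Defaults to False.
Outputs:
1. dist_ap: for each anchor, the distance to its hardest positive (the least similar positive), shape (batch_size)
2. dist_an: for each anchor, the distance to its hardest negative (the most similar negative), shape (batch_size)
3. p_inds: the distance-matrix column index of each anchor's hardest positive, shape (batch_size)
4. n_inds: the distance-matrix column index of each anchor's hardest negative, shape (batch_size)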
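To make these shapes concrete, here is a small sketch of a PK-sampled batch with P=32 and K=4. The label layout (K consecutive samples sharing one ID) is an assumption for illustration, since the sampler is not shown in this section.

```python
import torch

P, K = 32, 4                       # identities per batch, images per identity
N = P * K                          # batch size, 128
# Hypothetical label layout: K consecutive samples share one ID.
labels = torch.arange(P).repeat_interleave(K)   # shape [128]

# Same masks as in hard_example_mining.
is_pos = labels.expand(N, N).eq(labels.expand(N, N).t())
is_neg = labels.expand(N, N).ne(labels.expand(N, N).t())

pos_per_row = is_pos.sum(dim=1)    # K entries per row (includes the anchor itself)
neg_per_row = is_neg.sum(dim=1)    # N - K entries per row
```

Every row has the same number of positives and the same number of negatives, which is what allows the masked entries to be reshaped with view(N, -1) in the mining code.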
Next, let's go through how the mining is written (pay attention to the reasoning).
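def hard_example_mining(dist_mat, labels, return_inds=False):
    assert len(dist_mat.size()) == 2  # dist_mat must be a 2-D matrix, otherwise raise
    assert dist_mat.size(0) == dist_mat.size(1)  # dist_mat must be square, otherwise raise
    N = dist_mat.size(0)
    # mask for positives: True where the labels match, False where they differ
    is_pos = labels.expand(N, N).eq(labels.expand(N, N).t())
    # mask for negatives: True where the labels differ, False where they match
    is_neg = labels.expand(N, N).ne(labels.expand(N, N).t())
    # hardest positive: the largest distance (lowest similarity) among each anchor's
    # positives, plus its position within that anchor's positives (range 0 ~ K-1)
    '''
    torch.max with dim=1 returns the largest element of each row together with its index
    dist_mat[is_pos] picks out the positive entries as a 1-D tensor, and .contiguous() makes sure it is contiguous in memory so that view can be applied
    .view(N, -1) reshapes it to N rows, with -1 letting the column count be inferred
    (this requires every ID to appear the same number of times in the batch)
    '''
    dist_ap, relative_p_inds = torch.max(
        dist_mat[is_pos].contiguous().view(N, -1), 1, keepdim=True)
    # hardest negative: the smallest distance (highest similarity) among each anchor's
    # negatives, plus its position within that anchor's negatives (range 0 ~ N-K-1)
    dist_an, relative_n_inds = torch.min(
        dist_mat[is_neg].contiguous().view(N, -1), 1, keepdim=True)
    # dist_ap and dist_an have shape (batch_size, 1); squeeze them to (batch_size,)
    # squeeze(1) removes dimension 1 when its size is 1
    dist_ap = dist_ap.squeeze(1)
    dist_an = dist_an.squeeze(1)
    # recover where the hardest positive and hardest negative sit in the distance matrix
    if return_inds:
        # shape [N, N]: every row is 0, 1, ..., N-1
        ind = (labels.new().resize_as_(labels)
               .copy_(torch.arange(0, N).long())
               .unsqueeze(0).expand(N, N))
        # shape [N, 1]
        p_inds = torch.gather(
            ind[is_pos].contiguous().view(N, -1), 1, relative_p_inds.data)
        n_inds = torch.gather(
            ind[is_neg].contiguous().view(N, -1), 1, relative_n_inds.data)
        # shape [N]
        p_inds = p_inds.squeeze(1)
        n_inds = n_inds.squeeze(1)
        return dist_ap, dist_an, p_inds, n_inds
    return dist_ap, dist_an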
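To see the mining on numbers small enough to check by hand, here is a sketch with 2 identities and 2 samples each. The distance matrix is made up, and the index grid is built with a plain torch.arange rather than the labels.new() chain of the original. That is a simplification that should give the same grid when labels is an int64 tensor on the CPU.

```python
import torch

labels = torch.tensor([0, 0, 1, 1])             # 2 identities x 2 samples (made up)
dist_mat = torch.tensor([[0.0, 0.4, 0.9, 0.3],
                         [0.4, 0.0, 0.7, 0.8],
                         [0.9, 0.7, 0.0, 0.5],
                         [0.3, 0.8, 0.5, 0.0]])  # symmetric, illustrative values
N = dist_mat.size(0)

is_pos = labels.expand(N, N).eq(labels.expand(N, N).t())
is_neg = labels.expand(N, N).ne(labels.expand(N, N).t())

# Hardest positive: largest distance among the same-ID entries of each row.
dist_ap, rel_p = torch.max(dist_mat[is_pos].contiguous().view(N, -1), 1, keepdim=True)
# Hardest negative: smallest distance among the different-ID entries of each row.
dist_an, rel_n = torch.min(dist_mat[is_neg].contiguous().view(N, -1), 1, keepdim=True)

# Map positions inside the masked rows back to column indices of dist_mat.
ind = torch.arange(N).unsqueeze(0).expand(N, N)
p_inds = torch.gather(ind[is_pos].contiguous().view(N, -1), 1, rel_p).squeeze(1)
n_inds = torch.gather(ind[is_neg].contiguous().view(N, -1), 1, rel_n).squeeze(1)

print(dist_ap.squeeze(1))  # 0.4, 0.4, 0.5, 0.5
print(dist_an.squeeze(1))  # 0.3, 0.7, 0.7, 0.3
print(p_inds, n_inds)      # [1, 0, 3, 2] and [3, 2, 1, 0]
```

For anchor 0, the positives are columns 0 and 1 (distances 0.0 and 0.4), so the hardest positive is column 1. The negatives are columns 2 and 3 (distances 0.9 and 0.3), so the hardest negative is column 3.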
OK, hard example mining is done. The debugger moves on to loss = tri_loss(dist_ap, dist_an), and execution jumps into TripletLoss.py, so next we look at how the loss itself is computed.
2. TripletLoss.py