
Loss Functions in Contrastive Learning

Updated: 2023-07-14 14:24:18


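Preface

I have recently been running experiments based on contrastive learning. GitHub has many implementations, and although they can be used as-is, a closer look at the loss function left me confused, so I studied it and recorded my notes here. Contrastive learning itself is already covered extensively online, so I will not repeat it. This post focuses on two ways of implementing InfoNCE.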

1. Info Noise-Contrastive Estimation (InfoNCE)

1.1 Description

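InfoNCE is defined as:

$$L_q = -\log \frac{\exp(q \cdot k_+ / \tau)}{\sum_{i=0}^{K} \exp(q \cdot k_i / \tau)} \tag{1}$$

where $\tau$ is a hyperparameter (the temperature).

Numerator: the dot product of $q$ with $k_+$. The dot product measures how similar the two vectors $q$ and $k_+$ are: the larger the value, the closer they are.

Denominator: the dot products of $q$ with all $k$. "All" means the positive sample together with the negative samples, so the sum runs from $i = 0$ to $K$, for a total of $K + 1$ terms.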
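To make Equation (1) concrete, here is a minimal sketch for a single query, written in plain Python rather than PyTorch so that it runs without dependencies. The function name `info_nce`, the toy vectors, and the value `tau=0.07` are assumptions chosen for illustration; they do not come from the original code.

```python
import math

def info_nce(q, k_pos, k_negs, tau=0.07):
    """InfoNCE loss for one query q, one positive key and K negative keys."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    # Index 0 is the positive key; indices 1..K are the negatives (K + 1 terms).
    scores = [dot(q, k_pos) / tau] + [dot(q, k) / tau for k in k_negs]
    # log(sum(exp(scores))), shifted by the max for numerical stability.
    m = max(scores)
    log_denominator = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_denominator - scores[0]

q = [1.0, 0.0]
negatives = [[0.0, 1.0], [-1.0, 0.0]]
loss_aligned = info_nce(q, [1.0, 0.0], negatives)     # positive key equals q
loss_misaligned = info_nce(q, [0.0, 1.0], negatives)  # positive key orthogonal to q
print(loss_aligned, loss_misaligned)
```

The loss is small when the positive key is close to the query and grows when a negative key scores as high as the positive one.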

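1.2 Implementation

In `\moco\builder.py`, the implementation is as follows:

```python
# compute logits
# Einstein sum is more intuitive
# positive logits: Nx1
l_pos = torch.einsum('nc,nc->n', [q, k]).unsqueeze(-1)
# negative logits: NxK
l_neg = torch.einsum('nc,ck->nk', [q, self.queue.clone().detach()])

# logits: Nx(1+K)
logits = torch.cat([l_pos, l_neg], dim=1)

# apply temperature
logits /= self.T

# labels: positive key indicators
labels = torch.zeros(logits.shape[0], dtype=torch.long).cuda()

...

return logits, labels
```

I looked up what the variable `logits` means: it holds the raw scores before softmax, that is, values that have not yet been normalized into probabilities. The code can be understood from its comments: `l_pos` holds the positive-sample scores, `l_neg` holds the scores of all negative samples, and `logits` is the result of concatenating the two along the column dimension. The part worth noting is `labels`: it is a set of zeros whose length is taken from `logits.shape[0]`, in other words `batch_size` zeros.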
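The layout of `logits` and `labels` can be sketched on a toy batch. This is a plain-Python illustration, not the repository's code: `build_logits` is a hypothetical helper, and the queue is represented as a list of K negative vectors, whereas the excerpt stores it as a C x K tensor.

```python
def build_logits(queries, keys, queue, tau=0.07):
    """Return (logits, labels) laid out like the excerpt: positive in column 0."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    logits = []
    for q, k in zip(queries, keys):
        l_pos = dot(q, k)                        # one positive score per query
        l_neg = [dot(q, neg) for neg in queue]   # K negative scores per query
        logits.append([s / tau for s in [l_pos] + l_neg])  # row of length 1 + K
    labels = [0] * len(logits)                   # the positive is always column 0
    return logits, labels

queries = [[1.0, 0.0], [0.0, 1.0]]
keys = [[0.9, 0.1], [0.2, 0.8]]
queue = [[-1.0, 0.0], [0.0, -1.0], [0.5, 0.5]]   # K = 3 negatives
logits, labels = build_logits(queries, keys, queue, tau=1.0)
print(logits, labels)
```

Each row has 1 + K entries, and `logits[n][labels[n]]` is always the positive score of sample `n`, which is why a vector of zeros is enough as the label.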

Next, look at the loss function in `\main_moco.py`, which has to realize Equation (1):

$$L_q = -\log \frac{\exp(q \cdot k_+ / \tau)}{\sum_{i=0}^{K} \exp(q \cdot k_i / \tau)} \tag{1}$$

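```python
# define loss function (criterion) and optimizer
criterion = nn.CrossEntropyLoss().cuda(args.gpu)

...

# compute output
output, target = model(im_q=images[0], im_k=images[1])
loss = criterion(output, target)
```

Here the cross-entropy is computed directly between the output `logits` and the generated `labels`, and that value is the model's loss. This is the part I did not understand. Let us keep that question in mind for now.

2. HCL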

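2.1 Description

The paper gives the loss function that uses negative samples as:

$$\mathbb{E}_{x \sim p,\; x^+ \sim p_x^+}\left[ -\log \frac{e^{f(x)^T f(x^+)}}{e^{f(x)^T f(x^+)} + \frac{Q}{N} \sum_{i=1}^{N} e^{f(x)^T f(x_i^-)}} \right] \tag{2}$$

Numerator: $e^{f(x)^T f(x^+)}$ is built from the dot product of the learned representation $f(x)$ with the positive sample $f(x^+)$ (in other words, the positive-sample score).

Denominator: the first term is the positive-sample score and the second term is the negative-sample score. In essence this is the same idea as InfoNCE: both are mean(-log(positive score / score of all samples)).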

2.2 Implementation

However, in the code accompanying this paper, `\image\main.py`:

```python
import numpy as np
import torch

# `temperature`, `device` and `get_negative_mask` are not defined in this
# excerpt; they are expected to come from the surrounding module.
def criterion(out_1, out_2, tau_plus, batch_size, beta, estimator):
    # neg score
    out = torch.cat([out_1, out_2], dim=0)
    neg = torch.exp(torch.mm(out, out.t().contiguous()) / temperature)
    old_neg = neg.clone()
    mask = get_negative_mask(batch_size).to(device)
    neg = neg.masked_select(mask).view(2 * batch_size, -1)

    # pos score
    pos = torch.exp(torch.sum(out_1 * out_2, dim=-1) / temperature)
    pos = torch.cat([pos, pos], dim=0)

    # negative samples similarity scoring
    if estimator == 'hard':
        N = batch_size * 2 - 2
        imp = (beta * neg.log()).exp()
        reweight_neg = (imp * neg).sum(dim=-1) / imp.mean(dim=-1)
        Ng = (-tau_plus * N * pos + reweight_neg) / (1 - tau_plus)
        # constrain (optional)
        Ng = torch.clamp(Ng, min=N * np.e ** (-1 / temperature))
    elif estimator == 'easy':
        Ng = neg.sum(dim=-1)
    else:
        raise Exception('Invalid estimator selected. Please use any of [hard, easy]')

    # contrastive loss
    loss = (-torch.log(pos / (pos + Ng))).mean()

    return loss
```
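The excerpt calls `get_negative_mask` without showing it. The sketch below is a guess at what such a mask has to look like, inferred only from `.view(2 * batch_size, -1)` and `N = batch_size * 2 - 2` above: each of the 2B rows must keep exactly 2B - 2 entries, which means dropping the sample itself and its positive view. The name `negative_mask` and the nested-list representation are hypothetical; the real helper presumably returns a boolean tensor.

```python
def negative_mask(batch_size):
    """Boolean (2B x 2B) mask: True where column j is a negative for row i."""
    n = 2 * batch_size
    mask = [[True] * n for _ in range(n)]
    for i in range(n):
        mask[i][i] = False                     # similarity of a sample with itself
        mask[i][(i + batch_size) % n] = False  # its other augmented view (the positive)
    return mask

mask = negative_mask(3)
for row in mask:
    print(row)
```

Because `out` is the concatenation of `out_1` and `out_2`, the positive of row `i` sits `batch_size` positions away, which is why the offset is `(i + batch_size) % n`.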

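As you can see, the final loss is computed as:

```python
loss = (-torch.log(pos / (pos + Ng))).mean()
```

This has the same form as Equation (2), with `pos` as the positive term and `Ng` as the negative term:

$$\mathbb{E}_{x \sim p,\; x^+ \sim p_x^+}\left[ -\log \frac{e^{f(x)^T f(x^+)}}{e^{f(x)^T f(x^+)} + \frac{Q}{N} \sum_{i=1}^{N} e^{f(x)^T f(x_i^-)}} \right] \tag{2}$$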

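This does match my understanding above. But why does this implementation not use an all-zero label?

3. Explanation in Words

Since these are two implementations of the same method, and I already understand the second one (HCL), the problem comes down to this: I did not understand why the first implementation generates its labels the way it does. So I looked at how cross-entropy is computed:

$$\text{loss}(x, class) = -\log\left(\frac{\exp(x[class])}{\sum_j \exp(x[j])}\right) = -x[class] + \log\left(\sum_j \exp(x[j])\right) \tag{3}$$

The role of the label in cross-entropy is to act as an index that picks the entry $x[class]$ out of $x$, so the picked entries are the ones treated as the target. If the labels are all zeros, the meaning is: the first column of $x$ is the target (the positive sample) and all other columns are negative samples. Plugging this into Equation (3) gives the loss value under cross-entropy.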
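The indexing role of the label can be checked on a small example. This is a plain-Python sketch of Equation (3), not PyTorch's implementation; the function `cross_entropy` and the toy logits are assumptions made for illustration.

```python
import math

def cross_entropy(x, cls):
    """Equation (3): -x[cls] + log(sum_j exp(x[j])) for one row of logits."""
    m = max(x)
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in x))
    return -x[cls] + log_sum_exp

logits = [[2.0, 0.5, -1.0],   # column 0 holds the positive score
          [1.5, 1.0, 0.0]]
labels = [0, 0]               # all-zero labels, one per row

per_sample = [cross_entropy(row, c) for row, c in zip(logits, labels)]
batch_loss = sum(per_sample) / len(per_sample)   # mean over the batch

# The same quantity written as -log(positive / everything).
direct = [-math.log(math.exp(r[0]) / sum(math.exp(v) for v in r)) for r in logits]
print(per_sample, direct, batch_loss)
```

With label 0, the indexed entry is the positive score, so each per-sample value is exactly -log(positive / all), which is the InfoNCE form.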

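The HCL implementation, by contrast, takes InfoNCE apart and computes it directly from the positive-sample score and the negative-sample scores.

4. Explanation in Code

First, generate the pos scores and the neg scores.

Note that feature generation is omitted here; the scores are generated directly.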

4.1 Info NCE

4.2 HCL

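Ta-da! The two results are "exactly the same" (only the last digit differs slightly, because of numerical precision).

This is what Equation (3) gives when class = 0:

$$\text{loss}(x, class) = -\log\left(\frac{\exp(x[class])}{\sum_j \exp(x[j])}\right) = -x[class] + \log\left(\sum_j \exp(x[j])\right) \tag{3}$$

When $x$ holds the logits with the positive score in column 0, picking $x[0]$ turns this into -log(positive score / score of all samples), the quantity that the HCL code computes explicitly.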
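The comparison in this section can be sketched as follows. It is a plain-Python reconstruction under stated assumptions, not the original listings: the scores are drawn at random instead of being computed from features, `tau = 0.5` and the seed are arbitrary, and only HCL's `'easy'` estimator is mirrored.

```python
import math
import random

def cross_entropy(x, cls):
    """-x[cls] + log(sum_j exp(x[j])), the cross-entropy of one row of logits."""
    m = max(x)
    return -x[cls] + m + math.log(sum(math.exp(v - m) for v in x))

random.seed(0)
tau = 0.5
pos_score = random.uniform(-1.0, 1.0)                        # one positive score
neg_scores = [random.uniform(-1.0, 1.0) for _ in range(8)]   # eight negative scores

# 4.1 InfoNCE, MoCo style: positive in column 0, cross-entropy with label 0.
logits = [pos_score / tau] + [s / tau for s in neg_scores]
loss_info_nce = cross_entropy(logits, 0)

# 4.2 HCL style ('easy' estimator): exponentiate first, then -log(pos / (pos + Ng)).
pos = math.exp(pos_score / tau)
Ng = sum(math.exp(s / tau) for s in neg_scores)
loss_hcl = -math.log(pos / (pos + Ng))

print(loss_info_nce, loss_hcl)
```

The two numbers agree up to floating-point rounding, because both expressions evaluate -log(exp(pos) / (exp(pos) + sum of exp(neg))).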


Article link: https://www.wtabcd.cn/fanwen/fan/89/1081247.html
