A Plain-Language Explanation of the Triplet Loss Function and Its Source Code (FaceNet)

Updated: 2023-07-10 00:59:26

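When FaceNet is used for face feature extraction, the core of the algorithm, and the part that is hardest to grasp, is the triplet loss. The principle behind this loss is simple, but how it is actually implemented and applied is harder to follow. This post tries to explain it in plain language so that people who are new to it can understand it more easily.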

First, a few points need to be made clear:

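(1) In deep learning training we want to learn from hard samples as much as possible. Why? Because these are the samples that affect the loss the most. Training on many samples that barely change the loss gives poor results and wastes resources. For example, when positive and negative samples are imbalanced and negatives dominate, training mostly on negatives moves the loss very little, so countermeasures are used, such as the focal loss from Kaiming He and colleagues, or resampling the positive and negative samples. In addition, as with a support vector machine, we want not only to separate positive and negative examples but also to separate the hardest-to-separate ones. This is what "hard" means for the triplet loss.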

(2) The authors select their triplets from within a mini-batch.

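The idea behind triplet-loss (the triplet loss function) is very simple: through learning, make the distance between samples of the same class smaller than the distance between samples of different classes. The figure in the original post illustrated the effect: training pulls the anchor closer to the positive and pushes it away from the negative.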

What does this look like mathematically?
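The requirement stated above can be written as an inequality. The notation below is assumed from the FaceNet paper rather than defined in this post: f is the embedding network, x_i^a, x_i^p and x_i^n are the anchor, positive and negative of the i-th triplet, and alpha is the margin (the same alpha that appears in the source code further down).

```latex
\left\| f(x_i^a) - f(x_i^p) \right\|_2^2 + \alpha
  < \left\| f(x_i^a) - f(x_i^n) \right\|_2^2
```

In words: the squared anchor-positive distance must be smaller than the squared anchor-negative distance by at least the margin alpha.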

Taking this one step further, the loss function becomes:
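In the FaceNet paper the loss for one triplet is max(d(a, p) - d(a, n) + alpha, 0), where d is the squared Euclidean distance between embeddings, and the total loss sums this over all triplets. The following is a minimal NumPy sketch of that formula, not the FaceNet implementation itself; the function name `triplet_loss`, the toy 2-D embeddings and the margin value 0.2 are all chosen here for illustration.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, alpha=0.2):
    """Hinge-style triplet loss on squared Euclidean distances (illustrative sketch)."""
    pos_dist_sqr = np.sum(np.square(anchor - positive), axis=-1)
    neg_dist_sqr = np.sum(np.square(anchor - negative), axis=-1)
    # The loss is zero once the negative is farther than the positive by at least alpha.
    return np.maximum(pos_dist_sqr - neg_dist_sqr + alpha, 0.0)

# Toy 2-D embeddings (hypothetical values).
anchor = np.array([0.0, 0.0])
positive = np.array([1.0, 0.0])        # squared distance to anchor: 1.0
easy_negative = np.array([3.0, 0.0])   # squared distance to anchor: 9.0
hard_negative = np.array([1.0, 0.3])   # squared distance to anchor: 1.09

easy_loss = triplet_loss(anchor, positive, easy_negative)
hard_loss = triplet_loss(anchor, positive, hard_negative)
print(easy_loss, hard_loss)
```

The easy negative already satisfies the margin and contributes no loss, while the hard negative lies inside the margin and contributes a positive loss. This is why only negatives that violate the margin are worth selecting.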

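Once the loss function is fixed, the key question becomes how to find the positive and negative samples that go with each anchor during training. So what does this loss function mean in practice?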

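First fix an (a, p) pair, that is, an anchor and a positive from the same person. Then, among the images of other people, look for samples whose squared distance to the anchor exceeds the anchor-positive squared distance by less than alpha (this corresponds to the loss function above). These samples violate the margin and are the hardest to separate, which is why they are called hard; one negative is picked at random from all samples that satisfy the condition. Note that images of the same person must not be used as negatives, so their distances are masked out. The author describes this as setting them to infinity; the code below actually assigns NaN, which makes every comparison against them false. This excludes same-person samples, which would otherwise often satisfy the condition because their distances to the anchor are small.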
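The selection rule for a single (a, p) pair can be sketched as follows. This is an illustrative sketch rather than the FaceNet source: the helper name `pick_negative`, its parameters and the toy embeddings are hypothetical, and it assumes that all images of one person occupy a contiguous slice of the embedding array.

```python
import numpy as np

def pick_negative(embeddings, a_idx, p_idx, class_start, class_size, alpha, rng):
    """Return (chosen negative index or None, all candidate indices) for one (a, p) pair."""
    # Squared distance from the anchor to every image in the batch.
    neg_dists_sqr = np.sum(np.square(embeddings[a_idx] - embeddings), axis=1)
    pos_dist_sqr = np.sum(np.square(embeddings[a_idx] - embeddings[p_idx]))
    # Mask the anchor's own person: comparisons against NaN are always False.
    neg_dists_sqr[class_start:class_start + class_size] = np.nan
    # Negatives that violate the margin.
    candidates = np.where(neg_dists_sqr - pos_dist_sqr < alpha)[0]
    if candidates.size == 0:
        return None, candidates
    return int(rng.choice(candidates)), candidates

# Toy batch: rows 0-1 belong to person A, rows 2-3 to person B (hypothetical values).
embeddings = np.array([[0.0, 0.0], [1.0, 0.0], [1.1, 0.0], [3.0, 0.0]])
rng = np.random.default_rng(0)
n_idx, candidates = pick_negative(embeddings, a_idx=0, p_idx=1,
                                  class_start=0, class_size=2, alpha=0.5, rng=rng)
print(n_idx, candidates)
```

Row 2 is only slightly farther from the anchor than the positive, so it falls inside the margin and is the sole candidate. Row 3 is far away and is skipped, and rows 0 and 1 are excluded by the NaN mask.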

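The procedure is as follows:

1. Call sample_people() to draw a batch of images from the training dataset.

2. Compute the embeddings of these images with the current network model and store them in emb_array.

3. Call select_triplets() to obtain the (A, P, N) triplets.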

# 1. sample_people: procedure and annotated source

import numpy as np

# Sample images from the dataset. Arguments: the training dataset, how many
# people to sample per batch, and how many images to sample per person.
def sample_people(dataset, people_per_batch, images_per_person):
    # Total number of images to sample. Defaults: people_per_batch: 45, images_per_person: 40
    nrof_images = people_per_batch * images_per_person

    # Number of people (classes) in the dataset
    nrof_class = len(dataset)
    # Index of each person
    class_indices = np.arange(nrof_class)
    # Shuffle the order of the people
    np.random.shuffle(class_indices)

    i = 0
    # Paths of the sampled images
    image_paths = []
    # Number of images sampled from each person
    num_per_class = []
    # Which person each sampled image belongs to, used as its label
    sampled_class_indices = []

    # Sample images from the classes until we have enough
    while len(image_paths) < nrof_images:
        # Sample from the i-th person
        class_index = class_indices[i]
        # Number of images the i-th person has
        nrof_images_in_class = len(dataset[class_index])
        # Shuffled indices of those images
        image_indices = np.arange(nrof_images_in_class)
        np.random.shuffle(image_indices)
        # Number of images to take from the i-th person
        nrof_images_from_class = min(nrof_images_in_class, images_per_person, nrof_images - len(image_paths))
        idx = image_indices[0:nrof_images_from_class]
        # Paths of the images sampled from this person
        image_paths_for_class = [dataset[class_index].image_paths[j] for j in idx]
        # Labels of the sampled images
        sampled_class_indices += [class_index] * nrof_images_from_class
        image_paths += image_paths_for_class
        # Record how many images were sampled from the i-th person
        num_per_class.append(nrof_images_from_class)
        i += 1

    return image_paths, num_per_class
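Because sample_people() appends each person's images one after another, the returned num_per_class is enough to recover where each person's block starts in the embedding array, which is what emb_start_idx tracks in the next function. A small sketch of that bookkeeping follows; the helper name `class_offsets` and the counts are hypothetical, and the labels it produces are positions within the batch, not dataset class indices.

```python
import numpy as np

def class_offsets(num_per_class):
    """Start offset of each person's block, plus a per-image batch-local label."""
    counts = np.asarray(num_per_class, dtype=int)
    # Person k starts where the blocks of persons 0..k-1 end.
    starts = np.concatenate(([0], np.cumsum(counts)[:-1]))
    # Repeat each batch-local person index once per sampled image.
    labels = np.repeat(np.arange(len(counts)), counts)
    return starts, labels

# Hypothetical batch: 3 images of the first person, 2 of the second, 4 of the third.
starts, labels = class_offsets([3, 2, 4])
print(starts, labels)
```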

# 3. Call select_triplets() to obtain the (A, P, N) triplets

import numpy as np

def select_triplets(embeddings, nrof_images_per_class, image_paths, people_per_batch, alpha):
    """ Select the triplets for training
    """
    trip_idx = 0
    # Start index in emb_array of the current person's embeddings
    emb_start_idx = 0
    # Number of (a, p) pairs examined
    num_trips = 0
    triplets = []

    # VGG Face: Choosing good triplets is crucial and should strike a balance between
    # selecting informative (i.e. challenging) examples and swamping training with examples that
    # are too hard. This is achieved by extending each pair (a, p) to a triplet (a, p, n) by sampling
    # the image n at random, but only between the ones that violate the triplet loss margin. The
    # latter is a form of hard-negative mining, but it is not as aggressive (and much cheaper) than
    # choosing the maximally violating example, as often done in structured output learning.

    # Loop over every person
    for i in range(people_per_batch):
        # Number of images this person has
        nrof_images = int(nrof_images_per_class[i])
        # Loop over the images of the i-th person, using each as an anchor
        for j in range(1, nrof_images):
            # Position in emb_array of the anchor, i.e. this person's image j-1 (0-based)
            a_idx = emb_start_idx + j - 1
            # Squared Euclidean distance from the anchor to every image in the batch
            neg_dists_sqr = np.sum(np.square(embeddings[a_idx] - embeddings), 1)
            # Loop over every possible (anchor, positive) pair, written (a, p)
            for pair in range(j, nrof_images):  # For every possible positive pair.
                # Position in emb_array of the positive image
                p_idx = emb_start_idx + pair
                # Squared Euclidean distance between a and p
                pos_dist_sqr = np.sum(np.square(embeddings[a_idx] - embeddings[p_idx]))
                # Images of the same person must not be negatives, so mask their
                # distances with NaN (any comparison against NaN is False)
                neg_dists_sqr[emb_start_idx:emb_start_idx + nrof_images] = np.nan
                # FaceNet selection (disabled): also requires the negative to be farther than the positive
                #all_neg = np.where(np.logical_and(neg_dists_sqr-pos_dist_sqr<alpha, pos_dist_sqr<neg_dists_sqr))[0]
                # VGG Face selection: images of other people whose distance to a,
                # minus the distance between p and a, is less than alpha
                all_neg = np.where(neg_dists_sqr - pos_dist_sqr < alpha)[0]
                # Number of candidate negatives
                nrof_random_negs = all_neg.shape[0]
                # If at least one negative satisfies the condition
                if nrof_random_negs > 0:
                    # Pick one of them at random as n
                    rnd_idx = np.random.randint(nrof_random_negs)
                    n_idx = all_neg[rnd_idx]
                    # Keep (a, p, n) as a triplet
                    triplets.append((image_paths[a_idx], image_paths[p_idx], image_paths[n_idx]))
                    #print('Triplet %d: (%d, %d, %d), pos_dist=%2.6f, neg_dist=%2.6f (%d, %d, %d, %d, %d)' %
                    #    (trip_idx, a_idx, p_idx, n_idx, pos_dist_sqr, neg_dists_sqr[n_idx], nrof_random_negs, rnd_idx, i, j, emb_start_idx))
                    trip_idx += 1

                num_trips += 1

        emb_start_idx += nrof_images

    np.random.shuffle(triplets)
    return triplets, num_trips, len(triplets)
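The function contains two selection rules: the active "VGG Face" rule and the commented-out "FaceNet" rule, which additionally requires the negative to be farther from the anchor than the positive. The sketch below compares the two on made-up distances; the helper name `negative_candidates`, the `semi_hard_only` flag and the numbers are all hypothetical.

```python
import numpy as np

def negative_candidates(neg_dists_sqr, pos_dist_sqr, alpha, semi_hard_only=False):
    """Indices of negatives that violate the margin for one (a, p) pair."""
    # VGG Face rule: the negative is not yet farther than the positive by alpha.
    mask = neg_dists_sqr - pos_dist_sqr < alpha
    if semi_hard_only:
        # FaceNet rule: additionally require the negative to be farther than the positive.
        mask = np.logical_and(mask, pos_dist_sqr < neg_dists_sqr)
    return np.where(mask)[0]

# Hypothetical squared distances from one anchor; NaN marks a same-person image.
neg_dists_sqr = np.array([0.3, 0.6, 0.9, np.nan])
pos_dist_sqr = 0.5
alpha = 0.2

vgg = negative_candidates(neg_dists_sqr, pos_dist_sqr, alpha)
facenet = negative_candidates(neg_dists_sqr, pos_dist_sqr, alpha, semi_hard_only=True)
print(vgg, facenet)
```

Index 0 is closer to the anchor than the positive is, so only the VGG Face rule accepts it. Index 1 lies inside the margin and is accepted by both rules, index 2 already satisfies the margin, and the NaN entry is never selected.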

Article link: https://www.wtabcd.cn/fanwen/fan/89/1075042.html
