Semantic Segmentation Loss Functions (1): Cross-Entropy Loss
I have been working on semantic segmentation projects recently, and while hunting for loss functions I found that the write-ups online each have their own merits but rarely explain how to actually use them, so I am recording here some of my own experience with loss functions during training. I work with the PyTorch framework, so this whole series will be implemented in PyTorch.
First up is the cross-entropy loss. Semantic segmentation is really a per-pixel classification problem, so anyone who has done image classification should already be familiar with the cross-entropy loss.
PyTorch ships with a ready-made cross-entropy loss; you only need to call it:
loss_func = nn.CrossEntropyLoss()
The loss functions provided in the module are all written as classes: declare an instance once up front and you can call it like a function later.
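For instance, the snippet below shows the instance being called after declaration, and that the one-off functional form torch.nn.functional.cross_entropy computes the same value; this is just a minimal sketch, and which style you use is a matter of taste:
import torch
import torch.nn as nn
import torch.nn.functional as F

loss_func = nn.CrossEntropyLoss()   # declare once...
input = torch.randn(3, 5, requires_grad=True)
target = torch.empty(3, dtype=torch.long).random_(5)
loss = loss_func(input, target)     # ...call later like a function

# The functional form gives the same value:
print(torch.allclose(loss, F.cross_entropy(input, target)))  # True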
PyTorch's implementation of the cross-entropy loss:
class CrossEntropyLoss(_WeightedLoss):
r"""This criterion computes the cross entropy loss between input and target.
It is useful when training a classification problem with `C` classes.
If provided, the optional argument :attr:`weight` should be a 1D `Tensor`
assigning weight to each of the classes.
This is particularly useful when you have an unbalanced training set.
The `input` is expected to contain raw, unnormalized scores for each class.
`input` has to be a Tensor of size either :math:`(minibatch, C)` or
:math:`(minibatch, C, d_1, d_2, ..., d_K)` with :math:`K \geq 1` for the
`K`-dimensional case. The latter is useful for higher dimension inputs, such
as computing cross entropy loss per-pixel for 2D images.
The `target` that this criterion expects should contain either:
- Class indices in the range :math:`[0, C-1]` where :math:`C` is the number of classes; if
`ignore_index` is specified, this loss also accepts this class index (this index
may not necessarily be in the class range). The unreduced (i.e. with :attr:`reduction`
set to ``'none'``) loss for this case can be described as:
.. math::
\ell(x, y) = L = \{l_1,\dots,l_N\}^\top, \quad
l_n = - w_{y_n} \log \frac{\exp(x_{n,y_n})}{\sum_{c=1}^C \exp(x_{n,c})}
\cdot \mathbb{1}\{y_n \not= \text{ignore\_index}\}
where :math:`x` is the input, :math:`y` is the target, :math:`w` is the weight,
:math:`C` is the number of classes, and :math:`N` spans the minibatch dimension as well as
:math:`d_1, ..., d_k` for the `K`-dimensional case. If
:attr:`reduction` is not ``'none'`` (default ``'mean'``), then
.. math::
\ell(x, y) = \begin{cases}
\sum_{n=1}^N \frac{1}{\sum_{n=1}^N w_{y_n} \cdot \mathbb{1}\{y_n \not= \text{ignore\_index}\}} l_n, &
\text{if reduction} = \text{`mean';}\\
\sum_{n=1}^N l_n, &
\text{if reduction} = \text{`sum'.}
\end{cases}
Note that this case is equivalent to the combination of :class:`~LogSoftmax` and
:class:`~NLLLoss`.
- Probabilities for each class; useful when labels beyond a single class per minibatch item
are required, such as for blended labels, label smoothing, etc. The unreduced (i.e. with
:attr:`reduction` set to ``'none'``) loss for this case can be described as:
.. math::
\ell(x, y) = L = \{l_1,\dots,l_N\}^\top, \quad
l_n = - \sum_{c=1}^C w_c \log \frac{\exp(x_{n,c})}{\sum_{i=1}^C \exp(x_{n,i})} y_{n,c}
where :math:`x` is the input, :math:`y` is the target, :math:`w` is the weight,
:math:`C` is the number of classes, and :math:`N` spans the minibatch dimension as well as
:math:`d_1, ..., d_k` for the `K`-dimensional case. If
:attr:`reduction` is not ``'none'`` (default ``'mean'``), then
.. math::
\ell(x, y) = \begin{cases}
\frac{\sum_{n=1}^N l_n}{N}, &
\text{if reduction} = \text{`mean';}\\
\sum_{n=1}^N l_n, &
\text{if reduction} = \text{`sum'.}
\end{cases}
.. note::
The performance of this criterion is generally better when `target` contains class
indices, as this allows for optimized computation. Consider providing `target` as
class probabilities only when a single class label per minibatch item is too restrictive.
Args:
weight (Tensor, optional): a manual rescaling weight given to each class.
If given, has to be a Tensor of size `C`
size_average (bool, optional): Deprecated (see :attr:`reduction`). By default,
the losses are averaged over each loss element in the batch. Note that for
some losses, there are multiple elements per sample. If the field :attr:`size_average`
is set to ``False``, the losses are instead summed for each minibatch. Ignored
when :attr:`reduce` is ``False``. Default: ``True``
ignore_index (int, optional): Specifies a target value that is ignored
and does not contribute to the input gradient. When :attr:`size_average` is
``True``, the loss is averaged over non-ignored targets. Note that
:attr:`ignore_index` is only applicable when the target contains class indices.
reduce (bool, optional): Deprecated (see :attr:`reduction`). By default, the
losses are averaged or summed over observations for each minibatch depending
on :attr:`size_average`. When :attr:`reduce` is ``False``, returns a loss per
batch element instead and ignores :attr:`size_average`. Default: ``True``
reduction (string, optional): Specifies the reduction to apply to the output:
``'none'`` | ``'mean'`` | ``'sum'``. ``'none'``: no reduction will
be applied, ``'mean'``: the weighted mean of the output is taken,
``'sum'``: the output will be summed. Note: :attr:`size_average`
and :attr:`reduce` are in the process of being deprecated, and in
the meantime, specifying either of those two args will override
:attr:`reduction`. Default: ``'mean'``
label_smoothing (float, optional): A float in [0.0, 1.0]. Specifies the amount
of smoothing when computing the loss, where 0.0 means no smoothing. The targets
become a mixture of the original ground truth and a uniform distribution as described in
`Rethinking the Inception Architecture for Computer Vision <https://arxiv.org/abs/1512.00567>`__. Default: :math:`0.0`.
Shape:
- Input: :math:`(N, C)` where `C = number of classes`, or
:math:`(N, C, d_1, d_2, ..., d_K)` with :math:`K \geq 1`
in the case of `K`-dimensional loss.
- Target: If containing class indices, shape :math:`(N)` where each value is
:math:`0 \leq \text{targets}[i] \leq C-1`, or :math:`(N, d_1, d_2, ..., d_K)` with
:math:`K \geq 1` in the case of K-dimensional loss. If containing class probabilities,
same shape as the input.
- Output: If :attr:`reduction` is ``'none'``, shape :math:`(N)` or
:math:`(N, d_1, d_2, ..., d_K)` with :math:`K \geq 1` in the case of K-dimensional loss.
Otherwise, scalar.
Examples::
>>> # Example of target with class indices
>>> loss = nn.CrossEntropyLoss()
>>> input = torch.randn(3, 5, requires_grad=True)
>>> target = torch.empty(3, dtype=torch.long).random_(5)
>>> output = loss(input, target)
>>> output.backward()
>>>
>>> # Example of target with class probabilities
>>> input = torch.randn(3, 5, requires_grad=True)
>>> target = torch.randn(3, 5).softmax(dim=1)
>>> output = loss(input, target)
>>> output.backward()
"""
__constants__ = ['ignore_index', 'reduction', 'label_smoothing']
ignore_index: int
label_smoothing: float

def __init__(self, weight: Optional[Tensor] = None, size_average=None, ignore_index: int = -100,
             reduce=None, reduction: str = 'mean', label_smoothing: float = 0.0) -> None:
    super(CrossEntropyLoss, self).__init__(weight, size_average, reduce, reduction)
    self.ignore_index = ignore_index
    self.label_smoothing = label_smoothing

def forward(self, input: Tensor, target: Tensor) -> Tensor:
    return F.cross_entropy(input, target, weight=self.weight,
                           ignore_index=self.ignore_index, reduction=self.reduction,
                           label_smoothing=self.label_smoothing)
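As the docstring notes, with class-index targets CrossEntropyLoss is equivalent to LogSoftmax combined with NLLLoss. A minimal sketch to verify this for yourself (the tensors are made up):
import torch
import torch.nn as nn

logits = torch.randn(3, 5)
target = torch.tensor([3, 4, 1])

ce = nn.CrossEntropyLoss()(logits, target)
nll = nn.NLLLoss()(nn.LogSoftmax(dim=1)(logits), target)
print(torch.allclose(ce, nll))  # True: the two computations match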
When declaring the loss function you can pass in some parameters. One of the more important ones is weight: Optional[Tensor] = None: weight gives each class its own weight in the computation, and it is a tensor whose length equals the number of classes. Another is label_smoothing; label smoothing comes built into the cross-entropy loss. With label_smoothing = 0.1, the target becomes a mixture of the one-hot label and a uniform distribution, so in a binary problem the true class becomes 1 - 0.1 + 0.1/2 = 0.95 and the other class becomes 0.1/2 = 0.05. This makes the loss smoother and easier to converge, avoiding the excessively large loss that a misclassified label would otherwise produce.
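A minimal sketch of both parameters together (the five classes and the weight values below are made up for illustration):
import torch
import torch.nn as nn

# Hypothetical per-class weights, e.g. to up-weight rare foreground classes.
class_weights = torch.tensor([0.5, 1.0, 2.0, 2.0, 4.0])

loss_func = nn.CrossEntropyLoss(weight=class_weights, label_smoothing=0.1)

input = torch.randn(3, 5, requires_grad=True)
target = torch.tensor([3, 4, 1])
loss = loss_func(input, target)
loss.backward()
print(loss)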
When using it, that is, when computing the loss value, you need two arguments, input and target, both tensors: input is your model's prediction and target is the ground-truth annotation. For example:
import torch
import torch.nn as nn

loss_func = nn.CrossEntropyLoss()
input = torch.randn(3, 5, requires_grad=True)
target = torch.empty(3, dtype=torch.long).random_(5)
output = loss_func(input, target)
print("input:", input)
print("target:", target)
print("loss:", output)
input: tensor([[1.6738,0.0526,0.6329,-0.8809,1.4822],
[-0.5908,1.5717,1.3402,0.4227,-0.3498],
[-0.3359,-2.3797,-1.6206,-2.3070,0.6010]], requires_grad=True)
target: tensor([3,4,1])
loss: tensor(3.2306, grad_fn=<NllLossBackward0>)
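To see where this number comes from, you can reproduce it by hand from the first formula in the docstring: take the log-softmax over the class dimension and average the negative log-probability at each sample's target index. A quick check, plugging in the printed values:
import torch

input = torch.tensor([[1.6738, 0.0526, 0.6329, -0.8809, 1.4822],
                      [-0.5908, 1.5717, 1.3402, 0.4227, -0.3498],
                      [-0.3359, -2.3797, -1.6206, -2.3070, 0.6010]])
target = torch.tensor([3, 4, 1])

log_probs = torch.log_softmax(input, dim=1)  # log of the softmax probabilities
picked = log_probs[torch.arange(3), target]  # log-probability of each sample's true class
print(-picked.mean())                        # tensor(3.2306), matching the loss above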
This example is essentially a five-class classification problem: input is the output of the model's final fully connected layer, or, for a fully convolutional network, its final output.
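For segmentation, the same criterion is applied per pixel through the K-dimensional case described in the docstring: input has shape (N, C, H, W) and target has shape (N, H, W) of class indices. A minimal sketch with made-up shapes:
import torch
import torch.nn as nn

loss_func = nn.CrossEntropyLoss()

# Fake network output for a batch of 2 images, 5 classes, 4x4 pixels.
logits = torch.randn(2, 5, 4, 4, requires_grad=True)
# Ground-truth mask: one class index per pixel.
mask = torch.randint(0, 5, (2, 4, 4), dtype=torch.long)

loss = loss_func(logits, mask)   # averaged over all pixels by default
loss.backward()
print(loss)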
Note here: if you take the argmax of the classification output probabilities before computing the loss, the computation will fail. For example:
import torch
import torch.nn as nn

loss_func = nn.CrossEntropyLoss()
# input = torch.randn(3, 5, requires_grad=True)
input = torch.randn(3, requires_grad=True)   # class dimension collapsed, as after argmax
print("input:", input)
target = torch.empty(3, dtype=torch.long).random_(5)
print("target:", target)
output = loss_func(input, target)
print("loss:", output)
input: tensor([-0.3463,1.2289,0.2517], requires_grad=True)
target: tensor([3,4,3])
Traceback (most recent call last):
File "/home/lwf/Project/MRI-Segmentation/tets.py", line 19,in<module>
output = loss_func(input, target)
File "/home/lwf/anaconda3/envs/torch3.7/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102,in _call_impl
return forward_call(*input,**kwargs)
File "/home/lwf/anaconda3/envs/torch3.7/lib/python3.7/site-packages/torch/nn/modules/loss.py", line 1152,in forward
label_smoothing=lf.label_smoothing)
File "/home/lwf/anaconda3/envs/torch3.7/lib/python3.7/site-packages/torch/nn/functional.py", line 2846,in cross_entropy
return torch._C._nn.cross_entropy_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index, label_smoothing)
RuntimeError: Expected floating point type for target with class probabilities, got Long
In other words, the input must keep a score for every class, i.e. shape (N, C), or (N, C, d_1, ..., d_K) for segmentation; once argmax collapses the class dimension, the loss can no longer be computed.
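The fix is simply to give the loss the raw logits and apply argmax only afterwards, when you need a hard prediction for metrics or visualization. A minimal sketch:
import torch
import torch.nn as nn

loss_func = nn.CrossEntropyLoss()
logits = torch.randn(3, 5, requires_grad=True)
target = torch.empty(3, dtype=torch.long).random_(5)

loss = loss_func(logits, target)   # the loss takes the full (N, C) logits
loss.backward()

pred = logits.argmax(dim=1)        # argmax only for the hard prediction
print(loss, pred)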