An Introduction to the Swish and Hardswish Activation Functions

Updated: 2023-06-04 14:58:00

Preface

Study notes on the Swish and Hardswish activation functions.

The Swish paper

Published by Google.

The abstract, with commentary

The choice of activation functions in deep networks has a significant effect on the training dynamics and task performance.

Training dynamics: how a model's performance metrics change as training iterations proceed. Many factors influence training dynamics, and every network architecture has its own, but some factors affect the training dynamics of all kinds of networks, for example the activation function and the learning rate.

Currently, the most successful and widely-used activation function is the Rectified Linear Unit (ReLU).

Although various hand-designed alternatives to ReLU have been proposed, none have managed to replace it due to inconsistent gains.

In this work, we propose to leverage automatic search techniques to discover new activation functions.

Using a combination of exhaustive and reinforcement learning-based search, we discover multiple novel activation functions.

We verify the effectiveness of the searches by conducting an empirical evaluation with the best discovered activation function.

Empirical evaluation means the result is supported by experiments rather than by theoretical analysis.

Our experiments show that the best discovered activation function, f(x) = x · sigmoid(βx), which we name Swish, tends to work better than ReLU on deeper models across a number of challenging datasets.

β is either a constant or a learnable parameter.

If β = 1, f(x) = x · sigmoid(x), which is equivalent to the Sigmoid-weighted Linear Unit (SiL).

If β = 0, Swish becomes the scaled linear function f(x) = x/2.

As β → ∞, the sigmoid component approaches a 0-1 step function, so Swish behaves like ReLU.

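This suggests that Swish can be loosely viewed as a smooth function that nonlinearly interpolates between the linear function and ReLU. If β is made a trainable parameter, the model itself can control the degree of interpolation.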

Plotting the function for different values of β shows that by β = 10 the curve is already very close to ReLU.

sigmoid(x) = 1 / (1 + exp(−x))
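The limiting cases of β described above can be checked numerically. The following is a minimal sketch using only the Python standard library; the function names `sigmoid`, `swish`, and `relu` are chosen here for illustration and are not taken from the paper's code.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid 1 / (1 + exp(-x)), written to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # x < 0 here, so exp(x) cannot overflow
    return z / (1.0 + z)


def swish(x: float, beta: float = 1.0) -> float:
    """Swish: f(x) = x * sigmoid(beta * x)."""
    return x * sigmoid(beta * x)


def relu(x: float) -> float:
    """ReLU, used only as the reference for large beta."""
    return max(0.0, x)


# Columns: x, beta = 0 (equals x / 2), beta = 1, beta = 10, ReLU
for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
    print(x, swish(x, 0.0), swish(x, 1.0), swish(x, 10.0), relu(x))
```

With β = 0 the sigmoid factor is exactly 1/2, so the second column is x/2; with β = 10 the values already sit within a few hundredths of ReLU.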

For example, simply replacing ReLUs with Swish units improves top-1 classification accuracy on ImageNet by 0.9% for Mobile NASNet-A and 0.6% for Inception-ResNet-v2.

That is effectively a free 0.9% gain in accuracy, so there is little reason not to take it.

The abstract, however, says nothing about how convergence speed compares.

The simplicity of Swish and its similarity to ReLU make it easy for practitioners to replace ReLUs with Swish units in any neural network.

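A later paper found that Swish only pays off in deeper networks and that it is not free to compute, and therefore proposed hardswish, a "hard" (hard-coded) version of swish.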

Hard-Swish activation

Formula
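As a hedged reference, the definition below is the one documented for PyTorch's `nn.Hardswish`; that it matches the formula these notes intended is an assumption. ReLU6(z) denotes min(max(z, 0), 6).

```latex
\operatorname{hardswish}(x)
  = x \cdot \frac{\operatorname{ReLU6}(x + 3)}{6}
  = \begin{cases}
      0,             & x \le -3 \\
      x,             & x \ge 3 \\
      x (x + 3) / 6, & \text{otherwise}
    \end{cases}
```

The sigmoid in Swish is replaced by the piecewise-linear ramp ReLU6(x + 3) / 6, which needs no exponential.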

Function plot

PyTorch already provides a hardswish activation function (torch.nn.Hardswish), so it can be used directly.
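To see what such a built-in computes without installing PyTorch, here is a dependency-free sketch. It assumes the x · ReLU6(x + 3) / 6 definition that PyTorch documents for `nn.Hardswish`, and the helper names `relu6`, `hardswish`, and `swish` are illustrative, not library calls.

```python
import math


def relu6(x: float) -> float:
    """ReLU capped at 6: min(max(x, 0), 6)."""
    return min(max(x, 0.0), 6.0)


def hardswish(x: float) -> float:
    """Hard-swish (assumed definition): x * ReLU6(x + 3) / 6."""
    return x * relu6(x + 3.0) / 6.0


def swish(x: float) -> float:
    """Swish with beta = 1, for comparison: x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))


# Largest gap between the two functions on a grid over [-6, 6].
grid = [i / 100.0 for i in range(-600, 601)]
max_gap = max(abs(hardswish(x) - swish(x)) for x in grid)
print(f"max |hardswish - swish| on [-6, 6]: {max_gap:.3f}")
```

On this grid the two curves differ by at most about 0.14, reached at x = ±3 where the piecewise form switches branches, while hardswish needs only a comparison, an addition, and a multiplication per element.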

Published: 2023-06-04 14:58:00

Link to this article: https://www.wtabcd.cn/fanwen/fan/78/861311.html
