Note
In this article I have discussed the various types of activation functions and the types of problems one might encounter while using each of them.
I would suggest beginning with the ReLU function and exploring other functions as you move further. You can also design your own activation functions to give your network a non-linear component.
Recall that the inputs x0, x1, x2, x3, ..., xn and the weights w0, w1, w2, w3, ..., wn are multiplied and added together with a bias term to form the input to the neuron.
Clearly, w indicates how much weight or strength we want to give each incoming input, and we can think of b as an offset value: x*w has to reach this offset before it has an effect.
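As a minimal sketch of the weighted sum described above, the snippet below computes z = x*w + b for a single neuron with NumPy. The function name weighted_sum and the sample values are assumptions chosen for illustration, not values from the article.

```python
import numpy as np

def weighted_sum(x, w, b):
    """Return z = x . w + b for a single neuron."""
    return float(np.dot(x, w) + b)

# Illustrative values only (not taken from the article)
x = np.array([1.0, 2.0, 3.0])    # inputs
w = np.array([0.5, -1.0, 2.0])   # weights: how strongly each input counts
b = 0.5                          # bias: the offset added to the weighted inputs

z = weighted_sum(x, w, b)        # 0.5 - 2.0 + 6.0 + 0.5
print(z)
```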
So far we have seen the inputs, so what is an activation function?
An activation function is used to set the bounds of the overall output value. For example, let z = X*w + b be the output of the previous layer; it is then sent to the activation function, which limits its value to between 0 and 1 (for a binary classification problem).
Finally, the output from the activation function moves to the next hidden layer and the same process is repeated. This forward movement of information is known as forward propagation.
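Forward propagation can be sketched as a loop over layers, where each layer computes z = W*a + b and passes it through an activation. This is only an illustration: the layer sizes, the random weights, and the choice of sigmoid as the activation in every layer are assumptions, not something the article prescribes.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, layers):
    """Pass x through each (W, b) pair, applying sigmoid after every layer."""
    a = x
    for W, b in layers:
        z = W @ a + b    # weighted sum for this layer
        a = sigmoid(z)   # activation output becomes the next layer's input
    return a

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable
layers = [
    (rng.normal(size=(4, 3)), np.zeros(4)),  # hidden layer: 3 inputs -> 4 units
    (rng.normal(size=(1, 4)), np.zeros(1)),  # output layer: 4 units -> 1 output
]

y_hat = forward(np.array([1.0, 2.0, 3.0]), layers)
print(y_hat)  # a single value strictly between 0 and 1
```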
What if the generated output is far from the actual value? Using the output from forward propagation, the error is calculated. Based on this error value, the weights and biases of the neurons are updated. This process is known as back-propagation.
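To make the update step concrete, here is a rough sketch of back-propagation for a single sigmoid neuron. The squared-error loss, the learning rate of 0.1, the starting weights, and the name train_step are all assumptions made for this example; real networks repeat the same idea across many layers.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_step(x, y, w, b, lr=0.1):
    """One forward pass plus one gradient-descent update for a single neuron."""
    a = sigmoid(np.dot(w, x) + b)   # forward propagation
    error = a - y                   # how far the output is from the target
    loss = 0.5 * error ** 2         # squared-error loss (an assumed choice)
    dz = error * a * (1.0 - a)      # chain rule through the sigmoid
    w_new = w - lr * dz * x         # update the weights
    b_new = b - lr * dz             # update the bias
    return w_new, b_new, loss

x, y = np.array([1.0, 2.0]), 1.0    # one illustrative training example
w, b = np.zeros(2), 0.0

losses = []
for _ in range(50):
    w, b, loss = train_step(x, y, w, b)
    losses.append(loss)

print(losses[0], losses[-1])  # the loss shrinks as the updates accumulate
```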
A neural network without an activation function is essentially just a linear regression model.
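One way to check this claim numerically: two stacked layers with no activation in between collapse into a single linear layer. The shapes and random values below are arbitrary assumptions used only for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)  # first layer
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)  # second layer
x = rng.normal(size=3)

# Two layers applied one after the other, with no activation function
two_layers = W2 @ (W1 @ x + b1) + b2

# The equivalent single linear layer
W = W2 @ W1
b = W2 @ b1 + b2
one_layer = W @ x + b

same = np.allclose(two_layers, one_layer)
print(same)
```

However many linear layers are stacked, the result is still one linear map, which is why the non-linearity has to come from the activation function.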
Some Activation Functions
1. Step Function
If z < 0, the output is 0; if z > 0, the output is 1 (z = 0 is assigned to one side by convention).
This kind of function is used for classification, but it is rarely used in practice because it is very strict: small changes in z are not reflected in the output.
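A small sketch of the step function follows. Mapping z = 0 to 1 is an assumption made here, and the name step_function is chosen to match the article's other function names.

```python
import numpy as np

def step_function(z):
    """Binary step: 0 for negative z, 1 otherwise (z = 0 -> 1 is an assumed convention)."""
    return np.where(z >= 0, 1, 0)

z = np.array([-2.0, -0.001, 0.001, 2.0])
out = step_function(z)
print(out)  # [0 0 1 1]
```

Note that -0.001 and -2.0 produce the same output, as do 0.001 and 2.0: the size of z is lost, which is the "small changes are not reflected" problem.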
2. Sigmoid Function
The next activation function that we are going to look at is the sigmoid function. It is one of the most widely used non-linear activation functions. Sigmoid transforms values into the range between 0 and 1.
A noteworthy point here is that, unlike a linear function, sigmoid is non-linear, and unlike the binary step it is smooth, so small changes in the input are reflected in the output. This essentially means that when multiple neurons use sigmoid as their activation function, the output is non-linear as well.
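The sigmoid can be written in a couple of lines using its standard definition, 1 / (1 + exp(-z)). The name sigmoid_function is an assumption chosen to match the article's other helpers.

```python
import numpy as np

def sigmoid_function(z):
    """Squash any real value into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

values = sigmoid_function(np.array([-5.0, 0.0, 5.0]))
print(values)  # close to 0, exactly 0.5, close to 1
```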
3. Hyperbolic Tangent (tanh(z))
The tanh function is very similar to the sigmoid function. The main difference is that it is symmetric around the origin: the range of values in this case is from -1 to 1. Thus the inputs to the next layers will not always be of the same sign.
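The similarity to sigmoid can be made precise: tanh(z) = 2 * sigmoid(2z) - 1, so tanh is a rescaled, recentred sigmoid. The sketch below checks this identity and the symmetry around the origin numerically; the function names are assumptions for illustration.

```python
import numpy as np

def sigmoid_function(z):
    return 1.0 / (1.0 + np.exp(-z))

def tanh_function(z):
    """Hyperbolic tangent, with outputs in (-1, 1)."""
    return np.tanh(z)

z = np.linspace(-3.0, 3.0, 7)

# tanh is a scaled and shifted sigmoid
identity_holds = np.allclose(tanh_function(z), 2.0 * sigmoid_function(2.0 * z) - 1.0)

# Symmetric around the origin: tanh(-z) == -tanh(z)
is_odd = np.allclose(tanh_function(-z), -tanh_function(z))

print(identity_holds, is_odd)
```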
4. Rectified Linear Unit (ReLU)
This is actually a relatively simple function: max(0, z).
def relu_function(x):
    # ReLU: return 0 for negative inputs, otherwise return x unchanged
    if x < 0:
        return 0
    return x
ReLU has been found to perform very well, especially in mitigating the vanishing gradient problem.
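A rough way to see why ReLU helps with vanishing gradients: back-propagation multiplies one activation derivative per layer. The sigmoid derivative is never larger than 0.25, while the ReLU derivative is exactly 1 for positive inputs. The sketch below is deliberately simplified; it ignores the weights and assumes a depth of 10 layers purely for illustration.

```python
import numpy as np

def sigmoid_grad(z):
    """Derivative of the sigmoid: s * (1 - s), which peaks at 0.25 when z = 0."""
    s = 1.0 / (1.0 + np.exp(-z))
    return s * (1.0 - s)

def relu_grad(z):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return np.where(z > 0, 1.0, 0.0)

depth = 10  # assumed number of layers

# Best case for sigmoid: every layer sits at z = 0, where the derivative is largest
sigmoid_chain = float(sigmoid_grad(0.0)) ** depth

# ReLU with positive pre-activations passes the gradient through unchanged
relu_chain = float(relu_grad(1.0)) ** depth

print(sigmoid_chain, relu_chain)  # about 9.5e-07 versus 1.0
```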
5. Leaky Rectified Linear Unit
The Leaky ReLU function is an improved version of the ReLU function. For the ReLU function, the gradient is 0 for x < 0, which deactivates the neurons in that region.
Leaky ReLU is defined to address this problem. Instead of defining the function as 0 for negative values of x, we define it as an extremely small linear component of x. Here is the mathematical expression:
f(x) = 0.01x,  x < 0
f(x) = x,      x >= 0
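The expression above translates directly into code. In this sketch the slope 0.01 is exposed as a parameter named alpha, which is an assumption for illustration; the article fixes it at 0.01.

```python
import numpy as np

def leaky_relu_function(x, alpha=0.01):
    """Leaky ReLU: alpha * x for negative inputs, x otherwise."""
    return np.where(x < 0, alpha * x, x)

out = leaky_relu_function(np.array([-10.0, -1.0, 0.0, 3.0]))
print(out)  # negative inputs are scaled down instead of being zeroed
```

Because the negative side keeps a small non-zero slope, the gradient there is 0.01 rather than 0, so those neurons can still be updated.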
6. Softmax Function
The softmax function is often described as a combination of multiple sigmoids. We know that sigmoid returns values between 0 and 1, which can be treated as probabilities of a data point belonging to a particular class. Thus sigmoid is widely used for binary classification problems, and softmax extends the same idea to multi-class problems.
softmax(z)_i = exp(z_i) / (exp(z_1) + exp(z_2) + ... + exp(z_k))
for i = 1, 2, 3, ..., k (k = number of categories)
The softmax function calculates the probability distribution of an event over k different events.
In other words, this function calculates the probability of each target class over all possible target classes.
import numpy as np

def softmax_function(x):
    # Exponentiate each score, then normalize so the outputs sum to 1
    z = np.exp(x)
    z_ = z / z.sum()
    return z_
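For large scores, np.exp can overflow in a direct implementation like the one above. A common workaround, shown here as a sketch rather than as the article's method, is to subtract the maximum score before exponentiating; this leaves the result unchanged mathematically. The name stable_softmax is an assumption.

```python
import numpy as np

def stable_softmax(x):
    """Softmax that subtracts the max score first to avoid overflow in exp."""
    x = np.asarray(x, dtype=float)
    shifted = x - np.max(x)     # the largest entry becomes 0
    exps = np.exp(shifted)
    return exps / exps.sum()

probs = stable_softmax([1000.0, 1001.0, 1002.0])
print(probs, probs.sum())  # valid probabilities that sum to 1
```

Shifting every score by the same constant cancels in the ratio, so these probabilities match those for the scores [0, 1, 2].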
Choosing the Right Activation Function
Now that we have seen so many activation functions, we need some logic or heuristics to know which activation function should be used in which situation. There is no hard-and-fast rule for what is good or bad.
However, depending on the properties of the problem, we may be able to make a better choice for easier and quicker convergence of the network.
Sigmoid functions and their combinations generally work better in the case of classifiers.
Sigmoid and tanh functions are sometimes avoided due to the vanishing gradient problem.
The ReLU function is a general-purpose activation function and is used in most cases these days.
If we encounter dead neurons in our networks, the leaky ReLU function is a good choice.
Always keep in mind that the ReLU function should only be used in the hidden layers.
As a rule of thumb, you can begin with the ReLU function and then move to other activation functions in case ReLU doesn't provide optimum results.