What is an Activation Function?
It is a curve (sigmoid, tanh, ReLU, etc.) through which a node's weighted input is passed.
The value of the activation function is then assigned to the node.
It is also called transfer function or squashing function.
Where in the Network?
As a beginner, you might ask where the activation function sits in the network, because looking at a network diagram you can only see nodes and weights. So where in the network is it?
The activation function is placed in every node of the network, and its result is assigned to that node.
How does it work?
The working of activation function can be understood by simply asking – what is the value of y on the curve for given x?
Let’s look at an example using the sigmoid function.
y = sigmoid ( x )
So a curve that runs from -infinity to +infinity on the X-axis while staying between 0 and 1 on the Y-axis will always give an output between 0 and 1 for any value of x.
In neural networks,
nodes = sigmoid ( np.dot(input, weights) )
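As a minimal runnable sketch of this line (the input values and weight matrix here are hypothetical, chosen only for illustration):

```python
import numpy as np

def sigmoid(x):
    # Squash any real value into the (0, 1) range
    return 1 / (1 + np.exp(-x))

# Hypothetical layer: 3 inputs feeding 2 nodes
inputs = np.array([0.5, -1.2, 3.0])
weights = np.array([[0.4, -0.6],
                    [0.1,  0.8],
                    [-0.5, 0.2]])

# Weighted sum first, then the activation function
nodes = sigmoid(np.dot(inputs, weights))
print(nodes)  # every value lies between 0 and 1
```

Whatever the weighted sums are, the sigmoid guarantees the node values stay between 0 and 1.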
Why do we use non-linear activation functions?
Simple patterns can be represented by straight lines; they are linear in nature.
The data present around us is complex. The relation between the input and output is not simple. The output of the dataset can be influenced by many inputs/variables.
The patterns which can’t be represented by a straight line are non–linear.
They are represented by curves.
So, to recognise a complex pattern where the output is influenced by many inputs, the relation tends to be non-linear, turning the simple line into a curve.
Therefore, we use non-linear activation functions for non-linear patterns present in data.
Types of Activation Functions
A cheatsheet of all activation functions is at the end.
1. Sigmoid
It looks like an S in shape.
The function is widely used because its output lies between 0 and 1, making it easier to distinguish between two classes.
The function is differentiable. That means we can find the slope of the sigmoid curve at any point.
Download Sigmoid Code – Python.
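A minimal NumPy sketch of the sigmoid and its slope (the downloadable code may differ):

```python
import numpy as np

def sigmoid(x):
    # S-shaped curve bounded between 0 and 1
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    # Slope of the sigmoid at x: s(x) * (1 - s(x))
    s = sigmoid(x)
    return s * (1 - s)

print(sigmoid(0))             # 0.5, the midpoint of the S-curve
print(sigmoid_derivative(0))  # 0.25, the steepest slope
```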
2. TanH
The tanh curve is steeper than the sigmoid curve; its slope around zero is higher than the sigmoid's.
Download tanH Code – Python.
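A minimal sketch of tanh and its slope (the downloadable code may differ), which also shows the steeper slope at zero compared to sigmoid:

```python
import numpy as np

def tanh(x):
    # S-shaped curve bounded between -1 and 1
    return np.tanh(x)

def tanh_derivative(x):
    # Slope of tanh at x: 1 - tanh(x)^2
    return 1 - np.tanh(x) ** 2

print(tanh(0))             # 0.0, centered at the origin
print(tanh_derivative(0))  # 1.0, vs 0.25 for sigmoid at 0
```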
3. ReLU (Rectified Linear Unit)
The slope of ReLU is 1 for positive inputs, which means y increases at the same rate as x. This results in y = x for every point above 0 on the curve.
Range: [0, infinity)
This helps in easily classifying many more objects through the neural network.
Download ReLU Code – Python.
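A minimal ReLU sketch (the downloadable code may differ): output the input itself when positive, and 0 otherwise.

```python
import numpy as np

def relu(x):
    # y = x for positive inputs, 0 for everything else
    return np.maximum(0, x)

x = np.array([-2.0, -0.5, 0.0, 1.0, 3.0])
print(relu(x))  # [0. 0. 0. 1. 3.]
```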
We can improve on ReLU further. But why would we need to?
Dying ReLU Problem 💀
At a high learning rate, you may find that as much as 40% of your network is “dead”, i.e. neurons that never activate across the entire training dataset.
4. Leaky ReLU
It fixes the dying ReLU problem.
Instead of being flat at 0, Leaky ReLU has a small slope (of 0.01, or so) for negative inputs.
This leak keeps gradients flowing and stops negative values from dying to a flat 0.
Range: (-infinity, +infinity)
Download Leaky ReLU Code – Python.
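A minimal Leaky ReLU sketch (the downloadable code may differ; the slope parameter name `alpha` is my choice):

```python
import numpy as np

def leaky_relu(x, alpha=0.01):
    # Small slope alpha keeps negative inputs from dying to 0
    return np.where(x > 0, x, alpha * x)

x = np.array([-2.0, 0.0, 3.0])
print(leaky_relu(x))  # negative input becomes -0.02, not 0
```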
Why is differentiation used?
When updating the network, differentiation tells us in which direction and by how much to change the weights, depending on the slope. That is why differentiation is used in almost every part of Machine Learning and Deep Learning.
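A toy illustration of this idea (the function f(w) = (w - 3)^2 here is hypothetical, chosen only so the answer is easy to check): the slope tells us which way to move, and we repeatedly step against it.

```python
def f_derivative(w):
    # Slope of f(w) = (w - 3)**2, which is minimized at w = 3
    return 2 * (w - 3)

w = 0.0
learning_rate = 0.1
for _ in range(100):
    # Move opposite the slope, by an amount proportional to it
    w -= learning_rate * f_derivative(w)

print(round(w, 3))  # approaches 3.0, the minimum
```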
Activation Function Cheatsheet
Conclusion | Which one is better?
If you have read the whole post, you will see that the last function standing is Leaky ReLU, which overcomes the issues of the previous functions.