Activation Functions in Neural Networks and Their Types

What is an Activation Function?

An activation function is a curve (such as sigmoid, tanh, or ReLU) used to map the values flowing through the network into a fixed output range. This is done for every node in the network. For example, the sigmoid can map any input value to a number between 0 and 1.

The output of the activation function is then assigned to the node.


It is also called a transfer function or a squashing function.


Where in the Network?

As a beginner, you might ask where the activation function sits in the network, because looking at a network diagram you can only see nodes and weights. So where is it?

Its results are assigned to the nodes of each layer.

Activation Functions in a Neural Network

The activation function is placed in every node of the network.


How does it work?

The working of an activation function can be understood by simply asking: what is the value of y on the curve for a given x?

Let’s look at an example of sigmoid function.

Working of an Activation Function

Mathematical representation

y = sigmoid(x)

So a curve whose x-axis runs from -infinity to +infinity but whose y-axis stays between 0 and 1 will always give an output between 0 and 1 for any value of x.

In neural networks,

nodes = sigmoid(np.dot(input, weights))
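
As a rough sketch of that line in context (the shapes and numbers below are made up for illustration, not taken from the article's download):

import numpy as np

def sigmoid(x):
    # squash any real value into the range (0, 1)
    return 1 / (1 + np.exp(-x))

# toy numbers: 3 input features feeding 4 nodes in a layer
inputs = np.array([0.5, -1.2, 3.0])       # one sample
weights = np.random.randn(3, 4)           # input-to-layer weights

nodes = sigmoid(np.dot(inputs, weights))  # activation of each node
print(nodes)                              # four values, each between 0 and 1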

Why do we use Activation Function?

Simple patterns are represented by straight lines. Therefore, they are linear in nature.

Linear Function
Linear Function

The data present around us is complex. The relation between the input and output is not simple. The output of the dataset can be influenced by many inputs/variables.

The patterns which can’t be represented by a straight line are nonlinear.

They are represented by curves.

So, to recognise a complex pattern where the output is influenced by many inputs, the relation tends to be non-linear, turning the simple line into a curve.

Therefore, we use non-linear activation functions for non-linear patterns present in data.
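
A quick way to see why non-linearity matters (a small sketch with made-up matrices): stacking layers without an activation function collapses into a single straight-line mapping, so the network cannot bend to fit a curve.

import numpy as np

np.random.seed(0)
x = np.random.randn(5)       # some input
W1 = np.random.randn(4, 5)   # first "layer" (no activation)
W2 = np.random.randn(3, 4)   # second "layer" (no activation)

two_layers = W2 @ (W1 @ x)   # stacking two linear layers
one_layer = (W2 @ W1) @ x    # is the same as one combined linear layer

print(np.allclose(two_layers, one_layer))  # True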


Types of Activation Functions

A cheatsheet of all the activation functions is at the end.

1. Sigmoid

It is S-shaped.

Sigmoid Curve

The function is widely used because its output lies between 0 and 1, making it easier to distinguish between two classes.

Range: (0,1)

The function is differentiable. That means we can find the slope of the sigmoid curve at any point.

Download Sigmoid Code (Python).
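
The downloadable file is not reproduced here, but a minimal sigmoid sketch (with the derivative that makes it trainable) might look like this:

import numpy as np

def sigmoid(x):
    # S-shaped curve: output always lies in (0, 1)
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    # slope of the curve at x, used during backpropagation
    s = sigmoid(x)
    return s * (1 - s)

print(sigmoid(0))             # 0.5  -- the midpoint of the curve
print(sigmoid_derivative(0))  # 0.25 -- the steepest the sigmoid ever gets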

2. tanH

tanh rises more quickly (is steeper) than the sigmoid curve.

Sigmoid v/s tanH

Therefore, tanh settles faster than the sigmoid on whether the output should be close to -1 or 1.

Range: (-1,1)

We can also say that the slope of tanh is steeper (higher) than that of the sigmoid curve.

Download tanh Code (Python).
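
Again only a sketch (not the downloadable file), showing tanh and how much steeper its slope is than the sigmoid's at the same point:

import numpy as np

def tanh(x):
    # output always lies in (-1, 1)
    return np.tanh(x)

def tanh_derivative(x):
    return 1 - np.tanh(x) ** 2

def sigmoid_derivative(x):
    s = 1 / (1 + np.exp(-x))
    return s * (1 - s)

print(tanh_derivative(0))     # 1.0  -- steeper at the centre...
print(sigmoid_derivative(0))  # 0.25 -- ...than the sigmoid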

3. ReLU (Rectified Linear Unit)

ReLU v/s Sigmoid

For positive inputs the slope of ReLU is 1, which means y grows at the same rate as x; this results in y = x for every point above 0 on the curve, while negative inputs are mapped to 0.

Range: [0, infinity)

This simple behaviour makes ReLU cheap to compute, which helps the network classify many more objects easily.

Download ReLU Code (Python).
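
A minimal ReLU sketch (illustrative only, not the downloadable file):

import numpy as np

def relu(x):
    # 0 for negative inputs, y = x for positive inputs
    return np.maximum(0, x)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(x))  # [0.  0.  0.  0.5 2. ]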

We can improve this further too. But why?

Dying ReLU Problem 💀

With a high learning rate, you may find that as much as 40% of your network is “dead”, i.e. neurons that never activate across the entire training dataset.

4. Leaky ReLU

Fixing the dying ReLU problem.

Leaky ReLU v/s ReLU


Leaky ReLU has a small slope (of 0.01, or so) for negative inputs instead of a flat zero.

Leaky ReLU stops negative inputs from being flattened to a dead 0, so those neurons can keep learning.

Range: (-infinity, +infinity)

Download Leaky ReLU Code (Python).
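
A minimal Leaky ReLU sketch (the 0.01 slope below is the commonly used default, and the code is illustrative rather than the downloadable file):

import numpy as np

def leaky_relu(x, negative_slope=0.01):
    # keep a small, non-zero slope for negative inputs
    return np.where(x > 0, x, negative_slope * x)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(leaky_relu(x))  # [-0.02  -0.005  0.     0.5    2.   ]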

Why is differentiation used?

When updating the weights, the slope tells us in which direction and by how much to change them. That is why we use differentiation in almost every part of machine learning and deep learning.
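
For example, a single gradient-descent update on one weight uses exactly that slope (a toy sketch with made-up numbers):

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

w = 0.8               # current weight
x, target = 1.5, 1.0  # one training example
lr = 0.1              # learning rate

pred = sigmoid(w * x)
error = pred - target
# chain rule: d(error^2)/dw = 2 * error * sigmoid'(w*x) * x
grad = 2 * error * pred * (1 - pred) * x
w = w - lr * grad     # step against the slope to reduce the error
print(w)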


Activation Function Cheatsheet

Activation Function Cheatsheet (*Source)

Conclusion | Which one is better?

If you have read through the post, you will see that the last function standing is Leaky ReLU, which overcomes the issues of the previous functions.

Download Activation Functions from Scratch – Python


