Activation Functions in Deep Learning: Understanding the Role of Activation Functions in Neural Networks

udit · Jan 1, 2023
Source: https://machine-learning.paperspace.com/wiki/activation-function

Activation functions are mathematical functions used to introduce non-linearity into neural networks. They are applied element-wise to the output of a layer: each element of the output is transformed by the activation function before being passed to the next layer.

The purpose of activation functions is to introduce non-linearity into the network, which allows it to learn more complex patterns in the data. Without activation functions, a stack of layers can only compute a linear function of its input (a composition of linear maps is itself linear), so the network would be limited to learning linear patterns.
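
To make this concrete, here is a minimal NumPy sketch (the layer sizes and random weights are illustrative assumptions) showing that two stacked linear layers without an activation collapse into a single linear map, while inserting a ReLU between them breaks that equivalence:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))        # a small batch of 4 inputs with 3 features
W1 = rng.normal(size=(3, 5))
W2 = rng.normal(size=(5, 2))

# Two stacked linear layers with no activation collapse into a single linear map:
two_linear = x @ W1 @ W2
one_linear = x @ (W1 @ W2)
print(np.allclose(two_linear, one_linear))   # True: stacking adds no expressive power

# Inserting a non-linearity (here ReLU) between the layers breaks this equivalence.
nonlinear = np.maximum(0, x @ W1) @ W2
print(np.allclose(nonlinear, one_linear))    # False: the network can now model non-linear patterns
```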

There are various types of activation functions that can be used in deep learning, including sigmoid, tanh, ReLU, and softmax. Each type of activation function has its own unique characteristics and is suitable for different types of data.

Sigmoid activation function:

The sigmoid activation function is an S-shaped curve that maps any input to a value between 0 and 1. It is defined by the following function:

f(x) = 1 / (1 + e^(-x))

The sigmoid activation function has the following characteristics:

  1. It is a smooth function that allows for smooth gradients.
  2. It saturates for large positive or negative values: the output approaches 1 for large positive inputs and 0 for large negative inputs, and the gradient becomes very small in these regions.
  3. It is sensitive to the scale of the input: because the function saturates, rescaling the input can push values into the flat regions and significantly change the output.
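
As a quick illustration of the formula and the saturation behaviour listed above, here is a minimal NumPy sketch (the sample input values are arbitrary):

```python
import numpy as np

def sigmoid(x):
    """Sigmoid: f(x) = 1 / (1 + e^(-x)), applied element-wise."""
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
y = sigmoid(x)
print(y)              # values squashed into (0, 1): ~0 at -10, 0.5 at 0, ~1 at 10

# The gradient f'(x) = f(x) * (1 - f(x)) shrinks toward 0 in the saturated regions.
grad = y * (1.0 - y)
print(grad)           # largest at x = 0 (0.25), nearly 0 for |x| = 10
```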

Tanh activation function:

The tanh activation function is an S-shaped curve that maps any input to a value between -1 and 1. It is defined by the following function:

f(x) = 2 / (1 + e^(-2x)) - 1

The tanh activation function has the following characteristics:

  1. It is a smooth function that allows for smooth gradients.
  2. It saturates for large positive or negative values, meaning that the output of the function approaches -1 or 1 as the input becomes very large.
  3. It is less sensitive to the scale of the input than the sigmoid, in part because its outputs are centered around 0 rather than around 0.5.
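
A minimal NumPy sketch of the formula above (the sample inputs are arbitrary), checked against NumPy's built-in tanh:

```python
import numpy as np

def tanh(x):
    """Tanh via the formula above: f(x) = 2 / (1 + e^(-2x)) - 1."""
    return 2.0 / (1.0 + np.exp(-2.0 * x)) - 1.0

x = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
print(tanh(x))                            # values squashed into (-1, 1), centered at 0
print(np.allclose(tanh(x), np.tanh(x)))   # True: matches NumPy's built-in tanh
```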

ReLU activation function:

The ReLU (Rectified Linear Unit) activation function is a piecewise linear function that outputs the input when it is positive and 0 otherwise. It is defined by the following function:

f(x) = max(0, x)

The ReLU activation function has the following characteristics:

  1. It is not differentiable at 0, but its simple piecewise-linear form allows for faster training compared to smooth, saturating activation functions.
  2. It does not saturate for large positive values: the output equals the input for all positive values, while it is 0 for all negative values.
  3. It is not sensitive to the scale of the input: scaling a positive input simply scales the output by the same factor.
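
A minimal NumPy sketch of ReLU and its scale behaviour (the sample inputs are arbitrary):

```python
import numpy as np

def relu(x):
    """ReLU: f(x) = max(0, x), applied element-wise."""
    return np.maximum(0.0, x)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(x))        # negatives clipped to 0, positives passed through unchanged

# Scaling the input simply scales the output: relu(c * x) == c * relu(x) for c > 0.
print(np.allclose(relu(3.0 * x), 3.0 * relu(x)))  # True
```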

Softmax activation function:

The softmax activation function is a generalization of the sigmoid activation function used for multiclass classification tasks. It maps a vector of inputs to values between 0 and 1 that sum to 1. For each element x_i of the input vector, it is defined by the following function:

f(x_i) = e^(x_i) / sum_j(e^(x_j))

The softmax activation function has the following characteristics:

  1. It maps the input values to a probability distribution, meaning that the sum of the output values is 1.
  2. It is sensitive to the scale of the input: scaling the inputs up makes the resulting distribution sharper, while scaling them down makes it closer to uniform.
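
A minimal NumPy sketch of softmax (the example logits are arbitrary; subtracting the maximum before exponentiating is a common numerical-stability trick and does not change the result):

```python
import numpy as np

def softmax(x):
    """Softmax: f(x_i) = e^(x_i) / sum_j(e^(x_j)).

    Subtracting the maximum first avoids overflow in exp(); it does not
    change the result because softmax is shift-invariant.
    """
    shifted = x - np.max(x)
    exps = np.exp(shifted)
    return exps / np.sum(exps)

logits = np.array([2.0, 1.0, 0.1])
probs = softmax(logits)
print(probs)              # approx. [0.659, 0.242, 0.099]: a valid probability distribution
print(probs.sum())        # 1.0

# Scale sensitivity: multiplying the logits sharpens the distribution.
print(softmax(10.0 * logits))   # almost all mass on the largest logit
```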

Conclusion:

Activation functions are mathematical functions used to introduce non-linearity into neural networks. They play a crucial role in the functioning of neural networks, allowing them to learn more complex patterns in the data. There are various types of activation functions that can be used in deep learning, each with its own characteristics and suited to different types of data and tasks.
