Vanishing-Gradient-Problem

As more layers using certain activation functions are added to a neural network, the gradients of the loss function approach zero, making the network hard to train.

Why is this a problem?

Certain activation functions, like the sigmoid function, squash a large input space into a small output range between 0 and 1. Therefore, a large change in the input of the sigmoid function causes only a small change in the output. Hence, the derivative becomes small.
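To make this concrete, here is a minimal sketch that evaluates the sigmoid function and its derivative at a few inputs. The helper names and the sample inputs are chosen only for illustration and are not taken from the notebook.

```python
import math

def sigmoid(x: float) -> float:
    """Standard logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_derivative(x: float) -> float:
    """Derivative of the sigmoid: s(x) * (1 - s(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)

# Illustrative inputs (assumed, not from the notebook).
for x in (0.0, 2.0, 5.0, 10.0):
    print(f"x={x:5.1f}  sigmoid={sigmoid(x):.6f}  derivative={sigmoid_derivative(x):.6f}")
```

The derivative peaks at 0.25 when x = 0 and shrinks quickly as |x| grows.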

Let's look at this problem through the code:

I used a scikit-learn dataset to train the neural network. Here is the dataset:

Neural Network Model Summary

Let's concentrate on the first layer of the model

For Sigmoid

Weights before Training:
[-0.1077401 , 0.43455356, -0.60425615, -0.53927976, -0.190355,-0.12216991, 0.17259805, -0.29025432, -0.34041786, 0.5119183 ]

Weights after 1st Epoch:
[ 0.53107524, 0.12173636, 0.5073124 , 0.54416114, -0.19135918,-0.08547983, 0.56848377, 0.44454524, -0.6231898 , 0.0601779 ]

Change in Weight:
[-0.00113249, -0.00108778, -0.00292063, -0.00071526, -0.00165403,-0.00002235, -0.00202656, -0.00181794, 0.00232458, -0.00081211]

As you can see, the change in weight is very small, which means that the weights and biases of the initial layers are not updated effectively in each training epoch. This can reduce the overall accuracy of the model.
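One way to see why the initial layers are affected most is the chain rule: the gradient reaching the first layer contains one sigmoid derivative per layer it passes through. The sketch below uses a toy chain of scalar units, not the model from the notebook. The function name, the pre-activation values, and the weights are all hypothetical.

```python
import math

def sigmoid_derivative(x: float) -> float:
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)

def first_layer_gradient_factor(pre_activations, weights):
    """Product of w * sigmoid'(z) along a single path of scalar units.

    This is the factor by which backpropagation scales the gradient
    on its way back to the first layer in this toy chain.
    """
    factor = 1.0
    for z, w in zip(pre_activations, weights):
        factor *= w * sigmoid_derivative(z)
    return factor

# Best case for sigmoid: every pre-activation is 0 (derivative 0.25)
# and every weight is 1. These values are assumed for illustration.
for depth in (1, 3, 5, 10):
    factor = first_layer_gradient_factor([0.0] * depth, [1.0] * depth)
    print(f"depth={depth:2d}  gradient factor={factor:.10f}")
```

Even in this best case the factor is 0.25 raised to the number of layers, so it shrinks geometrically with depth.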

One solution is Restricted Input:

Batch normalization reduces this problem by normalizing the input so that |x| doesn't reach the outer edges of the sigmoid function. Most of the input then falls in the green region, where the derivative isn't too small.
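The effect of normalizing the input can be sketched as follows. This covers only the standardization step (zero mean, unit variance) and leaves out the learnable scale and shift that a full batch normalization layer also has. The `standardize` helper, the `eps` value, and the sample pre-activations are assumptions made for illustration.

```python
import math
import statistics

def sigmoid_derivative(x: float) -> float:
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)

def standardize(values, eps=1e-5):
    """Shift and scale a batch to zero mean and (roughly) unit variance."""
    mean = statistics.fmean(values)
    var = statistics.pvariance(values, mu=mean)
    return [(v - mean) / math.sqrt(var + eps) for v in values]

# Hypothetical pre-activations, many of them far out on the sigmoid's flat tails.
raw = [-12.0, -8.0, -3.0, 4.0, 9.0, 15.0]
normalized = standardize(raw)

mean_grad_raw = statistics.fmean([sigmoid_derivative(v) for v in raw])
mean_grad_normalized = statistics.fmean([sigmoid_derivative(v) for v in normalized])

print(f"mean sigmoid derivative, raw inputs:        {mean_grad_raw:.4f}")
print(f"mean sigmoid derivative, normalized inputs: {mean_grad_normalized:.4f}")
```

After standardization the values sit close to zero, where the sigmoid derivative is near its maximum of 0.25.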

Simplest solution is ReLU Activation:

ReLU doesn't cause a small derivative: its derivative is 1 for every positive input.
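A short sketch comparing the two derivatives side by side. The helper names and inputs are illustrative only, and the ReLU derivative at exactly 0 is taken to be 0 here by convention.

```python
import math

def sigmoid_derivative(x: float) -> float:
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)

def relu_derivative(x: float) -> float:
    """1 for positive inputs, 0 otherwise (value at 0 chosen by convention)."""
    return 1.0 if x > 0 else 0.0

# Illustrative positive inputs (assumed, not from the notebook).
for x in (0.5, 2.0, 5.0, 10.0):
    print(f"x={x:5.1f}  sigmoid'={sigmoid_derivative(x):.6f}  relu'={relu_derivative(x):.1f}")
```

For positive inputs the ReLU derivative stays at 1 regardless of magnitude, while the sigmoid derivative decays toward 0. For negative inputs the ReLU derivative is 0.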

For ReLU

Weights before Training:
[-0.7054342 , 0.55165845, -0.44873548, -0.62863475, 0.61142164, -0.4026693 , 0.4864213 , -0.48546308, 0.32516545, 0.49556655]

Weights after 1st Epoch:
[ 0.53107524, 0.12173636, 0.5073124 , 0.54416114, -0.19135918,-0.08547983, 0.56848377, 0.44454524, -0.6231898 , 0.0601779 ]

Change in Weight:
[ 3.5640223, 5.7049985, -10.301827 , 8.888125 , -7.2175856, -6.165191 , -9.538114 , 3.937869 , -5.6343074, 5.639493 ]

Here the change in the weights of the model is significant when the ReLU activation function is used.
