The exploding gradient problem was first described in an academic paper titled "The problem of learning long-term dependencies in recurrent networks". It is a difficulty that can arise when training artificial neural networks with gradient descent and backpropagation.
When large error gradients accumulate, the resulting weight updates can become very large, making the network unstable and impairing learning; in extreme cases the weight values grow so large that they overflow. A gradient expresses the direction and magnitude computed during training and is used to nudge the network weights in the right direction by the right amount. When gradients explode, the error-gradient components can grow exponentially as they are propagated backward through the layers.
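As a rough illustration of this exponential growth (a minimal NumPy sketch, not taken from any specific source; the depth, width, and weight scale are arbitrary), repeatedly backpropagating a gradient through layers whose weights are slightly too large multiplies its norm at every step:

    import numpy as np

    rng = np.random.default_rng(0)
    width, depth = 100, 50
    grad = rng.normal(size=width)                 # gradient arriving at the last layer

    # A weight matrix whose typical singular values exceed 1, so each
    # backward step stretches the gradient rather than preserving it.
    W = rng.normal(scale=1.2 / np.sqrt(width), size=(width, width))

    for layer in range(depth):
        grad = W.T @ grad                          # backpropagate through one linear layer
        if layer % 10 == 0:
            print(f"layer {layer:2d}: gradient norm = {np.linalg.norm(grad):.3e}")

Running this prints gradient norms that grow by orders of magnitude over the fifty layers, which is the behaviour the term "exploding gradient" describes.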
The exploding gradient problem can be addressed by redesigning the network model, using rectified linear (ReLU) activations, using long short-term memory (LSTM) networks, applying gradient clipping, or applying weight regularization. Gradient clipping prevents gradients from becoming too big by placing a predefined threshold on them: the clipped gradients keep pointing in the same direction but are shortened so their magnitude stays within the threshold.
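As a hedged sketch of how gradient clipping is commonly applied in practice (using PyTorch's clip_grad_norm_ utility; the model, data, and threshold of 1.0 here are placeholders), the gradients are rescaled after the backward pass whenever their combined norm exceeds the chosen threshold:

    import torch
    import torch.nn as nn

    model = nn.Linear(10, 1)                       # placeholder model
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    loss_fn = nn.MSELoss()

    x, y = torch.randn(32, 10), torch.randn(32, 1)  # placeholder batch

    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()

    # Rescale all gradients so their total norm is at most 1.0;
    # the update direction is preserved, only its length shrinks.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)

    optimizer.step()

Clipping by norm, as above, preserves the relative proportions of the individual gradient components; an alternative is to clip each component to a fixed value, which changes the direction as well.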