Category: ai notes
-
Gradient Clipping
Gradient clipping prevents the exploding gradient problem by limiting how large the adjustments to the neural network's weights can be during a single update. It imposes constraints on the magnitude of the calculated gradients during backpropagation, ensuring that the resulting weight updates remain within safe and manageable bounds, leading to stable and efficient training dynamics.…
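A minimal NumPy sketch of norm-based clipping; the helper name and the max_norm of 1.0 are illustrative assumptions, not taken from the note:

```python
import numpy as np

def clip_by_global_norm(grads, max_norm=1.0):
    """Scale all gradients down when their combined L2 norm exceeds max_norm."""
    global_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    if global_norm > max_norm:
        grads = [g * (max_norm / (global_norm + 1e-6)) for g in grads]
    return grads

# Example: an oversized gradient is rescaled; small ones pass through unchanged.
grads = [np.array([3.0, 4.0]), np.array([0.1, -0.2])]
clipped = clip_by_global_norm(grads, max_norm=1.0)
```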
-
Batch Normalization
Batch normalization is a method of reparametrizing any layer, input or hidden, in a deep neural network. Batch Normalization, or BN, resolves the vanishing gradient problem by ensuring that activations remain in the non-saturated regions of non-linear functions. This is achieved by forcing the inputs to have zero mean and unit variance. It resolves the…
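A minimal sketch of the training-time normalization step in NumPy; the learnable gamma/beta parameters and the eps value are standard details shown here as assumptions:

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Normalize each feature over the batch to zero mean and unit variance,
    then rescale with learnable gamma and shift with learnable beta."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta

# Example: activations with a large mean and variance come out roughly standardized.
x = np.random.randn(4, 3) * 10 + 5
y = batch_norm(x, gamma=np.ones(3), beta=np.zeros(3))
```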
-
He Initialization
Also known as Kaiming Initialization, this technique was developed for neural networks that use the ReLU activation function. ReLU is a non-linear function that clips all negative inputs to zero. This results in a non-zero, positive mean and a reduction in the signal's overall magnitude. When we apply Xavier initialization to deep networks using ReLU,…
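A small sketch, assuming the usual Gaussian form of He initialization with variance 2 / fan_in (the layer sizes are arbitrary):

```python
import numpy as np

def he_init(fan_in, fan_out, seed=None):
    """He (Kaiming) initialization: weights drawn from N(0, 2 / fan_in),
    compensating for ReLU zeroing out roughly half of the activations."""
    rng = np.random.default_rng(seed)
    std = np.sqrt(2.0 / fan_in)
    return rng.normal(0.0, std, size=(fan_in, fan_out))

W = he_init(fan_in=512, fan_out=256)
```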
-
Xavier Initialization
Training deep networks requires careful initialization. If weights are too small, activations and gradients can shrink toward zero; if too large, they can grow out of control. Also known as Glorot Initialization, Xavier Initialization was proposed by Xavier Glorot and Yoshua Bengio in 2010. It is designed to combat both vanishing and exploding…
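A small sketch, assuming the common uniform form of Xavier initialization with limit sqrt(6 / (fan_in + fan_out)) (layer sizes are arbitrary):

```python
import numpy as np

def xavier_init(fan_in, fan_out, seed=None):
    """Xavier (Glorot) uniform initialization: chosen so activation and
    gradient variances stay roughly constant from layer to layer."""
    rng = np.random.default_rng(seed)
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

W = xavier_init(fan_in=512, fan_out=256)
```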
-
Solving Gradient Problems in Neural Networks
Back-propagation calculates the gradient of the loss function with respect to the weights and updates the weights to reduce the error. The main mathematical principle used here is the chain rule. However, the repeated multiplication inherent in this process can lead to either the vanishing or the exploding gradient problem. Solutions For Both Issues Intelligent…
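A toy illustration of the repeated multiplication described above; the per-layer factors 0.9 and 1.1 and the depth of 50 are made-up numbers chosen only to show the two regimes:

```python
# Repeated multiplication through many layers: per-layer factors below 1
# shrink the gradient toward zero, factors above 1 blow it up.
layers = 50
for factor in (0.9, 1.1):
    grad = 1.0
    for _ in range(layers):
        grad *= factor
    print(f"factor {factor}: gradient after {layers} layers = {grad:.2e}")
# factor 0.9 -> ~5e-03 (vanishing); factor 1.1 -> ~1e+02 (exploding)
```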
-
The Exploding Gradient Problem
Similar to the vanishing gradient problem, the issue of exploding gradients arises during backpropagation. In this chain-reaction-like scenario, gradients become excessively large, causing model weights to grow uncontrollably. This instability often leads to numerical overflow, which results in a ‘Not a Number’ (NaN) error. Spotting the Exploding Gradient Problem Exploding gradients are a tell-tale sign…
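A minimal sketch of one way to spot the problem during training, by watching the gradient norm for NaNs or sudden spikes; the threshold value is an illustrative assumption:

```python
import numpy as np

def check_gradients(grads, explode_threshold=1e3):
    """Flag the tell-tale signs of exploding gradients: NaNs from numerical
    overflow and a very large overall gradient norm."""
    norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    if np.isnan(norm):
        return "NaN detected - training has already diverged"
    if norm > explode_threshold:
        return f"gradient norm {norm:.1e} exceeds threshold - likely exploding"
    return f"gradient norm {norm:.1e} looks healthy"
```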
-
Back-propagation
Back-propagation is a computational method for computing the gradient of the loss function in neural networks. This is the third step in the neural network process. The mathematical concept used here is the chain rule from calculus. The chain rule helps in finding the derivative of a composite function, and a neural network can be…
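A tiny worked example of the chain rule as back-propagation applies it, on a one-weight network with a sigmoid activation and squared error; all values are made up for illustration:

```python
import numpy as np

x, w, target = 2.0, 0.5, 1.0

z = w * x                       # forward: linear step
a = 1.0 / (1.0 + np.exp(-z))    # forward: sigmoid activation
loss = 0.5 * (a - target) ** 2  # forward: squared error

dL_da = a - target              # backward: derivative of the loss w.r.t. a
da_dz = a * (1.0 - a)           # backward: derivative of the sigmoid
dz_dw = x                       # backward: derivative of the linear step
dL_dw = dL_da * da_dz * dz_dw   # chain rule: multiply the local derivatives
```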
-
Neural Network Process
Step 1: Forward Pass
Step 2: Calculate the Error (Loss/Cost Function)
Step 3: Backward Pass (Back-propagation)
Step 4: Update Weights
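The four steps in code, as a minimal sketch using a linear model and plain gradient descent in NumPy; the toy data, learning rate, and epoch count are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(32, 3))                # toy inputs
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1    # toy targets
W = rng.normal(size=3)                      # weights to learn
lr = 0.1

for epoch in range(100):
    pred = X @ W                            # Step 1: forward pass
    loss = np.mean((pred - y) ** 2)         # Step 2: calculate the error (MSE)
    grad = 2 * X.T @ (pred - y) / len(y)    # Step 3: backward pass (gradient)
    W -= lr * grad                          # Step 4: update weights
```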