Category: ai notes
-
Gradient Clipping
Gradient clipping prevents the exploding gradient problem by limiting how large the adjustments to the neural network's weights can be during a single update. It imposes constraints on the magnitude of the calculated gradients during backpropagation, ensuring that the resulting weight updates remain within safe and manageable bounds, leading to stable and efficient training dynamics.…
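A minimal NumPy sketch of norm-based clipping; the helper name and the max_norm of 1.0 are illustrative assumptions, not taken from the note:

```python
import numpy as np

def clip_by_global_norm(grads, max_norm=1.0):
    """Scale all gradients down when their combined L2 norm exceeds max_norm."""
    global_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    if global_norm > max_norm:
        grads = [g * (max_norm / (global_norm + 1e-6)) for g in grads]
    return grads

# Example: an oversized gradient is rescaled; small ones pass through unchanged.
grads = [np.array([3.0, 4.0]), np.array([0.1, -0.2])]
clipped = clip_by_global_norm(grads, max_norm=1.0)
```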
-
Batch Normalization
Batch normalization is a method of reparametrizing any layer, input or hidden, in a deep neural network. Batch Normalization, or BN, resolves the vanishing gradient problem by ensuring that activations remain in the non-saturated regions of non-linear functions. This is achieved by forcing the inputs to have zero mean and unit variance. It resolves the…
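A minimal sketch of the training-time normalization step in NumPy; the learnable gamma/beta parameters and the eps value are standard details shown here as assumptions:

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Normalize each feature over the batch to zero mean and unit variance,
    then rescale with learnable gamma and shift with learnable beta."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta

# Example: activations with a large mean and variance come out roughly standardized.
x = np.random.randn(4, 3) * 10 + 5
y = batch_norm(x, gamma=np.ones(3), beta=np.zeros(3))
```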
-
He Initialization
Also known as Kaiming Initialization, this technique was developed for neural networks that use the ReLU activation function. ReLU is a non-linear function that clips all negative inputs to zero. This results in a non-zero, positive mean and a reduction in the signal's overall magnitude. When we apply Xavier initialization to deep networks using ReLU,…
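A small sketch, assuming the usual Gaussian form of He initialization with variance 2 / fan_in (the layer sizes are arbitrary):

```python
import numpy as np

def he_init(fan_in, fan_out, seed=None):
    """He (Kaiming) initialization: weights drawn from N(0, 2 / fan_in),
    compensating for ReLU zeroing out roughly half of the activations."""
    rng = np.random.default_rng(seed)
    std = np.sqrt(2.0 / fan_in)
    return rng.normal(0.0, std, size=(fan_in, fan_out))

W = he_init(fan_in=512, fan_out=256)
```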
-
Xavier Initialization
Training deep networks requires careful initialization. If weights are too small, activations and gradients can shrink toward zero; if too large, they can grow out of control. Also known as Glorot Initialization, Xavier Initialization was proposed by Xavier Glorot and Yoshua Bengio in 2010. It is designed to combat both vanishing and exploding…
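A small sketch, assuming the common uniform form of Xavier initialization with limit sqrt(6 / (fan_in + fan_out)) (layer sizes are arbitrary):

```python
import numpy as np

def xavier_init(fan_in, fan_out, seed=None):
    """Xavier (Glorot) uniform initialization: chosen so activation and
    gradient variances stay roughly constant from layer to layer."""
    rng = np.random.default_rng(seed)
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

W = xavier_init(fan_in=512, fan_out=256)
```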
-
Solving Gradient Problems in Neural Networks
Back-propagation calculates the gradient of the loss function with respect to the weights and updates the weights to reduce the error. The main mathematical principle used here is the chain rule. However, the repeated multiplication inherent in this process can lead to either the vanishing or the exploding gradient problem. Solutions For Both Issues Intelligent…
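A toy illustration of the repeated multiplication described above; the per-layer factors 0.9 and 1.1 and the depth of 50 are made-up numbers chosen only to show the two regimes:

```python
# Repeated multiplication through many layers: per-layer factors below 1
# shrink the gradient toward zero, factors above 1 blow it up.
layers = 50
for factor in (0.9, 1.1):
    grad = 1.0
    for _ in range(layers):
        grad *= factor
    print(f"factor {factor}: gradient after {layers} layers = {grad:.2e}")
# factor 0.9 -> ~5e-03 (vanishing); factor 1.1 -> ~1e+02 (exploding)
```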
-
The Exploding Gradient Problem
Similar to the vanishing gradient problem, the issue of exploding gradients arises during backpropagation. In this chain-reaction-like scenario, gradients become excessively large, causing model weights to grow uncontrollably. This instability often leads to numerical overflow, which results in a ‘Not a Number’ (NaN) error. Spotting the Exploding Gradient Problem Exploding gradients are a tell-tale sign…
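A minimal sketch of one way to spot the problem during training, by watching the gradient norm for NaNs or sudden spikes; the threshold value is an illustrative assumption:

```python
import numpy as np

def check_gradients(grads, explode_threshold=1e3):
    """Flag the tell-tale signs of exploding gradients: NaNs from numerical
    overflow and a very large overall gradient norm."""
    norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    if np.isnan(norm):
        return "NaN detected - training has already diverged"
    if norm > explode_threshold:
        return f"gradient norm {norm:.1e} exceeds threshold - likely exploding"
    return f"gradient norm {norm:.1e} looks healthy"
```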
-
Back-propagation
Back-propagation is a computational method for computing the gradient of the loss function in neural networks. This is the third step in the neural network process. The mathematical concept used here is the chain rule from calculus. The chain rule helps in finding the derivative of a composite function, and a neural network can be…
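A tiny worked example of the chain rule as back-propagation applies it, on a one-weight network with a sigmoid activation and squared error; all values are made up for illustration:

```python
import numpy as np

x, w, target = 2.0, 0.5, 1.0

z = w * x                       # forward: linear step
a = 1.0 / (1.0 + np.exp(-z))    # forward: sigmoid activation
loss = 0.5 * (a - target) ** 2  # forward: squared error

dL_da = a - target              # backward: derivative of the loss w.r.t. a
da_dz = a * (1.0 - a)           # backward: derivative of the sigmoid
dz_dw = x                       # backward: derivative of the linear step
dL_dw = dL_da * da_dz * dz_dw   # chain rule: multiply the local derivatives
```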
-
Neural Network Process
Step 1: Forward Pass
Step 2: Calculate the Error (Loss/Cost Function)
Step 3: Backward Pass (Back-propagation)
Step 4: Update Weights
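The four steps in code, as a minimal sketch using a linear model and plain gradient descent in NumPy; the toy data, learning rate, and epoch count are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(32, 3))                # toy inputs
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1    # toy targets
W = rng.normal(size=3)                      # weights to learn
lr = 0.1

for epoch in range(100):
    pred = X @ W                            # Step 1: forward pass
    loss = np.mean((pred - y) ** 2)         # Step 2: calculate the error (MSE)
    grad = 2 * X.T @ (pred - y) / len(y)    # Step 3: backward pass (gradient)
    W -= lr * grad                          # Step 4: update weights
```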