Tag: deep learning
-
Gradient Clipping
Gradient clipping prevents exploding gradients by limiting how large the adjustments to the neural network’s weights can be during a single update. It imposes constraints on the magnitude of the gradients computed during backpropagation, ensuring that the resulting weight updates remain within safe and manageable bounds, which leads to stable and efficient training dynamics.…
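As a rough illustration, the sketch below clips the global L2 norm of the gradients between the backward pass and the optimizer step. It assumes PyTorch is available; the tiny linear model, dummy data, and the max_norm=1.0 threshold are placeholders, not a prescribed recipe.

```python
import torch
import torch.nn as nn

# Minimal sketch: one training step with gradient clipping by global norm.
# Model size, data, and max_norm are illustrative placeholders.
model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()

x = torch.randn(32, 10)   # dummy batch of inputs
y = torch.randn(32, 1)    # dummy targets

optimizer.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()           # compute gradients via backpropagation

# Rescale all gradients so their combined L2 norm does not exceed 1.0,
# bounding the size of the upcoming weight update.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)

optimizer.step()          # apply the (now bounded) weight update
```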
-
Batch Normalization
Batch normalization is a method of reparametrizing any layer, input or hidden, in a deep neural network. Batch Normalization, or BN, mitigates the vanishing gradient problem by keeping activations in the non-saturated regions of non-linear activation functions. It achieves this by normalizing each layer's inputs to have zero mean and unit variance over a mini-batch. It resolves the…
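A minimal NumPy sketch of the batch-norm forward pass is shown below, assuming a 2-D batch of shape (batch, features). The batch_norm helper and its gamma, beta, and eps arguments are illustrative names, not any particular library's API; the learnable scale and shift (gamma, beta) follow the normalization step as in the standard formulation.

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Illustrative batch-norm forward pass for a (batch, features) array.

    Normalizes each feature to zero mean and unit variance over the batch,
    then applies the learnable scale (gamma) and shift (beta).
    """
    mean = x.mean(axis=0)                      # per-feature batch mean
    var = x.var(axis=0)                        # per-feature batch variance
    x_hat = (x - mean) / np.sqrt(var + eps)    # zero mean, unit variance
    return gamma * x_hat + beta                # re-scale and re-shift

# Usage with dummy values: 64 samples, 8 features, shifted and scaled inputs.
x = np.random.randn(64, 8) * 3.0 + 5.0
out = batch_norm(x, gamma=np.ones(8), beta=np.zeros(8))
print(out.mean(axis=0).round(3))   # ≈ 0 for every feature
print(out.std(axis=0).round(3))    # ≈ 1 for every feature
```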
-
He Initialization
Also known as Kaiming initialization, this technique was developed for neural networks that use the ReLU activation function. ReLU is a non-linear function that clips all negative inputs to zero. This gives the activations a non-zero, positive mean and reduces the overall variance of the signal. When we apply Xavier initialization to deep networks using ReLU,…
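A minimal NumPy sketch of He initialization for a fully connected ReLU layer is shown below; the he_init helper and the 512 -> 256 layer sizes are illustrative assumptions. Weights are drawn from a normal distribution with variance 2 / fan_in, which compensates for ReLU discarding the negative half of the incoming signal.

```python
import numpy as np

def he_init(fan_in, fan_out, rng=np.random.default_rng()):
    """He (Kaiming) normal initialization for a ReLU layer (illustrative).

    Draws weights from N(0, 2 / fan_in) so the variance of activations is
    roughly preserved from layer to layer despite ReLU zeroing negatives.
    """
    std = np.sqrt(2.0 / fan_in)
    return rng.normal(0.0, std, size=(fan_in, fan_out))

W = he_init(fan_in=512, fan_out=256)   # weights for a 512 -> 256 ReLU layer
print(W.std())                         # ≈ sqrt(2 / 512) ≈ 0.0625
```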