Batch Normalization vs Internal Covariate Shift
When batch normalization was introduced in 2015 by Sergey Ioffe and Christian Szegedy, it revolutionized deep learning training. The paper claimed that batch normalization’s success stemmed from reducing “internal covariate shift”—a phenomenon in which the distribution of each layer’s inputs changes during training, forcing every layer to continuously adapt to a moving target. This explanation became widely accepted in the deep learning community.
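As a refresher before examining that claim, the transform itself is simple: each feature is normalized to zero mean and unit variance over the mini-batch, then scaled and shifted by learnable parameters (γ and β in the paper). A minimal NumPy sketch of the training-time forward pass (function and variable names are illustrative, not from the paper):

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Batch-normalize x of shape (batch, features).

    gamma and beta are the learnable scale and shift parameters
    (scalars here for simplicity; per-feature vectors in practice).
    """
    mean = x.mean(axis=0)                     # per-feature batch mean
    var = x.var(axis=0)                       # per-feature batch variance
    x_hat = (x - mean) / np.sqrt(var + eps)   # zero mean, unit variance
    return gamma * x_hat + beta               # scale and shift

x = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
y = batch_norm(x)
print(np.allclose(y.mean(axis=0), 0.0, atol=1e-6))  # True: means ≈ 0
```

At inference time the batch statistics are replaced by running averages accumulated during training, so the output no longer depends on the composition of the mini-batch.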