Our method draws its strength from making normalization a part of the model architecture and performing the normalization for each training mini-batch. Batch Normalization allows us to use much higher learning rates and be less careful about initialization.
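The per-mini-batch normalization described above can be sketched as follows; this is a minimal NumPy illustration of the forward pass, where `gamma`, `beta`, and `eps` are the learned scale, learned shift, and numerical-stability constant (names chosen here for illustration):

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    # Normalize each feature over the mini-batch dimension (axis 0),
    # then apply a learned scale (gamma) and shift (beta).
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta

# Example: a mini-batch of 4 samples, each with 3 features.
x = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],
              [3.0, 6.0, 9.0],
              [4.0, 8.0, 12.0]])
out = batch_norm(x, gamma=np.ones(3), beta=np.zeros(3))
# Each column of `out` now has approximately zero mean and unit variance.
```

Because the statistics are computed per mini-batch, the normalization participates in backpropagation, which is what lets the method act as part of the architecture rather than a fixed preprocessing step.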