Our method draws its strength from making normalization a part of the model architecture and performing the normalization for each training mini-batch. Batch Normalization allows us to use much higher learning rates and be less careful about initialization.
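The per-mini-batch normalization described above can be sketched as follows; this is a minimal NumPy illustration of the forward pass, where `gamma`, `beta`, and `eps` are the learned scale, learned shift, and numerical-stability constant (names chosen here for illustration):

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    # Normalize each feature over the mini-batch dimension (axis 0),
    # then apply a learned scale (gamma) and shift (beta).
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta

# Example: a mini-batch of 4 samples, each with 3 features.
x = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],
              [3.0, 6.0, 9.0],
              [4.0, 8.0, 12.0]])
out = batch_norm(x, gamma=np.ones(3), beta=np.zeros(3))
# Each column of `out` now has approximately zero mean and unit variance.
```

Because the statistics are computed per mini-batch, the normalization participates in backpropagation, which is what lets the method act as part of the architecture rather than a fixed preprocessing step.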