Patent 11651192 was granted and assigned to Apple (company) on May, 2023 by the United States Patent and Trademark Office.
Systems and processes for training and compressing a convolutional neural network model include the use of quantization and layer fusion. Quantized training data is passed through a convolutional layer of a neural network model to generate convolutional results during a first iteration of training the neural network model. The convolutional results are passed through a batch normalization layer of the neural network model to update normalization parameters of the batch normalization layer. The convolutional layer is fused with the batch normalization layer to generate a first fused layer and the fused parameters of the fused layer are quantized. The quantized training data is passed through the fused layer using the quantized fused parameters to generate output data, which may be quantized for a subsequent layer in the training iteration.