Patent attributes
Techniques are described for efficiently reducing the amount of total computation in convolutional neural networks (CNNs) without affecting the output result or classification accuracy. Computation redundancy in CNNs is reduced by exploiting the computing nature of the convolution and subsequent pooling (e.g., sub-sampling) operations. In some implementations, the input features may be divided into a group of precision values and the operation(s) may be cascaded. A maximum may be identified (e.g., by 90% probability) using a small number of bits in the input features, and the full-precision convolution may then be performed on the maximum input. Accordingly, the total number of bits used to perform the convolution is reduced without affecting the output features or the final classification accuracy.