Patent attributes
In one embodiment, a method for machine learning acceleration includes: receiving instructions to perform convolution on an input tensor using a filter tensor; determining that the size of a first dimension of the input tensor is less than a processing capacity of each of multiple subarrays of computation units in a tensor processor; selecting a second dimension of the input tensor along which to perform the convolution; selecting, based on the second dimension, one or more dimensions of the filter tensor; generating (1) first instructions for reading, using vector read operations, activation elements in the input tensor organized such that elements with different values in the second dimension are stored contiguously in memory, and (2) second instructions for reading weights of the filter tensor along the selected one or more dimensions; and using the first and second instructions to provide the activation elements and the weights to the subarrays.
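
The sequence of steps in the method can be illustrated with a minimal Python sketch. It is not the claimed implementation: the abstract does not name the dimensions, the subarray capacity, the memory layout, or any instruction format, so everything here is an assumption made for illustration. The sketch assumes the "first dimension" is the channel count, the "second dimension" is the width, the selected filter dimension is the kernel width, and each subarray has 8 lanes. `SUBARRAY_CAPACITY`, `VectorRead`, `flat_offset`, and `plan_reads` are hypothetical names.

```python
from dataclasses import dataclass

SUBARRAY_CAPACITY = 8  # assumption: lanes per subarray of computation units


@dataclass(frozen=True)
class VectorRead:
    """Hypothetical descriptor for one contiguous vector read."""
    start: int   # flat memory offset of the first element
    length: int  # number of contiguous elements to read


def flat_offset(h, c, w, C, W):
    # Assumed layout (H, C, W): W is innermost, so elements that differ
    # only in their width index sit next to each other in memory.
    return (h * C + c) * W + w


def plan_reads(input_shape, filter_shape):
    """Return (activation_reads, weight_reads), or None when the channel
    dimension already fills a subarray and no re-planning is needed."""
    H, W, C = input_shape
    R, S, Cf = filter_shape  # kernel height, kernel width, channels
    if C >= SUBARRAY_CAPACITY:
        return None

    # Second dimension chosen here: width. Each vector read covers up to
    # SUBARRAY_CAPACITY neighbouring width positions of one (row, channel).
    lanes = min(SUBARRAY_CAPACITY, W)
    activation_reads = [
        VectorRead(flat_offset(h, c, w0, C, W), min(lanes, W - w0))
        for h in range(H)
        for c in range(C)
        for w0 in range(0, W, lanes)
    ]

    # Filter dimension selected to match: kernel width S. Weights are
    # assumed stored as (R, Cf, S), so each read runs along S.
    weight_reads = [
        VectorRead((r * Cf + c) * S, S)
        for r in range(R)
        for c in range(Cf)
    ]
    return activation_reads, weight_reads
```

For example, `plan_reads((2, 10, 3), (3, 3, 3))` plans reads for a 2x10 input with 3 channels, where 3 is below the assumed capacity of 8. Each (row, channel) pair is covered by one 8-element read and one 2-element read along the width. An input with 16 channels returns `None`, since the channel dimension alone would already occupy a subarray under these assumptions.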