Patent attributes
Computational apparatus includes a memory, which contains first and second input matrices of input data values, having at least three dimensions including respective heights and widths in a predefined sampling space and a common depth in a feature dimension, orthogonal to the sampling space. An array of processing elements each perform a multiplication of respective first and second input operands and to accumulate products of the multiplication to generate a respective output value. Data access logic extracts first and second pluralities of vectors of the input data values extending in the feature dimension from the first and second input matrices, respectively, and distributes the input data values from the extracted vectors in sequence to the processing elements so as to cause the processing elements to compute a convolution of first and second two-dimensional matrices composed respectively of the first and second pluralities of vectors.