Patent attributes
Computational apparatus includes a systolic array of processing elements. In each of a sequence of processing cycles, the processing elements in a first row of the array each receive a respective first plurality of first operands, while the processing elements in a first column of the array each receive a respective second plurality of second operands. Each processing element, except in the first row and first column, receives the respective first and second pluralities of the operands from adjacent processing elements in a preceding row and column of the array. Each processing element multiplies pairs of the first and second operands together to generate multiple respective products, and accumulates the products in accumulators. Synchronization logic loads a succession of first and second vectors of the operands into the array, and upon completion of processing triggers the processing elements to transfer respective data values from the accumulators out of the array.