Patent attributes
A neural network unit convolves a H×W×C input with F R×S×C filters to generate F Q×P outputs. N processing units (PU) each have a register receiving a memory word and a multiplexed-register selectively receiving a memory word or word rotated from an adjacent PU multiplexed-register. The N PUs are logically partitioned as G blocks each of B PUs. The PUs convolve in a column-channel-row order. For each filter column: the N registers read a memory row, each PU multiplies the register and the multiplexed-register to generate a product to accumulate, and the multiplexed-registers are rotated by one; the multiplexed-registers are rotated to align the input blocks with the adjacent PU block. This is performed for each channel. For each filter row, N multiplexed-registers read a memory row for the multiply-accumulations, F column-channel-row-sums are generated and written to the memory, then all steps are performed for each output row.