Patent attributes
An apparatus includes a memory and a circuit. The memory may be configured to store data. The circuit generally has a buffer and may be configured to (i) fetch a kernel from the memory, where the kernel may have a plurality of kernel values, (ii) fetch a block from the memory to the buffer, where the block may have a plurality of input tiles and each of the input tiles may have a plurality of input values in multiple dimensions, (iii) calculate a plurality of intermediate values in parallel by multiplying the input tiles read from the buffer by a corresponding one of the kernel values, and (iv) calculate an output tile that may have a plurality of output values based on the intermediate values.
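Steps (i) through (iv) can be illustrated with a minimal software sketch. It assumes a two-dimensional block, input tiles taken as shifted windows of that block (one per kernel value), and an output tile formed by summing the intermediate values. The abstract states only that the output is "based on" the intermediate values, so the summation is an assumption. The names extract_tile, scale_tile and output_tile are hypothetical, and the loops run sequentially where the circuit may perform the multiplications in parallel.

```python
def extract_tile(block, row, col, tile_h, tile_w):
    """Read a tile_h x tile_w input tile from the buffered block at (row, col)."""
    return [r[col:col + tile_w] for r in block[row:row + tile_h]]


def scale_tile(tile, kernel_value):
    """Multiply every input value of one tile by a single kernel value."""
    return [[v * kernel_value for v in r] for r in tile]


def output_tile(block, kernel, tile_h, tile_w):
    """Build one output tile from the block and the kernel.

    Assumption: the output tile is the element-wise sum of the
    intermediate values; the abstract does not spell this out.
    """
    out = [[0] * tile_w for _ in range(tile_h)]
    for i, kernel_row in enumerate(kernel):
        for j, kernel_value in enumerate(kernel_row):
            # Input tile shifted by the kernel value's position (assumed layout).
            tile = extract_tile(block, i, j, tile_h, tile_w)
            # Intermediate values: the whole tile times one kernel value.
            intermediate = scale_tile(tile, kernel_value)
            for r in range(tile_h):
                for c in range(tile_w):
                    out[r][c] += intermediate[r][c]
    return out


# Illustrative data only: a 4x4 block, a 2x2 kernel, and a 3x3 output tile.
block = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
kernel = [
    [1, 0],
    [0, -1],
]
result = output_tile(block, kernel, tile_h=3, tile_w=3)
print(result)
```

In this sketch the block is read once and reused for every kernel value, which mirrors the role of the buffer in step (ii): each input tile is a window into data already fetched from the memory.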