Patent attributes
A system for convolving an image includes processing circuitry that retrieves the image, which includes a set of rows, and a set of kernels, and serially merges all columns of each kernel to generate a merged kernel. In a single clock cycle, the processing circuitry executes in parallel a multiply-accumulate (MAC) instruction on a row loaded in a corresponding vector register and a corresponding coefficient of the merged kernel, and a load instruction on a subsequent row; this parallel execution is repeated multiple times. In the same clock cycle, based on the MAC instruction, a logical shift operation is executed on the merged kernel to replace the current coefficient with the subsequent coefficient, so that in the next clock cycle the MAC instruction is executed on the subsequent row and the subsequent coefficient. The system thus uses each clock cycle to execute both the MAC and load instructions.
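The scheduling described above can be illustrated with a small software model. This is only a sketch under stated assumptions, not the claimed hardware: it handles a single kernel, applies the kernel without flipping it (cross-correlation), and runs the MAC, the load, and the shift one after another in Python, whereas the abstract describes them as sharing one clock cycle. The names `merge_kernel_columns` and `convolve_rows` are hypothetical and do not come from the source.

```python
from collections import deque


def merge_kernel_columns(kernel):
    """Flatten a kernel column by column into one coefficient sequence."""
    rows, cols = len(kernel), len(kernel[0])
    return [kernel[r][c] for c in range(cols) for r in range(rows)]


def convolve_rows(image, kernel):
    """Model the MAC / load / shift loop for one kernel (valid region only)."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_h = len(image) - k_rows + 1
    out_w = len(image[0]) - k_cols + 1
    merged = merge_kernel_columns(kernel)

    output = []
    for top in range(out_h):
        coeffs = deque(merged)   # stands in for the register holding the merged kernel
        acc = [0] * out_w        # stands in for the accumulator vector
        for col in range(k_cols):
            # Initial load of the first row needed for this kernel column.
            vreg = image[top][col:col + out_w]
            for step in range(k_rows):
                # One modelled "cycle": MAC on the loaded row and current coefficient...
                coeff = coeffs[0]
                acc = [a + coeff * x for a, x in zip(acc, vreg)]
                # ...load of the subsequent row (assumed to overlap with the MAC)...
                if step + 1 < k_rows:
                    vreg = image[top + step + 1][col:col + out_w]
                # ...and a shift that exposes the subsequent coefficient.
                coeffs.popleft()
        output.append(acc)
    return output


if __name__ == "__main__":
    image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    kernel = [[1, 0], [0, 1]]
    print(merge_kernel_columns(kernel))
    print(convolve_rows(image, kernel))
```

In this model, the coefficients within one kernel column pair with consecutive image rows. That is why a load of the next row can accompany each MAC, and why a single shift per step keeps the merged kernel aligned with the row currently in the vector register.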