Patent attributes
A processing unit of an inference engine for machine learning (ML) includes a first, a second, and a third register, and a matrix multiplication block. The first register receives a first stream of data associated with a first matrix data that is read only once. The second register receives a second stream of data associated with a second matrix data that is read only once. The matrix multiplication block performs a multiplication operation based on data from the first register and the second register resulting in an output matrix. A row associated with the first matrix is maintained while rows associated with the second matrix is fed to the matrix multiplication block to perform a multiplication operation. The process is repeated for each row of the first matrix. The third register receives the output matrix from the matrix multiplication block and stores the output matrix.