Patent attributes
Systems, methods, and apparatuses relating to enhanced matrix multiplier architecture are described. In one embodiment, an apparatus includes a matrix operations accelerator circuit having a two-dimensional grid of multiplier circuits; a first plurality of registers that represents a first two-dimensional matrix coupled to the matrix operations accelerator circuit; a second plurality of registers that represents a second two-dimensional matrix coupled to the matrix operations accelerator circuit; a decoder, of a core coupled to the matrix operations accelerator circuit, to decode a single instruction into a decoded single instruction; and an execution circuit of the core to execute the decoded single instruction to store each element of the first two-dimensional matrix from the first plurality of registers into a respective clocked flip-flop circuit of each multiplier circuit of the two-dimensional grid of multiplier circuits, store a first element of a first proper subset of elements of the second two-dimensional matrix from the second plurality of registers into a single first clocked flip-flop circuit coupled to a first proper subset of multiplier circuits of the two-dimensional grid of multiplier circuits, store a second element of the first proper subset of elements of the second two-dimensional matrix from the second plurality of registers into a single second clocked flip-flop circuit coupled to a second proper subset of multiplier circuits of the two-dimensional grid of multiplier circuits, multiply the first element of the first proper subset of elements from the single first clocked flip-flop circuit by a respective element from the clocked flip-flop circuit of each multiplier circuit of the first proper subset of multiplier circuits to generate a first plurality of resultants, and multiply the second element of the first proper subset of elements from the single second clocked flip-flop circuit by a respective element from the clocked flip-flop circuit of each multiplier circuit of the second proper subset of multiplier circuits to generate a second plurality of resultants.
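
The dataflow described above can be illustrated with a small behavioural sketch in Python. It is not the claimed circuit: it models flip-flops as plain list entries, and it assumes that each "proper subset of multiplier circuits" is one row of the grid and that the "first proper subset of elements" of the second matrix is one of its rows. The function names `load_grid` and `broadcast_multiply` are hypothetical and chosen only for illustration.

```python
from typing import List

Matrix = List[List[float]]


def load_grid(first_matrix: Matrix) -> Matrix:
    """Model each multiplier's own clocked flip-flop.

    Every element of the first matrix is copied into the storage of
    exactly one multiplier in the two-dimensional grid.
    """
    return [row[:] for row in first_matrix]


def broadcast_multiply(grid: Matrix, second_subset: List[float]) -> List[List[float]]:
    """Multiply each shared element by the elements held in one grid row.

    Assumption (not stated in the abstract): second_subset[k] sits in a
    single shared flip-flop that feeds every multiplier in row k.
    Returns one list of resultants per row of multipliers.
    """
    if len(second_subset) != len(grid):
        raise ValueError("need one shared element per row of multipliers")
    resultants = []
    for shared_element, stored_row in zip(second_subset, grid):
        # The same shared value is reused by every multiplier in the row.
        resultants.append([shared_element * stored for stored in stored_row])
    return resultants


if __name__ == "__main__":
    first = [[1, 2], [3, 4]]       # held in the per-multiplier flip-flops
    second = [[5, 6], [7, 8]]      # only its first row is used here
    grid = load_grid(first)
    print(broadcast_multiply(grid, second[0]))
```

In this sketch the first shared element (5) is multiplied by the row holding 1 and 2, and the second shared element (6) by the row holding 3 and 4, which corresponds to the first and second pluralities of resultants. Storing each element of the second matrix once per subset, rather than once per multiplier, is what the single shared flip-flop models.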