Patent attributes
Some embodiments include a special-purpose hardware accelerator that can perform specialized machine learning tasks during both training and inference stages. For example, this hardware accelerator uses a systolic array having a number of data processing units (“DPUs”) that are each connected to a small number of other DPUs in a local region. Data from the many nodes of a neural network is pulsed through these DPUs with associated tags that identify where such data was originated or processed, such that each DPU has knowledge of where incoming data originated and thus is able to compute the data as specified by the architecture of the neural network. These tags enable the systolic neural network engine to perform computations during backpropagation, such that the systolic neural network engine is able to support training.