Patent attributes
Disclosed is a data processing system that includes compile time logic configured to process a processing graph to generate a modified processing graph, which includes a plurality of forward processing nodes of a forward pass and a plurality of backward processing nodes of a backward pass. The data processing system also includes runtime logic configured with the compile time logic to execute the modified processing graph to generate, at a backward processing node of the plurality of backward processing nodes, a plurality of partial weight gradients, based on processing a corresponding plurality of gradient tiles of a gradient tensor, and generate, based on the plurality of partial weight gradients, a final weight gradient corresponding to the gradient tensor.