Patent attributes
Methods for gradient accumulation with free momentum are performed by systems and devices during neural network model training. An accumulator that includes a processor circuit and a memory element generates free momentum between passes of a neural network model training process. The processor circuit receives a difference weight (gradient) and generates a first input by applying a weighting parameter thereto. The processor circuit obtains a prior weight from the memory element and generates a second input by applying another weighting parameter thereto. The processor circuit generates a filtered input with momentum by filtering the first and second input. The memory element generates a stored next pass weight by accumulating the filtered input with the prior weight. A computing resource then processes the next pass of the neural network model training using the stored next pass weight. The methods, systems, and devices are applicable to pipelined model parallelism training processes.