Patent attributes
Techniques in advanced deep learning provide improvements in one or more of accuracy, performance, and energy efficiency, such as accuracy of learning, accuracy of prediction, speed of learning, performance of learning, and energy efficiency of learning. An array of processing elements performs flow-based computations on wavelets of data. Each processing element has a respective compute element and a respective routing element. Each compute element has processing resources and memory resources. Each router enables communication via wavelets with at least nearest neighbors in a 2D mesh. Stochastic gradient descent, mini-batch gradient descent, and continuous propagation gradient descent are techniques usable to train weights of a neural network modeled by the processing elements. Reverse checkpoint is usable to reduce memory usage during the training.