Patent attributes
Described herein are techniques for selectively offloading data computed by a first processing unit during training of an artificial neural network to memory associated with a second processing unit, and for transferring the data back to the first processing unit when the data is needed for further processing. For example, the first processing unit may compute activations for operations associated with forward propagation. During the forward propagation, one or more of the activations may be transferred to the memory associated with the second processing unit for storage. Then, during backpropagation for the artificial neural network, the activations may be transferred back to the first processing unit as needed to compute gradients.
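The offload-and-restore flow described above can be illustrated with a minimal sketch in plain Python. It is not the claimed implementation. It assumes a toy network in which each layer is a scalar computing tanh(w * a), and it models the memory associated with the second processing unit as an ordinary dictionary. The names `SecondUnitMemory`, `forward`, `backward`, and `offload` are hypothetical and chosen only for this illustration.

```python
import math


class SecondUnitMemory:
    """Hypothetical stand-in for memory associated with the second processing unit."""

    def __init__(self):
        self.slots = {}
        self.stored = 0   # number of transfers to the second unit
        self.fetched = 0  # number of transfers back to the first unit

    def store(self, key, value):
        self.slots[key] = value
        self.stored += 1

    def fetch(self, key):
        self.fetched += 1
        return self.slots.pop(key)  # free the slot once the data is returned


def forward(x, weights, offload, remote):
    """Forward propagation through a chain of layers a[i+1] = tanh(w[i] * a[i]).

    Activation indices listed in `offload` are sent to `remote`; all other
    activations stay in `local`, which models the first unit's own memory.
    """
    local = {}

    def save(index, value):
        if index in offload:
            remote.store(index, value)
        else:
            local[index] = value

    a = x
    save(0, a)
    for i, w in enumerate(weights):
        a = math.tanh(w * a)
        save(i + 1, a)
    return a, local


def backward(weights, offload, local, remote):
    """Backpropagation that retrieves each activation only when it is needed."""

    def load(index):
        return remote.fetch(index) if index in offload else local.pop(index)

    n = len(weights)
    grads = [0.0] * n
    upstream = 1.0          # gradient of the output with respect to itself
    a_out = load(n)
    for i in reversed(range(n)):
        a_in = load(i)      # transferred back just before it is used
        d_pre = upstream * (1.0 - a_out * a_out)  # derivative of tanh
        grads[i] = d_pre * a_in                   # gradient for weight i
        upstream = d_pre * weights[i]             # gradient for the layer input
        a_out = a_in
    return grads


# Illustrative values only: offload the two earliest activations.
weights = [0.5, -1.2, 0.8]
offload = {0, 1}
remote = SecondUnitMemory()
y, local = forward(0.3, weights, offload, remote)
grads = backward(weights, offload, local, remote)
```

In this sketch the earliest activations are the ones offloaded, because backpropagation consumes them last and they would otherwise occupy the first unit's memory the longest. That selection rule is an assumption made for the example; the text above states only that the offloading is selective.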