Computer-implemented methods are provided for training a machine learning model in a heterogeneous processing system that includes a host computer operatively interconnected with an accelerator unit. The training operation involves an iterative optimization process for optimizing a model vector defining the model. Such a method includes, in the host computer, storing a matrix of training data and partitioning the matrix into a plurality of batches of data vectors. For each successive iteration of the optimization process, a selected subset of the batches is provided to the accelerator unit. In the accelerator unit, each iteration of the optimization process is performed to update the model vector in dependence on the vectors in the subset selected for that iteration. In the host computer, a batch importance value is calculated for each respective batch. The batch importance value for a batch is dependent on the contributions of the vectors in that batch to the sub-optimality of the model vector.
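The division of work between host and accelerator described above can be illustrated with a minimal single-process sketch. It is not the claimed implementation: it assumes a ridge-regression objective, treats the columns of the training matrix as the data vectors, uses the squared gradient magnitude of a batch's coordinates as a stand-in importance measure, and assumes that the highest-scoring batches are the ones selected. All function names and parameters (`partition_into_batches`, `batch_importance`, `accelerator_iteration`, `train`, `lam`, `batches_per_iter`) are hypothetical, and the accelerator is simulated by an ordinary function call.

```python
import numpy as np


def partition_into_batches(num_vectors, batch_size):
    """Host: split the column indices of the training matrix into contiguous batches."""
    return [np.arange(start, min(start + batch_size, num_vectors))
            for start in range(0, num_vectors, batch_size)]


def batch_importance(A, residual, w, lam, batches):
    """Host: score each batch (assumed measure: squared gradient magnitude).

    For ridge regression the gradient is zero at the optimum, so a large
    gradient entry marks a coordinate that contributes to sub-optimality.
    """
    grad = A.T @ residual + lam * w
    return np.array([np.sum(grad[b] ** 2) for b in batches])


def accelerator_iteration(A_sub, w_sub, residual, lam):
    """Accelerator stand-in: one pass of exact coordinate descent over the
    vectors it was given, keeping the shared residual A @ w - y up to date."""
    w_sub = w_sub.copy()
    residual = residual.copy()
    for j in range(A_sub.shape[1]):
        a = A_sub[:, j]
        # Closed-form minimizer of the ridge objective along coordinate j.
        delta = -(a @ residual + lam * w_sub[j]) / (a @ a + lam)
        w_sub[j] += delta
        residual += delta * a
    return w_sub, residual


def train(A, y, lam=0.1, batch_size=4, batches_per_iter=2, num_iters=50):
    """Host loop: score batches, pick a subset, hand it to the accelerator."""
    num_vectors = A.shape[1]
    batches = partition_into_batches(num_vectors, batch_size)
    w = np.zeros(num_vectors)
    residual = A @ w - y
    for _ in range(num_iters):
        scores = batch_importance(A, residual, w, lam, batches)
        # Assumed selection rule: take the highest-scoring batches.
        chosen = np.argsort(scores)[-batches_per_iter:]
        idx = np.concatenate([batches[b] for b in chosen])
        # The copies A[:, idx] and w[idx] model the transfer to the accelerator.
        w[idx], residual = accelerator_iteration(A[:, idx], w[idx], residual, lam)
    return w
```

In a real heterogeneous system the importance values would typically be computed on the host while the accelerator is working, so the scores in use may lag the current model vector; the sketch runs the two steps sequentially for clarity.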