Patent attributes
Features are disclosed for distributing the training of models over multiple computing nodes (e.g., servers or other computing devices). Each computing device may include a separate copy of the model to be trained and a subset of the training data. A computing device may determine updates to the parameters of the model based on processing a portion of its subset of the training data. A portion of those updates may be selected for application to the model and for synchronization with other computing devices. In some embodiments, the portion of the updates is selected based on a threshold value. Other computing devices can apply the received portion of the updates such that the copy of the model being trained on each individual computing device remains substantially synchronized, even though each computing device may be using a different subset of the training data to train the model.
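The selection-and-synchronization flow described above can be illustrated with a minimal sketch. This is a toy simulation under stated assumptions, not the disclosed implementation: the linear model, the threshold value, and names such as Node, select_updates, and apply_sparse are introduced here for illustration only. Each simulated node applies the components of its update whose magnitudes exceed the threshold, carries the remainder forward as a local residual, and exchanges only the selected components with its peers, so that all model copies receive the same set of applied updates.

```python
# Minimal illustrative sketch (not the patented method) of threshold-based
# selective update synchronization across simulated computing nodes.
# THRESHOLD, LEARNING_RATE, Node, and the toy linear model are assumptions.
import numpy as np

THRESHOLD = 0.01       # magnitude threshold for selecting update components (assumed)
LEARNING_RATE = 0.1
NUM_PARAMS = 8


class Node:
    """One computing device holding a full model copy and a data subset."""

    def __init__(self, data_subset, model):
        self.data = data_subset                # this node's subset of training data
        self.model = model.copy()              # separate copy of the shared model
        self.residual = np.zeros_like(model)   # update components held back below threshold

    def compute_update(self, batch):
        """Toy squared-error gradient step for a linear model on one batch."""
        x, y = batch
        pred = x @ self.model
        grad = x.T @ (pred - y) / len(y)
        return -LEARNING_RATE * grad

    def select_updates(self, update):
        """Select components whose magnitude exceeds the threshold;
        accumulate the rest locally for a later iteration."""
        total = update + self.residual
        mask = np.abs(total) >= THRESHOLD
        self.residual = np.where(mask, 0.0, total)
        idx = np.nonzero(mask)[0]
        return idx, total[idx]                 # sparse (indices, values) to share

    def apply_sparse(self, idx, values):
        """Apply a local or received sparse update to this node's model copy."""
        self.model[idx] += values


rng = np.random.default_rng(0)
true_w = rng.normal(size=NUM_PARAMS)
initial_model = np.zeros(NUM_PARAMS)

# Each node receives a different subset of the training data.
nodes = []
for _ in range(3):
    x = rng.normal(size=(32, NUM_PARAMS))
    y = x @ true_w
    nodes.append(Node((x, y), initial_model))

for step in range(50):
    sparse_updates = []
    for node in nodes:
        update = node.compute_update(node.data)
        idx, values = node.select_updates(update)
        node.apply_sparse(idx, values)           # apply the selected portion locally
        sparse_updates.append((idx, values))     # share only the selected portion
    # Other nodes apply the received portions so all copies stay aligned.
    for i, node in enumerate(nodes):
        for j, (idx, values) in enumerate(sparse_updates):
            if j != i:
                node.apply_sparse(idx, values)

print("max divergence between model copies:",
      max(np.max(np.abs(n.model - nodes[0].model)) for n in nodes))
```

Because every node applies the same selected updates in the same order, the printed divergence between model copies is zero in this sketch, which mirrors the claim that copies remain substantially synchronized even though each node trains on a different data subset.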