Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, are disclosed for classifying actions of an object. The methods, systems, and apparatus include operations of: obtaining frames of video that include an object of interest; determining a type of action of the object in each of the frames of video; determining a group of frames from the frames of video based on the type of action; determining an aggregated background subtraction (ABS) image based on adjacent frames of the group of frames; generating a training set of labeled ABS images that includes the ABS image; and training an action classifier using the training set.
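The grouping and ABS steps can be illustrated with a minimal Python sketch. The abstract does not specify how the ABS image is computed, so the sketch assumes one plausible reading: the absolute pixel differences between adjacent frames of a group are summed and scaled to [0, 1]. It also assumes that a group is a run of consecutive frames sharing the same action type. The function names `group_frames_by_action`, `aggregated_background_subtraction`, and `build_training_set` are hypothetical and do not come from the source.

```python
from itertools import groupby

import numpy as np


def group_frames_by_action(frames, actions):
    """Split frames into runs of consecutive frames sharing an action label.

    Assumption: a "group" is a contiguous run of frames with the same label.
    """
    groups = []
    index = 0
    for action, run in groupby(actions):
        length = len(list(run))
        groups.append((action, frames[index:index + length]))
        index += length
    return groups


def aggregated_background_subtraction(group):
    """Assumed ABS: sum of absolute adjacent-frame differences, scaled to [0, 1]."""
    stack = np.asarray(group, dtype=np.float32)
    if stack.shape[0] < 2:
        # A single frame has no adjacent pair, so no motion can be aggregated.
        return np.zeros(stack.shape[1:], dtype=np.float32)
    diffs = np.abs(np.diff(stack, axis=0))  # one difference image per adjacent pair
    abs_image = diffs.sum(axis=0)           # aggregate over the whole group
    peak = abs_image.max()
    return abs_image / peak if peak > 0 else abs_image


def build_training_set(frames, actions):
    """Return (ABS image, action label) pairs, one per group of frames."""
    return [
        (aggregated_background_subtraction(group), action)
        for action, group in group_frames_by_action(frames, actions)
    ]
```

Each resulting pair is one labeled ABS image. Under these assumptions, the list could then be passed to any image classifier as training data; the choice of classifier is left open here because the abstract does not name one.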