Patent attributes
Systems and methods are described for training machine learning models to detect objects in image or video data. A system may select a first sample set of frames from one or more video files. The system may receive indications of a location of an object of interest in each of at least two sample frames, and may then use a tracker to determine the location of the object of interest across a number of intermediary frames. Annotation data identifying the objects of interest in the sample frames may be stored, and the annotation data may be used in training a machine learning model to identify the object of interest in subsequently provided image or video data.
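By way of illustration only, the flow described above (sample frame selection, two annotated frames, propagation across intermediary frames, stored annotation data) may be sketched as follows. The abstract does not specify the tracker, the sampling rule, or the annotation format, so everything named here is an assumption: the `Box` type, the fixed-stride sampling in `select_sample_frames`, the record layout in `build_annotations`, and in particular the linear interpolation in `interpolate_track`, which merely stands in for a real visual tracker.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Box:
    """Hypothetical bounding box: top-left corner plus width and height."""
    x: float
    y: float
    w: float
    h: float


def select_sample_frames(total_frames: int, stride: int) -> List[int]:
    """Assumed sampling rule: take every `stride`-th frame index."""
    if stride <= 0:
        raise ValueError("stride must be positive")
    return list(range(0, total_frames, stride))


def interpolate_track(start_idx: int, start_box: Box,
                      end_idx: int, end_box: Box) -> Dict[int, Box]:
    """Stand-in for a tracker: linearly interpolate the box between
    two annotated frames, covering both endpoints and every
    intermediary frame."""
    if end_idx <= start_idx:
        raise ValueError("end_idx must be greater than start_idx")
    span = end_idx - start_idx
    track: Dict[int, Box] = {}
    for idx in range(start_idx, end_idx + 1):
        t = (idx - start_idx) / span  # 0.0 at start frame, 1.0 at end frame
        track[idx] = Box(
            start_box.x + t * (end_box.x - start_box.x),
            start_box.y + t * (end_box.y - start_box.y),
            start_box.w + t * (end_box.w - start_box.w),
            start_box.h + t * (end_box.h - start_box.h),
        )
    return track


def build_annotations(label: str, track: Dict[int, Box]) -> List[dict]:
    """Assumed annotation layout: one record per frame, ordered by frame."""
    return [
        {"frame": idx, "label": label, "bbox": [b.x, b.y, b.w, b.h]}
        for idx, b in sorted(track.items())
    ]


# Usage: two user-annotated sample frames, three intermediary frames.
samples = select_sample_frames(total_frames=10, stride=4)
track = interpolate_track(samples[0], Box(0, 0, 10, 10),
                          samples[1], Box(40, 20, 10, 10))
annotations = build_annotations("car", track)
```

In practice the interpolation step would be replaced by an appearance-based tracker operating on pixel data, and the resulting annotation records would be passed to whatever training pipeline the model uses.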