Patent attributes
One embodiment of the present invention sets forth a technique for performing a labeling task. The technique includes determining one or more region proposals, wherein each region proposal included in the one or more region proposals includes estimates of one or more bounding boxes surrounding one or more objects in a plurality of video frames. The technique also includes performing one or more operations that execute a refinement stage of a machine learning model to produce one or more refined estimates of the one or more bounding boxes included in the one or more region proposals. The technique further includes outputting the one or more refined estimates as initial representations of the one more bounding boxes for subsequent annotation of the one or more bounding boxes by one or more users.