Systems and methods for generating object segmentations across videos are provided. An example system can enable an annotator to identify objects within a first image frame of a video sequence by clicking anywhere within each object. The system processes the first image frame and a second, subsequent image frame to assign each pixel of the second image frame either to one of the objects identified in the first image frame or to the background. The system then refines the resulting object masks for the second image frame using a recurrent attention module based on contextual features extracted from the second image frame. Next, the system receives additional user input for the second image frame and uses that input, in combination with the object masks for the second image frame, to determine object masks for a third, subsequent image frame in the video sequence. The process is repeated for each remaining image frame in the video sequence.
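The abstract does not specify the models involved, so the following Python sketch illustrates only the frame-by-frame control flow under loudly stated assumptions. Frames are grayscale images stored as nested lists. Nearest-mean-intensity matching stands in for the learned pixel assignment, a 3x3 majority vote stands in for the recurrent attention refinement, and user input is modeled as per-pixel label corrections. All names (`segment_video`, `assign_pixels`, `refine`, `apply_corrections`, `object_references`) and the `threshold` parameter are hypothetical and are not taken from the patent.

```python
from collections import Counter
from typing import Dict, List, Tuple

Frame = List[List[float]]  # grayscale intensities, indexed [row][col]
Mask = List[List[int]]     # 0 = background, k >= 1 = object k
BACKGROUND = 0


def object_references(frame: Frame, mask: Mask) -> Dict[int, float]:
    """Mean intensity of each labeled object (a toy stand-in for learned features)."""
    sums: Dict[int, float] = {}
    counts: Dict[int, int] = {}
    for r, row in enumerate(mask):
        for c, label in enumerate(row):
            if label != BACKGROUND:
                sums[label] = sums.get(label, 0.0) + frame[r][c]
                counts[label] = counts.get(label, 0) + 1
    return {k: sums[k] / counts[k] for k in sums}


def assign_pixels(frame: Frame, refs: Dict[int, float], threshold: float) -> Mask:
    """Label each pixel with the closest object reference, or background if none is close."""
    mask: Mask = []
    for row in frame:
        labels = []
        for value in row:
            best = min(refs, key=lambda k: abs(refs[k] - value), default=None)
            if best is not None and abs(refs[best] - value) <= threshold:
                labels.append(best)
            else:
                labels.append(BACKGROUND)
        mask.append(labels)
    return mask


def refine(mask: Mask) -> Mask:
    """3x3 majority vote; a simple smoothing stand-in, NOT a recurrent attention module."""
    rows, cols = len(mask), len(mask[0])
    refined: Mask = []
    for r in range(rows):
        new_row = []
        for c in range(cols):
            votes = Counter(
                mask[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
            )
            new_row.append(votes.most_common(1)[0][0])
        refined.append(new_row)
    return refined


def apply_corrections(mask: Mask, corrections: Dict[Tuple[int, int], int]) -> Mask:
    """Overwrite the pixels the annotator corrected in this frame."""
    corrected = [row[:] for row in mask]
    for (r, c), label in corrections.items():
        corrected[r][c] = label
    return corrected


def segment_video(
    frames: List[Frame],
    clicks: List[Tuple[int, int]],
    corrections_per_frame: Dict[int, Dict[Tuple[int, int], int]],
    threshold: float = 25.0,
) -> List[Mask]:
    """Propagate click-initialized object labels frame by frame."""
    first = frames[0]
    # One click per object in the first frame; object ids start at 1.
    mask: Mask = [[BACKGROUND] * len(first[0]) for _ in first]
    for k, (r, c) in enumerate(clicks, start=1):
        mask[r][c] = k
    masks = [mask]
    for t in range(1, len(frames)):
        # Appearance of each object in the previous (corrected) frame.
        refs = object_references(frames[t - 1], masks[-1])
        proposal = refine(assign_pixels(frames[t], refs, threshold))
        # Corrections for frame t feed into the references used for frame t + 1.
        masks.append(apply_corrections(proposal, corrections_per_frame.get(t, {})))
    return masks
```

For example, with two uniform 3x3 blocks of intensity 100 and 200 in a 6x6 frame and one click at the center of each block, pixels well inside each block receive labels 1 and 2 in every later frame, although the majority vote can erode block corners. Corrections supplied for frame `t` change only that frame's mask and, through `object_references`, the matching used for frame `t + 1`, which mirrors the abstract's description of user input for the second frame informing the masks of the third.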