Patent attributes
Systems and methods directed to performing video object segmentation are provided. In examples, video data representing a sequence of image frames and video data representing an object mask may be received at a video object segmentation server. Image features may be generated based on a first image frame of the sequence of image frames, image features may be generated based on a second image frame of the sequence of image frames, and object features may be generated based on the object mask. A transform matrix may be computed based on the image features of the first image frame and the image features of the second image frame; the transform matrix may be applied to the object features, resulting in transformed object features. A predicted object mask associated with the second image frame may be obtained by decoding the transformed object features.
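The transform step described above can be illustrated with a minimal sketch. The abstract does not state how the transform matrix is computed, so the sketch assumes one common choice: a softmax-normalized dot-product similarity between the per-location image features of the two frames. The function names (`compute_transform_matrix`, `propagate_object_features`), the flattened `(N, C)` feature layout, and the scaling factor are all hypothetical and chosen for illustration only; the feature encoders and the mask decoder are omitted.

```python
import numpy as np


def softmax(x, axis):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def compute_transform_matrix(feat_first, feat_second):
    """Assumed formulation: scaled dot-product affinity between frames.

    feat_first, feat_second: (N, C) arrays, one row per spatial location.
    Returns an (N, N) matrix whose row i holds weights over the
    first-frame locations for second-frame location i.
    """
    scale = np.sqrt(feat_first.shape[1])
    similarity = feat_second @ feat_first.T / scale
    return softmax(similarity, axis=1)


def propagate_object_features(transform, object_feat):
    # Apply the transform matrix to the (N, D) object features derived
    # from the object mask, giving features aligned with the second frame.
    return transform @ object_feat


# Toy example: the second frame is a spatial permutation of the first.
perm = np.array([2, 0, 3, 1])
feat_first = 50.0 * np.eye(4)           # 4 locations, 4 channels
feat_second = feat_first[perm]
object_feat = np.arange(8, dtype=float).reshape(4, 2)

transform = compute_transform_matrix(feat_first, feat_second)
transformed = propagate_object_features(transform, object_feat)
```

In this toy case each second-frame location matches exactly one first-frame location, so the transformed object features follow the same permutation as the image features. A decoder, not shown here, would then map the transformed object features to the predicted object mask for the second frame.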