Patent attributes
A temporal object segmentation system determines a location of an object depicted in a video. In some cases, the temporal object segmentation system determines the object's location in a particular frame of the video based on information indicating the object's location in a previous video frame. For example, an encoder neural network in the temporal object segmentation system extracts features describing image attributes of a video frame. A convolutional long short-term memory (LSTM) neural network determines the location of the object in the frame, based on the extracted features and the information indicating the object's location in the previous frame. A decoder neural network generates an image mask indicating the object's location in the frame. In some cases, a video editing system receives multiple generated masks for a video, and modifies one or more video frames based on the locations indicated by the masks.
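The data flow described above (encoder, convolutional LSTM, decoder, then mask-based editing) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the patent: the function names, the single-channel 3x3 kernels, the hand-picked untrained weights in `PARAMS`, the zero initial state, the thresholding decoder, and the `apply_masks` edit are all hypothetical stand-ins for trained networks and a real editing system. The point is only to show how the recurrent state carries information about the previous frame into the current one.

```python
import math

def conv3x3(grid, kernel):
    """Zero-padded 3x3 cross-correlation; output has the same size as grid."""
    rows, cols = len(grid), len(grid[0])
    out = [[0.0] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < rows and 0 <= xx < cols:
                        out[y][x] += grid[yy][xx] * kernel[dy + 1][dx + 1]
    return out

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def encode(frame):
    """Hypothetical encoder stand-in: scale 8-bit pixels to [0, 1]."""
    return [[p / 255.0 for p in row] for row in frame]

def conv_lstm_step(x, h_prev, c_prev, params):
    """One single-channel ConvLSTM step over the feature map x."""
    rows, cols = len(x), len(x[0])

    def gate(name, activation):
        # Each gate convolves the current features and the previous hidden state.
        w_x, w_h, bias = params[name]
        from_x, from_h = conv3x3(x, w_x), conv3x3(h_prev, w_h)
        return [[activation(from_x[r][k] + from_h[r][k] + bias)
                 for k in range(cols)] for r in range(rows)]

    i, f, o = gate("i", sigmoid), gate("f", sigmoid), gate("o", sigmoid)
    g = gate("g", math.tanh)
    c_new = [[f[r][k] * c_prev[r][k] + i[r][k] * g[r][k]
              for k in range(cols)] for r in range(rows)]
    h_new = [[o[r][k] * math.tanh(c_new[r][k])
              for k in range(cols)] for r in range(rows)]
    return h_new, c_new

def decode(hidden, threshold=0.2):
    """Hypothetical decoder stand-in: threshold the hidden state into a binary mask."""
    return [[1 if v > threshold else 0 for v in row] for row in hidden]

def segment_video(frames, params):
    """Produce one mask per frame, passing the recurrent state between frames."""
    rows, cols = len(frames[0]), len(frames[0][0])
    h = [[0.0] * cols for _ in range(rows)]  # assumed zero initial state
    c = [[0.0] * cols for _ in range(rows)]
    masks = []
    for frame in frames:
        h, c = conv_lstm_step(encode(frame), h, c, params)
        masks.append(decode(h))
    return masks

def apply_masks(frames, masks, fill=0):
    """Hypothetical edit: overwrite masked pixels with a fill value."""
    return [[[fill if m else p for p, m in zip(f_row, m_row)]
             for f_row, m_row in zip(frame, mask)]
            for frame, mask in zip(frames, masks)]

# Illustrative, untrained weights: (input kernel, hidden kernel, bias) per gate.
IDENTITY = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
BLUR = [[1 / 9] * 3 for _ in range(3)]
PARAMS = {name: (IDENTITY, BLUR, 0.0) for name in "ifog"}

frames = [
    [[0, 0, 0], [0, 255, 0], [0, 0, 0]],
    [[0, 0, 0], [0, 0, 255], [0, 0, 0]],
]
masks = segment_video(frames, PARAMS)
edited = apply_masks(frames, masks)
```

In this sketch the hidden state `h` plays the role of the information indicating the object's previous location: it is computed for one frame and fed into the gates for the next. A real system would use learned multi-channel networks for all three stages, and may initialize the recurrent state differently than the zero state assumed here.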