Patent attributes
In one embodiment, a method includes receiving a plurality of input frames of a video sequence associated with a time t, training a convolutional network to predict one or more future frames of the video sequence from the plurality of input frames based on a generative model, and outputting a first future frame of the video sequence, associated with a time t+1, as predicted by the generative model. The training may comprise using an adversarial model and an image gradient difference loss model. In addition, the training may comprise randomly selecting, from the plurality of input frames, temporal sequences of an n×m grid of pixels that exhibit at least a threshold amount of optical flow.
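As an illustration only, the two training aids named above might be sketched as follows. The abstract does not give formulas, so both functions rest on assumptions: `gradient_difference_loss` uses one common formulation of a gradient difference loss (comparing absolute differences between neighboring pixels in the predicted and target frames, with a hypothetical exponent `alpha`), and `sample_moving_patch` substitutes the mean absolute change between consecutive frames for a true optical flow estimate. All function and parameter names are hypothetical.

```python
import numpy as np


def gradient_difference_loss(pred, target, alpha=1.0):
    """One common gradient difference loss for 2-D frames (assumed form)."""
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    # Absolute intensity differences between vertically adjacent pixels.
    pred_dy = np.abs(np.diff(pred, axis=0))
    target_dy = np.abs(np.diff(target, axis=0))
    # Absolute intensity differences between horizontally adjacent pixels.
    pred_dx = np.abs(np.diff(pred, axis=1))
    target_dx = np.abs(np.diff(target, axis=1))
    # Penalize mismatched edge strength, which discourages blurry predictions.
    loss_y = np.sum(np.abs(target_dy - pred_dy) ** alpha)
    loss_x = np.sum(np.abs(target_dx - pred_dx) ** alpha)
    return float(loss_y + loss_x)


def sample_moving_patch(frames, n, m, motion_threshold, rng, max_tries=100):
    """Randomly crop a (T, n, m) patch sequence that shows enough motion.

    frames has shape (T, height, width). Returns None if no candidate
    reaches the threshold within max_tries attempts.
    """
    frames = np.asarray(frames, dtype=np.float64)
    _, height, width = frames.shape
    for _ in range(max_tries):
        top = rng.integers(0, height - n + 1)
        left = rng.integers(0, width - m + 1)
        patch = frames[:, top:top + n, left:left + m]
        # Motion proxy (assumption): mean absolute frame-to-frame change,
        # standing in for an optical flow magnitude.
        motion = np.mean(np.abs(np.diff(patch, axis=0)))
        if motion >= motion_threshold:
            return patch
    return None
```

In a training loop under these assumptions, patches returned by `sample_moving_patch` would be split into input frames and a target frame, and the gradient difference term would be added to the adversarial and reconstruction losses with a weighting chosen by the practitioner.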