Patent attributes
A system comprising an encoder neural network, a scene structure decoder neural network, and a motion decoder neural network. The encoder neural network is configured to: receive a first image and a second image; and process the first image and the second image to generate an encoded representation of the first image and the second image. The scene structure decoder neural network is configured to process the encoded representation to generate a structure output characterizing a structure of a scene depicted in the first image. The motion decoder neural network configured to process the encoded representation to generate a motion output characterizing motion between the first image and the second image.