Patent attributes
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for generating a simulation of an environment that is being interacted with by a plurality of agents over a plurality of time steps, wherein the simulation comprises a respective simulation state for each time step that specifies a respective state of each agent at the time step. In one aspect, a method comprises, for each time step: obtaining a current simulation state for the current time step; generating a plurality of candidate next simulation states for a next time step; determining, for each candidate next simulation state, a discriminative score characterizing a likelihood that the candidate next simulation state is a realistic simulation state; and selecting a candidate next simulation state as the simulation state for the next time step based on the discriminative scores for the candidate next simulation states.
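The per-time-step procedure described above can be illustrated with a minimal Python sketch. Everything named here is a hypothetical stand-in rather than the claimed implementation: each agent state is assumed to be a one-dimensional (position, velocity) pair, `propose_candidates` samples random accelerations in place of whatever generator produces the candidate next simulation states, `toy_discriminative_score` is a hand-written heuristic standing in for a discriminative model of realism, and selection is assumed to pick the highest-scoring candidate, which is only one way of selecting "based on" the scores.

```python
import random
from typing import Callable, List, Tuple

# Hypothetical representation: each agent state is (position, velocity),
# and a simulation state is a tuple with one entry per agent.
AgentState = Tuple[float, float]
SimState = Tuple[AgentState, ...]


def propose_candidates(state: SimState, num_candidates: int,
                       rng: random.Random, dt: float = 0.1) -> List[SimState]:
    """Generate candidate next states by sampling a random acceleration per agent.

    Assumption: stands in for whatever generator proposes candidate states.
    """
    candidates = []
    for _ in range(num_candidates):
        next_agents = []
        for pos, vel in state:
            accel = rng.uniform(-1.0, 1.0)
            new_vel = vel + accel * dt
            next_agents.append((pos + new_vel * dt, new_vel))
        candidates.append(tuple(next_agents))
    return candidates


def toy_discriminative_score(candidate: SimState, min_gap: float = 1.0) -> float:
    """Return a score in (0, 1]; higher means 'more realistic'.

    Assumption: a hand-written heuristic that penalizes neighboring agents
    closer together than min_gap, used in place of a learned discriminator.
    """
    positions = sorted(pos for pos, _ in candidate)
    penalty = sum(max(0.0, min_gap - (b - a))
                  for a, b in zip(positions, positions[1:]))
    return 1.0 / (1.0 + penalty)


def simulate(initial_state: SimState, num_steps: int, num_candidates: int,
             score_fn: Callable[[SimState], float], seed: int = 0) -> List[SimState]:
    """Roll out a simulation, one selected state per time step."""
    rng = random.Random(seed)
    states = [initial_state]
    for _ in range(num_steps):
        current = states[-1]                      # current simulation state
        candidates = propose_candidates(current, num_candidates, rng)
        scores = [score_fn(c) for c in candidates]
        # Assumption: select the candidate with the highest score.
        best = max(range(len(candidates)), key=scores.__getitem__)
        states.append(candidates[best])
    return states


if __name__ == "__main__":
    start: SimState = ((0.0, 0.0), (2.0, 0.0))
    trajectory = simulate(start, num_steps=5, num_candidates=8,
                          score_fn=toy_discriminative_score)
    for t, s in enumerate(trajectory):
        print(t, s)
```

Picking the maximum score is the simplest selection rule; sampling a candidate with probability proportional to its score would be an alternative consistent with the same description.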