Patent attributes
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for implementing an interactive autonomous vehicle agent. One of the methods includes receiving a request to generate an experience tuple for a vehicle in a particular driving context. A predicted environment observation representing a predicted environment of the autonomous vehicle after the candidate action is taken by the autonomous vehicle in an initial environment is generated, including providing an initial environment observation and the candidate action as input to a vehicle behavior model neural network trained to generate predicted environment observations. An immediate quality value is generated from a context-specific quality model that generates immediate quality values that are specific to the particular driving context. An experience tuple comprising the initial environment observation, the candidate action, and the immediate quality value is generated and used as input to a reinforcement learning system for the autonomous vehicle.