Patent attributes
Methods and systems for training a motion planner for an autonomous vehicle are described. A trajectory evaluator agent of the motion planner receives state data defining a current state of the autonomous vehicle and an environment at a current time step. Based on the current state, a trajectory is selected. A reward is calculated based on performance of the selected trajectory in the current state. State data is received for a next state of the autonomous vehicle and the environment at a next time step. Parameters of the trajectory evaluator agent are updated based on the current state, the selected trajectory, the calculated reward, and the next state. The parameters are updated so that the trajectory evaluator agent assigns the selected trajectory an evaluation value that reflects both the calculated reward and the expected performance of the selected trajectory in future states.
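The update described above resembles a temporal-difference (TD) learning step, although the abstract does not name a specific algorithm. The following is a minimal sketch under that assumption, not the patented method. The class `TrajectoryEvaluator`, its linear value model, the feature vectors (assumed to jointly encode a state and a candidate trajectory), and the `learning_rate` and `discount` parameters are all hypothetical and chosen only for illustration.

```python
class TrajectoryEvaluator:
    """Hypothetical linear evaluator: value = weights . features(state, trajectory)."""

    def __init__(self, n_features, learning_rate=0.1, discount=0.9):
        self.weights = [0.0] * n_features
        self.learning_rate = learning_rate  # assumed step size
        self.discount = discount            # assumed weight on future performance

    def evaluate(self, features):
        """Evaluation value for one (state, trajectory) feature vector."""
        return sum(w * f for w, f in zip(self.weights, features))

    def select(self, candidates):
        """Return the index of the candidate trajectory with the highest value."""
        values = [self.evaluate(f) for f in candidates]
        return max(range(len(candidates)), key=values.__getitem__)

    def update(self, features, reward, next_candidates):
        """One TD-style update from (current state, trajectory, reward, next state)."""
        # Expected future performance: best evaluation available in the next state.
        next_value = max((self.evaluate(f) for f in next_candidates), default=0.0)
        target = reward + self.discount * next_value
        td_error = target - self.evaluate(features)
        # Gradient step for a linear model: move weights along the features.
        self.weights = [
            w + self.learning_rate * td_error * f
            for w, f in zip(self.weights, features)
        ]
        return td_error


# Illustrative usage with made-up feature vectors for two candidate trajectories.
evaluator = TrajectoryEvaluator(n_features=2)
current_candidates = [[1.0, 0.0], [0.0, 1.0]]
next_candidates = [[1.0, 0.0], [0.0, 1.0]]

chosen = evaluator.select(current_candidates)
first_error = evaluator.update(current_candidates[chosen], 1.0, next_candidates)
second_error = evaluator.update(current_candidates[chosen], 1.0, next_candidates)
```

In this sketch the target combines the immediate reward with a discounted estimate of the best value in the next state, which is one way an evaluation value can reflect both the calculated reward and expected future performance. A real system would likely use a richer function approximator and a reward derived from actual driving performance.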