Patent attributes
Interaction-aware decision making may include training a first agent based on a first policy gradient, training a first critic based on a first loss function to learn goals in a single-agent environment using a Markov decision process, training a number N of agents based on the first policy gradient, training a second policy gradient and a second critic based on the first loss function and a second loss function to learn goals in a multi-agent environment using a Markov game to instantiate a second agent neural network, and generating an interaction-aware decision making network policy based on the first agent neural network and the second agent neural network. The N number of agents may be associated with a driver type indicative of a level of cooperation. When a collision occurs, a negative reward or penalty may be assigned to each agent involved based on a lane priority level of respective agents.