Patent attributes
A reinforcement learning processor specifically configured to execute reinforcement learning operations by way of implementing an application-specific instruction set is envisaged. The application-specific instruction set incorporates ‘Single Instruction Multiple Agents (SIMA)’ instructions. SIMA type instructions are specifically designed to be executed simultaneously on a plurality of reinforcement learning agents, each of which interacts with a corresponding reinforcement learning environment. The SIMA type instructions are specifically configured to receive either a reinforcement learning agent ID or a reinforcement learning environment ID as the operand. The reinforcement learning processor uses neural network data paths to communicate with a neural network, which in turn uses the actions, state-value functions, Q-values and reward values generated by the reinforcement learning processor to approximate an optimal state-value function as well as an optimal reward function.
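
The SIMA idea described above, in which one instruction is applied across several agents identified by ID, can be illustrated with a minimal software sketch. It is not the claimed hardware or its instruction set: the opcode name `GREEDY_ACTION`, the `AgentContext` fields and the `issue_sima` helper are all hypothetical, and a sequential loop stands in for lanes that the processor would execute simultaneously.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class AgentContext:
    """Per-agent state; all field names are hypothetical."""
    agent_id: int
    env_id: int  # ID of the environment this agent interacts with
    q_values: Dict[str, float] = field(default_factory=dict)  # action -> Q-value
    action: str = ""


def op_greedy_action(ctx: AgentContext) -> None:
    """Hypothetical opcode: select the action with the highest Q-value."""
    ctx.action = max(ctx.q_values, key=ctx.q_values.get)


# Hypothetical opcode table; the source does not name individual instructions.
OPCODES: Dict[str, Callable[[AgentContext], None]] = {
    "GREEDY_ACTION": op_greedy_action,
}


def issue_sima(opcode: str, agent_ids: List[int],
               agents: Dict[int, AgentContext]) -> None:
    """Apply one opcode to every listed agent ID.

    The loop is a sequential stand-in for simultaneous execution.
    """
    operation = OPCODES[opcode]
    for agent_id in agent_ids:
        operation(agents[agent_id])


agents = {
    0: AgentContext(0, env_id=10, q_values={"left": 0.2, "right": 0.7}),
    1: AgentContext(1, env_id=11, q_values={"left": 0.9, "right": 0.1}),
}

# One instruction, two agents: each agent updates only its own state.
issue_sima("GREEDY_ACTION", [0, 1], agents)
```

In this sketch the agent ID is the only operand. An environment ID operand could be handled the same way by keying a second table on `env_id`, though that mapping is likewise an assumption.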