Patent attributes
Provided are a method, a system, and a program product for determining a policy using semi-supervised reinforcement learning. The method includes observing, by a learning agent, a state of an environment and taking an action by the learning agent. The method further includes observing a new state of the environment and calculating a reward for the action taken by the learning agent. The method also includes determining whether a policy related to the learning agent should be changed. The determination is made by a teaching agent that takes the state of the environment and the reward as input features and outputs a label. The method can also include changing the policy related to the learning agent upon a determination that the label outputted by the teaching agent exceeds a reward threshold.
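The steps recited above can be illustrated with a minimal Python sketch. It is not the claimed implementation: the toy environment (`LineEnvironment`), the tabular policy, the teaching agent's scoring rule, the `REWARD_THRESHOLD` value, and the choice to feed the teaching agent the state observed before the action are all hypothetical assumptions made only so the loop can run end to end.

```python
import random

REWARD_THRESHOLD = 0.5  # hypothetical value; the abstract does not specify one


class LineEnvironment:
    """Toy environment (an assumption): positions 0..4, with the goal at 4."""

    def __init__(self):
        self.state = 0

    def step(self, action):
        # action is -1 (left) or +1 (right); clamp to the valid range
        self.state = max(0, min(4, self.state + action))
        reward = 1.0 if self.state == 4 else 0.0
        return self.state, reward


class LearningAgent:
    """Holds a tabular policy: the probability of moving right in each state."""

    def __init__(self, rng):
        self.rng = rng
        self.p_right = {s: 0.5 for s in range(5)}

    def act(self, state):
        return 1 if self.rng.random() < self.p_right[state] else -1

    def change_policy(self, state, action, step_size=0.1):
        # Nudge the policy toward the action that was just taken.
        target = 1.0 if action == 1 else 0.0
        self.p_right[state] += step_size * (target - self.p_right[state])


class TeachingAgent:
    """Stand-in for a trained model: maps (state, reward) features to a label."""

    def label(self, state, reward):
        # Hypothetical scoring rule: weight the reward heavily, the state lightly.
        return 0.9 * reward + 0.1 * (state / 4)


def run(steps=200, seed=0):
    rng = random.Random(seed)
    env, learner, teacher = LineEnvironment(), LearningAgent(rng), TeachingAgent()
    changes = 0
    state = env.state                          # observe the state
    for _ in range(steps):
        action = learner.act(state)            # take an action
        new_state, reward = env.step(action)   # observe new state, get reward
        label = teacher.label(state, reward)   # teaching agent scores the step
        if label > REWARD_THRESHOLD:           # change policy only above threshold
            learner.change_policy(state, action)
            changes += 1
        state = new_state
    return learner, changes
```

In this sketch the teaching agent acts as a gate: the learning agent's policy is updated only on steps whose label exceeds the threshold, and is left unchanged otherwise. A teaching agent trained on labeled examples could replace the fixed scoring rule without altering the loop.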