US Patent 8392346 Reinforcement learning system

A reinforcement learning system (1) of the present invention utilizes a value of a first value gradient function (dV1/dt) in the learning performed by a second learning device (122), namely in evaluating a second reward (r2(t)). The first value gradient function (dV1/dt) is a temporal differential of a first value function (V1) which is defined according to a first reward (r1(t)) obtained from an environment and is served as a learning result given by a first learning device (121). An action policy which should be taken by a robot (R) to execute a task is determined based on the second reward (r2(t)).

Timeline

No Timeline data yet.

Further Resources

Title

Author

Link

Type

Date

No Further Resources data yet.

US Patent 8392346 Reinforcement learning system

Contents

Patent attributes

Timeline

Further Resources

References

Find more entities like US Patent 8392346 Reinforcement learning system