Is a
Patent attributes
Patent Jurisdiction
Patent Number
Date of Patent
March 5, 2013
Patent Application Number
12610709
Date Filed
November 2, 2009
Patent Citations Received
Patent Primary Examiner
Patent abstract
A reinforcement learning system (1) of the present invention utilizes a value of a first value gradient function (dV1/dt) in the learning performed by a second learning device (122), namely in evaluating a second reward (r2(t)). The first value gradient function (dV1/dt) is the temporal differential of a first value function (V1), which is defined according to a first reward (r1(t)) obtained from an environment and serves as the learning result of a first learning device (121). An action policy to be followed by a robot (R) in executing a task is determined based on the second reward (r2(t)).
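The two-stage idea in the abstract can be illustrated with a small sketch. It is not the patented implementation: the 1-D chain environment, the TD(0) update, the constants, and the choice to set r2(t) equal to a finite-difference estimate of dV1/dt are all assumptions made for illustration, and every function name here is hypothetical.

```python
import random

N_STATES = 6      # hypothetical 1-D chain; the rightmost state is the goal
GAMMA = 0.9       # discount factor (assumed)
ALPHA = 0.05      # TD learning rate (assumed)
DT = 1.0          # time step for the finite difference (assumed)

def step(state, action):
    """Move left (-1) or right (+1), clamped at the walls.
    r1 is 1.0 whenever the resulting state is the goal, else 0.0."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    r1 = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, r1

def learn_v1(steps=20000, seed=0):
    """Stand-in for the first learning device: TD(0) estimate of V1
    under a uniformly random policy, driven by the first reward r1."""
    rng = random.Random(seed)
    v1 = [0.0] * N_STATES
    s = 0
    for _ in range(steps):
        s2, r1 = step(s, rng.choice((-1, 1)))
        v1[s] += ALPHA * (r1 + GAMMA * v1[s2] - v1[s])
        s = s2
    return v1

def second_reward(v1, s, s2):
    """Assumed form of r2(t): finite-difference estimate of dV1/dt."""
    return (v1[s2] - v1[s]) / DT

def greedy_policy(v1):
    """Simplified stand-in for the second learning device: in each
    state, pick the action whose transition yields the larger r2."""
    return [
        max((-1, 1), key=lambda a: second_reward(v1, s, step(s, a)[0]))
        for s in range(N_STATES)
    ]

v1 = learn_v1()
policy = greedy_policy(v1)
```

Because r2 is positive whenever a transition increases V1, a policy that maximizes r2 climbs the learned value function toward the goal. A full second learner would estimate its own value function from r2 rather than acting greedily on a single step, which this sketch omits for brevity.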