US Patent 11366433 Reinforcement learning method and device

A reinforcement learning device includes a processor that determines a first action on a control target by using a basic controller that defines an action on the control target depending on a state of the control target. The processor performs a first reinforcement learning within a first action range around the first action in order to acquire a first policy for determining an action on the control target depending on a state of the control target. The first action range is smaller than a limit action range for the control target. The processor determines a second action on the control target by using the first policy. The processor updates the first policy to a second policy by performing a second reinforcement learning within a second action range around the second action. The second action range is smaller than the limit action range.
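The staged procedure in the abstract can be illustrated with a small sketch. It is not the patented implementation: the toy dynamics, the reward, the tabular Q-learning rule, and every name in it (`basic_controller`, `learn_around`, `LIMIT`, `RADIUS`) are assumptions chosen for illustration. It shows only the structure the abstract describes: each learning phase explores a narrow range of actions around the action proposed by the current baseline, and that range is smaller than the limit action range.

```python
import random

LIMIT = 3                      # assumed limit action range: [-LIMIT, LIMIT]
RADIUS = 1                     # assumed per-phase range: baseline action +/- RADIUS
STATES = range(-5, 6)          # toy state: integer deviation from a set point
OFFSETS = [-RADIUS, 0, RADIUS]  # corrections the learner may add to the baseline


def basic_controller(x):
    """Hypothetical hand-written baseline: move one step toward zero."""
    return (x < 0) - (x > 0)


def clip(a):
    """Keep an action inside the limit action range."""
    return max(-LIMIT, min(LIMIT, a))


def step(x, a):
    """Toy dynamics: the action shifts the state; reward penalises distance from 0."""
    nx = max(-5, min(5, x + a))
    return nx, -abs(nx)


def learn_around(baseline, episodes=2000, alpha=0.2, gamma=0.9, eps=0.2, seed=0):
    """Tabular Q-learning over offsets added to the baseline's action.

    Returns a new policy whose action never differs from the baseline's
    action by more than RADIUS.
    """
    rng = random.Random(seed)
    q = {(x, o): 0.0 for x in STATES for o in OFFSETS}
    for _ in range(episodes):
        x = rng.choice(list(STATES))
        for _ in range(10):
            # Epsilon-greedy choice among offsets only, not among all actions.
            if rng.random() < eps:
                o = rng.choice(OFFSETS)
            else:
                o = max(OFFSETS, key=lambda c: q[(x, c)])
            nx, r = step(x, clip(baseline(x) + o))
            best_next = max(q[(nx, c)] for c in OFFSETS)
            q[(x, o)] += alpha * (r + gamma * best_next - q[(x, o)])
            x = nx
    greedy = {x: max(OFFSETS, key=lambda c: q[(x, c)]) for x in STATES}
    return lambda x: clip(baseline(x) + greedy[x])


# First learning phase: explore around the basic controller's action.
policy1 = learn_around(basic_controller)
# Second learning phase: explore around the first policy's action.
policy2 = learn_around(policy1, seed=1)
```

In this sketch, calling `learn_around` again on `policy2` would add a further stage; each stage can move the policy at most `RADIUS` away from the previous one, so no single phase tries actions far from a controller that is already known to behave acceptably.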

