Approximate value iteration with complex returns by bounding

Inventors

Robert Wright, Lei Yu, Steven Loscalzo

Publication date

2020/11/17

Patent office

US

Patent number

10839302

Application number

15359122

Description

A control system and method for controlling a system, which employs a data set representing a plurality of states and associated trajectories of an environment of the system; and which iteratively determines an estimate of an optimal control policy for the system. The iterative process performs the substeps, until convergence, of estimating a long term value for operation at a respective state of the environment over a series of predicted future environmental states; using a complex return of the data set to determine a bound to improve the estimated long term value; and producing an updated estimate of an optimal control policy dependent on the improved estimate of the long term value. The control system may produce an output signal to control the system directly, or output the optimized control policy. The system preferably is a reinforcement learning system which continually improves.
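The iterative process in the description can be illustrated with a small tabular sketch. This is only one plausible reading of the abstract, not the patent's claimed method: it assumes that the "complex return" is a set of n-step returns computed along each recorded trajectory, and that the largest of them serves as a lower bound on the one-step value estimate (an assumption that is reasonable mainly for deterministic environments). The function name `bounded_value_iteration`, its parameters, and the toy data are all hypothetical.

```python
import numpy as np


def bounded_value_iteration(trajectories, n_states, n_actions,
                            gamma=0.9, max_n=3, tol=1e-8, max_iters=1000):
    """Tabular value iteration over recorded trajectories (illustrative only).

    trajectories: list of lists of (state, action, reward, next_state) tuples.
    Terminal states are assumed never to appear as a source state, so their
    Q-values stay at zero.
    """
    q = np.zeros((n_states, n_actions))
    for _ in range(max_iters):
        targets = {}
        for traj in trajectories:
            for t, (s, a, _, _) in enumerate(traj):
                best = -np.inf
                discounted_rewards = 0.0
                # Assumed bounding step: take the largest n-step return,
                # n = 1..max_n, as the improved long-term value estimate.
                for n in range(1, max_n + 1):
                    if t + n > len(traj):
                        break
                    _, _, r, s_next = traj[t + n - 1]
                    discounted_rewards += gamma ** (n - 1) * r
                    n_step = discounted_rewards + gamma ** n * q[s_next].max()
                    best = max(best, n_step)
                targets.setdefault((s, a), []).append(best)
        new_q = q.copy()
        for (s, a), ys in targets.items():
            new_q[s, a] = np.mean(ys)
        converged = np.max(np.abs(new_q - q)) < tol
        q = new_q
        if converged:
            break
    # Updated policy estimate: act greedily on the improved values.
    policy = q.argmax(axis=1)
    return q, policy


# Toy data (hypothetical): a 3-state chain where state 2 is terminal
# and action 1 moves right; the only reward is on the final step.
demo = [[(0, 1, 0.0, 1), (1, 1, 1.0, 2)]]
q, policy = bounded_value_iteration(demo, n_states=3, n_actions=2)
```

In the first sweep of this toy run the one-step return for state 0 is zero because the value table is still empty, while the two-step return already carries the discounted reward; taking the maximum is what lets the bound improve the estimate early.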

Total citations

Cited by 60

Citations per year: 2019: 5, 2020: 7, 2021: 13, 2022: 16, 2023: 11, 2024: 3

Scholar articles

Approximate value iteration with complex returns by bounding

R Wright, L Yu, S Loscalzo - US Patent 10,839,302, 2020