Authors
Wei Zhang, Thomas G Dietterich
Publication date
1995/8/20
Journal
IJCAI
Volume
95
Pages
1114-1120
Description
We apply reinforcement learning methods to learn domain-specific heuristics for job shop scheduling. A repair-based scheduler starts with a critical-path schedule and incrementally repairs constraint violations with the goal of finding a short conflict-free schedule. The temporal difference algorithm TD(λ) is applied to train a neural network to learn a heuristic evaluation function over states. This evaluation function is used by a one-step lookahead search procedure to find good solutions to new scheduling problems. We evaluate this approach on synthetic problems and on problems from a NASA space shuttle payload processing task. The evaluation function is trained on problems involving a small number of jobs and then tested on larger problems. The TD scheduler performs better than the best known existing algorithm for this task, Zweben's iterative repair method based on simulated annealing. The results suggest that reinforcement learning can provide a new method for constructing high-performance scheduling systems.
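The two mechanisms named in the description, TD(λ) training of an evaluation function and one-step lookahead over it, can be illustrated with a minimal sketch. This is not the authors' implementation: the paper trains a neural network, whereas the sketch assumes a linear evaluation function to stay short, and the names td_lambda_episode, greedy_one_step, alpha, lam, and gamma are hypothetical choices made for illustration only.

```python
def dot(u, v):
    """Inner product of two equal-length lists of floats."""
    return sum(a * b for a, b in zip(u, v))


def td_lambda_episode(weights, features, rewards, alpha=0.1, lam=0.7, gamma=1.0):
    """Run TD(lambda) updates along one trajectory and return the new weights.

    Assumptions (not from the paper): the value of a state is linear,
    V(s) = weights . features(s), and the state after the last transition
    is terminal with value 0.

    features[t] is the feature vector of the t-th non-terminal state.
    rewards[t] is the reward received when leaving that state.
    """
    trace = [0.0] * len(weights)  # eligibility trace, one entry per weight
    for t in range(len(rewards)):
        v = dot(weights, features[t])
        is_last = t + 1 == len(rewards)
        v_next = 0.0 if is_last else dot(weights, features[t + 1])
        delta = rewards[t] + gamma * v_next - v  # temporal difference error
        # Decay the old trace, then add the gradient of V(s_t), which for a
        # linear function is just the feature vector.
        trace = [gamma * lam * e + x for e, x in zip(trace, features[t])]
        weights = [w + alpha * delta * e for w, e in zip(weights, trace)]
    return weights


def greedy_one_step(state, successors, value):
    """One-step lookahead: return the successor state with the highest value.

    successors(state) yields candidate next states (for a repair-based
    scheduler these would be the schedules reachable by one repair);
    value(s) is the learned evaluation function.
    """
    return max(successors(state), key=value)
```

With lam=0 only the most recently visited state is credited for a temporal difference error, while lam=1 spreads the credit back over every state visited so far in the episode.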
Total citations
[Bar chart: citations per year, 1995-2024]