1. The Rollout Algorithm
Rollout is a specific way to construct the approximate cost-to-go \( \tilde{J}_{k+1} \) using a Base Heuristic.
How it Works
- At state \( x_k \): Consider all possible controls \( u_k \).
- Simulate Next State: For each \( u_k \), compute \( x_{k+1} = f_k(x_k, u_k) \).
- Run Heuristic: From each \( x_{k+1} \), run the base heuristic to the end of the horizon.
- Calculate Cost: Sum the immediate cost \( g_k \) and the heuristic cost \( H_{k+1} \).
- Choose Best: Pick the \( u_k \) that minimizes this total cost.
Rollout Flowchart
2. Example: Multi-Vehicle Routing
Problem: 2 Vehicles, 2 Tasks (at nodes 7 and 9). Minimize total moves to complete all tasks.
Base Heuristic
"Move each vehicle towards its nearest pending task along the shortest path, ignoring the other vehicle."
Problem: Vehicles might crowd the same task or ignore a better joint strategy.
Rollout Strategy
At each step, consider all possible joint moves (e.g., V1 moves N, V2 moves E). Run the heuristic from the resulting positions to see which move leads to the best long-term outcome.
Routing Graph Visualization
* Tasks are at nodes 7 and 9.
3. Cost Improvement
Cost Improvement Property
Under appropriate conditions (e.g., sequential consistency), the Rollout policy performs at least as well as the base heuristic.
\[ J_{rollout}(x) \le J_{heuristic}(x) \]In the vehicle routing example:
- Heuristic Alone: 10 moves (Vehicles interfere/inefficient).
- Rollout: 6 moves (Optimal!). It "sees" that splitting up is better by simulating the heuristic's failure in other branches.