🏎️

Rollout for Discrete Optimization

Improving base heuristics through one-step lookahead and cost approximation.

⚙️

1. The Rollout Algorithm

Rollout is a specific way to construct the approximate cost-to-go \( \tilde{J}_{k+1} \) using a Base Heuristic.

How it Works

  1. At state \( x_k \): Consider all possible controls \( u_k \).
  2. Simulate Next State: For each \( u_k \), compute \( x_{k+1} = f_k(x_k, u_k) \).
  3. Run Heuristic: From each \( x_{k+1} \), run the base heuristic to the end of the horizon.
  4. Calculate Cost: Sum the immediate cost \( g_k \) and the heuristic cost \( H_{k+1} \).
  5. Choose Best: Pick the \( u_k \) that minimizes this total cost.

Rollout Flowchart

graph TD Start("State x_k") --> Branch1("Try u_k^1") Start --> Branch2("Try u_k^2") Start --> Branch3("Try u_k^3") Branch1 --> Next1("x_{k+1}^1") Branch2 --> Next2("x_{k+1}^2") Branch3 --> Next3("x_{k+1}^3") Next1 --> Heur1("Run Heuristic -> Cost H1") Next2 --> Heur2("Run Heuristic -> Cost H2") Next3 --> Heur3("Run Heuristic -> Cost H3") Heur1 --> Compare("Compare Total Costs") Heur2 --> Compare Heur3 --> Compare Compare --> Best("Choose Best u_k") style Best fill:#f3e8ff,stroke:#8b5cf6,stroke-width:2px
🚚

2. Example: Multi-Vehicle Routing

Problem: 2 Vehicles, 2 Tasks (at nodes 7 and 9). Minimize total moves to complete all tasks.

Base Heuristic

"Move each vehicle towards its nearest pending task along the shortest path, ignoring the other vehicle."

Problem: Vehicles might crowd the same task or ignore a better joint strategy.

Rollout Strategy

At each step, consider all possible joint moves (e.g., V1 moves N, V2 moves E). Run the heuristic from the resulting positions to see which move leads to the best long-term outcome.

Routing Graph Visualization

graph LR subgraph Grid 1((1)) --- 2((2)) --- 3((3)) 4((4)) --- 5((5)) --- 6((6)) 7((7*)) --- 8((8)) --- 9((9*)) end style 7 fill:#fce7f3,stroke:#ec4899,stroke-width:2px style 9 fill:#fce7f3,stroke:#ec4899,stroke-width:2px

* Tasks are at nodes 7 and 9.

📈

3. Cost Improvement

Cost Improvement Property

Under appropriate conditions (e.g., sequential consistency), the Rollout policy performs at least as well as the base heuristic.

\[ J_{rollout}(x) \le J_{heuristic}(x) \]

In the vehicle routing example:

📝

4. Test Your Knowledge

1. What is the "Base Heuristic" in Rollout?

2. How many times do we run the heuristic at each step?

3. In the Vehicle Routing example, why does the heuristic fail?

4. What is the "Cost Improvement Property"?

5. Is Rollout an "Online" or "Offline" method?

Previous

Lecture 19

Next

Lecture 21