Rollout for Discrete Optimization

⚙️

1. The Rollout Algorithm

Rollout is a specific way to construct the approximate cost-to-go \( \tilde{J}_{k+1} \) using a Base Heuristic.

How it Works

At state \( x_k \): Consider all possible controls \( u_k \).
Simulate Next State: For each \( u_k \), compute \( x_{k+1} = f_k(x_k, u_k) \).
Run Heuristic: From each \( x_{k+1} \), run the base heuristic to the end of the horizon.
Calculate Cost: Sum the immediate cost \( g_k \) and the heuristic cost \( H_{k+1} \).
Choose Best: Pick the \( u_k \) that minimizes this total cost.

Rollout Flowchart

graph TD Start("State x_k") --> Branch1("Try u_k^1") Start --> Branch2("Try u_k^2") Start --> Branch3("Try u_k^3") Branch1 --> Next1("x_{k+1}^1") Branch2 --> Next2("x_{k+1}^2") Branch3 --> Next3("x_{k+1}^3") Next1 --> Heur1("Run Heuristic -> Cost H1") Next2 --> Heur2("Run Heuristic -> Cost H2") Next3 --> Heur3("Run Heuristic -> Cost H3") Heur1 --> Compare("Compare Total Costs") Heur2 --> Compare Heur3 --> Compare Compare --> Best("Choose Best u_k") style Best fill:#f3e8ff,stroke:#8b5cf6,stroke-width:2px

🚚

2. Example: Multi-Vehicle Routing

Problem: 2 Vehicles, 2 Tasks (at nodes 7 and 9). Minimize total moves to complete all tasks.

Base Heuristic

"Move each vehicle towards its nearest pending task along the shortest path, ignoring the other vehicle."

Problem: Vehicles might crowd the same task or ignore a better joint strategy.

Rollout Strategy

At each step, consider all possible joint moves (e.g., V1 moves N, V2 moves E). Run the heuristic from the resulting positions to see which move leads to the best long-term outcome.

Routing Graph Visualization

graph LR subgraph Grid 1((1)) --- 2((2)) --- 3((3)) 4((4)) --- 5((5)) --- 6((6)) 7((7*)) --- 8((8)) --- 9((9*)) end style 7 fill:#fce7f3,stroke:#ec4899,stroke-width:2px style 9 fill:#fce7f3,stroke:#ec4899,stroke-width:2px

* Tasks are at nodes 7 and 9.

📈

3. Cost Improvement

Cost Improvement Property

Under appropriate conditions (e.g., sequential consistency), the Rollout policy performs at least as well as the base heuristic.

\[ J_{rollout}(x) \le J_{heuristic}(x) \]

In the vehicle routing example:

Heuristic Alone: 10 moves (Vehicles interfere/inefficient).
Rollout: 6 moves (Optimal!). It "sees" that splitting up is better by simulating the heuristic's failure in other branches.

1. The Rollout Algorithm

How it Works

Rollout Flowchart

2. Example: Multi-Vehicle Routing

Base Heuristic

Rollout Strategy

Routing Graph Visualization

3. Cost Improvement

Cost Improvement Property

4. Test Your Knowledge

1. What is the "Base Heuristic" in Rollout?

2. How many times do we run the heuristic at each step?

3. In the Vehicle Routing example, why does the heuristic fail?

4. What is the "Cost Improvement Property"?

5. Is Rollout an "Online" or "Offline" method?