Cost Improvement with Rollout

🤝

1. Sequential Consistency

Definition

A base heuristic is sequentially consistent if it "stays the course".

If starting at \( x_k \) it generates a trajectory \( \{x_k, \dots, x_N\} \), then starting at \( x_{k+1} \) (the next state in that trajectory), it should generate the same remainder \( \{x_{k+1}, \dots, x_N\} \).

Visualizing Consistency

graph LR subgraph "Start at x_k" xk((x_k)) -->|Heuristic| xk1((x_{k+1})) xk1 -->|Heuristic| xk2((x_{k+2})) xk2 -->|Heuristic| xN((x_N)) end subgraph "Start at x_{k+1}" xk1_new((x_{k+1})) -->|Heuristic| xk2_new((x_{k+2})) xk2_new -->|Heuristic| xN_new((x_N)) end style xk fill:#dcfce7,stroke:#22c55e style xk1_new fill:#dcfce7,stroke:#22c55e

The path from \( x_{k+1} \) is identical in both cases.

📜

2. Cost Improvement Theorem

Theorem

If the base heuristic is sequentially consistent, then the Rollout policy \( \tilde{\pi} \) performs at least as well as the base heuristic \( H \).

\[ J_{k, \tilde{\pi}}(x_k) \leq H_k(x_k) \]

Why? Because the rollout algorithm considers the move suggested by the heuristic as one of the options. Since the heuristic is consistent, taking that move and continuing with the heuristic yields exactly \( H_k(x_k) \). The minimization step ensures we pick something better or equal.

🚀

3. Sequential Improvement

What if the heuristic is NOT consistent? We need a stronger condition.

Definition

A heuristic is sequentially improving if at every state \( x_k \), the rollout minimum Q-factor is better than or equal to the heuristic cost.

\[ \min_{u_k} \tilde{Q}_k(x_k, u_k) \leq H_k(x_k) \]

Trajectory Improvement

Under sequential improvement, the cost of the generated trajectories decreases (or stays same) at each step.

Cost(T_0) ≥ Cost(T_1) ≥ ... ≥ Cost(T_N)

⚠️

4. When it Fails

If the heuristic is not consistent (e.g., changes its mind mid-way), rollout might actually perform worse.

Example: Manhattan Grid

Imagine a heuristic that prefers "Go East" if x < 5, but "Go North" if x>= 5. If the rollout step moves you to x=5, the heuristic suddenly changes strategy, potentially abandoning a good path it "promised" earlier.

🛡️

5. The Fortified Rollout Algorithm

To guarantee cost improvement even when the base heuristic is not sequentially improving, we can use the Fortified Rollout algorithm.

Key Concepts

Permanent Trajectory (\( P_k \)): The fixed path we have already traversed up to step \( k \).
Tentative Best Trajectory (\( \bar{T}_k \)): The best complete trajectory found so far (from start to finish).

Update Rule

At each step \( k \), we compute a new candidate trajectory \( \tilde{T}_k \) using standard rollout.

If Cost(\( \tilde{T}_k \)) < Cost(\( \bar{T}_k \)):
Update \( \bar{T}_{k+1} = \tilde{T}_k \) (Switch to the new better path).

Else:
Keep \( \bar{T}_{k+1} = \bar{T}_k \) (Stick to the best known path).

Fortified Update Logic

graph TD Start("Current Best: T_bar") --> New("Compute New: T_tilde") New --> Check{"Is T_tilde < T_bar?"} Check -- Yes --> Update("Update Best: T_bar = T_tilde") Check -- No --> Keep("Keep Best: T_bar") Update --> Next("Proceed to k+1") Keep --> Next style Update fill:#dcfce7,stroke:#22c55e style Keep fill:#fce7f3,stroke:#ec4899

1. Sequential Consistency

Definition

Visualizing Consistency

2. Cost Improvement Theorem

Theorem

3. Sequential Improvement

Definition

Trajectory Improvement

4. When it Fails

Example: Manhattan Grid

5. The Fortified Rollout Algorithm

Key Concepts

Update Rule

Fortified Update Logic

6. Test Your Knowledge

1. What does "Sequentially Consistent" mean?

2. Is the "Shortest Path" heuristic typically sequentially consistent?

3. If a heuristic is sequentially consistent, does Rollout improve cost?

4. What is "Sequential Improvement"?

5. Can Rollout perform worse than the heuristic?