1. Sequential Consistency
Definition
A base heuristic is sequentially consistent if it "stays the course".
If starting at \( x_k \) it generates a trajectory \( \{x_k, \dots, x_N\} \), then starting at \( x_{k+1} \) (the next state in that trajectory), it should generate the same remainder \( \{x_{k+1}, \dots, x_N\} \).
Visualizing Consistency
The path from \( x_{k+1} \) is identical in both cases.
2. Cost Improvement Theorem
Theorem
If the base heuristic is sequentially consistent, then the Rollout policy \( \tilde{\pi} \) performs at least as well as the base heuristic \( H \).
\[ J_{k, \tilde{\pi}}(x_k) \leq H_k(x_k) \]Why? Because the rollout algorithm considers the move suggested by the heuristic as one of the options. Since the heuristic is consistent, taking that move and continuing with the heuristic yields exactly \( H_k(x_k) \). The minimization step ensures we pick something better or equal.
3. Sequential Improvement
What if the heuristic is NOT consistent? We need a stronger condition.
Definition
A heuristic is sequentially improving if at every state \( x_k \), the rollout minimum Q-factor is better than or equal to the heuristic cost.
\[ \min_{u_k} \tilde{Q}_k(x_k, u_k) \leq H_k(x_k) \]Trajectory Improvement
Under sequential improvement, the cost of the generated trajectories decreases (or stays same) at each step.
4. When it Fails
If the heuristic is not consistent (e.g., changes its mind mid-way), rollout might actually perform worse.
Example: Manhattan Grid
Imagine a heuristic that prefers "Go East" if x < 5, but "Go North" if x>= 5. If the rollout step moves you to x=5, the heuristic suddenly changes strategy, potentially abandoning a good path it "promised" earlier.
5. The Fortified Rollout Algorithm
To guarantee cost improvement even when the base heuristic is not sequentially improving, we can use the Fortified Rollout algorithm.
Key Concepts
- Permanent Trajectory (\( P_k \)): The fixed path we have already traversed up to step \( k \).
- Tentative Best Trajectory (\( \bar{T}_k \)): The best complete trajectory found so far (from start to finish).
Update Rule
At each step \( k \), we compute a new candidate trajectory \( \tilde{T}_k \) using standard rollout.
If Cost(\( \tilde{T}_k \)) < Cost(\( \bar{T}_k \)):
Update \( \bar{T}_{k+1} = \tilde{T}_k \) (Switch to the new better
path).
Else:
Keep \( \bar{T}_{k+1} = \bar{T}_k \) (Stick to the best known path).