Advanced Rollout Strategies

⚡

1. Parallel Rollout

Why limit ourselves to one heuristic? We can use multiple heuristics simultaneously and pick the best one. This is called a Superheuristic.

The Superheuristic Concept

Given \( m \) heuristics, for each next state \( x_{k+1} \), we run all of them in parallel.

The Q-factor becomes the minimum over all heuristics:

\[ \widetilde{Q}_k(x_k, u_k) = g_k(x_k, u_k) + \min_{\ell = 1, \ldots, m} C(\widetilde{T}_{k+1}^{\ell}) \]

If all base heuristics are sequentially improving, the superheuristic is also sequentially improving!

Parallel Execution Flow

graph TD State("State x_{k+1}") --> H1("Heuristic 1") State --> H2("Heuristic 2") State --> H3("Heuristic 3") H1 --> Cost1("Cost 1") H2 --> Cost2("Cost 2") H3 --> Cost3("Cost 3") Cost1 --> Min("Min(Costs)") Cost2 --> Min Cost3 --> Min Min --> Result("Superheuristic Value") style Min fill:#fef3c7,stroke:#f59e0b

✂️

2. Simplified Rollout

When the control space \( U_k(x_k) \) is very large or infinite, checking every single control is impossible.

The Strategy

Instead of minimizing over the full set \( U_k(x_k) \), we minimize over a smaller, manageable subset \( \tilde{U}_k(x_k) \).

\[ \hat{\mu}_k (x_k) \in \arg\min_{u_k \in \tilde{U}_k(x_k)} \hat{Q}_k(x_k, u_k) \]

How to choose \( \tilde{U}_k \)?

Discretization: Grid search over continuous space.
Heuristic Selection: Use a fast method to pick "promising" candidates.
Random Search: Sample random controls intelligently.

Cost Improvement Condition

To guarantee improvement, the subset \( \tilde{U}_k(x_k) \) must be "good enough".

Sufficient Condition: Ensure the base heuristic control is included in \( \tilde{U}_k(x_k) \).

📝

1. Parallel Rollout

The Superheuristic Concept

Parallel Execution Flow

2. Simplified Rollout

The Strategy

How to choose \( \tilde{U}_k \)?

Cost Improvement Condition

3. Test Your Knowledge

1. What is a "Superheuristic"?

2. Why is it called "Parallel" Rollout?

3. What is the main motivation for "Simplified Rollout"?

4. How can we guarantee cost improvement in Simplified Rollout?

5. If all base heuristics are sequentially improving, is the superheuristic also improving?