🚀

Advanced Rollout Strategies

Parallel Rollout with Multiple Heuristics and Simplified Rollout for Large Spaces.

1. Parallel Rollout

Why limit ourselves to one heuristic? We can use multiple heuristics simultaneously and pick the best one. This is called a Superheuristic.

The Superheuristic Concept

Given \( m \) heuristics, for each next state \( x_{k+1} \), we run all of them in parallel.

The Q-factor becomes the minimum over all heuristics:

\[ \widetilde{Q}_k(x_k, u_k) = g_k(x_k, u_k) + \min_{\ell = 1, \ldots, m} C(\widetilde{T}_{k+1}^{\ell}) \]

If all base heuristics are sequentially improving, the superheuristic is also sequentially improving!

Parallel Execution Flow

graph TD State("State x_{k+1}") --> H1("Heuristic 1") State --> H2("Heuristic 2") State --> H3("Heuristic 3") H1 --> Cost1("Cost 1") H2 --> Cost2("Cost 2") H3 --> Cost3("Cost 3") Cost1 --> Min("Min(Costs)") Cost2 --> Min Cost3 --> Min Min --> Result("Superheuristic Value") style Min fill:#fef3c7,stroke:#f59e0b
✂️

2. Simplified Rollout

When the control space \( U_k(x_k) \) is very large or infinite, checking every single control is impossible.

The Strategy

Instead of minimizing over the full set \( U_k(x_k) \), we minimize over a smaller, manageable subset \( \tilde{U}_k(x_k) \).

\[ \hat{\mu}_k (x_k) \in \arg\min_{u_k \in \tilde{U}_k(x_k)} \hat{Q}_k(x_k, u_k) \]

How to choose \( \tilde{U}_k \)?

  • Discretization: Grid search over continuous space.
  • Heuristic Selection: Use a fast method to pick "promising" candidates.
  • Random Search: Sample random controls intelligently.

Cost Improvement Condition

To guarantee improvement, the subset \( \tilde{U}_k(x_k) \) must be "good enough".

Sufficient Condition: Ensure the base heuristic control is included in \( \tilde{U}_k(x_k) \).

📝

3. Test Your Knowledge

1. What is a "Superheuristic"?

2. Why is it called "Parallel" Rollout?

3. What is the main motivation for "Simplified Rollout"?

4. How can we guarantee cost improvement in Simplified Rollout?

5. If all base heuristics are sequentially improving, is the superheuristic also improving?

Previous

Lecture 21

Next

Lecture 23