1. Parallel Rollout
Why limit ourselves to one heuristic? We can use multiple heuristics simultaneously and pick the best one. This is called a Superheuristic.
The Superheuristic Concept
Given \( m \) heuristics, for each next state \( x_{k+1} \), we run all of them in parallel.
The Q-factor becomes the minimum over all heuristics:
\[ \widetilde{Q}_k(x_k, u_k) = g_k(x_k, u_k) + \min_{\ell = 1, \ldots, m} C(\widetilde{T}_{k+1}^{\ell}) \]If all base heuristics are sequentially improving, the superheuristic is also sequentially improving!
Parallel Execution Flow
2. Simplified Rollout
When the control space \( U_k(x_k) \) is very large or infinite, checking every single control is impossible.
The Strategy
Instead of minimizing over the full set \( U_k(x_k) \), we minimize over a smaller, manageable subset \( \tilde{U}_k(x_k) \).
\[ \hat{\mu}_k (x_k) \in \arg\min_{u_k \in \tilde{U}_k(x_k)} \hat{Q}_k(x_k, u_k) \]How to choose \( \tilde{U}_k \)?
- Discretization: Grid search over continuous space.
- Heuristic Selection: Use a fast method to pick "promising" candidates.
- Random Search: Sample random controls intelligently.
Cost Improvement Condition
To guarantee improvement, the subset \( \tilde{U}_k(x_k) \) must be "good enough".
Sufficient Condition: Ensure the base heuristic control is included in \( \tilde{U}_k(x_k) \).