✂️
1. Truncated Rollout
For problems with very long horizons (large \( N \)), running the base heuristic all the way to the end of the horizon is too expensive.
The Strategy
- Run the base heuristic for only \( \ell \) steps (truncation).
- Add a Terminal Cost Approximation \( \tilde{J}(x_{k+\ell}) \) at the leaf nodes.
- This bridges the gap between full rollout and pure value space approximation.
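In symbols, using a stage cost \( g_i \) and a base-heuristic policy \( \mu_i \) that are assumed here rather than defined in the text, the truncated rollout score of a candidate control \( u_k \) is roughly

\[
\tilde{Q}_k(x_k, u_k) \;=\; g_k(x_k, u_k) \;+\; \sum_{i=k+1}^{k+\ell-1} g_i\big(x_i, \mu_i(x_i)\big) \;+\; \tilde{J}(x_{k+\ell}),
\]

where the states \( x_{k+1}, \ldots, x_{k+\ell} \) are generated by following the base heuristic, and the rollout control is the \( u_k \) minimizing this expression, as sketched in the code after the diagram below.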
Truncated Lookahead
```mermaid
graph TD
Root("State x_k") -->|u_k| Next("x_{k+1}")
Next -->|Heuristic Step 1| H1("...")
H1 -->|Heuristic Step l| Leaf("Leaf x_{k+l}")
Leaf -->|Approx Cost| Cost("Terminal Cost J~(x_{k+l})")
style Leaf fill:#e0e7ff,stroke:#6366f1
style Cost fill:#f3e8ff,stroke:#a855f7
```
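A minimal sketch of this selection rule, assuming hypothetical callables `candidate_controls(x)`, `step(x, u)` returning `(next_state, stage_cost)`, `base_heuristic(x)`, and `terminal_cost(x)` standing in for \( \tilde{J} \):

```python
from math import inf

def truncated_rollout_control(x_k, candidate_controls, step, base_heuristic,
                              terminal_cost, ell):
    """Pick a control at x_k by simulating the base heuristic for ell steps
    and closing the remaining tail with the terminal cost approximation."""
    best_u, best_total = None, inf
    for u in candidate_controls(x_k):
        # Apply the candidate control; step returns (next_state, stage_cost).
        x, total = step(x_k, u)
        # Truncation: follow the base heuristic for only ell steps
        # instead of running it to the end of the horizon.
        for _ in range(ell):
            x, cost = step(x, base_heuristic(x))
            total += cost
        # Terminal cost approximation at the truncated leaf state.
        total += terminal_cost(x)
        if total < best_total:
            best_u, best_total = u, total
    return best_u
```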
🧠
2. Model-Free Rollout
What if we don't know the cost function \( G(u) \) or the constraints?
The Setup
- Base Heuristic: Can generate complete feasible solutions from partial ones.
- Expert: A human or software oracle that can rank any two complete solutions (better/worse) without assigning numerical costs.
Expert-Guided Selection
```mermaid
graph LR
Partial("Partial Solution (u_0...u_k)") -->|Extend u^1| Sol1("Complete Sol 1")
Partial -->|Extend u^2| Sol2("Complete Sol 2")
Sol1 --> Expert("Expert Ranking")
Sol2 --> Expert
Expert --> Choice("Select Best u_{k+1}")
style Expert fill:#fef3c7,stroke:#f59e0b
```
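A minimal sketch of one expert-guided step, with hypothetical callables: `complete` is the base heuristic extending a partial solution into a complete feasible one, and `expert_prefers(a, b)` returns True when the expert ranks solution `a` above solution `b`:

```python
def model_free_rollout_step(partial, candidate_controls, complete, expert_prefers):
    """Choose the next control by completing each candidate extension
    with the base heuristic and keeping the one the expert ranks highest."""
    best_u, best_solution = None, None
    for u in candidate_controls(partial):
        # Base heuristic: extend the partial solution (u_0, ..., u_k, u)
        # into a complete feasible solution.
        solution = complete(partial + [u])
        # Expert: a pairwise better/worse judgment, no numeric costs needed.
        if best_solution is None or expert_prefers(solution, best_solution):
            best_u, best_solution = u, solution
    return best_u
```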
💬
3. LLMs as Rollout Policies
Large Language Models (LLMs) like GPT can be viewed as policies for generating text (sequential decisions).
The Analogy
- State: Context window (partial text).
- Control: Next word (token) selection.
- Heuristic: The LLM generating the rest of the text.
Challenges
- Cost Function: Hard to define "quality" numerically.
- Expert: Need a way to rank completions (e.g., Reward Models in RLHF).
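A minimal sketch of the analogy, with assumed callables: `top_candidates` proposes next tokens, `llm_complete` plays the base heuristic by finishing the text, and `reward_model` plays the expert by scoring a completion (as in RLHF):

```python
def llm_rollout_next_token(context, top_candidates, llm_complete, reward_model):
    """State = context window, control = next token, base heuristic = the LLM
    finishing the text, expert = a reward model scoring completions."""
    best_token, best_score = None, float("-inf")
    for token in top_candidates(context):
        # Base heuristic: let the LLM complete the text from the extended context.
        completion = llm_complete(context + token)
        # Expert: a learned reward model ranks the completion.
        score = reward_model(completion)
        if score > best_score:
            best_token, best_score = token, score
    return best_token
```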
📝