One-Step & Multistep Lookahead

❓

1. Ways of Approximation

We've established that we need to approximate the optimal cost-to-go function \( J^* \). But how far should we look ahead before applying this approximation?

1️⃣

One-Step Lookahead

Look one stage ahead, then approximate.

🔢

Multistep Lookahead

Look \( \ell \) stages ahead, then approximate.

👀

2. One-Step Lookahead

Definition

The algorithm involves a one-step lookahead minimization if it solves a one-stage DP problem for each \( k \):

\[ \tilde{u}_k = \arg\min_{u_k \in U_k(\tilde{x}_k)} \left[ g_k(\tilde{x}_k, u_k) + \tilde{J}_{k+1} (f_k(\tilde{x}_k, u_k)) \right] \]

At state \( k \), you look ahead to various possible next states \( x_{k+1} \).
Minimization is performed over controls \( u_k \).
We require the approximate cost \( \tilde{J}_{k+1} \).

🪜

3. Multistep Lookahead

Multistep lookahead involves solving an \( \ell \)-step DP problem, where \( 1 < \ell < N - k \), with a terminal cost function approximation \( \tilde{J}_{k+\ell} \).

Why use Multistep?

It typically provides better performance. The intuitive reason is that by treating \( \ell \) stages "exactly" via optimization, the effect of the approximation error \( \tilde{J}_{k+\ell} - J^*_{k+\ell} \) tends to diminish as \( \ell \) increases.

Example: In AlphaZero chess, long multistep lookahead is critical for superior performance.

Trade-off:

The solution becomes more time-consuming compared to one-step lookahead.

♟️

4. Example: Chess

Analyzing the Game

?
States:
The arrangement of pieces on the board.
?
Actions (Controls):
Legal moves available to the current player.
?
Lookahead:
Calculating potential future board states. Grandmasters might look 10-15 steps ahead (Multistep), while novices might only look 1 step ahead.

But how do I compute \( \tilde{J} \)?

We've talked about using \( \tilde{J} \) for approximation, but we haven't discussed where this function comes from.