Approximation in Value Space

A central idea in Reinforcement Learning: replacing exact optimal cost-to-go functions with approximations to overcome computational limits.

1. The Need for Approximation

The forward optimal control sequence construction requires us to solve: \[ u_k^* \in \arg \min_{u_k \in U_k(x_k^*)} \left[ g_k(x_k^*, u_k) + J_{k+1}^*(f_k(x_k^*, u_k)) \right] \]

The Problem:

This is possible only after we have computed \( J^*_k(x_k) \) by DP for all \( x_k \) and \( k \). In practice, this is often prohibitively time-consuming because the number of possible states can be enormous.
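To make the cost concrete, here is a minimal sketch of exact backward DP on a hypothetical toy problem (the horizon, state space, dynamics \( f_k \), and stage cost \( g_k \) below are all illustrative assumptions). Note that it must fill a table entry for every state at every stage:

```python
# Toy illustration (all problem data here are assumptions): exact
# backward DP must tabulate J*_k for every state x_k and every stage k.

N = 3                       # horizon (assumption)
states = range(5)           # hypothetical finite state space
controls = [-1, 0, 1]       # hypothetical control set

def f(k, x, u):             # assumed dynamics: move and clamp to the state space
    return max(0, min(4, x + u))

def g(k, x, u):             # assumed stage cost
    return x * x + u * u

J = [{x: 0.0 for x in states}]          # terminal cost J_N = 0
for k in reversed(range(N)):
    # J_k(x) = min over u of [ g_k(x, u) + J_{k+1}(f_k(x, u)) ], for ALL x
    Jk = {x: min(g(k, x, u) + J[0][f(k, x, u)] for u in controls)
          for x in states}
    J.insert(0, Jk)                     # prepend J_k

print(J[0])                 # J*_0 for every initial state
```

Even in this tiny example the table has \( |X| \times (N+1) \) entries; with a realistically large state space, this backward pass is what becomes prohibitive.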

The Solution: Approximation in Value Space

A similar forward algorithmic process can be used if the optimal cost-to-go functions \( J^*_k \) are replaced by some approximations \( \tilde{J}_k \).

This yields a suboptimal sequence \( \{ \tilde{u}_0, \dots, \tilde{u}_{N-1} \} \), obtained by using \( \tilde{J}_k \) in place of \( J^*_k \) in the DP minimization.

2. The Algorithm

We proceed sequentially forward, using the approximate cost-to-go functions \( \tilde{J}_k \).

Step 1: Initial Step

Start at \( x_0 \) and compute:

\[ \tilde{u}_0 \in \arg\min_{u_0 \in U_0(x_0)} \left[ g_0(x_0, u_0) + \tilde{J}_1 (f_0(x_0, u_0)) \right] \]

Step 2: State Transition

Move to the next state:

\[ \tilde{x}_1 = f_0(x_0, \tilde{u}_0) \]

Step 3: Sequential Steps

For \( k = 1, 2, \dots, N - 1 \), set:

\[ \tilde{u}_k \in \arg\min_{u_k \in U_k(\tilde{x}_k)} \left[ g_k(\tilde{x}_k, u_k) + \tilde{J}_{k+1} (f_k(\tilde{x}_k, u_k)) \right] \]

and update the state:

\[ \tilde{x}_{k+1} = f_k(\tilde{x}_k, \tilde{u}_k) \]
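
The three steps above can be sketched as a forward rollout on a hypothetical toy problem. The dynamics \( f_k \), stage cost \( g_k \), and the quadratic guess used for \( \tilde{J}_k \) below are all illustrative assumptions (with \( \tilde{J}_N = 0 \)):

```python
# Forward rollout using approximate cost-to-go functions J_tilde,
# with no backward DP pass over the state space.

N = 3                               # horizon (assumption)
controls = [-1, 0, 1]               # hypothetical control set

def f(k, x, u):                     # assumed dynamics: move and clamp
    return max(0, min(4, x + u))

def g(k, x, u):                     # assumed stage cost
    return x * x + u * u

def J_tilde(k, x):                  # hypothetical approximation of J*_k:
    return 2.0 * x * x              # a crude quadratic guess

def rollout(x0):
    x, u_seq, total = x0, [], 0.0
    for k in range(N):
        # one-step lookahead: minimize g_k + J_tilde_{k+1} over u_k,
        # using J_tilde_N = 0 at the final stage
        u = min(controls,
                key=lambda u: g(k, x, u) + (J_tilde(k + 1, f(k, x, u))
                                            if k + 1 < N else 0.0))
        u_seq.append(u)
        total += g(k, x, u)
        x = f(k, x, u)              # state transition x_{k+1} = f_k(x_k, u_k)
    return u_seq, total

print(rollout(4))                   # suboptimal control sequence and its cost
```

Only the states actually visited along the trajectory are ever touched, which is what makes the scheme practical on-line.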

3. Key Remarks

Forward Computation Only

The suboptimal sequence \( \{\tilde{u}_0, \dots, \tilde{u}_{N-1}\} \) is obtained by going forward. No backward calculation is required if the approximate functions \( \tilde{J}_k \) are already available.

Vastly Reduced Computation

The minimization needs to be performed only for the \( N \) states \( \{x_0, \tilde{x}_1, \dots, \tilde{x}_{N-1}\} \) that are actually encountered during the on-line control of the system, not for every state in the state space.
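
As a rough back-of-the-envelope comparison, assuming a hypothetical problem with \( 10^6 \) states and horizon \( N = 50 \):

```python
# Hypothetical sizes, for illustration only.
num_states = 10**6      # assumed size of the state space
horizon = 50            # assumed horizon N

backward_dp = num_states * horizon  # one minimization per (state, stage)
forward_approx = horizon            # one minimization per stage visited

print(backward_dp, forward_approx)  # 50000000 vs 50
```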

4. Test Your Knowledge

1. Why do we use Approximation in Value Space?

2. In this approach, for which states do we perform the minimization?

3. What replaces \( J^*_k \) in the minimization step?
