🔭

Approximation in Value Space

Deterministic Problems, One-Step Lookahead, and Q-Factors.

💡

1. The Core Idea

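Exact DP is often computationally intractable, because it requires the optimal cost-to-go \( J^*_{k+1} \) at every state. Approximation in value space replaces \( J^*_{k+1} \) in the DP minimization with an approximate cost function \( \tilde{J}_{k+1} \) that is cheaper to obtain.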

One-Step Lookahead Minimization

At state \( \tilde{x}_k \), we choose control \( \tilde{u}_k \) by solving:

\[ \tilde{u}_k \in \arg\min_{u_k \in U_k(\tilde{x}_k)} \left[ g_k(\tilde{x}_k, u_k) + \tilde{J}_{k+1} (f_k(\tilde{x}_k, u_k)) \right] \]

Then we move to the next state: \( \tilde{x}_{k+1} = f_k(\tilde{x}_k, \tilde{u}_k) \).
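The lookahead step and the state update above can be sketched in Python. This is a minimal illustration, not a reference implementation: the dynamics `f`, stage cost `g`, control set `controls`, and the approximation `J_tilde` are all toy assumptions chosen so the numbers are easy to check by hand.

```python
def f(k, x, u):
    """Toy deterministic dynamics f_k (an assumption): the state moves by u."""
    return x + u


def g(k, x, u):
    """Toy stage cost g_k (an assumption): distance from 0 plus control effort."""
    return x * x + abs(u)


def controls(k, x):
    """Toy control constraint set U_k(x) (an assumption)."""
    return (-1, 0, 1)


def J_tilde(k, x):
    """Hypothetical cost-to-go approximation, standing in for J~_k."""
    return 2 * x * x


def one_step_lookahead(k, x):
    """Return a control minimizing g_k(x, u) + J~_{k+1}(f_k(x, u))."""
    return min(controls(k, x),
               key=lambda u: g(k, x, u) + J_tilde(k + 1, f(k, x, u)))


def run(x0, horizon):
    """Apply the lookahead policy for `horizon` stages starting at x0."""
    x, states, applied = x0, [x0], []
    for k in range(horizon):
        u = one_step_lookahead(k, x)
        x = f(k, x, u)          # x~_{k+1} = f_k(x~_k, u~_k)
        applied.append(u)
        states.append(x)
    return states, applied


states, applied = run(x0=3, horizon=4)
print(states, applied)
```

Under these toy assumptions the policy drives the state from 3 to 0 in three steps and then stays there.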

Visualizing the Process

graph LR
  State["Current State x_k"] -->|Apply u_k| NextState["Next State x_{k+1}"]
  NextState -->|Evaluate| Cost["Approx Cost J~(x_{k+1})"]
  State -->|Plus Stage Cost g_k| Total["Total: g_k + J~"]
  Total -->|Minimize| Decision["Choose Best u_k"]
  style Decision fill:#cffafe,stroke:#06b6d4,stroke-width:2px

2. Approximate Q-Factors

Definition

The approximate Q-factor of a state-control pair \( (x_k, u_k) \) is the cost of applying control \( u_k \) at state \( x_k \), plus the approximate cost-to-go \( \tilde{J}_{k+1} \) from the resulting next state \( f_k(x_k, u_k) \).

\[ \tilde{Q}_k(x_k, u_k) = g_k(x_k, u_k) + \tilde{J}_{k+1} (f_k(x_k, u_k)) \]

The control selection simply becomes minimizing the Q-factor:

\[ \tilde{u}_k \in \arg\min_{u_k \in U_k(\tilde{x}_k)} \tilde{Q}_k(\tilde{x}_k, u_k) \]
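As a small numerical illustration of the two formulas above, the sketch below tabulates the approximate Q-factors at one state and picks the control with the smallest one. The problem data (`f`, `g`, `controls`, `J_tilde`) are toy assumptions, not part of the lecture.

```python
def f(k, x, u):
    """Toy dynamics f_k (an assumption)."""
    return x + u


def g(k, x, u):
    """Toy stage cost g_k (an assumption)."""
    return x * x + abs(u)


def controls(k, x):
    """Toy control constraint set U_k(x) (an assumption)."""
    return (-1, 0, 1)


def J_tilde(k, x):
    """Hypothetical cost-to-go approximation J~_k."""
    return 2 * x * x


def q_factors(k, x):
    """Map each feasible control u to Q~_k(x, u) = g_k(x, u) + J~_{k+1}(f_k(x, u))."""
    return {u: g(k, x, u) + J_tilde(k + 1, f(k, x, u)) for u in controls(k, x)}


def select_control(k, x):
    """Pick a control with the smallest approximate Q-factor."""
    q = q_factors(k, x)
    return min(q, key=q.get)


print(q_factors(0, 3))       # {-1: 18, 0: 27, 1: 42}
print(select_control(0, 3))  # -1
```

At state 3 the Q-factors are 18, 27, and 42 for the controls -1, 0, and +1, so the minimization selects -1.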
⚖️

3. Offline vs Online

Online Approximation

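Compute \( \tilde{J}_{k+1} \) on the fly, only at the next states that the lookahead actually needs (e.g., using rollout, which evaluates each candidate next state by simulating a base heuristic from it).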

  • Pros: Adapts to the current state.
  • Cons: Computationally expensive at each step.
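One way to picture the online approach is a rollout sketch, in which \( \tilde{J}_{k+1} \) is the cost of running a base heuristic from the next state to the end of the horizon. Everything below is a toy assumption made for illustration: the horizon `N`, the dynamics `f`, the costs `g` and `terminal_cost`, and the deliberately poor base heuristic that always applies `u = 0`.

```python
N = 3                    # horizon length (toy assumption)
CONTROLS = (-1, 0, 1)    # control set, the same at every state (toy assumption)


def f(k, x, u):
    """Toy dynamics f_k (an assumption)."""
    return x + u


def g(k, x, u):
    """Toy stage cost g_k (an assumption)."""
    return x * x + abs(u)


def terminal_cost(x):
    """Toy terminal cost g_N (an assumption)."""
    return x * x


def base_heuristic(k, x):
    """Hypothetical base heuristic: always do nothing."""
    return 0


def heuristic_cost(k, x):
    """Cost of following the base heuristic from state x at stage k to the end."""
    total = 0
    for j in range(k, N):
        u = base_heuristic(j, x)
        total += g(j, x, u)
        x = f(j, x, u)
    return total + terminal_cost(x)


def rollout_control(k, x):
    """One-step lookahead with J~_{k+1} computed online by simulation."""
    return min(CONTROLS,
               key=lambda u: g(k, x, u) + heuristic_cost(k + 1, f(k, x, u)))


def run_rollout(x0):
    """Apply the rollout policy from x0; return (total cost, visited states)."""
    x, total, states = x0, 0, [x0]
    for k in range(N):
        u = rollout_control(k, x)
        total += g(k, x, u)
        x = f(k, x, u)
        states.append(x)
    return total + terminal_cost(x), states


print(heuristic_cost(0, 2))  # cost of the base heuristic alone
print(run_rollout(2))        # cost and trajectory of the rollout policy
```

In this toy instance the base heuristic alone costs 16 from state 2, while the rollout policy costs 7. The price is visible in `rollout_control`: every control decision triggers one full simulation per candidate control, which is the per-step computational expense noted above.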

Offline Q-Factors

Train an approximation \( \tilde{Q}_k \) beforehand (e.g., with a neural network), then select controls online by minimizing it directly over \( u_k \).

  • Pros: Very fast online execution.
  • Cons: Performance depends on training quality; approximation errors in \( \tilde{Q}_k \) carry over directly to the controls that are selected.
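The split between an offline phase and a cheap online phase can be sketched as follows. To keep the example self-contained, a lookup table stands in for the neural network mentioned above, and the "training" simply tabulates \( g_k + \tilde{J}_{k+1} \) on a small grid of states; the problem data (`f`, `g`, `CONTROLS`, `J_tilde`) and the state grid are toy assumptions.

```python
CONTROLS = (-1, 0, 1)    # control set (toy assumption)


def f(x, u):
    """Toy dynamics, the same at every stage (an assumption)."""
    return x + u


def g(x, u):
    """Toy stage cost, the same at every stage (an assumption)."""
    return x * x + abs(u)


def J_tilde(x):
    """Hypothetical cost-to-go approximation used to build training targets."""
    return 2 * x * x


def train_offline(states, horizon):
    """Offline phase: store Q~_k(x, u) for every stage, grid state, and control.

    A table is used here as a stand-in for a trained approximator.
    """
    table = {}
    for k in range(horizon):
        for x in states:
            for u in CONTROLS:
                table[(k, x, u)] = g(x, u) + J_tilde(f(x, u))
    return table


def online_control(table, k, x):
    """Online phase: only look up and compare stored Q-factors.

    Note that g and J_tilde are never evaluated here.
    """
    return min(CONTROLS, key=lambda u: table[(k, x, u)])


Q_TABLE = train_offline(states=range(-5, 6), horizon=4)

x, applied = 3, []
for k in range(4):
    u = online_control(Q_TABLE, k, x)
    applied.append(u)
    x = f(x, u)

print(applied, x)
```

The online step reduces to a handful of lookups, which is why execution is fast. The same structure also shows the risk: any error stored in the table during the offline phase is used as-is when controls are chosen.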
📝

4. Test Your Knowledge

1. What do we replace \( J^*_{k+1} \) with in Approximation in Value Space?

2. What is the "One-Step Lookahead"?

3. What is a Q-Factor?

4. What is a risk of using Offline Trained Q-Factors?

5. Which method is typically faster during online execution?
