1. The Exact LP Formulation
We can view the optimal cost vector \( J^* \) as the solution to a maximization problem.
Key Idea
\( J^* \) is the "largest" \( J \) that satisfies the Bellman inequality constraints:
\[ J(i) \leq \sum_{j=1}^{n} p_{ij}(u) \big( g(i, u, j) + \alpha J(j) \big) \]
for all states \( i \) and controls \( u \in U(i) \).
The Linear Program
Maximize the sum of values subject to the constraints:
\[ \text{Maximize } \sum_{i=1}^{n} J(i) \]
Subject to: \( J(i) \le \sum_{j=1}^{n} p_{ij}(u) \big( g(i, u, j) + \alpha J(j) \big) \) for all \( i \) and \( u \in U(i) \).
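For a concrete instance, the LP can be assembled directly. The sketch below assumes a made-up 2-state, 2-control discounted problem (the probabilities and costs are illustrative, not from the lecture) and uses `scipy.optimize.linprog` to maximize \( \sum_i J(i) \) subject to the Bellman inequalities.

```python
# A minimal sketch of the exact LP for an illustrative 2-state, 2-control
# discounted problem, solved with scipy.optimize.linprog.
import numpy as np
from scipy.optimize import linprog

n, alpha = 2, 0.9
# P[u][i, j] = p_ij(u); g[u][i] = expected stage cost E_j[ g(i, u, j) ]
P = [np.array([[0.8, 0.2], [0.3, 0.7]]),   # transitions under control u = 0
     np.array([[0.5, 0.5], [0.9, 0.1]])]   # transitions under control u = 1
g = [np.array([1.0, 2.0]),                 # expected costs under u = 0
     np.array([1.5, 0.5])]                 # expected costs under u = 1

# One constraint per pair (i, u):
#   J(i) - alpha * sum_j p_ij(u) J(j) <= g_bar(i, u)
A_ub, b_ub = [], []
for u in range(2):
    for i in range(n):
        row = -alpha * P[u][i]
        row[i] += 1.0
        A_ub.append(row)
        b_ub.append(g[u][i])

# linprog minimizes, so maximize sum_i J(i) by minimizing its negative.
res = linprog(c=-np.ones(n), A_ub=np.array(A_ub), b_ub=np.array(b_ub),
              bounds=[(None, None)] * n)
print("J* =", res.x)   # the maximizer is the optimal cost vector
```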
Visualizing the Feasible Region
\( J^* \) sits at a "corner" of the feasible region defined by the constraints, where the objective is maximized.
2. Proof & Challenges
Proof Sketch
Start with any \( J_0 \) that satisfies the constraints. Since the constraints hold for every control \( u \), they hold in particular for the minimizing one, so \( J_0 \le T J_0 \); call this \( J_1 = T J_0 \).
By the monotonicity of the DP operator \( T \):
\[ J_0 \le J_1 \le J_2 \le \dots \le J^* \]
Since \( T \) is a contraction, \( J_k = T^k J_0 \to J^* \), so any feasible \( J \) is bounded above by \( J^* \). Maximizing the objective therefore pushes \( J \) up to \( J^* \).
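The monotone convergence in this sketch can be checked numerically. Below is a minimal illustration, assuming the same kind of made-up 2-state problem as above: with nonnegative stage costs, \( J_0 = 0 \) is feasible (\( J_0 \le T J_0 \)), and repeated application of \( T \) increases \( J \) monotonically toward \( J^* \).

```python
# Numerical illustration of the proof sketch with illustrative data:
# starting from a feasible J0, the sequence T^k J0 increases to J*.
import numpy as np

alpha = 0.9
P = [np.array([[0.8, 0.2], [0.3, 0.7]]),   # p_ij(u) for control u = 0
     np.array([[0.5, 0.5], [0.9, 0.1]])]   # p_ij(u) for control u = 1
g = [np.array([1.0, 2.0]), np.array([1.5, 0.5])]  # expected stage costs

def T(J):
    # (T J)(i) = min_u [ g_bar(i, u) + alpha * sum_j p_ij(u) J(j) ]
    return np.min([g[u] + alpha * P[u] @ J for u in range(2)], axis=0)

J = np.zeros(2)                          # feasible start: J0 <= T J0
for _ in range(300):
    J_next = T(J)
    assert np.all(J_next >= J - 1e-12)   # monotone increase along the way
    J = J_next
print("lim T^k J0 =", J)                 # matches the exact-LP maximizer J*
```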
The Difficulty
For large problems, this exact LP is intractable:
- Too many variables: One for each state, \( n \) in total.
- Too many constraints: One for each state-control pair \( (i, u) \).
3. Approximate Linear Programming
To handle large state spaces, we use Approximation in Value Space within the LP framework.
Feature-Based Architecture
Replace \( J(i) \) with a linear approximation:
\[ \tilde{J}(i, r) = \sum_{\ell=1}^{m} r_{\ell} \phi_{\ell}(i) \]
This drastically reduces the number of variables from \( n \) (states) to \( m \) (features).
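As a sketch of this architecture (the polynomial features below are an illustrative choice, not prescribed by the lecture), a feature matrix \( \Phi \) with entries \( \Phi_{i\ell} = \phi_{\ell}(i) \) turns the \( n \)-vector \( J \) into \( \tilde{J} = \Phi r \) with only \( m \) free weights:

```python
# A minimal sketch of the feature-based architecture: with Phi of shape
# (n, m), the approximation is J_tilde = Phi @ r, so the LP variables
# become the m weights r instead of the n values J(i).
import numpy as np

n, m = 1000, 3                       # many states, few features
s = np.arange(n) / n
Phi = np.stack([np.ones(n),          # phi_1(i) = 1
                s,                   # phi_2(i) = i / n
                s ** 2],             # phi_3(i) = (i / n)^2
               axis=1)               # Phi[i, l] = phi_l(i)

r = np.array([0.5, 1.0, -0.3])       # example weight vector
J_tilde = Phi @ r                    # approximate costs for all n states
```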
Constraint Sampling
We still have too many constraints. We handle this by sampling a subset of constraints to enforce, as sketched after the list below.
- Use known suboptimal policies to visit "important" states.
- Enforce constraints only on these sampled states \( \tilde{I} \) and controls \( \tilde{U}(i) \).
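Putting the pieces together, here is a sketch of the approximate LP with sampled constraints. The random transition rows, costs, and uniform state sampling are stand-ins: in practice the sampled states would come from trajectories of a known suboptimal policy, and the data from the actual problem. A bounding box on \( r \) is added because sampled constraints alone may leave the LP unbounded.

```python
# A minimal sketch of the approximate LP: maximize sum_i J_tilde(i, r) over
# the m weights r, enforcing Bellman inequalities only at sampled (i, u).
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n, m, alpha = 1000, 3, 0.9

# Feature matrix Phi[i, l] = phi_l(i): constant, linear, quadratic in i/n.
s = np.arange(n) / n
Phi = np.stack([np.ones(n), s, s ** 2], axis=1)

# Sample "important" states I_tilde (here uniformly at random; in practice,
# states visited by a known suboptimal policy).
I_tilde = rng.choice(n, size=50, replace=False)

A_ub, b_ub = [], []
for i in I_tilde:
    for u in range(2):                    # two hypothetical controls
        p = rng.dirichlet(np.ones(n))     # stand-in transition row p_i.(u)
        g_bar = rng.uniform(0.0, 2.0)     # stand-in expected cost g_bar(i,u)
        # Bellman inequality in the weights r:
        #   phi(i)^T r - alpha * (p^T Phi) r <= g_bar(i, u)
        A_ub.append(Phi[i] - alpha * (p @ Phi))
        b_ub.append(g_bar)

# Maximize (sum_i phi(i))^T r; linprog minimizes, so negate. The box on r
# guards against unboundedness under the sampled constraint set.
res = linprog(c=-Phi.sum(axis=0), A_ub=np.array(A_ub), b_ub=np.array(b_ub),
              bounds=[(-100.0, 100.0)] * m)
print("fitted weights r:", res.x)
```

The LP now has only \( m \) variables and one constraint per sampled pair in \( \tilde{I} \times \tilde{U}(i) \), instead of \( n \) variables and a constraint for every state-control pair.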