📏 Linear Programming Approach

Solving Dynamic Programming problems using Linear Programming and Approximate LP.

🎯 1. The Exact LP Formulation

We can view the optimal cost vector \( J^* \) as the solution to a maximization problem.

Key Idea

\( J^* \) is the "largest" \( J \) that satisfies the Bellman inequality constraints:

\[ J(i) \leq \sum_{j=1}^{n} p_{ij}(u) ( g(i, u, j) + \alpha J(j) ) \]

for all states \( i \) and controls \( u \in U(i) \).
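
Since the inequality must hold for every \( u \in U(i) \), it is equivalent to the compact condition \( J \le TJ \), where \( T \) is the Bellman (DP) operator:

\[ J(i) \le \min_{u \in U(i)} \sum_{j=1}^{n} p_{ij}(u) \big( g(i, u, j) + \alpha J(j) \big) = (TJ)(i) \quad \text{for all } i. \]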

The Linear Program

Maximize the sum of values subject to the constraints:

\[ \text{Maximize } \sum_{i=1}^{n} J(i) \]

Subject to: \( J(i) \le \sum_{j=1}^{n} p_{ij}(u) \big( g(i, u, j) + \alpha J(j) \big) \) for all \( i \) and \( u \in U(i) \).
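
As a concrete illustration, here is a minimal sketch that solves this exact LP for a hypothetical 2-state, 2-control discounted problem with scipy.optimize.linprog; the transition probabilities `p` and stage costs `g` are made-up toy data.

```python
# A minimal sketch of the exact LP on a made-up 2-state, 2-control MDP,
# using scipy.optimize.linprog. All transition/cost data are hypothetical.
import numpy as np
from scipy.optimize import linprog

n, m, alpha = 2, 2, 0.9   # states, controls, discount factor

# p[u, i, j]: transition probabilities; g[u, i, j]: stage costs
p = np.array([[[0.8, 0.2], [0.3, 0.7]],
              [[0.5, 0.5], [0.9, 0.1]]])
g = np.array([[[1.0, 2.0], [0.0, 3.0]],
              [[2.0, 1.0], [4.0, 0.5]]])

# Objective: maximize sum_i J(i), i.e. minimize -sum_i J(i).
c = -np.ones(n)

# One constraint per pair (i, u):
#   J(i) - alpha * sum_j p_ij(u) J(j) <= sum_j p_ij(u) g(i, u, j)
A_ub, b_ub = [], []
for u in range(m):
    for i in range(n):
        row = -alpha * p[u, i]
        row[i] += 1.0
        A_ub.append(row)
        b_ub.append(p[u, i] @ g[u, i])

res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
              bounds=[(None, None)] * n)
print("J* =", res.x)   # the LP optimum is the optimal cost vector
```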

Visualizing the Feasible Region

```mermaid
graph TD
    Constraint1("Constraint u_1") --> Region
    Constraint2("Constraint u_2") --> Region
    Constraint3("Constraint u_3") --> Region
    Region("Feasible Region (J <= J*)") --> Max("Maximize Sum(J)")
    Max --> Optimal("Optimal Solution J*")
    style Optimal fill:#ffedd5,stroke:#f97316
```

\( J^* \) sits at the "corner" of the constraints, maximizing the objective.

🚧 2. Proof & Challenges

Proof Sketch

Start with any \( J_0 \) that satisfies the constraints. Since the inequality holds for every control \( u \), it holds in particular for the minimizing one, so \( J_0 \le T J_0 = J_1 \).

By the monotonicity of the DP operator \( T \):

\[ J_0 \le J_1 \le J_2 \le \dots \le J^* \]

Since \( J_k = T^k J_0 \to J^* \), any feasible \( J \) is bounded above by \( J^* \); maximizing the objective therefore pushes the solution all the way up to \( J^* \).
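
To see this mechanism numerically, the following sketch iterates \( T \) from a feasible \( J_0 \) and checks the monotone increase at every step, reusing the same toy data as the LP sketch above:

```python
# Numerical illustration of the monotone convergence J0 <= T J0 <= ... <= J*.
import numpy as np

n, alpha = 2, 0.9
p = np.array([[[0.8, 0.2], [0.3, 0.7]],
              [[0.5, 0.5], [0.9, 0.1]]])
g = np.array([[[1.0, 2.0], [0.0, 3.0]],
              [[2.0, 1.0], [4.0, 0.5]]])

def T(J):
    # (T J)(i) = min_u sum_j p_ij(u) * (g(i,u,j) + alpha * J(j))
    Q = np.einsum('uij,uij->ui', p, g + alpha * J)  # Q[u, i]
    return Q.min(axis=0)

J = np.zeros(n)   # J0 = 0 is feasible here because all stage costs are >= 0
for k in range(100):
    J_next = T(J)
    assert np.all(J <= J_next + 1e-12)   # monotone increase at every step
    J = J_next
print("J approaches J* from below:", J)
```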

The Difficulty

For large problems, this exact LP is intractable:

  • Too many variables: one per state, i.e., \( n \) variables in total.
  • Too many constraints: one per state-control pair \( (i, u) \).

📉 3. Approximate Linear Programming

To handle large state spaces, we use Approximation in Value Space within the LP framework.

Feature-Based Architecture

Replace \( J(i) \) with a linear approximation:

\[ \tilde{J}(i, r) = \sum_{\ell=1}^{m} r_{\ell} \phi_{\ell}(i) \]

This drastically reduces the number of variables from \( n \) (states) to \( m \) (features).
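
A minimal sketch of such an architecture, with made-up polynomial features, just to show the variable reduction:

```python
# Linear feature architecture: J~ = Phi @ r, with Phi an (n x m) feature
# matrix. The specific features here are hypothetical, for illustration.
import numpy as np

n, m = 1000, 3                          # many states, few features
s = np.arange(n) / n                    # normalized state index in [0, 1)
Phi = np.column_stack([np.ones(n),      # phi_1(i) = 1  (constant)
                       s,               # phi_2(i) = i / n
                       s ** 2])         # phi_3(i) = (i / n)^2

r = np.array([0.5, -1.0, 2.0])          # the LP now optimizes these m weights
J_tilde = Phi @ r                       # approximate cost for every state
print(J_tilde.shape)                    # (1000,) values from only 3 variables
```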

Constraint Sampling

We still have one constraint for every state-control pair. We handle this by sampling a subset of constraints to enforce; a concrete sketch follows the list below.

  • Use known suboptimal policies to visit "important" states.
  • Enforce constraints only on these sampled states \( \tilde{I} \) and controls \( \tilde{U}(i) \).
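
Putting the two ideas together, here is a minimal sketch of a sampled ALP; the chain dynamics, costs, features, sampling scheme, and weight bounds are all hypothetical choices for illustration:

```python
# Sketch of the sampled ALP: enforce the Bellman constraints only at sampled
# states and optimize over the m feature weights r.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n, m, alpha = 1000, 3, 0.9

s = np.arange(n) / n
Phi = np.column_stack([np.ones(n), s, s ** 2])   # (n x m) feature matrix

def transitions(i):
    # Single-control chain: move right w.p. 0.9 (cost 1), stay w.p. 0.1 (cost 0.5).
    return [(0.9, min(i + 1, n - 1), 1.0), (0.1, i, 0.5)]

sampled = rng.choice(n, size=50, replace=False)  # the sampled state set I~

# Constraint at sampled state i:
#   phi(i)^T r - alpha * sum_j p_ij phi(j)^T r <= sum_j p_ij g(i, j)
A_ub, b_ub = [], []
for i in sampled:
    row, rhs = Phi[i].copy(), 0.0
    for prob, j, cost in transitions(i):
        row -= alpha * prob * Phi[j]
        rhs += prob * cost
    A_ub.append(row)
    b_ub.append(rhs)

# Maximize the approximate costs over the sampled states. Bounding the
# weights guards against unboundedness caused by the constraints left out.
c = -Phi[sampled].sum(axis=0)
res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
              bounds=[(-1e3, 1e3)] * m)
print("optimal weights r* =", res.x)
```

Note the size of the resulting LP: 3 variables and 50 constraints, regardless of the 1000 states in the underlying problem.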

ALP Process

```mermaid
graph LR
    Params("Variables r") --> Approx("Approx J(i,r)")
    Approx --> Constraints("Sampled Constraints")
    Constraints --> LP("Solve Small LP")
    LP --> BestR("Optimal Weights r*")
    style LP fill:#dbeafe,stroke:#3b82f6
```

📝 4. Test Your Knowledge

1. In the exact LP formulation, what are the variables?

2. What does the objective function maximize?

3. How does Approximate LP reduce the number of variables?

4. How do we handle the massive number of constraints in ALP?

5. What is a major benefit of the LP approach?
