🧩

DP Algorithm & Examples

The general Dynamic Optimization Algorithm and real-world examples of States, Controls, and Rewards.

💡

1. The DP Algorithm Idea

Key Idea: To solve tail sub-problem at \( x_k \):

  1. Consider every possible \( u_k \) and solve the tail sub-problem that starts at the next state: \[ x_{k+1} = f_k(x_k, u_k) \]
    • This gives the cost-to-go from the next state.
  2. Optimize over all possible \( u_k \): Pick the control that minimizes the sum of the immediate cost \( g_k(x_k, u_k) \) and the optimal cost-to-go from \( x_{k+1} \).
🌍

2. Examples of State, Control, & Reward

🚦

Traffic Light Control

  • State: Traffic situation (cars waiting, current phase).
  • Control: Change light (switch phase, extend green).
  • Outcome: Traffic flow (reduced waiting time).
🚗

Self-Driving Car

  • State: Environment (position, speed, obstacles).
  • Control: Driving commands (accelerate, steer).
  • Outcome: Safe & efficient driving.
📦

Inventory Management

  • State: Inventory levels, demand rate.
  • Control: Restocking (order units, adjust production).
  • Outcome: Meet demand, minimize holding costs.
🌡️

Home Heating System

  • State: Indoor temperature.
  • Control: Adjust heating/cooling (on/off).
  • Outcome: Comfortable temperature, energy efficiency.
🦾

Robot Arm Manipulation

  • State: Joint angles, end-effector position.
  • Control: Motor torques, movement commands.
  • Outcome: Successful pick & place.
🛒

Recommendation System

  • State: User history (views, purchases).
  • Control: Recommend items, discounts.
  • Outcome: Sales, user satisfaction.
📝

3. Formal DP Algorithm

Backward Induction Steps

1. Initialization (Last Stage):

\[ J^*_N(x_N) = g_N(x_N), \quad \text{for all } x_N \]

2. Backward Recursion:

For \( k = N-1, \dots, 0 \), compute: \[ J^*_k(x_k) = \min_{u_k \in U_k(x_k)} \left[ g_k(x_k, u_k) + J^*_{k+1}(f_k(x_k, u_k)) \right] \]

3. Result:

The optimal cost \( J^*(x_0) \) is obtained at the last step (time 0).

🧠

4. Test Your Knowledge

1. In the DP algorithm, where do we start the computation?

2. What is the "State" in the Inventory Management example?

3. What does \( J^*_k(x_k) \) represent?

Previous

Lecture 03

Next

Lecture 05