DP Algorithm & Examples

💡

1. The DP Algorithm Idea

Key Idea: To solve tail sub-problem at \( x_k \):

Consider every possible \( u_k \) and solve the tail sub-problem that starts at the next state: \[ x_{k+1} = f_k(x_k, u_k) \]
- This gives the cost-to-go from the next state.
Optimize over all possible \( u_k \): Pick the control that minimizes the sum of the immediate cost \( g_k(x_k, u_k) \) and the optimal cost-to-go from \( x_{k+1} \).

🌍

2. Examples of State, Control, & Reward

🚦

Traffic Light Control

State: Traffic situation (cars waiting, current phase).
Control: Change light (switch phase, extend green).
Outcome: Traffic flow (reduced waiting time).

🚗

Self-Driving Car

State: Environment (position, speed, obstacles).
Control: Driving commands (accelerate, steer).
Outcome: Safe & efficient driving.

📦

Inventory Management

State: Inventory levels, demand rate.
Control: Restocking (order units, adjust production).
Outcome: Meet demand, minimize holding costs.

🌡️

Home Heating System

State: Indoor temperature.
Control: Adjust heating/cooling (on/off).
Outcome: Comfortable temperature, energy efficiency.

🦾

Robot Arm Manipulation

State: Joint angles, end-effector position.
Control: Motor torques, movement commands.
Outcome: Successful pick & place.

🛒

Recommendation System

State: User history (views, purchases).
Control: Recommend items, discounts.
Outcome: Sales, user satisfaction.

📝

3. Formal DP Algorithm

Backward Induction Steps

1. Initialization (Last Stage):

\[ J^*_N(x_N) = g_N(x_N), \quad \text{for all } x_N \]

2. Backward Recursion:

For \( k = N-1, \dots, 0 \), compute: \[ J^*_k(x_k) = \min_{u_k \in U_k(x_k)} \left[ g_k(x_k, u_k) + J^*_{k+1}(f_k(x_k, u_k)) \right] \]

3. Result:

The optimal cost \( J^*(x_0) \) is obtained at the last step (time 0).

🧠