💡
1. The DP Algorithm Idea
Key Idea: To solve tail sub-problem at \( x_k \):
-
Consider every possible \( u_k \) and solve the tail sub-problem that starts at
the next state:
\[ x_{k+1} = f_k(x_k, u_k) \]
- This gives the cost-to-go from the next state.
- Optimize over all possible \( u_k \): Pick the control that minimizes the sum of the immediate cost \( g_k(x_k, u_k) \) and the optimal cost-to-go from \( x_{k+1} \).
🌍
2. Examples of State, Control, & Reward
🚦
Traffic Light Control
- State: Traffic situation (cars waiting, current phase).
- Control: Change light (switch phase, extend green).
- Outcome: Traffic flow (reduced waiting time).
🚗
Self-Driving Car
- State: Environment (position, speed, obstacles).
- Control: Driving commands (accelerate, steer).
- Outcome: Safe & efficient driving.
📦
Inventory Management
- State: Inventory levels, demand rate.
- Control: Restocking (order units, adjust production).
- Outcome: Meet demand, minimize holding costs.
🌡️
Home Heating System
- State: Indoor temperature.
- Control: Adjust heating/cooling (on/off).
- Outcome: Comfortable temperature, energy efficiency.
🦾
Robot Arm Manipulation
- State: Joint angles, end-effector position.
- Control: Motor torques, movement commands.
- Outcome: Successful pick & place.
🛒
Recommendation System
- State: User history (views, purchases).
- Control: Recommend items, discounts.
- Outcome: Sales, user satisfaction.
📝
3. Formal DP Algorithm
Backward Induction Steps
1. Initialization (Last Stage):
\[ J^*_N(x_N) = g_N(x_N), \quad \text{for all } x_N \]
2. Backward Recursion:
For \( k = N-1, \dots, 0 \), compute: \[ J^*_k(x_k) = \min_{u_k \in U_k(x_k)} \left[ g_k(x_k, u_k) + J^*_{k+1}(f_k(x_k, u_k)) \right] \]
3. Result:
The optimal cost \( J^*(x_0) \) is obtained at the last step (time 0).
🧠