1. What are Sequential Decision Problems?
decision, information, decision, information, ...
The Challenge
How to represent the information that will arrive in the future and how to make decisions now and in the future.
Goal: Modeling these problems, and making effective decisions in the presence of the uncertainty of new information.
Two Major Classes
Finite Horizon
- Deterministic
- Stochastic
Infinite Horizon
- Deterministic
- Stochastic
2. Finite Horizon Problems
Tasks where the decision-making process has a fixed time limit or number of steps \( T \).
Robot Navigation
Reach a goal in \( T \) steps.
Video Games
Complete level before timer runs out.
Portfolio Mgmt
Maximize returns over a fixed period.
Mathematical Formulation
System evolution: \[ x_{k+1} = f_k(x_k, u_k), \quad k=0,1, \dots, N-1 \]
- \( k \): Time index.
- \( x_k \): State of the system.
- \( u_k \): Control/Decision variable from set \( U_k(x_k) \).
- \( f_k \): System dynamics function (table or formula).
- \( N \): Horizon (number of stages).
Definitions:
- State Space: The set of all possible \( x_k \) at time \( k \).
- Control Space: The set of all possible \( u_k \) at time \( k \).
3. Transition Graph & Cost
Graph Structure
- Nodes: Correspond to states \( x_k \).
- Arcs: Correspond to state-control pairs \( (x_k, u_k) \).
- Cost: Length of the arc \( g_k(x_k, u_k) \).
- Terminal Node: An artificial node \( t \) is added to handle the final stage.
The problem is equivalent to finding a shortest path from the initial node to the terminal node \( t \).
Cost Function
We want to minimize the total cost:
\[ J(x_0; u_0, \dots, u_{N-1}) = g_N(x_N) + \sum_{k=0}^{N-1} g_k(x_k, u_k) \]Optimal Cost Function: \[ J^*(x_0) = \min_{u_k \in U_k(x)} J(x_0; u_0, \dots, u_{N-1}) \]
4. The DP Principle
Some costs can be estimated by evaluating backwards!
Example: Climbing Stairs
A person wants to reach the \( n \)-th stair. Can climb 1 or 2 stairs at a time. How many ways?
ways(n) = ways(n-1) + ways(n-2)
This is a Tail Subproblem. We solve smaller problems to solve the larger one.
Overlapping Sub-problems
Many sub-problems are repeated (overlapping).
- DP Strategy: Avoid recomputing by memorizing results or using a recursion stack.
- Once the optimal tail sub-problem is found, use it to compute the "higher" level.
Scheduling Example
Operations A, B, C, D. B after A, D after C.
Cost = \( S_A + C_{AC} + C_{CD} + C_{DB} \).
This is a deterministic problem. At a given state, each choice of control leads to a uniquely determined state.