Sequential Decision Problems

🌍

1. What are Sequential Decision Problems?

decision, information, decision, information, ...

The Challenge

How to represent the information that will arrive in the future and how to make decisions now and in the future.

Goal: Modeling these problems, and making effective decisions in the presence of the uncertainty of new information.

Two Major Classes

Finite Horizon

Deterministic
Stochastic

Infinite Horizon

Deterministic
Stochastic

⏱️

2. Finite Horizon Problems

Tasks where the decision-making process has a fixed time limit or number of steps \( T \).

🤖

Robot Navigation

Reach a goal in \( T \) steps.

🎮

Video Games

Complete level before timer runs out.

📈

Portfolio Mgmt

Maximize returns over a fixed period.

Mathematical Formulation

System evolution: \[ x_{k+1} = f_k(x_k, u_k), \quad k=0,1, \dots, N-1 \]

\( k \): Time index.
\( x_k \): State of the system.
\( u_k \): Control/Decision variable from set \( U_k(x_k) \).
\( f_k \): System dynamics function (table or formula).
\( N \): Horizon (number of stages).

Definitions:

State Space: The set of all possible \( x_k \) at time \( k \).
Control Space: The set of all possible \( u_k \) at time \( k \).

📊

3. Transition Graph & Cost

graph LR x0((x0)) -->|u0, g0| x1((x1)) x1 -->|u1, g1| x2((x2)) x2 -->|...| xN((xN)) xN -->|Final Cost| t((t)) style x0 fill:#dbeafe,stroke:#3b82f6 style t fill:#dcfce7,stroke:#16a34a

Graph Structure

Nodes: Correspond to states \( x_k \).
Arcs: Correspond to state-control pairs \( (x_k, u_k) \).
Cost: Length of the arc \( g_k(x_k, u_k) \).
Terminal Node: An artificial node \( t \) is added to handle the final stage.

The problem is equivalent to finding a shortest path from the initial node to the terminal node \( t \).

Cost Function

We want to minimize the total cost:

\[ J(x_0; u_0, \dots, u_{N-1}) = g_N(x_N) + \sum_{k=0}^{N-1} g_k(x_k, u_k) \]

Optimal Cost Function: \[ J^*(x_0) = \min_{u_k \in U_k(x)} J(x_0; u_0, \dots, u_{N-1}) \]

💡

4. The DP Principle

Some costs can be estimated by evaluating backwards!

Example: Climbing Stairs

A person wants to reach the \( n \)-th stair. Can climb 1 or 2 stairs at a time. How many ways?

ways(n) = ways(n-1) + ways(n-2)

This is a Tail Subproblem. We solve smaller problems to solve the larger one.

Overlapping Sub-problems

Many sub-problems are repeated (overlapping).

DP Strategy: Avoid recomputing by memorizing results or using a recursion stack.
Once the optimal tail sub-problem is found, use it to compute the "higher" level.

Scheduling Example

Operations A, B, C, D. B after A, D after C.
Cost = \( S_A + C_{AC} + C_{CD} + C_{DB} \).

graph TD Start --> A Start --> C A --> B C --> D B --> End D --> End

This is a deterministic problem. At a given state, each choice of control leads to a uniquely determined state.