1. What is a State?
MDP Perspective
"A physical system characterized at any stage by a small set of parameters." - Bellman (1957)
RL Perspective
"The state must include all aspects of the past that make a difference for the future." - Sutton & Barto
Professor Powell's Definition
Optimization Version: The minimal function of history that is necessary and sufficient to compute the costs, the constraints, and the transition to the next state.
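As a compact formalization (a sketch in Powell's standard notation, with \(x_t\) the decision, \(W_{t+1}\) the exogenous information, and \(S^M\) the transition function; this notation is an assumption, not a quote):

\[
S_t = f(S_0, x_0, W_1, \dots, x_{t-1}, W_t)
\]

such that the cost \(C(S_t, x_t)\), the feasible set \(\mathcal{X}_t(S_t)\), and the next state \(S_{t+1} = S^M(S_t, x_t, W_{t+1})\) can all be computed from \(S_t\) alone, without the rest of the history.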
2. Types of State Variables
Physical (\(R_t\))
Inventory, Location, Battery Level.
Informational (\(I_t\))
Market Prices, Weather Forecasts.
Belief (\(B_t\))
Probabilistic beliefs about uncertain quantities.
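In Powell's framework these three components are bundled into a single state variable (a standard decomposition in his texts):

\[
S_t = (R_t, I_t, B_t)
\]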
Example: Clinical Trial Modeling
- Physical: Number of patients, dosage levels.
- Informational: Medical guidelines, regulatory approvals.
- Belief: Effectiveness of drug (Bayesian), likelihood of side effects.
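A minimal sketch of such a state as a data structure, assuming a Beta-Bernoulli belief over drug effectiveness (all names and fields here are hypothetical, chosen only to mirror the three components above):

```python
from dataclasses import dataclass

@dataclass
class TrialState:
    """Hypothetical clinical-trial state S_t = (R_t, I_t, B_t)."""
    # Physical R_t: countable, controllable resources.
    patients_enrolled: int
    dosage_mg: float
    # Informational I_t: exogenous information relevant to decisions.
    guideline_version: str
    regulatory_approved: bool
    # Belief B_t: Beta(alpha, beta) posterior over the probability
    # that the drug treatment succeeds for a patient.
    effectiveness_alpha: float = 1.0
    effectiveness_beta: float = 1.0

    def update_belief(self, successes: int, failures: int) -> None:
        """Conjugate Beta-Bernoulli update after observing outcomes."""
        self.effectiveness_alpha += successes
        self.effectiveness_beta += failures

    @property
    def expected_effectiveness(self) -> float:
        """Posterior mean of the success probability."""
        a, b = self.effectiveness_alpha, self.effectiveness_beta
        return a / (a + b)
```

Note that the belief component is itself just a pair of numbers: the state carries the parameters of a distribution over the unknown quantity, not the unknown quantity itself.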
3. Why Partial Observation?
In the real world, we rarely have perfect information. Sensors are flawed, measurements are costly, or states are hidden.
Man in Fog
Limited visibility of surroundings.
Robot in Smoke
Sensors obscured by environment.
Blindfolded Chess
State must be maintained in memory (Belief).
Miscalculated Belief
Acting on a wrong belief can lead to fatal errors.
4. POMDP & Belief State
POMDP (Partially Observable Markov Decision Process): The framework for sequential decision making when state information is imperfect.
The Solution: Sufficient Statistic
To solve a POMDP, we convert it into an equivalent perfect-information problem by replacing the hidden state with a Sufficient Statistic of the history.
The most common sufficient statistic is the Belief State (\(b_k\)):
- The probability distribution over the hidden state \(x_k\), conditioned on the entire history of actions and observations.
- Updated recursively using Bayes' rule; the Kalman Filter (exact for linear-Gaussian models) and the Particle Filter (approximate, for general models) are standard implementations, as sketched below.
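For a finite state space the update has a closed form (standard POMDP filtering; \(u_k\) for the action and \(z_{k+1}\) for the observation are notational assumptions):

\[
b_{k+1}(x') \;\propto\; p(z_{k+1} \mid x') \sum_{x} p(x' \mid x, u_k)\, b_k(x)
\]

A minimal sketch of this recursion as a discrete Bayes filter:

```python
import numpy as np

def belief_update(b, T, O, z):
    """One step of the discrete Bayes filter.

    b : (n,) current belief over states, sums to 1
    T : (n, n) transition matrix for the chosen action, T[x, x'] = p(x'|x, u)
    O : (n, m) observation model, O[x', z] = p(z|x')
    z : index of the received observation
    """
    predicted = b @ T                   # prediction: sum_x p(x'|x,u) b(x)
    unnormalized = predicted * O[:, z]  # correction: weight by p(z|x')
    return unnormalized / unnormalized.sum()  # normalize (Bayes' rule)

# Tiny example: a door is 'open' (0) or 'closed' (1), the state does not
# change, and a noisy sensor reports the true state 80% of the time.
T = np.eye(2)
O = np.array([[0.8, 0.2],
              [0.2, 0.8]])
b = np.array([0.5, 0.5])           # uniform prior
b = belief_update(b, T, O, z=0)    # observe 'open'
print(b)                           # -> [0.8, 0.2]
```

The Kalman filter is the exact instance of this recursion for linear-Gaussian models; particle filters approximate it by sampling when the state space is continuous or too large to enumerate.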