1. Q-Factors: Alternative Form of DP
The DP algorithm can be restated using Q-factors (or Q-values), which are defined for all state-control pairs \( (x_k, u_k) \). This formulation is fundamental to Reinforcement Learning, specifically Q-Learning.
Definition of Q-Factors
The optimal Q-factor \( Q^*_k(x_k, u_k) \) is the cost of starting at state \( x_k \), taking control \( u_k \), and then acting optimally thereafter:
\[ Q^*_k (x_k, u_k) = g_k(x_k, u_k) + J^*_{k+1} (f_k(x_k, u_k)) \]
Relationship with \( J^*_k \)
The optimal cost-to-go \( J^*_k(x_k) \) is simply the minimum Q-factor over all possible controls:
\[ J^*_k(x_k) = \min_{u_k \in U_k(x_k)} Q^*_k (x_k, u_k) \]
DP Algorithm in Q-Factors
Substituting \( J^*_{k+1} \) with the minimization of \( Q^*_{k+1} \), we get the Q-factor recurrence:
\[ Q_k^* (x_k, u_k) = g_k(x_k, u_k) + \min_{u_{k+1} \in U_{k+1}(f_k(x_k, u_k))} Q^*_{k+1}(f_k(x_k, u_k), u_{k+1}) \]
Why is this useful?
Q-factors allow us to evaluate the quality of a specific action \( u_k \) in a specific state \( x_k \) without immediately committing to it. This is the core idea behind Q-Learning!
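The Q-factor recurrence above can be run backward from the terminal stage just like the standard DP algorithm. Here is a minimal sketch on a tiny deterministic problem; the dynamics \( f \), stage cost \( g \), and terminal cost below are illustrative assumptions, not taken from the text.

```python
# Hypothetical deterministic problem: states {0,1,2}, controls {0,1},
# horizon N = 3. f, g, and terminal_cost are made-up examples.

N = 3
states = [0, 1, 2]
controls = [0, 1]

def f(x, u):               # system dynamics: next state
    return (x + u) % 3

def g(x, u):               # stage cost
    return (x - u) ** 2

def terminal_cost(x):      # J*_N
    return x

# Backward recursion: Q*_k(x,u) = g(x,u) + J*_{k+1}(f(x,u)),
# with J*_k(x) = min_u Q*_k(x,u).
J = {x: terminal_cost(x) for x in states}
for k in reversed(range(N)):
    Q = {(x, u): g(x, u) + J[f(x, u)] for x in states for u in controls}
    J = {x: min(Q[(x, u)] for u in controls) for x in states}

print(J)  # optimal cost-to-go J*_0 for each initial state
```

Note that the loop stores only the Q-factors of the current stage: once \( J^*_k \) is extracted by minimization, the next backward step needs nothing else.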
2. DP for Traveling Salesman Problem (TSP)

Problem Statement
Given \( N \) cities and the travel time between each pair of cities, find a minimum time tour that visits each city exactly once and returns to the start city.
Brute Force Approach
Try all possible permutations of cities.
Complexity: \( O(N!) \)
Extremely slow for large \( N \).
Dynamic Programming Approach
Use the Held–Karp algorithm, whose DP state is the set of cities visited so far together with the current city.
Complexity: \( O(N^2 2^N) \)
Much faster than brute force, but still exponential.
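The Held–Karp recursion can be sketched as follows. A state is a pair (set of visited cities, last city), encoded here as a bitmask; the function name and data layout are my own choices for illustration.

```python
from itertools import combinations

def held_karp(dist):
    """Held-Karp DP for TSP.

    dist[i][j] = travel time from city i to city j.
    C[(mask, j)] = minimum cost of a path that starts at city 0,
    visits exactly the cities in bitmask `mask` (which contains 0
    and j), and ends at city j.
    """
    n = len(dist)
    # Base case: paths 0 -> j visiting only {0, j}.
    C = {(1 | (1 << j), j): dist[0][j] for j in range(1, n)}
    # Grow the visited set one city at a time.
    for size in range(3, n + 1):
        for subset in combinations(range(1, n), size - 1):
            mask = 1 | sum(1 << j for j in subset)
            for j in subset:
                prev = mask ^ (1 << j)      # remove j from the set
                C[(mask, j)] = min(C[(prev, k)] + dist[k][j]
                                   for k in subset if k != j)
    # Close the tour by returning to city 0.
    full = (1 << n) - 1
    return min(C[(full, j)] + dist[j][0] for j in range(1, n))
```

Each of the \( O(2^N) \) subsets is paired with \( O(N) \) end cities, and each state is computed by an \( O(N) \) minimization, giving the \( O(N^2 2^N) \) complexity stated above.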