🧩

Q-Factors & TSP

An alternative form of DP using Q-factors (crucial for Q-learning) and an application to the Traveling Salesman Problem.

1. Q-Factors: Alternative Form of DP

The DP algorithm can be restated using Q-factors (or Q-values), which are defined for all state-control pairs \( (x_k, u_k) \). This formulation is fundamental to Reinforcement Learning, specifically Q-Learning.

Definition of Q-Factors

The optimal Q-factor \( Q^*_k(x_k, u_k) \) is the cost of starting at state \( x_k \), taking control \( u_k \), and then acting optimally thereafter:

\[ Q^*_k (x_k, u_k) = g_k(x_k, u_k) + J^*_{k+1} (f_k(x_k, u_k)) \]

Relationship with \( J^*_k \)

The optimal cost-to-go \( J^*_k(x_k) \) is simply the minimum Q-factor over all possible controls:

\[ J^*_k(x_k) = \min_{u_k \in U_k(x_k)} Q^*_k (x_k, u_k) \]

DP Algorithm in Q-Factors

Substituting \( J^*_{k+1} \) with the minimization of \( Q^*_{k+1} \), we get the Q-factor recurrence:

\[ Q_k^* (x_k, u_k) = g_k(x_k, u_k) + \min_{u_{k+1} \in U_k(f_k(x_k, u_k))} Q^*_{k+1}(f_k(x_k, u_k), u_{k+1}) \]

Why is this useful?

Q-factors allow us to evaluate the quality of a specific action \( u_k \) in a specific state \( x_k \) without immediately committing to it. This is the core idea behind Q-Learning!

🌍

2. DP for Traveling Salesman Problem (TSP)

Interactive Demo: Traveling Salesman Problem Visualization

Problem Statement

Given \( N \) cities and the travel time between each pair of cities, find a minimum time tour that visits each city exactly once and returns to the start city.

⚠️ NP-Complete Problem

Brute Force Approach

Try all possible permutations of cities.

Complexity: \( O(N!) \)

Extremely slow for large \( N \).

Dynamic Programming Approach

Use held-karp algorithm (state involves set of visited cities).

Complexity: \( O(N^2 2^N) \)

Much faster than brute force, but still exponential.

🧠

3. Test Your Knowledge

1. How is the optimal cost-to-go \( J^*_k(x_k) \) related to Q-factors?

2. Why are Q-factors important?

3. Which approach is faster for solving TSP?

Previous

Lecture 05

Next

Lecture 07