1. Motivation
The core idea of duality is to transform a difficult primal problem into a potentially easier dual problem. This concept has deep historical roots.
Joseph-Louis Lagrange
Introduced Lagrange multipliers for equality-constrained problems, forming the basis of duality theory.
Werner Fenchel
Developed conjugate functions, providing a modern convex analysis perspective on duality.
2. Geometric Intuition
Example Problem
Minimize \( f(x,y) = 8x^2 - 2y \) subject to \( x^2 + y^2 = 1 \).
Key Insight
- The minimum occurs where the level curve of \( f \) just touches the constraint curve \( h(x,y)=0 \).
- At this touch point, their tangent lines are parallel.
- Therefore, their normal vectors (gradients) must be parallel:
\[ \nabla f = -\lambda \nabla h \]
Here, \( \lambda \) is the Lagrange Multiplier.
Why Orthogonality?
Imagine moving along the constraint curve \( h(x,y)=0 \). The motion is along the tangent vector \( T \).
To change the value of \( f \), we must move with a component along \( \nabla f \). If \( \nabla f \cdot T \neq 0 \), we can still decrease \( f \).
At an extremum (min/max), we cannot decrease \( f \) anymore while staying on the curve. Thus, the motion along \( T \) must be orthogonal to \( \nabla f \):
\[ \nabla f \cdot T = 0 \]
Conclusion
Since \( T \) is orthogonal to \( \nabla h \) (by definition) and now also to \( \nabla f \), the gradients \( \nabla f \) and \( \nabla h \) must be parallel (or anti-parallel).
\[ \nabla f + \lambda \nabla h = 0 \]
3. The Lagrangian
Definition
For the problem minimize \( f(x) \) subject to \( h(x) = 0 \), we define the Lagrangian:
\[ \mathcal{L}(x, \lambda) = f(x) + \lambda h(x) \]
The condition \( \nabla \mathcal{L} = 0 \) encapsulates both the original constraint and the parallel gradient condition:
\[ \nabla_x \mathcal{L} = \nabla f(x) + \lambda \nabla h(x) = 0, \qquad \nabla_\lambda \mathcal{L} = h(x) = 0 \]
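As a quick sanity check (not part of the original notes), here is a small sympy sketch that solves \( \nabla \mathcal{L} = 0 \) for the example from Section 2; the symbols mirror the definitions above.

```python
import sympy as sp

x, y, lam = sp.symbols('x y lam', real=True)

f = 8*x**2 - 2*y          # objective from the Section 2 example
h = x**2 + y**2 - 1       # constraint h(x, y) = 0
L = f + lam*h             # Lagrangian L(x, y, lam)

# grad L = 0: the parallel-gradient conditions plus the constraint itself
candidates = sp.solve([sp.diff(L, x), sp.diff(L, y), sp.diff(L, lam)],
                      [x, y, lam], dict=True)

for sol in candidates:
    print(sol, '  f =', f.subs(sol))
# The smallest objective value among the candidates, f = -2 at (x, y) = (0, 1),
# is the constrained minimum.
```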
4. Example: Non-convex Problem
Minimize \( f(x,y) = xy \)
Subject to \( h(x,y) = \frac{x^2}{8} + \frac{y^2}{2} - 1 = 0 \)
Lagrangian:
\[ \mathcal{L}(x,y,\lambda) = xy + \lambda\left(\frac{x^2}{8} + \frac{y^2}{2} - 1\right) \]
Setting \( \nabla \mathcal{L} = 0 \) gives the system:
1) \( y + \frac{\lambda x}{4} = 0 \)
2) \( x + \lambda y = 0 \)
3) \( \frac{x^2}{8} + \frac{y^2}{2} = 1 \)
Solving this system yields \( \lambda = \pm 2 \).
Extremal points: \( (\pm 2, \pm 1) \).
Minimum value: -2 at \( (2, -1) \) and \( (-2, 1) \).
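A numerical cross-check of this example is straightforward, assuming scipy is available; SLSQP handles the equality constraint directly (the starting point below is an arbitrary choice, so it may converge to either minimizer).

```python
import numpy as np
from scipy.optimize import minimize

# Minimize xy on the ellipse x^2/8 + y^2/2 = 1 via an equality constraint.
obj = lambda v: v[0] * v[1]
con = {'type': 'eq', 'fun': lambda v: v[0]**2 / 8 + v[1]**2 / 2 - 1}

res = minimize(obj, x0=np.array([1.0, -1.0]), method='SLSQP', constraints=[con])
print(res.x, res.fun)   # approximately (2, -1) or (-2, 1), with value -2
```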
5. Lagrange Dual Function
Definition
The Lagrange dual function \( g: \mathbb{R}^m \times \mathbb{R}^p \rightarrow \mathbb{R} \) is defined as the infimum of the Lagrangian over \( x \):
\[ g(\lambda, \nu) = \inf_{x \in \mathcal{D}} L(x, \lambda, \nu) = \inf_{x \in \mathcal{D}} \left( f_0(x) + \sum_{i=1}^m \lambda_i f_i(x) + \sum_{i=1}^p \nu_i h_i(x) \right) \]
Key Properties
- Concavity: The dual function \( g(\lambda, \nu) \) is always concave, even if the original problem is not convex! (It's the pointwise infimum of affine functions).
- Lower Bound: For any \( \lambda \geq 0 \) and any \( \nu \), \( g(\lambda, \nu) \leq p^* \), where \( p^* \) is the optimal value of the primal problem.
Example: Least Squares
Minimize \( x^T x \) subject to \( Ax = b \).
Lagrangian: \( L(x, \nu) = x^T x + \nu^T (Ax - b) \).
Minimize w.r.t \( x \): \( \nabla_x L = 2x + A^T \nu = 0 \implies x = -(1/2)A^T \nu \).
Substitute back to get \( g(\nu) = -(1/4)\nu^T A A^T \nu - b^T \nu \).
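A short numpy sketch (with randomly generated data of my own, not from the notes) confirms the calculation: maximizing \( g(\nu) \) amounts to solving \( AA^T\nu = -2b \), and substituting \( \nu^* \) back recovers the minimum-norm primal solution with zero duality gap, which also previews Section 8.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 6))    # wide A, so Ax = b is underdetermined
b = rng.standard_normal(3)

# Maximize g(nu) = -(1/4) nu^T A A^T nu - b^T nu:  grad g = 0  =>  A A^T nu = -2b
nu_star = np.linalg.solve(A @ A.T, -2.0 * b)

# Recover the primal minimizer from the dual optimum (see Section 8)
x_star = -0.5 * A.T @ nu_star

g_star = -0.25 * nu_star @ A @ A.T @ nu_star - b @ nu_star
print(np.allclose(A @ x_star, b))            # primal feasibility
print(np.isclose(x_star @ x_star, g_star))   # zero duality gap: p* = d*
```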
6. Weak and Strong Duality
Weak Duality
Let \( d^* \) be the optimal value of the dual problem (maximize \( g \)).
\[ d^* \leq p^* \]
This always holds. The difference \( p^* - d^* \) is the duality gap.
Strong Duality
When the gap is zero:
\[ d^* = p^* \]
Usually holds for convex problems under certain conditions (like Slater's).
Slater's Condition
If the primal problem is convex and there exists a strictly feasible point \( x \) (i.e., \( f_i(x) < 0 \) for all \( i \), with the equality constraints satisfied), then strong duality holds.
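To make the duality gap concrete, here is a minimal numpy sketch for a toy problem of my own choosing (minimize \( x^2 \) subject to \( x \geq 1 \)): Slater's condition holds, the dual function has a closed form, and the sketch checks both the weak-duality bound and the zero gap.

```python
import numpy as np

# Toy primal: minimize x^2 subject to 1 - x <= 0 (i.e., x >= 1).
# x = 2 is strictly feasible, so Slater's condition holds and we expect d* = p*.
# Lagrangian L(x, lam) = x^2 + lam*(1 - x); minimizing over x gives x = lam/2,
# hence the dual function g(lam) = lam - lam^2 / 4 for lam >= 0.
p_star = 1.0                        # primal optimum, attained at x = 1
lam = np.linspace(0.0, 4.0, 401)    # grid of dual-feasible multipliers
g = lam - lam**2 / 4.0

print(np.all(g <= p_star + 1e-12))  # weak duality: g(lam) <= p* everywhere
print(g.max(), lam[g.argmax()])     # d* ~ 1.0 at lam* ~ 2: zero duality gap
```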
7. KKT Conditions
For differentiable functions, if strong duality holds, the optimal primal and dual points must satisfy the Karush-Kuhn-Tucker (KKT) conditions:
- Primal Feasibility: \( f_i(x^*) \leq 0 \), \( h_i(x^*) = 0 \)
- Dual Feasibility: \( \lambda^* \geq 0 \)
- Complementary Slackness: \( \lambda_i^* f_i(x^*) = 0 \)
- Stationarity: \( \nabla f_0(x^*) + \sum_{i=1}^m \lambda_i^* \nabla f_i(x^*) + \sum_{i=1}^p \nu_i^* \nabla h_i(x^*) = 0 \)
8. Solving Primal via Dual
If strong duality holds, we can solve the (often easier) dual problem to find \( \lambda^*, \nu^* \). Then, we find the primal solution \( x^* \) by minimizing the Lagrangian:
\[ x^* = \text{argmin}_x L(x, \lambda^*, \nu^*) \]
9. Duality Examples
Example 1: Equality Constrained QP
Consider the problem:
\[ \text{minimize } \tfrac{1}{2}x^T P x + q^T x \quad \text{subject to } Ax = b \]
where \( P \in S^n_+ \). The KKT conditions are:
- Primal Feasibility: \( Ax^* = b \)
- Dual Feasibility: None (no inequality constraints)
- Complementary Slackness: None
- Stationarity: \( Px^* + q + A^T \nu^* = 0 \)
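Since the only KKT conditions here are stationarity and primal feasibility, they form a single linear system in \( (x^*, \nu^*) \). The numpy sketch below (with illustrative random data of my own, not from the notes) assembles and solves that system.

```python
import numpy as np

# KKT system for the equality-constrained QP:
#   [ P  A^T ] [ x*  ]   [ -q ]
#   [ A   0  ] [ nu* ] = [  b ]
n, p = 4, 2
rng = np.random.default_rng(1)
M = rng.standard_normal((n, n))
P = M @ M.T + np.eye(n)            # a positive definite P in S^n_+
q = rng.standard_normal(n)
A = rng.standard_normal((p, n))
b = rng.standard_normal(p)

KKT = np.block([[P, A.T], [A, np.zeros((p, p))]])
sol = np.linalg.solve(KKT, np.concatenate([-q, b]))
x_star, nu_star = sol[:n], sol[n:]

print(np.allclose(A @ x_star, b))                      # primal feasibility
print(np.allclose(P @ x_star + q + A.T @ nu_star, 0))  # stationarity
```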
Example 2: Water-Filling Problem
Maximize the total communication rate by allocating power \( x_i \) to channels with noise levels \( \alpha_i \):
\[ \text{maximize } \sum_{i=1}^n \log(\alpha_i + x_i) \quad \text{subject to } x \geq 0, \quad \mathbf{1}^T x = 1 \]
KKT Conditions:
- \( x^* \geq 0, \quad \sum x_i^* = 1 \)
- \( \lambda^* \geq 0 \) (for \( -x \leq 0 \))
- \( \lambda_i^* x_i^* = 0 \)
- \( -1/(\alpha_i + x_i^*) - \lambda_i^* + \nu^* = 0 \)
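Solving these KKT conditions gives the classical water-filling solution: writing the "water level" as \( w = 1/\nu^* \), the allocation is \( x_i^* = \max(0, w - \alpha_i) \), with \( w \) chosen so the powers sum to one. A small sketch (with example noise levels of my own) finds \( w \) by bisection.

```python
import numpy as np

# From the KKT conditions: if x_i > 0 then nu = 1/(alpha_i + x_i), so
# x_i = 1/nu - alpha_i; and x_i = 0 whenever 1/nu <= alpha_i.  With the
# water level w = 1/nu, the allocation is x_i(w) = max(0, w - alpha_i),
# and w is set so that sum_i x_i(w) = 1.
alpha = np.array([0.5, 1.0, 2.0, 4.0])    # example noise levels

def total_power(w):
    return np.maximum(0.0, w - alpha).sum()

lo, hi = alpha.min(), alpha.min() + 1.0   # total_power(lo) = 0, total_power(hi) >= 1
for _ in range(100):                      # bisection on the water level
    w = 0.5 * (lo + hi)
    if total_power(w) < 1.0:
        lo = w
    else:
        hi = w

x_star = np.maximum(0.0, w - alpha)
print(x_star, x_star.sum())               # optimal allocation; sums to 1
```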
10. Test Your Understanding
1. Is the Lagrange dual function \( g(\lambda, \nu) \) always convex, always concave, or neither?
2. What does Slater's condition guarantee?
3. What does complementary slackness \( \lambda_i f_i(x) = 0 \) imply at the optimum?
Spot the Mistake!
Scenario 1: A student claims:
"If \( \nabla f(x) = 0 \), then \( x \) must be the solution to the constrained optimization problem minimize \( f(x) \) subject to \( h(x)=0 \)."
Scenario 2: Another student claims:
"For the problem \(\min x^2 + y^2\) s.t. \(x+y=1\), the Lagrange multiplier \(\lambda\) must be positive."