Optimization Solvers for Min-Max Problems

From gradient descent to game-theoretic equilibria: understanding the optimization landscape of GANs.

"Two players, one game—finding balance in the chaos of adversarial optimization."

🎯

Why Min-Max Optimization?

GANs involve a generator and discriminator playing a two-player game, each trying to optimize their own objective. Unlike standard minimization problems, we need to solve:

$\min_{\theta} \max_{\phi} \mathcal{L}(\theta, \phi)$

Generator minimizes, Discriminator maximizes

This creates unique challenges: oscillations, cycling, and convergence failures. We'll explore the algorithms designed to handle these adversarial dynamics.

⬇️

Gradient Descent: The Foundation

Gradient descent is the workhorse of optimization—iteratively moving parameters in the direction opposite to the gradient to minimize a loss function.

Algorithm: Gradient Descent

1: Initialize parameters $\theta$ randomly
2: Set learning rate $\alpha$
3: while not converged:
4: Compute gradient $\nabla_\theta J(\theta)$
5: Update: $\theta \leftarrow \theta - \alpha \nabla_\theta J(\theta)$
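As a minimal Python sketch, here is the loop above applied to the toy loss $J(\theta) = \theta^2$ (the loss, learning rate, and fixed step count are illustrative choices standing in for a real convergence check):

```python
def gradient_descent(grad, theta0, alpha=0.1, steps=100):
    """Minimize a 1-D loss given its gradient function."""
    theta = theta0
    for _ in range(steps):
        theta = theta - alpha * grad(theta)  # step opposite the gradient
    return theta

# Toy loss J(theta) = theta^2, so grad J(theta) = 2*theta; minimum at theta = 0.
theta_star = gradient_descent(lambda t: 2 * t, theta0=5.0)
```

Starting from $\theta = 5$, the iterate shrinks by a factor of $(1 - 2\alpha)$ per step and lands within numerical tolerance of the minimizer $\theta = 0$.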

Interactive: Gradient Descent in 2D

Watch the ball roll down to the minimum of the function!

↕️

Gradient Descent Ascent (GDA)

For min-max problems, we need one player to minimize (descent) and another to maximize (ascent). This is GDA!

Algorithm: GDA for Min-Max

Input: Learning rate $\eta$, iterations $T$
1: Initialize $\theta$ and $\phi$ randomly
2: for $t = 1$ to $T$:
3: Sample minibatch $\{x_i\}_{i=1}^m$
4: ⬇ Descent: $\theta \leftarrow \theta - \eta \nabla_\theta \mathcal{L}$
5: ⬆ Ascent: $\phi \leftarrow \phi + \eta \nabla_\phi \mathcal{L}$
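In code, one GDA step is just a descent update and an ascent update side by side. The sketch below uses our own toy loss $\mathcal{L}(\theta, \phi) = \theta\phi + \tfrac{1}{2}\theta^2 - \tfrac{1}{2}\phi^2$ (strongly convex in $\theta$, strongly concave in $\phi$, so plain GDA does converge here); minibatch sampling is omitted:

```python
def gda(theta, phi, eta=0.1, steps=500):
    """Simultaneous GDA on L(theta, phi) = theta*phi + 0.5*theta**2 - 0.5*phi**2."""
    for _ in range(steps):
        g_theta = phi + theta   # dL/dtheta
        g_phi = theta - phi     # dL/dphi
        # simultaneous update: descent for theta, ascent for phi
        theta, phi = theta - eta * g_theta, phi + eta * g_phi
    return theta, phi

theta_star, phi_star = gda(theta=1.0, phi=1.0)  # converges toward (0, 0)
```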

Generator (θ)

Performs gradient descent to minimize loss

Discriminator (φ)

Performs gradient ascent to maximize loss

⚠️

When GDA Fails: Oscillation

⚠️ Problem:

GDA can fail to converge on simple games due to cyclic dynamics. Consider the zero-sum game: $f(x, y) = xy$ where player 1 minimizes and player 2 maximizes.

Example: $f(x,y) = xy$

Gradients: $\nabla_x f = y$, $\nabla_y f = x$

Updates:

$x^{(t)} = x^{(t-1)} - \alpha y^{(t-1)}$

$y^{(t)} = y^{(t-1)} + \alpha x^{(t-1)}$

Starting at $(1, 1)$ with $\alpha = 0.1$:

$(x^{(1)}, y^{(1)}) = (0.9, 1.1)$

$(x^{(2)}, y^{(2)}) = (0.79, 1.19)$

$(x^{(3)}, y^{(3)}) = (0.671, 1.269)$

...the iterates never converge: each simultaneous step multiplies the distance from $(0, 0)$ by $\sqrt{1 + \alpha^2} > 1$, so GDA slowly spirals outward!
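The iterates above can be reproduced in a few lines (note the simultaneous update: both right-hand sides use the previous values):

```python
x, y = 1.0, 1.0
alpha = 0.1
traj = []
for _ in range(3):
    # simultaneous GDA update on f(x, y) = x*y
    x, y = x - alpha * y, y + alpha * x
    traj.append((round(x, 3), round(y, 3)))
# traj: [(0.9, 1.1), (0.79, 1.19), (0.671, 1.269)]
```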

Oscillation Animation


Instead of converging to the equilibrium $(0, 0)$, the GDA iterates circle around it, slowly spiraling outward

🧠 Quick Quiz 1 Test yourself

What causes GDA to fail in the example $f(x,y) = xy$?

⚖️

Nash Equilibrium

Definition

A Nash equilibrium is a strategy profile where no player can unilaterally improve their payoff by changing their strategy, given the strategies of other players.

$u_i(s^*_i, s^*_{-i}) \geq u_i(s_i, s^*_{-i})$ for all players $i$ and strategies $s_i$

🎯 Stability

No incentive to deviate

🤝 Mutual Best Response

Each player's strategy is optimal given others'

⚡ Self-Enforcing

Stable without external enforcement

🎮

Game Theory: Simultaneous vs Sequential

Simultaneous Games

Players make decisions simultaneously without knowing others' choices. Represented by payoff matrices.

Classic Example: Prisoner's Dilemma

|                     | Player 2: Cooperate        | Player 2: Betray           |
|---------------------|----------------------------|----------------------------|
| Player 1: Cooperate | R, R (Reward)              | S, T (Sucker / Temptation) |
| Player 1: Betray    | T, S (Temptation / Sucker) | P, P (Punishment)          |

Typical ordering: T > R > P > S

Nash Equilibrium: (Betray, Betray) — even though mutual cooperation is better!
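The equilibrium can be verified by brute-force best-response checks. The payoff values below are illustrative numbers satisfying $T > R > P > S$ (the text does not specify them):

```python
T, R, P, S = 5, 3, 1, 0  # temptation > reward > punishment > sucker
# payoff[a1][a2] = (player 1's payoff, player 2's payoff),
# with action 0 = Cooperate and action 1 = Betray
payoff = [[(R, R), (S, T)],
          [(T, S), (P, P)]]

def is_nash(a1, a2):
    """True if neither player gains by unilaterally switching actions."""
    u1, u2 = payoff[a1][a2]
    return (all(payoff[d][a2][0] <= u1 for d in (0, 1)) and
            all(payoff[a1][d][1] <= u2 for d in (0, 1)))

equilibria = [(a1, a2) for a1 in (0, 1) for a2 in (0, 1) if is_nash(a1, a2)]
# equilibria: [(1, 1)], i.e. (Betray, Betray) is the unique Nash equilibrium
```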

Sequential Games

Players make decisions one after another, with later players knowing earlier choices. Represented by game trees.

Classic Example: Ultimatum Game

Game tree: the Proposer chooses an offer $x$ out of \$10 (keeping $10 - x$); the Responder then either Accepts, giving payoffs $(10 - x, x)$, or Rejects, giving $(0, 0)$.

Backward Induction: Responder accepts any positive offer → Proposer offers minimum!

Reality: Humans reject unfair offers due to fairness preferences 🤝
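The backward-induction argument can be scripted directly. We assume integer offers from a \$10 pot and a purely self-interested Responder who accepts any strictly positive offer (and rejects an offer of zero, where she is indifferent):

```python
def responder_accepts(offer):
    # self-interested responder: any positive offer beats the 0 from rejecting
    return offer > 0

def best_proposal(pot=10):
    """Proposer's optimal integer offer by backward induction."""
    def proposer_payoff(offer):
        return pot - offer if responder_accepts(offer) else 0
    return max(range(pot + 1), key=proposer_payoff)

best_offer = best_proposal()  # the minimum positive offer, 1
```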

🧠 Quick Quiz 2 Game theory

What is a Nash Equilibrium?

🚀

Advanced Optimization Methods

Competitive Gradient Descent

Accounts for the competitive nature of min-max games by considering the opponent's gradient when updating.

Key Idea:

Anticipate opponent's response to your update

Consensus Optimization (ConOpt)

Seeks consensus between players' objectives by combining gradient information from both players.

Key Idea:

Find direction that improves both objectives simultaneously when possible

LEAD (Leader-Follower)

Assumes one player (leader) updates first, and the other (follower) responds optimally.

Key Idea:

Stackelberg game formulation with asymmetric roles

Symplectic Gradient Adjustment (SGA)

Uses symplectic geometry to adjust gradients, preventing cyclic behavior and improving convergence.

Key Idea:

Add correction term perpendicular to gradient to break cycles
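As a sketch on the running example $f(x, y) = xy$: the simultaneous-gradient field is $v = (y, -x)$, the antisymmetric part of its Jacobian is $A = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}$, and following $v + \lambda A^\top v = (y + \lambda x,\; -x + \lambda y)$ adds exactly the damping that breaks the cycle. The coefficient $\lambda = 1$ and the step count are our choices:

```python
def sga(x, y, eta=0.1, lam=1.0, steps=500):
    """Symplectic gradient adjustment on f(x, y) = x*y."""
    for _ in range(steps):
        v = (y, -x)                              # simultaneous gradient
        adj = (v[0] + lam * x, v[1] + lam * y)   # v + lam * A^T v
        x, y = x - eta * adj[0], y - eta * adj[1]
    return x, y

x_star, y_star = sga(1.0, 1.0)  # converges toward the equilibrium (0, 0)
```

Where plain GDA spirals away from $(0, 0)$ on this game, the adjusted dynamics contract toward it at every step.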

📝

Key Takeaways

⚠️ Challenge

Min-max optimization is fundamentally harder than minimization due to adversarial dynamics

⚖️ Equilibria

Nash equilibrium is the goal, but GDA doesn't always find it due to cycling

🚀 Solutions

Advanced methods like SGA, ConOpt, and competitive GD help stabilize adversarial training

From Games to GANs...

"Finding equilibrium in a two-player game is like teaching cats to dance—
beautiful when it works, chaotic when it doesn't!"