Generative Adversarial Networks
A generator learns to fool; a discriminator learns to spot. Together they shape a distribution.
"Two nets walk into a bar—one makes art, one judges hard."
Generative modelling
Given samples $x_1,\dots,x_n \sim p$, build a generator $g: Z \to X$ with $z \sim p_z$ (a simple noise distribution, e.g., a standard Gaussian) so that $\hat{x}=g(z) \sim \hat{p} \approx p$.
Goal: learn $\hat{p}$ close enough that new draws feel like $p$—no explicit density needed.
GAN timeline (highlights)
2014 · GAN
Goodfellow et al. introduce the minimax duel.
2015 · DCGAN
Radford et al. add conv nets for stable images.
2017 · WGAN
Arjovsky et al. swap JS for Wasserstein distance.
2019 · StyleGAN
Karras et al. push controllable style mixing.
Two-player game
- Generator: maps noise $z$ to data-like $\hat{x}$; tries to fool discriminator.
- Discriminator: scores real vs fake; penalizes implausible $\hat{x}$.
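To make the two roles concrete, here is a minimal PyTorch sketch; the MLP architectures, layer sizes, and names (`Generator`, `Discriminator`, `LATENT_DIM`, `DATA_DIM`) are illustrative assumptions for a toy dataset, not part of the original text.

```python
# Minimal PyTorch sketch of the two players. Architectures and sizes
# are illustrative assumptions for a low-dimensional toy dataset.
import torch
import torch.nn as nn

LATENT_DIM = 64   # assumed size of the noise vector z
DATA_DIM = 2      # assumed dimensionality of the data x

class Generator(nn.Module):
    """g_theta: maps noise z to a data-like sample x_hat."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(LATENT_DIM, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, DATA_DIM),
        )

    def forward(self, z):
        return self.net(z)

class Discriminator(nn.Module):
    """d_phi: scores how likely x is to be real, output in (0, 1)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(DATA_DIM, 128), nn.LeakyReLU(0.2),
            nn.Linear(128, 128), nn.LeakyReLU(0.2),
            nn.Linear(128, 1), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.net(x)

# Sampling a fake minibatch: z ~ N(0, I), x_hat = g(z), score = d(x_hat).
g, d = Generator(), Discriminator()
z = torch.randn(16, LATENT_DIM)
x_hat = g(z)        # shape (16, DATA_DIM)
scores = d(x_hat)   # shape (16, 1), each entry in (0, 1)
```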
Variational $f$-divergence view
For convex, lower-semicontinuous $f$ with conjugate $f^*$, $f(u) = \sup_t \{tu - f^*(t)\}$. Plugging this into $D_f(p\|q)=\mathbb{E}_{x\sim q}\!\left[f\!\left(\tfrac{p(x)}{q(x)}\right)\right]$ yields
$$D_f(p\|q)=\sup_T \left[\mathbb{E}_{x\sim p}[T(x)] - \mathbb{E}_{x\sim q}[f^*(T(x))]\right]$$
Parameterize the critic $T_\phi$; minimize over generator parameters $\theta$ while maximizing over $\phi$, writing the expectation over $q=\hat{p}_\theta$ via $x=g_\theta(z)$, $z\sim p_z$:
$$\min_{\theta}\max_{\phi}\; \mathbb{E}_{x\sim p}[T_\phi(x)] - \mathbb{E}_{z\sim p_z}\big[f^*\big(T_\phi(g_\theta(z))\big)\big]$$
The original GAN (2014) takes $T_\phi(x)=\log d_\phi(x)$, which recovers (up to constants) the JS-divergence objective.
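To see why, here is a short worked check using the standard f-GAN correspondence (an identity not spelled out in the original): for $f(u)=u\log u-(u+1)\log(u+1)$ the conjugate is $f^*(t)=-\log(1-e^t)$, so choosing $T_\phi(x)=\log d_\phi(x)$ gives $f^*(T_\phi(x))=-\log(1-d_\phi(x))$ and the variational objective becomes
$$\mathbb{E}_{x\sim p}\big[T_\phi(x)\big] - \mathbb{E}_{z\sim p_z}\big[f^*\!\big(T_\phi(g_\theta(z))\big)\big]
= \mathbb{E}_{x\sim p}\big[\log d_\phi(x)\big] + \mathbb{E}_{z\sim p_z}\big[\log\big(1 - d_\phi(g_\theta(z))\big)\big],$$
which is exactly the classic minimax value function below.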
Classic GAN objective
Minimax: $$\min_G \max_D \; \mathbb{E}_{x\sim p_{\text{data}}}[\log D(x)] + \mathbb{E}_{z\sim p_z}[\log(1 - D(G(z)))]$$
JS connection: with the optimal discriminator plugged in, $$C(G)=\max_D V(D,G)=2\,\text{JSD}(p_{\text{data}}\,\|\,p_g) - \log 4 = D_{\text{KL}}\!\left(p_{\text{data}}\middle\|\tfrac{p_{\text{data}}+p_g}{2}\right)+D_{\text{KL}}\!\left(p_g\middle\|\tfrac{p_{\text{data}}+p_g}{2}\right)-\log 4$$
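As a concrete companion to the minimax objective, here is a small PyTorch sketch (the helper names, the `x_real`/`x_fake` arguments, and the clamp epsilon are assumptions for illustration) estimating the minibatch value function and its equivalent binary-cross-entropy form:

```python
# Sketch: the discriminator's view of the minimax objective, written two
# equivalent ways. `d` is a discriminator with outputs in (0, 1);
# `x_real`, `x_fake` are minibatches of real and generated samples.
import torch
import torch.nn.functional as F

def value_function(d, x_real, x_fake, eps=1e-7):
    """Minibatch estimate of V(D, G) = E[log D(x)] + E[log(1 - D(G(z)))]."""
    d_real = d(x_real).clamp(eps, 1 - eps)
    d_fake = d(x_fake).clamp(eps, 1 - eps)
    return torch.log(d_real).mean() + torch.log(1.0 - d_fake).mean()

def d_loss_bce(d, x_real, x_fake):
    """Equivalent loss the discriminator minimizes: -V(D, G), expressed as
    binary cross-entropy with targets 1 for real and 0 for fake."""
    d_real, d_fake = d(x_real), d(x_fake)
    return (F.binary_cross_entropy(d_real, torch.ones_like(d_real)) +
            F.binary_cross_entropy(d_fake, torch.zeros_like(d_fake)))
```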
Minibatch SGD loop
Repeat for a number of training iterations, taking $k$ discriminator steps per generator step (often $k=1$):
- Sample a minibatch of $m$ noise vectors $\{z^{(i)}\}\sim p_z$ and $m$ real examples $\{x^{(i)}\}\sim p_{\text{data}}$.
- Update $D$ by ascending $$\nabla_\phi \frac{1}{m}\sum_{i=1}^m \left[\log d_\phi(x^{(i)}) + \log(1-d_\phi(g_\theta(z^{(i)})))\right]$$
- Then sample $m$ fresh noise vectors and update $G$ by descending $$\nabla_\theta \frac{1}{m} \sum_{i=1}^m \log\big(1 - d_\phi(g_\theta(z^{(i)}))\big)$$
Variants flip the generator loss to $-\log d_\phi(g_\theta(z))$ (the non-saturating loss) for stronger gradients early in training.
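Putting the loop together, a minimal PyTorch sketch: it reuses the toy networks from the earlier sketch, and the optimizer choice, learning rates, batch size, and `data_loader` are illustrative assumptions.

```python
# Minimal training loop for the minibatch procedure above. Reuses the toy
# Generator/Discriminator sketch from earlier; optimizers, learning rates,
# batch size, and `data_loader` are illustrative assumptions.
import torch

g, d = Generator(), Discriminator()
opt_d = torch.optim.Adam(d.parameters(), lr=2e-4)
opt_g = torch.optim.Adam(g.parameters(), lr=2e-4)
k, m, eps = 1, 64, 1e-7          # k discriminator steps per G step, batch size m

for x_real in data_loader:       # assumed to yield real minibatches of size m
    # k discriminator steps: ascend E[log d(x)] + E[log(1 - d(g(z)))]
    for _ in range(k):
        z = torch.randn(m, LATENT_DIM)
        d_real = d(x_real).clamp(eps, 1 - eps)
        d_fake = d(g(z).detach()).clamp(eps, 1 - eps)   # detach: no gradient to G
        loss_d = -(torch.log(d_real).mean() + torch.log(1 - d_fake).mean())
        opt_d.zero_grad()
        loss_d.backward()
        opt_d.step()

    # One generator step: descend E[log(1 - d(g(z)))]
    z = torch.randn(m, LATENT_DIM)
    d_fake = d(g(z)).clamp(eps, 1 - eps)
    loss_g = torch.log(1 - d_fake).mean()
    # Non-saturating variant from the note above (stronger early gradients):
    # loss_g = -torch.log(d_fake).mean()
    opt_g.zero_grad()
    loss_g.backward()
    opt_g.step()
```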
Optimal discriminator & equilibrium
Proposition (fixed $G$)
For fixed $G$, the value function is maximized by $$D^*_G(x)=\frac{p_{\text{data}}(x)}{p_{\text{data}}(x)+p_g(x)}$$
Theorem (global min)
The global minimum of $C(G)$ is attained exactly when $p_g = p_{\text{data}}$, with value $$C(G) = -\log 4,$$ since there $D^*_G(x)=1/2$ and the JSD term is zero only when the distributions match.
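Why the proposition holds (a standard pointwise-maximization step, added here for completeness): write the value function as an integral and maximize over $D(x)$ separately at each $x$,
$$V(G,D)=\int_x \Big[p_{\text{data}}(x)\log D(x) + p_g(x)\log\big(1-D(x)\big)\Big]\,dx,
\qquad
\frac{d}{dy}\big[a\log y + b\log(1-y)\big]=0 \;\Longrightarrow\; y=\frac{a}{a+b},$$
so with $a=p_{\text{data}}(x)$ and $b=p_g(x)$ the pointwise maximizer is exactly $D^*_G(x)$, and substituting it back into $V$ gives $C(G)$ above.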
🎵 Memory jingle
"G draws dreams from random haze,
D critiques with steady gaze.
JS whispers when to stop,
Mix and train till losses drop."