Generative Adversarial Networks
A generator learns to fool; a discriminator learns to spot. Together they shape a distribution.
"Two nets walk into a bar—one makes art, one judges hard."
Generative modelling
Given samples $x_1,\dots,x_n \sim p$, build a generator $g: Z \to X$ with $z \sim p_z$ (a simple noise distribution, e.g., a standard Gaussian) so that $\hat{x}=g(z) \sim \hat{p} \approx p$.
Goal: learn $\hat{p}$ close enough that new draws feel like $p$—no explicit density needed.
GAN timeline (highlights)
2014 · GAN
Goodfellow et al. introduce the minimax duel.
2015 · DCGAN
Radford et al. add conv nets for stable images.
2017 · WGAN
Arjovsky et al. swap JS for Wasserstein distance.
2019 · StyleGAN
Karras et al. push controllable style mixing.
Two-player game
- Generator: maps noise $z$ to data-like $\hat{x}$; tries to fool discriminator.
- Discriminator: scores real vs fake; penalizes implausible $\hat{x}$.
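To make the two roles concrete, here is a minimal PyTorch sketch; the MLP architectures, layer sizes, and names (`Generator`, `Discriminator`, `LATENT_DIM`, `DATA_DIM`) are illustrative assumptions for a toy dataset, not part of the original text.

```python
# Minimal PyTorch sketch of the two players. Architectures and sizes
# are illustrative assumptions for a low-dimensional toy dataset.
import torch
import torch.nn as nn

LATENT_DIM = 64   # assumed size of the noise vector z
DATA_DIM = 2      # assumed dimensionality of the data x

class Generator(nn.Module):
    """g_theta: maps noise z to a data-like sample x_hat."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(LATENT_DIM, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, DATA_DIM),
        )

    def forward(self, z):
        return self.net(z)

class Discriminator(nn.Module):
    """d_phi: scores how likely x is to be real, output in (0, 1)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(DATA_DIM, 128), nn.LeakyReLU(0.2),
            nn.Linear(128, 128), nn.LeakyReLU(0.2),
            nn.Linear(128, 1), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.net(x)

# Sampling a fake minibatch: z ~ N(0, I), x_hat = g(z), score = d(x_hat).
g, d = Generator(), Discriminator()
z = torch.randn(16, LATENT_DIM)
x_hat = g(z)        # shape (16, DATA_DIM)
scores = d(x_hat)   # shape (16, 1), each entry in (0, 1)
```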
Variational $f$-divergence view
For convex, lower-semicontinuous $f$ with conjugate $f^*$, $f(u) = \sup_t \{tu - f^*(t)\}$. Plugging this into $D_f(p\|q)=\mathbb{E}_{x\sim q}\!\left[f\!\left(\tfrac{p(x)}{q(x)}\right)\right]$ yields
$$D_f(p\|q)=\sup_T \left[\mathbb{E}_{x\sim p}[T(x)] - \mathbb{E}_{x\sim q}[f^*(T(x))]\right]$$
Parameterize the critic $T_\phi$; minimize over generator parameters $\theta$ while maximizing over $\phi$, writing the expectation over $q=\hat{p}_\theta$ via $x=g_\theta(z)$, $z\sim p_z$:
$$\min_{\theta}\max_{\phi}\; \mathbb{E}_{x\sim p}[T_\phi(x)] - \mathbb{E}_{z\sim p_z}\big[f^*\big(T_\phi(g_\theta(z))\big)\big]$$
The original GAN (2014) takes $T_\phi(x)=\log d_\phi(x)$, which recovers (up to constants) the JS-divergence objective.
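To see why, here is a short worked check using the standard f-GAN correspondence (an identity not spelled out in the original): for $f(u)=u\log u-(u+1)\log(u+1)$ the conjugate is $f^*(t)=-\log(1-e^t)$, so choosing $T_\phi(x)=\log d_\phi(x)$ gives $f^*(T_\phi(x))=-\log(1-d_\phi(x))$ and the variational objective becomes
$$\mathbb{E}_{x\sim p}\big[T_\phi(x)\big] - \mathbb{E}_{z\sim p_z}\big[f^*\!\big(T_\phi(g_\theta(z))\big)\big]
= \mathbb{E}_{x\sim p}\big[\log d_\phi(x)\big] + \mathbb{E}_{z\sim p_z}\big[\log\big(1 - d_\phi(g_\theta(z))\big)\big],$$
which is exactly the classic minimax value function below.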
Classic GAN objective
Minimax: $$\min_G \max_D \; \mathbb{E}_{x\sim p_{\text{data}}}[\log D(x)] + \mathbb{E}_{z\sim p_z}[\log(1 - D(G(z)))]$$
JS connection: with the optimal discriminator plugged in, $$C(G)=\max_D V(D,G)=2\,\text{JSD}(p_{\text{data}}\,\|\,p_g) - \log 4 = D_{\text{KL}}\!\left(p_{\text{data}}\middle\|\tfrac{p_{\text{data}}+p_g}{2}\right)+D_{\text{KL}}\!\left(p_g\middle\|\tfrac{p_{\text{data}}+p_g}{2}\right)-\log 4$$
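As a concrete companion to the minimax objective, here is a small PyTorch sketch (the helper names, the `x_real`/`x_fake` arguments, and the clamp epsilon are assumptions for illustration) estimating the minibatch value function and its equivalent binary-cross-entropy form:

```python
# Sketch: the discriminator's view of the minimax objective, written two
# equivalent ways. `d` is a discriminator with outputs in (0, 1);
# `x_real`, `x_fake` are minibatches of real and generated samples.
import torch
import torch.nn.functional as F

def value_function(d, x_real, x_fake, eps=1e-7):
    """Minibatch estimate of V(D, G) = E[log D(x)] + E[log(1 - D(G(z)))]."""
    d_real = d(x_real).clamp(eps, 1 - eps)
    d_fake = d(x_fake).clamp(eps, 1 - eps)
    return torch.log(d_real).mean() + torch.log(1.0 - d_fake).mean()

def d_loss_bce(d, x_real, x_fake):
    """Equivalent loss the discriminator minimizes: -V(D, G), expressed as
    binary cross-entropy with targets 1 for real and 0 for fake."""
    d_real, d_fake = d(x_real), d(x_fake)
    return (F.binary_cross_entropy(d_real, torch.ones_like(d_real)) +
            F.binary_cross_entropy(d_fake, torch.zeros_like(d_fake)))
```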
Minibatch SGD loop
Repeat for a number of training iterations, taking $k$ discriminator steps per generator step (often $k=1$):
- Sample a minibatch of $m$ noise vectors $\{z^{(i)}\}\sim p_z$ and $m$ real examples $\{x^{(i)}\}\sim p_{\text{data}}$.
- Update $D$ by ascending $$\nabla_\phi \frac{1}{m}\sum_{i=1}^m \left[\log d_\phi(x^{(i)}) + \log(1-d_\phi(g_\theta(z^{(i)})))\right]$$
- Then sample $m$ fresh noise vectors and update $G$ by descending $$\nabla_\theta \frac{1}{m} \sum_{i=1}^m \log\big(1 - d_\phi(g_\theta(z^{(i)}))\big)$$
Variants flip the generator loss to $-\log d_\phi(g_\theta(z))$ (the non-saturating loss) for stronger gradients early in training.
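Putting the loop together, a minimal PyTorch sketch: it reuses the toy networks from the earlier sketch, and the optimizer choice, learning rates, batch size, and `data_loader` are illustrative assumptions.

```python
# Minimal training loop for the minibatch procedure above. Reuses the toy
# Generator/Discriminator sketch from earlier; optimizers, learning rates,
# batch size, and `data_loader` are illustrative assumptions.
import torch

g, d = Generator(), Discriminator()
opt_d = torch.optim.Adam(d.parameters(), lr=2e-4)
opt_g = torch.optim.Adam(g.parameters(), lr=2e-4)
k, m, eps = 1, 64, 1e-7          # k discriminator steps per G step, batch size m

for x_real in data_loader:       # assumed to yield real minibatches of size m
    # k discriminator steps: ascend E[log d(x)] + E[log(1 - d(g(z)))]
    for _ in range(k):
        z = torch.randn(m, LATENT_DIM)
        d_real = d(x_real).clamp(eps, 1 - eps)
        d_fake = d(g(z).detach()).clamp(eps, 1 - eps)   # detach: no gradient to G
        loss_d = -(torch.log(d_real).mean() + torch.log(1 - d_fake).mean())
        opt_d.zero_grad()
        loss_d.backward()
        opt_d.step()

    # One generator step: descend E[log(1 - d(g(z)))]
    z = torch.randn(m, LATENT_DIM)
    d_fake = d(g(z)).clamp(eps, 1 - eps)
    loss_g = torch.log(1 - d_fake).mean()
    # Non-saturating variant from the note above (stronger early gradients):
    # loss_g = -torch.log(d_fake).mean()
    opt_g.zero_grad()
    loss_g.backward()
    opt_g.step()
```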
Optimal discriminator & equilibrium
Proposition (fixed $G$)
For fixed $G$, the value function is maximized by $$D^*_G(x)=\frac{p_{\text{data}}(x)}{p_{\text{data}}(x)+p_g(x)}$$
Theorem (global min)
The global minimum of $C(G)$ is attained exactly when $p_g = p_{\text{data}}$, with value $$C(G) = -\log 4,$$ since there $D^*_G(x)=1/2$ and the JSD term is zero only when the distributions match.
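Why the proposition holds (a standard pointwise-maximization step, added here for completeness): write the value function as an integral and maximize over $D(x)$ separately at each $x$,
$$V(G,D)=\int_x \Big[p_{\text{data}}(x)\log D(x) + p_g(x)\log\big(1-D(x)\big)\Big]\,dx,
\qquad
\frac{d}{dy}\big[a\log y + b\log(1-y)\big]=0 \;\Longrightarrow\; y=\frac{a}{a+b},$$
so with $a=p_{\text{data}}(x)$ and $b=p_g(x)$ the pointwise maximizer is exactly $D^*_G(x)$, and substituting it back into $V$ gives $C(G)$ above.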
🎵 Memory jingle
"G draws dreams from random haze,
D critiques with steady gaze.
JS whispers when to stop,
Mix and train till losses drop."