Variational Autoencoders

Learning compressed representations through probabilistic encoding and decoding

πŸ”„ Autoencoders: Compress & Reconstruct

An autoencoder is an unsupervised neural network that learns to compress data into a lower-dimensional latent space and then reconstruct it back.

Interactive Autoencoder Architecture

Input x ∈ ℝⁿ β†’ Encoder β†’ Latent z ∈ β„α΅ˆ (d β‰ͺ n) β†’ Decoder β†’ Output xΜ‚ ∈ ℝⁿ

Data flows through encoder β†’ compressed representation β†’ decoder β†’ reconstruction

Training Objective: Reconstruction Loss

$\mathcal{L}(\mathbf{x}, \hat{\mathbf{x}}) = \frac{1}{N} \sum_{i=1}^{N} \|\mathbf{x}_i - \hat{\mathbf{x}}_i\|^2$

Mean Squared Error between input and reconstruction
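
A minimal PyTorch sketch of this architecture and objective; the layer sizes (n = 784, d = 32) and the hidden width are illustrative assumptions, not values from the text:

```python
import torch
import torch.nn as nn

n, d = 784, 32                           # input dim, latent dim (d << n); illustrative

encoder = nn.Sequential(nn.Linear(n, 128), nn.ReLU(), nn.Linear(128, d))
decoder = nn.Sequential(nn.Linear(d, 128), nn.ReLU(), nn.Linear(128, n))

x = torch.randn(64, n)                   # a batch of placeholder inputs
z = encoder(x)                           # compress: x -> z
x_hat = decoder(z)                       # reconstruct: z -> x_hat

loss = nn.functional.mse_loss(x_hat, x)  # the MSE reconstruction objective above
loss.backward()                          # gradients for encoder and decoder jointly
```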

πŸ“Š Dimensionality Reduction

Learn lower-dimensional representations

🎨 Feature Learning

Extract meaningful features automatically

πŸ”Š Denoising

Remove noise from corrupted data

⚠️ Anomaly Detection

High reconstruction error β†’ anomaly

🎭 Generation

VAEs generate new samples

πŸ—œοΈ Compression

Efficient data storage

πŸ”— Connection to PCA

Key Insight

Linear autoencoders (those with only linear activation functions) learn the same subspace as Principal Component Analysis (PCA)!

PCA

Finds principal components via eigendecomposition of covariance matrix

Linear Autoencoder

Learns same subspace via gradient descent on MSE loss
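
As an informal check of this claim, here is a small NumPy sketch (the synthetic 2-D data, learning rate, and step count are illustrative assumptions) that trains a 2 β†’ 1 β†’ 2 linear autoencoder with gradient descent on the MSE loss and compares its decoder direction to the leading principal component:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2)) @ np.array([[2.0, 1.5], [0.0, 0.5]])  # toy correlated data
Xc = X - X.mean(axis=0)

# PCA: leading eigenvector of the sample covariance matrix
cov = Xc.T @ Xc / (len(Xc) - 1)
pc1 = np.linalg.eigh(cov)[1][:, -1]          # eigh sorts eigenvalues ascending

# Linear autoencoder x_hat = (x @ W_e) @ W_d, trained on MSE by gradient descent
W_e = 0.1 * rng.standard_normal((2, 1))      # encoder weights (2 -> 1)
W_d = 0.1 * rng.standard_normal((1, 2))      # decoder weights (1 -> 2)
lr = 0.01
for _ in range(5000):
    Z = Xc @ W_e                             # latent codes
    err = Z @ W_d - Xc                       # reconstruction error
    grad_d = Z.T @ err / len(Xc)             # gradient of MSE/2 w.r.t. W_d
    grad_e = Xc.T @ (err @ W_d.T) / len(Xc)  # gradient of MSE/2 w.r.t. W_e
    W_d -= lr * grad_d
    W_e -= lr * grad_e

ae_dir = W_d.ravel() / np.linalg.norm(W_d)
print("PC1:", pc1, "  linear AE decoder direction:", ae_dir)
print("|cosine similarity|:", abs(ae_dir @ pc1))  # close to 1.0: same subspace
```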

Example: Computing PCA

Step 1: Original Data

$X = \begin{bmatrix} 1 & 2 \\ 2 & 3 \\ 3 & 5 \end{bmatrix}$

Step 2: Center the Data

$\bar{\mathbf{x}} = \begin{bmatrix} 2 \\ 3.333 \end{bmatrix}$

$X_c = \begin{bmatrix} -1 & -1.333 \\ 0 & -0.333 \\ 1 & 1.667 \end{bmatrix}$

Step 3: Covariance Matrix

$\Sigma = \frac{1}{2} X_c^T X_c = \begin{bmatrix} 1 & 1.5 \\ 1.5 & 2.333 \end{bmatrix}$

Step 4: Eigendecomposition

$\lambda_1 = 3.308, \quad \mathbf{v}_1 = \begin{bmatrix} -0.545 \\ -0.839 \end{bmatrix}$

$\lambda_2 = 0.025, \quad \mathbf{v}_2 = \begin{bmatrix} 0.839 \\ -0.545 \end{bmatrix}$

Step 5: Project onto PC1

$X_p = X_c \mathbf{v}_1 = \begin{bmatrix} 1.663 \\ 0.280 \\ -1.942 \end{bmatrix}$

βœ“ Reduced from 2D to 1D while preserving maximum variance: PC1 captures about 99% of the total variance!
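
The worked example can be verified with a few lines of NumPy; note that np.linalg.eigh returns eigenvalues in ascending order and eigenvector signs are arbitrary:

```python
import numpy as np

X = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 5.0]])
Xc = X - X.mean(axis=0)                  # Step 2: center the data
cov = Xc.T @ Xc / (X.shape[0] - 1)       # Step 3: sample covariance
eigvals, eigvecs = np.linalg.eigh(cov)   # Step 4: eigendecomposition (ascending order)
v1 = eigvecs[:, -1]                      # leading principal component (sign arbitrary)
print(eigvals)                           # ~[0.025, 3.308]
print(Xc @ v1)                           # Step 5: 1-D projection, ~[1.663, 0.280, -1.942] up to sign
```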

🧠 Quiz 1

What is the main difference between a standard autoencoder and PCA?

🎲 Variational Autoencoders (VAE)

VAEs extend autoencoders by making the latent space probabilistic: new samples are generated by drawing z from the prior and passing it through the learned decoder!

VAE Architecture

Input x β†’ Encoder q_Ο†(z|x) β†’ ΞΌ, σ² β†’ Sample z ~ N(ΞΌ, σ²) using Ξ΅ ~ N(0, 1) β†’ Decoder p_ΞΈ(x|z) β†’ xΜ‚

Encoder outputs ΞΌ and σ² β†’ Sample z using reparameterization trick β†’ Decoder reconstructs

Reconstruction Loss

$\mathcal{L}_{\text{rec}} = -\mathbb{E}_{q_\phi(z|x)}[\log p_\theta(x|z)]$

Encourages accurate reconstruction

KL Divergence Loss

$\mathcal{L}_{\text{KL}} = D_{KL}(q_\phi(z|x) \| p(z))$

Regularizes latent space to match prior

Total VAE Loss (ELBO)

$\mathcal{L}_{\text{VAE}} = \mathcal{L}_{\text{rec}} + \beta \mathcal{L}_{\text{KL}}$

The negative Evidence Lower BOund (ELBO), with weight Ξ² on the KL term; minimizing this loss (with Ξ² = 1) is equivalent to maximizing the ELBO
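
A minimal PyTorch sketch of this loss, assuming a diagonal-Gaussian encoder that outputs mu and log_var and a Gaussian decoder so the reconstruction term reduces to squared error (function and variable names are illustrative):

```python
import torch
import torch.nn.functional as F

def vae_loss(x, x_hat, mu, log_var, beta=1.0):
    # Reconstruction term: with a Gaussian decoder this is squared error (up to constants)
    rec = F.mse_loss(x_hat, x, reduction="sum")
    # Closed-form KL(q_phi(z|x) || N(0, I)) for a diagonal Gaussian posterior
    kl = -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp())
    return rec + beta * kl
```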

🎨 Interactive: Explore Latent Space

Click on the canvas to sample from different regions of the 2D latent space!

πŸ“ Current z:

[0.00, 0.00]

🎲 Sampled:

0 points

βš™οΈ VAE Training Algorithm

Algorithm: Train VAE

Input: Data $\{\mathbf{x}^i\}_{i=1}^n$, epochs $E$, batch size $m$, latent dim $d$

1: Initialize encoder $q_\phi(\mathbf{z}|\mathbf{x})$ and decoder $p_\theta(\mathbf{x}|\mathbf{z})$

2: for $e=1$ to $E$:

3: Shuffle training data

4: for each minibatch $\{\mathbf{x}^j\}_{j=1}^m$:

5: Encode: Compute $\mu, \sigma^2 = q_\phi(\mathbf{z}|\mathbf{x})$

6: Sample: $\mathbf{z} = \mu + \sigma \odot \epsilon$ where $\epsilon \sim \mathcal{N}(0,I)$

7: Decode: $\hat{\mathbf{x}} = p_\theta(\mathbf{x}|\mathbf{z})$

8: Compute $\mathcal{L}_{\text{rec}} = \|\mathbf{x} - \hat{\mathbf{x}}\|^2$

9: Compute $\mathcal{L}_{\text{KL}} = -\frac{1}{2}\sum(1 + \log\sigma^2 - \mu^2 - \sigma^2)$

10: Total loss: $\mathcal{L} = \mathcal{L}_{\text{rec}} + \mathcal{L}_{\text{KL}}$

11: Update $\phi, \theta$ via backpropagation

πŸ”‘ Reparameterization Trick: $z = \mu + \sigma \odot \epsilon$ makes sampling differentiable!
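
A single PyTorch training step that follows the algorithm above; the encoder module returning (mu, log_var), the decoder module, and the vae_loss helper from the previous sketch are assumptions for illustration, not part of the text:

```python
import torch

def train_step(encoder, decoder, optimizer, x, beta=1.0):
    mu, log_var = encoder(x)                      # step 5: encode
    eps = torch.randn_like(mu)                    # epsilon ~ N(0, I)
    z = mu + torch.exp(0.5 * log_var) * eps       # step 6: reparameterization trick
    x_hat = decoder(z)                            # step 7: decode
    loss = vae_loss(x, x_hat, mu, log_var, beta)  # steps 8-10: total loss
    optimizer.zero_grad()
    loss.backward()                               # step 11: gradients flow through z
    optimizer.step()
    return loss.item()
```

Because z is written as a deterministic function of ΞΌ, Οƒ, and an external noise Ξ΅, backpropagation can pass through the sampling step, which is exactly what the reparameterization trick provides.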

🧠 Quiz 2

Why is the reparameterization trick necessary in VAEs?

πŸ“ Key Takeaways

πŸ”„ Autoencoders

Learn compressed representations through encoder-decoder architecture

πŸ“Š PCA Link

Linear autoencoders recover the PCA subspace, but nonlinear ones learn richer features

🎲 VAEs

Probabilistic latent space enables generation via sampling

"Compress, regularize, generateβ€”the VAE way!"