Variational Autoencoders

Learning compressed representations through probabilistic encoding and decoding

πŸ”„ Autoencoders: Compress & Reconstruct

An autoencoder is an unsupervised neural network that learns to compress data into a lower-dimensional latent space and then reconstruct it back.

Interactive Autoencoder Architecture

Input x ∈ ℝⁿ β†’ Encoder β†’ Latent z ∈ β„α΅ˆ (d β‰ͺ n) β†’ Decoder β†’ Output xΜ‚ ∈ ℝⁿ

Data flows through encoder β†’ compressed representation β†’ decoder β†’ reconstruction

Training Objective: Reconstruction Loss

$\mathcal{L}(\mathbf{x}, \hat{\mathbf{x}}) = \frac{1}{N} \sum_{i=1}^{N} \|\mathbf{x}_i - \hat{\mathbf{x}}_i\|^2$

Mean Squared Error between input and reconstruction
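
A minimal PyTorch sketch of this architecture and objective; the layer sizes (n = 784, d = 32) and the hidden width are illustrative assumptions, not values from the text:

```python
import torch
import torch.nn as nn

n, d = 784, 32                           # input dim, latent dim (d << n); illustrative

encoder = nn.Sequential(nn.Linear(n, 128), nn.ReLU(), nn.Linear(128, d))
decoder = nn.Sequential(nn.Linear(d, 128), nn.ReLU(), nn.Linear(128, n))

x = torch.randn(64, n)                   # a batch of placeholder inputs
z = encoder(x)                           # compress: x -> z
x_hat = decoder(z)                       # reconstruct: z -> x_hat

loss = nn.functional.mse_loss(x_hat, x)  # the MSE reconstruction objective above
loss.backward()                          # gradients for encoder and decoder jointly
```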

πŸ“Š Dimensionality Reduction

Learn lower-dimensional representations

🎨 Feature Learning

Extract meaningful features automatically

πŸ”Š Denoising

Remove noise from corrupted data

⚠️ Anomaly Detection

High reconstruction error β†’ anomaly

🎭 Generation

VAEs generate new samples

πŸ—œοΈ Compression

Efficient data storage

πŸ”— Connection to PCA

Key Insight

Linear autoencoders (those with only linear activation functions) learn the same subspace as Principal Component Analysis (PCA)!

PCA

Finds principal components via eigendecomposition of covariance matrix

Linear Autoencoder

Learns same subspace via gradient descent on MSE loss
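
As an informal check of this claim, here is a small NumPy sketch (the synthetic 2-D data, learning rate, and step count are illustrative assumptions) that trains a 2 β†’ 1 β†’ 2 linear autoencoder with gradient descent on the MSE loss and compares its decoder direction to the leading principal component:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2)) @ np.array([[2.0, 1.5], [0.0, 0.5]])  # toy correlated data
Xc = X - X.mean(axis=0)

# PCA: leading eigenvector of the sample covariance matrix
cov = Xc.T @ Xc / (len(Xc) - 1)
pc1 = np.linalg.eigh(cov)[1][:, -1]          # eigh sorts eigenvalues ascending

# Linear autoencoder x_hat = (x @ W_e) @ W_d, trained on MSE by gradient descent
W_e = 0.1 * rng.standard_normal((2, 1))      # encoder weights (2 -> 1)
W_d = 0.1 * rng.standard_normal((1, 2))      # decoder weights (1 -> 2)
lr = 0.01
for _ in range(5000):
    Z = Xc @ W_e                             # latent codes
    err = Z @ W_d - Xc                       # reconstruction error
    grad_d = Z.T @ err / len(Xc)             # gradient of MSE/2 w.r.t. W_d
    grad_e = Xc.T @ (err @ W_d.T) / len(Xc)  # gradient of MSE/2 w.r.t. W_e
    W_d -= lr * grad_d
    W_e -= lr * grad_e

ae_dir = W_d.ravel() / np.linalg.norm(W_d)
print("PC1:", pc1, "  linear AE decoder direction:", ae_dir)
print("|cosine similarity|:", abs(ae_dir @ pc1))  # close to 1.0: same subspace
```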

Example: Computing PCA

Step 1: Original Data

$X = \begin{bmatrix} 1 & 2 \\ 2 & 3 \\ 3 & 5 \end{bmatrix}$

Step 2: Center the Data

$\bar{\mathbf{x}} = \begin{bmatrix} 2 \\ 3.333 \end{bmatrix}$

$X_c = \begin{bmatrix} -1 & -1.333 \\ 0 & -0.333 \\ 1 & 1.667 \end{bmatrix}$

Step 3: Covariance Matrix

$\Sigma = \frac{1}{2} X_c^T X_c = \begin{bmatrix} 1 & 1.5 \\ 1.5 & 2.333 \end{bmatrix}$

Step 4: Eigendecomposition

$\lambda_1 = 3.308, \quad \mathbf{v}_1 = \begin{bmatrix} -0.545 \\ -0.839 \end{bmatrix}$

$\lambda_2 = 0.025, \quad \mathbf{v}_2 = \begin{bmatrix} 0.839 \\ -0.545 \end{bmatrix}$

Step 5: Project onto PC1

$X_p = X_c \mathbf{v}_1 = \begin{bmatrix} 1.663 \\ 0.280 \\ -1.942 \end{bmatrix}$

βœ“ Reduced from 2D to 1D while preserving maximum variance: PC1 captures about 99% of the total variance!
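
The worked example can be verified with a few lines of NumPy; note that np.linalg.eigh returns eigenvalues in ascending order and eigenvector signs are arbitrary:

```python
import numpy as np

X = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 5.0]])
Xc = X - X.mean(axis=0)                  # Step 2: center the data
cov = Xc.T @ Xc / (X.shape[0] - 1)       # Step 3: sample covariance
eigvals, eigvecs = np.linalg.eigh(cov)   # Step 4: eigendecomposition (ascending order)
v1 = eigvecs[:, -1]                      # leading principal component (sign arbitrary)
print(eigvals)                           # ~[0.025, 3.308]
print(Xc @ v1)                           # Step 5: 1-D projection, ~[1.663, 0.280, -1.942] up to sign
```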

🧠 Quiz 1

What is the main difference between a standard autoencoder and PCA?

🎲 Variational Autoencoders (VAE)

VAEs extend autoencoders by making the latent space probabilistic: new samples are generated by drawing z from the prior and passing it through the learned decoder!

VAE Architecture

Input x β†’ Encoder q_Ο†(z|x) β†’ ΞΌ, σ² β†’ Sample z ~ N(ΞΌ, σ²) using Ξ΅ ~ N(0, 1) β†’ Decoder p_ΞΈ(x|z) β†’ xΜ‚

Encoder outputs ΞΌ and σ² β†’ Sample z using reparameterization trick β†’ Decoder reconstructs

Reconstruction Loss

$\mathcal{L}_{\text{rec}} = -\mathbb{E}_{q_\phi(z|x)}[\log p_\theta(x|z)]$

Encourages accurate reconstruction

KL Divergence Loss

$\mathcal{L}_{\text{KL}} = D_{KL}(q_\phi(z|x) \| p(z))$

Regularizes latent space to match prior

Total VAE Loss (ELBO)

$\mathcal{L}_{\text{VAE}} = \mathcal{L}_{\text{rec}} + \beta \mathcal{L}_{\text{KL}}$

The negative Evidence Lower BOund (ELBO), with weight Ξ² on the KL term; minimizing this loss (with Ξ² = 1) is equivalent to maximizing the ELBO
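
A minimal PyTorch sketch of this loss, assuming a diagonal-Gaussian encoder that outputs mu and log_var and a Gaussian decoder so the reconstruction term reduces to squared error (function and variable names are illustrative):

```python
import torch
import torch.nn.functional as F

def vae_loss(x, x_hat, mu, log_var, beta=1.0):
    # Reconstruction term: with a Gaussian decoder this is squared error (up to constants)
    rec = F.mse_loss(x_hat, x, reduction="sum")
    # Closed-form KL(q_phi(z|x) || N(0, I)) for a diagonal Gaussian posterior
    kl = -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp())
    return rec + beta * kl
```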

🎨 Interactive: Explore Latent Space

Click on the canvas to sample from different regions of the 2D latent space!

πŸ“ Current z:

[0.00, 0.00]

🎲 Sampled:

0 points

βš™οΈ VAE Training Algorithm

Algorithm: Train VAE

Input: Data $\{\mathbf{x}^i\}_{i=1}^n$, epochs $E$, batch size $m$, latent dim $d$

1: Initialize encoder $q_\phi(\mathbf{z}|\mathbf{x})$ and decoder $p_\theta(\mathbf{x}|\mathbf{z})$

2: for $e=1$ to $E$:

3: Shuffle training data

4: for each minibatch $\{\mathbf{x}^j\}_{j=1}^m$:

5: Encode: Compute $\mu, \sigma^2 = q_\phi(\mathbf{z}|\mathbf{x})$

6: Sample: $\mathbf{z} = \mu + \sigma \odot \epsilon$ where $\epsilon \sim \mathcal{N}(0,I)$

7: Decode: $\hat{\mathbf{x}} = p_\theta(\mathbf{x}|\mathbf{z})$

8: Compute $\mathcal{L}_{\text{rec}} = \|\mathbf{x} - \hat{\mathbf{x}}\|^2$

9: Compute $\mathcal{L}_{\text{KL}} = -\frac{1}{2}\sum(1 + \log\sigma^2 - \mu^2 - \sigma^2)$

10: Total loss: $\mathcal{L} = \mathcal{L}_{\text{rec}} + \mathcal{L}_{\text{KL}}$

11: Update $\phi, \theta$ via backpropagation

πŸ”‘ Reparameterization Trick: $z = \mu + \sigma \odot \epsilon$ makes sampling differentiable!
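
A single PyTorch training step that follows the algorithm above; the encoder module returning (mu, log_var), the decoder module, and the vae_loss helper from the previous sketch are assumptions for illustration, not part of the text:

```python
import torch

def train_step(encoder, decoder, optimizer, x, beta=1.0):
    mu, log_var = encoder(x)                      # step 5: encode
    eps = torch.randn_like(mu)                    # epsilon ~ N(0, I)
    z = mu + torch.exp(0.5 * log_var) * eps       # step 6: reparameterization trick
    x_hat = decoder(z)                            # step 7: decode
    loss = vae_loss(x, x_hat, mu, log_var, beta)  # steps 8-10: total loss
    optimizer.zero_grad()
    loss.backward()                               # step 11: gradients flow through z
    optimizer.step()
    return loss.item()
```

Because z is written as a deterministic function of ΞΌ, Οƒ, and an external noise Ξ΅, backpropagation can pass through the sampling step, which is exactly what the reparameterization trick provides.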

🧠 Quiz 2

Why is the reparameterization trick necessary in VAEs?

πŸ“ Key Takeaways

πŸ”„ Autoencoders

Learn compressed representations through encoder-decoder architecture

πŸ“Š PCA Link

Linear autoencoders recover the PCA subspace, but nonlinear ones learn richer features

🎲 VAEs

Probabilistic latent space enables generation via sampling

"Compress, regularize, generateβ€”the VAE way!"