Variational Autoencoders
Learning compressed representations through probabilistic encoding and decoding
Autoencoders: Compress & Reconstruct
An autoencoder is an unsupervised neural network that learns to compress data into a lower-dimensional latent space and then reconstruct the original input from that compressed representation.
Interactive Autoencoder Architecture
Data flows through encoder → compressed representation → decoder → reconstruction
Training Objective: Reconstruction Loss
$\mathcal{L}(\mathbf{x}, \hat{\mathbf{x}}) = \frac{1}{N} \sum_{i=1}^{N} \|\mathbf{x}_i - \hat{\mathbf{x}}_i\|^2$
Mean Squared Error between input and reconstruction
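To make this concrete, here is a minimal sketch of a fully connected autoencoder trained with the MSE objective above. It assumes PyTorch; the layer sizes (784-dimensional inputs, a 32-dimensional latent code), the learning rate, and the random stand-in data are placeholder choices for illustration only.

```python
import torch
import torch.nn as nn

# Minimal fully connected autoencoder; layer sizes are illustrative only.
class Autoencoder(nn.Module):
    def __init__(self, input_dim=784, latent_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 128), nn.ReLU(),
            nn.Linear(128, latent_dim),      # compressed representation
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 128), nn.ReLU(),
            nn.Linear(128, input_dim),       # reconstruction
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = Autoencoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()                       # reconstruction loss

x = torch.rand(64, 784)                      # stand-in minibatch of data
for step in range(100):
    x_hat = model(x)
    loss = loss_fn(x_hat, x)                 # mean squared error ||x - x_hat||^2
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```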
Dimensionality Reduction
Learn lower-dimensional representations
Feature Learning
Extract meaningful features automatically
Denoising
Remove noise from corrupted data
Anomaly Detection
High reconstruction error → anomaly (see the sketch after this list)
Generation
VAEs generate new samples
Compression
Efficient data storage
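Several of these applications fall straight out of the reconstruction error. As a sketch of the anomaly-detection idea referenced above: score each sample by its reconstruction error under a trained autoencoder and flag samples whose error is unusually high. The `model` variable is assumed to be a trained autoencoder such as the one sketched earlier, and the percentile threshold is just one common heuristic.

```python
import torch

@torch.no_grad()
def anomaly_scores(model, x):
    """Per-sample reconstruction error; higher scores are more anomalous."""
    x_hat = model(x)
    return ((x - x_hat) ** 2).mean(dim=1)

# Hypothetical usage, assuming `model` is a trained autoencoder and
# `x_normal`, `x_new` are 2D tensors of shape (batch, features):
# threshold = anomaly_scores(model, x_normal).quantile(0.99)
# is_anomaly = anomaly_scores(model, x_new) > threshold
```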
Connection to PCA
Key Insight
A linear autoencoder (one with only linear activation functions) trained with MSE loss learns the same subspace as Principal Component Analysis (PCA)!
PCA
Finds principal components via eigendecomposition of covariance matrix
Linear Autoencoder
Learns same subspace via gradient descent on MSE loss
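One way to see this equivalence empirically is to train a tiny linear autoencoder (no biases, no nonlinearities) on centered data and compare the direction spanned by its decoder weights with the first principal component. The sketch below assumes PyTorch and NumPy and uses the same toy data as the worked example that follows; at convergence the absolute cosine similarity should approach 1, i.e. the learned direction matches PC1 up to sign and scale.

```python
import numpy as np
import torch
import torch.nn as nn

X = np.array([[1., 2.], [2., 3.], [3., 5.]])
Xc = X - X.mean(axis=0)                          # center the data

# First principal component from the covariance matrix.
cov = Xc.T @ Xc / (len(X) - 1)
eigvals, eigvecs = np.linalg.eigh(cov)           # eigenvalues in ascending order
v1 = eigvecs[:, -1]                              # eigenvector of the largest eigenvalue

# Linear autoencoder: 1-dimensional bottleneck, no biases, no nonlinearities.
enc = nn.Linear(2, 1, bias=False)
dec = nn.Linear(1, 2, bias=False)
opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-2)
x = torch.tensor(Xc, dtype=torch.float32)
for _ in range(5000):
    loss = ((dec(enc(x)) - x) ** 2).mean()       # MSE reconstruction loss
    opt.zero_grad()
    loss.backward()
    opt.step()

w = dec.weight.detach().numpy().ravel()          # direction spanned by the decoder
print(abs(w @ v1) / np.linalg.norm(w))           # |cosine similarity|, ~1.0 at convergence
```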
Example: Computing PCA
Step 1: Original Data
$X = \begin{bmatrix} 1 & 2 \\ 2 & 3 \\ 3 & 5 \end{bmatrix}$
Step 2: Center the Data
$\bar{\mathbf{x}} = \begin{bmatrix} 2 \\ 3.333 \end{bmatrix}$
$X_c = \begin{bmatrix} -1 & -1.333 \\ 0 & -0.333 \\ 1 & 1.667 \end{bmatrix}$
Step 3: Covariance Matrix
$\Sigma = \frac{1}{2} X_c^T X_c = \begin{bmatrix} 1 & 1.5 \\ 1.5 & 2.333 \end{bmatrix}$
Step 4: Eigendecomposition
$\lambda_1 = 3.308, \quad \mathbf{v}_1 = \begin{bmatrix} -0.545 \\ -0.839 \end{bmatrix}$
$\lambda_2 = 0.025, \quad \mathbf{v}_2 = \begin{bmatrix} 0.839 \\ -0.545 \end{bmatrix}$
Step 5: Project onto PC1
$X_p = X_c \mathbf{v}_1 = \begin{bmatrix} 1.663 \\ 0.280 \\ -1.942 \end{bmatrix}$
✓ Reduced from 2D to 1D while preserving maximum variance!
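The arithmetic above can be checked in a few lines of NumPy. This sketch reproduces the centering, the covariance matrix, the eigendecomposition, and the 1D projection; note that eigenvector signs are arbitrary, so the projections may come out with the opposite sign.

```python
import numpy as np

X = np.array([[1., 2.], [2., 3.], [3., 5.]])

# Step 2: center the data.
Xc = X - X.mean(axis=0)

# Step 3: covariance matrix with the 1/(N-1) convention used above.
cov = Xc.T @ Xc / (len(X) - 1)
print(cov)               # [[1.    1.5  ]
                         #  [1.5   2.333]]

# Step 4: eigendecomposition (np.linalg.eigh returns eigenvalues in ascending order).
eigvals, eigvecs = np.linalg.eigh(cov)
print(eigvals[::-1])     # [3.308  0.025]
v1 = eigvecs[:, -1]      # first principal component, [-0.545, -0.839] up to sign

# Step 5: project the centered data onto PC1.
print(Xc @ v1)           # [1.663  0.280  -1.942] up to an overall sign
```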
Quiz 1
What is the main difference between a standard autoencoder and PCA?
Variational Autoencoders (VAE)
VAEs extend autoencoders by making the latent space probabilistic, enabling them to generate new samples by sampling from a learned distribution!
VAE Architecture
Encoder outputs μ and σ² → sample z using the reparameterization trick → decoder reconstructs
Reconstruction Loss
$\mathcal{L}_{\text{rec}} = -\mathbb{E}_{q_\phi(z|x)}[\log p_\theta(x|z)]$
Penalizes inaccurate reconstructions; for a Gaussian decoder it reduces to mean squared error
KL Divergence Loss
$\mathcal{L}_{\text{KL}} = D_{KL}(q_\phi(z|x) \| p(z))$
Regularizes latent space to match prior
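For the usual choice of a diagonal Gaussian posterior $q_\phi(\mathbf{z}|\mathbf{x}) = \mathcal{N}(\mu, \operatorname{diag}(\sigma^2))$ and a standard normal prior $p(\mathbf{z}) = \mathcal{N}(0, I)$, this KL term has a closed form, which is exactly the expression used in the training algorithm below:
$D_{KL}(q_\phi(\mathbf{z}|\mathbf{x}) \| p(\mathbf{z})) = \frac{1}{2} \sum_{j=1}^{d} \left( \mu_j^2 + \sigma_j^2 - \log \sigma_j^2 - 1 \right) = -\frac{1}{2} \sum_{j=1}^{d} \left( 1 + \log \sigma_j^2 - \mu_j^2 - \sigma_j^2 \right)$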
Total VAE Loss (Negative ELBO)
$\mathcal{L}_{\text{VAE}} = \mathcal{L}_{\text{rec}} + \beta \mathcal{L}_{\text{KL}}$
With $\beta = 1$, minimizing this loss is equivalent to maximizing the Evidence Lower BOund (ELBO)
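In code, both terms can be computed directly from the encoder outputs. This is a minimal sketch assuming PyTorch, with the encoder assumed to return the mean and the log-variance (a common parameterization for numerical stability) and `beta` as the KL weight from the formula above.

```python
import torch

def vae_loss(x, x_hat, mu, logvar, beta=1.0):
    """Negative ELBO (up to constants): reconstruction MSE + beta * KL."""
    rec = ((x - x_hat) ** 2).sum(dim=1)                            # reconstruction term
    kl = -0.5 * (1 + logvar - mu ** 2 - logvar.exp()).sum(dim=1)   # closed-form KL
    return (rec + beta * kl).mean()
```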
Interactive: Explore Latent Space
Click on the canvas to sample from different regions of the 2D latent space!
VAE Training Algorithm
Algorithm: Train VAE
Input: Data $\{\mathbf{x}^i\}_{i=1}^n$, epochs $E$, batch size $m$, latent dim $d$
1: Initialize encoder $q_\phi(\mathbf{z}|\mathbf{x})$ and decoder $p_\theta(\mathbf{x}|\mathbf{z})$
2: for $e=1$ to $E$:
3: Shuffle training data
4: for each minibatch $\{\mathbf{x}^j\}_{j=1}^m$:
5: Encode: Compute $\mu, \sigma^2$, the parameters of $q_\phi(\mathbf{z}|\mathbf{x})$
6: Sample: $\mathbf{z} = \mu + \sigma \odot \epsilon$ where $\epsilon \sim \mathcal{N}(0,I)$
7: Decode: $\hat{\mathbf{x}} = p_\theta(\mathbf{x}|\mathbf{z})$
8: Compute $\mathcal{L}_{\text{rec}} = \|\mathbf{x} - \hat{\mathbf{x}}\|^2$
9: Compute $\mathcal{L}_{\text{KL}} = -\frac{1}{2}\sum(1 + \log\sigma^2 - \mu^2 - \sigma^2)$
10: Total loss: $\mathcal{L} = \mathcal{L}_{\text{rec}} + \mathcal{L}_{\text{KL}}$
11: Update $\phi, \theta$ via backpropagation
Reparameterization Trick: $z = \mu + \sigma \odot \epsilon$ makes sampling differentiable!
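Putting the algorithm together, a compact training sketch in PyTorch might look as follows. The architecture sizes, optimizer settings, and random stand-in minibatch are placeholder assumptions; the essential lines are the reparameterized sample (step 6) and the two loss terms (steps 8–10).

```python
import torch
import torch.nn as nn

# Sketch of a VAE following the algorithm above; sizes are illustrative only.
class VAE(nn.Module):
    def __init__(self, input_dim=784, latent_dim=16):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(input_dim, 128), nn.ReLU())
        self.mu = nn.Linear(128, latent_dim)          # mean of q(z|x)
        self.logvar = nn.Linear(128, latent_dim)      # log-variance of q(z|x)
        self.dec = nn.Sequential(
            nn.Linear(latent_dim, 128), nn.ReLU(),
            nn.Linear(128, input_dim),
        )

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        eps = torch.randn_like(mu)                    # epsilon ~ N(0, I)
        z = mu + torch.exp(0.5 * logvar) * eps        # step 6: reparameterization trick
        return self.dec(z), mu, logvar

model = VAE()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

x = torch.rand(64, 784)                               # stand-in minibatch
for step in range(100):
    x_hat, mu, logvar = model(x)                      # steps 5-7: encode, sample, decode
    rec = ((x - x_hat) ** 2).sum(dim=1).mean()                             # step 8
    kl = (-0.5 * (1 + logvar - mu ** 2 - logvar.exp()).sum(dim=1)).mean()  # step 9
    loss = rec + kl                                                        # step 10
    opt.zero_grad()
    loss.backward()                                                        # step 11
    opt.step()
```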
Quiz 2
Why is the reparameterization trick necessary in VAEs?
Key Takeaways
Autoencoders
Learn compressed representations through encoder-decoder architecture
PCA Link
Linear autoencoders recover the PCA subspace, while nonlinear ones learn richer features
VAEs
Probabilistic latent space enables generation via sampling
"Compress, regularize, generateβthe VAE way!"