Gaussian Mixture Models
Blend multiple Gaussians to model complex data; fit with EM or variational tricks.
"Each point whispers, 'I belong,' mixtures hum a Gaussian song."
Mixture model basics
Assume each observation comes from one of $K$ components. For GMMs: $$p(\mathbf{x}) = \sum_{k=1}^{K} \pi_k \,\mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)$$ with mixing weights $\pi_k \ge 0$, $\sum_k \pi_k = 1$, means $\boldsymbol{\mu}_k$, and covariances $\boldsymbol{\Sigma}_k$ (a small evaluation sketch follows the list below).
- Latent component assignment per point.
- Useful for clustering, density estimation, anomaly detection.
- Estimate parameters by maximum likelihood (typically via EM) or by variational inference.
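For concreteness, here is a minimal sketch of evaluating the mixture density at a point with SciPy; the two components, weights, means, and covariances are made-up illustrative values, not fitted parameters.
# Evaluate p(x) for a 2-component, 2-D Gaussian mixture (illustrative parameters)
import numpy as np
from scipy.stats import multivariate_normal
weights = np.array([0.6, 0.4])                        # pi_k, must sum to 1
means = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]  # mu_k
covs = [np.eye(2), 0.5 * np.eye(2)]                   # Sigma_k
x = np.array([1.0, 1.0])
p_x = sum(w * multivariate_normal.pdf(x, mean=m, cov=c)
          for w, m, c in zip(weights, means, covs))
print(p_x)  # mixture density at x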
Maximum likelihood for GMMs
Dataset $\{\mathbf{x}_1,\dots,\mathbf{x}_n\}$, parameters $\boldsymbol{\theta}=\{\pi_k,\boldsymbol{\mu}_k,\boldsymbol{\Sigma}_k\}_{k=1}^K$.
Log-likelihood: $$\ell(\boldsymbol{\theta})=\sum_{i=1}^n \log \sum_{k=1}^K \pi_k \mathcal{N}(\mathbf{x}_i \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)$$
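Evaluating $\ell(\boldsymbol{\theta})$ naively can underflow when component densities are tiny, so it is common to work in log space with logsumexp. A minimal sketch, assuming parameters are given as lists of per-component values:
# Numerically stable GMM log-likelihood via logsumexp
import numpy as np
from scipy.special import logsumexp
from scipy.stats import multivariate_normal
def gmm_log_likelihood(X, weights, means, covs):
    # log_probs[i, k] = log pi_k + log N(x_i | mu_k, Sigma_k)
    log_probs = np.stack([np.log(w) + multivariate_normal.logpdf(X, mean=m, cov=c)
                          for w, m, c in zip(weights, means, covs)], axis=1)
    return logsumexp(log_probs, axis=1).sum()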
EM steps
E-step responsibilities: $$\gamma_{ik} = \frac{\pi_k \mathcal{N}(\mathbf{x}_i \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)}{\sum_{j=1}^K \pi_j \mathcal{N}(\mathbf{x}_i \mid \boldsymbol{\mu}_j, \boldsymbol{\Sigma}_j)}$$ M-step updates: $$\pi_k = \frac{1}{n}\sum_{i=1}^n \gamma_{ik},\quad \boldsymbol{\mu}_k = \frac{\sum_{i=1}^n \gamma_{ik}\mathbf{x}_i}{\sum_{i=1}^n \gamma_{ik}},$$ $$\boldsymbol{\Sigma}_k = \frac{\sum_{i=1}^n \gamma_{ik}(\mathbf{x}_i - \boldsymbol{\mu}_k)(\mathbf{x}_i - \boldsymbol{\mu}_k)^\top}{\sum_{i=1}^n \gamma_{ik}}$$
EM climbs the log-likelihood and converges to a local maximum.
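A minimal NumPy sketch of a single EM iteration implementing the updates above; it assumes full covariances and omits the small covariance regularization (e.g., adding a ridge to each $\boldsymbol{\Sigma}_k$) that practical implementations use.
# One EM iteration for a GMM (illustrative, no regularization)
import numpy as np
from scipy.stats import multivariate_normal
def em_step(X, weights, means, covs):
    n, K = X.shape[0], len(weights)
    # E-step: responsibilities gamma[i, k]
    dens = np.stack([w * multivariate_normal.pdf(X, mean=m, cov=c)
                     for w, m, c in zip(weights, means, covs)], axis=1)
    gamma = dens / dens.sum(axis=1, keepdims=True)
    # M-step: responsibility-weighted updates
    Nk = gamma.sum(axis=0)                       # effective count per component
    new_weights = Nk / n
    new_means = [gamma[:, k] @ X / Nk[k] for k in range(K)]
    new_covs = []
    for k in range(K):
        diff = X - new_means[k]
        new_covs.append((gamma[:, k, None] * diff).T @ diff / Nk[k])
    return new_weights, new_means, new_covs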
Variational inference
Approximate the posterior over latent assignments with a tractable family; maximize the evidence lower bound (ELBO), which is equivalent to minimizing the KL divergence to the true posterior, via variational EM (a scikit-learn sketch follows the list below).
- E-step: update variational distribution to tighten ELBO.
- M-step: update $\pi_k, \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k$ using variational stats.
- Faster than exact posterior inference; may introduce bias but scales well.
- Extensions: black-box VI, amortized VI for flexible inference networks.
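scikit-learn provides a variational treatment of GMMs as BayesianGaussianMixture; a minimal sketch is below, where the prior strength weight_concentration_prior=0.01 is an illustrative choice that encourages unused components to receive near-zero weight.
# Variational GMM: over-specify K and let the prior prune extra components
from sklearn.mixture import BayesianGaussianMixture
bgmm = BayesianGaussianMixture(n_components=10, covariance_type='full',
                               weight_concentration_prior=0.01,
                               max_iter=500, random_state=0)
bgmm.fit(X)                      # X: data matrix, e.g. from the example below
print(bgmm.weights_.round(3))    # several weights should be close to zero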
Example: fit a 3-component GMM
Generate 2D clusters and fit a 3-component full-covariance GMM with scikit-learn.
# Python example
import numpy as np
from sklearn.mixture import GaussianMixture
import matplotlib.pyplot as plt

np.random.seed(0)
n_samples = 500  # points per cluster (1500 total)

# Three isotropic 2-D clusters centered at (1, 1), (-2, 1), and (0, -2)
X = np.concatenate((
    np.random.randn(n_samples, 2) * 0.5 + np.array([1, 1]),
    np.random.randn(n_samples, 2) * 0.5 + np.array([-2, 1]),
    np.random.randn(n_samples, 2) * 0.5 + np.array([0, -2]),
))

# Fit a 3-component GMM with full (unrestricted) covariance matrices
gmm = GaussianMixture(n_components=3, covariance_type='full', random_state=0)
gmm.fit(X)

# Hard-assign each point to its most probable component and plot
labels = gmm.predict(X)
plt.scatter(X[:, 0], X[:, 1], c=labels, s=40, cmap='viridis')
plt.show()
Try switching covariance_type to 'diag' or 'tied' and watch cluster shapes change.
🎵 Memory jingle
"Pick a $k$, stir Gaussians round,
Weights that sum, covariances sound.
E then M, responsibilities sway,
Mixtures hum their latent way."
🧠 Quick Quiz
In the E-step of EM for a GMM, what is computed?
Mini Lab: tweak a GMM
Practical tweaks to try: vary n_components, switch covariance_type ('full', 'diag', 'tied', 'spherical'), set random_state for reproducible initializations, and compare candidate models with BIC or AIC.
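One concrete tweak, sketched below: compare component counts and covariance structures by BIC (lower is better), reusing X from the fitting example above.
# Model selection over n_components and covariance_type via BIC
from sklearn.mixture import GaussianMixture
best = None
for k in range(1, 7):
    for cov_type in ('full', 'diag', 'tied', 'spherical'):
        model = GaussianMixture(n_components=k, covariance_type=cov_type,
                                random_state=0).fit(X)
        bic = model.bic(X)
        if best is None or bic < best[0]:
            best = (bic, k, cov_type)
print('Best (BIC, k, covariance_type):', best)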
Wrap-up
"Mix the bells, fit the song,
E then M till peaks are strong.
Variational if you must,
Gaussians blend with gentle trust."