Least Squares and SVD
Curve Fitting, Normal Equations, and Singular Value Decomposition
November 17, 2008
Polling Data & Curve Fitting
Recall in interpolation we wanted to find a curve that went through all of the data points.
Suppose we are given the data $\{(x_1,y_1), ..., (x_n, y_n)\}$ and we want to find a curve that best fits the data.
Fitting Curves
Fitting a Line
Given $n$ data points $\{(x_1,y_1),...,(x_n,y_n)\}$, we want to find $a$ and $b$ such that $y_i \approx ax_i + b$ for each $i$.
In matrix form, we want to find $a$ and $b$ that solve:
$$ \begin{bmatrix}x_1 & 1 \\ \vdots & \vdots \\x_n & 1 \end{bmatrix} \begin{bmatrix}a \\ b \end{bmatrix} = \begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix} $$Note: Systems with more equations than unknowns are called overdetermined.
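As a concrete sketch (assuming NumPy; the data values below are purely illustrative), the overdetermined system above can be assembled and handed to a built-in least squares routine:

```python
import numpy as np

# Illustrative data points (x_i, y_i); any n >= 2 points work.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 1.9, 3.2, 3.9, 5.1])

# Assemble the n x 2 coefficient matrix [x_i  1] from above.
A = np.column_stack([x, np.ones_like(x)])

# np.linalg.lstsq returns the least squares solution [a, b].
(a, b), res, rank, sv = np.linalg.lstsq(A, y, rcond=None)
print(f"fitted line: y = {a:.3f} x + {b:.3f}")
```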
Overdetermined Systems
If $A$ is an $m\times n$ matrix, then in general, an $m \times 1$ vector $b$ may not lie in the column space of $A$. Hence $Ax=b$ may not have an exact solution.
Definition: Residual Vector
$$ r = b - Ax $$The least squares solution is given by minimizing the square of the residual in the 2-norm.
The Normal Equations
Writing $r = b-Ax$, we want to find an $x$ that minimizes the function:
$$ \phi(x) = \norm{r}^2_2 = (b-Ax)^T(b-Ax) $$
From calculus, the minimizer occurs where $\nabla\phi(x)=0$. The derivative is given by:
$$ \nabla\phi(x) = -2A^Tb + 2A^TAx = 0 $$
Definition: Normal Equations
$$ A^TAx = A^Tb $$
Solving Normal Equations
Since $A^TA$ is symmetric positive definite (provided $A$ has full column rank), we can solve the normal equations by computing the Cholesky factorization:
$$ A^TA = LL^T $$And solving $Ly = A^Tb$ and $L^Tx = y$.
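A minimal sketch of this procedure, assuming SciPy is available (the function name `normal_equations_solve` is just illustrative):

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular

def normal_equations_solve(A, b):
    """Solve min ||b - Ax||_2 via the normal equations and Cholesky."""
    AtA = A.T @ A          # forms A^T A (this squares the conditioning)
    Atb = A.T @ b
    L = cholesky(AtA, lower=True)              # A^T A = L L^T
    y = solve_triangular(L, Atb, lower=True)   # forward substitution: L y = A^T b
    x = solve_triangular(L.T, y, lower=False)  # back substitution: L^T x = y
    return x
```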
Example: Conditioning Issue
Consider the matrix where $0 < \epsilon < \sqrt{\epsilon_{mach}}$:
$$ A = \begin{bmatrix} 1 & 1 \\ \epsilon & 0 \\ 0 & \epsilon \end{bmatrix} $$The normal equations for this system are given by:
$$ A^TA = \begin{bmatrix} 1+\epsilon^2 & 1 \\ 1 & 1+\epsilon^2 \end{bmatrix} \approx \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} $$Note: For small $\epsilon$, computer arithmetic might lose the $\epsilon^2$ term, making the matrix singular!
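A quick numeric illustration (a sketch assuming NumPy, with $\epsilon$ chosen just below $\sqrt{\epsilon_{mach}}$ for double precision):

```python
import numpy as np

eps = 1e-9   # smaller than sqrt(eps_mach) ~ 1.5e-8 for double precision
A = np.array([[1.0, 1.0],
              [eps, 0.0],
              [0.0, eps]])

AtA = A.T @ A
print(AtA)                         # the eps**2 terms are lost: [[1., 1.], [1., 1.]]
print(np.linalg.matrix_rank(AtA))  # 1 -> numerically singular
print(np.linalg.cond(A))           # ~1.4e9, so cond(A)^2 ~ 2e18 overwhelms double precision
```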
Conditioning Theorem
The normal equations worsen the conditioning of the problem: in the 2-norm, $\text{cond}(A^TA) = \text{cond}(A)^2$, so forming $A^TA$ squares the condition number.
How can we solve the least squares problem without squaring the condition of the matrix?
Quick Check
If the condition number of A is $10^3$, what is the condition number of the normal equations matrix $A^TA$?
Other Approaches
1. QR Factorization
For $A \in \mathbb{R}^{m\times n}$, factor $A = QR$ where:
- $Q$ is an $m\times m$ orthogonal matrix.
- $R$ is an $m\times n$ upper triangular matrix.
We can write $R = \begin{bmatrix} R' \\ 0 \end{bmatrix}$ where $R'$ is $n \times n$ upper triangular.
2. Singular Value Decomposition (SVD)
For $A \in \mathbb{R}^{m\times n}$, factor $A = USV^T$ where:
- $U$ is an $m\times m$ orthogonal matrix.
- $V$ is an $n\times n$ orthogonal matrix.
- $S$ is an $m\times n$ diagonal matrix (singular values).
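A small sketch (assuming NumPy; the random matrix is only illustrative) showing the shapes these two factorizations produce for an overdetermined matrix:

```python
import numpy as np

A = np.random.default_rng(0).standard_normal((5, 2))   # m = 5, n = 2

# Full QR: Q is m x m orthogonal, R is m x n with an n x n upper triangular block on top.
Q, R = np.linalg.qr(A, mode="complete")
print(Q.shape, R.shape)            # (5, 5) (5, 2)

# Full SVD: U is m x m, Vt is n x n, s holds the min(m, n) singular values.
U, s, Vt = np.linalg.svd(A, full_matrices=True)
print(U.shape, s.shape, Vt.shape)  # (5, 5) (2,) (2, 2)
```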
Orthogonal Matrices
Definition
A matrix $Q$ is orthogonal if $Q^TQ = QQ^T = I$.
Orthogonal matrices preserve the Euclidean norm of any vector $v$:
$$ \norm{Qv}^2_2 = (Qv)^T(Qv) = v^TQ^TQv = v^Tv = \norm{v}^2_2 $$
Using QR for Least Squares
We can apply $Q^T$ to the residual without changing its norm:
$$ \norm{r}^2_2 = \norm{b-Ax}^2_2 = \norm{Q^Tb-Q^TAx}^2_2 = \left\|Q^Tb-\begin{bmatrix}R'\\0\end{bmatrix}x\right\|^2_2 $$
Write $Q^Tb = \begin{bmatrix} c_1 \\ c_2 \end{bmatrix}$, where $c_1$ holds the first $n$ entries and $c_2$ the remaining $m-n$. Then:
$$ \left\|Q^Tb-\begin{bmatrix}R'\\0\end{bmatrix}x\right\|^2_2 = \left\|\begin{bmatrix} c_1-R'x \\ c_2 \end{bmatrix}\right\|^2_2 = \norm{c_1 - R'x}^2_2 + \norm{c_2}^2_2 $$Solution: Solve $R'x = c_1$ by back substitution. The residual is $\norm{r}_2 = \norm{c_2}_2$.
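A sketch of this procedure (assuming NumPy and SciPy; `qr_lstsq` is an illustrative name). Note that $A^TA$ is never formed, so the conditioning of the problem is not squared:

```python
import numpy as np
from scipy.linalg import solve_triangular

def qr_lstsq(A, b):
    """Least squares via QR for a full-column-rank m x n matrix A (m >= n)."""
    m, n = A.shape
    Q, R = np.linalg.qr(A, mode="complete")           # Q: m x m, R: m x n
    c = Q.T @ b
    c1, c2 = c[:n], c[n:]                             # split Q^T b as above
    x = solve_triangular(R[:n, :], c1, lower=False)   # back substitution: R' x = c1
    return x, np.linalg.norm(c2)                      # solution and residual norm ||c2||_2
```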
Gram-Schmidt Orthogonalization
One way to obtain the $QR$ factorization is by Gram-Schmidt.
Idea: For each column of $A$, subtract out its components along the previously computed orthonormal directions, then normalize what remains.
Simple Case (2 vectors)
For $\{a_1, a_2\}$:
- Normalize $a_1$: $$ q_1 = \frac{a_1}{\norm{a_1}} $$
- Subtract the component of $a_2$ along $q_1$: $$ a_2' = a_2 - (q_1^Ta_2)q_1 $$
- Normalize $a_2'$: $$ q_2 = \frac{a_2'}{\norm{a_2'}} $$
Matrix Relation
Since $R$ is upper triangular and $A=QR$, each column of $A$ satisfies:
$$ a_j = \sum_{i=1}^{j} r_{ij} q_i $$
From this, we see that $r_{ij} = q_i^T a_j$ for $j > i$.
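A minimal sketch of classical Gram-Schmidt (assuming NumPy; this returns the reduced factorization with $Q$ of size $m \times n$, and in practice modified Gram-Schmidt or Householder reflections are preferred for numerical stability):

```python
import numpy as np

def gram_schmidt_qr(A):
    """Classical Gram-Schmidt on the columns of A (m x n, linearly independent)."""
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for j in range(n):
        v = A[:, j].copy()
        for i in range(j):
            R[i, j] = Q[:, i] @ A[:, j]   # r_ij = q_i^T a_j
            v -= R[i, j] * Q[:, i]        # subtract the component along q_i
        R[j, j] = np.linalg.norm(v)
        Q[:, j] = v / R[j, j]             # normalize
    return Q, R
```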
Using SVD for Least Squares
Recall the Singular Value Decomposition $A = USV^T$:
Minimization using SVD
Assume $A$ has rank $k$. Substitute SVD into the residual norm:
$$ \norm{r}^2_2 = \norm{b-Ax}^2_2 = \norm{b-USV^Tx}^2_2 $$Since $U$ is orthogonal, we can multiply by $U^T$ without changing the norm:
$$ \norm{U^Tb - U^TUSV^Tx}^2_2 = \norm{U^Tb - SV^Tx}^2_2 $$Let $c = U^Tb$ and $y = V^Tx$ (so $x=Vy$). We now have:
$$ \norm{r}^2_2 = \norm{c - Sy}^2_2 $$Since $S$ is diagonal with only $k$ nonzeros:
$$ \norm{r}^2_2 = \sum_{i=1}^k (c_i - \sigma_i y_i)^2 + \sum_{i=k+1}^m c_i^2 $$This is minimized when $y_i = \frac{c_i}{\sigma_i}$ for $1 \leq i \leq k$.
Theorem
Let $A$ be an $m \times n$ matrix of rank $k$ with SVD $A = USV^T$. The least squares solution of $Ax=b$ is:
$$ x = \sum_{i=1}^k (\sigma^{-1}_i c_i) v_i $$where $c_i = u_i^T b$.
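A direct transcription of this formula (a sketch assuming NumPy; the tolerance used to decide which singular values count as nonzero is an illustrative choice):

```python
import numpy as np

def svd_lstsq(A, b, tol=1e-12):
    """Least squares via SVD: x = sum of (u_i^T b / sigma_i) v_i over sigma_i > tol."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    x = np.zeros(A.shape[1])
    for i, sigma in enumerate(s):
        if sigma > tol:                           # keep only numerically nonzero singular values
            x += (U[:, i] @ b / sigma) * Vt[i, :] # add (sigma_i^{-1} c_i) v_i
    return x
```

Discarding singular values below the tolerance also makes this robust for rank-deficient problems, where it returns the minimum-norm least squares solution.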
Final Challenge
Why might we prefer SVD or QR over Normal Equations for least squares?