Least Squares and SVD
Curve Fitting, Normal Equations, and Singular Value Decomposition
November 17, 2008
Polling Data & Curve Fitting
Recall in interpolation we wanted to find a curve that went through all of the data points.
Suppose we are given the data $\{(x_1,y_1), ..., (x_n, y_n)\}$ and we want to find a curve that best fits the data.
Fitting Curves
Fitting a Line
Given $n$ data points $\{(x_1,y_1),...,(x_n,y_n)\}$, we want to find $a$ and $b$ such that $y_i \approx ax_i + b$ for each $i$.
In matrix form, we want to find $a$ and $b$ that solve:
$$ \begin{bmatrix}x_1 & 1 \\ \vdots & \vdots \\x_n & 1 \end{bmatrix} \begin{bmatrix}a \\ b \end{bmatrix} = \begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix} $$Note: Systems with more equations than unknowns are called overdetermined.
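As a concrete sketch (assuming NumPy; the data values below are purely illustrative), the overdetermined system above can be assembled and handed to a built-in least squares routine:

```python
import numpy as np

# Illustrative data points (x_i, y_i); any n >= 2 points work.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 1.9, 3.2, 3.9, 5.1])

# Assemble the n x 2 coefficient matrix [x_i  1] from above.
A = np.column_stack([x, np.ones_like(x)])

# np.linalg.lstsq returns the least squares solution [a, b].
(a, b), res, rank, sv = np.linalg.lstsq(A, y, rcond=None)
print(f"fitted line: y = {a:.3f} x + {b:.3f}")
```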
Overdetermined Systems
If $A$ is an $m\times n$ matrix, then in general, an $m \times 1$ vector $b$ may not lie in the column space of $A$. Hence $Ax=b$ may not have an exact solution.
Definition: Residual Vector
$$ r = b - Ax $$The least squares solution is given by minimizing the square of the residual in the 2-norm.
The Normal Equations
Writing $r = b-Ax$, we want to find an $x$ that minimizes the function:
$$ \phi(x) = \norm{r}^2_2 = (b-Ax)^T(b-Ax) $$
From calculus, the minimizer occurs where $\nabla\phi(x)=0$. The derivative is given by:
$$ \nabla\phi(x) = -2A^Tb + 2A^TAx = 0 $$
Definition: Normal Equations
$$ A^TAx = A^Tb $$
Solving Normal Equations
Since $A^TA$ is symmetric positive definite (provided $A$ has full column rank), we can solve the normal equations by computing the Cholesky factorization:
$$ A^TA = LL^T $$And solving $Ly = A^Tb$ and $L^Tx = y$.
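A minimal sketch of this procedure, assuming SciPy is available (the function name `normal_equations_solve` is just illustrative):

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular

def normal_equations_solve(A, b):
    """Solve min ||b - Ax||_2 via the normal equations and Cholesky."""
    AtA = A.T @ A          # forms A^T A (this squares the conditioning)
    Atb = A.T @ b
    L = cholesky(AtA, lower=True)              # A^T A = L L^T
    y = solve_triangular(L, Atb, lower=True)   # forward substitution: L y = A^T b
    x = solve_triangular(L.T, y, lower=False)  # back substitution: L^T x = y
    return x
```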
Example: Conditioning Issue
Consider the matrix where $0 < \epsilon < \sqrt{\epsilon_{mach}}$:
$$ A = \begin{bmatrix} 1 & 1 \\ \epsilon & 0 \\ 0 & \epsilon \end{bmatrix} $$The normal equations for this system are given by:
$$ A^TA = \begin{bmatrix} 1+\epsilon^2 & 1 \\ 1 & 1+\epsilon^2 \end{bmatrix} \approx \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} $$Note: For small $\epsilon$, computer arithmetic might lose the $\epsilon^2$ term, making the matrix singular!
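A quick numeric illustration (a sketch assuming NumPy, with $\epsilon$ chosen just below $\sqrt{\epsilon_{mach}}$ for double precision):

```python
import numpy as np

eps = 1e-9   # smaller than sqrt(eps_mach) ~ 1.5e-8 for double precision
A = np.array([[1.0, 1.0],
              [eps, 0.0],
              [0.0, eps]])

AtA = A.T @ A
print(AtA)                         # the eps**2 terms are lost: [[1., 1.], [1., 1.]]
print(np.linalg.matrix_rank(AtA))  # 1 -> numerically singular
print(np.linalg.cond(A))           # ~1.4e9, so cond(A)^2 ~ 2e18 overwhelms double precision
```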
Conditioning Theorem
The normal equations worsen the conditioning of the problem: in the 2-norm, $\text{cond}(A^TA) = \text{cond}(A)^2$, so forming $A^TA$ squares the condition number.
How can we solve the least squares problem without squaring the condition of the matrix?
Quick Check
If the condition number of A is $10^3$, what is the condition number of the normal equations matrix $A^TA$?
Other Approaches
1. QR Factorization
For $A \in \mathbb{R}^{m\times n}$, factor $A = QR$ where:
- $Q$ is an $m\times m$ orthogonal matrix.
- $R$ is an $m\times n$ upper triangular matrix.
We can write $R = \begin{bmatrix} R' \\ 0 \end{bmatrix}$ where $R'$ is $n \times n$ upper triangular.
2. Singular Value Decomposition (SVD)
For $A \in \mathbb{R}^{m\times n}$, factor $A = USV^T$ where:
- $U$ is an $m\times m$ orthogonal matrix.
- $V$ is an $n\times n$ orthogonal matrix.
- $S$ is an $m\times n$ diagonal matrix (singular values).
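A small sketch (assuming NumPy; the random matrix is only illustrative) showing the shapes these two factorizations produce for an overdetermined matrix:

```python
import numpy as np

A = np.random.default_rng(0).standard_normal((5, 2))   # m = 5, n = 2

# Full QR: Q is m x m orthogonal, R is m x n with an n x n upper triangular block on top.
Q, R = np.linalg.qr(A, mode="complete")
print(Q.shape, R.shape)            # (5, 5) (5, 2)

# Full SVD: U is m x m, Vt is n x n, s holds the min(m, n) singular values.
U, s, Vt = np.linalg.svd(A, full_matrices=True)
print(U.shape, s.shape, Vt.shape)  # (5, 5) (2,) (2, 2)
```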
Orthogonal Matrices
Definition
A matrix $Q$ is orthogonal if $Q^TQ = QQ^T = I$.
Orthogonal matrices preserve the Euclidean norm of any vector $v$:
$$ \norm{Qv}^2_2 = (Qv)^T(Qv) = v^TQ^TQv = v^Tv = \norm{v}^2_2 $$
Using QR for Least Squares
We can apply $Q^T$ to the residual without changing its norm:
$$ \norm{r}^2_2 = \norm{b-Ax}^2_2 = \norm{Q^Tb-Q^TAx}^2_2 = \left\|Q^Tb-\begin{bmatrix}R'\\0\end{bmatrix}x\right\|^2_2 $$
Write $Q^Tb = \begin{bmatrix} c_1 \\ c_2 \end{bmatrix}$, where $c_1$ holds the first $n$ entries and $c_2$ the remaining $m-n$. Then:
$$ \left\|Q^Tb-\begin{bmatrix}R'\\0\end{bmatrix}x\right\|^2_2 = \left\|\begin{bmatrix} c_1-R'x \\ c_2 \end{bmatrix}\right\|^2_2 = \norm{c_1 - R'x}^2_2 + \norm{c_2}^2_2 $$Solution: Solve $R'x = c_1$ by back substitution. The residual is $\norm{r}_2 = \norm{c_2}_2$.
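A sketch of this procedure (assuming NumPy and SciPy; `qr_lstsq` is an illustrative name). Note that $A^TA$ is never formed, so the conditioning of the problem is not squared:

```python
import numpy as np
from scipy.linalg import solve_triangular

def qr_lstsq(A, b):
    """Least squares via QR for a full-column-rank m x n matrix A (m >= n)."""
    m, n = A.shape
    Q, R = np.linalg.qr(A, mode="complete")           # Q: m x m, R: m x n
    c = Q.T @ b
    c1, c2 = c[:n], c[n:]                             # split Q^T b as above
    x = solve_triangular(R[:n, :], c1, lower=False)   # back substitution: R' x = c1
    return x, np.linalg.norm(c2)                      # solution and residual norm ||c2||_2
```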
Gram-Schmidt Orthogonalization
One way to obtain the $QR$ factorization is by Gram-Schmidt.
Idea: For each column of $A$, subtract out its components along the previously computed orthonormal directions, then normalize what remains.
Simple Case (2 vectors)
For $\{a_1, a_2\}$:
- Normalize $a_1$: $$ q_1 = \frac{a_1}{\norm{a_1}} $$
- Subtract the component of $a_2$ along $q_1$: $$ a_2' = a_2 - (q_1^Ta_2)q_1 $$
- Normalize $a_2'$: $$ q_2 = \frac{a_2'}{\norm{a_2'}} $$
Matrix Relation
Since $R$ is upper triangular and $A=QR$, each column of $A$ satisfies:
$$ a_j = \sum_{i=1}^{j} r_{ij} q_i $$
From this, we see that $r_{ij} = q_i^T a_j$ for $j > i$.
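A minimal sketch of classical Gram-Schmidt (assuming NumPy; this returns the reduced factorization with $Q$ of size $m \times n$, and in practice modified Gram-Schmidt or Householder reflections are preferred for numerical stability):

```python
import numpy as np

def gram_schmidt_qr(A):
    """Classical Gram-Schmidt on the columns of A (m x n, linearly independent)."""
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for j in range(n):
        v = A[:, j].copy()
        for i in range(j):
            R[i, j] = Q[:, i] @ A[:, j]   # r_ij = q_i^T a_j
            v -= R[i, j] * Q[:, i]        # subtract the component along q_i
        R[j, j] = np.linalg.norm(v)
        Q[:, j] = v / R[j, j]             # normalize
    return Q, R
```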
Using SVD for Least Squares
Recall the Singular Value Decomposition $A = USV^T$:
Minimization using SVD
Assume $A$ has rank $k$. Substitute SVD into the residual norm:
$$ \norm{r}^2_2 = \norm{b-Ax}^2_2 = \norm{b-USV^Tx}^2_2 $$Since $U$ is orthogonal, we can multiply by $U^T$ without changing the norm:
$$ \norm{U^Tb - U^TUSV^Tx}^2_2 = \norm{U^Tb - SV^Tx}^2_2 $$Let $c = U^Tb$ and $y = V^Tx$ (so $x=Vy$). We now have:
$$ \norm{r}^2_2 = \norm{c - Sy}^2_2 $$Since $S$ is diagonal with only $k$ nonzeros:
$$ \norm{r}^2_2 = \sum_{i=1}^k (c_i - \sigma_i y_i)^2 + \sum_{i=k+1}^m c_i^2 $$This is minimized when $y_i = \frac{c_i}{\sigma_i}$ for $1 \leq i \leq k$.
Theorem
Let $A$ be an $m \times n$ matrix of rank $k$ with SVD $A = USV^T$. The least squares solution of $Ax=b$ is:
$$ x = \sum_{i=1}^k (\sigma^{-1}_i c_i) v_i $$where $c_i = u_i^T b$.
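A direct transcription of this formula (a sketch assuming NumPy; the tolerance used to decide which singular values count as nonzero is an illustrative choice):

```python
import numpy as np

def svd_lstsq(A, b, tol=1e-12):
    """Least squares via SVD: x = sum of (u_i^T b / sigma_i) v_i over sigma_i > tol."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    x = np.zeros(A.shape[1])
    for i, sigma in enumerate(s):
        if sigma > tol:                           # keep only numerically nonzero singular values
            x += (U[:, i] @ b / sigma) * Vt[i, :] # add (sigma_i^{-1} c_i) v_i
    return x
```

Discarding singular values below the tolerance also makes this robust for rank-deficient problems, where it returns the minimum-norm least squares solution.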
Final Challenge
Why might we prefer SVD or QR over Normal Equations for least squares?