Better Iterative Methods & Google
From Conjugate Gradients to Markov Chains & Monte Carlo
The Quadratic Form
We define a quadratic form $f(x)$ as:

$$ f(x) = \frac{1}{2} x^T A x - b^T x + c $$
Example: Given $A= \begin{bmatrix}3 & 2\\ 2& 6\end{bmatrix}, \quad b= \begin{bmatrix}2\\-8\end{bmatrix}, \quad c = 0$
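As a quick numerical sanity check, here is a minimal Python sketch (ours, not part of the notes) that evaluates this $f$; the minimizer $[2, -2]^T$ is derived in the next subsection:

```python
import numpy as np

# Evaluate the quadratic form f(x) = 1/2 x^T A x - b^T x + c for the example above.
A = np.array([[3.0, 2.0], [2.0, 6.0]])
b = np.array([2.0, -8.0])
c = 0.0

def f(x):
    return 0.5 * x @ A @ x - b @ x + c

print(f(np.array([0.0, 0.0])))    # f(0) = c = 0
print(f(np.array([2.0, -2.0])))   # f at the minimizer [2, -2] is -10
```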
Gradient & Minima
The gradient $\nabla f(x)$ points in the direction of steepest increase ("uphill"). Differentiating $f$ with respect to the $i^{th}$ component and collecting terms yields:

$$ \nabla f = \frac{1}{2} A^T x + \frac{1}{2} A x - b $$

If $A$ is symmetric ($A^T = A$), this simplifies to:

$$ \nabla f = Ax - b $$
Thus, at the minimum where $\nabla f = 0$, we have $Ax = b$.
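For the example above, we can verify this condition directly:

$$ Ax^* = \begin{bmatrix}3 & 2\\ 2 & 6\end{bmatrix} \begin{bmatrix}2\\-2\end{bmatrix} = \begin{bmatrix}2\\-8\end{bmatrix} = b $$

so $x^* = [2, -2]^T$ minimizes $f$ (since $A$ is positive definite, the stationary point is a minimum).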
Steepest Descent
We want to find $\alpha$ to minimize $f(x_{k+1})$ where $x_{k+1} = x_k + \alpha r_k$ and $r_k = b - Ax_k$ is the residual (equal to $-\nabla f(x_k)$, the direction of steepest descent).

Setting $\frac{d}{d\alpha} f(x_{k+1}) = 0$ implies we pick $\alpha$ such that the new residual is orthogonal to the current direction. The optimal search parameter is:

$$ \alpha = \frac{r_k^T r_k}{r_k^T A r_k} $$
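A minimal steepest-descent sketch for the running example (the tolerance and iteration cap are our own choices):

```python
import numpy as np

# Steepest descent for Ax = b with A symmetric positive definite.
A = np.array([[3.0, 2.0], [2.0, 6.0]])
b = np.array([2.0, -8.0])

x = np.zeros(2)                      # initial guess
for k in range(1000):
    r = b - A @ x                    # residual = -grad f(x), steepest descent direction
    if np.linalg.norm(r) < 1e-10:
        break
    alpha = (r @ r) / (r @ (A @ r))  # exact line search along r
    x = x + alpha * r

print(x, k)                          # approaches [2, -2]; note the many small steps
```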
Conjugate Gradients (CG)
Steepest descent leads to "staggered" convergence. CG improves this by choosing search directions $s_k$ that are $A$-orthogonal (conjugate) to previous directions. It converges in at most $n$ steps.
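A sketch of the standard CG iteration, assuming $A$ is symmetric positive definite (variable names are ours):

```python
import numpy as np

def conjugate_gradient(A, b, tol=1e-10):
    """Plain conjugate gradients for symmetric positive definite A (illustrative sketch)."""
    x = np.zeros_like(b)
    r = b - A @ x                  # initial residual
    s = r.copy()                   # first search direction is the residual
    rr = r @ r
    for _ in range(len(b)):
        As = A @ s
        alpha = rr / (s @ As)      # step length from exact line search
        x = x + alpha * s
        r = r - alpha * As         # update residual
        rr_new = r @ r
        if np.sqrt(rr_new) < tol:
            break
        s = r + (rr_new / rr) * s  # keep new direction A-orthogonal to previous ones
        rr = rr_new
    return x

A = np.array([[3.0, 2.0], [2.0, 6.0]])
b = np.array([2.0, -8.0])
print(conjugate_gradient(A, b))    # reaches [2, -2] in at most n = 2 steps
```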
Convergence & Preconditioning
Convergence Bound
At each step, CG minimizes the $A$-norm of the error. The rate depends on the condition number $\kappa = \lambda_{max}/\lambda_{min}$; the standard bound is

$$ \|e_k\|_A \le 2 \left( \frac{\sqrt{\kappa} - 1}{\sqrt{\kappa} + 1} \right)^k \|e_0\|_A $$

so a smaller $\kappa$ means faster convergence.
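For the running example we can compute $\kappa$ and the per-step factor in this bound (a small sketch of ours):

```python
import numpy as np

# Condition number and CG error-reduction factor for the example matrix.
A = np.array([[3.0, 2.0], [2.0, 6.0]])
lam = np.linalg.eigvalsh(A)               # eigenvalues in ascending order: [2, 7]
kappa = lam[-1] / lam[0]                  # condition number = 3.5
rho = (np.sqrt(kappa) - 1) / (np.sqrt(kappa) + 1)
print(kappa, rho)                         # per-step factor ≈ 0.30
```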
Preconditioning
We transform $Ax=b$ with a matrix $Q$ so that the preconditioned operator (e.g., $QA$ for left preconditioning) is closer to the identity. For CG, we generally use Symmetric Preconditioning (a sketch follows the list below):
- Left: $QAx = Qb$
- Right: $AQw = b, Qw=x$
- Symmetric: $Q^TAQw = Q^Tb, Qw=x$ (Required for CG)
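A minimal sketch of the symmetric variant using a Jacobi (diagonal) preconditioner $Q = \mathrm{diag}(A)^{-1/2}$; the choice of $Q$ is ours, purely for illustration:

```python
import numpy as np

# Symmetric (Jacobi) preconditioning: Q^T A Q w = Q^T b, then x = Q w.
A = np.array([[3.0, 2.0], [2.0, 6.0]])
b = np.array([2.0, -8.0])

Q = np.diag(1.0 / np.sqrt(np.diag(A)))   # diagonal here, so Q^T = Q
A_hat = Q.T @ A @ Q                      # still symmetric positive definite, unit diagonal
b_hat = Q.T @ b

w = np.linalg.solve(A_hat, b_hat)        # in practice, run CG on this transformed system
x = Q @ w                                # recover the original unknown
print(x)                                 # [2, -2], the same solution as before
```

Because $Q^T A Q$ is symmetric positive definite whenever $A$ is, CG can be applied to the transformed system without modification.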
🧠 Quick Check Win $10
Why is Symmetric Preconditioning preferred for Conjugate Gradients?
Randomly Walking with Google
Start at a webpage, randomly select a link, and repeat (simulated in the sketch after this list).
- Dead End: Page with no outgoing links.
- Cycle (Markov Chain): You end up where you began.
- PageRank: The limiting probability that an infinitely dedicated random surfer visits a page.
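A toy random-surfer simulation (the 4-page link structure is made up for illustration); long-run visit frequencies approximate PageRank:

```python
import random

# Random surfer on a hypothetical 4-page web: page -> list of outgoing links.
links = {0: [1, 2], 1: [2], 2: [0], 3: [0, 2]}
n, p, steps = 4, 0.85, 100_000

visits = [0] * n
page = 0
for _ in range(steps):
    visits[page] += 1
    if links[page] and random.random() < p:
        page = random.choice(links[page])   # follow a random outgoing link
    else:
        page = random.randrange(n)          # teleport: handles dead ends and cycles

print([v / steps for v in visits])          # limiting visit frequencies ≈ PageRank
```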
Markov Chain: The Broken Machine
A machine has 3 states: Working, Temporarily Broken, and Permanently Broken.
Transition Matrix P
$$ P = \begin{pmatrix} 0.9 & 0.6 & 0 \\ 0.095 & 0.4 & 0 \\ 0.005 & 0 & 1 \end{pmatrix} $$

Question: If the machine is working on Day 1 ($v^{(0)} = [1, 0, 0]^T$), what is the probability it is still functional after a year (365 days)?
We compute $P^{365}$ numerically (see the sketch below). Its first column gives the state probabilities starting from State 1; the entry in Row 3, the probability of being Permanently Broken, is 0.7937.
State 3 is an absorbing state. The steady state solution is $v = [0, 0, 1]^T$.
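A small numpy sketch (ours) reproducing these numbers:

```python
import numpy as np

# Evolve v^(k+1) = P v^(k): columns of P hold the transition probabilities.
P = np.array([[0.9,   0.6, 0.0],
              [0.095, 0.4, 0.0],
              [0.005, 0.0, 1.0]])
v0 = np.array([1.0, 0.0, 0.0])                 # working on day 1

v = np.linalg.matrix_power(P, 365) @ v0
print(v[2])                                    # P(permanently broken) ≈ 0.7937
print(np.linalg.matrix_power(P, 100_000) @ v0) # approaches the steady state [0, 0, 1]
```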
The Google Matrix
Let $G$ be the connectivity matrix ($n \times n$), with $g_{ij} = 1$ if page $j$ links to page $i$. Let $c_j$ be the column sums (the out-degree of page $j$). We define matrix $A$ entrywise as:

$$ a_{ij} = p \, \frac{g_{ij}}{c_j} + \delta $$
- $p$: Probability of following a link (typically 0.85).
- $1-p$: Probability of random teleportation (handling dead ends/cycles).
- $\delta$: $(1-p)/n$.
The PageRank vector $x$ is the eigenvector satisfying $x = Ax$. Writing $A = pGD + \delta \mathbf{1}\mathbf{1}^T$ with $D = \mathrm{diag}(1/c_1, \ldots, 1/c_n)$, and normalizing so that $\mathbf{1}^T x = 1$, this can be written as the linear system:

$$ (I - pGD)\, x = \delta \mathbf{1} $$
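A sketch solving this system on the same hypothetical 4-page graph as the surfer simulation above (it has no dead ends, so all $c_j > 0$):

```python
import numpy as np

# PageRank via the linear system (I - pGD) x = delta * 1.
# G[i, j] = 1 if page j links to page i.
G = np.array([[0, 0, 1, 1],
              [1, 0, 0, 0],
              [1, 1, 0, 1],
              [0, 0, 0, 0]], dtype=float)
n, p = 4, 0.85
delta = (1 - p) / n

c = G.sum(axis=0)                   # column sums = out-degrees (all nonzero here)
D = np.diag(1.0 / c)
x = np.linalg.solve(np.eye(n) - p * G @ D, delta * np.ones(n))
print(x, x.sum())                   # PageRank vector; entries sum to 1
```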
Integration as Averaging
Integration can be thought of as averaging a function over a domain.
1. Average Daily Snowfall (1D)
Domain: Year (Time)
$$ \text{Average} = \frac{\int_{day \in year} s(day) \, d(day)}{\text{Length of Year}} $$

2. Average Snowfall in Illinois (2D)
Domain: Surface Area (Space)
$$ \text{Average} = \frac{\int_{loc \in IL} s(loc) \, d(loc)}{\text{Area of IL}} $$

3. Average Snowfall in Illinois Today (3D)
Domain: Space $\times$ Time
$$ \text{Average} = \frac{\int_{day}\int_{loc} s(loc, day) \, d(loc, day)}{\text{Area} \cdot \text{Length}} $$

Discrete Random Variables
Expected Value ($E[x]$)
$$ E[x] = \sum_{j=1}^{n} x_j p_j $$

Example (Die Roll): $(1+2+3+4+5+6)/6 = 3.5$
Variance ($\sigma^2$)
$$ \sigma^2[x] = E[(x - E[x])^2] = E[x^2] - E[x]^2 $$

This measures the variation of $x$ from its average.
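Both definitions are easy to check numerically; a tiny sketch for the die roll:

```python
import numpy as np

# Expected value and variance of a fair die, straight from the definitions.
x = np.arange(1, 7)
p = np.full(6, 1 / 6)

E = np.sum(x * p)                  # 3.5
var = np.sum(x**2 * p) - E**2      # E[x^2] - E[x]^2 = 35/12 ≈ 2.9167
print(E, var)
```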
Estimation & Law of Large Numbers
We can estimate $E[x]$ by choosing $N$ random values and averaging them:
$$ E[x] \approx \frac{1}{N} \sum_{j=1}^{N} x_j $$

Law of Large Numbers
As $N \to \infty$, the estimate converges to the correct value with probability 1.
$$ P\left(E[x] = \lim_{N\to\infty} \frac{1}{N} \sum_{i=1}^{N} x_i\right) = 1 $$
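A quick demonstration with die rolls (the sample sizes are our own choices):

```python
import numpy as np

# Law of Large Numbers: the running average of die rolls approaches E[x] = 3.5.
rng = np.random.default_rng(0)
rolls = rng.integers(1, 7, size=1_000_000)   # upper bound is exclusive
for N in (10, 1_000, 1_000_000):
    print(N, rolls[:N].mean())               # estimates tighten as N grows
```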
Continuous Extensions

For complex geometries (like maps), we extend this to integration using a density function $\rho(x)$:
$$ E[g(x)] = \int_{x \in D} g(x)\rho(x) \, dA_x \approx \frac{1}{N} \sum_{i=1}^{N} g(x_i) $$

where the sample points $x_i$ are drawn according to $\rho$.
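A minimal Monte Carlo sketch of this idea, using $g(x) = \sin(x)$ on $D = [0, \pi]$ with uniform density $\rho = 1/\pi$ (our choice of test function; the exact average is $2/\pi \approx 0.6366$):

```python
import numpy as np

# Estimate the average of sin(x) over [0, pi] by averaging over uniform samples.
rng = np.random.default_rng(0)
samples = rng.uniform(0.0, np.pi, size=100_000)
print(np.mean(np.sin(samples)))    # ≈ 2/pi ≈ 0.6366
```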
🧠 Final Challenge Win $20

Based on the Law of Large Numbers, how does the error of our estimate behave as we increase $N$?