Google, Markov Chains, Power Method, SVD
Understanding PageRank via Markov Chains, computing eigenvalues with the Power Method, and the fundamentals of Singular Value Decomposition.
Random Walks & Markov Chains
Imagine a "random surfer" starting on a webpage, randomly picking a link, and repeating effectively forever. This process describes a Markov Chain: the next state depends only on the current state.
Case Study: The Broken Machine
A machine has 3 states: (1) Working, (2) Temporarily Broken, (3) Permanently Broken.
Transition Matrix P
$$ P = \begin{pmatrix} 0.9 & 0.6 & 0 \\ 0.095 & 0.4 & 0 \\ 0.005 & 0 & 1 \end{pmatrix} $$
Question:
If working on Day 1, what is the chance it works after 365 days?
Answer: roughly a 21% chance it is still functional (i.e., in state 1 or 2).
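This is easy to check numerically. A quick sketch with NumPy, using the (column-stochastic) transition matrix defined above:

```python
import numpy as np

# Column-stochastic transition matrix from the case study (columns sum to 1)
P = np.array([
    [0.9,   0.6, 0.0],   # -> Working
    [0.095, 0.4, 0.0],   # -> Temporarily Broken
    [0.005, 0.0, 1.0],   # -> Permanently Broken (absorbing)
])

x = np.array([1.0, 0.0, 0.0])  # working on Day 1
for _ in range(365):
    x = P @ x                  # one day of evolution

functional = x[0] + x[1]       # working or temporarily broken
print(f"P(functional after 365 days) ~ {functional:.3f}")  # about 0.21
```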
Steady State
The steady state vector $v$ satisfies $Pv = v$. For this machine, $v = [0, 0, 1]^T$. State 3 is an absorbing state.
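A sanity check of the absorbing behavior (a sketch, reusing the matrix $P$ from the case study): iterating the chain long enough drives all the probability into state 3.

```python
import numpy as np

P = np.array([
    [0.9,   0.6, 0.0],
    [0.095, 0.4, 0.0],
    [0.005, 0.0, 1.0],
])

v = np.array([0.0, 0.0, 1.0])
assert np.allclose(P @ v, v)   # steady state: P v = v

# Starting from "Working", the chain is eventually absorbed into state 3
x = np.array([1.0, 0.0, 0.0])
for _ in range(20_000):
    x = P @ x
print(np.round(x, 6))          # approximately [0, 0, 1]
```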
Constructing the Google Matrix
Let $G$ be the connectivity matrix ($G_{ij}=1$ if page $j$ links to page $i$, so the out-links of page $j$ sit in column $j$). We need a proper transition matrix; simply scaling $G$ isn't enough, because of dead ends (pages with no out-links) and cycles.
The Formula
- $c_j$: Out-degree (column sum) of page $j$.
- $p$: Damping factor (usually 0.85). Probability of following a link.
- $\delta = (1-p)/n$: Probability of teleporting to any random page.
In matrix form:
$$ A = pGD + ez^T $$
where $D_{jj} = 1/c_j$ and the rank-one term $ez^T$ accounts for teleportation and dead ends.
PageRank $x$ is the eigenvector: $(I - pGD)x = \gamma e$.
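A small illustrative sketch of this construction (the 4-page link structure is hypothetical, chosen only for demonstration): build the column-stochastic matrix $A$, then find its dominant eigenvector by repeated multiplication.

```python
import numpy as np

n, p = 4, 0.85
# G[i, j] = 1 if page j links to page i (columns are source pages)
G = np.array([
    [0, 0, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 1, 1, 0],   # page 4 is a dead end (empty column)
], dtype=float)

c = G.sum(axis=0)                       # out-degrees (column sums)
A = np.zeros((n, n))
for j in range(n):
    if c[j] > 0:
        A[:, j] = p * G[:, j] / c[j] + (1 - p) / n   # follow a link or teleport
    else:
        A[:, j] = 1.0 / n               # dead end: teleport anywhere

assert np.allclose(A.sum(axis=0), 1.0)  # column-stochastic

# Power iteration for the PageRank vector (dominant eigenvector, lambda = 1)
x = np.ones(n) / n
for _ in range(200):
    x = A @ x
x /= x.sum()
print(np.round(x, 4))
```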
Matlab Implementation (Moler's Scripts)
Cleve Moler (creator of MATLAB) provided scripts to demonstrate this. The process starts at a root URL and crawls to build the matrix.
G is the sparse $n \times n$ connectivity matrix ($G_{ij}=1$ if page $j$ links to page $i$). U is the array of URLs.
Connection to Numerical Methods
Why is this a numerical problem?
- Sparse Matrices: The web graph is massive but sparse.
- Eigenvalue Problems: PageRank is just the dominant eigenvector.
- Markov Chains: These relate closely to Monte Carlo simulations and the Metropolis algorithm (used in physics/statistics).
The Power Method
How do we find the dominant eigenvalue $\lambda_1$ (largest magnitude)?
The Algorithm
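A minimal sketch of the iteration: repeatedly multiply by $A$ and normalize, using the largest-magnitude component as the eigenvalue estimate. (The test matrix here is an arbitrary example with eigenvalues 5 and 2.)

```python
import numpy as np

def power_method(A, iters=100, tol=1e-12):
    """Estimate the dominant eigenpair of A (assumes |lambda_1| > |lambda_2|)."""
    x = np.arange(1.0, A.shape[0] + 1)      # generic starting vector
    lam = 0.0
    for _ in range(iters):
        y = A @ x
        lam_new = y[np.argmax(np.abs(y))]   # largest-magnitude component
        x = y / lam_new                     # normalize so the estimate converges
        if abs(lam_new - lam) < tol:
            break
        lam = lam_new
    return lam_new, x

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])                  # eigenvalues 5 and 2
lam, v = power_method(A)
print(round(lam, 6))                        # ~5.0
```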
Why it works
Writing $x^{(0)} = \sum_i c_i v_i$ in the eigenvector basis (assuming $|\lambda_1| > |\lambda_2| \ge \dots$), multiplying by $A^k$ amplifies the component along $v_1$:
$$ A^k x^{(0)} = \sum_i c_i \lambda_i^k v_i \approx c_1 \lambda_1^k v_1 $$
Inverse Power Method
To find the smallest eigenvalue, apply the Power Method to $A^{-1}$. Instead of computing $A^{-1}$, we solve systems $Ax^{(k+1)} = x^{(k)}$ (using LU decomposition).
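A sketch of the inverse iteration: each step solves a linear system rather than forming $A^{-1}$ (for many iterations one would factor $A$ once with LU and reuse the factors; here `np.linalg.solve` stands in for that step).

```python
import numpy as np

def inverse_power_method(A, iters=100):
    """Estimate the smallest-magnitude eigenvalue of A."""
    x = np.arange(1.0, A.shape[0] + 1)
    mu = 1.0
    for _ in range(iters):
        y = np.linalg.solve(A, x)      # y = A^{-1} x, without forming A^{-1}
        mu = y[np.argmax(np.abs(y))]   # dominant eigenvalue estimate of A^{-1}
        x = y / mu
    return 1.0 / mu                    # smallest eigenvalue of A

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])             # eigenvalues 5 and 2
print(round(inverse_power_method(A), 6))   # ~2.0
```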
SVD Fundamentals
Any matrix $A$ ($m \times n$) can be factored into:
- $U$: $m \times m$ Orthogonal matrix ($AA^T$ eigenvectors).
- $V$: $n \times n$ Orthogonal matrix ($A^TA$ eigenvectors).
- $S$: $m \times n$ Diagonal matrix of singular values $\sigma_i = \sqrt{\lambda_i(A^TA)}$.
Derivation: Diagonalization
We want to factor $A = USV^T$. Consider the product $A^TA$:
$$ A^TA = (USV^T)^T(USV^T) = VS^TU^TUSV^T = VS^2V^T $$
since $U$ is orthogonal ($U^TU=I$) and $S$ is diagonal (so $S^TS = S^2$).
This is a similarity transformation! $S^2$ contains the eigenvalues of $A^TA$, and $V$ contains its eigenvectors. Similarly, $AA^T = US^2U^T$ gives us $U$.
Worked Example
Let $ A = \begin{bmatrix}2 & -2\\ 1 & 1\end{bmatrix} $. Then $A^TA = \begin{bmatrix}5 & -3\\ -3 & 5\end{bmatrix}$, with eigenvalues $\lambda_1 = 8$ and $\lambda_2 = 2$, so the singular values are $\sigma_1 = 2\sqrt{2}$ and $\sigma_2 = \sqrt{2}$.
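The same numbers can be checked with NumPy's built-in SVD (a quick verification sketch):

```python
import numpy as np

A = np.array([[2.0, -2.0],
              [1.0,  1.0]])

U, s, Vt = np.linalg.svd(A)       # singular values returned in descending order
print(s)                          # [2.828..., 1.414...] = [2*sqrt(2), sqrt(2)]

# Squared singular values are the eigenvalues of A^T A
assert np.allclose(s**2, [8.0, 2.0])
# The factorization reconstructs A
assert np.allclose(U @ np.diag(s) @ Vt, A)
```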
Applications & Storage
We can write $A$ as a sum of rank-1 matrices: $$ A = \sigma_1 u_1 v_1^T + \sigma_2 u_2 v_2^T + \dots + \sigma_r u_r v_r^T $$
Compression (Truncated SVD)
Keep only the largest $k$ singular values. Storage drops from $m \times n$ to $k(1 + m + n)$. Great for image compression!
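A sketch of rank-$k$ truncation on a random matrix (the sizes here are arbitrary), showing the storage count and the fact that the spectral-norm error of the best rank-$k$ approximation is $\sigma_{k+1}$:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, k = 60, 40, 5
A = rng.standard_normal((m, n))

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Keep only the k largest singular values: A_k = sum_{i<=k} sigma_i u_i v_i^T
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

full_storage = m * n
truncated_storage = k * (1 + m + n)    # k sigmas, k u's (m each), k v's (n each)
print(full_storage, truncated_storage) # 2400 505

# Spectral-norm error of the truncation equals the first discarded singular value
err = np.linalg.norm(A - A_k, 2)
print(np.isclose(err, s[k]))           # True
```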
Other Uses
- Search Technology: Finding related documents/images.
- Clustering: Grouping similar data.
- Compression: Efficient image storage.
- Principal Axis: Finding main axes of solids.
- Summaries: Extracting representative tags.
- Graph Partitioning: Splitting graphs into subgraphs.
Interactive Playground
Use the tools below to experiment with the algorithms directly in your browser.
MATLAB / Octave (Online)
Run the exact MATLAB code from the lecture. This connects to a live Octave session.
Copy & Paste into Terminal:
Python (Client-Side)
Run the Power Method using NumPy right here. This runs 100% locally in your browser using WebAssembly.
🧠 Knowledge Check +20 XP
In the Power Method, what determines the rate of convergence?