Monte Carlo Methods
Integration, Probability, Randomness
November 19, 2008
Why Monte Carlo?
- Central to PageRank (and many other applications in finance, science, informatics, etc.) is the idea of processing something randomly.
- What we want to know is "on average" what is likely to happen.
- What would happen if we had an infinite number of samples?
- Let's take a look at the integral (a discrete limit in a sense).
Integration Basics
The integral of a function $f$ over a domain $D$:
$$ I = \int_{x\in D} f(x)\,dx $$
The size of a domain:
$$ |D| = \int_{x\in D} 1\,dx $$
The average of a function over some domain:
$$ \bar{f} = \frac{1}{|D|}\int_{x\in D} f(x)\,dx $$
Integration Examples
1. Daily Snowfall (1D)
- Domain: Year (1D time interval)
- Variable: Day
- Function: Snowfall depending on day
2. Snowfall in Illinois (2D)
- Domain: Illinois (2D surface)
- Variable: $(x,y)$ location
- Function: Snowfall depending on location
3. Snowfall in Illinois Today (3D)
- Domain: Illinois $\times$ Year (3D space-time)
- Variables: Location and Day
- Function: Snowfall depending on location and day
Discrete Random Variables
- Random variable $x$
- Values: $x_1, x_2, \dots, x_n$
- Probabilities $p_1, p_2, \dots, p_n$ with $\sum_{i=1}^{n}p_i = 1$
Example: Throwing a Die
Values: $x_1=1, x_2=2, \dots, x_6=6$
Probabilities: $p_i = 1/6$
Expected Value and Variance
Expected Value ($E[x]$)
Average value of the variable:
$$ E[x] = \sum_{j=1}^{n}x_j p_j $$

Variance ($\sigma^2[x]$)
Variation from the average:
$$ \sigma^2[x] = E[(x-E[x])^2] = E[x^2] - E[x]^2 $$

Die Example Calculations
Expected Value: $E[x] = (1+2+\dots+6)/6 = 3.5$
Variance: $\sigma^2[x] = 35/12 \approx 2.92$
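These values follow directly from the definitions above; a quick sketch in Python (the variable names are illustrative):

```python
# Exact expected value and variance for a fair six-sided die.
values = [1, 2, 3, 4, 5, 6]
probs = [1 / 6] * 6

# E[x] = sum of x_j * p_j
expected = sum(x * p for x, p in zip(values, probs))

# Var[x] = E[x^2] - E[x]^2
expected_sq = sum(x**2 * p for x, p in zip(values, probs))
variance = expected_sq - expected**2

print(expected)  # ≈ 3.5
print(variance)  # 35/12 ≈ 2.9167
```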
Quick Check
If you roll a fair 6-sided die, what is the probability of rolling a 3?
Estimated $E[x]$
To estimate the expected value, choose a set of random values based on the probability and average the results:
$$ E[x] \approx \frac{1}{N} \sum_{i=1}^{N} x_i $$

Bigger $N$ gives better estimates.
Die Example: Sampling
- 3 rolls: $3, 1, 6 \rightarrow E[x] \approx (3+1+6)/3 = 3.33$
- 9 rolls: $3, 1, 6, 2, 5, 3, 4, 6, 2 \rightarrow E[x] \approx 32/9 \approx 3.56$
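Sampling like this is easy to automate; a minimal sketch (the seed and sample count are arbitrary choices for reproducibility):

```python
import random

random.seed(0)  # fixed seed so the run is reproducible

# Estimate E[x] for a fair die by averaging N random rolls.
N = 100_000
rolls = [random.randint(1, 6) for _ in range(N)]
estimate = sum(rolls) / N

print(estimate)  # close to 3.5 for large N
```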
Law of Large Numbers
As $N \to \infty$, the error between the estimate and the expected value goes to zero with probability one. That is, the estimate converges to the correct value.
Continuous Random Variable
- Random variable: $x$
- Values: $x \in [a, b]$
- Probability density function $\rho(x)$ with $\int_a^b \rho(x)\,dx = 1$
Uniformly Distributed
$\rho(x)$ is constant.
$\int_a^b \rho(x)\,dx = 1 \implies \rho(x) = 1/(b-a)$
Extensions
Expected Value
$$ E[g(x)] = \int_a^b g(x)\rho(x)\,dx $$

Estimate
$$ E[g(x)] \approx \frac{1}{N} \sum_{i=1}^{N} g(x_i) $$

Multidimensional Extension
For difficult domains (complex geometries):
$$ E[g(x)] = \int_{x\in D} g(x)\rho(x)\,dA_x $$

Numerical Integration
Deterministic
- Split domain into set of fixed segments.
- Sum function values with size of segments (Riemann!).
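The deterministic approach can be sketched as a midpoint Riemann sum; here the test integrand $\int_0^1 x^2\,dx = 1/3$ and the segment count are illustrative choices:

```python
# Deterministic (midpoint) Riemann sum: split [a, b] into n equal segments
# and sum f evaluated at each segment's midpoint, weighted by segment size.
def midpoint_rule(f, a, b, n):
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

approx = midpoint_rule(lambda x: x * x, 0.0, 1.0, 1000)
print(approx)  # ≈ 1/3
```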
Random (Monte Carlo) Algorithm
For a sequence of uniformly distributed random samples $x_1, \dots, x_n \in [0, 1]$:
$$ \int_0^1 f(x)\,dx \approx \frac{1}{n} \sum_{i=1}^{n} f(x_i) $$

Example: Computing $\pi$
Use the unit square $[0, 1]^2$ with a quarter-circle.
$$ f(x,y) = \begin{cases} 1 & x^2 + y^2 \le 1 \\ 0 & \text{otherwise} \end{cases} $$

$$ A_{\text{quarter-circle}} = \int_0^1\int_0^1 f(x,y)\,dx\,dy = \frac{\pi}{4} $$

We estimate the area by randomly evaluating $f(x,y)$:
$$ A_{\text{quarter-circle}} \approx \frac{1}{N}\sum_{i=1}^{N}f(x_i,y_i) $$

$$ \pi \approx \frac{4}{N} \sum_{i=1}^{N}f(x_i,y_i) $$

Convergence & Error
The expected value of the error is $\mathcal{O}\left(\frac{1}{\sqrt{N}}\right)$.
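A direct sketch of this estimator (the seed and sample count are illustrative):

```python
import random

random.seed(1)

# Monte Carlo estimate of pi: sample uniform points in the unit square
# and count the fraction that land inside the quarter circle x^2 + y^2 <= 1.
N = 100_000
inside = 0
for _ in range(N):
    x, y = random.random(), random.random()
    if x * x + y * y <= 1.0:
        inside += 1

pi_estimate = 4 * inside / N
print(pi_estimate)  # ≈ 3.14; the error shrinks like 1/sqrt(N)
```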
Key Takeaways
- Convergence does not depend on dimension.
- Deterministic integration is very hard in higher dimensions.
- Deterministic integration is very hard for complicated domains.
Central Limit Theorem (CLT)
The distribution of an average is close to being normal, even when the distribution from which the average is computed is not normal.
- Let $x_1, \dots, x_n$ be independent random variables drawn from any PDF with mean $\mu$ and standard deviation $\sigma$.
- Consider the sum $S_n = x_1 + \dots + x_n$.
- Its expected value is $n\mu$ and its standard deviation is $\sigma \sqrt{n}$.
- The term $\frac{S_n - n\mu}{\sigma\sqrt{n}}$ approaches the standard normal distribution as $n \to \infty$.
- Implication: the sample mean has an error of order $\sigma/\sqrt{n}$.
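The $\sigma/\sqrt{n}$ scaling can be checked empirically; a sketch using uniform $[0,1]$ variables, for which $\sigma = 1/\sqrt{12}$ (the sample sizes are arbitrary):

```python
import math
import random
import statistics

random.seed(2)

# Empirically check that the standard deviation of the sample mean
# of n uniform [0, 1] variables is close to sigma / sqrt(n).
n, trials = 100, 10_000
sample_means = [
    sum(random.random() for _ in range(n)) / n
    for _ in range(trials)
]

empirical = statistics.stdev(sample_means)
predicted = (1 / math.sqrt(12)) / math.sqrt(n)  # sigma / sqrt(n)
print(empirical, predicted)  # the two should be close
```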
Now What? Minimizing Noise
How does one minimize the noise in this random process?
- Pick better samples!
- Use more samples where the impact is high: where $f$ is large.
So pick a distribution similar to $f$:
$$ p_{\text{optimal}} \propto f(x) $$

Implementation Warning
Normalizing $p_{\text{optimal}}$ exactly would require knowing $\int f$, which is the very quantity we are trying to compute, so in practice one samples from a density that only approximates the shape of $f$.
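A hedged sketch of this idea (importance sampling) for the illustrative integral $\int_0^1 x^2\,dx = 1/3$: sampling from the density $p(x) = 3x^2$, which is exactly proportional to the integrand (drawn via the inverse CDF of $F(x) = x^3$, i.e. $x = u^{1/3}$), makes every weighted sample $f(x)/p(x)$ equal to $1/3$, so the estimator has essentially zero variance, while plain uniform sampling is noisy. The function and density here are chosen for illustration only.

```python
import random

random.seed(3)

def f(x):
    return x * x  # integrand; the true integral over [0, 1] is 1/3

def p(x):
    return 3 * x * x  # sampling density proportional to f

N = 10_000

# Uniform sampling: (1/N) * sum f(x_i), with x_i ~ U[0, 1].
uniform_est = sum(f(random.random()) for _ in range(N)) / N

# Importance sampling: draw x = u^(1/3) via the inverse CDF of
# F(x) = x^3, then weight each sample by f(x) / p(x).
samples = [max(random.random(), 1e-16) ** (1 / 3) for _ in range(N)]  # guard u == 0
importance_est = sum(f(x) / p(x) for x in samples) / N

print(uniform_est)     # ≈ 1/3, with noise
print(importance_est)  # ≈ 1/3, essentially no variance: each weight is 1/3
```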
Final Challenge
How does the error of Monte Carlo integration scale with the number of samples $N$?