Randomness

Probability, Monte Carlo, and Random Number Generators

December 1, 2008

Part 1: Probability Basics

Expected Value and Variance

Expected Value

Probability-weighted average of the possible values $x_j$ (with probabilities $p_j$):

$$ E[x] = \sum_{j=1}^{n}x_j p_j $$

Variance

Variation from the average:

$$ \sigma^2[x] = E[(x-E[x])^2] = E[x^2] - E[x]^2 $$

Example: Throwing a Die

Expected value: $E[x] = (1+2+\dots+6)/6 = 3.5$

Variance: $\sigma^2[x] = E[x^2] - E[x]^2 = \frac{91}{6} - 3.5^2 = \frac{35}{12} \approx 2.917$

Estimation

To estimate the expected value, draw a set of random values according to the probability distribution and average them:

$$ E[x] \approx \frac{1}{N} \sum_{i=1}^{N} x_i $$

Note: Bigger $N$ gives better estimates.

Die Example:

  • 3 rolls: $3, 1, 6 \rightarrow E[x] \approx (3+1+6)/3 = 3.33$
  • 9 rolls: $3, 1, 6, 2, 5, 3, 4, 6, 2 \rightarrow E[x] \approx 32/9 \approx 3.56$
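The rolling-average experiment above can be sketched in Python using the standard `random` module (the seed and sample sizes are arbitrary choices, not from the notes):

```python
import random

random.seed(0)  # fixed seed so the experiment is reproducible

def estimate_die_mean(n_rolls):
    """Estimate E[x] for a fair six-sided die by averaging n_rolls samples."""
    total = sum(random.randint(1, 6) for _ in range(n_rolls))
    return total / n_rolls

print(estimate_die_mean(3))       # small N: a rough estimate
print(estimate_die_mean(100000))  # large N: close to 3.5
```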

Law of Large Numbers

As $N \to \infty$, the error between the estimate and the expected value goes to zero with probability one; that is, the estimate converges to the correct value almost surely.

$$ P\left(E[x] = \lim_{N\rightarrow\infty} \frac{1}{N} \sum_{i=1}^{N} x_i\right) = 1 $$

Continuous Extensions

  • Expected value: $E[x] = \int_a^b x\rho(x)\,dx$
  • Expected value of a function: $E[g(x)] = \int_a^b g(x)\rho(x)\,dx$
  • Variance: $\sigma^2[x] = \int_a^b (x-E[x])^2\rho(x)\,dx$
  • Estimate: $E[g(x)] \approx \frac{1}{N} \sum_{i=1}^{N} g(x_i)$
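The estimate in the last line can be illustrated in Python for a case with a known answer: $g(x) = x^2$ with $\rho$ uniform on $[0,1]$, where $E[g(x)] = \int_0^1 x^2\,dx = 1/3$ (this example and the function name `mc_expectation` are ours):

```python
import random

random.seed(1)

def mc_expectation(g, n):
    """Monte Carlo estimate of E[g(x)] for x uniform on [0, 1)."""
    return sum(g(random.random()) for _ in range(n)) / n

est = mc_expectation(lambda x: x * x, 100000)
print(est)  # close to the exact value 1/3
```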

Quick Check

What happens to the error of the sample mean as $N \to \infty$?

Part 2: Monte Carlo Integration

2D Example: Computing $\pi$

Use the unit square $[0,1]^2$ with a quarter-circle.

$$ f(x,y) = \begin{cases} 1 & x^2 + y^2 \le 1 \\ 0 & \text{otherwise} \end{cases} $$

$$ A_{\text{quarter circle}} = \int_0^1\int_0^1 f(x,y)\,dx\,dy = \frac{\pi}{4} $$

Estimate the area by randomly evaluating $f(x,y)$:

$$ A_{quarter-circle} \approx \frac{1}{N}\sum_{i=1}^{N}f(x_i,y_i) $$

Therefore: $\pi \approx \frac{4}{N} \sum_{i=1}^{N}f(x_i,y_i)$

    input N
    sum = 0
    for i = 1:N
        draw (x_i, y_i) uniformly at random in [0, 1]^2
        sum = sum + f(x_i, y_i)
    end
    pi_estimate = 4*sum/N
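A minimal Python version of this hit-or-miss sampling (seed and sample size are arbitrary):

```python
import random

random.seed(2)

def estimate_pi(n):
    """Estimate pi as 4 times the fraction of uniform random points
    in the unit square that land inside the quarter circle."""
    hits = 0
    for _ in range(n):
        x, y = random.random(), random.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return 4.0 * hits / n

print(estimate_pi(100000))  # close to 3.14159
```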

The expected value of the error is $\mathcal{O}(\frac{1}{\sqrt{N}})$.

  • Convergence does not depend on the dimension $d$.
  • Deterministic quadrature degrades with dimension (e.g., the trapezoid rule with $N$ total points has error $\mathcal{O}(1/N^{2/d})$), so it becomes very expensive in higher dimensions.

Central Limit Theorem (CLT)

The distribution of an average of $N$ independent samples is close to normal, with standard deviation shrinking like $1/\sqrt{N}$, even when the distribution from which the samples are drawn is not normal.
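A quick numerical check of the CLT (the choices $N = 50$ and $M = 20000$ are ours): averages of uniform draws should have mean $1/2$ and standard deviation $\sqrt{1/12}/\sqrt{N} \approx 0.0408$.

```python
import random
import statistics

random.seed(3)

N = 50      # samples per average
M = 20000   # number of averages

# Each entry is the mean of N uniform draws on [0, 1). By the CLT these
# averages are approximately normal with mean 1/2 and standard deviation
# sqrt(1/12)/sqrt(N), even though the uniform distribution is not normal.
means = [sum(random.random() for _ in range(N)) / N for _ in range(M)]

print(statistics.mean(means))   # close to 0.5
print(statistics.stdev(means))  # close to 0.0408
```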

Stochastic Simulation

From M. Heath, Scientific Computing

Also known as Monte Carlo methods. Useful for:

  1. Systems that are inherently nondeterministic (stochastic).
  2. Deterministic problems too complicated to treat directly (e.g., high-dimensional integrals).

Requirements:

  1. Knowing which probability distributions are needed.
  2. Generating a sufficient supply of random numbers.

Part 3: Random Number Generators

Randomness & Repeatability

Randomness $\approx$ Unpredictability. Physical processes (coins, dice) are technically deterministic but practically unpredictable.

Alert: Computer algorithms for random number generation are deterministic.

These are called Pseudorandom. They are predictable and reproducible (which is good for debugging!).

>> rand('seed', 1234)
>> rand(10, 1)   % generates the same sequence every time

Properties of a Good RNG

Desirable properties (following Heath):

  • Random pattern: the output passes statistical tests of randomness.
  • Long period: the sequence repeats as late as possible.
  • Efficiency: fast execution, little storage.
  • Repeatability: the same seed reproduces the same sequence.
  • Portability: the same sequence is produced on different computers.

Linear Congruential Generators (LCG)

Congruential generators are of the form:

$$ x_k = (a x_{k-1} + c) \bmod m $$

Example:

Let $a=13, c=0, m=31, x_0=1$.

Sequence: $1, 13, 14, 27, 10, 6, \dots$

This is a permutation of integers $1 \dots 30$, so period is $m-1$.
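A quick Python check of this example (the function name `lcg` is ours):

```python
def lcg(a, c, m, x0, count):
    """Generate `count` values of the sequence x_k = (a*x_{k-1} + c) mod m."""
    x, out = x0, []
    for _ in range(count):
        x = (a * x + c) % m
        out.append(x)
    return out

seq = lcg(13, 0, 31, 1, 30)
print(seq[:6])        # [13, 14, 27, 10, 6, 16]
print(len(set(seq)))  # 30 distinct values: full period m - 1
print(seq[-1])        # back to the seed 1 after 30 steps
```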

History & Modern Methods

IBM SSP (1960s)

Used $a=65539, c=0, m=2^{31}$ (the infamous RANDU generator). Since $a = 2^{16}+3$, we have $a^2 \equiv 6a - 9 \pmod m$, so three successive values are exactly correlated:

$$ x_{k+2} = 6x_{k+1} - 9x_k \pmod m $$
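A short Python check of this relation, using the parameters $a = 65539$, $c = 0$, $m = 2^{31}$ stated above (seed values are arbitrary):

```python
M = 2**31

def randu(x):
    """One step of IBM's RANDU: x_{k+1} = 65539 * x_k mod 2^31."""
    return (65539 * x) % M

# Because 65539 = 2^16 + 3, a^2 = 6a - 9 (mod 2^31), so any three
# successive values satisfy x_{k+2} - 6*x_{k+1} + 9*x_k = 0 (mod 2^31).
for x0 in (1, 12345, 987654321):
    x1 = randu(x0)
    x2 = randu(x1)
    print((x2 - 6 * x1 + 9 * x0) % M)  # 0 for every seed
```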

Modern Methods

  1. Method of Marsaglia (period $\approx 2^{1430}$). Initialize $x_0, \dots, x_3$ and $c$, then iterate:

    s   = 2111111111*x_{n-4} + 1492*x_{n-3} + 1776*x_{n-2} + 5115*x_{n-1} + c
    x_n = s mod 2^32
    c   = floor(s / 2^32)
  2. Unix rand(): uses $a=1103515245, c=12345, m=2^{31}$.
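A Python sketch of the multiply-with-carry recurrence in item 1 (the seed values and the name `marsaglia_stream` are ours):

```python
def marsaglia_stream(seed_state, count):
    """Lag-4 multiply-with-carry recurrence:
    s   = 2111111111*x_{n-4} + 1492*x_{n-3} + 1776*x_{n-2} + 5115*x_{n-1} + c
    x_n = s mod 2^32,  c = floor(s / 2^32)

    seed_state: (x0, x1, x2, x3, c); returns `count` 32-bit integers.
    """
    x = list(seed_state[:4])
    c = seed_state[4]
    out = []
    for _ in range(count):
        s = (2111111111 * x[-4] + 1492 * x[-3]
             + 1776 * x[-2] + 5115 * x[-1] + c)
        x_n = s % 2**32
        c = s // 2**32
        x = x[1:] + [x_n]
        out.append(x_n)
    return out

vals = marsaglia_stream((123, 456, 789, 101112, 0), 10)
print(vals)
```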

Typical LCG Values

| Source | $m$ | $a$ | $c$ |
| --- | --- | --- | --- |
| Numerical Recipes | $2^{32}$ | 1664525 | 1013904223 |
| Borland C/C++ | $2^{32}$ | 22695477 | 1 |
| glibc (GCC) | $2^{32}$ | 1103515245 | 12345 |
| MS Visual C++ | $2^{32}$ | 214013 | 2531011 |
| Apple CarbonLib | $2^{31}-1$ | 16807 | 0 |

Advanced Generators

Fibonacci (Subtractive) Generator

Produce floating-point random numbers directly by taking differences of earlier values, e.g., with lags 17 and 5:

$$ x_k = x_{k-17} - x_{k-5} $$

(if the result is negative, add 1 to wrap it back into $[0,1)$).
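A Python sketch of this lag-(17, 5) subtractive generator, reading the second term as the lagged value $x_{k-5}$ and seeding the 17 initial lags with another generator (both choices are ours):

```python
import random

random.seed(4)

# Seed the 17 lagged values with another RNG, then iterate
# x_k = x_{k-17} - x_{k-5}, adding 1 whenever the result is negative
# so that every output stays in [0, 1).
state = [random.random() for _ in range(17)]

def fib_next():
    x = state[-17] - state[-5]
    if x < 0.0:
        x += 1.0
    state.append(x)
    del state[0]
    return x

samples = [fib_next() for _ in range(1000)]
print(samples[:3])
```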

Non-Uniform Distributions

If the cumulative distribution function is invertible, we can map uniform samples $x_k$ to non-uniform samples $y_k$.

Example (exponential density): $f(t) = \lambda e^{-\lambda t}$, with CDF $F(t) = 1 - e^{-\lambda t}$. Setting $x_k = F(y_k)$ and solving for $y_k$ gives

$$ y_k = -\frac{\log(1-x_k)}{\lambda} $$
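A Python sketch of this inverse-CDF mapping ($\lambda = 2$ and the sample count are arbitrary); the sample mean should approach the exponential mean $1/\lambda$:

```python
import math
import random

random.seed(5)

def exponential_sample(lam):
    """Map a uniform draw x in [0, 1) to an exponential draw
    via the inverse CDF: y = -log(1 - x) / lam."""
    x = random.random()
    return -math.log(1.0 - x) / lam

lam = 2.0
samples = [exponential_sample(lam) for _ in range(100000)]
print(sum(samples) / len(samples))  # close to 1/lam = 0.5
```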

Quasi-Random Sequences

True random samples exhibit clumping. Quasi-random (low-discrepancy) sequences mimic randomness while maintaining more uniform coverage (points avoid one another).
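One concrete low-discrepancy construction (not named in the notes; shown here only as an illustration) is the base-2 van der Corput sequence, which fills $[0,1)$ very evenly by reversing the binary digits of the index:

```python
def van_der_corput(n, base=2):
    """n-th element of the base-`base` van der Corput sequence:
    reverse the digits of n in the given base about the radix point."""
    q, denom = 0.0, 1.0
    while n > 0:
        denom *= base
        n, r = divmod(n, base)
        q += r / denom
    return q

points = [van_der_corput(i) for i in range(1, 9)]
print(points)  # [0.5, 0.25, 0.75, 0.125, 0.625, 0.375, 0.875, 0.0625]
```

Note how each new point lands in the largest gap left by the previous ones, rather than clumping the way independent uniform samples can.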

Final Challenge

Why are linear congruential generators (LCGs) sensitive to the choice of $a$ and $c$?