Information Theory

Measuring uncertainty, knowledge, and the limits of compression

Claude Shannon

In 1948, Claude Shannon published "A Mathematical Theory of Communication", introducing the concept of Information Entropy.

He asked: How can we quantify "information"? How much "surprise" is in a message?

📡 Bit (Binary Digit)

The basic unit of information. The term was suggested by John Tukey and popularized by Shannon's 1948 paper.

🗳️ Entropy as Uncertainty

Entropy measures the uncertainty of an outcome.
High Entropy = High Uncertainty = High "Surprise"

  • Bucket 1 (4 Red, 0 Blue): Entropy = 0 bits
  • Bucket 2 (3 Red, 1 Blue): Entropy ≈ 0.81 bits
  • Bucket 3 (2 Red, 2 Blue): Entropy = 1 bit

📐 The Formula

\( H = - \sum_{i} p_i \log_2(p_i) \)
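For example, plugging Bucket 2's probabilities (3 Red, 1 Blue) into the formula:

\( H = -\left(0.75 \log_2 0.75 + 0.25 \log_2 0.25\right) \approx 0.311 + 0.500 \approx 0.81 \text{ bits} \)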

Interactive Calculator

Enter probabilities (comma separated, must sum to 1):
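If you want to reproduce the calculator offline, here is a minimal Python sketch; the function name `entropy` and the comma-separated input string are illustrative choices, not part of the page's actual widget:

```python
import math

def entropy(probs):
    """Shannon entropy H = -sum(p * log2 p), in bits.

    Terms with p = 0 contribute nothing (p * log2 p -> 0 as p -> 0).
    """
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    h = 0.0
    for p in probs:
        if p > 0:
            h -= p * math.log2(p)
    return h

# Mirrors the calculator input: comma-separated probabilities summing to 1.
raw = "0.75, 0.25"
print(f"H = {entropy([float(x) for x in raw.split(',')]):.2f} bits")  # H = 0.81 bits

# The three buckets from above:
print(entropy([1.0, 0.0]))    # 0.0   (4 Red, 0 Blue)
print(entropy([0.75, 0.25]))  # ~0.81 (3 Red, 1 Blue)
print(entropy([0.5, 0.5]))    # 1.0   (2 Red, 2 Blue)
```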

Binary Entropy Function

How does entropy change for a coin flip with probability \( p \) of Heads? Applying the formula to the two outcomes gives

\( H(p) = -p \log_2(p) - (1-p) \log_2(1-p) \)

For example, at \( p = 0.5 \), \( H(p) = 1.00 \) bit.

Notice: Entropy is maximized at \( p=0.5 \) (most uncertain) and 0 at \( p=0 \) or \( p=1 \) (certainty).
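A self-contained sketch of that curve in Python (the name `binary_entropy` and the sample points are illustrative):

```python
import math

def binary_entropy(p):
    """H(p) = -p*log2(p) - (1-p)*log2(1-p), with H(0) = H(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

for p in [0.0, 0.1, 0.25, 0.5, 0.75, 0.9, 1.0]:
    print(f"p = {p:.2f}  ->  H(p) = {binary_entropy(p):.2f} bits")
# The maximum (1.00 bits) is at p = 0.5; the endpoints give 0 bits.
```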

Connection to "20 Questions"

Entropy is the minimum average number of yes/no questions needed to identify an outcome (achieved exactly when every probability is a power of 1/2; in general it is a lower bound).

Example: Guessing a Letter (A, B, C, D)

If all are equally likely (p=0.25):

  • Q1: Is it A or B? (Yes/No)
  • Q2: If Yes, ask "Is it A?"; if No, ask "Is it C?"

Total Questions = 2 = \(\log_2(4)\)

What if A is very common?

P(A)=0.5, P(B)=0.25, P(C)=0.125, P(D)=0.125

We should ask "Is it A?" first!

Entropy = 0.5(1) + 0.25(2) + 0.125(3) + 0.125(3) = 1.75 bits

(Each term is \( p_i \times -\log_2 p_i \).) The strategy "Is it A?", then "Is it B?", then "Is it C?" uses 1, 2, 3, and 3 questions for A, B, C, and D respectively, so it averages 1.75 questions. Less than 2 questions on average!
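A quick check in Python that the entropy matches the average question count for the strategy above (the dictionaries below simply encode this example):

```python
import math

probs = {"A": 0.5, "B": 0.25, "C": 0.125, "D": 0.125}

# Entropy of the distribution, in bits.
H = -sum(p * math.log2(p) for p in probs.values())

# Questions asked per letter under "Is it A?" -> "Is it B?" -> "Is it C?"
# (after three "No" answers the letter must be D).
questions = {"A": 1, "B": 2, "C": 3, "D": 3}
avg_questions = sum(probs[x] * questions[x] for x in probs)

print(f"Entropy           = {H:.2f} bits")         # 1.75
print(f"Average questions = {avg_questions:.2f}")  # 1.75
```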

Interactive: Guess the Letter!

I have picked a letter based on the probabilities above (A is 50%, B is 25%, etc). Try to guess it!
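If you cannot run the interactive widget, here is a small Monte Carlo sketch of the same game (the seed and trial count are arbitrary); it shows the average number of questions landing near 1.75:

```python
import random

random.seed(0)
letters = ["A", "B", "C", "D"]
weights = [0.5, 0.25, 0.125, 0.125]

def questions_needed(secret):
    """Ask "Is it A?", "Is it B?", "Is it C?" in turn; count the questions."""
    for count, guess in enumerate(["A", "B", "C"], start=1):
        if secret == guess:
            return count
    return 3  # three "No" answers pin the letter down to D

trials = 100_000
total = sum(questions_needed(random.choices(letters, weights)[0])
            for _ in range(trials))
print(f"Average questions over {trials} games: {total / trials:.3f}")  # ~1.75
```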

📝 Test Your Understanding

Question 1:

Which coin toss has the HIGHEST entropy?
