Information Theory
Measuring uncertainty, knowledge, and the limits of compression
Claude Shannon
In 1948, Claude Shannon published "A Mathematical Theory of Communication", introducing the concept of Information Entropy.
He asked: How can we quantify "information"? How much "surprise" is in a message?
Bit (Binary Digit)
The term was suggested by John W. Tukey and first appeared in print in Shannon's 1948 paper.
🗳️ Entropy as Uncertainty
Entropy measures the uncertainty of an outcome.
High Entropy = High Uncertainty = High "Surprise"
Bucket 1
4 Red, 0 Blue
Entropy: 0 bits
Bucket 2
3 Red, 1 Blue
Entropy: ~0.81 bits
Bucket 3
2 Red, 2 Blue
Entropy: 1 bit
📐 The Formula
\( H = - \sum_{i} p_i \log_2(p_i) \)
where \( p_i \) is the probability of outcome \( i \), and \( H \) is measured in bits.
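Plugging Bucket 2 (3 red, 1 blue, so \( p_{\text{red}} = 0.75 \), \( p_{\text{blue}} = 0.25 \)) into the formula:

\[
H = -\left(0.75 \log_2 0.75 + 0.25 \log_2 0.25\right) \approx 0.311 + 0.5 \approx 0.81 \text{ bits}
\]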
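To compute entropy for any probability distribution, here is a minimal Python sketch (the name `entropy_bits` is illustrative):

```python
import math

def entropy_bits(probs):
    """Shannon entropy H = -sum(p * log2(p)), in bits.

    Zero-probability outcomes contribute nothing (0 * log2(0) -> 0).
    """
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return -sum(p * math.log2(p) for p in probs if p > 0)

# The three buckets from above:
print(entropy_bits([1.0, 0.0]))    # Bucket 1: 0.0 bits
print(entropy_bits([0.75, 0.25]))  # Bucket 2: ~0.811 bits
print(entropy_bits([0.5, 0.5]))    # Bucket 3: 1.0 bit
```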
Binary Entropy Function
How does entropy change for a coin flip with probability \( p \) of Heads?
Entropy \( H(p) = -p \log_2(p) - (1-p) \log_2(1-p) \); at \( p = 0.5 \), this gives 1.00 bit.
Notice: Entropy is maximized at \( p=0.5 \) (most uncertain) and 0 at \( p=0 \) or \( p=1 \) (certainty).
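A quick sketch of how \( H(p) \) behaves across a few values of \( p \) (self-contained Python; the name `binary_entropy` is illustrative):

```python
import math

def binary_entropy(p):
    """H(p) = -p*log2(p) - (1-p)*log2(1-p), with H(0) = H(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# Symmetric around p = 0.5, where it peaks at exactly 1 bit.
for p in [0.0, 0.1, 0.25, 0.5, 0.75, 0.9, 1.0]:
    print(f"p = {p:.2f}  ->  H(p) = {binary_entropy(p):.2f} bits")
```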
❓ Connection to "20 Questions"
Entropy is the minimum average number of yes/no questions needed to determine an outcome (achievable exactly when every probability is a power of 1/2, as in the examples below).
Example: Guessing a Letter (A, B, C, D)
If all four are equally likely (each \( p = 0.25 \)):
- Q1: Is it A or B?
- Q2: (If yes) Is it A? (If no) Is it C?
Total questions = 2 = \( \log_2(4) \)
What if A is very common?
P(A)=0.5, P(B)=0.25, P(C)=0.125, P(D)=0.125
We should ask "Is it A?" first!
Entropy = 1.75 bits
(fewer than 2 questions on average!)
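Asking about the most likely letter first takes 1 question for A, 2 for B, and 3 for C or D, so the expected number of questions matches the entropy exactly:

\[
E[\text{questions}] = 0.5(1) + 0.25(2) + 0.125(3) + 0.125(3) = 1.75
\]
\[
H = -\left(0.5 \log_2 0.5 + 0.25 \log_2 0.25 + 2 \cdot 0.125 \log_2 0.125\right) = 0.5 + 0.5 + 0.75 = 1.75 \text{ bits}
\]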
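As a sanity check on the 1.75 figure, here is a small, self-contained Python simulation of that strategy (the distribution is the one above; names like `questions_needed` are illustrative). Over many trials, the average question count converges to about 1.75:

```python
import random

LETTERS = ["A", "B", "C", "D"]
PROBS = [0.5, 0.25, 0.125, 0.125]

def questions_needed(secret):
    """Ask 'Is it A?', then 'Is it B?', then 'Is it C?' --
    the most-likely-first strategy from the example above."""
    if secret == "A":
        return 1          # answered on the first question
    if secret == "B":
        return 2
    return 3              # "Is it C?" resolves both C and D

random.seed(0)
trials = 100_000
total = sum(questions_needed(random.choices(LETTERS, PROBS)[0])
            for _ in range(trials))
print(f"average questions: {total / trials:.3f}")  # ~1.75
```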
📝 Test Your Understanding
Question 1:
Which coin toss has the HIGHEST entropy?