Numerical Representation

Floating Point & Errors

Understanding how computers store numbers and why $0.1 + 0.2 \neq 0.3$.

🎯 Objectives

  • 1 Look at floating point representation in its basic form.
  • 2 Expose errors of a different form: rounding error.
  • 3 Highlight the IEEE-754 standard.

Why this matters

Errors come in two forms: truncation error and rounding error.

  • We always have them.
  • Case study: Intel Pentium Bug.
  • Our job as developers: Reduce their impact.
1

Base Representations

We are familiar with base 10 representation. For example, $1234.1234$:

\[ 1234 = 4\times 10^0 + 3\times 10^1 + 2\times 10^2 + 1\times 10^3 \] \[ .1234 = 1\times 10^{-1} + 2\times 10^{-2} + 3\times 10^{-3} + 4\times 10^{-4} \]

General Base $\beta$

In a system with base $\beta$ (e.g., binary $\beta=2$, hex $\beta=16$), we represent a number as:

\[ (a_n\dots a_2a_1a_0.b_1b_2b_3b_4\dots)_{\beta} = \sum_{k=0}^{n}a_k\beta^{k} + \sum_{k=1}^{\infty} b_k \beta^{-k} \]
2

Conversion Algorithms

Integer to Binary (Base 10 → 2)

Repeatedly divide by 2 and keep the remainders.

Example: Convert $(11)_{10}$

  • $11 / 2 = 5$ R 1 ($a_0$)
  • $5 / 2 = 2$ R 1 ($a_1$)
  • $2 / 2 = 1$ R 0 ($a_2$)
  • $1 / 2 = 0$ R 1 ($a_3$)
Result: $(1011)_2$

Fraction to Binary (Base 10 → 2)

Repeatedly multiply by 2 and keep the integer part.

Example: Convert $0.625$

  • $0.625 \times 2 = \mathbf{1}.25$ ($b_1 = 1$)
  • $0.25 \times 2 = \mathbf{0}.5$ ($b_2 = 0$)
  • $0.5 \times 2 = \mathbf{1}.0$ ($b_3 = 1$)
Result: $(0.101)_2$

🧠 Quick Check +10 XP

Use Matlab commands to check conversions:

>> dec2bin(197) ans = 11000101 >> bin2dec('11000101') ans = 197

What is the binary representation of $(5)_{10}$?

3

The Precision Problem

Converting fractions doesn't always end cleanly. Consider the algorithm:

r_0 = x for k = 1, 2, ..., m if r_{k-1} >= 2^{-k} b_k = 1 r_k = r_{k-1} - 2^{-k} else b_k = 0 end end

The Infinite Series

For simple numbers like $\frac{1}{5} = 0.2$, we need an infinite number of binary digits:

$0.2 \rightarrow .0011\, 0011\, 0011\, \dots$

⚠️ This truncation is Roundoff Error in its most basic form.

4

Case Study: The Intel Pentium Bug

Chip Architecture
Pentium Logo
Newspaper Headlines

June 1994

Intel engineers discover a division error in the FPU. Managers decide it's minor and keep it internal.
Simultaneously: Dr. Nicely at Lynchburg College notices computation problems.

October 1994

After months of testing, Nicely confirms the bug is in the processor. He contacts Intel (Oct 24), but after no action, sends a public email (Oct 30).

FROM: Dr. Thomas R. Nicely Professor of Mathematics Lynchburg College 1501 Lakeside Drive Lynchburg, Virginia 24501-3199 Phone: 804-522-8374 Fax: 804-522-8499 Internet: nicely@acavax.lynchburg.edu TO: Whom it may concern RE: Bug in the Pentium FPU DATE: 30 October 1994 It appears that there is a bug in the floating point unit (numeric coprocessor) of many, and perhaps all, Pentium processors. In short, the Pentium FPU is returning erroneous values for certain division operations. For example, 0001/824633702441.0 is calculated incorrectly (all digits beyond the eighth significant digit are in error). This can be verified in compiled code, an ordinary spreadsheet such as Quattro Pro or Excel, or even the Windows calculator (use the scientific mode), by computing (824633702441.0)*(1/824633702441.0), which should equal 1 exactly (within some extremely small rounding error; in general, coprocessor results should contain 19 significant decimal digits). However, the Pentiums tested return 0.999999996274709702

November 1994

  • Nov 1: Phar Lap Software receives Nicely's email and alerts Microsoft, Borland, Watcom.
  • Nov 2: Email goes global.
  • Nov 15: USC reverse-engineers the chip to expose the problem. Intel denies severity. Stock falls.
  • Nov 22: CNN Moneyline interviews Intel; they claim the problem is minor.
  • Nov 23: The MathWorks develops a software fix.
  • Nov 24: NYT story. Intel still sending flawed chips.

December 1994

IBM halts shipments. Intel admits fault and sets aside $420 million to fix/replace chips.

5

Types of Numerical Bugs

Roundoff Error

Occurs when digits are lost due to limited memory (e.g., storing $0.3333...$ as $0.3333$).

Truncation Error

Occurs when discrete values/finite series are used to approximate a mathematical expression (e.g., stopping a Taylor series at term 3).

Uncertainty & Conditioning

Well-conditioned

Numerical results are insensitive to small variations in the input.

⚠️

Ill-conditioned

Small variations lead to drastically different numerical calculations.

6

Floating Point Representation

Normalized Floating-Point Form

\[ x = \pm(0.d_1d_2d_3\dots d_m)_{\beta} \times \beta^{e} \]

where $d_1 \neq 0$ for normalization.

Toy Example: 4-Bit System

Suppose we have 3 bits for mantissa ($d_1, d_2, d_3$) and 1 bit for exponent.

.$d_1$ $d_2$ $d_3$ $e_1$

Possible values range from $0.000_2 \times 2^{-1}$ to $0.111_2 \times 2^1$. This creates discrete values on the number line.

Discrete Points
Normalized Gaps

Underflow

Computations too close to zero. Often fall back to 0.

Overflow

Computations too large to represent. Considered a severe error.

🧠 Final Check +20 XP

Why do we use "Normalized" floating point numbers (where $d_1 \neq 0$)?

Interactive Computing
7

Interactive Playground

Experiment with number representation below. You can run MATLAB code via Octave Online or Python code directly in your browser.

Octave MATLAB / Octave (Online)

Copy & Paste into Terminal:

Python Python (Client-Side)

Output
Loading Python environment...