CSCI 2227, Introduction to Scientific Computation
Prof. Alvarez
IEEE 754 Floating Point Standard

The IEEE 754 standard describes a set of rules for representing floating point numbers such as -25.77 and 2.5*10^-29 in terms of strings of binary digits. This provides a finite precision model of real numbers for use in scientific computation.

Binary scientific notation

IEEE 754 is based on "normalized scientific notation" in base 2. The number to be represented is first converted to the form
mantissa * 2^exponent
where the mantissa has a value between 1 (inclusive) and 2 (exclusive). The mantissa and exponent are then expressed in binary positional notation (refer to the discussion in the first lecture). This yields the desired normalized scientific notation for the number.
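
As a rough illustration of the normalization step, the following Python sketch computes the mantissa and exponent of a positive number. The function name normalize is hypothetical, and Python is used only for illustration here. The sketch relies on the standard library function math.frexp, which returns a mantissa in the range [0.5, 1), so the result is rescaled to the range [1, 2) described above.

```python
import math

def normalize(x):
    """Return (mantissa, exponent) with x == mantissa * 2**exponent
    and 1 <= mantissa < 2. Assumes x > 0; the sign is handled separately."""
    if x <= 0:
        raise ValueError("this sketch only handles positive numbers")
    m, e = math.frexp(x)   # frexp gives x == m * 2**e with 0.5 <= m < 1
    return m * 2, e - 1    # rescale so that 1 <= mantissa < 2

mantissa, exponent = normalize(10 + 1/8)
print(mantissa, exponent)  # 1.265625 3
```

The mantissa and exponent are still ordinary decimal values at this point; expressing them in binary is a separate step.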

Example

In normalized base 2 scientific notation, the value 10+(1/8) is represented as
1.010001 * 2^011
The actual bit strings could change depending on the number of bits allocated for the mantissa and exponent. For instance, using 8 bits for each of these, we would have the following representation:
1.0100010 * 2^00000011
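
The bit strings in this example can be reproduced with a short Python sketch. The function name to_binary_sci and its parameters are hypothetical. The sketch assumes a mantissa in the range [1, 2) and a nonnegative exponent, and it truncates (rather than rounds) any mantissa bits that do not fit.

```python
def to_binary_sci(mantissa, exponent, mantissa_bits=8, exponent_bits=8):
    """Return the binary digit strings for a normalized mantissa and exponent.
    Assumes 1 <= mantissa < 2 and exponent >= 0; extra bits are truncated."""
    digits = []
    frac = mantissa - 1                  # part after the binary point
    for _ in range(mantissa_bits - 1):   # one bit is used by the leading 1
        frac *= 2
        bit = int(frac)                  # next binary digit, 0 or 1
        digits.append(str(bit))
        frac -= bit
    return "1." + "".join(digits), format(exponent, f"0{exponent_bits}b")

# 10 + 1/8 = 1.265625 * 2^3
print(to_binary_sci(1.265625, 3))        # ('1.0100010', '00000011')
print(to_binary_sci(1.265625, 3, 7, 3))  # ('1.010001', '011')
```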

Assembling the IEEE 754 representation

IEEE 754 comes in several different levels of precision: single (32 bits), double (64 bits), extended (usually 80 bits), and quadruple (128 bits). I will only discuss the double precision version here, as it is very widely used, and it is the version to which MATLAB defaults.

The double precision IEEE 754 representation of a number is broken down as follows:

sign (1 bit) | exponent (11 bits) | mantissa (52 bits)
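
To make the layout concrete, here is a minimal Python sketch that splits a double into these three fields. The function name double_fields is hypothetical. The sketch uses the standard struct module, whose ">d" format packs a value as a big-endian IEEE 754 double. The subtraction of 1023 in the usage lines reflects the exponent bias that the standard uses for double precision.

```python
import struct

def double_fields(x):
    """Return the (sign, exponent, mantissa) bit strings of x as an IEEE 754 double."""
    # Reinterpret the 8 bytes of the double as a 64-bit unsigned integer.
    (n,) = struct.unpack(">Q", struct.pack(">d", x))
    bits = format(n, "064b")
    return bits[0], bits[1:12], bits[12:]   # 1 + 11 + 52 bits

sign, exponent, mantissa = double_fields(10 + 1/8)
print(sign)                     # 0
print(exponent)                 # 10000000010
print(int(exponent, 2) - 1023)  # 3
print(mantissa[:8])             # 01000100
```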

Notes

Example

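We now find the IEEE 754 representation of the value 10 + (1/8) = 1.010001 * 2^011. The number is positive, so the sign bit is 0. The exponent 3 is stored with a bias of 1023 added, that is, as 1026, which is 10000000010 in 11 bits. The leading 1 of the mantissa is implicit and is not stored; the remaining bits 010001 are padded with zeros to fill 52 bits. We conclude that the double-precision IEEE 754 representation of 10 + (1/8) is: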
0 10000000010 01000100 + 44 more zeros
This would normally be partitioned 4 bits at a time and expressed in hexadecimal (base 16) notation, as follows:
0100 0000 0010 0100 0100 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 = 4 0 2 4 4 0 0 0 0 0 0 0 0 0 0 0
You can check the result in MATLAB by entering format hex and then evaluating 10 + 1/8.
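
The same check can be sketched in Python, assuming the standard struct module is available. The helper name double_to_hex is hypothetical; it packs the value as a big-endian IEEE 754 double and prints the 8 bytes as 16 hexadecimal digits.

```python
import struct

def double_to_hex(x):
    """Return the 16 hex digits of x stored as a big-endian IEEE 754 double."""
    return struct.pack(">d", x).hex()

print(double_to_hex(10 + 1/8))  # 4024400000000000
```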