ECE 313 - Computer Organization
Floating Point
Feb 2005
Reading: 3.6-3.9
HW Due Monday 3/28: 3.7, 3.9, 3.10, 3.14, 3.30, 3.35, 3.37, 3.38, 3.40, 3.44
EXAM 1: Wednesday 3/30
Why did the Ariane 5 Explode? (image source: java.sun.com)
Outline - Floating Point
• Motivation and Key Ideas
• IEEE 754 Floating Point Format
• Range and precision
• Floating Point Arithmetic
• MIPS Floating Point Instructions
• Rounding & Errors
• Summary
Floating Point - Motivation
Review: n-bit integer representations
• Unsigned: $0$ to $2^n - 1$
• Signed two’s complement: $-2^{n-1}$ to $2^{n-1} - 1$
• Biased (excess-b): $-b$ to $2^n - 1 - b$
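As a quick check of these ranges, here is a small Python sketch. `integer_ranges` is a hypothetical helper name. The biased row assumes the stored field uses all $2^n$ patterns, so a stored value $v$ represents $v - b$.

```python
def integer_ranges(n: int, b: int) -> dict:
    """Return (min, max) for the n-bit integer encodings reviewed above."""
    return {
        "unsigned": (0, 2**n - 1),
        "twos_complement": (-(2**(n - 1)), 2**(n - 1) - 1),
        # Biased: stored value v in [0, 2^n - 1] represents v - b.
        "biased": (-b, 2**n - 1 - b),
    }

print(integer_ranges(8, 127))
# {'unsigned': (0, 255), 'twos_complement': (-128, 127), 'biased': (-127, 128)}
```

With n = 8 and b = 127 (the single-precision exponent bias used later), the biased range runs from -127 to 128.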
Problem: how do we represent:
• Very large numbers: 9,345,524,282,135,672, $2^{354}$
• Very small numbers: 0.00000000000000005216, $2^{-100}$
• Rational numbers: 2/3
• Irrational numbers: $\sqrt{2}$
• Transcendental numbers: $e$, $\pi$
Fixed Point Representation
Idea: fixed-point numbers with fractions
• Decimal point (binary point) marks the start of the fraction
Decimal: $1.2503 = 1 \times 10^0 + 2 \times 10^{-1} + 5 \times 10^{-2} + 3 \times 10^{-4}$
Binary: $1.0100001_{two} = 1 \times 2^0 + 1 \times 2^{-2} + 1 \times 2^{-7}$
Problems:
• Limited locations for the “decimal point” (“binary point”)
• Won’t work for very small or very large numbers
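The binary expansion above can be checked exactly with Python's `fractions` module. `fixed_point_value` is a hypothetical helper name. It gives bit i after the point the weight $2^{-i}$.

```python
from fractions import Fraction

def fixed_point_value(bits: str) -> Fraction:
    """Evaluate a binary fixed-point string such as "1.0100001" exactly."""
    int_part, _, frac_part = bits.partition(".")
    value = Fraction(int(int_part, 2)) if int_part else Fraction(0)
    for i, digit in enumerate(frac_part, start=1):
        if digit == "1":
            value += Fraction(1, 2**i)   # digit i after the point weighs 2^-i
    return value

print(fixed_point_value("1.0100001"))   # 161/128, i.e. 1.2578125
```

With a fixed 7-bit fraction, the smallest positive value is `fixed_point_value("0.0000001")` = 1/128. A number like $2^{-100}$ therefore cannot be represented at all, which is the "very small numbers" problem.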
Another Approach: Scientific Notation
Represent a number as a combination of:
• Mantissa (significand): a normalized number
• AND an exponent (base 10)
Example: $6.02 \times 10^{23}$, where 6.02 is the significand (mantissa), 10 is the radix (base), and 23 is the exponent
Floating Point
Key idea: adapt scientific notation to binary
• Fixed-width binary number for the significand
• Fixed-width binary number for the exponent (base 2)
Idea: represent a number as $1.xxxxxxx_{two} \times 2^{yyyy}$, where the leading ‘1’ is implicit, xxxxxxx is the significand (mantissa), 2 is the radix, and yyyy is the exponent
Important points:
• This is a tradeoff between precision and range
• Arithmetic is approximate: error is inevitable!
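Python's `math.frexp` returns a significand in [0.5, 1), so a minimal sketch of the $1.xxx_{two} \times 2^{y}$ form only needs to rescale by 2. `binary_scientific` is a hypothetical name, and it assumes a nonzero finite input.

```python
import math

def binary_scientific(x: float) -> tuple[float, int]:
    """Return (significand, exponent) with x == significand * 2**exponent and
    1 <= |significand| < 2, for nonzero finite x."""
    m, e = math.frexp(x)      # frexp gives 0.5 <= |m| < 1
    return m * 2, e - 1       # rescale to the 1.xxx form

sig, exp = binary_scientific(8.75)
print(sig, exp)               # 1.09375 3
```

Here 1.09375 is $1.00011_{two}$, so 8.75 is $1.00011_{two} \times 2^3$.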
IEEE 754 Floating Point
Single precision (C/C++/Java float type), 32 bits:
S (1 bit) | E exponent (8 bits) | F significand (23 bits)
Value $N = (-1)^S \times 1.F \times 2^{E-127}$, where 127 is the exponent bias
Double precision (C/C++/Java double type), 64 bits:
S (1 bit) | E exponent (11 bits) | F significand (20 bits in the first 32-bit word, continued in the second word; 52 bits total)
Value $N = (-1)^S \times 1.F \times 2^{E-1023}$, where 1023 is the exponent bias
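A minimal Python sketch of the single-precision layout. It uses `struct` to get the 32-bit pattern of a value rounded to float32, then masks out S, E and F. The helper names are hypothetical. The value formula applies to normalized numbers only (E from 1 to 254).

```python
import struct

def float32_fields(x: float) -> tuple[int, int, int]:
    """Split x, rounded to single precision, into its (S, E, F) bit fields."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    s = bits >> 31               # 1 sign bit
    e = (bits >> 23) & 0xFF      # 8-bit biased exponent
    f = bits & 0x7FFFFF          # 23-bit fraction (leading 1 not stored)
    return s, e, f

def float32_value(s: int, e: int, f: int) -> float:
    """N = (-1)^S x 1.F x 2^(E-127), valid for normalized numbers (1 <= E <= 254)."""
    return (-1) ** s * (1 + f / 2**23) * 2.0 ** (e - 127)

print(float32_fields(-0.75))     # (1, 126, 4194304)
```

Here -0.75 is $-1.1_{two} \times 2^{-1}$. That gives S = 1, E = -1 + 127 = 126, and F with only its top bit set ($2^{22}$ = 4194304).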
Floating Point Examples
$8.75_{ten} = 1 \times 2^3 + 1 \times 2^{-1} + 1 \times 2^{-2} = 1.00011_{two} \times 2^3$
Single precision:
• Significand: 1.00011000… (note: the leading 1 is implied)
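The three fields of 8.75 can also be assembled by hand. This sketch builds the word from S, E = 3 + 127 and the fraction bits after the implied "1.". It then uses `struct` only to cross-check the result against the platform's own float32 encoding.

```python
import struct

# 8.75 = 1.00011two x 2^3
S = 0
E = 3 + 127                      # biased exponent: 130 = 0b10000010
F = 0b00011 << (23 - 5)          # fraction bits after the implied "1.", padded to 23 bits
bits = (S << 31) | (E << 23) | F

print(f"{bits:032b}")            # S | EEEEEEEE | FFFFFFFFFFFFFFFFFFFFFFF
print(hex(bits))                 # 0x410c0000

# Cross-check against the platform's single-precision encoding.
assert bits == struct.unpack(">I", struct.pack(">f", 8.75))[0]
```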
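All-zeros exponent (00000000 in single precision):
• Reserved for the value zero when the significand is also zero (all bits zero)
• “Denormalized numbers” (nonzero significand): drop the implied “1.”, so $N = (-1)^S \times 0.F \times 2^{-126}$
• Used for “very small” numbers … “gradual underflow”
• Smallest denormalized number (single precision): $0.00000000000000000000001_{two} \times 2^{-126} = 2^{-23} \times 2^{-126} = 2^{-149}$
All-ones exponent (11111111 in single precision):
• Infinity: all-ones exponent, zero significand
• NaN (Not a Number): all-ones exponent, nonzero significand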
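The special cases depend only on the exponent field and on whether the significand is zero. A minimal classifier over 32-bit patterns could look like this (the function names are hypothetical). It also confirms that the pattern 0x00000001 is the denormal $2^{-149}$.

```python
import struct

def classify_float32(bits: int) -> str:
    """Classify a 32-bit IEEE 754 single-precision pattern by its exponent field."""
    e = (bits >> 23) & 0xFF
    f = bits & 0x7FFFFF
    if e == 0:
        return "zero" if f == 0 else "denormalized"   # no implied leading 1
    if e == 0xFF:
        return "infinity" if f == 0 else "NaN"
    return "normalized"

def float32_from_bits(bits: int) -> float:
    """Reinterpret a 32-bit pattern as a single-precision value."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]

smallest = 0x00000001                     # 0.000...01two x 2^-126
print(classify_float32(smallest), float32_from_bits(smallest) == 2.0 ** -149)
# denormalized True
```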
Floating Point Range and Precision
The tradeoff: range in exchange for uniformity
“Tiny” example: floating point with:
Visualizing Floating Point - “Small” FP Representation
8-bit floating point representation:
• The sign bit is in the most significant bit.
• The next four bits are the exponent, with a bias of 7.
• The last three bits are the frac.
Same general form as the IEEE format (normalized, denormalized, representation of 0, NaN, infinity)
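To see the uneven spacing, here is a sketch that decodes every code of this 8-bit format. It assumes the same conventions as IEEE: denormals use exponent 1 - bias with no implied 1, and the all-ones exponent means infinity or NaN. `decode_tiny` is a hypothetical name. `Fraction` keeps the values exact.

```python
from fractions import Fraction

BIAS = 7  # 4-bit exponent field with a bias of 7

def decode_tiny(byte: int):
    """Decode an 8-bit float laid out as sign (1) | exponent (4) | frac (3)."""
    s = (byte >> 7) & 0x1
    e = (byte >> 3) & 0xF
    f = byte & 0x7
    sign = -1 if s else 1
    if e == 0xF:                     # all-ones exponent: infinity or NaN
        return sign * float("inf") if f == 0 else float("nan")
    if e == 0:                       # denormalized: no implied 1, exponent fixed at 1 - bias
        return sign * Fraction(f, 8) * Fraction(2) ** (1 - BIAS)
    return sign * (1 + Fraction(f, 8)) * Fraction(2) ** (e - BIAS)

# All non-negative finite codes, 0x00 (zero) up to 0x77 (largest normalized).
values = [decode_tiny(b) for b in range(0x00, 0x78)]
gaps = {values[i + 1] - values[i] for i in range(len(values) - 1)}
print(min(gaps), max(gaps), max(values))   # 1/512 16 240
```

The gap between neighbouring values grows from 1/512 near zero to 16 near the top of the range. That is the "range in exchange for uniformity" tradeoff.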
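Description | Exp | Frac | Value | Single | Double
Smallest pos. denormalized | 00…00 | 00…01 | $2^{-\{23,52\}} \times 2^{-\{126,1022\}}$ | $1.4 \times 10^{-45}$ | $4.9 \times 10^{-324}$
Largest denormalized | 00…00 | 11…11 | $(1.0 - \varepsilon) \times 2^{-\{126,1022\}}$ | $1.18 \times 10^{-38}$ | $2.2 \times 10^{-308}$
Smallest pos. normalized | 00…01 | 00…00 | $1.0 \times 2^{-\{126,1022\}}$ | just larger than largest denormalized | just larger than largest denormalized
One | 01…11 | 00…00 | $1.0$ | $1.0$ | $1.0$
Largest normalized | 11…10 | 11…11 | $(2.0 - \varepsilon) \times 2^{\{127,1023\}}$ | $3.4 \times 10^{38}$ | $1.8 \times 10^{308}$
Here $\varepsilon = 2^{-\{23,52\}}$ for single and double precision, respectively.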
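Each row of the table can be reproduced by reinterpreting its bit pattern, with the sign bit set to 0. This Python sketch uses `struct` for the reinterpretation. The helper names are hypothetical.

```python
import math
import struct
import sys

def f32_from_bits(bits: int) -> float:
    """Reinterpret a 32-bit pattern as IEEE 754 single precision."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]

def f64_from_bits(bits: int) -> float:
    """Reinterpret a 64-bit pattern as IEEE 754 double precision."""
    return struct.unpack(">d", struct.pack(">Q", bits))[0]

# Single precision: the table's exponent/fraction patterns as full 32-bit words (sign = 0).
single = {
    "smallest pos. denorm": f32_from_bits(0x00000001),  # 2^-23 x 2^-126
    "largest denorm":       f32_from_bits(0x007FFFFF),  # (1 - 2^-23) x 2^-126
    "smallest pos. normal": f32_from_bits(0x00800000),  # 1.0 x 2^-126
    "one":                  f32_from_bits(0x3F800000),  # 1.0
    "largest normal":       f32_from_bits(0x7F7FFFFF),  # (2 - 2^-23) x 2^127
}
# Double precision: same patterns widened to an 11-bit exponent and 52-bit fraction.
double = {
    "smallest pos. denorm": f64_from_bits(0x0000000000000001),
    "largest denorm":       f64_from_bits(0x000FFFFFFFFFFFFF),
    "smallest pos. normal": f64_from_bits(0x0010000000000000),
    "one":                  f64_from_bits(0x3FF0000000000000),
    "largest normal":       f64_from_bits(0x7FEFFFFFFFFFFFFF),
}
for name in single:
    print(f"{name:22s} single {single[name]:.3g}  double {double[name]:.3g}")
```

In CPython, `sys.float_info.min` and `sys.float_info.max` report the double-precision "smallest pos. normalized" and "largest normalized" entries directly.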
Floating Point Addition (Fig. 3.16)
1. Align binary point to number with larger exponent
2. Add significands
3. Normalize result and adjust exponent
4. If overflow/underflow, throw an exception
5. Round result (go to 3 if normalization needed again)
Example:
A = $1.11_{two} \times 2^0$ = 1.75
B = $1.00_{two} \times 2^{-2}$, aligned to $0.01_{two} \times 2^0$ = 0.25
A + B = $10.00_{two} \times 2^0$
(Normalize) $1.00_{two} \times 2^1$ = 2.00
Hardware - Fig. 3.17, p. 201
Feb 2005 Floating Point 23
Addition flowchart:
1. Compare the exponents of the two numbers. Shift the smaller number to the right until its exponent matches the larger exponent.
2. Add the significands.
3. Normalize the sum, either shifting right and incrementing the exponent or shifting left and decrementing the exponent.
   Overflow or underflow? If so, raise an exception.
4. Round the significand to the appropriate number of bits.
   Still normalized? If not, go back to step 3.
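The addition steps above can be sketched in Python on a toy format with 3-bit significands (1.xx), matching the 1.75 + 0.25 example. Several assumptions apply. Operands are positive and normalized, and each is given as a hypothetical (significand, exponent) pair. The step-4 overflow check is omitted. Three extra low-order bits stand in for guard/round/sticky, and rounding is to nearest even.

```python
from fractions import Fraction

P = 3      # significand width including the leading 1 (1.xx), as in the example above
EXTRA = 3  # extra low-order bits kept while aligning (guard/round/sticky, simplified)

def to_value(sig: int, exp: int) -> Fraction:
    """Value of a (significand, exponent) pair: sig is read as 1.xx, i.e. sig * 2^-(P-1)."""
    return Fraction(sig) * Fraction(2) ** (exp - (P - 1))

def fp_add(a: tuple[int, int], b: tuple[int, int]) -> tuple[int, int]:
    """Add two positive normalized numbers (2^(P-1) <= sig < 2^P). Ignores overflow."""
    (sa, ea), (sb, eb) = sorted([a, b], key=lambda t: t[1], reverse=True)
    # Step 1: align the binary point to the number with the larger exponent.
    sa, sb = sa << EXTRA, sb << EXTRA
    shift = ea - eb
    sticky = 1 if sb & ((1 << shift) - 1) else 0    # did any 1s fall off the right?
    sb = (sb >> shift) | sticky
    # Step 2: add significands.
    s, e = sa + sb, ea
    # Step 3: normalize; a sum of two positives can carry out by at most one bit.
    if s >> (P + EXTRA):
        s, e = (s >> 1) | (s & 1), e + 1
    # Step 5: round to nearest even, dropping the EXTRA bits.
    q, r = s >> EXTRA, s & ((1 << EXTRA) - 1)
    half = 1 << (EXTRA - 1)
    if r > half or (r == half and q & 1):
        q += 1
    if q >> P:                      # rounding carried to 10.0: renormalize (back to step 3)
        q, e = q >> 1, e + 1
    return q, e

result = fp_add((0b111, 0), (0b100, -2))   # 1.11 x 2^0 + 1.00 x 2^-2
print(result, to_value(*result))            # (4, 1) 2
```

The result (0b100, 1) is $1.00_{two} \times 2^1$ = 2.00, as in the worked example.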
Floating Point Multiplication (Fig. 3.18)
1. Add 2 exponents together to get new exponent (subtract 127 to get proper biased value)
2. Multiply significands
3. Normalize result if necessary (shift right) & adjust exponent
4. If overflow/underflow, throw an exception
5. Round result (go to 3 if normalization needed again)
6. Set the sign of the result from the signs of the operands X and Y: negative if exactly one is negative
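A minimal Python sketch of these steps on float32 bit patterns. It assumes both inputs are normalized and the product stays in the normalized range, so step 4 is left out. Rounding is to nearest even. `fp32_mul` and `bits_of` are hypothetical names. `struct` is used only to obtain bit patterns.

```python
import struct

def bits_of(x: float) -> int:
    """Single-precision bit pattern of x (x is rounded to float32 first)."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

def fp32_mul(x_bits: int, y_bits: int) -> int:
    """Multiply two float32 bit patterns following the steps above.

    Assumes normalized inputs and a normalized result (no step-4 check).
    """
    sx, ex, fx = x_bits >> 31, (x_bits >> 23) & 0xFF, x_bits & 0x7FFFFF
    sy, ey, fy = y_bits >> 31, (y_bits >> 23) & 0xFF, y_bits & 0x7FFFFF
    # Step 1: add biased exponents, then subtract 127 so only one bias remains.
    e = ex + ey - 127
    # Step 2: multiply 24-bit significands (restore the implicit leading 1s).
    prod = (fx | 1 << 23) * (fy | 1 << 23)      # value = prod * 2^-46, in [1, 4)
    # Step 3: normalize; if the product is >= 2, drop one more bit and bump e.
    shift = 23
    if prod >> 47:
        shift, e = 24, e + 1
    # Step 5: round to nearest even while dropping `shift` low-order bits.
    q, r = prod >> shift, prod & ((1 << shift) - 1)
    half = 1 << (shift - 1)
    if r > half or (r == half and q & 1):
        q += 1
    if q >> 24:                                 # rounding reached 2.0: renormalize
        q, e = q >> 1, e + 1
    # Step 6: the sign is negative iff exactly one operand is negative.
    return ((sx ^ sy) << 31) | (e << 23) | (q & 0x7FFFFF)

print(hex(fp32_mul(bits_of(1.5), bits_of(-2.5))), hex(bits_of(-3.75)))
# 0xc0700000 0xc0700000
```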
MIPS Floating Point Instructions
Organized as a coprocessor:
• Separate registers $f0-$f31
• Separate operations
• Separate data transfer (to the same memory)
Basic operations:
• add.s (single), add.d (double)
• sub.s (single), sub.d (double)
• mul.s (single), mul.d (double)
• div.s (single), div.d (double)
MIPS Floating Point Instructions (cont’d)
Data transfer: lwc1, swc1 (l.s, s.s) - load/store a float to/from an fp register
Compare: set the condition bit if true
• bc1t - branch if condition true
• bc1f - branch if condition false
Rounding
Extra bits allow rounding after computation:
• Guard digit: may shift into the number during normalization
• Round digit: used to round when the guard bit is shifted in during normalization
• Sticky bit: set when there are 1s anywhere to the right of the round digit, e.g., “0.010000001”; this lets round to nearest even tell an exact halfway case from a value just above halfway
IEEE 754 supports four rounding modes:
• Always round up (toward +∞)
• Always round down (toward -∞)
• Truncate (toward zero)
• Round to nearest even (most common)
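A sketch of the four modes applied to a sign-magnitude significand in Python. After normalization, two facts about the dropped bits decide the result: the first dropped bit (round bit) and whether any later dropped bit is 1 (sticky). `round_magnitude` and its mode strings are hypothetical names. If rounding carries out, the caller is assumed to renormalize.

```python
def round_magnitude(sign: int, mag: int, drop: int, mode: str) -> int:
    """Drop the `drop` (>= 1) low-order bits of a sign-magnitude significand."""
    keep = mag >> drop
    round_bit = (mag >> (drop - 1)) & 1                     # first dropped bit
    sticky = 1 if mag & ((1 << (drop - 1)) - 1) else 0      # any 1s after it?
    inexact = round_bit or sticky
    if mode == "nearest_even":
        # Above halfway, or exactly halfway with an odd kept value: round up.
        if round_bit and (sticky or keep & 1):
            keep += 1
    elif mode == "up":            # toward +infinity
        if inexact and sign == 0:
            keep += 1
    elif mode == "down":          # toward -infinity
        if inexact and sign == 1:
            keep += 1
    elif mode != "truncate":      # truncate: toward zero, simply drop the bits
        raise ValueError(f"unknown rounding mode: {mode}")
    return keep

# 1011.100two is exactly halfway between 1011 and 1100.
print(round_magnitude(0, 0b1011100, 3, "nearest_even"))  # 12 (0b1100, even)
print(round_magnitude(0, 0b1010100, 3, "nearest_even"))  # 10 (0b1010, even)
print(round_magnitude(0, 0b1010001, 3, "nearest_even"))  # 10: sticky alone is below halfway
```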
Limitations on Floating-Point Math
• Most numbers are approximate; roundoff error is inevitable
• Range (and accuracy) vary depending on the exponent
• “Normal” math properties are not guaranteed; each of these can fail:
  Inverse: $(1/r) \times r \ne 1$
  Associative: $(A+B)+C \ne A+(B+C)$ and $(A \times B) \times C \ne A \times (B \times C)$
  Distributive: $(A+B) \times C \ne A \times C + B \times C$
• Scientific calculations require error management; take a numerical analysis course for more info
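Each failure is easy to reproduce with Python floats, which are IEEE 754 double precision on common platforms. The values a, b and c are arbitrary samples chosen for illustration.

```python
a, b, c = 0.1, 0.2, 0.3

print((a + b) + c, a + (b + c))        # 0.6000000000000001 0.6   (associativity)
print((1 / 49) * 49)                   # 0.9999999999999999       (inverse)
print((a + b) * 10, a * 10 + b * 10)   # 3.0000000000000004 3.0   (distributivity)
```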
IEEE Floating Point - Special Properties
Floating point 0 is the same as integer 0:
• All bits = 0
Can (almost) use unsigned integer comparison. A > B if:
• A.EXP > B.EXP, or
• A.EXP = B.EXP and A.SIG > B.SIG
This is equivalent to unsigned comparison of the bit patterns!
But:
• Must first compare sign bits
• Must consider -0 == 0
• NaNs are problematic: their bit patterns will be greater than those of any other values (with the same sign). What should comparison yield?
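A Python sketch of this property. For positive values, ordering float32 bit patterns as unsigned integers matches numeric order. The second part handles the sign bit with a common bit-flipping trick; this is an assumption beyond the slide, and `ordered_key` is a hypothetical name. It still separates -0.0 from +0.0 and is not meant for NaNs.

```python
import struct

def f32_bits(x: float) -> int:
    """Single-precision bit pattern of x, read as an unsigned 32-bit integer."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

# Positive values: unsigned order of the bit patterns matches numeric order.
positives = [0.0, 1e-40, 1.17549435e-38, 0.5, 1.0, 8.75, 3.0e38, float("inf")]
print(sorted(positives, key=f32_bits) == sorted(positives))   # True

def ordered_key(x: float) -> int:
    """Unsigned key that orders non-NaN float32 values numerically.

    Negative values: flip every bit (bigger magnitude -> smaller key).
    Positive values: set the top bit so they sort above all negatives.
    -0.0 and +0.0 still get different keys, so they need special handling.
    """
    b = f32_bits(x)
    return (~b) & 0xFFFFFFFF if b >> 31 else b | 0x80000000

mixed = [float("-inf"), -2.5, -1e-40, -0.0, 0.0, 1e-40, 2.5, float("inf")]
keys = [ordered_key(v) for v in mixed]
print(keys == sorted(keys), ordered_key(-0.0) == ordered_key(0.0))   # True False
```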
Addendum - Why Did the Ariane 5 Explode?
In 1996 Ariane 5 Flight 501 exploded after launch. Estimated cost of accident: $500 million
The cause was traced to the inertial reference system (SRI).
Both the main and backup SRI failed. Both units failed due to an out-of-range conversion:
• Input: double precision floating point
• Output: 16-bit integer for “horizontal bias” (BH)
Careful analysis during software design had indicated that BH would “fit” in 16 bits
So, why didn’t it fit?
BUT, all analysis had been done for the Ariane 4, the predecessor of Ariane 5 - software was reused
Since Ariane 5 was a larger rocket, the values for BH were higher than anticipated
AND, there was no handler to deal with the exception!
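For illustration only, here is a Python sketch of a range-checked double-to-16-bit conversion together with an exception handler. It is not the actual flight software. `to_int16_checked` and `convert_bh` are hypothetical names, and saturating to the nearest 16-bit limit is just one possible policy. NaN also falls into the handler, which maps it to INT16_MIN, an arbitrary choice.

```python
import math

INT16_MIN, INT16_MAX = -(2**15), 2**15 - 1

def to_int16_checked(x: float) -> int:
    """Convert a double to a 16-bit signed integer, raising if it cannot fit."""
    if math.isnan(x) or math.isinf(x):
        raise OverflowError(f"{x!r} has no integer value")
    t = math.trunc(x)                       # drop the fraction, toward zero
    if not INT16_MIN <= t <= INT16_MAX:
        raise OverflowError(f"{x!r} does not fit in 16 bits")
    return t

def convert_bh(bh: float) -> int:
    """Hypothetical handler: saturate instead of letting the exception escape."""
    try:
        return to_int16_checked(bh)
    except OverflowError:
        return INT16_MAX if bh > 0 else INT16_MIN

print(to_int16_checked(1234.7), convert_bh(40000.0), convert_bh(-1e9))   # 1234 32767 -32768
```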
For more information: http://www.ima.umn.edu/~arnold/disasters/ariane.html Or, Google “Ariane 5”
Summary - Chapter 3
Important topics:
• Signed & Unsigned Numbers (3.2)
• Addition and Subtraction (3.3)
• Carry Lookahead (B.6)
• Constructing an ALU (B.5)
• Multiplication and Division (3.4, 4.5)
• Floating Point (3.6)