EE 319K Introduction to Embedded Systems

Bard, Gerstlauer, Valvano, Yerraballi

Lecture 14: Gaming Engines, Coding Style, Floating Point


Agenda

Recap
  o Software design
  o 2-D arrays, structs
  o Bitmaps, sprites
  o Lab 10

Agenda
  o Gaming engine design
  o Coding style
  o Floating point


Numbers

Integers (ℤ): universe is infinite but discrete
  o No fractions
  o No numbers between 5 and 6
  o A countable (finite) number of items in a finite range

Real numbers (ℝ): universe is infinite and continuous
  o Fractions represented by decimal notation
  o Rational numbers, e.g., 5/2 = 2.5
  o Irrational numbers, e.g., π = 3.14159265 . . . (approximately 22/7)
  o An infinity of numbers exists even in the smallest range

(Adapted from V. Agrawal)


Number Representation

Integers
  o Fixed-width integer number

Reals
  o Fixed-point number = I • Δ
      o Store I, but Δ is fixed
      o Decimal fixed-point (Δ = 10^m) = I • 10^m
      o Binary fixed-point (Δ = 2^m) = I • 2^m
  o Floating-point number = I • B^E
      o Store both I and E (only B is fixed)
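The binary fixed-point idea above (store only the integer I, keep the scale factor fixed) can be sketched in C. This is a minimal illustration, not code from the lecture: the resolution 2^-8 and the names `FIX_SHIFT`, `fix_from_int`, `fix_mul`, and `fix_to_int` are assumptions chosen for the example.

```c
#include <stdint.h>

/* Hypothetical format: binary fixed-point with resolution 2^-8 (m = -8). */
#define FIX_SHIFT 8

/* Convert a whole number to fixed-point: I = value * 2^8. */
int32_t fix_from_int(int32_t value) {
  return value * (1 << FIX_SHIFT);
}

/* Multiply two fixed-point values. The raw product is scaled by 2^-16,
   so shift right by 8 to return to the 2^-8 resolution.
   Assumes non-negative operands to keep the shift well defined. */
int32_t fix_mul(int32_t a, int32_t b) {
  return (int32_t)(((int64_t)a * b) >> FIX_SHIFT);
}

/* Integer part of a non-negative fixed-point value. */
int32_t fix_to_int(int32_t a) {
  return a >> FIX_SHIFT;
}
```

With this format, 2.5 is stored as I = 640, and `fix_mul(640, fix_from_int(4))` yields 2560, which is 10.0 at resolution 2^-8.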


Wide Range of Real Numbers

A large number: 976,000,000,000,000 = 9.76 • 10^14

A small number: 0.0000000000000976 = 9.76 • 10^-14

No fixed Δ can represent both: not representable in a single fixed-point format

(Adapted from V. Agrawal)


Floating Point Numbers

Decimal scientific notation
  o 0.513×10^5, 5.13×10^4 and 51.3×10^3 all represent the same value
  o 5.13×10^4 is in normalized scientific notation

Binary floating point numbers
  o Base B = 2
  o Binary point: multiplication by 2 moves the point one place to the right
  o Normalized scientific notation, e.g., 1.0×2^-1
  o Known as floating point numbers

(Adapted from V. Agrawal)


Normalizing Numbers

In scientific notation, we generally choose one digit to the left of the decimal point
  o 13.25 × 10^10 becomes 1.325 × 10^11

Normalizing means
  o Shifting the decimal point until we have the right number of digits to its left (normally one)
  o Adding to or subtracting from the exponent to reflect the shift



Floating Point Numbers

General format
  ±1.bbbbb_two × 2^eeee
  or (-1)^S × (1+F) × 2^E

Where
  o S = sign, 0 for positive, 1 for negative
  o F = fraction (or mantissa) as a binary integer; 1+F is called the significand
  o E = exponent as a binary integer, positive or negative (two's complement)



ANSI/IEEE Std 754-1985

Single-precision float format

  Bit:    31 | 30 ... 23 | 22 ... 0
  Field:  s  | e7 ... e0 | m1 ... m23

Bit 31: mantissa sign, s=0 for positive, s=1 for negative
Bits 30:23: 8-bit biased binary exponent, 0 ≤ e ≤ 255
Bits 22:0: 24-bit mantissa, m, expressed as a binary fraction. Only the 23 bits m1...m23 are stored; a binary 1 as the most significant bit is implied.

m = 1.m1m2m3...m23

f = (-1)^s • 2^(e-127) • m
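The field layout above can be checked on a real value by pulling the three fields out of a `float`. The sketch below assumes that `float` is a 32-bit IEEE 754 single-precision value on the target; the type `FloatFields` and the function `float_fields` are illustrative names, not part of the lecture.

```c
#include <stdint.h>
#include <string.h>

/* Illustrative type: the three fields of a single-precision float. */
typedef struct {
  uint32_t sign;      /* bit 31 */
  uint32_t exponent;  /* bits 30:23, biased by 127 */
  uint32_t fraction;  /* bits 22:0, the implied leading 1 is not stored */
} FloatFields;

/* Assumes float is a 32-bit IEEE 754 single-precision value. */
FloatFields float_fields(float f) {
  uint32_t bits;
  FloatFields r;
  memcpy(&bits, &f, sizeof bits);  /* copy the raw bit pattern */
  r.sign = bits >> 31;
  r.exponent = (bits >> 23) & 0xFFu;
  r.fraction = bits & 0x7FFFFFu;
  return r;
}
```

For example, `float_fields(100.0f)` gives sign 0, biased exponent 133 (true exponent 6), and fraction 0x480000.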


IEEE 754 Floating Point Standard

Biased exponent:
  o True exponent range [-127, 128] changed to [0, 255]
  o Biased exponent is an 8-bit positive binary integer
  o True exponent obtained by subtracting 127_10 or 01111111_2
  o 255 = special case

First bit of significand is always 1: ± 1.bbbb . . . b × 2^E
  o 1 before the binary point is implicitly assumed, so we don't need to include it; just assume it's there!
  o Significand field is the 23-bit fraction after the binary point
  o Significand range is [1, 2)

Standard formats:
  o Single precision: 8 (E) + 23 (F) + 1 (S) = 32 bits (float)
  o Double precision: 11 (E) + 52 (F) + 1 (S) = 64 bits (double)



Numbers in 32-bit Formats

Two's complement integers
  negative overflow | expressible numbers, -2^31 through 0 to 2^31 - 1 | positive overflow

Floating point numbers
  negative overflow | expressible negative numbers, -(2 - 2^-23)×2^127 to -2^-127 | negative underflow | 0 | positive underflow | expressible positive numbers, 2^-127 to (2 - 2^-23)×2^127 | positive overflow

The range is larger, but the number of numbers per unit interval is less than that for a comparable fixed point range
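The upper limit (2 - 2^-23)×2^127 can be compared against the constant `FLT_MAX` from `<float.h>`. This is a small sketch that assumes `float` is IEEE 754 single precision; the function name `largest_single` is made up for the example.

```c
#include <float.h>

/* Build (2 - 2^-23) * 2^127 in double precision, where it is exact. */
double largest_single(void) {
  double v = 2.0 - 1.0 / 8388608.0;  /* 2 - 2^-23, since 2^23 = 8388608 */
  int i;
  for (i = 0; i < 127; i++) {
    v *= 2.0;                        /* multiply by 2^127, one step at a time */
  }
  return v;
}
```

On such a target, `largest_single()` compares equal to `(double)FLT_MAX`, which is far beyond the 2^31 - 1 limit of a 32-bit two's complement integer.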



Binary to Decimal Conversion

Binary: (-1)^S (1.b1b2b3b4) × 2^E

Represents (-1)^S × (1 + b1×2^-1 + b2×2^-2 + b3×2^-3 + b4×2^-4) × 2^E

Example: -1.1100 × 2^-2 (binary)
  = -(1 + 2^-1 + 2^-2) × 2^-2
  = -(1 + 0.5 + 0.25)/4
  = -1.75/4
  = -0.4375 (decimal)
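The conversion above can be written as a short C function for the 4-bit fraction used in the example. The function name `binary_to_decimal` and its parameter layout (fraction bits b1..b4 packed into one 4-bit integer) are assumptions made for this sketch.

```c
/* Value of (-1)^s * (1.b1b2b3b4)_2 * 2^e.
   frac4 holds b1..b4 as a 4-bit integer, with b1 as the most significant bit. */
double binary_to_decimal(int s, unsigned frac4, int e) {
  double value = 1.0 + frac4 / 16.0;  /* 1 + b1/2 + b2/4 + b3/8 + b4/16 */
  for (; e > 0; e--) {
    value *= 2.0;                     /* positive exponent: scale up */
  }
  for (; e < 0; e++) {
    value /= 2.0;                     /* negative exponent: scale down */
  }
  return s ? -value : value;
}
```

The slide's example is `binary_to_decimal(1, 0xC, -2)`, where 0xC is the fraction 1100, and the result is -0.4375.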



Decimal to Binary Conversion

Converting from base 10 to the representation

Single precision example: convert 100_10

Step 1: convert to binary, 0110 0100

  128  64  32  16   8   4   2   1
    0   1   1   0   0   1   0   0

In the normalized binary form 1.xxx, 0110 0100 = 1.100100 × 2^6


Decimal to Binary Conversion (cont’d)

1.1001 × 2^6 is binary for 100
  o Thus the exponent is 6
  o Biased exponent will be 6+127 = 133 = 1000 0101
  o Sign will be 0 for positive
  o Stored fractional part f will be 1001 followed by zeros

Thus we have
  S  E          F
  0  1000 0101  1001 0000 0000 0000 0000 000

Regrouped into 4-bit nibbles: 0100 0010 1100 1000 0000 . . . = 4 2 C 8 0 0 0 0 in hexadecimal

$42C8 0000 is the representation for 100
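The same steps (find the leading 1, bias the exponent, keep the bits after the leading 1) can be sketched in C for small positive integers. The function `encode_small_int` is a hypothetical helper for this example and only handles 0 < n < 2^24, so no rounding is needed.

```c
#include <stdint.h>

/* Encode a positive integer n (0 < n < 2^24) as single-precision bits. */
uint32_t encode_small_int(uint32_t n) {
  uint32_t e = 0;
  uint32_t fraction;
  /* Step 1: position of the leading 1 is the true exponent. */
  while ((n >> (e + 1)) != 0) {
    e++;
  }
  /* Step 2: drop the leading 1 and left-align the rest in 23 bits. */
  fraction = (n - (1u << e)) << (23 - e);
  /* Step 3: sign 0, biased exponent e + 127, then the fraction. */
  return ((e + 127u) << 23) | fraction;
}
```

For n = 100 the loop finds e = 6, and the function returns 0x42C80000, matching the result worked out by hand.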


Positive Zero in IEEE 754

+1.0 × 2^-127
  o Smallest positive number in single-precision IEEE 754 standard.
  o Interpreted as positive zero.
  o Exponent less than -127 is positive underflow; can be regarded as zero.

  Sign  Biased exponent  Fraction
  0     00000000         00000000000000000000000



Negative Zero in IEEE 754

-1.0 × 2^-127
  o Smallest-magnitude negative number in single-precision IEEE 754 standard.
  o Interpreted as negative zero.
  o True exponent less than -127 is negative underflow; may be regarded as 0.

  Sign  Biased exponent  Fraction
  1     00000000         00000000000000000000000
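The two zero patterns can be inspected directly. This sketch assumes a 32-bit IEEE 754 `float`; the helper names `float_bits` and `zeros_compare_equal` are illustrative.

```c
#include <stdint.h>
#include <string.h>

/* Raw bit pattern of a float (assumes 32-bit IEEE 754 single precision). */
uint32_t float_bits(float f) {
  uint32_t bits;
  memcpy(&bits, &f, sizeof bits);
  return bits;
}

/* Returns 1 if +0 and -0 compare equal as floating point values. */
int zeros_compare_equal(void) {
  float positive_zero = 0.0f;
  float negative_zero = -0.0f;
  return positive_zero == negative_zero;
}
```

Positive zero has the bit pattern 0x00000000 and negative zero 0x80000000. The patterns differ only in the sign bit, yet the two values compare equal.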



Positive Infinity in IEEE 754

+1.0 × 2^128
  o Largest positive number in single-precision IEEE 754 standard.
  o Interpreted as +∞
  o If true exponent = 128 and fraction ≠ 0, the pattern does not represent a number. It is called "not a number" or NaN and is distinct from +∞.

  Sign  Biased exponent  Fraction
  0     11111111         00000000000000000000000



Negative Infinity in IEEE 754

-1.0 × 2^128
  o Most negative number in single-precision IEEE 754 standard.
  o Interpreted as -∞
  o If true exponent = 128 and fraction ≠ 0, the pattern does not represent a number. It is called "not a number" or NaN and is distinct from -∞.

  Sign  Biased exponent  Fraction
  1     11111111         00000000000000000000000
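The infinity and NaN patterns can be built from raw bits and compared with ordinary numbers. This is a sketch assuming a 32-bit IEEE 754 `float`; `float_from_bits` and `is_nan_bits` are names invented for the example, and 0x7FC00000 is just one of many bit patterns with exponent 255 and a nonzero fraction.

```c
#include <float.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret a 32-bit pattern as a float (assumes IEEE 754 single precision). */
float float_from_bits(uint32_t bits) {
  float f;
  memcpy(&f, &bits, sizeof f);
  return f;
}

/* A NaN is the only value that is not equal to itself. */
int is_nan_bits(uint32_t bits) {
  float f = float_from_bits(bits);
  return f != f;
}
```

Here `float_from_bits(0x7F800000u)` is greater than `FLT_MAX`, `float_from_bits(0xFF800000u)` is less than `-FLT_MAX`, and `is_nan_bits(0x7FC00000u)` returns 1.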



IEEE Representation Values

  o If E=255 and F is nonzero, then V=NaN ("Not a number")
  o If E=255 and F is zero and S is 1, then V=-Infinity
  o If E=255 and F is zero and S is 0, then V=+Infinity
  o If 0<E<255 then V=(-1)**S * 2 ** (E-127) * (1.F), where "1.F" is intended to represent the binary number created by prefixing F with an implicit leading 1 and a binary point.
  o If E=0 and F is nonzero, then V=(-1)**S * 2 ** (-126) * (0.F). These are "unnormalized" values.
  o If E=0 and F is zero and S is 1, then V=-0
  o If E=0 and F is zero and S is 0, then V=0
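The case analysis above maps directly onto a classifier that works on the raw 32-bit pattern. The enum `FloatClass` and the function `classify_bits` are illustrative names for this sketch, not a standard API.

```c
#include <stdint.h>

/* Illustrative labels for the cases in the list above. */
typedef enum {
  CLASS_NAN,
  CLASS_NEG_INF,
  CLASS_POS_INF,
  CLASS_NORMALIZED,
  CLASS_UNNORMALIZED,
  CLASS_NEG_ZERO,
  CLASS_POS_ZERO
} FloatClass;

/* Classify a single-precision bit pattern by its S, E and F fields. */
FloatClass classify_bits(uint32_t bits) {
  uint32_t s = bits >> 31;
  uint32_t e = (bits >> 23) & 0xFFu;
  uint32_t f = bits & 0x7FFFFFu;

  if (e == 255) {
    if (f != 0) return CLASS_NAN;
    return s ? CLASS_NEG_INF : CLASS_POS_INF;
  }
  if (e == 0) {
    if (f != 0) return CLASS_UNNORMALIZED;
    return s ? CLASS_NEG_ZERO : CLASS_POS_ZERO;
  }
  return CLASS_NORMALIZED;  /* 0 < E < 255 */
}
```

For instance, `classify_bits(0x42C80000u)` (the value 100) is `CLASS_NORMALIZED`, while `classify_bits(0x00000001u)` is `CLASS_UNNORMALIZED`.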


Addition and Subtraction

0. Zero check
   - Change the sign of the subtrahend
   - If either operand is 0, the other is the result
1. Significand alignment: right shift the significand of the operand with the smaller exponent until the two exponents are identical.
2. Addition: add significands and report an exception if overflow occurs.
3. Normalization
   - Shift significand bits to normalize.
   - Report overflow or underflow if the exponent goes out of range.
4. Rounding



Rounding

Adjusting significands before addition will produce results that exceed 24 bits

Round toward plus infinity
  o select next largest normalized result
Round toward minus infinity
  o select next smallest normalized result
Round toward zero
  o truncate result
Round to nearest
  o select closest normalized result
  o used by IEEE 754
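The effect of the 24-bit limit shows up at 2^24 = 16777216, where neighboring single-precision values are 2 apart. The sketch below assumes IEEE 754 single precision with the default round-to-nearest mode; `add_float` is a made-up helper, and the `volatile` store is there only to force the sum to be rounded to `float`.

```c
/* Add two floats and force the result to be stored as a float. */
float add_float(float a, float b) {
  volatile float sum = a + b;  /* rounded to 24 significant bits here */
  return sum;
}
```

Under these assumptions `add_float(16777216.0f, 1.0f)` returns 16777216.0f, because the exact sum needs 25 bits and is rounded back, while `add_float(16777216.0f, 2.0f)` returns 16777218.0f exactly.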


Example Subtraction: 0.5_10 - 0.4375_10

Step 0: Floating point numbers to be added: 1.000_2 × 2^-1 and -1.110_2 × 2^-2

Step 1: Significand of lesser exponent is shifted right until exponents match
  -1.110_2 × 2^-2 → -0.111_2 × 2^-1

Step 2: Add significands, 1.000_2 + (-0.111_2)
  Result is 0.001_2 × 2^-1

Step 3: Normalize, 1.000_2 × 2^-4
  No overflow/underflow since 127 ≥ exponent ≥ -126

Step 4: Rounding, no change since the sum fits in 4 bits.

1.000_2 × 2^-4 = (1+0)/16 = 0.0625_10
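The worked example can be replayed with a toy model that keeps a 4-bit significand (one bit before the binary point, three after). This is only a sketch: the type `Tiny` and function `tiny_add` are invented here, the significand is stored as a signed integer in units of 1/8, bits shifted out during alignment are simply truncated, and overflow/underflow checks are left out.

```c
/* Toy number: value = (sig / 8) * 2^exp, normalized when 8 <= |sig| <= 15. */
typedef struct {
  int sig;
  int exp;
} Tiny;

Tiny tiny_add(int sig_a, int exp_a, int sig_b, int exp_b) {
  Tiny r;
  /* Step 1: align by shifting the significand with the smaller exponent right. */
  while (exp_a < exp_b) { sig_a /= 2; exp_a++; }
  while (exp_b < exp_a) { sig_b /= 2; exp_b++; }
  /* Step 2: add the significands. */
  r.sig = sig_a + sig_b;
  r.exp = exp_a;
  if (r.sig == 0) {
    r.exp = 0;                       /* zero result, nothing to normalize */
    return r;
  }
  /* Step 3: normalize so that 8 <= |sig| <= 15. */
  while (r.sig >= 16 || r.sig <= -16) { r.sig /= 2; r.exp++; }
  while (r.sig < 8 && r.sig > -8)     { r.sig *= 2; r.exp--; }
  return r;
}
```

In this model 1.000_2 × 2^-1 is (8, -1) and -1.110_2 × 2^-2 is (-14, -2), so `tiny_add(8, -1, -14, -2)` returns sig = 8, exp = -4, which is 1.000_2 × 2^-4 = 0.0625.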



FP Multiplication: Basic Idea

1. Separate sign
2. Add exponents
3. Multiply significands
4. Normalize, round, check overflow
5. Replace sign
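The five steps can be sketched on raw single-precision bit patterns. This is a simplified illustration, not a full IEEE 754 multiply: the name `fmul_bits` is made up, both operands are assumed to be normalized, the result is assumed to stay in range (no overflow or underflow check), and extra product bits are truncated instead of rounded to nearest.

```c
#include <stdint.h>

/* Multiply two normalized single-precision bit patterns (simplified). */
uint32_t fmul_bits(uint32_t a, uint32_t b) {
  /* Steps 1 and 5: the result sign is the XOR of the operand signs. */
  uint32_t sign = (a ^ b) & 0x80000000u;
  /* Step 2: add biased exponents and remove one bias of 127. */
  int32_t exp = (int32_t)((a >> 23) & 0xFFu) + (int32_t)((b >> 23) & 0xFFu) - 127;
  /* Step 3: multiply 24-bit significands, restoring the implied leading 1. */
  uint64_t ma = (a & 0x7FFFFFu) | 0x800000u;
  uint64_t mb = (b & 0x7FFFFFu) | 0x800000u;
  uint64_t product = ma * mb;        /* in [1, 4), scaled by 2^46 */
  /* Step 4: normalize if the product is 2 or more. */
  if (product & ((uint64_t)1 << 47)) {
    product >>= 1;
    exp++;
  }
  /* Keep 23 fraction bits, truncating the rest (no rounding in this sketch). */
  return sign | ((uint32_t)exp << 23) | ((uint32_t)(product >> 23) & 0x7FFFFFu);
}
```

As a check, 1.5 is 0x3FC00000 and -2.5 is 0xC0200000; `fmul_bits` on those returns 0xC0700000, the pattern for -3.75.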



FP Division: Basic Idea

1. Separate sign.
2. Check for zeros and infinity.
3. Subtract exponents.
4. Divide significands.
5. Normalize/overflow/underflow.
6. Rounding.
7. Replace sign.
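Division follows the same pattern and can be sketched in the same simplified way. The function `fdiv_bits` is a hypothetical name; it handles zero operands, but assumes every other operand is normalized and finite, skips the overflow/underflow check, and truncates the quotient instead of rounding to nearest.

```c
#include <stdint.h>

/* Divide two single-precision bit patterns, a / b (simplified). */
uint32_t fdiv_bits(uint32_t a, uint32_t b) {
  /* Steps 1 and 7: the result sign is the XOR of the operand signs. */
  uint32_t sign = (a ^ b) & 0x80000000u;
  uint32_t a_mag = a & 0x7FFFFFFFu;
  uint32_t b_mag = b & 0x7FFFFFFFu;
  int32_t exp;
  uint64_t ma, mb, q;

  /* Step 2: zero checks (infinite and NaN operands are not handled here). */
  if (b_mag == 0) {
    return (a_mag == 0) ? 0x7FC00000u : (sign | 0x7F800000u);  /* NaN or infinity */
  }
  if (a_mag == 0) {
    return sign;                                               /* signed zero */
  }
  /* Step 3: subtract biased exponents and add the bias back. */
  exp = (int32_t)(a_mag >> 23) - (int32_t)(b_mag >> 23) + 127;
  /* Step 4: divide significands; the quotient is in (0.5, 2), scaled by 2^24. */
  ma = (a & 0x7FFFFFu) | 0x800000u;
  mb = (b & 0x7FFFFFu) | 0x800000u;
  q = (ma << 24) / mb;
  /* Step 5: normalize to a 24-bit significand with a leading 1. */
  if (q >= ((uint64_t)1 << 24)) {
    q >>= 1;
  } else {
    exp--;
  }
  /* Step 6 is truncation here; keep the 23 fraction bits. */
  return sign | ((uint32_t)exp << 23) | ((uint32_t)q & 0x7FFFFFu);
}
```

For example, 9.0 is 0x41100000 and 3.0 is 0x40400000, and `fdiv_bits(0x41100000u, 0x40400000u)` returns 0x40400000, the pattern for 3.0.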

