1 Lecture 3 Bit Operations Floating Point – 32 bits or 64 bits 1

Dec 20, 2015

Lecture 3

Bit Operations

Floating Point – 32 bits or 64 bits


Bit Operations

AND &

OR |

ONE'S COMPLEMENT ~

EXCLUSIVE OR ^

SHIFT (right) >>

SHIFT (left) <<


Operation - examples

AND 1 & 1 = 1; 1 & 0 = 0; 0 & 0 = 0

OR 1 | 1 = 1; 1 | 0 = 1; 0 | 0 = 0

~ ~0 = 1; ~1 = 0 (on a single bit)

^ 0 ^ 0 = 0; 1 ^ 1 = 0; 1 ^ 0 = 1; 0 ^ 1 = 1

>> 0x002 >> 1 = 0x001

<< 0x001 << 1 = 0x002


AND Example

1111 0010 (0xf2)
1111 1110 (0xfe)
---------------- (and) &
1111 0010 (0xf2)

unsigned char c = 0xf2;
unsigned char d = 0xfe;
unsigned char e = c & d; // e is 0xf2


OR Example

1111 0010 (0xf2)
1111 1110 (0xfe)
---------------- (or) |
1111 1110 (0xfe)

unsigned char c = 0xf2;
unsigned char d = 0xfe;
unsigned char e = c | d; // e is 0xfe


One’s complement

1111 0010 (0xf2)
---------------- ~
0000 1101 (0x0d)

unsigned char c = 0xf2;
unsigned char e = ~c; // e is 0x0d


EXCLUSIVE OR

1111 0010 (0xf2)
1111 1110 (0xfe)
---------------- (^)
0000 1100 (0x0c)

unsigned char c = 0xf2;
unsigned char d = 0xfe;
unsigned char e = c ^ d; // e is 0x0c


SHIFT >> 1 - (right) by one bit

1111 0010 (0xf2)
>> 1 (shift right by one bit)
---------------------
0111 1001 (0x79)

unsigned char c = 0xf2;
unsigned char e = c >> 1; // e is 0x79 (c must be unsigned so that a 0 is shifted in)


SHIFT >> 2 - by two bits

1111 0010 (0xf2)
>> 2 (shift right by two bits)
---------------------
0011 1100 (0x3c)

unsigned char c = 0xf2;
unsigned char e = c >> 2; // e is 0x3c


SHIFT << 1 - (left) by one bit

1111 0010 (0xf2)
<< 1 (shift left by one bit)
---------------------
1110 0100 (0xe4)

unsigned char c = 0xf2;
unsigned char e = c << 1; // e is 0xe4 (the top bit is shifted out)


SHIFT << 2 - by two bits

1111 0010 (0xf2)
<< 2 (shift left by two bits)
---------------------
1100 1000 (0xc8)

unsigned char c = 0xf2;
unsigned char e = c << 2; // e is 0xc8
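The shift slides can be checked with a short C sketch. The helper names `shift_right8` and `shift_left8` are hypothetical, and the sketch assumes a shift count between 0 and 7. It uses `uint8_t` rather than plain `char` because right-shifting a negative signed value is implementation-defined in C and usually copies the sign bit in.

```c
#include <stdint.h>

/* Shift an 8-bit value right; zeros enter from the left because the type is unsigned. */
uint8_t shift_right8(uint8_t value, unsigned count)
{
    return (uint8_t)(value >> count);
}

/* Shift an 8-bit value left; the cast discards any bits pushed past bit 7. */
uint8_t shift_left8(uint8_t value, unsigned count)
{
    return (uint8_t)(value << count);
}
```

For example, `shift_right8(0xf2, 1)` gives 0x79 and `shift_left8(0xf2, 2)` gives 0xc8.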


Results of bit operations (examples)

(1 | 2) == 3
(1 | 3) == 3
(1 & 2) == 0
(1 & 3) == 1
(2 & 3) == 2
(0 ^ 3) == 3
(1 ^ 3) == 2
(2 ^ 3) == 1
(3 ^ 3) == 0
~0 == -1 (signed) or 255 (unsigned, 8 bits)
~23 == -24 (signed) or 232 (unsigned, 8 bits)
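The results listed here can be reproduced with a small C sketch. The helper names (`and8`, `or8`, `xor8`, `not8`) are hypothetical wrappers around the operators, restricted to 8-bit unsigned values so that `~` produces the "unsigned" results quoted in the list.

```c
#include <stdint.h>

/* Each helper applies one bitwise operator and keeps only the low 8 bits. */
uint8_t and8(uint8_t a, uint8_t b) { return (uint8_t)(a & b); }
uint8_t or8(uint8_t a, uint8_t b)  { return (uint8_t)(a | b); }
uint8_t xor8(uint8_t a, uint8_t b) { return (uint8_t)(a ^ b); }

/* ~ works on the promoted int, so the cast trims the result back to 8 bits. */
uint8_t not8(uint8_t a)            { return (uint8_t)~a; }
```

With these, `not8(23)` is 232, while `~23` evaluated as a plain `int` is -24 on a two's complement machine.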


Integer

Formats, Signs, and Precision

Overflow


Data representation

150 (decimal) is

128 (2^7) + 16 (2^4) + 4 (2^2) + 2 (2^1). We can rewrite this as:

1*128 + 0*64 + 0*32 + 1*16 + 0*8 + 1*4 + 1*2 + 0*1 or

1001 0110 in binary (0x96 in hex). 8 bits can hold 2^8 = 256 different values; 32 bits can hold 2^32.
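The decomposition of 150 into powers of two can be sketched in C by testing each bit with a mask. The function name `to_binary8` is hypothetical; it writes the eight bits as text, most significant bit first.

```c
#include <stdint.h>

/* Write the 8 bits of value into out as '0'/'1' characters, MSB first.
   out must have room for 8 characters plus the terminating '\0'. */
void to_binary8(uint8_t value, char out[9])
{
    for (int i = 0; i < 8; i++) {
        /* 0x80 >> i selects bit 7, then bit 6, ... down to bit 0. */
        out[i] = (value & (0x80u >> i)) ? '1' : '0';
    }
    out[8] = '\0';
}
```

Calling `to_binary8(150, buf)` fills `buf` with "10010110".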


Format

In C, the size of integer types (short, int, long, etc.) is implementation dependent, but on most machines:

chars are 8 bits

ints are 32 bits; longs are 32 or 64 bits, depending on the platform

shorts are 16 bits
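Because these sizes are implementation dependent, it is safer to ask the compiler than to assume. The sketch below uses `sizeof` and `CHAR_BIT` from `<limits.h>`; the function names are hypothetical, and the values returned are storage sizes in bits for whatever machine compiles the code.

```c
#include <limits.h>
#include <stddef.h>

/* Storage size of each type in bits on the current platform. */
size_t bits_in_char(void)  { return CHAR_BIT; }
size_t bits_in_short(void) { return sizeof(short) * CHAR_BIT; }
size_t bits_in_int(void)   { return sizeof(int) * CHAR_BIT; }
size_t bits_in_long(void)  { return sizeof(long) * CHAR_BIT; }
```

The C standard only guarantees minimums: at least 8 bits for `char`, 16 for `short` and `int`, and 32 for `long`.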


One’s complement & two’s complement

1001 0010 // original data
0110 1101 // reverse 1 to 0, 0 to 1: one's complement

0110 1101
+ 0000 0001 (add one to the one's complement)
-----------
0110 1110 // two's complement


Application of two's complement

1100 0011 – 0011 0001

– 0011 0001 can be converted into its two's complement

The operation becomes addition instead of subtraction (convert "–" to "+")

The two's complement of 0011 0001 is 1100 1111

1100 0011 – 0011 0001 is

1100 0011 + 1100 1111 = 1 1001 0010; dropping the carry out of the eighth bit leaves 1001 0010 (195 – 49 = 146)
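The "invert, add one, then add" procedure can be sketched in C on 8-bit values. The names `twos_complement8` and `subtract_by_adding8` are hypothetical; the cast back to `uint8_t` is what discards the carry out of the eighth bit.

```c
#include <stdint.h>

/* Invert every bit (one's complement), then add 1 (two's complement). */
uint8_t twos_complement8(uint8_t value)
{
    return (uint8_t)(~(unsigned)value + 1u);
}

/* Compute a - b using only addition: a + (two's complement of b). */
uint8_t subtract_by_adding8(uint8_t a, uint8_t b)
{
    return (uint8_t)(a + twos_complement8(b));
}
```

Here `twos_complement8(0x31)` is 0xcf (1100 1111) and `subtract_by_adding8(0xc3, 0x31)` is 0x92 (1001 0010), which is 195 - 49 = 146.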


Signed integer of 100

Negative sign is represented by 1 (most significant bit)

100 (decimal) 0x0064 (hex)

0000 0000 0110 0100

positive

-100 (decimal) 0xff9c (hex)

1111 1111 1001 1100

negative


Example of –101 (decimal)

101 (decimal) is 0x0065 (hex)

0000 0000 0110 0101 (16-bit integer)

–101 is the two's complement of 101: 0xff9b

Convert 0 to 1, 1 to 0 and then add 1

0000 0000 0110 0101 (101 in decimal)

1111 1111 1001 1010 (convert 1 to 0 and 0 to 1)

1111 1111 1001 1011 (add 1, result)
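To see the stored bit pattern of a negative number, a signed 16-bit value can be converted to an unsigned one. In the sketch below, `bit_pattern16` is a hypothetical helper; C defines the signed-to-unsigned conversion modulo 2^16, which yields exactly the two's complement pattern.

```c
#include <stdint.h>

/* Return the 16-bit pattern that represents value in two's complement. */
uint16_t bit_pattern16(int16_t value)
{
    return (uint16_t)value;
}
```

`bit_pattern16(-101)` is 0xff9b and `bit_pattern16(-100)` is 0xff9c.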


Left & right Shift

101 (dec) is 0x0065 (hex)

Shift right by one bit >>1

0000 0000 0110 0101 (101 dec)

>> 1 (shift right)

0000 0000 0011 0010 (50 dec, not 50.5: the lowest bit is shifted out)


Overflow

When integers are too big to fit into a word, overflow occurs.

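For a 16-bit unsigned integer, the largest value is 65535 (for a 16-bit signed integer it is 32767)

If you add 65535 + 1 (dec), it produces 0

  1111 1111 1111 1111 (binary)
+ 0000 0000 0000 0001
---------------------
1 0000 0000 0000 0000 (overflow: the 17th bit does not fit)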
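Unsigned overflow is easy to observe in C, where unsigned arithmetic wraps around modulo 2^16 for a 16-bit type. The helper name `add_u16` is hypothetical; the cast keeps only the low 16 bits of the sum.

```c
#include <stdint.h>

/* Add two 16-bit unsigned values; the 17th bit of the sum is discarded. */
uint16_t add_u16(uint16_t a, uint16_t b)
{
    return (uint16_t)(a + b);
}
```

`add_u16(65535, 1)` returns 0. Signed overflow is different: in C it is undefined behaviour, so it should be avoided rather than relied on.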


Floating

Fixed-Point Representations – 32.45

Floating-Point Representations such as 32.45, 3.245 x 10, 0.3245 x 100

Normalization and Hidden Bits

IEEE Floating Point

Details


Fixed-Point Representations

Such as 123.45

1234.78

The decimal point is fixed


Floating point

A floating-point number is really two numbers packed into one string of bits. One of the numbers is simply a fixed-point binary number, which is called the mantissa. The other number acts to shift the binary point left or right to keep the mantissa in a useful range. This number is called the exponent.


Example of Floating

1.234

78.945

12.56

4.5 E10

4.5 x 10^4


Expression

1 sign bit, 8 exponent bits, and 23 mantissa bits (total 32 bits)

(-1)^Sign * 2^(Exponent - 127) * (1 + Mantissa * 2^-23)

Positive: sign bit is 0. Negative: sign bit is 1.

The exponent is stored as an unsigned value, minus 127. That is, if the stored value is 128, it means 128 – 127 = 1; if the stored value is 254, it means 254 – 127 = 127; if the stored value is 1, it means 1 – 127 = -126. The stored values 0 and 255 are reserved (0 for zero and very small numbers, 255 for infinity and NaN).



Expression - Mantissa

The mantissa is an unsigned 23-bit value; the expression is 1 + Mantissa * 2^(-23)

If Mantissa is zero, the expression becomes 1 + 0 * 2^(-23) = 1

If Mantissa is at its maximum, 2^23 - 1, the expression becomes 1 + (2^23 - 1) * 2^(-23) = 2 - 2^(-23), just under 2

It ranges from 1 up to, but not including, 2


Example

2.5 (floating point)

0100 0000 0010 0000 0000 0000 0000 0000

Sign: positive (sign bit is 0)

Exponent: 1000 0000 = 128 (128 – 127 = 1)

Mantissa: 1.010 0000 0000 0000 0000 0000 (binary) = 1.25

Result: 1 x 1.25 x 2^1 = 2.5


Example

Determine the values of

1011 1101 0100 0000 0000 0000 0000 0000

Sign = 1, negative

Stored exponent = 0111 1010 = 122, so exponent = -5 (122 – 127)

Mantissa: 100 0000 0000 0000 0000 0000, giving 1.1 (binary) = 1.5 (decimal)

So the result is –1 x 1.1 (binary) x 2^(-5) = –1.5 x 2^(-5) = -0.046875 (decimal)
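Both worked examples can be checked by pulling the three fields out of a 32-bit pattern with shifts and masks. This is a sketch under the assumption that `float` is a 32-bit IEEE 754 single on the target machine; the struct `float_fields` and the functions `split_float_bits` and `bits_to_float` are hypothetical names.

```c
#include <stdint.h>
#include <string.h>

/* The three fields of a 32-bit float, as stored (exponent still biased by 127). */
typedef struct {
    unsigned sign;            /* 1 bit   */
    unsigned stored_exponent; /* 8 bits  */
    uint32_t mantissa;        /* 23 bits */
} float_fields;

/* Separate the fields with shifts and masks. */
float_fields split_float_bits(uint32_t bits)
{
    float_fields f;
    f.sign = (unsigned)(bits >> 31);
    f.stored_exponent = (unsigned)((bits >> 23) & 0xFFu);
    f.mantissa = bits & 0x7FFFFFu;
    return f;
}

/* Reinterpret the same 32 bits as a float (assumes sizeof(float) == 4). */
float bits_to_float(uint32_t bits)
{
    float value;
    memcpy(&value, &bits, sizeof value);
    return value;
}
```

For 0x40200000 the fields are sign 0, stored exponent 128, mantissa 0x200000, and `bits_to_float` gives 2.5. For 0xBD400000 they are sign 1, stored exponent 122, mantissa 0x400000, and the value is -0.046875.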


Two representations of zero

When the sign bit is zero, positive, exponent is zero and Mantissa is zero

When the sign bit is 1, negative, exponent is zero and Mantissa is zero

In short, there are two formats of ZERO
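The two zeros have different bit patterns but compare as equal. The sketch below assumes `float` is a 32-bit IEEE 754 single; `float_bits` and `floats_equal` are hypothetical helper names.

```c
#include <stdint.h>
#include <string.h>

/* Return the raw 32-bit pattern of a float (assumes sizeof(float) == 4). */
uint32_t float_bits(float value)
{
    uint32_t bits;
    memcpy(&bits, &value, sizeof bits);
    return bits;
}

/* Ordinary floating-point comparison, which ignores the sign of zero. */
int floats_equal(float a, float b)
{
    return a == b;
}
```

`float_bits(0.0f)` is 0x00000000 and `float_bits(-0.0f)` is 0x80000000, yet `floats_equal(0.0f, -0.0f)` is true.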


IEEE Double Precision – 64 bits

64 bits

with

11 exponent bits

52 mantissa bits

Plus one bit sign bit

In C and C++, float is typically 32 bits (4 bytes) and double is 64 bits (8 bytes)


Summary

Bit operations: &, |, ~, ^, >> and <<

Representation of data: int, short, long

Floating point – sign, mantissa, exponent; for example, -3.4 x 10^4

- (negative), 3.4 (mantissa), 4 (exponent)

0x00341256 can be an integer, floating point, or a statement. That is why we need to declare the type:

int i;
char a;
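The point that the same bits mean different things under different types can be sketched in C by copying one 32-bit pattern into an `int32_t` and into a `float`. The function names `as_int` and `as_float` are hypothetical, and the sketch assumes a 32-bit IEEE 754 `float`.

```c
#include <stdint.h>
#include <string.h>

/* Read the 32 bits as a signed integer. */
int32_t as_int(uint32_t bits)
{
    int32_t value;
    memcpy(&value, &bits, sizeof value);
    return value;
}

/* Read the same 32 bits as a float (assumes sizeof(float) == 4). */
float as_float(uint32_t bits)
{
    float value;
    memcpy(&value, &bits, sizeof value);
    return value;
}
```

`as_int(0x00341256)` is 3412566, while `as_float(0x00341256)` is a very small positive number, nowhere near 3412566.0. The declared type tells the compiler which interpretation to use.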