Computer Organization: A Programmer's Perspective
Bits, Bytes, Nibbles, Words and Strings
Computer Organization: A Programmer's Perspective Based on class notes by Bryant and O'Hallaron 2
Topics
  Why bits? Why 0/1?
  Basic terms: bits, bytes, nibbles, words
  Representing information as bits
    Characters and strings
    Instructions (more on this when we look at assembly)
    Numbers
  Bit-level manipulations
    Boolean algebra
    Expressing in C

Everything is bits
  Each bit is 0 or 1
  By encoding/interpreting sets of bits in various ways
    Computers represent numbers, sets, strings, etc.
    Computers manipulate representations (instructions)
  Why bits? Electronic implementation is easy
    Easy to store with bistable elements
    Reliably transmitted on noisy and inaccurate wires
[Figure: voltage waveform encoding the bits 0 1 0, with levels marked at 0.0V, 0.2V, 0.9V and 1.1V]

Binary Representations
  Number representation (base 2)
    Represent $15213_{10}$ as $11101101101101_2$
    Represent $1.5213 \times 10^4$ as $1.1101101101101_2 \times 2^{13}$
    Represent $1.20_{10}$ as $1.0011001100110011[0011]\ldots_2$
  Instruction representation, in AMD64 machine code
    RETQ command: C3 (hex) = $11000011_2$
    MOV $0, %EAX: B8 (hex) = $10111000_2$ [00000000 …]

Terms
  Bit: single binary digit, 0 or 1
  Byte: 8 bits
    Smallest addressable unit of memory in modern computers
  Nibble (English: small bite): 4 bits
    2 nibbles = 1 byte
  Word: 8 to 64 bits (1 to 8 bytes)
    Depends on machine!

Encoding Byte Values
  Byte = 8 bits, nibble = 4 bits
  Binary: $00000000_2$ to $11111111_2$
  Decimal: $0_{10}$ to $255_{10}$
  Hexadecimal: $00_{16}$ to $FF_{16}$
    Two nibbles
    Base 16 number representation
    Use characters '0' to '9' and 'A' to 'F'
    Write $FA1D37B_{16}$ in C as 0xFA1D37B
      Or 0xfa1d37b
  Octal: $0_8$ to $377_8$
    Base 8, not often used
    Written in C with a leading zero, e.g. 0256

  Hex  Decimal  Binary
  0    0        0000
  1    1        0001
  2    2        0010
  3    3        0011
  4    4        0100
  5    5        0101
  6    6        0110
  7    7        0111
  8    8        1000
  9    9        1001
  A    10       1010
  B    11       1011
  C    12       1100
  D    13       1101
  E    14       1110
  F    15       1111
Bit level operations

Boolean Algebra (George Boole, 19th century)
  Algebraic representation of logic
    Encode "True" as 1 and "False" as 0
  And: A&B = 1 when both A=1 and B=1
  Or: A|B = 1 when either A=1 or B=1
  Not: ~A = 1 when A=0
  Exclusive-Or (Xor): A^B = 1 when either A=1 or B=1, but not both

General Boolean Algebras
  Operate on bit vectors
    Operations applied bitwise
  All of the properties of Boolean algebra apply

      01101001      01101001      01101001
    & 01010101    | 01010101    ^ 01010101    ~ 01010101
      --------      --------      --------      --------
      01000001      01111101      00111100      10101010

Example: Representing & Manipulating Sets
  Representation
    Width-w bit vector represents subsets of {0, …, w–1}
    $a_j = 1$ if $j \in A$

      01101001   { 0, 3, 5, 6 }
      76543210

      01010101   { 0, 2, 4, 6 }
      76543210
  Operations
    &  Intersection          01000001   { 0, 6 }
    |  Union                 01111101   { 0, 2, 3, 4, 5, 6 }
    ^  Symmetric difference  00111100   { 2, 3, 4, 5 }
    ~  Complement            10101010   { 1, 3, 5, 7 }
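
The set operations on this slide map directly onto C's bitwise operators. The following is a minimal sketch for w = 8; the type name bitset8 and the helper names are made up for illustration and are not part of the original notes.

```c
#include <stdint.h>

/* A subset of {0, ..., 7} stored in one byte: bit j is 1 iff j is in the set. */
typedef uint8_t bitset8;

/* Add element j (0 <= j <= 7) to the set. */
bitset8 set_add(bitset8 s, unsigned j) { return (bitset8)(s | (1u << j)); }

/* 1 if j is a member of s, 0 otherwise. */
int set_contains(bitset8 s, unsigned j) { return (int)((s >> j) & 1u); }

bitset8 set_intersection(bitset8 a, bitset8 b) { return (bitset8)(a & b); }
bitset8 set_union(bitset8 a, bitset8 b)        { return (bitset8)(a | b); }
bitset8 set_symdiff(bitset8 a, bitset8 b)      { return (bitset8)(a ^ b); }
bitset8 set_complement(bitset8 a)              { return (bitset8)~a; }
```

With a = 0x69 (01101001) and b = 0x55 (01010101), these helpers reproduce the four result rows of the slide.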

Bit-Level Operations in C
  Operations &, |, ~, ^ available in C
    Apply to any "integral" data type
      long, int, short, char, unsigned
    View arguments as bit vectors
    Operations applied bit-wise
  Examples (char data type)
    ~0x41 ➙ 0xBE
      $\sim 01000001_2$ ➙ $10111110_2$
    ~0x00 ➙ 0xFF
      $\sim 00000000_2$ ➙ $11111111_2$
    0x69 & 0x55 ➙ 0x41
      $01101001_2$ & $01010101_2$ ➙ $01000001_2$
    0x69 | 0x55 ➙ 0x7D
      $01101001_2$ | $01010101_2$ ➙ $01111101_2$

Contrast: Logic Operations in C
  Contrast to logical operators &&, ||, !
    View 0 as "False"
    Anything nonzero as "True"
    Always return 0 or 1
    Early termination
  Examples (char data type)
    !0x41 ➙ 0x00
    !0x00 ➙ 0x01
    !!0x41 ➙ 0x01
    0x69 && 0x55 ➙ 0x01
    0x69 || 0x55 ➙ 0x01
    p && *p (avoids null pointer access)

Shift Operations
  Left shift: x << y
    Shift bit-vector x left y positions
      Throw away extra bits on left
      Fill with 0's on right
  Right shift: x >> y
    Shift bit-vector x right y positions
      Throw away extra bits on right
    Logical shift: fill with 0's on left
    Arithmetic shift: replicate most significant bit on left
  Caveats in C
    Arithmetic or logical right shift of a negative signed value? Up to the compiler (implementation-defined)
      Often arithmetic for signed; always logical for unsigned
    Undefined behavior: shift amount < 0 or ≥ word size

  Argument x    01100010        Argument x    10100010
  << 3          00010000        << 3          00010000
  Log. >> 2     00011000        Log. >> 2     00101000
  Arith. >> 2   00011000        Arith. >> 2   11101000
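
Since the kind of right shift applied to negative signed values is left to the compiler, one way to get a predictable result is to build both shifts from unsigned operations. The sketch below does this for 8-bit values; the function names are hypothetical and the shift amount k is assumed to be in the range 0 to 7.

```c
#include <stdint.h>

/* Logical right shift: unsigned operands are always filled with 0s on the left. */
uint8_t shr_logical(uint8_t x, unsigned k) {
    return (uint8_t)(x >> k);
}

/* Arithmetic right shift built only from unsigned operations,
   so the result does not depend on the compiler. Requires 0 <= k <= 7. */
uint8_t shr_arith(uint8_t x, unsigned k) {
    uint8_t shifted = (uint8_t)(x >> k);          /* zero-filled shift */
    unsigned sign = x >> 7;                       /* 1 if the MSB is set */
    /* k copies of the sign bit in the top k positions */
    uint8_t fill = (uint8_t)(sign ? (0xFFu << (8 - k)) : 0u);
    return (uint8_t)(shifted | fill);
}
```

For x = 0xA2 (10100010) and k = 2, shr_logical yields 00101000 and shr_arith yields 11101000, matching the right-hand column of the table.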

Cool Stuff with Xor
    void funnyswap(int *x, int *y)
    {
        *x = *x ^ *y;   /* #1 */
        *y = *x ^ *y;   /* #2 */
        *x = *x ^ *y;   /* #3 */
    }
  Bitwise Xor is a form of addition
    With the extra property that every value is its own additive inverse: A ^ A = 0

  Step     *x               *y
  Begin    A                B
  1        A^B              B
  2        A^B              (A^B)^B = A
  3        (A^B)^A = B      A
  End      B                A
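
One caveat that the step table does not show: if both pointers refer to the same object, step #1 computes A ^ A = 0 and the value is lost. A guarded variant might look like the sketch below; the name safe_xorswap is an illustrative assumption, not part of the notes.

```c
/* Xor swap with a guard against aliasing (hypothetical helper). */
void safe_xorswap(int *x, int *y)
{
    if (x == y)
        return;        /* same object: x ^ x would zero it */
    *x = *x ^ *y;      /* *x holds A^B */
    *y = *x ^ *y;      /* *y holds (A^B)^B = A */
    *x = *x ^ *y;      /* *x holds (A^B)^A = B */
}
```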

Some basic representations (sizes in bytes)

  C Data Type    Typical 32-bit   Typical 64-bit   x86-64
  char           1                1                1
  short          2                2                2
  int            4                4                4
  long           4                8                8
  float          4                4                4
  double         8                8                8
  long double    −                −                10/16
  pointer        4                8                8

Encoding Integers
  Unsigned:
    $B2U(X) = \sum_{i=0}^{w-1} x_i \cdot 2^i$
  Two's complement:
    $B2T(X) = -x_{w-1} \cdot 2^{w-1} + \sum_{i=0}^{w-2} x_i \cdot 2^i$
    (the first term is the contribution of the sign bit)

    short int x = 15213;
    short int y = -15213;

  C short is 2 bytes long

       Decimal   Hex     Binary
  x    15213     3B 6D   00111011 01101101
  y    -15213    C4 93   11000100 10010011

  Sign bit
    For 2's complement, most significant bit indicates sign
      0 for nonnegative
      1 for negative
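
The two formulas can be evaluated term by term. The following sketch interprets the low w bits of a pattern both ways; the function names b2u and b2t mirror the slide's notation, but the signatures are assumptions made for this example, with 1 ≤ w ≤ 32.

```c
#include <stdint.h>

/* B2U: sum of x_i * 2^i over i = 0 .. w-1. Requires 1 <= w <= 32. */
int64_t b2u(uint32_t bits, int w)
{
    int64_t sum = 0;
    for (int i = 0; i < w; i++)
        if ((bits >> i) & 1u)
            sum += (int64_t)1 << i;
    return sum;
}

/* B2T: same sum over i = 0 .. w-2, but bit w-1 has weight -2^(w-1). */
int64_t b2t(uint32_t bits, int w)
{
    int64_t sum = 0;
    for (int i = 0; i < w - 1; i++)
        if ((bits >> i) & 1u)
            sum += (int64_t)1 << i;
    if ((bits >> (w - 1)) & 1u)
        sum -= (int64_t)1 << (w - 1);
    return sum;
}
```

Applied to the 16-bit patterns above, b2t(0x3B6D, 16) gives 15213 and b2t(0xC493, 16) gives -15213, while b2u(0xC493, 16) gives 50323.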

Encoding Example (Cont.)
    x =  15213: 00111011 01101101
    y = -15213: 11000100 10010011

  Weight    Bit of x   Value    Bit of y   Value
  1         1          1        1          1
  2         0          0        1          2
  4         1          4        0          0
  8         1          8        0          0
  16        0          0        1          16
  32        1          32       0          0
  64        1          64       0          0
  128       0          0        1          128
  256       1          256      0          0
  512       1          512      0          0
  1024      0          0        1          1024
  2048      1          2048     0          0
  4096      1          4096     0          0
  8192      1          8192     0          0
  16384     0          0        1          16384
  -32768    0          0        1          -32768
  Sum                  15213               -15213

Numeric Ranges
  Unsigned values
    UMin = 0                  000…0
    UMax = $2^w - 1$          111…1
  Two's complement values
    TMin = $-2^{w-1}$         100…0
    TMax = $2^{w-1} - 1$      011…1
  Other values
    Minus 1                   111…1

  Values for w = 16
          Decimal   Hex     Binary
  UMax    65535     FF FF   11111111 11111111
  TMax    32767     7F FF   01111111 11111111
  TMin    -32768    80 00   10000000 00000000
  -1      -1        FF FF   11111111 11111111
  0       0         00 00   00000000 00000000

Values for Different Word Sizes

  w       8      16        32               64
  UMax    255    65,535    4,294,967,295    18,446,744,073,709,551,615
  TMax    127    32,767    2,147,483,647    9,223,372,036,854,775,807
  TMin    -128   -32,768   -2,147,483,648   -9,223,372,036,854,775,808

  Observations
    |TMin| = TMax + 1
      Asymmetric range
    UMax = 2 * TMax + 1
  C programming
    #include <limits.h>
      K&R App. B11
    Declares constants, e.g., ULONG_MAX, LONG_MAX, LONG_MIN
    Values are platform-specific

Unsigned & Signed Numeric Values
  Equivalence
    Same encodings for nonnegative values
  Uniqueness
    Every bit pattern represents a unique integer value
    Each representable integer has a unique bit encoding
  Can invert mappings
    $U2B(x) = B2U^{-1}(x)$
      Bit pattern for unsigned integer
    $T2B(x) = B2T^{-1}(x)$
      Bit pattern for two's comp integer

  X      B2U(X)   B2T(X)
  0000   0        0
  0001   1        1
  0010   2        2
  0011   3        3
  0100   4        4
  0101   5        5
  0110   6        6
  0111   7        7
  1000   8        –8
  1001   9        –7
  1010   10       –6
  1011   11       –5
  1100   12       –4
  1101   13       –3
  1110   14       –2
  1111   15       –1

Casting Signed to Unsigned
  C allows conversions from signed to unsigned

    short int x = 15213;
    unsigned short int ux = (unsigned short) x;
    short int y = -15213;
    unsigned short int uy = (unsigned short) y;

  Resulting value
    No change in bit representation
    Nonnegative values unchanged
      ux = 15213
    Negative values change into (large) positive values
      uy = 50323

Relation Between Signed & Unsigned
  uy = y + 2 * 32768 = y + 65536

  Weight    Bit of y   Value     Bit of uy   Value
  1         1          1         1           1
  2         1          2         1           2
  4         0          0         0           0
  8         0          0         0           0
  16        1          16        1           16
  32        0          0         0           0
  64        0          0         0           0
  128       1          128       1           128
  256       0          0         0           0
  512       0          0         0           0
  1024      1          1024      1           1024
  2048      0          0         0           0
  4096      0          0         0           0
  8192      0          0         0           0
  16384     1          16384     1           16384
  32768     1          -32768    1           32768
  Sum                  -15213                50323
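
The relation uy = y + 65536 can be checked against what the C cast produces. A minimal sketch for w = 16 follows; the names t2u16 and cast16 are invented for this example.

```c
#include <stdint.h>

/* T2U for w = 16: negative values gain 2^16, nonnegative values are unchanged. */
int32_t t2u16(int16_t x)
{
    return x < 0 ? (int32_t)x + 65536 : (int32_t)x;
}

/* The value obtained by casting the same 16-bit pattern to unsigned. */
int32_t cast16(int16_t x)
{
    return (int32_t)(uint16_t)x;
}
```

Both functions return 50323 for -15213 and leave 15213 unchanged, in line with the table.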

Conversion Visualized
  2's comp. → unsigned
    Ordering inversion
    Negative → big positive
[Figure: the 2's complement range (TMin, …, –2, –1, 0, …, TMax) mapped onto the unsigned range (0, …, TMax, TMax + 1, …, UMax – 1, UMax)]

Signed vs. Unsigned in C
  Constants
    By default are considered to be signed integers
    Unsigned if they have "U" as suffix
      0U, 4294967259U
  Casting
    Explicit casting between signed & unsigned (same as U2T and T2U)
        int tx, ty;
        unsigned ux, uy;
        tx = (int) ux;
        uy = (unsigned) ty;
    Implicit casting also occurs via assignments and procedure calls
        tx = ux;
        uy = ty;

Casting Surprises
  Expression evaluation
    If unsigned and signed are mixed in a single expression, signed values are implicitly cast to unsigned
    Including comparison operations <, >, ==, <=, >=
    Examples for W = 32 (TMin is written as -2147483647-1, because the literal 2147483648 does not fit in a 32-bit int)

  Constant1       Constant2            Relation   Evaluation
  0               0U                   ==         unsigned
  -1              0                    <          signed
  -1              0U                   >          unsigned
  2147483647      -2147483647-1        >          signed
  2147483647U     -2147483647-1        <          unsigned
  -1              -2                   >          signed
  (unsigned) -1   -2                   >          unsigned
  2147483647      2147483648U          <          unsigned
  2147483647      (int) 2147483648U    >          signed

Remember!
  Bit pattern does not change
  Bit pattern gets re-interpreted ⇒ unexpected effects
  When signed and unsigned are mixed ⇒ the signed value is cast to unsigned

Sign Extension
  Task:
    Given w-bit signed integer X
    Convert it to a (w+k)-bit integer X′ with the same value
  Rule:
    Make k copies of the sign bit:
    $X' = x_{w-1}, \ldots, x_{w-1}, x_{w-1}, x_{w-2}, \ldots, x_0$
    (the leading k entries are the copies of the MSB)
[Figure: the w bits of X copied into the low w bits of X′, with k copies of the MSB filling the upper bits]

Justification For Sign Extension
  Prove correctness by induction on k
    Induction step: extending by a single bit maintains the value
  Key observation: $-2^{w-1} = -2^w + 2^{w-1}$
  Look at the weight of the upper bits:
    $X$:    $-2^{w-1} x_{w-1}$
    $X'$:   $-2^w x_{w-1} + 2^{w-1} x_{w-1} = -2^{w-1} x_{w-1}$
[Figure: X with w bits and X′ with w+1 bits; the top bit of X′ has weight $-2^w$ and the next bit has weight $+2^{w-1}$]

Sign Extension Example

    short int x = 15213;
    int ix = (int) x;
    short int y = -15213;
    int iy = (int) y;

        Decimal   Hex           Binary
  x     15213     3B 6D         00111011 01101101
  ix    15213     00 00 3B 6D   00000000 00000000 00111011 01101101
  y     -15213    C4 93         11000100 10010011
  iy    -15213    FF FF C4 93   11111111 11111111 11000100 10010011

  Converting from a smaller to a larger integer data type
    C automatically performs sign extension
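
What the cast does automatically can also be written out by hand. The sketch below extends a 16-bit pattern to 32 bits by copying bit 15 into the upper half; sign_extend16 is a hypothetical helper written for this example.

```c
#include <stdint.h>

/* Extend a 16-bit pattern to 32 bits by replicating bit 15 into bits 16..31. */
uint32_t sign_extend16(uint16_t x)
{
    uint32_t wide = x;             /* upper 16 bits start as 0 */
    if (x & 0x8000u)
        wide |= 0xFFFF0000u;       /* sign bit set: fill upper bits with 1s */
    return wide;
}
```

For the pattern C4 93 this yields FF FF C4 93, the same bits as iy in the table.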

Visualizing (Mathematical) Integer Addition
  Integer addition
    4-bit integers u, v
    Compute true sum $Add_4(u, v)$
    Values increase linearly with u and v
    Forms planar surface
[Figure: surface plot of $Add_4(u, v)$ over u and v from 0 to 14, with the sum axis running from 0 to 32]

Unsigned Addition
  Operands: w bits               u, v
  True sum: w+1 bits             u + v
  Discard carry: w bits          $UAdd_w(u, v)$

  Standard addition function
    Ignores carry output
  Implements modular arithmetic
    $s = UAdd_w(u, v) = (u + v) \bmod 2^w$
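
Because the carry is discarded, overflow has to be detected from the result itself. A small sketch for w = 8 follows; the names uadd8 and uadd8_overflows are assumptions made for illustration.

```c
#include <stdint.h>

/* UAdd for w = 8: the true sum is reduced modulo 2^8. */
uint8_t uadd8(uint8_t u, uint8_t v)
{
    return (uint8_t)(u + v);
}

/* Overflow occurred exactly when the wrapped sum is smaller than an operand. */
int uadd8_overflows(uint8_t u, uint8_t v)
{
    return uadd8(u, v) < u;
}
```

For example, 200 + 100 has true sum 300, which wraps to 300 - 256 = 44.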

Visualizing Unsigned Addition
  Wraps around
    If true sum ≥ $2^w$
    At most once
[Figure: surface plot of $UAdd_4(u, v)$ over u and v from 0 to 14; the modular sum follows the true sum until it reaches $2^w$, then drops back by $2^w$ in the region marked Overflow; true-sum axis marked 0, $2^w$, $2^{w+1}$]

Two's Complement Addition
  Operands: w bits               u, v
  True sum: w+1 bits             u + v
  Discard carry: w bits          $TAdd_w(u, v)$

  TAdd and UAdd have identical bit-level behavior
    Signed vs. unsigned addition in C:
        int s, t, u, v;
        s = (int) ((unsigned) u + (unsigned) v);
        t = u + v;
    Will give s == t
      (on typical two's complement machines; strictly speaking, signed overflow in u + v is undefined in C)

TAdd Overflow
  Functionality
    True sum requires w+1 bits
    Drop off MSB
    Treat remaining bits as 2's comp. integer

  True sum (w+1 bits)    Value               Outcome
  0 111…1                $2^w - 1$           PosOver
  0 100…0                $2^{w-1}$           PosOver
  0 000…0                0                   no overflow
  1 011…1                $-2^{w-1} - 1$      NegOver
  1 000…0                $-2^w$              NegOver

  TAdd result (w bits) ranges from 100…0 through 000…0 to 011…1
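
In C, signed overflow is undefined, so a check for PosOver and NegOver is best made before the addition is performed. The following sketch does so for int; the name tadd_ok is an assumption made for this example.

```c
#include <limits.h>

/* Returns 1 if u + v fits in an int, 0 if the true sum would overflow.
   The test is done before adding, so no overflowing addition is executed. */
int tadd_ok(int u, int v)
{
    if (v > 0 && u > INT_MAX - v)
        return 0;                  /* PosOver: true sum > TMax */
    if (v < 0 && u < INT_MIN - v)
        return 0;                  /* NegOver: true sum < TMin */
    return 1;
}
```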

Visualizing 2's Complement Addition
  Values
    4-bit two's comp.
    Range from -8 to +7
  Wraps around
    If sum ≥ $2^{w-1}$
      Becomes negative
      At most once
    If sum < $-2^{w-1}$
      Becomes positive
      At most once
[Figure: surface plot of $TAdd_4(u, v)$ over u and v from -8 to 6, with the PosOver and NegOver regions marked]

Multiplication
  Goal: computing product of w-bit numbers x, y
    Either signed or unsigned
  But exact results can be bigger than w bits
    Unsigned: up to 2w bits
      Result range: $0 \le x \cdot y \le (2^w - 1)^2 = 2^{2w} - 2^{w+1} + 1$
    Two's complement min (negative): up to 2w–1 bits
      Result range: $x \cdot y \ge (-2^{w-1}) \cdot (2^{w-1} - 1) = -2^{2w-2} + 2^{w-1}$
    Two's complement max (positive): up to 2w bits, but only for $(TMin_w)^2$
      Result range: $x \cdot y \le (-2^{w-1})^2 = 2^{2w-2}$
  So, maintaining exact results…
    would need to keep expanding word size with each product computed
    is done in software, if needed
      e.g., by "arbitrary precision" arithmetic packages

Unsigned Multiplication in C
  Operands: w bits               u, v
  True product: 2*w bits         u · v
  Discard w bits: w bits         $UMult_w(u, v)$

  Standard multiplication function
    Ignores high order w bits
  Implements modular arithmetic
    $UMult_w(u, v) = (u \cdot v) \bmod 2^w$

Signed Multiplication in C
  Operands: w bits               u, v
  True product: 2*w bits         u · v
  Discard w bits: w bits         $TMult_w(u, v)$

  Standard multiplication function
    Ignores high order w bits
    Some of which are different for signed vs. unsigned multiplication
    Lower bits are the same

Power-of-2 Multiply with Shift
  Operation
    u << k gives $u \cdot 2^k$
    Both signed and unsigned

  Operands: w bits               u and $2^k$ (bit pattern 0 … 0 1 0 … 0, with k trailing zeros)
  True product: w+k bits         $u \cdot 2^k$
  Discard k bits: w bits         $UMult_w(u, 2^k)$, $TMult_w(u, 2^k)$ (low k bits are 0)

  Examples
    u << 3 == u * 8
    (u << 5) – (u << 3) == u * 24
    Most machines shift and add faster than multiply
      Compiler generates this code automatically
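
The second example rests on 24 = 32 - 8. Written out as a function (times24 is a name chosen for this sketch), using an unsigned type so that wraparound is well defined:

```c
#include <stdint.h>

/* u * 24 computed as (u << 5) - (u << 3), i.e. 32u - 8u, modulo 2^32. */
uint32_t times24(uint32_t u)
{
    return (u << 5) - (u << 3);
}
```

The shifted form agrees with ordinary multiplication even when the product wraps, since both are taken modulo $2^{32}$.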

Why Should I Use Unsigned?
  Don't use just because a number is nonnegative
    C compilers on some machines generate less efficient code
        unsigned i; int cnt;
        for (i = 1; i < cnt; i++)
            a[i] += a[i-1];
      cnt is cast to unsigned (repeatedly)
    Easy to make mistakes
        for (i = cnt-2; i >= 0; i--)
            a[i] += a[i+1];
      What's the bug? The loop runs forever: i is unsigned, so it is never < 0
    Should be
        for (i = cnt-2; i < cnt; i--)
            a[i] += a[i+1];
      Why does this work? Decrementing i past 0 wraps it around to UMax, which is not less than cnt, so the loop ends
  Do use for:
    Multiprecision or modular arithmetic
    Need extra bit's range, right up to limit of word size
    Representing sets
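
The corrected loop can be packaged as a function to see the wraparound at work. The sketch below uses size_t for both the index and the count; the name suffix_sums is made up for this example.

```c
#include <stddef.h>

/* For i = cnt-2 down to 0: a[i] += a[i+1], using an unsigned index.
   When i is decremented past 0 it wraps to SIZE_MAX, so i < cnt fails
   and the loop stops. The same test also handles cnt = 0 and cnt = 1. */
void suffix_sums(int *a, size_t cnt)
{
    for (size_t i = cnt - 2; i < cnt; i--)
        a[i] += a[i + 1];
}
```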

C Puzzle
  Assume machine with 32 bit word size, two's comp. integers
  TMin makes a good counterexample in many cases

  Initialization
    int x = foo();
    int y = bar();
    unsigned ux = x;
    unsigned uy = y;

  Is each statement always true?
    x < 0                ⇒   ((x*2) < 0)
    ux >= 0
    (x & 7) == 7         ⇒   (x<<30) < 0
    ux > -1
    x > y                ⇒   -x < -y
    x * x >= 0
    x > 0 && y > 0       ⇒   x + y > 0
    x >= 0               ⇒   -x <= 0
    x <= 0               ⇒   -x >= 0

C Puzzle Answers
  Assume machine with 32 bit word size, two's comp. integers
  TMin makes a good counterexample in many cases

    x < 0                ⇒   ((x*2) < 0)      False: TMin
    ux >= 0                                   True: 0 = UMin
    (x & 7) == 7         ⇒   (x<<30) < 0      True: $x_1 = 1$
    ux > -1                                   False: 0
    x > y                ⇒   -x < -y          False: -1, TMin
    x * x >= 0                                False: x = 65535
    x > 0 && y > 0       ⇒   x + y > 0        False: TMax, TMax
    x >= 0               ⇒   -x <= 0          True: –TMax < 0
    x <= 0               ⇒   -x >= 0          False: TMin

Security Example: Impact of (un)signed
  Similar to code found in FreeBSD's implementation of getpeername
  There are legions of smart people trying to find vulnerabilities in programs

    /* Kernel memory region holding user-accessible data */
    #define KSIZE 1024
    char kbuf[KSIZE];

    /* Copy at most maxlen bytes from kernel region to user buffer */
    int copy_from_kernel(void *user_dest, int maxlen) {
        /* Byte count len is minimum of buffer size and maxlen */
        int len = KSIZE < maxlen ? KSIZE : maxlen;
        memcpy(user_dest, kbuf, len);
        return len;
    }

Typical Usage
  (kbuf, KSIZE and copy_from_kernel as in the security example)

    #define MSIZE 528

    void getstuff() {
        char mybuf[MSIZE];
        copy_from_kernel(mybuf, MSIZE);
        printf("%s\n", mybuf);
    }

Malicious Usage
  (kbuf, KSIZE and copy_from_kernel as in the security example)

    #define MSIZE 528

    void getstuff() {
        char mybuf[MSIZE];
        copy_from_kernel(mybuf, -MSIZE);
        . . .
    }

    /* Declaration of library function memcpy */
    void *memcpy(void *dest, void *src, size_t n);

  With maxlen = -528, len is also -528; memcpy takes a size_t, so the negative count is reinterpreted as a very large unsigned value
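
One possible repair is to make the length unsigned from the start, so that no negative count can reach memcpy. The sketch below is an assumption about how such a fix might look, not the actual FreeBSD patch; copy_from_kernel_checked is a hypothetical name.

```c
#include <stddef.h>
#include <string.h>

/* Kernel memory region holding user-accessible data */
#define KSIZE 1024
char kbuf[KSIZE];

/* Variant with unsigned lengths: whatever the caller passes,
   at most KSIZE bytes are ever read from kbuf. */
size_t copy_from_kernel_checked(void *user_dest, size_t maxlen)
{
    size_t len = KSIZE < maxlen ? KSIZE : maxlen;
    memcpy(user_dest, kbuf, len);
    return len;
}
```

A caller that passes a negative value now requests a huge unsigned length, which is clamped to KSIZE. The caller's destination buffer must still be large enough for the length it asks for.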

Code Security Example #2
  SUN XDR library
    Widely used library for transferring data between machines

    void* copy_elements(void *ele_src[], int ele_cnt, size_t ele_size);

[Figure: the objects referenced by ele_src copied one after another into a buffer obtained from malloc(ele_cnt * ele_size)]

XDR Code

    void* copy_elements(void *ele_src[], int ele_cnt, size_t ele_size) {
        /*
         * Allocate buffer for ele_cnt objects, each of ele_size bytes
         * and copy from locations designated by ele_src
         */
        void *result = malloc(ele_cnt * ele_size);
        if (result == NULL)
            /* malloc failed */
            return NULL;
        void *next = result;
        int i;
        for (i = 0; i < ele_cnt; i++) {
            /* Copy object i to destination */
            memcpy(next, ele_src[i], ele_size);
            /* Move pointer to next memory region */
            next += ele_size;
        }
        return result;
    }

XDR Vulnerability
    malloc(ele_cnt * ele_size)
  What if:
    ele_cnt = $2^{20} + 1$
    ele_size = 4096 = $2^{12}$
    Allocation on 32 bits?
  What happens to the 'next' pointer?
  How can I make this function secure?
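
On a 32-bit machine the product $(2^{20} + 1) \cdot 2^{12} = 2^{32} + 2^{12}$ wraps to 4096, so only 4096 bytes are allocated while the loop still copies $2^{20} + 1$ elements. One way to guard against this is to reject sizes whose product does not fit in size_t. The sketch below is one possible hardening, not the fix adopted by the XDR library; copy_elements_checked is a hypothetical name.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Like copy_elements, but refuses element counts and sizes
   whose product would overflow size_t. */
void *copy_elements_checked(void *ele_src[], size_t ele_cnt, size_t ele_size)
{
    if (ele_size != 0 && ele_cnt > SIZE_MAX / ele_size)
        return NULL;                       /* ele_cnt * ele_size would wrap */

    unsigned char *result = malloc(ele_cnt * ele_size);
    if (result == NULL)
        return NULL;                       /* malloc failed */

    unsigned char *next = result;
    for (size_t i = 0; i < ele_cnt; i++) {
        memcpy(next, ele_src[i], ele_size);   /* copy object i */
        next += ele_size;                     /* move to next region */
    }
    return result;
}
```

Using unsigned char * for next also avoids arithmetic on void *, which standard C does not define.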

Machine-Level Code Representation
  Encode program as sequence of instructions
    Arithmetic, logical, math operations
    Read or write memory
    Conditional branches, jumps
  Different machines, different instructions
    Most code is not binary compatible
    Alphas, Suns, ARM (tablets) use fixed-length instructions
      Reduced Instruction Set Computer (RISC)
    PCs use variable-length instructions
      Complex Instruction Set Computer (CISC)

Representing Instructions

    int sum(int x, int y)
    {
        return x+y;
    }

  For this example, Alpha & Sun use two 4-byte instructions
    Use differing numbers of instructions in other cases
  PC uses 7 instructions with lengths 1, 2, and 3 bytes
    Same for NT and for Linux
  Different CPUs use totally different instructions and encodings

  Alpha sum:   00 00 30 42 01 80 FA 6B
  Sun sum:     81 C3 E0 08 90 02 00 09
  PC sum:      55 89 E5 8B 45 0C 03 45 08 89 EC 5D C3

Machines have Words
  Imprecise definitions
    Nominal size of integer-valued data
    Sometimes size of address, memory bus width
  Current desktop machines are 64 bits (8 bytes)
    Potentially address $1.8 \times 10^{19}$ bytes
    32-bit machines are being phased out (but are still in phones, tablets)
    Low-end machines use 8- or 16-bit words
  Machines support multiple data formats
    Fractions or multiples of word size
    Always integral number of bytes

Byte-Oriented Memory Organization
  Programs refer to virtual addresses
    Conceptually very large array of bytes
    Implemented with hierarchy of different memory types
      SRAM, DRAM, disk
    In Unix and Windows NT, address space private to "process"
      Program can clobber its own data, but not that of others
  Compiler + run-time system control allocation
    Where different program objects should be stored
    Multiple mechanisms: static, stack, and heap
    In any case, all allocation within single virtual address space
  You will see this again, in much more detail

Word-Oriented Memory Organization
  Addresses specify byte locations
    Address of first byte in word
    Addresses of successive words differ by 4 (32-bit) or 8 (64-bit)

  Bytes (Addr.)    32-bit words    64-bit words
  0000 to 0003     Addr = 0000     Addr = 0000
  0004 to 0007     Addr = 0004
  0008 to 0011     Addr = 0008     Addr = 0008
  0012 to 0015     Addr = 0012

Byte Ordering
  How should bytes within a multi-byte word be ordered in memory?
  Conventions
    Suns, (PowerPC-era) Macs are "Big Endian" machines
      Least significant byte has highest address
    Alphas, PCs are "Little Endian" machines
      Least significant byte has lowest address

Byte Ordering Example
  Big Endian
    Least significant byte has highest address
  Little Endian
    Least significant byte has lowest address
  Example
    Variable x has 4-byte representation 0x01234567
    Address given by &x is 0x100

                    0x100   0x101   0x102   0x103
  Big Endian        01      23      45      67
  Little Endian     67      45      23      01
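
Which of the two rows a given machine produces can be found by looking at the byte stored at the lowest address of x. A minimal sketch, assuming the machine is either big or little endian; is_little_endian is a name chosen for this example.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 if the least significant byte is stored at the lowest address. */
int is_little_endian(void)
{
    uint32_t x = 0x01234567;
    unsigned char first;
    memcpy(&first, &x, 1);     /* the byte at address &x */
    return first == 0x67;
}
```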

Reading Byte-Reversed Listings
  Disassembly
    Text representation of binary machine code
    Generated by program that reads the machine code
  Example fragment

    Address     Instruction Code         Assembly Rendition
    8048365:    5b                       pop    %ebx
    8048366:    81 c3 ab 12 00 00        add    $0x12ab,%ebx
    804836c:    83 bb 28 00 00 00 00     cmpl   $0x0,0x28(%ebx)

  Deciphering numbers
    Value:               0x12ab
    Pad to 4 bytes:      0x000012ab
    Split into bytes:    00 00 12 ab
    Reverse:             ab 12 00 00

Examining Data Representations
  Code to print byte representation of data
    Casting pointer to unsigned char * creates byte array

    typedef unsigned char *pointer;

    void show_bytes(pointer start, int len)
    {
        int i;
        for (i = 0; i < len; i++)
            /* %p expects a void *, hence the cast */
            printf("0x%p\t0x%.2x\n", (void *)(start+i), start[i]);
        printf("\n");
    }

  Printf directives:
    %p: print pointer
    %x: print hexadecimal
show_bytes Execution Example
int a = 15213;
printf("int a = 15213;\n");
show_bytes((pointer) &a, sizeof(int));
Result (Linux):
int a = 15213;
0x11ffffcb8 0x6d
0x11ffffcb9 0x3b
0x11ffffcba 0x00
0x11ffffcbb 0x00

Representing Integers

    int A = 15213;
    int B = -15213;
    long int C = 15213;

  Decimal: 15213
  Binary:  0011 1011 0110 1101
  Hex:        3    B    6    D

  Bytes listed from lowest to highest address:

         Alpha                      Sun            ia32
  A      6D 3B 00 00                00 00 3B 6D
  B      93 C4 FF FF                FF FF C4 93
  C      6D 3B 00 00 00 00 00 00    00 00 3B 6D    6D 3B 00 00

  B uses the two's complement representation

Representing Pointers

    int B = -15213;
    int *P = &B;

  Different compilers & machines assign different locations to objects

  Alpha address
    Hex:     ... 0 1 F F F F F C A 0
    Binary:  0000 0001 1111 1111 1111 1111 1111 1100 1010 0000
    Alpha P (lowest address first): A0 FC FF FF 01 00 00 00
  Sun address
    Hex:     E F F F F B 2 C
    Binary:  1110 1111 1111 1111 1111 1011 0010 1100
    Sun P (lowest address first): EF FF FB 2C
  Linux address
    Hex:     B F F F F 8 D 4
    Binary:  1011 1111 1111 1111 1111 1000 1101 0100
    Linux P (lowest address first): D4 F8 FF BF
Characters and Strings

Representing a character
  Single byte sufficient for small codes
    In C: char c;
    But interpretation of value depends on chosen code
  Fixed-length codes:
    Baudot, ITA2, USTTY: 5 bits
    BCD (binary coded decimal, used in early computers): 6 bits
    ASCII: defines 128 characters (7 bits)
    ISO 8859, EBCDIC: 8 bits
  Variable-length codes
    UTF-8: 8-32 bits
    UTF-16: 16-32 bits
  Wide chars: chars of more than 8 bits
Example: ASCII

Representing Strings (in C)
  Represented by array of 1-byte characters
    Each character encoded in ASCII format
      Standard 7-bit encoding of character set
      Character "0" has code 0x30
        Digit i has code 0x30+i
    Other encodings exist (e.g., for Hebrew)
  Strings are null-terminated
    Final character = 0x00
    Array length must be at least strlen()+1
  Text files generally platform independent
    But: different conventions of line termination character(s)!
    But: different default encodings (depend on locale)

    char S[6] = "15213";

  Bytes listed from lowest to highest address (byte ordering is not an issue):
    Alpha S:   31 35 32 31 33 00
    Sun S:     31 35 32 31 33 00

Representing Strings (Cont'd)
  Strings in Pascal (a.k.a. P-strings)
    Represented by array of characters + 1
    First cell holds length, no need for null-termination
    Length known in O(1)!
      Think about strcat(), strlen(), strcpy(), ...
    Size of string limited by size of array cell value
  C-strings and P-strings can be used with wide chars, but:
    Big/little endian now matters (each cell is multiple bytes)
    No longer built-in, instead platform/library specific

  P-string for "15213" (lowest address first): 5 31 35 32 31 33
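
The difference between the two layouts shows up in how the length is obtained. The sketch below converts a C string into a length-prefixed buffer with 1-byte cells; the layout and the names cstr_to_pstr and pstr_len are assumptions made for illustration, not a standard API.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical P-string layout: cell 0 holds the length, the characters follow,
   and there is no terminating 0x00.
   Returns 1 on success, 0 if the string is longer than 255 characters
   or does not fit in buf. */
int cstr_to_pstr(const char *s, unsigned char *buf, size_t bufsize)
{
    size_t n = strlen(s);              /* O(n): scans for the 0x00 byte */
    if (n > 255 || bufsize < n + 1)
        return 0;
    buf[0] = (unsigned char)n;         /* length limited by the cell size */
    memcpy(buf + 1, s, n);
    return 1;
}

/* O(1) length: read the first cell. */
size_t pstr_len(const unsigned char *p)
{
    return p[0];
}
```

With 1-byte cells the length cell caps the string at 255 characters, which is the size limit mentioned on the slide.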

Main Points
  Boolean algebra is the mathematical basis
    Basic form encodes "false" as 0, "true" as 1
    General form like bit-level operations in C
      Good for representing & manipulating sets
  It's all about bits & bytes
    Numbers, text, programs: they are all collections of bits!
  Different machines/languages: different conventions
    Representations
    Word size
    Byte ordering

Questions?
Computer Organization:A Programmer's Perspective
Bits, Bytes, Nibbles, Words and Strings
Computer Organization: A Programmer's Perspective Based on class notes by Bryant and O'Hallaron 2
Topics Why bits? Why 0/1? Basic terms: Bits, Bytes, Nibbles, Words Representing information as bits
Characters and strings Instructions (more on this when we look at assembly) Numbers
Bit-level manipulations Boolean algebra Expressing in C
Computer Organization: A Programmer's Perspective Based on class notes by Bryant and O'Hallaron 3
Everything is bits
Each bit is 0 or 1
By encoding/interpreting sets of bits in various ways
  Computers represent numbers, sets, strings, etc.
  Computers manipulate representations (instructions)
Why bits? Electronic implementation is easy
  Easy to store with bistable elements
  Reliably transmitted on noisy and inaccurate wires
[Figure: signal waveform carrying the bits 0 1 0, with voltage axis marks at 0.0V, 0.2V, 0.9V and 1.1V]
Binary Representations
Number representation (base 2)
  Represent $15213_{10}$ as $11101101101101_2$
  Represent $1.5213 \times 10^4$ as $1.1101101101101_2 \times 2^{13}$
  Represent $1.20_{10}$ as $1.0011001100110011[0011]\ldots_2$
Instruction representation
  In AMD64 machine code
    RETQ command: C3 (hex) = $11000011_2$
    MOV $0, %EAX: b8 (hex) = $10111000_2$ [00000000 … ]
Terms
Bit: Single binary digit, 0 or 1
Byte: 8 bits
  Smallest addressable unit of memory in modern computers
Nibble (English: small bite): 4 bits
  2 nibbles = 1 byte
Word: 8-64 bits (1 to 8 bytes)
  Depends on machine!
Encoding Byte Values
Byte = 8 bits, nibble = 4 bits
Binary: $00000000_2$ to $11111111_2$
Decimal: $0_{10}$ to $255_{10}$
Hexadecimal: $00_{16}$ to $FF_{16}$
  Two nibbles
  Base 16 number representation
  Use characters '0' to '9' and 'A' to 'F'
  Write $FA1D37B_{16}$ in C as 0xFA1D37B
    » Or 0xfa1d37b
Octal: $0_8$ to $377_8$
  Base 8, not often used
  Written in C as 0256 (the leading 0 is a zero)

Hex   Decimal   Binary
0     0         0000
1     1         0001
2     2         0010
3     3         0011
4     4         0100
5     5         0101
6     6         0110
7     7         0111
8     8         1000
9     9         1001
A     10        1010
B     11        1011
C     12        1100
D     13        1101
E     14        1110
F     15        1111
Bit level operations
Boolean Algebra
Developed by George Boole in the 19th century
  Algebraic representation of logic
  Encode "True" as 1 and "False" as 0
And: A&B = 1 when both A=1 and B=1
Or: A|B = 1 when either A=1 or B=1 (or both)
Not: ~A = 1 when A=0
Exclusive-Or (Xor): A^B = 1 when either A=1 or B=1, but not both
General Boolean Algebras
Operate on bit vectors
  Operations applied bitwise
All of the properties of Boolean algebra apply

  01101001      01101001      01101001
& 01010101    | 01010101    ^ 01010101    ~ 01010101
  --------      --------      --------      --------
  01000001      01111101      00111100      10101010
Example: Representing & Manipulating Sets
Representation
  Width-w bit vector represents subsets of {0, …, w-1}
  $a_j = 1$ if $j \in A$
    01101001    { 0, 3, 5, 6 }
    76543210
    01010101    { 0, 2, 4, 6 }
    76543210
Operations
  &   Intersection           01000001   { 0, 6 }
  |   Union                  01111101   { 0, 2, 3, 4, 5, 6 }
  ^   Symmetric difference   00111100   { 2, 3, 4, 5 }
  ~   Complement             10101010   { 1, 3, 5, 7 }
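The set operations above map directly onto C's bitwise operators. The following is a minimal sketch for subsets of {0, …, 7} stored in one byte; the type name bitset8 and the helper names are hypothetical and chosen only for this illustration.

```c
#include <stdint.h>

/* Hypothetical type: bit j is 1 exactly when j is a member of the set. */
typedef uint8_t bitset8;

/* Add element j (0..7) to the set. */
bitset8 set_add(bitset8 s, unsigned j) { return (bitset8)(s | (1u << j)); }

/* Return 1 if j (0..7) is in the set, 0 otherwise. */
int set_has(bitset8 s, unsigned j) { return (int)((s >> j) & 1u); }

/* The four operations from the slide. */
bitset8 set_intersect(bitset8 a, bitset8 b) { return (bitset8)(a & b); }
bitset8 set_union(bitset8 a, bitset8 b)     { return (bitset8)(a | b); }
bitset8 set_symdiff(bitset8 a, bitset8 b)   { return (bitset8)(a ^ b); }

/* ~a is computed as an int after promotion; the cast keeps the low 8 bits. */
bitset8 set_complement(bitset8 a)           { return (bitset8)~a; }
```

With a = 0x69 ({0, 3, 5, 6}) and b = 0x55 ({0, 2, 4, 6}), set_intersect(a, b) gives 0x41, which is {0, 6}.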
Bit-Level Operations in C
Operations &, |, ~, ^ available in C
  Apply to any "integral" data type
    long, int, short, char, unsigned
  View arguments as bit vectors
  Arguments applied bit-wise
Examples (char data type)
  ~0x41 ➙ 0xBE
    ~$01000001_2$ ➙ $10111110_2$
  ~0x00 ➙ 0xFF
    ~$00000000_2$ ➙ $11111111_2$
  0x69 & 0x55 ➙ 0x41
    $01101001_2$ & $01010101_2$ ➙ $01000001_2$
  0x69 | 0x55 ➙ 0x7D
    $01101001_2$ | $01010101_2$ ➙ $01111101_2$
Contrast: Logic Operations in C
Contrast to logical operators &&, ||, !
  View 0 as "False"
  Anything nonzero as "True"
  Always return 0 or 1
  Early termination
Examples (char data type)
  !0x41 ➙ 0x00
  !0x00 ➙ 0x01
  !!0x41 ➙ 0x01
  0x69 && 0x55 ➙ 0x01
  0x69 || 0x55 ➙ 0x01
  p && *p (avoids null pointer access)
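The difference between the bitwise and the logical operators is easy to see side by side. This is a small sketch; the function names are hypothetical and exist only for this example.

```c
#include <stddef.h>

/* Bitwise AND: combines the operands bit by bit. */
int bitwise_and(int a, int b) { return a & b; }

/* Logical AND: treats each operand as true/false and returns 0 or 1. */
int logical_and(int a, int b) { return a && b; }

/*
 * Early termination: when p is NULL the right-hand side is never
 * evaluated, so *p is not dereferenced.
 */
int points_to_nonzero(const int *p) { return p && *p; }
```

Note that 0x0F & 0xF0 is 0 (no bit positions in common), while 0x0F && 0xF0 is 1 (both operands are nonzero).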
Shift Operations
Left Shift: x << y
  Shift bit-vector x left y positions (throw away extra bits on left)
  Fill with 0's on right
Right Shift: x >> y
  Shift bit-vector x right y positions (throw away extra bits on right)
  Logical shift: fill with 0's on left
  Arithmetic shift: replicate most significant bit on left
Undefined behavior
  Shift amount < 0 or ≥ word size
Implementation-defined behavior
  Right shift of a negative signed value: arithmetic or logical? Up to the compiler!
  Usually arithmetic for signed; always logical for unsigned

Argument x    01100010
<< 3          00010000
Log. >> 2     00011000
Arith. >> 2   00011000

Argument x    10100010
<< 3          00010000
Log. >> 2     00101000
Arith. >> 2   11101000
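The two example tables can be reproduced on 8-bit values. The sketch below assumes shift counts between 0 and 7 and uses hypothetical function names; the arithmetic shift is built from a logical shift so that the result does not depend on how a particular compiler treats >> on signed values.

```c
#include <stdint.h>

/* Left shift on 8 bits: bits shifted past bit 7 are discarded by the cast. */
uint8_t shl8(uint8_t x, unsigned k) { return (uint8_t)(x << k); }

/* Logical right shift: unsigned operands always fill with 0's on the left. */
uint8_t shr8_logical(uint8_t x, unsigned k) { return (uint8_t)(x >> k); }

/* Arithmetic right shift: replicate bit 7 into the k vacated positions. */
uint8_t shr8_arith(uint8_t x, unsigned k)
{
    uint8_t shifted = (uint8_t)(x >> k);
    if (x & 0x80u) {
        /* Mask with the top k bits set, e.g. k = 2 gives 0xC0. */
        shifted |= (uint8_t)~(0xFFu >> k);
    }
    return shifted;
}
```

For x = 0xA2 (10100010), shr8_logical(x, 2) is 0x28 (00101000) and shr8_arith(x, 2) is 0xE8 (11101000), matching the second table.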
Cool Stuff with Xor
    void funnyswap(int *x, int *y)
    {
        *x = *x ^ *y;   /* #1 */
        *y = *x ^ *y;   /* #2 */
        *x = *x ^ *y;   /* #3 */
    }
Bitwise Xor is a form of addition
  With the extra property that every value is its own additive inverse: A ^ A = 0

Step    *x             *y
Begin   A              B
1       A^B            B
2       A^B            (A^B)^B = A
3       (A^B)^A = B    A
End     B              A
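One caveat with the Xor trick is worth a concrete example: if both pointers refer to the same object, step #1 computes A ^ A = 0 and the value is lost. The sketch below repeats funnyswap and adds a guarded variant; the name safeswap is hypothetical.

```c
/* The swap from the slide, unchanged. */
void funnyswap(int *x, int *y)
{
    *x = *x ^ *y;   /* #1 */
    *y = *x ^ *y;   /* #2 */
    *x = *x ^ *y;   /* #3 */
}

/* Guarded variant: swapping an object with itself is a no-op. */
void safeswap(int *x, int *y)
{
    if (x == y)
        return;
    funnyswap(x, y);
}
```

Calling funnyswap(&c, &c) leaves c equal to 0 regardless of its previous value, while safeswap(&c, &c) leaves it untouched.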
Some basic representations (sizes in bytes)
C Data Type   Typical 32-bit   Typical 64-bit   x86-64
char          1                1                1
short         2                2                2
int           4                4                4
long          4                8                8
float         4                4                4
double        8                8                8
long double   −                −                10/16
pointer       4                8                8
Encoding Integers
Unsigned: $B2U(X) = \sum_{i=0}^{w-1} x_i \cdot 2^i$
Two's Complement: $B2T(X) = -x_{w-1} \cdot 2^{w-1} + \sum_{i=0}^{w-2} x_i \cdot 2^i$
  The first term is the sign bit
    short int x = 15213;
    short int y = -15213;
C short is 2 bytes long

     Decimal   Hex     Binary
x    15213     3B 6D   00111011 01101101
y    -15213    C4 93   11000100 10010011

Sign Bit
  For 2's complement, most significant bit indicates sign
    0 for nonnegative
    1 for negative
Encoding Example (Cont.)
x =  15213: 00111011 01101101
y = -15213: 11000100 10010011

Weight    Bit of x   Contribution   Bit of y   Contribution
1         1          1              1          1
2         0          0              1          2
4         1          4              0          0
8         1          8              0          0
16        0          0              1          16
32        1          32             0          0
64        1          64             0          0
128       0          0              1          128
256       1          256            0          0
512       1          512            0          0
1024      0          0              1          1024
2048      1          2048           0          0
4096      1          4096           0          0
8192      1          8192           0          0
16384     0          0              1          16384
-32768    0          0              1          -32768
Sum                  15213                     -15213
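The two encoding functions can be written as loops that follow the sums term by term. This is a sketch under the assumption that the bit pattern is passed in the low w bits of an unsigned value with 1 ≤ w ≤ 32; the names b2u and b2t mirror the slide notation but are otherwise hypothetical.

```c
/* B2U: every bit i contributes x_i * 2^i. */
long long b2u(unsigned pattern, unsigned w)
{
    long long sum = 0;
    for (unsigned i = 0; i < w; i++)
        sum += (long long)((pattern >> i) & 1u) << i;
    return sum;
}

/* B2T: bits 0..w-2 contribute as in B2U; bit w-1 has weight -2^(w-1). */
long long b2t(unsigned pattern, unsigned w)
{
    long long sum = 0;
    for (unsigned i = 0; i + 1 < w; i++)
        sum += (long long)((pattern >> i) & 1u) << i;
    sum -= (long long)((pattern >> (w - 1)) & 1u) << (w - 1);
    return sum;
}
```

For the pattern 0xC493 with w = 16, b2u gives 50323 and b2t gives -15213, the two readings of y shown in the table.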
Numeric Ranges
Unsigned Values
  UMin = 0                000…0
  UMax = $2^w - 1$        111…1
Two's Complement Values
  TMin = $-2^{w-1}$       100…0
  TMax = $2^{w-1} - 1$    011…1
Other Values
  Minus 1                 111…1

Values for W = 16
        Decimal   Hex     Binary
UMax    65535     FF FF   11111111 11111111
TMax    32767     7F FF   01111111 11111111
TMin    -32768    80 00   10000000 00000000
-1      -1        FF FF   11111111 11111111
0       0         00 00   00000000 00000000
Values for Different Word Sizes
W       8      16        32               64
UMax    255    65,535    4,294,967,295    18,446,744,073,709,551,615
TMax    127    32,767    2,147,483,647    9,223,372,036,854,775,807
TMin    -128   -32,768   -2,147,483,648   -9,223,372,036,854,775,808
Observations
  |TMin| = TMax + 1
    Asymmetric range
  UMax = 2 * TMax + 1
C Programming
  #include <limits.h>
    K&R App. B11
  Declares constants, e.g.,
    ULONG_MAX
    LONG_MAX
    LONG_MIN
  Values platform-specific
Unsigned & Signed Numeric Values
X      B2U(X)   B2T(X)
0000   0        0
0001   1        1
0010   2        2
0011   3        3
0100   4        4
0101   5        5
0110   6        6
0111   7        7
1000   8        -8
1001   9        -7
1010   10       -6
1011   11       -5
1100   12       -4
1101   13       -3
1110   14       -2
1111   15       -1
Equivalence
  Same encodings for nonnegative values
Uniqueness
  Every bit pattern represents a unique integer value
  Each representable integer has a unique bit encoding
Can Invert Mappings
  $U2B(x) = B2U^{-1}(x)$
  $T2B(x) = B2T^{-1}(x)$
    Bit pattern for two's comp integer
Casting Signed to Unsigned
C allows conversions from signed to unsigned
    short int x = 15213;
    unsigned short int ux = (unsigned short) x;
    short int y = -15213;
    unsigned short int uy = (unsigned short) y;
Resulting Value
  No change in bit representation
  Nonnegative values unchanged
    ux = 15213
  Negative values change into (large) positive values
    uy = 50323
Relation Between Signed & Unsigned
uy = y + 2 * 32768 = y + 65536

Weight   Bit   Contribution to -15213   Contribution to 50323
1        1     1                        1
2        1     2                        2
4        0     0                        0
8        0     0                        0
16       1     16                       16
32       0     0                        0
64       0     0                        0
128      1     128                      128
256      0     0                        0
512      0     0                        0
1024     1     1024                     1024
2048     0     0                        0
4096     0     0                        0
8192     0     0                        0
16384    1     16384                    16384
32768    1     -32768                   32768
Sum            -15213                   50323
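The relation uy = y + 65536 for negative y can be checked against the cast itself. The sketch below uses the fixed-width types from <stdint.h> so that the 16-bit assumption is explicit; both function names are hypothetical.

```c
#include <stdint.h>

/* Conversion by cast: the value is reduced modulo 2^16. */
uint16_t t2u16(int16_t y) { return (uint16_t)y; }

/* The same mapping written as arithmetic: add 2^16 to negative values. */
int32_t t2u16_by_formula(int16_t y)
{
    return y < 0 ? (int32_t)y + 65536 : (int32_t)y;
}
```

Both functions map -15213 to 50323 and leave 15213 unchanged.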
Conversion Visualized
2's Comp. → Unsigned
  Ordering Inversion
  Negative → Big Positive
[Figure: the 2's complement range (TMin, …, –2, –1, 0, …, TMax) mapped onto the unsigned range (0, …, TMax, TMax + 1, …, UMax – 1, UMax)]
Signed vs. Unsigned in C
Constants
  By default are considered to be signed integers
  Unsigned if have "U" as suffix
    0U, 4294967259U
Casting
  Explicit casting between signed & unsigned (U2T and T2U)
    int tx, ty;
    unsigned ux, uy;
    tx = (int) ux;
    uy = (unsigned) ty;
  Implicit casting also occurs via assignments and procedure calls
    tx = ux;
    uy = ty;
Casting Surprises
Expression Evaluation
  If unsigned and signed are mixed in a single expression, signed values are implicitly cast to unsigned
  Including comparison operations <, >, ==, <=, >=
  Examples for W = 32

Constant1       Constant2           Relation   Evaluation
0               0U                  ==         unsigned
-1              0                   <          signed
-1              0U                  >          unsigned
2147483647      -2147483648         >          signed
2147483647U     -2147483648         <          unsigned
-1              -2                  >          signed
(unsigned) -1   -2                  >          unsigned
2147483647      2147483648U         <          unsigned
2147483647      (int) 2147483648U   >          signed
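A few rows of the table can be reproduced with two tiny comparison helpers. This is a sketch assuming a 32-bit int, as on the slide; the function names are hypothetical, and a compiler may warn about the mixed comparison, which is exactly the point.

```c
/* Both operands signed: ordinary signed comparison. */
int less_signed(int a, int b) { return a < b; }

/*
 * Mixed operands: a is implicitly converted to unsigned before the
 * comparison, so a negative a becomes a very large value.
 */
int less_mixed(int a, unsigned b) { return a < b; }
```

less_signed(-1, 0) is 1, but less_mixed(-1, 0U) is 0 because -1 is compared as UMax.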
Remember!
Bit pattern does not change
Bit pattern gets re-interpreted ==> unexpected effects
When signed and unsigned are mixed => cast to unsigned
Sign Extension
Task:
  Given w-bit signed integer x
  Convert it to a (w+k)-bit integer with the same value
Rule:
  Make k copies of the sign bit:
  $X' = x_{w-1}, \ldots, x_{w-1}, x_{w-1}, x_{w-2}, \ldots, x_0$
  The first k entries are the copies of the MSB
[Figure: w-bit X extended to (w+k)-bit X′ by replicating the MSB k times]
Justification For Sign Extension
Prove correctness by induction on k
  Induction step: extending by a single bit maintains the value
  Key observation: $-2^{w-1} = -2^w + 2^{w-1}$
  Look at the weight of the upper bits:
    $X$:  $-2^{w-1} x_{w-1}$
    $X'$: $-2^w x_{w-1} + 2^{w-1} x_{w-1} = -2^{w-1} x_{w-1}$
[Figure: w-bit X beside (w+1)-bit X′, whose top two bits carry weights – and +]
Sign Extension Example
    short int x = 15213;
    int ix = (int) x;
    short int y = -15213;
    int iy = (int) y;

     Decimal   Hex           Binary
x    15213     3B 6D         00111011 01101101
ix   15213     00 00 3B 6D   00000000 00000000 00111011 01101101
y    -15213    C4 93         11000100 10010011
iy   -15213    FF FF C4 93   11111111 11111111 11000100 10010011

Converting from smaller to larger integer data type
C automatically performs sign extension
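The rule "make k copies of the sign bit" can also be applied by hand to a raw bit pattern. The following sketch extends 16 bits to 32 bits (w = 16, k = 16); the function name is hypothetical.

```c
#include <stdint.h>

/* Copy bit 15 of the 16-bit pattern into bits 16..31 of the result. */
uint32_t sign_extend16(uint16_t x)
{
    uint32_t wide = x;
    if (x & 0x8000u)
        wide |= 0xFFFF0000u;
    return wide;
}
```

The pattern C4 93 becomes FF FF C4 93 and 3B 6D becomes 00 00 3B 6D, the same bytes C produces for iy and ix in the table.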
Visualizing (Mathematical) Integer Addition
Integer Addition
  4-bit integers u, v
  Compute true sum Add4(u, v)
  Values increase linearly with u and v
  Forms planar surface
[Figure: surface plot of Add4(u, v) against u and v]
Unsigned Addition
Operands: w bits               u
                             + v
True Sum: w+1 bits             u + v
Discard Carry: w bits          UAdd_w(u, v)
Standard Addition Function
  Ignores carry output
Implements Modular Arithmetic
  $s = UAdd_w(u, v) = (u + v) \bmod 2^w$
Visualizing Unsigned Addition
Wraps Around
  If true sum ≥ $2^w$
  At most once
[Figure: true sum (from 0 up to $2^{w+1}$) beside the modular sum, with the overflow region marked; surface plot of UAdd4(u, v) against u and v]
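Because the sum wraps at most once, a wrapped result is always smaller than either operand, which gives a simple overflow test. This is a sketch for w = 8 with hypothetical function names.

```c
#include <stdint.h>

/* UAdd_8: the true sum is computed as an int, then reduced modulo 2^8. */
uint8_t uadd8(uint8_t u, uint8_t v) { return (uint8_t)(u + v); }

/* The sum wrapped around exactly when the result is smaller than u. */
int uadd8_overflows(uint8_t u, uint8_t v) { return uadd8(u, v) < u; }
```

For example, 200 + 100 has true sum 300, which wraps to 300 - 256 = 44.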
Two's Complement Addition
Operands: w bits               u
                             + v
True Sum: w+1 bits             u + v
Discard Carry: w bits          TAdd_w(u, v)
TAdd and UAdd have identical bit-level behavior
  Signed vs. unsigned addition in C:
    int s, t, u, v;
    s = (int) ((unsigned) u + (unsigned) v);
    t = u + v;
  Will give s == t
TAdd Overflow
Functionality
  True sum requires w+1 bits
  Drop off MSB
  Treat remaining bits as 2's comp. integer
[Figure: number line of the (w+1)-bit true sum beside the w-bit TAdd result; true sums above TMax wrap to negative results (PosOver) and true sums below TMin wrap to nonnegative results (NegOver)]
Visualizing 2's Complement Addition
Values
  4-bit two's comp.
  Range from -8 to +7
Wraps Around
  If sum ≥ $2^{w-1}$
    Becomes negative
    At most once
  If sum < $-2^{w-1}$
    Becomes positive
    At most once
[Figure: surface plot of TAdd4(u, v) for u and v from -8 to 7, with the PosOver and NegOver regions marked]
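The wrap-around can be modelled for w = 8 by adding the bit patterns as unsigned values and then re-reading the low 8 bits as two's complement. This is only a sketch with hypothetical function names; the re-interpretation is written out arithmetically so that it does not rely on how a compiler converts out-of-range values to a signed type.

```c
#include <stdint.h>

/* TAdd_8: add the bit patterns, drop the carry, reinterpret the result. */
int8_t tadd8(int8_t u, int8_t v)
{
    uint8_t bits = (uint8_t)((uint8_t)u + (uint8_t)v);
    /* Patterns with the sign bit set stand for bits - 2^8. */
    return bits >= 128 ? (int8_t)(bits - 256) : (int8_t)bits;
}

/* Overflow check using the exact sum, which always fits in an int. */
int tadd8_overflows(int8_t u, int8_t v)
{
    int sum = u + v;
    return sum > INT8_MAX || sum < INT8_MIN;
}
```

100 + 100 overflows positively and yields -56, while -100 + -100 overflows negatively and yields 56.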
Multiplication
Goal: Computing product of w-bit numbers x, y
  Either signed or unsigned
But, exact results can be bigger than w bits
  Unsigned: up to 2w bits
    Result range: $0 \le x \cdot y \le (2^w - 1)^2 = 2^{2w} - 2^{w+1} + 1$
  Two's complement min (negative): up to 2w-1 bits
    Result range: $x \cdot y \ge (-2^{w-1}) \cdot (2^{w-1} - 1) = -2^{2w-2} + 2^{w-1}$
  Two's complement max (positive): up to 2w bits, but only for $(TMin_w)^2$
    Result range: $x \cdot y \le (-2^{w-1})^2 = 2^{2w-2}$
So, maintaining exact results…
  would need to keep expanding word size with each product computed
  is done in software, if needed
    e.g., by "arbitrary precision" arithmetic packages
Unsigned Multiplication in C
Operands: w bits               u
                             * v
True Product: 2*w bits         u · v
Discard w bits: w bits         UMult_w(u, v)
Standard Multiplication Function
  Ignores high order w bits
Implements Modular Arithmetic
  $UMult_w(u, v) = (u \cdot v) \bmod 2^w$
Signed Multiplication in C
Operands: w bits               u
                             * v
True Product: 2*w bits         u · v
Discard w bits: w bits         TMult_w(u, v)
Standard Multiplication Function
  Ignores high order w bits
  Some of which are different for signed vs. unsigned multiplication
  Lower bits are the same
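The claim that the lower bits agree can be checked for w = 8 by multiplying the same bit patterns once as unsigned and once as signed values. This is a sketch with hypothetical function names; the signed product is returned as a raw bit pattern for easy comparison.

```c
#include <stdint.h>

/* UMult_8: low 8 bits of the unsigned product. */
uint8_t umult8(uint8_t u, uint8_t v)
{
    return (uint8_t)((unsigned)u * v);
}

/* Low 8 bits of the signed product, as a bit pattern. */
uint8_t tmult8_bits(int8_t u, int8_t v)
{
    return (uint8_t)((int)u * v);
}
```

The pattern 0xFD is 253 unsigned and -3 signed: 253 * 3 = 759 = 0x2F7 and -3 * 3 = -9 both end in the byte 0xF7.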
Power-of-2 Multiply with Shift
Operation
  u << k gives $u \cdot 2^k$
  Both signed and unsigned
Operands: w bits               u
                             * $2^k$   (a single 1 followed by k zeros)
True Product: w+k bits         $u \cdot 2^k$   (u followed by k zeros)
Discard k bits: w bits         UMult_w(u, $2^k$), TMult_w(u, $2^k$)
Examples
  u << 3 == u * 8
  (u << 5) - (u << 3) == u * 24
  Most machines shift and add faster than multiply
    Compiler generates this code automatically
Why Should I Use Unsigned?
Don't Use Just Because Number Nonnegative
  C compilers on some machines generate less efficient code
    unsigned i; int cnt;
    for (i = 1; i < cnt; i++)
        a[i] += a[i-1];
    cnt may be cast to unsigned again on every pass through the loop
  Easy to make mistakes
    for (i = cnt-2; i >= 0; i--)
        a[i] += a[i+1];
    What's the bug? The loop runs forever: i is unsigned, so it is never < 0. When i == 0, i-- wraps around to a big number
  Should be
    for (i = cnt-2; i < cnt; i--)
        a[i] += a[i+1];
    Why does this work? After i == 0, i-- wraps to UMax, which is not < cnt, so the loop ends
Do Use For:
  Multiprecision or modular arithmetic
  Need extra bits' range, right up to limit of word size
  Represent sets
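A common alternative for counting down with an unsigned index is to test before decrementing. The sketch below is not from the slides; it assumes the count is passed as a size_t, and the function name is hypothetical.

```c
#include <stddef.h>

/* In place, from right to left: a[i] += a[i+1] for i = cnt-2 down to 0. */
void suffix_sums(int *a, size_t cnt)
{
    if (cnt < 2)
        return;
    /*
     * i-- > 0 compares first and decrements afterwards, so the body
     * sees i = cnt-2, ..., 0 and the loop stops before i would wrap
     * inside the body.
     */
    for (size_t i = cnt - 1; i-- > 0; )
        a[i] += a[i + 1];
}
```

For the array {1, 2, 3, 4} the result is {10, 9, 7, 4}.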
C Puzzle
Assume machine with 32-bit word size, two's comp. integers
TMin makes a good counterexample in many cases
Initialization
    int x = foo();
    int y = bar();
    unsigned ux = x;
    unsigned uy = y;
Expressions
  x < 0              ⇒   ((x*2) < 0)
  ux >= 0
  (x & 7) == 7       ⇒   (x<<30) < 0
  ux > -1
  x > y              ⇒   -x < -y
  x * x >= 0
  x > 0 && y > 0     ⇒   x + y > 0
  x >= 0             ⇒   -x <= 0
  x <= 0             ⇒   -x >= 0
C Puzzle Answers
Assume machine with 32-bit word size, two's comp. integers
TMin makes a good counterexample in many cases
  x < 0              ⇒   ((x*2) < 0)     False: TMin
  ux >= 0                                True: 0 = UMin
  (x & 7) == 7       ⇒   (x<<30) < 0     True: $x_1 = 1$
  ux > -1                                False: 0
  x > y              ⇒   -x < -y         False: -1, TMin
  x * x >= 0                             False: x = 65535
  x > 0 && y > 0     ⇒   x + y > 0       False: TMax, TMax
  x >= 0             ⇒   -x <= 0         True: –TMax < 0
  x <= 0             ⇒   -x >= 0         False: TMin
Security Example: Impact of (un)signed
Similar to code found in FreeBSD's implementation of getpeername
There are legions of smart people trying to find vulnerabilities in programs

    /* Kernel memory region holding user-accessible data */
    #define KSIZE 1024
    char kbuf[KSIZE];

    /* Copy at most maxlen bytes from kernel region to user buffer */
    int copy_from_kernel(void *user_dest, int maxlen) {
        /* Byte count len is minimum of buffer size and maxlen */
        int len = KSIZE < maxlen ? KSIZE : maxlen;
        memcpy(user_dest, kbuf, len);
        return len;
    }
Typical Usage
Calling copy_from_kernel with the size of the user buffer:

    #define MSIZE 528

    void getstuff() {
        char mybuf[MSIZE];
        copy_from_kernel(mybuf, MSIZE);
        printf("%s\n", mybuf);
    }
Malicious Usage
    /* Declaration of library function memcpy */
    void *memcpy(void *dest, void *src, size_t n);

    #define MSIZE 528

    void getstuff() {
        char mybuf[MSIZE];
        copy_from_kernel(mybuf, -MSIZE);
        . . .
    }
The negative maxlen passes the minimum test in copy_from_kernel (len = -528), and memcpy then converts len to a huge unsigned size_t
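One possible way to close this hole is to make the length unsigned from the start, so that a negative argument can no longer slip under the minimum test. The sketch below is an illustration, not the actual FreeBSD fix; the function name copy_from_kernel_checked is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Kernel memory region holding user-accessible data */
#define KSIZE 1024
char kbuf[KSIZE];

/*
 * Copy at most maxlen bytes from kernel region to user buffer.
 * With size_t, a caller passing -528 ends up with a huge maxlen,
 * and the minimum test caps the copy at KSIZE bytes.
 */
size_t copy_from_kernel_checked(void *user_dest, size_t maxlen)
{
    size_t len = KSIZE < maxlen ? KSIZE : maxlen;
    memcpy(user_dest, kbuf, len);
    return len;
}
```

The copy can now never read past the end of kbuf. The caller is still responsible for passing a destination buffer that really has room for maxlen bytes.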
Code Security Example #2
SUN XDR library
  Widely used library for transferring data between machines

    void* copy_elements(void *ele_src[], int ele_cnt, size_t ele_size);

[Figure: the objects referenced by ele_src are copied into one buffer obtained from malloc(ele_cnt * ele_size)]
XDR Code
    void* copy_elements(void *ele_src[], int ele_cnt, size_t ele_size) {
        /*
         * Allocate buffer for ele_cnt objects, each of ele_size bytes
         * and copy from locations designated by ele_src
         */
        void *result = malloc(ele_cnt * ele_size);
        if (result == NULL)
            /* malloc failed */
            return NULL;
        void *next = result;
        int i;
        for (i = 0; i < ele_cnt; i++) {
            /* Copy object i to destination */
            memcpy(next, ele_src[i], ele_size);
            /* Move pointer to next memory region */
            next += ele_size;
        }
        return result;
    }
XDR Vulnerability
    malloc(ele_cnt * ele_size)
What if:
  ele_cnt = $2^{20} + 1$
  ele_size = 4096 = $2^{12}$
  Allocation on 32 bits?
  What happens to the 'next' pointer?
How can I make this function secure?
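With these values the true product is $2^{32} + 2^{12}$, which wraps to 4096 in 32-bit arithmetic, so the buffer is far smaller than the data the loop goes on to copy. One way to guard against this is to check the multiplication before allocating. The sketch below shows the wrap and such a check; both function names are hypothetical and this is not the actual XDR patch.

```c
#include <stddef.h>
#include <stdint.h>

/* What a 32-bit machine computes for ele_cnt * ele_size. */
uint32_t alloc_size_32(uint32_t ele_cnt, uint32_t ele_size)
{
    return ele_cnt * ele_size;   /* reduced modulo 2^32 */
}

/*
 * Store ele_cnt * ele_size in *out and return 1 if the product fits
 * in a size_t; return 0 (leaving *out untouched) otherwise.
 */
int checked_alloc_size(int ele_cnt, size_t ele_size, size_t *out)
{
    if (ele_cnt < 0)
        return 0;
    size_t cnt = (size_t)ele_cnt;
    if (ele_size != 0 && cnt > SIZE_MAX / ele_size)
        return 0;
    *out = cnt * ele_size;
    return 1;
}
```

A secured copy_elements could call such a check first and return NULL when it fails, before malloc is ever reached.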
Machine-Level Code Representation
Encode Program as Sequence of Instructions
  Arithmetic, logical, math operations
  Read or write memory
  Conditional branches, jumps
Different machines, different instructions
  Most code not binary compatible
  Alphas, Suns, ARM (tablets) use fixed-length instructions
    Reduced Instruction Set Computer (RISC)
  PCs use variable-length instructions
    Complex Instruction Set Computer (CISC)
Representing Instructions
    int sum(int x, int y)
    {
        return x+y;
    }
For this example, Alpha & Sun use two 4-byte instructions
  Use differing numbers of instructions in other cases
PC uses 7 instructions with lengths 1, 2, and 3 bytes
  Same for NT and for Linux

Alpha sum:   00 00 30 42 01 80 FA 6B
Sun sum:     81 C3 E0 08 90 02 00 09
PC sum:      55 89 E5 8B 45 0C 03 45 08 89 EC 5D C3

Different CPUs use totally different instructions and encodings
Machines have Words
Imprecise definitions
  Nominal size of integer-valued data
  Sometimes size of address, memory bus width
Current desktop machines are 64 bits (8 bytes)
  Potentially address $1.8 \times 10^{19}$ bytes
  32-bit machines phasing out (but still in phones, tablets)
  Low-end machines use 8- or 16-bit words
Machines support multiple data formats
  Fractions or multiples of word size
  Always integral number of bytes
Byte-Oriented Memory Organization
Programs Refer to Virtual Addresses
  Conceptually very large array of bytes
  Implemented with hierarchy of different memory types
    SRAM, DRAM, disk
  In Unix and Windows NT, address space private to "process"
    Program can clobber its own data, but not that of others
Compiler + Run-Time System Control Allocation
  Where different program objects should be stored
  Multiple mechanisms: static, stack, and heap
  In any case, all allocation within single virtual address space
You will see this again, in much more detail
Word-Oriented Memory Organization
Addresses Specify Byte Locations
  Address of first byte in word
  Addresses of successive words differ by 4 (32-bit) or 8 (64-bit)

Bytes Addr.   32-bit Words   64-bit Words
0000          Addr = 0000    Addr = 0000
0001
0002
0003
0004          Addr = 0004
0005
0006
0007
0008          Addr = 0008    Addr = 0008
0009
0010
0011
0012          Addr = 0012
0013
0014
0015
Byte Ordering
How should bytes within a multi-byte word be ordered in memory?
Conventions
  Suns, Macs are "Big Endian" machines
    Least significant byte has highest address
  Alphas, PCs are "Little Endian" machines
    Least significant byte has lowest address
Byte Ordering Example
Big Endian
  Least significant byte has highest address
Little Endian
  Least significant byte has lowest address
Example
  Variable x has 4-byte representation 0x01234567
  Address given by &x is 0x100

Address         0x100   0x101   0x102   0x103
Big Endian      01      23      45      67
Little Endian   67      45      23      01
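The example can be turned into a small run-time probe, together with loaders that rebuild a value from bytes stored in either order regardless of the host. This is a sketch; all three function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Return 1 if this machine stores the least significant byte first. */
int is_little_endian(void)
{
    uint32_t x = 0x01234567u;
    unsigned char bytes[sizeof x];
    memcpy(bytes, &x, sizeof x);   /* copy out the in-memory byte order */
    return bytes[0] == 0x67;
}

/* Rebuild a 32-bit value from 4 bytes stored in big endian order. */
uint32_t load_be32(const unsigned char b[4])
{
    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8)  |  (uint32_t)b[3];
}

/* Rebuild a 32-bit value from 4 bytes stored in little endian order. */
uint32_t load_le32(const unsigned char b[4])
{
    return ((uint32_t)b[3] << 24) | ((uint32_t)b[2] << 16) |
           ((uint32_t)b[1] << 8)  |  (uint32_t)b[0];
}
```

Both rows of the table above decode to 0x01234567 when read with the matching loader.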
Reading Byte-Reversed Listings
Disassembly
  Text representation of binary machine code
  Generated by program that reads the machine code
Example Fragment
    Address     Instruction Code        Assembly Rendition
    8048365:    5b                      pop    %ebx
    8048366:    81 c3 ab 12 00 00       add    $0x12ab,%ebx
    804836c:    83 bb 28 00 00 00 00    cmpl   $0x0,0x28(%ebx)
Deciphering Numbers
  Value:              0x12ab
  Pad to 4 bytes:     0x000012ab
  Split into bytes:   00 00 12 ab
  Reverse:            ab 12 00 00
Examining Data Representations
Code to Print Byte Representation of Data
  Casting pointer to unsigned char * creates byte array

    typedef unsigned char *pointer;

    void show_bytes(pointer start, int len)
    {
        int i;
        for (i = 0; i < len; i++)
            /* %p expects a void *; how it prints the address is implementation-defined */
            printf("%p\t0x%.2x\n", (void *) (start+i), start[i]);
        printf("\n");
    }

Printf directives:
  %p: Print pointer
  %x: Print hexadecimal
show_bytes Execution Example
    int a = 15213;
    printf("int a = 15213;\n");
    show_bytes((pointer) &a, sizeof(int));
Result (Linux):
    int a = 15213;
    0x11ffffcb8   0x6d
    0x11ffffcb9   0x3b
    0x11ffffcba   0x00
    0x11ffffcbb   0x00
Representing Integers
    int A = 15213;
    int B = -15213;
    long int C = 15213;
Decimal: 15213
Binary:  0011 1011 0110 1101
Hex:        3    B    6    D

Bytes in increasing address order:
Alpha A:   6D 3B 00 00
Sun A:     00 00 3B 6D
Alpha B:   93 C4 FF FF
Sun B:     FF FF C4 93
  Two's complement representation
Alpha C:   6D 3B 00 00 00 00 00 00
Sun C:     00 00 3B 6D
ia32 C:    6D 3B 00 00
Representing Pointers
    int B = -15213;
    int *P = &B;
Alpha Address
  Hex:     ... 0 1 F F F F F C A 0
  Binary:  0000 0001 1111 1111 1111 1111 1111 1100 1010 0000
  Alpha P: A0 FC FF FF 01 00 00 00
Sun Address
  Hex:     E F F F F B 2 C
  Binary:  1110 1111 1111 1111 1111 1011 0010 1100
  Sun P:   EF FF FB 2C
Linux Address
  Hex:     B F F F F 8 D 4
  Binary:  1011 1111 1111 1111 1111 1000 1101 0100
  Linux P: D4 F8 FF BF
Different compilers & machines assign different locations to objects
Characters and Strings
Representing a character
Single byte sufficient for small codes
  In C: char c;
  But interpretation of value depends on chosen code
Fixed-length codes:
  Baudot, ITA2, USTTY: 5 bits
  BCD (binary coded decimal, used in early computers): 6 bits
  ASCII: defines 128 characters (7 bits)
  ISO 8859, EBCDIC: 8 bits
Variable-length codes:
  UTF-8: 8-32 bits
  UTF-16: 16-32 bits
Wide chars: chars of more than 8 bits
Example: ASCII
Representing Strings (in C)
    char S[6] = "15213";
Represented by array of 1-byte characters
  Each character encoded in ASCII format
    Standard 7-bit encoding of character set
    Character '0' has code 0x30
      » Digit i has code 0x30+i
    Other encodings exist (e.g., for Hebrew)
Strings are null-terminated
  Final character = 0x00
  Array length must be at least strlen()+1
Text files generally platform independent
  But: Different conventions of line termination character(s)!
  But: Different default encodings (depend on locale)

Alpha S:   31 35 32 31 33 00
Sun S:     31 35 32 31 33 00
  Byte ordering is not an issue: each character is a single byte
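The digit rule and the strlen()+1 rule can be stated in a few lines of C. This is a sketch that assumes an ASCII execution character set, as on the slide; the function names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Character code of digit i (0..9): '0' + i, i.e. 0x30 + i in ASCII. */
char digit_char(int i) { return (char)('0' + i); }

/* Numeric value of a digit character '0'..'9'. */
int digit_value(char c) { return c - '0'; }

/* Bytes needed to store s, including the terminating 0x00. */
size_t storage_needed(const char *s) { return strlen(s) + 1; }
```

For "15213", strlen() returns 5 and storage_needed() returns 6, which is why S is declared as char S[6].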
Representing Strings (Cont'd)
Strings in Pascal (a.k.a. P-strings)
  Represented by an array of characters with one extra cell
  First cell holds the length, so no null-termination is needed
  Length known in O(1)!
    Think about strcat(), strlen(), strcpy(), ...
  Size of string limited by the largest value an array cell can hold
C-strings and P-strings can be used with wide chars, but:
  Big/little endian now matters (each cell is multiple bytes)
  No longer built-in, instead platform/library specific

P-string for "15213" (length cell, then character codes in hex):   5 31 35 32 31 33
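To make the trade-off concrete, here is a minimal sketch of a length-prefixed string in C. The layout (one length byte followed by up to 255 characters) and all names are assumptions made for this illustration, not a standard library type.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical P-string: the first cell holds the length. */
typedef struct {
    unsigned char len;     /* limits the string to 255 characters */
    char data[255];        /* no terminating 0x00 is stored */
} pstring;

/* Build a P-string from a C string; return 0 if it does not fit. */
int pstring_from_c(pstring *p, const char *s)
{
    size_t n = strlen(s);              /* O(n) scan for the terminator */
    if (n > 255)
        return 0;
    p->len = (unsigned char)n;
    memcpy(p->data, s, n);
    return 1;
}

/* Length in O(1): just read the first cell. */
size_t pstring_length(const pstring *p) { return p->len; }

/* Append src to dst without scanning either string; 0 if too long. */
int pstring_concat(pstring *dst, const pstring *src)
{
    if ((size_t)dst->len + src->len > 255)
        return 0;
    memcpy(dst->data + dst->len, src->data, src->len);
    dst->len = (unsigned char)(dst->len + src->len);
    return 1;
}
```

Unlike strcat(), pstring_concat() never has to search for the end of the destination, but the single length byte caps every string at 255 characters.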
Main Points
Boolean Algebra is the Mathematical Basis
  Basic form encodes "false" as 0, "true" as 1
  General form is like bit-level operations in C
  Good for representing & manipulating sets
It's All About Bits & Bytes
  Numbers, text, programs: they are all collections of bits!
Different Machines/Languages: Different Conventions
  Representations
  Word size
  Byte ordering
Questions?