
EI 209 Chapter 3.1 CSE, 2015

EI 209 Computer Organization

Fall 2015

Chapter 3: Arithmetic for Computers

Haojin Zhu (http://tdt.sjtu.edu.cn/~hjzhu/)

[Adapted from Computer Organization and Design, 4th Edition, Patterson & Hennessy, © 2012, MK]



Review: MIPS (RISC) Design Principles

Simplicity favors regularity: fixed-size instructions; a small number of instruction formats; the opcode is always the first 6 bits

Smaller is faster: a limited instruction set; a limited number of registers in the register file; a limited number of addressing modes

Make the common case fast: arithmetic operands come from the register file (load-store machine); instructions may contain immediate operands

Good design demands good compromises: three instruction formats


Specifying Branch Destinations

Use a register (like in lw and sw) added to the 16-bit offset.

Which register? The Instruction Address Register (the PC):
• its use is automatically implied by the instruction
• the PC gets updated (PC+4) during the fetch cycle so that it holds the address of the next instruction

This limits the branch distance to $-2^{15}$ to $+2^{15}-1$ (word) instructions from the (instruction after the) branch instruction, but most branches are local anyway.

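[Figure: branch destination computation. The offset from the low-order 16 bits of the branch instruction is sign-extended from 16 to 32 bits and shifted left 2 bits (appending 00); a 32-bit adder adds it to PC + 4 (computed by a separate adder) to form the branch destination address.]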
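A minimal Python sketch of the computation in the figure, assuming instructions and addresses are plain ints kept to 32 bits; `sign_extend16` and `branch_target` are hypothetical helper names, and the sample beq encoding is constructed by hand for illustration.

```python
MASK32 = 0xFFFFFFFF

def sign_extend16(field: int) -> int:
    """Interpret the low 16 bits as a two's complement number."""
    field &= 0xFFFF
    return field - 0x10000 if field & 0x8000 else field

def branch_target(pc: int, instruction: int) -> int:
    """Branch destination = (PC + 4) + (sign-extended offset << 2), kept to 32 bits."""
    offset = sign_extend16(instruction)      # low-order 16 bits of the branch
    return (pc + 4 + (offset << 2)) & MASK32

# beq $0, $0, -3 (op = 4, offset field 0xFFFD) placed at address 0x00400010
print(hex(branch_target(0x00400010, 0x1000FFFD)))  # 0x400008
```

A negative offset branches backward; the reach is bounded by the 16-bit offset field.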


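Other Control Flow Instructions

MIPS also has an unconditional branch instruction or jump instruction:

j label   #go to label

Instruction Format (J Format):

op = 0x02 | 26-bit address

[Figure: jump target formation. The 26-bit address from the low-order 26 bits of the jump instruction is shifted left 2 bits (appending 00) and concatenated with the upper 4 bits of the PC to form the 32-bit jump destination.]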

Why shift left by two bits?
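One answer, as a hedged sketch: instructions are 4 bytes long and word-aligned, so the two low-order bits of any instruction address are always 00 and need not be stored in the instruction; shifting the 26-bit field left by 2 restores them. The Python below assumes, as in Patterson & Hennessy, that the upper 4 bits are taken from PC + 4; `jump_target` is a hypothetical name.

```python
def jump_target(pc: int, instruction: int) -> int:
    """Pseudo-direct jump: upper 4 bits of PC+4, then the 26-bit field, then 00."""
    field = instruction & 0x03FFFFFF          # low-order 26 bits of the j instruction
    upper = (pc + 4) & 0xF0000000             # upper 4 bits of PC + 4
    return upper | (field << 2)               # shift left 2 appends the two 0 bits

# j with address field 0x0100004 (op = 0x02) executed at 0x00400000
print(hex(jump_target(0x00400000, 0x08100004)))  # 0x400010
```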


Review: MIPS Addressing Modes Illustrated

1. Register addressing (op rs rt rd funct): the operand is a word in a register

2. Base (displacement) addressing (op rs rt offset): a base register plus the offset selects a word or byte operand in memory

3. Immediate addressing (op rs rt operand): the operand is in the instruction itself

4. PC-relative addressing (op rs rt offset): the Program Counter (PC) plus the offset selects the branch destination instruction in memory

5. Pseudo-direct addressing (op jump address): the jump address combined with the Program Counter (PC) selects the jump destination instruction in memory


Number Representations

32-bit signed numbers (2’s complement):

0000 0000 0000 0000 0000 0000 0000 0000two = 0ten
0000 0000 0000 0000 0000 0000 0000 0001two = + 1ten
...
0111 1111 1111 1111 1111 1111 1111 1110two = + 2,147,483,646ten
0111 1111 1111 1111 1111 1111 1111 1111two = + 2,147,483,647ten (maxint)
1000 0000 0000 0000 0000 0000 0000 0000two = – 2,147,483,648ten (minint)
1000 0000 0000 0000 0000 0000 0000 0001two = – 2,147,483,647ten
...
1111 1111 1111 1111 1111 1111 1111 1110two = – 2ten
1111 1111 1111 1111 1111 1111 1111 1111two = – 1ten

(Bit strings are written MSB first, LSB last.)

Converting <32-bit values into 32-bit values: copy the most significant bit (the sign bit) into the “empty” bits

0010 -> 0000 0010
1010 -> 1111 1010

sign extend versus zero extend (lb vs. lbu)
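A short Python sketch of the table and of the two extension rules; `to_signed`, `sign_extend`, and `zero_extend` are hypothetical helper names, and bit patterns are modeled as non-negative ints.

```python
def to_signed(bits: int, width: int = 32) -> int:
    """Read a width-bit pattern as a two's complement value."""
    bits &= (1 << width) - 1
    return bits - (1 << width) if bits >> (width - 1) else bits

def sign_extend(bits: int, from_width: int, to_width: int) -> int:
    """Copy the sign bit into the new upper bits (what lb does for a byte)."""
    return to_signed(bits, from_width) & ((1 << to_width) - 1)

def zero_extend(bits: int, from_width: int) -> int:
    """Fill the new upper bits with 0 (what lbu does for a byte)."""
    return bits & ((1 << from_width) - 1)

print(to_signed(0x7FFFFFFF), to_signed(0x80000000), to_signed(0xFFFFFFFF))
# 2147483647 -2147483648 -1
print(format(sign_extend(0b0010, 4, 8), "08b"), format(sign_extend(0b1010, 4, 8), "08b"))
# 00000010 11111010
print(format(zero_extend(0b1010, 4), "08b"))
# 00001010
```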


MIPS Arithmetic Logic Unit (ALU)

Must support the Arithmetic/Logic operations of the ISA:

add, addi, addiu, addu
sub, subu
mult, multu, div, divu
sqrt
and, andi, nor, or, ori, xor, xori
beq, bne, slt, slti, sltiu, sltu

[Figure: ALU with 32-bit inputs A and B, a 4-bit operation select m, a 32-bit result, and 1-bit zero and overflow (ovf) outputs.]

With special handling for:
• sign extend: addi, addiu, slti, sltiu
• zero extend: andi, ori, xori
• overflow detection: add, addi, sub
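The immediate handling can be sketched as a lookup in Python; the instruction groups come from the list above, while `extend_immediate` and the string-based dispatch are hypothetical.

```python
SIGN_EXTEND = {"addi", "addiu", "slti", "sltiu"}
ZERO_EXTEND = {"andi", "ori", "xori"}

def extend_immediate(mnemonic: str, imm16: int) -> int:
    """Widen a 16-bit immediate to the 32-bit ALU input B."""
    imm16 &= 0xFFFF
    if mnemonic in SIGN_EXTEND:
        # copy bit 15 into bits 31..16
        return (imm16 | 0xFFFF0000) if imm16 & 0x8000 else imm16
    if mnemonic in ZERO_EXTEND:
        # bits 31..16 become 0
        return imm16
    raise ValueError(f"no immediate handling listed for {mnemonic!r}")

print(hex(extend_immediate("addi", 0xFFFF)), hex(extend_immediate("ori", 0xFFFF)))
# 0xffffffff 0xffff
```

Note that sltiu still sign-extends its immediate; only the comparison itself is unsigned.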


Dealing with Overflow

Operation Operand A Operand B Result indicating overflow

A + B ≥ 0 ≥ 0 < 0

A + B < 0 < 0 ≥ 0

A - B ≥ 0 < 0 < 0

A - B < 0 ≥ 0 ≥ 0

Overflow occurs when the result of an operation cannot be represented in 32 bits, i.e., when the sign bit holds a value bit of the result rather than the proper sign bit

When adding operands with different signs or when subtracting operands with the same sign, overflow can never occur

MIPS signals overflow with an exception (aka interrupt) – an unscheduled procedure call where the EPC contains the address of the instruction that caused the exception
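The table translates directly into a sign test. A minimal Python sketch, assuming operands are 32-bit patterns held in non-negative ints; the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def sign(x: int) -> int:
    """Sign bit (bit 31) of a 32-bit pattern."""
    return (x >> 31) & 1

def add_overflows(a: int, b: int) -> bool:
    """A + B overflows if both operands have the same sign and the result's sign differs."""
    result = (a + b) & MASK32
    return sign(a) == sign(b) and sign(result) != sign(a)

def sub_overflows(a: int, b: int) -> bool:
    """A - B overflows if the operands' signs differ and the result's sign differs from A's."""
    result = (a - b) & MASK32
    return sign(a) != sign(b) and sign(result) != sign(a)

print(add_overflows(0x7FFFFFFF, 1))           # True:  maxint + 1
print(sub_overflows(0x80000000, 1))           # True:  minint - 1
print(add_overflows(0x7FFFFFFF, 0xFFFFFFFF))  # False: operands have different signs
```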


Addition & Subtraction

Just like in grade school (carry/borrow 1s):

0111 + 0110 = 1101     0111 - 0110 = 0001     0110 - 0101 = 0001

Two's complement operations are easy: do subtraction by negating and then adding:

0111 - 0110  ->  0111 + 1010 = 1 0001 (the carry-out is discarded, leaving 0001)

Overflow (result too large for a finite computer word): e.g., adding two n-bit numbers may not yield an n-bit number:

0111 + 0001 = 1000
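The 4-bit examples above can be checked with a small Python sketch; the width is a module constant, and `add_bits`, `negate`, and `sub_bits` are hypothetical names. Results are returned as (sum bits, carry-out).

```python
WIDTH = 4
MASK = (1 << WIDTH) - 1

def add_bits(a: int, b: int):
    """Add two WIDTH-bit patterns; return (low WIDTH bits, carry-out)."""
    total = a + b
    return total & MASK, total >> WIDTH

def negate(b: int) -> int:
    """Two's complement negation: invert all bits, then add 1."""
    return (~b + 1) & MASK

def sub_bits(a: int, b: int):
    """Subtract by negating and then adding."""
    return add_bits(a, negate(b))

print(format(negate(0b0110), "04b"))               # 1010
print(sub_bits(0b0111, 0b0110))                    # (1, 1): 1 0001, carry-out dropped
print(format(add_bits(0b0111, 0b0001)[0], "04b"))  # 1000: +7 + 1 does not fit in 4-bit two's complement
```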


Building a 1-bit Binary Adder

[Figure: 1-bit full adder with inputs A, B, carry_in and outputs S, carry_out]

S = A xor B xor carry_in

carry_out = A&B | A&carry_in | B&carry_in (majority function)

How can we use it to build a 32-bit adder?

How can we modify it easily to build an adder/subtractor?

A B carry_in carry_out S

0 0 0 0 0

0 0 1 0 1

0 1 0 0 1

0 1 1 1 0

1 0 0 0 1

1 0 1 1 0

1 1 0 1 0

1 1 1 1 1
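The two equations can be checked against the truth table with a few lines of Python (a sketch; `full_adder` is a hypothetical name). The loop prints the rows in the same order as the table.

```python
def full_adder(a: int, b: int, carry_in: int):
    """1-bit full adder: returns (S, carry_out)."""
    s = a ^ b ^ carry_in
    carry_out = (a & b) | (a & carry_in) | (b & carry_in)   # majority of the three inputs
    return s, carry_out

print("A B carry_in carry_out S")
for a in (0, 1):
    for b in (0, 1):
        for c in (0, 1):
            s, cout = full_adder(a, b, c)
            print(a, b, c, cout, s)
```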


Building 32-bit Adder

[Figure: 32 one-bit full adders (FA) in a chain. Stage i takes Ai, Bi and carry ci and produces Si and ci+1, for i = 0 to 31, with c0 = carry_in and c32 = carry_out.]

Just connect the carry-out of each bit's FA to the carry-in of the FA for the next more significant bit, from bit 0 up to bit 31.

Ripple Carry Adder (RCA)
• advantage: simple logic, so small (low cost)
• disadvantage: slow and lots of glitching (so lots of energy consumption)
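A ripple carry adder is just the 1-bit full adder applied from bit 0 upward, passing each carry along. A Python sketch under that assumption (`ripple_carry_add` is a hypothetical name; operands are unsigned ints of `width` bits):

```python
def full_adder(a, b, carry_in):
    s = a ^ b ^ carry_in
    carry_out = (a & b) | (a & carry_in) | (b & carry_in)
    return s, carry_out

def ripple_carry_add(a: int, b: int, carry_in: int = 0, width: int = 32):
    """Chain `width` full adders; returns (S31..S0 as an int, c32 = carry_out)."""
    result, carry = 0, carry_in            # c0 = carry_in
    for i in range(width):                 # least significant bit first
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        result |= s << i                   # carry now holds c(i+1)
    return result, carry

print(ripple_carry_add(0xFFFFFFFF, 1))     # (0, 1): the carry ripples through all 32 stages
```

The loop mirrors the slowness noted above: bit i cannot settle until the carry from bit i-1 is known.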


A 32-bit Ripple Carry Adder/Subtractor

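Remember 2’s complement is just:
• complement all the bits
• add a 1 in the least significant bit

Example: A - B = 0111 - 0110 becomes 0111 + 1001 + 1 = 1 0001; the carry-out is discarded, leaving 0001

[Figure: the 32-bit ripple carry adder of 1-bit FAs (inputs A0 to A31 and B0 to B31, outputs S0 to S31, c0 = carry_in, c32 = carry_out) extended with an add/sub control line (0 = add, 1 = sub). Each FA receives Bi if control = 0 and !Bi if control = 1, and the control also drives c0, supplying the added 1.]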
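The add/sub control can be sketched in Python by inverting each Bi when control = 1 (an XOR with the control bit) and feeding the control bit in as c0, which supplies the added 1. This is a hedged model of the figure; `add_sub` is a hypothetical name.

```python
def add_sub(a: int, b: int, control: int, width: int = 32):
    """control = 0: A + B; control = 1: A + !B + 1 = A - B. Returns (result, carry_out)."""
    result, carry = 0, control                 # c0 = control provides the +1 for subtraction
    for i in range(width):
        ai = (a >> i) & 1
        bi = ((b >> i) & 1) ^ control          # Bi if control = 0, !Bi if control = 1
        s = ai ^ bi ^ carry
        carry = (ai & bi) | (ai & carry) | (bi & carry)
        result |= s << i
    return result, carry

print(add_sub(0b0111, 0b0110, 1, width=4))     # (1, 1): 0111 - 0110 = 1 0001
```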


Overflow Detection Logic

Carry into MSB != Carry out of MSB

For an N-bit ALU: Overflow = CarryIn[N-1] XOR CarryOut[N-1]

X Y X XOR Y
0 0 0
0 1 1
1 0 1
1 1 0

[Figure: 4-bit ALU built from four 1-bit ALUs. Bit i takes Ai, Bi and CarryIni and produces Resulti and CarryOuti, with CarryOuti feeding CarryIni+1; the Overflow output is the XOR of CarryIn3 and CarryOut3.]

Why?
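The rule can be tested exhaustively at a small width. The Python sketch below (hypothetical name `add_with_overflow`) records the carry into the MSB and the carry out of it, and checks the XOR against the true signed sum for every pair of 4-bit operands.

```python
def add_with_overflow(a: int, b: int, width: int = 32):
    """Ripple-add two width-bit patterns; Overflow = CarryIn[N-1] XOR CarryOut[N-1]."""
    result, carry, carry_into_msb = 0, 0, 0
    for i in range(width):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        if i == width - 1:
            carry_into_msb = carry               # CarryIn[N-1]
        s = ai ^ bi ^ carry
        carry = (ai & bi) | (ai & carry) | (bi & carry)
        result |= s << i
    return result, carry_into_msb ^ carry        # carry is now CarryOut[N-1]

def to_signed(x: int, width: int) -> int:
    return x - (1 << width) if x >> (width - 1) else x

# exhaustive check for N = 4: the XOR is 1 exactly when the true sum is outside -8..7
for a in range(16):
    for b in range(16):
        true_sum = to_signed(a, 4) + to_signed(b, 4)
        assert add_with_overflow(a, b, 4)[1] == (0 if -8 <= true_sum <= 7 else 1)
print("carry rule agrees with the sign rule for all 4-bit pairs")
```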


Multiply

Binary multiplication is just a bunch of right shifts and adds

[Figure: an n-bit multiplicand times an n-bit multiplier produces a partial product array whose sum is the 2n-bit double precision product.]

The partial products can be formed in parallel and added in parallel for faster multiplication.


Multiplication

More complicated than addition; can be accomplished via shifting and adding

0010 (multiplicand) x 1011 (multiplier)

partial product array (one row per multiplier bit, each shifted left one more place):
0000 0010
0000 0100
0000 0000
0001 0000

sum: 0001 0110 (product)

In every step:
• the multiplicand is shifted
• the next bit of the multiplier is examined (also a shifting step)
• if this bit is 1, the shifted multiplicand is added to the product



Multiplication Algorithm 1
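A Python sketch of Algorithm 1 for unsigned operands, assuming the first-version layout from Patterson & Hennessy: a double-width multiplicand register that shifts left, a multiplier register that shifts right, and a double-width product register. `multiply_v1` is a hypothetical name.

```python
def multiply_v1(multiplicand: int, multiplier: int, width: int = 32) -> int:
    mask = (1 << (2 * width)) - 1
    mcand, mplier, product = multiplicand, multiplier, 0
    for _ in range(width):                       # one pass per multiplier bit
        if mplier & 1:                           # examine the next multiplier bit
            product = (product + mcand) & mask   # bit is 1: add the shifted multiplicand
        mcand = (mcand << 1) & mask              # shift the multiplicand left
        mplier >>= 1                             # shift the multiplier right
    return product

print(format(multiply_v1(0b0010, 0b1011, width=4), "08b"))  # 00010110
```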



Comments on Multiplication Algorithm 1

Performance: three basic steps for each bit. If each step took a clock cycle, it would require almost 100 clock cycles (3 x 32 = 96) to multiply two 32-bit numbers.

How to improve it?

Motivation (performing the operations in parallel): put the multiplier and the product together, and shift them together.


Refined Multiplication Algorithm 2

[Figure: multiplicand register and 32-bit ALU feeding a combined product/multiplier register; control logic selects add and shift right.]

• The 32-bit ALU and the multiplicand are untouched
• The sum keeps shifting right
• At every step, the number of bits in product + multiplier = 64; hence, they share a single 64-bit register


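Add and Right Shift Multiplier Hardware

[Figure: multiplicand register, 32-bit ALU, and the combined product/multiplier register, with add and shift-right control.]

Example: multiplicand 0110 = 6; product/multiplier register initially 0000 0101 = 5

LSB = 1: add -> 0110 0101; shift right -> 0011 0010
LSB = 0: add -> 0011 0010 (unchanged); shift right -> 0001 1001
LSB = 1: add -> 0111 1001; shift right -> 0011 1100
LSB = 0: add -> 0011 1100 (unchanged); shift right -> 0001 1110

= 30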
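Algorithm 2 can be sketched in Python with one double-width integer standing in for the shared product/multiplier register. Assumptions: unsigned operands, and the carry out of the 32-bit add is kept so the right shift moves it into the left half; `multiply_v2` is a hypothetical name.

```python
def multiply_v2(multiplicand: int, multiplier: int, width: int = 32) -> int:
    low_mask = (1 << width) - 1
    product = multiplier & low_mask              # right half starts as the multiplier
    for _ in range(width):
        if product & 1:                          # LSB is the current multiplier bit
            upper = (product >> width) + multiplicand   # ALU adds into the left half (may carry out)
            product = (upper << width) | (product & low_mask)
        product >>= 1                            # shift product and multiplier right together
    return product

print(multiply_v2(0b0110, 0b0101, width=4))      # 30
```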


Exercise

Using 4-bit numbers to save space, multiply 2ten*3ten, or 0010two * 0011two


Division

Division is just a bunch of quotient digit guesses and left shifts and subtracts

dividend = quotient x divisor + remainder

[Figure: long division of the dividend by the divisor; a partial remainder array produces an n-bit quotient and an n-bit remainder.]


Division

                 1001ten      Quotient
Divisor 1000ten | 1001010ten   Dividend
                 -1000
                    10
                   101
                  1010
                 -1000
                    10ten     Remainder

i.e., 1001010ten = 1001ten x 1000ten + 10ten

At every step:
• shift the divisor right and compare it with the current dividend
• if the divisor is larger, shift 0 in as the next bit of the quotient
• if the divisor is smaller or equal, subtract to get the new dividend and shift 1 in as the next bit of the quotient



First Version of Hardware for Division

A comparison requires a subtract; the sign of the result is examined; if the result is negative, the divisor must be added back


Divide Algorithm

Start

1. Subtract the Divisor register from the Remainder register, and place the result in the Remainder register.

Test Remainder:
• Remainder >= 0: 2a. Shift the Quotient register to the left, setting the new rightmost bit to 1.
• Remainder < 0: 2b. Restore the original value by adding the Divisor register to the Remainder register and placing the sum in the Remainder register. Also shift the Quotient register to the left, setting the new LSB to 0.

3. Shift the Divisor register right 1 bit.

33rd repetition? No (< 33 repetitions): go to step 1. Yes (33 repetitions): Done.
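A Python sketch of this flowchart for unsigned operands. It assumes the divisor starts in the left half of a double-width Divisor register, and it uses Python's signed ints for the sign test in place of a double-width two's complement Remainder register; `divide_v1` is a hypothetical name.

```python
def divide_v1(dividend: int, divisor: int, width: int = 32):
    if divisor == 0:
        raise ZeroDivisionError("divisor must be nonzero")
    remainder = dividend                  # Remainder register starts as the dividend
    div = divisor << width                # divisor in the left half
    quotient = 0
    for _ in range(width + 1):            # width + 1 repetitions (33 for 32 bits)
        remainder -= div                  # 1. Remainder = Remainder - Divisor
        if remainder >= 0:
            quotient = (quotient << 1) | 1    # 2a. shift Quotient left, new bit 1
        else:
            remainder += div                  # 2b. restore, then shift in 0
            quotient <<= 1
        div >>= 1                         # 3. shift Divisor right 1 bit
    return quotient, remainder

print(divide_v1(7, 2, width=4))           # (3, 1): 7 / 2 = 3 remainder 1
```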




Divide Example

• Divide 7ten (0000 0111two) by 2ten (0010two)

Iter  Step                              Quot  Divisor    Remainder
0     Initial values                    0000  0010 0000  0000 0111
1     Rem = Rem – Div                   0000  0010 0000  1110 0111
      Rem < 0 => +Div, shift 0 into Q   0000  0010 0000  0000 0111
      Shift Div right                   0000  0001 0000  0000 0111
2     Same steps as 1                   0000  0001 0000  1111 0111
                                        0000  0001 0000  0000 0111
                                        0000  0000 1000  0000 0111
3     Same steps as 1                   0000  0000 0100  0000 0111
4     Rem = Rem – Div                   0000  0000 0100  0000 0011
      Rem >= 0 => shift 1 into Q        0001  0000 0100  0000 0011
      Shift Div right                   0001  0000 0010  0000 0011
5     Same steps as 4                   0011  0000 0001  0000 0001



Efficient Division

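[Figure: first-version division hardware: 64-bit Divisor register (shift right), 64-bit ALU, 32-bit Quotient register (shift left), 64-bit Remainder register (write), and control logic.]

[Figure: efficient division hardware: divisor register and 32-bit ALU; a single register holds the remainder and the dividend/quotient and shifts left; control selects subtract and shift left.]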


Left Shift and Subtract Division Hardware

[Figure: divisor register, 32-bit ALU, and a combined remainder | dividend/quotient register, with subtract and shift-left control.]

Example: divisor 0010 = 2, dividend 0000 0110 = 6

shift left: 0000 1100
sub: 1110 1100; remainder negative, so quotient bit = 0; restore remainder: 0000 1100; shift left: 0001 1000
sub: 1111 1000; remainder negative, so quotient bit = 0; restore remainder: 0001 1000; shift left: 0011 0000
sub: 0001 0001; remainder >= 0, so quotient bit = 1; shift left: 0010 0010
sub: 0000 0011; remainder >= 0, so quotient bit = 1

Result: quotient 0011 = 3 (right half) with remainder 0000 = 0 (left half)
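The combined-register version can be sketched in Python with one double-width int: the left half is the remainder and the right half fills with quotient bits. Assumptions: unsigned operands; the sequence follows the example (an initial left shift, then a subtract in every step and a shift between steps); Python's unbounded ints stand in for the register, so register-width limits are not modeled. `divide_shift_left` is a hypothetical name.

```python
def divide_shift_left(dividend: int, divisor: int, width: int = 32):
    if divisor == 0:
        raise ZeroDivisionError("divisor must be nonzero")
    low_mask = (1 << width) - 1
    reg = dividend << 1                        # initial shift left
    for step in range(width):
        diff = (reg >> width) - divisor        # ALU: left half minus divisor
        if diff >= 0:                          # remainder >= 0: keep it, quotient bit = 1
            reg = (diff << width) | (reg & low_mask) | 1
        # remainder < 0: "restore" = keep the old left half; quotient bit stays 0
        if step < width - 1:
            reg <<= 1                          # shift remainder and quotient left together
    return reg & low_mask, reg >> width        # (quotient, remainder)

print(divide_shift_left(6, 2, width=4))        # (3, 0)
```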


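Restoring Unsigned Integer Division

(z is the dividend, d the divisor, and $q_i$ the quotient bits)

$s^{(0)} = z$
for $j = 1$ to $k$:
    if $2s^{(j-1)} - 2^k d \ge 0$: $q_{k-j} = 1$, $s^{(j)} = 2s^{(j-1)} - 2^k d$
    else: $q_{k-j} = 0$, $s^{(j)} = 2s^{(j-1)}$
end for

• No need to restore the remainder in the case R - D >= 0
• Restore the remainder in the case R - D < 0
• In each step the remainder is shifted left by 1 bit
• k = 32: put the divisor in the left 32-bit register (the term $2^k d$)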


Non-Restoring Unsigned Integer Division

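$s^{(1)} = 2z - 2^k d$
for $j = 2$ to $k$:
    if $s^{(j-1)} \ge 0$: $q_{k-(j-1)} = 1$, $s^{(j)} = 2s^{(j-1)} - 2^k d$
    else: $q_{k-(j-1)} = 0$, $s^{(j)} = 2s^{(j-1)} + 2^k d$
end for
if $s^{(k)} \ge 0$: $q_0 = 1$, else: $q_0 = 0$
Correction step: if $s^{(k)} < 0$, add $2^k d$ to $s^{(k)}$ to recover the remainder

If, in the previous step, remainder - divisor >= 0: perform a subtraction

If, in the previous step, remainder - divisor < 0: perform an addition

Why?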



Restoring vs. Non-Restoring Unsigned Integer Division

The two algorithms produce equal results. Why?

Consider two consecutive steps j-1 and j, in particular the case $2s^{(j-2)} - 2^k d < 0$:

In step j-1, the restoring algorithm computes $q_{k-(j-1)} = 0$, $s^{(j-1)} = 2s^{(j-2)}$; the non-restoring algorithm computes $s^{(j-1)} = 2s^{(j-2)} - 2^k d$.

In the subsequent step j, the restoring algorithm computes $2s^{(j-1)} - 2^k d = 2 \cdot 2s^{(j-2)} - 2^k d$.

In the subsequent step j, the non-restoring algorithm computes $2s^{(j-1)} + 2^k d = 2 \cdot 2s^{(j-2)} - 2 \cdot 2^k d + 2^k d = 2 \cdot 2s^{(j-2)} - 2^k d$.

Both give the same value, by the identity $2x - y = 2(x - y) + y$ with $x = 2s^{(j-2)}$ and $y = 2^k d$.
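Both recurrences can be run side by side in Python to check that they produce equal quotients and remainders. This is a sketch under stated assumptions: z is the dividend, d the divisor, the quotient fits in k bits ($z < 2^k d$), and the remainder is read as $s^{(k)} / 2^k$; the function names are hypothetical.

```python
def restoring_divide(z: int, d: int, k: int):
    D = d << k                                 # 2^k d
    s, q = z, 0                                # s(0) = z
    for j in range(1, k + 1):
        t = 2 * s - D
        if t >= 0:
            q |= 1 << (k - j)                  # q_{k-j} = 1
            s = t
        else:
            s = 2 * s                          # q_{k-j} = 0, keep the shifted value
    return q, s >> k                           # remainder = s(k) / 2^k

def nonrestoring_divide(z: int, d: int, k: int):
    D = d << k
    q = 0
    s = 2 * z - D                              # s(1)
    for j in range(2, k + 1):
        if s >= 0:
            q |= 1 << (k - (j - 1))            # q_{k-(j-1)} = 1
            s = 2 * s - D
        else:
            s = 2 * s + D                      # q_{k-(j-1)} = 0
    if s >= 0:
        q |= 1                                 # q_0 = 1
    else:
        s += D                                 # correction step
    return q, s >> k

print(restoring_divide(7, 2, 4), nonrestoring_divide(7, 2, 4))   # (3, 1) (3, 1)
```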


Non-restoring algorithm

Set subtract_bit to true.

1: If subtract_bit is true: subtract the Divisor register from the Remainder register and place the result in the Remainder register; else: add the Divisor register to the Remainder register and place the result in the Remainder register.

2: If Remainder >= 0: shift the Quotient register to the left, setting the rightmost bit to 1, and set subtract_bit to true; else: shift the Quotient register to the left, setting the rightmost bit to 0, and set subtract_bit to false.

3: Shift the Divisor register right 1 bit. If fewer than 33 repetitions have been done, go to 1; else, if Remainder < 0, add the divisor used in the last repetition back to the Remainder register (correction step), and exit.


Example (11ten / 3ten): perform n + 1 iterations for n bits (n = 4 here, so 5 iterations)

Initially: Remainder 0000 1011, Divisor 0011 0000
Iteration 1 (subtract): Rem 1101 1011, Quotient 0, Divisor 0001 1000
Iteration 2 (add): Rem 1111 0011, Q 00, Divisor 0000 1100
Iteration 3 (add): Rem 1111 1111, Q 000, Divisor 0000 0110
Iteration 4 (add): Rem 0000 0101, Q 0001, Divisor 0000 0011
Iteration 5 (subtract): Rem 0000 0010, Q 00011, Divisor 0000 0001

Since the remainder is positive, done: Q = 0011 and Rem = 0010.
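The flowchart steps, including the correction at the end of step 3, can be sketched in Python and traced on the example. Assumptions: unsigned n-bit operands, the divisor initially shifted left by n bits, and Python's signed ints standing in for the two's complement Remainder register; `nonrestoring_hw` is a hypothetical name.

```python
def nonrestoring_hw(dividend: int, divisor: int, n: int):
    if divisor == 0:
        raise ZeroDivisionError("divisor must be nonzero")
    remainder, quotient = dividend, 0
    div = divisor << n                  # e.g. 0011 -> 0011 0000 for n = 4
    subtract = True
    for _ in range(n + 1):              # n + 1 iterations for n bits
        last_div = div
        remainder = remainder - div if subtract else remainder + div   # step 1
        if remainder >= 0:              # step 2
            quotient = (quotient << 1) | 1
            subtract = True
        else:
            quotient <<= 1
            subtract = False
        div >>= 1                       # step 3
    if remainder < 0:                   # correction: add back the last divisor used
        remainder += last_div
    return quotient, remainder

print(nonrestoring_hw(0b1011, 0b0011, 4))   # (3, 2): Q = 0011, Rem = 0010
```

Dividing 4 by 2 is a case where the correction fires: the last iteration leaves the remainder at -2, and adding back the last divisor (2) gives 0.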


Exercise

Calculate A divided by B using restoring and non-restoring division. A=26, B=5


MIPS Divide Instruction

Divide (div and divu) generates the remainder in hi and the quotient in lo:

div $s0, $s1   # lo = $s0 / $s1
               # hi = $s0 mod $s1

Instructions mfhi rd and mflo rd are provided to move the remainder and the quotient to (user-accessible) registers in the register file.

As with multiply, divide ignores overflow, so software must determine if the quotient is too large. Software must also check the divisor to avoid division by 0.

Machine format (R format) of div $s0, $s1: op = 0, rs = 16, rt = 17, rd = 0, shamt = 0, funct = 0x1A
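A hedged Python model of what div leaves in hi and lo, plus the R-format encoding shown above. Assumptions: C-style truncated division (quotient rounded toward zero, remainder taking the dividend's sign); a divisor of 0 raises an error here because the hardware leaves that check to software; the -2^31 / -1 case simply wraps to 32 bits and is not claimed to match any particular processor. `mips_div` and `encode_r` are hypothetical names.

```python
MASK32 = 0xFFFFFFFF

def to_signed32(x: int) -> int:
    x &= MASK32
    return x - (1 << 32) if x >> 31 else x

def mips_div(rs_value: int, rt_value: int):
    """Return (hi, lo) as 32-bit patterns: hi = remainder, lo = quotient."""
    a, b = to_signed32(rs_value), to_signed32(rt_value)
    if b == 0:
        raise ZeroDivisionError("software must check the divisor before div")
    q = abs(a) // abs(b)
    if (a < 0) != (b < 0):
        q = -q                          # truncate toward zero
    r = a - q * b                       # remainder has the dividend's sign
    return r & MASK32, q & MASK32

def encode_r(op, rs, rt, rd, shamt, funct):
    """Pack the six R-format fields into a 32-bit instruction word."""
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct

hi, lo = mips_div(7, 2)
print(hi, lo)                                   # 1 3  (as read back with mfhi / mflo)
print(hex(encode_r(0, 16, 17, 0, 0, 0x1A)))     # 0x211001a  (div $s0, $s1)
```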

