computer architecture lecture notes

Mar 02, 2018

Santhosh Vijay
  • 7/26/2019 computer architecture lecture notes

    Module 2

    Execution of complete instruction

    To execute an instruction, the processor performs the following steps:

    Fetch the contents of the memory location pointed to by the PC. The contents of this location are loaded into the IR (fetch phase).

    IR ← [[PC]]

    Assuming that the memory is byte addressable, increment the contents of the PC by 4 (fetch phase).

    PC ← [PC] + 4

    Carry out the actions specified by the instruction in the IR (execution phase).

    In general, executing an instruction involves one or more of the following operations:

    Transfer a word of data from one processor register to another or to the ALU.

    Perform an arithmetic or a logic operation and store the result in a processor register.

    Fetch the contents of a given memory location and load them into a processor register.

    Store a word of data from a processor register into a given memory location.

    Consider the single-bus organization of the datapath shown in the figure below, and the instruction ADD (R3),R1, which adds the contents of the memory location pointed to by R3 to register R1. Executing this instruction involves the following actions:

    Fetch the instruction.

    Fetch the first operand (the contents of the memory location pointed to by R3).

    Perform the addition.

    Load the result into R1.

    The control sequence for executing this instruction is shown below.

    In step 1, the instruction fetch is initiated by loading the contents of the PC into MAR and sending a read request to memory. The Select signal is set to Select4, which causes the MUX to select the constant 4. This value is added to the operand at input B, which is the contents of the PC, and the result is stored in register Z. The updated value is moved from register Z back to the PC during step 2, while waiting for the memory operation to complete. In step 3, the word fetched from memory is loaded into IR.

    Steps 1 to 3 constitute the instruction fetch phase. The instruction decoding circuit interprets the contents of IR at the beginning of step 4. This enables the control circuitry to activate the control signals for steps 4-7, which constitute the execution phase. The contents of register R3 are transferred to MAR in step 4 and a memory read operation is initiated. Then the contents of R1 are transferred to register Y in step 5 to prepare for the addition. When the read operation is completed, the memory operand is available in register MDR and the addition is performed in step 6. The contents of MDR are gated to the bus, and thus also to the B input of the ALU, and register Y is selected as the second input of the ALU by choosing SelectY. The sum is stored in register Z and then transferred to R1 in step 7. The End signal causes a new instruction fetch cycle to begin by returning to step 1.

    WMC stands for Wait for Memory operation to Complete. Generally, the addressed device on the memory bus is slower than the processor, so the processor has to wait for the addressed device to complete its operation. The WMC control signal makes the processor wait until it receives the indication (MFC) that the memory operation has been completed.
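    The seven control steps described above can be traced with a small register-transfer sketch. The figure with the control sequence did not survive, so the signal names in the comments (PCout, MARin, Zin and so on) are assumptions that follow the usual naming for this datapath, and the dictionary-based memory is purely illustrative.

```python
def run_add_r3_r1(regs, memory):
    """Trace the control steps for ADD (R3),R1 on a single-bus datapath.

    Signal names in the comments are assumed, not taken from the original figure.
    """
    r = dict(regs)  # work on a copy of the register contents
    # Step 1: PCout, MARin, Read, Select4, Add, Zin
    r["MAR"] = r["PC"]
    r["Z"] = r["PC"] + 4
    # Step 2: Zout, PCin, WMC (wait until the memory read has finished)
    r["PC"] = r["Z"]
    r["MDR"] = memory[r["MAR"]]
    # Step 3: MDRout, IRin
    r["IR"] = r["MDR"]
    # Step 4: R3out, MARin, Read
    r["MAR"] = r["R3"]
    # Step 5: R1out, Yin, WMC
    r["Y"] = r["R1"]
    r["MDR"] = memory[r["MAR"]]
    # Step 6: MDRout, SelectY, Add, Zin
    r["Z"] = r["Y"] + r["MDR"]
    # Step 7: Zout, R1in, End
    r["R1"] = r["Z"]
    return r


# Hypothetical contents: the instruction sits at address 100, the operand at 200.
memory = {100: "ADD (R3),R1", 200: 7}
state = run_add_r3_r1({"PC": 100, "R1": 5, "R3": 200}, memory)
print(state["PC"], state["R1"])  # 104 12
```

    Each assignment corresponds to one register transfer over the bus, which is why a single-bus datapath needs a separate step for almost every transfer.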

    Execution of Branch Instructions

    Unconditional branch

    A branch instruction replaces the contents of the PC with the branch target address. This address is obtained by adding an offset X, which is given in the instruction, to the updated value of the PC.

    Processing of an unconditional branch instruction begins with the fetch phase, which ends when the instruction is loaded into IR in step 3. The offset value is extracted from IR by the instruction decoding circuit. Since the updated value of the PC is already available in register Y, the offset X is gated onto the bus in step 4 and an addition is performed. The result, which is the branch target address, is loaded into the PC in step 5. The offset X is the difference between the branch target address and the address immediately following the branch instruction.

    Conditional branch

    For a conditional branch, the status of the condition codes must be checked before a new value is loaded into the PC. For example, for a branch on the negative condition (N flag), if N = 0 the processor returns to step 1; otherwise step 5 is performed to load the new value into the PC.
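    As a rough illustration of the branch rules above, the next PC value can be computed as follows. The function name, the parameters and the 4-byte instruction size are assumptions made for this sketch.

```python
def next_pc(pc_of_branch, offset, n_flag, conditional=True, instr_size=4):
    """Return the PC value after a branch instruction (illustrative sketch)."""
    updated_pc = pc_of_branch + instr_size  # fetch phase: PC <- [PC] + 4
    if conditional and n_flag == 0:
        return updated_pc                   # condition not met: End after step 4
    return updated_pc + offset              # step 5: PC <- updated PC + X


print(next_pc(1000, 40, n_flag=1))  # 1044
print(next_pc(1000, 40, n_flag=0))  # 1004
```

    Note that the offset is added to the updated PC, which is why X is measured from the address immediately following the branch instruction.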

    Single bus Organisation

    The datapath typically consists of the following functional blocks:

    The Instruction Register (IR) stores the current instruction to be executed.

    The Program Counter (PC) stores the address of the next instruction to be fetched.

    Memory address register (MAR) - A register that stores either the memory address from which data will be fetched to the CPU or the address to which data will be sent and stored.

    Memory data register (MDR) - A register that contains the data to be stored in the computer storage (e.g. RAM), or the data after a fetch from the computer storage.

    Registers R0 to Rn-1 are provided for general-purpose use. Registers Y and Z are provided for temporary storage during execution of some instructions.

    This organisation is called a single-bus organization because the ALU and all registers are interconnected via a single common bus. The data and address lines of the external memory bus are connected to the internal processor bus via the memory data register MDR and the memory address register MAR. Register MDR has two inputs and two outputs: data may be loaded into MDR either from the memory bus or from the internal processor bus, and the data stored in MDR may be placed on either bus. The input of MAR is connected to the internal bus, and its output is connected to the external bus. The control lines of the memory bus are connected to the instruction decoder and control logic block. This unit is responsible for issuing the signals that control the operation of all units inside the processor and for interfacing with the memory bus.

    The multiplexer (MUX) selects either the output of register Y or the constant value 4 to be provided as input A of the ALU. The constant 4 is used to increment the contents of the PC. As instruction execution proceeds, data are transferred from one register to another, often passing through the ALU to perform an arithmetic or logic operation.

    The instruction decoder and control logic unit is responsible for implementing the actions specified by the instruction loaded in IR. The decoder generates the control signals needed to select the registers involved and to direct the transfer of data. The registers, the ALU and the interconnecting bus are collectively referred to as the datapath.

    Multi bus Organization

    In a multiple-bus organization, several data transfers can take place in parallel. A three-bus structure is used to connect the registers and the ALU of the processor. All general-purpose registers are combined into a single block called the register file. The register file has three ports. There are two output ports, allowing the contents of two registers to be accessed simultaneously and placed on buses A and B. The third port allows data on bus C to be loaded into a third register during the same clock cycle.

    Buses A and B are used to transfer source operands to the A and B inputs of the ALU, where an arithmetic or logic operation may be performed. The result is transferred to the destination over bus C. The three-bus organization eliminates the need for the temporary registers Y and Z of the single-bus organization. Another feature of this organization is the introduction of an incrementer unit, which is used to increment the PC by 4. Using the incrementer eliminates the need to add 4 to the PC using the ALU. The source for the constant 4 at the ALU input remains useful: it can be used to increment other addresses, such as the memory addresses in load-multiple and store-multiple instructions.

    Control sequence for the three-bus organisation for the instruction ADD R4,R5,R6

    In step 1, the contents of the PC are passed through the ALU using the R=B control signal and loaded into MAR to start a memory read operation. At the same time, the PC is incremented by 4; the incremented value is loaded into the PC at the end of the clock cycle. In step 2, the processor waits for MFC and loads the data received into MDR, then transfers them to IR in step 3. Finally, the execution phase of the instruction requires only one control step, step 4, to complete.

    Control Unit

    The control unit issues control signals external to the processor to cause data exchange with memory and I/O modules. The control unit also issues control signals internal to the processor to move data between registers, to cause the ALU to perform a specified function, and to regulate other internal operations. The control unit performs two basic tasks:

    Sequencing: The control unit causes the processor to step through a series of micro-operations in the proper sequence, based on the program being executed.

    Execution: The control unit causes each micro-operation to be performed.

    The figure is a general model of the control unit, showing all of its inputs and outputs. The inputs are as follows:

    Clock: This is how the control unit keeps time. The control unit causes one micro-operation (or a set of simultaneous micro-operations) to be performed for each clock pulse. This is sometimes referred to as the processor cycle time, or the clock cycle time.

    Instruction register: The opcode and addressing mode of the current instruction are used to determine which micro-operations to perform during the execute cycle.

    Flags: These are needed by the control unit to determine the status of the processor and the outcome of previous ALU operations. For example, for the increment-and-skip-if-zero (ISZ) instruction, the control unit will increment the PC if the zero flag is set.

    Control signals from control bus: The control bus portion of the system bus provides signals to the control unit.

    The outputs are as follows:

    Control signals within the processor: These are of two types: those that cause data to be moved from one register to another, and those that activate specific ALU functions.

    Control signals to control bus: These are also of two types: control signals to memory, and control signals to the I/O modules.

    Techniques for control unit implementation

    Hardwired implementation

    Microprogrammed implementation

    Hardwired control

    In a hardwired implementation, the control unit is essentially a state machine circuit. Its input logic signals are transformed into a set of output logic signals, which are the control signals. At its core is a combinational circuit that generates the required control outputs depending on the state of all its inputs. The basic block diagram of a hardwired control unit is shown below.

    The control unit makes use of the opcode and performs different actions (issues a different combination of control signals) for different instructions. To simplify the control unit logic, there should be a unique logic input for each opcode. This function can be performed by a decoder, which takes an encoded input and produces a single output. In general, a decoder has n binary inputs and 2^n binary outputs. Each of the 2^n different input patterns activates a single unique output.

    The clock portion of the control unit issues a repetitive sequence of pulses. This is useful for measuring the duration of micro-operations. Essentially, the period of the clock pulses must be long enough to allow the propagation of signals along data paths and through processor circuitry. A counter is used to keep track of the control steps; each count of this counter corresponds to one control step. The required control signals are determined by:

    contents of the control step counter

    contents of the instruction register

    contents of the condition code flags

    external input signals and interrupt requests

    By separating the decoding and encoding functions, a more detailed block diagram is obtained, as shown below.

    RUN control signal: when RUN is set to 1, the counter is incremented by one at the end of every clock cycle. When RUN is equal to 0, the counter stops counting.

    Generation of Zin control signal

    Generation of END control signal
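    The logic diagrams for Zin and End are not reproduced here, so the sketch below reconstructs them from the control sequences described earlier: Zin is asserted in step 1 of every instruction, in step 6 of ADD and in step 4 of a branch, while End is asserted in step 7 of ADD, step 5 of a branch, and step 4 or 5 of a branch-on-negative depending on N. The instruction names ADD, BR and BRN and both function names are assumptions for illustration only.

```python
def decode(value, n):
    """n-to-2^n decoder: exactly one of the 2**n outputs is 1."""
    return [1 if i == value else 0 for i in range(2 ** n)]


def z_in(T, ins):
    # Assumed expression: Zin = T1 + T6.ADD + T4.BR
    return T[1] | (T[6] & ins["ADD"]) | (T[4] & ins["BR"])


def end(T, ins, N):
    # Assumed expression: End = T7.ADD + T5.BR + (T5.N + T4.N').BRN
    brn_term = ((T[5] & N) | (T[4] & (1 - N))) & ins["BRN"]
    return (T[7] & ins["ADD"]) | (T[5] & ins["BR"]) | brn_term


add = {"ADD": 1, "BR": 0, "BRN": 0}
T = decode(6, 3)          # step counter at control step 6
print(z_in(T, add))       # 1
```

    The step decoder output T and the instruction decoder output ins play the roles of the two decoders in the detailed block diagram; the functions correspond to the encoder that combines them.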

    Advantage

    Hardwired system can operate at high speed

    Disadvantage

    Little flexibility

    Application

    Used in RISC processor

    Micro programmed Control

    In a microprogrammed control unit, the logic of the control unit is specified by a microprogram. A microprogram consists of a sequence of instructions in a microprogramming language, similar to machine language. These are very simple instructions that specify micro-operations. A microprogrammed control unit is a relatively simple logic circuit that is capable of (1) sequencing through microinstructions and (2) generating control signals to execute each microinstruction.

    Basic organization of micro programmed control unit

    The microroutines for all instructions in the instruction set of a computer are stored in a special memory called the control store or control memory. The control unit can generate the control signals for any instruction by sequentially reading the control words of the corresponding microroutine from the control store. A control word (CW) is a word whose individual bits represent the various control signals. The sequence of control words corresponding to the control sequence of a machine instruction constitutes the microroutine for that instruction, and the individual control words in a microroutine are referred to as microinstructions.

    To read the control words sequentially from the control store, a microprogram counter (µPC) is used. Every time a new instruction is loaded into IR, the output of the starting address generator is loaded into the µPC. The µPC is then automatically incremented by the clock, causing successive microinstructions to be read from the control store. Hence the control signals are delivered to the various parts of the processor in the correct sequence.

    To support microprogram branching, the organisation of the control unit is modified as follows.

    The starting and branch address generator block loads a new address into the µPC when a microinstruction instructs it to do so. The µPC is incremented every time a new microinstruction is fetched from the control store, except in the following situations:

    When a new instruction is loaded into IR, the µPC is loaded with the starting address of the microroutine for that instruction.

    When a branch microinstruction is encountered and the branch condition is satisfied, the µPC is loaded with the branch target address.

    When an End microinstruction is encountered, the µPC is loaded with the address of the first CW in the microroutine for the instruction fetch cycle.

    Advantages of micro programmed control unit

    Simplifies design of control unit

    Cheaper and less error prone to implement

    Disadvantage

    Slower than hardwired unit

    Application

    Used in CISC processor

    Micro program sequencing

    If all microprograms required only straightforward sequential execution of microinstructions except for branches, letting a microprogram counter govern the sequencing would be efficient. However, this approach has two disadvantages:

    Having a separate microroutine for each machine instruction results in a large total number of microinstructions and a large control store.

    Execution time is longer because it takes more time to carry out the required branches.

    A powerful alternative approach is to include an address field as a part of every microinstruction to indicate the location of the next microinstruction to be fetched. Separate branch microinstructions are then virtually eliminated. Microinstructions with a next-address field are shown below.

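    Arithmetic and logic design

    An n-bit sequence of binary digits a_{n-1}, a_{n-2}, ..., a_1, a_0 is interpreted as an unsigned integer A as

    $A = \sum_{i=0}^{n-1} 2^{i} a_i$

    The simplest form of representation that employs a sign bit is the sign-magnitude representation. In an n-bit word, the leftmost bit is the sign bit and the rightmost n-1 bits hold the magnitude of the integer. The general representation of a signed integer is

    $A = \sum_{i=0}^{n-2} 2^{i} a_i$ if $a_{n-1} = 0$, and $A = -\sum_{i=0}^{n-2} 2^{i} a_i$ if $a_{n-1} = 1$

    Addition/subtraction of signed numbers

    Addition

    At the ith stage:

    Input: c_i is the carry-in.

    Output: s_i is the sum and c_{i+1} is the carry-out to the (i+1)st stage.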

    Addition logic for a single stage

    n-bit adder: Cascade n full adder (FA) blocks to form an n-bit adder.

    Carries propagate, or ripple, through this cascade, hence the name n-bit ripple-carry adder.
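    A minimal sketch of the cascade follows, using the standard full-adder equations s_i = x_i XOR y_i XOR c_i and c_{i+1} = x_i y_i + x_i c_i + y_i c_i. The helper names and the choice to store bits least significant first are assumptions of this example.

```python
def full_adder(x, y, c):
    """One adder stage: returns (sum bit, carry-out)."""
    s = x ^ y ^ c
    c_out = (x & y) | (x & c) | (y & c)
    return s, c_out


def ripple_carry_add(x_bits, y_bits, c0=0):
    """Add two equal-length bit lists given least significant bit first."""
    carry = c0
    s_bits = []
    for x, y in zip(x_bits, y_bits):
        s, carry = full_adder(x, y, carry)  # the carry ripples into the next stage
        s_bits.append(s)
    return s_bits, carry


def to_bits(value, n):
    return [(value >> i) & 1 for i in range(n)]


def from_bits(bits):
    return sum(b << i for i, b in enumerate(bits))


s, c4 = ripple_carry_add(to_bits(11, 4), to_bits(6, 4))
print(from_bits(s), c4)  # 1 1  (11 + 6 = 17 = 16 + 1)
```

    The loop makes the dependence explicit: stage i cannot produce its outputs until stage i-1 has produced its carry.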

    K n-bit adder

    kn-bit numbers can be added by cascading k n-bit adders.

    n-bit subtractor

    X - Y is equivalent to adding the 2s complement of Y to X. The 2s complement is equivalent to the 1s complement + 1.

    $X - Y = X + \overline{Y} + 1$

    The 2s complement of positive and negative numbers is computed similarly.

    n-bit adder/subtractor

    The two inputs x and y represent the arguments to be added or subtracted. The control input ADD/SUB determines whether an add or a subtract operation is to be performed: if the control input is 0, an add operation is performed, while if the control input is 1, a subtract operation is performed.

    Detecting overflows

    Overflow can occur only when the signs of the two operands are the same. Overflow occurs if the sign of the result is different from the sign of the operands.

    x_{n-1}, y_{n-1} and s_{n-1} represent the signs of operand x, operand y and result s respectively.

    A circuit to detect overflow can be implemented by the following logic expressions:

    $\text{Overflow} = x_{n-1} y_{n-1} \overline{s}_{n-1} + \overline{x}_{n-1} \overline{y}_{n-1} s_{n-1}$

    $\text{Overflow} = c_n \oplus c_{n-1}$
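    The adder/subtractor and the overflow rule can be combined in one short sketch. It assumes the usual arrangement in which the ADD/SUB control both inverts y through XOR gates and supplies the carry-in c0; the function name and the return format are illustrative choices.

```python
def add_sub(x, y, n, sub):
    """n-bit 2s-complement adder/subtractor; sub is the ADD/SUB control (0 = add, 1 = subtract).

    Returns (signed n-bit result, overflow flag).
    """
    carry = sub                      # c0 = control input, supplies the "+1" when subtracting
    carries = [carry]
    result = 0
    for i in range(n):
        xi = (x >> i) & 1
        yi = ((y >> i) & 1) ^ sub    # XOR gate inverts y when subtracting
        result |= (xi ^ yi ^ carry) << i
        carry = (xi & yi) | (xi & carry) | (yi & carry)
        carries.append(carry)
    xs = (x >> (n - 1)) & 1
    ys = ((y >> (n - 1)) & 1) ^ sub  # sign of the operand actually fed to the adder
    ss = (result >> (n - 1)) & 1
    overflow = (xs & ys & (1 - ss)) | ((1 - xs) & (1 - ys) & ss)
    assert overflow == carries[n] ^ carries[n - 1]  # both expressions agree
    signed = result - (1 << n) if ss else result
    return signed, overflow


print(add_sub(5, 3, 4, 0))  # (-8, 1): 5 + 3 does not fit in 4 bits
print(add_sub(5, 3, 4, 1))  # (2, 0)
```

    The internal assertion checks that the sign-based expression and the carry-based expression always give the same overflow flag.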

    Computing the add time

    Consider the 0th stage: s0 is available after 1 gate delay, and c1 is available after 2 gate delays.

    Computing the add time of an n-bit ripple carry adder

    Consider a 4-bit ripple carry adder:

    s0 available after 1 gate delay, c1 available after 2 gate delays.

    s1 available after 3 gate delays, c2 available after 4 gate delays.

    s2 available after 5 gate delays, c3 available after 6 gate delays.

    s3 available after 7 gate delays, c4 available after 8 gate delays.

    For an n-bit adder, s_{n-1} is available after 2n-1 gate delays and c_n is available after 2n gate delays.

    Fast adders

    One of the main drawbacks of the ripple-carry adder is the long delay between the time the inputs are presented to the circuit and the time the final output is obtained. This is because each stage depends on the carry output produced by the previous stage, and this chain of dependence makes the adder slow. To speed up addition, the chain of dependence among the adder stages must be broken. One fast adder circuit is the carry-lookahead (CLA) adder.

    Carry-look-ahead (CLA) adder

    c_{i+1} can be written as

    $c_{i+1} = x_i y_i + x_i c_i + y_i c_i$

    We can write c_{i+1} as

    $c_{i+1} = G_i + P_i c_i$

    where

    $G_i = x_i y_i$ and $P_i = x_i + y_i$

    G_i is called the generate function and P_i is called the propagate function. G_i and P_i are computed only from x_i and y_i and not from c_i, so they can be computed in one gate delay after X and Y are applied to the inputs of an n-bit adder. A simpler circuit can be realized with

    $P_i = x_i \oplus y_i$

    which differs from $P_i = x_i + y_i$ only when x_i = y_i = 1, where G_i = 1 and the value of P_i does not matter.

    Thus, using a cascade of two 2-input XOR gates to realize the sum, the basic cell shown below can be used in each bit stage.

    Expanding c_i in terms of i-1 subscripted variables and substituting it into the c_{i+1} expression gives

    $c_{i+1} = G_i + P_i G_{i-1} + P_i P_{i-1} c_{i-1}$

    Continuing in this way, the final expression for any carry variable is

    $c_{i+1} = G_i + P_i G_{i-1} + P_i P_{i-1} G_{i-2} + \cdots + P_i P_{i-1} \cdots P_1 G_0 + P_i P_{i-1} \cdots P_0 c_0$

    All carries can be obtained 3 gate delays after X, Y and c0 are applied:

    - One gate delay for P_i and G_i

    - Two gate delays in the AND-OR circuit for c_{i+1}

    All sums can be obtained 1 gate delay (one XOR gate) after the carries are computed. Independent of n, n-bit addition requires only 4 gate delays.

    This is called the carry-lookahead adder.
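    The sketch below computes every carry directly from the generate and propagate functions and c0, as a flat sum of products, rather than from the previous carry. It uses the XOR form of the propagate function; the function name and the integer interface are assumptions of this example.

```python
def cla_add(x, y, n, c0=0):
    """n-bit carry-lookahead addition; returns (n-bit sum, carry-out c_n)."""
    G = [((x >> i) & 1) & ((y >> i) & 1) for i in range(n)]  # generate
    P = [((x >> i) & 1) ^ ((y >> i) & 1) for i in range(n)]  # propagate (XOR form)
    c = [c0]
    for i in range(n):
        # c_{i+1} = G_i + P_i G_{i-1} + ... + P_i ... P_1 G_0 + P_i ... P_0 c_0
        term = c0
        for k in range(i + 1):
            term &= P[k]
        carry = G[i] | term
        for j in range(i):
            prod = G[j]
            for k in range(j + 1, i + 1):
                prod &= P[k]
            carry |= prod
        c.append(carry)
    s = sum((P[i] ^ c[i]) << i for i in range(n))  # s_i = P_i XOR c_i
    return s, c[n]


print(cla_add(11, 6, 4))  # (1, 1)
```

    In hardware each product term is a single AND gate and each carry a single OR gate, which is where the fan-in requirement of the widest terms comes from.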

    In a 4-bit carry-lookahead adder, c4 is available after 3 gate delays and s3 after 4 gate delays, whereas in a 4-bit ripple-carry adder c4 is available after 8 gate delays and s3 after 7 gate delays. Performing n-bit addition in 4 gate delays independent of n is good only theoretically, because of fan-in constraints.

    The last AND gate and the OR gate require a fan-in of (n+1) for an n-bit adder. For a 4-bit adder (n=4) a fan-in of 5 is required, which is about the practical limit for most gates.

    In order to add operands longer than 4 bits, we can cascade 4-bit carry-lookahead adders. A cascade of carry-lookahead adders is called a blocked carry-lookahead adder.

    The figure shows a 16-bit adder built from 4-bit adders.

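    Blocked Carry-Look ahead adder

    In the first block

    $c_4 = G_3 + P_3 G_2 + P_3 P_2 G_1 + P_3 P_2 P_1 G_0 + P_3 P_2 P_1 P_0 c_0 = G_0^{I} + P_0^{I} c_0$

    and

    $G_0^{I} = G_3 + P_3 G_2 + P_3 P_2 G_1 + P_3 P_2 P_1 G_0$, $P_0^{I} = P_3 P_2 P_1 P_0$

    c16 is obtained as

    $c_{16} = G_3^{I} + P_3^{I} G_2^{I} + P_3^{I} P_2^{I} G_1^{I} + P_3^{I} P_2^{I} P_1^{I} G_0^{I} + P_3^{I} P_2^{I} P_1^{I} P_0^{I} c_0$

    After x_i, y_i and c0 are applied as inputs:

    G_i and P_i for each stage are available after 1 gate delay.

    P^I is available after 2 gate delays and G^I after 3 gate delays.

    The block carries, including c16, are available after 5 gate delays.

    s15, which depends on c12, is available after 8 (5+3) gate delays, since in a 4-bit carry-lookahead adder the last sum bit is available 3 gate delays after all inputs are available.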

    Multiplication

    Multiplication of unsigned numbers

    Consider n x n multiplication. The product of two n-bit numbers is at most a 2n-bit number. Unsigned multiplication can be viewed as the addition of shifted versions of the multiplicand.

    Multiplication involves the generation of partial products, one for each digit in the multiplier. The partial products are easily defined: when the multiplier bit is 0, the partial product is 0; when the multiplier bit is 1, the partial product is the multiplicand.

    The total product is produced by summing the partial products. For this operation, each successive partial product is shifted one position to the left relative to the preceding partial product.

    Array multiplier

    The multiplicand is shifted by displacing it through an array of adders.

    Where each multiplier cell is given as

    Array multipliers:

    Are extremely inefficient.

    Have a high gate count for multiplying numbers of practical size such as 32-bit or 64-bit numbers.

    Perform only one function, namely, unsigned integer product.

    Assuming that there are 2 gate delays from input to output of a full adder block, the worst-case signal propagation delay path (from the right end of the first row to the highest product bit output at the left end, comprising all cells in the bottom row and two cells at the right end of each of the other rows) has a total of 6(n-1)-1 gate delays, including the initial AND gate delay in each cell. Since the incoming partial product of the first row is 0, only AND gates are required in that row, which is taken into account in the delay expression.

    Sequential multiplication

    In this case, multiplication is performed as a series of n conditional addition and shift operations: if the given bit of the multiplier is 0, only a shift operation is performed, while if the given bit of the multiplier is 1, the multiplicand is added to the partial product and then a shift operation is performed.

    Register configuration in sequential multiplier

    Flow chart for unsigned binary multiplication

    The multiplier and multiplicand are loaded into two registers (Q and M). A third register, the A register, is also needed and is initially set to 0. There is also a 1-bit C register, initialized to 0, which holds a potential carry bit resulting from addition.

    The operation of the multiplier is as follows. Control logic reads the bits of the multiplier one at a time. If Q0 is 1, then the multiplicand is added to the A register and the result is stored in the A register, with the C bit used for overflow. Then all of the bits of the C, A, and Q registers are shifted to the right one bit, so that the C bit goes into A_{n-1}, A0 goes into Q_{n-1}, and Q0 is lost. If Q0 is 0, then no addition is performed, just the shift. This process is repeated for each bit of the original multiplier. The resulting 2n-bit product is contained in the A and Q registers. An example is shown below.
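    The register-level procedure can be sketched in a few lines. The variable names mirror the C, A, Q and M registers; the function name and the use of Python integers as n-bit registers are assumptions of this example.

```python
def sequential_multiply(multiplicand, multiplier, n):
    """Unsigned n x n shift-and-add multiplication using registers C, A, Q and M."""
    mask = (1 << n) - 1
    M, Q, A, C = multiplicand & mask, multiplier & mask, 0, 0
    for _ in range(n):
        if Q & 1:                        # Q0 = 1: A <- A + M, carry-out goes to C
            total = A + M
            C, A = total >> n, total & mask
        # Shift C, A, Q right one bit: C -> A(n-1), A0 -> Q(n-1), Q0 is lost
        Q = (Q >> 1) | ((A & 1) << (n - 1))
        A = (A >> 1) | (C << (n - 1))
        C = 0
    return (A << n) | Q                  # 2n-bit product held in A and Q


print(sequential_multiply(13, 11, 4))  # 143
```

    Only one n-bit adder is needed, at the cost of n clocked add-and-shift cycles.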

    Signed Multiplication

    Considering 2s-complement signed operands, what happens to (-13) x (+11) if we follow the same method as for unsigned multiplication? For a negative multiplicand and a positive multiplier, the method still works provided that the sign bit of each shifted multiplicand is extended to the left across the full width of the product.

    For a negative multiplier, a straightforward solution is to form the 2s complement of both the multiplier and the multiplicand and proceed as in the case of a positive multiplier. This is possible because complementation of both operands does not change the value or the sign of the product.

    A technique that works equally well for both negative and positive multipliers is the Booth algorithm.

    Booth algorithm

    The multiplier and multiplicand are placed in the Q and M registers, respectively. There is also a 1-bit register, Q-1, placed logically to the right of the least significant bit (Q0) of the Q register. The results of the multiplication will appear in the A and Q registers. A and Q-1 are initialized to 0. Control logic scans the bits of the multiplier one at a time. Now, as each bit is examined, the bit to its right is also examined. If the two bits are the same (11 or 00), then all of the bits of the A, Q, and Q-1 registers are shifted to the right 1 bit. If the two bits differ, then the multiplicand is added to or subtracted from the A register, depending on whether the two bits are 01 or 10 respectively. Following the addition or subtraction, the right shift occurs. In either case, the right shift is such that the leftmost bit of A, namely A_{n-1}, not only is shifted into A_{n-2} but also remains in A_{n-1}. This is required to preserve the sign of the number in A and Q. It is known as an arithmetic shift, because it preserves the sign bit.
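    A minimal sketch of this procedure is given below. It keeps A at n bits, so it assumes the multiplicand is not the most negative n-bit value (for example -8 when n = 4); the function name and the final sign conversion are illustrative additions.

```python
def booth_multiply(multiplicand, multiplier, n):
    """Booth multiplication of two n-bit 2s-complement integers.

    Assumes the multiplicand is not -2**(n-1), since A is only n bits wide.
    """
    mask = (1 << n) - 1
    M, Q, A, q_1 = multiplicand & mask, multiplier & mask, 0, 0
    for _ in range(n):
        pair = (Q & 1, q_1)              # (Q0, Q-1)
        if pair == (1, 0):
            A = (A - M) & mask           # 10: subtract the multiplicand
        elif pair == (0, 1):
            A = (A + M) & mask           # 01: add the multiplicand
        # Arithmetic shift right of A, Q, Q-1 (the sign bit of A is kept)
        q_1 = Q & 1
        Q = (Q >> 1) | ((A & 1) << (n - 1))
        A = (A >> 1) | (A & (1 << (n - 1)))
    product = (A << n) | Q
    if product & (1 << (2 * n - 1)):     # interpret the 2n-bit result as signed
        product -= 1 << (2 * n)
    return product


print(booth_multiply(-13, 11, 5))  # -143
print(booth_multiply(7, -8, 4))    # -56
```

    Five bits are used for the first call because -13 and +11 do not both fit in four-bit 2s-complement form.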

    Booth multiplier recoding table

    In general, in the Booth scheme, -1 times the shifted multiplicand is selected when moving from 0 to 1, and +1 times the shifted multiplicand is selected when moving from 1 to 0, as the multiplier is scanned from right to left.

    Booth multiplication with a positive multiplier

    Consider a multiplication in which the multiplier is the positive number 0011110:

    Multiplier                 0   0   1   1   1   1   0

    Booth recoded multiplier   0  +1   0   0   0  -1   0
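    The recoding rule can be checked with a short sketch. Each multiplier bit is paired with the bit to its right (an implied 0 for the least significant bit); the lookup table is reconstructed from the rule stated above, and the string-based interface is an assumption of this example.

```python
def booth_recode(bits):
    """Booth-recode a multiplier given as a bit string, most significant bit first."""
    padded = bits + "0"  # implied 0 to the right of the least significant bit
    # (bit i, bit i-1) -> digit selected for position i
    table = {"00": 0, "01": +1, "10": -1, "11": 0}
    return [table[padded[i:i + 2]] for i in range(len(bits))]


digits = booth_recode("0011110")
print(digits)  # [0, 1, 0, 0, 0, -1, 0]
```

    The recoded digits still represent the same value: +1 x 2^5 - 1 x 2^1 = 30, which is 0011110 in binary.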

    Booth multiplication with negative number

    Booth recoded multiplier

    Best case: a long string of 1s (skipping over 1s)

    Worst case: 0s and 1s are alternating

    Fast Multiplication

    Bit-Pair Recoding of Multipliers

    Bit-pair recoding halves the maximum number of summands (versions of the multiplicand).

    Bit-pair recoding is derived from the Booth recoding scheme.

    Multiplier bit-pair recoding

    Example of bit-pair recoding of the multiplier 11010

    Example multiplication using bit-pair recoding

    Using Booth multiplication

    Hence, using bit-pair recoding, the number of summands is reduced to n/2, where n is the number of bits in the multiplier.
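    A small sketch of bit-pair recoding follows. Each pair of multiplier bits, together with the bit to its right, selects one digit from {-2, -1, 0, +1, +2} using the weighting -2 x b(i+1) + b(i) + b(i-1). The function name, the string interface and the sign extension to an even length are assumptions of this example.

```python
def bit_pair_recode(bits):
    """Bit-pair recode a 2s-complement multiplier given as a bit string, MSB first."""
    if len(bits) % 2:
        bits = bits[0] + bits        # sign-extend to an even number of bits
    padded = bits + "0"              # implied 0 to the right of the LSB
    digits = []
    for i in range(0, len(bits), 2):
        b_hi, b_lo, b_right = (int(ch) for ch in padded[i:i + 3])
        digits.append(-2 * b_hi + b_lo + b_right)
    return digits


digits = bit_pair_recode("11010")
print(digits)  # [0, -1, -2]
```

    For the multiplier 11010 (-6), the digits give 0 x 16 - 1 x 4 - 2 x 1 = -6, using three summands instead of five.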

    Carry-Save Addition of Summands

    Carry-save addition (CSA) speeds up the addition process.

    Consider the addition of many summands. We can:

    Group the summands in threes and perform carry-save addition on each of these groups in parallel to generate a set of S and C vectors in one full-adder delay.

    Group all of the S and C vectors into threes, and perform carry-save addition on them, generating a further set of S and C vectors in one more full-adder delay.

    Continue with this process until there are only two vectors remaining.

    These can be added in a ripple-carry adder (RCA) or a CLA to produce the desired product.

    A multiplication example used to illustrate carry-save addition

    Multiplication using normal scheme

    Multiplication using CSA of summands

    Schematic representation of carry-save addition operations

    The outputs S4 and C4 from the third CSA level are available 6 gate delays later, assuming two gate delays per

    CSA level. The final two vectors can be added in a further 8 gate delays using a carry-lookahead adder. The

    total gate delay is therefore 15. By comparison, the total gate delay in performing this multiplication using

    an n x n array multiplier is 6(n-1)-1 = 29 (substituting n = 6). In general, approximately 1.7 log2 K - 1.7 levels of CSA steps are

    needed to reduce K summands to 2 vectors which, when added, produce the desired sum.

    Issues with carry save method

    If negative summands are involved, it is necessary to accommodate sign extension.

    A 2n-bit CLA is nominally needed to add the final S and C vectors, although fewer bits are actually needed.

    n summands are used for n x n multiplication. If bit-pair recoding is used, this is reduced to

    n/2. This reduces the number of CSA levels from 1.7 log2 n - 1.7 to 1.7 log2 n - 3.4.
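    To get a feel for these level counts, the sketch below counts CSA levels directly by repeatedly grouping vectors in threes, and compares the count with the approximation 1.7 log2 K - 1.7. The function names are illustrative, and the grouping strategy (leftover vectors are simply carried to the next level) is an assumption of this sketch.

```python
import math


def csa_levels(k):
    """Number of CSA levels needed to reduce k summands to 2 vectors."""
    levels = 0
    while k > 2:
        k = 2 * (k // 3) + k % 3   # each group of 3 becomes 2; leftovers carry over
        levels += 1
    return levels


def csa_levels_estimate(k):
    """Approximation quoted in the text: 1.7 log2 k - 1.7."""
    return 1.7 * math.log2(k) - 1.7


for n in (6, 16, 32):
    print(n, csa_levels(n), csa_levels(n // 2), round(csa_levels_estimate(n), 2))
```

    For n = 32, for example, halving the number of summands with bit-pair recoding takes the count from 8 levels to 6, in line with the roughly 1.7-level saving predicted by the formula.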

    Integer division

    Manual Division

    Steps for Manual Division

    Position the divisor appropriately with respect to the dividend and perform a subtraction.

    If the remainder is zero or positive, a quotient bit of 1 is determined, the remainder is extended by

    another bit of the dividend, the divisor is repositioned, and another subtraction is performed. If the

    remainder is negative, a quotient bit of 0 is determined, the dividend is restored by adding back the

    divisor, and the divisor is repositioned for another subtraction.

    Restoring Division

    Circuit arrangement for restoring division

    The divisor is placed in the M register and the dividend in the Q register; the A register is initially 0. At each step, the A and Q registers

    together are shifted to the left 1 bit. M is subtracted from A to determine whether the divisor fits into the partial

    remainder. If it does, q0 gets a 1 bit. Otherwise, q0 gets a 0 bit and M must be added back to A to restore

    the previous value. The count is then decremented, and the process continues for n steps. At the end, the

    quotient is in the Q register and the remainder is in the A register.

    Steps

    Repeat these steps n times:

    Shift A and Q left one binary position

    Subtract M from A, and place the answer back in A

    If the sign of A is 1, set q0 to 0 and add M back to A (restore A);

    otherwise, set q0 to 1
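    The steps above can be sketched in Python as follows. This is a minimal model for unsigned n-bit operands; the function name is illustrative, and a Python integer that goes negative stands in for the sign bit of the A register.

```python
def restoring_divide(dividend, divisor, n):
    """Restoring division of unsigned n-bit integers; returns (quotient, remainder)."""
    if divisor == 0:
        raise ZeroDivisionError("divisor must be nonzero")
    a, q, m = 0, dividend, divisor      # registers A, Q and M
    mask = (1 << n) - 1
    for _ in range(n):
        # Shift A and Q left one position; the MSB of Q moves into A.
        a = (a << 1) | ((q >> (n - 1)) & 1)
        q = (q << 1) & mask
        a -= m                          # subtract M from A
        if a < 0:                       # sign of A is 1
            a += m                      # restore A; q0 stays 0
        else:
            q |= 1                      # set q0 to 1
    return q, a


print(restoring_divide(8, 3, 4))   # (2, 2)
```

    Dividing 1000 by 0011 with n = 4 leaves the quotient 0010 in Q and the remainder 0010 in A.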

    Flow chart summarizing the restoring method

    Example for restoring method

    Non-restoring Division

    Avoids the need for restoring A after an unsuccessful subtraction.

    Step 1: Do the following n times:

    If the sign of A is 0, shift A and Q left one bit position and subtract M from A; otherwise, shift A and

    Q left and add M to A.

    Now, if the sign of A is 0, set q0 to 1; otherwise, set q0 to 0.

    Step 2: If the sign of A is 1, add M to A.
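    A minimal Python sketch of the two steps above, for unsigned n-bit operands. The function name is illustrative, and a negative Python integer again stands in for an A register whose sign bit is 1.

```python
def nonrestoring_divide(dividend, divisor, n):
    """Non-restoring division of unsigned n-bit integers; returns (quotient, remainder)."""
    if divisor == 0:
        raise ZeroDivisionError("divisor must be nonzero")
    a, q, m = 0, dividend, divisor      # registers A, Q and M
    mask = (1 << n) - 1
    for _ in range(n):                  # Step 1, repeated n times
        was_negative = a < 0            # sign of A before the shift
        a = 2 * a + ((q >> (n - 1)) & 1)   # shift A and Q left one position
        q = (q << 1) & mask
        a = a + m if was_negative else a - m
        if a >= 0:
            q |= 1                      # sign of A is 0: set q0 to 1
    if a < 0:                           # Step 2: final correction of the remainder
        a += m
    return q, a


print(nonrestoring_divide(8, 3, 4))   # (2, 2)
```

    Compared with the restoring method, each iteration performs exactly one addition or subtraction; the only restore is the single correction in Step 2.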

    Example

    Floating point numbers

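    Fixed-point numbers: the binary point is fixed.

    Representation of an n-bit binary fraction:

    $B = b_0 . b_{-1} b_{-2} b_{-3} \ldots b_{-(n-1)}$

    In the 2's-complement system, the signed value F is given by

    $F(B) = -b_0 \times 2^0 + b_{-1} \times 2^{-1} + b_{-2} \times 2^{-2} + \cdots + b_{-(n-1)} \times 2^{-(n-1)}$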

    where the range of F is

    $-1 \le F \le 1 - 2^{-(n-1)}$
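    As a small check of the 2's-complement fraction formula, the Python sketch below evaluates F(B) for a bit string b0 b-1 ... b-(n-1). The function name is illustrative and not part of the notes.

```python
def fraction_value(bits):
    """Value of an n-bit 2's-complement binary fraction given as a bit string."""
    b = [int(ch) for ch in bits]
    value = -float(b[0])                      # sign bit b0 has weight -2^0
    for i, bit in enumerate(b[1:], start=1):
        value += bit * 2.0 ** -i              # bit b(-i) has weight 2^-i
    return value


print(fraction_value("1000"))   # -1.0
print(fraction_value("0111"))   # 0.875
```

    With n = 4, the smallest value is -1 (pattern 1000) and the largest is 1 - 2^-3 = 0.875 (pattern 0111).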

    A sample representation of a 32-bit number

    IEEE notation

    IEEE Floating Point notation is the standard representation in use. There are two representations:

    - Single precision.

    - Double precision.

    Both have an implied base of 2.

    Single precision:

    - 32 bits (23-bit mantissa, 8-bit exponent in excess-127 representation)

    Double precision:

    - 64 bits (52-bit mantissa, 11-bit exponent in excess-1023 representation)

    Fractional mantissa, with an implied binary point at its immediate left.

    IEEE notation assumes that all numbers are normalized so that the MSB of the mantissa is a 1, and it does

    not store this bit. So the MSB actually stored for a number in IEEE notation may be either a 0 or a 1.

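    The values of the numbers represented in the IEEE single-precision notation are of the form:

    $(-1)^S \times 1.M \times 2^{E-127}$

    where S is the sign bit, M is the 23-bit stored mantissa, and E is the 8-bit stored exponent.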

    The hidden 1 forms the integer part of the mantissa.

    Excess-127 and excess-1023 (not excess-128 or excess-1024) are used to represent the exponent.
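    The single-precision layout described here can be explored with Python's standard `struct` module. The sketch below splits a value into its sign, stored exponent and stored mantissa, then rebuilds a normalized value using the hidden 1 and the excess-127 exponent. The function names are illustrative and not part of the notes.

```python
import struct


def decode_single(x):
    """Round x to IEEE single precision and return (sign, stored_exp, mantissa)."""
    (word,) = struct.unpack(">I", struct.pack(">f", x))
    sign = word >> 31                  # 1 sign bit
    stored_exp = (word >> 23) & 0xFF   # 8-bit exponent in excess-127 form
    mantissa = word & 0x7FFFFF         # 23 stored mantissa bits
    return sign, stored_exp, mantissa


def normalized_value(sign, stored_exp, mantissa):
    """Value of a normalized number; the hidden 1 is the integer part."""
    if not 1 <= stored_exp <= 254:
        raise ValueError("stored exponents 0 and 255 mark special conditions")
    return (-1) ** sign * (1 + mantissa / 2**23) * 2.0 ** (stored_exp - 127)


fields = decode_single(-6.5)
print(fields)                      # (1, 129, 5242880)
print(normalized_value(*fields))   # -6.5
```

    Here -6.5 = -1.625 x 2^2, so the stored exponent is 2 + 127 = 129 and the stored mantissa holds the fraction 0.625 without the leading 1.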

    In the IEEE representation, the exponent is in excess-127 (excess-1023) notation. The actual exponents

    represented are -126 to 127 (-1022 to 1023).

    In the single-precision case, a number whose normalized representation would require an exponent less than -126 or greater than

    127 cannot be represented. In the first case underflow has occurred, and in the second case overflow has occurred. This is because the

    IEEE standard reserves the exponents -127 and 128 (and -1023 and 1024), that is, the stored values 0 and 255 in single precision, to

    represent special conditions:

    - Exact zero

    - Infinity