7/26/2019 computer architecture lecture notes
Module 2
Execution of complete instruction
To execute an instruction, the processor has to perform the following steps:
Fetch the contents of the memory location pointed to by the PC. The contents of this location are
loaded into the IR (fetch phase).
IR ← [[PC]]
Assuming that the memory is byte addressable, increment the contents of the PC by 4 (fetch phase).
PC ← [PC] + 4
Carry out the actions specified by the instruction in the IR (execution phase).
In general, execution of an instruction involves one or more of the following operations:
Transfer a word of data from one processor register to another or to the ALU.
Perform an arithmetic or a logic operation and store the result in a processor register.
Fetch the contents of a given memory location and load them into a processor register.
Store a word of data from a processor register into a given memory location.
Consider the single bus organization of the datapath shown in the figure below and the instruction
ADD (R3),R1, which adds the contents of the memory location pointed to by R3 to register R1.
Execution of this instruction involves the following actions:
Fetch the instruction.
Fetch the first operand (the contents of the memory location pointed to by R3).
Perform the addition.
Load the result into R1.
Control sequence for execution of this instruction is shown below.
In step 1, the instruction fetch operation is initiated by loading the contents of the PC into MAR and
sending a read request to memory. The select signal is set to Select4, which causes the MUX to select the
constant 4. This value is added to the operand at input B, which is the content of the PC, and the result
is stored in register Z. The updated value is moved from register Z back to the PC during step 2, while
waiting for the memory operation to complete. In step 3, the word fetched from memory is loaded into IR.
Steps 1 to 3 constitute the instruction fetch phase. The instruction decoding circuit interprets the
contents of IR at the beginning of step 4. This enables the control circuitry to activate the control
signals for steps 4-7, which constitute the execution phase. The contents of register R3 are transferred
to MAR in step 4 and a memory read operation is initiated. Then the contents of R1 are transferred to
register Y in step 5 to prepare for the addition operation. When the read operation is completed, the
memory operand is available in register MDR and the addition is performed in step 6. The contents of MDR
are gated to the bus, and thus also to the B input of the ALU, and register Y is selected as the second
input of the ALU by choosing SelectY. The sum is stored in register Z, then transferred to R1 in step 7.
The End signal causes a new instruction fetch cycle to begin by returning to step 1.
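The seven control steps described above can be traced with a small simulation. This is only a sketch: the register names follow the notes, but the dictionary-based datapath, the memory contents and the addresses 100 and 200 are illustrative assumptions. Loading Y in step 2 is taken from the branch discussion, which states that the updated PC is available in Y.

```python
# Sketch of the single-bus control sequence for ADD (R3),R1.
# Memory contents and addresses are made up for illustration.

def run_add_r3_r1(regs, memory):
    """Trace steps 1-7 on a dict of registers and a dict-based memory."""
    r = dict(regs)
    # Step 1: PC -> MAR, start read, MUX selects constant 4, Z = PC + 4
    r["MAR"] = r["PC"]
    r["Z"] = r["PC"] + 4
    # Step 2: Z -> PC (and Y), wait for the memory read to complete
    r["PC"] = r["Z"]
    r["Y"] = r["Z"]
    r["MDR"] = memory[r["MAR"]]
    # Step 3: MDR -> IR (end of the fetch phase)
    r["IR"] = r["MDR"]
    # Step 4: R3 -> MAR, start read of the memory operand
    r["MAR"] = r["R3"]
    # Step 5: R1 -> Y, wait for the memory read to complete
    r["Y"] = r["R1"]
    r["MDR"] = memory[r["MAR"]]
    # Step 6: MDR on the bus (ALU input B), SelectY, Add, result into Z
    r["Z"] = r["Y"] + r["MDR"]
    # Step 7: Z -> R1, End
    r["R1"] = r["Z"]
    return r

memory = {100: "ADD (R3),R1", 200: 7}
final = run_add_r3_r1({"PC": 100, "R1": 5, "R3": 200}, memory)
print(final["PC"], final["R1"], final["IR"])  # 104 12 ADD (R3),R1
```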
WMC stands for Wait for Memory operation to Complete. Generally the addressed device on the
memory bus is slower than the microprocessor. Therefore, the microprocessor has to wait for the
addressed device to complete its operation. The WMC control signal makes the processor wait until the
memory indicates that the operation has been completed (the MFC signal).
Execution of Branch Instructions
Unconditional branch
A branch instruction replaces the contents of the PC with the branch target address. This address is
obtained by adding an offset X, which is given in the instruction, to the updated value of the PC.
Processing of an unconditional branch instruction begins with the fetch phase, which ends when the
instruction is loaded into IR in step 3. The offset value is extracted from IR by the instruction decoding
circuit. Since the updated value of the PC is already available in register Y, the offset X is gated onto
the bus in step 4 and an addition operation is performed. The result, which is the branch target address,
is loaded into the PC in step 5. The offset X is the difference between the branch target address and the
address immediately following the branch instruction.
Conditional branch
For a conditional branch, the status of the condition codes must be checked before a new
value is loaded into the PC.
If the condition code flag N = 0, the processor returns to step 1 at the end of step 4; otherwise step 5
is performed to load a new value into the PC.
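The branch arithmetic can be checked with a short sketch. The function names, the 4-byte instruction size and the addresses 1000 and 1100 are assumptions made for illustration; the `taken` argument stands for the outcome of the condition code test (for example N = 1).

```python
# Sketch of branch target arithmetic; names and addresses are hypothetical.

def branch_offset(branch_address, target_address, instruction_size=4):
    """Offset X = target address - address following the branch instruction."""
    return target_address - (branch_address + instruction_size)

def next_pc(updated_pc, offset, taken=True):
    """The PC keeps its updated value unless the branch is taken."""
    return updated_pc + offset if taken else updated_pc

offset = branch_offset(1000, 1100)
print(offset)                              # 96
print(next_pc(1004, offset))               # 1100
print(next_pc(1004, offset, taken=False))  # 1004
```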
Single bus Organisation
The datapath often consists of the following functional blocks
Instruction register (IR) - Stores the current instruction to be executed.
Program counter (PC) - Stores the address of the next instruction to be fetched.
Memory address register (MAR) - A register that stores either the memory address from which data
will be fetched to the CPU or the address to which data will be sent and stored.
Memory data register (MDR) - A register of a computer's control unit that contains the data to be
stored in the computer storage (e.g. RAM), or the data after a fetch from the computer storage.
Registers R0 through Rn-1 are provided for general purpose use. Registers Y and Z are provided for
temporary storage during execution of some instructions. This organisation is called a single bus
organization since the ALU and all registers are interconnected via a single common bus. The data and
address lines of the external memory bus are connected to the internal processor bus via the memory data
register MDR and the memory address register MAR. Register MDR has two inputs and two outputs. Data may
be loaded into MDR either from the memory bus or from the internal processor bus. The data stored in MDR
may be placed on either bus. The input of MAR is connected to the internal bus, and its output is
connected to the external bus. The control lines of the memory bus are connected to the instruction
decoder and control logic block. This unit is responsible for issuing the signals that control the
operation of all units inside the processor and for interfacing with the memory bus. The multiplexer
selects either the output of register Y or the constant value 4 to be provided as input A of the ALU.
The constant 4 is used to increment the contents of the PC. As instruction execution proceeds, data are
transferred from one register to another, often passing through the ALU to perform an arithmetic or logic
operation. The instruction decoder and control logic unit is responsible for implementing the actions
specified by the instruction loaded in IR. The decoder generates the control signals needed to select the
registers involved and direct the transfer of data. The registers, the ALU and the interconnecting bus are
collectively referred to as the datapath.
Multi bus Organization
In a multibus organization several data transfers can take place in parallel. A three-bus structure is
used to connect the registers and the ALU of a processor. All general purpose registers are combined into
a single block called the register file. The register file has three ports. There are two outputs,
allowing the contents of two registers to be accessed simultaneously and placed on buses A and B. The
third port allows data on bus C to be loaded into a third register during the same clock cycle.
Buses A and B are used to transfer source operands to the A and B inputs of the ALU, where an arithmetic
or logic operation may be performed. The result is transferred to the destination over bus C. The
three-bus organization eliminates the need for the temporary registers Y and Z of the single bus
organization. Another feature of this organization is the introduction of an incrementer unit, which is
used to increment the PC by 4. Using the incrementer eliminates the need to add 4 to the PC using the
ALU. The source for the constant 4 at the ALU input is still used to increment other addresses, such as
memory addresses in load multiple and store multiple instructions.
Control sequence for three bus organisation for the instruction ADD R4,R5,R6
In step 1, the contents of the PC are passed through the ALU using the R=B control signal and loaded into
MAR to start a memory read operation. At the same time, the PC is incremented by 4. The incremented value
is loaded into the PC at the end of the clock cycle. In step 2, the processor waits for MFC and loads the
data received into MDR, then transfers them to IR in step 3. Finally, the execution phase of the
instruction requires only one control step, step 4, to complete.
Control Unit
The control unit issues control signals external to the processor to cause data exchange with memory and
I/O modules. The control unit also issues control signals internal to the processor to move data between
registers, to cause the ALU to perform a specified function, and to regulate other internal operations.
The control unit performs two basic tasks:
Sequencing: The control unit causes the processor to step through a series of micro-operations in the
proper sequence, based on the program being executed.
Execution: The control unit causes each micro-operation to be performed.
The figure is a general model of the control unit, showing all of its inputs and outputs. The inputs are:
Clock: This is how the control unit keeps time. The control unit causes one micro-operation (or a set
of simultaneous micro-operations) to be performed for each clock pulse. This is sometimes referred to as
the processor cycle time, or the clock cycle time.
Instruction register: The opcode and addressing mode of the current instruction are used to determine
which micro-operations to perform during the execute cycle.
Flags: These are needed by the control unit to determine the status of the processor and the outcome of
previous ALU operations. For example, for the increment-and-skip-if-zero (ISZ) instruction, the control
unit will increment the PC if the zero flag is set.
Control signals from control bus: The control bus portion of the system bus provides signals to the
control unit.
The outputs are as follows:
Control signals within the processor: These are of two types: those that cause data to be moved from one
register to another, and those that activate specific ALU functions.
Control signals to control bus: These are also of two types: control signals to memory, and control
signals to the I/O modules.
Techniques for control unit implementation
Hardwired implementation
Microprogrammed implementation
Hardwired control
In a hardwired implementation, the control unit is essentially a state machine circuit. Its input logic
signals are transformed into a set of output logic signals, which are the control signals. The control
unit is a combinational circuit that generates the required control outputs depending on the state of all
its inputs. The basic block diagram of a hardwired control unit is shown below.
The control unit makes use of the opcode and will perform different actions (issue a different
combination of control signals) for different instructions. To simplify the control unit logic, there
should be a unique logic input for each opcode. This function can be performed by a decoder, which takes
an encoded input and produces a single output. In general, a decoder will have n binary inputs and 2^n
binary outputs. Each of the 2^n different input patterns will activate a single unique output. The clock
portion of the control unit issues a repetitive sequence of pulses. This is useful for measuring the
duration of micro-operations. Essentially, the period of the clock pulses must be long enough to allow the
propagation of signals along data paths and through processor circuitry. A counter is used to keep track
of the control steps. Each count of this counter corresponds to one control step. The required control
signals are determined by:
Contents of the control step counter
Contents of the instruction register
Contents of the condition code flags
External input signals and interrupt requests
By separating the decoding and encoding functions, a more detailed block diagram is obtained, as shown below.
RUN control signal: When set to 1, RUN causes the counter to be incremented by one at the end of every
clock cycle. When RUN is equal to 0, the counter stops counting.
Generation of Zin control signal
Generation of END control signal
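As a rough illustration of how the encoder combines the step counter, the decoded instruction and the condition flags, the sketch below derives Zin and End only from the control sequences described in these notes (ADD in steps 1-7, unconditional branch in steps 1-5, conditional branch ending at step 4 when N = 0). The mnemonics "ADD", "BR" and "BRN" and the function names are hypothetical labels; a real control unit covers the whole instruction set.

```python
# Sketch of hardwired signal generation for three instructions only.
# "ADD", "BR" (unconditional branch) and "BRN" (branch if N = 1) are assumed labels.

def zin(step, instr):
    """Zin is asserted in step 1 (PC + 4), step 6 of ADD and step 4 of a branch."""
    return (step == 1
            or (step == 6 and instr == "ADD")
            or (step == 4 and instr in ("BR", "BRN")))

def end(step, instr, n_flag=0):
    """End is asserted in the last control step of each instruction."""
    if instr == "ADD":
        return step == 7
    if instr == "BR":
        return step == 5
    if instr == "BRN":
        return (step == 4 and n_flag == 0) or (step == 5 and n_flag == 1)
    return False

print([t for t in range(1, 8) if zin(t, "ADD")])     # [1, 6]
print([t for t in range(1, 8) if end(t, "BRN", 0)])  # [4]
```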
Advantage
Hardwired system can operate at high speed
Disadvantage
Little flexibility
Application
Used in RISC processor
Micro programmed Control
In a microprogrammed control unit, the logic of the control unit is specified by a microprogram. A
microprogram consists of a sequence of instructions in a microprogramming language, similar to machine
language. These are very simple instructions that specify micro-operations. A microprogrammed control
unit is a relatively simple logic circuit that is capable of (1) sequencing through microinstructions and
(2) generating control signals to execute each microinstruction.
Basic organization of micro programmed control unit
The microroutines for all instructions in the instruction set of a computer are stored in a special memory
called the control store or control memory. The control unit can generate the control signals for any
instruction by sequentially reading the control words of the corresponding microroutine from the control
store. A control word (CW) is a word whose individual bits represent the various control signals. A
sequence of control words corresponding to the control sequence of a machine instruction constitutes the
microroutine for that instruction, and the individual control words in a microroutine are referred to as
microinstructions. To read control words sequentially from the control store, a microprogram counter
(µPC) is used. Every time a new instruction is loaded into IR, the output of the starting address
generator is loaded into the µPC. The µPC is then automatically incremented by the clock, causing
successive microinstructions to be read from the control store. Hence the control signals are delivered to
the various parts of the processor in the correct sequence. To support microprogram branching, the
organisation of the control unit is modified as follows.
The starting and branch address generator block loads a new address into the microprogram counter (µPC)
when a microinstruction instructs it to do so. The µPC is incremented every time a new microinstruction is
fetched from the control store, except in the following situations:
When a new instruction is loaded into IR, the µPC is loaded with the starting address of the microroutine
for that instruction.
When a branch microinstruction is encountered and the branch condition is satisfied, the µPC is loaded
with the branch target address.
When an End microinstruction is encountered, the µPC is loaded with the address of the first CW in the
microroutine for the instruction fetch cycle.
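The sequencing rules above can be sketched as a loop over a control store. The tuple layout (control word, action, target), the addresses and the placeholder control words "cw5" to "cw9" are assumptions for illustration, not the format of any real control store.

```python
# Sketch of microprogram counter sequencing.
# Each entry is (control_word, action, target); action is "next", "branch" or "end".

def run_microroutine(control_store, start_address, condition):
    """Return the control words issued for one microroutine."""
    upc = start_address              # loaded by the starting address generator
    issued = []
    while True:
        control_word, action, target = control_store[upc]
        issued.append(control_word)
        if action == "end":
            return issued            # the uPC would now be reloaded for the next fetch
        if action == "branch" and condition:
            upc = target             # branch condition satisfied
        else:
            upc += 1                 # normal sequential increment

store = {
    5: ("cw5", "next", None),
    6: ("cw6", "branch", 9),
    7: ("cw7", "next", None),
    8: ("cw8", "end", None),
    9: ("cw9", "end", None),
}
print(run_microroutine(store, 5, condition=True))   # ['cw5', 'cw6', 'cw9']
print(run_microroutine(store, 5, condition=False))  # ['cw5', 'cw6', 'cw7', 'cw8']
```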
Advantages of micro programmed control unit
Simplifies design of control unit
Cheaper and less error prone to implement
Disadvantage
Slower than hardwired unit
Application
Used in CISC processor
Micro program sequencing
If all microprograms require only straightforward sequential execution of microinstructions except for
branches, letting a microprogram counter (µPC) govern the sequencing would be efficient. However, this has
two disadvantages:
Having a separate microroutine for each machine instruction results in a large total number of
microinstructions and a large control store.
Execution time is longer because it takes more time to carry out the required branches.
A powerful alternative approach is to include an address field as a part of every microinstruction to
indicate the location of the next microinstruction to be fetched. Separate branch microinstructions are
virtually eliminated. A microinstruction with a next-address field is shown below.
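Arithmetic and logic design
An n-bit sequence of binary digits $a_{n-1}, a_{n-2}, \ldots, a_1, a_0$ is interpreted as an unsigned integer A as
$A = \sum_{i=0}^{n-1} 2^i a_i$
The simplest form of representation that employs a sign bit is the sign-magnitude representation. In an
n-bit word, the leftmost bit is the sign bit and the rightmost n-1 bits hold the magnitude of the integer.
The general representation of a signed integer is
$A = \sum_{i=0}^{n-2} 2^i a_i$ if $a_{n-1} = 0$, and $A = -\sum_{i=0}^{n-2} 2^i a_i$ if $a_{n-1} = 1$
Addition/subtraction of signed numbers
Addition
At the ith stage:
Input: $c_i$ is the carry-in.
Output: $s_i$ is the sum and $c_{i+1}$ is the carry-out to the (i+1)st stage.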
Addition logic for a single stage
n-bit adder
Cascade n full adder (FA) blocks to form an n-bit adder.
Carries propagate, or ripple, through this cascade, so it is called an n-bit ripple carry adder.
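A minimal sketch of the cascade, assuming the standard full adder equations (sum = x XOR y XOR c, carry-out = xy + xc + yc). Bit lists are stored least significant bit first, which is a choice made here for convenience.

```python
# Sketch of a full adder stage and an n-bit ripple carry adder.
# Bit lists are ordered with index 0 as the least significant bit.

def full_adder(x, y, c):
    """One stage: returns (s_i, c_{i+1})."""
    s = x ^ y ^ c
    c_out = (x & y) | (x & c) | (y & c)
    return s, c_out

def ripple_carry_add(x_bits, y_bits, c0=0):
    """Cascade of full adders; the carry ripples from stage to stage."""
    carry = c0
    s_bits = []
    for x, y in zip(x_bits, y_bits):
        s, carry = full_adder(x, y, carry)
        s_bits.append(s)
    return s_bits, carry

# 0111 (7) + 0110 (6) = 1101 (13), written LSB first
print(ripple_carry_add([1, 1, 1, 0], [0, 1, 1, 0]))  # ([1, 0, 1, 1], 0)
```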
kn-bit adder
kn-bit numbers can be added by cascading k n-bit adders.
n-bit subtractor
X - Y is equivalent to adding the 2's complement of Y to X. The 2's complement is equivalent to the 1's
complement + 1:
$X - Y = X + \overline{Y} + 1$
The 2's complement of positive and negative numbers is computed similarly.
n-bit adder/subtractor
The two inputs x and y represent the arguments to be added or subtracted. The control input ADD/SUB
determines whether an add or a subtract operation is to be performed: if the control input is 0, an add
operation is performed, while if the control input is 1, a subtract operation is performed.
Detecting overflows
Overflow can only occur when the signs of the two operands are the same. Overflow occurs if the sign
of the result is different from the sign of the operands.
$x_{n-1}$, $y_{n-1}$ and $s_{n-1}$ represent the signs of operand x, operand y and result s respectively.
A circuit to detect overflow can be implemented by the following logic expressions:
$\text{Overflow} = x_{n-1}\, y_{n-1}\, \overline{s}_{n-1} + \overline{x}_{n-1}\, \overline{y}_{n-1}\, s_{n-1}$
$\text{Overflow} = c_n \oplus c_{n-1}$
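The adder/subtractor and the sign-based overflow test can be sketched together. The function name and the use of Python integers as n-bit patterns are assumptions; for subtraction the sign test is applied to the complemented y that actually enters the adder.

```python
# Sketch of an n-bit 2's-complement adder/subtractor with overflow detection.

def add_sub(x, y, n, sub):
    """sub = 0: x + y, sub = 1: x - y. Returns (n-bit result, overflow flag)."""
    mask = (1 << n) - 1
    x &= mask
    y_in = (y ^ (mask if sub else 0)) & mask  # XOR gates complement y when sub = 1
    total = x + y_in + sub                    # sub also drives the carry-in c0
    s = total & mask
    xs = (x >> (n - 1)) & 1                   # sign of operand x
    ys = (y_in >> (n - 1)) & 1                # sign of the adder's second input
    ss = (s >> (n - 1)) & 1                   # sign of the result
    overflow = (xs & ys & (1 - ss)) | ((1 - xs) & (1 - ys) & ss)
    return s, bool(overflow)

print(add_sub(7, 6, 4, 0))       # (13, True): 7 + 6 does not fit in 4 bits
print(add_sub(5, 3, 4, 1))       # (2, False)
print(add_sub(0b1000, 1, 4, 1))  # (7, True): -8 - 1 does not fit in 4 bits
```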
Computing the add time
Consider the 0th stage: s0 is available after 1 gate delay and c1 is available after 2 gate delays.
Computing the add time of an n-bit ripple carry adder
Consider a 4-bit ripple carry adder:
s0 is available after 1 gate delay, c1 after 2 gate delays.
s1 is available after 3 gate delays, c2 after 4 gate delays.
s2 is available after 5 gate delays, c3 after 6 gate delays.
s3 is available after 7 gate delays, c4 after 8 gate delays.
For an n-bit adder, sn-1 is available after 2n-1 gate delays and cn is available after 2n gate delays.
Fast adders
One of the main drawbacks of the ripple carry adder circuit is the long delay between the time the inputs
are presented to the circuit and the time the final output is obtained. This is because of the dependence
of each stage on the carry output produced by the previous stage. This chain of dependence determines the
adder's delay. In order to speed up the addition process, it is necessary to introduce addition circuits
in which the chain of dependence among the adder stages is broken. One such fast adder circuit is the
carry-lookahead (CLA) adder.
Carry-look-ahead (CLA) adder
$c_{i+1}$ can be written as
$c_{i+1} = x_i y_i + x_i c_i + y_i c_i$
We can write $c_{i+1}$ as
$c_{i+1} = G_i + P_i c_i$
where
$G_i = x_i y_i$ and $P_i = x_i + y_i$
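$G_i$ is called the generate function and $P_i$ is called the propagate function.
$G_i$ and $P_i$ are computed only from $x_i$ and $y_i$ and not from $c_i$, thus they can be computed in
one gate delay after X and Y are applied to the inputs of an n-bit adder. A simpler circuit can be
realized by using
$P_i = x_i \oplus y_i$
which differs from $P_i = x_i + y_i$ only when $x_i = y_i = 1$. In that case $G_i = 1$, so the value of
$P_i$ does not affect $c_{i+1}$.
Thus, using a cascade of two 2-input XOR gates to realize the sum, the basic cell shown below can be used
in each bit stage.
Expanding $c_i$ in terms of i-1 subscripted variables and substituting it into the $c_{i+1}$ expression gives
$c_{i+1} = G_i + P_i G_{i-1} + P_i P_{i-1} c_{i-1}$
Continuing in this way, the final expression for any carry variable is
$c_{i+1} = G_i + P_i G_{i-1} + P_i P_{i-1} G_{i-2} + \cdots + P_i P_{i-1} \cdots P_1 G_0 + P_i P_{i-1} \cdots P_0 c_0$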
All carries can be obtained 3 gate delays after X, Y and c0 are applied:
- One gate delay for Pi and Gi
- Two gate delays in the AND-OR circuit for ci+1
All sums can be obtained 1 gate delay (one XOR gate) after the carries are computed. Independent of n,
n-bit addition requires only 4 gate delays.
This is called a carry-lookahead adder.
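The sketch below evaluates every carry directly from the generate and propagate functions and c0, without using the carry of the previous stage, and then forms the sums. It assumes G = x AND y and P = x OR y; the helper names and the LSB-first bit lists are choices made for this example.

```python
# Sketch of a carry-lookahead adder: each carry is computed from G, P and c0 only.

def to_bits(value, n):
    """Integer -> list of n bits, index 0 = least significant bit."""
    return [(value >> i) & 1 for i in range(n)]

def from_bits(bits):
    return sum(b << i for i, b in enumerate(bits))

def cla_add(x_bits, y_bits, c0=0):
    n = len(x_bits)
    g = [x & y for x, y in zip(x_bits, y_bits)]   # generate functions
    p = [x | y for x, y in zip(x_bits, y_bits)]   # propagate functions
    carries = [c0]
    for i in range(n):
        # c_{i+1} = G_i + P_i G_{i-1} + ... + P_i ... P_1 G_0 + P_i ... P_0 c_0
        c, prod = 0, 1
        for j in range(i, -1, -1):
            c |= prod & g[j]
            prod &= p[j]
        c |= prod & c0
        carries.append(c)
    s = [x ^ y ^ c for x, y, c in zip(x_bits, y_bits, carries)]
    return s, carries[n]

s, c4 = cla_add(to_bits(7, 4), to_bits(6, 4))
print(from_bits(s), c4)  # 13 0
```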
For a 4-bit adder, c4 is available after 3 gate delays and s3 after 4 gate delays, whereas in a 4-bit
ripple carry adder c4 is available after 8 gate delays and s3 after 7 gate delays. Performing n-bit
addition in 4 gate delays independent of n is good only theoretically because of fan-in constraints.
The last AND gate and the OR gate require a fan-in of (n+1) for an n-bit adder. For a 4-bit adder (n=4) a
fan-in of 5 is required. This is the practical limit for most gates.
In order to add operands longer than 4 bits, we can cascade 4-bit carry-lookahead adders. A cascade of
carry-lookahead adders is called a blocked carry-lookahead adder.
The figure shows a 16-bit adder built from 4-bit adders.
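Blocked Carry-Look ahead adder
In the first block, the carry-out is
$c_4 = G_3 + P_3 G_2 + P_3 P_2 G_1 + P_3 P_2 P_1 G_0 + P_3 P_2 P_1 P_0 c_0 = G_0^I + P_0^I c_0$
where $G_0^I = G_3 + P_3 G_2 + P_3 P_2 G_1 + P_3 P_2 P_1 G_0$ and $P_0^I = P_3 P_2 P_1 P_0$.
$c_{16}$ is obtained as
$c_{16} = G_3^I + P_3^I G_2^I + P_3^I P_2^I G_1^I + P_3^I P_2^I P_1^I G_0^I + P_3^I P_2^I P_1^I P_0^I c_0$
After $x_i$, $y_i$ and $c_0$ are applied as inputs:
$G_i$ and $P_i$ for each stage are available after 1 gate delay.
$P^I$ is available after 2 and $G^I$ after 3 gate delays.
All carries are available after 5 gate delays.
$c_{16}$ is available after 5 gate delays.
$s_{15}$, which depends on $c_{12}$, is available after 8 (5+3) gate delays, since for a 4-bit
carry-lookahead adder the last sum bit is available 3 gate delays after all inputs are available.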
Multiplication
Multiplication of unsigned numbers
Consider n x n multiplication.
The product of two n-bit numbers is at most a 2n-bit number. Unsigned multiplication can be viewed as
the addition of shifted versions of the multiplicand.
Multiplication involves the generation of partial products, one for each digit in the multiplier. These
partial products are then summed to produce the final product.
The partial products are easily defined. When the multiplier bit is 0, the partial product is 0. When the
multiplier bit is 1, the partial product is the multiplicand.
The total product is produced by summing the partial products. For this operation, each successive
partial product is shifted one position to the left relative to the preceding partial product.
Array multiplier
Multiplicand is shifted by displacing it through an array of adders.
Where each multiplier cell is given as
Array multipliers are:
Extremely inefficient.
Have a high gate count for multiplying numbers of practical size such as 32-bit or 64-bit numbers.
Perform only one function, namely, unsigned integer product.
Assuming that there are 2 gate delays from input to output of a full adder block, the worst case signal
propagation delay path (from the right end of the first row to the highest product bit output at the left
end, comprising all cells in the bottom row and two cells at the right end of each of the other rows) has
a total of 6(n-1)-1 gate delays, including the initial AND gate delay in each cell. Since the incoming
partial product of the first row is 0, only AND gates are required in the first row, which is accounted
for in the delay expression.
Sequential multiplication
In this case, multiplication is performed as a series of n conditional addition and shift operations: if
the given bit of the multiplier is 0, only a shift operation is performed, while if the given bit of the
multiplier is 1, the multiplicand is added to the partial product and a shift operation is then performed.
Register configuration in sequential multiplier
Flow chart for unsigned binary multiplication
The multiplier and multiplicand are loaded into two registers (Q and M).
A third register, the A register, is also needed and is initially set to 0. There is
also a 1-bit C register, initialized to 0, which holds a potential carry bit resulting
from addition. The operation of the multiplier is as follows. Control logic reads
the bits of the multiplier one at a time. If Q0 is 1, then the multiplicand is added to
the A register and the result is stored in the A register, with the C bit used for
overflow. Then all of the bits of the C, A, and Q registers are shifted to the right
one bit, so that the C bit goes into An-1, A0 goes into Qn-1, and Q0 is lost. If Q0 is 0,
then no addition is performed, just the shift. This process is repeated for each bit of
the original multiplier. The resulting 2n-bit product is contained in the A and Q
registers. An example is shown below.
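The register-level procedure above can be sketched directly. The variable names c, a, q and m mirror the C, A, Q and M registers; the function name and the use of Python integers as n-bit registers are assumptions made for illustration.

```python
# Sketch of sequential (shift-and-add) unsigned multiplication with C, A, Q, M.

def sequential_multiply(multiplicand, multiplier, n):
    mask = (1 << n) - 1
    c, a = 0, 0
    q, m = multiplier & mask, multiplicand & mask
    for _ in range(n):
        if q & 1:                      # Q0 = 1: A <- A + M, carry-out goes to C
            total = a + m
            c, a = total >> n, total & mask
        # Shift C, A, Q right one bit: C -> A(n-1), A0 -> Q(n-1), Q0 is lost
        q = (q >> 1) | ((a & 1) << (n - 1))
        a = (a >> 1) | (c << (n - 1))
        c = 0
    return (a << n) | q                # 2n-bit product held in A and Q

print(sequential_multiply(13, 11, 4))  # 143
```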
Signed Multiplication
Considering 2's-complement signed operands, what will happen to (-13) x (+11)
if we follow the same method as for unsigned multiplication?
For a negative multiplier, a straightforward solution is to form the 2's complement
of both the multiplier and the multiplicand and proceed as in
the case of a positive multiplier.
This is possible because complementation of both operands does not
change the value or the sign of the product.
A technique that works equally well for both negative and positive
multipliers is the Booth algorithm.
Booth algorithm
The multiplier and multiplicand are placed in the Q and M registers,
respectively. There is also a 1-bit register placed logically to the right of the
least significant bit (Q0) of the Q register and designated Q-1. The results of the
multiplication will appear in the A and Q registers. A and Q-1 are initialized to 0.
Control logic scans the bits of the multiplier one at a time. Now, as each bit is
examined, the bit to its right is also examined. If the two bits are the same (11 or 00),
then all of the bits of the A, Q, and Q-1 registers are shifted to the right 1 bit. If the
two bits differ, then the multiplicand is added to or subtracted from the A register,
depending on whether the two bits Q0 Q-1 are 01 (add) or 10 (subtract). Following the
addition or subtraction, the right shift occurs. In either case, the right shift is such
that the leftmost bit of A, namely An-1, not only is shifted into An-2 but also remains in
An-1. This is required to preserve the sign of the number in A and Q. It is known
as an arithmetic shift, because it preserves the sign bit.
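A minimal sketch of the register procedure just described, using Python integers as n-bit registers. The function name is hypothetical. One assumption to note: with an n-bit A register this form does not handle a multiplicand equal to the most negative value, -2^(n-1), so the example avoids it.

```python
# Sketch of Booth multiplication with A, Q, Q-1 and M registers (n bits each).
# Assumes the multiplicand is not the most negative n-bit value.

def booth_multiply(multiplicand, multiplier, n):
    mask = (1 << n) - 1
    a, q_1 = 0, 0
    q, m = multiplier & mask, multiplicand & mask
    for _ in range(n):
        pair = (q & 1, q_1)            # (Q0, Q-1)
        if pair == (1, 0):
            a = (a - m) & mask         # 10: A <- A - M
        elif pair == (0, 1):
            a = (a + m) & mask         # 01: A <- A + M
        # Arithmetic shift right of A, Q, Q-1 (sign bit of A is kept)
        q_1 = q & 1
        q = (q >> 1) | ((a & 1) << (n - 1))
        a = (a >> 1) | (a & (1 << (n - 1)))
    product = (a << n) | q
    if product & (1 << (2 * n - 1)):   # interpret the 2n-bit result as signed
        product -= 1 << (2 * n)
    return product

print(booth_multiply(7, 3, 4))     # 21
print(booth_multiply(-13, 11, 5))  # -143
```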
Booth multiplier recoding table
In general, in the Booth scheme, -1 times the shifted multiplicand is
selected when moving from 0 to 1, and +1 times the shifted multiplicand is
selected when moving from 1 to 0, as the multiplier is scanned from right to
left.
Booth multiplication with a positive multiplier
Consider a multiplication in which the multiplier is the positive number 0011110.
Multiplier 0 0 1 1 1 1 0
Booth recoded multiplier 0 +1 0 0 0 -1 0
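The recoding rule can be written as "digit = bit to the right minus current bit", with an implied 0 to the right of the least significant bit. The function name and the MSB-first string format are assumptions for this sketch.

```python
# Sketch of Booth recoding; the multiplier is a bit string, MSB first.

def booth_recode(bits):
    """Return the recoded digits (-1, 0 or +1), MSB first."""
    padded = bits + "0"   # implied 0 to the right of the LSB
    return [int(padded[i + 1]) - int(padded[i]) for i in range(len(bits))]

print(booth_recode("0011110"))  # [0, 1, 0, 0, 0, -1, 0]
print(booth_recode("0101"))     # [1, -1, 1, -1], the alternating worst case
```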
Booth multiplication with a negative multiplier
Booth recoded multiplier
Best case: a long string of 1s (skipping over 1s)
Worst case: 0s and 1s are alternating
Fast Multiplication
Bit-Pair Recoding of Multipliers
Bit-pair recoding halves the maximum number of summands (versions of
the multiplicand).
Bit-pair recoding is derived from the Booth recoding scheme.
Multiplier bit-pair recoding
Example of bit-pair recoding of the multiplier 11010
Example multiplication using bit-pair recoding
Using Booth multiplication
Hence, using bit-pair recoding, the number of summands is reduced to n/2, where n is the
number of bits in the multiplier.
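A sketch of bit-pair recoding for a 2's-complement multiplier: each pair of bits, together with the bit to its right, selects a digit in the range -2 to +2, which equals the sum of the two Booth digits of that pair weighted 2 and 1. The function name and the sign extension to an even length are assumptions made here.

```python
# Sketch of bit-pair recoding; the multiplier is a 2's-complement bit string, MSB first.

def bit_pair_recode(bits):
    """Return one digit in {-2, -1, 0, +1, +2} per bit pair, MSB first."""
    if len(bits) % 2:
        bits = bits[0] + bits          # sign-extend to an even number of bits
    padded = bits + "0"                # implied 0 to the right of the LSB
    digits = []
    for i in range(0, len(bits), 2):
        hi, lo, right = int(padded[i]), int(padded[i + 1]), int(padded[i + 2])
        digits.append(-2 * hi + lo + right)
    return digits

digits = bit_pair_recode("11010")      # the multiplier 11010 is -6
print(digits)                          # [0, -1, -2]
print(sum(d * 4 ** (len(digits) - 1 - i) for i, d in enumerate(digits)))  # -6
```

Each digit multiplies the multiplicand at a weight that is a power of 4, so a 5-bit multiplier needs only three summands here.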
Carry-Save Addition of Summands
Carry-save addition (CSA) speeds up the addition process.
Consider the addition of many summands. We can:
Group the summands in threes and perform carry-save addition on each of these groups in parallel to
generate a set of S and C vectors in one full-adder delay.
Group all of the S and C vectors into threes, and perform carry-save addition on them, generating a
further set of S and C vectors in one more full-adder delay.
Continue with this process until there are only two vectors remaining. They can be added in an RCA or
CLA to produce the desired product.
A multiplication example is used to illustrate carry-save addition.
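The grouping procedure can be sketched with integers standing in for bit vectors. The function names are hypothetical, and the final two-vector addition is done with ordinary integer addition in place of an RCA or CLA. The operands 45 and 63 are chosen only as a six-summand illustration.

```python
# Sketch of carry-save reduction of summands (non-negative integers as bit vectors).

def carry_save_add(x, y, z):
    """One CSA level on three vectors: returns the S and C vectors."""
    s = x ^ y ^ z
    c = ((x & y) | (x & z) | (y & z)) << 1   # carries move one position left
    return s, c

def csa_reduce(summands):
    """Reduce to two vectors level by level; returns (final sum, number of CSA levels)."""
    vectors = list(summands)
    levels = 0
    while len(vectors) > 2:
        full = len(vectors) - len(vectors) % 3
        nxt = []
        for i in range(0, full, 3):
            nxt.extend(carry_save_add(*vectors[i:i + 3]))
        nxt.extend(vectors[full:])           # leftover vectors pass to the next level
        vectors = nxt
        levels += 1
    return sum(vectors), levels              # plain addition stands in for the RCA/CLA

# Six summands of 45 x 63: the multiplicand shifted by 0..5 positions
summands = [45 << i for i in range(6)]
print(csa_reduce(summands))  # (2835, 3)
```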
Multiplication using normal scheme
Multiplication using CSA of summands
Schematic representation of carry-save addition operations
The outputs S4 and C4 from the third CSA level are available 6 gate delays later, assuming two gate delays
per CSA level. The final two vectors can be added in a further 8 gate delays using a carry-lookahead
adder. The total gate delay is therefore 15, counting 1 gate delay for the AND gates that form the
summands. By comparison, the total gate delay in performing this multiplication using an n x n array
multiplier is 6(n-1)-1 = 29 (substituting n = 6). In general, approximately 1.7 log2 k - 1.7 levels of CSA
steps are needed to reduce k summands to 2 vectors which, when added, produce the desired sum.
Issues with carry save method
If negative summands are involved, it is necessary to accommodate sign extension.
A 2n-bit CLA is nominally needed to add the final S and C vectors, although fewer bits are actually needed.
n summands are used for n x n multiplication. If bit-pair recoding is used, this is reduced to n/2. This
reduces the number of CSA levels from 1.7 log2 n - 1.7 to 1.7 log2 n - 3.4.
Integer division
Manual Division
Steps for Manual Division
Position the divisor appropriately with respect to the dividend and perform a subtraction.
If the remainder is zero or positive, a quotient bit of 1 is determined, the remainder is extended by
another bit of the dividend, the divisor is repositioned, and another subtraction is performed.
If the remainder is negative, a quotient bit of 0 is determined, the dividend is restored by adding back
the divisor, and the divisor is repositioned for another subtraction.
Restoring Division
Circuit arrangement for restoring division
The divisor is placed in the M register and the dividend in the Q register. At each step, the A and Q
registers together are shifted to the left 1 bit. M is subtracted from A to determine whether M divides
into the partial remainder. If it does, then Q0 gets a 1 bit. Otherwise, Q0 gets a 0 bit and M must be
added back to A to restore the previous value. The count is then decremented, and the process continues
for n steps. At the end, the quotient is in the Q register and the remainder is in the A register.
Steps
Repeat these steps n times:
Shift A and Q left one binary position.
Subtract M from A, and place the answer back in A.
If the sign of A is 1, set q0 to 0 and add M back to A (restore A);
otherwise, set q0 to 1.
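The three steps can be sketched with Python integers playing the role of the A, Q and M registers. The function name is hypothetical, the operands are assumed to be unsigned with a non-zero divisor, and a negative Python integer stands in for "the sign of A is 1".

```python
# Sketch of restoring division for unsigned n-bit operands (divisor > 0).

def restoring_divide(dividend, divisor, n):
    mask = (1 << n) - 1
    a, q, m = 0, dividend & mask, divisor
    for _ in range(n):
        # Shift A and Q left one binary position
        a = (a << 1) | (q >> (n - 1))
        q = (q << 1) & mask
        a -= m                 # subtract M from A
        if a < 0:
            a += m             # sign of A is 1: restore A, q0 stays 0
        else:
            q |= 1             # sign of A is 0: set q0 to 1
    return q, a                # quotient in Q, remainder in A

print(restoring_divide(8, 3, 4))  # (2, 2)
```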
Flow chart summarizing the restoring method
Example for restoring method
Non-restoring Division
Non-restoring division avoids the need for restoring A after an unsuccessful subtraction.
Step 1: Do the following n times:
If the sign of A is 0, shift A and Q left one bit position and subtract M from A; otherwise, shift A and
Q left and add M to A.
Now, if the sign of A is 0, set q0 to 1; otherwise, set q0 to 0.
Step 2: If the sign of A is 1, add M to A.
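A sketch of the two steps, again with Python integers as registers and a negative value of `a` standing for "the sign of A is 1". The function name is hypothetical, and the operands are assumed to be unsigned with a non-zero divisor.

```python
# Sketch of non-restoring division for unsigned n-bit operands (divisor > 0).

def nonrestoring_divide(dividend, divisor, n):
    mask = (1 << n) - 1
    a, q, m = 0, dividend & mask, divisor
    for _ in range(n):                       # Step 1, repeated n times
        negative = a < 0                     # sign of A before the shift
        a = (a << 1) | (q >> (n - 1))        # shift A and Q left one position
        q = (q << 1) & mask
        a = a + m if negative else a - m     # add M if A was negative, else subtract
        if a >= 0:
            q |= 1                           # sign of A is 0: set q0 to 1
    if a < 0:                                # Step 2: final correction of the remainder
        a += m
    return q, a

print(nonrestoring_divide(8, 3, 4))  # (2, 2)
```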
Example
Floating point numbers
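Fixed point numbers: The binary point is fixed.
Representation of an n-bit binary fraction:
$B = b_0 . b_{-1} b_{-2} b_{-3} \ldots b_{-(n-1)}$
In the 2's complement system the signed value F is given by
$F(B) = -b_0 \times 2^0 + b_{-1} \times 2^{-1} + b_{-2} \times 2^{-2} + \cdots + b_{-(n-1)} \times 2^{-(n-1)}$
where the range of F is
$-1 \le F \le 1 - 2^{-(n-1)}$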
A sample representation of 32 bit number
IEEE notation
IEEE floating point notation is the standard representation in use. There are two representations:
- Single precision.
- Double precision.
Both have an implied base of 2.
Single precision:
- 32 bits (1 sign bit, 23-bit mantissa, 8-bit exponent in excess-127 representation)
Double precision:
- 64 bits (1 sign bit, 52-bit mantissa, 11-bit exponent in excess-1023 representation)
Both use a fractional mantissa, with an implied binary point at its immediate left.
IEEE notation assumes that all numbers are normalized so that the MSB of the mantissa is a 1, and does
not store this bit. The most significant stored bit of the mantissa can therefore be either a 0 or a 1.
The values of the numbers represented in the IEEE single precision notation are of the form:
$(-1)^S \times 1.M \times 2^{E'-127}$
where S is the sign bit, M is the 23-bit stored mantissa and E' is the 8-bit stored exponent.
The hidden 1 forms the integer part of the mantissa.
Excess-127 and excess-1023 (not excess-128 or excess-1024) are used to represent the exponent.
In the IEEE representation, the exponent is in excess-127 (excess-1023) notation. The actual exponents
represented are
$-126 \le E \le 127$ for single precision and $-1022 \le E \le 1023$ for double precision.
In the single precision case, underflow occurs if a normalized representation requires an exponent less
than -126, and overflow occurs if it requires an exponent greater than 127. This is because IEEE uses the
exponents -127 and 128 (and -1023 and 1024 in double precision), that is, the stored exponent values 0
and 255 (0 and 2047), to represent special conditions:
- Exact zero
- Infinity
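The single precision fields can be inspected with a short sketch that uses Python's standard `struct` module to obtain the 32-bit pattern. The function names are hypothetical, and `normalized_value` applies only to normalized numbers (stored exponent between 1 and 254).

```python
# Sketch: split an IEEE single precision number into sign, exponent and mantissa fields.
import struct

def decode_single(value):
    """Return (sign, stored exponent, stored mantissa) of the 32-bit encoding."""
    bits = struct.unpack(">I", struct.pack(">f", value))[0]
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF     # 8-bit excess-127 exponent
    mantissa = bits & 0x7FFFFF         # 23 stored mantissa bits (hidden 1 not stored)
    return sign, exponent, mantissa

def normalized_value(sign, exponent, mantissa):
    """Value of a normalized number: (-1)^S x 1.M x 2^(E' - 127)."""
    return (-1) ** sign * (1 + mantissa / 2 ** 23) * 2.0 ** (exponent - 127)

print(decode_single(-6.5))                     # (1, 129, 5242880)
print(normalized_value(*decode_single(-6.5)))  # -6.5
print(decode_single(0.0))                      # (0, 0, 0): exact zero
print(decode_single(float("inf")))             # (0, 255, 0): infinity
```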