Computer Organisation

UNIT-1

Computer Types: A digital computer can be defined as a fast electronic calculating machine that accepts digitized input information, processes it according to a list of internally stored instructions, and produces the resulting output information.

Commonly used computer types are:
1. Personal Computers
2. Notebook Computers
3. Workstations
4. Enterprise Systems & Servers
5. Supercomputers

PERSONAL COMPUTERS:
1. They are the most commonly used computers.
2. They are used in homes, schools and business offices.
3. They are desktop computers. Desktop computers have processing and storage units, visual display and audio output units, and a keyboard, all of which can be placed easily on an office desk.

NOTEBOOK COMPUTERS:
- They are personal computers in which all components, such as the processing unit and storage unit, are packaged into a single unit the size of a thin briefcase.
- They are also called laptop computers.

WORKSTATIONS:
- They have more computational power than a personal computer.
- They have some features of a PC, but they have high-resolution graphics input/output capability.
- They are generally used in engineering applications, especially for interactive design work.
- They are used in design work such as animation and video editing.
- Example: Ultra 60 Workstation from Sun Microsystems

ENTERPRISE SYSTEMS:
- They are also called mainframes.
- They are generally used for business data processing in medium to large corporations that require more computing power and storage capacity than workstations provide.
- Example: IBM S/390
- Servers contain sizable storage units.
- They are capable of handling a large number of requests to access the data.
- They are widely used by the education, business and personal user communities.
- All the requests and responses are usually transported over communication facilities.

SUPER COMPUTERS:
- They are used for the large-scale numerical operations required in applications such as weather forecasting, aircraft design and simulation.
- They are the most powerful computers, they are large in size, and they are used to process huge amounts of data.
- They are also used by nuclear scientists to analyze nuclear fission and nuclear fusion.
- Example: Cray T90

FUNCTIONAL UNITS OF A COMPUTER:

The functional units of a computer are:
- Arithmetic and Logical Unit
- Control Unit
- Memory Unit
- Output Unit
- Input Unit

Arithmetic and Logical Unit:
- It is the unit of the system where most computer operations are carried out.
- To perform an arithmetic or logical operation, the required operands are first fetched from memory into the processor. The ALU then carries out the operation, and the result may be stored in memory or retained in the processor for immediate use.
- When operands are brought into the processor, they are stored in high-speed storage elements called registers.
- Each register can store one word of data.
- Registers can be accessed faster than memory words.

Control Unit:
- It is the unit that coordinates the operations of the memory unit, ALU, input unit and output unit.
- It is the nerve center of the system: it sends control signals to the other units and senses their states.
- Timing signals are signals that determine when a given action has to take place.
- Data transfer between processor and memory is controlled by the control unit through timing signals.
- We can think of the CU as a well-defined, physically separate unit that interacts with the other units, but in practice most of the control circuitry is physically distributed throughout the machine.

Memory Unit:
- The function of the memory unit is to store programs and data.
- Memory is organized in the form of memory words. Each memory word has a unique address.
- Memory words are sequentially addressed starting with zero.
- Memory words can be accessed sequentially or randomly.
- In sequential access memory, memory words are accessed one by one in order. In this type of memory, word 4 can be accessed only after words 0, 1, 2 and 3 have been accessed.
- In random access memory, memory words can be accessed in any order, and the access time is the same for every memory word.

Output Unit:
- The main purpose of the output unit is to send the processed results to the external world.
- Commonly used output devices are monitors and printers.
- There are 2 types of printers: impact printers and non-impact printers.
- Impact printers create images by a pressing operation. Example: typewriter, dot matrix printer
- Non-impact printers create images by methods such as spraying and photocopying. Example: ink jet printers, laser printers
- Dot matrix printers create the image by using a mechanical device called a print head.
- Ink jet printers use streams of ink for printing.

Input Unit:
- Input units are the devices used for accepting input information, such as programs and data, from the external world or the user.
- Commonly used input devices are the keyboard, mouse and joystick.
- Some devices function as both input unit and output unit. For example, a touch screen is used as both an input device and an output device, so we refer to it as an I/O unit.

BASIC OPERATIONAL STEPS:

The following are the operating steps for the execution of a program

1. Program is stored in memory through input unit

2. Program Counter(PC) will point to first instruction of a program at the starting of execution of the program

3. Contents of PC are transferred to MAR and a read control signal is sent to memory

4. After the memory access time has elapsed, the addressed word is read out of memory and loaded into MDR

5. Next the contents of MDR are transferred to IR, and the instruction is now ready to be executed

6. If the instruction involves any operation to be performed by the ALU, it is necessary to obtain the required operands

7. If an operand resides in memory, it has to be fetched by sending its address to MAR and initiating a read cycle

8. When the operand has been read from memory into MDR, it is transferred from MDR to the ALU

9. After fetching one or more operands in this way, the ALU can perform the desired operation

10. If the result of this operation is to be stored in memory, the result is sent to MDR

11. The address of the location where the result is to be stored is sent to MAR and a write cycle is initiated

12. At some point during the execution of the current instruction, the contents of PC are incremented so that PC contains the address of the next instruction to be executed

13. As soon as the execution of the current instruction is completed, a new instruction fetch may be started.

DATA REPRESENTATION

Information that a computer is dealing with:

* Data
  - Numeric data: numbers (integer, real)
  - Non-numeric data: letters, symbols
* Relationship between data elements
  - Data structures: linear lists, trees, rings, etc.
* Program (instruction)

NUMERIC DATA REPRESENTATION

Number System
- Non-positional number system: Roman number system
- Positional number system
  - Each digit position has a value called a weight associated with it
  - Decimal, Octal, Hexadecimal, Binary

Base (or radix) R number
- Uses R distinct symbols for each digit
- Example: $A_R = a_{n-1} a_{n-2} \ldots a_1 a_0 . a_{-1} \ldots a_{-m}$
- Value: $V(A_R) = \sum_{i=-m}^{n-1} a_i R^i$
- The radix point (.) separates the integer portion and the fractional portion
- R = 10: decimal number system; R = 2: binary; R = 8: octal; R = 16: hexadecimal

REPRESENTATION OF NUMBERS - POSITIONAL NUMBERS

Decimal  Binary  Octal  Hexadecimal
  00      0000    00        0
  01      0001    01        1
  02      0010    02        2
  03      0011    03        3
  04      0100    04        4
  05      0101    05        5
  06      0110    06        6
  07      0111    07        7
  08      1000    10        8
  09      1001    11        9
  10      1010    12        A
  11      1011    13        B
  12      1100    14        C
  13      1101    15        D
  14      1110    16        E
  15      1111    17        F

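Binary, octal, and hexadecimal conversion

Group the bits in threes (octal) or in fours (hexadecimal), starting from the radix point:

Octal:   1   2   7   5   4   3
Binary:  1 010 111 101 100 011  =  1010 1111 0110 0011
Hexa:                                 A    F    6    3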

Data Types

CONVERSION OF BASES

Base R to Decimal Conversion

$A = a_{n-1} a_{n-2} a_{n-3} \ldots a_0 . a_{-1} \ldots a_{-m}$

$V(A) = \sum_{k=-m}^{n-1} a_k R^k$

$(736.4)_8 = 7 \times 8^2 + 3 \times 8^1 + 6 \times 8^0 + 4 \times 8^{-1} = 7 \times 64 + 3 \times 8 + 6 \times 1 + 4/8 = (478.5)_{10}$
$(110110)_2 = \ldots = (54)_{10}$
$(110.111)_2 = \ldots = (6.875)_{10}$
$(F3)_{16} = \ldots = (243)_{10}$
$(0.325)_6 = \ldots = (0.578703703\ldots)_{10}$

Decimal to Base R number

- Separate the number into its integer and fraction parts and convert each part separately.
- Convert the integer part into the base R number → successive divisions by R and accumulation of the remainders.
- Convert the fraction part into the base R number → successive multiplications by R and accumulation of the integer digits.


EXAMPLE

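Convert $(41.6875)_{10}$ to base 2.

Integer = 41 (successive divisions by 2, remainders read from bottom to top)

  41 / 2 = 20   remainder 1
  20 / 2 = 10   remainder 0
  10 / 2 =  5   remainder 0
   5 / 2 =  2   remainder 1
   2 / 2 =  1   remainder 0
   1 / 2 =  0   remainder 1

Fraction = 0.6875 (successive multiplications by 2, integer digits read from top to bottom)

  0.6875 x 2 = 1.3750   integer digit 1
  0.3750 x 2 = 0.7500   integer digit 0
  0.7500 x 2 = 1.5000   integer digit 1
  0.5000 x 2 = 1.0000   integer digit 1

$(41)_{10} = (101001)_2$
$(0.6875)_{10} = (0.1011)_2$

$(41.6875)_{10} = (101001.1011)_2$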

Exercise

- Convert $(63)_{10}$ to base 5: $(223)_5$
- Convert $(1863)_{10}$ to base 8: $(3507)_8$
- Convert $(0.63671875)_{10}$ to hexadecimal: $(0.A3)_{16}$
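The two conversion directions can be checked with a short Python sketch. This is only an illustration, not part of the original notes: the function names `to_decimal` and `from_decimal`, the `DIGITS` table and the `max_fraction_digits` cut-off are assumptions made for the example, and exact fractions are used so that no rounding error creeps in.

```python
from fractions import Fraction

DIGITS = "0123456789ABCDEF"  # digit symbols for radices up to 16 (assumed limit)


def to_decimal(number: str, radix: int) -> Fraction:
    """Evaluate V(A) = sum of a_k * R**k for a digit string such as '736.4'."""
    integer_part, _, fraction_part = number.upper().partition(".")
    value = Fraction(0)
    # Integer digits carry weights R**0, R**1, ... from right to left.
    for power, digit in enumerate(reversed(integer_part)):
        value += DIGITS.index(digit) * Fraction(radix) ** power
    # Fraction digits carry weights R**-1, R**-2, ... from left to right.
    for power, digit in enumerate(fraction_part, start=1):
        value += Fraction(DIGITS.index(digit), radix ** power)
    return value


def from_decimal(number: str, radix: int, max_fraction_digits: int = 12) -> str:
    """Convert a non-negative decimal string such as '41.6875' to base `radix`."""
    integer, fraction = divmod(Fraction(number), 1)
    # Integer part: successive divisions by R, remainders accumulated right to left.
    integer_digits = ""
    while integer > 0:
        integer, remainder = divmod(integer, radix)
        integer_digits = DIGITS[remainder] + integer_digits
    # Fraction part: successive multiplications by R, integer digits accumulated.
    fraction_digits = ""
    while fraction and len(fraction_digits) < max_fraction_digits:
        fraction *= radix
        digit = int(fraction)
        fraction_digits += DIGITS[digit]
        fraction -= digit
    result = integer_digits or "0"
    return result + "." + fraction_digits if fraction_digits else result


print(float(to_decimal("736.4", 8)))
print(from_decimal("41.6875", 2))
```

Fractions that do not terminate in the target base are simply cut off after `max_fraction_digits` digits, which is a choice made for this sketch.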


COMPLEMENT OF NUMBERS

Two types of complements for the base R number system:
- R's complement and (R-1)'s complement

The (R-1)'s Complement
Subtract each digit of a number from (R-1).

Example
- 9's complement of $(835)_{10}$ is $(164)_{10}$
- 1's complement of $(1010)_2$ is $(0101)_2$ (bit-by-bit complement operation)

The R's Complement
Add 1 to the low-order digit of its (R-1)'s complement.

Example
- 10's complement of $(835)_{10}$ is $(164)_{10} + 1 = (165)_{10}$
- 2's complement of $(1010)_2$ is $(0101)_2 + 1 = (0110)_2$
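The two complement rules can be tried out with a small Python sketch. The helper names `r_minus_1_complement` and `r_complement` are hypothetical, and the sketch assumes digit strings with a radix of at most 16.

```python
SYMBOLS = "0123456789ABCDEF"  # assumed digit alphabet, radix <= 16


def r_minus_1_complement(digits: str, radix: int) -> str:
    """Subtract every digit from (R-1)."""
    return "".join(SYMBOLS[radix - 1 - SYMBOLS.index(d)] for d in digits.upper())


def r_complement(digits: str, radix: int) -> str:
    """Add 1 to the low-order digit of the (R-1)'s complement, keeping the width."""
    value = int(r_minus_1_complement(digits, radix), radix) + 1
    value %= radix ** len(digits)  # a carry out of the top digit is dropped
    out = ""
    for _ in digits:
        value, remainder = divmod(value, radix)
        out = SYMBOLS[remainder] + out
    return out


print(r_minus_1_complement("835", 10), r_complement("835", 10))
print(r_minus_1_complement("1010", 2), r_complement("1010", 2))
```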

FIXED POINT NUMBERS

Numbers: fixed-point numbers and floating-point numbers

Binary Fixed-Point Representation

$X = x_n x_{n-1} x_{n-2} \ldots x_1 x_0 . x_{-1} x_{-2} \ldots x_{-m}$

- Sign bit ($x_n$): 0 for positive and 1 for negative
- Remaining bits: $x_{n-1} x_{n-2} \ldots x_1 x_0 . x_{-1} x_{-2} \ldots x_{-m}$

SIGNED NUMBERS

We need to be able to represent both positive and negative numbers. The following 3 representations are used:
- Signed magnitude representation
- Signed 1's complement representation
- Signed 2's complement representation

Example: Represent +9 and -9 as 7-bit binary numbers

Only one way to represent +9: 0 001001
Three different ways to represent -9:
- In signed magnitude: 1 001001
- In signed 1's complement: 1 110110
- In signed 2's complement: 1 110111

In general, in computers, fixed-point numbers are represented either as an integer part only or as a fractional part only.
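The three encodings of -9 above can be reproduced with a minimal Python sketch. The function names are made up for this illustration, and the sketch assumes the value fits in the chosen number of bits (no range checking is done).

```python
def signed_magnitude(value: int, bits: int) -> str:
    """Sign bit followed by the magnitude in (bits - 1) bits."""
    sign = "1" if value < 0 else "0"
    return sign + format(abs(value), f"0{bits - 1}b")


def signed_ones_complement(value: int, bits: int) -> str:
    """Positive numbers as-is; negative numbers complement every bit of +|value|."""
    positive = format(abs(value), f"0{bits}b")
    if value >= 0:
        return positive
    return "".join("1" if b == "0" else "0" for b in positive)


def signed_twos_complement(value: int, bits: int) -> str:
    """Negative numbers are represented as 2**bits - |value|."""
    return format(value % (1 << bits), f"0{bits}b")


for encode in (signed_magnitude, signed_ones_complement, signed_twos_complement):
    print(encode.__name__, encode(+9, 7), encode(-9, 7))
```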

CHARACTERISTICS OF 3 DIFFERENT REPRESENTATIONS

Complement
- Signed magnitude: complement only the sign bit
- Signed 1's complement: complement all the bits, including the sign bit
- Signed 2's complement: take the 2's complement of the number, including its sign bit

Maximum and minimum representable numbers and representation of zero

$X = x_n x_{n-1} \ldots x_0 . x_{-1} \ldots x_{-m}$

Signed Magnitude
  Max:  $2^n - 2^{-m}$       011 ... 11.11 ... 1
  Min:  $-(2^n - 2^{-m})$    111 ... 11.11 ... 1
  Zero: +0                   000 ... 00.00 ... 0
        -0                   100 ... 00.00 ... 0

Signed 1's Complement
  Max:  $2^n - 2^{-m}$       011 ... 11.11 ... 1
  Min:  $-(2^n - 2^{-m})$    100 ... 00.00 ... 0
  Zero: +0                   000 ... 00.00 ... 0
        -0                   111 ... 11.11 ... 1

Signed 2's Complement
  Max:  $2^n - 2^{-m}$       011 ... 11.11 ... 1
  Min:  $-2^n$               100 ... 00.00 ... 0
  Zero:  0                   000 ... 00.00 ... 0

2’s COMPLEMENT REPRESENTATION WEIGHTS

Signed 2's complement representation follows a "weight" scheme similar to that of unsigned numbers:
- The sign bit has a negative weight
- The other bits have regular weights

$X = x_n x_{n-1} \ldots x_0$

$V(X) = -x_n 2^n + \sum_{i=0}^{n-1} x_i 2^i$

ARITHMETIC ADDITION: SIGNED MAGNITUDE

[1] Compare their signs
[2] If the two signs are the same, ADD the two magnitudes
    - Look out for an overflow
[3] If not the same, compare the relative magnitudes of the numbers and then SUBTRACT the smaller from the larger
    --> need a subtractor to add

Fixed Point Representations

ARITHMETIC ADDITION: SIGNED 2’s COMPLEMENT

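Add the two numbers, including their sign bits, and discard any carry out of the leftmost (sign) bit.
- Look out for an overflow

Example (5-bit signed 2's complement)

       6   0 0110            -6   1 1010
  +)   9   0 1001       +)    9   0 1001
      15   0 1111             3   0 0011   (carry out of the sign bit discarded)

       6   0 0110            -9   1 0111
  +)  -9   1 0111       +)   -9   1 0111
      -3   1 1101           -18  (1)0 1110

Overflow

       9   0 1001
  +)   9   0 1001
      18   1 0010

An overflow occurs when the 2 operands have the same sign and the result sign changes:

$x_{n-1} y_{n-1} s'_{n-1} + x'_{n-1} y'_{n-1} s_{n-1} = c_{n-1} \oplus c_n$

- Both operands positive, result negative: $x'_{n-1} y'_{n-1} s_{n-1}$, detected by $(c_{n-1} \oplus c_n)$
- Both operands negative, result positive: $x_{n-1} y_{n-1} s'_{n-1}$, detected by $(c_{n-1} \oplus c_n)$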
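The addition rule and the overflow test $c_{n-1} \oplus c_n$ can be sketched in Python as follows. The function name `add_twos_complement` and its return shape are assumptions made for this illustration. It reports the raw result bits, the signed value they encode, and the overflow flag.

```python
def add_twos_complement(x: int, y: int, bits: int):
    """Add two signed integers as `bits`-wide 2's complement words.

    Returns (result bit string, signed value of the result, overflow flag).
    """
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    raw = (x & mask) + (y & mask)
    result = raw & mask                      # discard the carry out of the sign bit
    # Carry into the sign position comes from adding the lower (bits - 1) bits.
    carry_in = ((x & (sign - 1)) + (y & (sign - 1))) >> (bits - 1)
    carry_out = raw >> bits
    overflow = bool(carry_in ^ carry_out)    # c_{n-1} XOR c_n
    value = result - (1 << bits) if result & sign else result
    return format(result, f"0{bits}b"), value, overflow


for a, b in [(6, 9), (-6, 9), (6, -9), (-9, -9), (9, 9)]:
    print(a, b, add_twos_complement(a, b, 5))
```

With 5 bits the representable range is -16 to +15, so the last two calls report an overflow.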


Signed-magnitude addition examples (4-bit magnitudes; the sign bit is prepended to the result)

6 + 9                            -6 + 9
       6   0110                         9   1001
  +)   9   1001                   -)    6   0110
      15   1111  ->  0 1111             3   0011  ->  0 0011

6 + (-9)                         -6 + (-9)
       9   1001                         6   0110
  -)   6   0110                   +)    9   1001
      -3   0011  ->  1 0011           -15   1111  ->  1 1111

Overflow: 9 + 9 or (-9) + (-9)

       9   1001
  +)   9   1001
        (1)0010   Overflow

ARITHMETIC ADDITION: SIGNED 1's COMPLEMENT

Example (5-bit signed 1's complement)

       6   0 0110
  +)  -9   1 0110
      -3   1 1100          not overflow: $(c_{n-1} \oplus c_n) = 0$

End-around carry

      -6   1 1001
  +)   9   0 1001
        (1)0 0010
  +)           1
       3   0 0011

Overflow: $(c_{n-1} \oplus c_n) = 1$

      -9   1 0110                 9   0 1001
  +)  -9   1 0110            +)   9   0 1001
        (1)0 1100                     1 0010
  +)           1
           0 1101
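The end-around carry can be illustrated with a short Python sketch. The names `encode_ones_complement` and `add_ones_complement` are hypothetical. Overflow is flagged here with the sign rule (operands of the same sign, result of the opposite sign), which is a choice made for this sketch.

```python
def encode_ones_complement(value: int, bits: int) -> int:
    """Bit pattern of `value` in `bits`-wide signed 1's complement."""
    mask = (1 << bits) - 1
    return value & mask if value >= 0 else ~(-value) & mask


def add_ones_complement(x: int, y: int, bits: int):
    """Return (result bit string, overflow flag) for x + y in 1's complement."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    a = encode_ones_complement(x, bits)
    b = encode_ones_complement(y, bits)
    total = a + b
    if total > mask:                 # carry out of the sign bit
        total = (total & mask) + 1   # end-around carry: add it back at the low end
    total &= mask
    overflow = (a & sign) == (b & sign) and (total & sign) != (a & sign)
    return format(total, f"0{bits}b"), overflow


for a, b in [(6, -9), (-6, 9), (-9, -9), (9, 9)]:
    print(a, b, add_ones_complement(a, b, 5))
```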

COMPARISON OF REPRESENTATIONS

* Easiness of negative conversion
  S + M > 1's Complement > 2's Complement

* Hardware
  - S + M: needs an adder and a subtractor for addition
  - 1's and 2's Complement: need only an adder

* Speed of arithmetic
  2's Complement > 1's Complement (end-around carry)

* Recognition of zero
  2's Complement is fast


Arithmetic Subtraction in 2's complement
Take the 2's complement of the subtrahend (including the sign bit) and add it to the minuend (including the sign bit).

( ± A ) - ( - B ) = ( ± A ) + B
( ± A ) - B = ( ± A ) + ( - B )

Arithmetic Addition in signed 1's complement
Add the two numbers, including their sign bits.
- If there is a carry out of the most significant (sign) bit, the result is incremented by 1 and the carry is discarded (end-around carry).

FLOATING POINT NUMBER REPRESENTATION

* The location of the fractional point is not fixed to a certain location
* The range of the representable numbers is wide

F = EM

  $m_n$    $e_k e_{k-1} \ldots e_0$    $m_{n-1} m_{n-2} \ldots m_0 . m_{-1} \ldots m_{-m}$
  sign     exponent                    mantissa

- Mantissa: signed fixed-point number, either an integer or a fractional number
- Exponent: designates the position of the radix point

Decimal value: $V(F) = V(M) \times R^{V(E)}$

M: Mantissa
E: Exponent
R: Radix

Floating Point Representation

CHARACTERISTICS OF FLOATING POINT NUMBER REPRESENTATIONS

Normal Form
- There are many different floating point number representations of the same number → need for a unified representation in a given computer
- The most significant position of the mantissa contains a non-zero digit

Representation of Zero
- Zero: Mantissa = 0
- Real zero: Mantissa = 0 and Exponent = the smallest representable number, which is represented as 00 ... 0. This is easily identified by the hardware.


FLOATING POINT NUMBERS

Example

  sign   mantissa    sign   exponent
   0     .1234567     0       04        ==>  $+.1234567 \times 10^{+04}$

Note: In floating point number representation, only the Mantissa (M) and the Exponent (E) are explicitly represented. The Radix (R) and the position of the radix point are implied.

Example
A binary number +1001.11 in 16-bit floating point number representation (6-bit exponent and 10-bit fractional mantissa)

  Sign   Exponent   Mantissa
   0     0 00100    100111000
or
   0     0 00101    010011100

OTHER DECIMAL CODES

Decimal  BCD(8421)  2421   84-2-1  Excess-3
   0       0000     0000    0000     0011
   1       0001     0001    0111     0100
   2       0010     0010    0110     0101
   3       0011     0011    0101     0110
   4       0100     0100    0100     0111
   5       0101     1011    1011     1000
   6       0110     1100    1010     1001
   7       0111     1101    1001     1010
   8       1000     1110    1000     1011
   9       1001     1111    1111     1100

d3 d2 d1 d0: symbol in the codes

- BCD (8421 code): d3 x 8 + d2 x 4 + d1 x 2 + d0 x 1
- 2421: d3 x 2 + d2 x 4 + d1 x 2 + d0 x 1
- 84-2-1: d3 x 8 + d2 x 4 + d1 x (-2) + d0 x (-1)
- Excess-3: BCD + 3

Note: 8, 4, 2, -2, 1, -1 in this table are the weights associated with each bit position.

BCD: It is difficult to obtain the 9's complement. However, it is easily obtained with the other codes listed above → self-complementing codes

External Representations

GRAY CODE - ANALYSIS

Let $g_n g_{n-1} \ldots g_1 g_0$ be the (n+1)-bit Gray code for the binary number $b_n b_{n-1} \ldots b_1 b_0$. Then

$g_i = b_i \oplus b_{i+1}, \quad 0 \le i \le n-1$
$g_n = b_n$

and

$b_{n-i} = g_n \oplus g_{n-1} \oplus \ldots \oplus g_{n-i}$
$b_n = g_n$

Note: The Gray code has a reflection property
- easy to construct a table without calculation
- for any n: reflect case n-1 about a mirror at its bottom and prefix 0 and 1 to the top and bottom halves, respectively

Reflection of Gray codes

  1-bit   2-bit   3-bit   4-bit
    0      00      000     0000
    1      01      001     0001
           11      011     0011
           10      010     0010
                   110     0110
                   111     0111
                   101     0101
                   100     0100
                           1100
                           1101
                           1111
                           1110
                           1010
                           1011
                           1001
                           1000

Other Binary codes

GRAY CODE

* Characterized by having their representations of the binary integers differ in only one digit between consecutive integers
* Useful in some applications

4-bit Gray codes

Decimal     Gray           Binary
number   g3 g2 g1 g0    b3 b2 b1 b0
   0      0  0  0  0     0  0  0  0
   1      0  0  0  1     0  0  0  1
   2      0  0  1  1     0  0  1  0
   3      0  0  1  0     0  0  1  1
   4      0  1  1  0     0  1  0  0
   5      0  1  1  1     0  1  0  1
   6      0  1  0  1     0  1  1  0
   7      0  1  0  0     0  1  1  1
   8      1  1  0  0     1  0  0  0
   9      1  1  0  1     1  0  0  1
  10      1  1  1  1     1  0  1  0
  11      1  1  1  0     1  0  1  1
  12      1  0  1  0     1  1  0  0
  13      1  0  1  1     1  1  0  1
  14      1  0  0  1     1  1  1  0
  15      1  0  0  0     1  1  1  1
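The relations $g_i = b_i \oplus b_{i+1}$ and $b_{n-i} = g_n \oplus \ldots \oplus g_{n-i}$ translate directly into bit operations. The following Python sketch uses hypothetical function names and regenerates the 4-bit table above.

```python
def binary_to_gray(b: int) -> int:
    """g_i = b_i XOR b_{i+1}: XOR the word with itself shifted right by one."""
    return b ^ (b >> 1)


def gray_to_binary(g: int) -> int:
    """b_{n-i} = g_n XOR g_{n-1} XOR ... XOR g_{n-i}: a running XOR from the top bit."""
    b = 0
    while g:
        b ^= g
        g >>= 1
    return b


for n in range(16):
    g = binary_to_gray(n)
    print(f"{n:2d}  {g:04b}  {n:04b}")
    assert gray_to_binary(g) == n
```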

CHARACTER REPRESENTATION ASCII

ASCII (American Standard Code for Information Interchange) Code


MSB (3 bits)

ERROR DETECTING CODES

Parity System
- Simplest method for error detection
- One parity bit attached to the information
- Even parity and odd parity

Even Parity
- One bit is attached to the information so that the total number of 1 bits is an even number

  1011001 0
  1010010 1

Odd Parity
- One bit is attached to the information so that the total number of 1 bits is an odd number

  1011001 1
  1010010 0

Error Detecting codes

PARITY BIT GENERATION

For $b_6 b_5 \ldots b_0$ (7-bit information), the even parity bit $b_{even}$ is

$b_{even} = b_6 \oplus b_5 \oplus \ldots \oplus b_0$

For the odd parity bit:

$b_{odd} = b_{even} \oplus 1 = b'_{even}$
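The parity equations can be checked against the examples of the Parity System section with a few lines of Python. The function names are assumptions for this sketch. The checker returns 1 when the received word, parity bit included, has an odd number of 1 bits, that is, when an error is indicated.

```python
def even_parity_bit(word: str) -> int:
    """b_even = b6 XOR b5 XOR ... XOR b0 for a bit string such as '1011001'."""
    bit = 0
    for ch in word:
        bit ^= int(ch)
    return bit


def odd_parity_bit(word: str) -> int:
    """b_odd = b_even XOR 1."""
    return even_parity_bit(word) ^ 1


def even_parity_error(word_with_parity: str) -> int:
    """Error indicator for even parity: XOR of all received bits, parity included."""
    return even_parity_bit(word_with_parity)


print(even_parity_bit("1011001"), even_parity_bit("1010010"))
print(odd_parity_bit("1011001"), odd_parity_bit("1010010"))
print(even_parity_error("10110010"), even_parity_error("10110011"))
```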

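ASCII code table (rows: LSB, 4 bits; columns: MSB, 3 bits)

LSB \ MSB   0     1     2     3     4     5     6     7
   0       NUL   DLE   SP    0     @     P     `     p
   1       SOH   DC1   !     1     A     Q     a     q
   2       STX   DC2   "     2     B     R     b     r
   3       ETX   DC3   #     3     C     S     c     s
   4       EOT   DC4   $     4     D     T     d     t
   5       ENQ   NAK   %     5     E     U     e     u
   6       ACK   SYN   &     6     F     V     f     v
   7       BEL   ETB   '     7     G     W     g     w
   8       BS    CAN   (     8     H     X     h     x
   9       HT    EM    )     9     I     Y     i     y
   A       LF    SUB   *     :     J     Z     j     z
   B       VT    ESC   +     ;     K     [     k     {
   C       FF    FS    ,     <     L     \     l     |
   D       CR    GS    -     =     M     ]     m     }
   E       SO    RS    .     >     N     ^     n     ~
   F       SI    US    /     ?     O     _     o     DEL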

CONTROL CHARACTER REPRESENTATION (ASCII)

NUL  Null                          DC1  Device Control 1
SOH  Start of Heading (CC)         DC2  Device Control 2
STX  Start of Text (CC)            DC3  Device Control 3
ETX  End of Text (CC)              DC4  Device Control 4
EOT  End of Transmission (CC)      NAK  Negative Acknowledge (CC)
ENQ  Enquiry (CC)                  SYN  Synchronous Idle (CC)
ACK  Acknowledge (CC)              ETB  End of Transmission Block (CC)
BEL  Bell                          CAN  Cancel
BS   Backspace (FE)                EM   End of Medium
HT   Horizontal Tab (FE)           SUB  Substitute
LF   Line Feed (FE)                ESC  Escape
VT   Vertical Tab (FE)             FS   File Separator (IS)
FF   Form Feed (FE)                GS   Group Separator (IS)
CR   Carriage Return (FE)          RS   Record Separator (IS)
SO   Shift Out                     US   Unit Separator (IS)
SI   Shift In                      DEL  Delete
DLE  Data Link Escape (CC)

(CC) Communication Control
(FE) Format Effector
(IS) Information Separator


PARITY GENERATOR AND PARITY CHECKER

[Figure: Parity generator circuit (even parity). Inputs b6, b5, b4, b3, b2, b1, b0; output b_even.]

[Figure: Parity checker. Inputs b6, b5, b4, b3, b2, b1, b0 and b_even; output: even parity error indicator.]

Unit-2

REGISTER TRANSFER AND MICROOPERATIONS

• Register Transfer Language

• Register Transfer

• Bus and Memory Transfers

• Arithmetic Microoperations

• Logic Microoperations

• Shift Microoperations

• Arithmetic Logic Shift Unit

MICROOPERATIONS (1)

Register Transfer Language

The operations on the data in registers are called microoperations.
The functions built into registers are examples of microoperations:
- Shift
- Load
- Clear
- Increment
- …

ORGANIZATION OF A DIGITAL SYSTEM

Definition of the (internal) organization of a computer:

- Set of registers and their functions
- Microoperations set: the set of allowable microoperations provided by the organization of the computer
- Control signals that initiate the sequence of microoperations (to perform the functions)


REGISTER TRANSFER LANGUAGE


Rather than specifying a digital system in words, a specific notation is used: the register transfer language.
For any function of the computer, the register transfer language can be used to describe the (sequence of) microoperations.

Register transfer language:
- A symbolic language
- A convenient tool for describing the internal organization of digital computers
- Can also be used to facilitate the design process of digital systems

DESIGNATION OF REGISTERS


Registers are designated by capital letters, sometimes followed by numbers (e.g., A, R13, IR).
Often the names indicate function:
- MAR: memory address register
- PC: program counter
- IR: instruction register

Registers and their contents can be viewed and represented in various ways.
- A register can be viewed as a single entity (for example, a box labelled MAR).
- Registers may also be represented showing the bits of data they contain.

Designation of a register
- a register
- a portion of a register
- a bit of a register

[Figure: Common ways of drawing the block diagram of a register. (a) Register R1 as a single box; (b) showing individual bits, numbered 7 6 5 4 3 2 1 0; (c) numbering of bits, register R2 with bits 15 to 0; (d) subfields, PC(H) for bits 15 to 8 and PC(L) for bits 7 to 0.]

REGISTER TRANSFER

• Copying the contents of one register to another is a register transfer
• A register transfer is indicated as

R2 ← R1

– In this case the contents of register R1 are copied (loaded) into register R2
– The contents of the source register R1 are not changed by the transfer

A register transfer such as

R3 ← R5

implies that the digital system has:
- the data lines from the source register (R5) to the destination register (R3)
- parallel load in the destination register (R3)
- control lines to perform the action

CONTROL FUNCTIONS

- Often actions need to occur only if a certain condition is true
- This is similar to an "if" statement in a programming language
- In digital systems, this is often done via a control signal, called a control function
- If the signal is 1, the action takes place
- This is represented as:

P: R2 ← R1

which means "if P = 1, then load the contents of register R1 into register R2", i.e., if (P = 1) then (R2 ← R1)

HARDWARE IMPLEMENTATION OF CONTROLLED TRANSFERS

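Implementation of controlled transfer

P: R2 ← R1

[Figure: Block diagram. A control circuit generates P, which drives the Load input of R2; R1 is connected to R2 over n lines; both are driven by the Clock.]

[Figure: Timing diagram. Clock and Load waveforms over the interval t to t+1, with the point marked "Transfer occurs here".]

- The same clock controls the circuits that generate the control function and the destination register
- Registers are assumed to use positive-edge-triggered flip-flops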

BASIC SYMBOLS FOR REGISTER TRANSFERS

Symbols                      Description                               Examples
Capital letters & numerals   Denotes a register                        MAR, R2
Parentheses ( )              Denotes a part of a register              R2(0-7), R2(L)
Arrow ←                      Denotes transfer of information           R2 ← R1
Colon :                      Denotes termination of control function   P:
Comma ,                      Separates two micro-operations            A ← B, B ← A

SIMULTANEOUS OPERATIONS

• If two or more operations are to occur simultaneously, they are separated with commas

P: R3 ← R5, MAR ← IR

• Here, if the control function P = 1, load the contents of R5 into R3, and at the same time (clock), load the contents of register IR into register MAR

BUS AND BUS TRANSFER

A bus is a path (a group of wires) over which information is transferred, from any of several sources to any of several destinations.

From a register to bus: BUS ← R

[Figure: Bus system for four 4-bit registers A, B, C and D. Four 4 x 1 multiplexers, one per bit position, each receive the corresponding bit of every register (for example B1, C1, D1) and drive one line of the 4-line bus; the select inputs x and y choose the source register.]

Bus and Memory Transfers

TRANSFER FROM BUS TO A DESTINATION REGISTER

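[Figure: Transfer from bus to a destination register. The bus lines feed registers R0, R1, R2 and R3; a 2 x 4 decoder with select inputs z and w and an enable input E (Load) produces D0, D1, D2 and D3, which load the selected register.]

Three-State Bus Buffers

[Figure: Three-state buffer with normal input A and control input C. Output Y = A if C = 1; high-impedance if C = 0.]

[Figure: Bus line with three-state buffers. Inputs A0, B0, C0 and D0 drive the bus line for bit 0 through four three-state buffers; a 2 x 4 decoder with select inputs S0 and S1 and an Enable input (outputs 0 to 3) turns on one buffer at a time.]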

BUS TRANSFER IN RTL

• Depending on whether the bus is to be mentioned explicitly or not, a register transfer can be indicated as either

R2 ← R1

or

BUS ← R1, R2 ← BUS

• In the former case the bus is implicit, but in the latter, it is explicitly indicated

SUMMARY OF REGISTER TRANSFER MICROOPERATIONS

A ← B                   Transfer content of reg. B into reg. A
AR ← DR(AD)             Transfer content of AD portion of reg. DR into reg. AR
A ← constant            Transfer a binary constant into reg. A
ABUS ← R1, R2 ← ABUS    Transfer content of R1 into bus A and, at the same time, transfer content of bus A into R2
AR                      Address register
DR                      Data register
M[R]                    Memory word specified by reg. R
M                       Equivalent to M[AR]
DR ← M                  Memory read operation: transfers content of memory word specified by AR into DR
M ← DR                  Memory write operation: transfers content of DR into memory word specified by AR

ARITHMETIC MICROOPERATIONS

• Computer system microoperations are of four types:
- Register transfer microoperations
- Arithmetic microoperations
- Logic microoperations
- Shift microoperations

The basic arithmetic microoperations are:
- Addition
- Subtraction
- Increment
- Decrement

The additional arithmetic microoperations are:
- Add with carry
- Subtract with borrow
- Transfer/Load
- etc.

Summary of Typical Arithmetic Micro-Operations

R3 ← R1 + R2          Contents of R1 plus R2 transferred to R3
R3 ← R1 - R2          Contents of R1 minus R2 transferred to R3
R2 ← R2'              Complement the contents of R2
R2 ← R2' + 1          2's complement the contents of R2 (negate)
R3 ← R1 + R2' + 1     Subtraction
R1 ← R1 + 1           Increment
R1 ← R1 - 1           Decrement

BINARY ADDER / SUBTRACTOR / INCREMENTER

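[Figure: Binary Adder. Four full adders (FA) in cascade with inputs A0 B0 to A3 B3, sum outputs S0 to S3, input carry C0, intermediate carries C1, C2, C3 and output carry C4.]

[Figure: Binary Adder-Subtractor. The same chain of four full adders with a mode input M; inputs A0 B0 to A3 B3, carries C0 to C4 and sum outputs S0 to S3.]

[Figure: Binary Incrementer. Four half adders (HA), each with inputs x, y and outputs C, S. The first stage receives A0 and the constant 1; each later stage receives Ai and the carry of the previous stage; outputs S0 to S3 and C4.]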
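The three circuits can be modelled at gate level in a few lines of Python. This is a sketch under an assumption the figures do not spell out in the text: that the adder-subtractor XORs each B input with the mode bit M and also feeds M in as the input carry C0, which is the usual arrangement. All function names are hypothetical, and bit lists are given least significant bit first.

```python
def full_adder(a: int, b: int, c_in: int):
    """One FA stage: returns (sum, carry out)."""
    s = a ^ b ^ c_in
    c_out = (a & b) | (c_in & (a ^ b))
    return s, c_out


def half_adder(x: int, y: int):
    """One HA stage: returns (sum, carry)."""
    return x ^ y, x & y


def adder_subtractor(a_bits, b_bits, m: int):
    """M = 0 gives A + B; M = 1 gives A + B' + 1 = A - B (assumed wiring)."""
    carry = m                      # C0 = M
    sums = []
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b ^ m, carry)
        sums.append(s)
    return sums, carry             # (S0..Sn-1, final carry)


def incrementer(a_bits):
    """Chain of half adders with a constant 1 entering the first stage."""
    carry = 1
    sums = []
    for a in a_bits:
        s, carry = half_adder(a, carry)
        sums.append(s)
    return sums, carry


def to_bits(value: int, width: int):
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits) -> int:
    return sum(bit << i for i, bit in enumerate(bits))


sums, c4 = adder_subtractor(to_bits(9, 4), to_bits(6, 4), 1)
print(from_bits(sums), c4)
```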


ARITHMETIC CIRCUIT

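[Figure: 4-bit arithmetic circuit. Four full adders (FA) with inputs X0 to X3 and Y0 to Y3, carries C0 to C4 and outputs D0 to D3, Cout. Each Xi is connected to Ai. Each Yi comes from a 4 x 1 multiplexer with select lines S1, S0 whose data inputs 0, 1, 2, 3 are Bi, Bi', 0 and 1. Cin feeds C0.]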

S1  S0  Cin   Y     Output              Microoperation
0   0   0     B     D = A + B           Add
0   0   1     B     D = A + B + 1       Add with carry
0   1   0     B'    D = A + B'          Subtract with borrow
0   1   1     B'    D = A + B' + 1      Subtract
1   0   0     0     D = A               Transfer A
1   0   1     0     D = A + 1           Increment A
1   1   0     1     D = A - 1           Decrement A
1   1   1     1     D = A               Transfer A
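The function table can be reproduced with a behavioural Python sketch: the multiplexers choose Y from B, B', all 0's or all 1's according to S1 S0, and the adder forms D = A + Y + Cin. The function name `arithmetic_circuit` and the default 4-bit width are assumptions for the example.

```python
def arithmetic_circuit(a: int, b: int, s1: int, s0: int, c_in: int, bits: int = 4) -> int:
    """Return D = A + Y + Cin (mod 2**bits), with Y selected by S1 S0."""
    mask = (1 << bits) - 1
    y = {
        (0, 0): b & mask,    # Y = B
        (0, 1): ~b & mask,   # Y = B'
        (1, 0): 0,           # Y = 0
        (1, 1): mask,        # Y = all 1's, which acts as -1
    }[(s1, s0)]
    return (a + y + c_in) & mask


A, B = 9, 3
for s1 in (0, 1):
    for s0 in (0, 1):
        for c_in in (0, 1):
            print(s1, s0, c_in, arithmetic_circuit(A, B, s1, s0, c_in))
```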


LOGIC MICROOPERATIONS

- Logic microoperations specify binary operations on the strings of bits in registers
- Logic microoperations are bit-wise operations, i.e., they work on the individual bits of data
- They are useful for bit manipulations on binary data
- They are useful for making logical decisions based on the bit value
- There are, in principle, 16 different logic functions that can be defined over two binary input variables

  A  B  |  F0  F1  F2  …  F13  F14  F15
  0  0  |  0   0   0   …   1    1    1
  0  1  |  0   0   0   …   1    1    1
  1  0  |  0   0   1   …   0    1    1
  1  1  |  0   1   0   …   1    0    1

- However, most systems only implement four of these: AND (∧), OR (∨), XOR (⊕), Complement/NOT
- The others can be created from combinations of these

LIST OF LOGIC MICROOPERATIONS

- 16 different logic operations with 2 binary variables
- n binary variables → $2^{2^n}$ functions

Truth tables for the 16 functions of 2 variables and the corresponding 16 logic microoperations

  x  0 0 1 1    Boolean           Micro-            Name
  y  0 1 0 1    Function          Operation
     0 0 0 0    F0 = 0            F ← 0             Clear
     0 0 0 1    F1 = xy           F ← A ∧ B         AND
     0 0 1 0    F2 = xy'          F ← A ∧ B'
     0 0 1 1    F3 = x            F ← A             Transfer A
     0 1 0 0    F4 = x'y          F ← A' ∧ B
     0 1 0 1    F5 = y            F ← B             Transfer B
     0 1 1 0    F6 = x ⊕ y        F ← A ⊕ B         Exclusive-OR
     0 1 1 1    F7 = x + y        F ← A ∨ B         OR
     1 0 0 0    F8 = (x + y)'     F ← (A ∨ B)'      NOR
     1 0 0 1    F9 = (x ⊕ y)'     F ← (A ⊕ B)'      Exclusive-NOR
     1 0 1 0    F10 = y'          F ← B'            Complement B
     1 0 1 1    F11 = x + y'      F ← A ∨ B'
     1 1 0 0    F12 = x'          F ← A'            Complement A
     1 1 0 1    F13 = x' + y      F ← A' ∨ B
     1 1 1 0    F14 = (xy)'       F ← (A ∧ B)'      NAND
     1 1 1 1    F15 = 1           F ← all 1's       Set to all 1's

HARDWARE IMPLEMENTATION OF LOGIC MICROOPERATIONS

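Function table

  S1  S0   Output       μ-operation
  0   0    F = A ∧ B    AND
  0   1    F = A ∨ B    OR
  1   0    F = A ⊕ B    XOR
  1   1    F = A'       Complement

[Figure: One stage of the logic circuit. Inputs Ai and Bi feed AND, OR, XOR and NOT gates whose outputs go to data inputs 0, 1, 2 and 3 of a 4 x 1 multiplexer; the select lines S1 and S0 choose the output Fi.]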

APPLICATIONS OF LOGIC MICROOPERATIONS


Logic microoperations can be used to manipulate individual bits or a portion of a word in a register.
Consider the data in a register A. In another register, B, is bit data that will be used to modify the contents of A.

- Selective-set          A ← A + B
- Selective-complement   A ← A ⊕ B
- Selective-clear        A ← A • B'
- Mask (Delete)          A ← A • B
- Clear                  A ← A ⊕ B
- Insert                 A ← (A • B) + C
- Compare                A ← A ⊕ B
- . . .

Here + denotes the logical OR and • the logical AND.

SELECTIVE SET
• In a selective set operation, the bit pattern in B is used to set certain bits in A

  1 1 0 0   A(t)
  1 0 1 0   B
  1 1 1 0   A(t+1)    (A ← A + B)

• If a bit in B is set to 1, that same position in A gets set to 1, otherwise that bit in A keeps its previous value

SELECTIVE COMPLEMENT
• In a selective complement operation, the bit pattern in B is used to complement certain bits in A

  1 1 0 0   A(t)
  1 0 1 0   B
  0 1 1 0   A(t+1)    (A ← A ⊕ B)

• If a bit in B is set to 1, that same position in A gets complemented from its original value, otherwise it is unchanged

SELECTIVE CLEAR
• In a selective clear operation, the bit pattern in B is used to clear certain bits in A

  1 1 0 0   A(t)
  1 0 1 0   B
  0 1 0 0   A(t+1)    (A ← A • B')

• If a bit in B is set to 1, that same position in A gets set to 0, otherwise it is unchanged

MASK OPERATION
• In a mask operation, the bit pattern in B is used to clear certain bits in A

  1 1 0 0   A(t)
  1 0 1 0   B
  1 0 0 0   A(t+1)    (A ← A • B)

• If a bit in B is set to 0, that same position in A gets set to 0, otherwise it is unchanged

CLEAR OPERATION
• In a clear operation, if the bits in the same position in A and B are the same, they are cleared in A, otherwise they are set in A

  1 1 0 0   A(t)
  1 0 1 0   B
  0 1 1 0   A(t+1)    (A ← A ⊕ B)

INSERT OPERATION
• An insert operation is used to introduce a specific bit pattern into the A register, leaving the other bit positions unchanged
• This is done as
  – A mask operation to clear the desired bit positions, followed by
  – An OR operation to introduce the new bits into the desired positions
  – Example
    » Suppose you wanted to introduce 1010 into the low-order four bits of A:

      1101 1000 1011 0001   A (Original)
      1101 1000 1011 1010   A (Desired)

    » 1101 1000 1011 0001   A (Original)
      1111 1111 1111 0000   Mask
      1101 1000 1011 0000   A (Intermediate)
      0000 0000 0000 1010   Added bits
      1101 1000 1011 1010   A (Desired)
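The operations above map one-to-one onto Python's bitwise operators, so the worked examples can be checked directly. The function names are made up for this sketch. The values used are the 4-bit patterns A = 1100 and B = 1010 and the 16-bit insert example.

```python
def selective_set(a: int, b: int) -> int:
    return a | b            # A <- A + B (OR)


def selective_complement(a: int, b: int) -> int:
    return a ^ b            # A <- A XOR B


def selective_clear(a: int, b: int) -> int:
    return a & ~b           # A <- A . B'


def mask(a: int, b: int) -> int:
    return a & b            # A <- A . B


def insert(a: int, mask_bits: int, new_bits: int) -> int:
    """Mask out the target positions, then OR in the new bits."""
    return (a & mask_bits) | new_bits


A, B = 0b1100, 0b1010
print(format(selective_set(A, B), "04b"))
print(format(selective_complement(A, B), "04b"))
print(format(selective_clear(A, B), "04b"))
print(format(mask(A, B), "04b"))
print(format(insert(0b1101100010110001, 0b1111111111110000, 0b1010), "016b"))
```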

SHIFT MICROOPERATIONS

There are three types of shifts:
- Logical shift
- Circular shift
- Arithmetic shift

What differentiates them is the information that goes into the serial input.

[Figure: A right shift operation, with the serial input entering at the left end of the register.]

[Figure: A left shift operation, with the serial input entering at the right end of the register.]

LOGICAL SHIFT

In a logical shift the serial input to the shift is a 0.

[Figure: A right logical shift operation (0 enters at the left).]

[Figure: A left logical shift operation (0 enters at the right).]

In a Register Transfer Language, the following notation is used:
- shl for a logical shift left
- shr for a logical shift right

Examples:
R2 ← shr R2
R3 ← shl R3

CIRCULAR SHIFT

In a circular shift the serial input is the bit that is shifted out of the other end of the register.

[Figure: A right circular shift operation.]

[Figure: A left circular shift operation.]

In an RTL, the following notation is used:
- cil for a circular shift left
- cir for a circular shift right

Examples:
R2 ← cir R2
R3 ← cil R3

ARITHMETIC SHIFT

- An arithmetic shift is meant for signed binary numbers (integers)
- An arithmetic left shift multiplies a signed number by two
- An arithmetic right shift divides a signed number by two
- The main distinction of an arithmetic shift is that it must keep the sign of the number the same as it performs the multiplication or division

[Figure: A right arithmetic shift operation (the sign bit is copied back into the sign position).]

[Figure: A left arithmetic shift operation (0 enters at the right).]

A left arithmetic shift operation must be checked for overflow.

[Figure: Overflow detection (V) on the leftmost two bits during a left arithmetic shift.]

Before the shift, if the leftmost two bits differ, the shift will result in an overflow.

In an RTL, the following notation is used:
- ashl for an arithmetic shift left
- ashr for an arithmetic shift right

Examples:
R2 ← ashr R2
R3 ← ashl R3
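The six shift microoperations can be sketched in Python on fixed-width words. The function names follow the RTL mnemonics, but the signatures are assumptions for this example: the register width is passed explicitly, and `ashl` also returns the overflow flag V.

```python
def shl(x: int, bits: int) -> int:
    """Logical shift left: 0 enters at the right, the leftmost bit is lost."""
    return (x << 1) & ((1 << bits) - 1)


def shr(x: int, bits: int) -> int:
    """Logical shift right: 0 enters at the left, the rightmost bit is lost."""
    return x >> 1


def cil(x: int, bits: int) -> int:
    """Circular shift left: the bit leaving the left end re-enters at the right."""
    return shl(x, bits) | (x >> (bits - 1))


def cir(x: int, bits: int) -> int:
    """Circular shift right: the bit leaving the right end re-enters at the left."""
    return (x >> 1) | ((x & 1) << (bits - 1))


def ashr(x: int, bits: int) -> int:
    """Arithmetic shift right: the sign bit is kept and copied to the right."""
    sign = x & (1 << (bits - 1))
    return (x >> 1) | sign


def ashl(x: int, bits: int):
    """Arithmetic shift left; V = 1 if the two leftmost bits differ before the shift."""
    overflow = ((x >> (bits - 1)) ^ (x >> (bits - 2))) & 1
    return shl(x, bits), bool(overflow)


print(format(cil(0b1001, 4), "04b"), format(cir(0b1001, 4), "04b"))
print(format(ashr(0b1010, 4), "04b"), ashl(0b0110, 4))
```

In 4-bit 2's complement, `ashr(0b1010, 4)` turns -6 into -3, and `ashl(0b0110, 4)` reports an overflow because 6 x 2 = 12 does not fit.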

HARDWARE IMPLEMENTATION OF SHIFT MICROOPERATIONS

[Figure: 4-bit combinational circuit shifter. Four 2 x 1 multiplexers with a common Select line S produce outputs H0, H1, H2 and H3 from the inputs A0, A1, A2, A3 and the serial inputs IR and IL. Select = 0 for shift right (down), 1 for shift left (up).]

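ARITHMETIC LOGIC SHIFT UNIT

[Figure: One stage of the arithmetic logic shift unit. One stage of the arithmetic circuit (inputs Ai, Bi, carry in Ci, carry out Ci+1, output Di) and one stage of the logic circuit (output Ei) feed data inputs 0 and 1 of a 4 x 1 multiplexer; inputs 2 and 3 receive Ai-1 (shr) and Ai+1 (shl). The select lines S3, S2 choose the output Fi, while S1, S0 select the operation within the arithmetic and logic circuits.]

S3  S2  S1  S0  Cin   Operation           Function
0   0   0   0   0     F = A               Transfer A
0   0   0   0   1     F = A + 1           Increment A
0   0   0   1   0     F = A + B           Addition
0   0   0   1   1     F = A + B + 1       Add with carry
0   0   1   0   0     F = A + B'          Subtract with borrow
0   0   1   0   1     F = A + B' + 1      Subtraction
0   0   1   1   0     F = A - 1           Decrement A
0   0   1   1   1     F = A               Transfer A
0   1   0   0   X     F = A ∧ B           AND
0   1   0   1   X     F = A ∨ B           OR
0   1   1   0   X     F = A ⊕ B           XOR
0   1   1   1   X     F = A'              Complement A
1   0   X   X   X     F = shr A           Shift right A into F
1   1   X   X   X     F = shl A           Shift left A into F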


Instruction Codes
• Every different processor type has its own design (different registers, buses, microoperations, machine instructions, etc.)
• A modern processor is a very complex device
• It contains
    – Many registers
    – Multiple arithmetic units, for both integer and floating point calculations
    – The ability to pipeline several consecutive instructions to speed execution
    – Etc.
• However, to understand how processors work, we will start with a simplified processor model
• This is similar to what real processors were like ~25 years ago
• M. Morris Mano introduces a simple processor model he calls the Basic Computer
• We will use this to introduce processor organization and the relationship of the RTL model to the higher level computer processor

• The Basic Computer has two components, a processor and memory
• The memory has 4096 words in it
    – 4096 = 2^12, so it takes 12 bits to select a word in memory
• Each word is 16 bits long
• Program
    – A sequence of (machine) instructions
• (Machine) Instruction
    – A group of bits that tell the computer to perform a specific operation (a sequence of microoperations)
• The instructions of a program, along with any needed data, are stored in memory
• The CPU reads the next instruction from memory
• It is placed in an Instruction Register (IR)
• Control circuitry in the control unit then translates the instruction into the sequence of microoperations necessary to implement it
• Since the memory words, and hence the instructions, are 16 bits long, with 12 bits used for the address and 1 bit for the addressing mode, that leaves 3 bits for the instruction’s opcode

INSTRUCTION FORMAT

A computer instruction is often divided into two parts:
    – An opcode (Operation Code) that specifies the operation for that instruction
    – An address that specifies the registers and/or locations in memory to use for that operation
In the Basic Computer, since the memory contains 4096 (= 2^12) words, we need 12 bits to specify which memory address this instruction will use.
In the Basic Computer, bit 15 of the instruction specifies the addressing mode (0: direct addressing, 1: indirect addressing).
Since the memory words, and hence the instructions, are 16 bits long, that leaves 3 bits for the instruction’s opcode.

Instruction Format

Bit 15                 Bits 14-12   Bits 11-0
I (addressing mode)    Opcode       Address

ADDRESSING MODES

The address field of an instruction can represent either
    – Direct address: the address in memory of the data to use (the address of the operand), or
    – Indirect address: the address in memory of the address in memory of the data to use

[Figure: Direct addressing: the instruction at address 22 is "0 ADD 457"; the operand at address 457 is added to AC. Indirect addressing: the instruction at address 35 is "1 ADD 300"; location 300 contains 1350, and the operand at address 1350 is added to AC.]

• Effective Address (EA)
    – The address that can be directly used without modification to access an operand for a computation-type instruction, or as the target address for a branch-type instruction
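As a sketch of how the 16-bit instruction word splits into its fields and how the effective address follows from the I bit, consider the Python below. The function names and the dictionary standing in for memory are assumptions for illustration; the numbers reuse the direct (address 457) and indirect (address 300 holding 1350) cases described above, and the ADD opcode 001 is the one listed in the instruction table of the Basic Computer.

```python
def decode(word: int):
    """Split a 16-bit Basic Computer instruction into (I, opcode, address)."""
    i = (word >> 15) & 1           # bit 15: addressing mode
    opcode = (word >> 12) & 0b111  # bits 14-12
    address = word & 0xFFF         # bits 11-0
    return i, opcode, address


def effective_address(word: int, memory: dict) -> int:
    """Direct: EA is the address field. Indirect: EA is read from memory."""
    i, _, address = decode(word)
    if i == 0:
        return address
    return memory[address] & 0xFFF


ADD = 0b001
memory = {300: 1350}  # hypothetical memory contents for the indirect case
direct = (0 << 15) | (ADD << 12) | 457
indirect = (1 << 15) | (ADD << 12) | 300

print(effective_address(direct, memory))    # 457
print(effective_address(indirect, memory))  # 1350
```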

PROCESSOR REGISTERS
• A processor has many registers to hold instructions, addresses, data, etc.
• The processor has a register, the Program Counter (PC), that holds the memory address of the next instruction to fetch
    – Since the memory in the Basic Computer only has 4096 locations, the PC only needs 12 bits
• In direct or indirect addressing, the processor needs to keep track of what locations in memory it is addressing: the Address Register (AR) is used for this
    – The AR is a 12-bit register in the Basic Computer

Registers in the Basic Computer

[Figure: the CPU registers PC and AR (bits 11-0), IR, TR, DR and AC (bits 15-0), INPR and OUTR (bits 7-0), alongside a 4096 x 16 memory]

Register  Bits  Name                  Function
DR        16    Data Register         Holds memory operand
AR        12    Address Register      Holds address for memory
AC        16    Accumulator           Processor register
IR        16    Instruction Register  Holds instruction code
PC        12    Program Counter       Holds address of instruction
TR        16    Temporary Register    Holds temporary data
INPR      8     Input Register        Holds input character
OUTR      8     Output Register       Holds output character

• When an operand is found, using either direct or indirect addressing, it is placed in the Data Register (DR). The processor then uses this value as data for its operation
• The Basic Computer has a single general purpose register, the Accumulator (AC)
• The significance of a general purpose register is that it can be referred to in instructions
    – e.g. load AC with the contents of a specific memory location; store the contents of AC into a specified memory location
• Often a processor will need a scratch register to store intermediate results or other temporary data; in the Basic Computer this is the Temporary Register (TR)
• The Basic Computer uses a very simple model of input/output (I/O) operations
    – Input devices are considered to send 8 bits of character data to the processor
    – The processor can send 8 bits of character data to output devices
• The Input Register (INPR) holds an 8-bit character received from an input device
• The Output Register (OUTR) holds an 8-bit character to be sent to an output device

COMMON BUS SYSTEM
• The registers in the Basic Computer are connected using a bus
• This gives a savings in circuitry over complete connections between registers

[Figure: Basic Computer registers connected to a 16-bit common bus. Select lines S2 S1 S0 choose the bus source: AR (1), PC (2), DR (3), AC (4), IR (5), TR (6) or the 4096 x 16 memory unit (7). AR, PC, DR, AC and TR have LD, INR and CLR inputs; IR and OUTR have LD only; the memory unit has Address, Read and Write inputs; AC is loaded through the ALU, shown together with flip-flop E and INPR; all registers share the Clock.]

Three control lines, S2, S1, and S0, control which register the bus selects as its input:

S2 S1 S0  Register
0  0  0   x
0  0  1   AR
0  1  0   PC
0  1  1   DR
1  0  0   AC
1  0  1   IR
1  1  0   TR
1  1  1   Memory

Either one of the registers will have its load signal activated, or the memory will have its write signal activated; this determines where the data from the bus gets loaded.
The 12-bit registers, AR and PC, have 0’s loaded onto the bus in the high order 4 bit positions.
When the 8-bit register OUTR is loaded from the bus, the data comes from the low order 8 bits on the bus.

BASIC COMPUTER INSTRUCTIONS

Basic Computer Instruction Format

Memory-Reference Instructions (OP-code = 000 ~ 110)
Bit 15   Bits 14-12   Bits 11-0
I        Opcode       Address

Register-Reference Instructions (OP-code = 111, I = 0)
Bits 15-12   Bits 11-0
0 1 1 1      Register operation

Input-Output Instructions (OP-code = 111, I = 1)
Bits 15-12   Bits 11-0
1 1 1 1      I/O operation

Symbol  Hex code (I = 0)  Hex code (I = 1)  Description
AND     0xxx              8xxx              AND memory word to AC
ADD     1xxx              9xxx              Add memory word to AC
LDA     2xxx              Axxx              Load AC from memory
STA     3xxx              Bxxx              Store content of AC into memory
BUN     4xxx              Cxxx              Branch unconditionally
BSA     5xxx              Dxxx              Branch and save return address
ISZ     6xxx              Exxx              Increment and skip if zero

Symbol  Hex code  Description
CLA     7800      Clear AC
CLE     7400      Clear E
CMA     7200      Complement AC
CME     7100      Complement E
CIR     7080      Circulate right AC and E
CIL     7040      Circulate left AC and E
INC     7020      Increment AC
SPA     7010      Skip next instr. if AC is positive
SNA     7008      Skip next instr. if AC is negative
SZA     7004      Skip next instr. if AC is zero
SZE     7002      Skip next instr. if E is zero
HLT     7001      Halt computer

Symbol  Hex code  Description
INP     F800      Input character to AC
OUT     F400      Output character from AC
SKI     F200      Skip on input flag
SKO     F100      Skip on output flag
ION     F080      Interrupt on
IOF     F040      Interrupt off
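The hex codes in the instruction table can be turned back into mnemonics with a short lookup. The sketch below is illustrative only: the function name `disassemble` and the output format (a trailing " I" for indirect) are assumptions, while the codes themselves come from the table.

```python
MEMORY_REF = ["AND", "ADD", "LDA", "STA", "BUN", "BSA", "ISZ"]
REGISTER_REF = {
    0x7800: "CLA", 0x7400: "CLE", 0x7200: "CMA", 0x7100: "CME",
    0x7080: "CIR", 0x7040: "CIL", 0x7020: "INC", 0x7010: "SPA",
    0x7008: "SNA", 0x7004: "SZA", 0x7002: "SZE", 0x7001: "HLT",
}
INPUT_OUTPUT = {
    0xF800: "INP", 0xF400: "OUT", 0xF200: "SKI",
    0xF100: "SKO", 0xF080: "ION", 0xF040: "IOF",
}


def disassemble(word: int) -> str:
    """Translate one 16-bit instruction word into its symbolic form."""
    if word in REGISTER_REF:
        return REGISTER_REF[word]
    if word in INPUT_OUTPUT:
        return INPUT_OUTPUT[word]
    i = (word >> 15) & 1
    opcode = (word >> 12) & 0b111
    address = word & 0xFFF
    if opcode == 0b111:
        raise ValueError(f"unknown register or I/O code {word:04X}")
    return f"{MEMORY_REF[opcode]} {address:03X}" + (" I" if i else "")


for code in (0x7800, 0x2135, 0x9300, 0xF400):
    print(f"{code:04X}  {disassemble(code)}")
```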

INSTRUCTION SET COMPLETENESS
A computer should have a set of instructions so that the user can construct machine language programs to evaluate any function that is known to be computable.

• Instruction Types
    – Functional Instructions: arithmetic, logic, and shift instructions (ADD, CMA, INC, CIR, CIL, AND, CLA)
    – Transfer Instructions: data transfers between the main memory and the processor registers (LDA, STA)
    – Control Instructions: program sequencing and control (BUN, BSA, ISZ)
    – Input/Output Instructions: input and output (INP, OUT)

CONTROL UNIT
• The control unit (CU) of a processor translates from machine instructions to the control signals for the microoperations that implement them
• Control units are implemented in one of two ways
• Hardwired Control
    – CU is made up of sequential and combinational circuits to generate the control signals
• Microprogrammed Control

TIMING AND CONTROL

[Figure: control unit of the Basic Computer. The instruction register (IR) is split into bit 15, bits 14-12 and bits 11-0; bits 14-12 feed a 3 x 8 decoder producing D0-D7, and bit 15 gives I; a 4-bit sequence counter (SC) with Increment (INR), Clear (CLR) and Clock inputs drives a 4 x 16 decoder producing T0-T15; the combinational control logic combines these with other inputs to generate the control signals.]

TIMING SIGNALS

- Generated by a 4-bit sequence counter and a 4 x 16 decoder
- The SC can be incremented or cleared.
- Example: T0, T1, T2, T3, T4, T0, T1, . . .
  Assume: At time T4, SC is cleared to 0 if decoder output D3 is active.

D3T4: SC ← 0

[Figure: timing diagram showing Clock, T0, T1, T2, T3, T4, D3 and CLR SC over the sequence T0, T1, T2, T3, T4, T0]
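The behaviour of the sequence counter can be sketched as a loop that either increments or clears SC on every clock. The function name and its `clear_at` parameter are assumptions for the example; `clear_at=4` plays the role of the condition D3T4.

```python
def timing_sequence(cycles: int, clear_at=None):
    """Return the index of the active timing signal for each clock cycle."""
    sc = 0
    active = []
    for _ in range(cycles):
        active.append(sc)        # the 4 x 16 decoder asserts T[sc]
        if sc == clear_at:
            sc = 0               # CLR: SC <- 0
        else:
            sc = (sc + 1) & 0xF  # INR: SC <- SC + 1, wrapping after T15
    return active


print(["T%d" % t for t in timing_sequence(7, clear_at=4)])
```

With no clear condition the counter simply runs from T0 to T15 and wraps around.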

• In microprogrammed control, a control memory on the processor contains microprograms that activate the necessary control signals
• We will consider a hardwired implementation of the control unit for the Basic Computer

INSTRUCTION CYCLE
• In the Basic Computer, a machine instruction is executed in the following cycle:
    1. Fetch an instruction from memory
    2. Decode the instruction
    3. Read the effective address from memory if the instruction has an indirect address

FETCH and DECODE

T0: AR ← PC (S2S1S0 = 010, T0 = 1)
T1: IR ← M[AR], PC ← PC + 1 (S2S1S0 = 111, T1 = 1)
T2: D0, . . . , D7 ← Decode IR(12-14), AR ← IR(0-11), I ← IR(15)

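[Figure: register transfers for the fetch phase on the common bus. At T0 the bus selects PC (2) and AR is loaded (LD); at T1 the bus selects the memory unit (7) with Read enabled, IR (5) is loaded (LD) and PC is incremented (INR).]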

DETERMINE THE TYPE OF INSTRUCTION

D7'IT3: AR ← M[AR]
D7'I'T3: Nothing
D7I'T3: Execute a register-reference instruction
D7IT3: Execute an input-output instruction

[Figure: flowchart for the instruction cycle. Start with SC ← 0; T0: AR ← PC; T1: IR ← M[AR], PC ← PC + 1; T2: decode opcode in IR(12-14), AR ← IR(0-11), I ← IR(15). If D7 = 1 (register or I/O): at T3 execute an input-output instruction when I = 1 or a register-reference instruction when I = 0, then SC ← 0. If D7 = 0 (memory-reference): at T3, AR ← M[AR] when I = 1 (indirect) or nothing when I = 0 (direct); then execute the memory-reference instruction from T4 and finish with SC ← 0.]

4. Execute the instruction

• After an instruction is executed, the cycle starts again at step 1, for the next instruction

• Note: Every different processor has its own (different) instruction cycle

REGISTER REFERENCE INSTRUCTIONS

Register-reference instructions are identified when D7 = 1, I = 0
- The register-reference instruction is specified in b0 ~ b11 of IR
- Execution starts with timing signal T3

r = D7 I' T3 => Register Reference Instruction
Bi = IR(i), i = 0, 1, 2, ..., 11

     r:    SC ← 0
CLA  rB11: AC ← 0
CLE  rB10: E ← 0
CMA  rB9:  AC ← AC'
CME  rB8:  E ← E'
CIR  rB7:  AC ← shr AC, AC(15) ← E, E ← AC(0)
CIL  rB6:  AC ← shl AC, AC(0) ← E, E ← AC(15)
INC  rB5:  AC ← AC + 1
SPA  rB4:  if (AC(15) = 0) then (PC ← PC + 1)
SNA  rB3:  if (AC(15) = 1) then (PC ← PC + 1)
SZA  rB2:  if (AC = 0) then (PC ← PC + 1)
SZE  rB1:  if (E = 0) then (PC ← PC + 1)
HLT  rB0:  S ← 0 (S is a start-stop flip-flop)

MEMORY REFERENCE INSTRUCTIONS

- The effective address of the instruction is in AR and was placed there during timing signal T2 when I = 0, or during timing signal T3 when I = 1
- Memory cycle is assumed to be short enough to complete in a CPU cycle
- The execution of a memory-reference instruction starts with T4

AND: AND to AC
D0T4: DR ← M[AR]                        Read operand
D0T5: AC ← AC ∧ DR, SC ← 0              AND with AC

ADD: Add to AC
D1T4: DR ← M[AR]                        Read operand
D1T5: AC ← AC + DR, E ← Cout, SC ← 0    Add to AC and store carry in E

LDA: Load to AC
D2T4: DR ← M[AR]
D2T5: AC ← DR, SC ← 0

STA: Store AC
D3T4: M[AR] ← AC, SC ← 0

BUN: Branch Unconditionally
D4T4: PC ← AR, SC ← 0

BSA: Branch and Save Return Address
M[AR] ← PC, PC ← AR + 1
D5T4: M[AR] ← PC, AR ← AR + 1
D5T5: PC ← AR, SC ← 0

[Figure: example of BSA execution. The instruction "0 BSA 135" is at address 20 and the next instruction is at address 21. At time T4: PC = 21, AR = 135. After execution: location 135 holds the return address 21, PC = 136, and the subroutine starting at 136 ends with "1 BUN 135", which returns to address 21.]

ISZ: Increment and Skip-if-Zero
D6T4: DR ← M[AR]
D6T5: DR ← DR + 1
D6T6: M[AR] ← DR, if (DR = 0) then (PC ← PC + 1), SC ← 0
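The net effect of the memory-reference microoperations can be sketched with a small instruction-level simulator. This is a simplified model under stated assumptions: the class and method names are made up, each instruction is executed in one Python call rather than one timing signal at a time, register-reference and I/O instructions are left out, and the addresses 20, 135 and 136 from the BSA example are treated as hexadecimal.

```python
class BasicComputer:
    """Instruction-level sketch of the memory-reference instructions."""

    def __init__(self):
        self.mem = [0] * 4096   # 4096 words of 16 bits
        self.pc = 0
        self.ac = 0
        self.e = 0

    def step(self):
        ir = self.mem[self.pc]                  # T0, T1: fetch
        self.pc = (self.pc + 1) & 0xFFF
        i, op, ar = ir >> 15, (ir >> 12) & 7, ir & 0xFFF   # T2: decode
        if op == 7:
            raise NotImplementedError("register-reference and I/O not modelled")
        if i:                                   # T3: indirect address
            ar = self.mem[ar] & 0xFFF
        if op == 0:                             # AND
            self.ac &= self.mem[ar]
        elif op == 1:                           # ADD, carry goes to E
            total = self.ac + self.mem[ar]
            self.e, self.ac = total >> 16, total & 0xFFFF
        elif op == 2:                           # LDA
            self.ac = self.mem[ar]
        elif op == 3:                           # STA
            self.mem[ar] = self.ac
        elif op == 4:                           # BUN
            self.pc = ar
        elif op == 5:                           # BSA
            self.mem[ar] = self.pc
            self.pc = (ar + 1) & 0xFFF
        else:                                   # ISZ
            dr = (self.mem[ar] + 1) & 0xFFFF
            self.mem[ar] = dr
            if dr == 0:
                self.pc = (self.pc + 1) & 0xFFF


bc = BasicComputer()
bc.mem[0x020] = 0x5135   # 0 BSA 135
bc.mem[0x136] = 0xC135   # 1 BUN 135 (return from the subroutine)
bc.pc = 0x020
bc.step()
print(f"{bc.mem[0x135]:03X} {bc.pc:03X}")  # return address saved, PC in subroutine
bc.step()
print(f"{bc.pc:03X}")                      # back at the instruction after BSA
```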

FLOWCHART FOR MEMORY REFERENCE INSTRUCTIONS

[Figure: flowchart of the memory-reference instructions, branching on the decoded opcode at T4. AND: D0T4 DR ← M[AR]; D0T5 AC ← AC ∧ DR, SC ← 0. ADD: D1T4 DR ← M[AR]; D1T5 AC ← AC + DR, E ← Cout, SC ← 0. LDA: D2T4 DR ← M[AR]; D2T5 AC ← DR, SC ← 0. STA: D3T4 M[AR] ← AC, SC ← 0. BUN: D4T4 PC ← AR, SC ← 0. BSA: D5T4 M[AR] ← PC, AR ← AR + 1; D5T5 PC ← AR, SC ← 0. ISZ: D6T4 DR ← M[AR]; D6T5 DR ← DR + 1; D6T6 M[AR] ← DR, if (DR = 0) then (PC ← PC + 1), SC ← 0.]

INPUT-OUTPUT AND INTERRUPT

Input-Output Configuration: A Terminal with a Keyboard and a Printer

INPR  Input register    8 bits
OUTR  Output register   8 bits
FGI   Input flag        1 bit
FGO   Output flag       1 bit
IEN   Interrupt enable  1 bit

- The terminal sends and receives serial information
- The serial information from the keyboard is shifted into INPR
- The serial information for the printer is stored in OUTR
- INPR and OUTR communicate with the terminal serially and with the AC in parallel
- The flags are needed to synchronize the timing difference between the I/O device and the computer

[Figure: the input-output terminal (keyboard and printer), the serial communication interface (transmitter interface and receiver interface) and the computer registers and flip-flops (INPR with FGI, OUTR with FGO, and AC); serial communication paths run between the terminal and the registers, and parallel communication paths between the registers and AC]

INPUT-OUTPUT INSTRUCTIONS

D7IT3 = p
IR(i) = Bi, i = 6, …, 11

     p:    SC ← 0                            Clear SC
INP  pB11: AC(0-7) ← INPR, FGI ← 0           Input char. to AC
OUT  pB10: OUTR ← AC(0-7), FGO ← 0           Output char. from AC
SKI  pB9:  if (FGI = 1) then (PC ← PC + 1)   Skip on input flag
SKO  pB8:  if (FGO = 1) then (PC ← PC + 1)   Skip on output flag

PROGRAM CONTROLLED DATA TRANSFER

-- CPU --

/* Input */   /* Initially FGI = 0 */
loop: If FGI = 0 goto loop
      AC ← INPR, FGI ← 0

/* Output */  /* Initially FGO = 1 */
loop: If FGO = 0 goto loop
      OUTR ← AC, FGO ← 0

-- I/O Device --

/* Input */
loop: If FGI = 1 goto loop
      INPR ← new data, FGI ← 1

/* Output */
loop: If FGO = 1 goto loop
      consume OUTR, FGO ← 1

[Figure: flowcharts for program-controlled input and output. Input: wait while FGI = 0, then AC ← INPR, and repeat while there are more characters. Output: AC ← Data, wait while FGO = 0, then OUTR ← AC, and repeat while there are more characters.]
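The flag handshake for input can be sketched as two cooperating routines sharing INPR and FGI. This is a toy model under assumptions: the class and method names are invented, and the busy-wait loops are replaced by calls that report whether the transfer could take place.

```python
class InputPort:
    """INPR plus the FGI flag shared by the device and the CPU."""

    def __init__(self):
        self.inpr = 0
        self.fgi = 0  # initially FGI = 0: no character waiting

    def device_write(self, char: str) -> bool:
        """Device side: only deposit a character when FGI = 0."""
        if self.fgi == 1:
            return False          # previous character not yet consumed
        self.inpr = ord(char) & 0xFF
        self.fgi = 1
        return True

    def cpu_read(self):
        """CPU side: AC <- INPR, FGI <- 0, but only when FGI = 1."""
        if self.fgi == 0:
            return None           # the CPU would keep looping here
        self.fgi = 0
        return self.inpr


port = InputPort()
pending = list("OK")
received = []
while pending or port.fgi:
    if pending and port.device_write(pending[0]):
        pending.pop(0)
    value = port.cpu_read()
    if value is not None:
        received.append(chr(value))
print("".join(received))
```

The output direction works the same way with OUTR and FGO, with the roles of producer and consumer swapped.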

FLOWCHART FOR INTERRUPT CYCLE

R = Interrupt flip-flop

- The interrupt cycle is a hardware implementation of a branch and save return address operation.
- At the beginning of the next instruction cycle, the instruction that is read from memory is in address 1.
- At memory address 1, the programmer must store a branch instruction that sends the control to an interrupt service routine.
- The instruction that returns the control to the original program is "indirect BUN 0".

[Figure: flowchart for the interrupt cycle. If R = 0 (instruction cycle): fetch and decode the instruction, execute it, then set R ← 1 if IEN = 1 and either FGI = 1 or FGO = 1. If R = 1 (interrupt cycle): store the return address in location 0 (M[0] ← PC), branch to location 1 (PC ← 1), then IEN ← 0, R ← 0.]

ION  pB7: IEN ← 1   Interrupt enable on
IOF  pB6: IEN ← 0   Interrupt enable off
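The two halves of the interrupt mechanism can be sketched as follows. The function names and the dictionary holding the flip-flops are assumptions for the example; the transfers themselves follow the flowchart description (save PC in location 0, continue at location 1, clear IEN and R).

```python
def end_of_instruction(state: dict) -> None:
    """Set R when interrupts are enabled and either I/O flag is raised."""
    if state["IEN"] == 1 and (state["FGI"] == 1 or state["FGO"] == 1):
        state["R"] = 1


def interrupt_cycle(state: dict, memory: dict) -> None:
    """Hardware 'branch and save return address' through locations 0 and 1."""
    memory[0] = state["PC"]   # store return address in location 0
    state["PC"] = 1           # branch to location 1
    state["IEN"] = 0          # no further interrupts until ION is executed
    state["R"] = 0


state = {"PC": 0x255, "IEN": 1, "FGI": 1, "FGO": 0, "R": 0}
memory = {}
end_of_instruction(state)
if state["R"]:
    interrupt_cycle(state, memory)
print(hex(memory[0]), state)
```

Returning with an indirect BUN 0 works because location 0 now holds the saved PC.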

COMPLETE COMPUTER DESCRIPTION

Flowchart of Operations

start: SC ← 0, IEN ← 0, R ← 0

R = 0 (Instruction Cycle):
R'T0: AR ← PC
R'T1: IR ← M[AR], PC ← PC + 1
R'T2: AR ← IR(0~11), I ← IR(15), D0...D7 ← Decode IR(12 ~ 14)
D7 = 1 (Register or I/O):
    D7IT3 (I = 1, I/O): Execute I/O instruction
    D7I'T3 (I = 0, Register): Execute RR instruction
D7 = 0 (Memory Ref):
    D7'IT3 (I = 1, Indirect): AR ← M[AR]
    D7'I'T3 (I = 0, Direct): Idle
    D7'T4: Execute MR instruction

R = 1 (Interrupt Cycle):
RT0: AR ← 0, TR ← PC
RT1: M[AR] ← TR, PC ← 0
RT2: PC ← PC + 1, IEN ← 0, R ← 0, SC ← 0

Microoperations

Register-Reference
D7I'T3 = r (common to all register-reference instructions)
IR(i) = Bi (i = 0, 1, 2, ..., 11)

     r:    SC ← 0
CLA  rB11: AC ← 0
CLE  rB10: E ← 0
CMA  rB9:  AC ← AC'
CME  rB8:  E ← E'
CIR  rB7:  AC ← shr AC, AC(15) ← E, E ← AC(0)
CIL  rB6:  AC ← shl AC, AC(0) ← E, E ← AC(15)
INC  rB5:  AC ← AC + 1
SPA  rB4:  If (AC(15) = 0) then (PC ← PC + 1)
SNA  rB3:  If (AC(15) = 1) then (PC ← PC + 1)
SZA  rB2:  If (AC = 0) then (PC ← PC + 1)
SZE  rB1:  If (E = 0) then (PC ← PC + 1)
HLT  rB0:  S ← 0

Input-Output
D7IT3 = p (common to all input-output instructions)
IR(i) = Bi (i = 6, 7, 8, 9, 10, 11)

     p:    SC ← 0
INP  pB11: AC(0-7) ← INPR, FGI ← 0
OUT  pB10: OUTR ← AC(0-7), FGO ← 0
SKI  pB9:  If (FGI = 1) then (PC ← PC + 1)
SKO  pB8:  If (FGO = 1) then (PC ← PC + 1)
ION  pB7:  IEN ← 1
IOF  pB6:  IEN ← 0

REGISTERS
• In the Basic Computer, there is only one general purpose register, the Accumulator (AC)
• In modern CPUs, there are many general purpose registers
• It is advantageous to have many registers
    – Transfers between registers within the processor are relatively fast
    – Going “off the processor” to access memory is much slower

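GENERAL REGISTER ORGANIZATION

[Figure: seven registers R1-R7 and an external Input feed two multiplexers, selected by SELA and SELB, which drive the A bus and B bus into the ALU; OPR selects the ALU operation; the ALU Output is returned to the registers, and a 3 x 8 decoder driven by SELD produces the 7 load lines; the registers share a Clock]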

OPERATION OF CONTROL UNIT

The control unit directs the information flow through the ALU by
 - Selecting various components in the system
 - Selecting the function of the ALU

Example: R1 ← R2 + R3
[1] MUX A selector (SELA): BUS A ← R2
[2] MUX B selector (SELB): BUS B ← R3
[3] ALU operation selector (OPR): ALU to ADD
[4] Decoder destination selector (SELD): R1 ← Out Bus

Control Word

Field  SELA  SELB  SELD  OPR
Bits   3     3     3     5

Encoding of register selection fields

Binary Code  SELA   SELB   SELD
000          Input  Input  None
001          R1     R1     R1
010          R2     R2     R2
011          R3     R3     R3
100          R4     R4     R4
101          R5     R5     R5
110          R6     R6     R6
111          R7     R7     R7

ALU CONTROL

Encoding of ALU operations

OPR Select  Operation       Symbol
00000       Transfer A      TSFA
00001       Increment A     INCA
00010       Add A + B       ADD
00101       Subtract A - B  SUB
00110       Decrement A     DECA
01000       AND A and B     AND
01010       OR A and B      OR
01100       XOR A and B     XOR
01110       Complement A    COMA
10000       Shift right A   SHRA
11000       Shift left A    SHLA

Examples of ALU Microoperations

Microoperation    SELA   SELB  SELD  OPR   Control Word
R1 ← R2 - R3      R2     R3    R1    SUB   010 011 001 00101
R4 ← R4 ∨ R5      R4     R5    R4    OR    100 101 100 01010
R6 ← R6 + 1       R6     -     R6    INCA  110 000 110 00001
R7 ← R1           R1     -     R7    TSFA  001 000 111 00000
Output ← R2       R2     -     None  TSFA  010 000 000 00000
Output ← Input    Input  -     None  TSFA  000 000 000 00000
R4 ← shl R4       R4     -     R4    SHLA  100 000 100 11000
R5 ← 0            R5     R5    R5    XOR   101 101 101 01100
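Assembling a 14-bit control word from its four fields is a matter of shifting and OR-ing the codes from the two encoding tables. In the sketch below the function name `control_word` and the use of "-" for an unused SELB field are assumptions; the codes are the ones tabulated above.

```python
REGISTER_CODE = {"Input": 0, "None": 0, "-": 0,
                 "R1": 1, "R2": 2, "R3": 3, "R4": 4,
                 "R5": 5, "R6": 6, "R7": 7}
OPR_CODE = {"TSFA": 0b00000, "INCA": 0b00001, "ADD": 0b00010,
            "SUB": 0b00101, "DECA": 0b00110, "AND": 0b01000,
            "OR": 0b01010, "XOR": 0b01100, "COMA": 0b01110,
            "SHRA": 0b10000, "SHLA": 0b11000}


def control_word(sela: str, selb: str, seld: str, opr: str) -> str:
    """Pack SELA (3 bits), SELB (3), SELD (3) and OPR (5) into 14 bits."""
    word = (REGISTER_CODE[sela] << 11) | (REGISTER_CODE[selb] << 8) \
        | (REGISTER_CODE[seld] << 5) | OPR_CODE[opr]
    bits = f"{word:014b}"
    return f"{bits[0:3]} {bits[3:6]} {bits[6:9]} {bits[9:]}"


print(control_word("R2", "R3", "R1", "SUB"))   # R1 <- R2 - R3
print(control_word("R5", "R5", "R5", "XOR"))   # R5 <- 0
```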

REGISTER STACK ORGANIZATION

Stack
 - Very useful feature for nested subroutines, nested interrupt services
 - Also efficient for arithmetic expression evaluation
 - Storage which can be accessed in LIFO order
 - Pointer: SP
 - Only PUSH and POP operations are applicable

[Figure: register stack of 64 words, addresses 0 to 63, holding items A, B and C; stack pointer SP (6 bits); flags FULL and EMPTY; data register DR]

Push, Pop operations

/* Initially, SP = 0, EMPTY = 1, FULL = 0 */

PUSH                            POP
SP ← SP + 1                     DR ← M[SP]
M[SP] ← DR                      SP ← SP - 1
If (SP = 0) then (FULL ← 1)     If (SP = 0) then (EMPTY ← 1)
EMPTY ← 0                       FULL ← 0
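The PUSH and POP sequences for the 64-word register stack can be sketched as a small class. The class name and the choice to raise Python exceptions on a full or empty stack are assumptions; the pointer arithmetic follows the listing, with SP wrapping as a 6-bit counter.

```python
class RegisterStack:
    """64-word register stack with a 6-bit SP and FULL/EMPTY flags."""

    SIZE = 64

    def __init__(self):
        self.m = [0] * self.SIZE
        self.sp = 0
        self.empty = True
        self.full = False

    def push(self, dr: int) -> None:
        if self.full:
            raise OverflowError("stack is full")
        self.sp = (self.sp + 1) % self.SIZE   # SP <- SP + 1 (6-bit wrap)
        self.m[self.sp] = dr                  # M[SP] <- DR
        if self.sp == 0:
            self.full = True
        self.empty = False

    def pop(self) -> int:
        if self.empty:
            raise IndexError("stack is empty")
        dr = self.m[self.sp]                  # DR <- M[SP]
        self.sp = (self.sp - 1) % self.SIZE   # SP <- SP - 1
        if self.sp == 0:
            self.empty = True
        self.full = False
        return dr


stack = RegisterStack()
for item in ("A", "B", "C"):
    stack.push(item)
print(stack.sp, stack.pop(), stack.pop(), stack.pop(), stack.empty)
```

Because SP is only 6 bits wide, the 64th push wraps SP back to 0, which is exactly the condition that sets FULL.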

MEMORY STACK ORGANIZATION

- A portion of memory is used as a stack with a processor register as a stack pointer

- PUSH: SP ← SP - 1
        M[SP] ← DR
- POP:  DR ← M[SP]
        SP ← SP + 1

[Figure: memory with program, data, and stack segments. PC points into the program (instructions), AR into the data (operands) and SP into the stack; addresses 1000, 3000 and 3997 to 4001 are marked, and an arrow shows the direction in which the stack grows.]

- Most computers do not provide hardware to check stack overflow (full stack) or underflow (empty stack); this must be done in software

REVERSE POLISH NOTATION

Arithmetic Expressions: A + B

A + B    Infix notation
+ A B    Prefix or Polish notation
A B +    Postfix or reverse Polish notation

- The reverse Polish notation is very suitable for stack manipulation

Evaluation of Arithmetic Expressions
Any arithmetic expression can be expressed in parenthesis-free Polish notation, including reverse Polish notation

(3 * 4) + (5 * 6)  =>  3 4 * 5 6 * +

Stack contents (bottom to top) after each token:

Token  3    4     *    5      6        *       +
Stack  3    3 4   12   12 5   12 5 6   12 30   42
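Evaluating a reverse Polish expression with a stack takes only a few lines. The evaluator below is a minimal sketch: the function name is made up, tokens are assumed to be separated by spaces, and only integer operands with +, - and * are handled.

```python
import operator

OPERATORS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def eval_rpn(expression: str) -> int:
    """Evaluate a space-separated reverse Polish expression."""
    stack = []
    for token in expression.split():
        if token in OPERATORS:
            right = stack.pop()   # top of stack is the right operand
            left = stack.pop()
            stack.append(OPERATORS[token](left, right))
        else:
            stack.append(int(token))
    if len(stack) != 1:
        raise ValueError("malformed expression")
    return stack[0]


print(eval_rpn("3 4 * 5 6 * +"))  # 42
```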

PROCESSOR ORGANIZATION
• In general, most processors are organized in one of 3 ways
    – Single register (Accumulator) organization
        » Basic Computer is a good example
        » Accumulator is the only general purpose register
    – General register organization
        » Used by most modern computer processors
        » Any of the registers can be used as the source or destination for computer operations
    – Stack organization
        » All operations are done using the hardware stack

INSTRUCTION FORMAT

Instruction Fields
OP-code field - specifies the operation to be performed
Address field - designates memory address(es) or processor register(s)
Mode field - determines how the address field is to be interpreted (to get the effective address or the operand)

The number of address fields in the instruction format depends on the internal organization of the CPU.
The three most common CPU organizations:

Single accumulator organization:
ADD X            /* AC ← AC + M[X] */
General register organization:
ADD R1, R2, R3   /* R1 ← R2 + R3 */
ADD R1, R2       /* R1 ← R1 + R2 */
MOV R1, R2       /* R1 ← R2 */
ADD R1, X        /* R1 ← R1 + M[X] */
Stack organization:
PUSH X           /* TOS ← M[X] */
ADD

THREE, AND TWO-ADDRESS INSTRUCTIONS

Three-Address Instructions

Program to evaluate X = (A + B) * (C + D):

ADD R1, A, B     /* R1 ← M[A] + M[B] */
ADD R2, C, D     /* R2 ← M[C] + M[D] */
MUL X, R1, R2    /* M[X] ← R1 * R2 */

- Results in short programs
- Instruction becomes long (many bits)

Two-Address Instructions

Program to evaluate X = (A + B) * (C + D):

MOV R1, A        /* R1 ← M[A] */
ADD R1, B        /* R1 ← R1 + M[B] */
MOV R2, C        /* R2 ← M[C] */
ADD R2, D        /* R2 ← R2 + M[D] */
MUL R1, R2       /* R1 ← R1 * R2 */
MOV X, R1        /* M[X] ← R1 */

» In a stack organization, for example, an OR instruction will pop the two top elements from the stack, do a logical OR on them, and push the result on the stack

ONE, AND ZERO-ADDRESS INSTRUCTIONS

One-Address Instructions
- Use an implied AC register for all data manipulation
- Program to evaluate X = (A + B) * (C + D):

LOAD A       /* AC ← M[A] */
ADD B        /* AC ← AC + M[B] */
STORE T      /* M[T] ← AC */
LOAD C       /* AC ← M[C] */
ADD D        /* AC ← AC + M[D] */
MUL T        /* AC ← AC * M[T] */
STORE X      /* M[X] ← AC */

Zero-Address Instructions
- Can be found in a stack-organized computer
- Program to evaluate X = (A + B) * (C + D):

PUSH A       /* TOS ← A */
PUSH B       /* TOS ← B */
ADD          /* TOS ← (A + B) */
PUSH C       /* TOS ← C */
PUSH D       /* TOS ← D */
ADD          /* TOS ← (C + D) */
MUL          /* TOS ← (C + D) * (A + B) */
POP X        /* M[X] ← TOS */
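The zero-address program can be run on a toy stack machine to confirm that it leaves (A + B) * (C + D) in X. Everything here is a sketch under assumptions: the interpreter function, the use of a dictionary for memory and the sample operand values are all hypothetical.

```python
def run_stack_program(program, memory):
    """Execute PUSH/POP/ADD/MUL on a stack-organized toy machine."""
    stack = []
    for line in program:
        op, *operand = line.split()
        if op == "PUSH":
            stack.append(memory[operand[0]])   # TOS <- M[X]
        elif op == "POP":
            memory[operand[0]] = stack.pop()   # M[X] <- TOS
        elif op in ("ADD", "MUL"):
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right if op == "ADD" else left * right)
        else:
            raise ValueError(f"unknown instruction {op}")
    return memory


memory = {"A": 2, "B": 3, "C": 4, "D": 5}  # sample operand values
program = ["PUSH A", "PUSH B", "ADD",
           "PUSH C", "PUSH D", "ADD",
           "MUL", "POP X"]
run_stack_program(program, memory)
print(memory["X"])  # (2 + 3) * (4 + 5) = 45
```

Note that ADD and MUL carry no address at all; their operands are implied by the top two stack positions.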

ADDRESSING MODES

• Addressing Modes
    * Specify a rule for interpreting or modifying the address field of the instruction (before the operand is actually referenced)
    * Variety of addressing modes
        - to give programming flexibility to the user
        - to use the bits in the address field of the instruction efficiently

TYPES OF ADDRESSING MODES

• Implied Mode
  Addresses of the operands are specified implicitly in the definition of the instruction
    - No need to specify address in the instruction
    - EA = AC, or EA = Stack[SP]
    - Examples from Basic Computer: CLA, CME, INP

• Immediate Mode
  Instead of specifying the address of the operand, the operand itself is specified
    - No need to specify address in the instruction
    - However, the operand itself needs to be specified
    - Sometimes requires more bits than the address
    - Fast to acquire an operand

• Register Mode
  Address specified in the instruction is the register address
    - Designated operand needs to be in a register
    - Shorter address than the memory address
    - Saves address field bits in the instruction
    - Faster to acquire an operand than with memory addressing
    - EA = IR(R) (IR(R): register field of IR)

• Register Indirect Mode
  Instruction specifies a register which contains the memory address of the operand
    - Saves instruction bits since a register address is shorter than a memory address
    - Slower to acquire an operand than both register addressing and memory addressing
    - EA = [IR(R)] ([x]: content of x)

• Autoincrement or Autodecrement Mode
    - When the address in the register is used to access memory, the value in the register is incremented or decremented by 1 automatically

• Direct Address Mode
  Instruction specifies the memory address which can be used directly to access the memory
    - Faster than the other memory addressing modes
    - Too many bits are needed to specify the address for a large physical memory space
    - EA = IR(addr) (IR(addr): address field of IR)

• Indirect Addressing Mode
  The address field of an instruction specifies the address of a memory location that contains the address of the operand
    - When the abbreviated address is used, large physical memory can be addressed with a relatively small number of bits
    - Slow to acquire an operand because of an additional memory access
    - EA = M[IR(address)]

• Relative Addressing Modes
  The address field of an instruction specifies the part of the address (abbreviated address) which can be used along with a designated register to calculate the address of the operand
    - Address field of the instruction is short
    - Large physical memory can be accessed with a small number of address bits
    - EA = f(IR(address), R), R is sometimes implied
  3 different Relative Addressing Modes depending on R:
    * PC Relative Addressing Mode (R = PC)
        - EA = PC + IR(address)
    * Indexed Addressing Mode (R = IX, where IX: Index Register)

ADDRESSING MODES - EXAMPLES -

[Figure: memory and registers for the example. PC = 200, R1 = 400, XR = 100, AC. Addresses 200 and 201 hold the instruction "Load to AC, Mode" with Address = 500; address 202 holds the next instruction. Memory contents: M[399] = 450, M[400] = 700, M[500] = 800, M[600] = 900, M[702] = 325, M[800] = 300.]

Addressing Mode     Effective Address                        Content of AC
Direct address      500      /* AC ← (500) */            800
Immediate operand   -        /* AC ← 500 */              500
Indirect address    800      /* AC ← ((500)) */          300
Relative address    702      /* AC ← (PC+500) */         325
Indexed address     600      /* AC ← (XR+500) */         900
Register            -        /* AC ← R1 */               400
Register indirect   400      /* AC ← (R1) */             700
Autoincrement       400      /* AC ← (R1)+ */            700
Autodecrement       399      /* AC ← -(R1) */            450

Relative Addressing Modes (continued)
    * Indexed Addressing Mode (R = IX): EA = IX + IR(address)
    * Base Register Addressing Mode (R = BAR, where BAR: Base Address Register): EA = BAR + IR(address)
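The worked example (address field 500, PC = 200, R1 = 400, XR = 100) can be reproduced with a short lookup. This is a sketch under assumptions: the function and constant names are invented, and the relative mode uses 202 for PC, on the assumption that PC already points past the two-word instruction at 200-201 when the address is formed, which is what the tabulated effective address 702 implies.

```python
memory = {399: 450, 400: 700, 500: 800, 600: 900, 702: 325, 800: 300}
ADDRESS = 500    # address field of the instruction
PC_NEXT = 202    # assumed: PC after the instruction at 200-201 is fetched
R1 = 400
XR = 100


def load(mode: str):
    """Return (effective address, value loaded into AC) for one mode."""
    if mode == "immediate":
        return None, ADDRESS          # the operand is the field itself
    if mode == "register":
        return None, R1               # the operand is in the register
    if mode == "direct":
        ea = ADDRESS
    elif mode == "indirect":
        ea = memory[ADDRESS]
    elif mode == "relative":
        ea = PC_NEXT + ADDRESS
    elif mode == "indexed":
        ea = XR + ADDRESS
    elif mode in ("register indirect", "autoincrement"):
        ea = R1                       # autoincrement bumps R1 afterwards
    elif mode == "autodecrement":
        ea = R1 - 1                   # R1 is decremented before use
    else:
        raise ValueError(mode)
    return ea, memory[ea]


for mode in ("direct", "immediate", "indirect", "relative", "indexed",
             "register", "register indirect", "autoincrement",
             "autodecrement"):
    print(f"{mode:18}", load(mode))
```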

DATA TRANSFER INSTRUCTIONS

Typical Data Transfer Instructions

Name      Mnemonic
Load      LD
Store     ST
Move      MOV
Exchange  XCH
Input     IN
Output    OUT
Push      PUSH
Pop       POP

Data Transfer Instructions with Different Addressing Modes

Mode               Assembly Convention  Register Transfer
Direct address     LD ADR               AC ← M[ADR]
Indirect address   LD @ADR              AC ← M[M[ADR]]
Relative address   LD $ADR              AC ← M[PC + ADR]
Immediate operand  LD #NBR              AC ← NBR
Index addressing   LD ADR(X)            AC ← M[ADR + XR]
Register           LD R1                AC ← R1
Register indirect  LD (R1)              AC ← M[R1]
Autoincrement      LD (R1)+             AC ← M[R1], R1 ← R1 + 1
Autodecrement      LD -(R1)             R1 ← R1 - 1, AC ← M[R1]

FLAG, PROCESSOR STATUS WORD

In the Basic Computer, the processor had several (status) flags: 1-bit values that indicated various information about the processor’s state (E, FGI, FGO, I, IEN, R).
In some processors, flags like these are often combined into a register, the processor status register (PSR), sometimes called a processor status word (PSW).
Common flags in the PSW are
C (Carry): Set to 1 if the carry out of the ALU is 1
S (Sign): The MSB of the ALU’s output
Z (Zero): Set to 1 if the ALU’s output is all 0’s
V (Overflow): Set to 1 if there is an overflow

[Figure: status flag circuit. An 8-bit ALU with 8-bit inputs A and B produces the 8-bit output F (F7 - F0); the flags V, Z, S and C are derived from the carries c7 and c8, the output bit F7 and a check for zero output.]
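The four status bits can be computed for an 8-bit addition as in the sketch below. The function name is an assumption, and so is the overflow rule used: V is taken as the exclusive-OR of the carry into the most significant bit (c7) and the carry out of it (c8), which is the usual two's complement overflow test.

```python
def add_with_flags(a: int, b: int, width: int = 8):
    """Add two width-bit values and return (F, flags) with C, S, Z and V."""
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    total = a + b
    f = total & mask
    carry_out = total >> width                        # c8 for width = 8
    low = mask >> 1
    carry_in = ((a & low) + (b & low)) >> (width - 1)  # c7 for width = 8
    flags = {
        "C": carry_out,
        "S": f >> (width - 1),       # sign is the MSB of the result
        "Z": int(f == 0),
        "V": carry_out ^ carry_in,   # assumed rule: V = c7 XOR c8
    }
    return f, flags


print(add_with_flags(0x70, 0x70))  # two positives overflow into the sign bit
print(add_with_flags(0xFF, 0x01))  # -1 + 1 = 0 with a carry out
```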

PROGRAM CONTROL INSTRUCTIONS

The next value of PC comes from one of two sources:
- PC + 1: In-line sequencing (next instruction is fetched from the next adjacent location in the memory)
- Address from another source (current instruction, stack, etc.): branch, conditional branch, subroutine, etc.

Program Control Instructions

Name             Mnemonic
Branch           BR
Jump             JMP
Skip             SKP
Call             CALL
Return           RTN
Compare (by -)   CMP
Test (by AND)    TST

* CMP and TST instructions do not retain the results of their operations (- and AND, respectively). They only set or clear certain flags.

CONDITIONAL BRANCH INSTRUCTIONS

Mnemonic  Branch condition            Tested condition
BZ        Branch if zero              Z = 1
BNZ       Branch if not zero          Z = 0
BC        Branch if carry             C = 1
BNC       Branch if no carry          C = 0
BP        Branch if plus              S = 0
BM        Branch if minus             S = 1
BV        Branch if overflow          V = 1
BNV       Branch if no overflow       V = 0

Unsigned compare conditions (A - B)
BHI       Branch if higher            A > B
BHE       Branch if higher or equal   A ≥ B
BLO       Branch if lower             A < B
BLOE      Branch if lower or equal    A ≤ B
BE        Branch if equal             A = B
BNE       Branch if not equal         A ≠ B

Signed compare conditions (A - B)
BGT       Branch if greater than      A > B
BGE       Branch if greater or equal  A ≥ B
BLT       Branch if less than         A < B
BLE       Branch if less or equal     A ≤ B
BE        Branch if equal             A = B
BNE       Branch if not equal         A ≠ B

SUBROUTINE CALL AND RETURN

Subroutine Call (other names for the same instruction)
- Call subroutine
- Jump to subroutine
- Branch to subroutine
- Branch and save return address

Two Most Important Operations are Implied:
* Branch to the beginning of the subroutine
  - Same as the Branch or Conditional Branch
* Save the return address to get the address of the location in the calling program upon exit from the subroutine

Locations for storing the return address
- Fixed location in the subroutine (memory)
- Fixed location in memory
- In a processor register
- In a memory stack (most efficient way)

CALL
SP ← SP - 1
M[SP] ← PC
PC ← EA

RTN
PC ← M[SP]
SP ← SP + 1
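The CALL and RTN transfers with a memory stack can be sketched as two small functions. The function names, the dictionary used for the CPU state and the sample addresses are assumptions; the register transfers are the ones listed above. Nesting works because each CALL pushes its own return address.

```python
def call(cpu: dict, memory: dict, ea: int) -> None:
    """CALL: push the return address, then branch to the subroutine."""
    cpu["SP"] -= 1
    memory[cpu["SP"]] = cpu["PC"]   # PC already points past the CALL
    cpu["PC"] = ea


def rtn(cpu: dict, memory: dict) -> None:
    """RTN: pop the return address back into PC."""
    cpu["PC"] = memory[cpu["SP"]]
    cpu["SP"] += 1


cpu = {"PC": 101, "SP": 4001}   # hypothetical starting values
memory = {}
call(cpu, memory, 500)          # main program calls a subroutine at 500
cpu["PC"] += 3                  # the subroutine runs a few instructions
call(cpu, memory, 700)          # nested call from inside the subroutine
rtn(cpu, memory)                # back in the first subroutine
rtn(cpu, memory)                # back in the main program
print(cpu)
```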

PROGRAM INTERRUPT

External interrupts
External interrupts are initiated from outside the CPU and memory
 - I/O device → data transfer request or data transfer complete
 - Timing device → timeout
 - Power failure
 - Operator

Internal interrupts (traps)
Internal interrupts are caused by the currently running program
 - Register, stack overflow
 - Divide by zero
 - OP-code violation
 - Protection violation

Software interrupts
Both external and internal interrupts are initiated by the computer hardware. Software interrupts are initiated by executing an instruction.
 - Supervisor call → switching from user mode to supervisor mode → allows execution of a certain class of operations which are not allowed in user mode

INTERRUPT PROCEDURE

Interrupt Procedure and Subroutine Call
- The interrupt is usually initiated by an internal or an external signal rather than from the execution of an instruction (except for the software interrupt)
- The address of the interrupt service program is determined by the hardware rather than from the address field of an instruction
- An interrupt procedure usually stores all the information necessary to define the state of the CPU rather than storing only the PC

The state of the CPU is determined from:
- Content of the PC
- Content of all processor registers
- Content of status bits
There are many ways of saving the CPU state, depending on the CPU architecture.

COMPLEX INSTRUCTION SET COMPUTER
• These computers with many instructions and addressing modes came to be known as Complex Instruction Set Computers (CISC)
• One goal for CISC machines was to have a machine language instruction to match each high-level language statement type

VARIABLE LENGTH INSTRUCTIONS
• The large number of instructions and addressing modes led CISC machines to have variable length instruction formats
• The large number of instructions means a greater number of bits to specify them
• In order to manage this large number of opcodes efficiently, they were encoded with different lengths:
    – More frequently used instructions were encoded using short opcodes.
    – Less frequently used ones were assigned longer opcodes.
• Also, multiple operand instructions could specify different addressing modes for each operand
    – For example,
        » Operand 1 could be a directly addressed register,
        » Operand 2 could be an indirectly addressed memory location,
        » Operand 3 (the destination) could be an indirectly addressed register.
• All of this led to the need to have different length instructions in different situations, depending on the opcode and operands used
• For example, an instruction that only specifies register operands may only be two bytes in length
    – One byte to specify the instruction and addressing mode
    – One byte to specify the source and destination registers.
• An instruction that specifies memory addresses for operands may need five bytes
    – One byte to specify the instruction and addressing mode
    – Two bytes to specify each memory address
        » Maybe more if there’s a large amount of memory.
• Variable length instructions greatly complicate the fetch and decode problem for a processor
• The circuitry to recognize the various instructions and to properly fetch the required number of bytes for operands is very complex
• Another characteristic of CISC computers is that they have instructions that act directly on memory addresses
    – For example, ADD L1, L2, L3 takes the contents of M[L1], adds it to the contents of M[L2] and stores the result in location M[L3]
• An instruction like this takes three memory access cycles to execute
• That makes for a potentially very long instruction execution cycle
• The problems with CISC computers are
    – The complexity of the design may slow down the processor,
    – The complexity of the design may result in costly errors in the processor design and implementation,
    – Many of the instructions and addressing modes are used rarely, if ever

SUMMARY OF CISC FEATURES
- Many instruction formats, lengths, and addressing modes
  → Complicated instruction cycle control due to the complex decoding HW and decoding process
- Multiple memory cycle instructions
  → Operations on memory data
  → Multiple memory accesses/instruction
- Microprogrammed control is a necessity
  → Microprogram control storage takes a substantial portion of CPU chip area
  → Semantic gap is large between machine instruction and microinstruction
- General purpose instruction set includes all the features required by individually different applications
  → When any one application is running, all the features required by the other applications are an extra burden to that application

REDUCED INSTRUCTION SET COMPUTERS
• In the late '70s and early '80s there was a reaction to the shortcomings of the CISC style of processors
• Reduced Instruction Set Computers (RISC) were proposed as an alternative
• The underlying idea behind RISC processors is to simplify the instruction set and reduce instruction execution time
• RISC processors often feature:
  – Few instructions
  – Few addressing modes
  – Only load and store instructions access memory
  – All other operations are done using on-processor registers
  – Fixed length instructions
  – Single cycle execution of instructions
  – The control unit is hardwired, not microprogrammed
• Since all but the load and store instructions use only registers for operands, only a few addressing modes are needed
• By having all instructions the same length, reading them in is easy and fast
• The fetch and decode stages are simple, looking much more like Mano's Basic Computer than a CISC machine
• The instruction and address formats are designed to be easy to decode
• Unlike the variable length CISC instructions, the opcode and register fields of RISC instructions can be decoded simultaneously
• The control logic of a RISC processor is designed to be simple and fast
• The control logic is simple because of the small number of instructions and the simple addressing modes
• The control logic is hardwired, rather than microprogrammed, because hardwired control is faster

UNIT -3

COMPARISON OF CONTROL UNIT IMPLEMENTATIONS

Control Unit Implementation

Combinational Logic Circuits (Hard-wired)
[Figure: block diagram with IR, status F/Fs, control data, the control unit's state (timing state, instruction cycle state), combinational logic circuits, and the control points of the CPU and memory.]

Microprogram
[Figure: block diagram with IR, status F/Fs, control data, next address generation logic, CSAR, control storage (microprogram memory), CSDR, decoder (D), and the control points (CPs) of the CPU and memory.]

TERMINOLOGY

Microprogram
  - Program stored in memory that generates all the control signals required to execute the instruction set correctly
  - Consists of microinstructions

Microinstruction
  - Contains a control word and a sequencing word
      Control Word - All the control information required for one clock cycle
      Sequencing Word - Information needed to decide the next microinstruction address
  - Vocabulary to write a microprogram

Control Memory (Control Storage: CS)
  - Storage in the microprogrammed control unit to store the microprogram

Writeable Control Memory (Writeable Control Storage: WCS)
  - CS whose contents can be modified
    -> Allows the microprogram to be changed
    -> Instruction set can be changed or modified

Dynamic Microprogramming
  - Computer system whose control unit is implemented with a microprogram in WCS
  - Microprogram can be changed by a systems programmer or a user

Sequencer (Microprogram Sequencer)

MICROINSTRUCTION SEQUENCING

Sequencing Capabilities Required in a Control Storage
- Incrementing of the control address register
- Unconditional and conditional branches
- A mapping process from the bits of the machine instruction to an address for control memory
- A facility for subroutine call and return

[Figure: selection of the address for control memory. Blocks: instruction code, mapping logic, status bits, branch logic, MUX select, multiplexers, subroutine register (SBR), control address register (CAR), incrementer, and control memory (ROM), whose output supplies the microoperations, the status-bit select field, and the branch address.]

CONDITIONAL BRANCH

Conditional Branch
  If Condition is true, then Branch (address from the next address field of the current microinstruction), else Fall Through
  Conditions to Test: O (overflow), N (negative), Z (zero), C (carry), etc.

Unconditional Branch
  Fixing the value of one status bit at the input of the multiplexer to 1

[Figure: the condition select field drives a MUX over the status (condition) bits; the MUX output chooses between Load address and Increment for the control address register, which addresses control memory. Control memory outputs the micro-operations, condition select, and next address fields.]

A microprogram sequencer is the part of the microprogram control unit that determines the address of the microinstruction to be executed in the next clock cycle. It supports:
- In-line Sequencing
- Branch
- Conditional Branch
- Subroutine
- Loop
- Instruction OP-code mapping

MAPPING OF INSTRUCTIONS

Direct Mapping: the OP-code is used directly as the control storage address

  Instruction   OP-code   Control storage address   Contents
  ADD           0000      0000                      ADD Routine
  AND           0001      0001                      AND Routine
  LDA           0010      0010                      LDA Routine
  STA           0011      0011                      STA Routine
  BUN           0100      0100                      BUN Routine

Mapping Bits 10 xxxx 010: the OP-code replaces xxxx

  Instruction   Address       Contents
  ADD           10 0000 010   ADD Routine
  AND           10 0001 010   AND Routine
  LDA           10 0010 010   LDA Routine
  STA           10 0011 010   STA Routine
  BUN           10 0100 010   BUN Routine

MAPPING OF INSTRUCTIONS TO MICROROUTINES

Mapping from the OP-code of an instruction to the address of the microinstruction which is the starting microinstruction of its execution microprogram

  Machine instruction:       OP-code 1 0 1 1, followed by the address field
  Mapping bits:              0 x x x x 0 0
  Microinstruction address:  0 1 0 1 1 0 0

Mapping function implemented by ROM or PLA

  OP-code -> Mapping memory (ROM or PLA) -> Control address register -> Control Memory
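The mapping bits 0 x x x x 0 0 amount to shifting the 4-bit OP-code left by two positions. A minimal sketch of that rule follows; the function name `map_opcode` is an assumption made for illustration and does not come from the notes.

```python
def map_opcode(opcode: int) -> int:
    """Map a 4-bit OP-code to a 7-bit control memory address 0xxxx00."""
    if not 0 <= opcode <= 0b1111:
        raise ValueError("OP-code must fit in 4 bits")
    # The OP-code lands in address bits 2-5; bits 0, 1 and 6 stay 0.
    return opcode << 2


if __name__ == "__main__":
    # OP-code 1011 maps to microinstruction address 0101100.
    print(format(map_opcode(0b1011), "07b"))
```

Because each OP-code is multiplied by four, every routine gets four consecutive control words before the next routine starts.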

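MICROPROGRAM EXAMPLE

Computer Configuration

[Figure: computer hardware configuration. Registers: AR (bits 10-0), PC (bits 10-0), DR (bits 15-0), AC (bits 15-0), joined through multiplexers and an arithmetic logic and shift unit to a 2048 x 16 memory. Control unit: SBR (bits 6-0), CAR (bits 6-0), and a 128 x 20 control memory.]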

MACHINE INSTRUCTION FORMAT

Machine instruction format

  Bit 15   Bits 14-11   Bits 10-0
  I        Opcode       Address

Sample machine instructions

  Symbol     OP-code   Description
  ADD        0000      AC <- AC + M[EA]
  BRANCH     0001      if (AC < 0) then (PC <- EA)
  STORE      0010      M[EA] <- AC
  EXCHANGE   0011      AC <- M[EA], M[EA] <- AC

EA is the effective address

Microinstruction Format

  Field   F1   F2   F3   CD   BR   AD
  Bits    3    3    3    2    2    7

F1, F2, F3: Microoperation fields
CD: Condition for branching
BR: Branch field
AD: Address field

MICROINSTRUCTION FIELD DESCRIPTIONS - F1, F2, F3

  F1    Microoperation      Symbol
  000   None                NOP
  001   AC <- AC + DR       ADD
  010   AC <- 0             CLRAC
  011   AC <- AC + 1        INCAC
  100   AC <- DR            DRTAC
  101   AR <- DR(0-10)      DRTAR
  110   AR <- PC            PCTAR
  111   M[AR] <- DR         WRITE

  F2    Microoperation      Symbol
  000   None                NOP
  001   AC <- AC - DR       SUB
  010   AC <- AC ∨ DR       OR
  011   AC <- AC ∧ DR       AND
  100   DR <- M[AR]         READ
  101   DR <- AC            ACTDR
  110   DR <- DR + 1        INCDR
  111   DR(0-10) <- PC      PCTDR

  F3    Microoperation      Symbol
  000   None                NOP
  001   AC <- AC ⊕ DR       XOR
  010   AC <- AC'           COM
  011   AC <- shl AC        SHL
  100   AC <- shr AC        SHR
  101   PC <- PC + 1        INCPC
  110   PC <- AR            ARTPC
  111   Reserved

MICROINSTRUCTION FIELD DESCRIPTIONS - CD, BR

  CD   Condition    Symbol   Comments
  00   Always = 1   U        Unconditional branch
  01   DR(15)       I        Indirect address bit
  10   AC(15)       S        Sign bit of AC
  11   AC = 0       Z        Zero value in AC

  BR   Symbol   Function
  00   JMP      CAR <- AD if condition = 1
                CAR <- CAR + 1 if condition = 0
  01   CALL     CAR <- AD, SBR <- CAR + 1 if condition = 1
                CAR <- CAR + 1 if condition = 0
  10   RET      CAR <- SBR (Return from subroutine)
  11   MAP      CAR(2-5) <- DR(11-14), CAR(0,1,6) <- 0

SYMBOLIC MICROINSTRUCTIONS

• Symbols are used in microinstructions as in assembly language
• A symbolic microprogram can be translated into its binary equivalent by a microprogram assembler.
• Sample Format
    five fields: label; micro-ops; CD; BR; AD

  Label:     may be empty or may specify a symbolic address terminated with a colon
  Micro-ops: consists of one, two, or three symbols separated by commas
  CD:        one of {U, I, S, Z}, where
               U: Unconditional Branch
               I: Indirect address bit
               S: Sign of AC
               Z: Zero value in AC
  BR:        one of {JMP, CALL, RET, MAP}
  AD:        one of {Symbolic address, NEXT, empty}

SYMBOLIC MICROPROGRAM - FETCH ROUTINE

During FETCH, read an instruction from memory, decode the instruction, and update PC

Sequence of microoperations in the fetch cycle:

  AR <- PC
  DR <- M[AR], PC <- PC + 1
  AR <- DR(0-10), CAR(2-5) <- DR(11-14), CAR(0,1,6) <- 0

Symbolic microprogram for the fetch cycle:

            ORG 64
  FETCH:    PCTAR          U   JMP   NEXT
            READ, INCPC    U   JMP   NEXT
            DRTAR          U   MAP

Binary equivalents translated by an assembler

  Binary address   F1    F2    F3    CD   BR   AD
  1000000          110   000   000   00   00   1000001
  1000001          000   100   101   00   00   1000010
  1000010          101   000   000   00   11   0000000

SYMBOLIC MICROPROGRAM

Control Storage: 128 20-bit words
  The first 64 words: Routines for the 16 machine instructions
  The last 64 words: Used for other purposes (e.g., fetch routine and other subroutines)
  Mapping: OP-code XXXX into 0XXXX00; the first addresses of the 16 routines are 0 (0 0000 00), 4 (0 0001 00), 8, 12, 16, 20, ..., 60

Partial Symbolic Microprogram

  Label       Microops        CD   BR     AD
              ORG 0
  ADD:        NOP             I    CALL   INDRCT
              READ            U    JMP    NEXT
              ADD             U    JMP    FETCH

              ORG 4
  BRANCH:     NOP             S    JMP    OVER
              NOP             U    JMP    FETCH
  OVER:       NOP             I    CALL   INDRCT
              ARTPC           U    JMP    FETCH

              ORG 8
  STORE:      NOP             I    CALL   INDRCT
              ACTDR           U    JMP    NEXT
              WRITE           U    JMP    FETCH

              ORG 12
  EXCHANGE:   NOP             I    CALL   INDRCT
              READ            U    JMP    NEXT
              ACTDR, DRTAC    U    JMP    NEXT
              WRITE           U    JMP    FETCH

              ORG 64
  FETCH:      PCTAR           U    JMP    NEXT
              READ, INCPC     U    JMP    NEXT
              DRTAR           U    MAP
  INDRCT:     READ            U    JMP    NEXT
              DRTAR           U    RET
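The translation a microprogram assembler performs can be sketched with the field tables given earlier. This is only an illustrative sketch: the names `assemble` and `show` are assumptions, the AD field is passed as a number rather than resolved from a label, and the bit layout assumed is F1 (3 bits), F2 (3), F3 (3), CD (2), BR (2), AD (7), from most to least significant.

```python
# Field encodings copied from the F1, F2, F3, CD and BR tables.
F1 = {"NOP": 0, "ADD": 1, "CLRAC": 2, "INCAC": 3, "DRTAC": 4, "DRTAR": 5, "PCTAR": 6, "WRITE": 7}
F2 = {"NOP": 0, "SUB": 1, "OR": 2, "AND": 3, "READ": 4, "ACTDR": 5, "INCDR": 6, "PCTDR": 7}
F3 = {"NOP": 0, "XOR": 1, "COM": 2, "SHL": 3, "SHR": 4, "INCPC": 5, "ARTPC": 6}
CD = {"U": 0, "I": 1, "S": 2, "Z": 3}
BR = {"JMP": 0, "CALL": 1, "RET": 2, "MAP": 3}


def assemble(microops, cd, br, ad=0):
    """Pack one symbolic microinstruction into a 20-bit control word."""
    fields = [0, 0, 0]
    for op in microops:
        if op == "NOP":
            continue  # NOP is code 000 in every field
        for i, table in enumerate((F1, F2, F3)):
            if op in table:
                if fields[i]:
                    raise ValueError("two microoperations use the same field")
                fields[i] = table[op]
                break
        else:
            raise ValueError(f"unknown microoperation {op}")
    return (fields[0] << 17) | (fields[1] << 14) | (fields[2] << 11) \
        | (CD[cd] << 9) | (BR[br] << 7) | (ad & 0x7F)


def show(word):
    """Format a control word as 'F1 F2 F3 CD BR AD' bit groups."""
    b = format(word, "020b")
    return " ".join([b[0:3], b[3:6], b[6:9], b[9:11], b[11:13], b[13:20]])


if __name__ == "__main__":
    print(show(assemble(["PCTAR"], "U", "JMP", 65)))          # FETCH, address 64
    print(show(assemble(["READ", "INCPC"], "U", "JMP", 66)))  # address 65
    print(show(assemble(["DRTAR"], "U", "MAP")))              # address 66
```

A real assembler would also resolve NEXT and symbolic labels such as FETCH or INDRCT to addresses; here the caller supplies the numeric address.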

DESIGN OF CONTROL UNIT - DECODING ALU CONTROL INFORMATION -

[Figure: the microoperation fields F1, F2 and F3 each feed a 3 x 8 decoder (outputs 7-0). Decoder outputs such as ADD, AND and DRTAC go to the arithmetic logic and shift unit and the AC load input; PCTAR and DRTAR drive the select input of the multiplexers (from PC or from DR(0-10)) and the AR load input. AC, DR and AR are clocked.]

BINARY MICROPROGRAM

This microprogram can be implemented using ROM

  Micro      Address              Binary Microinstruction
  Routine    Decimal   Binary     F1    F2    F3    CD   BR   AD
  ADD        0         0000000    000   000   000   01   01   1000011
             1         0000001    000   100   000   00   00   0000010
             2         0000010    001   000   000   00   00   1000000
             3         0000011    000   000   000   00   00   1000000
  BRANCH     4         0000100    000   000   000   10   00   0000110
             5         0000101    000   000   000   00   00   1000000
             6         0000110    000   000   000   01   01   1000011
             7         0000111    000   000   110   00   00   1000000
  STORE      8         0001000    000   000   000   01   01   1000011
             9         0001001    000   101   000   00   00   0001010
             10        0001010    111   000   000   00   00   1000000
             11        0001011    000   000   000   00   00   1000000
  EXCHANGE   12        0001100    000   000   000   01   01   1000011
             13        0001101    000   100   000   00   00   0001110
             14        0001110    100   101   000   00   00   0001111
             15        0001111    111   000   000   00   00   1000000
  FETCH      64        1000000    110   000   000   00   00   1000001
             65        1000001    000   100   101   00   00   1000010
             66        1000010    101   000   000   00   11   0000000
  INDRCT     67        1000011    000   100   000   00   00   1000100
             68        1000100    101   000   000   00   10   0000000

MICROPROGRAM SEQUENCER - NEXT MICROINSTRUCTION ADDRESS LOGIC -

MUX-1 selects an address from one of four sources and routes it into the CAR
  - In-Line Sequencing: CAR + 1
  - Branch, Subroutine Call: CS(AD)
  - Return from Subroutine: Output of SBR
  - New Machine Instruction: MAP

Address source selection

  S1 S0   Address Source
  00      CAR + 1, In-Line
  01      SBR, RETURN
  10      CS(AD), Branch or CALL
  11      MAP

[Figure: MUX 1 (inputs 3-0, select lines S1 S0) feeds the clocked CAR, which addresses control storage. Its inputs are External (MAP), SBR (return from subroutine), the branch/CALL address from control storage, and the incrementer (in-line); L loads SBR on a subroutine CALL.]

MICROPROGRAM SEQUENCER - CONDITION AND BRANCH CONTROL -

[Figure: MUX 2 selects one of the inputs 1, I, S, Z (from the CPU) under control of the CD field of CS and produces the test bit T. The input logic takes I0, I1 (the BR field of CS) and T, and produces S1, S0 for next address selection and L (load SBR with CAR + 1) for subroutine CALL.]

Input Logic

  I0 I1 T   Meaning   Source of Address           S1 S0   L
  0  0  0   In-Line   CAR + 1                     00      0
  0  0  1   JMP       CS(AD)                      10      0
  0  1  0   In-Line   CAR + 1                     00      0
  0  1  1   CALL      CS(AD) and SBR <- CAR + 1   10      1
  1  0  x   RET       SBR                         01      0
  1  1  x   MAP       DR(11-14)                   11      0

  S0 = I0
  S1 = I0 I1 + I0' T
  L  = I0' I1 T
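The three input-logic equations can be checked against the truth table with a few lines of code. This is a minimal sketch; the function name `input_logic` and its return order (S1, S0, L) are choices made here for illustration.

```python
def input_logic(i0: int, i1: int, t: int):
    """Return (S1, S0, L) for the sequencer input logic.

    S0 = I0
    S1 = I0.I1 + I0'.T
    L  = I0'.I1.T
    """
    not_i0 = 1 - i0
    s0 = i0
    s1 = (i0 & i1) | (not_i0 & t)
    load_sbr = not_i0 & i1 & t
    return s1, s0, load_sbr


if __name__ == "__main__":
    for i0 in (0, 1):
        for i1 in (0, 1):
            for t in (0, 1):
                s1, s0, l = input_logic(i0, i1, t)
                print(f"I0={i0} I1={i1} T={t} -> S1S0={s1}{s0} L={l}")
```

Each printed row can be compared with the corresponding table row; the don't-care rows (RET, MAP) give the same outputs for T = 0 and T = 1.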

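MICROPROGRAM SEQUENCER

[Figure: complete microprogram sequencer. MUX 2 selects one of 1, I, S, Z as the test bit T; the input logic combines I0, I1 and T into S1, S0 and L; MUX 1 (inputs 3-0) selects among External (MAP), SBR, the AD field, and the incrementer output to load the clocked CAR; CAR addresses the control memory, whose word holds the Microops, CD, BR and AD fields.]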

MICROINSTRUCTION FORMAT

Information in a Microinstruction
  - Control Information
  - Sequencing Information
  - Constant Information which is useful when feeding into the system

This information needs to be organized in some way for
  - Efficient use of the microinstruction bits
  - Fast decoding

Field Encoding
  - Encoding the microinstruction bits
  - Encoding slows down the execution speed due to the decoding delay
  - Encoding also reduces the flexibility due to the decoding hardware

HORIZONTAL AND VERTICAL MICROINSTRUCTION FORMAT

Horizontal Microinstructions
  Each bit directly controls each micro-operation or each control point
  Horizontal implies a long microinstruction word
  Advantages: Can control a variety of components operating in parallel
    --> Advantage of efficient hardware utilization
  Disadvantages: Control word bits are not fully utilized
    --> CS becomes large --> Costly

Vertical Microinstructions
  A microinstruction format that is not horizontal
  Vertical implies a short microinstruction word
  Encoded microinstruction fields
    --> Needs decoding circuits for one or two levels of decoding

[Figure: one-level decoding. Field A (2 bits) feeds a 2 x 4 decoder (1 of 4) and Field B (3 bits) feeds a 3 x 8 decoder (1 of 8).]

[Figure: two-level decoding. Field A (2 bits) feeds a 2 x 4 decoder and Field B (6 bits) feeds a 6 x 64 decoder, followed by decoder and selection logic.]

UNIT -4

ARITHMETIC AND LOGIC UNIT

[Figure: ALU Inputs and Outputs]

Integer Representation
• Only have 0 & 1 to represent everything
• Positive numbers stored in binary
  – e.g. 41 = 00101001
• No minus sign
• No period
• Sign-Magnitude
• Two's complement

Sign-Magnitude
• Left most bit is sign bit
• 0 means positive
• 1 means negative
• +18 = 00010010
• -18 = 10010010
• Problems
  – Need to consider both sign and magnitude in arithmetic
  – Two representations of zero (+0 and -0)

Two's Complement
• +3 = 00000011
• +2 = 00000010
• +1 = 00000001
• +0 = 00000000
• -1 = 11111111
• -2 = 11111110
• -3 = 11111101

Benefits
• One representation of zero
• Arithmetic works easily (see later)
• Negating is fairly easy
  – 3 = 00000011
  – Boolean complement gives 11111100
  – Add 1 to LSB 11111101

Range of Numbers
• 8 bit 2s complement
  – +127 = 01111111 = 2^7 - 1
  – -128 = 10000000 = -2^7
• 16 bit 2s complement
  – +32767 = 01111111 11111111 = 2^15 - 1
  – -32768 = 10000000 00000000 = -2^15

Conversion Between Lengths
• Positive numbers: pack with leading zeros
• +18 = 00010010
• +18 = 00000000 00010010
• Negative numbers: pack with leading ones
• -18 = 11101110
• -18 = 11111111 11101110
• i.e. pack with MSB (sign bit)
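The negation rule (complement, then add 1) and the sign-extension rule (pack with the MSB) can be tried out directly. The sketch below works on bit strings for readability; the helper names `to_twos_complement`, `negate` and `sign_extend` are assumptions introduced for illustration.

```python
def to_twos_complement(value: int, bits: int) -> str:
    """Return the two's complement bit pattern of value in the given width."""
    if not -(1 << (bits - 1)) <= value < (1 << (bits - 1)):
        raise ValueError("value out of range for this width")
    return format(value & ((1 << bits) - 1), f"0{bits}b")


def negate(pattern: str) -> str:
    """Negate a two's complement pattern: Boolean complement, then add 1."""
    bits = len(pattern)
    mask = (1 << bits) - 1
    complemented = int(pattern, 2) ^ mask
    return format((complemented + 1) & mask, f"0{bits}b")


def sign_extend(pattern: str, bits: int) -> str:
    """Extend a pattern to a longer width by repeating its sign bit (MSB)."""
    return pattern[0] * (bits - len(pattern)) + pattern


if __name__ == "__main__":
    print(negate("00000011"))                              # -3
    print(to_twos_complement(-18, 8))                      # -18 in 8 bits
    print(sign_extend(to_twos_complement(-18, 8), 16))     # -18 in 16 bits
```

Negating the most negative value (for example 10000000 in 8 bits) returns the same pattern, which reflects the asymmetric range -2^7 to 2^7 - 1.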

Addition and Subtraction
• Normal binary addition
• Monitor sign bit for overflow
• Take two's complement of the subtrahend and add to the minuend
  – i.e. a - b = a + (-b)
• So we only need addition and complement circuits

Multiplication
• Complex
• Work out partial product for each digit
• Take care with place value (column)
• Add partial products

Multiplication Example

        1011     Multiplicand (11 dec)
      x 1101     Multiplier (13 dec)
      ------
        1011     Partial products
       0000      Note: if the multiplier bit is 1, copy the
      1011       multiplicand (place value); otherwise zero
     1011
    --------
    10001111     Product (143 dec)

• Note: need double length result

Flowchart for Unsigned Binary Multiplication
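Since the flowchart itself is not reproduced here, the following is a hedged sketch of the usual shift-and-add scheme for unsigned multiplication, using a carry bit C, an accumulator A, the multiplier register Q and the multiplicand register M. The register names and the function name `multiply_unsigned` are assumptions, not taken from the missing figure.

```python
def multiply_unsigned(multiplicand: int, multiplier: int, n: int) -> int:
    """Multiply two n-bit unsigned integers with the shift-and-add method."""
    mask = (1 << n) - 1
    a, q, m = 0, multiplier & mask, multiplicand & mask
    for _ in range(n):
        c = 0
        if q & 1:                      # multiplier bit is 1: add multiplicand
            total = a + m
            c = total >> n             # carry out of the n-bit addition
            a = total & mask
        # Shift C, A, Q right by one bit as a single register.
        q = (q >> 1) | ((a & 1) << (n - 1))
        a = (a >> 1) | (c << (n - 1))
    return (a << n) | q                # double-length product in A,Q


if __name__ == "__main__":
    product = multiply_unsigned(0b1011, 0b1101, 4)
    print(format(product, "08b"), product)
```

The product occupies 2n bits, which is why the result is read from A and Q together.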

Multiplying Negative Numbers
• This does not work!
• Solution 1
  – Convert to positive if required
  – Multiply as above
  – If signs were different, negate answer
• Solution 2
  – Booth's algorithm

Booth’s Algorithm
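The notes give only the heading, so the following is a sketch of the standard textbook form of Booth's algorithm, not a transcription of the missing figure. It assumes n-bit two's complement operands, registers A, Q, Q-1 and M, and that the multiplicand is not the most negative n-bit value (A is only n bits wide here, so that case overflows). The function name `booth_multiply` is an assumption.

```python
def booth_multiply(multiplicand: int, multiplier: int, n: int) -> int:
    """Multiply two n-bit two's complement integers using Booth's algorithm."""
    mask = (1 << n) - 1
    sign = 1 << (n - 1)
    a, q, q_minus1 = 0, multiplier & mask, 0
    m = multiplicand & mask
    for _ in range(n):
        pair = (q & 1, q_minus1)
        if pair == (1, 0):             # start of a run of 1s: A <- A - M
            a = (a - m) & mask
        elif pair == (0, 1):           # end of a run of 1s: A <- A + M
            a = (a + m) & mask
        # Arithmetic shift right of A, Q, Q-1 (sign bit of A is preserved).
        q_minus1 = q & 1
        q = (q >> 1) | ((a & 1) << (n - 1))
        a = (a >> 1) | (a & sign)
    product = (a << n) | q
    if product & (1 << (2 * n - 1)):   # interpret the 2n-bit result as signed
        product -= 1 << (2 * n)
    return product


if __name__ == "__main__":
    print(booth_multiply(7, 3, 4))
    print(booth_multiply(-7, 3, 4))
```

Runs of consecutive 1s in the multiplier cost one subtraction and one addition regardless of their length, which is the main attraction of the method.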

Division
• More complex than multiplication
• Negative numbers are really bad!
• Based on long division

Division of Unsigned Binary Integers
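Long division in binary is commonly implemented as restoring division: shift the A,Q register pair left, subtract the divisor, and restore A if the result is negative. The sketch below follows that common scheme under the assumption of n-bit unsigned operands; the name `divide_unsigned` and the register names are illustrative choices.

```python
def divide_unsigned(dividend: int, divisor: int, n: int):
    """Restoring division of n-bit unsigned integers; returns (quotient, remainder)."""
    mask = (1 << n) - 1
    a, q, m = 0, dividend & mask, divisor & mask
    if m == 0:
        raise ZeroDivisionError("divisor is zero")
    for _ in range(n):
        # Shift A,Q left by one bit: the MSB of Q moves into A.
        a = (a << 1) | (q >> (n - 1))
        q = (q << 1) & mask
        a -= m                         # trial subtraction
        if a < 0:
            a += m                     # restore; quotient bit stays 0
        else:
            q |= 1                     # subtraction succeeded; quotient bit is 1
    return q, a


if __name__ == "__main__":
    print(divide_unsigned(7, 3, 4))
    print(divide_unsigned(143, 11, 8))
```

After n steps Q holds the quotient and A holds the remainder.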

Real Numbers
• Numbers with fractions
• Could be done in pure binary
  – 1001.1010 = 2^3 + 2^0 + 2^-1 + 2^-3 = 9.625
• Where is the binary point?
• Fixed?
  – Very limited
• Moving?
  – How do you show where it is?

Floating Point
• +/- .significand x 2^exponent
• Misnomer
• Point is actually fixed between sign bit and body of mantissa
• Exponent indicates place value (point position)

Signs for Floating Point
• Mantissa is stored in 2s complement
• Exponent is in excess or biased notation
  – e.g. Excess (bias) 128 means
  – 8 bit exponent field
  – Pure value range 0-255
  – Subtract 128 to get correct value
  – Range -128 to +127

Normalization
• FP numbers are usually normalized
• i.e. exponent is adjusted so that leading bit (MSB) of mantissa is 1
• Since it is always 1 there is no need to store it
• (c.f. Scientific notation where numbers are normalized to give a single digit before the decimal point, e.g. 3.123 x 10^3)

Floating Point Arithmetic +/-
• Check for zeros
• Align significands (adjusting exponents)
• Add or subtract significands
• Normalize result

Floating Point Addition & Subtraction Flowchart

FP Arithmetic x/÷
• Check for zero
• Add/subtract exponents
• Multiply/divide significands (watch sign)
• Normalize
• Round
• All intermediate results should be in double length storage

Floating Point Multiplication

Floating point Division

Unit-5

MEMORY HIERARCHY

The purpose of the memory hierarchy is to obtain the highest possible access speed while minimizing the total cost of the memory system

[Figure: magnetic tapes and magnetic disks (auxiliary memory) connect through an I/O processor to main memory; the CPU accesses main memory through cache memory.]

Levels of the hierarchy, from fastest to slowest: Register, Cache, Main Memory, Magnetic Disk, Magnetic Tape

MAIN MEMORY

RAM and ROM Chips

Typical RAM chip

[Figure: 128 x 8 RAM with inputs Chip select 1 (CS1), Chip select 2 (CS2), Read (RD), Write (WR), a 7-bit address (AD 7), and an 8-bit data bus.]

  CS1  CS2  RD  WR   Memory function   State of data bus
  0    0    x   x    Inhibit           High-impedance
  0    1    x   x    Inhibit           High-impedance
  1    0    0   0    Inhibit           High-impedance
  1    0    0   1    Write             Input data to RAM
  1    0    1   x    Read              Output data from RAM
  1    1    x   x    Inhibit           High-impedance

Typical ROM chip

[Figure: 512 x 8 ROM with inputs Chip select 1 (CS1), Chip select 2 (CS2), a 9-bit address (AD 9), and an 8-bit data bus.]

MEMORY ADDRESS MAP

Address space assignment to each memory chip

Example: 512 bytes RAM and 512 bytes ROM

                                Address bus
  Component   Hexa address      10  9  8  7  6  5  4  3  2  1
  RAM 1       0000 - 007F       0   0  0  x  x  x  x  x  x  x
  RAM 2       0080 - 00FF       0   0  1  x  x  x  x  x  x  x
  RAM 3       0100 - 017F       0   1  0  x  x  x  x  x  x  x
  RAM 4       0180 - 01FF       0   1  1  x  x  x  x  x  x  x
  ROM         0200 - 03FF       1   x  x  x  x  x  x  x  x  x

Memory Connection to CPU
- RAM and ROM chips are connected to a CPU through the data and address buses
- The low-order lines in the address bus select the byte within the chips, and other lines in the address bus select a particular chip through its chip select inputs
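The address map can be read as a small decoding rule: line 10 separates RAM from ROM, lines 9 and 8 pick one of the four RAM chips, and the remaining low-order lines address a byte inside the chip. A minimal sketch, with the function name `decode_address` chosen here for illustration:

```python
def decode_address(addr: int):
    """Return (component, offset within the chip) for a 10-bit address."""
    if not 0 <= addr <= 0x3FF:
        raise ValueError("address must fit in 10 bits")
    if addr & 0x200:                   # address line 10 = 1 selects the ROM
        return "ROM", addr & 0x1FF     # lines 9-1 address 512 bytes
    chip = (addr >> 7) & 0b11          # lines 9 and 8 select the RAM chip
    return f"RAM {chip + 1}", addr & 0x7F  # lines 7-1 address 128 bytes


if __name__ == "__main__":
    for a in (0x0000, 0x0080, 0x017F, 0x01FF, 0x0200, 0x03FF):
        print(format(a, "04X"), decode_address(a))
```

In hardware the same selection is done by a decoder on lines 9 and 8 and by routing line 10 to the chip select inputs.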

CONNECTION OF MEMORY TO CPU

[Figure: four 128 x 8 RAM chips (RAM 1 to RAM 4) and one 512 x 8 ROM connected to the CPU. Address lines 1-7 go to the AD7 inputs of each RAM; lines 8 and 9 drive a decoder whose outputs 0-3 select one RAM chip; line 10 distinguishes RAM from ROM; lines 1-9 go to the AD9 inputs of the ROM; address lines 16-11 are unused. RD, WR and the data bus are common to all chips.]

AUXILIARY MEMORY

Information Organization on Magnetic Tapes

[Figure: a file on tape is a sequence of blocks (block 1, block 2, block 3) holding records R1 to R6, with IRG marks between blocks and an EOF mark at the end of file i.]

Organization of Disk Hardware

[Figure: disk surface with tracks, shown for a moving head disk and a fixed head disk.]

ASSOCIATIVE MEMORY

- Accessed by the content of the data rather than by an address
- Also called Content Addressable Memory (CAM)

Hardware Organization

[Figure: argument register (A) and key register (K) feed an associative memory array and logic of m words with n bits per word; the array has Input, Read and Write lines and a match register (M).]

- Compare each word in CAM in parallel with the content of A (Argument Register)
- If CAM Word[i] = A, M(i) = 1
- Read by sequentially accessing CAM for each Word(i) with M(i) = 1
- K (Key Register) provides a mask for choosing a particular field or key in the argument in A (only those bits in the argument that have 1's in their corresponding position of K are compared)
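The masked comparison can be modelled in a few lines: a word matches when it agrees with the argument in every bit position where the key holds a 1. The sketch is sequential, whereas a real CAM compares all words in parallel; the function name `cam_match` and the sample words are assumptions for illustration.

```python
def cam_match(words, argument: int, key: int):
    """Return the match register M as a list: M[i] = 1 if word i matches.

    Only bit positions where key has a 1 take part in the comparison.
    """
    return [1 if ((word ^ argument) & key) == 0 else 0 for word in words]


if __name__ == "__main__":
    argument = 0b101111100
    key = 0b111000000                  # compare only the three leftmost bits
    words = [0b100111100, 0b101000001]
    print(cam_match(words, argument, key))
```

With an all-ones key the whole word is compared; with an all-zeros key every word matches.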

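ORGANIZATION OF CAM

[Figure: m x n array of cells Cij (Word 1 to Word m, Bit 1 to Bit n). Column j receives argument bit Aj and key bit Kj; row i produces match bit Mi.]

Internal organization of a typical cell Cij

[Figure: cell Cij holds one bit Fij in an R-S flip-flop with Input, Write, Read and Output lines; match logic combines Aj, Kj and Fij and feeds Mi.]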

CACHE MEMORY

Locality of Reference
  - The references to memory in any given time interval tend to be confined within a localized area
  - This area contains a set of information, and the membership changes gradually as time goes by
  - Temporal Locality
      The information which will be used in the near future is likely to be in use already (e.g. reuse of information in loops)
  - Spatial Locality
      If a word is accessed, adjacent (near) words are likely to be accessed soon (e.g. related data items (arrays) are usually stored together; instructions are executed sequentially)

Cache
  - The property of Locality of Reference makes cache memory systems work
  - Cache is a fast, small-capacity memory that should hold the information which is most likely to be accessed

[Figure: Main memory - Cache memory - CPU]

PERFORMANCE OF CACHE

Memory Access

All memory accesses are directed first to the Cache
  If the word is in Cache: access the Cache to provide it to the CPU
  If the word is not in Cache: bring a block (or a line) including that word into the Cache, replacing a block now in the Cache

  - How can we know if the word that is required is there?
  - If a new block is to replace one of the old blocks, which one should we choose?

Performance of Cache Memory System

  Hit Ratio (h) - fraction of memory accesses satisfied by the Cache memory system
  Te: Effective memory access time in Cache memory system
  Tc: Cache access time
  Tm: Main memory access time

  Te = Tc + (1 - h) Tm

  Example: Tc = 0.4 μs, Tm = 1.2 μs, h = 0.85 (85%)
           Te = 0.4 + (1 - 0.85) * 1.2 = 0.58 μs
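The effective access time formula is easy to evaluate for different hit ratios. The sketch below simply encodes Te = Tc + (1 - h) Tm; the function name `effective_access_time` is an assumption, and the time unit is whatever unit Tc and Tm are given in.

```python
def effective_access_time(tc: float, tm: float, h: float) -> float:
    """Te = Tc + (1 - h) * Tm, with h the hit ratio as a fraction (0 to 1)."""
    if not 0.0 <= h <= 1.0:
        raise ValueError("hit ratio must be between 0 and 1")
    return tc + (1.0 - h) * tm


if __name__ == "__main__":
    for h in (0.80, 0.85, 0.90, 0.95):
        print(h, round(effective_access_time(0.4, 1.2, h), 3))
```

As h approaches 1 the effective time approaches the cache access time Tc, which is why even a small cache pays off when locality is good.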

MEMORY AND CACHE MAPPING - ASSOCIATIVE MAPPING -

Mapping Function
  Specification of correspondence between main memory blocks and cache blocks
  - Associative mapping
  - Direct mapping
  - Set-associative mapping

Associative Mapping
- Any block location in Cache can store any block in memory -> Most flexible
- Mapping Table is implemented in an associative memory -> Fast, very expensive
- Mapping Table stores both the address and the content of the memory word

CAM contents (octal); the 15-bit address is placed in the argument register

  Address   Data
  01000     3450
  02777     6710
  22235     1234

MEMORY AND CACHE MAPPING - DIRECT MAPPING -

Addressing Relationships

  Main memory: 32K x 12, Address = 15 bits, Data = 12 bits (octal addresses 00000 - 77777)
  Cache memory: 512 x 12, Address = 9 bits, Data = 12 bits (octal addresses 000 - 777)
  15-bit memory address = Tag (6 bits) + Index (9 bits)

Direct Mapping Cache Organization (octal)

  Memory address   Memory data
  00000            1220
  00777            2340
  01000            3450
  01777            4560
  02000            5670
  02777            6710

  Index address   Tag   Data
  000             00    1220
  777             02    6710

- Each memory block has only one place to load in Cache
- Mapping Table is made of RAM instead of CAM
- n-bit memory address consists of 2 parts: k bits of Index field and n-k bits of Tag field
- n-bit addresses are used to access main memory and the k-bit Index is used to access the Cache

DIRECT MAPPING

Operation
  - CPU generates a memory request with (TAG; INDEX)
  - Access Cache using INDEX; read (tag; data)
    Compare TAG and tag
  - If matches -> Hit
      Provide Cache[INDEX](data) to CPU
  - If not match -> Miss
      M[tag; INDEX] <- Cache[INDEX](data)
      Cache[INDEX] <- (TAG; M[TAG; INDEX])
      CPU <- Cache[INDEX](data)

Direct Mapping with block size of 8 words

  Address fields: Tag (6 bits), Block (6 bits), Word (3 bits); INDEX = Block + Word

  Block      Index   Tag   Data
  Block 0    000     01    3450
             007     01    6578
  Block 1    010
             017
  Block 63   770     02
             777     02    6710
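The direct-mapping operation can be simulated for reads with a block size of one word. This is a simplified sketch under stated assumptions: the class name `DirectMappedCache` is made up here, main memory is modelled as a dictionary, and because only reads are modelled the write-back step M[tag; INDEX] <- Cache[INDEX](data) is omitted.

```python
INDEX_BITS = 9                         # 512-word cache, as in the example


class DirectMappedCache:
    """Read-only model of a direct-mapped cache with one word per line."""

    def __init__(self, memory):
        self.memory = memory           # dict: address -> data word
        self.lines = {}                # index -> (tag, data)

    def read(self, address: int):
        """Return (data, hit) for a read of the given address."""
        tag = address >> INDEX_BITS
        index = address & ((1 << INDEX_BITS) - 1)
        line = self.lines.get(index)
        if line is not None and line[0] == tag:
            return line[1], True       # hit: tag stored at INDEX equals TAG
        data = self.memory[address]    # miss: fetch the word from main memory
        self.lines[index] = (tag, data)
        return data, False


if __name__ == "__main__":
    memory = {0o00000: 0o1220, 0o02000: 0o5670, 0o02777: 0o6710}
    cache = DirectMappedCache(memory)
    for addr in (0o00000, 0o00000, 0o02000, 0o00000):
        data, hit = cache.read(addr)
        print(format(addr, "05o"), format(data, "04o"), "hit" if hit else "miss")
```

Addresses 00000 and 02000 share index 000, so they evict each other even though the rest of the cache is empty; this conflict is the main weakness of direct mapping.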

MEMORY AND CACHE MAPPING - SET ASSOCIATIVE MAPPING -

Set Associative Mapping Cache with set size of two
- Each memory block has a set of locations in the Cache to load

  Index   Tag   Data   Tag   Data
  000     01    3450   02    5670
  777     02    6710   00    2340

Operation
  - CPU generates a memory address (TAG; INDEX)
  - Access Cache with INDEX (Cache word = (tag 0, data 0); (tag 1, data 1))
  - Compare TAG with tag 0 and then with tag 1
  - If tag i = TAG -> Hit, CPU <- data i
  - If tag i ≠ TAG -> Miss
      Replace either (tag 0, data 0) or (tag 1, data 1)
      Assume (tag 0, data 0) is selected for replacement
      (Why (tag 0, data 0) instead of (tag 1, data 1)?)
      M[tag 0, INDEX] <- Cache[INDEX](data 0)
      Cache[INDEX](tag 0, data 0) <- (TAG, M[TAG, INDEX])
      CPU <- Cache[INDEX](data 0)

BLOCK REPLACEMENT POLICY

Many different block replacement policies are available

LRU (Least Recently Used) is easy to implement when the set size is two

Implementation of LRU in the Set Associative Mapping with set size = 2

Modifications
  Cache word = (tag 0, data 0, U0); (tag 1, data 1, U1), Ui = 0 or 1 (binary)
  Ui = 1 marks entry i as the least recently used one

  Initially all U0 = U1 = 1
  When Hit to (tag 0, data 0, U0): U0 <- 0, U1 <- 1 (entry 1 is least recently used)
  When Hit to (tag 1, data 1, U1): U1 <- 0, U0 <- 1 (entry 0 is least recently used)
  When Miss, find the least recently used one (Ui = 1)
    If U0 = 1 and U1 = 0, then replace (tag 0, data 0)
      M[tag 0, INDEX] <- Cache[INDEX](data 0)
      Cache[INDEX](tag 0, data 0, U0) <- (TAG, M[TAG, INDEX], 0); U1 <- 1
    If U0 = 0 and U1 = 1, then replace (tag 1, data 1)
      Similar to above; U0 <- 1
    If U0 = U1 = 0, this condition does not exist
    If U0 = U1 = 1, both of them are candidates; take an arbitrary selection
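The U-bit bookkeeping for one two-entry set can be sketched as follows. Only tags are tracked (data and the write-back to memory are left out), the class name `TwoWaySet` is an assumption, and the arbitrary choice for U0 = U1 = 1 is resolved here by picking entry 0.

```python
class TwoWaySet:
    """One set of a two-way set associative cache with LRU bits U0, U1."""

    def __init__(self):
        self.tags = [None, None]
        self.u = [1, 1]                # Ui = 1: entry i is least recently used

    def access(self, tag) -> bool:
        """Access the set with the given tag; return True on a hit."""
        if tag in self.tags:
            way, hit = self.tags.index(tag), True
        else:
            # Miss: replace an entry whose U bit is 1 (entry 0 if both are 1).
            way, hit = (0 if self.u[0] == 1 else 1), False
            self.tags[way] = tag
        # The accessed entry becomes most recently used, the other one LRU.
        self.u[way], self.u[1 - way] = 0, 1
        return hit


if __name__ == "__main__":
    s = TwoWaySet()
    for tag in (1, 2, 1, 3, 2):
        print(tag, "hit" if s.access(tag) else "miss", s.tags, s.u)
```

With a set size of two a single bit per entry is enough; larger sets need more state to keep a full recency ordering.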

CACHE WRITE

Write Through

When writing into memory
  If Hit, both Cache and memory are written in parallel
  If Miss, only memory is written
  For a read miss, the missing block may be loaded over a cache block

Memory is always up to date -> Important when CPU and DMA I/O are both executing

Slow, due to the memory access time

Write-Back (Copy-Back)

When writing into memory
  If Hit, only the Cache is written
  If Miss, the missing block is brought into the Cache and the write is made into the Cache
  For a read miss, the candidate block must be written back to memory

Memory is not up to date, i.e., the same item in Cache and memory may have different values

VIRTUAL MEMORY

Give the programmer the illusion that the system has a very large memory, even though the computer actually has a relatively small main memory

Address Space (Logical) and Memory Space (Physical)

  virtual address (logical address)    physical address
  address space                        memory space
  address generated by programs        actual main memory address

Address Mapping
  Memory Mapping Table for Virtual Address -> Physical Address

[Figure: the virtual address register feeds the memory mapping table; the memory table buffer register supplies the physical address to the main memory address register, which addresses main memory; data passes through the main memory buffer register.]

ASSOCIATIVE MEMORY PAGE TABLE

Assume that
  Number of Blocks in memory = m
  Number of Pages in Virtual Address Space = n

Page Table
- Straightforward design -> n-entry table in memory
    Inefficient storage space utilization <- n-m entries of the table are empty
- More efficient method is an m-entry Page Table
    Page Table made of an Associative Memory
    m words; (Page Number : Block Number)

[Figure: the virtual address (page no. 101 plus line number) is placed in the argument register; the key register restricts the comparison to the page number field. Associative memory entries (page no., block no.): (001, 11), (010, 00), (101, 01), (110, 10).]

Page Fault
  Page number cannot be found in the Page Table

PAGE FAULT

Processor architecture should provide the ability to restart any instruction after a page fault.

1. Trap to the OS
2. Save the user registers and program state
3. Determine that the interrupt was a page fault
4. Check that the page reference was legal and determine the location of the page on the backing store (disk)
5. Issue a read from the backing store to a free frame
   a. Wait in a queue for this device until serviced
   b. Wait for the device seek and/or latency time
   c. Begin the transfer of the page to a free frame
6. While waiting, the CPU may be allocated to some other process
7. Interrupt from the backing store (I/O completed)
8. Save the registers and program state for the other user
9. Determine that the interrupt was from the backing store
10. Correct the page tables (the desired page is now in memory)
11. Wait for the CPU to be allocated to this process again
12. Restore the user registers, program state, and new page table, then resume the interrupted instruction.

[Figure: page fault handling. (1) LOAD M makes a reference, (2) trap to the OS, (3) the page is on the backing store, (4) bring the missing page into a free frame in main memory, (5) reset the page table, (6) restart the instruction.]

PAGE REPLACEMENT

Decision on which page to displace to make room for an incoming page when no free frame is available

Modified page fault service routine

1. Find the location of the desired page on the backing store
2. Find a free frame
   - If there is a free frame, use it
   - Otherwise, use a page-replacement algorithm to select a victim frame
   - Write the victim page to the backing store
3. Read the desired page into the (newly) free frame
4. Restart the user process

[Figure: page table entries hold a frame number and a valid/invalid bit. (1) Swap out the victim page from physical memory to the backing store, (2) change the victim's page table entry to invalid, (3) swap the desired page in from the backing store, (4) reset the page table for the new page.]

PAGE REPLACEMENT ALGORITHMS

FIFO

  Reference string: 7 0 1 2 0 3 0 4 2 3 0 3 2 1 2 0 1 7 0 1

  Page frames (3 frames), contents after each page fault:
  7 | 7 0 | 7 0 1 | 2 0 1 | 2 3 1 | 2 3 0 | 4 3 0 | 4 2 0 | 4 2 3 | 0 2 3 | 0 1 3 | 0 1 2 | 7 1 2 | 7 0 2 | 7 0 1

  FIFO algorithm selects the page that has been in memory the longest time
  Using a queue - every time a page is loaded, its identification is inserted in the queue
  Easy to implement
  May result in frequent page faults

Optimal Replacement (OPT)
  - Lowest page fault rate of all algorithms

  Replace that page which will not be used for the longest period of time

  Reference string: 7 0 1 2 0 3 0 4 2 3 0 3 2 1 2 0 1 7 0 1

  Page frames (3 frames), contents after each page fault:
  7 | 7 0 | 7 0 1 | 2 0 1 | 2 0 3 | 2 4 3 | 2 0 3 | 2 0 1 | 7 0 1

LRU
  - OPT is difficult to implement since it requires future knowledge
  - LRU uses the recent past as an approximation of the near future

  Replace that page which has not been used for the longest period of time

  Reference string: 7 0 1 2 0 3 0 4 2 3 0 3 2 1 2 0 1 7 0 1

  Page frames (3 frames), contents after each page fault:
  7 | 7 0 | 7 0 1 | 2 0 1 | 2 0 3 | 4 0 3 | 4 0 2 | 4 3 2 | 0 3 2 | 1 3 2 | 1 0 2 | 1 0 7

  - LRU may require substantial hardware assistance
  - The problem is to determine an order for the frames defined by the time of last use
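The three policies can be compared by counting page faults on the same reference string with three frames. The sketch below is a straightforward simulation; the function names `fifo_faults`, `lru_faults` and `opt_faults` are assumptions, and OPT is implemented by looking ahead in the reference string, which is only possible in a simulation.

```python
from collections import OrderedDict, deque

REFS = [7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1, 7, 0, 1]


def fifo_faults(refs, frames):
    """Count page faults when the page longest in memory is replaced."""
    resident, queue, faults = set(), deque(), 0
    for page in refs:
        if page in resident:
            continue
        faults += 1
        if len(resident) == frames:
            resident.discard(queue.popleft())   # oldest loaded page leaves
        resident.add(page)
        queue.append(page)
    return faults


def lru_faults(refs, frames):
    """Count page faults when the least recently used page is replaced."""
    recent, faults = OrderedDict(), 0
    for page in refs:
        if page in recent:
            recent.move_to_end(page)            # mark as most recently used
            continue
        faults += 1
        if len(recent) == frames:
            recent.popitem(last=False)          # drop the least recently used
        recent[page] = True
    return faults


def opt_faults(refs, frames):
    """Count page faults when the page used farthest in the future is replaced."""
    resident, faults = [], 0
    for i, page in enumerate(refs):
        if page in resident:
            continue
        faults += 1
        if len(resident) == frames:
            future = refs[i + 1:]
            victim = max(
                resident,
                key=lambda p: future.index(p) if p in future else len(future),
            )
            resident.remove(victim)
        resident.append(page)
    return faults


if __name__ == "__main__":
    print("FIFO:", fifo_faults(REFS, 3))
    print("OPT: ", opt_faults(REFS, 3))
    print("LRU: ", lru_faults(REFS, 3))
```

The fault counts correspond to the number of frame snapshots listed for each algorithm, with LRU falling between FIFO and OPT on this string.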

Unit-6

PERIPHERAL DEVICES

Input Devices
• Keyboard
• Optical input devices
  - Card Reader
  - Paper Tape Reader
  - Bar code reader
  - Digitizer
  - Optical Mark Reader
• Magnetic Input Devices
  - Magnetic Stripe Reader
• Screen Input Devices
  - Touch Screen
  - Light Pen
  - Mouse
• Analog Input Devices

Output Devices
• Card Puncher, Paper Tape Puncher
• CRT
• Printer (Impact, Ink Jet, Laser, Dot Matrix)
• Plotter
• Analog
• Voice

I/O BUS AND INTERFACE MODULES

Each peripheral has an interface module associated with it

Interface
- Decodes the device address (device code)
- Decodes the commands (operation)
- Provides signals for the peripheral controller
- Synchronizes the data flow and supervises the transfer rate between peripheral and CPU or Memory

Typical I/O instruction (Command)

  Op. code | Device address | Function code

[Figure: the processor drives an I/O bus with Data, Address and Control lines; each peripheral (keyboard and display terminal, printer, magnetic disk, magnetic tape) attaches to the bus through its own interface.]

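CONNECTION OF I/O BUS

Connection of I/O Bus to CPU

[Figure: in the CPU, the instruction fields Op. code, Device address and Function code, the accumulator register, and the computer I/O control connect to the I/O bus through device address lines, function code lines, data lines, and sense lines.]

Connection of I/O Bus to One Interface

[Figure: interface logic for a device with address AD = 1101. The device address is decoded from the I/O bus, a command decoder decodes the function code, and data lines pass through a buffer register to the peripheral register. A status register drives the sense lines. The interface connects to the output peripheral device and controller.]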

I/O BUS AND MEMORY BUS

Functions of Buses

* MEMORY BUS is for information transfers between the CPU and main memory (MM)

* I/O BUS is for information transfers between the CPU and I/O devices through their I/O interfaces

Physical Organizations

* Many computers use a common single bus system for both memory and I/O interface units
  - Use one common bus but separate control lines for each function
  - Use one common bus with common control lines for both functions

* Some computer systems use two separate buses, one to communicate with memory and the other with I/O interfaces

I/O Bus
- Communication between the CPU and all interface units is via a common I/O Bus
- An interface connected to a peripheral device may have a number of data registers, a control register, and a status register
- A command is passed to the peripheral by sending it to the appropriate interface register
- Function code and sense lines are not needed (transfer of data, control, and status information is always via the common I/O Bus)


ISOLATED vs MEMORY MAPPED I/O

Isolated I/O
- Separate I/O read/write control lines in addition to memory read/write control lines
- Separate (isolated) memory and I/O address spaces
- Distinct input and output instructions

Memory-mapped I/O
- A single set of read/write control lines (no distinction between memory and I/O transfer)
- Memory and I/O addresses share the common address space -> reduces memory address range available
- No specific input or output instruction -> The same memory reference instructions can be used for I/O transfers
- Considerable flexibility in handling I/O operations


I/O INTERFACE

- Information in each port can be assigned a meaning depending on the mode of operation of the I/O device
  -> Port A = Data; Port B = Command; Port C = Status
- CPU initializes (loads) each port by transferring a byte to the Control Register
  -> Allows the CPU to define the mode of operation of each port
  -> Programmable Port: by changing the bits in the control register, it is possible to change the interface characteristics

CS  RS1  RS0  Register selected
0   x    x    None - data bus in high impedance
1   0    0    Port A register
1   0    1    Port B register
1   1    0    Control register
1   1    1    Status register
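The register-select table can be sketched as a small lookup. This is only an illustration of the decoding; the function name `select_register` is an assumption, not part of any real interface chip's API.

```python
def select_register(cs: int, rs1: int, rs0: int) -> str:
    """Decode chip select (CS) and register select (RS1, RS0) inputs."""
    if cs == 0:
        # Chip not selected: RS1 and RS0 are don't-cares.
        return "None - data bus in high impedance"
    table = {
        (0, 0): "Port A register",
        (0, 1): "Port B register",
        (1, 0): "Control register",
        (1, 1): "Status register",
    }
    return table[(rs1, rs0)]


if __name__ == "__main__":
    for cs, rs1, rs0 in [(0, 1, 1), (1, 0, 0), (1, 1, 0)]:
        print(cs, rs1, rs0, "->", select_register(cs, rs1, rs0))
```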

Programmable Interface

[Figure: Example of an I/O interface unit. On the CPU side: bidirectional data bus through bus buffers, and chip select (CS), register select (RS1, RS0), I/O read (RD), and I/O write (WR) into the timing and control block. On an internal bus: Port A register and Port B register (I/O data), control register (control), and status register (status), connected to the I/O device.]

ASYNCHRONOUS DATA TRANSFER

Synchronous and Asynchronous Operations
Synchronous - All devices derive their timing information from a common clock line
Asynchronous - No common clock

Asynchronous Data Transfer
Asynchronous data transfer between two independent units requires that control signals be transmitted between the communicating units to indicate the time at which data is being transmitted

Two Asynchronous Data Transfer Methods
Strobe pulse
- A strobe pulse is supplied by one unit to indicate to the other unit when the transfer has to occur
Handshaking
- A control signal accompanies each data item being transmitted to indicate the presence of data
- The receiving unit responds with another control signal to acknowledge receipt of the data


STROBE CONTROL

* Employs a single control line to time each transfer
* The strobe may be activated by either the source or the destination unit

Source-Initiated Strobe for Data Transfer
[Figure: Block diagram: the source unit drives the data bus and the strobe line to the destination unit. Timing diagram: the strobe pulse occurs while valid data is on the bus.]

Destination-Initiated Strobe for Data Transfer
[Figure: Block diagram: the source unit drives the data bus; the destination unit drives the strobe line. Timing diagram: valid data appears on the bus in response to the strobe.]

HANDSHAKING

Strobe Methods
Source-Initiated
- The source unit that initiates the transfer has no way of knowing whether the destination unit has actually received the data
Destination-Initiated
- The destination unit that initiates the transfer has no way of knowing whether the source has actually placed the data on the bus

To solve this problem, the HANDSHAKE method introduces a second control signal to provide a reply to the unit that initiates the transfer

SOURCE-INITIATED TRANSFER USING HANDSHAKE

* Allows arbitrary delays from one state to the next
* Permits each unit to respond at its own data transfer rate
* The rate of transfer is determined by the slower unit

[Figure: Block diagram: the source unit drives the data bus and the data valid line; the destination unit drives the data accepted line. Timing diagram: data valid is raised while valid data is on the bus; data accepted is raised in response and dropped after data valid is dropped.]

Sequence of Events
1. Source unit: Place data on bus. Enable data valid.
2. Destination unit: Accept data from bus. Enable data accepted.
3. Source unit: Disable data valid. Invalidate data on bus.
4. Destination unit: Disable data accepted. Ready to accept data (initial state).
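The four-step sequence of events can be traced with a small simulation. This is a minimal sketch, assuming both units are modeled in one loop and that each signal change is seen immediately by the other side; `handshake_transfer` and the log format are illustrative names only.

```python
def handshake_transfer(words):
    """Source-initiated handshake: returns (received words, event log)."""
    bus = None
    data_valid = 0
    data_accepted = 0
    received = []
    log = []
    for word in words:
        # 1. Source: place data on bus, enable data valid.
        bus, data_valid = word, 1
        log.append(("source", "data valid = 1"))
        # 2. Destination: sees data valid, accepts data, enables data accepted.
        if data_valid == 1:
            received.append(bus)
            data_accepted = 1
            log.append(("destination", "data accepted = 1"))
        # 3. Source: sees data accepted, disables data valid, invalidates bus.
        if data_accepted == 1:
            bus, data_valid = None, 0
            log.append(("source", "data valid = 0"))
        # 4. Destination: sees data valid dropped, returns to the initial state.
        if data_valid == 0:
            data_accepted = 0
            log.append(("destination", "data accepted = 0"))
    return received, log


if __name__ == "__main__":
    data, events = handshake_transfer([0x41, 0x42])
    print(data)
    for unit, event in events:
        print(unit, event)
```

Because each step waits for the other unit's signal, the slower unit sets the pace, which is the point of the handshake.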


ASYNCHRONOUS SERIAL TRANSFER

Four Different Types of Transfer
- Asynchronous serial transfer
- Synchronous serial transfer
- Asynchronous parallel transfer
- Synchronous parallel transfer

Asynchronous Serial Transfer
- Employs special bits which are inserted at both ends of the character code
- Each character consists of three parts: start bit, data bits, stop bits

A character can be detected by the receiver from the knowledge of 4 rules:
- When data are not being sent, the line is kept in the 1-state (idle state)
- The initiation of a character transmission is detected by a start bit, which is always 0
- The character bits always follow the start bit
- After the last character bit, a stop bit is detected when the line returns to the 1-state for at least 1 bit time

The receiver knows in advance the transfer rate of the bits and the number of information bits to expect

[Figure: Asynchronous serial transmission of one character: start bit (1 bit), character bits 1 1 0 0 0 1 0 1, stop bits (at least 1 bit).]
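The framing rules can be sketched as a pair of helper functions. This is an illustrative model only: the line is represented as a list of 0/1 samples, one per bit time, and the names `frame_character` and `receive_character` are assumptions.

```python
def frame_character(data_bits, stop_bits=1):
    """Add a start bit (0) in front and stop bits (1) behind the character bits."""
    return [0] + list(data_bits) + [1] * stop_bits


def receive_character(line, n_data_bits):
    """Recover one character from a line that idles in the 1-state."""
    i = 0
    while i < len(line) and line[i] == 1:   # skip the idle state
        i += 1
    stop_index = i + 1 + n_data_bits        # position of the first stop bit
    if stop_index >= len(line):
        raise ValueError("incomplete character")
    if line[stop_index] != 1:
        raise ValueError("framing error: stop bit not found")
    return line[i + 1:stop_index]


if __name__ == "__main__":
    bits = [1, 1, 0, 0, 0, 1, 0, 1]          # character bits from the figure
    line = [1, 1] + frame_character(bits) + [1]
    print(receive_character(line, len(bits)))
```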


UNIVERSAL ASYNCHRONOUS RECEIVER-TRANSMITTER - UART -

A typical asynchronous communication interface available as an IC

Transmitter Register
- Accepts a data byte (from CPU) through the data bus
- The byte is transferred to a shift register for serial transmission
Receiver
- Receives serial information into another shift register
- The complete data byte is sent to the receiver register
Status Register Bits
- Used for I/O flags and for recording errors
Control Register Bits
- Define baud rate, number of bits in each character, whether to generate and check parity, and number of stop bits

[Figure: Block diagram of a typical asynchronous communication interface. CPU side: bidirectional data bus through bus buffers; chip select (CS), register select (RS), I/O read (RD), and I/O write (WR) into timing and control. On the internal bus: transmitter register feeding a shift register (transmit data) under transmitter control and clock (transmitter clock); control register; status register; receiver register fed by a shift register (receive data) under receiver control and clock (receiver clock).]

CS  RS  Oper.  Register selected
0   x   x      None
1   0   WR     Transmitter register
1   1   WR     Control register
1   0   RD     Receiver register
1   1   RD     Status register

FIRST-IN-FIRST-OUT (FIFO) BUFFER

* Input data and output data at two different rates
* Output data are always in the same order in which the data entered the buffer
* Useful in some applications when data are transferred asynchronously

4 x 4 FIFO Buffer (four 4-bit registers Ri, and four control flip-flops Fi, one associated with each Ri)

[Figure: Circuit diagram of the 4 x 4 FIFO buffer. Data input enters R1 and passes through R2 and R3 to R4, which drives the data output. Each register has a clock input and an associated SR flip-flop (outputs Fi and Fi'). Control signals: insert, input ready, delete, output ready, and master clear.]

MODES OF TRANSFER - PROGRAM-CONTROLLED I/O -

3 different data transfer modes between the central computer (CPU or memory) and peripherals:
- Program-Controlled I/O
- Interrupt-Initiated I/O
- Direct Memory Access (DMA)

Program-Controlled I/O (Input Device to CPU)

Polling or Status Checking
- Continuous CPU involvement
- CPU slowed down to I/O speed
- Simple
- Least hardware

[Figure: Data transfer from I/O device to CPU. The CPU connects to the interface through the data bus, address bus, I/O read, and I/O write lines; the interface holds a data register and a status register with flag bit F, and connects to the I/O device through the I/O bus and the data valid and data accepted lines.]

[Flowchart: CPU program to input data]
1. Read status register
2. Check flag bit; if flag = 0, go to 1
3. If flag = 1, read data register
4. Transfer data to memory
5. Operation complete? If no, go to 1; if yes, continue with program

MODES OF TRANSFER - INTERRUPT INITIATED I/O & DMA

Interrupt Initiated I/O
- Polling takes valuable CPU time
- Open communication only when some data has to be passed -> Interrupt
- I/O interface, instead of the CPU, monitors the I/O device
- When the interface determines that the I/O device is ready for data transfer, it generates an Interrupt Request to the CPU
- Upon detecting an interrupt, the CPU momentarily stops the task it is doing, branches to the service routine to process the data transfer, and then returns to the task it was performing

DMA (Direct Memory Access)
- Large blocks of data transferred at high speed to or from high speed devices: magnetic drums, disks, tapes, etc.
- DMA controller: interface that provides I/O transfer of data directly between the memory and the I/O device
- CPU initializes the DMA controller by sending a memory address and the number of words to be transferred
- Actual transfer of data is done directly between the device and memory through the DMA controller -> frees the CPU for other tasks

PRIORITY INTERRUPT

Priority
- Determines which interrupt is to be served first when two or more requests are made simultaneously
- Also determines which interrupts are permitted to interrupt the computer while another is being serviced
- Higher priority interrupts can make requests while a lower priority interrupt is being serviced

Priority Interrupt by Software (Polling)
- Priority is established by the order of polling the devices (interrupt sources)
- Flexible since it is established by software
- Low cost since it needs very little hardware
- Very slow

Priority Interrupt by Hardware
- Requires a priority interrupt manager which accepts all the interrupt requests to determine the highest priority request
- Fast since the highest priority interrupt request is identified by the hardware
- Fast since each interrupt source has its own interrupt vector to access its own service routine directly

Priority Interrupt

HARDWARE PRIORITY INTERRUPT - DAISY-CHAIN -

* Serial hardware priority function
* Interrupt Request Line - single common line
* Interrupt Acknowledge Line - daisy chain

[Figure: Daisy-chain priority interrupt. Devices 1, 2, and 3 are chained PO to PI; the CPU's interrupt acknowledge (INTACK) enters PI of device 1 and PO of device 3 goes to the next device. All devices share the interrupt request line (INT) to the CPU and place VAD 1, VAD 2, or VAD 3 on the processor data bus.]

- An interrupt request from any device (one or more) activates the common INT line
- The CPU responds by setting INTACK <- 1
- A requesting device that receives 1 at its PI input puts its VAD on the bus
- Among the interrupt-requesting devices, only the one physically closest to the CPU receives INTACK = 1, and it blocks INTACK from propagating to the next device

One stage of the daisy chain priority arrangement

PI  RF  PO  Enable
0   0   0   0
0   1   0   0
1   0   1   0
1   1   0   1

[Figure: One stage of the daisy chain. The interrupt request from the device sets an SR flip-flop whose output is RF; RF drives the interrupt request to the CPU. Priority in (PI) and RF produce Enable, which gates the vector address (VAD), and priority out (PO); a delay element is included in the stage.]
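The behavior of one stage, and of the chain as a whole, can be sketched as follows. A requesting stage that receives the acknowledge enables its vector address and blocks the acknowledge, so PO = PI AND (NOT RF) and Enable = PI AND RF. The function names are illustrative, and the device list is assumed to be ordered from closest to the CPU outward.

```python
def daisy_chain_stage(pi: int, rf: int):
    """One stage: returns (PO, Enable) from priority-in PI and request flag RF."""
    po = pi & (1 - rf)      # pass the acknowledge on only if not requesting
    enable = pi & rf        # place VAD on the bus if requesting and acknowledged
    return po, enable


def acknowledge(request_flags):
    """Return the index of the device whose VAD is enabled, or None."""
    pi = 1                  # INTACK from the CPU enters the first stage
    for index, rf in enumerate(request_flags):
        po, enable = daisy_chain_stage(pi, rf)
        if enable:
            return index
        pi = po
    return None


if __name__ == "__main__":
    print(acknowledge([0, 1, 1]))   # devices 2 and 3 both request
```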

PARALLEL PRIORITY INTERRUPT

IEN: Set or cleared by the instructions ION or IOF
IST: Indicates that an unmasked interrupt has occurred
INTACK enables the tristate bus buffer to load the VAD generated by the priority logic

Interrupt Register:
- Each bit is associated with an interrupt request from a different interrupt source, at a different priority level
- Each bit can be cleared by a program instruction
Mask Register:
- The mask register is associated with the interrupt register
- Each bit can be set or cleared by an instruction

[Figure: Priority interrupt hardware. Disk, printer, reader, and keyboard set bits 0 to 3 of the interrupt register. Together with bits 0 to 3 of the mask register they form inputs I0 to I3 of the priority encoder, whose outputs x and y, padded with six 0 bits, form the VAD. A bus buffer enabled by INTACK from the CPU passes the VAD to the CPU. IST and IEN produce the interrupt to the CPU.]

INTERRUPT PRIORITY ENCODER

Determines the highest priority interrupt when more than one interrupt occurs

Priority Encoder Truth Table

Inputs            Outputs
I0  I1  I2  I3    x  y  IST
1   d   d   d     0  0  1
0   1   d   d     0  1  1
0   0   1   d     1  0  1
0   0   0   1     1  1  1
0   0   0   0     d  d  0

Boolean functions
x = I0' I1'
y = I0' I1 + I0' I2'
(IST) = I0 + I1 + I2 + I3
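The Boolean functions of the priority encoder can be checked against its truth table with a few lines of code. This is only a verification sketch; `priority_encoder` is an illustrative name.

```python
def priority_encoder(i0: int, i1: int, i2: int, i3: int):
    """Return (x, y, IST) for inputs I0..I3, where I0 has the highest priority."""
    n0, n1, n2 = 1 - i0, 1 - i1, 1 - i2
    x = n0 & n1                      # x = I0' I1'
    y = (n0 & i1) | (n0 & n2)        # y = I0' I1 + I0' I2'
    ist = i0 | i1 | i2 | i3          # IST = I0 + I1 + I2 + I3
    return x, y, ist


if __name__ == "__main__":
    for inputs in [(1, 0, 1, 1), (0, 1, 0, 1), (0, 0, 1, 0), (0, 0, 0, 1)]:
        print(inputs, "->", priority_encoder(*inputs))
```

When all inputs are 0, IST = 0 and the x, y outputs are don't-cares, so their computed values carry no meaning in that row.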

INTERRUPT SERVICE ROUTINE

Initial and Final Operations
Each interrupt service routine must have an initial and final set of operations for controlling the registers in the hardware interrupt system

Initial Sequence
[1] Clear lower level Mask reg. bits
[2] IST <- 0
[3] Save contents of CPU registers
[4] IEN <- 1
[5] Go to Interrupt Service Routine

Final Sequence
[1] IEN <- 0
[2] Restore CPU registers
[3] Clear the bit in the Interrupt Reg
[4] Set lower level Mask reg. bits
[5] Restore return address, IEN <- 1

[Figure: Programs stored in memory for servicing interrupts]

Address   Memory
0         JMP DISK
1         JMP PTR
2         JMP RDR
3         JMP KBD

I/O service programs
DISK: Program to service magnetic disk
PTR:  Program to service line printer
RDR:  Program to service character reader
KBD:  Program to service keyboard

The figure shows the main program interrupted by the keyboard at its current instruction, address 749 (VAD = 00000011), and a disk interrupt arriving at address 255; the stack holds the return addresses 256 and 750. Numbers 1 to 11 in the figure mark the order of events.

INTERRUPT CYCLE

At the end of each instruction cycle
- CPU checks IEN and IST
- If IEN · IST = 1, CPU -> Interrupt Cycle

SP <- SP - 1      Decrement stack pointer
M[SP] <- PC       Push PC into stack
INTACK <- 1       Enable interrupt acknowledge
PC <- VAD         Transfer vector address to PC
IEN <- 0          Disable further interrupts
Go to Fetch to execute the first instruction in the interrupt service routine

DIRECT MEMORY ACCESS

* Block of data transferred from high speed devices: drum, disk, tape
* DMA controller - interface which allows I/O transfer directly between memory and device, freeing the CPU for other tasks
* CPU initializes the DMA controller by sending the memory address and the block size (number of words)

[Figure: CPU bus signals for DMA transfer. The CPU receives bus request (BR) and issues bus granted (BG); its address bus (ABUS), data bus (DBUS), read (RD), and write (WR) lines are in high impedance (disabled) when BG is enabled.]

[Figure: Block diagram of DMA controller. External lines: data bus (through data bus buffers), address bus (through address bus buffers), DMA select (DS), register select (RS), read (RD), write (WR), bus request (BR), bus grant (BG), interrupt, DMA request, and DMA acknowledge to the I/O device. On the internal bus: address register, word count register, control register, and control logic.]

DMA I/O OPERATION

Starting an I/O
- CPU executes instructions to
  Load Memory Address Register
  Load Word Counter
  Load Function (Read or Write) to be performed
  Issue a GO command

Upon receiving a GO command, the DMA controller performs the I/O operation as follows, independently of the CPU

Input
[1] Input Device <- R (Read control signal)
[2] Buffer (DMA Controller) <- Input Byte; assembles the bytes into a word until the word is full
[4] M <- memory address, W (Write control signal)
[5] Address Reg <- Address Reg + 1; WC (Word Counter) <- WC - 1
[6] If WC = 0, then interrupt to acknowledge done, else go to [1]

Output
[1] M <- memory address, R (Read control signal); Address Reg <- Address Reg + 1; WC <- WC - 1
[2] Disassemble the word
[3] Buffer <- One byte; Output Device <- W, for all disassembled bytes
[4] If WC = 0, then interrupt to acknowledge done, else go to [1]
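The input sequence can be sketched as a loop over the address register and word counter. This is a simplified model under stated assumptions: memory is a Python list, the device delivers one full word per read (the byte-to-word assembly step is skipped), and `dma_input` is an illustrative name.

```python
def dma_input(device_words, memory, start_address, word_count):
    """Copy word_count words from the device into memory, DMA style."""
    address_reg = start_address          # loaded by the CPU before GO
    wc = word_count                      # word counter loaded by the CPU
    source = iter(device_words)
    while wc != 0:
        buffer = next(source)            # read control signal to the device
        memory[address_reg] = buffer     # write control signal to memory
        address_reg += 1                 # Address Reg <- Address Reg + 1
        wc -= 1                          # WC <- WC - 1
    # WC = 0: the controller interrupts the CPU to signal completion.
    return {"interrupt": True, "address_reg": address_reg, "word_count": wc}


if __name__ == "__main__":
    ram = [0] * 8
    status = dma_input([7, 8, 9], ram, start_address=2, word_count=3)
    print(ram, status)
```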


CYCLE STEALING

While DMA I/O takes place, CPU is also executing instructions

DMA Controller and CPU both access Memory -> Memory Access Conflict

Memory Bus Controller
- Coordinates the activities of all devices requesting memory access
- Priority system

Memory accesses by the CPU and the DMA controller are interwoven, with the top priority given to the DMA controller -> Cycle Stealing

Cycle Steal
- CPU is usually much faster than I/O (DMA), thus the CPU uses most of the memory cycles
- DMA controller steals memory cycles from the CPU
- For those stolen cycles, the CPU remains idle
- With a slow CPU, the DMA controller may steal most of the memory cycles, which may cause the CPU to remain idle for a long time


DMA TRANSFER

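[Figure: DMA transfer in a computer system. The CPU, the random-access memory unit (RAM), and the DMA controller share the address bus, data bus, read control, and write control lines (RD, WR, Addr, Data). The CPU and the DMA controller are connected by the BR, BG, and interrupt lines; an address select block drives the controller's DS and RS inputs; the I/O peripheral device exchanges DMA request and DMA acknowledge signals with the controller.]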


INPUT/OUTPUT PROCESSOR - CHANNEL -

Channel
- Processor with direct memory access capability that communicates with I/O devices
- Channel accesses memory by cycle stealing
- Channel can execute a Channel Program
  - Stored in the main memory
  - Consists of Channel Command Words (CCW)
  - Each CCW specifies the parameters needed by the channel to control the I/O devices and perform data transfer operations
- CPU initiates the channel by executing a channel I/O class instruction; once initiated, the channel operates independently of the CPU

Input/Output Processor

[Figure: Block diagram of a computer with an I/O processor. The central processing unit (CPU) and the input-output processor (IOP) both connect to the memory unit over the memory bus; the IOP connects to the peripheral devices (PD) over the I/O bus.]

CHANNEL / CPU COMMUNICATION

[Flowchart: CPU-IOP communication]

CPU operations                                   IOP operations
1. Send instruction to test IOP path
                                                 2. Transfer status word to memory
3. If status OK, then send start I/O
   instruction to IOP
4. CPU continues with another program
                                                 5. Access memory for IOP program
                                                 6. Conduct I/O transfers using DMA;
                                                    prepare status report
                                                 7. I/O transfer completed; interrupt CPU
8. Request IOP status
                                                 9. Transfer status word to memory location
10. Check status word for correct transfer
11. Continue

UNIT - 7

PIPELINING AND VECTOR PROCESSING

* Parallel Processing
* Pipelining
* Arithmetic Pipeline
* Instruction Pipeline
* RISC Pipeline
* Vector Processing
* Array Processors

PARALLEL PROCESSING

Parallel processing denotes the simultaneous occurrence of data-processing tasks for the purpose of increasing the computational speed of a computer system, or, equivalently, the execution of concurrent events in the computing process to achieve faster computational speed

Levels of Parallel Processing
- Job or Program level
- Task or Procedure level
- Inter-Instruction level
- Intra-Instruction level

PARALLEL COMPUTERS

Architectural Classification

Flynn's classification
- Based on the multiplicity of Instruction Streams and Data Streams
- Instruction Stream: sequence of instructions read from memory
- Data Stream: operations performed on the data in the processor

                                  Number of Data Streams
                                  Single      Multiple
Number of             Single      SISD        SIMD
Instruction Streams   Multiple    MISD        MIMD

COMPUTER ARCHITECTURES FOR PARALLEL PROCESSING

Von Neumann based
    SISD
        Superscalar processors
        Superpipelined processors
        VLIW
    MISD
        Nonexistence
    SIMD
        Array processors
        Systolic arrays
        Associative processors
    MIMD
        Shared-memory multiprocessors
            Bus based
            Crossbar switch based
            Multistage IN based
        Message-passing multicomputers
            Hypercube
            Mesh
            Reconfigurable
Dataflow
Reduction

SIMD COMPUTER SYSTEMS

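[Figure: SIMD computer organization. A control unit fetches the instruction stream from memory and broadcasts it over the data bus to the processor units (P ... P); the processor units exchange data streams with the memory modules (M ... M) through an alignment network.]

Characteristics
- Only one copy of the program exists
- A single controller executes one instruction at a time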

TYPES OF SIMD COMPUTERS

Array Processors
- The control unit broadcasts instructions to all PEs, and all active PEs execute the same instructions
- ILLIAC IV, GF-11, Connection Machine, DAP, MPP

Systolic Arrays
- Regular arrangement of a large number of very simple processors constructed on VLSI circuits
- CMU Warp, Purdue CHiP

Associative Processors
- Content addressing
- Data transformation operations over many sets of arguments with a single instruction
- STARAN, PEPE

PIPELINING

A technique of decomposing a sequential process into suboperations, with each suboperation being executed in a dedicated segment that operates concurrently with all other segments.

Example: Ai * Bi + Ci for i = 1, 2, 3, ... , 7

R1 <- Ai, R2 <- Bi          Load Ai and Bi
R3 <- R1 * R2, R4 <- Ci     Multiply and load Ci
R5 <- R3 + R4               Add

[Figure: Example of pipeline processing. Segment 1: Ai and Bi are loaded from memory into R1 and R2. Segment 2: the multiplier output goes to R3 while Ci is loaded into R4. Segment 3: the adder output goes to R5.]

OPERATIONS IN EACH PIPELINE STAGE

Clock Pulse   Segment 1     Segment 2          Segment 3
Number        R1    R2      R3         R4      R5
1             A1    B1
2             A2    B2      A1 * B1    C1
3             A3    B3      A2 * B2    C2      A1 * B1 + C1
4             A4    B4      A3 * B3    C3      A2 * B2 + C2
5             A5    B5      A4 * B4    C4      A3 * B3 + C3
6             A6    B6      A5 * B5    C5      A4 * B4 + C4
7             A7    B7      A6 * B6    C6      A5 * B5 + C5
8                           A7 * B7    C7      A6 * B6 + C6
9                                              A7 * B7 + C7
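The register contents per clock pulse can be reproduced with a short simulation of the three segments. This is a sketch, not a hardware description: registers R1 to R5 are plain variables, `None` stands for an empty register, and `run_pipeline` is an illustrative name.

```python
def run_pipeline(a, b, c):
    """Compute a[i] * b[i] + c[i] through a 3-segment pipeline; returns R5 outputs."""
    n = len(a)
    r1 = r2 = r3 = r4 = r5 = None
    results = []
    for clock in range(1, n + 3):            # n tasks need n + 2 clock pulses
        # Update the last segment first so every register reads the values
        # its predecessor held during the previous clock pulse.
        r5 = r3 + r4 if r3 is not None else None            # segment 3: add
        if r1 is not None:                                   # segment 2: multiply, load Ci
            r3, r4 = r1 * r2, c[clock - 2]
        else:
            r3 = r4 = None
        if clock <= n:                                       # segment 1: load Ai, Bi
            r1, r2 = a[clock - 1], b[clock - 1]
        else:
            r1 = r2 = None
        if r5 is not None:
            results.append(r5)
    return results


if __name__ == "__main__":
    print(run_pipeline([1, 2, 3], [4, 5, 6], [7, 8, 9]))
```

With seven operand sets the loop runs for nine clock pulses, matching the table.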

GENERAL PIPELINE

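General Structure of a 4-Segment Pipeline

[Figure: The input passes through four segments in series, S1 -> R1 -> S2 -> R2 -> S3 -> R3 -> S4 -> R4; a common clock drives all registers.]

Space-Time Diagram

Clock cycles   1    2    3    4    5    6    7    8    9
Segment 1      T1   T2   T3   T4   T5   T6
Segment 2           T1   T2   T3   T4   T5   T6
Segment 3                T1   T2   T3   T4   T5   T6
Segment 4                     T1   T2   T3   T4   T5   T6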

PIPELINE SPEEDUP

n: Number of tasks to be performed

Conventional Machine (Non-Pipelined)
tn: Clock cycle (time to complete one task)
Time required to complete the n tasks = n * tn

Pipelined Machine (k stages)
tp: Clock cycle (time to complete each suboperation)
Time required to complete the n tasks = (k + n - 1) * tp

Speedup
Sk: Speedup

$$S_k = \frac{n \, t_n}{(k + n - 1) \, t_p}$$

$$\lim_{n \to \infty} S_k = \frac{t_n}{t_p} \quad (= k, \text{ if } t_n = k \, t_p)$$

PIPELINE AND MULTIPLE FUNCTION UNITS

Example
- 4-stage pipeline
- Suboperation in each stage: tp = 20 ns
- 100 tasks to be executed
- 1 task in the non-pipelined system: 20 * 4 = 80 ns

Pipelined System
(k + n - 1) * tp = (4 + 99) * 20 = 2060 ns

Non-Pipelined System
n * k * tp = 100 * 80 = 8000 ns

Speedup
Sk = 8000 / 2060 = 3.88

Multiple Functional Units
A 4-stage pipeline is basically identical to a system with 4 identical function units

[Figure: Four parallel function units P1, P2, P3, P4 processing instructions Ii, Ii+1, Ii+2, Ii+3.]
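The arithmetic of the example can be reproduced directly from the speedup definition. A minimal sketch; the function name `pipeline_speedup` is illustrative, and tn defaults to k * tp as in the example.

```python
def pipeline_speedup(k, n, tp, tn=None):
    """Return (non-pipelined time, pipelined time, speedup) for n tasks."""
    if tn is None:
        tn = k * tp                      # one task takes k suboperations
    t_nonpipelined = n * tn
    t_pipelined = (k + n - 1) * tp
    return t_nonpipelined, t_pipelined, t_nonpipelined / t_pipelined


if __name__ == "__main__":
    t_conv, t_pipe, speedup = pipeline_speedup(k=4, n=100, tp=20)
    print(t_conv, t_pipe, round(speedup, 2))
```

Increasing n moves the speedup toward k = 4, as the limit of Sk predicts.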

ARITHMETIC PIPELINE

Floating-point adder
X = A x 2^a
Y = B x 2^b

[1] Compare the exponents
[2] Align the mantissas
[3] Add/subtract the mantissas
[4] Normalize the result

[Figure: Pipeline for floating-point addition and subtraction, with registers (R) between segments. Inputs: exponents a, b and mantissas A, B.]
Segment 1: Compare exponents by subtraction
Segment 2: Choose exponent; align mantissas using the exponent difference
Segment 3: Add or subtract mantissas
Segment 4: Normalize result; adjust exponent

INSTRUCTION CYCLE

Six Phases* in an Instruction Cycle
[1] Fetch an instruction from memory
[2] Decode the instruction
[3] Calculate the effective address of the operand
[4] Fetch the operands from memory
[5] Execute the operation
[6] Store the result in the proper place

* Some instructions skip some phases
* Effective address calculation can be done as part of the decoding phase
* Storage of the operation result into a register is done automatically in the execution phase

==> 4-Stage Pipeline

[1] FI: Fetch an instruction from memory
[2] DA: Decode the instruction and calculate the effective address of the operand
[3] FO: Fetch the operand
[4] EX: Execute the operation

INSTRUCTION PIPELINE

Execution of Three Instructions in a 4-Stage Pipeline

Conventional
i      FI DA FO EX
i+1                FI DA FO EX
i+2                            FI DA FO EX

Pipelined
i      FI DA FO EX
i+1       FI DA FO EX
i+2          FI DA FO EX

INSTRUCTION EXECUTION IN A 4-STAGE PIPELINE

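Timing of the instruction pipeline (instruction 3 is a branch)

Step:          1   2   3   4   5   6   7   8   9   10  11  12  13
Instruction 1  FI  DA  FO  EX
            2      FI  DA  FO  EX
   (Branch) 3          FI  DA  FO  EX
            4              FI  -   -   FI  DA  FO  EX
            5                  -   -   -   FI  DA  FO  EX
            6                                  FI  DA  FO  EX
            7                                      FI  DA  FO  EX

[Flowchart: Four-segment CPU pipeline]
Segment 1: Fetch instruction from memory
Segment 2: Decode instruction and calculate effective address; Branch? If yes, go to Update PC
Segment 3: Fetch operand from memory
Segment 4: Execute instruction; Interrupt? If yes, interrupt handling, then Update PC; if no, fetch the next instruction
After Update PC: empty the pipe and resume fetching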

MAJOR HAZARDS IN PIPELINED EXECUTION

Structural hazards (Resource Conflicts)
- Hardware resources required by the instructions in simultaneous overlapped execution cannot be met

Data hazards (Data Dependency Conflicts)
- An instruction scheduled to be executed in the pipeline requires the result of a previous instruction, which is not yet available
    R1 <- B + C
    R1 <- R1 + 1

Control hazards
- Caused by branch instructions (see CONTROL HAZARDS)

DATA HAZARDS

Occurs when the execution of an instruction depends on the result of a previous instruction
    ADD R1, R2, R3
    SUB R4, R1, R5

Data hazards can be dealt with by either hardware techniques or software techniques

Hardware Techniques
- Interlock: hardware detects the data dependencies and delays the scheduling of the dependent instruction by stalling enough clock cycles
- Forwarding (bypassing, short-circuiting): accomplished by a data path that routes a value from a source (usually an ALU) to a user, bypassing a designated register. This allows the value to be used at an earlier stage in the pipeline than would otherwise be possible

Software Technique
- Instruction scheduling (compiler) for delayed load
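The dependency in the ADD/SUB pair is the kind of condition an interlock has to detect. The sketch below checks only adjacent instructions and assumes a tuple encoding (opcode, destination, source, source) with the destination written first; both the encoding and the name `raw_hazards` are illustrative assumptions.

```python
def raw_hazards(program):
    """List (producer index, consumer index, register) for adjacent dependencies."""
    hazards = []
    for i in range(1, len(program)):
        destination = program[i - 1][1]      # register written by the earlier instruction
        sources = program[i][2:]             # registers read by the later instruction
        if destination in sources:
            hazards.append((i - 1, i, destination))
    return hazards


if __name__ == "__main__":
    program = [
        ("ADD", "R1", "R2", "R3"),
        ("SUB", "R4", "R1", "R5"),
    ]
    print(raw_hazards(program))
```

A real interlock would compare against every instruction still in flight, not just the immediately preceding one.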


CONTROL HAZARDS

Branch Instructions
- Branches and other instructions that change the PC delay the fetch of the next instruction
- Branch target address is not known until the branch instruction is completed
- Stall -> waste of cycle times

Branch Instruction:  FI  DA  FO  EX
Next Instruction:                    FI  DA  FO  EX
(The next instruction is fetched only after the target address is available.)

Dealing with Control Hazards

* Prefetch Target Instruction
  - Fetch instructions in both streams, branch not taken and branch taken
  - Both are saved until the branch is executed. Then, select the right instruction stream and discard the wrong stream
* Branch Target Buffer (BTB; Associative Memory)
  - Entry: address of a previously executed branch; target instruction and the next few instructions
  - When fetching an instruction, search the BTB
  - If found, fetch the instruction stream in the BTB; if not, fetch the new stream and update the BTB
* Loop Buffer (High Speed Register File)
  - Storage of an entire loop, which allows the loop to be executed without accessing memory
* Branch Prediction
  - Guess the branch condition, and fetch an instruction stream based on the guess. A correct guess eliminates the branch penalty
* Delayed Branch
  - Compiler detects the branch and rearranges the instruction sequence by inserting useful instructions that keep the pipeline busy in the presence of a branch instruction

RISC PIPELINE

RISC
- Machine with a very fast clock cycle that executes at the rate of one instruction per cycle
  <- Simple instruction set
     Fixed length instruction format
     Register-to-register operations

Instruction Cycles of Three-Stage Instruction Pipeline

Data Manipulation Instructions
  I: Instruction fetch
  A: Decode, read registers, ALU operations
  E: Write a register

Load and Store Instructions
  I: Instruction fetch
  A: Decode, evaluate effective address
  E: Register-to-memory or memory-to-register

Program Control Instructions
  I: Instruction fetch
  A: Decode, evaluate branch address
  E: Write register (PC)

DELAYED LOAD

LOAD:   R1 <- M[address 1]
LOAD:   R2 <- M[address 2]
ADD:    R3 <- R1 + R2
STORE:  M[address 3] <- R3

Three-segment pipeline timing

Pipeline timing with data conflict

clock cycle    1   2   3   4   5   6
Load R1        I   A   E
Load R2            I   A   E
Add R1+R2              I   A   E
Store R3                   I   A   E

In clock cycle 4 the Add instruction uses R2 in its A segment while Load R2 is still writing it in its E segment.

Pipeline timing with delayed load

clock cycle    1   2   3   4   5   6   7
Load R1        I   A   E
Load R2            I   A   E
NOP                    I   A   E
Add R1+R2                  I   A   E
Store R3                       I   A   E

The data dependency is taken care of by the compiler rather than the hardware
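The compiler's part in a delayed load can be sketched as a pass that inserts a NOP after any load whose register is read by the very next instruction. The tuple encoding (opcode, destination, sources...) and the name `insert_delay_slots` are assumptions made for this illustration; a real compiler would first try to move a useful instruction into the slot.

```python
def insert_delay_slots(program):
    """Insert a NOP after each LOAD whose result the next instruction reads."""
    scheduled = []
    for index, instruction in enumerate(program):
        scheduled.append(instruction)
        if instruction[0] == "LOAD" and index + 1 < len(program):
            loaded_register = instruction[1]
            next_sources = program[index + 1][2:]
            if loaded_register in next_sources:
                scheduled.append(("NOP",))
    return scheduled


if __name__ == "__main__":
    program = [
        ("LOAD", "R1", "address 1"),
        ("LOAD", "R2", "address 2"),
        ("ADD", "R3", "R1", "R2"),
        ("STORE", "address 3", "R3"),
    ]
    for instruction in insert_delay_slots(program):
        print(instruction)
```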

DELAYED BRANCH

Compiler analyzes the instructions before and after the branch and rearranges the program sequence by inserting useful instructions in the delay steps

Using no-operation instructions

Clock cycles:     1  2  3  4  5  6  7  8  9  10
1. Load A         I  A  E
2. Increment         I  A  E
3. Add                  I  A  E
4. Subtract                I  A  E
5. Branch to X                I  A  E
6. NOP                           I  A  E
7. NOP                              I  A  E
8. Instr. in X                         I  A  E

Rearranging the instructions

Clock cycles:     1  2  3  4  5  6  7  8
1. Load A         I  A  E
2. Increment         I  A  E
3. Branch to X          I  A  E
4. Add                     I  A  E
5. Subtract                   I  A  E
6. Instr. in X                   I  A  E

VECTOR INSTRUCTIONS

f1: V -> V
f2: V -> S
f3: V x V -> V
f4: V x S -> V

V: Vector operand
S: Scalar operand

Type  Mnemonic  Description (I = 1, ..., n)
f1    VSQR      Vector square root      B(I) <- SQR(A(I))
      VSIN      Vector sine             B(I) <- sin(A(I))
      VCOM      Vector complement       A(I) <- A(I)'
f2    VSUM      Vector summation        S <- sum of A(I)
      VMAX      Vector maximum          S <- max{A(I)}
f3    VADD      Vector add              C(I) <- A(I) + B(I)
      VMPY      Vector multiply         C(I) <- A(I) * B(I)
      VAND      Vector AND              C(I) <- A(I) . B(I)
      VLAR      Vector larger           C(I) <- max(A(I), B(I))
      VTGE      Vector test >           C(I) <- 0 if A(I) < B(I)
                                        C(I) <- 1 if A(I) > B(I)
f4    SADD      Vector-scalar add       B(I) <- S + A(I)
      SDIV      Vector-scalar divide    B(I) <- A(I) / S
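One instruction from each of the four types can be written as an ordinary function over lists to make the operand shapes concrete. This is a sequential sketch of the semantics only (a vector processor would stream the elements through pipelines); the lowercase function names mirror the mnemonics but are otherwise illustrative, and `inner_product` is an added composition of VMPY and VSUM.

```python
import math


def vsqr(a):                 # f1: V -> V
    return [math.sqrt(x) for x in a]


def vsum(a):                 # f2: V -> S
    return sum(a)


def vadd(a, b):              # f3: V x V -> V
    return [x + y for x, y in zip(a, b)]


def vmpy(a, b):              # f3: V x V -> V
    return [x * y for x, y in zip(a, b)]


def sadd(s, a):              # f4: V x S -> V
    return [s + x for x in a]


def inner_product(a, b):
    """Multiply element-wise, then sum the products."""
    return vsum(vmpy(a, b))


if __name__ == "__main__":
    print(vadd([1, 2, 3], [4, 5, 6]))
    print(inner_product([1, 2, 3], [4, 5, 6]))
```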

VECTOR INSTRUCTION FORMAT

Vector Instruction Format

Operation code | Base address source 1 | Base address source 2 | Base address destination | Vector length

Pipeline for Inner Product

[Figure: Source A and Source B feed the multiplier pipeline, whose output feeds the adder pipeline.]

MULTIPLE MEMORY MODULE AND INTERLEAVING

Multiple Module Memory

Address Interleaving
- Different sets of addresses are assigned to different memory modules

[Figure: Four memory modules M0, M1, M2, M3, each with its own address register (AR), memory array, and data register (DR), connected to a common address bus and data bus.]
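One common way to assign address sets to modules is low-order interleaving, where the module number is the address modulo the number of modules. The notes do not say which scheme is intended, so the sketch below is an assumption, and `interleave` is an illustrative name.

```python
def interleave(address: int, modules: int = 4):
    """Map an address to (module number, address within the module)."""
    return address % modules, address // modules


if __name__ == "__main__":
    for address in range(8):
        module, offset = interleave(address)
        print(f"address {address} -> M{module}, word {offset}")
```

Under this scheme consecutive addresses fall in different modules, so a vector stored at consecutive addresses can be fetched from all four modules in overlapped fashion.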

UNIT - 8

MULTIPROCESSORS:

A multiprocessor system is an interconnection of two or more CPUs with memory and input-output equipment.

Parallel Computing
Simultaneous use of multiple processors, all components of a single architecture, to solve a task. Typically the processors are identical, with a single user (even if the machine is multiuser)

Distributed Computing
Use of a network of processors, each capable of being viewed as a computer in its own right, to solve a problem. Processors may be heterogeneous and multiuser; usually an individual task is assigned to a single processor

Pipelining
Breaking a task into steps performed by different units, with multiple inputs streaming through the units; the next input starts in a unit when the previous input is done with that unit but not necessarily done with the task

Vector Computing
Use of vector processors, where an operation such as multiply is broken into several steps and is applied to a stream of operands ("vectors"). Most common special case of pipelining

Systolic
Similar to pipelining, but units are not necessarily arranged linearly; steps are typically small and more numerous, and are performed in lockstep fashion. Often used in special-purpose hardware such as image or signal processors

Types Of Multiprocessors:

Tightly Coupled System
- Tasks and/or processors communicate in a highly synchronized fashion
- Communicates through a common shared memory
- Shared memory system

Loosely Coupled System
- Tasks or processors do not communicate in a synchronized fashion
- Communicates by message passing packets
- Overhead for data exchange is high
- Distributed memory system

INTERCONNECTION STRUCTURES

* Time-Shared Common Bus
* Multiport Memory
* Crossbar Switch
* Multistage Switching Network
* Hypercube System

Bus
All processors (and memory) are connected to a common bus or busses
- Memory access is fairly uniform, but not very scalable
- A collection of signal lines that carry module-to-module communication
- Data highways connecting several digital system elements

Operations of Bus

[Figure: Devices M3, S7, M6, S5, M4, and S2 attached to a common bus.]

M3 wishes to communicate with S5
[1] M3 sends signals (address) on the bus that cause S5 to respond
[2] M3 sends data to S5 or S5 sends data to M3 (determined by the command line)

Master Device: Device that initiates and controls the communication
Slave Device: Responding device
Multiple-master buses -> Bus conflict -> need bus arbitration

SYSTEM BUS STRUCTURE FOR MULTIPROCESSORS

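[Figure: System bus structure for multiprocessors. Each processor group has a CPU, a local memory, in some groups an IOP, and a system bus controller, all on a local bus. The system bus controllers connect the local buses to the SYSTEM BUS, which also connects to the common shared memory.]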

MULTIPORT MEMORY

Multiport Memory Module
- Each port serves a CPU

Memory Module Control Logic
- Each memory module has control logic
- Resolves memory module conflicts by fixed priority among CPUs

Advantages
- Multiple paths -> high transfer rate

Disadvantages
- Memory control logic
- Large number of cables and connections

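CROSSBAR SWITCH

[Figure: Crossbar switch. CPU 1 to CPU 4 and memory modules MM 1 to MM 4 are connected through a grid with a switch at each crosspoint.]

Block Diagram of Crossbar Switch

[Figure: A memory module receives data, address, R/W, and memory enable signals from multiplexers and arbitration logic, which select among the data, address, and control lines from CPU 1 through CPU 4.]

MULTISTAGE SWITCHING NETWORK

Interstage Switch

[Figure: Operation of a 2 x 2 interchange switch with inputs A and B and outputs 0 and 1. Four states: A connected to 0, A connected to 1, B connected to 0, B connected to 1.]

MULTISTAGE INTERCONNECTION NETWORK

Binary Tree with 2 x 2 Switches

[Figure: Processors P1 and P2 reach destinations 000 through 111 through three levels of 2 x 2 switches, selecting output 0 or 1 at each level.]

8x8 Omega Switching Network

[Figure: Inputs 0 to 7 are connected to outputs 000 to 111 through stages of 2 x 2 switches.]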

HYPERCUBE INTERCONNECTION

n-dimensional hypercube (binary n-cube)
- p = 2^n processors are conceptually on the corners of an n-dimensional hypercube, and each is directly connected to its n neighboring nodes
- Degree = n

[Figure: Hypercube structures for n = 1, 2, 3. One-cube: nodes 0 and 1. Two-cube: nodes 00, 01, 10, 11. Three-cube: nodes 000, 001, 010, 011, 100, 101, 110, 111.]
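In the binary labeling used in the figure, two nodes are neighbors when their labels differ in exactly one bit, so the n neighbors of a node can be listed by flipping each bit in turn. A small sketch; `neighbors` is an illustrative name.

```python
def neighbors(node: int, n: int):
    """Return the n nodes directly connected to `node` in a binary n-cube."""
    return [node ^ (1 << bit) for bit in range(n)]


if __name__ == "__main__":
    n = 3
    for node in range(2 ** n):
        labels = [format(v, f"0{n}b") for v in neighbors(node, n)]
        print(format(node, f"0{n}b"), "->", labels)
```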

INTERPROCESSOR ARBITRATION

Bus
- Board level bus
- Backplane level bus
- Interface level bus

System Bus - A Backplane Level Bus
- Printed circuit board
- Connects CPU, IOP, and memory
- Each of the CPU, IOP, and memory boards can be plugged into a slot in the backplane (system bus)
- Bus signals are grouped into 3 groups: data, address, and control (plus power)
- Only one of CPU, IOP, and memory can be granted use of the bus at a time
- An arbitration mechanism is needed to handle multiple requests

e.g. IEEE standard 796 bus - 86 lines
Data:    16 (multiple of 8)
Address: 24
Control: 26
Power:   20

SYNCHRONOUS & ASYNCHRONOUS DATA TRANSFER

Synchronous Bus
* Each data item is transferred over a time slice known to both the source and destination units
  - Common clock source
  - Or a separate clock and synchronization signal is transmitted periodically to synchronize the clocks in the system

Asynchronous Bus
* Each data item is transferred by a handshake mechanism
  - The unit that transmits the data transmits a control signal that indicates the presence of data
  - The unit that receives the data responds with another control signal to acknowledge the receipt of the data
* Strobe pulse
  - Supplied by one of the units to indicate to the other unit when the data transfer has to occur

BUS SIGNALS

IEEE Standard 796 Multibus Signals (Cont’d)

Miscellaneous control
  Master clock: CCLK
  System initialization: INIT
  Byte high enable: BHEN
  Memory inhibit (2 lines): INH1 - INH2
  Bus lock: LOCK

Bus arbitration
  Bus request: BREQ
  Common bus request: CBRQ
  Bus busy: BUSY
  Bus clock: BCLK
  Bus priority in: BPRN
  Bus priority out: BPRO

Power and ground (20 lines)

INTERPROCESSOR ARBITRATION STATIC ARBITRATION

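Serial Arbitration Procedure

[Figure: serial (daisy-chain) arbitration. Bus arbiters 1 to 4 are chained from PO to PI; the PI input of arbiter 1, the highest-priority unit, is tied to 1, the PO of arbiter 4 goes to the next arbiter, and all arbiters share the bus busy line]

Parallel Arbitration Procedure

[Figure: parallel arbitration. The Req outputs of bus arbiters 1 to 4 feed a 4 x 2 priority encoder, whose output drives a 2 x 4 decoder that returns Ack to the selected arbiter; all arbiters share the bus busy line]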

INTERPROCESSOR ARBITRATION DYNAMIC ARBITRATION

Priorities of the units can be changed dynamically while the system is in operation.

Time Slice
- A fixed-length time slice is given sequentially to each processor, in round-robin fashion

Polling
- Unit address polling: the bus controller advances the address to identify the requesting unit

LRU

FIFO

Rotating Daisy Chain
- Conventional Daisy Chain: highest priority goes to the unit nearest the bus controller
- Rotating Daisy Chain: highest priority goes to the unit nearest the unit that has most recently accessed the bus (that unit becomes the bus controller)
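The rotating daisy chain can be illustrated with a short sketch. It assumes the units form a ring and interprets "nearest" as the next requesting unit after the most recent bus user, searching in ring order; the function name `rotating_daisy_chain` is hypothetical.

```python
def rotating_daisy_chain(requests, last_granted):
    """Grant the bus to the requesting unit nearest after `last_granted`.

    requests[i] is True when unit i wants the bus. The search starts at
    the unit following the most recent bus user and wraps around the ring,
    so the most recent user has the lowest priority.
    Returns the index of the granted unit, or None if nobody is requesting.
    """
    n = len(requests)
    for offset in range(1, n + 1):
        unit = (last_granted + offset) % n
        if requests[unit]:
            return unit
    return None


requests = [True, False, True, False]
first = rotating_daisy_chain(requests, last_granted=0)    # unit 2
second = rotating_daisy_chain(requests, last_granted=2)   # unit 0
```

Unlike the conventional daisy chain, no unit holds the top priority permanently, so a unit that keeps requesting cannot shut out the others indefinitely.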

INTERPROCESSOR SYNCHRONIZATION

Synchronization
- Communication of control information between processors
  - To enforce the correct sequence of processes
  - To ensure mutually exclusive access to shared writable data

Hardware Implementation
Mutual Exclusion with a Semaphore

Mutual Exclusion
- One processor excludes or locks out access to a shared resource by other processors while it is in a Critical Section
- A Critical Section is a program sequence that, once begun, must complete execution before another processor accesses the same shared resource

Semaphore
- A binary variable
  1: A processor is executing a critical section, so the resource is not available to other processors
  0: Available to any requesting processor
- A software-controlled flag stored in memory that all processors can access

SEMAPHORE

Testing and Setting the Semaphore
- Two or more processors must not test or set the same semaphore at the same time
- Otherwise, two or more processors may enter the same critical section at the same time
- Testing and setting must therefore be implemented as an indivisible operation

  R <- M[SEM]     / Test semaphore /
  M[SEM] <- 1     / Set semaphore /

These two operations are performed while locked, so that other processors cannot test and set the semaphore while the current processor is executing them.
- If R = 1, another processor is executing the critical section, and the processor that executed this instruction does not access the shared memory
- If R = 0, the resource is available: the semaphore has been set to 1 and the processor may access it
- The last instruction in the program must clear the semaphore
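The test-and-set sequence above can be imitated in software as a rough sketch. Here threads stand in for processors and a `threading.Lock` stands in for the hardware lock that makes the two operations indivisible; the class `Semaphore` and the functions `worker` and `run` are hypothetical names introduced for the example.

```python
import threading
import time


class Semaphore:
    """Binary semaphore word in shared memory, modelled in software."""

    def __init__(self):
        self.value = 0                      # 0: available, 1: critical section in use
        self._lock = threading.Lock()       # stands in for the hardware lock

    def test_and_set(self):
        """Indivisibly read the semaphore into R and set it to 1; return R."""
        with self._lock:
            r = self.value                  # R <- M[SEM]
            self.value = 1                  # M[SEM] <- 1
        return r

    def clear(self):
        self.value = 0                      # last instruction: release the semaphore


def worker(sem, counter, rounds):
    for _ in range(rounds):
        while sem.test_and_set() == 1:      # R = 1: another processor is inside
            time.sleep(0)                   # give up the time slice and retry
        counter[0] += 1                     # critical section on shared data
        sem.clear()


def run(n_workers=4, rounds=200):
    sem, counter = Semaphore(), [0]
    threads = [threading.Thread(target=worker, args=(sem, counter, rounds))
               for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter[0]
```

Calling `run(4, 200)` should return 800, because only the worker that read R = 0 may update the shared counter at any moment. If the read and the write in `test_and_set` were not indivisible, two workers could both read 0 and enter the critical section together.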

CACHE COHERENCE

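[Figure: Caches are Coherent. Main memory holds X = 52, and the caches of processors P1, P2, and P3, all attached to the bus, each hold X = 52]

[Figure: Cache Incoherency in Write Through Policy. After P1 writes X = 120, main memory and P1's cache hold X = 120, while the caches of P2 and P3 still hold X = 52]

[Figure: Cache Incoherency in Write Back Policy. After P1 writes X = 120, only P1's cache holds X = 120; main memory and the caches of P2 and P3 still hold X = 52]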

MAINTAINING CACHE COHERENCY

Shared Cache
- Disallow private caches
- Access time delay

Software Approaches
* Read-Only Data are Cacheable
  - Private cache is for read-only data
  - Shared writable data are not cacheable
  - Compiler tags data as cacheable or noncacheable
  - Degrades performance due to software overhead
* Centralized Global Table
  - Status of each memory block is maintained in the CGT: RO (Read-Only); RW (Read and Write)
  - All caches can have copies of RO blocks
  - Only one cache can have a copy of an RW block

Hardware Approaches
* Snoopy Cache Controller
  - Cache controllers monitor all the bus requests from CPUs and IOPs
  - All caches attached to the bus monitor the write operations
  - When a word in a cache is written, memory is also updated (write through)
  - Local snoopy controllers in all other caches check their memory to determine whether they have a copy of that word; if they have, that location is marked invalid (a future reference to this location causes a cache miss)
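The snoopy write-through scheme can be sketched as a toy model. It tracks single words rather than cache blocks and has no replacement policy; the classes `Bus` and `SnoopyCache` are hypothetical names, and the values 52 and 120 are the ones used in the cache coherence figures.

```python
class Bus:
    """Shared bus connecting main memory to all snooping caches."""

    def __init__(self, memory):
        self.memory = memory            # address -> value
        self.caches = []

    def write_through(self, writer, address, value):
        self.memory[address] = value    # memory is updated on every write
        for cache in self.caches:
            if cache is not writer:
                cache.snoop_write(address)


class SnoopyCache:
    """Private cache whose controller watches writes on the bus."""

    def __init__(self, bus):
        self.lines = {}                 # address -> value; absent means invalid
        self.bus = bus
        bus.caches.append(self)

    def read(self, address):
        if address not in self.lines:   # cache miss: fetch from main memory
            self.lines[address] = self.bus.memory[address]
        return self.lines[address]

    def write(self, address, value):
        self.lines[address] = value
        self.bus.write_through(self, address, value)

    def snoop_write(self, address):
        self.lines.pop(address, None)   # mark the local copy invalid


bus = Bus({"X": 52})
p1, p2, p3 = SnoopyCache(bus), SnoopyCache(bus), SnoopyCache(bus)
before = [p.read("X") for p in (p1, p2, p3)]    # every cache loads X = 52
p1.write("X", 120)                              # P2 and P3 copies become invalid
after = [p.read("X") for p in (p1, p2, p3)]     # P2 and P3 miss and reload 120
```

The invalidation forces P2 and P3 to miss on their next reference, so they reload the current value from main memory instead of returning the stale 52.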
