SE 292 (3:0) High Performance Computing L2: Basic Computer Organization R. Govindarajan govind@serc

Mar 29, 2015


Taliyah Huggett
Transcript
Page 1:

SE 292 (3:0) High Performance Computing

L2: Basic Computer Organization

R. Govindarajan

govind@serc

Page 2:

Basic Computer Organization

Main parts of a computer system:

Processor: Executes programs
Main memory: Holds program and data
I/O devices: For communication with the outside world

Machine instruction: Description of a primitive operation that the machine hardware is able to execute, e.g. ADD these two integers

Instruction Set: Complete specification of all the kinds of instructions that the processor hardware was built to execute

Page 3:

Basic Computer Organization

[Figure: block diagram of a computer. Components shown: CPU (Control, ALU, Registers), Memory, several I/O devices, and the Bus.]

Page 4:

Inside the Processor…

Hardware to manage instruction execution
Arithmetic, logic hardware
Registers: small units of memory to hold data/instructions temporarily during execution

Two kinds of registers
1. Special purpose registers
2. General purpose registers

Page 5:

Special Purpose Registers

Program Counter (PC): specifies the location in memory of the instruction being executed
Instruction Register (IR): holds that instruction
Processor Status Register: holds status information about the current state of the processor, such as whether an arithmetic overflow has occurred

Page 6:

General Purpose Registers

Available for use by the programmer, possibly for keeping frequently used data
Why? Because there is a large speed disparity between processor and main memory
1 GHz Processor: 1 nanosecond time scale
Memory: ~50 - 100 nsec time scale
What do these numbers mean?
Instruction operands can come from registers or from main memory
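One way to read the numbers above is to express the memory access time in processor cycles. The following is a minimal sketch; the function name `stall_cycles` and its parameters are illustrative and not part of the lecture.

```c
#include <assert.h>

/* Number of processor cycles that elapse during one main memory access.
 * clock_hz:       processor clock frequency (1e9 for the 1 GHz example)
 * mem_latency_ns: memory access time in nanoseconds (50 to 100 in the slide)
 * Both names are hypothetical, chosen for this sketch. */
double stall_cycles(double clock_hz, double mem_latency_ns)
{
    assert(clock_hz > 0.0 && mem_latency_ns >= 0.0);
    double cycle_time_ns = 1e9 / clock_hz;   /* 1 ns per cycle at 1 GHz */
    return mem_latency_ns / cycle_time_ns;
}
```

With the slide's figures, a 1 GHz processor waits roughly 50 to 100 cycles for an operand that comes from main memory, which is why keeping frequently used data in registers pays off.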

Page 7:

Basic Computer Organization

[Figure: block diagram of a computer. Components shown: CPU (Control, ALU, Registers), MMU, Cache, Memory, several I/O devices, and the Bus. General Purpose Registers: Integer Registers, FP Registers. Special Purpose Registers: Program Counter, Instruction Register.]

Page 8:

Main Memory

Holds instructions and data
View as a sequence of locations, each referred to by a unique memory address
If the size of each memory location is 1 Byte, we call the memory byte addressable
This is quite typical, as the smallest data item (a character) is represented in 1 Byte
Larger data items are stored in contiguous memory locations, e.g., a 4 Byte integer would occupy 4 consecutive memory locations

Page 9:

Terms: Byte ordering

Address 400 402 404 406
Data 1A C8 | B2 46 | F0 8C | DF 1E

In Hexadecimal (0,1,2,…,A,B,C,D,E,F)

What is the integer (4 byte data) at Address 400?

Big Endian byte ordering: 1AC8B246
Binary: 0001 1010 1100 1000 1011 0010 0100 0110
Decimal: 449,360,454

Little Endian byte ordering: 46B2C81A
Binary: 0100 0110 1011 0010 1100 1000 0001 1010
Decimal: 1,186,121,754

Some machines use big endian byte ordering and others use little endian byte ordering
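The two interpretations of the four bytes at address 400 can be checked with a short C sketch. The function names below are illustrative, not from the lecture; the code assembles the value from individual bytes, so it behaves the same on any host machine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Big endian: the byte at the lowest address is the most significant. */
uint32_t read_big_endian(const uint8_t *p)
{
    assert(p != NULL);
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Little endian: the byte at the lowest address is the least significant. */
uint32_t read_little_endian(const uint8_t *p)
{
    assert(p != NULL);
    return ((uint32_t)p[3] << 24) | ((uint32_t)p[2] << 16) |
           ((uint32_t)p[1] << 8)  |  (uint32_t)p[0];
}
```

Applied to the bytes 1A C8 B2 46, `read_big_endian` yields 0x1AC8B246 (449,360,454) and `read_little_endian` yields 0x46B2C81A (1,186,121,754), matching the values on the slide.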

Page 10:

Terms: Word Size, Word Alignment

Word Size

Normal size of an integer or pointer

32b (4B) on many machines

Word Alignment

'Integer variable X is not word aligned'

The data item is not located at a word boundary

Word boundaries: addresses 0, 4, 8, 12, …
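A word boundary test reduces to checking that the address is a multiple of the word size. The sketch below assumes the word size is a power of two (4 bytes in the slide); the function name is illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if addr lies on a word boundary, 0 otherwise.
 * word_size must be a power of two (4 for the 32b words above),
 * so the low bits of an aligned address are all zero. */
int is_word_aligned(uintptr_t addr, uintptr_t word_size)
{
    assert(word_size != 0 && (word_size & (word_size - 1)) == 0);
    return (addr & (word_size - 1)) == 0;
}
```

For 4-byte words, addresses 0, 4, 8 and 12 pass the test, while an integer stored at address 6 would be reported as not word aligned.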

HW:

Write a C program to identify whether a machine uses Little Endian or Big Endian byte ordering.

Write a C program to convert a sequence of 4-byte values from Little Endian to Big Endian.

Page 11:

Instruction Set Architecture (ISA)

View of the computer visible to the programmer (or compiler)

Two kinds of ISAs

1. Complex Instruction Set Computer (CISC)

A single instruction can perform a complex operation involving several actions

2. Reduced Instruction Set Computer (RISC)

Each instruction performs only a simple operation

Page 12:

Instruction Set Architecture

Description of the machine from the view of the programmer/compiler
Example: Intel x86 ISA

Includes specification of

1. The different kinds of instructions available (instruction set)

2. How operands are specified (addressing modes)

3. What each instruction looks like (instruction format)

Page 13:

Kinds of Instructions

1. Arithmetic/logical instructions
Add, subtract, multiply, divide, compare (int/fp)
Or, and, not, xor
Shift (left/right, arithmetic/logical), rotate

2. Data transfer instructions
Load (to register from memory)
Store (to memory location from register)
Move

3. Control transfer instructions
Jump, conditional branch, function call, return

4. Other instructions
Example: halt

Page 14:

Operand Addressing Modes

• Operands to an instruction
  • Source: input value to instruction
  • Destination: where result is to go

• Addressing Mode
  • How the location of an operand is specified

• An operand can be either
  • in a memory location
  • in a register

Page 15:

Addressing Modes: Operand in Register

1. Register Direct Addressing Mode

Operand is in the specified general purpose register

Example: Suppose that the General Purpose Registers are numbered as 0, 1, 2, etc

ADD R1, R2, R3 / R1 ← R2 + R3

Source operands: R2 = 24, R3 = 35
Destination operand: R1, which changes from 17 to 59

2. Immediate Addressing Mode

Operand is included in the instruction

ADD R1, R2, 1 / R1 ← R2 + 1

Page 16:

Addressing Modes: Operand in Memory

MAIN MEMORY

Address 96 100 104 108

Value 0 10 35 -17

Registers before each example: R1 = 32, R2 = 100

3. Register Indirect Addressing Mode

Memory address of operand is in the specified general purpose register

ADD R1, R1, (R2)

R1 ← 32 + Mem[100] = 32 + 10 = 42

4. Base-Displacement Addressing Mode

Memory address of operand is calculated as the sum of the value in the specified register and the specified displacement

ADD R1, R1, 4(R2)

R1 ← 32 + Mem[100 + 4] = 32 + 35 = 67
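The two memory addressing modes on this slide can be modeled in a few lines of C. This is only a sketch: the array `main_memory`, the constant `MEM_BASE` and the function names are invented for illustration, and memory is limited to the four words shown in the figure.

```c
#include <assert.h>
#include <stdint.h>

#define MEM_BASE 96u   /* address of the first word in the figure */

/* The four words of main memory at addresses 96, 100, 104, 108. */
const int32_t main_memory[] = { 0, 10, 35, -17 };

/* Read the word stored at a byte address inside the modeled region. */
int32_t load_word(uint32_t addr)
{
    assert(addr >= MEM_BASE && addr < MEM_BASE + sizeof main_memory);
    assert(addr % 4 == 0);
    return main_memory[(addr - MEM_BASE) / 4];
}

/* ADD R1, R1, (R2): the operand address is the value held in R2. */
int32_t add_register_indirect(int32_t r1, uint32_t r2)
{
    return r1 + load_word(r2);
}

/* ADD R1, R1, disp(R2): the operand address is R2 + displacement. */
int32_t add_base_displacement(int32_t r1, uint32_t r2, int32_t disp)
{
    return r1 + load_word(r2 + (uint32_t)disp);
}
```

With R1 = 32 and R2 = 100, `add_register_indirect` returns 42 and `add_base_displacement` with a displacement of 4 returns 67, the two results shown in the figure.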

Page 17:

Addressing Modes: Operand in Memory

5. Absolute Addressing Mode

Memory address of operand is specified directly in the instruction

ADD R1, R2, #100

6. Indexed Addressing Mode

Memory address of operand is calculated as the sum of the contents of 2 registers

ADD R1, R2, (R3+R4)

Others
Auto-increment/decrement (pre/post)
PC relative

Page 18:

Case Study: MIPS I Integer Instruction Set

Registers
32 general purpose registers of 32b each, R0..R31
R0 hardwired to value 0
R31 implicitly used by instructions JAL, JALR
HI, LO: 2 other 32b registers, used implicitly by multiply and divide instructions

Addressing Modes
Immediate, Register direct (arithmetic)
Absolute (jumps)
Base-displacement (loads, stores)
PC relative (branches)

Page 19:

MIPS I ISA: General Comments

All instructions and registers are 32b in size
Load-store architecture: the only instructions that have memory operands are loads and stores

Terminology
Word: 32b
Halfword: 16b
Byte: 8b

Displacements and immediates are signed 16 bit quantities

Page 20:

A RISC Instruction Set

Each entry lists: Instruction, Mnemonics, Example, Meaning

Data Transfer Instructions

Load: LB, LBU, LH, LHU, LUI, LW. Example: lw R2, 4(R3). Meaning: R2 ← Mem[R3+4]

Store: SB, SH, SW. Example: sb R2, -8(R4). Meaning: Mem[R4 - 8] ← R2

Move: MFHI, MFLO, MTHI, MTLO. Example: mfhi R1. Meaning: R1 ← HI

Integer ALU Instructions

Add: ADD, ADDU, ADDI, ADDIU. Example: add R1, R2, R3. Meaning: R1 ← R2 + R3

Subtract: SUB, SUBU. Example: sub R1, R2, R3. Meaning: R1 ← R2 - R3

Multiply: MULT, MULTU. Example: mult R1, R2. Meaning: LO ← LSW(R1*R2); HI ← MSW(R1*R2)

Divide: DIV, DIVU. Example: div R1, R2. Meaning: LO ← R1 div R2; HI ← R1 mod R2

Logical: AND, ANDI, OR, ORI, NOR, XOR, XORI. Example: ori R1, R2, 0xF0. Meaning: R1 ← R2 | ZE(0xF0) (logical immediates are zero extended)

Shift: SLL, SLLV, SRA, SRL. Example: srl R1, R2, 4. Meaning: R1 ← 0000 || (R2)31-4

Comparison: SLT, SLTI, SLTU, SLTIU. Example: slti R1, R2, 16. Meaning: R1 ← 1 if R2 < SE(16), 0 otherwise

LSW/MSW: least/most significant word. SE/ZE: sign/zero extension of the 16 bit immediate.
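The meanings given for `mult` and `slti` can be made concrete with a small C sketch. The function names are illustrative; the code models only the register transfer described above, not a real MIPS implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* mult: the signed 64 bit product is split across HI and LO.
 * LO receives the least significant word, HI the most significant word. */
void mips_mult(int32_t a, int32_t b, uint32_t *hi, uint32_t *lo)
{
    assert(hi != NULL && lo != NULL);
    uint64_t product = (uint64_t)((int64_t)a * (int64_t)b);
    *lo = (uint32_t)(product & 0xFFFFFFFFu);
    *hi = (uint32_t)(product >> 32);
}

/* SE(imm): sign extend a 16 bit immediate to 32 bits. */
int32_t sign_extend16(uint16_t imm)
{
    return (int32_t)(imm ^ 0x8000) - 0x8000;
}

/* slti: result is 1 if rs < SE(imm), 0 otherwise. */
uint32_t mips_slti(int32_t rs, uint16_t imm)
{
    return rs < sign_extend16(imm) ? 1u : 0u;
}
```

For example, multiplying 65536 by 65536 leaves LO = 0 and HI = 1, and the immediate 0xFFF0 sign extends to -16, which is how a 16 bit field can encode negative displacements.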

Page 21:

RISC Instruction Set (contd)

Each entry lists: Instruction, Mnemonics, Example, Meaning

Control Transfer Instructions

Conditional Branch: BEQ, BGEZ, BLTZ, BLEZ, BGTZ, BNE. Example: bltz R2, -16. Meaning: PC ← PC + 4 - 16 if R2 < 0

Jump: J, JR. Example: j <target>. Meaning: PC ← (PC)31-28 || target || 00

Jump & Link: JAL, JALR. Example: jalr R2. Meaning: R31 ← PC + 8; PC ← R2

System Call: SYSCALL. Example: syscall

HW: Write a simple C program and generate the corresponding assembly language program for the MIPS architecture. Understand the instructions, function call mechanism, formats of branch and jump instructions, etc.

Page 22:

CISC vs RISC -- ISA Comparison

a[i++] = a[i] + b[i];

b[i] = b[--i] - 1;

CISC Code:

add (R3)+, (R3), (R4)

sub (R4), -(R4), 1

RISC Code:

lw R1, 0(R3)

lw R2, 0(R4)

add R5, R1, R2

subi R2, R2, 1

sw R5, 0(R3)

sw R2, 0(R4)

Page 23:

MIPS Instruction Encoding

R-Format

Opcode (6 bits) | Src1 (rs) (5 bits) | Src2 (rt) (5 bits) | Dst (rd) (5 bits) | sh amt (5 bits) | Func. code (6 bits)

Example: add R1, R2, R3
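The R-format fields pack into one 32 bit word in the order opcode, rs, rt, rd, shift amount, function code. The following sketch packs them with shifts; the function name is illustrative. It relies on the standard MIPS encoding of `add`, which uses opcode 0 and function code 0x20.

```c
#include <assert.h>
#include <stdint.h>

/* Pack the six R-format fields into a 32 bit instruction word.
 * Bit positions: opcode 31-26, rs 25-21, rt 20-16, rd 15-11,
 * shamt 10-6, funct 5-0. */
uint32_t encode_r_format(uint32_t opcode, uint32_t rs, uint32_t rt,
                         uint32_t rd, uint32_t shamt, uint32_t funct)
{
    assert(opcode < 64 && funct < 64);                    /* 6 bit fields */
    assert(rs < 32 && rt < 32 && rd < 32 && shamt < 32);  /* 5 bit fields */
    return (opcode << 26) | (rs << 21) | (rt << 16) |
           (rd << 11) | (shamt << 6) | funct;
}
```

For add R1, R2, R3 the destination is rd = 1 and the sources are rs = 2 and rt = 3, so `encode_r_format(0, 2, 3, 1, 0, 0x20)` produces 0x00430820.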

Page 24:

MIPS Instruction Encoding

I-Format

Opcode (6 bits) | Src1 (rs) (5 bits) | Dst (rt) (5 bits) | constant (16 bits)

Examples:

addi R1, R2, 8

lw R1, 24(R2)

bltz R1, loop
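An I-format word can be packed and unpacked with the same shift and mask approach. In this sketch the type `IFormat` and the function names are illustrative; the opcode value 8 for `addi` is the standard MIPS encoding. Decoding sign extends the 16 bit constant, as the ISA comments earlier require.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t opcode;  /* bits 31-26 */
    uint32_t rs;      /* bits 25-21, source register */
    uint32_t rt;      /* bits 20-16, destination register */
    int32_t  imm;     /* bits 15-0, sign extended */
} IFormat;

/* Pack the I-format fields into a 32 bit instruction word. */
uint32_t encode_i_format(uint32_t opcode, uint32_t rs, uint32_t rt, int32_t imm)
{
    assert(opcode < 64 && rs < 32 && rt < 32);
    assert(imm >= -32768 && imm <= 32767);   /* must fit in 16 signed bits */
    return (opcode << 26) | (rs << 21) | (rt << 16) | ((uint32_t)imm & 0xFFFFu);
}

/* Split an instruction word into its I-format fields. */
IFormat decode_i_format(uint32_t word)
{
    IFormat f;
    f.opcode = word >> 26;
    f.rs     = (word >> 21) & 0x1Fu;
    f.rt     = (word >> 16) & 0x1Fu;
    f.imm    = (int32_t)((word & 0xFFFFu) ^ 0x8000u) - 0x8000;
    return f;
}
```

For addi R1, R2, 8 the call `encode_i_format(8, 2, 1, 8)` gives 0x20410008, and decoding that word recovers rs = 2, rt = 1 and the constant 8.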

Page 25:

MIPS Instruction Encoding

J-Format

Opcode (6 bits) | Jump address (26 bits)

Example: jal fact
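The 26 bit jump address holds a word address, so the full target is rebuilt by appending two zero bits and taking the top four bits from the program counter, as in the meaning PC ← (PC)31-28 || target || 00. A minimal sketch follows; the function names and the sample address are illustrative, and opcode 3 is the standard MIPS encoding of `jal`.

```c
#include <assert.h>
#include <stdint.h>

/* Pack a J-format word: the byte address of the target is stored
 * as a 26 bit word address (the low two bits are dropped). */
uint32_t encode_j_format(uint32_t opcode, uint32_t target_addr)
{
    assert(opcode < 64);
    assert((target_addr & 3u) == 0);   /* instructions are word aligned */
    return (opcode << 26) | ((target_addr >> 2) & 0x03FFFFFFu);
}

/* Rebuild the jump target: top 4 bits of NPC, 26 bit field, then 00. */
uint32_t jump_target(uint32_t npc, uint32_t instr)
{
    uint32_t addr26 = instr & 0x03FFFFFFu;
    return (npc & 0xF0000000u) | (addr26 << 2);
}
```

If a function such as `fact` were placed at the hypothetical address 0x00400020, `encode_j_format(3, 0x00400020)` gives 0x0C100008, and `jump_target` applied to that word with NPC = 0x00400004 returns 0x00400020 again.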

Page 26:

On Instruction Processing

Fetch
Get the instruction whose address is in PC from memory into IR
Increment PC

Decode
Understand instruction, addressing modes, etc
Calculate effective addresses and fetch operands

Execute
Do required operation

Write back the result of the instruction

Page 27:

Instruction Execution

Instruction Fetch (IF) from program memory to instruction register

IR ← Mem[PC]

Increment PC: NPC ← PC + 4

[Figure: Instr Fetch stage. PC addresses Mem, whose output goes to IR; an adder computes NPC from PC and the constant 4.]

Page 28:

Instruction Execution…

Instruction Decode & Operand Fetch (ID)

A ← RegisterFile[rs]
B ← RegisterFile[rt]
Imm ← sign extend(IR15-0)

[Figure: Instr Fetch and Instr Decode stages. Elements shown: PC, adder with constant 4, NPC, Inst Mem, IR, Reg File, sign extend unit, and latches A, B, Imm.]

Page 29:

Instruction Execution…

Execution (EX)

Arithmetic Inst: ALUout ← A op B, or ALUout ← A op Imm

Load/Store Inst: ALUout ← A + Imm

Branch Inst: ALUout ← NPC + Imm

Jump Inst: PC ← NPC31-28 || IR25-0 || 00

[Figure: Execution stage. Inputs NPC, A, B, Imm feed the ALU, which produces ALUout; a Zero? test produces Cond.]

Page 30:

Instruction Execution…

Memory (MEM)

Store Instr: Mem[ALUout] ← B

Load Instr: LMD ← Mem[ALUout]

[Figure: Execution and Memory stages. Elements shown: NPC, A, B, Imm, ALU, ALUout, Zero? test, Cond, Mem, LMD.]

Page 31:

Instruction Execution…

Write Back (WB)

ALU Inst: RegisterFile[rd] ← ALUout

Load Inst: RegisterFile[rt] ← LMD

Conditional Branch Inst: PC ← ALUout if Cond; PC ← NPC otherwise
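The five steps (IF, ID, EX, MEM, WB) can be tied together in a toy interpreter that processes one instruction per call. This is a sketch under several assumptions, not the lecture's datapath: the `Cpu` type, the 16 word memories and the function name `step` are invented, and only four instructions are modeled (add, addi, lw, sw, using their standard MIPS opcodes), so branches and jumps are left out.

```c
#include <assert.h>
#include <stdint.h>

enum { OP_SPECIAL = 0x00, OP_ADDI = 0x08, OP_LW = 0x23, OP_SW = 0x2B,
       FUNCT_ADD = 0x20, MEM_WORDS = 16 };

typedef struct {
    uint32_t pc;
    int32_t  reg[32];
    uint32_t imem[MEM_WORDS];   /* instruction memory, one word per entry */
    int32_t  dmem[MEM_WORDS];   /* data memory, one word per entry */
} Cpu;

/* Execute one instruction, following the five steps in order. */
void step(Cpu *c)
{
    /* IF: IR <- Mem[PC], NPC <- PC + 4 */
    assert(c->pc % 4 == 0 && c->pc / 4 < MEM_WORDS);
    uint32_t ir  = c->imem[c->pc / 4];
    uint32_t npc = c->pc + 4;

    /* ID: read registers, sign extend the immediate */
    uint32_t opcode = ir >> 26;
    uint32_t rs = (ir >> 21) & 31, rt = (ir >> 16) & 31, rd = (ir >> 11) & 31;
    int32_t a = c->reg[rs], b = c->reg[rt];
    int32_t imm = (int32_t)((ir & 0xFFFFu) ^ 0x8000u) - 0x8000;

    /* EX: A op B for register ALU instructions, A + Imm otherwise */
    int32_t operand = (opcode == OP_SPECIAL) ? b : imm;
    int32_t alu_out = (int32_t)((uint32_t)a + (uint32_t)operand);

    /* MEM: only loads and stores access data memory */
    int32_t lmd = 0;
    if (opcode == OP_LW || opcode == OP_SW) {
        assert(alu_out >= 0 && alu_out % 4 == 0 && alu_out / 4 < MEM_WORDS);
        if (opcode == OP_LW) lmd = c->dmem[alu_out / 4];
        else                 c->dmem[alu_out / 4] = b;
    }

    /* WB: write the result register and update PC */
    if (opcode == OP_SPECIAL) {
        assert((ir & 0x3Fu) == FUNCT_ADD);   /* only add is modeled */
        c->reg[rd] = alu_out;
    } else if (opcode == OP_ADDI) {
        c->reg[rt] = alu_out;
    } else if (opcode == OP_LW) {
        c->reg[rt] = lmd;
    } else {
        assert(opcode == OP_SW);
    }
    c->reg[0] = 0;   /* R0 is hardwired to zero */
    c->pc = npc;
}
```

Loading the words for lw R1, 4(R0); addi R2, R1, 5; add R3, R1, R2; sw R3, 8(R0) into `imem` and calling `step` four times moves a value from data memory through the registers and back, one stage sequence per instruction.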

Page 32:

Processor Datapath

[Figure: complete datapath in five stages. Inst Fetch (IF): PC, adder with constant 4, NPC, Mem, IR. Inst Decode (ID): Reg File, sign extend, latches A, B, Imm. Execution (EX): ALU, ALUout, Zero? test, Cond. Memory (MEM): Mem, LMD. Write Back (WB).]

Page 33:

Our Assumptions

1. Disparity in processor vs memory speed
Time for performing addition, register access, etc. vs memory fetch?
Which stages require memory access?

2. Main memory delays are not typically seen by the instruction processor
Otherwise the timeline is dominated by them
There is some hardware mechanism through which most memory access requests can be satisfied at processor speeds (cache memory)

3. It is preferable that the time required for each stage of instruction processing be the same: the cycle time

Page 34:

Processor cycle time: time required to do

Cache memory access
Register access + some logic (like decode)
ALU operation

[Figure: the five stage datapath again. Inst Fetch (IF): PC, adder with constant 4, NPC, Mem, IR. Inst Decode (ID): Reg File, sign extend, A, B, Imm. Execution (EX): ALU, ALUout, Zero? test, Cond. Memory (MEM): Mem, LMD. Write Back (WB).]

Page 35:

Performance of Processor

Which is more important?
execution time of a single instruction
throughput of instruction execution, i.e., number of instructions executed per unit time

Cycles Per Instruction (CPI)

Current ideas: CPI between 3 and 5

Page 36:

CPI Calculation

Cycles for
ALU Ins. - 4; Load - 5; Store - 4; Conditional - 4; Jump - 3

% of Instructions in a Program
ALU Ins. - 45%; Load - 15%; Store - 10%; Conditional - 20%; Jump - 10%

CPI = ?

CPI = 0.45*4 + 0.15*5 + 0.1*4 + 0.2*4 + 0.1*3 = 4.05

How to improve CPI?
Pipelining: Fetch the next instruction while the previous one is being decoded.
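The CPI calculation is a weighted average of the cycle counts, with the instruction mix as weights. A minimal sketch follows; the function name and the array layout are illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* Average cycles per instruction for a given instruction mix.
 * cycles[i]:   cycles taken by instruction class i
 * fraction[i]: share of class i in the program, as a fraction of 1 */
double weighted_cpi(const double *cycles, const double *fraction, size_t n)
{
    double cpi = 0.0, total = 0.0;
    for (size_t i = 0; i < n; i++) {
        cpi   += cycles[i] * fraction[i];
        total += fraction[i];
    }
    assert(total > 0.999 && total < 1.001);   /* the mix must sum to 100% */
    return cpi;
}
```

With the cycle counts 4, 5, 4, 4, 3 and the mix 45%, 15%, 10%, 20%, 10% listed above (ALU, load, store, conditional, jump), the function returns about 4.05.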
