(246431835) instruction set principles (2) (1)

Instruction Set Architecture

• What is an instruction set?
  - Portion of the machine visible to the programmer or compiler writer
  - Each instruction is directly executed by hardware

• Examples
  - DEC VAX
  - Intel IA-32
  - Hennessy and Patterson DLX
  - MIPS, PowerPC, SPARC
  - ARM

Instruction Set Principles

• Chapter 2 in both the 2nd and 3rd editions

Instruction Set Architecture

• How are they represented?
  - By bits
  - Typically 16, 32, or 64 bits

• Variable or fixed
  - Fixed – each instruction is the same size

[Figure: what you see, "Add A,B", beside what the machine sees, an op-code field followed by operand fields]

Application Areas

• Desktop computing
  - Code size is not important
  - Integer and floating-point performance is important

• Servers
  - Floating-point performance is not important
  - Integer performance is important

• Embedded applications
  - Cost and power efficiency are important
  - Code size is important

• Multimedia and DSP applications
  - Real-time constraints
  - Power efficiency

Classifying Instruction Sets

• Type of internal CPU storage
  - Stack – operands are implicit
  - Accumulator – one operand is implicit
  - General purpose registers – explicit operands

• Where are the operands?
  - Accumulator, stack, registers, memory
  - Register-Register (Load-Store), Memory-Register, Memory-Memory

• Size and type of operands
  - 8 bits, 16 bits, unsigned, signed, floating point

• Addressing modes

• Types of operations


• Number of operands

• Two operand format
  - Add R1,R2
  - The result is placed in the first operand
  - Source and result are the same

• Three operand format
  - Add R3,R1,R2
  - The result is placed in the first operand


• Type of internal CPU storage
  - Stack – both operands are implicit
  - Accumulator – one operand is implicit
  - General purpose registers – explicit operands

Pre-1980’s

Code sequence for C = A + B

Stack      Accumulator    Register
Push A     Load A         Load R1,A
Push B     Add B          Load R2,B
Add        Store C        Add R3,R1,R2
Pop C                     Store C,R3
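The three code sequences for C = A + B can be traced with a small sketch. This is only an illustration: the function names and the dictionary used as memory are assumptions made for the example, not part of any real instruction set.

```python
# Toy models of the three machine styles evaluating C = A + B.
# Memory is a dict of named cells; all names here are illustrative.

def run_stack(mem):
    stack = []
    stack.append(mem["A"])            # Push A
    stack.append(mem["B"])            # Push B
    b, a = stack.pop(), stack.pop()   # Add: both operands are implicit
    stack.append(a + b)
    mem["C"] = stack.pop()            # Pop C
    return mem["C"]

def run_accumulator(mem):
    acc = mem["A"]                    # Load A
    acc = acc + mem["B"]              # Add B: the accumulator is implicit
    mem["C"] = acc                    # Store C
    return mem["C"]

def run_register(mem):
    regs = {}
    regs["R1"] = mem["A"]                    # Load R1,A
    regs["R2"] = mem["B"]                    # Load R2,B
    regs["R3"] = regs["R1"] + regs["R2"]     # Add R3,R1,R2: all operands explicit
    mem["C"] = regs["R3"]                    # Store C,R3
    return mem["C"]
```

All three leave the same value in C; they differ only in how many operands each instruction has to name.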

Number of Operands

• Changes the instruction length

• Variable number of operands → variable length

• Variable length increases the complexity of the architecture

Classifying GPR Machines

• The number of memory operands?
• Notation: Rx is a register and A is a memory location

• Load-Store Machines (Register-Register) – 0 memory operands
  - Load R1,A
  - Load R2,B
  - Add R1,R2

• Register-Memory Machines – 1 memory operand
  - Load R1,A
  - Add R1,B

• Memory-Memory Machines – 2 or 3 memory operands
  - Add R1,A,B
  - Add C,A,B

Register-Register Machines (0,3)

• Example
  - Add R1, R2, R3
  - ARM, MIPS, PowerPC, SPARC

• Advantages
  - Simple fixed-length instruction encoding
  - Decoding is simplified
  - CPI is uniform
  - Code generation is simplified

• Disadvantages
  - Instruction count is high
  - Some instructions are short, wasting bits (low density)
  - Leads to large programs

Types of General Purpose Register Machines

• Notation (m,n): m memory operands, n total operands

Need for a memory address may limit the number of registers

Add R1, C

[Figure: instruction format with fields op-code | register operand | memory address operand]

Memory operands take up many bits, leaving fewer bits for the register operand and thus allowing fewer registers.

Register-Memory (1,2)

• Example
  - Add R1, C
  - Add R1, R2
  - Intel 80x86, Motorola 68000

• Advantages
  - Data can be accessed without a separate load
  - Instruction format is simple
  - Instruction density is higher than in the (0,3) model
  - Note: higher instruction density means better use of bits

• Disadvantages
  - A source operand may be destroyed
  - Need for a memory address may limit the number of registers
  - CPI will vary depending on the type of operands

Memory Addressing

• How is the memory address interpreted?

• Byte addressed

• Byte order
  - Big Endian vs. Little Endian

• Alignment
  - An object of size s bytes at byte address A is aligned if A mod s = 0

• Addressing modes
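The byte-order bullet can be made concrete with a short sketch that uses Python's standard `struct` module to lay out one 32-bit value both ways. The value 0x0A0B0C0D is an arbitrary choice for illustration.

```python
import struct

value = 0x0A0B0C0D

# ">I" packs an unsigned 32-bit integer with the most significant byte first.
big_endian = struct.pack(">I", value)

# "<I" packs the same integer with the least significant byte first.
little_endian = struct.pack("<I", value)

# Reading the bytes back with the matching byte order recovers the value.
round_trip = int.from_bytes(little_endian, "little")
```

Here `big_endian` holds the bytes 0A 0B 0C 0D at increasing addresses, while `little_endian` holds 0D 0C 0B 0A.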

Memory-Memory (3,3)

• Advantages
  - Best instruction density
  - Doesn’t waste registers for temporary results

• Disadvantages
  - Large variation in instruction size (3 operand instructions)
  - Large variation in CPI
  - Can worsen the memory bottleneck

• Most complex model – currently extinct
  - VAX

Interpreting Addresses

• Memory is just a bunch of bits.
• How big can the address be?

32-bit addressing

[Figure: Address | Memory]

• Memory is just a bunch of bits.
• How do we address it?

Byte addressing

[Figure: Address | Memory]

Byte Ordering

• What is the length of the thing we are addressing?
• Typical lengths in bits: byte 8, half-word 16, word 32, double word 64

Word addressing

[Figure: Address | Memory]

Alignment

• For a byte addressed machine
  - All byte accesses are aligned
  - Word accesses are aligned if the address is a multiple of 4
  - 32-bit integer accesses are aligned if the address is a multiple of 4
  - 64-bit floating point accesses are aligned if the address is a multiple of 8

Alignment

• An object of size s bytes at byte address A is aligned if A mod s = 0

[Figure: memory bytes Byte 0 through Byte 4 with one word marked]

• Accessing this word is a misaligned access.
• Misalignment may cause slow performance
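The alignment rule A mod s = 0 translates directly into code. A minimal sketch follows; the function name `is_aligned` is chosen for this example only.

```python
def is_aligned(address, size):
    """True if an object of `size` bytes at byte address `address` is aligned."""
    return address % size == 0

# Byte accesses (size 1) are aligned at every address.
byte_ok = is_aligned(7, 1)

# A 4-byte word at address 8 is aligned; at address 1 it is not.
word_ok = is_aligned(8, 4)
word_bad = is_aligned(1, 4)

# An 8-byte floating point value at address 12 is misaligned.
double_bad = is_aligned(12, 8)
```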

Addressing Modes – Data

Mode                Example          Meaning                   When used
Register            Add R4,R3        R[4]=R[4]+R[3]            When a value is in a register
Immediate           Add R4,#3        R[4]=R[4]+3               For constants
Displacement        Add R4,100(R1)   R[4]=R[4]+M[100+R[1]]     Accessing local variables
Register deferred   Add R4,(R1)      R[4]=R[4]+M[R[1]]         Accessing via a pointer
  or indirect
Indexed             Add R3,(R1+R2)   R[3]=R[3]+M[R[1]+R[2]]    Array addressing
Direct              Add R1,(1001)    R[1]=R[1]+M[1001]         Accessing static data
Memory indirect     Add R1,@(R3)     R[1]=R[1]+M[M[R[3]]]      Dereferencing a pointer
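The "Meaning" column of the table can be written as a small sketch. The function `fetch_operand`, its parameter names, and the sample register and memory contents are all assumptions made for illustration; R and M follow the table's notation for registers and memory.

```python
def fetch_operand(mode, R, M, reg=None, reg2=None, const=None):
    """Return the source operand selected by an addressing mode.

    R maps register numbers to values; M maps memory addresses to values.
    """
    if mode == "register":
        return R[reg]                    # R[reg]
    if mode == "immediate":
        return const                     # the constant itself
    if mode == "displacement":
        return M[const + R[reg]]         # M[const + R[reg]]
    if mode == "register_deferred":
        return M[R[reg]]                 # M[R[reg]]
    if mode == "indexed":
        return M[R[reg] + R[reg2]]       # M[R[reg] + R[reg2]]
    if mode == "direct":
        return M[const]                  # M[const]
    if mode == "memory_indirect":
        return M[M[R[reg]]]              # M[M[R[reg]]]
    raise ValueError(f"unknown addressing mode: {mode}")

# Sample machine state (illustrative values only).
R = {1: 16, 2: 3, 3: 400}
M = {19: 7, 116: 5, 400: 500, 500: 9, 1001: 11}

indexed_value = fetch_operand("indexed", R, M, reg=1, reg2=2)          # M[16 + 3]
displaced_value = fetch_operand("displacement", R, M, reg=1, const=100)  # M[100 + 16]
```

Only the way the operand is located changes between modes; the Add itself is the same in every row.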

Addressing Modes

• GPR machines can address constants, registers, and memory

• An addressing mode specifies how the memory address of an operand is computed


Indexed Add R3,(R1+R2) R[3]=R[3]+M[R[1]+R[2]] Array addressing

The address is computed by adding the contents of two registers.

Example
• R1 = 16 (start of the array)
• R2 = 3
• Load Byte R3,(R1+R2)
• Loads the byte at address 19 into register 3.
• Same as loading the 4th byte of the array

[Figure: array occupying memory addresses 16 through 23]


Displacement Add R4,100(R1) R[4]=R[4]+M[100+R[1]] Accessing local variables

The address is computed by adding a constant to the number in a register.

Examples
• R3 is a register containing the number 400
• Load Word R2,0(R3)
  - Loads the word at memory address 400 into register 2
• Load Word R4,4(R3)
  - Loads the word at memory address 404 into register 4

Addressing Modes: Comments

• Programs typically use
  - Displacement
  - Immediate
  - Register deferred

• Displacement addressing modes
  - How large a displacement?
  - Affects instruction length

• Immediate
  - Comparisons for branching
  - Most are small: if (a = 0) then …

Addressing Modes: Comments

• Change the instruction count
  - Complex address modes reduce IC

• Change the organization of the machine
  - Complex address modes increase the complexity
  - May increase CPI

Operator Types

• Arithmetic and logical – Add, Sub, AND, OR
• Data transfer – Load, Store, Move
• Control – Branch, Jump, Calls, Returns
• System – Operating system calls
• Floating point
• Decimal – Decimal arithmetic, character conversion
• String – Move, Compares, Searches
• Graphics – Pixel operations (MMX)

How Large a Displacement?

Add R4,100(R1)

[Figure: instruction format with fields op-code | register operand | register operand | displacement]

The bigger the displacement, the more bits must be used, which leads to a larger instruction size.
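The link between displacement size and field width can be sketched as follows. This assumes the displacement is stored as a two's complement signed field, which the slides do not state; the function names are chosen for this example.

```python
def fits_signed(value, bits):
    """True if `value` fits in a two's complement field of `bits` bits."""
    lowest = -(1 << (bits - 1))
    highest = (1 << (bits - 1)) - 1
    return lowest <= value <= highest

def bits_needed(value):
    """Smallest two's complement field width that can hold `value`."""
    bits = 1
    while not fits_signed(value, bits):
        bits += 1
    return bits

# The displacement 100 from Add R4,100(R1) needs an 8-bit signed field.
width_for_100 = bits_needed(100)

# A displacement of 40000 does not fit in a 16-bit signed field.
too_big_for_16 = not fits_signed(40000, 16)
```

Every extra bit of displacement range is a bit added to the instruction or taken from another field.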

Operator Types

• Data transfer – Load, Store, Move
  - They transfer data.
  - Load transfers data from memory to a register
  - Store transfers data from a register to memory
  - Move transfers data between registers

Operator Types

• Arithmetic and logical – Add, Sub, AND, OR
  - They do the obvious thing
  - Use the ALU (arithmetic logic unit)

Instructions for Control Flow

• Many names – transfer, branch, jump

• Our terminology
  - Jump – unconditional change
  - Branch – conditional change

• Four types
  - Conditional branches
  - Jumps
  - Procedure calls
  - Procedure returns


PC-Relative

• Most architectures use PC-relative addressing
• Uses fewer bits for the destination
• Position independence – easier to link code

Address   Instruction
83        Load …
84        Load …
85        Add …
86        jump 10

PC = 86 before jump
PC = 96 after jump
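The PC-relative example (PC = 86, jump 10, PC = 96) reduces to one addition. The sketch below assumes, to match the slide's numbers, that the offset is added to the address of the jump itself; real machines often add it to the address of the following instruction. The relocation by 1000 is a hypothetical illustration.

```python
def pc_relative_target(pc, offset):
    """Destination of a PC-relative jump: current PC plus the encoded offset."""
    return pc + offset

# Slide example: the jump sits at address 86 and encodes an offset of 10.
target = pc_relative_target(86, 10)

# If the same code were loaded 1000 addresses higher, the encoded offset
# would still be 10; only the PC differs, so the instruction is unchanged.
relocated_target = pc_relative_target(1086, 10)
```

Because the encoded offset does not depend on where the code is loaded, the same instruction bits work at either location.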


• Change the program counter (PC)
  - PC-relative addressing

• The operand for a control flow instruction is the destination

• Control flow instructions also have addressing modes

Addressing Modes

• Direct (immediate) or indirect

• Direct: the destination is known at compile time

• Indirect: the destination is known only at run time
  - Case and switch statements
  - Usually the destination is put in a register

Conditional Branch

• If (a = b) { … } else { … }
• Implementation issue: how is the condition set?

Jump

• Unconditional change in the order of execution of instructions

• Can be used for looping

for (i=1 to 100)

{ … }

Instructions: Summary

• Type of operation
  - ALU, data transfer, floating-point, control

• Are operands explicit or implicit?
  - Explicit – registers and memory
  - Implicit – stack and accumulator

• How many operands are in memory?
  - Load-store, register-memory, memory-memory

• How is the address determined (mode)?
  - Immediate, indirect, etc.

Commonly Executed Instructions

Rank   Instruction          Percentage
1      Load                 22
2      Conditional branch   20
3      Compare              16
4      Store                12
5      Add                  8
6      And                  6
7      Sub                  5
8      Move                 4
9      Call                 1
10     Return               1
       Total                96%

80x86 processor using SPEC92

Three Basic Variations

Variable approach (e.g. VAX)

Operation & no. of operands | Address specifier 1 | Address field 1 | … | Address specifier n | Address field n

• The operation field determines which operation and how many operands
• The address specifier bits determine the address mode (explicit)

Encoding the ISA

• What is the binary representation of the Instruction Set Architecture?

• How are the operations encoded?
  - The op-code

• How are the operands encoded?
  - Variable or fixed

• How is the address mode encoded?
  - Explicit or implicit in the op-code





Fixed approach (e.g. DLX, MIPS, PowerPC, SPARC)

Operation | Address field 1 | Address field 2 | Address field 3

Hybrid approach (e.g. IBM 360/370, Intel 80x86)

Operation | Address specifier | Address field

Operation | Address specifier | Address field 1 | Address field 2

Multiple formats. The op-code determines the length.





Fixed approach (e.g. DLX, MIPS, PowerPC, SPARC)

Operation | Address field 1 | Address field 2 | Address field 3

The address mode is implicit in the op-code
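A fixed-format encoding can be sketched by packing the operation and three address fields into one 32-bit word. The field widths below (6-bit op-code, three 5-bit register fields, 11 unused low bits) are hypothetical values chosen for illustration and are not the layout of DLX, MIPS, or any other real ISA.

```python
# Hypothetical fixed 32-bit format: | op-code 6 | field1 5 | field2 5 | field3 5 | unused 11 |
OPCODE_BITS = 6
FIELD_BITS = 5

def encode(opcode, field1, field2, field3):
    """Pack an operation and three register fields into one 32-bit word."""
    if not 0 <= opcode < (1 << OPCODE_BITS):
        raise ValueError("op-code does not fit in its field")
    for field in (field1, field2, field3):
        if not 0 <= field < (1 << FIELD_BITS):
            raise ValueError("register number does not fit in its field")
    return (opcode << 26) | (field1 << 21) | (field2 << 16) | (field3 << 11)

def decode(word):
    """Unpack a word produced by encode(); every field sits at a fixed position."""
    return (
        (word >> 26) & 0x3F,
        (word >> 21) & 0x1F,
        (word >> 16) & 0x1F,
        (word >> 11) & 0x1F,
    )

# Add R3,R1,R2 with a made-up op-code of 1 for Add.
word = encode(1, 3, 1, 2)
```

Because every field sits at a fixed bit position, decoding is a handful of shifts and masks. The 5-bit fields also cap the machine at 32 registers, which is the kind of limit the earlier register-memory slide describes.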

Compiler

• In the past, decisions were made to make assembly language programming easier

• Today compilers do the work

• Compiler and ISA are not independent

Trade-offs

Format     Decoding         Instruction count
Fixed      Easy to decode   Many instructions
Variable   Hard to decode   Few instructions

Structure of Compilers

Dependency                                       Phase                      Function
Language dependent, machine independent          Front-end                  Transform to common form
Somewhat language dependent                      High-level optimizations   Loop transformations, procedure in-lining
Small language dependence, some machine          Global optimizer           Register allocation
  dependence (e.g. register count and types)
Highly machine dependent, language independent   Code generator             Detailed instruction selection,
                                                                            machine dependent optimizations

The code generator is highly ISA dependent.

The Register Allocation Problem

• Accessing registers is faster than memory
• Compiler should first ensure correctness, then
• Compiler should minimize accesses to memory
• Problem: how to assign variables to registers

Variable 1    Register 1
Variable 2    Register 2
…             …
Variable n    Register m

Many more variables than registers
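Assigning many variables to a few registers is commonly treated as graph coloring: variables that are live at the same time "interfere" and must not share a register. The slides do not name a method, so the greedy coloring below is a swapped-in illustration, and the function name, the interference graph, and the register counts are all assumptions.

```python
def allocate_registers(interference, num_registers):
    """Greedy coloring of an interference graph.

    `interference` maps each variable to the set of variables it conflicts
    with (the relation must be symmetric). Returns (assignment, spilled):
    a dict of variable -> register number, and a list of variables that
    did not get a register and would stay in memory.
    """
    assignment = {}
    spilled = []
    # Handle the most constrained variables first; ties broken by name.
    order = sorted(interference, key=lambda v: (-len(interference[v]), v))
    for var in order:
        taken = {assignment[n] for n in interference[var] if n in assignment}
        free = [r for r in range(num_registers) if r not in taken]
        if free:
            assignment[var] = free[0]
        else:
            spilled.append(var)
    return assignment, spilled

# Illustrative graph: a, b, c are all live together; d overlaps only with a.
interference = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}

two_regs, two_spills = allocate_registers(interference, 2)
three_regs, three_spills = allocate_registers(interference, 3)
```

With two registers one variable must be spilled to memory; with three, every variable fits and `d` reuses a register already given to a variable it does not interfere with.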

The Effect on Register Allocation?

• Stack
  - Register allocation is generally effective

• Global data area
  - Register allocation is difficult

• Heap
  - Register allocation is nearly impossible
  - Too many pointers
  - Too big

Where are the variables?

• Stack
  - Used for local variables and activation records
  - Scalars (single variables as opposed to arrays)
  - Register allocation is generally effective

• Global data area
  - Statically declared objects
  - Global variables and constants
  - Register allocation is difficult

• Heap
  - Dynamic objects
  - Generally accessed via pointers
  - Not scalars
  - Register allocation is nearly impossible

Instruction set properties that help compiler writers

• Orthogonal
  - Operations, data types, and addressing modes should be independent

• Provide primitives, not solutions
  - What works in one language may be bad for another
  - Avoid high-level instructions


Technology
