Lec 9 Systems Architecture 1 Systems Architecture Lecture 10: Alternative Instruction Sets Jeremy R. Johnson Anatole D. Ruslanov William M. Mongan Some or all figures from Computer Organization and Design: The Hardware/Software Approach, Third Edition, by David Patterson and John Hennessy, are copyrighted material (COPYRIGHT 2004 MORGAN KAUFMANN PUBLISHERS, INC. ALL RIGHTS RESERVED).
Dec 29, 2015


Lec 9 Systems Architecture 1

Systems Architecture

Lecture 10: Alternative Instruction Sets

Jeremy R. Johnson, Anatole D. Ruslanov, William M. Mongan

Some or all figures from Computer Organization and Design: The Hardware/Software Approach, Third Edition, by David Patterson and John Hennessy, are copyrighted material (COPYRIGHT 2004 MORGAN KAUFMANN PUBLISHERS, INC. ALL RIGHTS RESERVED).


Introduction

• Objective: To compare MIPS to several alternative instruction set architectures and to better understand the design decisions made in MIPS.

• MIPS is an example of a RISC (Reduced Instruction Set Computer) architecture as compared to a CISC (Complex Instruction Set Computer) architecture.

• MIPS gives up complex instructions, and hence executes a greater number of instructions, in exchange for a simpler implementation with a shorter clock cycle or fewer clock cycles per instruction.

• Alternative instruction sets, including recent versions of MIPS:
– Provide more powerful operations
– Aim at reducing the number of instructions executed
– The danger is a slower cycle time and/or a higher CPI
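
The trade-off on this slide can be made concrete with the usual relation CPU time = instruction count × CPI × clock cycle time. The C sketch below is only an illustration: the struct, the function name, and any numbers fed to it are assumptions, not measurements of a real processor.

```c
/* A hypothetical design point; the fields are illustrative assumptions. */
struct design {
    double instr_count;  /* instructions executed by the program */
    double cpi;          /* average clock cycles per instruction */
    double cycle_ns;     /* clock cycle time in nanoseconds */
};

/* CPU time = instruction count x CPI x clock cycle time. */
double cpu_time_ns(struct design d) {
    return d.instr_count * d.cpi * d.cycle_ns;
}
```

For example, an assumed simple design with 1,000,000 instructions, CPI 1.5, and a 1 ns cycle takes 1,500,000 ns, while an assumed complex design with 700,000 instructions, CPI 2.5, and a 1.25 ns cycle takes 2,187,500 ns: fewer instructions do not help if CPI and cycle time grow faster.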


Characteristics of MIPS

• Load/Store architecture
• General purpose register machine (32 registers)
• ALU operations have 3 register operands (2 source + 1 dest)
• 16 bit constants for immediate mode
• Simple instruction set
– Simple branch operations (beq, bne)
– Use register to set condition (e.g. slt)
– Operations such as move, li, blt built from existing operations
• Uniform encoding
– All instructions are 32 bits long
– Opcode is always in the high-order 6 bits
– 3 types of instruction formats
– Register fields in the same place for all formats
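
Because the encoding is uniform, every field can be pulled out of the 32-bit word with fixed shifts and masks before the format is even known. The following C sketch illustrates that idea; the struct and function names are hypothetical, and the bit positions are those of the standard MIPS R, I, and J formats.

```c
#include <stdint.h>

/* All fields of a 32-bit MIPS instruction word. Which ones are meaningful
   depends on the format (R, I, or J); this sketch extracts them all. */
struct mips_fields {
    uint32_t opcode;  /* bits 31..26, present in every format */
    uint32_t rs;      /* bits 25..21 */
    uint32_t rt;      /* bits 20..16 */
    uint32_t rd;      /* bits 15..11 (R format) */
    uint32_t shamt;   /* bits 10..6  (R format) */
    uint32_t funct;   /* bits 5..0   (R format) */
    int32_t  imm;     /* bits 15..0, sign-extended (I format) */
    uint32_t target;  /* bits 25..0  (J format) */
};

struct mips_fields mips_decode(uint32_t word) {
    struct mips_fields f;
    f.opcode = word >> 26;
    f.rs     = (word >> 21) & 0x1F;
    f.rt     = (word >> 16) & 0x1F;
    f.rd     = (word >> 11) & 0x1F;
    f.shamt  = (word >> 6) & 0x1F;
    f.funct  = word & 0x3F;
    f.imm    = (int32_t)(word & 0xFFFF);
    if (f.imm & 0x8000)
        f.imm -= 0x10000;            /* sign-extend the 16-bit constant */
    f.target = word & 0x03FFFFFF;
    return f;
}
```

Decoding 0x02114020 (add $t0, $s0, $s1) yields opcode 0, rs 16, rt 17, rd 8, and funct 0x20; decoding 0x22080004 (addi $t0, $s0, 4) yields opcode 8, rs 16, rt 8, and immediate 4.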


Design Principles

• Simplicity favors regularity
– uniform instruction length
– all ALU operations have 3 register operands
– register addresses in the same location for all instruction formats
• Smaller is faster
– register architecture
– small number of registers
• Good design demands good compromises
– fixed length instructions and only 16 bit constants
– several instruction formats but consistent length
• Make common cases fast
– immediate addressing
– 16 bit constants
– only beq and bne


MIPS Addressing Modes

• Immediate Addressing
– 16 bit constant from low order bits of instruction
– addi $t0, $s0, 4
• Register Addressing
– add $t0, $s0, $s1
• Base Addressing (displacement addressing)
– 16-bit constant from low order bits of instruction plus base register
– lw $t0, 16($sp)
• PC-Relative Addressing
– (PC+4) + 16-bit word address from instruction (shifted 2 bits to the left to form a byte offset)
– bne $s0, $s1, Target
• Pseudodirect Addressing
– high order 4 bits of PC+4 concatenated with the 26-bit word address (low order 26 bits of the instruction) shifted 2 bits to the left
– j Address
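
The two target calculations can be sketched in C as follows. This is a minimal illustration under the standard MIPS rules (word addresses are shifted left by 2 to become byte addresses); the function names are hypothetical.

```c
#include <stdint.h>

/* PC-relative addressing: (PC + 4) plus the sign-extended 16-bit word
   offset, scaled to bytes by multiplying by 4. */
uint32_t branch_target(uint32_t pc, uint16_t offset_field) {
    int32_t offset = offset_field;
    if (offset & 0x8000)
        offset -= 0x10000;                   /* sign-extend to 32 bits */
    return pc + 4 + (uint32_t)(offset * 4);  /* unsigned arithmetic wraps */
}

/* Pseudodirect addressing: upper 4 bits of PC + 4 concatenated with the
   26-bit word address shifted left by 2. */
uint32_t jump_target(uint32_t pc, uint32_t address_field) {
    return ((pc + 4) & 0xF0000000u) | ((address_field & 0x03FFFFFFu) << 2);
}
```

With a branch at 0x00400000 and an offset field of 3, the target is 0x00400010; a jump at the same PC with address field 0x0100000 lands on 0x00400000.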


PowerPC

• Similar to MIPS (RISC)
• Two additional addressing modes
– Indexed addressing: base register + index register
    • PowerPC: lw $t1, $a0+$s3
    • MIPS:    add $t0, $a0, $s3
               lw $t1, 0($t0)
– Update addressing: displacement addressing + increment
    • PowerPC: lwu $t0, 4($s3)
    • MIPS:    lw $t0, 4($s3)
               addi $s3, $s3, 4
• Additional instructions
– separate counter register used for loops
– PowerPC: bc Loop, ctr!=0
– MIPS:
    Loop:
        addi $t0, $t0, -1
        bne $t0, $zero, Loop
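
The two extra addressing modes can be modelled in C against a toy byte-addressed memory. This is only a sketch of the semantics shown in the MIPS equivalents on this slide; the function names and the memory model are assumptions made for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Read a 32-bit word from a toy byte-addressed memory (host byte order). */
static uint32_t load_word(const uint8_t *mem, uint32_t addr) {
    uint32_t word;
    memcpy(&word, mem + addr, sizeof word);
    return word;
}

/* Indexed addressing: effective address = base register + index register. */
uint32_t lw_indexed(const uint8_t *mem, uint32_t base, uint32_t index) {
    return load_word(mem, base + index);
}

/* Update addressing: load from base + displacement, then write the
   incremented address back to the base register. */
uint32_t lw_update(const uint8_t *mem, uint32_t *base, uint32_t disp) {
    uint32_t value = load_word(mem, *base + disp);
    *base += disp;
    return value;
}
```

Each call does in one step what MIPS needs two instructions for: an add followed by a load, or a load followed by an addi.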


Characteristics of 80x86 / IA-32

• Evolved from 8086 (and backward compatible!)
• Register-Memory architecture
• 8 General purpose registers (evolved)
• Complex instruction set
– Instruction lengths vary from 1 to 17 bytes
– A postbyte is used to indicate the addressing mode when it is not in the opcode
– Instructions may have many variants
– Special instructions (move, push, pop, string, decimal)
– Use condition codes
– 7 data addressing modes (complex), with 8 or 32 bit displacement
– Instructions can operate on 8, 16, or 32 bits (mode changed with a prefix)
– One operand must act as both a source and destination
– One operand can come from memory
• Saving grace:
– the most frequently used instructions are not too difficult to build
– compilers avoid the portions of the architecture that are slow


April 19, 2023 Chapter 2 — Instructions: Language of the Computer


The Intel x86 ISA

• Evolution with backward compatibility
– 8080 (1974): 8-bit microprocessor
    • Accumulator, plus 3 index-register pairs
– 8086 (1978): 16-bit extension to 8080
    • Complex instruction set (CISC)
– 8087 (1980): floating-point coprocessor
    • Adds FP instructions and register stack
– 80286 (1982): 24-bit addresses, MMU
    • Segmented memory mapping and protection
– 80386 (1985): 32-bit extension (now IA-32)
    • Additional addressing modes and operations
    • Paged memory mapping as well as segments


The Intel x86 ISA

• Further evolution…
– i486 (1989): pipelined, on-chip caches and FPU
    • Compatible competitors: AMD, Cyrix, …
– Pentium (1993): superscalar, 64-bit datapath
    • Later versions added MMX (Multi-Media eXtension) instructions
    • The infamous FDIV bug
– Pentium Pro (1995), Pentium II (1997)
    • New microarchitecture (see Colwell, The Pentium Chronicles)
– Pentium III (1999)
    • Added SSE (Streaming SIMD Extensions) and associated registers
– Pentium 4 (2001)
    • New microarchitecture
    • Added SSE2 instructions


The Intel x86 ISA

• And further…
– AMD64 (2003): extended architecture to 64 bits
– EM64T – Extended Memory 64 Technology (2004)
    • AMD64 adopted by Intel (with refinements)
    • Added SSE3 instructions
– Intel Core (2006)
    • Added SSE4 instructions, virtual machine support
– AMD64 (announced 2007): SSE5 instructions
    • Intel declined to follow, instead…
– Advanced Vector Extension (announced 2008)
    • Longer SSE registers, more instructions
• If Intel didn’t extend with compatibility, its competitors would!
– Technical elegance ≠ market success


IA-32 Registers and Data Addressing

• Registers in the 32-bit subset that originated with 80386

Name      Use
EAX       GPR 0
ECX       GPR 1
EDX       GPR 2
EBX       GPR 3
ESP       GPR 4
EBP       GPR 5
ESI       GPR 6
EDI       GPR 7
CS        Code segment pointer
SS        Stack segment pointer (top of stack)
DS        Data segment pointer 0
ES        Data segment pointer 1
FS        Data segment pointer 2
GS        Data segment pointer 3
EIP       Instruction pointer (PC)
EFLAGS    Condition codes


IA-32 Addressing Modes

Mode: Register indirect
– Description: address is in a register
– MIPS equivalent:
    lw $s0, 0($s1)

Mode: Based mode with 8 or 32-bit displacement
– Description: address is contents of base register plus displacement
– MIPS equivalent:
    lw $s0, const($s1)    # const <= 16 bits

Mode: Base plus scaled index (not in MIPS)
– Description: Base + (2^scale × index)
– MIPS equivalent:
    mul $t0, $s2, 2^scale
    add $t0, $t0, $s1
    lw $s0, 0($t0)

Mode: Base plus scaled index with 8 or 32-bit displacement (not in MIPS)
– Description: Base + (2^scale × index) + displacement
– MIPS equivalent:
    mul $t0, $s2, 2^scale
    add $t0, $t0, $s1
    lw $s0, const($t0)    # const <= 16 bits

There are some restrictions on register use (not “general purpose”).
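
The most general of these modes computes base + (2^scale × index) + displacement in a single instruction. A minimal C sketch of that address calculation follows; the function name is hypothetical, and scale is assumed to be in the range 0 to 3, so the index is multiplied by 1, 2, 4, or 8.

```c
#include <stdint.h>

/* Effective address for base plus scaled index plus displacement.
   Assumes 0 <= scale <= 3; the shift multiplies index by 2^scale. */
uint32_t ia32_effective_address(uint32_t base, uint32_t index,
                                unsigned scale, int32_t displacement) {
    return base + (index << scale) + (uint32_t)displacement;
}
```

Passing a displacement of 0 gives the base plus scaled index mode, and passing index 0 as well gives register indirect, so the simpler modes are special cases of this one expression.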


Typical IA-32 Instructions

Instruction           Function
JE name               if equal(condition code) {EIP = name}; EIP - 128 <= name < EIP + 128
JMP name              EIP = name
CALL name             SP = SP - 4; M[SP] = EIP + 5; EIP = name
MOVW EBX,[EDI+45]     EBX = M[EDI+45]
PUSH ESI              SP = SP - 4; M[SP] = ESI
POP EDI               EDI = M[SP]; SP = SP + 4
ADD EAX,#6765         EAX = EAX + 6765
TEST EDX,#42          set condition code (flags) with EDX and 42
MOVSL                 M[EDI] = M[ESI]; EDI = EDI + 4; ESI = ESI + 4


IA-32 Instruction Formats

• Typical formats (note the different instruction lengths); field widths in bits are given in parentheses:

a. JE EIP + displacement
    JE (4) | Condition (4) | Displacement (8)
b. CALL
    CALL (8) | Offset (32)
c. MOV EBX, [EDI + 45]
    MOV (6) | d (1) | w (1) | r/m Postbyte (8) | Displacement (8)
d. PUSH ESI
    PUSH (5) | Reg (3)
e. ADD EAX, #6765
    ADD (4) | Reg (3) | w (1) | Immediate (32)
f. TEST EDX, #42
    TEST (7) | w (1) | Postbyte (8) | Immediate (32)
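
To see how much the lengths differ, the field widths from the six formats on this slide can be summed. The C sketch below does that; the array and function names are hypothetical, and the widths are the ones given in the slide's figure.

```c
#include <stddef.h>

/* Field widths in bits for the six sample formats on the slide. */
const unsigned JE_FIELDS[]   = {4, 4, 8};         /* opcode, condition, displacement */
const unsigned CALL_FIELDS[] = {8, 32};           /* opcode, offset */
const unsigned MOV_FIELDS[]  = {6, 1, 1, 8, 8};   /* opcode, d, w, postbyte, displacement */
const unsigned PUSH_FIELDS[] = {5, 3};            /* opcode, register */
const unsigned ADD_FIELDS[]  = {4, 3, 1, 32};     /* opcode, register, w, immediate */
const unsigned TEST_FIELDS[] = {7, 1, 8, 32};     /* opcode, w, postbyte, immediate */

/* Total instruction length in bytes, given its field widths in bits.
   Every format listed here adds up to a whole number of bytes. */
unsigned format_length_bytes(const unsigned *field_bits, size_t field_count) {
    unsigned total_bits = 0;
    for (size_t i = 0; i < field_count; i++)
        total_bits += field_bits[i];
    return total_bits / 8;
}
```

The six samples come out at 2, 5, 3, 1, 5, and 6 bytes respectively, in contrast to the fixed 4 bytes of every MIPS instruction. The 5-byte CALL also matches the return address EIP + 5 that CALL pushes.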


Implementing IA-32

• Complex instruction set makes implementation difficult
– Hardware translates instructions to simpler microoperations
    • Simple instructions: 1–1
    • Complex instructions: 1–many
– Microengine similar to RISC
– Market share makes this economically viable
• Comparable performance to RISC
– Compilers avoid complex instructions


Architecture Evolution

• Accumulator
– EDSAC
• Extended Accumulator (special purpose register)
– Intel 8086
• General Purpose Register
– register-register (CDC 6600, MIPS, SPARC, PowerPC)
– register-memory (Intel 80386, IBM 360)
– memory-memory (VAX)
• Alternative
– stack
– high-level language


Example: Clearing an Array

Array-index version:

    void clear1(int array[], int size) {
      int i;
      for (i = 0; i < size; i += 1)
        array[i] = 0;
    }

           move $t0,$zero        # i = 0
    loop1: sll  $t1,$t0,2        # $t1 = i * 4
           add  $t2,$a0,$t1      # $t2 = &array[i]
           sw   $zero,0($t2)     # array[i] = 0
           addi $t0,$t0,1        # i = i + 1
           slt  $t3,$t0,$a1      # $t3 = (i < size)
           bne  $t3,$zero,loop1  # if (i < size) goto loop1

Pointer version:

    void clear2(int *array, int size) {
      int *p;
      for (p = &array[0]; p < &array[size]; p = p + 1)
        *p = 0;
    }

           move $t0,$a0          # p = &array[0]
           sll  $t1,$a1,2        # $t1 = size * 4
           add  $t2,$a0,$t1      # $t2 = &array[size]
    loop2: sw   $zero,0($t0)     # Memory[p] = 0
           addi $t0,$t0,4        # p = p + 4
           slt  $t3,$t0,$t2      # $t3 = (p < &array[size])
           bne  $t3,$zero,loop2  # if (p < &array[size]) goto loop2


Comparison of Array vs. Ptr

• Multiply “strength reduced” to shift
• Array version requires shift to be inside loop
– Part of index calculation for incremented i
– cf. incrementing pointer
• Compiler can achieve same effect as manual use of pointers
– Induction variable elimination
– Better to make program clearer and safer
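
Since a compiler can turn the indexed loop into the pointer loop on its own, the two source forms should be interchangeable in behavior. The C sketch below restates both forms so they can be checked against each other; the function names and the all_zero helper are hypothetical additions for illustration.

```c
#include <stddef.h>

/* Array-index form: the clearer version to write by hand. */
void clear_by_index(int array[], size_t size) {
    for (size_t i = 0; i < size; i += 1)
        array[i] = 0;
}

/* Pointer form: roughly what induction variable elimination produces,
   with the index and its scaling replaced by a stepping pointer. */
void clear_by_pointer(int *array, size_t size) {
    for (int *p = array; p < array + size; p = p + 1)
        *p = 0;
}

/* Helper for checking the result: returns 1 if every element is zero. */
int all_zero(const int *array, size_t size) {
    for (size_t i = 0; i < size; i += 1)
        if (array[i] != 0)
            return 0;
    return 1;
}
```

Running both functions on copies of the same array and checking each with all_zero is a quick way to confirm they clear exactly the same elements.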


ARM & MIPS Similarities

• ARM: the most popular embedded core
• Similar basic set of instructions to MIPS

                         ARM              MIPS
Date announced           1985             1985
Instruction size         32 bits          32 bits
Address space            32-bit flat      32-bit flat
Data alignment           Aligned          Aligned
Data addressing modes    9                3
Registers                15 × 32-bit      31 × 32-bit
Input/output             Memory mapped    Memory mapped


Compare and Branch in ARM

• Uses condition codes for result of an arithmetic/logical instruction

– Negative, zero, carry, overflow

– Compare instructions to set condition codes without keeping the result

• Each instruction can be conditional
– Top 4 bits of instruction word: condition value

– Can avoid branches over single instructions
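
Conditional execution can be sketched in C as a check of the top 4 bits of the instruction word against the four condition codes. The names below are hypothetical; the condition encodings follow the classic 32-bit ARM assignment (0 = EQ, 1 = NE, … 14 = AL), and the handling of value 15 is an assumption made for this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Condition codes set by a compare or flag-setting instruction. */
struct arm_flags {
    bool n;  /* negative */
    bool z;  /* zero */
    bool c;  /* carry */
    bool v;  /* overflow */
};

/* The condition value is the top 4 bits of the 32-bit instruction word. */
unsigned arm_condition(uint32_t instruction) {
    return instruction >> 28;
}

/* Returns true if an instruction with this condition value would execute. */
bool arm_condition_passes(unsigned cond, struct arm_flags f) {
    switch (cond) {
    case 0x0: return f.z;                  /* EQ: equal */
    case 0x1: return !f.z;                 /* NE: not equal */
    case 0x2: return f.c;                  /* CS: carry set */
    case 0x3: return !f.c;                 /* CC: carry clear */
    case 0x4: return f.n;                  /* MI: negative */
    case 0x5: return !f.n;                 /* PL: positive or zero */
    case 0x6: return f.v;                  /* VS: overflow */
    case 0x7: return !f.v;                 /* VC: no overflow */
    case 0x8: return f.c && !f.z;          /* HI: unsigned higher */
    case 0x9: return !f.c || f.z;          /* LS: unsigned lower or same */
    case 0xA: return f.n == f.v;           /* GE: signed >= */
    case 0xB: return f.n != f.v;           /* LT: signed < */
    case 0xC: return !f.z && f.n == f.v;   /* GT: signed > */
    case 0xD: return f.z || f.n != f.v;    /* LE: signed <= */
    case 0xE: return true;                 /* AL: always */
    default:  return false;                /* 0xF: treated as "never" here (assumption) */
    }
}
```

An instruction word such as 0x00810002 (an ADD with the EQ condition) executes only when the zero flag is set, so a short if-body can be guarded by its condition field instead of a branch around it.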


Instruction Encoding


Fallacies

• Powerful instruction ⇒ higher performance
– Fewer instructions required
– But complex instructions are hard to implement
    • May slow down all instructions, including simple ones
– Compilers are good at making fast code from simple instructions
• Use assembly code for high performance
– But modern compilers are better at dealing with modern processors
– More lines of code ⇒ more errors and less productivity


Fallacies

• Backward compatibility ⇒ instruction set doesn’t change
– But they do accrete more instructions

[Figure: x86 instruction set]


Pitfalls

• Sequential words are not at sequential addresses
– Increment by 4, not by 1!

• Keeping a pointer to an automatic variable after procedure returns

– e.g., passing pointer back via an argument

– Pointer becomes invalid when stack popped
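
The first pitfall is easy to confirm from C, where pointer arithmetic hides the byte stride. The sketch below measures the distance between two adjacent 32-bit words and computes a word's byte address by hand; the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Byte distance between two adjacent 32-bit words in an array. */
size_t word_stride_bytes(void) {
    uint32_t words[2] = {0, 0};
    const unsigned char *first  = (const unsigned char *)&words[0];
    const unsigned char *second = (const unsigned char *)&words[1];
    return (size_t)(second - first);
}

/* Byte address of word number i in an array starting at byte address base:
   the index is multiplied by 4, not added directly. */
uint32_t word_address(uint32_t base, uint32_t i) {
    return base + 4 * i;
}
```

For the second pitfall, a common way to avoid returning the address of an automatic variable is to have the caller supply the storage and pass a pointer to it into the procedure.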


Concluding Remarks

• Design principles

1. Simplicity favors regularity

2. Smaller is faster

3. Make the common case fast

4. Good design demands good compromises

• Layers of software/hardware
– Compiler, assembler, hardware

• MIPS: typical of RISC ISAs
– cf. x86


Concluding Remarks

• Measure MIPS instruction executions in benchmark programs
– Consider making the common case fast
– Consider compromises

Instruction class    MIPS examples                        SPEC2006 Int    SPEC2006 FP
Arithmetic           add, sub, addi                       16%             48%
Data transfer        lw, sw, lb, lbu, lh, lhu, sb, lui    35%             36%
Logical              and, or, nor, andi, ori, sll, srl    12%             4%
Cond. Branch         beq, bne, slt, slti, sltiu           34%             8%
Jump                 j, jr, jal                           2%              0%


Summary

• Instruction complexity is only one variable
– lower instruction count vs. higher CPI / lower clock rate
• Design Principles:
– simplicity favors regularity
– smaller is faster
– good design demands compromise
– make the common case fast
• Instruction set architecture
– a very important abstraction indeed!