Computer Architecture Lecture 18 Superscalar Processor and High Performance Computing

Dec 20, 2015

Transcript
Page 1: Computer Architecture Lecture 18 Superscalar Processor and High Performance Computing.

Computer Architecture

Lecture 18

Superscalar Processor and High Performance Computing

Page 2: Computer Architecture Lecture 18 Superscalar Processor and High Performance Computing.

Static Superscalar Pipeline

Fetch 64 bits per clock cycle; Int on left, FP on right
- Can only issue the 2nd instruction if the 1st instruction issues
- More ports for FP registers, so that an FP load and an FP op can issue as a pair

Type              Pipe stages
Int. instruction  IF  ID  EX  MEM WB
FP instruction    IF  ID  EX  MEM WB
Int. instruction      IF  ID  EX  MEM WB
FP instruction        IF  ID  EX  MEM WB
Int. instruction          IF  ID  EX  MEM WB
FP instruction            IF  ID  EX  MEM WB

A 1-cycle load delay expands to a delay of up to 3 instructions in the superscalar pipeline: the instruction in the right half of the same pair cannot use the loaded value, nor can the two instructions in the next issue slot.
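The staggered issue pattern and the "up to 3 instructions" figure can be checked with a short sketch. It assumes a 5-stage pipeline, one Int/FP pair issued per cycle, and a load sitting in the left (integer) slot; the names `stage_at`, `diagram`, and `load_use_delay_slots` are illustrative and do not come from the slides.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def stage_at(pair_index, cycle):
    """Stage occupied in `cycle` (1-based) by the pair that issues
    `pair_index` (0-based) cycles after the first pair, or None."""
    pos = cycle - 1 - pair_index
    return STAGES[pos] if 0 <= pos < len(STAGES) else None

def diagram(num_pairs):
    """One text row per instruction; both halves of a pair share a column offset."""
    rows = []
    last_cycle = num_pairs + len(STAGES) - 1
    for pair in range(num_pairs):
        for kind in ("Int.", "FP"):
            cells = [stage_at(pair, c) or "" for c in range(1, last_cycle + 1)]
            rows.append(f"{kind:<5}" + "".join(f"{cell:<4}" for cell in cells))
    return rows

def load_use_delay_slots(issue_width=2, load_delay_cycles=1):
    """Instructions that cannot consume the loaded value: the rest of the
    load's own issue packet plus every instruction in the delayed packets."""
    return (issue_width - 1) + issue_width * load_delay_cycles

if __name__ == "__main__":
    print("\n".join(diagram(3)))
    print("instructions affected by a 1-cycle load delay:", load_use_delay_slots())
```

With the default arguments the last function gives 1 + 2 = 3, the count quoted on the slide.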

Page 3: Computer Architecture Lecture 18 Superscalar Processor and High Performance Computing.

Dynamic Super Scalar pipeline in operation

[Figure: dual-issue dynamic pipeline. The instruction cache feeds two ISSUE / Rename-to-RS stages over a wider bus (Check for RS, Check for RAW). Reservation stations (Wait for Operands) sit in front of the LD/ST unit (EX/TAC, Mem Access), the FP pipelines (A1-A4 and M1 .. M7), and Divide. Results are written on CDB #1 and CDB #2. Read Reg and Write Reg are also shown.]

Page 4: Computer Architecture Lecture 18 Superscalar Processor and High Performance Computing.

Example 1

Loop: L.D    F0,0(R1)     ; F0 = array element
      ADD.D  F4,F0,F2     ; add the scalar in F2
      S.D    F4,0(R1)     ; store result
      DADDIU R1,R1,#-8    ; 8 bytes (per DW)
      BNE    R1,R2,Loop   ; branch if R1 != R2
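For readers less used to MIPS assembly, the loop corresponds roughly to the following sketch. It assumes that R1 starts at the last array element and that the loop ends once the pointer has stepped past the first element; the function name `add_scalar_in_place` is illustrative.

```python
def add_scalar_in_place(x, s):
    """Walk a non-empty list from its last element down, adding s to each entry."""
    i = len(x) - 1            # assumption: R1 initially points at the last element
    while True:
        x[i] = x[i] + s       # L.D F0 / ADD.D F4,F0,F2 / S.D F4
        i -= 1                # DADDIU R1,R1,#-8 (one doubleword back)
        if i < 0:             # BNE falls through once R1 == R2
            break
    return x

if __name__ == "__main__":
    print(add_scalar_in_place([1.0, 2.0, 3.0], 0.5))
```

As in the assembly, the test is at the bottom of the loop, so the body always runs at least once.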

Page 5: Computer Architecture Lecture 18 Superscalar Processor and High Performance Computing.

Dual issue, 1 Integer Unit

Iteration

Instructions Issues Executes Mem access

Write CDB

Comment

1 L.D F0,0(R1) 1 First issue

1 ADD.D F4,F0,F2 1

1 S.D F4,0(R1)

1 DADDIU R1,R1,#-8

1 BNE R1,R2,Loop

2 L.D F0,0(R1)

2 ADD.D F4,F0,F2

2 S.D F4,0(R1)

2 DADDIU R1,R1,#-8

2 BNE R1,R2,Loop

3 L.D F0,0(R1)

3 ADD.D F4,F0,F2

3 S.D F4,0(R1)

3 DADDIU R1,R1,#-8

3 BNE R1,R2,Loop

Page 6: Computer Architecture Lecture 18 Superscalar Processor and High Performance Computing.

Dual issue, 1 Integer Unit

Iteration

Instructions Issues Executes Mem access

Write CDB

Comment

1 L.D F0,0(R1) 1 2 First issue

1 ADD.D F4,F0,F2 1

1 S.D F4,0(R1) 2

1 DADDIU R1,R1,#-8 2

1 BNE R1,R2,Loop

2 L.D F0,0(R1)

2 ADD.D F4,F0,F2

2 S.D F4,0(R1)

2 DADDIU R1,R1,#-8

2 BNE R1,R2,Loop

3 L.D F0,0(R1)

3 ADD.D F4,F0,F2

3 S.D F4,0(R1)

3 DADDIU R1,R1,#-8

3 BNE R1,R2,Loop

Page 7: Computer Architecture Lecture 18 Superscalar Processor and High Performance Computing.

Dual issue, 1 Integer Unit

Iteration

Instructions Issues Executes Mem access

Write CDB

Comment

1 L.D F0,0(R1) 1 2 3 First issue

1 ADD.D F4,F0,F2 1

1 S.D F4,0(R1) 2 3

1 DADDIU R1,R1,#-8 2

1 BNE R1,R2,Loop 3

2 L.D F0,0(R1)

2 ADD.D F4,F0,F2

2 S.D F4,0(R1)

2 DADDIU R1,R1,#-8

2 BNE R1,R2,Loop

3 L.D F0,0(R1)

3 ADD.D F4,F0,F2

3 S.D F4,0(R1)

3 DADDIU R1,R1,#-8

3 BNE R1,R2,Loop

Page 8: Computer Architecture Lecture 18 Superscalar Processor and High Performance Computing.

Dual issue, 1 Integer Unit

Iteration

Instructions Issues Executes Mem access

Write CDB

Comment

1 L.D F0,0(R1) 1 2 3 4 First issue

1 ADD.D F4,F0,F2 1

1 S.D F4,0(R1) 2 3

1 DADDIU R1,R1,#-8 2 4

1 BNE R1,R2,Loop 3

2 L.D F0,0(R1) 4

2 ADD.D F4,F0,F2 4

2 S.D F4,0(R1)

2 DADDIU R1,R1,#-8

2 BNE R1,R2,Loop

3 L.D F0,0(R1)

3 ADD.D F4,F0,F2

3 S.D F4,0(R1)

3 DADDIU R1,R1,#-8

3 BNE R1,R2,Loop

Page 9: Computer Architecture Lecture 18 Superscalar Processor and High Performance Computing.

Dual issue, 1 Integer Unit

Iteration

Instructions Issues Executes Mem access

Write CDB

Comment

1 L.D F0,0(R1) 1 2 3 4 First issue

1 ADD.D F4,F0,F2 1 5

1 S.D F4,0(R1) 2 3 Wait for ADD.D

1 DADDIU R1,R1,#-8 2 4 5 Wait for ALU

1 BNE R1,R2,Loop 3

2 L.D F0,0(R1) 4

2 ADD.D F4,F0,F2 4

2 S.D F4,0(R1) 5

2 DADDIU R1,R1,#-8 5

2 BNE R1,R2,Loop

3 L.D F0,0(R1)

3 ADD.D F4,F0,F2

3 S.D F4,0(R1)

3 DADDIU R1,R1,#-8

3 BNE R1,R2,Loop

Page 10: Computer Architecture Lecture 18 Superscalar Processor and High Performance Computing.

Dual issue, 1 Integer Unit

Iteration

Instructions Issues Executes Mem access

Write CDB

Comment

1 L.D F0,0(R1) 1 2 3 4 First issue

1 ADD.D F4,F0,F2 1 5,6 Wait for L.D

1 S.D F4,0(R1) 2 3 Wait for ADD.D

1 DADDIU R1,R1,#-8 2 4 5 Wait for ALU

1 BNE R1,R2,Loop 3 6 Wait for DADDIU

2 L.D F0,0(R1) 4

2 ADD.D F4,F0,F2 4

2 S.D F4,0(R1) 5

2 DADDIU R1,R1,#-8 5

2 BNE R1,R2,Loop 6

3 L.D F0,0(R1)

3 ADD.D F4,F0,F2

3 S.D F4,0(R1)

3 DADDIU R1,R1,#-8

3 BNE R1,R2,Loop

Page 11: Computer Architecture Lecture 18 Superscalar Processor and High Performance Computing.

Dual issue, 1 Integer Unit

Iteration

Instructions Issues Executes Mem access

Write CDB

Comment

1 L.D F0,0(R1) 1 2 3 4 First issue

1 ADD.D F4,F0,F2 1 5,6,7 Wait for L.D

1 S.D F4,0(R1) 2 3 Wait for ADD.D

1 DADDIU R1,R1,#-8 2 4 5 Wait for ALU

1 BNE R1,R2,Loop 3 6 Wait for DADDIU

2 L.D F0,0(R1) 4 7 Wait for BNE

2 ADD.D F4,F0,F2 4

2 S.D F4,0(R1) 5

2 DADDIU R1,R1,#-8 5

2 BNE R1,R2,Loop 6

3 L.D F0,0(R1) 7

3 ADD.D F4,F0,F2 7

3 S.D F4,0(R1)

3 DADDIU R1,R1,#-8

3 BNE R1,R2,Loop

Page 12: Computer Architecture Lecture 18 Superscalar Processor and High Performance Computing.

Dual issue, 1 Integer Unit

Iteration

Instructions Issues Executes Mem access

Write CDB

Comment

1 L.D F0,0(R1) 1 2 3 4 First issue

1 ADD.D F4,F0,F2 1 5-7 8 Wait for L.D

1 S.D F4,0(R1) 2 3 Wait for ADD.D

1 DADDIU R1,R1,#-8 2 4 5 Wait for ALU

1 BNE R1,R2,Loop 3 6 Wait for DADDIU

2 L.D F0,0(R1) 4 7 8 Wait for BNE

2 ADD.D F4,F0,F2 4 Wait for L.D

2 S.D F4,0(R1) 5 8 Wait for ADD.D

2 DADDIU R1,R1,#-8 5 Wait for ALU

2 BNE R1,R2,Loop 6 Wait for DADDIU

3 L.D F0,0(R1) 7 Wait for BNE

3 ADD.D F4,F0,F2 7 Wait for L.D

3 S.D F4,0(R1) 8

3 DADDIU R1,R1,#-8 8

3 BNE R1,R2,Loop

Page 13: Computer Architecture Lecture 18 Superscalar Processor and High Performance Computing.

Dual issue, 1 Integer Unit

Iteration

Instructions Issues Executes Mem access

Write CDB

Comment

1 L.D F0,0(R1) 1 2 3 4 First issue

1 ADD.D F4,F0,F2 1 5-7 8 Wait for L.D

1 S.D F4,0(R1) 2 3 9

1 DADDIU R1,R1,#-8 2 4 5 Wait for ALU

1 BNE R1,R2,Loop 3 6 Wait for DADDIU

2 L.D F0,0(R1) 4 7 8 9 Wait for BNE

2 ADD.D F4,F0,F2 4 Wait for L.D

2 S.D F4,0(R1) 5 8 Wait for ADD.D

2 DADDIU R1,R1,#-8 5 9 Wait for ALU

2 BNE R1,R2,Loop 6 Wait for DADDIU

3 L.D F0,0(R1) 7 Wait for BNE

3 ADD.D F4,F0,F2 7 Wait for L.D

3 S.D F4,0(R1) 8 Wait for ADD.D

3 DADDIU R1,R1,#-8 8 Wait for ALU

3 BNE R1,R2,Loop 9

Page 14: Computer Architecture Lecture 18 Superscalar Processor and High Performance Computing.

Dual issue, 1 Integer Unit

Iteration

Instructions Issues Executes Mem access

Write CDB

Comment

1 L.D F0,0(R1) 1 2 3 4 First issue

1 ADD.D F4,F0,F2 1 5-7 8 Wait for L.D

1 S.D F4,0(R1) 2 3 9

1 DADDIU R1,R1,#-8 2 4 5 Wait for ALU

1 BNE R1,R2,Loop 3 6 Wait for DADDIU

2 L.D F0,0(R1) 4 7 8 9 Wait for BNE

2 ADD.D F4,F0,F2 4 10 Wait for L.D

2 S.D F4,0(R1) 5 8 Wait for ADD.D

2 DADDIU R1,R1,#-8 5 9 10 Wait for ALU

2 BNE R1,R2,Loop 6 Wait for DADDIU

3 L.D F0,0(R1) 7 Wait for BNE

3 ADD.D F4,F0,F2 7 Wait for L.D

3 S.D F4,0(R1) 8 Wait for ADD.D

3 DADDIU R1,R1,#-8 8 Wait for ALU

3 BNE R1,R2,Loop 9 Wait for DADDIU

Page 15: Computer Architecture Lecture 18 Superscalar Processor and High Performance Computing.

Dual issue, 1 Integer Unit

Iteration

Instructions Issues Executes Mem access

Write CDB

Comment

1 L.D F0,0(R1) 1 2 3 4 First issue

1 ADD.D F4,F0,F2 1 5-7 8 Wait for L.D

1 S.D F4,0(R1) 2 3 9

1 DADDIU R1,R1,#-8 2 4 5 Wait for ALU

1 BNE R1,R2,Loop 3 6 Wait for DADDIU

2 L.D F0,0(R1) 4 7 8 9 Wait for BNE

2 ADD.D F4,F0,F2 4 10,11 Wait for L.D

2 S.D F4,0(R1) 5 8 Wait for ADD.D

2 DADDIU R1,R1,#-8 5 9 10 Wait for ALU

2 BNE R1,R2,Loop 6 11 Wait for DADDIU

3 L.D F0,0(R1) 7 Wait for BNE

3 ADD.D F4,F0,F2 7 Wait for L.D

3 S.D F4,0(R1) 8 Wait for ADD.D

3 DADDIU R1,R1,#-8 8 Wait for ALU

3 BNE R1,R2,Loop 9 Wait for DADDIU

Page 16: Computer Architecture Lecture 18 Superscalar Processor and High Performance Computing.

Dual issue, 1 Integer Unit

Iteration

Instructions Issues Executes Mem access

Write CDB

Comment

1 L.D F0,0(R1) 1 2 3 4 First issue

1 ADD.D F4,F0,F2 1 5-7 8 Wait for L.D

1 S.D F4,0(R1) 2 3 9

1 DADDIU R1,R1,#-8 2 4 5 Wait for ALU

1 BNE R1,R2,Loop 3 6 Wait for DADDIU

2 L.D F0,0(R1) 4 7 8 9 Wait for BNE

2 ADD.D F4,F0,F2 4 10,11,12 Wait for L.D

2 S.D F4,0(R1) 5 8 Wait for ADD.D

2 DADDIU R1,R1,#-8 5 9 10 Wait for ALU

2 BNE R1,R2,Loop 6 11 Wait for DADDIU

3 L.D F0,0(R1) 7 12 Wait for BNE

3 ADD.D F4,F0,F2 7 Wait for L.D

3 S.D F4,0(R1) 8 Wait for ADD.D

3 DADDIU R1,R1,#-8 8 Wait for ALU

3 BNE R1,R2,Loop 9 Wait for DADDIU

Page 17: Computer Architecture Lecture 18 Superscalar Processor and High Performance Computing.

Dual issue, 1 Integer Unit

Iteration

Instructions Issues Executes Mem access

Write CDB

Comment

1 L.D F0,0(R1) 1 2 3 4 First issue

1 ADD.D F4,F0,F2 1 5-7 8 Wait for L.D

1 S.D F4,0(R1) 2 3 9

1 DADDIU R1,R1,#-8 2 4 5 Wait for ALU

1 BNE R1,R2,Loop 3 6 Wait for DADDIU

2 L.D F0,0(R1) 4 7 8 9 Wait for BNE

2 ADD.D F4,F0,F2 4 10-12 13 Wait for L.D

2 S.D F4,0(R1) 5 8 Wait for ADD.D

2 DADDIU R1,R1,#-8 5 9 10 Wait for ALU

2 BNE R1,R2,Loop 6 11 Wait for DADDIU

3 L.D F0,0(R1) 7 12 13 Wait for BNE

3 ADD.D F4,F0,F2 7 Wait for L.D

3 S.D F4,0(R1) 8 13 Wait for ADD.D

3 DADDIU R1,R1,#-8 8 Wait for ALU

3 BNE R1,R2,Loop 9 Wait for DADDIU

Page 18: Computer Architecture Lecture 18 Superscalar Processor and High Performance Computing.

Dual issue, 1 Integer Unit

Iteration

Instructions Issues Executes Mem access

Write CDB

Comment

1 L.D F0,0(R1) 1 2 3 4 First issue

1 ADD.D F4,F0,F2 1 5-7 8 Wait for L.D

1 S.D F4,0(R1) 2 3 9

1 DADDIU R1,R1,#-8 2 4 5 Wait for ALU

1 BNE R1,R2,Loop 3 6 Wait for DADDIU

2 L.D F0,0(R1) 4 7 8 9 Wait for BNE

2 ADD.D F4,F0,F2 4 10-12 13 Wait for L.D

2 S.D F4,0(R1) 5 8 14 Wait for ADD.D

2 DADDIU R1,R1,#-8 5 9 10 Wait for ALU

2 BNE R1,R2,Loop 6 11 Wait for DADDIU

3 L.D F0,0(R1) 7 12 13 14 Wait for BNE

3 ADD.D F4,F0,F2 7 Wait for L.D

3 S.D F4,0(R1) 8 13 Wait for ADD.D

3 DADDIU R1,R1,#-8 8 14 Wait for ALU

3 BNE R1,R2,Loop 9 Wait for DADDIU

Page 19: Computer Architecture Lecture 18 Superscalar Processor and High Performance Computing.

Dual issue, 1 Integer Unit

Iteration

Instructions Issues Executes Mem access

Write CDB

Comment

1 L.D F0,0(R1) 1 2 3 4 First issue

1 ADD.D F4,F0,F2 1 5-7 8 Wait for L.D

1 S.D F4,0(R1) 2 3 9

1 DADDIU R1,R1,#-8 2 4 5 Wait for ALU

1 BNE R1,R2,Loop 3 6 Wait for DADDIU

2 L.D F0,0(R1) 4 7 8 9 Wait for BNE

2 ADD.D F4,F0,F2 4 10-12 13 Wait for L.D

2 S.D F4,0(R1) 5 8 14 Wait for ADD.D

2 DADDIU R1,R1,#-8 5 9 10 Wait for ALU

2 BNE R1,R2,Loop 6 11 Wait for DADDIU

3 L.D F0,0(R1) 7 12 13 14 Wait for BNE

3 ADD.D F4,F0,F2 7 15 Wait for L.D

3 S.D F4,0(R1) 8 13 Wait for ADD.D

3 DADDIU R1,R1,#-8 8 14 15 Wait for ALU

3 BNE R1,R2,Loop 9 Wait for DADDIU

Page 20: Computer Architecture Lecture 18 Superscalar Processor and High Performance Computing.

Dual issue, 1 Integer Unit

Iteration

Instructions Issues Executes Mem access

Write CDB

Comment

1 L.D F0,0(R1) 1 2 3 4 First issue

1 ADD.D F4,F0,F2 1 5-7 8 Wait for L.D

1 S.D F4,0(R1) 2 3 9

1 DADDIU R1,R1,#-8 2 4 5 Wait for ALU

1 BNE R1,R2,Loop 3 6 Wait for DADDIU

2 L.D F0,0(R1) 4 7 8 9 Wait for BNE

2 ADD.D F4,F0,F2 4 10-12 13 Wait for L.D

2 S.D F4,0(R1) 5 8 14 Wait for ADD.D

2 DADDIU R1,R1,#-8 5 9 10 Wait for ALU

2 BNE R1,R2,Loop 6 11 Wait for DADDIU

3 L.D F0,0(R1) 7 12 13 14 Wait for BNE

3 ADD.D F4,F0,F2 7 15,16 Wait for L.D

3 S.D F4,0(R1) 8 13 Wait for ADD.D

3 DADDIU R1,R1,#-8 8 14 15 Wait for ALU

3 BNE R1,R2,Loop 9 16 Wait for DADDIU

Page 21: Computer Architecture Lecture 18 Superscalar Processor and High Performance Computing.

Dual issue, 1 Integer Unit

Iteration

Instructions Issues Executes Mem access

Write CDB

Comment

1 L.D F0,0(R1) 1 2 3 4 First issue

1 ADD.D F4,F0,F2 1 5-7 8 Wait for L.D

1 S.D F4,0(R1) 2 3 9

1 DADDIU R1,R1,#-8 2 4 5 Wait for ALU

1 BNE R1,R2,Loop 3 6 Wait for DADDIU

2 L.D F0,0(R1) 4 7 8 9 Wait for BNE

2 ADD.D F4,F0,F2 4 10-12 13 Wait for L.D

2 S.D F4,0(R1) 5 8 14 Wait for ADD.D

2 DADDIU R1,R1,#-8 5 9 10 Wait for ALU

2 BNE R1,R2,Loop 6 11 Wait for DADDIU

3 L.D F0,0(R1) 7 12 13 14 Wait for BNE

3 ADD.D F4,F0,F2 7 15,16,17 Wait for L.D

3 S.D F4,0(R1) 8 13 Wait for ADD.D

3 DADDIU R1,R1,#-8 8 14 15 Wait for ALU

3 BNE R1,R2,Loop 9 16 Wait for DADDIU

Page 22: Computer Architecture Lecture 18 Superscalar Processor and High Performance Computing.

Dual issue, 1 Integer Unit

Iteration  Instruction        Issues  Executes  Mem access  Write CDB  Comment
1          L.D F0,0(R1)       1       2         3           4          First issue
1          ADD.D F4,F0,F2     1       5-7       -           8          Wait for L.D
1          S.D F4,0(R1)       2       3         9           -          Wait for ADD.D
1          DADDIU R1,R1,#-8   2       4         -           5          Wait for ALU
1          BNE R1,R2,Loop     3       6         -           -          Wait for DADDIU
2          L.D F0,0(R1)       4       7         8           9          Wait for BNE
2          ADD.D F4,F0,F2     4       10-12     -           13         Wait for L.D
2          S.D F4,0(R1)       5       8         14          -          Wait for ADD.D
2          DADDIU R1,R1,#-8   5       9         -           10         Wait for ALU
2          BNE R1,R2,Loop     6       11        -           -          Wait for DADDIU
3          L.D F0,0(R1)       7       12        13          14         Wait for BNE
3          ADD.D F4,F0,F2     7       15-17     -           18         Wait for L.D
3          S.D F4,0(R1)       8       13        19          -          Wait for ADD.D
3          DADDIU R1,R1,#-8   8       14        -           15         Wait for ALU
3          BNE R1,R2,Loop     9       16        -           -          Wait for DADDIU
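The cycle numbers in the completed table can be reproduced with a small scheduling model. This is a sketch under assumptions that the slides do not state but that fit the table: integer operations and address calculation take 1 cycle, a load needs 1 further cycle for memory, ADD.D takes 3 cycles, a branch issues alone, and nothing after a branch starts executing until the branch has executed. The names `LOOP`, `simulate`, and `separate_address_unit` are illustrative.

```python
# name, destination, registers needed to start, unit, kind, store-data register
LOOP = [
    ("L.D",    "F0", ("R1",),      "addr", "load",   None),
    ("ADD.D",  "F4", ("F0", "F2"), "fp",   "fp",     None),
    ("S.D",    None, ("R1",),      "addr", "store",  "F4"),  # F4 needed only at mem time
    ("DADDIU", "R1", ("R1",),      "alu",  "int",    None),
    ("BNE",    None, ("R1", "R2"), "alu",  "branch", None),
]

def simulate(iterations=3, separate_address_unit=False):
    """Return rows (iteration, name, issue, first execute cycle, mem, write CDB)."""
    ready = {}        # register -> first cycle in which its newest value is usable
    busy = {}         # unit -> cycles in which that unit already starts an operation
    rows = []
    cycle, slots, last_branch = 1, 0, 0
    for it in range(1, iterations + 1):
        for name, dest, srcs, unit, kind, data in LOOP:
            # Issue: two instructions per cycle, but a branch issues alone.
            if kind == "branch" and slots:
                cycle, slots = cycle + 1, 0
            issue = cycle
            slots += 1
            if kind == "branch" or slots == 2:
                cycle, slots = cycle + 1, 0
            # With a single integer unit, address calculation shares the ALU.
            if unit != "fp" and not separate_address_unit:
                unit = "int"
            # Execute: after issue, after the previous branch, once operands are ready.
            start = max([issue + 1, last_branch + 1] + [ready.get(s, 0) for s in srcs])
            used = busy.setdefault(unit, set())
            while start in used:          # wait for the functional unit
                start += 1
            used.add(start)
            mem = cdb = None
            if kind == "load":
                mem, cdb = start + 1, start + 2
            elif kind == "fp":
                cdb = start + 3           # three execute cycles, then the CDB
            elif kind == "store":
                mem = max(start + 1, ready.get(data, 0))
            elif kind == "int":
                cdb = start + 1
            else:
                last_branch = start
            if dest:
                ready[dest] = cdb + 1     # consumers start the cycle after the broadcast
            rows.append((it, name, issue, start, mem, cdb))
    return rows

if __name__ == "__main__":
    for row in simulate():
        print(row)
```

Calling `simulate(separate_address_unit=True)` models a machine in which address calculation no longer competes with DADDIU and BNE for the single integer unit.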

Page 23: Computer Architecture Lecture 18 Superscalar Processor and High Performance Computing.

Separate MEM and INT

[Figure: the dual-issue dynamic pipeline with a separate integer unit. The instruction cache feeds two ISSUE / Rename-to-RS stages over a wider bus (Check for RS, Check for RAW). Reservation stations (Wait for Operands) sit in front of the Integer unit (EX), the LD/ST unit (EX/TAC, Mem Access), the FP pipelines (A1-A4 and M1 .. M7), and Divide. Results are written on CDB #1 and CDB #2. Read Reg and Write Reg are also shown.]

Page 24: Computer Architecture Lecture 18 Superscalar Processor and High Performance Computing.

Dual issue, 2 Integer Unit

Iteration

Instructions Issues Executes Mem access

Write CDB

Comment

1 L.D F0,0(R1) 1 First issue

1 ADD.D F4,F0,F2 1

1 S.D F4,0(R1)

1 DADDIU R1,R1,#-8

1 BNE R1,R2,Loop

2 L.D F0,0(R1)

2 ADD.D F4,F0,F2

2 S.D F4,0(R1)

2 DADDIU R1,R1,#-8

2 BNE R1,R2,Loop

3 L.D F0,0(R1)

3 ADD.D F4,F0,F2

3 S.D F4,0(R1)

3 DADDIU R1,R1,#-8

3 BNE R1,R2,Loop

Page 25: Computer Architecture Lecture 18 Superscalar Processor and High Performance Computing.

Dual issue, 2 Integer Unit

Iteration

Instructions Issues Executes Mem access

Write CDB

Comment

1 L.D F0,0(R1) 1 2 First issue

1 ADD.D F4,F0,F2 1

1 S.D F4,0(R1) 2

1 DADDIU R1,R1,#-8 2

1 BNE R1,R2,Loop

2 L.D F0,0(R1)

2 ADD.D F4,F0,F2

2 S.D F4,0(R1)

2 DADDIU R1,R1,#-8

2 BNE R1,R2,Loop

3 L.D F0,0(R1)

3 ADD.D F4,F0,F2

3 S.D F4,0(R1)

3 DADDIU R1,R1,#-8

3 BNE R1,R2,Loop

Page 26: Computer Architecture Lecture 18 Superscalar Processor and High Performance Computing.

Dual issue, 2 Integer Unit

Iteration

Instructions Issues Executes Mem access

Write CDB

Comment

1 L.D F0,0(R1) 1 2 3 First issue

1 ADD.D F4,F0,F2 1

1 S.D F4,0(R1) 2 3

1 DADDIU R1,R1,#-8 2 3

1 BNE R1,R2,Loop 3

2 L.D F0,0(R1)

2 ADD.D F4,F0,F2

2 S.D F4,0(R1)

2 DADDIU R1,R1,#-8

2 BNE R1,R2,Loop

3 L.D F0,0(R1)

3 ADD.D F4,F0,F2

3 S.D F4,0(R1)

3 DADDIU R1,R1,#-8

3 BNE R1,R2,Loop

Page 27: Computer Architecture Lecture 18 Superscalar Processor and High Performance Computing.

Dual issue, 2 Integer Unit

Iteration

Instructions Issues Executes Mem access

Write CDB

Comment

1 L.D F0,0(R1) 1 2 3 4 First issue

1 ADD.D F4,F0,F2 1 Wait for LD.D

1 S.D F4,0(R1) 2 3 Wait for ADD.D

1 DADDIU R1,R1,#-8 2 3 4 Executes earlier

1 BNE R1,R2,Loop 3

2 L.D F0,0(R1) 4

2 ADD.D F4,F0,F2 4

2 S.D F4,0(R1)

2 DADDIU R1,R1,#-8

2 BNE R1,R2,Loop

3 L.D F0,0(R1)

3 ADD.D F4,F0,F2

3 S.D F4,0(R1)

3 DADDIU R1,R1,#-8

3 BNE R1,R2,Loop

Page 28: Computer Architecture Lecture 18 Superscalar Processor and High Performance Computing.

Dual issue, 2 Integer Unit

Iteration

Instructions Issues Executes Mem access

Write CDB

Comment

1 L.D F0,0(R1) 1 2 3 4 First issue

1 ADD.D F4,F0,F2 1 5 Wait for LD.D

1 S.D F4,0(R1) 2 3 Wait for ADD.D

1 DADDIU R1,R1,#-8 2 3 4 Executes earlier

1 BNE R1,R2,Loop 3 5 Wait for ADDIU

2 L.D F0,0(R1) 4

2 ADD.D F4,F0,F2 4

2 S.D F4,0(R1) 5

2 DADDIU R1,R1,#-8 5

2 BNE R1,R2,Loop

3 L.D F0,0(R1)

3 ADD.D F4,F0,F2

3 S.D F4,0(R1)

3 DADDIU R1,R1,#-8

3 BNE R1,R2,Loop

Page 29: Computer Architecture Lecture 18 Superscalar Processor and High Performance Computing.

Dual issue, 2 Integer Unit

Iteration

Instructions Issues Executes Mem access

Write CDB

Comment

1 L.D F0,0(R1) 1 2 3 4 First issue

1 ADD.D F4,F0,F2 1 5,6 Wait for LD.D

1 S.D F4,0(R1) 2 3 Wait for ADD.D

1 DADDIU R1,R1,#-8 2 3 4 Executes earlier

1 BNE R1,R2,Loop 3 5 Wait for ADDIU

2 L.D F0,0(R1) 4 6 Wait for BNE

2 ADD.D F4,F0,F2 4 Wait for L.D

2 S.D F4,0(R1) 5 Wait for ADD.D

2 DADDIU R1,R1,#-8 5 6 Executes earlier

2 BNE R1,R2,Loop 6

3 L.D F0,0(R1)

3 ADD.D F4,F0,F2

3 S.D F4,0(R1)

3 DADDIU R1,R1,#-8

3 BNE R1,R2,Loop

Page 30: Computer Architecture Lecture 18 Superscalar Processor and High Performance Computing.

Dual issue, 2 Integer Unit

Iteration

Instructions Issues Executes Mem access

Write CDB

Comment

1 L.D F0,0(R1) 1 2 3 4 First issue

1 ADD.D F4,F0,F2 1 5,6,7 Wait for LD.D

1 S.D F4,0(R1) 2 3 Wait for ADD.D

1 DADDIU R1,R1,#-8 2 3 4 Executes earlier

1 BNE R1,R2,Loop 3 5 Wait for ADDIU

2 L.D F0,0(R1) 4 6 7 Wait for BNE

2 ADD.D F4,F0,F2 4 Wait for L.D

2 S.D F4,0(R1) 5 7 Wait for ADD.D

2 DADDIU R1,R1,#-8 5 6 7 Executes earlier

2 BNE R1,R2,Loop 6

3 L.D F0,0(R1) 7

3 ADD.D F4,F0,F2 7

3 S.D F4,0(R1)

3 DADDIU R1,R1,#-8

3 BNE R1,R2,Loop

Page 31: Computer Architecture Lecture 18 Superscalar Processor and High Performance Computing.

Dual issue, 2 Integer Unit

Iteration

Instructions Issues Executes Mem access

Write CDB

Comment

1 L.D F0,0(R1) 1 2 3 4 First issue

1 ADD.D F4,F0,F2 1 5-7 8 Wait for LD.D

1 S.D F4,0(R1) 2 3 Wait for ADD.D

1 DADDIU R1,R1,#-8 2 3 4 Executes earlier

1 BNE R1,R2,Loop 3 5 Wait for ADDIU

2 L.D F0,0(R1) 4 6 7 8 Wait for BNE

2 ADD.D F4,F0,F2 4 Wait for L.D

2 S.D F4,0(R1) 5 7 Wait for ADD.D

2 DADDIU R1,R1,#-8 5 6 7 Executes earlier

2 BNE R1,R2,Loop 6 8

3 L.D F0,0(R1) 7

3 ADD.D F4,F0,F2 7

3 S.D F4,0(R1) 8

3 DADDIU R1,R1,#-8 8

3 BNE R1,R2,Loop

Page 32: Computer Architecture Lecture 18 Superscalar Processor and High Performance Computing.

Dual issue, 2 Integer Unit

Iteration

Instructions Issues Executes Mem access

Write CDB

Comment

1 L.D F0,0(R1) 1 2 3 4 First issue

1 ADD.D F4,F0,F2 1 5-7 8 Wait for LD.D

1 S.D F4,0(R1) 2 3 9 Wait for ADD.D

1 DADDIU R1,R1,#-8 2 3 4 Executes earlier

1 BNE R1,R2,Loop 3 5 Wait for ADDIU

2 L.D F0,0(R1) 4 6 7 8 Wait for BNE

2 ADD.D F4,F0,F2 4 9 Wait for L.D

2 S.D F4,0(R1) 5 7 Wait for ADD.D

2 DADDIU R1,R1,#-8 5 6 7 Executes earlier

2 BNE R1,R2,Loop 6 8 Wait for ADDIU

3 L.D F0,0(R1) 7 9 Wait for BNE

3 ADD.D F4,F0,F2 7 Wait for L.D

3 S.D F4,0(R1) 8

3 DADDIU R1,R1,#-8 8 9

3 BNE R1,R2,Loop 9

Page 33: Computer Architecture Lecture 18 Superscalar Processor and High Performance Computing.

Dual issue, 2 Integer Unit

Iteration

Instructions Issues Executes Mem access

Write CDB

Comment

1 L.D F0,0(R1) 1 2 3 4 First issue

1 ADD.D F4,F0,F2 1 5-7 8 Wait for LD.D

1 S.D F4,0(R1) 2 3 9 Wait for ADD.D

1 DADDIU R1,R1,#-8 2 3 4 Executes earlier

1 BNE R1,R2,Loop 3 5 Wait for ADDIU

2 L.D F0,0(R1) 4 6 7 8 Wait for BNE

2 ADD.D F4,F0,F2 4 9,10 Wait for L.D

2 S.D F4,0(R1) 5 7 Wait for ADD.D

2 DADDIU R1,R1,#-8 5 6 7 Executes earlier

2 BNE R1,R2,Loop 6 8 Wait for ADDIU

3 L.D F0,0(R1) 7 9 10 Wait for BNE

3 ADD.D F4,F0,F2 7 Wait for L.D

3 S.D F4,0(R1) 8 10 Wait for ADD.D

3 DADDIU R1,R1,#-8 8 9 10 Executes earlier

3 BNE R1,R2,Loop 9 Wait for ADDIU

Page 34: Computer Architecture Lecture 18 Superscalar Processor and High Performance Computing.

Dual issue, 2 Integer Unit

Iteration

Instructions Issues Executes Mem access

Write CDB

Comment

1 L.D F0,0(R1) 1 2 3 4 First issue

1 ADD.D F4,F0,F2 1 5-7 8 Wait for LD.D

1 S.D F4,0(R1) 2 3 9 Wait for ADD.D

1 DADDIU R1,R1,#-8 2 3 4 Executes earlier

1 BNE R1,R2,Loop 3 5 Wait for ADDIU

2 L.D F0,0(R1) 4 6 7 8 Wait for BNE

2 ADD.D F4,F0,F2 4 9,10,11 Wait for L.D

2 S.D F4,0(R1) 5 7 Wait for ADD.D

2 DADDIU R1,R1,#-8 5 6 7 Executes earlier

2 BNE R1,R2,Loop 6 8 Wait for ADDIU

3 L.D F0,0(R1) 7 9 10 11 Wait for BNE

3 ADD.D F4,F0,F2 7 Wait for L.D

3 S.D F4,0(R1) 8 10 Wait for ADD.D

3 DADDIU R1,R1,#-8 8 9 10 Executes earlier

3 BNE R1,R2,Loop 9 11 Wait for ADDIU

Page 35: Computer Architecture Lecture 18 Superscalar Processor and High Performance Computing.

Dual issue, 2 Integer Unit

Iteration

Instructions Issues Executes Mem access

Write CDB

Comment

1 L.D F0,0(R1) 1 2 3 4 First issue

1 ADD.D F4,F0,F2 1 5-7 8 Wait for LD.D

1 S.D F4,0(R1) 2 3 9 Wait for ADD.D

1 DADDIU R1,R1,#-8 2 3 4 Executes earlier

1 BNE R1,R2,Loop 3 5 Wait for ADDIU

2 L.D F0,0(R1) 4 6 7 8 Wait for BNE

2 ADD.D F4,F0,F2 4 9-11 12 Wait for L.D

2 S.D F4,0(R1) 5 7 Wait for ADD.D

2 DADDIU R1,R1,#-8 5 6 7 Executes earlier

2 BNE R1,R2,Loop 6 8 Wait for ADDIU

3 L.D F0,0(R1) 7 9 10 11 Wait for BNE

3 ADD.D F4,F0,F2 7 12 Wait for L.D

3 S.D F4,0(R1) 8 10 Wait for ADD.D

3 DADDIU R1,R1,#-8 8 9 10 Executes earlier

3 BNE R1,R2,Loop 9 11 Wait for ADDIU

Page 36: Computer Architecture Lecture 18 Superscalar Processor and High Performance Computing.

Dual issue, 2 Integer Unit

Iteration

Instructions Issues Executes Mem access

Write CDB

Comment

1 L.D F0,0(R1) 1 2 3 4 First issue

1 ADD.D F4,F0,F2 1 5-7 8 Wait for LD.D

1 S.D F4,0(R1) 2 3 9 Wait for ADD.D

1 DADDIU R1,R1,#-8 2 3 4 Executes earlier

1 BNE R1,R2,Loop 3 5 Wait for ADDIU

2 L.D F0,0(R1) 4 6 7 8 Wait for BNE

2 ADD.D F4,F0,F2 4 9-11 12 Wait for L.D

2 S.D F4,0(R1) 5 7 13 Wait for ADD.D

2 DADDIU R1,R1,#-8 5 6 7 Executes earlier

2 BNE R1,R2,Loop 6 8 Wait for ADDIU

3 L.D F0,0(R1) 7 9 10 11 Wait for BNE

3 ADD.D F4,F0,F2 7 12,13 Wait for L.D

3 S.D F4,0(R1) 8 10 Wait for ADD.D

3 DADDIU R1,R1,#-8 8 9 10 Executes earlier

3 BNE R1,R2,Loop 9 11 Wait for ADDIU

Page 37: Computer Architecture Lecture 18 Superscalar Processor and High Performance Computing.

Dual issue, 2 Integer Unit

Iteration  Instruction        Issues  Executes  Mem access  Write CDB  Comment
1          L.D F0,0(R1)       1       2         3           4          First issue
1          ADD.D F4,F0,F2     1       5-7       -           8          Wait for L.D
1          S.D F4,0(R1)       2       3         9           -          Wait for ADD.D
1          DADDIU R1,R1,#-8   2       3         -           4          Executes earlier
1          BNE R1,R2,Loop     3       5         -           -          Wait for DADDIU
2          L.D F0,0(R1)       4       6         7           8          Wait for BNE
2          ADD.D F4,F0,F2     4       9-11      -           12         Wait for L.D
2          S.D F4,0(R1)       5       7         13          -          Wait for ADD.D
2          DADDIU R1,R1,#-8   5       6         -           7          Executes earlier
2          BNE R1,R2,Loop     6       8         -           -          Wait for DADDIU
3          L.D F0,0(R1)       7       9         10          11         Wait for BNE
3          ADD.D F4,F0,F2     7       12-14     -           15         Wait for L.D
3          S.D F4,0(R1)       8       10        16          -          Wait for ADD.D
3          DADDIU R1,R1,#-8   8       9         -           10         Executes earlier
3          BNE R1,R2,Loop     9       11        -           -          Wait for DADDIU
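The effect of the second integer unit can be quantified from the two completed tables. The sketch below uses the cycle in which each iteration's BNE executes as the per-iteration marker: 6, 11, 16 with one integer unit and 5, 8, 11 with two. The names `BNE_EXECUTE`, `cycles_per_iteration`, and `summary` are illustrative, and three iterations are a small sample, so the figures are estimates.

```python
INSTRUCTIONS_PER_ITERATION = 5   # L.D, ADD.D, S.D, DADDIU, BNE

# Cycle in which BNE executes in iterations 1..3, read off the two tables.
BNE_EXECUTE = {
    "1 integer unit": [6, 11, 16],
    "2 integer units": [5, 8, 11],
}

def cycles_per_iteration(bne_cycles):
    """Average distance between consecutive branch executions."""
    gaps = [later - earlier for earlier, later in zip(bne_cycles, bne_cycles[1:])]
    return sum(gaps) / len(gaps)

def summary():
    """Map each configuration to (cycles per iteration, instructions per cycle)."""
    result = {}
    for name, cycles in BNE_EXECUTE.items():
        per_iteration = cycles_per_iteration(cycles)
        result[name] = (per_iteration, INSTRUCTIONS_PER_ITERATION / per_iteration)
    return result

if __name__ == "__main__":
    for name, (cycles, ipc) in summary().items():
        print(f"{name}: {cycles:.1f} cycles/iteration, {ipc:.2f} instructions/cycle")
```

Under this measure an iteration takes 5 cycles with one integer unit and 3 cycles with two, so the execution rate rises from 1 to about 1.67 instructions per cycle, although both machines issue an iteration every 3 cycles.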

Page 38: Computer Architecture Lecture 18 Superscalar Processor and High Performance Computing.

Speculative Execution

Needed to overcome branch hazards and to support precise exceptions

Page 39: Computer Architecture Lecture 18 Superscalar Processor and High Performance Computing.

Speculative Pipeline

[Figure: speculative pipeline. ISSUE / Rename to RS (Check for RS, Check for RAW) feeds reservation stations (Wait for Operands) in front of the Integer unit (EX), the LD/ST unit (EX/TAC, Mem Access), the FP pipelines (A1-A4 and M1 .. M7), and Divide. Results go over the CDB to the ROB. Read Reg and Write Reg are also shown.]

Page 40: Computer Architecture Lecture 18 Superscalar Processor and High Performance Computing.

The Hardware: Reorder Buffer

If instructions write their results in program order, registers and memory always hold the correct values.

Reorder buffer (ROB): reorders out-of-order instructions into program order at the time they write registers/memory (commit).

If an instruction goes wrong, handle it at commit time: just flush the instructions after it.

An instruction cannot write registers/memory immediately after execution, so the ROB also buffers the results.

The original Tomasulo algorithm has no such structure.

[Figure: block diagram with IM, Fetch Unit, Decode, Rename, Regfile, Reorder Buffer, two RS feeding FU1 and FU2, L-buf, S-buf, and DM.]
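The two roles of the reorder buffer described above, buffering results and releasing them in program order, can be sketched as follows. This is a minimal illustration, not the hardware organization: the class name `ReorderBuffer`, its methods, and the dictionary-based register file are all assumptions.

```python
from collections import deque

class ReorderBuffer:
    """Entries are kept in program order; the oldest instruction is at the head."""

    def __init__(self):
        self.entries = deque()
        self.next_tag = 0

    def allocate(self, dest):
        """Reserve an entry at issue time and return its tag."""
        entry = {"tag": self.next_tag, "dest": dest, "value": None, "done": False}
        self.next_tag += 1
        self.entries.append(entry)
        return entry["tag"]

    def write_result(self, tag, value):
        """Buffer a result; this may happen out of program order."""
        for entry in self.entries:
            if entry["tag"] == tag:
                entry["value"], entry["done"] = value, True
                return
        raise KeyError(tag)

    def commit(self, regfile):
        """Write finished results to the register file, oldest first,
        stopping at the first instruction that has not finished."""
        committed = []
        while self.entries and self.entries[0]["done"]:
            entry = self.entries.popleft()
            regfile[entry["dest"]] = entry["value"]
            committed.append(entry["tag"])
        return committed

if __name__ == "__main__":
    rob, regs = ReorderBuffer(), {}
    older, younger = rob.allocate("F0"), rob.allocate("F4")
    rob.write_result(younger, 2.0)      # the younger instruction finishes first
    print(rob.commit(regs), regs)       # nothing commits yet
    rob.write_result(older, 1.0)
    print(rob.commit(regs), regs)       # both commit, in program order
```

The younger result waits in the buffer until the older instruction has finished, which is what keeps the register file in program order.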

Page 41: Computer Architecture Lecture 18 Superscalar Processor and High Performance Computing.

Speculative Tomasulo Algorithm

Issue: get an instruction from the FP op queue
  Condition: a free RS at the required FU
  Actions: (1) decode the instruction; (2) allocate an RS and a ROB entry; (3) do source register renaming; (4) do destination register renaming; (5) read the register file; (6) dispatch the decoded and renamed instruction to the RS and the ROB

Execution: operate on operands (EX)
  Condition: at a given FU, at least one instruction is ready
  Action: select a ready instruction and send it to the FU

Write result: finish execution (WB)
  Condition: at a given FU, some instruction finishes FU execution
  Actions: (1) the FU writes to the CDB, which broadcasts to all RSs and to the ROB; (2) the FU broadcasts the tag (ROB index) to all RSs; (3) de-allocate the RS. Note: no register status update at this time
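The write-result step can be illustrated with a short sketch. It assumes that each reservation station is a dictionary whose source operands are either `("val", value)` or `("tag", rob_index)`, and that the ROB is a dictionary keyed by ROB index; the names `write_result` and `is_ready` are illustrative.

```python
def is_ready(station):
    """A station can execute once both source operands hold values."""
    return station["src1"][0] == "val" and station["src2"][0] == "val"

def write_result(producer, value, stations, rob):
    """Broadcast (tag, value) on the CDB, then free the producing station."""
    tag = producer["dest_tag"]                 # the ROB index doubles as the tag
    for station in stations:
        for operand in ("src1", "src2"):
            if station[operand] == ("tag", tag):
                station[operand] = ("val", value)   # waiting operand captured
    rob[tag]["value"] = value                  # the ROB buffers the result
    rob[tag]["done"] = True
    producer["busy"] = False                   # de-allocate the RS
    # The register status table is deliberately left untouched here.

if __name__ == "__main__":
    rob = {0: {"value": None, "done": False}, 1: {"value": None, "done": False}}
    load = {"busy": True, "dest_tag": 0, "src1": ("val", 0.0), "src2": ("val", 0.0)}
    add = {"busy": True, "dest_tag": 1, "src1": ("tag", 0), "src2": ("val", 5.0)}
    write_result(load, 3.0, [load, add], rob)
    print(is_ready(add), rob[0])
```

The architectural register is updated only at commit, which is why this step writes to the ROB rather than to the register file.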

Page 42: Computer Architecture Lecture 18 Superscalar Processor and High Performance Computing.

Speculative Tomasulo Algorithm

Commit: update the register with the reordered result
  Condition: the ROB is not empty and the instruction at the ROB head has finished execution
  Actions if there is no misprediction/exception: (1) write the result to register/memory; (2) update the register status; (3) de-allocate the ROB entry
  Actions on a misprediction/exception: flush the pipeline, e.g. (1) flush the IFQ; (2) clear the register status; (3) flush all RSs and reset the FUs; (4) reset the ROB
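The two commit cases can be sketched as one function over a simplified machine state. The `state` dictionary, its keys (`rob`, `ifq`, `reg_status`, `stations`, `regfile`), the per-entry `mispredicted` flag, and the name `commit_step` are assumptions made for illustration; restarting fetch at the correct address is left out.

```python
def commit_step(state):
    """Try to commit the instruction at the ROB head.
    Returns "stall", "commit", or "flush"."""
    rob = state["rob"]                       # list, oldest entry first
    if not rob or not rob[0]["done"]:
        return "stall"                       # head has not finished executing
    head = rob[0]
    if head.get("mispredicted") or head.get("exception"):
        state["ifq"].clear()                 # (1) flush the instruction fetch queue
        state["reg_status"].clear()          # (2) clear the register status
        state["stations"].clear()            # (3) flush all reservation stations
        rob.clear()                          # (4) reset the ROB
        return "flush"
    if head["dest"] is not None:
        state["regfile"][head["dest"]] = head["value"]        # (1) write the result
        if state["reg_status"].get(head["dest"]) == head["tag"]:
            del state["reg_status"][head["dest"]]             # (2) update register status
    rob.pop(0)                               # (3) de-allocate the ROB entry
    return "commit"

if __name__ == "__main__":
    state = {
        "rob": [
            {"tag": 0, "dest": "R2", "value": 7, "done": True},
            {"tag": 1, "dest": None, "value": None, "done": True, "mispredicted": True},
            {"tag": 2, "dest": "R1", "value": 99, "done": True},
        ],
        "ifq": ["LD R2,0(R1)"],
        "reg_status": {"R2": 0, "R1": 2},
        "stations": [{"dest_tag": 2}],
        "regfile": {},
    }
    print(commit_step(state), commit_step(state), state["regfile"])
```

In the usage example the speculative write to R1 behind the mispredicted branch never reaches the register file, because the flush discards it before it becomes the ROB head.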

Page 43: Computer Architecture Lecture 18 Superscalar Processor and High Performance Computing.

Example

while (A(i) <> x) { A(i)++; i++; }

Loop: LD     R2,0(R1)     ; R1 = base address of A()
      DADDIU R2,R2,#1
      SD     R2,0(R1)     ; store result
      DADDIU R1,R1,#4
      BNE    R2,R3,Loop   ; x = R3

Page 44: Computer Architecture Lecture 18 Superscalar Processor and High Performance Computing.

Non-Speculative execution:Dual issue, 2 CDB, 2 Int Units

Iteration

Instructions Issues Executes Mem access

Write CDB

Comment

1 LD R2,0(R1) 1 First issue

1 ADDIU R2,R2,#1 1

1 SD R2,0(R1)

1 DADDIU R1,R1,#4

1 BNE R2,R3,Loop

2 LD R2,0(R1)

2 ADDIU R2,R2,#1

2 SD R2,0(R1)

2 DADDIU R1,R1,#4

2 BNE R2,R3,Loop

3 LD R2,0(R1)

3 ADDIU R2,R2,#1

3 SD R2,0(R1)

3 DADDIU R1,R1,#4

3 BNE R2,R3,Loop

Page 45: Computer Architecture Lecture 18 Superscalar Processor and High Performance Computing.

Non-Speculative execution:Dual issue, 2 CDB

Iteration

Instructions Issues Executes Mem access

Write CDB

Comment

1 LD R2,0(R1) 1 2 First issue

1 ADDIU R2,R2,#1 1 Wait for LW

1 SD R2,0(R1) 2

1 DADDIU R1,R1,#4 2

1 BNE R2,R3,Loop

2 LD R2,0(R1)

2 ADDIU R2,R2,#1

2 SD R2,0(R1)

2 DADDIU R1,R1,#4

2 BNE R2,R3,Loop

3 LD R2,0(R1)

3 ADDIU R2,R2,#1

3 SD R2,0(R1)

3 DADDIU R1,R1,#4

3 BNE R2,R3,Loop

Page 46: Computer Architecture Lecture 18 Superscalar Processor and High Performance Computing.

Non-Speculative execution:Dual issue, 2 CDB

Iteration

Instructions Issues Executes Mem access

Write CDB

Comment

1 LD R2,0(R1) 1 2 3 First issue

1 ADDIU R2,R2,#1 1 Wait for LW

1 SD R2,0(R1) 2 3 Wait for ADDIU

1 DADDIU R1,R1,#4 2 3

1 BNE R2,R3,Loop 3

2 LD R2,0(R1)

2 ADDIU R2,R2,#1

2 SD R2,0(R1)

2 DADDIU R1,R1,#4

2 BNE R2,R3,Loop

3 LD R2,0(R1)

3 ADDIU R2,R2,#1

3 SD R2,0(R1)

3 DADDIU R1,R1,#4

3 BNE R2,R3,Loop

Page 47: Computer Architecture Lecture 18 Superscalar Processor and High Performance Computing.

Non-Speculative execution:Dual issue, 2 CDB

Iteration

Instructions Issues Executes Mem access

Write CDB

Comment

1 LD R2,0(R1) 1 2 3 4 First issue

1 ADDIU R2,R2,#1 1 Wait for LW

1 SD R2,0(R1) 2 3 Wait for ADDIU

1 DADDIU R1,R1,#4 2 3 4 Execute directly

1 BNE R2,R3,Loop 3

2 LD R2,0(R1) 4

2 ADDIU R2,R2,#1 4

2 SD R2,0(R1)

2 DADDIU R1,R1,#4

2 BNE R2,R3,Loop

3 LD R2,0(R1)

3 ADDIU R2,R2,#1

3 SD R2,0(R1)

3 DADDIU R1,R1,#4

3 BNE R2,R3,Loop

Page 48: Computer Architecture Lecture 18 Superscalar Processor and High Performance Computing.

Non-Speculative execution:Dual issue, 2 CDB

Iteration

Instructions Issues Executes Mem access

Write CDB

Comment

1 LD R2,0(R1) 1 2 3 4 First issue

1 ADDIU R2,R2,#1 1 5 Wait for LW

1 SD R2,0(R1) 2 3 Wait for DADDIU

1 DADDIU R1,R1,#4 2 3 4 Execute directly

1 BNE R2,R3,Loop 3 Wait for DADDIU

2 LD R2,0(R1) 4 Wait for BNE

2 ADDIU R2,R2,#1 4 Wait for LW

2 SD R2,0(R1) 5

2 DADDIU R1,R1,#4 5

2 BNE R2,R3,Loop

3 LD R2,0(R1)

3 ADDIU R2,R2,#1

3 SD R2,0(R1)

3 DADDIU R1,R1,#4

3 BNE R2,R3,Loop

Page 49: Computer Architecture Lecture 18 Superscalar Processor and High Performance Computing.

Non-Speculative execution:Dual issue, 2 CDB

Iteration

Instructions Issues Executes Mem access

Write CDB

Comment

1 LD R2,0(R1) 1 2 3 4 First issue

1 ADDIU R2,R2,#1 1 5 6 Wait for LW

1 SD R2,0(R1) 2 3 Wait for DADDIU

1 DADDIU R1,R1,#4 2 3 4 Execute directly

1 BNE R2,R3,Loop 3 Wait for DADDIU

2 LD R2,0(R1) 4 Wait for BNE

2 ADDIU R2,R2,#1 4 Wait for LW

2 SD R2,0(R1) 5 Wait for DADDIU

2 DADDIU R1,R1,#4 5 Wait for BNE

2 BNE R2,R3,Loop 6

3 LD R2,0(R1)

3 ADDIU R2,R2,#1

3 SD R2,0(R1)

3 DADDIU R1,R1,#4

3 BNE R2,R3,Loop

Page 50: Computer Architecture Lecture 18 Superscalar Processor and High Performance Computing.

Non-Speculative execution:Dual issue, 2 CDB

Iteration

Instructions Issues Executes Mem access

Write CDB

Comment

1 LD R2,0(R1) 1 2 3 4 First issue

1 ADDIU R2,R2,#1 1 5 6 Wait for LW

1 SD R2,0(R1) 2 3 7 Wait for DADDIU

1 DADDIU R1,R1,#4 2 3 4 Execute directly

1 BNE R2,R3,Loop 3 7 Wait for DADDIU

2 LD R2,0(R1) 4 Wait for BNE

2 ADDIU R2,R2,#1 4 Wait for LW

2 SD R2,0(R1) 5 Wait for DADDIU

2 DADDIU R1,R1,#4 5 Wait for BNE

2 BNE R2,R3,Loop 6 Wait for DADDIU

3 LD R2,0(R1) 7

3 ADDIU R2,R2,#1 7

3 SD R2,0(R1)

3 DADDIU R1,R1,#4

3 BNE R2,R3,Loop

Page 51: Computer Architecture Lecture 18 Superscalar Processor and High Performance Computing.

Non-Speculative execution:Dual issue, 2 CDB

Iteration

Instructions Issues Executes Mem access

Write CDB

Comment

1 LD R2,0(R1) 1 2 3 4 First issue

1 ADDIU R2,R2,#1 1 5 6 Wait for BNE

1 SD R2,0(R1) 2 3 7 Wait for DADDIU

1 DADDIU R1,R1,#4 2 3 4 Execute directly

1 BNE R2,R3,Loop 3 7 Wait for DADDIU

2 LD R2,0(R1) 4 8 Wait for BNE

2 ADDIU R2,R2,#1 4 Wait for LW

2 SD R2,0(R1) 5 Wait for DADDIU

2 DADDIU R1,R1,#4 5 8 Wait for BNE

2 BNE R2,R3,Loop 6 Wait for DADDIU

3 LD R2,0(R1) 7 Wait for BNE

3 ADDIU R2,R2,#1 7 Wait for LW

3 SD R2,0(R1) 8

3 DADDIU R1,R1,#4 8

3 BNE R2,R3,Loop

Page 52: Computer Architecture Lecture 18 Superscalar Processor and High Performance Computing.

Non-Speculative execution:Dual issue, 2 CDB

Iteration

Instructions Issues Executes Mem access

Write CDB

Comment

1 LD R2,0(R1) 1 2 3 4 First issue

1 ADDIU R2,R2,#1 1 5 6 Wait for BNE

1 SD R2,0(R1) 2 3 7 Wait for DADDIU

1 DADDIU R1,R1,#4 2 3 4 Execute directly

1 BNE R2,R3,Loop 3 7 Wait for DADDIU

2 LD R2,0(R1) 4 8 9 Wait for BNE

2 ADDIU R2,R2,#1 4 Wait for LW

2 SD R2,0(R1) 5 9 Wait for DADDIU

2 DADDIU R1,R1,#4 5 8 9 Wait for BNE

2 BNE R2,R3,Loop 6 Wait for DADDIU

3 LD R2,0(R1) 7 Wait for BNE

3 ADDIU R2,R2,#1 7 Wait for LW

3 SD R2,0(R1) 8 Wait for DADDIU

3 DADDIU R1,R1,#4 8 Wait for BNE

3 BNE R2,R3,Loop 9

Page 53: Computer Architecture Lecture 18 Superscalar Processor and High Performance Computing.

Non-Speculative execution:Dual issue, 2 CDB

Iteration

Instructions Issues Executes Mem access

Write CDB

Comment

1 LD R2,0(R1) 1 2 3 4 First issue

1 ADDIU R2,R2,#1 1 5 6 Wait for BNE

1 SD R2,0(R1) 2 3 7 Wait for DADDIU

1 DADDIU R1,R1,#4 2 3 4 Execute directly

1 BNE R2,R3,Loop 3 7 Wait for DADDIU

2 LD R2,0(R1) 4 8 9 10 Wait for BNE

2 ADDIU R2,R2,#1 4 Wait for LW

2 SD R2,0(R1) 5 9 Wait for DADDIU

2 DADDIU R1,R1,#4 5 8 9 Wait for BNE

2 BNE R2,R3,Loop 6 Wait for DADDIU

3 LD R2,0(R1) 7 Wait for BNE

3 ADDIU R2,R2,#1 7 Wait for LW

3 SD R2,0(R1) 8 Wait for DADDIU

3 DADDIU R1,R1,#4 8 Wait for BNE

3 BNE R2,R3,Loop 9 Wait for DADDIU

Page 54: Computer Architecture Lecture 18 Superscalar Processor and High Performance Computing.

Non-Speculative execution: Dual issue, 2 CDB

Iter  Instruction      Issues  Executes  Mem access  Write CDB  Comment
1     LD R2,0(R1)      1       2         3           4          First issue
1     ADDIU R2,R2,#1   1       5         -           6          Wait for LD
1     SD R2,0(R1)      2       3         7           -          Wait for ADDIU
1     DADDIU R1,R1,#4  2       3         -           4          Execute directly
1     BNE R2,R3,Loop   3       7         -           -          Wait for ADDIU
2     LD R2,0(R1)      4       8         9           10         Wait for BNE
2     ADDIU R2,R2,#1   4       11        -           -          Wait for LD
2     SD R2,0(R1)      5       9         -           -          Wait for ADDIU
2     DADDIU R1,R1,#4  5       8         -           9          Wait for BNE
2     BNE R2,R3,Loop   6       -         -           -          Wait for ADDIU
3     LD R2,0(R1)      7       -         -           -          Wait for BNE
3     ADDIU R2,R2,#1   7       -         -           -          Wait for LD
3     SD R2,0(R1)      8       -         -           -          Wait for ADDIU
3     DADDIU R1,R1,#4  8       -         -           -          Wait for BNE
3     BNE R2,R3,Loop   9       -         -           -          Wait for ADDIU

Page 55: Computer Architecture Lecture 18 Superscalar Processor and High Performance Computing.

Non-Speculative execution: Dual issue, 2 CDB

Iter  Instruction      Issues  Executes  Mem access  Write CDB  Comment
1     LD R2,0(R1)      1       2         3           4          First issue
1     ADDIU R2,R2,#1   1       5         -           6          Wait for LD
1     SD R2,0(R1)      2       3         7           -          Wait for ADDIU
1     DADDIU R1,R1,#4  2       3         -           4          Execute directly
1     BNE R2,R3,Loop   3       7         -           -          Wait for ADDIU
2     LD R2,0(R1)      4       8         9           10         Wait for BNE
2     ADDIU R2,R2,#1   4       11        -           12         Wait for LD
2     SD R2,0(R1)      5       9         -           -          Wait for ADDIU
2     DADDIU R1,R1,#4  5       8         -           9          Wait for BNE
2     BNE R2,R3,Loop   6       -         -           -          Wait for ADDIU
3     LD R2,0(R1)      7       -         -           -          Wait for BNE
3     ADDIU R2,R2,#1   7       -         -           -          Wait for LD
3     SD R2,0(R1)      8       -         -           -          Wait for ADDIU
3     DADDIU R1,R1,#4  8       -         -           -          Wait for BNE
3     BNE R2,R3,Loop   9       -         -           -          Wait for ADDIU

Page 56: Computer Architecture Lecture 18 Superscalar Processor and High Performance Computing.

Non-Speculative execution: Dual issue, 2 CDB

Iter  Instruction      Issues  Executes  Mem access  Write CDB  Comment
1     LD R2,0(R1)      1       2         3           4          First issue
1     ADDIU R2,R2,#1   1       5         -           6          Wait for LD
1     SD R2,0(R1)      2       3         7           -          Wait for ADDIU
1     DADDIU R1,R1,#4  2       3         -           4          Execute directly
1     BNE R2,R3,Loop   3       7         -           -          Wait for ADDIU
2     LD R2,0(R1)      4       8         9           10         Wait for BNE
2     ADDIU R2,R2,#1   4       11        -           12         Wait for LD
2     SD R2,0(R1)      5       9         13          -          Wait for ADDIU
2     DADDIU R1,R1,#4  5       8         -           9          Wait for BNE
2     BNE R2,R3,Loop   6       13        -           -          Wait for ADDIU
3     LD R2,0(R1)      7       -         -           -          Wait for BNE
3     ADDIU R2,R2,#1   7       -         -           -          Wait for LD
3     SD R2,0(R1)      8       -         -           -          Wait for ADDIU
3     DADDIU R1,R1,#4  8       -         -           -          Wait for BNE
3     BNE R2,R3,Loop   9       -         -           -          Wait for ADDIU

Page 57: Computer Architecture Lecture 18 Superscalar Processor and High Performance Computing.

Non-Speculative execution: Dual issue, 2 CDB

Iter  Instruction      Issues  Executes  Mem access  Write CDB  Comment
1     LD R2,0(R1)      1       2         3           4          First issue
1     ADDIU R2,R2,#1   1       5         -           6          Wait for LD
1     SD R2,0(R1)      2       3         7           -          Wait for ADDIU
1     DADDIU R1,R1,#4  2       3         -           4          Execute directly
1     BNE R2,R3,Loop   3       7         -           -          Wait for ADDIU
2     LD R2,0(R1)      4       8         9           10         Wait for BNE
2     ADDIU R2,R2,#1   4       11        -           12         Wait for LD
2     SD R2,0(R1)      5       9         13          -          Wait for ADDIU
2     DADDIU R1,R1,#4  5       8         -           9          Wait for BNE
2     BNE R2,R3,Loop   6       13        -           -          Wait for ADDIU
3     LD R2,0(R1)      7       14        -           -          Wait for BNE
3     ADDIU R2,R2,#1   7       -         -           -          Wait for LD
3     SD R2,0(R1)      8       -         -           -          Wait for ADDIU
3     DADDIU R1,R1,#4  8       14        -           -          Wait for BNE
3     BNE R2,R3,Loop   9       -         -           -          Wait for ADDIU

Page 58: Computer Architecture Lecture 18 Superscalar Processor and High Performance Computing.

Non-Speculative execution: Dual issue, 2 CDB

Iter  Instruction      Issues  Executes  Mem access  Write CDB  Comment
1     LD R2,0(R1)      1       2         3           4          First issue
1     ADDIU R2,R2,#1   1       5         -           6          Wait for LD
1     SD R2,0(R1)      2       3         7           -          Wait for ADDIU
1     DADDIU R1,R1,#4  2       3         -           4          Execute directly
1     BNE R2,R3,Loop   3       7         -           -          Wait for ADDIU
2     LD R2,0(R1)      4       8         9           10         Wait for BNE
2     ADDIU R2,R2,#1   4       11        -           12         Wait for LD
2     SD R2,0(R1)      5       9         13          -          Wait for ADDIU
2     DADDIU R1,R1,#4  5       8         -           9          Wait for BNE
2     BNE R2,R3,Loop   6       13        -           -          Wait for ADDIU
3     LD R2,0(R1)      7       14        15          -          Wait for BNE
3     ADDIU R2,R2,#1   7       -         -           -          Wait for LD
3     SD R2,0(R1)      8       15        -           -          Wait for ADDIU
3     DADDIU R1,R1,#4  8       14        -           15         Wait for BNE
3     BNE R2,R3,Loop   9       -         -           -          Wait for ADDIU

Page 59: Computer Architecture Lecture 18 Superscalar Processor and High Performance Computing.

Non-Speculative execution: Dual issue, 2 CDB

Iter  Instruction      Issues  Executes  Mem access  Write CDB  Comment
1     LD R2,0(R1)      1       2         3           4          First issue
1     ADDIU R2,R2,#1   1       5         -           6          Wait for LD
1     SD R2,0(R1)      2       3         7           -          Wait for ADDIU
1     DADDIU R1,R1,#4  2       3         -           4          Execute directly
1     BNE R2,R3,Loop   3       7         -           -          Wait for ADDIU
2     LD R2,0(R1)      4       8         9           10         Wait for BNE
2     ADDIU R2,R2,#1   4       11        -           12         Wait for LD
2     SD R2,0(R1)      5       9         13          -          Wait for ADDIU
2     DADDIU R1,R1,#4  5       8         -           9          Wait for BNE
2     BNE R2,R3,Loop   6       13        -           -          Wait for ADDIU
3     LD R2,0(R1)      7       14        15          16         Wait for BNE
3     ADDIU R2,R2,#1   7       -         -           -          Wait for LD
3     SD R2,0(R1)      8       15        -           -          Wait for ADDIU
3     DADDIU R1,R1,#4  8       14        -           15         Wait for BNE
3     BNE R2,R3,Loop   9       -         -           -          Wait for ADDIU

Page 60: Computer Architecture Lecture 18 Superscalar Processor and High Performance Computing.

Non-Speculative execution: Dual issue, 2 CDB

Iter  Instruction      Issues  Executes  Mem access  Write CDB  Comment
1     LD R2,0(R1)      1       2         3           4          First issue
1     ADDIU R2,R2,#1   1       5         -           6          Wait for LD
1     SD R2,0(R1)      2       3         7           -          Wait for ADDIU
1     DADDIU R1,R1,#4  2       3         -           4          Execute directly
1     BNE R2,R3,Loop   3       7         -           -          Wait for ADDIU
2     LD R2,0(R1)      4       8         9           10         Wait for BNE
2     ADDIU R2,R2,#1   4       11        -           12         Wait for LD
2     SD R2,0(R1)      5       9         13          -          Wait for ADDIU
2     DADDIU R1,R1,#4  5       8         -           9          Wait for BNE
2     BNE R2,R3,Loop   6       13        -           -          Wait for ADDIU
3     LD R2,0(R1)      7       14        15          16         Wait for BNE
3     ADDIU R2,R2,#1   7       17        -           -          Wait for LD
3     SD R2,0(R1)      8       15        -           -          Wait for ADDIU
3     DADDIU R1,R1,#4  8       14        -           15         Wait for BNE
3     BNE R2,R3,Loop   9       -         -           -          Wait for ADDIU

Page 61: Computer Architecture Lecture 18 Superscalar Processor and High Performance Computing.

Non-Speculative execution: Dual issue, 2 CDB

Iter  Instruction      Issues  Executes  Mem access  Write CDB  Comment
1     LD R2,0(R1)      1       2         3           4          First issue
1     ADDIU R2,R2,#1   1       5         -           6          Wait for LD
1     SD R2,0(R1)      2       3         7           -          Wait for ADDIU
1     DADDIU R1,R1,#4  2       3         -           4          Execute directly
1     BNE R2,R3,Loop   3       7         -           -          Wait for ADDIU
2     LD R2,0(R1)      4       8         9           10         Wait for BNE
2     ADDIU R2,R2,#1   4       11        -           12         Wait for LD
2     SD R2,0(R1)      5       9         13          -          Wait for ADDIU
2     DADDIU R1,R1,#4  5       8         -           9          Wait for BNE
2     BNE R2,R3,Loop   6       13        -           -          Wait for ADDIU
3     LD R2,0(R1)      7       14        15          16         Wait for BNE
3     ADDIU R2,R2,#1   7       17        -           18         Wait for LD
3     SD R2,0(R1)      8       15        -           -          Wait for ADDIU
3     DADDIU R1,R1,#4  8       14        -           15         Wait for BNE
3     BNE R2,R3,Loop   9       -         -           -          Wait for ADDIU

Page 62: Computer Architecture Lecture 18 Superscalar Processor and High Performance Computing.

Non-Speculative execution: Dual issue, 2 CDB (gap between Issue and Execute)

Iter  Instruction      Issues  Executes  Mem access  Write CDB  Comment
1     LD R2,0(R1)      1       2         3           4          First issue
1     ADDIU R2,R2,#1   1       5         -           6          Wait for LD
1     SD R2,0(R1)      2       3         7           -          Wait for ADDIU
1     DADDIU R1,R1,#4  2       3         -           4          Execute directly
1     BNE R2,R3,Loop   3       7         -           -          Wait for ADDIU
2     LD R2,0(R1)      4       8         9           10         Wait for BNE
2     ADDIU R2,R2,#1   4       11        -           12         Wait for LD
2     SD R2,0(R1)      5       9         13          -          Wait for ADDIU
2     DADDIU R1,R1,#4  5       8         -           9          Wait for BNE
2     BNE R2,R3,Loop   6       13        -           -          Wait for ADDIU
3     LD R2,0(R1)      7       14        15          16         Wait for BNE
3     ADDIU R2,R2,#1   7       17        -           18         Wait for LD
3     SD R2,0(R1)      8       15        19          -          Wait for ADDIU
3     DADDIU R1,R1,#4  8       14        -           15         Wait for BNE
3     BNE R2,R3,Loop   9       19        -           -          Wait for ADDIU
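
The gap between issue and execute named in the slide title can be read off the completed table. The sketch below is only an illustration: the issue and execute cycles are copied from the table, while the names ROWS, max_gap_per_iteration and branch_execute_cycles are hypothetical helpers that are not part of the lecture.

```python
# (iteration, instruction, issue cycle, execute cycle), copied from the
# completed non-speculative table. All names here are illustrative only.
ROWS = [
    (1, "LD", 1, 2), (1, "ADDIU", 1, 5), (1, "SD", 2, 3),
    (1, "DADDIU", 2, 3), (1, "BNE", 3, 7),
    (2, "LD", 4, 8), (2, "ADDIU", 4, 11), (2, "SD", 5, 9),
    (2, "DADDIU", 5, 8), (2, "BNE", 6, 13),
    (3, "LD", 7, 14), (3, "ADDIU", 7, 17), (3, "SD", 8, 15),
    (3, "DADDIU", 8, 14), (3, "BNE", 9, 19),
]


def max_gap_per_iteration(rows):
    """Largest (execute - issue) distance seen in each iteration."""
    gaps = {}
    for iteration, _name, issue, execute in rows:
        gaps[iteration] = max(gaps.get(iteration, 0), execute - issue)
    return gaps


def branch_execute_cycles(rows):
    """Cycle in which each iteration's BNE executes."""
    return [execute for _it, name, _issue, execute in rows if name == "BNE"]


if __name__ == "__main__":
    print(max_gap_per_iteration(ROWS))   # {1: 4, 2: 7, 3: 10}
    print(branch_execute_cycles(ROWS))   # [7, 13, 19]
```

Issue advances by three cycles per iteration (the BNE issues in cycles 3, 6 and 9), but each branch executes six cycles after the previous one, so the worst-case gap grows by three cycles with every iteration.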

Page 63: Computer Architecture Lecture 18 Superscalar Processor and High Performance Computing.

Speculative execution: Dual issue, 2 CDB

Page 64: Computer Architecture Lecture 18 Superscalar Processor and High Performance Computing.

Speculative execution: Dual issue, 2 CDB

It.   Instruction      Iss  EXE  MEM  CDB  Commit  Comment
1     LD R2,0(R1)      1    -    -    -    -       First issue
1     ADDIU R2,R2,#1   1    -    -    -    -
1     SD R2,0(R1)      -    -    -    -    -
1     DADDIU R1,R1,#4  -    -    -    -    -
1     BNE R2,R3,Loop   -    -    -    -    -
2     LD R2,0(R1)      -    -    -    -    -
2     ADDIU R2,R2,#1   -    -    -    -    -
2     SD R2,0(R1)      -    -    -    -    -
2     DADDIU R1,R1,#4  -    -    -    -    -
2     BNE R2,R3,Loop   -    -    -    -    -
3     LD R2,0(R1)      -    -    -    -    -
3     ADDIU R2,R2,#1   -    -    -    -    -
3     SD R2,0(R1)      -    -    -    -    -
3     DADDIU R1,R1,#4  -    -    -    -    -
3     BNE R2,R3,Loop   -    -    -    -    -

Page 65: Computer Architecture Lecture 18 Superscalar Processor and High Performance Computing.

Speculative execution: Dual issue, 2 CDB

It.   Instruction      Iss  EXE  MEM  CDB  Commit  Comment
1     LD R2,0(R1)      1    2    -    -    -       First issue
1     ADDIU R2,R2,#1   1    -    -    -    -
1     SD R2,0(R1)      2    -    -    -    -
1     DADDIU R1,R1,#4  2    -    -    -    -
1     BNE R2,R3,Loop   -    -    -    -    -
2     LD R2,0(R1)      -    -    -    -    -
2     ADDIU R2,R2,#1   -    -    -    -    -
2     SD R2,0(R1)      -    -    -    -    -
2     DADDIU R1,R1,#4  -    -    -    -    -
2     BNE R2,R3,Loop   -    -    -    -    -
3     LD R2,0(R1)      -    -    -    -    -
3     ADDIU R2,R2,#1   -    -    -    -    -
3     SD R2,0(R1)      -    -    -    -    -
3     DADDIU R1,R1,#4  -    -    -    -    -
3     BNE R2,R3,Loop   -    -    -    -    -

Page 66: Computer Architecture Lecture 18 Superscalar Processor and High Performance Computing.

Speculative execution: Dual issue, 2 CDB

It.   Instruction      Iss  EXE  MEM  CDB  Commit  Comment
1     LD R2,0(R1)      1    2    3    -    -       First issue
1     ADDIU R2,R2,#1   1    -    -    -    -       LUD
1     SD R2,0(R1)      2    3    -    -    -       Wait for R2
1     DADDIU R1,R1,#4  2    3    -    -    -
1     BNE R2,R3,Loop   3    -    -    -    -
2     LD R2,0(R1)      -    -    -    -    -
2     ADDIU R2,R2,#1   -    -    -    -    -
2     SD R2,0(R1)      -    -    -    -    -
2     DADDIU R1,R1,#4  -    -    -    -    -
2     BNE R2,R3,Loop   -    -    -    -    -
3     LD R2,0(R1)      -    -    -    -    -
3     ADDIU R2,R2,#1   -    -    -    -    -
3     SD R2,0(R1)      -    -    -    -    -
3     DADDIU R1,R1,#4  -    -    -    -    -
3     BNE R2,R3,Loop   -    -    -    -    -

Page 67: Computer Architecture Lecture 18 Superscalar Processor and High Performance Computing.

Speculative execution: Dual issue, 2 CDB

It.   Instruction      Iss  EXE  MEM  CDB  Commit  Comment
1     LD R2,0(R1)      1    2    3    4    -       First issue
1     ADDIU R2,R2,#1   1    -    -    -    -       LUD
1     SD R2,0(R1)      2    3    -    -    -       Wait for R2
1     DADDIU R1,R1,#4  2    3    -    4    -
1     BNE R2,R3,Loop   3    -    -    -    -
2     LD R2,0(R1)      4    -    -    -    -
2     ADDIU R2,R2,#1   4    -    -    -    -
2     SD R2,0(R1)      -    -    -    -    -
2     DADDIU R1,R1,#4  -    -    -    -    -
2     BNE R2,R3,Loop   -    -    -    -    -
3     LD R2,0(R1)      -    -    -    -    -
3     ADDIU R2,R2,#1   -    -    -    -    -
3     SD R2,0(R1)      -    -    -    -    -
3     DADDIU R1,R1,#4  -    -    -    -    -
3     BNE R2,R3,Loop   -    -    -    -    -

Page 68: Computer Architecture Lecture 18 Superscalar Processor and High Performance Computing.

Speculative execution: Dual issue, 2 CDB

It.   Instruction      Iss  EXE  MEM  CDB  Commit  Comment
1     LD R2,0(R1)      1    2    3    4    5       First issue
1     ADDIU R2,R2,#1   1    5    -    -    -       LUD
1     SD R2,0(R1)      2    3    -    -    -       Wait for R2
1     DADDIU R1,R1,#4  2    3    -    4    -
1     BNE R2,R3,Loop   3    -    -    -    -
2     LD R2,0(R1)      4    5    -    -    -       Speculative
2     ADDIU R2,R2,#1   4    -    -    -    -
2     SD R2,0(R1)      5    -    -    -    -
2     DADDIU R1,R1,#4  5    -    -    -    -
2     BNE R2,R3,Loop   -    -    -    -    -
3     LD R2,0(R1)      -    -    -    -    -
3     ADDIU R2,R2,#1   -    -    -    -    -
3     SD R2,0(R1)      -    -    -    -    -
3     DADDIU R1,R1,#4  -    -    -    -    -
3     BNE R2,R3,Loop   -    -    -    -    -

Page 69: Computer Architecture Lecture 18 Superscalar Processor and High Performance Computing.

Speculative execution: Dual issue, 2 CDB

It.   Instruction      Iss  EXE  MEM  CDB  Commit  Comment
1     LD R2,0(R1)      1    2    3    4    5       First issue
1     ADDIU R2,R2,#1   1    5    -    6    -       LUD
1     SD R2,0(R1)      2    3    -    -    -       Wait for R2
1     DADDIU R1,R1,#4  2    3    -    4    -
1     BNE R2,R3,Loop   3    -    -    -    -
2     LD R2,0(R1)      4    5    6    -    -       Speculative
2     ADDIU R2,R2,#1   4    -    -    -    -       LUD
2     SD R2,0(R1)      5    6    -    -    -       Speculative
2     DADDIU R1,R1,#4  5    6    -    -    -       Speculative
2     BNE R2,R3,Loop   6    -    -    -    -
3     LD R2,0(R1)      -    -    -    -    -
3     ADDIU R2,R2,#1   -    -    -    -    -
3     SD R2,0(R1)      -    -    -    -    -
3     DADDIU R1,R1,#4  -    -    -    -    -
3     BNE R2,R3,Loop   -    -    -    -    -

Page 70: Computer Architecture Lecture 18 Superscalar Processor and High Performance Computing.

Speculative execution: Dual issue, 2 CDB

It.   Instruction      Iss  EXE  MEM  CDB  Commit  Comment
1     LD R2,0(R1)      1    2    3    4    5       First issue
1     ADDIU R2,R2,#1   1    5    -    6    7       LUD
1     SD R2,0(R1)      2    3    -    -    7       Wait for R2
1     DADDIU R1,R1,#4  2    3    -    4    -
1     BNE R2,R3,Loop   3    7    -    -    -       R2 Avail.
2     LD R2,0(R1)      4    5    6    7    -       Speculative
2     ADDIU R2,R2,#1   4    -    -    -    -       LUD
2     SD R2,0(R1)      5    6    -    -    -       Speculative
2     DADDIU R1,R1,#4  5    6    -    7    -       Speculative
2     BNE R2,R3,Loop   6    -    -    -    -
3     LD R2,0(R1)      7    -    -    -    -
3     ADDIU R2,R2,#1   7    -    -    -    -
3     SD R2,0(R1)      -    -    -    -    -
3     DADDIU R1,R1,#4  -    -    -    -    -
3     BNE R2,R3,Loop   -    -    -    -    -

Page 71: Computer Architecture Lecture 18 Superscalar Processor and High Performance Computing.

Speculative execution: Dual issue, 2 CDB

It.   Instruction      Iss  EXE  MEM  CDB  Commit  Comment
1     LD R2,0(R1)      1    2    3    4    5       First issue
1     ADDIU R2,R2,#1   1    5    -    6    7       LUD
1     SD R2,0(R1)      2    3    -    -    7       Wait for R2
1     DADDIU R1,R1,#4  2    3    -    4    8
1     BNE R2,R3,Loop   3    7    -    -    8       R2 Avail.
2     LD R2,0(R1)      4    5    6    7    -       Speculative
2     ADDIU R2,R2,#1   4    8    -    -    -       LUD
2     SD R2,0(R1)      5    6    -    -    -       Speculative
2     DADDIU R1,R1,#4  5    6    -    7    -       Speculative
2     BNE R2,R3,Loop   6    -    -    -    -
3     LD R2,0(R1)      7    8    -    -    -       Speculative
3     ADDIU R2,R2,#1   7    -    -    -    -
3     SD R2,0(R1)      8    -    -    -    -
3     DADDIU R1,R1,#4  8    -    -    -    -
3     BNE R2,R3,Loop   -    -    -    -    -

Page 72: Computer Architecture Lecture 18 Superscalar Processor and High Performance Computing.

Speculative execution: Dual issue, 2 CDB

It.   Instruction      Iss  EXE  MEM  CDB  Commit  Comment
1     LD R2,0(R1)      1    2    3    4    5       First issue
1     ADDIU R2,R2,#1   1    5    -    6    7       LUD
1     SD R2,0(R1)      2    3    -    -    7       Wait for R2
1     DADDIU R1,R1,#4  2    3    -    4    8
1     BNE R2,R3,Loop   3    7    -    -    8       R2 Avail.
2     LD R2,0(R1)      4    5    6    7    9
2     ADDIU R2,R2,#1   4    8    -    9    -
2     SD R2,0(R1)      5    6    -    -    -
2     DADDIU R1,R1,#4  5    6    -    7    -
2     BNE R2,R3,Loop   6    -    -    -    -
3     LD R2,0(R1)      7    8    9    -    -       Speculative
3     ADDIU R2,R2,#1   7    -    -    -    -
3     SD R2,0(R1)      8    9    -    -    -
3     DADDIU R1,R1,#4  8    9    -    -    -
3     BNE R2,R3,Loop   9    -    -    -    -

Page 73: Computer Architecture Lecture 18 Superscalar Processor and High Performance Computing.

Speculative execution: Dual issue, 2 CDB

It.   Instruction      Iss  EXE  MEM  CDB  Commit  Comment
1     LD R2,0(R1)      1    2    3    4    5       First issue
1     ADDIU R2,R2,#1   1    5    -    6    7       LUD
1     SD R2,0(R1)      2    3    -    -    7       Wait for R2
1     DADDIU R1,R1,#4  2    3    -    4    8
1     BNE R2,R3,Loop   3    7    -    -    8       R2 Avail.
2     LD R2,0(R1)      4    5    6    7    9
2     ADDIU R2,R2,#1   4    8    -    9    10
2     SD R2,0(R1)      5    6    -    -    10
2     DADDIU R1,R1,#4  5    6    -    7    -
2     BNE R2,R3,Loop   6    10   -    -    -
3     LD R2,0(R1)      7    8    9    10   -       Speculative
3     ADDIU R2,R2,#1   7    -    -    -    -
3     SD R2,0(R1)      8    9    -    -    -
3     DADDIU R1,R1,#4  8    9    -    10   -
3     BNE R2,R3,Loop   9    -    -    -    -

Page 74: Computer Architecture Lecture 18 Superscalar Processor and High Performance Computing.

Speculative execution: Dual issue, 2 CDB

It.   Instruction      Iss  EXE  MEM  CDB  Commit  Comment
1     LD R2,0(R1)      1    2    3    4    5       First issue
1     ADDIU R2,R2,#1   1    5    -    6    7       LUD
1     SD R2,0(R1)      2    3    -    -    7       Wait for R2
1     DADDIU R1,R1,#4  2    3    -    4    8
1     BNE R2,R3,Loop   3    7    -    -    8       R2 Avail.
2     LD R2,0(R1)      4    5    6    7    9
2     ADDIU R2,R2,#1   4    8    -    9    10
2     SD R2,0(R1)      5    6    -    -    10
2     DADDIU R1,R1,#4  5    6    -    7    11
2     BNE R2,R3,Loop   6    10   -    -    11
3     LD R2,0(R1)      7    8    9    10   -       Speculative
3     ADDIU R2,R2,#1   7    11   -    -    -
3     SD R2,0(R1)      8    9    -    -    -
3     DADDIU R1,R1,#4  8    9    -    10   -
3     BNE R2,R3,Loop   9    -    -    -    -

Page 75: Computer Architecture Lecture 18 Superscalar Processor and High Performance Computing.

Speculative execution: Dual issue, 2 CDB

It.   Instruction      Iss  EXE  MEM  CDB  Commit  Comment
1     LD R2,0(R1)      1    2    3    4    5       First issue
1     ADDIU R2,R2,#1   1    5    -    6    7       LUD
1     SD R2,0(R1)      2    3    -    -    7       Wait for R2
1     DADDIU R1,R1,#4  2    3    -    4    8
1     BNE R2,R3,Loop   3    7    -    -    8       R2 Avail.
2     LD R2,0(R1)      4    5    6    7    9
2     ADDIU R2,R2,#1   4    8    -    9    10
2     SD R2,0(R1)      5    6    -    -    10
2     DADDIU R1,R1,#4  5    6    -    7    11
2     BNE R2,R3,Loop   6    10   -    -    11
3     LD R2,0(R1)      7    8    9    10   12      Speculative
3     ADDIU R2,R2,#1   7    11   -    12   -
3     SD R2,0(R1)      8    9    -    -    -
3     DADDIU R1,R1,#4  8    9    -    10   -
3     BNE R2,R3,Loop   9    -    -    -    -

Page 76: Computer Architecture Lecture 18 Superscalar Processor and High Performance Computing.

Speculative execution: Dual issue, 2 CDB

It.   Instruction      Iss  EXE  MEM  CDB  Commit  Comment
1     LD R2,0(R1)      1    2    3    4    5       First issue
1     ADDIU R2,R2,#1   1    5    -    6    7       LUD
1     SD R2,0(R1)      2    3    -    -    7       Wait for R2
1     DADDIU R1,R1,#4  2    3    -    4    8
1     BNE R2,R3,Loop   3    7    -    -    8       R2 Avail.
2     LD R2,0(R1)      4    5    6    7    9
2     ADDIU R2,R2,#1   4    8    -    9    10
2     SD R2,0(R1)      5    6    -    -    10
2     DADDIU R1,R1,#4  5    6    -    7    11
2     BNE R2,R3,Loop   6    10   -    -    11
3     LD R2,0(R1)      7    8    9    10   12
3     ADDIU R2,R2,#1   7    11   -    12   13
3     SD R2,0(R1)      8    9    -    -    13
3     DADDIU R1,R1,#4  8    9    -    10   -
3     BNE R2,R3,Loop   9    13   -    -    -
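
The effect of speculation can be estimated by comparing when each iteration's branch executes in the two tables. The sketch below is a rough estimate from three iterations only. The cycle numbers are copied from the two completed tables, the helper names are hypothetical, and the limit of two commits per cycle is an assumption made here for the check, not something the slides state.

```python
from collections import Counter

# Cycle in which each iteration's BNE executes, copied from the tables.
NON_SPEC_BNE_EXEC = [7, 13, 19]
SPEC_BNE_EXEC = [7, 10, 13]

# Commit cycles of the speculative run in program order.
# None marks the two instructions that have not committed by cycle 13.
SPEC_COMMIT = [5, 7, 7, 8, 8, 9, 10, 10, 11, 11, 12, 13, 13, None, None]


def cycles_per_iteration(branch_cycles):
    """Average distance, in cycles, between consecutive branch executions."""
    steps = [b - a for a, b in zip(branch_cycles, branch_cycles[1:])]
    return sum(steps) / len(steps)


def commits_are_in_order(commit_cycles, width=2):
    """True if commits never go backwards in program order and no cycle
    commits more than `width` instructions (width=2 is an assumption)."""
    done = [c for c in commit_cycles if c is not None]
    in_order = all(a <= b for a, b in zip(done, done[1:]))
    within_width = max(Counter(done).values()) <= width
    return in_order and within_width


if __name__ == "__main__":
    print(cycles_per_iteration(NON_SPEC_BNE_EXEC))  # 6.0
    print(cycles_per_iteration(SPEC_BNE_EXEC))      # 3.0
    print(commits_are_in_order(SPEC_COMMIT))        # True
```

On these numbers the speculative pipeline completes an iteration every three cycles instead of every six, because instructions after the branch execute without waiting for it to resolve, while results still commit in program order.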