RAMP Gold: Architecture and Timing Model Andrew Waterman, Zhangxi Tan, Rimas Avizienis, Yunsup Lee, David Patterson, Krste Asanović Parallel Computing Laboratory University of California, Berkeley

Dec 20, 2015

Page 1

RAMP Gold: Architecture and Timing Model

Andrew Waterman, Zhangxi Tan, Rimas Avizienis, Yunsup Lee, David Patterson, Krste Asanović

Parallel Computing Laboratory
University of California, Berkeley

Page 2

RAMP Gold Overview

• Tiled CMP simulator
• ISA: SPARC V8
– (ARM/Thumb-2 later?)
• Split timing and function (both on FPGA)
• Host-multithreaded
• Runs on V5LX110T (XUP)

[Figure: Par Lab InfiniCore. A functional model pipeline with architectural state, and a timing model pipeline with timing state.]

Page 3

RAMP Gold Target Machine

[Figure: 64 SPARC V8 cores, each with an I$ and a D$, connected through a shared L2$ / interconnect to DRAM.]

Page 4

RAMP Gold v1 Target Features

• 64 single-issue, in-order SPARC V8 processors
– Simple, 5-stage pipeline
– FPU
• Cache timing model
– Configurable size, line size, associativity, miss penalty, shared/private
– Change parameters without resynthesis

Page 5

RAMP Gold Architecture

• Mapping the target machine directly to an FPGA is inefficient
• Solution: split timing and functionality + multithreading
– The timing logic decides how many target cycles an instruction sequence should take
– Simulating the functionality of an instruction might take multiple host cycles

Page 6

Function/Timing Split Advantages

• Flexibility
– Can configure target at runtime
– Synthesize design once, change target model parameters at will
• Efficient FPGA resource usage
– Example 1: model a 2-cycle FPU in 10 host cycles
– Example 2: model a 16MB L2$ using only 256KB host BRAM to store tags/metadata
• Enables multithreading
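
The two examples above can be made concrete with a small sketch. It is only an illustration of decoupling target cycles from host cycles, not RAMP Gold's implementation: the names `UnitModel` and `run` are hypothetical, and the 1-cycle ALU entry is an assumption added for contrast. Only the FPU latencies (2 target cycles, 10 host cycles) and the 16MB / 256KB figures come from the slide.

```python
from dataclasses import dataclass


@dataclass
class UnitModel:
    """Hypothetical record pairing a unit's simulated cost with its host cost."""
    target_latency: int  # cycles charged in the simulated (target) machine
    host_latency: int    # FPGA (host) cycles the functional model spends


def run(ops, units):
    """Return (target_cycles, host_cycles) for a sequence of operation names."""
    target = 0
    host = 0
    for op in ops:
        unit = units[op]
        target += unit.target_latency  # what the timing model reports
        host += unit.host_latency      # what the simulation actually costs
    return target, host


units = {
    "fpu": UnitModel(target_latency=2, host_latency=10),  # Example 1 on the slide
    "alu": UnitModel(target_latency=1, host_latency=1),   # assumed, for contrast
}

target_cycles, host_cycles = run(["alu", "fpu", "fpu"], units)
print(target_cycles, host_cycles)  # 5 21

# Example 2: host tag/metadata storage relative to the modeled L2$ capacity.
capacity_ratio = (16 * 1024 * 1024) // (256 * 1024)
print(capacity_ratio)  # 64
```

Changing `target_latency` alters the reported target time without touching the host-side cost, which is the sense in which target parameters can change without resynthesis.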

Page 7

Split Timing and Function

• Functional model executes ISA correctly
• Timing model determines how long a program takes to run

[Figure: Target Machine (CPU, L1 D$, MEM) = Functional Model (CPU FM, L1 D$ FM, MEM FM) + Timing Model (CPU TM, L1 D$ TM, MEM TM)]

Page 8

Split Timing and Function

• Functional model executes ISA correctly
• Timing model determines how long a program takes to run

[Figure: Target Machine (CPU, L1 D$, MEM) = Functional Model (CPU FM, MEM FM) + Timing Model (CPU TM, L1 D$ TM, MEM TM)]

Page 9

TM + FM from 30,000 ft

[Figure: block diagram connecting the CPU Timing Model, L1 D$ Timing Model, CPU Functional Model, Memory Timing Model, and Memory Functional Model. Signal labels: instruction; ld/st address, store data; ld/st address; stall; load data; instruction complete.]

Page 10

TM + FM from 3,000 ft

[Figure: block diagram with the CPU TM, CPU FM, L1 D$ TM, Memory Timing Model, and Memory Functional Model. Stage labels: IF, CTRL, DEC, EX, MEM, WB, TM1, TM2. Signal labels: instruction; ld/st address, store data; ld/st address; stall; load data; instruction complete.]

Page 11

Example: Target Load Miss

[Figure: block diagram with the CPU TM, CPU FM, L1 D$ TM, Memory Timing Model, and Memory Functional Model (stage labels IF, CTRL, DEC, EX, MEM, WB, TM1, TM2), annotated with numbered steps 1 through 7; step 4 appears at three places. Signal labels: instruction; ld/st address, store data; ld/st address; stall; load data; instruction complete.]

Page 12

Timing-Driven Host Pipeline

[Figure: host pipeline split into a CPU/D$ Timing Model and a CPU Functional Model. Stage labels: TS, IF, DE, EX, MEM1, MEM2, WB, TM1, TM2, TM3. Other blocks: L1 D$ TM, Target Memory TM/FM, Store Buffer, Load Result Buffer. Interface labels: {TID,INST}, {TID,ADDR}. Threads T0, T1, T2 are shown with ADD, LD, and ST instructions.]
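
A minimal sketch of host multithreading, assuming a simple round-robin issue policy (the slides do not state the scheduling policy). One target thread enters the pipeline per host cycle, and each entry carries a {TID, INST} pair as in the figure. The function name `interleave` and the per-thread programs are hypothetical.

```python
from collections import deque


def interleave(threads, host_cycles):
    """Issue one (tid, inst) pair per host cycle, round-robin over threads.

    `threads` maps a thread ID to its list of instructions. Round-robin
    order is an assumption made for this sketch.
    """
    # Only threads that still have instructions take part in the rotation.
    order = deque(tid for tid in sorted(threads) if threads[tid])
    next_index = {tid: 0 for tid in threads}
    issued = []
    for _ in range(host_cycles):
        if not order:
            break  # every thread has run out of instructions
        tid = order.popleft()
        program = threads[tid]
        issued.append((tid, program[next_index[tid]]))
        next_index[tid] += 1
        if next_index[tid] < len(program):
            order.append(tid)  # thread goes to the back of the rotation
    return issued


# Hypothetical per-thread programs using the opcodes shown in the figure.
threads = {0: ["ADD", "LD"], 1: ["LD", "ST"], 2: ["ST", "ADD"]}
trace = interleave(threads, host_cycles=6)
print(trace)
# [(0, 'ADD'), (1, 'LD'), (2, 'ST'), (0, 'LD'), (1, 'ST'), (2, 'ADD')]
```

Because consecutive pipeline slots belong to different target threads, a long-latency operation in one thread does not have to stall the host pipeline for the others.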

Page 13

Cache Modeling

• The cache model maintains tag, state, protocol bits internally
• Whenever the functional model issues a memory operation, the cache model determines how many target cycles to stall

[Figure: the address is split into tag, index, and offset fields. The index selects a (tag, state) entry in each way; per-way comparators (=) produce hit/miss. The number of ways is the associativity.]
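
The behavior described above can be sketched as a tag-only cache model: it stores no data, only tags, and returns the number of target cycles to stall. This is a minimal illustration, not the RAMP Gold hardware. The class name `CacheTimingModel`, the LRU replacement policy, and the 20-cycle miss penalty are assumptions; the configurable parameters (size, line size, associativity, miss penalty) are the ones listed on the target features slide.

```python
class CacheTimingModel:
    """Tag-only set-associative cache model (hypothetical sketch, LRU assumed)."""

    def __init__(self, size_bytes, line_bytes, ways, miss_penalty):
        self.line_bytes = line_bytes
        self.ways = ways
        self.miss_penalty = miss_penalty
        self.num_sets = size_bytes // (line_bytes * ways)
        # One list of tags per set, ordered least to most recently used.
        self.sets = [[] for _ in range(self.num_sets)]

    def access(self, addr):
        """Return the target stall cycles for this access (0 on a hit)."""
        line = addr // self.line_bytes   # drop the offset field
        index = line % self.num_sets     # index field selects the set
        tag = line // self.num_sets      # remaining high bits are the tag
        tags = self.sets[index]
        if tag in tags:
            tags.remove(tag)
            tags.append(tag)             # mark as most recently used
            return 0
        if len(tags) == self.ways:
            tags.pop(0)                  # evict the least recently used tag
        tags.append(tag)
        return self.miss_penalty


# 32KB, 2-way, 64B lines; the miss penalty of 20 is an assumed value.
l1 = CacheTimingModel(32 * 1024, 64, 2, miss_penalty=20)
stalls = [l1.access(a) for a in (0x0, 0x4, 0x4000, 0x0, 0x8000, 0x4000)]
print(stalls)  # [20, 0, 20, 0, 20, 20]
```

Addresses 0x0, 0x4000, and 0x8000 all map to set 0, so the third distinct line evicts the least recently used one and the final access to 0x4000 misses again.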

Page 14

Multithreaded, Pipelined Cache TM

[Figure: pipelined cache timing model. The address and index feed a series of (tag, state) lookups, each followed by a comparator (=), producing a hit? signal.]

Page 15

Quick & Dirty Validation

• 32KB, 2-way L1 D$, 64B lines
• 256KB, 4-way L2$, 64B lines
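
As a worked example, the two configurations above determine the set count and the widths of the offset, index, and tag fields. The helper `cache_geometry` is hypothetical, and the 32-bit address width is an assumption; the slides give only the sizes, associativities, and line size. The sketch also assumes all sizes are powers of two.

```python
def cache_geometry(size_bytes, ways, line_bytes, addr_bits=32):
    """Compute set count and address field widths (power-of-two sizes assumed)."""
    sets = size_bytes // (ways * line_bytes)
    offset_bits = line_bytes.bit_length() - 1  # log2 of the line size
    index_bits = sets.bit_length() - 1         # log2 of the set count
    tag_bits = addr_bits - index_bits - offset_bits
    return {
        "sets": sets,
        "offset_bits": offset_bits,
        "index_bits": index_bits,
        "tag_bits": tag_bits,
    }


l1 = cache_geometry(32 * 1024, ways=2, line_bytes=64)
l2 = cache_geometry(256 * 1024, ways=4, line_bytes=64)
print(l1)  # {'sets': 256, 'offset_bits': 6, 'index_bits': 8, 'tag_bits': 18}
print(l2)  # {'sets': 1024, 'offset_bits': 6, 'index_bits': 10, 'tag_bits': 16}
```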

Page 16

Status

• Functional + simple timing model work in HW
– Running real programs (e.g. SPLASH2)
• Near-term future work
– Move from current “functional-first + stall” configuration to timing-driven described here
– More interesting memory system timing model
– Functional potpourri (FDIV, MMU, …)

Page 17

DEMO

• Run OCEAN with different L1 D$ parameters

Page 18

Questions?

Thank you!