MacSim Architecture Studies

MacSim Tutorial (In ISCA-39, 2012)

Feb 22, 2016

Otylia

Page 1: MacSim Architecture Studies

MacSim Tutorial (In ISCA-39, 2012)

MacSim Architecture Studies

Page 2: MacSim Architecture Studies

Architecture Studies Using MacSim

Front-end
• Thread fetch policies
• Branch predictor

Memory System
• Software and Hardware prefetcher
• Cache studies (sharing, inclusion)
• DRAM scheduling
• Interconnection studies

Misc.
• Power model

Page 3: MacSim Architecture Studies

Prefetcher Study

[Figure: block diagram of MacSim with a Trace Generator (PIN, GPUOcelot), Frontend, Memory System, and Hardware Prefetcher]

• Frontend: software prefetch instructions
  • PTX: prefetch, prefetchu
  • x86: prefetcht0, prefetcht1, prefetchnta
• Hardware Prefetcher: hardware prefetch requests
  • Stream, stride, GHB, …

• Many-thread Aware Prefetching Mechanism [Lee et al. MICRO-43, 2010]
• When prefetching works, when it doesn’t, and why [Lee et al. ACM TACO, 2012]
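As an illustration of one of the hardware prefetcher types named on this slide, here is a minimal sketch of a PC-indexed stride prefetcher. It is not MacSim's implementation: the class name `StridePrefetcher`, the `threshold` and `degree` parameters, and the unbounded table are all assumptions made for the example.

```python
class StridePrefetcher:
    """Toy PC-indexed stride prefetcher (hypothetical, not MacSim code)."""

    def __init__(self, threshold=2, degree=1):
        # pc -> [last_addr, stride, confidence]; unbounded table for simplicity
        self.table = {}
        self.threshold = threshold  # repeats of a stride needed before prefetching
        self.degree = degree        # number of prefetch requests per trigger

    def access(self, pc, addr):
        """Record a demand access and return the list of prefetch addresses."""
        entry = self.table.get(pc)
        if entry is None:
            self.table[pc] = [addr, 0, 0]
            return []
        last_addr, stride, confidence = entry
        new_stride = addr - last_addr
        if new_stride != 0 and new_stride == stride:
            confidence += 1          # same stride seen again
        else:
            confidence = 0           # stride changed: retrain
        self.table[pc] = [addr, new_stride, confidence]
        if confidence >= self.threshold:
            return [addr + new_stride * i for i in range(1, self.degree + 1)]
        return []


prefetcher = StridePrefetcher()
issued = [prefetcher.access(0x40, a) for a in (100, 164, 228, 292)]
# issued == [[], [], [], [356]]: the prefetch fires once the 64-byte stride repeats twice
```

A real design would bound the table and filter prefetches that already hit in the cache; both are omitted here.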

Page 4: MacSim Architecture Studies

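Cache and NoC Studies

• Cache studies: sharing, inclusion property
• On-chip interconnection studies

[Figure: two cache organizations. In one, several caches ($) connect through an interconnection to a shared cache (Shared $); in the other, private caches connect through an interconnection to a shared cache.]

• TLP-Aware Cache Management Policy [Lee and Kim, HPCA-18, 2012]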
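To make the inclusion property concrete, the sketch below models an inclusive hierarchy in which evicting a line from the shared cache back-invalidates it in every private cache. This is only an illustration under simplifying assumptions (LRU shared cache, unbounded private caches, the hypothetical class name `InclusiveHierarchy`); it does not reflect MacSim's cache classes.

```python
class InclusiveHierarchy:
    """Toy inclusive hierarchy: private caches are subsets of the shared cache."""

    def __init__(self, num_cores, shared_capacity):
        self.private = [set() for _ in range(num_cores)]  # unbounded, for simplicity
        self.shared = []                                   # LRU order, oldest first
        self.capacity = shared_capacity

    def access(self, core, line):
        if line in self.shared:
            self.shared.remove(line)          # hit: refresh LRU position below
        elif len(self.shared) >= self.capacity:
            victim = self.shared.pop(0)       # evict the least recently used line
            for cache in self.private:
                cache.discard(victim)         # back-invalidation keeps inclusion
        self.shared.append(line)
        self.private[core].add(line)

    def is_inclusive(self):
        shared_lines = set(self.shared)
        return all(cache <= shared_lines for cache in self.private)


caches = InclusiveHierarchy(num_cores=2, shared_capacity=2)
caches.access(0, "A")
caches.access(1, "B")
caches.access(0, "C")   # evicts "A" from the shared cache and from core 0
```

A non-inclusive variant would skip the back-invalidation loop, letting private caches keep lines that the shared cache has dropped.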

Page 5: MacSim Architecture Studies

Heterogeneity Aware NoC

• Heterogeneous link configuration
• Different topologies

[Figure: a ring network connecting GPU cores, CPU cores, L3 slices, and memory controllers (MC); alternative node placements using nodes C0 to C2, G0 to G2, M0, M1, and L3; and a 4x4 layout with rows "C C M M", "C C M M", "C C G G", "C C G G"]

• On-chip Interconnection for CPU-GPU Heterogeneous Architecture [Lee et al. under review]

Page 6: MacSim Architecture Studies

Instruction Fetch and DRAM Scheduling

[Figure: Trace Generator (GPUOcelot), Frontend, Execution, and DRAM blocks]

• Frontend (instruction fetch policies): RR, ICOUNT, FAIR, LRF, …
• DRAM (scheduling policies): FCFS, FRFCFS, FAIR, …

• Effect of Instruction Fetch and Memory Scheduling on GPU Performance [Lakshminarayana and Kim, LCA-GPGPU, 2010]
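The policy names on this slide can be illustrated with a small sketch of two fetch policies (round-robin and ICOUNT) and two DRAM policies (FCFS and FR-FCFS). The function names, the `Request` record, and the tie-breaking rules are assumptions for the example, not MacSim's interfaces; the policies follow their textbook definitions.

```python
from collections import namedtuple

# Hypothetical DRAM request: arrival order and the row it targets.
Request = namedtuple("Request", ["arrival", "row"])


def pick_rr(ready, last, num_threads):
    """Round-robin: next ready thread after the one fetched last."""
    for offset in range(1, num_threads + 1):
        tid = (last + offset) % num_threads
        if tid in ready:
            return tid
    return None


def pick_icount(ready, in_flight):
    """ICOUNT: ready thread with the fewest in-flight instructions."""
    return min(ready, key=lambda tid: (in_flight[tid], tid))


def pick_fcfs(queue):
    """FCFS: oldest request first."""
    return min(queue, key=lambda r: r.arrival)


def pick_frfcfs(queue, open_row):
    """FR-FCFS: oldest row-buffer hit first, else the oldest request."""
    hits = [r for r in queue if r.row == open_row]
    return min(hits or queue, key=lambda r: r.arrival)


ready = {0, 2, 3}
in_flight = {0: 5, 2: 7, 3: 1}
queue = [Request(0, row=7), Request(1, row=3), Request(2, row=3)]

next_rr = pick_rr(ready, last=0, num_threads=4)      # thread 2
next_icount = pick_icount(ready, in_flight)          # thread 3
next_fcfs = pick_fcfs(queue)                         # request that arrived at 0
next_frfcfs = pick_frfcfs(queue, open_row=3)         # request that arrived at 1
```

With row 3 open, FR-FCFS skips the older request to row 7 in favor of a row-buffer hit, which is the behavior that distinguishes it from FCFS.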

Page 7: MacSim Architecture Studies

DRAM Scheduling in GPGPUs

[Figure: a DRAM controller in front of a DRAM bank, holding per-warp request queues (W0 to W3) for Core-0 and Core-1]

• Queues for Core-0 (W0 to W3): RH RM RM RM RM; RH RM RM RM; RH RM RM
• Queues for Core-1 (W0 to W3): RH RM RM RM; RH

• Potential of requests from Core-0 = $|W_0|^{\alpha} + |W_1|^{\alpha} + |W_2|^{\alpha} + |W_3|^{\alpha} = 4^{\alpha} + 3^{\alpha} + 5^{\alpha}$ ($\alpha < 1$)
• Reduction in potential if:
  • a row hit from a queue of length $L$ is serviced next: $L^{\alpha} - (L - 1)^{\alpha}$
  • a row miss from a queue of length $L$ is serviced next: $L^{\alpha} - (L - 1/m)^{\alpha}$
  • $m$ = cost of servicing a row miss / cost of servicing a row hit
• Tolerance(Core-0) < Tolerance(Core-1), so select Core-0
• Servicing a row hit from $W_1$ (of Core-0) results in the greatest reduction in potential, so service row hits from $W_1$ next

• DRAM Scheduling Policy for GPGPU Architectures Based on a Potential Function [Lakshminarayana et al. IEEE CAL, 2011]
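The potential-function arithmetic on this slide can be checked with a short sketch. The values `alpha = 0.5` and `m = 3` are assumptions chosen for illustration, and the queue lengths `[4, 3, 5, 0]` for W0 to W3 are read off the order of the terms in the slide's sum, which is also an assumption.

```python
def potential(queue_lengths, alpha):
    """Sum of L**alpha over the non-empty per-warp queues."""
    return sum(length ** alpha for length in queue_lengths if length > 0)


def reduction(length, alpha, m, row_hit=True):
    """Drop in potential per row-hit service time for a queue of this length.

    A row miss costs m times a row hit, so only 1/m of it completes
    in the time one row hit would take.
    """
    served = 1.0 if row_hit else 1.0 / m
    return length ** alpha - (length - served) ** alpha


alpha, m = 0.5, 3.0              # assumed values, with alpha < 1
core0_queues = [4, 3, 5, 0]      # assumed lengths of W0..W3 for Core-0

total = potential(core0_queues, alpha)

# Compare servicing a row hit from each non-empty queue.
hit_gain = {
    warp: reduction(length, alpha, m, row_hit=True)
    for warp, length in enumerate(core0_queues)
    if length > 0
}
best_warp = max(hit_gain, key=hit_gain.get)   # warp 1, the shortest non-empty queue
```

Because $\alpha < 1$ makes $L^{\alpha}$ concave, the shortest non-empty queue gives the largest reduction, which matches the slide's choice of W1.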

Page 8: MacSim Architecture Studies

Power Research & Validation

• Verifying simulator and GTX580
• Modeling X86-CPU power
• Modeling GPU power

Still on-going research

[Figure: pie chart, breakdown by component]
• Fetch: 3%
• Decode: 1%
• Schedule: 3%
• RF: 4%
• EX_alu: 6%
• EX_fpu: 48%
• EX_SFU: 1%
• EX_LD/ST: 3%
• Execution: 0%
• MMU: 0%
• L1: 26%
• SharedMem: 1%
• ConstCache: 1%
• TextureCache: 1%

Page 9: MacSim Architecture Studies

MacSim’s Roadmap

2012 ~ 2013

• Power/Energy Model
• ARM Architecture
• Mobile Platform
• OpenGL Program