www.compaq.com

Simultaneous Multithreading: Multiplying Alpha Performance

Dr. Joel Emer, Principal Member Technical Staff, Alpha Development Group, Compaq Computer Corporation

Dec 21, 2015

Outline

- Alpha Processor Roadmap
- Motivation for Introducing SMT
- Implementation of an SMT CPU
- Performance Estimates
- Architectural Abstraction

Alpha Microprocessor Overview

(Roadmap chart by first system ship, 1998 to 2003, with Higher Performance and Lower Cost tracks. Parts shown: the 21264 in EV6, EV67 and EV68 versions, with 0.35µm, 0.28µm and 0.18µm process labels; EV7 at 0.18µm; EV8 and EV78 at 0.125µm.)

EV8 Technology Overview

Leading-edge process technology: 1.2-2.0 GHz
- 0.125µm CMOS
- SOI-compatible
- Cu interconnect
- Low-k dielectrics

Chip characteristics
- ~1.2 V Vdd
- ~250 million transistors
- ~1100 signal pins in flip-chip packaging

EV8 Architecture Overview

- Enhanced out-of-order execution
- 8-wide superscalar
- Large on-chip L2 cache
- Direct RAMBUS interface
- On-chip router for system interconnect
- Glueless, directory-based ccNUMA for up to 512-way SMP
- 4-way simultaneous multithreading (SMT)
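The overview mentions a glueless, directory-based ccNUMA design but does not describe the coherence protocol. To show roughly what a directory has to track, here is a minimal sketch of a generic three-state directory at one home node. It assumes 64-byte blocks interleaved across nodes. `home_node`, `Directory`, `Entry` and the state names are hypothetical and are not EV8's.

```python
from dataclasses import dataclass, field

# Generic three-state directory (Uncached / Shared / Exclusive). The slides do
# not describe EV8's protocol; this only illustrates the bookkeeping.
UNCACHED, SHARED, EXCLUSIVE = "U", "S", "E"
BLOCK_BYTES = 64  # assumed cache-block size


def home_node(addr, num_nodes):
    """Assumed interleaving: consecutive blocks rotate across home nodes."""
    return (addr // BLOCK_BYTES) % num_nodes


@dataclass
class Entry:
    state: str = UNCACHED
    sharers: set = field(default_factory=set)  # nodes holding a copy


class Directory:
    """Directory kept at one home node for the blocks it owns."""

    def __init__(self):
        self.entries = {}  # block number (not byte address) -> Entry

    def _entry(self, block):
        return self.entries.setdefault(block, Entry())

    def read(self, block, node):
        """Grant a shared copy; return the nodes that must downgrade."""
        e = self._entry(block)
        if e.state == EXCLUSIVE and node in e.sharers:
            return set()  # the owner re-reading its own block
        downgrade = set(e.sharers) if e.state == EXCLUSIVE else set()
        e.state = SHARED  # a previous owner keeps a shared copy
        e.sharers.add(node)
        return downgrade

    def write(self, block, node):
        """Grant exclusive ownership; return the nodes whose copies are invalidated."""
        e = self._entry(block)
        invalidate = e.sharers - {node}
        e.state = EXCLUSIVE
        e.sharers = {node}
        return invalidate


d = Directory()
d.read(7, node=1)
d.read(7, node=2)
print(d.write(7, node=3))  # nodes 1 and 2 are invalidated
print(d.read(7, node=1))   # node 3 must downgrade
```

A real protocol also needs transient states and message ordering. The sketch leaves those out so that the sharer bookkeeping stays visible.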

Goals

- Leadership single-stream performance
- Extra multistream performance with multithreading
  - without major architectural changes
  - without significant additional cost

Instruction Issue

Reduced function unit utilization due to dependencies.

(Figure: issue slots over time.)

Superscalar Issue

Superscalar issue yields more performance, but lower utilization.

(Figure: issue slots over time.)
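To make the utilization trade-off concrete, here is a minimal sketch that issues instructions oldest-first. It is not a model of any Alpha pipeline. It assumes unit latency and that any instruction can use any issue slot. `issue_schedule` is a hypothetical helper.

```python
def issue_schedule(deps, width):
    """Oldest-first issue with 1-cycle latency on a `width`-wide machine.

    deps[i] lists the instructions whose results instruction i needs
    (the dependence graph must be acyclic).
    Returns (cycles, utilization), utilization = instructions / (cycles * width).
    """
    n = len(deps)
    issued_at = [None] * n  # cycle in which each instruction issued
    cycle = 0
    remaining = n
    while remaining:
        slots = 0
        for i in range(n):
            if slots == width:
                break
            if issued_at[i] is not None:
                continue
            # Ready only if every producer issued in an earlier cycle.
            if all(issued_at[p] is not None and issued_at[p] < cycle for p in deps[i]):
                issued_at[i] = cycle
                slots += 1
        remaining -= slots
        cycle += 1
    return cycle, n / (cycle * width)


# Two independent dependence chains of four instructions each.
chains = [[], [0], [1], [2], [], [4], [5], [6]]
print(issue_schedule(chains, width=1))  # (8, 1.0)
print(issue_schedule(chains, width=4))  # (4, 0.5)
```

Widening the machine from one slot to four halves the cycle count. It also leaves half of the slots empty, which is the trade-off the slide illustrates.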

Predicated Issue

Adds to function unit utilization, but the results of the path not taken are thrown away.

(Figure: issue slots over time.)
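Predication removes a branch by executing both arms and keeping only one result. The sketch below uses a hypothetical helper, `abs_diff_predicated`, to count the work in a branch-free absolute difference. The unused arm occupies a function unit, so it raises utilization without contributing to the result.

```python
def abs_diff_predicated(a, b):
    """|a - b| the way if-conversion would compute it: no branch, both arms run.

    Returns (result, ops_issued, ops_discarded).
    """
    p = a >= b              # compare writes the predicate
    then_val = a - b        # arm needed when p is true; executed anyway
    else_val = b - a        # arm needed when p is false; executed anyway
    result = then_val if p else else_val  # stands in for a conditional move / select
    ops_issued = 4          # compare, two subtracts, select
    ops_discarded = 1       # one subtract's result is always thrown away
    return result, ops_issued, ops_discarded


print(abs_diff_predicated(7, 3))  # (4, 4, 1)
print(abs_diff_predicated(3, 7))  # (4, 4, 1)
```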

Chip Multiprocessor

Limited utilization when only running one thread.

(Figure: issue slots over time.)

Fine Grained Multithreading

Intra-thread dependencies still limit performance.

(Figure: issue slots over time.)

Simultaneous Multithreading

Maximum utilization of function units by independent operations.

(Figure: issue slots over time.)
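The issue diagrams above can be compared with a crude slot-counting model. This is a sketch under stated assumptions, not the methodology behind this talk's performance estimates:

- Each thread has a fixed amount of independent work available in each cycle.
- Work that is not issued in a cycle is dropped rather than carried over.
- The machine is 8 wide, and the CMP is split into four 2-wide cores.

The per-thread numbers are made up, and `slot_utilization` is a hypothetical helper.

```python
def slot_utilization(ready, width, policy, cores=4):
    """Fraction of issue slots filled across a trace.

    ready[t][c] = independent instructions thread t could issue in cycle c.
    Crude model: anything not issued in a cycle is dropped, not carried over.
    """
    threads, cycles = len(ready), len(ready[0])
    used = 0
    for c in range(cycles):
        if policy == "superscalar":  # one thread owns the whole wide machine
            used += min(width, ready[0][c])
        elif policy == "cmp":        # `cores` narrow cores, one thread each
            per_core = width // cores
            used += sum(min(per_core, ready[t][c]) for t in range(min(threads, cores)))
        elif policy == "fgmt":       # one thread per cycle, round robin
            used += min(width, ready[c % threads][c])
        elif policy == "smt":        # any thread may fill any slot each cycle
            used += min(width, sum(ready[t][c] for t in range(threads)))
        else:
            raise ValueError(f"unknown policy: {policy}")
    return used / (cycles * width)


ready = [  # hypothetical per-cycle parallelism for four threads
    [3, 1, 5, 2],
    [2, 4, 0, 3],
    [1, 2, 3, 1],
    [4, 0, 2, 2],
]
for policy in ("superscalar", "cmp", "fgmt", "smt"):
    print(policy, slot_utilization(ready, width=8, policy=policy))
print("cmp, one thread", slot_utilization(ready[:1], width=8, policy="cmp"))
```

With these made-up numbers, the 32 available slots fill as follows:

- SMT: 31
- Single-thread superscalar: 11
- Fine-grained multithreading: 12
- Four-core CMP: 25 with four threads, but only 7 with one thread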

Basic Out-of-order Pipeline

(Figure: stages Fetch, Decode/Map, Queue, Reg Read, Execute, Dcache/Store Buffer, Reg Write and Retire, with the PC, Icache, Register Map, Regs and Dcache; annotated "Thread-blind".)

SMT Pipeline

(Figure: stages Fetch, Decode/Map, Queue, Reg Read, Execute, Dcache/Store Buffer, Reg Write and Retire, with the PC, Icache, Register Map, Regs and Dcache.)
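An SMT pipeline has a program counter per thread but a single shared Icache, so fetch must choose which thread or threads to serve each cycle. The slides do not say how EV8 makes that choice. One policy is ICOUNT, from the ISCA 1996 paper "Exploiting Choice" by Tullsen, Eggers, Emer, Levy, Lo and Stamm. It favors the threads with the fewest instructions in the decode, rename and queue stages, so no single thread clogs the shared instruction queue. The following is a minimal sketch in which `icount_pick` and its tie-breaking rule are assumptions.

```python
def icount_pick(front_end_counts, can_fetch, n=1):
    """ICOUNT-style choice of which thread(s) fetch this cycle.

    front_end_counts[t]: thread t's instructions in decode, rename and the queue.
    can_fetch[t]: False if thread t is blocked (e.g. on an Icache miss).
    Fewest in-flight instructions wins; ties go to the lower thread id.
    """
    eligible = [t for t, ok in enumerate(can_fetch) if ok]
    eligible.sort(key=lambda t: (front_end_counts[t], t))
    return eligible[:n]


counts = [12, 3, 7, 3]
print(icount_pick(counts, [True] * 4, n=2))                 # [1, 3]
print(icount_pick(counts, [True, False, True, True], n=2))  # [3, 2]
```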

Changes for SMT

- Basic pipeline: unchanged
- Replicated resources:
  - Program counters
  - Register maps
- Shared resources:
  - Register file (size increased)
  - Instruction queue
  - First and second level caches
  - Translation buffers
  - Branch predictor
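The split between replicated and shared resources shows up most clearly in register renaming. Each thread needs its own map from architectural to physical registers, but all threads draw from one physical register file. An Alpha thread has 32 architectural integer registers, so four threads need 4 × 32 = 128 physical registers just to hold committed state. Registers for in-flight results come on top of that, which is why the register file grows. The following is a minimal sketch that assumes a hypothetical 160-entry integer register file and a hypothetical `SMTRenamer` class.

```python
from collections import deque


class SMTRenamer:
    """Per-thread register maps (replicated) over one shared physical register file."""

    def __init__(self, threads=4, arch_regs=32, phys_regs=160):
        if phys_regs < threads * arch_regs:
            raise ValueError("every thread needs a physical register per architectural one")
        free = deque(range(phys_regs))
        # Each thread's map initially points at its own committed registers.
        self.maps = [[free.popleft() for _ in range(arch_regs)] for _ in range(threads)]
        self.free = free  # shared pool for renaming in-flight results

    def rename(self, tid, dst, srcs):
        """Rename one instruction of thread `tid`.

        Returns (new_dst, src_physregs, old_dst); old_dst goes back to the
        pool via release() once the instruction retires.
        """
        src_physregs = [self.maps[tid][r] for r in srcs]
        if not self.free:
            raise RuntimeError("no free physical register: rename must stall")
        new_dst = self.free.popleft()
        old_dst = self.maps[tid][dst]
        self.maps[tid][dst] = new_dst
        return new_dst, src_physregs, old_dst

    def release(self, physreg):
        """Return a physical register to the shared pool."""
        self.free.append(physreg)


r = SMTRenamer()
print(r.rename(0, dst=1, srcs=[2, 3]))  # (128, [2, 3], 1)
print(r.rename(1, dst=1, srcs=[1]))     # (129, [33], 33)
print(len(r.free))                      # 30
```

Both threads rename architectural register 1 without interfering with each other. Thread 0's r1 now maps to physical register 128 and thread 1's to 129, and both came from the same shared pool.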

Multiprogrammed workload

(Chart: SpecInt, SpecFP and Mixed Int/FP workloads at 1T, 2T, 3T and 4T; y-axis 0% to 250%.)

Decomposed SPEC95 Applications

(Chart: Turb3d, Swm256 and Tomcatv at 1T, 2T, 3T and 4T; y-axis 0% to 250%.)

Multithreaded Applications

(Chart: Barnes, Chess, Sort and TP at 1T, 2T and 4T; y-axis 0% to 300%.)

Architectural Abstraction

- 1 CPU with 4 Thread Processing Units (TPUs)
- Shared hardware resources

(Figure: TPU 0, TPU 1, TPU 2 and TPU 3 sharing the Icache, TLB, Dcache and Scache.)
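From software's point of view, each TPU looks like a processor. The following is a minimal sketch of how an operating system might number the TPUs. It assumes that the four TPUs of each CPU get consecutive logical numbers. `tpu_location` and `shares_caches` are hypothetical helpers.

```python
TPUS_PER_CPU = 4  # one EV8 CPU presents four thread processing units


def tpu_location(logical_cpu):
    """Map an OS-visible processor number to (cpu, tpu).

    Assumes the numbering packs the four TPUs of each CPU consecutively.
    """
    return divmod(logical_cpu, TPUS_PER_CPU)


def shares_caches(a, b):
    """True if two logical processors are TPUs of the same CPU and therefore
    share its Icache, TLB, Dcache and Scache."""
    return tpu_location(a)[0] == tpu_location(b)[0]


print(tpu_location(6))      # (1, 2)
print(shares_caches(4, 7))  # True
print(shares_caches(3, 4))  # False
```

A scheduler could use a test like `shares_caches` in two ways. It could place threads that share data on one CPU, where they hit in the same caches. It could also spread unrelated, cache-hungry threads across different CPUs.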

System Block Diagram

(Figure: an array of EV8 nodes, each with local memory (M) and I/O (IO); labels 0 to 3.)

Quiescing Idle Threads

Problem: a spin-looping thread consumes resources.

Solution: provide a quiescing operation that allows a TPU to sleep until a memory location changes.
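A small simulation shows why quiescing matters on a shared pipeline. The slides do not name the operation or describe its interface, so the `SMTCore` class and its `quiesce` method are hypothetical. The toy model works as follows:

- Awake TPUs split an 8-wide issue budget round robin.
- A spinning TPU takes slots just as useful work does.
- A quiesced TPU takes no slots until a store changes the location it is watching.

```python
class SMTCore:
    """Toy SMT CPU: awake TPUs share `width` issue slots each cycle."""

    def __init__(self, width=8):
        self.width = width
        self.memory = {}
        self.sleeping = {}  # tid -> (watched address, value seen at sleep time)

    def quiesce(self, tid, addr):
        """Hypothetical quiesce operation: TPU `tid` sleeps until memory[addr] changes."""
        self.sleeping[tid] = (addr, self.memory.get(addr, 0))

    def store(self, addr, value):
        self.memory[addr] = value
        for tid, (watched, seen) in list(self.sleeping.items()):
            if watched == addr and value != seen:
                del self.sleeping[tid]  # location changed: wake the TPU

    def issue_cycle(self, demand):
        """Hand out slots one at a time, round robin over awake TPUs."""
        granted = {tid: 0 for tid in demand}
        awake = [t for t in sorted(demand) if t not in self.sleeping]
        free = self.width
        while free:
            progress = False
            for tid in awake:
                if free and granted[tid] < demand[tid]:
                    granted[tid] += 1
                    free -= 1
                    progress = True
            if not progress:
                break  # every awake TPU's demand is satisfied
        return granted


FLAG = 0x1000          # hypothetical address the idle thread waits on
WORKER, IDLE = 0, 1
core = SMTCore()
demand = {WORKER: 8, IDLE: 4}     # the idle thread's spin loop wants 4 slots
print(core.issue_cycle(demand))   # {0: 4, 1: 4}: spinning steals half the slots
core.quiesce(IDLE, FLAG)
print(core.issue_cycle(demand))   # {0: 8, 1: 0}
core.store(FLAG, 1)               # another thread sets the flag
print(IDLE in core.sleeping)      # False: the idle TPU is awake again
```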

Summary

- Alpha will maintain single-stream performance leadership.
- SMT will significantly enhance multistream performance:
  - across a wide range of applications,
  - without significant hardware cost, and
  - without major architectural changes.
