www.compaq.com
Simultaneous Multithreading: Multiplying Alpha Performance
Dr. Joel Emer
Principal Member Technical Staff
Alpha Development Group
Compaq Computer Corporation
Outline
    Alpha Processor Roadmap
    Motivation for Introducing SMT
    Implementation of an SMT CPU
    Performance Estimates
    Architectural Abstraction
Leading edge process technology – 1.2-2.0GHz
    0.125µm CMOS
    SOI-compatible
    Cu interconnect
    low-k dielectrics
Chip characteristics
    ~1.2V Vdd
    ~250 Million transistors
    ~1100 signal pins in flip chip packaging
Enhanced out-of-order execution
8-wide superscalar
Large on-chip L2 cache
Direct RAMBUS interface
On-chip router for system interconnect
Glueless, directory-based, ccNUMA for up to 512-way SMP
4-way simultaneous multithreading (SMT)
Goals
Leadership single stream performance
Extra multistream performance with multithreading
    Without major architectural changes
    Without significant additional cost
Instruction Issue
Reduced function unit utilization due to dependencies
[Figure: function unit usage over time]
Superscalar Issue
Superscalar leads to more performance, but lower utilization
[Figure: function unit usage over time]
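The trade-off on this slide can be checked with simple arithmetic. The sketch below is only an illustration: the function name `issue_stats` and both issue traces are hypothetical and do not come from the talk. It takes the number of instructions issued in each cycle and reports instructions per cycle (IPC) alongside the fraction of issue slots actually used.

```python
def issue_stats(issued_per_cycle, width):
    """Return (ipc, utilization) for a per-cycle issue trace.

    issued_per_cycle: instructions issued in each cycle (hypothetical data).
    width: number of issue slots available per cycle.
    """
    if width <= 0 or not issued_per_cycle:
        raise ValueError("need a positive width and a non-empty trace")
    if any(n < 0 or n > width for n in issued_per_cycle):
        raise ValueError("cannot issue more instructions than the machine is wide")
    total = sum(issued_per_cycle)
    cycles = len(issued_per_cycle)
    ipc = total / cycles                    # performance
    utilization = total / (cycles * width)  # share of issue slots filled
    return ipc, utilization


# Made-up traces for the same 8 instructions with dependency stalls.
scalar_trace = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1]  # 1-wide, 12 cycles
wide_trace = [2, 1, 0, 2, 1, 2]                      # 4-wide, 6 cycles

if __name__ == "__main__":
    for name, trace, width in (("scalar", scalar_trace, 1),
                               ("4-wide", wide_trace, 4)):
        ipc, util = issue_stats(trace, width)
        print(f"{name}: IPC={ipc:.2f} utilization={util:.0%}")
```

With these assumed traces the 4-wide machine finishes in half the cycles (IPC of about 1.33 against 0.67), yet fills only a third of its issue slots where the scalar machine fills two thirds.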
Predicated Issue
Adds to function unit utilization, but results are thrown away
[Figure: function unit usage over time]
Chip Multiprocessor
Limited utilization when only running one thread
[Figure: function unit usage over time]
Fine Grained Multithreading
Solution:
    Provide quiescing operation that allows a TPU to sleep until a memory location changes
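A software analogy may help picture the quiescing operation. The sketch below is not the Alpha mechanism, which the slides do not specify. It is a minimal Python model in which a waiter sleeps, rather than spinning, until a watched location is written with a different value. The class `WatchedLocation` and its method names are hypothetical.

```python
import threading


class WatchedLocation:
    """Hypothetical model of one memory location that waiters can sleep on."""

    def __init__(self, value=0):
        self._value = value
        self._cond = threading.Condition()

    def load(self):
        with self._cond:
            return self._value

    def store(self, value):
        # A write wakes every sleeping waiter so it can re-check the value.
        with self._cond:
            self._value = value
            self._cond.notify_all()

    def quiesce_until_changed(self, seen, timeout=None):
        """Sleep until the value differs from `seen`; return the current value.

        On timeout the unchanged value is returned.
        """
        with self._cond:
            self._cond.wait_for(lambda: self._value != seen, timeout=timeout)
            return self._value


def demo():
    flag = WatchedLocation(0)
    observed = []

    def waiter():
        # Instead of spinning on flag.load(), sleep until the flag leaves 0.
        observed.append(flag.quiesce_until_changed(0, timeout=5.0))

    t = threading.Thread(target=waiter)
    t.start()
    flag.store(1)  # the write that ends the sleep
    t.join()
    return observed


if __name__ == "__main__":
    print(demo())
```

The relevance to a multithreaded core is that a spinning thread keeps issuing instructions and so competes with the other threads for issue slots, while a sleeping one does not.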
Summary
Alpha will maintain single stream performance leadership
SMT will significantly enhance multistream performance
    Across a wide range of applications,
    Without significant hardware cost, and
    Without major architectural changes
References
"Simultaneous Multithreading: Maximizing On-Chip Parallelism" by Tullsen, Eggers and Levy in ISCA95.
"Exploiting Choice: Instruction Fetch and Issue on an Implementable Simultaneous Multithreaded Processor" by Tullsen, Eggers, Emer, Levy, Lo and Stamm in ISCA96.
"Converting Thread-Level Parallelism to Instruction-Level Parallelism via Simultaneous Multithreading" by Lo, Eggers, Emer, Levy, Stamm and Tullsen in ACM Transactions on Computer Systems, August 1997.
"Simultaneous Multithreading: A Platform for Next-Generation Processors" by Eggers, Emer, Levy, Lo, Stamm and Tullsen in IEEE Micro, October 1997.