SAM: Optimizing Multithreaded Cores for Speculative Parallelism
Maleen Abeydeera, Suvinay Subramanian, Mark Jeffrey, Joel Emer, Daniel Sanchez
PACT 2017
Aug 06, 2020

Executive Summary

Analyzes the interplay between hardware multithreading and speculative parallelism (e.g., Thread-Level Speculation and Transactional Memory)

Conventional multithreading causes performance pathologies on speculative workloads
• Increase in aborted work
• Inefficient use of speculation resources

Why? All threads are treated equally

Speculation-Aware Multithreading (SAM)
• Prioritize threads running tasks more likely to commit

SAM makes multithreading more useful

Outline

• Background on speculative parallelism
• Pitfalls of speculative parallelism with conventional multithreading
• SAM on in-order cores
• SAM on out-of-order cores

Background on Speculative Parallelism

Parallelize tasks when the dependences are not known in advance
Hardware executes all tasks in parallel, aborting upon conflicts
Which task to abort? Conflict resolution policy

Speculative parallelism
• Ordered, e.g., Thread-Level Speculation (TLS): program order dictates the conflict resolution order
• Unordered, e.g., Hardware Transactional Memory: any execution order is valid, but high-performance conflict resolution policies define an order

Implicit order among all tasks in any speculative system

Baseline System - Swarm [Jeffrey, MICRO'15]

Timestamped tasks
Tasks create children tasks (function ptr, timestamp, args)

    // Simulate a toggle on one gate input; if the gate's output toggles,
    // schedule a child task for every input connected to that output.
    void desTask(Timestamp ts, GateInput* input) {
        Gate* g = input->gate();
        bool toggledOutput = g->simulateToggle(input);  // g is a pointer
        if (toggledOutput) {
            for (GateInput* i : g->connectedInputs()) {
                swarm::enqueue(desTask, ts + delay(g, i), i);
            }
        }
    }

Tasks appear to execute in timestamp order
Unordered execution via equal timestamps
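
The ordering that tasks appear to follow can be illustrated with a purely sequential emulation. This is a minimal sketch, not the Swarm API: `TaskQueue`, its `enqueue` and `run` methods, and the rule that a child never precedes the running task are all assumptions made for illustration. Real Swarm hardware runs tasks speculatively in parallel; this sketch only shows the order in which their effects appear.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

using Timestamp = std::uint64_t;

// Hypothetical software stand-in for a timestamp-ordered task queue.
// This is NOT the Swarm API; it only mimics the apparent execution order.
class TaskQueue {
public:
    using Fn = std::function<void(TaskQueue&, Timestamp)>;

    // Loosely mirrors swarm::enqueue(fn, ts, args): register a task at timestamp ts.
    void enqueue(Fn fn, Timestamp ts) {
        assert(ts >= now_);  // assumption: children never precede the running task
        heap_.push(Task{ts, std::move(fn)});
    }

    // Run every task, earliest timestamp first, and return the order observed.
    std::vector<Timestamp> run() {
        std::vector<Timestamp> order;
        while (!heap_.empty()) {
            Task t = heap_.top();
            heap_.pop();
            now_ = t.ts;
            order.push_back(t.ts);
            t.fn(*this, t.ts);  // the task may enqueue children
        }
        return order;
    }

private:
    struct Task {
        Timestamp ts;
        Fn fn;
        bool operator>(const Task& other) const { return ts > other.ts; }
    };
    // Min-heap on timestamp: the earliest task is always on top.
    std::priority_queue<Task, std::vector<Task>, std::greater<Task>> heap_;
    Timestamp now_ = 0;
};
```

A task enqueued at timestamp 10 that creates a child at `ts + 5` runs before an unrelated task at 12, and the child at 15 runs last.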

Swarm Microarchitecture

[Figure: 16-tile, 64-core CMP with Mem / IO blocks. Tile organization: four cores, each with L1I/D caches, plus an L2, an L3 slice, a router, and a task unit.]

Equal timestamps: global order via Virtual Time (VT)
Virtual Time = (Timestamp, Tiebreaker)
Tasks execute out of order, but commit in VT order
Commit queue: state of tasks waiting to commit
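
A virtual time built from a timestamp and a tiebreaker can be compared lexicographically. The sketch below assumes 64-bit fields and a hypothetical `VirtualTime` struct; the slides do not give field widths or say how tiebreakers are assigned.

```cpp
#include <cstdint>
#include <tuple>

// Hypothetical layout: the slides only show that a virtual time is a
// timestamp followed by a tiebreaker. Field widths are assumptions.
struct VirtualTime {
    std::uint64_t timestamp;
    std::uint64_t tiebreaker;
};

// Lexicographic order: compare timestamps first, then tiebreakers,
// so tasks with equal timestamps still have a total order.
inline bool operator<(const VirtualTime& a, const VirtualTime& b) {
    return std::tie(a.timestamp, a.tiebreaker) <
           std::tie(b.timestamp, b.tiebreaker);
}

inline bool operator==(const VirtualTime& a, const VirtualTime& b) {
    return a.timestamp == b.timestamp && a.tiebreaker == b.tiebreaker;
}
```

With this order, two tasks that share timestamp 52 are still ordered by their tiebreakers, and both come after any task with timestamp 17.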

Pitfalls of Speculation-Oblivious Multithreading

System configuration
• 64-core SMT system
• In-order core with 2-wide issue
• Speculation-oblivious round-robin order

[Figure: issue-slot breakdown. Legend: micro-ops issued from committed tasks; micro-ops issued from aborted tasks; resource stalls; no ready micro-ops to issue.]

Insights:
1. Multithreading can be highly beneficial

However, multithreading can also lead to:
2. Increased aborts
3. Inefficient use of speculation resources

Unlikely-to-commit tasks hurt the throughput of likely-to-commit ones

Speculation-Aware Multithreading

Prioritize threads according to their conflict resolution priorities
• Reduce speculation resource stalls (tasks commit early)
• Reduce aborts (focus resources on tasks likely to commit)

SAM on in-order cores

[Figure: in-order SMT core with Fetch, Decode, thread micro-op queues, SMT Issue, register files, Pipe 0 (Int ALU, FP ALU), and Pipe 1 (Int ALU, Mem/DCache). The issue logic is annotated with the blocks Sort, Ready, Max, and Issue Thread ID.]

Conflict resolution priority updates (Virtual Times) arrive from the Task Unit
Virtual Times (example values in the figure): 52:9, 52:7, 17:19, 5:4
SAM issue priorities (higher is better; example values in the figure): 3, 2, 4, 1
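
The Sort, Ready, and Max blocks in the figure suggest a two-step selection: rank threads by virtual time, then issue from the highest-ranked thread that has a ready micro-op. The sketch below is one way to express that idea in software, not the hardware design. The function names, the `(timestamp, tiebreaker)` pair, and breaking exact ties by thread index are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical representation of a virtual time as (timestamp, tiebreaker).
using VT = std::pair<std::uint64_t, std::uint64_t>;

// Sort step: rank each thread by its task's virtual time.
// The earliest virtual time gets the highest priority (n); the latest gets 1.
// Exact ties are broken by thread index (an assumption).
std::vector<int> samPriorities(const std::vector<VT>& vts) {
    const std::size_t n = vts.size();
    std::vector<int> prio(n);
    for (std::size_t i = 0; i < n; ++i) {
        int later = 0;  // number of threads ordered after thread i
        for (std::size_t j = 0; j < n; ++j) {
            if (vts[i] < vts[j] || (vts[i] == vts[j] && i < j)) ++later;
        }
        prio[i] = later + 1;
    }
    return prio;
}

// Max step: among threads with a ready micro-op, pick the highest priority.
// Returns the thread index, or -1 if no thread is ready.
int selectIssueThread(const std::vector<int>& prio,
                      const std::vector<bool>& ready) {
    assert(prio.size() == ready.size());
    int best = -1;
    for (std::size_t t = 0; t < prio.size(); ++t) {
        if (ready[t] && (best < 0 || prio[t] > prio[best])) {
            best = static_cast<int>(t);
        }
    }
    return best;
}
```

This sketch picks one thread per call; a 2-wide core would repeat the selection for its second issue slot. When the highest-priority thread has nothing ready, the slot goes to the next-ranked ready thread, so prioritization does not leave issue slots idle.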

Experimental Methodology

Baseline system
• Swarm + Wait-N-GoTM [Jafri et al., ASPLOS'13] conflict resolution techniques
• Cycle-accurate, event-driven, Pin-based simulator
• Model systems up to 64 cores
• Cores: 2-wide issue, up to 8 threads per core

Benchmarks
• Ordered: Swarm [Jeffrey et al., MICRO'15, MICRO'16] (8 benchmarks)
• Unordered: STAMP [Minh et al., IISWC'08] (8 benchmarks)

SAM makes multithreading more effective

[Figure: performance of 1 Thread, 8 Thread Round Robin, and 8 Thread SAM on ordered benchmarks and unordered benchmarks.]

8-threaded cores outperform single-threaded cores by 1.85x
With SAM, the benefit increases to 2.33x

Why does SAM help?

[Figure: issue-slot breakdown. Micro-ops issued: committed, aborted. Unused issue slots (reason): resource, not ready, other.]

• SAM matches round-robin (RR) when there are no pathologies
• SAM reduces wasted work
• SAM reduces resource stalls

SAM on out-of-order cores

Unlike in-order cores, priorities affect pipeline efficiency
• A single thread can clog core resources
• Increased wrong-path execution

Despite these, prioritizing tasks is better

Need for aggressive prioritization affects core design
• Shared, not partitioned, ROBs

[Figure: out-of-order SMT core with Fetch, Decode, thread micro-op queues, SMT Issue, issue buffer, physical register file, Pipe 0, Pipe 1, and reorder buffer. Conflict resolution priority updates arrive from the task unit.]

Example per-thread values in the figure:
In-flight uops (for ICount): 3 9 4 2
Conflict res. priorities: 2 3 2 1
SAM priorities: 3 4 2 1
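
The example values in the figure are consistent with one possible rule, stated here as an assumption rather than as the authors' design: rank threads by conflict resolution priority, and break ties in favor of the thread with fewer in-flight micro-ops, as ICount would. The function name and the rank encoding below are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed rule, inferred from the figure's example values only:
// order threads by conflict resolution priority (higher is better) and
// break ties in favor of fewer in-flight micro-ops (the ICount criterion).
// Returns one rank per thread, where n is the best and 1 is the worst.
std::vector<int> oooSamPriorities(const std::vector<int>& conflictPrio,
                                  const std::vector<int>& inflightUops) {
    assert(conflictPrio.size() == inflightUops.size());
    const std::size_t n = conflictPrio.size();
    std::vector<int> rank(n);
    for (std::size_t i = 0; i < n; ++i) {
        int outranked = 0;  // number of threads that thread i outranks
        for (std::size_t j = 0; j < n; ++j) {
            if (i == j) continue;
            const bool higherPrio = conflictPrio[i] > conflictPrio[j];
            const bool tieFewerUops = conflictPrio[i] == conflictPrio[j] &&
                                      inflightUops[i] < inflightUops[j];
            if (higherPrio || tieFewerUops) ++outranked;
        }
        rank[i] = outranked + 1;
    }
    return rank;
}
```

Given the figure's conflict resolution priorities 2 3 2 1 and in-flight counts 3 9 4 2, this rule yields 3 4 2 1, matching the SAM priorities shown. Threads that tie on both inputs receive the same rank in this sketch.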

SAM tradeoffs with out-of-order cores

Baseline policy: ICount (IC)

[Figure: issue-slot breakdown for sssp with 8 threads. Micro-ops issued: committed, aborted, wrong path. Unused issue slots (reason): resource, not ready, other.]

SAM is more beneficial with dynamically shared ROBs
• Reduces aborts + resource stalls

But reduced pipeline efficiency
• Increase in wrong-path issues + not-ready stalls

Adaptive SAM policy

[Figure: issue-slot breakdown. Micro-ops issued: committed, aborted, wrong path. Unused issue slots (reason): resource, not ready, other.]

Hardware counters to track cycles
• Cycles lost to task-level speculation = Aborted + Resource
• Cycles lost to pipeline inefficiencies = Not ready + Wrong path

If cycles lost to task-level speculation > cycles lost to pipeline inefficiencies: use SAM
Otherwise: use ICount
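
The comparison in the diagram can be written as a small decision function. This is a sketch under assumptions: the struct, field, and function names are hypothetical, and the slides do not say how often the counters are sampled or reset.

```cpp
#include <cstdint>

enum class IssuePolicy { SAM, ICount };

// Hypothetical counter names; the slides only name the four categories.
struct CycleCounters {
    std::uint64_t aborted;    // cycles spent on tasks that later aborted
    std::uint64_t resource;   // cycles stalled on speculation resources
    std::uint64_t notReady;   // cycles with no ready micro-op
    std::uint64_t wrongPath;  // cycles spent on wrong-path micro-ops
};

// Use SAM when task-level speculation costs more cycles than pipeline
// inefficiencies; otherwise fall back to ICount. A tie selects ICount
// because the comparison in the diagram is a strict ">".
IssuePolicy choosePolicy(const CycleCounters& c) {
    const std::uint64_t lostToSpeculation = c.aborted + c.resource;
    const std::uint64_t lostToPipeline = c.notReady + c.wrongPath;
    return lostToSpeculation > lostToPipeline ? IssuePolicy::SAM
                                              : IssuePolicy::ICount;
}
```

For example, counters of 100 aborted, 50 resource, 30 not-ready, and 20 wrong-path cycles select SAM, since 150 > 50.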

SAM on OoO cores (all benchmarks)

[Figure: average over all benchmarks. Micro-ops issued: committed, aborted, wrong path. Unused issue slots (reason): resource, not ready, other.]

At 8 threads/core:
• Multithreading improves performance over single-threaded cores by 1.1x
• With SAM, improvement rises to 1.5x

Adaptive policy slightly increases performance at 2 and 4 threads

Conclusion

Conventional multithreading causes performance pathologies on speculative workloads
• Increase in aborted work
• Inefficient use of speculation resources

Speculation-Aware Multithreading (SAM)
• Prioritize threads running tasks more likely to commit

SAM makes multithreading more useful

Questions?
