Warp-Aware Trace Scheduling for GPUs. James Jablin (Brown), Thomas Jablin (UIUC), Onur Mutlu (CMU), Maurice Herlihy (Brown)

Dec 21, 2015

Transcript
  • Slide 1
  • Warp-Aware Trace Scheduling for GPUs. James Jablin (Brown), Thomas Jablin (UIUC), Onur Mutlu (CMU), Maurice Herlihy (Brown)
  • Slide 2
  • Historical Trends in GFLOPS: CPUs vs. GPUs. Reproduced from the NVIDIA CUDA C Programming Guide (Version 5.0).
  • Slide 3
  • Performance Pitfalls: control flow can negatively affect performance.
  • Slide 4
  • Performance Pitfalls. Pipeline Stall: execution delay in an instruction pipeline to resolve a dependency.
  • Slide 5
  • Hardware: CPU versus GPU. Reproduced from the NVIDIA CUDA C Programming Guide (Version 5.0).
  • Slide 6
  • Slide 7
  • Slide 8
  • Slide 9
  • Slide 10
  • Slide 11
  • Slide 12
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Performance Pitfalls. Pipeline Stall: execution delay in an instruction pipeline to resolve a dependency.
  • Slide 19
  • Performance Pitfalls. Pipeline Stall: execution delay in an instruction pipeline to resolve a dependency. Warp Divergence: threads within a warp take different paths, and the different execution paths are serialized.
  • Slide 20
  • Warp Divergence Example
  • Slide 21
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Slide 27
  • Slide 28
  • Warp-Aware Trace Scheduling: schedule instructions across basic block boundaries to expose additional instruction-level parallelism (ILP).
  • Slide 29
  • Warp-Aware Trace Scheduling: schedule instructions across basic block boundaries to expose additional instruction-level parallelism (ILP) while managing and optimizing warp divergence.
  • Slide 30
  • Origins: Microcode Trace Scheduling. Generalizing local and disparate vertical-to-horizontal microcode compaction. Step | Description
  • Slide 31
  • Origins: Microcode Trace Scheduling. Generalizing local and disparate vertical-to-horizontal microcode compaction. Step | Description: 1. Trace Selection
  • Slide 32
  • Origins: Microcode Trace Scheduling. Generalizing local and disparate vertical-to-horizontal microcode compaction. Step | Description: 1. Trace Selection; 2. Trace Formation
  • Slide 33
  • Origins: Microcode Trace Scheduling. Generalizing local and disparate vertical-to-horizontal microcode compaction. Step | Description: 1. Trace Selection; 2. Trace Formation; 3. Local Scheduling
  • Slide 34
  • Origins: Microcode Trace Scheduling. Generalizing local and disparate vertical-to-horizontal microcode compaction. Step | Description: 1. Trace Selection: partition basic blocks into regions; 2. Trace Formation: facilitate local scheduling, potentially adding nodes and edges; 3. Local Scheduling: schedule instructions within each region
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Backup Slides
  • Slide 52
  • Slide 53
  • GPU Programming Model
  • Slide 54
  • Slide 55
  • Slide 56
  • Characterizing the Grid
  • Slide 57
  • Characterizing the Grid, Block
  • Slide 58
  • Slide 59
  • Characterizing the Grid, Block, and Thread
  • Slide 60
  • Slide 61
  • Warp Divergence Examples
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
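
The warp divergence definition in the slides (threads within a warp take different paths, and the different paths are serialized) can be illustrated with a small host-side model. This is only a sketch in plain C++, not GPU code and not the authors' method: the names `kWarpSize`, `serializedPasses`, `uniformBranch`, `evenOddBranch`, and `perLaneBranch` are hypothetical, a warp width of 32 is assumed (as on NVIDIA GPUs), and the model counts one pass per distinct path while ignoring path length and reconvergence details.

```cpp
#include <cstddef>
#include <functional>
#include <set>

// Assumed warp width (32 lanes, as on NVIDIA GPUs).
constexpr std::size_t kWarpSize = 32;

// Simplified model: at a branch, the warp executes each distinct path
// taken by at least one lane, one path after another. The return value
// is the number of such serialized passes.
// `pathOf` is a hypothetical callback mapping a lane index to a path id.
std::size_t serializedPasses(const std::function<int(std::size_t)>& pathOf) {
    std::set<int> pathsTaken;
    for (std::size_t lane = 0; lane < kWarpSize; ++lane) {
        pathsTaken.insert(pathOf(lane));
    }
    return pathsTaken.size();
}

// Uniform branch: every lane takes the same path, so there is no divergence.
std::size_t uniformBranch() {
    return serializedPasses([](std::size_t) { return 0; });
}

// Two-way divergence: even and odd lanes take opposite sides of an if/else.
std::size_t evenOddBranch() {
    return serializedPasses(
        [](std::size_t lane) { return static_cast<int>(lane % 2); });
}

// Worst case in this model: every lane takes its own path.
std::size_t perLaneBranch() {
    return serializedPasses(
        [](std::size_t lane) { return static_cast<int>(lane); });
}
```

In this model, `uniformBranch()` needs a single pass, `evenOddBranch()` needs two, and `perLaneBranch()` needs one pass per lane. The gap between those counts is the cost that a warp-aware scheduler would try to manage when it moves instructions across basic block boundaries.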