Diamonds are a Memory Controller’s Best Friend*
*Also known as: Achieving Predictable Performance through Better Memory Controller Placement in Many-Core CMPs, from ISCA ’09. Those responsible for the original title have been sacked.
Dennis Abts, Google
Natalie Enright Jerger, University of Toronto
John Kim, KAIST
Dan Gibson, Univ of Wisconsin
Mikko Lipasti, Univ of Wisconsin
Executive Summary
• On which tiles should memory controllers reside?
• Diamond MC placement works well for on-chip meshes and tori
  – Diamonds minimize maximum channel load
  – Diamonds deliver lower and more predictable runtimes
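The "minimize maximum channel load" claim can be explored on a toy model. The Python sketch below is not the paper's evaluation methodology; it assumes a k×k mesh, X-then-Y dimension-order routing, and uniform reply traffic in which every MC sends an equal share to every tile (normalized so each tile receives one unit in total). It reports the load on the busiest directed link. The helper names `xy_route` and `max_channel_load` are hypothetical, and the diamond coordinates (tiles at Manhattan distance 4 from the centre of an 8×8 mesh, giving two MCs per row and per column) are our own choice, not necessarily the exact placement in the paper; row0_7 puts all 16 MCs in rows 0 and 7.

```python
from collections import Counter
from fractions import Fraction


def xy_route(src, dst):
    """Yield each directed link (from_tile, to_tile) on the X-first, then-Y path."""
    x, y = src
    dx, dy = dst
    while x != dx:  # travel along the source row first
        nx = x + (1 if dx > x else -1)
        yield (x, y), (nx, y)
        x = nx
    while y != dy:  # then along the destination column
        ny = y + (1 if dy > y else -1)
        yield (x, y), (x, ny)
        y = ny


def max_channel_load(k, mcs):
    """Max load on any directed link when each MC sends 1/len(mcs) to every tile (assumed model)."""
    pairs_per_link = Counter()
    tiles = [(x, y) for x in range(k) for y in range(k)]
    for mc in mcs:
        for tile in tiles:
            for link in xy_route(mc, tile):
                pairs_per_link[link] += 1
    # Each (MC, tile) pair carries 1/len(mcs) units; Fraction keeps the result exact.
    return Fraction(max(pairs_per_link.values()), len(mcs))


K = 8
# Baseline: all 2K memory controllers on the top and bottom rows.
row0_7 = [(x, 0) for x in range(K)] + [(x, K - 1) for x in range(K)]
# Hypothetical diamond: |x - 3.5| + |y - 3.5| == 4, written with integers.
diamond = [(x, y) for x in range(K) for y in range(K)
           if abs(2 * x - (K - 1)) + abs(2 * y - (K - 1)) == K]

print("row0_7 max channel load :", float(max_channel_load(K, row0_7)))
print("diamond max channel load:", float(max_channel_load(K, diamond)))
```

In this model, row0_7 peaks at 8.0: the row-0 link between columns 3 and 4 carries replies from four MCs to 32 tiles. The diamond peaks at 3.5, on the first link out of each edge MC, because spreading the MCs so that no row holds more than two of them keeps any single row from funnelling most of the X-phase traffic.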
Background
• Diverse on-chip communication
  – Cache-to-cache
  – LD/ST to memory
  – Off-chip traffic (e.g., I/O)
• Processors per chip on the rise
  – Pins available for memory are not rising as fast, so memory bandwidth becomes more precious
  – Reality: many cores, few memory controllers
• Tiled architectures gaining popularity
  – Commonly employ on-chip meshes or tori
The Problem
• What memory controller placement is best overall?
  – Flip-chip packaging allows flexible escape routes
  – n tiles and m ports:
    • Don’t worry, there are only $\binom{n}{m}$ configurations!
  – What are the characteristics of the best configuration?
    • Performance: low runtime for a set of objective workloads
    • Throughput: low latency as a function of offered load
    • Fairness: similar (low) average memory latency across all nodes
    • Predictability: low latency and runtime variance
• Slight simplification: assume $n = k^2$ and $m = 2k$
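The "only $\binom{n}{m}$ configurations" remark is tongue-in-cheek. The short Python sketch below evaluates the count under the slide's simplification $n = k^2$, $m = 2k$ using the standard-library `math.comb`; the function name `count_placements` is our own, for illustration.

```python
import math


def count_placements(k: int) -> int:
    """Ways to choose m = 2k memory-controller tiles out of n = k*k tiles."""
    n, m = k * k, 2 * k
    return math.comb(n, m)


for k in (4, 6, 8):
    print(f"k={k}: n={k * k}, m={2 * k}, placements={count_placements(k):,}")
```

A 4×4 mesh with 8 controllers already has 12,870 placements, and an 8×8 mesh with 16 controllers has on the order of $10^{14}$, which is presumably why the question becomes which characteristics a good placement has rather than an exhaustive search.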
Baseline Placement: row0_7
• Ports to MCs located at