Integrated, Application-Level, Performance-Energy Modeling for Heterogeneous Architectures
PIs: Sudha Yalamanchili, Hyesoon Kim
Students: Eric Anger, Prasun Gera, Nagesh B. Lakshminarayana
Collaborators: Jeremiah J. Wilke, Patrick S McCormick, Sudha Yalamanchili

Source: hpc.pnl.gov/modsim/2014/Presentations/Kim.pdf

Jan 23, 2021

Page 1

Integrated, Application-Level, Performance-Energy Modeling for Heterogeneous Architectures

PIs: Sudha Yalamanchili, Hyesoon Kim
Students: Eric Anger, Prasun Gera, Nagesh B. Lakshminarayana
Collaborators: Jeremiah J. Wilke, Patrick S McCormick, Sudha Yalamanchili

Page 2

Goals

•  Large graphs; new generation of applications
•  Needs: fast simulation/profiling and understanding of high-level behaviors

Page 3

For whom?

•  Application developers
   – Application optimizations for different architectures
   – Algorithm selection
•  Hardware developers
   – Architecture parameter decisions
   – Large-scale hardware developers

Page 4

Motivation

•  NVIDIA GPU-K40
•  BFS algorithm: different implementations, different inputs

[Figure: bar chart of relative energy (normalized to the minimum energy) for four BFS implementations (HIPC, LS, SHOC1, SHOC2) on three inputs (eu-2005, italy, rgg_n_2_18_so). The axis runs from 0 to 10; data labels 21.7 and 29 also appear.]

Page 5

Proposed Framework

•  Fast and scalable simulation
•  Application optimization guide

[Diagram blocks: Application; Arch-Independent Metrics; Energy Model; Application Energy Profiler; Macro SST Simulator; Hardware Model. Labels: hardware parameters, model training.]

Page 6

Application Level Energy Profiling

•  Function level energy profiling
   – Collect time and energy at function boundaries
•  Instruction level profiling
   – Synchronizations, parallelism?
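Collecting time and energy at function boundaries can be sketched as below. This is a minimal sketch, not the project's profiler: `FunctionEnergyProfiler`, `read_energy_uj`, and `make_fake_counter` are hypothetical names introduced here. It assumes that some monotonically increasing energy counter in microjoules is available (on Linux, the powercap sysfs interface exposes RAPL counters as `energy_uj` files). Counter wraparound is ignored, and nested calls are counted inclusively.

```python
import time
from collections import defaultdict
from functools import wraps


class FunctionEnergyProfiler:
    """Accumulates wall time and energy per decorated function (hypothetical helper)."""

    def __init__(self, read_energy_uj, clock=time.perf_counter):
        # read_energy_uj: callable returning a cumulative energy reading in microjoules.
        self._read = read_energy_uj
        self._clock = clock
        self.totals = defaultdict(
            lambda: {"calls": 0, "seconds": 0.0, "energy_uj": 0}
        )

    def profile(self, fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            # Sample the clock and the energy counter at function entry.
            t0, e0 = self._clock(), self._read()
            try:
                return fn(*args, **kwargs)
            finally:
                # Sample again at function exit and record the differences.
                rec = self.totals[fn.__name__]
                rec["calls"] += 1
                rec["seconds"] += self._clock() - t0
                rec["energy_uj"] += self._read() - e0
        return wrapper


def make_fake_counter(step_uj):
    """Stand-in counter for experiments: advances by step_uj on every read."""
    state = {"uj": 0}

    def read():
        state["uj"] += step_uj
        return state["uj"]

    return read
```

To use a real counter, pass a callable that reads the platform's energy value in place of `make_fake_counter(...)`, then decorate the functions of interest with `profiler.profile`.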

Page 7

Profiling Mechanisms

•  Profiling with Byfl*
   – From LANL
   – To collect hardware-independent metrics
   – To help application developers
   – Instruments code in LLVM’s intermediate representation (IR)
   – Profiled information: all IR-level information
      •  Low-level primitives (barriers, synchronization information), computation per memory byte, etc.
•  Profiling for fast and scalable hardware simulation
   – Application skeleton (more detail later)
•  Profiling for application understanding

* Scott Pakin, Patrick McCormick, “Hardware-independent application characterization,” IISWC 2013. https://github.com/losalamos/Byfl
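One of the hardware-independent metrics named above, computation per memory byte, can be derived from IR-level counts as in the sketch below. The function name and its parameters are hypothetical and do not reflect Byfl's actual output format; the sketch only assumes that an operation count and the bytes loaded and stored are available.

```python
def ops_per_byte(op_count, bytes_loaded, bytes_stored):
    """Computation per memory byte: operations divided by total bytes moved."""
    total_bytes = bytes_loaded + bytes_stored
    if total_bytes == 0:
        # No memory traffic: the ratio is unbounded unless no work was done.
        return float("inf") if op_count else 0.0
    return op_count / total_bytes
```

Because the inputs are counted at the IR level, the ratio describes the application itself rather than any particular cache hierarchy.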

Page 8

Why Architecture Independent Metrics?

•  To get high-level information
   – Synchronization overhead?
   – # of data accesses
   – Data movements
   – Leading to more software-level optimization decisions
•  Separate the hardware-dependent overhead and software-caused overhead

Page 9

E.g., Power Efficiency and TLP

[Figure annotation: peak power-efficient point]

Source: An Integrated GPU Power and Performance Model, ISCA’10
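The "peak power-efficient point" can be located from a sweep of measurements as in the sketch below. It assumes, as an illustration only, that thread-level parallelism (TLP) is varied through the number of active cores and that efficiency means performance per watt; `peak_efficiency_point` and its sample layout are hypothetical, not taken from the cited paper.

```python
def peak_efficiency_point(samples):
    """Return (active_cores, performance_per_watt) for the most efficient sample.

    samples: iterable of (active_cores, performance, power_watts) tuples.
    """
    best = None
    for cores, perf, watts in samples:
        if watts <= 0:
            raise ValueError("power must be positive")
        efficiency = perf / watts
        # Keep the first sample with the strictly highest efficiency.
        if best is None or efficiency > best[1]:
            best = (cores, efficiency)
    if best is None:
        raise ValueError("no samples")
    return best
```

The most efficient configuration need not be the one with the most active cores: once performance saturates, added cores raise power without raising throughput.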

Page 10

Hardware Modeling

[Diagram blocks: Application; Arch-Independent Metrics; Energy Model; Hardware Model, containing Memory Hierarchy Model, ISA Translation Model, Memory Access Characterization, and Workload Characterization; Hardware Performance Counters + RAPL; Regression Based Performance Model; Oracle hardware modeling. Label: model feedback.]

Page 11

Power modeling with Application metrics

•  Now, we need to model architecture components
•  Not all memory instructions are equal!!!

[Figure: L2 area (mm²) against total power (W) for eight configurations: 1r/1w and 2r/2w, each with 1b, 2b, 4b, and 8b. Axis ticks: 0 to 2500 and 190 to 230.]

[Figure: power consumption factor per access for L1 cache, texture cache, constant cache, and GDDR memory. The axis runs from 0 to 20; a data label of 52 also appears.]

Cache Modeling is critical!

Source: An Integrated GPU Power and Performance Model, ISCA’10
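Because each level of the memory hierarchy has its own cost per access, a memory energy estimate has to weight access counts per level rather than count memory instructions uniformly. The sketch below shows that weighted sum; `memory_energy` and the dictionary layout are hypothetical, and the per-access factors must come from measurement or a trained model.

```python
def memory_energy(access_counts, factor_per_access):
    """Sum of (accesses to a level) * (per-access factor of that level).

    access_counts: {level_name: number_of_accesses}
    factor_per_access: {level_name: cost of one access, in any consistent unit}
    """
    missing = set(access_counts) - set(factor_per_access)
    if missing:
        raise KeyError(f"no per-access factor for: {sorted(missing)}")
    return sum(
        count * factor_per_access[level]
        for level, count in access_counts.items()
    )
```

The access counts per level depend on hit and miss behavior, so the estimate is only as good as the cache model that supplies them.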

Page 12

Hardware Model Stage

[Diagram blocks: Application; Arch-Independent Metrics; Energy Model; Hardware Model, containing Memory Hierarchy Model, ISA Translation Model, Memory Access Characterization, and Workload Characterization; Hardware Performance Counters + RAPL; Regression Based Performance Model; Oracle hardware modeling. Label: model feedback.]

Page 13

Training Power Model

•  Collect power numbers using RAPL
   – Read hardware performance counters and memory power consumption values
   – Integrate RAPL calls from LLVM
•  Regression based power modeling
   – Eiger

Andrew Kerr, Eric Anger, Gilbert Hendry, and Sudhakar Yalamanchili. “Eiger: A framework for the automated synthesis of statistical performance models,” in High Performance Computing, 2012.

RAPL, https://01.org/blogs/tlcounts/2014/running-average-power-limit-%E2%80%93-rapl
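Regression-based power modeling can be illustrated with ordinary least squares: fit measured power as a linear function of counter values, then predict power for unseen counter readings. The sketch below is a generic pure-Python fit and is not Eiger's API or its modeling method; `fit_linear_power_model` and `predict_power` are hypothetical names.

```python
def fit_linear_power_model(samples, power_watts):
    """Least-squares fit of power ~ w[0] + sum_j w[j + 1] * samples[i][j]."""
    rows = [[1.0] + [float(x) for x in s] for s in samples]
    n = len(rows[0])
    # Normal equations: (X^T X) w = X^T y.
    a = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    b = [sum(r[i] * y for r, y in zip(rows, power_watts)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("counters are linearly dependent; cannot fit")
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= f * a[col][c]
            b[r] -= f * b[col]
    # Back substitution on the upper-triangular system.
    w = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(a[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (b[i] - tail) / a[i][i]
    return w


def predict_power(w, counters):
    """Evaluate the fitted model for one vector of counter values."""
    return w[0] + sum(coef * x for coef, x in zip(w[1:], counters))
```

In practice the counters would be normalized (for example, per second) and the fit validated on held-out runs; a statistics library would normally replace the hand-written solver.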

Page 14

Eiger Framework

•  Manipulations of data → analyze relationships
   – Aid in analysis and verification of model behavior
•  Ease model exploration
•  Extensible to new modeling techniques

[Diagram: four stages (Measurement, Analysis, Model Construction, Reporting and Export) around the Eiger Database. Blocks: Simulator; Measurement API (lwperf); Empirical Data; Raw Data; PCA, Clustering, etc.; Analysis Results; Training Data; Regression, Model Estimation; Model Parameters; Analysis and Modeling Reports; SST Interface; Model.]

Page 15

Framework

[Diagram blocks: Application; Arch-Independent Metrics; Energy Model; Application Energy Profiler; Macro SST Simulator; Hardware Model. Labels: hardware parameters, model training.]

Page 16

Application Skeletons

Simplified code which (approximately) reproduces some behavior of interest for a full application

Example - Communication Skeletons: Capture only control flow and communication

[Diagram: skeleton code issuing MPI_scatter, MPI_send, MPI_recv, MPI_gather, and MPI_send, run under SST/macro network simulation (<sstmac/sstmpi.h>). Annotations: "Replace with a model of execution time" (Eiger Framework); "will expand for energy".]
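The idea of a communication skeleton can be sketched without any simulator: keep the scatter, compute, and gather structure, and substitute models for the expensive parts. The toy estimate below does not use the SST/macro API. It assumes that the part replaced by an execution-time model is the per-rank computation, and that communication follows a simple latency-plus-bandwidth cost with the root handling one chunk at a time. All names and default values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Network:
    """Latency/bandwidth cost model for one point-to-point message."""
    latency_s: float
    bandwidth_bytes_per_s: float

    def transfer_time(self, nbytes):
        return self.latency_s + nbytes / self.bandwidth_bytes_per_s


def compute_model(n_elements, seconds_per_element=1e-8):
    """Stand-in for a fitted execution-time model of the real kernel."""
    return n_elements * seconds_per_element


def skeleton_time(total_elements, ranks, bytes_per_element, net):
    """Estimated time of a scatter -> compute -> gather skeleton."""
    chunk = total_elements // ranks
    message = net.transfer_time(chunk * bytes_per_element)
    scatter = (ranks - 1) * message   # root sends one chunk to each other rank
    compute = compute_model(chunk)    # ranks compute in parallel
    gather = (ranks - 1) * message    # root receives one chunk from each rank
    return scatter + compute + gather
```

Swapping `compute_model` for an energy model with the same shape is one way the skeleton approach could extend from time to energy.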

Page 17

Macro-scale Simulation Structure*

Instances of application skeletons

Nodes model:
•  Multithreading
•  Accelerators
•  NIC effects/contention

Network switches model:
•  Packet arbitration
•  Adaptive routing
•  Queuing/buffering

Messages modeled as:
•  Flows
•  Packets
•  Packet trains

Region-specific energy models

System-level estimates

[Diagram: nodes attached to a network of switches.]

*SST SNL
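Going from region-specific energy models to a system-level estimate amounts to evaluating each region's model on each node and summing. The sketch below assumes that every region has a callable model mapping an amount of work to energy; `system_energy` and the data layout are hypothetical.

```python
def system_energy(node_region_work, region_models):
    """Sum region-specific energy estimates over all nodes.

    node_region_work: list with one {region_name: work_units} dict per node.
    region_models: {region_name: callable(work_units) -> energy}.
    """
    total = 0.0
    for regions in node_region_work:
        for name, work in regions.items():
            total += region_models[name](work)
    return total
```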

Page 18

Application-level Energy Profiler

•  Source code level energy profiler
   – Function level, instruction level
•  Future work will construct more high-level analysis
   – e.g., analysis of TLP, synchronization overhead, data movement

Page 19

Future Work

•  More coding, modeling, ...
   – All model components need to be improved/integrated
   – Validations
   – Large scale simulations
•  More high-level application level energy models

Page 20

Summary Q&A-I

•  Major contributions of the work
   – Integrated performance/power model starting from application level
•  What are the gaps in the research area
   – Providing feedback to high-level applications from low-level application modeling
•  What major opportunities
   – The framework can provide tools for application developers to optimize their applications and for hardware developers to simulate large scale applications

Page 21

Summary Q&A-II

•  What is the one thing that would make it easier/possible to leverage/use the results of other projects to further your own research
   – Faster cache modeling
•  What would you most like to see solved/addressed other than what they are working on?
   – Low-overhead DRAM (memory technology) specific models
   – Hardware performance counters to measure detailed DRAM access behaviors

Page 22

THANK YOU!