ICE: A General and Validated Energy Complexity Model for Multithreaded Algorithms Vi Tran, Phuong Ha Department of Computer Science, UiT The Arctic University of Norway The 2nd EXCESS workshop, Aug. 26, 2016
Motivation – Energy Complexity Models
Time complexity models contribute to
Analysis and development of performance-efficient algorithms
Energy complexity models are crucial to
Understand the energy consumption of algorithms
Improve energy efficiency of algorithms
Reduce energy consumption of computing systems
Energy complexity models must be
Applicable to both sequential and multi-threaded algorithms
Aware of both algorithm and platform characteristics
Not only theoretical, but also validated on real platforms and application kernels
Motivation – ICE Complexity Model
Differences between the ICE (Ideal Cache Energy) complexity model and existing energy models
Contributions
Devise a new ICE model to answer the question:
Given two parallel algorithms A and B for a given problem,
which algorithm consumes less energy analytically?
Conduct two case studies to apply the ICE model to
data-intensive algorithms
computation-intensive algorithms
Validate the ICE model
with different algorithms using different input types on different HPC platforms
Results: the energy comparisons from the ICE model match the experimental data in 100% of cases
The ICE complexity model does not provide an absolute estimate of energy consumption
Outline
Motivation
Contributions
Shared Memory Machine Model
ICE: Energy Complexity Models
A Case Study of Energy Complexity – SpMV
A Case Study of Energy Complexity – Matmul
ICE Model Validation
Conclusion
Shared Memory Machine Model
The energy consumption of a parallel algorithm:
Energy for memory accesses is analysed based on I/O complexity
There are two available I/O models for parallel algorithms
Model | Approach | Limitation
PEM (Parallel External Memory) | N cores access n blocks simultaneously, I/O complexity = O(1) | Suitable for time complexity, rather than energy complexity
IDC (Ideal Distributed Cache) | N cores access n blocks simultaneously, I/O complexity = O(n) | Only applicable for divide-and-conquer algorithms
Traditional IC (Ideal Cache) | I/O complexity = O(n); find an upper bound on I/O complexity | Applicable for both sequential and multithreaded algorithms
ICE Complexity Model - Parameters
The ICE model considers both machine and algorithm characteristics
ICE Complexity Model – Compute-Bound
The energy consumption of a parallel algorithm is the sum of:
the static (or leakage) energy
the dynamic energy of computation
the dynamic energy of memory accesses
If an algorithm is compute-bound
ICE Complexity Model – Memory-Bound
If an application is memory-bound:
ICE Complexity Model
If an application is compute-bound:
If an application is memory-bound:
Where
Platform Parameters
We provide the parameter values for 11 recent HPC platforms
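As a rough illustration of how platform parameters and algorithm complexities can combine into an energy estimate, the sketch below adds static energy over an estimated running time to the dynamic energy of operations and memory transfers. It is a simplified sketch, not the exact ICE formulation. The `Platform` fields, the function name, the running-time bounds and all numeric values are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Platform:
    """Hypothetical platform parameters (illustrative, not measured values)."""
    static_power: float   # watts drawn regardless of activity
    energy_per_op: float  # joules per arithmetic operation
    energy_per_io: float  # joules per cache-line transfer
    time_per_op: float    # seconds per operation on one core
    time_per_io: float    # seconds per cache-line transfer
    cores: int


def estimate_energy(work, span, io, p, compute_bound):
    """Static energy over the running time plus dynamic energy of ops and I/O."""
    if compute_bound:
        # Assumption: greedy-scheduler bound on the parallel running time.
        time = (work / p.cores + span) * p.time_per_op
    else:
        # Assumption: running time dominated by evenly shared memory transfers.
        time = (io / p.cores) * p.time_per_io
    static = p.static_power * time
    dynamic_comp = p.energy_per_op * work
    dynamic_mem = p.energy_per_io * io
    return static + dynamic_comp + dynamic_mem


# Made-up parameter values, chosen only to keep the arithmetic easy to follow.
demo = Platform(static_power=2.0, energy_per_op=1.0, energy_per_io=10.0,
                time_per_op=1.0, time_per_io=4.0, cores=4)
e_compute = estimate_energy(work=100, span=5, io=8, p=demo, compute_bound=True)
e_memory = estimate_energy(work=100, span=5, io=8, p=demo, compute_bound=False)
```

Because the model is meant for comparing two algorithms rather than predicting absolute energy, such a function would normally be evaluated for both candidates on the same platform and the results compared as a ratio.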
Case Studies - SpMV Energy Complexity
SpMV energy complexity:
Analyse the work, I/O and span complexity of
Compressed Sparse Column (CSC)
Compressed Sparse Block (CSB)
Compressed Sparse Row (CSR)
Case Studies – CSC-SpMV
Compressed Sparse Column
SpMV is a memory-bound algorithm
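To make the CSC layout concrete, here is a minimal sequential sketch of SpMV over a matrix stored in Compressed Sparse Column form. The array names (`col_ptr`, `row_idx`, `values`) follow common usage and are assumptions for this example, not notation taken from the slides. Each nonzero costs one multiply-add but touches three arrays plus an irregular position in `y`, which is one way to see why the kernel tends to be memory-bound.

```python
def csc_spmv(n_rows, col_ptr, row_idx, values, x):
    """Compute y = A @ x for A stored in Compressed Sparse Column form."""
    y = [0.0] * n_rows
    for j in range(len(col_ptr) - 1):        # one pass per column
        xj = x[j]
        for k in range(col_ptr[j], col_ptr[j + 1]):
            y[row_idx[k]] += values[k] * xj  # scattered write into y
    return y


# Example matrix (illustrative):
# A = [[1, 0, 2],
#      [0, 3, 0],
#      [4, 0, 5]]
col_ptr = [0, 2, 3, 5]               # column j owns entries col_ptr[j]:col_ptr[j+1]
row_idx = [0, 2, 1, 0, 2]            # row index of each stored nonzero
values = [1.0, 4.0, 3.0, 2.0, 5.0]   # nonzeros in column-major order
y = csc_spmv(3, col_ptr, row_idx, values, [1.0, 2.0, 3.0])
```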
Case Studies – CSB-SpMV
SpMV is a memory-bound algorithm
Compressed Sparse Block
Case Study - Dense Matrix Multiplication (Matmul)
Matmul: A[n][m] * B[m][p] = C[n][p]
Simple Matmul (Simple-Matmul): a 3-loop implementation of Matmul
Cache-oblivious Matmul (CO-Matmul): a recursive Matmul. At each step,
If n >= max (m, p)
If m >= max (n, p)
If p >= max (n, m)
[Figure: A is split into A1 and A2; A1 x B = C1 and A2 x B = C2, which together form C = A x B]
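The two Matmul variants can be sketched as follows. The three-loop version follows the definition directly. The recursive version halves whichever of n, m, p is largest, which is the usual cache-oblivious scheme; the exact splitting used in each of the three cases is an assumption here, as are the function names. The list slices copy data, so the sketch illustrates the recursion order only, not the memory behaviour of an in-place implementation.

```python
def simple_matmul(A, B):
    """Three-loop Matmul: C[n][p] = A[n][m] * B[m][p]."""
    n, m, p = len(A), len(B), len(B[0])
    C = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            for k in range(m):
                C[i][j] += A[i][k] * B[k][j]
    return C


def co_matmul(A, B):
    """Recursive Matmul that halves the largest of n, m, p at each step."""
    n, m, p = len(A), len(B), len(B[0])
    if max(n, m, p) == 1:
        return [[A[0][0] * B[0][0]]]
    if n >= max(m, p):
        # Split the rows of A: C is C1 stacked on top of C2.
        h = n // 2
        return co_matmul(A[:h], B) + co_matmul(A[h:], B)
    if m >= max(n, p):
        # Split the shared dimension: C = A1 * B1 + A2 * B2.
        h = m // 2
        C1 = co_matmul([row[:h] for row in A], B[:h])
        C2 = co_matmul([row[h:] for row in A], B[h:])
        return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(C1, C2)]
    # Otherwise p is the largest: split the columns of B, C = [C1 C2].
    h = p // 2
    C1 = co_matmul(A, [row[:h] for row in B])
    C2 = co_matmul(A, [row[h:] for row in B])
    return [r1 + r2 for r1, r2 in zip(C1, C2)]


A = [[1, 2, 3], [4, 5, 6]]
B = [[7, 8], [9, 10], [11, 12]]
C_simple = simple_matmul(A, B)
C_co = co_matmul(A, B)
```

Both functions perform the same arithmetic work; they differ in the order in which data is visited, which is what the I/O complexity analysis captures.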
Matmul Complexity Analysis
Matmul is a compute-bound algorithm
Simple-Matmul Energy Complexity
Matmul complexity analysis
CO-Matmul Energy Complexity
Matmul complexity analysis
Model Validation
The ICE model objective is to answer the question:
Given two parallel algorithms A and B for a given problem,
which algorithm consumes less energy analytically?
Validate the ICE model with different SpMV and Matmul algorithms
Validate the ICE model with different input types
Sparse matrices (a subset of the Florida sparse matrix collection): varied matrix sizes (n, m) and varied patterns (nz, nc)
Dense matrices: varied matrix sizes (n, m)
Validate the ICE model with experimental data on 2 HPC platforms (Xeon and Xeon Phi)
Model Validation – Expected Results
Compute the energy consumption ratio of
two SpMV algorithms (i.e., CSC-SpMV and CSB-SpMV) and
two Matmul algorithms (i.e., Simple-Matmul and CO-Matmul)
Expected results: the energy comparison given by the model matches the one given by the experimental data
The energy ratio of CSC-SpMV to CSB-SpMV falls on the same side of 1 (greater or less) in both the model and the experimental data
The energy ratio of Simple-Matmul to CO-Matmul falls on the same side of 1 (greater or less) in both the model and the experimental data
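The matching criterion above can be sketched as a small function: for each test case, compare the model's energy ratio and the measured energy ratio against 1, and count the cases where both agree on which algorithm is cheaper. The function name and the sample ratios are hypothetical and serve only as illustration; they are not results from the study.

```python
def match_percentage(model_ratios, measured_ratios):
    """Share of cases where model and measurement pick the same cheaper algorithm."""
    if len(model_ratios) != len(measured_ratios) or not model_ratios:
        raise ValueError("need two non-empty sequences of equal length")
    # A case matches when both ratios lie on the same side of 1.
    # A ratio of exactly 1 is treated as "not greater than 1".
    matches = sum(
        (m > 1.0) == (e > 1.0)
        for m, e in zip(model_ratios, measured_ratios)
    )
    return 100.0 * matches / len(model_ratios)


# Hypothetical ratios of algorithm-A energy to algorithm-B energy.
model = [1.8, 0.7, 1.2, 0.9]
measured = [2.5, 0.9, 1.05, 1.1]
pct = match_percentage(model, measured)
```

Only the side of 1 matters for this criterion, so a model that misjudges the size of the gap but ranks the two algorithms correctly still counts as a match.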
ICE Model Validation - SpMV
Energy consumption ratio of CSC to CSB SpMV on Xeon and Xeon Phi
Match percentage of energy comparison for CSC and CSB-SpMV is 100%
ICE Model Validation - Matmul
Energy consumption ratio of Simple-Matmul to CO-Matmul on Xeon and Xeon Phi
Match percentage of energy comparison for Simple-Matmul and CO-Matmul is 100%
Conclusion - Energy/Power Model Studies
Devise a new energy complexity model (ICE) for general and multi-threaded algorithms
Analyse algorithms by their work, span and I/O complexity
Propose using the Ideal Cache memory model to analyse I/O complexity in the ICE model
Consider the static and dynamic energy of computation and memory accesses as platform parameters
Propose a new way to analyse I/O complexity in an energy complexity model
Conduct two case studies (i.e., SpMV and Matmul) to demonstrate how to use the ICE model
Conduct experimental studies to validate the ICE model:
For data-intensive and computation-intensive algorithms
With different input matrix types and sizes on two HPC platforms (i.e., Intel Xeon and Xeon Phi)
The energy comparisons of the two given algorithms match the experimental data in 100% of cases