Page 1:

ADVANCED SCIENTIFIC COMPUTING

Dr.-Ing. Morris Riedel
Adjunct Associated Professor
School of Engineering and Natural Sciences, University of Iceland
Research Group Leader, Juelich Supercomputing Centre, Germany

Parallelization Fundamentals
August 29, 2017
Room TG-227

High Performance Computing

LECTURE 2

Page 2:

Review of Lecture 1 – High Performance Computing

(diagram from Lecture 1: Theory (e.g. known physical laws), Technology, Architecture, and Software; Lecture 2 – Parallelization Fundamentals builds on these)

[1] LLView Tool
[2] Distributed & Cloud Computing Book
[3] Introduction to High Performance Computing for Scientists and Engineers

Page 3:

Outline of the Course

1. High Performance Computing

2. Parallelization Fundamentals

3. Parallel Programming with MPI

4. Advanced MPI Techniques

5. Parallel Algorithms & Data Structures

6. Parallel Programming with OpenMP

7. Hybrid Programming & Patterns

8. Debugging & Profiling Techniques

9. Performance Optimization & Tools

10. Scalable HPC Infrastructures & GPUs


11. Scientific Visualization & Steering

12. Terrestrial Systems & Climate

13. Systems Biology & Bioinformatics

14. Molecular Systems & Libraries

15. Computational Fluid Dynamics

16. Finite Elements Method

17. Machine Learning & Data Mining

18. Epilogue

+ additional practical lectures for our hands-on exercises in context

Page 4:

Outline

Common Strategies for Parallelization
- Simple Parallel Computing Examples
- Parallelization Methods Overview
- Domain Decomposition & Halo/Ghost Layer
- Data Parallelism Methods
- Functional Parallelism Methods

Parallelization Terminology
- Moore’s Law & Parallelization Reasons
- Speedup & Load Imbalance
- Parallelization Goals & Challenges
- Fast & Scalable Applications
- High Performance & Analysis


Promises from previous lecture(s):

Lecture 1: Lecture 2 will give in-depth details on parallelization fundamentals & performance relationships

Page 5:

Common Strategies for Parallelization

Page 6:

Parallel Computing (cf. Lecture 1)

All modern supercomputers depend heavily on parallelism

- Often known as ‘parallel processing’ of some problem space
- Tackle problems in parallel to enable the ‘best performance’ possible

‘The measure of speed’ in High Performance Computing matters
- A common measure for parallel computers is established by the TOP500 list
- Based on a benchmark for ranking the best 500 computers worldwide


We speak of parallel computing whenever a number of ‘compute elements’ (e.g. cores) solve a problem in a cooperative way

[3] Introduction to High Performance Computing for Scientists and Engineers

[4] TOP 500 supercomputing sites

Page 7:

Simple Parallel Computing Example on Multi-Core CPUs

1. Think how the data elements can be divided onto CPUs/cores

2. Think what each CPU/core should do

Example: Find the largest (maximum) element in an array


(diagram: a 16-element array, indices 0–15, is divided into four equal chunks across CPU/core 1–4; each core computes a local maximum Max-local A–D)

Max-global = Max(Max-local A, B, C, D)

[2] Distributed & Cloud Computing Book
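This divide-and-combine pattern maps directly onto shared-memory threads. Below is a minimal OpenMP sketch in C (not from the slides; the array contents are illustrative) where each thread computes a local maximum over its chunk of the iterations and the reduction combines the local maxima into the global one:

```c
#include <stdio.h>

int main(void) {
    int a[16] = {3, 7, 1, 12, 5, 9, 2, 15, 8, 4, 11, 6, 0, 14, 10, 13};
    int max_global = a[0];

    /* Each thread scans its share of the iterations with a private copy
       of max_global; reduction(max:...) combines the local maxima. */
    #pragma omp parallel for reduction(max:max_global)
    for (int i = 0; i < 16; i++) {
        if (a[i] > max_global) max_global = a[i];
    }

    printf("Max-global = %d\n", max_global);
    return 0;
}
```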

Page 8:

Parallel Matrix-Vector Multiplication Example on GPUs

P0 – P3 are processes on four GPU cores


Step one: each GPU core gets a column of matrix B (named Bpart) and the corresponding element of column vector C (named Cpart)

Step two: each GPU core performs an independent vector-scalar multiplication (based on its Bpart and Cpart contents)

Step three: each GPU core holds a part of the result vector A (named Apart), which is written to device memory

(nice parallelization possible via independent computing)

[2] Distributed & Cloud Computing Book

(GPUs are designed to compute large numbers of floating point operations in parallel)
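A minimal sketch of this column-wise decomposition in plain C (illustrative values, not from the slides; on a real GPU each core would write its own Apart and the partial vectors would then be summed in device memory):

```c
#include <stdio.h>

enum { N = 4 }; /* four columns, conceptually one per GPU core P0..P3 */

int main(void) {
    double B[N][N] = {{1,2,3,4},{5,6,7,8},{9,10,11,12},{13,14,15,16}};
    double C[N]    = {1, 1, 1, 1};
    double A[N]    = {0};

    /* Step two: 'core' p multiplies its column Bpart = B[.][p] by its
       scalar Cpart = C[p] -- independent vector-scalar multiplications.
       Step three: the partial result vectors Apart are accumulated into A
       (in a parallel run, each core keeps its own Apart before the sum). */
    for (int p = 0; p < N; p++)
        for (int i = 0; i < N; i++)
            A[i] += B[i][p] * C[p];

    for (int i = 0; i < N; i++)
        printf("A[%d] = %g\n", i, A[i]);
    return 0;
}
```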

Page 9:

Parallelization Methods & Domain Decomposition

Data Parallelism

Functional Parallelism


[5] 2013 SMU HPC Summer Workshop
[6] Parallel Computing Tutorial

(different forms of domain decomposition methods)

Page 10:

Parallelization Methods in Detail

Data Parallelism (aka SPMD)
- N processors/cores work on ‘different parts of the data’
- E.g. medium-grained loop parallelization
- E.g. domain decomposition

Functional Parallelism (aka MPMD)
- N processors/cores work on ‘different sub-tasks’ of the problem
- Processors/cores work jointly together by exchanging data and doing synchronization
- E.g. master-worker scheme
- E.g. functional decomposition

In the Single Program Multiple Data (SPMD) paradigm each processor executes the same ‘code’ but with different data

In the Multiple Program Multiple Data (MPMD) paradigm each processor executes different ‘code’ with different data


Lectures 12-17 will provide details on applied parallelization methods within parallel applications

Page 11:

Data Parallelism: Medium-grained Loop Parallelization

Idea: Computations performed on individual array elements are independent of each other
- Good for parallel execution by N processors (e.g. using shared memory)

Lecture 6 about OpenMP will include ‘data parallelism on loops’ methods that are useful here

(c is a constant; a, b are different arrays)

(figure: the serial loop takes time t1; the parallelized loop takes t2 < t1)

Modified from [3] Introduction to High Performance Computing for Scientists and Engineers
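A minimal OpenMP sketch of such medium-grained loop parallelization (the exact loop body on the slide is not recoverable here, so an assumed update a[i] = a[i] + c * b[i] stands in for it):

```c
#include <stdio.h>

#define N 8

int main(void) {
    double a[N] = {0}, b[N] = {1, 2, 3, 4, 5, 6, 7, 8};
    const double c = 2.0; /* c is a constant; a, b are different arrays */

    /* Every iteration touches independent array elements, so the loop
       can be split across N processors without synchronization. */
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        a[i] = a[i] + c * b[i];

    for (int i = 0; i < N; i++)
        printf("a[%d] = %g\n", i, a[i]);
    return 0;
}
```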

Page 12:

Data Parallelism: Domain Decomposition

Idea: Simplified picture of reality with a ‘computational domain’ represented as a ‘grid’ (rather coarse-grained) or a ‘mesh’
- Grids define discrete positions for the physical quantities of the complete domain
- Grids are not always Cartesian and are often adapted to the numerical constraints of a certain algorithm in question
- The supercomputer then simulates the reality with observables (e.g. certain physical variables) on this grid using N processors

Work distribution: assign N parts of the grid to N processors

In parallel computing a grid distribution can be related to solving variables in linear equations (or finding best estimates of values)

Page 13:

Grid vs. Lattice Approach


[7] Map Analysis - Understanding Spatial Patterns and Relationships, Book

Page 14:

Terrestrial Systems – Example Towards Realistic Simulations

Scientific computing with HPC simulates ‘~realistic behaviour’
- Apply common patterns over time & simulate based on numerical methods
- Increasing granularity (e.g. domain decomposition) needs more computing


(introduce more and more physical parameters over time…)

(compute more physical laws…)

(add scientific domain studies: e.g. rainfall, ocean waves, wind, oil, storms… )

(add objects to study: boats, fish, birds, people, oil platform, …)

Lecture 12 about Terrestrial Systems will provide more details on domain decomposition aspects

Page 15:

Data Parallelism: Domain Decomposition & Application

Parallelizing a two-dimensional Jacobi solver
- The Jacobi method is a known ‘iterative method’ in numerical simulations (iterative: step by step closer to the solution with approximations)
- Application example: heat dissipation & heatmap


[8] Templates for the Solution of Linear Systems
[9] YouTube, Heat Dissipation Jacobi Method

Page 16:

Data Parallelism: Formulas Across Domain Decomposition

From the problem to computational data structures
- Apply an ‘isotropic lattice’ technique


The term isotropic lattice is derived from ‘isotropy’, which stands for uniformity in all orientations

[3] Introduction to High Performance Computing for Scientists and Engineers

The diffusion equation describes the ‘change over time’. On the discrete grid, index i runs along the x direction and index k along the y direction, giving the isotropic five-point stencil update

$$\phi_{i,k}^{t+1} = \frac{1}{4}\left(\phi_{i+1,k}^{t} + \phi_{i-1,k}^{t} + \phi_{i,k+1}^{t} + \phi_{i,k-1}^{t}\right)$$

[10] Wikipedia on ‘stencil code’
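A minimal serial sketch of one sweep of this stencil in C (grid size and array names are illustrative), updating every interior point from the four neighbours of the previous time step:

```c
enum { NX = 64, NY = 64 };

/* One Jacobi sweep: compute time step t1 from time step t0 using the
   isotropic 5-point stencil; boundary points are left untouched. */
void jacobi_sweep(double t0[NX][NY], double t1[NX][NY]) {
    for (int i = 1; i < NX - 1; i++)
        for (int k = 1; k < NY - 1; k++)
            t1[i][k] = 0.25 * (t0[i+1][k] + t0[i-1][k]
                             + t0[i][k+1] + t0[i][k-1]);
}
```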

Page 17:

Data Parallelism: Domain Decomposition & Equations

Example: Parallelizing a two-dimensional Jacobi solver
- The Jacobi method is a known ‘iterative method’ in numerical simulations (iterative: step by step closer to the solution with approximations)
- Solves n linear equations with n unknown variables, given diagonal dominance
- Pick start values and iterate towards a final solution (reducing the error per step)
- Goal: update physical variables on an ‘N x N grid’ until the approximation is good enough (maybe only 97% of the exact solution, but sufficient & obtained in shorter time)
- Domain decomposition for N processors subdivides the computational domain into N subdomains

[3] Introduction to High Performance Computing for Scientists and Engineers


Find (approximate) values for the k and i update arrays

In each time step (e.g. T1), values from the previous iteration (e.g. T0) are re-used

Page 18:

Data Parallelism: Domain Decomposition & Halo/Ghost

Two-dimensional Jacobi solver in the context of parallel systems:

Shared memory, with the complete domain fitting into memory
- Relatively easy: all grid sites in all domains can be updated before the processors have to synchronize at the end of the sweep (i.e. time step)

Distributed memory, with no access to the ‘neighbours’ memory’
- Complex: updating the boundary sites of one domain requires data from adjacent domain(s)
- Idea: before a domain update (next step), all boundary values needed for the upcoming sweep must be communicated to the relevant neighbouring domains
- We need to store this data somewhere, so extra grid points are introduced (halo/ghost layers)


[3] Introduction to High Performance Computing for Scientists and Engineers

(boundary)

(halo / ghost)
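A minimal MPI sketch of such a halo exchange for a stripe-wise (1D) decomposition (function and variable names are illustrative, not from the slides): each process sends its outermost real rows to its neighbours and receives their boundary rows into its own halo/ghost rows.

```c
#include <mpi.h>

/* 'local' holds nrows real rows (indices 1..nrows) of ncols values each,
   plus ghost rows 0 and nrows+1; 'up'/'down' are the neighbour ranks
   (MPI_PROC_NULL at the physical domain boundary). */
void halo_exchange(double *local, int ncols, int nrows,
                   int up, int down, MPI_Comm comm) {
    /* send top real row up, receive bottom ghost row from below */
    MPI_Sendrecv(&local[1 * ncols],           ncols, MPI_DOUBLE, up,   0,
                 &local[(nrows + 1) * ncols], ncols, MPI_DOUBLE, down, 0,
                 comm, MPI_STATUS_IGNORE);
    /* send bottom real row down, receive top ghost row from above */
    MPI_Sendrecv(&local[nrows * ncols],       ncols, MPI_DOUBLE, down, 1,
                 &local[0],                   ncols, MPI_DOUBLE, up,   1,
                 comm, MPI_STATUS_IGNORE);
}
```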

Page 19:

Data Parallelism: Domain Decomposition & Communication

Two-dimensional Jacobi solver in the context of communication cost:
- Choosing the optimal domain decomposition is often application-specific
- Next-neighbour interactions are needed and can vary (more/less shaded cells)
- Simple: cutting into four stripe domains (left) incurs more communication
- Optimal decomposition: four block domains (right) incurs less communication

[3] Introduction to High Performance Computing for Scientists and Engineers


Stripes: 3 cuts × 16 boundary sites = 48; blocks: 4 cuts × 8 boundary sites = 32

Lecture 7 will provide more details on the 2D Jacobi application example and stencil methods

Page 20:

Functional Parallelism: Master-Worker Scheme

Idea: One processor performs administrative tasks while the others jointly solve a particular problem

Master
- Distributes work and collects results from workers
- Could be a single bottleneck

N Workers (formerly called ‘slaves’)
- Whenever a worker has finished a package, it stops or requests a new task from the master, depending on the application

Example: Find the largest element in an array
- Which CPU/core does the global max?

(diagram: a master process distributing tasks to N workers P1–P4 and collecting their results)
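A minimal MPI sketch of the array-maximum example in this spirit (illustrative, not from the slides; here rank 0 coordinates but also computes a chunk itself, and MPI_Reduce answers the question of who does the global max, namely the master):

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* master's data; assumes 16 is divisible by the number of processes */
    int data[16] = {3,7,1,12,5,9,2,15,8,4,11,6,0,14,10,13};
    int chunk = 16 / size, part[16], local_max, global_max;

    /* master distributes one chunk of the array to every process */
    MPI_Scatter(data, chunk, MPI_INT, part, chunk, MPI_INT, 0, MPI_COMM_WORLD);

    local_max = part[0];
    for (int i = 1; i < chunk; i++)
        if (part[i] > local_max) local_max = part[i];

    /* master collects the local maxima; MPI_MAX picks the global one */
    MPI_Reduce(&local_max, &global_max, 1, MPI_INT, MPI_MAX, 0, MPI_COMM_WORLD);

    if (rank == 0) printf("Max-global = %d\n", global_max);
    MPI_Finalize();
    return 0;
}
```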

Page 21:

Functional Parallelism: Functional Decomposition

Idea: Couple different running codes in order to compute functions that jointly solve a higher-level problem

Example: Multi-physics simulation of a race car (multi-physics problems are gaining popularity since they reflect reality better)
- Air flow around the race car is computed with a Computational Fluid Dynamics (CFD) code
- A parallel finite element simulation could describe the reaction of the flexible structures of the car body to the computed air flow (involves accurate geometry and material properties in context)
- Both codes need to be coupled with a communication layer

Some processors compute the whole airflow

Other processors compute the reaction of the car structures (eventually trying out different materials)

Both are coupled (doing this efficiently is not so easy)

Modified from [11] Caterham F1 Team Races Past Competition with HPC

Page 22:

[Video] PEPC – Particle Acceleration Application


[12] PEPC Video Application Example

Page 23:

Parallelization Terminology

Page 24:

Parallelization in High Performance Computing

Parallelization in HPC is essential due to the following capabilities: performing calculations, visualizations, and data processing…
- … at an incredible, ever-increasing speed
- … at an unprecedented granularity and/or accuracy


[13] JSC HPC Visualization Team

HPC uses parallel computing in order to tackle problems & increase insights

HPC can perform virtual experiments that are too dangerous or too expensive

HPC enables simulation of real-world phenomena not possible otherwise

HPC automates recurring processing of large quantities of data or many equations

Page 25:

Moore’s Law


Moore’s Law says that the number of transistors on integrated circuits doubles approximately every two years (exponential growth)

[14] Wikipedia ‘Moore’s Law’

(the last seven dots are actually many-core GPGPUs, cf. Lecture 1)

Page 26:

Reasons for Parallelization

The concept of ‘parallelization’ is getting more mainstream today
- Supercomputers (which are massively parallel computers today)
- Multi-core PCs and laptops (with an increasing number of cores: 2x, 4x, etc.)
- Many-core GPUs, not only used for graphics but also for general processing

Two major reasons to engage in parallelization

The reason influences the chosen ‘parallelization method(s)’, e.g. SPMD or MPMD

A single core is too slow to perform the required task(s) in a certain constrained amount of time

The available memory on a single system is not sufficientto tackle a problem in a required granularity or precision.

Derived from [3] Introduction to High Performance Computing for Scientists and Engineers

Page 27:

Parallelization Goal: Speedup Term

Consider a simple situation: all processing units execute their assigned work in exactly the same amount of time
- Solving a problem would take time T sequentially (essentially 1 worker)
- Having N workers ideally solves the problem in only T/N
- This is a speedup of N

Modified from [3] Introduction to High Performance Computing for Scientists and Engineers

(figure: W = 12 ‘timesteps’ of work, N = 3 workers; speedup: W/N = 12/3 = 4 ‘timesteps’)
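In the usual notation (with $T(1)$ the sequential runtime and $T(N)$ the runtime with $N$ workers), this ideal case reads:

$$S(N) = \frac{T(1)}{T(N)}, \qquad S_{\text{ideal}}(N) = \frac{T(1)}{T(1)/N} = N$$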

Page 28:

Parallelization Challenge: Load Imbalance Term

Consider a more realistic situation: not all workers might execute their tasks in the same amount of time
- Reason: the problem simply cannot be properly partitioned into pieces of equal complexity
- Nearly worst case: all but a few have nothing to do but wait for the latecomers to arrive (because of different execution times)


Modified from [3] Introduction to High Performance Computing for Scientists and Engineers

(figure: unused resources)

Load imbalance hampers performance, because some resources are underutilized

Page 29:

Load Imbalance Example

Parallel programming problems
- Wrong assumptions in distributed-memory programming
- Cost and side effects of the programmed communications


General problems
- Serial execution limits
- Load imbalance
- Unnecessary synchronizations

(figure: MPI program runtime trace, t = 38 seconds overall, showing ‘parallel performance issues’ and ‘idle resources’)

[3] Introduction to High Performance Computing for Scientists and Engineers

Page 30:

Parallelization Goal: Better Granularity & Accuracy


[15] F. Berman: ‘Maximising the Potential of Research Data’

Page 31:

Parallelization Challenges: Optimal Domain Decompositions

Tree codes – ‘another form of smart domain decomposition’
- E.g. to speed up N-body simulations with long-range interactions


Lecture 5 will provide more details on tree-code algorithms and related data structure designs

[16] PEPC Webpage

Page 32:

Importance of FLOPs in HPC


© Photograph by Rama, Wikimedia Commons, Cc-by-sa-2.0-fr

1,000,000 FLOP/s (~1984)

1,000,000,000,000,000 FLOP/s, ~295,000 cores (~2009, JUGENE)

>5,000,000,000,000,000 FLOP/s, ~500,000 cores (~2016)

Fast and/or high performance means many floating point operations (FLOP) per second

Page 33:

Towards Fast and Scalable Applications

Many factors influence the scalability of an application
- The benefit of smart domain decomposition methods is just one factor
- E.g. PEPC tree code on the whole BlueGene/Q

This raises several questions and challenges
- What does ‘faster’ mean?
- How do we get to an application that is scalable?


[16] PEPC Webpage

Scalability is the ability of a system, network, or process to handle a growing amount of work in a capable manner or its ability to be enlarged to accommodate that growth.

[17] Wikipedia on ‘scalability’
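A standard companion metric to the speedup $S(N)$ from Page 27 (not spelled out on the slide) is the parallel efficiency, which quantifies how capably the $N$ processors are actually used:

$$E(N) = \frac{S(N)}{N} = \frac{T(1)}{N \, T(N)}, \qquad 0 < E(N) \le 1$$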

Page 34:

Performance Analysis is a key field in HPC

Analysis is typically performed using (automated) software tools
- Measure and analyze the runtime behaviour of parallel programs
- Identify potential performance bottlenecks
- Offer performance optimization hints and views of the location in time
- Guide exploring the causes of bottlenecks in communication/synchronization


[18] SCALASCA Performance Tool

Lecture 9 will give details on how to measure performance in parallel programs and related tools

Page 35:

Performance Analysis in Distributed-Memory Programming


[19] VAMPIR Performance Tool

Page 36:

[Video] Parallelization From Theory to Practice


[20] Power! Youtube Video

Page 37:

Lecture Bibliography

Page 38:

Lecture Bibliography (1)

[1] LLView Tool, Online: http://www.fz-juelich.de/ias/jsc/EN/Expertise/Support/Software/LLview/_node.html

[2] K. Hwang, G. C. Fox, J. J. Dongarra, ‘Distributed and Cloud Computing’, Book, Online: http://store.elsevier.com/product.jsp?locale=en_EU&isbn=9780128002049

[3] Introduction to High Performance Computing for Scientists and Engineers, Georg Hager & Gerhard Wellein, Chapman & Hall/CRC Computational Science, ISBN 143981192X

[4] TOP500 Supercomputing Sites, Online: http://www.top500.org/

[5] 2013 SMU HPC Summer Workshop, Session 8: Introduction to Parallel Computing, Online: http://dreynolds.math.smu.edu/SMUHPC_workshop/session_8.html

[6] Introduction to Parallel Computing Tutorial, Online: https://computing.llnl.gov/tutorials/parallel_comp/

[7] Map Analysis, Understanding Spatial Patterns and Relationships, Joseph K. Berry, Online: http://www.innovativegis.com/basis/Books/MapAnalysis/Default.htm

[8] Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods, book, Online: http://www.netlib.org/linalg/html_templates/Templates.html

[9] Jacobi Heat Dissipation, Online: https://www.youtube.com/watch?v=jBbanIGoIhE

[10] Wikipedia on ‘stencil code‘, Online: http://en.wikipedia.org/wiki/Stencil_code

Page 39:

Lecture Bibliography (2)

[11] Caterham F1 Team Races Past Competition with HPC, Online: http://insidehpc.com/2013/08/15/caterham-f1-team-races-past-competition-with-hpc

[12] PEPC Video Application Example, FZ Juelich, Online: http://www.fz-juelich.de/ias/jsc/EN/AboutUs/Organisation/ComputationalScience/Simlabs/slpp/SoftwarePEPC/_node.html

[13] JSC HPC Visualization Team, Online: http://www.fz-juelich.de/ias/jsc/EN/Expertise/Support/Visualization/_node.html

[14] Wikipedia ‘Moore’s Law’, Online: http://en.wikipedia.org/wiki/Moore's_law

[15] Fran Berman, ‘Maximising the Potential of Research Data’

[16] PEPC Webpage, FZ Juelich, Online: http://www.fz-juelich.de/ias/jsc/EN/AboutUs/Organisation/ComputationalScience/Simlabs/slpp/SoftwarePEPC/_node.html

[17] Wikipedia Scalability, Online: http://en.wikipedia.org/wiki/Scalability

[18] Scalasca Performance Analysis Tool, Online: http://www.scalasca.org/

[19] VAMPIR Performance Analysis Tool, Online: http://www.vampir.eu/

[20] Power! | Copyright GeonX 2013, Geon Technologies, Online: http://www.youtube.com/watch?v=nEDOSGC3wFs

Page 40:

Lecture 2 – Parallelization Fundamentals