Multicore Birgit Plötzeneder, 11/24/10

Jun 19, 2015


This talk was given to the TumFUG Linux/Unix-User group at the TU München.

Contact me via [email protected]
Page 1: Multicore

Multicore Birgit Plötzeneder, 11/24/10

Page 2: Multicore

Intro (Why?)

Architecture

Languages: OMP, MPI

Tools

Page 3: Multicore

Darling, I shrunk the computer. *

* copyright by Prof. Erik Hagersten / Uppsala, who does awesome work

Signal propagation delay » transistor delay

Not enough ILP for more transistors

Power consumption

Page 4: Multicore

O RLY?

You want FASTER code. NOW.

- prefetching
- high comp load
- image/video
- fun

Page 5: Multicore
Page 6: Multicore

Intel Core 2 Quad

Page 7: Multicore

AMD Shanghai (K10)

Page 8: Multicore

Intel Dunnington (Xeon 74xx)

Page 9: Multicore

Intel i7

Page 10: Multicore

AMD Magny Cours

Page 11: Multicore

The Secret..

Page 12: Multicore

Moving from 1 core to 4 cores can give you a factor of

Page 13: Multicore

Moving from memory to L1 can give you a factor of

Page 14: Multicore

Disabling the L2 cache will reduce system performance more than disabling a second CPU core of a dual-core processor…

Page 15: Multicore

* see Iris Christadler, LRZ

Page 16: Multicore

OMP and MPI

Page 17: Multicore

OpenMP-Concept

Program start: only master thread runs

Parallel region: team of worker threads is generated (“fork”)

Threads synchronize when leaving parallel region (“join”)
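The fork/join behaviour above can be sketched in C as follows. This is a minimal sketch, not code from the talk: the function name `count_team_threads` is hypothetical, and it assumes a compiler with OpenMP support (for example GCC with `-fopenmp`). Without OpenMP the pragmas are ignored and the function simply returns 1.

```c
/* Hypothetical helper (not from the talk): runs one parallel region and
   reports how many threads took part in it. */
int count_team_threads(void)
{
    int team_size = 0;            /* only the master thread runs here */

    #pragma omp parallel          /* "fork": a team of threads is created */
    {
        #pragma omp atomic        /* every thread adds itself exactly once */
        team_size++;
    }                             /* "join": implicit barrier, team ends */

    return team_size;             /* only the master thread continues */
}
```

The number of threads is usually taken from the environment variable `OMP_NUM_THREADS`, so the returned value depends on the machine and its settings.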

Page 18: Multicore

A First Program

Page 19: Multicore

Work-sharing constructs

omp for (C/C++) or omp do (Fortran)

sections

single

master
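A minimal sketch of two of these constructs, `omp for` and `sections`. The function names `scaled_add` and `min_and_max` are hypothetical and chosen only for illustration; `min_and_max` assumes n >= 1.

```c
/* omp for: the loop iterations are divided among the threads of the team.
   Every iteration touches a different element, so they are independent. */
void scaled_add(double a, const double *x, double *y, int n)
{
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        y[i] = a * x[i] + y[i];
    }
}

/* sections: two unrelated pieces of work, each executed by one thread.
   Each section writes only its own variable, so no locking is needed. */
void min_and_max(const double *x, int n, double *min_out, double *max_out)
{
    double lo = x[0];
    double hi = x[0];

    #pragma omp parallel sections
    {
        #pragma omp section
        {
            for (int i = 1; i < n; i++) if (x[i] < lo) lo = x[i];
        }
        #pragma omp section
        {
            for (int i = 1; i < n; i++) if (x[i] > hi) hi = x[i];
        }
    }

    *min_out = lo;
    *max_out = hi;
}
```

`single` and `master` are used inside a parallel region in the same way as `section`, for a block that only one thread (or only the master thread) should execute.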

Page 20: Multicore

Data sharing attribute clauses

shared: visible and accessible by all threads simultaneously. Default (except for the loop counter i). Beware of loop-carried dependencies such as a[i]=a[i-1].

private: each thread will have a local copy, value is not maintained for use outside

firstprivate: like private except initialized to original value.

lastprivate: like private except original value is updated after construct.

reduction (->reduction ops)
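A minimal sketch of `reduction`, `firstprivate` and `lastprivate`. The function names are hypothetical; `last_shifted_value` assumes n >= 1, because `lastprivate` takes its value from the sequentially last iteration.

```c
/* reduction: every thread sums into its own private copy of total;
   the copies are combined with + when the loop ends. */
long sum_of_squares(int n)
{
    long total = 0;

    #pragma omp parallel for reduction(+:total)
    for (int i = 1; i <= n; i++) {
        total += (long)i * i;
    }
    return total;
}

/* firstprivate: each thread's copy of offset starts with the original value.
   lastprivate: after the loop, last holds the value written by the
   sequentially last iteration (i == n - 1). */
int last_shifted_value(int n, int offset)
{
    int last = -1;

    #pragma omp parallel for firstprivate(offset) lastprivate(last)
    for (int i = 0; i < n; i++) {
        last = i + offset;
    }
    return last;
}
```

For example, `sum_of_squares(10)` returns 385, with or without OpenMP enabled.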

Page 21: Multicore

Scheduling clauses

schedule(type, chunk):

static

dynamic

guided
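The three schedule types differ in how iterations are handed out: `static` assigns chunks to threads up front, `dynamic` lets a thread fetch the next chunk whenever it is idle, and `guided` works like `dynamic` with chunk sizes that shrink over time. A minimal sketch with an uneven workload, where `dynamic` tends to help; the names `triangle` and `sum_triangles` are hypothetical and `chunk` is assumed to be positive.

```c
/* Work grows with i, so late iterations are much more expensive
   than early ones. */
static long triangle(int i)
{
    long s = 0;
    for (int k = 1; k <= i; k++) {
        s += k;
    }
    return s;
}

/* schedule(dynamic, chunk): idle threads fetch the next block of
   chunk iterations, which balances the uneven load. */
long sum_triangles(int n, int chunk)
{
    long total = 0;

    #pragma omp parallel for schedule(dynamic, chunk) reduction(+:total)
    for (int i = 1; i <= n; i++) {
        total += triangle(i);
    }
    return total;
}
```

Swapping `dynamic` for `static` or `guided` changes only the distribution of work, not the result.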

Page 22: Multicore

Other clauses

critical: executed by only one thread at a time

atomic: similar to a critical section, but limited to a single memory update, which may perform better

ordered: executed in the order in which iterations would be executed in a sequential loop

barrier

nowait
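A minimal sketch of `atomic`, `critical` and `ordered`. The function names are hypothetical; `max_value` assumes n >= 1 and `double_in_order` assumes `out` has room for n elements.

```c
/* atomic: protects a single update of a shared variable. */
int count_even(const int *v, int n)
{
    int count = 0;

    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        if (v[i] % 2 == 0) {
            #pragma omp atomic
            count++;
        }
    }
    return count;
}

/* critical: protects a whole block (here a compare followed by a store). */
int max_value(const int *v, int n)
{
    int best = v[0];

    #pragma omp parallel for
    for (int i = 1; i < n; i++) {
        #pragma omp critical
        {
            if (v[i] > best) best = v[i];
        }
    }
    return best;
}

/* ordered: the marked statement runs in sequential loop order,
   while the rest of the loop body may run in parallel. */
void double_in_order(const int *v, int *out, int n)
{
    int next = 0;

    #pragma omp parallel for ordered
    for (int i = 0; i < n; i++) {
        int value = 2 * v[i];      /* parallel part */
        #pragma omp ordered
        out[next++] = value;       /* serialized, in order of i */
    }
}
```

In real code a `reduction(max:best)` clause would usually be preferable to the critical section in `max_value`; the critical section is used here only to show the construct.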

Page 23: Multicore

Using clauses

Page 24: Multicore
Page 25: Multicore

MPI-Concept

mpicc <options> prog.c

mpirun -arch <architecture> -np <np> prog

Page 26: Multicore

MPI

Page 27: Multicore

MPI

Page 28: Multicore

MPI

MPI program: 6 basic calls

MPI_INIT
MPI_COMM_RANK
MPI_COMM_SIZE
MPI_SEND
MPI_RECV
MPI_FINALIZE

MPI messages

data (startbuf, count, datatype)
envelope (destination/source, tag, communicator)

Communicators

Page 29: Multicore

Communication modes

•Collective vs P2P
▫One2All, All2All, All2One
•Blocking / Nonblocking
•Synchronous / Asynchronous

Page 30: Multicore

Communication modes

•synchronous mode ("safest"): Is the receiver ready?
•ready mode (lowest system overhead): only if there is a receiver waiting (streaming)
•buffered mode (decouples sender from receiver): buffer size, buffer attachment!
•standard mode

Page 31: Multicore

Communication Mode   Blocking Routines      Non-Blocking Routines
synchronous          MPI_SSEND              MPI_ISSEND
ready                MPI_RSEND              MPI_IRSEND
buffered             MPI_BSEND              MPI_IBSEND
standard             MPI_SEND               MPI_ISEND
                     MPI_RECV               MPI_IRECV
                     MPI_SENDRECV
                     MPI_SENDRECV_REPLACE

Page 32: Multicore

Collective communication

Barrier

Broadcast

Gather

Scatter

Reduction

Page 33: Multicore

gprof

valgrind

PAPI

Page 34: Multicore

PAPI

PAPI is a library that monitors hardware events when a program runs. Papiex is a tool that makes it easy to get access to performance counters using PAPI.*

*http://icl.cs.utk.edu/papi/

papiex -e <EVENT> ./my_prog (for some tests, turn off compiler optimizations with the flag -O0)

Page 35: Multicore

Profilers

Two Types: Statistical Profilers, Event Based Profilers

Statistical Profiling: Interrupts at random intervals and records which program instruction the CPU is executing.

Event Based Profiling: Interrupts triggered by hardware counter events are recorded.

Measuring profiles affects performance.

Still a lot of data saved.

Page 36: Multicore

Tracing

Wrappers for function calls (for example MPI_Recv)

Records when a function was called and with what parameters

Which nodes exchanged messages, message size…

Can affect performance

Page 37: Multicore

Intel tracing tools

Marmot MPI correctness and portability checker

MpiP - http://mpip.sourceforge.net/

Page 38: Multicore

Extrae + Paraver

module add paraver

mpi2prv -f TRACE.mpits -o MPImatrix.prv

Scalasca

Screenshots and examples of profilers/tracing tools available – but not on the internet.

Page 39: Multicore

You may use the pictures of the processors (not the screenshots, not the overview pic which I only adapted), but please do notify and credit me accordingly. Some of the code was copy-pasted from Wikipedia.

I've removed the parts with copyright problems.