
Source: pmaa06.irisa.fr/pres/13-Jiogo-PMAA06.pdf

Aug 08, 2020


Parallel Object Programming in POP-C++: a case study for sparse matrix vector multiplication

Clovis Dongmo Jiogo, Pierre Manneback
Faculté polytechnique de Mons

Pierre Kuonen
University of Fribourg


Purpose of this work

Test POP-C++ for some scientific computations on Grids

• Present the parallel programming model POP-C++
• Evaluate its performance in a Grid environment
• Show how POP-C++ can improve matrix computations


Agenda

• Overview of POP-C++
• Sparse Matrix/Vector product
• Programming in POP-C++
• Experimental results
• Future work


POP: Parallel Object Programming

[Figure: an object-oriented application, made of several objects, executes on a Grid environment.]

Object-oriented application:
• Distributed objects
• Heterogeneous
• Dynamic

Grid environment:
• Heterogeneous
• Large scale
• Unstructured
• Dynamic and unknown topology


Approach of POP-C++

• Service-oriented approach
• Resource allocation driven by object requirements
• Various invocation semantics
• Object-oriented parallel programming paradigm (parallel objects)
• Object-oriented programming system


POP-C++ Programming Model

• Extension of the C++ language
• Data transmission via shared objects
• Two levels of parallelism
    • Inter-object parallelism
    • Intra-object parallelism
• Transparent and dynamic object allocation guided by the object's resource needs
• Capacity to glue to Grid toolkits


Invocation semantics : interface side

Two ways to call a method:

• Synchronous: the method returns when the execution is finished. Same semantics as a sequential invocation.
• Asynchronous: the method returns immediately. This allows parallelism, but no value can be returned.

[Figure: Object 1 calling Object 2, once synchronously and once asynchronously; in the asynchronous case the two objects execute in parallel.]


Method call semantics : definition

1 - An arriving concurrent call can be executed concurrently (time sharing) when it arrives, except if mutex calls are pending or executing. In the latter case it is executed after completion of all mutex calls previously arrived.

2 - An arriving sequential call is executed after completion of all sequential and mutex calls previously arrived.

3 - An arriving mutex call is executed after completion of all calls previously arrived.
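The three rules can be read as a dependency test between an arriving call and the calls that arrived before it. The sketch below is plain C++, not POP-C++ runtime code; the names `Kind` and `must_wait_for` are hypothetical, and it assumes every earlier call is still pending or executing when the new one arrives.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class Kind { Conc, Seq, Mutex };

// For each arriving call, list the earlier calls whose completion it must await
// (this only matters while those earlier calls are still pending or executing).
std::vector<std::vector<std::size_t>> must_wait_for(const std::vector<Kind>& arrivals) {
    std::vector<std::vector<std::size_t>> deps(arrivals.size());
    for (std::size_t i = 0; i < arrivals.size(); ++i) {
        for (std::size_t j = 0; j < i; ++j) {
            const bool earlier_is_mutex = arrivals[j] == Kind::Mutex;
            const bool earlier_is_seq = arrivals[j] == Kind::Seq;
            bool wait = false;
            switch (arrivals[i]) {
                case Kind::Conc:  wait = earlier_is_mutex; break;                    // rule 1
                case Kind::Seq:   wait = earlier_is_mutex || earlier_is_seq; break;  // rule 2
                case Kind::Mutex: wait = true; break;                                // rule 3
            }
            if (wait) deps[i].push_back(j);
        }
    }
    return deps;
}
```

For the arrival order Mseq, Mconc, Mseq, Mconc, Mseq, Mmut, Mconc, the function reports dependencies for the second and third Mseq, for Mmut and for the last Mconc, while the first Mseq and the first two Mconc calls can start immediately.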


Method call semantics : example

All calls are asynchronous

[Figure: timeline of calls issued by O1 to O2: O2.Mseq(), O2.Mconc(), O2.Mseq(), O2.Mconc(), O2.Mseq(), O2.Mmut(), O2.Mconc(). Four of the calls are marked "Delayed".]


POP-C++ Syntax

POP-C++ is an implementation of the parallel object model as an extension of C++ with six new keywords:

• parclass: to declare a parallel class
• async: asynchronous method call
• sync: synchronous method call
• conc: concurrent method execution
• seq: sequential method execution
• mutex: mutex method execution


POP-C++ architecture

A multi-layer architecture. New middleware is integrated into the system in a plug-and-play (PnP) fashion.

[Figure: layered architecture with the following elements]
• POP-C++ programming
• POP-C++ essential service abstractions
• Customizable service implementations: POP-C++ services for Globus, POP-C++ services for XtremWeb, POP-C++ services for testing, other customizable services
• Globus Toolkit, XtremWeb, standalone POP-C++, other toolkits
• Computational environment: Grid, Web computing, testing distributed environment, other distributed environments


Requirement-driven objects

• Each parallel object has a user-specified object description (OD)
• The OD describes the requirements of the parallel object
• The OD is used as a guideline for resource allocation and object migration
• The OD can be expressed in terms of:
    • Maximum computing power (e.g. Mflops)
    • Communication bandwidth with its interface
    • Memory needed
• The OD can be parameterized on each parallel object (based on the actual input)


Object description example

    parclass Matrix {
        Matrix(int n) @{
            od.power(300, 100);
            od.memory(n*n*sizeof(double)/1E6);
            od.protocol("socket http");
            …
        }
    }

The creation of an object of the Matrix parallel class requires:

• A computing power of 300 Mflops, although 100 Mflops is acceptable
• A memory capacity of n*n*sizeof(double)/1E6 Mbytes
• Socket or HTTP as the communication protocol
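To get a feel for the memory requirement, the expression passed to od.memory can be evaluated in plain C++. This is only a worked example of the arithmetic; the function name `dense_matrix_mbytes` is hypothetical, and the figure quoted below assumes 8-byte doubles.

```cpp
#include <cassert>
#include <cstddef>

// Memory requirement in megabytes for a dense n x n matrix of doubles,
// mirroring the expression n*n*sizeof(double)/1E6 from the object description.
double dense_matrix_mbytes(std::size_t n) {
    return static_cast<double>(n) * static_cast<double>(n) * sizeof(double) / 1e6;
}
```

With 8-byte doubles, a 1000 x 1000 matrix requests 8 Mbytes, so the requirement grows with the actual input size n given to the constructor.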


Sparse storage format : CRS

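    11   0  14   0   0
     0  22   0   0   0
     0   0   0   0   0
    14   0   0   0  45
    15   0   0  45   0

Row_ptr[*] = [1; 3; 4; 4; 6; 8]
Col_ind[*] = [1; 3; 2; 1; 5; 1; 4]
Mat_val[*] = [11; 14; 22; 14; 45; 15; 45]

The CRS data structure uses three vectors. Row_ptr holds one entry per row plus a final one (n+1 in total); the empty third row repeats the index 4.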
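A minimal sketch of how the three CRS vectors can be built from a dense matrix, in plain C++. The names `Crs` and `to_crs` are hypothetical; the `base` parameter is an assumption added so that the same function yields either the 1-based numbering of the slide or the 0-based numbering used by C++ loops.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Crs {
    std::vector<std::size_t> row_ptr;  // n + 1 entries
    std::vector<std::size_t> col_ind;  // one entry per nonzero
    std::vector<double> mat_val;       // one entry per nonzero
};

// Build the CRS arrays of a dense row-major matrix.
// `base` is the index origin: 1 gives the slide's numbering, 0 suits C++ loops.
Crs to_crs(const std::vector<std::vector<double>>& dense, std::size_t base) {
    Crs crs;
    crs.row_ptr.push_back(base);
    for (const auto& row : dense) {
        for (std::size_t j = 0; j < row.size(); ++j) {
            if (row[j] != 0.0) {
                crs.col_ind.push_back(j + base);
                crs.mat_val.push_back(row[j]);
            }
        }
        // Position just past this row's last nonzero; an empty row repeats the previous value.
        crs.row_ptr.push_back(crs.mat_val.size() + base);
    }
    return crs;
}
```

Called with the 5 x 5 example matrix and base 1, this yields seven column indices, seven values and six row pointers, because a row pointer array always has one entry more than the number of rows.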


Sparse Matrix/vector partitioning

[Figure: sparse matrix times vector; the matrix rows are split into blocks assigned to resources R1, R2 and R3.]

Sparse matrix is partitioned according to the resource power


Distribution model

[Figure: matrix A partitioned into blocks A1, A2, A3 and A4, next to the execution time of each block and the minimal time Tminimal.]

How to find a matrix partitioning which minimizes the total execution time?


Balancing Heuristic

Objectives:
• Fast : linear computing time
• Efficient : ε << 1

Load balancing:

$$W_i \approx (1+\varepsilon)\,\frac{p_i}{k\,p_{avg}}\,W = (1+\varepsilon)\,\frac{p_i}{\sum_j p_j}\,W$$
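One way to meet both objectives is a single pass over the row pointers that cuts the matrix into contiguous row blocks whose nonzero counts follow the relative power of the resources. The sketch below is an illustration under stated assumptions, not the authors' heuristic: it assumes that the work of a block is proportional to its number of nonzeros, that `row_ptr` is 0-based, and the name `partition_rows` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Returns k + 1 row boundaries: resource r owns rows [bounds[r], bounds[r + 1]).
// Each block receives a share of the nonzeros close to power[r] / sum(power).
std::vector<std::size_t> partition_rows(const std::vector<std::size_t>& row_ptr,
                                        const std::vector<double>& power) {
    assert(!row_ptr.empty() && !power.empty());
    const std::size_t n = row_ptr.size() - 1;  // number of rows
    const std::size_t k = power.size();        // number of resources
    const double total_nnz = static_cast<double>(row_ptr[n] - row_ptr[0]);
    const double total_power = std::accumulate(power.begin(), power.end(), 0.0);

    std::vector<std::size_t> bounds(k + 1, n);
    bounds[0] = 0;
    double cumulative_share = 0.0;
    std::size_t row = 0;  // never moves backwards, so the pass is O(n + k)
    for (std::size_t r = 0; r + 1 < k; ++r) {
        cumulative_share += power[r] / total_power;
        // Number of nonzeros that resources 0..r should hold together.
        const double target = cumulative_share * total_nnz;
        while (row < n && static_cast<double>(row_ptr[row + 1] - row_ptr[0]) <= target) {
            ++row;
        }
        bounds[r + 1] = row;
    }
    return bounds;
}
```

Because blocks are cut at row boundaries, the achieved shares only approximate the targets; the deviation plays the role of ε and stays small when no single row dominates the nonzero count.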


The parallel class SparseMatrix

    parclass SparseMatrix {
    public:
        SparseMatrix(int wanted, int min) @{ od.power(wanted, min); };
        seq async void Init([in, size=n+1] int *row_ptr, int n, …);
        seq async void MvMultiply([in, size=n] double *vector, int n);
        mutex sync int GetResult([out, size=m] double *V, int m);

    private:
        double *mat_val, *vect_res;
        int *col_ind, *row_ptr;
        …
    }

The object requirements are defined by the constructor


Minimal extension of C++

POP-C++:

    parclass Foo {
        …
        Foo(…) @{ power = 100; };
        conc async void Mymethod(…);
    };

C++:

    class Foo {
        …
        Foo(…);
        void Mymethod(…);
    };

Shared implementation:

    Constructor:  Foo::Foo(…) { … }
    Method:       void Foo::Mymethod(…) { … }


Methods are implemented in C++

    …
    void SparseMatrix::MvMultiply(double *vector, int n) {
        for (int i = 0; i < n; i++) {
            vect_res[i] = 0.0;  // reset the result entry for row i
            // accumulate the nonzeros of row i
            for (int j = row_ptr[i]; j < row_ptr[i+1]; j++)
                vect_res[i] += mat_val[j] * vector[col_ind[j]];
        }
    }
    …
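The same loop can be tried outside a parallel object. The sketch below is a free-standing plain C++ version, assuming 0-based CRS arrays; the function name `crs_multiply` and the use of std::vector instead of raw pointers are choices made for this illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// y = A * x for a matrix stored in 0-based CRS form.
std::vector<double> crs_multiply(const std::vector<std::size_t>& row_ptr,
                                 const std::vector<std::size_t>& col_ind,
                                 const std::vector<double>& mat_val,
                                 const std::vector<double>& x) {
    assert(!row_ptr.empty() && col_ind.size() == mat_val.size());
    const std::size_t rows = row_ptr.size() - 1;
    std::vector<double> y(rows, 0.0);
    for (std::size_t i = 0; i < rows; ++i) {
        // Nonzeros of row i occupy positions row_ptr[i] .. row_ptr[i + 1] - 1.
        for (std::size_t j = row_ptr[i]; j < row_ptr[i + 1]; ++j) {
            y[i] += mat_val[j] * x[col_ind[j]];
        }
    }
    return y;
}
```

For the 5 x 5 example matrix of the CRS slide and a vector of ones, each result entry is the sum of its row: 25, 22, 0, 59 and 60.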


Execution steps

[Figure: data files provide row_ptr, mat_val, col_ind and vector; power_ptr holds one entry per resource R1 to R4. Steps shown: SetMatVarDataMatDist, PartitionMatrix, Init, MvMultiply, GetResult, ComputeResult.]


Experimental Platform

PCs properties

• AMD Athlon, 2 GHz
• 256 MB of RAM
• Fast Ethernet

Cluster properties

• Sun Fire V20 cluster
• 10 bi-Opteron nodes, 1.8 GHz
• 1 GB of RAM
• Gigabit Ethernet


Test matrices

Matrix name          Application domain         Size (n)   NZ (m)
(a) fidap            Finite element modeling    16614      1091362
(b) poisson3Db       Finite element modeling    85623      2374949
(c) Stanford-web     Web crawling               281903     2382912
(d) Stanford-w.b.    Web crawling               685230     8006115

Matrix Market format: <i> <j> <Aij>
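Files in the Matrix Market coordinate format start with '%' comment lines, followed by a size line "rows cols nnz" and then one "<i> <j> <Aij>" line per nonzero, with 1-based indices. The sketch below reads such a stream in plain C++; the names `Entry`, `Coo` and `read_coordinate` are hypothetical, and it assumes a real, general matrix (a file declared symmetric stores only one triangle and would need its entries mirrored).

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

struct Entry { std::size_t i, j; double value; };  // 1-based indices, as in the file

struct Coo {
    std::size_t rows = 0, cols = 0;
    std::vector<Entry> entries;
};

// Read a real, general matrix in Matrix Market coordinate format.
Coo read_coordinate(std::istream& in) {
    Coo coo;
    std::string line;
    // Skip the banner, comment lines (starting with '%') and blank lines.
    while (std::getline(in, line)) {
        if (!line.empty() && line[0] != '%') break;
    }
    // The first remaining line holds the dimensions and the nonzero count.
    std::size_t nnz = 0;
    std::istringstream header(line);
    header >> coo.rows >> coo.cols >> nnz;
    coo.entries.reserve(nnz);
    Entry e{};
    while (coo.entries.size() < nnz && (in >> e.i >> e.j >> e.value)) {
        coo.entries.push_back(e);
    }
    return coo;
}
```

The resulting triplets can then be sorted by row and converted to the CRS vectors before being distributed to the parallel objects.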


Experimental results

Total execution time (s) for 1000 iterations, by number of processors, for matrices (b), (c) and (d)

Type       #1      #2      #4      #8      #12
POP-C++    108.2   62.8    31.4    22.9    22.7
LAM/MPI    96.5    52.6    39.2    20.7    16.9

POP-C++    230.3   120.0   63.3    41.4    36.4
LAM/MPI    215.6   111.6   73.8    43.2    33.6

POP-C++    267.7   112.4   80.5    49.2    48.4
LAM/MPI    173.5   101.3   64.5    46.2    46.8
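The timings can be compared through the usual speedup and efficiency measures, which the slides do not state explicitly. The sketch below uses the standard definitions (speedup = T1 / Tp, efficiency = speedup / p); the function names are hypothetical.

```cpp
#include <cassert>

// Speedup of a run on p processors relative to the 1-processor run.
double speedup(double t1, double tp) {
    assert(tp > 0.0);
    return t1 / tp;
}

// Parallel efficiency: speedup divided by the number of processors.
double efficiency(double t1, double tp, int processors) {
    assert(processors > 0);
    return speedup(t1, tp) / processors;
}
```

For the first POP-C++ series of the table, 108.2 on one processor and 22.7 on twelve processors give a speedup of about 4.8 and an efficiency of about 0.40.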


Experimental results

[Figures: execution time (s) versus number of processors (#1, #2, #4, #8, #12) for POP-C++ and LAM/MPI. Panels: Matrix (b) and Matrix (d).]


Future work

• Improve the performance by coupling POP-C++ with MPI
• Set up a scheduler for task assignment
• Implement iterative methods in a grid environment, based on a heuristic for load balancing
• Evaluate POP-C++ performance in a Globus environment