Prologue Taxonomy PROVA! Conclusions

Achieve Reproducible Research with PROVA!: Performance Reproduction of Various Applications
Helmar Burkhart, Antonio Maffia, Danilo Guerrera (University of Basel, [email protected])
Friedrich-Alexander-Universität Erlangen-Nürnberg, Tue, Oct 26, 2015
"Non-reproducible single occurrences are of no significance to science." (Karl Popper, The Logic of Scientific Discovery, 1934/1959)
How Do Computational Disciplines Perform?
Algorithmic Engineering: J. of Experimental Algorithmics, www.jea.acm.org
Programming Languages & Methodologies: http://evaluate.inf.usi.ch
Artificial Intelligence: http://recomputation.org
Manifesto ("If we can compute your experiment now, anyone can recompute it 20 years from now"), Virtual Machine ("the only way"; runtime performance is a secondary issue).
Problem / Method / System

A computational problem is solved by an algorithmic method on a compute system.

Problem: specification of the problem, including characteristic parameters.
Method: description of the algorithmic approach used to tackle the problem.
System: representation of the compute environment (both hardware and software) on which an experiment is run.
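To make the taxonomy concrete, an experiment can be thought of as a record identified by this triple. The sketch below is purely illustrative (not PROVA!'s actual format), and the problem and method values are made up:

```shell
#!/bin/sh
# Illustrative only: an experiment is identified by the triple
# (problem, method, system). Problem and method values are made up;
# the system field is taken from the machine the sketch runs on.
PROBLEM="matrix multiply, n=1024"
METHOD="openmp, naive triple loop"
SYSTEM="$(uname -s) $(uname -m)"

printf 'problem: %s\nmethod: %s\nsystem: %s\n' \
    "$PROBLEM" "$METHOD" "$SYSTEM"
```

Two experiments repeat each other exactly when all three components of the record coincide.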
Micro- and Macro-Experiments

[Figure: three-dimensional diagram with axes Problem, Method, and System. A single point (Pro1, Met1, Sys1) is micro-experiment 1; varying the method to Met2 and Met3 gives micro-experiments 2 and 3; the set of all micro-experiments for Pro1 on Sys1 forms a macro-experiment.]
Reproducibility Levels

Repetition: re-running the original micro- or macro-experiment without any variation of the parameters should lead to the same results; a certain level of credibility is then guaranteed (completeness of documentation).
Replication: related to the system hosting an experiment. An experiment should not be bound to a specific compute environment (portability).
Re-experimentation: if changing the methods leads to the same outputs, the scientific approach is proven (correctness of the approach).
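As an illustration of the first level (not part of PROVA! itself), repetition can be checked mechanically: run the same stand-in experiment twice with identical parameters and compare the outputs. The run_experiment function below is a made-up placeholder for a real benchmark invocation:

```shell
#!/bin/sh
# Stand-in for a deterministic micro-experiment with fixed parameters;
# a real run would execute the benchmark binary instead of echo.
run_experiment() {
    echo "x_max=20 y_max=20 threads=2: done"
}

# Repetition: same problem, method, and system, no parameter variation.
run_experiment > run1.out
run_experiment > run2.out

# The repetition succeeds iff the two outputs are identical.
cmp -s run1.out run2.out && echo "repetition OK"
```

Replication would re-run the same comparison on a different system; re-experimentation would swap run_experiment for a different method.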
Our Goals
Provide a clean and easy-to-use environment
Do not introduce overhead that affects the performance measurements
Allow on-the-fly generation of graphs for the chosen performance metric (so far, GFlop/s)
Automatically generate the documentation of both the system and the software stack
Support the reproducibility levels for benchmark experiments
Provide a collaborative approach
Create an ecosystem of trusted scientific computing, starting from our field
PROVA! - First Version of the Architecture

[Figure: architecture diagram. A web browser reaches the remote environment via https or ssh. A front-end machine runs the framework and the experiment-and-analysis server and drives a cluster of parallel machines; per-user workspaces live on NFS-shared file storage.]
Replication Workflow
Admin
  Modules: add, remove; list available and installed

Scientist
  Projects: add, remove, list
  Methods: add, remove implemented project code; list and download available local example code

Network interface to manage experiments
Workflow Structure
The root directory ($BENCHROOT) contains the following folders:

$BENCHROOT
  modules_avail/
    openmp/
    mpi/
  modules_installed/
    openmp/
  scripts/
    compile.sh
    run.sh
    ...
  util/
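The contents of scripts/compile.sh are not shown in the slides; the following is only a plausible sketch of such a per-method compile wrapper, assuming it takes a project and a method name and delegates the build to the method's own Makefile (the stub workspace at the end exists just so the sketch runs end to end):

```shell
#!/bin/sh
set -e

# Hypothetical sketch of scripts/compile.sh: build one method of one
# project by delegating to the Makefile each method ships in src/.
compile_method() {
    src_dir="$WORKSPACE/$1/$2/src"
    bin_dir="$WORKSPACE/$1/$2/bin"
    mkdir -p "$bin_dir"        # binaries land next to src (see layout)
    make -C "$src_dir"
    echo "compiled $1/$2"
}

# Stub workspace with a one-line Makefile so the sketch is runnable.
WORKSPACE=$(mktemp -d)
mkdir -p "$WORKSPACE/myproject/openmp/src"
printf 'all:\n\t@echo building example.c\n' \
    > "$WORKSPACE/myproject/openmp/src/Makefile"

compile_method myproject openmp
```

Delegating to a per-method Makefile keeps the wrapper identical across method types; only the Makefile differs.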
Workspace Structure
The root directory ($WORKSPACE) contains as many folders as the projects defined by the user:

$WORKSPACE
  Project 1/
    Method 1/
      src/
        Makefile
        example.c
        dim_input.h
      out/
        DATE-PARAMS-THREADS.out
      bin/
        project
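The out/ directory collects one file per run, named DATE-PARAMS-THREADS.out. The exact naming code is not shown in the slides; one plausible way to build such a name (date format and parameter encoding are assumptions):

```shell
#!/bin/sh
# Hypothetical naming sketch: one output file per run, named
# DATE-PARAMS-THREADS.out as in the workspace layout above.
DATE=$(date +%Y-%m-%d_%H-%M-%S)
PARAMS="x_max20_y_max20"     # characteristic problem parameters
THREADS=2

OUT_FILE="${DATE}-${PARAMS}-${THREADS}.out"
echo "$OUT_FILE"
```

Encoding the parameters and thread count in the filename lets repeated runs of the same configuration be grouped without reading file contents.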
How to install a module?
#!/bin/bash
set -e

PWD_PATH=`pwd`
PACKAGE_NAME="mpich-3.1.2.tar.gz"
URL="http://www.mpich.org/static/downloads/3.1.2/$PACKAGE_NAME"
RELEASE="mpich-3.1.2"  # name of the folder inside the downloaded package

if [ -a $PACKAGE_NAME ]; then
  rm $PACKAGE_NAME;
fi
# Download MPI
wget $URL
# Extract MPI
tar xf $PACKAGE_NAME
rm -f $PACKAGE_NAME
if [ -d "bin" ]; then
  rm -rf bin;
fi
cd $RELEASE
# Configure to install in a local directory
./configure --prefix=$PWD_PATH/bin --disable-fortran --disable-fc

NPROC=`nproc`
if [ $NPROC -gt 1 ]; then
  make -j$NPROC
else
  make
fi
make install
Modules Management
So far, modules are managed by means of scripts:

the admin has to write the installation script and take care of dependencies
users must ask the admin to install a module before using it in their projects

Our idea is to exploit Lmod, an environment modules system, to provide a convenient way to dynamically change the users' environment through modulefiles. This includes easily adding or removing directories in the PATH environment variable. A modulefile contains the information necessary to let a user run a particular application or access a particular library.

On a higher level, we plan to use EasyBuild, a software build and installation framework that allows you to manage (scientific) software on High Performance Computing (HPC) systems in an efficient way. It can be used to keep track of versions and to easily build a consistent software stack:

a flexible framework for building/installing (scientific) software
fully automates software builds
allows for easily reproducing previous builds
keeps the software build recipes/specifications simple and human-readable
supports co-existence of versions/builds via dedicated installation prefixes and module files
possibility to save a configuration file with the modules you use most often
possible to create such a file and attach it to an experiment: export source code + environment
...
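In effect, loading a modulefile prepends the package's directories to environment variables such as PATH. The mechanism can be sketched in plain shell; this shows only the effect, not Lmod's actual implementation, and the install prefix below is made up:

```shell
#!/bin/sh
# Effect of loading a module for OpenMPI/1.8.4-GCC-4.9.2: the
# package's bin directory is prepended to PATH. The prefix is a
# made-up example; Lmod additionally makes this reversible.
PKG_PREFIX="/opt/sw/OpenMPI/1.8.4-GCC-4.9.2"
PATH="$PKG_PREFIX/bin:$PATH"
export PATH

# The first PATH entry now points into the package.
echo "$PATH" | cut -d: -f1
```

Because each version lives under its own prefix, several builds of the same package can coexist and be switched by loading a different modulefile.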
PROVA! - Current Version of the Architecture

[Figure: architecture diagram as in the first version (web browser via https or ssh, front-end machine with framework and experiment-and-analysis server, cluster of parallel machines, per-user workspaces on NFS file storage), now extended with a scheduler on the front-end machine.]
New Workflow Structure
The root directory ($BENCHROOT) contains the following folders:

$BENCHROOT
  methodType_avail/
  software/
    lua/
      lmod/
    python/
      easybuild/
  modules/
  my_easyblocks/
  software/
  ...
Commands overview
eb_cmd       Use the EasyBuild included in the tool
methodType   Manage method types
project      Manage projects (CRUD)
method       Manage methods (CRUD)
compile      Manage compilation of the methods of a project
run          Manage runs of the methods of a project
exp          Manage the experiment definition and execution
run_exp      Manage the execution of an experiment (compile + run + gather)
Sample usage

Initialization:

cd $PROVAROOT; source ./util/BaseSetup.sh ~/myworkspace
Installation of the tool:

workflow install
List available method types:

workflow methodType -l

Output:

Installed:
Available:
List of MethodTypes in $BENCHROOT/methodType_avail:
MethodTypes: OpenMP-4.0-GCC-4.9.2 OpenMPI-1.8.4-GCC-4.9.2
How to get a new methodType?

Add a directory to the "methodType_avail" path:

methodType_avail
  methodTypeName/
    .methodType
    compile.sh
    run.sh
    src/
      example.c
      Makefile
      ...
The .methodType descriptor is a JSON file:

{
  "name": "OpenMPI-1.8.4-GCC-4.9.2",
  "eb_modules": [
    "OpenMPI/1.8.4-GCC-4.9.2"
  ],
  "version": "1.0",
  "comment": "OpenMPI 1.8.4 based on GCC 4.9.2"
}
Installation of a methodType:

workflow methodType -i OpenMP-4.0-GCC-4.9.2
The installation function behind this command reads the EasyBuild modules listed in the .methodType descriptor and tries to install and load each of them:

fnInstall(){
  # read method type name
  METHOD_TYPE=$1
  # read EasyBuild modules needed by the method type into EB_MODULES
  read -ra EB_MODULES <<< $(awk -F'[""]' '/eb_modules/{getline; while($0 !~ /]/){print $2; getline;}}' $methodType_avail_dir/$METHOD_TYPE/.methodType)
  # take the number of EasyBuild modules
  ebModLen=${#EB_MODULES[*]}
  # for each EasyBuild module, check if installed; if not, try to install it
  for (( i=0; i<${ebModLen}; i++ ))
  do
    # try to install the eb module
    EB_MOD="${EB_MODULES[$i]//\//-}"
    echo "Try to install $EB_MOD.eb"
    eb "$EB_MOD.eb" -r || true  # skip errors and check afterwards
    # try to load the module's category
    EB_MOD_CAT=$(awk '/moduleclass/{print $3}' $EASYBUILD_PREFIX/ebfiles_repo/${EB_MOD%%-*}/$EB_MOD.eb)
    cd $EASYBUILD_PREFIX/modules
    module use $EB_MOD_CAT
    # check the module installation
    ...
  done
}
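The awk extraction above is fragile with respect to JSON formatting. Since .methodType is plain JSON, the same list can be obtained with a real JSON parser; a sketch of this alternative, assuming python3 is available on the front end (the descriptor below is the one from the earlier slide):

```shell
#!/bin/sh
set -e

# Recreate the .methodType descriptor shown earlier in the talk.
cat > .methodType <<'EOF'
{
  "name": "OpenMPI-1.8.4-GCC-4.9.2",
  "eb_modules": [
    "OpenMPI/1.8.4-GCC-4.9.2"
  ],
  "version": "1.0",
  "comment": "OpenMPI 1.8.4 based on GCC 4.9.2"
}
EOF

# Read the eb_modules array with a JSON parser instead of awk,
# printing one module per line.
EB_MODULES=$(python3 -c '
import json
with open(".methodType") as f:
    for m in json.load(f)["eb_modules"]:
        print(m)
')
echo "$EB_MODULES"
```

This stays correct even if the JSON is reindented or the array is written on a single line, which would break the line-oriented awk version.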
Creation of a project in the defined workspace:

workflow project -c -p myfirstproject --params x_max y_max --threads 2 --values 20 20 --comment "first project with PROVA!"

Output:

I will create the project: "myfirstproject" in the folder:
/scicore/home/burkhart/guerrera/new_workspace/
Project: "myfirstproject" created!
You can add a method to this project running:
workflow method -c -p myfirstproject -m method_type -n method_name
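What such a creation step has to do can be sketched in a few lines: make the project folder in the workspace and record the characteristic parameters for later runs. The project.conf file name and format below are assumptions for illustration, not PROVA!'s actual layout:

```shell
#!/bin/sh
set -e

# Hypothetical sketch of the project-creation step: create the project
# folder and persist the parameters given on the command line.
# File name and key=value format are assumptions.
WORKSPACE=$(mktemp -d)
PROJECT="myfirstproject"

mkdir -p "$WORKSPACE/$PROJECT"
cat > "$WORKSPACE/$PROJECT/project.conf" <<EOF
params=x_max y_max
values=20 20
threads=2
comment=first project with PROVA!
EOF

echo "Project: \"$PROJECT\" created!"
```

Persisting the parameters at creation time is what later lets runs and their output files be labeled consistently.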
Current limitations

Jobs: no clue about when a job finishes its execution.
  The web interface has a field for showing the output in real time (not useful anymore). How to get notified? (email is not useful). Solved by using GC3Pie (from the next version).

Homogeneity of nodes: libraries and software are installed on a shared NFS, so all the nodes must be homogeneous to run that software.
  Solution: EasyBuild allows installing on all nodes via jobs. Drawback: one must manage different mount points for the software stack (EasyBuild delegates this to sysadmins; it is architecture specific). Issue: a job contains a command of type "workflow"; we must first make it more fine-grained, writing only bash commands in the job (so that PROVA! can be installed on the front end only).

Experiment is run as a block: bad resource usage.
  It allocates the maximum number of cores required by the experiment from the very beginning.

Installation of the modules is simply delegated to EasyBuild.
  In the first version, libraries and the software stack were not managed at all; now you need to know EasyBuild to customize the software you use, e.g. changing a command-line flag vs. creating a new easyconfig.

Visualization is not so powerful.
Conclusions
Reproducibility needs to be emphasized in the computational sciences.
Repeatability of an experiment is only possible if a precise description of the experiment is given: Problem, System, and Method.
Repeatability: world-wide access to experiments through the Internet is feasible (security and authentication mechanisms are essential).
Replication and re-experimentation: harder to achieve, but not impossible.
Future Work
Porting to external HPC environments
[Figure: future architecture. As before, a web browser reaches the front-end machine (framework, experiment-and-analysis server, scheduler) via https or ssh, with a cluster of parallel machines and per-user workspaces on NFS file storage; in addition, each user workspace is synchronized via git with a per-user repository in remote (cloud) storage.]
Collaborative Performance Engineering
Integration of performance tools (LIKWID)
Correctness checks for experiment results
Reorganize the visualization of experiments and results