Makefile::Parallel - Dependency specification language
Post on 30-Jun-2015
Makefile::Parallel
Dependency Specification Language
Rúben Fonseca
root@cpan.org
"Well done is quickly done"
Caesar Augustus
~ 100 64-bit cores
~ 100 GB RAM
~ 4 TB storage
Myrinet 10Gb
Linux (CentOS)
SeARCH Cluster
How to run those processes?
3 options!
Solution 1
while (!end) {
    run process $p
    while (!finished $p) {
        sleep n
    }
    mark $p as done
}
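A minimal Python sketch of the Solution 1 loop, assuming each process is an ordinary command line. The name `run_sequentially`, the `poll_seconds` parameter and the placeholder jobs are illustrative only and are not part of Makefile::Parallel.

```python
import subprocess
import sys
import time


def run_sequentially(commands, poll_seconds=0.05):
    """Run each command in order, polling until it finishes."""
    done = []
    for argv in commands:
        proc = subprocess.Popen(argv)          # run process $p
        while proc.poll() is None:             # while (!finished $p)
            time.sleep(poll_seconds)           # sleep n
        done.append((argv, proc.returncode))   # mark $p as done
    return done


# Hypothetical stand-ins for real jobs: two Python one-liners that do nothing.
jobs = [[sys.executable, "-c", "pass"], [sys.executable, "-c", "pass"]]
results = run_sequentially(jobs)
```

Nothing runs in parallel here: each job waits for the previous one, whether or not it depends on it.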
Solution 2
#!/bin/foo
run p1
run p2 result1
run p3 result2
run p4 result1 result2
Solution 3
Makefile::Parallel
:-)
M::P - Design Goals
- Formal specification to describe process dependencies
- Reuse a well known language syntax (don't reinvent the wheel)
- Embed other languages to reuse their expressive power
- Make it easy to run and maintain
- Generate profiling data
- Handle and recover from errors
- Run on different parallel / distributed computing platforms
- Have fun and profit :-)
M::P - The language
- Started as a Makefile subset
- Evolved with our own syntactic sugar
- Support for parametric rules
Perl
Parse::Yapp
M::P - Simple example
prepare: (5:00)
    mkdir OutputData
    p <- sub{ print "$_\n" for (3..10) }

run$p: prepare (20:00:00) [2]
    runMyProgram -p $p InputData > OutputData/run.$p

cleanup: run$p (5:00)
    for a in @p; do rm -f OutputData/run.${a}.tmp; done
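One way to read the parametric rule in the simple example: `run$p` stands for one rule per value of `p`, and `@p` stands for the whole list of values. The Python sketch below illustrates that reading only. The helper names `expand_rule` and `expand_list` are hypothetical, and the module's real expansion logic may differ.

```python
from string import Template


def expand_rule(name, command, var, values):
    """Return one (name, command) pair per value of the parametric variable."""
    return [
        (Template(name).substitute({var: v}),
         Template(command).safe_substitute({var: v}))
        for v in values
    ]


def expand_list(command, var, values):
    """Replace @var with the space-separated list of all values."""
    return command.replace("@" + var, " ".join(str(v) for v in values))


# The values the embedded Perl sub prints, one per line: 3..10.
values = list(range(3, 11))

runs = expand_rule(
    "run$p", "runMyProgram -p $p InputData > OutputData/run.$p", "p", values
)
# Under this reading, cleanup depends on every expanded run rule.
cleanup_deps = [name for name, _ in runs]
cleanup_cmd = expand_list(
    "for a in @p; do rm -f OutputData/run.${a}.tmp; done", "p", values
)
```

Under this reading the three-rule specification becomes ten concrete jobs: `prepare`, `run3` through `run10`, and `cleanup`.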
M::P - Scheduler
- Try to run the processes in parallel
- Must be extensible
- Should generate logs and visual profiling data
PBS - Portable Batch System
Job Scheduling for Clusters
Torque
OpenPBS
Common API: qsub, qstat, qdel, tracejob
M::P Scheduler - Main Loop
do {
    launch processes with fulfilled deps
    collect ended processes
    for each proc in ended_processes
        if proc has variables to expand
            calculate variables
            manipulate the dependency graph
    save journal
    sleep 10 seconds
} while (!all processes completed)
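The launch and collect steps of the main loop can be sketched in Python for the local case. This is a sketch under assumptions, not the module's code: the `schedule` function, the shape of `rules`, and the short poll interval are all illustrative, and variable expansion and the journal are left out.

```python
import subprocess
import sys
import time


def schedule(rules, poll_seconds=0.05):
    """rules maps name -> (dependency names, argv). Returns completion order."""
    pending = dict(rules)
    running = {}    # name -> subprocess.Popen
    finished = []   # names, in the order they ended

    while pending or running:
        # launch processes with fulfilled deps
        ready = [n for n, (deps, _) in pending.items()
                 if set(deps) <= set(finished)]
        for name in ready:
            _, argv = pending.pop(name)
            running[name] = subprocess.Popen(argv)
        if pending and not running:
            raise ValueError("remaining rules have unsatisfiable dependencies")

        # collect ended processes
        for name in [n for n, p in running.items() if p.poll() is not None]:
            proc = running.pop(name)
            if proc.returncode != 0:
                raise RuntimeError(f"{name} exited with code {proc.returncode}")
            finished.append(name)

        time.sleep(poll_seconds)   # the slide's loop sleeps 10 seconds
    return finished


# Hypothetical jobs: every rule runs a Python one-liner that does nothing.
noop = [sys.executable, "-c", "pass"]
order = schedule({
    "prepare": ([], noop),
    "run3": (["prepare"], noop),
    "run4": (["prepare"], noop),
    "cleanup": (["run3", "run4"], noop),
})
```

Here `run3` and `run4` start together as soon as `prepare` ends, and `cleanup` waits for both.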
M::P GetOpts
-local=(n)   Schedule on the local machine
-pbs         Schedule on a PBS capable cluster
-continue    Recover from the last error
-dump        Dumps the AST of the parsed specification
-clean       Remove temporary files
-debug       (no need to debug)
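For readers who want to mimic this option set in their own driver script, here is a small Python sketch using `argparse`. It only mirrors the flag names listed on the slide. The `prog` name, the `dest` names, and the choice to require an integer for `-local` are assumptions, and this is not how the Perl module parses its options.

```python
import argparse

parser = argparse.ArgumentParser(prog="pmake-sketch")  # hypothetical name
parser.add_argument("-local", type=int, metavar="N",
                    help="schedule on the local machine")
parser.add_argument("-pbs", action="store_true",
                    help="schedule on a PBS capable cluster")
# 'continue' is a Python keyword, so the value is stored under another name.
parser.add_argument("-continue", dest="resume", action="store_true",
                    help="recover from the last error")
parser.add_argument("-dump", action="store_true",
                    help="dump the AST of the parsed specification")
parser.add_argument("-clean", action="store_true",
                    help="remove temporary files")
parser.add_argument("-debug", action="store_true")

args = parser.parse_args(["-local=4", "-continue"])
```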
M::P Profiling
Runtime logging:
[...]
2006/12/12 10:49:22 The job "ipfp005" is ready to run. Launching
2006/12/12 10:49:22 Launched "ipfp005" (23996)
2006/12/12 10:49:52 Process 23996 (ipfp005) has terminated [30s]
2006/12/12 10:49:52 The job "postipfp005" is ready to run. Launching
2006/12/12 10:49:52 Launched "postipfp005" (23997)
2006/12/12 10:50:02 Process 23997 (postipfp005) has terminated [10s]
[...]
M::P Profiling
Final report:
ID          Start Time           End Time             Elapsed
codify      2006-12-12T10:41:10  2006-12-12T10:49:11  8m 1s
ngramsA     2006-12-12T10:49:11  2006-12-12T11:07:46  18m 34s
ngramsB     2006-12-12T10:49:11  2006-12-12T11:05:44  16m 33s
initmat001  2006-12-12T10:49:11  2006-12-12T10:50:12  1m
initmat002  2006-12-12T10:49:11  2006-12-12T10:50:43  1m 31s
initmat003  2006-12-12T10:49:11  2006-12-12T10:51:03  1m 51s
[...]
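The Elapsed column can be approximated from the two timestamps. The Python sketch below is illustrative only, and the function name `elapsed` is hypothetical. Some rows of the report differ by one second from what whole-second timestamps give, presumably because the module times jobs at finer resolution (it loads Time::HiRes), so this is not a reimplementation of the report.

```python
from datetime import datetime


def elapsed(start, end):
    """Format the gap between two ISO timestamps as minutes and seconds."""
    delta = datetime.fromisoformat(end) - datetime.fromisoformat(start)
    minutes, secs = divmod(int(delta.total_seconds()), 60)
    parts = []
    if minutes:
        parts.append(f"{minutes}m")
    if secs or not parts:
        parts.append(f"{secs}s")
    # How the report prints hours is not shown in the source, so long jobs
    # are simply reported in minutes here.
    return " ".join(parts)


codify = elapsed("2006-12-12T10:41:10", "2006-12-12T10:49:11")
ngrams_b = elapsed("2006-12-12T10:49:11", "2006-12-12T11:05:44")
```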
M::P Error Handling
M::P Real World Usage
NLP process
~ 100 lines of Makefile::Parallel syntax
~ 4 GB of text
Desktop P4 3 GHz == 1 week
SeARCH cluster == 12 hours
M::P About the module
use Clone qw(clone);
use Cwd;
use Data::Dumper;
use Digest::MD5;
use GraphViz;
use Log::Log4perl;
use Parse::Yapp;
use Proc::Reliable;
use Proc::Simple;
use Time::HiRes qw(gettimeofday tv_interval);
use Time::Interval;
use Time::Piece::ISO;
M::P New features
0.4
- support for multiple parametric variables on each rule
- new bugs added
0.5
- fixed a bunch of bugs
- new bugs added
0.6
- fixed a small (but important!) bug
- ran out of new ideas
Need a direction!
ambs@cpan.org
Thank you
root@cpan.org