
Parallel Serial Jobs Using GNU PARALLEL

Wei Feinstein
HPC User Services, LSU HPC & LONI

Louisiana State University
February 2017


Overview
•  Dilemma of running serial jobs on modern clusters
•  GNU Parallel
   •  Introduction
   •  Tool flags
   •  Examples


LSU/LONI HPC Environment

•  Clusters such as MIKE/SMIC/QB2 are designed to run large-scale parallel tasks.

•  Full nodes are assigned: access to 8, 16, 20, or even 48 cores per node, depending on the system.

Q: How to handle thousands of serial (1-core) tasks without going crazy?


Problem Schematic

[Figure: Heap 'O' Data -> Serial Process -> Pile 'O' Results]

A single process is applied to many input pieces and generates many outputs.


Job submission examples used by some users


Running blastp with one core

blast-1core.qsub1

#!/bin/bash
#PBS -A hpc_hpcadmin3
#PBS -l nodes=1:ppn=1
#PBS -l walltime=00:20:00
#PBS -q single
#PBS -N blastp-worst
export DIR=/project/$USER/distribution_workload/blast
cd $DIR; blastp -query data/input1.faa -db db/xxx -out output/input1.out

blast-1core.qsub2

#!/bin/bash
#PBS -A hpc_hpcadmin3
#PBS -l nodes=1:ppn=1
#PBS -l walltime=00:20:00
#PBS -q single
#PBS -N blastp-worst
export DIR=/project/$USER/distribution_workload/blast
cd $DIR; blastp -query data/input2.faa -db db/xxx -out output/input2.out

Need to submit N jobs given N inputs


Running blastp with one node

blast-1node.qsub

#!/bin/bash
#PBS -A hpc_hpcadmin3
#PBS -l nodes=1:ppn=16
#PBS -l walltime=00:20:00
#PBS -q workq
#PBS -N blastp-better
export DIR=/project/$USER/distribution_workload/blast
cd $DIR;
blastp -query data/input1.faa -db db/xxx -out output/input1.out &
blastp -query data/input2.faa -db db/xxx -out output/input2.out &
…
blastp -query data/input16.faa -db db/xxx -out output/input16.out &
wait # wait for all the child processes to finish before terminating the parent process

•  Only a single node can be used
•  Same command syntax with different inputs


Multi-Node Considerations

●  The mother superior node is the only one that holds all the job information, such as environment variables and the list of node names.
●  Start programs on other nodes with remote shell commands, such as ssh.
●  Ensure all programs finish before the script exits.


Running blastp with two nodes

blast-2nodes.qsub

#!/bin/bash
#PBS -A hpc_hpcadmin3
#PBS -l nodes=2:ppn=16
#PBS -l walltime=00:20:00
#PBS -q workq
#PBS -N blastp-2nodes
export DIR=/project/$USER/distribution_workload/blast
cd $DIR; mkdir output

# on compute node1 (mother superior)
blastp -query data/input1.faa -db db/xxx -out output/input1.out &
…
blastp -query data/input8.faa -db db/xxx -out output/input8.out &
# start tasks on compute node2
ssh -n $HOST2 "cd $DIR; blastp -query data/input9.faa -db db/xxx -out output/input9.out" &
…
ssh -n $HOST2 "cd $DIR; blastp -query data/input16.faa -db db/xxx -out output/input16.out" &
wait # wait for all the child processes to finish before terminating the parent process

What if you are using 10 nodes?
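To make the scaling problem concrete, here is a minimal sketch that only prints the command lines a hand-written script would need for a given node list. It is a dry run: nothing is executed. The function name plan_tasks and the host names node01 and node02 are hypothetical, and unlike the two-node script it uses ssh for every node, including the first one.

```shell
#!/bin/sh
# plan_tasks NODES NINPUTS
# Print one ssh command line per input, assigning inputs to nodes round-robin.
# NODES is a single-space-separated list of host names (hypothetical names here).
plan_tasks() {
    nodes=$1
    ninputs=$2
    nnodes=$(echo "$nodes" | wc -w)
    i=1
    while [ "$i" -le "$ninputs" ]; do
        # node index for this input: 1, 2, ..., nnodes, then wrap around
        idx=$(( (i - 1) % nnodes + 1 ))
        host=$(echo "$nodes" | cut -d' ' -f"$idx")
        echo "ssh -n $host \"cd \$DIR; blastp -query data/input$i.faa -db db/xxx -out output/input$i.out\" &"
        i=$((i + 1))
    done
    echo "wait"
}

# Example: 4 inputs spread over 2 nodes
plan_tasks "node01 node02" 4
```

Even in this toy form the script has to track node names, input numbering, and the final wait by hand, which is the bookkeeping that grows with every extra node.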


Desired Solutions

§  Avoid detailed scripting requirements, but allow flexibility and adaptability.

§  Minimize customization and make parallelization happen automagically.

§  Be aware of the batch environment, particularly wall-clock time constraints.


GNU Parallel

§  A shell tool to execute independent jobs in parallel using one or more computers
§  A job can be a single command or script
§  Typical input could be a list of files, a list of parameters…
§  Under the hood: the mother superior spawns ssh connections to each remote node/core, where an independent task is conducted

https://www.gnu.org/software/parallel/


GNU Parallel installed on LSU/LONI clusters

•  On SuperMike2:
   soft add +gnuparallel-20161022-gcc-4.4.6

•  On QB2/SMIC:
   module load gnuparallel/20170122

$ parallel --version
GNU parallel 20170122
Copyright (C) 2007,2008,2009,2010,2011,2012,2013,2014,2015,2016,2017 Ole Tange and Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
…


Serial Blast Job (run_blast.sh)

#!/bin/bash
export DATADIR=/project/$USER/GNU_PARALLEL/blast
blastp -query $1 -db $DATADIR/db/img_v400_PROT.00 -out $2 -outfmt 7 -max_target_seqs 10 -num_threads 1

[wfeinste@mike421 blast]$ head input.lst
/project/wfeinste/GNU_PARALLEL/blast/data/test33.faa
/project/wfeinste/GNU_PARALLEL/blast/data/test21.faa
/project/wfeinste/GNU_PARALLEL/blast/data/test12.faa
...


Distribute Serial tasks (blast_job.pbs)

#!/bin/bash
#PBS -l nodes=2:ppn=16
#PBS -l walltime=1:00:00
#PBS -A hpc_hpcadmin3
#PBS -q workq
#PBS -N blast
#PBS -j oe
export WDIR=/project/$USER/GNU_PARALLEL/serial
cd $WDIR;
export JOBS_PER_NODE=16
# parallel command flags
PARALLEL="parallel -j $JOBS_PER_NODE --slf $PBS_NODEFILE --wd $WDIR --joblog logs/runtask.log --resume"
# gnu-parallel launch serial tasks
$PARALLEL -a input.lst sh run_blast.sh {} output/{/.}.out


GNU Parallel Performance (1000 serial blast tasks)


GNU Parallel Syntax

•  Reading command arguments on the command line:
   •  parallel [OPTIONS] COMMAND {} ::: TASKLIST

•  Reading command arguments from an input file:
   •  parallel [OPTIONS] COMMAND {} :::: TASKLIST.LST
   •  parallel -a TASKLIST.LST [OPTIONS] COMMAND {}


TASKLIST from command line
parallel [OPTIONS] COMMAND {} ::: TASKLIST

1) parallel echo {} ::: A B
A
B

2) parallel echo {1} {2} ::: A B ::: C D
A C
A D
B C
B D

3) parallel --link echo {1} {2} ::: A B ::: C D
A C
B D

4) parallel --link app-xxx {1} {2} ::: a1 a2 ::: b1 b2
runs:
app-xxx a1 b1
app-xxx a2 b2
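The difference between two ::: lists with and without --link can be pictured with ordinary shell loops. The sketch below does not call GNU Parallel at all; the function names all_combinations and linked_pairs are hypothetical and only mimic which argument pairs are produced, assuming two lists of equal length.

```shell
#!/bin/sh
# all_combinations "A B" "C D"
# Every item of list 1 paired with every item of list 2
# (the pairing that two ::: lists give without --link).
all_combinations() {
    for a in $1; do
        for b in $2; do
            echo "$a $b"
        done
    done
}

# linked_pairs "A B" "C D"
# i-th item of list 1 paired with i-th item of list 2
# (the pairing that --link gives for lists of equal length).
linked_pairs() {
    rest=$2
    for a in $1; do
        b=${rest%% *}      # first remaining item of list 2
        rest=${rest#* }    # drop that item from the remainder
        echo "$a $b"
    done
}

all_combinations "A B" "C D"
linked_pairs "A B" "C D"
```

The first call prints four pairs and the second prints two, matching examples 2) and 3).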


TASKLIST From a File
•  parallel -a TASKLIST COMMAND {}
•  parallel [OPTIONS] COMMAND {} :::: TASKLIST

$ cat csv-file.csv
1,input1.txt
2,input2.txt

Examples:
1) parallel -a csv-file.csv echo {}
1,input1.txt
2,input2.txt

2) parallel --colsep '\,' -a csv-file.csv echo {1}
1
2

3) parallel --colsep '\,' -a csv-file.csv echo {1} {2}
1 input1.txt
2 input2.txt
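As a rough picture of what column splitting does, the following plain-shell sketch splits each comma-separated line into two fields and prints them with a space, similar to example 3). The function name split_csv is hypothetical, and the sketch assumes exactly two columns with no quoting or embedded commas.

```shell
#!/bin/sh
# split_csv: read "a,b" lines on stdin and print "a b" for each line.
# Assumes exactly two comma-separated columns per line.
split_csv() {
    while IFS=, read -r col1 col2; do
        echo "$col1 $col2"
    done
}

printf '1,input1.txt\n2,input2.txt\n' | split_csv
```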


Manipulate Input String

Default string {}
   parallel echo {} ::: path/input1.fas ---> path/input1.fas
Remove extension {.}
   parallel echo {.} ::: path/input1.fas ---> path/input1
Remove path {/}
   parallel echo {/} ::: path/input1.fas ---> input1.fas
Remove path and extension {/.}
   parallel echo {/.} ::: path/input1.fas ---> input1
Change extension and path
   parallel echo output/{/.}.out ::: path/input1.fas ---> output/input1.out
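For readers who know shell parameter expansion, the replacement strings can be approximated in plain shell. This is only an illustration of the results listed above, not how GNU Parallel implements them; the function names are hypothetical and the expansions are only meant for simple names such as path/input1.fas (no dots in directory names).

```shell
#!/bin/sh
# Plain-shell approximations of the replacement strings for simple file names.

# like {.} : remove the extension
strip_ext() { printf '%s\n' "${1%.*}"; }

# like {/} : remove the directory path
strip_path() { printf '%s\n' "${1##*/}"; }

# like {/.} : remove both the path and the extension
strip_path_ext() {
    base=${1##*/}
    printf '%s\n' "${base%.*}"
}

p=path/input1.fas
strip_ext "$p"                              # path/input1
strip_path "$p"                             # input1.fas
strip_path_ext "$p"                         # input1
echo "output/$(strip_path_ext "$p").out"    # output/input1.out
```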


GNU Parallel Options/Flags
parallel -a input.lst [OPTIONS] COMMAND {}

Options:
--jobs
--slf
--wd
--progress
--joblog
--resume
--timeout
…


--jobs (-j)
•  --jobs N (-j N)
   –  Number of jobs per machine (node). Run up to N jobs in parallel.
   –  0 means as many as possible. The default is 100%, which runs one job per CPU core on each machine.
   –  On HPC/LONI clusters, N is the number of job slots per node.
   –  Make sure you use GNU Parallel version >= 20161022.

•  -j +N or -j -N: run (number of CPU cores + N) or (number of CPU cores - N) jobs on each node.


--slf (--sshloginfile)
--slf nodefile (--sshloginfile $PBS_NODEFILE)

Used when more than one node is requested for a job.

$ cat $PBS_NODEFILE
mike421
mike421
...
mike421
mike429
mike429
...
mike429

16 cores on mike421
16 cores on mike429
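A quick way to check which nodes and how many slots a job received is to count the repeated host names in the node file. The sketch below assumes one host name per line, as in the listing above; the function name cores_per_node and the shortened stand-in file (3 entries per node instead of 16) are only for illustration.

```shell
#!/bin/sh
# cores_per_node FILE
# Print "host count" for each distinct host name in a node file
# that lists one host name per line.
cores_per_node() {
    sort "$1" | uniq -c | awk '{ print $2, $1 }'
}

# Stand-in node file for demonstration (a real one comes from the batch system)
demo=$(mktemp)
printf 'mike421\nmike421\nmike421\nmike429\nmike429\nmike429\n' > "$demo"
cores_per_node "$demo"
rm -f "$demo"
```

Inside a running PBS job the same function could be pointed at the real file with cores_per_node "$PBS_NODEFILE".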


--wd (--workdir)

•  The default working directory on remote machines is the login directory (~)

•  Can be changed with --wd, e.g. --wd $PBS_O_WORKDIR


--joblog
•  Generates a log file with one entry for each completed sub-task
•  Used as checkpoints to resume unfinished tasks (--resume)
•  Identifies failed jobs

Seq  Host   Starttime       JobRuntime  Send  Receive  Exitval  Signal  Command
1    qb068  1487624896.425  256.295     0     65990    0        0       sh run_namd.sh /project/wfeinste/GNU_PARALLEL/MPI/input/ubq_ws_eq.conf 8
3    qb068  1487624896.435  256.294     0     65832    0        0       sh run_namd.sh /project/wfeinste/GNU_PARALLEL/MPI/input/ubq_ws_eq2.conf 8
2    qb058  1487624896.430  259.486     0     65849    0        0       sh run_namd.sh /project/wfeinste/GNU_PARALLEL/MPI/input/ubq_ws_eq1.conf 8
…
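Because the job log is a tab-separated table with a header row, failed sub-tasks can be pulled out with a one-line awk filter. The sketch below assumes the column order shown above (Exitval in column 7, Signal in column 8, Command in column 9); the function name failed_jobs and the two sample rows are made up for the demonstration.

```shell
#!/bin/sh
# failed_jobs LOG
# Print "Seq Exitval Command" for every job log entry whose exit value
# or signal is non-zero. Assumes a tab-separated log with one header line.
failed_jobs() {
    awk -F '\t' 'NR > 1 && ($7 != 0 || $8 != 0) { print $1, $7, $9 }' "$1"
}

# Made-up sample log: job 1 succeeded, job 2 exited with status 1
log=$(mktemp)
printf 'Seq\tHost\tStarttime\tJobRuntime\tSend\tReceive\tExitval\tSignal\tCommand\n' > "$log"
printf '1\tnodeA\t1487624896.425\t256.295\t0\t65990\t0\t0\tsh run_task.sh a.conf\n' >> "$log"
printf '2\tnodeB\t1487624896.430\t12.100\t0\t120\t1\t0\tsh run_task.sh b.conf\n' >> "$log"
failed_jobs "$log"
rm -f "$log"
```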


--resume
--resume --joblog job.log

•  Allows GNU Parallel to resume a run where it left off after a failure, a timeout, etc.

•  If you need to rerun a GNU Parallel job from the start, be sure to delete the old job.log
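Before resubmitting, it can help to see how much work is left. The sketch below compares the number of lines in the task list with the number of entries in the job log. It assumes one task per input line and one log row per finished task, and it counts a row as finished regardless of its exit value; the function name remaining_tasks and the demo files are hypothetical.

```shell
#!/bin/sh
# remaining_tasks INPUT_LIST JOBLOG
# Number of input lines that have no job log entry yet.
# A missing job log means nothing has run, so every task remains.
remaining_tasks() {
    total=$(awk 'END { print NR }' "$1")
    if [ -f "$2" ]; then
        # skip the header row, count the rest
        done_count=$(awk 'NR > 1 { n++ } END { print n + 0 }' "$2")
    else
        done_count=0
    fi
    echo $((total - done_count))
}

# Demo: 3 tasks listed, 1 already logged
list=$(mktemp); log=$(mktemp)
printf 'a.faa\nb.faa\nc.faa\n' > "$list"
printf 'Seq\tHost\n1\tnodeA\n' > "$log"
remaining_tasks "$list" "$log"
rm -f "$list" "$log"
```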


--progress
Show overall computational progress

Computers / CPU cores / Max jobs to run
1:mike136 / 16 / 16
2:mike264 / 16 / 16
Computer:jobs running/jobs completed/%of started jobs/Average seconds to complete
mike136:16/0/50%/0.0s mike264:16/0/50%/0.0s
using input: ../blast/data/test1.faa output/test1.out
mike136:16/1/51%/31.0s mike264:16/0/48%/0.0s
using input: ../blast/data/test2.faa output/test2.out
mike136:16/1/50%/33.0s mike264:16/1/50%/33.0s
using input: ../blast/data/test3.faa output/test3.out


--timeout secs
•  A job is terminated if the command runs longer than the given number of seconds.
•  Useful if you know the command must have failed when it runs longer than a threshold.


Options to limit resources
•  To avoid overloading systems, GNU Parallel can look at the system load before starting another job, e.g.:
   parallel --load 100% echo load is less than {} job per cpu ::: 1

•  Check if the system is swapping, e.g.:
   parallel --noswap echo the system is not swapping ::: now

•  Use --memfree to check if there is enough free memory. The youngest job is killed if the free memory falls below 50% of the given size. The killed job is put back on the queue and retried later. e.g.:
   parallel --memfree 1G echo will run if more than 1 GB is ::: free


Three Examples
•  Serial job (blastp using a single core)
   blastp -query input.fna -db db/xxx -out output/input.out -num_threads 1

•  Multi-threaded job (blastp using multiple cores)
   blastp -query input.fna -db db/xxx -out output/input.out -num_threads 4

•  MPI jobs (namd)
   mpirun -np 2 `which namd2` input.conf


Serial Example
•  Serial job (blastp using a single core)
   blastp -query input.fna -db db/xxx -out output/input.out -num_threads 1


Serial Blast Job (serial/run_blast.sh)

#!/bin/bash
export DATADIR=/project/$USER/GNU_PARALLEL/blast
blastp -query $1 -db $DATADIR/db/img_v400_PROT.00 -out $2 -outfmt 7 -max_target_seqs 10 -num_threads 1


Distribute Serial tasks (serial/blast_job.pbs)

#!/bin/bash
#PBS -l nodes=2:ppn=16
#PBS -l walltime=1:00:00
#PBS -A hpc_hpcadmin3
#PBS -q workq
#PBS -N blast
#PBS -j oe
export WDIR=/project/$USER/GNU_PARALLEL/serial
cd $WDIR;
export JOBS_PER_NODE=16
# parallel command options
PARALLEL="parallel -j $JOBS_PER_NODE --slf $PBS_NODEFILE --wd $WDIR --joblog logs/runtask.log --resume"
# gnu-parallel launch serial tasks
$PARALLEL -a input.lst sh run_blast.sh {} output/{/.}.out


Steps for Testing

It is VERY important to test your scripts before submitting hundreds of serial tasks.
1.  Test your serial task
2.  Test the parallel job interactively


Test Serial Script (Step 1)

Step 1:
blastp -query $DATADIR/data/test1.faa -db $DATADIR/db/img_v400_PROT.00 -out $WORKDIR/output/test1.out -outfmt 7 -max_target_seqs 10 -num_threads 1

#!/bin/bash
export DATADIR=/project/$USER/GNU_PARALLEL/blast
blastp -query $1 -db $DATADIR/db/img_v400_PROT.00 -out $2 -outfmt 7 -max_target_seqs 10 -num_threads 1


Test GNU Parallel (Step 2)

Step 2:
•  Request an interactive node
   qsub -I -A xxx -l nodes=1:ppn=16 -l walltime=2:00:00
•  Interactively run the job script commands line by line

#PBS …
export WDIR=/project/$USER/GNU_PARALLEL/serial
cd $WDIR;
export JOBS_PER_NODE=16
# parallel command options
PARALLEL="parallel -j $JOBS_PER_NODE --slf $PBS_NODEFILE --wd $WDIR --joblog logs/task.log --resume"
# gnu-parallel launch serial tasks
$PARALLEL -a input.lst sh run_blast.sh {} output/{/.}.out


Multi-thread Example
•  Multi-threaded job (blastp using multiple cores)
   blastp -query input.fna -db db/xxx -out output/input.out -num_threads 4


Multi-thread Blast Task (multi_threads/run_blast.sh)

#!/bin/bash
echo "using input: $1 $2 $3"
export DATADIR=/project/$USER/GNU_PARALLEL/blast
blastp -query $1 -db $DATADIR/db/img_v400_PROT.00 -out $2 -outfmt 7 -max_target_seqs 10 -num_threads $3


Distribute multi-thread tasks (multi_threads/blast_job.pbs)

#!/bin/bash
#PBS -l nodes=2:ppn=16
#PBS -l walltime=1:00:00
#PBS -A hpc_hpcadmin3
#PBS -q workq
#PBS -N blast-mt-parallel
#PBS -j oe
export WDIR=/project/$USER/GNU_PARALLEL/multi_threads
cd $WDIR;
export JOBS_PER_NODE=8
export NTHREADS=2
# parallel command options
PARALLEL="parallel -j $JOBS_PER_NODE --slf $PBS_NODEFILE --wd $WDIR --joblog logs/runtask.log --resume"
# gnu-parallel launch multi-thread tasks
$PARALLEL -a input.lst sh run_blast.sh {} output/{/.}.out $NTHREADS
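The values JOBS_PER_NODE=8 and NTHREADS=2 fit together with ppn=16: each task occupies 2 cores, so 8 tasks fill one node. A small sketch of that arithmetic, assuming the goal is to use every core exactly once (the function name jobs_per_node is hypothetical):

```shell
#!/bin/sh
# jobs_per_node CORES THREADS
# How many tasks of THREADS threads each fit on a node with CORES cores
# (integer division, so leftover cores stay unused).
jobs_per_node() {
    echo $(( $1 / $2 ))
}

jobs_per_node 16 2    # 16 cores, 2 threads per task
jobs_per_node 16 1    # 16 cores, serial tasks
```

With 16 cores and 2 threads per task the result is 8; with serial tasks it is 16, the value used in the serial job script.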


MPI Example
•  MPI jobs (namd)
   mpirun -np 2 `which namd2` input.conf


Single MPI NAMD Task (MPI/run_namd.sh)

#!/bin/bash
echo "$1 $2 $3"
mpirun -np $2 `which namd2` $1 > $3


Distribute MPI Jobs (MPI/namd_job.pbs)

#!/bin/bash
#PBS -l nodes=2:ppn=16
#PBS -l walltime=1:00:00
#PBS -A hpc_hpcadmin3
#PBS -q workq
#PBS -N blast
#PBS -j oe
export WDIR=/project/$USER/GNU_PARALLEL/MPI
cd $WDIR;
export JOBS_PER_NODE=2
export NPROCS=8 # launch 8 MPI processes/task
# parallel command options
PARALLEL="parallel -j $JOBS_PER_NODE --slf $PBS_NODEFILE --wd $WDIR --joblog logs/runtasks.log --resume"
# gnu-parallel launch MPI tasks
$PARALLEL -a input.lst sh run_namd.sh {} $NPROCS output/{/.}.log

[wfeinste@mike421 blast]$ cat input.lst
/project/wfeinste/GNU_PARALLEL/MPI/input/ubq_ws_eq.conf
/project/wfeinste/GNU_PARALLEL/MPI/input/ubq_ws_eq1.conf
/project/wfeinste/GNU_PARALLEL/MPI/input/ubq_ws_eq2.conf
/project/wfeinste/GNU_PARALLEL/MPI/input/ubq_ws_eq3.conf
….


Load Balancing
•  GNU Parallel starts the next job as soon as the previous one finishes
•  CPUs are kept as busy as possible

[Figure panels: No Load Balancing | With Load Balancing]
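The effect of starting the next job as soon as a slot frees up can be illustrated with a toy calculation. The sketch below is not GNU Parallel's scheduler; it simply compares, for two workers and made-up task durations, a fixed alternating assignment with an assignment where each task goes to whichever worker becomes free first. The function names and the durations are hypothetical.

```shell
#!/bin/sh
# static_makespan "d1 d2 ..."
# Tasks are pre-assigned alternately to two workers; prints the time
# at which the busier worker finishes.
static_makespan() {
    w1=0; w2=0; turn=1
    for d in $1; do
        if [ "$turn" -eq 1 ]; then
            w1=$((w1 + d)); turn=2
        else
            w2=$((w2 + d)); turn=1
        fi
    done
    if [ "$w1" -gt "$w2" ]; then echo "$w1"; else echo "$w2"; fi
}

# dynamic_makespan "d1 d2 ..."
# Each task, in order, starts on whichever of the two workers is free first.
dynamic_makespan() {
    w1=0; w2=0
    for d in $1; do
        if [ "$w1" -le "$w2" ]; then
            w1=$((w1 + d))
        else
            w2=$((w2 + d))
        fi
    done
    if [ "$w1" -gt "$w2" ]; then echo "$w1"; else echo "$w2"; fi
}

tasks="9 1 9 1 9 1"          # made-up durations in arbitrary time units
static_makespan "$tasks"     # prints 27
dynamic_makespan "$tasks"    # prints 19
```

With the alternating assignment one worker receives all three long tasks while the other sits idle; handing out tasks as workers free up shortens the total run from 27 to 19 time units in this example.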


Conclusions
•  GNU Parallel is an effective tool and easy to use
•  Parallelizes independent jobs, often with different parameters
   •  Serial tasks (1-core)
   •  Multi-thread tasks (multiple cores)
   •  MPI jobs

•  Be wise about how many nodes to request
   •  Too many ssh connections from the mother superior
   •  Too many CPUs may sit idle waiting for the last few tasks to finish

•  Test thoroughly before starting any production runs


Future Trainings

1. February 22, 2017: Parallel Serial Jobs Using GNU Parallel
2. March 8, 2017: …
3. …, 2017: …
4. …, 2017: …
5. …, 2017: Intermediate Python Programming
6. April 5, 2017: Machine Learning in HPC Environments

http://www.hpc.lsu.edu/training/tutorials.php#upcoming
