Workshop: Using the VIC3 Cluster for Statistical Analyses Support perspective G.J. Bex

Dec 16, 2015

Page 1

Workshop: Using the VIC3 Cluster for Statistical Analyses

Support perspective

G.J. Bex

Page 2

Overview

• Cluster VIC3: hardware & software
• Statistics research scenario
• Worker framework
• MapReduce with Worker
• Q&A

Page 3

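Bird's-eye view of VIC3

[Diagram: login nodes login1 and login2; service nodes svcs1 and svcs2; compute nodes r1i0n0, r1i0n1, …, r1i3n15 and r2i0n0, r2i0n1, …, r2i3n15; netapp storage (labelled with ~vsc30034 and /bin)]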

Page 4

VIC3 nodes

• Compute nodes
  – 112 nodes with 2 quad-core 'harpertown', 8 GB RAM
  – 80 nodes with 2 quad-core 'nehalem', 24 GB RAM
  – 6 nodes with 2 quad-core 'nehalem', 72 GB RAM and local hard disk
• Storage
  – 20 TB disk space shared between home directories and scratch space, access via NFS
  – 4 nodes with disks for a parallel file system (needed for MPI I/O jobs)
• Service nodes include 2 login nodes

1584 cores, for 16.6 TFlop (theoretical peak)
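The core count can be checked from the node counts listed on this slide. A quick sketch in shell arithmetic, assuming every compute node contributes 2 × 4 cores (variable names are illustrative):

```shell
# Node counts from the slide; each node has 2 quad-core processors.
cores_per_node=$((2 * 4))
total_nodes=$((112 + 80 + 6))
total_cores=$((total_nodes * cores_per_node))
echo "$total_nodes compute nodes, $total_cores cores"
```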

Page 5

What can you run?

• All open source Linux software
• All Linux software for which K.U.Leuven has a license that covers the cluster, provided you are a K.U.Leuven staff member
• All Linux software for which you have a license that covers the cluster
• No Windows software

R, SAS, MATLAB are OK for K.U.Leuven & UHasselt users


Page 7

Running example: SAS code

• Your SAS program, e.g., 'clmk.sas'
  – is usually interactive
  – depends on parameters, e.g.,
    • type of distribution
    • alpha, beta
  – has to be run for several types and values of alpha and beta

Page 8

Running example: batch mode

• 1st step: convert it for batch mode
  – capture command line variables:

    …
    %LET type = "%scan(&sysparm, 1, %str(:))";
    %LET alpha = %scan(&sysparm, 2, %str(:));
    %LET beta = %scan(&sysparm, 3, %str(:));
    …

  – run it from the command line:

    $ sas -batch -noterminal -sysparm discr:1.3:15.0 clmk.sas
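The -sysparm value is a single colon-separated string that the SAS code takes apart with %scan. As a rough shell analogue (a sketch for checking a parameter string before passing it to SAS, not part of the original slides), the same split can be done with read:

```shell
# Split a colon-separated parameter string into its three fields,
# mirroring %scan(&sysparm, n, %str(:)) in the SAS code.
sysparm="discr:1.3:15.0"
IFS=: read -r type alpha beta <<EOF
$sysparm
EOF
echo "type=$type alpha=$alpha beta=$beta"
```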

Page 9

I've got a job to do: PBS files

[Diagram: login node → queue system/scheduler (Torque/Moab) → compute nodes]

clmk.pbs:

    #!/bin/bash -l
    module load SAS/9.2
    cd $PBS_O_WORKDIR

    sas -batch -noterminal \
        -sysparm discr:1.3:15.0 clmk.sas

$ msub clmk.pbs

Page 10

No more modifying!

Before: parameter values hard-coded in clmk.pbs

    #!/bin/bash -l
    module load SAS/9.2
    cd $PBS_O_WORKDIR

    sas -batch -noterminal \
        -sysparm discr:1.3:15.0 clmk.sas

$ msub clmk.pbs

After: parameter values passed in as variables

    #!/bin/bash -l
    module load SAS/9.2
    cd $PBS_O_WORKDIR

    sas -batch -noterminal \
        -sysparm $type:$alpha:$beta clmk.sas

$ msub clmk.pbs -v type=discr,alpha=1.3,beta=15.0
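The variables listed after -v reach the job script through its environment. That can be emulated locally without the scheduler. The sketch below uses a stand-in for the last line of the job script that only prints the sas command; job_body is a hypothetical name, and the ${var:?} guards are an addition that makes the script fail early when a variable is missing.

```shell
# Stand-in for the job script body: print the sas command instead of running it.
# ${type:?} aborts with an error if the variable is unset or empty.
job_body='echo "sas -batch -noterminal -sysparm ${type:?}:${alpha:?}:${beta:?} clmk.sas"'

# Emulate "msub ... -v type=discr,alpha=1.3,beta=15.0" by setting the
# variables in the environment of a child shell.
cmd=$(type=discr alpha=1.3 beta=15.0 sh -c "$job_body")
echo "$cmd"
```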

Page 11

Going parallel… or nuts?

• Parameter sets…
  – are independent, so computations can be done in parallel!
  – but all combinations of type, alpha, beta: large number of jobs

Worker framework


Page 13

Conceptually

  type   alpha  beta
  discr  1.3    15.0
  discr  1.3    30.0
  discr  1.8    15.0
  discr  1.8    30.0
  …      …      …
  cont   1.3    15.0
  …      …      …

    #!/bin/bash -l
    module load SAS/9.2
    cd $PBS_O_WORKDIR

    sas -batch -noterminal \
        -sysparm $type:$alpha:$beta clmk.sas

Page 14

Concrete

clmk.csv (N rows):

  type   alpha  beta
  discr  1.3    15.0
  discr  1.3    30.0
  discr  1.8    15.0
  discr  1.8    30.0
  …      …      …
  cont   1.3    15.0
  …      …      …

clmk.pbs:

    #!/bin/bash -l
    module load SAS/9.2
    cd $PBS_O_WORKDIR

    sas -batch -noterminal \
        -sysparm $type:$alpha:$beta clmk.sas

$ module load worker/1.0
$ wsub -data clmk.csv -batch clmk.pbs -l nodes=2:ppn=8

N rows will be computed in parallel by 2 × 8 - 1 = 15 cores
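The data file does not have to be typed by hand. A minimal sketch that generates it with nested loops follows. It assumes that Worker expects comma-separated values with the variable names in the first row, and the value lists are illustrative, since the slide elides the full parameter grid.

```shell
# Write a header row, then one row per parameter combination.
# Assumption: comma-separated, first row holds the variable names.
csv=clmk.csv
echo "type,alpha,beta" > "$csv"
for type in discr cont; do
    for alpha in 1.3 1.8; do
        for beta in 15.0 30.0; do
            echo "$type,$alpha,$beta" >> "$csv"
        done
    done
done

# Number of parameter rows, not counting the header.
n_rows=$(( $(wc -l < "$csv") - 1 ))
echo "$n_rows parameter rows written to $csv"
```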

Page 15

Caveat 1: time is of the essence…

• How long does your job need? (= walltime)
  – time to compute N rows, divided by the number of requested cores
• walltime limitations (no hard limits, but guidelines to reduce queue time)
  – more than 5 minutes
  – less than 2 days
• hence, if walltime exceeds 2 days, split data and submit multiple jobs
• explicitly request sufficient walltime:

    $ wsub -data clmk.csv -batch clmk.pbs \
           -l nodes=2:ppn=8,walltime=36:00:00
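A back-of-the-envelope estimate of the walltime to request can be scripted. The sketch below assumes every row takes about the same time and, slightly more conservatively than the rule of thumb above, divides the work over the requested cores minus one for the master. estimate_walltime and the example numbers are hypothetical.

```shell
# estimate_walltime N_ROWS SECONDS_PER_ROW CORES
# Prints an HH:MM:SS estimate; add a safety margin before using it.
estimate_walltime() {
    n_rows=$1; secs_per_row=$2; cores=$3
    workers=$((cores - 1))                           # one core acts as master
    rounds=$(( (n_rows + workers - 1) / workers ))   # ceiling division
    total=$((rounds * secs_per_row))
    printf '%02d:%02d:%02d\n' \
        $((total / 3600)) $((total % 3600 / 60)) $((total % 60))
}

# Example: 600 rows of 30 minutes each on nodes=2:ppn=8 (16 cores).
estimate_walltime 600 1800 16
```

If the estimate comes out above two days, that is the signal to split the data file and submit several jobs.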

Page 16

Caveat 2: slave labour

• P cores, how to choose P?
  – functions
    • 1 master
    • P - 1 slaves
  – each compute node has 8 cores, so P mod 8 = 0
  – N >> P: better load balancing, efficiency
  – larger P
    • shorter walltime
    • (potentially) longer time in queue

shortest turn-around: hard to predict

turn-around = queue time + walltime
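The effect of N relative to P can be illustrated with a toy model. The sketch below assumes that all rows take equally long and that the master core does no computing, and reports roughly which percentage of the requested core time is spent on actual work. utilisation is a hypothetical helper, not part of Worker.

```shell
# utilisation N_ROWS CORES
# Prints the percentage of requested core time spent computing rows,
# under the assumption of equal row times and one idle master core.
utilisation() {
    n_rows=$1; cores=$2
    workers=$((cores - 1))
    rounds=$(( (n_rows + workers - 1) / workers ))   # ceiling division
    echo $(( 100 * n_rows / (rounds * cores) ))
}

for n in 20 100 1500; do
    echo "N=$n, P=16: about $(utilisation "$n" 16)% of core time used"
done
```

With P = 16 the figure can never exceed 15/16, since one core is reserved for the master; small N falls well short of that because the last round leaves workers idle.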

Page 17

Caveat 3: independence

    #!/bin/bash -l
    module load SAS/9.2
    cd $PBS_O_WORKDIR

    log_name="clmk-$type-$alpha-$beta.log"
    print_name="clmk-$type-$alpha-$beta.lst"

    sas -batch -noterminal \
        -log $log_name \
        -print $print_name \
        -sysparm $type:$alpha:$beta clmk.sas

SAS locks log and output files!

Make sure each computation writes to its own files!
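File names built from the parameters are only unique if no parameter combination occurs twice in the data file. A small check before submitting might look like the sketch below. duplicate_names is a hypothetical helper, and the CSV layout (header row, comma-separated) is an assumption.

```shell
# duplicate_names CSV_FILE
# Prints every output base name that more than one data row would produce.
duplicate_names() {
    tail -n +2 "$1" |
        awk -F, '{ print "clmk-" $1 "-" $2 "-" $3 }' |
        sort | uniq -d
}

# Demonstration on a throwaway file in which one row is repeated.
demo=$(mktemp)
printf '%s\n' 'type,alpha,beta' \
    'discr,1.3,15.0' 'discr,1.8,15.0' 'discr,1.3,15.0' > "$demo"
duplicate_names "$demo"
rm -f "$demo"
```

An empty result means every row maps to its own .log and .lst file.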


Page 19

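Conceptually: MapReduce

[Diagram: data.txt is split into data.txt.1, data.txt.2, …, data.txt.7; the map step turns each data.txt.i into result.txt.i; the reduce step combines result.txt.1, result.txt.2, …, result.txt.7 into result.txt]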

Page 20

Concrete: -prolog & -epilog

[Diagram: prolog.sh splits data.txt into data.txt.1, data.txt.2, …, data.txt.7; batch.sh turns each data.txt.i into result.txt.i; epilog.sh combines result.txt.1, result.txt.2, …, result.txt.7 into result.txt]

    $ wsub -prolog prolog.sh -batch batch.sh \
           -epilog epilog.sh -l nodes=3:ppn=8
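What the three scripts might contain can be tried out locally before involving the cluster. The sketch below runs the three phases one after another in a scratch directory. The toy input, the "work" (summing numbers) and the chunk count of 7 are placeholders. How Worker tells batch.sh which chunk to process is not shown on the slide, so a plain loop stands in for that here.

```shell
# Local, sequential illustration of the prolog / batch / epilog pattern.
n_chunks=7
workdir=$(mktemp -d)

# Toy input: the numbers 1..70, one per line.
awk 'BEGIN { for (i = 1; i <= 70; i++) print i }' > "$workdir/data.txt"

# prolog: deal the lines round-robin into data.txt.1 .. data.txt.7
awk -v n="$n_chunks" -v base="$workdir/data.txt" \
    '{ print > (base "." ((NR - 1) % n + 1)) }' "$workdir/data.txt"

# batch: map step, once per chunk (here: sum the numbers in the chunk)
i=1
while [ "$i" -le "$n_chunks" ]; do
    awk '{ s += $1 } END { print s }' "$workdir/data.txt.$i" \
        > "$workdir/result.txt.$i"
    i=$((i + 1))
done

# epilog: reduce step, combine the partial results into result.txt
cat "$workdir"/result.txt.* |
    awk '{ s += $1 } END { print s }' > "$workdir/result.txt"
cat "$workdir/result.txt"
```

Because the partial sums are combined by addition, the order in which the chunks finish does not matter, which is what makes the map step safe to run in parallel.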


Page 22

Where to find help?

• http://www.vscentrum.be/vsc-help-center
• [email protected]
• http://status.kuleuven.be/hpc
• UHasselt staff: [email protected]