Page 1

BIOSTAT LINUX CLUSTER
By Helen Wang
October 10, 2013

Page 2

What is a Beowulf Cluster?

• A Beowulf cluster is a computer cluster of what are normally identical, commodity-grade computers networked into a small local area network, with libraries and programs installed which allow processing to be shared among them. The result is a high-performance parallel computing cluster built from inexpensive hardware.

• The name Beowulf originally referred to a specific computer built in 1994 by Thomas Sterling and Donald Becker at NASA. The name "Beowulf" comes from the main character of the Old English epic poem Beowulf; Sterling chose it because the eponymous hero is described as having "thirty men's heft of grasp in the gripe of his hand".

• There is no particular piece of software that defines a cluster as a Beowulf. Beowulf clusters normally run a Unix-like operating system, such as BSD, Linux, or Solaris, normally built from free and open-source software. Commonly used parallel processing libraries include the Message Passing Interface (MPI) and Parallel Virtual Machine (PVM). Both of these permit the programmer to divide a task among a group of networked computers and collect the results of processing. We use PBS Pro (Portable Batch System) from Altair Engineering on our cluster.

• Beowulf systems are now deployed worldwide, chiefly in support of scientific computing.

Page 3

In Short

What is a Beowulf cluster?

A Beowulf cluster is a group of commodity computers connected via a local area network. Each computer, or node, which can have a single processor or multiple processors, runs its own copy of an open-source Unix-like operating system such as Linux, BSD, or Solaris.

Page 4

Basic Beowulf Cluster Structure

Page 5

A brief look at our cluster

Page 6

What is our cluster configuration?

• The VCU Department of Biostatistics hosts a newly updated Linux Beowulf cluster, installed with applications supporting computationally intensive processing. The following description can be used for the "Computing Resources" section of grant applications.

- 31 Dell PE R610/R620 servers with CentOS 6 64-bit Linux OS
- 320 cores using Intel Xeon 56XX processors (2.67GHz to 3.4GHz)
- 2.784TB total RAM (80GB-128GB per node)
- 100TB network-attached storage with 100TB backup storage and a 10GbE connection
- 9TB internal disk storage (120GB-900GB per node)
- 10GbE network connections to all nodes and storage
- Fail-over redundant master servers

Page 7

Software available on cluster

• R 3.0.1 with CRAN and Bioconductor packages
• RStudio Server
• C++ (g++) and Fortran compilers
• Java compiler
• Perl interpreter
• Python with Biopython
• SAS 9.3 and SAS 9.4 for 64-bit Linux
• PLINK
• HAPLO
• MPI and OpenMPI
• PBS Pro 10.4 (a portable batch system for the cluster)
• Additional open-source software upon user request

Page 8

Biostat Beowulf Cluster Login Info

• Server name: merlot.bis.vcu.edu (IP: 128.172.4.89)
• 2nd server as failover: blanc.bis.vcu.edu (IP: 128.172.4.90, invisible on mission)
• Software recommended to access the servers:
  PC users: 1. MobaXterm (http://mobaxterm.mobatek.net/)  2. ssh / OpenSSH / PuTTY / WinSCP
  Mac users: Mac Terminal, Fetch

Page 9

Access Cluster: Server and Nodes

Master node (master1 / master2): merlot.bis.vcu.edu
- Running CentOS (Red Hat kernel) version 5.5, x86-64
- For open-source or other software downloads, choose 64-bit CentOS or RHEL 5 if possible
- Purposes: front-end user interface; slow; not for running any jobs. Jobs running on the master will be terminated without notice. Accessible from outside through VPN.

Slave nodes (nodes): node1-node29: dual quad-core or 6-core Intel Xeon processors with 80-128GB RAM
- Purposes: computation; not meant for direct interactive use; accessible via the master and managed by the Portable Batch System (PBS); fast; on the internal network (10.0.0.X), not accessible directly from outside.

Page 10

Access Merlot from your computer

How to use MobaXterm or ssh to access the server:
• Outside of VCU, please use the VCU VPN client: http://www.ts.vcu.edu/software-center/general-purpose/juniper-vpn/
• Open a new session, then choose SSH to add the server name
• Open "SSH settings" to fill in the information:
  remote hostname: 128.172.4.89
  username: YOUR_ACCOUNT_NAME
  port number: 22
• Open session settings to put merlot in Session Name
• Use ssh -X USERACCT@SERVER_IP for graphical access
• Select the server to test the connection and exchange keys by giving your password
• Create a profile or bookmark for easy access every time
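
If you prefer a plain ssh client (Mac Terminal, or OpenSSH on a PC), here is a minimal sketch of the same connection, assuming a hypothetical account name jdoe (replace it with your own):

# Connect to the master node with X11 forwarding for graphical programs
ssh -X jdoe@merlot.bis.vcu.edu
# Or connect by IP address
ssh -X jdoe@128.172.4.89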

Page 11

UNIX Commands You Need to Know
• pwd
• clear
• mkdir
• cp
• cd
• mv
• ls
• head
• more / less
• wc
• man
• rm
• chmod
• grep (with head and tail)
• nano
• sed
• cut
• top
• scp acct1@server1:/yourpath/yourfiles acct2@server2:/yourpath/
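
As a quick illustration, a sketch of a short session using several of these commands (the directory and file names are hypothetical):

pwd                            # show the current directory
mkdir project1                 # create a working directory
cd project1                    # move into it
cp ~/data/input.txt .          # copy a file here (hypothetical path)
head input.txt                 # show the first 10 lines
grep "gene" input.txt | wc -l  # count the lines containing "gene"
man grep                       # read the manual page for grep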

Page 12

How to use cluster to submit jobs

IMPORTANT

• The logon machine (MERLOT.BIS.VCU.EDU) is used only for login. Jobs running on merlot will be terminated without notice.

What do you need to submit a job via PBS

• An executable script: this can be program code (R, SAS, or another language), a shell script, or a collection of command lines.

• Test how much memory and time you may need before you submit multiple jobs (see the sketch after this list).

• Know which queue to use to submit the job.
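
One way to test resource needs before bulk submission, a sketch using the interactive-job syntax described on page 18 (the program name is hypothetical):

# Request a short interactive session on one processor of one node
qsub -I -l nodes=1:ppn=1 -l walltime=1:00:00
# On the compute node, time a trial run of your program
cd $PBS_O_WORKDIR
time ./myprogram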

Node and queue configuration

• Nodes: Nodes are the physical computer servers joined together to make up the cluster.

• Queues: Queues are used by the PBS scheduler to send jobs to different nodes; each node is assigned to a queue to handle a different type of job.

• Most queue limits can be checked by running the command qstat -q.

Note that if you need to run more jobs than your limit allows, please send a request to the system administrator and your supervisor for a temporary expansion of your job-submission limit.

Page 13

Nodes and queues configuration

Queue Name       Nodes Assigned   Job Limit (proc/user)   Comments
sasq             1, 2             10                      Run SAS jobs
serial           3-14, 16-19      20                      Run R and generic jobs, RAM < 2GB
teach            15               8                       Teaching only
workq            20-22            5                       Run large-memory jobs, RAM < 10GB
TBD (parallel)   23-29            20-30                   Run special jobs (TBD)
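
The queue can also be chosen on the qsub command line rather than inside the script; a sketch, assuming a hypothetical SAS job script named myjob.pbs:

# Submit directly to the sasq queue
qsub -q sasq myjob.pbs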

Page 14

Submitting a Job

Jobs are submitted to a PBS queue so that PBS can dispatch them to be run on one or more of a cluster's compute nodes. There are two main types of PBS jobs:

• Non-interactive Batch Jobs: This is the most common PBS job. A job script is created that contains PBS resource requests and the commands necessary to execute the job. The job script is then submitted to PBS to be run non-interactively.

• Interactive Batch Jobs: This is a way to get an interactive terminal on one or more of the compute nodes of a cluster. Commands can then be run interactively through that terminal directly on the compute nodes for the duration of the job. Interactive jobs are helpful for such things as program debugging and running many short jobs.

Page 15

A PBS script is a standard Unix/Linux shell script that contains a few extra comments at the beginning that specify directives to PBS. These comments all begin with #PBS.

• The most important PBS directives are:

#PBS -l walltime=HH:MM:SS
This directive specifies the maximum walltime (real time, not CPU time) that a job should take. If this limit is exceeded, PBS will stop the job. Keeping this limit close to the actual expected time of a job can allow the job to start more quickly than if the maximum walltime is always requested.

#PBS -l pmem=SIZEgb
This directive specifies the maximum amount of physical memory used by any process in the job. For example, if the job runs four processes and each uses up to 2GB (gigabytes) of memory, the directive would read #PBS -l pmem=2gb.

#PBS -l nodes=N:ppn=M
This specifies the number of nodes (nodes=N) and the number of processors per node (ppn=M) that the job should use. PBS treats a processor core as a processor, so a system with eight cores per compute node can have ppn=8 as its maximum ppn request. Note that unless a job has some inherent parallelism of its own, through something like MPI or OpenMPI, requesting more than a single processor on a single node is usually wasteful and can impact the job start time.

#PBS -q queuename
This specifies which PBS queue the job should be submitted to. This is only necessary if a user has access to a special queue; the option can and should be omitted for jobs being submitted to a system's default queue.

#PBS -j oe
Normally when a command runs it prints its output to the screen, as both normal output and error output. This directive tells PBS to put both normal output and error output into the same output file.

Page 16

An example of PBS script

#This is a sample PBS script. It will request 1 processor on 1 node for 10 hours.
#
#Request 1 processor on 1 node
#PBS -l nodes=1:ppn=1
#
#Request 10 hours of walltime
#PBS -l walltime=10:00:00
#
#Request 1 gigabyte of memory per process
#PBS -l mem=1gb
#
#Request that regular output and terminal output go to the same file
#PBS -j oe
#
#The following is the body of the script. By default, PBS scripts execute in your home directory, not the
#directory from which they were submitted. The following line places you in the directory from which the job
#was submitted.
#
cd $PBS_O_WORKDIR
#
#Now we want to run the program "hello". "hello" is in the directory that this script is being submitted from,
#$PBS_O_WORKDIR.
#
echo " "
echo " "
echo "Job started on `hostname` at `date`"
./hello
echo " "
echo "Job Ended at `date`"
echo " "

Page 17

Template used on cluster
• Modify the template to create your own PBS script for running programs:

#!/bin/bash
#PBS -q serial
#PBS -N MYSCRIPT
#
## cd to the directory from which I submitted the job. Otherwise it will execute in my home directory.
#set WORKDIR = ~/YOURWORKDIR
#PBS -V
#
echo "PBS batch job id is $PBS_JOBID"
echo "Working directory of this job is: " $WORKDIR
#
echo "Beginning to run job"
#
# Replace the line below with the command line you need to execute the job
# (for example: /home/huan/bin/calculate -PARAMETERS)

Page 18

• Job Submission Syntax

qsub SCRIPTFILE

Existing job submission scripts

– Located in /usr/local/bin/q*
– R users: qR YOUR_R_SCRIPT
– Large-memory R jobs: qRL YOUR_R_SCRIPT
– SAS users: qsas YOUR_SAS_CODE
– Generic or other resources: qsub YOUR_OWN_SCRIPT
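
A sketch of typical submissions with these wrappers (all file names are hypothetical):

qR analysis.R        # submit an R script through the qR wrapper
qRL bigmodel.R       # submit a large-memory R job
qsas report.sas      # submit a SAS program
qsub myjob.pbs       # submit your own PBS script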

Interactive Batch Jobs

• Interactive PBS jobs are similar to non-interactive PBS jobs in that they are submitted to PBS via the command qsub.

When submitting an interactive PBS job, a PBS script is not necessary. All PBS directives can be specified on the command line. The syntax for qsub for submitting an interactive PBS job is:

qsub -I [PBS directives]

• The -I flag above tells qsub that this is an interactive job.

• The following example shows using qsub to submit an interactive job using one processor on one node for four hours:

merlot:~$ qsub -I -l nodes=1:ppn=1 -l walltime=4:00:00
qsub: waiting for job 1064159.merlot.bis.vcu.edu to start
qsub: job 1064159.merlot.bis.vcu.edu ready
node12:~$

• There are two things of note here. The first is that the qsub command doesn't exit when run with the interactive -I flag. Instead, it waits until the job is started and gives a prompt on the first compute node assigned to the job. The second is the prompt node12:~$, which shows that commands are now being executed on the compute node node12.

Page 19

Monitoring and Managing Jobs

• Check Job Status using qstat

Command              Description
qstat                Shows the status of all PBS jobs. The time displayed is the CPU time used by the job.
qstat -s, qstat -a   Shows the status of all PBS jobs. The time displayed is the walltime used by the job.
qstat -u USERID      Shows the status of all PBS jobs submitted by the user USERID. The time displayed is the walltime used by the job.
qstat -n             Shows the status of all PBS jobs along with a list of the compute nodes each job is running on.
qstat -f JOBID       Shows detailed information about the job JOBID.
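
A sketch of a typical monitoring sequence (the user name and job ID are hypothetical):

qstat -u jdoe        # list jobs for user jdoe with walltime used
qstat -n             # show which compute nodes each job is running on
qstat -f 12345       # show detailed information about job 12345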

Page 20

Job Running Status

State   Meaning
Q       The job is queued and is waiting to start.
R       The job is currently running.
E       The job is currently ending.
H       The job has a user or system hold on it and will not be eligible to run until the hold is removed.

Page 21

Managing jobs

• Deleting jobs

- qdel JOBID : delete a job by job ID

- qdel $(qselect -u USERNAME) : delete all jobs owned by USERNAME

• View job output: If the PBS directive #PBS -j oe is used in a PBS script, the non-error and the error output are both written to the JobName.oJobID file.

JobName.oJobID : This file would contain the non-error output that would normally be written to the screen.

JobName.eJobID: This file would contain the error output that would normally be written to the screen.
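
A sketch of inspecting these files after a job finishes (the job name and ID are hypothetical):

ls MYSCRIPT.o*          # locate the output file for a job named MYSCRIPT
cat MYSCRIPT.o12345     # view the combined output of job 12345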

Page 22

More ways to monitor a node

• To check a node's configuration: pbsnodes NODE#

• To check a node's status: nodestatus NODE#

• Limitations on the name of the script:
  - No more than 10 characters
  - No spaces
  - No special characters
  - Use a temporary name if necessary and change it back when the job is done.

Page 23

At Last

• Edit files using nano or vi:
  http://www.ts.vcu.edu/faq/unix/picoeditor.html
  http://www.ts.vcu.edu/faq/unix/vieditor.html

• Use a Samba connection to map a network drive on a PC; "EditPad Lite" is a recommended editor.

• Some users use Rmate on a MacBook.

• Useful links:

http://www.ts.vcu.edu/faq/unix/docs.html

• Wiki page for the Biostat cluster (requires a VCU eID to log in): https://wiki.vcu.edu/display/biosit/Cluster+Computing