Computational Skills Primer Lecture 2 1/24/2018 BF528 Instructor: Kritika Karri kkarri@bu.edu

Aug 18, 2020

Page 1:

Computational Skills Primer

Lecture 2, 1/24/2018

BF528

Instructor: Kritika Karri

kkarri@bu.edu

Page 2:

● Who has used SCC before?

● How long have you worked on SCC?

● Who has worked on any other cluster?

● Do you have previous experience with basic Linux and command line interface (CLI) usage?

● Who has gone through the assigned tutorial on basic Linux and command line usage?

Page 3:

The computer was born in the mind of man, not the other way around!

Goal of this lecture:

- Overcome the fear of the black screen (if you have one!)

- Pick up some quick tips for working on SCC that will come in handy for your upcoming projects.

- Unleash the power of shared computing and learn to use it efficiently.

Page 4:

● Patience with yourself and with your group mates

● Keep an open mind

● It’s more about learning and less about grades.

● Attitude of collaboration

● It’s OK to not know - we can learn together!

● Rome wasn't built in a day!

Page 5:

● Shared Computing Cluster (SCC)

○ Shared: Multi-user, multi-tasking environment.

○ Computing: Interactive jobs, single-processor and parallel jobs, graphics jobs, etc.

○ Cluster: A network of computers connected by a fast local area network, with the computational workload coordinated by a job scheduler.

Page 7:

● A computer cluster is a set of loosely or tightly connected computers that work together so that, in many respects, they can be viewed as a single system.

● Computer clusters have each node set to perform the same task, controlled and scheduled by software.

● The components of a cluster are usually connected to each other through fast local area networks, with each node (a computer used as a server) running its own instance of an operating system.

Page 9:

● Collaborate on projects

● Run code that exceeds workstation capability

● Secure network

● Fast and easy data sharing

● Access restricted data (such as dbGaP)

● Run code that runs for long periods of time (days, weeks, months)

● Run code in highly parallelized formats (for example, use 100 machines simultaneously)

Page 12:

Essential navigation commands:

● pwd    print the current working directory

● ls    list files

● cd    change directory

We use “pathnames” to refer to files and directories in the Linux file system. There are two types of pathnames:

● Absolute – the full path to a directory or file; begins with /

● Relative – a partial path that is relative to the current working directory; does not begin with /

Special characters interpreted by the shell for filename expansion:

● ~    your home directory

● .    current directory

● ..    parent directory

● *    wildcard matching any string of characters in a filename (including none)

● ?    wildcard matching any single character

● TAB    try to complete a (partially typed) file or directory name
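The navigation commands and wildcards above can be tried safely in a throwaway directory. The sketch below is only an illustration: the directory and file names are made up, and nothing in your real project space is touched.

```shell
# Build a small sandbox (all names here are hypothetical).
demo=$(mktemp -d)
cd "$demo"
mkdir -p project/data
touch project/data/a1.txt project/data/a2.txt project/data/b10.txt project/notes.md

pwd                        # absolute path of the sandbox
cd project/data            # relative path: resolved from the current directory
ls *.txt                   # * matches any string of characters: a1.txt a2.txt b10.txt
ls a?.txt                  # ? matches exactly one character: a1.txt a2.txt
cd ..                      # move up to the parent directory (project)
ls .                       # list the current directory
cd "$demo/project/data"    # absolute path: works from anywhere
echo ~                     # ~ expands to your home directory

# Record how many names each wildcard matched, then clean up.
star_matches=$(ls *.txt | wc -l)
question_matches=$(ls a?.txt | wc -l)
cd /
rm -r "$demo"
```

Note that `a?.txt` does not match `b10.txt`: the name starts with a different letter and has two characters before the dot.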

Page 13:

Useful options for the “ls” command:

○ ls -a    List all files, including hidden files beginning with a period “.”

○ ls -ld *    List details about a directory and not its contents

○ ls -F    Put an indicator character at the end of each name

○ ls -l    Simple long listing

○ ls -lR    Recursive long listing

○ ls -lh    Give human-readable file sizes

○ ls -lS    Sort files by file size

○ ls -lt    Sort files by modification time (very useful!)
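A quick way to see what these options do is to run them on a few files with known properties. This is a minimal sketch with invented file names; the timestamps are set by hand with `touch -t` so the time-sorted listing is predictable.

```shell
demo=$(mktemp -d)                 # throwaway directory
cd "$demo"
touch .hidden                     # hidden file (name starts with a period)
touch -t 201801240900 old.txt     # give old.txt a modification time in the past
printf 'some longer content\n' > new.txt
mkdir subdir
touch -t 201701010000 subdir      # make the directory older than both files

ls -a                             # includes .hidden, plus the . and .. entries
ls -F                             # directories are marked with a trailing /
ls -ld subdir                     # details of the directory itself, not its contents
ls -lh                            # long listing with human-readable sizes
ls -lt                            # long listing, most recently modified first
ls -lS                            # long listing, largest first

# Capture a few facts for checking, then clean up.
hidden_seen=$(ls -a | grep -c '^\.hidden$')
newest=$(ls -t | head -n 1)
dirs_marked=$(ls -F | grep -c '/$')
cd /
rm -r "$demo"
```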

Page 14:

● cp [file1] [file2]    copy file

● mkdir [name]    make directory

● rmdir [name]    remove (empty) directory

● mv [file] [destination]    move/rename file

● rm [file]    remove (-r for recursive)

● file [file]    identify file type

● less [file]    page through file

● head -n N [file]    display first N lines

● tail -n N [file]    display last N lines

● ln -s [file] [new]    create symbolic link

● cat [file] [file2…]    display file(s)

● tac [file] [file2…]    display file(s) with the lines in reverse order
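Several of the commands above can be combined into a short, self-cleaning session. This is a sketch with made-up file names, run inside a temporary directory; `tac` is assumed to be available, as it is on typical Linux systems with GNU coreutils.

```shell
work=$(mktemp -d)                      # throwaway directory
cd "$work"
printf 'line1\nline2\nline3\nline4\n' > genes.txt

cp genes.txt backup.txt                # copy a file
mkdir results                          # make a directory
mv backup.txt results/genes_copy.txt   # move and rename in one step
ln -s results/genes_copy.txt latest    # symbolic link pointing at the copy

first_two=$(head -n 2 genes.txt)       # first 2 lines
last_one=$(tail -n 1 latest)           # reading through the link reads the copy
reversed_first=$(tac genes.txt | head -n 1)   # tac prints the last line first

rm results/genes_copy.txt              # remove the file ...
rmdir results                          # ... so the now-empty directory can go too
cd /
rm -r "$work"
```

After `rm results/genes_copy.txt`, the link `latest` still exists but points at nothing. Removing a link's target does not remove the link.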

Page 15:

● Count everything

○ [kkarri@scc4 ~]$ wc ncRNA_pfam.output

○ 1158238 6690230 57727093 ncRNA_pfam.output

● Count lines

○ [kkarri@scc4 ~]$ wc -l ncRNA_pfam.output

○ 1158238 ncRNA_pfam.output

● Count words

○ [kkarri@scc4 ~]$ wc -w ncRNA_pfam.output

○ 6690230 ncRNA_pfam.output
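With no options, `wc` prints three numbers: lines, words, and bytes, in that order. The small sketch below uses two invented files to show this, along with the per-file and total counts you get when several files are passed at once.

```shell
work=$(mktemp -d)                       # throwaway directory
printf 'geneA 10\ngeneB 20\n' > "$work/file1.txt"
printf 'geneC 30\n' > "$work/file2.txt"

wc "$work/file1.txt"                    # lines, words, bytes, then the file name
wc -l "$work"/*.txt                     # one line count per file, plus a total

# Reading from standard input leaves the file name out of the output,
# which is convenient when the number is stored in a variable.
total_lines=$(cat "$work"/*.txt | wc -l)
words_file1=$(wc -w < "$work/file1.txt")
bytes_file1=$(wc -c < "$work/file1.txt")
rm -r "$work"
```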

Page 16:

The find command can be used to locate a file or directory using the following options:

● find . -name my-file.txt    # search for my-file.txt in .

● find ~ -name bu -type d    # search for “bu” directories in ~

● find ~ -name '*.txt'    # search for “*.txt” in ~

● find ./[directory] -name '*.jpg'    # search for all jpg files under a directory given relative to the current directory
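The find patterns above can be tested against a small directory tree before they are pointed at real data. The tree below is invented for illustration. Keep the single quotes around patterns such as `'*.txt'`, so the shell passes the pattern to find instead of expanding it first.

```shell
root=$(mktemp -d)                                   # throwaway search root
mkdir -p "$root/bu/images" "$root/docs/bu"
touch "$root/my-file.txt" "$root/docs/notes.txt" "$root/bu/images/plot.jpg"

find "$root" -name my-file.txt                      # a file by exact name
find "$root" -name bu -type d                       # only directories named bu
find "$root" -name '*.txt'                          # every .txt file below root
find "$root/bu" -name '*.jpg'                       # .jpg files below one subdirectory

# Count the matches for checking, then clean up.
txt_count=$(find "$root" -name '*.txt' | wc -l)
bu_dirs=$(find "$root" -name bu -type d | wc -l)
jpg_count=$(find "$root/bu" -name '*.jpg' | wc -l)
rm -r "$root"
```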

Page 17:

1. Access your project directory and create a directory named work.

2. Copy all the .txt files from /project/bf528/kkarri/ to your work directory.

3. Rename the files as file1.txt, file2.txt, and so on.

4. Count the number of lines in all these files.

5. There is a hidden R script file (.R extension) in /project/bf528/. Find the file and copy it to your work directory.

6. Rename the file to pearson_script.R

Page 18:

File Editors

● Vim: A better version of ‘vi’ (an early full-screen editor).

● Nano: A simple terminal editor that is easy to learn.

● Gedit: Notepad-like editor with some programming features. Requires X Windows.

Advantages of Vim and Nano

Nano:

● Easy to use and master.

● Nano has most of the shortcuts listed at the bottom of the window, making it extremely simple to use.

● Search function

● Search and replace

● "Goto line" command

● Automatic indentation

Vim:

● Tough to get started with and master. The editing and command modes will confuse beginners.

● Session recovery

● Split screen

● Tab expansion

● Completion commands

● Syntax coloring

Page 19:

File Access Control:

● Every file has an owner.

● Every file belongs to a group.

● Every file has “permissions” controlling access to it.

[kkarri@scc4 ~]$ ls -l

drwxr-xr-x 3 kkarri waxmanlab 512 Jan 21 16:03 newdir

● “drwxr-xr-x” gives the “permissions” for this directory (or file). The “d” indicates this is a directory. There are then three sets of three characters for the “user” (u), “group” (g), and “other” (o) access levels. “r” indicates a file/directory is readable, “w” writable, and “x” executable. A “-” indicates that the permission is not granted. In this example the owner (kkarri) has rwx, while the group (waxmanlab) and everyone else have r-x.

Page 20:

Change the permissions on the directory “newdir” so that members of your group can write to it:

[kkarri@scc4 ~]$ chmod g+w newdir

[kkarri@scc4 ~]$ ls -l

total 0

drwxrwxr-x 3 kkarri waxmanlab 512 Jan 21 16:03 newdir

Page 21:

● The chmod command also works with the following mappings: readable=4, writable=2, executable=1. The values are added to give one digit each for user, group, and other, so 750 means rwx (4+2+1) for the user, r-x (4+1) for the group, and no access (0) for others:

[kkarri@scc4 ~]$ ls -ld newdir

drwxrwxr-x 3 kkarri waxmanlab 512 Jan 21 16:03 newdir

[kkarri@scc4 ~]$ chmod 750 newdir

[kkarri@scc4 ~]$ ls -ld newdir

drwxr-x--- 3 kkarri waxmanlab 512 …
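The symbolic (g+w) and numeric (750) forms of chmod can be checked side by side. The sketch below assumes GNU `stat` (the `-c` format option), which is standard on Linux but not on every Unix. The directory is a temporary stand-in for newdir.

```shell
dir=$(mktemp -d)/newdir           # temporary stand-in for the slide's newdir
mkdir "$dir"

chmod 755 "$dir"                  # start from rwxr-xr-x: 7=4+2+1, 5=4+1, 5=4+1
stat -c '%A %a %n' "$dir"         # symbolic mode, octal mode, name

chmod g+w "$dir"                  # symbolic form: add write for the group
after_symbolic=$(stat -c '%a' "$dir")    # octal mode is now 775

chmod 750 "$dir"                  # numeric form: rwx for user, r-x for group, nothing for other
after_octal=$(stat -c '%A' "$dir")       # symbolic mode is now drwxr-x---

rm -r "$(dirname "$dir")"         # clean up
```

The symbolic form changes only the bits you name, while the numeric form sets all nine permission bits at once.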

Page 22:

● tar (Tape ARchiver): Creates a tar archive as a file on disk. Here are the options we are using:

○ -z: Write the archive through gzip

○ -c: Create a new tar archive

○ -v: Verbose, show the files being worked on as tar is running

○ -f: Specify the name of an archive file

$ tar -zcvf moe.tar.gz /home/moe

To restore files from a tar archive, use -x (extract) in place of -c:

$ tar -zxvf archivename

● gzip is a utility for compressing and decompressing individual files. To compress a file, use:

$ gzip filename

○ The file will be deleted and replaced by a compressed file called filename.gz. To reverse the compression process, use:

$ gzip -d filename.gz

● Viewing compressed text files with zcat:

○ $ zcat geneList.gz

○ $ zcat geneList.gz | head
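A full round trip makes the behavior of tar and gzip easier to trust. The sketch below uses a small invented directory, adds `-t` (list the archive) and `-C` (extract into a given directory), and leaves out `-v` to keep the output short. It assumes GNU tar and gzip as found on Linux.

```shell
work=$(mktemp -d)                        # throwaway directory
cd "$work"
mkdir moe
printf 'geneA\ngeneB\ngeneC\n' > moe/geneList

tar -zcf moe.tar.gz moe                  # create a gzip-compressed archive of moe/
tar -ztf moe.tar.gz                      # list what the archive contains
mkdir restore
tar -zxf moe.tar.gz -C restore           # extract into restore/ instead of here
restored_lines=$(wc -l < restore/moe/geneList)

gzip moe/geneList                        # geneList is replaced by geneList.gz
first_gene=$(zcat moe/geneList.gz | head -n 1)   # read without decompressing on disk
gzip -d moe/geneList.gz                  # geneList.gz is replaced by geneList again
roundtrip=$([ -f moe/geneList ] && [ ! -e moe/geneList.gz ] && echo yes)

cd /
rm -r "$work"
```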

Page 23:

● Shell script: sh script_name.sh

● R script: Rscript script_name.R

● Python: python script_name.py

Page 24:

1. Open pearson_script.R and try to edit the script. Can you edit the file?

2. What is the permission for your R script?

3. Change the permission so that the user is able to write and execute.

4. In each of your text files (.txt), substitute ‘Con’ with ‘Control’ and save the changes.

5. Execute your pearson_script.R

6. Create a pdf folder, copy all the pdf files (*.pdf) into it, and compress the folder as .tar.gz

Page 25:

In general

● Home Directory – Personal files, custom scripts.

● /project – Source code, files you can’t replace.

● /projectnb – Output files, downloaded data sets. Large quantities of data that you could recreate in the incredibly unlikely event of a disastrous data loss.

Restricted data (dbGaP)

● /restricted/project/PROJNAME – backed-up space for dbGaP data

● /restricted/projectnb/PROJNAME – not-backed-up space for dbGaP data

● Only accessible through scc4.bu.edu and compute nodes.

Page 26:

● Each node (login or compute) has a directory called /scratch stored on a local hard drive.

○ This can be used by batch jobs to quickly write temporary files.

● If you wish to keep these files, you should copy them to your own space when the job completes.

● Scratch files are kept for 30 days, with no guarantees.
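The usual pattern is: write temporary files to scratch, copy what you need to your own space, then clean up. The sketch below illustrates that pattern under stated assumptions. The `myjob.XXXXXX` subdirectory name is made up, the results directory is a temporary stand-in for your project space, and the script falls back to the system temporary directory when /scratch is not available, so it can also be tried off the cluster.

```shell
# Use /scratch when it exists and is writable; otherwise fall back (assumption for testing).
scratch_root=/scratch
[ -d "$scratch_root" ] && [ -w "$scratch_root" ] || scratch_root=${TMPDIR:-/tmp}

job_tmp=$(mktemp -d "$scratch_root/myjob.XXXXXX")   # private scratch area for this job
results_dir=$(mktemp -d)                            # stand-in for your project directory

# Write intermediate files to fast local scratch space.
seq 1 1000 > "$job_tmp/intermediate.txt"
sort -rn "$job_tmp/intermediate.txt" | head -n 5 > "$job_tmp/top5.txt"

# Keep only what matters: copy it out, then remove the scratch files.
cp "$job_tmp/top5.txt" "$results_dir/"
rm -r "$job_tmp"
```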

Page 27:

● Interactive job – running an interactive shell: run GUI applications, debug code, benchmark serial and parallel code performance.

● Interactive graphics job – for running interactive software with advanced graphics.

● Batch job – execution of the program without manual intervention.

Page 28:

● Modules – Used to load applications not automatically loaded by the system, including alternative versions of applications.

- Check the available modules

[kkarri@scc4 new_cuffmerge]$ module avail R

- Load a module in current environment

[kkarri@scc4 new_cuffmerge]$ module load R/3.4.0

- Unload a module

[kkarri@scc4 new_cuffmerge]$ module unload R/3.4.0

● To check which installation of a tool or software is in use (the path it resolves to often reveals the version)

○ [kkarri@scc4 new_cuffmerge]$ which R

Page 30:

Batch Jobs – qsub and qstat

Use the Open Grid Scheduler (OGS) command qsub to submit your job script to the batch system:

[kkarri@scc4 stranded]$ qsub stranded_transcriptome.qsub

Use -P to specify the project the job should run under:

[kkarri@scc4 stranded]$ qsub -P waxmanlab stranded_transcriptome.qsub

Check the status of your job with qstat:

[kkarri@scc4 stranded]$ qstat -u kkarri

job-ID   prior    name        user    state  submit/start at      queue             slots  ja-task-ID

---------------------------------------------------------------------------------------------------------------

3987947  0.11135  QLOGIN      kkarri  r      01/20/2018 11:23:05  [email protected]  32

3990472  0.11118  new_cuffme  kkarri  r      01/21/2018 13:09:13  [email protected]  28
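For orientation, a job script is an ordinary shell script whose `#$` comment lines carry scheduler options. The sketch below is a hypothetical example.qsub, not the stranded_transcriptome.qsub script from the slide. The job name, slot count, and time limit are placeholders, and `-N` and `-j y` are standard Grid Engine options that the slides do not mention. Outside the scheduler the `#$` lines are plain comments, so the script also runs as a normal shell script.

```shell
#!/bin/bash -l
# Project to charge the job to (waxmanlab is the slide's example; use your own).
#$ -P waxmanlab
# Job name shown by qstat (hypothetical).
#$ -N example_job
# Request 4 slots on a single node.
#$ -pe omp 4
# Wall-clock time limit of 12 hours.
#$ -l h_rt=12:00:00
# Merge the error stream into the output file.
#$ -j y

# The scheduler sets NSLOTS to the number of slots granted; default to 1 elsewhere.
slots=${NSLOTS:-1}
echo "Job started on $(uname -n) with $slots slot(s)"

# The real work would go here, for example loading a module and running a tool
# with no more threads than the slots requested.
summary="slots=$slots"
echo "$summary"
```

Saved as example.qsub, this would be submitted with `qsub example.qsub` and monitored with `qstat -u` followed by your user name.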

Page 33:

What happens if you use more slots than requested?

● We kill the job to preserve other jobs running on that node.

● If you have email notifications enabled, you will receive a notice that the job was aborted.

● Note that it ran for 9 minutes and the CPU ran for 22: more CPU time than wall-clock time means the job was running on more than one core.

● You will also receive an explanation email.

Page 34:

More information available on:

http://www.bu.edu/tech/support/research/computing-resources/tech-summary/

Page 35:

● OpenMP: Single node using multiple processes

○ Common with scripts when the user only wants a single job.

● OpenMP: Single node threading a single process

○ Commonly built into applications.

● OpenMPI: Multi-node, many-CPU, distributed-memory processing (processes communicate by message passing)

○ Very powerful computation, not used much on BUMC.

More information available on:

http://www.bu.edu/tech/support/research/computing-resources/tech-summary/

Page 38:

● Using the qdel command and a job ID, you can request deletion of a job:

■ [kkarri@scc4 stranded]$ qdel 3992851

kkarri has deleted job 3992851

● Delete multiple jobs using a pattern or keyword:

○ Kill all jobs whose names start with cuff

■ qstat -u kkarri | awk '$3 ~ "^cuff" {cmd="qdel " $1; system(cmd); close(cmd)}'

○ Kill all jobs whose names end with a certain string (I already have an alias called job that will give me the full name of a job)

■ qstat -u kkarri | awk '$3 ~ "featureCount$" {cmd="qdel " $1; system(cmd); close(cmd)}'

○ Delete multiple jobs with sequential job IDs

■ qdel `seq -f "%.0f" 401 405`
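Because pattern-based deletion is easy to get wrong, it helps to print the matching job IDs first and only then pass them to qdel. The sketch below is a dry run on invented text that stands in for qstat output. The job IDs and names are made up, and nothing is deleted.

```shell
# Sample text standing in for `qstat -u kkarri` output (hypothetical jobs).
sample_qstat='job-ID prior name user state
--------------------------------------
401 0.11 cuffmerge1 kkarri r
402 0.11 featureCou kkarri r
403 0.11 cufflinks2 kkarri qw
404 0.11 my_cuff kkarri r'

# NR > 2 skips the two header lines; $3 is the job name, $1 the job ID.
# ^cuff anchors the match to the start of the name.
ids_start=$(printf '%s\n' "$sample_qstat" | awk 'NR > 2 && $3 ~ /^cuff/ {print $1}' | paste -sd ' ' -)

# Without the ^ anchor the pattern matches anywhere in the name, so my_cuff is caught too.
ids_anywhere=$(printf '%s\n' "$sample_qstat" | awk 'NR > 2 && $3 ~ /cuff/ {print $1}' | paste -sd ' ' -)

# A trailing $ anchors the match to the end of the name.
ids_end=$(printf '%s\n' "$sample_qstat" | awk 'NR > 2 && $3 ~ /Cou$/ {print $1}' | paste -sd ' ' -)

echo "would run: qdel $ids_start"
```

Once the printed list looks right, the same awk filter can be applied to real qstat output and its result passed to qdel.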

Page 39:

● Request an interactive session using qsh

○ [kkarri@scc4 stranded]$ qsh -P waxmanlab

Your job 3992885 ("INTERACTIVE") has been submitted

waiting for interactive job to be scheduled …

● Request an interactive session using qlogin

○ [kkarri@scc4 stranded]$ qlogin -P waxmanlab -pe omp 16 -l h_rt=12:00:00    # asking for 16 cores for 12 hours

The more cores you request, the longer it can take to get access to the session!

Page 41:

Boston University’s Virtual Private Network (VPN) creates a “tunnel” between your computer and the campus network that encrypts your transmissions to BU. Use of the VPN also identifies you as a member of the Boston University community when you are not connected directly to the campus network, allowing you access to restricted networked resources.

● Gain access to restricted resources when you are away from BU, including departmental servers (such as printers and shared drives).

● Protect data being sent across the Internet through VPN encryption, including sensitive information such as your BU login name and Kerberos password.

● Increase security when connecting to the Internet through an open wireless network (such as in a cafe or at the airport) by using the BU VPN software.

Page 42:

fastqc: A quality control tool for high-throughput sequence data (will be discussed in detail in coming lectures).

The input for this tool is a .fastq.gz file and the command to run is “fastqc name.fastq.gz”

1. Copy the test.qsub script from /project/bf528/kkarri

2. Check the availability of the fastqc module

3. Open the script in vim or gedit and edit the script by filling in the incomplete parameters (in CAPITALS)

4. Add the fastqc command using the SRR1177960_R1.fastq.gz file located in the /project/bf528/kkarri folder (hint: use pwd to get the file path)

5. Submit test.qsub as a batch job and check the status of your job.

Page 43:

● For the following jobs, which mode of running on SCC would be suitable: an interactive session (qsh, qlogin) or a batch job (qsub)?

○ Alignment of ~50 million raw sequencing reads to a large reference genome.

○ Run a compute process > 15 min

○ Run a job > 12 hrs

Page 44:

● For an in-depth understanding of these concepts, go through the following modules on cluster computing and on the advanced command line and text editors:

● http://foundations-in-computational-skills.readthedocs.io/en/latest/content/workshops/06_cluster_computing/06_cluster_computing.html

● http://foundations-in-computational-skills.readthedocs.io/en/latest/content/workshops/03_advanced_cli/03_advanced_cli.html