I Workshop on command-line tools (day 2), Center for Applied Genomics, Children's Hospital of Philadelphia, February 12-13, 2015

Workshop on command line tools - day 2

Jul 30, 2015


Leandro Lima
Page 1: Workshop on command line tools - day 2

I Workshop on command-line tools

(day 2)

Center for Applied Genomics
Children's Hospital of Philadelphia

February 12-13, 2015

Page 2: Workshop on command line tools - day 2

awk - a powerful way to check conditions and show specific columns

Example: show only CNVs that span 3 or fewer targets (exons)

tail -n +2 DATA.xcnv | awk '$8 <= 3'

Page 3: Workshop on command line tools - day 2

awk - different ways to do the same thing

tail -n +2 DATA.xcnv | awk '$8 <= 3'

# same effect 1

tail -n +2 DATA.xcnv | awk '$8 <= 3 {print}'

# same effect 2

tail -n +2 DATA.xcnv | awk '{if ($8 <= 3) {print}}'

# same effect 3

tail -n +2 DATA.xcnv | awk '{if ($8 <= 3) {print $0}}'

# different effect (prints only the first column)

tail -n +2 DATA.xcnv | awk '{if ($8 <= 3) {print $1}}'
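The equivalent forms above can be checked against each other on a small table. This is only a sketch: the file name and values are made up, and only the eighth column matters.

```shell
# Toy data (hypothetical): 8 tab-separated columns, the 8th is the number of targets.
tmpdir=$(mktemp -d)
printf 'S1\tDEL\tchr1:1-10\t1\t1\t5\t5\t2\n' >  "$tmpdir/toy.tsv"
printf 'S2\tDUP\tchr2:1-10\t1\t1\t5\t5\t7\n' >> "$tmpdir/toy.tsv"
printf 'S3\tDEL\tchr3:1-10\t1\t1\t5\t5\t3\n' >> "$tmpdir/toy.tsv"

# Pattern only: the default action prints the whole line.
a=$(awk '$8 <= 3' "$tmpdir/toy.tsv")
# Pattern with an explicit action.
b=$(awk '$8 <= 3 {print}' "$tmpdir/toy.tsv")
# An "if" statement has to sit inside an action block.
c=$(awk '{if ($8 <= 3) print $0}' "$tmpdir/toy.tsv")

# All three select the same two rows (S1 and S3).
[ "$a" = "$b" ] && [ "$b" = "$c" ] && echo "same output"
```

The pattern-only form is the shortest; the "if" form becomes useful once an action needs more than one branch.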

Page 4: Workshop on command line tools - day 2

awk - more options on if statement

# Applying XHMM "gold" thresholds (KB >= 1,
# NUM_TARG >= 3, Q_SOME >= 65, Q_NON_DIPLOID >= 65)

tail -n +2 DATA.xcnv | \
awk '$4 >= 1 && $8 >= 3 && $10 >= 65 && $11 >= 65' \
> DATA.gold.xcnv

# Using only awk

awk 'NR > 1 && $4 >= 1 && $8 >= 3 &&
$10 >= 65 && $11 >= 65' DATA.xcnv > DATA.gold2.xcnv

Page 5: Workshop on command line tools - day 2

diff - compare files line by line

# Compare
diff DATA.gold.xcnv DATA.gold2.xcnv

# Tip: install tkdiff to use a
# graphic version of diff
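A small sketch of the same comparison on made-up data: the two filtering styles (tail plus awk, and awk alone with NR > 1) are run on a toy table, and the exit status of diff tells whether the results match. The column layout here is hypothetical and much narrower than a real .xcnv file.

```shell
tmpdir=$(mktemp -d)
# Hypothetical table: a header, then KB, NUM_TARG, Q_SOME, Q_NON_DIPLOID in columns 1-4.
printf 'KB\tNUM_TARG\tQ_SOME\tQ_NON_DIPLOID\n' >  "$tmpdir/toy.tsv"
printf '2.5\t4\t90\t80\n'                      >> "$tmpdir/toy.tsv"
printf '0.4\t5\t99\t99\n'                      >> "$tmpdir/toy.tsv"
printf '3.0\t2\t70\t70\n'                      >> "$tmpdir/toy.tsv"
printf '1.0\t3\t65\t65\n'                      >> "$tmpdir/toy.tsv"

# Style 1: drop the header with tail, then filter with awk.
tail -n +2 "$tmpdir/toy.tsv" | \
awk '$1 >= 1 && $2 >= 3 && $3 >= 65 && $4 >= 65' > "$tmpdir/gold1.tsv"

# Style 2: let awk skip the header itself.
awk 'NR > 1 && $1 >= 1 && $2 >= 3 && $3 >= 65 && $4 >= 65' \
    "$tmpdir/toy.tsv" > "$tmpdir/gold2.tsv"

# diff exits with status 0 when the files are identical.
if diff "$tmpdir/gold1.tsv" "$tmpdir/gold2.tsv" > /dev/null; then
    result="identical"
else
    result="different"
fi
echo "$result"
```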

Page 6: Workshop on command line tools - day 2

Exercises

1. Using adhd.map, show 10 SNPs with rsID starting with 'rs' on chrom. 2, between positions 1Mb and 2Mb
2. Check which chromosome has the most SNPs
3. Check which SNP IDs are duplicated

Page 7: Workshop on command line tools - day 2

Suggestions

# 1.

grep '\brs' adhd.map | \
awk '$1 == 2 && int($4) >= 1000000 && int($4) <= 2000000' | \
less

# 2.

cut -f1 adhd.map | sort | uniq -c | sort -k1n | tail -1

# 3.

cut -f2 adhd.map | sort | uniq -c | awk '$1 > 1'
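The counting pipelines for suggestions 2 and 3 can be tried on a tiny made-up map file. This is a sketch under the assumption of a tab-separated, 4-column layout (chromosome, SNP ID, genetic distance, position); `uniq -d` is used as an alternative to filtering the counts with awk.

```shell
tmpdir=$(mktemp -d)
# Hypothetical map file: chromosome, SNP ID, genetic distance, position.
printf '1\trs100\t0\t500\n'     >  "$tmpdir/toy.map"
printf '2\trs200\t0\t1500000\n' >> "$tmpdir/toy.map"
printf '2\trs300\t0\t1700000\n' >> "$tmpdir/toy.map"
printf '2\trs200\t0\t2500000\n' >> "$tmpdir/toy.map"

# Chromosome with the most SNPs: count per chromosome, sort by count,
# keep the last line, and print only the chromosome name.
top_chrom=$(cut -f1 "$tmpdir/toy.map" | sort | uniq -c | sort -k1,1n | \
            tail -n 1 | awk '{print $2}')

# Duplicated SNP IDs: uniq -d prints one copy of each repeated line.
dups=$(cut -f2 "$tmpdir/toy.map" | sort | uniq -d)

echo "chromosome with most SNPs: $top_chrom"
echo "duplicated IDs: $dups"
```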

Page 8: Workshop on command line tools - day 2

More awk - inserting external variables

awk -v Mb=1000000 -v chrom=2 \
'$1 == chrom && int($4) >= Mb && int($4) <= 2*Mb' \
adhd.map | less

# Printing specific columns
# (the action must start on the same line as the condition)

awk -v Mb=1000000 -v chrom=2 \
'$1 == chrom && int($4) >= Mb && int($4) <= 2*Mb {print $1" "$2" "$4}' \
adhd.map | less
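The -v option also accepts shell variables, which keeps the awk program itself unchanged when the region changes. A minimal sketch on a made-up map file; the variable names `start` and `end` are illustrative and not part of the workshop material.

```shell
tmpdir=$(mktemp -d)
# Hypothetical map file: chromosome, SNP ID, genetic distance, position.
printf '1\trs100\t0\t1500000\n' >  "$tmpdir/toy.map"
printf '2\trs200\t0\t1500000\n' >> "$tmpdir/toy.map"
printf '2\trs300\t0\t2500000\n' >> "$tmpdir/toy.map"

# Region of interest kept in shell variables.
chrom=2
start=1000000
end=2000000

# Pass the shell values into awk with -v; the action sits on the same
# line as the condition, so it applies only to matching rows.
hits=$(awk -v chrom="$chrom" -v start="$start" -v end="$end" \
    '$1 == chrom && $4 >= start && $4 <= end {print $2}' "$tmpdir/toy.map")

echo "$hits"
```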

Page 9: Workshop on command line tools - day 2

Using awk to check number of variants in ped files

# Options using only awk, but they take (much) more time
# because awk still reads the whole file

awk 'NR == 1 {print (NF-6)/2}' adhd.ped

awk 'NR < 2 {print (NF-6)/2}' adhd.ped # Slow, too

# Better alternative

head -n 1 adhd.ped | awk '{print (NF-6)/2}'

# Now, the map file

wc -l adhd.map
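The awk-only versions are slow because awk keeps reading every remaining line even though only the first one is used. A sketch of one way around that, using awk's `exit` statement to stop after the first line; the toy ped file below is made up (6 leading columns plus two alleles per variant).

```shell
tmpdir=$(mktemp -d)
# Hypothetical ped file with 3 variants: 6 fixed columns + 2 alleles per variant.
printf 'F1 I1 0 0 1 1 A A C T G G\n' >  "$tmpdir/toy.ped"
printf 'F2 I2 0 0 2 1 A C C C G T\n' >> "$tmpdir/toy.ped"

# Print the variant count from the first line, then stop reading the file.
n_variants=$(awk 'NR == 1 {print (NF-6)/2; exit}' "$tmpdir/toy.ped")

echo "$n_variants"
```

On a large file this should behave much like the `head -n 1 | awk` version, since neither reads past the first line.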

Page 10: Workshop on command line tools - day 2

time - time command execution

time head -n 1 adhd.ped | awk '{print (NF-6)/2}'
real 0m0.485s
user 0m0.391s
sys 0m0.064s

time awk 'NR < 2 {print (NF-6)/2}' adhd.ped

# Forget it… just press Ctrl+C
real 1m0.611s
user 0m51.261s
sys 0m0.826s

Page 11: Workshop on command line tools - day 2

top - display and update sorted information about processes / display Linux tasks

top

z : color
k : kill process
u : choose specific user
c : show complete commands running
1 : show usage of individual CPUs
q : quit

Page 12: Workshop on command line tools - day 2

screen - screen manager with terminal emulation (i)

screen
screen -S <session_name>
Ctrl+a, then c: create window
Ctrl+a, then n: go to next window
Ctrl+a, then p: go to previous window
Ctrl+a, then 0: go to window number 0
Ctrl+a, then d: detach (leave your session, but keep it running)

Page 13: Workshop on command line tools - day 2

screen - screen manager with terminal emulation (ii)

Ctrl+a, then [ : activate copy mode (to scroll screen)
q : quit copy mode
exit : close current window
screen -r : resume the only detached session
screen -r <session_name> : resume a specific detached session
screen -rD <session_name> : reattach a session, detaching it elsewhere first if needed

Page 14: Workshop on command line tools - day 2

split - split a file into pieces

split -l <lines_of_each_piece> <input> <prefix>

# Example
split -l 100000 adhd.map map_

wc -l map_*
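A quick way to convince yourself that split loses nothing is to split a small file and glue the pieces back together. A sketch with a made-up 25-line file and the hypothetical prefix `piece_`:

```shell
tmpdir=$(mktemp -d)
# Toy input: the numbers 1 to 25, one per line.
awk 'BEGIN { for (i = 1; i <= 25; i++) print i }' > "$tmpdir/toy.txt"

# Pieces of 10 lines each: piece_aa, piece_ab and piece_ac.
split -l 10 "$tmpdir/toy.txt" "$tmpdir/piece_"

n_pieces=$(ls "$tmpdir"/piece_* | wc -l | tr -d ' ')
last_lines=$(wc -l < "$tmpdir/piece_ac" | tr -d ' ')

# The shell expands piece_* in alphabetical order, so cat restores the original.
if cat "$tmpdir"/piece_* | cmp -s - "$tmpdir/toy.txt"; then
    roundtrip="ok"
else
    roundtrip="mismatch"
fi

echo "$n_pieces pieces, last one has $last_lines lines, round trip: $roundtrip"
```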

Page 15: Workshop on command line tools - day 2

in-line Perl/sed to find and replace (i)

head DATA.gold.xcnv | cut -f3 | perl -pe 's/chr/CHR/g'

head DATA.gold.xcnv | cut -f3 | perl -pe 's/chr//g'

# Other possibilities

head DATA.gold.xcnv | cut -f3 | perl -pe 's|chr||g'

head DATA.gold.xcnv | cut -f3 | perl -pe 's!chr!!g'

head DATA.gold.xcnv | cut -f3 | sed 's/chr//g'

# Creating a BED file

head DATA.gold.xcnv | cut -f3 | perl -pe 's/[:-]/\t/g'
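The same colon-and-dash to tab conversion can also be done without Perl. The sketch below swaps in `tr`, which translates single characters, and runs it on a made-up region string rather than on DATA.gold.xcnv.

```shell
# Hypothetical region in chrom:start-end form.
region="chr1:100-200"

# Translate ':' and '-' into tabs (one replacement character per input character).
bed_line=$(printf '%s\n' "$region" | tr ':-' '\t\t')

printf '%s\n' "$bed_line"
```

`tr` is enough here because each separator is a single character; a regular-expression tool such as Perl or sed is needed once the pattern gets longer.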

Page 16: Workshop on command line tools - day 2

in-line Perl/sed to find and replace (ii)

# "s" means substitute

# "g" means global (replace all matches, not only first)

# See the difference...

head DATA.gold.xcnv | cut -f3 | sed 's/9/nine/g'

head DATA.gold.xcnv | cut -f3 | sed 's/9/nine/'

# Adding more replacements

head DATA.gold.xcnv | cut -f3 | sed 's/1/one/g; s/2/two/g'
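The effect of the "g" flag is easiest to see on a single line. A small sketch with a made-up region string:

```shell
# Hypothetical input line with several 9s.
region="chr9:99-199"

# Without "g": only the first match on the line is replaced.
first_only=$(printf '%s\n' "$region" | sed 's/9/nine/')

# With "g": every match on the line is replaced.
all_matches=$(printf '%s\n' "$region" | sed 's/9/nine/g')

echo "$first_only"    # chrnine:99-199
echo "$all_matches"   # chrnine:ninenine-1ninenine
```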

Page 17: Workshop on command line tools - day 2

copy from terminal to clipboard/paste from clipboard to terminal

# This is like Ctrl+V in your terminal

pbpaste

# This is like Ctrl+C from your terminal

head DATA.xcnv | pbcopy

# Then, Ctrl+V in other text editor

# pbcopy and pbpaste are macOS commands
# On Linux, you can install "xclip"
http://sourceforge.net/projects/xclip/

Page 18: Workshop on command line tools - day 2

datamash - command-line calculations

tail -n +2 DATA.xcnv | \
head | \
cut -f6,10,11 | \
datamash mean 1 sum 2 min 3

# mean of 1st column
# sum of 2nd column
# minimum of 3rd column

http://www.gnu.org/software/datamash/
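Where datamash is not installed, the same three summaries can be approximated with awk. This is a swapped-in technique, not datamash itself, shown on a made-up 3-column table:

```shell
tmpdir=$(mktemp -d)
# Hypothetical tab-separated table with three numeric columns.
printf '1\t10\t7\n2\t20\t3\n3\t30\t5\n' > "$tmpdir/toy.tsv"

# Mean of column 1, sum of column 2, minimum of column 3.
stats=$(awk -F'\t' '
    NR == 1 { min = $3 }                     # start the minimum from the first row
    { total1 += $1; total2 += $2
      if ($3 < min) min = $3 }
    END { if (NR > 0) printf "%g\t%g\t%g\n", total1 / NR, total2, min }
' "$tmpdir/toy.tsv")

printf '%s\n' "$stats"
```

For this toy table the three values are 2, 60 and 3. datamash stays the more convenient choice once grouping or several statistics per column are needed.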

Page 19: Workshop on command line tools - day 2

touch - change file access and modification times

ls -lh DATA.gold.xcnv
touch DATA.gold.xcnv
ls -lh DATA.gold.xcnv

Page 20: Workshop on command line tools - day 2

Introduction to "for" loop

tail -n +2 DATA.xcnv | cut -f1 | sort | uniq | head > samples.txt

for sample in `cat samples.txt`; do touch $sample.txt; done

ls -lh Sample*

for sample in `cat samples.txt`; do
    mv $sample.txt $sample.csv
done

ls -lh Sample*
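The loops above split samples.txt on any whitespace, which is fine for simple sample names. A sketch of a more defensive variant that reads one name per line and quotes every variable; the sample names and the temporary directory are made up.

```shell
tmpdir=$(mktemp -d)
# Hypothetical sample list, one name per line.
printf 'Sample_A\nSample_B\nSample_C\n' > "$tmpdir/samples.txt"

# Read the list line by line and create one empty file per sample.
while IFS= read -r sample; do
    touch "$tmpdir/$sample.txt"
done < "$tmpdir/samples.txt"

# Rename every .txt file to .csv; ${f%.txt} strips the old extension.
for f in "$tmpdir"/Sample_*.txt; do
    mv "$f" "${f%.txt}.csv"
done

n_csv=$(ls "$tmpdir"/*.csv | wc -l | tr -d ' ')
echo "$n_csv files renamed"
```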

Page 21: Workshop on command line tools - day 2

Variables (i)

i=1
name=Leandro
count=`wc -l adhd.map`
echo $i
echo $name
echo $count
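Note that a variable filled from `wc -l <file>` holds the file name as well as the count. A sketch of how to keep only the number, using `$(...)` as an equivalent of the backticks; the toy file is made up.

```shell
tmpdir=$(mktemp -d)
printf 'a\nb\nc\n' > "$tmpdir/toy.txt"

# Given a file name, wc prints the count followed by the name.
with_name=$(wc -l "$tmpdir/toy.txt")

# Reading from standard input, wc prints only the count.
count=$(wc -l < "$tmpdir/toy.txt")
count=$((count + 0))   # arithmetic expansion drops any padding spaces

echo "wc with a file name: $with_name"
echo "The file has $count lines"
```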

Page 22: Workshop on command line tools - day 2

Variables (ii)

# Examples
bwa=/home/users/llima/tools/bwa
hg19=/references/hg19.fasta

# Do not run
$bwa index $hg19

Page 23: Workshop on command line tools - day 2

System variables

echo $HOME
echo $USER
echo $PWD

# directories where bash looks for your programs
echo $PATH

Page 24: Workshop on command line tools - day 2

Exercise

1. Create a program that shows input parameters/arguments

2. Create a program (say, "fields", or "colnames") that prints the column names of a <tab>-delimited file (example: DATA.xcnv)

3. Send this program to your PATH

Page 25: Workshop on command line tools - day 2

Running a bash script (i)

cat > arguments.sh
echo Your program is $0
echo Your first argument is $1
echo Your second argument is $2
echo You entered $# parameters.
# Ctrl+D to finish the input and exit "cat"

Page 26: Workshop on command line tools - day 2

Running a bash script (ii)

bash arguments.sh
bash arguments.sh A B C D E
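The script can also be written in one step with a here-document, and a loop over "$@" shows every argument rather than only the first two. This is a sketch of one possible variant, not the workshop's own solution; it writes to a temporary directory.

```shell
tmpdir=$(mktemp -d)

# Write the script with a quoted here-document, so $0, $# and $@ are
# stored literally instead of being expanded right now.
cat > "$tmpdir/arguments.sh" <<'EOF'
#!/bin/bash
echo "Your program is $0"
echo "You entered $# parameters."
for arg in "$@"; do
    echo "Argument: $arg"
done
EOF

# Quoting "B C" passes it as a single argument.
output=$(bash "$tmpdir/arguments.sh" A "B C" D)
echo "$output"
```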

Page 27: Workshop on command line tools - day 2

chmod - set permissions (i)

ls -lh arguments.sh
-rw-r--r--

# First character
b Block special file.
c Character special file.
d Directory.
l Symbolic link.
s Socket link.
p FIFO.
- Regular file.

Page 28: Workshop on command line tools - day 2

chmod - set permissions (ii)

Next characters
user, group, others | read, write, execute

ls -lh arguments.sh
-rw-r--r--

# Everybody can read
# Only user can write/modify

Page 29: Workshop on command line tools - day 2

chmod - set permissions (iii)

# Add writing permission to group
chmod g+w arguments.sh
ls -lh arguments.sh
# Remove writing permission from group
chmod g-w arguments.sh
ls -lh arguments.sh
# Add execution permission to all
chmod a+x arguments.sh
ls -lh arguments.sh
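Besides reading the output of ls, the shell can test the execute permission directly with `[ -x file ]`. A minimal sketch on a made-up script in a temporary directory:

```shell
tmpdir=$(mktemp -d)
# Hypothetical one-line script.
printf '#!/bin/sh\necho hello\n' > "$tmpdir/hello.sh"

# Make sure nobody can execute it, then test.
chmod a-x "$tmpdir/hello.sh"
if [ -x "$tmpdir/hello.sh" ]; then before="yes"; else before="no"; fi

# Give the owner execute permission and test again.
chmod u+x "$tmpdir/hello.sh"
if [ -x "$tmpdir/hello.sh" ]; then after="yes"; else after="no"; fi

# Now the script can be run by its path.
greeting=$("$tmpdir/hello.sh")

echo "executable before: $before, after: $after, output: $greeting"
```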

Page 30: Workshop on command line tools - day 2

Run your program again

# Run the script directly, now that it is executable
./arguments.sh
./arguments.sh A B C D E
# Change the name
mv arguments.sh arguments
# Send to your PATH (showing on Mac)
sudo cp arguments /usr/local/bin/
# Go to other directory
# Type argu<Tab>, and "which arguments"
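Copying into /usr/local/bin needs sudo. An alternative sketch, assuming you would rather keep your programs in a directory of your own, is to add that directory to PATH. The directory and the program name `arguments_demo` below are hypothetical, and the change to PATH lasts only for the current shell session.

```shell
tmpdir=$(mktemp -d)
mkdir -p "$tmpdir/bin"

# Hypothetical program placed in a personal bin directory.
printf '#!/bin/sh\necho "You entered $# parameters."\n' > "$tmpdir/bin/arguments_demo"
chmod u+x "$tmpdir/bin/arguments_demo"

# Put the directory at the front of PATH for this session.
PATH="$tmpdir/bin:$PATH"

# The shell now finds the program by name from any directory.
found=$(command -v arguments_demo)
output=$(arguments_demo A B C)

echo "$found"
echo "$output"
```

To make such a change permanent, the PATH line would go into a shell startup file, with a real directory in place of the temporary one.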