Understanding and using GNU/Linux by Giuseppe Profiti January 2014 Tutorial for the Programming for Bioinformatics course, International Master of Bioinformatics University of Bologna, Italy http://www.biocomp.unibo.it/lsbioinfo/
Page 1: Introduction to Linux

Understanding and using GNU/Linux

by Giuseppe Profiti
January 2014

Tutorial for the Programming for Bioinformatics course, International Master of Bioinformatics

University of Bologna, Italy
http://www.biocomp.unibo.it/lsbioinfo/

January 2014 Giuseppe Profiti 2/64

Goals and means

● Goals
– Understanding what an Operating System is
– Knowing how to use GNU/Linux proficiently
● Means
– Simple examples (maybe biology-inspired)
– Exercises and hands-on
● Not covered
– Formal details
– “How do I use <our favourite software>?”

What is an Operating System?

● It's a piece of software

● It manages hardware and software resources

● It's useful for general purpose and heterogeneous hardware systems

Image from Wikimedia Commons, Public Domain

Hardware, OS and software

Hardware

Operating system

Image from Flickr, released under Creative Commons BY by Petr Dosek

Same OS, different software

Image from Wikimedia Commons, Public Domain (NASA)

Image from Flickr, Creative Commons BY by Texas A&M University

Another example

Image from Flickr, released under Creative Commons BY by Andrea Arden

Different hardware, different OS and software

GNU/Linux

● Originates from Unix
● Linux is the kernel
– Manages the hardware, memory and so on
● GNU is a set of software and tools
– They run on top of Linux
– Provide functionality
● Multi-user, multi-threaded
● Distributions: Ubuntu, Lubuntu, Xubuntu, Debian, Red Hat...
● MacOS is based on Unix too

What's the difference?

Image from Wikimedia Commons, GNU GPL license

A Linux distribution includes

● The Kernel (Linux)
● An install system for the distribution
● Drivers
– How the system can manage specific hardware
● A package manager
– To install and update software
– Usually different from one distribution to the other

Login

● Once started, the system asks for your
– Username
– Password
● Each user has a different main (home) folder on disk
● Users have different access rights
● The superuser (called “root”) can do everything
● On Ubuntu, the main user you created when installing can run programs as root, if needed (using sudo)

Shell

● It is the main interface with the system
● Can be used to
– Navigate the file system
– Execute tools
– Install software
– Connect to other machines
– Edit files
– … everything the system can do
● Also called Console, or Terminal

What a shell looks like

Image from Wikimedia Commons, licensed as Public Domain by User:AVRS

“It's a trap!”

Every time you use the mouse in a shell, you are doing something wrong.

Image by Manuel R., Wikimedia Commons, CC-BY

Exercise 1: Open a shell

● If you don't use the Graphical User Interface
– You already are in a shell
● If you use the Graphical User Interface
– In Ubuntu: Click the logo, type “terminal”, select it
– Other systems: find the terminal icon somewhere
● The terminal may have a black, white or colour background
– No matter the colour, it works in the same way

The prompt

● It is a string saying that the shell is ready
● It may state the current directory
● It ends with $, %, > or #
● After that, you can type a command
● After a command, you press the Enter key

Exercise 2: create a directory

● To create a directory (or folder) type:

mkdir tutorial-p2b

● and press the Enter key ↵
● What do you see?

● To check the existence of the new directory:

ls

● and press the Enter key ↵

Upper-case and lower-case

● The shell is CASE SENSITIVE
– Upper-case and lower-case are different

● LS is different from ls

● Tutorial-p2b is not tutorial-p2b

● So, to run a program, you have to type its name exactly

● You can use the TAB key ↹ to complete a filename after typing its first letters
– IF the system can tell which file you want

Exercise 3: look inside a directory

● Type:

ls tutorial-p2b ↵

● Type:

ls Tutorial-p2b ↵

● Type:

ls tut

● Then the TAB key ↹ , then the Enter key ↵

File system

● It stores both data files and programs
● Directories are lists of files
● Hierarchical structure
● The root of the tree is the directory /

/
├── home
│   ├── me
│   └── you
├── etc
└── bin

Filesystem

● Files and directories are stored in a filesystem
● The filesystem is like a tree:

– It has one root directory “/”

– Each subdirectory is a branch in the tree

– Each file is a leaf

Path

● A path specifies a location in the filesystem
● It indicates the branches to follow
● Each branch (directory) is separated by /
● The path can be absolute or relative
● Absolute: always starts from the root
– e.g. “/home/Alice/Desktop/vacation/sunset.jpg”
● Relative: starts from your current directory
– e.g. “Desktop/vacation/sunset.jpg” if you are in /home/Alice/

Special directories

● The current directory is “.”
– So “sunset.jpg” and “./sunset.jpg” are the same file
● The parent directory (one level up) is “..”
– e.g. if you are in “/home/Alice/Desktop/work/”, you write “../vacation/sunset.jpg”
– If you are in “/home/Alice/experiment/data/”, you type “../../Desktop/vacation/sunset.jpg”
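The rules for “.” and “..” can be tried safely in a scratch directory. The following is only a sketch: it rebuilds part of the Alice example under a temporary directory created with mktemp (an assumption, so that nothing under the real /home is touched), and the file content "photo" is made up.

```shell
# Build a throwaway copy of the example tree (created under a temp dir, not the real /home).
root=$(mktemp -d)
mkdir -p "$root/home/Alice/Desktop/vacation" "$root/home/Alice/Desktop/work"
echo "photo" > "$root/home/Alice/Desktop/vacation/sunset.jpg"

# From the "work" directory, ".." goes one level up, to Desktop.
cd "$root/home/Alice/Desktop/work"
from_work=$(cat ../vacation/sunset.jpg)

# "." is the current directory, so both forms below name the same file.
cd ../vacation
plain=$(cat sunset.jpg)
dotted=$(cat ./sunset.jpg)

echo "$from_work $plain $dotted"
```

The command pwd prints the absolute path of the current directory, which helps when checking where a relative path starts from.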

Exercise 4: path

[Figure: directory tree rooted at /, with the directories HOME, WORK, A and B and the files 1.TXT, 2.TXT and 3.TXT]

While in /home/ check the following relative paths:
● A/1.TXT
● ../WORK/1.TXT
● ../WORK/A/../1.TXT
● ../WORK/A/../../HOME/B/../A/1.TXT

Specify the absolute paths for the following files:
● leftmost and rightmost 3.TXT
● leftmost and rightmost 1.TXT

File permissions

● Files can be read, written and executed
● The owner of a file can restrict these operations
– For herself
– For other members of the group
– For everyone else

Examples:
● Experiment data that should not be overwritten
● Data shared only with group members for read purposes

File permissions

● Permissions can be changed using chmod

● The shortcuts are:
– User (u), Group (g), Others (o), All (a)
– adding (+), removing (-)
– Read (r), Write (w) and eXecute (x)
● To remove the write permission from the group on a file:

chmod g-w filename
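To see the effect of chmod, compare the permission string printed by ls -l before and after the change. This is a minimal sketch on a scratch file; the file name experiment.txt and the starting permissions are assumptions chosen for the illustration.

```shell
# Work on a scratch file so that no real data is touched.
tmp=$(mktemp -d)
f="$tmp/experiment.txt"          # hypothetical file name
echo "measurements" > "$f"

chmod u=rw,g=rw,o=r "$f"         # start from a known state: rw-rw-r--
before=$(ls -l "$f" | cut -c1-10)

chmod g-w "$f"                   # remove the write permission from the group
after=$(ls -l "$f" | cut -c1-10)

echo "before: $before"
echo "after:  $after"
```

The first character of the string is the file type (- for a regular file); the next nine are the r, w and x flags for user, group and others.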

File types

● Extensions mean nothing
– .doc, .jpg and so on are just conventions
● Text and binary files
– Text can be printed and read by humans
– Plain text, CSV, XML are all text-based
– Binary files are meant to be read by programs
● Data and programs
– A program can be executed by the system (the executable permission alone does not make a file a program)

Programs and processes

● An executable program sits on the disk
● A running program becomes a process
– You can have multiple processes spawned from the same program: e.g. many blastall running
● Each process has a unique identifier (pid)
● To inspect the running processes: ps or top
● To quit a running process, use CTRL+c or

kill <pid>
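The life cycle of a process can also be followed from a single shell. The sketch below starts sleep in the background with &, reads its pid from the shell variable $!, and then kills it. It uses kill -0 (which only checks that the process exists) instead of ps, purely to keep the example self-contained; the 300 second duration is arbitrary.

```shell
sleep 300 &                       # start a long-running process in the background
pid=$!                            # $! holds the pid of the last background process

# kill -0 sends no signal: it only checks that the process exists.
kill -0 "$pid" 2>/dev/null && state_before="running"

kill "$pid"                       # ask the process to terminate (default TERM signal)
wait "$pid" 2>/dev/null || true   # wait until it is really gone

kill -0 "$pid" 2>/dev/null || state_after="gone"
echo "$pid: $state_before -> $state_after"
```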

Exercise 5: processes

● Open two shells
● In one shell run the following command

sleep 20m

● In the other shell, run ps to find the pid of sleep

● Kill the process using

kill <pid>

● Note: on remote servers you can't CTRL-C unless you keep the connection open

Parameters vs arguments

● The argument(s) is the subject of the operation
– ls /home/Alice/Desktop
– kill 260046
● Parameters (or options) modify the behaviour
– ls -l /home/Beatrix/Desktop
– top -h
● Parameters usually start with a minus sign
– A single one for single-letter parameters (-h, -p, -t)
– Two for longer parameters (--help, --out)

Inspecting a file

● head prints the first 10 lines

● tail prints the last 10 lines

– You can change the number of lines of both head and tail by specifying it as a parameter (e.g. -n 5)

● cat shows the whole file

– Beware of long files

● more shows the whole file, paginated

Exercise files

● Download the following file

http://profiti.web.cs.unibo.it/res/p2b/ex.tar.gz

● Uncompress it (e.g. with tar -xzf ex.tar.gz)
● It should contain the following files:

– test1.txt

– test2.txt

– data1.txt

– data2.txt

Exercise 6: looking into a file

● Print the content of test1.txt

cat test1.txt

● Print the first 10 lines of test1.txt

head test1.txt

● Print the first line of test1.txt

head -n 1 test1.txt

● Print the last line of test1.txt

tail -n 1 test1.txt

Finding text: grep

● It prints the lines containing a match

grep "pattern" filename

● Pattern can be a string or a regular expression
● Useful parameters
– -w matches whole words (i.e. the match is not part of a longer word)
– -x matches whole lines
– -i ignores case (uppercase = lowercase)
– -v inverts the match (i.e. prints lines NOT containing pattern)
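The effect of each parameter is easier to see on a tiny file. The sketch below uses made-up sample lines in a scratch file called words.txt (not one of the course files), and adds the -c parameter, which makes grep print the number of matching lines instead of the lines themselves.

```shell
tmp=$(mktemp -d)
# Made-up sample data, one entry per line.
printf '%s\n' 'good omen' 'momentum' 'Omen' 'nothing here' > "$tmp/words.txt"

substring=$(grep -c "omen" "$tmp/words.txt")        # "good omen" and "momentum"
whole_word=$(grep -c -w "omen" "$tmp/words.txt")    # only "good omen"
any_case=$(grep -c -i -w "omen" "$tmp/words.txt")   # "good omen" and "Omen"
inverted=$(grep -c -v "omen" "$tmp/words.txt")      # "Omen" and "nothing here"
whole_line=$(grep -c -x "momentum" "$tmp/words.txt") # only the line that is exactly "momentum"

echo "$substring $whole_word $any_case $inverted $whole_line"
```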

Exercise 7: grep

● Find all the lines containing “m” in test1.txt

grep "m" test1.txt

● Find all the lines NOT containing “m” in test1.txt

grep -v "m" test1.txt

● Find all the lines containing “omen” in test1.txt

grep "omen" test1.txt

● Please notice that “momentum” matches
– Try using the -w option

Finding text: grep /2

● You can provide a file of patterns

grep -f patterns.txt data.txt

● Every line of patterns.txt is treated as a separate pattern

● It may take a while if the two files are big

Comparing

● Look for the differences in two similar files

diff file1 file2

● Compares the two files line by line
● Output
– Line numbers for the different lines
– “<” for lines only in file1
– “>” for lines only in file2
● It is not very easy to use
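A small example helps in reading the output of diff. The sketch below compares two made-up three-line files that differ only in the second line; note that diff exits with status 1 when the files differ, hence the "|| true".

```shell
tmp=$(mktemp -d)
printf '%s\n' alpha beta gamma > "$tmp/file1"    # made-up content
printf '%s\n' alpha delta gamma > "$tmp/file2"

# diff returns exit status 1 when the files differ; "|| true" keeps a script going.
report=$(diff "$tmp/file1" "$tmp/file2" || true)
echo "$report"
```

The report starts with 2c2, meaning that line 2 of the first file was changed into line 2 of the second one, followed by the old line marked with < and the new line marked with >.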

Sorting

● Diffing is easier when data are sorted

sort filename

● Useful parameters:
– -n numerical sort (otherwise values are compared as text, so 100 comes before 2)
– -r reverse sort
– -k x sort on column number x
– -t x uses x as column separator
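The difference between text and numerical sorting can be checked on a few lines. This sketch sorts a made-up comma-separated file (counts.csv, a hypothetical name) on its second column and keeps only the first line of each result.

```shell
tmp=$(mktemp -d)
# Made-up data: a name and a count, separated by a comma.
printf '%s\n' 'geneB,2' 'geneC,30' 'geneA,100' > "$tmp/counts.csv"

# Text comparison on column 2: "100" comes first because "1" precedes "2" and "3".
text_first=$(sort -t , -k 2 "$tmp/counts.csv" | head -n 1)

# Numerical comparison on column 2: 2 is the smallest value.
number_first=$(sort -t , -k 2 -n "$tmp/counts.csv" | head -n 1)

# Numerical and reversed: the largest value comes first.
reverse_first=$(sort -t , -k 2 -n -r "$tmp/counts.csv" | head -n 1)

echo "$text_first | $number_first | $reverse_first"
```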

Getting columns

● Printing a specific column with cut (ex.: 3rd)

cut -f 3 filename

● You can specify the column separator with -d (the default is TAB)

● Useful arguments for -f:

– N prints the Nth column, counted starting from 1

– N- prints from the Nth to the end of the line

– N-M prints from Nth to Mth (included)

– -M prints from 1 up to Mth (included)
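The four forms of the -f argument can be compared on a single record. The line below is made up and uses ":" as separator, chosen only for the illustration.

```shell
line='chr1:100:200:geneA:+'                     # made-up record, ":" as separator

third=$(echo "$line" | cut -d : -f 3)           # N: only the 3rd column
from_third=$(echo "$line" | cut -d : -f 3-)     # N-: from the 3rd to the end
two_to_four=$(echo "$line" | cut -d : -f 2-4)   # N-M: from the 2nd to the 4th
up_to_two=$(echo "$line" | cut -d : -f -2)      # -M: from the 1st to the 2nd

echo "$third | $from_third | $two_to_four | $up_to_two"
```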

Redirection

● You can save the result of commands to a file
● The output is redirected using >

ls > files.list

● Append with >>

● Error messages are not part of the “output”; redirect them with 2>

● Both output and errors are redirected with &>

● Input redirection with <

cat < file.txt
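The redirection operators can be tried together in a scratch directory. In this sketch all file names are made up, and no-such-file is deliberately missing so that ls produces an error message.

```shell
tmp=$(mktemp -d)
cd "$tmp"

echo "first" > out.txt          # > creates the file, or overwrites it
echo "second" >> out.txt        # >> appends to it

# The error message goes to errors.txt; listing.txt receives the (empty) normal output.
ls no-such-file > listing.txt 2> errors.txt || true

# < feeds the file to the input of wc, which counts its lines.
lines=$(( $(wc -l < out.txt) ))
echo "out.txt has $lines lines"
```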

Pipe: motivation

● Example: I want the file names for all the files with rwx permissions

● Solution with redirection:

ls -l > files.list

grep "rwx" files.list > wanted-files.list

cut -f 10- -d" " wanted-files.list > result.list

Pipe

● Too many intermediate files
– Possibly big: disk space issues
– Hard to remember: do I need myfiles.list or my.list?
● Rule of thumb: keep an intermediate result only if you need it later for other analysis
● For everything else, use pipe |

ls -l | grep "rwx" | cut -f 10- -d" " > result.list

● Pipe sends the result of a command to the input of the following one
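A pipe and an intermediate file give the same result; the pipe simply skips the temporary file. The sketch below shows both versions on a made-up list (fruits.txt and the other file names are hypothetical).

```shell
tmp=$(mktemp -d)
cd "$tmp"
printf '%s\n' pear apple cherry apple > fruits.txt   # made-up data

# With an intermediate file
sort fruits.txt > sorted.tmp
head -n 1 sorted.tmp > with_file.txt

# With a pipe: the output of sort becomes the input of head
sort fruits.txt | head -n 1 > with_pipe.txt

cat with_file.txt with_pipe.txt
```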

Pipe

● All the previous commands also work without a file as input, reading from a pipe instead

● The first 10 lines of a list of files

ls | head

● The first column of the last line of a sorted file

sort file.txt | tail -n 1 | cut -f 1

Pipe vs sequence

● Pipe sends the result to the next command
● If you want to execute commands in sequence, separate them using ;

ls; head test.txt

● What if the second depends on the first? Use &&: the second command runs only if the first succeeded

python my.py > a.txt && sort a.txt
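The difference between ; and && shows up when the first command fails. In this sketch the standard commands false and true stand in for a failing and a succeeding program; they are placeholders, not part of the original example.

```shell
# ";" runs the commands in sequence, whatever the result of the first one.
sequence=$(false; echo "second ran")

# "&&" runs the second command only if the first one succeeded (exit status 0).
after_failure=$(false && echo "second ran") || true
after_success=$(true && echo "second ran")

echo "with ;              -> '$sequence'"
echo "with && (failure)   -> '$after_failure'"
echo "with && (success)   -> '$after_success'"
```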

Editing a file

● There are too many editors to list them all; these are the most common
● On the shell
– cat > filename writes everything you type to the file (CTRL+d ends the input)
– nano, pico: easy to use
– vim, emacs: more advanced
● On the GUI
– gedit
– gvim

Shell scripting

● What if the command is very long and you have to use it again?

● What if you have to repeat the same operations for many inputs?

● Shell scripting is programming for the shell
● Same primitives as programming languages
– if conditions, for loops
– Parameters, variables

Shell scripting /2

● Save commands to a text file
● Add execution permissions to the file
● Call the file from the shell
● Example:

for i in $(ls *.fasta); do echo $i, $(grep "^>" $i | wc -l); done | sort -n -k 2 > $1

Shell scripting /3

for i in $(ls *.fasta); do echo $i, $(grep "^>" $i | wc -l); done | sort -n -k 2 > $1

● $( ) returns the output of the commands inside

● Useful with cat and any other command that prints content

Shell scripting /4

for i in $(ls *.fasta); do echo $i, $(grep "^>" $i | wc -l); done | sort -n -k 2 > $1

● for executes the commands between do and done once for each value in the list

● i is the iteration variable: at each iteration it holds one of the values (in the example, a file name); you access its value using $i

Shell scripting /5

for i in $(ls *.fasta); do echo $i, $(grep "^>" $i | wc -l); done | sort -n -k 2 > $1

● The combined output of all the loop iterations is passed to sort

● This script returns a list of fasta files, each with its number of entries, sorted by that number

Shell scripting /6

for i in $(ls *.fasta); do echo $i, $(grep "^>" $i | wc -l); done | sort -n -k 2 > $1

● The final result is redirected to a file, specified on the command line ($1 is the first argument of the script)

● Examples:

bash myscript.sh result1.txt

bash myscript.sh result2.txt
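The same idea can be written a little more defensively. The sketch below is a variant, not the script from the slides: it loops over the *.fasta pattern directly instead of calling ls, quotes the variables, and uses grep -c to count the matching lines. The function name count_entries and the two sample files are made up for the demonstration.

```shell
# List each FASTA file in the current directory with its number of entries,
# sorted by that number (count_entries is a hypothetical helper name).
count_entries() {
    for f in *.fasta; do
        [ -e "$f" ] || continue            # skip the literal pattern when nothing matches
        echo "$f, $(grep -c '^>' "$f")"    # grep -c prints the number of matching lines
    done | sort -n -k 2
}

# Demo in a scratch directory with two made-up FASTA files.
tmp=$(mktemp -d)
cd "$tmp"
printf '%s\n' '>seq1' 'ACGT' '>seq2' 'GGCC' '>seq3' 'TTAA' > big.fasta
printf '%s\n' '>only' 'ACGT' > small.fasta

count_entries > result.txt
cat result.txt
```

Quoting "$f" keeps the loop working on file names that contain spaces, although the sort on column 2 still assumes names without them.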

Awk

● Awk executes a series of commands for each line of the input

● It can execute different commands for different lines, using matching regular expressions

● It may be faster than other tools
● It is easy to use and powerful

Awk /2

awk '/<regex>/ {<commands>}' a.txt

● You can specify multiple regular expressions
● Commands can contain if and assignments
● Two special keywords can be used instead of a regex
– BEGIN matches the beginning of the input, before the first line
– END matches the end of the input, after the last line

Awk /3

awk 'BEGIN {a=0} {a=a+1} END{print a}'

● It counts the number of lines
● Before the first line, sets the variable a to zero
● For each line, increases the counter
– There is no regex, so each line matches
● At the end, prints the value of the counter
● Works better than wc -l: wc -l counts newline characters, so it misses a last line that does not end with a newline
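One case where the two counts differ is a file whose last line has no final newline character. The sketch below builds such a file (the name no_newline.txt is made up) and counts its lines both ways.

```shell
tmp=$(mktemp -d)
# Three lines, but no newline character after the last one.
printf 'one\ntwo\nthree' > "$tmp/no_newline.txt"

# wc -l counts newline characters, so the unterminated last line is not counted.
wc_count=$(( $(wc -l < "$tmp/no_newline.txt") ))

# awk processes the last line as well, even without the final newline.
awk_count=$(awk 'BEGIN {a=0} {a=a+1} END {print a}' "$tmp/no_newline.txt")

echo "wc -l: $wc_count, awk: $awk_count"
```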

Awk /4

awk '{print $2,$3}'

● Prints the second and the third column
● By default, columns are separated by whitespace (spaces or tabs)
● You can specify a different separator with -F

awk -F "," '{print $2,$3}'

● NF is the number of columns (or “fields”)
● $NF is the value of the last column
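Columns, NF and $NF can be checked on a single record. The comma-separated line below is made up for the illustration.

```shell
row='P12345,kinase,human,412'     # made-up comma-separated record

second_third=$(echo "$row" | awk -F "," '{print $2,$3}')   # columns 2 and 3
field_count=$(echo "$row" | awk -F "," '{print NF}')       # number of columns
last=$(echo "$row" | awk -F "," '{print $NF}')             # value of the last column

echo "$second_third | $field_count | $last"
```

The comma inside print separates the values with a space in the output, whatever the input separator is.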

Awk /5

awk '/^ATOM/ {if ($5=="A") print $7,$8,$9}'

● Prints the positions for each atom in the A chain
● It matches only lines starting with “ATOM”
● You can select lines not matching a pattern

awk '!/(TAG)|(TAA)|(TGA)/ {print $3,$4}'

● The ! means “not matching”
● Round brackets group patterns
● | is for alternatives

Awk exercise 1

1. Print lines containing m in test1.txt
2. Print lines not containing m in test1.txt
3. Print lines with A in second column in test1.txt
4. Print the third column of test2.txt
(a) Use comma as separator
(b) Use E as separator

Awk /6

awk 'BEGIN {name=""} /^>/ {name=$0; d[name]=0} !/^>/ {d[name]=d[name]+length($0)} END {for (i in d) print substr(i,2,length(i)),d[i]}'

● Uses an array d, which works like a Python dictionary
● $0 is the whole line
● substr returns a substring; positions start from 1
● Prints a list of fasta entries and their length
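The command can be tried on a toy file. In this sketch toy.fasta and its two entries are made up; the output is piped through sort because awk does not guarantee the order in which "for (i in d)" visits the keys.

```shell
tmp=$(mktemp -d)
# Made-up FASTA file: seqA spans two lines (4 + 2 letters), seqB one line (3 letters).
printf '%s\n' '>seqA' 'ACGT' 'AC' '>seqB' 'GGG' > "$tmp/toy.fasta"

lengths=$(awk '/^>/ {name=$0; d[name]=0}
               !/^>/ {d[name]=d[name]+length($0)}
               END {for (i in d) print substr(i,2), d[i]}' "$tmp/toy.fasta" | sort)

echo "$lengths"
```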

Awk exercise 2

● Print the sum of the elements of the third column of test1.txt

● Print the average of the elements of the fourth column of test1.txt

● Take a look at data1.txt and data2.txt
– Did you just open them with an editor?

– Did you just use “cat”? Or “more”?

Awk exercise 3

● How many lines in data1.txt and data2.txt?

$ wc -l data*
2999997 data1.txt
2999999 data2.txt

● Is it true?
– data1.txt contains 2999998 lines
– data2.txt contains 3000000 lines
● They contain the same numbers, except for 2
● Which ones?

Awk /7

awk 'BEGIN {while ((getline<"patterns.txt")>0)diz[$1]=0} {if ($1 in diz) print $0}'

● Works like grep -f patterns.txt, but compares the first column of each line with the patterns exactly
● getline reads the file one line at a time
● Each line becomes a key in the array
● The input is then checked against existing keys
● For big files, it is faster than grep
– O(N*M) for grep vs O(N+M) for awk
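The lookup can be tried on two small files. In this sketch the content of patterns.txt and data.txt is made up; only the awk program comes from the slide.

```shell
tmp=$(mktemp -d)
cd "$tmp"
printf '%s\n' P2 P4 > patterns.txt                          # keys to look for (made up)
printf '%s\n' 'P1 10' 'P2 20' 'P3 30' 'P4 40' > data.txt    # made-up input

# BEGIN loads every pattern as a key of the array diz;
# then each input line is printed only if its first column is one of the keys.
hits=$(awk 'BEGIN {while ((getline < "patterns.txt") > 0) diz[$1]=0}
            {if ($1 in diz) print $0}' data.txt)

echo "$hits"
```

Each input line costs a single array lookup, which is why the running time grows with the sum of the two file sizes rather than with their product.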

Awk exercise 3, solution

diff <(sort data1.txt) <(sort data2.txt)

● Diff is picky, the result is not that good
– Took 14 seconds on a test computer

grep -v -f data1.txt data2.txt

● Good luck, it may take a while
– It may freeze your computer
● Awk takes 4 seconds on a test computer

Awk vs Python

● Reading fasta, awk style

awk 'BEGIN {name=""} /^>/ {name=$0; d[name]=0} !/^>/ {d[name]=d[name]+length($0)} END {for (i in d) print substr(i,2,length(i)),d[i]}'

● Note: awk scripts can be saved to a file

● Use the -f option to call the saved file

Awk vs Python

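● Reading fasta, Python style

import sys

d = {}      # entry name -> sequence length
name = ""
with open(sys.argv[1]) as f:
    for r in f:
        r = r.rstrip()
        if r.startswith('>'):   # header line: start a new entry
            name = r[1:]
            d[name] = 0
        else:                   # sequence line: add its length
            d[name] += len(r)
for k in d:
    print(k, d[k])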
