Getting Started with the BDSG Login Service

last modified 05/07/16

Introduction and Learning Objectives

This document comes in two parts – a short introduction to a number of necessary concepts,

and a set of annotated practical exercises to work through.

This tutorial will introduce the concept of the Unix operating system and then some of the

commonly used inbuilt commands. Basic programs for editing files are shown, and then

some command-line syntax useful for (re)directing input and output to programs and other

file manipulations. A short glossary/summary of commands is given at the end of the

document. By the end of the practical, you should be comfortable moving around your

account, manipulating directories, files and running simple commands.

Requirements

To work through the exercises in this practical, you will need login access to a machine

running linux – either a server or a linux workstation. Here we give instructions assuming

that you will have a user account and password on the bioinformatics server

codon.bioinformatics.ic.ac.uk, which you will access via a local machine (PC, Mac or linux

workstation) with connection software already installed – such as a College teaching

machine. You can work through the tutorial using any local machine (PC, Mac or linux

machine) that is connected to the college network.

If you are working from anywhere else e.g. from home, you will need to use a VPN

connection as the server will only accept connections from the ic.ac.uk domain for security

reasons. (see below)

Supplementary information on installing the connection software on your local machine (if it is not already installed), on logging in from different combinations of machines (e.g. from a Mac or a Linux box to Codon) and on configuring the necessary connections is available from http://www.imperial.ac.uk/bioinformatics-data-science-group/support/help/ (under ‘connecting to codon’). You will need your standard college username and password

to access these help pages. If you don’t already have an account with us (Bioinformatics

Support Service/ Bioinformatics Data Science Group), you will need to apply for one via our

web-form at

http://www.imperial.ac.uk/bioinformatics-data-science-group/support/apply-for-account

Help on setting up VPN on a private machine is available from the main ICT web-site at

http://www.imperial.ac.uk/admin-services/ict/self-service/connect-communicate/remote-access/method/set-up-vpn/

What is UNIX?

UNIX is a commonly used operating system that has evolved continuously since it was first created in the early 1970s. This has led to the development of a large number of programs which, strictly speaking, are not part of “UNIX” but come bundled with the operating system, so you would expect to find them on all UNIX machines. It is this bundled software that makes UNIX so popular

and powerful. This does not mean that the software was written to be easy to use, but it does

allow us to perform very complicated tasks. Unfortunately as so many people have contributed,

the look and feel of UNIX may not always appear consistent or logical. UNIX comes in a variety

of different ‘flavours’, but one of the most commonly used ones today is ‘Linux’ and that is what

the servers you will use today are running. Linux is distributed by several groups and Red Hat,

Ubuntu, Debian and Fedora are all names of common distributions.

Shells

When UNIX was first invented, it was felt that someone should invent a shell to protect users from its raw edges. A shell listens for your commands and passes them on to the operating system on your behalf. The important point is that there are many different shells: the Bourne shell (sh), the C shell (csh), the Korn shell (ksh), the tcsh shell (tcsh) and the bash shell (bash).

We use the bash shell as our standard login shell on the training servers, as it is the default user

shell on most linux distributions – but please note that there are others, and the syntax of doing

things MAY NOT BE THE SAME in different shell environments. We have set up the servers so

that all the bioinformatics software you will need later is set up automatically. This will not always

be the case on other systems, and you may have to type additional commands or add them to

setup scripts to tell programs where to find dependencies, variables etc. This is a more advanced

topic and is outside the scope of this practical.

X-windows and X11 emulators

The X Window System is a network-transparent window system that can run on a wide range of machines. It allows you to log on to a remote machine and, instead of having to enter every command to that machine on the command line, have a windowed environment displayed on your

screen. This window can have pull down menus, buttons, etc. The current version is X Window

System, Version 11 (or X11).

To use any X11 program on the server you only need to load one extra piece of software, called

an X11 emulator, onto your local desktop computer. If you are working on a Linux machine, X11

itself will almost certainly be already installed as part of its operating system. With an X11

emulator, you can run the X11 program across the network on your PC or Mac.

An X11 emulator is generally already installed on College teaching PCs. If you want to use X11

on your own local machine, free software is available for both PCs and Macs. Full information on

the software required to use X11, and also to move files backwards and forwards between

Codon and your local computer is available as above from

http://www.imperial.ac.uk/bioinformatics-data-science-group/support/help/.

Logging into the Unix Server codon.bioinformatics.ic.ac.uk

These instructions by default assume that you are in a College computer teaching room, or have in front of you a PC attached to the College network that already has the following connection software installed: PuTTY (SSH client), FileZilla (secure file transfer), and Xming or Exceed (X11 emulator). They use the Bioinformatics Support Service/Bioinformatics Data Science Group’s server codon.bioinformatics.ic.ac.uk for the

remainder of the tutorial. You will need a username and password specific for this machine

– your standard College username/password WILL NOT work here.

If you are logging in from other machines, specific instructions for doing so are available

from our web site, as listed above.

Log in to your PC using your standard college username and password. You will need to

use the Putty program to login to our server, where you will run the remainder of the

practical. To allow the server to display graphics on your screen, you will also need to use

X11 software on the PC. Your teaching machine has either Exceed or XMing installed for

this purpose.

You will first need to configure and save a session inside Putty:

Find and double click on the PuTTY icon. If it is not on the desktop, look in the Start Menu,

following All Programs:

You should see a screen that looks rather like this: (but the ‘Saved Sessions’ field may be

empty)

Type codon.bioinformatics.ic.ac.uk into the Host Name (or IP address) box, and ensure

that the Protocol is set to SSH (shown by the ring above). Then click on

SSH in the left-hand pane to open its additional options – select X11, as indicated by the

arrow. You will then see the following:

Make sure that the box next to Enable X11 forwarding is ticked, as shown. Then click

Session in the Category list (on the left). This will take you back to the original screen,

where you should save the settings you have made as follows:

Type codon.bioinformatics in the Saved Sessions box, and click on Save. You have now

made a shortcut to enable you to login to the server next time without having to do any

configuration.

TOP TIP (optional): the session you have created will generate a screen for you to work in

that has white text on a black background. If you prefer alternative colours, you can change

them inside the Putty configuration. Make sure that your ‘codon.bioinformatics’ saved

session is loaded by selecting it and clicking on ‘Load’, then go to the menu on the left side of

the PuTTY screen and click on the ‘Window -> Colours’ option as below.

Select ‘Default Background’ and then ‘Modify’. You can now select a suitable background

colour. Now select a ‘Default Foreground’ colour, to produce text that is visible against the

background. When you are done, you can go back to the ‘Session’ menu at the top of the

left-hand menu and click on Save.

Now we need to start an X11 emulator program. Here we will assume that your PC has

Exceed installed. Some machines may have Xming installed instead; alternative notes for

using Xming are given in a separate section at the end of the Exceed notes.

Look for the Exceed icon on the desktop, which looks a bit like this:

If you can’t find it, search for Exceed under the ‘All programs’ windows menu, and launch by

clicking on the icon you find there. NOTE: you may not see any new program window

appear on your desktop. Now you have X11 running, you can connect to the server, by

going back to Putty, clicking on your codon.bioinformatics saved session to select it, and

clicking on the ‘Load’ button followed by ‘Open’.

You will see a new window appear, that will look something like this (colour may differ

depending on your Putty configuration):

You will need the codon.bioinformatics username and password we have sent to you

earlier by email. You cannot log into this server using your standard college username and

password. Type in your username and password, pressing return each time (you won’t see

any characters on the screen when you type the password).

The first information to appear on the screen once you have logged in, is the location and

date/time that you last logged into the server, followed by a banner telling you which

machine you are connected to and a help email address ([email protected]). After this is a

section where the administrator of the server can add any new messages about the service

– for instance warning of scheduled maintenance sessions (not shown in the example

above). This is called the Message of the Day (MOTD for short). On codon, you will then see

some horizontal bars that show a summary of how much space your account is using (more

on this in a later section).

The Prompt

After you have typed in your password and pressed return, a line appears on the screen that has the form

[sarahb@codon ~]$

All of this together is called the prompt.

The prompt reminds you of your username (e.g. sarahb), the short name of the machine you are logged in to (codon), and the directory you are currently in (~ here, your home directory; more about this later). It also shows that the machine is waiting for you to give a command.

You can open as many terminal windows at once as you want or need (there is a limit, but you won’t ever need that many). You can start another terminal by going back to PuTTY and starting another codon.bioinformatics session, the same as before. Terminal windows are maximised, minimised and moved around the screen in the same way as normal PC or Mac windows. Their size can also be adjusted by dragging on a corner with the left mouse button. Please do not close them by clicking on the X in the top right-hand corner: this is not a safe way to log out.

N.B. When you have finished and are ready to log out, you can close a terminal window by typing exit at the prompt, or by pressing <Ctrl> d (hold down the control key and type d).

Now we can check that X11 forwarding is working by typing the command:

xeyes

After a short pause, you should see a pair of googly eyes appear somewhere on your

screen:

If you can see them, go back to your PuTTY session and stop the xeyes program by typing <Ctrl> c (i.e. hold the control key down while typing the letter c).

If you see an error message and no eyes appear, please go back to your Putty configuration

and check that you have the Enable X11 forwarding box ticked – save any changes,

restart PuTTY and try xeyes again.

Using XMing For X11 on a PC (instead of Exceed)

Look on your desktop, quick launch bar or under the programs menu for the Xming icon, which looks a bit like this:

Start the program by selecting it from the menu or double-clicking the shortcut. You won’t see very much happening at this point: Xming will add an icon to the notification panel at the bottom right of your screen. Ideally, you should start the X11 program BEFORE starting your PuTTY connection.

Terminals – copying and pasting

When X11 was originally designed, the assumption was that everyone would have a three-button mouse, using the left mouse button to highlight and copy, and the middle button to paste. So, what do you do if your mouse has fewer buttons?

Many mice only have two buttons, or perhaps 2 buttons and a central scroll wheel. To

emulate the third button (needed for pasting inside a terminal window), there are 2

possibilities, depending on how the mouse has been configured. Where there is a scroll-

wheel, pressing down on the scroll wheel (i.e. into the body of the mouse, rather than turning

the wheel) will paste, or if there are only 2 buttons and no scroll wheel, pressing the 2

buttons simultaneously will paste. On a Mac, you may only find one button on the mouse, or

two. There, holding down the Apple command key on the keyboard and c while

highlighting text should copy, while using the command key with v should allow you to

paste. Remember, whatever you are selecting to copy and paste will get pasted where your

cursor is. To select something for copying, press the left button and select the text, then

paste using whatever is designated as the middle mouse button – as above.

Exercise:

Go back to PuTTY and open another terminal window. Practise selecting some text in one

window and pasting it into the other terminal. You can also copy and paste between the

terminal and other programs on your PC, e.g. Notepad. When you are happy, you can shut

the extra terminal window by typing exit.

Basic commands

It is possible to achieve a great deal with only a basic set of Unix commands. The server can be

thought of as a very large filing cabinet, containing files within file folders, within file folders, etc.

Folders are commonly referred to as Directories and Subdirectories within Unix. Carrying on

with the filing cabinet simile, imagine how hard it would be to find anything if you just threw all

your documents in the drawer without any folders, or dividers – chaos! The same thing will

happen to your Unix account if you choose to keep all your documents in your home directory,

instead of creating subdirectories (file folders) to store associated data together.

Your home directory is the directory you automatically start off in every time you log in to your

account.

There is a short-cut name for your home directory when you are typing – which is the ~ (tilde)

symbol.

First we will make a directory called course in which to store the files you will generate today.

Type:

mkdir course

To move into this new directory, type: (note that the prompt changes to show the new directory)

[jbloggs@codon ~]$ cd course

[jbloggs@codon course]$

cd stands for change directory

course is a subdirectory of jbloggs, this person’s home directory. To show the fully qualified

pathname for your current directory type:

pwd

/home/jbloggs/course (typical reply)

pwd stands for ‘print working directory’ and will return the full path to where you are on the

machine relative to a fixed point – the Root of the machine. Here, this tells you that course is a

directory, within the directory jbloggs, which is a directory within home - we are using a

‘hierarchical file system’ which means that we can have directories within directories.
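As a minimal sketch of the commands above, the following can be pasted into a terminal or saved as a script. So that nothing in your real account is touched, it works inside a scratch directory created with mktemp -d; this is an assumption made for illustration (the tutorial itself uses your home directory), and the variable names demo and here are hypothetical.

```shell
# Work in a throwaway directory (assumption: mktemp is available,
# as it is on most Linux systems).
demo=$(mktemp -d)
cd "$demo"

mkdir course          # make a new directory called course
cd course             # move into it
here=$(pwd)           # pwd reports the absolute path of the current directory
echo "$here"          # prints a path that ends in /course

cd ..                 # step back out of course
rmdir course          # rmdir removes an empty directory
cd /
rmdir "$demo"         # tidy up the scratch directory
```

The path printed by pwd starts with a / because it is measured from the root of the machine, and each further / separates one directory from the next.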

N.B.

Knowing the full path for a particular file is important when you need to tell the machine where to

find files you want to work on, which may reside in directories other than the one you are

currently working in. There are 3 ways of specifying the location of a file or directory:

1. Absolute address from the root of the machine (like the one shown above)

2. Relative to your home directory

3. Relative to the current directory you are working in at the time

Use the one that is easiest for you at the time. A location relative to the root of the machine

always starts with a /

When you type pwd, you will see an absolute path from the root of the machine. Other

forward slashes are added to delineate between directories. Relative paths do not start with

a /. There are various shortcut symbols to help you move around as well. We will explore

paths shortly in the exercises, but here is another example:

Examples of absolute addresses of the files:

/usr/users/fred

/software/eric

/data/rnaseq/reference/Homo_sapiens/Ensembl/hg19/Annotation/foo.gtf

As an example, let us assume we are currently in /usr and we want to move the file fred

into the software directory. We would type:

mv users/fred /software

or, alternatively, we could type:

mv users/fred ../software

The symbol “..” stands for the directory one step back towards the root of the machine (the parent directory).

A single dot “.” stands for the directory that you are in at the time, i.e. your current directory.

The tilde symbol “~” stands for your home directory.
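The mv example can be tried without access to the real /usr and /software directories. The sketch below rebuilds the same layout inside a scratch directory; the variable top is a hypothetical stand-in for the root of the machine, and the empty file fred is created only for the demonstration.

```shell
# Build a miniature copy of the layout from the example.
top=$(mktemp -d)
mkdir -p "$top/usr/users" "$top/software"   # -p also creates missing parent directories
touch "$top/usr/users/fred"                 # an empty file called fred

cd "$top/usr"                     # we are now "in /usr"
mv users/fred ../software         # relative path: up one level, then into software
moved=$(ls ../software)           # fred

mv ../software/fred ./users       # "." is the current directory, so this is usr/users
back=$(ls users)                  # fred
echo "$moved $back"

cd /
rm -r "$top"                      # remove the scratch directory and its contents
```

Both mv commands use paths relative to the current directory; neither starts with a /.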

Now you can try this out in the exercise below: Type the following commands.

pwd (this will return your current directory, in this case course)

cd .. (this will move you one directory backwards, to your home directory)

cd /usr/biosoft (this will change your directory to one called /usr/biosoft)

ls (this will list the contents of this directory)

cd ~ (this returns you to your home directory)

A note on filenames

Unix makes use of many of the character keys on your keyboard. Some of them have

special attributes which means that they cannot be used in standard filenames – as they are

interpreted to mean something specific. There are ways of wrapping them so that they are

not interpreted by the operating system (e.g. by using an escape character first such as “\“ in

front of a space in a file name) but it is generally a GOOD IDEA TO AVOID using the

following characters in your file and directory names `¬!$%&*():;~#?/><,|\{ } [ ] / to stop

unexpected effects.

Spaces are also not expected in a file name. An unescaped space splits the name into separate words, e.g. a file called my filecalledfred will actually be seen as two names, “my” and “filecalledfred”, whereas my\ filecalledfred will be correctly seen as a single name.
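The effect of a space can be seen with a small experiment. This is a sketch in a scratch directory (created with mktemp -d, an assumption for illustration); the file names are the ones used in the text plus a hypothetical "another file" to show that quoting works as well as a backslash.

```shell
demo=$(mktemp -d)
cd "$demo"

touch my filecalledfred           # unescaped space: creates TWO files, my and filecalledfred
two=$(ls | wc -l)                 # counts the files: 2

rm my filecalledfred              # removes both of them again
touch my\ filecalledfred          # backslash before the space: ONE file
touch "another file"              # putting the name in quotes has the same effect
names=$(ls)
echo "$names"                     # another file, then my filecalledfred

cd /
rm -r "$demo"
```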

Hyphens, underscores and full stops in file and directory names are fine.

NOTE – a full stop used as the first character of a filename or directory name will create what is

known as a hidden file (one that is not seen when you list the contents of a directory). These

are generally used to tidy away configuration files that affect the way your account works.
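A short sketch of how hidden files behave, again in a scratch directory. The file names visible.txt and .hidden_config are hypothetical; the -a flag of ls, which lists all files, is covered in more detail in the section on command line arguments.

```shell
demo=$(mktemp -d)
cd "$demo"

touch visible.txt .hidden_config  # the second name starts with a full stop
plain=$(ls)                       # ls on its own skips names starting with a full stop
all=$(ls -a)                      # ls -a lists everything, including . and ..
echo "$plain"                     # visible.txt
echo "$all"

cd /
rm -r "$demo"
```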

Now we can copy some files into the course directory that you made earlier. These files are

currently sitting in a directory called intro_course. Type:

cd course

cp /home/biotrain/intro_course/* .

cp (short for copy) requires the name of the file or directory to copy and then the place to

put the copy.

NOTE the full stop, which comes after a space. Yes, you do need to type it, as it specifies the place to put the copies! Here, the full stop is short-hand for ‘the directory I am currently in’.

This copies all files (*) in the directory intro_course, which is a sub-directory of /home/biotrain, to the current directory (.) but not the directory intro_course itself. The * is known as a wildcard (more about this later).

TIP – to copy a directory and all of its contents (including other subdirectories and their

contents, if present), we have to copy recursively, e.g.

cp -R /home/biotrain/intro_course .

This would copy the directory intro_course AND all of its contents to your current directory.
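The difference between the two forms of cp can be sketched as follows. The directories src and dest are hypothetical stand-ins for /home/biotrain and your course directory, and the files a.pep and b.pep are invented for the demonstration.

```shell
src=$(mktemp -d)                  # stands in for /home/biotrain
dest=$(mktemp -d)                 # stands in for your course directory
mkdir -p "$src/intro_course/extra"
touch "$src/intro_course/a.pep" "$src/intro_course/extra/b.pep"

cd "$dest"
# Copies the files only. Without -R, cp refuses to copy the subdirectory
# extra and reports an error (hidden here by 2>/dev/null).
cp "$src"/intro_course/* . 2>/dev/null || true
flat=$(ls)                        # a.pep

cp -R "$src/intro_course" .       # the directory itself and everything below it
deep=$(ls intro_course/extra)     # b.pep
echo "$flat $deep"

cd /
rm -r "$src" "$dest"
```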

As with most UNIX commands, if this command has worked, there will be no output to tell you so. If anything is printed (except the usual prompt), the command has not worked: go back and check that you have typed it in EXACTLY as above. If you do receive an error message, it may be informative, for instance:

cp: cannot stat fred: No such file or directory

(this suggests that the copy command cannot find the file you are trying to copy – in this case fred)

To list the files now present in your current (working) directory, type:

ls (if this is empty, your copy command hasn’t worked - try again)

To list all the files in your home directory, (the one with the same name as your

username) type:

ls ~ (the tilde or ~ symbol is an abbreviated name for your home directory)

Command line arguments

There are two ways that a program can be given additional information - either

1) It can ask you questions on the commandline (prompt) – that you type answers to

2) You can offer the information without being prompted

The drawback of the program asking a question is that if it can do 20 different things, then

being asked 20 questions each time you run it can be very tedious. By convention most

UNIX programs don’t ask for information, they expect you to supply it. This is achieved by

using “command line arguments” sometimes also called flags. By convention, something

is indicated as an “argument” by placing a dash in front of it. Some bioinformatics programs

will ask a basic range of questions but expect additional information to be given via the

command line. We will look more closely at command line arguments, using ls as an

example case.

Try typing:

ls -l

To list all of your files using a different combination of command line arguments that influence the output, try typing:

ls -Rl ~ (This is a hyphen, a capital R and then a small L, not the number one)

The -R flag causes ls to list recursively through all directories below, in this case, your home directory, which is indicated by the ~ symbol.

The -l flag causes a long listing of the information, including sizes, ownership and last modification times.

On this machine, files and directories listed by ls are shown coloured by their type:

Blue: Directory

Green: Executable or recognized data file

Sky Blue: Linked file

Pink: Graphic image file

Red: Archive file

This can make things a little hard to read sometimes. We have set an alias on the ls

command so that when it is run, it automatically and silently adds the option to show

colours. To see this alias type

alias ls

and you will see the following:

alias ls='ls --color=tty'

(in other words, if someone types ls, they actually run ls with the optional flag “--color=tty”, which colours the output by type if it is run in a terminal).

Note: You can turn this colour-coding off in the terminal window (for this session only) by

typing unalias ls

Now try to sort all your files according to their age (newest last):

ls -lrt
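To see how the flags combine, the sketch below creates three files with known ages and lists them. The file names are hypothetical, and touch -t (which sets a file's modification time) is used only so that the order is predictable.

```shell
demo=$(mktemp -d)
cd "$demo"

# touch -t sets the modification time (format: YYYYMMDDhhmm)
touch -t 202001010900 oldest.txt
touch -t 202106150900 middle.txt
touch -t 202312310900 newest.txt

order=$(ls -rt)       # -t sorts by time, newest first; -r reverses, so newest is last
echo "$order"
ls -lrt               # the same order, with the long listing added by -l

cd /
rm -r "$demo"
```

Single-letter flags can be written together after one hyphen, so ls -lrt means the same as ls -l -r -t.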

Finally, we can take a look at some files you don’t normally see when you list with ls:

ls -la ~

This makes visible so-called hidden files and directories, whose names start with a full stop, e.g.

drwx------ 8 train17 training 4096 May 17 13:49 .

drwxr-xr-x 23 root root 4096 May 5 13:43 ..

-rw------- 1 train17 training 17208 May 16 15:33 .bash_history

-rw-r--r-- 1 train17 training 18 Nov 20 05:02 .bash_logout

-rw-r--r-- 1 train17 training 193 Nov 20 05:02 .bash_profile

-rw-r--r-- 1 train17 training 231 Nov 20 05:02 .bashrc

drwx------ 3 train17 training 20 May 12 12:10 .cache

Here the top line is returning information about your current directory (.) and the second line about the directory one step further back towards the root of the machine (..). If you were to type the command inside /homes/train99, for instance, the top line would refer to train99 and the second line to homes.

NOTE – hidden or dot files (e.g. .bashrc) are generally doing useful work inside your account, influencing your environment. DO NOT DELETE THEM. If you delete them by mistake, your account may not look the same next time you log in, or certain programs may no longer work as expected. If so, contact [email protected] for help.

Ownership and Permissions

Example of file information returned by the command ls -l:

drwx------ 20 sarahb system 8192 Sep 12 2002 www_data/

-rw-r--r-- 1 johnp system 1200 Sep 24 17:30 tape.txt

The first character of each line (as below) indicates the type of the file. For example, a d in this position indicates a directory, - indicates a regular data file.

-rw-r--r-- 1 sarah system 1200 Sep 24 17:30 tape_change.txt

 ^^^

The next three characters define the permissions afforded to the owner of the file. In this case,

they should be set to rw- for all the listed files. This indicates that the file owner can read, write

to, but not execute the files.

Write permission is required in order to edit or delete a file. Execute permission is required if the

file is a program file or a file containing a list of textual UNIX commands (a script). Without

execute permission, a program or script file cannot be made to run, i.e. be executed. Execute

permission is also required for directories in order to gain full access to the files stored within.

-rw-r----- 1 sarah system 1200 Sep 24 17:30 tape_change.txt

    ^^^

It is possible to divide the users of a system into groups. This allows users to set their file

permissions such that members of their group have greater access to their files than do other

users of the system. The next three characters define the permissions that the members of the

user’s group have. The group name is given in the fourth column (in this case, “system”).

These three characters should be set r--, indicating that members of your user group may read

your files but may not write to (i.e. amend or delete) or execute them.

The next three positions refer to the access that everyone else (world) would have (in this case none, as shown by the three dashes).

-rw-r----- 1 sarah system 1200 Sep 24 17:30 tape_change.txt

       ^^^

The second column reports the number of links to the file (you can ignore this figure).

-rw-r----- 1 sarah system 1200 Sep 24 17:30 tape_change.txt

           ^

The next columns report the owner of the file (sarah) and the group (system) to which the file

belongs. The figures following this are the size of the file in bytes (characters if you prefer), the

date and time that the file was last modified, and finally the name of the file (or directory).
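The ten characters at the start of each line can be picked apart with a short script. This is a sketch only: it creates its own copy of a file called tape_change.txt in a scratch directory, sets its permissions with chmod (described in the next section) so that they match the example, and uses cut -c to select character positions.

```shell
demo=$(mktemp -d)
cd "$demo"
touch tape_change.txt
chmod u=rw,g=r,o= tape_change.txt     # owner rw-, group r--, world ---

line=$(ls -l tape_change.txt)
mode=$(echo "$line" | cut -c1-10)     # the ten type-and-permission characters
echo "type:  $(echo "$mode" | cut -c1)"      # - means a regular file
echo "owner: $(echo "$mode" | cut -c2-4)"    # rw-
echo "group: $(echo "$mode" | cut -c5-7)"    # r--
echo "world: $(echo "$mode" | cut -c8-10)"   # ---

cd /
rm -r "$demo"
```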

Changing file permissions

There may be a situation where you want someone else to be able to copy or read one of your files. You will have to change the permissions on the files to allow them to do so. You must also change the permissions of the parent directories, as these override those of individual files: a user needs execute (x) permission on every directory in the path to reach a file inside it, and read (r) permission on a directory to list its contents. It is a very common error to forget to do this. The command to change permissions is chmod. You have to specify who you are modifying permissions for, what permissions you are changing, and for what file/directory.

N.B. Unless you have a specific need to share a specific file-set, you should not normally

need to modify the permissions in your account.

u means user, and refers to the owner of the file

g means group, and refers to the group the file belongs to

o means others, everyone apart from those above

a means all, i.e. user, group and others

Also, as we have seen above, r means read permission, w means write permission and x means execute permission.

So, for example, to give read permission to someone in the same group for a file called “filename” in ~/course:

ls -l ~

chmod g+rx ~ (allow people in the group to list and enter my home directory)

chmod g+rx ~/course (allow people in the group to list and enter the directory course)

chmod g+r ~/course/filename (allow the group to read the file called filename)

chmod a+r ~ (allow everyone to read your home directory)

If you wanted to remove the permissions, use - instead of +:

chmod g-r ~ (stop your group from reading your home directory)
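The effect of chmod can be checked with ls -l before changing anything in your real account. The sketch below works on a hypothetical course directory inside a scratch area rather than on your home directory. It grants x as well as r on the directory, because execute permission is what allows other users to reach the files inside a directory.

```shell
demo=$(mktemp -d)
cd "$demo"
mkdir course
touch course/filename
chmod u=rwx,go= course            # start private: drwx------
chmod u=rw,go= course/filename    # start private: -rw-------

chmod g+rx course                 # group may now list and enter the directory
chmod g+r course/filename         # group may now read the file
dir_mode=$(ls -ld course | cut -c1-10)            # drwxr-x---
file_mode=$(ls -l course/filename | cut -c1-10)   # -rw-r-----

chmod g-r course/filename         # take the read permission away again
after=$(ls -l course/filename | cut -c1-10)       # -rw-------
echo "$dir_mode $file_mode $after"

cd /
rm -r "$demo"
```

The -d flag makes ls report on the directory itself instead of listing its contents.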

Looking at text-based files

There are a number of commands you can use to look at files that contain text. Sometimes

you may want to just send the contents to the screen all in one stream without stopping but

more generally you may want to be able to look at the content a screen-full at a time.

Two of the most useful are:

more filename

less filename

These two commands are very similar, but less has greater flexibility – (less does more than

more does – silly pun). Both will present information from the file to the screen one page at a

time, (as opposed to other commands that scroll down the document too quickly to read

such as cat). However, less will allow you to scroll back up the document using the arrow

keys, whereas more only allows you to scroll down. To exit out of a document you are

reading using more or less, type

q

Note: more is a standard UNIX command, whereas less may or may not be available on

other UNIX systems you may encounter.

For example, try the following:

cat cd4_human.pep

more cd4_human.pep

The more command shows you the contents of a file one page at a time and tells you to hit

the space bar to continue. Now try:

less cd4_human.pep

You should be able to use the arrow keys to scroll up and down the document.

When using less, there are a number of keystrokes you can use for navigation:

h             help
q             quit program
space bar     next page
return key    next line
f             forward one page (same as pressing the space bar)
b             back one page
G             go to the end of the file
g             go to the start of the file
j             moves you forward a line
k             moves you back one line
/xxx          search for the characters xxx in the file and highlight matches
n             find next occurrence of search pattern above
?xxx          search for xxx in the opposite direction (backwards)

Now try looking at one of your files using less, and navigate around the document

using some of the commands shown above.

Looking at Other Files

If files have been compressed using the gzip command, they will usually have a filename which ends in .gz. If they are text-based files, you can read their contents without uncompressing them by using the zcat command. This sends the output to the screen all in one go, so you might want to pipe it into the less command so that you can read it one screen-full at a time:

zcat myfile.gz | less (more on redirection later…)
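A sketch of the whole round trip, using a small invented file called myfile. Here head -n 1 (which prints only the first line) takes the place of less, so that the example runs without waiting for key presses.

```shell
demo=$(mktemp -d)
cd "$demo"
printf 'line one\nline two\nline three\n' > myfile
gzip myfile                           # replaces myfile with the compressed myfile.gz

first=$(zcat myfile.gz | head -n 1)   # reads the text; the file on disk stays compressed
echo "$first"                         # line one
left=$(ls)                            # still only myfile.gz
echo "$left"

cd /
rm -r "$demo"
```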

If you really need to look inside a binary file, you will either need to use a program designed to work specifically with its exact format (which will depend on which program created it), or you can extract readable strings out of it using the strings command.

Wild cards

The * character is a ‘wildcard’. That means it can stand for any character or sequence of characters (including none).

Thus:

*.seq means all files ending in .seq

c*.pep means all files whose names begin with c and end with .pep

* means all files (apart from hidden files)

Wild cards allow us to specify alternative filenames with a minimum number of keystrokes.

We can also use the ? character, meaning “any single character”, so

more cd4_?????.pep

will display any file beginning in cd4_, followed by any 5 characters, and then .pep. So,

here this would match cd4_mouse.pep and cd4_human.pep files, but not cd4_rat.pep.

We can also use square brackets to denote a range of letters [a-z] or a selection of

letters [abrh], so


more cd4_[abrh]????.pep

will match cd4_human.pep and cd4_rabit.pep, but not cd4_mouse.pep or cd4_rat.pep.

Now try these commands on the files in your current directory, and also try:

more *.pep
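A quick way to check what a wildcard pattern will match, before handing it to a real command, is to give it to echo. The sketch below assumes a bash-like shell and uses empty stand-in files with the names mentioned above (notes.seq is an extra, made-up name).

```shell
# Work in a temporary directory with empty stand-in files.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"
touch cd4_human.pep cd4_mouse.pep cd4_rabit.pep cd4_rat.pep notes.seq

# The shell expands each pattern into the list of matching file names.
all_pep=$(echo *.pep)
five_chars=$(echo cd4_?????.pep)
bracket=$(echo cd4_[abrh]????.pep)

echo "*.pep              matches: $all_pep"
echo "cd4_?????.pep      matches: $five_chars"
echo "cd4_[abrh]????.pep matches: $bracket"
```

Because the shell does the expansion, the same patterns work with any command (more, ls, cp, rm and so on), which is also why it is worth checking a pattern with echo before using it with rm.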

Copying and deleting files and directories

Here, we will carry out a number of basic file manipulations using UNIX commands. A

summary of the commands we use and what they do is provided at the end of these notes.

We are going to:

• make a new subdirectory under the one we are in at the moment

• move some files into it

• rename a few files

• delete the directory we have made (and its contents)

These are all functions that you will need in order to be able to organise and navigate within

your own account.

Type:

cd ~/course moves you into the directory course, under your home directory

mkdir test make a new directory called test

ls test list all the files in the directory called test (directory should be empty)

Now, we are going to copy a file into that directory.

cp cd4_human.pep test copy the file cd4_human.pep into the directory test

cp stands for copy, and is an important command. Note that you can

copy files or, if you add certain flags, directories. Notice that above, we are copying the file

cd4_human.pep to the directory test. If we had not previously created the directory called

test, the computer would have assumed that what we wanted to do was to copy the file

“cd4_human.pep” and call the copy “test”. If you wanted to be sure, you could write the

following, but it does the same as the command above:

cp cd4_human.pep test/cd4_human.pep

Remember, to the computer, files and directories are two different things. A directory is

something you can store other things in. But you do have to TELL the computer if you intend

something to be a directory or just a file. That is why you have special commands, like

mkdir, to make a directory.

Now, try the following:

cd test move into the directory “test”

mv cd4_human.pep newname.pep rename the file cd4_human.pep to newname.pep

cp newname.pep second.pep make a copy of newname.pep called second.pep


mv is short for move, and is the command used for either moving files to new locations, or

purely renaming them (a similar act!).

Now we want to delete, or remove, newname.pep. Type:

rm newname.pep

Now, let’s move up a directory, to course, and then delete the test directory completely:

cd ..

rm -r test

The .. is a shortcut, meaning ‘go back up one directory from where you currently are’ – in

this case back from test to course.

The flag -r is required to delete directories and will delete a directory recursively along with

all of its contents – including other subdirectories so BE CAREFUL.

On this machine, you will be prompted to examine contents of a non-empty directory and

asked if you want to delete each subdirectory and contents individually (type Y or N when

prompted). To blindly delete without examining, again insert a backslash in front of the rm (i.e. \rm) to

bypass the alias.

Empty directories are more usually deleted using the rmdir command.
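The whole sequence above can be sketched as a short script, assuming a bash-like shell. It runs in a temporary directory with a placeholder file, so it can be tried without touching your course files. In a script the interactive aliases described above normally do not apply, so rm does not prompt here.

```shell
# Work in a temporary directory with a placeholder file.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"
echo "placeholder content" > cd4_human.pep

mkdir test                     # make the new directory
cp cd4_human.pep test          # test exists, so the copy lands inside it
cd test
mv cd4_human.pep newname.pep   # rename the copy
cp newname.pep second.pep      # copy it under a second name
rm newname.pep                 # delete the renamed copy
remaining=$(ls)                # only second.pep should be left
echo "left in test: $remaining"

cd ..                          # back up one level
rm -r test                     # delete the directory and its contents
```

The original cd4_human.pep is untouched throughout, because only the copy inside test was renamed and deleted.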

Note how the destination affects the mv and cp commands. If the destination you are moving or

copying to is an existing directory, the source file(s) are moved (mv) or copied (cp) into that directory. If

the name given as the destination is not already known to the

computer as a directory, then the file is copied (cp) or renamed (mv) to that name.

If the target (destination) is a file which already exists, then the program will ask you to first

confirm the action (this is not standard, most UNIX systems will immediately overwrite the

original file).

ls (returns 2 files that exist)

normal_1_1_fastqc.html

normal_1_2_fastqc.html

cp normal_1_1_fastqc.html normal_1_2_fastqc.html

cp: overwrite normal_1_2_fastqc.html? n (user is prompted if existing file

should be over-written - file is not overwritten as answer no is given)

If you add a -f flag to the copy command, it forces the action to be done silently, but take

care if you choose to do this!

The file system has been set up to try to stop you copying over files and directories

accidentally. However, NOT ALL PROGRAMS ARE AS NICE. Although the more

dangerous commands (cp, rm, mv) have been modified so that they will at least ask first,

most bioinformatics programs won’t. So if you repeat an analysis, the results of the

second analysis may overwrite those of the first unless you give the program a new

destination name for the output. Other programs may just silently fail to run.


The moral here is to make copies of important files before you start manipulating them:

cp file file.orig

Using an editor to create a file

There are several text editors available to you on our servers. The most universal UNIX

editor is vi (or its slightly more helpful version vim) but it isn’t the simplest editor to use.

Today we will be looking at two editors, a simple editor called pico, and a windows-based

editor called gedit.

Pico

[Pico is available on Codon but not currently on training.medbio.]

We will create a simple text file. We will make a file of filenames called cd4.list. So, we

start up the editor pico, telling it to edit or create the file cd4.list. If we gave it no filename

pico would start a new file and ask you for a name to save it under when you exit the

program.

pico -w cd4.list (the -w flag tells pico not to linewrap long lines)

At the bottom of the screen you will see the standard pico commands in reverse video,

e.g. Ctrl-x exit, Ctrl-u uncut (paste back cut text), Ctrl-w search.

You need to use the arrow keys to move around inside your document. Now type the

following lines into the file:

cd4_cerae.pep

cd4_erypa.pep

cd4_human.pep

When you have inserted the three lines, quit from the editor by pressing <Ctrl> x and confirm that you want to save your changes when prompted.

The list you created consists of the names of three files in your current directory. Some programs

can take as input a list file like this (i.e. a file containing the names of other files to input). If you


wanted to use files in any other directory, you will need to tell the machine where to look for

them, either relative to the place where you are when using the listfile, or the absolute path from

the root of the machine. To do this, you need to specify the full path of your files.

e.g: /home/jbloggs/course/cd4_cerae.pep

Remember - the pwd command can be very useful if you are not sure what the full path to your

file is!
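If you need a list file containing full paths, you do not have to type them by hand. The sketch below, assuming a bash-like shell, builds cd4.list from a wildcard and the output of pwd. It uses a loop and output redirection, which are only introduced later in this tutorial, and the files are empty stand-ins created in a temporary directory.

```shell
# Work in a temporary directory with empty stand-in files.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"
touch cd4_cerae.pep cd4_erypa.pep cd4_human.pep

# $(pwd) expands to the full path of the current directory,
# so each line written is an absolute path to one file.
for f in cd4_*.pep; do
    echo "$(pwd)/$f"
done > cd4.list

cat cd4.list
line_count=$(wc -l < cd4.list)
```

A list written this way can be used from any directory, because none of the entries depend on where you are when the list is read.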

Now let’s look at the gedit editor (the Gnu editor - where Gnu is a free software foundation

rather than an ungulate).

Gedit requires an X11 connection, by the way, while pico does not, and will work in a simple

terminal – one reason to be familiar with both.

Start it by typing

gedit cd4.list &

The & symbol runs the program in the background so you can carry on working in the

terminal as well, if you wish – more about this later

This is a slightly more friendly-looking editor. Here we have several menus, selected by using the

right mouse button. At the bottom of the window, a menu currently showing a Plain Text option,

allows you to select auto-syntax prompting, for a large number of possible programming

languages, including Python, C++.

Now try editing the text file you have loaded. When you have seen enough, save the file and

then exit gedit.

Finding Help

There are a number of places you can go to find programs you need, or find out about

programs or commands. Here are a few options:


The man command

man more

The UNIX command for getting help is man (because it brings up manual pages). These

pages provide information on a number of programs on the system, including many of the

UNIX commands you may have cause to need. If you type:

man ls

you can now read all about the ls command, including what extra information you can give

the program to get it to do particular things. Of course, you need to already know the name

of the program to get help this way. If you don't know this however, you can type either of

the following to try and find out what commands exist for what you want to do:

man -k keyword

or

apropos keyword

You can now look at the man page for any command you think is appropriate.

Other help

A few bioinformatics programs (e.g. Hmmer) have man pages, but most don’t. Often help

files are distributed in html (web) format or as pdf files and can be found by searching our

web site by software name or by looking in our software database

(http://www.imperial.ac.uk./bioinformatics-data-science-group/resources/software).

If you experience problems using your account, including forgotten passwords, running out

of space, can’t find what you want, programs behaving unexpectedly, please contact our

email help-desk by mailing [email protected]

Passwords

If you are using this Unix account for more than the duration of this practical session (i.e. a

one day temporary account), you should change your password from the temporary one you

were assigned. Generally, temporary new passwords are disabled after a week, and if you

do not log in and change it before then, you may find yourself locked out until you contact

[email protected] for a new one.

This is an important security step as we have had to distribute your default usernames and

passwords by email. New passwords must be a minimum of 8 characters, contain at least

one number, capital and/or extended character (not spaces, exclamation marks, full stops,

brackets or slashes), and contain no obvious words. To change your password, type the

command passwd as below and then you will be prompted for your current password and

then to type in a new one twice (which will not show up on the screen):

passwd

[trainer@training rnaseq]$ passwd

Changing password for user trainer.


Changing password for trainer.

(current) UNIX password: (type the old password and press return)

New password: (type the new password and press return)

New password: (type the new password again and press return)

password for user trainer changed

If a password selected is not suitable or if there are differences between the first and second version

of the new password, you will be warned and the password will not be changed.

If you forget what you have changed the password to – you will need to email [email protected]

with your name, name of the machine you are trying to log in to (e.g. training.medbio.ic.ac.uk)

and current username, and we can send a new temporary one for you.

Quotas

On most of our servers, we use a system of quotas to control the amount of file storage any

particular user can use. As well as encouraging you to consider keeping your account tidy by

periodically removing unwanted temporary files, this can help to control the output of

runaway processes. Generally, quotas operate a bit like a bank account with an account limit

(soft quota limit) and an overdraft facility (hard quota limit). Quotas are usually set on

actual space (in kilobytes) and inodes (numbers of files). Unless you are storing huge

numbers of very tiny files, you are unlikely to ever hit the inode quota.

On some of our servers, Codon for instance, you will see information on your current space

and quota when you first login, something like this:

The histogram indicating the proportion of space used will display orange markers when you

are using more than 80% of your space, and red when over 90%.

Note – normal accounts will show only one bar here, when given access to a larger project

space which is quota-ed separately (e.g. data/syntegron in this example), each project will



show as a bar. Please note that the training server training.medbio – may not show this

histogram.

To see your account quota and how much space you are using you can use the quota

command.

quota -s

(the -s argument makes the quota command return appropriate units for allocated space,

here megabytes MB and gigabytes GB; the default shows only block units).

[sarahb@codon ~]$ quota -s

Disk quotas for user sarahb (uid 1003):

Filesystem blocks quota limit grace files quota limit grace

192.168.0.45:/mnt/home

5291M 10240M 11264M 4957 0 0

Filesystem - the file system to which the quota is applied (in this case, the file

system containing the home directories).

blocks - how much disk space you are using (in this case, 5291 MB)

quota - the actual amount of space you have been given to use (10240 MB)

limit - this is the absolute hard limit of space you can use. There is a small overdraft

allowance of space between your quota and limit - but you cannot exceed your hard

limit – files will no longer be created and your account may act strangely for this

reason.

grace - when you have filled up your allocated quota, the system automatically gives

you a period of time (7 days) during which you can use your overdraft space up to

the hard Limit. If you have not had a tidy up or contacted us within this time you will

not be able to create new files or edit files until you have removed something to free

up some space.

Files - the quota on the number of actual files you are allowed to have. At present we

are not applying quotas to the number of files you can store – hence the zeros.
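To make the numbers concrete, here is a small worked example, assuming a bash-like shell, that takes the figures from the sample quota -s output above and works out how much of the soft quota is in use. The line of figures is pasted in as a fixed string rather than read from a live quota command, so the arithmetic can be followed exactly.

```shell
# Figures copied from the sample output above:
# blocks (used), quota (soft limit), limit (hard limit), then the file counts.
sample='5291M 10240M 11264M 4957 0 0'

# Split the line into words and strip the trailing M from the sizes.
set -- $sample
used=${1%M}
soft=${2%M}
hard=${3%M}

# Integer arithmetic: percentage of the soft quota in use, and space remaining.
pct=$(( used * 100 / soft ))
headroom=$(( soft - used ))
overdraft=$(( hard - soft ))

echo "Using ${used}M of ${soft}M (${pct}%)"
echo "${headroom}M left before the soft quota; overdraft allowance is ${overdraft}M"
```

In this example the account is at roughly half of its quota, well below the 80% level at which the login histogram starts to show orange markers.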

Home Directories and where to work

Every account has a 'home' directory associated with it, which is your personal space to

store your data, and is backed up daily. By default, our home directories have a quota of

10Gb. This can be extended up to 100Gb on request. Project directories are created under

/data and can be made available to a group of users or an individual where larger space is

required, or contents need to be shared across a specific group (there may be charges

associated with this additional space).

If you think you have DELETED a file or mis-edited it, from inside your account, (including

project directories) and really need the old version – and the file was created more than 24

hours ago – please send an email to [email protected] giving the exact name and previous

location of the file or files. We may be able to restore a previous version from backups (this

is not generally possible from temporary training accounts).



Scratch Directories - Codon has a sizeable unquota-ed scratch volume available for use

as temporary workspace. The term scratch is generally used to denote a working space

which is used for temporary storage of data that is not backed up. By using this space in

your day-to-day work when you know you will be creating very large interim results files, you

do not need to worry about exceeding your home directory quota. BUT this space is not

backed up and old data is subject to automatic removal after fixed periods of non-use.

Consequently it is extremely important to ensure you move data requiring long-term storage

to either your home directory or project storage. NOTE – you will not need scratch or project

space for any training courses and training accounts are not set up with these.

Please contact [email protected] if you feel that you need to use the scratch volume,

and a directory will be created for you within it.

Getting Files on and off the Server

For security reasons, the standard ftp protocol is disabled on our servers. To transfer files on

and off the servers you will need to use a more secure protocol such as sftp or scp.

For transferring files between a PC or Mac and our servers we recommend the FileZilla

secure ftp client – which is free, and installed as standard on College Desktop machines.

On a windows machine, you may also find Winscp useful.

Information on using FileZilla with a Mac is available at

https://wiki.imperial.ac.uk/display/BioInfoSupport/Installing+FileZilla+on+OSX

Now we will use FileZilla to transfer a file from the server to our local PC. First you will need

to launch FileZilla on your local PC. Find the FileZilla icon on the desktop and

double click to start it. If you cannot find one, search for the program by name in the All

Programs Search box. Once FileZilla launches, the right hand (server) pane will be empty

if you have not used it before. First you will

need to add the details of the server you want to connect to:

type the server's full name (e.g. training.medbio.ic.ac.uk) into the Host dialogue box, your

username for that server in the Username box, the related password in the Password box,

type 22 into the Port box (this is the default port number on the server for listening for SFTP

requests) and then press the Quickconnect button.

Some information will appear in the top section, telling you that you are being connected and

you may see a new window telling you that authentication keys are being added to the

server – if prompted, say yes. Once connected, the right hand Remote pane will be

populated with the directory tree starting at your home directory. Now you can browse to the

file you want to transfer (you can select more than one using the Shift or Ctrl keys), click to

select it and simply drag it across from the Remote (server) pane to the left-hand Local (PC)

pane, into the appropriate folder. If you want to view a part of the server outside of your

account (e.g. /data) you can simply type the full directory address in the Remote site

dialogue box as below.

Now go to /data/rnaseq/fastqfiles on the server and copy the file normal_1_1.fastq to your

C:\temp or tmp folder. You can take a look at the contents of the file if you wish, using

Notepad or Wordpad. Once you have finished with FileZilla, shut the server connection by

using the Server menu and Disconnect option and then shutting the program in the usual

way.


Please note that Microsoft Office format files (e.g. Word .docx, Excel .xls) are not readable

on our Unix servers. If you need to read contents from these formats, you can save as .txt or

.csv respectively before transfer, but the default end-of-line characters (that you can’t

normally see) are different between windows and linux and may cause problems with some

programs.

More Useful UNIX Features

Unix supports a “standard input” and “standard output” model. Normally, information is

accepted from the keyboard, (also known as “standard input”) and programs send

information back to the screen (also known as “standard output”). However, this input/output

can be redirected e.g. to a file. Some programs send their output to a file as standard. Errors

produced by a command are usually sent to the screen as well – although they come via a

different stream - generally called “standard error”.

Standard Output redirection

As an example we will consider the cat (concatenate files) command. This normally sends

the contents of a named file to the screen, all in one go. You can, however, redirect the

output into a file using the > symbol and the name of the file you wish the information to be

sent to.

cat unknown.tfa sends the contents of unknown.tfa to the screen

cat unknown.tfa > another.file sends the contents of unknown.tfa into a file called

another.file

cat unknown.tfa unknown2.tfa > another2.file sends the contents of unknown.tfa and

unknown2.tfa to another2.file, concatenating their contents one after the other.

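Note: redirecting in this way will not overwrite an output file of the same name if it already

exists (this protection comes from the shell's noclobber option). You will need to remove it first, or force an overwrite by adding an exclamation mark

! immediately after the > (this is the csh/tcsh form; the bash equivalent is >|).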
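The behaviour can be tried safely with the sketch below, which assumes a bash-like shell (so the forced form is >| rather than >!). The two input files contain made-up placeholder text, and noclobber is switched on explicitly inside the script so the result does not depend on how your account is configured.

```shell
# Work in a temporary directory with two small placeholder files.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"
printf '>seq1\nplaceholder one\n' > unknown.tfa
printf '>seq2\nplaceholder two\n' > unknown2.tfa

# Plain redirection: the output goes into files instead of to the screen.
cat unknown.tfa > another.file
cat unknown.tfa unknown2.tfa > another2.file

# With noclobber set, > refuses to replace a file that already exists.
set -o noclobber
if cat unknown2.tfa > another.file; then
    overwritten=yes
else
    overwritten=no
fi
echo "plain > replaced the existing file: $overwritten"

# >| forces the overwrite (bash syntax; csh/tcsh use >! instead).
cat unknown2.tfa >| another.file
set +o noclobber
```

The shell prints its own "cannot overwrite existing file" message when the plain redirection is refused, and the original contents of another.file survive until the forced form is used.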

Standard Error redirection

Sometimes it is useful to trap errors that a program produces while it is running -for instance

if you are running a program from within a script, or if it is producing too much information on

the screen to easily read.

First we need to generate an error – we can produce one using grep (this program requires

a pattern to look for, and a file/files to find the pattern in)

[sarahb@codon course]$ grep

Usage: grep [OPTION]... PATTERN [FILE]...

Try `grep --help' for more information.

Now redirect the error and save it in a file as follows:


grep 2> error.out

less error.out

What did you see on the screen, and where did the error go?
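To see that standard output and standard error really are separate streams, you can send each to its own file. This is a minimal sketch assuming a bash-like shell; the file name stdout.out is made up for illustration.

```shell
# Work in a temporary directory.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# grep with no arguments complains on standard error and exits non-zero.
# > captures standard output, 2> captures standard error.
status=0
grep > stdout.out 2> error.out || status=$?

echo "grep exit status: $status"
echo "standard output captured:"
cat stdout.out
echo "standard error captured:"
cat error.out
```

Nothing from grep appears on the screen directly: stdout.out is empty because grep produced no normal output, and the usage message ends up in error.out.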

Input redirection

We can also redirect input. This can be useful for programs that normally require extra

information to run. If you type this extra information, exactly as required by the program, into

a file, you can then use this file to input information into the program.

For example:

blast < standard_blast_answers sends the contents of the file standard_blast_answers to the

blast program

To be really fancy we can even redirect the input and the output at the same time.

blast < standard_blast_answers >output

(If you think you know all the standard information that blast needs to run you can try this….

But we haven’t made a file for you in this instance).
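Since no answers file is provided for blast, here is a sketch of the same idea using a stand-in, assuming a bash-like shell. The function ask_questions is hypothetical: it simply plays the part of an interactive program that waits for two answers from standard input. The file names answers.txt and output.txt are also made up.

```shell
# Work in a temporary directory.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# A stand-in for an interactive program: it reads two answers,
# one per line, from standard input and reports what it was given.
ask_questions() {
    read -r species
    read -r copies
    echo "species=$species copies=$copies"
}

# Put the answers in a file, exactly as they would be typed.
printf 'human\n3\n' > answers.txt

# Redirect input and output at the same time.
ask_questions < answers.txt > output.txt
cat output.txt
```

The program never touches the keyboard or the screen: its questions are answered from answers.txt and its report lands in output.txt.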

Piping and other useful manipulations

If we wish to carry out a series of actions on the same information, we can pipe the output of

the first action to the second, and if desired, pipe the output of the second action to a third,

etc. In other words, you can use the standard output of one program directly as the standard

input of another. This is more easily understood with an example:

When you list all the files in a directory but there are too many to fit on the page, you may

want to use the command more to allow you to view them page by page. To do this you can

pipe the results of the command ls -l through the command more. (The pipe symbol is ‘|’.)

ls -l ~ | more (list all the files in my home directory and view them page by page)

ls -l ~/course | more (list files in the directory course and view them page by page)

An Example: Let us say that you want to identify all files last modified in January. One way of

doing this would be to make a long listing with ls -l and then look at the list. A better way

would be to pipe the output of the list to a program that searches for a pattern - the grep

(global regular expression print) command.

ls -l /tmp | grep Jan (here grep is searching for the pattern Jan)

Grep is a very useful command so you might like to look at its help pages. N.B. grep

supports regular expression searching (not covered here).

The search can be made case-insensitive by using the flag -i.

To only report incidences of a pattern present at the first character in a line, add a ^ symbol

in front of the search pattern

To add line numbers to where it reports a match, use -n

To only report the first n instances of a pattern, use -m number where number is the

number of returns you want

Now try each of these flags out, by searching the file cd4_human.pep for the pattern A
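The flags described above can be compared side by side with the sketch below, assuming a bash-like shell and a grep that supports -m (GNU and BSD grep both do). The file words.txt and its contents are made up for illustration. The -c flag, which is not covered above, makes grep print the number of matching lines instead of the lines themselves.

```shell
# Work in a temporary directory with a small made-up text file.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"
printf 'Alpha\nbeta\nalpha beta\nGamma alpha\n' > words.txt

plain=$(grep -c alpha words.txt)         # case-sensitive: count matching lines
nocase=$(grep -c -i alpha words.txt)     # -i also matches "Alpha"
anchored=$(grep '^alpha' words.txt)      # ^ only matches at the start of a line
numbered=$(grep -n beta words.txt)       # -n prefixes each match with its line number
first_only=$(grep -m 1 beta words.txt)   # -m 1 stops after the first matching line

echo "lines containing alpha:            $plain"
echo "lines containing alpha (any case): $nocase"
echo "lines starting with alpha:         $anchored"
echo "first line containing beta:        $first_only"
echo "$numbered"
```

Quoting the pattern, as in '^alpha', stops the shell from interpreting any special characters before grep sees them.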

To sort all your files by their file size, you could do the following.

ls -l ~ | sort -n -k 5 (sort numerically on 5th field of text)

To find out more about the sort command, you can also read the man page about it

at your leisure.
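As a small check that sorting on the fifth field really does order files by size, the sketch below (assuming a bash-like shell and the usual ls -l layout, where the fifth column is the size in bytes) creates three files of known size and picks out the largest. The file names are made up.

```shell
# Work in a temporary directory with three files of known size.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"
printf 'a' > small.txt             # 1 byte
printf 'aaaaaaaaaa' > medium.txt   # 10 bytes
printf '%0100d' 0 > large.txt      # 100 bytes (a zero padded to 100 digits)

# Smallest first, largest last.
ls -l | sort -n -k 5

# The last line of the sorted listing describes the largest file;
# the file name is the final word on that line.
last_line=$(ls -l | sort -n -k 5 | tail -n 1)
largest=${last_line##* }
echo "largest file: $largest"
```

Adding -r to sort reverses the order, which puts the largest files at the top of the listing instead.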

Simple counting

You can count the number of characters, words, or lines in a text file using the wc command.

Investigate the options by using wc --help

Now try running it with different options on the file cd4_human.pep
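For reference, the three most common options can be sketched as follows, assuming a bash-like shell. The file sample.txt is made up, and input redirection is used so that wc prints only the number and not the file name.

```shell
# Work in a temporary directory with a two-line made-up file.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"
printf 'one two three\nfour five\n' > sample.txt

lines=$(wc -l < sample.txt)   # number of lines
words=$(wc -w < sample.txt)   # number of words
chars=$(wc -c < sample.txt)   # number of bytes, including the newlines

echo "lines: $lines  words: $words  bytes: $chars"
```

The byte count includes the invisible newline at the end of each line, which is why it is slightly larger than the number of visible characters.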

Running a process in the background

Some processes can take a while to run. You may not want to have these things running on

your screen. Fear not, you can place a process that is already running in the “background” in two simple

steps:

<Ctrl> z this command suspends the current process

bg this starts the process running again, but in the background

Alternatively, you can start the program in the background, by adding an & symbol after the

program name. You can now work on other things, or logout, leaving the background

process running. Typing

fg

will bring it back to the foreground if you want to continue working interactively with this

process. If you try to logout and get prompted that there are suspended jobs, chances are

you’ve left a job suspended and forgotten about it. To see all your suspended jobs, and jobs

running in the background type

jobs

Now try this out. Look at the contents of a large file using more (e.g. more

unknown.tfa), then suspend the more process, put it in the background, run jobs to check

it is still running. Finally type fg to bring it back to the foreground. Quit more as usual, e.g. by

typing q.

NOTE if you had more than one job running in the background, fg will work on the newest

one. Each background job gets assigned a number, visible when you type jobs. To call

another job back to the foreground type % followed by the number of the job you are

interested in

e.g.

sarahb@codon [course] jobs

[1] - Suspended (tty output) less unknown.tfa


[2] + Suspended (tty output) vi eric

sarahb@codon [course] %2

this brings the second job – in this case vi back to the foreground so you can interact with it.

A job can be executed directly in the background by appending & to the end of the

commandline. Try typing

less unknown.tfa &

now bring the job back to the foreground and quit less.

History

You can call back and edit commands you have previously given, by using the arrow

direction keys. You can also look at old commands by typing history which returns a

numbered list in order, with the most recent commands at the bottom. You can recall a

specific one by typing an exclamation mark followed by the command number shown by history

!32 reruns command 32

!em reruns the last command that started with em

N.B to bring the cursor back to the beginning of a line of command you are in the middle of

typing, or one you have just recalled as above use <Ctrl> A.

Now take a look at your command history. Choose a command that you would like to re-run

and rerun it by using its number.

Finding the Jobs you are running

Sometimes you want to find how your jobs are running – and perhaps you would like to

terminate one before it finishes. You can view the jobs you have currently running by typing

ps

this will show some but not all the processes on the machine that currently belong to you.

This will return the name of the program that is running together with the command line

options used to run it, and information on the resources being used by the job and how long

it has been running. Each job or process is given a unique PID which you can use to refer to

it. A more useful version of this command is to use

ps -ef | grep username

if you add your username here, this will show every single process belonging to you. You

can terminate or ‘kill’ your own processes if you know their PID using the kill command.

Please note that if you kill the wrong processes, you may log yourself out unexpectedly! You

can only kill processes belonging to yourself. First use a ps to determine the PID of the

process you wish to kill and then type

kill XXXX (where XXXX is the PID of the process you wish to terminate).


Now start xeyes running in the background by typing xeyes &

Now try and find the xeyes process in the process table as above, and kill it. Your xeyes

should disappear.
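If you do not have an X11 connection for xeyes, the same start, find and kill cycle can be sketched with sleep standing in for a long-running program, assuming a bash-like shell. Two things here are not covered above: the shell variable $!, which holds the PID of the most recently started background job, and kill -0, which sends no signal and only checks whether the process exists.

```shell
# sleep stands in for a long-running program started in the background.
sleep 60 &
pid=$!
echo "started background job with PID $pid"

# Check that the process exists (interactively you would look for it with ps).
if kill -0 "$pid" 2>/dev/null; then
    running_before=yes
else
    running_before=no
fi

# Ask the process to terminate, then wait for it to finish exiting.
kill "$pid"
wait "$pid" 2>/dev/null || true

# Check again: the process should now be gone.
if kill -0 "$pid" 2>/dev/null; then
    running_after=yes
else
    running_after=no
fi
echo "running before kill: $running_before, after kill: $running_after"
```

Using the PID from $! avoids any risk of picking the wrong line out of the ps listing, which matters because killing the wrong process can log you out.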

Some processes may take a short while to die. There are more imperative ways of killing

jobs but we won’t cover them here. To view the most CPU and memory intensive jobs

running on the machine you can use the command

top

This will list the top resourced processes on the machine, as well as the resources they are

using, their ownership, and the overall load on the machine. The list refreshes every few

seconds; if top does not exit by itself, press q to quit. For more information, try using man top.

When you feel you have seen enough and you are ready to log out of any terminal windows

you have open, you can close them and logout by typing exit or <Ctrl> d in each

one.

Summary of useful UNIX commands

Cntrl-c Stop a process

Cntrl-z Suspend a process, see also jobs, fg and bg.

bg To send a suspended job to the background

cat Type file contents to the screen all at once (see also more)

cat To concatenate files together (cat file1 file2 file3 > newfile)

cd Change directory (cd subdirectory)

chmod To change the permissions or protection on a file (chmod a+r somefile)

cp Copy a file (cp filename1 filename2)

cp Copy a file to a directory (cp filename directoryName)

emacs, xemacs A text editor, more powerful than pico, but more complex.

fg Brings a suspended or background job to the foreground

finger or f To find more information about a user or users, try finger jbloggs

grep To search for files containing a pattern. To search files for the word ATWH (grep

ATWH filename)

history To list the last 50 commands you have entered

jobs Lists any suspended or background processes that you might have

kill To stop or kill a running process (kill 23459, where 23459 is your process ID). (see also top and ps)

exit How to exit from the machine, if you get a message about suspended jobs then type it

twice

ls List the files in your directory

ls -l List the files in your directory but with “longer” information

man command For help about UNIX command command


man -k keyword Lists all UNIX commands that mention the word “keyword”

mkdir make a directory

more Type a file to the screen a page at a time (press q to quit, space bar for next page).

mv move a file into a directory (mv filename directoryName)

mv Rename a file (mv oldname newname)

passwd To change your password

pico A file editor (pico filename)

pwd Print the full path of your current directory

ps List your current processes

quota To show your disk space quota and current use.

rm Delete a file (rm filename)

rmdir Delete a directory (directory must be empty)

top To see who is hogging all the CPU time

who To list users currently logged on
