How to Install and Use a Standalone BLAST (Basic Local Alignment Search Tool) Server Doug Davis Plant Science Division Univ. of Missouri 6/26/06

Dec 20, 2015

Page 1

How to Install and Use a Standalone BLAST (Basic Local Alignment Search Tool) Server
Doug Davis
Plant Science Division
Univ. of Missouri
6/26/06

Page 2

Lab Premise

Bioinformatics research is typically web-based

Access to necessary URLs may be hampered by need for administrator permissions

Solution: Standalone BLAST (you will be provided a CD containing all necessary files at the lab’s conclusion)

Page 3

Lab Goals

See where BLAST fits into the larger scheme of bioinformatics

Demonstrate installation of a standalone BLAST server on a Windows XP PC (should also work on a Windows 2000 PC)

Gain initial familiarity with available standalone BLAST parameters

Page 4

Bioinformatics Defined

Study of biological questions using computers in place of traditional labware (e.g. test tubes, pH meters, electrophoretic equipment)

Dependent on databases containing molecular data generated over many decades

Millions of sequences are in these databases; best of all, tools like BLAST can search for sequences in such large databases very rapidly

Page 5

What Is BLAST?

BLAST is a program that searches for similarities among molecular sequences; it works with both nucleic acids and proteins

It performs local (as opposed to global) alignments using a special set of scoring matrices

It calculates statistical significance for any matches it finds (allows you to evaluate the degree of similarity)

It is a very powerful tool for characterizing unknown sequences by aligning them to known sequences

Page 6

The usual way BLAST is employed…

Requires an active internet connection to visit websites where molecular databases reside (e.g. http://www.ncbi.nlm.nih.gov); you have a lot of flexibility working over the web (many different databases and informatics tools can be rapidly accessed)

You specify a target database to be searched using the website’s BLAST server

You upload the query sequences (these are the sequences you want to learn more about) to a web-BLAST server; then these sequences are compared by the BLAST alignment algorithm to all sequences in the specified target database

Page 7

BLAST Session Setup

If BLAST detects a match between query sequences and database sequences, this indicates some meaningful relationship between the aligned sequences.

Target database sequences: This database contains many sequences which are already characterized; these are the “knowns”.

Query sequence(s): These are sequences you want to know more about. Consider them as “unknowns”.

[Diagram: the query sequence(s) and the target database sequences are the two inputs to the BLAST program]

Page 8

Here’s how the BLAST session looks in “Command Prompt” (this is the program you will use in Windows to run BLAST):

Page 9

Here’s the “Hit Table” output from a BLAST session; the Hit Table format is a stripped-down BLAST output

Page 10

Hit Table Format of BLAST Output

The output report fields are outlined here

# BLASTN 2.2.14 [May-07-2006]
# Query: 5221 sequences
# Database: maize_genes.txt
# Fields: Query, Subject, %ID, AlignLngth, Mismatch, Gaps, Qry_start, Qry_end, Subj_start, Subj_end, e-val, bit_score

CK828121  TC279221   88.96  589  55   8   74  653   274   861  0.0     642
CF624012  TC279225   94.03  318  19   0  143  460   411    94  4e-136  480
CF624331  TC279227   99.25  665   4   1    1  665   846   183  0.0     1277
CK826720  TC279296  100.00   28   0   0    1   28  1666  1639  3e-008  56.0
CF623767  TC281097   81.85  292  39  13  283  567  1513  1229  3e-035  145
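Hit-table lines like the ones above are easy to process with a short script. The following is a minimal Python sketch, not part of the original lab; the `Hit` class and `parse_hit_table` function are hypothetical names chosen for illustration. It assumes twelve whitespace-separated fields per line in the order given by the "# Fields" comment.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    query: str
    subject: str
    pct_identity: float
    align_length: int
    mismatches: int
    gaps: int
    q_start: int
    q_end: int
    s_start: int
    s_end: int
    e_value: float
    bit_score: float


def parse_hit_table(text):
    """Parse whitespace-delimited hit-table text into a list of Hit records."""
    hits = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the '#' header comments
        f = line.split()
        hits.append(Hit(f[0], f[1], float(f[2]), int(f[3]), int(f[4]), int(f[5]),
                        int(f[6]), int(f[7]), int(f[8]), int(f[9]),
                        float(f[10]), float(f[11])))
    return hits


SAMPLE = """\
# BLASTN 2.2.14 [May-07-2006]
CK828121 TC279221 88.96 589 55 8 74 653 274 861 0.0 642
CF624012 TC279225 94.03 318 19 0 143 460 411 94 4e-136 480
CK826720 TC279296 100.00 28 0 0 1 28 1666 1639 3e-008 56.0
"""

hits = parse_hit_table(SAMPLE)
for h in hits:
    # A subject start greater than the subject end means the hit is on the minus strand.
    strand = "plus" if h.s_start <= h.s_end else "minus"
    print(h.query, h.subject, h.pct_identity, h.e_value, strand)
```

Once the hits are records, filtering by e-value or sorting by bit score takes one line each.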

Page 11

BLAST Report Field Explanations

mismatches: number of nucleotides that don’t match over the length of the aligned portion

gaps: number of gaps in the alignment; the matching algorithm introduces a gap when one sequence has an insertion or deletion relative to the other, so that the sequence on either side of it can still be aligned

e-value: the number of alignments scoring at least this well that would be expected by chance in a database of the size searched (the lower the value, the more significant the match); it is strongly influenced by the size of the database searched

bit score: a normalized alignment score (high scores indicate strong alignments); unaffected by the size of the database searched
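How the two statistics relate can be sketched with the standard relation E = m × n × 2^(−S′), where S′ is the bit score, m the query length and n the total length of the database. The sizes below are hypothetical values chosen for illustration, and real BLAST uses adjusted "effective" lengths, so this sketch will not reproduce the e-values in a report exactly.

```python
def expected_hits(bit_score, query_len, db_len):
    """E = m * n * 2**(-S'): chance alignments expected at this bit score or better."""
    return query_len * db_len * 2.0 ** (-bit_score)


# Hypothetical sizes for illustration only (not taken from the lab's database).
query_len = 600
small_db = 50_000_000
large_db = 2 * small_db

e_small = expected_hits(56.0, query_len, small_db)
e_large = expected_hits(56.0, query_len, large_db)
print(e_small, e_large)
```

The same bit score gives twice the e-value when the database is twice as large, while each additional bit of score halves the e-value.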

Page 12

Default BLAST Output: Graphical Alignment of Query Sequence to Subject Sequence in the Target Database (nucleotide-nucleotide)

Query= gi|44900833|gb|CK827378.1|CK827378 zmrsub1_0B20-006-a11.s4
       zmrsub1 Zea mays cDNA 3', mRNA sequence
       (609 letters)

                                                                  Score    E
Sequences producing significant alignments:                       (bits) Value

TC280752 UP|Q9LLI2_MAIZE (Q9LLI2) Cellulose synthase-8, complete      32   0.34

>TC280752 UP|Q9LLI2_MAIZE (Q9LLI2) Cellulose synthase-8, complete
          Length = 3931

 Score = 32.2 bits (16), Expect = 0.34
 Identities = 22/24 (91%)
 Strand = Plus / Plus

Query: 531 cgaggcggaggacgccgtcgacga 554
           ||||| |||||||| |||||||||
Sbjct: 519 cgaggaggaggacggcgtcgacga 542
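The "Identities" line of this report can be recomputed from the two aligned strings. The sketch below is an illustration only; `identity` and `match_line` are hypothetical helper names, and the two sequences are copied from the Query and Sbjct lines of the report.

```python
def identity(query_aln, subject_aln):
    """Return (matches, length, percent) for two equal-length aligned strings."""
    if len(query_aln) != len(subject_aln):
        raise ValueError("aligned strings must have the same length")
    matches = sum(1 for q, s in zip(query_aln, subject_aln) if q == s and q != "-")
    length = len(query_aln)
    return matches, length, 100.0 * matches / length


def match_line(query_aln, subject_aln):
    """Build the row of '|' marks BLAST prints between matching positions."""
    return "".join("|" if q == s and q != "-" else " "
                   for q, s in zip(query_aln, subject_aln))


query = "cgaggcggaggacgccgtcgacga"
sbjct = "cgaggaggaggacggcgtcgacga"

matches, length, pct = identity(query, sbjct)
print(query)
print(match_line(query, sbjct))
print(sbjct)
# The report shows 91%, which is 91.67 with the fraction dropped.
print(f"Identities = {matches}/{length} ({int(pct)}%)")
```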

Page 13

How Does BLAST Make the Alignments?

        C  O  E  L  A  C  A  N  T  H
     0  0  0  0  0  0  0  0  0  0  0
  P  0  0  0  0  0  0  0  0  0  0  0
  E  0  0  0  1  0  0  0  0  0  0  0
  L  0  0  0  0  2  1  0  0  0  0  0
  I  0  0  0  0  1  1  0  0  0  0  0
  C  0  1  0  0  0  0  2  1  0  0  0
  A  0  0  0  0  0  1  1  3  2  1  0
  N  0  0  0  0  0  0  0  2  4  3  2

Answer: Local alignment is based on the “Smith-Waterman Algorithm” (BLAST itself uses a faster heuristic that approximates it rather than filling in the whole matrix)

The local alignment produced by this algorithm is:

ELACAN
ELICAN

Page 14

How to Calculate Smith-Waterman Matrix Values

Matches are assigned a value of +1, mismatches are -1, and gaps (where there is no character to try matching with in one of the sequences) are also assigned a value of -1

Calculate the match score: sum of the score in the preceding diagonal cell plus the match value (+1 if the two characters match, -1 if they do not)

Calculate the horizontal gap score: sum of the cell to the left plus the gap penalty

Calculate the vertical gap score: sum of the cell above plus the gap penalty

Each cell receives the maximum of these three scores; the maximum score is never less than 0 (a negative result is replaced by 0).
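The rules above can be sketched in a few lines of Python. This is a minimal textbook Smith-Waterman written for illustration (the function name and its return values are choices made here, and it is not the code BLAST runs). It uses the slide's scoring of +1 for a match and -1 for a mismatch or gap.

```python
def smith_waterman(a, b, match=1, mismatch=-1, gap=-1):
    """Fill the local-alignment matrix and trace back the best alignment.

    Columns follow `a` and rows follow `b`, as in the COELACANTH/PELICAN matrix.
    """
    rows, cols = len(b) + 1, len(a) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[j - 1] == b[i - 1] else mismatch)
            left = H[i][j - 1] + gap   # horizontal gap score
            up = H[i - 1][j] + gap     # vertical gap score
            H[i][j] = max(0, diag, left, up)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero is reached.
    i, j = best_pos
    top, bottom = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        step = match if a[j - 1] == b[i - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + step:
            top.append(a[j - 1]); bottom.append(b[i - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i][j - 1] + gap:
            top.append(a[j - 1]); bottom.append("-"); j -= 1
        else:
            top.append("-"); bottom.append(b[i - 1]); i -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom)), H


score, top, bottom, H = smith_waterman("COELACANTH", "PELICAN")
print(score)   # 4
print(top)     # ELACAN
print(bottom)  # ELICAN
```

The traceback starts at the largest cell (the N/N match, score 4) and follows whichever neighbour produced each score, which is how the ELACAN/ELICAN alignment is read off the matrix.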

Page 15

What Types of Questions Can BLAST Be Used to Answer?

Find genes in a genomic sequence

Predict a protein’s function

Predict the 3-D structure of a protein

Identify members of gene/protein families

Page 16

Why install a Standalone Copy of BLAST?

You don’t need administrator permissions to run it

Easier to control the output format (you aren’t stuck with what the website decides you should have)

More user control (easier to construct custom BLAST queries)

Page 17

Flow of Events in a BLAST Session

Format the target database (protein or nucleic acid)

Create a file that contains the query sequences

Create a blank file that will receive the BLAST output

Submit the BLAST job using the command prompt

Review the BLAST output; formulate new hypothesis

Page 18

BLAST Installation Details: Part 1

Insert the provided CD and locate the file named “ncbi.ini” (this file contains the path to the BLAST\data subfolder)

Click the “Start” button on your desktop, then click on “My Computer”, then click on the C:\ drive

Open the WINDOWS, WINNT, or WINDOWS NT folder (whichever one exists on your PC) and drag the ncbi.ini file into it
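For orientation, the ncbi.ini file is a short plain-text file. The sketch below prints what such a file might contain. The `[NCBI]` section name and `Data` key are assumptions based on the layout used by the legacy NCBI toolkit, not details given in this lab, and `render_ncbi_ini` is a hypothetical helper; the file supplied on the CD is the authority.

```python
def render_ncbi_ini(data_dir):
    """Return ncbi.ini text pointing at the BLAST data folder.

    Assumption: an [NCBI] section with a Data key, as in the legacy NCBI toolkit.
    """
    return f'[NCBI]\nData="{data_dir}"\n'


ini_text = render_ncbi_ini(r"C:\Program Files\BLAST\data")
print(ini_text)
```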

Page 19

BLAST Installation Details: Part 2

Go to C:\Program Files

Drag the BLAST folder on your CD into the C:\Program Files folder; be careful not to place it inside another folder that resides in C:\Program Files.

Open the BLAST folder and click the file named “blast-2.2.14-ia32-win32” to install the BLAST application

Page 20

BLAST Installation Details: Part 3

Drag the .txt file “maize_genes” from the CD into the “C:\Program Files\BLAST\data” folder

Create and save a blank text (.txt) file named “query_seqs” in the “C:\Program Files\BLAST\data” folder

Open the .txt file named “Install_Lab_seqs” from the CD, and copy the contents; paste these into the file “query_seqs” then save the file

Create and save a .txt file named “output” in the “C:\Program Files\BLAST\data” folder; this file will receive the BLAST output
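BLAST reads its query sequences in FASTA format: each record is a header line starting with ">" followed by one or more lines of sequence. Before running a search it can help to confirm that the pasted query file really has that shape. The sketch below is an illustration only; `read_fasta` is a hypothetical helper and the two records are made up, standing in for the contents of query_seqs.txt.

```python
def read_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA-formatted text."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Made-up records for illustration; the real file holds the lab's query sequences.
EXAMPLE = """\
>seq1 hypothetical example
ACGTACGT
ACGT
>seq2 hypothetical example
GGCCTTAA
"""

records = read_fasta(EXAMPLE)
print(len(records), [len(seq) for _, seq in records])
```

To check the real file, the same function could be given the text read from query_seqs.txt.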

Page 21

BLAST Installation Details: Part 4

Move the following files from the “C:\Program Files\BLAST\bin” folder into the “C:\Program Files\BLAST\data” folder: “formatdb”, “blastall”, “blastclust”, and “megablast” (these are the “executable” files you will need to make BLAST run)

Click Start, select “All Programs”, then select “Accessories”; click the “Command Prompt” icon to open a “command line” session

Page 22

Get Ready to BLAST

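Make sure the command prompt is in the data folder; if it is not, type: cd "C:\Program Files\BLAST\data"

Type the following at the command prompt: formatdb -i maize_genes.txt -p F -o F

(this command will format the target database, maize_genes.txt, so that it can be searched by BLAST; “-p F” tells formatdb that the sequences are nucleotide rather than protein, and the options must be typed with plain hyphens)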
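The same formatting step can be scripted. The sketch below only assembles the argument list for the command given above; `formatdb_command` is a hypothetical helper name, and actually running the command (the commented-out line) assumes formatdb.exe is in the data folder as set up in the installation steps.

```python
def formatdb_command(fasta_file, is_protein=False, parse_seqids=False):
    """Build the formatdb argument list: -i input, -p protein T/F, -o parse SeqIds T/F."""
    def flag(value):
        return "T" if value else "F"
    return ["formatdb", "-i", fasta_file, "-p", flag(is_protein), "-o", flag(parse_seqids)]


cmd = formatdb_command("maize_genes.txt")
print(" ".join(cmd))  # formatdb -i maize_genes.txt -p F -o F

# To run it from Python instead of typing it (assumes the files are in this folder):
# import subprocess
# subprocess.run(cmd, cwd=r"C:\Program Files\BLAST\data", check=True)
```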

Page 23

Using Standalone BLAST

At the command prompt, type the following (the text up to and including “>” is the prompt itself; type only what follows it):

C:\Program Files\BLAST\data>megablast -i query_seqs.txt -d maize_genes.txt -o output.txt -F "m D" -D 3

Press the Enter key; BLAST will then start processing the command

When the program terminates (you will get a new command prompt), open the output.txt file to inspect the results.
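Each option in the megablast command has a specific job: -i names the query file, -d the formatted database, -o the output file, -F "m D" applies DUST low-complexity filtering only when seeding matches, and -D 3 requests the tab-delimited hit table. The sketch below assembles the same command in Python. `megablast_command` is a hypothetical helper name, and the commented-out run line assumes the executables and files are in the data folder.

```python
import subprocess


def megablast_command(query, database, output, filter_opt="m D", output_type=3):
    """Build the megablast argument list used in this lab."""
    return ["megablast", "-i", query, "-d", database, "-o", output,
            "-F", filter_opt, "-D", str(output_type)]


cmd = megablast_command("query_seqs.txt", "maize_genes.txt", "output.txt")
# list2cmdline shows the command as it would be typed at a Windows prompt.
print(subprocess.list2cmdline(cmd))

# To run it from Python instead of typing it:
# subprocess.run(cmd, cwd=r"C:\Program Files\BLAST\data", check=True)
```

Passing the arguments as a list keeps "m D" together as one option value, which is what the quotation marks do on the command line.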

Page 24

Different Types of BLAST

There are 6 types of BLAST available:

megaBLAST: very rapid (~12-fold faster than BLASTN), DNA query against DNA databases

BLASTN: same set-up as megaBLAST, slower, but more options for query construction

BLASTP: protein used to search protein database

BLASTX: translated DNA search of protein database

TBLASTN: protein used to search translated DNA database

TBLASTX: DNA translated in all 6 frames versus a translated DNA database

We’ll look more at these this afternoon
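Which program to use follows from the type of the query and the type of the database. The lookup below restates the list above as a small Python sketch; the function and dictionary names are hypothetical and exist only for illustration.

```python
# (query type, database type) -> program.
# megaBLAST is the faster alternative to blastn for DNA against DNA.
PROGRAMS = {
    ("dna", "dna"): "blastn",
    ("protein", "protein"): "blastp",
    ("dna", "protein"): "blastx",
    ("protein", "dna"): "tblastn",
}


def choose_program(query_type, db_type, translate_both=False):
    """Pick a BLAST program; translate_both selects tblastx for DNA against DNA."""
    if translate_both:
        if (query_type, db_type) != ("dna", "dna"):
            raise ValueError("tblastx needs a DNA query and a DNA database")
        return "tblastx"
    return PROGRAMS[(query_type, db_type)]


print(choose_program("dna", "protein"))                   # blastx
print(choose_program("dna", "dna", translate_both=True))  # tblastx
```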