
The 5 Standard BLAST Programs ProgramDatabaseQueryTypical Uses BLASTNNucleotide Mapping oligonucleotides, amplimers, ESTs, and repeats to a genome. Identifying.

Dec 25, 2015


Cornelius Ward

The 5 Standard BLAST Programs

Program | Database | Query | Typical Uses
BLASTN | Nucleotide | Nucleotide | Mapping oligonucleotides, amplimers, ESTs, and repeats to a genome. Identifying related transcripts.
BLASTP | Protein | Protein | Identifying common regions between proteins. Collecting related proteins for phylogenetic analysis.
BLASTX | Protein | Nucleotide | Finding protein-coding genes in genomic DNA.
TBLASTN | Nucleotide | Protein | Identifying transcripts similar to a known protein (finding proteins not yet in GenBank). Mapping a protein to genomic DNA.
TBLASTX | Nucleotide | Nucleotide | Cross-species gene prediction. Searching for genes missed by traditional methods.


WU-BLAST vs. NCBI-BLAST
• faster (except for BLASTN)
• word size unlimited
• nucleotide matrices
• gapped lambda for BLASTN
• links, topcomboN, kap
• altscore
• no additional output formats
• no PSI-BLAST, PHI-BLAST, MegaBLAST


>gi|23098447|ref|NP_691913.1| (NC_004193) 3-oxoacyl-(acyl carrier protein) reductase [Oceanobacillus iheyensis] Length = 253

Score = 38.9 bits (89), Expect = 3e-05
Identities = 17/40 (42%), Positives = 26/40 (64%)
Frame = -1

Query: 4146 VTGAGHGLGRAISLELAKKGCHIAVVDINVSGAEDTVKQI 4027
            VTGA  G+G+AI+   A +G  + V D+N  GA+  V++I
Sbjct: 10   VTGAASGMGKAIATLYASEGAKVIVADLNEEGAQSVVEEI 49
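To see where the Identities and Positives counts come from, the short Perl sketch below counts the characters of the alignment midline: each letter is an identity and each '+' is a conservative substitution (a positive BLOSUM score). The midline string is copied from the HSP above; the variable names are illustrative only.

```perl
use strict;
use warnings;

# Midline of the HSP: a letter marks an identity, '+' a positive-scoring
# substitution, and a space anything else.
my $midline = 'VTGA  G+G+AI+   A +G  + V D+N  GA+  V++I';

my $identities = () = $midline =~ /[A-Z]/g;     # count letters only
my $positives  = () = $midline =~ /[A-Z+]/g;    # letters plus '+'
my $length     = length $midline;               # aligned columns (no gaps here)

printf "Identities = %d/%d, Positives = %d/%d\n",
    $identities, $length, $positives, $length;
```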


TWO ASPECTS OF BLAST

BLAST ALGORITHM
Word Hit Heuristic
Extension Heuristic

BLAST STATISTICS
Karlin-Altschul statistics: a general theory of alignment statistics whose applicability goes well beyond BLAST

BLAST uses Karlin-Altschul statistics to determine the statistical significance of the alignments it produces.



Alignment Overview

Sequence alignment takes place in a 2-dimensional space where diagonal lines represent regions of similarity. Gaps in an alignment appear as broken diagonals. The search space is sometimes considered as two sequences and sometimes as query × database.

[Figure: the search space, with Sequence 1 along one axis; alignments appear as diagonals and a gapped alignment as a broken diagonal.]

• Global alignment vs. local alignment: BLAST is local.

• Maximal segment pair (MSP) vs. high-scoring segment pair (HSP): BLAST finds HSPs (usually the MSP too).

• Gapped vs. ungapped: BLAST can do both.
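For ungapped alignments with a simple match/mismatch score, the MSP can be found exhaustively by running a maximum-subarray (Kadane) scan along every diagonal of the search space. BLAST's heuristics exist precisely to avoid this full O(mn) scan; the sketch below (a hypothetical `find_msp`, scoring +1 per identical character and -1 otherwise, no gaps) only makes "MSP" concrete, using the two sentences from the extension text example.

```perl
use strict;
use warnings;

# Exhaustive ungapped maximal segment pair (MSP) search: +1 for identical
# characters, -1 otherwise, no gaps. Each diagonal gets one Kadane pass.
sub find_msp {
    my ($s1, $s2) = @_;
    my ($len1, $len2) = (length $s1, length $s2);
    my ($best, $best_i, $best_j, $best_len) = (0, 0, 0, 0);

    for my $d (-($len2 - 1) .. $len1 - 1) {       # diagonal d = i - j
        my ($i0, $j0) = $d >= 0 ? ($d, 0) : (0, -$d);
        my ($run, $start) = (0, 0);
        for (my $k = 0; $i0 + $k < $len1 && $j0 + $k < $len2; $k++) {
            $run += substr($s1, $i0 + $k, 1) eq substr($s2, $j0 + $k, 1) ? 1 : -1;
            if ($run <= 0) {                      # segment cannot help: restart after k
                ($run, $start) = (0, $k + 1);
            }
            elsif ($run > $best) {                # best segment on any diagonal so far
                ($best, $best_i, $best_j, $best_len) =
                    ($run, $i0 + $start, $j0 + $start, $k - $start + 1);
            }
        }
    }
    return ($best, $best_i, $best_j, $best_len);  # score, start in s1, start in s2, length
}

my ($score, $i, $j, $len) = find_msp(
    'The quick brown fox jumps over the lazy dog.',
    'The quiet brown cat purrs when she sees him.',
);
printf "MSP score %d, length %d, starting at %d/%d\n", $score, $len, $i, $j;
```

Here the MSP is 'The quick brown ' on the main diagonal (score 12), the same segment an X-drop extension from the first character finds, which is the sense in which BLAST "usually" finds the MSP too.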


The BLAST Algorithm: Seeding (W and T)

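[Figure: word hits plotted in the search space against Sequence 1]

BLOSUM62 neighborhood of RGD (word, score):

RGD 17
KGD 14
QGD 13
RGE 13
EGD 12
HGD 12
NGD 12
RGN 12
(T = 12: the words above score at least T and form the neighborhood; the words below do not)
AGD 11
MGD 11
RAD 11
RGQ 11
RGS 11
RND 11
RSD 11
SGD 11
TGD 11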

• Speed is gained by minimizing the search space.
• Alignments require word hits.
• Neighborhood words: words scoring at least T against a query word also count as hits.
• W and T modulate speed and sensitivity.
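The neighborhood listed above can be reproduced by scoring every 3-letter word against RGD and keeping those that reach T. This Perl sketch hard-codes only the three BLOSUM62 rows it needs (R, G and D), so the hypothetical `neighborhood` function works only for words built from those letters; a real implementation would load the full matrix.

```perl
use strict;
use warnings;

# Column order of the BLOSUM62 rows below.
my @aa = split //, 'ARNDCQEGHILKMFPSTWYV';

# Only the three BLOSUM62 rows needed to score words against "RGD".
my %blosum62_row = (
    R => [-1,  5,  0, -2, -3,  1,  0, -2,  0, -3, -2,  2, -1, -3, -2, -1, -1, -3, -2, -3],
    G => [ 0, -2,  0, -1, -3, -2, -2,  6, -2, -4, -4, -2, -3, -3, -2,  0, -2, -2, -3, -3],
    D => [-2, -2,  1,  6, -3,  0,  2, -1, -1, -3, -4, -1, -3, -3, -1,  0, -1, -4, -3, -3],
);

# All 3-letter words scoring at least $T against $word
# (in this sketch $word may only contain R, G and D).
sub neighborhood {
    my ($word, $T) = @_;
    my @rows = map { $blosum62_row{$_} } split //, $word;
    my %hits;
    for my $x (0 .. $#aa) {
        for my $y (0 .. $#aa) {
            for my $z (0 .. $#aa) {
                my $score = $rows[0][$x] + $rows[1][$y] + $rows[2][$z];
                $hits{ $aa[$x] . $aa[$y] . $aa[$z] } = $score if $score >= $T;
            }
        }
    }
    return \%hits;
}

my $hood = neighborhood('RGD', 12);
for my $w (sort { $hood->{$b} <=> $hood->{$a} or $a cmp $b } keys %$hood) {
    print "$w $hood->{$w}\n";
}
```

Lowering T to 11 admits the nine additional words listed below the cut; raising T (or W) shrinks the neighborhood, which is how W and T trade sensitivity for speed.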


The BLAST Algorithm: 2-hit Seeding

[Figure: word clusters vs. isolated words]

• Alignments tend to have multiple word hits.

• Isolated word hits are frequently false leads.

• Most alignments have large ungapped regions.

• Requiring 2 word hits on the same diagonal (within 40 aa, for example) greatly increases speed at a slight cost in sensitivity.
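A minimal sketch of the two-hit rule, assuming hits are given as (query position, subject position) pairs and that a hit overlapping the previous hit on its diagonal is simply ignored. The function name, the hit list and the parameter values (word size W = 3, window A = 40) are illustrative, not taken from any BLAST source.

```perl
use strict;
use warnings;

# Two-hit seeding sketch: a word hit triggers extension only if an earlier,
# non-overlapping hit lies on the same diagonal within $A positions.
# Hits are [query_pos, subject_pos] pairs; $W is the word size.
sub two_hit_seeds {
    my ($hits, $W, $A) = @_;
    my %last_hit;    # diagonal (subject_pos - query_pos) => query_pos of last hit
    my @seeds;
    for my $hit (sort { $a->[0] <=> $b->[0] } @$hits) {
        my ($q, $s) = @$hit;
        my $diag = $s - $q;
        if (exists $last_hit{$diag}) {
            my $dist = $q - $last_hit{$diag};
            next if $dist < $W;                  # overlaps the previous hit: ignore
            push @seeds, $hit if $dist <= $A;    # second hit close enough: seed
        }
        $last_hit{$diag} = $q;
    }
    return @seeds;
}

# Hypothetical word hits: two on diagonal 10 within 40, one isolated hit,
# and a later hit on diagonal 10 that is too far from its predecessor.
my @hits  = ([10, 20], [25, 35], [30, 100], [100, 110]);
my @seeds = two_hit_seeds(\@hits, 3, 40);
print "extend from query $_->[0], subject $_->[1]\n" for @seeds;
```

Only the hit at query 25 / subject 35 is extended; the isolated hit and the distant hit are discarded without any extension work.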


The BLAST Algorithm: Extension

[Figure: a seed extended in both directions into an alignment]

• Alignments are extended from seeds in each direction.

• Extension is terminated when the score falls X below the maximum score seen so far.

Text example (match +1, mismatch -1, no gaps, X = 5):

The quick brown fox jumps over the lazy dog.
The quiet brown cat purrs when she sees him.

[Figure: score vs. length of extension; the extension is then trimmed back to the maximum.]
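The text example can be run as an ungapped X-drop extension. This sketch (hypothetical `xdrop_extend`) extends only to the right from the first character, whereas a real seed would be extended in both directions; it stops once the score has fallen X below the best score seen and trims back to that best point.

```perl
use strict;
use warnings;

# Ungapped X-drop extension to the right, starting at position 0 of both strings.
# Scoring: +1 for identical characters, -1 otherwise.
sub xdrop_extend {
    my ($s1, $s2, $X) = @_;
    my ($score, $best, $best_len) = (0, 0, 0);
    my $n = length($s1) < length($s2) ? length($s1) : length($s2);
    for my $k (0 .. $n - 1) {
        $score += substr($s1, $k, 1) eq substr($s2, $k, 1) ? 1 : -1;
        if ($score > $best) {
            ($best, $best_len) = ($score, $k + 1);
        }
        last if $best - $score >= $X;    # score has fallen X below the maximum
    }
    return ($best, $best_len);           # trim back to the maximum
}

my $s1 = 'The quick brown fox jumps over the lazy dog.';
my $s2 = 'The quiet brown cat purrs when she sees him.';
my ($best, $len) = xdrop_extend($s1, $s2, 5);
printf "score %d over %d characters: '%s'\n", $best, $len, substr($s1, 0, $len);
```

With X = 5 the extension runs into "the lazy" before giving up, and the trimmed alignment is 'The quick brown ' with score 12.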



BLAST STATISTICS

Karlin-Altschul statistics: a general theory of alignment statistics; applicability goes well beyond BLAST

• Notational issues
• Information theory: nats & bits
• How alignments are scored
• How scoring schemes are created
• λ, E & H


[Figure: example runs with scores of 5, 6 and 4]

How many runs with a score of X do we expect to find?


Understanding summation (Σ) notation

$\text{total} = \sum_{i=1}^{n} p_i$

# p_i: the frequency of each of the n = 4 bases
my %frequencies;
$frequencies{'A'} = 0.25;
$frequencies{'T'} = 0.25;
$frequencies{'G'} = 0.25;
$frequencies{'C'} = 0.25;

# total = p_1 + p_2 + ... + p_n
my $total = 0;
foreach my $k (keys %frequencies) {
    $total += $frequencies{$k};
}


A little information theory

$-\log_2(0.5) = 1$ bit


G=A=T=C=0.25

A=T=0.45; G=C=0.05
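One way to compare these two base compositions is their Shannon entropy, $H = -\sum_i p_i \log_2 p_i$, the average information per base. The sketch below (a hypothetical `entropy_bits` helper) gives 2 bits per base for the uniform composition and about 1.47 bits for the AT-rich one: a skewed composition carries less information per base.

```perl
use strict;
use warnings;

# Shannon entropy in bits: H = -sum_i p_i * log2(p_i)
sub entropy_bits {
    my (%p) = @_;
    my $h = 0;
    for my $base (keys %p) {
        next if $p{$base} == 0;                  # 0 * log(0) is taken as 0
        $h -= $p{$base} * log($p{$base}) / log(2);
    }
    return $h;
}

my $uniform = entropy_bits(A => 0.25, C => 0.25, G => 0.25, T => 0.25);
my $at_rich = entropy_bits(A => 0.45, T => 0.45, G => 0.05, C => 0.05);
printf "uniform: %.3f bits/base; AT-rich: %.3f bits/base\n", $uniform, $at_rich;
```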


bits vs. nats

$\log_2(n) = \log_e(n) / \log_e(2)$

$\text{bits} = \log_2(n) \qquad \text{nats} = \log_e(n)$


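$p_M = 0.01, \quad p_I = 0.1, \quad q_{MI} = 0.002$

$S_{MI} = \log_2 \frac{0.002}{0.01 \times 0.1} = +1$ bit

$S_{MI} = \log_e \frac{0.002}{0.01 \times 0.1} = +0.693$ nats

$p_R = 0.1, \quad p_L = 0.1, \quad q_{RL} = 0.002$

$S_{RL} = \log_2 \frac{0.002}{0.1 \times 0.1} = -2.322$ bits

$S_{RL} = \log_e \frac{0.002}{0.1 \times 0.1} = -1.609$ nats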
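The four scores above follow from $S_{ij} = \log\big(q_{ij} / (p_i p_j)\big)$, with the base of the logarithm choosing the unit: base 2 for bits, base e for nats. A minimal Perl check (the `log_odds` helper is illustrative):

```perl
use strict;
use warnings;

# Log-odds score S_ij = log(q_ij / (p_i * p_j)); $base selects the unit
# (2 for bits, e for nats).
sub log_odds {
    my ($q, $p_i, $p_j, $base) = @_;
    return log($q / ($p_i * $p_j)) / log($base);
}

my $e = exp(1);
printf "S_MI = %+.3f bits, %+.3f nats\n",
    log_odds(0.002, 0.01, 0.1, 2), log_odds(0.002, 0.01, 0.1, $e);
printf "S_RL = %+.3f bits, %+.3f nats\n",
    log_odds(0.002, 0.1, 0.1, 2), log_odds(0.002, 0.1, 0.1, $e);
```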


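The BLOSUM matrices are rounded, scaled log-odds scores: $S_{ij} = \mathrm{round}\left(c \cdot \log_2 \frac{q_{ij}}{p_i p_j}\right)$

'munge' factor: c = 2 for BLOSUM62 (half-bit units), c = 3 for BLOSUM50 (third-bit units)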



Why do this?


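Recall that:

λ is the number that converts the 'munged' S_ij back into natural-log units, so that the 'original' q_ij can be recovered for further calculation. Here

$S_{ij} = \mathrm{round}\left(c \cdot \log_2 \frac{q_{ij}}{p_i p_j}\right)$

where c is the 'munge' factor.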


λ allows us to recover that original q_ij for purposes of further calculation:

$q_{ij} = p_i p_j e^{\lambda S_{ij}}$


λ is found by successive approximation using the identity below:

$\sum_{i,j} p_i p_j e^{\lambda S_{ij}} = 1$
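A minimal successive-approximation sketch: bisection on $f(\lambda) = \sum_{i,j} p_i p_j e^{\lambda S_{ij}} - 1$. f is 0 at λ = 0, dips below 0 just above 0 when the expected score is negative, and becomes positive for large λ as long as some score is positive, so bisection converges to the positive root. The scheme is described by its score distribution (score => total probability of that score); the DNA examples assume uniform base frequencies, and the function names are hypothetical.

```perl
use strict;
use warnings;

# f(lambda) = sum over scores s of P(s) * e^(lambda * s), minus 1,
# where P(s) = sum of p_i * p_j over all pairs (i, j) with S_ij = s.
sub lambda_identity {
    my ($dist, $lambda) = @_;
    my $sum = 0;
    $sum += $dist->{$_} * exp($lambda * $_) for keys %$dist;
    return $sum - 1;
}

sub solve_lambda {
    my ($dist) = @_;
    my $expected = 0;
    $expected += $_ * $dist->{$_} for keys %$dist;
    die "expected score must be negative\n" unless $expected < 0;
    die "need at least one positive score\n" unless grep { $_ > 0 } keys %$dist;

    my ($lo, $hi) = (0, 1);
    $hi *= 2 while lambda_identity($dist, $hi) <= 0;   # bracket the positive root
    for (1 .. 100) {                                    # bisection
        my $mid = ($lo + $hi) / 2;
        if (lambda_identity($dist, $mid) < 0) { $lo = $mid } else { $hi = $mid }
    }
    return ($lo + $hi) / 2;
}

# DNA with uniform base frequencies: P(match) = 0.25, P(mismatch) = 0.75
printf "+1/-1: lambda = %.4f\n", solve_lambda({ 1 => 0.25, -1 => 0.75 });
printf "+1/-3: lambda = %.4f\n", solve_lambda({ 1 => 0.25, -3 => 0.75 });
```

The +1/-1 case has the closed form λ = ln 3 (about 1.0986), which makes a handy check; +1/-3 comes out near 1.374.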


Further calculations you can do once you know lambda:

• Expected score
• Relative entropy
• Target frequencies
• Converting a raw score to a nat/bit score


Expected score of the matrix

$E = \sum_{i=1}^{20} \sum_{j=1}^{20} p_i p_j S_{ij}$

Note: E must be negative for K-A statistics to apply.

What is the expected score of a +1/-3 scoring scheme?
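For DNA the sums run over 4 bases instead of 20 amino acids. Assuming uniform base frequencies, the +1/-3 scheme gives 4 × (1/16) × (+1) + 12 × (1/16) × (-3) = -2, which this short sketch reproduces (the `score` helper is illustrative):

```perl
use strict;
use warnings;

my @bases = qw(A C G T);
my %p = map { $_ => 0.25 } @bases;      # assumed uniform base composition

# +1/-3 scheme: +1 for identical bases, -3 otherwise
sub score { my ($x, $y) = @_; return $x eq $y ? 1 : -3 }

# E = sum_i sum_j p_i p_j S_ij
my $expected = 0;
for my $i (@bases) {
    for my $j (@bases) {
        $expected += $p{$i} * $p{$j} * score($i, $j);
    }
}
print "expected score = $expected\n";
```

The result is negative, so K-A statistics apply to this scheme.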


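Relative Entropy of the matrix

H: BLOSUM 42 < BLOSUM 62 < BLOSUM 80

'Think of entropy in terms of degeneracy and promiscuity'

High H = far from equilibrium: target frequencies differ strongly from background, so each aligned pair carries more information

Low H = near equilibrium: alignments contain little information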
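Relative entropy is commonly computed as $H = \sum_{i,j} q_{ij} \lambda S_{ij}$ (nats per aligned pair), with target frequencies $q_{ij} = p_i p_j e^{\lambda S_{ij}}$. The sketch below uses a +1/-1 DNA scheme with uniform base frequencies, where λ = ln 3 exactly, so H = (ln 3)/2, about 0.549 nats or 0.79 bits.

```perl
use strict;
use warnings;

my @bases  = qw(A C G T);
my $p      = 0.25;      # assumed uniform base frequencies
my $lambda = log(3);    # exact root of 0.25 e^L + 0.75 e^(-L) = 1 for +1/-1

my ($H, $q_sum) = (0, 0);
for my $i (@bases) {
    for my $j (@bases) {
        my $s = $i eq $j ? 1 : -1;              # +1/-1 scoring scheme
        my $q = $p * $p * exp($lambda * $s);    # target frequency q_ij
        $q_sum += $q;
        $H     += $q * $lambda * $s;            # H = sum q_ij * lambda * S_ij
    }
}
printf "sum q_ij = %.4f, H = %.4f nats = %.4f bits\n", $q_sum, $H, $H / log(2);
```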


Target Frequencies

Every scoring scheme is implicitly a log-odds scoring scheme. Every scoring scheme has a set of target frequencies.

In other words, even a simple +1/-3 scoring scheme is implicitly a log-odds scheme.

What data justify this scheme? What imaginary data does the scheme imply?
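To see what 'imaginary data' a +1/-3 scheme implies, plug its λ into $q_{ij} = p_i p_j e^{\lambda S_{ij}}$. This sketch assumes uniform base frequencies and hard-codes λ ≈ 1.374064, the approximate root of 0.25 e^λ + 0.75 e^(-3λ) = 1. The target frequencies sum to 1 and put about 98.8% of aligned pairs on identities, so the scheme implicitly expects alignments of roughly 99% identity.

```perl
use strict;
use warnings;

my @bases  = qw(A C G T);
my $p      = 0.25;        # assumed uniform base frequencies
my $lambda = 1.374064;    # approx. root of 0.25 e^L + 0.75 e^(-3L) = 1

my ($q_total, $q_identity) = (0, 0);
for my $i (@bases) {
    for my $j (@bases) {
        my $s = $i eq $j ? 1 : -3;              # +1/-3 scoring scheme
        my $q = $p * $p * exp($lambda * $s);    # q_ij = p_i p_j e^(lambda S_ij)
        $q_total    += $q;
        $q_identity += $q if $i eq $j;          # probability mass on identities
    }
}
printf "sum of q_ij = %.4f; implied identity = %.1f%%\n", $q_total, 100 * $q_identity;
```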


Further calculations you can do once you know lambda

Every scoring scheme is implicitly a log-odds scoring matrix; every log-odds matrix has an implicit set of target frequencies. This is quite a profound insight.


Commercial break!


BLAST STATISTICS

The basic operations:

• Actual vs. effective lengths
• Raw scores
• Normalized scores, e.g. nat and bit scores
• E & P


The Karlin-Altschul Equation

$E = K m n e^{-\lambda S}$

• E: expected number of alignments
• K: a minor constant
• m: length of query
• n: length of database
• mn: search space
• S: raw score
• λ: scaling factor
• λS: normalized score
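A direct transcription of the equation in Perl. All parameter values are hypothetical (λ = 0.267 and K = 0.041 are assumed gapped BLOSUM62 values; the lengths are made up); the point is that, for the same raw score, doubling the database length n doubles E.

```perl
use strict;
use warnings;

# E = K * m * n * exp(-lambda * S)
sub expect_value {
    my (%args) = @_;
    return $args{K} * $args{m} * $args{n} * exp(-$args{lambda} * $args{S});
}

# Hypothetical parameters, for illustration only.
my %hsp = (K => 0.041, lambda => 0.267, m => 250, n => 1e8, S => 89);
my $e1 = expect_value(%hsp);
my $e2 = expect_value(%hsp, n => 2e8);    # same raw score, database twice as large
printf "E = %.3g with n = 1e8; E = %.3g with n = 2e8\n", $e1, $e2;
```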


ACTUAL vs. EFFECTIVE LENGTHS

$E = K m n e^{-\lambda S}$

$E = K m' n' e^{-\lambda S}$


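$E = K m n e^{-\lambda S}$; setting $E = 1$ gives $1 = K m n e^{-\lambda S}$

$\ln(1) = \ln(Kmn) - \lambda S = 0$

$\lambda S = \ln(Kmn)$

Recall that H is nats per aligned residue, thus for an HSP of length l:

$\lambda S = lH \quad\Rightarrow\quad lH = \ln(Kmn)$

The 'expected HSP length':

$l = \ln(Kmn) / H$

Dependent on search space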


$l = \ln(Kmn) / H$

ACGTGTGCGCAGTGTCGCGTGTGCACACTATAGCC

Actual length (m)

Effective length: $m' = m - l$

Effective length: $n' = (\text{total length of db}) - (\text{num\_seqs} \times l)$

What happens if m' < 0?
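A sketch of the two steps above: estimate the expected HSP length l from the search space, then subtract it from the query and from every database sequence. The values of K, H and the lengths are hypothetical, and the clamp to a minimum of 1 is one possible answer to "what happens if m' < 0?", not a description of any particular BLAST implementation (real implementations may also compute l differently).

```perl
use strict;
use warnings;

# Expected HSP length: l = ln(K m n) / H, with H in nats per aligned residue
sub expected_hsp_length {
    my ($K, $m, $n, $H) = @_;
    return log($K * $m * $n) / $H;
}

# Effective lengths: m' = m - l ;  n' = n - N * l  (N = number of db sequences)
sub effective_lengths {
    my ($m, $n, $num_seqs, $l) = @_;
    my $m_eff = $m - $l;
    my $n_eff = $n - $num_seqs * $l;
    # Hypothetical guard: clamp lengths below 1 up to 1.
    $m_eff = 1 if $m_eff < 1;
    $n_eff = 1 if $n_eff < 1;
    return ($m_eff, $n_eff);
}

my ($K, $H)     = (0.041, 0.4);               # hypothetical K and H
my ($m, $n, $N) = (300, 1_000_000, 1000);     # hypothetical query and db sizes
my $l = expected_hsp_length($K, $m, $n, $H);
my ($m_eff, $n_eff) = effective_lengths($m, $n, $N, $l);
printf "l = %.1f, m' = %.1f, n' = %.0f\n", $l, $m_eff, $n_eff;

# A 20-mer oligo is shorter than l, so m - l < 0 and the guard kicks in.
my ($short_eff) = effective_lengths(20, $n, $N, $l);
printf "20-mer: m' = %g\n", $short_eff;
```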


The Karlin-Altschul Equation with effective lengths

$E = K m' n' e^{-\lambda S}$

The terms are the same as before, except that the effective query length m' and the effective database length n' replace m and n, so the search space becomes m'n'.


Converting a raw score to a bit score


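$S'_{\text{nats}} = \lambda S_{\text{raw}} - \ln K$

$S'_{\text{bits}} = \frac{\lambda S_{\text{raw}} - \ln K}{\ln 2}$

$S'_{\text{bits}} = S'_{\text{nats}} / \ln 2$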
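Assuming the HSP above came from a search using NCBI's gapped BLOSUM62 parameters with gap costs 11/1 (λ = 0.267, K = 0.041; an assumption, since the parameters are not shown with the HSP), the raw score 89 converts to the 38.9 bits reported. The helper names are illustrative.

```perl
use strict;
use warnings;

# Normalized scores: S'_nats = lambda*S - ln K ;  S'_bits = S'_nats / ln 2
sub nat_score { my ($S, $lambda, $K) = @_; return $lambda * $S - log($K) }
sub bit_score { return nat_score(@_) / log(2) }

# Assumed parameters: NCBI gapped BLOSUM62, gap costs 11/1.
my ($lambda, $K) = (0.267, 0.041);
printf "raw 89 = %.2f nats = %.1f bits\n",
    nat_score(89, $lambda, $K), bit_score(89, $lambda, $K);
```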



Converting a raw score or a bit score to an Expect


$E = K m' n' e^{-\lambda S}$

$E = m' n' e^{-S'_{\text{nats}}}$, where $S'_{\text{nats}} = \lambda S_{\text{raw}} - \ln K$

$E = m' n' 2^{-S'_{\text{bits}}}$
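The three forms are the same number written three ways. This check uses the assumed gapped BLOSUM62 values λ = 0.267 and K = 0.041, the raw score 89 from the HSP, and a hypothetical search space m'n' = 200 × 5,000,000; all three expressions should agree to rounding error.

```perl
use strict;
use warnings;

# Assumed parameters (gapped BLOSUM62, gap costs 11/1) and hypothetical
# effective lengths.
my ($lambda, $K)    = (0.267, 0.041);
my ($m_eff, $n_eff) = (200, 5e6);
my $S_raw = 89;

my $S_nats = $lambda * $S_raw - log($K);    # S'_nats
my $S_bits = $S_nats / log(2);              # S'_bits

my $E_raw  = $K * $m_eff * $n_eff * exp(-$lambda * $S_raw);   # K m'n' e^(-lambda S)
my $E_nats = $m_eff * $n_eff * exp(-$S_nats);                # m'n' e^(-S'_nats)
my $E_bits = $m_eff * $n_eff * 2 ** (-$S_bits);              # m'n' 2^(-S'_bits)

printf "E = %.4g / %.4g / %.4g\n", $E_raw, $E_nats, $E_bits;
```

The bit-score form is the convenient one in practice: once K is folded into the bit score, only the search space is needed to get E.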



Converting an Expect to a WU-BLAST P value


$P = 1 - e^{-E}$

$E = -\ln(1 - P)$

Note that E ≈ P if either value is < 1e-5.
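The two conversions in Perl (the helper names are illustrative). For small E the two values are nearly identical, while at E = 1 the P value is only about 0.63:

```perl
use strict;
use warnings;

sub p_from_e { my ($E) = @_; return 1 - exp(-$E) }    # P = 1 - e^(-E)
sub e_from_p { my ($P) = @_; return -log(1 - $P) }    # E = -ln(1 - P)

for my $E (3e-5, 0.01, 1, 5) {
    printf "E = %-6g  P = %.6g\n", $E, p_from_e($E);
}
```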



Review: where the parts of an HSP come from, and what they mean


Why use Karlin-Altschul statistics?Why not just stop with the raw score?


Raw scores are fine if you are only interested in the top score… but when do you stop?

How do you compare scores produced using two different scoring schemes? Bit scores provide a common currency for scores, i.e. 52 bits is 52 bits is 52 bits.

Scores don't reflect database size; Expects do.

K-A stats are a bit like stoichiometry: score ~ weight, λ ~ Avogadro's number, E ~ mass.


WU-BLASTN


NCBI-BLASTN


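$E = K m n e^{-\lambda S}$; setting $E = 1$ gives $1 = K m n e^{-\lambda S}$, so $\lambda S = \ln(Kmn)$

$S_{\text{raw}, E=1} = \ln(Kmn) / \lambda$

Raw score needed for E = 1: NCBI ~ 15; WU-BLAST ~ 170

So how long would an oligo have to be to generate a score of 15 or 170?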


$l = \ln(Kmn) / H$

$l_{\text{NCBI}} = 16$

$l_{\text{WU-BLAST}} = 294$


Sum Statistics



What’s different about this BLAST Hit ?


Sum Statistics


BLAST uses two distinct methods to calculate an Expect


Sum Statistics

Sum statistics increases the significance (decreases the E-value) for groups of consistent alignments.


Actual vs. effective lengths for BLASTX etc.

Sum stats are 'pair-wise' in their focus.

In other words, for the purposes of sum stat calculations, n = the length of the sbjct sequence, not the length of the db!


Sum statistics are based on a 'sum score' rather than the raw score of the alignments.

The sum score is not reported by BLAST!


Calculating a Sum score


Converting a Sum score to an Expect(n)
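One published form of sum statistics, from Karlin and Altschul (1993), sums the normalized scores $x_i = \lambda S_i - \ln(Kmn)$ of r consistent HSPs into T and approximates $P(T \ge x) \approx e^{-x} x^{r-1} / \big(r!\,(r-1)!\big)$. The sketch below implements only that asymptotic formula, with hypothetical scores and lengths and assumed gapped BLOSUM62 λ and K; BLAST implementations apply further corrections, so it will not reproduce the Expect(n) values BLAST reports. The helper names are illustrative.

```perl
use strict;
use warnings;

sub factorial { my ($k) = @_; my $f = 1; $f *= $_ for 2 .. $k; return $f }

# Normalized score of one HSP: x_i = lambda*S_i - ln(K m n), where n is the
# length of the subject sequence, not of the whole database.
sub normalized_score {
    my ($S, $lambda, $K, $m, $n) = @_;
    return $lambda * $S - log($K * $m * $n);
}

# Asymptotic tail probability for the sum T of r normalized scores:
# P(T >= x) ~ e^(-x) * x^(r-1) / (r! (r-1)!)
sub sum_tail_prob {
    my ($x, $r) = @_;
    return exp(-$x) * $x ** ($r - 1) / (factorial($r) * factorial($r - 1));
}

# Hypothetical: two consistent HSPs between a 250-residue query and a
# 400-residue subject sequence.
my ($lambda, $K, $m, $n) = (0.267, 0.041, 250, 400);
my @raw_scores = (60, 55);
my $T = 0;
$T += normalized_score($_, $lambda, $K, $m, $n) for @raw_scores;
printf "sum score T = %.2f, tail probability ~ %.3g\n",
    $T, sum_tail_prob($T, scalar @raw_scores);
```

With r = 1 the formula reduces to $e^{-x} = Kmn\,e^{-\lambda S}$, the single-HSP Expect, which is a quick sanity check on the implementation.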


Expect = 3.7e-10

Expect = 2.6e-8

Sum Statistics take home: buyer beware

Best to calculate the ‘Expect(1)’ for each hit.

Which, hopefully, you now know how to do!


Enough BLAST for one day!