BLAST
Tutorial 3
What is BLAST?
• Basic Local Alignment Search Tool
• A set of similarity search programs designed to explore sequence databases.
What are similarity searches good for?
• One sequence by itself is not informative; it must be analyzed by comparative methods against existing sequence databases to develop hypotheses concerning relatives and function.
BLAST program
Name      Query type            Database
blastn    Genomic               Genomic
blastp    Protein               Protein
blastx    Translated genomic    Protein
tblastn   Protein               Translated genomic
tblastx   Translated genomic    Translated genomic
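The table above can be read as a lookup from query type and database type to program. The sketch below is only an illustration of that reading; the function name and the lowercase type labels are made up for this example and are not part of BLAST itself.

```python
# Rows of the table above: (program, query type, database type).
# The lowercase labels are illustrative, not BLAST terminology.
BLAST_PROGRAMS = [
    ("blastn", "genomic", "genomic"),
    ("blastp", "protein", "protein"),
    ("blastx", "translated genomic", "protein"),
    ("tblastn", "protein", "translated genomic"),
    ("tblastx", "translated genomic", "translated genomic"),
]


def programs_for(query_type, database_type):
    """Return the program names whose row matches both types."""
    return [
        name
        for name, query, database in BLAST_PROGRAMS
        if query == query_type and database == database_type
    ]


# A protein query against a translated genomic database
print(programs_for("protein", "translated genomic"))
```

An empty list means the table has no program for that combination, for example a genomic query compared directly against a protein database without translation.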
BLAST Databases
http://www.ncbi.nlm.nih.gov/BLAST/
Place Query
Choose Database
BLASTN Databases
Gene collection
GenBank, EMBL, DDBJ, PDB and NCBI reference sequences (RefSeq)
Reward and penalty for matching and mismatching bases
Cost to create and extend a gap
Remove low information content
Limit search to specific organism
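The four settings above have counterparts in the command-line version of blastn. The sketch below assumes the NCBI BLAST+ package rather than the web form shown in the slides, and only builds the argument list without running anything. The flag names are the ones BLAST+ is understood to use (check `blastn -help` on your installation); the file name, the default values and the helper function are hypothetical.

```python
def blastn_args(query_path, database, reward=1, penalty=-2,
                gap_open=5, gap_extend=2, mask_low_complexity=True,
                organism=None):
    """Build a blastn argument list (hypothetical helper, nothing is executed)."""
    args = [
        "blastn",
        "-query", query_path,
        "-db", database,
        # Reward and penalty for matching and mismatching bases
        "-reward", str(reward),
        "-penalty", str(penalty),
        # Cost to create and extend a gap
        "-gapopen", str(gap_open),
        "-gapextend", str(gap_extend),
        # Remove low information content (DUST filter)
        "-dust", "yes" if mask_low_complexity else "no",
    ]
    if organism is not None:
        # Limit search to a specific organism; assumes a remote search at NCBI
        args += ["-remote", "-entrez_query", f"{organism}[Organism]"]
    return args


# "chick_or6.fasta" is a made-up file name for the query sequence
print(" ".join(blastn_args("chick_or6.fasta", "nt", organism="Gallus gallus")))
```

BLAST+ accepts only certain combinations of reward, penalty and gap costs, so the defaults here should be treated as placeholders.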
Search for homologs of the chick “olfactory receptor 6” gene
Query sequence
Matched areas of database sequences
Global Alignments
Local Alignments
Sequence Identifier
Sequence description
Score (bits)
Coverage
Identity
E value
Score and E value
Identities and gaps
Strand
Multiple hits on the same subject
Design of the BLAST survey
Consider your research question:
• Are you looking for a particular gene in a particular species? BLAST against the genome of that species.
• Are you looking for additional members of a gene family across all species? BLAST against the gene collection database.
• Are you looking for exact motif matches? Increase the gap penalty or use megablast.
Score and E-value
Score (S): (identities + mismatches) - gaps
E-value: E = K · m · n · e^(-λS)
  m, n: depend on search space
    m = query length (bp)
    n = database length (bp)
  K, λ: depend on scoring system
  S: score
Bit Score (S’): S’ = (λS - ln K) / ln 2
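The quantities on this slide can be tied together with the standard Karlin-Altschul formulas, E = K·m·n·e^(-λS) and S’ = (λS - ln K) / ln 2. The sketch below assumes those formulas and uses plain query and database lengths; real BLAST applies effective-length corrections, so its reported values will differ somewhat. The raw score of 30 is a made-up number chosen only for illustration.

```python
import math


def bit_score(raw_score, lam, k):
    """Bit score S' from raw score S and the scoring-system constants."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def e_value(raw_score, lam, k, query_len, db_len):
    """E = K * m * n * exp(-lambda * S)."""
    return k * query_len * db_len * math.exp(-lam * raw_score)


def e_value_from_bits(bits, query_len, db_len):
    """Equivalent form using the bit score: E = m * n * 2**(-S')."""
    return query_len * db_len * 2.0 ** (-bits)


# Search space: a 150-letter query against 4,554,026 sequences
# averaging 270 letters each.
m = 150
n = 270 * 4_554_026
raw = 30  # hypothetical raw score, for illustration only

bits = bit_score(raw, lam=1.37, k=0.711)
print(bits, e_value(raw, 1.37, 0.711, m, n), e_value_from_bits(bits, m, n))
```

The two E-value functions give the same result, which shows why bit scores are convenient: once S’ is known, K and λ are no longer needed to compare hits across scoring systems.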
Score and E-value
• The score is a measure of the similarity of the query to the sequence shown.
• The E-value is a measure of the reliability of the score.
• The definition of the E-value is: the number of alignments with a score equal to or greater than the given S score that are expected to occur by chance in a search of this size.
Score and E-value
The Size of the E-value
• The typical threshold for a good E-value from a BLAST search is E = 10^-6 (written 1e-6 in BLAST output) or lower.
• The reason for such low values is that a chance probability of 0.001 per entry in a million-entry database would still leave about 1000 entries due to chance. A value of 1e-6 would leave only about one entry due to chance.
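The arithmetic behind that threshold is a single multiplication, sketched below. The function name is made up for this example, and it treats the threshold as a per-entry chance probability, which is the simplification the bullet itself uses.

```python
def expected_chance_hits(chance_probability, database_entries):
    """Expected number of database entries that match by chance alone."""
    return chance_probability * database_entries


# Compare a loose and a strict threshold on a million-entry database
for threshold in (1e-3, 1e-6):
    print(threshold, expected_chance_hits(threshold, 1_000_000))
```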
Exercise
Given the following parameters:
Query length: 150
λ = 1.37
K = 0.711
Average sequence length in database: 270
Number of sequences in database: 4,554,026
Calculate S, S’ and E for the following BLAST hit: