Top Banner
1 Exercise: Exercise: BIOINFORMATIC DATABASES BIOINFORMATIC DATABASES and and BLAST BLAST
52

1 Exercise: BIOINFORMATIC DATABASES and BLAST. 2 Outline NCBI and Entrez Pubmed Google scholar RefSeq Swissprot Fasta format PDB: Protein.

Dec 19, 2015

Download

Documents

Welcome message from author
This document is posted to help you gain knowledge. Please leave a comment to let me know what you think about it! Share it to your friends and learn new things together.
Transcript
Page 1: 1 Exercise: BIOINFORMATIC DATABASES and BLAST. 2 Outline  NCBI and Entrez  Pubmed  Google scholar  RefSeq  Swissprot  Fasta format  PDB: Protein.

11

Exercise:Exercise:BIOINFORMATIC DATABASESBIOINFORMATIC DATABASES

andandBLASTBLAST

Page 2: 1 Exercise: BIOINFORMATIC DATABASES and BLAST. 2 Outline  NCBI and Entrez  Pubmed  Google scholar  RefSeq  Swissprot  Fasta format  PDB: Protein.

22

OutlineOutline

NCBI and EntrezNCBI and Entrez PubmedPubmed Google scholarGoogle scholar RefSeqRefSeq SwissprotSwissprot Fasta formatFasta format PDBPDB:: Protein Data Bank Protein Data Bank Organism specific databasesOrganism specific databases SummarySummary Pairwise Sequence Alignment and BLASTPairwise Sequence Alignment and BLAST OverviewOverview Query type: DNA or ProteinQuery type: DNA or Protein

Page 3: 1 Exercise: BIOINFORMATIC DATABASES and BLAST. 2 Outline  NCBI and Entrez  Pubmed  Google scholar  RefSeq  Swissprot  Fasta format  PDB: Protein.

33

What’s in a database?What’s in a database? Sequences – genes, proteins, etcSequences – genes, proteins, etc

Full genomesFull genomes

Annotation – information about genes/proteins:Annotation – information about genes/proteins:- function- function- cellular location- cellular location- chromosomal location- chromosomal location- introns/exons- introns/exons- protein structure- protein structure- phenotypes, diseases- phenotypes, diseases

PublicationsPublications

Page 4: 1 Exercise: BIOINFORMATIC DATABASES and BLAST. 2 Outline  NCBI and Entrez  Pubmed  Google scholar  RefSeq  Swissprot  Fasta format  PDB: Protein.

44

NCBI and EntrezNCBI and EntrezNational center for biotechnology informationNational center for biotechnology information

One of the largest and most comprehensive One of the largest and most comprehensive databases belonging to the NIH (national databases belonging to the NIH (national institute of health)institute of health) The primary Federal agency for conducting and The primary Federal agency for conducting and

supporting medical research in the USAsupporting medical research in the USA Entrez is the search engine of NCBIEntrez is the search engine of NCBI Search for :Search for :

genes, proteins, genomes, structures, diseases, genes, proteins, genomes, structures, diseases, publications and morepublications and more

httphttp://://wwwwww..ncbincbi..nlmnlm..nihnih..govgov//

Page 5: 1 Exercise: BIOINFORMATIC DATABASES and BLAST. 2 Outline  NCBI and Entrez  Pubmed  Google scholar  RefSeq  Swissprot  Fasta format  PDB: Protein.

55

PubMed: search for published papersPubMed: search for published papers

Yang X, Kurteva S, Ren X, Lee S,Yang X, Kurteva S, Ren X, Lee S,Sodroski JSodroski J.. “Subunit stoichiometry of human immunodeficiency virus “Subunit stoichiometry of human immunodeficiency virus type 1 envelope glycoprotein trimers during virus entry into host type 1 envelope glycoprotein trimers during virus entry into host

cells “, J Virolcells “, J Virol.. 2006 May;80(9):4388-95. 2006 May;80(9):4388-95.

Page 6: 1 Exercise: BIOINFORMATIC DATABASES and BLAST. 2 Outline  NCBI and Entrez  Pubmed  Google scholar  RefSeq  Swissprot  Fasta format  PDB: Protein.

66

Use fields!Use fields!Yang[AU] AND glycoprotein[TI] AND 2006[DP] AND J virol[TA]

For the full list of field tags: go to help >- Search Field Descriptions and Tags

Page 7: 1 Exercise: BIOINFORMATIC DATABASES and BLAST. 2 Outline  NCBI and Entrez  Pubmed  Google scholar  RefSeq  Swissprot  Fasta format  PDB: Protein.

77

ExerciseExercise

Retrieve all publications in which the Retrieve all publications in which the first first author is:author is: Pe'er I Pe'er I and the and the last author is:last author is: Shamir RShamir R

Page 8: 1 Exercise: BIOINFORMATIC DATABASES and BLAST. 2 Outline  NCBI and Entrez  Pubmed  Google scholar  RefSeq  Swissprot  Fasta format  PDB: Protein.

88

Using limitsUsing limits

Page 9: 1 Exercise: BIOINFORMATIC DATABASES and BLAST. 2 Outline  NCBI and Entrez  Pubmed  Google scholar  RefSeq  Swissprot  Fasta format  PDB: Protein.

99

Google scholarGoogle scholarhttp://scholar.google.com/

Page 10: 1 Exercise: BIOINFORMATIC DATABASES and BLAST. 2 Outline  NCBI and Entrez  Pubmed  Google scholar  RefSeq  Swissprot  Fasta format  PDB: Protein.

1010

Page 11: 1 Exercise: BIOINFORMATIC DATABASES and BLAST. 2 Outline  NCBI and Entrez  Pubmed  Google scholar  RefSeq  Swissprot  Fasta format  PDB: Protein.

1111

NCBI gene & protein databases: NCBI gene & protein databases: GenBankGenBank

GenBankGenBank is an annotated collection of all is an annotated collection of all publicly available DNA sequences (and publicly available DNA sequences (and their amino-acid translations)their amino-acid translations)

Holds over Holds over 148148 billionbillion bases (2009)bases (2009)

Page 12: 1 Exercise: BIOINFORMATIC DATABASES and BLAST. 2 Outline  NCBI and Entrez  Pubmed  Google scholar  RefSeq  Swissprot  Fasta format  PDB: Protein.

1212

Searching NCBI for the protein Searching NCBI for the protein human CD4human CD4

Search demonstrationSearch demonstration

Page 13: 1 Exercise: BIOINFORMATIC DATABASES and BLAST. 2 Outline  NCBI and Entrez  Pubmed  Google scholar  RefSeq  Swissprot  Fasta format  PDB: Protein.

1313

Page 14: 1 Exercise: BIOINFORMATIC DATABASES and BLAST. 2 Outline  NCBI and Entrez  Pubmed  Google scholar  RefSeq  Swissprot  Fasta format  PDB: Protein.

1414

Using field descriptions, qualifiers, Using field descriptions, qualifiers, and boolean operatorsand boolean operators

Cd4[GENE] AND human[ORGN] Cd4[GENE] AND human[ORGN] Or Or Cd4[gene name] AND human[organism]Cd4[gene name] AND human[organism]

List of field codes: List of field codes: httphttp://://wwwwww..ncbincbi..nlmnlm..nihnih..govgov//entrezentrez//queryquery//staticstatic//helphelp//Summary_MatricesSummary_Matrices..html#Search_Fields_and_Qualifiershtml#Search_Fields_and_Qualifiers

Boolean Operators:Boolean Operators:ANDANDORORNOTNOT

Note: do not use the field Protein name [PROT], only GENE!Note: do not use the field Protein name [PROT], only GENE!

Page 15: 1 Exercise: BIOINFORMATIC DATABASES and BLAST. 2 Outline  NCBI and Entrez  Pubmed  Google scholar  RefSeq  Swissprot  Fasta format  PDB: Protein.

1515

This time we directly search in the protein databaseThis time we directly search in the protein database

Page 16: 1 Exercise: BIOINFORMATIC DATABASES and BLAST. 2 Outline  NCBI and Entrez  Pubmed  Google scholar  RefSeq  Swissprot  Fasta format  PDB: Protein.

1616

RefSeqRefSeq RefSeq: sub-collection of NCBI databases with RefSeq: sub-collection of NCBI databases with

only non-redundant, highly annotated entries only non-redundant, highly annotated entries (genomic DNA, transcript (RNA), and protein (genomic DNA, transcript (RNA), and protein products)products)

Page 17: 1 Exercise: BIOINFORMATIC DATABASES and BLAST. 2 Outline  NCBI and Entrez  Pubmed  Google scholar  RefSeq  Swissprot  Fasta format  PDB: Protein.

1717

Page 18: 1 Exercise: BIOINFORMATIC DATABASES and BLAST. 2 Outline  NCBI and Entrez  Pubmed  Google scholar  RefSeq  Swissprot  Fasta format  PDB: Protein.

1818An explanation on GenBank records

Page 19: 1 Exercise: BIOINFORMATIC DATABASES and BLAST. 2 Outline  NCBI and Entrez  Pubmed  Google scholar  RefSeq  Swissprot  Fasta format  PDB: Protein.

1919

SwissprotSwissprot

A protein sequence database which A protein sequence database which strives to provide a high level of strives to provide a high level of annotation:annotation:* the function of a protein* the function of a protein* domains structure* domains structure* post* post--translational modificationstranslational modifications* variants* variants

One entry for each proteinOne entry for each protein

Page 20: 1 Exercise: BIOINFORMATIC DATABASES and BLAST. 2 Outline  NCBI and Entrez  Pubmed  Google scholar  RefSeq  Swissprot  Fasta format  PDB: Protein.

2020

UniProtUniProt

Page 21: 1 Exercise: BIOINFORMATIC DATABASES and BLAST. 2 Outline  NCBI and Entrez  Pubmed  Google scholar  RefSeq  Swissprot  Fasta format  PDB: Protein.

2121

Fasta formatFasta format

> gi|10835167|ref|NP_000607.1| CD4 antigen precursor [Homo sapiens] MNRGVPFRHLLLVLQLALLPAATQGKKVVLGKKGDTVELTCTASQKKSIQFHWKNSNQIKILGNQGSFLTKGPSKLNDRADSRRSLWDQGNFPLIIKNLKIEDSDTYICEVEDQKEEVQLLVFGLTANSDTHLLQGQSLTLTLESPPGSSPSVQCRSPRGKNIQGGKTLSVSQLELQDSGTWTCTVLQNQKKVEFKIDIVVLAFQKASSIVYKKEGEQVEFSFPLAFTVEKLTGSGELWWQAERASSSKSWITFDLKNKEVSVKRVTQDPKLQMGKKLPLHLTLPQALPQYAGSGNLTLALEAKTGKLHQEVNLVVMRATQLQKNLTCEVWGPTSPKLMLSLKLENKEAKVSKREKAVWVLNPEAGMWQCLLSDSGQVLLESNIKVLPTWSTPVQPMALIVLGGVAGLLLFIGLGIFFCVRCRHRRRQAERMSQIKRLLSEKKTCQCPHRFQKTCSPI

Save accession numbers for future use (makes searching quicker):RefSeq accession number: NP_000607.1

header

ID/accession description

sequence

Page 22: 1 Exercise: BIOINFORMATIC DATABASES and BLAST. 2 Outline  NCBI and Entrez  Pubmed  Google scholar  RefSeq  Swissprot  Fasta format  PDB: Protein.

2222

DownloadingDownloading

Page 23: 1 Exercise: BIOINFORMATIC DATABASES and BLAST. 2 Outline  NCBI and Entrez  Pubmed  Google scholar  RefSeq  Swissprot  Fasta format  PDB: Protein.

2323

PDBPDB:: Protein Data Bank Protein Data Bank

Main database of 3D structuresMain database of 3D structures Includes ~56,000 entries (Includes ~56,000 entries (proteinsproteins, ,

nucleic acids, others)nucleic acids, others) Proteins organized in groups, families etcProteins organized in groups, families etc Is highly redundant Is highly redundant

different conformations (e.g., ligand different conformations (e.g., ligand dependent)dependent)

http://www.rcsb.orghttp://www.rcsb.org

Page 24: 1 Exercise: BIOINFORMATIC DATABASES and BLAST. 2 Outline  NCBI and Entrez  Pubmed  Google scholar  RefSeq  Swissprot  Fasta format  PDB: Protein.

2424

Human CD4 in complex with HIV gp120Human CD4 in complex with HIV gp120

gp120

CD4

PDB ID 1G9M

Page 25: 1 Exercise: BIOINFORMATIC DATABASES and BLAST. 2 Outline  NCBI and Entrez  Pubmed  Google scholar  RefSeq  Swissprot  Fasta format  PDB: Protein.

2525

Model organisms have independent databases:Model organisms have independent databases:

Organism specific databasesOrganism specific databases

HIV database http://hiv-web.lanl.gov/content/index

http://gmod.org/wiki/Main_Page?q=node/71

Page 26: 1 Exercise: BIOINFORMATIC DATABASES and BLAST. 2 Outline  NCBI and Entrez  Pubmed  Google scholar  RefSeq  Swissprot  Fasta format  PDB: Protein.

2626

SummarySummary

General and comprehensive databases:General and comprehensive databases: NCBI, EMBL, DDBJNCBI, EMBL, DDBJ

Genome specific databases:Genome specific databases: ENSEMBL, UCSC genome browserENSEMBL, UCSC genome browser

Highly annotated databases:Highly annotated databases: Proteins:Proteins:

• Swissprot, RefSeqSwissprot, RefSeq Structures:Structures:

• PDBPDB

Page 27: 1 Exercise: BIOINFORMATIC DATABASES and BLAST. 2 Outline  NCBI and Entrez  Pubmed  Google scholar  RefSeq  Swissprot  Fasta format  PDB: Protein.

2727

And always remember:And always remember:

1.1.GoogleGoogle (or any search engine) (or any search engine)

2.2.RTFM -RTFM -

Read the manual!!! (/help/FAQ)Read the manual!!! (/help/FAQ)

Page 28: 1 Exercise: BIOINFORMATIC DATABASES and BLAST. 2 Outline  NCBI and Entrez  Pubmed  Google scholar  RefSeq  Swissprot  Fasta format  PDB: Protein.

2828

Pairwise Pairwise Sequence Sequence

Alignment and Alignment and BLASTBLAST

Page 29: 1 Exercise: BIOINFORMATIC DATABASES and BLAST. 2 Outline  NCBI and Entrez  Pubmed  Google scholar  RefSeq  Swissprot  Fasta format  PDB: Protein.

2929

What is sequence alignment?What is sequence alignment?

Alignment: Alignment: Comparing two (pairwise) or Comparing two (pairwise) or more (multiple) sequences. Searching for more (multiple) sequences. Searching for a series of identical or similar characters in a series of identical or similar characters in the sequences.the sequences.

MVNLTSDEKTAVLALWNKVDVEDCGGE|| || ||||| ||| || || ||MVHLTPEEKTAVNALWGKVNVDAVGGE

Page 30: 1 Exercise: BIOINFORMATIC DATABASES and BLAST. 2 Outline  NCBI and Entrez  Pubmed  Google scholar  RefSeq  Swissprot  Fasta format  PDB: Protein.

3030

Local vs. GlobalLocal vs. Global Global alignmentGlobal alignment – finds the best – finds the best

alignment across the alignment across the wholewhole two two sequences.sequences.

Local alignmentLocal alignment – finds regions of – finds regions of high similarity in high similarity in partsparts of the of the sequences.sequences.

ADLGAVFALCDRYFQ|||| |||| |ADLGRTQN-CDRYYQ

ADLG CDRYFQ|||| |||| |ADLG CDRYYQ

Page 31: 1 Exercise: BIOINFORMATIC DATABASES and BLAST. 2 Outline  NCBI and Entrez  Pubmed  Google scholar  RefSeq  Swissprot  Fasta format  PDB: Protein.

3131

In the course of evolution, the sequences In the course of evolution, the sequences changed from the ancestral sequence by changed from the ancestral sequence by random mutationsrandom mutations

Three types of mutations:Three types of mutations:

1.1. InsertionInsertion - AAGA - AAGA AAG AAGTTAA

2.2. DeletionDeletion - A - AAAGAGA AGA AGA

3.3. SubstitutionSubstitution -- AA AAGGAA AA AACCAA

Evolutionary changes in sequencesEvolutionary changes in sequences

InsertionInsertion + + DeletionDeletion IndelIndel

Page 32: 1 Exercise: BIOINFORMATIC DATABASES and BLAST. 2 Outline  NCBI and Entrez  Pubmed  Google scholar  RefSeq  Swissprot  Fasta format  PDB: Protein.

3232

Scoring schemeScoring scheme

Match/mismatch scores: substitution matricesMatch/mismatch scores: substitution matrices Nucleic acids:Nucleic acids:

• Transition-transversionTransition-transversion Amino acids:Amino acids:

• Evolution (empirical data) based: (PAM, BLOSUM)Evolution (empirical data) based: (PAM, BLOSUM)• Physico-chemical properties based (Grantham, Physico-chemical properties based (Grantham,

McLachlan)McLachlan)

Gap penaltyGap penalty

Page 33: 1 Exercise: BIOINFORMATIC DATABASES and BLAST. 2 Outline  NCBI and Entrez  Pubmed  Google scholar  RefSeq  Swissprot  Fasta format  PDB: Protein.

3333

Computation time:Computation time:How do we search a database?How do we search a database?

If each pairwise alignment takes 1/10 of a If each pairwise alignment takes 1/10 of a second, and if the database contains 10second, and if the database contains 107 7

sequences, it will take sequences, it will take 101066 seconds seconds = = 11.511.5 daysdays to complete one search. to complete one search.

150,000 searches (at least!!) are 150,000 searches (at least!!) are performed per day. >82,000,000 sequence performed per day. >82,000,000 sequence records in GenBank.records in GenBank.

Page 34: 1 Exercise: BIOINFORMATIC DATABASES and BLAST. 2 Outline  NCBI and Entrez  Pubmed  Google scholar  RefSeq  Swissprot  Fasta format  PDB: Protein.

3434

ConclusionConclusion

Using the exact comparison pairwise Using the exact comparison pairwise alignment algorithm between the query alignment algorithm between the query and all DB entries – too slowand all DB entries – too slow

Page 35: 1 Exercise: BIOINFORMATIC DATABASES and BLAST. 2 Outline  NCBI and Entrez  Pubmed  Google scholar  RefSeq  Swissprot  Fasta format  PDB: Protein.

3535

HeuristicHeuristic

Definition:Definition: a heuristic is a a heuristic is a design to solve a problem that design to solve a problem that does not provide an exact does not provide an exact solution (but is not too bad) but solution (but is not too bad) but reduces the time complexity of reduces the time complexity of the exact solutionthe exact solution

Page 36: 1 Exercise: BIOINFORMATIC DATABASES and BLAST. 2 Outline  NCBI and Entrez  Pubmed  Google scholar  RefSeq  Swissprot  Fasta format  PDB: Protein.

3636

BLASTBLAST

BLAST - BLAST - BBasic asic LLocal ocal AAlignment and lignment and SSearch earch TToolool

A heuristic for searching a database for A heuristic for searching a database for similar sequencessimilar sequences

The heuristic based on restrictions of the The heuristic based on restrictions of the similarity (such as using similarity (such as using ungapped word ungapped word matching instead of single character matching instead of single character matching).matching).

Page 37: 1 Exercise: BIOINFORMATIC DATABASES and BLAST. 2 Outline  NCBI and Entrez  Pubmed  Google scholar  RefSeq  Swissprot  Fasta format  PDB: Protein.

3737

Query: DNA Protein

Database: DNA Protein

Query type: DNA or ProteinQuery type: DNA or Protein All types of searches are possibleAll types of searches are possible

blastn – nuc vs. nucblastp – prot vs. prot

blastx – translated query vs. protein database

tblastn – protein vs. translated nuc. DB

tblastx – translated query vs. translated database

Page 38: 1 Exercise: BIOINFORMATIC DATABASES and BLAST. 2 Outline  NCBI and Entrez  Pubmed  Google scholar  RefSeq  Swissprot  Fasta format  PDB: Protein.

3838

Query typeQuery type

Information content in the letters:Information content in the letters: Nucleotides: 4 letter alphabetNucleotides: 4 letter alphabet Amino acids: 20 letter alphabetAmino acids: 20 letter alphabet

• Two random DNA sequences will, on average, have 25% identity• Two random protein sequences will, on average, have 5% identity

The amino-acid sequence is often preferable for homology search

Selection (and hence conservation) works (mostly) at Selection (and hence conservation) works (mostly) at the protein levelthe protein level

Page 39: 1 Exercise: BIOINFORMATIC DATABASES and BLAST. 2 Outline  NCBI and Entrez  Pubmed  Google scholar  RefSeq  Swissprot  Fasta format  PDB: Protein.

3939

E-valueE-value The number of times we will theoretically The number of times we will theoretically

find an alignment with a score ≥ find an alignment with a score ≥ YY of a of a random sequence vs. a random databaserandom sequence vs. a random database

Theoretically, we could trust

any result with an

E-value ≤ 1

In practice – BLAST uses estimations. E-values of 10-4 and lower indicate a

significant homology.E-values between 10-4 and 10-2 should be checked (similar domains, maybe

non-homologous).E-values between 10-2 and 1 do not

indicate a good homology

Page 40: 1 Exercise: BIOINFORMATIC DATABASES and BLAST. 2 Outline  NCBI and Entrez  Pubmed  Google scholar  RefSeq  Swissprot  Fasta format  PDB: Protein.

4040

Filtering low complexityFiltering low complexity

Low complexity regionsLow complexity regions : e.g., Proline rich : e.g., Proline rich areas (in proteins), Alu repeats (in DNA)areas (in proteins), Alu repeats (in DNA)

Regions of low complexity generate high Regions of low complexity generate high scores of alignment, BUT – this does not scores of alignment, BUT – this does not indicate homologyindicate homology

Page 41: 1 Exercise: BIOINFORMATIC DATABASES and BLAST. 2 Outline  NCBI and Entrez  Pubmed  Google scholar  RefSeq  Swissprot  Fasta format  PDB: Protein.

4141

BLAST 2 sequences at NCBI BLAST 2 sequences at NCBI

Produces the Produces the locallocal alignment of two given alignment of two given sequences using sequences using BLASTBLAST (Basic Local (Basic Local Alignment Search Tool)Alignment Search Tool) engine for local engine for local alignmentalignment

Does not use an optimal algorithm but a Does not use an optimal algorithm but a heuristicheuristic

Page 42: 1 Exercise: BIOINFORMATIC DATABASES and BLAST. 2 Outline  NCBI and Entrez  Pubmed  Google scholar  RefSeq  Swissprot  Fasta format  PDB: Protein.

4242

Back to NCBIBack to NCBI

Page 43: 1 Exercise: BIOINFORMATIC DATABASES and BLAST. 2 Outline  NCBI and Entrez  Pubmed  Google scholar  RefSeq  Swissprot  Fasta format  PDB: Protein.

4343

BLAST – bl2seqBLAST – bl2seq

Page 44: 1 Exercise: BIOINFORMATIC DATABASES and BLAST. 2 Outline  NCBI and Entrez  Pubmed  Google scholar  RefSeq  Swissprot  Fasta format  PDB: Protein.

4444

blastnblastn – nucleotide – nucleotide

blastpblastp – protein – protein

Bl2Seq - queryBl2Seq - query

Page 45: 1 Exercise: BIOINFORMATIC DATABASES and BLAST. 2 Outline  NCBI and Entrez  Pubmed  Google scholar  RefSeq  Swissprot  Fasta format  PDB: Protein.

4545

Bl2seq resultsBl2seq results

Page 46: 1 Exercise: BIOINFORMATIC DATABASES and BLAST. 2 Outline  NCBI and Entrez  Pubmed  Google scholar  RefSeq  Swissprot  Fasta format  PDB: Protein.

4646

Bl2seq resultsBl2seq results

MatchMatch DissimilarityDissimilarity GapsGaps SimilaritySimilarity Low Low

complexitycomplexity

Page 47: 1 Exercise: BIOINFORMATIC DATABASES and BLAST. 2 Outline  NCBI and Entrez  Pubmed  Google scholar  RefSeq  Swissprot  Fasta format  PDB: Protein.

4747

BLAST – programsBLAST – programs

Query: DNA Protein

Database: DNA Protein

Page 48: 1 Exercise: BIOINFORMATIC DATABASES and BLAST. 2 Outline  NCBI and Entrez  Pubmed  Google scholar  RefSeq  Swissprot  Fasta format  PDB: Protein.

4848

BLAST – BlastpBLAST – Blastp

Page 49: 1 Exercise: BIOINFORMATIC DATABASES and BLAST. 2 Outline  NCBI and Entrez  Pubmed  Google scholar  RefSeq  Swissprot  Fasta format  PDB: Protein.

4949

Blastp - resultsBlastp - results

Page 50: 1 Exercise: BIOINFORMATIC DATABASES and BLAST. 2 Outline  NCBI and Entrez  Pubmed  Google scholar  RefSeq  Swissprot  Fasta format  PDB: Protein.

5050

Blastp – results (cont’)Blastp – results (cont’)

Page 51: 1 Exercise: BIOINFORMATIC DATABASES and BLAST. 2 Outline  NCBI and Entrez  Pubmed  Google scholar  RefSeq  Swissprot  Fasta format  PDB: Protein.

5151

Blastp – acquiring sequencesBlastp – acquiring sequences

Page 52: 1 Exercise: BIOINFORMATIC DATABASES and BLAST. 2 Outline  NCBI and Entrez  Pubmed  Google scholar  RefSeq  Swissprot  Fasta format  PDB: Protein.

5252

Blastp – acquiring sequences Blastp – acquiring sequences (cont’)(cont’)