1 Orthology and paralogy A practical approach Searching the primaries Searching the secondaries Significance of database matches DB Web addresses Software Web addresses

Dec 31, 2015

Orthology and paralogy

A practical approach

Searching the primaries

Searching the secondaries

Significance of database matches

DB Web addresses

Software Web addresses

Why Search Databases?

• To find out whether a new DNA sequence has already been deposited in the databanks.

• To find proteins homologous to a putative coding ORF.

Why Search Databases?

• To find similar non-coding DNA stretches in the database (for example, repeat elements or regulatory sequences).

• To locate false priming sites for a set of PCR oligonucleotides.
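The last use case, locating false priming sites, can be illustrated on a small scale. The sketch below is not from the slides: it assumes that a candidate site is simply any window on either strand that differs from the primer by at most a few bases, and the function names, the `max_mismatches` parameter and the sequences are all made up for illustration. A real database search would use a dedicated search program instead of this linear scan.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def find_priming_sites(primer, template, max_mismatches=2):
    """Scan both strands of `template` for windows that resemble the primer.

    Returns a sorted list of (position, strand, mismatches) tuples, where
    position is the 0-based start of the window on the given template string.
    """
    primer = primer.upper()
    template = template.upper()
    # On the given strand a site reads like the primer itself; on the
    # opposite strand it reads like the primer's reverse complement.
    probes = {"+": primer, "-": reverse_complement(primer)}
    hits = []
    for strand, probe in probes.items():
        for start in range(len(template) - len(probe) + 1):
            window = template[start:start + len(probe)]
            mismatches = sum(1 for a, b in zip(probe, window) if a != b)
            if mismatches <= max_mismatches:
                hits.append((start, strand, mismatches))
    return sorted(hits)


# Hypothetical template and primer, chosen so that one site lies on each strand.
template = "TTGACCGTAGGCTAACCGGATTACGCTAGCCTACGGTCAA"
primer = "GACCGTAGGC"
sites = find_priming_sites(primer, template, max_mismatches=1)
for position, strand, mismatches in sites:
    print(position, strand, mismatches)
```

Any site other than the intended one is a potential false priming site. Counting mismatches uniformly is a simplification; it ignores, for instance, where in the primer a mismatch falls.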

What Databases Are Available?

• DNA (nucleotide sequences):

The big databases: GenBank, EMBL, DDBJ and their weekly updates. These databases exchange information routinely.

• Genomic databases such as Human (GDB), Mouse (MGD), Yeast (SGD), etc.

• Special databases: ESTs (expressed sequence tags), STSs (sequence-tagged sites), EPD (eukaryotic promoter database), REPBASE (repetitive sequence database) and many others.

What Databases Are Available?

• Protein (amino acid sequences):

The big databases are: Swiss-Prot (high level of annotation), PIR (Protein Information Resource)

• Translated databases like: SPTREMBL (translated EMBL), GenPept (translation of coding regions in GenBank)

• Special databases like: PDB (sequences derived from the 3D structures in the Brookhaven PDB)

Web Addresses

• http://www.ncbi.nlm.nih.gov/Entrez/

– http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?CMD=search&DB=nucleotide

– http://www.ncbi.nlm.nih.gov/Genbank/GenbankOverview.html

– http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Protein

Let us go

http://www.ncbi.nlm.nih.gov/Entrez/

What is GenBank?

• http://www.ncbi.nlm.nih.gov/Genbank/GenbankOverview.html

• GenBank® is the NIH genetic sequence database, an annotated collection of all publicly available DNA sequences …

NCBI databases

• http://www.ncbi.nlm.nih.gov/Database/index.html

http://www.ncbi.nlm.nih.gov/Database/tut1.html

Let us try a tutorial

Web Addresses

• http://www.ebi.ac.uk/Databases/

– http://www.ebi.ac.uk/embl/index.html

– http://www.ebi.ac.uk/swissprot/index.html

– http://www.ebi.ac.uk/microarray/ArrayExpress/arrayexpress.html

Homology and Analogy

It is important to understand a concept that underpins sequence analysis - homology.

The term homology is confounded and abused in the literature.

Simply, sequences are said to be homologous if they are related by divergence from a common ancestor.

What Is Homology? (from the Technion course)

• Similarity or likeness between properties in species.

• Before Darwin, homology was defined morphologically.

• Example:

Homology

Bats and butterflies fly, but are different.

Bats fly and whales swim, yet the bones in a bat's wing and a whale's flipper are strikingly alike.

Bat wings and butterfly wings are not homologous.

Bat wings and whale flippers are homologous.

Homology Interpretation from Darwin to 21st Century

• Darwin (1859) explained homology as the result of descent with modification from a common ancestor.

• Modern genetics: Homology information is in the genes.

• Two sequences are homologous if they are both similar and have a common ancestor.

When Does Similarity Imply Homology?

• Similarity by itself is not enough: for example, similarity between short sequences could arise by chance (the sequences may come from different ancestors).

• Large enough similarities typically imply homology (and usually we do not have direct evidence on descent).

• Sequence similarity comes with a significance measure.
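One simple way to attach a significance measure to a similarity score is to compare it with the scores obtained against shuffled versions of the same sequence. The sketch below is an illustration under stated assumptions, not the method of any particular search program: the scoring values (match +2, mismatch -1, gap -2), the number of shuffles, the function names and the two peptide strings are all arbitrary choices made for this example.

```python
import random
import statistics


def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best Smith-Waterman local alignment score with a linear gap penalty."""
    previous = [0] * (len(b) + 1)
    best = 0
    for char_a in a:
        current = [0]
        for j, char_b in enumerate(b, start=1):
            diagonal = previous[j - 1] + (match if char_a == char_b else mismatch)
            score = max(0, diagonal, previous[j] + gap, current[j - 1] + gap)
            current.append(score)
            best = max(best, score)
        previous = current
    return best


def shuffle_significance(query, subject, trials=200, seed=0):
    """Compare the observed score with scores against shuffled subjects.

    Returns (observed score, mean background score, empirical p-value).
    """
    rng = random.Random(seed)
    observed = local_alignment_score(query, subject)
    letters = list(subject)
    background = []
    for _ in range(trials):
        # Shuffling keeps the composition but destroys the residue order.
        rng.shuffle(letters)
        background.append(local_alignment_score(query, "".join(letters)))
    at_least_as_good = sum(1 for score in background if score >= observed)
    # Add-one correction so the estimate is never exactly zero.
    p_value = (at_least_as_good + 1) / (trials + 1)
    return observed, statistics.mean(background), p_value


# Hypothetical sequences that differ at a single position.
query = "MKTAYIAKQRQISFVKSHFSRQ"
subject = "MKTAYIAKQRQLSFVKSHFSRQ"
observed, background_mean, p_value = shuffle_significance(query, subject)
print(observed, background_mean, p_value)
```

A small p-value says that the observed score is unlikely to arise from composition alone. It does not by itself prove common descent, which is the point made in the bullets above.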

Homology and Analogy

Understanding homology allows us to appreciate the concept of analogy; this is encountered in protein structures that share similar folds but have no demonstrable sequence similarity; or that share groups of catalytic residues with almost exactly equivalent spatial geometries, but otherwise have neither sequence nor structural similarity. Such relationships are thought to result from convergence to similar biological solutions from different evolutionary starting-points.

Homology and Analogy

The essence of sequence analysis is the inference of homology.

Homology is not a measure of similarity, but an absolute statement that sequences have a divergent rather than a convergent relationship.

Thus, phrases that quantify homology are meaningless.

Orthology and Paralogy

Homologous proteins may perform the same function in different species (orthologues) or different but related functions within one organism (paralogues).

Comparison of orthologues allows study of molecular palaeontology, while paralogues have provided deeper insights into the underlying mechanisms of evolution.

Orthology and Paralogy

Paralogues arose from single genes via successive duplication events.

The duplicated genes followed separate evolutionary pathways, and new specificities evolved through variation and adaptation.
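A common way to make the distinction operational, which goes beyond what the slides state, is to look at the event at which two homologous genes last shared an ancestor: a speciation gives orthologues, a duplication gives paralogues. The sketch below assumes a gene tree whose internal nodes are already labelled with their event; the tuple representation, the function names and the small globin tree are illustrative assumptions only.

```python
# A node is either a leaf (a gene name) or a tuple (event, left, right),
# where event is "speciation" or "duplication".
# Illustrative tree: an old duplication produced the alpha and beta globin
# lineages, and each lineage later split between human and mouse.
GLOBIN_TREE = (
    "duplication",
    ("speciation", "human_HBA", "mouse_Hba"),
    ("speciation", "human_HBB", "mouse_Hbb"),
)


def leaves(node):
    """Return the gene names below a node."""
    if isinstance(node, str):
        return [node]
    _event, left, right = node
    return leaves(left) + leaves(right)


def classify_pairs(node, result=None):
    """Label every pair of genes by the event at their last common ancestor."""
    if result is None:
        result = {}
    if isinstance(node, str):
        return result
    event, left, right = node
    kind = "orthologues" if event == "speciation" else "paralogues"
    # Genes on opposite sides of this node last shared an ancestor here.
    for gene_a in leaves(left):
        for gene_b in leaves(right):
            result[frozenset((gene_a, gene_b))] = kind
    classify_pairs(left, result)
    classify_pairs(right, result)
    return result


relations = classify_pairs(GLOBIN_TREE)
for pair, kind in sorted((sorted(p), k) for p, k in relations.items()):
    print(pair, kind)
```

Under this rule a human alpha globin and a mouse beta globin count as paralogues even though they come from different species, because the duplication predates the split between the two species.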

Complete genomes

• http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Genome

• Let us walk around among genomes

COGs

Phylogenetic classification of proteins encoded in complete genomes

Clusters of Orthologous Groups of proteins (COGs) were delineated by comparing protein sequences encoded in 43 complete genomes, representing 30 major phylogenetic lineages. Each COG consists of individual proteins or groups of paralogs from at least 3 lineages and thus corresponds to an ancient conserved domain. Proteins from two eukaryotic genomes (Drosophila melanogaster and Caenorhabditis elegans) were assigned to COGs and can be reached from each individual COG page.
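The "at least 3 lineages" condition can be expressed as a simple filter over candidate clusters. The sketch below checks only that one rule and does not reproduce how COGs are actually built; the cluster names, protein identifiers and lineage labels are hypothetical.

```python
def count_lineages(cluster):
    """Count the distinct lineages among (protein_id, lineage) pairs."""
    return len({lineage for _protein_id, lineage in cluster})


def spans_enough_lineages(cluster, min_lineages=3):
    """True if the cluster has members from at least `min_lineages` lineages."""
    return count_lineages(cluster) >= min_lineages


# Hypothetical candidate clusters; protC1 and protC2 stand for paralogs
# from the same lineage, which count once.
candidate_clusters = {
    "cluster_1": [
        ("protA1", "lineage_A"),
        ("protB1", "lineage_B"),
        ("protC1", "lineage_C"),
        ("protC2", "lineage_C"),
    ],
    "cluster_2": [
        ("protA2", "lineage_A"),
        ("protB2", "lineage_B"),
    ],
}

accepted = [
    name
    for name, members in candidate_clusters.items()
    if spans_enough_lineages(members)
]
print(accepted)
```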

COGs

• http://www.ncbi.nlm.nih.gov/COG/

• Cognitor

• http://www.ncbi.nlm.nih.gov/COG/xognitor.html

• COG Help

• http://www.ncbi.nlm.nih.gov/COG/COGhelp.html#top

» FTP: ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Mycobacterium_leprae/