Comparative Genomics Todd Castoe Biochemistry and Molecular Genetics


Dec 17, 2015


Deborah Ross

Comparative Genomics

Todd Castoe, Biochemistry and Molecular Genetics


The First Genomes


Figure 18.6 Genomes 3 (© Garland Science 2007)

Tree of life from David Hillis’ lab (based on ~3000 rRNAs)

http://www.zo.utexas.edu/faculty/antisense/Download.html

Figure labels: animals, plants, fungi, protists, bacteria, archaea, plus a "you are here" marker



Hedges, Nat Rev Genet 2003

An argument for model species and the need for comparative genomics

Most human proteins are ancient

Timescale of eukaryote evolution

HUMAN PROTEINS… (figure labels along the timescale: ~30%, ~50%, ~75%, >90%)


Gu X. et al. Nature Genetics (2002) 31 205-209

Divergences within 749 gene families in the Human Genome

Genomes have been recycling for billions of years

What is comparative genomics?

There are many ways that genomes can be compared

• Whole genome
– Genome size
– Genome alignments
– Synteny (gene order conservation)
– Gene number
– Anomalous regions

• Gene-centric
– Gene families and unique genes
– Gene clustering by function

• Gene sequence variations
– Codon usage, SNPs, inDels, pseudogenes

Why Comparative Genomics?

1. Conservation over long evolutionary distances suggests functional constraints

2. Lack of conservation over short distances may be indicative of adaptive evolution

3. Helps us identify both coding and non-coding genes and regulatory elements

4. Characterizing the differences between organisms reveals mechanisms of change

5. Allows us to achieve a greater understanding of vertebrate evolution

6. Leveraging knowledge between species for annotation and inference of function

7. Tells us what is common and what is unique between different species at the genome level

8. The function of human genes and other regions may be revealed by studying their counterparts in simpler model organisms

Comparing Genome Size
The ‘C-value paradox’

Genome size does NOT correlate with organismal complexity

Why Are Some Genomes So Large?

• There is no clear correlation between genome size and genetic complexity.

• C-value – The total amount of DNA in the genome (per haploid set of chromosomes)

• C-value paradox – The lack of relationship between the DNA content (C-value) of an organism and its coding potential.

Figure axis: Haploid Genome Size (log scale)


Transposable Element

Contrasted Genome Landscapes

The amount of TE DNA correlates positively with genome size

Figure (Feschotte & Pritham 2006): genomic DNA, TE DNA, and protein-coding DNA in Mb (axis 0 to 3000) for Plasmodium, slime mold, budding yeast, fission yeast, Neurospora, Arabidopsis, Brassica, rice, maize, nematode, Drosophila, mosquito, sea squirt, zebrafish, Fugu, mouse, and human

Transposable Elements…

• Variation in gene numbers cannot explain variation in genome size among eukaryotes

• Most of variation in genome size is due to variation in the amount of repetitive DNA (mostly derived from TEs)

• TEs accumulate in intergenic and intronic regions

CONCLUSIONS…

• TEs have played an important role in genome evolution and diversification

• Facilitate expansion and contraction of genomes AND gene families


Coarse Comparisons of Genomes

Fugu Genome
Science 2002

365 Mb (1/10 the human)

Tiny vertebrate genome

Humans and fish shared a common ancestor 450 Mya!

Among the Smallest Vertebrate Genomes

• Genome is < 1/6 repetitive DNA
– vs. ~50% in us

• ¾ of human proteins have a strong match to Fugu (pretty good for 450 My)

• ¼ of human proteins are highly diverged from, or have no, pufferfish homologs

Shadows of the Ancient Vertebrate Genome…

• Conserved linkages between Fugu and human
– Preservation of chromosomal chunks from the common vertebrate ancestor (synteny)

• BUT, lots of cut/copy-paste… and some general scrambling of gene order


What a little genome… …with little introns

• The Fugu genome is compact partly because its introns are shorter than those of the human genome

• The Fugu mode of intron size is 79 bp
– 75% of introns < 425 bp in length

• The human mode is 87 bp
– 75% of introns < 2609 bp

• Fugu: 500 introns > 10 kb; human: 12,000 introns > 10 kb

• The total numbers of introns are roughly the same
– 161,536 introns in Fugu
– 152,490 introns in human
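The mode and 75% figures quoted above are plain summaries of an intron-length distribution. The sketch below shows one way such numbers could be computed in Python. The function name `intron_length_summary`, the nearest-rank percentile rule, and the toy lengths are illustrative assumptions, not the method or data of the Fugu genome paper.

```python
from collections import Counter


def intron_length_summary(lengths, long_cutoff=10_000):
    """Summarize intron lengths (bp): mode, 75th percentile, count above a cutoff."""
    if not lengths:
        raise ValueError("need at least one intron length")
    ordered = sorted(lengths)
    # Mode: the most frequent length; ties resolve to the smallest length.
    counts = Counter(ordered)
    mode = max(counts, key=lambda k: (counts[k], -k))
    # Nearest-rank 75th percentile: smallest value with >= 75% of introns at or below it.
    rank = -(-75 * len(ordered) // 100)  # integer ceil(0.75 * n)
    p75 = ordered[rank - 1]
    # Number of introns longer than the cutoff (10 kb by default).
    n_long = sum(1 for x in ordered if x > long_cutoff)
    return {"mode": mode, "p75": p75, "n_long": n_long}


# Toy lengths, invented for illustration only.
toy_introns = [79, 79, 79, 85, 120, 300, 425, 12_000]
print(intron_length_summary(toy_introns))
```

Run on real annotation, the same three numbers would allow a direct Fugu versus human comparison of the kind listed on the slide.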


GC Content Differences

Probably related to the relative complexity of the chromatin structure in humans versus the Fugu.

Fugu-Human Synteny
http://blast.fugu-sg.org/fugu-synteny/viewer_newServer.php

I think their maps, however, are confusing and not that informative
– scaffolds were not physically mapped to chromosomes…

Let’s look instead at the other pufferfish, Tetraodon, which was sequenced the following year
– physical mapping to chromosomes was complete


Tetraodon-Human Synteny

Comparative Genomics – Synteny
Human Chrom. 1 vs. Chimp

Comparative Genomics – Synteny
Human Chrom. 1 vs. Mouse

Comparative Genomics – Synteny
Human Chrom. 1 vs. Cow

Comparative Genomics – Synteny
Human Chrom. 1 vs. Opossum

Comparative Genomics – Synteny
Human Chrom. 1 vs. Platypus

Comparative Genomics – Synteny
Human Chrom. 1 vs. Chicken


Synteny

• Large blocks of synteny exist even at great phylogenetic distance

• Also substantial scrambling, even at short distance…
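One way to make "blocks of synteny" concrete is to look for runs of genes that sit next to each other, in the same order, in two genomes. The sketch below is a deliberately simplified illustration: it assumes each genome is given as an ordered list of single-copy ortholog IDs on one chromosome, ignores inversions and gaps, and uses hypothetical gene names. Real synteny tools are considerably more tolerant.

```python
def collinear_blocks(order_a, order_b, min_len=2):
    """Return runs of genes that are adjacent and in the same order in both lists."""
    # Position of every gene in genome B (gene IDs are assumed unique).
    pos_b = {gene: i for i, gene in enumerate(order_b)}
    blocks, current = [], []
    for gene in order_a:
        extends = (
            gene in pos_b
            and current
            and pos_b[gene] == pos_b[current[-1]] + 1
        )
        if extends:
            current.append(gene)
            continue
        # The run is broken: keep it if it is long enough, then start over.
        if len(current) >= min_len:
            blocks.append(current)
        current = [gene] if gene in pos_b else []
    if len(current) >= min_len:
        blocks.append(current)
    return blocks


# Hypothetical gene orders: two blocks are preserved but swapped in genome B.
genome_a = ["g1", "g2", "g3", "g4", "g5", "g6"]
genome_b = ["g4", "g5", "g6", "g9", "g1", "g2", "g3"]
print(collinear_blocks(genome_a, genome_b))
```

In this toy case both three-gene blocks survive even though their relative placement has changed, which mirrors the slide's point that conserved blocks and scrambling coexist.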

Whole Genome Alignments

• Functional sequences often evolve more slowly than non-functional sequences; therefore sequences that remain conserved may perform a biological function.

• Comparing genomic sequences from species at different evolutionary distances allows us to identify:
– Coding genes
– Non-coding genes
– Non-coding regulatory sequences

The Rate of Evolution Depends on Constraints

Human vs. Rodent Comparison

Highest substitution rates:
– pseudogenes
– introns
– 3’ flanking (not transcribed to mature mRNA)
– 4-fold degenerate sites

Intermediate substitution rates:
– 5’ flanking (contains promoter)
– 3’, 5’ untranslated (transcribed to mRNA)
– 2-fold degenerate sites

Lowest substitution rates:
– nondegenerate sites
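The 4-fold, 2-fold, and nondegenerate categories above refer to how many of the four possible bases at a codon position leave the encoded amino acid unchanged. As a minimal sketch, assuming the standard genetic code and a hypothetical helper name `site_degeneracy`, the classification can be computed like this:

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG ("*" = stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def site_degeneracy(codon, pos):
    """Count the bases at position `pos` (0, 1, or 2) that keep the amino acid unchanged.

    4 means a four-fold degenerate site, 2 a two-fold degenerate site,
    and 1 a nondegenerate site (every change alters the amino acid).
    """
    codon = codon.upper().replace("U", "T")
    amino_acid = CODON_TABLE[codon]
    same = 0
    for base in BASES:
        variant = codon[:pos] + base + codon[pos + 1:]
        if CODON_TABLE[variant] == amino_acid:
            same += 1
    return same


for codon, pos in [("GGT", 2), ("AAA", 2), ("GGT", 0)]:
    print(codon, pos, site_degeneracy(codon, pos))
```

Sites where any substitution is silent (four-fold) are free to change, which is consistent with their place among the fastest-evolving categories on the slide.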

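Selection of Species for DNA comparisons

Human vs. Chimpanzee
– Size (Gbp): 3.0
– Time since divergence: ~5 MYA
– Sequence conservation (in coding regions): >99%
– Aids identification of: recently changed sequences and genomic rearrangements

Human vs. Mouse
– Size (Gbp): 2.5
– Time since divergence: ~65 MYA
– Sequence conservation (in coding regions): ~80%
– Aids identification of: both coding and non-coding sequences

Human vs. Opossum
– Size (Gbp): 4.2
– Time since divergence: ~150 MYA
– Sequence conservation (in coding regions): ~70-75%
– Aids identification of: both coding and non-coding sequences

Human vs. Pufferfish
– Size (Gbp): 0.4
– Time since divergence: ~450 MYA
– Sequence conservation (in coding regions): ~65%
– Aids identification of: primarily coding sequences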


Comparative Analyses of Sequence Conservation

Hypothesis: areas with high sequence similarity are likely to contain functionally important elements:

protein-coding exons
transcription factor binding sites

These two are conceptually the same…

Phylogenetic Shadowing (fine scale): identifying regions that do not accumulate change

Phylogenetic Footprinting (large scale): identifying which regions stay somewhat conserved (identifiable) across larger evolutionary distances


UCSC Genome Browser


In these comparative genomic charts, it is easy to see why meaningful comparisons between humans and other primates have been difficult.

The pink areas represent regions of high conservation between the two species being compared (meaning the sequences are the same in both), the blue areas represent the positions of protein-coding regions, and the purple areas represent the non-protein-coding parts of a gene.


Phylogenetic shadowing analyses sequence variation in a multiple alignment to identify regions that accumulate variation at a slower rate.

Each position of an alignment is fitted to a phylogenetic model to calculate the likelihood that the position is evolving at a fast or a slow rate (a).

Generally, positions with several sequence differences across species are more likely to be evolving at a fast rate; the per-position results are then used to identify the least variable regions (b).

The slowly evolving regions often correspond to functional sequences.
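The idea of scanning an alignment for its least variable stretch can be sketched without the phylogenetic model. The example below is a crude stand-in, not the likelihood-based method described above: it assumes equal-length aligned strings, scores each column by the number of distinct residues minus one (gap characters count as residues), and reports the window with the lowest mean score. All names and the toy alignment are hypothetical.

```python
def column_variability(alignment):
    """Per-column count of distinct residues minus one (0 means fully conserved)."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must have equal length")
    return [len(set(column)) - 1 for column in zip(*alignment)]


def least_variable_window(alignment, window):
    """Return (start index, mean variability) of the most conserved window."""
    scores = column_variability(alignment)
    if window < 1 or window > len(scores):
        raise ValueError("window must be between 1 and the alignment length")
    best_start, best_mean = 0, float("inf")
    for start in range(len(scores) - window + 1):
        mean = sum(scores[start:start + window]) / window
        if mean < best_mean:  # strict '<' keeps the leftmost of tied windows
            best_start, best_mean = start, mean
    return best_start, best_mean


# Toy three-species alignment, invented for illustration.
toy_alignment = [
    "ACGTTGCAAT",
    "ATGTTGCAGT",
    "GCGTTGCACT",
]
print(column_variability(toy_alignment))
print(least_variable_window(toy_alignment, 4))
```

A real analysis would weight differences by the tree relating the species, since a difference between close relatives is more informative than one between distant ones.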



Phylogenetic Footprinting (VISTA)


Identification of Conserved Regulatory Elements


Comparative analysis of multi-species sequences from targeted genomic regions


Nature, 2003


CFTR Locus

Encodes the protein: Cystic Fibrosis Transmembrane Conductance Regulator

– An ion channel across the cell membrane

– The transport of chloride through CFTR helps control the movement of water in tissues and maintain the fluidity of mucus and other secretions

– Normal functioning ensures that organs such as the lungs and pancreas function properly

– Most CF patients carry a deletion in CFTR: either a small deletion that removes a single amino acid (the common ΔF508 allele removes phenylalanine 508) or a deletion of part of an exon

Comparative Genomics of the CFTR Locus

• CFTR locus = 1.8 Mb of human Chr 7, sequenced in 12 species

• How does a single locus change over evolutionary time?

• How much does it change?

• What types of changes are more/less common?

• Do some lineages have more of certain changes than others?

• How much comparative genomic data do we need???


Sequence Conservation

Looking backward from the human genome: how much is still there after 450 My (Fugu)?

Differences in exon length

Legend:
+ = insertion
- = deletion
e = extension due to alteration of splice site or stop codon
s = early stop codon

Data like this sure makes you wonder about mouse models of human disease, eh?

Transposable Elements Gone Wild!

High Turnover in TEs despite gene conservation

Nucleotide Changes

Big insertions/deletions: more common than nucleotide changes!

In primates, large indels are the principal mechanism accounting for the observed sequence differences

Using evolutionary conservation to ID functionally important conserved human genome segments

How many comparative genomes do we need – can’t we just use the mouse? (Lots, and NO)…

Using all 12 species, they found 561 Multi-Species Conserved Sequences (MCSs)

So, how many could we find using just the mouse genome (rather than all 12)?

Less than half, even with high false positives…!!!

Figure labels: True Pos., False Pos., False Neg.
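The true positive, false positive, and false negative labels amount to a set comparison between a reference list of conserved elements (here, those found with all species) and a list found with fewer genomes (here, mouse only). A minimal sketch, assuming elements can be matched by identifier; the function name and the toy identifiers are hypothetical and do not reproduce the study's numbers:

```python
def detection_summary(reference, predicted):
    """Compare predicted conserved elements against a reference set."""
    reference, predicted = set(reference), set(predicted)
    true_pos = reference & predicted    # found by both approaches
    false_pos = predicted - reference   # predicted, but not in the reference
    false_neg = reference - predicted   # in the reference, but missed
    return {
        "true_pos": len(true_pos),
        "false_pos": len(false_pos),
        "false_neg": len(false_neg),
        # Fraction of reference elements that were recovered.
        "sensitivity": len(true_pos) / len(reference) if reference else 0.0,
        # Fraction of predictions that are in the reference.
        "precision": len(true_pos) / len(predicted) if predicted else 0.0,
    }


# Toy identifiers, invented for illustration only.
all_species_mcs = {"mcs1", "mcs2", "mcs3", "mcs4", "mcs5"}
mouse_only_hits = {"mcs1", "mcs2", "other1"}
print(detection_summary(all_species_mcs, mouse_only_hits))
```

Low sensitivity together with low precision is the pattern the slide describes for the mouse-only comparison.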

Multi-Species Conserved Sequences

Strong argument for comparative genomics: need many species, and distant species (like cat, dog, fish), to ID conserved, possibly functional regions in humans!

950 of the 1,194 MCSs are neither exonic nor lie less than 1-kb upstream of transcribed sequence.

Meaning they are otherwise hard to predict

(= Evolutionary Distance)

Take Home Messages…

• Identification of conserved non-coding segments beyond those previously identified experimentally, and evidence we can find more with even more genomes!!!

• These were not detectable by pair-wise sequence comparisons alone
– Underscores importance of comparative genomics

• Need many diverse species to figure out these questions!

• Analysis of TE insertions highlights variation in genome dynamics among species
– The rate of TE evolutionary dynamics in vertebrates is amazing, and hugely important for the structure and evolution of the genome

• Importance of large insertions/deletions (not necessarily nucleotide changes) between closely related species, including humans and other primates

ENCODE Project

• Cross-reference existing with new data on human genome function

• Identify the functional relevance of as many bases of the human genome as possible.

ENCODE Project Findings (2007)

• A total of 5% of the bases in the genome can be confidently identified as being under evolutionary constraint in mammals

• For ~60% of these conserved bases, there is evidence of function based on experimental assays

• However, not all bases within known functional regions are evolutionarily conserved

• Much of the variation, while functional, appears to be evolving under little selective constraint!
– While functional, must not be important enough for “fitness” to be highly conserved….


Evolutionarily Conserved Regions

Comparative Genomics

Where do babies come from? (ask your parents)
Where do genes come from?

Evolution of Gene Families in Vertebrates


Gene Duplication

Orthologous genes: in different organisms, diverged from a common ancestral gene by speciation
A1 – A2 or B1 – B2

Paralogous genes: originated from a common ancestral gene via gene duplication
A1 – B1 or A1 – B2, etc…

Homologs: genes that have the same ancestor
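The A1/A2/B1/B2 labels can be read as "duplicate copy" (letter) plus "species" (number). Under that reading, and assuming the duplication happened before the two species split (an assumption of this sketch, not something stated on the slide), the relationship between any two homologs follows from the labels alone. The function name `classify_pair` is hypothetical.

```python
def classify_pair(gene_x, gene_y):
    """Classify two homologs labeled like 'A1': letter = duplicate copy, digits = species.

    Assumes the duplication producing copies A and B predates the speciation
    that separated species 1 and 2.
    """
    if gene_x == gene_y:
        return "same gene"
    copy_x, copy_y = gene_x[0], gene_y[0]
    if copy_x != copy_y:
        # Different duplicate copies trace back to the duplication event.
        return "paralogs"
    # Same copy in different species traces back to the speciation event.
    return "orthologs"


for pair in [("A1", "A2"), ("B1", "B2"), ("A1", "B1"), ("A1", "B2")]:
    print(pair, classify_pair(*pair))
```

If the duplication had instead occurred after speciation, in only one lineage, these labels would not be enough and a gene tree would be needed to decide.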


Orthologues and Paralogues


The Fate of Gene Duplicates

Functional Conservation – both copies can retain original function

Gene Loss – one (or both) copies can be lost either by complete deletion or by mutation leading to a pseudogene (non-functional copy)

Neofunctionalization – e.g., one copy may take on a new function while the other copy retains the original function

Subfunctionalization - each copy becomes specialized for a subset of ancestral gene’s roles (Hox genes seem to be an example)

genome duplication

Van de Peer et al. Nature Reviews Genetics (2009)

Humans


Gene Duplication

Most gene families are small; exceptions often have an adaptive basis: immunoglobulin genes (1000 copies in humans), olfactory receptor genes (100’s of copies in mammals)

Rho GTPases – Molecular Switches

Control cytoskeletal architecture, survival, adhesion, proliferation, motility, etc.


Gene Gain and Loss…. In 550MY

Sea urchin is estimated to have 23,300 genes with representatives of nearly all vertebrate gene families

• Gene families are not as large as in vertebrates

• Some genes thought to be vertebrate-specific were found in the sea urchin

• Others were identified in sea urchin but not the chordate lineage, which suggests loss in the vertebrates.

• The sea urchin has orthologs of genes associated with the following in vertebrates:
– Vision
– Hearing
– Balance
– Chemosensation

• (raw material for current vertebrate complex sensory gene programs)


Expansion of urchin-specific Rho GTPases

Gain and loss of genes in gene families

Demuth et al., 2006, PLoS ONE

Human genome has 689 genes not present in the chimp and the chimp has 729 genes not present in humans.

Figure labels: GAIN, LOSS


Despite expansion-contraction of gene families, there is little novel gain or complete loss

Opossum genome… 180MY of change

• The opossum genome contains ~18,000–20,000 protein-coding genes, the vast majority of which have eutherian orthologues.

• Lineage-specific genes largely originate from expansion and rapid turnover in gene families involved in immunity, sensory perception and detoxification.

• Only eight currently have strong evidence of representing functional genes without homologues in humans!

Conclusions

• Studying biology and medicine means studying recycled genomic material

• Studying evolution informs genomics
– Studying genomics informs evolution

• Knowing how genomes evolve can directly inform on how they function

• More genomes = more data points for studying how they change through evolution, thus how they function