
IB404 - 6. Other Fungi – February 6

Feb 05, 2016


azize

Page 1

Comparative genomics. Comparing the DNA sequences from several species makes it possible to eliminate spurious gene predictions, sometimes find new ones, and find regulatory regions — short sequences that turn genes on and off. Red boxes highlight areas of sequence similarity between at least two species. Functional sequences — genes and regulatory elements — tend to be conserved across all species. The figure shows how one true regulatory element and gene might emerge from a comparison of four yeast species.


Page 2

Eric Lander’s group at the MIT Whitehead Institute sequenced 3 species, as did Bob Waterston’s group at WashU, plus four more.

[Figure: Waterston, Lander]

Below is a schematic showing the well-conserved order and orientation of most genes, in this case for a segment containing about 35 genes. The red genes are shared across the species, whereas the blue ones differ.

Other Saccharomyces species

Page 3

Some observations from these comparisons across four species:
1. ~500 predicted genes encoding >100 amino acids appear not to be real.
2. Most predicted introns are confirmed, and 60 more are found, for 300 in total.
3. Intergenic nucleotides change twice as frequently as genic nucleotides.
4. 14% of intergenic sites show indels, but only 1% of genic sites (and, of course, there are no frameshifts in the genic regions).
5. Protein conservation ranges from 100% for the MATa2 mating-type protein to just 13% for YBR184W (involved in gamete formation). This gene also has an elevated Ka/Ks ratio of 0.7, compared with ~0.1 for most genes.
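Points 3 and 4 can be illustrated with a minimal sketch that compares two aligned sequences from the same region in two species: it reports the substitution rate over ungapped columns, counts indel events, and checks whether every indel is a multiple of 3 bp (so a reading frame would be preserved). The function names and toy alignments are hypothetical; real comparative analyses work from whole-genome alignments with proper statistics.

```python
import re


def gap_runs(aligned: str) -> list[int]:
    """Lengths of each run of '-' characters (one run = one indel event)."""
    return [len(m.group()) for m in re.finditer(r"-+", aligned)]


def compare_region(a: str, b: str) -> dict:
    """Compare two aligned sequences of the same region from two species."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    # Substitutions are counted only over columns where neither sequence has a gap.
    ungapped = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    substitutions = sum(1 for x, y in ungapped if x != y)
    runs = gap_runs(a) + gap_runs(b)
    return {
        "substitution_rate": substitutions / len(ungapped) if ungapped else 0.0,
        "indel_events": len(runs),
        # A gap run whose length is not a multiple of 3 would shift the reading frame.
        "frame_preserved": all(r % 3 == 0 for r in runs),
    }


# Toy alignments invented for illustration (not real yeast sequence).
genic = compare_region("ATGGCTAAACCCGGTTAA", "ATGGCCAAA---GGTTAA")
intergenic = compare_region("TTACGATA-TTCAGG", "TTGCGAAAATTC--G")
print(genic)
print(intergenic)
```

In the toy "genic" pair the single indel removes a whole codon, so the frame is kept; the "intergenic" pair has 1 bp and 2 bp indels, which a real coding region could not tolerate.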

Ks is the frequency of synonymous (silent) nucleotide changes per synonymous site. Ka is the frequency of non-synonymous (replacement) changes per non-synonymous site.

So Ks mostly reflects changes at 3rd codon positions, and Ka mostly reflects changes at 1st and 2nd positions.

If Ka/Ks ≈ 1, there is no selection (neutral evolution); Ka/Ks > 1 indicates positive selection. Usually Ka/Ks << 1, indicating stabilizing or negative selection.
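To make these definitions concrete, here is a minimal sketch of a simplified Nei-Gojobori-style count: for each codon it estimates synonymous and non-synonymous sites under the standard genetic code, then classifies the observed differences, averaging over all orders in which multiple changes within a codon could have occurred. Simplifications (assumptions, not what published pipelines necessarily do): pathways through stop codons are not excluded, no multiple-hit (e.g. Jukes-Cantor) correction is applied, and the sequences are assumed to be aligned, gap-free coding sequences. The function names are hypothetical.

```python
from itertools import permutations, product

# Standard genetic code, codons enumerated in TCAG order; '*' marks stop codons.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def synonymous_sites(codon: str) -> float:
    """Sum over the 3 positions of the fraction of single-base changes that keep the amino acid."""
    s = 0.0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODE[mutant] == CODE[codon]:
                    s += 1 / 3
    return s


def codon_differences(c1: str, c2: str) -> tuple[float, float]:
    """Synonymous and non-synonymous differences, averaged over all orders of the changes."""
    positions = [i for i in range(3) if c1[i] != c2[i]]
    if not positions:
        return 0.0, 0.0
    paths = list(permutations(positions))
    syn = nonsyn = 0.0
    for order in paths:
        current = c1
        for pos in order:
            step = current[:pos] + c2[pos] + current[pos + 1:]
            if CODE[step] == CODE[current]:
                syn += 1
            else:
                nonsyn += 1
            current = step
    return syn / len(paths), nonsyn / len(paths)


def ka_ks(seq1: str, seq2: str) -> tuple[float, float]:
    """Return (pN, pS): uncorrected proportions of non-synonymous and synonymous differences."""
    if len(seq1) != len(seq2) or len(seq1) % 3 != 0:
        raise ValueError("expected two aligned coding sequences of equal length (multiple of 3)")
    S = N = Sd = Nd = 0.0
    for i in range(0, len(seq1), 3):
        c1, c2 = seq1[i:i + 3], seq2[i:i + 3]
        s = (synonymous_sites(c1) + synonymous_sites(c2)) / 2
        S += s
        N += 3 - s
        sd, nd = codon_differences(c1, c2)
        Sd += sd
        Nd += nd
    pS = Sd / S if S > 0 else 0.0
    pN = Nd / N if N > 0 else 0.0
    return pN, pS


print(ka_ks("ATGCTG", "ATGCTA"))  # one silent change: Leu CTG -> CTA
print(ka_ks("ATGCTG", "ATACTG"))  # one replacement change: Met ATG -> Ile ATA
```

The first pair gives pN = 0 and pS > 0; the second gives pS = 0 and pN > 0. The ratio pN/pS approximates Ka/Ks only for closely related sequences; for more divergent species a correction for multiple substitutions per site is needed.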

Page 4

Universal genetic code

1. Single codons for M and W, so any change changes the aa.
2. Pairs of codons for 9 amino acids, so even some 3rd-position changes can change the aa.
3. Only R (Arg) and L (Leu) can change the first position without changing the aa.
4. Serine is really strange: it can only easily change from AGY to TCN via Threonine (ACN).
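These degeneracy patterns can be checked directly against the standard code table. The sketch below builds the code from the usual TCAG-ordered 64-letter string, then lists the amino acids with one codon, those with exactly two codons, those where some codon tolerates a first-position change, and the six serine codons. Helper names are illustrative only.

```python
from collections import defaultdict
from itertools import product

# Standard genetic code, codons enumerated in TCAG order; '*' marks stop codons.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codons_by_amino_acid(code: dict[str, str]) -> dict[str, list[str]]:
    """Group sense codons by the amino acid they encode (stop codons skipped)."""
    groups = defaultdict(list)
    for codon, aa in code.items():
        if aa != "*":
            groups[aa].append(codon)
    return groups


def first_position_synonymous(code: dict[str, str]) -> set[str]:
    """Amino acids with a codon whose 1st base can change without changing the amino acid."""
    found = set()
    for codon, aa in code.items():
        if aa == "*":
            continue
        for base in BASES:
            if base != codon[0] and code[base + codon[1:]] == aa:
                found.add(aa)
    return found


groups = codons_by_amino_acid(CODE)
single = sorted(aa for aa, codons in groups.items() if len(codons) == 1)
pairs = sorted(aa for aa, codons in groups.items() if len(codons) == 2)
print("one codon:", single)  # ['M', 'W']
print("two codons:", pairs)
print("1st-position synonymous:", sorted(first_position_synonymous(CODE)))  # ['L', 'R']
print("serine codons:", sorted(groups["S"]))
```

The serine output shows why point 4 holds: the TCN and AGY blocks differ at both the 1st and 2nd positions, so no single change moves between them.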

Page 5

Conservation in the GAL1–GAL10 intergenic region (next slide). Multiple alignment of the four species shows good overlap between functional nucleotides and stretches of conservation. Asterisks denote conserved positions in the multiple alignment. Blue arrows denote the start and transcriptional orientation of the flanking ORFs. Experimentally validated transcription-factor-binding footprints are boxed and labeled according to the bound TF. Stretches of conserved nucleotides are underlined. Note the TATA boxes, where the transcription machinery assembles for each ORF. Nucleotides matching the published Gal4 motif are shown in red. Note that there are four Gal4 binding sites, which is common for promoters where cooperative binding of the TF is required for activation. Presumably this regulatory region controls both ORFs. Scer, S. cerevisiae; Spar, S. paradoxus; Smik, S. mikatae; Sbay, S. bayanus.
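The asterisk line in such an alignment marks columns where every species has the same nucleotide. A minimal sketch of computing it, and of extracting runs of conserved columns like the underlined stretches, follows; the four sequences are invented toy data, not the real GAL1-GAL10 region, and the `min_len=6` threshold is an arbitrary assumption.

```python
import re


def conservation_line(alignment: list[str]) -> str:
    """'*' where every sequence has the same non-gap base, ' ' elsewhere."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    return "".join(
        "*" if len(set(column)) == 1 and column[0] != "-" else " "
        for column in zip(*alignment)
    )


def conserved_stretches(line: str, min_len: int = 6) -> list[tuple[int, int]]:
    """(start, end) of runs of '*' at least min_len long; end is exclusive."""
    return [(m.start(), m.end()) for m in re.finditer(r"\*+", line)
            if m.end() - m.start() >= min_len]


# Toy alignment invented for illustration (NOT the real GAL1-GAL10 sequence).
alignment = {
    "Scer": "CGGATTACTCCGAAG",
    "Spar": "CGGATTACTGCGAAG",
    "Smik": "CGGATTACACCGTAG",
    "Sbay": "CGGATTAC-CCGAAG",
}
line = conservation_line(list(alignment.values()))
for name, seq in alignment.items():
    print(f"{name}  {seq}")
print(f"      {line}")
print(conserved_stretches(line))  # [(0, 8)]
```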

Page 6
Page 7

Waterston’s analysis extended this to seven species, revealing more divergent evolution of the promoter region in the more distantly related species.

[Figure labels: Gal4, Mig]

Page 8

Phylogenetic footprinting

Both groups undertook extensive analyses to identify additional copies of already known regulatory motifs (enhancers and silencers), and also attempted to identify new ones.

1. They identified sequence motifs upstream of genes with similar functions, and hence implicated in common regulation, e.g. ACTCTTTT for amino acid metabolism, GTACGGAT for ribosome biogenesis, or TTGCAA for peroxisome function.

2. They identified motifs of unknown function upstream of genes with coherent expression patterns from microarray studies, e.g. TGTTCT for expression in mitochondria, or CAAACAAA, AAGTA and TTTCTAGA for stress-induced genes, or TAGAAA and TTCTTTC for genes in the cell cycle.
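The core idea, that a real motif should recur upstream of the same gene in every species, can be sketched as follows. This is a deliberate simplification: it checks only for the motif's presence on either strand of each species' upstream region, whereas real phylogenetic footprinting also requires the occurrences to sit at aligned positions and tests them against a statistical background. The motif GTACGGAT is taken from the list above; the gene names and sequences are hypothetical.

```python
# Complement table for reverse-complementing DNA strings.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def has_motif(seq: str, motif: str) -> bool:
    """True if the motif occurs on either strand of seq."""
    return motif in seq or revcomp(motif) in seq


def conserved_hits(upstream: dict[str, dict[str, str]], motif: str) -> list[str]:
    """Genes whose upstream region contains the motif in every species examined."""
    return sorted(
        gene for gene, by_species in upstream.items()
        if all(has_motif(seq, motif) for seq in by_species.values())
    )


# Hypothetical upstream sequences, invented for illustration.
upstream = {
    "geneA": {"Scer": "TTGTACGGATAA", "Spar": "CCATCCGTACTT", "Smik": "AGTACGGATC"},
    "geneB": {"Scer": "GTACGGATTT", "Spar": "GTACGCATTT", "Smik": "TTTTTTTT"},
}
print(conserved_hits(upstream, "GTACGGAT"))  # ['geneA']
```

Here geneA counts as conserved even though its S. paradoxus copy is on the opposite strand, while geneB is rejected because the motif is absent from two of the three species.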

Note from this that enhancers are relatively short (6-12 bp) regions, which can therefore probably arise de novo, and be lost just as easily.
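A back-of-the-envelope calculation shows why such short motifs can arise by chance. Assuming, for illustration only, equal base frequencies, independent positions, and a hypothetical 1 kb promoter searched on both strands, the expected number of chance matches to one specific k-bp motif is roughly:

```latex
E \approx \frac{2L}{4^{k}}, \qquad L = 1000:\quad
E_{k=6} \approx \frac{2000}{4096} \approx 0.49,\quad
E_{k=8} \approx \frac{2000}{65536} \approx 0.031,\quad
E_{k=12} \approx \frac{2000}{16777216} \approx 1.2 \times 10^{-4}
```

So a 6 bp site is expected about once per 2 kb of sequence by chance alone, whereas a 12 bp site is rare. Yeast intergenic DNA is AT-rich, so real frequencies depend on each motif's base composition.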

Page 9

Genome structure

Comparison of 11 yeast genomes allows reconstruction of their common ancestor, roughly 100 MYA, with about 4,700 genes. Getting from this ancestor to S. cerevisiae involved the whole genome duplication, the loss of roughly 3,400 genes, plus 73 inversions and 66 reciprocal translocations.
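Taking the slide's round numbers at face value, and ignoring genes gained by other routes, the gene-count bookkeeping from the ancestor to S. cerevisiae is:

```latex
\underbrace{2 \times 4{,}700}_{\text{after WGD}} \; - \underbrace{3{,}400}_{\text{lost}} \; = \; 6{,}000
```

This is close to the roughly 6,000 protein-coding genes of S. cerevisiae.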

Page 10

Schizosaccharomyces pombe - fission yeast
1. Sequenced by a large international consortium, whose best-known member is Paul Nurse, who won the 2001 Nobel Prize for the discovery of cdc2, which encodes a cyclin-dependent kinase involved in cell-cycle control.
2. 14 Mbp genome encoding ~5,000 proteins.
3. 43% of genes have introns, 4,700 introns in total.
4. Centromeres are relatively long, 35-110 kb, and consist of repeats up to 1.8 kb long.
5. By contrast, S. cerevisiae centromeres are only 150-180 bp long, with a 120 bp core region!

Page 11

1. About 2/3 of the genes of these two yeasts encode proteins with clear matches in the nematode worm C. elegans, so these are presumably core eukaryotic genes.

2. For each yeast, about 150 genes are shared with Ce but not with the other yeast; these are presumably ancient genes that were lost in one or the other yeast.

3. About 800 genes or 15% are shared between the two yeasts only, that is, are yeast-specific.

4. A similar number are unique to each yeast, with more in Sc because of its genome duplication.

Large-scale comparisons
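The categories above amount to set operations on gene families. A minimal sketch, assuming genes have already been clustered into families with shared IDs (that clustering step, and the family IDs used here, are hypothetical):

```python
def classify_families(sc: set[str], sp: set[str], ce: set[str]) -> dict[str, set[str]]:
    """Partition gene families by presence in S. cerevisiae (sc), S. pombe (sp), C. elegans (ce)."""
    return {
        "all_three": sc & sp & ce,
        "sc_and_ce_only": (sc & ce) - sp,  # read on the slide as ancient, lost from S. pombe
        "sp_and_ce_only": (sp & ce) - sc,  # read on the slide as ancient, lost from S. cerevisiae
        "yeasts_only": (sc & sp) - ce,     # yeast-specific
        "sc_unique": sc - sp - ce,
        "sp_unique": sp - sc - ce,
    }


# Hypothetical family IDs standing in for the output of an ortholog-clustering step.
sc = {"F1", "F2", "F3", "F5"}
sp = {"F1", "F2", "F4", "F6"}
ce = {"F1", "F3", "F4"}
for category, families in classify_families(sc, sp, ce).items():
    print(category, sorted(families))
```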

Page 12

Neurospora crassa - bread mold
1. Used by Beadle and Tatum to develop the “one-gene-one-enzyme” hypothesis.
2. Also used for studies of circadian rhythms, genome defense systems, and DNA methylation; e.g. this work helped establish the importance of DNA methylation in cancer.
3. 40 Mbp genome sequenced by whole-genome shotgun (WGS) at MIT-Whitehead.
4. ~10,000 predicted genes, so the proteome is twice the size of the yeasts'.
5. Roughly 1 gene per 4 kb, compared with 1 per 2 kb in yeasts, and 2 introns per gene on average, so the genome is rather more complex.
6. ~4,000 of these proteins had no matches in databases at the time.
7. Repeat-induced point mutation (RIP) is a novel genome-defense mechanism whereby one copy of any repeated sequence is methylated at CpG dinucleotides and then subjected to elevated rates of point mutation, eventually becoming a pseudogene. It defends against transposons, but also limits the sizes of gene families.
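The gene-density figures in point 5 are simple division of genome size by gene number. A short sketch using the genome sizes and gene counts given on the S. pombe and N. crassa slides (the helper name is illustrative):

```python
# Genome sizes (Mbp) and approximate gene counts, as given on these slides.
GENOMES = {
    "S. pombe": (14, 5_000),
    "N. crassa": (40, 10_000),
}


def kb_per_gene(genome_mbp: float, genes: int) -> float:
    """Average genome length per gene in kb (1 Mbp = 1,000 kb), intergenic DNA included."""
    return genome_mbp * 1_000 / genes


for name, (mbp, genes) in GENOMES.items():
    print(f"{name}: {kb_per_gene(mbp, genes):.1f} kb per gene")
```

This prints 2.8 kb per gene for S. pombe and 4.0 kb per gene for N. crassa; these are whole-genome averages, so they fold intergenic DNA and introns into one number.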

Page 13

Encephalitozoon cuniculi - microsporidian
1. Microsporidia are obligate intracellular parasites of animals.
2. E. cuniculi infects mammals, including immunocompromised humans.
3. Microsporidia don’t have mitochondria, so they were once thought to be a basal lineage of protists near the base of the Eukarya; however, they have some mitochondrially derived genes, so perhaps they have a mitosome (below).
4. The genome is 3 Mbp with ~2,000 genes.
5. Proteins are 10-30% shorter than their yeast counterparts.
6. Entire gene pathways have been lost, e.g. the Krebs cycle.

Page 14

Phytophthora infestans - an Oomycete
1. Cause of potato blight and the Irish potato famine.
2. Not really a fungus, but similar in biology.
3. The 240 Mbp genome is relatively large, due to an explosion of repetitive elements, which make up 75% of the sequence.
4. Because of these repeats, it is twice the size of the genomes of two other Phytophthora species, which cause soybean root rot and sudden oak death.
5. Roughly 18,000 genes were predicted, and comparisons allowed recognition of rapidly evolving “disease effector protein” genes.
6. These mediate pathogenicity and the evasion of resistance in resistant potatoes.
7. They are buried in repeat-rich regions, which perhaps mediate some of their rapid evolution.
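The repeat figures in points 3 and 5 can be checked with simple arithmetic, using only the numbers on this slide:

```latex
\begin{aligned}
\text{repeats} &= 0.75 \times 240~\text{Mbp} = 180~\text{Mbp}, \qquad \text{non-repeat} = 240 - 180 = 60~\text{Mbp} \\
\frac{240{,}000~\text{kb}}{18{,}000~\text{genes}} &\approx 13.3~\text{kb per gene}, \qquad
\frac{60{,}000~\text{kb}}{18{,}000~\text{genes}} \approx 3.3~\text{kb per gene}
\end{aligned}
```

Once the repeats are set aside, gene density is roughly comparable to the 1 gene per 4 kb quoted for Neurospora, consistent with the repeats, rather than extra genes, accounting for most of the genome size.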