Evolution of gene family size change in fungi


Jason Stajich
University of California, Berkeley



Outline

• Gene family size change - a model

• Cornucopia of fungal genomes

• Methodology for comparing family size

• Lineage specific expansions

Gene family evolution

• Gene duplications are the crucible of new genes and thus new functions

• Many comparative approaches focus only on identifiable one-to-one orthologs.

• Signature of adaptive evolution can be confounded in multi-gene families

• How important is lineage-specific expansion in adaptive changes?

Identifying family expansions

• Previous work only considered pairwise comparisons

• Ad hoc comparison of gene family sizes

• C. elegans-C. briggsae - GPCR family expansions (Stein et al, PLoS Biology 2004)

• A. gambiae-D. melanogaster - mosquito-specific family expansions related to symbiotic bacteria (Holt et al, Science 2002)

• Need a null model

Gene family sizes follow power law distribution

Figure: log-log plot of frequency of family size (1 to 10000) against family size (1 to 100) for N. crassa, A. gossypii, R. oryzae, A. oryzae, A. terreus, C. cinereus, and U. maydis.

Multicopy genes: sugar transporters, P450 enzymes

Single-copy genes: PRP8 (splicing), CDC48 (cell cycle ATPase)
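The power-law claim on this slide can be checked with a straight-line fit on log-log axes. The sketch below is illustrative only: the function names and the family sizes are hypothetical, and an ordinary least-squares fit is a simple stand-in for whatever fitting procedure was used for the plot.

```python
import math
from collections import Counter

def size_frequencies(family_sizes):
    """Map each family size to the number of families with that size."""
    return Counter(family_sizes)

def fit_power_law(freq):
    """Least-squares line through (log10 size, log10 frequency).

    Returns (exponent, intercept), so that
    frequency is roughly 10**intercept * size**exponent.
    """
    xs = [math.log10(size) for size in freq]
    ys = [math.log10(freq[size]) for size in freq]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical family sizes (not data from the talk): many singletons, few large families.
sizes = [1] * 1000 + [2] * 250 + [5] * 40 + [10] * 10
exponent, intercept = fit_power_law(size_frequencies(sizes))
print(f"exponent = {exponent:.2f}")
```

A strongly negative exponent corresponds to the pattern on the slide: most families are single copy, and a small number (such as sugar transporters and P450 enzymes) are very large.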

Phylogenetic evaluation of gene family size change

• Previous methods only used ad hoc statistics

• Explicit model for gene family size change according to a birth-death (BD) model

• Apply BD to family size along phylogeny using probabilistic graphical models

• CAFE - Computational Analysis of gene Family Evolution

Hahn et al, Genome Res 2005
De Bie et al, Bioinformatics 2006
Demuth et al, submitted

CAFE

• Use a Probabilistic Graph Model for:

• Ancestral states

• Birth and death rate (lambda, λ)

• Per branch changes

• P-values

Because the BD model uses information about the time in the phylogenetic tree and the birth and death rates of genes, it offers an ideal null model for hypothesis testing. Using a BD model in this way makes it possible to identify gene families that have undergone unusual expansions or contractions. This method furthermore enables us to identify the branch in the phylogeny upon which the unlikely change took place.
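To make the null model concrete, the sketch below computes the probability that a family of a given size becomes another size after time t under a linear birth-death process with equal birth and death rates. It uses the standard closed form for that process; the function name is hypothetical, and the example values (rate 0.002 per million years, 32 million years) are the ones quoted in the excerpt.

```python
from math import comb

def bd_transition_prob(start, end, lam, t):
    """P(size == end after time t | size == start), birth rate = death rate = lam."""
    if start == 0:
        # A family that has lost all its genes cannot regain them.
        return 1.0 if end == 0 else 0.0
    alpha = lam * t / (1.0 + lam * t)
    total = 0.0
    for j in range(min(start, end) + 1):
        total += (comb(start, j)
                  * comb(start + end - j - 1, start - 1)
                  * alpha ** (start + end - 2 * j)
                  * (1.0 - 2.0 * alpha) ** j)
    return total

# Probability that a single-copy family is still single copy after 32 million years.
p_unchanged = bd_transition_prob(1, 1, lam=0.002, t=32)
# Probability that it has grown to 5 copies over the same time.
p_to_five = bd_transition_prob(1, 5, lam=0.002, t=32)
print(p_unchanged, p_to_five)
```

Large jumps are assigned very small probabilities over short branches, which is what lets the model flag unusual expansions or contractions.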

As argued above, likelihoods or conditional likelihoods cannot directly be used to identify unusual gene families, because larger gene families will by necessity result in lower likelihoods under a stochastic BD process (the “large family bias”). Instead, we can use our conditional likelihoods as test statistics to calculate conditional P-values, each one conditioned on one of the possible root-node assignments. Such a conditional P-value is defined as the probability that a random gene family (with fixed root family size) has a smaller conditional likelihood than the given gene family. Then, because the true root-node value is unknown, we conservatively pick the largest conditional P-value, which we can show to represent a tight upper bound on the true P-value in our problem (see Methods; Supplemental material). Such an upper bound on the P-value is called a supremum P-value in statistics, and it is often used for composite hypothesis testing with one or more nuisance parameters (Lehmann 1959; Demortier 2003). Because of its tightness as an upper bound in our problem, we refer to the supremum P-value as simply the P-value in the remainder of this study. In the Methods section we show how it can efficiently and accurately be computed using a sampling procedure.
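The sampling idea in this paragraph can be sketched as follows. This is a deliberately simplified illustration, not the CAFE implementation: it assumes a star-shaped tree in which every species descends directly from the root, truncates family sizes at a hypothetical `max_size`, and uses made-up function names. The branch times in the example are the terminal branch lengths from Table 1, reused here only as plausible numbers.

```python
import random
from math import comb

def bd_prob(start, end, lam, t):
    """Birth-death transition probability with equal birth and death rate lam."""
    if start == 0:
        return 1.0 if end == 0 else 0.0
    a = lam * t / (1.0 + lam * t)
    return sum(comb(start, j) * comb(start + end - j - 1, start - 1)
               * a ** (start + end - 2 * j) * (1.0 - 2.0 * a) ** j
               for j in range(min(start, end) + 1))

def conditional_p_value(observed, times, root, lam, n_samples, max_size, rng):
    """Fraction of simulated families whose likelihood given `root` is <= the observed one."""
    # One truncated transition table per leaf (observed sizes must be <= max_size).
    tables = [[bd_prob(root, c, lam, t) for c in range(max_size + 1)] for t in times]
    sizes = range(max_size + 1)
    observed_like = 1.0
    for table, count in zip(tables, observed):
        observed_like *= table[count]
    hits = 0
    for _ in range(n_samples):
        like = 1.0
        for table in tables:
            like *= table[rng.choices(sizes, weights=table)[0]]
        if like <= observed_like:
            hits += 1
    return hits / n_samples

def supremum_p_value(observed, times, lam, root_sizes,
                     n_samples=500, max_size=60, seed=0):
    """Largest conditional P-value over the candidate root sizes."""
    rng = random.Random(seed)
    return max(conditional_p_value(observed, times, r, lam, n_samples, max_size, rng)
               for r in root_sizes)

times = [32, 27, 22, 12, 12]  # star-tree branch lengths in million years (assumption)
p_typical = supremum_p_value([1, 1, 1, 1, 1], times, 0.002, range(1, 6))
p_unusual = supremum_p_value([1, 1, 1, 1, 40], times, 0.002, range(1, 6))
print(p_typical, p_unusual)
```

Taking the maximum over root sizes is the conservative choice described above: a family is called unusual only if it looks unlikely for every plausible ancestral size.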

Furthermore, we propose two methods to identify the branch in the phylogeny upon which nonrandom changes occurred (for families with a low P-value). Our first method computes a P-value corresponding to the observed data after the deletion of one branch in the PGM, and this once for each branch (for each gene family). If, after the deletion of a branch, the resulting P-value rises above some threshold P-value (0.01 here), then the branch that was cut is implicated in nonrandom evolution. Our second method uses a likelihood ratio test to compare a model allowing the λ parameter to vary along each branch singly to the model with one λ for the whole tree (see Methods; Supplemental materials). It is notable that, in all cases, the branch with the largest likelihood ratio was also the branch that yielded the largest P-value after cutting it, as computed by the first method.

Global view of Saccharomyces gene family evolution

We used the machinery described above to study the evolution of gene family size in five whole fungal genomes. To our knowledge, the five sequenced Saccharomyces genomes are the best example of a closely related group of eukaryotes, where multiple whole genomes have been sequenced and where there is also a well-supported phylogenetic tree with branch lengths.

The consensus phylogenetic tree of the five Saccharomyces species (Fig. 2) comes from the study of Rokas et al. (2003) that used 106 orthologous genes from each of the species, singly and by concatenation. The tree had 100% bootstrap support at every node. In Newick notation, the tree in Figure 2 is written (S. bayanus (S. kudriavzevii (S. mikatae (S. paradoxus S. cerevisiae)))). Branch lengths were inferred from the data in Rokas et al. (2003) and Kellis et al. (2003). They are indicated in Figure 2 as time, t, in million years. We estimated the evolutionary rate parameter λ as 0.002 per million years (see Supplemental materials).

To define gene families, we took all of the genes in all five species together and generated a pairwise matrix of distances among genes (see Supplemental materials). We then clustered genes using the TRIBE-MCL algorithm (Van Dongen 2000; Enright et al. 2002), and counted the number of genes in each family that came from each species. By clustering all of the genes at the same time, we are able to confidently compare the size of families between genomes.
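The counting step at the end of this paragraph can be sketched as below. The cluster contents, the species prefixes, and the `prefix_number` gene ID pattern are hypothetical and chosen only for illustration.

```python
from collections import Counter

def family_size_table(families, species):
    """Return {family_id: [gene count per species, in the order given]}."""
    table = {}
    for family_id, genes in families.items():
        # The species prefix is assumed to be the part of the ID before the first "_".
        counts = Counter(gene.split("_", 1)[0] for gene in genes)
        table[family_id] = [counts.get(sp, 0) for sp in species]
    return table

# Hypothetical clusters, as a clustering step might return them.
families = {
    "fam1": ["scer_0001", "spar_0001", "smik_0001", "skud_0001", "sbay_0001"],
    "fam2": ["scer_0002", "scer_0003", "smik_0002", "smik_0003", "smik_0004"],
}
species = ["sbay", "skud", "smik", "spar", "scer"]
table = family_size_table(families, species)
print(table)  # {'fam1': [1, 1, 1, 1, 1], 'fam2': [0, 0, 3, 0, 2]}
```

Each row of this table is one gene family, which is the form of input a per-family test along the tree would need.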

In the 32 million years since the most recent common ancestor of the five species, 1254 of the 3517 gene families shared among them have changed in size; the remaining set are monomorphic across the tree (of course, equal numbers of losses and gains in any single gene family will be unobservable). Using our PGM we were able to infer the most likely ancestral gene family sizes for all of these gene families. This makes it possible to count changes in gene family size on all eight branches of the tree, and enables us to infer their direction by a comparison of the species at the top and bottom of each branch in the tree. Expansions outnumbered contractions on four of the eight branches, and contractions outnumbered expansions on the remaining four. Table 1 shows the number of families that expanded, contracted, or stayed the same on each branch of the tree.

We can see that along branches 2 and 3, leading to S. kudriavzevii and S. mikatae, many more families have expanded than contracted. Concomitant with this, these two genomes have more genes (7144 and 7236) than any of the other three (6265, 6128, and 6700 for S. bayanus, S. paradoxus, and S. cerevisiae; see

Figure 2. The phylogenetic tree. Branch lengths t are given in millions of years. The branch numbers used in this study are shown in circles.

Table 1. The number of gene families that showed an expansion, no change, or a contraction along the eight branches, according to the most likely assignments of the gene family sizes of the ancestors

Branch #      Expansions   No change   Contractions   Average expansion
1 (t = 32)        97         3181          239             -0.050
2 (t = 27)       383         3032          102              0.095
3 (t = 22)       509         2922           86              0.147
4 (t = 12)        96         3383           38              0.019
5 (t = 12)        44         3426           47              0.021
6 (t = 5)          3         3491           23             -0.005
7 (t = 10)        10         3313          194             -0.052
8 (t = 5)          2         3515            0              0.001

The first column contains the branch number, along with the length of the branch, t, in millions of years. The next three columns show how often an expansion, no change, or a contraction occurred along this branch. The last column shows the average gene family expansion among all families along each branch, where a contraction is counted as a negative expansion.
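The columns of Table 1 can be derived from inferred family sizes at the two ends of a branch, as in this minimal sketch. The function name and the example sizes are hypothetical; in the study the ancestral sizes come from the PGM.

```python
def summarize_branch(parent_sizes, child_sizes):
    """Return (expansions, no change, contractions, average expansion) for one branch.

    parent_sizes[i] and child_sizes[i] are the sizes of family i at the top and
    bottom of the branch. A contraction counts as a negative expansion.
    """
    diffs = [child - parent for parent, child in zip(parent_sizes, child_sizes)]
    expansions = sum(1 for d in diffs if d > 0)
    contractions = sum(1 for d in diffs if d < 0)
    unchanged = len(diffs) - expansions - contractions
    average = sum(diffs) / len(diffs)
    return expansions, unchanged, contractions, average

# Four hypothetical families: one gains 2 genes, one loses 1, two stay the same.
parent = [1, 2, 5, 3]
child = [1, 4, 4, 3]
print(summarize_branch(parent, child))  # (1, 2, 1, 0.25)
```

Because the average weights each change by its size, a branch can have more contractions than expansions and still show a positive average expansion, as branch 5 does in Table 1.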

Gene family evolution

Genome Research 1155 (www.genome.org)

Hahn et al, Genome Res 2005

37 Fully sequenced fungal genomes

Species: Rhizopus oryzae, Neurospora crassa, Podospora anserina, Chaetomium globosum, Magnaporthe grisea, Fusarium verticillioides, Fusarium graminearum, Trichoderma reesei, Sclerotinia sclerotiorum, Botrytis cinerea, Stagonospora nodorum, Uncinocarpus reesii, Coccidioides immitis, Histoplasma capsulatum, Aspergillus fumigatus, Aspergillus nidulans, Aspergillus terreus, Aspergillus oryzae, Ashbya gossypii, Kluyveromyces lactis, Saccharomyces cerevisiae, Candida glabrata, Candida lusitaniae, Debaryomyces hansenii, Candida guilliermondii, Candida tropicalis, Candida albicans, Candida dubliniensis, Yarrowia lipolytica, Schizosaccharomyces pombe, Cryptococcus neoformans, Cryptococcus neoformans H99, Cryptococcus gattii WM276, Cryptococcus gattii R265, Phanerochaete chrysosporium, Coprinus cinereus, Ustilago maydis

Figure: dated phylogeny of these genomes (time axis 0 to 900 million years ago) with clades labeled Euascomycota, Zygomycota, Hemiascomycota, Archiascomycota, and Basidiomycota. Lifestyle labels shown next to the species: saprophyte; bread mold; biotroph (maize); hemibiotroph (rice, wheat, maize); necrotroph (fruits); primary human pathogen; opportunistic human pathogen; industrial uses.

50+ More funded and in progress world-wide

Species                          Clade                              Sequencing center
Schizosaccharomyces japonicus    Archaeascomycota                   Broad-FGI
Schizosaccharomyces octosporus   Archaeascomycota                   Broad-FGI
Pneumocystis carinii             Archaeascomycota                   Sanger, UC, Broad-FGI
Pneumocystis carinii hominis     Archaeascomycota                   UC, Broad-FGI, UC
Amanita bisporigera              Basidiomycota: Homobasidiomycota   MSU
Crinipellis perniciosa           Basidiomycota: Homobasidiomycota   Univ Campinas
Ganoderma lucidum                Basidiomycota: Homobasidiomycota   Yang-Ming Univ
Hebeloma cylindrosporum          Basidiomycota: Homobasidiomycota   INRA
Laccaria bicolor                 Basidiomycota: Homobasidiomycota   JGI-DOE
Phakopsora pachyrhizi            Basidiomycota: Homobasidiomycota   JGI-DOE
Postia placenta                  Basidiomycota: Homobasidiomycota   JGI-DOE
Schizophyllum commune            Basidiomycota: Homobasidiomycota   JGI-DOE
Sporobolomyces roseus            Basidiomycota: Urediniomycota      JGI-DOE
Phakopsora meibomiae             Basidiomycota: Urediniomycota      JGI-DOE
Batrachochytrium dendrobatidis   Chytridiomycota                    Broad-FGI & JGI-DOE
Piromyces sp.                    Chytridiomycota                    JGI-DOE
Glomus intraradices              Glomeromycota                      JGI-DOE
Phycomyces blakesleeanus         Zygomycota                         JGI-DOE
Brachiola algerae                Microsporidia                      Genoscope
Nosema (Antonospora) locustae    Microsporidia                      MBL
Enterocytozoon bieneusi          Microsporidia                      Tufts Univ

Table C.3: Additional funded fungal genome sequencing projects as of Spring 2006. This data was partially derived from the Genomes OnLine Database (190)

Species                            Clade                            Sequencing center
Aspergillus niger                  Euascomycota: Eurotiomycota      DOE-JGI
Aspergillus flavus                 Euascomycota: Eurotiomycota      NCSU
Aspergillus clavatus               Euascomycota: Eurotiomycota      OU
Neosartorya fischeri               Euascomycota: Eurotiomycetes     TIGR
Histoplasma capsulatum WU24        Euascomycota: Eurotiomycota      Broad-FGI
Histoplasma capsulatum 186R,217B   Euascomycota: Eurotiomycota      WUSTL
Coccidioides posadasii             Euascomycota: Eurotiomycota      TIGR
Coccidioides immitis 10 strains    Euascomycota: Eurotiomycota      Broad-FGI & TIGR
Paracoccidioides brasiliensis      Euascomycota: Eurotiomycota      Univ of Brazil
Ascosphaera apis                   Euascomycota: Eurotiomycota      BCM
Epichloe festucae                  Euascomycota: Sordariomycetes    UK
Podospora anserina                 Euascomycota: Sordariomycetes    Broad-FGI
Trichoderma atroviride             Euascomycota: Sordariomycetes    DOE-JGI
Trichoderma virens                 Euascomycota: Sordariomycetes    DOE-JGI
Leptosphaeria maculans             Euascomycota: Dothideomycetes    Genoscope
Alternaria brassicicola            Euascomycota: Dothideomycetes    VPI & WUSTL
Xanthoria parietina (lichen)       Euascomycota: Lecanoromycetes    DOE-JGI
Candida albicans WO-1              Hemiascomycota                   Broad-FGI
Lodderomyces elongisporus          Hemiascomycota                   Broad-FGI
Pichia stipitis                    Hemiascomycota                   JGI-DOE
Saccharomyces bayanus              Hemiascomycota                   (49, 167)
Saccharomyces castellii            Hemiascomycota                   (49)
Saccharomyces cerevisiae RM11-1A   Hemiascomycota                   Broad-FGI
Saccharomyces cerevisiae YJM789    Hemiascomycota                   (113)
Saccharomyces kluyveri             Hemiascomycota                   WUSTL (finishing)
Saccharomyces kudriavzevii         Hemiascomycota                   (49)
Saccharomyces mikatae              Hemiascomycota                   (49, 167)
Saccharomyces paradoxus            Hemiascomycota                   (167)
Saccharomyces pastorianus          Hemiascomycota                   Kitasato Univ
Zygosaccharomyces rouxii           Hemiascomycota                   CNRS-Genoscope

Table C.2: In progress and funded fungal genome sequencing projects as of Spring 2006. This data was partially derived from the Genomes OnLine Database (190)

Genome annotation

• Many of the fungal genomes were available only as assembled genomic sequence.

• An automated annotation pipeline was built to generate systematic gene predictions.

• Several gene prediction programs were trained, and their results were combined with GLEAN (Liu, Mackey, Roo, et al unpublished) to produce composite gene calls.


Intron frequency varies among the fungi

Figure: scatter plot of mean intron length (0 to 500) against mean introns per gene (1 to 6). Points are labeled Rory, Umay, Spom, Cneo, Pchr, Ccin, Cgla, Klac, Ylip, Scer, Dhan, Agos, and Pans.

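Analysis methodology

Figure: flowchart of the annotation and analysis pipeline. Components shown:
• Gene predictors: SNAP, Twinscan, Glimmer, Genscan (ZFF to GFF3 and GFF2 to GFF3 conversion; Tools::Genscan, Tools::Glimmer)
• Similarity searches: BLASTZ, BLASTN, HMMER (parsed with SearchIO)
• Alignments to the genome: exonerate protein2genome (proteins) and exonerate est2genome (ESTs), with protein to genome coordinates and GFF2 to GFF3 conversion (Tools::GFF)
• GLEAN (combiner), with results stored in Bio::DB::GFF
• GFF to AA: predicted proteins from the genome
• FASTA all-vs-all on proteins, then MCL, then gene families; find orthologs
• Rfam and tRNAscan
• Multiple sequence alignment, Bio::AlignIO aa2cds alignment, intron mapping into alignment, intron analysis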

http://fungal.genome.duke.edu

Generic Genome Browser

Methods: gene family identification

• All-vs-All pairwise sequence searches (FASTP)

• Cluster genes by similarity using the Markov Clustering (MCL) algorithm

• Identify families with unusually large size changes along phylogeny with CAFE

• Use 37 fungal genomes from 5 major clades
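The clustering step in the list above can be illustrated with a bare-bones version of the MCL iteration: alternate expansion (matrix squaring) and inflation (element-wise power followed by column normalization) until the matrix settles, then read clusters from the non-zero rows. This is a teaching sketch in pure Python with hypothetical parameter defaults and a toy similarity matrix; it is not the MCL program used in the pipeline.

```python
def normalize_columns(m):
    """Scale every column of a square matrix so that it sums to 1."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] for j in range(n)] for i in range(n)]

def mcl(similarity, inflation=2.0, iterations=50, threshold=1e-6):
    """Cluster the nodes of a symmetric similarity matrix; returns sorted index lists."""
    n = len(similarity)
    # Add self-loops so every column has positive weight.
    m = [[similarity[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    m = normalize_columns(m)
    for _ in range(iterations):
        # Expansion: flow spreads along paths of length two.
        m = [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
             for i in range(n)]
        # Inflation: strong flows are boosted, weak flows fade.
        m = normalize_columns([[x ** inflation for x in row] for row in m])
    clusters = set()
    for row in m:
        members = frozenset(j for j, x in enumerate(row) if x > threshold)
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)

# Toy similarity matrix: genes 0-2 resemble each other, as do genes 3-5.
toy = [[1.0 if i != j and (i < 3) == (j < 3) else 0.0 for j in range(6)]
       for i in range(6)]
print(mcl(toy))  # [[0, 1, 2], [3, 4, 5]]
```

In practice the similarity weights would come from the all-vs-all search scores, and the inflation value controls how finely the families are split.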

Figure: workflow from FASTA all-vs-all searches to MCL, gene families, and CAFE, illustrated on a tree with branches labeled A to E and a chart of family count by species.

Example CAFE output:
Family 1   P < 0.001   Branch A
Family 2   P < 0.001   Branch B
Family 3   P = 0.02    Branch C, E
Family 4   P = 0.03    Branch D

Example family sizes on the Basidiomycota tree (time axis 0 to 400): U. maydis 18; C. gattii R265 5; C. gattii WM276 5; gattii 5; C. neoformans JEC21 5; C. neoformans var grubii 5; neoformans 5; Cryptococcus 5; P. chrysosporium 163; C. cinereus 141; Homobasidiomycota 136; Hymenomycota 23; Basidiomycota 23.

Families with significant expansions

Vitamin & Cofactor transport

Lactose & sugar transport

Amine transport

Myo-inositol, quinate, and glucose transport

Oligopeptide transport

ABC transporter

MFS, drug pump, & sugar transport

Transport

Monocarboxylate & sugar transport

ABC transport

Amino acid permease

Methyltransferase

Cytochrome P450: CYP64

Cytochrome P450: CYP53,57A

Cytochrome P450

Kinase

Subtilase family

NADH flavin oxidoreductase

Aldehyde dehydrogenase

Aldo/keto reductase

Multicopper oxidase

AMP-binding enzyme

Categories: Transporters, Kinases, P450, Oxidation

49 significant families

Transporters

• Of 45 significant families, 22 were related to transport

• Vitamin and amino acid transport

• Sugar and sugar-like transporters

• Multidrug and efflux pumps

• ABC transporters (ATP Binding Cassette)

Vitamin & Cofactor Transporters

Figure: counts for this family on the dated phylogeny of the 37 genomes (time axis 0 to 900). Marked branches have significant (P<0.05) expansions or contractions.

Counts by species: Rhizopus oryzae 21; Neurospora crassa 20; Podospora anserina 27; Chaetomium globosum 19; Magnaporthe grisea 46; Fusarium verticillioides 84; Fusarium graminearum 62; Trichoderma reesei 33; Sclerotinia sclerotiorum 22; Botrytis cinerea 25; Stagonospora nodorum 66; Uncinocarpus reesii 25; Coccidioides immitis 22; Histoplasma capsulatum 17; Aspergillus fumigatus 44; Aspergillus nidulans 50; Aspergillus terreus 64; Aspergillus oryzae 90; Ashbya gossypii 5; Kluyveromyces lactis 9; Saccharomyces cerevisiae 7; Candida glabrata 6; Candida lusitaniae 18; Debaryomyces hansenii 24; Candida guilliermondii 18; Candida tropicalis 8; Candida albicans 7; Candida dubliniensis 7; Yarrowia lipolytica 30; Schizosaccharomyces pombe 10; Cryptococcus neoformans neoformans 33; Cryptococcus neoformans grubii 32; Cryptococcus gattii WM276 25; Cryptococcus gattii R265 26; Phanerochaete chrysosporium 27; Coprinus cinereus 24; Ustilago maydis 13.

Counts at labeled internal nodes: Sordariomycetes 31; Eurotiomycota 33; Euascomycota 30; Candidacae 15; Hemiascomycota 21; HemiEuascomycota 23; Ascomycota 22; Basidiomycota 22.

Transporter expansions

• Sugar related, Drug pump, and Major Facilitator Superfamily

• Aspergillus spp, Fusarium spp, S. nodorum

• Euascomycota

• Vitamin transport

• C. neoformans, Fusarium

• A. nidulans (Biotin)

• Saccharomyces expansions independent!

Aspergillus

Fusarium

S. nodorum

C. neoformans

Sugar transporter use in phytopathogens

• Sugar transporters are used to extract nutrients from host

• Haustorium: specialized structure for plant parasitism

• Many sugar transporters highly and specifically expressed in haustoria

Haustorium

http://tolweb.org/ (Robert Bauer)

Figure: dated phylogeny of the 37 fungal genomes (time axis 0 to 900), with Rhizopus oryzae as the most distant lineage. Clade labels: Pyrenomycota, Sordariomycota, Aspergillus, Eurotiomycota, Candidacae, Saccharomyces, Hemiascomycota, Hemi-Euascomycota, Ascomycota, Homobasidiomycota, Hymenomycota, Basidiomycota. Two nodes are annotated 400 My and 966 My.

Basidiomycota changes

U. maydis, C. cinereus, P. chrysosporium, C. neoformans

P450 CYP64

Figure: CYP64 family sizes on the Basidiomycota tree (time axis 0 to 400 million years ago): U. maydis 18; C. gattii R265 5; C. gattii WM276 5; gattii 5; neoformans 5; Cryptococcus 5; C. cinereus 141; Hymenomycota 23; Basidiomycota 23.

P450 enzymes are involved in the synthesis and cleavage of chemical bonds, and in drug metabolism in animals.

CYP64: a step in the Aspergillus spp aflatoxin pathway. In P. chrysosporium, implicated in lignin and hydrocarbon degradation.

CYP64 expansions arose from independent duplications

Figure: gene tree of the CYP64 family, with one group of tips labeled "C. cinereus expansion" and another labeled "P. chrysosporium expansion".

C. cinereus tips: ccin 03995, 12432, 12477, 12447, 03760, 08843, 08880, 08948, 08949, 09228, 08947, 08946, 12431, 12515, 08608, 12514, 07535, 07536, 05141, 07531, 04462, 04461, 04460, 07538, 01326, 04884, 07555, 07554, 03994, 12516, 09337, 09357, 09950, 00042, 00039, 00043, 11079, 11073, 12301, 03618, 12868, 12386, 03622, 09244, 08520, 10950, 13218, 13220

P. chrysosporium tips: pchr 04215, 02481, 02475, 02324, 02322, 02461, 02249, 02248, 02471, 02460, 02469, 02472, 02470, 02468, 02462, 02442, 02441, 02317, 02473, 02477, 02480, 02479, 02474, 02459, 02478, 09197, 08602, 02326, 08048, 08046, 08045, 08047, 02328, 06733, 10861, 07430, 07443

Tom Volk, Mario Cervini

Local duplications created CYP64 expansion

Figure: genome browser view of pchr_24 (9k to 24k) showing adjacent GLEAN gene models GLEAN_02414 (probability 1), GLEAN_02415 (probability 0.999937), GLEAN_02416, and GLEAN_02417, with Pfam Cytochrome P450 (p450) domain hits along the models (e-values shown: 1e-28 and 6.3e-23).

Interpretation of CYP64 expansion

Figure: CYP64 family sizes on the Basidiomycota tree (time axis 0 to 400 million years ago), annotated with "Angiosperm diversification": U. maydis 18; C. gattii R265 5; C. gattii WM276 5; gattii 5; neoformans 5; Cryptococcus 5; C. cinereus 141; Hymenomycota 23; Basidiomycota 23.

Hydrophobin Family

• Self-assembling proteins involved in the fungal cell wall

• Part of what makes a mushroom

• 8 cysteine residues critical to function

• Help spores stay airborne by resisting water

P.chr C.cin C.neo U.may

21 33 0 2

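Figure: hydrophobin gene tree (scale bar 0.1) with groups of local duplications marked for C. cinereus and P. chrysosporium. Tips shown: umay UM05010, umay UM04433, ccin 10587, ccin 10586, ccin 06185, ccin 06184, ccin 06194, ccin 08744, ccin 06204, ccin 05130, ccin 05145, ccin 00406, pchr 10481, pchr 10482, pchr 03412, ccin 08657.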

Cryptococcus sugar transporters expansion

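Figure: sugar transporter family sizes on the Basidiomycota tree (time axis 0 to 400): U. maydis 19; C. gattii R265 47; C. gattii WM276 50; gattii 50; C. neoformans JEC21 57; C. neoformans H99 59; neoformans 57; Cryptococcus 50; P. chrysosporium 23; C. cinereus 20; Homobasidio 23; Hymenomycota 24; one further unlabeled node 24.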

FIG. 1. The external and internal structures of C. neoformans are shown by means of a modified India ink preparation. Magnification, ca. ×1,000.

Cryptococcus sugar transporters

• 3x as many sugar transporters in C. neoformans (~50) as in other basidiomycetes

• “sugar coated killer”

• Capsule is a mixture of glucose, xylose, and mannose.

• Transporters could be important in capsule synthesis

Zerpa et al, 1996

Conclusions

• Transporters are highly expanded in independent lineages

• Saprophytic and phytopathogenic lifestyles

• Adaptive Homobasidiomycete (mushroom) expansions

• Lignin degradation - saprophytic lifestyles

• Hydrophobins - cell wall structures

Acknowledgments

Matthew Hahn (Indiana), Jeff Demuth, Sang-Gook Han

Tijl De Bie, Nello Cristianini

Aaron Mackey, Ian Korf, Mario Stanke

Fred Dietrich (Duke)

Sequencing centers: Broad Institute, Joint Genome Institute, Génolevures, Stanford University, TIGR, Wellcome Trust Sanger Centre

(NIH and NSF)
