Comparative Genomics in Basidiomycetes - Analyzing multigene families
Balaji Rajashekar, Anders Tunlid, Dag Ahrén, Jason Stajich
Lund University

Mar 31, 2015

Page 1:

Comparative Genomics in Basidiomycetes - Analyzing multigene families

Balaji Rajashekar, Anders Tunlid, Dag Ahrén, Jason Stajich

Page 2:

Basidiomycete genome data

Species                        Protein coding genes   Genome size (Mb)
Laccaria bicolor               20,614                 64.9
Coprinopsis cinerea            13,544                 36.25-37.5
Phanerochaete chrysosporium    10,048                 35.1
Cryptococcus neoformans        7,302                  19.5
Ustilago maydis                6,522                  19.7
Total                          58,030
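The total of 58,030 is the sum of the five gene counts. A minimal Python sketch, using only the numbers from this slide, checks that sum and derives a rough gene density. The names `genomes` and `genes_per_mb` are illustrative, and taking the lower bound of the Coprinopsis size range (36.25 Mb) is an arbitrary choice made here.

```python
# Gene counts and genome sizes (Mb) copied from the slide.
# Coprinopsis cinerea is given as a range (36.25-37.5 Mb); the lower
# bound is used here purely for illustration.
genomes = {
    "Laccaria bicolor": (20614, 64.9),
    "Coprinopsis cinerea": (13544, 36.25),
    "Phanerochaete chrysosporium": (10048, 35.1),
    "Cryptococcus neoformans": (7302, 19.5),
    "Ustilago maydis": (6522, 19.7),
}

# Sum of protein coding genes across the five genomes.
total_genes = sum(genes for genes, _ in genomes.values())


def genes_per_mb(species):
    """Protein coding genes per Mb of genome for one species."""
    genes, size_mb = genomes[species]
    return genes / size_mb


for name in genomes:
    print(f"{name}: {genes_per_mb(name):.0f} genes/Mb")
print("Total genes:", total_genes)
```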

Page 3:

Sequence similarity & clustering

• BLASTP

[Figure: Gene 1 to Gene 10]
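One way to go from BLASTP hits to a gene similarity graph is sketched below. It assumes BLAST tabular output (`-outfmt 6`, 12 tab-separated columns with the E-value in column 11). The function name, the E-value cutoff of 1e-5, the weight cap of 200, and the example hit lines are all hypothetical and not taken from the slides.

```python
import math


def parse_blast_tabular(lines, max_evalue=1e-5):
    """Turn BLAST tabular hits into undirected weighted edges.

    Returns {(gene_a, gene_b): weight} with weight = -log10(E-value),
    keeping the best weight when a pair is hit in both directions.
    """
    edges = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        evalue = float(fields[10])
        # Skip self hits and hits weaker than the (assumed) cutoff.
        if query == subject or evalue > max_evalue:
            continue
        # BLAST prints 0.0 for very small E-values, so cap the weight.
        weight = 200.0 if evalue == 0.0 else min(200.0, -math.log10(evalue))
        key = tuple(sorted((query, subject)))
        edges[key] = max(edges.get(key, 0.0), weight)
    return edges


# Hypothetical hits, for illustration only.
example_hits = [
    "gene1\tgene2\t85.0\t200\t30\t0\t1\t200\t1\t200\t1e-50\t300",
    "gene2\tgene1\t85.0\t200\t30\t0\t1\t200\t1\t200\t1e-48\t295",
    "gene1\tgene1\t100.0\t200\t0\t0\t1\t200\t1\t200\t0.0\t400",
    "gene3\tgene4\t30.0\t50\t35\t0\t1\t50\t1\t50\t0.5\t20",
]
edges = parse_blast_tabular(example_hits)
print(edges)
```

Here the self hit and the weak gene3/gene4 hit are dropped, and the two reciprocal gene1/gene2 hits collapse into one edge.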

Page 4:

TribeMCL (Enright et al. NAR 2002)

[TribeMCL animation]

• BLASTP: All against all for the basidiomycete genomes
• 58,000 versus 58,000 proteins
• Split generated network into families
• Data and settings dependent
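The "split network into families" step can be illustrated with a toy version of Markov clustering, the algorithm behind TribeMCL. This is a dense-matrix sketch in plain Python, not the mcl program itself. The function names, the inflation value of 2.0, the convergence tolerance, and the six-gene example are assumptions made for illustration.

```python
def normalize_columns(matrix):
    """Scale each column in place so that it sums to 1."""
    n = len(matrix)
    for j in range(n):
        col_sum = sum(matrix[i][j] for i in range(n))
        if col_sum > 0:
            for i in range(n):
                matrix[i][j] /= col_sum
    return matrix


def mcl(adjacency, inflation=2.0, iterations=50, tol=1e-6):
    """Toy Markov clustering on a symmetric weight matrix."""
    n = len(adjacency)
    # Add self-loops, then make the matrix column-stochastic.
    m = [[float(adjacency[i][j]) + (1.0 if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    normalize_columns(m)
    for _ in range(iterations):
        # Expansion: square the matrix (two-step random walks).
        expanded = [[sum(m[i][k] * m[k][j] for k in range(n))
                     for j in range(n)] for i in range(n)]
        # Inflation: raise entries to a power and renormalize, which
        # strengthens strong connections and weakens weak ones.
        inflated = [[x ** inflation for x in row] for row in expanded]
        normalize_columns(inflated)
        change = max(abs(inflated[i][j] - m[i][j])
                     for i in range(n) for j in range(n))
        m = inflated
        if change < tol:
            break
    # Each row with non-zero entries defines one cluster.
    clusters = set()
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > 1e-6)
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)


# Hypothetical network: genes 0-2 all similar to each other, as are 3-5.
triangles = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
clusters = mcl(triangles)
print(clusters)
```

The inflation parameter controls how finely the network is split, which is one reason the resulting families are data and settings dependent.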

Page 5:

Gene family distribution

                       Laccaria  Coprinopsis  Phanerochaete  Cryptococcus  Ustilago
Families present       5947      5148         4126           3056          2583
Families not present   1405      2204         3226           4296          4769
Total                  7352      7352         7352           7352          7352
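A presence/absence table of this kind can be derived from cluster membership. The sketch below assumes each family is a list of (species, protein) pairs; the function name and the three toy families are hypothetical.

```python
from collections import Counter


def families_present(families, species):
    """Count, per species, the families with at least one member.

    families: {family_id: [(species, protein_id), ...]}
    Returns {species: (present, not_present, total)}.
    """
    present = Counter()
    for members in families.values():
        # A family counts once per species, however many proteins it has.
        for sp in {sp for sp, _ in members}:
            present[sp] += 1
    total = len(families)
    return {sp: (present[sp], total - present[sp], total) for sp in species}


# Toy input, for illustration only.
toy_families = {
    1: [("Laccaria", "a"), ("Laccaria", "b"), ("Ustilago", "c")],
    2: [("Laccaria", "d")],
    3: [("Coprinopsis", "e"), ("Ustilago", "f")],
}
table = families_present(toy_families, ["Laccaria", "Coprinopsis", "Ustilago"])
print(table)
```

In the slide's table every column satisfies present + not present = 7352, for example 5947 + 1405 = 7352 for Laccaria.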

Page 6:

Global view of proteins vs genome size

Page 7:

Gene family size distribution

Page 8:

Statistical analyses of gene families

CAFE (De Bie et al., Bioinformatics 2006)
• Model the evolution of gene family sizes
• Takes phylogeny into account
• Calculates birth and death of genes in all nodes
• Identifies families with accelerated gene gain/loss including extinction
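To make "birth and death of genes" concrete, here is a sketch of the transition probability for a linear birth-death process with equal birth and death rates, the kind of model used for gene family sizes along a branch. Whether this matches the exact model and settings used in the analysis is an assumption. The function name and the rate and time values are illustrative.

```python
from math import comb


def transition_probability(start, end, rate, time):
    """P(size == end after `time` | size == start).

    Equal-rate linear birth-death process with
    alpha = rate * time / (1 + rate * time).
    """
    if start == 0:
        # An extinct family stays extinct.
        return 1.0 if end == 0 else 0.0
    alpha = rate * time / (1.0 + rate * time)
    total = 0.0
    for j in range(min(start, end) + 1):
        total += (comb(start, j)
                  * comb(start + end - j - 1, start - 1)
                  * alpha ** (start + end - 2 * j)
                  * (1.0 - 2.0 * alpha) ** j)
    return total


# Hypothetical rate (per gene per million years) and branch length (MYA).
rate, branch_length = 0.002, 100
for size in range(5):
    p = transition_probability(1, size, rate, branch_length)
    print(f"P(1 -> {size}) = {p:.4f}")
```

A family whose observed size change is very unlikely under such a model is a candidate for accelerated gain or loss; a family reaching size 0 corresponds to extinction.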

Page 9:

Gene family expansions/contractions

Branch  Divergence time (MYA)  Expansions  No change  Contractions  Average expansion
1       246                    109         5248       26            0.036
2       167                    426         4873       84            0.178
3       57                     393         4855       135           0.130
4       84                     1064        3844       475           0.695
5       84                     459         4111       813           0.056
6       140                    371         3291       1721          -0.169
7       308                    307         2272       2804          -0.519
8       554                    96          2043       3244          -0.655
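A short consistency check on this table, using only the slide's numbers: every row should account for the same number of families, and the net count of expanded minus contracted families can be compared across branches. The variable names are illustrative.

```python
# (branch, divergence time in MYA, expansions, no change, contractions,
#  average expansion), copied from the slide.
branches = [
    (1, 246, 109, 5248, 26, 0.036),
    (2, 167, 426, 4873, 84, 0.178),
    (3, 57, 393, 4855, 135, 0.130),
    (4, 84, 1064, 3844, 475, 0.695),
    (5, 84, 459, 4111, 813, 0.056),
    (6, 140, 371, 3291, 1721, -0.169),
    (7, 308, 307, 2272, 2804, -0.519),
    (8, 554, 96, 2043, 3244, -0.655),
]

# Families accounted for on each branch.
families_per_branch = {b: exp + same + con
                       for b, _, exp, same, con, _ in branches}

# Net number of families that grew rather than shrank.
net_change = {b: exp - con for b, _, exp, _, con, _ in branches}

for b in sorted(net_change):
    print(f"branch {b}: {families_per_branch[b]} families, net {net_change[b]:+d}")
```

Each row sums to 5383 families. Branch 5 is the one row where the two measures disagree in sign (more contractions than expansions, but an average expansion of 0.056). That is possible if the average is taken over the size of the changes rather than their count; this reading of the column is an assumption.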

Page 10:

Protein families in Laccaria

5383 Protein families analysed by CAFE
1969 Unique protein families
7352 Protein families in total

Page 11:

Example of families >25 Laccaria proteins

Protein family  Lac  Copr  Phae  Cryp  Ust  Pfam accession    Pfam description
Significantly expanded
1*              216  97    91    75    74   PF00400           WD domain, G-beta repeat
2*              150  113   109   86    74   PF00069, PF07714  Protein kinase domain, Protein tyrosine kinase
22              102  13    2     1     0
Unique
5               206  0     0     0          PF00931, PF05729  NB-ARC domain, NACHT domain
17*             128  0     0     0
64              56   0     0     0

Page 12:

Identification of significant families

Page 13:

PCA of expression data

[Figure: PCA case scores, Axis 1 vs Axis 2. Sample labels: MycPiv, MycPgh, MycD_I, MycD_II, FBE, FBL, K, E, K2, MS238N_I, MS238N_II]

[Figure: PCA variable loadings, Axis 1 vs Axis 2, with one point per Laccaria gene model]

Protein family 211 experiments: Mycelia, Mycorrhiza, Fruiting bodies

Page 14:

Comparative Genomics in Basidiomycetes - Analyzing multigene families

Balaji Rajashekar, Anders Tunlid, Dag Ahrén, Jason Stajich

Page 15:

Identification of significant families
