Modeling Bayesian Phylogenetic Inference in Protein Data Analysis by Using Mr. Bayes, Proml, Consensus Applications
Page 1: Cpu time 051010

Modeling Bayesian Phylogenetic Inference in Protein Data Analysis by Using Mr. Bayes, Proml, Consensus Applications

Page 2: Cpu time 051010

Mr. Bayes vs. Proml (maximum likelihood)

[Chart: x-axis 1 to 21; y-axis 0 to 12000; legend: Series1, Series2]

Page 3: Cpu time 051010

CPU time/Mr. Bayes/Proml

[Chart: x-axis ticks 1, 9, 17; depth axis S1 to S3; y-axis 0 to 4500; legend: Series1, Series2, Series3]

Page 4: Cpu time 051010

Diff. of Maximum Likelihood (Mr. Bayes – Proml) vs. CPU (sec)

[Chart "maximum likelihood": x-axis diff (postml - proml), 0 to 800; y-axis cpu time (sec), 0 to 4500; legend: Series1]

Page 5: Cpu time 051010

Diff. of Maximum Likelihood (Mr. Bayes – Proml) vs. CPU (sec)

[Chart "maximum likelihood": x-axis diff (postml - preml), 1 to 19; y-axis cpu time (sec), 0 to 4500; legend: Series1, Series2]

Page 6: Cpu time 051010

Linear Regression in Testing Datasets

[Chart "linear regression": x-axis 0 to 15000; y-axis 0 to 12000; legend: Series1]

Page 7: Cpu time 051010

Testing Datasets Plus One/Two Long Branch Datasets

[Chart "mrbayes vs proml (plus AB,CD data)": x-axis 1 to 34; y-axis 0 to 16000; legend: Series1, Series2]

Page 8: Cpu time 051010

Linear Regression After Bayesian Correction for Testing Datasets and One/Two Long Branch Datasets

[Chart: x-axis 0 to 20000; y-axis 0 to 16000; legend: Series1]

Page 9: Cpu time 051010

Phylogeny for All Testing Datasets

[Chart "phy all": x-axis no., 0 to 300; y-axis length, -0.5 to 3; legend: Series1]

Page 10: Cpu time 051010

Phylogeny for All Datasets

[Chart "phylogeny for all datasets": x-axis no., 0 to 400; y-axis length, -0.5 to 3; legend: Series1]

Page 11: Cpu time 051010

One Long Branch Datasets

[Chart "one long branch": x-axis no., 0 to 60; y-axis length, -0.5 to 2.5; legend: Series1]

Page 12: Cpu time 051010

Two Long Branches Datasets

[Chart "two long branches": x-axis no., 0 to 60; y-axis length, 0 to 1.6; legend: Series1]

Page 13: Cpu time 051010

Phylogeny (sequence length from Proml)

[Chart "phy07": x-axis species, 0 to 15; y-axis length, -0.05 to 0.3; legend: Series1]

[Chart "AB50J": x-axis no., 0 to 8; y-axis length, -0.5 to 2; legend: Series1]

[Chart "CD20J": x-axis no., 0 to 8; y-axis length, 0 to 1.2; legend: Series1]

[Chart "phy06": x-axis species, 0 to 10; y-axis length, -0.1 to 0.6; legend: Series1]

Page 14: Cpu time 051010

One/Two Long Branch Datasets (Maximum Likelihood)

CD10J    1200    0.74    179    1132    5.10    468
CD50J    1264    8.05    314    1193    6.72    682

[Chart: value axis 11000 to 14000; legend: Series1]

Page 15: Cpu time 051010

Data Analysis

• Testing datasets: phy01 ~ phy21, nexus01 ~ nexus21

• Experimental datasets: one long branch (AB10J ~ AB70aj), two long branches (CD10J ~ CD70aj)

• Operating system: Mac OS X ver. 10.3.9

• Dual 800 MHz PowerPC G4

• 256 MB SDRAM

• Mr. Bayes – 3.1.1

• Phylip 3.67 (Proml, Consensus)

Page 16: Cpu time 051010

Data Analysis (continued)

• Testing sample size: 21x2

• Experimental samples: 7x2

• Degrees of freedom: 20

• Chi square: 283.1561 > 31.41 (alpha=0.05)

• Proml and Mr. Bayes are the two dependent variables

• ANOVA: SSw=2669051, SSb=24253093

• SStotal=50943143.71

• Eta square=0.476081596

• Type I error=0.05

• Type II error=1.83%

• Power=98.17%

• Instrument threshold=1E-8
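Several of the figures on this slide follow from one another by simple arithmetic. The sketch below is a minimal check under stated assumptions, not the author's original calculation: it takes eta square as SSb divided by SStotal, takes power as one minus the type II error, and compares the chi-square statistic against the critical value exactly as quoted on the slide. All constant and function names are hypothetical.

```python
# Summary figures copied from the slide; nothing here is recomputed from raw data.
SS_BETWEEN = 24253093.0
SS_TOTAL = 50943143.71
TYPE_II_ERROR = 0.0183          # 1.83 %
CHI_SQUARE = 283.1561
CHI_SQUARE_CRITICAL = 31.41     # df = 20, alpha = 0.05, as quoted on the slide


def eta_squared(ss_between: float, ss_total: float) -> float:
    """Share of the total sum of squares attributed to the between-group effect."""
    return ss_between / ss_total


def power_from_beta(beta: float) -> float:
    """Statistical power is the complement of the type II error rate."""
    return 1.0 - beta


eta_sq = eta_squared(SS_BETWEEN, SS_TOTAL)
power = power_from_beta(TYPE_II_ERROR)
reject_null = CHI_SQUARE > CHI_SQUARE_CRITICAL

print(f"eta squared = {eta_sq:.6f}")
print(f"power       = {power:.2%}")
print(f"reject H0 at alpha = 0.05: {reject_null}")
```

Under these assumptions the computed eta square and power agree with the values listed on the slide to the precision shown there.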

Page 17: Cpu time 051010

Testing Datasets

Testing samples:

        CPU         character   diff(Mr.Bayes-Proml)
mean    1576.467    296.3333    226.959
sd      878.5355    131.7778    104.6243

Testing datasets in linear regression between Mr. Bayes and Proml:

slope       1.058352
intercept   14.79771
correl      0.999724

y(Mr.Bayes) = 1.058351726 x(Proml) + 14.79771

Page 18: Cpu time 051010

Experimental samples:

        AB (one long branch)    CD (two long branches)
mean    11802.88                13122.86
sd      179.5717                364.0589

Linear regression between experimental samples:

slope       0.492193    t-test   3.47E-05
intercept   5343.856    f-test   0.109857
correl      0.996717

Page 19: Cpu time 051010

Linear Regression for All

After Bayesian modeling:

        AB (one long branch)    CD (two long branches)
mean    12506.4                 13903.5
sd      190.05                  385.302

Linear regression for all datasets (including experimental and testing):

slope       1.058352
intercept   14.79764
correl      0.999959

y(Mr. Bayes) = 1.058352 x(Proml) + 14.79764
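The "after Bayesian modeling" means and standard deviations on this slide appear to be consistent with passing the experimental values from page 18 through the regression line y = 1.058352x + 14.79764. The slides do not state this explicitly, so the sketch below is only a check of that reading. The names `predict`, `scale_sd`, and `experimental` are hypothetical.

```python
# Regression coefficients quoted on this slide.
SLOPE = 1.058352
INTERCEPT = 14.79764

# Experimental (mean, sd) pairs quoted on page 18.
experimental = {
    "AB (one long branch)": (11802.88, 179.5717),
    "CD (two long branches)": (13122.86, 364.0589),
}


def predict(x: float) -> float:
    """Map a value through the fitted line y = SLOPE * x + INTERCEPT."""
    return SLOPE * x + INTERCEPT


def scale_sd(sd: float) -> float:
    """A linear map rescales a standard deviation by |SLOPE|; the intercept drops out."""
    return abs(SLOPE) * sd


corrected = {
    name: (predict(mean), scale_sd(sd))
    for name, (mean, sd) in experimental.items()
}

for name, (mean, sd) in corrected.items():
    print(f"{name}: mean = {mean:.1f}, sd = {sd:.3f}")
```

Under this assumption the transformed AB values and both standard deviations match the slide to the precision shown, and the CD mean agrees to within about 0.1.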

Page 20: Cpu time 051010

Tree Hierarchical Structure: AB10J

AB10J.JTT

          +----------seq.7
          |
    +-----5   +---------seq.4
    |     |   |
    |     +---2       +-------seq.6
    |         |  +----4
    |         +----3  +----------seq.5
    |              |
    |              +-----------seq.2
    |
    1------------------------------------------seq.3
    |
    +--------seq.1

AB10J.consensus

                  +--------------------seq.4
                  |
           +--1.0-|             +------seq.6
           |      |      +--1.0-|
           |      +--1.0-|      +------seq.5
    +------|             |
    |      |             +-------------seq.2
    |      |
    |      |                    +------seq.1
    |      +----------------1.0-|
    |                           +------seq.3
    |
    +----------------------------------seq.7

Page 21: Cpu time 051010

Histogram AB10J

[Chart "AB10J": x-axis no., 0 to 8; y-axis length, 0 to 0.8; legend: Series1]

Page 22: Cpu time 051010

Tree Hierarchical Structure: CD10J

CD10J.JTT

       +----seq.7
       |
    +--5 +----seq.4
    |  | |
    |  +-2    +-------------------seq.6
    |    |  +-4
    |    +--3 +-----seq.5
    |       |
    |       +-----seq.2
    |
    1---seq.3
    |
    +---------------------seq.1

CD10J.consensus

                  +--------------------seq.4
                  |
           +--1.0-|             +------seq.6
           |      |      +--1.0-|
           |      +--1.0-|      +------seq.5
    +------|             |
    |      |             +-------------seq.2
    |      |
    |      |                    +------seq.1
    |      +----------------1.0-|
    |                           +------seq.3
    |
    +----------------------------------seq.7

Page 23: Cpu time 051010

Histogram CD10J

[Chart "CD10J": x-axis no., 0 to 8; y-axis length, 0 to 0.8; legend: Series1]

Page 24: Cpu time 051010

Discussion

• Bayesian modeling can be used to evaluate type I and type II errors, eta square, power, chi square (χ²), ANOVA, the correlation coefficient, linear regression, etc.

• It is possible to design a 2x2 table in order to evaluate risk measures such as RD, RR, and RO.

• The Proml and Consensus outputs yield a histogram profile together with the hierarchical tree structure, which makes peak area integration possible.
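The 2x2 risk table mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not something taken from the slides: RD, RR, and RO are read here as risk difference, relative risk, and relative odds (odds ratio), the cell layout is the usual exposed/unexposed by event/no-event arrangement, and the counts are made-up values chosen only to exercise the formulas.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts of a 2x2 table (hypothetical layout, not from the slides)."""
    a: int  # group 1, event
    b: int  # group 1, no event
    c: int  # group 2, event
    d: int  # group 2, no event

    def risk_group1(self) -> float:
        return self.a / (self.a + self.b)

    def risk_group2(self) -> float:
        return self.c / (self.c + self.d)

    def risk_difference(self) -> float:
        """RD: absolute difference between the two risks."""
        return self.risk_group1() - self.risk_group2()

    def relative_risk(self) -> float:
        """RR: ratio of the two risks."""
        return self.risk_group1() / self.risk_group2()

    def odds_ratio(self) -> float:
        """RO, read here as relative odds: (a * d) / (b * c)."""
        return (self.a * self.d) / (self.b * self.c)


# Made-up counts for illustration only.
table = TwoByTwo(a=30, b=70, c=10, d=90)

print(f"RD = {table.risk_difference():.3f}")
print(f"RR = {table.relative_risk():.3f}")
print(f"RO = {table.odds_ratio():.3f}")
```

A table with a zero in cell b, c, or in both cells of group 2 would raise a division error; a fuller version would need to handle those cases.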

Page 25: Cpu time 051010

Questions

• CPU time can be used to count all activities in hydrogen bonds through a kinesthetic module in the computer, including the hydrogen-bond configurations of DNA matches from the pairs A-T, A-U, and C-G, and/or DNA alignment from the separate genetic codes A, T, U, C, G.

• CPU time can possibly be used to count all triggering by stem cell activity through functional proteins.

• CPU time has already been used in forensic science to count pattern differentiation from suspect samples in judicial investigations.