Phylogenetics in R Scott Chamberlain November 18, 2011

May 11, 2015


Talk given on 18 Nov, 2011 on doing phylogenetics in R.
Transcript
Page 1: Phylogenetics in R

Phylogenetics in R

Scott Chamberlain
November 18, 2011

Page 2: Phylogenetics in R

What sorts of phylogenetics things can I do in R?

Page 3: Phylogenetics in R

The run down
• Get sequence data
• Align sequence data
• Phylogenetic inference
  – NJ, maxlik, parsimony, Bayesian, UPGMA
• Visualize phylogenies
• Traits on trees
  – Phylogenetic signal
  – Trait evolution
  – Ancestral state character reconstruction
• Tree simulations
• Get trees
• Phylogenetic community structure
• Bonus stuff: polytomy resolver

Page 4: Phylogenetics in R

Basic trees in R

Example

require(ape)
tr1 <- read.tree(text = "(((B:0.05,C:0.05):0.01,D:0.06):0.04,A:0.1);")
tr1 # print tree summary
write.tree(tr1) # print tree in newick format
# "(((B:0.05,C:0.05):0.01,D:0.06):0.04,A:0.1);"

tr1$tip.label # tip labels
# "B" "C" "D" "A"

tr1$edge.length # edge lengths
# 0.04 0.01 0.05 0.05 0.06 0.10

tr1$node.label # node labels
# NULL (meaning: no node labels)

# Assign properties to trees
tr1$tip.label <- c('sleepy','happy','grumpy','frumpy') # label tips
tr1$tip.label # did it work?
# "sleepy" "happy" "grumpy" "frumpy"

Etcetera for other tree properties
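As one illustration of "other tree properties", node labels can be set the same way as tip labels. This is a minimal sketch, assuming ape is installed; the labels n1 to n3 are made up for the example.

```r
library(ape)

tr1 <- read.tree(text = "(((B:0.05,C:0.05):0.01,D:0.06):0.04,A:0.1);")

# a rooted, fully resolved tree with 4 tips has 3 internal nodes
Ntip(tr1)
Nnode(tr1)

# assign hypothetical node labels, one per internal node
tr1$node.label <- c("n1", "n2", "n3")

# node labels now appear in the newick string
nwk <- write.tree(tr1)
nwk
```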

Page 5: Phylogenetics in R

Get sequence data

# install and load ape
install.packages("ape"); require(ape)

# make vector of accession numbers, for ITS 1 and 2 region for Gossypium (cotton) species
cotton_acc <- c("U56806", "U12712", "U56810", "U12732", "U12725", "U56786", "U12715",
                "AF057758", "U56790", "U12716", "U12729", "U56798", "U12727", "U12713",
                "U12719", "U56811", "U12728", "U12730", "U12731", "U12722", "U56796",
                "U12714", "U56789", "U56797", "U56801", "U56802", "U12718", "U12710",
                "U56804", "U12734", "U56809", "U56812", "AF057753", "U12711", "U12717",
                "U12723", "U12726")

# get data from GenBank
cotton <- read.GenBank(cotton_acc, species.names = TRUE)

# name the sequences with species names instead of accession numbers
names_accs <- data.frame(species = attr(cotton, "species"), accs = names(cotton))
names(cotton) <- attr(cotton, "species")

Page 6: Phylogenetics in R

Align sequence data
Run external programs: Clustal, MAFFT

# multiple sequence alignment
### Get ClustalW here, and install: http://www.clustal.org/

# set to your working directory
setwd("/path on your computer to/ClustalW2")

# write fasta file to directory
write.dna(cotton, "cotton.fas", format = "fasta")

# run clustal multiple alignment, prints clustal output to console
system('"./clustalw2" cotton.fas') # should work on OSX or Windows

# read the alignment back in to R
cotton_clustalaligned <- read.dna("cotton.aln", format = "clustal")

Manual alignment may have to be done, dare I say it, not in R
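The ClustalW call above fails with a cryptic error if the executable is not where R expects it. Here is a hedged sketch of a more defensive version. It assumes the executable is named clustalw2 and is on the PATH, and it uses the woodmouse data shipped with ape as a stand-in for the cotton sequences.

```r
library(ape)

# woodmouse is an example data set shipped with ape; it stands in for 'cotton'
data(woodmouse)

# write the sequences to a temporary FASTA file
infile <- file.path(tempdir(), "example.fas")
write.dna(woodmouse, infile, format = "fasta")

# look for the executable on the PATH (the name 'clustalw2' is an assumption)
clustal <- Sys.which("clustalw2")

if (nzchar(clustal)) {
  # same style of call as on the slide: the FASTA file is the only argument
  system2(clustal, shQuote(infile))
  # ClustalW is expected to write example.aln next to the input file
  aligned <- read.dna(sub("\\.fas$", ".aln", infile), format = "clustal")
} else {
  message("clustalw2 not found on PATH; skipping alignment")
}
```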

Page 7: Phylogenetics in R

Get and align sequences
DIY

• Get together with a few other people…or not
  – Choose some species to investigate
  – Get their accession numbers on GenBank
  – Download sequence data from GenBank
  – If you are really adventurous, also align sequences

Page 8: Phylogenetics in R

Phylogenetic inference Tools

R Packages: ape, phangorn, phyclust, phytools, scaleboot

• ape has the most functionality for phylogenetic inference

• You should be able to call MrBayes from R, but I don’t know how; maybe with the package phyloch?

Page 9: Phylogenetics in R

Phylogenetic inference
• Fitting evolutionary models: see the function modelTest in package phangorn

• NJ
install.packages("ape"); require(ape)
data(woodmouse)
trw <- nj(dist.dna(woodmouse))
plot(trw)

• Maximum likelihood
install.packages("phangorn"); require(phangorn)
data(Laurasiatherian)
dm <- dist.logDet(Laurasiatherian)
njtree <- NJ(dm)
MLfit <- pml(njtree, Laurasiatherian)
# optimize edge length parameter
MLfit_ <- optim.pml(MLfit, model = "GTR")
MLfit_$tree
plot(MLfit_$tree)

• Parsimony
install.packages("phangorn"); require(phangorn)
data(Laurasiatherian)
dm <- dist.logDet(Laurasiatherian)
tree <- NJ(dm)
treepars <- optim.parsimony(tree, Laurasiatherian)
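UPGMA is on the run down but has no example. The sketch below assumes phangorn is installed. It builds a UPGMA tree from the same distance matrix and compares its parsimony score with the NJ tree's; the comparison is only an illustration, not a recommended way to choose a tree.

```r
library(phangorn)

data(Laurasiatherian)
dm <- dist.logDet(Laurasiatherian)

# UPGMA and NJ trees from the same distance matrix
tree_upgma <- upgma(dm)
tree_nj <- NJ(dm)

# parsimony score (number of changes) for each tree; lower is better
parsimony(tree_upgma, Laurasiatherian)
parsimony(tree_nj, Laurasiatherian)

# UPGMA returns a rooted tree, NJ an unrooted one
is.rooted(tree_upgma)
is.rooted(tree_nj)
```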

Page 10: Phylogenetics in R

Phylogenetic inference---Continued
• Bayesian

  – You can do this (maybe) with the package phyloch (get it here: http://www.christophheibl.de/Rpackages.html), by calling MrBayes from R…

  – …however, MrBayes is giving way to RevBayes (here: http://sourceforge.net/projects/revbayes/), fyi

Page 11: Phylogenetics in R

Phylogenetic inference
DIY

• With your partners…or not
  – Use the sequence data from GenBank you got earlier
  – (if you didn’t align the sequences, don’t worry about it, OR use a data set provided with ape or another package)
  – Do some phylogenetic inference a couple of different ways (e.g., NJ and parsimony)

Page 12: Phylogenetics in R

Visualize phylogenies

R Packages: ape, ade4, phytools, phylobase, ouch, paleoPhylo

# visualize phylogenies
install.packages("ape")
require(ape)
tree <- rcoal(10)
tree
plot(tree)
plot(tree, type = "cladogram")
plot(tree, type = "unrooted")
plot(tree, type = "radial")
plot(tree, type = "fan")

Page 13: Phylogenetics in R

Visualize phylogenies
DIY

• Get together with a few other people…or not
  – Use the tree you made, or use one provided with ape, or other packages
  – Do basic plotting, e.g.: plot(mytree)
  – Then see if you can
    • color the branches
    • label the branches with the edge lengths
    • change the tip labels
    • etc.
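One possible way through the three plotting tasks, as a sketch that assumes ape is installed. The alternating colors and the "sp" tip names are arbitrary choices for the example.

```r
library(ape)

set.seed(1)
tree <- rcoal(10)

# color the branches: one color per row of tree$edge
edge_cols <- rep(c("black", "red"), length.out = nrow(tree$edge))
plot(tree, edge.color = edge_cols)

# label the branches with their (rounded) edge lengths
edgelabels(round(tree$edge.length, 2), frame = "none",
           adj = c(0.5, -0.3), cex = 0.7)

# change the tip labels, then replot
tree$tip.label <- paste0("sp", seq_along(tree$tip.label))
plot(tree, edge.color = edge_cols, edge.width = 2)
```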

Page 14: Phylogenetics in R

Traits on trees
phylogenetic signal

R Packages: ape, picante, caper, phytools
Examples from picante and phytools:

# phylogenetic signal
install.packages("picante")
require(picante)
randtree <- rcoal(20)
randtraits <- rTraitCont(randtree)
Kcalc(randtraits[randtree$tip.label], randtree)

install.packages("phytools")
require(phytools)
tree <- rbdtree(1, 0, Tmax = 4) # make a tree
x <- fastBM(tree) # simulate traits
phylosig(tree, x, method = "lambda", test = TRUE) # calculate physig, lambda
phylosig(tree, x, method = "K", test = TRUE) # calculate physig, K

Page 15: Phylogenetics in R

Traits on trees
modeling trait evolution

R Packages: ape, picante, caper, geiger, PHYLOGR, phytools, ade4, motmot

The packages above can model the evolution of discrete and continuous traits, under Brownian motion or OU models

See also:
• Rbrownie
• Various evolutionary modeling frameworks in development, to be included in geiger soon: auteur, mecca, medusa, and fossilmedusa, here: http://www.webpages.uidaho.edu/~lukeh/software/index.html

Page 16: Phylogenetics in R

Ancestral state reconstruction

R Packages: ape, ouch, phytools

Function ‘ace’ in the ape package works nicely
But very sensitive to parameters

Example
require(ape)
data(bird.orders)
x <- rnorm(23)
out <- ace(x, bird.orders)

out$ace will have the ancestral character values (which you’ll have to match to nodes of your tree)
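One way to do that matching is to draw the estimates on the tree. This sketch assumes, as I understand ape's behavior, that out$ace is ordered by internal node number starting at the root (node Ntip + 1), which is also the order nodelabels() uses when no node argument is given.

```r
library(ape)

data(bird.orders)
set.seed(42)
x <- rnorm(Ntip(bird.orders))
out <- ace(x, bird.orders)

# one estimate per internal node
length(out$ace) == Nnode(bird.orders)

plot(bird.orders, cex = 0.7)
# with no 'node' argument, nodelabels() labels every internal node in node-number order
nodelabels(round(out$ace, 2), cex = 0.6)
```

If your tree carries its own node labels, check names(out$ace) before relying on the ordering.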

Page 17: Phylogenetics in R

Tree simulations
R Packages: TreeSim, geiger, ape, phybase

Example
require(ape)
tree <- rcoal(10) # Make a random tree
trait <- rTraitCont(tree, model = "BM") # Simulate a trait on that tree

# Write a function to make a tree, simulate a BM trait, and take the mean of that trait
myfunc <- function(n) {
  tree <- rcoal(n)
  trait <- rTraitCont(tree, model = "BM")
  mean(trait)
}

# do it 100 times and make a data.frame required for ggplot2 plotting
dat <- replicate(100, myfunc(10))
dat2 <- data.frame(dat)

# plot results
require(ggplot2)
ggplot(dat2, aes(dat)) + geom_histogram()

Page 18: Phylogenetics in R

Get trees

rOpenSci’s treebase package on CRAN: http://cran.r-project.org/web/packages/treebase/

install.packages("treebase") # install
require(treebase) # load
tree <- search_treebase("Derryberry", "author")[[1]] # search
metadata(tree$S.id) # metadata for tree
plot(tree) # plot the tree

Page 19: Phylogenetics in R

Phylogenetic community structure

R Packages: picante (includes phylocom functionality, although not bladj for some reason; talk to me if you want to run bladj from R)

Example

Function ‘comdistnt’ calculates inter-community mean nearest taxon distance

require(picante)
data(phylocom)
comdistnt(phylocom$sample, cophenetic(phylocom$phylo), abundance.weighted = FALSE)

Also, a new approach to phylogenetic community structure in R from Matt Helmus, code here: http://r-ecology.blogspot.com/2011/10/phylogenetic-community-structure-pglmms.html

Page 20: Phylogenetics in R

Bonus: Polytomy resolver

MEE paper: “A simple polytomy resolver for dated phylogenies” by Kuhn, Mooers, and Thomas

– Paper: http://onlinelibrary.wiley.com/doi/10.1111/j.2041-210X.2011.00103.x/abstract

– Supp info has R scripts: http://onlinelibrary.wiley.com/doi/10.1111/j.2041-210X.2011.00103.x/suppinfo
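For comparison only: ape has a much cruder resolver, multi2di, which breaks polytomies at random and gives the new internal branches zero length. This is not the method from the paper, just a quick sketch of what "resolving a polytomy" means, using a made-up four-tip tree.

```r
library(ape)

# a small tree with one polytomy (A, B and C share a parent)
tr <- read.tree(text = "((A:1,B:1,C:1):1,D:2);")
is.binary(tr)

# resolve polytomies at random; new internal branches have length zero
set.seed(1)
tr_resolved <- multi2di(tr)
is.binary(tr_resolved)
tr_resolved$edge.length
```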

Page 21: Phylogenetics in R

Resources
• Bodega Phylogenetics Wiki:
  – Home: http://bodegaphylo.wikispot.org/Front_Page
  – BROWNIE tutorial: http://bodegaphylo.wikispot.org/Morphological_Diversification_and_Rates_of_Evolution
  – Phylogenetic signal tutorial: http://bodegaphylo.wikispot.org/IV._Testing_Phylogenetic_Signal_in_R
• R phylo-wiki (from NESCent): http://www.r-phylo.org/wiki/HowTo/Table_of_Contents
• CRAN task view, Phylogenetics: http://cran.r-project.org/web/views/Phylogenetics.html
• rmesquite: https://r-forge.r-project.org/R/?group_id=213
• R-phylogenetics listserv: https://stat.ethz.ch/mailman/options/r-sig-phylo/