Phylogenetics in R
Scott Chamberlain
November 18, 2011
What sorts of phylogenetics things can I do in R?
The run down
• Get sequence data
• Align sequence data
• Phylogenetic inference
– NJ, maxlik, parsimony, Bayesian, UPGMA
• Visualize phylogenies
• Traits on trees
– Phylogenetic signal
– Trait evolution
– Ancestral state character reconstruction
• Tree simulations
• Get trees
• Phylogenetic community structure
• Bonus stuff: polytomy resolver
Basic trees in R
Example
require(ape)
tr1 <- read.tree(text = "(((B:0.05,C:0.05):0.01,D:0.06):0.04,A:0.1);")
tr1 # print tree summary
write.tree(tr1) # print tree in newick format
# "(((B:0.05,C:0.05):0.01,D:0.06):0.04,A:0.1);"
tr1$tip.label # tip labels
# "B" "C" "D" "A"
tr1$edge.length # edge lengths
# 0.04 0.01 0.05 0.05 0.06 0.10
tr1$node.label # node labels
# NULL (meaning: no node labels)
# Assign properties to trees
tr1$tip.label <- c('sleepy','happy','grumpy','frumpy') # label tips
tr1$tip.label # did it work?
# "sleepy" "happy" "grumpy" "frumpy"
Etcetera for other tree properties
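A few of those other tree properties, as a minimal sketch. It rebuilds the same example tree so it runs on its own; the node label names ("root", "n1", "n2") are made up for illustration.

```r
library(ape)

# Same example tree as above
tr1 <- read.tree(text = "(((B:0.05,C:0.05):0.01,D:0.06):0.04,A:0.1);")

Ntip(tr1)       # number of tips
Nnode(tr1)      # number of internal nodes
tr1$edge        # edge matrix: one row per branch (parent node, child node)
is.rooted(tr1)  # is the tree rooted?

# Node labels are assigned the same way as tip labels
# (these names are hypothetical, for illustration only)
tr1$node.label <- c("root", "n1", "n2")
write.tree(tr1) # the newick string now carries the node labels
```

Tips are numbered 1 to Ntip(tr1) in the edge matrix, and internal nodes are numbered from Ntip(tr1) + 1 upward.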
Get sequence data
# install and load ape
install.packages("ape"); require(ape)
# get data from GenBank
# make vector of accession numbers, for ITS 1 and 2 region for Gossypium (cotton) species
cotton_acc <- c("U56806", "U12712", "U56810", "U12732", "U12725", "U56786",
  "U12715", "AF057758", "U56790", "U12716", "U12729", "U56798",
  "U12727", "U12713", "U12719", "U56811", "U12728", "U12730",
  "U12731", "U12722", "U56796", "U12714", "U56789", "U56797",
  "U56801", "U56802", "U12718", "U12710", "U56804", "U12734",
  "U56809", "U56812", "AF057753", "U12711", "U12717", "U12723",
  "U12726")
# get data from GenBank
require(ape)
cotton <- read.GenBank(cotton_acc, species.names = TRUE)
# name the sequences with species names instead of accession numbers
names_accs <- data.frame(species = attr(cotton, "species"), accs = names(cotton))
names(cotton) <- attr(cotton, "species")
Align sequence data
run external: clustal, mafft
# multiple sequence alignment
### Get clustalw here, and install: http://www.clustal.org/
# set to your working directory
setwd("/path on your computer to/ClustalW2")
# write fasta file to directory
write.dna(cotton, "cotton.fas", format = "fasta")
# run clustal multiple alignment, prints clustal output to console
system(paste('"./clustalw2" cotton.fas')) # should work on OSX or Windows
# read the alignment back in to R
cotton_clustalaligned <- read.dna("cotton.aln", format = "clustal")
Manual alignment may have to be done, dare I say it, not in R
Get and align sequences: DIY
• Get together with a few other people…or not
– Choose some species to investigate
– Get their accession numbers on GenBank
– Download sequence data from GenBank
– If you are really adventurous, also align sequences
Phylogenetic inference Tools
R Packages: ape, phangorn, phyclust, phytools, scaleboot
• ape has the most functionality for phylogenetic inference
• You should be able to call MrBayes from R, but I don’t know how – package phyloch?
Phylogenetic inference
• Fitting evol models: see fxn modelTest in package phangorn
• NJ
install.packages("ape"); require(ape)
data(woodmouse)
trw <- nj(dist.dna(woodmouse))
plot(trw)
• Maximum likelihood
install.packages("phangorn"); require(phangorn)
data(Laurasiatherian)
dm <- dist.logDet(Laurasiatherian)
njtree <- NJ(dm)
MLfit <- pml(njtree, Laurasiatherian)
# optimize edge length parameter
MLfit_ <- optim.pml(MLfit, model = "GTR")
MLfit_$tree
plot(MLfit_$tree)
• Parsimony
install.packages("phangorn"); require(phangorn)
data(Laurasiatherian)
dm <- dist.logDet(Laurasiatherian)
tree <- NJ(dm)
treepars <- optim.parsimony(tree, Laurasiatherian)
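The modelTest function mentioned in the first bullet can be tried roughly like this. This is a sketch, not a recommended workflow: restricting the comparison to three models and switching off the gamma (G) and invariant-sites (I) variants is my own choice to keep the run short, and the column names used below (Model, AIC) are what I understand modelTest to return.

```r
library(phangorn)

data(Laurasiatherian)

# Compare a small set of substitution models.
# G = FALSE and I = FALSE skip the +G and +I variants (a shortcut for speed).
mt <- modelTest(Laurasiatherian,
                model = c("JC", "HKY", "GTR"),
                G = FALSE, I = FALSE)

mt                                        # one row per model, with logLik, AIC, BIC
best <- mt$Model[which.min(mt$AIC)]       # model with the lowest AIC
best
```

The chosen model name can then be passed to optim.pml via its model argument, as in the maximum likelihood example.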
Phylogenetic inference---Continued
• Bayesian
– You can do this (maybe) with the package phyloch (get here: http://www.christophheibl.de/Rpackages.html ), by calling MrBayes from R…
– …however, MrBayes is giving way to RevBayes (here: http://sourceforge.net/projects/revbayes/), fyi
Phylogenetic inference: DIY
• With your partners…or not
– Use the sequence data from GenBank you got earlier
– (if you didn’t align the sequences, don’t worry about it – OR use data set provided with ape or other package)
– Do some phylogenetic inference a couple of different ways (e.g., NJ and parsimony)
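One way the exercise could go, sketched with the woodmouse data set that ships with ape so no alignment step is needed. Using UPGMA as a third method and the Robinson-Foulds distance to compare topologies are my additions, not part of the exercise as stated.

```r
library(ape)
library(phangorn)

data(woodmouse)                    # aligned DNA sequences shipped with ape
dm <- dist.dna(woodmouse)          # pairwise distances

tr_nj    <- nj(dm)                 # neighbor joining
tr_upgma <- upgma(dm)              # UPGMA (from phangorn)

# Parsimony needs the data in phangorn's phyDat format
wm <- as.phyDat(woodmouse)
tr_mp <- optim.parsimony(tr_nj, wm)   # start the search from the NJ tree

# Parsimony score of each tree (lower means fewer changes)
score_nj <- parsimony(tr_nj, wm)
score_mp <- parsimony(tr_mp, wm)
c(NJ = score_nj, parsimony = score_mp)

# How different are the NJ and parsimony topologies?
rf <- RF.dist(tr_nj, tr_mp)
rf
```

A Robinson-Foulds distance of 0 would mean the two methods recovered the same topology.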
Visualize phylogenies
R Packages: ape, ade4, phytools, phylobase, ouch, paleoPhylo
# visualize phylogenies
install.packages("ape")
require(ape)
tree <- rcoal(10)
tree
plot(tree)
plot(tree, type = "cladogram")
plot(tree, type = "unrooted")
plot(tree, type = "radial")
plot(tree, type = "fan")
Visualize phylogenies: DIY
• Get together with a few other people…or not
– Use the tree you made, or use one provided with ape, or other packages
– Do basic plotting, e.g.: plot(mytree)
– Then see if you can
• color the branches
• label the branches with the edge lengths
• change the tip labels
• etc.
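One possible take on the three tasks above, as a rough sketch on a random tree. The tip names and colors are arbitrary choices for illustration.

```r
library(ape)

set.seed(1)
tree <- rcoal(6)

# Change the tip labels (hypothetical names)
tree$tip.label <- paste0("sp", 1:6)

# Color the branches: one color per row of the edge matrix.
# Terminal branches (those ending in a tip) are red, the rest grey.
edge_cols <- rep("grey40", nrow(tree$edge))
edge_cols[tree$edge[, 2] <= Ntip(tree)] <- "red"

plot(tree, edge.color = edge_cols, edge.width = 2)

# Label each branch with its (rounded) edge length
edgelabels(round(tree$edge.length, 2), frame = "none",
           adj = c(0.5, -0.3), cex = 0.7)
```

The edge colors follow the row order of tree$edge, which is why the tip test is done on the second (child) column of that matrix.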
Traits on trees: phylogenetic signal
R Packages: ape, picante, caper, phytools
Examples from picante and phytools:
# phylogenetic signal
install.packages("picante")
require(picante)
randtree <- rcoal(20)
randtraits <- rTraitCont(randtree)
Kcalc(randtraits[randtree$tip.label], randtree)
install.packages("phytools")
require(phytools)
tree <- rbdtree(1, 0, Tmax = 4) # make a tree
x <- fastBM(tree) # simulate traits
phylosig(tree, x, method = "lambda", test = TRUE) # calculate physig, lambda
phylosig(tree, x, method = "K", test = TRUE) # calculate physig, K
Traits on trees: modeling trait evolution
R Packages: ape, picante, caper, geiger, PHYLOGR, phytools, ade4, motmot
The packages above can model the evolution of discrete and continuous traits, under Brownian motion or OU models
See also:
• Rbrownie
• Various dev evol modeling frameworks to be included in geiger soon: auteur, mecca, medusa, and fossilmedusa, here: http://www.webpages.uidaho.edu/~lukeh/software/index.html
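As a rough illustration of comparing Brownian motion and OU fits for a continuous trait, here is a sketch using fitContinuous from geiger. The trait is simulated under BM, so this only shows the mechanics. Reading AICc from the $opt element is an assumption based on recent geiger versions; older versions return the fit in a different structure.

```r
library(ape)
library(geiger)

set.seed(42)
tree <- rcoal(30)                        # random ultrametric tree
x <- rTraitCont(tree, model = "BM")      # trait simulated under Brownian motion
                                         # (named by tip label)

# Fit two models of continuous trait evolution
fit_bm <- fitContinuous(tree, x, model = "BM")
fit_ou <- fitContinuous(tree, x, model = "OU")

# Compare by AICc (lower is better); assumes the $opt$aicc element exists
aicc <- c(BM = fit_bm$opt$aicc, OU = fit_ou$opt$aicc)
aicc
```

Because the data were simulated under BM, the extra OU parameter would usually not be expected to improve AICc, though any single simulated data set can behave differently.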
Ancestral state reconstruction
R Packages: ape, ouch, phytools
Function ‘ace’ in the ape package works nicely
But very sensitive to parameters
Example
data(bird.orders)
x <- rnorm(23)
out <- ace(x, bird.orders)
out$ace will have the ancestral character values (which you’ll have to match to nodes of your tree)
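A sketch of one way to match those values to nodes and show them on the tree. It assumes the estimates in out$ace come back in internal-node order (node numbers Ntip + 1 up to Ntip + Nnode), which is my understanding of ace's default method for continuous traits; check names(out$ace) if in doubt.

```r
library(ape)

data(bird.orders)
set.seed(1)
x <- rnorm(23)                 # random trait values, one per tip
out <- ace(x, bird.orders)     # default method for continuous traits

# Internal nodes are numbered after the tips
node_ids <- Ntip(bird.orders) + seq_len(Nnode(bird.orders))

# Plot the tree and write each estimate at its node
# (assumes out$ace is in node-number order, see above)
plot(bird.orders, cex = 0.7)
nodelabels(round(out$ace, 2), node = node_ids, frame = "none", cex = 0.6)
```

out$CI95 holds the confidence intervals for the same nodes, in the same order.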
Tree simulations
R Packages: TreeSim, geiger, ape, phybase
Example
require(ape)
tree <- rcoal(10) # Make a random tree
trait <- rTraitCont(tree, model = "BM") # Simulate a trait on that tree
# Write a function to make a tree, simulate a BM trait, and take the mean of that trait
myfunc <- function(n) {
  tree <- rcoal(n)
  trait <- rTraitCont(tree, model = "BM")
  mean(trait)
}
# do it 100 times and make a data.frame required for ggplot2 plotting
dat <- replicate(100, myfunc(10))
dat2 <- data.frame(dat)
# plot results
require(ggplot2)
ggplot(dat2, aes(dat)) + geom_histogram()
Get trees
rOpenSci’s treeBASE package
on CRAN: http://cran.r-project.org/web/packages/treebase/
install.packages("treebase") # install
require(treebase) # load
tree <- search_treebase("Derryberry", "author")[[1]] # search
metadata(tree$S.id) # metadata for tree
plot(tree) # plot the tree
Phylogenetic community structure
R Packages: picante (includes phylocom functionality)
-- Although not bladj, for some reason; talk to me if you want to run bladj from R
Example
Fxn ‘comdistnt’ calculates inter-community mean nearest taxon distance
data(phylocom)
comdistnt(phylocom$sample, cophenetic(phylocom$phylo), abundance.weighted = FALSE)
Also, new approach to phycommstruct in R from Matt Helmus, code here:
http://r-ecology.blogspot.com/2011/10/phylogenetic-community-structure-pglmms.html
Bonus: Polytomy resolver
MEE paper: “A simple polytomy resolver for dated phylogenies” by Kuhn, Mooers, and Thomas
– Paper: http://onlinelibrary.wiley.com/doi/10.1111/j.2041-210X.2011.00103.x/abstract
– Supp info has R scripts: http://onlinelibrary.wiley.com/doi/10.1111/j.2041-210X.2011.00103.x/suppinfo
Resources
• Bodega Phylogenetics Wiki:
– Home: http://bodegaphylo.wikispot.org/Front_Page
– BROWNIE tutorial: http://bodegaphylo.wikispot.org/Morphological_Diversification_and_Rates_of_Evolution
– Phylogenetic signal tutorial: http://bodegaphylo.wikispot.org/IV._Testing_Phylogenetic_Signal_in_R
• R phylo-wiki (from NESCent): http://www.r-phylo.org/wiki/HowTo/Table_of_Contents
• CRAN task view, Phylogenetics: http://cran.r-project.org/web/views/Phylogenetics.html
• rmesquite: https://r-forge.r-project.org/R/?group_id=213
• R-phylogenetics listserv: https://stat.ethz.ch/mailman/options/r-sig-phylo/