Page 1: Species Delimitation Plugin Manual v1 › plugins › 3rdparty › ... · Figure%1.%A%tree%documentand%its%associated%Species%Delimitation%option%panel.% Download(The%Species%Delimitation%plugin,%documentation%and%example%tree%can%be

Species Delimitation Plugin Manual

Authors

Brad Masters, Vicky Fan and Howard Ross
Bioinformatics Institute
University of Auckland

Contact: [email protected]

Purpose

The Species Delimitation plugin for the Geneious bioinformatics software (www.biomatters.com) is an exploratory tool that allows users to assess putative species in phylogenetic trees. The plugin summarises measures of phylogenetic support and diagnosability of species defined as user-selected collections of taxa on user-supplied trees, but it does not provide definitive support for species groups (Figure 1). The phylogenetic trees may be estimated using modules in Geneious, or with external applications.

Distinct clades in a tree are often interpreted as species. However, monophyly of a set of taxa can occur by chance within a larger panmictic group as a result of the coalescent process. The plugin implements the method of Rosenberg (2007) for calculating the probability of reciprocal monophyly under the null model of random coalescence.

Species boundaries are sometimes identified by deep divergences in phylogenetic trees estimated from single-gene sequence alignments. However, when a sample of sequences is collected from a panmictic population, one can sometimes observe a marked cladistic structure, arising solely from the stochastic process of gene coalescence. In these situations, if the distance from a species-defining node to the tips is much smaller than the distance from that species-defining node to its ancestral node, then we might mistakenly infer the presence of a cryptic species. The plugin implements a method developed by Rodrigo and colleagues (2008) to estimate the probability of observing such a divergence under the null hypothesis of coalescence acting on a single, panmictic population of constant size.

Diagnosability is an important criterion in species delimitation. Hebert and colleagues (2003) proposed that the ‘barcode gap’ provided evidence that single-gene sequences could provide reliable identification of most species. The relationship between genetic differentiation and the reliability of species identification was assessed in several different evolutionary scenarios in a simulation study by Ross and colleagues (2008). The plugin implements the findings of that study, to present the probability that a member of a putative species could be identified correctly given the current alignment as the reference dataset.

You can use the plugin to investigate different hypotheses of species boundaries. To do so, make several copies of a tree and then assign taxa to species differently in each.


Figure 1. A tree document and its associated Species Delimitation option panel.

Download

The Species Delimitation plugin, documentation and example tree can be downloaded from

www.cebl.auckland.ac.nz/~hros001/Software/SpDelim/

or the plugin can be installed directly from within Geneious (see Installation below).

Installation

The plugin requires Geneious, which can be installed from www.geneious.com.

To install the plugin:

• Start the Geneious program.

• From the Tools : Plugins menu, either:

o Select Install plugin from a gplugin file.

o Browse to the directory containing the plugin and select the plugin file. Geneious will install it.

  or:

o Select the Species Delimitation plugin from the list.

o Click the Install button associated with it.

• If Geneious prompts for a restart, restart it before continuing.

Using the Plugin

Getting Started: Once the Species Delimitation plugin is installed, you can use it by selecting a tree document, selecting Tree View, and in the right-hand sidebar selecting the Species Delimitation option panel (Figure 1). This panel displays all the information the Species Delimitation tool can access from the current tree document. Using the Species Delimitation tool will not alter the topology or branch lengths of your tree, but it will alter the node colourings.

We suggest that you first copy and paste the tree document, to give a working copy of the tree with its nodes coloured and statistics calculated.

Adding Species: Taxa (i.e., sequences, leaves, tips) belonging to the same species are indicated by a shared colour. You can assign taxa to species in two different ways:

1. Select a group of nodes, and in the Species Delimitation option panel enter a name and click Add Selection to define a species. The nodes will be given a randomly chosen colour.

2. The species may be defined in advance by using the Geneious Color Nodes function (Pro Version only). Select a set of nodes and assign the same colour to them. Alternatively, as you work on the tree, select a set of nodes, use the Color Nodes tool to give them a unique colour, click away from the nodes to deselect them, and click Reload Tree in the Species Delimitation option panel. All nodes of the same colour will be added to the same species.

Rename Selection: To change the name assigned to a species, or to assign a name to an unnamed species, select the entry from the list of species and click Rename Selection.

Redefining Species: Currently the plugin cannot add members to, or delete members from, an existing species set. To redefine a species, select the entry from the list of species and click Remove. Then reassign the taxa using the methods for adding species.

Reload Tree: This clears all of the species sets and redefines them based on the identically coloured groups in the tree. This is useful if species require specific colour settings. Colouring a group with the Geneious colouring tool does not by itself register that group as a species in the Species Delimitation tool; click Reload Tree to do so.

Save SpDelim Results: When you click Save SpDelim Results in the Species Delimitation option panel, the species sets, with their colourings, and the associated statistics are saved.

A table summarizing the statistics is created in the Species Delimitation tab in the tree document. Note that only one set of results can be stored with each tree document. If you want to investigate several scenarios, then copy and paste several copies of the tree document and develop a different scenario on each copy. To save the statistics to file, go to the Species Delimitation tab and click the Save to Text File button.

The Save SpDelim Results button will overwrite previous species delimitation results. It is possible to save the tree in its current state without updating the results by using the Geneious Save command (Ctrl+S, Cmd+S, or the Save icon); however, this will cause the Species Delimitation tab to become out of sync with the species groups on the tree. Clicking the Save SpDelim Results button again will resynchronize the tree and results.

Reset: This clears all of the colouring and species sets from the tree. Warning: This will also remove any colouring added to the tree prior to use of the plugin.

Example Tree

The tree shown in Figure 1 may be downloaded and imported into Geneious. Use the File : Import : From file command. You can then select the clades, define them as species, and check that you get the same results as illustrated. The tree was estimated using sequences deposited in GenBank by Milá and colleagues (2007). It is intended only to illustrate the use of this plugin and should not be considered an analysis of the delimitation of species within this group of organisms.

Interpreting the Results

The plugin reports several statistics that can be useful when considering where species might be delimited. The statistics presented are largely based on group-to-group comparisons. Consequently, at least two species must be defined in order to obtain results. The results are calculated dynamically with each addition or removal of a species group. Results can be viewed by selecting a species using the drop-down menu or by selecting all the nodes of a species on the tree itself. Selecting a monophyletic species group is easily accomplished by selecting its most recent common ancestor.

Monophyletic?: Whether the species is monophyletic or not. This property is important in determining whether it is possible to calculate P(Randomly Distinct), Clade Support or Rosenberg’s PAB, as these methods are only applicable to monophyletic groups. Note that species containing a single member are monophyletic by definition.
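The monophyly test can be illustrated with a minimal sketch. It assumes a rooted tree written as nested Python tuples of tip names; this representation and the function names `leaves` and `is_monophyletic` are hypothetical and are not taken from the plugin, whose internal implementation is not described in this manual.

```python
def leaves(node):
    """Return the set of tip names below a node (a str tip or a tuple of children)."""
    if isinstance(node, str):
        return {node}
    tips = set()
    for child in node:
        tips |= leaves(child)
    return tips


def is_monophyletic(tree, species):
    """True if some clade of the rooted tree contains exactly the given tips."""
    species = set(species)
    if leaves(tree) == species:
        return True
    if isinstance(tree, str):
        return False
    return any(is_monophyletic(child, species) for child in tree)


# Hypothetical example tree with three putative species A, B and C.
tree = ((("A1", "A2"), "A3"), (("B1", "B2"), "C1"))

print(is_monophyletic(tree, {"A1", "A2", "A3"}))  # all of clade A
print(is_monophyletic(tree, {"A1", "B1"}))        # tips from two clades
print(is_monophyletic(tree, {"C1"}))              # a single-member species
```

A single tip is always a clade on its own, which matches the note that single-member species are monophyletic by definition.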

Intra Dist: The average pairwise tree distance among members of the focal species. Larger values indicate that the members of the species are more diverse.

Inter Dist: The average pairwise tree distance between the members of the focal species and members of the next closest species. Larger values indicate the species groups are increasingly distinct.

Intra/Inter: The ratio of Intra Dist to Inter Dist. This provides a measure of genetic differentiation between the focal species and its nearest neighbouring species. Small values indicate that genetic differences within the focal species are small relative to differences between members of the focal species and members of the closest species.
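The three distance statistics can be sketched as follows, assuming the pairwise tree distances between tips are already available. The data layout, the function names and the example distances are hypothetical. Two further assumptions are made that the manual does not state: the "next closest species" is taken to be the one with the smallest mean distance to the focal species, and a single-member species is given an Intra Dist of 0.

```python
from itertools import combinations, product
from statistics import mean


def pair_dist(dist, x, y):
    """Look up the tree distance between two tips (order does not matter)."""
    return dist[frozenset((x, y))]


def intra_dist(dist, members):
    """Mean pairwise distance among the members of one species."""
    pairs = list(combinations(members, 2))
    # Assumption: a single-member species has no pairs, so report 0.0.
    return mean(pair_dist(dist, x, y) for x, y in pairs) if pairs else 0.0


def inter_dist(dist, focal, others):
    """Return (name, mean distance) for the species closest to the focal one."""
    means = {
        name: mean(pair_dist(dist, x, y) for x, y in product(focal, members))
        for name, members in others.items()
    }
    closest = min(means, key=means.get)
    return closest, means[closest]


# Hypothetical species assignments and tip-to-tip tree distances.
species = {"A": ["a1", "a2"], "B": ["b1", "b2"], "C": ["c1"]}
dist = {frozenset(("a1", "a2")): 0.02, frozenset(("b1", "b2")): 0.04}
for x, y in product(species["A"], species["B"]):
    dist[frozenset((x, y))] = 0.10
for x, y in product(species["A"] + species["B"], species["C"]):
    dist[frozenset((x, y))] = 0.30

focal = species["A"]
others = {name: tips for name, tips in species.items() if name != "A"}
intra = intra_dist(dist, focal)
closest, inter = inter_dist(dist, focal, others)
print(closest, intra, inter, intra / inter)
```

In this example the focal species A is compared with B rather than C, because B has the smaller mean distance to A.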

P ID(Strict): The mean probability, with the 95% confidence interval (CI) for the prediction, of making a correct identification of an unknown specimen of the focal species using placement on a tree and the criterion that it must fall within, but not sister to, the species clade.

P ID(Liberal): The mean probability, with the 95% confidence interval (CI) for the prediction, of making a correct identification of an unknown specimen of the focal species using BLAST (best sequence alignment), DNA Barcoding (closest genetic distance) or placement on a tree, with the criterion that it falls sister to or within a monophyletic species clade.

P ID(Strict) and P ID(Liberal) are based on the premise that the putative species is represented in the reference sequence alignment by the current member taxa and that any query or unknown sequence is drawn from a species having the same coalescent model. They are derived from the simulation results of Ross et al. (2008). Regression analysis was applied to these results to model the response variable, the probability of a correct identification, by the explanatory variable, the Intra/Inter genetic distance ratio. Separate regression analyses were performed for cases with specific numbers of reference sequences for a putative species. The cases were 1, 2, 3, 4, 5-6, 7-8, 9-11, 12-15, and 16-19 references per species. Separate models were developed for the Strict and Liberal criteria and these are applied to the taxa in the user’s tree.
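As an illustration of how the reference-count cases listed above partition the data, the following sketch maps a number of reference sequences to its case. The names `REFERENCE_CASES` and `reference_case` are hypothetical. The regression coefficients themselves are not given in this manual and are not reproduced here, and the manual does not say how the plugin treats species with more than 19 references, so the sketch simply returns None in that situation.

```python
# Cases from the manual: 1, 2, 3, 4, 5-6, 7-8, 9-11, 12-15 and 16-19 references.
REFERENCE_CASES = [
    (1, 1), (2, 2), (3, 3), (4, 4),
    (5, 6), (7, 8), (9, 11), (12, 15), (16, 19),
]


def reference_case(n_refs):
    """Return the (low, high) case containing n_refs, or None if no case applies."""
    for low, high in REFERENCE_CASES:
        if low <= n_refs <= high:
            return (low, high)
    # Assumption: behaviour outside the listed cases is unspecified in the manual.
    return None


for n in (1, 6, 10, 19, 25):
    print(n, reference_case(n))
```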

Av(MRCA): The mean distance between the most recent common ancestor of a species and its members.
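A minimal sketch of this average, assuming the clade rooted at the most recent common ancestor is written as nested lists of (child, branch length) pairs. The representation, the function names and the branch lengths are hypothetical and serve only to illustrate the definition.

```python
from statistics import mean


def tip_depths(node, depth=0.0):
    """Yield (tip name, distance from the starting node) for every tip below it.

    A node is either a tip name (str) or a list of (child, branch_length) pairs.
    """
    if isinstance(node, str):
        yield node, depth
        return
    for child, length in node:
        yield from tip_depths(child, depth + length)


def av_mrca(mrca_node):
    """Mean distance from the MRCA node to each of the tips below it."""
    return mean(d for _, d in tip_depths(mrca_node))


# Hypothetical clade: A1 and A2 share an internal node, A3 joins at the MRCA.
clade = [([("A1", 0.25), ("A2", 0.25)], 0.25), ("A3", 0.75)]

print(dict(tip_depths(clade)))
print(av_mrca(clade))
```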

P(Randomly Distinct): The probability that a clade has the observed degree of distinctiveness, i.e., has such a long subtending branch, due to random coalescent processes (Rodrigo et al., 2008). Focal groups with values between 0.05 and 1 represent groups that have branching events that would be expected under the coalescent model in a Wright-Fisher population and a strict molecular clock. We can only conclude that the focal group has branching significantly different from what we would expect under the coalescent process if the result is less than 0.05. A value this low indicates the possibility that a cryptic species is present, as the lineage is not conforming to a Wright-Fisher model.

Notes:

1. This calculation is only relevant if the tree is estimated under a strict molecular clock.

2. The formula used to calculate this statistic becomes unstable, due to computational issues with precision, when the number of individuals in the tree exceeds 40. It will still be useful for species groups with a deep lineage; however, shallow groups whose ancestor exceeds the 40th coalescent point will always return the value 1. This high value indicates that we cannot confirm or deny the assumption of a population acting under the Wright-Fisher model.

Clade Support: Bootstrap support or Bayesian posterior probability will be available for monophyletic species where the tree estimation technique has estimated the sequence support for the indicated clade. Larger values indicate stronger support.

Note that when the Bootstrap support is computed using Geneious then it will automatically be given the field name “Consensus support(%)” or “bootstrap proportion”. In a similar way, the Bayesian posterior probability is automatically named “Clade support” by Geneious. When these support values are computed by other applications, such as MrBayes, PAUP*, PhyML or PHYLIP, then they will have no associated field name. When the trees are imported into Geneious, then the field is given the generic name “label”. However, if you first import such a tree into FigTree, assign a different name to the support value field (e.g. “PostProb” or “BPP”) and then export the tree, the support values will have the new names assigned to them. Then if the tree is subsequently imported to Geneious, the support value will have the field name that you assigned. This plugin only supports the default field names used by Geneious (“Consensus support(%)”, “bootstrap proportion”, “Clade support” and “label”) and will be unable to summarize support values with other names. Nevertheless, you will be able to display such values on your tree by selecting the Show Node Labels option and selecting the appropriate field from the drop-down list.
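The field-name restriction described above can be illustrated with a small lookup sketch. It assumes the annotations on a node are available as a plain dictionary, which is a hypothetical representation. The order in which the four names are tried is also an assumption, since the manual lists the supported names but not any precedence among them.

```python
# Field names the manual says the plugin recognises.
SUPPORT_FIELDS = (
    "Consensus support(%)",
    "bootstrap proportion",
    "Clade support",
    "label",
)


def clade_support(node_fields):
    """Return (field name, value) for the first recognised support field, else None."""
    for name in SUPPORT_FIELDS:  # assumption: tried in the order listed above
        if name in node_fields:
            return name, node_fields[name]
    return None


print(clade_support({"Clade support": 0.98}))
print(clade_support({"label": 87}))
print(clade_support({"PostProb": 0.98}))  # renamed in FigTree: not recognised
```

A support value renamed in FigTree, such as “PostProb”, is not found by the lookup, which mirrors the limitation described in the text.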

Rosenberg’s PAB: The probability that a putative species A, represented by a sequences in a clade of a + b sequences, will be reciprocally monophyletic with the remaining b sequences under the null model of random coalescence.
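For orientation, when lineages join at random, the probability that a specified set of a lineages and the remaining b lineages are reciprocally monophyletic has the closed form 2 / ((a + b - 1) * C(a + b, a)), where C denotes the binomial coefficient. The sketch below evaluates that expression. It is assumed, not confirmed by this manual, that this is the quantity the plugin reports, so check it against Rosenberg (2007) before relying on it. The function name `reciprocal_monophyly_prob` is hypothetical.

```python
from math import comb


def reciprocal_monophyly_prob(a, b):
    """Probability that groups of a and b lineages are reciprocally
    monophyletic when lineages join at random (assumed null model)."""
    if a < 1 or b < 1:
        raise ValueError("a and b must both be at least 1")
    return 2 / ((a + b - 1) * comb(a + b, a))


for a, b in ((1, 1), (1, 2), (2, 2), (5, 5), (10, 10)):
    print(a, b, reciprocal_monophyly_prob(a, b))
```

The value falls quickly as a and b grow, so reciprocal monophyly of two well-sampled groups is unlikely to arise by chance under this null model.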

Technical Issues

Speed of operation is an issue with this plugin, especially for larger trees containing hundreds of taxa. The primary reason is that the plugin uses the tree as its data, not any underlying sequence alignment. In performing its calculations, the plugin must compute all pairwise distances on the tree. As the number of taxa (n) increases, the number of pairwise distances increases in proportion to n². This version introduces some changes to improve the performance of the plugin, and the effects should be noticed with larger trees.
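The growth in the number of pairwise distances can be made concrete with a short calculation. For n tips there are n(n - 1)/2 distinct pairs, which grows in proportion to n² for large n. The function name `pairwise_count` is hypothetical.

```python
def pairwise_count(n):
    """Number of distinct tip pairs among n tips: n * (n - 1) / 2."""
    return n * (n - 1) // 2


for n in (10, 100, 1000):
    print(n, pairwise_count(n))
```

A tenfold increase in the number of taxa therefore leads to roughly a hundredfold increase in the number of distances to compute.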

References

Hebert PDN, Cywinska A, Ball SL, deWaard JR (2003) Biological identifications through DNA barcodes. Proceedings of the Royal Society of London. Series B, Biological Sciences 270, 313-321.

Milá B, Smith TB, Wayne RK (2007) Speciation and rapid phenotypic differentiation in the yellow-rumped warbler Dendroica coronata complex. Molecular Ecology 16, 159-173.

Rodrigo AG, Bertels F, Heled J, Noder R, Shearman H, Tsai P (2008) The perils of plenty: what are we going to do with all these genes? Philosophical Transactions of the Royal Society London. Series B, Biological Sciences 363, 3893-3902.

Rosenberg NA (2007) Statistical tests for taxonomic distinctiveness from observations of monophyly. Evolution 61, 317-323.

Ross HA, Murugan S, Li WLS (2008) Testing the reliability of genetic methods of species identification via simulation. Systematic Biology 57, 216-230.

 

v 1.03

30 July 2010