Article
Fast Track

A Versatile and Highly Efficient Toolkit Including 102 Nuclear Markers for Vertebrate Phylogenomics, Tested by Resolving the Higher Level Relationships of the Caudata

Xing Xing Shen,1 Dan Liang,1 Yan Jie Feng,1 Meng Yun Chen,1 and Peng Zhang*,1
1Key Laboratory of Gene Engineering of the Ministry of Education, State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-Sen University, Guangzhou, China
*Corresponding author: E-mail: [email protected].
Associate editor: Xun Gu

Mol. Biol. Evol. 30(10):2235–2248 doi:10.1093/molbev/mst122 Advance Access publication July 4, 2013
© The Author 2013. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: [email protected]

Abstract
Resolving difficult nodes for any part of the vertebrate tree of life often requires analyzing a large number of loci. Developing molecular markers that are workable for the groups of interest is often a bottleneck in phylogenetic research. Here, on the basis of a nested polymerase chain reaction (PCR) strategy, we present a universal toolkit including 102 nuclear protein-coding locus (NPCL) markers for vertebrate phylogenomics. The 102 NPCL markers have a broad range of evolutionary rates, which makes them useful for a wide range of time depths. The new NPCL toolkit has three important advantages compared with all previously developed NPCL sets: 1) the kit is universally applicable across vertebrates, with a PCR success rate of 94.6% in 16 widely divergent tested vertebrate species; 2) more than 90% of PCR reactions produce strong and single bands of the expected sizes that can be directly sequenced; and 3) all cleanup PCR reactions can be sequenced with only two specific universal primers. To test its actual phylogenetic utility, 30 NPCLs from this toolkit were used to address the higher level relationships of living salamanders. Of the 639 target PCR reactions performed on 19 salamanders and several outgroup species, 632 (98.9%) were successful, and 602 (94.1%) were directly sequenced. Concatenation and species-tree analyses on this 30-locus data set produced a fully resolved phylogeny and showed that Cryptobranchoidea (Cryptobranchidae + Hynobiidae) branches first within the salamander tree, followed by Sirenidae. Our experimental tests and our demonstration for a particular case show that our NPCL toolkit is a highly reliable, fast, and cost-effective approach for vertebrate phylogenomic studies and thus has the potential to accelerate the completion of many parts of the vertebrate tree of life.

Key words: nuclear marker, phylogenomic, vertebrate, salamander, phylogeny.

Introduction
Building phylogenomic supermatrices with multiple nuclear loci has become the standard method of resolving species relationships in difficult biological scenarios (Delsuc et al. 2005). One efficient method of constructing multilocus data sets is expressed sequence tag (EST) (Philippe and Telford 2006; Dunn et al. 2008; Philippe et al. 2009) or transcriptome (Künstner et al. 2010) sequencing, in which high-quality RNA is extracted from each organism of interest and a huge number of ESTs or transcripts are then sequenced by Sanger or next-generation sequencing (NGS). However, this approach often generates patchy data sets with a high proportion of missing data, which may compromise phylogenetic inference (Lemmon et al. 2009; Roure et al. 2013). More importantly, this approach is not workable for many older collections because these specimens can only provide DNA samples. A second efficient way to construct multilocus data sets is the sequence capture method, in which target genomic regions are selectively captured by hybridization with probes before NGS (Crawford et al. 2012; Faircloth et al. 2012; Lemmon et al. 2012; McCormack et al. 2012). The most attractive feature of this method is that it can generate hundreds to thousands of loci for many samples in a short time. However, experimentally, the efficiency of sequence capture is considerably influenced by the divergence between probes and target sequences (Lemmon et al. 2012; McCormack et al. 2013). More importantly, turning the huge data set derived from sequence capture into sequences that researchers can analyze requires sophisticated bioinformatic processing, which is currently quite challenging to most phylogenetic researchers (McCormack et al. 2013). Therefore, although the sequence capture method is efficient and promising, its immaturity currently restricts its wide application in the community.

Currently, for vertebrate phylogenetics, the most widely used approach for building multilocus data sets is still conventional targeted polymerase chain reaction (PCR) and the sequencing of selected orthologous genes. However, the PCR-based method is laborious: 1) most practitioners spend much time developing and screening molecular markers that are workable for their studied taxa and suitable to their evolutionary timescale of interest (Murphy et al. 2001; Li et al. 2007; Townsend et al. 2008; Wright et al. 2008; Shen et al. 2011); 2) it requires PCR of each organism at
each locus, not to mention the extra effort involved in PCR optimization, gel purification, and cloning. On the other hand, the PCR-based method also has its advantages: 1) it is highly targeted and can produce nearly complete data matrices, and the data analysis process is straightforward and familiar to most empirical researchers; 2) it requires no prior genomic knowledge of the targeted organisms; and 3) it works with tiny amounts of DNA and thus appears to be an ideal solution when DNA samples are limited.

For most interspecific phylogenetic projects, nuclear protein-coding loci (NPCLs) that are developed on exons are likely the markers of choice for the PCR-based strategy because they provide an appropriate level of variation, easy alignment across a large phylogenetic span, and relatively straightforward detection of paralogs (Thomson et al. 2010). In this study, our aim was to develop a suite of universal NPCL markers and an efficient experimental protocol for vertebrate phylogenomics. Aimed at eliminating the drawbacks of the conventional PCR-based method, we designed our NPCL toolkit and protocol to 1) include approximately 100 NPCL markers (we think the economic transition from PCR to sequence capture is at approximately 100 loci; if more than 100 loci are to be used, the PCR method is not cost-efficient); 2) work for all major jawed vertebrate clades and provide good resolution at different evolutionary timescales; 3) produce single and strong amplicon bands without any PCR optimization in most cases; and 4) yield PCR products that can be directly cleaned and sequenced without gel purification or cloning in most cases.

Because our NPCL toolkit is designed for universal phylogenetic applications in vertebrates, it should be tested in a real case with some difficult samples. Salamanders are well known to have much larger genomes than most vertebrates (often 10 times the human genome; http://www.genomesize.com). The PCR-based method normally performs poorly for salamanders (personal experience and communication with colleagues). For example, Shen et al. (2011) amplified 22 NPCL markers in 16 tested vertebrates. In all 15 nonsalamander species, approximately 90% of the markers could be successfully amplified; however, for the tested salamander species, Batrachuperus yenyuanensis, only 8 of 22 NPCL markers (36%) could be amplified. Here, we apply our NPCL toolkit and protocol to address the higher level relationships of living salamanders as a test of the toolkit's utility. Our results demonstrate that the new universal NPCL toolkit and protocol are fast and effective in constructing multilocus data matrices for vertebrate phylogenomics.
Results
Experimental Performance and Characteristics of theNew NPCL Toolkit
The newly developed NPCL toolkit contains 102 NPCL markers ranging from 510 to 1,650 bp, with an average length of 1,050 bp; each NPCL marker comprises two pairs of primers for the nested PCR strategy (supplementary table S1, Supplementary Material online). These 102 NPCL markers are broadly distributed on 21 chromosomes of the human genome (fig. 1). We classified their PCR performance into three levels: 1) producing a single target band of the expected size; 2) producing a target band but also significant nonspecific bands; and 3) not producing a target band. The first two conditions are considered successful. The PCR performances of the 102 NPCL markers across 16 diverse vertebrate species and three representative electropherograms are shown in figure 2. Of the 102 NPCL markers, 57 have a 100% success rate in the 16 tested vertebrate species, 87 have a success rate of more than 90%, and the remaining 15 range from 56% to 88% (fig. 2). Of the 1,632 PCR reactions (102 loci × 16 taxa), 1,544 (94.6%) were successful, with 1,485 (91%) producing strong single target bands that can be used for direct sequencing. In the demonstration case, in which 30 NPCL markers were used to investigate the higher level relationships of living salamanders, 632 (98.9%) of the 639 target reactions were successful. Of the 632 successful reactions, 602 (95%) were directly sequenced with the general sequencing primers "Seq_F" and "Seq_R." The PCR success rates for each of the 102 NPCL markers across the 16 tested vertebrate species are shown in figure 3.
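The percentages reported above follow directly from the reaction counts. As a minimal sketch (the outcome labels and the function name are illustrative assumptions, not part of the published protocol), the three-level classification can be tallied as follows:

```python
from collections import Counter

# Illustrative labels for the three performance levels described in the text.
SINGLE = "single_band"        # level 1: single target band of the expected size
NONSPECIFIC = "nonspecific"   # level 2: target band plus nonspecific bands
FAILED = "no_band"            # level 3: no target band


def summarize(outcomes):
    """Tally PCR outcomes; levels 1 and 2 both count as successful."""
    counts = Counter(outcomes)
    total = len(outcomes)
    success = counts[SINGLE] + counts[NONSPECIFIC]
    return {
        "total": total,
        "success": success,
        "success_rate": 100.0 * success / total,
        "single_band_rate": 100.0 * counts[SINGLE] / total,
    }


# Rebuild the reported totals: 102 loci x 16 taxa = 1,632 reactions,
# of which 1,544 succeeded and 1,485 gave a single strong band.
outcomes = (
    [SINGLE] * 1485
    + [NONSPECIFIC] * (1544 - 1485)
    + [FAILED] * (1632 - 1544)
)
stats = summarize(outcomes)
print(f"success: {stats['success_rate']:.1f}%")          # 94.6%
print(f"single band: {stats['single_band_rate']:.1f}%")  # 91.0%
```

The same arithmetic applies to the salamander demonstration: 632 of 639 reactions is 98.9%, and 602 of 632 successful reactions is about 95%.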
The evolutionary rate, as evidenced by the degree of variability, is an important parameter of an NPCL marker because it determines applicability for different phylogenetic questions. Although our NPCL toolkit has a high PCR success rate in highly diverged taxa, that success does not mean that the NPCL markers in the toolkit are very conserved. As figure 3 illustrates, our toolkit includes NPCLs with a broad range of evolutionary rates (approximately 4-fold). Among the 102 NPCL markers, 60 evolve faster than RAG1, an NPCL that has been widely used for phylogenetic inference in various vertebrate groups. Because previous analyses based on RAG1 data resulted in highly resolved and robustly supported phylogenetic relationships at multiple hierarchical levels (San Mauro et al. 2005; Wiens et al. 2005; Hugall et al. 2007; Roelants et al. 2007), this indicates that our NPCL toolkit has the potential to resolve questions of both deep and shallow phylogeny.
It is well known that the fish-specific genome duplication occurred in the teleosts (Meyer and Van de Peer 2005). Although most duplicated genes were secondarily lost, some were retained or evolved new functions. For an NPCL marker, if there are two similar copies in teleost genomes, it is difficult to check the orthologous status of the obtained fragments. To address this, we used the zebrafish sequence of each NPCL as a Basic Local Alignment Search Tool (Blast) query against all available teleost genomes in the ENSEMBL database. If an NPCL receives more than two Blast hits and the top Blast score is not more than twice the second Blast score, that NPCL might have an extra copy in teleost genomes. Using this method, we found that only six of the 102 NPCLs (CXCR4, GLCE, KCNF1, LINGO1, NTN1, and PCDH10) may have extra copies in teleost genomes (fig. 3; supplementary table S1, Supplementary Material online). This result indicates that our NPCL toolkit is also suitable for phylogenetic inference in teleosts.
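The screening rule described above can be expressed compactly. The following sketch assumes the Blast scores for one NPCL query against one teleost genome have already been collected into a list; the function name and the example scores are hypothetical and serve only to illustrate the rule.

```python
def may_have_extra_copy(blast_scores):
    """Apply the rule from the text to the Blast scores of one query.

    An NPCL is flagged when it has more than two hits AND the top score
    is not more than twice the second-best score.
    """
    ranked = sorted(blast_scores, reverse=True)
    if len(ranked) <= 2:
        return False
    return ranked[0] <= 2 * ranked[1]


# Hypothetical scores, for illustration only.
print(may_have_extra_copy([900, 700, 120]))  # True: second hit is nearly as strong
print(may_have_extra_copy([900, 300, 120]))  # False: top hit is more than twice the second
print(may_have_extra_copy([900, 700]))       # False: not more than two hits
```

A flagged marker is only a candidate for an extra copy; the rule is a quick filter on score ratios rather than a formal test of orthology.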
Phylogenetic Performance in a Real Case
Our demonstration case included 19 salamander species that span salamander evolutionary diversity (supplementary table S2, Supplementary Material online). The nine outgroup species (two frogs, two caecilians, one turtle, one bird, two mammals, and one coelacanth) provided a largely balanced representation of relatives of salamanders. The 30 newly amplified NPCLs exhibited levels of variation comparable with that of the traditional RAG1 gene, with variable sites varying between 30% and 51% of all sequenced sites (table 1). The data set combining these 30 NPCLs comprises 27,834 bp and exhibits little substitution saturation (supplementary fig. S1, Supplementary Material online). The phylogenetic analyses of the concatenated data set using three tree-building methods (maximum likelihood [ML], Bayesian, and CAT-mixture model) produced an identical, fully resolved tree for 28 taxa (fig. 4). In all 25 nodes of the tree, the statistical support was highly robust (BPML = 99–100, PPBAY = 1.0, PPCAT = 1.0). The species tree estimated from 30 individual NPCLs without data concatenation using the pseudo-ML approach is identical to those estimated from the concatenated analyses. All nodes received bootstrap support values varying between 74 and 100 (fig. 4). We also conducted phylogenetic analyses at the amino acid level (9,278 deduced amino acid residues) using three tree-building methods (ML, Bayesian, and CAT-mixture model). The protein tree topology is identical to the DNA result, with just slightly lower branch support for some nodes (supplementary fig. S2, Supplementary Material online). Therefore, we did not further analyze the protein data set.
FIG. 1. Chromosome mapping of the 102 NPCL markers in the Homo sapiens genome.
[Figure not reproduced: ideograms of chromosomes 1–22 and X, with each marker name placed at its map position.]

FIG. 2. PCR performance of the 102 NPCL markers in 16 divergent vertebrate species. Each square and electrophoretic lane is aligned with the tested species. (a) The draft divergence timescale for the 16 tested vertebrate species is based on Inoue et al. (2010) and the book The Timetree of Life. (b) The PCR performance of each NPCL marker is ranked by three different-colored squares: black, producing a single target band; gray, having a target band but with significant nonspecific bands; white, no target band. The 102 NPCL markers are sorted according to their PCR success rates. (c) Three typical agarose electrophoresis results for 9 NPCL markers.
[Figure not reproduced. Recoverable labels: time axis in Ma (0 to 457); gel size markers at 4,500, 1,200, and 200 bp; tested genera Sphyrna, Pangasius, Lepisosteus, Protopterus, Ichthyophis, Batrachuperus, Rana, Mus, Struthio, Zosterops, Crocodylus, Trionyx, Podocnemis, Sus, Hemidactylus, and Naja; panel c markers CAND1, HYP, IRS1, FAT2, CELSR3, RERE, SVEP1, DNAH3, and GRM2. Panel b marker labels, in extraction order: ADNP, ANKRD50, ARSI, BPTF, CAND1, CASR, CXCR4, DBC1, DISP1, DISP2, DNAH3, DOPEY1, ENC1, EXTL3, FAT1, FAT4, FICD, FLRT3, FZD4, GGPS1, GLCE, GRIN3A, GRM2, KBTBD2, KCNF1, KIAA2013, LIG4, LINGO1, LPHN2, LRRN1, MB21D2, MIOS, MYCBP2, NHS, P2RY1, PANX2, PIK3CG, RAG1, RAG2, ROR2, SACS, SALL1, SETBP1, SOCS5, SPEN, STON2, VPS18, ZEB1, ZFPM2, ZHX2, HYP, CHST1, RERE, SVEP1, LCT, PDP1, MED13, LINGO2, LRRC8D, LRRTM4, RP2, SH3BP4, VCPIP1, DET1, FREM2, MSH6, PCLO, PPL, ARID2, DCHS1, DSEL, FILIP1, KIAA1239, SLITRK1, CPT2, MGAT4C, FEM1B, DMXL1, DOLK, ZBED4, REV3L, IRS1, FAT2, CILP, FUT9, LRRN3, TTN, GPER, WFIKKN2, MED1, EXOC8, B3GALT1, CELSR3, DSE, EVPL, PCDH1, TLR7, MOS, PCDH10, NTN1, SYNE1, ZIC1.]

The monophyly of extant amphibians with respect to amniotes and the close relationship between frogs and salamanders (the Batrachia hypothesis) are repeatedly recovered in most recent molecular studies (San Mauro et al. 2005; Zhang et al. 2005; Frost et al. 2006; Hugall et al. 2007; Roelants et al. 2007; Zhang and Wake 2009; San Mauro 2010; Pyron and Wiens 2011). However, a recent molecular study based on 26 nuclear genes (Fong et al. 2012) supports a caecilian–salamander sister relationship with the possible paraphyly of extant amphibians. Our phylogenetic analyses based on 30 nuclear genes provide further support for the monophyly of lissamphibians and the Batrachia hypothesis (fig. 4). Additionally, all possible hypotheses against the monophyly of extant amphibians and the Batrachia hypothesis were rejected in our topological tests (table 2). However, the Batrachia hypothesis did not receive strong support in our species tree analysis (BPMP-EST = 74; fig. 4), suggesting that more nuclear genes are still needed to resolve this node.
The monophyly of the internally fertilizing salamanders (Salamandroidea; all salamanders exclusive of Hynobiidae, Cryptobranchidae, and Sirenidae) is strongly supported in our analyses (fig. 4), in line with most recent molecular studies (Wiens et al. 2005; Roelants et al. 2007; Zhang and Wake 2009; Pyron and Wiens 2011) but differing strongly from Frost et al. (2006), who recovered a clade comprising Sirenidae, Dicamptodontidae, Ambystomatidae, and Salamandridae. The internally fertilizing salamanders include two well-supported clades: one is composed of Ambystomatidae, Dicamptodontidae, and Salamandridae, and the other of Proteidae, Rhyacotritonidae, Amphiumidae, and Plethodontidae (fig. 4).

Currently, two hypotheses have been proposed regarding the basal split within living salamanders. The traditional view favors Sirenidae as the sister group to all remaining salamanders (Duellman and Trueb 1994). This hypothesis received strong support in two recent studies (based on mitochondrial genomes, BPML = 98, Zhang and Wake 2009; based on mitochondrial genomes and nuclear genes, BPML > 80, San Mauro 2010). In contrast, some studies suggest that the basal split separates Cryptobranchidae + Hynobiidae from all other salamanders (Gao and Shubin 2001; Wiens et al. 2005; Frost et al. 2006; Roelants et al. 2007; Pyron and Wiens 2011), but always without strong support (BPML < 71). Our phylogenetic analyses based on 30 independent NPCLs supported the second hypothesis, that Cryptobranchoidea (Cryptobranchidae + Hynobiidae) branched first within the living salamanders. This result is extremely robust in our concatenation analyses (BPML = 99, PPBAY = 1.0, PPCAT = 1.0; fig. 4) and statistically rejects all alternative hypotheses (table 2). In the species tree analysis without data concatenation, this result is also strong (BPMP-EST = 83; fig. 4).
FIG. 3. Relative evolutionary rates of 102 NPCLs in vertebrates. The 102 NPCLs are arranged in order of increasing variability on the right side, and their PCR success rates in the 16 tested vertebrates are shown on the left side. NPCLs indicated with asterisks may have extra copies in teleost genomes and thus are not suitable for phylogenetic studies of teleosts.
[Figure not reproduced. Axes: PCR success rate in 16 vertebrates (%), left; relative evolutionary rate, right.]

How many nuclear genes, then, are needed to robustly resolve the question of the basal split within living salamanders? Our analysis of data subsets indicates a progressive increase in the bootstrap support value for the node of interest (fig. 4) when an increasing number of genes are analyzed (fig. 5). Analyses based on single genes rarely resolve the node of interest with any confidence. Analyses based on 5–10 genes produce bootstrap support values of 60–80 in concatenation analyses (fig. 5), which is congruent with all previous nuclear studies using similar-sized data sets (Roelants et al. 2007; Pyron and Wiens 2011). Taking a bootstrap value of 95 in concatenation analyses as the threshold of "fully resolved," the minimum number of nuclear genes needed to resolve the root of the salamander tree is approximately 25. The previous contradictory results may be due to the overwhelmingly strong signals from the mitochondrial genome. Because the initial diversification of salamanders occurred within a relatively short window of time (Weisrock et al. 2005), the genealogical histories of individual gene loci may sometimes appear misleading in terms of the relationships among species due to incomplete lineage sorting. Unfortunately, the mitochondrial genome recorded such an incorrect history.
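The threshold reasoning in this paragraph can be illustrated with a short sketch. It assumes that bootstrap support for the node of interest has already been estimated for replicate gene subsets of each size; the function name is hypothetical, and the numbers below are invented placeholders, not the values behind figure 5.

```python
from statistics import mean


def min_genes_for_support(support_by_size, threshold=95.0):
    """Return the smallest subset size whose mean bootstrap support
    for the node of interest reaches the threshold, or None."""
    for size in sorted(support_by_size):
        if mean(support_by_size[size]) >= threshold:
            return size
    return None


# Placeholder values for illustration only (NOT data from the study):
# keys are numbers of genes, values are bootstrap supports of replicate subsets.
example = {
    1: [20, 35, 50],
    5: [58, 66, 70],
    10: [72, 80, 78],
    20: [90, 93, 92],
    25: [95, 96, 97],
    30: [99, 99, 99],
}
print(min_genes_for_support(example))  # 25
```

Using the mean across replicates is one possible summary; a stricter criterion, such as requiring every replicate to pass the threshold, would give a more conservative estimate.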
Discussion
The NPCL toolkit and experimental protocol introduced here are a highly reliable, rapid, and cost-effective method for building medium-scale multilocus data sets to produce high-resolution phylogenetic relationships. This phylogenomic approach has the potential to accelerate the completion of many parts of the vertebrate tree of life because no further marker development, which is often the bottleneck in phylogenetic research, is required. Once a specific phylogenetic question within vertebrates arises, researchers simply need to check the list for our toolkit and look for NPCL markers with the expected evolutionary rates and experimental performance for their groups of interest. Then, many orthologous loci can be quickly obtained by traditional PCR and Sanger sequencing, usually without time-consuming gel cutting and cloning. Applying the NPCL toolkit may also have another benefit for assembling the vertebrate tree of life: because people working on different groups can easily use the same set of loci, combined analyses will be facilitated.
Merits of the Toolkit
Because of the use of the nested PCR strategy outlined earlier, most NPCLs in the toolkit work for all major jawed vertebrate groups with high experimental success rates (normally > 95%). Such results were achieved under unified PCR conditions without any extra effort involving cycling condition optimization. This feature of the toolkit enables it to surpass previously developed nuclear marker sets (Murphy et al. 2001; Li et al. 2007; Thomson et al. 2008; Townsend et al. 2008; Wright et al. 2008; Portik et al. 2011; Shen et al. 2011; Zhou et al. 2011). Most previous nuclear marker sets were developed for specific animal groups, and their application to other distantly related groups is usually difficult. For example, Spinks et al. (2010) collected 120 nuclear markers from avian, squamate, and mammalian phylogenetic studies and evaluated their PCR performance in turtles. They found that only eight nuclear markers successfully produced single expected bands across 13 tested turtle species. In another case, Fong and Fujita (2011) developed 75 nuclear markers for vertebrate phylogenetics, but approximately 60% of the target fragments could not be obtained in three test species (two reptiles and one lissamphibian). Therefore, although the nested PCR method introduced here requires an additional PCR reaction, the extra work is still worthwhile.

FIG. 4. Higher-level phylogenetic relationships of 10 salamander families inferred from 30 NPCL markers. The tree was inferred by concatenation analyses using ML, BI, and the mixture model (CAT) and by species-tree analysis using the pseudo-ML approach (MP-EST). Branch support values are indicated beside nodes in order of ML bootstrap (BPML), BI posterior probability (PPBI), CAT posterior probability (PPCAT), and MP-EST bootstrap (BPMP-EST) from left to right. The filled squares represent BPML > 95, PPBAY = 1.0, PPCAT = 1.0, and BPMP-EST > 95. The circled number refers to the node of interest studied in figure 6. Branch lengths are from the ML analysis.
[Tree not reproduced. Recoverable labels: 30 nuclear genes (total 27,834 bp); scale bar, 0.1 substitutions/site; support values 99/1.0/1.0/83 and 99/1.0/1.0/74 printed at two nodes; clade labels Cryptobranchoidea, Salamandroidea, Anura, Gymnophiona, and non-amphibian outgroup; family labels Plethodontidae, Amphiumidae, Rhyacotritonidae, Proteidae, Salamandridae, Dicamptodontidae, Ambystomatidae, Sirenidae, Hynobiidae, and Cryptobranchidae. Tip taxa: Aneides hardii, Plethodon jordani, Batrachoseps major, Eurycea bislineata, Amphiuma means, Rhyacotriton variegatus, Proteus anguinus, Necturus beyeri, Tylototriton asperrimus, Cynops orientalis, Salamandra salamandra, Dicamptodon aterrimus, Ambystoma mexicanum, Pseudobranchus axanthus, Siren intermedia, Ranodon sibiricus, Batrachuperus yenyuanensis, Onychodactylus fischeri, Andrias davidianus, Silurana tropicalis, Bombina fortinuptialis, Typhlonectes natans, Ichthyophis bannanicus, Gallus gallus, Homo sapiens, Chrysemys picta bellii, Mus musculus, Latimeria chalumnae.]

Table 2. Statistical Confidence (P Values) for Alternative Branching Hypotheses Based on the 30-Gene Data Set.

Alternative Topology Tested                                         ΔLn L    P Value                   Rejection
Sirenidae is sister to Cryptobranchoidea                            42.3     0.004    0.002    0.004   + + +
Gymnophiona is sister to Anura (monophyletic lissamphibians)        34.3     0.013    0.012    0.013   + + +
Gymnophiona is sister to Caudata (monophyletic lissamphibians)      43.9     0.002    0.001    0.001   + + +
Anura is sister to Amniota (paraphyletic lissamphibians)            144.6    5E−30    0        0       + + +
Gymnophiona is sister to Amniota (paraphyletic lissamphibians)      129.0    1E−69    0        0       + + +
Caudata is sister to Amniota (paraphyletic lissamphibians)          172.8    0.0001   0        0       + + +

Table 1. Summary Information for the 30 NPCLs Amplified in 19 Salamander Taxa.
NOTE: Length, length of refined alignment; Var. sites, variable sites; PI sites, parsimony informative sites; RCV, relative composition variability.
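The summary columns defined in the note to table 1 (variable sites, parsimony informative sites, and RCV) can be computed from any alignment. The sketch below is illustrative only: the toy alignment is invented, gaps and ambiguity codes are simply ignored, and RCV is computed with the commonly used formula (the summed absolute deviation of each taxon's nucleotide counts from the across-taxon mean, divided by the number of taxa times the number of sites), which is assumed here to match the definition used for table 1.

```python
from collections import Counter

BASES = "ACGT"


def alignment_stats(seqs):
    """Count variable and parsimony-informative sites and compute RCV
    for a list of equal-length aligned DNA sequences."""
    n_taxa, n_sites = len(seqs), len(seqs[0])
    variable = informative = 0
    for column in zip(*seqs):
        counts = Counter(b for b in column if b in BASES)
        if len(counts) > 1:
            variable += 1
            # Informative: at least two states, each present in >= 2 taxa.
            if sum(1 for c in counts.values() if c >= 2) >= 2:
                informative += 1
    composition = [Counter(s) for s in seqs]
    deviation = 0.0
    for base in BASES:
        mean_count = sum(c[base] for c in composition) / n_taxa
        deviation += sum(abs(c[base] - mean_count) for c in composition)
    rcv = deviation / (n_taxa * n_sites)
    return {"length": n_sites, "var_sites": variable,
            "pi_sites": informative, "rcv": rcv}


# Invented toy alignment (four taxa, six sites), for illustration only.
toy = ["ACGTAC", "ACGTTC", "ATGTTC", "ATGAAC"]
result = alignment_stats(toy)
print(result["var_sites"], result["pi_sites"])  # 3 2
```

In the toy alignment, sites 2 and 5 are parsimony informative, whereas site 4 is variable but carries a change in a single taxon only.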
In PCR-based phylogenetic projects, even when the PCR reactions are successful, the products often contain significant nonspecific amplicons. Such a condition requires additional effort involving gel purification and cloning, which takes much more time than the PCR reaction itself. Our NPCL toolkit is specifically designed to solve this problem, so that normally over 90% of PCR reactions produce strong and single expected bands. Moreover, most of the primers used to date for nuclear marker sets are degenerate and thus are not suitable for directly sequencing PCR products. Benefiting from the use of our nested PCR strategy, we introduce anchoring sequences to the ends of PCR fragments while maintaining PCR efficiency. Such introduced anchoring sequences bring the added benefit that two specific sequencing primers (Seq_F and Seq_R) can be used in all Sanger sequencing reactions.
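Why anchoring sequences allow every product to be read with the same two primers can be shown with a toy model. In the sketch below, the anchor sequences and the target fragment are hypothetical placeholders (they are not the published Seq_F and Seq_R sequences), and the second-round product is modeled simply as the target flanked by the two anchors.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


# Hypothetical anchors, placeholders only (not the published primers).
ANCHOR_F = "AGGTCACTGACGTTCAGC"
ANCHOR_R = "TCGACTGGATCCAGTACG"


def tailed_amplicon(target):
    """Top strand of a second-round product whose inner primers carry
    5' anchors; `target` spans both inner primer sites."""
    return ANCHOR_F + target + revcomp(ANCHOR_R)


# Invented target fragment, for illustration only.
amplicon = tailed_amplicon("ATGGCCAAGCTGACCTTCGA")

# A primer identical to ANCHOR_F matches the start of the top strand,
# and one identical to ANCHOR_R matches the start of the bottom strand,
# whatever locus lies between the anchors.
print(amplicon.startswith(ANCHOR_F))           # True
print(revcomp(amplicon).startswith(ANCHOR_R))  # True
```

Because the anchors are fixed and nondegenerate, the sequencing primers stay the same for every locus even when the locus-specific parts of the PCR primers are degenerate.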
One additional feature of our NPCL toolkit is that the average length of the NPCLs within it is 1,050 bp, a length that can easily be amplified in one PCR reaction and sequenced in both directions to allow efficient use of resources. In contrast, the average marker lengths are 920 bp for 10 NPCLs in Li et al. (2007), 760 bp for 26 NPCLs in Townsend et al. (2008), 873 bp for 22 NPCLs in Shen et al. (2011), and 470 bp for 75 NPCLs in Fong and Fujita (2011). Longer markers provide more sites than shorter ones for equivalent money and time. This feature makes our NPCL toolkit more cost-effective than previously developed nuclear marker sets.
Phylogenetic Utility
The vertebrate NPCL toolkit we developed here shows great promise in terms of phylogenetic utility. A remarkable feature of our NPCL toolkit is that it provides 102 NPCLs with a broad range of evolutionary rates. In our demonstration case, we used 30 NPCLs to resolve a family-level salamander phylogeny using both traditional concatenation analyses and a more promising species-tree analysis. However, this example does not mean that our toolkit performs well only on deep-timescale questions. Our ongoing study using this toolkit to resolve the relationships within Plethodontidae, a rapidly radiating group of salamanders, suggests that the toolkit developed here also performs well in resolving species-level phylogenies. For many vertebrate groups in which applicable nuclear markers are limited, such as some teleosts, frogs, and salamanders, our NPCL toolkit can provide a one-stop solution for phylogenetic studies from the family level to the species level. Even for those groups in which specific nuclear marker sets have been developed, our toolkit is still worth trying, as many more loci can be easily obtained that may resolve some difficult branches.
The Toolkit Is a Good Addition to Sequence Capture Approaches
Recently, sequence capture approaches have been applied to vertebrate phylogenomics (Crawford et al. 2012; Faircloth et al. 2012; Lemmon et al. 2012; McCormack et al. 2012). These approaches begin with the selective capture of genomic regions. Briefly, fragmented gDNA is hybridized to DNA or RNA probes, either on an array or in solution. Nontargeted DNA is then washed away, and the targeted DNA is sequenced through NGS. The most promising feature of the sequence capture approach is that it can simultaneously produce hundreds to thousands of loci for tens of individuals within a relatively short time. Therefore, the sequence capture approach is considered to be much more cost-effective than the PCR-based method. According to the calculation of Lemmon et al. (2012), for a 100 taxa × 500 loci project, the cost of the sequence capture method is just 1–3.5% of that of the PCR-based method.
However, the sequence capture approach is currently too challenging for most phylogenetic researchers. Typical NGS runs (454 or Illumina) used by the sequence capture method generate 1,000,000–2,000,000,000 sequences. Storing and processing these NGS data require significant computer memory, hardware upgrades, and bioinformatic programming skills, which are unfamiliar to most phylogenetic researchers. Moreover, phylogenetic reconstruction assumes that orthologous genes are being analyzed across species. For the PCR-based method, the detection of paralogous genes is relatively straightforward. However, in the sequence capture method, the captured genomic regions comprise short conserved cores (probe regions) and long unconserved flanking
[FIG. 5 plot: bootstrap support (%) versus number of genes (5–30), with separate curves for concatenation analyses and species-tree analyses.]
FIG. 5. The effect of increasing the number of nuclear loci on resolving the basal split within salamanders. Each data point represents the mean of support values estimated from 30 randomly sampled subsets. The dashed line indicates the threshold of 95% bootstrap support values. The statistical plots show that the minimum number of nuclear loci needed to robustly resolve the basal split within salamanders is 25.
sequences. Because paralogy cannot be detected until after the data are aligned, those unalignable sequences will make the detection of paralogy more difficult.
In fact, not every phylogenetic project will use more than 500 loci, as the sequence capture method normally does. Based on both empirical and simulation data, 20–50 loci are generally sufficient to answer many phylogenetic questions (Rokas et al. 2003; Spinks et al. 2009). This is also the number of loci that most phylogenetic studies will use. In such a situation, adopting the sequence capture method is not cost-effective, because researchers need to use relatively expensive NGS sequencing and spend time learning new experimental techniques and carrying out sophisticated bioinformatic processing. Our NPCL toolkit is specially designed for such medium-scale phylogenetic projects using approximately 50 loci. Such a number of loci can easily be obtained with our 102 NPCLs. Because more than 90% of the PCR reactions generated by our toolkit can be directly sequenced, the average cost for one locus per sample is rather low. In our laboratory, generating one new sequence typically costs US$3 (without considering labor).
In addition, researchers sometimes have only tiny amounts of DNA but wish to perform a multilocus phylogenetic analysis. In such a situation, the sequence capture method is difficult to implement because it normally requires DNA at the microgram level (Lemmon et al. 2012). Our NPCL toolkit can fill the gap here. Benefiting from the use of the nested PCR strategy, the sensitivity of PCR reactions in our method is extremely high. In many test experiments in our laboratory, the toolkit and protocol could produce target bands with only 5–10 ng of DNA.
Our NPCL toolkit is an alternative to the sequence capture method for the everyday work of phylogenetic researchers. Which method to choose depends on two major drivers: the amount of DNA and the expected number of loci. When DNA is limited, the better solution may be PCR; otherwise, sequence capture also works. Taking into account the money and time the two methods require, we speculate that the economic transition point from PCR to sequence capture is at approximately 100 loci. That assessment is why our toolkit includes 102 NPCL markers. Our proposal is that when using ≤100 loci, one can try our NPCL toolkit; when using >100 loci, sequence capture should be used.
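The rule of thumb in this paragraph can be written as a small decision function. The following is only a sketch: the function name is hypothetical, the 100-locus threshold is the speculative transition point given above, and the 1,000 ng cutoff is an assumed stand-in for "DNA at the microgram level".

```python
def suggest_method(n_loci: int, dna_ng: float,
                   loci_threshold: int = 100,
                   capture_min_dna_ng: float = 1000.0) -> str:
    """Suggest 'nested PCR' or 'sequence capture' for a project.

    loci_threshold     -- speculative economic transition point (~100 loci)
    capture_min_dna_ng -- ASSUMED minimum DNA for sequence capture (1 microgram)
    """
    # Limited DNA favors PCR regardless of the number of loci.
    if dna_ng < capture_min_dna_ng:
        return "nested PCR"
    # With enough DNA, the expected number of loci decides.
    return "nested PCR" if n_loci <= loci_threshold else "sequence capture"


print(suggest_method(n_loci=30, dna_ng=5000))   # medium-scale project
print(suggest_method(n_loci=500, dna_ng=5000))  # large-scale project
print(suggest_method(n_loci=500, dna_ng=10))    # large project, tiny DNA sample
```

Both thresholds are parameters, so they can be moved as sequencing costs and capture protocols change.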
Future Directions
In this study, we used multiple genome alignments deposited in the University of California–Santa Cruz (UCSC) Genome Browser to identify long and conserved exons across jawed vertebrates. Benefiting from the use of a nested PCR strategy, the experimental performance of the developed NPCLs indicated that they are highly stable in all major jawed vertebrate groups. Recently, a database for mining exon and intron markers called EvolMarkers has been built by Li et al. (2012). Careful investigation of this database may identify many conserved exons within nonvertebrates, whose interrelationships are currently more problematic than those of vertebrates. Because the nonvertebrates constitute many distantly related groups, it may be impossible to develop a single set of PCR primers for all nonvertebrates. However, following a similar marker development strategy, multiple NPCL toolkits could be constructed for various groups of nonvertebrates, such as arthropods, echinoderms, and molluscs. In addition, because introns are flanked by conserved exons, the idea of using nested PCRs for marker development could also be applied to the development of EPIC (exon-primed intron-crossing) markers, which are more suitable for shallow-scale phylogenetic or phylogeographic projects.
Despite the benefits of our proposed method, it must be recognized that when handling large-scale projects, such as 200 taxa × 100 loci, the use of our toolkit and Sanger sequencing will still require significant cost, time, and labor. An alternative solution is to use NGS to replace Sanger sequencing. Recently, 454 NGS technology has been applied to sequence targeted gene regions from a pool of PCR products from different specimens (Binladen et al. 2007; Meyer et al. 2008). In such experiments, specific tagging sequences must be added to amplicons by either PCR (Binladen et al. 2007) or blunt-end ligation (Meyer et al. 2008). Therefore, if the tailing sequences of the second-round PCR primers in our NPCL toolkit are replaced by tagging sequences (for tag design, see Faircloth and Glenn 2012), all PCR products can be pooled together and sequenced with 454 NGS, which will greatly reduce the money and time cost compared with Sanger sequencing. However, parallel tagged sequencing via NGS does not circumvent the process of PCR for each individual at each locus, which may be the most onerous part of a large-scale phylogenomic project. Some promising new technologies may help to solve this problem, such as microdroplet PCR (Tewhey et al. 2009), where millions of individual PCR reactions are performed in picoliter-scale droplets simultaneously, and the 96.96 Dynamic Array by Fluidigm, which allows 96 primer combinations to be used on 96 samples (9,216 total PCR reactions) on a single PCR plate. However, there has been little research on applying NGS and new high-throughput PCR technologies to phylogenomics, so their ease of use and cost-effectiveness still need to be explored.
Summary
In conclusion, we have developed an improved method for rapidly amplifying and sequencing NPCLs that has proven to be useful and effective for molecular phylogenetic studies of vertebrates. The newly developed toolkit provides an attractive alternative to available methods for vertebrate phylogenomics.
Materials and Methods
Development of NPCL and Primer Design
Our previous study showed that nested PCR is overwhelmingly more effective than conventional PCR for obtaining target amplicons from complex genomic environments (Shen et al. 2012). However, nested PCR requires four conserved regions to design two pairs of primers (illustrated in fig. 6; yellow blocks represent the conserved regions used for primer design), which means that only relatively long exons are
suitable as candidates for NPCL development with the nested PCR method. To search for long and conserved exons, we took advantage of our previous bioinformatic method, which used the multiple genome alignment data from the UCSC Genome Browser to identify conserved exons (Shen et al. 2011). Because the NPCL markers are to be used in vertebrates, we focused only on those multiple genome alignments that include at least six species: Danio rerio (zebrafish), Silurana tropicalis (frog), Anolis carolinensis (lizard), Gallus gallus (chicken), Mus musculus (mouse), and Homo sapiens (human). The alignments of candidate exons had to meet two criteria: length of more than 700 bp and pairwise similarity ranging from 35% to 90%. The detailed bioinformatic pipeline has been described elsewhere (Shen et al. 2011). In addition to using multiple genome alignments to screen NPCL candidates, we also manually searched the ENSEMBL database for nuclear genes that were used previously (Murphy et al. 2001; Li et al. 2007; Townsend et al. 2008; Wright et al. 2008; Zhou et al. 2011; Song et al. 2012) to check whether they contain large and appropriately conserved exons.
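The two screening criteria for candidate exon alignments (length of more than 700 bp and pairwise similarity between 35% and 90%) can be illustrated with a short filter. This is a minimal sketch, not the published pipeline of Shen et al. (2011): the function names are hypothetical, similarity is approximated as percent identity over ungapped columns, and the criterion is assumed to apply to every pair of sequences.

```python
from itertools import combinations


def pairwise_identity(a: str, b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def passes_filter(alignment: dict, min_len: int = 700,
                  lo: float = 35.0, hi: float = 90.0) -> bool:
    """Return True if the alignment meets both screening criteria.

    ASSUMPTION: every pairwise comparison must fall within [lo, hi].
    """
    seqs = list(alignment.values())
    if len(seqs) < 2 or len(seqs[0]) <= min_len:
        return False
    return all(lo <= pairwise_identity(a, b) <= hi
               for a, b in combinations(seqs, 2))


# Toy 800-column alignment: the two sequences are 50% identical.
toy = {"species_A": "A" * 800, "species_B": "A" * 400 + "C" * 400}
print(passes_filter(toy))
```

The upper bound removes exons that are too conserved to be informative, while the lower bound keeps enough conservation for primer design.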
As a result, we assembled a total of 305 NPCL candidate alignments, of which 120 contained the appropriate number of conserved blocks, and used these to design nested PCR primers. To increase the success rates of our NPCL markers in amniotes, we manually added a turtle sequence (Chrysemys picta bellii) to each of the candidate alignments using data downloaded from the ENSEMBL database. A total of 480 primers were designed for the 120 NPCL candidates. Briefly, the first-round PCR primers are only used to enrich target regions from genomic environments and not to obtain target amplicons, so the degeneracy of these primers is normally high to increase reaction sensitivity; the second-round PCR primers are used to obtain target amplicons, so the
[FIG. 6 schematic: Laboratory Protocol. Primers F1, R1, F2, R2, Seq_F, and Seq_R are placed on conserved blocks flanking the target region.]
(i) First PCR with primers F1 and R1, using gDNA as template: enrich the target region from the complex genomic environment with one pair of highly degenerate primers. PCR was performed with 50–100 ng DNA in a 25 µl reaction. Cycling conditions: an initial denaturation step of 4 min at 94 °C, followed by 35 cycles of 94 °C for 45 s, 45 °C for 40 s, and 72 °C for 2 min, and a final extension at 72 °C for 10 min.
(ii) Second PCR with tailed primers F2 and R2, using the first-round PCR product as template: specifically amplify the target region with one pair of tailed, low-degeneracy primers. PCR was performed with 1 µl of the first-round PCR product in a 25 µl reaction. Cycling conditions: an initial denaturation step of 4 min at 94 °C, followed by 35 cycles of 94 °C for 45 s, 50 °C for 40 s, and 72 °C for 90 s, and a final extension at 72 °C for 10 min.
(iii) PCR evaluation and sequencing: evaluate the agarose gel electrophoresis results. If the product is a single and strong band (normally >90%), clean up with ExoI and FastAP and sequence directly with the general sequencing primers Seq_F and Seq_R; otherwise, perform gel cutting or cloning, then sequencing. For cleanup, 25 µl PCR product is treated with 2 U ExoI and 0.4 U FastAP (37 °C for 30 min, 80 °C for 15 min); the cleaned PCR product can be used for direct sequencing. A Sanger sequencing reaction is performed with 0.5 µl BigDye and 1 µl cleanup PCR product.
FIG. 6. Schematic representation of the experimental protocol for using our NPCL toolkit. Note that for each NPCL, nested PCR primers are designed on four short conserved blocks flanking the target region.
degeneracy of these primers is lower to increase reaction specificity. Our previous study showed that the nested PCR method often produces strong and single amplicon bands (Shen et al. 2012). To facilitate the subsequent direct sequencing, we added a tail (5′-AGGGTTTTCCCAGTCACGAC-3′) to the 5′-end of all second-round forward primers and a tail (5′-AGATAACAATTTCACACAGG-3′) to the 5′-end of all second-round reverse primers. These tail sequences provide two unique anchoring sites for direct sequencing from cleaned PCR products. In our pilot experiments, adding the tail sequences to primers did not affect the efficiency of the second-round PCR.
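The tailing step amounts to prepending a fixed sequence to the 5′-end of each second-round primer. The sketch below shows this with the two tail sequences given above; the function name and the example locus-specific primers are hypothetical and are not taken from supplementary table S1.

```python
# Tail sequences from the text, written 5'->3'.
FWD_TAIL = "AGGGTTTTCCCAGTCACGAC"  # added to second-round forward primers
REV_TAIL = "AGATAACAATTTCACACAGG"  # added to second-round reverse primers


def add_tails(f2: str, r2: str) -> tuple:
    """Return tailed second-round primers (forward, reverse), written 5'->3'."""
    return FWD_TAIL + f2.upper(), REV_TAIL + r2.upper()


# HYPOTHETICAL degenerate primers, for illustration only.
tailed_f, tailed_r = add_tails("GARTGYAAYCCNGAYTGG", "TCNGGRTTRCAYTCCCA")
print(tailed_f)
print(tailed_r)
```

Because every amplicon then carries the same two anchoring sites, the sequences of Seq_F and Seq_R are identical to the tails themselves.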
Experimental Testing for Candidate Markers in 16 Jawed Vertebrates
To test the experimental performance of our newly designed NPCL markers, we selected 16 taxa representing nine major jawed vertebrate lineages: Chondrichthyes (Sphyrna lewini), Actinopterygii (Lepisosteus oculatus and Pangasius sutchi), Dipnoi (Protopterus annectens), Lissamphibia (Ichthyophis bannanicus, Batrachuperus yenyuanensis, and Rana nigromaculata), Mammalia (Mus musculus and Sus scrofa domestica), Testudines (Trionyx sinensis and Podocnemis unifilis), Aves (Struthio camelus and Zosterops japonica), Crocodylia (Crocodylus siamensis), and Squamata (Hemidactylus bowringii and Naja naja atra). Total genomic DNA was extracted from ethanol-preserved tissues (liver or muscle) using the standard salt extraction protocol. All extracted genomic DNAs were diluted to a concentration of 50 ng µl⁻¹ with 1× TE and stored at −20 °C before PCR amplification.
All 120 NPCL markers were tested with a two-round PCR strategy (nested PCR). The first-round PCR was performed in a 25 µl reaction containing 1–2 µl template DNA (50–100 ng), with final concentrations of 1× PCR buffer, 200 µM dNTP, 400 nM of each forward and reverse first-round primer, and 1.25 U Taq polymerase (TransTaq High Fidelity, TransGen, Beijing). The cycling conditions of the first-round PCR were as follows: an initial denaturation step of 4 min at 94 °C, followed by 35 cycles of a 45 s denaturation at 94 °C, a 40 s annealing at 45 °C, and a 2 min extension at 72 °C, followed by a final 10 min extension at 72 °C. The second-round PCR was also performed in a 25 µl reaction containing 1 µl of the first-round PCR product (without dilution) and final concentrations of 1× PCR buffer, 200 µM dNTP, 400 nM of each forward and reverse second-round primer, and 1.25 U Taq polymerase. The cycling conditions of the second-round PCR were as follows: an initial denaturation step of 4 min at 94 °C, followed by 35 cycles of a 45 s denaturation at 94 °C, a 40 s annealing at 50 °C, and a 90 s extension at 72 °C, followed by a final 10 min extension at 72 °C.
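The final concentrations above translate into pipetting volumes through the dilution relation C1·V1 = C2·V2. The following sketch assumes typical stock concentrations (10× buffer, 2.5 mM dNTP, 10 µM primers, 5 U/µl Taq) that are not stated in the text, and reads the final amounts as 200 µM dNTP, 400 nM per primer, and 1.25 U Taq in a 25 µl reaction.

```python
def reaction_mix(total_ul: float = 25.0, template_ul: float = 1.0) -> dict:
    """Volumes (in microliters) for one PCR reaction.

    ASSUMED stocks: 10x buffer, 2.5 mM dNTP, 10 uM primers, 5 U/ul Taq.
    """
    mix = {
        "buffer_10x": total_ul / 10.0,               # 10x stock -> 1x final
        "dNTP": total_ul * 200.0 / 2500.0,           # 200 uM final from 2500 uM
        "primer_fwd": total_ul * 400.0 / 10000.0,    # 400 nM final from 10000 nM
        "primer_rev": total_ul * 400.0 / 10000.0,
        "Taq": 1.25 / 5.0,                           # 1.25 U from 5 U/ul
        "template": template_ul,
    }
    # Water makes up the remaining volume.
    mix["water"] = total_ul - sum(mix.values())
    return mix


for component, volume in reaction_mix().items():
    print(f"{component:>10}: {volume:.2f} ul")
```

For the first round with 2 µl of template, calling `reaction_mix(template_ul=2.0)` reduces the water volume accordingly.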
One microliter of the second-round PCR products was analyzed on a 1.0% TAE agarose gel. An NPCL marker was considered successful if more than 8 of the 16 tested taxa produced target amplicon bands. On this basis, 102 out of 120 tested NPCL markers were successful. The nested-PCR primers for the 102 NPCL markers can be found in supplementary table S1, Supplementary Material online. If the PCR products contained significant nonspecific amplicon bands (normally <10%), they needed further processing, for example, standard gel cutting or cloning. If the PCR reactions produced single amplicon bands (normally >90%), they were cleaned with ExoFAP treatment: 2 U ExoI and 0.4 U FastAP (all Fermentas) were added to the PCR tube and incubated for 30 min at 37 °C and 15 min at 80 °C. The cleanup PCR reactions can be directly used as templates for Sanger sequencing. According to our experimental designs, all PCR fragments can be sequenced from both ends with the two universal sequencing primers, Seq_F (5′-AGGGTTTTCCCAGTCACGAC-3′) and Seq_R (5′-AGATAACAATTTCACACAGG-3′). A typical Sanger sequencing reaction in our laboratory consumes 0.5 µl BigDye and 1 µl cleanup PCR product. The primer design strategy, the laboratory protocol for the nested PCR method, and the pretreatment of PCR products before Sanger sequencing are illustrated in figure 6.
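The success criterion for a marker (target bands in more than 8 of the 16 tested taxa) is easy to state in code. This is an illustrative sketch with a hypothetical function name and made-up gel results; only the threshold and the 102/120 tally come from the text.

```python
def marker_successful(bands: list, min_taxa: int = 8) -> bool:
    """True if strictly more than `min_taxa` taxa produced a target band."""
    return sum(bool(b) for b in bands) > min_taxa


# MADE-UP gel results for one marker across 16 taxa (True = target band).
example = [True] * 12 + [False] * 4
print(marker_successful(example))

# Proportion of successful markers reported in the text: 102 of 120.
print(f"{100 * 102 / 120:.1f}% of tested markers were successful")
```

Note that exactly 8 of 16 does not qualify, because the criterion is "more than 8".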
Calculation of Relative Evolutionary Rate of 102 NPCLs
The rate multipliers (m) across partitions estimated in MrBayes 3.2 (Ronquist et al. 2012) are used as relative evolutionary rates. To calculate these parameters, alignments for each NPCL were prepared for 12 species: Homo sapiens, Macaca mulatta, Mus musculus, Rattus norvegicus, Gallus gallus, Meleagris gallopavo, Chrysemys picta bellii, Anolis carolinensis, Silurana tropicalis, Tetraodon nigroviridis, Takifugu rubripes, and Danio rerio. Because genome data are available for the 12 species, we did not generate any new data. The 102 NPCL alignments were then combined and subjected to MrBayes analyses partitioned by gene. Each gene was assigned a separate GTR + Γ + I model, and all model parameters were unlinked. Two Markov chain Monte Carlo (MCMC) runs were performed with one cold chain and three heated chains (temperature set to 0.1) for 50 million generations and sampled every 1,000 generations. The rate multiplier for each gene was estimated using Tracer version 1.4 after discarding the first 50% of the generations. All evolutionary rates were normalized by dividing by the maximum value of the obtained rates.
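The final normalization step divides every rate multiplier by the largest one, so the fastest locus receives a relative rate of 1. A minimal sketch follows; the gene names and multiplier values are made up for illustration and are not results from this study.

```python
def normalize_rates(rates: dict) -> dict:
    """Divide each rate multiplier by the maximum observed value."""
    top = max(rates.values())
    return {gene: rate / top for gene, rate in rates.items()}


# MADE-UP rate multipliers for three hypothetical loci.
multipliers = {"locus_A": 0.5, "locus_B": 1.0, "locus_C": 2.0}
print(normalize_rates(multipliers))
```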
Gene and Taxon Sampling for Investigating Higher Level Salamander Relationships
To test the utility of our NPCL toolkit in a real case, we selected 19 salamander taxa representing all 10 salamander families and 9 outgroup taxa to investigate the family-level relationships of salamanders (supplementary table S2, Supplementary Material online). For gene sampling, we randomly selected 30 NPCL markers whose PCR success rates were more than 90% in the 16 previously tested vertebrates. Among the 840 target sequences (30 markers for 28 taxa), 201 were available in public databases (NCBI, UCSC, and ENSEMBL), whereas the remaining 639 sequences needed to be generated de novo. The experimental procedure was as described earlier. All obtained sequences were examined by checking for the presence of premature stop codons (pseudogene) and by BlastX searches against the nonredundant
protein sequences (nr) to confirm that they were our target genes. The NPCLs, species, and accession numbers for the newly obtained sequences are listed in supplementary table S2, Supplementary Material online.
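The premature stop codon check can be sketched as a scan of in-frame codons. This is an illustrative helper with a hypothetical name, not the authors' script; it assumes the standard nuclear genetic code and that the reading frame of the exon fragment is already known. Because NPCL fragments are internal parts of exons, any in-frame stop codon is treated as premature.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard nuclear genetic code


def has_premature_stop(seq: str, frame: int = 0) -> bool:
    """True if any in-frame codon of the fragment is a stop codon.

    frame -- 0, 1, or 2: offset of the first complete codon (ASSUMED known).
    """
    seq = seq.upper().replace("-", "")  # drop alignment gaps
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return any(codon in STOP_CODONS for codon in codons)


print(has_premature_stop("ATGAAATGGCCC"))  # the out-of-frame 'TGA' is ignored
print(has_premature_stop("ATGTAAGGG"))     # in-frame 'TAA'
```

A sequence that fails this check is a candidate pseudogene and would still need the BlastX comparison to confirm its identity.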
Phylogenetic Analyses
Alignments of all 30 NPCL markers were conducted using the G-INS-i method from MAFFT (Katoh et al. 2005) under the default settings according to their translated amino acid sequences, then refined by eye. All 30 refined alignments were combined into a concatenated data set (27,834 bp).
For the concatenated data set, we manually defined five partitioning strategies: 2 partitions (one for codon positions 1 and 2, and one for codon position 3); 3 partitions (one partition for each codon position); 30 partitions (one partition for each gene); 60 partitions (one for codon positions 1 + 2 and one for codon position 3 across 30 genes); and 90 partitions (codon position partitioning across 30 genes). Comparisons of the five partitioning strategies and selections of corresponding nucleotide substitution models were conducted under the Bayesian information criterion implemented in PartitionFinder (Lanfear et al. 2012). The 3-partition scheme (one partition for each codon position) was chosen as the best-fitting partitioning strategy, and all 3 partitions favored the GTR + Γ + I model.
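To make the partitioning and the selection criterion concrete, the sketch below builds the column indices of the 3-partition scheme and scores schemes with the standard Bayesian information criterion, BIC = k·ln(n) − 2·lnL. It is not PartitionFinder itself: the function names are hypothetical, and it assumes the concatenated alignment starts at codon position 1 with every gene length a multiple of 3.

```python
import math


def codon_position_columns(n_sites: int) -> dict:
    """Map codon position (1, 2, 3) to the 0-based alignment columns it covers."""
    return {pos + 1: range(pos, n_sites, 3) for pos in range(3)}


def bic(log_likelihood: float, n_params: int, n_sites: int) -> float:
    """Bayesian information criterion; lower values indicate a better fit."""
    return n_params * math.log(n_sites) - 2.0 * log_likelihood


def best_scheme(schemes: dict, n_sites: int) -> str:
    """Pick the scheme name with the lowest BIC.

    schemes -- {name: (log_likelihood, n_params)}
    """
    return min(schemes, key=lambda name: bic(*schemes[name], n_sites))


cols = codon_position_columns(27834)  # length of the concatenated data set
print({pos: len(columns) for pos, columns in cols.items()})
```

Each codon position covers one third of the 27,834 columns. The BIC penalty term grows with the number of free parameters, which is why heavily partitioned schemes can lose to simpler ones even when their likelihood is higher.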
The concatenated data set was separately analyzed with both ML and Bayesian inference (BI) methods under the 3-partition scheme. Partitioned ML analyses were implemented using RAxML version 7.2.6 (Stamatakis 2006), with the GTR + Γ + I model assigned to each partition. A search that combined 100 separate ML searches was applied to find the optimal tree, and branch support for each node was evaluated with 500 standard bootstrapping replicates (-f d -b 500 option) implemented in RAxML. The partitioned BI was conducted using MrBayes 3.2 (Ronquist et al. 2012). All model parameters were unlinked. Two MCMC runs were performed with one cold chain and three heated chains (temperature set to 0.1) for 60 million generations and sampled every 1,000 generations. The chain stationarity was visualized by plotting −ln L against the generation number using Tracer version 1.4, and the first 50% of generations were discarded. Topologies and posterior probabilities were estimated from the remaining generations. Two runs for each analysis were compared for congruence.
We also performed Bayesian phylogenetic analyses under a mixture model, CAT + GTR + Γ4, in PhyloBayes 3.3 (Lartillot et al. 2009) with two independent MCMC runs. Each run was performed for 10,000 cycles and sampled every cycle. Stationarity was considered reached when the largest discrepancy (maxdiff) between the two independent runs was less than 0.1. The first 5,000 trees in each MCMC run were discarded. The remaining 10,000 trees of the two runs were sampled every 5 trees to generate a 50% majority-rule posterior consensus tree.
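The burn-in and thinning arithmetic in this paragraph can be checked with list slicing. In this sketch, integers stand in for sampled trees and the function name is hypothetical; the numbers (10,000 cycles per run, 5,000 discarded, every 5th tree kept) are those given above.

```python
def thin_after_burnin(samples: list, burnin: int, step: int) -> list:
    """Drop the first `burnin` samples, then keep every `step`-th sample."""
    return samples[burnin::step]


# Two runs of 10,000 sampled cycles each; integers stand in for trees.
runs = [list(range(10_000)) for _ in range(2)]
kept = [thin_after_burnin(run, burnin=5_000, step=5) for run in runs]

print([len(k) for k in kept])   # trees retained per run
print(sum(len(k) for k in kept))  # trees pooled for the consensus
```

Under this reading, 1,000 trees per run (2,000 in total) contribute to the majority-rule consensus.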
Species tree estimation was conducted using the pseudo-ML approach in the program MP-EST (Liu et al. 2010) under the coalescent model. Briefly, the gene trees for 30 NPCL markers were reconstructed with ML under the GTR + Γ8 model using PHYML 3.0 (Guindon et al. 2010). The resulting 30 gene trees were rooted with an outgroup (Chrysemys picta bellii) and used to generate a species tree using the program MP-EST. The robustness of the species tree was evaluated with nonparametric bootstrapping of 500 replicates.
Alternative phylogenetic hypotheses were tested based on the 30-gene data set. We first calculated sitewise log likelihoods of alternative trees using RAxML (-f g) under the GTR + Γ + I model. Then, the site log-likelihood file was used to estimate P values for each alternative tree in the CONSEL program (Shimodaira and Hasegawa 2001) using the Kishino–Hasegawa test (KH) (Kishino and Hasegawa 1989), the approximately unbiased test (AU) (Shimodaira 2002), and the RELL bootstrap proportion test (BP).
Supplementary Material
Supplementary tables S1 and S2 and figures S1 and S2 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).
Acknowledgments
The authors are grateful to David Wake and Carol Spencer of the Museum of Vertebrate Zoology at the University of California, Berkeley, for tissue loans. David Wake provided many useful comments on the manuscript. This work was supported by National Natural Science Foundation of China grants (31172075 and 30900136) and the National Science Fund for Excellent Young Scholars (no. pending).
References
Binladen J Gilbert MTP Bollback JP Panitz F Bendixen C Nielsen R
Willerslev E 2007 The use of coded PCR primers enables high-throughput sequencing of multiple homolog amplification productsby 454 parallel sequencing PLoS One 2e197
Crawford NG Faircloth BC McCormack JE Brumfield RT Winker KGlenn TC 2012 More than 1000 ultraconserved elements provideevidence that turtles are the sister group to archosaurs Biol Lett 8783ndash786
Delsuc F Brinkmann H Philippe H 2005 Phylogenomics and thereconstruction of the tree of life Nat Rev Genet 6361ndash375
Duellman WE Trueb L 1994 Biology of amphibians Baltimore (MD)Johns Hopkins University Press
Dunn CW Hejnol A Matus DQ et al (18 co-authors) 2008 Broadphylogenomic sampling improves resolution of the animal tree oflife Nature 452745ndash749
Faircloth BC Glenn TC 2012 Not all sequence tags are created equaldesigning and validating sequence identification tags robust toindels PLoS One 7e42543
Faircloth BC McCormack JE Crawford NG Brumfield RT Glenn TC2012 Ultraconserved elements anchor thousands of genetic markersfor target enrichment spanning multiple evolutionary timescalesSyst Biol 61717ndash726
Fong JJ Brown JM Fujita MK Boussau B 2012 A phylogenomicapproach to vertebrate phylogeny supports a turtle-archosauraffinity and a possible paraphyletic lissamphibia PLoS One 7e48990
Fong JJ Fujita MK 2011 Evaluating phylogenetic informativeness anddata-type usage for new protein-coding genes across VertebrataMol Phylogenet Evol 61300ndash307
Frost DR Grant T Faivovich J et al (18 co-authors) 2006 The amphib-ian tree of life Bull Am Mus Nat Hist 2978ndash370
Gao KQ Shubin NH 2001 Late Jurassic salamanders from northernChina Nature 410574ndash577
Guindon S Dufayard J-F Lefort V Anisimova M Hordijk W Gascuel O2010 New algorithms and methods to estimate maximum-likelihood phylogenies assessing the performance of PhyML 30Syst Biol 59307ndash321
Hugall AF Foster R Lee MSY 2007 Calibration choice rate smoothingand the pattern of tetrapod diversification according to the longnuclear gene RAG-1 Syst Biol 56543ndash563
Inoue JG Miya M Lam K Tay BH Danks JA Bell J Walker TI VenkateshB 2010 Evolutionary origin and phylogeny of the modern holoce-phalans (Chondrichthyes Chimaeriformes) a mitogenomic perspec-tive Mol Biol Evol 272576ndash2586
Katoh K Kuma K Toh H Miyata T 2005 MAFFT version 5 improve-ment in accuracy of multiple sequence alignment Nucleic Acids Res33511ndash518
Kishino H Hasegawa M 1989 Evaluation of the maximum likelihoodestimate of the evolutionary tree topologies from DNA sequencedata and the branching order in Hominoidea J Mol Evol 29170ndash179
Kunstner A Wolf JBW Backstrom N et al (13 co-authors) 2010Comparative genomics based on massive parallel transcriptomesequencing reveals patterns of substitution and selection across10 bird species Mol Ecol 19266ndash276
Lanfear R Calcott B Ho SYW Guindon S 2012 Partitionfindercombined selection of partitioning schemes and substitutionmodels for phylogenetic analyses Mol Biol Evol 291695ndash1701
Lartillot N Lepage T Blanquart S 2009 PhyloBayes 3 a Bayesian soft-ware package for phylogenetic reconstruction and molecular datingBioinformatics 252286ndash2288
Lemmon AR Brown JM Stanger-Hall K Lemmon EM 2009 The effectof ambiguous data on phylogenetic estimates obtained bymaximum likelihood and Bayesian inference Syst Biol 58130ndash145
Lemmon AR Emme SA Lemmon EM 2012 Anchored hybridenrichment for massively high-throughput phylogenomics SystBiol 61727ndash744
Li C Ortı G Zhang G Lu G 2007 A practical approach to phyloge-nomics the phylogeny of ray-finned fish (Actinopterygii) as a casestudy BMC Evol Biol 744
Li C Riethoven JJ Naylor GJ 2012 EvolMarkers a database for miningexon and intron markers for evolution ecology and conservationstudies Mol Ecol Resour 12967ndash971
Liu L Yu L Edwards SV 2010 A maximum pseudo-likelihood approachfor estimating species trees under the coalescent model BMC EvolBiol 10302
McCormack JE Faircloth BC Crawford NG Gowaty PA Brumfield RTGlenn TC 2012 Ultraconserved elements are novelphylogenomic markers that resolve placental mammal phylog-eny when combined with species-tree analysis Genome Res 22746ndash754
McCormack JE Hird SM Zellmer AJ Carstens BC Brumfield RT 2013Applications of next-generation sequencing to phylogeography andphylogenetics Mol Phylogenet Evol 66526ndash538
Meyer A Van de Peer Y 2005 From 2R to 3R evidence for a fish-specificgenome duplication (FSGD) Bioessays 27937ndash945
Meyer M Stenzel U Hofreiter M 2008 Parallel tagged sequencing onthe 454 platform Nat Protoc 3267ndash278
Murphy WJ Eizirik E Johnson WE Zhang YP Ryder OA OrsquoBrien SJ 2001Molecular phylogenetics and the origins of placental mammalsNature 409614ndash618
Philippe H Derelle R Lopez P et al (20 co-authors) 2009Phylogenomics revives traditional views on deep animal relation-ships Curr Biol 19706ndash712
Philippe H Telford M 2006 Large-scale sequencing and the new animalphylogeny Trends Ecol Evol 21614ndash620
Portik DM, Wood PL, Grismer JL, Stanley EL, Jackman TR. 2011. Identification of 104 rapidly-evolving nuclear protein-coding markers for amplification across scaled reptiles using genomic resources. Conserv Genet Resour. 4:1–10.
Pyron RA, Wiens JJ. 2011. A large-scale phylogeny of Amphibia including over 2800 species, and a revised classification of extant frogs, salamanders, and caecilians. Mol Phylogenet Evol. 61:543–583.
Roelants K, Gower DJ, Wilkinson M, Loader SP, Biju SD, Guillaume K, Moriau L, Bossuyt F. 2007. Global patterns of diversification in the history of modern amphibians. Proc Natl Acad Sci U S A. 104:887–892.
Rokas A, Williams BL, King N, Carroll SB. 2003. Genome-scale approaches to resolving incongruence in molecular phylogenies. Nature 425:798–804.
Ronquist F, Teslenko M, Van der Mark P, Ayres DL, Darling A, Höhna S, Larget B, Liu L, Suchard MA, Huelsenbeck JP. 2012. MrBayes 3.2: efficient Bayesian phylogenetic inference and model choice across a large model space. Syst Biol. 61:539–542.
Roure B, Baurain D, Philippe H. 2013. Impact of missing data on phylogenies inferred from empirical phylogenomic data sets. Mol Biol Evol. 30:197–214.
San Mauro D. 2010. A multilocus timescale for the origin of extant amphibians. Mol Phylogenet Evol. 56:554–561.
San Mauro D, Vences M, Alcobendas M, Zardoya R, Meyer A. 2005. Initial diversification of living amphibians predated the breakup of Pangaea. Am Nat. 165:590–599.
Shen XX, Liang D, Wen JZ, Zhang P. 2011. Multiple genome alignments facilitate development of NPCL markers: a case study of tetrapod phylogeny focusing on the position of turtles. Mol Biol Evol. 28:3237–3252.
Shen XX, Liang D, Zhang P. 2012. The development of three long universal nuclear protein-coding locus markers and their application to osteichthyan phylogenetics with nested PCR. PLoS One 7:e39256.
Shimodaira H. 2002. An approximately unbiased test of phylogenetic tree selection. Syst Biol. 51:492–508.
Shimodaira H, Hasegawa M. 2001. CONSEL: for assessing the confidence of phylogenetic tree selection. Bioinformatics 17:1246–1247.
Song S, Liu L, Edwards SV, Wu S. 2012. Resolving conflict in eutherian mammal phylogeny using phylogenomics and the multispecies coalescent model. Proc Natl Acad Sci U S A. 109:14942–14947.
Spinks PQ, Thomson RC, Barley AJ, Newman CE, Shaffer HB. 2010. Testing avian, squamate, and mammalian nuclear markers for cross amplification in turtles. Conserv Genet Resour. 2:127–129.
Spinks PQ, Thomson RC, Lovely GA, Shaffer HB. 2009. Assessing what is needed to resolve a molecular phylogeny: simulations and empirical data from Emydid turtles. BMC Evol Biol. 9:56.
Stamatakis A. 2006. RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics 22:2688–2690.
Tewhey R, Warner JB, Nakano M, Libby B, Medkova M, David PH, Kotsopoulos SK, Samuels ML, Hutchison JB, Larson JW. 2009. Microdroplet-based PCR enrichment for large-scale targeted sequencing. Nat Biotechnol. 27:1025–1031.
Thomson RC, Shedlock AM, Edwards SV, Shaffer HB. 2008. Developing markers for multilocus phylogenetics in non-model organisms: a test case with turtles. Mol Phylogenet Evol. 49:514–525.
Thomson RC, Wang IJ, Johnson JR. 2010. Genome-enabled development of DNA markers for ecology, evolution and conservation. Mol Ecol. 19:2184–2195.
Townsend TM, Alegre RE, Kelley ST, Wiens JJ, Reeder TW. 2008. Rapid development of multiple nuclear loci for phylogenetic analysis using genomic resources: an example from squamate reptiles. Mol Phylogenet Evol. 47:129–142.
Weisrock DW, Harmon LJ, Larson A. 2005. Resolving deep phylogenetic relationships in salamanders: analyses of mitochondrial and nuclear genomic DNA. Syst Biol. 54:758–777.
Wiens J, Bonett R, Chippindale P. 2005. Ontogeny discombobulates phylogeny: paedomorphosis and higher-level salamander relationships. Syst Biol. 54:91–110.
Wright TF, Schirtzinger EE, Matsumoto T, et al. (11 co-authors). 2008. A multilocus molecular phylogeny of the parrots (Psittaciformes): support for a Gondwanan origin during the Cretaceous. Mol Biol Evol. 25:2141–2156.
Zhang P, Wake DB. 2009. Higher-level salamander relationships and divergence dates inferred from complete mitochondrial genomes. Mol Phylogenet Evol. 53:492–508.
Zhang P, Zhou H, Chen YQ, Liu YF, Qu LH. 2005. Mitogenomic perspectives on the origin and phylogeny of living amphibians. Syst Biol. 54:391–400.
Zhou XM, Xu SX, Zhang P, Yang G. 2011. Developing a series of conservative anchor markers and their application to phylogenomics of Laurasiatherian mammals. Mol Ecol Resour. 11:134–140.
each locus, not to mention the extra effort involved in PCR optimization, gel purification, and cloning. On the other hand, the PCR-based method also has its advantages: 1) it is highly targeted and can produce nearly complete data matrices, and the data analysis process is straightforward and familiar to most empirical researchers; 2) it requires no prior genomic knowledge of the targeted organisms; and 3) it works with tiny amounts of DNA and thus appears to be an ideal solution when DNA samples are limited.
For most interspecific phylogenetic projects, nuclear protein-coding loci (NPCLs) that are developed on exons are likely the markers of choice for the PCR-based strategy because they provide an appropriate level of variation, easy alignment across a large phylogenetic span, and relatively straightforward detection of paralogs (Thomson et al. 2010). In this study, our aim was to develop a suite of universal NPCL markers and an efficient experimental protocol for vertebrate phylogenomics. To eliminate the drawbacks of the conventional PCR-based method, we designed our NPCL toolkit and protocol to 1) include approximately 100 NPCL markers (we estimate that the economic transition from PCR to sequence capture is at approximately 100 loci; if more than 100 loci are to be used, the PCR method is not cost-efficient); 2) work for all major jawed vertebrate clades and provide good resolution at different evolutionary timescales; 3) produce single and strong amplicon bands without any PCR optimization in most cases; and 4) yield PCR products that can be directly cleaned and sequenced without gel purification or cloning in most cases.
Because our NPCL toolkit is designed for universal phylogenetic applications in vertebrates, it should be tested in a real case with some difficult samples. Salamanders are well known to have much larger genomes than most vertebrates (often 10 times the human genome; http://www.genomesize.com). The PCR-based method normally performs poorly for salamanders (personal experience and communication with colleagues). For example, Shen et al. (2011) amplified 22 NPCL markers in 16 tested vertebrates. In all 15 nonsalamander species, approximately 90% of the markers could be successfully amplified; however, for the tested salamander species, Batrachuperus yenyuanensis, only 8 of 22 NPCL markers (36%) could be amplified. Here, we apply our NPCL toolkit and protocol to address the higher level relationships of living salamanders as a test of the toolkit's utility. Our results demonstrate that the new universal NPCL toolkit and protocol are fast and effective in constructing multilocus data matrices for vertebrate phylogenomics.
Results
Experimental Performance and Characteristics of the New NPCL Toolkit
The newly developed NPCL toolkit contains 102 NPCL markers ranging from 510 to 1,650 bp, with an average length of 1,050 bp; each NPCL marker comprises two pairs of primers for the nested PCR strategy (supplementary table S1, Supplementary Material online). These 102 NPCL markers are broadly distributed on 21 chromosomes of the human genome (fig. 1). We classified their PCR performance into three levels: 1) producing a single target band of the expected size; 2) producing a target band but also significant nonspecific bands; and 3) not producing a target band. The first two conditions are considered successful. The PCR performances of the 102 NPCL markers across 16 diverse vertebrate species and three representative electropherograms are shown in figure 2. Of the 102 NPCL markers, 57 have a 100% success rate in the 16 tested vertebrate species, 87 have a success rate of more than 90%, and the remaining 15 range from 56% to 88% (fig. 2). Of the 1,632 PCR reactions (102 loci × 16 taxa), 1,544 (94.6%) were successful, with 1,485 (91%) producing strong single target bands that can be used for direct sequencing. In the demonstration case, in which 30 NPCL markers were used to investigate the higher level relationships of living salamanders, 632 (98.9%) of the 639 target fragments were successfully amplified. Of the 632 successful reactions, 602 (95%) were directly sequenced with the general sequencing primers "Seq_F" and "Seq_R." The PCR success rates for each of the 102 NPCL markers across the 16 tested vertebrate species are shown in figure 3.
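The tallies in the paragraph above can be rechecked with a few lines of arithmetic. The sketch below is only an illustration: the counts are taken from the text, whereas the helper name `success_rate` is an assumption introduced here and is not part of the toolkit.

```python
# Counts reported in the text for the 16-species test panel.
N_LOCI = 102
N_TAXA = 16


def success_rate(successes: int, total: int) -> float:
    """Return a success rate as a percentage rounded to one decimal place."""
    return round(100.0 * successes / total, 1)


total_reactions = N_LOCI * N_TAXA  # one reaction per locus per species

print(total_reactions)                       # 1632
print(success_rate(1544, total_reactions))   # 94.6 (levels 1 and 2 combined)
print(success_rate(1485, total_reactions))   # 91.0 (single strong band only)
print(success_rate(632, 639))                # 98.9 (salamander demonstration case)
print(success_rate(602, 632))                # 95.3 (directly sequenced, of successful reactions)
```

Note that the 95% figure for direct sequencing is relative to the 632 successful reactions; relative to all 639 target reactions, the same 602 sequences correspond to the 94.1% quoted in the abstract.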
The evolutionary rate, as evidenced by the degree of variability, is an important parameter of an NPCL marker because it determines applicability for different phylogenetic questions. Although our NPCL toolkit has a high PCR success rate in highly diverged taxa, that success does not mean that the NPCL markers in the toolkit are highly conserved. As figure 3 illustrates, our toolkit includes NPCLs with a broad range of evolutionary rates, spanning approximately 4-fold. Among the 102 NPCL markers, 60 evolve faster than RAG1, an NPCL that has been widely used for phylogenetic inference in various vertebrate groups. Because previous analyses based on RAG1 data resulted in highly resolved and robustly supported phylogenetic relationships at multiple hierarchical levels (San Mauro et al. 2005; Wiens et al. 2005; Hugall et al. 2007; Roelants et al. 2007), this indicates that our NPCL toolkit has the potential to resolve questions of both deep and shallow phylogeny.
It is well known that a fish-specific genome duplication occurred in the teleosts (Meyer and Van de Peer 2005). Although most duplicated genes were secondarily lost, some were retained or evolved new functions. For an NPCL marker, if there are two similar copies in teleost genomes, it is difficult to check the orthologous status of the obtained fragments. To address this, we used the zebrafish sequence of each NPCL as a Basic Local Alignment Search Tool (Blast) query against all available teleost genomes in the ENSEMBL database. If an NPCL receives more than two Blast hits and the top Blast score is not more than twice the second Blast score, that NPCL might have an extra copy in teleost genomes. Using this method, we found that only six of the 102 NPCLs (CXCR4, GLCE, KCNF1, LINGO1, NTN1, and PCDH10) may have extra copies in teleost genomes (fig. 3; supplementary table S1, Supplementary Material online). This result indicates that our NPCL toolkit is also suitable for phylogenetic inference in teleosts.
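The screening rule described above is simple enough to express as a small function. The following is a minimal sketch that applies the rule literally as worded in the text; the function name and the example scores are hypothetical, and it assumes the Blast scores for one locus in one teleost genome have already been collected into a list.

```python
from typing import Sequence


def may_have_extra_copy(blast_scores: Sequence[float]) -> bool:
    """Apply the screening rule from the text to one locus in one genome.

    blast_scores: scores of all Blast hits obtained with the zebrafish
    query (any order). Returns True when the locus might have an extra copy.
    """
    if len(blast_scores) <= 2:          # the rule requires more than two hits
        return False
    ranked = sorted(blast_scores, reverse=True)
    top, second = ranked[0], ranked[1]
    return top <= 2 * second            # top score not more than twice the second


# Made-up scores, for illustration only.
print(may_have_extra_copy([900.0, 850.0, 120.0]))  # True: second hit is nearly as good
print(may_have_extra_copy([900.0, 300.0, 120.0]))  # False: top hit clearly dominates
print(may_have_extra_copy([900.0, 850.0]))         # False: only two hits
```

A locus flagged by this rule is only a candidate for duplication; confirming orthology would still require inspecting the hits themselves.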
Phylogenetic Performance in a Real Case
Our demonstration case included 19 salamander species that span salamander evolutionary diversity (supplementary table S2, Supplementary Material online). The nine outgroup species (two frogs, two caecilians, one turtle, one bird, two mammals, and one coelacanth) provided a largely balanced representation of relatives of salamanders. The 30 newly amplified NPCLs exhibited levels of variation comparable with that of the traditional RAG1 gene, with variable sites varying between 30% and 51% of all sequenced sites (table 1). The data set combining these 30 NPCLs comprises 27,834 bp and exhibits little substitution saturation (supplementary fig. S1, Supplementary Material online). The phylogenetic analyses of the concatenated data set using three tree-building methods (maximum likelihood [ML], Bayesian, and CAT-mixture model) produced an identical, fully resolved tree for 28 taxa (fig. 4). In all 25 nodes of the tree, the statistical support was highly robust (BPML 99–100; PPBAY = 1.0; PPCAT = 1.0). The species tree estimated from 30 individual NPCLs without data concatenation using the pseudo-ML approach is identical to those estimated from the concatenated analyses. All nodes received bootstrap support values varying between 74 and 100 (fig. 4). We also conducted phylogenetic analyses at the amino acid level (9,278 deduced amino acid residues) using three tree-building methods (ML, Bayesian, and CAT-mixture model). The protein tree topology is identical to the DNA result, with just slightly lower branch support for some nodes (supplementary fig. S2, Supplementary Material online). Therefore, we did not further analyze the protein data set.
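To make the idea of a concatenated data set concrete, the sketch below joins per-locus alignments into one supermatrix and checks that 27,834 bp corresponds to 9,278 codons. It is an illustration under stated assumptions, not the pipeline used in the study: the function names, the toy sequences, and the use of "?" for missing loci are all choices made here.

```python
from typing import Dict, List


def concatenate(alignments: List[Dict[str, str]], missing: str = "?") -> Dict[str, str]:
    """Join per-locus alignments (taxon -> aligned sequence) into one supermatrix.

    A taxon absent from a locus is padded with the `missing` symbol so that
    every row of the supermatrix has the same length.
    """
    taxa = sorted({taxon for aln in alignments for taxon in aln})
    rows: Dict[str, List[str]] = {taxon: [] for taxon in taxa}
    for aln in alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within a locus must have equal length")
        width = lengths.pop()
        for taxon in taxa:
            rows[taxon].append(aln.get(taxon, missing * width))
    return {taxon: "".join(parts) for taxon, parts in rows.items()}


def codons_in(n_bp: int) -> int:
    """Number of codons in an in-frame coding alignment of n_bp sites."""
    if n_bp % 3 != 0:
        raise ValueError("length is not a multiple of three")
    return n_bp // 3


# Toy loci with made-up sequences; "Rana" is missing from the second locus.
locus1 = {"Siren": "ATGGCC", "Andrias": "ATGGCA", "Rana": "ATGGCT"}
locus2 = {"Siren": "TTTAAA", "Andrias": "TTCAAA"}
supermatrix = concatenate([locus1, locus2])

print(supermatrix["Rana"])   # ATGGCT??????
print(codons_in(27834))      # 9278, matching the amino acid data set size
```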
The monophyly of extant amphibians with respect to amniotes and the close relationship between frogs and salamanders (the Batrachia hypothesis) are repeatedly recovered in most recent molecular studies (San Mauro et al. 2005; Zhang et al. 2005; Frost et al. 2006; Hugall et al. 2007; Roelants et al. 2007; Zhang and Wake 2009; San Mauro 2010; Pyron and Wiens 2011). However, a recent molecular study based on 26 nuclear genes (Fong et al. 2012) supports a caecilian–salamander sister relationship, with the possible paraphyly of extant amphibians. Our phylogenetic analyses based on
FIG. 1. Chromosome mapping of the 102 NPCL markers in the Homo sapiens genome.
FIG. 2. PCR performance of the 102 NPCL markers in 16 divergent vertebrate species (Sphyrna, Pangasius, Lepisosteus, Protopterus, Ichthyophis, Batrachuperus, Rana, Mus, Struthio, Zosterops, Crocodylus, Trionyx, Podocnemis, Sus, Hemidactylus, and Naja). Each square and electrophoretic lane is aligned with the tested species. (a) The draft divergence timescale (Ma) for the 16 tested vertebrate species is based on Inoue et al. (2010) and the book The Timetree of Life. (b) The PCR performance of each NPCL marker is ranked by three different-colored squares: black, producing a single target band; gray, having a target band but with significant nonspecific bands; white, no target band. The 102 NPCL markers are sorted according to their PCR success rates. (c) Three typical agarose electrophoresis results for 9 NPCL markers (CAND1, HYP, IRS1, FAT2, CELSR3, RERE, SVEP1, DNAH3, and GRM2); size markers are 4,500, 1,200, and 200 bp.
30 nuclear genes provide further support for the monophyly of lissamphibians and the Batrachia hypothesis (fig. 4). Additionally, all possible hypotheses against the monophyly of extant amphibians and the Batrachia hypothesis were rejected in our topological tests (table 2). However, the Batrachia hypothesis did not receive strong support in our species tree analysis (BPMP-EST = 74; fig. 4), suggesting that more nuclear genes are still needed to resolve this node.
The monophyly of the internally fertilizing salamanders (Salamandroidea; all salamanders exclusive of Hynobiidae, Cryptobranchidae, and Sirenidae) is strongly supported in our analyses (fig. 4), in line with most recent molecular studies (Wiens et al. 2005; Roelants et al. 2007; Zhang and Wake 2009; Pyron and Wiens 2011) but differing strongly from Frost et al. (2006), who recovered a clade comprising Sirenidae, Dicamptodontidae, Ambystomatidae, and Salamandridae. The internally fertilizing salamanders include two well-supported clades: one is composed of Ambystomatidae, Dicamptodontidae, and Salamandridae, and the other of Proteidae, Rhyacotritonidae, Amphiumidae, and Plethodontidae (fig. 4).
Currently, two hypotheses have been proposed regarding the basal split within living salamanders. The traditional view favors Sirenidae as the sister group to all remaining salamanders (Duellman and Trueb 1994). This hypothesis received strong support in two recent studies (based on mitochondrial genomes, BPML = 98, Zhang and Wake 2009; based on mitochondrial genomes and nuclear genes, BPML > 80, San Mauro 2010). In contrast, some studies suggest that the basal split separates Cryptobranchidae + Hynobiidae from all other salamanders (Gao and Shubin 2001; Wiens et al. 2005; Frost et al. 2006; Roelants et al. 2007; Pyron and Wiens 2011), but always without strong support (BPML < 71). Our phylogenetic analyses based on 30 independent NPCLs supported the second hypothesis, that Cryptobranchoidea (Cryptobranchidae + Hynobiidae) branched first within the living salamanders. This result is extremely robust in our concatenation analyses (BPML = 99; PPBAY = 1.0; PPCAT = 1.0; fig. 4), and all alternative hypotheses are statistically rejected (table 2). In the species tree analysis without data concatenation, this result is also strongly supported (BPMP-EST = 83; fig. 4).
How many nuclear genes, then, are needed to robustly resolve the question of the basal split within living salamanders? Our analysis of data subsets indicates a progressive increase in the bootstrap support value for the node of interest (fig. 4) when an increasing number of genes are analyzed (fig. 5). Analyses based on single genes rarely resolve the node of interest with any confidence. Analyses based on 5–10 genes produce bootstrap support values of 60–80% in concatenation analyses (fig. 5), which is congruent with all previous nuclear studies using similar-sized data sets (Roelants et al. 2007; Pyron and Wiens 2011). Taking a bootstrap value of 95% in concatenation analyses as the threshold of "fully resolved," the minimum number of nuclear genes needed to resolve the root of the salamander tree is approximately 25. The previous contradictory results may be due to the overwhelmingly strong signals from the mitochondrial genome. Because
FIG. 3. Relative evolutionary rates of 102 NPCLs in vertebrates. The 102 NPCLs are arranged in order of increasing variability on the right side (axis: relative evolutionary rate), and their PCR success rates in the 16 tested vertebrates are shown on the left side (axis: PCR success rate in 16 vertebrates, %). NPCLs indicated with asterisks may have extra copies in teleost genomes and thus are not suitable for phylogenetic studies of teleosts.
the initial diversification of salamanders occurred within a relatively short window of time (Weisrock et al. 2005), the genealogical histories of individual gene loci may sometimes appear misleading in terms of the relationships among species due to incomplete lineage sorting. Unfortunately, the mitochondrial genome recorded such an incorrect history.
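The gene-number analysis discussed above (random subsets of increasing size, 30 replicates per size, mean support per size) can be outlined as follows. This is a sketch of the resampling logic only: `support_fn` stands in for the expensive step of building a tree from a subset and reading the bootstrap value at the node of interest, and the placeholder used in the example returns a dummy number rather than real support values. All names are assumptions made for illustration.

```python
import random
from statistics import mean
from typing import Callable, Dict, Iterable, List, Optional, Sequence


def mean_support_by_subset_size(
    genes: Sequence[str],
    sizes: Iterable[int],
    support_fn: Callable[[List[str]], float],
    replicates: int = 30,
    seed: int = 0,
) -> Dict[int, float]:
    """For each subset size, average support_fn over random gene subsets."""
    rng = random.Random(seed)
    curve: Dict[int, float] = {}
    for k in sizes:
        values = [support_fn(rng.sample(list(genes), k)) for _ in range(replicates)]
        curve[k] = mean(values)
    return curve


def min_genes_reaching(curve: Dict[int, float], threshold: float = 95.0) -> Optional[int]:
    """Smallest subset size whose mean support meets the threshold, if any."""
    passing = [k for k, value in sorted(curve.items()) if value >= threshold]
    return passing[0] if passing else None


# Hypothetical locus labels; the real analysis would use the 30 NPCL alignments.
genes = [f"locus_{i:02d}" for i in range(1, 31)]


def placeholder_support(subset: List[str]) -> float:
    """Dummy stand-in: NOT a bootstrap value, it only exercises the plumbing."""
    return float(len(subset))


curve = mean_support_by_subset_size(genes, [1, 5, 10, 15, 20, 25, 30], placeholder_support)
print(curve[5])  # 5.0 with the dummy function
```

In a real run, `support_fn` would wrap a concatenation or species-tree analysis, and `min_genes_reaching(curve)` would then report the smallest sampled gene number whose mean support reaches 95%.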
Discussion
The NPCL toolkit and experimental protocol introduced here constitute a highly reliable, rapid, and cost-effective method for building medium-scale multilocus data sets to produce high-resolution phylogenetic relationships. This phylogenomic approach has the potential to accelerate the completion of many parts of the vertebrate tree of life because it requires no further marker development, which is often the bottleneck in phylogenetic research. Once a specific phylogenetic question within vertebrates arises, researchers simply need to check the list for our toolkit and look for NPCL markers with the expected evolutionary rates and experimental performance for their groups of interest. Then, many orthologous loci can be quickly obtained by traditional PCR and Sanger sequencing, usually without time-consuming gel cutting and cloning. Applying the NPCL toolkit may also have another benefit for assembling the vertebrate tree of life: because people working on different groups can easily use the same set of loci, combined analyses will be facilitated.
Merits of the Toolkit
Because of the use of the nested PCR strategy outlined earlier, most NPCLs in the toolkit work for all major jawed vertebrate groups with high experimental success rates (normally >95%). Such results were achieved under unified PCR conditions without any extra effort involving cycling condition optimization. This feature of the toolkit enables it to surpass previously developed nuclear marker sets (Murphy et al. 2001; Li et al. 2007; Thomson et al. 2008; Townsend et al. 2008; Wright et al. 2008; Portik et al. 2011; Shen et al. 2011; Zhou et al. 2011). Most previous nuclear marker sets were developed for specific animal groups, and their application to
Table 1. Summary Information for the 30 NPCLs Amplified in 19 Salamander Taxa.
NOTE: Length, length of refined alignment; Var. sites, variable sites; PI sites, parsimony informative sites; RCV, relative composition variability.
FIG. 4. Higher-level phylogenetic relationships of 10 salamander families inferred from 30 NPCL markers (30 nuclear genes; 27,834 bp in total). The tree was inferred by concatenation analyses using ML, BI, and the mixture model (CAT) and by species-tree analysis using the pseudo-ML approach (MP-EST). Branch support values are indicated beside nodes in the order ML bootstrap (BPML), BI posterior probability (PPBAY), CAT posterior probability (PPCAT), and MP-EST bootstrap (BPMP-EST), from left to right. The filled squares represent BPML > 95, PPBAY = 1.0, PPCAT = 1.0, and BPMP-EST > 95. The circled number refers to the node of interest studied in figure 5. Branch lengths are from the ML analysis (scale bar: 0.1 substitutions/site).
Table 2. Statistical Confidence (P Values) for Alternative Branching Hypotheses Based on the 30-Gene Data Set.
Alternative Topology Tested | Ln L | P Value | Rejection
Sirenidae is sister to Cryptobranchoidea | 423 | 0.004, 0.002, 0.004 | +, +, +
Gymnophiona is sister to Anura (monophyletic lissamphibians) | 343 | 0.013, 0.012, 0.013 | +, +, +
Gymnophiona is sister to Caudata (monophyletic lissamphibians) | 439 | 0.002, 0.001, 0.001 | +, +, +
Anura is sister to Amniota (paraphyletic lissamphibians) | 1446 | 5E-30, 0, 0 | +, +, +
Gymnophiona is sister to Amniota (paraphyletic lissamphibians) | 1290 | 1E-69, 0, 0 | +, +, +
Caudata is sister to Amniota (paraphyletic lissamphibians) | 1728 | 0.0001, 0, 0 | +, +, +
other distantly related groups is usually difficult. For example, Spinks et al. (2010) collected 120 nuclear markers from avian, squamate, and mammalian phylogenetic studies and evaluated their PCR performance in turtles. They found that only eight nuclear markers successfully produced single expected bands across 13 tested turtle species. In another case, Fong and Fujita (2011) developed 75 nuclear markers for vertebrate phylogenetics, but approximately 60% of the target fragments could not be obtained in three test species (two reptiles and one lissamphibian). Therefore, although the nested PCR method introduced here requires an additional PCR reaction, the extra work is still worthwhile.
In PCR-based phylogenetic projects, even when the PCR reactions are successful, the products often contain significant nonspecific amplicons. Such a condition requires additional effort involving gel purification and cloning, which takes much more time than the PCR reaction itself. Our NPCL toolkit is specifically designed to solve this problem, so that normally over 90% of PCR reactions produce strong and single expected bands. Moreover, most of the primers used to date for nuclear marker sets are degenerate and thus are not suitable for directly sequencing PCR products. By using our nested PCR strategy, we introduce anchoring sequences to the ends of PCR fragments while maintaining PCR efficiency. These anchoring sequences bring the added benefit that two specific sequencing primers (Seq_F and Seq_R) can be used in all Sanger sequencing reactions.
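The contrast between degenerate amplification primers and a fixed sequencing tail can be illustrated with a short sketch. It counts how many distinct oligonucleotides a degenerate primer represents (using the standard IUPAC ambiguity codes) and prepends a nondegenerate tail to the 5' end of a primer. The tail and primer sequences below are placeholders invented for this example; they are not the Seq_F/Seq_R tails or any primer from the toolkit.

```python
# Number of bases represented by each IUPAC nucleotide code.
IUPAC_FOLD = {
    "A": 1, "C": 1, "G": 1, "T": 1,
    "R": 2, "Y": 2, "S": 2, "W": 2, "K": 2, "M": 2,
    "B": 3, "D": 3, "H": 3, "V": 3,
    "N": 4,
}


def degeneracy(primer: str) -> int:
    """Number of distinct sequences encoded by a (possibly degenerate) primer."""
    fold = 1
    for base in primer.upper():
        fold *= IUPAC_FOLD[base]
    return fold


def add_tail(tail: str, locus_specific: str) -> str:
    """Prepend a fixed tail to the 5' end of a locus-specific primer (5'->3' strings)."""
    return tail + locus_specific


# Placeholder sequences, NOT taken from the toolkit.
FAKE_TAIL_F = "ACGTACGTACGTACGT"
inner_forward = "GARTGYAAYGG"

tailed = add_tail(FAKE_TAIL_F, inner_forward)
print(degeneracy(inner_forward))  # 8 (R, Y, and Y each contribute a factor of 2)
print(degeneracy(FAKE_TAIL_F))    # 1: a fixed tail is a single sequence
print(tailed)                     # ACGTACGTACGTACGTGARTGYAAYGG
```

Because the tail is a single defined sequence, a sequencing primer matching it anneals to every amplicon carrying that tail, regardless of how degenerate the locus-specific part of the amplification primer is.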
One additional feature of our NPCL toolkit is that the average length of the NPCLs within it is 1,050 bp, a length that can easily be amplified in one PCR reaction and sequenced in both directions, allowing efficient use of resources. In contrast, the average marker lengths are 920 bp for 10 NPCLs in Li et al. (2007), 760 bp for 26 NPCLs in Townsend et al. (2008), 873 bp for 22 NPCLs in Shen et al. (2011), and 470 bp for 75 NPCLs in Fong and Fujita (2011). Longer markers provide more sites than shorter ones for equivalent money and time. This feature makes our NPCL toolkit more cost-effective than previously developed nuclear marker sets.
Phylogenetic Utility
The vertebrate NPCL toolkit we developed here shows great promise in terms of phylogenetic utility. A remarkable feature of our NPCL toolkit is that it provides 102 NPCLs with a broad range of evolutionary rates. In our demonstration case, we used 30 NPCLs to resolve a family-level salamander phylogeny using both traditional concatenation analyses and a more promising species-tree analysis. However, this example does not mean that our toolkit performs well only on deep-timescale questions. Our ongoing study using this toolkit to resolve the relationships within Plethodontidae, a rapidly radiating group of salamanders, suggests that the toolkit developed here also performs well in resolving species-level phylogenies. For many vertebrate groups in which applicable nuclear markers are limited, such as some teleosts, frogs, and salamanders, our NPCL toolkit can provide a one-stop solution for phylogenetic studies from the family level to the species level. Even for those groups in which specific nuclear marker sets have been developed, our toolkit is still worth trying, as many more loci can be easily obtained that may resolve some difficult branches.
The Toolkit Is a Good Addition to Sequence Capture Approaches
Recently, sequence capture approaches have been applied to vertebrate phylogenomics (Crawford et al. 2012; Faircloth et al. 2012; Lemmon et al. 2012; McCormack et al. 2012). These approaches begin with the selective capture of genomic regions. Briefly, fragmented gDNA is hybridized to DNA or RNA probes, either on an array or in solution. Nontargeted DNA is then washed away, and the targeted DNA is sequenced through NGS. The most promising feature of the sequence capture approach is that it can simultaneously produce hundreds to thousands of loci for tens of individuals within a relatively short time. Therefore, the sequence capture approach is considered to be much more cost-effective than the PCR-based method. According to the calculation of Lemmon et al. (2012), for a 100 taxa × 500 loci project, the cost of the sequence capture method is just 1–3.5% of that of the PCR-based method.
However, the sequence capture approach is currently too challenging for most phylogenetic researchers. Typical NGS runs (454 or Illumina) used by the sequence capture method generate 1,000,000–2,000,000,000 sequences. Storing and processing these NGS data require significant computer memory, hardware upgrades, and bioinformatic programming skills, which are often unfamiliar to most phylogenetic researchers. Moreover, phylogenetic reconstruction assumes that orthologous genes are being analyzed across species. For the PCR-based method, the detection of paralogous genes is relatively straightforward. However, in the sequence capture method, the captured genomic regions comprise short conserved cores (probe regions) and long unconserved flanking
FIG. 5. The effect of increasing the number of nuclear loci on resolving the basal split within salamanders. Bootstrap support (%) is plotted against the number of genes for concatenation analyses and species tree analyses. Each data point represents the mean of support values estimated from 30 randomly sampled subsets. The dashed line indicates the threshold of 95% bootstrap support. The plots show that the minimum number of nuclear loci needed to robustly resolve the basal split within salamanders is 25.
sequences. Because paralogy cannot be detected until after the data are aligned, those unalignable sequences will make the detection of paralogy more difficult.
In fact, not every phylogenetic project will use more than 500 loci, as the sequence capture method normally does. Based on both empirical and simulation data, 20–50 loci are generally sufficient to answer many phylogenetic questions (Rokas et al. 2003; Spinks et al. 2009). This is also the number of loci that most phylogenetic studies will use. In such a situation, adopting the sequence capture method is not cost-effective because researchers need to use relatively expensive NGS sequencing and spend time learning new experimental techniques and carrying out sophisticated bioinformatic processing. Our NPCL toolkit is specially designed for such medium-scale phylogenetic projects using approximately 50 loci. Such a number of loci can easily be supplied by our 102 NPCLs. Because more than 90% of the PCR reactions generated by our toolkit can be directly sequenced, the average cost for one locus per sample is rather low. In our laboratory, generating one new sequence typically costs US$3 (without considering labor).
In addition, researchers sometimes have only tiny amounts of DNA but wish to perform a multilocus phylogenetic analysis. In such a situation, the sequence capture method is difficult to implement because it normally requires DNA at the microgram level (Lemmon et al. 2012). Our NPCL toolkit can fill the gap here. Owing to the nested PCR strategy, the sensitivity of PCR reactions in our method is extremely high. In many test experiments in our laboratory, the toolkit and protocol could produce target bands with only 5–10 ng of DNA.
Our NPCL toolkit is an alternative to the sequence capture method for the everyday work of phylogenetic researchers. Which method to choose depends on two major drivers: the amount of DNA and the expected number of loci. When DNA is limited, the better solution may be PCR; otherwise, sequence capture also works. Taking into account the money and time the two methods require, we speculate that the economic transition point from PCR to sequence capture is at approximately 100 loci. That assessment is why our toolkit includes 102 NPCL markers. Our proposal is that when using 100 or fewer loci, one can try our NPCL toolkit; when using >100 loci, sequence capture should be used.
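The rule of thumb above can be written down as a small decision helper, together with a rough consumables estimate based on the figure of US$3 per sequence quoted earlier. This is only a sketch of the stated heuristic: the function names are hypothetical, the 100-locus transition point is the authors' speculation rather than a measured threshold, and the cost estimate ignores labor, failed reactions, and outgroup sequences taken from public genomes.

```python
def suggest_method(n_loci: int, dna_limited: bool, transition_loci: int = 100) -> str:
    """Suggest an approach from the two drivers named in the text."""
    if dna_limited or n_loci <= transition_loci:
        return "nested PCR + Sanger"
    return "sequence capture + NGS"


def sanger_cost_usd(n_taxa: int, n_loci: int, cost_per_sequence: float = 3.0) -> float:
    """Rough consumables cost if every taxon-locus combination is sequenced once."""
    return n_taxa * n_loci * cost_per_sequence


print(suggest_method(30, dna_limited=False))    # nested PCR + Sanger
print(suggest_method(500, dna_limited=False))   # sequence capture + NGS
print(suggest_method(500, dna_limited=True))    # nested PCR + Sanger
print(sanger_cost_usd(20, 50))                  # 3000.0 for a hypothetical 20-taxon, 50-locus project
```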
Future Directions
In this study, we used multiple genome alignments deposited in the University of California–Santa Cruz (UCSC) genome browser to identify long and conserved exons across jawed vertebrates. Owing to the nested PCR strategy, the experimental performance of the developed NPCLs indicated that they are highly stable in all major jawed vertebrate groups. Recently, a database for mining exon and intron markers, called EvolMarkers, has been built by Li et al. (2012). Careful investigation of this database may identify many conserved exons within nonvertebrates, whose interrelationships are currently more problematic than those of vertebrates. Because the nonvertebrates constitute many distantly related groups, it may be impossible to develop a single set of PCR primers for all nonvertebrates. However, following a similar marker development strategy, multiple NPCL toolkits could be constructed for various groups of nonvertebrates, such as arthropods, echinoderms, and molluscs. In addition, because introns are flanked by conserved exons, the idea of using nested PCRs for marker development could also be applied to the development of EPIC (exon-primed intron crossing) markers, which are more suitable for shallow-scale phylogenetic or phylogeographic projects.
Despite the benefits of our proposed method, it must be recognized that when handling large-scale projects, such as 200 taxa × 100 loci, the use of our toolkit and Sanger sequencing will still require significant cost, time, and labor. An alternative solution is to use NGS to replace Sanger sequencing. Recently, 454 NGS technology has been applied to sequence targeted gene regions from a pool of PCR products from different specimens (Binladen et al. 2007; Meyer et al. 2008). In such experiments, specific tagging sequences must be added to amplicons by either PCR (Binladen et al. 2007) or blunt-end ligation (Meyer et al. 2008). Therefore, if the tailing sequences of the second-round PCR primers in our NPCL toolkit are replaced by tagging sequences (for tag design, see Faircloth and Glenn 2012), all PCR products can be pooled together and sequenced with 454 NGS, which will greatly reduce the money and time cost compared with Sanger sequencing. However, parallel tagged sequencing via NGS does not circumvent the process of PCR for each individual at each locus, which may be the most onerous part of a large-scale phylogenomic project. Some promising new technologies may help to solve this problem, such as microdroplet PCR (Tewhey et al. 2009), where millions of individual PCR reactions are performed in picoliter-scale droplets simultaneously, and the 96.96 Dynamic Array by Fluidigm, which allows 96 primer combinations to be used on 96 samples (9,216 total PCR reactions) on a single PCR plate. However, there has been little research on applying NGS and new high-throughput PCR technologies to phylogenomics, so their ease of use and cost-effectiveness still need to be explored.
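To give a feel for the scale discussed above, the following sketch counts reactions for a 200 taxa × 100 loci project. The function names are ours, and the array estimate simply assumes that samples and primer pairs are split into batches of 96; it does not model how a nested protocol would actually be run on such a platform.

```python
import math


def nested_pcr_reactions(n_taxa: int, n_loci: int, rounds: int = 2) -> int:
    """Total PCR reactions when every taxon x locus needs `rounds` rounds."""
    return n_taxa * n_loci * rounds


def arrays_needed(n_taxa: int, n_loci: int,
                  samples_per_array: int = 96,
                  assays_per_array: int = 96) -> int:
    """Arrays needed if taxa and loci are each split into batches of 96."""
    return (math.ceil(n_taxa / samples_per_array)
            * math.ceil(n_loci / assays_per_array))


if __name__ == "__main__":
    print(96 * 96)                          # reactions on one 96.96 array
    print(nested_pcr_reactions(200, 100))   # two-round reactions in total
    print(arrays_needed(200, 100))          # batches of 96 samples x 96 assays
```

Under these assumptions, the example project needs 40,000 individual reactions with the two-round protocol, whereas six 96 × 96 arrays would cover all taxon-by-locus combinations once.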
Summary
In conclusion, we have developed an improved method for rapidly amplifying and sequencing NPCLs that has proven to be useful and effective for molecular phylogenetic studies of vertebrates. The newly developed toolkit provides an attractive alternative to available methods for vertebrate phylogenomics.
Materials and Methods
Development of NPCL and Primer Design
Our previous study showed that nested PCR is overwhelmingly more effective than conventional PCR for obtaining target amplicons from complex genomic environments (Shen et al. 2012). However, nested PCR requires four conserved regions to design two pairs of primers (illustrated in fig. 6; yellow blocks represent the conserved regions used for primer design), which means that only relatively long exons are suitable as candidates for NPCL development with the nested PCR method. To search for long and conserved exons, we took advantage of our previous bioinformatic method, which used the multiple genome alignment data from the UCSC Genome Browser to identify conserved exons (Shen et al. 2011). Because the NPCL markers are to be used in vertebrates, we focused only on those multiple genome alignments that include at least six species: Danio rerio (zebrafish), Silurana tropicalis (frog), Anolis carolinensis (lizard), Gallus gallus (chicken), Mus musculus (mouse), and Homo sapiens (human). The alignments of candidate exons had to meet two criteria: length of more than 700 bp and pairwise similarity ranging from 35% to 90%. The detailed bioinformatic pipeline has been described elsewhere (Shen et al. 2011). In addition to using multiple genome alignments to screen NPCL candidates, we also manually searched the ENSEMBL database for nuclear genes that were used previously (Murphy et al. 2001; Li et al. 2007; Townsend et al. 2008; Wright et al. 2008; Zhou et al. 2011; Song et al. 2012) to check whether they contain large and appropriately conserved exons.
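The two screening criteria can be illustrated with a short filter. This is a minimal sketch and not the published pipeline (see Shen et al. 2011 for that): we assume here that similarity is the fraction of identical bases over columns where both sequences have a base, and that every pair of species must fall in the 35–90% window. The function names are hypothetical.

```python
from itertools import combinations

GAP_CHARS = set("-?N")


def pairwise_similarity(a: str, b: str) -> float:
    """Fraction of identical bases over columns where both have a base."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = same = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in GAP_CHARS or y in GAP_CHARS:
            continue
        compared += 1
        same += x == y
    return same / compared if compared else 0.0


def passes_filter(alignment: dict, min_len: int = 700,
                  min_sim: float = 0.35, max_sim: float = 0.90) -> bool:
    """True if the alignment is longer than min_len and all pairs are in range."""
    seqs = list(alignment.values())
    if len(seqs) < 2 or len(seqs[0]) <= min_len:
        return False
    return all(min_sim <= pairwise_similarity(a, b) <= max_sim
               for a, b in combinations(seqs, 2))
```

An upper bound on similarity is as useful as a lower one here: an exon that is nearly identical across species is easy to amplify but carries little phylogenetic signal.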
As a result, we assembled a total of 305 NPCL candidate alignments, of which 120 contained the appropriate number of conserved blocks, and used these to design nested PCR primers. To increase the success rates of our NPCL markers in amniotes, we manually added a turtle sequence (Chrysemys picta bellii) to each of the candidate alignments, using data downloaded from the ENSEMBL database. A total of 480 primers were designed for the 120 NPCL candidates. Briefly, the first-round PCR primers are used only to enrich target regions from genomic environments and not to obtain target amplicons, so the degeneracy of these primers is normally high to increase reaction sensitivity; the second-round PCR primers are used to obtain target amplicons, so the degeneracy of these primers is lower to increase reaction specificity. Our previous study showed that the nested PCR method often produces strong and single amplicon bands (Shen et al. 2012). To facilitate the next-step direct sequencing, we added a tail (5′-AGGGTTTTCCCAGTCACGAC-3′) to the 5′-end of all second-round forward primers and a tail (5′-AGATAACAATTTCACACAGG-3′) to the 5′-end of all second-round reverse primers. These tail sequences provide two unique anchoring sites for direct sequencing from cleaned PCR products. In our pilot experiments, adding the tail sequences to primers did not affect the efficiency of the second-round PCR.

FIG. 6. Schematic representation of the experimental protocol for using our NPCL toolkit. (i) First PCR with primers F1 and R1, using gDNA as template (50–100 ng DNA in a 25 µl reaction): the target region is enriched from the complex genomic environment with one pair of highly degenerate primers. Cycling conditions: an initial denaturation step of 4 min at 94 °C, followed by 35 cycles of 94 °C for 45 s, 45 °C for 40 s, and 72 °C for 2 min, and a final extension at 72 °C for 10 min. (ii) Second PCR with tailed primers F2 and R2, using 1 µl of the first-round PCR product as template in a 25 µl reaction: the target region is specifically amplified with one pair of tailed, low-degeneracy primers. Cycling conditions: an initial denaturation step of 4 min at 94 °C, followed by 35 cycles of 94 °C for 45 s, 50 °C for 40 s, and 72 °C for 90 s, and a final extension at 72 °C for 10 min. (iii) PCR evaluation and sequencing: agarose gel electrophoresis results are evaluated; products showing a single and strong band (normally >90%) are cleaned with ExoI and FastAP (25 µl PCR product with 2 U ExoI and 0.4 U FastAP; 37 °C for 30 min, then 80 °C for 15 min) and sequenced directly with the general sequencing primers Seq_F and Seq_R (a Sanger sequencing reaction is performed with 0.5 µl BigDye and 1 µl cleanup PCR product); otherwise, gel cutting or cloning precedes sequencing. Note that for each NPCL, nested PCR primers are designed on four short conserved blocks flanking the target region.
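The tailing step is simple string concatenation at the 5′-end, as the sketch below shows. The two tail sequences are those given in the text; the helper names and the locus-specific primer in the usage example are hypothetical and serve only as illustration.

```python
# Tail sequences from the text; they double as the universal sequencing primers.
FWD_TAIL = "AGGGTTTTCCCAGTCACGAC"
REV_TAIL = "AGATAACAATTTCACACAGG"

# IUPAC nucleotide codes, since nested PCR primers may be degenerate.
IUPAC = set("ACGTRYSWKMBDHVN")


def add_tail(primer: str, tail: str) -> str:
    """Prepend a tail to the 5'-end of a primer written 5'->3'."""
    primer = primer.upper()
    bad = set(primer) - IUPAC
    if bad:
        raise ValueError(f"non-IUPAC characters in primer: {sorted(bad)}")
    return tail + primer


def primers_needed(n_loci: int, primers_per_locus: int = 4) -> int:
    """Two primer pairs (F1/R1 and F2/R2) per locus."""
    return n_loci * primers_per_locus


if __name__ == "__main__":
    hypothetical_f2 = "GGNACRTGYTAYGC"  # made-up second-round forward primer
    print(add_tail(hypothetical_f2, FWD_TAIL))
    print(primers_needed(120))
```

Because every second-round amplicon carries the same two tails, one pair of sequencing primers serves all loci, which is what makes direct sequencing of cleaned products practical.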
Experimental Testing for Candidate Markers in 16 Jawed Vertebrates
To test the experimental performance of our newly designed NPCL markers, we selected 16 taxa representing nine major jawed vertebrate lineages: Chondrichthyes (Sphyrna lewini), Actinopterygii (Lepisosteus oculatus and Pangasius sutchi), Dipnoi (Protopterus annectens), Lissamphibia (Ichthyophis bannanicus, Batrachuperus yenyuanensis, and Rana nigromaculata), Mammalia (Mus musculus and Sus scrofa domestica), Testudines (Trionyx sinensis and Podocnemis unifilis), Aves (Struthio camelus and Zosterops japonica), Crocodylia (Crocodylus siamensis), and Squamata (Hemidactylus bowringii and Naja naja atra). Total genomic DNA was extracted from ethanol-preserved tissues (liver or muscle) using the standard salt extraction protocol. All extracted genomic DNAs were diluted to a concentration of 50 ng µl−1 with 1× TE and stored at −20 °C before PCR amplification.
All 120 NPCL markers were tested with a two-round PCR strategy (nested PCR). The first-round PCR was performed in a 25 µl reaction containing 1–2 µl template DNA (50–100 ng), with final concentrations of 1× PCR buffer, 200 µM dNTP, 400 nM of each forward and reverse first-round primer, and 1.25 U Taq polymerase (TransTaq High Fidelity, TransGen, Beijing). The cycling conditions of the first-round PCR were as follows: an initial denaturation step of 4 min at 94 °C, followed by 35 cycles of a 45 s denaturation at 94 °C, a 40 s annealing at 45 °C, and a 2 min extension at 72 °C, followed by a final 10 min extension at 72 °C. The second-round PCR was also performed in a 25 µl reaction containing 1 µl of the first-round PCR product (without dilution) and final concentrations of 1× PCR buffer, 200 µM dNTP, 400 nM of each forward and reverse second-round primer, and 1.25 U Taq polymerase. The cycling conditions of the second-round PCR were as follows: an initial denaturation step of 4 min at 94 °C, followed by 35 cycles of a 45 s denaturation at 94 °C, a 40 s annealing at 50 °C, and a 90 s extension at 72 °C, followed by a final 10 min extension at 72 °C.
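For planning purposes, the two cycling programs can be encoded as plain data and their programmed hold times summed. The dictionary layout and function name below are our own; the temperatures and durations are those stated in the text, and ramping time between steps is deliberately ignored.

```python
# Each step is (temperature in degrees C, hold time in seconds).
FIRST_ROUND = {
    "initial": (94, 240),
    "cycles": 35,
    "steps": [(94, 45), (45, 40), (72, 120)],
    "final": (72, 600),
}
SECOND_ROUND = {
    "initial": (94, 240),
    "cycles": 35,
    "steps": [(94, 45), (50, 40), (72, 90)],
    "final": (72, 600),
}


def programmed_seconds(program: dict) -> int:
    """Sum of programmed hold times; thermocycler ramp time is not included."""
    per_cycle = sum(seconds for _, seconds in program["steps"])
    return program["initial"][1] + program["cycles"] * per_cycle + program["final"][1]


if __name__ == "__main__":
    for name, prog in (("first round", FIRST_ROUND), ("second round", SECOND_ROUND)):
        print(name, round(programmed_seconds(prog) / 60, 1), "min")
```

The two rounds differ only in annealing temperature (45 °C versus 50 °C) and extension time (2 min versus 90 s), which the side-by-side data make easy to see.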
One microliter of the second-round PCR products was analyzed on a 1.0% TAE agarose gel. An NPCL marker was considered successful if more than 8 of the 16 tested taxa produced target amplicon bands. On this basis, 102 out of 120 tested NPCL markers were successful. The nested-PCR primers for the 102 NPCL markers can be found in supplementary table S1, Supplementary Material online. If the PCR products contained significant nonspecific amplicon bands (normally <10%), they needed further processing, for example, standard gel cutting or cloning. If the PCR reactions produced single amplicon bands (normally >90%), they were cleaned with ExoFAP treatment: 2 U ExoI and 0.4 U FastAP (all Fermentas) were added to the PCR tube and incubated for 30 min at 37 °C and 15 min at 80 °C. The cleanup PCR reactions can be directly used as templates for Sanger sequencing. According to our experimental design, all PCR fragments can be sequenced from both ends with the two universal sequencing primers Seq_F (5′-AGGGTTTTCCCAGTCACGAC-3′) and Seq_R (5′-AGATAACAATTTCACACAGG-3′). A typical Sanger sequencing reaction in our laboratory consumes 0.5 µl BigDye and 1 µl cleanup PCR product. The primer design strategy, the laboratory protocol for the nested PCR method, and the pretreatment of PCR products before Sanger sequencing are illustrated in figure 6.
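The success criterion for a marker is a strict majority rule over the 16 test taxa, which can be sketched as follows. The function names and the example band pattern are illustrative; only the threshold (more than 8 of 16) and the 102-of-120 outcome come from the text.

```python
def marker_successful(target_band: list, min_exceeded: int = 8) -> bool:
    """True if more than `min_exceeded` taxa produced a target amplicon band."""
    return sum(bool(x) for x in target_band) > min_exceeded


def success_rate(n_success: int, n_tested: int) -> float:
    """Proportion of successful outcomes."""
    return n_success / n_tested


if __name__ == "__main__":
    # Hypothetical marker: target band in 9 of the 16 taxa.
    pattern = [True] * 9 + [False] * 7
    print(marker_successful(pattern))
    print(success_rate(102, 120))  # markers retained after testing
```

Note that a marker amplifying in exactly 8 taxa does not pass, because the rule requires more than half of the tested species.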
Calculation of Relative Evolutionary Rate of 102 NPCLs
The rate multipliers (m) across partitions estimated in MrBayes 3.2 (Ronquist et al. 2012) are used as relative evolutionary rates. To calculate these parameters, alignments for each NPCL were prepared for 12 species: Homo sapiens, Macaca mulatta, Mus musculus, Rattus norvegicus, Gallus gallus, Meleagris gallopavo, Chrysemys picta bellii, Anolis carolinensis, Silurana tropicalis, Tetraodon nigroviridis, Takifugu rubripes, and Danio rerio. Because genome data are available for the 12 species, we did not generate any new data. The 102 NPCL alignments were then combined and subjected to MrBayes analyses partitioned by genes. Each gene was assigned a separate GTR + Γ + I model, and all model parameters were unlinked. Two Markov chain Monte Carlo (MCMC) runs were performed with one cold chain and three heated chains (temperature set to 0.1) for 50 million generations and sampled every 1,000 generations. The rate multiplier for each gene was estimated using Tracer version 1.4 after discarding the first 50% of the generations. All evolutionary rates were normalized by dividing by the maximum value of the obtained rates.
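The final normalization and the burn-in bookkeeping amount to a few lines of arithmetic. In the sketch below, the gene names are taken from the marker list but the rate values are invented for illustration, and the sample count ignores the extra sample some programs record at generation 0.

```python
def normalize_rates(rates: dict) -> dict:
    """Divide every rate multiplier by the largest one, so the fastest gene is 1.0."""
    top = max(rates.values())
    if top <= 0:
        raise ValueError("rates must contain a positive value")
    return {gene: value / top for gene, value in rates.items()}


def retained_samples(generations: int, sample_every: int, burnin_fraction: float) -> int:
    """Samples kept per run after discarding the burn-in fraction."""
    total = generations // sample_every
    return total - int(total * burnin_fraction)


if __name__ == "__main__":
    made_up = {"RAG1": 0.5, "TTN": 2.0, "MOS": 1.0}  # illustrative values only
    print(normalize_rates(made_up))
    print(retained_samples(50_000_000, 1000, 0.5))
```

Normalizing by the maximum places all 102 markers on a common 0 to 1 scale, which makes it easy to pick slow markers for deep nodes and fast ones for shallow nodes.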
Gene and Taxon Sampling for Investigating Higher Level Salamander Relationships
To test the utility of our NPCL toolkit in a real case, we selected 19 salamander taxa representing all 10 salamander families and 9 outgroup taxa to investigate the family-level relationships of salamanders (supplementary table S2, Supplementary Material online). For gene sampling, we randomly selected 30 NPCL markers whose PCR success rates were more than 90% in the 16 previously tested vertebrates. Among the target 840 sequences (30 markers for 28 taxa), 201 were available in public databases (NCBI, UCSC, and ENSEMBL), whereas the remaining 639 sequences needed to be generated de novo. The experimental procedure was as described earlier. All obtained sequences were examined by checking for the presence of premature stop codons (which would indicate a pseudogene) and by BlastX searches against the nonredundant protein sequences (nr) database to confirm that they were our target genes. The NPCLs, species, and accession numbers for the newly obtained sequences are listed in supplementary table S2, Supplementary Material online.
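The stop-codon screen mentioned above can be sketched as follows. This is an illustration rather than the authors' script: it assumes an unaligned sequence, the standard nuclear genetic code, and a known reading frame (frame 0 by default). The function name and parameters are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard nuclear genetic code


def has_premature_stop(cds: str, frame: int = 0,
                       allow_terminal_stop: bool = True) -> bool:
    """True if an in-frame stop codon occurs before the end of the sequence."""
    seq = cds.upper()
    codons = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
    # A stop as the very last codon is a normal gene end, not a premature stop.
    if allow_terminal_stop and codons and codons[-1] in STOP_CODONS:
        codons = codons[:-1]
    return any(codon in STOP_CODONS for codon in codons)


if __name__ == "__main__":
    print(30 * 28, 30 * 28 - 201)           # target and de novo sequence counts
    print(has_premature_stop("ATGTAAGGG"))  # stop in the second codon
```

For amplicons that lie entirely inside an exon, setting `allow_terminal_stop=False` is the stricter choice, because no stop codon is expected anywhere in such a fragment.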
Phylogenetic Analyses
Alignments of all 30 NPCL markers were conducted using the G-INS-i method from MAFFT (Katoh et al. 2005) under the default settings, according to their translated amino acid sequences, and then refined by eye. All 30 refined alignments were combined into a concatenated data set (27,834 bp).
For the concatenated data set, we manually defined five partitioning strategies: 2 partitions (one for codon positions 1 and 2 and one for codon position 3); 3 partitions (one partition for each codon position); 30 partitions (one partition for each gene); 60 partitions (one for codon positions 1 + 2 and one for codon position 3 across 30 genes); and 90 partitions (codon position partitioning across 30 genes). Comparisons of the five partitioning strategies and selection of the corresponding nucleotide substitution models were conducted under the Bayesian information criterion implemented in PartitionFinder (Lanfear et al. 2012). The 3-partition scheme (one partition for each codon position) was chosen as the best-fitting partitioning strategy, and all 3 partitions favored the GTR + Γ + I model.
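The five partitioning strategies can be enumerated programmatically, and the criterion used to compare them is the standard BIC formula. The sketch below uses 30 hypothetical genes of 900 bp each, all starting at codon position 1; the helper names and the tuple layout (start, end, step) are our own and do not reproduce any program's input format.

```python
import math


def build_scheme(genes, per_gene: bool, codon_mode: str) -> dict:
    """Map partition label -> list of (start, end, step) 1-based site ranges."""
    scheme = {}
    for name, start, end in genes:
        prefix = name if per_gene else "all"
        for pos in (1, 2, 3):
            if codon_mode == "none":
                label = prefix
            elif codon_mode == "12_3":
                label = f"{prefix}_pos12" if pos < 3 else f"{prefix}_pos3"
            elif codon_mode == "1_2_3":
                label = f"{prefix}_pos{pos}"
            else:
                raise ValueError(f"unknown codon_mode: {codon_mode}")
            scheme.setdefault(label, []).append((start + pos - 1, end, 3))
    return scheme


def bic(log_likelihood: float, n_params: int, n_sites: int) -> float:
    """Bayesian information criterion: -2 lnL + k ln(n); lower is better."""
    return -2.0 * log_likelihood + n_params * math.log(n_sites)


# Hypothetical gene boundaries, for illustration only.
GENES = [(f"gene{i + 1}", i * 900 + 1, (i + 1) * 900) for i in range(30)]
SCHEMES = {
    "2 partitions": build_scheme(GENES, False, "12_3"),
    "3 partitions": build_scheme(GENES, False, "1_2_3"),
    "30 partitions": build_scheme(GENES, True, "none"),
    "60 partitions": build_scheme(GENES, True, "12_3"),
    "90 partitions": build_scheme(GENES, True, "1_2_3"),
}
```

The BIC penalizes each extra parameter by ln(n), so a heavily parameterized scheme such as the 90-partition one wins only if its likelihood gain outweighs that penalty.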
The concatenated data set was separately analyzed with both ML and Bayesian inference (BI) methods under the 3-partition scheme. Partitioned ML analyses were implemented using RAxML version 7.2.6 (Stamatakis 2006) with the GTR + Γ + I model assigned to each partition. A search that combined 100 separate ML searches was applied to find the optimal tree, and branch support for each node was evaluated with 500 standard bootstrapping replicates (-f d -b 500 option) implemented in RAxML. The partitioned BI was conducted using MrBayes 3.2 (Ronquist et al. 2012). All model parameters were unlinked. Two MCMC runs were performed with one cold chain and three heated chains (temperature set to 0.1) for 60 million generations and sampled every 1,000 generations. The chain stationarity was visualized by plotting −ln L against the generation number using Tracer version 1.4, and the first 50% of generations were discarded. Topologies and posterior probabilities were estimated from the remaining generations. Two runs for each analysis were compared for congruence.
We also performed Bayesian phylogenetic analyses under a mixture model, CAT + GTR + Γ4, in PhyloBayes 3.3 (Lartillot et al. 2009) with two independent MCMC runs. Each run was performed for 10,000 cycles and sampled every cycle. Stationarity was considered reached when the largest discrepancy (maxdiff) between the two independent runs was less than 0.1. The first 5,000 trees in each MCMC run were discarded. The remaining 10,000 trees of the two runs were sampled every 5 trees to generate a 50% majority-rule posterior consensus tree.
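The burn-in and thinning arithmetic for the PhyloBayes runs can be checked with a short helper. The function name is ours, and it assumes exactly one tree is saved per cycle, as stated in the text.

```python
def trees_in_consensus(cycles_per_run: int, burnin: int,
                       n_runs: int, thin_every: int) -> int:
    """Number of trees entering the consensus after burn-in and thinning."""
    if burnin >= cycles_per_run:
        raise ValueError("burn-in must be shorter than the run")
    kept_per_run = cycles_per_run - burnin      # one tree saved per cycle
    pooled = kept_per_run * n_runs
    return pooled // thin_every


if __name__ == "__main__":
    print(trees_in_consensus(10_000, 5_000, n_runs=2, thin_every=5))
```

With the settings described, 10,000 post-burn-in trees are pooled from the two runs, and thinning to every fifth tree leaves 2,000 trees for the majority-rule consensus.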
Species tree estimation was conducted using the pseudo-ML approach in the program MP-EST (Liu et al. 2010) under the coalescent model. Briefly, the gene trees for the 30 NPCL markers were reconstructed with ML under the GTR + Γ8 model using PHYML 3.0 (Guindon et al. 2010). The resulting 30 gene trees were rooted with an outgroup (Chrysemys picta bellii) and used to generate a species tree with MP-EST. The robustness of the species tree was evaluated with nonparametric bootstrapping of 500 replicates.
Alternative phylogenetic hypotheses were tested based on the 30-gene data set. We first calculated sitewise log likelihoods of alternative trees using RAxML (-f g) under the GTR + Γ + I model. Then, the site log-likelihood file was used to estimate P values for each alternative tree in the CONSEL program (Shimodaira and Hasegawa 2001) using the Kishino–Hasegawa test (KH) (Kishino and Hasegawa 1989), the approximately unbiased test (AU) (Shimodaira 2002), and the RELL bootstrap proportion test (BP).
Supplementary Material
Supplementary tables S1 and S2 and figures S1 and S2 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).
Acknowledgments
The authors are grateful to David Wake and Carol Spencer of the Museum of Vertebrate Zoology at the University of California, Berkeley, for tissue loans. David Wake provided many useful comments on the manuscript. This work was supported by National Natural Science Foundation of China grants (31172075 and 30900136) and the National Science Fund for Excellent Young Scholars (no. pending).
References
Binladen J, Gilbert MTP, Bollback JP, Panitz F, Bendixen C, Nielsen R, Willerslev E. 2007. The use of coded PCR primers enables high-throughput sequencing of multiple homolog amplification products by 454 parallel sequencing. PLoS One 2:e197.
Crawford NG, Faircloth BC, McCormack JE, Brumfield RT, Winker K, Glenn TC. 2012. More than 1000 ultraconserved elements provide evidence that turtles are the sister group to archosaurs. Biol Lett. 8:783–786.
Delsuc F, Brinkmann H, Philippe H. 2005. Phylogenomics and the reconstruction of the tree of life. Nat Rev Genet. 6:361–375.
Duellman WE, Trueb L. 1994. Biology of amphibians. Baltimore (MD): Johns Hopkins University Press.
Dunn CW, Hejnol A, Matus DQ, et al. (18 co-authors). 2008. Broad phylogenomic sampling improves resolution of the animal tree of life. Nature 452:745–749.
Faircloth BC, Glenn TC. 2012. Not all sequence tags are created equal: designing and validating sequence identification tags robust to indels. PLoS One 7:e42543.
Faircloth BC, McCormack JE, Crawford NG, Brumfield RT, Glenn TC. 2012. Ultraconserved elements anchor thousands of genetic markers for target enrichment spanning multiple evolutionary timescales. Syst Biol. 61:717–726.
Fong JJ, Brown JM, Fujita MK, Boussau B. 2012. A phylogenomic approach to vertebrate phylogeny supports a turtle-archosaur affinity and a possible paraphyletic lissamphibia. PLoS One 7:e48990.
Fong JJ, Fujita MK. 2011. Evaluating phylogenetic informativeness and data-type usage for new protein-coding genes across Vertebrata. Mol Phylogenet Evol. 61:300–307.
Frost DR, Grant T, Faivovich J, et al. (18 co-authors). 2006. The amphibian tree of life. Bull Am Mus Nat Hist. 297:8–370.
Gao KQ, Shubin NH. 2001. Late Jurassic salamanders from northern China. Nature 410:574–577.
Guindon S, Dufayard J-F, Lefort V, Anisimova M, Hordijk W, Gascuel O. 2010. New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Syst Biol. 59:307–321.
Hugall AF, Foster R, Lee MSY. 2007. Calibration choice, rate smoothing, and the pattern of tetrapod diversification according to the long nuclear gene RAG-1. Syst Biol. 56:543–563.
Inoue JG, Miya M, Lam K, Tay BH, Danks JA, Bell J, Walker TI, Venkatesh B. 2010. Evolutionary origin and phylogeny of the modern holocephalans (Chondrichthyes: Chimaeriformes): a mitogenomic perspective. Mol Biol Evol. 27:2576–2586.
Katoh K, Kuma K, Toh H, Miyata T. 2005. MAFFT version 5: improvement in accuracy of multiple sequence alignment. Nucleic Acids Res. 33:511–518.
Kishino H, Hasegawa M. 1989. Evaluation of the maximum likelihood estimate of the evolutionary tree topologies from DNA sequence data, and the branching order in Hominoidea. J Mol Evol. 29:170–179.
Künstner A, Wolf JBW, Backström N, et al. (13 co-authors). 2010. Comparative genomics based on massive parallel transcriptome sequencing reveals patterns of substitution and selection across 10 bird species. Mol Ecol. 19:266–276.
Lanfear R, Calcott B, Ho SYW, Guindon S. 2012. Partitionfinder: combined selection of partitioning schemes and substitution models for phylogenetic analyses. Mol Biol Evol. 29:1695–1701.
Lartillot N, Lepage T, Blanquart S. 2009. PhyloBayes 3: a Bayesian software package for phylogenetic reconstruction and molecular dating. Bioinformatics 25:2286–2288.
Lemmon AR, Brown JM, Stanger-Hall K, Lemmon EM. 2009. The effect of ambiguous data on phylogenetic estimates obtained by maximum likelihood and Bayesian inference. Syst Biol. 58:130–145.
Lemmon AR, Emme SA, Lemmon EM. 2012. Anchored hybrid enrichment for massively high-throughput phylogenomics. Syst Biol. 61:727–744.
Li C, Ortí G, Zhang G, Lu G. 2007. A practical approach to phylogenomics: the phylogeny of ray-finned fish (Actinopterygii) as a case study. BMC Evol Biol. 7:44.
Li C, Riethoven JJ, Naylor GJ. 2012. EvolMarkers: a database for mining exon and intron markers for evolution, ecology and conservation studies. Mol Ecol Resour. 12:967–971.
Liu L, Yu L, Edwards SV. 2010. A maximum pseudo-likelihood approach for estimating species trees under the coalescent model. BMC Evol Biol. 10:302.
McCormack JE, Faircloth BC, Crawford NG, Gowaty PA, Brumfield RT, Glenn TC. 2012. Ultraconserved elements are novel phylogenomic markers that resolve placental mammal phylogeny when combined with species-tree analysis. Genome Res. 22:746–754.
McCormack JE, Hird SM, Zellmer AJ, Carstens BC, Brumfield RT. 2013. Applications of next-generation sequencing to phylogeography and phylogenetics. Mol Phylogenet Evol. 66:526–538.
Meyer A, Van de Peer Y. 2005. From 2R to 3R: evidence for a fish-specific genome duplication (FSGD). Bioessays 27:937–945.
Meyer M, Stenzel U, Hofreiter M. 2008. Parallel tagged sequencing on the 454 platform. Nat Protoc. 3:267–278.
Murphy WJ, Eizirik E, Johnson WE, Zhang YP, Ryder OA, O'Brien SJ. 2001. Molecular phylogenetics and the origins of placental mammals. Nature 409:614–618.
Philippe H, Derelle R, Lopez P, et al. (20 co-authors). 2009. Phylogenomics revives traditional views on deep animal relationships. Curr Biol. 19:706–712.
Philippe H, Telford M. 2006. Large-scale sequencing and the new animal phylogeny. Trends Ecol Evol. 21:614–620.
Portik DM, Wood PL, Grismer JL, Stanley EL, Jackman TR. 2011. Identification of 104 rapidly-evolving nuclear protein-coding markers for amplification across scaled reptiles using genomic resources. Conserv Genet Resour. 4:1–10.
Pyron RA, Wiens JJ. 2011. A large-scale phylogeny of Amphibia including over 2800 species, and a revised classification of extant frogs, salamanders, and caecilians. Mol Phylogenet Evol. 61:543–583.
Roelants K, Gower DJ, Wilkinson M, Loader SP, Biju SD, Guillaume K, Moriau L, Bossuyt F. 2007. Global patterns of diversification in the history of modern amphibians. Proc Natl Acad Sci U S A. 104:887–892.
Rokas A, Williams BL, King N, Carroll SB. 2003. Genome-scale approaches to resolving incongruence in molecular phylogenies. Nature 425:798–804.
Ronquist F, Teslenko M, Van der Mark P, Ayres DL, Darling A, Höhna S, Larget B, Liu L, Suchard MA, Huelsenbeck JP. 2012. MrBayes 3.2: efficient Bayesian phylogenetic inference and model choice across a large model space. Syst Biol. 61:539–542.
Roure B, Baurain D, Philippe H. 2013. Impact of missing data on phylogenies inferred from empirical phylogenomic data sets. Mol Biol Evol. 30:197–214.
San Mauro D. 2010. A multilocus timescale for the origin of extant amphibians. Mol Phylogenet Evol. 56:554–561.
San Mauro D, Vences M, Alcobendas M, Zardoya R, Meyer A. 2005. Initial diversification of living amphibians predated the breakup of Pangaea. Am Nat. 165:590–599.
Shen XX, Liang D, Wen JZ, Zhang P. 2011. Multiple genome alignments facilitate development of NPCL markers: a case study of tetrapod phylogeny focusing on the position of turtles. Mol Biol Evol. 28:3237–3252.
Shen XX, Liang D, Zhang P. 2012. The development of three long universal nuclear protein-coding locus markers and their application to osteichthyan phylogenetics with nested PCR. PLoS One 7:e39256.
Shimodaira H. 2002. An approximately unbiased test of phylogenetic tree selection. Syst Biol. 51:492–508.
Shimodaira H, Hasegawa M. 2001. CONSEL: for assessing the confidence of phylogenetic tree selection. Bioinformatics 17:1246–1247.
Song S, Liu L, Edwards SV, Wu S. 2012. Resolving conflict in eutherian mammal phylogeny using phylogenomics and the multispecies coalescent model. Proc Natl Acad Sci U S A. 109:14942–14947.
Spinks PQ, Thomson RC, Barley AJ, Newman CE, Shaffer HB. 2010. Testing avian, squamate, and mammalian nuclear markers for cross amplification in turtles. Conserv Genet Resour. 2:127–129.
Spinks PQ, Thomson RC, Lovely GA, Shaffer HB. 2009. Assessing what is needed to resolve a molecular phylogeny: simulations and empirical data from Emydid turtles. BMC Evol Biol. 9:56.
Stamatakis A. 2006. RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics 22:2688–2690.
Tewhey R, Warner JB, Nakano M, Libby B, Medkova M, David PH, Kotsopoulos SK, Samuels ML, Hutchison JB, Larson JW. 2009. Microdroplet-based PCR enrichment for large-scale targeted sequencing. Nat Biotechnol. 27:1025–1031.
Thomson RC, Shedlock AM, Edwards SV, Shaffer HB. 2008. Developing markers for multilocus phylogenetics in non-model organisms: a test case with turtles. Mol Phylogenet Evol. 49:514–525.
Thomson RC, Wang IJ, Johnson JR. 2010. Genome-enabled development of DNA markers for ecology, evolution and conservation. Mol Ecol. 19:2184–2195.
Townsend TM, Alegre RE, Kelley ST, Wiens JJ, Reeder TW. 2008. Rapid development of multiple nuclear loci for phylogenetic analysis using genomic resources: an example from squamate reptiles. Mol Phylogenet Evol. 47:129–142.
Weisrock DW, Harmon LJ, Larson A. 2005. Resolving deep phylogenetic relationships in salamanders: analyses of mitochondrial and nuclear genomic DNA. Syst Biol. 54:758–777.
Wiens J, Bonett R, Chippindale P. 2005. Ontogeny discombobulates phylogeny: paedomorphosis and higher-level salamander relationships. Syst Biol. 54:91–110.
Wright TF, Schirtzinger EE, Matsumoto T, et al. (11 co-authors). 2008. A multilocus molecular phylogeny of the parrots (Psittaciformes): support for a Gondwanan origin during the cretaceous. Mol Biol Evol. 25:2141–2156.
Zhang P, Wake DB. 2009. Higher-level salamander relationships and divergence dates inferred from complete mitochondrial genomes. Mol Phylogenet Evol. 53:492–508.
Zhang P, Zhou H, Chen YQ, Liu YF, Qu LH. 2005. Mitogenomic perspectives on the origin and phylogeny of living amphibians. Syst Biol. 54:391–400.
Zhou XM, Xu SX, Zhang P, Yang G. 2011. Developing a series of conservative anchor markers and their application to phylogenomics of Laurasiatherian mammals. Mol Ecol Resour. 11:134–140.
Phylogenetic Performance in a Real Case
Our demonstration case included 19 salamander species that span salamander evolutionary diversity (supplementary table S2, Supplementary Material online). The nine outgroup species (two frogs, two caecilians, one turtle, one bird, two mammals, and one coelacanth) provided a largely balanced representation of relatives of salamanders. The 30 newly amplified NPCLs exhibited levels of variation comparable with that of the traditional RAG1 gene, with variable sites varying between 30% and 51% of all sequenced sites (table 1). The data set combining these 30 NPCLs comprises 27,834 bp and exhibits little substitution saturation (supplementary fig. S1, Supplementary Material online). The phylogenetic analyses of the concatenated data set using three tree-building methods (maximum likelihood [ML], Bayesian, and CAT-mixture model) produced an identical, fully resolved tree for 28 taxa (fig. 4). In all 25 nodes of the tree, the statistical support was highly robust (BPML = 99–100, PPBAY = 1.0, PPCAT = 1.0). The species tree estimated from 30 individual NPCLs without data concatenation using the pseudo-ML approach is identical to those estimated from the concatenated analyses. All nodes received bootstrap support values varying between 74 and 100 (fig. 4). We also conducted phylogenetic analyses at the amino acid level (9,278 deduced amino acid residues) using three tree-building methods (ML, Bayesian, and CAT-mixture model). The protein tree topology is identical to the DNA result, with just slightly lower branch support for some nodes (supplementary fig. S2, Supplementary Material online). Therefore, we did not further analyze the protein data set.
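The proportion of variable sites reported for each locus can be computed from an alignment as sketched below. We assume here that gaps and ambiguous characters are ignored when deciding whether a column varies and that the denominator is the full alignment length; the original calculation may have differed in these details. The function name and toy alignment are illustrative.

```python
MISSING = {"-", "?", "N"}


def variable_site_fraction(seqs: list) -> float:
    """Fraction of alignment columns showing more than one nucleotide state."""
    if not seqs:
        raise ValueError("alignment is empty")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("aligned sequences must have equal length")
    variable = 0
    for column in zip(*seqs):
        states = {c.upper() for c in column} - MISSING
        if len(states) > 1:
            variable += 1
    return variable / length


if __name__ == "__main__":
    toy = ["ACGT", "ACGA", "ACGT"]       # only the last column varies
    print(variable_site_fraction(toy))
    print(27834 // 3)                    # codons in the concatenated data set
```

The last line also confirms that the 27,834 bp concatenated alignment corresponds to the 9,278 amino acid residues used in the protein-level analyses.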
The monophyly of extant amphibians with respect to am-niotes and the close relationship between frogs and salaman-ders (the Batrachia hypothesis) are repeatedly recovered inmost recent molecular studies (San Mauro et al 2005 Zhanget al 2005 Frost et al 2006 Hugall et al 2007 Roelants et al2007 Zhang and Wake 2009 San Mauro 2010 Pyron andWiens 2011) However a recent molecular study based on26 nuclear genes (Fong et al 2012) supports a caecilianndashsalamander sister relationship with the possible paraphylyof extant amphibians Our phylogenetic analyses based on
RERE
KIAA2013SPEN
CPT2
LPHN2
LRRC8D
DISP1
EXOC8
GGPS1
1 2 3 4 5 6 7 8 9 10 11
12 13 14 15 16 17 18 19 20 21 22 X
KCNF1
SOCS5
MSH6
LRRTM4
LCT
CXCR4
B3GALT1
TTN
IRS1
SH3BP4
LRRN1
CELSR3
GRM2
CASR
ZIC1
P2RY1
MB21D2
KIAA1239
ANKRD50
FAT4PCDH10
FAT1
ENC1
DMXL1
PCDH1ARSI
FAT2
FILIP1DOPEY1
FUT9
REV3L
DSE
SYNE1
GPERMIOS
KBTBD2
PCLO
PIK3CG
LRRN3
EXTL3
MOS
VCPIP1
PDP1
ZFPM2
ZHX2
LINGO2ZEB1
ROR2
GRIN3A
SVEP1
DBC1
DOLK
DCHS1
RAG1RAG2HYPCHST1
FZD4
ARID2
CAND1
MGAT4C
FICD
SACS
FREM2
MYCBP2
SLITRK1
LIG4
STON2
DISP2
VPS18
CILPFEM1BGLCELINGO1
DET1
PPL
DNAH3
SALL1
NTN1
MED1
WFIKKN2
MED13BPTF
EVPL
SETBP1
DSEL
FLRT3
ADNPZBED4
PANX2
TLR7NHS
RP2
FIG 1 Chromosome mapping of the 102 NPCL markers in the Homo sapiens genome
2237
Universal Toolkit Including 102 NPCL doi101093molbevmst122 MBE at V
anderbilt University - M
assey Law
Library on M
arch 18 2015httpm
beoxfordjournalsorgD
ownloaded from
(a)
(b)
(c)4500bp
1200bp
200bp
4500bp
4500bp
1200bp
200bp
1200bp
200bp
4500bp
1200bp
200bp
4500bp
4500bp
1200bp
200bp
1200bp
200bp
4500bp
1200bp
200bp
4500bp
4500bp
1200bp
200bp
1200bp
200bp
Tim
e (M
a) 300
200
100
400
0
457
Tim
e (M
a) 300
200
100
400
0
457
Tim
e (M
a) 300
200
100
400
0
457
102 NPCLs
34 NP
CLs
16 Taxa 16 Taxa 16 Taxa 34 NP
CLs
34 NP
CLs
ADNP
ANKRD50
ARSI
BPTF
CAND1
CASR
CXCR4
DBC1
DISP1
DISP2
DNAH3
DOPEY1
ENC1
EXTL3
FAT1
FAT4
FICD
FLRT3
FZD4
GGPS1
GLCE
GRIN3A
GRM2
KBTBD2
KCNF1
KIAA2013
LIG4
LINGO1
LPHN2
LRRN1
MB21D2
MIOS
MYCBP2
NHS
P2RY1
PANX2
PIK3CG
RAG1
RAG2
ROR2
SACS
SALL1
SETBP1
SOCS5
SPEN
STON2
VPS18
ZEB1
ZFPM2
ZHX2
HYP
CHST1
RERE
SVEP1
LCT
PDP1
MED13
LINGO2
LRRC8D
LRRTM4
RP2
SH3BP4
VCPIP1
DET1
FREM2
MSH6
PCLO
PPL
ARID2
DCHS1
DSEL
FILIP1
KIAA1239
SLITRK1
CPT2
MGAT4C
FEM1B
DMXL1
DOLK
ZBED4
REV3L
IRS1
FAT2
CILP
FUT9
LRRN3
TTN
GPER
WFIKKN2
MED1
EXOC8
B3GALT1
CELSR3
DSE
EVPL
PCDH1
TLR7
MOS
PCDH10
NTN1
SYNE1
ZIC1
CAND1 HYP IRS1
FAT2
CELSR3
RERE
SVEP1
DNAH3
GRM2
Sphyrna
Pangasius
Lepisosteus
Protopterus
IchthyophisB
atrachuperusR
anaM
us
StruthioZ
osteropsC
rocodylus
Trionyx
Podocnem
is
Sus
Hem
idactylusN
aja
Sphyrna
Pangasius
Lepisosteus
Protopterus
IchthyophisB
atrachuperusR
anaM
us
StruthioZ
osteropsC
rocodylus
Trionyx
Podocnem
is
Sus
Hem
idactylusN
aja
Sphyrna
Pangasius
Lepisosteus
Protopterus
IchthyophisB
atrachuperusR
anaM
us
StruthioZ
osteropsC
rocodylus
Trionyx
Podocnem
is
Sus
Hem
idactylusN
aja
FIG 2 PCR performance of the 102 NPCL markers in 16 divergent vertebrate species Each square and electrophoretic lane is aligned with the testedspecies (a) The draft divergence timescale for the 16 tested vertebrate species is based on Inoue et al (2010) and the book The Timetree of Life (b) ThePCR performance of each NPCL marker is ranked by three different-colored squares black producing single target band gray having a target band butwith significant nonspecific bands white no target band The102 NPCL markers are sorted according to their PCR success rates (c) Three typicalagarose electrophoresis results for 9 NPCL markers
30 nuclear genes provide further support for the monophyly of lissamphibians and the Batrachia hypothesis (fig. 4). Additionally, all possible hypotheses against the monophyly of extant amphibians and the Batrachia hypothesis were rejected in our topological tests (table 2). However, the Batrachia hypothesis did not receive strong support in our species tree analysis (BPMP-EST = 74; fig. 4), suggesting that more nuclear genes are still needed to resolve this node.
The monophyly of the internally fertilizing salamanders (Salamandroidea; all salamanders exclusive of Hynobiidae, Cryptobranchidae, and Sirenidae) is strongly supported in our analyses (fig. 4), in line with most recent molecular studies (Wiens et al. 2005; Roelants et al. 2007; Zhang and Wake 2009; Pyron and Wiens 2011), but differing strongly from Frost et al. (2006), who recovered a clade comprising Sirenidae, Dicamptodontidae, Ambystomatidae, and Salamandridae. The internally fertilizing salamanders include two well-supported clades: one is composed of Ambystomatidae, Dicamptodontidae, and Salamandridae, and the other of Proteidae, Rhyacotritonidae, Amphiumidae, and Plethodontidae (fig. 4).
Currently, two hypotheses have been proposed regarding the basal split within living salamanders. The traditional view favors Sirenidae as the sister group to all remaining salamanders (Duellman and Trueb 1994). This hypothesis received strong support in two recent studies (based on mitochondrial genomes, BPML = 98, Zhang and Wake 2009; based on mitochondrial genomes and nuclear genes, BPML > 80, San Mauro 2010). In contrast, some studies suggest that the basal split separates Cryptobranchidae + Hynobiidae from all other salamanders (Gao and Shubin 2001; Wiens et al. 2005; Frost et al. 2006; Roelants et al. 2007; Pyron and Wiens 2011), but always without strong support (BPML < 71). Our phylogenetic analyses based on 30 independent NPCLs supported the second hypothesis, that Cryptobranchoidea (Cryptobranchidae + Hynobiidae) branched first within the living salamanders. This result is extremely robust in our concatenation analyses (BPML = 99, PPBAY = 1.0, PPCAT = 1.0; fig. 4) and statistically rejects all alternative hypotheses (table 2). In the species tree analysis without data concatenation, this result is also strong (BPMP-EST = 83; fig. 4).
FIG. 3. Relative evolutionary rates of 102 NPCLs in vertebrates. The 102 NPCLs are arranged in order of increasing variability on the right side, and their PCR success rates in the 16 tested vertebrates are shown on the left side. NPCLs indicated with asterisks may have extra copies in teleost genomes and thus are not suitable for phylogenetic studies of teleosts.
Axes: PCR success rate in 16 vertebrates (%), left; relative evolutionary rate, right.
How many nuclear genes, then, are needed to robustly resolve the question of the basal split within living salamanders? Our analysis of data subsets indicates a progressive increase in the bootstrap support value for the node of interest (fig. 4) when an increasing number of genes are analyzed (fig. 5). Analyses based on single genes rarely resolve the node of interest with any confidence. Analyses based on 5–10 genes produce bootstrap support values of 60–80% in concatenation analyses (fig. 5), which is congruent with all previous nuclear studies using similar-sized data sets (Roelants et al. 2007; Pyron and Wiens 2011). Taking a bootstrap value of 95% in concatenation analyses as the threshold of "fully resolved," the minimum number of nuclear genes needed to resolve the root of the salamander tree is approximately 25. The previous contradictory results may be due to the overwhelmingly strong signals from the mitochondrial genome. Because the initial diversification of salamanders occurred within a relatively short window of time (Weisrock et al. 2005), the genealogical histories of individual gene loci may sometimes appear misleading in terms of the relationships among species due to incomplete lineage sorting. Unfortunately, the mitochondrial genome recorded such an incorrect history.
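The subsampling design behind this analysis (figure 5 reports the mean support over 30 randomly sampled subsets per data point) can be sketched as follows. This is a minimal illustration, not the authors' script: the locus names, the subset sizes taken from the figure axis, the random seed, and the function names are all assumptions, and the tree inference and bootstrap step for each subset is left outside the sketch.

```python
import random

def sample_gene_subsets(genes, subset_size, n_replicates=30, seed=0):
    """Draw n_replicates random subsets of loci (no repeats within a subset)."""
    rng = random.Random(seed)  # fixed seed is an assumption, for reproducibility only
    return [sorted(rng.sample(genes, subset_size)) for _ in range(n_replicates)]

def mean_support(values):
    """Mean bootstrap support across the replicate subsets of one size."""
    return sum(values) / len(values)

# Placeholder locus names; the real analysis would use the 30 sequenced NPCLs.
genes = [f"locus{i:02d}" for i in range(1, 31)]

# Subset sizes read from the x axis of figure 5 (assumed to be the sizes analyzed).
subsets_by_size = {k: sample_gene_subsets(genes, k) for k in (1, 5, 10, 15, 20, 25, 30)}
```

Each subset would then be concatenated and analyzed separately, and the support values for the node of interest averaged with `mean_support`.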
Discussion
The NPCL toolkit and experimental protocol introduced here together provide a highly reliable, rapid, and cost-effective method for building medium-scale multilocus data sets to produce high-resolution phylogenetic relationships. This phylogenomic approach has the potential to accelerate the completion of many parts of the vertebrate tree of life because no further marker development, which is often the bottleneck in phylogenetic research, is required. Once a specific phylogenetic question within vertebrates arises, researchers simply need to check the list for our toolkit and look for NPCL markers with the expected evolutionary rates and experimental performance for their groups of interest. Then, many orthologous loci can be quickly obtained by traditional PCR and Sanger sequencing, usually without time-consuming gel cutting and cloning. Applying the NPCL toolkit may also have another benefit for assembling the vertebrate tree of life: people working on different groups can easily use the same set of loci, which will facilitate combined analyses.
Merits of the Toolkit
Because of the use of the nested PCR strategy outlined earlier, most NPCLs in the toolkit work for all major jawed vertebrate groups with high experimental success rates (normally > 95%). Such results were achieved under unified PCR conditions without any extra effort involving cycling condition optimization. This feature of the toolkit enables it to surpass previously developed nuclear marker sets (Murphy et al. 2001; Li et al. 2007; Thomson et al. 2008; Townsend et al. 2008; Wright et al. 2008; Portik et al. 2011; Shen et al. 2011; Zhou et al. 2011). Most previous nuclear marker sets were developed for specific animal groups, and their application to other distantly related groups is usually difficult. For example, Spinks et al. (2010) collected 120 nuclear markers from avian, squamate, and mammalian phylogenetic studies and evaluated their PCR performance in turtles. They found that only eight nuclear markers successfully produced single expected bands across 13 tested turtle species. In another case, Fong and Fujita (2011) developed 75 nuclear markers for vertebrate phylogenetics, but approximately 60% of the target fragments could not be obtained in three test species (two reptiles and one lissamphibian). Therefore, although the nested PCR method introduced here requires an additional PCR reaction, the extra work is still worthwhile.
Table 1. Summary Information for the 30 NPCLs Amplified in 19 Salamander Taxa.
NOTE: Length, length of refined alignment; Var. sites, variable sites; PI sites, parsimony informative sites; RCV, relative composition variability.
FIG. 4. Higher-level phylogenetic relationships of 10 salamander families inferred from 30 NPCL markers. The tree was inferred by concatenation analyses using ML, BI, and the mixture model (CAT) and by species-tree analysis using the pseudo-ML approach (MP-EST). Branch support values are indicated beside nodes in the order ML bootstrap (BPML), BI posterior probability (PPBI), CAT posterior probability (PPCAT), and MP-EST bootstrap (BPMP-EST), from left to right. The filled squares represent BPML > 95, PPBAY = 1.0, PPCAT = 1.0, and BPMP-EST > 95. The circled number refers to the node of interest studied in figure 5. Branch lengths are from the ML analysis.
Figure labels: 30 nuclear genes (total 27,834 bp); scale bar, 0.1 substitutions/site; support values printed at two nodes, 99/1.0/1.0/83 and 99/1.0/1.0/74.
Tip labels: Aneides hardii, Plethodon jordani, Batrachoseps major, Eurycea bislineata, Amphiuma means, Rhyacotriton variegatus, Proteus anguinus, Necturus beyeri, Tylototriton asperrimus, Cynops orientalis, Salamandra salamandra, Dicamptodon aterrimus, Ambystoma mexicanum, Pseudobranchus axanthus, Siren intermedia, Ranodon sibiricus, Batrachuperus yenyuanensis, Onychodactylus fischeri, Andrias davidianus, Silurana tropicalis, Bombina fortinuptialis, Typhlonectes natans, Ichthyophis bannanicus, Gallus gallus, Homo sapiens, Chrysemys picta bellii, Mus musculus, Latimeria chalumnae.
Clade labels: Plethodontidae, Amphiumidae, Rhyacotritonidae, Proteidae, Salamandridae, Dicamptodontidae, Ambystomatidae, Sirenidae, Hynobiidae, Cryptobranchidae; Salamandroidea; Cryptobranchoidea; ANURA; GYMNOPHIONA; Non-amphibian Outgroup.
Table 2. Statistical Confidence (P Values) for Alternative Branching Hypotheses Based on the 30-Gene Data Set.
Alternative Topology Tested | ΔLn L | P Value | Rejection
Sirenidae is sister to Cryptobranchoidea | 42.3 | 0.004, 0.002, 0.004 | +, +, +
Gymnophiona is sister to Anura (monophyletic lissamphibians) | 34.3 | 0.013, 0.012, 0.013 | +, +, +
Gymnophiona is sister to Caudata (monophyletic lissamphibians) | 43.9 | 0.002, 0.001, 0.001 | +, +, +
Anura is sister to Amniota (paraphyletic lissamphibians) | 144.6 | 5E−30, 0, 0 | +, +, +
Gymnophiona is sister to Amniota (paraphyletic lissamphibians) | 129.0 | 1E−69, 0, 0 | +, +, +
Caudata is sister to Amniota (paraphyletic lissamphibians) | 172.8 | 0.0001, 0, 0 | +, +, +
In PCR-based phylogenetic projects, even when the PCR reactions are successful, the products often contain significant nonspecific amplicons. Such a condition requires additional effort involving gel purification and cloning, which involves much more time than the PCR reaction. Our NPCL toolkit is specifically designed to solve this problem, so that normally over 90% of PCR reactions produce strong and single expected bands. Moreover, most of the primers used to date for nuclear marker sets are degenerate and thus are not suitable for direct sequencing of PCR products. Benefiting from the use of our nested PCR strategy, we introduce anchoring sequences to the ends of PCR fragments while maintaining PCR efficiency. Such introduced anchoring sequences bring the added benefit that two specific sequencing primers (Seq_F and Seq_R) can be used in all Sanger sequencing reactions.
One additional feature of our NPCL toolkit is that the average length of the NPCLs within it is 1,050 bp, a length that can easily be amplified in one PCR reaction and sequenced in both directions to allow efficient use of resources. In contrast, the average marker lengths are 920 bp for 10 NPCLs in Li et al. (2007), 760 bp for 26 NPCLs in Townsend et al. (2008), 873 bp for 22 NPCLs in Shen et al. (2011), and 470 bp for 75 NPCLs in Fong and Fujita (2011). Longer markers will provide more sites than shorter ones for equivalent money and time. This feature makes our NPCL toolkit more cost-effective than previously developed nuclear marker sets.
Phylogenetic Utility
The vertebrate NPCL toolkit we developed here shows great promise in terms of phylogenetic utility. A remarkable feature of our NPCL toolkit is that it provides 102 NPCLs with a broad range of evolutionary rates. In our demonstration, we used 30 NPCLs to resolve a family-level salamander phylogeny using both traditional concatenation analyses and a more promising species-tree analysis. However, this example does not mean that our toolkit performs well only on deep-timescale questions. Our ongoing study using this toolkit to resolve the relationships within Plethodontidae, a rapidly radiating group of salamanders, suggests that the toolkit developed here also performs well in resolving species-level phylogenies. For many vertebrate groups in which applicable nuclear markers are limited, such as some teleosts, frogs, and salamanders, our NPCL toolkit can provide a one-stop solution for phylogenetic studies from the family level to the species level. Even for those groups in which specific nuclear marker sets have been developed, our toolkit is still worth trying, as many more loci can be easily obtained that may resolve some difficult branches.
The Toolkit Is a Good Addition to Sequence Capture Approaches
Recently, sequence capture approaches have been applied to vertebrate phylogenomics (Crawford et al. 2012; Faircloth et al. 2012; Lemmon et al. 2012; McCormack et al. 2012). These approaches begin with the selective capture of genomic regions. Briefly, fragmented gDNA is hybridized to DNA or RNA probes either on an array or in solution. Nontargeted DNA is then washed away, and the targeted DNA is sequenced through NGS. The most promising feature of the sequence capture approach is that it can simultaneously produce hundreds to thousands of loci for tens of individuals within a relatively short time. Therefore, the sequence capture approach is considered to be much more cost-effective than the PCR-based method. According to the calculation of Lemmon et al. (2012), for a 100 taxa × 500 loci project, the cost of the sequence capture method is just 1–3.5% of that of the PCR-based method.
However, the sequence capture approach is currently too challenging for most phylogenetic researchers. Typical NGS runs (454 or Illumina) used by the sequence capture method generate 1,000,000–2,000,000,000 sequences. Storing and processing these NGS data require significant computer memory, hardware upgrades, and bioinformatic programming skills, which are often not familiar to most phylogenetic researchers. Moreover, phylogenetic reconstruction assumes that orthologous genes are being analyzed across species. For the PCR-based method, the detection of paralogous genes is relatively straightforward. However, in the sequence capture method, the captured genomic regions comprise short conserved cores (probe regions) and long unconserved flanking sequences. Because paralogy cannot be detected until after the data are aligned, those unalignable sequences will make the detection of paralogy more difficult.
FIG. 5. The effect of increasing the number of nuclear loci on resolving the basal split within salamanders. Each data point represents the mean of support values estimated from 30 randomly sampled subsets. The dashed line indicates the threshold of 95% bootstrap support. The statistical plots show that the minimum number of nuclear loci needed to robustly resolve the basal split within salamanders is 25.
Axes: number of genes (1, 5, 10, 15, 20, 25, 30) against bootstrap support (%); series: concatenation analyses and species tree analyses.
In fact, not every phylogenetic project will use more than 500 loci, as the sequence capture method normally does. Based on both empirical and simulation data, 20–50 loci are generally sufficient to answer many phylogenetic questions (Rokas et al. 2003; Spinks et al. 2009). This is also the number of loci that most phylogenetic studies will use. In such a situation, adopting the sequence capture method is not cost-effective because researchers need to use relatively expensive NGS sequencing and spend time learning new experimental techniques and carrying out sophisticated bioinformatic processing. Our NPCL toolkit is specially designed for such medium-scale phylogenetic projects using approximately 50 loci. Such a number of loci can easily be obtained with our 102 NPCLs. Because more than 90% of the PCR reactions generated by our toolkit can be directly sequenced, the average cost for one locus per sample is rather low. In our laboratory, generating one new sequence typically costs US$3 (without considering labor).
In addition, researchers sometimes have only tiny amounts of DNA but wish to perform a multilocus phylogenetic analysis. In such a situation, the sequence capture method is difficult to implement because it normally requires DNA at the microgram level (Lemmon et al. 2012). Our NPCL toolkit can fill the gap here. Benefiting from the use of the nested PCR strategy, the sensitivity of PCR reactions in our method is extremely high. In many test experiments in our laboratory, the toolkit and protocol could produce target bands with only 5–10 ng of DNA.
Our NPCL toolkit is an alternative to the sequence capture method for the everyday work of phylogenetic researchers. Which method to choose depends on two major drivers: the amount of DNA and the expected number of loci. When DNA is limited, the better solution may be PCR; otherwise, sequence capture also works. Taking into account the money and time the two methods require, we speculate that the economic transition point from PCR to sequence capture is at approximately 100 loci. That assessment is why our toolkit includes 102 NPCL markers. Our proposal is that when using ≤100 loci, one can try our NPCL toolkit; when using >100 loci, sequence capture should be used.
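The rule of thumb proposed above can be written down as a small helper. This is only an illustrative sketch: the function names and the boolean `dna_limited` flag are assumptions, the 100-locus transition point is the authors' own speculation, and the cost function uses the US$3 per new sequence quoted for the authors' laboratory (labor excluded), which may not transfer to other laboratories.

```python
def suggest_method(n_loci, dna_limited, transition_loci=100):
    """Return the approach suggested by the text for a planned project."""
    # Limited DNA favors PCR regardless of project size, as does a locus
    # count at or below the speculated transition point.
    if dna_limited or n_loci <= transition_loci:
        return "nested-PCR toolkit"
    return "sequence capture"

def sanger_cost_usd(n_new_sequences, cost_per_sequence=3.0):
    """Consumables cost of generating new sequences, excluding labor."""
    return n_new_sequences * cost_per_sequence

# Example: the 639 sequences generated de novo for the salamander data set.
salamander_cost = sanger_cost_usd(639)
```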
Future Directions
In this study, we used multiple genome alignments deposited in the University of California–Santa Cruz (UCSC) genome browser to identify long and conserved exons across jawed vertebrates. Benefiting from the use of a nested PCR strategy, the experimental performance of the developed NPCLs indicated that they are highly stable in all major jawed vertebrate groups. Recently, a database for mining exon and intron markers, called EvolMarkers, has been built by Li et al. (2012). Careful investigation of this database may identify many conserved exons within nonvertebrates, whose interrelationships are currently more problematic than those of vertebrates. Because the nonvertebrates constitute many distantly related groups, it may be impossible to develop a single set of PCR primers for all nonvertebrates. However, following a similar marker development strategy, multiple NPCL toolkits could be constructed for various groups of nonvertebrates, such as arthropods, echinoderms, and molluscs. In addition, because introns are flanked by conserved exons, the idea of using nested PCRs for marker development could also be applied to the development of EPIC (exon-primed intron crossing) markers, which are more suitable in shallow-scale phylogenetic or phylogeographic projects.
Despite the benefits of our proposed method, it must be recognized that when handling large-scale projects, such as 200 taxa × 100 loci, the use of our toolkit and Sanger sequencing will still require significant cost, time, and labor. An alternative solution is to use NGS to replace Sanger sequencing. Recently, 454 NGS technology has been applied to sequence targeted gene regions from a pool of PCR products from different specimens (Binladen et al. 2007; Meyer et al. 2008). In such experiments, specific tagging sequences must be added to amplicons by either PCR (Binladen et al. 2007) or blunt-end ligation (Meyer et al. 2008). Therefore, if the tailing sequences of the second-round PCR primers in our NPCL toolkit are replaced by tagging sequences instead (for tag designing, see Faircloth and Glenn 2012), all PCR products can be pooled together and sequenced with 454 NGS, which will greatly reduce the money and time cost compared with Sanger sequencing. However, parallel tagged sequencing via NGS does not circumvent the process of PCR for each individual at each locus, which may be the most onerous part of a large-scale phylogenomic project. Some promising new technologies may help to solve this problem, such as microdroplet PCR (Tewhey et al. 2009), where millions of individual PCR reactions are performed in picoliter-scale droplets simultaneously, and the 96.96 Dynamic Array by Fluidigm, which allows 96 primer combinations to be used on 96 samples (9,216 total PCR reactions) on a single PCR plate. However, there has been little research on applying NGS and new high-throughput PCR technologies to phylogenomics, so their ease of use and cost-effectiveness still need to be explored.
Summary
In conclusion, we have developed an improved method for rapidly amplifying and sequencing NPCLs that has proven to be useful and effective for molecular phylogenetic studies of vertebrates. The newly developed toolkit provides an attractive alternative to available methods for vertebrate phylogenomics.
Materials and Methods
Development of NPCL and Primer Design
Our previous study showed that nested PCR is overwhelmingly more effective than conventional PCR for obtaining target amplicons from complex genomic environments (Shen et al. 2012). However, nested PCR requires four conserved regions to design two pairs of primers (illustrated in fig. 6; yellow blocks represent the conserved regions used for primer design), which means that only relatively long exons are suitable as candidates for NPCL development with the nested PCR method. To search for long and conserved exons, we took advantage of our previous bioinformatic method, which used the multiple genome alignment data from the UCSC Genome Browser to identify conserved exons (Shen et al. 2011). Because the NPCL markers are to be used in vertebrates, we focused only on those multiple genome alignments that include at least six species: Danio rerio (zebrafish), Silurana tropicalis (frog), Anolis carolinensis (lizard), Gallus gallus (chicken), Mus musculus (mouse), and Homo sapiens (human). The alignments of candidate exons had to meet two criteria: a length of more than 700 bp and pairwise similarity ranging from 35% to 90%. The detailed bioinformatic pipeline has been described elsewhere (Shen et al. 2011). In addition to using multiple genome alignments to screen NPCL candidates, we also manually searched for nuclear genes that were used previously (Murphy et al. 2001; Li et al. 2007; Townsend et al. 2008; Wright et al. 2008; Zhou et al. 2011; Song et al. 2012) in the ENSEMBL database to check whether they contain large and appropriately conserved exons.
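The two screening criteria for candidate exon alignments (length of more than 700 bp; pairwise similarity between 35% and 90%) can be illustrated with a short filter. This is a sketch under stated assumptions rather than the published pipeline of Shen et al. (2011): similarity is taken here to be the fraction of identical bases over gap-free columns, every species pair is required to fall inside the range, and the function and argument names are hypothetical.

```python
from itertools import combinations

def pairwise_identity(seq_a, seq_b):
    """Fraction of identical bases over columns where neither sequence has a gap."""
    pairs = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper())
             if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    return sum(a == b for a, b in pairs) / len(pairs)

def is_candidate(alignment, min_length=700, min_sim=0.35, max_sim=0.90):
    """alignment maps species name -> aligned sequence (all of equal length)."""
    seqs = list(alignment.values())
    # Criterion 1: the alignment must be longer than 700 bp.
    if len(seqs[0]) <= min_length:
        return False
    # Criterion 2: every pairwise similarity must lie within 35-90%.
    sims = [pairwise_identity(a, b) for a, b in combinations(seqs, 2)]
    return all(min_sim <= s <= max_sim for s in sims)
```

The upper bound excludes exons that are too conserved to be informative, and the lower bound excludes those too divergent for primer design.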
FIG. 6. Schematic representation of the experimental protocol for using our NPCL toolkit. Note that for each NPCL, nested PCR primers are designed on four short conserved blocks flanking the target region.
Laboratory protocol shown in the figure:
(i) First PCR with primers F1 and R1 using gDNA as template. Enrich the target region from the complex genomic environment with one pair of highly degenerate primers. PCR was performed with 50–100 ng DNA in a 25 µl reaction. Cycling conditions: an initial denaturation step of 4 min at 94 °C, followed by 35 cycles of 94 °C for 45 s, 45 °C for 40 s, and 72 °C for 2 min, and a final extension at 72 °C for 10 min.
(ii) Second PCR with tailed primers F2 and R2 using the first PCR product as template. Specifically amplify the target region from the first-round PCR products with one pair of tailed, low-degeneracy primers. PCR was performed with 1 µl of the first PCR product in a 25 µl reaction. Cycling conditions: an initial denaturation step of 4 min at 94 °C, followed by 35 cycles of 94 °C for 45 s, 50 °C for 40 s, and 72 °C for 90 s, and a final extension at 72 °C for 10 min.
(iii) PCR evaluation and sequencing. Evaluate the agarose gel electrophoresis results. If the product is a single and strong band (normally > 90%): clean up with ExoI and FastAP, then sequence directly with the general sequencing primers Seq_F and Seq_R. Otherwise: gel cutting or cloning, then sequencing. For cleanup, 25 µl of PCR product is treated with 2 U ExoI and 0.4 U FastAP at 37 °C for 30 min and 80 °C for 15 min; the cleaned PCR product can be used for direct sequencing. A Sanger sequencing reaction is performed with 0.5 µl BigDye and 1 µl cleaned PCR product.
As a result, we assembled a total of 305 NPCL candidate alignments, of which 120 contained the appropriate number of conserved blocks, and used these to design nested PCR primers. To increase the success rates of our NPCL markers in amniotes, we manually added a turtle sequence (Chrysemys picta bellii) to each of the candidate alignments using data downloaded from the ENSEMBL database. A total of 480 primers were designed for the 120 NPCL candidates. Briefly, the first-round PCR primers are only used to enrich target regions from genomic environments and not to obtain target amplicons, so the degeneracy of these primers is normally high to increase reaction sensitivity; the second-round PCR primers are used to obtain target amplicons, so the degeneracy of these primers is lower to increase reaction specificity. Our previous study showed that the nested PCR method often produces strong and single amplicon bands (Shen et al. 2012). To facilitate the next-step direct sequencing, we added a tail (5′-AGGGTTTTCCCAGTCACGAC-3′) to the 5′-end of all second-round forward primers and a tail (5′-AGATAACAATTTCACACAGG-3′) to the 5′-end of all second-round reverse primers. These tail sequences can provide two unique anchoring sites for direct sequencing from cleaned PCR products. In our pilot experiments, adding the tail sequences to primers did not affect the efficiency of the second-round PCR.
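Two ideas in this paragraph, primer degeneracy and 5′ tailing of the second-round primers, can be made concrete with a short sketch. The two tail sequences are those given in the text; everything else is illustrative: the function names are assumptions, the degeneracy is computed from standard IUPAC ambiguity codes only (inosine or other modified bases are not handled), and any primer passed in is a made-up example, not a primer from supplementary table S1.

```python
FORWARD_TAIL = "AGGGTTTTCCCAGTCACGAC"  # tail on second-round forward primers (Seq_F site)
REVERSE_TAIL = "AGATAACAATTTCACACAGG"  # tail on second-round reverse primers (Seq_R site)

# Number of bases represented by each IUPAC nucleotide code.
IUPAC_COUNTS = {
    "A": 1, "C": 1, "G": 1, "T": 1,
    "R": 2, "Y": 2, "S": 2, "W": 2, "K": 2, "M": 2,
    "B": 3, "D": 3, "H": 3, "V": 3,
    "N": 4,
}

def degeneracy(primer):
    """Number of distinct sequences encoded by a degenerate primer."""
    total = 1
    for base in primer.upper():
        total *= IUPAC_COUNTS[base]
    return total

def add_tail(primer, direction):
    """Prepend the anchoring tail to the 5' end of a second-round primer."""
    if direction not in ("F", "R"):
        raise ValueError("direction must be 'F' or 'R'")
    tail = FORWARD_TAIL if direction == "F" else REVERSE_TAIL
    return tail + primer

# Two primer pairs (four primers) per locus gives the 480 primers for 120 candidates.
primers_designed = 120 * 4
```

A first-round primer would typically show a higher `degeneracy` value than its second-round counterpart, reflecting the sensitivity versus specificity trade-off described above.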
Experimental Testing for Candidate Markers in 16 Jawed Vertebrates
To test the experimental performance of our newly designed NPCL markers, we selected 16 taxa representing nine major jawed vertebrate lineages: Chondrichthyes (Sphyrna lewini), Actinopterygii (Lepisosteus oculatus and Pangasius sutchi), Dipnoi (Protopterus annectens), Lissamphibia (Ichthyophis bannanicus, Batrachuperus yenyuanensis, and Rana nigromaculata), Mammalia (Mus musculus and Sus scrofa domestica), Testudines (Trionyx sinensis and Podocnemis unifilis), Aves (Struthio camelus and Zosterops japonica), Crocodylia (Crocodylus siamensis), and Squamata (Hemidactylus bowringii and Naja naja atra). Total genomic DNA was extracted from ethanol-preserved tissues (liver or muscle) using the standard salt extraction protocol. All extracted genomic DNAs were diluted to a concentration of 50 ng µl⁻¹ with 1× TE and stored at −20 °C before PCR amplification.
All 120 NPCL markers were tested with a two-round PCR strategy (nested PCR). The first-round PCR was performed in a 25 µl reaction containing 1–2 µl template DNA (50–100 ng), with final concentrations of 1× PCR buffer, 200 µM dNTP, 400 nM of each forward and reverse first-round primer, and 1.25 U Taq polymerase (TransTaq High Fidelity; TransGen, Beijing). The cycling conditions of the first-round PCR were as follows: an initial denaturation step of 4 min at 94 °C, followed by 35 cycles of a 45 s denaturation at 94 °C, a 40 s annealing at 45 °C, and a 2 min extension at 72 °C, followed by a final 10 min extension at 72 °C. The second-round PCR was also performed in a 25 µl reaction containing 1 µl of the first-round PCR product (without dilution) and final concentrations of 1× PCR buffer, 200 µM dNTP, 400 nM of each forward and reverse second-round primer, and 1.25 U Taq polymerase. The cycling conditions of the second-round PCR were as follows: an initial denaturation step of 4 min at 94 °C, followed by 35 cycles of a 45 s denaturation at 94 °C, a 40 s annealing at 50 °C, and a 90 s extension at 72 °C, followed by a final 10 min extension at 72 °C.
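The two cycling programs differ only in annealing temperature and extension time, which is easy to see when they are written as data. The sketch below encodes the values stated above in a hypothetical `CyclingProgram` structure (the class and field names are assumptions) and sums the programmed hold times; ramping between temperatures is ignored, so real run times on a thermocycler will be longer.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CyclingProgram:
    initial_denaturation_s: int
    cycles: int
    denaturation_s: int
    annealing_s: int
    annealing_temp_c: int
    extension_s: int
    final_extension_s: int

    def hold_time_s(self):
        """Total programmed hold time in seconds (ramp times excluded)."""
        per_cycle = self.denaturation_s + self.annealing_s + self.extension_s
        return (self.initial_denaturation_s
                + self.cycles * per_cycle
                + self.final_extension_s)

# First round: 45 degC annealing, 2 min extension.
FIRST_ROUND = CyclingProgram(
    initial_denaturation_s=240, cycles=35, denaturation_s=45,
    annealing_s=40, annealing_temp_c=45, extension_s=120, final_extension_s=600,
)

# Second round: 50 degC annealing, 90 s extension.
SECOND_ROUND = CyclingProgram(
    initial_denaturation_s=240, cycles=35, denaturation_s=45,
    annealing_s=40, annealing_temp_c=50, extension_s=90, final_extension_s=600,
)
```

With these values, the programmed hold time is 8,015 s (about 134 min) for the first round and 6,965 s (about 116 min) for the second.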
One microliter of the second-round PCR products was analyzed on a 1.0% TAE agarose gel. An NPCL marker was considered successful if more than 8 of the 16 tested taxa produced target amplicon bands. On this basis, 102 out of 120 tested NPCL markers were successful. The nested-PCR primers for the 102 NPCL markers can be found in supplementary table S1, Supplementary Material online. If the PCR products contained significant nonspecific amplicon bands (normally < 10%), they needed further processing, for example, standard gel cutting or cloning. If the PCR reactions produced single amplicon bands (normally > 90%), they were cleaned with ExoFAP treatment: 2 U ExoI and 0.4 U FastAP (all Fermentas) were added to the PCR tube and incubated for 30 min at 37 °C and 15 min at 80 °C. The cleaned PCR reactions can be directly used as templates for Sanger sequencing. According to our experimental design, all PCR fragments can be sequenced from both ends with the two universal sequencing primers Seq_F (5′-AGGGTTTTCCCAGTCACGAC-3′) and Seq_R (5′-AGATAACAATTTCACACAGG-3′). A typical Sanger sequencing reaction in our laboratory consumes 0.5 µl BigDye and 1 µl cleaned PCR product. The primer design strategy, the laboratory protocol for the nested PCR method, and the pretreatment of PCR products before Sanger sequencing are illustrated in figure 6.
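The acceptance rule for a marker (more than 8 of the 16 tested taxa producing a target band) can be expressed as a small scoring function. This is a sketch, not the authors' scoring code: the three score labels mirror the black, gray, and white squares of figure 2, and it is assumed here that a gray score (target band plus nonspecific bands) counts as a target band for the acceptance rule.

```python
def summarize_marker(scores, min_taxa_exclusive=8):
    """Summarize per-taxon gel scores for one NPCL marker.

    scores: one entry per tested taxon, each 'single' (black square),
    'nonspecific' (gray square), or 'none' (white square).
    """
    with_target = sum(s in ("single", "nonspecific") for s in scores)
    single = sum(s == "single" for s in scores)
    return {
        "success_rate": with_target / len(scores),
        "single_band_rate": single / len(scores),
        # Accepted only if MORE than 8 taxa show a target band.
        "accepted": with_target > min_taxa_exclusive,
    }
```

Only reactions scored `single` would go straight to ExoI/FastAP cleanup and direct sequencing; those scored `nonspecific` would need gel cutting or cloning first.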
Calculation of Relative Evolutionary Rate of 102 NPCLs
The rate multipliers (m) across partitions estimated in MrBayes 3.2 (Ronquist et al. 2012) are used as relative evolutionary rates. To calculate these parameters, alignments for each NPCL were prepared for 12 species: Homo sapiens, Macaca mulatta, Mus musculus, Rattus norvegicus, Gallus gallus, Meleagris gallopavo, Chrysemys picta bellii, Anolis carolinensis, Silurana tropicalis, Tetraodon nigroviridis, Takifugu rubripes, and Danio rerio. Because genome data are available for the 12 species, we did not generate any new data. The 102 NPCL alignments were then combined and subjected to MrBayes analyses partitioned by genes. Each gene was assigned a separate GTR + Γ + I model, and all model parameters were unlinked. Two Markov chain Monte Carlo (MCMC) runs were performed with one cold chain and three heated chains (temperature set to 0.1) for 50 million generations and sampled every 1,000 generations. The rate multiplier for each gene was estimated using Tracer version 1.4 after discarding the first 50% of the generations. All evolutionary rates were normalized by dividing by the maximum value of the obtained rates.
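The final two steps, discarding the first half of the MCMC samples and rescaling the per-gene rates by the maximum, amount to the following arithmetic. This is a minimal sketch: the function names are hypothetical, the use of the posterior mean as the point estimate is an assumption (the text says only that the estimate was obtained in Tracer), and any values passed in would be placeholders rather than rates from the paper.

```python
def posterior_mean_after_burnin(samples, burnin_fraction=0.5):
    """Mean of the sampled rate multiplier after discarding the burn-in."""
    kept = samples[int(len(samples) * burnin_fraction):]
    return sum(kept) / len(kept)

def normalize_rates(rate_multipliers):
    """Divide every gene's rate by the largest rate, so the fastest gene is 1.0."""
    max_rate = max(rate_multipliers.values())
    return {gene: rate / max_rate for gene, rate in rate_multipliers.items()}

# 50 million generations sampled every 1,000 gives 50,000 samples per run,
# of which the last 25,000 are kept after the 50% burn-in.
samples_per_run = 50_000_000 // 1_000
samples_kept = samples_per_run - int(samples_per_run * 0.5)
```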
Gene and Taxon Sampling for Investigating Higher Level Salamander Relationships
To test the utility of our NPCL toolkit in a real case, we selected 19 salamander taxa representing all 10 salamander families and 9 outgroup taxa to investigate the family-level relationships of salamanders (supplementary table S2, Supplementary Material online). For gene sampling, we randomly selected 30 NPCL markers whose PCR success rates were more than 90% in the 16 previously tested vertebrates. Among the target 840 sequences (30 markers for 28 taxa), 201 were available in public databases (NCBI, UCSC, and ENSEMBL), whereas the remaining 639 sequences needed to be generated de novo. The experimental procedure was as described earlier. All obtained sequences were examined by checking for the presence of premature stop codons (pseudogene) and by BlastX searches against the nonredundant protein sequences (nr) to confirm that they were our target genes. The NPCLs, species, and accession numbers for the newly obtained sequences are listed in supplementary table S2, Supplementary Material online.
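The stop-codon screen can be illustrated with a short check. This is a sketch under assumptions: the sequence is an unaligned exon fragment already trimmed to its reading frame, the standard nuclear genetic code applies, and because the NPCL fragments are internal to exons, any in-frame stop codon is treated as premature. The function name is hypothetical, and the BlastX step is not reproduced here.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard nuclear genetic code

def has_premature_stop(cds, frame=0):
    """Return True if an in-frame stop codon occurs anywhere in the fragment."""
    seq = cds.upper()
    # Step through complete codons only, starting at the given frame offset.
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return any(codon in STOP_CODONS for codon in codons)

# Sampling arithmetic from the text: 30 markers x 28 taxa, minus public sequences.
target_sequences = 30 * 28
sequences_to_generate = target_sequences - 201
```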
Phylogenetic Analyses
Alignments of all 30 NPCL markers were conducted using the G-INS-i method from MAFFT (Katoh et al. 2005) under the default settings according to their translated amino acid sequences, then refined by eye. All 30 refined alignments were combined into a concatenated data set (27,834 bp).
For the concatenated data set, we manually defined five partitioning strategies: 2 partitions (one for codon positions 1 and 2 and one for codon position 3), 3 partitions (one partition for each codon position), 30 partitions (one partition for each gene), 60 partitions (one for codon positions 1 + 2 and one for codon position 3 across 30 genes), and 90 partitions (codon position partitioning across 30 genes). Comparisons of the five partitioning strategies and selections of corresponding nucleotide substitution models were conducted under the Bayesian information criterion implemented in PartitionFinder (Lanfear et al. 2012). The 3-partition scheme (one partition for each codon position) was chosen as the best-fitting partitioning strategy, and all 3 partitions favored the GTR + Γ + I model.
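As an illustration of how such schemes are usually written down, the sketch below generates partition definitions in the RAxML-style text format (`DNA, name = start-end\3`). It assumes that every gene alignment is in frame (starts at codon position 1 and has a length divisible by 3); the partition names, function names, and the gene lengths in the usage note are hypothetical, since the per-gene lengths of Table 1 are not reproduced here.

```python
def codon_position_partitions(total_length):
    """3-partition scheme: one partition per codon position over the whole matrix."""
    return [f"DNA, codon_pos{pos} = {pos}-{total_length}\\3" for pos in (1, 2, 3)]

def per_gene_codon_partitions(gene_lengths):
    """90-partition style scheme: codon positions partitioned within each gene.

    gene_lengths maps gene name -> alignment length, in concatenation order.
    """
    lines, start = [], 1
    for gene, length in gene_lengths.items():
        end = start + length - 1
        for pos in (1, 2, 3):
            lines.append(f"DNA, {gene}_pos{pos} = {start + pos - 1}-{end}\\3")
        start = end + 1
    return lines

# The favored scheme for the 27,834 bp concatenated data set.
three_partition_scheme = codon_position_partitions(27834)
```

For example, `per_gene_codon_partitions({"geneA": 900, "geneB": 600})` yields six lines, the fourth being `DNA, geneB_pos1 = 901-1500\3`.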
The concatenated data set was separately analyzed with both ML and Bayesian inference (BI) methods under the 3-partition scheme. Partitioned ML analyses were implemented using RAxML version 7.2.6 (Stamatakis 2006) with the GTR + Γ + I model assigned for each partition. A search that combined 100 separate ML searches was applied to find the optimal tree, and branch support for each node was evaluated with 500 standard bootstrapping replicates (-f d -b 500 option) implemented in RAxML. The partitioned BI was conducted using MrBayes 3.2 (Ronquist et al. 2012). All model parameters were unlinked. Two MCMC runs were performed with one cold chain and three heated chains (temperature set to 0.1) for 60 million generations and sampled every 1,000 generations. The chain stationarity was visualized by plotting −ln L against the generation number using Tracer version 1.4, and the first 50% of generations were discarded. Topologies and posterior probabilities were estimated from the remaining generations. Two runs for each analysis were compared for congruence.
We also performed Bayesian phylogenetic analyses under a mixture model, CAT + GTR + Γ4, in PhyloBayes 3.3 (Lartillot et al. 2009) with two independent MCMC runs. Each run was performed for 10,000 cycles and sampled every cycle. Stationarity was considered reached when the largest discrepancy (maxdiff) between the two independent runs was less than 0.1. The first 5,000 trees in each MCMC run were discarded. The remaining 10,000 trees of the two runs were sampled every 5 trees to generate a 50% majority-rule posterior consensus tree.
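The burn-in and thinning arithmetic in the two Bayesian analyses can be checked with a short calculation. The helper `retained_samples` is a hypothetical name introduced for this sketch, and it assumes that thinning is applied to each run separately after the burn-in is removed.

```python
def retained_samples(samples_per_run, burn_in, thin=1, runs=1):
    """Number of posterior samples kept after burn-in removal and thinning."""
    kept_per_run = samples_per_run - burn_in
    # range() with a step counts every `thin`-th retained sample.
    return runs * len(range(0, kept_per_run, thin))


# PhyloBayes settings described above: 10,000 sampled cycles per run,
# first 5,000 discarded, every 5th tree kept, two runs.
phylobayes_trees = retained_samples(10_000, 5_000, thin=5, runs=2)

# MrBayes settings: 60 million generations sampled every 1,000 generations,
# first 50% discarded, two runs (the initial state is ignored here).
mrbayes_samples_per_run = 60_000_000 // 1_000
mrbayes_trees = retained_samples(
    mrbayes_samples_per_run, mrbayes_samples_per_run // 2, runs=2
)
```

Under these assumptions the PhyloBayes consensus is built from 2,000 trees and the MrBayes summary from 60,000 samples.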
Species tree estimation was conducted using the pseudo-ML approach in the program MP-EST (Liu et al. 2010) under the coalescent model. Briefly, the gene trees for 30 NPCL markers were reconstructed with ML under the GTR + Γ8 model using PHYML 3.0 (Guindon et al. 2010). The resulting 30 gene trees were rooted with an outgroup (Chrysemys picta bellii) and used to generate a species tree using MP-EST. The robustness of the species tree was evaluated with nonparametric bootstrapping of 500 replicates.
Alternative phylogenetic hypotheses were tested based on the 30-gene data set. We first calculated sitewise log likelihoods of alternative trees using RAxML (-f g) under the GTR + Γ + I model. Then, the site log-likelihood file was used to estimate P values for each alternative tree in the CONSEL program (Shimodaira and Hasegawa 2001) using the Kishino-Hasegawa test (KH) (Kishino and Hasegawa 1989), the approximately unbiased test (AU) (Shimodaira 2002), and the RELL bootstrap proportion test (BP).
Supplementary Material
Supplementary tables S1 and S2 and figures S1 and S2 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).
Acknowledgments
The authors are grateful to David Wake and Carol Spencer of the Museum of Vertebrate Zoology at the University of California, Berkeley, for tissue loans. David Wake provided many useful comments on the manuscript. This work was supported by National Natural Science Foundation of China grants (31172075 and 30900136) and the National Science Fund for Excellent Young Scholars (no. pending).
References
Binladen J, Gilbert MTP, Bollback JP, Panitz F, Bendixen C, Nielsen R, Willerslev E. 2007. The use of coded PCR primers enables high-throughput sequencing of multiple homolog amplification products by 454 parallel sequencing. PLoS One 2:e197.
Crawford NG, Faircloth BC, McCormack JE, Brumfield RT, Winker K, Glenn TC. 2012. More than 1000 ultraconserved elements provide evidence that turtles are the sister group to archosaurs. Biol Lett. 8:783-786.
Delsuc F, Brinkmann H, Philippe H. 2005. Phylogenomics and the reconstruction of the tree of life. Nat Rev Genet. 6:361-375.
Duellman WE, Trueb L. 1994. Biology of amphibians. Baltimore (MD): Johns Hopkins University Press.
Dunn CW, Hejnol A, Matus DQ, et al. (18 co-authors). 2008. Broad phylogenomic sampling improves resolution of the animal tree of life. Nature 452:745-749.
Faircloth BC, Glenn TC. 2012. Not all sequence tags are created equal: designing and validating sequence identification tags robust to indels. PLoS One 7:e42543.
Faircloth BC, McCormack JE, Crawford NG, Brumfield RT, Glenn TC. 2012. Ultraconserved elements anchor thousands of genetic markers for target enrichment spanning multiple evolutionary timescales. Syst Biol. 61:717-726.
Fong JJ, Brown JM, Fujita MK, Boussau B. 2012. A phylogenomic approach to vertebrate phylogeny supports a turtle-archosaur affinity and a possible paraphyletic lissamphibia. PLoS One 7:e48990.
Fong JJ, Fujita MK. 2011. Evaluating phylogenetic informativeness and data-type usage for new protein-coding genes across Vertebrata. Mol Phylogenet Evol. 61:300-307.
Frost DR, Grant T, Faivovich J, et al. (18 co-authors). 2006. The amphibian tree of life. Bull Am Mus Nat Hist. 297:8-370.
Gao KQ, Shubin NH. 2001. Late Jurassic salamanders from northern China. Nature 410:574-577.
Guindon S, Dufayard J-F, Lefort V, Anisimova M, Hordijk W, Gascuel O. 2010. New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Syst Biol. 59:307-321.
Hugall AF, Foster R, Lee MSY. 2007. Calibration choice, rate smoothing, and the pattern of tetrapod diversification according to the long nuclear gene RAG-1. Syst Biol. 56:543-563.
Inoue JG, Miya M, Lam K, Tay BH, Danks JA, Bell J, Walker TI, Venkatesh B. 2010. Evolutionary origin and phylogeny of the modern holocephalans (Chondrichthyes: Chimaeriformes): a mitogenomic perspective. Mol Biol Evol. 27:2576-2586.
Katoh K, Kuma K, Toh H, Miyata T. 2005. MAFFT version 5: improvement in accuracy of multiple sequence alignment. Nucleic Acids Res. 33:511-518.
Kishino H, Hasegawa M. 1989. Evaluation of the maximum likelihood estimate of the evolutionary tree topologies from DNA sequence data, and the branching order in Hominoidea. J Mol Evol. 29:170-179.
Künstner A, Wolf JBW, Backström N, et al. (13 co-authors). 2010. Comparative genomics based on massive parallel transcriptome sequencing reveals patterns of substitution and selection across 10 bird species. Mol Ecol. 19:266-276.
Lanfear R, Calcott B, Ho SYW, Guindon S. 2012. Partitionfinder: combined selection of partitioning schemes and substitution models for phylogenetic analyses. Mol Biol Evol. 29:1695-1701.
Lartillot N, Lepage T, Blanquart S. 2009. PhyloBayes 3: a Bayesian software package for phylogenetic reconstruction and molecular dating. Bioinformatics 25:2286-2288.
Lemmon AR, Brown JM, Stanger-Hall K, Lemmon EM. 2009. The effect of ambiguous data on phylogenetic estimates obtained by maximum likelihood and Bayesian inference. Syst Biol. 58:130-145.
Lemmon AR, Emme SA, Lemmon EM. 2012. Anchored hybrid enrichment for massively high-throughput phylogenomics. Syst Biol. 61:727-744.
Li C, Ortí G, Zhang G, Lu G. 2007. A practical approach to phylogenomics: the phylogeny of ray-finned fish (Actinopterygii) as a case study. BMC Evol Biol. 7:44.
Li C, Riethoven JJ, Naylor GJ. 2012. EvolMarkers: a database for mining exon and intron markers for evolution, ecology and conservation studies. Mol Ecol Resour. 12:967-971.
Liu L, Yu L, Edwards SV. 2010. A maximum pseudo-likelihood approach for estimating species trees under the coalescent model. BMC Evol Biol. 10:302.
McCormack JE, Faircloth BC, Crawford NG, Gowaty PA, Brumfield RT, Glenn TC. 2012. Ultraconserved elements are novel phylogenomic markers that resolve placental mammal phylogeny when combined with species-tree analysis. Genome Res. 22:746-754.
McCormack JE, Hird SM, Zellmer AJ, Carstens BC, Brumfield RT. 2013. Applications of next-generation sequencing to phylogeography and phylogenetics. Mol Phylogenet Evol. 66:526-538.
Meyer A, Van de Peer Y. 2005. From 2R to 3R: evidence for a fish-specific genome duplication (FSGD). Bioessays 27:937-945.
Meyer M, Stenzel U, Hofreiter M. 2008. Parallel tagged sequencing on the 454 platform. Nat Protoc. 3:267-278.
Murphy WJ, Eizirik E, Johnson WE, Zhang YP, Ryder OA, O'Brien SJ. 2001. Molecular phylogenetics and the origins of placental mammals. Nature 409:614-618.
Philippe H, Derelle R, Lopez P, et al. (20 co-authors). 2009. Phylogenomics revives traditional views on deep animal relationships. Curr Biol. 19:706-712.
Philippe H, Telford M. 2006. Large-scale sequencing and the new animal phylogeny. Trends Ecol Evol. 21:614-620.
Portik DM, Wood PL, Grismer JL, Stanley EL, Jackman TR. 2011. Identification of 104 rapidly-evolving nuclear protein-coding markers for amplification across scaled reptiles using genomic resources. Conserv Genet Resour. 4:1-10.
Pyron RA, Wiens JJ. 2011. A large-scale phylogeny of Amphibia including over 2800 species, and a revised classification of extant frogs, salamanders, and caecilians. Mol Phylogenet Evol. 61:543-583.
Roelants K, Gower DJ, Wilkinson M, Loader SP, Biju SD, Guillaume K, Moriau L, Bossuyt F. 2007. Global patterns of diversification in the history of modern amphibians. Proc Natl Acad Sci U S A. 104:887-892.
Rokas A, Williams BL, King N, Carroll SB. 2003. Genome-scale approaches to resolving incongruence in molecular phylogenies. Nature 425:798-804.
Ronquist F, Teslenko M, Van der Mark P, Ayres DL, Darling A, Höhna S, Larget B, Liu L, Suchard MA, Huelsenbeck JP. 2012. MrBayes 3.2: efficient Bayesian phylogenetic inference and model choice across a large model space. Syst Biol. 61:539-542.
Roure B, Baurain D, Philippe H. 2013. Impact of missing data on phylogenies inferred from empirical phylogenomic data sets. Mol Biol Evol. 30:197-214.
San Mauro D. 2010. A multilocus timescale for the origin of extant amphibians. Mol Phylogenet Evol. 56:554-561.
San Mauro D, Vences M, Alcobendas M, Zardoya R, Meyer A. 2005. Initial diversification of living amphibians predated the breakup of Pangaea. Am Nat. 165:590-599.
Shen XX, Liang D, Wen JZ, Zhang P. 2011. Multiple genome alignments facilitate development of NPCL markers: a case study of tetrapod phylogeny focusing on the position of turtles. Mol Biol Evol. 28:3237-3252.
Shen XX, Liang D, Zhang P. 2012. The development of three long universal nuclear protein-coding locus markers and their application to osteichthyan phylogenetics with nested PCR. PLoS One 7:e39256.
Shimodaira H. 2002. An approximately unbiased test of phylogenetic tree selection. Syst Biol. 51:492-508.
Shimodaira H, Hasegawa M. 2001. CONSEL: for assessing the confidence of phylogenetic tree selection. Bioinformatics 17:1246-1247.
Song S, Liu L, Edwards SV, Wu S. 2012. Resolving conflict in eutherian mammal phylogeny using phylogenomics and the multispecies coalescent model. Proc Natl Acad Sci U S A. 109:14942-14947.
Spinks PQ, Thomson RC, Barley AJ, Newman CE, Shaffer HB. 2010. Testing avian, squamate, and mammalian nuclear markers for cross amplification in turtles. Conserv Genet Resour. 2:127-129.
Spinks PQ, Thomson RC, Lovely GA, Shaffer HB. 2009. Assessing what is needed to resolve a molecular phylogeny: simulations and empirical data from Emydid turtles. BMC Evol Biol. 9:56.
Stamatakis A. 2006. RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics 22:2688-2690.
Tewhey R, Warner JB, Nakano M, Libby B, Medkova M, David PH, Kotsopoulos SK, Samuels ML, Hutchison JB, Larson JW. 2009. Microdroplet-based PCR enrichment for large-scale targeted sequencing. Nat Biotechnol. 27:1025-1031.
Thomson RC, Shedlock AM, Edwards SV, Shaffer HB. 2008. Developing markers for multilocus phylogenetics in non-model organisms: a test case with turtles. Mol Phylogenet Evol. 49:514-525.
Thomson RC, Wang IJ, Johnson JR. 2010. Genome-enabled development of DNA markers for ecology, evolution and conservation. Mol Ecol. 19:2184-2195.
Townsend TM, Alegre RE, Kelley ST, Wiens JJ, Reeder TW. 2008. Rapid development of multiple nuclear loci for phylogenetic analysis using genomic resources: an example from squamate reptiles. Mol Phylogenet Evol. 47:129-142.
Weisrock DW, Harmon LJ, Larson A. 2005. Resolving deep phylogenetic relationships in salamanders: analyses of mitochondrial and nuclear genomic DNA. Syst Biol. 54:758-777.
Wiens J, Bonett R, Chippindale P. 2005. Ontogeny discombobulates phylogeny: paedomorphosis and higher-level salamander relationships. Syst Biol. 54:91-110.
Wright TF, Schirtzinger EE, Matsumoto T, et al. (11 co-authors). 2008. A multilocus molecular phylogeny of the parrots (Psittaciformes): support for a Gondwanan origin during the Cretaceous. Mol Biol Evol. 25:2141-2156.
Zhang P, Wake DB. 2009. Higher-level salamander relationships and divergence dates inferred from complete mitochondrial genomes. Mol Phylogenet Evol. 53:492-508.
Zhang P, Zhou H, Chen YQ, Liu YF, Qu LH. 2005. Mitogenomic perspectives on the origin and phylogeny of living amphibians. Syst Biol. 54:391-400.
Zhou XM, Xu SX, Zhang P, Yang G. 2011. Developing a series of conservative anchor markers and their application to phylogenomics of Laurasiatherian mammals. Mol Ecol Resour. 11:134-140.
FIG. 2. PCR performance of the 102 NPCL markers in 16 divergent vertebrate species. Each square and electrophoretic lane is aligned with the tested species. (a) The draft divergence timescale for the 16 tested vertebrate species is based on Inoue et al. (2010) and the book The Timetree of Life. (b) The PCR performance of each NPCL marker is ranked by three different-colored squares: black, producing a single target band; gray, having a target band but with significant nonspecific bands; white, no target band. The 102 NPCL markers are sorted according to their PCR success rates. (c) Three typical agarose electrophoresis results for 9 NPCL markers.
[Figure labels. Panel (a): axis, Time (Ma), from 457 to 0. Panel (b): 102 NPCLs by 16 taxa, drawn in three blocks of 34 NPCLs. Panel (c): size markers at 4,500 bp, 1,200 bp, and 200 bp.]
Taxa (16): Sphyrna, Pangasius, Lepisosteus, Protopterus, Ichthyophis, Batrachuperus, Rana, Mus, Struthio, Zosterops, Crocodylus, Trionyx, Podocnemis, Sus, Hemidactylus, Naja.
NPCL markers in panel (b), in figure order: ADNP, ANKRD50, ARSI, BPTF, CAND1, CASR, CXCR4, DBC1, DISP1, DISP2, DNAH3, DOPEY1, ENC1, EXTL3, FAT1, FAT4, FICD, FLRT3, FZD4, GGPS1, GLCE, GRIN3A, GRM2, KBTBD2, KCNF1, KIAA2013, LIG4, LINGO1, LPHN2, LRRN1, MB21D2, MIOS, MYCBP2, NHS, P2RY1, PANX2, PIK3CG, RAG1, RAG2, ROR2, SACS, SALL1, SETBP1, SOCS5, SPEN, STON2, VPS18, ZEB1, ZFPM2, ZHX2, HYP, CHST1, RERE, SVEP1, LCT, PDP1, MED13, LINGO2, LRRC8D, LRRTM4, RP2, SH3BP4, VCPIP1, DET1, FREM2, MSH6, PCLO, PPL, ARID2, DCHS1, DSEL, FILIP1, KIAA1239, SLITRK1, CPT2, MGAT4C, FEM1B, DMXL1, DOLK, ZBED4, REV3L, IRS1, FAT2, CILP, FUT9, LRRN3, TTN, GPER, WFIKKN2, MED1, EXOC8, B3GALT1, CELSR3, DSE, EVPL, PCDH1, TLR7, MOS, PCDH10, NTN1, SYNE1, ZIC1.
NPCL markers in panel (c): CAND1, HYP, IRS1, FAT2, CELSR3, RERE, SVEP1, DNAH3, GRM2.
30 nuclear genes provide further support for the monophyly of lissamphibians and the Batrachia hypothesis (fig. 4). Additionally, all possible hypotheses against the monophyly of extant amphibians and the Batrachia hypothesis were rejected in our topological tests (table 2). However, the Batrachia hypothesis did not receive strong support in our species tree analysis (BPMP-EST = 74%; fig. 4), suggesting that more nuclear genes are still needed to resolve this node.
The monophyly of the internally fertilizing salamanders (Salamandroidea: all salamanders exclusive of Hynobiidae, Cryptobranchidae, and Sirenidae) is strongly supported in our analyses (fig. 4), in line with most recent molecular studies (Wiens et al. 2005; Roelants et al. 2007; Zhang and Wake 2009; Pyron and Wiens 2011) but differing strongly from Frost et al. (2006), who recovered a clade comprising Sirenidae, Dicamptodontidae, Ambystomatidae, and Salamandridae. The internally fertilizing salamanders include two well-supported clades: one is composed of Ambystomatidae, Dicamptodontidae, and Salamandridae, and the other of Proteidae, Rhyacotritonidae, Amphiumidae, and Plethodontidae (fig. 4).
Currently, two hypotheses have been proposed regarding the basal split within living salamanders. The traditional view favors Sirenidae as the sister group to all remaining salamanders (Duellman and Trueb 1994). This hypothesis received strong support in two recent studies (based on mitochondrial genomes, BPML = 98%, Zhang and Wake 2009; based on mitochondrial genomes and nuclear genes, BPML > 80%, San Mauro 2010). In contrast, some studies suggest that the basal split separates Cryptobranchidae + Hynobiidae from all other salamanders (Gao and Shubin 2001; Wiens et al. 2005; Frost et al. 2006; Roelants et al. 2007; Pyron and Wiens 2011), but always without strong support (BPML < 71%). Our phylogenetic analyses based on 30 independent NPCLs supported the second hypothesis, that Cryptobranchoidea (Cryptobranchidae + Hynobiidae) branched first within the living salamanders. This result is extremely robust in our concatenation analyses (BPML = 99%, PPBAY = 1.0, PPCAT = 1.0; fig. 4) and statistically rejects all alternative hypotheses (table 2). In the species tree analysis without data concatenation, this result is also strong (BPMP-EST = 83%; fig. 4).
How many nuclear genes, then, are needed to robustly resolve the question of the basal split within living salamanders? Our analysis of data subsets indicates a progressive increase in the bootstrap support value for the node of interest (fig. 4) when an increasing number of genes are analyzed (fig. 5). Analyses based on single genes rarely resolve the node of interest with any confidence. Analyses based on 5-10 genes produce bootstrap support values of 60-80% in concatenation analyses (fig. 5), which is congruent with all previous nuclear studies using similar-sized data sets (Roelants et al. 2007; Pyron and Wiens 2011). Taking a bootstrap value of 95% in concatenation analyses as the threshold of "fully resolved," the minimum number of nuclear genes needed to resolve the root of the salamander tree is approximately 25. The previous contradictory results may be due to the overwhelmingly strong signals from the mitochondrial genome. Because
FIG. 3. Relative evolutionary rates of 102 NPCLs in vertebrates. The 102 NPCLs are arranged in order of increasing variability on the right side, and their PCR success rates in the 16 tested vertebrates are shown on the left side. NPCLs indicated with asterisks may have extra copies in teleost genomes and thus are not suitable for phylogenetic studies of teleosts.
[Figure axes. Left: PCR success rate in 16 vertebrates (%). Right: relative evolutionary rate.]
the initial diversification of salamanders occurred within a relatively short window of time (Weisrock et al. 2005), the genealogical histories of individual gene loci may sometimes appear misleading in terms of the relationships among species due to incomplete lineage sorting. Unfortunately, the mitochondrial genome recorded such an incorrect history.
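The data-subset analysis behind the estimate of approximately 25 genes follows a simple resampling design: draw random subsets of loci of each size, measure support for the node of interest, and average. The sketch below shows only that design. The functions `mean_support_by_size` and `minimum_loci` are hypothetical names, and `toy_support` is a made-up stand-in for what would really be a full tree search and bootstrap analysis per subset.

```python
import random
from statistics import mean


def mean_support_by_size(loci, support_fn, sizes, replicates=30, seed=0):
    """Mean node support over random locus subsets of each requested size."""
    rng = random.Random(seed)
    return {
        k: mean(support_fn(rng.sample(loci, k)) for _ in range(replicates))
        for k in sizes
    }


def minimum_loci(curve, threshold=95.0):
    """Smallest subset size whose mean support reaches the threshold."""
    passing = [k for k, value in sorted(curve.items()) if value >= threshold]
    return passing[0] if passing else None


def toy_support(subset):
    # Placeholder only: real support values come from bootstrap analyses.
    return min(100.0, 4.0 * len(subset))


loci = [f"locus{i + 1}" for i in range(30)]
curve = mean_support_by_size(loci, toy_support, sizes=[1, 5, 10, 15, 20, 25, 30])
threshold_size = minimum_loci(curve)
```

With a real support function, the same two helpers would return the curve plotted in figure 5 and the subset size at which it crosses the 95% line.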
Discussion
The NPCL toolkit and experimental protocol introduced here provide a highly reliable, rapid, and cost-effective method for building medium-scale multilocus data sets to produce high-resolution phylogenetic relationships. This phylogenomic approach has the potential to accelerate the completion of many parts of the vertebrate tree of life because it requires no further marker development, a step that is often the bottleneck in phylogenetic research. Once a specific phylogenetic question within vertebrates arises, researchers simply need to check the list for our toolkit and look for NPCL markers with the expected evolutionary rates and experimental performance for their groups of interest. Then, many orthologous loci can be quickly obtained by traditional PCR and Sanger sequencing, usually without time-consuming gel cutting and cloning. Applying the NPCL toolkit may also have another benefit for assembling the vertebrate tree of life: people working on different groups can easily use the same set of loci, which will facilitate combined analyses.
Merits of the Toolkit
Because of the use of the nested PCR strategy outlined earlier, most NPCLs in the toolkit work for all major jawed vertebrate groups with high experimental success rates (normally >95%). Such results were achieved under unified PCR conditions without any extra effort involving cycling condition optimization. This feature of the toolkit enables it to surpass previously developed nuclear marker sets (Murphy et al. 2001; Li et al. 2007; Thomson et al. 2008; Townsend et al. 2008; Wright et al. 2008; Portik et al. 2011; Shen et al. 2011; Zhou et al. 2011). Most previous nuclear marker sets were developed for specific animal groups, and their application to
Table 1. Summary Information for the 30 NPCL Amplified in 19 Salamander Taxa.
NOTE: Length, length of refined alignment; Var. sites, variable sites; PI sites, parsimony informative sites; RCV, relative composition variability.
FIG. 4. Higher-level phylogenetic relationships of 10 salamander families inferred from 30 NPCL markers (30 nuclear genes; total 27,834 bp). The tree was inferred by concatenation analyses using ML, BI, and the mixture model (CAT) and by species-tree analysis using the pseudo-ML approach (MP-EST). Branch support values are indicated beside nodes in order of ML bootstrap (BPML), BI posterior probability (PPBAY), CAT posterior probability (PPCAT), and MP-EST bootstrap (BPMP-EST) from left to right. The filled squares represent BPML > 95%, PPBAY = 1.0, PPCAT = 1.0, and BPMP-EST > 95%. The circled number refers to the node of interest studied in figure 5. Branch lengths are from the ML analysis. Scale bar, 0.1 substitutions/site.
[Figure labels. Salamander families: Plethodontidae, Amphiumidae, Rhyacotritonidae, Proteidae, Salamandridae, Dicamptodontidae, Ambystomatidae, Sirenidae, Hynobiidae, Cryptobranchidae. Higher groups: Salamandroidea, Cryptobranchoidea, Anura, Gymnophiona, non-amphibian outgroup. Node support values shown numerically: 99/1.0/1.0/83 and 99/1.0/1.0/74.]
Taxa: Aneides hardii, Plethodon jordani, Batrachoseps major, Eurycea bislineata, Amphiuma means, Rhyacotriton variegatus, Proteus anguinus, Necturus beyeri, Tylototriton asperrimus, Cynops orientalis, Salamandra salamandra, Dicamptodon aterrimus, Ambystoma mexicanum, Pseudobranchus axanthus, Siren intermedia, Ranodon sibiricus, Batrachuperus yenyuanensis, Onychodactylus fischeri, Andrias davidianus, Silurana tropicalis, Bombina fortinuptialis, Typhlonectes natans, Ichthyophis bannanicus, Gallus gallus, Homo sapiens, Chrysemys picta bellii, Mus musculus, Latimeria chalumnae.
Table 2. Statistical Confidence (P Values) for Alternative Branching Hypotheses Based on the 30-Gene Data Set.
Alternative Topology Tested | ΔLn L | P Value | Rejection
Sirenidae is sister to Cryptobranchoidea | 42.3 | 0.004, 0.002, 0.004 | + + +
Gymnophiona is sister to Anura (monophyletic lissamphibians) | 34.3 | 0.013, 0.012, 0.013 | + + +
Gymnophiona is sister to Caudata (monophyletic lissamphibians) | 43.9 | 0.002, 0.001, 0.001 | + + +
Anura is sister to Amniota (paraphyletic lissamphibians) | 144.6 | 5E-30, 0, 0 | + + +
Gymnophiona is sister to Amniota (paraphyletic lissamphibians) | 129.0 | 1E-69, 0, 0 | + + +
Caudata is sister to Amniota (paraphyletic lissamphibians) | 172.8 | 0.0001, 0, 0 | + + +
other distantly related groups is usually difficult. For example, Spinks et al. (2010) collected 120 nuclear markers from avian, squamate, and mammalian phylogenetic studies and evaluated their PCR performance in turtles. They found that only eight nuclear markers successfully produced single expected bands across 13 tested turtle species. In another case, Fong and Fujita (2011) developed 75 nuclear markers for vertebrate phylogenetics, but approximately 60% of the target fragments could not be obtained in three test species (two reptiles and one lissamphibian). Therefore, although the nested PCR method introduced here requires an additional PCR reaction, the extra work is still worthwhile.
In PCR-based phylogenetic projects, even when the PCR reactions are successful, the products often contain significant nonspecific amplicons. Such a condition requires additional effort involving gel purification and cloning, which takes much more time than the PCR reaction itself. Our NPCL toolkit is specifically designed to solve this problem, so that normally over 90% of PCR reactions produce strong and single expected bands. Moreover, most of the primers used to date for nuclear marker sets are degenerate and thus are not suitable for direct sequencing of PCR products. Benefiting from the use of our nested PCR strategy, we introduce anchoring sequences to the ends of PCR fragments while maintaining PCR efficiency. Such introduced anchoring sequences bring the added benefit that two specific sequencing primers (Seq_F and Seq_R) can be used in all Sanger sequencing reactions.
One additional feature of our NPCL toolkit is that the average length of the NPCLs within it is 1,050 bp, a length that can easily be amplified in one PCR reaction and sequenced in both directions, allowing efficient use of resources. In contrast, the average marker lengths are 920 bp for 10 NPCLs in Li et al. (2007), 760 bp for 26 NPCLs in Townsend et al. (2008), 873 bp for 22 NPCLs in Shen et al. (2011), and 470 bp for 75 NPCLs in Fong and Fujita (2011). Longer markers provide more sites than shorter ones for equivalent money and time. This feature makes our NPCL toolkit more cost-effective than previously developed nuclear marker sets.
Phylogenetic Utility
The vertebrate NPCL toolkit we developed here shows great promise in terms of phylogenetic utility. A remarkable feature of our NPCL toolkit is that it provides 102 NPCLs with a broad range of evolutionary rates. In the case of our demonstration, we used 30 NPCLs to resolve a family-level salamander phylogeny using both traditional concatenation analyses and a more promising species-tree analysis. However, this example does not mean that our toolkit performs well only on deep-timescale questions. Our ongoing study using this toolkit to resolve the relationships within Plethodontidae, a rapidly radiating group of salamanders, suggests that the toolkit developed here also performs well in resolving species-level phylogenies. For many vertebrate groups in which applicable nuclear markers are limited, such as some teleosts, frogs, and salamanders, our NPCL toolkit can provide a one-stop solution for phylogenetic studies from the family level to the species level. Even for those groups in which specific nuclear marker sets have been developed, our toolkit is still worth trying, as many more loci can be easily obtained that may resolve some difficult branches.
The Toolkit Is a Good Addition to Sequence Capture Approaches
Recently, sequence capture approaches have been applied to vertebrate phylogenomics (Crawford et al. 2012; Faircloth et al. 2012; Lemmon et al. 2012; McCormack et al. 2012). These approaches begin with the selective capture of genomic regions. Briefly, fragmented gDNA is hybridized to DNA or RNA probes, either on an array or in solution. Nontargeted DNA is then washed away, and the targeted DNA is sequenced through NGS. The most promising feature of the sequence capture approach is that it can simultaneously produce hundreds to thousands of loci for tens of individuals within a relatively short time. Therefore, the sequence capture approach is considered to be much more cost-effective than the PCR-based method. According to the calculation of Lemmon et al. (2012), for a project of 100 taxa × 500 loci, the cost of the sequence capture method is just 1-3.5% of that of the PCR-based method.
However, the sequence capture approach is currently too challenging for most phylogenetic researchers. Typical NGS runs (454 or Illumina) used by the sequence capture method generate 1,000,000-2,000,000,000 sequences. Storing and processing these NGS data require significant computer memory, hardware upgrades, and bioinformatic programming skills, which are unfamiliar to most phylogenetic researchers. Moreover, phylogenetic reconstruction assumes that orthologous genes are being analyzed across species. For the PCR-based method, the detection of paralogous genes is relatively straightforward. However, in the sequence capture method, the captured genomic regions comprise short conserved cores (probe regions) and long unconserved flanking
FIG. 5. The effect of increasing the number of nuclear loci on resolving the basal split within salamanders. Each data point represents the mean of support values estimated from 30 randomly sampled subsets. The dashed line indicates the threshold of 95% bootstrap support. The statistical plots show that the minimum number of nuclear loci needed to robustly resolve the basal split within salamanders is 25.
[Figure axes. x: number of genes (1, 5, 10, 15, 20, 25, 30). y: bootstrap support (%), 0-100. Series: concatenation analyses; species tree analyses.]
sequences. Because paralogy cannot be detected until after the data are aligned, those unalignable sequences will make the detection of paralogy more difficult.
In fact, not every phylogenetic project will use more than 500 loci, as the sequence capture method normally does. Based on both empirical and simulation data, 20-50 loci are generally sufficient to answer many phylogenetic questions (Rokas et al. 2003; Spinks et al. 2009). This is also the number of loci that most phylogenetic studies will use. In such a situation, adopting the sequence capture method is not cost-effective because researchers need to use relatively expensive NGS and spend time learning new experimental techniques and carrying out sophisticated bioinformatic processing. Our NPCL toolkit is specially designed for such medium-scale phylogenetic projects using approximately 50 loci. Such a number of loci can easily be obtained from our 102 NPCLs. Because more than 90% of the PCR reactions generated by our toolkit can be directly sequenced, the average cost for one locus per sample is rather low. In our laboratory, generating one new sequence typically costs US$3 (without considering labor).
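The per-sequence figure translates into a project budget by simple multiplication. The following sketch assumes that "one new sequence" means one locus in one sample and that every taxon-by-locus combination is sequenced once; the function name and the 20-taxon example are hypothetical, and labor is excluded, as in the text.

```python
def sanger_cost_usd(n_taxa, n_loci, cost_per_sequence=3.0):
    """Consumables cost of sequencing every locus in every taxon once."""
    return n_taxa * n_loci * cost_per_sequence


# A medium-scale project of the kind discussed above: 20 taxa, 50 loci.
medium_project_cost = sanger_cost_usd(20, 50)
```

Under these assumptions, a 20-taxon, 50-locus data set would cost about US$3,000 in consumables; failed or repeated reactions would raise that figure.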
In addition, researchers sometimes have only tiny amounts of DNA but wish to perform a multilocus phylogenetic analysis. In such a situation, the sequence capture method is difficult to implement because it normally requires DNA at the microgram level (Lemmon et al. 2012). Our NPCL toolkit can fill the gap here. Benefiting from the use of the nested PCR strategy, the sensitivity of PCR reactions in our method is extremely high. In many test experiments in our laboratory, the toolkit and protocol could produce target bands with only 5-10 ng of DNA.
Our NPCL toolkit is an alternative to the sequence capture method for the everyday work of phylogenetic researchers. Which method to choose depends on two major drivers: the amount of DNA and the expected number of loci. When DNA is limited, PCR may be the better solution; otherwise, sequence capture also works. Taking into account the money and time the two methods require, we speculate that the economic transition point from PCR to sequence capture is at approximately 100 loci. That assessment is why our toolkit includes 102 NPCL markers. Our proposal is that when using ≤100 loci, one can try our NPCL toolkit; when using >100 loci, sequence capture should be used.
Future Directions
In this study, we used multiple genome alignments deposited in the University of California-Santa Cruz (UCSC) genome browser to identify long and conserved exons across jawed vertebrates. Benefiting from the use of a nested PCR strategy, the experimental performance of the developed NPCLs indicated that they are highly stable in all major jawed vertebrate groups. Recently, a database for mining exon and intron markers, called EvolMarkers, has been built by Li et al. (2012). Careful investigation of this database may identify many conserved exons within nonvertebrates, whose interrelationships are currently more problematic than those of vertebrates. Because the nonvertebrates constitute many distantly related groups, it may be impossible to develop a single set of PCR primers for all nonvertebrates. However, following a similar marker development strategy, multiple NPCL toolkits could be constructed for various groups of nonvertebrates, such as arthropods, echinoderms, and molluscs. In addition, because introns are flanked by conserved exons, the idea of using nested PCR for marker development could also be applied to the development of EPIC (exon-primed intron crossing) markers, which are more suitable for shallow-scale phylogenetic or phylogeographic projects.
Despite the benefits of our proposed method, it must be recognized that when handling large-scale projects, such as 200 taxa × 100 loci, the use of our toolkit and Sanger sequencing will still require significant cost, time, and labor. An alternative solution is to use NGS to replace Sanger sequencing. Recently, 454 NGS technology has been applied to sequence targeted gene regions from a pool of PCR products from different specimens (Binladen et al. 2007; Meyer et al. 2008). In such experiments, specific tagging sequences must be added to amplicons by either PCR (Binladen et al. 2007) or blunt-end ligation (Meyer et al. 2008). Therefore, if the tailing sequences of the second-round PCR primers in our NPCL toolkit are replaced by tagging sequences (for tag design, see Faircloth and Glenn 2012), all PCR products can be pooled together and sequenced with 454 NGS, which will greatly reduce the money and time cost compared with Sanger sequencing. However, parallel tagged sequencing via NGS does not circumvent the process of PCR for each individual at each locus, which may be the most onerous part of a large-scale phylogenomic project. Some promising new technologies may help to solve this problem, such as microdroplet PCR (Tewhey et al. 2009), where millions of individual PCR reactions are performed in picoliter-scale droplets simultaneously, and the 96.96 Dynamic Array by Fluidigm, which allows 96 primer combinations to be used on 96 samples (9,216 total PCR reactions) on a single PCR plate. However, there has been little research on applying NGS and new high-throughput PCR technologies to phylogenomics, so their ease of use and cost-effectiveness still need to be explored.
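To give a sense of the scale involved, the reaction counts can be worked out directly. The Python sketch below is a rough illustration: the helper `pcr_workload` and the assumption of standard 96-well plates with no controls or repeated reactions are introduced here and are not stated in the text.

```python
import math


def pcr_workload(n_taxa, n_loci, rounds=2, wells_per_plate=96):
    """Return (target amplicons, individual PCR reactions, 96-well plates)."""
    targets = n_taxa * n_loci
    reactions = targets * rounds          # nested PCR: two rounds per target
    plates = math.ceil(reactions / wells_per_plate)
    return targets, reactions, plates


targets, reactions, plates = pcr_workload(200, 100)
print(targets, reactions, plates)

# One 96.96 array combines 96 primer sets with 96 samples.
print(96 * 96)
```

Under these assumptions, a 200 taxa × 100 loci project implies 20,000 target amplicons, which is why the per-reaction PCR step dominates the effort regardless of the sequencing platform.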
Summary
In conclusion, we have developed an improved method for rapidly amplifying and sequencing NPCLs that has proven to be useful and effective for molecular phylogenetic studies of vertebrates. The newly developed toolkit provides an attractive alternative to available methods for vertebrate phylogenomics.
Universal Toolkit Including 102 NPCL. doi:10.1093/molbev/mst122. MBE. Downloaded from http://mbe.oxfordjournals.org/ at Vanderbilt University - Massey Law Library on March 18, 2015.
Materials and Methods
Development of NPCL and Primer Design
Our previous study showed that nested PCR is overwhelmingly more effective than conventional PCR for obtaining target amplicons from complex genomic environments (Shen et al. 2012). However, nested PCR requires four conserved regions to design two pairs of primers (illustrated in fig. 6; yellow blocks represent the conserved regions used for primer design), which means that only relatively long exons are suitable as candidates for NPCL development with the nested PCR method. To search for long and conserved exons, we took advantage of our previous bioinformatic method, which uses the multiple genome alignment data from the UCSC Genome Browser to identify conserved exons (Shen et al. 2011). Because the NPCL markers are to be used in vertebrates, we focused only on those multiple genome alignments that include at least six species: Danio rerio (zebrafish), Silurana tropicalis (frog), Anolis carolinensis (lizard), Gallus gallus (chicken), Mus musculus (mouse), and Homo sapiens (human). The alignments of candidate exons had to meet two criteria: a length of more than 700 bp and pairwise similarity ranging from 35% to 90%. The detailed bioinformatic pipeline has been described elsewhere (Shen et al. 2011). In addition to using multiple genome alignments to screen NPCL candidates, we also manually searched the ENSEMBL database for nuclear genes that were used previously (Murphy et al. 2001; Li et al. 2007; Townsend et al. 2008; Wright et al. 2008; Zhou et al. 2011; Song et al. 2012) to check whether they contain large and appropriately conserved exons.
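The two screening criteria for candidate exon alignments (length of more than 700 bp, pairwise similarity between 35% and 90%) can be sketched as a filter. This is a minimal Python illustration, not the published pipeline of Shen et al. (2011): the function names are hypothetical, similarity is assumed to mean percent identity over ungapped columns, and the criterion is assumed to apply to the mean over all species pairs.

```python
from itertools import combinations


def pairwise_identity(a, b):
    """Percent identity over aligned columns where neither sequence has a gap."""
    cols = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not cols:
        return 0.0
    return 100.0 * sum(x == y for x, y in cols) / len(cols)


def is_npcl_candidate(alignment, min_len=700, min_sim=35.0, max_sim=90.0):
    """alignment: dict mapping species name -> aligned exon sequence."""
    seqs = list(alignment.values())
    if len(seqs) < 2:
        return False
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all aligned sequences must have the same length")
    if length <= min_len:  # the text requires MORE than 700 bp
        return False
    sims = [pairwise_identity(a, b) for a, b in combinations(seqs, 2)]
    mean_sim = sum(sims) / len(sims)  # assumption: criterion applies to the mean
    return min_sim <= mean_sim <= max_sim


# Toy example with made-up sequences: 800 columns, 80% identical.
toy = {"species_A": "A" * 800, "species_B": "A" * 640 + "C" * 160}
print(is_npcl_candidate(toy))
```

The upper bound matters as much as the lower one: an exon that is nearly identical across species offers primer sites but little phylogenetic signal.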
FIG. 6. Schematic representation of the experimental protocol for using our NPCL toolkit. Note that for each NPCL, nested PCR primers are designed on four short conserved blocks flanking the target region.
[Figure not reproduced. The schematic shows the target region flanked by conserved blocks that carry the first-round primers F1 and R1, the tailed second-round primers F2 and R2, and the sequencing primers Seq_F and Seq_R. The laboratory protocol has three steps.]
(i) First PCR with primers F1 and R1 using gDNA as template: enrich the target region from the complex genomic environment with one pair of highly degenerate primers. PCR was performed with 50–100 ng DNA in a 25 µl reaction. Cycling conditions: an initial denaturation step of 4 min at 94 °C, followed by 35 cycles of 94 °C for 45 s, 45 °C for 40 s, and 72 °C for 2 min, and a final extension at 72 °C for 10 min.
(ii) Second PCR with tailed primers F2 and R2 using the first-round PCR product as template: specifically amplify the target region with one pair of tailed, low-degeneracy primers. PCR was performed with 1 µl of first-round PCR product in a 25 µl reaction. Cycling conditions: an initial denaturation step of 4 min at 94 °C, followed by 35 cycles of 94 °C for 45 s, 50 °C for 40 s, and 72 °C for 90 s, and a final extension at 72 °C for 10 min.
(iii) PCR evaluation and sequencing: evaluate the agarose gel electrophoresis results. If a single and strong band is present (Y; normally >90%), the product is cleaned up with ExoI and FastAP and sequenced directly with the general sequencing primers Seq_F and Seq_R; otherwise (N), gel cutting or cloning precedes sequencing. For cleanup, 25 µl of PCR product is treated with 2 U ExoI and 0.4 U FastAP (37 °C for 30 min, then 80 °C for 15 min). A Sanger sequencing reaction is performed with 0.5 µl BigDye and 1 µl of cleaned PCR product.
As a result, we assembled a total of 305 NPCL candidate alignments, of which 120 contained the appropriate number of conserved blocks and were used to design nested PCR primers. To increase the success rates of our NPCL markers in amniotes, we manually added a turtle sequence (Chrysemys picta bellii) to each of the candidate alignments, using data downloaded from the ENSEMBL database. A total of 480 primers were designed for the 120 NPCL candidates. Briefly, the first-round PCR primers are only used to enrich target regions from genomic environments and not to obtain target amplicons, so the degeneracy of these primers is normally high to increase reaction sensitivity; the second-round PCR primers are used to obtain target amplicons, so the degeneracy of these primers is lower to increase reaction specificity. Our previous study showed that the nested PCR method often produces strong and single amplicon bands (Shen et al. 2012). To facilitate subsequent direct sequencing, we added a tail (5′-AGGGTTTTCCCAGTCACGAC-3′) to the 5′-end of all second-round forward primers and a tail (5′-AGATAACAATTTCACACAGG-3′) to the 5′-end of all second-round reverse primers. These tail sequences provide two unique anchoring sites for direct sequencing from cleaned PCR products. In our pilot experiments, adding the tail sequences to primers did not affect the efficiency of the second-round PCR.
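The primer tailing and the notion of primer degeneracy described above can be made concrete with a short Python sketch. The two tail sequences are those quoted in the text; the helper names and the example primers are hypothetical and are not taken from supplementary table S1.

```python
# Tail sequences quoted in the text (5' -> 3').
SEQ_F_TAIL = "AGGGTTTTCCCAGTCACGAC"
SEQ_R_TAIL = "AGATAACAATTTCACACAGG"

# Number of bases each IUPAC nucleotide code stands for.
IUPAC_COUNTS = {
    "A": 1, "C": 1, "G": 1, "T": 1,
    "R": 2, "Y": 2, "S": 2, "W": 2, "K": 2, "M": 2,
    "B": 3, "D": 3, "H": 3, "V": 3,
    "N": 4,
}


def degeneracy(primer):
    """Number of distinct sequences encoded by a degenerate primer."""
    total = 1
    for base in primer.upper():
        total *= IUPAC_COUNTS[base]
    return total


def add_tails(forward, reverse):
    """Prepend the sequencing tails to a second-round primer pair."""
    return SEQ_F_TAIL + forward.upper(), SEQ_R_TAIL + reverse.upper()


# Hypothetical second-round primers, for illustration only.
f2, r2 = add_tails("GGNATGGAYAARCT", "TCRTTNCCYTGCCA")
print(f2, degeneracy("GGNATGGAYAARCT"))
print(r2, degeneracy("TCRTTNCCYTGCCA"))
```

Because every second-round amplicon carries the same two tails, one pair of sequencing primers serves all loci, whatever the locus-specific part of the primer is.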
Experimental Testing for Candidate Markers in 16 Jawed Vertebrates
To test the experimental performance of our newly designed NPCL markers, we selected 16 taxa representing nine major jawed vertebrate lineages: Chondrichthyes (Sphyrna lewini), Actinopterygii (Lepisosteus oculatus and Pangasius sutchi), Dipnoi (Protopterus annectens), Lissamphibia (Ichthyophis bannanicus, Batrachuperus yenyuanensis, and Rana nigromaculata), Mammalia (Mus musculus and Sus scrofa domestica), Testudines (Trionyx sinensis and Podocnemis unifilis), Aves (Struthio camelus and Zosterops japonica), Crocodylia (Crocodylus siamensis), and Squamata (Hemidactylus bowringii and Naja naja atra). Total genomic DNA was extracted from ethanol-preserved tissues (liver or muscle) using the standard salt extraction protocol. All extracted genomic DNAs were diluted to a concentration of 50 ng/µl with 1× TE and stored at −20 °C before PCR amplification.
All 120 NPCL markers were tested with a two-round PCR strategy (nested PCR). The first-round PCR was performed in a 25 µl reaction containing 1–2 µl template DNA (50–100 ng), with final concentrations of 1× PCR buffer, 200 µM dNTP, 400 nM of each forward and reverse first-round primer, and 1.25 U Taq polymerase (TransTaq High Fidelity; TransGen, Beijing). The cycling conditions of the first-round PCR were as follows: an initial denaturation step of 4 min at 94 °C, followed by 35 cycles of a 45 s denaturation at 94 °C, a 40 s annealing at 45 °C, and a 2 min extension at 72 °C, followed by a final 10 min extension at 72 °C. The second-round PCR was also performed in a 25 µl reaction, containing 1 µl of the first-round PCR product (without dilution) and final concentrations of 1× PCR buffer, 200 µM dNTP, 400 nM of each forward and reverse second-round primer, and 1.25 U Taq polymerase. The cycling conditions of the second-round PCR were as follows: an initial denaturation step of 4 min at 94 °C, followed by 35 cycles of a 45 s denaturation at 94 °C, a 40 s annealing at 50 °C, and a 90 s extension at 72 °C, followed by a final 10 min extension at 72 °C.
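The two cycling programs differ only in annealing temperature and extension time, so their run times are easy to compare. The Python sketch below encodes the hold times given in the text; the helper `program_seconds` is hypothetical, and ramping time between temperatures is ignored, so real runs on a thermocycler will take longer.

```python
def program_seconds(initial, cycles, steps, final):
    """Total hold time in seconds; ramping between temperatures is ignored."""
    return initial + cycles * sum(steps) + final


# (denaturation, annealing, extension) hold times in seconds, from the text.
FIRST_ROUND = dict(initial=4 * 60, cycles=35, steps=(45, 40, 120), final=10 * 60)
SECOND_ROUND = dict(initial=4 * 60, cycles=35, steps=(45, 40, 90), final=10 * 60)

first = program_seconds(**FIRST_ROUND)
second = program_seconds(**SECOND_ROUND)
print(f"first round:  {first} s ({first / 60:.1f} min)")
print(f"second round: {second} s ({second / 60:.1f} min)")
```

On hold times alone, the two rounds together come to roughly four hours, which fits within a single working day.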
One microliter of the second-round PCR products was analyzed on a 1.0% TAE agarose gel. An NPCL marker was considered successful if more than 8 of the 16 tested taxa produced target amplicon bands. On this basis, 102 of the 120 tested NPCL markers were successful. The nested PCR primers for the 102 NPCL markers can be found in supplementary table S1, Supplementary Material online. If the PCR products contained significant nonspecific amplicon bands (normally <10%), they needed further processing, for example, standard gel cutting or cloning. If the PCR reactions produced single amplicon bands (normally >90%), they were cleaned with ExoFAP treatment: 2 U ExoI and 0.4 U FastAP (all Fermentas) were added to the PCR tube and incubated for 30 min at 37 °C and 15 min at 80 °C. The cleaned PCR reactions can be used directly as templates for Sanger sequencing. According to our experimental design, all PCR fragments can be sequenced from both ends with the two universal sequencing primers, Seq_F (5′-AGGGTTTTCCCAGTCACGAC-3′) and Seq_R (5′-AGATAACAATTTCACACAGG-3′). A typical Sanger sequencing reaction in our laboratory consumes 0.5 µl BigDye and 1 µl cleaned PCR product. The primer design strategy, the laboratory protocol for the nested PCR method, and the pretreatment of PCR products before Sanger sequencing are illustrated in figure 6.
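The marker-level success criterion and the routing of PCR products after gel evaluation can be summarized in a few lines of Python. The function names and the example gel results below are made up for illustration; only the threshold (more than 8 of 16 taxa) and the 102-of-120 count come from the text.

```python
def marker_successful(bands, min_taxa=8):
    """bands: one boolean per tested taxon (True = target band observed).

    A marker passes if MORE than `min_taxa` taxa gave a target band.
    """
    return sum(bool(b) for b in bands) > min_taxa


def next_step(single_strong_band):
    """Route a PCR product after gel evaluation."""
    if single_strong_band:
        return "ExoI/FastAP cleanup, then direct sequencing"
    return "gel cutting or cloning, then sequencing"


# Made-up gel results for one marker across 16 taxa.
example = [True] * 13 + [False] * 3
print(marker_successful(example), next_step(True))
print(f"marker-level success: {102 / 120:.0%}")
```

Note that this marker-level proportion is distinct from the per-reaction PCR success rate reported elsewhere in the article, which is computed over individual taxon-by-marker reactions.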
Calculation of Relative Evolutionary Rate of 102 NPCLs
The rate multipliers (m) across partitions estimated in MrBayes 3.2 (Ronquist et al. 2012) were used as relative evolutionary rates. To calculate these parameters, alignments for each NPCL were prepared for 12 species: Homo sapiens, Macaca mulatta, Mus musculus, Rattus norvegicus, Gallus gallus, Meleagris gallopavo, Chrysemys picta bellii, Anolis carolinensis, Silurana tropicalis, Tetraodon nigroviridis, Takifugu rubripes, and Danio rerio. Because genome data are available for the 12 species, we did not generate any new data. The 102 NPCL alignments were then combined and subjected to MrBayes analyses partitioned by gene. Each gene was assigned a separate GTR + Γ + I model, and all model parameters were unlinked. Two Markov chain Monte Carlo (MCMC) runs were performed with one cold chain and three heated chains (temperature set to 0.1) for 50 million generations and sampled every 1,000 generations. The rate multiplier for each gene was estimated using Tracer version 1.4 after discarding the first 50% of the generations. All evolutionary rates were normalized by dividing by the maximum value of the obtained rates.
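The final normalization step is a simple rescaling so that the fastest marker has a relative rate of 1. A minimal Python sketch follows; the function name and the three example rate multipliers are hypothetical and are not values from the study.

```python
def normalize_rates(rates):
    """Divide every rate multiplier by the largest one, so the maximum is 1."""
    top = max(rates.values())
    return {gene: value / top for gene, value in rates.items()}


# Hypothetical rate multipliers for three markers (not values from the study).
raw = {"marker_A": 0.5, "marker_B": 2.0, "marker_C": 1.0}
relative = normalize_rates(raw)

# List markers from slowest to fastest.
for gene in sorted(relative, key=relative.get):
    print(gene, relative[gene])
```

Sorting on the normalized value gives the kind of slow-to-fast ordering that lets a user pick markers suited to a given time depth.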
Gene and Taxon Sampling for Investigating Higher Level Salamander Relationships
To test the utility of our NPCL toolkit in a real case, we selected 19 salamander taxa representing all 10 salamander families and 9 outgroup taxa to investigate the family-level relationships of salamanders (supplementary table S2, Supplementary Material online). For gene sampling, we randomly selected 30 NPCL markers whose PCR success rates were more than 90% in the 16 previously tested vertebrates. Among the 840 target sequences (30 markers for 28 taxa), 201 were available in public databases (NCBI, UCSC, and ENSEMBL), whereas the remaining 639 sequences needed to be generated de novo. The experimental procedure was as described earlier. All obtained sequences were examined by checking for the presence of premature stop codons (indicating pseudogenes) and by BlastX searches against the nonredundant protein sequences (nr) database to confirm that they were our target genes. The NPCLs, species, and accession numbers for the newly obtained sequences are listed in supplementary table S2, Supplementary Material online.
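The sampling arithmetic and the stop-codon screen described above can be illustrated as follows. This Python sketch assumes the standard nuclear genetic code and an unaligned sequence whose reading frame is already known; the function name and the toy sequences are hypothetical, and the BlastX step is not covered.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard nuclear genetic code


def has_internal_stop(seq, frame=0):
    """True if any complete in-frame codon of an unaligned sequence is a stop."""
    s = seq.upper()
    return any(s[i:i + 3] in STOP_CODONS for i in range(frame, len(s) - 2, 3))


n_markers, n_taxa, n_public = 30, 28, 201
n_targets = n_markers * n_taxa          # 840 target sequences
n_de_novo = n_targets - n_public        # 639 to be generated by PCR
print(n_targets, n_de_novo)

# Toy sequences for illustration only.
print(has_internal_stop("ATGGCCTAAGGC"))   # frame 0: ATG GCC TAA GGC
print(has_internal_stop("ATGGCCAAAGGC"))
```

Since the targets are exon fragments, any in-frame stop is treated as premature here; a sequence of unknown frame would need to be checked in all three frames.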
Phylogenetic Analyses
Alignments of all 30 NPCL markers were conducted using the G-INS-i method from MAFFT (Katoh et al. 2005) under the default settings according to their translated amino acid sequences, then refined by eye. All 30 refined alignments were combined into a concatenated data set (27,834 bp).
For the concatenated data set, we manually defined five partitioning strategies: 2 partitions (one for codon positions 1 and 2 and one for codon position 3); 3 partitions (one partition for each codon position); 30 partitions (one partition for each gene); 60 partitions (one for codon positions 1 + 2 and one for codon position 3 in each of the 30 genes); and 90 partitions (codon position partitioning in each of the 30 genes). Comparisons of the five partitioning strategies and selection of the corresponding nucleotide substitution models were conducted under the Bayesian information criterion implemented in PartitionFinder (Lanfear et al. 2012). The 3-partition scheme (one partition for each codon position) was chosen as the best-fitting partitioning strategy, and all three partitions favored the GTR + Γ + I model.
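The five partitioning strategies can be generated mechanically from the gene boundaries. The Python sketch below only builds the column sets and does not reproduce the PartitionFinder input format; the function names are hypothetical, the gene lengths are placeholders, and each gene is assumed to start in frame with a length that is a multiple of three.

```python
def codon_columns(start, end, positions):
    """1-based columns in [start, end] whose codon position is in `positions`."""
    return [c for c in range(start, end + 1) if (c - start) % 3 + 1 in positions]


def build_scheme(gene_lengths, by_gene, codon_groups=None):
    """Return {partition name: list of alignment columns}."""
    if by_gene:
        spans, start = [], 1
        for name, length in gene_lengths.items():
            spans.append((name, start, start + length - 1))
            start += length
    else:
        spans = [("all", 1, sum(gene_lengths.values()))]

    scheme = {}
    for name, start, end in spans:
        if codon_groups is None:
            scheme[name] = list(range(start, end + 1))
            continue
        for group in codon_groups:
            suffix = "".join(str(p) for p in group)
            scheme[name + "_pos" + suffix] = codon_columns(start, end, group)
    return scheme


# Placeholder lengths: 30 genes of 300 bp each (not the real alignment lengths).
genes = {"gene%02d" % i: 300 for i in range(1, 31)}
POS12_3 = [(1, 2), (3,)]
POS1_2_3 = [(1,), (2,), (3,)]

schemes = {
    "2 partitions": build_scheme(genes, False, POS12_3),
    "3 partitions": build_scheme(genes, False, POS1_2_3),
    "30 partitions": build_scheme(genes, True),
    "60 partitions": build_scheme(genes, True, POS12_3),
    "90 partitions": build_scheme(genes, True, POS1_2_3),
}
for name, scheme in schemes.items():
    print(name, len(scheme))
```

Every scheme covers each alignment column exactly once; the schemes differ only in how finely the columns are grouped, which is what the information criterion then penalizes or rewards.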
The concatenated data set was analyzed separately with both ML and Bayesian inference (BI) methods under the 3-partition scheme. Partitioned ML analyses were implemented using RAxML version 7.2.6 (Stamatakis 2006) with the GTR + Γ + I model assigned to each partition. A search that combined 100 separate ML searches was applied to find the optimal tree, and branch support for each node was evaluated with 500 standard bootstrapping replicates (-f d -b 500 option) implemented in RAxML. The partitioned BI was conducted using MrBayes 3.2 (Ronquist et al. 2012). All model parameters were unlinked. Two MCMC runs were performed with one cold chain and three heated chains (temperature set to 0.1) for 60 million generations and sampled every 1,000 generations. The chain stationarity was visualized by plotting −ln L against the generation number using Tracer version 1.4, and the first 50% of generations were discarded. Topologies and posterior probabilities were estimated from the remaining generations. Two runs for each analysis were compared for congruence.
We also performed Bayesian phylogenetic analyses under a mixture model, CAT + GTR + Γ4, in PhyloBayes 3.3 (Lartillot et al. 2009) with two independent MCMC runs. Each run was performed for 10,000 cycles and sampled every cycle. Stationarity was considered reached when the largest discrepancy (maxdiff) between the two independent runs was less than 0.1. The first 5,000 trees in each MCMC run were discarded. The remaining 10,000 trees of the two runs were sampled every 5 trees to generate a 50% majority-rule posterior consensus tree.
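The sampling and burn-in settings of the two Bayesian analyses translate into the following sample counts. This Python sketch is simple bookkeeping under one assumption made here: the starting state of each chain is not counted as a sample. The helper name is hypothetical.

```python
def retained_samples(n_iterations, sample_every, burnin_fraction):
    """Samples kept per run after burn-in (the starting state is not counted)."""
    total = n_iterations // sample_every
    return total - int(total * burnin_fraction)


# MrBayes run on the concatenated data set: 60 million generations,
# sampled every 1,000 generations, first 50% discarded.
mrbayes_per_run = retained_samples(60_000_000, 1000, 0.5)

# PhyloBayes: 10,000 cycles per run, first 5,000 trees discarded, two runs,
# then every 5th tree of the pooled sample is used for the consensus.
phylobayes_pooled = 2 * retained_samples(10_000, 1, 0.5)
phylobayes_consensus = phylobayes_pooled // 5

print(mrbayes_per_run, phylobayes_pooled, phylobayes_consensus)
```

The pooled PhyloBayes count agrees with the 10,000 remaining trees stated in the text, and thinning by 5 leaves 2,000 trees for the consensus.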
Species tree estimation was conducted using the pseudo-ML approach in the program MP-EST (Liu et al. 2010) under the coalescent model. Briefly, the gene trees for the 30 NPCL markers were reconstructed with ML under the GTR + Γ8 model using PHYML 3.0 (Guindon et al. 2010). The resulting 30 gene trees were rooted with an outgroup (Chrysemys picta bellii) and used to generate a species tree with MP-EST. The robustness of the species tree was evaluated with nonparametric bootstrapping of 500 replicates.
Alternative phylogenetic hypotheses were tested based on the 30-gene data set. We first calculated sitewise log likelihoods of alternative trees using RAxML (-f g) under the GTR + Γ + I model. Then, the site log-likelihood file was used to estimate P values for each alternative tree in the CONSEL program (Shimodaira and Hasegawa 2001) using the Kishino–Hasegawa (KH) test (Kishino and Hasegawa 1989), the approximately unbiased (AU) test (Shimodaira 2002), and the RELL bootstrap proportion (BP) test.
Supplementary Material
Supplementary tables S1 and S2 and figures S1 and S2 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).
Acknowledgments
The authors are grateful to David Wake and Carol Spencer of the Museum of Vertebrate Zoology at the University of California, Berkeley, for tissue loans. David Wake provided many useful comments on the manuscript. This work was supported by National Natural Science Foundation of China grants (31172075 and 30900136) and the National Science Fund for Excellent Young Scholars (no. pending).
References
Binladen J, Gilbert MTP, Bollback JP, Panitz F, Bendixen C, Nielsen R, Willerslev E. 2007. The use of coded PCR primers enables high-throughput sequencing of multiple homolog amplification products by 454 parallel sequencing. PLoS One 2:e197.
Crawford NG, Faircloth BC, McCormack JE, Brumfield RT, Winker K, Glenn TC. 2012. More than 1000 ultraconserved elements provide evidence that turtles are the sister group to archosaurs. Biol Lett. 8:783–786.
Delsuc F, Brinkmann H, Philippe H. 2005. Phylogenomics and the reconstruction of the tree of life. Nat Rev Genet. 6:361–375.
Duellman WE, Trueb L. 1994. Biology of amphibians. Baltimore (MD): Johns Hopkins University Press.
Dunn CW, Hejnol A, Matus DQ, et al. (18 co-authors). 2008. Broad phylogenomic sampling improves resolution of the animal tree of life. Nature 452:745–749.
Faircloth BC, Glenn TC. 2012. Not all sequence tags are created equal: designing and validating sequence identification tags robust to indels. PLoS One 7:e42543.
Faircloth BC, McCormack JE, Crawford NG, Brumfield RT, Glenn TC. 2012. Ultraconserved elements anchor thousands of genetic markers for target enrichment spanning multiple evolutionary timescales. Syst Biol. 61:717–726.
Fong JJ, Brown JM, Fujita MK, Boussau B. 2012. A phylogenomic approach to vertebrate phylogeny supports a turtle-archosaur affinity and a possible paraphyletic lissamphibia. PLoS One 7:e48990.
Fong JJ, Fujita MK. 2011. Evaluating phylogenetic informativeness and data-type usage for new protein-coding genes across Vertebrata. Mol Phylogenet Evol. 61:300–307.
Frost DR, Grant T, Faivovich J, et al. (18 co-authors). 2006. The amphibian tree of life. Bull Am Mus Nat Hist. 297:8–370.
Gao KQ, Shubin NH. 2001. Late Jurassic salamanders from northern China. Nature 410:574–577.
Guindon S, Dufayard J-F, Lefort V, Anisimova M, Hordijk W, Gascuel O. 2010. New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Syst Biol. 59:307–321.
Hugall AF, Foster R, Lee MSY. 2007. Calibration choice, rate smoothing, and the pattern of tetrapod diversification according to the long nuclear gene RAG-1. Syst Biol. 56:543–563.
Inoue JG, Miya M, Lam K, Tay BH, Danks JA, Bell J, Walker TI, Venkatesh B. 2010. Evolutionary origin and phylogeny of the modern holocephalans (Chondrichthyes: Chimaeriformes): a mitogenomic perspective. Mol Biol Evol. 27:2576–2586.
Katoh K, Kuma K, Toh H, Miyata T. 2005. MAFFT version 5: improvement in accuracy of multiple sequence alignment. Nucleic Acids Res. 33:511–518.
Kishino H, Hasegawa M. 1989. Evaluation of the maximum likelihood estimate of the evolutionary tree topologies from DNA sequence data, and the branching order in Hominoidea. J Mol Evol. 29:170–179.
Künstner A, Wolf JBW, Backström N, et al. (13 co-authors). 2010. Comparative genomics based on massive parallel transcriptome sequencing reveals patterns of substitution and selection across 10 bird species. Mol Ecol. 19:266–276.
Lanfear R, Calcott B, Ho SYW, Guindon S. 2012. Partitionfinder: combined selection of partitioning schemes and substitution models for phylogenetic analyses. Mol Biol Evol. 29:1695–1701.
Lartillot N, Lepage T, Blanquart S. 2009. PhyloBayes 3: a Bayesian software package for phylogenetic reconstruction and molecular dating. Bioinformatics 25:2286–2288.
Lemmon AR, Brown JM, Stanger-Hall K, Lemmon EM. 2009. The effect of ambiguous data on phylogenetic estimates obtained by maximum likelihood and Bayesian inference. Syst Biol. 58:130–145.
Lemmon AR, Emme SA, Lemmon EM. 2012. Anchored hybrid enrichment for massively high-throughput phylogenomics. Syst Biol. 61:727–744.
Li C, Ortí G, Zhang G, Lu G. 2007. A practical approach to phylogenomics: the phylogeny of ray-finned fish (Actinopterygii) as a case study. BMC Evol Biol. 7:44.
Li C, Riethoven JJ, Naylor GJ. 2012. EvolMarkers: a database for mining exon and intron markers for evolution, ecology and conservation studies. Mol Ecol Resour. 12:967–971.
Liu L, Yu L, Edwards SV. 2010. A maximum pseudo-likelihood approach for estimating species trees under the coalescent model. BMC Evol Biol. 10:302.
McCormack JE, Faircloth BC, Crawford NG, Gowaty PA, Brumfield RT, Glenn TC. 2012. Ultraconserved elements are novel phylogenomic markers that resolve placental mammal phylogeny when combined with species-tree analysis. Genome Res. 22:746–754.
McCormack JE, Hird SM, Zellmer AJ, Carstens BC, Brumfield RT. 2013. Applications of next-generation sequencing to phylogeography and phylogenetics. Mol Phylogenet Evol. 66:526–538.
Meyer A, Van de Peer Y. 2005. From 2R to 3R: evidence for a fish-specific genome duplication (FSGD). Bioessays 27:937–945.
Meyer M, Stenzel U, Hofreiter M. 2008. Parallel tagged sequencing on the 454 platform. Nat Protoc. 3:267–278.
Murphy WJ, Eizirik E, Johnson WE, Zhang YP, Ryder OA, O'Brien SJ. 2001. Molecular phylogenetics and the origins of placental mammals. Nature 409:614–618.
Philippe H, Derelle R, Lopez P, et al. (20 co-authors). 2009. Phylogenomics revives traditional views on deep animal relationships. Curr Biol. 19:706–712.
Philippe H, Telford M. 2006. Large-scale sequencing and the new animal phylogeny. Trends Ecol Evol. 21:614–620.
Portik DM, Wood PL, Grismer JL, Stanley EL, Jackman TR. 2011. Identification of 104 rapidly-evolving nuclear protein-coding markers for amplification across scaled reptiles using genomic resources. Conserv Genet Resour. 4:1–10.
Pyron RA, Wiens JJ. 2011. A large-scale phylogeny of Amphibia including over 2800 species, and a revised classification of extant frogs, salamanders, and caecilians. Mol Phylogenet Evol. 61:543–583.
Roelants K, Gower DJ, Wilkinson M, Loader SP, Biju SD, Guillaume K, Moriau L, Bossuyt F. 2007. Global patterns of diversification in the history of modern amphibians. Proc Natl Acad Sci U S A. 104:887–892.
Rokas A, Williams BL, King N, Carroll SB. 2003. Genome-scale approaches to resolving incongruence in molecular phylogenies. Nature 425:798–804.
Ronquist F, Teslenko M, Van der Mark P, Ayres DL, Darling A, Höhna S, Larget B, Liu L, Suchard MA, Huelsenbeck JP. 2012. MrBayes 3.2: efficient Bayesian phylogenetic inference and model choice across a large model space. Syst Biol. 61:539–542.
Roure B, Baurain D, Philippe H. 2013. Impact of missing data on phylogenies inferred from empirical phylogenomic data sets. Mol Biol Evol. 30:197–214.
San Mauro D. 2010. A multilocus timescale for the origin of extant amphibians. Mol Phylogenet Evol. 56:554–561.
San Mauro D, Vences M, Alcobendas M, Zardoya R, Meyer A. 2005. Initial diversification of living amphibians predated the breakup of Pangaea. Am Nat. 165:590–599.
Shen XX, Liang D, Wen JZ, Zhang P. 2011. Multiple genome alignments facilitate development of NPCL markers: a case study of tetrapod phylogeny focusing on the position of turtles. Mol Biol Evol. 28:3237–3252.
Shen XX, Liang D, Zhang P. 2012. The development of three long universal nuclear protein-coding locus markers and their application to osteichthyan phylogenetics with nested PCR. PLoS One 7:e39256.
Shimodaira H. 2002. An approximately unbiased test of phylogenetic tree selection. Syst Biol. 51:492–508.
Shimodaira H, Hasegawa M. 2001. CONSEL: for assessing the confidence of phylogenetic tree selection. Bioinformatics 17:1246–1247.
Song S, Liu L, Edwards SV, Wu S. 2012. Resolving conflict in eutherian mammal phylogeny using phylogenomics and the multispecies coalescent model. Proc Natl Acad Sci U S A. 109:14942–14947.
Spinks PQ, Thomson RC, Barley AJ, Newman CE, Shaffer HB. 2010. Testing avian, squamate, and mammalian nuclear markers for cross amplification in turtles. Conserv Genet Resour. 2:127–129.
Spinks PQ, Thomson RC, Lovely GA, Shaffer HB. 2009. Assessing what is needed to resolve a molecular phylogeny: simulations and empirical data from Emydid turtles. BMC Evol Biol. 9:56.
Stamatakis A. 2006. RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics 22:2688–2690.
Tewhey R, Warner JB, Nakano M, Libby B, Medkova M, David PH, Kotsopoulos SK, Samuels ML, Hutchison JB, Larson JW. 2009. Microdroplet-based PCR enrichment for large-scale targeted sequencing. Nat Biotechnol. 27:1025–1031.
Thomson RC, Shedlock AM, Edwards SV, Shaffer HB. 2008. Developing markers for multilocus phylogenetics in non-model organisms: a test case with turtles. Mol Phylogenet Evol. 49:514–525.
Thomson RC, Wang IJ, Johnson JR. 2010. Genome-enabled development of DNA markers for ecology, evolution and conservation. Mol Ecol. 19:2184–2195.
Townsend TM, Alegre RE, Kelley ST, Wiens JJ, Reeder TW. 2008. Rapid development of multiple nuclear loci for phylogenetic analysis using genomic resources: an example from squamate reptiles. Mol Phylogenet Evol. 47:129–142.
Weisrock DW, Harmon LJ, Larson A. 2005. Resolving deep phylogenetic relationships in salamanders: analyses of mitochondrial and nuclear genomic DNA. Syst Biol. 54:758–777.
Wiens J, Bonett R, Chippindale P. 2005. Ontogeny discombobulates phylogeny: paedomorphosis and higher-level salamander relationships. Syst Biol. 54:91–110.
Wright TF, Schirtzinger EE, Matsumoto T, et al. (11 co-authors). 2008. A multilocus molecular phylogeny of the parrots (Psittaciformes): support for a Gondwanan origin during the cretaceous. Mol Biol Evol. 25:2141–2156.
Zhang P, Wake DB. 2009. Higher-level salamander relationships and divergence dates inferred from complete mitochondrial genomes. Mol Phylogenet Evol. 53:492–508.
Zhang P, Zhou H, Chen YQ, Liu YF, Qu LH. 2005. Mitogenomic perspectives on the origin and phylogeny of living amphibians. Syst Biol. 54:391–400.
Zhou XM, Xu SX, Zhang P, Yang G. 2011. Developing a series of conservative anchor markers and their application to phylogenomics of Laurasiatherian mammals. Mol Ecol Resour. 11:134–140.
30 nuclear genes provide further support for the monophyly of lissamphibians and the Batrachia hypothesis (fig. 4). Additionally, all possible hypotheses against the monophyly of extant amphibians and the Batrachia hypothesis were rejected in our topological tests (table 2). However, the Batrachia hypothesis did not receive strong support in our species tree analysis (BPMP-EST = 74; fig. 4), suggesting that more nuclear genes are still needed to resolve this node.
The monophyly of the internally fertilizing salamanders (Salamandroidea; all salamanders exclusive of Hynobiidae, Cryptobranchidae, and Sirenidae) is strongly supported in our analyses (fig. 4), in line with most recent molecular studies (Wiens et al. 2005; Roelants et al. 2007; Zhang and Wake 2009; Pyron and Wiens 2011) but differing strongly from Frost et al. (2006), who recovered a clade comprising Sirenidae, Dicamptodontidae, Ambystomatidae, and Salamandridae. The internally fertilizing salamanders include two well-supported clades: one is composed of Ambystomatidae, Dicamptodontidae, and Salamandridae, and the other of Proteidae, Rhyacotritonidae, Amphiumidae, and Plethodontidae (fig. 4).
Currently, two hypotheses have been proposed regarding the basal split within living salamanders. The traditional view favors Sirenidae as the sister group to all remaining salamanders (Duellman and Trueb 1994). This hypothesis received strong support in two recent studies (based on mitochondrial genomes: BPML = 98, Zhang and Wake 2009; based on mitochondrial genomes and nuclear genes: BPML > 80, San Mauro 2010). In contrast, some studies suggest that the basal split separates Cryptobranchidae + Hynobiidae from all other salamanders (Gao and Shubin 2001; Wiens et al. 2005; Frost et al. 2006; Roelants et al. 2007; Pyron and Wiens 2011), but always without strong support (BPML < 71). Our phylogenetic analyses based on 30 independent NPCLs supported the second hypothesis, that Cryptobranchoidea (Cryptobranchidae + Hynobiidae) branched first within the living salamanders. This result is extremely robust in our concatenation analyses (BPML = 99, PPBAY = 1.0, PPCAT = 1.0; fig. 4) and statistically rejects all alternative hypotheses (table 2). In the species tree analysis without data concatenation, this result is also strong (BPMP-EST = 83; fig. 4).
FIG. 3. Relative evolutionary rates of 102 NPCL in vertebrates. The 102 NPCLs are arranged in order of increasing variability on the right side, and their PCR success rates in the 16 tested vertebrates are shown on the left side. NPCLs indicated with asterisks may have extra copies in teleost genomes and thus are not suitable for phylogenetic studies of teleosts.
[Figure not reproduced. Left axis: PCR success rate in 16 vertebrates (%); right axis: relative evolutionary rate.]
How many nuclear genes, then, are needed to robustly resolve the question of the basal split within living salamanders? Our analysis of data subsets indicates a progressive increase in the bootstrap support value for the node of interest (fig. 4) as an increasing number of genes are analyzed (fig. 5). Analyses based on single genes rarely resolve the node of interest with any confidence. Analyses based on 5–10 genes produce bootstrap support values of 60–80% in concatenation analyses (fig. 5), which is congruent with all previous nuclear studies using similar-sized data sets (Roelants et al. 2007; Pyron and Wiens 2011). Taking a bootstrap value of 95% in concatenation analyses as the threshold of "fully resolved," the minimum number of nuclear genes needed to resolve the root of the salamander tree is approximately 25. The previous contradictory results may be due to the overwhelmingly strong signals from the mitochondrial genome. Because the initial diversification of salamanders occurred within a relatively short window of time (Weisrock et al. 2005), the genealogical histories of individual gene loci may sometimes appear misleading in terms of the relationships among species due to incomplete lineage sorting. Unfortunately, the mitochondrial genome recorded such an incorrect history.
Discussion
The NPCL toolkit and experimental protocol introduced here provide a highly reliable, rapid, and cost-effective method for building medium-scale multilocus data sets to produce high-resolution phylogenetic relationships. This phylogenomic approach has the potential to accelerate the completion of many parts of the vertebrate tree of life because no further marker development, which is often the bottleneck in phylogenetic research, is required. Once a specific phylogenetic question within vertebrates arises, researchers simply need to check the list for our toolkit and look for NPCL markers with the expected evolutionary rates and experimental performance for their groups of interest. Then, many orthologous loci can be quickly obtained by traditional PCR and Sanger sequencing, usually without time-consuming gel cutting and cloning. Applying the NPCL toolkit may also have another benefit for assembling the vertebrate tree of life: because people working on different groups can easily use the same set of loci, combined analyses will be facilitated.
Merits of the Toolkit
Because of the use of the nested PCR strategy outlined earlier, most NPCLs in the toolkit work for all major jawed vertebrate groups with high experimental success rates (normally >95%). Such results were achieved under unified PCR conditions without any extra effort involving cycling condition optimization. This feature of the toolkit enables it to surpass previously developed nuclear marker sets (Murphy et al. 2001; Li et al. 2007; Thomson et al. 2008; Townsend et al. 2008; Wright et al. 2008; Portik et al. 2011; Shen et al. 2011; Zhou et al. 2011). Most previous nuclear marker sets were developed for specific animal groups, and their application to
Table 1. Summary Information for the 30 NPCLs Amplified in 19 Salamander Taxa.
NOTE: Length, length of refined alignment; Var. sites, variable sites; PI sites, parsimony informative sites; RCV, relative composition variability.
FIG. 4. Higher-level phylogenetic relationships of 10 salamander families inferred from 30 NPCL markers (30 nuclear genes; total 27,834 bp). The tree was inferred by concatenation analyses using ML, BI, and the mixture model (CAT) and by species-tree analysis using the pseudo-ML approach (MP-EST). Branch support values are indicated beside nodes in the order ML bootstrap (BP_ML), BI posterior probability (PP_BI), CAT posterior probability (PP_CAT), and MP-EST bootstrap (BP_MP-EST), from left to right. The filled squares represent BP_ML > 95%, PP_BI = 1.0, PP_CAT = 1.0, and BP_MP-EST > 95%. The circled number refers to the node of interest studied in figure 5. Branch lengths are from the ML analysis. Scale bar: 0.1 substitutions/site.
Tip labels: Aneides hardii, Plethodon jordani, Batrachoseps major, Eurycea bislineata, Amphiuma means, Rhyacotriton variegatus, Proteus anguinus, Necturus beyeri, Tylototriton asperrimus, Cynops orientalis, Salamandra salamandra, Dicamptodon aterrimus, Ambystoma mexicanum, Pseudobranchus axanthus, Siren intermedia, Ranodon sibiricus, Batrachuperus yenyuanensis, Onychodactylus fischeri, Andrias davidianus, Silurana tropicalis, Bombina fortinuptialis, Typhlonectes natans, Ichthyophis bannanicus, Gallus gallus, Homo sapiens, Chrysemys picta bellii, Mus musculus, Latimeria chalumnae.
Clade labels: Plethodontidae, Amphiumidae, Rhyacotritonidae, Proteidae, Salamandridae, Dicamptodontidae, Ambystomatidae, Sirenidae, Hynobiidae, Cryptobranchidae; Salamandroidea; Cryptobranchoidea; ANURA; GYMNOPHIONA; non-amphibian outgroup.
Support values printed on the tree: 99/1.0/1.0/83 and 99/1.0/1.0/74.
Table 2. Statistical Confidence (P Values) for Alternative Branching Hypotheses Based on the 30-Gene Data Set.
Alternative Topology Tested                                        ΔLn L    P Value (three tests)    Rejection
Sirenidae is sister to Cryptobranchoidea                           42.3     0.004 / 0.002 / 0.004    + / + / +
Gymnophiona is sister to Anura (monophyletic lissamphibians)       34.3     0.013 / 0.012 / 0.013    + / + / +
Gymnophiona is sister to Caudata (monophyletic lissamphibians)     43.9     0.002 / 0.001 / 0.001    + / + / +
Anura is sister to Amniota (paraphyletic lissamphibians)           144.6    5E-30 / 0 / 0            + / + / +
Gymnophiona is sister to Amniota (paraphyletic lissamphibians)     129.0    1E-69 / 0 / 0            + / + / +
Caudata is sister to Amniota (paraphyletic lissamphibians)         172.8    0.0001 / 0 / 0           + / + / +
other distantly related groups is usually difficult. For example, Spinks et al. (2010) collected 120 nuclear markers from avian, squamate, and mammalian phylogenetic studies and evaluated their PCR performance in turtles. They found that only eight nuclear markers successfully produced single expected bands across 13 tested turtle species. In another case, Fong and Fujita (2011) developed 75 nuclear markers for vertebrate phylogenetics, but approximately 60% of the target fragments could not be obtained in three test species (two reptiles and one lissamphibian). Therefore, although the nested PCR method introduced here requires an additional PCR reaction, the extra work is still worthwhile.
In PCR-based phylogenetic projects, even when the PCR reactions are successful, the products often contain significant nonspecific amplicons. Such a condition requires additional effort involving gel purification and cloning, which takes much more time than the PCR reaction itself. Our NPCL toolkit is specifically designed to solve this problem, so that normally over 90% of PCR reactions produce strong and single expected bands. Moreover, most of the primers used to date for nuclear marker sets are degenerate and thus are not suitable for direct sequencing of PCR products. Benefiting from the use of our nested PCR strategy, we introduce anchoring sequences to the ends of PCR fragments while maintaining PCR efficiency. These anchoring sequences bring the added benefit that two specific sequencing primers (Seq_F and Seq_R) can be used in all Sanger sequencing reactions.
One additional feature of our NPCL toolkit is that the average length of the NPCLs within it is 1,050 bp, a length that can easily be amplified in one PCR reaction and sequenced in both directions, allowing efficient use of resources. In contrast, the average marker lengths are 920 bp for 10 NPCLs in Li et al. (2007), 760 bp for 26 NPCLs in Townsend et al. (2008), 873 bp for 22 NPCLs in Shen et al. (2011), and 470 bp for 75 NPCLs in Fong and Fujita (2011). Longer markers provide more sites than shorter ones for equivalent money and time. This feature makes our NPCL toolkit more cost-effective than previously developed nuclear marker sets.
Phylogenetic Utility
The vertebrate NPCL toolkit we developed here shows great promise in terms of phylogenetic utility. A remarkable feature of our NPCL toolkit is that it provides 102 NPCLs with a broad range of evolutionary rates. In our demonstration, we used 30 NPCLs to resolve a family-level salamander phylogeny using both traditional concatenation analyses and a more promising species-tree analysis. However, this example does not mean that our toolkit performs well only on deep-timescale questions. Our ongoing study using this toolkit to resolve the relationships within Plethodontidae, a rapidly radiating group of salamanders, suggests that the toolkit also performs well in resolving species-level phylogenies. For many vertebrate groups in which applicable nuclear markers are limited, such as some teleosts, frogs, and salamanders, our NPCL toolkit can provide a one-stop solution for phylogenetic studies from the family level to the species level. Even for those groups in which specific nuclear marker sets have been developed, our toolkit is still worth trying, as many more loci can be easily obtained that may resolve some difficult branches.
The Toolkit Is a Good Addition to Sequence Capture Approaches
Recently, sequence capture approaches have been applied to vertebrate phylogenomics (Crawford et al. 2012; Faircloth et al. 2012; Lemmon et al. 2012; McCormack et al. 2012). These approaches begin with the selective capture of genomic regions. Briefly, fragmented gDNA is hybridized to DNA or RNA probes, either on an array or in solution. Nontargeted DNA is then washed away, and the targeted DNA is sequenced through NGS. The most promising feature of the sequence capture approach is that it can simultaneously produce hundreds to thousands of loci for tens of individuals within a relatively short time. Therefore, the sequence capture approach is considered to be much more cost-effective than the PCR-based method. According to the calculation of Lemmon et al. (2012), for a project of 100 taxa × 500 loci, the cost of the sequence capture method is just 1–3.5% of that of the PCR-based method.
However, the sequence capture approach is currently too challenging for most phylogenetic researchers. Typical NGS runs (454 or Illumina) used by the sequence capture method generate 1,000,000–2,000,000,000 sequences. Storing and processing these NGS data require significant computer memory, hardware upgrades, and bioinformatic programming skills, which are unfamiliar to most phylogenetic researchers. Moreover, phylogenetic reconstruction assumes that orthologous genes are being analyzed across species. For the PCR-based method, the detection of paralogous genes is relatively straightforward. However, in the sequence capture method, the captured genomic regions comprise short conserved cores (probe regions) and long unconserved flanking
FIG. 5. The effect of increasing the number of nuclear loci on resolving the basal split within salamanders (x axis: number of genes; y axis: bootstrap support, %; series: concatenation analyses and species-tree analyses). Each data point represents the mean of support values estimated from 30 randomly sampled subsets. The dashed line indicates the threshold of 95% bootstrap support. The statistical plots show that the minimum number of nuclear loci needed to robustly resolve the basal split within salamanders is 25.
sequences. Because paralogy cannot be detected until after the data are aligned, those unalignable sequences will make the detection of paralogy more difficult.
In fact, not every phylogenetic project will use more than 500 loci, as the sequence capture method normally does. Based on both empirical and simulation data, 20–50 loci are generally sufficient to answer many phylogenetic questions (Rokas et al. 2003; Spinks et al. 2009). This is also the number of loci that most phylogenetic studies will use. In such a situation, adopting the sequence capture method is not cost-effective because researchers need to use relatively expensive NGS sequencing and spend time learning new experimental techniques and carrying out sophisticated bioinformatic processing. Our NPCL toolkit is specially designed for such medium-scale phylogenetic projects using approximately 50 loci, a number that can easily be met with our 102 NPCLs. Because more than 90% of the PCR reactions generated by our toolkit can be directly sequenced, the average cost for one locus per sample is rather low. In our laboratory, generating one new sequence typically costs US$3 (without considering labor).
In addition, researchers sometimes have only tiny amounts of DNA but wish to perform a multilocus phylogenetic analysis. In such a situation, the sequence capture method is difficult to implement because it normally requires DNA at the microgram level (Lemmon et al. 2012). Our NPCL toolkit can fill the gap here. Benefiting from the use of the nested PCR strategy, the sensitivity of PCR reactions in our method is extremely high. In many test experiments in our laboratory, the toolkit and protocol could produce target bands with only 5–10 ng of DNA.
Our NPCL toolkit is an alternative to the sequence capture method for the everyday work of phylogenetic researchers. Which method to choose depends on two major drivers: the amount of DNA and the expected number of loci. When DNA is limited, PCR may be the better solution; otherwise, sequence capture also works. Taking into account the money and time the two methods require, we speculate that the economic transition point from PCR to sequence capture is at approximately 100 loci. That assessment is why our toolkit includes 102 NPCL markers. Our proposal is that when using <100 loci, one can try our NPCL toolkit; when using >100 loci, sequence capture should be used.
Future Directions
In this study, we used multiple genome alignments deposited in the University of California–Santa Cruz (UCSC) Genome Browser to identify long and conserved exons across jawed vertebrates. Benefiting from the use of a nested PCR strategy, the experimental performance of the developed NPCLs indicated that they are highly stable in all major jawed vertebrate groups. Recently, a database for mining exon and intron markers, called EvolMarkers, has been built by Li et al. (2012). Careful investigation of this database may identify many conserved exons within nonvertebrates, whose interrelationships are currently more problematic than those of vertebrates. Because the nonvertebrates comprise many distantly related groups, it may be impossible to develop a single set of PCR primers for all nonvertebrates. However, following a similar marker development strategy, multiple NPCL toolkits could be constructed for various groups of nonvertebrates, such as arthropods, echinoderms, and molluscs. In addition, because introns are flanked by conserved exons, the use of nested PCR for marker development could also be applied to the development of EPIC (exon-primed intron crossing) markers, which are more suitable for shallow-scale phylogenetic or phylogeographic projects.
Despite the benefits of our proposed method, it must be recognized that when handling large-scale projects, such as 200 taxa × 100 loci, the use of our toolkit and Sanger sequencing will still require significant cost, time, and labor. An alternative solution is to use NGS to replace Sanger sequencing. Recently, 454 NGS technology has been applied to sequence targeted gene regions from a pool of PCR products from different specimens (Binladen et al. 2007; Meyer et al. 2008). In such experiments, specific tagging sequences must be added to amplicons by either PCR (Binladen et al. 2007) or blunt-end ligation (Meyer et al. 2008). Therefore, if the tailing sequences of the second-round PCR primers in our NPCL toolkit are replaced by tagging sequences (for tag design, see Faircloth and Glenn 2012), all PCR products can be pooled together and sequenced with 454 NGS, which will greatly reduce the money and time cost compared with Sanger sequencing. However, parallel tagged sequencing via NGS does not circumvent the process of PCR for each individual at each locus, which may be the most onerous part of a large-scale phylogenomic project. Some promising new technologies may help to solve this problem, such as microdroplet PCR (Tewhey et al. 2009), in which millions of individual PCR reactions are performed simultaneously in picoliter-scale droplets, and the 96.96 Dynamic Array by Fluidigm, which allows 96 primer combinations to be used on 96 samples (9,216 total PCR reactions) on a single PCR plate. However, there has been little research on applying NGS and new high-throughput PCR technologies to phylogenomics, so their ease of use and cost-effectiveness still need to be explored.
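To give a sense of scale for the project sizes discussed above, the following is a minimal Python sketch. The function name and its parameters are hypothetical and introduced only for illustration. It assumes the US$3 per sequence figure reported for our laboratory (labor excluded), two PCR rounds per target because of the nested design, and it counts how many 96 × 96 blocks would be needed to tile the taxa × loci grid for a single PCR round. It is a back-of-the-envelope estimate, not a costing model.

```python
import math


def project_scale(taxa, loci, cost_per_sequence_usd=3.0, array_size=96):
    """Rough size estimate for a PCR-based multilocus project.

    Assumptions (illustrative only): one target sequence per taxon per
    locus, two PCR rounds per target (nested PCR), and a fixed cost per
    finished sequence that excludes labor.
    """
    targets = taxa * loci
    return {
        "target_sequences": targets,
        "pcr_reactions": 2 * targets,  # first-round + second-round PCR
        "sanger_cost_usd": targets * cost_per_sequence_usd,
        # Number of array_size x array_size blocks covering the grid
        "arrays_per_round": math.ceil(taxa / array_size)
        * math.ceil(loci / array_size),
    }


if __name__ == "__main__":
    for key, value in project_scale(200, 100).items():
        print(f"{key}: {value}")
```

Under these assumptions, a 200 taxa × 100 loci project corresponds to 20,000 target sequences and 40,000 individual PCR reactions, which illustrates why per-reaction PCR setup, rather than sequencing alone, can dominate the workload.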
Summary
In conclusion, we have developed an improved method for rapidly amplifying and sequencing NPCLs that has proven to be useful and effective for molecular phylogenetic studies of vertebrates. The newly developed toolkit provides an attractive alternative to available methods for vertebrate phylogenomics.
Materials and Methods
Development of NPCL and Primer Design
Our previous study showed that nested PCR is overwhelmingly more effective than conventional PCR for obtaining target amplicons from complex genomic environments (Shen et al. 2012). However, nested PCR requires four conserved regions to design two pairs of primers (illustrated in fig. 6; yellow blocks represent the conserved regions used for primer design), which means that only relatively long exons are
suitable as candidates for NPCL development with the nested PCR method. To search for long and conserved exons, we took advantage of our previous bioinformatic method, which used the multiple genome alignment data from the UCSC Genome Browser to identify conserved exons (Shen et al. 2011). Because the NPCL markers are to be used in vertebrates, we focused only on those multiple genome alignments that include at least six species: Danio rerio (zebrafish), Silurana tropicalis (frog), Anolis carolinensis (lizard), Gallus gallus (chicken), Mus musculus (mouse), and Homo sapiens (human). The alignments of candidate exons had to meet two criteria: a length of more than 700 bp and pairwise similarity ranging from 35% to 90%. The detailed bioinformatic pipeline has been described elsewhere (Shen et al. 2011). In addition to using multiple genome alignments to screen NPCL candidates, we also manually searched the ENSEMBL database for nuclear genes that were used previously (Murphy et al. 2001; Li et al. 2007; Townsend et al. 2008; Wright et al. 2008; Zhou et al. 2011; Song et al. 2012) to check whether they contain large and appropriately conserved exons.
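The two screening criteria can be illustrated with a minimal Python sketch. This is not the pipeline of Shen et al. (2011); the function names are hypothetical, and two details are assumptions made here: similarity is computed as percent identity over columns where neither sequence has a gap, and "ranging from 35% to 90%" is read as requiring every pairwise comparison to fall inside that range.

```python
from itertools import combinations


def pairwise_identity(seq_a, seq_b):
    """Percent identity over columns where neither sequence has a gap."""
    pairs = [
        (a, b)
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a != "-" and b != "-"
    ]
    if not pairs:
        return 0.0
    matches = sum(a == b for a, b in pairs)
    return 100.0 * matches / len(pairs)


def is_candidate(alignment, min_len=700, min_sim=35.0, max_sim=90.0):
    """Apply the length and similarity criteria to one exon alignment.

    alignment: dict mapping species name -> aligned sequence (equal lengths).
    """
    seqs = list(alignment.values())
    if len(seqs) < 2:
        return False
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all aligned sequences must have the same length")
    if length <= min_len:  # criterion 1: more than 700 bp
        return False
    # Criterion 2: every pairwise similarity within [min_sim, max_sim]
    return all(
        min_sim <= pairwise_identity(a, b) <= max_sim
        for a, b in combinations(seqs, 2)
    )
```

An upper similarity bound matters as much as the lower one: an exon that is nearly identical across species offers good primer sites but few informative sites.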
As a result, we assembled a total of 305 NPCL candidate alignments, of which 120 contained the appropriate number of conserved blocks, and used these to design nested PCR primers. To increase the success rates of our NPCL markers in amniotes, we manually added a turtle sequence (Chrysemys picta bellii) to each of the candidate alignments using data downloaded from the ENSEMBL database. A total of 480 primers were designed for the 120 NPCL candidates. Briefly, the first-round PCR primers are only used to enrich target regions from genomic environments and not to obtain target amplicons, so the degeneracy of these primers is normally high to increase reaction sensitivity; the second-round PCR primers are used to obtain target amplicons, so the
FIG. 6. Schematic representation of the experimental protocol for using our NPCL toolkit. Note that for each NPCL, nested PCR primers are designed on four short conserved blocks flanking the target region.
Diagram labels: gDNA; conserved blocks; target region; first-round primers F1 and R1; first-round PCR product; tailed second-round primers F2 and R2; sequencing primers Seq_F and Seq_R.
Laboratory Protocol
(i) First PCR with primers F1 and R1 using gDNA as template. Enrich the target region from the complex genomic environment with one pair of highly degenerate primers. PCR was performed with 50–100 ng DNA in a 25 µl reaction. Cycling conditions: an initial denaturation step of 4 min at 94 °C, followed by 35 cycles of 94 °C for 45 s, 45 °C for 40 s, and 72 °C for 2 min, and a final extension at 72 °C for 10 min.
(ii) Second PCR with tailed primers F2 and R2 using the first PCR product as template. Specifically amplify the target region from the first-round PCR products with one pair of tailed, low-degeneracy primers. PCR was performed with 1 µl of the first PCR product in a 25 µl reaction. Cycling conditions: an initial denaturation step of 4 min at 94 °C, followed by 35 cycles of 94 °C for 45 s, 50 °C for 40 s, and 72 °C for 90 s, and a final extension at 72 °C for 10 min.
(iii) PCR evaluation and sequencing. Evaluate the agarose gel electrophoresis results. If the product is a single and strong band (normally >90%): clean up with ExoI and FastAP, then sequence directly with the general sequencing primers Seq_F and Seq_R. The 25 µl PCR product is cleaned with 2 U ExoI and 0.4 U FastAP (cleanup conditions: 37 °C for 30 min, 80 °C for 15 min), and the cleaned PCR product can be used for direct sequencing. A Sanger sequencing reaction is performed with 0.5 µl BigDye and 1 µl cleanup PCR product. Otherwise: gel cutting or cloning, then sequencing.
degeneracy of these primers is lower to increase reaction specificity. Our previous study showed that the nested PCR method often produces strong and single amplicon bands (Shen et al. 2012). To facilitate the next step of direct sequencing, we added a tail (5′-AGGGTTTTCCCAGTCACGAC-3′) to the 5′-end of all second-round forward primers and a tail (5′-AGATAACAATTTCACACAGG-3′) to the 5′-end of all second-round reverse primers. These tail sequences provide two unique anchoring sites for direct sequencing from cleaned PCR products. In our pilot experiments, adding the tail sequences to primers did not affect the efficiency of the second-round PCR.
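The tailing step can be sketched in a few lines of Python. The two tail sequences are those given above; the function name, the validation step, and the example primer in the usage line are hypothetical and are not taken from supplementary table S1.

```python
# Tail sequences from the text (both written 5'->3')
SEQ_F_TAIL = "AGGGTTTTCCCAGTCACGAC"
SEQ_R_TAIL = "AGATAACAATTTCACACAGG"

# IUPAC nucleotide codes, so that degenerate primers are accepted
IUPAC_CODES = set("ACGTRYSWKMBDHVN")


def add_tail(primer, direction):
    """Prepend the forward ('F') or reverse ('R') tail to the 5' end.

    primer: second-round primer written 5'->3', degenerate bases allowed.
    """
    primer = primer.upper()
    unknown = set(primer) - IUPAC_CODES
    if unknown:
        raise ValueError(f"non-IUPAC characters in primer: {sorted(unknown)}")
    tail = {"F": SEQ_F_TAIL, "R": SEQ_R_TAIL}[direction]
    return tail + primer


if __name__ == "__main__":
    # Hypothetical degenerate primer, for illustration only
    print(add_tail("GGNTAYATGAARCARCT", "F"))
```

Because every second-round product then starts and ends with the same two 20-nt anchors, the sequencing primers do not depend on the locus being amplified.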
Experimental Testing for Candidate Markers in 16 Jawed Vertebrates
To test the experimental performance of our newly designed NPCL markers, we selected 16 taxa representing nine major jawed vertebrate lineages: Chondrichthyes (Sphyrna lewini), Actinopterygii (Lepisosteus oculatus and Pangasius sutchi), Dipnoi (Protopterus annectens), Lissamphibia (Ichthyophis bannanicus, Batrachuperus yenyuanensis, and Rana nigromaculata), Mammalia (Mus musculus and Sus scrofa domestica), Testudines (Trionyx sinensis and Podocnemis unifilis), Aves (Struthio camelus and Zosterops japonica), Crocodylia (Crocodylus siamensis), and Squamata (Hemidactylus bowringii and Naja naja atra). Total genomic DNA was extracted from ethanol-preserved tissues (liver or muscle) using the standard salt extraction protocol. All extracted genomic DNAs were diluted to a concentration of 50 ng/µl with 1× TE and stored at −20 °C before PCR amplification.
All 120 NPCL markers were tested with a two-round PCR strategy (nested PCR). The first-round PCR was performed in a 25 µl reaction containing 1–2 µl template DNA (50–100 ng), with final concentrations of 1× PCR buffer, 200 µM dNTP, 400 nM of each forward and reverse first-round primer, and 1.25 U Taq polymerase (TransTaq High Fidelity; TransGen, Beijing). The cycling conditions of the first-round PCR were as follows: an initial denaturation step of 4 min at 94 °C, followed by 35 cycles of a 45 s denaturation at 94 °C, a 40 s annealing at 45 °C, and a 2 min extension at 72 °C, followed by a final 10 min extension at 72 °C. The second-round PCR was also performed in a 25 µl reaction containing 1 µl of the first-round PCR product (without dilution) and final concentrations of 1× PCR buffer, 200 µM dNTP, 400 nM of each forward and reverse second-round primer, and 1.25 U Taq polymerase. The cycling conditions of the second-round PCR were as follows: an initial denaturation step of 4 min at 94 °C, followed by 35 cycles of a 45 s denaturation at 94 °C, a 40 s annealing at 50 °C, and a 90 s extension at 72 °C, followed by a final 10 min extension at 72 °C.
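As a quick planning aid, the programmed hold time of the two cycling programs can be computed as follows. This is a minimal Python sketch: the names are hypothetical, and the totals ignore ramping between temperatures, lid heating, and handling time, so real runs take longer.

```python
def program_seconds(initial_denat_s, cycles, steps_s, final_ext_s):
    """Sum of programmed hold times for one thermocycler program."""
    return initial_denat_s + cycles * sum(steps_s) + final_ext_s


# Hold times taken from the text; steps are (denaturation, annealing, extension)
FIRST_ROUND = dict(
    initial_denat_s=4 * 60, cycles=35, steps_s=(45, 40, 120), final_ext_s=10 * 60
)
SECOND_ROUND = dict(
    initial_denat_s=4 * 60, cycles=35, steps_s=(45, 40, 90), final_ext_s=10 * 60
)

if __name__ == "__main__":
    first = program_seconds(**FIRST_ROUND)
    second = program_seconds(**SECOND_ROUND)
    print(f"first round:  {first} s ({first / 60:.1f} min)")
    print(f"second round: {second} s ({second / 60:.1f} min)")
    print(f"both rounds:  {(first + second) / 3600:.2f} h")
```

The two programs differ only in annealing temperature (45 °C versus 50 °C) and extension time (2 min versus 90 s), so the second round is about 17.5 min shorter in programmed time.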
One microliter of the second-round PCR product was analyzed on a 1.0% TAE agarose gel. An NPCL marker was considered successful if more than 8 of the 16 tested taxa produced target amplicon bands. On this basis, 102 of the 120 tested NPCL markers were successful. The nested PCR primers for the 102 NPCL markers can be found in supplementary table S1, Supplementary Material online. If the PCR products contained significant nonspecific amplicon bands (normally <10%), they needed further processing, for example, standard gel cutting or cloning. If the PCR reactions produced single amplicon bands (normally >90%), they were cleaned with Exo-FAP treatment: 2 U ExoI and 0.4 U FastAP (both Fermentas) were added to the PCR tube and incubated for 30 min at 37 °C and 15 min at 80 °C. The cleanup PCR reactions can be directly used as templates for Sanger sequencing. According to our experimental design, all PCR fragments can be sequenced from both ends with the two universal sequencing primers: Seq_F, 5′-AGGGTTTTCCCAGTCACGAC-3′, and Seq_R, 5′-AGATAACAATTTCACACAGG-3′. A typical Sanger sequencing reaction in our laboratory consumes 0.5 µl BigDye and 1 µl cleanup PCR product. The primer design strategy, the laboratory protocol for the nested PCR method, and the pretreatment of PCR products before Sanger sequencing are illustrated in figure 6.
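The marker acceptance rule can be written down explicitly. The following minimal Python sketch uses hypothetical function names and hypothetical gel results; only the rule itself (target bands in more than 8 of the 16 tested taxa) comes from the text.

```python
def success_rate(bands):
    """Percent of tested taxa that produced the target band.

    bands: list of booleans, one entry per tested taxon.
    """
    return 100.0 * sum(bands) / len(bands)


def is_successful(bands, threshold=8):
    """A marker passes if MORE than `threshold` taxa gave a target band."""
    return sum(bands) > threshold


if __name__ == "__main__":
    # Hypothetical gel results for two markers across 16 taxa
    marker_a = [True] * 15 + [False]
    marker_b = [True] * 8 + [False] * 8
    for name, bands in (("marker_a", marker_a), ("marker_b", marker_b)):
        print(name, f"{success_rate(bands):.1f}%", is_successful(bands))
```

Note that the comparison is strict: a marker that works in exactly 8 of 16 taxa (50%) does not pass.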
Calculation of Relative Evolutionary Rate of 102 NPCLs
The rate multipliers (m) across partitions estimated in MrBayes 3.2 (Ronquist et al. 2012) were used as relative evolutionary rates. To calculate these parameters, alignments for each NPCL were prepared for 12 species: Homo sapiens, Macaca mulatta, Mus musculus, Rattus norvegicus, Gallus gallus, Meleagris gallopavo, Chrysemys picta bellii, Anolis carolinensis, Silurana tropicalis, Tetraodon nigroviridis, Takifugu rubripes, and Danio rerio. Because genome data are available for the 12 species, we did not generate any new data. The 102 NPCL alignments were then combined and subjected to MrBayes analyses partitioned by gene. Each gene was assigned a separate GTR + Γ + I model, and all model parameters were unlinked. Two Markov chain Monte Carlo (MCMC) runs were performed with one cold chain and three heated chains (temperature set to 0.1) for 50 million generations and sampled every 1,000 generations. The rate multiplier for each gene was estimated using Tracer version 1.4 after discarding the first 50% of the generations. All evolutionary rates were normalized by dividing by the maximum value of the obtained rates.
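The final normalization step is simple enough to show directly. In this minimal Python sketch, the function name, the gene names, and the rate values are all hypothetical; real inputs would be the per-gene rate multipliers summarized from the MCMC output.

```python
def normalize_rates(rate_multipliers):
    """Scale per-gene rate multipliers so that the fastest gene equals 1.0.

    rate_multipliers: dict mapping gene name -> estimated rate multiplier.
    """
    fastest = max(rate_multipliers.values())
    return {gene: rate / fastest for gene, rate in rate_multipliers.items()}


if __name__ == "__main__":
    # Hypothetical values, for illustration only
    example = {"geneA": 0.5, "geneB": 1.0, "geneC": 2.0}
    for gene, rate in sorted(normalize_rates(example).items(), key=lambda x: x[1]):
        print(gene, rate)
```

After this scaling, all relative rates lie in the interval (0, 1], which is what allows the markers to be ranked on a common axis.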
Gene and Taxon Sampling for Investigating Higher Level Salamander Relationships
To test the utility of our NPCL toolkit in a real case, we selected 19 salamander taxa representing all 10 salamander families and 9 outgroup taxa to investigate the family-level relationships of salamanders (supplementary table S2, Supplementary Material online). For gene sampling, we randomly selected 30 NPCL markers whose PCR success rates were more than 90% in the 16 previously tested vertebrates. Among the 840 target sequences (30 markers for 28 taxa), 201 were available in public databases (NCBI, UCSC, and ENSEMBL), whereas the remaining 639 sequences needed to be generated de novo. The experimental procedure was as described earlier. All obtained sequences were examined by checking for the presence of premature stop codons (pseudogenes) and by BlastX searches against the nonredundant
protein sequences (nr) to confirm that they were our target genes. The NPCLs, species, and accession numbers for the newly obtained sequences are listed in supplementary table S2, Supplementary Material online.
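The stop-codon screen can be sketched as follows. This minimal Python example assumes the standard nuclear genetic code, an ungapped sequence, and a known reading frame; the function name is hypothetical. Because each NPCL fragment lies inside an exon, any in-frame stop codon is treated here as premature.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard nuclear genetic code


def has_premature_stop(sequence, frame=0):
    """Return True if any in-frame codon of an exon fragment is a stop.

    sequence: ungapped nucleotide sequence of an internal exon fragment.
    frame: offset (0, 1, or 2) of the first complete codon.
    """
    sequence = sequence.upper()
    # Step through complete codons only; a trailing partial codon is ignored
    codons = (sequence[i:i + 3] for i in range(frame, len(sequence) - 2, 3))
    return any(codon in STOP_CODONS for codon in codons)


if __name__ == "__main__":
    # 30 markers x 28 taxa, minus the 201 sequences already in databases
    print(30 * 28 - 201)  # number of sequences to generate de novo
    print(has_premature_stop("ATGAAATAGCCC"))
```

A sequence flagged this way is only a pseudogene candidate; sequencing errors can also introduce stops, which is one reason the BlastX check is applied as well.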
Phylogenetic Analyses
Alignments of all 30 NPCL markers were conducted using the G-INS-i method from MAFFT (Katoh et al. 2005) under the default settings, according to their translated amino acid sequences, and then refined by eye. All 30 refined alignments were combined into a concatenated data set (27,834 bp).
For the concatenated data set, we manually defined five partitioning strategies: 2 partitions (one for codon positions 1 and 2 and one for codon position 3), 3 partitions (one partition for each codon position), 30 partitions (one partition for each gene), 60 partitions (one for codon positions 1 + 2 and one for codon position 3 across 30 genes), and 90 partitions (codon position partitioning across 30 genes). Comparisons of the five partitioning strategies and selection of the corresponding nucleotide substitution models were conducted under the Bayesian information criterion implemented in PartitionFinder (Lanfear et al. 2012). The 3-partition scheme (one partition for each codon position) was chosen as the best-fitting partitioning strategy, and all 3 partitions favored the GTR + Γ + I model.
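The codon-position schemes can be written out programmatically. The following minimal Python sketch emits partition definitions in the style of a RAxML partition file (`DNA, name = start-end\3`). The function names are hypothetical, and the gene lengths in the by-gene example are invented because individual gene boundaries are not listed here; only the total length of 27,834 bp comes from the text.

```python
def codon_position_lines(start, end, prefix="pos"):
    """Three partition lines, one per codon position, for columns start..end."""
    return [
        f"DNA, {prefix}{k} = {start + k - 1}-{end}\\3"
        for k in (1, 2, 3)
    ]


def by_gene_codon_lines(gene_lengths):
    """Codon-position partitions within each gene of a concatenated alignment.

    gene_lengths: lengths in bp, in concatenation order; each must be a
    multiple of 3 so that codon positions stay in phase across genes.
    """
    lines, start = [], 1
    for index, length in enumerate(gene_lengths, 1):
        if length % 3 != 0:
            raise ValueError(f"gene {index} length is not a multiple of 3")
        end = start + length - 1
        lines += codon_position_lines(start, end, prefix=f"gene{index}_pos")
        start = end + 1
    return lines


if __name__ == "__main__":
    # 3-partition scheme over the whole 27,834-bp concatenated data set
    print("\n".join(codon_position_lines(1, 27834)))
    # By-gene codon scheme with two hypothetical gene lengths
    print("\n".join(by_gene_codon_lines([900, 1050])))
```

With 30 real gene lengths, the second function would yield the 90-partition scheme; merging positions 1 and 2 within each gene would give the 60-partition scheme.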
The concatenated data set was separately analyzed with both ML and Bayesian inference (BI) methods under the 3-partition scheme. Partitioned ML analyses were implemented using RAxML version 7.2.6 (Stamatakis 2006) with the GTR + Γ + I model assigned to each partition. A search that combined 100 separate ML searches was applied to find the optimal tree, and branch support for each node was evaluated with 500 standard bootstrapping replicates (-f d -b 500 option) implemented in RAxML. The partitioned BI was conducted using MrBayes 3.2 (Ronquist et al. 2012). All model parameters were unlinked. Two MCMC runs were performed with one cold chain and three heated chains (temperature set to 0.1) for 60 million generations and sampled every 1,000 generations. The chain stationarity was visualized by plotting −ln L against the generation number using Tracer version 1.4, and the first 50% of generations were discarded. Topologies and posterior probabilities were estimated from the remaining generations. Two runs for each analysis were compared for congruence.
We also performed Bayesian phylogenetic analyses under a mixture model, CAT + GTR + Γ4, in PhyloBayes 3.3 (Lartillot et al. 2009) with two independent MCMC runs. Each run was performed for 10,000 cycles and sampled every cycle. Stationarity was considered reached when the largest discrepancy (maxdiff) between the two independent runs was less than 0.1. The first 5,000 trees in each MCMC run were discarded. The remaining 10,000 trees of the two runs were sampled every 5 trees to generate a 50% majority-rule posterior consensus tree.
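The sample bookkeeping for the Bayesian analyses can be checked with a few lines of Python. The function name is hypothetical; the inputs are the run lengths, burn-in values, and sampling intervals stated in the text.

```python
def retained_samples(total_samples, burnin_samples, runs, thin=1):
    """Return (pooled post-burn-in samples, samples kept after thinning)."""
    pooled = (total_samples - burnin_samples) * runs
    return pooled, pooled // thin


if __name__ == "__main__":
    # PhyloBayes: 10,000 cycles per run, first 5,000 discarded, 2 runs,
    # pooled trees subsampled every 5
    print(retained_samples(10_000, 5_000, runs=2, thin=5))
    # MrBayes: 60 million generations sampled every 1,000 -> 60,000 samples
    # per run, first 50% discarded, 2 runs
    print(retained_samples(60_000_000 // 1_000, 30_000, runs=2))
```

For the PhyloBayes analysis, this gives 10,000 pooled trees, of which 2,000 enter the consensus; for the MrBayes analysis, 60,000 post-burn-in samples are retained across the two runs.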
Species tree estimation was conducted using the pseudo-ML approach in the program MP-EST (Liu et al. 2010) under the coalescent model. Briefly, the gene trees for the 30 NPCL markers were reconstructed with ML under the GTR + Γ8 model using PHYML 3.0 (Guindon et al. 2010). The resulting 30 gene trees were rooted with an outgroup (Chrysemys picta bellii) and used to generate a species tree with MP-EST. The robustness of the species tree was evaluated with nonparametric bootstrapping of 500 replicates.
Alternative phylogenetic hypotheses were tested based on the 30-gene data set. We first calculated sitewise log likelihoods of alternative trees using RAxML (-f g) under the GTR + Γ + I model. Then, the site log-likelihood file was used to estimate P values for each alternative tree in the CONSEL program (Shimodaira and Hasegawa 2001) using the Kishino–Hasegawa test (KH) (Kishino and Hasegawa 1989), the approximately unbiased test (AU) (Shimodaira 2002), and the RELL bootstrap proportion test (BP).
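The idea behind the RELL bootstrap proportion can be illustrated with a minimal Python sketch. This is not CONSEL's implementation, and the function name, replicate count, and toy log-likelihood values are assumptions made for illustration. Sites are resampled with replacement, the sitewise log likelihoods are summed for each tree without re-optimizing any parameters, and the proportion of replicates in which each tree scores best is reported.

```python
import random


def rell_bootstrap_proportions(site_lnl, n_boot=1000, seed=0):
    """RELL bootstrap proportions from sitewise log likelihoods.

    site_lnl: dict mapping tree name -> list of per-site log likelihoods;
    all lists must cover the same alignment sites in the same order.
    """
    rng = random.Random(seed)
    names = list(site_lnl)
    n_sites = len(site_lnl[names[0]])
    if any(len(values) != n_sites for values in site_lnl.values()):
        raise ValueError("all trees need the same number of sites")
    wins = dict.fromkeys(names, 0)
    for _ in range(n_boot):
        # Resample site indices with replacement (one bootstrap replicate)
        idx = [rng.randrange(n_sites) for _ in range(n_sites)]
        totals = {t: sum(site_lnl[t][i] for i in idx) for t in names}
        wins[max(totals, key=totals.get)] += 1
    return {t: wins[t] / n_boot for t in names}


if __name__ == "__main__":
    # Toy values: tree_A fits every site slightly better than tree_B
    toy = {
        "tree_A": [-1.0, -2.0, -1.5, -3.0],
        "tree_B": [-1.2, -2.1, -1.9, -3.3],
    }
    print(rell_bootstrap_proportions(toy, n_boot=200))
```

In real data, sites disagree about which topology they favor, so the proportions fall between 0 and 1; a very low proportion for an alternative topology is read as evidence against it.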
Supplementary Material
Supplementary tables S1 and S2 and figures S1 and S2 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).
Acknowledgments
The authors are grateful to David Wake and Carol Spencer of the Museum of Vertebrate Zoology at the University of California, Berkeley, for tissue loans. David Wake provided many useful comments on the manuscript. This work was supported by National Natural Science Foundation of China grants (31172075 and 30900136) and the National Science Fund for Excellent Young Scholars (no. pending).
References
Binladen J, Gilbert MTP, Bollback JP, Panitz F, Bendixen C, Nielsen R, Willerslev E. 2007. The use of coded PCR primers enables high-throughput sequencing of multiple homolog amplification products by 454 parallel sequencing. PLoS One 2:e197.
Crawford NG, Faircloth BC, McCormack JE, Brumfield RT, Winker K, Glenn TC. 2012. More than 1000 ultraconserved elements provide evidence that turtles are the sister group to archosaurs. Biol Lett. 8:783–786.
Delsuc F, Brinkmann H, Philippe H. 2005. Phylogenomics and the reconstruction of the tree of life. Nat Rev Genet. 6:361–375.
Duellman WE, Trueb L. 1994. Biology of amphibians. Baltimore (MD): Johns Hopkins University Press.
Dunn CW, Hejnol A, Matus DQ, et al. (18 co-authors). 2008. Broad phylogenomic sampling improves resolution of the animal tree of life. Nature 452:745–749.
Faircloth BC, Glenn TC. 2012. Not all sequence tags are created equal: designing and validating sequence identification tags robust to indels. PLoS One 7:e42543.
Faircloth BC, McCormack JE, Crawford NG, Brumfield RT, Glenn TC. 2012. Ultraconserved elements anchor thousands of genetic markers for target enrichment spanning multiple evolutionary timescales. Syst Biol. 61:717–726.
Fong JJ, Brown JM, Fujita MK, Boussau B. 2012. A phylogenomic approach to vertebrate phylogeny supports a turtle-archosaur affinity and a possible paraphyletic lissamphibia. PLoS One 7:e48990.
Fong JJ, Fujita MK. 2011. Evaluating phylogenetic informativeness and data-type usage for new protein-coding genes across Vertebrata. Mol Phylogenet Evol. 61:300–307.
Frost DR, Grant T, Faivovich J, et al. (18 co-authors). 2006. The amphibian tree of life. Bull Am Mus Nat Hist. 297:8–370.
Gao KQ, Shubin NH. 2001. Late Jurassic salamanders from northern China. Nature 410:574–577.
Guindon S, Dufayard J-F, Lefort V, Anisimova M, Hordijk W, Gascuel O. 2010. New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Syst Biol. 59:307–321.
Hugall AF, Foster R, Lee MSY. 2007. Calibration choice, rate smoothing, and the pattern of tetrapod diversification according to the long nuclear gene RAG-1. Syst Biol. 56:543–563.
Inoue JG, Miya M, Lam K, Tay BH, Danks JA, Bell J, Walker TI, Venkatesh B. 2010. Evolutionary origin and phylogeny of the modern holocephalans (Chondrichthyes: Chimaeriformes): a mitogenomic perspective. Mol Biol Evol. 27:2576–2586.
Katoh K, Kuma K, Toh H, Miyata T. 2005. MAFFT version 5: improvement in accuracy of multiple sequence alignment. Nucleic Acids Res. 33:511–518.
Kishino H, Hasegawa M. 1989. Evaluation of the maximum likelihood estimate of the evolutionary tree topologies from DNA sequence data, and the branching order in Hominoidea. J Mol Evol. 29:170–179.
Künstner A, Wolf JBW, Backström N, et al. (13 co-authors). 2010. Comparative genomics based on massive parallel transcriptome sequencing reveals patterns of substitution and selection across 10 bird species. Mol Ecol. 19:266–276.
Lanfear R, Calcott B, Ho SYW, Guindon S. 2012. PartitionFinder: combined selection of partitioning schemes and substitution models for phylogenetic analyses. Mol Biol Evol. 29:1695–1701.
Lartillot N, Lepage T, Blanquart S. 2009. PhyloBayes 3: a Bayesian software package for phylogenetic reconstruction and molecular dating. Bioinformatics 25:2286–2288.
Lemmon AR, Brown JM, Stanger-Hall K, Lemmon EM. 2009. The effect of ambiguous data on phylogenetic estimates obtained by maximum likelihood and Bayesian inference. Syst Biol. 58:130–145.
Lemmon AR, Emme SA, Lemmon EM. 2012. Anchored hybrid enrichment for massively high-throughput phylogenomics. Syst Biol. 61:727–744.
Li C, Ortí G, Zhang G, Lu G. 2007. A practical approach to phylogenomics: the phylogeny of ray-finned fish (Actinopterygii) as a case study. BMC Evol Biol. 7:44.
Li C, Riethoven JJ, Naylor GJ. 2012. EvolMarkers: a database for mining exon and intron markers for evolution, ecology and conservation studies. Mol Ecol Resour. 12:967–971.
Liu L, Yu L, Edwards SV. 2010. A maximum pseudo-likelihood approach for estimating species trees under the coalescent model. BMC Evol Biol. 10:302.
McCormack JE, Faircloth BC, Crawford NG, Gowaty PA, Brumfield RT, Glenn TC. 2012. Ultraconserved elements are novel phylogenomic markers that resolve placental mammal phylogeny when combined with species-tree analysis. Genome Res. 22:746–754.
McCormack JE, Hird SM, Zellmer AJ, Carstens BC, Brumfield RT. 2013. Applications of next-generation sequencing to phylogeography and phylogenetics. Mol Phylogenet Evol. 66:526–538.
Meyer A, Van de Peer Y. 2005. From 2R to 3R: evidence for a fish-specific genome duplication (FSGD). Bioessays 27:937–945.
Meyer M, Stenzel U, Hofreiter M. 2008. Parallel tagged sequencing on the 454 platform. Nat Protoc. 3:267–278.
Murphy WJ, Eizirik E, Johnson WE, Zhang YP, Ryder OA, O'Brien SJ. 2001. Molecular phylogenetics and the origins of placental mammals. Nature 409:614–618.
Philippe H, Derelle R, Lopez P, et al. (20 co-authors). 2009. Phylogenomics revives traditional views on deep animal relationships. Curr Biol. 19:706–712.
Philippe H, Telford M. 2006. Large-scale sequencing and the new animal phylogeny. Trends Ecol Evol. 21:614–620.
Portik DM, Wood PL, Grismer JL, Stanley EL, Jackman TR. 2011. Identification of 104 rapidly-evolving nuclear protein-coding markers for amplification across scaled reptiles using genomic resources. Conserv Genet Resour. 4:1–10.
Pyron RA, Wiens JJ. 2011. A large-scale phylogeny of Amphibia including over 2800 species, and a revised classification of extant frogs, salamanders, and caecilians. Mol Phylogenet Evol. 61:543–583.
Roelants K, Gower DJ, Wilkinson M, Loader SP, Biju SD, Guillaume K, Moriau L, Bossuyt F. 2007. Global patterns of diversification in the history of modern amphibians. Proc Natl Acad Sci U S A. 104:887–892.
Rokas A, Williams BL, King N, Carroll SB. 2003. Genome-scale approaches to resolving incongruence in molecular phylogenies. Nature 425:798–804.
Ronquist F, Teslenko M, Van der Mark P, Ayres DL, Darling A, Höhna S, Larget B, Liu L, Suchard MA, Huelsenbeck JP. 2012. MrBayes 3.2: efficient Bayesian phylogenetic inference and model choice across a large model space. Syst Biol. 61:539–542.
Roure B, Baurain D, Philippe H. 2013. Impact of missing data on phylogenies inferred from empirical phylogenomic data sets. Mol Biol Evol. 30:197–214.
San Mauro D. 2010. A multilocus timescale for the origin of extant amphibians. Mol Phylogenet Evol. 56:554–561.
San Mauro D, Vences M, Alcobendas M, Zardoya R, Meyer A. 2005. Initial diversification of living amphibians predated the breakup of Pangaea. Am Nat. 165:590–599.
Shen XX, Liang D, Wen JZ, Zhang P. 2011. Multiple genome alignments facilitate development of NPCL markers: a case study of tetrapod phylogeny focusing on the position of turtles. Mol Biol Evol. 28:3237–3252.
Shen XX, Liang D, Zhang P. 2012. The development of three long universal nuclear protein-coding locus markers and their application to osteichthyan phylogenetics with nested PCR. PLoS One 7:e39256.
Shimodaira H. 2002. An approximately unbiased test of phylogenetic tree selection. Syst Biol. 51:492–508.
Shimodaira H, Hasegawa M. 2001. CONSEL: for assessing the confidence of phylogenetic tree selection. Bioinformatics 17:1246–1247.
Song S, Liu L, Edwards SV, Wu S. 2012. Resolving conflict in eutherian mammal phylogeny using phylogenomics and the multispecies coalescent model. Proc Natl Acad Sci U S A. 109:14942–14947.
Spinks PQ, Thomson RC, Barley AJ, Newman CE, Shaffer HB. 2010. Testing avian, squamate, and mammalian nuclear markers for cross amplification in turtles. Conserv Genet Resour. 2:127–129.
Spinks PQ, Thomson RC, Lovely GA, Shaffer HB. 2009. Assessing what is needed to resolve a molecular phylogeny: simulations and empirical data from Emydid turtles. BMC Evol Biol. 9:56.
Stamatakis A. 2006. RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics 22:2688–2690.
Tewhey R, Warner JB, Nakano M, Libby B, Medkova M, David PH, Kotsopoulos SK, Samuels ML, Hutchison JB, Larson JW. 2009. Microdroplet-based PCR enrichment for large-scale targeted sequencing. Nat Biotechnol. 27:1025–1031.
Thomson RC, Shedlock AM, Edwards SV, Shaffer HB. 2008. Developing markers for multilocus phylogenetics in non-model organisms: a test case with turtles. Mol Phylogenet Evol. 49:514–525.
Thomson RC, Wang IJ, Johnson JR. 2010. Genome-enabled development of DNA markers for ecology, evolution and conservation. Mol Ecol. 19:2184–2195.
Townsend TM, Alegre RE, Kelley ST, Wiens JJ, Reeder TW. 2008. Rapid development of multiple nuclear loci for phylogenetic analysis using genomic resources: an example from squamate reptiles. Mol Phylogenet Evol. 47:129–142.
Weisrock DW, Harmon LJ, Larson A. 2005. Resolving deep phylogenetic relationships in salamanders: analyses of mitochondrial and nuclear genomic DNA. Syst Biol. 54:758–777.
Wiens J, Bonett R, Chippindale P. 2005. Ontogeny discombobulates phylogeny: paedomorphosis and higher-level salamander relationships. Syst Biol. 54:91–110.
Wright TF, Schirtzinger EE, Matsumoto T, et al. (11 co-authors). 2008. A multilocus molecular phylogeny of the parrots (Psittaciformes): support for a Gondwanan origin during the Cretaceous. Mol Biol Evol. 25:2141–2156.
Zhang P, Wake DB. 2009. Higher-level salamander relationships and divergence dates inferred from complete mitochondrial genomes. Mol Phylogenet Evol. 53:492–508.
Zhang P, Zhou H, Chen YQ, Liu YF, Qu LH. 2005. Mitogenomic perspectives on the origin and phylogeny of living amphibians. Syst Biol. 54:391–400.
Zhou XM, Xu SX, Zhang P, Yang G. 2011. Developing a series of conservative anchor markers and their application to phylogenomics of Laurasiatherian mammals. Mol Ecol Resour. 11:134–140.
the initial diversification of salamanders occurred within a relatively short window of time (Weisrock et al. 2005), the genealogical histories of individual gene loci may sometimes appear misleading in terms of the relationships among species due to incomplete lineage sorting. Unfortunately, the mitochondrial genome recorded such an incorrect history.
Discussion
The NPCL toolkit and experimental protocol introduced here provide a highly reliable, rapid, and cost-effective method for building medium-scale multilocus data sets to produce high-resolution phylogenetic relationships. This phylogenomic approach has the potential to accelerate the completion of many parts of the vertebrate tree of life because no further marker development, which is often the bottleneck in phylogenetic research, is required. Once a specific phylogenetic question within vertebrates arises, researchers simply need to check the list for our toolkit and look for NPCL markers with the expected evolutionary rates and experimental performance for their groups of interest. Then, many orthologous loci can be quickly obtained by traditional PCR and Sanger sequencing, usually without time-consuming gel cutting and cloning. Applying the NPCL toolkit may also have another benefit for assembling the vertebrate tree of life because people working on different groups can easily use the same set of loci, which will facilitate combined analyses.
Merits of the Toolkit
Because of the use of the nested PCR strategy outlined earlier, most NPCLs in the toolkit work for all major jawed vertebrate groups with high experimental success rates (normally >95%). Such results were achieved under unified PCR conditions without any extra effort to optimize cycling conditions. This feature enables the toolkit to surpass previously developed nuclear marker sets (Murphy et al. 2001; Li et al. 2007; Thomson et al. 2008; Townsend et al. 2008; Wright et al. 2008; Portik et al. 2011; Shen et al. 2011; Zhou et al. 2011). Most previous nuclear marker sets were developed for specific animal groups, and their application to
Table 1. Summary Information for the 30 NPCLs Amplified in 19 Salamander Taxa.
NOTE: Length, length of refined alignment; Var. sites, variable sites; PI sites, parsimony-informative sites; RCV, relative composition variability.
[Figure 4: phylogram of 19 salamander taxa from 10 families (clade labels Cryptobranchoidea and Salamandroidea), with Anura, Gymnophiona, and non-amphibian outgroups, based on 30 nuclear genes (27,834 bp in total); scale bar, 0.1 substitutions/site.]
FIG. 4. Higher-level phylogenetic relationships of 10 salamander families inferred from 30 NPCL markers. The tree was inferred by concatenation analyses using ML, BI, and the mixture model (CAT) and by species-tree analysis using the pseudo-ML approach (MP-EST). Branch support values are indicated beside nodes in the order ML bootstrap (BP_ML), BI posterior probability (PP_BI), CAT posterior probability (PP_CAT), and MP-EST bootstrap (BP_MP-EST), from left to right. The filled squares represent BP_ML > 95%, PP_BAY = 1.0, PP_CAT = 1.0, and BP_MP-EST > 95%. The circled number refers to the node of interest studied in figure 6. Branch lengths are from the ML analysis.
Table 2. Statistical Confidence (P Values) for Alternative Branching Hypotheses Based on the 30-Gene Data Set.
Alternative Topology Tested | Ln L | P Value | Rejection
Sirenidae is sister to Cryptobranchoidea | 423 | 0.004, 0.002, 0.004 | + + +
Gymnophiona is sister to Anura (monophyletic lissamphibians) | 343 | 0.013, 0.012, 0.013 | + + +
Gymnophiona is sister to Caudata (monophyletic lissamphibians) | 439 | 0.002, 0.001, 0.001 | + + +
Anura is sister to Amniota (paraphyletic lissamphibians) | 1446 | 5E-30, 0, 0 | + + +
Gymnophiona is sister to Amniota (paraphyletic lissamphibians) | 1290 | 1E-69, 0, 0 | + + +
Caudata is sister to Amniota (paraphyletic lissamphibians) | 1728 | 0.0001, 0, 0 | + + +
other distantly related groups is usually difficult. For example, Spinks et al. (2010) collected 120 nuclear markers from avian, squamate, and mammalian phylogenetic studies and evaluated their PCR performance in turtles. They found that only eight nuclear markers successfully produced single expected bands across 13 tested turtle species. In another case, Fong and Fujita (2011) developed 75 nuclear markers for vertebrate phylogenetics, but approximately 60% of the target fragments could not be obtained in three test species (two reptiles and one lissamphibian). Therefore, although the nested PCR method introduced here requires an additional PCR reaction, the extra work is still worthwhile.
In PCR-based phylogenetic projects, even when the PCR reactions are successful, the products often contain significant nonspecific amplicons. Such a condition requires additional effort involving gel purification and cloning, which takes much more time than the PCR reaction itself. Our NPCL toolkit is specifically designed to solve this problem, so that normally over 90% of PCR reactions produce strong and single expected bands. Moreover, most of the primers used to date for nuclear marker sets are degenerate and thus are not suitable for direct sequencing of PCR products. Benefiting from our nested PCR strategy, we introduce anchoring sequences to the ends of PCR fragments while maintaining PCR efficiency. These anchoring sequences bring the added benefit that two specific sequencing primers (Seq_F and Seq_R) can be used in all Sanger sequencing reactions.
One additional feature of our NPCL toolkit is that the average length of the NPCLs within it is 1,050 bp, a length that can easily be amplified in one PCR reaction and sequenced in both directions, allowing efficient use of resources. In contrast, the average marker lengths are 920 bp for 10 NPCLs in Li et al. (2007), 760 bp for 26 NPCLs in Townsend et al. (2008), 873 bp for 22 NPCLs in Shen et al. (2011), and 470 bp for 75 NPCLs in Fong and Fujita (2011). Longer markers provide more sites than shorter ones for equivalent money and time. This feature makes our NPCL toolkit more cost-effective than previously developed nuclear marker sets.
Phylogenetic Utility
The vertebrate NPCL toolkit we developed here shows great promise in terms of phylogenetic utility. A remarkable feature of our NPCL toolkit is that it provides 102 NPCLs with a broad range of evolutionary rates. In our demonstration, we used 30 NPCLs to resolve a family-level salamander phylogeny using both traditional concatenation analyses and a more promising species-tree analysis. However, this example does not mean that our toolkit performs well only on deep-timescale questions. Our ongoing study using this toolkit to resolve the relationships within Plethodontidae, a rapidly radiating group of salamanders, suggests that the toolkit also performs well in resolving species-level phylogenies. For many vertebrate groups in which applicable nuclear markers are limited, such as some teleosts, frogs, and salamanders, our NPCL toolkit can provide a one-stop solution for phylogenetic studies from the family level to the species level. Even for those groups in which specific nuclear marker sets have been developed, our toolkit is still worth trying, as many more loci can be easily obtained that may resolve some difficult branches.
The Toolkit Is a Good Addition to Sequence Capture Approaches
Recently, sequence capture approaches have been applied to vertebrate phylogenomics (Crawford et al. 2012; Faircloth et al. 2012; Lemmon et al. 2012; McCormack et al. 2012). These approaches begin with the selective capture of genomic regions. Briefly, fragmented gDNA is hybridized to DNA or RNA probes, either on an array or in solution. Nontargeted DNA is then washed away, and the targeted DNA is sequenced through NGS. The most promising feature of the sequence capture approach is that it can simultaneously produce hundreds to thousands of loci for tens of individuals within a relatively short time. Therefore, the sequence capture approach is considered to be much more cost-effective than the PCR-based method. According to the calculation of Lemmon et al. (2012), for a project of 100 taxa × 500 loci, the cost of the sequence capture method is just 1–35% of that of the PCR-based method.
However, the sequence capture approach is currently too challenging for most phylogenetic researchers. Typical NGS runs (454 or Illumina) used by the sequence capture method generate 1,000,000–2,000,000,000 sequences. Storing and processing these NGS data require significant computer memory, hardware upgrades, and bioinformatic programming skills, which are unfamiliar to most phylogenetic researchers. Moreover, phylogenetic reconstruction assumes that orthologous genes are being analyzed across species. For the PCR-based method, the detection of paralogous genes is relatively straightforward. However, in the sequence capture method, the captured genomic regions comprise short conserved cores (probe regions) and long unconserved flanking
[Figure 5: line plot of bootstrap support (%) on the y-axis (0–100) against number of genes on the x-axis (5–30), with separate series for concatenation analyses and species-tree analyses; the circled number 1 marks the node of interest.]
FIG. 5. The effect of increasing the number of nuclear loci on resolving the basal split within salamanders. Each data point represents the mean of support values estimated from 30 randomly sampled subsets. The dashed line indicates the threshold of 95% bootstrap support. The plots show that the minimum number of nuclear loci needed to robustly resolve the basal split within salamanders is 25.
sequences. Because paralogy cannot be detected until after the data are aligned, those unalignable sequences will make the detection of paralogy more difficult.
In fact, not every phylogenetic project will use more than 500 loci, as the sequence capture method normally does. Based on both empirical and simulation data, 20–50 loci are generally sufficient to answer many phylogenetic questions (Rokas et al. 2003; Spinks et al. 2009). This is also the number of loci that most phylogenetic studies will use. In such a situation, adopting the sequence capture method is not cost-effective because researchers need to use relatively expensive NGS and spend time learning new experimental techniques and carrying out sophisticated bioinformatic processing. Our NPCL toolkit is specially designed for such medium-scale phylogenetic projects using approximately 50 loci. Such a number of loci can easily be supplied by our 102 NPCLs. Because more than 90% of the PCR reactions generated by our toolkit can be directly sequenced, the average cost for one locus per sample is rather low. In our laboratory, generating one new sequence typically costs US$3 (without considering labor).
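The per-sequence figure quoted above lends itself to a quick back-of-the-envelope estimate. The following is a minimal sketch, assuming a flat cost of US$3 per newly generated sequence (labor excluded) and ignoring the minority of reactions that need gel cutting or cloning; the function name and its parameters are illustrative and not part of the toolkit.

```python
def sanger_project_cost(n_taxa, n_loci, cost_per_sequence=3.0, already_available=0):
    """Estimate the consumable cost (US$) of a PCR + Sanger sequencing project.

    cost_per_sequence defaults to the ~US$3 per sequence quoted in the text
    (labor excluded). already_available counts sequences that can be taken
    from public databases and therefore need no laboratory work.
    """
    n_new = n_taxa * n_loci - already_available
    if n_new < 0:
        raise ValueError("already_available exceeds the number of target sequences")
    return n_new * cost_per_sequence


# Salamander case study from Materials and Methods:
# 30 loci x 28 taxa, of which 201 sequences were already public.
print(sanger_project_cost(28, 30, already_available=201))
```

Under these assumptions, the 639 new sequences of the salamander data set correspond to roughly US$1,900 in consumables.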
In addition, researchers sometimes have only tiny amounts of DNA but wish to perform a multilocus phylogenetic analysis. In such a situation, the sequence capture method is difficult to implement because it normally requires DNA at the microgram level (Lemmon et al. 2012). Our NPCL toolkit can fill the gap here. Benefiting from the nested PCR strategy, the sensitivity of PCR reactions in our method is extremely high. In many test experiments in our laboratory, the toolkit and protocol could produce target bands with only 5–10 ng of DNA.
Our NPCL toolkit is an alternative to the sequence capture method for the everyday work of phylogenetic researchers. Which method to choose depends on two major drivers: the amount of DNA and the expected number of loci. When DNA is limited, the better solution may be PCR; otherwise, sequence capture also works. Taking into account the money and time the two methods require, we speculate that the economic transition point from PCR to sequence capture is at approximately 100 loci. That assessment is why our toolkit includes 102 NPCL markers. Our proposal is that when using ≤100 loci, one can try our NPCL toolkit; when using >100 loci, sequence capture should be used.
Future Directions
In this study, we used multiple genome alignments deposited in the University of California, Santa Cruz (UCSC) Genome Browser to identify long and conserved exons across jawed vertebrates. Benefiting from the nested PCR strategy, the experimental performance of the developed NPCLs indicated that they are highly stable in all major jawed vertebrate groups. Recently, a database for mining exon and intron markers, called EvolMarkers, has been built by Li et al. (2012). Careful investigation of this database may identify many conserved exons within nonvertebrates, whose interrelationships are currently more problematic than those of vertebrates. Because the nonvertebrates comprise many distantly related groups, it may be impossible to develop a single set of PCR primers for all nonvertebrates. However, following a similar marker development strategy, multiple NPCL toolkits could be constructed for various groups of nonvertebrates, such as arthropods, echinoderms, and molluscs. In addition, because introns are flanked by conserved exons, the nested PCR approach to marker development could also be applied to the development of EPIC (exon-primed intron-crossing) markers, which are more suitable for shallow-scale phylogenetic or phylogeographic projects.
Despite the benefits of our proposed method, it must be recognized that when handling large-scale projects, such as 200 taxa × 100 loci, the use of our toolkit and Sanger sequencing will still require significant cost, time, and labor. An alternative solution is to use NGS to replace Sanger sequencing. Recently, 454 NGS technology has been applied to sequence targeted gene regions from a pool of PCR products from different specimens (Binladen et al. 2007; Meyer et al. 2008). In such experiments, specific tagging sequences must be added to amplicons by either PCR (Binladen et al. 2007) or blunt-end ligation (Meyer et al. 2008). Therefore, if the tailing sequences of the second-round PCR primers in our NPCL toolkit are replaced by tagging sequences (for tag design, see Faircloth and Glenn 2012), all PCR products can be pooled together and sequenced with 454 NGS, which will greatly reduce the money and time cost compared with Sanger sequencing. However, parallel tagged sequencing via NGS does not circumvent the process of PCR for each individual at each locus, which may be the most onerous part of a large-scale phylogenomic project. Some promising new technologies may help to solve this problem, such as microdroplet PCR (Tewhey et al. 2009), in which millions of individual PCR reactions are performed simultaneously in picoliter-scale droplets, and the 96.96 Dynamic Array by Fluidigm, which allows 96 primer combinations to be used on 96 samples (9,216 total PCR reactions) on a single PCR plate. However, there has been little research on applying NGS and new high-throughput PCR technologies to phylogenomics, so their ease of use and cost-effectiveness still need to be explored.
Summary
In conclusion, we have developed an improved method for rapidly amplifying and sequencing NPCLs that has proven to be useful and effective for molecular phylogenetic studies of vertebrates. The newly developed toolkit provides an attractive alternative to available methods for vertebrate phylogenomics.
Materials and Methods
Development of NPCL and Primer Design
Our previous study showed that nested PCR is overwhelmingly more effective than conventional PCR for obtaining target amplicons from complex genomic environments (Shen et al. 2012). However, nested PCR requires four conserved regions to design two pairs of primers (illustrated in fig. 6; yellow blocks represent the conserved regions used for primer design), which means that only relatively long exons are
suitable as candidates for NPCL development with the nested PCR method. To search for long and conserved exons, we took advantage of our previous bioinformatic method, which used the multiple genome alignment data from the UCSC Genome Browser to identify conserved exons (Shen et al. 2011). Because the NPCL markers are to be used in vertebrates, we focused only on those multiple genome alignments that include at least six species: Danio rerio (zebrafish), Silurana tropicalis (frog), Anolis carolinensis (lizard), Gallus gallus (chicken), Mus musculus (mouse), and Homo sapiens (human). The alignments of candidate exons had to meet two criteria: a length of more than 700 bp and pairwise similarity ranging from 35% to 90%. The detailed bioinformatic pipeline has been described elsewhere (Shen et al. 2011). In addition to using multiple genome alignments to screen NPCL candidates, we also manually searched the ENSEMBL database for nuclear genes that were used previously (Murphy et al. 2001; Li et al. 2007; Townsend et al. 2008; Wright et al. 2008; Zhou et al. 2011; Song et al. 2012) to check whether they contain large and appropriately conserved exons.
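The two screening criteria for candidate exon alignments can be illustrated with a short filter. This is only a sketch under stated assumptions, not the published pipeline of Shen et al. (2011): it assumes that "pairwise similarity" means percent identity over ungapped columns and that every species pair must fall within the 35–90% window. The function names and the toy sequences are hypothetical.

```python
from itertools import combinations


def pairwise_identity(a, b):
    """Percent identity over alignment columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    matches = sum(x == y for x, y in pairs)
    return 100.0 * matches / len(pairs)


def is_candidate_exon(alignment, min_length=700, min_sim=35.0, max_sim=90.0):
    """Apply the two criteria from the text to one exon alignment.

    alignment maps species name -> aligned sequence (all of equal length).
    Assumption: every pairwise identity must lie within [min_sim, max_sim].
    """
    seqs = list(alignment.values())
    if len(seqs) < 2:
        return False
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("aligned sequences must have equal length")
    if length <= min_length:  # the text requires MORE than 700 bp
        return False
    return all(min_sim <= pairwise_identity(a, b) <= max_sim
               for a, b in combinations(seqs, 2))


# Toy example: two 800-bp sequences that differ at every fourth site (75% identity).
toy = {"species_A": "ACGT" * 200, "species_B": "ACGA" * 200}
print(is_candidate_exon(toy))
```

An upper similarity bound is what keeps the retained exons variable enough to be phylogenetically informative, while the lower bound keeps them conserved enough for primer design.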
As a result, we assembled a total of 305 NPCL candidate alignments, of which 120 contained the appropriate number of conserved blocks, and used these 120 to design nested PCR primers. To increase the success rates of our NPCL markers in amniotes, we manually added a turtle sequence (Chrysemys picta bellii) to each of the candidate alignments using data downloaded from the ENSEMBL database. A total of 480 primers were designed for the 120 NPCL candidates. Briefly, the first-round PCR primers are used only to enrich target regions from genomic environments and not to obtain target amplicons, so the degeneracy of these primers is normally high to increase reaction sensitivity; the second-round PCR primers are used to obtain target amplicons, so the
[Figure 6: flowchart of the laboratory protocol. Primers F1 and R1 (first round) and F2 and R2 (second round, tailed) sit on conserved blocks flanking the target region; Seq_F and Seq_R anneal to the tails.
(i) First PCR with primers F1 and R1 using gDNA as template: enrich the target region from the complex genomic environment with one pair of highly degenerate primers. PCR was performed with 50–100 ng DNA in a 25 µl reaction. Cycling conditions: an initial denaturation step of 4 min at 94 °C, followed by 35 cycles of 94 °C for 45 s, 45 °C for 40 s, and 72 °C for 2 min, and a final extension at 72 °C for 10 min.
(ii) Second PCR with tailed primers F2 and R2 using the first PCR product as template: specifically amplify the target region with one pair of tailed, low-degeneracy primers. PCR was performed with 1 µl of first-round PCR product in a 25 µl reaction. Cycling conditions: an initial denaturation step of 4 min at 94 °C, followed by 35 cycles of 94 °C for 45 s, 50 °C for 40 s, and 72 °C for 90 s, and a final extension at 72 °C for 10 min.
(iii) PCR evaluation and sequencing: evaluate the agarose gel electrophoresis results. If there is a single and strong band (normally >90% of reactions), clean up with ExoI and FastAP and sequence directly with the general sequencing primers Seq_F and Seq_R; otherwise, use gel cutting or cloning, then sequencing. For cleanup, 25 µl of PCR product is treated with 2 U ExoI and 0.4 U FastAP at 37 °C for 30 min and then 80 °C for 15 min. A Sanger sequencing reaction is performed with 0.5 µl BigDye and 1 µl cleaned PCR product.]
FIG. 6. Schematic representation of the experimental protocol for using our NPCL toolkit. Note that for each NPCL, nested PCR primers are designed on four short conserved blocks flanking the target region.
degeneracy of these primers is lower to increase reaction specificity. Our previous study showed that the nested PCR method often produces strong and single amplicon bands (Shen et al. 2012). To facilitate the subsequent direct sequencing, we added a tail (5′-AGGGTTTTCCCAGTCACGAC-3′) to the 5′-end of all second-round forward primers and a tail (5′-AGATAACAATTTCACACAGG-3′) to the 5′-end of all second-round reverse primers. These tail sequences provide two unique anchoring sites for direct sequencing from cleaned PCR products. In our pilot experiments, adding the tail sequences to primers did not affect the efficiency of the second-round PCR.
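The tailing step amounts to a simple string operation on each second-round primer pair, sketched below. The two tail sequences are those given in the text; the locus-specific primer sequences in the example are invented placeholders (with IUPAC degenerate codes) and do not correspond to any primer in supplementary table S1.

```python
SEQ_F_TAIL = "AGGGTTTTCCCAGTCACGAC"  # tail for second-round forward primers (from the text)
SEQ_R_TAIL = "AGATAACAATTTCACACAGG"  # tail for second-round reverse primers (from the text)


def add_tails(forward_primer, reverse_primer):
    """Prepend the sequencing tails to the 5' ends of a second-round primer pair.

    Both primers are written 5'->3'; the returned oligos are also 5'->3'.
    """
    return (SEQ_F_TAIL + forward_primer.upper(),
            SEQ_R_TAIL + reverse_primer.upper())


# Hypothetical second-round primers, for illustration only.
f2_tailed, r2_tailed = add_tails("GGNAARACNATGCARTA", "TCRTCNACYTTRAARTG")
print(f2_tailed)
print(r2_tailed)
```

Because every amplicon then carries the same two 20-nt anchors, the tails themselves serve as the universal sequencing primers Seq_F and Seq_R.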
Experimental Testing for Candidate Markers in 16 Jawed Vertebrates
To test the experimental performance of our newly designed NPCL markers, we selected 16 taxa representing nine major jawed vertebrate lineages: Chondrichthyes (Sphyrna lewini), Actinopterygii (Lepisosteus oculatus and Pangasius sutchi), Dipnoi (Protopterus annectens), Lissamphibia (Ichthyophis bannanicus, Batrachuperus yenyuanensis, and Rana nigromaculata), Mammalia (Mus musculus and Sus scrofa domestica), Testudines (Trionyx sinensis and Podocnemis unifilis), Aves (Struthio camelus and Zosterops japonica), Crocodylia (Crocodylus siamensis), and Squamata (Hemidactylus bowringii and Naja naja atra). Total genomic DNA was extracted from ethanol-preserved tissues (liver or muscle) using the standard salt extraction protocol. All extracted genomic DNAs were diluted to a concentration of 50 ng/µl with 1× TE and stored at -20 °C before PCR amplification.
All 120 NPCL markers were tested with a two-round PCR strategy (nested PCR). The first-round PCR was performed in a 25 µl reaction containing 1–2 µl template DNA (50–100 ng), with final concentrations of 1× PCR buffer, 200 µM dNTP, 400 nM of each forward and reverse first-round primer, and 1.25 U Taq polymerase (TransTaq High Fidelity, TransGen, Beijing). The cycling conditions of the first-round PCR were as follows: an initial denaturation step of 4 min at 94 °C, followed by 35 cycles of a 45 s denaturation at 94 °C, a 40 s annealing at 45 °C, and a 2 min extension at 72 °C, followed by a final 10 min extension at 72 °C. The second-round PCR was also performed in a 25 µl reaction containing 1 µl of the first-round PCR product (without dilution) and final concentrations of 1× PCR buffer, 200 µM dNTP, 400 nM of each forward and reverse second-round primer, and 1.25 U Taq polymerase. The cycling conditions of the second-round PCR were as follows: an initial denaturation step of 4 min at 94 °C, followed by 35 cycles of a 45 s denaturation at 94 °C, a 40 s annealing at 50 °C, and a 90 s extension at 72 °C, followed by a final 10 min extension at 72 °C.
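The two cycling programs differ only in annealing temperature and extension time, which is easy to see when they are written down as data. The sketch below encodes both programs with the values given above and sums the programmed hold times; the class and field names are illustrative, and ramping time between temperatures is not included, so real run times on a thermocycler will be longer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PcrProgram:
    """One thermocycler program; all durations are in seconds."""
    initial_denaturation_s: int
    cycles: int
    denaturation_s: int
    annealing_s: int
    annealing_temp_c: int
    extension_s: int
    final_extension_s: int

    def programmed_seconds(self):
        """Sum of programmed hold times (temperature ramping is ignored)."""
        per_cycle = self.denaturation_s + self.annealing_s + self.extension_s
        return (self.initial_denaturation_s
                + self.cycles * per_cycle
                + self.final_extension_s)


# Values taken from the protocol text.
FIRST_ROUND = PcrProgram(initial_denaturation_s=240, cycles=35, denaturation_s=45,
                         annealing_s=40, annealing_temp_c=45, extension_s=120,
                         final_extension_s=600)
SECOND_ROUND = PcrProgram(initial_denaturation_s=240, cycles=35, denaturation_s=45,
                          annealing_s=40, annealing_temp_c=50, extension_s=90,
                          final_extension_s=600)

for name, program in (("first round", FIRST_ROUND), ("second round", SECOND_ROUND)):
    print(name, round(program.programmed_seconds() / 60, 1), "min")
```

The lower annealing temperature in the first round is consistent with its role of enriching the target with highly degenerate primers, whereas the second round anneals at a higher temperature for specificity.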
One microliter of the second-round PCR products was analyzed on a 1.0% TAE agarose gel. An NPCL marker was considered successful if more than 8 of the 16 tested taxa produced target amplicon bands. On this basis, 102 of the 120 tested NPCL markers were successful. The nested-PCR primers for the 102 NPCL markers can be found in supplementary table S1, Supplementary Material online. If the PCR products contained significant nonspecific amplicon bands (normally <10%), they needed further processing, for example, standard gel cutting or cloning. If the PCR reactions produced single amplicon bands (normally >90%), they were cleaned with ExoFAP treatment: 2 U ExoI and 0.4 U FastAP (both Fermentas) were added to the PCR tube and incubated for 30 min at 37 °C and 15 min at 80 °C. The cleaned PCR reactions can be used directly as templates for Sanger sequencing. According to our experimental design, all PCR fragments can be sequenced from both ends with the two universal sequencing primers, Seq_F (5′-AGGGTTTTCCCAGTCACGAC-3′) and Seq_R (5′-AGATAACAATTTCACACAGG-3′). A typical Sanger sequencing reaction in our laboratory consumes 0.5 µl BigDye and 1 µl cleaned PCR product. The primer design strategy, the laboratory protocol for the nested PCR method, and the pretreatment of PCR products before Sanger sequencing are illustrated in figure 6.
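The success criterion for a marker (target bands in more than 8 of the 16 tested taxa) can be applied mechanically to a table of gel results. The following is a minimal sketch with made-up data; the marker names and band patterns are hypothetical and serve only to show the threshold at work.

```python
def successful_markers(band_table, min_taxa=9):
    """Return the markers with a target band in at least min_taxa taxa.

    band_table maps marker name -> list of booleans, one per tested taxon.
    The default of 9 encodes "more than 8 of the 16 tested taxa".
    """
    return sorted(marker for marker, bands in band_table.items()
                  if sum(bool(b) for b in bands) >= min_taxa)


def pcr_success_rate(band_table):
    """Fraction of all marker-by-taxon reactions that produced a target band."""
    total = sum(len(bands) for bands in band_table.values())
    hits = sum(sum(bool(b) for b in bands) for bands in band_table.values())
    return hits / total


# Hypothetical gel results for three markers across 16 taxa.
toy_results = {
    "locusA": [True] * 16,
    "locusB": [True] * 9 + [False] * 7,
    "locusC": [True] * 8 + [False] * 8,  # exactly 8 bands: not "more than 8"
}
print(successful_markers(toy_results))
print(pcr_success_rate(toy_results))
```

Note that a marker with bands in exactly 8 taxa fails the criterion, since the text requires strictly more than 8.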
Calculation of Relative Evolutionary Rate of 102 NPCLs
The rate multipliers (m) across partitions estimated in MrBayes 3.2 (Ronquist et al. 2012) are used as relative evolutionary rates. To calculate these parameters, alignments for each NPCL were prepared for 12 species: Homo sapiens, Macaca mulatta, Mus musculus, Rattus norvegicus, Gallus gallus, Meleagris gallopavo, Chrysemys picta bellii, Anolis carolinensis, Silurana tropicalis, Tetraodon nigroviridis, Takifugu rubripes, and Danio rerio. Because genome data are available for the 12 species, we did not generate any new data. The 102 NPCL alignments were then combined and subjected to MrBayes analyses partitioned by gene. Each gene was assigned a separate GTR + Γ + I model, and all model parameters were unlinked. Two Markov chain Monte Carlo (MCMC) runs were performed with one cold chain and three heated chains (temperature set to 0.1) for 50 million generations and sampled every 1,000 generations. The rate multiplier for each gene was estimated using Tracer version 1.4 after discarding the first 50% of the generations. All evolutionary rates were normalized by dividing by the maximum value of the obtained rates.
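The final normalization step rescales the per-gene rate multipliers so that the fastest locus has a relative rate of 1. A minimal sketch is given below; the gene names and multiplier values are invented for illustration and are not results from this study.

```python
def normalize_rates(rate_multipliers):
    """Divide each gene's rate multiplier by the maximum multiplier.

    rate_multipliers maps gene name -> posterior mean rate multiplier (m).
    The fastest gene receives a relative rate of exactly 1.0.
    """
    if not rate_multipliers:
        raise ValueError("no rate multipliers supplied")
    max_rate = max(rate_multipliers.values())
    if max_rate <= 0:
        raise ValueError("rate multipliers must be positive")
    return {gene: m / max_rate for gene, m in rate_multipliers.items()}


# Hypothetical multipliers for three loci.
example = {"geneA": 0.5, "geneB": 2.0, "geneC": 1.0}
print(normalize_rates(example))
```

Expressed this way, rates from different loci are directly comparable on a 0–1 scale when choosing markers for a given time depth.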
Gene and Taxon Sampling for Investigating Higher Level Salamander Relationships
To test the utility of our NPCL toolkit in a real case, we selected 19 salamander taxa representing all 10 salamander families and 9 outgroup taxa to investigate the family-level relationships of salamanders (supplementary table S2, Supplementary Material online). For gene sampling, we randomly selected 30 NPCL markers whose PCR success rates were more than 90% in the 16 previously tested vertebrates. Among the 840 target sequences (30 markers for 28 taxa), 201 were available in public databases (NCBI, UCSC, and ENSEMBL), whereas the remaining 639 sequences needed to be generated de novo. The experimental procedure was as described earlier. All obtained sequences were examined by checking for the presence of premature stop codons (pseudogenes) and by BlastX searches against the nonredundant protein sequences (nr) to confirm that they were our target genes. The NPCLs, species, and accession numbers for the newly obtained sequences are listed in supplementary table S2, Supplementary Material online.
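The premature stop codon check can be sketched as below. This is only an illustration, not the authors' script: it assumes an unaligned exon fragment, a known reading frame, and the standard nuclear genetic code, and the function name is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard nuclear genetic code (assumption)


def in_frame_stops(sequence, frame=0):
    """Return 0-based offsets of in-frame stop codons in an unaligned sequence.

    NPCL fragments are internal exon pieces, so any in-frame stop codon
    is treated here as a warning sign of a possible pseudogene.
    """
    seq = sequence.upper()
    return [
        i
        for i in range(frame, len(seq) - 2, 3)
        if seq[i:i + 3] in STOP_CODONS
    ]
```

A sequence that returns an empty list in its expected frame passes this check; the BlastX comparison against nr remains a separate confirmation step.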
Phylogenetic Analyses
Alignments of all 30 NPCL markers were conducted using the G-INS-i method from MAFFT (Katoh et al. 2005) under the default settings, according to their translated amino acid sequences, then refined by eye. All 30 refined alignments were combined into a concatenated data set (27,834 bp).
For the concatenated data set, we manually defined five partitioning strategies: 2 partitions (one for codon positions 1 and 2 and one for codon position 3); 3 partitions (one partition for each codon position); 30 partitions (one partition for each gene); 60 partitions (one for codon positions 1 + 2 and one for codon position 3 across 30 genes); and 90 partitions (codon position partitioning across 30 genes). Comparisons of the five partitioning strategies and selection of the corresponding nucleotide substitution models were conducted under the Bayesian information criterion implemented in PartitionFinder (Lanfear et al. 2012). The 3-partition scheme (one partition for each codon position) was chosen as the best-fitting partitioning strategy, and all 3 partitions favored the GTR + Γ + I model.
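As an illustration of how such schemes can be written down, the sketch below generates partition definitions in the RAxML partition-file syntax (`DNA, name = start-end\3`). It assumes every gene is in frame and has a length divisible by three; the function names and the gene names in the usage note are hypothetical.

```python
def codon_partitions(total_length, merge_first_two=False):
    """Codon-position partitions for a concatenated, in-frame alignment."""
    if total_length % 3 != 0:
        raise ValueError("alignment length must be a multiple of 3")
    n = total_length
    if merge_first_two:
        # 2-partition scheme: positions 1 + 2 together, position 3 alone.
        return [f"DNA, pos12 = 1-{n}\\3, 2-{n}\\3", f"DNA, pos3 = 3-{n}\\3"]
    # 3-partition scheme: one partition per codon position.
    return [f"DNA, pos{p} = {p}-{n}\\3" for p in (1, 2, 3)]


def gene_by_codon_partitions(gene_lengths):
    """One partition per codon position per gene (e.g. 90 for 30 genes)."""
    lines, start = [], 1
    for name, length in gene_lengths:
        if length % 3 != 0:
            raise ValueError(f"{name}: length must be a multiple of 3")
        end = start + length - 1
        lines += [f"DNA, {name}_pos{p} = {start + p - 1}-{end}\\3" for p in (1, 2, 3)]
        start = end + 1
    return lines
```

For example, `codon_partitions(27834)` yields the three lines of the 3-partition scheme for the concatenated data set, and `gene_by_codon_partitions([("geneA", 900), ("geneB", 1200)])` shows the per-gene layout for two made-up genes.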
The concatenated data set was separately analyzed with both ML and Bayesian inference (BI) methods under the 3-partition scheme. Partitioned ML analyses were implemented using RAxML version 7.2.6 (Stamatakis 2006) with the GTR + Γ + I model assigned to each partition. A search that combined 100 separate ML searches was applied to find the optimal tree, and branch support for each node was evaluated with 500 standard bootstrapping replicates (-f d -b 500 option) implemented in RAxML. The partitioned BI was conducted using MrBayes 3.2 (Ronquist et al. 2012). All model parameters were unlinked. Two MCMC runs were performed with one cold chain and three heated chains (temperature set to 0.1) for 60 million generations and sampled every 1,000 generations. The chain stationarity was visualized by plotting −ln L against the generation number using Tracer version 1.4, and the first 50% of generations were discarded. Topologies and posterior probabilities were estimated from the remaining generations. Two runs for each analysis were compared for congruence.
We also performed Bayesian phylogenetic analyses under a mixture model, CAT + GTR + Γ4, in PhyloBayes 3.3 (Lartillot et al. 2009) with two independent MCMC runs. Each run was performed for 10,000 cycles and sampled every cycle. Stationarity was considered reached when the largest discrepancy (maxdiff) between the two independent runs was less than 0.1. The first 5,000 trees in each MCMC run were discarded. The remaining 10,000 trees of the two runs were sampled every 5 trees to generate a 50% majority-rule posterior consensus tree.
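The burn-in and thinning arithmetic described here can be checked with a short sketch. It assumes that thinning is applied within each run after burn-in; the function name is hypothetical.

```python
def pooled_tree_count(cycles, burnin, runs, thin):
    """Number of trees entering the consensus after burn-in and thinning."""
    if burnin >= cycles:
        raise ValueError("burn-in must be shorter than the run")
    kept_per_run = len(range(burnin, cycles, thin))
    return kept_per_run * runs


# 10,000 cycles per run, first 5,000 discarded, two runs, every 5th tree kept.
n_trees = pooled_tree_count(cycles=10_000, burnin=5_000, runs=2, thin=5)
```

Under these settings the consensus would be built from 2,000 trees.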
Species tree estimation was conducted using the pseudo-ML approach in the program MP-EST (Liu et al. 2010) under the coalescent model. Briefly, the gene trees for the 30 NPCL markers were reconstructed with ML under the GTR + Γ8 model using PHYML 3.0 (Guindon et al. 2010). The resulting 30 gene trees were rooted with an outgroup (Chrysemys picta bellii) and used to generate a species tree in MP-EST. The robustness of the species tree was evaluated with nonparametric bootstrapping of 500 replicates.
Alternative phylogenetic hypotheses were tested based on the 30-gene data set. We first calculated sitewise log likelihoods of the alternative trees using RAxML (-f g) under the GTR + Γ + I model. Then, the site log-likelihood file was used to estimate P values for each alternative tree in the CONSEL program (Shimodaira and Hasegawa 2001) using the Kishino–Hasegawa test (KH) (Kishino and Hasegawa 1989), the approximately unbiased test (AU) (Shimodaira 2002), and the RELL bootstrap proportion test (BP).
Supplementary Material
Supplementary tables S1 and S2 and figures S1 and S2 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).
Acknowledgments
The authors are grateful to David Wake and Carol Spencer of the Museum of Vertebrate Zoology at the University of California, Berkeley, for tissue loans. David Wake provided many useful comments on the manuscript. This work was supported by National Natural Science Foundation of China grants (31172075 and 30900136) and the National Science Fund for Excellent Young Scholars (no. pending).
References
Binladen J, Gilbert MTP, Bollback JP, Panitz F, Bendixen C, Nielsen R, Willerslev E. 2007. The use of coded PCR primers enables high-throughput sequencing of multiple homolog amplification products by 454 parallel sequencing. PLoS One 2:e197.
Crawford NG, Faircloth BC, McCormack JE, Brumfield RT, Winker K, Glenn TC. 2012. More than 1000 ultraconserved elements provide evidence that turtles are the sister group to archosaurs. Biol Lett. 8:783–786.
Delsuc F, Brinkmann H, Philippe H. 2005. Phylogenomics and the reconstruction of the tree of life. Nat Rev Genet. 6:361–375.
Duellman WE, Trueb L. 1994. Biology of amphibians. Baltimore (MD): Johns Hopkins University Press.
Dunn CW, Hejnol A, Matus DQ, et al. (18 co-authors). 2008. Broad phylogenomic sampling improves resolution of the animal tree of life. Nature 452:745–749.
Faircloth BC, Glenn TC. 2012. Not all sequence tags are created equal: designing and validating sequence identification tags robust to indels. PLoS One 7:e42543.
Faircloth BC, McCormack JE, Crawford NG, Brumfield RT, Glenn TC. 2012. Ultraconserved elements anchor thousands of genetic markers for target enrichment spanning multiple evolutionary timescales. Syst Biol. 61:717–726.
Fong JJ, Brown JM, Fujita MK, Boussau B. 2012. A phylogenomic approach to vertebrate phylogeny supports a turtle-archosaur affinity and a possible paraphyletic Lissamphibia. PLoS One 7:e48990.
Fong JJ, Fujita MK. 2011. Evaluating phylogenetic informativeness and data-type usage for new protein-coding genes across Vertebrata. Mol Phylogenet Evol. 61:300–307.
Frost DR, Grant T, Faivovich J, et al. (18 co-authors). 2006. The amphibian tree of life. Bull Am Mus Nat Hist. 297:8–370.
Gao KQ, Shubin NH. 2001. Late Jurassic salamanders from northern China. Nature 410:574–577.
Guindon S, Dufayard J-F, Lefort V, Anisimova M, Hordijk W, Gascuel O. 2010. New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Syst Biol. 59:307–321.
Hugall AF, Foster R, Lee MSY. 2007. Calibration choice, rate smoothing, and the pattern of tetrapod diversification according to the long nuclear gene RAG-1. Syst Biol. 56:543–563.
Inoue JG, Miya M, Lam K, Tay BH, Danks JA, Bell J, Walker TI, Venkatesh B. 2010. Evolutionary origin and phylogeny of the modern holocephalans (Chondrichthyes: Chimaeriformes): a mitogenomic perspective. Mol Biol Evol. 27:2576–2586.
Katoh K, Kuma K, Toh H, Miyata T. 2005. MAFFT version 5: improvement in accuracy of multiple sequence alignment. Nucleic Acids Res. 33:511–518.
Kishino H, Hasegawa M. 1989. Evaluation of the maximum likelihood estimate of the evolutionary tree topologies from DNA sequence data, and the branching order in Hominoidea. J Mol Evol. 29:170–179.
Künstner A, Wolf JBW, Backström N, et al. (13 co-authors). 2010. Comparative genomics based on massive parallel transcriptome sequencing reveals patterns of substitution and selection across 10 bird species. Mol Ecol. 19:266–276.
Lanfear R, Calcott B, Ho SYW, Guindon S. 2012. PartitionFinder: combined selection of partitioning schemes and substitution models for phylogenetic analyses. Mol Biol Evol. 29:1695–1701.
Lartillot N, Lepage T, Blanquart S. 2009. PhyloBayes 3: a Bayesian software package for phylogenetic reconstruction and molecular dating. Bioinformatics 25:2286–2288.
Lemmon AR, Brown JM, Stanger-Hall K, Lemmon EM. 2009. The effect of ambiguous data on phylogenetic estimates obtained by maximum likelihood and Bayesian inference. Syst Biol. 58:130–145.
Lemmon AR, Emme SA, Lemmon EM. 2012. Anchored hybrid enrichment for massively high-throughput phylogenomics. Syst Biol. 61:727–744.
Li C, Ortí G, Zhang G, Lu G. 2007. A practical approach to phylogenomics: the phylogeny of ray-finned fish (Actinopterygii) as a case study. BMC Evol Biol. 7:44.
Li C, Riethoven JJ, Naylor GJ. 2012. EvolMarkers: a database for mining exon and intron markers for evolution, ecology and conservation studies. Mol Ecol Resour. 12:967–971.
Liu L, Yu L, Edwards SV. 2010. A maximum pseudo-likelihood approach for estimating species trees under the coalescent model. BMC Evol Biol. 10:302.
McCormack JE, Faircloth BC, Crawford NG, Gowaty PA, Brumfield RT, Glenn TC. 2012. Ultraconserved elements are novel phylogenomic markers that resolve placental mammal phylogeny when combined with species-tree analysis. Genome Res. 22:746–754.
McCormack JE, Hird SM, Zellmer AJ, Carstens BC, Brumfield RT. 2013. Applications of next-generation sequencing to phylogeography and phylogenetics. Mol Phylogenet Evol. 66:526–538.
Meyer A, Van de Peer Y. 2005. From 2R to 3R: evidence for a fish-specific genome duplication (FSGD). Bioessays 27:937–945.
Meyer M, Stenzel U, Hofreiter M. 2008. Parallel tagged sequencing on the 454 platform. Nat Protoc. 3:267–278.
Murphy WJ, Eizirik E, Johnson WE, Zhang YP, Ryder OA, O'Brien SJ. 2001. Molecular phylogenetics and the origins of placental mammals. Nature 409:614–618.
Philippe H, Derelle R, Lopez P, et al. (20 co-authors). 2009. Phylogenomics revives traditional views on deep animal relationships. Curr Biol. 19:706–712.
Philippe H, Telford M. 2006. Large-scale sequencing and the new animal phylogeny. Trends Ecol Evol. 21:614–620.
Portik DM, Wood PL, Grismer JL, Stanley EL, Jackman TR. 2011. Identification of 104 rapidly-evolving nuclear protein-coding markers for amplification across scaled reptiles using genomic resources. Conserv Genet Resour. 4:1–10.
Pyron RA, Wiens JJ. 2011. A large-scale phylogeny of Amphibia including over 2800 species, and a revised classification of extant frogs, salamanders, and caecilians. Mol Phylogenet Evol. 61:543–583.
Roelants K, Gower DJ, Wilkinson M, Loader SP, Biju SD, Guillaume K, Moriau L, Bossuyt F. 2007. Global patterns of diversification in the history of modern amphibians. Proc Natl Acad Sci U S A. 104:887–892.
Rokas A, Williams BL, King N, Carroll SB. 2003. Genome-scale approaches to resolving incongruence in molecular phylogenies. Nature 425:798–804.
Ronquist F, Teslenko M, Van der Mark P, Ayres DL, Darling A, Höhna S, Larget B, Liu L, Suchard MA, Huelsenbeck JP. 2012. MrBayes 3.2: efficient Bayesian phylogenetic inference and model choice across a large model space. Syst Biol. 61:539–542.
Roure B, Baurain D, Philippe H. 2013. Impact of missing data on phylogenies inferred from empirical phylogenomic data sets. Mol Biol Evol. 30:197–214.
San Mauro D. 2010. A multilocus timescale for the origin of extant amphibians. Mol Phylogenet Evol. 56:554–561.
San Mauro D, Vences M, Alcobendas M, Zardoya R, Meyer A. 2005. Initial diversification of living amphibians predated the breakup of Pangaea. Am Nat. 165:590–599.
Shen XX, Liang D, Wen JZ, Zhang P. 2011. Multiple genome alignments facilitate development of NPCL markers: a case study of tetrapod phylogeny focusing on the position of turtles. Mol Biol Evol. 28:3237–3252.
Shen XX, Liang D, Zhang P. 2012. The development of three long universal nuclear protein-coding locus markers and their application to osteichthyan phylogenetics with nested PCR. PLoS One 7:e39256.
Shimodaira H. 2002. An approximately unbiased test of phylogenetic tree selection. Syst Biol. 51:492–508.
Shimodaira H, Hasegawa M. 2001. CONSEL: for assessing the confidence of phylogenetic tree selection. Bioinformatics 17:1246–1247.
Song S, Liu L, Edwards SV, Wu S. 2012. Resolving conflict in eutherian mammal phylogeny using phylogenomics and the multispecies coalescent model. Proc Natl Acad Sci U S A. 109:14942–14947.
Spinks PQ, Thomson RC, Barley AJ, Newman CE, Shaffer HB. 2010. Testing avian, squamate, and mammalian nuclear markers for cross amplification in turtles. Conserv Genet Resour. 2:127–129.
Spinks PQ, Thomson RC, Lovely GA, Shaffer HB. 2009. Assessing what is needed to resolve a molecular phylogeny: simulations and empirical data from Emydid turtles. BMC Evol Biol. 9:56.
Stamatakis A. 2006. RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics 22:2688–2690.
Tewhey R, Warner JB, Nakano M, Libby B, Medkova M, David PH, Kotsopoulos SK, Samuels ML, Hutchison JB, Larson JW. 2009. Microdroplet-based PCR enrichment for large-scale targeted sequencing. Nat Biotechnol. 27:1025–1031.
Thomson RC, Shedlock AM, Edwards SV, Shaffer HB. 2008. Developing markers for multilocus phylogenetics in non-model organisms: a test case with turtles. Mol Phylogenet Evol. 49:514–525.
Thomson RC, Wang IJ, Johnson JR. 2010. Genome-enabled development of DNA markers for ecology, evolution and conservation. Mol Ecol. 19:2184–2195.
Townsend TM, Alegre RE, Kelley ST, Wiens JJ, Reeder TW. 2008. Rapid development of multiple nuclear loci for phylogenetic analysis using genomic resources: an example from squamate reptiles. Mol Phylogenet Evol. 47:129–142.
Weisrock DW, Harmon LJ, Larson A. 2005. Resolving deep phylogenetic relationships in salamanders: analyses of mitochondrial and nuclear genomic DNA. Syst Biol. 54:758–777.
Wiens J, Bonett R, Chippindale P. 2005. Ontogeny discombobulates phylogeny: paedomorphosis and higher-level salamander relationships. Syst Biol. 54:91–110.
Wright TF, Schirtzinger EE, Matsumoto T, et al. (11 co-authors). 2008. A multilocus molecular phylogeny of the parrots (Psittaciformes): support for a Gondwanan origin during the Cretaceous. Mol Biol Evol. 25:2141–2156.
Zhang P, Wake DB. 2009. Higher-level salamander relationships and divergence dates inferred from complete mitochondrial genomes. Mol Phylogenet Evol. 53:492–508.
Zhang P, Zhou H, Chen YQ, Liu YF, Qu LH. 2005. Mitogenomic perspectives on the origin and phylogeny of living amphibians. Syst Biol. 54:391–400.
Zhou XM, Xu SX, Zhang P, Yang G. 2011. Developing a series of conservative anchor markers and their application to phylogenomics of Laurasiatherian mammals. Mol Ecol Resour. 11:134–140.
[Figure 4: phylogenetic tree of 19 salamanders (10 families, grouped into Cryptobranchoidea and Salamandroidea), anuran and gymnophionan outgroups, and non-amphibian outgroups; 30 nuclear genes (total 27,834 bp); scale bar, 0.1 substitutions/site; node support values shown: 99/1.0/1.0/83 and 99/1.0/1.0/74.]
FIG. 4. Higher-level phylogenetic relationships of 10 salamander families inferred from 30 NPCL markers. The tree was inferred by concatenation analyses using ML, BI, and the mixture model (CAT) and by species-tree analysis using the pseudo-ML approach (MP-EST). Branch support values are indicated beside nodes in order of ML bootstrap (BPML), BI posterior probability (PPBI), CAT posterior probability (PPCAT), and MP-EST bootstrap (BPMP-EST) from left to right. The filled squares represent BPML > 95%, PPBI = 1.0, PPCAT = 1.0, and BPMP-EST > 95%. The circled number refers to the node of interest studied in figure 6. Branch lengths are from the ML analysis.
Table 2. Statistical Confidence (P Values) for Alternative Branching Hypotheses Based on 30-Gene Data Set.
Alternative Topology Tested | ΔLn L | P Values | Rejection
Sirenidae is sister to Cryptobranchoidea | 42.3 | 0.004, 0.002, 0.004 | +, +, +
Gymnophiona is sister to Anura (monophyletic lissamphibians) | 34.3 | 0.013, 0.012, 0.013 | +, +, +
Gymnophiona is sister to Caudata (monophyletic lissamphibians) | 43.9 | 0.002, 0.001, 0.001 | +, +, +
Anura is sister to Amniota (paraphyletic lissamphibians) | 144.6 | 5E−30, 0, 0 | +, +, +
Gymnophiona is sister to Amniota (paraphyletic lissamphibians) | 129.0 | 1E−69, 0, 0 | +, +, +
Caudata is sister to Amniota (paraphyletic lissamphibians) | 172.8 | 0.0001, 0, 0 | +, +, +
other distantly related groups is usually difficult. For example, Spinks et al. (2010) collected 120 nuclear markers from avian, squamate, and mammalian phylogenetic studies and evaluated their PCR performance in turtles. They found that only eight nuclear markers successfully produced single expected bands across 13 tested turtle species. In another case, Fong and Fujita (2011) developed 75 nuclear markers for vertebrate phylogenetics, but approximately 60% of the target fragments could not be obtained in three test species (two reptiles and one lissamphibian). Therefore, although the nested PCR method introduced here requires an additional PCR reaction, the extra work is still worthwhile.
In PCR-based phylogenetic projects, even when the PCR reactions are successful, the products often contain significant nonspecific amplicons. Such a condition requires additional effort involving gel purification and cloning, which takes much more time than the PCR reaction itself. Our NPCL toolkit is specifically designed to solve this problem, so that normally over 90% of PCR reactions produce strong and single expected bands. Moreover, most of the primers used to date for nuclear marker sets are degenerate and thus are not suitable for direct sequencing of PCR products. Benefiting from the use of our nested PCR strategy, we introduce anchoring sequences to the ends of PCR fragments while maintaining PCR efficiency. These anchoring sequences bring the added benefit that two specific sequencing primers (Seq_F and Seq_R) can be used in all Sanger sequencing reactions.
One additional feature of our NPCL toolkit is that the average length of the NPCLs within it is 1,050 bp, a length that can easily be amplified in one PCR reaction and sequenced in both directions to allow efficient use of resources. In contrast, the average marker lengths are 920 bp for 10 NPCLs in Li et al. (2007), 760 bp for 26 NPCLs in Townsend et al. (2008), 873 bp for 22 NPCLs in Shen et al. (2011), and 470 bp for 75 NPCLs in Fong and Fujita (2011). Longer markers will provide more sites than shorter ones for equivalent money and time. This feature makes our NPCL toolkit more cost-effective than previously developed nuclear marker sets.
Phylogenetic Utility
The vertebrate NPCL toolkit we developed here shows great promise in terms of phylogenetic utility. A remarkable feature of our NPCL toolkit is that it provides 102 NPCLs with a broad range of evolutionary rates. In our demonstration, we used 30 NPCLs to resolve a family-level salamander phylogeny using both traditional concatenation analyses and a more promising species-tree analysis. However, this example does not mean that our toolkit performs well only on deep-timescale questions. Our ongoing study using this toolkit to resolve the relationships within Plethodontidae, a rapidly radiating group of salamanders, suggests that the toolkit also performs well in resolving species-level phylogenies. For many vertebrate groups in which applicable nuclear markers are limited, such as some teleosts, frogs, and salamanders, our NPCL toolkit can provide a one-stop solution for phylogenetic studies from the family level to the species level. Even for those groups in which specific nuclear marker sets have been developed, our toolkit is still worth trying, as many more loci can be easily obtained that may resolve some difficult branches.
The Toolkit Is a Good Addition to Sequence Capture Approaches
Recently, sequence capture approaches have been applied to vertebrate phylogenomics (Crawford et al. 2012; Faircloth et al. 2012; Lemmon et al. 2012; McCormack et al. 2012). These approaches begin with the selective capture of genomic regions. Briefly, fragmented gDNA is hybridized to DNA or RNA probes, either on an array or in solution. Nontargeted DNA is then washed away, and the targeted DNA is sequenced through NGS. The most promising feature of the sequence capture approach is that it can simultaneously produce hundreds to thousands of loci for tens of individuals within a relatively short time. Therefore, the sequence capture approach is considered to be much more cost-effective than the PCR-based method. According to the calculation of Lemmon et al. (2012), for a 100 taxa × 500 loci project, the cost of the sequence capture method is just 1–3.5% of that of the PCR-based method.
However, the sequence capture approach is currently too challenging for most phylogenetic researchers. Typical NGS runs (454 or Illumina) used by the sequence capture method generate 1,000,000–2,000,000,000 sequences. Storing and processing these NGS data require significant computer memory, hardware upgrades, and bioinformatic programming skills, which are unfamiliar to most phylogenetic researchers. Moreover, phylogenetic reconstruction assumes that orthologous genes are being analyzed across species. For the PCR-based method, the detection of paralogous genes is relatively straightforward. However, in the sequence capture method, the captured genomic regions comprise short conserved cores (probe regions) and long unconserved flanking sequences. Because paralogy cannot be detected until after the data are aligned, those unalignable sequences will make the detection of paralogy more difficult.

[Figure 5: line plot; x-axis, number of genes (5–30); y-axis, bootstrap support (%), 0–100; series, concatenation analyses and species tree analyses.]
FIG. 5. The effect of increasing the number of nuclear loci on resolving the basal split within salamanders. Each data point represents the mean of support values estimated from 30 randomly sampled subsets. The dashed line indicates the threshold of 95% bootstrap support. The plots show that the minimum number of nuclear loci needed to robustly resolve the basal split within salamanders is 25.
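The resampling design behind figure 5 (30 randomly sampled gene subsets per number of genes) can be sketched as follows. The locus names, the seed, and the assumption that subset sizes run from 5 to 30 in steps of 5 are illustrative choices, not details given in the text; the tree inference on each subset is not shown.

```python
import random


def sample_gene_subsets(genes, subset_size, n_subsets=30, seed=0):
    """Draw n_subsets random gene subsets of a given size without replacement."""
    rng = random.Random(seed)  # fixed seed so the design is reproducible
    return [sorted(rng.sample(genes, subset_size)) for _ in range(n_subsets)]


def mean_support(values):
    """Mean of the support values obtained from the replicate subsets."""
    return sum(values) / len(values)


genes = [f"NPCL{i:02d}" for i in range(1, 31)]  # hypothetical locus names
design = {k: sample_gene_subsets(genes, k) for k in range(5, 31, 5)}
```

Each subset would then be concatenated (or its gene trees pooled for species-tree analysis), the support for the node of interest recorded, and `mean_support` applied across the 30 replicates for each subset size.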
In fact, not every phylogenetic project will use more than 500 loci, as the sequence capture method normally does. Based on both empirical and simulation data, 20–50 loci are generally sufficient to answer many phylogenetic questions (Rokas et al. 2003; Spinks et al. 2009). This is also the number of loci that most phylogenetic studies will use. In such a situation, adopting the sequence capture method is not cost-effective because researchers need to use relatively expensive NGS sequencing and spend time learning new experimental techniques and carrying out sophisticated bioinformatic processing. Our NPCL toolkit is specially designed for such medium-scale phylogenetic projects using approximately 50 loci. Such a number of loci can be easily obtained from our 102 NPCLs. Because more than 90% of the PCR reactions generated by our toolkit can be directly sequenced, the average cost for one locus per sample is rather low. In our laboratory, generating one new sequence typically costs US$ 3 (without considering labor).
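A rough budget for a PCR-based project follows directly from the per-sequence figure quoted above. The sketch below is a back-of-the-envelope estimate only: it assumes a flat cost per new sequence, ignores labor and failed reactions, and uses a hypothetical function name.

```python
def sanger_project_cost(n_taxa, n_loci, cost_per_sequence=3.0, already_available=0):
    """Return (number of new sequences, estimated cost in US$)."""
    n_new = n_taxa * n_loci - already_available
    if n_new < 0:
        raise ValueError("more sequences available than targeted")
    return n_new, n_new * cost_per_sequence


# Salamander data set from this study: 28 taxa x 30 loci, 201 already public.
n_new, cost = sanger_project_cost(28, 30, already_available=201)
```

For the salamander data set this gives 639 new sequences and an estimate of about US$ 1,917 under the stated assumptions.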
In addition, researchers sometimes have only tiny amounts of DNA but wish to perform a multilocus phylogenetic analysis. In such a situation, the sequence capture method is difficult to implement because it normally requires DNA at the microgram level (Lemmon et al. 2012). Our NPCL toolkit can fill the gap here. Benefiting from the use of the nested PCR strategy, the sensitivity of PCR reactions in our method is extremely high. In many test experiments in our laboratory, the toolkit and protocol could produce target bands with only 5–10 ng of DNA.
Our NPCL toolkit is an alternative to the sequence capture method for the everyday work of phylogenetic researchers. Which method to choose depends on two major drivers: the amount of DNA and the expected number of loci. When DNA is limited, the better solution may be PCR; otherwise, sequence capture also works. Taking into account the money and time the two methods require, we speculate that the economic transition point from PCR to sequence capture is at approximately 100 loci. That assessment is why our toolkit includes 102 NPCL markers. Our proposal is that when using up to 100 loci, one can try our NPCL toolkit; when using > 100 loci, sequence capture should be used.
Future Directions
In this study, we used multiple genome alignments deposited in the University of California, Santa Cruz (UCSC) genome browser to identify long and conserved exons across jawed vertebrates. Benefiting from the use of a nested PCR strategy, the experimental performance of the developed NPCLs indicated that they are highly stable in all major jawed vertebrate groups. Recently, a database for mining exon and intron markers, called EvolMarkers, has been built by Li et al. (2012). Careful investigation of this database may identify many conserved exons within nonvertebrates, whose interrelationships are currently more problematic than those of vertebrates. Because the nonvertebrates constitute many distantly related groups, it may be impossible to develop a single set of PCR primers for all nonvertebrates. However, following a similar marker development strategy, multiple NPCL toolkits could be constructed for various groups of nonvertebrates, such as arthropods, echinoderms, and molluscs. In addition, because introns are flanked by conserved exons, the idea of using nested PCR for marker development could also be applied to the development of EPIC (exon-primed intron crossing) markers, which are more suitable for shallow-scale phylogenetic or phylogeographic projects.
Despite the benefits of our proposed method, it must be recognized that when handling large-scale projects, such as 200 taxa × 100 loci, the use of our toolkit and Sanger sequencing will still require significant cost, time, and labor. An alternative solution is to use NGS to replace Sanger sequencing. Recently, 454 NGS technology has been applied to sequence targeted gene regions from a pool of PCR products from different specimens (Binladen et al. 2007; Meyer et al. 2008). In such experiments, specific tagging sequences must be added to amplicons by either PCR (Binladen et al. 2007) or blunt-end ligation (Meyer et al. 2008). Therefore, if the tailing sequences of the second-round PCR primers in our NPCL toolkit are replaced by tagging sequences (for tag design, see Faircloth and Glenn 2012), all PCR products can be pooled together and sequenced with 454 NGS, which will greatly reduce the money and time cost compared with Sanger sequencing. However, parallel tagged sequencing via NGS does not circumvent the process of PCR for each individual at each locus, which may be the most onerous part of a large-scale phylogenomic project. Some promising new technologies may help to solve this problem, such as microdroplet PCR (Tewhey et al. 2009), where millions of individual PCR reactions are performed in picoliter-scale droplets simultaneously, and the 96.96 Dynamic Array by Fluidigm, which allows 96 primer combinations to be used on 96 samples (9,216 total PCR reactions) on a single PCR plate. However, there has been little research on applying NGS and new high-throughput PCR technologies to phylogenomics, so their ease of use and cost-effectiveness still need to be explored.
Summary
In conclusion, we have developed an improved method for rapidly amplifying and sequencing NPCLs that has proven to be useful and effective for molecular phylogenetic studies of vertebrates. The newly developed toolkit provides an attractive alternative to available methods for vertebrate phylogenomics.
Materials and Methods
Development of NPCL and Primer Design
Our previous study showed that nested PCR is overwhelmingly more effective than conventional PCR for obtaining target amplicons from complex genomic environments (Shen et al. 2012). However, nested PCR requires four conserved regions to design two pairs of primers (illustrated in fig. 6; yellow blocks represent the conserved regions used for primer design), which means that only relatively long exons are suitable as candidates for NPCL development with the nested PCR method. To search for long and conserved exons, we took advantage of our previous bioinformatic method, which used the multiple genome alignment data from the UCSC Genome Browser to identify conserved exons (Shen et al. 2011). Because the NPCL markers are to be used in vertebrates, we focused only on those multiple genome alignments that include at least six species: Danio rerio (zebrafish), Silurana tropicalis (frog), Anolis carolinensis (lizard), Gallus gallus (chicken), Mus musculus (mouse), and Homo sapiens (human). The alignments of candidate exons had to meet two criteria: length of more than 700 bp and pairwise similarity ranging from 35% to 90%. The detailed bioinformatic pipeline has been described elsewhere (Shen et al. 2011). In addition to using multiple genome alignments to screen NPCL candidates, we also manually searched for nuclear genes that were used previously (Murphy et al. 2001; Li et al. 2007; Townsend et al. 2008; Wright et al. 2008; Zhou et al. 2011; Song et al. 2012) in the ENSEMBL database to check whether they contain large and appropriately conserved exons.
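The two screening criteria (alignment length above 700 bp and pairwise similarity between 35% and 90%) can be expressed as a simple filter. This sketch is not the published pipeline of Shen et al. (2011): it assumes that similarity means percent identity over columns where neither sequence has a gap and that every species pair must fall within the range.

```python
from itertools import combinations


def pairwise_identity(a, b):
    """Percent identity over alignment columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def passes_exon_filter(alignment, min_length=700, min_sim=35.0, max_sim=90.0):
    """Check one candidate exon alignment, given as {species: aligned sequence}."""
    seqs = list(alignment.values())
    if len(seqs) < 2:
        return False
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("aligned sequences must have equal length")
    if length <= min_length:
        return False
    return all(
        min_sim <= pairwise_identity(a, b) <= max_sim
        for a, b in combinations(seqs, 2)
    )
```

The upper bound removes exons that are too conserved to be phylogenetically informative, while the lower bound keeps enough conservation for primer design.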
As a result we assembled a total of 305 NPCL candidatealignments of which 120 contained the appropriate numberof conserved blocks and used these to design nested PCRprimers To increase the success rates of our NPCL markers inamniotes we manually added a turtle sequence (Chrysemyspicta bellii) to each of the candidate alignments using datadownloaded from the ENSEMBL database A total number of480 primers were designed for the 120 NPCL candidatesBriefly the first-round PCR primers are only used to enrichtarget regions from genomic environments and not to obtaintarget amplicons so the degeneracy of these primers is nor-mally high to increase reaction sensitivity the second-roundPCR primers are used to obtain target amplicons so the
[Figure 6 (flowchart; panel text recovered from the image). Laboratory protocol:
(i) First PCR with primers F1 and R1, using gDNA as template. Enrich the target region from the complex genomic environment with one pair of highly degenerate primers. PCR is performed with 50–100 ng DNA in a 25 µl reaction. Cycling conditions: an initial denaturation step of 4 min at 94 °C, followed by 35 cycles of 94 °C for 45 s, 45 °C for 40 s, and 72 °C for 2 min, and a final extension at 72 °C for 10 min.
(ii) Second PCR with tailed primers F2 and R2, using the first PCR product as template. Specifically amplify the target region from the first-round PCR products with one pair of tailed, low-degeneracy primers. PCR is performed with 1 µl of first-round PCR product in a 25 µl reaction. Cycling conditions: an initial denaturation step of 4 min at 94 °C, followed by 35 cycles of 94 °C for 45 s, 50 °C for 40 s, and 72 °C for 90 s, and a final extension at 72 °C for 10 min.
(iii) PCR evaluation and sequencing. Evaluate the agarose gel electrophoresis results. Single and strong band (normally >90%): clean up with ExoI and FastAP (25 µl PCR product is cleaned with 2 U ExoI and 0.4 U FastAP; 37 °C for 30 min, then 80 °C for 15 min), then sequence directly with the general sequencing primers Seq_F and Seq_R; a Sanger sequencing reaction is performed with 0.5 µl BigDye and 1 µl cleanup PCR product. Otherwise: gel cutting or cloning, then sequencing.]
FIG. 6. Schematic representation of the experimental protocol for using our NPCL toolkit. Note that for each NPCL, nested PCR primers are designed on four short conserved blocks flanking the target region.
Shen et al., doi:10.1093/molbev/mst122, MBE. Downloaded from http://mbe.oxfordjournals.org/ at Vanderbilt University - Massey Law Library on March 18, 2015.
degeneracy of these primers is lower to increase reaction specificity. Our previous study showed that the nested PCR method often produces strong and single amplicon bands (Shen et al. 2012). To facilitate the next-step direct sequencing, we added a tail (5′-AGGGTTTTCCCAGTCACGAC-3′) to the 5′-end of all second-round forward primers and a tail (5′-AGATAACAATTTCACACAGG-3′) to the 5′-end of all second-round reverse primers. These tail sequences provide two unique anchoring sites for direct sequencing from cleaned PCR products. In our pilot experiments, adding the tail sequences to primers did not affect the efficiency of the second-round PCR.
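The primer design logic can be illustrated with a short Python sketch. It computes the degeneracy of a primer written with IUPAC ambiguity codes (the number of distinct oligonucleotides the primer represents) and prepends the two tail sequences given in the text to second-round primers. The helper names and the example primer are hypothetical; only the two tail sequences come from the paper.

```python
from math import prod

# Tail sequences quoted in the text (5' to 3').
FORWARD_TAIL = "AGGGTTTTCCCAGTCACGAC"
REVERSE_TAIL = "AGATAACAATTTCACACAGG"

# Number of bases each IUPAC nucleotide code can stand for.
IUPAC_CHOICES = {
    "A": 1, "C": 1, "G": 1, "T": 1,
    "R": 2, "Y": 2, "S": 2, "W": 2, "K": 2, "M": 2,
    "B": 3, "D": 3, "H": 3, "V": 3,
    "N": 4,
}


def degeneracy(primer: str) -> int:
    """Number of distinct sequences represented by a degenerate primer."""
    return prod(IUPAC_CHOICES[base] for base in primer.upper())


def add_tail(primer: str, direction: str) -> str:
    """Prepend the sequencing tail to the 5' end of a second-round primer."""
    tails = {"forward": FORWARD_TAIL, "reverse": REVERSE_TAIL}
    if direction not in tails:
        raise ValueError("direction must be 'forward' or 'reverse'")
    return tails[direction] + primer


# Hypothetical second-round forward primer, for illustration only.
example = add_tail("GCNGTRAA", "forward")
```

Under this scheme a first-round primer would be expected to have a higher `degeneracy` value than the corresponding second-round primer, matching the sensitivity-then-specificity design described above.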
Experimental Testing for Candidate Markers in 16 Jawed Vertebrates
To test the experimental performance of our newly designed NPCL markers, we selected 16 taxa representing nine major jawed vertebrate lineages: Chondrichthyes (Sphyrna lewini), Actinopterygii (Lepisosteus oculatus and Pangasius sutchi), Dipnoi (Protopterus annectens), Lissamphibia (Ichthyophis bannanicus, Batrachuperus yenyuanensis, and Rana nigromaculata), Mammalia (Mus musculus and Sus scrofa domestica), Testudines (Trionyx sinensis and Podocnemis unifilis), Aves (Struthio camelus and Zosterops japonica), Crocodylia (Crocodylus siamensis), and Squamata (Hemidactylus bowringii and Naja naja atra). Total genomic DNA was extracted from ethanol-preserved tissues (liver or muscle) using the standard salt extraction protocol. All extracted genomic DNAs were diluted to a concentration of 50 ng/µl with 1× TE and stored at −20 °C before PCR amplification.
All 120 NPCL markers were tested with a two-round PCR strategy (nested PCR). The first-round PCR was performed in a 25 µl reaction containing 1–2 µl template DNA (50–100 ng), with final concentrations of 1× PCR buffer, 200 µM dNTP, 400 nM of each forward and reverse first-round primer, and 1.25 U Taq polymerase (TransTaq High Fidelity; TransGen, Beijing). The cycling conditions of the first-round PCR were as follows: an initial denaturation step of 4 min at 94 °C, followed by 35 cycles of a 45 s denaturation at 94 °C, a 40 s annealing at 45 °C, and a 2 min extension at 72 °C, followed by a final 10 min extension at 72 °C. The second-round PCR was also performed in a 25 µl reaction, containing 1 µl of the first-round PCR product (without dilution) and final concentrations of 1× PCR buffer, 200 µM dNTP, 400 nM of each forward and reverse second-round primer, and 1.25 U Taq polymerase. The cycling conditions of the second-round PCR were as follows: an initial denaturation step of 4 min at 94 °C, followed by 35 cycles of a 45 s denaturation at 94 °C, a 40 s annealing at 50 °C, and a 90 s extension at 72 °C, followed by a final 10 min extension at 72 °C.
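For planning bench time, the two cycling programs can be written down as data and their programmed hold times summed. This is a minimal Python sketch; the class and field names are hypothetical, and the totals ignore ramping between temperatures, so actual run times on a thermocycler will be longer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CyclingProgram:
    annealing_temp_c: int          # 45 (first round) or 50 (second round)
    extension_s: int               # 120 s (first round) or 90 s (second round)
    cycles: int = 35
    initial_denaturation_s: int = 4 * 60   # 4 min at 94 degrees C
    denaturation_s: int = 45               # per cycle, at 94 degrees C
    annealing_s: int = 40                  # per cycle
    final_extension_s: int = 10 * 60       # 10 min at 72 degrees C

    def hold_time_s(self) -> int:
        """Sum of programmed hold times in seconds (ramp times excluded)."""
        per_cycle = self.denaturation_s + self.annealing_s + self.extension_s
        return (self.initial_denaturation_s
                + self.cycles * per_cycle
                + self.final_extension_s)


FIRST_ROUND = CyclingProgram(annealing_temp_c=45, extension_s=120)
SECOND_ROUND = CyclingProgram(annealing_temp_c=50, extension_s=90)
```

With the conditions given in the text, the first round holds for 8,015 s and the second for 6,965 s, about 250 min in total for the nested protocol before ramping is counted.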
One microliter of the second-round PCR products was analyzed on a 1.0% TAE agarose gel. An NPCL marker was considered successful if more than 8 of the 16 tested taxa produced target amplicon bands. On this basis, 102 out of 120 tested NPCL markers were successful. The nested-PCR primers for the 102 NPCL markers can be found in supplementary table S1, Supplementary Material online. If the PCR products contained significant nonspecific amplicon bands (normally <10%), they needed further processing, for example, standard gel cutting or cloning. If the PCR reactions produced single amplicon bands (normally >90%), they were cleaned with ExoFAP treatment: 2 U ExoI and 0.4 U FastAP (all Fermentas) were added to the PCR tube and incubated for 30 min at 37 °C and 15 min at 80 °C. The cleanup PCR reactions can be directly used as templates for Sanger sequencing. According to our experimental designs, all PCR fragments can be sequenced from both ends with the two universal sequencing primers, Seq_F (5′-AGGGTTTTCCCAGTCACGAC-3′) and Seq_R (5′-AGATAACAATTTCACACAGG-3′). A typical Sanger sequencing reaction in our laboratory consumes 0.5 µl BigDye and 1 µl cleanup PCR product. The primer design strategy, the laboratory protocol for the nested PCR method, and the pretreatment of PCR products before Sanger sequencing are illustrated in figure 6.
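The success criterion for a marker (target bands in more than 8 of the 16 tested taxa) is easy to apply to a table of gel results. The sketch below is illustrative Python with made-up marker names and band patterns; it is not the authors' scoring script.

```python
def successful_markers(band_results: dict, min_taxa_exclusive: int = 8) -> list:
    """Return markers with target bands in MORE than `min_taxa_exclusive` taxa.

    `band_results` maps a marker name to a list of booleans, one per tested
    taxon (True means a target amplicon band was observed on the gel).
    """
    return [marker for marker, bands in band_results.items()
            if sum(bands) > min_taxa_exclusive]


# Hypothetical gel scores for three markers across 16 taxa.
gel_scores = {
    "marker_A": [True] * 16,               # 16 of 16: successful
    "marker_B": [True] * 9 + [False] * 7,  # 9 of 16: successful
    "marker_C": [True] * 8 + [False] * 8,  # 8 of 16: not successful
}
passed = successful_markers(gel_scores)
```

Note that exactly 8 of 16 does not pass, because the text requires more than 8. By the same arithmetic, 102 successful markers out of 120 tested corresponds to a marker-level success rate of 85%.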
Calculation of Relative Evolutionary Rate of 102 NPCLs
The rate multipliers (m) across partitions estimated in MrBayes 3.2 (Ronquist et al. 2012) are used as relative evolutionary rates. To calculate these parameters, alignments for each NPCL were prepared for 12 species: Homo sapiens, Macaca mulatta, Mus musculus, Rattus norvegicus, Gallus gallus, Meleagris gallopavo, Chrysemys picta bellii, Anolis carolinensis, Silurana tropicalis, Tetraodon nigroviridis, Takifugu rubripes, and Danio rerio. Because genome data are available for the 12 species, we did not generate any new data. The 102 NPCL alignments were then combined and subjected to MrBayes analyses partitioned by genes. Each gene was assigned a separate GTR + Γ + I model, and all model parameters were unlinked. Two Markov chain Monte Carlo (MCMC) runs were performed with one cold chain and three heated chains (temperature set to 0.1) for 50 million generations and sampled every 1,000 generations. The rate multiplier for each gene was estimated using Tracer version 1.4 after discarding the first 50% of the generations. All evolutionary rates were normalized by dividing by the maximum value of the obtained rates.
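The final normalization step amounts to dividing every rate multiplier by the largest one, so that the fastest locus has a relative rate of 1. A minimal Python sketch follows; the gene names and values are invented for illustration and are not results from the paper.

```python
def normalize_rates(rate_multipliers: dict) -> dict:
    """Scale rate multipliers so that the fastest gene has a relative rate of 1."""
    max_rate = max(rate_multipliers.values())
    return {gene: rate / max_rate for gene, rate in rate_multipliers.items()}


# Hypothetical posterior mean rate multipliers for three genes.
relative = normalize_rates({"gene_1": 0.5, "gene_2": 2.0, "gene_3": 1.0})
```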
Gene and Taxon Sampling for Investigating Higher Level Salamander Relationships
To test the utility of our NPCL toolkit in a real case, we selected 19 salamander taxa representing all 10 salamander families and 9 outgroup taxa to investigate the family-level relationships of salamanders (supplementary table S2, Supplementary Material online). For gene sampling, we randomly selected 30 NPCL markers whose PCR success rates were more than 90% in the 16 previously tested vertebrates. Among the target 840 sequences (30 markers for 28 taxa), 201 were available in public databases (NCBI, UCSC, and ENSEMBL), whereas the remaining 639 sequences needed to be generated de novo. The experimental procedure was as described earlier. All obtained sequences were examined by checking for the presence of premature stop codons (pseudogene) and by BlastX searches against the nonredundant
protein sequences (nr) to confirm that they were our target genes. The NPCLs, species, and accession numbers for the newly obtained sequences are listed in supplementary table S2, Supplementary Material online.
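The sampling arithmetic and the stop-codon screen can be sketched in a few lines of Python. This is an illustration only: it assumes the standard nuclear genetic code, an ungapped nucleotide sequence, and a known reading frame, and the function name is hypothetical. The BlastX check described in the text is not reproduced here.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard nuclear genetic code


def in_frame_stops(sequence: str, frame: int = 0) -> list:
    """Return 0-based codon indices of stop codons in the given reading frame.

    For an internal exon fragment, any in-frame stop codon is a warning sign
    that the sequence may come from a pseudogene or contain an error.
    """
    seq = sequence.upper()
    codons = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
    return [index for index, codon in enumerate(codons) if codon in STOP_CODONS]


# Sampling arithmetic from the text.
n_markers, n_taxa, n_public = 30, 28, 201
n_target = n_markers * n_taxa        # 840 target sequences
n_de_novo = n_target - n_public      # 639 sequences to generate
```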
Phylogenetic Analyses
Alignments of all 30 NPCL markers were conducted using the G-INS-i method from MAFFT (Katoh et al. 2005) under the default settings according to their translated amino acid sequences, then refined by eye. All 30 refined alignments were combined into a concatenated data set (27,834 bp).
For the concatenated data set, we manually defined five partitioning strategies: 2 partitions (one for codon positions 1 and 2 and one for codon position 3); 3 partitions (one partition for each codon position); 30 partitions (one partition for each gene); 60 partitions (one for codon positions 1 + 2 and one for codon position 3 in each of the 30 genes); and 90 partitions (one partition for each codon position in each of the 30 genes). Comparisons of the five partitioning strategies and selections of corresponding nucleotide substitution models were conducted under the Bayesian information criterion implemented in PartitionFinder (Lanfear et al. 2012). The 3-partition scheme (one partition for each codon position) was chosen as the best-fitting partitioning strategy, and all 3 partitions favored the GTR + Γ + I model.
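As an illustration of the codon-position schemes, the Python sketch below writes partition definitions in the RAxML partition-file syntax, where a `\3` suffix selects every third site. It assumes that the 27,834 bp concatenated alignment is in frame from the first site to the last, which holds only if every gene alignment has a length divisible by three; the partition names are hypothetical.

```python
def codon_partitions(n_sites: int, merge_first_two: bool = False) -> list:
    """Return RAxML-style partition lines for codon-position partitioning.

    merge_first_two=False gives the 3-partition scheme (one per codon position);
    merge_first_two=True gives the 2-partition scheme (positions 1+2 vs. 3).
    """
    if n_sites % 3 != 0:
        raise ValueError("alignment length must be a multiple of 3")
    positions = [f"{start}-{n_sites}\\3" for start in (1, 2, 3)]
    if merge_first_two:
        return [
            f"DNA, codon12 = {positions[0]}, {positions[1]}",
            f"DNA, codon3 = {positions[2]}",
        ]
    return [f"DNA, codon{i} = {p}" for i, p in enumerate(positions, start=1)]


three_partition_scheme = codon_partitions(27834)
print("\n".join(three_partition_scheme))
```

The gene-based schemes (30, 60, and 90 partitions) would additionally need the start and end coordinates of each gene in the concatenated alignment, which are not given in the text.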
The concatenated data set was separately analyzed with both ML and Bayesian inference (BI) methods under the 3-partition scheme. Partitioned ML analyses were implemented using RAxML version 7.2.6 (Stamatakis 2006) with the GTR + Γ + I model assigned for each partition. A search that combined 100 separate ML searches was applied to find the optimal tree, and branch support for each node was evaluated with 500 standard bootstrapping replicates (-f d -b 500 option) implemented in RAxML. The partitioned BI was conducted using MrBayes 3.2 (Ronquist et al. 2012). All model parameters were unlinked. Two MCMC runs were performed with one cold chain and three heated chains (temperature set to 0.1) for 60 million generations and sampled every 1,000 generations. The chain stationarity was visualized by plotting −ln L against the generation number using Tracer version 1.4, and the first 50% of generations were discarded. Topologies and posterior probabilities were estimated from the remaining generations. Two runs for each analysis were compared for congruence.
We also performed Bayesian phylogenetic analyses under a mixture model, CAT + GTR + Γ4, in PhyloBayes 3.3 (Lartillot et al. 2009) with two independent MCMC runs. Each run was performed for 10,000 cycles and sampled every cycle. Stationarity was reached when the largest discrepancy (maxdiff) was less than 0.1 between two independent runs. The first 5,000 trees in each MCMC run were discarded. The remaining 10,000 trees of the two runs were sampled every 5 trees to generate a 50% majority-rule posterior consensus tree.
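The burn-in and thinning figures for the PhyloBayes runs can be checked with a short calculation. The Python sketch below is only bookkeeping under the numbers stated in the text; the function name is hypothetical and it does not call PhyloBayes.

```python
def pooled_sample_sizes(cycles: int, burnin: int, runs: int, thin: int) -> tuple:
    """Return (trees pooled after burn-in, trees kept after thinning)."""
    kept_per_run = cycles - burnin
    pooled = kept_per_run * runs
    thinned = len(range(0, pooled, thin))  # take every `thin`-th pooled tree
    return pooled, thinned


pooled, thinned = pooled_sample_sizes(cycles=10_000, burnin=5_000, runs=2, thin=5)
```

With 10,000 cycles per run, a burn-in of 5,000, two runs, and sampling every 5 trees, 10,000 trees are pooled and 2,000 contribute to the consensus.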
Species tree estimation was conducted using the pseudo-ML approach in the program MP-EST (Liu et al. 2010) under the coalescent model. Briefly, the gene trees for 30 NPCL markers were reconstructed with ML under the GTR + Γ8 model using PhyML 3.0 (Guindon et al. 2010). The resulting 30 gene trees were rooted with an outgroup (Chrysemys picta bellii) and used to generate an MP-EST species tree using the program MP-EST. The robustness of the species tree was evaluated with nonparametric bootstrapping of 500 replicates.
Alternative phylogenetic hypotheses were tested based on the 30-gene data set. We first calculated sitewise log likelihoods of alternative trees using RAxML (-f g) under the GTR + Γ + I model. Then, the site log-likelihood file was used to estimate P values for each alternative tree in the CONSEL program (Shimodaira and Hasegawa 2001) using the Kishino–Hasegawa test (KH) (Kishino and Hasegawa 1989), the approximately unbiased test (AU) (Shimodaira 2002), and the RELL bootstrap proportion test (BP).
Supplementary Material
Supplementary tables S1 and S2 and figures S1 and S2 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).
Acknowledgments
The authors are grateful to David Wake and Carol Spencer of the Museum of Vertebrate Zoology at the University of California, Berkeley, for tissue loans. David Wake provided many useful comments on the manuscript. This work was supported by National Natural Science Foundation of China grants (31172075 and 30900136) and the National Science Fund for Excellent Young Scholars (no. pending).
References
Binladen J, Gilbert MTP, Bollback JP, Panitz F, Bendixen C, Nielsen R, Willerslev E. 2007. The use of coded PCR primers enables high-throughput sequencing of multiple homolog amplification products by 454 parallel sequencing. PLoS One 2:e197.
Crawford NG, Faircloth BC, McCormack JE, Brumfield RT, Winker K, Glenn TC. 2012. More than 1000 ultraconserved elements provide evidence that turtles are the sister group to archosaurs. Biol Lett. 8:783–786.
Delsuc F, Brinkmann H, Philippe H. 2005. Phylogenomics and the reconstruction of the tree of life. Nat Rev Genet. 6:361–375.
Duellman WE, Trueb L. 1994. Biology of amphibians. Baltimore (MD): Johns Hopkins University Press.
Dunn CW, Hejnol A, Matus DQ, et al. (18 co-authors). 2008. Broad phylogenomic sampling improves resolution of the animal tree of life. Nature 452:745–749.
Faircloth BC, Glenn TC. 2012. Not all sequence tags are created equal: designing and validating sequence identification tags robust to indels. PLoS One 7:e42543.
Faircloth BC, McCormack JE, Crawford NG, Brumfield RT, Glenn TC. 2012. Ultraconserved elements anchor thousands of genetic markers for target enrichment spanning multiple evolutionary timescales. Syst Biol. 61:717–726.
Fong JJ, Brown JM, Fujita MK, Boussau B. 2012. A phylogenomic approach to vertebrate phylogeny supports a turtle-archosaur affinity and a possible paraphyletic Lissamphibia. PLoS One 7:e48990.
Fong JJ, Fujita MK. 2011. Evaluating phylogenetic informativeness and data-type usage for new protein-coding genes across Vertebrata. Mol Phylogenet Evol. 61:300–307.
Frost DR, Grant T, Faivovich J, et al. (18 co-authors). 2006. The amphibian tree of life. Bull Am Mus Nat Hist. 297:8–370.
Gao KQ, Shubin NH. 2001. Late Jurassic salamanders from northern China. Nature 410:574–577.
Guindon S, Dufayard J-F, Lefort V, Anisimova M, Hordijk W, Gascuel O. 2010. New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Syst Biol. 59:307–321.
Hugall AF, Foster R, Lee MSY. 2007. Calibration choice, rate smoothing, and the pattern of tetrapod diversification according to the long nuclear gene RAG-1. Syst Biol. 56:543–563.
Inoue JG, Miya M, Lam K, Tay BH, Danks JA, Bell J, Walker TI, Venkatesh B. 2010. Evolutionary origin and phylogeny of the modern holocephalans (Chondrichthyes: Chimaeriformes): a mitogenomic perspective. Mol Biol Evol. 27:2576–2586.
Katoh K, Kuma K, Toh H, Miyata T. 2005. MAFFT version 5: improvement in accuracy of multiple sequence alignment. Nucleic Acids Res. 33:511–518.
Kishino H, Hasegawa M. 1989. Evaluation of the maximum likelihood estimate of the evolutionary tree topologies from DNA sequence data, and the branching order in Hominoidea. J Mol Evol. 29:170–179.
Künstner A, Wolf JBW, Backström N, et al. (13 co-authors). 2010. Comparative genomics based on massive parallel transcriptome sequencing reveals patterns of substitution and selection across 10 bird species. Mol Ecol. 19:266–276.
Lanfear R, Calcott B, Ho SYW, Guindon S. 2012. PartitionFinder: combined selection of partitioning schemes and substitution models for phylogenetic analyses. Mol Biol Evol. 29:1695–1701.
Lartillot N, Lepage T, Blanquart S. 2009. PhyloBayes 3: a Bayesian software package for phylogenetic reconstruction and molecular dating. Bioinformatics 25:2286–2288.
Lemmon AR, Brown JM, Stanger-Hall K, Lemmon EM. 2009. The effect of ambiguous data on phylogenetic estimates obtained by maximum likelihood and Bayesian inference. Syst Biol. 58:130–145.
Lemmon AR, Emme SA, Lemmon EM. 2012. Anchored hybrid enrichment for massively high-throughput phylogenomics. Syst Biol. 61:727–744.
Li C, Ortí G, Zhang G, Lu G. 2007. A practical approach to phylogenomics: the phylogeny of ray-finned fish (Actinopterygii) as a case study. BMC Evol Biol. 7:44.
Li C, Riethoven JJ, Naylor GJ. 2012. EvolMarkers: a database for mining exon and intron markers for evolution, ecology and conservation studies. Mol Ecol Resour. 12:967–971.
Liu L, Yu L, Edwards SV. 2010. A maximum pseudo-likelihood approach for estimating species trees under the coalescent model. BMC Evol Biol. 10:302.
McCormack JE, Faircloth BC, Crawford NG, Gowaty PA, Brumfield RT, Glenn TC. 2012. Ultraconserved elements are novel phylogenomic markers that resolve placental mammal phylogeny when combined with species-tree analysis. Genome Res. 22:746–754.
McCormack JE, Hird SM, Zellmer AJ, Carstens BC, Brumfield RT. 2013. Applications of next-generation sequencing to phylogeography and phylogenetics. Mol Phylogenet Evol. 66:526–538.
Meyer A, Van de Peer Y. 2005. From 2R to 3R: evidence for a fish-specific genome duplication (FSGD). Bioessays 27:937–945.
Meyer M, Stenzel U, Hofreiter M. 2008. Parallel tagged sequencing on the 454 platform. Nat Protoc. 3:267–278.
Murphy WJ, Eizirik E, Johnson WE, Zhang YP, Ryder OA, O'Brien SJ. 2001. Molecular phylogenetics and the origins of placental mammals. Nature 409:614–618.
Philippe H, Derelle R, Lopez P, et al. (20 co-authors). 2009. Phylogenomics revives traditional views on deep animal relationships. Curr Biol. 19:706–712.
Philippe H, Telford M. 2006. Large-scale sequencing and the new animal phylogeny. Trends Ecol Evol. 21:614–620.
Portik DM, Wood PL, Grismer JL, Stanley EL, Jackman TR. 2011. Identification of 104 rapidly-evolving nuclear protein-coding markers for amplification across scaled reptiles using genomic resources. Conserv Genet Resour. 4:1–10.
Pyron RA, Wiens JJ. 2011. A large-scale phylogeny of Amphibia including over 2800 species, and a revised classification of extant frogs, salamanders, and caecilians. Mol Phylogenet Evol. 61:543–583.
Roelants K, Gower DJ, Wilkinson M, Loader SP, Biju SD, Guillaume K, Moriau L, Bossuyt F. 2007. Global patterns of diversification in the history of modern amphibians. Proc Natl Acad Sci U S A. 104:887–892.
Rokas A, Williams BL, King N, Carroll SB. 2003. Genome-scale approaches to resolving incongruence in molecular phylogenies. Nature 425:798–804.
Ronquist F, Teslenko M, Van der Mark P, Ayres DL, Darling A, Höhna S, Larget B, Liu L, Suchard MA, Huelsenbeck JP. 2012. MrBayes 3.2: efficient Bayesian phylogenetic inference and model choice across a large model space. Syst Biol. 61:539–542.
Roure B, Baurain D, Philippe H. 2013. Impact of missing data on phylogenies inferred from empirical phylogenomic data sets. Mol Biol Evol. 30:197–214.
San Mauro D. 2010. A multilocus timescale for the origin of extant amphibians. Mol Phylogenet Evol. 56:554–561.
San Mauro D, Vences M, Alcobendas M, Zardoya R, Meyer A. 2005. Initial diversification of living amphibians predated the breakup of Pangaea. Am Nat. 165:590–599.
Shen XX, Liang D, Wen JZ, Zhang P. 2011. Multiple genome alignments facilitate development of NPCL markers: a case study of tetrapod phylogeny focusing on the position of turtles. Mol Biol Evol. 28:3237–3252.
Shen XX, Liang D, Zhang P. 2012. The development of three long universal nuclear protein-coding locus markers and their application to osteichthyan phylogenetics with nested PCR. PLoS One 7:e39256.
Shimodaira H. 2002. An approximately unbiased test of phylogenetic tree selection. Syst Biol. 51:492–508.
Shimodaira H, Hasegawa M. 2001. CONSEL: for assessing the confidence of phylogenetic tree selection. Bioinformatics 17:1246–1247.
Song S, Liu L, Edwards SV, Wu S. 2012. Resolving conflict in eutherian mammal phylogeny using phylogenomics and the multispecies coalescent model. Proc Natl Acad Sci U S A. 109:14942–14947.
Spinks PQ, Thomson RC, Barley AJ, Newman CE, Shaffer HB. 2010. Testing avian, squamate, and mammalian nuclear markers for cross amplification in turtles. Conserv Genet Resour. 2:127–129.
Spinks PQ, Thomson RC, Lovely GA, Shaffer HB. 2009. Assessing what is needed to resolve a molecular phylogeny: simulations and empirical data from Emydid turtles. BMC Evol Biol. 9:56.
Stamatakis A. 2006. RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics 22:2688–2690.
Tewhey R, Warner JB, Nakano M, Libby B, Medkova M, David PH, Kotsopoulos SK, Samuels ML, Hutchison JB, Larson JW. 2009. Microdroplet-based PCR enrichment for large-scale targeted sequencing. Nat Biotechnol. 27:1025–1031.
Thomson RC, Shedlock AM, Edwards SV, Shaffer HB. 2008. Developing markers for multilocus phylogenetics in non-model organisms: a test case with turtles. Mol Phylogenet Evol. 49:514–525.
Thomson RC, Wang IJ, Johnson JR. 2010. Genome-enabled development of DNA markers for ecology, evolution and conservation. Mol Ecol. 19:2184–2195.
Townsend TM, Alegre RE, Kelley ST, Wiens JJ, Reeder TW. 2008. Rapid development of multiple nuclear loci for phylogenetic analysis using genomic resources: an example from squamate reptiles. Mol Phylogenet Evol. 47:129–142.
Weisrock DW, Harmon LJ, Larson A. 2005. Resolving deep phylogenetic relationships in salamanders: analyses of mitochondrial and nuclear genomic DNA. Syst Biol. 54:758–777.
Wiens J, Bonett R, Chippindale P. 2005. Ontogeny discombobulates phylogeny: paedomorphosis and higher-level salamander relationships. Syst Biol. 54:91–110.
Wright TF, Schirtzinger EE, Matsumoto T, et al. (11 co-authors). 2008. A multilocus molecular phylogeny of the parrots (Psittaciformes): support for a Gondwanan origin during the Cretaceous. Mol Biol Evol. 25:2141–2156.
Zhang P, Wake DB. 2009. Higher-level salamander relationships and divergence dates inferred from complete mitochondrial genomes. Mol Phylogenet Evol. 53:492–508.
Zhang P, Zhou H, Chen YQ, Liu YF, Qu LH. 2005. Mitogenomic perspectives on the origin and phylogeny of living amphibians. Syst Biol. 54:391–400.
Zhou XM, Xu SX, Zhang P, Yang G. 2011. Developing a series of conservative anchor markers and their application to phylogenomics of Laurasiatherian mammals. Mol Ecol Resour. 11:134–140.
other distantly related groups is usually difficult. For example, Spinks et al. (2010) collected 120 nuclear markers from avian, squamate, and mammalian phylogenetic studies and evaluated their PCR performance in turtles. They found that only eight nuclear markers successfully produced single expected bands across 13 tested turtle species. In another case, Fong and Fujita (2011) developed 75 nuclear markers for vertebrate phylogenetics, but approximately 60% of the target fragments could not be obtained in three test species (two reptiles and one lissamphibian). Therefore, although the nested PCR method introduced here requires an additional PCR reaction, the extra work is still worthwhile.
In PCR-based phylogenetic projects, even when the PCR reactions are successful, the products often contain significant nonspecific amplicons. Such a condition requires additional effort involving gel purification and cloning, which takes much more time than the PCR reaction itself. Our NPCL toolkit is specifically designed to solve this problem, so that normally over 90% of PCR reactions produce strong and single expected bands. Moreover, most of the primers used to date for nuclear marker sets are degenerate and thus are not suitable for direct sequencing of PCR products. Benefiting from the use of our nested PCR strategy, we introduce anchoring sequences to the ends of PCR fragments while maintaining PCR efficiency. Such introduced anchoring sequences bring the added benefit that two specific sequencing primers (Seq_F and Seq_R) can be used in all Sanger sequencing reactions.
One additional feature of our NPCL toolkit is that the average length of the NPCLs within it is 1,050 bp, a length that can easily be amplified in one PCR reaction and sequenced in both directions, allowing efficient use of resources. In contrast, the average marker lengths are 920 bp for 10 NPCLs in Li et al. (2007), 760 bp for 26 NPCLs in Townsend et al. (2008), 873 bp for 22 NPCLs in Shen et al. (2011), and 470 bp for 75 NPCLs in Fong and Fujita (2011). Longer markers provide more sites than shorter ones for equivalent money and time. This feature makes our NPCL toolkit more cost-effective than previously developed nuclear marker sets.
Phylogenetic Utility
The vertebrate NPCL toolkit we developed here shows great promise in terms of phylogenetic utility. A remarkable feature of our NPCL toolkit is that it provides 102 NPCLs with a broad range of evolutionary rates. In our demonstration, we used 30 NPCLs to resolve a family-level salamander phylogeny using both traditional concatenation analyses and a more promising species-tree analysis. However, this example does not mean that our toolkit performs well only on deep-timescale questions. Our ongoing study using this toolkit to resolve the relationships within Plethodontidae, a rapidly radiating group of salamanders, suggests that the toolkit developed here also performs well in resolving species-level phylogenies. For many vertebrate groups in which applicable nuclear markers are limited, such as some teleosts, frogs, and salamanders, our NPCL toolkit can provide a one-stop solution for phylogenetic studies from the family level to the species level. Even for those groups in which specific nuclear marker sets have been developed, our toolkit is still worth trying, as many more loci can be easily obtained that may resolve some difficult branches.
The Toolkit Is a Good Addition to Sequence Capture Approaches
Recently, sequence capture approaches have been applied to vertebrate phylogenomics (Crawford et al. 2012; Faircloth et al. 2012; Lemmon et al. 2012; McCormack et al. 2012). These approaches begin with the selective capture of genomic regions. Briefly, fragmented gDNA is hybridized to DNA or RNA probes, either on an array or in solution. Nontargeted DNA is then washed away, and the targeted DNA is sequenced through NGS. The most promising feature of the sequence capture approach is that it can simultaneously produce hundreds to thousands of loci for tens of individuals within a relatively short time. Therefore, the sequence capture approach is considered to be much more cost-effective than the PCR-based method. According to the calculation of Lemmon et al. (2012), for a 100 taxa × 500 loci project, the cost of the sequence capture method is just 1–3.5% of that of the PCR-based method.
However, the sequence capture approach is currently too challenging for most phylogenetic researchers. Typical NGS runs (454 or Illumina) used by the sequence capture method generate 1,000,000–2,000,000,000 sequences. Storing and processing these NGS data require significant computer memory, hardware upgrades, and bioinformatic programming skills, which are often unfamiliar to most phylogenetic researchers. Moreover, phylogenetic reconstruction assumes that orthologous genes are being analyzed across species. For the PCR-based method, the detection of paralogous genes is relatively straightforward. However, in the sequence capture method, the captured genomic regions comprise short conserved cores (probe regions) and long unconserved flanking
[Figure 5: plot of bootstrap support (%) (y axis, 0 to 100) against number of genes (x axis, 1 to 30), with one series for concatenation analyses and one for species tree analyses.]
FIG. 5. The effect of increasing the number of nuclear loci on resolving the basal split within salamanders. Each data point represents the mean of support values estimated from 30 randomly sampled subsets. The dashed line indicates the threshold of 95% bootstrap support values. The statistical plots show that the minimum number of nuclear loci needed to robustly resolve the basal split within salamanders is 25.
sequences. Because paralogy cannot be detected until after the data are aligned, those unalignable sequences will make the detection of paralogy more difficult.
In fact, not every phylogenetic project will use more than 500 loci, as the sequence capture method normally does. Based on both empirical and simulation data, 20–50 loci are generally sufficient to answer many phylogenetic questions (Rokas et al. 2003; Spinks et al. 2009). This is also the number of loci that most phylogenetic studies will use. In such a situation, adopting the sequence capture method is not cost-effective, because researchers need to use relatively expensive NGS sequencing and spend time learning new experimental techniques and carrying out sophisticated bioinformatic processing. Our NPCL toolkit is specially designed for such medium-scale phylogenetic projects using approximately 50 loci. Such a number of loci can easily be obtained with our 102 NPCLs. Because more than 90% of the PCR reactions generated by our toolkit can be directly sequenced, the average cost for one locus per sample is rather low. In our laboratory, generating one new sequence typically costs US$ 3 (without considering labor).
In addition, researchers sometimes have only tiny amounts of DNA but wish to perform a multilocus phylogenetic analysis. In such a situation, the sequence capture method is difficult to implement because it normally requires DNA at the microgram level (Lemmon et al. 2012). Our NPCL toolkit can fill the gap here. Benefiting from the use of the nested PCR strategy, the sensitivity of PCR reactions in our method is extremely high. In many test experiments in our laboratory, the toolkit and protocol could produce target bands with only 5–10 ng of DNA.
Our NPCL toolkit is an alternative to the sequence capture method for the everyday work of phylogenetic researchers. Which method to choose depends on two major drivers: the amount of DNA and the expected number of loci. When DNA is limited, the better solution may be PCR; otherwise, sequence capture also works. Taking into account the money and time the two methods require, we speculate that the economic transition point from PCR to sequence capture is at approximately 100 loci. That assessment is why our toolkit includes 102 NPCL markers. Our proposal is that when using ≤100 loci, one can try our NPCL toolkit; when using >100 loci, sequence capture should be used.
Future Directions
In this study, we used multiple genome alignments deposited in the University of California–Santa Cruz (UCSC) genome browser to identify long and conserved exons across jawed vertebrates. Benefiting from the use of a nested PCR strategy, the experimental performance of the developed NPCLs indicated that they are highly stable in all major jawed vertebrate groups. Recently, a database for mining exon and intron markers, called EvolMarkers, has been built by Li et al. (2012). Careful investigation of this database may identify many conserved exons within nonvertebrates, whose interrelationships are currently more problematic than those of vertebrates. Because the nonvertebrates constitute many distantly related groups, it may be impossible to develop a single set of PCR primers for all nonvertebrates. However, following a similar marker development strategy, multiple NPCL toolkits could be constructed for various groups of nonvertebrates, such as arthropods, echinoderms, and molluscs. In addition, because introns are flanked by conserved exons, the idea of using nested PCR for marker development could also be applied to the development of EPIC (exon-primed intron crossing) markers, which are more suitable for shallow-scale phylogenetic or phylogeographic projects.
Despite the benefits of our proposed method, it must be recognized that when handling large-scale projects, such as 200 taxa × 100 loci, the use of our toolkit and Sanger sequencing will still require significant cost, time, and labor. An alternative solution is to use NGS to replace Sanger sequencing. Recently, 454 NGS technology has been applied to sequence targeted gene regions from a pool of PCR products from different specimens (Binladen et al. 2007; Meyer et al. 2008). In such experiments, specific tagging sequences must be added to amplicons by either PCR (Binladen et al. 2007) or blunt-end ligation (Meyer et al. 2008). Therefore, if the tailing sequences of the second-round PCR primers in our NPCL toolkit are replaced by tagging sequences (for tag design, see Faircloth and Glenn 2012), all PCR products can be pooled together and sequenced with 454 NGS, which will greatly reduce the money and time cost compared with Sanger sequencing. However, parallel tagged sequencing via NGS does not circumvent the process of PCR for each individual at each locus, which may be the most onerous part of a large-scale phylogenomic project. Some promising new technologies may help to solve this problem, such as microdroplet PCR (Tewhey et al. 2009), where millions of individual PCR reactions are performed in picoliter-scale droplets simultaneously, and the 96.96 Dynamic Array by Fluidigm, which allows 96 primer combinations to be used on 96 samples (9,216 total PCR reactions) on a single PCR plate. However, there has been little research on applying NGS and new high-throughput PCR technologies to phylogenomics, so their ease of use and cost-effectiveness still need to be explored.
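The scale of such a project can be made concrete with a back-of-envelope calculation. The Python sketch below counts the PCR reactions implied by a two-round (nested) protocol and, as a rough assumption of ours rather than a claim from the text, the number of 96 × 96 arrays needed if each array holds up to 96 samples and 96 primer combinations and only one PCR round is run per array. The function names are hypothetical.

```python
import math


def nested_pcr_reactions(n_taxa: int, n_loci: int, rounds: int = 2) -> int:
    """Total PCR reactions when every taxon-locus pair needs `rounds` PCRs."""
    return n_taxa * n_loci * rounds


def arrays_needed(n_taxa: int, n_loci: int, array_size: int = 96) -> int:
    """Arrays needed if samples and primer sets are each split into blocks of 96."""
    return math.ceil(n_taxa / array_size) * math.ceil(n_loci / array_size)


total_reactions = nested_pcr_reactions(200, 100)   # 200 taxa x 100 loci
arrays = arrays_needed(200, 100)
```

For the 200 taxa × 100 loci example, this gives 20,000 target amplicons, 40,000 individual PCR reactions under the nested protocol, and six arrays under the stated assumption.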
Summary
In conclusion, we have developed an improved method for rapidly amplifying and sequencing NPCLs that has proven to be useful and effective for molecular phylogenetic studies of vertebrates. The newly developed toolkit provides an attractive alternative to available methods for vertebrate phylogenomics.
Materials and Methods
Development of NPCL and Primer Design
Our previous study showed that nested PCR is overwhelmingly more effective than conventional PCR for obtaining target amplicons from complex genomic environments (Shen et al. 2012). However, nested PCR requires four conserved regions to design two pairs of primers (illustrated in fig. 6; yellow blocks represent the conserved regions used for primer design), which means that only relatively long exons are suitable as candidates for NPCL development with the nested PCR method. To search for long and conserved exons, we took advantage of our previous bioinformatic method, which used the multiple genome alignment data from the UCSC Genome Browser to identify conserved exons (Shen et al. 2011). Because the NPCL markers are to be used in vertebrates, we focused only on those multiple genome alignments that include at least six species: Danio rerio (zebrafish), Silurana tropicalis (frog), Anolis carolinensis (lizard), Gallus gallus (chicken), Mus musculus (mouse), and Homo sapiens (human). The alignments of candidate exons had to meet two criteria: length of more than 700 bp and pairwise similarity ranging from 35% to 90%. The detailed bioinformatic pipeline has been described elsewhere (Shen et al. 2011). In addition to using multiple genome alignments to screen NPCL candidates, we also manually searched the ENSEMBL database for nuclear genes that were used previously (Murphy et al. 2001; Li et al. 2007; Townsend et al. 2008; Wright et al. 2008; Zhou et al. 2011; Song et al. 2012) to check whether they contain large and appropriately conserved exons.
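The two screening criteria for candidate exon alignments (length of more than 700 bp and pairwise similarity between 35% and 90%) can be sketched as a simple filter. This is only an illustration under stated assumptions: similarity is taken here as percent identity over ungapped columns, and every pair of sequences is required to fall within the range. The actual pipeline is the one described in Shen et al. (2011), and the function names below are hypothetical.

```python
from itertools import combinations


def pairwise_similarity(a: str, b: str) -> float:
    """Percent identity over alignment columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def passes_screen(alignment: dict, min_len: int = 700,
                  sim_range: tuple = (35.0, 90.0)) -> bool:
    """Apply the two stated criteria to one candidate exon alignment.

    alignment maps species name -> aligned sequence (all of equal length).
    Assumption: every pairwise similarity must lie within sim_range.
    """
    seqs = list(alignment.values())
    if len(seqs) < 2:
        return False
    aln_len = len(seqs[0])
    if any(len(s) != aln_len for s in seqs):
        raise ValueError("sequences in an alignment must have equal length")
    if aln_len <= min_len:           # criterion 1: more than 700 bp
        return False
    low, high = sim_range            # criterion 2: 35% to 90% similarity
    return all(low <= pairwise_similarity(a, b) <= high
               for a, b in combinations(seqs, 2))
```

The upper bound excludes exons that are too conserved to be informative, and the lower bound excludes exons too divergent to carry conserved primer sites.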
As a result, we assembled a total of 305 NPCL candidate alignments, of which 120 contained the appropriate number of conserved blocks, and used these to design nested PCR primers. To increase the success rates of our NPCL markers in amniotes, we manually added a turtle sequence (Chrysemys picta bellii) to each of the candidate alignments using data downloaded from the ENSEMBL database. A total of 480 primers were designed for the 120 NPCL candidates. Briefly, the first-round PCR primers are used only to enrich target regions from genomic environments and not to obtain target amplicons, so the degeneracy of these primers is normally high to increase reaction sensitivity; the second-round PCR primers are used to obtain target amplicons, so the degeneracy of these primers is lower to increase reaction specificity. Our previous study showed that the nested PCR method often produces strong and single amplicon bands (Shen et al. 2012). To facilitate the subsequent direct sequencing, we added a tail (5′-AGGGTTTTCCCAGTCACGAC-3′) to the 5′-end of all second-round forward primers and a tail (5′-AGATAACAATTTCACACAGG-3′) to the 5′-end of all second-round reverse primers. These tail sequences provide two unique anchoring sites for direct sequencing from cleaned PCR products. In our pilot experiments, adding the tail sequences to primers did not affect the efficiency of the second-round PCR.
FIG. 6. Schematic representation of the experimental protocol for using our NPCL toolkit. Note that for each NPCL, nested PCR primers are designed on four short conserved blocks flanking the target region. Panels: (i) first PCR with primers F1 and R1 using gDNA as template, which enriches the target region with one pair of highly degenerate primers; (ii) second PCR with tailed primers F2 and R2 using the first-round PCR product as template, which specifically amplifies the target region with one pair of tailed, low-degeneracy primers; (iii) PCR evaluation and sequencing, in which single and strong bands (normally >90%) are cleaned with ExoI and FastAP and sequenced directly with the general sequencing primers Seq_F and Seq_R, and the remainder undergo gel cutting or cloning before sequencing.
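The primer design logic described above (high degeneracy in the first round, lower degeneracy plus a sequencing tail in the second round) can be made concrete with a short sketch. The two tail sequences are those given in the text; the function names and any example primer cores are hypothetical and are not primers from supplementary table S1. Degeneracy is computed in the usual way, as the product of the number of bases each IUPAC code represents.

```python
# Number of bases represented by each IUPAC nucleotide code.
IUPAC_FOLD = {
    "A": 1, "C": 1, "G": 1, "T": 1,
    "R": 2, "Y": 2, "S": 2, "W": 2, "K": 2, "M": 2,
    "B": 3, "D": 3, "H": 3, "V": 3,
    "N": 4,
}

# Tail sequences stated in the text (5'->3'); they double as Seq_F and Seq_R.
FORWARD_TAIL = "AGGGTTTTCCCAGTCACGAC"
REVERSE_TAIL = "AGATAACAATTTCACACAGG"


def degeneracy(primer: str) -> int:
    """Number of distinct oligonucleotides encoded by a degenerate primer."""
    fold = 1
    for base in primer.upper():
        if base not in IUPAC_FOLD:
            raise ValueError(f"unsupported symbol in primer: {base!r}")
        fold *= IUPAC_FOLD[base]
    return fold


def add_tail(primer_core: str, direction: str) -> str:
    """Prepend the universal tail to the 5'-end of a second-round primer.

    direction is "F" for forward or "R" for reverse primers.
    """
    tails = {"F": FORWARD_TAIL, "R": REVERSE_TAIL}
    if direction not in tails:
        raise ValueError("direction must be 'F' or 'R'")
    return tails[direction] + primer_core.upper()
```

Comparing `degeneracy()` for a first-round and a second-round primer of the same locus gives a quick check that the second-round pair is indeed the more specific one.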
Experimental Testing for Candidate Markers in 16 Jawed Vertebrates
To test the experimental performance of our newly designed NPCL markers, we selected 16 taxa representing nine major jawed vertebrate lineages: Chondrichthyes (Sphyrna lewini), Actinopterygii (Lepisosteus oculatus and Pangasius sutchi), Dipnoi (Protopterus annectens), Lissamphibia (Ichthyophis bannanicus, Batrachuperus yenyuanensis, and Rana nigromaculata), Mammalia (Mus musculus and Sus scrofa domestica), Testudines (Trionyx sinensis and Podocnemis unifilis), Aves (Struthio camelus and Zosterops japonica), Crocodylia (Crocodylus siamensis), and Squamata (Hemidactylus bowringii and Naja naja atra). Total genomic DNA was extracted from ethanol-preserved tissues (liver or muscle) using the standard salt extraction protocol. All extracted genomic DNAs were diluted to a concentration of 50 ng/µl with 1× TE and stored at −20 °C before PCR amplification.
All 120 NPCL markers were tested with a two-round PCR strategy (nested PCR). The first-round PCR was performed in a 25 µl reaction containing 1–2 µl template DNA (50–100 ng), with final concentrations of 1× PCR buffer, 200 µM dNTP, 400 nM of each forward and reverse first-round primer, and 1.25 U Taq polymerase (TransTaq High Fidelity; TransGen, Beijing). The cycling conditions of the first-round PCR were as follows: an initial denaturation step of 4 min at 94 °C, followed by 35 cycles of a 45 s denaturation at 94 °C, a 40 s annealing at 45 °C, and a 2 min extension at 72 °C, followed by a final 10 min extension at 72 °C. The second-round PCR was also performed in a 25 µl reaction containing 1 µl of the first-round PCR product (without dilution) and final concentrations of 1× PCR buffer, 200 µM dNTP, 400 nM of each forward and reverse second-round primer, and 1.25 U Taq polymerase. The cycling conditions of the second-round PCR were as follows: an initial denaturation step of 4 min at 94 °C, followed by 35 cycles of a 45 s denaturation at 94 °C, a 40 s annealing at 50 °C, and a 90 s extension at 72 °C, followed by a final 10 min extension at 72 °C.
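The two cycling programs differ only in annealing temperature and extension time, which is easy to see when they are written down as data. The sketch below encodes the stated conditions in a small Python dataclass and sums the programmed hold times; the class and attribute names are hypothetical, and ramping time between temperatures is ignored, so actual run times on a thermocycler will be longer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NestedPcrRound:
    """One round of the nested PCR program (denaturation and extension at 94/72 degrees C)."""
    name: str
    anneal_temp_c: int
    extension_s: int
    initial_denaturation_s: int = 4 * 60
    denaturation_s: int = 45
    anneal_s: int = 40
    cycles: int = 35
    final_extension_s: int = 10 * 60

    def programmed_seconds(self) -> int:
        """Sum of programmed hold times, excluding temperature ramping."""
        per_cycle = self.denaturation_s + self.anneal_s + self.extension_s
        return (self.initial_denaturation_s
                + self.cycles * per_cycle
                + self.final_extension_s)


# Values taken from the cycling conditions stated in the text.
FIRST_ROUND = NestedPcrRound("first round", anneal_temp_c=45, extension_s=120)
SECOND_ROUND = NestedPcrRound("second round", anneal_temp_c=50, extension_s=90)

if __name__ == "__main__":
    for pcr_round in (FIRST_ROUND, SECOND_ROUND):
        minutes = pcr_round.programmed_seconds() / 60
        print(f"{pcr_round.name}: about {minutes:.0f} min of programmed holds")
```

Under these assumptions the two rounds together account for a little over four hours of programmed hold time per plate.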
One microliter of the second-round PCR products was analyzed on a 1.0% TAE agarose gel. An NPCL marker was considered successful if more than 8 of the 16 tested taxa produced target amplicon bands. On this basis, 102 out of 120 tested NPCL markers were successful. The nested-PCR primers for the 102 NPCL markers can be found in supplementary table S1, Supplementary Material online. If the PCR products contained significant nonspecific amplicon bands (normally <10%), they needed further processing, for example, standard gel cutting or cloning. If the PCR reactions produced single amplicon bands (normally >90%), they were cleaned with ExoFAP treatment: 2 U ExoI and 0.4 U FastAP (all Fermentas) were added to the PCR tube and incubated for 30 min at 37 °C and 15 min at 80 °C. The cleaned PCR reactions can be used directly as templates for Sanger sequencing. According to our experimental design, all PCR fragments can be sequenced from both ends with the two universal sequencing primers, Seq_F (5′-AGGGTTTTCCCAGTCACGAC-3′) and Seq_R (5′-AGATAACAATTTCACACAGG-3′). A typical Sanger sequencing reaction in our laboratory consumes 0.5 µl BigDye and 1 µl cleanup PCR product. The primer design strategy, the laboratory protocol for the nested PCR method, and the pretreatment of PCR products before Sanger sequencing are illustrated in figure 6.
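The success criterion for a marker (target bands in more than 8 of the 16 tested taxa) can be expressed as a one-line filter over gel-scoring results. The sketch below assumes a simple table of true/false band calls per marker and taxon; the function name, marker names, and taxon labels are hypothetical placeholders rather than data from this study.

```python
def successful_markers(gel_results: dict, min_taxa_exclusive: int = 8) -> list:
    """Return markers with target bands in more than `min_taxa_exclusive` taxa.

    gel_results maps marker name -> {taxon name: True if a target band was seen}.
    """
    return [
        marker
        for marker, band_by_taxon in gel_results.items()
        if sum(bool(seen) for seen in band_by_taxon.values()) > min_taxa_exclusive
    ]


if __name__ == "__main__":
    taxa = [f"taxon_{i:02d}" for i in range(1, 17)]  # 16 hypothetical taxa
    example = {
        "marker_A": {t: i < 9 for i, t in enumerate(taxa)},  # 9 of 16 succeed
        "marker_B": {t: i < 8 for i, t in enumerate(taxa)},  # exactly 8 of 16
    }
    print(successful_markers(example))
```

Note that the threshold is exclusive: a marker that works in exactly 8 of the 16 taxa does not count as successful under the stated rule.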
Calculation of Relative Evolutionary Rate of 102 NPCLs
The rate multipliers (m) across partitions estimated in MrBayes 3.2 (Ronquist et al. 2012) are used as relative evolutionary rates. To calculate these parameters, alignments for each NPCL were prepared for 12 species: Homo sapiens, Macaca mulatta, Mus musculus, Rattus norvegicus, Gallus gallus, Meleagris gallopavo, Chrysemys picta bellii, Anolis carolinensis, Silurana tropicalis, Tetraodon nigroviridis, Takifugu rubripes, and Danio rerio. Because genome data are available for the 12 species, we did not generate any new data. The 102 NPCL alignments were then combined and subjected to MrBayes analyses partitioned by gene. Each gene was assigned a separate GTR + Γ + I model, and all model parameters were unlinked. Two Markov chain Monte Carlo (MCMC) runs were performed with one cold chain and three heated chains (temperature set to 0.1) for 50 million generations and sampled every 1,000 generations. The rate multiplier for each gene was estimated using Tracer version 1.4 after discarding the first 50% of the generations. All evolutionary rates were normalized by dividing by the maximum value of the obtained rates.
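The final two steps of this calculation, discarding the first 50% of the sampled generations and dividing every rate by the maximum rate, amount to the following arithmetic. This sketch assumes that the posterior mean of the retained samples is used as the point estimate for each gene, which the text does not state explicitly; the function names and gene labels are hypothetical.

```python
from statistics import mean


def rate_after_burnin(samples: list, burnin_fraction: float = 0.5) -> float:
    """Posterior mean of a rate multiplier after discarding the burn-in samples."""
    if not 0.0 <= burnin_fraction < 1.0:
        raise ValueError("burnin_fraction must be in [0, 1)")
    kept = samples[int(len(samples) * burnin_fraction):]
    if not kept:
        raise ValueError("no samples left after burn-in")
    return mean(kept)


def normalize_rates(rates: dict) -> dict:
    """Divide every rate by the maximum rate so the fastest gene has rate 1."""
    max_rate = max(rates.values())
    return {gene: rate / max_rate for gene, rate in rates.items()}
```

After normalization, the relative rates lie in the interval (0, 1], with the fastest-evolving NPCL assigned a value of exactly 1.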
Gene and Taxon Sampling for Investigating Higher Level Salamander Relationships
To test the utility of our NPCL toolkit in a real case, we selected 19 salamander taxa representing all 10 salamander families and 9 outgroup taxa to investigate the family-level relationships of salamanders (supplementary table S2, Supplementary Material online). For gene sampling, we randomly selected 30 NPCL markers whose PCR success rates were more than 90% in the 16 previously tested vertebrates. Among the 840 target sequences (30 markers × 28 taxa), 201 were available in public databases (NCBI, UCSC, and ENSEMBL), whereas the remaining 639 sequences needed to be generated de novo. The experimental procedure was as described earlier. All obtained sequences were examined by checking for the presence of premature stop codons (which indicate pseudogenes) and by BlastX searches against the nonredundant protein sequences (nr) database to confirm that they were our target genes. The NPCLs, species, and accession numbers for the newly obtained sequences are listed in supplementary table S2, Supplementary Material online.
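The premature stop codon check mentioned above can be sketched in a few lines. The example assumes the standard nuclear genetic code, an ungapped nucleotide sequence, and a known reading frame; because the amplicons are internal exon fragments, any in-frame stop codon is treated as premature. The function name is hypothetical and this is not the authors' script.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard nuclear genetic code


def has_premature_stop(sequence: str, frame: int = 0) -> bool:
    """Return True if an in-frame stop codon occurs in an exon fragment.

    sequence: ungapped nucleotide sequence of the amplicon.
    frame: offset (0, 1, or 2) of the first complete codon.
    """
    if frame not in (0, 1, 2):
        raise ValueError("frame must be 0, 1, or 2")
    seq = sequence.upper()
    # Step through complete codons only; a trailing partial codon is ignored.
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return any(codon in STOP_CODONS for codon in codons)
```

A sequence flagged by this check would be a candidate pseudogene or a sequencing error and would need to be inspected before inclusion in the data set.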
Phylogenetic Analyses
Alignments of all 30 NPCL markers were conducted using the G-INS-i method from MAFFT (Katoh et al. 2005) under the default settings according to their translated amino acid sequences and then refined by eye. All 30 refined alignments were combined into a concatenated data set (27,834 bp).
For the concatenated data set, we manually defined five partitioning strategies: 2 partitions (one for codon positions 1 and 2 and one for codon position 3), 3 partitions (one partition for each codon position), 30 partitions (one partition for each gene), 60 partitions (one for codon positions 1 + 2 and one for codon position 3 across 30 genes), and 90 partitions (codon position partitioning across 30 genes). Comparisons of the five partitioning strategies and selections of corresponding nucleotide substitution models were conducted under the Bayesian information criterion implemented in PartitionFinder (Lanfear et al. 2012). The 3-partition scheme (one partition for each codon position) was chosen as the best-fitting partitioning strategy, and all 3 partitions favored the GTR + Γ + I model.
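The five partitioning strategies can be generated mechanically from the gene boundaries in the concatenated alignment. The sketch below is illustrative only: the gene names and lengths are hypothetical, and it assumes that every gene alignment starts at codon position 1 and has a length that is a multiple of three. Each partition is returned as a list of (start, end, step) ranges with 1-based inclusive coordinates.

```python
# Codon-position groupings: label -> list of (offset within codon, step).
CODON_MODES = {
    "none": {"all": [(0, 1)]},
    "12+3": {"pos12": [(0, 3), (1, 3)], "pos3": [(2, 3)]},
    "1+2+3": {"pos1": [(0, 3)], "pos2": [(1, 3)], "pos3": [(2, 3)]},
}


def build_scheme(gene_lengths: dict, by_gene: bool, codon_mode: str) -> dict:
    """Return {partition name: [(start, end, step), ...]} for one strategy."""
    groups = CODON_MODES[codon_mode]
    scheme = {}
    start = 1
    for gene, length in gene_lengths.items():
        end = start + length - 1
        for label, offsets in groups.items():
            name = f"{gene}_{label}" if by_gene else label
            for offset, step in offsets:
                scheme.setdefault(name, []).append((start + offset, end, step))
        start = end + 1
    return scheme


def five_strategies(gene_lengths: dict) -> dict:
    """The five strategies described in the text, keyed by a short label."""
    return {
        "2 partitions": build_scheme(gene_lengths, False, "12+3"),
        "3 partitions": build_scheme(gene_lengths, False, "1+2+3"),
        "by gene": build_scheme(gene_lengths, True, "none"),
        "by gene, 12+3": build_scheme(gene_lengths, True, "12+3"),
        "by gene, 1+2+3": build_scheme(gene_lengths, True, "1+2+3"),
    }
```

With 30 genes, these five strategies contain 2, 3, 30, 60, and 90 partitions, respectively, matching the schemes compared in the text.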
The concatenated data set was separately analyzed with both ML and Bayesian inference (BI) methods under the 3-partition scheme. Partitioned ML analyses were implemented using RAxML version 7.2.6 (Stamatakis 2006) with the GTR + Γ + I model assigned to each partition. A search that combined 100 separate ML searches was applied to find the optimal tree, and branch support for each node was evaluated with 500 standard bootstrapping replicates (-f d -b 500 option) implemented in RAxML. The partitioned BI was conducted using MrBayes 3.2 (Ronquist et al. 2012). All model parameters were unlinked. Two MCMC runs were performed with one cold chain and three heated chains (temperature set to 0.1) for 60 million generations and sampled every 1,000 generations. The chain stationarity was visualized by plotting −ln L against the generation number using Tracer version 1.4, and the first 50% of generations were discarded. Topologies and posterior probabilities were estimated from the remaining generations. Two runs for each analysis were compared for congruence.
We also performed Bayesian phylogenetic analyses under a mixture model, CAT + GTR + Γ4, in PhyloBayes 3.3 (Lartillot et al. 2009) with two independent MCMC runs. Each run was performed for 10,000 cycles and sampled every cycle. Stationarity was reached when the largest discrepancy (maxdiff) was less than 0.1 between two independent runs. The first 5,000 trees in each MCMC run were discarded. The remaining 10,000 trees of the two runs were sampled every 5 trees to generate a 50% majority-rule posterior consensus tree.
Species tree estimation was conducted using the pseudo-ML approach in the program MP-EST (Liu et al. 2010) under the coalescent model. Briefly, the gene trees for 30 NPCL markers were reconstructed with ML under the GTR + Γ8 model using PHYML 3.0 (Guindon et al. 2010). The resulting 30 gene trees were rooted with an outgroup (Chrysemys picta bellii) and used to generate a species tree in MP-EST. The robustness of the species tree was evaluated with nonparametric bootstrapping of 500 replicates.
Alternative phylogenetic hypotheses were tested based on the 30-gene data set. We first calculated sitewise log likelihoods of alternative trees using RAxML (-f g) under the GTR + Γ + I model. Then, the site log-likelihood file was used to estimate P values for each alternative tree in the CONSEL program (Shimodaira and Hasegawa 2001) using the Kishino–Hasegawa test (KH) (Kishino and Hasegawa 1989), the approximately unbiased test (AU) (Shimodaira 2002), and the RELL bootstrap proportion test (BP).
Supplementary Material
Supplementary tables S1 and S2 and figures S1 and S2 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).
Acknowledgments
The authors are grateful to David Wake and Carol Spencer of the Museum of Vertebrate Zoology at the University of California, Berkeley, for tissue loans. David Wake provided many useful comments on the manuscript. This work was supported by National Natural Science Foundation of China grants (31172075 and 30900136) and the National Science Fund for Excellent Young Scholars (no. pending).
References
Binladen J, Gilbert MTP, Bollback JP, Panitz F, Bendixen C, Nielsen R, Willerslev E. 2007. The use of coded PCR primers enables high-throughput sequencing of multiple homolog amplification products by 454 parallel sequencing. PLoS One 2:e197.
Crawford NG, Faircloth BC, McCormack JE, Brumfield RT, Winker K, Glenn TC. 2012. More than 1000 ultraconserved elements provide evidence that turtles are the sister group to archosaurs. Biol Lett 8:783–786.
Delsuc F, Brinkmann H, Philippe H. 2005. Phylogenomics and the reconstruction of the tree of life. Nat Rev Genet 6:361–375.
Duellman WE, Trueb L. 1994. Biology of amphibians. Baltimore (MD): Johns Hopkins University Press.
Dunn CW, Hejnol A, Matus DQ, et al. (18 co-authors). 2008. Broad phylogenomic sampling improves resolution of the animal tree of life. Nature 452:745–749.
Faircloth BC, Glenn TC. 2012. Not all sequence tags are created equal: designing and validating sequence identification tags robust to indels. PLoS One 7:e42543.
Faircloth BC, McCormack JE, Crawford NG, Brumfield RT, Glenn TC. 2012. Ultraconserved elements anchor thousands of genetic markers for target enrichment spanning multiple evolutionary timescales. Syst Biol 61:717–726.
Fong JJ, Brown JM, Fujita MK, Boussau B. 2012. A phylogenomic approach to vertebrate phylogeny supports a turtle-archosaur affinity and a possible paraphyletic Lissamphibia. PLoS One 7:e48990.
Fong JJ, Fujita MK. 2011. Evaluating phylogenetic informativeness and data-type usage for new protein-coding genes across Vertebrata. Mol Phylogenet Evol 61:300–307.
Frost DR, Grant T, Faivovich J, et al. (18 co-authors). 2006. The amphibian tree of life. Bull Am Mus Nat Hist 297:8–370.
Gao KQ, Shubin NH. 2001. Late Jurassic salamanders from northern China. Nature 410:574–577.
Guindon S, Dufayard J-F, Lefort V, Anisimova M, Hordijk W, Gascuel O. 2010. New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Syst Biol 59:307–321.
Hugall AF, Foster R, Lee MSY. 2007. Calibration choice, rate smoothing, and the pattern of tetrapod diversification according to the long nuclear gene RAG-1. Syst Biol 56:543–563.
Inoue JG, Miya M, Lam K, Tay BH, Danks JA, Bell J, Walker TI, Venkatesh B. 2010. Evolutionary origin and phylogeny of the modern holocephalans (Chondrichthyes: Chimaeriformes): a mitogenomic perspective. Mol Biol Evol 27:2576–2586.
Katoh K, Kuma K, Toh H, Miyata T. 2005. MAFFT version 5: improvement in accuracy of multiple sequence alignment. Nucleic Acids Res 33:511–518.
Kishino H, Hasegawa M. 1989. Evaluation of the maximum likelihood estimate of the evolutionary tree topologies from DNA sequence data, and the branching order in Hominoidea. J Mol Evol 29:170–179.
Künstner A, Wolf JBW, Backström N, et al. (13 co-authors). 2010. Comparative genomics based on massive parallel transcriptome sequencing reveals patterns of substitution and selection across 10 bird species. Mol Ecol 19:266–276.
Lanfear R, Calcott B, Ho SYW, Guindon S. 2012. Partitionfinder: combined selection of partitioning schemes and substitution models for phylogenetic analyses. Mol Biol Evol 29:1695–1701.
Lartillot N, Lepage T, Blanquart S. 2009. PhyloBayes 3: a Bayesian software package for phylogenetic reconstruction and molecular dating. Bioinformatics 25:2286–2288.
Lemmon AR, Brown JM, Stanger-Hall K, Lemmon EM. 2009. The effect of ambiguous data on phylogenetic estimates obtained by maximum likelihood and Bayesian inference. Syst Biol 58:130–145.
Lemmon AR, Emme SA, Lemmon EM. 2012. Anchored hybrid enrichment for massively high-throughput phylogenomics. Syst Biol 61:727–744.
Li C, Ortí G, Zhang G, Lu G. 2007. A practical approach to phylogenomics: the phylogeny of ray-finned fish (Actinopterygii) as a case study. BMC Evol Biol 7:44.
Li C, Riethoven JJ, Naylor GJ. 2012. EvolMarkers: a database for mining exon and intron markers for evolution, ecology and conservation studies. Mol Ecol Resour 12:967–971.
Liu L, Yu L, Edwards SV. 2010. A maximum pseudo-likelihood approach for estimating species trees under the coalescent model. BMC Evol Biol 10:302.
McCormack JE, Faircloth BC, Crawford NG, Gowaty PA, Brumfield RT, Glenn TC. 2012. Ultraconserved elements are novel phylogenomic markers that resolve placental mammal phylogeny when combined with species-tree analysis. Genome Res 22:746–754.
McCormack JE, Hird SM, Zellmer AJ, Carstens BC, Brumfield RT. 2013. Applications of next-generation sequencing to phylogeography and phylogenetics. Mol Phylogenet Evol 66:526–538.
Meyer A, Van de Peer Y. 2005. From 2R to 3R: evidence for a fish-specific genome duplication (FSGD). Bioessays 27:937–945.
Meyer M, Stenzel U, Hofreiter M. 2008. Parallel tagged sequencing on the 454 platform. Nat Protoc 3:267–278.
Murphy WJ, Eizirik E, Johnson WE, Zhang YP, Ryder OA, O'Brien SJ. 2001. Molecular phylogenetics and the origins of placental mammals. Nature 409:614–618.
Philippe H, Derelle R, Lopez P, et al. (20 co-authors). 2009. Phylogenomics revives traditional views on deep animal relationships. Curr Biol 19:706–712.
Philippe H, Telford M. 2006. Large-scale sequencing and the new animal phylogeny. Trends Ecol Evol 21:614–620.
Portik DM, Wood PL, Grismer JL, Stanley EL, Jackman TR. 2011. Identification of 104 rapidly-evolving nuclear protein-coding markers for amplification across scaled reptiles using genomic resources. Conserv Genet Resour 4:1–10.
Pyron RA, Wiens JJ. 2011. A large-scale phylogeny of Amphibia including over 2800 species, and a revised classification of extant frogs, salamanders, and caecilians. Mol Phylogenet Evol 61:543–583.
Roelants K, Gower DJ, Wilkinson M, Loader SP, Biju SD, Guillaume K, Moriau L, Bossuyt F. 2007. Global patterns of diversification in the history of modern amphibians. Proc Natl Acad Sci U S A 104:887–892.
Rokas A, Williams BL, King N, Carroll SB. 2003. Genome-scale approaches to resolving incongruence in molecular phylogenies. Nature 425:798–804.
Ronquist F, Teslenko M, Van der Mark P, Ayres DL, Darling A, Höhna S, Larget B, Liu L, Suchard MA, Huelsenbeck JP. 2012. MrBayes 3.2: efficient Bayesian phylogenetic inference and model choice across a large model space. Syst Biol 61:539–542.
Roure B, Baurain D, Philippe H. 2013. Impact of missing data on phylogenies inferred from empirical phylogenomic data sets. Mol Biol Evol 30:197–214.
San Mauro D. 2010. A multilocus timescale for the origin of extant amphibians. Mol Phylogenet Evol 56:554–561.
San Mauro D, Vences M, Alcobendas M, Zardoya R, Meyer A. 2005. Initial diversification of living amphibians predated the breakup of Pangaea. Am Nat 165:590–599.
Shen XX, Liang D, Wen JZ, Zhang P. 2011. Multiple genome alignments facilitate development of NPCL markers: a case study of tetrapod phylogeny focusing on the position of turtles. Mol Biol Evol 28:3237–3252.
Shen XX, Liang D, Zhang P. 2012. The development of three long universal nuclear protein-coding locus markers and their application to osteichthyan phylogenetics with nested PCR. PLoS One 7:e39256.
Shimodaira H. 2002. An approximately unbiased test of phylogenetic tree selection. Syst Biol 51:492–508.
Shimodaira H, Hasegawa M. 2001. CONSEL: for assessing the confidence of phylogenetic tree selection. Bioinformatics 17:1246–1247.
Song S, Liu L, Edwards SV, Wu S. 2012. Resolving conflict in eutherian mammal phylogeny using phylogenomics and the multispecies coalescent model. Proc Natl Acad Sci U S A 109:14942–14947.
Spinks PQ, Thomson RC, Barley AJ, Newman CE, Shaffer HB. 2010. Testing avian, squamate, and mammalian nuclear markers for cross amplification in turtles. Conserv Genet Resour 2:127–129.
Spinks PQ, Thomson RC, Lovely GA, Shaffer HB. 2009. Assessing what is needed to resolve a molecular phylogeny: simulations and empirical data from Emydid turtles. BMC Evol Biol 9:56.
Stamatakis A. 2006. RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics 22:2688–2690.
Tewhey R, Warner JB, Nakano M, Libby B, Medkova M, David PH, Kotsopoulos SK, Samuels ML, Hutchison JB, Larson JW. 2009. Microdroplet-based PCR enrichment for large-scale targeted sequencing. Nat Biotechnol 27:1025–1031.
Thomson RC, Shedlock AM, Edwards SV, Shaffer HB. 2008. Developing markers for multilocus phylogenetics in non-model organisms: a test case with turtles. Mol Phylogenet Evol 49:514–525.
Thomson RC, Wang IJ, Johnson JR. 2010. Genome-enabled development of DNA markers for ecology, evolution and conservation. Mol Ecol 19:2184–2195.
Townsend TM, Alegre RE, Kelley ST, Wiens JJ, Reeder TW. 2008. Rapid development of multiple nuclear loci for phylogenetic analysis using genomic resources: an example from squamate reptiles. Mol Phylogenet Evol 47:129–142.
Weisrock DW, Harmon LJ, Larson A. 2005. Resolving deep phylogenetic relationships in salamanders: analyses of mitochondrial and nuclear genomic DNA. Syst Biol 54:758–777.
Wiens J, Bonett R, Chippindale P. 2005. Ontogeny discombobulates phylogeny: paedomorphosis and higher-level salamander relationships. Syst Biol 54:91–110.
Wright TF, Schirtzinger EE, Matsumoto T, et al. (11 co-authors). 2008. A multilocus molecular phylogeny of the parrots (Psittaciformes): support for a Gondwanan origin during the Cretaceous. Mol Biol Evol 25:2141–2156.
Zhang P, Wake DB. 2009. Higher-level salamander relationships and divergence dates inferred from complete mitochondrial genomes. Mol Phylogenet Evol 53:492–508.
Zhang P, Zhou H, Chen YQ, Liu YF, Qu LH. 2005. Mitogenomic perspectives on the origin and phylogeny of living amphibians. Syst Biol 54:391–400.
Zhou XM, Xu SX, Zhang P, Yang G. 2011. Developing a series of conservative anchor markers and their application to phylogenomics of Laurasiatherian mammals. Mol Ecol Resour 11:134–140.
groups it may be impossible to develop a single set of PCRprimers for all nonvertebrates However following a similarmarker development strategy multiple NPCL toolkits couldbe constructed for various groups of nonvertebrates such asarthropods echinoderms and molluscs In addition becauseintrons are flanked by conserved exons the idea of the use ofnested PCRs for marker development could also be applied tothe development of EPIC (exon-primed intron crossing) mar-kers which are more suitable in shallow-scale phylogenetic orphylogeographic projects
Despite the benefits of our proposed method it must berecognized that when handling large-scale projects such as200 taxa 100 loci the use of our toolkit and Sanger se-quencing will still require significant cost time and laborAn alternative solution is to use NGS to replace Sanger se-quencing Recently 454 NGS technology has been applied tosequence-targeted gene regions from a pool of PCR productsfrom different specimens (Binladen et al 2007 Meyer et al2008) In such experiments specific tagging sequences mustbe added to amplicons by either PCR (Binladen et al 2007) orblunt-end ligation (Meyer et al 2008) Therefore if the tailingsequences of the second-round PCR primers in our NPCLtoolkit are replaced by tagging sequences instead (for tagdesigning see Faircloth and Glenn 2012) all PCR productscan be pooled together and sequenced with the 454 NGSwhich will greatly reduce the money and time cost comparedwith Sanger sequencing However parallel tagged sequencingvia NGS does not circumvent the process of PCR for eachindividual at each locus which may be the most onerous partof a large-scale phylogenomic project Some promising newtechnologies may help to solve this problem such as micro-droplet PCR (Tewhey et al 2009) where millions of individualPCR reactions are performed in picoliter-scale droplets simul-taneously and the 9696 Dynamic Array by Fluidigm whichallows 96 primer combinations to be used on 96 samples(9216 total PCR reactions) on a single PCR plate Howeverthere has been little research to applying NGS and new high-throughput PCR technologies to phylogenomics so theirease-of-use and cost-effectiveness still need to be explored
Summary
In conclusion we have developed an improved method forrapidly amplifying and sequencing NPCLs that has proven tobe useful and effective for molecular phylogenetic studies ofvertebrates The newly developed toolkit provides an attrac-tive alternative to available methods for vertebratephylogenomics
Materials and Methods
Development of NPCL and Primer Design
Universal Toolkit Including 102 NPCL . doi:10.1093/molbev/mst122 MBE. Downloaded from http://mbe.oxfordjournals.org/ at Vanderbilt University - Massey Law Library on March 18, 2015.
Our previous study showed that nested PCR is overwhelmingly more effective than conventional PCR for obtaining target amplicons from complex genomic environments (Shen et al. 2012). However, nested PCR requires four conserved regions to design two pairs of primers (illustrated in fig. 6; yellow blocks represent the conserved regions used for primer design), which means that only relatively long exons are suitable as candidates for NPCL development with the nested PCR method. To search for long and conserved exons, we took advantage of our previous bioinformatic method, which used the multiple genome alignment data from the UCSC Genome Browser to identify conserved exons (Shen et al. 2011). Because the NPCL markers are to be used in vertebrates, we focused only on those multiple genome alignments that include at least six species: Danio rerio (zebrafish), Silurana tropicalis (frog), Anolis carolinensis (lizard), Gallus gallus (chicken), Mus musculus (mouse), and Homo sapiens (human). The alignments of candidate exons had to meet two criteria: length of more than 700 bp and pairwise similarity ranging from 35% to 90%. The detailed bioinformatic pipeline has been described elsewhere (Shen et al. 2011). In addition to using multiple genome alignments to screen NPCL candidates, we also manually searched for nuclear genes that were used previously (Murphy et al. 2001; Li et al. 2007; Townsend et al. 2008; Wright et al. 2008; Zhou et al. 2011; Song et al. 2012) in the ENSEMBL database to check whether they contain large and appropriately conserved exons.
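The two screening criteria for candidate exon alignments (length above 700 bp, pairwise similarity between 35% and 90%) can be sketched as a simple filter. This is an illustration under stated assumptions, not the published pipeline of Shen et al. (2011): similarity is taken here as percent identity over columns where neither sequence has a gap, "length" is taken as alignment length, and the range is required of every pair of sequences rather than of an average. The function names and toy sequences are hypothetical.

```python
from itertools import combinations

def pairwise_identity(a, b):
    """Percent identity over columns where neither sequence has a gap."""
    cols = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not cols:
        return 0.0
    return 100.0 * sum(x == y for x, y in cols) / len(cols)

def passes_screen(aligned_seqs, min_len=700, low=35.0, high=90.0):
    """True if the alignment is longer than min_len and every pair of
    sequences has an identity within [low, high] percent (assumed rule)."""
    seqs = list(aligned_seqs)
    if len(seqs[0]) <= min_len:
        return False
    return all(low <= pairwise_identity(a, b) <= high
               for a, b in combinations(seqs, 2))

# Toy aligned sequences of 800 columns, for illustration only.
seq_a = "A" * 800
seq_b = "A" * 600 + "C" * 200   # 75% identical to seq_a
seq_c = "A" * 400 + "G" * 400   # 50% identical to seq_a and to seq_b
candidate_ok = passes_screen([seq_a, seq_b, seq_c])
too_conserved = passes_screen([seq_a, seq_a])
```

The upper bound removes exons too conserved to be informative, and the lower bound removes exons too divergent for primer design across vertebrates.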
FIG. 6. Schematic representation of the experimental protocol for using our NPCL toolkit. Note that for each NPCL, nested PCR primers are designed on four short conserved blocks flanking the target region. [Flowchart panels: (i) first PCR with primers F1 and R1 using gDNA as template; (ii) second PCR with tailed primers F2 and R2 using the first-round PCR product as template; (iii) PCR evaluation and sequencing: a single, strong band (normally > 90%) is cleaned with ExoI and FastAP and sequenced directly with Seq_F and Seq_R; otherwise, gel cutting or cloning precedes sequencing.]
As a result, we assembled a total of 305 NPCL candidate alignments, of which 120 contained the appropriate number of conserved blocks, and used these to design nested PCR primers. To increase the success rates of our NPCL markers in amniotes, we manually added a turtle sequence (Chrysemys picta bellii) to each of the candidate alignments using data downloaded from the ENSEMBL database. A total of 480 primers were designed for the 120 NPCL candidates. Briefly, the first-round PCR primers are used only to enrich target regions from genomic environments, not to obtain target amplicons, so the degeneracy of these primers is normally high to increase reaction sensitivity; the second-round PCR primers are used to obtain target amplicons, so the degeneracy of these primers is lower to increase reaction specificity. Our previous study showed that the nested PCR method often produces strong and single amplicon bands (Shen et al. 2012). To facilitate the subsequent direct sequencing, we added a tail (5′-AGGGTTTTCCCAGTCACGAC-3′) to the 5′-end of all second-round forward primers and a tail (5′-AGATAACAATTTCACACAGG-3′) to the 5′-end of all second-round reverse primers. These tail sequences provide two unique anchoring sites for direct sequencing from cleaned PCR products. In our pilot experiments, adding the tail sequences to primers did not affect the efficiency of the second-round PCR.
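The primer design logic above can be illustrated with a short sketch that computes the degeneracy of a primer (the number of distinct sequences it represents under IUPAC codes) and prepends the two tails to a second-round primer pair. The tail sequences are those given in the text; the example primers and the function names are hypothetical and are not taken from supplementary table S1.

```python
FORWARD_TAIL = "AGGGTTTTCCCAGTCACGAC"  # 5' tail of second-round forward primers
REVERSE_TAIL = "AGATAACAATTTCACACAGG"  # 5' tail of second-round reverse primers

# Number of bases represented by each IUPAC nucleotide code.
IUPAC_COUNTS = {
    "A": 1, "C": 1, "G": 1, "T": 1,
    "R": 2, "Y": 2, "S": 2, "W": 2, "K": 2, "M": 2,
    "B": 3, "D": 3, "H": 3, "V": 3, "N": 4,
}

def degeneracy(primer):
    """Number of distinct sequences encoded by a degenerate primer."""
    count = 1
    for base in primer.upper():
        count *= IUPAC_COUNTS[base]
    return count

def add_tails(forward, reverse):
    """Prepend the sequencing tails to a second-round primer pair."""
    return FORWARD_TAIL + forward, REVERSE_TAIL + reverse

# Hypothetical primers, for illustration only.
first_round_forward = "GARTGYAAYCCNATGAA"   # more degenerate (enrichment)
second_round_forward = "TGGATGAAYACCAT"     # less degenerate (specific)
second_round_reverse = "CCRTCATAGTTGGC"
tailed_f2, tailed_r2 = add_tails(second_round_forward, second_round_reverse)

primers_per_locus = 4                        # F1, R1, F2, R2
total_primers = primers_per_locus * 120      # 120 NPCL candidates
```

Because every amplicon carries the same two tails, one pair of sequencing primers serves all loci.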
Experimental Testing for Candidate Markers in 16 Jawed Vertebrates
To test the experimental performance of our newly designed NPCL markers, we selected 16 taxa representing nine major jawed vertebrate lineages: Chondrichthyes (Sphyrna lewini), Actinopterygii (Lepisosteus oculatus and Pangasius sutchi), Dipnoi (Protopterus annectens), Lissamphibia (Ichthyophis bannanicus, Batrachuperus yenyuanensis, and Rana nigromaculata), Mammalia (Mus musculus and Sus scrofa domestica), Testudines (Trionyx sinensis and Podocnemis unifilis), Aves (Struthio camelus and Zosterops japonica), Crocodylia (Crocodylus siamensis), and Squamata (Hemidactylus bowringii and Naja naja atra). Total genomic DNA was extracted from ethanol-preserved tissues (liver or muscle) using the standard salt extraction protocol. All extracted genomic DNAs were diluted to a concentration of 50 ng µl−1 with 1× TE and stored at −20 °C before PCR amplification.
All 120 NPCL markers were tested with a two-round PCR strategy (nested PCR). The first-round PCR was performed in a 25 µl reaction containing 1–2 µl template DNA (50–100 ng), with final concentrations of 1× PCR buffer, 200 µM dNTP, 400 nM of each forward and reverse first-round primer, and 1.25 U Taq polymerase (TransTaq High Fidelity; TransGen, Beijing). The cycling conditions of the first-round PCR were as follows: an initial denaturation step of 4 min at 94 °C, followed by 35 cycles of a 45 s denaturation at 94 °C, a 40 s annealing at 45 °C, and a 2 min extension at 72 °C, followed by a final 10 min extension at 72 °C. The second-round PCR was also performed in a 25 µl reaction containing 1 µl of the first-round PCR product (without dilution) and final concentrations of 1× PCR buffer, 200 µM dNTP, 400 nM of each forward and reverse second-round primer, and 1.25 U Taq polymerase. The cycling conditions of the second-round PCR were as follows: an initial denaturation step of 4 min at 94 °C, followed by 35 cycles of a 45 s denaturation at 94 °C, a 40 s annealing at 50 °C, and a 90 s extension at 72 °C, followed by a final 10 min extension at 72 °C.
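As a planning aid, the two cycling programs can be written down as data and their total hold times summed. This is only a sketch: the step durations come from the text, but the function name is hypothetical and the totals ignore temperature ramping between steps, which depends on the thermocycler.

```python
def program_hold_seconds(initial, cycles, cycle_steps, final):
    """Total programmed hold time in seconds, excluding ramp time."""
    return initial + cycles * sum(cycle_steps) + final

# First round: 4 min at 94 C; 35 x (45 s at 94 C, 40 s at 45 C, 2 min at 72 C); 10 min at 72 C.
first_round_s = program_hold_seconds(4 * 60, 35, (45, 40, 2 * 60), 10 * 60)

# Second round: same program, but annealing at 50 C and a 90 s extension.
second_round_s = program_hold_seconds(4 * 60, 35, (45, 40, 90), 10 * 60)

total_minutes = (first_round_s + second_round_s) / 60
```

Under these assumptions, the two rounds together occupy a little over four hours of programmed hold time per plate.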
One microliter of the second-round PCR products was analyzed on a 1.0% TAE agarose gel. An NPCL marker was considered successful if more than 8 of the 16 tested taxa produced target amplicon bands. On this basis, 102 of the 120 tested NPCL markers were successful. The nested-PCR primers for the 102 NPCL markers can be found in supplementary table S1, Supplementary Material online. If the PCR products contained significant nonspecific amplicon bands (normally < 10%), they needed further processing, for example, standard gel cutting or cloning. If the PCR reactions produced single amplicon bands (normally > 90%), they were cleaned with ExoFAP treatment: 2 U ExoI and 0.4 U FastAP (all Fermentas) were added to the PCR tube and incubated for 30 min at 37 °C and 15 min at 80 °C. The cleaned PCR products can be used directly as templates for Sanger sequencing. According to our experimental design, all PCR fragments can be sequenced from both ends with the two universal sequencing primers, Seq_F (5′-AGGGTTTTCCCAGTCACGAC-3′) and Seq_R (5′-AGATAACAATTTCACACAGG-3′). A typical Sanger sequencing reaction in our laboratory consumes 0.5 µl BigDye and 1 µl cleanup PCR product. The primer design strategy, the laboratory protocol for the nested PCR method, and the pretreatment of PCR products before Sanger sequencing are illustrated in figure 6.
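The success criterion for a marker (target bands in more than 8 of the 16 tested taxa) is easy to apply to a table of gel results. The sketch below assumes that each marker's results are stored as a list of 16 true/false band calls; the marker names and calls are hypothetical, and only the threshold and the 102-of-120 tally come from the text.

```python
def marker_is_successful(band_calls, threshold=8):
    """A marker succeeds if MORE than `threshold` taxa gave a target band."""
    return sum(band_calls) > threshold

def summarize(results):
    """Return the names of successful markers and their proportion."""
    passed = [name for name, calls in results.items()
              if marker_is_successful(calls)]
    return passed, len(passed) / len(results)

# Hypothetical gel results for three markers across 16 taxa.
gel_results = {
    "locus_A": [True] * 15 + [False] * 1,
    "locus_B": [True] * 8 + [False] * 8,   # exactly 8 of 16: not successful
    "locus_C": [True] * 9 + [False] * 7,
}
successful, proportion = summarize(gel_results)

reported_rate = 102 / 120  # proportion of successful markers reported in the text
```

Note that the criterion is strict: a marker that amplifies in exactly half of the taxa is not counted as successful.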
Calculation of Relative Evolutionary Rate of 102 NPCLs
The rate multipliers (m) across partitions, estimated in MrBayes 3.2 (Ronquist et al. 2012), were used as relative evolutionary rates. To calculate these parameters, alignments for each NPCL were prepared for 12 species: Homo sapiens, Macaca mulatta, Mus musculus, Rattus norvegicus, Gallus gallus, Meleagris gallopavo, Chrysemys picta bellii, Anolis carolinensis, Silurana tropicalis, Tetraodon nigroviridis, Takifugu rubripes, and Danio rerio. Because genome data are available for these 12 species, we did not generate any new data. The 102 NPCL alignments were then combined and subjected to MrBayes analyses partitioned by gene. Each gene was assigned a separate GTR + Γ + I model, and all model parameters were unlinked. Two Markov chain Monte Carlo (MCMC) runs were performed with one cold chain and three heated chains (temperature set to 0.1) for 50 million generations and sampled every 1,000 generations. The rate multiplier for each gene was estimated using Tracer version 1.4 after discarding the first 50% of the generations. All evolutionary rates were normalized by dividing by the maximum value of the obtained rates.
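The final normalization step can be sketched in a few lines: each gene's rate multiplier is divided by the largest multiplier, so the fastest gene has a relative rate of 1. The gene names and values below are hypothetical and are not estimates from the study.

```python
def normalize_rates(rate_multipliers):
    """Scale rate multipliers so that the largest value becomes 1.0."""
    top = max(rate_multipliers.values())
    return {gene: rate / top for gene, rate in rate_multipliers.items()}

# Hypothetical per-gene rate multipliers, for illustration only.
raw_rates = {"gene_1": 0.5, "gene_2": 1.0, "gene_3": 2.0}
relative_rates = normalize_rates(raw_rates)
```

This scaling preserves the ratios between genes, so the ranking of markers from slow to fast is unchanged.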
Gene and Taxon Sampling for Investigating Higher Level Salamander Relationships
To test the utility of our NPCL toolkit in a real case, we selected 19 salamander taxa, representing all 10 salamander families, and 9 outgroup taxa to investigate the family-level relationships of salamanders (supplementary table S2, Supplementary Material online). For gene sampling, we randomly selected 30 NPCL markers whose PCR success rates were more than 90% in the 16 previously tested vertebrates. Among the 840 target sequences (30 markers for 28 taxa), 201 were available in public databases (NCBI, UCSC, and ENSEMBL), whereas the remaining 639 sequences needed to be generated de novo. The experimental procedure was as described earlier. All obtained sequences were examined by checking for the presence of premature stop codons (pseudogene) and by BlastX searches against the nonredundant protein sequences (nr) to confirm that they were our target genes. The NPCLs, species, and accession numbers for the newly obtained sequences are listed in supplementary table S2, Supplementary Material online.
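The stop-codon check described above can be sketched as follows. Because each NPCL is an internal exon fragment, any in-frame stop codon is treated as premature. The sketch assumes ungapped nucleotide input, a known reading frame, and the standard nuclear genetic code; the function name and test sequences are hypothetical. It also reproduces the sequence-count arithmetic from the text.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard nuclear genetic code

def premature_stops(seq, frame=0):
    """Return 0-based start positions of in-frame stop codons.

    `frame` is the offset (0, 1, or 2) of the first complete codon.
    """
    seq = seq.upper()
    return [i for i in range(frame, len(seq) - 2, 3)
            if seq[i:i + 3] in STOP_CODONS]

# Hypothetical fragments, for illustration only.
clean_hits = premature_stops("ATGGCCTTAGGA")        # ATG GCC TTA GGA
stop_hits = premature_stops("ATGGCCTAAGGA")         # ATG GCC TAA GGA
shifted_hits = premature_stops("GATGTAAC", frame=1) # G ATG TAA C

# Sequence counts from the text: 30 markers x 28 taxa, 201 already public.
target_sequences = 30 * 28
to_generate = target_sequences - 201
```

A sequence flagged by this check would still need the BlastX comparison, since a clean reading frame alone does not confirm that the correct gene was amplified.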
Phylogenetic Analyses
Alignments of all 30 NPCL markers were conducted using the G-INS-i method in MAFFT (Katoh et al. 2005) under the default settings according to their translated amino acid sequences and then refined by eye. All 30 refined alignments were combined into a concatenated data set (27,834 bp).
For the concatenated data set, we manually defined five partitioning strategies: 2 partitions (one for codon positions 1 and 2, and one for codon position 3); 3 partitions (one partition for each codon position); 30 partitions (one partition for each gene); 60 partitions (one for codon positions 1 + 2 and one for codon position 3 across 30 genes); and 90 partitions (codon position partitioning across 30 genes). Comparisons of the five partitioning strategies and selections of corresponding nucleotide substitution models were conducted under the Bayesian information criterion implemented in PartitionFinder (Lanfear et al. 2012). The 3-partition scheme (one partition for each codon position) was chosen as the best-fitting partitioning strategy, and all 3 partitions favored the GTR + Γ + I model.
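The five partitioning strategies can be generated mechanically from the gene lengths. The sketch below writes each partition in the style of a RAxML partition file (for example, `DNA, pos1 = 1-9000\3`). It assumes that every gene starts at codon position 1 and has a length divisible by 3; the gene lengths, partition names, and function names are hypothetical and do not reproduce the actual 27,834-bp data set.

```python
def gene_ranges(gene_lengths):
    """1-based inclusive (start, end) of each gene in the concatenation."""
    ranges, start = [], 1
    for length in gene_lengths:
        ranges.append((start, start + length - 1))
        start += length
    return ranges

def codon_part(name, positions, start, end):
    """One partition covering the given codon positions within start..end."""
    spans = ", ".join(f"{start + p - 1}-{end}\\3" for p in positions)
    return f"DNA, {name} = {spans}"

def build_schemes(gene_lengths):
    ranges = gene_ranges(gene_lengths)
    total = ranges[-1][1]
    genes = list(enumerate(ranges, 1))
    return {
        "codon12_vs_3": [codon_part("pos12", (1, 2), 1, total),
                         codon_part("pos3", (3,), 1, total)],
        "by_codon": [codon_part(f"pos{p}", (p,), 1, total) for p in (1, 2, 3)],
        "by_gene": [f"DNA, gene{i} = {s}-{e}" for i, (s, e) in genes],
        "by_gene_codon12_vs_3": [
            part for i, (s, e) in genes
            for part in (codon_part(f"gene{i}_pos12", (1, 2), s, e),
                         codon_part(f"gene{i}_pos3", (3,), s, e))],
        "by_gene_and_codon": [codon_part(f"gene{i}_pos{p}", (p,), s, e)
                              for i, (s, e) in genes for p in (1, 2, 3)],
    }

# Thirty hypothetical genes of 300 bp each, for illustration only.
schemes = build_schemes([300] * 30)
partition_counts = {name: len(parts) for name, parts in schemes.items()}
```

With 30 genes, the five schemes contain 2, 3, 30, 60, and 90 partitions, matching the strategies compared in the text.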
The concatenated data set was separately analyzed with both ML and Bayesian inference (BI) methods under the 3-partition scheme. Partitioned ML analyses were implemented using RAxML version 7.2.6 (Stamatakis 2006) with the GTR + Γ + I model assigned to each partition. A search that combined 100 separate ML searches was applied to find the optimal tree, and branch support for each node was evaluated with 500 standard bootstrapping replicates (-f d -b 500 option) implemented in RAxML. The partitioned BI was conducted using MrBayes 3.2 (Ronquist et al. 2012). All model parameters were unlinked. Two MCMC runs were performed with one cold chain and three heated chains (temperature set to 0.1) for 60 million generations and sampled every 1,000 generations. Chain stationarity was visualized by plotting −ln L against the generation number using Tracer version 1.4, and the first 50% of generations were discarded. Topologies and posterior probabilities were estimated from the remaining generations. The two runs for each analysis were compared for congruence.
We also performed Bayesian phylogenetic analyses under a mixture model, CAT + GTR + Γ4, in PhyloBayes 3.3 (Lartillot et al. 2009) with two independent MCMC runs. Each run was performed for 10,000 cycles and sampled every cycle. Stationarity was considered reached when the largest discrepancy (maxdiff) between the two independent runs was less than 0.1. The first 5,000 trees in each MCMC run were discarded. The remaining 10,000 trees of the two runs were sampled every 5 trees to generate a 50% majority-rule posterior consensus tree.
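The burn-in and thinning settings of the Bayesian analyses determine how many samples contribute to each consensus tree. The arithmetic can be sketched as follows; it ignores the initial state that some programs also record at generation 0, and the function name is hypothetical.

```python
def retained_samples(total_steps, sample_every, burnin_fraction):
    """Samples left in one run after discarding the burn-in fraction."""
    sampled = total_steps // sample_every
    return sampled - int(sampled * burnin_fraction)

# MrBayes: 60 million generations, sampled every 1,000, first 50% discarded.
mrbayes_kept_per_run = retained_samples(60_000_000, 1000, 0.5)

# PhyloBayes: 10,000 cycles per run, first 5,000 trees discarded, two runs,
# then every 5th remaining tree is used for the consensus.
phylobayes_kept = 2 * (10_000 - 5_000)
phylobayes_thinned = phylobayes_kept // 5
```

Under these assumptions, each MrBayes run contributes 30,000 post-burn-in samples, and the PhyloBayes consensus is built from 2,000 trees.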
Species tree estimation was conducted using the pseudo-ML approach in the program MP-EST (Liu et al. 2010) under the coalescent model. Briefly, the gene trees for the 30 NPCL markers were reconstructed with ML under the GTR + Γ8 model using PHYML 3.0 (Guindon et al. 2010). The resulting 30 gene trees were rooted with an outgroup (Chrysemys picta bellii) and used to generate a species tree with MP-EST. The robustness of the species tree was evaluated with 500 nonparametric bootstrap replicates.
Alternative phylogenetic hypotheses were tested on the basis of the 30-gene data set. We first calculated sitewise log likelihoods of alternative trees using RAxML (-f g) under the GTR + Γ + I model. The site log-likelihood file was then used to estimate P values for each alternative tree in the CONSEL program (Shimodaira and Hasegawa 2001) using the Kishino–Hasegawa (KH) test (Kishino and Hasegawa 1989), the approximately unbiased (AU) test (Shimodaira 2002), and the RELL bootstrap proportion (BP) test.
Supplementary Material
Supplementary tables S1 and S2 and figures S1 and S2 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).
Acknowledgments
The authors are grateful to David Wake and Carol Spencer of the Museum of Vertebrate Zoology at the University of California, Berkeley, for tissue loans. David Wake provided many useful comments on the manuscript. This work was supported by National Natural Science Foundation of China grants (31172075 and 30900136) and the National Science Fund for Excellent Young Scholars (no. pending).
References
Binladen J, Gilbert MTP, Bollback JP, Panitz F, Bendixen C, Nielsen R, Willerslev E. 2007. The use of coded PCR primers enables high-throughput sequencing of multiple homolog amplification products by 454 parallel sequencing. PLoS One. 2:e197.
Crawford NG, Faircloth BC, McCormack JE, Brumfield RT, Winker K, Glenn TC. 2012. More than 1000 ultraconserved elements provide evidence that turtles are the sister group to archosaurs. Biol Lett. 8:783–786.
Delsuc F, Brinkmann H, Philippe H. 2005. Phylogenomics and the reconstruction of the tree of life. Nat Rev Genet. 6:361–375.
Duellman WE, Trueb L. 1994. Biology of amphibians. Baltimore (MD): Johns Hopkins University Press.
Dunn CW, Hejnol A, Matus DQ, et al. (18 co-authors). 2008. Broad phylogenomic sampling improves resolution of the animal tree of life. Nature 452:745–749.
Faircloth BC, Glenn TC. 2012. Not all sequence tags are created equal: designing and validating sequence identification tags robust to indels. PLoS One. 7:e42543.
Faircloth BC, McCormack JE, Crawford NG, Brumfield RT, Glenn TC. 2012. Ultraconserved elements anchor thousands of genetic markers for target enrichment spanning multiple evolutionary timescales. Syst Biol. 61:717–726.
Fong JJ, Brown JM, Fujita MK, Boussau B. 2012. A phylogenomic approach to vertebrate phylogeny supports a turtle-archosaur affinity and a possible paraphyletic lissamphibia. PLoS One. 7:e48990.
Fong JJ, Fujita MK. 2011. Evaluating phylogenetic informativeness and data-type usage for new protein-coding genes across Vertebrata. Mol Phylogenet Evol. 61:300–307.
Frost DR, Grant T, Faivovich J, et al. (18 co-authors). 2006. The amphibian tree of life. Bull Am Mus Nat Hist. 297:8–370.
Gao KQ, Shubin NH. 2001. Late Jurassic salamanders from northern China. Nature 410:574–577.
Guindon S, Dufayard J-F, Lefort V, Anisimova M, Hordijk W, Gascuel O. 2010. New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Syst Biol. 59:307–321.
Hugall AF, Foster R, Lee MSY. 2007. Calibration choice, rate smoothing, and the pattern of tetrapod diversification according to the long nuclear gene RAG-1. Syst Biol. 56:543–563.
Inoue JG, Miya M, Lam K, Tay BH, Danks JA, Bell J, Walker TI, Venkatesh B. 2010. Evolutionary origin and phylogeny of the modern holocephalans (Chondrichthyes: Chimaeriformes): a mitogenomic perspective. Mol Biol Evol. 27:2576–2586.
Katoh K, Kuma K, Toh H, Miyata T. 2005. MAFFT version 5: improvement in accuracy of multiple sequence alignment. Nucleic Acids Res. 33:511–518.
Kishino H, Hasegawa M. 1989. Evaluation of the maximum likelihood estimate of the evolutionary tree topologies from DNA sequence data, and the branching order in Hominoidea. J Mol Evol. 29:170–179.
Künstner A, Wolf JBW, Backström N, et al. (13 co-authors). 2010. Comparative genomics based on massive parallel transcriptome sequencing reveals patterns of substitution and selection across 10 bird species. Mol Ecol. 19:266–276.
Lanfear R, Calcott B, Ho SYW, Guindon S. 2012. Partitionfinder: combined selection of partitioning schemes and substitution models for phylogenetic analyses. Mol Biol Evol. 29:1695–1701.
Lartillot N, Lepage T, Blanquart S. 2009. PhyloBayes 3: a Bayesian software package for phylogenetic reconstruction and molecular dating. Bioinformatics 25:2286–2288.
Lemmon AR, Brown JM, Stanger-Hall K, Lemmon EM. 2009. The effect of ambiguous data on phylogenetic estimates obtained by maximum likelihood and Bayesian inference. Syst Biol. 58:130–145.
Lemmon AR, Emme SA, Lemmon EM. 2012. Anchored hybrid enrichment for massively high-throughput phylogenomics. Syst Biol. 61:727–744.
Li C, Ortí G, Zhang G, Lu G. 2007. A practical approach to phylogenomics: the phylogeny of ray-finned fish (Actinopterygii) as a case study. BMC Evol Biol. 7:44.
Li C, Riethoven JJ, Naylor GJ. 2012. EvolMarkers: a database for mining exon and intron markers for evolution, ecology and conservation studies. Mol Ecol Resour. 12:967–971.
Liu L, Yu L, Edwards SV. 2010. A maximum pseudo-likelihood approach for estimating species trees under the coalescent model. BMC Evol Biol. 10:302.
McCormack JE, Faircloth BC, Crawford NG, Gowaty PA, Brumfield RT, Glenn TC. 2012. Ultraconserved elements are novel phylogenomic markers that resolve placental mammal phylogeny when combined with species-tree analysis. Genome Res. 22:746–754.
McCormack JE, Hird SM, Zellmer AJ, Carstens BC, Brumfield RT. 2013. Applications of next-generation sequencing to phylogeography and phylogenetics. Mol Phylogenet Evol. 66:526–538.
Meyer A, Van de Peer Y. 2005. From 2R to 3R: evidence for a fish-specific genome duplication (FSGD). Bioessays 27:937–945.
Meyer M, Stenzel U, Hofreiter M. 2008. Parallel tagged sequencing on the 454 platform. Nat Protoc. 3:267–278.
Murphy WJ, Eizirik E, Johnson WE, Zhang YP, Ryder OA, O'Brien SJ. 2001. Molecular phylogenetics and the origins of placental mammals. Nature 409:614–618.
Philippe H, Derelle R, Lopez P, et al. (20 co-authors). 2009. Phylogenomics revives traditional views on deep animal relationships. Curr Biol. 19:706–712.
Philippe H, Telford M. 2006. Large-scale sequencing and the new animal phylogeny. Trends Ecol Evol. 21:614–620.
Portik DM, Wood PL, Grismer JL, Stanley EL, Jackman TR. 2011. Identification of 104 rapidly-evolving nuclear protein-coding markers for amplification across scaled reptiles using genomic resources. Conserv Genet Resour. 4:1–10.
Pyron RA, Wiens JJ. 2011. A large-scale phylogeny of Amphibia including over 2800 species, and a revised classification of extant frogs, salamanders, and caecilians. Mol Phylogenet Evol. 61:543–583.
Roelants K, Gower DJ, Wilkinson M, Loader SP, Biju SD, Guillaume K, Moriau L, Bossuyt F. 2007. Global patterns of diversification in the history of modern amphibians. Proc Natl Acad Sci U S A. 104:887–892.
Rokas A, Williams BL, King N, Carroll SB. 2003. Genome-scale approaches to resolving incongruence in molecular phylogenies. Nature 425:798–804.
Ronquist F, Teslenko M, Van der Mark P, Ayres DL, Darling A, Höhna S, Larget B, Liu L, Suchard MA, Huelsenbeck JP. 2012. MrBayes 3.2: efficient Bayesian phylogenetic inference and model choice across a large model space. Syst Biol. 61:539–542.
Roure B, Baurain D, Philippe H. 2013. Impact of missing data on phylogenies inferred from empirical phylogenomic data sets. Mol Biol Evol. 30:197–214.
San Mauro D. 2010. A multilocus timescale for the origin of extant amphibians. Mol Phylogenet Evol. 56:554–561.
San Mauro D, Vences M, Alcobendas M, Zardoya R, Meyer A. 2005. Initial diversification of living amphibians predated the breakup of Pangaea. Am Nat. 165:590–599.
Shen XX, Liang D, Wen JZ, Zhang P. 2011. Multiple genome alignments facilitate development of NPCL markers: a case study of tetrapod phylogeny focusing on the position of turtles. Mol Biol Evol. 28:3237–3252.
Shen XX, Liang D, Zhang P. 2012. The development of three long universal nuclear protein-coding locus markers and their application to osteichthyan phylogenetics with nested PCR. PLoS One. 7:e39256.
Shimodaira H. 2002. An approximately unbiased test of phylogenetic tree selection. Syst Biol. 51:492–508.
Shimodaira H, Hasegawa M. 2001. CONSEL: for assessing the confidence of phylogenetic tree selection. Bioinformatics 17:1246–1247.
Song S, Liu L, Edwards SV, Wu S. 2012. Resolving conflict in eutherian mammal phylogeny using phylogenomics and the multispecies coalescent model. Proc Natl Acad Sci U S A. 109:14942–14947.
Spinks PQ, Thomson RC, Barley AJ, Newman CE, Shaffer HB. 2010. Testing avian, squamate, and mammalian nuclear markers for cross amplification in turtles. Conserv Genet Resour. 2:127–129.
Spinks PQ, Thomson RC, Lovely GA, Shaffer HB. 2009. Assessing what is needed to resolve a molecular phylogeny: simulations and empirical data from Emydid turtles. BMC Evol Biol. 9:56.
Stamatakis A. 2006. RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics 22:2688–2690.
Tewhey R, Warner JB, Nakano M, Libby B, Medkova M, David PH, Kotsopoulos SK, Samuels ML, Hutchison JB, Larson JW. 2009. Microdroplet-based PCR enrichment for large-scale targeted sequencing. Nat Biotechnol. 27:1025–1031.
Thomson RC, Shedlock AM, Edwards SV, Shaffer HB. 2008. Developing markers for multilocus phylogenetics in non-model organisms: a test case with turtles. Mol Phylogenet Evol. 49:514–525.
Thomson RC, Wang IJ, Johnson JR. 2010. Genome-enabled development of DNA markers for ecology, evolution and conservation. Mol Ecol. 19:2184–2195.
Townsend TM, Alegre RE, Kelley ST, Wiens JJ, Reeder TW. 2008. Rapid development of multiple nuclear loci for phylogenetic analysis using genomic resources: an example from squamate reptiles. Mol Phylogenet Evol. 47:129–142.
Weisrock DW, Harmon LJ, Larson A. 2005. Resolving deep phylogenetic relationships in salamanders: analyses of mitochondrial and nuclear genomic DNA. Syst Biol. 54:758–777.
Wiens J, Bonett R, Chippindale P. 2005. Ontogeny discombobulates phylogeny: paedomorphosis and higher-level salamander relationships. Syst Biol. 54:91–110.
Wright TF, Schirtzinger EE, Matsumoto T, et al. (11 co-authors). 2008. A multilocus molecular phylogeny of the parrots (Psittaciformes): support for a Gondwanan origin during the cretaceous. Mol Biol Evol. 25:2141–2156.
Zhang P, Wake DB. 2009. Higher-level salamander relationships and divergence dates inferred from complete mitochondrial genomes. Mol Phylogenet Evol. 53:492–508.
Zhang P, Zhou H, Chen YQ, Liu YF, Qu LH. 2005. Mitogenomic perspectives on the origin and phylogeny of living amphibians. Syst Biol. 54:391–400.
Zhou XM, Xu SX, Zhang P, Yang G. 2011. Developing a series of conservative anchor markers and their application to phylogenomics of Laurasiatherian mammals. Mol Ecol Resour. 11:134–140.
2248
Shen et al doi101093molbevmst122 MBE at V
anderbilt University - M
assey Law
Library on M
arch 18 2015httpm
beoxfordjournalsorgD
ownloaded from
suitable as candidates for NPCL development with the nestedPCR method To search for long and conserved exons wetook advantages of our previous bioinformatic methodwhich used the multiple genome alignment data from theUCSC Genome Browser to identify conserved exons (Shenet al 2011) Because the NPCL markers are to be used invertebrates we focused only on those multiple genome align-ments that include at least six species Danio rerio (zebrafish)Silurana tropicalis (frog) Anolis carolinensis (lizard) Gallusgallus (chicken) Mus musculus (mouse) and Homo sapiens(human) The alignments of candidate exons had to meettwo criteria length of more than 700 bp and pairwise similar-ity ranging from 35 to 90 The detailed bioinformatic pipe-line has been described elsewhere (Shen et al 2011) Inaddition to using multiple genome alignments to screenNPCL candidates we also manually searched for nucleargenes that were used previously (Murphy et al 2001 Li
et al 2007 Townsend et al 2008 Wright et al 2008 Zhouet al 2011 Song et al 2012) in the ENSEMBL databaseto check whether they contain large and appropriately con-served exons
As a result we assembled a total of 305 NPCL candidatealignments of which 120 contained the appropriate numberof conserved blocks and used these to design nested PCRprimers To increase the success rates of our NPCL markers inamniotes we manually added a turtle sequence (Chrysemyspicta bellii) to each of the candidate alignments using datadownloaded from the ENSEMBL database A total number of480 primers were designed for the 120 NPCL candidatesBriefly the first-round PCR primers are only used to enrichtarget regions from genomic environments and not to obtaintarget amplicons so the degeneracy of these primers is nor-mally high to increase reaction sensitivity the second-roundPCR primers are used to obtain target amplicons so the
Enrich target region from complex genomic environment with one pair of high degenerate primers
Cycling conditions an initial denaturation step of 4 min at 94degC followed by 35 cycles of 94 degC for 45 s 45 degC for 40 s and 72 degC for 2 min and a final extension at 72 degC for 10 min
Specifically amplify target region from the first round PCR products with one pair of tailed low degenerate primers
Cycling conditions an initial denaturation step of 4 min at 94degC followed by 35 cycles of 94 degC for 45 s 50 degC for 40 s and 72 degC for 90 s and a final extension at 72 degC for 10 min
Evaluate agarose gel electrophoretic results and sequencing
gDNA
the first round PCR product
Second PCR with tailed primers F2 and R2 using the 1st PCR as template
First PCR with primers F1 and R1 using gDNA as template
PCR evaluating and sequencing
25 ul PCR product is cleaned with 2U ExoI and 04U FastAPcleanup conditions 37 degC for 30 min 80 degC for 15 mincleaned PCR product can be used for direct sequencing
A Sanger sequencing reaction is performed with 05 microl BigDye and 1 microl cleanup PCR product
PCR was performed with 50-100 ng DNA in a 25 ul reaction
PCR was performed with 1ul 1st PCR in a 25 ul reaction
(i)
(ii)
(iii)
F1
R2
F2
Seq_F
Seq_R
R1
Target Region
Target Region
Target Region
Target Region
Target Region
conserved blocks
single and strong bandN
Y (normally gt 90)
gel cutting or cloning then sequencing
cleanup with ExoI and FastAPdirect sequencing by general sequencing primers
Seq_F and Seq_R
Laboratory Protocol
FIG 6 Schematic representation of the experimental protocol for using our NPCL toolkit Note that for each NPCL nested PCR primers are designed onfour short conserved blocks flanking the target region
2244
Shen et al doi101093molbevmst122 MBE at V
anderbilt University - M
assey Law
Library on M
arch 18 2015httpm
beoxfordjournalsorgD
ownloaded from
degeneracy of these primers is lower to increase reactionspecificity Our previous study showed that the nested PCRmethod often produces strong and single amplicon bands(Shen et al 2012) To facilitate the next-step direct sequenc-ing we added a tail (50-AGGGTTTTCCCAGTCACGAC-30) tothe 50-end of all second-round forward primers and a tail (50-AGATAACAATTTCACACAGG-30) to the 50-end of all second-round reverse primers These tail sequences can provide twounique anchoring sites for direct sequencing from cleanedPCR products In our pilot experiments adding the tail se-quences to primers did not affect the efficiency of the second-round PCR
Experimental Testing for Candidate Markers in16 Jawed Vertebrates
To test the experimental performance of our newly designedNPCL markers we selected 16 taxa representing nine majorjawed vertebrate lineages Chondrichthyes (Sphyrna lewini)Actinopterygii (Lepisosteus oculatus and Pangasius sutchi)Dipnoi (Protopterus annectens) Lissamphibia (Ichthyophisbannanicus Batrachuperus yenyuanensis and Rana nigroma-culata) Mammalia (Mus musculus and Sus scrofa domestica)Testudines (Trionyx sinensis and Podocnemis unifilis) Aves(Struthio camelus and Zosterops japonica) Crocodylia(Crocodylus siamensis) and Squamata (Hemidactylusbowringii and Naja naja atra) Total genomic DNA was ex-tracted from ethanol-preserved tissues (liver or muscle) usingthe standard salt extraction protocol All extracted genomicDNAs were diluted to a concentration of 50 ngml1 with1 TE and stored at 20 C before PCR amplification
All 120 NPCL markers were tested with a two-round PCRstrategy (nested PCR) The first-round PCR was performed in25ml reaction containing 1ndash2ml template DNA (50ndash100 ng)with final concentrations of 1 PCR buffer 200mM dNTP400 nM of each forward and reverse first-round primers and125 U Taq polymerase (TransTaq High Fidelity TransGenBeijing) The cycling conditions of the first-round PCR wereas follows an initial denaturation step of 4 min at 94 C fol-lowed by 35 cycles of a 45 s denaturation at 94 C a 40 sannealing at 45 C and a 2 min extension at 72 C followedby a final 10 min extension at 72 C The second-round PCRwas also performed in 25ml reaction containing 1ml of thefirst round PCR product (without dilution) and final concen-trations of 1 PCR buffer 200mM dNTP 400 nM of eachforward and reverse second-round primers and 125 U Taqpolymerase The cycling conditions of the second-round PCRwere as follows an initial denaturation step of 4 min at 94 Cfollowed by 35 cycles of a 45 s denaturation at 94 C a 40 sannealing at 50 C and a 90 s extension at 72 C followed by afinal 10 min extension at 72 C
One microliter of the second-round PCR products wasanalyzed on 10 TAE agarose gel An NPCL marker wasconsidered successful if more than 8 of the 16 tested taxaproduced target amplicon bands On this basis 102 out of 120tested NPCL markers were successful The nested-PCR pri-mers for the 102 NPCL markers can be found in the onlinesupplementary table S1 Supplementary Material online If the
PCR products contained significant nonspecific ampliconbands (normallylt 10) they needed further processing forexample standard gel cutting or cloning If the PCR reactionsproduced single amplicon bands (normallygt 90) they werecleaned with ExoFAP treatment 2 U ExoI and 04 U FastAP(all Fermentas) were added to the PCR tube and incubatedfor 30 min at 37 C and 15 min at 80 C The cleanup PCRreactions can be directly used as templates for Sanger se-quencing According to our experimental designs all PCRfragments can be sequenced with the two universal sequenc-ing primers Seq_F 50-AGGGTTTTCCCAGTCACGAC-30 andSeq_R 50-AGATAACAATTTCACACAGG-30 from both endsA typical Sanger sequencing reaction in our laboratory con-sumes 05ml BigDye and 1ml cleanup PCR product Theprimer design strategy the laboratory protocol for thenested PCR method and the pretreatment of PCR productsbefore Sanger sequencing are illustrated in figure 6
Calculation of Relative Evolutionary Rateof 102 NPCLs
The rate multipliers (m) across partitions estimated inMrBayes 32 (Ronquist et al 2012) are used as relative evolu-tionary rates To calculate these parameters alignments foreach NPCL were prepared for 12 species Homo sapiensMacaca mulatta Mus musculus Rattus norvegicus Gallusgallus Meleagris gallopavo Chrysemys picta bellii Anolis car-olinensis Silurana tropicalis Tetraodon nigroviridis Takifugurubripes and Danio rerio Because genome data are availablefor the 12 species we did not generate any new data The 102NPCL alignments were then combined and subjected toMrBayes analyses partitioned by genes Each gene was as-signed a separate GTR + + I model and all model param-eters were unlinked Two Markov chain Monte Carlo(MCMC) runs were performed with one cold chain andthree heated chains (temperature set to 01) for 50 milliongenerations and sampled every 1000 generations The ratemultiplier for each gene was estimated using Tracer version14 after discarding the first 50 of the generations All evo-lutionary rates were normalized by dividing by the maximumvalue of the obtained rates
Gene and Taxon Sampling for Investigating Higher Level Salamander Relationships
To test the utility of our NPCL toolkit in a real case, we selected 19 salamander taxa representing all 10 salamander families and 9 outgroup taxa to investigate the family-level relationships of salamanders (supplementary table S2, Supplementary Material online). For gene sampling, we randomly selected 30 NPCL markers whose PCR success rates were more than 90% in the 16 previously tested vertebrates. Among the target 840 sequences (30 markers for 28 taxa), 201 were available in public databases (NCBI, UCSC, and ENSEMBL), whereas the remaining 639 sequences needed to be generated de novo. The experimental procedure was as described earlier. All obtained sequences were examined by checking for the presence of premature stop codons (pseudogene) and by BlastX searches against the nonredundant
protein sequences (nr) to confirm that they were our target genes. The NPCLs, species, and accession numbers for the newly obtained sequences are listed in supplementary table S2, Supplementary Material online.
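The sampling arithmetic and the stop-codon screen can be illustrated with a short sketch. It assumes unaligned sequences in a known reading frame and the standard nuclear genetic code. The function name and the test sequences are hypothetical, and this is not the script used in the study.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard nuclear genetic code


def in_frame_stops(seq: str, frame: int = 0) -> list[int]:
    """Return 0-based start positions of in-frame stop codons.

    `frame` is the offset (0, 1, or 2) of the first complete codon.
    """
    seq = seq.upper()
    return [i for i in range(frame, len(seq) - 2, 3) if seq[i:i + 3] in STOP_CODONS]


# Target counts stated in the text: 30 markers x 28 taxa, 201 already public.
markers, taxa, from_databases = 30, 28, 201
targets = markers * taxa
print(targets, targets - from_databases)

# Hypothetical sequences: the first has a premature stop, the second does not.
print(in_frame_stops("ATGGCCTAAGGC"))
print(in_frame_stops("ATGGCCAAGGGC"))
```

A non-empty result flags a sequence as a possible pseudogene or paralog that deserves a closer look before it enters the alignment.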
Phylogenetic Analyses
Alignments of all 30 NPCL markers were conducted using the G-INS-i method from MAFFT (Katoh et al. 2005) under the default settings according to their translated amino acid sequences, then refined by eye. All 30 refined alignments were combined into a concatenated data set (27,834 bp).
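Combining per-gene alignments into one supermatrix can be sketched as below. The data layout (one dictionary per gene, mapping taxon name to aligned sequence) and the use of "?" for taxa missing from a gene are assumptions made for illustration. The text does not say how the concatenation was carried out.

```python
def concatenate(alignments: list[dict[str, str]], missing: str = "?") -> dict[str, str]:
    """Join per-gene alignments end to end, padding absent taxa."""
    taxa = sorted({taxon for aln in alignments for taxon in aln})
    parts: dict[str, list[str]] = {taxon: [] for taxon in taxa}
    for aln in alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within one alignment differ in length")
        width = lengths.pop()
        for taxon in taxa:
            # A taxon lacking this gene gets a run of missing-data symbols.
            parts[taxon].append(aln.get(taxon, missing * width))
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}


# Two tiny hypothetical gene alignments; taxon_C lacks the second gene.
gene1 = {"taxon_A": "ATGGCC", "taxon_B": "ATGGCT", "taxon_C": "ATGGCA"}
gene2 = {"taxon_A": "AAGGGC", "taxon_B": "AAGGGT"}
matrix = concatenate([gene1, gene2])
for taxon, seq in matrix.items():
    print(taxon, seq)
```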
For the concatenated data set, we manually defined five partitioning strategies: 2 partitions (one for codon positions 1 and 2 and one for codon position 3); 3 partitions (one partition for each codon position); 30 partitions (one partition for each gene); 60 partitions (one for codon positions 1 + 2 and one for codon position 3, across 30 genes); and 90 partitions (codon position partitioning across 30 genes). Comparisons of the five partitioning strategies and selections of corresponding nucleotide substitution models were conducted under the Bayesian information criterion implemented in PartitionFinder (Lanfear et al. 2012). The 3-partition scheme (one partition for each codon position) was chosen as the best-fitting partitioning strategy, and all 3 partitions favored the GTR + Γ + I model.
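The five schemes differ only in how alignment columns are grouped, which a short sketch makes concrete. It assumes every gene alignment starts in frame and spans a whole number of codons. The gene lengths used here are placeholders, not the real locus lengths.

```python
def build_schemes(gene_lengths: list[int]) -> dict[str, dict[str, list[int]]]:
    """Map each partitioning scheme to its partitions (1-based column lists)."""
    bounds, start = [], 1
    for length in gene_lengths:
        if length % 3:
            raise ValueError("each gene alignment must be a whole number of codons")
        bounds.append((start, start + length - 1))
        start += length
    genes = range(len(bounds))

    def cols(gene: int, positions: tuple[int, ...]) -> list[int]:
        """Columns of the given codon positions within one gene."""
        first, last = bounds[gene]
        return sorted(c for p in positions for c in range(first + p - 1, last + 1, 3))

    def pooled(positions: tuple[int, ...]) -> list[int]:
        """Columns of the given codon positions pooled over all genes."""
        return [c for gene in genes for c in cols(gene, positions)]

    return {
        "2 partitions": {"pos1+2": pooled((1, 2)), "pos3": pooled((3,))},
        "3 partitions": {f"pos{p}": pooled((p,)) for p in (1, 2, 3)},
        "30 partitions": {f"gene{g + 1}": cols(g, (1, 2, 3)) for g in genes},
        "60 partitions": {f"gene{g + 1}_{name}": cols(g, pos)
                          for g in genes
                          for name, pos in (("pos1+2", (1, 2)), ("pos3", (3,)))},
        "90 partitions": {f"gene{g + 1}_pos{p}": cols(g, (p,))
                          for g in genes for p in (1, 2, 3)},
    }


# Hypothetical input: 30 genes of 900 bp each.
schemes = build_schemes([900] * 30)
print([len(partitions) for partitions in schemes.values()])
```

Each scheme assigns every column to exactly one partition, so the schemes are directly comparable under an information criterion.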
The concatenated data set was separately analyzed with both ML and Bayesian inference (BI) methods under the 3-partition scheme. Partitioned ML analyses were implemented using RAxML version 7.2.6 (Stamatakis 2006) with the GTR + Γ + I model assigned for each partition. A search that combined 100 separate ML searches was applied to find the optimal tree, and branch support for each node was evaluated with 500 standard bootstrapping replicates (-f d -b 500 option) implemented in RAxML. The partitioned BI was conducted using MrBayes 3.2 (Ronquist et al. 2012). All model parameters were unlinked. Two MCMC runs were performed with one cold chain and three heated chains (temperature set to 0.1) for 60 million generations and sampled every 1,000 generations. The chain stationarity was visualized by plotting -ln L against the generation number using Tracer version 1.4, and the first 50% of generations were discarded. Topologies and posterior probabilities were estimated from the remaining generations. Two runs for each analysis were compared for congruence.
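For readers who want to set up a comparable Bayesian run, the settings above translate roughly into a MrBayes command block. The sketch below only generates such a block as text. It assumes the concatenated alignment is in frame from column 1, so that codon positions can be addressed with a stride of 3. The partition name `by_codon` is arbitrary, and the actual input file used in the study is not given in the text.

```python
def mrbayes_block(n_sites: int, ngen: int = 60_000_000,
                  samplefreq: int = 1_000, temp: float = 0.1) -> str:
    """Build a MrBayes block for a codon-position-partitioned GTR+G+I run."""
    if n_sites % 3:
        raise ValueError("alignment length must be a whole number of codons")
    charsets = [f"  charset pos{p} = {p}-{n_sites}\\3;" for p in (1, 2, 3)]
    return "\n".join([
        "begin mrbayes;",
        *charsets,
        "  partition by_codon = 3: pos1, pos2, pos3;",
        "  set partition = by_codon;",
        # nst=6 with rates=invgamma corresponds to GTR + Gamma + I.
        "  lset applyto=(all) nst=6 rates=invgamma;",
        # Estimate substitution parameters separately for each partition.
        "  unlink statefreq=(all) revmat=(all) shape=(all) pinvar=(all);",
        # Two runs of four chains each: one cold and three heated.
        f"  mcmc ngen={ngen} samplefreq={samplefreq} nruns=2 nchains=4 temp={temp};",
        # Discard the first half of the samples before summarizing trees.
        "  sumt relburnin=yes burninfrac=0.5;",
        "end;",
    ])


print(mrbayes_block(27_834))
```

The defaults mirror the values reported above (60 million generations, sampling every 1,000, temperature 0.1). Any of them can be changed through the function arguments.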
We also performed Bayesian phylogenetic analyses under a mixture model, CAT + GTR + Γ4, in PhyloBayes 3.3 (Lartillot et al. 2009) with two independent MCMC runs. Each run was performed for 10,000 cycles and sampled every cycle. Stationarity was reached when the largest discrepancy (maxdiff) was less than 0.1 between two independent runs. The first 5,000 trees in each MCMC run were discarded. The remaining 10,000 trees of the two runs were sampled every 5 trees to generate a 50% majority-rule posterior consensus tree.
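The burn-in, thinning, and convergence bookkeeping can be sketched in a few lines. Trees are represented by integers and bipartitions by arbitrary labels. Both are placeholders, and the functions are illustrative rather than a reimplementation of the PhyloBayes tools.

```python
def pooled_sample(runs: list[list], burnin: int, thin: int) -> list:
    """Drop the burn-in from each run, pool the runs, keep every `thin`-th tree."""
    pooled = [tree for run in runs for tree in run[burnin:]]
    return pooled[::thin]


def maxdiff(freqs_a: dict[str, float], freqs_b: dict[str, float]) -> float:
    """Largest difference in bipartition frequency between two runs."""
    splits = set(freqs_a) | set(freqs_b)
    return max(abs(freqs_a.get(s, 0.0) - freqs_b.get(s, 0.0)) for s in splits)


# Two runs of 10,000 sampled trees each (integers stand in for trees).
runs = [list(range(10_000)), list(range(10_000))]
kept = pooled_sample(runs, burnin=5_000, thin=5)
print(len(kept))

# Hypothetical bipartition frequencies from the two runs.
run1 = {"AB|CD": 0.98, "AC|BD": 0.02}
run2 = {"AB|CD": 0.93, "AD|BC": 0.07}
print(maxdiff(run1, run2) < 0.1)
```

With the numbers in the text, 5,000 trees per run survive the burn-in, giving 10,000 pooled trees and 2,000 trees after taking every fifth one.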
Species tree estimation was conducted using the pseudo-ML approach in the program MP-EST (Liu et al. 2010) under the coalescent model. Briefly, the gene trees for 30 NPCL markers were reconstructed with ML under the GTR + Γ8 model using PHYML 3.0 (Guindon et al. 2010). The resulting 30 gene trees were rooted with an outgroup (Chrysemys picta bellii) and used to generate an MP-EST species tree using the program MP-EST. The robustness of the species tree was evaluated with nonparametric bootstrapping of 500 replicates.
Alternative phylogenetic hypotheses were tested based on the 30-gene data set. We first calculated sitewise log likelihoods of alternative trees using RAxML (-f g) under the GTR + Γ + I model. Then, the site log-likelihood file was used to estimate P values for each alternative tree in the CONSEL program (Shimodaira and Hasegawa 2001), using the Kishino–Hasegawa test (KH) (Kishino and Hasegawa 1989), the approximately unbiased test (AU) (Shimodaira 2002), and the RELL bootstrap proportion test (BP).
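Of the three tests, the RELL bootstrap proportion is the simplest to illustrate: site log-likelihoods are resampled with replacement, and each tree's proportion is the fraction of replicates in which it has the highest summed log-likelihood. The sketch below is a plain-Python illustration with hypothetical values. It is not CONSEL's implementation and does not cover the KH or AU tests.

```python
import random


def rell_bootstrap_proportions(site_lnl: dict[str, list[float]],
                               replicates: int = 10_000,
                               seed: int = 1) -> dict[str, float]:
    """Estimate RELL bootstrap proportions from sitewise log-likelihoods."""
    rng = random.Random(seed)
    trees = list(site_lnl)
    n_sites = len(site_lnl[trees[0]])
    if any(len(site_lnl[t]) != n_sites for t in trees):
        raise ValueError("all trees need the same number of sites")
    wins = dict.fromkeys(trees, 0)
    for _ in range(replicates):
        # Resample site indices with replacement, reusing the fixed site values.
        sample = rng.choices(range(n_sites), k=n_sites)
        totals = {t: sum(site_lnl[t][i] for i in sample) for t in trees}
        wins[max(totals, key=totals.get)] += 1
    return {t: wins[t] / replicates for t in trees}


# Hypothetical sitewise log-likelihoods for two candidate topologies.
site_lnl = {
    "tree_1": [-2.1, -3.4, -1.9, -2.8, -3.0, -2.2],
    "tree_2": [-2.3, -3.1, -2.4, -2.9, -3.3, -2.0],
}
print(rell_bootstrap_proportions(site_lnl, replicates=2_000))
```

Because the per-site values are fixed and only the site indices are resampled, no tree has to be re-optimized for the replicates, which is what makes the RELL approach cheap.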
Supplementary Material
Supplementary tables S1 and S2 and figures S1 and S2 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).
Acknowledgments
The authors are grateful to David Wake and Carol Spencer of the Museum of Vertebrate Zoology at the University of California, Berkeley, for tissue loans. David Wake provided many useful comments on the manuscript. This work was supported by National Natural Science Foundation of China grants (31172075 and 30900136) and the National Science Fund for Excellent Young Scholars (no. pending).
References
Binladen J, Gilbert MTP, Bollback JP, Panitz F, Bendixen C, Nielsen R, Willerslev E. 2007. The use of coded PCR primers enables high-throughput sequencing of multiple homolog amplification products by 454 parallel sequencing. PLoS One 2:e197.
Crawford NG, Faircloth BC, McCormack JE, Brumfield RT, Winker K, Glenn TC. 2012. More than 1000 ultraconserved elements provide evidence that turtles are the sister group to archosaurs. Biol Lett. 8:783–786.
Delsuc F, Brinkmann H, Philippe H. 2005. Phylogenomics and the reconstruction of the tree of life. Nat Rev Genet. 6:361–375.
Duellman WE, Trueb L. 1994. Biology of amphibians. Baltimore (MD): Johns Hopkins University Press.
Dunn CW, Hejnol A, Matus DQ, et al. (18 co-authors). 2008. Broad phylogenomic sampling improves resolution of the animal tree of life. Nature 452:745–749.
Faircloth BC, Glenn TC. 2012. Not all sequence tags are created equal: designing and validating sequence identification tags robust to indels. PLoS One 7:e42543.
Faircloth BC, McCormack JE, Crawford NG, Brumfield RT, Glenn TC. 2012. Ultraconserved elements anchor thousands of genetic markers for target enrichment spanning multiple evolutionary timescales. Syst Biol. 61:717–726.
Fong JJ, Brown JM, Fujita MK, Boussau B. 2012. A phylogenomic approach to vertebrate phylogeny supports a turtle-archosaur affinity and a possible paraphyletic lissamphibia. PLoS One 7:e48990.
Fong JJ, Fujita MK. 2011. Evaluating phylogenetic informativeness and data-type usage for new protein-coding genes across Vertebrata. Mol Phylogenet Evol. 61:300–307.
Frost DR, Grant T, Faivovich J, et al. (18 co-authors). 2006. The amphibian tree of life. Bull Am Mus Nat Hist. 297:8–370.
Gao KQ, Shubin NH. 2001. Late Jurassic salamanders from northern China. Nature 410:574–577.
Guindon S, Dufayard J-F, Lefort V, Anisimova M, Hordijk W, Gascuel O. 2010. New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Syst Biol. 59:307–321.
Hugall AF, Foster R, Lee MSY. 2007. Calibration choice, rate smoothing, and the pattern of tetrapod diversification according to the long nuclear gene RAG-1. Syst Biol. 56:543–563.
Inoue JG, Miya M, Lam K, Tay BH, Danks JA, Bell J, Walker TI, Venkatesh B. 2010. Evolutionary origin and phylogeny of the modern holocephalans (Chondrichthyes: Chimaeriformes): a mitogenomic perspective. Mol Biol Evol. 27:2576–2586.
Katoh K, Kuma K, Toh H, Miyata T. 2005. MAFFT version 5: improvement in accuracy of multiple sequence alignment. Nucleic Acids Res. 33:511–518.
Kishino H, Hasegawa M. 1989. Evaluation of the maximum likelihood estimate of the evolutionary tree topologies from DNA sequence data, and the branching order in Hominoidea. J Mol Evol. 29:170–179.
Künstner A, Wolf JBW, Backström N, et al. (13 co-authors). 2010. Comparative genomics based on massive parallel transcriptome sequencing reveals patterns of substitution and selection across 10 bird species. Mol Ecol. 19:266–276.
Lanfear R, Calcott B, Ho SYW, Guindon S. 2012. Partitionfinder: combined selection of partitioning schemes and substitution models for phylogenetic analyses. Mol Biol Evol. 29:1695–1701.
Lartillot N, Lepage T, Blanquart S. 2009. PhyloBayes 3: a Bayesian software package for phylogenetic reconstruction and molecular dating. Bioinformatics 25:2286–2288.
Lemmon AR, Brown JM, Stanger-Hall K, Lemmon EM. 2009. The effect of ambiguous data on phylogenetic estimates obtained by maximum likelihood and Bayesian inference. Syst Biol. 58:130–145.
Lemmon AR, Emme SA, Lemmon EM. 2012. Anchored hybrid enrichment for massively high-throughput phylogenomics. Syst Biol. 61:727–744.
Li C, Ortí G, Zhang G, Lu G. 2007. A practical approach to phylogenomics: the phylogeny of ray-finned fish (Actinopterygii) as a case study. BMC Evol Biol. 7:44.
Li C, Riethoven JJ, Naylor GJ. 2012. EvolMarkers: a database for mining exon and intron markers for evolution, ecology and conservation studies. Mol Ecol Resour. 12:967–971.
Liu L, Yu L, Edwards SV. 2010. A maximum pseudo-likelihood approach for estimating species trees under the coalescent model. BMC Evol Biol. 10:302.
McCormack JE, Faircloth BC, Crawford NG, Gowaty PA, Brumfield RT, Glenn TC. 2012. Ultraconserved elements are novel phylogenomic markers that resolve placental mammal phylogeny when combined with species-tree analysis. Genome Res. 22:746–754.
McCormack JE, Hird SM, Zellmer AJ, Carstens BC, Brumfield RT. 2013. Applications of next-generation sequencing to phylogeography and phylogenetics. Mol Phylogenet Evol. 66:526–538.
Meyer A, Van de Peer Y. 2005. From 2R to 3R: evidence for a fish-specific genome duplication (FSGD). Bioessays 27:937–945.
Meyer M, Stenzel U, Hofreiter M. 2008. Parallel tagged sequencing on the 454 platform. Nat Protoc. 3:267–278.
Murphy WJ, Eizirik E, Johnson WE, Zhang YP, Ryder OA, O'Brien SJ. 2001. Molecular phylogenetics and the origins of placental mammals. Nature 409:614–618.
Philippe H, Derelle R, Lopez P, et al. (20 co-authors). 2009. Phylogenomics revives traditional views on deep animal relationships. Curr Biol. 19:706–712.
Philippe H, Telford M. 2006. Large-scale sequencing and the new animal phylogeny. Trends Ecol Evol. 21:614–620.
Portik DM, Wood PL, Grismer JL, Stanley EL, Jackman TR. 2011. Identification of 104 rapidly-evolving nuclear protein-coding markers for amplification across scaled reptiles using genomic resources. Conserv Genet Resour. 4:1–10.
Pyron RA, Wiens JJ. 2011. A large-scale phylogeny of Amphibia including over 2800 species, and a revised classification of extant frogs, salamanders, and caecilians. Mol Phylogenet Evol. 61:543–583.
Roelants K, Gower DJ, Wilkinson M, Loader SP, Biju SD, Guillaume K, Moriau L, Bossuyt F. 2007. Global patterns of diversification in the history of modern amphibians. Proc Natl Acad Sci U S A. 104:887–892.
Rokas A, Williams BL, King N, Carroll SB. 2003. Genome-scale approaches to resolving incongruence in molecular phylogenies. Nature 425:798–804.
Ronquist F, Teslenko M, Van der Mark P, Ayres DL, Darling A, Höhna S, Larget B, Liu L, Suchard MA, Huelsenbeck JP. 2012. MrBayes 3.2: efficient Bayesian phylogenetic inference and model choice across a large model space. Syst Biol. 61:539–542.
Roure B, Baurain D, Philippe H. 2013. Impact of missing data on phylogenies inferred from empirical phylogenomic data sets. Mol Biol Evol. 30:197–214.
San Mauro D. 2010. A multilocus timescale for the origin of extant amphibians. Mol Phylogenet Evol. 56:554–561.
San Mauro D, Vences M, Alcobendas M, Zardoya R, Meyer A. 2005. Initial diversification of living amphibians predated the breakup of Pangaea. Am Nat. 165:590–599.
Shen XX, Liang D, Wen JZ, Zhang P. 2011. Multiple genome alignments facilitate development of NPCL markers: a case study of tetrapod phylogeny focusing on the position of turtles. Mol Biol Evol. 28:3237–3252.
Shen XX, Liang D, Zhang P. 2012. The development of three long universal nuclear protein-coding locus markers and their application to osteichthyan phylogenetics with nested PCR. PLoS One 7:e39256.
Shimodaira H. 2002. An approximately unbiased test of phylogenetic tree selection. Syst Biol. 51:492–508.
Shimodaira H, Hasegawa M. 2001. CONSEL: for assessing the confidence of phylogenetic tree selection. Bioinformatics 17:1246–1247.
Song S, Liu L, Edwards SV, Wu S. 2012. Resolving conflict in eutherian mammal phylogeny using phylogenomics and the multispecies coalescent model. Proc Natl Acad Sci U S A. 109:14942–14947.
Spinks PQ, Thomson RC, Barley AJ, Newman CE, Shaffer HB. 2010. Testing avian, squamate, and mammalian nuclear markers for cross amplification in turtles. Conserv Genet Resour. 2:127–129.
Spinks PQ, Thomson RC, Lovely GA, Shaffer HB. 2009. Assessing what is needed to resolve a molecular phylogeny: simulations and empirical data from Emydid turtles. BMC Evol Biol. 9:56.
Stamatakis A. 2006. RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics 22:2688–2690.
Tewhey R, Warner JB, Nakano M, Libby B, Medkova M, David PH, Kotsopoulos SK, Samuels ML, Hutchison JB, Larson JW. 2009. Microdroplet-based PCR enrichment for large-scale targeted sequencing. Nat Biotechnol. 27:1025–1031.
Thomson RC, Shedlock AM, Edwards SV, Shaffer HB. 2008. Developing markers for multilocus phylogenetics in non-model organisms: a test case with turtles. Mol Phylogenet Evol. 49:514–525.
Thomson RC, Wang IJ, Johnson JR. 2010. Genome-enabled development of DNA markers for ecology, evolution and conservation. Mol Ecol. 19:2184–2195.
Townsend TM, Alegre RE, Kelley ST, Wiens JJ, Reeder TW. 2008. Rapid development of multiple nuclear loci for phylogenetic analysis using genomic resources: an example from squamate reptiles. Mol Phylogenet Evol. 47:129–142.
Weisrock DW, Harmon LJ, Larson A. 2005. Resolving deep phylogenetic relationships in salamanders: analyses of mitochondrial and nuclear genomic DNA. Syst Biol. 54:758–777.
Wiens J, Bonett R, Chippindale P. 2005. Ontogeny discombobulates phylogeny: paedomorphosis and higher-level salamander relationships. Syst Biol. 54:91–110.
Wright TF, Schirtzinger EE, Matsumoto T, et al. (11 co-authors). 2008. A multilocus molecular phylogeny of the parrots (Psittaciformes): support for a Gondwanan origin during the cretaceous. Mol Biol Evol. 25:2141–2156.
Zhang P, Wake DB. 2009. Higher-level salamander relationships and divergence dates inferred from complete mitochondrial genomes. Mol Phylogenet Evol. 53:492–508.
Zhang P, Zhou H, Chen YQ, Liu YF, Qu LH. 2005. Mitogenomic perspectives on the origin and phylogeny of living amphibians. Syst Biol. 54:391–400.
Zhou XM, Xu SX, Zhang P, Yang G. 2011. Developing a series of conservative anchor markers and their application to phylogenomics of Laurasiatherian mammals. Mol Ecol Resour. 11:134–140.
Townsend TM Alegre RE Kelley ST Wiens JJ Reeder TW 2008 Rapiddevelopment of multiple nuclear loci for phylogenetic analysis usinggenomic resources an example from squamate reptiles MolPhylogenet Evol 47129ndash142
Weisrock DW Harmon LJ Larson A 2005 Resolving deep phylogeneticrelationships in salamanders analyses of mitochondrial and nucleargenomic DNA Syst Biol 54758ndash777
Wiens J Bonett R Chippindale P 2005 Ontogeny discombobulatesphylogeny paedomorphosis and higher-level salamander relation-ships Syst Biol 5491ndash110
2247
Universal Toolkit Including 102 NPCL doi101093molbevmst122 MBE at V
anderbilt University - M
assey Law
Library on M
arch 18 2015httpm
beoxfordjournalsorgD
ownloaded from
Wright TF Schirtzinger EE Matsumoto T et al (11 co-authors) 2008A multilocus molecular phylogeny of the parrots (Psittaciformes)support for a Gondwanan origin during the cretaceous Mol BiolEvol 252141ndash2156
Zhang P Wake DB 2009 Higher-level salamander relationships anddivergence dates inferred from complete mitochondrial genomesMol Phylogenet Evol 53492ndash508
Zhang P Zhou H Chen YQ Liu YF Qu LH 2005 Mitogenomic per-spectives on the origin and phylogeny of living amphibians Syst Biol54391ndash400
Zhou XM Xu SX Zhang P Yang G 2011 Developing a series ofconservative anchor markers and their application to phylo-genomics of Laurasiatherian mammals Mol Ecol Resour 11134ndash140
2248
Shen et al doi101093molbevmst122 MBE at V
anderbilt University - M
assey Law
Library on M
arch 18 2015httpm
beoxfordjournalsorgD
ownloaded from
protein sequences (nr) to confirm that they were our target genes. The NPCLs, species, and accession numbers for the newly obtained sequences are listed in supplementary table S2, Supplementary Material online.
Phylogenetic Analyses
Alignments of all 30 NPCL markers were conducted using the G-INS-i method in MAFFT (Katoh et al. 2005) under the default settings according to their translated amino acid sequences and were then refined by eye. All 30 refined alignments were combined into a concatenated data set (27,834 bp).
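The concatenation step can be illustrated with a minimal Python sketch. It assumes that each locus alignment is held as a dictionary mapping taxon name to aligned sequence; the function name `concatenate` and the toy sequences are hypothetical, and padding a taxon that lacks a locus with `?` is an assumption rather than a detail reported here.

```python
def concatenate(alignments):
    """Join per-locus alignments (dicts of taxon -> aligned sequence) into one supermatrix.

    Returns the supermatrix and the 1-based inclusive (start, end) columns of each locus.
    """
    taxa = sorted({taxon for aln in alignments for taxon in aln})
    pieces = {taxon: [] for taxon in taxa}
    ranges = []
    start = 1
    for aln in alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within a locus must have equal length")
        length = lengths.pop()
        for taxon in taxa:
            # Assumption: a taxon missing from this locus is filled with '?'.
            pieces[taxon].append(aln.get(taxon, "?" * length))
        ranges.append((start, start + length - 1))
        start += length
    return {taxon: "".join(parts) for taxon, parts in pieces.items()}, ranges


# Toy example with two hypothetical loci; taxon "C" is missing from the second.
locus1 = {"A": "ATGAAA", "B": "ATGAAG", "C": "ATGAGA"}
locus2 = {"A": "CCC", "B": "CCT"}
supermatrix, ranges = concatenate([locus1, locus2])
```

The returned column ranges are what a per-gene partition definition would need.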
For the concatenated data set, we manually defined five partitioning strategies: 2 partitions (one for codon positions 1 and 2, and one for codon position 3); 3 partitions (one partition for each codon position); 30 partitions (one partition for each gene); 60 partitions (one for codon positions 1 + 2 and one for codon position 3, for each of the 30 genes); and 90 partitions (one for each codon position of each of the 30 genes). Comparisons of the five partitioning strategies and selection of the corresponding nucleotide substitution models were conducted under the Bayesian information criterion implemented in PartitionFinder (Lanfear et al. 2012). The 3-partition scheme (one partition for each codon position) was chosen as the best-fitting partitioning strategy, and all 3 partitions favored the GTR + Γ + I model.
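The five partitioning strategies can be enumerated programmatically. The following Python sketch is illustrative only: the function name `partition_schemes`, the scheme labels, and the equal 300-bp gene lengths are hypothetical, and it assumes that every gene starts at codon position 1 and has a length divisible by 3 so that reading frames stay aligned across the concatenation. Column sets are written in the stride notation used in RAxML partition files.

```python
def partition_schemes(gene_ranges):
    """Build five partitioning schemes for a concatenated coding alignment.

    gene_ranges: list of (name, start, end) with 1-based inclusive columns.
    Returns a dict mapping a scheme label to a list of (partition name, columns).
    """
    first, last = gene_ranges[0][1], gene_ranges[-1][2]

    def codon_cols(start, end, k):
        # Every third column from codon position k (1, 2, or 3) within start..end.
        return f"{start + k - 1}-{end}\\3"

    schemes = {
        "2": [("pos12", f"{codon_cols(first, last, 1)}, {codon_cols(first, last, 2)}"),
              ("pos3", codon_cols(first, last, 3))],
        "3": [(f"pos{k}", codon_cols(first, last, k)) for k in (1, 2, 3)],
        "30": [(name, f"{s}-{e}") for name, s, e in gene_ranges],
        "60": [],
        "90": [],
    }
    for name, s, e in gene_ranges:
        schemes["60"].append(
            (f"{name}_pos12", f"{codon_cols(s, e, 1)}, {codon_cols(s, e, 2)}"))
        schemes["60"].append((f"{name}_pos3", codon_cols(s, e, 3)))
        for k in (1, 2, 3):
            schemes["90"].append((f"{name}_pos{k}", codon_cols(s, e, k)))
    return schemes


# Thirty hypothetical genes of 300 bp each (the real loci differ in length).
genes = [(f"gene{i + 1}", i * 300 + 1, (i + 1) * 300) for i in range(30)]
schemes = partition_schemes(genes)
counts = {label: len(parts) for label, parts in schemes.items()}
raxml_lines = [f"DNA, {name} = {cols}" for name, cols in schemes["3"]]
```

With 30 genes the scheme labels match the partition counts, which is a quick check that each scheme was built as described.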
The concatenated data set was analyzed separately with both ML and Bayesian inference (BI) methods under the 3-partition scheme. Partitioned ML analyses were implemented using RAxML version 7.2.6 (Stamatakis 2006) with the GTR + Γ + I model assigned to each partition. A search that combined 100 separate ML searches was applied to find the optimal tree, and branch support for each node was evaluated with 500 standard bootstrapping replicates (-f d -b 500 option) implemented in RAxML. The partitioned BI was conducted using MrBayes 3.2 (Ronquist et al. 2012). All model parameters were unlinked. Two MCMC runs were performed with one cold chain and three heated chains (temperature set to 0.1) for 60 million generations and sampled every 1,000 generations. The chain stationarity was visualized by plotting -ln L against the generation number using Tracer version 1.4, and the first 50% of generations were discarded. Topologies and posterior probabilities were estimated from the remaining generations. Two runs for each analysis were compared for congruence.
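The MCMC settings above imply specific chain heats and sample counts, which the short Python sketch below works out. It uses the incremental heating rule documented for MrBayes, in which chain i receives heat 1 / (1 + T·i) with i = 0 for the cold chain; the helper names are hypothetical, and the sample generated at generation 0 is ignored for simplicity.

```python
def chain_heats(n_chains, temp):
    """Heat of each chain under incremental heating: 1 / (1 + temp * i), i = 0 is cold."""
    return [1.0 / (1.0 + temp * i) for i in range(n_chains)]


def retained_samples(generations, sample_freq, burnin_fraction):
    """Post-burn-in samples per run, not counting the state at generation 0."""
    total = generations // sample_freq
    return total - int(total * burnin_fraction)


heats = chain_heats(n_chains=4, temp=0.1)            # one cold and three heated chains
kept = retained_samples(60_000_000, 1000, 0.5)       # 60 million generations, 50% burn-in
print([round(h, 3) for h in heats], kept)
```

Under these settings each run contributes 30,000 post-burn-in samples, and the hottest chain runs at a heat of about 0.77.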
We also performed Bayesian phylogenetic analyses under a mixture model, CAT + GTR + Γ4, in PhyloBayes 3.3 (Lartillot et al. 2009) with two independent MCMC runs. Each run was performed for 10,000 cycles and sampled every cycle. Stationarity was considered to be reached when the largest discrepancy (maxdiff) between the two independent runs was less than 0.1. The first 5,000 trees in each MCMC run were discarded. The remaining 10,000 trees of the two runs were sampled every 5 trees to generate a 50% majority-rule posterior consensus tree.
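The convergence criterion and the thinning arithmetic can be made concrete with a small Python sketch. It is not the PhyloBayes implementation: it assumes, for illustration, that each sampled tree has already been reduced to a set of bipartitions (each a frozenset of taxon names), and the function names and toy runs are hypothetical.

```python
from collections import Counter


def bipartition_frequencies(trees):
    """Fraction of sampled trees that contain each bipartition."""
    counts = Counter(bp for tree in trees for bp in tree)
    return {bp: n / len(trees) for bp, n in counts.items()}


def maxdiff(run1, run2):
    """Largest absolute difference in bipartition frequency between two runs."""
    f1, f2 = bipartition_frequencies(run1), bipartition_frequencies(run2)
    return max(abs(f1.get(bp, 0.0) - f2.get(bp, 0.0)) for bp in set(f1) | set(f2))


def thin(trees, burnin, every):
    """Drop the first `burnin` trees, then keep every `every`-th tree."""
    return trees[burnin::every]


# Toy runs of 10,000 samples each, with one bipartition per tree for brevity.
ab, ac = frozenset({"A", "B"}), frozenset({"A", "C"})
run1 = [{ab}] * 9000 + [{ac}] * 1000
run2 = [{ab}] * 10000
kept1, kept2 = thin(run1, 5000, 5), thin(run2, 5000, 5)
pooled = len(kept1) + len(kept2)
discrepancy = maxdiff(kept1, kept2)
```

With a burn-in of 5,000 and a thinning interval of 5, the two runs together contribute 2,000 trees to the consensus; in this toy case the discrepancy is 0.2, which would not meet the 0.1 threshold.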
Species tree estimation was conducted using the pseudo-ML approach in the program MP-EST (Liu et al. 2010) under the coalescent model. Briefly, the gene trees for the 30 NPCL markers were reconstructed with ML under the GTR + Γ8 model using PHYML 3.0 (Guindon et al. 2010). The resulting 30 gene trees were rooted with an outgroup (Chrysemys picta bellii) and used to generate a species tree with MP-EST. The robustness of the species tree was evaluated with nonparametric bootstrapping of 500 replicates.
Alternative phylogenetic hypotheses were tested based on the 30-gene data set. We first calculated sitewise log-likelihoods of the alternative trees using RAxML (-f g) under the GTR + Γ + I model. The site log-likelihood file was then used to estimate P values for each alternative tree in the CONSEL program (Shimodaira and Hasegawa 2001) using the Kishino–Hasegawa (KH) test (Kishino and Hasegawa 1989), the approximately unbiased (AU) test (Shimodaira 2002), and the RELL bootstrap proportion (BP) test.
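Of the three tests, the RELL bootstrap proportion is the simplest to illustrate: sites are resampled with replacement, the precomputed sitewise log-likelihoods are re-summed for each tree without re-estimating any parameters, and the BP of a tree is the fraction of replicates in which it scores best. The Python sketch below is a minimal illustration of that idea, not CONSEL's implementation; the function name, the toy values, and the fixed random seed are assumptions, and the KH and AU tests are not covered.

```python
import random


def rell_bootstrap(site_lnl, replicates=1000, seed=0):
    """RELL bootstrap proportions from sitewise log-likelihoods.

    site_lnl: dict mapping tree name -> list of per-site log-likelihoods,
    all lists having the same length.
    """
    rng = random.Random(seed)
    names = list(site_lnl)
    n_sites = len(site_lnl[names[0]])
    wins = dict.fromkeys(names, 0)
    for _ in range(replicates):
        # Resample site indices with replacement and re-sum each tree's values.
        sites = [rng.randrange(n_sites) for _ in range(n_sites)]
        totals = {t: sum(site_lnl[t][i] for i in sites) for t in names}
        wins[max(names, key=totals.get)] += 1
    return {t: wins[t] / replicates for t in names}


# Toy values: tree1 has the higher log-likelihood at every site,
# so it wins every replicate regardless of which sites are drawn.
toy = {
    "tree1": [-1.0, -2.0, -1.5, -3.0],
    "tree2": [-1.2, -2.5, -1.6, -3.1],
}
bp = rell_bootstrap(toy, replicates=200)
```

With real data the sitewise values would come from the site log-likelihood file, and trees with similar fit would split the replicates between them.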
Supplementary Material
Supplementary tables S1 and S2 and figures S1 and S2 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).
Acknowledgments
The authors are grateful to David Wake and Carol Spencer of the Museum of Vertebrate Zoology at the University of California, Berkeley, for tissue loans. David Wake provided many useful comments on the manuscript. This work was supported by National Natural Science Foundation of China grants (31172075 and 30900136) and the National Science Fund for Excellent Young Scholars (no. pending).
References
Binladen J, Gilbert MTP, Bollback JP, Panitz F, Bendixen C, Nielsen R, Willerslev E. 2007. The use of coded PCR primers enables high-throughput sequencing of multiple homolog amplification products by 454 parallel sequencing. PLoS One 2:e197.
Crawford NG, Faircloth BC, McCormack JE, Brumfield RT, Winker K, Glenn TC. 2012. More than 1000 ultraconserved elements provide evidence that turtles are the sister group to archosaurs. Biol Lett. 8:783–786.
Delsuc F, Brinkmann H, Philippe H. 2005. Phylogenomics and the reconstruction of the tree of life. Nat Rev Genet. 6:361–375.
Duellman WE, Trueb L. 1994. Biology of amphibians. Baltimore (MD): Johns Hopkins University Press.
Dunn CW, Hejnol A, Matus DQ, et al. (18 co-authors). 2008. Broad phylogenomic sampling improves resolution of the animal tree of life. Nature 452:745–749.
Faircloth BC, Glenn TC. 2012. Not all sequence tags are created equal: designing and validating sequence identification tags robust to indels. PLoS One 7:e42543.
Faircloth BC, McCormack JE, Crawford NG, Brumfield RT, Glenn TC. 2012. Ultraconserved elements anchor thousands of genetic markers for target enrichment spanning multiple evolutionary timescales. Syst Biol. 61:717–726.
Fong JJ, Brown JM, Fujita MK, Boussau B. 2012. A phylogenomic approach to vertebrate phylogeny supports a turtle-archosaur affinity and a possible paraphyletic Lissamphibia. PLoS One 7:e48990.
Fong JJ, Fujita MK. 2011. Evaluating phylogenetic informativeness and data-type usage for new protein-coding genes across Vertebrata. Mol Phylogenet Evol. 61:300–307.
Frost DR, Grant T, Faivovich J, et al. (18 co-authors). 2006. The amphibian tree of life. Bull Am Mus Nat Hist. 297:8–370.
Gao KQ, Shubin NH. 2001. Late Jurassic salamanders from northern China. Nature 410:574–577.
Guindon S, Dufayard J-F, Lefort V, Anisimova M, Hordijk W, Gascuel O. 2010. New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Syst Biol. 59:307–321.
Hugall AF, Foster R, Lee MSY. 2007. Calibration choice, rate smoothing, and the pattern of tetrapod diversification according to the long nuclear gene RAG-1. Syst Biol. 56:543–563.
Inoue JG, Miya M, Lam K, Tay BH, Danks JA, Bell J, Walker TI, Venkatesh B. 2010. Evolutionary origin and phylogeny of the modern holocephalans (Chondrichthyes: Chimaeriformes): a mitogenomic perspective. Mol Biol Evol. 27:2576–2586.
Katoh K, Kuma K, Toh H, Miyata T. 2005. MAFFT version 5: improvement in accuracy of multiple sequence alignment. Nucleic Acids Res. 33:511–518.
Kishino H, Hasegawa M. 1989. Evaluation of the maximum likelihood estimate of the evolutionary tree topologies from DNA sequence data, and the branching order in Hominoidea. J Mol Evol. 29:170–179.
Künstner A, Wolf JBW, Backström N, et al. (13 co-authors). 2010. Comparative genomics based on massive parallel transcriptome sequencing reveals patterns of substitution and selection across 10 bird species. Mol Ecol. 19:266–276.
Lanfear R, Calcott B, Ho SYW, Guindon S. 2012. PartitionFinder: combined selection of partitioning schemes and substitution models for phylogenetic analyses. Mol Biol Evol. 29:1695–1701.
Lartillot N, Lepage T, Blanquart S. 2009. PhyloBayes 3: a Bayesian software package for phylogenetic reconstruction and molecular dating. Bioinformatics 25:2286–2288.
Lemmon AR, Brown JM, Stanger-Hall K, Lemmon EM. 2009. The effect of ambiguous data on phylogenetic estimates obtained by maximum likelihood and Bayesian inference. Syst Biol. 58:130–145.
Lemmon AR, Emme SA, Lemmon EM. 2012. Anchored hybrid enrichment for massively high-throughput phylogenomics. Syst Biol. 61:727–744.
Li C, Ortí G, Zhang G, Lu G. 2007. A practical approach to phylogenomics: the phylogeny of ray-finned fish (Actinopterygii) as a case study. BMC Evol Biol. 7:44.
Li C, Riethoven JJ, Naylor GJ. 2012. EvolMarkers: a database for mining exon and intron markers for evolution, ecology and conservation studies. Mol Ecol Resour. 12:967–971.
Liu L, Yu L, Edwards SV. 2010. A maximum pseudo-likelihood approach for estimating species trees under the coalescent model. BMC Evol Biol. 10:302.
McCormack JE, Faircloth BC, Crawford NG, Gowaty PA, Brumfield RT, Glenn TC. 2012. Ultraconserved elements are novel phylogenomic markers that resolve placental mammal phylogeny when combined with species-tree analysis. Genome Res. 22:746–754.
McCormack JE, Hird SM, Zellmer AJ, Carstens BC, Brumfield RT. 2013. Applications of next-generation sequencing to phylogeography and phylogenetics. Mol Phylogenet Evol. 66:526–538.
Meyer A, Van de Peer Y. 2005. From 2R to 3R: evidence for a fish-specific genome duplication (FSGD). Bioessays 27:937–945.
Meyer M, Stenzel U, Hofreiter M. 2008. Parallel tagged sequencing on the 454 platform. Nat Protoc. 3:267–278.
Murphy WJ, Eizirik E, Johnson WE, Zhang YP, Ryder OA, O'Brien SJ. 2001. Molecular phylogenetics and the origins of placental mammals. Nature 409:614–618.
Philippe H, Derelle R, Lopez P, et al. (20 co-authors). 2009. Phylogenomics revives traditional views on deep animal relationships. Curr Biol. 19:706–712.
Philippe H, Telford M. 2006. Large-scale sequencing and the new animal phylogeny. Trends Ecol Evol. 21:614–620.
Portik DM, Wood PL, Grismer JL, Stanley EL, Jackman TR. 2011. Identification of 104 rapidly-evolving nuclear protein-coding markers for amplification across scaled reptiles using genomic resources. Conserv Genet Resour. 4:1–10.
Pyron RA, Wiens JJ. 2011. A large-scale phylogeny of Amphibia including over 2800 species, and a revised classification of extant frogs, salamanders, and caecilians. Mol Phylogenet Evol. 61:543–583.
Roelants K, Gower DJ, Wilkinson M, Loader SP, Biju SD, Guillaume K, Moriau L, Bossuyt F. 2007. Global patterns of diversification in the history of modern amphibians. Proc Natl Acad Sci U S A. 104:887–892.
Rokas A, Williams BL, King N, Carroll SB. 2003. Genome-scale approaches to resolving incongruence in molecular phylogenies. Nature 425:798–804.
Ronquist F, Teslenko M, Van der Mark P, Ayres DL, Darling A, Höhna S, Larget B, Liu L, Suchard MA, Huelsenbeck JP. 2012. MrBayes 3.2: efficient Bayesian phylogenetic inference and model choice across a large model space. Syst Biol. 61:539–542.
Roure B, Baurain D, Philippe H. 2013. Impact of missing data on phylogenies inferred from empirical phylogenomic data sets. Mol Biol Evol. 30:197–214.
San Mauro D. 2010. A multilocus timescale for the origin of extant amphibians. Mol Phylogenet Evol. 56:554–561.
San Mauro D, Vences M, Alcobendas M, Zardoya R, Meyer A. 2005. Initial diversification of living amphibians predated the breakup of Pangaea. Am Nat. 165:590–599.
Shen XX, Liang D, Wen JZ, Zhang P. 2011. Multiple genome alignments facilitate development of NPCL markers: a case study of tetrapod phylogeny focusing on the position of turtles. Mol Biol Evol. 28:3237–3252.
Shen XX, Liang D, Zhang P. 2012. The development of three long universal nuclear protein-coding locus markers and their application to osteichthyan phylogenetics with nested PCR. PLoS One 7:e39256.
Shimodaira H. 2002. An approximately unbiased test of phylogenetic tree selection. Syst Biol. 51:492–508.
Shimodaira H, Hasegawa M. 2001. CONSEL: for assessing the confidence of phylogenetic tree selection. Bioinformatics 17:1246–1247.
Song S, Liu L, Edwards SV, Wu S. 2012. Resolving conflict in eutherian mammal phylogeny using phylogenomics and the multispecies coalescent model. Proc Natl Acad Sci U S A. 109:14942–14947.
Spinks PQ, Thomson RC, Barley AJ, Newman CE, Shaffer HB. 2010. Testing avian, squamate, and mammalian nuclear markers for cross amplification in turtles. Conserv Genet Resour. 2:127–129.
Spinks PQ, Thomson RC, Lovely GA, Shaffer HB. 2009. Assessing what is needed to resolve a molecular phylogeny: simulations and empirical data from Emydid turtles. BMC Evol Biol. 9:56.
Stamatakis A. 2006. RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics 22:2688–2690.
Tewhey R, Warner JB, Nakano M, Libby B, Medkova M, David PH, Kotsopoulos SK, Samuels ML, Hutchison JB, Larson JW. 2009. Microdroplet-based PCR enrichment for large-scale targeted sequencing. Nat Biotechnol. 27:1025–1031.
Thomson RC, Shedlock AM, Edwards SV, Shaffer HB. 2008. Developing markers for multilocus phylogenetics in non-model organisms: a test case with turtles. Mol Phylogenet Evol. 49:514–525.
Thomson RC, Wang IJ, Johnson JR. 2010. Genome-enabled development of DNA markers for ecology, evolution and conservation. Mol Ecol. 19:2184–2195.
Townsend TM, Alegre RE, Kelley ST, Wiens JJ, Reeder TW. 2008. Rapid development of multiple nuclear loci for phylogenetic analysis using genomic resources: an example from squamate reptiles. Mol Phylogenet Evol. 47:129–142.
Weisrock DW, Harmon LJ, Larson A. 2005. Resolving deep phylogenetic relationships in salamanders: analyses of mitochondrial and nuclear genomic DNA. Syst Biol. 54:758–777.
Wiens J, Bonett R, Chippindale P. 2005. Ontogeny discombobulates phylogeny: paedomorphosis and higher-level salamander relationships. Syst Biol. 54:91–110.
Wright TF, Schirtzinger EE, Matsumoto T, et al. (11 co-authors). 2008. A multilocus molecular phylogeny of the parrots (Psittaciformes): support for a Gondwanan origin during the Cretaceous. Mol Biol Evol. 25:2141–2156.
Zhang P, Wake DB. 2009. Higher-level salamander relationships and divergence dates inferred from complete mitochondrial genomes. Mol Phylogenet Evol. 53:492–508.
Zhang P, Zhou H, Chen YQ, Liu YF, Qu LH. 2005. Mitogenomic perspectives on the origin and phylogeny of living amphibians. Syst Biol. 54:391–400.
Zhou XM, Xu SX, Zhang P, Yang G. 2011. Developing a series of conservative anchor markers and their application to phylogenomics of Laurasiatherian mammals. Mol Ecol Resour. 11:134–140.