Article Fast Track

A Versatile and Highly Efficient Toolkit Including 102 Nuclear Markers for Vertebrate Phylogenomics, Tested by Resolving the Higher Level Relationships of the Caudata

Xing Xing Shen,1 Dan Liang,1 Yan Jie Feng,1 Meng Yun Chen,1 and Peng Zhang*,1

1 Key Laboratory of Gene Engineering of the Ministry of Education, State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-Sen University, Guangzhou, China

*Corresponding author: E-mail: alarzhang@gmail.com.

Associate editor: Xun Gu

Mol. Biol. Evol. 30(10):2235–2248 doi:10.1093/molbev/mst122 Advance Access publication July 4, 2013

© The Author 2013. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com

Abstract

Resolving difficult nodes for any part of the vertebrate tree of life often requires analyzing a large number of loci. Developing molecular markers that are workable for the groups of interest is often a bottleneck in phylogenetic research. Here, on the basis of a nested polymerase chain reaction (PCR) strategy, we present a universal toolkit including 102 nuclear protein-coding locus (NPCL) markers for vertebrate phylogenomics. The 102 NPCL markers have a broad range of evolutionary rates, which makes them useful for a wide range of time depths. The new NPCL toolkit has three important advantages compared with all previously developed NPCL sets: 1) the kit is universally applicable across vertebrates, with a PCR success rate of 94.6% in 16 widely divergent tested vertebrate species; 2) more than 90% of PCR reactions produce strong and single bands of the expected sizes that can be directly sequenced; and 3) all cleanup PCR reactions can be sequenced with only two specific universal primers. To test its actual phylogenetic utility, 30 NPCLs from this toolkit were used to address the higher level relationships of living salamanders. Of the 639 target PCR reactions performed on 19 salamanders and several outgroup species, 632 (98.9%) were successful, and 602 (94.1%) were directly sequenced. Concatenation and species-tree analyses on this 30-locus data set produced a fully resolved phylogeny and showed that Cryptobranchoidea (Cryptobranchidae + Hynobiidae) branches first within the salamander tree, followed by Sirenidae. Our experimental tests and our demonstration for a particular case show that our NPCL toolkit is a highly reliable, fast, and cost-effective approach for vertebrate phylogenomic studies and thus has the potential to accelerate the completion of many parts of the vertebrate tree of life.

Key words: nuclear marker, phylogenomic, vertebrate, salamander, phylogeny.

Introduction

Building phylogenomic supermatrices with multiple nuclear loci has become the standard method of resolving species relationships in difficult biological scenarios (Delsuc et al. 2005). One efficient method of constructing multilocus data sets is expressed sequence tag (EST) (Philippe and Telford 2006; Dunn et al. 2008; Philippe et al. 2009) or transcriptome (Künstner et al. 2010) sequencing, in which high-quality RNA is extracted from each organism of interest and a huge number of ESTs or transcripts are then sequenced by Sanger or next-generation sequencing (NGS). However, this approach often generates patchy data sets with a high proportion of missing data, which may compromise phylogenetic inference (Lemmon et al. 2009; Roure et al. 2013). More importantly, this approach is not workable for many older collections because these specimens can only provide DNA samples. A second efficient way to construct multilocus data sets is the sequence capture method, in which target genomic regions are selectively captured by hybridization with probes before NGS (Crawford et al. 2012; Faircloth et al. 2012; Lemmon et al. 2012; McCormack et al. 2012). The most attractive feature of this method is that it can generate hundreds to thousands of loci for many samples in a short time. However, experimentally, the efficiency of sequence capture is considerably influenced by the divergence between probes and target sequences (Lemmon et al. 2012; McCormack et al. 2013). More importantly, turning the huge data set derived from sequence capture into sequences that researchers can analyze requires sophisticated bioinformatic processing, which is currently quite challenging to most phylogenetic researchers (McCormack et al. 2013). Therefore, although the sequence capture method is efficient and promising, its immaturity currently restricts its wide application in the community.

Currently, for vertebrate phylogenetics, the most widely used approach for building multilocus data sets is still conventional targeted polymerase chain reaction (PCR) and the sequencing of selected orthologous genes. However, the PCR-based method is laborious: 1) most practitioners spend much time developing and screening molecular markers that are workable for their studied taxa and suitable to their evolutionary timescale of interest (Murphy et al. 2001; Li et al. 2007; Townsend et al. 2008; Wright et al. 2008; Shen et al. 2011); 2) it requires PCR of each organism at each locus, not to mention the extra effort involved in PCR optimization, gel purification, and cloning. On the other hand, the PCR-based method also has its advantages: 1) it is highly targeted and can produce nearly complete data matrices, and the data analysis process is straightforward and familiar to most empirical researchers; 2) it requires no prior genomic knowledge of the targeted organisms; and 3) it works with tiny amounts of DNA and thus appears to be an ideal solution when DNA samples are limited.

For most interspecific phylogenetic projects, nuclear protein-coding loci (NPCLs) that are developed on exons are likely the markers of choice for the PCR-based strategy because they provide an appropriate level of variation, easy alignment across a large phylogenetic span, and relatively straightforward detection of paralogs (Thomson et al. 2010). In this study, our aim was to develop a suite of universal NPCL markers and an efficient experimental protocol for vertebrate phylogenomics. Aimed at eliminating the drawbacks of the conventional PCR-based method, we designed our NPCL toolkit and protocol to 1) include approximately 100 NPCL markers (we think the economic transition from PCR to sequence capture is at approximately 100 loci; if more than 100 loci are to be used, the PCR method is not cost-efficient); 2) work for all major jawed vertebrate clades and provide good resolution at different evolutionary timescales; 3) produce single and strong amplicon bands without any PCR optimization in most cases; and 4) yield PCR products that can be directly cleaned and sequenced without gel purification or cloning in most cases.

Because our NPCL toolkit is designed for universal phylogenetic applications in vertebrates, it should be tested in a real case with some difficult samples. Salamanders are well known to have much larger genomes than most vertebrates (often 10 times the human genome; http://www.genomesize.com). The PCR-based method normally performs poorly for salamanders (personal experience and communication with colleagues). For example, Shen et al. (2011) amplified 22 NPCL markers in 16 tested vertebrates. In all 15 nonsalamander species, approximately 90% of the markers could be successfully amplified; however, for the tested salamander species, Batrachuperus yenyuanensis, only 8 of 22 NPCL markers (36%) could be amplified. Here, we apply our NPCL toolkit and protocol to address the higher level relationships of living salamanders as a test of the toolkit's utility. Our results demonstrate that the new universal NPCL toolkit and protocol are fast and effective in constructing multilocus data matrices for vertebrate phylogenomics.

Results

Experimental Performance and Characteristics of the New NPCL Toolkit

The newly developed NPCL toolkit contains 102 NPCL markers ranging from 510 to 1,650 bp, with an average length of 1,050 bp; each NPCL marker comprises two pairs of primers for the nested PCR strategy (supplementary table S1, Supplementary Material online). These 102 NPCL markers are broadly distributed on 21 chromosomes of the human genome (fig. 1). We classified their PCR performance into three levels: 1) producing a single target band of the expected size, 2) producing a target band but also significant nonspecific bands, and 3) not producing a target band. The first two conditions are considered successful. The PCR performances of the 102 NPCL markers across 16 diverse vertebrate species and three representative electropherograms are shown in figure 2. Of the 102 NPCL markers, 57 have a 100% success rate in the 16 tested vertebrate species, 87 have a success rate of more than 90%, and the remaining 15 range from 56% to 88% (fig. 2). Of the 1,632 PCR reactions (102 loci × 16 taxa), 1,544 (94.6%) were successful, with 1,485 (91%) producing strong single target bands that can be used for direct sequencing. In the demonstration case, in which 30 NPCL markers were used to investigate the higher level relationships of living salamanders, 632 (98.9%) of the 639 target fragments were successful. Of the 632 successful reactions, 602 (95%) were directly sequenced with the general sequencing primers "Seq_F" and "Seq_R." The PCR success rates for each of the 102 NPCL markers across the 16 tested vertebrate species are shown in figure 3.
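The percentages quoted in this paragraph follow directly from the reaction counts. As a quick arithmetic check, the minimal Python sketch below recomputes them; the helper name `percent` and the variable names are ours and are not part of the toolkit.

```python
def percent(part, whole, digits=1):
    """Return part/whole as a percentage rounded to `digits` decimals."""
    return round(100.0 * part / whole, digits)


# Counts quoted in the text: 102 loci, each tested in 16 taxa.
total_reactions = 102 * 16

# Toolkit test across 16 vertebrate species.
overall_success = percent(1544, total_reactions)   # any target band
single_band = percent(1485, total_reactions, 0)    # strong single band

# Salamander demonstration case with 30 NPCLs.
salamander_success = percent(632, 639)             # successful reactions
salamander_direct = percent(602, 632, 0)           # directly sequenced

print(total_reactions, overall_success, single_band,
      salamander_success, salamander_direct)
```

Note that the 95% figure for direct sequencing is relative to the 632 successful reactions, whereas the 94.1% given in the Abstract is relative to all 639 attempted reactions.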

The evolutionary rate, as evidenced by the degree of variability, is an important parameter of an NPCL marker because it determines applicability for different phylogenetic questions. Although our NPCL toolkit has a high PCR success rate in highly diverged taxa, that success does not mean that the NPCL markers in the toolkit are very conserved. As figure 3 illustrates, our toolkit includes NPCLs with a broad range of evolutionary rates (approximately 4-fold). Among the 102 NPCL markers, 60 evolve faster than RAG1, an NPCL that has been widely used for phylogenetic inference in various vertebrate groups. Because previous analyses based on RAG1 data resulted in highly resolved and robustly supported phylogenetic relationships at multiple hierarchical levels (San Mauro et al. 2005; Wiens et al. 2005; Hugall et al. 2007; Roelants et al. 2007), this indicates that our NPCL toolkit has the potential to resolve questions of both deep and shallow phylogeny.

It is well known that the fish-specific genome duplication occurred in the teleosts (Meyer and Van de Peer 2005). Although most duplicated genes were secondarily lost, some were retained or evolved new functions. For an NPCL marker, if there are two similar copies in teleost genomes, it is difficult to check the orthologous status of the obtained fragments. To identify such markers, we used the zebrafish sequence of each NPCL as a query in Basic Local Alignment Search Tool (Blast) searches against all available teleost genomes in the ENSEMBL database. If an NPCL receives more than two Blast hits and the top Blast score is not more than twice the second Blast score, that NPCL might have an extra copy in teleost genomes. Using this method, we found that only six of the 102 NPCLs (CXCR4, GLCE, KCNF1, LINGO1, NTN1, and PCDH10) may have extra copies in teleost genomes (fig. 3; supplementary table S1, Supplementary Material online). This result indicates that our NPCL toolkit is also suitable for phylogenetic inference in teleosts.
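The screening rule described above can be written as a small predicate. The sketch below is only an illustration of that rule as stated in the text: the function name `may_have_extra_copy` and the example scores are hypothetical, and the Blast scores are assumed to have been collected already for one NPCL in one teleost genome.

```python
def may_have_extra_copy(blast_scores):
    """Flag an NPCL that might have an extra copy in a teleost genome.

    Rule from the text: the query receives more than two Blast hits AND
    the top score is not more than twice the second-best score.
    """
    ranked = sorted(blast_scores, reverse=True)
    if len(ranked) <= 2:
        # Two hits or fewer: the rule does not apply.
        return False
    top, second = ranked[0], ranked[1]
    return top <= 2 * second


# Hypothetical scores, for illustration only.
print(may_have_extra_copy([900, 850, 100]))  # two near-equal top hits
print(may_have_extra_copy([900, 300, 100]))  # one clearly dominant hit
```

A marker flagged this way is not necessarily duplicated; the rule only identifies loci where orthology would need closer inspection before use in teleosts.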


Phylogenetic Performance in a Real Case

Our demonstration case included 19 salamander species that span salamander evolutionary diversity (supplementary table S2, Supplementary Material online). The nine outgroup species (two frogs, two caecilians, one turtle, one bird, two mammals, and one coelacanth) provided a largely balanced representation of relatives of salamanders. The 30 newly amplified NPCLs exhibited levels of variation comparable with that of the traditional RAG1 gene, with variable sites varying between 30% and 51% of all sequenced sites (table 1). The data set combining these 30 NPCLs comprises 27,834 bp and exhibits little substitution saturation (supplementary fig. S1, Supplementary Material online). The phylogenetic analyses of the concatenated data set using three tree-building methods (maximum likelihood [ML], Bayesian, and CAT-mixture model) produced an identical, fully resolved tree for 28 taxa (fig. 4). In all 25 nodes of the tree, the statistical support was highly robust (BPML = 99–100; PPBAY = 1.0; PPCAT = 1.0). The species tree estimated from 30 individual NPCLs without data concatenation using the pseudo-ML approach is identical to those estimated from the concatenated analyses. All nodes received bootstrap support values varying between 74 and 100 (fig. 4). We also conducted phylogenetic analyses at the amino acid level (9,278 deduced amino acid residues) using three tree-building methods (ML, Bayesian, and CAT-mixture model). The protein tree topology is identical to the DNA result, with just slightly lower branch support for some nodes (supplementary fig. S2, Supplementary Material online). Therefore, we did not further analyze the protein data set.

The monophyly of extant amphibians with respect to amniotes and the close relationship between frogs and salamanders (the Batrachia hypothesis) are repeatedly recovered in most recent molecular studies (San Mauro et al. 2005; Zhang et al. 2005; Frost et al. 2006; Hugall et al. 2007; Roelants et al. 2007; Zhang and Wake 2009; San Mauro 2010; Pyron and Wiens 2011). However, a recent molecular study based on 26 nuclear genes (Fong et al. 2012) supports a caecilian–salamander sister relationship with the possible paraphyly of extant amphibians. Our phylogenetic analyses based on 30 nuclear genes provide further support for the monophyly of lissamphibians and the Batrachia hypothesis (fig. 4). Additionally, all possible hypotheses against the monophyly of extant amphibians and the Batrachia hypothesis were rejected in our topological tests (table 2). However, the Batrachia hypothesis did not receive strong support in our species tree analysis (BPMP-EST = 74; fig. 4), suggesting that more nuclear genes are still needed to resolve this node.

FIG. 1. Chromosome mapping of the 102 NPCL markers in the Homo sapiens genome.

FIG. 2. PCR performance of the 102 NPCL markers in 16 divergent vertebrate species. Each square and electrophoretic lane is aligned with the tested species. (a) The draft divergence timescale for the 16 tested vertebrate species is based on Inoue et al. (2010) and the book The Timetree of Life. (b) The PCR performance of each NPCL marker is ranked by three different-colored squares: black, producing single target band; gray, having a target band but with significant nonspecific bands; white, no target band. The 102 NPCL markers are sorted according to their PCR success rates. (c) Three typical agarose electrophoresis results for 9 NPCL markers. Figure labels: time axis in Ma (0 to 457); size markers at 4,500, 1,200, and 200 bp; tested genera Sphyrna, Pangasius, Lepisosteus, Protopterus, Ichthyophis, Batrachuperus, Rana, Mus, Struthio, Zosterops, Crocodylus, Trionyx, Podocnemis, Sus, Hemidactylus, and Naja; gel panels for CAND1, HYP, IRS1, FAT2, CELSR3, RERE, SVEP1, DNAH3, and GRM2.

The monophyly of the internally fertilizing salamanders (Salamandroidea: all salamanders exclusive of Hynobiidae, Cryptobranchidae, and Sirenidae) is strongly supported in our analyses (fig. 4), in line with most recent molecular studies (Wiens et al. 2005; Roelants et al. 2007; Zhang and Wake 2009; Pyron and Wiens 2011) but differing strongly from Frost et al. (2006), who recovered a clade comprising Sirenidae, Dicamptodontidae, Ambystomatidae, and Salamandridae. The internally fertilizing salamanders include two well-supported clades: one is composed of Ambystomatidae, Dicamptodontidae, and Salamandridae and the other of Proteidae, Rhyacotritonidae, Amphiumidae, and Plethodontidae (fig. 4).

Currently, two hypotheses have been proposed regarding the basal split within living salamanders. The traditional view favors Sirenidae as the sister group to all remaining salamanders (Duellman and Trueb 1994). This hypothesis received strong support in two recent studies (based on mitochondrial genomes, BPML = 98, Zhang and Wake 2009; based on mitochondrial genomes and nuclear genes, BPML > 80, San Mauro 2010). In contrast, some studies suggest that the basal split separates Cryptobranchidae + Hynobiidae from all other salamanders (Gao and Shubin 2001; Wiens et al. 2005; Frost et al. 2006; Roelants et al. 2007; Pyron and Wiens 2011) but always without strong support (BPML < 71). Our phylogenetic analyses based on 30 independent NPCLs supported the second hypothesis, that Cryptobranchoidea (Cryptobranchidae + Hynobiidae) branched first within the living salamanders. This result is extremely robust in our concatenation analyses (BPML = 99; PPBAY = 1.0; PPCAT = 1.0; fig. 4) and statistically rejects all alternative hypotheses (table 2). In the species tree analysis without data concatenation, this result is also strong (BPMP-EST = 83; fig. 4).

FIG. 3. Relative evolutionary rates of 102 NPCL in vertebrates. The 102 NPCLs are arranged in order of increasing variability on the right side (axis: relative evolutionary rate), and their PCR success rates in the 16 tested vertebrates are shown on the left side (axis: PCR success rate in 16 vertebrates, %). NPCLs indicated with asterisks may have extra copies in teleost genomes and thus are not suitable for phylogenetic studies of teleosts. Marker labels in figure order: NTN1*, ZIC1, PCDH10*, BPTF, KBTBD2, DBC1, LINGO1*, ARID2, FUT9, VCPIP1, B3GALT1, MED1, LPHN2, GGPS1, SACS, PANX2, CASR, PDP1, CAND1, LRRN3, LINGO2, MGAT4C, FLRT3, KIAA1239, ENC1, SLITRK1, ZFPM2, ZEB1, FZD4, MIOS, MYCBP2, DSEL, ZBED4, LRRC8D, MB21D2, SETBP1, DOPEY1, SALL1, MED13, LRRTM4, DISP1, RAG1, DCHS1, NHS, EXTL3, SOCS5, ROR2, TTN, RP2, RERE, ANKRD50, FREM2, HYP, GRM2, CXCR4*, DSE, GLCE*, PCLO, VPS18, GRIN3A, ADNP, GPER, FAT4, STON2, LRRN1, ARSI, IRS1, PIK3CG, P2RY1, PCDH1, KCNF1*, DMXL1, RAG2, PPL, FAT2, LIG4, EXOC8, DET1, FILIP1, WFIKKN2, ZHX2, CILP, CELSR3, SYNE1, LCT, TLR7, CPT2, CHST1, FEM1B, DNAH3, SH3BP4, DOLK, FICD, SPEN, KIAA2013, SVEP1, DISP2, REV3L, FAT1, MSH6, EVPL, MOS.

How many nuclear genes, then, are needed to robustly resolve the question of the basal split within living salamanders? Our analysis of data subsets indicates a progressive increase in the bootstrap support value for the node of interest (fig. 4) when an increasing number of genes are analyzed (fig. 5). Analyses based on single genes rarely resolve the node of interest with any confidence. Analyses based on 5–10 genes produce bootstrap support values of 60–80 in concatenation analyses (fig. 5), which is congruent with all previous nuclear studies using similar-sized data sets (Roelants et al. 2007; Pyron and Wiens 2011). Taking a bootstrap value of 95 in concatenation analyses as the threshold of "fully resolved," the minimum number of nuclear genes needed to resolve the root of the salamander tree is approximately 25. The previous contradictory results may be due to the overwhelmingly strong signals from the mitochondrial genome. Because the initial diversification of salamanders occurred within a relatively short window of time (Weisrock et al. 2005), the genealogical histories of individual gene loci may sometimes appear misleading in terms of the relationships among species due to incomplete lineage sorting. Unfortunately, the mitochondrial genome recorded such an incorrect history.
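The data-subset analysis can be pictured as repeatedly drawing random sets of k loci, concatenating each set, and estimating bootstrap support for the node of interest. The Python sketch below covers only the sampling step. The function name, the subset sizes, the number of replicates, and the random seed are illustrative assumptions, not the settings used in this study; the locus names are those listed in table 1.

```python
import random

# The 30 NPCLs used in the salamander data set (names from table 1).
LOCI = [
    "BPTF", "CAND1", "DET1", "DISP1", "DNAH3", "DOLK", "DSEL", "ENC1",
    "EXTL3", "FAT4", "FICD", "GRM2", "HYP", "KBTBD2", "KCNF1",
    "KIAA1239", "KIAA2013", "LIG4", "LPHN2", "LRRN1", "MGAT4C", "MIOS",
    "PANX2", "PDP1", "PPL", "RAG1", "RAG2", "SACS", "TTN", "ZBED4",
]


def draw_gene_subsets(loci, subset_size, replicates, seed=0):
    """Draw `replicates` random subsets of `subset_size` distinct loci."""
    rng = random.Random(seed)  # fixed seed so the draws are reproducible
    return [sorted(rng.sample(loci, subset_size)) for _ in range(replicates)]


# Hypothetical design: 10 replicate subsets for each subset size.
subset_plan = {k: draw_gene_subsets(LOCI, k, replicates=10, seed=k)
               for k in (1, 5, 10, 15, 20, 25, 30)}

for k, subsets in subset_plan.items():
    print(k, subsets[0][:3])
```

Each subset would then be concatenated and analyzed separately, and the support values for the node of interest summarized per subset size.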

Discussion

The NPCL toolkit and experimental protocol introduced here constitute a highly reliable, rapid, and cost-effective method for building medium-scale multilocus data to produce high-resolution phylogenetic relationships. This phylogenomic approach has the potential to accelerate the completion of many parts of the vertebrate tree of life because no further marker development is required, a step that is often the bottleneck in phylogenetic research. Once a specific phylogenetic question within vertebrates arises, researchers simply need to check the list for our toolkit and look for NPCL markers with the expected evolutionary rates and experimental performance for their groups of interest. Then, many orthologous loci can be quickly obtained by traditional PCR and Sanger sequencing, usually without time-consuming gel cutting and cloning. Applying the NPCL toolkit may also have another benefit for assembling the vertebrate tree of life: because people working on different groups can easily use the same set of loci, combined analyses will be easier.

Merits of the Toolkit

Because of the use of the nested PCR strategy outlined earlier, most NPCLs in the toolkit work for all major jawed vertebrate groups with high experimental success rates (normally >95%). Such results were achieved in unified PCR conditions without any extra effort involving cycling condition optimization. This feature of the toolkit enables it to surpass previously developed nuclear marker sets (Murphy et al. 2001; Li et al. 2007; Thomson et al. 2008; Townsend et al. 2008; Wright et al. 2008; Portik et al. 2011; Shen et al. 2011; Zhou et al. 2011). Most previous nuclear marker sets were developed for specific animal groups, and their application to

Table 1. Summary Information for the 30 NPCL Amplified in 19 Salamander Taxa.

Gene      Length (bp)  Taxa Amplified (%)  PCR Products Directly Sequenced (%)  GC (%)  Var. Sites (%)  PI Sites (%)  Overall Mean P Distance  RCV
BPTF      552          19 (100)            19 (100)                             43      163 (30)        118 (21)      0.098                    0.093
CAND1     1155         19 (100)            17 (89)                              44      403 (35)        314 (27)      0.116                    0.065
DET1      711          19 (100)            18 (95)                              46      275 (39)        216 (30)      0.131                    0.091
DISP1     960          19 (100)            19 (100)                             41      317 (33)        211 (22)      0.096                    0.072
DNAH3     918          19 (100)            19 (100)                             42      389 (42)        304 (33)      0.139                    0.049
DOLK      672          16 (84)             12 (75)                              52      316 (47)        236 (35)      0.173                    0.126
DSEL      1266         19 (100)            19 (100)                             44      546 (43)        415 (33)      0.148                    0.055
ENC1      1083         19 (100)            19 (100)                             51      363 (34)        279 (26)      0.120                    0.057
EXTL3     1245         19 (100)            17 (89)                              47      465 (37)        322 (26)      0.118                    0.067
FAT4      738          19 (100)            19 (100)                             45      344 (47)        249 (34)      0.156                    0.072
FICD      510          18 (95)             18 (100)                             44      169 (33)        124 (24)      0.111                    0.074
GRM2      690          18 (95)             18 (100)                             54      240 (35)        176 (26)      0.115                    0.118
HYP       1260         19 (100)            19 (100)                             47      516 (41)        359 (28)      0.122                    0.049
KBTBD2    1059         19 (100)            19 (100)                             44      406 (38)        246 (23)      0.103                    0.040
KCNF1     735          19 (100)            19 (100)                             52      294 (40)        220 (30)      0.151                    0.153
KIAA1239  1362         19 (100)            19 (100)                             42      479 (35)        338 (25)      0.110                    0.048
KIAA2013  516          19 (100)            19 (100)                             52      221 (43)        178 (34)      0.148                    0.093
LIG4      1017         19 (100)            19 (100)                             39      434 (43)        301 (30)      0.137                    0.043
LPHN2     573          19 (100)            19 (100)                             47      192 (34)        140 (24)      0.106                    0.108
LRRN1     837          19 (100)            18 (95)                              49      345 (41)        240 (29)      0.130                    0.119
MGAT4C    747          18 (95)             18 (100)                             42      273 (37)        212 (28)      0.126                    0.079
MIOS      879          19 (100)            19 (100)                             45      291 (33)        213 (24)      0.107                    0.065
PANX2     717          19 (100)            19 (100)                             44      254 (35)        199 (28)      0.125                    0.059
PDP1      1035         19 (100)            19 (100)                             45      348 (34)        261 (25)      0.110                    0.080
PPL       1338         19 (100)            17 (89)                              47      645 (48)        485 (36)      0.156                    0.064
RAG1      1380         19 (100)            19 (100)                             51      550 (40)        438 (32)      0.146                    0.072
RAG2      792          19 (100)            18 (95)                              49      406 (51)        310 (39)      0.184                    0.090
SACS      1101         19 (100)            19 (100)                             40      383 (35)        282 (26)      0.105                    0.040
TTN       984          19 (100)            5 (26)                               43      378 (38)        269 (27)      0.125                    0.052
ZBED4     1002         19 (100)            19 (100)                             39      325 (32)        224 (22)      0.093                    0.040

NOTE: Length, length of refined alignment; Var. sites, variable sites; PI sites, parsimony informative sites; RCV, relative composition variability.
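The alignment lengths in table 1 can be cross-checked against the totals quoted in the Results. The short Python sketch below sums the per-locus lengths and derives the number of codons; it is a consistency check written for this purpose, and the variable names are ours.

```python
# Refined alignment length (bp) of each NPCL, copied from table 1.
alignment_length_bp = {
    "BPTF": 552, "CAND1": 1155, "DET1": 711, "DISP1": 960, "DNAH3": 918,
    "DOLK": 672, "DSEL": 1266, "ENC1": 1083, "EXTL3": 1245, "FAT4": 738,
    "FICD": 510, "GRM2": 690, "HYP": 1260, "KBTBD2": 1059, "KCNF1": 735,
    "KIAA1239": 1362, "KIAA2013": 516, "LIG4": 1017, "LPHN2": 573,
    "LRRN1": 837, "MGAT4C": 747, "MIOS": 879, "PANX2": 717, "PDP1": 1035,
    "PPL": 1338, "RAG1": 1380, "RAG2": 792, "SACS": 1101, "TTN": 984,
    "ZBED4": 1002,
}

# Length of the concatenated nucleotide data set.
total_bp = sum(alignment_length_bp.values())

# Protein-coding alignments: three nucleotides per amino acid residue.
total_residues = total_bp // 3

print(len(alignment_length_bp), total_bp, total_residues)
```

The totals agree with the 27,834 bp and 9,278 deduced amino acid residues reported for the concatenated data set.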


FIG. 4. Higher-level phylogenetic relationships of 10 salamander families inferred from 30 NPCL markers (30 nuclear genes; total 27,834 bp). The tree was inferred by concatenation analyses using ML, BI, and the mixture model (CAT) and by species-tree analysis using the pseudo-ML approach (MP-EST). Branch support values are indicated beside nodes in order of ML bootstrap (BPML), BI posterior probability (PPBI), CAT posterior probability (PPCAT), and MP-EST bootstrap (BPMP-EST) from left to right. The filled squares represent BPML > 95, PPBAY = 1.0, PPCAT = 1.0, and BPMP-EST > 95. The circled number refers to the node of interest studied in figure 6. Branch lengths are from the ML analysis. Scale bar: 0.1 substitutions/site. Support values shown on two nodes: 99/1.0/1.0/83 and 99/1.0/1.0/74. Tip labels: Aneides hardii, Plethodon jordani, Batrachoseps major, Eurycea bislineata, Amphiuma means, Rhyacotriton variegatus, Proteus anguinus, Necturus beyeri, Tylototriton asperrimus, Cynops orientalis, Salamandra salamandra, Dicamptodon aterrimus, Ambystoma mexicanum, Pseudobranchus axanthus, Siren intermedia, Ranodon sibiricus, Batrachuperus yenyuanensis, Onychodactylus fischeri, Andrias davidianus, Silurana tropicalis, Bombina fortinuptialis, Typhlonectes natans, Gallus gallus, Ichthyophis bannanicus, Homo sapiens, Chrysemys picta bellii, Mus musculus, and Latimeria chalumnae. Clade labels: Plethodontidae, Amphiumidae, Rhyacotritonidae, Proteidae, Salamandridae, Dicamptodontidae, Ambystomatidae, Sirenidae, Hynobiidae, Cryptobranchidae, Salamandroidea, Cryptobranchoidea, Anura, Gymnophiona, and non-amphibian outgroup.

Table 2. Statistical Confidence (P Values) for Alternative Branching Hypotheses Based on the 30-Gene Data Set.

Alternative Topology Tested                                      ΔLn L   P Value (AU / BP / KH)    Rejection (AU BP KH)
Best ML                                                          0       0.993 / 0.97 / 0.98
Sirenidae branched earlier                                       328     0.025 / 0.015 / 0.02      + + +
Sirenidae is sister to Cryptobranchoidea                         423     0.004 / 0.002 / 0.004     + + +
Gymnophiona is sister to Anura (monophyletic lissamphibians)     343     0.013 / 0.012 / 0.013     + + +
Gymnophiona is sister to Caudata (monophyletic lissamphibians)   439     0.002 / 0.001 / 0.001     + + +
Anura is sister to Amniota (paraphyletic lissamphibians)         1446    5E-30 / 0 / 0             + + +
Gymnophiona is sister to Amniota (paraphyletic lissamphibians)   1290    1E-69 / 0 / 0             + + +
Caudata is sister to Amniota (paraphyletic lissamphibians)       1728    0.0001 / 0 / 0            + + +


other distantly related groups is usually difficult. For example, Spinks et al. (2010) collected 120 nuclear markers from avian, squamate, and mammalian phylogenetic studies and evaluated their PCR performance in turtles. They found that only eight nuclear markers successfully produced single expected bands across 13 tested turtle species. In another case, Fong and Fujita (2011) developed 75 nuclear markers for vertebrate phylogenetics, but approximately 60% of the target fragments could not be obtained in three test species (two reptiles and one lissamphibian). Therefore, although the nested PCR method introduced here requires an additional PCR reaction, the extra work is still worthwhile.

In PCR-based phylogenetic projects, even when the PCR reactions are successful, the products often contain significant nonspecific amplicons. Such products require gel purification and cloning, which take much more time than the PCR itself. Our NPCL toolkit is specifically designed to solve this problem, so that normally over 90% of PCR reactions produce strong, single bands of the expected size. Moreover, most of the primers used to date for nuclear marker sets are degenerate and thus are not suitable for direct sequencing of PCR products. With our nested PCR strategy, we introduce anchoring sequences at the ends of PCR fragments while maintaining PCR efficiency. These anchoring sequences bring the added benefit that two specific sequencing primers (Seq_F and Seq_R) can be used in all Sanger sequencing reactions.

One additional feature of our NPCL toolkit is that the average length of its NPCLs is 1,050 bp, a length that can easily be amplified in one PCR reaction and sequenced in both directions, allowing efficient use of resources. In contrast, the average marker lengths are 920 bp for 10 NPCLs in Li et al. (2007), 760 bp for 26 NPCLs in Townsend et al. (2008), 873 bp for 22 NPCLs in Shen et al. (2011), and 470 bp for 75 NPCLs in Fong and Fujita (2011). Longer markers provide more sites than shorter ones for equivalent money and time. This feature makes our NPCL toolkit more cost-effective than previously developed nuclear marker sets.

Phylogenetic Utility

The vertebrate NPCL toolkit we developed here shows great promise in terms of phylogenetic utility. A remarkable feature of the toolkit is that it provides 102 NPCLs with a broad range of evolutionary rates. In our demonstration, we used 30 NPCLs to resolve a family-level salamander phylogeny using both traditional concatenation analyses and a more promising species-tree analysis. However, this example does not mean that our toolkit performs well only on deep-timescale questions. Our ongoing study using this toolkit to resolve the relationships within Plethodontidae, a rapidly radiating group of salamanders, suggests that the toolkit also performs well in resolving species-level phylogenies. For many vertebrate groups in which applicable nuclear markers are limited, such as some teleosts, frogs, and salamanders, our NPCL toolkit can provide a one-stop solution for phylogenetic studies from the family level to the species level. Even for those groups in which specific nuclear marker sets have been developed, our toolkit is still worth trying, as many more loci can be easily obtained that may resolve some difficult branches.

The Toolkit Is a Good Addition to Sequence Capture Approaches

Recently, sequence capture approaches have been applied to vertebrate phylogenomics (Crawford et al. 2012; Faircloth et al. 2012; Lemmon et al. 2012; McCormack et al. 2012). These approaches begin with the selective capture of genomic regions. Briefly, fragmented gDNA is hybridized to DNA or RNA probes, either on an array or in solution. Nontargeted DNA is then washed away, and the targeted DNA is sequenced through NGS. The most promising feature of the sequence capture approach is that it can simultaneously produce hundreds to thousands of loci for tens of individuals within a relatively short time. Therefore, the sequence capture approach is considered to be much more cost-effective than the PCR-based method. According to the calculation of Lemmon et al. (2012), for a project of 100 taxa × 500 loci, the cost of the sequence capture method is just 1–3.5% of that of the PCR-based method.

However, the sequence capture approach is currently too challenging for most phylogenetic researchers. Typical NGS runs (454 or Illumina) used by the sequence capture method generate 1,000,000–2,000,000,000 sequences. Storing and processing these NGS data require significant computer memory, hardware upgrades, and bioinformatic programming skills, which are unfamiliar to most phylogenetic researchers. Moreover, phylogenetic reconstruction assumes that orthologous genes are being analyzed across species. For the PCR-based method, the detection of paralogous genes is relatively straightforward. However, in the sequence capture method, the captured genomic regions comprise short conserved cores (probe regions) and long unconserved flanking sequences. Because paralogy cannot be detected until after the data are aligned, those unalignable sequences will make the detection of paralogy more difficult.

[Figure 5: line plot of bootstrap support (%, 0–100) against number of genes (5–30) for node 1, with one series for concatenation analyses and one for species-tree analyses.]

FIG. 5. The effect of increasing the number of nuclear loci on resolving the basal split within salamanders. Each data point represents the mean of support values estimated from 30 randomly sampled subsets. The dashed line indicates the threshold of 95% bootstrap support. The plots show that the minimum number of nuclear loci needed to robustly resolve the basal split within salamanders is 25.
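The gene-subsampling design summarized in figure 5 (for each number of genes, the mean support over 30 randomly sampled subsets) can be sketched roughly as follows. This is a minimal illustration, not the authors' script: the function names are hypothetical, sampling without replacement within a subset is an assumption, and `support_for_subset` stands in for a full tree inference and bootstrap run that returns the support value for the node of interest.

```python
import random
from statistics import mean

def sample_gene_subsets(genes, subset_size, n_subsets=30, seed=0):
    """Draw n_subsets random gene subsets of a given size.

    Assumption: genes are sampled without replacement within each subset.
    """
    rng = random.Random(seed)
    return [rng.sample(genes, subset_size) for _ in range(n_subsets)]

def mean_support(genes, subset_size, support_for_subset, n_subsets=30, seed=0):
    """Average a node-support value over random subsets of the genes.

    support_for_subset is a hypothetical callable that would run the
    phylogenetic analysis on one subset and return the node support (%).
    """
    subsets = sample_gene_subsets(genes, subset_size, n_subsets, seed)
    return mean(support_for_subset(subset) for subset in subsets)
```

Calling `mean_support` for subset sizes 5, 10, 15, 20, 25, and 30 would give one data point per size, which is the shape of the curves described in the caption.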

In fact, not every phylogenetic project will use more than 500 loci, as the sequence capture method normally does. Based on both empirical and simulation data, 20–50 loci are generally sufficient to answer many phylogenetic questions (Rokas et al. 2003; Spinks et al. 2009). This is also the number of loci that most phylogenetic studies use. In such a situation, adopting the sequence capture method is not cost-effective, because researchers need to use relatively expensive NGS sequencing and spend time learning new experimental techniques and carrying out sophisticated bioinformatic processing. Our NPCL toolkit is specially designed for such medium-scale phylogenetic projects using approximately 50 loci. That number of loci can easily be obtained from our 102 NPCLs. Because more than 90% of the PCR reactions generated by our toolkit can be directly sequenced, the average cost for one locus per sample is rather low. In our laboratory, generating one new sequence typically costs US$3 (without considering labor).
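As a rough worked example of the per-sequence figure quoted above, the reagent cost of a PCR-based project scales with the number of taxon-by-locus sequences. The sketch below assumes that every sequence is generated de novo at US$3 and that labor is excluded, as in the text; the function name and the example project size are illustrative only.

```python
def sequencing_cost_usd(n_taxa, n_loci, cost_per_sequence_usd=3.0):
    """Reagent cost when every taxon-by-locus sequence is generated de novo.

    The default of US$3 per sequence is the laboratory figure quoted in the
    text; labor is not included.
    """
    if n_taxa < 0 or n_loci < 0:
        raise ValueError("n_taxa and n_loci must be non-negative")
    return n_taxa * n_loci * cost_per_sequence_usd

# Illustrative project: 100 taxa x 50 loci = 5,000 sequences.
print(sequencing_cost_usd(100, 50))
```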

In addition, researchers sometimes have only tiny amounts of DNA but wish to perform a multilocus phylogenetic analysis. In such a situation, the sequence capture method is difficult to implement because it normally requires DNA at the microgram level (Lemmon et al. 2012). Our NPCL toolkit can fill this gap. Because of the nested PCR strategy, the sensitivity of PCR reactions in our method is extremely high. In many test experiments in our laboratory, the toolkit and protocol produced target bands with only 5–10 ng of DNA.

Our NPCL toolkit is an alternative to the sequence capture method for the everyday work of phylogenetic researchers. Which method to choose depends on two major drivers: the amount of DNA and the expected number of loci. When DNA is limited, the better solution may be PCR; otherwise, sequence capture also works. Taking into account the money and time the two methods require, we speculate that the economic transition point from PCR to sequence capture is at approximately 100 loci. That assessment is why our toolkit includes 102 NPCL markers. Our proposal is that when using ≤100 loci, one can try our NPCL toolkit; when using >100 loci, sequence capture should be used.

Future Directions

In this study, we used multiple genome alignments deposited in the University of California, Santa Cruz (UCSC) Genome Browser to identify long and conserved exons across jawed vertebrates. Because of the nested PCR strategy, the developed NPCLs performed stably in all major jawed vertebrate groups. Recently, a database for mining exon and intron markers called EvolMarkers has been built by Li et al. (2012). Careful investigation of this database may identify many conserved exons within nonvertebrates, whose interrelationships are currently more problematic than those of vertebrates. Because the nonvertebrates constitute many distantly related groups, it may be impossible to develop a single set of PCR primers for all nonvertebrates. However, following a similar marker development strategy, multiple NPCL toolkits could be constructed for various groups of nonvertebrates, such as arthropods, echinoderms, and molluscs. In addition, because introns are flanked by conserved exons, the idea of using nested PCR for marker development could also be applied to the development of EPIC (exon-primed intron crossing) markers, which are more suitable for shallow-scale phylogenetic or phylogeographic projects.

Despite the benefits of our proposed method, it must be recognized that when handling large-scale projects, such as 200 taxa × 100 loci, the use of our toolkit and Sanger sequencing will still require significant cost, time, and labor. An alternative solution is to use NGS to replace Sanger sequencing. Recently, 454 NGS technology has been applied to sequence targeted gene regions from a pool of PCR products from different specimens (Binladen et al. 2007; Meyer et al. 2008). In such experiments, specific tagging sequences must be added to amplicons by either PCR (Binladen et al. 2007) or blunt-end ligation (Meyer et al. 2008). Therefore, if the tailing sequences of the second-round PCR primers in our NPCL toolkit are replaced by tagging sequences (for tag design, see Faircloth and Glenn 2012), all PCR products can be pooled together and sequenced with 454 NGS, which will greatly reduce the money and time cost compared with Sanger sequencing. However, parallel tagged sequencing via NGS does not circumvent the process of PCR for each individual at each locus, which may be the most onerous part of a large-scale phylogenomic project. Some promising new technologies may help to solve this problem, such as microdroplet PCR (Tewhey et al. 2009), where millions of individual PCR reactions are performed simultaneously in picoliter-scale droplets, and the 96.96 Dynamic Array by Fluidigm, which allows 96 primer combinations to be used on 96 samples (9,216 total PCR reactions) on a single PCR plate. However, there has been little research on applying NGS and new high-throughput PCR technologies to phylogenomics, so their ease of use and cost-effectiveness still need to be explored.

Summary

In conclusion, we have developed an improved method for rapidly amplifying and sequencing NPCLs that has proven to be useful and effective for molecular phylogenetic studies of vertebrates. The newly developed toolkit provides an attractive alternative to available methods for vertebrate phylogenomics.

Materials and Methods

Development of NPCL and Primer Design

Our previous study showed that nested PCR is overwhelmingly more effective than conventional PCR for obtaining target amplicons from complex genomic environments (Shen et al. 2012). However, nested PCR requires four conserved regions to design two pairs of primers (illustrated in fig. 6; yellow blocks represent the conserved regions used for primer design), which means that only relatively long exons are suitable as candidates for NPCL development with the nested PCR method. To search for long and conserved exons, we took advantage of our previous bioinformatic method, which used the multiple genome alignment data from the UCSC Genome Browser to identify conserved exons (Shen et al. 2011). Because the NPCL markers are to be used in vertebrates, we focused only on those multiple genome alignments that include at least six species: Danio rerio (zebrafish), Silurana tropicalis (frog), Anolis carolinensis (lizard), Gallus gallus (chicken), Mus musculus (mouse), and Homo sapiens (human). The alignments of candidate exons had to meet two criteria: length of more than 700 bp and pairwise similarity ranging from 35% to 90%. The detailed bioinformatic pipeline has been described elsewhere (Shen et al. 2011). In addition to using multiple genome alignments to screen NPCL candidates, we also manually searched the ENSEMBL database for nuclear genes that were used previously (Murphy et al. 2001; Li et al. 2007; Townsend et al. 2008; Wright et al. 2008; Zhou et al. 2011; Song et al. 2012) to check whether they contain large and appropriately conserved exons.
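The two screening criteria for candidate exon alignments (length of more than 700 bp and pairwise similarity between 35% and 90%) could be expressed as a simple filter. The sketch below is not the published pipeline of Shen et al. (2011): it assumes that similarity is measured as percent identity over ungapped aligned columns and that every species pair must fall inside the range. The function names are hypothetical.

```python
from itertools import combinations

def pairwise_identity(seq_a, seq_b):
    """Percent identity over aligned columns where neither sequence has a gap."""
    pairs = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper())
             if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(a == b for a, b in pairs) / len(pairs)

def is_candidate_exon(alignment, min_length=700, min_sim=35.0, max_sim=90.0):
    """Apply the length and similarity criteria to one exon alignment.

    alignment: dict mapping species name -> aligned sequence (equal lengths).
    Assumption: every pairwise comparison must lie within [min_sim, max_sim].
    """
    seqs = list(alignment.values())
    if len(seqs) < 2 or len(seqs[0]) <= min_length:
        return False
    return all(min_sim <= pairwise_identity(a, b) <= max_sim
               for a, b in combinations(seqs, 2))
```

Requiring every pair to pass is the strictest reading of the criterion; a mean over pairs would be a looser alternative.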

[Figure 6: laboratory protocol flowchart. For each NPCL, primers F1/R1 and F2/R2 sit on four conserved blocks flanking the target region; Seq_F and Seq_R anneal to the tails of the second-round product.
(i) First PCR with primers F1 and R1 using gDNA as template: enrich the target region from the complex genomic environment with one pair of highly degenerate primers. PCR was performed with 50–100 ng DNA in a 25 µl reaction. Cycling conditions: an initial denaturation step of 4 min at 94 °C, followed by 35 cycles of 94 °C for 45 s, 45 °C for 40 s, and 72 °C for 2 min, and a final extension at 72 °C for 10 min.
(ii) Second PCR with tailed primers F2 and R2 using the first-round PCR product as template: specifically amplify the target region with one pair of tailed, low-degeneracy primers. PCR was performed with 1 µl of the first-round PCR product in a 25 µl reaction. Cycling conditions: an initial denaturation step of 4 min at 94 °C, followed by 35 cycles of 94 °C for 45 s, 50 °C for 40 s, and 72 °C for 90 s, and a final extension at 72 °C for 10 min.
(iii) PCR evaluation and sequencing: evaluate the agarose gel electrophoresis results. If the band is single and strong (Y, normally >90%), 25 µl of PCR product is cleaned with 2 U ExoI and 0.4 U FastAP (37 °C for 30 min, 80 °C for 15 min) and sequenced directly with the general sequencing primers Seq_F and Seq_R; a Sanger sequencing reaction is performed with 0.5 µl BigDye and 1 µl cleaned PCR product. Otherwise (N), gel cutting or cloning, then sequencing.]

FIG. 6. Schematic representation of the experimental protocol for using our NPCL toolkit. Note that for each NPCL, nested PCR primers are designed on four short conserved blocks flanking the target region.

As a result, we assembled a total of 305 NPCL candidate alignments, of which 120 contained the appropriate number of conserved blocks, and used these to design nested PCR primers. To increase the success rates of our NPCL markers in amniotes, we manually added a turtle sequence (Chrysemys picta bellii) to each of the candidate alignments using data downloaded from the ENSEMBL database. A total of 480 primers were designed for the 120 NPCL candidates. Briefly, the first-round PCR primers are used only to enrich target regions from genomic environments and not to obtain target amplicons, so the degeneracy of these primers is normally high to increase reaction sensitivity; the second-round PCR primers are used to obtain target amplicons, so the degeneracy of these primers is lower to increase reaction specificity. Our previous study showed that the nested PCR method often produces strong and single amplicon bands (Shen et al. 2012). To facilitate subsequent direct sequencing, we added a tail (5′-AGGGTTTTCCCAGTCACGAC-3′) to the 5′-end of all second-round forward primers and a tail (5′-AGATAACAATTTCACACAGG-3′) to the 5′-end of all second-round reverse primers. These tail sequences provide two unique anchoring sites for direct sequencing from cleaned PCR products. In our pilot experiments, adding the tail sequences to primers did not affect the efficiency of the second-round PCR.
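The tailing step can be illustrated with a short helper that prepends the two anchoring sequences given in the text to a pair of second-round primers. This is only a sketch: the function name is hypothetical, and the check against IUPAC nucleotide codes is an added safeguard, not part of the published protocol.

```python
FORWARD_TAIL = "AGGGTTTTCCCAGTCACGAC"  # tail for forward primers (Seq_F site), from the text
REVERSE_TAIL = "AGATAACAATTTCACACAGG"  # tail for reverse primers (Seq_R site), from the text
IUPAC_CODES = set("ACGTRYSWKMBDHVN")   # standard and degenerate nucleotide codes

def add_tails(forward_primer, reverse_primer):
    """Return the (forward, reverse) second-round primers with 5' tails added."""
    tailed = []
    for tail, primer in ((FORWARD_TAIL, forward_primer),
                         (REVERSE_TAIL, reverse_primer)):
        primer = primer.upper()
        if not primer or not set(primer) <= IUPAC_CODES:
            raise ValueError(f"not a valid IUPAC primer sequence: {primer!r}")
        tailed.append(tail + primer)  # tail goes on the 5' end
    return tuple(tailed)
```

Because every second-round product then carries the same two ends, the same two sequencing primers can be used for all loci.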

Experimental Testing for Candidate Markers in 16 Jawed Vertebrates

To test the experimental performance of our newly designed NPCL markers, we selected 16 taxa representing nine major jawed vertebrate lineages: Chondrichthyes (Sphyrna lewini), Actinopterygii (Lepisosteus oculatus and Pangasius sutchi), Dipnoi (Protopterus annectens), Lissamphibia (Ichthyophis bannanicus, Batrachuperus yenyuanensis, and Rana nigromaculata), Mammalia (Mus musculus and Sus scrofa domestica), Testudines (Trionyx sinensis and Podocnemis unifilis), Aves (Struthio camelus and Zosterops japonica), Crocodylia (Crocodylus siamensis), and Squamata (Hemidactylus bowringii and Naja naja atra). Total genomic DNA was extracted from ethanol-preserved tissues (liver or muscle) using the standard salt extraction protocol. All extracted genomic DNAs were diluted to a concentration of 50 ng/µl with 1× TE and stored at −20 °C before PCR amplification.

All 120 NPCL markers were tested with a two-round PCR strategy (nested PCR). The first-round PCR was performed in a 25 µl reaction containing 1–2 µl template DNA (50–100 ng) with final concentrations of 1× PCR buffer, 200 µM dNTP, 400 nM each of the forward and reverse first-round primers, and 1.25 U Taq polymerase (TransTaq High Fidelity; TransGen, Beijing). The cycling conditions of the first-round PCR were as follows: an initial denaturation step of 4 min at 94 °C, followed by 35 cycles of a 45 s denaturation at 94 °C, a 40 s annealing at 45 °C, and a 2 min extension at 72 °C, followed by a final 10 min extension at 72 °C. The second-round PCR was also performed in a 25 µl reaction containing 1 µl of the first-round PCR product (without dilution) and final concentrations of 1× PCR buffer, 200 µM dNTP, 400 nM each of the forward and reverse second-round primers, and 1.25 U Taq polymerase. The cycling conditions of the second-round PCR were as follows: an initial denaturation step of 4 min at 94 °C, followed by 35 cycles of a 45 s denaturation at 94 °C, a 40 s annealing at 50 °C, and a 90 s extension at 72 °C, followed by a final 10 min extension at 72 °C.
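For record keeping, the two cycling programs described above can be written down as data, which also makes it easy to see how they differ (annealing temperature and extension time) and to estimate the programmed hold time of each round. The class and field names below are hypothetical; the numbers come from the text, and ramping time between temperatures is ignored.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PcrProgram:
    initial_denaturation_s: int
    cycles: int
    denaturation_s: int
    annealing_s: int
    annealing_temp_c: int
    extension_s: int
    final_extension_s: int

    def hold_time_s(self):
        """Sum of programmed hold times in seconds (ramping time excluded)."""
        per_cycle = self.denaturation_s + self.annealing_s + self.extension_s
        return (self.initial_denaturation_s
                + self.cycles * per_cycle
                + self.final_extension_s)

# First round: 45 C annealing, 2 min extension.
FIRST_ROUND = PcrProgram(initial_denaturation_s=240, cycles=35,
                         denaturation_s=45, annealing_s=40, annealing_temp_c=45,
                         extension_s=120, final_extension_s=600)
# Second round: 50 C annealing, 90 s extension.
SECOND_ROUND = PcrProgram(initial_denaturation_s=240, cycles=35,
                          denaturation_s=45, annealing_s=40, annealing_temp_c=50,
                          extension_s=90, final_extension_s=600)
```

With these values, `FIRST_ROUND.hold_time_s()` is 8015 s and `SECOND_ROUND.hold_time_s()` is 6965 s, so the two rounds together take a little over four hours of programmed hold time.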

One microliter of each second-round PCR product was analyzed on a 1.0% TAE agarose gel. An NPCL marker was considered successful if more than 8 of the 16 tested taxa produced target amplicon bands. On this basis, 102 of the 120 tested NPCL markers were successful. The nested PCR primers for the 102 NPCL markers can be found in supplementary table S1, Supplementary Material online. If the PCR products contained significant nonspecific amplicon bands (normally <10%), they needed further processing, for example, standard gel cutting or cloning. If the PCR reactions produced single amplicon bands (normally >90%), they were cleaned with ExoFAP treatment: 2 U ExoI and 0.4 U FastAP (all Fermentas) were added to the PCR tube and incubated for 30 min at 37 °C and 15 min at 80 °C. The cleaned PCR reactions can be used directly as templates for Sanger sequencing. According to our experimental design, all PCR fragments can be sequenced from both ends with the two universal sequencing primers, Seq_F (5′-AGGGTTTTCCCAGTCACGAC-3′) and Seq_R (5′-AGATAACAATTTCACACAGG-3′). A typical Sanger sequencing reaction in our laboratory consumes 0.5 µl BigDye and 1 µl cleaned PCR product. The primer design strategy, the laboratory protocol for the nested PCR method, and the pretreatment of PCR products before Sanger sequencing are illustrated in figure 6.
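The success criterion for a marker (target bands in more than 8 of the 16 tested taxa) is easy to state in code. The following sketch uses a hypothetical data layout in which gel results are recorded as one boolean per taxon.

```python
def count_successes(bands_by_taxon):
    """Number of taxa that produced a target amplicon band."""
    return sum(1 for has_band in bands_by_taxon.values() if has_band)

def marker_is_successful(bands_by_taxon, threshold=8):
    """A marker passes if MORE than `threshold` taxa gave a target band.

    bands_by_taxon: dict mapping taxon name -> True/False (hypothetical layout).
    The default threshold of 8 is the criterion stated in the text.
    """
    return count_successes(bands_by_taxon) > threshold
```

Note that the comparison is strict: a marker that works in exactly 8 of 16 taxa does not pass.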

Calculation of Relative Evolutionary Rate of 102 NPCLs

The rate multipliers (m) across partitions estimated in MrBayes 3.2 (Ronquist et al. 2012) were used as relative evolutionary rates. To calculate these parameters, alignments for each NPCL were prepared for 12 species: Homo sapiens, Macaca mulatta, Mus musculus, Rattus norvegicus, Gallus gallus, Meleagris gallopavo, Chrysemys picta bellii, Anolis carolinensis, Silurana tropicalis, Tetraodon nigroviridis, Takifugu rubripes, and Danio rerio. Because genome data are available for the 12 species, we did not generate any new data. The 102 NPCL alignments were then combined and subjected to MrBayes analyses partitioned by gene. Each gene was assigned a separate GTR + Γ + I model, and all model parameters were unlinked. Two Markov chain Monte Carlo (MCMC) runs were performed with one cold chain and three heated chains (temperature set to 0.1) for 50 million generations and sampled every 1,000 generations. The rate multiplier for each gene was estimated using Tracer version 1.4 after discarding the first 50% of the generations. All evolutionary rates were normalized by dividing by the maximum value of the obtained rates.
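The final normalization step (dividing every rate multiplier by the largest one, so that the fastest gene has a relative rate of 1) can be sketched as below. The function name is hypothetical, and the input is assumed to be a mapping from gene name to the estimated multiplier.

```python
def normalize_rates(rate_multipliers):
    """Scale rate multipliers so that the fastest gene has a rate of 1.0.

    rate_multipliers: dict mapping gene name -> estimated rate multiplier.
    """
    if not rate_multipliers:
        raise ValueError("no rate multipliers supplied")
    max_rate = max(rate_multipliers.values())
    if max_rate <= 0:
        raise ValueError("rate multipliers must be positive")
    return {gene: rate / max_rate for gene, rate in rate_multipliers.items()}
```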

Gene and Taxon Sampling for Investigating Higher Level Salamander Relationships

To test the utility of our NPCL toolkit in a real case, we selected 19 salamander taxa representing all 10 salamander families and 9 outgroup taxa to investigate the family-level relationships of salamanders (supplementary table S2, Supplementary Material online). For gene sampling, we randomly selected 30 NPCL markers whose PCR success rates were more than 90% in the 16 previously tested vertebrates. Among the 840 target sequences (30 markers for 28 taxa), 201 were available in public databases (NCBI, UCSC, and ENSEMBL), whereas the remaining 639 sequences needed to be generated de novo. The experimental procedure was as described earlier. All obtained sequences were examined by checking for the presence of premature stop codons (pseudogenes) and by BlastX searches against the nonredundant protein sequences (nr) database to confirm that they were our target genes. The NPCLs, species, and accession numbers for the newly obtained sequences are listed in supplementary table S2, Supplementary Material online.
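The check for premature stop codons can be sketched with a small in-frame scan. This is an illustration rather than the authors' procedure: it assumes the standard nuclear genetic code, a known reading frame, and that any in-frame stop inside an exon fragment counts as premature. Codons containing ambiguous bases are skipped.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
_BASES = "TCAG"
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(_BASES, repeat=3), _AMINO_ACIDS)}

def stop_codon_positions(sequence, frame=0):
    """Return 0-based nucleotide positions of in-frame stop codons.

    Alignment gaps ('-') are removed first, so positions refer to the
    ungapped sequence. Codons with ambiguous bases are ignored.
    """
    seq = sequence.upper().replace("-", "")
    return [i for i in range(frame, len(seq) - 2, 3)
            if CODON_TABLE.get(seq[i:i + 3]) == "*"]

def has_premature_stop(sequence, frame=0):
    """True if the exon fragment contains any in-frame stop codon."""
    return bool(stop_codon_positions(sequence, frame))
```

A sequence flagged by this check would be a candidate pseudogene or a sequencing error and would need manual inspection.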

Phylogenetic Analyses

Alignments of all 30 NPCL markers were produced with the G-INS-i method in MAFFT (Katoh et al. 2005) under the default settings, guided by their translated amino acid sequences, and then refined by eye. All 30 refined alignments were combined into a concatenated data set (27,834 bp).

For the concatenated data set, we manually defined five partitioning strategies: 2 partitions (one for codon positions 1 and 2 and one for codon position 3), 3 partitions (one partition for each codon position), 30 partitions (one partition for each gene), 60 partitions (one for codon positions 1 + 2 and one for codon position 3 in each of the 30 genes), and 90 partitions (one for each codon position in each of the 30 genes). Comparisons of the five partitioning strategies and selection of the corresponding nucleotide substitution models were conducted under the Bayesian information criterion implemented in PartitionFinder (Lanfear et al. 2012). The 3-partition scheme (one partition for each codon position) was chosen as the best-fitting partitioning strategy, and all 3 partitions favored the GTR + Γ + I model.
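The five partitioning strategies differ only in whether sites are split by gene, by codon position, or both, so they can be generated from a list of gene lengths. The sketch below writes the definitions in the RAxML partition-file style (for example, `DNA, name = 1-300\3`); it assumes that each gene alignment is in frame and has a length that is a multiple of three. The function and partition names are illustrative.

```python
def partition_lines(gene_lengths, by_gene, codon_mode):
    """Build RAxML-style partition definitions for a concatenated alignment.

    gene_lengths: list of (gene_name, length_in_bp) in concatenation order.
    by_gene: if True, each gene is a separate block; otherwise one block.
    codon_mode: "none", "12+3" (positions 1+2 vs 3), or "1,2,3".
    Assumption: every gene starts at codon position 1 and has length % 3 == 0.
    """
    if by_gene:
        blocks, start = [], 1
        for name, length in gene_lengths:
            blocks.append((name, start, start + length - 1))
            start += length
    else:
        total = sum(length for _, length in gene_lengths)
        blocks = [("all", 1, total)]

    lines = []
    for name, start, end in blocks:
        if codon_mode == "none":
            lines.append(f"DNA, {name} = {start}-{end}")
        elif codon_mode == "12+3":
            lines.append(f"DNA, {name}_pos12 = {start}-{end}\\3, {start + 1}-{end}\\3")
            lines.append(f"DNA, {name}_pos3 = {start + 2}-{end}\\3")
        elif codon_mode == "1,2,3":
            for k in range(3):
                lines.append(f"DNA, {name}_pos{k + 1} = {start + k}-{end}\\3")
        else:
            raise ValueError(f"unknown codon_mode: {codon_mode!r}")
    return lines
```

With 30 genes, the five combinations used in the text give 2, 3, 30, 60, and 90 partitions, respectively: (`by_gene=False`, "12+3"), (`False`, "1,2,3"), (`True`, "none"), (`True`, "12+3"), and (`True`, "1,2,3").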

The concatenated data set was analyzed separately with both ML and Bayesian inference (BI) methods under the 3-partition scheme. Partitioned ML analyses were implemented using RAxML version 7.2.6 (Stamatakis 2006) with the GTR + Γ + I model assigned to each partition. A search that combined 100 separate ML searches was applied to find the optimal tree, and branch support for each node was evaluated with 500 standard bootstrapping replicates (-f d -b 500 option) implemented in RAxML. The partitioned BI was conducted using MrBayes 3.2 (Ronquist et al. 2012). All model parameters were unlinked. Two MCMC runs were performed with one cold chain and three heated chains (temperature set to 0.1) for 60 million generations and sampled every 1,000 generations. Chain stationarity was visualized by plotting −ln L against the generation number using Tracer version 1.4, and the first 50% of generations were discarded. Topologies and posterior probabilities were estimated from the remaining generations. The two runs for each analysis were compared for congruence.

We also performed Bayesian phylogenetic analyses under a mixture model, CAT + GTR + Γ4, in PhyloBayes 3.3 (Lartillot et al. 2009) with two independent MCMC runs. Each run was performed for 10,000 cycles and sampled every cycle. Stationarity was considered to be reached when the largest discrepancy (maxdiff) between the two independent runs was less than 0.1. The first 5,000 trees in each MCMC run were discarded. The remaining 10,000 trees of the two runs were sampled every 5 trees to generate a 50% majority-rule posterior consensus tree.
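The burn-in and thinning arithmetic described above (two runs of 10,000 sampled trees, the first 5,000 of each discarded, and every fifth remaining tree kept) can be checked with a few lines of code. This is a sketch only: it assumes that thinning is applied within each run before pooling, and it uses plain lists to stand in for tree samples.

```python
def pooled_posterior_sample(runs, burnin, thin):
    """Discard burn-in, thin each run, and pool the retained samples.

    runs: list of per-run sample lists, in sampling order.
    burnin: number of initial samples to discard from each run.
    thin: keep every `thin`-th sample after burn-in.
    Assumption: thinning is applied per run before pooling.
    """
    if burnin < 0 or thin < 1:
        raise ValueError("burnin must be >= 0 and thin must be >= 1")
    pooled = []
    for run in runs:
        pooled.extend(run[burnin::thin])
    return pooled

# Stand-in samples: tree indices 0..9999 for each of two runs.
runs = [list(range(10_000)), list(range(10_000))]
kept = pooled_posterior_sample(runs, burnin=5_000, thin=5)
print(len(kept))  # 2 runs x 1,000 trees each
```

Under these settings, 2,000 trees would contribute to the consensus.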

Species tree estimation was conducted using the pseudo-ML approach in the program MP-EST (Liu et al. 2010) under the coalescent model. Briefly, the gene trees for the 30 NPCL markers were reconstructed with ML under the GTR + Γ8 model using PhyML 3.0 (Guindon et al. 2010). The resulting 30 gene trees were rooted with an outgroup (Chrysemys picta bellii) and used to generate a species tree in MP-EST. The robustness of the species tree was evaluated with 500 nonparametric bootstrap replicates.

Alternative phylogenetic hypotheses were tested based on the 30-gene data set. We first calculated sitewise log likelihoods of the alternative trees using RAxML (-f g) under the GTR + Γ + I model. The site log-likelihood file was then used to estimate P values for each alternative tree in the CONSEL program (Shimodaira and Hasegawa 2001) using the Kishino–Hasegawa test (KH) (Kishino and Hasegawa 1989), the approximately unbiased test (AU) (Shimodaira 2002), and the RELL bootstrap proportion test (BP).

Supplementary Material

Supplementary tables S1 and S2 and figures S1 and S2 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).

Acknowledgments

The authors are grateful to David Wake and Carol Spencer of the Museum of Vertebrate Zoology at the University of California, Berkeley, for tissue loans. David Wake provided many useful comments on the manuscript. This work was supported by National Natural Science Foundation of China grants (31172075 and 30900136) and the National Science Fund for Excellent Young Scholars (no. pending).

References

Binladen J, Gilbert MTP, Bollback JP, Panitz F, Bendixen C, Nielsen R, Willerslev E. 2007. The use of coded PCR primers enables high-throughput sequencing of multiple homolog amplification products by 454 parallel sequencing. PLoS One 2:e197.

Crawford NG, Faircloth BC, McCormack JE, Brumfield RT, Winker K, Glenn TC. 2012. More than 1000 ultraconserved elements provide evidence that turtles are the sister group to archosaurs. Biol Lett 8:783–786.

Delsuc F, Brinkmann H, Philippe H. 2005. Phylogenomics and the reconstruction of the tree of life. Nat Rev Genet 6:361–375.

Duellman WE, Trueb L. 1994. Biology of amphibians. Baltimore (MD): Johns Hopkins University Press.

Dunn CW, Hejnol A, Matus DQ, et al. (18 co-authors). 2008. Broad phylogenomic sampling improves resolution of the animal tree of life. Nature 452:745–749.

Faircloth BC, Glenn TC. 2012. Not all sequence tags are created equal: designing and validating sequence identification tags robust to indels. PLoS One 7:e42543.

Faircloth BC, McCormack JE, Crawford NG, Brumfield RT, Glenn TC. 2012. Ultraconserved elements anchor thousands of genetic markers for target enrichment spanning multiple evolutionary timescales. Syst Biol 61:717–726.

Fong JJ, Brown JM, Fujita MK, Boussau B. 2012. A phylogenomic approach to vertebrate phylogeny supports a turtle-archosaur affinity and a possible paraphyletic Lissamphibia. PLoS One 7:e48990.

Fong JJ, Fujita MK. 2011. Evaluating phylogenetic informativeness and data-type usage for new protein-coding genes across Vertebrata. Mol Phylogenet Evol 61:300–307.

Frost DR, Grant T, Faivovich J, et al. (18 co-authors). 2006. The amphibian tree of life. Bull Am Mus Nat Hist 297:8–370.


Gao KQ, Shubin NH. 2001. Late Jurassic salamanders from northern China. Nature 410:574–577.

Guindon S, Dufayard J-F, Lefort V, Anisimova M, Hordijk W, Gascuel O. 2010. New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Syst Biol. 59:307–321.

Hugall AF, Foster R, Lee MSY. 2007. Calibration choice, rate smoothing, and the pattern of tetrapod diversification according to the long nuclear gene RAG-1. Syst Biol. 56:543–563.

Inoue JG, Miya M, Lam K, Tay BH, Danks JA, Bell J, Walker TI, Venkatesh B. 2010. Evolutionary origin and phylogeny of the modern holocephalans (Chondrichthyes: Chimaeriformes): a mitogenomic perspective. Mol Biol Evol. 27:2576–2586.

Katoh K, Kuma K, Toh H, Miyata T. 2005. MAFFT version 5: improvement in accuracy of multiple sequence alignment. Nucleic Acids Res. 33:511–518.

Kishino H, Hasegawa M. 1989. Evaluation of the maximum likelihood estimate of the evolutionary tree topologies from DNA sequence data, and the branching order in Hominoidea. J Mol Evol. 29:170–179.

Künstner A, Wolf JBW, Backström N, et al. (13 co-authors). 2010. Comparative genomics based on massive parallel transcriptome sequencing reveals patterns of substitution and selection across 10 bird species. Mol Ecol. 19:266–276.

Lanfear R, Calcott B, Ho SYW, Guindon S. 2012. PartitionFinder: combined selection of partitioning schemes and substitution models for phylogenetic analyses. Mol Biol Evol. 29:1695–1701.

Lartillot N, Lepage T, Blanquart S. 2009. PhyloBayes 3: a Bayesian software package for phylogenetic reconstruction and molecular dating. Bioinformatics 25:2286–2288.

Lemmon AR, Brown JM, Stanger-Hall K, Lemmon EM. 2009. The effect of ambiguous data on phylogenetic estimates obtained by maximum likelihood and Bayesian inference. Syst Biol. 58:130–145.

Lemmon AR, Emme SA, Lemmon EM. 2012. Anchored hybrid enrichment for massively high-throughput phylogenomics. Syst Biol. 61:727–744.

Li C, Ortí G, Zhang G, Lu G. 2007. A practical approach to phylogenomics: the phylogeny of ray-finned fish (Actinopterygii) as a case study. BMC Evol Biol. 7:44.

Li C, Riethoven JJ, Naylor GJ. 2012. EvolMarkers: a database for mining exon and intron markers for evolution, ecology and conservation studies. Mol Ecol Resour. 12:967–971.

Liu L, Yu L, Edwards SV. 2010. A maximum pseudo-likelihood approach for estimating species trees under the coalescent model. BMC Evol Biol. 10:302.

McCormack JE, Faircloth BC, Crawford NG, Gowaty PA, Brumfield RT, Glenn TC. 2012. Ultraconserved elements are novel phylogenomic markers that resolve placental mammal phylogeny when combined with species-tree analysis. Genome Res. 22:746–754.

McCormack JE, Hird SM, Zellmer AJ, Carstens BC, Brumfield RT. 2013. Applications of next-generation sequencing to phylogeography and phylogenetics. Mol Phylogenet Evol. 66:526–538.

Meyer A, Van de Peer Y. 2005. From 2R to 3R: evidence for a fish-specific genome duplication (FSGD). Bioessays 27:937–945.

Meyer M, Stenzel U, Hofreiter M. 2008. Parallel tagged sequencing on the 454 platform. Nat Protoc. 3:267–278.

Murphy WJ, Eizirik E, Johnson WE, Zhang YP, Ryder OA, O'Brien SJ. 2001. Molecular phylogenetics and the origins of placental mammals. Nature 409:614–618.

Philippe H, Derelle R, Lopez P, et al. (20 co-authors). 2009. Phylogenomics revives traditional views on deep animal relationships. Curr Biol. 19:706–712.

Philippe H, Telford M. 2006. Large-scale sequencing and the new animal phylogeny. Trends Ecol Evol. 21:614–620.

Portik DM, Wood PL, Grismer JL, Stanley EL, Jackman TR. 2011. Identification of 104 rapidly-evolving nuclear protein-coding markers for amplification across scaled reptiles using genomic resources. Conserv Genet Resour. 4:1–10.

Pyron RA, Wiens JJ. 2011. A large-scale phylogeny of Amphibia including over 2800 species, and a revised classification of extant frogs, salamanders, and caecilians. Mol Phylogenet Evol. 61:543–583.

Roelants K, Gower DJ, Wilkinson M, Loader SP, Biju SD, Guillaume K, Moriau L, Bossuyt F. 2007. Global patterns of diversification in the history of modern amphibians. Proc Natl Acad Sci U S A. 104:887–892.

Rokas A, Williams BL, King N, Carroll SB. 2003. Genome-scale approaches to resolving incongruence in molecular phylogenies. Nature 425:798–804.

Ronquist F, Teslenko M, Van der Mark P, Ayres DL, Darling A, Höhna S, Larget B, Liu L, Suchard MA, Huelsenbeck JP. 2012. MrBayes 3.2: efficient Bayesian phylogenetic inference and model choice across a large model space. Syst Biol. 61:539–542.

Roure B, Baurain D, Philippe H. 2013. Impact of missing data on phylogenies inferred from empirical phylogenomic data sets. Mol Biol Evol. 30:197–214.

San Mauro D. 2010. A multilocus timescale for the origin of extant amphibians. Mol Phylogenet Evol. 56:554–561.

San Mauro D, Vences M, Alcobendas M, Zardoya R, Meyer A. 2005. Initial diversification of living amphibians predated the breakup of Pangaea. Am Nat. 165:590–599.

Shen XX, Liang D, Wen JZ, Zhang P. 2011. Multiple genome alignments facilitate development of NPCL markers: a case study of tetrapod phylogeny focusing on the position of turtles. Mol Biol Evol. 28:3237–3252.

Shen XX, Liang D, Zhang P. 2012. The development of three long universal nuclear protein-coding locus markers and their application to osteichthyan phylogenetics with nested PCR. PLoS One 7:e39256.

Shimodaira H. 2002. An approximately unbiased test of phylogenetic tree selection. Syst Biol. 51:492–508.

Shimodaira H, Hasegawa M. 2001. CONSEL: for assessing the confidence of phylogenetic tree selection. Bioinformatics 17:1246–1247.

Song S, Liu L, Edwards SV, Wu S. 2012. Resolving conflict in eutherian mammal phylogeny using phylogenomics and the multispecies coalescent model. Proc Natl Acad Sci U S A. 109:14942–14947.

Spinks PQ, Thomson RC, Barley AJ, Newman CE, Shaffer HB. 2010. Testing avian, squamate, and mammalian nuclear markers for cross amplification in turtles. Conserv Genet Resour. 2:127–129.

Spinks PQ, Thomson RC, Lovely GA, Shaffer HB. 2009. Assessing what is needed to resolve a molecular phylogeny: simulations and empirical data from Emydid turtles. BMC Evol Biol. 9:56.

Stamatakis A. 2006. RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics 22:2688–2690.

Tewhey R, Warner JB, Nakano M, Libby B, Medkova M, David PH, Kotsopoulos SK, Samuels ML, Hutchison JB, Larson JW. 2009. Microdroplet-based PCR enrichment for large-scale targeted sequencing. Nat Biotechnol. 27:1025–1031.

Thomson RC, Shedlock AM, Edwards SV, Shaffer HB. 2008. Developing markers for multilocus phylogenetics in non-model organisms: a test case with turtles. Mol Phylogenet Evol. 49:514–525.

Thomson RC, Wang IJ, Johnson JR. 2010. Genome-enabled development of DNA markers for ecology, evolution and conservation. Mol Ecol. 19:2184–2195.

Townsend TM, Alegre RE, Kelley ST, Wiens JJ, Reeder TW. 2008. Rapid development of multiple nuclear loci for phylogenetic analysis using genomic resources: an example from squamate reptiles. Mol Phylogenet Evol. 47:129–142.

Weisrock DW, Harmon LJ, Larson A. 2005. Resolving deep phylogenetic relationships in salamanders: analyses of mitochondrial and nuclear genomic DNA. Syst Biol. 54:758–777.

Wiens J, Bonett R, Chippindale P. 2005. Ontogeny discombobulates phylogeny: paedomorphosis and higher-level salamander relationships. Syst Biol. 54:91–110.


Wright TF, Schirtzinger EE, Matsumoto T, et al. (11 co-authors). 2008. A multilocus molecular phylogeny of the parrots (Psittaciformes): support for a Gondwanan origin during the Cretaceous. Mol Biol Evol. 25:2141–2156.

Zhang P, Wake DB. 2009. Higher-level salamander relationships and divergence dates inferred from complete mitochondrial genomes. Mol Phylogenet Evol. 53:492–508.

Zhang P, Zhou H, Chen YQ, Liu YF, Qu LH. 2005. Mitogenomic perspectives on the origin and phylogeny of living amphibians. Syst Biol. 54:391–400.

Zhou XM, Xu SX, Zhang P, Yang G. 2011. Developing a series of conservative anchor markers and their application to phylogenomics of Laurasiatherian mammals. Mol Ecol Resour. 11:134–140.


each locus, not to mention the extra effort involved in PCR optimization, gel purification, and cloning. On the other hand, the PCR-based method also has its advantages: 1) it is highly targeted and can produce nearly complete data matrices, and the data analysis process is straightforward and familiar to most empirical researchers; 2) it requires no prior genomic knowledge of the targeted organisms; and 3) it works with tiny amounts of DNA and thus appears to be an ideal solution when DNA samples are limited.

For most interspecific phylogenetic projects, nuclear protein-coding loci (NPCLs) that are developed on exons are likely the markers of choice for the PCR-based strategy because they provide an appropriate level of variation, easy alignment across a large phylogenetic span, and relatively straightforward detection of paralogs (Thomson et al. 2010). In this study, our aim was to develop a suite of universal NPCL markers and an efficient experimental protocol for vertebrate phylogenomics. Aimed at eliminating the drawbacks of the conventional PCR-based method, we designed our NPCL toolkit and protocol to 1) include approximately 100 NPCL markers (we think the economic transition from PCR to sequence capture is at approximately 100 loci; if more than 100 loci are to be used, the PCR method is not cost-efficient); 2) work for all major jawed vertebrate clades and provide good resolution at different evolutionary timescales; 3) produce single and strong amplicon bands without any PCR optimization in most cases; and 4) yield PCR products that can be directly cleaned and sequenced without gel purification or cloning in most cases.

Because our NPCL toolkit is designed for universal phylogenetic applications in vertebrates, it should be tested in a real case with some difficult samples. Salamanders are well known to have much larger genomes than most vertebrates (often 10 times the human genome; http://www.genomesize.com). The PCR-based method normally performs poorly for salamanders (personal experience and communication with colleagues). For example, Shen et al. (2011) amplified 22 NPCL markers in 16 tested vertebrates. In all 15 nonsalamander species, approximately 90% of the markers could be successfully amplified; however, for the tested salamander species, Batrachuperus yenyuanensis, only 8 of 22 NPCL markers (36%) could be amplified. Here, we apply our NPCL toolkit and protocol to address the higher level relationships of living salamanders as a test of the toolkit's utility. Our results demonstrate that the new universal NPCL toolkit and protocol are fast and effective in constructing multilocus data matrices for vertebrate phylogenomics.

Results

Experimental Performance and Characteristics of the New NPCL Toolkit

The newly developed NPCL toolkit contains 102 NPCL markers, ranging from 510 to 1,650 bp, with an average length of 1,050 bp; each NPCL marker comprises two pairs of primers for the nested PCR strategy (supplementary table S1, Supplementary Material online). These 102 NPCL markers are broadly distributed on 21 chromosomes of the human genome (fig. 1). We classified their PCR performance into three levels: 1) producing a single target band of the expected size, 2) producing a target band but also significant nonspecific bands, and 3) not producing a target band. The first two conditions are considered successful. The PCR performances of the 102 NPCL markers across 16 diverse vertebrate species and three representative electropherograms are shown in figure 2. Of the 102 NPCL markers, 57 have a 100% success rate in the 16 tested vertebrate species, 87 have a success rate of more than 90%, and the remaining 15 range from 56% to 88% (fig. 2). Of the 1,632 PCR reactions (102 loci × 16 taxa), 1,544 (94.6%) were successful, with 1,485 (91%) producing strong single target bands that can be used for direct sequencing. In the demonstration case, in which 30 NPCL markers were used to investigate the higher level relationships of living salamanders, 632 (98.9%) of the 639 target fragments were successfully amplified. Of the 632 successful reactions, 602 (95%) were directly sequenced with the general sequencing primers "Seq_F" and "Seq_R." The PCR success rates for each of the 102 NPCL markers across the 16 tested vertebrate species are shown in figure 3.
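The reaction counts reported above can be checked with a few lines of arithmetic. The following is a minimal Python sketch that uses only the counts stated in the text; the variable and function names are illustrative and are not part of the toolkit.

```python
# Counts reported in the text for the 16-species test panel.
N_LOCI, N_TAXA = 102, 16
total = N_LOCI * N_TAXA      # every locus attempted in every taxon
successful = 1544            # reactions with a target band (levels 1 and 2)
single_band = 1485           # reactions with a single strong target band (level 1)


def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


success_rate = percent(successful, total)
single_band_rate = percent(single_band, total)
nonspecific = successful - single_band   # level 2: target band plus nonspecific bands
failed = total - successful              # level 3: no target band

print(total, success_rate, single_band_rate, nonspecific, failed)
```

The two derived counts (reactions with nonspecific bands and failed reactions) are not stated in the text; they follow from the three-level classification if every reaction falls into exactly one level.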

The evolutionary rate, as evidenced by the degree of variability, is an important parameter of an NPCL marker because it determines applicability for different phylogenetic questions. Although our NPCL toolkit has a high PCR success rate in highly diverged taxa, that success does not mean that the NPCL markers in the toolkit are very conserved. As figure 3 illustrates, our toolkit includes NPCLs with a broad range of evolutionary rates (approximately 4-fold). Among the 102 NPCL markers, 60 evolve faster than RAG1, an NPCL that has been widely used for phylogenetic inference in various vertebrate groups. Because previous analyses based on RAG1 data resulted in highly resolved and robustly supported phylogenetic relationships at multiple hierarchical levels (San Mauro et al. 2005; Wiens et al. 2005; Hugall et al. 2007; Roelants et al. 2007), this indicates that our NPCL toolkit has the potential to resolve questions of both deep and shallow phylogeny.

It is well known that the fish-specific genome duplication occurred in the teleosts (Meyer and Van de Peer 2005). Although most duplicated genes were secondarily lost, some were retained or evolved new functions. For an NPCL marker, if there are two similar copies in teleost genomes, it is difficult to check the orthologous status of the obtained fragments. To address this, we used the zebrafish sequence of each NPCL as a Basic Local Alignment Search Tool (Blast) query against all available teleost genomes in the ENSEMBL database. If an NPCL receives more than two Blast hits and the top Blast score is not more than twice the second Blast score, that NPCL might have an extra copy in teleost genomes. Using this method, we found that only six of the 102 NPCLs (CXCR4, GLCE, KCNF1, LINGO1, NTN1, and PCDH10) may have extra copies in teleost genomes (fig. 3; supplementary table S1, Supplementary Material online). This result indicates that our NPCL toolkit is also suitable for phylogenetic inference in teleosts.
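The screening rule described above can be written as a short predicate. The following Python sketch is an illustration under stated assumptions: the text does not say which Blast score was compared (bit score is assumed here), and the function name and the example scores are hypothetical.

```python
def may_have_extra_copy(blast_scores):
    """Apply the rule from the text to the Blast scores of one NPCL.

    A locus is flagged when it has more than two hits and the top score
    is not more than twice the second-best score.
    Assumption: blast_scores holds one score (e.g., bit score) per hit.
    """
    ranked = sorted(blast_scores, reverse=True)
    if len(ranked) <= 2:          # "more than two Blast hits" is required
        return False
    top, second = ranked[0], ranked[1]
    return top <= 2 * second      # "not more than twice the second score"


# Hypothetical scores for illustration only (not data from the study).
print(may_have_extra_copy([900, 600, 120]))   # close second hit
print(may_have_extra_copy([900, 300, 120]))   # top hit dominates
```

A locus flagged by this rule is only a candidate for duplication; confirming paralogy would still require inspecting the hits themselves.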


Phylogenetic Performance in a Real Case

Our demonstration case included 19 salamander species that span salamander evolutionary diversity (supplementary table S2, Supplementary Material online). The nine outgroup species (two frogs, two caecilians, one turtle, one bird, two mammals, and one coelacanth) provided a largely balanced representation of relatives of salamanders. The 30 newly amplified NPCLs exhibited levels of variation comparable with that of the traditional RAG1 gene, with variable sites varying between 30% and 51% of all sequenced sites (table 1). The data set combining these 30 NPCLs comprises 27,834 bp and exhibits little substitution saturation (supplementary fig. S1, Supplementary Material online). The phylogenetic analyses of the concatenated data set using three tree-building methods (maximum likelihood [ML], Bayesian, and CAT-mixture model) produced an identical fully resolved tree for 28 taxa (fig. 4). In all 25 nodes of the tree, the statistical support was highly robust (BPML = 99–100%, PPBAY = 1.0, PPCAT = 1.0). The species tree estimated from 30 individual NPCLs without data concatenation using the pseudo-ML approach is identical to those estimated from the concatenated analyses. All nodes received bootstrap support values varying between 74% and 100% (fig. 4). We also conducted phylogenetic analyses at the amino acid level (9,278 deduced amino acid residues) using three tree-building methods (ML, Bayesian, and CAT-mixture model). The protein tree topology is identical to the DNA result, with just slightly lower branch support for some nodes (supplementary fig. S2, Supplementary Material online). Therefore, we did not further analyze the protein data set.

The monophyly of extant amphibians with respect to amniotes and the close relationship between frogs and salamanders (the Batrachia hypothesis) are repeatedly recovered in most recent molecular studies (San Mauro et al. 2005; Zhang et al. 2005; Frost et al. 2006; Hugall et al. 2007; Roelants et al. 2007; Zhang and Wake 2009; San Mauro 2010; Pyron and Wiens 2011). However, a recent molecular study based on 26 nuclear genes (Fong et al. 2012) supports a caecilian–salamander sister relationship, with the possible paraphyly of extant amphibians. Our phylogenetic analyses based on

FIG. 1. Chromosome mapping of the 102 NPCL markers in the Homo sapiens genome.


FIG. 2. PCR performance of the 102 NPCL markers in 16 divergent vertebrate species. Each square and electrophoretic lane is aligned with the tested species. (a) The draft divergence timescale for the 16 tested vertebrate species is based on Inoue et al. (2010) and the book The Timetree of Life. (b) The PCR performance of each NPCL marker is ranked by three different-colored squares: black, producing single target band; gray, having a target band but with significant nonspecific bands; white, no target band. The 102 NPCL markers are sorted according to their PCR success rates. (c) Three typical agarose electrophoresis results for 9 NPCL markers.

Figure labels: time axis (Ma) from 457 to 0; taxa Sphyrna, Pangasius, Lepisosteus, Protopterus, Ichthyophis, Batrachuperus, Rana, Mus, Struthio, Zosterops, Crocodylus, Trionyx, Podocnemis, Sus, Hemidactylus, and Naja; panel (c) markers CAND1, HYP, IRS1, FAT2, CELSR3, RERE, SVEP1, DNAH3, and GRM2, with size markers at 4,500, 1,200, and 200 bp.


30 nuclear genes provide further support for the monophyly of lissamphibians and the Batrachia hypothesis (fig. 4). Additionally, all possible hypotheses against the monophyly of extant amphibians and the Batrachia hypothesis were rejected in our topological tests (table 2). However, the Batrachia hypothesis did not receive strong support in our species tree analysis (BPMP-EST = 74%; fig. 4), suggesting that more nuclear genes are still needed to resolve this node.

The monophyly of the internally fertilizing salamanders (Salamandroidea: all salamanders exclusive of Hynobiidae, Cryptobranchidae, and Sirenidae) is strongly supported in our analyses (fig. 4), in line with most recent molecular studies (Wiens et al. 2005; Roelants et al. 2007; Zhang and Wake 2009; Pyron and Wiens 2011) but differing strongly from Frost et al. (2006), who recovered a clade comprising Sirenidae, Dicamptodontidae, Ambystomatidae, and Salamandridae. The internally fertilizing salamanders include two well-supported clades: one is composed of Ambystomatidae, Dicamptodontidae, and Salamandridae, and the other of Proteidae, Rhyacotritonidae, Amphiumidae, and Plethodontidae (fig. 4).

Currently, two hypotheses have been proposed regarding the basal split within living salamanders. The traditional view favors Sirenidae as the sister group to all remaining salamanders (Duellman and Trueb 1994). This hypothesis received strong support in two recent studies (based on mitochondrial genomes, BPML = 98%, Zhang and Wake 2009; based on mitochondrial genomes and nuclear genes, BPML > 80%, San Mauro 2010). In contrast, some studies suggest that the basal split separates Cryptobranchidae + Hynobiidae from all other salamanders (Gao and Shubin 2001; Wiens et al. 2005; Frost et al. 2006; Roelants et al. 2007; Pyron and Wiens 2011), but always without strong support (BPML < 71%). Our phylogenetic analyses based on 30 independent NPCLs supported the second hypothesis, that Cryptobranchoidea (Cryptobranchidae + Hynobiidae) branched first within the living salamanders. This result is extremely robust in our concatenation analyses (BPML = 99%, PPBAY = 1.0, PPCAT = 1.0; fig. 4) and statistically rejects all alternative hypotheses (table 2). In the species tree analysis without data concatenation, this result is also strong (BPMP-EST = 83%; fig. 4).

How many nuclear genes, then, are needed to robustly resolve the question of the basal split within living salamanders? Our analysis of data subsets indicates a progressive increase in the bootstrap support value for the node of interest (fig. 4) when an increasing number of genes are analyzed (fig. 5). Analyses based on single genes rarely resolve the node of interest with any confidence. Analyses based on 5–10 genes produce bootstrap support values of 60–80% in concatenation analyses (fig. 5), which is congruent with all previous nuclear studies using similar-sized data sets (Roelants et al. 2007; Pyron and Wiens 2011). Taking a bootstrap value of 95% in concatenation analyses as the threshold of "fully resolved," the minimum number of nuclear genes needed to resolve the root of the salamander tree is approximately 25. The previous contradictory results may be due to the overwhelmingly strong signals from the mitochondrial genome. Because

FIG. 3. Relative evolutionary rates of 102 NPCLs in vertebrates. The 102 NPCLs are arranged in order of increasing variability on the right side, and their PCR success rates in the 16 tested vertebrates are shown on the left side. NPCLs indicated with asterisks may have extra copies in teleost genomes and thus are not suitable for phylogenetic studies of teleosts.

Figure labels: left axis, PCR success rate in 16 vertebrates (%), 0–100; right axis, relative evolutionary rate. NPCLs in figure order, from least to most variable: NTN1, ZIC1, PCDH10, BPTF, KBTBD2, DBC1, LINGO1, ARID2, FUT9, VCPIP1, B3GALT1, MED1, LPHN2, GGPS1, SACS, PANX2, CASR, PDP1, CAND1, LRRN3, LINGO2, MGAT4C, FLRT3, KIAA1239, ENC1, SLITRK1, ZFPM2, ZEB1, FZD4, MIOS, MYCBP2, DSEL, ZBED4, LRRC8D, MB21D2, SETBP1, DOPEY1, SALL1, MED13, LRRTM4, DISP1, RAG1, DCHS1, NHS, EXTL3, SOCS5, ROR2, TTN, RP2, RERE, ANKRD50, FREM2, HYP, GRM2, CXCR4, DSE, GLCE, PCLO, VPS18, GRIN3A, ADNP, GPER, FAT4, STON2, LRRN1, ARSI, IRS1, PIK3CG, P2RY1, PCDH1, KCNF1, DMXL1, RAG2, PPL, FAT2, LIG4, EXOC8, DET1, FILIP1, WFIKKN2, ZHX2, CILP, CELSR3, SYNE1, LCT, TLR7, CPT2, CHST1, FEM1B, DNAH3, SH3BP4, DOLK, FICD, SPEN, KIAA2013, SVEP1, DISP2, REV3L, FAT1, MSH6, EVPL, MOS.


the initial diversification of salamanders occurred within a relatively short window of time (Weisrock et al. 2005), the genealogical histories of individual gene loci may sometimes appear misleading in terms of the relationships among species due to incomplete lineage sorting. Unfortunately, the mitochondrial genome recorded such an incorrect history.
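The gene-subsampling analysis used to estimate how many loci are needed (fig. 5) can be outlined in code. The following Python sketch shows only the bookkeeping: drawing random gene subsets of a given size and finding the smallest size whose mean support reaches a threshold. The tree inference and bootstrapping steps are not shown, and the function names, the fixed seed, and the locus names are hypothetical rather than taken from the study.

```python
import random


def sample_gene_subsets(genes, subset_size, n_replicates=30, seed=0):
    """Draw n_replicates random gene subsets, each without repeated genes."""
    rng = random.Random(seed)  # fixed seed so the draws are reproducible
    return [sorted(rng.sample(genes, subset_size)) for _ in range(n_replicates)]


def min_genes_for_threshold(mean_support_by_size, threshold=95.0):
    """Return the smallest subset size whose mean support reaches the threshold.

    mean_support_by_size maps a subset size to the mean bootstrap support (%)
    of the node of interest across the replicates of that size.
    """
    for size in sorted(mean_support_by_size):
        if mean_support_by_size[size] >= threshold:
            return size
    return None  # the threshold is never reached


# Hypothetical locus names; a real analysis would use the 30 NPCL alignments.
genes = [f"locus{i:02d}" for i in range(1, 31)]
subsets_of_five = sample_gene_subsets(genes, subset_size=5)
print(len(subsets_of_five), subsets_of_five[0])
```

Each subset would then be concatenated (or analyzed as separate gene trees for the species-tree approach), and the support for the node of interest averaged over the replicates of each size before applying the threshold.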

Discussion

The NPCL toolkit and experimental protocol introduced here provide a highly reliable, rapid, and cost-effective method for building medium-scale multilocus data to produce high-resolution phylogenetic relationships. This phylogenomic approach has the potential to accelerate the completion of many parts of the vertebrate tree of life because no further marker development, which is often the bottleneck in phylogenetic research, is required. Once a specific phylogenetic question within vertebrates arises, researchers simply need to check the list for our toolkit and look for NPCL markers with expected evolutionary rates and experimental performance for their groups of interest. Then, many orthologous loci can be quickly obtained by traditional PCR and Sanger sequencing, usually without time-consuming gel cutting and cloning. Applying the NPCL toolkit may also have another benefit for assembling the vertebrate tree of life, because people working on different groups can easily use the same set of loci, which will facilitate combined analyses.

Merits of the Toolkit

Because of the use of the nested PCR strategy outlined earlier, most NPCLs in the toolkit work for all major jawed vertebrate groups with high experimental success rates (normally > 95%). Such results were achieved in unified PCR conditions without any extra effort involving cycling condition optimization. This feature of the toolkit enables it to surpass previously developed nuclear marker sets (Murphy et al. 2001; Li et al. 2007; Thomson et al. 2008; Townsend et al. 2008; Wright et al. 2008; Portik et al. 2011; Shen et al. 2011; Zhou et al. 2011). Most previous nuclear marker sets were developed for specific animal groups, and their application to

Table 1. Summary Information for the 30 NPCLs Amplified in 19 Salamander Taxa.

Gene      Length  Taxa           PCR Products   GC%  Var. Sites  PI Sites  Overall Mean  RCV
          (bp)    Amplified (%)  Directly            (%)         (%)       P Distance
                                 Sequenced (%)
BPTF      552     19 (100)       19 (100)       43   163 (30)    118 (21)  0.098         0.093
CAND1     1,155   19 (100)       17 (89)        44   403 (35)    314 (27)  0.116         0.065
DET1      711     19 (100)       18 (95)        46   275 (39)    216 (30)  0.131         0.091
DISP1     960     19 (100)       19 (100)       41   317 (33)    211 (22)  0.096         0.072
DNAH3     918     19 (100)       19 (100)       42   389 (42)    304 (33)  0.139         0.049
DOLK      672     16 (84)        12 (75)        52   316 (47)    236 (35)  0.173         0.126
DSEL      1,266   19 (100)       19 (100)       44   546 (43)    415 (33)  0.148         0.055
ENC1      1,083   19 (100)       19 (100)       51   363 (34)    279 (26)  0.120         0.057
EXTL3     1,245   19 (100)       17 (89)        47   465 (37)    322 (26)  0.118         0.067
FAT4      738     19 (100)       19 (100)       45   344 (47)    249 (34)  0.156         0.072
FICD      510     18 (95)        18 (100)       44   169 (33)    124 (24)  0.111         0.074
GRM2      690     18 (95)        18 (100)       54   240 (35)    176 (26)  0.115         0.118
HYP       1,260   19 (100)       19 (100)       47   516 (41)    359 (28)  0.122         0.049
KBTBD2    1,059   19 (100)       19 (100)       44   406 (38)    246 (23)  0.103         0.040
KCNF1     735     19 (100)       19 (100)       52   294 (40)    220 (30)  0.151         0.153
KIAA1239  1,362   19 (100)       19 (100)       42   479 (35)    338 (25)  0.110         0.048
KIAA2013  516     19 (100)       19 (100)       52   221 (43)    178 (34)  0.148         0.093
LIG4      1,017   19 (100)       19 (100)       39   434 (43)    301 (30)  0.137         0.043
LPHN2     573     19 (100)       19 (100)       47   192 (34)    140 (24)  0.106         0.108
LRRN1     837     19 (100)       18 (95)        49   345 (41)    240 (29)  0.130         0.119
MGAT4C    747     18 (95)        18 (100)       42   273 (37)    212 (28)  0.126         0.079
MIOS      879     19 (100)       19 (100)       45   291 (33)    213 (24)  0.107         0.065
PANX2     717     19 (100)       19 (100)       44   254 (35)    199 (28)  0.125         0.059
PDP1      1,035   19 (100)       19 (100)       45   348 (34)    261 (25)  0.110         0.080
PPL       1,338   19 (100)       17 (89)        47   645 (48)    485 (36)  0.156         0.064
RAG1      1,380   19 (100)       19 (100)       51   550 (40)    438 (32)  0.146         0.072
RAG2      792     19 (100)       18 (95)        49   406 (51)    310 (39)  0.184         0.090
SACS      1,101   19 (100)       19 (100)       40   383 (35)    282 (26)  0.105         0.040
TTN       984     19 (100)       5 (26)         43   378 (38)    269 (27)  0.125         0.052
ZBED4     1,002   19 (100)       19 (100)       39   325 (32)    224 (22)  0.093         0.040

NOTE: Length, length of refined alignment; Var. sites, variable sites; PI sites, parsimony informative sites; RCV, relative composition variability.
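Three of the per-locus statistics in table 1 (variable sites, parsimony informative sites, and overall mean p distance) can be computed directly from an alignment. The following Python sketch uses the standard definitions and assumes a gap-free alignment with no ambiguity codes; the function name and the toy alignment are hypothetical, and this is not necessarily the software or the settings used to build table 1.

```python
from collections import Counter
from itertools import combinations


def site_stats(alignment):
    """Return (variable sites, parsimony informative sites, mean p distance).

    alignment: list of equal-length sequences (strings).
    Assumes no gaps or ambiguity codes, to keep the example short.
    """
    length = len(alignment[0])
    variable = informative = 0
    for column in zip(*alignment):
        counts = Counter(column)
        if len(counts) > 1:
            variable += 1
            # Informative: at least two states, each seen in at least two taxa.
            if sum(1 for n in counts.values() if n >= 2) >= 2:
                informative += 1
    # p distance: proportion of differing sites, averaged over all pairs.
    pairs = list(combinations(alignment, 2))
    mean_p = sum(
        sum(a != b for a, b in zip(s, t)) / length for s, t in pairs
    ) / len(pairs)
    return variable, informative, mean_p


# Toy alignment for illustration only (not data from the study).
toy = ["ACGTAC", "ACGTTC", "ACCTTC", "ACCTAG"]
n_var, n_pi, mean_p = site_stats(toy)
print(n_var, n_pi, round(mean_p, 3))
```

Dividing the first two counts by the alignment length gives the percentages reported in parentheses in table 1.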


FIG. 4. Higher-level phylogenetic relationships of 10 salamander families inferred from 30 NPCL markers. The tree was inferred by concatenation analyses using ML, BI, and the mixture model (CAT) and by species-tree analysis using the pseudo-ML approach (MP-EST). Branch support values are indicated beside nodes in order of ML bootstrap (BPML), BI posterior probability (PPBAY), CAT posterior probability (PPCAT), and MP-EST bootstrap (BPMP-EST) from left to right. The filled squares represent BPML > 95%, PPBAY = 1.0, PPCAT = 1.0, and BPMP-EST > 95%. The circled number refers to the node of interest studied in figure 5. Branch lengths are from the ML analysis.

Figure labels: 30 nuclear genes (total 27,834 bp); scale bar, 0.1 substitutions/site; support values printed at two nodes, 99/1.0/1.0/83 and 99/1.0/1.0/74. Tip labels: Aneides hardii, Plethodon jordani, Batrachoseps major, Eurycea bislineata, Amphiuma means, Rhyacotriton variegatus, Proteus anguinus, Necturus beyeri, Tylototriton asperrimus, Cynops orientalis, Salamandra salamandra, Dicamptodon aterrimus, Ambystoma mexicanum, Pseudobranchus axanthus, Siren intermedia, Ranodon sibiricus, Batrachuperus yenyuanensis, Onychodactylus fischeri, Andrias davidianus, Silurana tropicalis, Bombina fortinuptialis, Typhlonectes natans, Ichthyophis bannanicus, Gallus gallus, Homo sapiens, Chrysemys picta bellii, Mus musculus, and Latimeria chalumnae. Clade labels: Plethodontidae, Amphiumidae, Rhyacotritonidae, Proteidae, Salamandridae, Dicamptodontidae, Ambystomatidae, Sirenidae, Hynobiidae, Cryptobranchidae; Salamandroidea, Cryptobranchoidea; Anura, Gymnophiona, and non-amphibian outgroup.

Table 2. Statistical Confidence (P Values) for Alternative Branching Hypotheses Based on the 30-Gene Data Set.

                                                                           P Value
Alternative Topology Tested                                      Ln L   AU      BP     KH     Rejection
Best ML                                                          0      0.993   0.97   0.98
Sirenidae branched earlier                                       328    0.025   0.015  0.02   + + +
Sirenidae is sister to Cryptobranchoidea                         423    0.004   0.002  0.004  + + +
Gymnophiona is sister to Anura (monophyletic lissamphibians)     343    0.013   0.012  0.013  + + +
Gymnophiona is sister to Caudata (monophyletic lissamphibians)   439    0.002   0.001  0.001  + + +
Anura is sister to Amniota (paraphyletic lissamphibians)         1446   5E-30   0      0      + + +
Gymnophiona is sister to Amniota (paraphyletic lissamphibians)   1290   1E-69   0      0      + + +
Caudata is sister to Amniota (paraphyletic lissamphibians)       1728   0.0001  0      0      + + +


other distantly related groups is usually difficult. For example, Spinks et al. (2010) collected 120 nuclear markers from avian, squamate, and mammalian phylogenetic studies and evaluated their PCR performance in turtles. They found that only eight nuclear markers successfully produced single expected bands across 13 tested turtle species. In another case, Fong and Fujita (2011) developed 75 nuclear markers for vertebrate phylogenetics, but approximately 60% of the target fragments could not be obtained in three test species (two reptiles and one lissamphibian). Therefore, although the nested PCR method introduced here requires an additional PCR reaction, the extra work is still worthwhile.

In PCR-based phylogenetic projects, even when the PCR reactions are successful, the products often contain significant nonspecific amplicons. Such a condition requires additional effort involving gel purification and cloning, which takes much more time than the PCR reaction itself. Our NPCL toolkit is specifically designed to solve this problem, so that normally over 90% of PCR reactions produce strong and single expected bands. Moreover, most of the primers used to date for nuclear marker sets are degenerate and thus are not suitable for directly sequencing PCR products. Benefiting from the use of our nested PCR strategy, we introduce anchoring sequences to the ends of PCR fragments while maintaining PCR efficiency. Such introduced anchoring sequences bring the added benefit that two specific sequencing primers (Seq_F and Seq_R) can be used in all Sanger sequencing reactions.

One additional feature of our NPCL toolkit is that the average length of the NPCLs within it is 1,050 bp, a length that can easily be amplified in one PCR reaction and sequenced in both directions to allow efficient use of resources. In contrast, the average marker lengths are 920 bp for 10 NPCLs in Li et al. (2007), 760 bp for 26 NPCLs in Townsend et al. (2008), 873 bp for 22 NPCLs in Shen et al. (2011), and 470 bp for 75 NPCLs in Fong and Fujita (2011). Longer markers will provide more sites than shorter ones for equivalent money and time. This feature makes our NPCL toolkit more cost-effective than previously developed nuclear marker sets.

Phylogenetic Utility

The vertebrate NPCL toolkit we developed here shows great promise in terms of phylogenetic utility. A remarkable feature of our NPCL toolkit is that it provides 102 NPCLs with a broad range of evolutionary rates. In the case of our demonstration, we used 30 NPCLs to resolve a family-level salamander phylogeny using both traditional concatenation analyses and a more promising species-tree analysis. However, this example does not mean that our toolkit performs well only on deep-timescale questions. Our ongoing study using this toolkit to resolve the relationships within Plethodontidae, a rapidly radiating group of salamanders, suggests that the toolkit developed here also performs well in resolving species-level phylogenies. For many vertebrate groups in which applicable nuclear markers are limited, such as some teleosts, frogs, and salamanders, our NPCL toolkit can provide a one-stop solution for phylogenetic studies from the family level to the species level. Even for those groups in which specific nuclear marker sets have been developed, our toolkit is still worth trying, as many more loci can be easily obtained that may resolve some difficult branches.

The Toolkit Is a Good Addition to Sequence Capture Approaches

Recently, sequence capture approaches have been applied to vertebrate phylogenomics (Crawford et al. 2012; Faircloth et al. 2012; Lemmon et al. 2012; McCormack et al. 2012). These approaches begin with the selective capture of genomic regions. Briefly, fragmented gDNA is hybridized to DNA or RNA probes, either on an array or in solution. Nontargeted DNA is then washed away, and the targeted DNA is sequenced through NGS. The most promising feature of the sequence capture approach is that it can simultaneously produce hundreds to thousands of loci for tens of individuals within a relatively short time. Therefore, the sequence capture approach is considered to be much more cost-effective than the PCR-based method. According to the calculation of Lemmon et al. (2012), for a 100 taxa × 500 loci project, the cost of the sequence capture method is just 1–3.5% of that of the PCR-based method.

However, the sequence capture approach is currently too challenging for most phylogenetic researchers. Typical NGS runs (454 or Illumina) used by the sequence capture method generate 1,000,000–2,000,000,000 sequences. Storing and processing these NGS data require significant computer memory, hardware upgrades, and bioinformatic programming skills, which are unfamiliar to most phylogenetic researchers. Moreover, phylogenetic reconstruction assumes that orthologous genes are being analyzed across species. For the PCR-based method, the detection of paralogous genes is relatively straightforward. However, in the sequence capture method, the captured genomic regions comprise short conserved cores (probe regions) and long unconserved flanking sequences. Because paralogy cannot be detected until after the data are aligned, those unalignable sequences will make the detection of paralogy more difficult.

FIG. 5. The effect of increasing the number of nuclear loci on resolving the basal split within salamanders. The plot shows bootstrap support (%) against the number of genes (5–30) for concatenation analyses and species-tree analyses. Each data point represents the mean of support values estimated from 30 randomly sampled subsets. The dashed line indicates the threshold of 95% bootstrap support. The plots show that the minimum number of nuclear loci needed to robustly resolve the basal split within salamanders is 25.
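The subsampling design summarized in the figure 5 caption (30 randomly sampled subsets of loci for each number of genes) can be sketched as follows. This is only an illustrative sketch: the function name `draw_locus_subsets`, the locus labels, and the random seed are assumptions, and the tree inference and support calculation for each subset are not shown.

```python
import random

def draw_locus_subsets(loci, subset_size, n_replicates=30, seed=0):
    """Return n_replicates random subsets of `loci`, each with subset_size loci.

    Loci are sampled without replacement within a subset; different
    subsets are drawn independently, so they may overlap.
    """
    if not 0 < subset_size <= len(loci):
        raise ValueError("subset_size must be between 1 and the number of loci")
    rng = random.Random(seed)  # fixed seed (an assumption) for reproducibility
    return [sorted(rng.sample(loci, subset_size)) for _ in range(n_replicates)]

# Hypothetical labels standing in for the 30 NPCLs of the salamander data set.
loci = [f"NPCL{i:02d}" for i in range(1, 31)]

# Gene numbers on the x-axis of figure 5: 5, 10, ..., 30.
design = {k: draw_locus_subsets(loci, k) for k in range(5, 31, 5)}
```

Each subset would then be analyzed separately, and the support values for the node of interest averaged over the 30 replicates of each size.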

In fact, not every phylogenetic project will use more than 500 loci, as the sequence capture method normally does. Based on both empirical and simulation data, 20–50 loci are generally sufficient to answer many phylogenetic questions (Rokas et al. 2003; Spinks et al. 2009). This is also the number of loci that most phylogenetic studies use. In such a situation, adopting the sequence capture method is not cost-effective, because researchers need to use relatively expensive NGS and spend time learning new experimental techniques and carrying out sophisticated bioinformatic processing. Our NPCL toolkit is specially designed for such medium-scale phylogenetic projects using approximately 50 loci, a number that is easily met with our 102 NPCLs. Because more than 90% of the PCR reactions generated by our toolkit can be directly sequenced, the average cost for one locus per sample is rather low. In our laboratory, generating one new sequence typically costs US$3 (without considering labor).
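As a rough illustration of what the US$3 per sequence figure implies for a project budget, the following sketch multiplies the number of sequences still to be generated by a flat per-sequence cost. The function name is hypothetical, and the flat cost and the assumption that every reaction yields one usable sequence are simplifications; labor is excluded, as in the text. The example uses the salamander case reported in this article (30 markers for 28 taxa, of which 201 sequences were already available in public databases).

```python
def sanger_cost_usd(n_taxa, n_loci, n_existing=0, cost_per_sequence=3.0):
    """Estimate consumable cost (US$) for the sequences still to be generated.

    Assumes a flat cost per new sequence and one sequence per taxon per locus.
    """
    n_new = n_taxa * n_loci - n_existing
    if n_new < 0:
        raise ValueError("n_existing exceeds the number of target sequences")
    return n_new * cost_per_sequence

# 28 taxa x 30 loci = 840 targets; 201 already public, so 639 new sequences.
salamander_cost = sanger_cost_usd(n_taxa=28, n_loci=30, n_existing=201)
print(salamander_cost)  # 1917.0
```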

In addition, researchers sometimes have only tiny amounts of DNA but wish to perform a multilocus phylogenetic analysis. In such a situation, the sequence capture method is difficult to implement because it normally requires DNA at the microgram level (Lemmon et al. 2012). Our NPCL toolkit can fill this gap. Benefiting from the nested PCR strategy, the sensitivity of PCR reactions in our method is extremely high. In many test experiments in our laboratory, the toolkit and protocol could produce target bands with only 5–10 ng of DNA.

Our NPCL toolkit is an alternative to the sequence capture method for the everyday work of phylogenetic researchers. Which method to choose depends on two major drivers: the amount of DNA and the expected number of loci. When DNA is limited, the better solution may be PCR; otherwise, sequence capture also works. Taking into account the money and time the two methods require, we speculate that the economic transition point from PCR to sequence capture is at approximately 100 loci. That assessment is why our toolkit includes 102 NPCL markers. Our proposal is that when using ≤100 loci, one can try our NPCL toolkit; when using >100 loci, sequence capture should be used.

Future Directions

In this study, we used multiple genome alignments deposited in the University of California–Santa Cruz (UCSC) genome browser to identify long and conserved exons across jawed vertebrates. Benefiting from the use of a nested PCR strategy, the experimental performance of the developed NPCLs indicated that they are highly stable in all major jawed vertebrate groups. Recently, a database for mining exon and intron markers, called EvolMarkers, has been built by Li et al. (2012). Careful investigation of this database may identify many conserved exons within nonvertebrates, whose interrelationships are currently more problematic than those of vertebrates. Because the nonvertebrates constitute many distantly related groups, it may be impossible to develop a single set of PCR primers for all nonvertebrates. However, following a similar marker development strategy, multiple NPCL toolkits could be constructed for various groups of nonvertebrates, such as arthropods, echinoderms, and molluscs. In addition, because introns are flanked by conserved exons, the idea of using nested PCRs for marker development could also be applied to the development of EPIC (exon-primed intron crossing) markers, which are more suitable for shallow-scale phylogenetic or phylogeographic projects.

Despite the benefits of our proposed method, it must be recognized that when handling large-scale projects, such as 200 taxa × 100 loci, the use of our toolkit and Sanger sequencing will still require significant cost, time, and labor. An alternative solution is to use NGS to replace Sanger sequencing. Recently, 454 NGS technology has been applied to sequence targeted gene regions from a pool of PCR products from different specimens (Binladen et al. 2007; Meyer et al. 2008). In such experiments, specific tagging sequences must be added to amplicons by either PCR (Binladen et al. 2007) or blunt-end ligation (Meyer et al. 2008). Therefore, if the tailing sequences of the second-round PCR primers in our NPCL toolkit are replaced by tagging sequences (for tag design, see Faircloth and Glenn 2012), all PCR products can be pooled together and sequenced with 454 NGS, which will greatly reduce the money and time cost compared with Sanger sequencing. However, parallel tagged sequencing via NGS does not circumvent the process of PCR for each individual at each locus, which may be the most onerous part of a large-scale phylogenomic project. Some promising new technologies may help to solve this problem, such as microdroplet PCR (Tewhey et al. 2009), where millions of individual PCR reactions are performed in picoliter-scale droplets simultaneously, and the 96.96 Dynamic Array by Fluidigm, which allows 96 primer combinations to be used on 96 samples (9,216 total PCR reactions) on a single PCR plate. However, there has been little research on applying NGS and new high-throughput PCR technologies to phylogenomics, so their ease of use and cost-effectiveness still need to be explored.
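To give a sense of scale for the 96 × 96 array format mentioned above, the following sketch counts how many such arrays a project would need. It is a back-of-the-envelope illustration only: the function name is hypothetical, and it assumes one primer combination per locus and that samples and primer combinations are tiled in blocks of 96, which ignores the two rounds of a nested PCR.

```python
import math

def arrays_needed(n_samples, n_primer_pairs, per_axis=96):
    """Number of per_axis x per_axis arrays needed to cover all combinations."""
    sample_blocks = math.ceil(n_samples / per_axis)
    primer_blocks = math.ceil(n_primer_pairs / per_axis)
    return sample_blocks * primer_blocks

reactions_per_array = 96 * 96          # 9216, as stated in the text
project_reactions = 200 * 100          # the 200 taxa x 100 loci example
n_arrays = arrays_needed(200, 100)     # 3 sample blocks x 2 primer blocks
print(reactions_per_array, project_reactions, n_arrays)
```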

Summary

In conclusion, we have developed an improved method for rapidly amplifying and sequencing NPCLs that has proven to be useful and effective for molecular phylogenetic studies of vertebrates. The newly developed toolkit provides an attractive alternative to available methods for vertebrate phylogenomics.

Materials and Methods

Development of NPCL and Primer Design

Our previous study showed that nested PCR is overwhelmingly more effective than conventional PCR for obtaining target amplicons from complex genomic environments (Shen et al. 2012). However, nested PCR requires four conserved regions to design two pairs of primers (illustrated in fig. 6; yellow blocks represent the conserved regions used for primer design), which means that only relatively long exons are suitable as candidates for NPCL development with the nested PCR method. To search for long and conserved exons, we took advantage of our previous bioinformatic method, which used the multiple genome alignment data from the UCSC Genome Browser to identify conserved exons (Shen et al. 2011). Because the NPCL markers are to be used in vertebrates, we focused only on those multiple genome alignments that include at least six species: Danio rerio (zebrafish), Silurana tropicalis (frog), Anolis carolinensis (lizard), Gallus gallus (chicken), Mus musculus (mouse), and Homo sapiens (human). The alignments of candidate exons had to meet two criteria: length of more than 700 bp and pairwise similarity ranging from 35% to 90%. The detailed bioinformatic pipeline has been described elsewhere (Shen et al. 2011). In addition to using multiple genome alignments to screen NPCL candidates, we also manually searched the ENSEMBL database for nuclear genes that were used previously (Murphy et al. 2001; Li et al. 2007; Townsend et al. 2008; Wright et al. 2008; Zhou et al. 2011; Song et al. 2012) to check whether they contain large and appropriately conserved exons.
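The two screening criteria stated above (alignment length of more than 700 bp and pairwise similarity within a fixed range) can be illustrated with a minimal sketch. The actual pipeline is the one described in Shen et al. (2011); here, as an assumption, "pairwise similarity" is taken to be percent identity over gap-free columns, and the function names and default thresholds are only illustrative.

```python
from itertools import combinations

def percent_identity(seq_a, seq_b):
    """Percent identity over columns where neither sequence has a gap."""
    compared = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper())
                if a != "-" and b != "-"]
    if not compared:
        return 0.0
    return 100.0 * sum(a == b for a, b in compared) / len(compared)

def passes_screen(alignment, min_length=700, min_similarity=35.0,
                  max_similarity=90.0):
    """Check an exon alignment (dict: species -> aligned sequence)."""
    sequences = list(alignment.values())
    if len(sequences) < 2:
        return False
    if len(sequences[0]) <= min_length:   # must be longer than 700 bp
        return False
    return all(min_similarity <= percent_identity(a, b) <= max_similarity
               for a, b in combinations(sequences, 2))
```

An alignment that is too conserved (every pair above the upper bound) is rejected as well as one that is too divergent, which matches the stated aim of finding exons that are conserved enough for primer design but still informative.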

As a result, we assembled a total of 305 NPCL candidate alignments, of which 120 contained the appropriate number of conserved blocks, and used these to design nested PCR primers. To increase the success rates of our NPCL markers in amniotes, we manually added a turtle sequence (Chrysemys picta bellii) to each of the candidate alignments using data downloaded from the ENSEMBL database. A total of 480 primers were designed for the 120 NPCL candidates. Briefly, the first-round PCR primers are used only to enrich target regions from genomic environments and not to obtain target amplicons, so the degeneracy of these primers is normally high to increase reaction sensitivity; the second-round PCR primers are used to obtain target amplicons, so the degeneracy of these primers is lower to increase reaction specificity. Our previous study showed that the nested PCR method often produces strong and single amplicon bands (Shen et al. 2012). To facilitate the subsequent direct sequencing, we added a tail (5′-AGGGTTTTCCCAGTCACGAC-3′) to the 5′-end of all second-round forward primers and a tail (5′-AGATAACAATTTCACACAGG-3′) to the 5′-end of all second-round reverse primers. These tail sequences provide two unique anchoring sites for direct sequencing from cleaned PCR products. In our pilot experiments, adding the tail sequences to primers did not affect the efficiency of the second-round PCR.

FIG. 6. Schematic representation of the experimental protocol for using our NPCL toolkit. Note that, for each NPCL, nested PCR primers are designed on four short conserved blocks flanking the target region. The laboratory protocol shown in the figure comprises three steps:
(i) First PCR with primers F1 and R1, using gDNA as template. The target region is enriched from the complex genomic environment with one pair of highly degenerate primers. PCR is performed with 50–100 ng DNA in a 25 μl reaction. Cycling conditions: an initial denaturation step of 4 min at 94 °C, followed by 35 cycles of 94 °C for 45 s, 45 °C for 40 s, and 72 °C for 2 min, and a final extension at 72 °C for 10 min.
(ii) Second PCR with tailed primers F2 and R2, using the first-round PCR product as template. The target region is specifically amplified with one pair of tailed, low-degeneracy primers. PCR is performed with 1 μl of first-round PCR product in a 25 μl reaction. Cycling conditions: an initial denaturation step of 4 min at 94 °C, followed by 35 cycles of 94 °C for 45 s, 50 °C for 40 s, and 72 °C for 90 s, and a final extension at 72 °C for 10 min.
(iii) PCR evaluation and sequencing. Agarose gel electrophoresis results are evaluated. If the product is a single and strong band (normally >90% of reactions), 25 μl of PCR product is cleaned with 2 U ExoI and 0.4 U FastAP (cleanup conditions: 37 °C for 30 min, then 80 °C for 15 min) and sequenced directly with the general sequencing primers Seq_F and Seq_R; a Sanger sequencing reaction is performed with 0.5 μl BigDye and 1 μl cleaned PCR product. Otherwise, the product is processed by gel cutting or cloning before sequencing.
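The two primer properties discussed above, degeneracy and the added 5′ tails, can be made concrete with a short sketch. The tail sequences are those given in the text; the function names and the example core primer are hypothetical and are not taken from supplementary table S1. Degeneracy is computed here as the number of distinct sequences represented by a primer written in IUPAC codes.

```python
from math import prod

# Number of bases represented by each IUPAC nucleotide code.
IUPAC_DEGENERACY = {
    "A": 1, "C": 1, "G": 1, "T": 1,
    "R": 2, "Y": 2, "S": 2, "W": 2, "K": 2, "M": 2,
    "B": 3, "D": 3, "H": 3, "V": 3, "N": 4,
}

FORWARD_TAIL = "AGGGTTTTCCCAGTCACGAC"  # tail of second-round forward primers
REVERSE_TAIL = "AGATAACAATTTCACACAGG"  # tail of second-round reverse primers

def degeneracy(primer):
    """Number of distinct sequences encoded by a degenerate primer."""
    return prod(IUPAC_DEGENERACY[base] for base in primer.upper())

def add_tail(core_primer, direction):
    """Prepend the forward ('F') or reverse ('R') tail to the 5' end."""
    tails = {"F": FORWARD_TAIL, "R": REVERSE_TAIL}
    if direction not in tails:
        raise ValueError("direction must be 'F' or 'R'")
    return tails[direction] + core_primer.upper()

core = "ATGGCNTAYR"              # hypothetical core primer (N, Y, R degenerate)
tailed = add_tail(core, "F")
print(degeneracy(core), tailed)  # the tail adds no degeneracy
```

Because the tails contain no degenerate positions, the sequencing primers Seq_F and Seq_R, which are identical to the tails, anneal to a single defined site on every product.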

Experimental Testing for Candidate Markers in 16 Jawed Vertebrates

To test the experimental performance of our newly designed NPCL markers, we selected 16 taxa representing nine major jawed vertebrate lineages: Chondrichthyes (Sphyrna lewini), Actinopterygii (Lepisosteus oculatus and Pangasius sutchi), Dipnoi (Protopterus annectens), Lissamphibia (Ichthyophis bannanicus, Batrachuperus yenyuanensis, and Rana nigromaculata), Mammalia (Mus musculus and Sus scrofa domestica), Testudines (Trionyx sinensis and Podocnemis unifilis), Aves (Struthio camelus and Zosterops japonica), Crocodylia (Crocodylus siamensis), and Squamata (Hemidactylus bowringii and Naja naja atra). Total genomic DNA was extracted from ethanol-preserved tissues (liver or muscle) using the standard salt extraction protocol. All extracted genomic DNAs were diluted to a concentration of 50 ng μl−1 with 1× TE and stored at −20 °C before PCR amplification.

All 120 NPCL markers were tested with a two-round PCR strategy (nested PCR). The first-round PCR was performed in a 25 μl reaction containing 1–2 μl template DNA (50–100 ng), with final concentrations of 1× PCR buffer, 200 μM dNTP, 400 nM of each forward and reverse first-round primer, and 1.25 U Taq polymerase (TransTaq High Fidelity, TransGen, Beijing). The cycling conditions of the first-round PCR were as follows: an initial denaturation step of 4 min at 94 °C, followed by 35 cycles of a 45 s denaturation at 94 °C, a 40 s annealing at 45 °C, and a 2 min extension at 72 °C, followed by a final 10 min extension at 72 °C. The second-round PCR was also performed in a 25 μl reaction, containing 1 μl of the first-round PCR product (without dilution) and final concentrations of 1× PCR buffer, 200 μM dNTP, 400 nM of each forward and reverse second-round primer, and 1.25 U Taq polymerase. The cycling conditions of the second-round PCR were as follows: an initial denaturation step of 4 min at 94 °C, followed by 35 cycles of a 45 s denaturation at 94 °C, a 40 s annealing at 50 °C, and a 90 s extension at 72 °C, followed by a final 10 min extension at 72 °C.
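The two cycling programs described above can be written down as simple data structures, which also makes it easy to estimate the programmed hold time of each round. This is a sketch under stated assumptions: the dictionary layout and function name are hypothetical, and ramping time between temperatures, which depends on the thermocycler, is ignored.

```python
def program_seconds(program):
    """Total programmed hold time (s), excluding temperature ramping."""
    per_cycle = sum(seconds for _, seconds in program["cycle_steps"])
    return (program["initial_denaturation_s"]
            + program["cycles"] * per_cycle
            + program["final_extension_s"])

# (temperature in degrees C, hold time in seconds) for each cycle step.
FIRST_ROUND = {
    "initial_denaturation_s": 4 * 60,
    "cycles": 35,
    "cycle_steps": [(94, 45), (45, 40), (72, 120)],
    "final_extension_s": 10 * 60,
}
SECOND_ROUND = {
    "initial_denaturation_s": 4 * 60,
    "cycles": 35,
    "cycle_steps": [(94, 45), (50, 40), (72, 90)],
    "final_extension_s": 10 * 60,
}

first_s = program_seconds(FIRST_ROUND)     # 8015 s
second_s = program_seconds(SECOND_ROUND)   # 6965 s
print(first_s, second_s, round((first_s + second_s) / 3600, 2))
```

The only differences between the rounds are the annealing temperature (45 °C versus 50 °C) and the extension time (2 min versus 90 s); together the two programs amount to a little over 4 h of hold time.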

One microliter of the second-round PCR products was analyzed on a 1.0% TAE agarose gel. An NPCL marker was considered successful if more than 8 of the 16 tested taxa produced target amplicon bands. On this basis, 102 out of 120 tested NPCL markers were successful. The nested-PCR primers for the 102 NPCL markers can be found in supplementary table S1, Supplementary Material online. If the PCR products contained significant nonspecific amplicon bands (normally <10%), they needed further processing, for example, standard gel cutting or cloning. If the PCR reactions produced single amplicon bands (normally >90%), they were cleaned with ExoFAP treatment: 2 U ExoI and 0.4 U FastAP (all Fermentas) were added to the PCR tube and incubated for 30 min at 37 °C and 15 min at 80 °C. The cleaned PCR reactions can be directly used as templates for Sanger sequencing. According to our experimental design, all PCR fragments can be sequenced from both ends with the two universal sequencing primers Seq_F (5′-AGGGTTTTCCCAGTCACGAC-3′) and Seq_R (5′-AGATAACAATTTCACACAGG-3′). A typical Sanger sequencing reaction in our laboratory consumes 0.5 μl BigDye and 1 μl cleaned PCR product. The primer design strategy, the laboratory protocol for the nested PCR method, and the pretreatment of PCR products before Sanger sequencing are illustrated in figure 6.
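The success criterion used for the markers (target bands in more than 8 of the 16 tested taxa) is simple enough to state as code. The sketch below is illustrative only; the function names and the gel-scoring dictionaries are hypothetical.

```python
def marker_is_successful(band_by_taxon, more_than=8):
    """True if more than `more_than` taxa produced a target amplicon band."""
    return sum(1 for has_band in band_by_taxon.values() if has_band) > more_than

def percent_successful(n_successful, n_tested):
    """Percentage of tested markers that met the success criterion."""
    return 100.0 * n_successful / n_tested

# Hypothetical gel scores for one marker: 9 of 16 taxa with a target band.
example_marker = {f"taxon_{i:02d}": i <= 9 for i in range(1, 17)}
# A marker with exactly 8 bands does not meet the "more than 8" criterion.
borderline_marker = {f"taxon_{i:02d}": i <= 8 for i in range(1, 17)}

print(marker_is_successful(example_marker),
      marker_is_successful(borderline_marker),
      percent_successful(102, 120))
```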

Calculation of Relative Evolutionary Rate of 102 NPCLs

The rate multipliers (m) across partitions estimated in MrBayes 3.2 (Ronquist et al. 2012) are used as relative evolutionary rates. To calculate these parameters, alignments for each NPCL were prepared for 12 species: Homo sapiens, Macaca mulatta, Mus musculus, Rattus norvegicus, Gallus gallus, Meleagris gallopavo, Chrysemys picta bellii, Anolis carolinensis, Silurana tropicalis, Tetraodon nigroviridis, Takifugu rubripes, and Danio rerio. Because genome data are available for the 12 species, we did not generate any new data. The 102 NPCL alignments were then combined and subjected to MrBayes analyses partitioned by gene. Each gene was assigned a separate GTR + Γ + I model, and all model parameters were unlinked. Two Markov chain Monte Carlo (MCMC) runs were performed with one cold chain and three heated chains (temperature set to 0.1) for 50 million generations and sampled every 1,000 generations. The rate multiplier for each gene was estimated using Tracer version 1.4 after discarding the first 50% of the generations. All evolutionary rates were normalized by dividing by the maximum value of the obtained rates.
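The last two steps of this calculation, discarding the first half of the samples and dividing all rates by the maximum, can be sketched as follows. In the study the summaries were obtained with Tracer; the functions below are only a stand-in to show the arithmetic, and the gene names and sampled values are invented for illustration.

```python
def posterior_mean(samples, burnin_fraction=0.5):
    """Mean of the sampled values after discarding the initial burn-in."""
    kept = samples[int(len(samples) * burnin_fraction):]
    if not kept:
        raise ValueError("no samples left after burn-in")
    return sum(kept) / len(kept)

def normalize_rates(rates):
    """Scale rates so that the fastest gene has a relative rate of 1."""
    max_rate = max(rates.values())
    return {gene: rate / max_rate for gene, rate in rates.items()}

# Invented rate-multiplier traces for two hypothetical genes.
traces = {
    "gene_A": [0.9, 1.1, 0.5, 0.5],
    "gene_B": [2.0, 2.0, 2.0, 2.0],
}
mean_rates = {gene: posterior_mean(trace) for gene, trace in traces.items()}
relative_rates = normalize_rates(mean_rates)
print(relative_rates)
```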

Gene and Taxon Sampling for Investigating Higher Level Salamander Relationships

To test the utility of our NPCL toolkit in a real case, we selected 19 salamander taxa representing all 10 salamander families and 9 outgroup taxa to investigate the family-level relationships of salamanders (supplementary table S2, Supplementary Material online). For gene sampling, we randomly selected 30 NPCL markers whose PCR success rates were more than 90% in the 16 previously tested vertebrates. Among the 840 target sequences (30 markers for 28 taxa), 201 were available in public databases (NCBI, UCSC, and ENSEMBL), whereas the remaining 639 sequences needed to be generated de novo. The experimental procedure was as described earlier. All obtained sequences were examined by checking for the presence of premature stop codons (indicative of pseudogenes) and by BlastX searches against the nonredundant protein sequences (nr) database to confirm that they were our target genes. The NPCLs, species, and accession numbers for the newly obtained sequences are listed in supplementary table S2, Supplementary Material online.
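The premature stop codon check mentioned above can be illustrated with a minimal sketch. It assumes the standard nuclear genetic code, an ungapped sequence, and a known reading frame; because the NPCL fragments lie within exons, any in-frame stop codon is treated as suspicious. The function name and test sequences are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard nuclear genetic code

def stop_codon_positions(sequence, frame=0):
    """Return 0-based start positions of in-frame stop codons.

    `frame` (0, 1, or 2) is the offset of the first complete codon.
    """
    seq = sequence.upper()
    return [i for i in range(frame, len(seq) - 2, 3)
            if seq[i:i + 3] in STOP_CODONS]

# Sequence bookkeeping for the salamander data set described above.
n_target = 30 * 28            # markers x taxa
n_de_novo = n_target - 201    # sequences not available in public databases

print(stop_codon_positions("ATGGCCAAA"),   # no stop codon
      stop_codon_positions("ATGTAGAAA"),   # TAG at position 3
      n_target, n_de_novo)
```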

Phylogenetic Analyses

Alignments of all 30 NPCL markers were constructed according to their translated amino acid sequences using the G-INS-i method in MAFFT (Katoh et al. 2005) under the default settings and then refined by eye. All 30 refined alignments were combined into a concatenated data set (27,834 bp).
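Combining per-gene alignments into a concatenated data set can be sketched as follows. This is a generic illustration rather than the procedure used in the study: the function name is hypothetical, alignments are represented as plain dictionaries, and taxa missing from a gene are padded with "?" so that every row has the same length. The function also returns the 1-based column range of each gene, which is what a gene-wise partitioning scheme needs.

```python
def concatenate(alignments, missing="?"):
    """Concatenate gene alignments (list of dicts: taxon -> aligned sequence).

    Returns (supermatrix, boundaries), where boundaries holds the 1-based,
    inclusive (start, end) columns of each gene in the supermatrix.
    """
    taxa = sorted({taxon for aln in alignments for taxon in aln})
    parts = {taxon: [] for taxon in taxa}
    boundaries = []
    start = 1
    for aln in alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within a gene must have equal length")
        length = lengths.pop()
        for taxon in taxa:
            parts[taxon].append(aln.get(taxon, missing * length))
        boundaries.append((start, start + length - 1))
        start += length
    return {taxon: "".join(p) for taxon, p in parts.items()}, boundaries

# Toy example with two short "genes"; taxon names are placeholders.
genes = [{"A": "ATG", "B": "ATA"}, {"A": "CC", "C": "GG"}]
supermatrix, gene_ranges = concatenate(genes)
print(supermatrix, gene_ranges)
```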

For the concatenated data set, we manually defined five partitioning strategies: 2 partitions (one for codon positions 1 and 2 and one for codon position 3), 3 partitions (one partition for each codon position), 30 partitions (one partition for each gene), 60 partitions (one for codon positions 1 + 2 and one for codon position 3 in each of the 30 genes), and 90 partitions (one for each codon position in each of the 30 genes). Comparisons of the five partitioning strategies and selection of the corresponding nucleotide substitution models were conducted under the Bayesian information criterion implemented in PartitionFinder (Lanfear et al. 2012). The 3-partition scheme (one partition for each codon position) was chosen as the best-fitting partitioning strategy, and all 3 partitions favored the GTR + Γ + I model.
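The codon-position schemes described above can be written out as partition definitions. The sketch below generates lines in the style of a RAxML partition file (column ranges with a step of 3); the exact syntax should be checked against the RAxML manual, and the function and partition names are hypothetical. It assumes that the block starts at codon position 1 and spans whole codons, which for the concatenated data set requires every gene alignment to be in frame and a multiple of 3 in length.

```python
def codon_partitions(start, end, name, merge_first_two=False):
    """Partition lines for an in-frame block of codons.

    `start` and `end` are 1-based, inclusive alignment columns.
    """
    if (end - start + 1) % 3 != 0:
        raise ValueError("block length must be a multiple of 3")
    first, second, third = (f"{start + offset}-{end}\\3" for offset in range(3))
    if merge_first_two:
        # Codon positions 1 and 2 share one partition (the 2-partition scheme).
        return [f"DNA, {name}_pos12 = {first}, {second}",
                f"DNA, {name}_pos3 = {third}"]
    return [f"DNA, {name}_pos1 = {first}",
            f"DNA, {name}_pos2 = {second}",
            f"DNA, {name}_pos3 = {third}"]

# The 3-partition scheme over the 27,834-bp concatenated alignment.
three_partition_scheme = codon_partitions(1, 27834, "all")
print("\n".join(three_partition_scheme))
```

Applying the same function to each gene's column range would give the 60-partition and 90-partition schemes.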

The concatenated data set was analyzed separately with both ML and Bayesian inference (BI) methods under the 3-partition scheme. Partitioned ML analyses were implemented using RAxML version 7.2.6 (Stamatakis 2006), with the GTR + Γ + I model assigned to each partition. A search that combined 100 separate ML searches was applied to find the optimal tree, and branch support for each node was evaluated with 500 standard bootstrapping replicates (-f d -b 500 option) implemented in RAxML. The partitioned BI was conducted using MrBayes 3.2 (Ronquist et al. 2012). All model parameters were unlinked. Two MCMC runs were performed with one cold chain and three heated chains (temperature set to 0.1) for 60 million generations and sampled every 1,000 generations. Chain stationarity was visualized by plotting −ln L against the generation number using Tracer version 1.4, and the first 50% of generations were discarded. Topologies and posterior probabilities were estimated from the remaining generations. The two runs of each analysis were compared for congruence.

We also performed Bayesian phylogenetic analyses under the mixture model CAT + GTR + Γ4 in PhyloBayes 3.3 (Lartillot et al. 2009) with two independent MCMC runs. Each run was performed for 10,000 cycles and sampled every cycle. Stationarity was considered reached when the largest discrepancy (maxdiff) between the two independent runs was less than 0.1. The first 5,000 trees in each MCMC run were discarded. The remaining 10,000 trees of the two runs were sampled every 5 trees to generate a 50% majority-rule posterior consensus tree.
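The burn-in and thinning arithmetic in this paragraph can be checked with a few lines of code. This is only a bookkeeping sketch with a hypothetical function name; it does not read PhyloBayes output.

```python
def trees_in_consensus(cycles_per_run, burnin, n_runs, thin):
    """Number of trees left after burn-in removal, pooling, and thinning."""
    kept_per_run = cycles_per_run - burnin
    if kept_per_run < 0:
        raise ValueError("burn-in exceeds the number of sampled cycles")
    pooled = kept_per_run * n_runs
    return len(range(0, pooled, thin))  # every `thin`-th pooled tree

pooled_after_burnin = (10_000 - 5_000) * 2
n_consensus_trees = trees_in_consensus(10_000, 5_000, n_runs=2, thin=5)
print(pooled_after_burnin, n_consensus_trees)
```

Under these settings, 10,000 post-burn-in trees are pooled from the two runs and 2,000 of them contribute to the consensus.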

Species tree estimation was conducted using the pseudo-ML approach in the program MP-EST (Liu et al. 2010) under the coalescent model. Briefly, the gene trees for the 30 NPCL markers were reconstructed with ML under the GTR + Γ8 model using PHYML 3.0 (Guindon et al. 2010). The resulting 30 gene trees were rooted with an outgroup (Chrysemys picta bellii) and used to generate a species tree in MP-EST. The robustness of the species tree was evaluated with nonparametric bootstrapping of 500 replicates.

Alternative phylogenetic hypotheses were tested based on the 30-gene data set. We first calculated sitewise log likelihoods of the alternative trees using RAxML (-f g) under the GTR + Γ + I model. Then, the site log-likelihood file was used to estimate P values for each alternative tree in the CONSEL program (Shimodaira and Hasegawa 2001) using the Kishino–Hasegawa test (KH) (Kishino and Hasegawa 1989), the approximately unbiased test (AU) (Shimodaira 2002), and the RELL bootstrap proportion test (BP).

Supplementary Material

Supplementary tables S1 and S2 and figures S1 and S2 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).

Acknowledgments

The authors are grateful to David Wake and Carol Spencer of the Museum of Vertebrate Zoology at the University of California, Berkeley, for tissue loans. David Wake provided many useful comments on the manuscript. This work was supported by National Natural Science Foundation of China grants (31172075 and 30900136) and the National Science Fund for Excellent Young Scholars (no. pending).

References

Binladen J, Gilbert MTP, Bollback JP, Panitz F, Bendixen C, Nielsen R, Willerslev E. 2007. The use of coded PCR primers enables high-throughput sequencing of multiple homolog amplification products by 454 parallel sequencing. PLoS One. 2:e197.

Crawford NG, Faircloth BC, McCormack JE, Brumfield RT, Winker K, Glenn TC. 2012. More than 1000 ultraconserved elements provide evidence that turtles are the sister group to archosaurs. Biol Lett. 8:783–786.

Delsuc F, Brinkmann H, Philippe H. 2005. Phylogenomics and the reconstruction of the tree of life. Nat Rev Genet. 6:361–375.

Duellman WE, Trueb L. 1994. Biology of amphibians. Baltimore (MD): Johns Hopkins University Press.

Dunn CW, Hejnol A, Matus DQ, et al. (18 co-authors). 2008. Broad phylogenomic sampling improves resolution of the animal tree of life. Nature. 452:745–749.

Faircloth BC, Glenn TC. 2012. Not all sequence tags are created equal: designing and validating sequence identification tags robust to indels. PLoS One. 7:e42543.

Faircloth BC, McCormack JE, Crawford NG, Brumfield RT, Glenn TC. 2012. Ultraconserved elements anchor thousands of genetic markers for target enrichment spanning multiple evolutionary timescales. Syst Biol. 61:717–726.

Fong JJ, Brown JM, Fujita MK, Boussau B. 2012. A phylogenomic approach to vertebrate phylogeny supports a turtle-archosaur affinity and a possible paraphyletic Lissamphibia. PLoS One. 7:e48990.

Fong JJ, Fujita MK. 2011. Evaluating phylogenetic informativeness and data-type usage for new protein-coding genes across Vertebrata. Mol Phylogenet Evol. 61:300–307.

Frost DR, Grant T, Faivovich J, et al. (18 co-authors). 2006. The amphibian tree of life. Bull Am Mus Nat Hist. 297:8–370.


Gao KQ, Shubin NH. 2001. Late Jurassic salamanders from northern China. Nature. 410:574–577.

Guindon S, Dufayard J-F, Lefort V, Anisimova M, Hordijk W, Gascuel O. 2010. New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Syst Biol. 59:307–321.

Hugall AF, Foster R, Lee MSY. 2007. Calibration choice, rate smoothing, and the pattern of tetrapod diversification according to the long nuclear gene RAG-1. Syst Biol. 56:543–563.

Inoue JG, Miya M, Lam K, Tay BH, Danks JA, Bell J, Walker TI, Venkatesh B. 2010. Evolutionary origin and phylogeny of the modern holocephalans (Chondrichthyes: Chimaeriformes): a mitogenomic perspective. Mol Biol Evol. 27:2576–2586.

Katoh K, Kuma K, Toh H, Miyata T. 2005. MAFFT version 5: improvement in accuracy of multiple sequence alignment. Nucleic Acids Res. 33:511–518.

Kishino H, Hasegawa M. 1989. Evaluation of the maximum likelihood estimate of the evolutionary tree topologies from DNA sequence data, and the branching order in Hominoidea. J Mol Evol. 29:170–179.

Künstner A, Wolf JBW, Backström N, et al. (13 co-authors). 2010. Comparative genomics based on massive parallel transcriptome sequencing reveals patterns of substitution and selection across 10 bird species. Mol Ecol. 19:266–276.

Lanfear R, Calcott B, Ho SYW, Guindon S. 2012. PartitionFinder: combined selection of partitioning schemes and substitution models for phylogenetic analyses. Mol Biol Evol. 29:1695–1701.

Lartillot N, Lepage T, Blanquart S. 2009. PhyloBayes 3: a Bayesian software package for phylogenetic reconstruction and molecular dating. Bioinformatics. 25:2286–2288.

Lemmon AR, Brown JM, Stanger-Hall K, Lemmon EM. 2009. The effect of ambiguous data on phylogenetic estimates obtained by maximum likelihood and Bayesian inference. Syst Biol. 58:130–145.

Lemmon AR, Emme SA, Lemmon EM. 2012. Anchored hybrid enrichment for massively high-throughput phylogenomics. Syst Biol. 61:727–744.

Li C, Ortí G, Zhang G, Lu G. 2007. A practical approach to phylogenomics: the phylogeny of ray-finned fish (Actinopterygii) as a case study. BMC Evol Biol. 7:44.

Li C, Riethoven JJ, Naylor GJ. 2012. EvolMarkers: a database for mining exon and intron markers for evolution, ecology and conservation studies. Mol Ecol Resour. 12:967–971.

Liu L, Yu L, Edwards SV. 2010. A maximum pseudo-likelihood approach for estimating species trees under the coalescent model. BMC Evol Biol. 10:302.

McCormack JE, Faircloth BC, Crawford NG, Gowaty PA, Brumfield RT, Glenn TC. 2012. Ultraconserved elements are novel phylogenomic markers that resolve placental mammal phylogeny when combined with species-tree analysis. Genome Res. 22:746–754.

McCormack JE, Hird SM, Zellmer AJ, Carstens BC, Brumfield RT. 2013. Applications of next-generation sequencing to phylogeography and phylogenetics. Mol Phylogenet Evol. 66:526–538.

Meyer A, Van de Peer Y. 2005. From 2R to 3R: evidence for a fish-specific genome duplication (FSGD). Bioessays. 27:937–945.

Meyer M, Stenzel U, Hofreiter M. 2008. Parallel tagged sequencing on the 454 platform. Nat Protoc. 3:267–278.

Murphy WJ, Eizirik E, Johnson WE, Zhang YP, Ryder OA, O'Brien SJ. 2001. Molecular phylogenetics and the origins of placental mammals. Nature. 409:614–618.

Philippe H, Derelle R, Lopez P, et al. (20 co-authors). 2009. Phylogenomics revives traditional views on deep animal relationships. Curr Biol. 19:706–712.

Philippe H, Telford M. 2006. Large-scale sequencing and the new animal phylogeny. Trends Ecol Evol. 21:614–620.

Portik DM, Wood PL, Grismer JL, Stanley EL, Jackman TR. 2011. Identification of 104 rapidly-evolving nuclear protein-coding markers for amplification across scaled reptiles using genomic resources. Conserv Genet Resour. 4:1–10.

Pyron RA, Wiens JJ. 2011. A large-scale phylogeny of Amphibia including over 2800 species, and a revised classification of extant frogs, salamanders, and caecilians. Mol Phylogenet Evol. 61:543–583.

Roelants K, Gower DJ, Wilkinson M, Loader SP, Biju SD, Guillaume K, Moriau L, Bossuyt F. 2007. Global patterns of diversification in the history of modern amphibians. Proc Natl Acad Sci U S A. 104:887–892.

Rokas A, Williams BL, King N, Carroll SB. 2003. Genome-scale approaches to resolving incongruence in molecular phylogenies. Nature. 425:798–804.

Ronquist F, Teslenko M, Van der Mark P, Ayres DL, Darling A, Höhna S, Larget B, Liu L, Suchard MA, Huelsenbeck JP. 2012. MrBayes 3.2: efficient Bayesian phylogenetic inference and model choice across a large model space. Syst Biol. 61:539–542.

Roure B, Baurain D, Philippe H. 2013. Impact of missing data on phylogenies inferred from empirical phylogenomic data sets. Mol Biol Evol. 30:197–214.

San Mauro D. 2010. A multilocus timescale for the origin of extant amphibians. Mol Phylogenet Evol. 56:554–561.

San Mauro D, Vences M, Alcobendas M, Zardoya R, Meyer A. 2005. Initial diversification of living amphibians predated the breakup of Pangaea. Am Nat. 165:590–599.

Shen XX, Liang D, Wen JZ, Zhang P. 2011. Multiple genome alignments facilitate development of NPCL markers: a case study of tetrapod phylogeny focusing on the position of turtles. Mol Biol Evol. 28:3237–3252.

Shen XX, Liang D, Zhang P. 2012. The development of three long universal nuclear protein-coding locus markers and their application to osteichthyan phylogenetics with nested PCR. PLoS One. 7:e39256.

Shimodaira H. 2002. An approximately unbiased test of phylogenetic tree selection. Syst Biol. 51:492–508.

Shimodaira H, Hasegawa M. 2001. CONSEL: for assessing the confidence of phylogenetic tree selection. Bioinformatics. 17:1246–1247.

Song S, Liu L, Edwards SV, Wu S. 2012. Resolving conflict in eutherian mammal phylogeny using phylogenomics and the multispecies coalescent model. Proc Natl Acad Sci U S A. 109:14942–14947.

Spinks PQ, Thomson RC, Barley AJ, Newman CE, Shaffer HB. 2010. Testing avian, squamate, and mammalian nuclear markers for cross amplification in turtles. Conserv Genet Resour. 2:127–129.

Spinks PQ, Thomson RC, Lovely GA, Shaffer HB. 2009. Assessing what is needed to resolve a molecular phylogeny: simulations and empirical data from Emydid turtles. BMC Evol Biol. 9:56.

Stamatakis A. 2006. RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics. 22:2688–2690.

Tewhey R, Warner JB, Nakano M, Libby B, Medkova M, David PH, Kotsopoulos SK, Samuels ML, Hutchison JB, Larson JW. 2009. Microdroplet-based PCR enrichment for large-scale targeted sequencing. Nat Biotechnol. 27:1025–1031.

Thomson RC, Shedlock AM, Edwards SV, Shaffer HB. 2008. Developing markers for multilocus phylogenetics in non-model organisms: a test case with turtles. Mol Phylogenet Evol. 49:514–525.

Thomson RC, Wang IJ, Johnson JR. 2010. Genome-enabled development of DNA markers for ecology, evolution and conservation. Mol Ecol. 19:2184–2195.

Townsend TM, Alegre RE, Kelley ST, Wiens JJ, Reeder TW. 2008. Rapid development of multiple nuclear loci for phylogenetic analysis using genomic resources: an example from squamate reptiles. Mol Phylogenet Evol. 47:129–142.

Weisrock DW, Harmon LJ, Larson A. 2005. Resolving deep phylogenetic relationships in salamanders: analyses of mitochondrial and nuclear genomic DNA. Syst Biol. 54:758–777.

Wiens J, Bonett R, Chippindale P. 2005. Ontogeny discombobulates phylogeny: paedomorphosis and higher-level salamander relationships. Syst Biol. 54:91–110.


Wright TF, Schirtzinger EE, Matsumoto T, et al. (11 co-authors). 2008. A multilocus molecular phylogeny of the parrots (Psittaciformes): support for a Gondwanan origin during the Cretaceous. Mol Biol Evol. 25:2141–2156.

Zhang P, Wake DB. 2009. Higher-level salamander relationships and divergence dates inferred from complete mitochondrial genomes. Mol Phylogenet Evol. 53:492–508.

Zhang P, Zhou H, Chen YQ, Liu YF, Qu LH. 2005. Mitogenomic perspectives on the origin and phylogeny of living amphibians. Syst Biol. 54:391–400.

Zhou XM, Xu SX, Zhang P, Yang G. 2011. Developing a series of conservative anchor markers and their application to phylogenomics of Laurasiatherian mammals. Mol Ecol Resour. 11:134–140.

Phylogenetic Performance in a Real Case

Our demonstration case included 19 salamander species that span salamander evolutionary diversity (supplementary table S2, Supplementary Material online). The nine outgroup species (two frogs, two caecilians, one turtle, one bird, two mammals, and one coelacanth) provided a largely balanced representation of relatives of salamanders. The 30 newly amplified NPCLs exhibited levels of variation comparable with that of the traditional RAG1 gene, with variable sites ranging between 30% and 51% of all sequenced sites (table 1). The data set combining these 30 NPCLs comprises 27,834 bp and exhibits little substitution saturation (supplementary fig. S1, Supplementary Material online). The phylogenetic analyses of the concatenated data set using three tree-building methods (maximum likelihood [ML], Bayesian, and CAT-mixture model) produced an identical, fully resolved tree for 28 taxa (fig. 4). At all 25 nodes of the tree, the statistical support was highly robust (BPML = 99–100%, PPBAY = 1.0, PPCAT = 1.0). The species tree estimated from the 30 individual NPCLs without data concatenation, using the pseudo-ML approach, is identical to those estimated from the concatenated analyses. All nodes received bootstrap support values between 74% and 100% (fig. 4). We also conducted phylogenetic analyses at the amino acid level (9,278 deduced amino acid residues) using the same three tree-building methods (ML, Bayesian, and CAT-mixture model). The protein tree topology is identical to the DNA result, with only slightly lower branch support for some nodes (supplementary fig. S2, Supplementary Material online). Therefore, we did not further analyze the protein data set.
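The alignment sizes quoted above can be checked against the per-locus lengths listed in table 1. The short Python sketch below is only an illustrative consistency check; the dictionary is transcribed from table 1, and the function name is hypothetical.

```python
# Refined alignment lengths (bp) for the 30 NPCLs, transcribed from table 1.
LOCUS_LENGTHS = {
    "BPTF": 552, "CAND1": 1155, "DET1": 711, "DISP1": 960, "DNAH3": 918,
    "DOLK": 672, "DSEL": 1266, "ENC1": 1083, "EXTL3": 1245, "FAT4": 738,
    "FICD": 510, "GRM2": 690, "HYP": 1260, "KBTBD2": 1059, "KCNF1": 735,
    "KIAA1239": 1362, "KIAA2013": 516, "LIG4": 1017, "LPHN2": 573,
    "LRRN1": 837, "MGAT4C": 747, "MIOS": 879, "PANX2": 717, "PDP1": 1035,
    "PPL": 1338, "RAG1": 1380, "RAG2": 792, "SACS": 1101, "TTN": 984,
    "ZBED4": 1002,
}


def alignment_summary(lengths):
    """Return (number of loci, total bp, whole codons, leftover bases)."""
    total_bp = sum(lengths.values())
    codons, remainder = divmod(total_bp, 3)
    return len(lengths), total_bp, codons, remainder


if __name__ == "__main__":
    n_loci, total_bp, codons, remainder = alignment_summary(LOCUS_LENGTHS)
    print(n_loci, total_bp, codons, remainder)  # 30 27834 9278 0
```

The totals agree with the 27,834 bp nucleotide alignment and the 9,278 deduced amino acid residues reported in the text.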

The monophyly of extant amphibians with respect to amniotes and the close relationship between frogs and salamanders (the Batrachia hypothesis) are repeatedly recovered in most recent molecular studies (San Mauro et al. 2005; Zhang et al. 2005; Frost et al. 2006; Hugall et al. 2007; Roelants et al. 2007; Zhang and Wake 2009; San Mauro 2010; Pyron and Wiens 2011). However, a recent molecular study based on 26 nuclear genes (Fong et al. 2012) supports a caecilian–salamander sister relationship, with the possible paraphyly of extant amphibians. Our phylogenetic analyses based on

FIG. 1. Chromosome mapping of the 102 NPCL markers in the Homo sapiens genome.

FIG. 2. PCR performance of the 102 NPCL markers in 16 divergent vertebrate species. Each square and electrophoretic lane is aligned with the tested species. (a) The draft divergence timescale (in Ma) for the 16 tested vertebrate species is based on Inoue et al. (2010) and the book The Timetree of Life. (b) The PCR performance of each NPCL marker is ranked by three different-colored squares: black, producing a single target band; gray, having a target band but with significant nonspecific bands; white, no target band. The 102 NPCL markers are sorted according to their PCR success rates. (c) Three typical agarose electrophoresis results for 9 NPCL markers (CAND1, HYP, IRS1, FAT2, CELSR3, RERE, SVEP1, DNAH3, and GRM2; size standards of 4,500, 1,200, and 200 bp). Tested genera, in lane order: Sphyrna, Pangasius, Lepisosteus, Protopterus, Ichthyophis, Batrachuperus, Rana, Mus, Struthio, Zosterops, Crocodylus, Trionyx, Podocnemis, Sus, Hemidactylus, and Naja.

30 nuclear genes provide further support for the monophyly of lissamphibians and the Batrachia hypothesis (fig. 4). Additionally, all possible hypotheses against the monophyly of extant amphibians and the Batrachia hypothesis were rejected in our topological tests (table 2). However, the Batrachia hypothesis did not receive strong support in our species tree analysis (BPMP-EST = 74%; fig. 4), suggesting that more nuclear genes are still needed to resolve this node.

The monophyly of the internally fertilizing salamanders (Salamandroidea; all salamanders exclusive of Hynobiidae, Cryptobranchidae, and Sirenidae) is strongly supported in our analyses (fig. 4), in line with most recent molecular studies (Wiens et al. 2005; Roelants et al. 2007; Zhang and Wake 2009; Pyron and Wiens 2011) but differing strongly from Frost et al. (2006), who recovered a clade comprising Sirenidae, Dicamptodontidae, Ambystomatidae, and Salamandridae. The internally fertilizing salamanders include two well-supported clades: one is composed of Ambystomatidae, Dicamptodontidae, and Salamandridae, and the other of Proteidae, Rhyacotritonidae, Amphiumidae, and Plethodontidae (fig. 4).

Currently, two hypotheses have been proposed regarding the basal split within living salamanders. The traditional view favors Sirenidae as the sister group to all remaining salamanders (Duellman and Trueb 1994). This hypothesis received strong support in two recent studies (based on mitochondrial genomes, BPML = 98%, Zhang and Wake 2009; based on mitochondrial genomes and nuclear genes, BPML > 80%, San Mauro 2010). In contrast, some studies suggest that the basal split separates Cryptobranchidae + Hynobiidae from all other salamanders (Gao and Shubin 2001; Wiens et al. 2005; Frost et al. 2006; Roelants et al. 2007; Pyron and Wiens 2011), but always without strong support (BPML < 71%). Our phylogenetic analyses based on 30 independent NPCLs supported the second hypothesis, that Cryptobranchoidea (Cryptobranchidae + Hynobiidae) branched first within the living salamanders. This result is extremely robust in our concatenation analyses (BPML = 99%, PPBAY = 1.0, PPCAT = 1.0; fig. 4), and all alternative hypotheses are statistically rejected (table 2). In the species tree analysis without data concatenation, this result is also well supported (BPMP-EST = 83%; fig. 4).

How many nuclear genes, then, are needed to robustly resolve the question of the basal split within living salamanders? Our analysis of data subsets indicates a progressive increase in the bootstrap support value for the node of interest (fig. 4) as an increasing number of genes is analyzed (fig. 5). Analyses based on single genes rarely resolve the node of interest with any confidence. Analyses based on 5–10 genes produce bootstrap support values of 60–80% in concatenation analyses (fig. 5), which is congruent with all previous nuclear studies using similar-sized data sets (Roelants et al. 2007; Pyron and Wiens 2011). Taking a bootstrap value of 95% in concatenation analyses as the threshold of "fully resolved," the minimum number of nuclear genes needed to resolve the root of the salamander tree is approximately 25. The previous contradictory results may be due to the overwhelmingly strong signals from the mitochondrial genome. Because

FIG. 3. Relative evolutionary rates of the 102 NPCLs in vertebrates. The 102 NPCLs are arranged in order of increasing variability on the right side (axis: relative evolutionary rate), and their PCR success rates in the 16 tested vertebrates are shown on the left side (axis: PCR success rate in 16 vertebrates, %). NPCLs indicated with asterisks may have extra copies in teleost genomes and thus are not suitable for phylogenetic studies of teleosts. Marker labels in figure order: NTN1, ZIC1, PCDH10, BPTF, KBTBD2, DBC1, LINGO1, ARID2, FUT9, VCPIP1, B3GALT1, MED1, LPHN2, GGPS1, SACS, PANX2, CASR, PDP1, CAND1, LRRN3, LINGO2, MGAT4C, FLRT3, KIAA1239, ENC1, SLITRK1, ZFPM2, ZEB1, FZD4, MIOS, MYCBP2, DSEL, ZBED4, LRRC8D, MB21D2, SETBP1, DOPEY1, SALL1, MED13, LRRTM4, DISP1, RAG1, DCHS1, NHS, EXTL3, SOCS5, ROR2, TTN, RP2, RERE, ANKRD50, FREM2, HYP, GRM2, CXCR4, DSE, GLCE, PCLO, VPS18, GRIN3A, ADNP, GPER, FAT4, STON2, LRRN1, ARSI, IRS1, PIK3CG, P2RY1, PCDH1, KCNF1, DMXL1, RAG2, PPL, FAT2, LIG4, EXOC8, DET1, FILIP1, WFIKKN2, ZHX2, CILP, CELSR3, SYNE1, LCT, TLR7, CPT2, CHST1, FEM1B, DNAH3, SH3BP4, DOLK, FICD, SPEN, KIAA2013, SVEP1, DISP2, REV3L, FAT1, MSH6, EVPL, MOS.

the initial diversification of salamanders occurred within a relatively short window of time (Weisrock et al. 2005), the genealogical histories of individual gene loci may sometimes be misleading about the relationships among species because of incomplete lineage sorting. Unfortunately, the mitochondrial genome appears to have recorded such a misleading history.
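The subset analysis described above (fig. 5) relies on repeatedly drawing random gene subsets of increasing size. A minimal Python sketch of that sampling step follows. It assumes sampling without replacement, subset sizes matching the tick marks of fig. 5, and 30 replicates per size as stated in the fig. 5 caption; the function name and the fixed random seed are illustrative choices, and the tree inference and bootstrap steps are not shown.

```python
import random

# The 30 NPCLs used in the salamander data set (names from table 1).
GENES = [
    "BPTF", "CAND1", "DET1", "DISP1", "DNAH3", "DOLK", "DSEL", "ENC1",
    "EXTL3", "FAT4", "FICD", "GRM2", "HYP", "KBTBD2", "KCNF1", "KIAA1239",
    "KIAA2013", "LIG4", "LPHN2", "LRRN1", "MGAT4C", "MIOS", "PANX2", "PDP1",
    "PPL", "RAG1", "RAG2", "SACS", "TTN", "ZBED4",
]


def sample_gene_subsets(genes, sizes=(1, 5, 10, 15, 20, 25, 30),
                        replicates=30, seed=0):
    """Draw `replicates` random subsets (without replacement) for each size."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    return {
        k: [sorted(rng.sample(genes, k)) for _ in range(replicates)]
        for k in sizes
    }


def mean_support(values):
    """Mean bootstrap support across replicates for one subset size."""
    return sum(values) / len(values)


if __name__ == "__main__":
    subsets = sample_gene_subsets(GENES)
    for size, replicate_sets in subsets.items():
        print(size, len(replicate_sets), replicate_sets[0][:3])
```

Each subset would then be concatenated (or passed locus by locus to a species-tree method), and the support for the node of interest averaged over the replicates with something like `mean_support`.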

Discussion

The NPCL toolkit and experimental protocol introduced here constitute a highly reliable, rapid, and cost-effective method for building medium-scale multilocus data sets to produce high-resolution phylogenetic relationships. This phylogenomic approach has the potential to accelerate the completion of many parts of the vertebrate tree of life because no further marker development, which is often the bottleneck in phylogenetic research, is required. Once a specific phylogenetic question within vertebrates arises, researchers simply need to check the list for our toolkit and look for NPCL markers with the expected evolutionary rates and experimental performance for their groups of interest. Then, many orthologous loci can be quickly obtained by traditional PCR and Sanger sequencing, usually without time-consuming gel cutting and cloning. Applying the NPCL toolkit may also have another benefit for assembling the vertebrate tree of life: because people working on different groups can easily use the same set of loci, combined analyses will be facilitated.

Merits of the Toolkit

Because of the use of the nested PCR strategy outlined earlier, most NPCLs in the toolkit work for all major jawed vertebrate groups with high experimental success rates (normally >95%). Such results were achieved under unified PCR conditions without any extra effort to optimize cycling conditions. This feature of the toolkit enables it to surpass previously developed nuclear marker sets (Murphy et al. 2001; Li et al. 2007; Thomson et al. 2008; Townsend et al. 2008; Wright et al. 2008; Portik et al. 2011; Shen et al. 2011; Zhou et al. 2011). Most previous nuclear marker sets were developed for specific animal groups, and their application to

Table 1. Summary Information for the 30 NPCLs Amplified in 19 Salamander Taxa.

Gene      Length  Taxa        PCR Products  GC   Var.      PI        Overall   RCV
          (bp)    Amplified   Directly      (%)  Sites     Sites     Mean
                  (%)         Sequenced          (%)       (%)       P Distance
                              (%)
BPTF       552    19 (100)    19 (100)      43   163 (30)  118 (21)  0.098     0.093
CAND1     1155    19 (100)    17 (89)       44   403 (35)  314 (27)  0.116     0.065
DET1       711    19 (100)    18 (95)       46   275 (39)  216 (30)  0.131     0.091
DISP1      960    19 (100)    19 (100)      41   317 (33)  211 (22)  0.096     0.072
DNAH3      918    19 (100)    19 (100)      42   389 (42)  304 (33)  0.139     0.049
DOLK       672    16 (84)     12 (75)       52   316 (47)  236 (35)  0.173     0.126
DSEL      1266    19 (100)    19 (100)      44   546 (43)  415 (33)  0.148     0.055
ENC1      1083    19 (100)    19 (100)      51   363 (34)  279 (26)  0.120     0.057
EXTL3     1245    19 (100)    17 (89)       47   465 (37)  322 (26)  0.118     0.067
FAT4       738    19 (100)    19 (100)      45   344 (47)  249 (34)  0.156     0.072
FICD       510    18 (95)     18 (100)      44   169 (33)  124 (24)  0.111     0.074
GRM2       690    18 (95)     18 (100)      54   240 (35)  176 (26)  0.115     0.118
HYP       1260    19 (100)    19 (100)      47   516 (41)  359 (28)  0.122     0.049
KBTBD2    1059    19 (100)    19 (100)      44   406 (38)  246 (23)  0.103     0.040
KCNF1      735    19 (100)    19 (100)      52   294 (40)  220 (30)  0.151     0.153
KIAA1239  1362    19 (100)    19 (100)      42   479 (35)  338 (25)  0.110     0.048
KIAA2013   516    19 (100)    19 (100)      52   221 (43)  178 (34)  0.148     0.093
LIG4      1017    19 (100)    19 (100)      39   434 (43)  301 (30)  0.137     0.043
LPHN2      573    19 (100)    19 (100)      47   192 (34)  140 (24)  0.106     0.108
LRRN1      837    19 (100)    18 (95)       49   345 (41)  240 (29)  0.130     0.119
MGAT4C     747    18 (95)     18 (100)      42   273 (37)  212 (28)  0.126     0.079
MIOS       879    19 (100)    19 (100)      45   291 (33)  213 (24)  0.107     0.065
PANX2      717    19 (100)    19 (100)      44   254 (35)  199 (28)  0.125     0.059
PDP1      1035    19 (100)    19 (100)      45   348 (34)  261 (25)  0.110     0.080
PPL       1338    19 (100)    17 (89)       47   645 (48)  485 (36)  0.156     0.064
RAG1      1380    19 (100)    19 (100)      51   550 (40)  438 (32)  0.146     0.072
RAG2       792    19 (100)    18 (95)       49   406 (51)  310 (39)  0.184     0.090
SACS      1101    19 (100)    19 (100)      40   383 (35)  282 (26)  0.105     0.040
TTN        984    19 (100)     5 (26)       43   378 (38)  269 (27)  0.125     0.052
ZBED4     1002    19 (100)    19 (100)      39   325 (32)  224 (22)  0.093     0.040

NOTE: Length, length of refined alignment; Var. sites, variable sites; PI sites, parsimony informative sites; RCV, relative composition variability.

FIG. 4. Higher-level phylogenetic relationships of 10 salamander families inferred from 30 NPCL markers (30 nuclear genes, 27,834 bp in total). The tree was inferred by concatenation analyses using ML, BI, and the mixture model (CAT) and by species-tree analysis using the pseudo-ML approach (MP-EST). Branch support values are indicated beside nodes in the order ML bootstrap (BPML), BI posterior probability (PPBAY), CAT posterior probability (PPCAT), and MP-EST bootstrap (BPMP-EST), from left to right. The filled squares represent BPML > 95%, PPBAY = 1.0, PPCAT = 1.0, and BPMP-EST > 95%. Two nodes carry explicit values instead: 99/1.0/1.0/83 and 99/1.0/1.0/74. The circled number refers to the node of interest studied in figure 5. Branch lengths are from the ML analysis (scale bar, 0.1 substitutions/site).

Table 2. Statistical Confidence (P Values) for Alternative Branching Hypotheses Based on the 30-Gene Data Set.

Alternative Topology Tested                                       ΔLn L    P Value                   Rejection
                                                                           AU       BP      KH       AU  BP  KH
Best ML                                                             0      0.993    0.97    0.98
Sirenidae branched earlier                                         32.8    0.025    0.015   0.02     +   +   +
Sirenidae is sister to Cryptobranchoidea                           42.3    0.004    0.002   0.004    +   +   +
Gymnophiona is sister to Anura (monophyletic lissamphibians)       34.3    0.013    0.012   0.013    +   +   +
Gymnophiona is sister to Caudata (monophyletic lissamphibians)     43.9    0.002    0.001   0.001    +   +   +
Anura is sister to Amniota (paraphyletic lissamphibians)          144.6    5E-30    0       0        +   +   +
Gymnophiona is sister to Amniota (paraphyletic lissamphibians)    129.0    1E-69    0       0        +   +   +
Caudata is sister to Amniota (paraphyletic lissamphibians)        172.8    0.0001   0       0        +   +   +

other distantly related groups is usually difficult. For example, Spinks et al. (2010) collected 120 nuclear markers from avian, squamate, and mammalian phylogenetic studies and evaluated their PCR performance in turtles. They found that only eight nuclear markers successfully produced single expected bands across 13 tested turtle species. In another case, Fong and Fujita (2011) developed 75 nuclear markers for vertebrate phylogenetics, but approximately 60% of the target fragments could not be obtained in three test species (two reptiles and one lissamphibian). Therefore, although the nested PCR method introduced here requires an additional PCR reaction, the extra work is still worthwhile.

In PCR-based phylogenetic projects, even when the PCR reactions are successful, the products often contain significant nonspecific amplicons. Such a condition requires additional effort involving gel purification and cloning, which takes much more time than the PCR reaction itself. Our NPCL toolkit is specifically designed to solve this problem, so that normally over 90% of PCR reactions produce strong and single expected bands. Moreover, most of the primers used to date for nuclear marker sets are degenerate and thus are not suitable for direct sequencing of PCR products. Benefiting from the use of our nested PCR strategy, we introduce anchoring sequences to the ends of PCR fragments while maintaining PCR efficiency. Such introduced anchoring sequences bring the added benefit that two specific sequencing primers (Seq_F and Seq_R) can be used in all Sanger sequencing reactions.

One additional feature of our NPCL toolkit is that the average length of the NPCLs within it is 1,050 bp, a length that can easily be amplified in one PCR reaction and sequenced in both directions to allow efficient use of resources. In contrast, the average marker lengths are 920 bp for 10 NPCLs in Li et al. (2007), 760 bp for 26 NPCLs in Townsend et al. (2008), 873 bp for 22 NPCLs in Shen et al. (2011), and 470 bp for 75 NPCLs in Fong and Fujita (2011). Longer markers provide more sites than shorter ones for equivalent money and time. This feature makes our NPCL toolkit more cost-effective than previously developed nuclear marker sets.

Phylogenetic Utility

The vertebrate NPCL toolkit we developed here shows great promise in terms of phylogenetic utility. A remarkable feature of our NPCL toolkit is that it provides 102 NPCLs with a broad range of evolutionary rates. In our demonstration case, we used 30 NPCLs to resolve a family-level salamander phylogeny using both traditional concatenation analyses and a more promising species-tree analysis. However, this example does not mean that our toolkit performs well only on deep-timescale questions. Our ongoing study using this toolkit to resolve the intra-relationships within Plethodontidae, a rapidly radiating group of salamanders, suggests that the toolkit developed here also performs well in resolving species-level phylogenies. For many vertebrate groups in which applicable nuclear markers are limited, such as some teleosts, frogs, and salamanders, our NPCL toolkit can provide a one-stop solution for phylogenetic studies from the family level to the species level. Even for those groups in which specific nuclear marker sets have been developed, our toolkit is still worth trying, as many more loci can be easily obtained that may resolve some difficult branches.

The Toolkit Is a Good Addition to Sequence Capture Approaches

Recently, sequence capture approaches have been applied to vertebrate phylogenomics (Crawford et al. 2012; Faircloth et al. 2012; Lemmon et al. 2012; McCormack et al. 2012). These approaches begin with the selective capture of genomic regions. Briefly, fragmented gDNA is hybridized to DNA or RNA probes, either on an array or in solution. Nontargeted DNA is then washed away, and the targeted DNA is sequenced through NGS. The most promising feature of the sequence capture approach is that it can simultaneously produce hundreds to thousands of loci for tens of individuals within a relatively short time. Therefore, the sequence capture approach is considered to be much more cost-effective than the PCR-based method. According to the calculation of Lemmon et al. (2012), for a project of 100 taxa × 500 loci, the cost of the sequence capture method is just 1–3.5% of that of the PCR-based method.

However, the sequence capture approach is currently too challenging for most phylogenetic researchers. Typical NGS runs (454 or Illumina) used by the sequence capture method generate 1,000,000–2,000,000,000 sequences. Storing and processing these NGS data require significant computer memory, hardware upgrades, and bioinformatic programming skills, which are unfamiliar to most phylogenetic researchers. Moreover, phylogenetic reconstruction assumes that orthologous genes are being analyzed across species. For the PCR-based method, the detection of paralogous genes is relatively straightforward. However, in the sequence capture method, the captured genomic regions comprise short conserved cores (probe regions) and long unconserved flanking

FIG. 5. The effect of increasing the number of nuclear loci on resolving the basal split within salamanders (x axis, number of genes, 1–30; y axis, bootstrap support, %; series, concatenation analyses and species tree analyses). Each data point represents the mean of support values estimated from 30 randomly sampled subsets. The dashed line indicates the threshold of 95% bootstrap support. The statistical plots show that the minimum number of nuclear loci needed to robustly resolve the basal split within salamanders is 25.

sequences. Because paralogy cannot be detected until after the data are aligned, those unalignable sequences will make the detection of paralogy more difficult.

In fact, not every phylogenetic project will use more than 500 loci, as the sequence capture method normally does. Based on both empirical and simulation data, 20–50 loci are generally sufficient to answer many phylogenetic questions (Rokas et al. 2003; Spinks et al. 2009). This is also the number of loci that most phylogenetic studies will use. In such a situation, adopting the sequence capture method is not cost-effective because researchers need to use relatively expensive NGS sequencing and spend time learning new experimental techniques and carrying out sophisticated bioinformatic processing. Our NPCL toolkit is specially designed for such medium-scale phylogenetic projects using approximately 50 loci. Such a number of loci can easily be supplied by our 102 NPCLs. Because more than 90% of the PCR reactions generated by our toolkit can be directly sequenced, the average cost for one locus per sample is rather low. In our laboratory, generating one new sequence typically costs US$3 (without considering labor).
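The per-sequence figure above makes it easy to budget a medium-scale project. The following Python sketch is a back-of-the-envelope estimate only: it assumes one sequence per taxon per locus at the US$3 quoted above, ignores labor and failed reactions, and uses a hypothetical project size of 50 taxa × 50 loci.

```python
def sanger_project_cost(n_taxa, n_loci, cost_per_sequence_usd=3.0):
    """Estimated consumables cost, assuming one sequence per taxon per locus."""
    n_sequences = n_taxa * n_loci
    return n_sequences, n_sequences * cost_per_sequence_usd


if __name__ == "__main__":
    # Hypothetical medium-scale project: 50 taxa x 50 loci.
    n_sequences, cost = sanger_project_cost(50, 50)
    print(n_sequences, cost)  # 2500 7500.0
```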

In addition, researchers sometimes have only tiny amounts of DNA but wish to perform a multilocus phylogenetic analysis. In such a situation, the sequence capture method is difficult to implement because it normally requires DNA at the microgram level (Lemmon et al. 2012). Our NPCL toolkit can fill the gap here. Benefiting from the use of the nested PCR strategy, the sensitivity of PCR reactions in our method is extremely high. In many test experiments in our laboratory, the toolkit and protocol could produce target bands with only 5–10 ng of DNA.

Our NPCL toolkit is an alternative to the sequence capture method for the everyday work of phylogenetic researchers. Which method to choose depends on two major drivers: the amount of DNA available and the expected number of loci. When DNA is limited, the better solution may be PCR; otherwise, sequence capture also works. Taking into account the money and time the two methods require, we speculate that the economic transition point from PCR to sequence capture is at approximately 100 loci. That assessment is why our toolkit includes 102 NPCL markers. Our proposal is that when using ≤100 loci, one can try our NPCL toolkit; when using >100 loci, sequence capture should be used.
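The two drivers named above can be written down as a simple rule of thumb. The Python sketch below is illustrative only: the 100-locus transition point is the speculation stated in the text, whereas the 1,000 ng cutoff is an assumed stand-in for "DNA at the microgram level," and the function and parameter names are hypothetical.

```python
def suggest_method(n_loci, dna_ng, loci_threshold=100, capture_min_dna_ng=1000.0):
    """Rule-of-thumb choice between nested PCR and sequence capture."""
    # Sequence capture normally needs microgram-level DNA (assumed cutoff).
    if dna_ng < capture_min_dna_ng:
        return "nested PCR"
    # With enough DNA, the expected number of loci decides.
    return "nested PCR" if n_loci <= loci_threshold else "sequence capture"


if __name__ == "__main__":
    print(suggest_method(n_loci=30, dna_ng=10))      # nested PCR
    print(suggest_method(n_loci=500, dna_ng=5000))   # sequence capture
```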

Future Directions

In this study, we used multiple genome alignments deposited in the University of California–Santa Cruz (UCSC) Genome Browser to identify long and conserved exons across jawed vertebrates. Benefiting from the use of a nested PCR strategy, the experimental performance of the developed NPCLs indicated that they are highly stable in all major jawed vertebrate groups. Recently, a database for mining exon and intron markers, called EvolMarkers, has been built by Li et al. (2012). Careful investigation of this database may identify many conserved exons within nonvertebrates, whose interrelationships are currently more problematic than those of vertebrates. Because the nonvertebrates constitute many distantly related groups, it may be impossible to develop a single set of PCR primers for all nonvertebrates. However, following a similar marker development strategy, multiple NPCL toolkits could be constructed for various groups of nonvertebrates, such as arthropods, echinoderms, and molluscs. In addition, because introns are flanked by conserved exons, the idea of using nested PCRs for marker development could also be applied to the development of EPIC (exon-primed intron crossing) markers, which are more suitable for shallow-scale phylogenetic or phylogeographic projects.

Despite the benefits of our proposed method, it must be recognized that when handling large-scale projects, such as 200 taxa × 100 loci, the use of our toolkit and Sanger sequencing will still require significant cost, time, and labor. An alternative solution is to use NGS to replace Sanger sequencing. Recently, 454 NGS technology has been applied to sequence targeted gene regions from a pool of PCR products from different specimens (Binladen et al. 2007; Meyer et al. 2008). In such experiments, specific tagging sequences must be added to amplicons by either PCR (Binladen et al. 2007) or blunt-end ligation (Meyer et al. 2008). Therefore, if the tailing sequences of the second-round PCR primers in our NPCL toolkit are replaced by tagging sequences (for tag design, see Faircloth and Glenn 2012), all PCR products can be pooled together and sequenced with 454 NGS, which will greatly reduce the money and time cost compared with Sanger sequencing. However, parallel tagged sequencing via NGS does not circumvent the process of PCR for each individual at each locus, which may be the most onerous part of a large-scale phylogenomic project. Some promising new technologies may help to solve this problem, such as microdroplet PCR (Tewhey et al. 2009), where millions of individual PCR reactions are performed in picoliter-scale droplets simultaneously, and the 96.96 Dynamic Array by Fluidigm, which allows 96 primer combinations to be used on 96 samples (9,216 total PCR reactions) on a single PCR plate. However, there has been little research on applying NGS and new high-throughput PCR technologies to phylogenomics, so their ease of use and cost-effectiveness still need to be explored.
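To give a sense of scale for the 200 taxa × 100 loci example, the Python sketch below counts the individual PCR reactions and the number of 96 × 96 plates that layout would occupy. It is a rough illustration under stated assumptions: one primer combination per locus, one plate position per sample, and a simple tiling of samples and primer pairs in blocks of 96; the function name is hypothetical.

```python
import math


def dynamic_array_plates(n_samples, n_primer_pairs, per_axis=96):
    """Plates needed when samples and primer pairs are tiled in blocks of 96."""
    sample_blocks = math.ceil(n_samples / per_axis)
    primer_blocks = math.ceil(n_primer_pairs / per_axis)
    return sample_blocks * primer_blocks


if __name__ == "__main__":
    n_taxa, n_loci = 200, 100
    print(n_taxa * n_loci)                       # 20000 individual reactions
    print(96 * 96)                               # 9216 reactions per plate
    print(dynamic_array_plates(n_taxa, n_loci))  # 6 plates (3 x 2 blocks)
```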

Summary

In conclusion, we have developed an improved method for rapidly amplifying and sequencing NPCLs that has proven to be useful and effective for molecular phylogenetic studies of vertebrates. The newly developed toolkit provides an attractive alternative to available methods for vertebrate phylogenomics.

Materials and Methods

Development of NPCL and Primer Design

Our previous study showed that nested PCR is considerably more effective than conventional PCR for obtaining target amplicons from complex genomic environments (Shen et al. 2012). However, nested PCR requires four conserved regions to design two pairs of primers (illustrated in fig. 6; yellow blocks represent the conserved regions used for primer design), which means that only relatively long exons are

suitable as candidates for NPCL development with the nested PCR method. To search for long and conserved exons, we took advantage of our previous bioinformatic method, which used the multiple genome alignment data from the UCSC Genome Browser to identify conserved exons (Shen et al. 2011). Because the NPCL markers are to be used in vertebrates, we focused only on those multiple genome alignments that include at least six species: Danio rerio (zebrafish), Silurana tropicalis (frog), Anolis carolinensis (lizard), Gallus gallus (chicken), Mus musculus (mouse), and Homo sapiens (human). The alignments of candidate exons had to meet two criteria: a length of more than 700 bp and pairwise similarity ranging from 35% to 90%. The detailed bioinformatic pipeline has been described elsewhere (Shen et al. 2011). In addition to using multiple genome alignments to screen NPCL candidates, we also manually searched the ENSEMBL database for nuclear genes that were used previously (Murphy et al. 2001; Li et al. 2007; Townsend et al. 2008; Wright et al. 2008; Zhou et al. 2011; Song et al. 2012) to check whether they contain large and appropriately conserved exons.
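The two screening criteria above can be illustrated with a small filter. The Python sketch below is not the pipeline of Shen et al. (2011); it assumes that "pairwise similarity" means the fraction of identical sites between two aligned sequences (ignoring gapped columns) and that every pair of species must fall within the 35–90% window. Both assumptions, and all function names, are illustrative.

```python
from itertools import combinations


def pairwise_identity(seq_a, seq_b):
    """Fraction of identical sites, skipping columns with a gap in either sequence."""
    pairs = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper())
             if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    return sum(a == b for a, b in pairs) / len(pairs)


def passes_criteria(alignment, min_len=700, min_sim=0.35, max_sim=0.90):
    """Check an exon alignment (dict: species -> aligned sequence)."""
    seqs = list(alignment.values())
    # Criterion 1: alignment longer than 700 bp.
    if len(seqs) < 2 or len(seqs[0]) <= min_len:
        return False
    # Criterion 2: every pairwise similarity within 35-90% (assumed reading).
    sims = [pairwise_identity(a, b) for a, b in combinations(seqs, 2)]
    return all(min_sim <= s <= max_sim for s in sims)


if __name__ == "__main__":
    toy = {"species_1": "A" * 800, "species_2": "A" * 600 + "C" * 200}
    print(pairwise_identity(*toy.values()))  # 0.75
    print(passes_criteria(toy))              # True
```

A real screen would also need to handle ambiguous bases and unequal sequence lengths; this toy version only shows how the two thresholds combine.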

As a result, we assembled a total of 305 NPCL candidate alignments, of which 120 contained the appropriate number of conserved blocks; these 120 were used to design nested PCR primers. To increase the success rates of our NPCL markers in amniotes, we manually added a turtle sequence (Chrysemys picta bellii) to each of the candidate alignments using data downloaded from the ENSEMBL database. A total of 480 primers were designed for the 120 NPCL candidates (four primers per candidate). Briefly, the first-round PCR primers are used only to enrich target regions from genomic environments and not to obtain target amplicons, so the degeneracy of these primers is normally high to increase reaction sensitivity; the second-round PCR primers are used to obtain target amplicons, so the

FIG. 6. Schematic representation of the experimental protocol (Laboratory Protocol) for using our NPCL toolkit. Note that for each NPCL, nested PCR primers are designed on four short conserved blocks flanking the target region.

(i) First PCR with primers F1 and R1, using gDNA as template. The target region is enriched from the complex genomic environment with one pair of highly degenerate primers. PCR was performed with 50–100 ng DNA in a 25 µl reaction. Cycling conditions: an initial denaturation step of 4 min at 94 °C, followed by 35 cycles of 94 °C for 45 s, 45 °C for 40 s, and 72 °C for 2 min, and a final extension at 72 °C for 10 min.

(ii) Second PCR with tailed primers F2 and R2, using the first-round PCR product as template. The target region is specifically amplified from the first-round PCR product with one pair of tailed, low-degeneracy primers. PCR was performed with 1 µl of first-round PCR product in a 25 µl reaction. Cycling conditions: an initial denaturation step of 4 min at 94 °C, followed by 35 cycles of 94 °C for 45 s, 50 °C for 40 s, and 72 °C for 90 s, and a final extension at 72 °C for 10 min.

(iii) PCR evaluation and sequencing. Agarose gel electrophoresis results are evaluated. If there is a single and strong band (normally >90% of reactions), the product is cleaned up with ExoI and FastAP and directly sequenced with the general sequencing primers Seq_F and Seq_R; otherwise, gel cutting or cloning is performed before sequencing. For cleanup, 25 µl PCR product is cleaned with 2 U ExoI and 0.4 U FastAP (37 °C for 30 min, then 80 °C for 15 min); the cleaned PCR product can be used for direct sequencing. A Sanger sequencing reaction is performed with 0.5 µl BigDye and 1 µl cleanup PCR product.

2244

Shen et al doi101093molbevmst122 MBE at V

anderbilt University - M

assey Law

Library on M

arch 18 2015httpm

beoxfordjournalsorgD

ownloaded from

degeneracy of these primers is lower to increase reactionspecificity Our previous study showed that the nested PCRmethod often produces strong and single amplicon bands(Shen et al 2012) To facilitate the next-step direct sequenc-ing we added a tail (50-AGGGTTTTCCCAGTCACGAC-30) tothe 50-end of all second-round forward primers and a tail (50-AGATAACAATTTCACACAGG-30) to the 50-end of all second-round reverse primers These tail sequences can provide twounique anchoring sites for direct sequencing from cleanedPCR products In our pilot experiments adding the tail se-quences to primers did not affect the efficiency of the second-round PCR
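The trade-off between primer degeneracy and specificity, and the tailing of second-round primers, can be made concrete with a short sketch. The two tail sequences are taken from the text; the example primer is hypothetical and is not one of the toolkit's primers. Degeneracy is computed here as the number of distinct sequences encoded by the IUPAC ambiguity codes, which is a common convention and an assumption about what the authors mean.

```python
from math import prod

# Number of bases each IUPAC nucleotide code can stand for.
IUPAC_DEGENERACY = {
    "A": 1, "C": 1, "G": 1, "T": 1,
    "R": 2, "Y": 2, "S": 2, "W": 2, "K": 2, "M": 2,
    "B": 3, "D": 3, "H": 3, "V": 3,
    "N": 4,
}

# Tail sequences quoted in the text (5'->3'); they double as Seq_F and Seq_R.
FORWARD_TAIL = "AGGGTTTTCCCAGTCACGAC"
REVERSE_TAIL = "AGATAACAATTTCACACAGG"


def degeneracy(primer: str) -> int:
    """Number of distinct sequences represented by a degenerate primer."""
    return prod(IUPAC_DEGENERACY[base] for base in primer.upper())


def add_tail(primer: str, tail: str) -> str:
    """Attach a sequencing tail to the 5' end of a second-round primer."""
    return tail + primer


# Hypothetical locus-specific primer, for illustration only.
example_f2 = "GGNTAYATGCARTGG"
tailed_f2 = add_tail(example_f2, FORWARD_TAIL)
```

Because the tail is non-degenerate, it does not change the degeneracy of the primer; it only adds a fixed anchoring site for the universal sequencing primer.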

Experimental Testing for Candidate Markers in 16 Jawed Vertebrates

To test the experimental performance of our newly designed NPCL markers, we selected 16 taxa representing nine major jawed vertebrate lineages: Chondrichthyes (Sphyrna lewini), Actinopterygii (Lepisosteus oculatus and Pangasius sutchi), Dipnoi (Protopterus annectens), Lissamphibia (Ichthyophis bannanicus, Batrachuperus yenyuanensis, and Rana nigromaculata), Mammalia (Mus musculus and Sus scrofa domestica), Testudines (Trionyx sinensis and Podocnemis unifilis), Aves (Struthio camelus and Zosterops japonica), Crocodylia (Crocodylus siamensis), and Squamata (Hemidactylus bowringii and Naja naja atra). Total genomic DNA was extracted from ethanol-preserved tissues (liver or muscle) using the standard salt extraction protocol. All extracted genomic DNAs were diluted to a concentration of 50 ng/µl with 1× TE and stored at −20 °C before PCR amplification.

All 120 NPCL markers were tested with a two-round PCR strategy (nested PCR). The first-round PCR was performed in a 25 µl reaction containing 1–2 µl template DNA (50–100 ng), with final concentrations of 1× PCR buffer, 200 µM dNTP, 400 nM of each forward and reverse first-round primer, and 1.25 U Taq polymerase (TransTaq High Fidelity; TransGen, Beijing). The cycling conditions of the first-round PCR were as follows: an initial denaturation step of 4 min at 94 °C, followed by 35 cycles of a 45 s denaturation at 94 °C, a 40 s annealing at 45 °C, and a 2 min extension at 72 °C, followed by a final 10 min extension at 72 °C. The second-round PCR was also performed in a 25 µl reaction containing 1 µl of the first-round PCR product (without dilution) and final concentrations of 1× PCR buffer, 200 µM dNTP, 400 nM of each forward and reverse second-round primer, and 1.25 U Taq polymerase. The cycling conditions of the second-round PCR were as follows: an initial denaturation step of 4 min at 94 °C, followed by 35 cycles of a 45 s denaturation at 94 °C, a 40 s annealing at 50 °C, and a 90 s extension at 72 °C, followed by a final 10 min extension at 72 °C.
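The two cycling programs differ only in annealing temperature and extension time, which is easy to see when they are written down as data. The sketch below encodes the programs from the text and sums their programmed hold times; the class and field names are hypothetical, and ramping time between temperatures is ignored, so actual run times on a thermocycler will be longer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CyclingProgram:
    initial_denaturation_s: int   # at 94 °C
    cycles: int
    denaturation_s: int           # at 94 °C
    annealing_s: int
    annealing_temp_c: int
    extension_s: int              # at 72 °C
    final_extension_s: int        # at 72 °C

    def hold_time_s(self) -> int:
        """Total programmed hold time in seconds (ramp times not included)."""
        per_cycle = self.denaturation_s + self.annealing_s + self.extension_s
        return (self.initial_denaturation_s
                + self.cycles * per_cycle
                + self.final_extension_s)


# Values as stated in the text.
FIRST_ROUND = CyclingProgram(240, 35, 45, 40, 45, 120, 600)
SECOND_ROUND = CyclingProgram(240, 35, 45, 40, 50, 90, 600)

for name, program in (("first round", FIRST_ROUND), ("second round", SECOND_ROUND)):
    print(f"{name}: {program.hold_time_s() / 60:.1f} min of programmed holds")
```

Under these assumptions the first round has 8,015 s of programmed holds and the second round 6,965 s, so the two rounds together take a little over four hours before ramping is counted.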

One microliter of the second-round PCR products was analyzed on a 1.0% TAE agarose gel. An NPCL marker was considered successful if more than 8 of the 16 tested taxa produced target amplicon bands. On this basis, 102 of the 120 tested NPCL markers were successful. The nested PCR primers for the 102 NPCL markers can be found in supplementary table S1, Supplementary Material online. If the PCR products contained significant nonspecific amplicon bands (normally <10% of reactions), they needed further processing, for example, standard gel cutting or cloning. If the PCR reactions produced single amplicon bands (normally >90% of reactions), they were cleaned with ExoFAP treatment: 2 U ExoI and 0.4 U FastAP (both Fermentas) were added to the PCR tube and incubated for 30 min at 37 °C and 15 min at 80 °C. The cleanup PCR reactions can be used directly as templates for Sanger sequencing. According to our experimental design, all PCR fragments can be sequenced from both ends with the two universal sequencing primers, Seq_F (5′-AGGGTTTTCCCAGTCACGAC-3′) and Seq_R (5′-AGATAACAATTTCACACAGG-3′). A typical Sanger sequencing reaction in our laboratory consumes 0.5 µl BigDye and 1 µl cleanup PCR product. The primer design strategy, the laboratory protocol for the nested PCR method, and the pretreatment of PCR products before Sanger sequencing are illustrated in figure 6.
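The success criterion and the downstream handling of each PCR product amount to a simple decision rule, sketched below. The three band categories follow the scoring used for the gel results (single target band, target band with significant nonspecific bands, no target band). It is an assumption of this sketch that a target band accompanied by nonspecific bands still counts toward the "more than 8 of 16 taxa" threshold; the names are hypothetical.

```python
from enum import Enum


class Band(Enum):
    SINGLE = "single target band"
    NONSPECIFIC = "target band with significant nonspecific bands"
    NONE = "no target band"


def marker_is_successful(results: list[Band], min_taxa: int = 8) -> bool:
    """A marker succeeds if MORE than `min_taxa` taxa show a target band.

    Assumption: bands with nonspecific products still count as target bands.
    """
    with_target = sum(1 for r in results if r is not Band.NONE)
    return with_target > min_taxa


def next_step(result: Band) -> str:
    """Processing route for one PCR product, following the protocol in the text."""
    if result is Band.SINGLE:
        return "ExoI/FastAP cleanup, then direct sequencing"
    if result is Band.NONSPECIFIC:
        return "gel cutting or cloning, then sequencing"
    return "no sequencing"
```

For example, a marker that amplifies in exactly 8 of the 16 taxa is not counted as successful, whereas one that amplifies in 9 is.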

Calculation of Relative Evolutionary Rate of 102 NPCLs

The rate multipliers (m) across partitions estimated in MrBayes 3.2 (Ronquist et al. 2012) are used as relative evolutionary rates. To calculate these parameters, alignments for each NPCL were prepared for 12 species: Homo sapiens, Macaca mulatta, Mus musculus, Rattus norvegicus, Gallus gallus, Meleagris gallopavo, Chrysemys picta bellii, Anolis carolinensis, Silurana tropicalis, Tetraodon nigroviridis, Takifugu rubripes, and Danio rerio. Because genome data are available for the 12 species, we did not generate any new data. The 102 NPCL alignments were then combined and subjected to MrBayes analyses partitioned by gene. Each gene was assigned a separate GTR + Γ + I model, and all model parameters were unlinked. Two Markov chain Monte Carlo (MCMC) runs were performed with one cold chain and three heated chains (temperature set to 0.1) for 50 million generations and sampled every 1,000 generations. The rate multiplier for each gene was estimated using Tracer version 1.4 after discarding the first 50% of the generations. All evolutionary rates were normalized by dividing by the maximum value of the obtained rates.
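The final normalization step described above (dividing every rate by the maximum rate) can be written in a few lines. The gene names and rate values in the usage example are hypothetical placeholders, not estimates from the study.

```python
def normalize_rates(rate_multipliers: dict[str, float]) -> dict[str, float]:
    """Scale per-gene rate multipliers so that the fastest gene has rate 1.0."""
    if not rate_multipliers:
        raise ValueError("no rates supplied")
    max_rate = max(rate_multipliers.values())
    if max_rate <= 0:
        raise ValueError("rate multipliers must be positive")
    return {gene: rate / max_rate for gene, rate in rate_multipliers.items()}


# Hypothetical rate multipliers for three genes, for illustration only.
example = normalize_rates({"geneA": 0.5, "geneB": 2.0, "geneC": 1.0})
```

After normalization all values lie in the interval (0, 1], so genes can be compared on a common relative scale.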

Gene and Taxon Sampling for Investigating Higher Level Salamander Relationships

To test the utility of our NPCL toolkit in a real case, we selected 19 salamander taxa representing all 10 salamander families and 9 outgroup taxa to investigate the family-level relationships of salamanders (supplementary table S2, Supplementary Material online). For gene sampling, we randomly selected 30 NPCL markers whose PCR success rates were more than 90% in the 16 previously tested vertebrates. Among the 840 target sequences (30 markers for 28 taxa), 201 were available in public databases (NCBI, UCSC, and ENSEMBL), whereas the remaining 639 sequences needed to be generated de novo. The experimental procedure was as described earlier. All obtained sequences were examined by checking for the presence of premature stop codons (which would indicate pseudogenes) and by BlastX searches against the nonredundant protein sequences (nr) database to confirm that they were our target genes. The NPCLs, species, and accession numbers for the newly obtained sequences are listed in supplementary table S2, Supplementary Material online.
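The premature stop codon check mentioned above can be sketched as follows. The sketch assumes the standard nuclear genetic code, an unaligned (gap-free) nucleotide sequence, and a known reading frame; because the NPCL fragments are internal exon regions, any in-frame stop codon is treated as premature. The function name is hypothetical and this is not the authors' script.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard nuclear genetic code


def has_premature_stop(sequence: str, frame: int = 0) -> bool:
    """Return True if an in-frame stop codon occurs in the given reading frame.

    `frame` is the 0-based offset (0, 1, or 2) of the first complete codon.
    """
    if frame not in (0, 1, 2):
        raise ValueError("frame must be 0, 1, or 2")
    seq = sequence.upper()
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return any(codon in STOP_CODONS for codon in codons)


# Sequence accounting from the text: 30 markers x 28 taxa, 201 already public.
TARGET_SEQUENCES = 30 * 28
NEW_SEQUENCES = TARGET_SEQUENCES - 201
```

A sequence flagged by this check would then be inspected by hand, since a stop codon can also result from a sequencing error or a wrong reading frame rather than from a pseudogene.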

Phylogenetic Analyses

Alignments of all 30 NPCL markers were conducted using the G-INS-i method in MAFFT (Katoh et al. 2005) under the default settings, according to their translated amino acid sequences, and then refined by eye. All 30 refined alignments were combined into a concatenated data set (27,834 bp).

For the concatenated data set, we manually defined five partitioning strategies: 2 partitions (one for codon positions 1 and 2 and one for codon position 3); 3 partitions (one partition for each codon position); 30 partitions (one partition for each gene); 60 partitions (one for codon positions 1 + 2 and one for codon position 3 in each of the 30 genes); and 90 partitions (one for each codon position in each of the 30 genes). Comparisons of the five partitioning strategies and selection of the corresponding nucleotide substitution models were conducted under the Bayesian information criterion implemented in PartitionFinder (Lanfear et al. 2012). The 3-partition scheme (one partition for each codon position) was chosen as the best-fitting partitioning strategy, and all 3 partitions favored the GTR + Γ + I model.
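The five partitioning strategies listed above are combinations of two choices: whether sites are split by gene, and how codon positions are grouped. The sketch below generates the site blocks for each strategy. It assumes that every gene alignment starts at codon position 1 and has a length divisible by 3; the gene coordinates in the usage example are hypothetical, and the output is a plain Python structure rather than the input format of any particular program.

```python
CODON_SCHEMES = {
    "none": [(1, 2, 3)],          # no codon-position split
    "12+3": [(1, 2), (3,)],       # positions 1 and 2 together, position 3 apart
    "1+2+3": [(1,), (2,), (3,)],  # each codon position apart
}


def build_partitions(genes: dict[str, tuple[int, int]],
                     by_gene: bool,
                     codon_scheme: str) -> dict[str, list[tuple[int, int, int]]]:
    """Map partition names to (start, end, step) site blocks, 1-based inclusive.

    `genes` maps each gene name to its (start, end) columns in the concatenated
    alignment. Assumption: each gene starts in frame at codon position 1.
    """
    partitions: dict[str, list[tuple[int, int, int]]] = {}
    for gene, (start, end) in genes.items():
        label = gene if by_gene else "all"
        for positions in CODON_SCHEMES[codon_scheme]:
            key = f"{label}_pos{''.join(str(p) for p in positions)}"
            blocks = partitions.setdefault(key, [])
            for pos in positions:
                blocks.append((start + pos - 1, end, 3))
    return partitions


# Hypothetical coordinates: 30 genes of 300 bp each.
genes30 = {f"gene{i + 1}": (i * 300 + 1, (i + 1) * 300) for i in range(30)}
counts = [
    len(build_partitions(genes30, False, "12+3")),   # 2 partitions
    len(build_partitions(genes30, False, "1+2+3")),  # 3 partitions
    len(build_partitions(genes30, True, "none")),    # 30 partitions
    len(build_partitions(genes30, True, "12+3")),    # 60 partitions
    len(build_partitions(genes30, True, "1+2+3")),   # 90 partitions
]
```

Each candidate scheme would then be scored by a model-selection criterion such as BIC, which penalizes the extra parameters introduced by finer partitioning.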

The concatenated data set was separately analyzed with both ML and Bayesian inference (BI) methods under the 3-partition scheme. Partitioned ML analyses were implemented using RAxML version 7.2.6 (Stamatakis 2006) with the GTR + Γ + I model assigned to each partition. A search that combined 100 separate ML searches was applied to find the optimal tree, and branch support for each node was evaluated with 500 standard bootstrapping replicates (-f d -b 500 option) implemented in RAxML. The partitioned BI was conducted using MrBayes 3.2 (Ronquist et al. 2012). All model parameters were unlinked. Two MCMC runs were performed with one cold chain and three heated chains (temperature set to 0.1) for 60 million generations and sampled every 1,000 generations. Chain stationarity was visualized by plotting −ln L against the generation number using Tracer version 1.4, and the first 50% of generations were discarded. Topologies and posterior probabilities were estimated from the remaining generations. Two runs for each analysis were compared for congruence.

We also performed Bayesian phylogenetic analyses under the mixture model CAT + GTR + Γ4 in PhyloBayes 3.3 (Lartillot et al. 2009) with two independent MCMC runs. Each run was performed for 10,000 cycles and sampled every cycle. Stationarity was considered to be reached when the largest discrepancy (maxdiff) between the two independent runs was less than 0.1. The first 5,000 trees in each MCMC run were discarded. The remaining 10,000 trees of the two runs were sampled every 5 trees to generate a 50% majority-rule posterior consensus tree.
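The burn-in and thinning settings in the two Bayesian analyses determine how many samples contribute to the consensus trees, and the arithmetic can be checked with a small helper. The function name is hypothetical. For the MrBayes runs, the sample count is approximated as generations divided by sampling interval; the program may also record the starting state, which this sketch ignores.

```python
def retained_samples(n_samples: int, burnin: int, thin: int = 1) -> int:
    """Samples kept after dropping the first `burnin` and keeping every `thin`-th."""
    if not 0 <= burnin <= n_samples:
        raise ValueError("burnin must lie between 0 and n_samples")
    if thin < 1:
        raise ValueError("thin must be a positive integer")
    return len(range(burnin, n_samples, thin))


# PhyloBayes: 10,000 cycles per run, first 5,000 discarded, every 5th tree kept.
phylobayes_trees = 2 * retained_samples(10_000, 5_000, thin=5)

# MrBayes: 60 million generations sampled every 1,000, first 50% discarded.
mrbayes_samples_per_run = 60_000_000 // 1_000
mrbayes_kept_per_run = retained_samples(mrbayes_samples_per_run,
                                        mrbayes_samples_per_run // 2)
```

Under these assumptions, the PhyloBayes consensus is built from 2,000 trees in total, and each MrBayes run contributes about 30,000 post-burn-in samples.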

Species tree estimation was conducted using the pseudo-ML approach in the program MP-EST (Liu et al. 2010) under the coalescent model. Briefly, the gene trees for the 30 NPCL markers were reconstructed with ML under the GTR + Γ8 model using PhyML 3.0 (Guindon et al. 2010). The resulting 30 gene trees were rooted with an outgroup (Chrysemys picta bellii) and used to generate a species tree in MP-EST. The robustness of the species tree was evaluated with 500 nonparametric bootstrap replicates.

Alternative phylogenetic hypotheses were tested based on the 30-gene data set. We first calculated sitewise log-likelihoods of the alternative trees using RAxML (-f g) under the GTR + Γ + I model. The site log-likelihood file was then used to estimate P values for each alternative tree in the CONSEL program (Shimodaira and Hasegawa 2001) using the Kishino–Hasegawa (KH) test (Kishino and Hasegawa 1989), the approximately unbiased (AU) test (Shimodaira 2002), and the RELL bootstrap proportion (BP) test.
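Of the three tests named above, the RELL bootstrap proportion is the simplest to illustrate: site log-likelihoods are resampled with replacement, the resampled values are summed for each tree, and the proportion of replicates in which a tree has the highest total is its BP. The sketch below shows only this idea; it is not CONSEL's implementation, it does not cover the KH or AU tests, and the function name and input layout are assumptions.

```python
import random


def rell_bootstrap_proportions(site_lnl: dict[str, list[float]],
                               replicates: int = 10_000,
                               seed: int = 0) -> dict[str, float]:
    """RELL bootstrap proportion for each tree from per-site log-likelihoods.

    `site_lnl` maps a tree name to its list of site log-likelihoods; all lists
    must cover the same sites in the same order. Ties go to the first tree.
    """
    names = list(site_lnl)
    if not names:
        raise ValueError("no trees supplied")
    n_sites = len(site_lnl[names[0]])
    if n_sites == 0 or any(len(site_lnl[n]) != n_sites for n in names):
        raise ValueError("all trees need the same nonzero number of sites")

    rng = random.Random(seed)
    wins = dict.fromkeys(names, 0)
    for _ in range(replicates):
        # Resample site indices with replacement; reuse them for every tree.
        sample = [rng.randrange(n_sites) for _ in range(n_sites)]
        totals = {n: sum(site_lnl[n][i] for i in sample) for n in names}
        best = max(names, key=totals.__getitem__)
        wins[best] += 1
    return {n: wins[n] / replicates for n in names}
```

The key property of RELL is that no tree is re-optimized on the resampled data; only the already computed site log-likelihoods are reused, which is why the sitewise values from the ML program are sufficient input.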

Supplementary Material

Supplementary tables S1 and S2 and figures S1 and S2 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).

Acknowledgments

The authors are grateful to David Wake and Carol Spencer of the Museum of Vertebrate Zoology at the University of California, Berkeley, for tissue loans. David Wake provided many useful comments on the manuscript. This work was supported by National Natural Science Foundation of China grants (31172075 and 30900136) and the National Science Fund for Excellent Young Scholars (no. pending).

References

Binladen J, Gilbert MTP, Bollback JP, Panitz F, Bendixen C, Nielsen R, Willerslev E. 2007. The use of coded PCR primers enables high-throughput sequencing of multiple homolog amplification products by 454 parallel sequencing. PLoS One 2:e197.

Crawford NG, Faircloth BC, McCormack JE, Brumfield RT, Winker K, Glenn TC. 2012. More than 1000 ultraconserved elements provide evidence that turtles are the sister group to archosaurs. Biol Lett. 8:783–786.

Delsuc F, Brinkmann H, Philippe H. 2005. Phylogenomics and the reconstruction of the tree of life. Nat Rev Genet. 6:361–375.

Duellman WE, Trueb L. 1994. Biology of amphibians. Baltimore (MD): Johns Hopkins University Press.

Dunn CW, Hejnol A, Matus DQ, et al. (18 co-authors). 2008. Broad phylogenomic sampling improves resolution of the animal tree of life. Nature 452:745–749.

Faircloth BC, Glenn TC. 2012. Not all sequence tags are created equal: designing and validating sequence identification tags robust to indels. PLoS One 7:e42543.

Faircloth BC, McCormack JE, Crawford NG, Brumfield RT, Glenn TC. 2012. Ultraconserved elements anchor thousands of genetic markers for target enrichment spanning multiple evolutionary timescales. Syst Biol. 61:717–726.

Fong JJ, Brown JM, Fujita MK, Boussau B. 2012. A phylogenomic approach to vertebrate phylogeny supports a turtle-archosaur affinity and a possible paraphyletic lissamphibia. PLoS One 7:e48990.

Fong JJ, Fujita MK. 2011. Evaluating phylogenetic informativeness and data-type usage for new protein-coding genes across Vertebrata. Mol Phylogenet Evol. 61:300–307.

Frost DR, Grant T, Faivovich J, et al. (18 co-authors). 2006. The amphibian tree of life. Bull Am Mus Nat Hist. 297:8–370.


Gao KQ, Shubin NH. 2001. Late Jurassic salamanders from northern China. Nature 410:574–577.

Guindon S, Dufayard J-F, Lefort V, Anisimova M, Hordijk W, Gascuel O. 2010. New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Syst Biol. 59:307–321.

Hugall AF, Foster R, Lee MSY. 2007. Calibration choice, rate smoothing, and the pattern of tetrapod diversification according to the long nuclear gene RAG-1. Syst Biol. 56:543–563.

Inoue JG, Miya M, Lam K, Tay BH, Danks JA, Bell J, Walker TI, Venkatesh B. 2010. Evolutionary origin and phylogeny of the modern holocephalans (Chondrichthyes: Chimaeriformes): a mitogenomic perspective. Mol Biol Evol. 27:2576–2586.

Katoh K, Kuma K, Toh H, Miyata T. 2005. MAFFT version 5: improvement in accuracy of multiple sequence alignment. Nucleic Acids Res. 33:511–518.

Kishino H, Hasegawa M. 1989. Evaluation of the maximum likelihood estimate of the evolutionary tree topologies from DNA sequence data, and the branching order in Hominoidea. J Mol Evol. 29:170–179.

Künstner A, Wolf JBW, Backström N, et al. (13 co-authors). 2010. Comparative genomics based on massive parallel transcriptome sequencing reveals patterns of substitution and selection across 10 bird species. Mol Ecol. 19:266–276.

Lanfear R, Calcott B, Ho SYW, Guindon S. 2012. PartitionFinder: combined selection of partitioning schemes and substitution models for phylogenetic analyses. Mol Biol Evol. 29:1695–1701.

Lartillot N, Lepage T, Blanquart S. 2009. PhyloBayes 3: a Bayesian software package for phylogenetic reconstruction and molecular dating. Bioinformatics 25:2286–2288.

Lemmon AR, Brown JM, Stanger-Hall K, Lemmon EM. 2009. The effect of ambiguous data on phylogenetic estimates obtained by maximum likelihood and Bayesian inference. Syst Biol. 58:130–145.

Lemmon AR, Emme SA, Lemmon EM. 2012. Anchored hybrid enrichment for massively high-throughput phylogenomics. Syst Biol. 61:727–744.

Li C, Ortí G, Zhang G, Lu G. 2007. A practical approach to phylogenomics: the phylogeny of ray-finned fish (Actinopterygii) as a case study. BMC Evol Biol. 7:44.

Li C, Riethoven JJ, Naylor GJ. 2012. EvolMarkers: a database for mining exon and intron markers for evolution, ecology and conservation studies. Mol Ecol Resour. 12:967–971.

Liu L, Yu L, Edwards SV. 2010. A maximum pseudo-likelihood approach for estimating species trees under the coalescent model. BMC Evol Biol. 10:302.

McCormack JE, Faircloth BC, Crawford NG, Gowaty PA, Brumfield RT, Glenn TC. 2012. Ultraconserved elements are novel phylogenomic markers that resolve placental mammal phylogeny when combined with species-tree analysis. Genome Res. 22:746–754.

McCormack JE, Hird SM, Zellmer AJ, Carstens BC, Brumfield RT. 2013. Applications of next-generation sequencing to phylogeography and phylogenetics. Mol Phylogenet Evol. 66:526–538.

Meyer A, Van de Peer Y. 2005. From 2R to 3R: evidence for a fish-specific genome duplication (FSGD). Bioessays 27:937–945.

Meyer M, Stenzel U, Hofreiter M. 2008. Parallel tagged sequencing on the 454 platform. Nat Protoc. 3:267–278.

Murphy WJ, Eizirik E, Johnson WE, Zhang YP, Ryder OA, O'Brien SJ. 2001. Molecular phylogenetics and the origins of placental mammals. Nature 409:614–618.

Philippe H, Derelle R, Lopez P, et al. (20 co-authors). 2009. Phylogenomics revives traditional views on deep animal relationships. Curr Biol. 19:706–712.

Philippe H, Telford M. 2006. Large-scale sequencing and the new animal phylogeny. Trends Ecol Evol. 21:614–620.

Portik DM, Wood PL, Grismer JL, Stanley EL, Jackman TR. 2011. Identification of 104 rapidly-evolving nuclear protein-coding markers for amplification across scaled reptiles using genomic resources. Conserv Genet Resour. 4:1–10.

Pyron RA, Wiens JJ. 2011. A large-scale phylogeny of Amphibia including over 2800 species, and a revised classification of extant frogs, salamanders, and caecilians. Mol Phylogenet Evol. 61:543–583.

Roelants K, Gower DJ, Wilkinson M, Loader SP, Biju SD, Guillaume K, Moriau L, Bossuyt F. 2007. Global patterns of diversification in the history of modern amphibians. Proc Natl Acad Sci U S A. 104:887–892.

Rokas A, Williams BL, King N, Carroll SB. 2003. Genome-scale approaches to resolving incongruence in molecular phylogenies. Nature 425:798–804.

Ronquist F, Teslenko M, Van der Mark P, Ayres DL, Darling A, Höhna S, Larget B, Liu L, Suchard MA, Huelsenbeck JP. 2012. MrBayes 3.2: efficient Bayesian phylogenetic inference and model choice across a large model space. Syst Biol. 61:539–542.

Roure B, Baurain D, Philippe H. 2013. Impact of missing data on phylogenies inferred from empirical phylogenomic data sets. Mol Biol Evol. 30:197–214.

San Mauro D. 2010. A multilocus timescale for the origin of extant amphibians. Mol Phylogenet Evol. 56:554–561.

San Mauro D, Vences M, Alcobendas M, Zardoya R, Meyer A. 2005. Initial diversification of living amphibians predated the breakup of Pangaea. Am Nat. 165:590–599.

Shen XX, Liang D, Wen JZ, Zhang P. 2011. Multiple genome alignments facilitate development of NPCL markers: a case study of tetrapod phylogeny focusing on the position of turtles. Mol Biol Evol. 28:3237–3252.

Shen XX, Liang D, Zhang P. 2012. The development of three long universal nuclear protein-coding locus markers and their application to osteichthyan phylogenetics with nested PCR. PLoS One 7:e39256.

Shimodaira H. 2002. An approximately unbiased test of phylogenetic tree selection. Syst Biol. 51:492–508.

Shimodaira H, Hasegawa M. 2001. CONSEL: for assessing the confidence of phylogenetic tree selection. Bioinformatics 17:1246–1247.

Song S, Liu L, Edwards SV, Wu S. 2012. Resolving conflict in eutherian mammal phylogeny using phylogenomics and the multispecies coalescent model. Proc Natl Acad Sci U S A. 109:14942–14947.

Spinks PQ, Thomson RC, Barley AJ, Newman CE, Shaffer HB. 2010. Testing avian, squamate, and mammalian nuclear markers for cross amplification in turtles. Conserv Genet Resour. 2:127–129.

Spinks PQ, Thomson RC, Lovely GA, Shaffer HB. 2009. Assessing what is needed to resolve a molecular phylogeny: simulations and empirical data from Emydid turtles. BMC Evol Biol. 9:56.

Stamatakis A. 2006. RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics 22:2688–2690.

Tewhey R, Warner JB, Nakano M, Libby B, Medkova M, David PH, Kotsopoulos SK, Samuels ML, Hutchison JB, Larson JW. 2009. Microdroplet-based PCR enrichment for large-scale targeted sequencing. Nat Biotechnol. 27:1025–1031.

Thomson RC, Shedlock AM, Edwards SV, Shaffer HB. 2008. Developing markers for multilocus phylogenetics in non-model organisms: a test case with turtles. Mol Phylogenet Evol. 49:514–525.

Thomson RC, Wang IJ, Johnson JR. 2010. Genome-enabled development of DNA markers for ecology, evolution and conservation. Mol Ecol. 19:2184–2195.

Townsend TM, Alegre RE, Kelley ST, Wiens JJ, Reeder TW. 2008. Rapid development of multiple nuclear loci for phylogenetic analysis using genomic resources: an example from squamate reptiles. Mol Phylogenet Evol. 47:129–142.

Weisrock DW, Harmon LJ, Larson A. 2005. Resolving deep phylogenetic relationships in salamanders: analyses of mitochondrial and nuclear genomic DNA. Syst Biol. 54:758–777.

Wiens J, Bonett R, Chippindale P. 2005. Ontogeny discombobulates phylogeny: paedomorphosis and higher-level salamander relationships. Syst Biol. 54:91–110.


Wright TF, Schirtzinger EE, Matsumoto T, et al. (11 co-authors). 2008. A multilocus molecular phylogeny of the parrots (Psittaciformes): support for a Gondwanan origin during the Cretaceous. Mol Biol Evol. 25:2141–2156.

Zhang P, Wake DB. 2009. Higher-level salamander relationships and divergence dates inferred from complete mitochondrial genomes. Mol Phylogenet Evol. 53:492–508.

Zhang P, Zhou H, Chen YQ, Liu YF, Qu LH. 2005. Mitogenomic perspectives on the origin and phylogeny of living amphibians. Syst Biol. 54:391–400.

Zhou XM, Xu SX, Zhang P, Yang G. 2011. Developing a series of conservative anchor markers and their application to phylogenomics of Laurasiatherian mammals. Mol Ecol Resour. 11:134–140.


FIG. 2. PCR performance of the 102 NPCL markers in 16 divergent vertebrate species. Each square and electrophoretic lane is aligned with the tested species. (a) The draft divergence timescale (Time, Ma; 0 to 457) for the 16 tested vertebrate species is based on Inoue et al. (2010) and the book The Timetree of Life. (b) The PCR performance of each NPCL marker is ranked by three different-colored squares: black, producing a single target band; gray, having a target band but with significant nonspecific bands; white, no target band. The 102 NPCL markers are sorted according to their PCR success rates and shown in three blocks of 34 NPCLs × 16 taxa. (c) Three typical agarose electrophoresis results for 9 NPCL markers (CAND1, HYP, IRS1, FAT2, CELSR3, RERE, SVEP1, DNAH3, and GRM2), with size markers at 4,500, 1,200, and 200 bp.


30 nuclear genes provide further support for the monophyly of lissamphibians and the Batrachia hypothesis (fig. 4). Additionally, all possible hypotheses against the monophyly of extant amphibians and the Batrachia hypothesis were rejected in our topological tests (table 2). However, the Batrachia hypothesis did not receive strong support in our species tree analysis (BP_MP-EST = 74%; fig. 4), suggesting that more nuclear genes are still needed to resolve this node.

The monophyly of the internally fertilizing salamanders (Salamandroidea; all salamanders exclusive of Hynobiidae, Cryptobranchidae, and Sirenidae) is strongly supported in our analyses (fig. 4), in line with most recent molecular studies (Wiens et al. 2005; Roelants et al. 2007; Zhang and Wake 2009; Pyron and Wiens 2011) but differing strongly from Frost et al. (2006), who recovered a clade comprising Sirenidae, Dicamptodontidae, Ambystomatidae, and Salamandridae. The internally fertilizing salamanders include two well-supported clades: one is composed of Ambystomatidae, Dicamptodontidae, and Salamandridae, and the other of Proteidae, Rhyacotritonidae, Amphiumidae, and Plethodontidae (fig. 4).

Currently, two hypotheses have been proposed regarding the basal split within living salamanders. The traditional view favors Sirenidae as the sister group to all remaining salamanders (Duellman and Trueb 1994). This hypothesis received strong support in two recent studies (based on mitochondrial genomes, BP_ML = 98%, Zhang and Wake 2009; based on mitochondrial genomes and nuclear genes, BP_ML > 80%, San Mauro 2010). In contrast, some studies suggest that the basal split separates Cryptobranchidae + Hynobiidae from all other salamanders (Gao and Shubin 2001; Wiens et al. 2005; Frost et al. 2006; Roelants et al. 2007; Pyron and Wiens 2011), but always without strong support (BP_ML < 71%). Our phylogenetic analyses based on 30 independent NPCLs supported the second hypothesis, that Cryptobranchoidea (Cryptobranchidae + Hynobiidae) branched first within the living salamanders. This result is extremely robust in our concatenation analyses (BP_ML = 99%, PP_BAY = 1.0, PP_CAT = 1.0; fig. 4), and all alternative hypotheses are statistically rejected (table 2). In the species tree analysis without data concatenation, this result is also well supported (BP_MP-EST = 83%; fig. 4).

How many nuclear genes, then, are needed to robustly resolve the question of the basal split within living salamanders? Our analysis of data subsets indicates a progressive increase in the bootstrap support value for the node of interest (fig. 4) when an increasing number of genes are analyzed (fig. 5). Analyses based on single genes rarely resolve the node of interest with any confidence. Analyses based on 5–10 genes produce bootstrap support values of 60–80% in concatenation analyses (fig. 5), which is congruent with all previous nuclear studies using similar-sized data sets (Roelants et al. 2007; Pyron and Wiens 2011). Taking a bootstrap value of 95% in concatenation analyses as the threshold of “fully resolved,” the minimum number of nuclear genes needed to resolve the root of the salamander tree is approximately 25. The previous contradictory results may be due to the overwhelmingly strong signals from the mitochondrial genome. Because the initial diversification of salamanders occurred within a relatively short window of time (Weisrock et al. 2005), the genealogical histories of individual gene loci may sometimes appear misleading in terms of the relationships among species due to incomplete lineage sorting. Unfortunately, the mitochondrial genome recorded such an incorrect history.

FIG. 3. Relative evolutionary rates of 102 NPCLs in vertebrates. The 102 NPCLs are arranged in order of increasing variability on the right side (axis: relative evolutionary rate), and their PCR success rates in the 16 tested vertebrates are shown on the left side (axis: PCR success rate in 16 vertebrates, %). NPCLs indicated with asterisks may have extra copies in teleost genomes and thus are not suitable for phylogenetic studies of teleosts. NPCL labels, in the order extracted from the figure: NTN1, ZIC1, PCDH10, BPTF, KBTBD2, DBC1, LINGO1, ARID2, FUT9, VCPIP1, B3GALT1, MED1, LPHN2, GGPS1, SACS, PANX2, CASR, PDP1, CAND1, LRRN3, LINGO2, MGAT4C, FLRT3, KIAA1239, ENC1, SLITRK1, ZFPM2, ZEB1, FZD4, MIOS, MYCBP2, DSEL, ZBED4, LRRC8D, MB21D2, SETBP1, DOPEY1, SALL1, MED13, LRRTM4, DISP1, RAG1, DCHS1, NHS, EXTL3, SOCS5, ROR2, TTN, RP2, RERE, ANKRD50, FREM2, HYP, GRM2, CXCR4, DSE, GLCE, PCLO, VPS18, GRIN3A, ADNP, GPER, FAT4, STON2, LRRN1, ARSI, IRS1, PIK3CG, P2RY1, PCDH1, KCNF1, DMXL1, RAG2, PPL, FAT2, LIG4, EXOC8, DET1, FILIP1, WFIKKN2, ZHX2, CILP, CELSR3, SYNE1, LCT, TLR7, CPT2, CHST1, FEM1B, DNAH3, SH3BP4, DOLK, FICD, SPEN, KIAA2013, SVEP1, DISP2, REV3L, FAT1, MSH6, EVPL, MOS.

Discussion

The NPCL toolkit and experimental protocol introduced here provide a highly reliable, rapid, and cost-effective method for building medium-scale multilocus data sets to produce high-resolution phylogenetic relationships. This phylogenomic approach has the potential to accelerate the completion of many parts of the vertebrate tree of life because it requires no further marker development, which is often the bottleneck in phylogenetic research. Once a specific phylogenetic question within vertebrates arises, researchers simply need to check the list for our toolkit and look for NPCL markers with the expected evolutionary rates and experimental performance for their groups of interest. Many orthologous loci can then be obtained quickly by traditional PCR and Sanger sequencing, usually without time-consuming gel cutting and cloning. Applying the NPCL toolkit may also have another benefit for assembling the vertebrate tree of life: because people working on different groups can easily use the same set of loci, combined analyses will be facilitated.

Merits of the Toolkit

Because of the use of the nested PCR strategy outlined earlier, most NPCLs in the toolkit work for all major jawed vertebrate groups with high experimental success rates (normally >95%). Such results were achieved under unified PCR conditions without any extra effort to optimize cycling conditions. This feature of the toolkit enables it to surpass previously developed nuclear marker sets (Murphy et al. 2001; Li et al. 2007; Thomson et al. 2008; Townsend et al. 2008; Wright et al. 2008; Portik et al. 2011; Shen et al. 2011; Zhou et al. 2011). Most previous nuclear marker sets were developed for specific animal groups, and their application to

Table 1. Summary Information for the 30 NPCL Amplified in 19 Salamander Taxa.

Gene      Length (bp)  Taxa Amplified (%)  Directly Sequenced (%)  GC (%)  Var. Sites (%)  PI Sites (%)  Mean P Distance  RCV
BPTF      552          19 (100)            19 (100)                43      163 (30)        118 (21)      0.098            0.093
CAND1     1155         19 (100)            17 (89)                 44      403 (35)        314 (27)      0.116            0.065
DET1      711          19 (100)            18 (95)                 46      275 (39)        216 (30)      0.131            0.091
DISP1     960          19 (100)            19 (100)                41      317 (33)        211 (22)      0.096            0.072
DNAH3     918          19 (100)            19 (100)                42      389 (42)        304 (33)      0.139            0.049
DOLK      672          16 (84)             12 (75)                 52      316 (47)        236 (35)      0.173            0.126
DSEL      1266         19 (100)            19 (100)                44      546 (43)        415 (33)      0.148            0.055
ENC1      1083         19 (100)            19 (100)                51      363 (34)        279 (26)      0.120            0.057
EXTL3     1245         19 (100)            17 (89)                 47      465 (37)        322 (26)      0.118            0.067
FAT4      738          19 (100)            19 (100)                45      344 (47)        249 (34)      0.156            0.072
FICD      510          18 (95)             18 (100)                44      169 (33)        124 (24)      0.111            0.074
GRM2      690          18 (95)             18 (100)                54      240 (35)        176 (26)      0.115            0.118
HYP       1260         19 (100)            19 (100)                47      516 (41)        359 (28)      0.122            0.049
KBTBD2    1059         19 (100)            19 (100)                44      406 (38)        246 (23)      0.103            0.040
KCNF1     735          19 (100)            19 (100)                52      294 (40)        220 (30)      0.151            0.153
KIAA1239  1362         19 (100)            19 (100)                42      479 (35)        338 (25)      0.110            0.048
KIAA2013  516          19 (100)            19 (100)                52      221 (43)        178 (34)      0.148            0.093
LIG4      1017         19 (100)            19 (100)                39      434 (43)        301 (30)      0.137            0.043
LPHN2     573          19 (100)            19 (100)                47      192 (34)        140 (24)      0.106            0.108
LRRN1     837          19 (100)            18 (95)                 49      345 (41)        240 (29)      0.130            0.119
MGAT4C    747          18 (95)             18 (100)                42      273 (37)        212 (28)      0.126            0.079
MIOS      879          19 (100)            19 (100)                45      291 (33)        213 (24)      0.107            0.065
PANX2     717          19 (100)            19 (100)                44      254 (35)        199 (28)      0.125            0.059
PDP1      1035         19 (100)            19 (100)                45      348 (34)        261 (25)      0.110            0.080
PPL       1338         19 (100)            17 (89)                 47      645 (48)        485 (36)      0.156            0.064
RAG1      1380         19 (100)            19 (100)                51      550 (40)        438 (32)      0.146            0.072
RAG2      792          19 (100)            18 (95)                 49      406 (51)        310 (39)      0.184            0.090
SACS      1101         19 (100)            19 (100)                40      383 (35)        282 (26)      0.105            0.040
TTN       984          19 (100)            5 (26)                  43      378 (38)        269 (27)      0.125            0.052
ZBED4     1002         19 (100)            19 (100)                39      325 (32)        224 (22)      0.093            0.040

NOTE: Length, length of refined alignment; Directly Sequenced, PCR products directly sequenced; Var. sites, variable sites; PI sites, parsimony-informative sites; Mean P Distance, overall mean P distance; RCV, relative composition variability.

Shen et al. doi:10.1093/molbev/mst122 MBE

[Figure 4: phylogenetic tree of 19 salamanders representing 10 families (Cryptobranchoidea and Salamandroidea labeled), with Anura, Gymnophiona, and non-amphibian outgroups; 30 nuclear genes (total 27,834 bp); scale bar, 0.1 substitutions/site; support values printed at two nodes, 99/1.0/1.0/83 and 99/1.0/1.0/74.]

FIG. 4. Higher-level phylogenetic relationships of 10 salamander families inferred from 30 NPCL markers. The tree was inferred by concatenation analyses using ML, BI, and the mixture model (CAT) and by species-tree analysis using the pseudo-ML approach (MP-EST). Branch support values are indicated beside nodes in the order ML bootstrap (BPML), BI posterior probability (PPBI), CAT posterior probability (PPCAT), and MP-EST bootstrap (BPMP-EST), from left to right. The filled squares represent BPML > 95%, PPBI = 1.0, PPCAT = 1.0, and BPMP-EST > 95%. The circled number refers to the node of interest studied in figure 5. Branch lengths are from the ML analysis.

Table 2. Statistical Confidence (P Values) for Alternative Branching Hypotheses Based on the 30-Gene Data Set.

Alternative Topology Tested                                      ΔLn L   P Value (AU)  P Value (BP)  P Value (KH)  Rejection (AU BP KH)
Best ML                                                          0       0.993         0.97          0.98
Sirenidae branched earlier                                       32.8    0.025         0.015         0.02          + + +
Sirenidae is sister to Cryptobranchoidea                         42.3    0.004         0.002         0.004         + + +
Gymnophiona is sister to Anura (monophyletic lissamphibians)     34.3    0.013         0.012         0.013         + + +
Gymnophiona is sister to Caudata (monophyletic lissamphibians)   43.9    0.002         0.001         0.001         + + +
Anura is sister to Amniota (paraphyletic lissamphibians)         144.6   5E−30         0             0             + + +
Gymnophiona is sister to Amniota (paraphyletic lissamphibians)   129.0   1E−69         0             0             + + +
Caudata is sister to Amniota (paraphyletic lissamphibians)       172.8   0.0001        0             0             + + +


other distantly related groups is usually difficult. For example, Spinks et al. (2010) collected 120 nuclear markers from avian, squamate, and mammalian phylogenetic studies and evaluated their PCR performance in turtles. They found that only eight nuclear markers successfully produced single expected bands across 13 tested turtle species. In another case, Fong and Fujita (2011) developed 75 nuclear markers for vertebrate phylogenetics, but approximately 60% of the target fragments could not be obtained in three test species (two reptiles and one lissamphibian). Therefore, although the nested PCR method introduced here requires an additional PCR reaction, the extra work is still worthwhile.

In PCR-based phylogenetic projects, even when the PCR reactions are successful, the products often contain significant nonspecific amplicons. Such products require gel purification and cloning, which take much more time than the PCR reaction itself. Our NPCL toolkit is specifically designed to solve this problem, so that normally over 90% of PCR reactions produce strong, single bands of the expected size. Moreover, most of the primers used to date for nuclear marker sets are degenerate and thus are not suitable for directly sequencing PCR products. With our nested PCR strategy, we introduce anchoring sequences at the ends of PCR fragments while maintaining PCR efficiency. These anchoring sequences bring the added benefit that two specific sequencing primers (Seq_F and Seq_R) can be used in all Sanger sequencing reactions.

One additional feature of our NPCL toolkit is that the average length of its NPCLs is 1,050 bp, a length that can easily be amplified in one PCR reaction and sequenced in both directions, allowing efficient use of resources. In contrast, the average marker lengths are 920 bp for 10 NPCLs in Li et al. (2007), 760 bp for 26 NPCLs in Townsend et al. (2008), 873 bp for 22 NPCLs in Shen et al. (2011), and 470 bp for 75 NPCLs in Fong and Fujita (2011). Longer markers provide more sites than shorter ones for equivalent money and time. This feature makes our NPCL toolkit more cost-effective than previously developed nuclear marker sets.

Phylogenetic Utility

The vertebrate NPCL toolkit developed here shows great promise in terms of phylogenetic utility. A remarkable feature of the toolkit is that it provides 102 NPCLs with a broad range of evolutionary rates. In our demonstration, we used 30 NPCLs to resolve a family-level salamander phylogeny using both traditional concatenation analyses and a more promising species-tree analysis. However, this example does not mean that our toolkit performs well only on deep-timescale questions. Our ongoing study using this toolkit to resolve the relationships within Plethodontidae, a rapidly radiating group of salamanders, suggests that the toolkit also performs well in resolving species-level phylogenies. For many vertebrate groups in which applicable nuclear markers are limited, such as some teleosts, frogs, and salamanders, our NPCL toolkit can provide a one-stop solution for phylogenetic studies from the family level to the species level. Even for groups in which specific nuclear marker sets have been developed, our toolkit is still worth trying, as many more loci can be easily obtained that may resolve some difficult branches.

The Toolkit Is a Good Addition to Sequence Capture Approaches

Recently, sequence capture approaches have been applied to vertebrate phylogenomics (Crawford et al. 2012; Faircloth et al. 2012; Lemmon et al. 2012; McCormack et al. 2012). These approaches begin with the selective capture of genomic regions. Briefly, fragmented gDNA is hybridized to DNA or RNA probes, either on an array or in solution. Nontargeted DNA is then washed away, and the targeted DNA is sequenced through NGS. The most promising feature of the sequence capture approach is that it can simultaneously produce hundreds to thousands of loci for tens of individuals within a relatively short time. The sequence capture approach is therefore considered much more cost-effective than the PCR-based method. According to the calculation of Lemmon et al. (2012), for a project of 100 taxa × 500 loci, the cost of the sequence capture method is just 1–3.5% of that of the PCR-based method.

However, the sequence capture approach is currently too challenging for most phylogenetic researchers. Typical NGS runs (454 or Illumina) used by the sequence capture method generate 1,000,000–2,000,000,000 sequences. Storing and processing these NGS data require significant computer memory, hardware upgrades, and bioinformatic programming skills, which are unfamiliar to most phylogenetic researchers. Moreover, phylogenetic reconstruction assumes that orthologous genes are being analyzed across species. For the PCR-based method, the detection of paralogous genes is relatively straightforward. In the sequence capture method, however, the captured genomic regions comprise short conserved cores (probe regions) and long unconserved flanking

[Figure 5: line plot; x-axis, number of genes (5 to 30); y-axis, bootstrap support (%), 0 to 100; series, concatenation analyses and species-tree analyses.]

FIG. 5. The effect of increasing the number of nuclear loci on resolving the basal split within salamanders. Each data point represents the mean of support values estimated from 30 randomly sampled subsets. The dashed line indicates the threshold of 95% bootstrap support. The plots show that the minimum number of nuclear loci needed to robustly resolve the basal split within salamanders is 25.


sequences. Because paralogy cannot be detected until after the data are aligned, those unalignable sequences will make the detection of paralogy more difficult.

In fact, not every phylogenetic project will use more than 500 loci, as the sequence capture method normally does. Based on both empirical and simulation data, 20–50 loci are generally sufficient to answer many phylogenetic questions (Rokas et al. 2003; Spinks et al. 2009). This is also the number of loci that most phylogenetic studies use. In such a situation, adopting the sequence capture method is not cost-effective because researchers need to use relatively expensive NGS sequencing and spend time learning new experimental techniques and carrying out sophisticated bioinformatic processing. Our NPCL toolkit is specially designed for such medium-scale phylogenetic projects using approximately 50 loci, a number that is easily met with our 102 NPCLs. Because more than 90% of the PCR reactions generated by our toolkit can be directly sequenced, the average cost for one locus per sample is rather low. In our laboratory, generating one new sequence typically costs US$3 (without considering labor).

In addition, researchers sometimes have only tiny amounts of DNA but wish to perform a multilocus phylogenetic analysis. In such a situation, the sequence capture method is difficult to implement because it normally requires DNA at the microgram level (Lemmon et al. 2012). Our NPCL toolkit can fill this gap. Because of the nested PCR strategy, the sensitivity of the PCR reactions in our method is extremely high. In many test experiments in our laboratory, the toolkit and protocol produced target bands with only 5–10 ng of DNA.

Our NPCL toolkit is an alternative to the sequence capture method for the everyday work of phylogenetic researchers. Which method to choose depends on two major drivers: the amount of DNA and the expected number of loci. When DNA is limited, PCR may be the better solution; otherwise, sequence capture also works. Taking into account the money and time the two methods require, we speculate that the economic transition point from PCR to sequence capture is at approximately 100 loci. That assessment is why our toolkit includes 102 NPCL markers. Our proposal is that for projects using ≤100 loci, one can try our NPCL toolkit; for projects using >100 loci, sequence capture should be used.
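The rule of thumb in this paragraph can be written down as a small decision helper. The sketch below is only an illustration: the function names are hypothetical, the 1,000 ng threshold is an assumed stand-in for "DNA at the microgram level", and the 100-locus transition point and the US$3 per sequence figure are the authors' own rough estimates rather than fixed constants.

```python
def suggest_method(n_loci: int, dna_ng: float,
                   capture_min_dna_ng: float = 1000.0,
                   loci_transition: int = 100) -> str:
    """Suggest an approach from the two drivers named in the text.

    capture_min_dna_ng is an assumed threshold standing in for the
    "microgram level" of DNA that sequence capture normally needs.
    """
    if dna_ng < capture_min_dna_ng:
        # Limited DNA: nested PCR worked with as little as 5-10 ng.
        return "nested PCR"
    return "nested PCR" if n_loci <= loci_transition else "sequence capture"


def sanger_cost_usd(n_sequences: int, cost_per_sequence: float = 3.0) -> float:
    """Consumables cost (labor excluded) at the quoted US$3 per new sequence."""
    return n_sequences * cost_per_sequence


if __name__ == "__main__":
    print(suggest_method(n_loci=30, dna_ng=10))
    print(suggest_method(n_loci=500, dna_ng=5000))
    print(sanger_cost_usd(639))
```

The cost helper simply multiplies the number of newly generated sequences by the per-sequence figure; any real budget would also need to count failed reactions and labor.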

Future Directions

In this study, we used multiple genome alignments deposited in the University of California, Santa Cruz (UCSC) Genome Browser to identify long and conserved exons across jawed vertebrates. Because of the nested PCR strategy, the developed NPCLs performed stably in all major jawed vertebrate groups. Recently, a database for mining exon and intron markers, called EvolMarkers, has been built by Li et al. (2012). Careful investigation of this database may identify many conserved exons within nonvertebrates, whose interrelationships are currently more problematic than those of vertebrates. Because the nonvertebrates comprise many distantly related groups, it may be impossible to develop a single set of PCR primers for all of them. However, following a similar marker development strategy, multiple NPCL toolkits could be constructed for various groups of nonvertebrates, such as arthropods, echinoderms, and molluscs. In addition, because introns are flanked by conserved exons, the idea of using nested PCR for marker development could also be applied to the development of EPIC (exon-primed intron crossing) markers, which are more suitable for shallow-scale phylogenetic or phylogeographic projects.

Despite the benefits of our proposed method, it must be recognized that for large-scale projects, such as 200 taxa × 100 loci, the use of our toolkit and Sanger sequencing will still require significant cost, time, and labor. An alternative solution is to replace Sanger sequencing with NGS. Recently, 454 NGS technology has been applied to sequence targeted gene regions from a pool of PCR products from different specimens (Binladen et al. 2007; Meyer et al. 2008). In such experiments, specific tagging sequences must be added to amplicons by either PCR (Binladen et al. 2007) or blunt-end ligation (Meyer et al. 2008). Therefore, if the tailing sequences of the second-round PCR primers in our NPCL toolkit are replaced by tagging sequences (for tag design, see Faircloth and Glenn 2012), all PCR products can be pooled and sequenced with 454 NGS, which will greatly reduce the money and time cost compared with Sanger sequencing. However, parallel tagged sequencing via NGS does not circumvent the PCR for each individual at each locus, which may be the most onerous part of a large-scale phylogenomic project. Some promising new technologies may help to solve this problem, such as microdroplet PCR (Tewhey et al. 2009), in which millions of individual PCR reactions are performed simultaneously in picoliter-scale droplets, and the 96.96 Dynamic Array by Fluidigm, which allows 96 primer combinations to be used on 96 samples (9,216 total PCR reactions) on a single PCR plate. However, there has been little research on applying NGS and new high-throughput PCR technologies to phylogenomics, so their ease of use and cost-effectiveness still need to be explored.

Summary

In conclusion, we have developed an improved method for rapidly amplifying and sequencing NPCLs that has proven to be useful and effective for molecular phylogenetic studies of vertebrates. The newly developed toolkit provides an attractive alternative to available methods for vertebrate phylogenomics.

Materials and Methods

Development of NPCL and Primer Design

Our previous study showed that nested PCR is overwhelmingly more effective than conventional PCR for obtaining target amplicons from complex genomic environments (Shen et al. 2012). However, nested PCR requires four conserved regions to design two pairs of primers (illustrated in fig. 6; yellow blocks represent the conserved regions used for primer design), which means that only relatively long exons are


suitable as candidates for NPCL development with the nested PCR method. To search for long and conserved exons, we took advantage of our previous bioinformatic method, which uses the multiple genome alignment data from the UCSC Genome Browser to identify conserved exons (Shen et al. 2011). Because the NPCL markers are to be used in vertebrates, we focused only on those multiple genome alignments that include at least six species: Danio rerio (zebrafish), Silurana tropicalis (frog), Anolis carolinensis (lizard), Gallus gallus (chicken), Mus musculus (mouse), and Homo sapiens (human). The alignments of candidate exons had to meet two criteria: a length of more than 700 bp and pairwise similarity ranging from 35% to 90%. The detailed bioinformatic pipeline has been described elsewhere (Shen et al. 2011). In addition to using multiple genome alignments to screen NPCL candidates, we also manually searched the ENSEMBL database for nuclear genes that were used previously (Murphy et al. 2001; Li et al. 2007; Townsend et al. 2008; Wright et al. 2008; Zhou et al. 2011; Song et al. 2012) to check whether they contain large and appropriately conserved exons.
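The two screening criteria can be illustrated with a minimal filter. This is a sketch under stated assumptions, not the pipeline of Shen et al. (2011): the function names are hypothetical, similarity is taken as percent identity over columns where neither sequence has a gap, and the 35–90% window is assumed to apply to every pair of species in the alignment.

```python
from itertools import combinations


def pairwise_similarity(a: str, b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    matches = sum(1 for x, y in pairs if x == y)
    return 100.0 * matches / len(pairs)


def passes_criteria(alignment: dict, min_len: int = 700,
                    sim_range: tuple = (35.0, 90.0)) -> bool:
    """True if the exon alignment is longer than min_len and every
    pairwise similarity falls inside sim_range (an assumed reading
    of the criterion in the text)."""
    seqs = list(alignment.values())
    if len(seqs[0]) <= min_len:
        return False
    low, high = sim_range
    return all(low <= pairwise_similarity(a, b) <= high
               for a, b in combinations(seqs, 2))


if __name__ == "__main__":
    # Toy alignment: 800 columns, second sequence differs at half of them.
    toy = {"species_a": "A" * 800, "species_b": "A" * 400 + "C" * 400}
    print(passes_criteria(toy))
```

Applying the window to all pairs is the strictest reading; a pipeline could instead use the mean or the most divergent pair, which the text does not specify.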

As a result, we assembled a total of 305 NPCL candidate alignments, of which 120 contained the appropriate number of conserved blocks; these 120 were used to design nested PCR primers. To increase the success rates of our NPCL markers in amniotes, we manually added a turtle sequence (Chrysemys picta bellii) to each of the candidate alignments using data downloaded from the ENSEMBL database. A total of 480 primers were designed for the 120 NPCL candidates. Briefly, the first-round PCR primers are used only to enrich target regions from genomic environments and not to obtain target amplicons, so the degeneracy of these primers is normally high to increase reaction sensitivity; the second-round PCR primers are used to obtain target amplicons, so the

[Figure 6: schematic in three panels.
(i) Primer design. Primers F1, F2, R2, and R1 sit on four conserved blocks flanking the target region; the tailed primers F2 and R2 carry the anchoring sites for Seq_F and Seq_R.
(ii) Laboratory protocol. First PCR with primers F1 and R1, using gDNA as template: the target region is enriched from the complex genomic environment with one pair of highly degenerate primers. PCR was performed with 50–100 ng DNA in a 25 µl reaction. Cycling conditions: an initial denaturation step of 4 min at 94 °C, followed by 35 cycles of 94 °C for 45 s, 45 °C for 40 s, and 72 °C for 2 min, and a final extension at 72 °C for 10 min. Second PCR with tailed primers F2 and R2, using the first PCR product as template: the target region is specifically amplified with one pair of tailed, low-degeneracy primers. PCR was performed with 1 µl of first PCR product in a 25 µl reaction. Cycling conditions: an initial denaturation step of 4 min at 94 °C, followed by 35 cycles of 94 °C for 45 s, 50 °C for 40 s, and 72 °C for 90 s, and a final extension at 72 °C for 10 min.
(iii) PCR evaluation and sequencing. Agarose gel electrophoresis results are evaluated. Single and strong band (normally >90%): 25 µl of PCR product is cleaned with 2 U ExoI and 0.4 U FastAP (37 °C for 30 min, then 80 °C for 15 min) and sequenced directly with the general sequencing primers Seq_F and Seq_R; a Sanger sequencing reaction is performed with 0.5 µl BigDye and 1 µl cleaned PCR product. Otherwise: gel cutting or cloning, then sequencing.]

FIG. 6. Schematic representation of the experimental protocol for using our NPCL toolkit. Note that for each NPCL, nested PCR primers are designed on four short conserved blocks flanking the target region.


degeneracy of these primers is lower to increase reaction specificity. Our previous study showed that the nested PCR method often produces strong and single amplicon bands (Shen et al. 2012). To facilitate the subsequent direct sequencing, we added a tail (5′-AGGGTTTTCCCAGTCACGAC-3′) to the 5′ end of all second-round forward primers and a tail (5′-AGATAACAATTTCACACAGG-3′) to the 5′ end of all second-round reverse primers. These tail sequences provide two unique anchoring sites for direct sequencing from cleaned PCR products. In our pilot experiments, adding the tail sequences to the primers did not affect the efficiency of the second-round PCR.

Experimental Testing for Candidate Markers in 16 Jawed Vertebrates
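As an illustration of the tailing step, the sketch below prepends the two tail sequences quoted in the text to a pair of second-round primers. The tails come from the text; the locus-specific primers in the example are invented placeholders, not primers from supplementary table S1, and the function names are hypothetical.

```python
# Tail sequences quoted in the text (written 5' to 3').
FORWARD_TAIL = "AGGGTTTTCCCAGTCACGAC"  # anchoring site for Seq_F
REVERSE_TAIL = "AGATAACAATTTCACACAGG"  # anchoring site for Seq_R


def add_tail(primer: str, tail: str) -> str:
    """Attach the tail to the 5' end of a primer written 5' to 3'."""
    return tail + primer.upper()


def tail_primer_pair(f2: str, r2: str) -> tuple:
    """Return the tailed second-round forward and reverse primers."""
    return add_tail(f2, FORWARD_TAIL), add_tail(r2, REVERSE_TAIL)


if __name__ == "__main__":
    # Hypothetical degenerate locus-specific primers, for illustration only.
    f2_tailed, r2_tailed = tail_primer_pair("GCNATHGARTGYAAYGG",
                                            "CCRTTRCAYTCDATNGC")
    print(f2_tailed)
    print(r2_tailed)
```

Because every amplicon then starts and ends with the same two tails, a single pair of sequencing primers matching those tails suffices for all loci.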

To test the experimental performance of our newly designed NPCL markers, we selected 16 taxa representing nine major jawed vertebrate lineages: Chondrichthyes (Sphyrna lewini), Actinopterygii (Lepisosteus oculatus and Pangasius sutchi), Dipnoi (Protopterus annectens), Lissamphibia (Ichthyophis bannanicus, Batrachuperus yenyuanensis, and Rana nigromaculata), Mammalia (Mus musculus and Sus scrofa domestica), Testudines (Trionyx sinensis and Podocnemis unifilis), Aves (Struthio camelus and Zosterops japonica), Crocodylia (Crocodylus siamensis), and Squamata (Hemidactylus bowringii and Naja naja atra). Total genomic DNA was extracted from ethanol-preserved tissues (liver or muscle) using the standard salt extraction protocol. All extracted genomic DNAs were diluted to a concentration of 50 ng/µl with 1× TE and stored at −20 °C before PCR amplification.

All 120 NPCL markers were tested with a two-round PCR strategy (nested PCR). The first-round PCR was performed in a 25 µl reaction containing 1–2 µl template DNA (50–100 ng), with final concentrations of 1× PCR buffer, 200 µM dNTP, 400 nM of each forward and reverse first-round primer, and 1.25 U Taq polymerase (TransTaq High Fidelity; TransGen, Beijing). The cycling conditions of the first-round PCR were as follows: an initial denaturation step of 4 min at 94 °C, followed by 35 cycles of a 45 s denaturation at 94 °C, a 40 s annealing at 45 °C, and a 2 min extension at 72 °C, followed by a final 10 min extension at 72 °C. The second-round PCR was also performed in a 25 µl reaction, containing 1 µl of the first-round PCR product (without dilution) and final concentrations of 1× PCR buffer, 200 µM dNTP, 400 nM of each forward and reverse second-round primer, and 1.25 U Taq polymerase. The cycling conditions of the second-round PCR were as follows: an initial denaturation step of 4 min at 94 °C, followed by 35 cycles of a 45 s denaturation at 94 °C, a 40 s annealing at 50 °C, and a 90 s extension at 72 °C, followed by a final 10 min extension at 72 °C.
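The two cycling programs differ only in annealing temperature and extension time, which is easy to see when they are written as data. The sketch below encodes the step times given in the text (a 4 min initial denaturation, a 45 s denaturation and 40 s annealing per cycle) and sums the programmed hold times; the class and field names are hypothetical, and ramping time between temperatures is ignored, so real runs take longer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CyclingProgram:
    initial_denaturation_s: int
    cycles: int
    denaturation_s: int
    annealing_s: int
    annealing_temp_c: int
    extension_s: int
    final_extension_s: int

    def hold_time_s(self) -> int:
        """Sum of programmed hold times, excluding temperature ramping."""
        per_cycle = self.denaturation_s + self.annealing_s + self.extension_s
        return (self.initial_denaturation_s
                + self.cycles * per_cycle
                + self.final_extension_s)


# First round: annealing at 45 degrees C, 2 min extension.
FIRST_ROUND = CyclingProgram(initial_denaturation_s=240, cycles=35,
                             denaturation_s=45, annealing_s=40,
                             annealing_temp_c=45, extension_s=120,
                             final_extension_s=600)

# Second round: annealing at 50 degrees C, 90 s extension.
SECOND_ROUND = CyclingProgram(initial_denaturation_s=240, cycles=35,
                              denaturation_s=45, annealing_s=40,
                              annealing_temp_c=50, extension_s=90,
                              final_extension_s=600)

if __name__ == "__main__":
    for name, prog in (("first", FIRST_ROUND), ("second", SECOND_ROUND)):
        print(name, round(prog.hold_time_s() / 60, 1), "min")
```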

One microliter of the second-round PCR products was analyzed on a 1.0% TAE agarose gel. An NPCL marker was considered successful if more than 8 of the 16 tested taxa produced target amplicon bands. On this basis, 102 of the 120 tested NPCL markers were successful. The nested PCR primers for the 102 NPCL markers can be found in supplementary table S1, Supplementary Material online. If the PCR products contained significant nonspecific amplicon bands (normally <10%), they needed further processing, for example, standard gel cutting or cloning. If the PCR reactions produced single amplicon bands (normally >90%), they were cleaned with ExoFAP treatment: 2 U ExoI and 0.4 U FastAP (all Fermentas) were added to the PCR tube and incubated for 30 min at 37 °C and 15 min at 80 °C. The cleaned PCR reactions can be used directly as templates for Sanger sequencing. According to our experimental design, all PCR fragments can be sequenced from both ends with the two universal sequencing primers, Seq_F (5′-AGGGTTTTCCCAGTCACGAC-3′) and Seq_R (5′-AGATAACAATTTCACACAGG-3′). A typical Sanger sequencing reaction in our laboratory consumes 0.5 µl BigDye and 1 µl cleaned PCR product. The primer design strategy, the laboratory protocol for the nested PCR method, and the pretreatment of PCR products before Sanger sequencing are illustrated in figure 6.

Calculation of Relative Evolutionary Rate of 102 NPCLs
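The success criterion for a marker (target bands in more than 8 of the 16 tested taxa) amounts to a simple count over gel results. A minimal sketch, assuming the gel outcomes are recorded as one boolean per taxon; the function names and the marker names in the example are hypothetical.

```python
def marker_is_successful(bands: list, min_taxa: int = 8) -> bool:
    """A marker passes if MORE than min_taxa taxa gave a target band."""
    return sum(bool(b) for b in bands) > min_taxa


def summarize(results: dict) -> dict:
    """Map each marker name to its pass/fail status."""
    return {marker: marker_is_successful(bands)
            for marker, bands in results.items()}


if __name__ == "__main__":
    # Hypothetical gel results for two markers across 16 taxa.
    gel = {
        "marker_A": [True] * 15 + [False],      # 15 of 16 taxa amplified
        "marker_B": [True] * 8 + [False] * 8,   # exactly 8: not enough
    }
    print(summarize(gel))
```

Note that the threshold is strict: a marker that amplifies in exactly 8 taxa does not count as successful.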

The rate multipliers (m) across partitions estimated in MrBayes 3.2 (Ronquist et al. 2012) were used as relative evolutionary rates. To calculate these parameters, alignments for each NPCL were prepared for 12 species: Homo sapiens, Macaca mulatta, Mus musculus, Rattus norvegicus, Gallus gallus, Meleagris gallopavo, Chrysemys picta bellii, Anolis carolinensis, Silurana tropicalis, Tetraodon nigroviridis, Takifugu rubripes, and Danio rerio. Because genome data are available for these 12 species, we did not generate any new data. The 102 NPCL alignments were then combined and subjected to MrBayes analyses partitioned by gene. Each gene was assigned a separate GTR + Γ + I model, and all model parameters were unlinked. Two Markov chain Monte Carlo (MCMC) runs were performed with one cold chain and three heated chains (temperature set to 0.1) for 50 million generations and sampled every 1,000 generations. The rate multiplier for each gene was estimated using Tracer version 1.4 after discarding the first 50% of the generations. All evolutionary rates were normalized by dividing by the maximum value of the obtained rates.

Gene and Taxon Sampling for Investigating Higher Level Salamander Relationships
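The final normalization step divides every rate multiplier by the largest one, so that the fastest locus has a relative rate of 1. A minimal sketch follows; the function name is hypothetical and the multiplier values in the example are invented, not estimates from this study.

```python
def normalize_rates(rate_multipliers: dict) -> dict:
    """Scale per-gene rate multipliers so the maximum equals 1.0."""
    max_rate = max(rate_multipliers.values())
    if max_rate <= 0:
        raise ValueError("rate multipliers must be positive")
    return {gene: rate / max_rate for gene, rate in rate_multipliers.items()}


if __name__ == "__main__":
    # Invented multipliers for three placeholder genes.
    example = {"gene_1": 0.5, "gene_2": 2.0, "gene_3": 1.0}
    print(normalize_rates(example))
```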

To test the utility of our NPCL toolkit in a real case, we selected 19 salamander taxa representing all 10 salamander families, together with 9 outgroup taxa, to investigate the family-level relationships of salamanders (supplementary table S2, Supplementary Material online). For gene sampling, we randomly selected 30 NPCL markers whose PCR success rates were more than 90% in the 16 previously tested vertebrates. Among the 840 target sequences (30 markers for 28 taxa), 201 were available in public databases (NCBI, UCSC, and ENSEMBL), whereas the remaining 639 sequences needed to be generated de novo. The experimental procedure was as described earlier. All obtained sequences were examined by checking for the presence of premature stop codons (indicating pseudogenes) and by BlastX searches against the nonredundant


protein sequences (nr) to confirm that they were our target genes. The NPCLs, species, and accession numbers for the newly obtained sequences are listed in supplementary table S2, Supplementary Material online.
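The stop-codon screen can be sketched as a scan of each sequence in its reading frame. This is an illustration rather than the authors' script: it assumes the standard nuclear genetic code, assumes the reading frame of each fragment is known, and treats any in-frame stop as suspicious because the amplified fragments lie inside exons. The function name is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard nuclear genetic code


def has_inframe_stop(sequence: str, frame: int = 0) -> bool:
    """True if the sequence contains a stop codon in the given frame (0, 1, or 2).

    Alignment gaps are removed first; incomplete trailing codons are ignored.
    """
    seq = sequence.upper().replace("-", "")
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return any(codon in STOP_CODONS for codon in codons)


if __name__ == "__main__":
    print(has_inframe_stop("ATGAAATTTGGG"))  # no stop in frame 0
    print(has_inframe_stop("ATGTAATTTGGG"))  # TAA in frame 0
```

A sequence flagged this way would still need the BlastX check described above, since a frame error in the scan can also produce spurious stops.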

Phylogenetic Analyses

Alignments of all 30 NPCL markers were conducted using the G-INS-i method in MAFFT (Katoh et al. 2005) under the default settings, according to their translated amino acid sequences, and then refined by eye. All 30 refined alignments were combined into a concatenated data set (27,834 bp).
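Concatenation of per-gene alignments into a supermatrix can be sketched as follows. The function is a hypothetical illustration, not the tool used in the study; it assumes each gene alignment is a mapping from taxon name to an aligned sequence, and it pads taxa that lack a gene with "?" so every row has the same length.

```python
def concatenate(alignments: list) -> dict:
    """Join per-gene alignments (taxon -> aligned sequence) end to end.

    Taxa missing from a gene are filled with '?' for that gene's length.
    """
    taxa = sorted({taxon for aln in alignments for taxon in aln})
    parts = {taxon: [] for taxon in taxa}
    for aln in alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within a gene must be equal length")
        gene_length = lengths.pop()
        for taxon in taxa:
            parts[taxon].append(aln.get(taxon, "?" * gene_length))
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}


if __name__ == "__main__":
    gene1 = {"taxon_a": "ATGAAA", "taxon_b": "ATGAAG"}
    gene2 = {"taxon_a": "CCC"}  # taxon_b was not sequenced for this gene
    print(concatenate([gene1, gene2]))
```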

For the concatenated data set, we manually defined five partitioning strategies: 2 partitions (one for codon positions 1 and 2 and one for codon position 3), 3 partitions (one partition for each codon position), 30 partitions (one partition for each gene), 60 partitions (one for codon positions 1 + 2 and one for codon position 3 in each of the 30 genes), and 90 partitions (one for each codon position in each of the 30 genes). Comparisons of the five partitioning strategies and selection of the corresponding nucleotide substitution models were conducted under the Bayesian information criterion implemented in PartitionFinder (Lanfear et al. 2012). The 3-partition scheme (one partition for each codon position) was chosen as the best-fitting partitioning strategy, and all 3 partitions favored the GTR + Γ + I model.
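The 90-partition scheme (codon positions within each gene) can be generated mechanically from the gene lengths. The sketch below writes the definitions in the `DNA, name = start-end\3` style used by RAxML-type partition files; the function name is hypothetical, the exact file format expected by a given program should be checked against its manual, and the code assumes every gene alignment starts at codon position 1 and has a length divisible by 3 (true of the lengths in Table 1).

```python
def codon_partitions(gene_lengths: dict) -> list:
    """Return one partition line per codon position per gene.

    gene_lengths maps gene name -> alignment length, in concatenation order.
    """
    lines = []
    start = 1
    for gene, length in gene_lengths.items():
        if length % 3 != 0:
            raise ValueError(f"{gene}: length {length} is not a multiple of 3")
        end = start + length - 1
        for pos in range(3):
            # "\3" means every third site, starting at the given column.
            lines.append(f"DNA, {gene}_pos{pos + 1} = {start + pos}-{end}\\3")
        start = end + 1
    return lines


if __name__ == "__main__":
    # First two genes of Table 1, in table order.
    for line in codon_partitions({"BPTF": 552, "CAND1": 1155}):
        print(line)
```

The 3-partition scheme that was selected corresponds to the same idea applied once to the whole 27,834 bp alignment instead of gene by gene.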

The concatenated data set was analyzed separately with both ML and Bayesian inference (BI) methods under the 3-partition scheme. Partitioned ML analyses were implemented using RAxML version 7.2.6 (Stamatakis 2006) with the GTR + Γ + I model assigned to each partition. A search that combined 100 separate ML searches was applied to find the optimal tree, and branch support for each node was evaluated with 500 standard bootstrapping replicates (-f d -b 500 option) implemented in RAxML. The partitioned BI was conducted using MrBayes 3.2 (Ronquist et al. 2012). All model parameters were unlinked. Two MCMC runs were performed with one cold chain and three heated chains (temperature set to 0.1) for 60 million generations and sampled every 1,000 generations. Chain stationarity was visualized by plotting −ln L against the generation number using Tracer version 1.4, and the first 50% of generations were discarded. Topologies and posterior probabilities were estimated from the remaining generations. The two runs for each analysis were compared for congruence.

We also performed Bayesian phylogenetic analyses under the mixture model CAT + GTR + Γ4 in PhyloBayes 3.3 (Lartillot et al. 2009) with two independent MCMC runs. Each run was performed for 10,000 cycles and sampled every cycle. Stationarity was considered reached when the largest discrepancy (maxdiff) between the two independent runs was less than 0.1. The first 5,000 trees in each MCMC run were discarded. The remaining 10,000 trees of the two runs were sampled every 5 trees to generate a 50% majority-rule posterior consensus tree.
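The burn-in and thinning arithmetic in this paragraph can be checked with a short sketch. The function is a hypothetical illustration operating on plain lists that stand in for sampled trees; it is not part of PhyloBayes.

```python
def pool_and_thin(runs: list, burnin: int, every: int) -> list:
    """Drop the first `burnin` samples of each run, pool the rest,
    and keep every `every`-th pooled sample."""
    pooled = [sample for run in runs for sample in run[burnin:]]
    return pooled[::every]


if __name__ == "__main__":
    # Two runs of 10,000 sampled cycles each, represented by integers.
    run_1 = list(range(10_000))
    run_2 = list(range(10_000))
    kept = pool_and_thin([run_1, run_2], burnin=5_000, every=5)
    print(len(kept))
```

With the settings in the text, 5,000 trees remain per run after burn-in, 10,000 after pooling, and thinning by 5 leaves 2,000 trees for the consensus.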

Species tree estimation was conducted using the pseudo-ML approach in the program MP-EST (Liu et al. 2010) under the coalescent model. Briefly, the gene trees for the 30 NPCL markers were reconstructed with ML under the GTR + Γ8 model using PhyML 3.0 (Guindon et al. 2010). The resulting 30 gene trees were rooted with an outgroup (Chrysemys picta bellii) and used to generate a species tree with MP-EST. The robustness of the species tree was evaluated with 500 nonparametric bootstrap replicates.

Alternative phylogenetic hypotheses were tested based on the 30-gene data set. We first calculated sitewise log likelihoods of the alternative trees using RAxML (-f g) under the GTR + Γ + I model. The site log-likelihood file was then used to estimate P values for each alternative tree in the CONSEL program (Shimodaira and Hasegawa 2001) using the Kishino–Hasegawa test (KH) (Kishino and Hasegawa 1989), the approximately unbiased test (AU) (Shimodaira 2002), and the RELL bootstrap proportion test (BP).

Supplementary Material

Supplementary tables S1 and S2 and figures S1 and S2 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).

Acknowledgments

The authors are grateful to David Wake and Carol Spencer of the Museum of Vertebrate Zoology at the University of California, Berkeley, for tissue loans. David Wake provided many useful comments on the manuscript. This work was supported by National Natural Science Foundation of China grants (31172075 and 30900136) and the National Science Fund for Excellent Young Scholars (no. pending).

References

Binladen J, Gilbert MTP, Bollback JP, Panitz F, Bendixen C, Nielsen R, Willerslev E. 2007. The use of coded PCR primers enables high-throughput sequencing of multiple homolog amplification products by 454 parallel sequencing. PLoS One 2:e197.

Crawford NG, Faircloth BC, McCormack JE, Brumfield RT, Winker K, Glenn TC. 2012. More than 1000 ultraconserved elements provide evidence that turtles are the sister group to archosaurs. Biol Lett. 8:783–786.

Delsuc F, Brinkmann H, Philippe H. 2005. Phylogenomics and the reconstruction of the tree of life. Nat Rev Genet. 6:361–375.

Duellman WE, Trueb L. 1994. Biology of amphibians. Baltimore (MD): Johns Hopkins University Press.

Dunn CW, Hejnol A, Matus DQ, et al. (18 co-authors). 2008. Broad phylogenomic sampling improves resolution of the animal tree of life. Nature 452:745–749.

Faircloth BC, Glenn TC. 2012. Not all sequence tags are created equal: designing and validating sequence identification tags robust to indels. PLoS One 7:e42543.

Faircloth BC, McCormack JE, Crawford NG, Brumfield RT, Glenn TC. 2012. Ultraconserved elements anchor thousands of genetic markers for target enrichment spanning multiple evolutionary timescales. Syst Biol. 61:717–726.

Fong JJ, Brown JM, Fujita MK, Boussau B. 2012. A phylogenomic approach to vertebrate phylogeny supports a turtle-archosaur affinity and a possible paraphyletic Lissamphibia. PLoS One 7:e48990.

Fong JJ, Fujita MK. 2011. Evaluating phylogenetic informativeness and data-type usage for new protein-coding genes across Vertebrata. Mol Phylogenet Evol. 61:300–307.

Frost DR, Grant T, Faivovich J, et al. (18 co-authors). 2006. The amphibian tree of life. Bull Am Mus Nat Hist. 297:8–370.


Gao KQ, Shubin NH. 2001. Late Jurassic salamanders from northern China. Nature 410:574–577.

Guindon S, Dufayard J-F, Lefort V, Anisimova M, Hordijk W, Gascuel O. 2010. New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Syst Biol 59:307–321.

Hugall AF, Foster R, Lee MSY. 2007. Calibration choice, rate smoothing, and the pattern of tetrapod diversification according to the long nuclear gene RAG-1. Syst Biol 56:543–563.

Inoue JG, Miya M, Lam K, Tay BH, Danks JA, Bell J, Walker TI, Venkatesh B. 2010. Evolutionary origin and phylogeny of the modern holocephalans (Chondrichthyes: Chimaeriformes): a mitogenomic perspective. Mol Biol Evol 27:2576–2586.

Katoh K, Kuma K, Toh H, Miyata T. 2005. MAFFT version 5: improvement in accuracy of multiple sequence alignment. Nucleic Acids Res 33:511–518.

Kishino H, Hasegawa M. 1989. Evaluation of the maximum likelihood estimate of the evolutionary tree topologies from DNA sequence data, and the branching order in Hominoidea. J Mol Evol 29:170–179.

Künstner A, Wolf JBW, Backström N, et al. (13 co-authors). 2010. Comparative genomics based on massive parallel transcriptome sequencing reveals patterns of substitution and selection across 10 bird species. Mol Ecol 19:266–276.

Lanfear R, Calcott B, Ho SYW, Guindon S. 2012. Partitionfinder: combined selection of partitioning schemes and substitution models for phylogenetic analyses. Mol Biol Evol 29:1695–1701.

Lartillot N, Lepage T, Blanquart S. 2009. PhyloBayes 3: a Bayesian software package for phylogenetic reconstruction and molecular dating. Bioinformatics 25:2286–2288.

Lemmon AR, Brown JM, Stanger-Hall K, Lemmon EM. 2009. The effect of ambiguous data on phylogenetic estimates obtained by maximum likelihood and Bayesian inference. Syst Biol 58:130–145.

Lemmon AR, Emme SA, Lemmon EM. 2012. Anchored hybrid enrichment for massively high-throughput phylogenomics. Syst Biol 61:727–744.

Li C, Ortí G, Zhang G, Lu G. 2007. A practical approach to phylogenomics: the phylogeny of ray-finned fish (Actinopterygii) as a case study. BMC Evol Biol 7:44.

Li C, Riethoven JJ, Naylor GJ. 2012. EvolMarkers: a database for mining exon and intron markers for evolution, ecology and conservation studies. Mol Ecol Resour 12:967–971.

Liu L, Yu L, Edwards SV. 2010. A maximum pseudo-likelihood approach for estimating species trees under the coalescent model. BMC Evol Biol 10:302.

McCormack JE, Faircloth BC, Crawford NG, Gowaty PA, Brumfield RT, Glenn TC. 2012. Ultraconserved elements are novel phylogenomic markers that resolve placental mammal phylogeny when combined with species-tree analysis. Genome Res 22:746–754.

McCormack JE, Hird SM, Zellmer AJ, Carstens BC, Brumfield RT. 2013. Applications of next-generation sequencing to phylogeography and phylogenetics. Mol Phylogenet Evol 66:526–538.

Meyer A, Van de Peer Y. 2005. From 2R to 3R: evidence for a fish-specific genome duplication (FSGD). Bioessays 27:937–945.

Meyer M, Stenzel U, Hofreiter M. 2008. Parallel tagged sequencing on the 454 platform. Nat Protoc 3:267–278.

Murphy WJ, Eizirik E, Johnson WE, Zhang YP, Ryder OA, O'Brien SJ. 2001. Molecular phylogenetics and the origins of placental mammals. Nature 409:614–618.

Philippe H, Derelle R, Lopez P, et al. (20 co-authors). 2009. Phylogenomics revives traditional views on deep animal relationships. Curr Biol 19:706–712.

Philippe H, Telford M. 2006. Large-scale sequencing and the new animal phylogeny. Trends Ecol Evol 21:614–620.

Portik DM, Wood PL, Grismer JL, Stanley EL, Jackman TR. 2011. Identification of 104 rapidly-evolving nuclear protein-coding markers for amplification across scaled reptiles using genomic resources. Conserv Genet Resour 4:1–10.

Pyron RA, Wiens JJ. 2011. A large-scale phylogeny of Amphibia including over 2800 species, and a revised classification of extant frogs, salamanders, and caecilians. Mol Phylogenet Evol 61:543–583.

Roelants K, Gower DJ, Wilkinson M, Loader SP, Biju SD, Guillaume K, Moriau L, Bossuyt F. 2007. Global patterns of diversification in the history of modern amphibians. Proc Natl Acad Sci U S A 104:887–892.

Rokas A, Williams BL, King N, Carroll SB. 2003. Genome-scale approaches to resolving incongruence in molecular phylogenies. Nature 425:798–804.

Ronquist F, Teslenko M, Van der Mark P, Ayres DL, Darling A, Höhna S, Larget B, Liu L, Suchard MA, Huelsenbeck JP. 2012. MrBayes 3.2: efficient Bayesian phylogenetic inference and model choice across a large model space. Syst Biol 61:539–542.

Roure B, Baurain D, Philippe H. 2013. Impact of missing data on phylogenies inferred from empirical phylogenomic data sets. Mol Biol Evol 30:197–214.

San Mauro D. 2010. A multilocus timescale for the origin of extant amphibians. Mol Phylogenet Evol 56:554–561.

San Mauro D, Vences M, Alcobendas M, Zardoya R, Meyer A. 2005. Initial diversification of living amphibians predated the breakup of Pangaea. Am Nat 165:590–599.

Shen XX, Liang D, Wen JZ, Zhang P. 2011. Multiple genome alignments facilitate development of NPCL markers: a case study of tetrapod phylogeny focusing on the position of turtles. Mol Biol Evol 28:3237–3252.

Shen XX, Liang D, Zhang P. 2012. The development of three long universal nuclear protein-coding locus markers and their application to osteichthyan phylogenetics with nested PCR. PLoS One 7:e39256.

Shimodaira H. 2002. An approximately unbiased test of phylogenetic tree selection. Syst Biol 51:492–508.

Shimodaira H, Hasegawa M. 2001. CONSEL: for assessing the confidence of phylogenetic tree selection. Bioinformatics 17:1246–1247.

Song S, Liu L, Edwards SV, Wu S. 2012. Resolving conflict in eutherian mammal phylogeny using phylogenomics and the multispecies coalescent model. Proc Natl Acad Sci U S A 109:14942–14947.

Spinks PQ, Thomson RC, Barley AJ, Newman CE, Shaffer HB. 2010. Testing avian, squamate, and mammalian nuclear markers for cross amplification in turtles. Conserv Genet Resour 2:127–129.

Spinks PQ, Thomson RC, Lovely GA, Shaffer HB. 2009. Assessing what is needed to resolve a molecular phylogeny: simulations and empirical data from Emydid turtles. BMC Evol Biol 9:56.

Stamatakis A. 2006. RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics 22:2688–2690.

Tewhey R, Warner JB, Nakano M, Libby B, Medkova M, David PH, Kotsopoulos SK, Samuels ML, Hutchison JB, Larson JW. 2009. Microdroplet-based PCR enrichment for large-scale targeted sequencing. Nat Biotechnol 27:1025–1031.

Thomson RC, Shedlock AM, Edwards SV, Shaffer HB. 2008. Developing markers for multilocus phylogenetics in non-model organisms: a test case with turtles. Mol Phylogenet Evol 49:514–525.

Thomson RC, Wang IJ, Johnson JR. 2010. Genome-enabled development of DNA markers for ecology, evolution and conservation. Mol Ecol 19:2184–2195.

Townsend TM, Alegre RE, Kelley ST, Wiens JJ, Reeder TW. 2008. Rapid development of multiple nuclear loci for phylogenetic analysis using genomic resources: an example from squamate reptiles. Mol Phylogenet Evol 47:129–142.

Weisrock DW, Harmon LJ, Larson A. 2005. Resolving deep phylogenetic relationships in salamanders: analyses of mitochondrial and nuclear genomic DNA. Syst Biol 54:758–777.

Wiens J, Bonett R, Chippindale P. 2005. Ontogeny discombobulates phylogeny: paedomorphosis and higher-level salamander relationships. Syst Biol 54:91–110.


Wright TF, Schirtzinger EE, Matsumoto T, et al. (11 co-authors). 2008. A multilocus molecular phylogeny of the parrots (Psittaciformes): support for a Gondwanan origin during the Cretaceous. Mol Biol Evol 25:2141–2156.

Zhang P, Wake DB. 2009. Higher-level salamander relationships and divergence dates inferred from complete mitochondrial genomes. Mol Phylogenet Evol 53:492–508.

Zhang P, Zhou H, Chen YQ, Liu YF, Qu LH. 2005. Mitogenomic perspectives on the origin and phylogeny of living amphibians. Syst Biol 54:391–400.

Zhou XM, Xu SX, Zhang P, Yang G. 2011. Developing a series of conservative anchor markers and their application to phylogenomics of Laurasiatherian mammals. Mol Ecol Resour 11:134–140.


30 nuclear genes provide further support for the monophyly of lissamphibians and the Batrachia hypothesis (fig. 4). Additionally, all possible hypotheses against the monophyly of extant amphibians and the Batrachia hypothesis were rejected in our topological tests (table 2). However, the Batrachia hypothesis did not receive strong support in our species-tree analysis (BPMP-EST = 74%; fig. 4), suggesting that more nuclear genes are still needed to resolve this node.

The monophyly of the internally fertilizing salamanders (Salamandroidea: all salamanders exclusive of Hynobiidae, Cryptobranchidae, and Sirenidae) is strongly supported in our analyses (fig. 4), in line with most recent molecular studies (Wiens et al. 2005; Roelants et al. 2007; Zhang and Wake 2009; Pyron and Wiens 2011) but differing strongly from Frost et al. (2006), who recovered a clade comprising Sirenidae, Dicamptodontidae, Ambystomatidae, and Salamandridae. The internally fertilizing salamanders include two well-supported clades: one is composed of Ambystomatidae, Dicamptodontidae, and Salamandridae, and the other of Proteidae, Rhyacotritonidae, Amphiumidae, and Plethodontidae (fig. 4).

Currently, two hypotheses have been proposed regarding the basal split within living salamanders. The traditional view favors Sirenidae as the sister group to all remaining salamanders (Duellman and Trueb 1994). This hypothesis received strong support in two recent studies (based on mitochondrial genomes, BPML = 98%, Zhang and Wake 2009; based on mitochondrial genomes and nuclear genes, BPML > 80%, San Mauro 2010). In contrast, some studies suggest that the basal split separates Cryptobranchidae + Hynobiidae from all other salamanders (Gao and Shubin 2001; Wiens et al. 2005; Frost et al. 2006; Roelants et al. 2007; Pyron and Wiens 2011), but always without strong support (BPML < 71%). Our phylogenetic analyses based on 30 independent NPCLs supported the second hypothesis, that Cryptobranchoidea (Cryptobranchidae + Hynobiidae) branched first within the living salamanders. This result is extremely robust in our concatenation analyses (BPML = 99%, PPBAY = 1.0, PPCAT = 1.0; fig. 4), and all alternative hypotheses are statistically rejected (table 2). In the species-tree analysis without data concatenation, this result is also strongly supported (BPMP-EST = 83%; fig. 4).

How many nuclear genes, then, are needed to robustly resolve the question of the basal split within living salamanders? Our analysis of data subsets indicates a progressive increase in the bootstrap support value for the node of interest (fig. 4) when an increasing number of genes are analyzed (fig. 5). Analyses based on single genes rarely resolve the node of interest with any confidence. Analyses based on 5–10 genes produce bootstrap support values of 60–80% in concatenation analyses (fig. 5), which is congruent with all previous nuclear studies using similar-sized data sets (Roelants et al. 2007; Pyron and Wiens 2011). Taking a bootstrap value of 95% in concatenation analyses as the threshold of "fully resolved," the minimum number of nuclear genes needed to resolve the root of the salamander tree is approximately 25. The previous contradictory results may be due to the overwhelmingly strong signals from the mitochondrial genome. Because

[Figure 3: paired horizontal bar chart. Left axis: PCR success rate in 16 vertebrates (%), 0–100. Right axis: relative evolutionary rate, 0–100. Gene labels, in plotted order: NTN1, ZIC1, PCDH10, BPTF, KBTBD2, DBC1, LINGO1, ARID2, FUT9, VCPIP1, B3GALT1, MED1, LPHN2, GGPS1, SACS, PANX2, CASR, PDP1, CAND1, LRRN3, LINGO2, MGAT4C, FLRT3, KIAA1239, ENC1, SLITRK1, ZFPM2, ZEB1, FZD4, MIOS, MYCBP2, DSEL, ZBED4, LRRC8D, MB21D2, SETBP1, DOPEY1, SALL1, MED13, LRRTM4, DISP1, RAG1, DCHS1, NHS, EXTL3, SOCS5, ROR2, TTN, RP2, RERE, ANKRD50, FREM2, HYP, GRM2, CXCR4, DSE, GLCE, PCLO, VPS18, GRIN3A, ADNP, GPER, FAT4, STON2, LRRN1, ARSI, IRS1, PIK3CG, P2RY1, PCDH1, KCNF1, DMXL1, RAG2, PPL, FAT2, LIG4, EXOC8, DET1, FILIP1, WFIKKN2, ZHX2, CILP, CELSR3, SYNE1, LCT, TLR7, CPT2, CHST1, FEM1B, DNAH3, SH3BP4, DOLK, FICD, SPEN, KIAA2013, SVEP1, DISP2, REV3L, FAT1, MSH6, EVPL, MOS.]

FIG. 3. Relative evolutionary rates of 102 NPCLs in vertebrates. The 102 NPCLs are arranged in order of increasing variability on the right side, and their PCR success rates in the 16 tested vertebrates are shown on the left side. NPCLs indicated with asterisks may have extra copies in teleost genomes and thus are not suitable for phylogenetic studies of teleosts.


the initial diversification of salamanders occurred within a relatively short window of time (Weisrock et al. 2005), the genealogical histories of individual gene loci may sometimes appear misleading in terms of the relationships among species due to incomplete lineage sorting. Unfortunately, the mitochondrial genome recorded such an incorrect history.
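The subsampling design behind the gene-number analysis can be sketched as follows. This is a minimal illustration only, assuming that each random subset would afterwards be concatenated and analyzed with an external ML program (not shown). The function name `sample_gene_subsets`, the placeholder locus names, and the fixed seed are hypothetical; the 30 replicates per subset size follow the legend of figure 5.

```python
import random


def sample_gene_subsets(genes, subset_size, n_replicates=30, seed=0):
    """Draw n_replicates random gene subsets of a given size.

    Genes are sampled without replacement within each subset; different
    subsets may overlap. The seed is fixed only to make the sketch repeatable.
    """
    if subset_size > len(genes):
        raise ValueError("subset_size exceeds the number of available genes")
    rng = random.Random(seed)
    return [sorted(rng.sample(genes, subset_size)) for _ in range(n_replicates)]


# Placeholder names standing in for the 30 sequenced NPCLs (hypothetical).
genes = [f"locus_{i:02d}" for i in range(1, 31)]

# One list of 30 random subsets for each subset size on the x axis.
design = {k: sample_gene_subsets(genes, k) for k in (1, 5, 10, 15, 20, 25, 30)}
```

The mean bootstrap support for the node of interest would then be computed over the 30 trees obtained for each subset size.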

Discussion

The NPCL toolkit and experimental protocol introduced here provide a highly reliable, rapid, and cost-effective method for building medium-scale multilocus data to produce high-resolution phylogenetic relationships. This phylogenomic approach has the potential to accelerate the completion of many parts of the vertebrate tree of life because no further marker development, which is often the bottleneck in phylogenetic research, is required. Once a specific phylogenetic question within vertebrates arises, researchers simply need to check the list for our toolkit and look for NPCL markers with the expected evolutionary rates and experimental performance for their groups of interest. Then, many orthologous loci can be quickly obtained by traditional PCR and Sanger sequencing, usually without time-consuming gel cutting and cloning. Applying the NPCL toolkit may also have another benefit for assembling the vertebrate tree of life, because people working on different groups can easily use the same set of loci, which will facilitate combined analyses.

Merits of the Toolkit

Because of the use of the nested PCR strategy outlined earlier, most NPCLs in the toolkit work for all major jawed vertebrate groups with high experimental success rates (normally > 95%). Such results were achieved under unified PCR conditions without any extra effort involving cycling condition optimization. This feature of the toolkit enables it to surpass previously developed nuclear marker sets (Murphy et al. 2001; Li et al. 2007; Thomson et al. 2008; Townsend et al. 2008; Wright et al. 2008; Portik et al. 2011; Shen et al. 2011; Zhou et al. 2011). Most previous nuclear marker sets were developed for specific animal groups, and their application to

Table 1. Summary Information for the 30 NPCLs Amplified in 19 Salamander Taxa.

Gene      Length (bp)  Taxa Amplified (%)  PCR Products Directly Sequenced (%)  GC (%)  Var. Sites (%)  PI Sites (%)  Overall Mean P Distance  RCV
BPTF      552          19 (100)            19 (100)                             43      163 (30)        118 (21)      0.098                    0.093
CAND1     1155         19 (100)            17 (89)                              44      403 (35)        314 (27)      0.116                    0.065
DET1      711          19 (100)            18 (95)                              46      275 (39)        216 (30)      0.131                    0.091
DISP1     960          19 (100)            19 (100)                             41      317 (33)        211 (22)      0.096                    0.072
DNAH3     918          19 (100)            19 (100)                             42      389 (42)        304 (33)      0.139                    0.049
DOLK      672          16 (84)             12 (75)                              52      316 (47)        236 (35)      0.173                    0.126
DSEL      1266         19 (100)            19 (100)                             44      546 (43)        415 (33)      0.148                    0.055
ENC1      1083         19 (100)            19 (100)                             51      363 (34)        279 (26)      0.120                    0.057
EXTL3     1245         19 (100)            17 (89)                              47      465 (37)        322 (26)      0.118                    0.067
FAT4      738          19 (100)            19 (100)                             45      344 (47)        249 (34)      0.156                    0.072
FICD      510          18 (95)             18 (100)                             44      169 (33)        124 (24)      0.111                    0.074
GRM2      690          18 (95)             18 (100)                             54      240 (35)        176 (26)      0.115                    0.118
HYP       1260         19 (100)            19 (100)                             47      516 (41)        359 (28)      0.122                    0.049
KBTBD2    1059         19 (100)            19 (100)                             44      406 (38)        246 (23)      0.103                    0.040
KCNF1     735          19 (100)            19 (100)                             52      294 (40)        220 (30)      0.151                    0.153
KIAA1239  1362         19 (100)            19 (100)                             42      479 (35)        338 (25)      0.110                    0.048
KIAA2013  516          19 (100)            19 (100)                             52      221 (43)        178 (34)      0.148                    0.093
LIG4      1017         19 (100)            19 (100)                             39      434 (43)        301 (30)      0.137                    0.043
LPHN2     573          19 (100)            19 (100)                             47      192 (34)        140 (24)      0.106                    0.108
LRRN1     837          19 (100)            18 (95)                              49      345 (41)        240 (29)      0.130                    0.119
MGAT4C    747          18 (95)             18 (100)                             42      273 (37)        212 (28)      0.126                    0.079
MIOS      879          19 (100)            19 (100)                             45      291 (33)        213 (24)      0.107                    0.065
PANX2     717          19 (100)            19 (100)                             44      254 (35)        199 (28)      0.125                    0.059
PDP1      1035         19 (100)            19 (100)                             45      348 (34)        261 (25)      0.110                    0.080
PPL       1338         19 (100)            17 (89)                              47      645 (48)        485 (36)      0.156                    0.064
RAG1      1380         19 (100)            19 (100)                             51      550 (40)        438 (32)      0.146                    0.072
RAG2      792          19 (100)            18 (95)                              49      406 (51)        310 (39)      0.184                    0.090
SACS      1101         19 (100)            19 (100)                             40      383 (35)        282 (26)      0.105                    0.040
TTN       984          19 (100)            5 (26)                               43      378 (38)        269 (27)      0.125                    0.052
ZBED4     1002         19 (100)            19 (100)                             39      325 (32)        224 (22)      0.093                    0.040

NOTE: Length, length of refined alignment; Var. sites, variable sites; PI sites, parsimony-informative sites; RCV, relative composition variability.
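As a quick consistency check on Table 1, the percentages in the Var. Sites and PI Sites columns can be recomputed from the raw counts and the alignment length. The sketch below does this for four loci copied from the table; the helper name `percent` and the choice of loci are illustrative assumptions, and rounding to the nearest integer is assumed to match the table.

```python
# (alignment length in bp, variable sites, parsimony-informative sites)
# for four loci copied from Table 1.
TABLE1_ROWS = {
    "BPTF": (552, 163, 118),
    "CAND1": (1155, 403, 314),
    "RAG2": (792, 406, 310),
    "TTN": (984, 378, 269),
}


def percent(part, whole):
    """Percentage of `part` in `whole`, rounded to the nearest integer."""
    return round(100 * part / whole)


for gene, (length, var_sites, pi_sites) in TABLE1_ROWS.items():
    print(f"{gene}: variable {percent(var_sites, length)}%, "
          f"parsimony-informative {percent(pi_sites, length)}%")
```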


[Figure 4: phylogenetic tree based on 30 nuclear genes (total 27,834 bp). Tip labels: Aneides hardii, Plethodon jordani, Batrachoseps major, Eurycea bislineata, Amphiuma means, Rhyacotriton variegatus, Proteus anguinus, Necturus beyeri, Tylototriton asperrimus, Cynops orientalis, Salamandra salamandra, Dicamptodon aterrimus, Ambystoma mexicanum, Pseudobranchus axanthus, Siren intermedia, Ranodon sibiricus, Batrachuperus yenyuanensis, Onychodactylus fischeri, Andrias davidianus, Silurana tropicalis, Bombina fortinuptialis, Typhlonectes natans, Ichthyophis bannanicus, Gallus gallus, Homo sapiens, Chrysemys picta bellii, Mus musculus, Latimeria chalumnae. Family labels: Plethodontidae, Amphiumidae, Rhyacotritonidae, Proteidae, Salamandridae, Dicamptodontidae, Ambystomatidae, Sirenidae, Hynobiidae, Cryptobranchidae. Clade labels: Salamandroidea, Cryptobranchoidea, ANURA, GYMNOPHIONA, non-amphibian outgroup. Support values printed at two nodes: 99/1.0/1.0/83 and 99/1.0/1.0/74. Scale bar: 0.1 substitutions/site.]

FIG. 4. Higher-level phylogenetic relationships of 10 salamander families inferred from 30 NPCL markers. The tree was inferred by concatenation analyses using ML, BI, and the mixture model (CAT) and by species-tree analysis using the pseudo-ML approach (MP-EST). Branch support values are indicated beside nodes in order of ML bootstrap (BPML), BI posterior probability (PPBI), CAT posterior probability (PPCAT), and MP-EST bootstrap (BPMP-EST) from left to right. The filled squares represent BPML > 95%, PPBAY = 1.0, PPCAT = 1.0, and BPMP-EST > 95%. The circled number refers to the node of interest studied in figure 6. Branch lengths are from the ML analysis.

Table 2. Statistical Confidence (P Values) for Alternative Branching Hypotheses Based on the 30-Gene Data Set.

Alternative Topology Tested                                        Ln L   P Value (AU)  P Value (BP)  P Value (KH)  Rejection
Best ML                                                            0      0.993         0.97          0.98
Sirenidae branched earlier                                         328    0.025         0.015         0.02          + + +
Sirenidae is sister to Cryptobranchoidea                           423    0.004         0.002         0.004         + + +
Gymnophiona is sister to Anura (monophyletic lissamphibians)       343    0.013         0.012         0.013         + + +
Gymnophiona is sister to Caudata (monophyletic lissamphibians)     439    0.002         0.001         0.001         + + +
Anura is sister to Amniota (paraphyletic lissamphibians)           1446   5E-30         0             0             + + +
Gymnophiona is sister to Amniota (paraphyletic lissamphibians)     1290   1E-69         0             0             + + +
Caudata is sister to Amniota (paraphyletic lissamphibians)         1728   0.0001        0             0             + + +


other distantly related groups is usually difficult. For example, Spinks et al. (2010) collected 120 nuclear markers from avian, squamate, and mammalian phylogenetic studies and evaluated their PCR performance in turtles. They found that only eight nuclear markers successfully produced single expected bands across 13 tested turtle species. In another case, Fong and Fujita (2011) developed 75 nuclear markers for vertebrate phylogenetics, but approximately 60% of the target fragments could not be obtained in three test species (two reptiles and one lissamphibian). Therefore, although the nested PCR method introduced here requires an additional PCR reaction, the extra work is still worthwhile.

In PCR-based phylogenetic projects, even when the PCR reactions are successful, the products often contain significant nonspecific amplicons. Such a condition requires additional effort involving gel purification and cloning, which takes much more time than the PCR reaction itself. Our NPCL toolkit is specifically designed to solve this problem, so that normally over 90% of PCR reactions produce strong, single bands of the expected size. Moreover, most of the primers used to date for nuclear marker sets are degenerate and thus are not suitable for direct sequencing of PCR products. Benefiting from our nested PCR strategy, we introduce anchoring sequences to the ends of PCR fragments while maintaining PCR efficiency. These anchoring sequences bring the added benefit that two specific sequencing primers (Seq_F and Seq_R) can be used in all Sanger sequencing reactions.

One additional feature of our NPCL toolkit is that the average length of its NPCLs is 1,050 bp, a length that can easily be amplified in one PCR reaction and sequenced in both directions, allowing efficient use of resources. In contrast, the average marker lengths are 920 bp for 10 NPCLs in Li et al. (2007), 760 bp for 26 NPCLs in Townsend et al. (2008), 873 bp for 22 NPCLs in Shen et al. (2011), and 470 bp for 75 NPCLs in Fong and Fujita (2011). Longer markers provide more sites than shorter ones for equivalent money and time. This feature makes our NPCL toolkit more cost-effective than previously developed nuclear marker sets.
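The length argument can be made concrete with a small calculation. The sketch below assumes, as a simplification, that one amplicon costs the same to amplify and sequence regardless of its length, so that the ratio of average marker lengths approximates the ratio of sites obtained per reaction. The dictionary and function names are illustrative.

```python
# Average marker lengths (bp) quoted in the paragraph above.
AVERAGE_LENGTH_BP = {
    "this toolkit": 1050,
    "Li et al. (2007)": 920,
    "Townsend et al. (2008)": 760,
    "Shen et al. (2011)": 873,
    "Fong and Fujita (2011)": 470,
}


def relative_yield(reference="this toolkit"):
    """Sites per amplicon of the reference set relative to every other set."""
    ref_length = AVERAGE_LENGTH_BP[reference]
    return {
        name: ref_length / length
        for name, length in AVERAGE_LENGTH_BP.items()
        if name != reference
    }


for name, ratio in relative_yield().items():
    print(f"vs {name}: {ratio:.2f}x sites per amplicon")
```

Under this assumption, an average toolkit amplicon yields roughly 1.1 to 2.2 times as many sites as an average amplicon from the earlier marker sets.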

Phylogenetic Utility

The vertebrate NPCL toolkit we developed here shows great promise in terms of phylogenetic utility. A remarkable feature of our NPCL toolkit is that it provides 102 NPCLs with a broad range of evolutionary rates. In our demonstration, we used 30 NPCLs to resolve a family-level salamander phylogeny using both traditional concatenation analyses and a more promising species-tree analysis. However, this example does not mean that our toolkit performs well only on deep-timescale questions. Our ongoing study using this toolkit to resolve the relationships within Plethodontidae, a rapidly radiating group of salamanders, suggests that the toolkit also performs well in resolving species-level phylogenies. For many vertebrate groups in which applicable nuclear markers are limited, such as some teleosts, frogs, and salamanders, our NPCL toolkit can provide a one-stop solution for phylogenetic studies from the family level to the species level. Even for those groups in which specific nuclear marker sets have been developed, our toolkit is still worth trying, as many more loci can be easily obtained that may resolve some difficult branches.

The Toolkit Is a Good Addition to Sequence Capture Approaches

Recently, sequence capture approaches have been applied to vertebrate phylogenomics (Crawford et al. 2012; Faircloth et al. 2012; Lemmon et al. 2012; McCormack et al. 2012). These approaches begin with the selective capture of genomic regions. Briefly, fragmented gDNA is hybridized to DNA or RNA probes, either on an array or in solution. Nontargeted DNA is then washed away, and the targeted DNA is sequenced through NGS. The most promising feature of the sequence capture approach is that it can simultaneously produce hundreds to thousands of loci for tens of individuals within a relatively short time. Therefore, the sequence capture approach is considered to be much more cost-effective than the PCR-based method. According to the calculation of Lemmon et al. (2012), for a 100 taxa × 500 loci project, the cost of the sequence capture method is just 1–3.5% of that of the PCR-based method.

However, the sequence capture approach is currently too challenging for most phylogenetic researchers. Typical NGS runs (454 or Illumina) used by the sequence capture method generate 1,000,000–2,000,000,000 sequences. Storing and processing these NGS data require significant computer memory, hardware upgrades, and bioinformatic programming skills, which are unfamiliar to most phylogenetic researchers. Moreover, phylogenetic reconstruction assumes that orthologous genes are being analyzed across species. For the PCR-based method, the detection of paralogous genes is relatively straightforward. However, in the sequence capture method, the captured genomic regions comprise short conserved cores (probe regions) and long unconserved flanking

[Figure 5: line plot. x axis: number of genes (1, 5, 10, 15, 20, 25, 30). y axis: bootstrap support (%), 0–100. Series: concatenation analyses; species-tree analyses.]

FIG. 5. The effect of increasing the number of nuclear loci on resolving the basal split within salamanders. Each data point represents the mean of support values estimated from 30 randomly sampled subsets. The dashed line indicates the threshold of 95% bootstrap support. The statistical plots show that the minimum number of nuclear loci needed to robustly resolve the basal split within salamanders is 25.


sequences. Because paralogy cannot be detected until after the data are aligned, those unalignable sequences will make the detection of paralogy more difficult.

In fact, not every phylogenetic project will use more than 500 loci, as the sequence capture method normally does. Based on both empirical and simulation data, 20–50 loci are generally sufficient to answer many phylogenetic questions (Rokas et al. 2003; Spinks et al. 2009). This is also the number of loci that most phylogenetic studies will use. In such a situation, adopting the sequence capture method is not cost-effective because researchers need to use relatively expensive NGS and spend time learning new experimental techniques and carrying out sophisticated bioinformatic processing. Our NPCL toolkit is specially designed for such medium-scale phylogenetic projects using approximately 50 loci. Such a number of loci can easily be supplied by our 102 NPCLs. Because more than 90% of the PCR products generated with our toolkit can be directly sequenced, the average cost for one locus per sample is rather low. In our laboratory, generating one new sequence typically costs US$3 (without considering labor).
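A back-of-the-envelope budget follows directly from the per-sequence figure above. The sketch below is a rough estimate under stated assumptions: every taxon-by-locus combination yields exactly one sequence, the US$3 figure applies uniformly, and labor, failed reactions, and any cloning are ignored. The function name and example project sizes are illustrative.

```python
def sanger_project_cost(n_taxa, n_loci, cost_per_sequence_usd=3.0):
    """Estimated consumables cost (US$) for a PCR plus Sanger project.

    Assumes one sequence per taxon per locus and a flat per-sequence cost;
    labor is excluded, as in the text.
    """
    if n_taxa < 0 or n_loci < 0:
        raise ValueError("n_taxa and n_loci must be non-negative")
    return n_taxa * n_loci * cost_per_sequence_usd


# A project the size of the salamander demonstration (19 taxa, 30 loci).
print(sanger_project_cost(19, 30))

# A hypothetical medium-scale project with about 50 loci.
print(sanger_project_cost(100, 50))
```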

In addition, researchers sometimes have only tiny amounts of DNA but wish to perform a multilocus phylogenetic analysis. In such a situation, the sequence capture method is difficult to implement because it normally requires DNA at the microgram level (Lemmon et al. 2012). Our NPCL toolkit can fill the gap here. Benefiting from the nested PCR strategy, the sensitivity of PCR reactions in our method is extremely high. In many test experiments in our laboratory, the toolkit and protocol could produce target bands with only 5–10 ng of DNA.

Our NPCL toolkit is an alternative to the sequence capture method for the everyday work of phylogenetic researchers. Which method to choose depends on two major drivers: the amount of DNA and the expected number of loci. When DNA is limited, the better solution may be PCR; otherwise, sequence capture also works. Taking into account the money and time the two methods require, we speculate that the economic transition point from PCR to sequence capture is at approximately 100 loci. That assessment is why our toolkit includes 102 NPCL markers. Our proposal is that when using ≤100 loci, one can try our NPCL toolkit; when using >100 loci, sequence capture should be used.
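The two drivers named above can be written down as a simple rule of thumb. The sketch below is only one possible reading of the proposal: the 100-locus threshold comes from the text, whereas the 1,000 ng cutoff is an assumed interpretation of "DNA at the microgram level," and the function name is hypothetical.

```python
def suggest_method(dna_ng, n_loci, loci_threshold=100, capture_min_dna_ng=1000.0):
    """Suggest a data-collection method from DNA amount and locus count.

    capture_min_dna_ng is an assumed stand-in for the microgram-level
    requirement of sequence capture; adjust it to the actual protocol.
    """
    if dna_ng < capture_min_dna_ng:
        # Too little DNA for sequence capture; nested PCR works from nanograms.
        return "nested PCR toolkit"
    if n_loci <= loci_threshold:
        return "nested PCR toolkit"
    return "sequence capture"


print(suggest_method(dna_ng=10, n_loci=300))    # limited DNA
print(suggest_method(dna_ng=5000, n_loci=50))   # ample DNA, few loci
print(suggest_method(dna_ng=5000, n_loci=500))  # ample DNA, many loci
```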

Future Directions

In this study, we used multiple genome alignments deposited in the University of California, Santa Cruz (UCSC) Genome Browser to identify long and conserved exons across jawed vertebrates. Benefiting from a nested PCR strategy, the developed NPCLs performed stably in all major jawed vertebrate groups. Recently, a database for mining exon and intron markers, called EvolMarkers, has been built by Li et al. (2012). Careful investigation of this database may identify many conserved exons within nonvertebrates, whose interrelationships are currently more problematic than those of vertebrates. Because the nonvertebrates constitute many distantly related groups, it may be impossible to develop a single set of PCR primers for all nonvertebrates. However, following a similar marker development strategy, multiple NPCL toolkits could be constructed for various groups of nonvertebrates, such as arthropods, echinoderms, and molluscs. In addition, because introns are flanked by conserved exons, the idea of using nested PCR for marker development could also be applied to the development of EPIC (exon-primed intron crossing) markers, which are more suitable for shallow-scale phylogenetic or phylogeographic projects.

Despite the benefits of our proposed method, it must be recognized that when handling large-scale projects, such as 200 taxa × 100 loci, the use of our toolkit and Sanger sequencing will still require significant cost, time, and labor. An alternative solution is to use NGS to replace Sanger sequencing. Recently, 454 NGS technology has been applied to sequence targeted gene regions from a pool of PCR products from different specimens (Binladen et al. 2007; Meyer et al. 2008). In such experiments, specific tagging sequences must be added to amplicons by either PCR (Binladen et al. 2007) or blunt-end ligation (Meyer et al. 2008). Therefore, if the tailing sequences of the second-round PCR primers in our NPCL toolkit are replaced by tagging sequences (for tag design, see Faircloth and Glenn 2012), all PCR products can be pooled together and sequenced with 454 NGS, which will greatly reduce the money and time cost compared with Sanger sequencing. However, parallel tagged sequencing via NGS does not circumvent the process of PCR for each individual at each locus, which may be the most onerous part of a large-scale phylogenomic project. Some promising new technologies may help to solve this problem, such as microdroplet PCR (Tewhey et al. 2009), where millions of individual PCR reactions are performed in picoliter-scale droplets simultaneously, and the 96.96 Dynamic Array by Fluidigm, which allows 96 primer combinations to be used on 96 samples (9,216 total PCR reactions) on a single PCR plate. However, there has been little research on applying NGS and new high-throughput PCR technologies to phylogenomics, so their ease of use and cost-effectiveness still need to be explored.
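To illustrate how pooled, tagged amplicons would be sorted back to specimens after sequencing, the following sketch assigns reads to samples by an exact match on a fixed-length tag at the start of each read. It is a deliberately simplified illustration: the tag sequences, sample names, and function name are hypothetical, and the error-tolerant tag designs discussed by Faircloth and Glenn (2012) are not modeled.

```python
def demultiplex(reads, tag_to_sample, tag_length):
    """Split pooled reads by their leading tag (exact match only).

    Returns (bins, unassigned): bins maps each sample to its reads with the
    tag removed; unassigned keeps reads whose tag is not recognized.
    """
    bins = {sample: [] for sample in tag_to_sample.values()}
    unassigned = []
    for read in reads:
        tag, insert = read[:tag_length], read[tag_length:]
        sample = tag_to_sample.get(tag)
        if sample is None:
            unassigned.append(read)
        else:
            bins[sample].append(insert)
    return bins, unassigned


# Hypothetical 6-nt tags and toy reads, for illustration only.
tags = {"ACGTAC": "specimen_A", "TGCATG": "specimen_B"}
pooled = ["ACGTACGGGTTT", "TGCATGCCCAAA", "NNNNNNGGGG"]
bins, unassigned = demultiplex(pooled, tags, tag_length=6)
```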

Summary

In conclusion, we have developed an improved method for rapidly amplifying and sequencing NPCLs that has proven to be useful and effective for molecular phylogenetic studies of vertebrates. The newly developed toolkit provides an attractive alternative to available methods for vertebrate phylogenomics.

Materials and Methods

Development of NPCL and Primer Design

Our previous study showed that nested PCR is overwhelmingly more effective than conventional PCR for obtaining target amplicons from complex genomic environments (Shen et al. 2012). However, nested PCR requires four conserved regions to design two pairs of primers (illustrated in fig. 6; yellow blocks represent the conserved regions used for primer design), which means that only relatively long exons are


suitable as candidates for NPCL development with the nested PCR method. To search for long and conserved exons, we took advantage of our previous bioinformatic method, which used the multiple genome alignment data from the UCSC Genome Browser to identify conserved exons (Shen et al. 2011). Because the NPCL markers are to be used in vertebrates, we focused only on those multiple genome alignments that include at least six species: Danio rerio (zebrafish), Silurana tropicalis (frog), Anolis carolinensis (lizard), Gallus gallus (chicken), Mus musculus (mouse), and Homo sapiens (human). The alignments of candidate exons had to meet two criteria: length of more than 700 bp and pairwise similarity ranging from 35% to 90%. The detailed bioinformatic pipeline has been described elsewhere (Shen et al. 2011). In addition to using multiple genome alignments to screen NPCL candidates, we also manually searched the ENSEMBL database for nuclear genes that were used previously (Murphy et al. 2001; Li et al. 2007; Townsend et al. 2008; Wright et al. 2008; Zhou et al. 2011; Song et al. 2012) to check whether they contain large and appropriately conserved exons.

As a result, we assembled a total of 305 NPCL candidate alignments, of which 120 contained the appropriate number of conserved blocks, and used these to design nested PCR primers. To increase the success rates of our NPCL markers in amniotes, we manually added a turtle sequence (Chrysemys picta bellii) to each of the candidate alignments using data downloaded from the ENSEMBL database. A total of 480 primers were designed for the 120 NPCL candidates. Briefly, the first-round PCR primers are only used to enrich target regions from genomic environments and not to obtain target amplicons, so the degeneracy of these primers is normally high to increase reaction sensitivity; the second-round PCR primers are used to obtain target amplicons, so the degeneracy of these primers is lower to increase reaction specificity. Our previous study showed that the nested PCR method often produces strong and single amplicon bands (Shen et al. 2012). To facilitate the next-step direct sequencing, we added a tail (5′-AGGGTTTTCCCAGTCACGAC-3′) to the 5′-end of all second-round forward primers and a tail (5′-AGATAACAATTTCACACAGG-3′) to the 5′-end of all second-round reverse primers. These tail sequences can provide two unique anchoring sites for direct sequencing from cleaned PCR products. In our pilot experiments, adding the tail sequences to primers did not affect the efficiency of the second-round PCR.

FIG. 6. Schematic representation of the experimental protocol for using our NPCL toolkit. Note that for each NPCL, nested PCR primers are designed on four short conserved blocks flanking the target region.

Laboratory Protocol (text of the figure panels)

(i) First PCR with primers F1 and R1 using gDNA as template. Enrich the target region from the complex genomic environment with one pair of highly degenerate primers. PCR was performed with 50–100 ng DNA in a 25 µl reaction. Cycling conditions: an initial denaturation step of 4 min at 94 °C, followed by 35 cycles of 94 °C for 45 s, 45 °C for 40 s, and 72 °C for 2 min, and a final extension at 72 °C for 10 min.

(ii) Second PCR with tailed primers F2 and R2 using the first-round PCR product as template. Specifically amplify the target region from the first-round PCR product with one pair of tailed, low-degeneracy primers. PCR was performed with 1 µl of first-round PCR product in a 25 µl reaction. Cycling conditions: an initial denaturation step of 4 min at 94 °C, followed by 35 cycles of 94 °C for 45 s, 50 °C for 40 s, and 72 °C for 90 s, and a final extension at 72 °C for 10 min.

(iii) PCR evaluation and sequencing. Evaluate the agarose gel electrophoresis results. If the product is a single and strong band (normally >90%), clean up with ExoI and FastAP and sequence directly with the general sequencing primers Seq_F and Seq_R: 25 µl PCR product is cleaned with 2 U ExoI and 0.4 U FastAP (37 °C for 30 min, 80 °C for 15 min), and a Sanger sequencing reaction is performed with 0.5 µl BigDye and 1 µl cleanup PCR product. Otherwise, use gel cutting or cloning, then sequencing.
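The tailing step described above can be sketched in a few lines. This is a minimal illustration, assuming primers are written 5′ to 3′ as plain strings; the two tail sequences are taken from the text, while the function names and the example locus-specific primers are hypothetical placeholders rather than primers from supplementary table S1.

```python
# Tail sequences quoted in the text (5'->3'); they double as Seq_F and Seq_R.
FORWARD_TAIL = "AGGGTTTTCCCAGTCACGAC"
REVERSE_TAIL = "AGATAACAATTTCACACAGG"

# IUPAC nucleotide codes, so degenerate primers are accepted.
IUPAC_BASES = set("ACGTRYSWKMBDHVN")


def add_tail(tail, primer):
    """Return the primer with `tail` attached to its 5' end."""
    primer = primer.upper()
    unknown = set(primer) - IUPAC_BASES
    if unknown:
        raise ValueError(f"non-IUPAC characters in primer: {sorted(unknown)}")
    return tail + primer


def tail_second_round_pair(forward_primer, reverse_primer):
    """Tail a second-round (F2, R2) primer pair for direct sequencing."""
    return (add_tail(FORWARD_TAIL, forward_primer),
            add_tail(REVERSE_TAIL, reverse_primer))


if __name__ == "__main__":
    # Hypothetical degenerate primers, for illustration only.
    f2, r2 = tail_second_round_pair("GGNACNTAYGARATGTGG", "TCYTTRTGNGCCATRTA")
    print(f2)
    print(r2)
```

Because every tailed amplicon carries the same two anchoring sites, a single pair of sequencing primers serves all loci, which is the design choice the text relies on.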

Experimental Testing for Candidate Markers in 16 Jawed Vertebrates

To test the experimental performance of our newly designed NPCL markers, we selected 16 taxa representing nine major jawed vertebrate lineages: Chondrichthyes (Sphyrna lewini), Actinopterygii (Lepisosteus oculatus and Pangasius sutchi), Dipnoi (Protopterus annectens), Lissamphibia (Ichthyophis bannanicus, Batrachuperus yenyuanensis, and Rana nigromaculata), Mammalia (Mus musculus and Sus scrofa domestica), Testudines (Trionyx sinensis and Podocnemis unifilis), Aves (Struthio camelus and Zosterops japonica), Crocodylia (Crocodylus siamensis), and Squamata (Hemidactylus bowringii and Naja naja atra). Total genomic DNA was extracted from ethanol-preserved tissues (liver or muscle) using the standard salt extraction protocol. All extracted genomic DNAs were diluted to a concentration of 50 ng µl−1 with 1× TE and stored at −20 °C before PCR amplification.

All 120 NPCL markers were tested with a two-round PCR strategy (nested PCR). The first-round PCR was performed in a 25 µl reaction containing 1–2 µl template DNA (50–100 ng), with final concentrations of 1× PCR buffer, 200 µM dNTP, 400 nM of each forward and reverse first-round primer, and 1.25 U Taq polymerase (TransTaq High Fidelity, TransGen, Beijing). The cycling conditions of the first-round PCR were as follows: an initial denaturation step of 4 min at 94 °C, followed by 35 cycles of a 45 s denaturation at 94 °C, a 40 s annealing at 45 °C, and a 2 min extension at 72 °C, followed by a final 10 min extension at 72 °C. The second-round PCR was also performed in a 25 µl reaction containing 1 µl of the first-round PCR product (without dilution) and final concentrations of 1× PCR buffer, 200 µM dNTP, 400 nM of each forward and reverse second-round primer, and 1.25 U Taq polymerase. The cycling conditions of the second-round PCR were as follows: an initial denaturation step of 4 min at 94 °C, followed by 35 cycles of a 45 s denaturation at 94 °C, a 40 s annealing at 50 °C, and a 90 s extension at 72 °C, followed by a final 10 min extension at 72 °C.
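The final concentrations given above translate into pipetting volumes through the usual dilution relation C1·V1 = C2·V2. The sketch below works this out for one 25 µl reaction. The stock concentrations (10× buffer, 2.5 mM dNTP mix, 10 µM primers, 5 U/µl polymerase) are assumptions chosen for illustration and are not stated in the article; the polymerase amount is taken as 1.25 U per reaction.

```python
def stock_volume_ul(final_conc, stock_conc, reaction_ul):
    """Volume of stock needed so that the reaction reaches `final_conc`.

    `final_conc` and `stock_conc` must share the same unit (C1*V1 = C2*V2).
    """
    return final_conc / stock_conc * reaction_ul


def master_mix(reaction_ul=25.0, template_ul=1.0):
    """Per-reaction volumes (in microliters) for one round of the nested PCR.

    All stock concentrations below are hypothetical lab stocks.
    """
    volumes = {
        "10x PCR buffer": stock_volume_ul(1.0, 10.0, reaction_ul),       # 1x final
        "dNTP mix (2.5 mM)": stock_volume_ul(200.0, 2500.0, reaction_ul),  # 200 uM final
        "forward primer (10 uM)": stock_volume_ul(0.4, 10.0, reaction_ul),  # 400 nM final
        "reverse primer (10 uM)": stock_volume_ul(0.4, 10.0, reaction_ul),  # 400 nM final
        "Taq polymerase (5 U/ul)": 1.25 / 5.0,                            # 1.25 U per reaction
        "template": template_ul,
    }
    # Water makes up the remainder of the reaction volume.
    volumes["water"] = reaction_ul - sum(volumes.values())
    return volumes


if __name__ == "__main__":
    for component, volume in master_mix().items():
        print(f"{component}: {volume:.2f} ul")
```

With these assumed stocks the second-round reaction (1 µl of first-round product as template) needs 17.25 µl of water; a first-round reaction with 2 µl of gDNA would need 1 µl less.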

One microliter of the second-round PCR products was analyzed on a 1.0% TAE agarose gel. An NPCL marker was considered successful if more than 8 of the 16 tested taxa produced target amplicon bands. On this basis, 102 out of 120 tested NPCL markers were successful. The nested-PCR primers for the 102 NPCL markers can be found in supplementary table S1, Supplementary Material online. If the PCR products contained significant nonspecific amplicon bands (normally <10%), they needed further processing, for example, standard gel cutting or cloning. If the PCR reactions produced single amplicon bands (normally >90%), they were cleaned with Exo-FAP treatment: 2 U ExoI and 0.4 U FastAP (all Fermentas) were added to the PCR tube and incubated for 30 min at 37 °C and 15 min at 80 °C. The cleanup PCR reactions can be directly used as templates for Sanger sequencing. According to our experimental design, all PCR fragments can be sequenced from both ends with the two universal sequencing primers, Seq_F (5′-AGGGTTTTCCCAGTCACGAC-3′) and Seq_R (5′-AGATAACAATTTCACACAGG-3′). A typical Sanger sequencing reaction in our laboratory consumes 0.5 µl BigDye and 1 µl cleanup PCR product. The primer design strategy, the laboratory protocol for the nested PCR method, and the pretreatment of PCR products before Sanger sequencing are illustrated in figure 6.
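The success criterion stated above (a marker counts as successful if more than 8 of the 16 tested taxa yield the target band) is easy to express as a filter. The sketch below is illustrative only: the marker names and the amplification results are invented, and only the threshold comes from the text.

```python
def marker_is_successful(amplified, threshold=8):
    """True if strictly more than `threshold` taxa produced the target band.

    `amplified` is a sequence of booleans, one per tested taxon.
    """
    return sum(amplified) > threshold


def successful_markers(results, threshold=8):
    """Names of markers that pass the criterion, in input order."""
    return [name for name, amplified in results.items()
            if marker_is_successful(amplified, threshold)]


if __name__ == "__main__":
    # Hypothetical gel-scoring results for three markers across 16 taxa.
    results = {
        "markerA": [True] * 16,
        "markerB": [True] * 8 + [False] * 8,   # exactly 8: not "more than 8"
        "markerC": [True] * 9 + [False] * 7,
    }
    print(successful_markers(results))
    # Overall proportion reported in the text: 102 of 120 candidates.
    print(f"{102 / 120:.0%}")
```

Note that the comparison is strict, so a marker amplifying in exactly half of the taxa is rejected.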

Calculation of Relative Evolutionary Rate of 102 NPCLs

The rate multipliers (m) across partitions estimated in MrBayes 3.2 (Ronquist et al. 2012) are used as relative evolutionary rates. To calculate these parameters, alignments for each NPCL were prepared for 12 species: Homo sapiens, Macaca mulatta, Mus musculus, Rattus norvegicus, Gallus gallus, Meleagris gallopavo, Chrysemys picta bellii, Anolis carolinensis, Silurana tropicalis, Tetraodon nigroviridis, Takifugu rubripes, and Danio rerio. Because genome data are available for the 12 species, we did not generate any new data. The 102 NPCL alignments were then combined and subjected to MrBayes analyses partitioned by gene. Each gene was assigned a separate GTR + Γ + I model, and all model parameters were unlinked. Two Markov chain Monte Carlo (MCMC) runs were performed with one cold chain and three heated chains (temperature set to 0.1) for 50 million generations and sampled every 1,000 generations. The rate multiplier for each gene was estimated using Tracer version 1.4 after discarding the first 50% of the generations. All evolutionary rates were normalized by dividing by the maximum value of the obtained rates.
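The final normalization step described above, dividing every rate by the largest one, can be written as follows. This is a minimal sketch; the gene names and multiplier values are made up for illustration and are not the estimates obtained in the study.

```python
def normalize_rates(rate_multipliers):
    """Scale rate multipliers so that the fastest gene has relative rate 1.0."""
    fastest = max(rate_multipliers.values())
    if fastest <= 0:
        raise ValueError("rate multipliers must be positive")
    return {gene: rate / fastest for gene, rate in rate_multipliers.items()}


if __name__ == "__main__":
    # Hypothetical posterior mean rate multipliers for three genes.
    example = {"geneA": 0.5, "geneB": 2.0, "geneC": 1.0}
    for gene, rate in normalize_rates(example).items():
        print(gene, rate)
```

After this scaling all relative rates fall in the interval (0, 1], which makes markers directly comparable when choosing loci for a given time depth.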

Gene and Taxon Sampling for Investigating Higher Level Salamander Relationships

To test the utility of our NPCL toolkit in a real case, we selected 19 salamander taxa representing all 10 salamander families and 9 outgroup taxa to investigate the family-level relationships of salamanders (supplementary table S2, Supplementary Material online). For gene sampling, we randomly selected 30 NPCL markers whose PCR success rates were more than 90% in the 16 previously tested vertebrates. Among the 840 target sequences (30 markers for 28 taxa), 201 were available in public databases (NCBI, UCSC, and ENSEMBL), whereas the remaining 639 sequences needed to be generated de novo. The experimental procedure was as described earlier. All obtained sequences were examined by checking for the presence of premature stop codons (pseudogenes) and by BlastX searches against the nonredundant protein sequences (nr) database to confirm that they were our target genes. The NPCLs, species, and accession numbers for the newly obtained sequences are listed in supplementary table S2, Supplementary Material online.
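The stop-codon screen mentioned above can be approximated with a short scan over in-frame codons. The sketch below assumes the standard nuclear genetic code, an ungapped sequence, and a known reading frame; the function name and the example sequences are hypothetical. Because the amplified fragments are internal exon regions, any in-frame stop is treated as suspicious.

```python
# Stop codons of the standard nuclear genetic code.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def in_frame_stop_positions(sequence, frame=0):
    """Return 0-based start positions of in-frame stop codons.

    `frame` (0, 1, or 2) is the offset of the first complete codon.
    Trailing bases that do not fill a codon are ignored.
    """
    sequence = sequence.upper()
    positions = []
    for start in range(frame, len(sequence) - 2, 3):
        if sequence[start:start + 3] in STOP_CODONS:
            positions.append(start)
    return positions


if __name__ == "__main__":
    # Hypothetical fragments: the first carries an in-frame TAA.
    print(in_frame_stop_positions("ATGGCCTAAGGG"))
    print(in_frame_stop_positions("ATGGCCAAAGGG"))
```

A sequence returning a non-empty list would be a candidate pseudogene or a sequencing error and would warrant a closer look before it enters the alignment.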

Phylogenetic Analyses

Alignments of all 30 NPCL markers were conducted using the G-INS-i method from MAFFT (Katoh et al. 2005) under the default settings according to their translated amino acid sequences, then refined by eye. All 30 refined alignments were combined into a concatenated data set (27,834 bp).

For the concatenated data set, we manually defined five partitioning strategies: 2 partitions (one for codon positions 1 and 2 and one for codon position 3), 3 partitions (one partition for each codon position), 30 partitions (one partition for each gene), 60 partitions (one for codon positions 1 + 2 and one for codon position 3 across 30 genes), and 90 partitions (codon position partitioning across 30 genes). Comparisons of the five partitioning strategies and selection of the corresponding nucleotide substitution models were conducted under the Bayesian information criterion implemented in PartitionFinder (Lanfear et al. 2012). The 3-partition scheme (one partition for each codon position) was chosen as the best-fitting partitioning strategy, and all 3 partitions favored the GTR + Γ + I model.
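The five partitioning strategies listed above differ only in whether sites are grouped by gene and how codon positions are grouped. The following sketch enumerates them from per-gene, per-codon-position data blocks and confirms the partition counts. It is a generic illustration: the block and partition names are hypothetical, and it does not reproduce the input format of PartitionFinder or any other program.

```python
# How codon positions 1-3 are grouped under each codon-partitioning mode.
CODON_GROUPS = {
    "none": {1: "all", 2: "all", 3: "all"},
    "12+3": {1: "pos12", 2: "pos12", 3: "pos3"},
    "1+2+3": {1: "pos1", 2: "pos2", 3: "pos3"},
}

# (partition by gene?, codon mode) for the five strategies in the text.
FIVE_STRATEGIES = [
    (False, "12+3"),   # 2 partitions
    (False, "1+2+3"),  # 3 partitions
    (True, "none"),    # 30 partitions
    (True, "12+3"),    # 60 partitions
    (True, "1+2+3"),   # 90 partitions
]


def build_scheme(gene_names, by_gene, codon_mode):
    """Map each partition name to the list of data blocks it contains."""
    groups = CODON_GROUPS[codon_mode]
    scheme = {}
    for gene in gene_names:
        for position in (1, 2, 3):
            prefix = gene if by_gene else "concat"
            partition = f"{prefix}_{groups[position]}"
            scheme.setdefault(partition, []).append(f"{gene}_pos{position}")
    return scheme


if __name__ == "__main__":
    genes = [f"gene{i:02d}" for i in range(1, 31)]  # 30 placeholder genes
    for by_gene, mode in FIVE_STRATEGIES:
        print(by_gene, mode, len(build_scheme(genes, by_gene, mode)))
```

Every scheme assigns the same 90 data blocks (30 genes × 3 codon positions); only the grouping changes, which is what makes the schemes comparable under a single information criterion.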

The concatenated data set was separately analyzed with both ML and Bayesian inference (BI) methods under the 3-partition scheme. Partitioned ML analyses were implemented using RAxML version 7.2.6 (Stamatakis 2006) with the GTR + Γ + I model assigned to each partition. A search that combined 100 separate ML searches was applied to find the optimal tree, and branch support for each node was evaluated with 500 standard bootstrapping replicates (-f d -b 500 option) implemented in RAxML. The partitioned BI was conducted using MrBayes 3.2 (Ronquist et al. 2012). All model parameters were unlinked. Two MCMC runs were performed with one cold chain and three heated chains (temperature set to 0.1) for 60 million generations and sampled every 1,000 generations. The chain stationarity was visualized by plotting −ln L against the generation number using Tracer version 1.4, and the first 50% of generations were discarded. Topologies and posterior probabilities were estimated from the remaining generations. Two runs for each analysis were compared for congruence.

We also performed Bayesian phylogenetic analyses under a mixture model, CAT + GTR + Γ4, in PhyloBayes 3.3 (Lartillot et al. 2009) with two independent MCMC runs. Each run was performed for 10,000 cycles and sampled every cycle. Stationarity was considered reached when the largest discrepancy (maxdiff) between the two independent runs was less than 0.1. The first 5,000 trees in each MCMC run were discarded. The remaining 10,000 trees of the two runs were sampled every 5 trees to generate a 50% majority-rule posterior consensus tree.
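The burn-in and thinning arithmetic in the two Bayesian analyses above can be checked with a small helper. This is a sketch under simplifying assumptions: samples are indexed from zero, the initial state is not counted as a sample, and the function name is hypothetical.

```python
def retained_samples(n_samples, burnin, thin=1):
    """Number of samples kept after dropping `burnin` and keeping every `thin`-th."""
    return len(range(burnin, n_samples, thin))


if __name__ == "__main__":
    # MrBayes: 60 million generations sampled every 1,000, first 50% discarded.
    mrbayes_samples = 60_000_000 // 1_000
    print(retained_samples(mrbayes_samples, burnin=mrbayes_samples // 2))

    # PhyloBayes: two runs of 10,000 cycles, first 5,000 discarded in each,
    # remaining trees thinned to every fifth tree.
    per_run = retained_samples(10_000, burnin=5_000, thin=5)
    print(2 * per_run)
```

Under these assumptions each MrBayes run contributes 30,000 post-burn-in samples, and the PhyloBayes consensus is built from 2,000 trees in total.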

Species tree estimation was conducted using the pseudo-ML approach in the program MP-EST (Liu et al. 2010) under the coalescent model. Briefly, the gene trees for the 30 NPCL markers were reconstructed with ML under the GTR + Γ8 model using PHYML 3.0 (Guindon et al. 2010). The resulting 30 gene trees were rooted with an outgroup (Chrysemys picta bellii) and used to estimate a species tree with MP-EST. The robustness of the species tree was evaluated with 500 nonparametric bootstrap replicates.

Alternative phylogenetic hypotheses were tested based on the 30-gene data set. We first calculated sitewise log likelihoods of alternative trees using RAxML (-f g) under the GTR + Γ + I model. Then, the site log-likelihood file was used to estimate P values for each alternative tree in the CONSEL program (Shimodaira and Hasegawa 2001) using the Kishino–Hasegawa test (KH) (Kishino and Hasegawa 1989), the approximately unbiased test (AU) (Shimodaira 2002), and the RELL bootstrap proportion test (BP).

Supplementary Material

Supplementary tables S1 and S2 and figures S1 and S2 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).

Acknowledgments

The authors are grateful to David Wake and Carol Spencer of the Museum of Vertebrate Zoology at the University of California, Berkeley, for tissue loans. David Wake provided many useful comments on the manuscript. This work was supported by National Natural Science Foundation of China grants (31172075 and 30900136) and the National Science Fund for Excellent Young Scholars (no. pending).

References

Binladen J, Gilbert MTP, Bollback JP, Panitz F, Bendixen C, Nielsen R, Willerslev E. 2007. The use of coded PCR primers enables high-throughput sequencing of multiple homolog amplification products by 454 parallel sequencing. PLoS One 2:e197.
Crawford NG, Faircloth BC, McCormack JE, Brumfield RT, Winker K, Glenn TC. 2012. More than 1000 ultraconserved elements provide evidence that turtles are the sister group to archosaurs. Biol Lett. 8:783–786.
Delsuc F, Brinkmann H, Philippe H. 2005. Phylogenomics and the reconstruction of the tree of life. Nat Rev Genet. 6:361–375.
Duellman WE, Trueb L. 1994. Biology of amphibians. Baltimore (MD): Johns Hopkins University Press.
Dunn CW, Hejnol A, Matus DQ, et al. (18 co-authors). 2008. Broad phylogenomic sampling improves resolution of the animal tree of life. Nature 452:745–749.
Faircloth BC, Glenn TC. 2012. Not all sequence tags are created equal: designing and validating sequence identification tags robust to indels. PLoS One 7:e42543.
Faircloth BC, McCormack JE, Crawford NG, Brumfield RT, Glenn TC. 2012. Ultraconserved elements anchor thousands of genetic markers for target enrichment spanning multiple evolutionary timescales. Syst Biol. 61:717–726.
Fong JJ, Brown JM, Fujita MK, Boussau B. 2012. A phylogenomic approach to vertebrate phylogeny supports a turtle-archosaur affinity and a possible paraphyletic lissamphibia. PLoS One 7:e48990.
Fong JJ, Fujita MK. 2011. Evaluating phylogenetic informativeness and data-type usage for new protein-coding genes across Vertebrata. Mol Phylogenet Evol. 61:300–307.
Frost DR, Grant T, Faivovich J, et al. (18 co-authors). 2006. The amphibian tree of life. Bull Am Mus Nat Hist. 297:8–370.
Gao KQ, Shubin NH. 2001. Late Jurassic salamanders from northern China. Nature 410:574–577.
Guindon S, Dufayard J-F, Lefort V, Anisimova M, Hordijk W, Gascuel O. 2010. New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Syst Biol. 59:307–321.
Hugall AF, Foster R, Lee MSY. 2007. Calibration choice, rate smoothing, and the pattern of tetrapod diversification according to the long nuclear gene RAG-1. Syst Biol. 56:543–563.
Inoue JG, Miya M, Lam K, Tay BH, Danks JA, Bell J, Walker TI, Venkatesh B. 2010. Evolutionary origin and phylogeny of the modern holocephalans (Chondrichthyes: Chimaeriformes): a mitogenomic perspective. Mol Biol Evol. 27:2576–2586.
Katoh K, Kuma K, Toh H, Miyata T. 2005. MAFFT version 5: improvement in accuracy of multiple sequence alignment. Nucleic Acids Res. 33:511–518.
Kishino H, Hasegawa M. 1989. Evaluation of the maximum likelihood estimate of the evolutionary tree topologies from DNA sequence data, and the branching order in Hominoidea. J Mol Evol. 29:170–179.
Künstner A, Wolf JBW, Backström N, et al. (13 co-authors). 2010. Comparative genomics based on massive parallel transcriptome sequencing reveals patterns of substitution and selection across 10 bird species. Mol Ecol. 19:266–276.
Lanfear R, Calcott B, Ho SYW, Guindon S. 2012. PartitionFinder: combined selection of partitioning schemes and substitution models for phylogenetic analyses. Mol Biol Evol. 29:1695–1701.
Lartillot N, Lepage T, Blanquart S. 2009. PhyloBayes 3: a Bayesian software package for phylogenetic reconstruction and molecular dating. Bioinformatics 25:2286–2288.
Lemmon AR, Brown JM, Stanger-Hall K, Lemmon EM. 2009. The effect of ambiguous data on phylogenetic estimates obtained by maximum likelihood and Bayesian inference. Syst Biol. 58:130–145.
Lemmon AR, Emme SA, Lemmon EM. 2012. Anchored hybrid enrichment for massively high-throughput phylogenomics. Syst Biol. 61:727–744.
Li C, Ortí G, Zhang G, Lu G. 2007. A practical approach to phylogenomics: the phylogeny of ray-finned fish (Actinopterygii) as a case study. BMC Evol Biol. 7:44.
Li C, Riethoven JJ, Naylor GJ. 2012. EvolMarkers: a database for mining exon and intron markers for evolution, ecology and conservation studies. Mol Ecol Resour. 12:967–971.
Liu L, Yu L, Edwards SV. 2010. A maximum pseudo-likelihood approach for estimating species trees under the coalescent model. BMC Evol Biol. 10:302.
McCormack JE, Faircloth BC, Crawford NG, Gowaty PA, Brumfield RT, Glenn TC. 2012. Ultraconserved elements are novel phylogenomic markers that resolve placental mammal phylogeny when combined with species-tree analysis. Genome Res. 22:746–754.
McCormack JE, Hird SM, Zellmer AJ, Carstens BC, Brumfield RT. 2013. Applications of next-generation sequencing to phylogeography and phylogenetics. Mol Phylogenet Evol. 66:526–538.
Meyer A, Van de Peer Y. 2005. From 2R to 3R: evidence for a fish-specific genome duplication (FSGD). Bioessays 27:937–945.
Meyer M, Stenzel U, Hofreiter M. 2008. Parallel tagged sequencing on the 454 platform. Nat Protoc. 3:267–278.
Murphy WJ, Eizirik E, Johnson WE, Zhang YP, Ryder OA, O'Brien SJ. 2001. Molecular phylogenetics and the origins of placental mammals. Nature 409:614–618.
Philippe H, Derelle R, Lopez P, et al. (20 co-authors). 2009. Phylogenomics revives traditional views on deep animal relationships. Curr Biol. 19:706–712.
Philippe H, Telford M. 2006. Large-scale sequencing and the new animal phylogeny. Trends Ecol Evol. 21:614–620.
Portik DM, Wood PL, Grismer JL, Stanley EL, Jackman TR. 2011. Identification of 104 rapidly-evolving nuclear protein-coding markers for amplification across scaled reptiles using genomic resources. Conserv Genet Resour. 4:1–10.
Pyron RA, Wiens JJ. 2011. A large-scale phylogeny of Amphibia including over 2800 species, and a revised classification of extant frogs, salamanders, and caecilians. Mol Phylogenet Evol. 61:543–583.
Roelants K, Gower DJ, Wilkinson M, Loader SP, Biju SD, Guillaume K, Moriau L, Bossuyt F. 2007. Global patterns of diversification in the history of modern amphibians. Proc Natl Acad Sci U S A. 104:887–892.
Rokas A, Williams BL, King N, Carroll SB. 2003. Genome-scale approaches to resolving incongruence in molecular phylogenies. Nature 425:798–804.
Ronquist F, Teslenko M, Van der Mark P, Ayres DL, Darling A, Höhna S, Larget B, Liu L, Suchard MA, Huelsenbeck JP. 2012. MrBayes 3.2: efficient Bayesian phylogenetic inference and model choice across a large model space. Syst Biol. 61:539–542.
Roure B, Baurain D, Philippe H. 2013. Impact of missing data on phylogenies inferred from empirical phylogenomic data sets. Mol Biol Evol. 30:197–214.
San Mauro D. 2010. A multilocus timescale for the origin of extant amphibians. Mol Phylogenet Evol. 56:554–561.
San Mauro D, Vences M, Alcobendas M, Zardoya R, Meyer A. 2005. Initial diversification of living amphibians predated the breakup of Pangaea. Am Nat. 165:590–599.
Shen XX, Liang D, Wen JZ, Zhang P. 2011. Multiple genome alignments facilitate development of NPCL markers: a case study of tetrapod phylogeny focusing on the position of turtles. Mol Biol Evol. 28:3237–3252.
Shen XX, Liang D, Zhang P. 2012. The development of three long universal nuclear protein-coding locus markers and their application to osteichthyan phylogenetics with nested PCR. PLoS One 7:e39256.
Shimodaira H. 2002. An approximately unbiased test of phylogenetic tree selection. Syst Biol. 51:492–508.
Shimodaira H, Hasegawa M. 2001. CONSEL: for assessing the confidence of phylogenetic tree selection. Bioinformatics 17:1246–1247.
Song S, Liu L, Edwards SV, Wu S. 2012. Resolving conflict in eutherian mammal phylogeny using phylogenomics and the multispecies coalescent model. Proc Natl Acad Sci U S A. 109:14942–14947.
Spinks PQ, Thomson RC, Barley AJ, Newman CE, Shaffer HB. 2010. Testing avian, squamate, and mammalian nuclear markers for cross amplification in turtles. Conserv Genet Resour. 2:127–129.
Spinks PQ, Thomson RC, Lovely GA, Shaffer HB. 2009. Assessing what is needed to resolve a molecular phylogeny: simulations and empirical data from Emydid turtles. BMC Evol Biol. 9:56.
Stamatakis A. 2006. RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics 22:2688–2690.
Tewhey R, Warner JB, Nakano M, Libby B, Medkova M, David PH, Kotsopoulos SK, Samuels ML, Hutchison JB, Larson JW. 2009. Microdroplet-based PCR enrichment for large-scale targeted sequencing. Nat Biotechnol. 27:1025–1031.
Thomson RC, Shedlock AM, Edwards SV, Shaffer HB. 2008. Developing markers for multilocus phylogenetics in non-model organisms: a test case with turtles. Mol Phylogenet Evol. 49:514–525.
Thomson RC, Wang IJ, Johnson JR. 2010. Genome-enabled development of DNA markers for ecology, evolution and conservation. Mol Ecol. 19:2184–2195.
Townsend TM, Alegre RE, Kelley ST, Wiens JJ, Reeder TW. 2008. Rapid development of multiple nuclear loci for phylogenetic analysis using genomic resources: an example from squamate reptiles. Mol Phylogenet Evol. 47:129–142.
Weisrock DW, Harmon LJ, Larson A. 2005. Resolving deep phylogenetic relationships in salamanders: analyses of mitochondrial and nuclear genomic DNA. Syst Biol. 54:758–777.
Wiens J, Bonett R, Chippindale P. 2005. Ontogeny discombobulates phylogeny: paedomorphosis and higher-level salamander relationships. Syst Biol. 54:91–110.
Wright TF, Schirtzinger EE, Matsumoto T, et al. (11 co-authors). 2008. A multilocus molecular phylogeny of the parrots (Psittaciformes): support for a Gondwanan origin during the cretaceous. Mol Biol Evol. 25:2141–2156.
Zhang P, Wake DB. 2009. Higher-level salamander relationships and divergence dates inferred from complete mitochondrial genomes. Mol Phylogenet Evol. 53:492–508.
Zhang P, Zhou H, Chen YQ, Liu YF, Qu LH. 2005. Mitogenomic perspectives on the origin and phylogeny of living amphibians. Syst Biol. 54:391–400.
Zhou XM, Xu SX, Zhang P, Yang G. 2011. Developing a series of conservative anchor markers and their application to phylogenomics of Laurasiatherian mammals. Mol Ecol Resour. 11:134–140.


the initial diversification of salamanders occurred within a relatively short window of time (Weisrock et al. 2005), the genealogical histories of individual gene loci may sometimes appear misleading in terms of the relationships among species due to incomplete lineage sorting. Unfortunately, the mitochondrial genome recorded such an incorrect history.

Discussion

The NPCL toolkit and experimental protocol introduced here constitute a highly reliable, rapid, and cost-effective method for building medium-scale multilocus data sets to produce high-resolution phylogenetic relationships. This phylogenomic approach has the potential to accelerate the completion of many parts of the vertebrate tree of life because no further marker development, which is often the bottleneck in phylogenetic research, is required. Once a specific phylogenetic question within vertebrates arises, researchers simply need to check the marker list of our toolkit and look for NPCL markers with the expected evolutionary rates and experimental performance for their groups of interest. Then, many orthologous loci can be quickly obtained by traditional PCR and Sanger sequencing, usually without time-consuming gel cutting and cloning. Applying the NPCL toolkit may also have another benefit for assembling the vertebrate tree of life, because people working on different groups can easily use the same set of loci, which will facilitate combined analyses.

Merits of the Toolkit

Because of the use of the nested PCR strategy outlined earlier, most NPCLs in the toolkit work for all major jawed vertebrate groups with high experimental success rates (normally >95%). Such results were achieved under unified PCR conditions without any extra effort involving cycling condition optimization. This feature of the toolkit enables it to surpass previously developed nuclear marker sets (Murphy et al. 2001; Li et al. 2007; Thomson et al. 2008; Townsend et al. 2008; Wright et al. 2008; Portik et al. 2011; Shen et al. 2011; Zhou et al. 2011). Most previous nuclear marker sets were developed for specific animal groups, and their application to other distantly related groups is usually difficult. For example, Spinks et al. (2010) collected 120 nuclear markers from avian, squamate, and mammalian phylogenetic studies and evaluated their PCR performance in turtles. They found that only eight nuclear markers successfully produced single expected bands across 13 tested turtle species. In another case, Fong and Fujita (2011) developed 75 nuclear markers for vertebrate phylogenetics, but approximately 60% of the target fragments could not be obtained in three test species (two reptiles and one lissamphibian). Therefore, although the nested PCR method introduced here requires an additional PCR reaction, the extra work is still worthwhile.

Table 1. Summary Information for the 30 NPCL Amplified in 19 Salamander Taxa.

| Gene | Length (bp) | Taxa Amplified (%) | PCR Products Directly Sequenced (%) | GC (%) | Var. Sites (%) | PI Sites (%) | Overall Mean P Distance | RCV |
|---|---|---|---|---|---|---|---|---|
| BPTF | 552 | 19 (100) | 19 (100) | 43 | 163 (30) | 118 (21) | 0.098 | 0.093 |
| CAND1 | 1155 | 19 (100) | 17 (89) | 44 | 403 (35) | 314 (27) | 0.116 | 0.065 |
| DET1 | 711 | 19 (100) | 18 (95) | 46 | 275 (39) | 216 (30) | 0.131 | 0.091 |
| DISP1 | 960 | 19 (100) | 19 (100) | 41 | 317 (33) | 211 (22) | 0.096 | 0.072 |
| DNAH3 | 918 | 19 (100) | 19 (100) | 42 | 389 (42) | 304 (33) | 0.139 | 0.049 |
| DOLK | 672 | 16 (84) | 12 (75) | 52 | 316 (47) | 236 (35) | 0.173 | 0.126 |
| DSEL | 1266 | 19 (100) | 19 (100) | 44 | 546 (43) | 415 (33) | 0.148 | 0.055 |
| ENC1 | 1083 | 19 (100) | 19 (100) | 51 | 363 (34) | 279 (26) | 0.120 | 0.057 |
| EXTL3 | 1245 | 19 (100) | 17 (89) | 47 | 465 (37) | 322 (26) | 0.118 | 0.067 |
| FAT4 | 738 | 19 (100) | 19 (100) | 45 | 344 (47) | 249 (34) | 0.156 | 0.072 |
| FICD | 510 | 18 (95) | 18 (100) | 44 | 169 (33) | 124 (24) | 0.111 | 0.074 |
| GRM2 | 690 | 18 (95) | 18 (100) | 54 | 240 (35) | 176 (26) | 0.115 | 0.118 |
| HYP | 1260 | 19 (100) | 19 (100) | 47 | 516 (41) | 359 (28) | 0.122 | 0.049 |
| KBTBD2 | 1059 | 19 (100) | 19 (100) | 44 | 406 (38) | 246 (23) | 0.103 | 0.040 |
| KCNF1 | 735 | 19 (100) | 19 (100) | 52 | 294 (40) | 220 (30) | 0.151 | 0.153 |
| KIAA1239 | 1362 | 19 (100) | 19 (100) | 42 | 479 (35) | 338 (25) | 0.110 | 0.048 |
| KIAA2013 | 516 | 19 (100) | 19 (100) | 52 | 221 (43) | 178 (34) | 0.148 | 0.093 |
| LIG4 | 1017 | 19 (100) | 19 (100) | 39 | 434 (43) | 301 (30) | 0.137 | 0.043 |
| LPHN2 | 573 | 19 (100) | 19 (100) | 47 | 192 (34) | 140 (24) | 0.106 | 0.108 |
| LRRN1 | 837 | 19 (100) | 18 (95) | 49 | 345 (41) | 240 (29) | 0.130 | 0.119 |
| MGAT4C | 747 | 18 (95) | 18 (100) | 42 | 273 (37) | 212 (28) | 0.126 | 0.079 |
| MIOS | 879 | 19 (100) | 19 (100) | 45 | 291 (33) | 213 (24) | 0.107 | 0.065 |
| PANX2 | 717 | 19 (100) | 19 (100) | 44 | 254 (35) | 199 (28) | 0.125 | 0.059 |
| PDP1 | 1035 | 19 (100) | 19 (100) | 45 | 348 (34) | 261 (25) | 0.110 | 0.080 |
| PPL | 1338 | 19 (100) | 17 (89) | 47 | 645 (48) | 485 (36) | 0.156 | 0.064 |
| RAG1 | 1380 | 19 (100) | 19 (100) | 51 | 550 (40) | 438 (32) | 0.146 | 0.072 |
| RAG2 | 792 | 19 (100) | 18 (95) | 49 | 406 (51) | 310 (39) | 0.184 | 0.090 |
| SACS | 1101 | 19 (100) | 19 (100) | 40 | 383 (35) | 282 (26) | 0.105 | 0.040 |
| TTN | 984 | 19 (100) | 5 (26) | 43 | 378 (38) | 269 (27) | 0.125 | 0.052 |
| ZBED4 | 1002 | 19 (100) | 19 (100) | 39 | 325 (32) | 224 (22) | 0.093 | 0.040 |

NOTE. Length, length of refined alignment; Var. sites, variable sites; PI sites, parsimony informative sites; RCV, relative composition variability.

FIG. 4. Higher-level phylogenetic relationships of 10 salamander families inferred from 30 NPCL markers (30 nuclear genes; total 27,834 bp). The tree was inferred by concatenation analyses using ML, BI, and the mixture model (CAT) and by species-tree analysis using the pseudo-ML approach (MP-EST). Branch support values are indicated beside nodes in order of ML bootstrap (BPML), BI posterior probability (PPBI), CAT posterior probability (PPCAT), and MP-EST bootstrap (BPMP-EST) from left to right. The filled squares represent BPML > 95%, PPBAY = 1.0, PPCAT = 1.0, and BPMP-EST > 95%. The circled number refers to the node of interest studied in figure 6. Branch lengths are from the ML analysis (scale bar: 0.1 substitutions/site).

Table 2. Statistical Confidence (P Values) for Alternative Branching Hypotheses Based on 30-Gene Data Set.

| Alternative Topology Tested | Ln L | P Value (AU) | P Value (BP) | P Value (KH) | Rejection |
|---|---|---|---|---|---|
| Best ML | 0 | 0.993 | 0.97 | 0.98 | |
| Sirenidae branched earlier | 328 | 0.025 | 0.015 | 0.02 | + + + |
| Sirenidae is sister to Cryptobranchoidea | 423 | 0.004 | 0.002 | 0.004 | + + + |
| Gymnophiona is sister to Anura (monophyletic lissamphibians) | 343 | 0.013 | 0.012 | 0.013 | + + + |
| Gymnophiona is sister to Caudata (monophyletic lissamphibians) | 439 | 0.002 | 0.001 | 0.001 | + + + |
| Anura is sister to Amniota (paraphyletic lissamphibians) | 1446 | 5E−30 | 0 | 0 | + + + |
| Gymnophiona is sister to Amniota (paraphyletic lissamphibians) | 1290 | 1E−69 | 0 | 0 | + + + |
| Caudata is sister to Amniota (paraphyletic lissamphibians) | 1728 | 0.0001 | 0 | 0 | + + + |

In PCR-based phylogenetic projects, even when the PCR reactions are successful, the products often contain significant nonspecific amplicons. Such a condition requires additional effort involving gel purification and cloning, which takes much more time than the PCR reaction itself. Our NPCL toolkit is specifically designed to solve this problem, so that normally over 90% of PCR reactions produce strong and single expected bands. Moreover, most of the primers used to date for nuclear marker sets are degenerate and thus are not suitable for direct sequencing of PCR products. Benefiting from the use of our nested PCR strategy, we introduce anchoring sequences to the ends of PCR fragments while maintaining PCR efficiency. Such introduced anchoring sequences bring the added benefit that two specific sequencing primers (Seq_F and Seq_R) can be used in all Sanger sequencing reactions.

One additional feature of our NPCL toolkit is that the average length of the NPCLs within it is 1,050 bp, a length that can easily be amplified in one PCR reaction and sequenced in both directions, allowing efficient use of resources. In contrast, the average marker lengths are 920 bp for 10 NPCLs in Li et al. (2007), 760 bp for 26 NPCLs in Townsend et al. (2008), 873 bp for 22 NPCLs in Shen et al. (2011), and 470 bp for 75 NPCLs in Fong and Fujita (2011). Longer markers will provide more sites than shorter ones for equivalent money and time. This feature makes our NPCL toolkit more cost-effective than previously developed nuclear marker sets.

Phylogenetic Utility

The vertebrate NPCL toolkit we developed here shows great promise in terms of phylogenetic utility. A remarkable feature of our NPCL toolkit is that it provides 102 NPCLs with a broad range of evolutionary rates. In our demonstration, we used 30 NPCLs to resolve a family-level salamander phylogeny using both traditional concatenation analyses and a more promising species-tree analysis. However, this example does not mean that our toolkit performs well only on deep-timescale questions. Our ongoing study using this toolkit to resolve the relationships within Plethodontidae, a rapidly radiating group of salamanders, suggests that the toolkit also performs well in resolving species-level phylogenies. For many vertebrate groups in which applicable nuclear markers are limited, such as some teleosts, frogs, and salamanders, our NPCL toolkit can provide a one-stop solution for phylogenetic studies from the family level to the species level. Even for those groups in which specific nuclear marker sets have been developed, our toolkit is still worth trying, as many more loci can be easily obtained that may resolve some difficult branches.

The Toolkit Is a Good Addition to Sequence Capture Approaches

Recently, sequence capture approaches have been applied to vertebrate phylogenomics (Crawford et al. 2012; Faircloth et al. 2012; Lemmon et al. 2012; McCormack et al. 2012). These approaches begin with the selective capture of genomic regions. Briefly, fragmented gDNA is hybridized to DNA or RNA probes, either on an array or in solution. Nontargeted DNA is then washed away, and the targeted DNA is sequenced through NGS. The most promising feature of the sequence capture approach is that it can simultaneously produce hundreds to thousands of loci for tens of individuals within a relatively short time. Therefore, the sequence capture approach is considered to be much more cost-effective than the PCR-based method. According to the calculation of Lemmon et al. (2012), for a project with 100 taxa and 500 loci, the cost of the sequence capture method is just 1–3.5% of that of the PCR-based method.

However, the sequence capture approach is currently too challenging for most phylogenetic researchers. Typical NGS runs (454 or Illumina) used by the sequence capture method generate 1,000,000–2,000,000,000 sequences. Storing and processing these NGS data require significant computer memory, hardware upgrades, and bioinformatic programming skills, which are unfamiliar to most phylogenetic researchers. Moreover, phylogenetic reconstruction assumes that orthologous genes are being analyzed across species. For the PCR-based method, the detection of paralogous genes is relatively straightforward. However, in the sequence capture method, the captured genomic regions comprise short conserved cores (probe regions) and long unconserved flanking sequences. Because paralogy cannot be detected until after the data are aligned, those unalignable sequences will make the detection of paralogy more difficult.

FIG. 5. The effect of increasing the number of nuclear loci on resolving the basal split within salamanders. [Plot: bootstrap support (%), 0–100, on the y axis against number of genes, 5–30, on the x axis; two series, concatenation analyses and species tree analyses.] Each data point represents the mean of support values estimated from 30 randomly sampled subsets. The dashed line indicates the threshold of 95% bootstrap support. The statistical plots show that the minimum number of nuclear loci needed to robustly resolve the basal split within salamanders is 25.
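The locus-subsampling design summarized in the figure 5 caption (30 random subsets per number of genes, mean support compared against a 95% threshold) can be sketched as follows. This is only an illustrative outline: `support_fn` is a hypothetical callable standing in for a full tree search plus bootstrap analysis on a subset of loci, and `toy_support` is an invented placeholder, not a model of the real data.

```python
import random
from statistics import mean

def mean_support_by_locus_count(loci, support_fn, sizes, n_subsets=30, seed=0):
    """Mean support for each subset size, averaged over random subsets of loci."""
    rng = random.Random(seed)
    curve = {}
    for k in sizes:
        # Draw n_subsets random subsets of k loci (without replacement within a subset).
        values = [support_fn(rng.sample(loci, k)) for _ in range(n_subsets)]
        curve[k] = mean(values)
    return curve

def min_loci_reaching(curve, threshold=95.0):
    """Smallest subset size whose mean support meets the threshold, or None."""
    reached = [k for k, v in sorted(curve.items()) if v >= threshold]
    return reached[0] if reached else None

def toy_support(subset):
    # Hypothetical stand-in: support grows linearly with the number of loci.
    return min(100.0, 5.0 * len(subset))

loci = [f"NPCL_{i:03d}" for i in range(1, 31)]
curve = mean_support_by_locus_count(loci, toy_support, sizes=range(5, 31, 5))
print(curve, min_loci_reaching(curve))
```

In a real analysis, `support_fn` would wrap the concatenation or species-tree pipeline and return the bootstrap value of the node of interest.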

In fact, not every phylogenetic project will use more than 500 loci, as the sequence capture method normally does. Based on both empirical and simulation data, 20–50 loci are generally sufficient to answer many phylogenetic questions (Rokas et al. 2003; Spinks et al. 2009). This is also the number of loci that most phylogenetic studies will use. In such a situation, adopting the sequence capture method is not cost-effective because researchers need to use relatively expensive NGS and spend time learning new experimental techniques and carrying out sophisticated bioinformatic processing. Our NPCL toolkit is specially designed for such medium-scale phylogenetic projects using approximately 50 loci. Such a number of loci can easily be supplied by our 102 NPCLs. Because more than 90% of the PCR reactions generated by our toolkit can be directly sequenced, the average cost for one locus per sample is rather low. In our laboratory, generating one new sequence typically costs US$3 (without considering labor).
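As a rough illustration of the per-sequence figure quoted above, the consumables budget of a PCR-based project scales with the number of taxon-by-locus combinations that still have to be generated. The sketch below assumes a flat US$3 per sequence, ignores labor and failed reactions, and uses a hypothetical project of 100 taxa and 50 loci.

```python
def sanger_project_cost(n_taxa, n_loci, cost_per_sequence=3.0, already_available=0):
    """Approximate consumables cost (US$) for the sequences still to be generated."""
    n_new = n_taxa * n_loci - already_available
    return n_new * cost_per_sequence

# Hypothetical medium-scale project: 100 taxa x 50 loci, nothing available in databases.
medium_project = sanger_project_cost(100, 50)
print(medium_project)
```

The `already_available` argument is an assumption added for convenience, to account for sequences that can be taken from public databases.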

In addition, researchers sometimes have only tiny amounts of DNA but wish to perform a multilocus phylogenetic analysis. In such a situation, the sequence capture method is difficult to implement because it normally requires DNA at the microgram level (Lemmon et al. 2012). Our NPCL toolkit can fill the gap here. Benefiting from the nested PCR strategy, the sensitivity of PCR reactions in our method is extremely high. In many test experiments in our laboratory, the toolkit and protocol could produce target bands with only 5–10 ng of DNA.

Our NPCL toolkit is an alternative to the sequence capture method for the everyday work of phylogenetic researchers. Which method to choose depends on two major drivers: the amount of DNA and the expected number of loci. When DNA is limited, the better solution may be PCR; otherwise, sequence capture also works. Taking into account the money and time the two methods require, we speculate that the economic transition point from PCR to sequence capture is at approximately 100 loci. That assessment is why our toolkit includes 102 NPCL markers. Our proposal is that when using up to 100 loci, one can try our NPCL toolkit; when using >100 loci, sequence capture should be used.

Future Directions

In this study, we used multiple genome alignments deposited in the University of California–Santa Cruz (UCSC) genome browser to identify long and conserved exons across jawed vertebrates. Benefiting from the nested PCR strategy, the experimental performance of the developed NPCLs indicated that they are highly stable in all major jawed vertebrate groups. Recently, a database for mining exon and intron markers called EvolMarkers has been built by Li et al. (2012). Careful investigation of this database may identify many conserved exons within nonvertebrates, whose interrelationships are currently more problematic than those of vertebrates. Because the nonvertebrates constitute many distantly related groups, it may be impossible to develop a single set of PCR primers for all nonvertebrates. However, following a similar marker development strategy, multiple NPCL toolkits could be constructed for various groups of nonvertebrates, such as arthropods, echinoderms, and molluscs. In addition, because introns are flanked by conserved exons, the idea of using nested PCR for marker development could also be applied to the development of EPIC (exon-primed intron crossing) markers, which are more suitable for shallow-scale phylogenetic or phylogeographic projects.

Despite the benefits of our proposed method, it must be recognized that when handling large-scale projects, such as 200 taxa × 100 loci, the use of our toolkit and Sanger sequencing will still require significant cost, time, and labor. An alternative solution is to use NGS to replace Sanger sequencing. Recently, 454 NGS technology has been applied to sequence targeted gene regions from a pool of PCR products from different specimens (Binladen et al. 2007; Meyer et al. 2008). In such experiments, specific tagging sequences must be added to amplicons by either PCR (Binladen et al. 2007) or blunt-end ligation (Meyer et al. 2008). Therefore, if the tailing sequences of the second-round PCR primers in our NPCL toolkit are replaced by tagging sequences (for tag design, see Faircloth and Glenn 2012), all PCR products can be pooled together and sequenced with 454 NGS, which will greatly reduce the money and time cost compared with Sanger sequencing. However, parallel tagged sequencing via NGS does not circumvent the process of PCR for each individual at each locus, which may be the most onerous part of a large-scale phylogenomic project. Some promising new technologies may help to solve this problem, such as microdroplet PCR (Tewhey et al. 2009), where millions of individual PCR reactions are performed simultaneously in picoliter-scale droplets, and the 96.96 Dynamic Array by Fluidigm, which allows 96 primer combinations to be used on 96 samples (9,216 total PCR reactions) on a single PCR plate. However, there has been little research on applying NGS and new high-throughput PCR technologies to phylogenomics, so their ease of use and cost-effectiveness still need to be explored.

Summary

In conclusion, we have developed an improved method for rapidly amplifying and sequencing NPCLs that has proven to be useful and effective for molecular phylogenetic studies of vertebrates. The newly developed toolkit provides an attractive alternative to available methods for vertebrate phylogenomics.

Materials and Methods

Development of NPCL and Primer Design

Our previous study showed that nested PCR is overwhelmingly more effective than conventional PCR for obtaining target amplicons from complex genomic environments (Shen et al. 2012). However, nested PCR requires four conserved regions to design two pairs of primers (illustrated in fig. 6; yellow blocks represent the conserved regions used for primer design), which means that only relatively long exons are suitable as candidates for NPCL development with the nested PCR method. To search for long and conserved exons, we took advantage of our previous bioinformatic method, which used the multiple genome alignment data from the UCSC Genome Browser to identify conserved exons (Shen et al. 2011). Because the NPCL markers are to be used in vertebrates, we focused only on those multiple genome alignments that include at least six species: Danio rerio (zebrafish), Silurana tropicalis (frog), Anolis carolinensis (lizard), Gallus gallus (chicken), Mus musculus (mouse), and Homo sapiens (human). The alignments of candidate exons had to meet two criteria: length of more than 700 bp and pairwise similarity ranging from 35% to 90%. The detailed bioinformatic pipeline has been described elsewhere (Shen et al. 2011). In addition to using multiple genome alignments to screen NPCL candidates, we also manually searched the ENSEMBL database for nuclear genes that were used previously (Murphy et al. 2001; Li et al. 2007; Townsend et al. 2008; Wright et al. 2008; Zhou et al. 2011; Song et al. 2012) to check whether they contain large and appropriately conserved exons.
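The two screening criteria for candidate exon alignments (length above 700 bp, pairwise similarity between 35% and 90%) can be expressed as a simple filter. The sketch below is not the published pipeline (see Shen et al. 2011 for that); it assumes that similarity means percent identity over ungapped columns, that length means alignment length, and that every species pair must fall inside the range.

```python
from itertools import combinations

def pairwise_identity(a, b):
    """Percent identity over alignment columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)

def is_candidate_exon(alignment, min_len=700, sim_range=(35.0, 90.0)):
    """alignment: dict mapping species name to an aligned sequence (equal lengths)."""
    seqs = list(alignment.values())
    if len(seqs[0]) <= min_len:  # criterion 1: longer than 700 bp
        return False
    sims = [pairwise_identity(a, b) for a, b in combinations(seqs, 2)]
    low, high = sim_range
    return all(low <= s <= high for s in sims)  # criterion 2: 35-90% similarity

# Toy two-species alignment: 800 columns, 75% identical.
toy = {"species_A": "A" * 800, "species_B": "A" * 600 + "C" * 200}
print(is_candidate_exon(toy))
```

Requiring all pairs to pass is the strictest reading of the criterion; a mean-based or reference-based definition would be an equally plausible assumption.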

As a result, we assembled a total of 305 NPCL candidate alignments, of which 120 contained the appropriate number of conserved blocks, and used these to design nested PCR primers. To increase the success rates of our NPCL markers in amniotes, we manually added a turtle sequence (Chrysemys picta bellii) to each of the candidate alignments using data downloaded from the ENSEMBL database. A total of 480 primers were designed for the 120 NPCL candidates. Briefly, the first-round PCR primers are only used to enrich target regions from genomic environments and not to obtain target amplicons, so the degeneracy of these primers is normally high to increase reaction sensitivity; the second-round PCR primers are used to obtain target amplicons, so the degeneracy of these primers is lower to increase reaction specificity. Our previous study showed that the nested PCR method often produces strong and single amplicon bands (Shen et al. 2012). To facilitate the subsequent direct sequencing, we added a tail (5′-AGGGTTTTCCCAGTCACGAC-3′) to the 5′-end of all second-round forward primers and a tail (5′-AGATAACAATTTCACACAGG-3′) to the 5′-end of all second-round reverse primers. These tail sequences provide two unique anchoring sites for direct sequencing from cleaned PCR products. In our pilot experiments, adding the tail sequences to primers did not affect the efficiency of the second-round PCR.

FIG. 6. Schematic representation of the experimental protocol for using our NPCL toolkit. Note that for each NPCL, nested PCR primers are designed on four short conserved blocks flanking the target region. The laboratory protocol shown in the figure is as follows.

(i) First PCR with primers F1 and R1 using gDNA as template. The target region is enriched from the complex genomic environment with one pair of highly degenerate primers. PCR was performed with 50–100 ng DNA in a 25 µl reaction. Cycling conditions: an initial denaturation step of 4 min at 94 °C, followed by 35 cycles of 94 °C for 45 s, 45 °C for 40 s, and 72 °C for 2 min, and a final extension at 72 °C for 10 min.

(ii) Second PCR with tailed primers F2 and R2 using the first PCR product as template. The target region is specifically amplified from the first-round PCR products with one pair of tailed, low-degeneracy primers. PCR was performed with 1 µl of the first PCR product in a 25 µl reaction. Cycling conditions: an initial denaturation step of 4 min at 94 °C, followed by 35 cycles of 94 °C for 45 s, 50 °C for 40 s, and 72 °C for 90 s, and a final extension at 72 °C for 10 min.

(iii) PCR evaluation and sequencing. Agarose gel electrophoresis results are evaluated. If a single and strong band is present (normally >90% of reactions), the product is cleaned with ExoI and FastAP and directly sequenced with the general sequencing primers Seq_F and Seq_R; otherwise, gel cutting or cloning is performed before sequencing. For cleanup, 25 µl PCR product is treated with 2 U ExoI and 0.4 U FastAP (37 °C for 30 min, then 80 °C for 15 min); the cleaned PCR product can be used for direct sequencing. A Sanger sequencing reaction is performed with 0.5 µl BigDye and 1 µl cleaned PCR product.
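The tailing of second-round primers described above amounts to prepending a fixed 20-nt sequence to the 5′ end of each locus-specific primer. A minimal sketch follows; the two tail sequences are taken from the text, whereas the locus-specific primers in the example are hypothetical and do not come from supplementary table S1.

```python
FWD_TAIL = "AGGGTTTTCCCAGTCACGAC"  # tail on second-round forward primers (same sequence as Seq_F)
REV_TAIL = "AGATAACAATTTCACACAGG"  # tail on second-round reverse primers (same sequence as Seq_R)

def add_tails(forward_primer, reverse_primer):
    """Prepend the anchoring tails to the 5' ends of a second-round primer pair."""
    return FWD_TAIL + forward_primer.upper(), REV_TAIL + reverse_primer.upper()

# Hypothetical locus-specific primers written 5'->3' (IUPAC degenerate codes allowed).
f2, r2 = add_tails("GARTGYAARCCNTGGAC", "TCRTTNGGRTANACCAT")
print(f2)
print(r2)
```

Because every amplicon then carries the same two tails, a single pair of sequencing primers anneals to all products regardless of locus.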

Experimental Testing for Candidate Markers in 16 Jawed Vertebrates

To test the experimental performance of our newly designed NPCL markers, we selected 16 taxa representing nine major jawed vertebrate lineages: Chondrichthyes (Sphyrna lewini), Actinopterygii (Lepisosteus oculatus and Pangasius sutchi), Dipnoi (Protopterus annectens), Lissamphibia (Ichthyophis bannanicus, Batrachuperus yenyuanensis, and Rana nigromaculata), Mammalia (Mus musculus and Sus scrofa domestica), Testudines (Trionyx sinensis and Podocnemis unifilis), Aves (Struthio camelus and Zosterops japonica), Crocodylia (Crocodylus siamensis), and Squamata (Hemidactylus bowringii and Naja naja atra). Total genomic DNA was extracted from ethanol-preserved tissues (liver or muscle) using the standard salt extraction protocol. All extracted genomic DNAs were diluted to a concentration of 50 ng/µl with 1× TE and stored at −20 °C before PCR amplification.

All 120 NPCL markers were tested with a two-round PCR strategy (nested PCR). The first-round PCR was performed in a 25 µl reaction containing 1–2 µl template DNA (50–100 ng) with final concentrations of 1× PCR buffer, 200 µM dNTP, 400 nM of each forward and reverse first-round primer, and 1.25 U Taq polymerase (TransTaq High Fidelity, TransGen, Beijing). The cycling conditions of the first-round PCR were as follows: an initial denaturation step of 4 min at 94 °C, followed by 35 cycles of a 45 s denaturation at 94 °C, a 40 s annealing at 45 °C, and a 2 min extension at 72 °C, followed by a final 10 min extension at 72 °C. The second-round PCR was also performed in a 25 µl reaction containing 1 µl of the first-round PCR product (without dilution) and final concentrations of 1× PCR buffer, 200 µM dNTP, 400 nM of each forward and reverse second-round primer, and 1.25 U Taq polymerase. The cycling conditions of the second-round PCR were as follows: an initial denaturation step of 4 min at 94 °C, followed by 35 cycles of a 45 s denaturation at 94 °C, a 40 s annealing at 50 °C, and a 90 s extension at 72 °C, followed by a final 10 min extension at 72 °C.
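To give a sense of the extra bench time implied by the second PCR round, the programmed hold times of the two cycling profiles can be summed. The sketch below counts only the hold steps stated in the protocol; thermocycler ramp times, which depend on the instrument, are deliberately left out.

```python
def program_seconds(initial, cycles, steps, final):
    """Total programmed hold time in seconds (ramp times are not included)."""
    return initial + cycles * sum(steps) + final

# Step durations in seconds: denaturation, annealing, extension.
first_round = program_seconds(initial=4 * 60, cycles=35, steps=(45, 40, 120), final=10 * 60)
second_round = program_seconds(initial=4 * 60, cycles=35, steps=(45, 40, 90), final=10 * 60)

print(round(first_round / 60, 1), round(second_round / 60, 1))
```

Under these assumptions the two rounds differ only in extension time (120 s versus 90 s per cycle) and annealing temperature, so the second round adds slightly less than two hours of machine time.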

One microliter of the second-round PCR products was analyzed on a 1.0% TAE agarose gel. An NPCL marker was considered successful if more than 8 of the 16 tested taxa produced target amplicon bands. On this basis, 102 of the 120 tested NPCL markers were successful. The nested PCR primers for the 102 NPCL markers can be found in supplementary table S1, Supplementary Material online. If the PCR products contained significant nonspecific amplicon bands (normally <10%), they needed further processing, for example, standard gel cutting or cloning. If the PCR reactions produced single amplicon bands (normally >90%), they were cleaned with ExoFAP treatment: 2 U ExoI and 0.4 U FastAP (all Fermentas) were added to the PCR tube and incubated for 30 min at 37 °C and 15 min at 80 °C. The cleaned PCR reactions can be used directly as templates for Sanger sequencing. According to our experimental design, all PCR fragments can be sequenced from both ends with the two universal sequencing primers, Seq_F (5′-AGGGTTTTCCCAGTCACGAC-3′) and Seq_R (5′-AGATAACAATTTCACACAGG-3′). A typical Sanger sequencing reaction in our laboratory consumes 0.5 µl BigDye and 1 µl cleaned PCR product. The primer design strategy, the laboratory protocol for the nested PCR method, and the pretreatment of PCR products before Sanger sequencing are illustrated in figure 6.
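The success criterion used in this section (a marker is retained if more than 8 of the 16 tested taxa yield the target band) is easy to apply to a table of gel scores. The sketch below assumes a hypothetical scoring format in which each marker maps taxon names to True (target band present) or False; the taxon labels and scores are invented for illustration.

```python
def marker_is_successful(band_by_taxon, min_taxa=8):
    """True if MORE than `min_taxa` taxa produced the target amplicon band."""
    return sum(bool(present) for present in band_by_taxon.values()) > min_taxa

def successful_markers(scores):
    """scores: dict mapping marker name to {taxon: bool}; returns retained marker names."""
    return [name for name, bands in scores.items() if marker_is_successful(bands)]

# Hypothetical gel scores for two markers across 16 taxa.
taxa = [f"taxon_{i:02d}" for i in range(1, 17)]
scores = {
    "marker_A": {t: i < 9 for i, t in enumerate(taxa)},  # 9 of 16 taxa amplified
    "marker_B": {t: i < 8 for i, t in enumerate(taxa)},  # exactly 8 of 16 taxa
}
print(successful_markers(scores))
```

Note that exactly 8 of 16 does not pass, since the text specifies more than 8.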

Calculation of Relative Evolutionary Rate of 102 NPCLs

The rate multipliers (m) across partitions estimated in MrBayes 3.2 (Ronquist et al. 2012) are used as relative evolutionary rates. To calculate these parameters, alignments for each NPCL were prepared for 12 species: Homo sapiens, Macaca mulatta, Mus musculus, Rattus norvegicus, Gallus gallus, Meleagris gallopavo, Chrysemys picta bellii, Anolis carolinensis, Silurana tropicalis, Tetraodon nigroviridis, Takifugu rubripes, and Danio rerio. Because genome data are available for the 12 species, we did not generate any new data. The 102 NPCL alignments were then combined and subjected to MrBayes analyses partitioned by gene. Each gene was assigned a separate GTR + Γ + I model, and all model parameters were unlinked. Two Markov chain Monte Carlo (MCMC) runs were performed with one cold chain and three heated chains (temperature set to 0.1) for 50 million generations and sampled every 1,000 generations. The rate multiplier for each gene was estimated using Tracer version 1.4 after discarding the first 50% of the generations. All evolutionary rates were normalized by dividing by the maximum value of the obtained rates.
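The final normalization step divides each gene's rate multiplier by the largest multiplier, so the fastest gene has a relative rate of 1. A minimal sketch, using invented multiplier values rather than the estimates from the study:

```python
def normalize_rates(rate_multipliers):
    """Scale per-gene rate multipliers so that the fastest gene equals 1.0."""
    fastest = max(rate_multipliers.values())
    return {gene: m / fastest for gene, m in rate_multipliers.items()}

# Hypothetical posterior mean rate multipliers for three genes.
example = {"gene_a": 0.5, "gene_b": 2.0, "gene_c": 1.0}
print(normalize_rates(example))
```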

Gene and Taxon Sampling for Investigating Higher Level Salamander Relationships

To test the utility of our NPCL toolkit in a real case, we selected 19 salamander taxa representing all 10 salamander families and 9 outgroup taxa to investigate the family-level relationships of salamanders (supplementary table S2, Supplementary Material online). For gene sampling, we randomly selected 30 NPCL markers whose PCR success rates were more than 90% in the 16 previously tested vertebrates. Among the 840 target sequences (30 markers for 28 taxa), 201 were available in public databases (NCBI, UCSC, and ENSEMBL), whereas the remaining 639 sequences needed to be generated de novo. The experimental procedure was as described earlier. All obtained sequences were examined by checking for the presence of premature stop codons (pseudogenes) and by BlastX searches against the nonredundant protein sequences (nr) database to confirm that they were our target genes. The NPCLs, species, and accession numbers for the newly obtained sequences are listed in supplementary table S2, Supplementary Material online.
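The stop-codon screen mentioned above can be sketched as a scan for in-frame stop codons. The example assumes the standard nuclear genetic code and that the reading frame of each fragment is already known (here passed as `frame`); because the NPCL fragments are internal exon regions, any in-frame stop is treated as suspicious. The arithmetic for the sampling design is included for reference.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard nuclear genetic code

def in_frame_stops(seq, frame=0):
    """Return 0-based nucleotide positions of in-frame stop codons (gaps removed first)."""
    seq = seq.upper().replace("-", "")
    return [i for i in range(frame, len(seq) - 2, 3) if seq[i:i + 3] in STOP_CODONS]

# Sampling arithmetic from the text: 30 markers x 28 taxa, 201 already in databases.
n_target = 30 * 28
n_new = n_target - 201

print(in_frame_stops("ATGGCCTAAGGC"), n_target, n_new)
```

A sequence with a non-empty result would be flagged as a possible pseudogene and checked further, for example by the BlastX search described in the text.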

Phylogenetic Analyses

Alignments of all 30 NPCL markers were conducted using the G-INS-i method from MAFFT (Katoh et al. 2005) under the default settings according to their translated amino acid sequences and were then refined by eye. All 30 refined alignments were combined into a concatenated data set (27,834 bp).

For the concatenated data set, we manually defined five partitioning strategies: 2 partitions (one for codon positions 1 and 2 and one for codon position 3), 3 partitions (one partition for each codon position), 30 partitions (one partition for each gene), 60 partitions (one for codon positions 1 + 2 and one for codon position 3 in each of the 30 genes), and 90 partitions (one for each codon position in each of the 30 genes). Comparisons of the five partitioning strategies and selection of the corresponding nucleotide substitution models were conducted under the Bayesian information criterion implemented in PartitionFinder (Lanfear et al. 2012). The 3-partition scheme (one partition for each codon position) was chosen as the best-fitting partitioning strategy, and all 3 partitions favored the GTR + Γ + I model.
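The five partitioning strategies can be generated mechanically from the gene boundaries in the concatenated alignment. The sketch below is illustrative only: the gene coordinates are hypothetical (30 genes of 900 bp each), every gene is assumed to start in frame with a length divisible by three, and the `start-end\3` strings follow the stride notation used in RAxML-style partition files.

```python
def codon_sites(start, end, position):
    """Every third site of start..end (1-based, inclusive) beginning at one codon position."""
    return f"{start + position - 1}-{end}\\3"

def build_schemes(genes):
    """genes: list of (name, start, end); every gene is assumed to start in frame."""
    first, last = genes[0][1], genes[-1][2]
    whole = {p: codon_sites(first, last, p) for p in (1, 2, 3)}
    schemes = {
        "2_partitions": {"pos12": [whole[1], whole[2]], "pos3": [whole[3]]},
        "3_partitions": {f"pos{p}": [whole[p]] for p in (1, 2, 3)},
        "by_gene": {name: [f"{s}-{e}"] for name, s, e in genes},
        "by_gene_pos12_pos3": {},
        "by_gene_and_codon": {},
    }
    for name, s, e in genes:
        schemes["by_gene_pos12_pos3"][f"{name}_pos12"] = [codon_sites(s, e, 1), codon_sites(s, e, 2)]
        schemes["by_gene_pos12_pos3"][f"{name}_pos3"] = [codon_sites(s, e, 3)]
        for p in (1, 2, 3):
            schemes["by_gene_and_codon"][f"{name}_pos{p}"] = [codon_sites(s, e, p)]
    return schemes

# Hypothetical layout: 30 genes of 900 bp each (the real gene lengths differ).
genes = [(f"NPCL{i:02d}", 900 * (i - 1) + 1, 900 * i) for i in range(1, 31)]
schemes = build_schemes(genes)
print({name: len(parts) for name, parts in schemes.items()})
```

The resulting partition counts (2, 3, 30, 60, and 90) match the five strategies listed above; each scheme would then be written in whatever format the model-selection program expects.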

The concatenated data set was analyzed separately with both ML and Bayesian inference (BI) methods under the 3-partition scheme. Partitioned ML analyses were implemented using RAxML version 7.2.6 (Stamatakis 2006) with the GTR + Γ + I model assigned to each partition. A search that combined 100 separate ML searches was applied to find the optimal tree, and branch support for each node was evaluated with 500 standard bootstrapping replicates (-f d -b 500 option) implemented in RAxML. The partitioned BI was conducted using MrBayes 3.2 (Ronquist et al. 2012). All model parameters were unlinked. Two MCMC runs were performed with one cold chain and three heated chains (temperature set to 0.1) for 60 million generations and sampled every 1,000 generations. Chain stationarity was visualized by plotting −ln L against the generation number using Tracer version 1.4, and the first 50% of generations were discarded. Topologies and posterior probabilities were estimated from the remaining generations. The two runs for each analysis were compared for congruence.

We also performed Bayesian phylogenetic analyses under a mixture model, CAT + GTR + Γ4, in PhyloBayes 3.3 (Lartillot et al. 2009) with two independent MCMC runs. Each run was performed for 10,000 cycles and sampled every cycle. Stationarity was considered reached when the largest discrepancy (maxdiff) between the two independent runs was less than 0.1. The first 5,000 trees in each MCMC run were discarded. The remaining 10,000 trees of the two runs were sampled every 5 trees to generate a 50% majority-rule posterior consensus tree.
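The burn-in and thinning arithmetic in this paragraph can be checked with a short sketch: two runs of 10,000 sampled trees, 5,000 discarded from each, and every fifth remaining tree kept. The tree labels below are placeholders, and pooling before thinning is an assumption made for simplicity; thinning each run separately gives the same number of trees.

```python
def pooled_sample(runs, burn_in, thin):
    """Drop the first `burn_in` trees of each run, pool the rest, keep every `thin`-th tree."""
    pooled = [tree for run in runs for tree in run[burn_in:]]
    return pooled[::thin]

# Placeholder labels standing in for the sampled trees of two independent runs.
runs = [[f"run{r}_cycle{c}" for c in range(1, 10001)] for r in (1, 2)]
kept = pooled_sample(runs, burn_in=5000, thin=5)
print(len(kept), kept[0])
```

Under these settings, 2,000 trees contribute to the majority-rule consensus.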

Species tree estimation was conducted using the pseudo-ML approach in the program MP-EST (Liu et al. 2010) under the coalescent model. Briefly, the gene trees for the 30 NPCL markers were reconstructed with ML under the GTR + Γ8 model using PHYML 3.0 (Guindon et al. 2010). The resulting 30 gene trees were rooted with an outgroup (Chrysemys picta bellii) and used to generate a species tree with MP-EST. The robustness of the species tree was evaluated with nonparametric bootstrapping of 500 replicates.

Alternative phylogenetic hypotheses were tested based on the 30-gene data set. We first calculated sitewise log likelihoods of alternative trees using RAxML (-f g) under the GTR + Γ + I model. Then, the site log-likelihood file was used to estimate P values for each alternative tree in the CONSEL program (Shimodaira and Hasegawa 2001) using the Kishino–Hasegawa test (KH) (Kishino and Hasegawa 1989), the approximately unbiased test (AU) (Shimodaira 2002), and the RELL bootstrap proportion test (BP).

Supplementary Material

Supplementary tables S1 and S2 and figures S1 and S2 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).

Acknowledgments

The authors are grateful to David Wake and Carol Spencer of the Museum of Vertebrate Zoology at the University of California, Berkeley, for tissue loans. David Wake provided many useful comments on the manuscript. This work was supported by National Natural Science Foundation of China grants (31172075 and 30900136) and the National Science Fund for Excellent Young Scholars (no. pending).

References

Binladen J, Gilbert MTP, Bollback JP, Panitz F, Bendixen C, Nielsen R, Willerslev E. 2007. The use of coded PCR primers enables high-throughput sequencing of multiple homolog amplification products by 454 parallel sequencing. PLoS One 2:e197.

Crawford NG, Faircloth BC, McCormack JE, Brumfield RT, Winker K, Glenn TC. 2012. More than 1000 ultraconserved elements provide evidence that turtles are the sister group to archosaurs. Biol Lett. 8:783–786.

Delsuc F, Brinkmann H, Philippe H. 2005. Phylogenomics and the reconstruction of the tree of life. Nat Rev Genet. 6:361–375.

Duellman WE, Trueb L. 1994. Biology of amphibians. Baltimore (MD): Johns Hopkins University Press.

Dunn CW, Hejnol A, Matus DQ, et al. (18 co-authors). 2008. Broad phylogenomic sampling improves resolution of the animal tree of life. Nature 452:745–749.

Faircloth BC, Glenn TC. 2012. Not all sequence tags are created equal: designing and validating sequence identification tags robust to indels. PLoS One 7:e42543.

Faircloth BC, McCormack JE, Crawford NG, Brumfield RT, Glenn TC. 2012. Ultraconserved elements anchor thousands of genetic markers for target enrichment spanning multiple evolutionary timescales. Syst Biol. 61:717–726.

Fong JJ, Brown JM, Fujita MK, Boussau B. 2012. A phylogenomic approach to vertebrate phylogeny supports a turtle-archosaur affinity and a possible paraphyletic lissamphibia. PLoS One 7:e48990.

Fong JJ, Fujita MK. 2011. Evaluating phylogenetic informativeness and data-type usage for new protein-coding genes across Vertebrata. Mol Phylogenet Evol. 61:300–307.

Frost DR, Grant T, Faivovich J, et al. (18 co-authors). 2006. The amphibian tree of life. Bull Am Mus Nat Hist. 297:8–370.

Gao KQ, Shubin NH. 2001. Late Jurassic salamanders from northern China. Nature 410:574–577.

Guindon S, Dufayard J-F, Lefort V, Anisimova M, Hordijk W, Gascuel O. 2010. New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Syst Biol. 59:307–321.

Hugall AF, Foster R, Lee MSY. 2007. Calibration choice, rate smoothing, and the pattern of tetrapod diversification according to the long nuclear gene RAG-1. Syst Biol. 56:543–563.

Inoue JG, Miya M, Lam K, Tay BH, Danks JA, Bell J, Walker TI, Venkatesh B. 2010. Evolutionary origin and phylogeny of the modern holocephalans (Chondrichthyes: Chimaeriformes): a mitogenomic perspective. Mol Biol Evol. 27:2576–2586.

Katoh K, Kuma K, Toh H, Miyata T. 2005. MAFFT version 5: improvement in accuracy of multiple sequence alignment. Nucleic Acids Res. 33:511–518.

Kishino H, Hasegawa M. 1989. Evaluation of the maximum likelihood estimate of the evolutionary tree topologies from DNA sequence data, and the branching order in Hominoidea. J Mol Evol. 29:170–179.

Künstner A, Wolf JBW, Backström N, et al. (13 co-authors). 2010. Comparative genomics based on massive parallel transcriptome sequencing reveals patterns of substitution and selection across 10 bird species. Mol Ecol. 19:266–276.

Lanfear R, Calcott B, Ho SYW, Guindon S. 2012. PartitionFinder: combined selection of partitioning schemes and substitution models for phylogenetic analyses. Mol Biol Evol. 29:1695–1701.

Lartillot N, Lepage T, Blanquart S. 2009. PhyloBayes 3: a Bayesian software package for phylogenetic reconstruction and molecular dating. Bioinformatics 25:2286–2288.

Lemmon AR, Brown JM, Stanger-Hall K, Lemmon EM. 2009. The effect of ambiguous data on phylogenetic estimates obtained by maximum likelihood and Bayesian inference. Syst Biol. 58:130–145.

Lemmon AR, Emme SA, Lemmon EM. 2012. Anchored hybrid enrichment for massively high-throughput phylogenomics. Syst Biol. 61:727–744.

Li C, Ortí G, Zhang G, Lu G. 2007. A practical approach to phylogenomics: the phylogeny of ray-finned fish (Actinopterygii) as a case study. BMC Evol Biol. 7:44.

Li C, Riethoven JJ, Naylor GJ. 2012. EvolMarkers: a database for mining exon and intron markers for evolution, ecology and conservation studies. Mol Ecol Resour. 12:967–971.

Liu L, Yu L, Edwards SV. 2010. A maximum pseudo-likelihood approach for estimating species trees under the coalescent model. BMC Evol Biol. 10:302.

McCormack JE, Faircloth BC, Crawford NG, Gowaty PA, Brumfield RT, Glenn TC. 2012. Ultraconserved elements are novel phylogenomic markers that resolve placental mammal phylogeny when combined with species-tree analysis. Genome Res. 22:746–754.

McCormack JE, Hird SM, Zellmer AJ, Carstens BC, Brumfield RT. 2013. Applications of next-generation sequencing to phylogeography and phylogenetics. Mol Phylogenet Evol. 66:526–538.

Meyer A, Van de Peer Y. 2005. From 2R to 3R: evidence for a fish-specific genome duplication (FSGD). Bioessays 27:937–945.

Meyer M, Stenzel U, Hofreiter M. 2008. Parallel tagged sequencing on the 454 platform. Nat Protoc. 3:267–278.

Murphy WJ, Eizirik E, Johnson WE, Zhang YP, Ryder OA, O'Brien SJ. 2001. Molecular phylogenetics and the origins of placental mammals. Nature 409:614–618.

Philippe H, Derelle R, Lopez P, et al. (20 co-authors). 2009. Phylogenomics revives traditional views on deep animal relationships. Curr Biol. 19:706–712.

Philippe H, Telford M. 2006. Large-scale sequencing and the new animal phylogeny. Trends Ecol Evol. 21:614–620.

Portik DM, Wood PL, Grismer JL, Stanley EL, Jackman TR. 2011. Identification of 104 rapidly-evolving nuclear protein-coding markers for amplification across scaled reptiles using genomic resources. Conserv Genet Resour. 4:1–10.

Pyron RA, Wiens JJ. 2011. A large-scale phylogeny of Amphibia including over 2800 species, and a revised classification of extant frogs, salamanders, and caecilians. Mol Phylogenet Evol. 61:543–583.

Roelants K, Gower DJ, Wilkinson M, Loader SP, Biju SD, Guillaume K, Moriau L, Bossuyt F. 2007. Global patterns of diversification in the history of modern amphibians. Proc Natl Acad Sci U S A. 104:887–892.

Rokas A, Williams BL, King N, Carroll SB. 2003. Genome-scale approaches to resolving incongruence in molecular phylogenies. Nature 425:798–804.

Ronquist F, Teslenko M, Van der Mark P, Ayres DL, Darling A, Höhna S, Larget B, Liu L, Suchard MA, Huelsenbeck JP. 2012. MrBayes 3.2: efficient Bayesian phylogenetic inference and model choice across a large model space. Syst Biol. 61:539–542.

Roure B, Baurain D, Philippe H. 2013. Impact of missing data on phylogenies inferred from empirical phylogenomic data sets. Mol Biol Evol. 30:197–214.

San Mauro D. 2010. A multilocus timescale for the origin of extant amphibians. Mol Phylogenet Evol. 56:554–561.

San Mauro D, Vences M, Alcobendas M, Zardoya R, Meyer A. 2005. Initial diversification of living amphibians predated the breakup of Pangaea. Am Nat. 165:590–599.

Shen XX, Liang D, Wen JZ, Zhang P. 2011. Multiple genome alignments facilitate development of NPCL markers: a case study of tetrapod phylogeny focusing on the position of turtles. Mol Biol Evol. 28:3237–3252.

Shen XX, Liang D, Zhang P. 2012. The development of three long universal nuclear protein-coding locus markers and their application to osteichthyan phylogenetics with nested PCR. PLoS One 7:e39256.

Shimodaira H. 2002. An approximately unbiased test of phylogenetic tree selection. Syst Biol. 51:492–508.

Shimodaira H, Hasegawa M. 2001. CONSEL: for assessing the confidence of phylogenetic tree selection. Bioinformatics 17:1246–1247.

Song S, Liu L, Edwards SV, Wu S. 2012. Resolving conflict in eutherian mammal phylogeny using phylogenomics and the multispecies coalescent model. Proc Natl Acad Sci U S A. 109:14942–14947.

Spinks PQ, Thomson RC, Barley AJ, Newman CE, Shaffer HB. 2010. Testing avian, squamate, and mammalian nuclear markers for cross amplification in turtles. Conserv Genet Resour. 2:127–129.

Spinks PQ, Thomson RC, Lovely GA, Shaffer HB. 2009. Assessing what is needed to resolve a molecular phylogeny: simulations and empirical data from Emydid turtles. BMC Evol Biol. 9:56.

Stamatakis A. 2006. RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics 22:2688–2690.

Tewhey R, Warner JB, Nakano M, Libby B, Medkova M, David PH, Kotsopoulos SK, Samuels ML, Hutchison JB, Larson JW. 2009. Microdroplet-based PCR enrichment for large-scale targeted sequencing. Nat Biotechnol. 27:1025–1031.

Thomson RC, Shedlock AM, Edwards SV, Shaffer HB. 2008. Developing markers for multilocus phylogenetics in non-model organisms: a test case with turtles. Mol Phylogenet Evol. 49:514–525.

Thomson RC, Wang IJ, Johnson JR. 2010. Genome-enabled development of DNA markers for ecology, evolution and conservation. Mol Ecol. 19:2184–2195.

Townsend TM, Alegre RE, Kelley ST, Wiens JJ, Reeder TW. 2008. Rapid development of multiple nuclear loci for phylogenetic analysis using genomic resources: an example from squamate reptiles. Mol Phylogenet Evol. 47:129–142.

Weisrock DW, Harmon LJ, Larson A. 2005. Resolving deep phylogenetic relationships in salamanders: analyses of mitochondrial and nuclear genomic DNA. Syst Biol. 54:758–777.

Wiens J, Bonett R, Chippindale P. 2005. Ontogeny discombobulates phylogeny: paedomorphosis and higher-level salamander relationships. Syst Biol. 54:91–110.

Universal Toolkit Including 102 NPCL. doi:10.1093/molbev/mst122. MBE. Downloaded from http://mbe.oxfordjournals.org/ at Vanderbilt University - Massey Law Library on March 18, 2015.

Wright TF, Schirtzinger EE, Matsumoto T, et al. (11 co-authors). 2008. A multilocus molecular phylogeny of the parrots (Psittaciformes): support for a Gondwanan origin during the Cretaceous. Mol Biol Evol 25:2141–2156.

Zhang P, Wake DB. 2009. Higher-level salamander relationships and divergence dates inferred from complete mitochondrial genomes. Mol Phylogenet Evol 53:492–508.

Zhang P, Zhou H, Chen YQ, Liu YF, Qu LH. 2005. Mitogenomic perspectives on the origin and phylogeny of living amphibians. Syst Biol 54:391–400.

Zhou XM, Xu SX, Zhang P, Yang G. 2011. Developing a series of conservative anchor markers and their application to phylogenomics of Laurasiatherian mammals. Mol Ecol Resour 11:134–140.


[Figure 4 tree: 30 nuclear genes (27,834 bp in total); scale bar, 0.1 substitutions/site. Salamander tips: Aneides hardii, Plethodon jordani, Batrachoseps major, Eurycea bislineata, Amphiuma means, Rhyacotriton variegatus, Proteus anguinus, Necturus beyeri, Tylototriton asperrimus, Cynops orientalis, Salamandra salamandra, Dicamptodon aterrimus, Ambystoma mexicanum, Pseudobranchus axanthus, Siren intermedia, Ranodon sibiricus, Batrachuperus yenyuanensis, Onychodactylus fischeri, Andrias davidianus. Outgroup tips: Silurana tropicalis, Bombina fortinuptialis, Typhlonectes natans, Ichthyophis bannanicus, Gallus gallus, Homo sapiens, Chrysemys picta bellii, Mus musculus, Latimeria chalumnae. Family labels: Plethodontidae, Amphiumidae, Rhyacotritonidae, Proteidae, Salamandridae, Dicamptodontidae, Ambystomatidae, Sirenidae, Hynobiidae, Cryptobranchidae. Clade labels: Salamandroidea, Cryptobranchoidea, ANURA, GYMNOPHIONA, non-amphibian outgroup. Support values printed at two nodes: 99/1.0/1.0/83 and 99/1.0/1.0/74. Circled node: 1.]

FIG. 4. Higher-level phylogenetic relationships of 10 salamander families inferred from 30 NPCL markers. The tree was inferred by concatenation analyses using ML, BI, and the mixture model (CAT) and by species-tree analysis using the pseudo-ML approach (MP-EST). Branch support values are indicated beside nodes in the order ML bootstrap (BPML), BI posterior probability (PPBI), CAT posterior probability (PPCAT), and MP-EST bootstrap (BPMP-EST), from left to right. The filled squares represent BPML > 95%, PPBI = 1.0, PPCAT = 1.0, and BPMP-EST > 95%. The circled number refers to the node of interest studied in figure 5. Branch lengths are from the ML analysis.

Table 2. Statistical Confidence (P Values) for Alternative Branching Hypotheses Based on the 30-Gene Data Set.

Alternative Topology Tested | ΔLn L | P Value (AU) | P Value (BP) | P Value (KH) | Rejection (AU BP KH)
Best ML | 0 | 0.993 | 0.97 | 0.98 |
Sirenidae branched earlier | 328 | 0.025 | 0.015 | 0.02 | + + +
Sirenidae is sister to Cryptobranchoidea | 423 | 0.004 | 0.002 | 0.004 | + + +
Gymnophiona is sister to Anura (monophyletic lissamphibians) | 343 | 0.013 | 0.012 | 0.013 | + + +
Gymnophiona is sister to Caudata (monophyletic lissamphibians) | 439 | 0.002 | 0.001 | 0.001 | + + +
Anura is sister to Amniota (paraphyletic lissamphibians) | 1446 | 5E-30 | 0 | 0 | + + +
Gymnophiona is sister to Amniota (paraphyletic lissamphibians) | 1290 | 1E-69 | 0 | 0 | + + +
Caudata is sister to Amniota (paraphyletic lissamphibians) | 1728 | 0.0001 | 0 | 0 | + + +


other distantly related groups is usually difficult. For example, Spinks et al. (2010) collected 120 nuclear markers from avian, squamate, and mammalian phylogenetic studies and evaluated their PCR performance in turtles. They found that only eight nuclear markers successfully produced single expected bands across 13 tested turtle species. In another case, Fong and Fujita (2011) developed 75 nuclear markers for vertebrate phylogenetics, but approximately 60% of the target fragments could not be obtained in three test species (two reptiles and one lissamphibian). Therefore, although the nested PCR method introduced here requires an additional PCR reaction, the extra work is still worthwhile.

In PCR-based phylogenetic projects, even when the PCR reactions are successful, the products often contain significant nonspecific amplicons. Such products require gel purification and cloning, which take much more time than the PCR reaction itself. Our NPCL toolkit is specifically designed to solve this problem, so that normally over 90% of PCR reactions produce strong, single bands of the expected size. Moreover, most of the primers used to date for nuclear marker sets are degenerate and thus are not suitable for directly sequencing PCR products. With our nested PCR strategy, we introduce anchoring sequences to the ends of PCR fragments while maintaining PCR efficiency. These anchoring sequences bring the added benefit that two specific sequencing primers (Seq_F and Seq_R) can be used in all Sanger sequencing reactions.

One additional feature of our NPCL toolkit is that the average length of its NPCLs is 1,050 bp, a length that can easily be amplified in one PCR reaction and sequenced in both directions, allowing efficient use of resources. In contrast, the average marker lengths are 920 bp for 10 NPCLs in Li et al. (2007), 760 bp for 26 NPCLs in Townsend et al. (2008), 873 bp for 22 NPCLs in Shen et al. (2011), and 470 bp for 75 NPCLs in Fong and Fujita (2011). Longer markers provide more sites than shorter ones for equivalent money and time. This feature makes our NPCL toolkit more cost-effective than previously developed nuclear marker sets.

Phylogenetic Utility

The vertebrate NPCL toolkit developed here shows great promise in terms of phylogenetic utility. A remarkable feature of the toolkit is that it provides 102 NPCLs with a broad range of evolutionary rates. In our demonstration, we used 30 NPCLs to resolve a family-level salamander phylogeny using both traditional concatenation analyses and a more promising species-tree analysis. However, this example does not mean that our toolkit performs well only on deep-timescale questions. Our ongoing study using this toolkit to resolve the relationships within Plethodontidae, a rapidly radiating group of salamanders, suggests that the toolkit also performs well in resolving species-level phylogenies. For many vertebrate groups in which applicable nuclear markers are limited, such as some teleosts, frogs, and salamanders, our NPCL toolkit can provide a one-stop solution for phylogenetic studies from the family level to the species level. Even for those groups in which specific nuclear marker sets have been developed, our toolkit is still worth trying, as many more loci can be easily obtained that may resolve some difficult branches.

The Toolkit Is a Good Addition to Sequence Capture Approaches

Recently, sequence capture approaches have been applied to vertebrate phylogenomics (Crawford et al. 2012; Faircloth et al. 2012; Lemmon et al. 2012; McCormack et al. 2012). These approaches begin with the selective capture of genomic regions. Briefly, fragmented gDNA is hybridized to DNA or RNA probes, either on an array or in solution. Nontargeted DNA is then washed away, and the targeted DNA is sequenced through NGS. The most promising feature of the sequence capture approach is that it can simultaneously produce hundreds to thousands of loci for tens of individuals within a relatively short time. Therefore, the sequence capture approach is considered to be much more cost-effective than the PCR-based method. According to the calculation of Lemmon et al. (2012), for a project of 100 taxa × 500 loci, the cost of the sequence capture method is just 1–3.5% of that of the PCR-based method.

However, the sequence capture approach is currently too challenging for most phylogenetic researchers. Typical NGS runs (454 or Illumina) used by the sequence capture method generate 1,000,000–2,000,000,000 sequences. Storing and processing these NGS data require significant computer memory, hardware upgrades, and bioinformatic programming skills, which are unfamiliar to most phylogenetic researchers. Moreover, phylogenetic reconstruction assumes that orthologous genes are being analyzed across species. For the PCR-based method, the detection of paralogous genes is relatively straightforward. However, in the sequence capture method, the captured genomic regions comprise short conserved cores (probe regions) and long unconserved flanking sequences. Because paralogy cannot be detected until after the data are aligned, those unalignable sequences will make the detection of paralogy more difficult.

[Figure 5 plot: bootstrap support (%) on the y-axis (0–100) against number of genes on the x-axis (5–30), with one curve for concatenation analyses and one for species-tree analyses.]

FIG. 5. The effect of increasing the number of nuclear loci on resolving the basal split within salamanders. Each data point represents the mean of support values estimated from 30 randomly sampled subsets. The dashed line indicates the threshold of 95% bootstrap support. The plots show that the minimum number of nuclear loci needed to robustly resolve the basal split within salamanders is 25.
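The subsampling design summarized in figure 5 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the script used in the study: the function names, the gene labels, the subset sizes (read from the x-axis ticks), and the fixed random seed are all hypothetical, and the tree inference that would yield a support value for each subset is not shown.

```python
import random

def random_gene_subsets(genes, subset_size, n_replicates=30, seed=0):
    """Return n_replicates random subsets of subset_size genes (no replacement)."""
    rng = random.Random(seed)  # fixed seed (assumption) so the draw is reproducible
    return [sorted(rng.sample(genes, subset_size)) for _ in range(n_replicates)]

def mean_support(support_values):
    """Mean of the support values obtained for one subset size."""
    return sum(support_values) / len(support_values)

# Hypothetical labels standing in for the 30 sampled NPCLs.
genes = [f"NPCL_{i:02d}" for i in range(1, 31)]

# 30 random subsets per subset size; the sizes themselves are an assumption.
subsets_by_size = {k: random_gene_subsets(genes, k) for k in (5, 10, 15, 20, 25, 30)}
```

Each subset would then be analyzed (by concatenation or by a species-tree method), and the support values for the basal split averaged per subset size with `mean_support`.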

In fact, not every phylogenetic project will use more than 500 loci, as the sequence capture method normally does. Based on both empirical and simulation data, 20–50 loci are generally sufficient to answer many phylogenetic questions (Rokas et al. 2003; Spinks et al. 2009). This is also the number of loci that most phylogenetic studies will use. In such a situation, adopting the sequence capture method is not cost-effective because researchers need to use relatively expensive NGS sequencing and spend time learning new experimental techniques and carrying out sophisticated bioinformatic processing. Our NPCL toolkit is specially designed for such medium-scale phylogenetic projects using approximately 50 loci. Such a number of loci can easily be obtained from our 102 NPCLs. Because more than 90% of the PCR products generated with our toolkit can be directly sequenced, the average cost for one locus per sample is rather low. In our laboratory, generating one new sequence typically costs US$3 (without considering labor).
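As a rough worked example of this cost figure, the sketch below multiplies the number of sequences that still have to be generated by the quoted US$3 per sequence. The function name is hypothetical, and the calculation assumes that every reaction succeeds and is sequenced directly, which the text says holds for most but not all reactions.

```python
COST_PER_SEQUENCE_USD = 3  # figure quoted in the text, labor excluded

def sanger_project_cost(n_taxa, n_loci, n_existing=0,
                        cost_per_sequence=COST_PER_SEQUENCE_USD):
    """Estimated cost (US$) of generating the missing taxon-by-locus sequences."""
    n_new = n_taxa * n_loci - n_existing
    if n_new < 0:
        raise ValueError("n_existing exceeds the number of target sequences")
    return n_new * cost_per_sequence

# Salamander data set from Materials and Methods: 28 taxa x 30 loci = 840
# targets, 201 already public, so 639 new sequences.
salamander_cost = sanger_project_cost(28, 30, n_existing=201)

# A medium-scale project of the kind discussed here (hypothetical sizes).
medium_cost = sanger_project_cost(100, 50)
```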

In addition, researchers sometimes have only tiny amounts of DNA but wish to perform a multilocus phylogenetic analysis. In such a situation, the sequence capture method is difficult to implement because it normally requires DNA at the microgram level (Lemmon et al. 2012). Our NPCL toolkit can fill the gap here. Because of the nested PCR strategy, the sensitivity of PCR reactions in our method is extremely high. In many test experiments in our laboratory, the toolkit and protocol produced target bands with only 5–10 ng of DNA.

Our NPCL toolkit is an alternative to the sequence capture method for the everyday work of phylogenetic researchers. Which method to choose depends on two major drivers: the amount of DNA and the expected number of loci. When DNA is limited, PCR may be the better solution; otherwise, sequence capture also works. Taking into account the money and time the two methods require, we speculate that the economic transition point from PCR to sequence capture is at approximately 100 loci. That assessment is why our toolkit includes 102 NPCL markers. Our proposal is that when using ≤100 loci, one can try our NPCL toolkit; when using >100 loci, sequence capture should be used.
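The rule of thumb above can be written as a small decision function. This is a hedged sketch: the function name is hypothetical, the 100-locus threshold is the speculative transition point stated in the text, and the 1,000 ng cutoff is only an assumed stand-in for "DNA at the microgram level".

```python
def suggest_method(n_loci, dna_ng, loci_threshold=100, capture_min_dna_ng=1000):
    """Suggest an approach from the two drivers named in the text.

    loci_threshold: approximate economic transition point (about 100 loci).
    capture_min_dna_ng: assumed minimum DNA for sequence capture (1 microgram).
    """
    if dna_ng < capture_min_dna_ng:
        return "nested PCR (NPCL toolkit)"  # too little DNA for capture
    if n_loci <= loci_threshold:
        return "nested PCR (NPCL toolkit)"  # medium-scale project
    return "sequence capture"

choice_small = suggest_method(n_loci=50, dna_ng=5000)
choice_large = suggest_method(n_loci=500, dna_ng=5000)
choice_low_dna = suggest_method(n_loci=500, dna_ng=10)
```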

Future Directions

In this study, we used multiple genome alignments deposited in the University of California, Santa Cruz (UCSC) Genome Browser to identify long and conserved exons across jawed vertebrates. With the nested PCR strategy, the experimental performance of the developed NPCLs indicated that they are highly stable in all major jawed vertebrate groups. Recently, a database for mining exon and intron markers, called EvolMarkers, has been built by Li et al. (2012). Careful investigation of this database may identify many conserved exons within nonvertebrates, whose interrelationships are currently more problematic than those of vertebrates. Because the nonvertebrates constitute many distantly related groups, it may be impossible to develop a single set of PCR primers for all nonvertebrates. However, following a similar marker development strategy, multiple NPCL toolkits could be constructed for various groups of nonvertebrates, such as arthropods, echinoderms, and molluscs. In addition, because introns are flanked by conserved exons, the idea of using nested PCR for marker development could also be applied to the development of EPIC (exon-primed intron crossing) markers, which are more suitable for shallow-scale phylogenetic or phylogeographic projects.

Despite the benefits of our proposed method, it must be recognized that when handling large-scale projects, such as 200 taxa × 100 loci, the use of our toolkit and Sanger sequencing will still require significant cost, time, and labor. An alternative solution is to use NGS to replace Sanger sequencing. Recently, 454 NGS technology has been applied to sequence targeted gene regions from a pool of PCR products from different specimens (Binladen et al. 2007; Meyer et al. 2008). In such experiments, specific tagging sequences must be added to amplicons by either PCR (Binladen et al. 2007) or blunt-end ligation (Meyer et al. 2008). Therefore, if the tail sequences of the second-round PCR primers in our NPCL toolkit are replaced by tagging sequences (for tag design, see Faircloth and Glenn 2012), all PCR products can be pooled together and sequenced with 454 NGS, which will greatly reduce the money and time cost compared with Sanger sequencing. However, parallel tagged sequencing via NGS does not circumvent the process of PCR for each individual at each locus, which may be the most onerous part of a large-scale phylogenomic project. Some promising new technologies may help to solve this problem, such as microdroplet PCR (Tewhey et al. 2009), where millions of individual PCR reactions are performed simultaneously in picoliter-scale droplets, and the 96.96 Dynamic Array by Fluidigm, which allows 96 primer combinations to be used on 96 samples (9,216 total PCR reactions) on a single PCR plate. However, there has been little research on applying NGS and new high-throughput PCR technologies to phylogenomics, so their ease of use and cost-effectiveness still need to be explored.

Summary

In conclusion, we have developed an improved method for rapidly amplifying and sequencing NPCLs that has proven to be useful and effective for molecular phylogenetic studies of vertebrates. The newly developed toolkit provides an attractive alternative to available methods for vertebrate phylogenomics.

Materials and Methods

Development of NPCL and Primer Design

Our previous study showed that nested PCR is overwhelmingly more effective than conventional PCR for obtaining target amplicons from complex genomic environments (Shen et al. 2012). However, nested PCR requires four conserved regions to design two pairs of primers (illustrated in fig. 6; yellow blocks represent the conserved regions used for primer design), which means that only relatively long exons are suitable as candidates for NPCL development with the nested PCR method. To search for long and conserved exons, we took advantage of our previous bioinformatic method, which used the multiple genome alignment data from the UCSC Genome Browser to identify conserved exons (Shen et al. 2011). Because the NPCL markers are to be used in vertebrates, we focused only on those multiple genome alignments that include at least six species: Danio rerio (zebrafish), Silurana tropicalis (frog), Anolis carolinensis (lizard), Gallus gallus (chicken), Mus musculus (mouse), and Homo sapiens (human). The alignments of candidate exons had to meet two criteria: length of more than 700 bp and pairwise similarity ranging from 35% to 90%. The detailed bioinformatic pipeline has been described elsewhere (Shen et al. 2011). In addition to using multiple genome alignments to screen NPCL candidates, we also manually searched the ENSEMBL database for nuclear genes that were used previously (Murphy et al. 2001; Li et al. 2007; Townsend et al. 2008; Wright et al. 2008; Zhou et al. 2011; Song et al. 2012) to check whether they contain large and appropriately conserved exons.
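The two screening criteria for candidate exons can be illustrated with a short Python sketch. It is not the published pipeline (see Shen et al. 2011 for that): the function names are hypothetical, similarity is assumed here to mean percent identity over columns where neither sequence has a gap, and the rule that every pair must fall within 35–90% is likewise an assumption.

```python
from itertools import combinations

def percent_identity(seq_a, seq_b):
    """Percent identity over aligned columns where neither sequence has a gap."""
    pairs = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper())
             if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(a == b for a, b in pairs) / len(pairs)

def is_candidate(alignment, min_length=700, min_sim=35.0, max_sim=90.0):
    """Apply the length and pairwise-similarity criteria to one exon alignment.

    alignment: dict mapping species name to aligned sequence (equal lengths).
    """
    seqs = list(alignment.values())
    if len(seqs[0]) <= min_length:  # text: length of more than 700 bp
        return False
    return all(min_sim <= percent_identity(a, b) <= max_sim
               for a, b in combinations(seqs, 2))

# Toy alignment (hypothetical), with the length cutoff lowered for the demo.
toy = {"sp1": "ACGTACGTAC", "sp2": "ACGTACGTTT"}
toy_passes = is_candidate(toy, min_length=5)
```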

[Figure 6 diagram: laboratory protocol. Labels: gDNA; the first-round PCR product; target region; conserved blocks; primers F1, R1, F2, and R2; sequencing primers Seq_F and Seq_R.]

(i) First PCR with primers F1 and R1, using gDNA as template. The target region is enriched from the complex genomic environment with one pair of highly degenerate primers. PCR was performed with 50–100 ng DNA in a 25 µl reaction. Cycling conditions: an initial denaturation step of 4 min at 94 °C, followed by 35 cycles of 94 °C for 45 s, 45 °C for 40 s, and 72 °C for 2 min, and a final extension at 72 °C for 10 min.

(ii) Second PCR with tailed primers F2 and R2, using the first-round PCR product as template. The target region is specifically amplified from the first-round PCR product with one pair of tailed, low-degeneracy primers. PCR was performed with 1 µl of the first-round PCR product in a 25 µl reaction. Cycling conditions: an initial denaturation step of 4 min at 94 °C, followed by 35 cycles of 94 °C for 45 s, 50 °C for 40 s, and 72 °C for 90 s, and a final extension at 72 °C for 10 min.

(iii) PCR evaluation and sequencing. Agarose gel electrophoresis results are evaluated. If the product is a single, strong band (normally >90%), 25 µl of PCR product is cleaned with 2 U ExoI and 0.4 U FastAP (cleanup conditions: 37 °C for 30 min, 80 °C for 15 min) and sequenced directly with the general sequencing primers Seq_F and Seq_R; a Sanger sequencing reaction is performed with 0.5 µl BigDye and 1 µl cleaned PCR product. Otherwise, the product goes through gel cutting or cloning and then sequencing.

FIG. 6. Schematic representation of the experimental protocol for using our NPCL toolkit. Note that for each NPCL, nested PCR primers are designed on four short conserved blocks flanking the target region.

As a result, we assembled a total of 305 NPCL candidate alignments, of which 120 contained the appropriate number of conserved blocks, and used these to design nested PCR primers. To increase the success rates of our NPCL markers in amniotes, we manually added a turtle sequence (Chrysemys picta bellii) to each of the candidate alignments using data downloaded from the ENSEMBL database. A total of 480 primers were designed for the 120 NPCL candidates. Briefly, the first-round PCR primers are used only to enrich target regions from genomic environments and not to obtain target amplicons, so the degeneracy of these primers is normally high to increase reaction sensitivity; the second-round PCR primers are used to obtain target amplicons, so the degeneracy of these primers is lower to increase reaction specificity. Our previous study showed that the nested PCR method often produces strong and single amplicon bands (Shen et al. 2012). To facilitate the subsequent direct sequencing, we added a tail (5′-AGGGTTTTCCCAGTCACGAC-3′) to the 5′-end of all second-round forward primers and a tail (5′-AGATAACAATTTCACACAGG-3′) to the 5′-end of all second-round reverse primers. These tail sequences provide two unique anchoring sites for direct sequencing from cleaned PCR products. In our pilot experiments, adding the tail sequences to primers did not affect the efficiency of the second-round PCR.
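To make the notions of primer degeneracy and tailing concrete, the following Python sketch counts the number of distinct sequences encoded by a degenerate primer and prepends the two tails given above. The tail sequences come from the text; the function names and the example primer are hypothetical and do not correspond to any primer in the toolkit.

```python
FORWARD_TAIL = "AGGGTTTTCCCAGTCACGAC"  # tail on second-round forward primers (from the text)
REVERSE_TAIL = "AGATAACAATTTCACACAGG"  # tail on second-round reverse primers (from the text)

# Number of bases represented by each IUPAC nucleotide code.
IUPAC_CHOICES = {
    "A": 1, "C": 1, "G": 1, "T": 1,
    "R": 2, "Y": 2, "S": 2, "W": 2, "K": 2, "M": 2,
    "B": 3, "D": 3, "H": 3, "V": 3,
    "N": 4,
}

def degeneracy(primer):
    """Number of distinct sequences encoded by a degenerate primer."""
    total = 1
    for base in primer.upper():
        total *= IUPAC_CHOICES[base]
    return total

def add_tail(primer, direction):
    """Prepend the sequencing tail to the 5' end of a second-round primer."""
    if direction == "forward":
        return FORWARD_TAIL + primer
    if direction == "reverse":
        return REVERSE_TAIL + primer
    raise ValueError("direction must be 'forward' or 'reverse'")

example_primer = "GGNTAYATHGARTGG"  # hypothetical degenerate primer
example_degeneracy = degeneracy(example_primer)  # N*Y*H*R = 4*2*3*2
tailed_forward = add_tail(example_primer, "forward")
```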

Experimental Testing for Candidate Markers in 16 Jawed Vertebrates

To test the experimental performance of our newly designed NPCL markers, we selected 16 taxa representing nine major jawed vertebrate lineages: Chondrichthyes (Sphyrna lewini), Actinopterygii (Lepisosteus oculatus and Pangasius sutchi), Dipnoi (Protopterus annectens), Lissamphibia (Ichthyophis bannanicus, Batrachuperus yenyuanensis, and Rana nigromaculata), Mammalia (Mus musculus and Sus scrofa domestica), Testudines (Trionyx sinensis and Podocnemis unifilis), Aves (Struthio camelus and Zosterops japonica), Crocodylia (Crocodylus siamensis), and Squamata (Hemidactylus bowringii and Naja naja atra). Total genomic DNA was extracted from ethanol-preserved tissues (liver or muscle) using the standard salt extraction protocol. All extracted genomic DNAs were diluted to a concentration of 50 ng/µl with 1× TE and stored at −20 °C before PCR amplification.

All 120 NPCL markers were tested with a two-round PCR strategy (nested PCR). The first-round PCR was performed in a 25 µl reaction containing 1–2 µl template DNA (50–100 ng), with final concentrations of 1× PCR buffer, 200 µM dNTP, 400 nM of each forward and reverse first-round primer, and 1.25 U Taq polymerase (TransTaq High Fidelity; TransGen, Beijing). The cycling conditions of the first-round PCR were as follows: an initial denaturation step of 4 min at 94 °C; followed by 35 cycles of a 45 s denaturation at 94 °C, a 40 s annealing at 45 °C, and a 2 min extension at 72 °C; followed by a final 10 min extension at 72 °C. The second-round PCR was also performed in a 25 µl reaction containing 1 µl of the first-round PCR product (without dilution) and final concentrations of 1× PCR buffer, 200 µM dNTP, 400 nM of each forward and reverse second-round primer, and 1.25 U Taq polymerase. The cycling conditions of the second-round PCR were as follows: an initial denaturation step of 4 min at 94 °C; followed by 35 cycles of a 45 s denaturation at 94 °C, a 40 s annealing at 50 °C, and a 90 s extension at 72 °C; followed by a final 10 min extension at 72 °C.
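The two cycling programs can be encoded as data and their programmed hold times summed, as in the sketch below. The step durations are those given in the text; the function and variable names are hypothetical, and the totals ignore temperature ramping, which depends on the thermocycler.

```python
def program_seconds(initial_denat_s, cycles, step_seconds, final_ext_s):
    """Total programmed hold time of a PCR program, ignoring ramp times."""
    return initial_denat_s + cycles * sum(step_seconds) + final_ext_s

# First round: 94 C 4 min; 35 x (94 C 45 s, 45 C 40 s, 72 C 2 min); 72 C 10 min.
FIRST_ROUND = dict(initial_denat_s=4 * 60, cycles=35,
                   step_seconds=(45, 40, 120), final_ext_s=10 * 60)

# Second round: 94 C 4 min; 35 x (94 C 45 s, 50 C 40 s, 72 C 90 s); 72 C 10 min.
SECOND_ROUND = dict(initial_denat_s=4 * 60, cycles=35,
                    step_seconds=(45, 40, 90), final_ext_s=10 * 60)

first_s = program_seconds(**FIRST_ROUND)
second_s = program_seconds(**SECOND_ROUND)
total_hours = (first_s + second_s) / 3600  # a little over 4 h of hold time
```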

One microliter of each second-round PCR product was analyzed on a 1.0% TAE agarose gel. An NPCL marker was considered successful if more than 8 of the 16 tested taxa produced target amplicon bands. On this basis, 102 of the 120 tested NPCL markers were successful. The nested PCR primers for the 102 NPCL markers can be found in supplementary table S1, Supplementary Material online. If the PCR products contained significant nonspecific amplicon bands (normally <10%), they needed further processing, for example, standard gel cutting or cloning. If the PCR reactions produced single amplicon bands (normally >90%), they were cleaned with ExoFAP treatment: 2 U ExoI and 0.4 U FastAP (all Fermentas) were added to the PCR tube and incubated for 30 min at 37 °C and 15 min at 80 °C. The cleaned PCR reactions can be used directly as templates for Sanger sequencing. By design, all PCR fragments can be sequenced from both ends with the two universal sequencing primers Seq_F (5′-AGGGTTTTCCCAGTCACGAC-3′) and Seq_R (5′-AGATAACAATTTCACACAGG-3′). A typical Sanger sequencing reaction in our laboratory consumes 0.5 µl BigDye and 1 µl cleaned PCR product. The primer design strategy, the laboratory protocol for the nested PCR method, and the pretreatment of PCR products before Sanger sequencing are illustrated in figure 6.
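The success criterion for a marker (target bands in more than 8 of the 16 tested taxa) is easy to express in code. The sketch below is illustrative only: the function names and the per-marker counts are hypothetical, whereas the threshold and the 102-of-120 outcome are taken from the text.

```python
def marker_successful(n_taxa_amplified, threshold=8):
    """A marker counts as successful if MORE than `threshold` taxa amplified."""
    return n_taxa_amplified > threshold

def success_fraction(n_successful, n_tested):
    """Fraction of tested markers that met the criterion."""
    return n_successful / n_tested

# Hypothetical counts of taxa (out of 16) that produced the target band.
hypothetical_results = {"marker_A": 16, "marker_B": 9, "marker_C": 8, "marker_D": 3}
passed = [name for name, n in hypothetical_results.items() if marker_successful(n)]

# Outcome reported in the text: 102 of 120 candidate markers were successful.
toolkit_fraction = success_fraction(102, 120)
```

Note that a marker amplifying in exactly 8 taxa does not pass, because the criterion is "more than 8".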

Calculation of Relative Evolutionary Rate of 102 NPCLs

The rate multipliers (m) across partitions estimated in MrBayes 3.2 (Ronquist et al. 2012) were used as relative evolutionary rates. To calculate these parameters, alignments for each NPCL were prepared for 12 species: Homo sapiens, Macaca mulatta, Mus musculus, Rattus norvegicus, Gallus gallus, Meleagris gallopavo, Chrysemys picta bellii, Anolis carolinensis, Silurana tropicalis, Tetraodon nigroviridis, Takifugu rubripes, and Danio rerio. Because genome data are available for the 12 species, we did not generate any new data. The 102 NPCL alignments were then combined and subjected to MrBayes analyses partitioned by gene. Each gene was assigned a separate GTR + Γ + I model, and all model parameters were unlinked. Two Markov chain Monte Carlo (MCMC) runs were performed with one cold chain and three heated chains (temperature set to 0.1) for 50 million generations and sampled every 1,000 generations. The rate multiplier for each gene was estimated using Tracer version 1.4 after discarding the first 50% of the generations. All evolutionary rates were normalized by dividing by the maximum value of the obtained rates.
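The final normalization step (dividing each rate multiplier by the largest one) can be sketched as follows. The gene names and multiplier values are made up for illustration; only the normalization rule itself comes from the text.

```python
def normalize_rates(rate_multipliers):
    """Scale rate multipliers so that the fastest gene has a relative rate of 1."""
    max_rate = max(rate_multipliers.values())
    return {gene: rate / max_rate for gene, rate in rate_multipliers.items()}

# Hypothetical per-gene rate multipliers (e.g., posterior means).
example_multipliers = {"geneA": 0.5, "geneB": 2.0, "geneC": 1.0}
relative_rates = normalize_rates(example_multipliers)
```

After normalization, every relative rate lies in the interval (0, 1], which makes markers directly comparable across the toolkit.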

Gene and Taxon Sampling for Investigating Higher Level Salamander Relationships

To test the utility of our NPCL toolkit in a real case, we selected 19 salamander taxa representing all 10 salamander families and 9 outgroup taxa to investigate the family-level relationships of salamanders (supplementary table S2, Supplementary Material online). For gene sampling, we randomly selected 30 NPCL markers whose PCR success rates were more than 90% in the 16 previously tested vertebrates. Among the 840 target sequences (30 markers for 28 taxa), 201 were available in public databases (NCBI, UCSC, and ENSEMBL), whereas the remaining 639 sequences needed to be generated de novo. The experimental procedure was as described earlier. All obtained sequences were examined by checking for the presence of premature stop codons (indicating pseudogenes) and by BlastX searches against the nonredundant protein sequences (nr) database to confirm that they were our target genes. The NPCLs, species, and accession numbers for the newly obtained sequences are listed in supplementary table S2, Supplementary Material online.
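A check for premature stop codons can be sketched as below. This is an illustrative implementation rather than the authors' procedure: the function name is hypothetical, the standard nuclear genetic code is assumed, the input is assumed to be an unaligned (gap-free) exon fragment, and any in-frame stop is treated as premature because the fragments lie inside exons.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard nuclear genetic code

def has_premature_stop(sequence, frame=0):
    """Return True if an in-frame stop codon occurs in a gap-free coding fragment.

    frame: offset (0, 1, or 2) of the first complete codon in the fragment.
    """
    seq = sequence.upper()
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return any(codon in STOP_CODONS for codon in codons)

# Toy sequences (hypothetical).
intact = has_premature_stop("ATGGCCTTAGGC")         # ATG GCC TTA GGC
interrupted = has_premature_stop("ATGGCCTAAGGC")    # ATG GCC TAA GGC
shifted = has_premature_stop("AATGTAAGG", frame=1)  # A | ATG TAA | GG
```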

Phylogenetic Analyses

Alignments of all 30 NPCL markers were conducted using the G-INS-i method in MAFFT (Katoh et al. 2005) under the default settings, according to their translated amino acid sequences, and then refined by eye. All 30 refined alignments were combined into a concatenated data set (27,834 bp).

For the concatenated data set, we manually defined five partitioning strategies: 2 partitions (one for codon positions 1 and 2 and one for codon position 3); 3 partitions (one partition for each codon position); 30 partitions (one partition for each gene); 60 partitions (one for codon positions 1 + 2 and one for codon position 3 in each of the 30 genes); and 90 partitions (one for each codon position in each of the 30 genes). Comparisons of the five partitioning strategies and selection of the corresponding nucleotide substitution models were conducted under the Bayesian information criterion implemented in PartitionFinder (Lanfear et al. 2012). The 3-partition scheme (one partition for each codon position) was chosen as the best-fitting partitioning strategy, and all 3 partitions favored the GTR + Γ + I model.
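For illustration, the selected 3-partition scheme could be written as a RAxML-style partition file, in which `\3` denotes every third site. The sketch below generates those lines for the 27,834 bp concatenated alignment; the function name and the partition labels are hypothetical, and the exact file used in the study is not given in the text.

```python
def codon_partitions(alignment_length):
    """Return RAxML-style partition lines, one per codon position."""
    if alignment_length % 3 != 0:
        raise ValueError("a codon-aligned data set should have a length divisible by 3")
    return [f"DNA, codon{pos} = {pos}-{alignment_length}\\3" for pos in (1, 2, 3)]

partition_lines = codon_partitions(27834)
partition_file_text = "\n".join(partition_lines)
```

Written to a file, these three lines assign sites 1, 4, 7, ... to the first partition, sites 2, 5, 8, ... to the second, and sites 3, 6, 9, ... to the third.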

The concatenated data set was analyzed separately with both ML and Bayesian inference (BI) methods under the 3-partition scheme. Partitioned ML analyses were implemented using RAxML version 7.2.6 (Stamatakis 2006) with the GTR + Γ + I model assigned to each partition. A search that combined 100 separate ML searches was applied to find the optimal tree, and branch support for each node was evaluated with 500 standard bootstrapping replicates (-f d -b 500 option) implemented in RAxML. The partitioned BI was conducted using MrBayes 3.2 (Ronquist et al. 2012). All model parameters were unlinked. Two MCMC runs were performed with one cold chain and three heated chains (temperature set to 0.1) for 60 million generations and sampled every 1,000 generations. Chain stationarity was visualized by plotting −ln L against the generation number using Tracer version 1.4, and the first 50% of generations were discarded. Topologies and posterior probabilities were estimated from the remaining generations. The two runs for each analysis were compared for congruence.

We also performed Bayesian phylogenetic analyses under a mixture model, CAT + GTR + Γ4, in PhyloBayes 3.3 (Lartillot et al. 2009) with two independent MCMC runs. Each run was performed for 10,000 cycles and sampled every cycle. Stationarity was considered reached when the largest discrepancy (maxdiff) between the two independent runs was less than 0.1. The first 5,000 trees in each MCMC run were discarded. The remaining 10,000 trees of the two runs were sampled every 5 trees to generate a 50% majority-rule posterior consensus tree.
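The number of trees that enter each consensus follows directly from the run lengths, burn-in, and thinning described above. The small helper below (hypothetical name; approximate in that it ignores the extra sample some programs store for generation 0) reproduces that arithmetic.

```python
def trees_in_consensus(n_saved_per_run, n_burnin_per_run, n_runs=2, thin=1):
    """Number of trees left after burn-in removal and thinning, pooled over runs."""
    kept_per_run = n_saved_per_run - n_burnin_per_run
    thinned_per_run = len(range(0, kept_per_run, thin))
    return n_runs * thinned_per_run

# PhyloBayes: 10,000 saved trees per run, 5,000 discarded, every 5th tree kept.
phylobayes_trees = trees_in_consensus(10_000, 5_000, n_runs=2, thin=5)

# MrBayes: 60 million generations sampled every 1,000, first 50% discarded.
mrbayes_saved = 60_000_000 // 1_000
mrbayes_trees = trees_in_consensus(mrbayes_saved, mrbayes_saved // 2, n_runs=2)
```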

Species tree estimation was conducted using the pseudo-ML approach in the program MP-EST (Liu et al. 2010) under the coalescent model. Briefly, the gene trees for the 30 NPCL markers were reconstructed with ML under the GTR + Γ8 model using PHYML 3.0 (Guindon et al. 2010). The resulting 30 gene trees were rooted with an outgroup (Chrysemys picta bellii) and used to generate a species tree in MP-EST. The robustness of the species tree was evaluated with 500 nonparametric bootstrap replicates.

Alternative phylogenetic hypotheses were tested based on the 30-gene data set. We first calculated sitewise log likelihoods of the alternative trees using RAxML (-f g) under the GTR + Γ + I model. The site log-likelihood file was then used to estimate P values for each alternative tree in the CONSEL program (Shimodaira and Hasegawa 2001) using the Kishino–Hasegawa test (KH) (Kishino and Hasegawa 1989), the approximately unbiased test (AU) (Shimodaira 2002), and the RELL bootstrap proportion test (BP).

Supplementary Material

Supplementary tables S1 and S2 and figures S1 and S2 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).

Acknowledgments

The authors are grateful to David Wake and Carol Spencer of the Museum of Vertebrate Zoology at the University of California, Berkeley, for tissue loans. David Wake provided many useful comments on the manuscript. This work was supported by National Natural Science Foundation of China grants (31172075 and 30900136) and the National Science Fund for Excellent Young Scholars (no. pending).

References

Binladen J, Gilbert MTP, Bollback JP, Panitz F, Bendixen C, Nielsen R, Willerslev E. 2007. The use of coded PCR primers enables high-throughput sequencing of multiple homolog amplification products by 454 parallel sequencing. PLoS One 2:e197.

Crawford NG, Faircloth BC, McCormack JE, Brumfield RT, Winker K, Glenn TC. 2012. More than 1000 ultraconserved elements provide evidence that turtles are the sister group to archosaurs. Biol Lett 8:783–786.

Delsuc F, Brinkmann H, Philippe H. 2005. Phylogenomics and the reconstruction of the tree of life. Nat Rev Genet 6:361–375.

Duellman WE, Trueb L. 1994. Biology of amphibians. Baltimore (MD): Johns Hopkins University Press.

Dunn CW, Hejnol A, Matus DQ, et al. (18 co-authors). 2008. Broad phylogenomic sampling improves resolution of the animal tree of life. Nature 452:745–749.

Faircloth BC, Glenn TC. 2012. Not all sequence tags are created equal: designing and validating sequence identification tags robust to indels. PLoS One 7:e42543.

Faircloth BC, McCormack JE, Crawford NG, Brumfield RT, Glenn TC. 2012. Ultraconserved elements anchor thousands of genetic markers for target enrichment spanning multiple evolutionary timescales. Syst Biol 61:717–726.

Fong JJ, Brown JM, Fujita MK, Boussau B. 2012. A phylogenomic approach to vertebrate phylogeny supports a turtle-archosaur affinity and a possible paraphyletic Lissamphibia. PLoS One 7:e48990.

Fong JJ, Fujita MK. 2011. Evaluating phylogenetic informativeness and data-type usage for new protein-coding genes across Vertebrata. Mol Phylogenet Evol 61:300–307.

Frost DR, Grant T, Faivovich J, et al. (18 co-authors). 2006. The amphibian tree of life. Bull Am Mus Nat Hist 297:8–370.

Shen et al. doi:10.1093/molbev/mst122 MBE (downloaded from http://mbe.oxfordjournals.org/ at Vanderbilt University - Massey Law Library on March 18, 2015)

Gao KQ, Shubin NH. 2001. Late Jurassic salamanders from northern China. Nature 410:574–577.

Guindon S, Dufayard J-F, Lefort V, Anisimova M, Hordijk W, Gascuel O. 2010. New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Syst Biol. 59:307–321.

Hugall AF, Foster R, Lee MSY. 2007. Calibration choice, rate smoothing, and the pattern of tetrapod diversification according to the long nuclear gene RAG-1. Syst Biol. 56:543–563.

Inoue JG, Miya M, Lam K, Tay BH, Danks JA, Bell J, Walker TI, Venkatesh B. 2010. Evolutionary origin and phylogeny of the modern holocephalans (Chondrichthyes: Chimaeriformes): a mitogenomic perspective. Mol Biol Evol. 27:2576–2586.

Katoh K, Kuma K, Toh H, Miyata T. 2005. MAFFT version 5: improvement in accuracy of multiple sequence alignment. Nucleic Acids Res. 33:511–518.

Kishino H, Hasegawa M. 1989. Evaluation of the maximum likelihood estimate of the evolutionary tree topologies from DNA sequence data, and the branching order in Hominoidea. J Mol Evol. 29:170–179.

Künstner A, Wolf JBW, Backström N, et al. (13 co-authors). 2010. Comparative genomics based on massive parallel transcriptome sequencing reveals patterns of substitution and selection across 10 bird species. Mol Ecol. 19:266–276.

Lanfear R, Calcott B, Ho SYW, Guindon S. 2012. Partitionfinder: combined selection of partitioning schemes and substitution models for phylogenetic analyses. Mol Biol Evol. 29:1695–1701.

Lartillot N, Lepage T, Blanquart S. 2009. PhyloBayes 3: a Bayesian software package for phylogenetic reconstruction and molecular dating. Bioinformatics 25:2286–2288.

Lemmon AR, Brown JM, Stanger-Hall K, Lemmon EM. 2009. The effect of ambiguous data on phylogenetic estimates obtained by maximum likelihood and Bayesian inference. Syst Biol. 58:130–145.

Lemmon AR, Emme SA, Lemmon EM. 2012. Anchored hybrid enrichment for massively high-throughput phylogenomics. Syst Biol. 61:727–744.

Li C, Ortí G, Zhang G, Lu G. 2007. A practical approach to phylogenomics: the phylogeny of ray-finned fish (Actinopterygii) as a case study. BMC Evol Biol. 7:44.

Li C, Riethoven JJ, Naylor GJ. 2012. EvolMarkers: a database for mining exon and intron markers for evolution, ecology and conservation studies. Mol Ecol Resour. 12:967–971.

Liu L, Yu L, Edwards SV. 2010. A maximum pseudo-likelihood approach for estimating species trees under the coalescent model. BMC Evol Biol. 10:302.

McCormack JE, Faircloth BC, Crawford NG, Gowaty PA, Brumfield RT, Glenn TC. 2012. Ultraconserved elements are novel phylogenomic markers that resolve placental mammal phylogeny when combined with species-tree analysis. Genome Res. 22:746–754.

McCormack JE, Hird SM, Zellmer AJ, Carstens BC, Brumfield RT. 2013. Applications of next-generation sequencing to phylogeography and phylogenetics. Mol Phylogenet Evol. 66:526–538.

Meyer A, Van de Peer Y. 2005. From 2R to 3R: evidence for a fish-specific genome duplication (FSGD). Bioessays 27:937–945.

Meyer M, Stenzel U, Hofreiter M. 2008. Parallel tagged sequencing on the 454 platform. Nat Protoc. 3:267–278.

Murphy WJ, Eizirik E, Johnson WE, Zhang YP, Ryder OA, O'Brien SJ. 2001. Molecular phylogenetics and the origins of placental mammals. Nature 409:614–618.

Philippe H, Derelle R, Lopez P, et al. (20 co-authors). 2009. Phylogenomics revives traditional views on deep animal relationships. Curr Biol. 19:706–712.

Philippe H, Telford M. 2006. Large-scale sequencing and the new animal phylogeny. Trends Ecol Evol. 21:614–620.

Portik DM, Wood PL, Grismer JL, Stanley EL, Jackman TR. 2011. Identification of 104 rapidly-evolving nuclear protein-coding markers for amplification across scaled reptiles using genomic resources. Conserv Genet Resour. 4:1–10.

Pyron RA, Wiens JJ. 2011. A large-scale phylogeny of Amphibia including over 2800 species, and a revised classification of extant frogs, salamanders, and caecilians. Mol Phylogenet Evol. 61:543–583.

Roelants K, Gower DJ, Wilkinson M, Loader SP, Biju SD, Guillaume K, Moriau L, Bossuyt F. 2007. Global patterns of diversification in the history of modern amphibians. Proc Natl Acad Sci U S A. 104:887–892.

Rokas A, Williams BL, King N, Carroll SB. 2003. Genome-scale approaches to resolving incongruence in molecular phylogenies. Nature 425:798–804.

Ronquist F, Teslenko M, Van der Mark P, Ayres DL, Darling A, Höhna S, Larget B, Liu L, Suchard MA, Huelsenbeck JP. 2012. MrBayes 3.2: efficient Bayesian phylogenetic inference and model choice across a large model space. Syst Biol. 61:539–542.

Roure B, Baurain D, Philippe H. 2013. Impact of missing data on phylogenies inferred from empirical phylogenomic data sets. Mol Biol Evol. 30:197–214.

San Mauro D. 2010. A multilocus timescale for the origin of extant amphibians. Mol Phylogenet Evol. 56:554–561.

San Mauro D, Vences M, Alcobendas M, Zardoya R, Meyer A. 2005. Initial diversification of living amphibians predated the breakup of Pangaea. Am Nat. 165:590–599.

Shen XX, Liang D, Wen JZ, Zhang P. 2011. Multiple genome alignments facilitate development of NPCL markers: a case study of tetrapod phylogeny focusing on the position of turtles. Mol Biol Evol. 28:3237–3252.

Shen XX, Liang D, Zhang P. 2012. The development of three long universal nuclear protein-coding locus markers and their application to osteichthyan phylogenetics with nested PCR. PLoS One 7:e39256.

Shimodaira H. 2002. An approximately unbiased test of phylogenetic tree selection. Syst Biol. 51:492–508.

Shimodaira H, Hasegawa M. 2001. CONSEL: for assessing the confidence of phylogenetic tree selection. Bioinformatics 17:1246–1247.

Song S, Liu L, Edwards SV, Wu S. 2012. Resolving conflict in eutherian mammal phylogeny using phylogenomics and the multispecies coalescent model. Proc Natl Acad Sci U S A. 109:14942–14947.

Spinks PQ, Thomson RC, Barley AJ, Newman CE, Shaffer HB. 2010. Testing avian, squamate, and mammalian nuclear markers for cross amplification in turtles. Conserv Genet Resour. 2:127–129.

Spinks PQ, Thomson RC, Lovely GA, Shaffer HB. 2009. Assessing what is needed to resolve a molecular phylogeny: simulations and empirical data from Emydid turtles. BMC Evol Biol. 9:56.

Stamatakis A. 2006. RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics 22:2688–2690.

Tewhey R, Warner JB, Nakano M, Libby B, Medkova M, David PH, Kotsopoulos SK, Samuels ML, Hutchison JB, Larson JW. 2009. Microdroplet-based PCR enrichment for large-scale targeted sequencing. Nat Biotechnol. 27:1025–1031.

Thomson RC, Shedlock AM, Edwards SV, Shaffer HB. 2008. Developing markers for multilocus phylogenetics in non-model organisms: a test case with turtles. Mol Phylogenet Evol. 49:514–525.

Thomson RC, Wang IJ, Johnson JR. 2010. Genome-enabled development of DNA markers for ecology, evolution and conservation. Mol Ecol. 19:2184–2195.

Townsend TM, Alegre RE, Kelley ST, Wiens JJ, Reeder TW. 2008. Rapid development of multiple nuclear loci for phylogenetic analysis using genomic resources: an example from squamate reptiles. Mol Phylogenet Evol. 47:129–142.

Weisrock DW, Harmon LJ, Larson A. 2005. Resolving deep phylogenetic relationships in salamanders: analyses of mitochondrial and nuclear genomic DNA. Syst Biol. 54:758–777.

Wiens J, Bonett R, Chippindale P. 2005. Ontogeny discombobulates phylogeny: paedomorphosis and higher-level salamander relationships. Syst Biol. 54:91–110.

Wright TF, Schirtzinger EE, Matsumoto T, et al. (11 co-authors). 2008. A multilocus molecular phylogeny of the parrots (Psittaciformes): support for a Gondwanan origin during the Cretaceous. Mol Biol Evol. 25:2141–2156.

Zhang P, Wake DB. 2009. Higher-level salamander relationships and divergence dates inferred from complete mitochondrial genomes. Mol Phylogenet Evol. 53:492–508.

Zhang P, Zhou H, Chen YQ, Liu YF, Qu LH. 2005. Mitogenomic perspectives on the origin and phylogeny of living amphibians. Syst Biol. 54:391–400.

Zhou XM, Xu SX, Zhang P, Yang G. 2011. Developing a series of conservative anchor markers and their application to phylogenomics of Laurasiatherian mammals. Mol Ecol Resour. 11:134–140.


other distantly related groups is usually difficult. For example, Spinks et al. (2010) collected 120 nuclear markers from avian, squamate, and mammalian phylogenetic studies and evaluated their PCR performance in turtles. They found that only eight nuclear markers successfully produced single expected bands across 13 tested turtle species. In another case, Fong and Fujita (2011) developed 75 nuclear markers for vertebrate phylogenetics, but approximately 60% of the target fragments could not be obtained in three test species (two reptiles and one lissamphibian). Therefore, although the nested PCR method introduced here requires an additional PCR reaction, the extra work is still worthwhile.

In PCR-based phylogenetic projects, even when the PCR reactions are successful, the products often contain significant nonspecific amplicons. Such products require gel purification and cloning, which take much more time than the PCR reaction itself. Our NPCL toolkit is specifically designed to solve this problem, so that normally over 90% of PCR reactions produce strong and single expected bands. Moreover, most of the primers used to date for nuclear marker sets are degenerate and thus are not suitable for direct sequencing of PCR products. Benefiting from our nested PCR strategy, we introduce anchoring sequences to the ends of PCR fragments while maintaining PCR efficiency. These anchoring sequences bring the added benefit that two specific sequencing primers (Seq_F and Seq_R) can be used in all Sanger sequencing reactions.

One additional feature of our NPCL toolkit is that the average length of the NPCLs within it is 1,050 bp, a length that can easily be amplified in one PCR reaction and sequenced in both directions to allow efficient use of resources. In contrast, the average marker lengths are 920 bp for 10 NPCLs in Li et al. (2007), 760 bp for 26 NPCLs in Townsend et al. (2008), 873 bp for 22 NPCLs in Shen et al. (2011), and 470 bp for 75 NPCLs in Fong and Fujita (2011). Longer markers provide more sites than shorter ones for equivalent money and time. This feature makes our NPCL toolkit more cost-effective than previously developed nuclear marker sets.

Phylogenetic Utility

The vertebrate NPCL toolkit we developed here shows great promise in terms of phylogenetic utility. A remarkable feature of our NPCL toolkit is that it provides 102 NPCLs with a broad range of evolutionary rates. In our demonstration, we used 30 NPCLs to resolve a family-level salamander phylogeny using both traditional concatenation analyses and a more promising species-tree analysis. However, this example does not mean that our toolkit performs well only on deep-timescale questions. Our ongoing study using this toolkit to resolve the relationships within Plethodontidae, a rapidly radiating group of salamanders, suggests that the toolkit also performs well in resolving species-level phylogenies. For many vertebrate groups in which applicable nuclear markers are limited, such as some teleosts, frogs, and salamanders, our NPCL toolkit can provide a one-stop solution for phylogenetic studies from the family level to the species level. Even for those groups in which specific nuclear marker sets have been developed, our toolkit is still worth trying, as many more loci can be easily obtained that may resolve some difficult branches.

The Toolkit Is a Good Addition to Sequence Capture Approaches

Recently, sequence capture approaches have been applied to vertebrate phylogenomics (Crawford et al. 2012; Faircloth et al. 2012; Lemmon et al. 2012; McCormack et al. 2012). These approaches begin with the selective capture of genomic regions. Briefly, fragmented gDNA is hybridized to DNA or RNA probes, either on an array or in solution. Nontargeted DNA is then washed away, and the targeted DNA is sequenced through NGS. The most promising feature of the sequence capture approach is that it can simultaneously produce hundreds to thousands of loci for tens of individuals within a relatively short time. Therefore, the sequence capture approach is considered to be much more cost-effective than the PCR-based method. According to the calculation of Lemmon et al. (2012), for a 100 taxa × 500 loci project, the cost of the sequence capture method is just 1–3.5% of that of the PCR-based method.

However, the sequence capture approach is currently too challenging for most phylogenetic researchers. Typical NGS runs (454 or Illumina) used by the sequence capture method generate 1,000,000–2,000,000,000 sequences. Storing and processing these NGS data require significant computer memory, hardware upgrades, and bioinformatic programming skills, which are unfamiliar to most phylogenetic researchers. Moreover, phylogenetic reconstruction assumes that orthologous genes are being analyzed across species. For the PCR-based method, the detection of paralogous genes is relatively straightforward. However, in the sequence capture method, the captured genomic regions comprise short conserved cores (probe regions) and long unconserved flanking sequences. Because paralogy cannot be detected until after the data are aligned, those unalignable sequences will make the detection of paralogy more difficult.

FIG. 5. The effect of increasing the number of nuclear loci on resolving the basal split within salamanders. Each data point represents the mean of support values estimated from 30 randomly sampled subsets. The dashed line indicates the threshold of 95% bootstrap support values. The statistical plots show that the minimum number of nuclear loci needed to robustly resolve the basal split within salamanders is 25. [Plot: bootstrap support (%), 0–100, against number of genes, 5–30; series: concatenation analyses and species tree analyses.]
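The subsampling design described in the Fig. 5 caption (30 randomly sampled subsets at each number of genes) can be sketched as follows. This is only an illustration of the sampling step: the function names, the fixed random seed, and the choice of subset sizes 5 to 30 in steps of 5 (read from the figure axis) are assumptions, and the tree inference that would yield each support value is not shown.

```python
import random

def sample_locus_subsets(n_loci=30, subset_size=5, n_replicates=30, seed=0):
    """Draw n_replicates random subsets of locus indices (1-based).

    Loci are sampled without replacement within each subset.
    The seed is fixed only to make the sketch reproducible.
    """
    rng = random.Random(seed)
    loci = list(range(1, n_loci + 1))
    return [sorted(rng.sample(loci, subset_size)) for _ in range(n_replicates)]

def mean_support(support_values):
    """Mean of the support values obtained from the replicate subsets."""
    return sum(support_values) / len(support_values)

# Hypothetical design: 30 replicate subsets for each number of genes.
subset_sizes = [5, 10, 15, 20, 25, 30]
design = {k: sample_locus_subsets(subset_size=k) for k in subset_sizes}
```

Each subset would then be analyzed separately, and `mean_support` applied to the 30 resulting support values for the node of interest.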

In fact, not every phylogenetic project will use more than 500 loci, as the sequence capture method normally does. Based on both empirical and simulation data, 20–50 loci are generally sufficient to answer many phylogenetic questions (Rokas et al. 2003; Spinks et al. 2009). This is also the number of loci that most phylogenetic studies will use. In such a situation, adopting the sequence capture method is not cost-effective because researchers need to use relatively expensive NGS sequencing and spend time learning new experimental techniques and carrying out sophisticated bioinformatic processing. Our NPCL toolkit is specially designed for such medium-scale phylogenetic projects using approximately 50 loci. This number of loci can easily be obtained from our 102 NPCLs. Because more than 90% of the PCR reactions generated by our toolkit can be directly sequenced, the average cost for one locus per sample is rather low. In our laboratory, generating one new sequence typically costs US$3 (without considering labor).

In addition, researchers sometimes have only tiny amounts of DNA, but they wish to perform a multilocus phylogenetic analysis. In such a situation, the sequence capture method is difficult to implement because it normally requires DNA at the microgram level (Lemmon et al. 2012). Our NPCL toolkit can fill the gap here. Benefiting from the use of the nested PCR strategy, the sensitivity of PCR reactions in our method is extremely high. In many test experiments in our laboratory, the toolkit and protocol could produce target bands with only 5–10 ng of DNA.

Our NPCL toolkit is an alternative to the sequence capture method for the everyday work of phylogenetic researchers. Which method to choose depends on two major drivers: the amount of DNA and the expected number of loci. When DNA is limited, the better solution may be PCR; otherwise, sequence capture also works. Taking into account the money and time the two methods require, we speculate that the economic transition point from PCR to sequence capture is at approximately 100 loci. That assessment is why our toolkit includes 102 NPCL markers. Our proposal is that when using ≤100 loci, one can try our NPCL toolkit; when using >100 loci, sequence capture should be used.
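The two drivers named above can be written as a small decision rule. The sketch below is illustrative only: the function names are hypothetical, the 1,000 ng cutoff is an assumed stand-in for "DNA at the microgram level", and the handling of exactly 100 loci is an assumption. The cost helper uses the US$3 per sequence figure quoted for Sanger sequencing, without labor.

```python
def suggest_method(dna_ng, n_loci, capture_min_dna_ng=1000.0, loci_transition=100):
    """Suggest a data-collection method from DNA amount and number of loci.

    capture_min_dna_ng is an assumed cutoff for microgram-level DNA;
    loci_transition is the speculated economic transition point (about 100 loci).
    """
    if dna_ng < capture_min_dna_ng:
        # Limited DNA: nested PCR was reported to work with only 5-10 ng.
        return "nested PCR"
    if n_loci <= loci_transition:
        return "nested PCR"
    return "sequence capture"

def sanger_cost_usd(n_taxa, n_loci, cost_per_sequence=3.0):
    """Rough consumables cost if every taxon-by-locus sequence is generated de novo."""
    return n_taxa * n_loci * cost_per_sequence

print(suggest_method(dna_ng=10, n_loci=500))
print(suggest_method(dna_ng=2000, n_loci=500))
print(sanger_cost_usd(n_taxa=100, n_loci=50))
```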

Future Directions

In this study, we used multiple genome alignments deposited in the University of California, Santa Cruz (UCSC) genome browser to identify long and conserved exons across jawed vertebrates. Benefiting from the use of a nested PCR strategy, the experimental performance of the developed NPCLs indicated that they are highly stable in all major jawed vertebrate groups. Recently, a database for mining exon and intron markers, called EvolMarkers, has been built by Li et al. (2012). Careful investigation of this database may identify many conserved exons within nonvertebrates, whose interrelationships are currently more problematic than those of vertebrates. Because the nonvertebrates constitute many distantly related groups, it may be impossible to develop a single set of PCR primers for all nonvertebrates. However, following a similar marker development strategy, multiple NPCL toolkits could be constructed for various groups of nonvertebrates, such as arthropods, echinoderms, and molluscs. In addition, because introns are flanked by conserved exons, the idea of using nested PCRs for marker development could also be applied to the development of EPIC (exon-primed intron crossing) markers, which are more suitable for shallow-scale phylogenetic or phylogeographic projects.

Despite the benefits of our proposed method, it must be recognized that when handling large-scale projects, such as 200 taxa × 100 loci, the use of our toolkit and Sanger sequencing will still require significant cost, time, and labor. An alternative solution is to use NGS to replace Sanger sequencing. Recently, 454 NGS technology has been applied to sequence targeted gene regions from a pool of PCR products from different specimens (Binladen et al. 2007; Meyer et al. 2008). In such experiments, specific tagging sequences must be added to amplicons by either PCR (Binladen et al. 2007) or blunt-end ligation (Meyer et al. 2008). Therefore, if the tailing sequences of the second-round PCR primers in our NPCL toolkit are replaced by tagging sequences (for tag design, see Faircloth and Glenn 2012), all PCR products can be pooled together and sequenced with 454 NGS, which will greatly reduce the money and time cost compared with Sanger sequencing. However, parallel tagged sequencing via NGS does not circumvent the process of PCR for each individual at each locus, which may be the most onerous part of a large-scale phylogenomic project. Some promising new technologies may help to solve this problem, such as microdroplet PCR (Tewhey et al. 2009), where millions of individual PCR reactions are performed in picoliter-scale droplets simultaneously, and the 96.96 Dynamic Array by Fluidigm, which allows 96 primer combinations to be used on 96 samples (9,216 total PCR reactions) on a single PCR plate. However, there has been little research on applying NGS and new high-throughput PCR technologies to phylogenomics, so their ease of use and cost-effectiveness still need to be explored.

Summary

In conclusion, we have developed an improved method for rapidly amplifying and sequencing NPCLs that has proven to be useful and effective for molecular phylogenetic studies of vertebrates. The newly developed toolkit provides an attractive alternative to available methods for vertebrate phylogenomics.

Materials and Methods

Development of NPCL and Primer Design

Our previous study showed that nested PCR is overwhelmingly more effective than conventional PCR for obtaining target amplicons from complex genomic environments (Shen et al. 2012). However, nested PCR requires four conserved regions to design two pairs of primers (illustrated in fig. 6; yellow blocks represent the conserved regions used for primer design), which means that only relatively long exons are suitable as candidates for NPCL development with the nested PCR method. To search for long and conserved exons, we took advantage of our previous bioinformatic method, which used the multiple genome alignment data from the UCSC Genome Browser to identify conserved exons (Shen et al. 2011). Because the NPCL markers are to be used in vertebrates, we focused only on those multiple genome alignments that include at least six species: Danio rerio (zebrafish), Silurana tropicalis (frog), Anolis carolinensis (lizard), Gallus gallus (chicken), Mus musculus (mouse), and Homo sapiens (human). The alignments of candidate exons had to meet two criteria: length of more than 700 bp and pairwise similarity ranging from 35% to 90%. The detailed bioinformatic pipeline has been described elsewhere (Shen et al. 2011). In addition to using multiple genome alignments to screen NPCL candidates, we also manually searched for nuclear genes that were used previously (Murphy et al. 2001; Li et al. 2007; Townsend et al. 2008; Wright et al. 2008; Zhou et al. 2011; Song et al. 2012) in the ENSEMBL database to check whether they contain large and appropriately conserved exons.
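The two screening criteria for candidate exon alignments (length of more than 700 bp and pairwise similarity between 35% and 90%) can be illustrated with a short filter. This is a minimal sketch and not the pipeline of Shen et al. (2011): the function names are hypothetical, similarity is taken here as nucleotide identity over ungapped columns, and applying the bounds to the mean over all species pairs is an assumption.

```python
from itertools import combinations

def pairwise_identity(a, b):
    """Fraction of identical residues over columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)

def is_candidate(alignment, min_len=700, min_sim=0.35, max_sim=0.90):
    """Check one exon alignment (dict: species -> aligned sequence) against both criteria."""
    seqs = list(alignment.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("aligned sequences must have equal length")
    if length <= min_len:
        return False
    sims = [pairwise_identity(a, b) for a, b in combinations(seqs, 2)]
    mean_sim = sum(sims) / len(sims)  # assumption: bounds apply to the mean similarity
    return min_sim <= mean_sim <= max_sim

# Toy 800-column alignment with 75% identity between the two sequences.
toy = {"species_a": "ACGT" * 200, "species_b": "ACGA" * 200}
print(is_candidate(toy))
```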

As a result, we assembled a total of 305 NPCL candidate alignments, of which 120 contained the appropriate number of conserved blocks, and used these to design nested PCR primers. To increase the success rates of our NPCL markers in amniotes, we manually added a turtle sequence (Chrysemys picta bellii) to each of the candidate alignments using data downloaded from the ENSEMBL database. A total of 480 primers were designed for the 120 NPCL candidates. Briefly, the first-round PCR primers are only used to enrich target regions from genomic environments and not to obtain target amplicons, so the degeneracy of these primers is normally high to increase reaction sensitivity; the second-round PCR primers are used to obtain target amplicons, so the degeneracy of these primers is lower to increase reaction specificity. Our previous study showed that the nested PCR method often produces strong and single amplicon bands (Shen et al. 2012). To facilitate the next-step direct sequencing, we added a tail (5′-AGGGTTTTCCCAGTCACGAC-3′) to the 5′-end of all second-round forward primers and a tail (5′-AGATAACAATTTCACACAGG-3′) to the 5′-end of all second-round reverse primers. These tail sequences provide two unique anchoring sites for direct sequencing from cleaned PCR products. In our pilot experiments, adding the tail sequences to primers did not affect the efficiency of the second-round PCR.

FIG. 6. Schematic representation of the experimental protocol for using our NPCL toolkit. Note that for each NPCL, nested PCR primers are designed on four short conserved blocks flanking the target region. Laboratory protocol shown in the figure:

(i) First PCR with primers F1 and R1 using gDNA as template. Enrich the target region from the complex genomic environment with one pair of highly degenerate primers. PCR was performed with 50–100 ng DNA in a 25 µl reaction. Cycling conditions: an initial denaturation step of 4 min at 94 °C, followed by 35 cycles of 94 °C for 45 s, 45 °C for 40 s, and 72 °C for 2 min, and a final extension at 72 °C for 10 min.

(ii) Second PCR with tailed primers F2 and R2 using the first-round PCR product as template. Specifically amplify the target region from the first-round PCR products with one pair of tailed, low-degeneracy primers. PCR was performed with 1 µl of the first-round PCR product in a 25 µl reaction. Cycling conditions: an initial denaturation step of 4 min at 94 °C, followed by 35 cycles of 94 °C for 45 s, 50 °C for 40 s, and 72 °C for 90 s, and a final extension at 72 °C for 10 min.

(iii) PCR evaluation and sequencing. Evaluate the agarose gel electrophoresis results. If the product is a single and strong band (normally >90%), clean up with ExoI and FastAP and sequence directly with the general sequencing primers Seq_F and Seq_R; otherwise, use gel cutting or cloning, then sequencing. For cleanup, 25 µl PCR product is cleaned with 2 U ExoI and 0.4 U FastAP (37 °C for 30 min, 80 °C for 15 min); the cleaned PCR product can be used for direct sequencing. A Sanger sequencing reaction is performed with 0.5 µl BigDye and 1 µl cleanup PCR product.
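The tailing step can be expressed as a simple string operation: the forward or reverse tail is prepended to the 5′-end of each second-round primer. The sketch below uses the two tail sequences given in the text; the function name and the example primer are hypothetical and are not taken from supplementary table S1.

```python
# Tail sequences from the text (written 5' to 3').
TAIL_F = "AGGGTTTTCCCAGTCACGAC"  # added to second-round forward primers
TAIL_R = "AGATAACAATTTCACACAGG"  # added to second-round reverse primers

def add_tail(primer, direction):
    """Return the tailed primer (5' to 3'); direction is 'F' or 'R'.

    Degenerate IUPAC codes in the primer are passed through unchanged.
    """
    tails = {"F": TAIL_F, "R": TAIL_R}
    if direction not in tails:
        raise ValueError("direction must be 'F' or 'R'")
    return tails[direction] + primer.upper()

# Hypothetical degenerate primer, for illustration only.
print(add_tail("GGNTAYATHGARGC", "F"))
```

Because every amplicon then starts with one of the two tails, the same pair of sequencing primers (Seq_F and Seq_R, identical to the tails) anneals to all products.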

Experimental Testing for Candidate Markers in 16 Jawed Vertebrates

To test the experimental performance of our newly designed NPCL markers, we selected 16 taxa representing nine major jawed vertebrate lineages: Chondrichthyes (Sphyrna lewini), Actinopterygii (Lepisosteus oculatus and Pangasius sutchi), Dipnoi (Protopterus annectens), Lissamphibia (Ichthyophis bannanicus, Batrachuperus yenyuanensis, and Rana nigromaculata), Mammalia (Mus musculus and Sus scrofa domestica), Testudines (Trionyx sinensis and Podocnemis unifilis), Aves (Struthio camelus and Zosterops japonica), Crocodylia (Crocodylus siamensis), and Squamata (Hemidactylus bowringii and Naja naja atra). Total genomic DNA was extracted from ethanol-preserved tissues (liver or muscle) using the standard salt extraction protocol. All extracted genomic DNAs were diluted to a concentration of 50 ng µl−1 with 1× TE and stored at −20 °C before PCR amplification.

All 120 NPCL markers were tested with a two-round PCR strategy (nested PCR). The first-round PCR was performed in a 25 µl reaction containing 1–2 µl template DNA (50–100 ng) with final concentrations of 1× PCR buffer, 200 µM dNTP, 400 nM of each forward and reverse first-round primer, and 1.25 U Taq polymerase (TransTaq High Fidelity, TransGen, Beijing). The cycling conditions of the first-round PCR were as follows: an initial denaturation step of 4 min at 94 °C, followed by 35 cycles of a 45 s denaturation at 94 °C, a 40 s annealing at 45 °C, and a 2 min extension at 72 °C, followed by a final 10 min extension at 72 °C. The second-round PCR was also performed in a 25 µl reaction containing 1 µl of the first-round PCR product (without dilution) and final concentrations of 1× PCR buffer, 200 µM dNTP, 400 nM of each forward and reverse second-round primer, and 1.25 U Taq polymerase. The cycling conditions of the second-round PCR were as follows: an initial denaturation step of 4 min at 94 °C, followed by 35 cycles of a 45 s denaturation at 94 °C, a 40 s annealing at 50 °C, and a 90 s extension at 72 °C, followed by a final 10 min extension at 72 °C.
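For planning purposes, the two cycling programs can be written down as data and their programmed hold times summed. This is a hedged sketch: the dictionary layout and function name are assumptions, and the totals exclude ramping between temperatures, so real run times on a thermocycler will be longer.

```python
# Each program: initial denaturation, cycled steps as (temperature in C, seconds),
# and final extension, as described in the text.
FIRST_ROUND = {
    "initial": (94, 240),
    "cycles": 35,
    "steps": [(94, 45), (45, 40), (72, 120)],
    "final": (72, 600),
}
SECOND_ROUND = {
    "initial": (94, 240),
    "cycles": 35,
    "steps": [(94, 45), (50, 40), (72, 90)],
    "final": (72, 600),
}

def hold_time_seconds(program):
    """Sum of programmed hold times, ignoring temperature ramping."""
    per_cycle = sum(seconds for _, seconds in program["steps"])
    return program["initial"][1] + program["cycles"] * per_cycle + program["final"][1]

for name, program in (("first round", FIRST_ROUND), ("second round", SECOND_ROUND)):
    print(name, round(hold_time_seconds(program) / 60, 1), "min")
```

The only differences between the two rounds are the annealing temperature (45 °C versus 50 °C) and the extension time (2 min versus 90 s).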

One microliter of the second-round PCR products was analyzed on a 1.0% TAE agarose gel. An NPCL marker was considered successful if more than 8 of the 16 tested taxa produced target amplicon bands. On this basis, 102 out of 120 tested NPCL markers were successful. The nested PCR primers for the 102 NPCL markers can be found in supplementary table S1, Supplementary Material online. If the PCR products contained significant nonspecific amplicon bands (normally <10%), they needed further processing, for example, standard gel cutting or cloning. If the PCR reactions produced single amplicon bands (normally >90%), they were cleaned with ExoFAP treatment: 2 U ExoI and 0.4 U FastAP (all Fermentas) were added to the PCR tube and incubated for 30 min at 37 °C and 15 min at 80 °C. The cleanup PCR reactions can be directly used as templates for Sanger sequencing. According to our experimental design, all PCR fragments can be sequenced from both ends with the two universal sequencing primers, Seq_F (5′-AGGGTTTTCCCAGTCACGAC-3′) and Seq_R (5′-AGATAACAATTTCACACAGG-3′). A typical Sanger sequencing reaction in our laboratory consumes 0.5 µl BigDye and 1 µl cleanup PCR product. The primer design strategy, the laboratory protocol for the nested PCR method, and the pretreatment of PCR products before Sanger sequencing are illustrated in figure 6.
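The success criterion for a marker (target bands in more than 8 of the 16 tested taxa) can be stated as a one-line rule. The sketch below is illustrative; the function names and the toy gel results are hypothetical and are not data from the study.

```python
def marker_is_successful(n_taxa_with_band, min_exceeded=8):
    """A marker counts as successful if more than `min_exceeded` taxa gave a target band."""
    return n_taxa_with_band > min_exceeded

def count_successful(results):
    """results maps marker name -> list of booleans (one per tested taxon)."""
    return sum(marker_is_successful(sum(bands)) for bands in results.values())

# Hypothetical gel scores for three markers across 16 taxa.
toy_results = {
    "marker_1": [True] * 16,
    "marker_2": [True] * 8 + [False] * 8,   # exactly 8 of 16: not successful
    "marker_3": [True] * 9 + [False] * 7,   # 9 of 16: successful
}
print(count_successful(toy_results))
```

Under this rule, the reported outcome of 102 successful markers out of 120 tested corresponds to a marker-level success rate of 85%.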

Calculation of Relative Evolutionary Rate of 102 NPCLs

The rate multipliers (m) across partitions estimated in MrBayes 3.2 (Ronquist et al. 2012) are used as relative evolutionary rates. To calculate these parameters, alignments for each NPCL were prepared for 12 species: Homo sapiens, Macaca mulatta, Mus musculus, Rattus norvegicus, Gallus gallus, Meleagris gallopavo, Chrysemys picta bellii, Anolis carolinensis, Silurana tropicalis, Tetraodon nigroviridis, Takifugu rubripes, and Danio rerio. Because genome data are available for the 12 species, we did not generate any new data. The 102 NPCL alignments were then combined and subjected to MrBayes analyses partitioned by genes. Each gene was assigned a separate GTR + Γ + I model, and all model parameters were unlinked. Two Markov chain Monte Carlo (MCMC) runs were performed with one cold chain and three heated chains (temperature set to 0.1) for 50 million generations and sampled every 1,000 generations. The rate multiplier for each gene was estimated using Tracer version 1.4 after discarding the first 50% of the generations. All evolutionary rates were normalized by dividing by the maximum value of the obtained rates.
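The final normalization step, dividing every rate by the maximum, maps the fastest locus to 1 and all others into the interval (0, 1]. A minimal sketch, with a hypothetical function name and made-up rate multipliers rather than values from the study:

```python
def normalize_rates(rate_multipliers):
    """Divide each per-gene rate multiplier by the maximum observed value."""
    max_rate = max(rate_multipliers.values())
    return {gene: rate / max_rate for gene, rate in rate_multipliers.items()}

# Made-up rate multipliers for three hypothetical loci.
toy_rates = {"locus_A": 0.5, "locus_B": 2.0, "locus_C": 1.0}
print(normalize_rates(toy_rates))
```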

Gene and Taxon Sampling for Investigating Higher Level Salamander Relationships

To test the utility of our NPCL toolkit in a real case, we selected 19 salamander taxa representing all 10 salamander families and 9 outgroup taxa to investigate the family-level relationships of salamanders (supplementary table S2, Supplementary Material online). For gene sampling, we randomly selected 30 NPCL markers whose PCR success rates were more than 90% in the 16 previously tested vertebrates. Among the 840 target sequences (30 markers for 28 taxa), 201 were available in public databases (NCBI, UCSC, and ENSEMBL), whereas the remaining 639 sequences needed to be generated de novo. The experimental procedure was as described earlier. All obtained sequences were examined by checking for the presence of premature stop codons (pseudogenes) and by BlastX searches against the nonredundant protein sequences (nr) to confirm that they were our target genes. The NPCLs, species, and accession numbers for the newly obtained sequences are listed in supplementary table S2, Supplementary Material online.
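The stop-codon check mentioned above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' procedure: the function name is hypothetical, the standard nuclear genetic code is assumed, the sequence is assumed to be unaligned (no gap characters), and the reading frame is assumed to be known. Because NPCL fragments are internal pieces of exons, any in-frame stop codon is treated as premature.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard nuclear genetic code

def has_premature_stop(sequence, frame=0):
    """Return True if any complete in-frame codon is a stop codon.

    frame is the 0-based offset of the first codon position (0, 1, or 2).
    """
    seq = sequence.upper()
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return any(codon in STOP_CODONS for codon in codons)

print(has_premature_stop("ATGAAAGGG"))   # no stop codon in frame 0
print(has_premature_stop("ATGTAAGGG"))   # TAA at codon 2
```

The sequence counts in the text can be checked the same way: 30 markers × 28 taxa gives 840 targets, and 840 − 201 leaves the 639 sequences generated de novo.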

Phylogenetic Analyses

Alignments of all 30 NPCL markers were conducted using the G-INS-i method from MAFFT (Katoh et al. 2005) under the default settings according to their translated amino acid sequences and then refined by eye. All 30 refined alignments were combined into a concatenated data set (27,834 bp).

For the concatenated data set, we manually defined five partitioning strategies: 2 partitions (one for codon positions 1 and 2 and one for codon position 3), 3 partitions (one partition for each codon position), 30 partitions (one partition for each gene), 60 partitions (one for codon positions 1 + 2 and one for codon position 3 across 30 genes), and 90 partitions (codon position partitioning across 30 genes). Comparisons of the five partitioning strategies and selections of corresponding nucleotide substitution models were conducted under the Bayesian information criterion implemented in PartitionFinder (Lanfear et al. 2012). The 3-partition scheme (one partition for each codon position) was chosen as the best-fitting partitioning strategy, and all 3 partitions favored the GTR + Γ + I model.
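The selected 3-partition scheme can be written out as partition definitions for the 27,834 bp concatenated alignment. The sketch below emits lines in the RAxML-style partition file syntax (`DNA, name = start-end\3`); the function name and the partition names are hypothetical, and it assumes that every gene in the concatenation starts at codon position 1 and has a length divisible by 3, which the text does not state explicitly.

```python
def codon_partition_lines(n_sites):
    """Return three RAxML-style partition lines, one per codon position."""
    if n_sites % 3 != 0:
        raise ValueError("alignment length must be a multiple of 3")
    # A stride of 3 starting at site 1, 2, or 3 selects one codon position.
    return [f"DNA, codon{pos} = {pos}-{n_sites}\\3" for pos in (1, 2, 3)]

for line in codon_partition_lines(27834):
    print(line)
```

Each partition then covers 27,834 / 3 = 9,278 sites.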

The concatenated data set was separately analyzed with both ML and Bayesian inference (BI) methods under the 3-partition scheme. Partitioned ML analyses were implemented using RAxML version 7.2.6 (Stamatakis 2006) with the GTR + Γ + I model assigned for each partition. A search that combined 100 separate ML searches was applied to find the optimal tree, and branch support for each node was evaluated with 500 standard bootstrapping replicates (-f d -b 500 option) implemented in RAxML. The partitioned BI was conducted using MrBayes 3.2 (Ronquist et al. 2012). All model parameters were unlinked. Two MCMC runs were performed with one cold chain and three heated chains (temperature set to 0.1) for 60 million generations and sampled every 1,000 generations. The chain stationarity was visualized by plotting −ln L against the generation number using Tracer version 1.4, and the first 50% of generations were discarded. Topologies and posterior probabilities were estimated from the remaining generations. Two runs for each analysis were compared for congruence.
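The sampling arithmetic behind these MCMC settings can be made explicit with a small sketch. The function name is ours, and the count ignores the extra sample that some programs record at generation 0.

```python
def retained_samples(generations, sample_every, burnin_fraction):
    """Return (samples per run, samples kept per run after burn-in)."""
    total = generations // sample_every
    discarded = int(total * burnin_fraction)
    return total, total - discarded


if __name__ == "__main__":
    # 60 million generations, sampled every 1,000, with 50% burn-in.
    print(retained_samples(60_000_000, 1000, 0.5))
```

Under these settings each run yields 60,000 samples, of which 30,000 remain after burn-in.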

We also performed Bayesian phylogenetic analyses under a mixture model CAT + GTR + Γ4 in PhyloBayes 3.3 (Lartillot et al. 2009) with two independent MCMC runs. Each run was performed for 10,000 cycles and sampled every cycle. Stationarity was considered reached when the largest discrepancy (maxdiff) between the two independent runs was less than 0.1. The first 5,000 trees in each MCMC run were discarded. The remaining 10,000 trees of the two runs were sampled every 5 trees to generate a 50% majority-rule posterior consensus tree.
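The convergence criterion and the thinning described above can be sketched in Python. This is a simplified illustration, not PhyloBayes code: maxdiff is taken here as the largest difference in bipartition frequency between two runs, and the bipartition labels and frequencies in the usage example are hypothetical.

```python
def maxdiff(freqs_a, freqs_b):
    """Largest absolute difference in bipartition frequency between two runs.
    A bipartition absent from one run counts as frequency 0 in that run."""
    keys = set(freqs_a) | set(freqs_b)
    return max(abs(freqs_a.get(k, 0.0) - freqs_b.get(k, 0.0)) for k in keys)


def trees_in_consensus(cycles, burnin, runs, thin):
    """Number of trees that enter the consensus after burn-in and thinning."""
    return (cycles - burnin) * runs // thin


if __name__ == "__main__":
    run1 = {"AB|CD": 0.98, "AC|BD": 0.02}   # hypothetical frequencies
    run2 = {"AB|CD": 0.95, "AC|BD": 0.05}
    print(maxdiff(run1, run2) < 0.1)
    print(trees_in_consensus(10_000, 5_000, 2, 5))
```

With the settings reported in the text, 2,000 trees contribute to the consensus.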

Species tree estimation was conducted using the pseudo-ML approach in the program MP-EST (Liu et al. 2010) under the coalescent model. Briefly, the gene trees for 30 NPCL markers were reconstructed with ML under the GTR + Γ8 model using PHYML 3.0 (Guindon et al. 2010). The resulting 30 gene trees were rooted with an outgroup (Chrysemys picta bellii) and used to generate a species tree with MP-EST. The robustness of the species tree was evaluated with nonparametric bootstrapping of 500 replicates.

Alternative phylogenetic hypotheses were tested based on the 30-gene data set. We first calculated sitewise log likelihoods of alternative trees using RAxML (-f g) under the GTR + Γ + I model. Then, the site log-likelihood file was used to estimate P values for each alternative tree in the CONSEL program (Shimodaira and Hasegawa 2001) using the Kishino–Hasegawa test (KH) (Kishino and Hasegawa 1989), the approximately unbiased test (AU) (Shimodaira 2002), and the RELL bootstrap proportion test (BP).
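To show how sitewise log likelihoods feed into a topology test, here is a minimal sketch of the RELL bootstrap proportion: sites are resampled with replacement, and each replicate is won by the tree with the highest resampled log likelihood. It is a simplified illustration, not CONSEL's implementation; the function name, replicate number, and seed are assumptions.

```python
import random


def rell_bp(site_lnl, n_boot=1000, seed=1):
    """RELL bootstrap proportions.

    site_lnl: list with one entry per tree, each a list of sitewise
    log likelihoods (all of the same length).
    Returns the fraction of replicates in which each tree scores best.
    """
    rng = random.Random(seed)
    n_sites = len(site_lnl[0])
    wins = [0] * len(site_lnl)
    for _ in range(n_boot):
        # Resample site indices with replacement; likelihoods are reused as is.
        idx = [rng.randrange(n_sites) for _ in range(n_sites)]
        totals = [sum(tree[i] for i in idx) for tree in site_lnl]
        wins[totals.index(max(totals))] += 1
    return [w / n_boot for w in wins]
```

The KH and AU tests use the same sitewise input but derive P values differently, so this sketch covers only the BP column of such a comparison.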

Supplementary Material

Supplementary tables S1 and S2 and figures S1 and S2 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).

Acknowledgments

The authors are grateful to David Wake and Carol Spencer of the Museum of Vertebrate Zoology at the University of California, Berkeley, for tissue loans. David Wake provided many useful comments on the manuscript. This work was supported by National Natural Science Foundation of China grants (31172075 and 30900136) and the National Science Fund for Excellent Young Scholars (no. pending).

References

Binladen J, Gilbert MTP, Bollback JP, Panitz F, Bendixen C, Nielsen R, Willerslev E. 2007. The use of coded PCR primers enables high-throughput sequencing of multiple homolog amplification products by 454 parallel sequencing. PLoS One 2:e197.

Crawford NG, Faircloth BC, McCormack JE, Brumfield RT, Winker K, Glenn TC. 2012. More than 1000 ultraconserved elements provide evidence that turtles are the sister group to archosaurs. Biol Lett. 8:783–786.

Delsuc F, Brinkmann H, Philippe H. 2005. Phylogenomics and the reconstruction of the tree of life. Nat Rev Genet. 6:361–375.

Duellman WE, Trueb L. 1994. Biology of amphibians. Baltimore (MD): Johns Hopkins University Press.

Dunn CW, Hejnol A, Matus DQ, et al. (18 co-authors). 2008. Broad phylogenomic sampling improves resolution of the animal tree of life. Nature 452:745–749.

Faircloth BC, Glenn TC. 2012. Not all sequence tags are created equal: designing and validating sequence identification tags robust to indels. PLoS One 7:e42543.

Faircloth BC, McCormack JE, Crawford NG, Brumfield RT, Glenn TC. 2012. Ultraconserved elements anchor thousands of genetic markers for target enrichment spanning multiple evolutionary timescales. Syst Biol. 61:717–726.

Fong JJ, Brown JM, Fujita MK, Boussau B. 2012. A phylogenomic approach to vertebrate phylogeny supports a turtle-archosaur affinity and a possible paraphyletic lissamphibia. PLoS One 7:e48990.

Fong JJ, Fujita MK. 2011. Evaluating phylogenetic informativeness and data-type usage for new protein-coding genes across Vertebrata. Mol Phylogenet Evol. 61:300–307.

Frost DR, Grant T, Faivovich J, et al. (18 co-authors). 2006. The amphibian tree of life. Bull Am Mus Nat Hist. 297:8–370.


Gao KQ, Shubin NH. 2001. Late Jurassic salamanders from northern China. Nature 410:574–577.

Guindon S, Dufayard J-F, Lefort V, Anisimova M, Hordijk W, Gascuel O. 2010. New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Syst Biol. 59:307–321.

Hugall AF, Foster R, Lee MSY. 2007. Calibration choice, rate smoothing, and the pattern of tetrapod diversification according to the long nuclear gene RAG-1. Syst Biol. 56:543–563.

Inoue JG, Miya M, Lam K, Tay BH, Danks JA, Bell J, Walker TI, Venkatesh B. 2010. Evolutionary origin and phylogeny of the modern holocephalans (Chondrichthyes: Chimaeriformes): a mitogenomic perspective. Mol Biol Evol. 27:2576–2586.

Katoh K, Kuma K, Toh H, Miyata T. 2005. MAFFT version 5: improvement in accuracy of multiple sequence alignment. Nucleic Acids Res. 33:511–518.

Kishino H, Hasegawa M. 1989. Evaluation of the maximum likelihood estimate of the evolutionary tree topologies from DNA sequence data, and the branching order in Hominoidea. J Mol Evol. 29:170–179.

Künstner A, Wolf JBW, Backström N, et al. (13 co-authors). 2010. Comparative genomics based on massive parallel transcriptome sequencing reveals patterns of substitution and selection across 10 bird species. Mol Ecol. 19:266–276.

Lanfear R, Calcott B, Ho SYW, Guindon S. 2012. Partitionfinder: combined selection of partitioning schemes and substitution models for phylogenetic analyses. Mol Biol Evol. 29:1695–1701.

Lartillot N, Lepage T, Blanquart S. 2009. PhyloBayes 3: a Bayesian software package for phylogenetic reconstruction and molecular dating. Bioinformatics 25:2286–2288.

Lemmon AR, Brown JM, Stanger-Hall K, Lemmon EM. 2009. The effect of ambiguous data on phylogenetic estimates obtained by maximum likelihood and Bayesian inference. Syst Biol. 58:130–145.

Lemmon AR, Emme SA, Lemmon EM. 2012. Anchored hybrid enrichment for massively high-throughput phylogenomics. Syst Biol. 61:727–744.

Li C, Ortí G, Zhang G, Lu G. 2007. A practical approach to phylogenomics: the phylogeny of ray-finned fish (Actinopterygii) as a case study. BMC Evol Biol. 7:44.

Li C, Riethoven JJ, Naylor GJ. 2012. EvolMarkers: a database for mining exon and intron markers for evolution, ecology and conservation studies. Mol Ecol Resour. 12:967–971.

Liu L, Yu L, Edwards SV. 2010. A maximum pseudo-likelihood approach for estimating species trees under the coalescent model. BMC Evol Biol. 10:302.

McCormack JE, Faircloth BC, Crawford NG, Gowaty PA, Brumfield RT, Glenn TC. 2012. Ultraconserved elements are novel phylogenomic markers that resolve placental mammal phylogeny when combined with species-tree analysis. Genome Res. 22:746–754.

McCormack JE, Hird SM, Zellmer AJ, Carstens BC, Brumfield RT. 2013. Applications of next-generation sequencing to phylogeography and phylogenetics. Mol Phylogenet Evol. 66:526–538.

Meyer A, Van de Peer Y. 2005. From 2R to 3R: evidence for a fish-specific genome duplication (FSGD). Bioessays 27:937–945.

Meyer M, Stenzel U, Hofreiter M. 2008. Parallel tagged sequencing on the 454 platform. Nat Protoc. 3:267–278.

Murphy WJ, Eizirik E, Johnson WE, Zhang YP, Ryder OA, O'Brien SJ. 2001. Molecular phylogenetics and the origins of placental mammals. Nature 409:614–618.

Philippe H, Derelle R, Lopez P, et al. (20 co-authors). 2009. Phylogenomics revives traditional views on deep animal relationships. Curr Biol. 19:706–712.

Philippe H, Telford M. 2006. Large-scale sequencing and the new animal phylogeny. Trends Ecol Evol. 21:614–620.

Portik DM, Wood PL, Grismer JL, Stanley EL, Jackman TR. 2011. Identification of 104 rapidly-evolving nuclear protein-coding markers for amplification across scaled reptiles using genomic resources. Conserv Genet Resour. 4:1–10.

Pyron RA, Wiens JJ. 2011. A large-scale phylogeny of Amphibia including over 2800 species, and a revised classification of extant frogs, salamanders, and caecilians. Mol Phylogenet Evol. 61:543–583.

Roelants K, Gower DJ, Wilkinson M, Loader SP, Biju SD, Guillaume K, Moriau L, Bossuyt F. 2007. Global patterns of diversification in the history of modern amphibians. Proc Natl Acad Sci U S A. 104:887–892.

Rokas A, Williams BL, King N, Carroll SB. 2003. Genome-scale approaches to resolving incongruence in molecular phylogenies. Nature 425:798–804.

Ronquist F, Teslenko M, Van der Mark P, Ayres DL, Darling A, Höhna S, Larget B, Liu L, Suchard MA, Huelsenbeck JP. 2012. MrBayes 3.2: efficient Bayesian phylogenetic inference and model choice across a large model space. Syst Biol. 61:539–542.

Roure B, Baurain D, Philippe H. 2013. Impact of missing data on phylogenies inferred from empirical phylogenomic data sets. Mol Biol Evol. 30:197–214.

San Mauro D. 2010. A multilocus timescale for the origin of extant amphibians. Mol Phylogenet Evol. 56:554–561.

San Mauro D, Vences M, Alcobendas M, Zardoya R, Meyer A. 2005. Initial diversification of living amphibians predated the breakup of Pangaea. Am Nat. 165:590–599.

Shen XX, Liang D, Wen JZ, Zhang P. 2011. Multiple genome alignments facilitate development of NPCL markers: a case study of tetrapod phylogeny focusing on the position of turtles. Mol Biol Evol. 28:3237–3252.

Shen XX, Liang D, Zhang P. 2012. The development of three long universal nuclear protein-coding locus markers and their application to osteichthyan phylogenetics with nested PCR. PLoS One 7:e39256.

Shimodaira H. 2002. An approximately unbiased test of phylogenetic tree selection. Syst Biol. 51:492–508.

Shimodaira H, Hasegawa M. 2001. CONSEL: for assessing the confidence of phylogenetic tree selection. Bioinformatics 17:1246–1247.

Song S, Liu L, Edwards SV, Wu S. 2012. Resolving conflict in eutherian mammal phylogeny using phylogenomics and the multispecies coalescent model. Proc Natl Acad Sci U S A. 109:14942–14947.

Spinks PQ, Thomson RC, Barley AJ, Newman CE, Shaffer HB. 2010. Testing avian, squamate, and mammalian nuclear markers for cross amplification in turtles. Conserv Genet Resour. 2:127–129.

Spinks PQ, Thomson RC, Lovely GA, Shaffer HB. 2009. Assessing what is needed to resolve a molecular phylogeny: simulations and empirical data from Emydid turtles. BMC Evol Biol. 9:56.

Stamatakis A. 2006. RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics 22:2688–2690.

Tewhey R, Warner JB, Nakano M, Libby B, Medkova M, David PH, Kotsopoulos SK, Samuels ML, Hutchison JB, Larson JW. 2009. Microdroplet-based PCR enrichment for large-scale targeted sequencing. Nat Biotechnol. 27:1025–1031.

Thomson RC, Shedlock AM, Edwards SV, Shaffer HB. 2008. Developing markers for multilocus phylogenetics in non-model organisms: a test case with turtles. Mol Phylogenet Evol. 49:514–525.

Thomson RC, Wang IJ, Johnson JR. 2010. Genome-enabled development of DNA markers for ecology, evolution and conservation. Mol Ecol. 19:2184–2195.

Townsend TM, Alegre RE, Kelley ST, Wiens JJ, Reeder TW. 2008. Rapid development of multiple nuclear loci for phylogenetic analysis using genomic resources: an example from squamate reptiles. Mol Phylogenet Evol. 47:129–142.

Weisrock DW, Harmon LJ, Larson A. 2005. Resolving deep phylogenetic relationships in salamanders: analyses of mitochondrial and nuclear genomic DNA. Syst Biol. 54:758–777.

Wiens J, Bonett R, Chippindale P. 2005. Ontogeny discombobulates phylogeny: paedomorphosis and higher-level salamander relationships. Syst Biol. 54:91–110.


Wright TF, Schirtzinger EE, Matsumoto T, et al. (11 co-authors). 2008. A multilocus molecular phylogeny of the parrots (Psittaciformes): support for a Gondwanan origin during the cretaceous. Mol Biol Evol. 25:2141–2156.

Zhang P, Wake DB. 2009. Higher-level salamander relationships and divergence dates inferred from complete mitochondrial genomes. Mol Phylogenet Evol. 53:492–508.

Zhang P, Zhou H, Chen YQ, Liu YF, Qu LH. 2005. Mitogenomic perspectives on the origin and phylogeny of living amphibians. Syst Biol. 54:391–400.

Zhou XM, Xu SX, Zhang P, Yang G. 2011. Developing a series of conservative anchor markers and their application to phylogenomics of Laurasiatherian mammals. Mol Ecol Resour. 11:134–140.


sequences. Because paralogy cannot be detected until after the data are aligned, those unalignable sequences will make the detection of paralogy more difficult.

In fact, not every phylogenetic project will use more than 500 loci, as the sequence capture method normally does. Based on both empirical and simulation data, 20–50 loci are generally sufficient to answer many phylogenetic questions (Rokas et al. 2003; Spinks et al. 2009). This is also the number of loci that most phylogenetic studies will use. In such a situation, adopting the sequence capture method is not cost-effective because researchers need to use relatively expensive NGS sequencing and spend time learning new experimental techniques and carrying out sophisticated bioinformatic processing. Our NPCL toolkit is specially designed for such medium-scale phylogenetic projects using approximately 50 loci. Such a number of expected loci can be easily fulfilled with our 102 NPCLs. Because more than 90% of the PCR reactions generated by our toolkit can be directly sequenced, the average cost for one locus per sample is rather low. In our laboratory, generating one new sequence typically costs US$ 3 (without considering labor).
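A rough feel for what this per-sequence cost means at the project level can be had from a short calculation. The sketch below is only an illustration: the function name is ours, the 20-sample project is hypothetical, and the US$ 3 figure excludes labor, as stated above.

```python
def sanger_cost(n_loci, n_samples, cost_per_sequence=3.0):
    """Approximate consumable cost in US$ for sequencing every locus in
    every sample, assuming one new sequence per locus and sample."""
    return n_loci * n_samples * cost_per_sequence


if __name__ == "__main__":
    # Hypothetical medium-scale project: 50 loci for 20 samples.
    print(sanger_cost(50, 20))
```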

In addition, researchers sometimes have only tiny amounts of DNA, but they wish to perform a multilocus phylogenetic analysis. In such a situation, the sequence capture method is difficult to implement because it normally requires DNA at the microgram level (Lemmon et al. 2012). Our NPCL toolkit can fill the gap here. Benefiting from the use of the nested PCR strategy, the sensitivity of PCR reactions in our method is extremely high. In many test experiments in our laboratory, the toolkit and protocol could produce target bands with only 5–10 ng of DNA.

Our NPCL toolkit is an alternative to the sequence capture method for the everyday work of phylogenetic researchers. Which method to choose depends on two major drivers: the amount of DNA and the expected number of loci. When your DNA is limited, the better solution may be PCR; otherwise, sequence capture also works. Taking into account the money and time the two methods require, we speculate that the economic transition point from PCR to sequence capture is at approximately 100 loci. That assessment is why our toolkit includes 102 NPCL markers. Our proposal is that when using up to 100 loci, one can try our NPCL toolkit; when using >100 loci, sequence capture should be used.
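The rule of thumb in this paragraph can be written down as a small decision function. It is only a restatement of the two drivers named above, with the 100-locus transition point taken from the text; the function name and return labels are ours.

```python
def suggest_method(n_loci, dna_limited):
    """Suggest a data-collection method from the two drivers in the text:
    the amount of DNA available and the expected number of loci."""
    if dna_limited or n_loci <= 100:
        return "nested PCR toolkit"
    return "sequence capture"


if __name__ == "__main__":
    print(suggest_method(50, dna_limited=False))
    print(suggest_method(500, dna_limited=False))
```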

Future Directions

In this study, we used multiple genome alignments deposited in the University of California–Santa Cruz (UCSC) genome browser to identify long and conserved exons across jawed vertebrates. Benefiting from the use of a nested PCR strategy, the experimental performance of the developed NPCLs indicated that they are highly stable in all major jawed vertebrate groups. Recently, a database for mining exon and intron markers, called EvolMarkers, has been built by Li et al. (2012). Careful investigation of this database may identify many conserved exons within nonvertebrates, whose interrelationships are currently more problematic than those of vertebrates. Because the nonvertebrates constitute many distantly related groups, it may be impossible to develop a single set of PCR primers for all nonvertebrates. However, following a similar marker development strategy, multiple NPCL toolkits could be constructed for various groups of nonvertebrates, such as arthropods, echinoderms, and molluscs. In addition, because introns are flanked by conserved exons, the idea of using nested PCRs for marker development could also be applied to the development of EPIC (exon-primed intron crossing) markers, which are more suitable for shallow-scale phylogenetic or phylogeographic projects.

Despite the benefits of our proposed method, it must be recognized that when handling large-scale projects, such as 200 taxa × 100 loci, the use of our toolkit and Sanger sequencing will still require significant cost, time, and labor. An alternative solution is to use NGS to replace Sanger sequencing. Recently, 454 NGS technology has been applied to sequence targeted gene regions from a pool of PCR products from different specimens (Binladen et al. 2007; Meyer et al. 2008). In such experiments, specific tagging sequences must be added to amplicons by either PCR (Binladen et al. 2007) or blunt-end ligation (Meyer et al. 2008). Therefore, if the tailing sequences of the second-round PCR primers in our NPCL toolkit are replaced by tagging sequences instead (for tag design, see Faircloth and Glenn 2012), all PCR products can be pooled together and sequenced with 454 NGS, which will greatly reduce the money and time cost compared with Sanger sequencing. However, parallel tagged sequencing via NGS does not circumvent the process of PCR for each individual at each locus, which may be the most onerous part of a large-scale phylogenomic project. Some promising new technologies may help to solve this problem, such as microdroplet PCR (Tewhey et al. 2009), where millions of individual PCR reactions are performed in picoliter-scale droplets simultaneously, and the 96.96 Dynamic Array by Fluidigm, which allows 96 primer combinations to be used on 96 samples (9,216 total PCR reactions) on a single PCR plate. However, there has been little research on applying NGS and new high-throughput PCR technologies to phylogenomics, so their ease of use and cost-effectiveness still need to be explored.
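The scale of the large project mentioned above can be worked through with a short sketch. It assumes, for illustration only, that a 96 × 96 array must be filled in blocks of up to 96 samples by up to 96 primer pairs; the function names are ours.

```python
import math


def reactions(n_taxa, n_loci):
    """Number of individual PCR reactions per round of amplification."""
    return n_taxa * n_loci


def arrays_needed(n_samples, n_loci, side=96):
    """Number of side x side arrays needed when samples and primer pairs
    are each split into blocks of at most `side`."""
    return math.ceil(n_samples / side) * math.ceil(n_loci / side)


if __name__ == "__main__":
    print(reactions(200, 100))       # reactions per PCR round
    print(arrays_needed(200, 100))   # 96 x 96 arrays under the block assumption
```

For 200 taxa and 100 loci this gives 20,000 reactions per round, which is why the per-reaction handling step dominates the workload.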

Summary

In conclusion, we have developed an improved method for rapidly amplifying and sequencing NPCLs that has proven to be useful and effective for molecular phylogenetic studies of vertebrates. The newly developed toolkit provides an attractive alternative to available methods for vertebrate phylogenomics.

Materials and Methods

Development of NPCL and Primer Design

Our previous study showed that nested PCR is overwhelmingly more effective than conventional PCR for obtaining target amplicons from complex genomic environments (Shen et al. 2012). However, nested PCR requires four conserved regions to design two pairs of primers (illustrated in fig. 6; yellow blocks represent the conserved regions used for primer design), which means that only relatively long exons are suitable as candidates for NPCL development with the nested PCR method. To search for long and conserved exons, we took advantage of our previous bioinformatic method, which used the multiple genome alignment data from the UCSC Genome Browser to identify conserved exons (Shen et al. 2011). Because the NPCL markers are to be used in vertebrates, we focused only on those multiple genome alignments that include at least six species: Danio rerio (zebrafish), Silurana tropicalis (frog), Anolis carolinensis (lizard), Gallus gallus (chicken), Mus musculus (mouse), and Homo sapiens (human). The alignments of candidate exons had to meet two criteria: length of more than 700 bp and pairwise similarity ranging from 35% to 90%. The detailed bioinformatic pipeline has been described elsewhere (Shen et al. 2011). In addition to using multiple genome alignments to screen NPCL candidates, we also manually searched for nuclear genes that were used previously (Murphy et al. 2001; Li et al. 2007; Townsend et al. 2008; Wright et al. 2008; Zhou et al. 2011; Song et al. 2012) in the ENSEMBL database to check whether they contain large and appropriately conserved exons.
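The two screening criteria for candidate exon alignments can be sketched as a simple filter. This is not the published pipeline (see Shen et al. 2011 for that): the function names are ours, similarity is computed here as the proportion of identical sites over gap-free columns, and requiring every pair of sequences to fall within the 35% to 90% window is our reading of the criterion.

```python
from itertools import combinations


def identity(a, b):
    """Proportion of identical sites between two aligned sequences,
    ignoring columns where either sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)


def is_candidate(seqs, min_len=700, lo=0.35, hi=0.90):
    """True if the alignment is longer than min_len columns and every
    pairwise identity lies between lo and hi (inclusive)."""
    if len(seqs[0]) <= min_len:
        return False
    return all(lo <= identity(a, b) <= hi for a, b in combinations(seqs, 2))
```

An exon that is too conserved offers little phylogenetic signal, while one that is too divergent leaves no conserved blocks for primer design, which is the rationale for having both bounds.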

FIG. 6. Schematic representation of the experimental protocol for using our NPCL toolkit. Note that for each NPCL, nested PCR primers are designed on four short conserved blocks flanking the target region. Panels: (i) first PCR with primers F1 and R1 using gDNA as template (50–100 ng DNA in a 25 µl reaction), which enriches the target region from the complex genomic environment with one pair of highly degenerate primers; (ii) second PCR with tailed primers F2 and R2 using 1 µl of the first PCR product as template in a 25 µl reaction, which specifically amplifies the target region with one pair of tailed, low-degeneracy primers; (iii) evaluation of agarose gel electrophoretic results and sequencing: single and strong bands (normally >90%) are cleaned with ExoI and FastAP and sequenced directly with the general sequencing primers Seq_F and Seq_R, whereas other products need gel cutting or cloning before sequencing.

As a result, we assembled a total of 305 NPCL candidate alignments, of which 120 contained the appropriate number of conserved blocks, and used these to design nested PCR primers. To increase the success rates of our NPCL markers in amniotes, we manually added a turtle sequence (Chrysemys picta bellii) to each of the candidate alignments using data downloaded from the ENSEMBL database. A total of 480 primers were designed for the 120 NPCL candidates. Briefly, the first-round PCR primers are only used to enrich target regions from genomic environments and not to obtain target amplicons, so the degeneracy of these primers is normally high to increase reaction sensitivity; the second-round PCR primers are used to obtain target amplicons, so the degeneracy of these primers is lower to increase reaction specificity. Our previous study showed that the nested PCR method often produces strong and single amplicon bands (Shen et al. 2012). To facilitate the next-step direct sequencing, we added a tail (5′-AGGGTTTTCCCAGTCACGAC-3′) to the 5′-end of all second-round forward primers and a tail (5′-AGATAACAATTTCACACAGG-3′) to the 5′-end of all second-round reverse primers. These tail sequences can provide two unique anchoring sites for direct sequencing from cleaned PCR products. In our pilot experiments, adding the tail sequences to primers did not affect the efficiency of the second-round PCR.
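The tailing of second-round primers can be illustrated with a few lines of Python. The two tail sequences are those given in the text; the function name and the locus-specific primer sequences in the usage example are hypothetical placeholders, not primers from the toolkit.

```python
# Tail sequences from the text; they also serve as the universal
# sequencing primers Seq_F and Seq_R.
TAIL_F = "AGGGTTTTCCCAGTCACGAC"
TAIL_R = "AGATAACAATTTCACACAGG"


def tail_primers(f2_core, r2_core):
    """Prepend the forward and reverse tails to the 5' ends of the
    locus-specific second-round primers (all written 5'->3')."""
    return TAIL_F + f2_core.upper(), TAIL_R + r2_core.upper()


if __name__ == "__main__":
    # Hypothetical locus-specific cores, for illustration only.
    f2, r2 = tail_primers("ACNGGNGARTAYGT", "TCRTGNCCYTGRAA")
    print(f2)
    print(r2)
```

Because every amplicon then carries the same two tails, a single pair of sequencing primers is enough for all loci.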

Experimental Testing for Candidate Markers in 16 Jawed Vertebrates

To test the experimental performance of our newly designed NPCL markers, we selected 16 taxa representing nine major jawed vertebrate lineages: Chondrichthyes (Sphyrna lewini), Actinopterygii (Lepisosteus oculatus and Pangasius sutchi), Dipnoi (Protopterus annectens), Lissamphibia (Ichthyophis bannanicus, Batrachuperus yenyuanensis, and Rana nigromaculata), Mammalia (Mus musculus and Sus scrofa domestica), Testudines (Trionyx sinensis and Podocnemis unifilis), Aves (Struthio camelus and Zosterops japonica), Crocodylia (Crocodylus siamensis), and Squamata (Hemidactylus bowringii and Naja naja atra). Total genomic DNA was extracted from ethanol-preserved tissues (liver or muscle) using the standard salt extraction protocol. All extracted genomic DNAs were diluted to a concentration of 50 ng µl⁻¹ with 1× TE and stored at −20 °C before PCR amplification.

All 120 NPCL markers were tested with a two-round PCR strategy (nested PCR). The first-round PCR was performed in a 25 µl reaction containing 1–2 µl template DNA (50–100 ng), with final concentrations of 1× PCR buffer, 200 µM dNTP, 400 nM of each forward and reverse first-round primer, and 1.25 U Taq polymerase (TransTaq High Fidelity, TransGen, Beijing). The cycling conditions of the first-round PCR were as follows: an initial denaturation step of 4 min at 94 °C, followed by 35 cycles of a 45 s denaturation at 94 °C, a 40 s annealing at 45 °C, and a 2 min extension at 72 °C, followed by a final 10 min extension at 72 °C. The second-round PCR was also performed in a 25 µl reaction containing 1 µl of the first-round PCR product (without dilution) and final concentrations of 1× PCR buffer, 200 µM dNTP, 400 nM of each forward and reverse second-round primer, and 1.25 U Taq polymerase. The cycling conditions of the second-round PCR were as follows: an initial denaturation step of 4 min at 94 °C, followed by 35 cycles of a 45 s denaturation at 94 °C, a 40 s annealing at 50 °C, and a 90 s extension at 72 °C, followed by a final 10 min extension at 72 °C.
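For planning purposes, the programmed time of the two cycling profiles can be added up as follows. The sketch counts only the hold times stated in the text and ignores temperature ramping, so real run times on a thermocycler will be longer; the function and constant names are ours.

```python
def program_seconds(initial, cycles, steps, final):
    """Total programmed hold time in seconds for a PCR profile:
    initial denaturation + cycles x (sum of step times) + final extension."""
    return initial + cycles * sum(steps) + final


# First round: 94 C 4 min; 35 x (94 C 45 s, 45 C 40 s, 72 C 2 min); 72 C 10 min.
FIRST_ROUND = program_seconds(4 * 60, 35, (45, 40, 120), 10 * 60)
# Second round: as above but annealing at 50 C and a 90 s extension.
SECOND_ROUND = program_seconds(4 * 60, 35, (45, 40, 90), 10 * 60)

if __name__ == "__main__":
    print(FIRST_ROUND / 60, SECOND_ROUND / 60)  # minutes, without ramp time
```

Together the two rounds account for a little over four hours of programmed hold time.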

One microliter of the second-round PCR products was analyzed on a 1.0% TAE agarose gel. An NPCL marker was considered successful if more than 8 of the 16 tested taxa produced target amplicon bands. On this basis, 102 out of 120 tested NPCL markers were successful. The nested-PCR primers for the 102 NPCL markers can be found in supplementary table S1, Supplementary Material online. If the PCR products contained significant nonspecific amplicon bands (normally <10%), they needed further processing, for example, standard gel cutting or cloning. If the PCR reactions produced single amplicon bands (normally >90%), they were cleaned with ExoFAP treatment: 2 U ExoI and 0.4 U FastAP (all Fermentas) were added to the PCR tube and incubated for 30 min at 37 °C and 15 min at 80 °C. The cleaned PCR reactions can be directly used as templates for Sanger sequencing. According to our experimental design, all PCR fragments can be sequenced from both ends with the two universal sequencing primers: Seq_F, 5′-AGGGTTTTCCCAGTCACGAC-3′, and Seq_R, 5′-AGATAACAATTTCACACAGG-3′. A typical Sanger sequencing reaction in our laboratory consumes 0.5 µl BigDye and 1 µl cleaned PCR product. The primer design strategy, the laboratory protocol for the nested PCR method, and the pretreatment of PCR products before Sanger sequencing are illustrated in figure 6.
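The success criterion for a marker can be expressed as a one-line rule. In the sketch below, the function name and the band-scoring lists are hypothetical; only the threshold (more than 8 of 16 taxa) and the 102-of-120 outcome come from the text.

```python
def marker_successful(bands, min_taxa=8):
    """bands: one boolean per tested taxon, True if a target amplicon band
    was observed. A marker passes if more than min_taxa taxa amplified."""
    return sum(bands) > min_taxa


if __name__ == "__main__":
    hypothetical = [True] * 14 + [False] * 2   # 14 of 16 taxa amplified
    print(marker_successful(hypothetical))
    print(102 / 120)                           # proportion of markers retained
```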

Calculation of Relative Evolutionary Rate of 102 NPCLs

The rate multipliers (m) across partitions estimated in MrBayes 3.2 (Ronquist et al. 2012) are used as relative evolutionary rates. To calculate these parameters, alignments for each NPCL were prepared for 12 species: Homo sapiens, Macaca mulatta, Mus musculus, Rattus norvegicus, Gallus gallus, Meleagris gallopavo, Chrysemys picta bellii, Anolis carolinensis, Silurana tropicalis, Tetraodon nigroviridis, Takifugu rubripes, and Danio rerio. Because genome data are available for the 12 species, we did not generate any new data. The 102 NPCL alignments were then combined and subjected to MrBayes analyses partitioned by genes. Each gene was assigned a separate GTR + Γ + I model, and all model parameters were unlinked. Two Markov chain Monte Carlo (MCMC) runs were performed with one cold chain and three heated chains (temperature set to 0.1) for 50 million generations and sampled every 1,000 generations. The rate multiplier for each gene was estimated using Tracer version 1.4 after discarding the first 50% of the generations. All evolutionary rates were normalized by dividing by the maximum value of the obtained rates.
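The final normalization step amounts to dividing each gene's rate multiplier by the largest one, so that the fastest marker has a relative rate of 1. A minimal sketch follows; the function name, gene names, and values are hypothetical.

```python
def normalize_rates(multipliers):
    """Scale rate multipliers so that the fastest gene has rate 1.0.

    multipliers: dict mapping gene name -> estimated rate multiplier.
    """
    top = max(multipliers.values())
    return {gene: m / top for gene, m in multipliers.items()}


if __name__ == "__main__":
    # Hypothetical multipliers for three markers.
    print(normalize_rates({"geneA": 0.5, "geneB": 2.0, "geneC": 1.0}))
```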

Gene and Taxon Sampling for Investigating Higher Level Salamander Relationships

To test the utility of our NPCL toolkit in a real case weselected 19 salamander taxa representing all 10 salamanderfamilies and 9 outgroup taxa to investigate the family-levelrelationships of salamanders (supplementary table S2Supplementary Material online) For gene sampling we ran-domly selected 30 NPCL markers whose PCR success rateswere more than 90 in the 16 previously tested vertebratesAmong the target 840 sequences (30 markers for 28 taxa) 201were available in public databases (NCBI UCSC andENSEMBL) whereas the remaining 639 sequences neededto be generated de novo The experimental procedure wasas described earlier All obtained sequences were examined bychecking for the presence of premature stop codons (pseu-dogene) and by BlastX searches against the nonredundant

2245

Universal Toolkit Including 102 NPCL doi101093molbevmst122 MBE at V

anderbilt University - M

assey Law

Library on M

arch 18 2015httpm

beoxfordjournalsorgD

ownloaded from

protein sequences (nr) to confirm that they were our targetgenes The NPCLs species and accession numbers for thenewly obtained sequences are listed in the supplementarytable S2 Supplementary Material online

Phylogenetic Analyses

Alignments of all 30 NPCL markers were conducted using theG-INS-i method from MAFFT (Katoh et al 2005) under thedefault settings according to their translated amino acid se-quences then refined by eye All 30 refined alignments werecombined into a concatenated data set (27834 bp)

For the concatenated data set we manually defined fivepartitioning strategies 2 partitions (one for codon positions1 and 2 and one for codon position 3) 3 partitions (onepartition for each codon position) 30 partitions (one parti-tion for each gene) 60 partitions (one for codon position1 + 2 and one codon position 3 across 30 genes) and 90 par-titions (codon position partitioning across 30 genes)Comparisons of the five partitioning strategies and selectionsof corresponding nucleotide substitution models were con-ducted under the Bayesian information criterion imple-mented in PartitionFinder (Lanfear et al 2012) The3-partition scheme (one partition for each codon position)was chosen as the best-fitting partitioning strategy and all 3partitions favored the GTR + + I model

The concatenated data set was separately analyzed withboth ML and Bayesian inference (BI) methods under the 3-partition scheme Partitioned ML analyses were implementedusing RAxML version 726 (Stamatakis 2006) with theGTR + + I model assigned for each partition A searchthat combined 100 separate ML searches was applied tofind the optimal tree and branch support for each nodewas evaluated with 500 standard bootstrapping replicates(-f d -b 500 option) implemented in RAxML The partitionedBI was conducted using MrBayes 32 (Ronquist et al 2012)All model parameters were unlinked Two MCMC runs wereperformed with one cold chain and three heated chains (tem-perature set to 01) for 60 million generations and sampledevery 1000 generations The chain stationarity was visualizedby plottingln L against the generation number using Tracerversion 14 and the first 50 of generations were discardedTopologies and posterior probabilities were estimated fromthe remaining generations Two runs for each analysis werecompared for congruence

We also performed Bayesian phylogenetic analyses under amixture model CAT + GTR + 4 in PhyloBayes 33 (Lartillotet al 2009) with two independent MCMC runs Each run wasperformed for 10000 cycles and sampled every cycleStationarity was reached when the largest discrepancy(maxdiff) was less than 01 between two independent runsThe first 5000 trees in each MCMC run were discardedThe remaining 10000 trees of the two runs were sampledevery 5 trees to generate a 50 majority-rule posteriorconsensus tree

Species tree estimation was conducted using the pseudo-ML approach in the program MP-EST (Liu et al. 2010) under the coalescent model. Briefly, the gene trees for the 30 NPCL markers were reconstructed with ML under the GTR + Γ8 model using PHYML 3.0 (Guindon et al. 2010). The resulting 30 gene trees were rooted with an outgroup (Chrysemys picta bellii) and used to generate a species tree with MP-EST. The robustness of the species tree was evaluated with nonparametric bootstrapping of 500 replicates.

Alternative phylogenetic hypotheses were tested based on the 30-gene data set. We first calculated sitewise log likelihoods of the alternative trees using RAxML (-f g) under the GTR + Γ + I model. The site log-likelihood file was then used to estimate P values for each alternative tree in the CONSEL program (Shimodaira and Hasegawa 2001) using the Kishino–Hasegawa test (KH) (Kishino and Hasegawa 1989), the approximately unbiased test (AU) (Shimodaira 2002), and the RELL bootstrap proportion test (BP).
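The RELL bootstrap proportion can be illustrated directly from sitewise log likelihoods: sites are resampled with replacement, the resampled log likelihoods are summed for each tree without re-estimating parameters, and the proportion of replicates in which each tree scores best is recorded. The sketch below is a simplified illustration, not CONSEL's implementation; the function name, the replicate count, and the toy log-likelihood values are assumptions.

```python
import random


def rell_bootstrap_proportions(site_lnl, n_replicates=1000, seed=1):
    """site_lnl maps tree name -> list of per-site log likelihoods (equal lengths)."""
    rng = random.Random(seed)
    names = list(site_lnl)
    n_sites = len(site_lnl[names[0]])
    wins = dict.fromkeys(names, 0)
    for _ in range(n_replicates):
        # Resample site indices with replacement, then sum lnL per tree.
        sites = [rng.randrange(n_sites) for _ in range(n_sites)]
        totals = {t: sum(site_lnl[t][i] for i in sites) for t in names}
        wins[max(totals, key=totals.get)] += 1
    return {t: wins[t] / n_replicates for t in names}


if __name__ == "__main__":
    # Toy values: tree_A fits every site slightly better than tree_B.
    toy = {
        "tree_A": [-2.0, -1.5, -3.0, -2.2],
        "tree_B": [-2.1, -1.6, -3.4, -2.3],
    }
    print(rell_bootstrap_proportions(toy))
```

A tree that is rarely the best-scoring one across replicates receives a low BP value and is a candidate for rejection alongside the KH and AU results.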

Supplementary Material

Supplementary tables S1 and S2 and figures S1 and S2 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).

Acknowledgments

The authors are grateful to David Wake and Carol Spencer of the Museum of Vertebrate Zoology at the University of California, Berkeley, for tissue loans. David Wake provided many useful comments on the manuscript. This work was supported by National Natural Science Foundation of China grants (31172075 and 30900136) and the National Science Fund for Excellent Young Scholars (no. pending).

References

Binladen J, Gilbert MTP, Bollback JP, Panitz F, Bendixen C, Nielsen R, Willerslev E. 2007. The use of coded PCR primers enables high-throughput sequencing of multiple homolog amplification products by 454 parallel sequencing. PLoS One 2:e197.

Crawford NG, Faircloth BC, McCormack JE, Brumfield RT, Winker K, Glenn TC. 2012. More than 1000 ultraconserved elements provide evidence that turtles are the sister group to archosaurs. Biol Lett. 8:783–786.

Delsuc F, Brinkmann H, Philippe H. 2005. Phylogenomics and the reconstruction of the tree of life. Nat Rev Genet. 6:361–375.

Duellman WE, Trueb L. 1994. Biology of amphibians. Baltimore (MD): Johns Hopkins University Press.

Dunn CW, Hejnol A, Matus DQ, et al. (18 co-authors). 2008. Broad phylogenomic sampling improves resolution of the animal tree of life. Nature 452:745–749.

Faircloth BC, Glenn TC. 2012. Not all sequence tags are created equal: designing and validating sequence identification tags robust to indels. PLoS One 7:e42543.

Faircloth BC, McCormack JE, Crawford NG, Brumfield RT, Glenn TC. 2012. Ultraconserved elements anchor thousands of genetic markers for target enrichment spanning multiple evolutionary timescales. Syst Biol. 61:717–726.

Fong JJ, Brown JM, Fujita MK, Boussau B. 2012. A phylogenomic approach to vertebrate phylogeny supports a turtle-archosaur affinity and a possible paraphyletic Lissamphibia. PLoS One 7:e48990.

Fong JJ, Fujita MK. 2011. Evaluating phylogenetic informativeness and data-type usage for new protein-coding genes across Vertebrata. Mol Phylogenet Evol. 61:300–307.

Frost DR, Grant T, Faivovich J, et al. (18 co-authors). 2006. The amphibian tree of life. Bull Am Mus Nat Hist. 297:8–370.


Gao KQ, Shubin NH. 2001. Late Jurassic salamanders from northern China. Nature 410:574–577.

Guindon S, Dufayard J-F, Lefort V, Anisimova M, Hordijk W, Gascuel O. 2010. New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Syst Biol. 59:307–321.

Hugall AF, Foster R, Lee MSY. 2007. Calibration choice, rate smoothing, and the pattern of tetrapod diversification according to the long nuclear gene RAG-1. Syst Biol. 56:543–563.

Inoue JG, Miya M, Lam K, Tay BH, Danks JA, Bell J, Walker TI, Venkatesh B. 2010. Evolutionary origin and phylogeny of the modern holocephalans (Chondrichthyes: Chimaeriformes): a mitogenomic perspective. Mol Biol Evol. 27:2576–2586.

Katoh K, Kuma K, Toh H, Miyata T. 2005. MAFFT version 5: improvement in accuracy of multiple sequence alignment. Nucleic Acids Res. 33:511–518.

Kishino H, Hasegawa M. 1989. Evaluation of the maximum likelihood estimate of the evolutionary tree topologies from DNA sequence data, and the branching order in Hominoidea. J Mol Evol. 29:170–179.

Künstner A, Wolf JBW, Backström N, et al. (13 co-authors). 2010. Comparative genomics based on massive parallel transcriptome sequencing reveals patterns of substitution and selection across 10 bird species. Mol Ecol. 19:266–276.

Lanfear R, Calcott B, Ho SYW, Guindon S. 2012. PartitionFinder: combined selection of partitioning schemes and substitution models for phylogenetic analyses. Mol Biol Evol. 29:1695–1701.

Lartillot N, Lepage T, Blanquart S. 2009. PhyloBayes 3: a Bayesian software package for phylogenetic reconstruction and molecular dating. Bioinformatics 25:2286–2288.

Lemmon AR, Brown JM, Stanger-Hall K, Lemmon EM. 2009. The effect of ambiguous data on phylogenetic estimates obtained by maximum likelihood and Bayesian inference. Syst Biol. 58:130–145.

Lemmon AR, Emme SA, Lemmon EM. 2012. Anchored hybrid enrichment for massively high-throughput phylogenomics. Syst Biol. 61:727–744.

Li C, Ortí G, Zhang G, Lu G. 2007. A practical approach to phylogenomics: the phylogeny of ray-finned fish (Actinopterygii) as a case study. BMC Evol Biol. 7:44.

Li C, Riethoven JJ, Naylor GJ. 2012. EvolMarkers: a database for mining exon and intron markers for evolution, ecology and conservation studies. Mol Ecol Resour. 12:967–971.

Liu L, Yu L, Edwards SV. 2010. A maximum pseudo-likelihood approach for estimating species trees under the coalescent model. BMC Evol Biol. 10:302.

McCormack JE, Faircloth BC, Crawford NG, Gowaty PA, Brumfield RT, Glenn TC. 2012. Ultraconserved elements are novel phylogenomic markers that resolve placental mammal phylogeny when combined with species-tree analysis. Genome Res. 22:746–754.

McCormack JE, Hird SM, Zellmer AJ, Carstens BC, Brumfield RT. 2013. Applications of next-generation sequencing to phylogeography and phylogenetics. Mol Phylogenet Evol. 66:526–538.

Meyer A, Van de Peer Y. 2005. From 2R to 3R: evidence for a fish-specific genome duplication (FSGD). Bioessays 27:937–945.

Meyer M, Stenzel U, Hofreiter M. 2008. Parallel tagged sequencing on the 454 platform. Nat Protoc. 3:267–278.

Murphy WJ, Eizirik E, Johnson WE, Zhang YP, Ryder OA, O'Brien SJ. 2001. Molecular phylogenetics and the origins of placental mammals. Nature 409:614–618.

Philippe H, Derelle R, Lopez P, et al. (20 co-authors). 2009. Phylogenomics revives traditional views on deep animal relationships. Curr Biol. 19:706–712.

Philippe H, Telford M. 2006. Large-scale sequencing and the new animal phylogeny. Trends Ecol Evol. 21:614–620.

Portik DM, Wood PL, Grismer JL, Stanley EL, Jackman TR. 2011. Identification of 104 rapidly-evolving nuclear protein-coding markers for amplification across scaled reptiles using genomic resources. Conserv Genet Resour. 4:1–10.

Pyron RA, Wiens JJ. 2011. A large-scale phylogeny of Amphibia including over 2800 species, and a revised classification of extant frogs, salamanders, and caecilians. Mol Phylogenet Evol. 61:543–583.

Roelants K, Gower DJ, Wilkinson M, Loader SP, Biju SD, Guillaume K, Moriau L, Bossuyt F. 2007. Global patterns of diversification in the history of modern amphibians. Proc Natl Acad Sci U S A. 104:887–892.

Rokas A, Williams BL, King N, Carroll SB. 2003. Genome-scale approaches to resolving incongruence in molecular phylogenies. Nature 425:798–804.

Ronquist F, Teslenko M, Van der Mark P, Ayres DL, Darling A, Höhna S, Larget B, Liu L, Suchard MA, Huelsenbeck JP. 2012. MrBayes 3.2: efficient Bayesian phylogenetic inference and model choice across a large model space. Syst Biol. 61:539–542.

Roure B, Baurain D, Philippe H. 2013. Impact of missing data on phylogenies inferred from empirical phylogenomic data sets. Mol Biol Evol. 30:197–214.

San Mauro D. 2010. A multilocus timescale for the origin of extant amphibians. Mol Phylogenet Evol. 56:554–561.

San Mauro D, Vences M, Alcobendas M, Zardoya R, Meyer A. 2005. Initial diversification of living amphibians predated the breakup of Pangaea. Am Nat. 165:590–599.

Shen XX, Liang D, Wen JZ, Zhang P. 2011. Multiple genome alignments facilitate development of NPCL markers: a case study of tetrapod phylogeny focusing on the position of turtles. Mol Biol Evol. 28:3237–3252.

Shen XX, Liang D, Zhang P. 2012. The development of three long universal nuclear protein-coding locus markers and their application to osteichthyan phylogenetics with nested PCR. PLoS One 7:e39256.

Shimodaira H. 2002. An approximately unbiased test of phylogenetic tree selection. Syst Biol. 51:492–508.

Shimodaira H, Hasegawa M. 2001. CONSEL: for assessing the confidence of phylogenetic tree selection. Bioinformatics 17:1246–1247.

Song S, Liu L, Edwards SV, Wu S. 2012. Resolving conflict in eutherian mammal phylogeny using phylogenomics and the multispecies coalescent model. Proc Natl Acad Sci U S A. 109:14942–14947.

Spinks PQ, Thomson RC, Barley AJ, Newman CE, Shaffer HB. 2010. Testing avian, squamate, and mammalian nuclear markers for cross amplification in turtles. Conserv Genet Resour. 2:127–129.

Spinks PQ, Thomson RC, Lovely GA, Shaffer HB. 2009. Assessing what is needed to resolve a molecular phylogeny: simulations and empirical data from Emydid turtles. BMC Evol Biol. 9:56.

Stamatakis A. 2006. RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics 22:2688–2690.

Tewhey R, Warner JB, Nakano M, Libby B, Medkova M, David PH, Kotsopoulos SK, Samuels ML, Hutchison JB, Larson JW. 2009. Microdroplet-based PCR enrichment for large-scale targeted sequencing. Nat Biotechnol. 27:1025–1031.

Thomson RC, Shedlock AM, Edwards SV, Shaffer HB. 2008. Developing markers for multilocus phylogenetics in non-model organisms: a test case with turtles. Mol Phylogenet Evol. 49:514–525.

Thomson RC, Wang IJ, Johnson JR. 2010. Genome-enabled development of DNA markers for ecology, evolution and conservation. Mol Ecol. 19:2184–2195.

Townsend TM, Alegre RE, Kelley ST, Wiens JJ, Reeder TW. 2008. Rapid development of multiple nuclear loci for phylogenetic analysis using genomic resources: an example from squamate reptiles. Mol Phylogenet Evol. 47:129–142.

Weisrock DW, Harmon LJ, Larson A. 2005. Resolving deep phylogenetic relationships in salamanders: analyses of mitochondrial and nuclear genomic DNA. Syst Biol. 54:758–777.

Wiens J, Bonett R, Chippindale P. 2005. Ontogeny discombobulates phylogeny: paedomorphosis and higher-level salamander relationships. Syst Biol. 54:91–110.


Wright TF, Schirtzinger EE, Matsumoto T, et al. (11 co-authors). 2008. A multilocus molecular phylogeny of the parrots (Psittaciformes): support for a Gondwanan origin during the Cretaceous. Mol Biol Evol. 25:2141–2156.

Zhang P, Wake DB. 2009. Higher-level salamander relationships and divergence dates inferred from complete mitochondrial genomes. Mol Phylogenet Evol. 53:492–508.

Zhang P, Zhou H, Chen YQ, Liu YF, Qu LH. 2005. Mitogenomic perspectives on the origin and phylogeny of living amphibians. Syst Biol. 54:391–400.

Zhou XM, Xu SX, Zhang P, Yang G. 2011. Developing a series of conservative anchor markers and their application to phylogenomics of Laurasiatherian mammals. Mol Ecol Resour. 11:134–140.


suitable as candidates for NPCL development with the nested PCR method. To search for long and conserved exons, we took advantage of our previous bioinformatic method, which used the multiple genome alignment data from the UCSC Genome Browser to identify conserved exons (Shen et al. 2011). Because the NPCL markers are to be used in vertebrates, we focused only on those multiple genome alignments that include at least six species: Danio rerio (zebrafish), Silurana tropicalis (frog), Anolis carolinensis (lizard), Gallus gallus (chicken), Mus musculus (mouse), and Homo sapiens (human). The alignments of candidate exons had to meet two criteria: a length of more than 700 bp and pairwise similarity ranging from 35% to 90%. The detailed bioinformatic pipeline has been described elsewhere (Shen et al. 2011). In addition to using multiple genome alignments to screen NPCL candidates, we also manually searched the ENSEMBL database for nuclear genes that were used previously (Murphy et al. 2001; Li et al. 2007; Townsend et al. 2008; Wright et al. 2008; Zhou et al. 2011; Song et al. 2012) to check whether they contain large and appropriately conserved exons.
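The two screening criteria for candidate exon alignments (length above 700 bp, pairwise similarity between 35% and 90%) can be expressed as a simple filter. This is a minimal sketch rather than the published pipeline: it assumes similarity means percent identity over ungapped columns and that every species pair must fall inside the range; the function names and the toy sequences are hypothetical.

```python
from itertools import combinations


def pairwise_identity(a, b):
    """Percent identity over columns where neither aligned sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def passes_criteria(alignment, min_len=700, sim_range=(35.0, 90.0)):
    """alignment maps species name -> aligned sequence (all of equal length)."""
    seqs = list(alignment.values())
    if len(seqs[0]) <= min_len:
        return False
    low, high = sim_range
    return all(low <= pairwise_identity(a, b) <= high for a, b in combinations(seqs, 2))


if __name__ == "__main__":
    # Toy 800-bp alignment in which every fifth site differs (80% identity).
    swap = {"A": "C", "C": "G", "G": "T", "T": "A"}
    ref = "ACGT" * 200
    alt = "".join(swap[c] if i % 5 == 0 else c for i, c in enumerate(ref))
    print(passes_criteria({"species_1": ref, "species_2": alt}))
```

The upper bound excludes exons too conserved to carry phylogenetic signal, while the lower bound excludes regions too divergent for reliable primer design.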

As a result, we assembled a total of 305 NPCL candidate alignments, of which 120 contained the appropriate number of conserved blocks, and used these to design nested PCR primers. To increase the success rates of our NPCL markers in amniotes, we manually added a turtle sequence (Chrysemys picta bellii) to each of the candidate alignments using data downloaded from the ENSEMBL database. A total of 480 primers were designed for the 120 NPCL candidates. Briefly, the first-round PCR primers are used only to enrich target regions from genomic environments and not to obtain target amplicons, so the degeneracy of these primers is normally high to increase reaction sensitivity; the second-round PCR primers are used to obtain target amplicons, so the degeneracy of these primers is lower to increase reaction specificity. Our previous study showed that the nested PCR method often produces strong and single amplicon bands (Shen et al. 2012). To facilitate the next-step direct sequencing, we added a tail (5′-AGGGTTTTCCCAGTCACGAC-3′) to the 5′-end of all second-round forward primers and a tail (5′-AGATAACAATTTCACACAGG-3′) to the 5′-end of all second-round reverse primers. These tail sequences provide two unique anchoring sites for direct sequencing from cleaned PCR products. In our pilot experiments, adding the tail sequences to primers did not affect the efficiency of the second-round PCR.

FIG. 6. Schematic representation of the experimental protocol for using our NPCL toolkit. Note that for each NPCL, nested PCR primers are designed on four short conserved blocks flanking the target region. Panels: (i) first PCR with primers F1 and R1 using gDNA as template (50–100 ng DNA in a 25 μl reaction), which enriches the target region from the complex genomic environment with one pair of highly degenerate primers; (ii) second PCR with tailed primers F2 and R2 using 1 μl of the first PCR product as template in a 25 μl reaction, which specifically amplifies the target region with one pair of tailed, low-degeneracy primers; (iii) PCR evaluation and sequencing, in which single and strong bands (normally >90%) are cleaned with ExoI and FastAP and sequenced directly with the general sequencing primers Seq_F and Seq_R, whereas other products require gel cutting or cloning before sequencing.
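The contrast between highly degenerate first-round primers and tailed, low-degeneracy second-round primers can be made concrete with a short calculation. In this sketch the two tail sequences are those given in the text, whereas the primer sequences and helper names are hypothetical; degeneracy is taken as the number of distinct sequences encoded by the IUPAC ambiguity codes.

```python
# Number of bases represented by each IUPAC nucleotide code.
IUPAC_FOLD = {
    "A": 1, "C": 1, "G": 1, "T": 1,
    "R": 2, "Y": 2, "S": 2, "W": 2, "K": 2, "M": 2,
    "B": 3, "D": 3, "H": 3, "V": 3, "N": 4,
}

FORWARD_TAIL = "AGGGTTTTCCCAGTCACGAC"  # tail from the text (also primer Seq_F)
REVERSE_TAIL = "AGATAACAATTTCACACAGG"  # tail from the text (also primer Seq_R)


def degeneracy(primer):
    """Number of distinct oligonucleotides encoded by a degenerate primer."""
    fold = 1
    for base in primer.upper():
        fold *= IUPAC_FOLD[base]
    return fold


def add_tail(primer, direction):
    """Prepend the sequencing tail to the 5' end of a second-round primer."""
    tail = FORWARD_TAIL if direction == "F" else REVERSE_TAIL
    return tail + primer


if __name__ == "__main__":
    first_round = "GCNATHGARTAYAC"   # hypothetical, highly degenerate
    second_round = "TGGAARCCATGGAC"  # hypothetical, low degeneracy
    print(degeneracy(first_round), degeneracy(second_round))
    print(add_tail(second_round, "F"))
```

Because every second-round amplicon carries the same two tails, a single pair of sequencing primers suffices for all markers.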

Experimental Testing for Candidate Markers in 16 Jawed Vertebrates

To test the experimental performance of our newly designed NPCL markers, we selected 16 taxa representing nine major jawed vertebrate lineages: Chondrichthyes (Sphyrna lewini), Actinopterygii (Lepisosteus oculatus and Pangasius sutchi), Dipnoi (Protopterus annectens), Lissamphibia (Ichthyophis bannanicus, Batrachuperus yenyuanensis, and Rana nigromaculata), Mammalia (Mus musculus and Sus scrofa domestica), Testudines (Trionyx sinensis and Podocnemis unifilis), Aves (Struthio camelus and Zosterops japonica), Crocodylia (Crocodylus siamensis), and Squamata (Hemidactylus bowringii and Naja naja atra). Total genomic DNA was extracted from ethanol-preserved tissues (liver or muscle) using the standard salt extraction protocol. All extracted genomic DNAs were diluted to a concentration of 50 ng/μl with 1× TE and stored at −20 °C before PCR amplification.

All 120 NPCL markers were tested with a two-round PCR strategy (nested PCR). The first-round PCR was performed in a 25 μl reaction containing 1–2 μl template DNA (50–100 ng), with final concentrations of 1× PCR buffer, 200 μM dNTP, 400 nM of each forward and reverse first-round primer, and 1.25 U Taq polymerase (TransTaq High Fidelity; TransGen, Beijing). The cycling conditions of the first-round PCR were as follows: an initial denaturation step of 4 min at 94 °C, followed by 35 cycles of a 45 s denaturation at 94 °C, a 40 s annealing at 45 °C, and a 2 min extension at 72 °C, followed by a final 10 min extension at 72 °C. The second-round PCR was also performed in a 25 μl reaction containing 1 μl of the first-round PCR product (without dilution) and final concentrations of 1× PCR buffer, 200 μM dNTP, 400 nM of each forward and reverse second-round primer, and 1.25 U Taq polymerase. The cycling conditions of the second-round PCR were as follows: an initial denaturation step of 4 min at 94 °C, followed by 35 cycles of a 45 s denaturation at 94 °C, a 40 s annealing at 50 °C, and a 90 s extension at 72 °C, followed by a final 10 min extension at 72 °C.
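For planning thermocycler time, the two programs above can be encoded as data and their hold times summed. This is a small illustrative sketch; the dictionary layout and function name are assumptions, and ramping time between temperatures is ignored, so actual run times will be longer.

```python
def program_seconds(initial, cycles, steps, final):
    """Total programmed hold time in seconds (temperature ramping not included)."""
    return initial + cycles * sum(steps) + final


# Hold times in seconds, taken from the cycling conditions in the text.
FIRST_ROUND = dict(initial=240, cycles=35, steps=(45, 40, 120), final=600)
SECOND_ROUND = dict(initial=240, cycles=35, steps=(45, 40, 90), final=600)

if __name__ == "__main__":
    for name, program in (("first round", FIRST_ROUND), ("second round", SECOND_ROUND)):
        seconds = program_seconds(**program)
        print(f"{name}: {seconds} s ({seconds / 60:.1f} min)")
```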

One microliter of the second-round PCR products was analyzed on a 1.0% TAE agarose gel. An NPCL marker was considered successful if more than 8 of the 16 tested taxa produced target amplicon bands. On this basis, 102 out of 120 tested NPCL markers were successful. The nested PCR primers for the 102 NPCL markers can be found in supplementary table S1, Supplementary Material online. If the PCR products contained significant nonspecific amplicon bands (normally <10%), they needed further processing, for example, standard gel cutting or cloning. If the PCR reactions produced single amplicon bands (normally >90%), they were cleaned with ExoFAP treatment: 2 U ExoI and 0.4 U FastAP (all Fermentas) were added to the PCR tube and incubated for 30 min at 37 °C and 15 min at 80 °C. The cleanup PCR reactions can be used directly as templates for Sanger sequencing. According to our experimental design, all PCR fragments can be sequenced from both ends with the two universal sequencing primers, Seq_F (5′-AGGGTTTTCCCAGTCACGAC-3′) and Seq_R (5′-AGATAACAATTTCACACAGG-3′). A typical Sanger sequencing reaction in our laboratory consumes 0.5 μl BigDye and 1 μl cleanup PCR product. The primer design strategy, the laboratory protocol for the nested PCR method, and the pretreatment of PCR products before Sanger sequencing are illustrated in figure 6.
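The success criterion for a marker (target bands in more than 8 of the 16 tested taxa) amounts to a threshold on a presence/absence table. The sketch below illustrates this with invented marker names and gel results; it is not the authors' scoring script.

```python
def successful_markers(results, min_exclusive=8):
    """Markers whose number of taxa with a target band exceeds the threshold."""
    return [m for m, bands in results.items() if sum(bands) > min_exclusive]


def overall_success_rate(results, markers):
    """Percentage of successful PCR reactions across the given markers."""
    total = sum(len(results[m]) for m in markers)
    hits = sum(sum(results[m]) for m in markers)
    return 100.0 * hits / total


if __name__ == "__main__":
    # Hypothetical gel results: True means a target band in that taxon (16 taxa).
    results = {
        "marker_A": [True] * 15 + [False],
        "marker_B": [True] * 8 + [False] * 8,   # exactly 8 of 16: not successful
        "marker_C": [True] * 9 + [False] * 7,
    }
    kept = successful_markers(results)
    print(kept, overall_success_rate(results, kept))
```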

Calculation of Relative Evolutionary Rate of 102 NPCLs

The rate multipliers (m) across partitions estimated in MrBayes 3.2 (Ronquist et al. 2012) are used as relative evolutionary rates. To calculate these parameters, alignments for each NPCL were prepared for 12 species: Homo sapiens, Macaca mulatta, Mus musculus, Rattus norvegicus, Gallus gallus, Meleagris gallopavo, Chrysemys picta bellii, Anolis carolinensis, Silurana tropicalis, Tetraodon nigroviridis, Takifugu rubripes, and Danio rerio. Because genome data are available for the 12 species, we did not generate any new data. The 102 NPCL alignments were then combined and subjected to MrBayes analyses partitioned by gene. Each gene was assigned a separate GTR + Γ + I model, and all model parameters were unlinked. Two Markov chain Monte Carlo (MCMC) runs were performed with one cold chain and three heated chains (temperature set to 0.1) for 50 million generations and sampled every 1,000 generations. The rate multiplier for each gene was estimated using Tracer version 1.4 after discarding the first 50% of the generations. All evolutionary rates were normalized by dividing by the maximum value of the obtained rates.
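The final normalization step is a division by the largest rate multiplier, so the fastest marker receives a relative rate of 1. A minimal sketch follows, with hypothetical gene names and multiplier values standing in for the estimates read from Tracer.

```python
def normalize_rates(rate_multipliers):
    """Scale per-gene rate multipliers so that the fastest gene equals 1.0."""
    fastest = max(rate_multipliers.values())
    return {gene: rate / fastest for gene, rate in rate_multipliers.items()}


if __name__ == "__main__":
    # Hypothetical posterior mean rate multipliers for three markers.
    print(normalize_rates({"gene_1": 0.5, "gene_2": 2.0, "gene_3": 1.0}))
```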

Gene and Taxon Sampling for Investigating Higher Level Salamander Relationships

To test the utility of our NPCL toolkit in a real case, we selected 19 salamander taxa representing all 10 salamander families and 9 outgroup taxa to investigate the family-level relationships of salamanders (supplementary table S2, Supplementary Material online). For gene sampling, we randomly selected 30 NPCL markers whose PCR success rates were more than 90% in the 16 previously tested vertebrates. Among the 840 target sequences (30 markers for 28 taxa), 201 were available in public databases (NCBI, UCSC, and ENSEMBL), whereas the remaining 639 sequences needed to be generated de novo. The experimental procedure was as described earlier. All obtained sequences were examined by checking for the presence of premature stop codons (which would indicate a pseudogene) and by BlastX searches against the nonredundant protein sequences (nr) database to confirm that they were our target genes. The NPCLs, species, and accession numbers for the newly obtained sequences are listed in supplementary table S2, Supplementary Material online.
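The premature stop codon check can be sketched as an in-frame scan of each newly obtained sequence. This example assumes the standard nuclear genetic code, an ungapped sequence, and a known reading frame; because the amplified fragments lie inside exons, any in-frame stop codon is treated as suspicious. The function name and the toy sequences are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard nuclear genetic code


def has_inframe_stop(sequence, frame=0):
    """True if any complete codon in the given frame (0, 1, or 2) is a stop codon."""
    seq = sequence.upper()
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return any(codon in STOP_CODONS for codon in codons)


if __name__ == "__main__":
    print(has_inframe_stop("ATGAAATGACCC"))           # codons ATG AAA TGA CCC
    print(has_inframe_stop("ATGAAACCC"))              # no stop codon in frame 0
    print(has_inframe_stop("CATGAAATGACC", frame=1))  # frame 1 reads ATG AAA TGA
```

Sequences flagged by such a scan would then be inspected manually, since a sequencing error can mimic a pseudogene.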

Phylogenetic Analyses

Alignments of all 30 NPCL markers were conducted using the G-INS-i method from MAFFT (Katoh et al. 2005) under the default settings according to their translated amino acid sequences, then refined by eye. All 30 refined alignments were combined into a concatenated data set (27,834 bp).
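Combining per-gene alignments into one concatenated data set can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: each alignment is a mapping from taxon name to aligned sequence, and a taxon missing from a gene is padded with `?`; the taxon names and sequences are toy values.

```python
def concatenate(alignments, missing="?"):
    """Join per-gene alignments taxon by taxon, padding genes a taxon lacks."""
    taxa = sorted({taxon for aln in alignments for taxon in aln})
    matrix = dict.fromkeys(taxa, "")
    for aln in alignments:
        length = len(next(iter(aln.values())))
        for taxon in taxa:
            matrix[taxon] += aln.get(taxon, missing * length)
    return matrix


if __name__ == "__main__":
    gene_1 = {"taxon_A": "ATGAAA", "taxon_B": "ATGAAG"}
    gene_2 = {"taxon_A": "CCCGGG"}  # taxon_B not sequenced for this gene
    for taxon, seq in concatenate([gene_1, gene_2]).items():
        print(taxon, seq)
```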


Thomson RC Shedlock AM Edwards SV Shaffer HB 2008 Developingmarkers for multilocus phylogenetics in non-model organismsA test case with turtles Mol Phylogenet Evol 49514ndash525

Thomson RC Wang IJ Johnson JR 2010 Genome-enabled developmentof DNA markers for ecology evolution and conservation Mol Ecol192184ndash2195

Townsend TM Alegre RE Kelley ST Wiens JJ Reeder TW 2008 Rapiddevelopment of multiple nuclear loci for phylogenetic analysis usinggenomic resources an example from squamate reptiles MolPhylogenet Evol 47129ndash142

Weisrock DW Harmon LJ Larson A 2005 Resolving deep phylogeneticrelationships in salamanders analyses of mitochondrial and nucleargenomic DNA Syst Biol 54758ndash777

Wiens J Bonett R Chippindale P 2005 Ontogeny discombobulatesphylogeny paedomorphosis and higher-level salamander relation-ships Syst Biol 5491ndash110

2247

Universal Toolkit Including 102 NPCL doi101093molbevmst122 MBE at V

anderbilt University - M

assey Law

Library on M

arch 18 2015httpm

beoxfordjournalsorgD

ownloaded from

Wright TF Schirtzinger EE Matsumoto T et al (11 co-authors) 2008A multilocus molecular phylogeny of the parrots (Psittaciformes)support for a Gondwanan origin during the cretaceous Mol BiolEvol 252141ndash2156

Zhang P Wake DB 2009 Higher-level salamander relationships anddivergence dates inferred from complete mitochondrial genomesMol Phylogenet Evol 53492ndash508

Zhang P Zhou H Chen YQ Liu YF Qu LH 2005 Mitogenomic per-spectives on the origin and phylogeny of living amphibians Syst Biol54391ndash400

Zhou XM Xu SX Zhang P Yang G 2011 Developing a series ofconservative anchor markers and their application to phylo-genomics of Laurasiatherian mammals Mol Ecol Resour 11134ndash140

2248

Shen et al doi101093molbevmst122 MBE at V

anderbilt University - M

assey Law

Library on M

arch 18 2015httpm

beoxfordjournalsorgD

ownloaded from

degeneracy of these primers is lower to increase reaction specificity. Our previous study showed that the nested PCR method often produces strong and single amplicon bands (Shen et al. 2012). To facilitate the next-step direct sequencing, we added a tail (5′-AGGGTTTTCCCAGTCACGAC-3′) to the 5′-end of all second-round forward primers and a tail (5′-AGATAACAATTTCACACAGG-3′) to the 5′-end of all second-round reverse primers. These tail sequences can provide two unique anchoring sites for direct sequencing from cleaned PCR products. In our pilot experiments, adding the tail sequences to primers did not affect the efficiency of the second-round PCR.
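The tailing step can be illustrated with a minimal sketch. The two tail sequences are those quoted in the text; the locus-specific primers and the helper name `add_tail` are hypothetical and serve only as an illustration.

```python
# Universal tails quoted in the text (written 5'->3').
FORWARD_TAIL = "AGGGTTTTCCCAGTCACGAC"
REVERSE_TAIL = "AGATAACAATTTCACACAGG"


def add_tail(tail: str, primer: str) -> str:
    """Attach the tail to the 5' end of a primer written 5'->3'."""
    return tail + primer


# Hypothetical second-round primers with IUPAC degenerate bases (not from the paper).
second_round = {"forward": "GGNTAYAARGCNATGGA", "reverse": "TCRTANGGRTCNACCAT"}

tailed = {
    "forward": add_tail(FORWARD_TAIL, second_round["forward"]),
    "reverse": add_tail(REVERSE_TAIL, second_round["reverse"]),
}
```

Because every amplicon then starts and ends with the same two tails, the tails themselves can serve as the sequencing primers regardless of locus.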

Experimental Testing for Candidate Markers in 16 Jawed Vertebrates

To test the experimental performance of our newly designed NPCL markers, we selected 16 taxa representing nine major jawed vertebrate lineages: Chondrichthyes (Sphyrna lewini), Actinopterygii (Lepisosteus oculatus and Pangasius sutchi), Dipnoi (Protopterus annectens), Lissamphibia (Ichthyophis bannanicus, Batrachuperus yenyuanensis, and Rana nigromaculata), Mammalia (Mus musculus and Sus scrofa domestica), Testudines (Trionyx sinensis and Podocnemis unifilis), Aves (Struthio camelus and Zosterops japonica), Crocodylia (Crocodylus siamensis), and Squamata (Hemidactylus bowringii and Naja naja atra). Total genomic DNA was extracted from ethanol-preserved tissues (liver or muscle) using the standard salt extraction protocol. All extracted genomic DNAs were diluted to a concentration of 50 ng µl⁻¹ with 1× TE and stored at −20 °C before PCR amplification.
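The dilution to the 50 ng/µl working concentration follows the usual relation C1·V1 = C2·V2. A small sketch is given below; the function name and the example stock concentration are assumptions for illustration, not values from the study.

```python
def te_volume_to_add(stock_ng_per_ul: float, stock_volume_ul: float,
                     target_ng_per_ul: float = 50.0) -> float:
    """Volume of TE (in µl) to add so the DNA reaches the target concentration."""
    if stock_ng_per_ul < target_ng_per_ul:
        raise ValueError("stock is already below the target concentration")
    # C1 * V1 = C2 * V2  ->  V2 = C1 * V1 / C2
    final_volume_ul = stock_ng_per_ul * stock_volume_ul / target_ng_per_ul
    return final_volume_ul - stock_volume_ul


# Hypothetical extract: 10 µl at 200 ng/µl needs 30 µl of TE.
example_te_ul = te_volume_to_add(200.0, 10.0)
```

At 50 ng/µl, the 1–2 µl of template used per first-round reaction corresponds to the 50–100 ng stated in the protocol.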

All 120 NPCL markers were tested with a two-round PCR strategy (nested PCR). The first-round PCR was performed in a 25 µl reaction containing 1–2 µl template DNA (50–100 ng), with final concentrations of 1× PCR buffer, 200 µM dNTP, 400 nM of each forward and reverse first-round primers, and 1.25 U Taq polymerase (TransTaq High Fidelity, TransGen, Beijing). The cycling conditions of the first-round PCR were as follows: an initial denaturation step of 4 min at 94 °C, followed by 35 cycles of a 45 s denaturation at 94 °C, a 40 s annealing at 45 °C, and a 2 min extension at 72 °C, followed by a final 10 min extension at 72 °C. The second-round PCR was also performed in a 25 µl reaction containing 1 µl of the first-round PCR product (without dilution) and final concentrations of 1× PCR buffer, 200 µM dNTP, 400 nM of each forward and reverse second-round primers, and 1.25 U Taq polymerase. The cycling conditions of the second-round PCR were as follows: an initial denaturation step of 4 min at 94 °C, followed by 35 cycles of a 45 s denaturation at 94 °C, a 40 s annealing at 50 °C, and a 90 s extension at 72 °C, followed by a final 10 min extension at 72 °C.
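For planning purposes, the two cycling programs can be written down as data and their programmed hold times summed. The sketch below uses the step durations and annealing temperatures given above; the class and field names are hypothetical, and ramp times between temperatures are ignored, so real run times will be longer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PcrProgram:
    initial_denaturation_s: int
    cycles: int
    denaturation_s: int
    annealing_s: int
    annealing_temp_c: int
    extension_s: int
    final_extension_s: int

    def hold_time_s(self) -> int:
        """Sum of programmed hold times, excluding thermocycler ramping."""
        per_cycle = self.denaturation_s + self.annealing_s + self.extension_s
        return (self.initial_denaturation_s
                + self.cycles * per_cycle
                + self.final_extension_s)


# First round: 45 °C annealing, 2 min extension.
FIRST_ROUND = PcrProgram(240, 35, 45, 40, 45, 120, 600)
# Second round: 50 °C annealing, 90 s extension.
SECOND_ROUND = PcrProgram(240, 35, 45, 40, 50, 90, 600)

total_hours = (FIRST_ROUND.hold_time_s() + SECOND_ROUND.hold_time_s()) / 3600
```

Under these assumptions, the two rounds together amount to a little over four hours of programmed hold time.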

One microliter of the second-round PCR products was analyzed on a 1.0% TAE agarose gel. An NPCL marker was considered successful if more than 8 of the 16 tested taxa produced target amplicon bands. On this basis, 102 out of 120 tested NPCL markers were successful. The nested-PCR primers for the 102 NPCL markers can be found in the online supplementary table S1, Supplementary Material online. If the PCR products contained significant nonspecific amplicon bands (normally <10%), they needed further processing, for example, standard gel cutting or cloning. If the PCR reactions produced single amplicon bands (normally >90%), they were cleaned with ExoFAP treatment: 2 U ExoI and 0.4 U FastAP (all Fermentas) were added to the PCR tube and incubated for 30 min at 37 °C and 15 min at 80 °C. The cleanup PCR reactions can be directly used as templates for Sanger sequencing. According to our experimental designs, all PCR fragments can be sequenced with the two universal sequencing primers, Seq_F: 5′-AGGGTTTTCCCAGTCACGAC-3′ and Seq_R: 5′-AGATAACAATTTCACACAGG-3′, from both ends. A typical Sanger sequencing reaction in our laboratory consumes 0.5 µl BigDye and 1 µl cleanup PCR product. The primer design strategy, the laboratory protocol for the nested PCR method, and the pretreatment of PCR products before Sanger sequencing are illustrated in figure 6.
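The marker-level success criterion can be stated compactly in code. The sketch below generalizes "more than 8 of 16" to "more than half of the tested taxa", which is our reading rather than a rule stated by the protocol; the function name is hypothetical.

```python
def marker_is_successful(amplified_taxa: int, tested_taxa: int = 16) -> bool:
    """True if more than half of the tested taxa gave the target band (>8 of 16)."""
    if not 0 <= amplified_taxa <= tested_taxa:
        raise ValueError("amplified_taxa must lie between 0 and tested_taxa")
    return 2 * amplified_taxa > tested_taxa


# Fraction of candidate markers retained in the toolkit: 102 of 120.
retained_fraction = 102 / 120
```

Note that a marker amplifying in exactly 8 of the 16 taxa does not meet the criterion.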

Calculation of Relative Evolutionary Rate of 102 NPCLs

The rate multipliers (m) across partitions estimated in MrBayes 3.2 (Ronquist et al. 2012) are used as relative evolutionary rates. To calculate these parameters, alignments for each NPCL were prepared for 12 species: Homo sapiens, Macaca mulatta, Mus musculus, Rattus norvegicus, Gallus gallus, Meleagris gallopavo, Chrysemys picta bellii, Anolis carolinensis, Silurana tropicalis, Tetraodon nigroviridis, Takifugu rubripes, and Danio rerio. Because genome data are available for the 12 species, we did not generate any new data. The 102 NPCL alignments were then combined and subjected to MrBayes analyses partitioned by genes. Each gene was assigned a separate GTR + Γ + I model, and all model parameters were unlinked. Two Markov chain Monte Carlo (MCMC) runs were performed with one cold chain and three heated chains (temperature set to 0.1) for 50 million generations and sampled every 1,000 generations. The rate multiplier for each gene was estimated using Tracer version 1.4 after discarding the first 50% of the generations. All evolutionary rates were normalized by dividing by the maximum value of the obtained rates.
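The final normalization step is a simple rescaling so that the fastest locus has a relative rate of 1. A minimal sketch follows; the locus names and rate values are invented for illustration.

```python
def normalize_rates(rate_multipliers: dict[str, float]) -> dict[str, float]:
    """Divide every rate multiplier by the largest one."""
    fastest = max(rate_multipliers.values())
    return {locus: rate / fastest for locus, rate in rate_multipliers.items()}


# Hypothetical posterior mean rate multipliers for three loci.
example = normalize_rates({"locusA": 0.5, "locusB": 2.0, "locusC": 1.0})
```

After normalization all values fall in the interval (0, 1], which makes loci directly comparable on a single relative scale.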

Gene and Taxon Sampling for Investigating Higher Level Salamander Relationships

To test the utility of our NPCL toolkit in a real case, we selected 19 salamander taxa representing all 10 salamander families and 9 outgroup taxa to investigate the family-level relationships of salamanders (supplementary table S2, Supplementary Material online). For gene sampling, we randomly selected 30 NPCL markers whose PCR success rates were more than 90% in the 16 previously tested vertebrates. Among the target 840 sequences (30 markers for 28 taxa), 201 were available in public databases (NCBI, UCSC, and ENSEMBL), whereas the remaining 639 sequences needed to be generated de novo. The experimental procedure was as described earlier. All obtained sequences were examined by checking for the presence of premature stop codons (pseudogene) and by BlastX searches against the nonredundant protein sequences (nr) to confirm that they were our target genes. The NPCLs, species, and accession numbers for the newly obtained sequences are listed in the supplementary table S2, Supplementary Material online.
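The stop-codon screen can be sketched as follows. This is an illustration only: it assumes the standard nuclear genetic code, an unaligned exon fragment, and a known reading frame, and the function name is hypothetical. Because the NPCL fragments are internal exon segments, any in-frame stop codon is treated as suspicious.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard nuclear genetic code


def has_in_frame_stop(fragment: str, frame: int = 0) -> bool:
    """Report whether an exon fragment contains a stop codon in the given frame."""
    seq = fragment.upper()
    # Walk complete codons only; a trailing partial codon is ignored.
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return any(codon in STOP_CODONS for codon in codons)


# Sequence bookkeeping from the text: 30 markers x 28 taxa, 201 already public.
target_sequences = 30 * 28
generated_de_novo = target_sequences - 201
```

A fragment flagged by such a screen would still need manual inspection, since a sequencing error can also introduce an apparent stop codon.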

Phylogenetic Analyses

Alignments of all 30 NPCL markers were conducted using the G-INS-i method from MAFFT (Katoh et al. 2005) under the default settings according to their translated amino acid sequences, then refined by eye. All 30 refined alignments were combined into a concatenated data set (27,834 bp).
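Concatenation of per-locus alignments into a supermatrix can be sketched as below. The representation (one dictionary of taxon to aligned sequence per locus) and the use of "?" for loci missing in a taxon are assumptions made for this illustration, not a description of the software used in the study.

```python
def concatenate(alignments: list[dict[str, str]]):
    """Join per-locus alignments; return the supermatrix and 1-based locus boundaries."""
    taxa = sorted({taxon for aln in alignments for taxon in aln})
    supermatrix = {taxon: "" for taxon in taxa}
    boundaries = []
    start = 1
    for aln in alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within one alignment must have equal length")
        length = lengths.pop()
        for taxon in taxa:
            # Taxa lacking this locus are padded with missing-data symbols.
            supermatrix[taxon] += aln.get(taxon, "?" * length)
        boundaries.append((start, start + length - 1))
        start += length
    return supermatrix, boundaries


# Toy example with two loci and two taxa.
matrix, bounds = concatenate([{"A": "ATG", "B": "ATA"}, {"A": "CCCGGG"}])
```

The returned boundaries are what a gene-wise partitioning scheme would be built from.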

For the concatenated data set, we manually defined five partitioning strategies: 2 partitions (one for codon positions 1 and 2 and one for codon position 3); 3 partitions (one partition for each codon position); 30 partitions (one partition for each gene); 60 partitions (one for codon position 1 + 2 and one for codon position 3 across 30 genes); and 90 partitions (codon position partitioning across 30 genes). Comparisons of the five partitioning strategies and selections of corresponding nucleotide substitution models were conducted under the Bayesian information criterion implemented in PartitionFinder (Lanfear et al. 2012). The 3-partition scheme (one partition for each codon position) was chosen as the best-fitting partitioning strategy, and all 3 partitions favored the GTR + Γ + I model.
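The five schemes differ only in whether sites are split by gene and into how many codon-position classes. The sketch below counts the partitions of each scheme and writes the codon-position scheme in the RAxML-style partition-file syntax (`DNA, name = start-end\3`). It assumes that the concatenated alignment starts at codon position 1 and that every gene is in frame; the function names are hypothetical.

```python
def partition_count(n_genes: int, by_gene: bool, codon_classes: int) -> int:
    """Number of partitions for a scheme split by gene and/or codon class."""
    return (n_genes if by_gene else 1) * codon_classes


def codon_partitions(total_length: int) -> list[str]:
    """One partition per codon position over the whole concatenated alignment."""
    if total_length % 3 != 0:
        raise ValueError("an in-frame coding alignment should be a multiple of 3")
    return [f"DNA, codon{pos} = {pos}-{total_length}\\3" for pos in (1, 2, 3)]


# The five schemes in the order described in the text, for 30 genes.
scheme_sizes = [
    partition_count(30, False, 2),
    partition_count(30, False, 3),
    partition_count(30, True, 1),
    partition_count(30, True, 2),
    partition_count(30, True, 3),
]
lines = codon_partitions(27834)
```

The `\3` suffix selects every third site from the given start, which is how the three codon positions are separated.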

The concatenated data set was separately analyzed with both ML and Bayesian inference (BI) methods under the 3-partition scheme. Partitioned ML analyses were implemented using RAxML version 7.2.6 (Stamatakis 2006) with the GTR + Γ + I model assigned for each partition. A search that combined 100 separate ML searches was applied to find the optimal tree, and branch support for each node was evaluated with 500 standard bootstrapping replicates (-f d -b 500 option) implemented in RAxML. The partitioned BI was conducted using MrBayes 3.2 (Ronquist et al. 2012). All model parameters were unlinked. Two MCMC runs were performed with one cold chain and three heated chains (temperature set to 0.1) for 60 million generations and sampled every 1,000 generations. The chain stationarity was visualized by plotting −ln L against the generation number using Tracer version 1.4, and the first 50% of generations were discarded. Topologies and posterior probabilities were estimated from the remaining generations. Two runs for each analysis were compared for congruence.
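The bookkeeping behind the burn-in can be made explicit. The sketch below assumes a discarded fraction of 50% and ignores the extra sample taken at generation 0; the function name is hypothetical and the code does not reproduce any MrBayes or Tracer functionality.

```python
def discard_burn_in(samples: list, burn_in_fraction: float = 0.5) -> list:
    """Drop the leading fraction of an MCMC sample and return the rest."""
    if not 0.0 <= burn_in_fraction < 1.0:
        raise ValueError("burn_in_fraction must be in [0, 1)")
    cut = int(len(samples) * burn_in_fraction)
    return samples[cut:]


generations = 60_000_000
sample_every = 1_000
samples_per_run = generations // sample_every
retained_per_run = len(discard_burn_in(list(range(samples_per_run))))
```

Under these assumptions each run contributes 30,000 post-burn-in samples to the posterior summary.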

We also performed Bayesian phylogenetic analyses under a mixture model CAT + GTR + Γ4 in PhyloBayes 3.3 (Lartillot et al. 2009) with two independent MCMC runs. Each run was performed for 10,000 cycles and sampled every cycle. Stationarity was reached when the largest discrepancy (maxdiff) was less than 0.1 between two independent runs. The first 5,000 trees in each MCMC run were discarded. The remaining 10,000 trees of the two runs were sampled every 5 trees to generate a 50% majority-rule posterior consensus tree.
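The thinning and the majority-rule summary can be sketched in a few lines. In this illustration each sampled tree is represented simply as a set of clades (frozensets of taxon names), which is a simplification chosen for the example; it is not how PhyloBayes stores or summarizes trees.

```python
from collections import Counter


def pool_and_thin(runs: list[list], burn_in: int, every: int) -> list:
    """Discard the burn-in of each run, pool the runs, and keep every n-th tree."""
    pooled = [tree for run in runs for tree in run[burn_in:]]
    return pooled[::every]


def majority_rule_clades(trees: list[set], threshold: float = 0.5) -> dict:
    """Clades found in more than `threshold` of the trees, with their frequencies."""
    counts = Counter(clade for tree in trees for clade in tree)
    n = len(trees)
    return {clade: c / n for clade, c in counts.items() if c / n > threshold}


# Sample sizes from the text: 2 runs x 10,000 cycles, 5,000 discarded, every 5th kept.
n_kept = len(pool_and_thin([list(range(10_000)), list(range(10_000))], 5_000, 5))

# Toy posterior sample of three trees over taxa A, B, C.
ab, bc = frozenset("AB"), frozenset("BC")
support = majority_rule_clades([{ab}, {ab}, {bc}])
```

With the settings described above, 2,000 trees enter the consensus, and the frequency of each retained clade is its posterior support.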

Species tree estimation was conducted using the pseudo-ML approach in the program MP-EST (Liu et al. 2010) under the coalescent model. Briefly, the gene trees for 30 NPCL markers were reconstructed with ML under the GTR + Γ8 model using PHYML 3.0 (Guindon et al. 2010). The resulting 30 gene trees were rooted with an outgroup (Chrysemys picta bellii) and used to generate an MP-EST species tree using the program MP-EST. The robustness of the species tree was evaluated with nonparametric bootstrapping of 500 replicates.
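Nonparametric bootstrapping resamples alignment columns with replacement to build pseudo-replicate data sets. The sketch below shows this generic resampling step for a single locus; how the 500 replicates were actually assembled for MP-EST is not detailed in the text, so this should be read as an illustration of the principle only.

```python
import random


def bootstrap_alignment(alignment: dict[str, str], rng: random.Random) -> dict[str, str]:
    """Resample alignment columns with replacement, keeping taxa in register."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    # The same column indices are applied to every taxon.
    return {taxon: "".join(seq[i] for i in columns) for taxon, seq in alignment.items()}


# Toy locus with two taxa; a fixed seed keeps the example reproducible.
toy = {"a": "ACGT", "b": "TGCA"}
replicate = bootstrap_alignment(toy, random.Random(42))
```

A gene tree would then be estimated from each pseudo-replicate, and the set of replicate gene trees passed on to the species-tree method.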

Alternative phylogenetic hypotheses were tested based on the 30-gene data set. We first calculated sitewise log likelihoods of alternative trees using RAxML (-f g) under the GTR + Γ + I model. Then, the site log-likelihood file was used to estimate P values for each alternative tree in the CONSEL program (Shimodaira and Hasegawa 2001), using the Kishino–Hasegawa test (KH) (Kishino and Hasegawa 1989), the approximately unbiased test (AU) (Shimodaira 2002), and the RELL bootstrap proportion test (BP).
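Of the three tests, the RELL bootstrap proportion is the simplest to illustrate: site log-likelihoods are resampled with replacement, and the proportion of replicates in which each tree has the highest total is recorded. The sketch below is a minimal version written for this purpose; it is not CONSEL's implementation, and the tree names and values are invented.

```python
import random


def rell_bootstrap_proportions(site_lnl: dict[str, list[float]],
                               replicates: int = 1000,
                               seed: int = 1) -> dict[str, float]:
    """Fraction of RELL replicates in which each tree has the highest summed lnL."""
    rng = random.Random(seed)
    names = list(site_lnl)
    n_sites = len(site_lnl[names[0]])
    wins = dict.fromkeys(names, 0)
    for _ in range(replicates):
        # Resample site indices with replacement; no likelihood re-estimation.
        sites = [rng.randrange(n_sites) for _ in range(n_sites)]
        totals = {name: sum(site_lnl[name][i] for i in sites) for name in names}
        wins[max(totals, key=totals.get)] += 1
    return {name: wins[name] / replicates for name in names}


# Toy input: tree1 is better than tree2 at every site.
bp = rell_bootstrap_proportions(
    {"tree1": [-1.0, -2.0, -1.5], "tree2": [-1.2, -2.5, -1.6]}, replicates=200
)
```

Because the sitewise likelihoods are reused rather than re-optimized for every replicate, the procedure is fast even for long alignments.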

Supplementary Material

Supplementary tables S1 and S2 and figures S1 and S2 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).

Acknowledgments

The authors are grateful to David Wake and Carol Spencer of the Museum of Vertebrate Zoology at the University of California, Berkeley, for tissue loans. David Wake provided many useful comments on the manuscript. This work was supported by National Natural Science Foundation of China grants (31172075 and 30900136) and the National Science Fund for Excellent Young Scholars (no. pending).

References

Binladen J, Gilbert MTP, Bollback JP, Panitz F, Bendixen C, Nielsen R, Willerslev E. 2007. The use of coded PCR primers enables high-throughput sequencing of multiple homolog amplification products by 454 parallel sequencing. PLoS One 2:e197.

Crawford NG, Faircloth BC, McCormack JE, Brumfield RT, Winker K, Glenn TC. 2012. More than 1000 ultraconserved elements provide evidence that turtles are the sister group to archosaurs. Biol Lett. 8:783–786.

Delsuc F, Brinkmann H, Philippe H. 2005. Phylogenomics and the reconstruction of the tree of life. Nat Rev Genet. 6:361–375.

Duellman WE, Trueb L. 1994. Biology of amphibians. Baltimore (MD): Johns Hopkins University Press.

Dunn CW, Hejnol A, Matus DQ, et al. (18 co-authors). 2008. Broad phylogenomic sampling improves resolution of the animal tree of life. Nature 452:745–749.

Faircloth BC, Glenn TC. 2012. Not all sequence tags are created equal: designing and validating sequence identification tags robust to indels. PLoS One 7:e42543.

Faircloth BC, McCormack JE, Crawford NG, Brumfield RT, Glenn TC. 2012. Ultraconserved elements anchor thousands of genetic markers for target enrichment spanning multiple evolutionary timescales. Syst Biol. 61:717–726.

Fong JJ, Brown JM, Fujita MK, Boussau B. 2012. A phylogenomic approach to vertebrate phylogeny supports a turtle-archosaur affinity and a possible paraphyletic lissamphibia. PLoS One 7:e48990.

Fong JJ, Fujita MK. 2011. Evaluating phylogenetic informativeness and data-type usage for new protein-coding genes across Vertebrata. Mol Phylogenet Evol. 61:300–307.

Frost DR, Grant T, Faivovich J, et al. (18 co-authors). 2006. The amphibian tree of life. Bull Am Mus Nat Hist. 297:8–370.

Gao KQ, Shubin NH. 2001. Late Jurassic salamanders from northern China. Nature 410:574–577.

Guindon S, Dufayard J-F, Lefort V, Anisimova M, Hordijk W, Gascuel O. 2010. New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Syst Biol. 59:307–321.

Hugall AF, Foster R, Lee MSY. 2007. Calibration choice, rate smoothing, and the pattern of tetrapod diversification according to the long nuclear gene RAG-1. Syst Biol. 56:543–563.

Inoue JG, Miya M, Lam K, Tay BH, Danks JA, Bell J, Walker TI, Venkatesh B. 2010. Evolutionary origin and phylogeny of the modern holocephalans (Chondrichthyes: Chimaeriformes): a mitogenomic perspective. Mol Biol Evol. 27:2576–2586.

Katoh K, Kuma K, Toh H, Miyata T. 2005. MAFFT version 5: improvement in accuracy of multiple sequence alignment. Nucleic Acids Res. 33:511–518.

Kishino H, Hasegawa M. 1989. Evaluation of the maximum likelihood estimate of the evolutionary tree topologies from DNA sequence data, and the branching order in Hominoidea. J Mol Evol. 29:170–179.

Künstner A, Wolf JBW, Backström N, et al. (13 co-authors). 2010. Comparative genomics based on massive parallel transcriptome sequencing reveals patterns of substitution and selection across 10 bird species. Mol Ecol. 19:266–276.

Lanfear R, Calcott B, Ho SYW, Guindon S. 2012. Partitionfinder: combined selection of partitioning schemes and substitution models for phylogenetic analyses. Mol Biol Evol. 29:1695–1701.

Lartillot N, Lepage T, Blanquart S. 2009. PhyloBayes 3: a Bayesian software package for phylogenetic reconstruction and molecular dating. Bioinformatics 25:2286–2288.

Lemmon AR, Brown JM, Stanger-Hall K, Lemmon EM. 2009. The effect of ambiguous data on phylogenetic estimates obtained by maximum likelihood and Bayesian inference. Syst Biol. 58:130–145.

Lemmon AR, Emme SA, Lemmon EM. 2012. Anchored hybrid enrichment for massively high-throughput phylogenomics. Syst Biol. 61:727–744.

Li C, Ortí G, Zhang G, Lu G. 2007. A practical approach to phylogenomics: the phylogeny of ray-finned fish (Actinopterygii) as a case study. BMC Evol Biol. 7:44.

Li C, Riethoven JJ, Naylor GJ. 2012. EvolMarkers: a database for mining exon and intron markers for evolution, ecology and conservation studies. Mol Ecol Resour. 12:967–971.

Liu L, Yu L, Edwards SV. 2010. A maximum pseudo-likelihood approach for estimating species trees under the coalescent model. BMC Evol Biol. 10:302.

McCormack JE, Faircloth BC, Crawford NG, Gowaty PA, Brumfield RT, Glenn TC. 2012. Ultraconserved elements are novel phylogenomic markers that resolve placental mammal phylogeny when combined with species-tree analysis. Genome Res. 22:746–754.

McCormack JE, Hird SM, Zellmer AJ, Carstens BC, Brumfield RT. 2013. Applications of next-generation sequencing to phylogeography and phylogenetics. Mol Phylogenet Evol. 66:526–538.

Meyer A, Van de Peer Y. 2005. From 2R to 3R: evidence for a fish-specific genome duplication (FSGD). Bioessays 27:937–945.

Meyer M, Stenzel U, Hofreiter M. 2008. Parallel tagged sequencing on the 454 platform. Nat Protoc. 3:267–278.

Murphy WJ, Eizirik E, Johnson WE, Zhang YP, Ryder OA, O'Brien SJ. 2001. Molecular phylogenetics and the origins of placental mammals. Nature 409:614–618.

Philippe H, Derelle R, Lopez P, et al. (20 co-authors). 2009. Phylogenomics revives traditional views on deep animal relationships. Curr Biol. 19:706–712.

Philippe H, Telford M. 2006. Large-scale sequencing and the new animal phylogeny. Trends Ecol Evol. 21:614–620.

Portik DM, Wood PL, Grismer JL, Stanley EL, Jackman TR. 2011. Identification of 104 rapidly-evolving nuclear protein-coding markers for amplification across scaled reptiles using genomic resources. Conserv Genet Resour. 4:1–10.

Pyron RA, Wiens JJ. 2011. A large-scale phylogeny of Amphibia including over 2800 species, and a revised classification of extant frogs, salamanders, and caecilians. Mol Phylogenet Evol. 61:543–583.

Roelants K, Gower DJ, Wilkinson M, Loader SP, Biju SD, Guillaume K, Moriau L, Bossuyt F. 2007. Global patterns of diversification in the history of modern amphibians. Proc Natl Acad Sci U S A. 104:887–892.

Rokas A, Williams BL, King N, Carroll SB. 2003. Genome-scale approaches to resolving incongruence in molecular phylogenies. Nature 425:798–804.

Ronquist F, Teslenko M, Van der Mark P, Ayres DL, Darling A, Höhna S, Larget B, Liu L, Suchard MA, Huelsenbeck JP. 2012. MrBayes 3.2: efficient Bayesian phylogenetic inference and model choice across a large model space. Syst Biol. 61:539–542.

Roure B, Baurain D, Philippe H. 2013. Impact of missing data on phylogenies inferred from empirical phylogenomic data sets. Mol Biol Evol. 30:197–214.

San Mauro D. 2010. A multilocus timescale for the origin of extant amphibians. Mol Phylogenet Evol. 56:554–561.

San Mauro D, Vences M, Alcobendas M, Zardoya R, Meyer A. 2005. Initial diversification of living amphibians predated the breakup of Pangaea. Am Nat. 165:590–599.

Shen XX, Liang D, Wen JZ, Zhang P. 2011. Multiple genome alignments facilitate development of NPCL markers: a case study of tetrapod phylogeny focusing on the position of turtles. Mol Biol Evol. 28:3237–3252.

Shen XX, Liang D, Zhang P. 2012. The development of three long universal nuclear protein-coding locus markers and their application to osteichthyan phylogenetics with nested PCR. PLoS One 7:e39256.

Shimodaira H. 2002. An approximately unbiased test of phylogenetic tree selection. Syst Biol. 51:492–508.

Shimodaira H, Hasegawa M. 2001. CONSEL: for assessing the confidence of phylogenetic tree selection. Bioinformatics 17:1246–1247.

Song S, Liu L, Edwards SV, Wu S. 2012. Resolving conflict in eutherian mammal phylogeny using phylogenomics and the multispecies coalescent model. Proc Natl Acad Sci U S A. 109:14942–14947.

Spinks PQ, Thomson RC, Barley AJ, Newman CE, Shaffer HB. 2010. Testing avian, squamate, and mammalian nuclear markers for cross amplification in turtles. Conserv Genet Resour. 2:127–129.

Spinks PQ, Thomson RC, Lovely GA, Shaffer HB. 2009. Assessing what is needed to resolve a molecular phylogeny: simulations and empirical data from Emydid turtles. BMC Evol Biol. 9:56.

Stamatakis A. 2006. RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics 22:2688–2690.

Tewhey R, Warner JB, Nakano M, Libby B, Medkova M, David PH, Kotsopoulos SK, Samuels ML, Hutchison JB, Larson JW. 2009. Microdroplet-based PCR enrichment for large-scale targeted sequencing. Nat Biotechnol. 27:1025–1031.

Thomson RC, Shedlock AM, Edwards SV, Shaffer HB. 2008. Developing markers for multilocus phylogenetics in non-model organisms: a test case with turtles. Mol Phylogenet Evol. 49:514–525.

Thomson RC, Wang IJ, Johnson JR. 2010. Genome-enabled development of DNA markers for ecology, evolution and conservation. Mol Ecol. 19:2184–2195.

Townsend TM, Alegre RE, Kelley ST, Wiens JJ, Reeder TW. 2008. Rapid development of multiple nuclear loci for phylogenetic analysis using genomic resources: an example from squamate reptiles. Mol Phylogenet Evol. 47:129–142.

Weisrock DW, Harmon LJ, Larson A. 2005. Resolving deep phylogenetic relationships in salamanders: analyses of mitochondrial and nuclear genomic DNA. Syst Biol. 54:758–777.

Wiens J, Bonett R, Chippindale P. 2005. Ontogeny discombobulates phylogeny: paedomorphosis and higher-level salamander relationships. Syst Biol. 54:91–110.

Wright TF, Schirtzinger EE, Matsumoto T, et al. (11 co-authors). 2008. A multilocus molecular phylogeny of the parrots (Psittaciformes): support for a Gondwanan origin during the cretaceous. Mol Biol Evol. 25:2141–2156.

Zhang P, Wake DB. 2009. Higher-level salamander relationships and divergence dates inferred from complete mitochondrial genomes. Mol Phylogenet Evol. 53:492–508.

Zhang P, Zhou H, Chen YQ, Liu YF, Qu LH. 2005. Mitogenomic perspectives on the origin and phylogeny of living amphibians. Syst Biol. 54:391–400.

Zhou XM, Xu SX, Zhang P, Yang G. 2011. Developing a series of conservative anchor markers and their application to phylogenomics of Laurasiatherian mammals. Mol Ecol Resour. 11:134–140.

Shen et al. doi:10.1093/molbev/mst122 MBE

Page 12: A Versatile and Highly Efficient Toolkit Including 102 ...completion of many parts of the vertebrate tree of life. Key words: nuclear marker, phylogenomic, vertebrate, salamander,

protein sequences (nr) to confirm that they were our targetgenes The NPCLs species and accession numbers for thenewly obtained sequences are listed in the supplementarytable S2 Supplementary Material online

Phylogenetic Analyses

Alignments of all 30 NPCL markers were conducted using theG-INS-i method from MAFFT (Katoh et al 2005) under thedefault settings according to their translated amino acid se-quences then refined by eye All 30 refined alignments werecombined into a concatenated data set (27834 bp)

For the concatenated data set we manually defined fivepartitioning strategies 2 partitions (one for codon positions1 and 2 and one for codon position 3) 3 partitions (onepartition for each codon position) 30 partitions (one parti-tion for each gene) 60 partitions (one for codon position1 + 2 and one codon position 3 across 30 genes) and 90 par-titions (codon position partitioning across 30 genes)Comparisons of the five partitioning strategies and selectionsof corresponding nucleotide substitution models were con-ducted under the Bayesian information criterion imple-mented in PartitionFinder (Lanfear et al 2012) The3-partition scheme (one partition for each codon position)was chosen as the best-fitting partitioning strategy and all 3partitions favored the GTR + + I model

The concatenated data set was separately analyzed withboth ML and Bayesian inference (BI) methods under the 3-partition scheme Partitioned ML analyses were implementedusing RAxML version 726 (Stamatakis 2006) with theGTR + + I model assigned for each partition A searchthat combined 100 separate ML searches was applied tofind the optimal tree and branch support for each nodewas evaluated with 500 standard bootstrapping replicates(-f d -b 500 option) implemented in RAxML The partitionedBI was conducted using MrBayes 32 (Ronquist et al 2012)All model parameters were unlinked Two MCMC runs wereperformed with one cold chain and three heated chains (tem-perature set to 01) for 60 million generations and sampledevery 1000 generations The chain stationarity was visualizedby plottingln L against the generation number using Tracerversion 14 and the first 50 of generations were discardedTopologies and posterior probabilities were estimated fromthe remaining generations Two runs for each analysis werecompared for congruence

We also performed Bayesian phylogenetic analyses under amixture model CAT + GTR + 4 in PhyloBayes 33 (Lartillotet al 2009) with two independent MCMC runs Each run wasperformed for 10000 cycles and sampled every cycleStationarity was reached when the largest discrepancy(maxdiff) was less than 01 between two independent runsThe first 5000 trees in each MCMC run were discardedThe remaining 10000 trees of the two runs were sampledevery 5 trees to generate a 50 majority-rule posteriorconsensus tree

Species tree estimation was conducted using the pseudo-ML approach in the program MP-EST (Liu et al 2010) underthe coalescent model Briefly the gene trees for 30 NPCLmarkers were reconstructed with ML under the GTR + 8

model using PHYML 30 (Guindon et al 2010) The resulting30 gene trees were rooted with an outgroup (Chrysemys pictabellii) and used to generate an MP-EST species tree usingthe program MP-EST The robustness of the species treewas evaluated with nonparametric bootstrapping of 500replicates

Alternative phylogenetic hypotheses were tested basedon 30-gene data set We first calculated sitewise log likeli-hoods of alternative trees using RAxML (-f g) under theGTR + + I model Then the site log-likelihood file wasused to estimate P values for each alternative tree inCONSEL program (Shimodaira and Hasegawa 2001) usingthe KishinondashHasegawa test (KH) (Kishino and Hasegawa1989) the approximately unbiased test (AU) (Shimodaira2002) and the RELL bootstrap proportion test (BP)

Supplementary MaterialSupplementary tables S1 and S2 and figures S1 and S2 areavailable at Molecular Biology and Evolution online (httpwwwmbeoxfordjournalsorg)

Acknowledgments

The authors are grateful to David Wake and Carol Spencerof the Museum of Vertebrate Zoology at the Universityof California Berkeley for tissue loans David Wake pro-vided many useful comments on the manuscript Thiswork was supported by National Natural ScienceFoundation of China grants (31172075 and 30900136) andthe National Science Fund for Excellent Young Scholars(no pending)

ReferencesBinladen J Gilbert MTP Bollback JP Panitz F Bendixen C Nielsen R

Willerslev E 2007 The use of coded PCR primers enables high-throughput sequencing of multiple homolog amplification productsby 454 parallel sequencing PLoS One 2e197

Crawford NG Faircloth BC McCormack JE Brumfield RT Winker KGlenn TC 2012 More than 1000 ultraconserved elements provideevidence that turtles are the sister group to archosaurs Biol Lett 8783ndash786

Delsuc F Brinkmann H Philippe H 2005 Phylogenomics and thereconstruction of the tree of life Nat Rev Genet 6361ndash375

Duellman WE Trueb L 1994 Biology of amphibians Baltimore (MD)Johns Hopkins University Press

Dunn CW Hejnol A Matus DQ et al (18 co-authors) 2008 Broadphylogenomic sampling improves resolution of the animal tree oflife Nature 452745ndash749

Faircloth BC Glenn TC 2012 Not all sequence tags are created equaldesigning and validating sequence identification tags robust toindels PLoS One 7e42543

Faircloth BC McCormack JE Crawford NG Brumfield RT Glenn TC2012 Ultraconserved elements anchor thousands of genetic markersfor target enrichment spanning multiple evolutionary timescalesSyst Biol 61717ndash726

Fong JJ Brown JM Fujita MK Boussau B 2012 A phylogenomicapproach to vertebrate phylogeny supports a turtle-archosauraffinity and a possible paraphyletic lissamphibia PLoS One 7e48990

Fong JJ Fujita MK 2011 Evaluating phylogenetic informativeness anddata-type usage for new protein-coding genes across VertebrataMol Phylogenet Evol 61300ndash307

Frost DR Grant T Faivovich J et al (18 co-authors) 2006 The amphib-ian tree of life Bull Am Mus Nat Hist 2978ndash370

2246

Shen et al doi101093molbevmst122 MBE at V

anderbilt University - M

assey Law

Library on M

arch 18 2015httpm

beoxfordjournalsorgD

ownloaded from

Gao KQ, Shubin NH. 2001. Late Jurassic salamanders from northern China. Nature. 410:574–577.

Guindon S, Dufayard J-F, Lefort V, Anisimova M, Hordijk W, Gascuel O. 2010. New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Syst Biol. 59:307–321.

Hugall AF, Foster R, Lee MSY. 2007. Calibration choice, rate smoothing, and the pattern of tetrapod diversification according to the long nuclear gene RAG-1. Syst Biol. 56:543–563.

Inoue JG, Miya M, Lam K, Tay BH, Danks JA, Bell J, Walker TI, Venkatesh B. 2010. Evolutionary origin and phylogeny of the modern holocephalans (Chondrichthyes: Chimaeriformes): a mitogenomic perspective. Mol Biol Evol. 27:2576–2586.

Katoh K, Kuma K, Toh H, Miyata T. 2005. MAFFT version 5: improvement in accuracy of multiple sequence alignment. Nucleic Acids Res. 33:511–518.

Kishino H, Hasegawa M. 1989. Evaluation of the maximum likelihood estimate of the evolutionary tree topologies from DNA sequence data, and the branching order in Hominoidea. J Mol Evol. 29:170–179.

Künstner A, Wolf JBW, Backström N, et al. (13 co-authors). 2010. Comparative genomics based on massive parallel transcriptome sequencing reveals patterns of substitution and selection across 10 bird species. Mol Ecol. 19:266–276.

Lanfear R, Calcott B, Ho SYW, Guindon S. 2012. PartitionFinder: combined selection of partitioning schemes and substitution models for phylogenetic analyses. Mol Biol Evol. 29:1695–1701.

Lartillot N, Lepage T, Blanquart S. 2009. PhyloBayes 3: a Bayesian software package for phylogenetic reconstruction and molecular dating. Bioinformatics. 25:2286–2288.

Lemmon AR, Brown JM, Stanger-Hall K, Lemmon EM. 2009. The effect of ambiguous data on phylogenetic estimates obtained by maximum likelihood and Bayesian inference. Syst Biol. 58:130–145.

Lemmon AR, Emme SA, Lemmon EM. 2012. Anchored hybrid enrichment for massively high-throughput phylogenomics. Syst Biol. 61:727–744.

Li C, Ortí G, Zhang G, Lu G. 2007. A practical approach to phylogenomics: the phylogeny of ray-finned fish (Actinopterygii) as a case study. BMC Evol Biol. 7:44.

Li C, Riethoven JJ, Naylor GJ. 2012. EvolMarkers: a database for mining exon and intron markers for evolution, ecology and conservation studies. Mol Ecol Resour. 12:967–971.

Liu L, Yu L, Edwards SV. 2010. A maximum pseudo-likelihood approach for estimating species trees under the coalescent model. BMC Evol Biol. 10:302.

McCormack JE, Faircloth BC, Crawford NG, Gowaty PA, Brumfield RT, Glenn TC. 2012. Ultraconserved elements are novel phylogenomic markers that resolve placental mammal phylogeny when combined with species-tree analysis. Genome Res. 22:746–754.

McCormack JE, Hird SM, Zellmer AJ, Carstens BC, Brumfield RT. 2013. Applications of next-generation sequencing to phylogeography and phylogenetics. Mol Phylogenet Evol. 66:526–538.

Meyer A, Van de Peer Y. 2005. From 2R to 3R: evidence for a fish-specific genome duplication (FSGD). Bioessays. 27:937–945.

Meyer M, Stenzel U, Hofreiter M. 2008. Parallel tagged sequencing on the 454 platform. Nat Protoc. 3:267–278.

Murphy WJ, Eizirik E, Johnson WE, Zhang YP, Ryder OA, O'Brien SJ. 2001. Molecular phylogenetics and the origins of placental mammals. Nature. 409:614–618.

Philippe H, Derelle R, Lopez P, et al. (20 co-authors). 2009. Phylogenomics revives traditional views on deep animal relationships. Curr Biol. 19:706–712.

Philippe H, Telford M. 2006. Large-scale sequencing and the new animal phylogeny. Trends Ecol Evol. 21:614–620.

Portik DM, Wood PL, Grismer JL, Stanley EL, Jackman TR. 2011. Identification of 104 rapidly-evolving nuclear protein-coding markers for amplification across scaled reptiles using genomic resources. Conserv Genet Resour. 4:1–10.

Pyron RA, Wiens JJ. 2011. A large-scale phylogeny of Amphibia including over 2800 species, and a revised classification of extant frogs, salamanders, and caecilians. Mol Phylogenet Evol. 61:543–583.

Roelants K, Gower DJ, Wilkinson M, Loader SP, Biju SD, Guillaume K, Moriau L, Bossuyt F. 2007. Global patterns of diversification in the history of modern amphibians. Proc Natl Acad Sci U S A. 104:887–892.

Rokas A, Williams BL, King N, Carroll SB. 2003. Genome-scale approaches to resolving incongruence in molecular phylogenies. Nature. 425:798–804.

Ronquist F, Teslenko M, Van der Mark P, Ayres DL, Darling A, Höhna S, Larget B, Liu L, Suchard MA, Huelsenbeck JP. 2012. MrBayes 3.2: efficient Bayesian phylogenetic inference and model choice across a large model space. Syst Biol. 61:539–542.

Roure B, Baurain D, Philippe H. 2013. Impact of missing data on phylogenies inferred from empirical phylogenomic data sets. Mol Biol Evol. 30:197–214.

San Mauro D. 2010. A multilocus timescale for the origin of extant amphibians. Mol Phylogenet Evol. 56:554–561.

San Mauro D, Vences M, Alcobendas M, Zardoya R, Meyer A. 2005. Initial diversification of living amphibians predated the breakup of Pangaea. Am Nat. 165:590–599.

Shen XX, Liang D, Wen JZ, Zhang P. 2011. Multiple genome alignments facilitate development of NPCL markers: a case study of tetrapod phylogeny focusing on the position of turtles. Mol Biol Evol. 28:3237–3252.

Shen XX, Liang D, Zhang P. 2012. The development of three long universal nuclear protein-coding locus markers and their application to osteichthyan phylogenetics with nested PCR. PLoS One. 7:e39256.

Shimodaira H. 2002. An approximately unbiased test of phylogenetic tree selection. Syst Biol. 51:492–508.

Shimodaira H, Hasegawa M. 2001. CONSEL: for assessing the confidence of phylogenetic tree selection. Bioinformatics. 17:1246–1247.

Song S, Liu L, Edwards SV, Wu S. 2012. Resolving conflict in eutherian mammal phylogeny using phylogenomics and the multispecies coalescent model. Proc Natl Acad Sci U S A. 109:14942–14947.

Spinks PQ, Thomson RC, Barley AJ, Newman CE, Shaffer HB. 2010. Testing avian, squamate, and mammalian nuclear markers for cross amplification in turtles. Conserv Genet Resour. 2:127–129.

Spinks PQ, Thomson RC, Lovely GA, Shaffer HB. 2009. Assessing what is needed to resolve a molecular phylogeny: simulations and empirical data from Emydid turtles. BMC Evol Biol. 9:56.

Stamatakis A. 2006. RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics. 22:2688–2690.

Tewhey R, Warner JB, Nakano M, Libby B, Medkova M, David PH, Kotsopoulos SK, Samuels ML, Hutchison JB, Larson JW. 2009. Microdroplet-based PCR enrichment for large-scale targeted sequencing. Nat Biotechnol. 27:1025–1031.

Thomson RC, Shedlock AM, Edwards SV, Shaffer HB. 2008. Developing markers for multilocus phylogenetics in non-model organisms: a test case with turtles. Mol Phylogenet Evol. 49:514–525.

Thomson RC, Wang IJ, Johnson JR. 2010. Genome-enabled development of DNA markers for ecology, evolution and conservation. Mol Ecol. 19:2184–2195.

Townsend TM, Alegre RE, Kelley ST, Wiens JJ, Reeder TW. 2008. Rapid development of multiple nuclear loci for phylogenetic analysis using genomic resources: an example from squamate reptiles. Mol Phylogenet Evol. 47:129–142.

Weisrock DW, Harmon LJ, Larson A. 2005. Resolving deep phylogenetic relationships in salamanders: analyses of mitochondrial and nuclear genomic DNA. Syst Biol. 54:758–777.

Wiens J, Bonett R, Chippindale P. 2005. Ontogeny discombobulates phylogeny: paedomorphosis and higher-level salamander relationships. Syst Biol. 54:91–110.


Wright TF, Schirtzinger EE, Matsumoto T, et al. (11 co-authors). 2008. A multilocus molecular phylogeny of the parrots (Psittaciformes): support for a Gondwanan origin during the Cretaceous. Mol Biol Evol. 25:2141–2156.

Zhang P, Wake DB. 2009. Higher-level salamander relationships and divergence dates inferred from complete mitochondrial genomes. Mol Phylogenet Evol. 53:492–508.

Zhang P, Zhou H, Chen YQ, Liu YF, Qu LH. 2005. Mitogenomic perspectives on the origin and phylogeny of living amphibians. Syst Biol. 54:391–400.

Zhou XM, Xu SX, Zhang P, Yang G. 2011. Developing a series of conservative anchor markers and their application to phylogenomics of Laurasiatherian mammals. Mol Ecol Resour. 11:134–140.
