Genome Analysis
Genome-Wide Analysis of the Core DNA Replication Machinery in the Higher Plants Arabidopsis and Rice1[W][OA]
Randall W. Shultz2, Vinaya M. Tatineni, Linda Hanley-Bowdoin*, and William F. Thompson
Department of Plant Biology (R.W.S., W.F.T.), Department of Statistical Genetics and Bioinformatics (V.M.T.), and Department of Molecular and Structural Biochemistry (L.H.-B.), North Carolina State University, Raleigh, North Carolina 27695
Core DNA replication proteins mediate the initiation, elongation, and Okazaki fragment maturation functions of DNA replication. Although this process is generally conserved in eukaryotes, important differences in the molecular architecture of the DNA replication machine and the function of individual subunits have been reported in various model systems. We have combined genome-wide bioinformatic analyses of Arabidopsis (Arabidopsis thaliana) and rice (Oryza sativa) with published experimental data to provide a comprehensive view of the core DNA replication machinery in plants. Many components identified in this analysis have not been studied previously in plant systems, including the GINS (go ichi ni san) complex (PSF1, PSF2, PSF3, and SLD5), MCM8, MCM9, MCM10, NOC3, POLA2, POLA3, POLA4, POLD3, POLD4, and RNASEH2. Our results indicate that the core DNA replication machinery from plants is more similar to that of vertebrates than to that of single-celled yeasts (Saccharomyces cerevisiae), suggesting that animal models may be more relevant to plant systems. However, we also uncovered some important differences between plant and vertebrate machinery. For example, we did not identify geminin or RNASEH1 genes in plants. Our analyses also indicate that plants may be unique among eukaryotes in that they have multiple copies of numerous core DNA replication genes. This finding raises the question of whether specialized functions have evolved in some cases. This analysis establishes that the core DNA replication machinery is highly conserved across plant species and displays many features in common with other eukaryotes and some characteristics that are unique to plants.
DNA replication depends on the coordinated action of numerous multiprotein complexes. At the simplest level, it requires an initiator to establish the site of replication initiation, a helicase to unwind DNA, a polymerase to synthesize new DNA, and machinery to process the Okazaki fragments generated during discontinuous synthesis. Much is known about the DNA replication machinery in yeast (Saccharomyces cerevisiae) and animal model systems, but relatively little is known about the apparatus in plants. To gain insight into plant DNA replication components, we have combined published experimental information with our own bioinformatic analysis of genomic sequence data to examine the core DNA replication machinery in the model plants Arabidopsis (Arabidopsis thaliana) and rice (Oryza sativa).
Figure 1 depicts a model eukaryotic DNA replication fork and illustrates the protein complexes known or suspected to be part of the core DNA replication machine. These complexes mediate the initiation, elongation, and maturation stages of DNA replication and, as such, constitute the core eukaryotic DNA replication machinery. The events leading to the formation of an active DNA replication fork occur in a stepwise fashion, but our understanding of the timing and specific details of how these events unfold in diverse eukaryotes is limited, and there are a growing number of examples of variations between model systems (Bell, 2002; Bell and Dutta, 2002; Kearsey and Cotterill, 2003).
In recent years, there has been increased interest in plant DNA replication and in using plants as models for understanding DNA replication in eukaryotes. A detailed understanding of the core DNA replication machinery in plants will provide researchers with an important tool for understanding what makes plants unique with respect to replicative and developmental capacity and for investigating how plant strategies compare to the mechanisms employed by animals.
1 This work was supported by the National Science Foundation Plant Genome Research Initiative (grant no. 0421651) and an Integrative Graduate Education and Research Traineeship from the National Science Foundation (to R.W.S.).
2 Present address: Department of Molecular and Structural Biochemistry, North Carolina State University, Raleigh, NC 27695.
The author responsible for distribution of materials integral to the findings presented in this article in accordance with the policy described in the Instructions for Authors (www.plantphysiol.org) is: Linda Hanley-Bowdoin ([email protected]).
[W] The online version of this article contains Web-only data.
[OA] Open Access articles can be viewed online without a subscription.
www.plantphysiol.org/cgi/doi/10.1104/pp.107.101105
Plant Physiology, August 2007, Vol. 144, pp. 1697–1714, www.plantphysiol.org © 2007 American Society of Plant Biologists
To identify the core DNA replication genes in Arabidopsis and rice, we developed an approach that incorporated experimental data from the literature with homology-based computational gene annotation. First, we assembled a database of yeast and animal proteins that have been determined experimentally to be part of the core eukaryotic DNA replication machinery. The BLAST algorithm was used to search against the translated Arabidopsis genome database at The Arabidopsis Information Resource (TAIR), and sequences with significant similarity were assigned putative annotations based on their functions in yeast and animal systems. The Arabidopsis sequences were then used to identify putative homologs in The Institute for Genomic Research (TIGR) rice genome database. Next, we searched the primary literature and, when available, incorporated experimental results that pertained to plant systems to validate the annotation (relevant plant literature is listed in Table I). In cases where no experimental data from plants could be found, we generated protein sequence alignments from diverse eukaryotes and considered the validity of putative annotations based on the quality of the alignment and the presence of highly conserved domains. Using this strategy, we report the core DNA replication machinery in the dicot Arabidopsis and the monocot rice (Table I). Together, these results established that there is a general conservation of DNA replication machinery in plants.
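The homology step described above can be illustrated with a minimal sketch. It assumes that BLAST results are available as tab-separated text with the twelve default columns of the BLAST+ tabular format (`-outfmt 6`), which is a modern convenience rather than the tooling reported here. The E-value cutoff of 1e-10 and the names `Hit`, `parse_tabular`, and `best_hits` are hypothetical, since the text does not state a significance threshold.

```python
from typing import Dict, Iterable, Iterator, NamedTuple


class Hit(NamedTuple):
    query: str       # query protein, e.g. a yeast or animal replication protein
    subject: str     # matched sequence, e.g. a plant locus identifier
    identity: float  # percent identity reported by BLAST
    evalue: float
    bitscore: float


def parse_tabular(lines: Iterable[str]) -> Iterator[Hit]:
    """Parse BLAST+ tabular output (12 default columns of -outfmt 6)."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        yield Hit(fields[0], fields[1], float(fields[2]),
                  float(fields[10]), float(fields[11]))


def best_hits(hits: Iterable[Hit], max_evalue: float = 1e-10) -> Dict[str, Hit]:
    """Keep the highest-scoring hit per query below an assumed E-value cutoff."""
    best: Dict[str, Hit] = {}
    for hit in hits:
        if hit.evalue > max_evalue:
            continue
        if hit.query not in best or hit.bitscore > best[hit.query].bitscore:
            best[hit.query] = hit
    return best
```

A best hit selected this way is only a putative annotation; as described above, it would still need support from the plant literature or from alignment quality and conserved domains.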
We encountered numerous instances where the existing gene model resulted in a protein that either lacked highly conserved sequences or contained additional residues compared to other eukaryotic proteins. When available, plant-derived transcripts from GenBank and the TIGR Plant Transcript Assembly databases (TIGR-TA; Childs et al., 2007) were used to guide the prediction of a new gene model. In cases where full-length transcripts could not be identified, we used protein sequence alignments to create gene models that maximized sequence conservation with other eukaryotes (Table I). The predicted coding and resulting protein sequences are provided in Supplemental Text S1.
Preinitiation
One of the first steps toward establishing a functional DNA replication fork is binding of the eukaryotic initiator complex termed the origin recognition complex (ORC) to DNA in late M and early G1 phases of the cell division cycle (Dutta and Bell, 1997; Bryant et al., 2001; Bell, 2002; Bell and Dutta, 2002; DePamphilis, 2003). Next, CDC6 interacts with the origin-bound ORC, which together recruit a CDT1/MCM2-7 complex of proteins. Hydrolysis of ATP by ORC/CDC6 causes the release of CDT1 and structural alteration of the ring-shaped MCM complex, leading to its loading around DNA (Randell et al., 2006; Ranjan and Gossen, 2006; Waga and Zembutsu, 2006). MCM loading is reiterated through the action of a single ORC/CDC6 complex resulting in the recruitment of 10 to 40 MCM complexes at each potential origin (Blow and Dutta, 2005). The resulting protein/DNA assembly consisting of ORC1-6/CDC6/MCM2-7 is termed the prereplication complex (pre-RC), and sites containing this complex are considered licensed with the potential to serve as origins of replication (Fig. 1A).
Figure 1. Model depicting the core eukaryotic DNA replication machinery from initiation through Okazaki fragment maturation. A, Components of the preinitiation complex. DNA-bound ORC recruits NOC3, CDC6, and CDT1 in early G1. Reiterative loading of 10 to 40 MCM complexes forms a licensed origin. After MCM loading is complete, CDC6 and CDT1 dissociate from the origin. B, At the G1/S transition a subset of licensed origins transition to an initiation complex. The precise order of events is not clear and may vary between systems. CDC45, TOPBP1, and MCM8-10 contribute to GINS complex loading, DNA unwinding, and recruitment of the polymerases. C, Components of the active DNA replication fork. MCM2-7, CDC45, and GINS unwind the duplex DNA. Leading strand synthesis is accomplished primarily by POLE. GINS increases the processivity of POLE. On the lagging strand, RPA stabilizes ssDNA, POLA lays down a short RNA/DNA primer and then is replaced by POLD, which completes the Okazaki fragment. RFC loads PCNA, which increases the processivity of POLD. The precise role of MCM8-10 in this process is not clear. D, The dominant mechanism of Okazaki fragment maturation requires FEN1 to cleave the RNA/DNA flap, resulting in a nick that is sealed by LIG1.
All six ORC genes have been identified in Arabidopsis (Gavin et al., 1995; Collinge et al., 2004; Masuda et al., 2004), and genes encoding ORC1 to 5 have been reported for rice (Kimura et al., 2000a; Li et al., 2005; Mori et al., 2005) and maize (Zea mays; Witmer et al., 2003). The Arabidopsis ORC proteins show 22% to 37% amino acid identity with human ORC subunits (Table I). Consistent with a role in DNA replication, Arabidopsis ORC transcripts have been shown to be abundant in proliferating tissues such as root tips, young leaves, and flower buds, and their expression is induced upon cell cycle reentry following Suc starvation of cultured suspension cells (Masuda et al., 2004; Diaz-Trivino et al., 2005). Interestingly, ZmORC3 (Witmer et al., 2003) and AtORC5-6 (Diaz-Trivino et al., 2005) transcripts are also abundant in postmitotic tissues, suggesting that plant ORC subunits may have additional functions in mature tissues.
Two analyses of the ORC complex in rice (Mori et al., 2005) and maize (Witmer et al., 2003) failed to identify a candidate ORC6 gene. The authors of each of these studies suggested that ORC6 is poorly conserved in plants. However, our query of the TIGR-TA database using AtORC6 yielded strong hits from a diverse array of plant species. Alignment of five dicot, two monocot, and two conifer ORC6 proteins showed significant sequence similarity within plants and between plants and other eukaryotes (Fig. 2A). This alignment supports the conclusion that, like other ORC subunits, ORC6 is conserved in plants. Interestingly, our analysis indicates that the ORC6 C terminus is conserved among plants but differs from other eukaryotes, indicative of a plant-specific function.
a At, Hs, Os, and Sc are Arabidopsis, human, rice, and yeast, respectively.
b References: 1, Kimura et al. (2000a); 2, Witmer et al. (2003); 3, Collinge et al. (2004); 4, Masuda et al. (2004); 5, Diaz-Trivino et al. (2005); 6, Mori et al. (2005); 7, Gavin et al. (1995); 8, Li et al. (2005); 9, Sabelli et al. (1996); 10, Sabelli et al. (1999); 11, Stevens et al. (2002); 12, Dambrauskas et al. (2003); 13, Taliercio et al. (2005); 14, Dresselhaus et al. (2006); 15, Springer et al. (1995); 16, Springer et al. (2000); 17, Bastida and Puigdomenech (2002); 18, Holding and Springer (2002); 19, Castellano et al. (2001); 20, de Jager et al. (2001); 21, Ramos et al. (2001); 22, Castellano et al. (2004); 23, Raynaud et al. (2005); 24, Stevens et al. (2004); 25, Yokoi et al. (1997); 26, Burgers et al. (2001); 27, Uchiyama et al. (2002); 28, Garcia et al. (2006); 29, Kimura et al. (2001); 30, Egelkrout et al. (2002); 31, Kosugi and Ohashi (2002); 32, Toueille et al. (2002); 33, Castillo et al. (2003); 34, Raynaud et al. (2006); 35, Furukawa et al. (2003); 36, Furukawa et al. (2001); 37, Jenik et al. (2005); 38, Ronceret et al. (2005); 39, He and Mascarenhas (1998); 40, Yang and Sheila (2002); 41, Grelon et al. (2003); 42, Ishibashi et al. (2001); 43, Marwedel et al. (2003); 44, Ishibashi et al. (2005); 45, Ishibashi et al. (2006); 46, Xia et al. (2006); 47, Kimura et al. (2003); 48, Kimura et al. (2000b); 49, Taylor et al. (1998); 50, Sunderland et al. (2004); 51, Bonatto et al. (2005); 52, Sunderland et al. (2006).
c New gene models were predicted for these loci. NA, Gene was not identified; ns, no significant similarity was found.
Figure 2. Multiple sequence alignments of plant ORC6 and GINS complex proteins. A, ORC6. B, PSF1. C, PSF2. D, PSF3. E, SLD5. For all sections, protein sequences from the indicated plant species were aligned using the Clustal W algorithm. Black shading indicates identical residues in all aligned sequences. Gray shading denotes residues with similar chemical properties that are conserved in >50% of sequences aligned. Similar chemical properties of amino acid residues were defined as follows: DE, acidic; AGILV, aliphatic; NQ, amide; FWY, aromatic; RHK, basic; ST, hydroxyl; CM, sulphur. Plant sequences were also aligned with proteins from other eukaryotes. Ascomycota includes sequences from yeast and Schizosaccharomyces pombe. Vertebrata includes Homo sapiens, Danio rerio, X. laevis, and Gallus gallus. An x indicates residues that are identical in all sequences from plants and the specified group. An o denotes residues with similar chemical properties that are conserved in >50% of plant sequences and 100% of sequences from the specified group. The plant species are designated as Ac, Ananas comosus; Af, Aquilegia formosa × Aquilegia pubescens; At, Arabidopsis; Bn, Brassica napus; Br, B. rapa; Cs, Citrus sinensis; Ee, Euphorbia esula; Ga, Gossypium arboreum; Gm, G. max; Gr, Gossypium raimondii; Ha, Helianthus annuus; Ht, Helianthus petiolaris; Hv, Hordeum vulgare; In, Ipomoea nil; Lc, Lotus corniculatus; Ls, Lactuca serriola; Lv, Lactuca virosa; Mt, Medicago truncatula; Nt, N. tabacum; Os, O. sativa; Pg, Picea glauca; Ps, Picea sitchensis; Pt, Populus trichocarpa; Pta, Pinus taeda; Sa, Senecio aethnensis; Sb, Sorghum bicolor; Sch, Solanum chacoense; So, Saccharum officinarum; St, Solanum tuberosum; Ta, Triticum aestivum; Zm, Z. mays.
It has been reported that Arabidopsis has two CDC6 (Ramos et al., 2001) and two CDT1 (Castellano et al., 2004) genes. We identified one candidate CDC6 and two candidate CDT1 homologs in rice (Table I). OsCDC6 shares 51% and 59% amino acid identity with AtCDC6A (At2g29680) and AtCDC6B (At1g07270), respectively. The CDT1 proteins are more divergent. AtCDT1A (At2g31270) and AtCDT1B (At3g54710) are 37% identical while the two rice CDT1 proteins are 30% identical. Between Arabidopsis and rice, the amino acid identity ranges from 28% to 37% (Table I). Given that CDC6 and CDT1 have similar functions in all eukaryotes studied to date, it is likely that these proteins also act similarly in plants. However, the divergence between copies raises the possibility of additional activities.
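Percent identity values such as those quoted above depend on how the denominator is defined. The sketch below shows one common convention: identical residues divided by the number of aligned columns in which neither sequence has a gap. This convention, and the function name `percent_identity`, are assumptions for illustration; the text does not specify which definition was used.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over columns where both aligned sequences have a residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # gapped columns are excluded from the denominator (assumption)
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared
```

Dividing by the full alignment length or by the shorter sequence length instead would give lower values for the same pair, so identities from different sources are comparable only when the convention matches.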
The six-subunit MCM complex (MCM2-7) represents the putative eukaryotic replicative helicase (Forsburg, 2004; Masai et al., 2005; Maiorano et al., 2006), and genes encoding one copy of each subunit have been identified in Arabidopsis (Springer et al., 1995; Stevens et al., 2002; Masuda et al., 2004; Dresselhaus et al., 2006). MCM3 (Sabelli et al., 1996; Sabelli et al., 1999) and MCM6 (Dresselhaus et al., 2006) proteins have been identified in maize, and MCM3 has been reported in tobacco (Nicotiana tabacum; Dambrauskas et al., 2003). We identified strong candidates for each of the MCM2-7 proteins in rice (Table I). Importantly, these proteins contain the sequence features that define the MCM family, including Walker A and Walker B domains, a zinc finger region, and an Arg finger motif (Forsburg, 2004; Maiorano et al., 2006).
Nucleolar complex-associated (NOC) proteins are conserved in eukaryotes and are involved in ribosome biogenesis (Milkereit et al., 2001) and cell differentiation (Tominaga et al., 2004). One member of this complex, NOC3, has been shown to interact with ORC and MCM proteins and is required for pre-RC formation in budding yeast (Zhang et al., 2002). Our analysis indicates that Arabidopsis and rice both code for a NOC3 protein (Table I). The TAIR gene model for AtNOC3 (At1g79150.1) produces a protein of 496 amino acids that is missing conserved sequences in the C-terminal region. This model is based on a cDNA sequence in GenBank (accession NM_106566), but there is a second cDNA sequence in GenBank (accession AAC17047) with a different intron-exon structure that contains the conserved C-terminal portion of NOC3. It would be interesting to investigate whether these two AtNOC3 transcripts represent alternative splicing events or are simply artifacts. We assembled the available transcripts to predict a putative full-length AtNOC3 transcript (Supplemental Text S1). Our results support the conclusion that a complete pre-RC is conserved in plants.
Initiation
The pre-RC assembles at many sites, but only a subset of these sites recruit replication machinery and initiate DNA synthesis (Bell, 2002; DePamphilis et al., 2006). Neither the order of events nor the proteins involved in the transition from pre-RC to active replication fork are completely defined. However, several proteins are known to have critical roles in this initiation process (Table I; Fig. 1B).
MCM8 and MCM9 proteins are conserved in a diverse array of eukaryotes, but are lacking in most fungi and Caenorhabditis elegans (Blanton et al., 2005). Human and frog MCM8 proteins associate with chromatin in S-phase after loading of the MCM2-7 complex, and may stabilize replication protein3 (RPA3) and POLA1 binding to the replication fork (Gozuacik et al., 2003; Maiorano et al., 2006). The function of MCM9 is not known, but it is expressed maximally in S-phase and is transcriptionally regulated by E2F1, indicative of a role in DNA replication (Yoshida, 2005). In Arabidopsis, sequences with similarity to eukaryotic MCM8 and MCM9 proteins have been reported (Dresselhaus et al., 2006), and we identified putative MCM8 and MCM9 genes in rice, suggesting that these proteins are generally conserved in plants. The TIGR gene model for OsMCM8 predicts a protein of 482 amino acids, which is considerably shorter than other MCM8 proteins and lacks several highly conserved domains. However, a sequence encoding the missing domains is present in the rice genome, supporting our prediction of a new gene model for OsMCM8 (Supplemental Text S1).
Our examination of the Arabidopsis and poplar (Populus spp.) MCM9 gene sequences suggested that they also may be alternatively spliced or misrepresented by transcripts in the databases, and new gene models that maximize protein sequence conservation were predicted (Supplemental Text S1). Arabidopsis MCM8 and MCM9 are expressed at low levels (Schmid et al., 2005). This could explain why the functional transcripts have not been cloned, but a directed effort toward identifying primary and alternatively spliced MCM8 and MCM9 transcripts is needed to determine the relevance of our predicted gene models.
Like other MCM family members, the central region of plant MCM8 and MCM9 proteins contains Walker A and B NTP-binding domains, a putative zinc finger, and an Arg finger motif (Supplemental Figs. S1 and S2). In both plants and animals, the MCM8 and MCM9 proteins contain a classic GKS sequence in the Walker A motif compared to the deviant A/SKS sequence found in MCM2-7 (Maiorano et al., 2005, 2006). Arabidopsis, poplar, and rice MCM8 proteins lack the first approximately 60 amino acids of animal MCM8 proteins and are also missing a QVLTKDLEXXAAXLQXDE motif found in human, chicken, frog, and sea urchin homologs (Supplemental Fig. S1, region C). Additional differences between plant and animal MCM8 proteins are indicated (Supplemental Fig. S1, regions A and B and D–I). In animals, MCM9 proteins are the largest of the MCM family members due to a long, poorly conserved C-terminal domain. Plants do not have this C-terminal extension, raising the possibility that MCM9 functions differently in plants and animals (Supplemental Fig. S2). It is also noteworthy that while all MCM2-8 proteins contain the IDEFDKM Walker B sequence, only IDEF is conserved in MCM9 proteins (Walker et al., 1982; Neuwald et al., 1999).
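The Walker motif distinctions described above can be expressed as a small classifier. This is a sketch under stated assumptions: the input is a short segment already excised from the Walker A or Walker B region of an alignment (scanning a whole protein for a three-residue pattern would give many chance matches), and the function names and labels are hypothetical.

```python
import re


def walker_a_type(segment: str) -> str:
    """Label a Walker A segment as classic (GKS), deviant (AKS or SKS), or none."""
    seg = segment.upper()
    if "GKS" in seg:
        return "classic"   # reported above for MCM8 and MCM9
    if re.search(r"[AS]KS", seg):
        return "deviant"   # reported above for MCM2-7
    return "none"


def walker_b_type(segment: str) -> str:
    """Label a Walker B segment by how much of IDEFDKM it retains."""
    seg = segment.upper()
    if "IDEFDKM" in seg:
        return "full"       # reported above for MCM2-8
    if "IDEF" in seg:
        return "IDEF only"  # reported above for MCM9
    return "none"
```

The checks run from the most specific pattern to the least specific one, so a segment containing the full IDEFDKM sequence is never reported as "IDEF only".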
MCM10, which is conserved from yeast to humans, does not contain the sequence features that define the rest of the MCM family. However, it is an essential part of the core DNA replication machinery and has been implicated in a variety of DNA replication processes, including loading and stabilizing DNA polymerase α (POLA; Ricke and Bielinsky, 2004) and recruiting CDC45 (Wohlschlegel et al., 2002; Sawyer et al., 2004), and it is a component of the replisome progression complex (Gambus et al., 2006; Pacek et al., 2006). Although we found no published information regarding MCM10 proteins in plants, we identified putative MCM10 homologs in Arabidopsis, rice, maize, and the western columbine (Aquilegia formosa). Eukaryotic MCM10 proteins are not highly conserved (Supplemental Fig. S3). Arabidopsis and rice MCM10 proteins show 44% amino acid identity, while Arabidopsis and human MCM10 proteins display 33% identity within the aligned region. In budding yeast, MCM10 interacts with itself through a CCCH-type zinc finger motif to form a large homocomplex that is required for DNA replication (Cook et al., 2003). The CCCH zinc finger is conserved in plant and animal MCM10 proteins, suggesting that zinc binding and homomultimerization are shared properties of MCM10 proteins (Supplemental Fig. S3).
CDC45 is essential for both the initiation and elongation stages of DNA replication (Bell and Dutta, 2002; Pollok et al., 2003; Pacek and Walter, 2004). It assembles onto the origin in late G1 after the MCM2-7 complex and concurrent with the onset of initiation (Zou and Stillman, 1998). CDC45 is required for POLA complex loading (Mimura and Takisawa, 1998) and is a component of large complexes containing MCMs and GINS. These observations have led to the suggestion that CDC45 serves as an anchor coupling POLA to the replication fork via the replisome complex (Gambus et al., 2006; Moyer et al., 2006; Pacek et al., 2006). Published results indicate that Arabidopsis CDC45 is expressed in proliferating tissues and transcripts are most abundant at the G1/S transition (Stevens et al., 2004), consistent with a role at this cell cycle stage. Interestingly, AtCDC45 has also been implicated in meiosis, a role not yet reported for any other eukaryotes (Stevens et al., 2004). In rice, we identified two putative CDC45 genes located on chromosomes 11 and 12. These two proteins differ only at four positions, indicative of a very recent duplication event or strong selective pressure.
The GINS complex, which consists of four proteins, PSF1, PSF2, PSF3, and SLD5, was identified recently as a critical part of the initiation process. GINS is essential for the establishment and maintenance of a functional DNA replication fork (Kanemaki et al., 2003; Kubota et al., 2003; Takayama et al., 2003; Gambus et al., 2006). The GINS proteins copurify as a tightly associated heterotetrameric complex with a ring-like structure that resembles the DNA polymerase δ (POLD) processivity factor, proliferating cell nuclear antigen (PCNA), in electron micrographs (Kubota et al., 2003). GINS has been shown to bind weakly to DNA polymerase ε (POLE) and specifically stimulate DNA synthesis by POLE in vitro, leading to the suggestion that GINS functions as a POLE processivity factor analogous to the function of PCNA (Seki et al., 2006). However, there is also evidence that GINS is a core component of the eukaryotic DNA replication fork helicase. GINS interacts stably with the MCM2-7 complex and CDC45, and the GINS/MCM/CDC45 supercomplex functions as a helicase in vitro (Gambus et al., 2006; Moyer et al., 2006). The precise role that GINS plays at the DNA replication fork is not yet clear, but GINS interactions with POLE and the CDC45/MCM helicase complex place it between these two complexes on the leading strand (Fig. 1C).
GINS complex proteins have been identified in a broad array of eukaryotes based on sequence similarity (Kubota et al., 2003), and experimental evidence from yeast (cited above), fly (Moyer et al., 2006), frog (Kubota et al., 2003; Pacek et al., 2006), and mouse (Ueno et al., 2005) suggests that the complex is functionally conserved in eukaryotes. Consistent with this suggestion, we identified a complete GINS complex from two dicots, Arabidopsis and soybean (Glycine max), and two monocots, rice and maize. In Arabidopsis, we found two loci (see Table I) that encode nearly identical copies of the putative PSF3 protein. We also identified transcripts representing one or several of the GINS complex proteins from a diverse array of additional plant species, demonstrating that GINS is conserved broadly in plants. We performed phylogenetic analysis of the GINS complex from plants, animals, and yeasts, demonstrating that the proteins cluster primarily by subunit and secondarily by taxonomy (Fig. 3). In all cases, the vertebrates (human, chicken, zebrafish, and frog) form a tight cluster with good bootstrap support. Drosophila sequences cluster loosely with this group, and C. elegans sequences are more divergent. Plant sequences also form a highly supported cluster, with monocot and dicot sequences tending to separate into subgroups. For PSF1, PSF2, and SLD5, the plant sequences cluster with animals, while for PSF3, plants and yeast cluster, although the bootstrap support for these divisions is generally low (Fig. 3).
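Bootstrap support of the kind discussed above comes from rebuilding the tree many times on resampled alignments. The sketch below shows only the resampling step and a simple distance calculation; it does not build neighbor-joining trees, and it is not the MEGA procedure behind Figure 3. The function names and the use of an uncorrected p-distance are assumptions for illustration.

```python
import random
from typing import Dict, Tuple


def bootstrap_columns(alignment: Dict[str, str], rng: random.Random) -> Dict[str, str]:
    """Return one bootstrap replicate: alignment columns resampled with replacement."""
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(seq) != length for seq in alignment.values()):
        raise ValueError("all aligned sequences must have the same length")
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(alignment[name][c] for c in columns) for name in names}


def p_distance(a: str, b: str, gap: str = "-") -> float:
    """Fraction of differing residues over columns where neither sequence has a gap."""
    compared = 0
    differing = 0
    for x, y in zip(a, b):
        if x == gap or y == gap:
            continue
        compared += 1
        if x != y:
            differing += 1
    return differing / compared if compared else 0.0


def distance_matrix(alignment: Dict[str, str]) -> Dict[Tuple[str, str], float]:
    """Pairwise p-distances keyed by alphabetically ordered name pairs."""
    names = sorted(alignment)
    return {(m, n): p_distance(alignment[m], alignment[n])
            for i, m in enumerate(names) for n in names[i + 1:]}
```

In a full analysis, a tree would be inferred from each replicate's distance matrix, and the support for a node would be the fraction of replicate trees that contain the same grouping.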
GINS complex proteins are highly conserved with respect to amino acid sequence (Supplemental Table S1). PSF2, which shares 66% identity between Arabidopsis and rice and 42% identity between Arabidopsis and human, is the most highly conserved GINS complex subunit. Amino acid sequence length and pI are also conserved features of eukaryotic GINS proteins (Supplemental Table S2). PSF1 shows the least size variability with an average of 199 amino acids and a SD of 4.4 between plants, vertebrates, and yeasts (data not shown). In budding yeast, PSF3 and SLD5 proteins are longer than plant and animal sequences due to an approximately 25 amino acid N-terminal extension and several small internal insertions. The predicted pIs of GINS complex proteins typically range from 5 to 7 (Supplemental Table S2). PSF1 in chicken, which has a predicted pI of 8.8, and PSF3 in Arabidopsis and rice, which have predicted pIs of 8.3 and 9.2, respectively, are notable exceptions.
To identify conserved and unique features of plant GINS complex proteins, we generated sequence alignments from diverse plant species and compared them to yeast and vertebrate GINS proteins (Fig. 2, B–E). These alignments indicated that eukaryotic PSF1 proteins are similar along their entire length, but show the highest degree of sequence conservation in the central and C-terminal regions (Fig. 2B). Two blocks of identical residues, RNKRCLMAY (block I) and VDMVPPKDP (block II), and a highly conserved motif in the C terminus (block III), are apparent in plant sequences. These domains are also highly conserved in yeast and animals, suggesting that they are critical for PSF1 function. Supporting this conclusion, it has been shown that mutation of a conserved Arg residue in block I of budding yeast [NK(R-to-G)CL] results in cell growth arrest and morphology consistent with a DNA replication defect (Takayama et al., 2003). PSF1 is predicted to adopt primarily a helical conformation with a short elongated region (β sheet) near its C terminus. Given the structural constraints conferred by Pro residues, the conserved double Pro in block II may serve an important role in defining the structural properties of PSF1.
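Blocks of identical and similar residues like those described above follow from a per-column rule. The sketch below applies the scheme given in the Figure 2 legend: a column is identical when every sequence has the same residue, and similar when more than 50% of sequences share one chemical class. Two details are assumptions of this sketch rather than statements from the legend: residues outside the listed classes (for example Pro) are treated as their own class, and gaps count toward the total but never toward a class.

```python
from collections import Counter
from typing import List

# Chemical classes as listed in the Figure 2 legend.
GROUPS = {
    "acidic": "DE", "aliphatic": "AGILV", "amide": "NQ", "aromatic": "FWY",
    "basic": "RHK", "hydroxyl": "ST", "sulphur": "CM",
}
RESIDUE_TO_GROUP = {aa: name for name, members in GROUPS.items() for aa in members}


def classify_column(column: str) -> str:
    """Classify one alignment column as 'identical', 'similar', or 'none'."""
    residues = column.upper()
    if not residues:
        return "none"
    if len(set(residues)) == 1 and residues[0] != "-":
        return "identical"
    # Ungrouped residues fall back to their own letter as a class (assumption).
    counts = Counter(RESIDUE_TO_GROUP.get(aa, aa) for aa in residues if aa != "-")
    if counts and counts.most_common(1)[0][1] / len(residues) > 0.5:
        return "similar"
    return "none"


def shade(alignment: List[str]) -> str:
    """Return one mark per column: '*' identical, '+' similar, ' ' otherwise."""
    marks = {"identical": "*", "similar": "+", "none": " "}
    return "".join(marks[classify_column("".join(col))] for col in zip(*alignment))
```

For three toy sequences, `shade(["RNKDD", "RHKEK", "RKAPF"])` marks the first column as identical, the next three as similar, and the last as unconserved.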
PSF2 proteins from yeast, animals, and plants contain tracts of identical and conserved residues spread across the length of the protein (Fig. 2C). In contrast to the rest of the protein, the C terminus stands out as being poorly conserved. Plant proteins have an additional 15 to 20 amino acids at the C terminus including a short, conserved motif, PRRxLRR (region B). Plant and vertebrate PSF2 proteins also contain a conserved sequence (region A) that is lacking in budding and fission yeasts. Alignment of PSF3 proteins reveals two conserved features in the N-terminal region (Fig. 2D, region A and region B), a high degree of similarity through the central portion of the protein, and an LGRKR motif at the C-terminal end (region C). This motif does not align with yeast and vertebrate proteins. However, vertebrate sequences have a conserved NYXKRK motif in this region, suggesting that positive charge may be important at the C terminus (data not shown).
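Motifs written with a wildcard position, such as PRRxLRR and NYXKRK above, map directly onto regular expressions. The following sketch treats both `x` and `X` in the motif as "any residue"; that reading, the function names, and the example sequences in the usage note are assumptions for illustration and are not taken from the alignments.

```python
import re
from typing import List, Tuple


def motif_to_regex(motif: str):
    """Compile a motif in which 'x' or 'X' stands for any single residue."""
    parts = ["." if ch in "xX" else re.escape(ch) for ch in motif]
    return re.compile("".join(parts))


def find_motif(sequence: str, motif: str) -> List[Tuple[int, str]]:
    """Return (1-based start, matched text) for each non-overlapping occurrence."""
    pattern = motif_to_regex(motif)
    return [(m.start() + 1, m.group()) for m in pattern.finditer(sequence.upper())]
```

As a usage example on a made-up sequence, `find_motif("MAAPRRELRRGG", "PRRxLRR")` returns `[(4, "PRRELRR")]`.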
Our analysis indicated that SLD5 proteins contain two prominent blocks of highly conserved amino acids (Fig. 2E, blocks I and II). Except for a short conserved region at the extreme C terminus (region A), the N and C termini of SLD5 are divergent. We used the COILS algorithm (Lupas et al., 1991) to predict that the plant SLD5 proteins adopt a coiled-coil structure between blocks I and II (Fig. 2E, COILS track). Coiled-coil domains, which are common in transcription factors (Leu zipper motif), SNARE complexes, and spindle-pole-body components, are thought to interact primarily with other coiled-coil domains (Lupas, 1996, 1997; Martin et al., 2004; Rose et al., 2004). We did not detect coiled-coil domains in the other GINS complex proteins, and it would be interesting to ask if SLD5 acts as a homodimer or facilitates interactions between the GINS complex and other coiled-coil proteins.
Our analysis suggested that the initiation components have largely been conserved in plants, and it supports the hypothesis that similar mechanisms govern the transition from pre-RC to active replication fork in plants and animals.
Elongation Complex
Initiation of active replication at the G1/S transition requires the assembly of additional proteins including DNA polymerases and Okazaki fragment maturation factors to form a complete replication factory (Fig. 1C; Waga and Stillman, 1998; Bell and Dutta, 2002). Three eukaryotic DNA polymerase complexes have been implicated in DNA replication: POLA, POLD, and POLE (Burgers et al., 2001; Garg and Burgers, 2005; Johnson and O’Donnell, 2005).
The POLA complex includes a catalytic subunit (POLA1), two primase subunits (POLA3 and POLA4), and POLA2, which is thought to tether the complex to the replication fork (Frick and Richardson, 2001). Protein complexes containing polymerase and primase activity have been purified from a variety of plant systems (Coello and Vazquez-Ramos, 1995; Garcia et al., 2002), demonstrating that a POLA-like function exists in plants. However, sequence homology has been investigated only for the POLA1 subunit in rice (Yokoi et al., 1997). Rice POLA1 was originally reported to be shorter than other eukaryotic POLA1 homologs, due to a truncated N terminus (Yokoi et al., 1997). However, publication of the rice genomic sequence (Sasaki et al., 2002) allowed us to predict a full-length OsPOLA1 (GenBank accession O48653). We identified all four putative POLA subunits in Arabidopsis and the remaining three subunits in rice. We predicted new gene models for rice POLA2 and POLA4 subunits, resulting in better conservation to other eukaryotes (Supplemental Text S1).
Figure 3. Phylogenetic analysis of eukaryotic GINS complex proteins. Protein sequences were aligned with ClustalW using the Gonnet scoring matrix in MEGA. A single tree containing all of the sequences was constructed by the neighbor-joining method, and was split manually into four subtrees (A–D) for visualization. Bootstrap values from 5,000 iterations are shown at each node. Species abbreviations are the same as in Figure 2.
Seven protein sequence features have been established as conserved in all eukaryotic DNA polymerase catalytic subunits (Spicer et al., 1988; Wong et al., 1988), and an additional five regions are conserved in POLA1 proteins (Miyazawa et al., 1993). We found that these defined regions are conserved in the Arabidopsis and rice POLA1 proteins. Although the sequence features of POLA2 to 4 are less well characterized, Arabidopsis POLA2, POLA3, and POLA4 align with 37%, 41%, and 47% identity to their corresponding human proteins, respectively. According to our analyses, the majority of sequence features that are conserved between yeast and human are present in the corresponding Arabidopsis and rice proteins, supporting the hypothesis of conserved function. One notable exception is a YYRRLFP motif of unknown function located at the N terminus of yeast and animal POLA4 proteins but absent in Arabidopsis and rice POLA4. We conclude that POLA is a four-subunit complex in Arabidopsis and rice.
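To make the meaning of a percent identity value concrete, the sketch below computes it for two rows of an existing pairwise alignment. It assumes the BLAST-style convention in which identical columns are divided by the full alignment length, gap columns included; the toy alignment and the `percent_identity` helper are hypothetical and do not reproduce any value reported here.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of alignment columns holding the same residue in both rows.

    Both inputs are rows of one pairwise alignment: equal length, "-" for gaps.
    Gap columns count toward the alignment length but never as matches
    (an assumed, BLAST-style convention).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    matches = sum(1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-")
    return 100.0 * matches / len(aligned_a)

# Hypothetical toy alignment, invented for illustration only.
row_a = "MKT-LLVAGE"
row_b = "MRTALLV-GD"
print(percent_identity(row_a, row_b))
```

Because the value depends on which region was aligned, identities from local alignments describe only the aligned segment rather than the full-length proteins.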
POLD is known to function as a heterotetramer in fission yeast and animals (POLD1–4), but only three subunits have been identified in budding yeast (POLD1–3; Johnson and O’Donnell, 2005). The largest subunit (POLD1) contains the polymerase and exonuclease activity, while the other subunits are involved in complex stabilization and interactions with PCNA. In rice, the POLD1 and POLD2 genes have been shown to be expressed primarily in proliferating tissues and induced upon regrowth following Suc starvation in cell culture (Uchiyama et al., 2002). Interestingly, only POLD1 transcripts were detected in mature leaves and induced upon UV irradiation treatment (Uchiyama et al., 2002), leading to the suggestion that POLD1 has specific DNA repair functions independent of POLD2. An alternate explanation is that POLD2 protein activity does not correlate with its transcription. POLD1 and PCNA protein levels correlate in maize, further supporting the conclusion that plant POLD functions in DNA replication (Garcia et al., 2006).
A previously published alignment of Arabidopsis, soybean, rice, and maize sequences indicated that plant POLD1 proteins contain most of the conserved domains present in other eukaryotic POLD1 proteins, but the dicot sequences lacked two C-terminal zinc finger motifs (Garcia et al., 2006). This result was surprising because these motifs are highly conserved in other eukaryotes and are critical for interaction with POLD2. The AtPOLD1 and GmPOLD1 sequences were derived from transcripts in GenBank (accessions NP_201201 and AAC18443, respectively). We identified a second Arabidopsis transcript (accession ABA41487) that utilizes an alternative splice donor site (GT) at the end of exon 27 (data not shown) that results in a frameshift, which restores conservation in the C terminus, including the zinc finger motifs. Similarly, we searched the TIGR-TA database and identified an assembly (TA66266_3847) for GmPOLD1 that encodes a protein with these zinc finger motifs. It is not known if these transcripts represent bona fide alternative splicing events or artifacts, but it is clear that dicots produce transcripts specifying proteins that contain these important zinc finger domains.
Our analysis indicated that Arabidopsis and rice POLD2 proteins also contain all of the sequence features conserved between animals and yeasts (data not shown). In humans, a region of hydrophobic residues (MRPFL) near the N terminus of POLD2 has been shown to mediate interaction with PCNA (Lu et al., 2002). We found a similar sequence (MRT/NLL) in Arabidopsis and rice POLD2 proteins at this position. We also identified a conserved PCNA-binding motif in Arabidopsis and rice POLD3, suggesting that multiple POLD subunits mediate PCNA interactions in plants as has been reported for human (Ducoux et al., 2001). The N termini of plant and animal POLD3 also show significant sequence similarity, while their central regions are more divergent (data not shown).
The POLD4 subunit is not essential for growth in fission yeast (Reynolds et al., 1998), but increases the processivity of both fission yeast and human POLD1 to 3 complexes in vitro (Zou and Stillman, 2000; Li et al., 2006). Human POLD4 also stabilizes the POLD complex and participates in interactions with PCNA (Li et al., 2006). A POLD4 homolog has not been identified in budding yeast, and it has been uncertain whether the plant POLD complex consists of three or four subunits. We identified a single Arabidopsis POLD4 and two putative POLD4 genes in rice (Table I), indicating that POLD consists of at least four subunits in plants.
Four POLE subunits have been identified in vertebrates (POLE1–4), budding yeast (POL2, DPB2, DPB4, and DPB3), and other eukaryotes (Johnson and O’Donnell, 2005; the vertebrate nomenclature is used here). In Arabidopsis, two genes encoding the catalytic subunit (POLE1A, At1g08260 and POLE1B, At2g27120) and a single gene encoding the second largest subunit (POLE2) have been reported (Ronceret et al., 2005). Both AtPOLE1A and AtPOLE1B were shown to contain the conserved domains of other eukaryotic homologs (Ronceret et al., 2005). Mutations in either AtPOLE1A or AtPOLE2 result in DNA replication defects (Jenik et al., 2005; Ronceret et al., 2005), while AtPOLE1B mutants do not exhibit visible phenotypic effects (Ronceret et al., 2005). These results indicate that AtPOLE1B is not required when AtPOLE1A is present. However, AtPOLE1A/B double mutants arrest earlier than single AtPOLE1A mutants, suggesting some functional overlap in vivo (Jenik et al., 2005). We identified a single POLE1 gene in rice that specifies a protein that is 66% and 63% identical to AtPOLE1A and AtPOLE1B, respectively (Table I). Like the AtPOLE1 proteins, OsPOLE1 contains all of the functional domains conserved in other eukaryotes.
We identified two candidate POLE2 genes in the rice genome (Table I). The gene models for these loci (OsPOLE2A, LOC_Os05g06840.1 and OsPOLE2B, LOC_Os08g36330.1) predict proteins that are considerably shorter than other eukaryotic POLE2 proteins and are missing several highly conserved domains. Because these gene models were derived solely by computational methods, we searched the TIGR-TA database for biological transcripts. We identified a single transcript assembly (TA60386_4530) representing OsPOLE2. This transcript aligns to the OsPOLE2A locus but has a different intron/exon structure than the computational gene model. Translation of this sequence yields a protein containing the domains missing from the computational model, so the transcript likely specifies a functional OsPOLE2 protein. We were unable to detect any biological transcripts for the OsPOLE2B gene, and stop codons in the genomic sequence prevent the prediction of a full-length coding sequence that would contain all of the conserved domains. As a consequence, we suggest that OsPOLE2B is a pseudogene.
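The reasoning behind a pseudogene call of this kind can be illustrated with a minimal Python sketch that translates a coding sequence with the standard genetic code and reports in-frame stop codons occurring before the final codon. The two short DNA strings and the helper names are hypothetical and are unrelated to the OsPOLE2 loci; real gene models would also require splicing out introns first.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def translate(dna, frame=0):
    """Translate DNA in frame 0, 1, or 2; '*' marks a stop, 'X' an unknown codon."""
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)

def premature_stops(dna, frame=0):
    """Return 1-based codon positions of stops located before the final codon."""
    protein = translate(dna, frame)
    return [i + 1 for i, aa in enumerate(protein[:-1]) if aa == "*"]

# Hypothetical toy coding sequences, invented for illustration only.
intact = "ATGGCTGAAAAGTAA"      # ATG GCT GAA AAG TAA
disrupted = "ATGGCTTGAAAGTAA"   # ATG GCT TGA AAG TAA

print(translate(intact), premature_stops(intact))
print(translate(disrupted), premature_stops(disrupted))
```

Scanning all three frames of a candidate region in this way shows whether any frame could encode an uninterrupted protein of the expected length.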
Our search for POLE3 and POLE4 homologs in Arabidopsis and rice returned a family of histone-fold proteins, which includes the core histones as well as a large number of CCAAT box-binding transcription factors. Histone-fold proteins share a conserved three-dimensional conformation but are only distantly related in primary sequence (Arents and Moudrianakis, 1995; Marino-Ramirez et al., 2006). We were unable to specify POLE3 and POLE4 homologs based on sequence similarity. However, the POLE small subunits have been identified and functionally verified in a multitude of other eukaryotes (Garg and Burgers, 2005), and it is likely that a directed experimental approach will identify plant homologs.
PCNA, the processivity clamp for POLD, is highly conserved among eukaryotes and is structurally related to the bacterial β-sliding clamp (Maga and Hubscher, 2003; Naryzhny et al., 2005). PCNA homologs have been described in numerous plants (Toueille et al., 2002) and will not be described in detail here.
Replication factor C (RFC) is a five-subunit clamp loader complex that uses ATP to load PCNA onto DNA (Ellison and Stillman, 1998; Venclovas et al., 2002; Majka and Burgers, 2004). Variable nomenclature has been used to describe the subunits in yeasts and animals, with HsRFC1-5 corresponding to budding yeast ScRFC1, ScRFC4, ScRFC5, ScRFC2, and ScRFC3, respectively (we have adopted the human nomenclature here). All five RFC subunits have been identified in both Arabidopsis and rice, and contain conserved sequence motifs characteristic of other eukaryotic RFCs (Furukawa et al., 2003). In rice, the RFC subunits are expressed in proliferating tissues and transcript levels respond to chemical treatments that arrest cell cycle progression (Furukawa et al., 2003). The sequence conservation and experimental data indicate that, like other eukaryotes, plants utilize a five-subunit RFC complex to load PCNA.
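Because the human and budding yeast RFC subunit numbers do not line up, a small lookup table can help when comparing annotations. The Python sketch below encodes only the correspondence stated in the text; the dictionary and function names are hypothetical.

```python
# Human RFC subunit names mapped to budding yeast names, as listed in the text.
HUMAN_TO_YEAST_RFC = {
    "HsRFC1": "ScRFC1",
    "HsRFC2": "ScRFC4",
    "HsRFC3": "ScRFC5",
    "HsRFC4": "ScRFC2",
    "HsRFC5": "ScRFC3",
}
# Reverse lookup, valid because the mapping is one-to-one.
YEAST_TO_HUMAN_RFC = {yeast: human for human, yeast in HUMAN_TO_YEAST_RFC.items()}

def to_human_name(name):
    """Return the human-style name for a yeast RFC subunit; other names pass through."""
    return YEAST_TO_HUMAN_RFC.get(name, name)

print(to_human_name("ScRFC4"))
```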
Budding yeast DPB11 is essential for the recruitment of POLE and POLA complexes to origins (Masumoto et al., 2000; Bell and Dutta, 2002). Fission yeast RAD4, human TOPBP1, Drosophila MUS101, and Xenopus CUT5 proteins are all thought to be functional homologs of ScDPB11 (Hashimoto and Takisawa, 2003; Kim et al., 2005). Amino acid sequence conservation between the yeast and animal proteins is limited, but all contain copies of the breast cancer 1 gene (BRCA1) C-terminal domain (BRCT). Four BRCT domains are present in ScDPB11, while HsTOPBP1 and DmMUS101 contain eight and seven copies of the BRCT domain, respectively (Makiniemi et al., 2001; Kim et al., 2005). We searched for an Arabidopsis DPB11/TOPBP1 homolog and found two BRCT domain-containing proteins, meiosis 1 (MEI1, At1g77320), and At4g02110 (Table I). AtMEI1 contains five BRCT domains and plays an essential role in DNA repair during meiosis (Grelon et al., 2003). At4g02110 contains only two BRCT domains (Pfam data not shown) but does show significant similarity to other TOPBP1 proteins (Table I). Based on sequence similarity, it is not possible to determine if one, both, or neither of these proteins are true homologs of TOPBP1. AtMEI1 mutants do not display any visible mitotic phenotypes, suggesting that plants do not require a TOPBP1 homolog or that another protein performs this function (Grelon et al., 2003). In rice, we identified one protein (Os11g08660) that is most similar to AtMEI1 (38% identity) and two proteins with similarity to At4g02110 (Table I). Considering that TOPBP1/DPB11 function is conserved from yeast to human, a directed effort to identify a functional homolog in plants would be worthwhile.
Okazaki Fragment Maturation
Semidiscontinuous replication requires machinery to process the Okazaki fragments generated during lagging strand synthesis (Fig. 1D). As POLD/PCNA extends the Okazaki fragment, it encounters the 5′ end of the downstream replication product and displaces it from the template strand, generating a flap (Maga et al., 2001). The flap is then cleaved to generate a nick, which is ligated to form the intact nascent strand (Kao and Bambara, 2003). The dominant mechanism of flap cleavage requires Flap Endonuclease1 (FEN1) to cleave the 5′ flap structure and DNA Ligase1 (LIG1) to seal the nick (Kao et al., 2004; Rossi and Bambara, 2006). Both FEN1 and LIG1 homologs have been described in plants (Table I). Other models of Okazaki fragment maturation require DNA2 and/or RNASE H in addition to FEN1 for efficient processing of the flap structure (Qiu et al., 1999; Masuda-Sasa et al., 2006; Stewart et al., 2006). We identified a single putative DNA2 gene in both the Arabidopsis and rice genomes (Table I). AtDNA2 is 33% identical to HsDNA2 and 54% identical to the putative OsDNA2 protein. We were unable to identify an RNASEH1 homolog in any plant species, but both Arabidopsis and rice encode an RNASE H2 homolog. Perhaps RNASE H2 is the dominant RNASE H enzyme in plants.
Multiple Copy Core DNA Replication Genes
Plants may be unique among eukaryotes in that they have multiple copies of numerous core DNA replication genes (Table II). This raises the question of whether some copies have evolved specialized functions. Indeed, this has been demonstrated for the single-stranded DNA (ssDNA)-binding RPA complex in rice (see references cited below).
RPA functions as a heterotrimeric complex to stabilize ssDNA during replication, repair, and transcription (Iftode et al., 1999; Fanning et al., 2006; Zou et al., 2006). The largest subunit (RPA1) contains the primary ssDNA-binding activity, while the two smaller subunits (RPA2 and RPA3) stabilize the complex and mediate interactions with replication and repair machinery (Zou et al., 2006). Aside from humans, which have two RPA2 homologs (Keshav et al., 1995), plants are the only eukaryotes that possess multiple copies of an RPA gene. Rice has three copies each of RPA1 and RPA2, and a single RPA3 gene (Table II). Arabidopsis has five putative RPA1 genes and two copies each of RPA2 and RPA3 (Table II). Three distinct RPA complexes, termed A type, B type, and C type have been characterized in rice (Ishibashi et al., 2005, 2006). The A-type complex localizes to the chloroplast and, thus, is expected to function primarily in organelle processes. The B- and C-type complexes both localize to the nuclear compartment, suggesting that they act in nuclear processes, but the precise function of each complex remains to be determined. It is not known whether analogous complexes occur in Arabidopsis, but mutation of Arabidopsis RPA1A (At2g06510) is lethal while mutation of a different RPA1 copy (At5g08020) results in viable but mutagen-sensitive plants (Ishibashi et al., 2005).
Three members of the pre-RC (ORC1, CDC6, and CDT1) are duplicated in Arabidopsis. Both the AtORC1A (At4g14700) and AtORC1B (At4g12620) promoters have been shown to contain consensus E2F-binding sites (Masuda et al., 2004), but only AtORC1A transcripts were found to be elevated in tissues that undergo extra endocycles (Diaz-Trivino et al., 2005), suggesting distinct functions with respect to mitotic and endocycling cells. Similarly, AtCDC6A (At2g29680) and AtCDC6B (At1g07270) have distinct expression profiles (Masuda et al., 2004). The only other eukaryote known to have multiple CDC6 genes is Xenopus laevis, where XlCDC6A and XlCDC6B have distinct N-terminal regulatory motifs and different expression patterns in the developing frog embryo (Tikhmyanova and Coleman, 2003). XlCDC6A acts prior to the midblastula transition, after which XlCDC6B becomes the dominant protein (Tikhmyanova and Coleman, 2003). The midblastula transition coincides with extensive chromatin remodeling, activation of zygotic transcription, and a clear shift in the regulation of origin usage. It has been suggested that XlCDC6A and XlCDC6B play key roles in determining origin usage during development. An understanding of whether the two AtCDC6 genes are functionally distinct awaits further analysis. We identified only single copies of ORC1 and CDC6 in rice, indicating that any specialized functions that may have evolved in Arabidopsis are not required for plant development. Rice contains two nearly identical CDC45 proteins while Arabidopsis only has one, but both Arabidopsis and rice contain two CDT1 genes. A directed effort to understand whether these multicopy DNA replication genes in plants have functional significance would be worthwhile.
PCNA and POLE1 genes have also been duplicated in Arabidopsis (Table II). Interestingly, we observed that the AtCDC6B (At1g07270), AtPCNA1 (At1g07370), and AtPOLE1A (At1g08260) genes are located in close physical proximity on chromosome 1, and the other copies of these genes, AtCDC6A (At2g29680), AtPCNA2 (At2g29570), and AtPOLE1B (At2g27120), are clustered on chromosome 2 (data not shown). A published analysis of segmental duplications in the Arabidopsis genome indicated that this region was duplicated in a polyploidy event approximately 24 to 40 million years ago, prior to the Arabidopsis/Brassica rapa split (Blanc et al., 2003). We compared the sequences and found that the levels of nucleotide conservation between each copy of AtCDC6, AtPCNA, and AtPOLE1 are similar at 82%, 85%, and 85%, respectively. However, at the amino acid level, the two copies of AtPCNA and AtPOLE1 are identical at 96% and 90% of residues, respectively, while the two copies of AtCDC6 only show 72% identity. This situation provides an excellent opportunity for a more detailed analysis of the different evolutionary pressures on these genes.
Table II. Copy numbers of core DNA replication genes
a, TOPBP1 aliases include DPB11, CUT5, MUS101, and RAD4. b, The second copy of RPA2 in human is named RPA4. GF, Putative gene family. Species abbreviations: Hs, Human; Sc, yeast; At, Arabidopsis; Os, rice.
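The chromosomal clustering of the duplicated AtCDC6, AtPCNA, and AtPOLE1 copies can be read directly from their locus identifiers. The Python sketch below parses the identifiers quoted in the text and groups the genes by chromosome. It assumes that the five-digit serial number of an Arabidopsis locus identifier roughly follows position along the chromosome, so it is only a proxy for physical distance; the helper names are hypothetical.

```python
import re
from collections import defaultdict

# Arabidopsis locus identifier: "At", chromosome (1-5, C, or M), "g", five digits.
AGI_PATTERN = re.compile(r"^At([1-5CM])g(\d{5})$", re.IGNORECASE)

def parse_agi(locus):
    """Split a locus identifier into (chromosome, serial number)."""
    match = AGI_PATTERN.match(locus)
    if match is None:
        raise ValueError(f"not a recognized locus identifier: {locus}")
    return match.group(1).upper(), int(match.group(2))

def group_by_chromosome(genes):
    """Group gene names by chromosome, ordered by locus serial number."""
    groups = defaultdict(list)
    for name, locus in genes.items():
        chromosome, serial = parse_agi(locus)
        groups[chromosome].append((serial, name))
    return {chrom: [name for _, name in sorted(members)]
            for chrom, members in groups.items()}

# Locus identifiers as quoted in the text.
GENES = {
    "AtCDC6B": "At1g07270",
    "AtPCNA1": "At1g07370",
    "AtPOLE1A": "At1g08260",
    "AtCDC6A": "At2g29680",
    "AtPCNA2": "At2g29570",
    "AtPOLE1B": "At2g27120",
}

print(group_by_chromosome(GENES))
```

Each paralog pair falls into a different group, which matches the pattern expected if the whole region was copied in a single segmental duplication.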
CONCLUSION
Through genome-wide bioinformatic analysis of Arabidopsis and rice and a comprehensive review of the extant literature, we report that the core DNA replication machinery of animals and yeasts is conserved in plants. Generalization to other plant species is supported by the inclusion of both a monocot and a dicot in this analysis. Identification of components that have not previously been reported from any plant, including the GINS complex, MCM10, NOC3, POLA2 to 4, POLD3 and 4, and RNASEH2, will open up new avenues of research. Additionally, extension of many previously reported components to include both monocot and dicot proteins should facilitate comparison within plants and between plants and other eukaryotes.
We did not detect candidate homologs for RNASE H1 or geminin, leading us to suggest that these proteins are not conserved in plants. Geminin is a critical regulator of CDT1 activity in some metazoans (Ballabeni et al., 2004; Lutzmann et al., 2006) but has not been identified in yeast. It would be interesting to determine how CDT1 activity is regulated in plants. An intriguing possibility is that one or more members of the large CDK and cyclin gene families encoded by plant genomes (Vandepoele et al., 2002) function as CDT1 regulators. Very little is known about RNASE H enzymes in plants, and the lack of an obvious RNASE H1 homolog in plants suggests that plants may be different from other eukaryotic organisms in this regard. However, we did identify an RNASE H2 gene in both Arabidopsis and rice, and an effort to define the functional capacity of this RNASE H2 enzyme in plants is needed.
Our analysis also indicated that core DNA replication proteins from plants are more similar to human than to budding yeast proteins (Table I). This observation holds true for the majority of proteins listed in Table I and is most striking for ORC3, ORC5, ORC6, CDT1, TOPBP1, and POLD3, for which no significant alignment between Arabidopsis and budding yeast proteins could be generated. The parallels in the core DNA replication machinery between plants and animals are not limited to amino acid similarity. For example, budding yeast have only three POLD subunits, while animals have four, and there are four strong POLD candidates in both Arabidopsis and rice. The Arabidopsis and rice genomes also encode putative MCM8 and MCM9 proteins, which are part of the replication initiation complex in animals but not in yeasts. In summary, the available data suggest that animal systems may be more relevant models than budding yeast for plant DNA replication.
We also found numerous components of the core DNA replication machine that are encoded by small gene families in both Arabidopsis and rice. With few exceptions, this situation seems to be unique to plants. There is some evidence of functional divergence between copies, and it would be interesting to investigate the evolutionary relationships and functional roles of these genes in greater detail. There are many examples of overlapping functions in DNA replication and repair machinery (Kimura and Sakaguchi, 2006), and it is attractive to hypothesize that some members of DNA replication gene families have specialized roles related to repair. Similarly, plant cells often undergo endoreduplication as part of normal development, and there is some evidence suggesting that members of DNA replication gene families have specialized roles in this process. A systematic approach to determining the function of each gene copy in such families would provide an important contribution to the fields of DNA replication and plant developmental biology.
MATERIALS AND METHODS
Assembly of Yeast and Animal Reference Sequences
Core DNA replication genes were defined primarily by review of the literature. The STRING database (Snel et al., 2000; von Mering et al., 2007) was also used to supplement known protein interaction networks. Yeast (Saccharomyces cerevisiae) sequences were downloaded from the National Center for Biotechnology Information (NCBI) and the Saccharomyces Genome Database. Animal sequences were downloaded from the NCBI RefSeq database when possible, and the NCBI nonredundant database otherwise. The NCBI HomoloGene database was useful for identifying homologs in various organisms. The BLAST programs (BLASTP and TBLASTN) were used to query sequence databases. Vector NTI (Invitrogen) was used to manage and analyze sequences in house. GenBank accession numbers are provided in Supplemental File S1.
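A homology search of this kind produces lists of hits that must be screened for significance. The Python sketch below shows one possible way to filter hits, assuming they are available in the 12-column tab-separated layout that the BLAST+ command-line tools write with `-outfmt 6`. The E-value cutoff of 1e-5, the identifiers, and the scores are all hypothetical placeholders; the source does not state a threshold, and this is not the workflow used in the analysis.

```python
import csv
import io

# Column names of the default 12-column BLAST+ tabular output (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]

def significant_hits(tabular_text, max_evalue=1e-5):
    """Return hits with E-value <= max_evalue, highest bit score first.

    The default cutoff is an assumed example value, not one from the source.
    """
    reader = csv.DictReader(io.StringIO(tabular_text),
                            fieldnames=COLUMNS, delimiter="\t")
    hits = []
    for row in reader:
        evalue = float(row["evalue"])
        if evalue <= max_evalue:
            hits.append({"query": row["qseqid"], "subject": row["sseqid"],
                         "pident": float(row["pident"]), "evalue": evalue,
                         "bitscore": float(row["bitscore"])})
    return sorted(hits, key=lambda hit: hit["bitscore"], reverse=True)

# Invented example lines; identifiers and scores are placeholders, not real results.
EXAMPLE = (
    "queryA\tsubject1\t41.2\t310\t170\t6\t12\t318\t20\t325\t3e-60\t198\n"
    "queryA\tsubject2\t28.0\t90\t60\t3\t40\t128\t15\t101\t0.8\t30.1\n"
    "queryA\tsubject3\t37.5\t280\t160\t8\t5\t280\t9\t285\t2e-45\t156\n"
)

for hit in significant_hits(EXAMPLE):
    print(hit["subject"], hit["pident"], hit["evalue"])
```

Hits that pass such a filter would still need the manual curation against the literature described for the plant sequences.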
Plant Sequences
To identify core DNA replication proteins in Arabidopsis (Arabidopsis thaliana), yeast and animal proteins were used to query (BLASTP) the Arabidopsis genome databases at TAIR, TIGR, and NCBI. Sequences with significant similarity were downloaded into our Vector NTI database and putative annotations were assigned based on the function of yeast and animal proteins. Next, the NCBI PubMed and ISI Web of Science literature databases were queried for publications relevant to each protein in a plant system. Pertinent information was used to manually curate the putative annotations we assigned based on sequence similarity. This curated list of Arabidopsis proteins was then used to query (BLASTP and TBLASTN) the rice (Oryza sativa L. sp. japonica) genome database managed by TIGR. Sequences with significant similarity to Arabidopsis proteins were downloaded into our Vector NTI database and annotated accordingly. Transcripts representing core DNA replication proteins from all other plants in this analysis were downloaded from either the TIGR plant transcript assembly (Childs et al., 2007) or NCBI databases. Protein translations were performed using the Vector NTI software package.
Protein Sequence and Phylogenetic Analyses
Percent amino acid identity and similarity values reported in Table I,
Supplemental Table S1, and in the text were generated by pairwise BLAST on
the NCBI Web site using the default parameters. The percentages reported
correspond to regions of the proteins that were aligned by the algorithms. In