Insect Molecular Biology (2006) 15(5), 703–714
© 2006 The Authors. Journal compilation © 2006 The Royal Entomological Society. Blackwell Publishing Ltd

Caste development and reproduction: a genome-wide analysis of hallmarks of insect eusociality

A. S. Cristino*, F. M. F. Nunes†, C. H. Lobo†, M. M. G. Bitondi§, Z. L. P. Simões§, L. da Fontoura Costa¶, H. M. G. Lattorff**, R. F. A. Moritz**, J. D. Evans†† and K. Hartfelder‡

* Instituto de Matemática e Estatística, Universidade de São Paulo, São Paulo, Brazil; † Departamento de Genética and ‡ Departamento de Biologia Celular e Molecular e Bioagentes Patogênicos, Faculdade de Medicina de Ribeirão Preto, Universidade de São Paulo, Ribeirão Preto, Brazil; § Departamento de Biologia, Faculdade de Filosofia, Ciências e Letras de Ribeirão Preto, Universidade de São Paulo, Ribeirão Preto, Brazil; ¶ Instituto de Física de São Carlos, Universidade de São Paulo, São Carlos, Brazil; ** Institut für Biologie, Molekulare Ökologie, Martin-Luther-Universität Halle-Wittenberg, Halle (Saale), Germany; †† Bee Research Laboratory, USDA-ARS, BARC-E, Beltsville, MD, USA

Received 20 April 2006; accepted after revision 17 July 2006. Correspondence: Klaus Hartfelder, Departamento de Biologia Celular e Molecular e Bioagentes Patogênicos, Faculdade de Medicina de Ribeirão Preto, Universidade de São Paulo, Avenida Bandeirantes 3900, 14049-900 Ribeirão Preto, Brazil. Tel.: +55 16 36023063; fax: +55 16 36331786; e-mail: klaus@fmrp.usp.br

Re-use of this article is permitted in accordance with the Creative Commons Deed, Attribution 2.5, which does not permit commercial exploitation.

Abstract

The honey bee queen and worker castes are a model system for developmental plasticity. We used established expressed sequence tag information for a Gene Ontology based annotation of genes that are differentially expressed during caste development. Metabolic regulation emerged as a major theme, with a caste-specific difference in the expression of oxidoreductases vs. hydrolases. Motif searches in upstream regions revealed group-specific motifs, providing an entry point to cis-regulatory network studies on caste genes. For genes putatively involved in reproduction, meiosis-associated factors came out as highly conserved, whereas some determinants of embryonic axes either do not have clear orthologs (bag of marbles, gurken, torso), or appear to be lacking (trunk) in the bee genome. Our results are the outcome of a first genome-based initiative to provide an annotated framework for trends in gene regulation during female caste differentiation (representing developmental plasticity) and reproduction.

Keywords: caste development, oogenesis, meiosis, UCR motifs, AlignACE.

Introduction

The evolution of social organization in the Hymenoptera is intricately linked to the division of reproductive activities between highly fertile queens and functionally sterile workers (Wilson, 1971). Ontogenetically, these alternative phenotypes primarily reflect the differential feeding of larvae, a mechanism that is especially pronounced in the honey bee, Apis mellifera. Queen-destined larvae are fed large amounts of royal jelly during the entire larval feeding phase, whereas larvae destined to become workers receive an altered diet during the last larval instars (Haydak, 1970). This differential feeding program, in turn, acts on the endocrine system, where it generates caste-specific signatures in juvenile hormone (JH) and ecdysteroid titres (Hartfelder & Engels, 1998; Rachinsky et al., 1990). These metamorphic hormones are part of the endocrine programme that drives morphogenesis into either of the two alternative pathways.

The major differences between an adult honey bee queen and a worker reside in the reproductive system. A queen usually has close to 200 ovarioles per ovary and is capable of producing several hundred eggs per day. Workers, in contrast, have between two and 12 ovarioles per ovary (Snodgrass, 1956), which do not show signs of ongoing oogenesis as long as the queen is present. If the queen is lost, a number of workers can activate their ovaries and produce haploid eggs that will develop into drones (Kropácová & Haslbachová, 1971; Page & Erickson, 1988; Moritz et al., 1996).

In order to come to an understanding of the molecular nature and the signal transduction pathways underlying these developmental and ovary activation signals, differential gene expression profiling in honey bee caste development


was initiated in the late nineties. The main body of currently available data resulted from a cDNA library generated by suppression subtractive hybridization (SSH) that contrasted queen and worker larvae (Evans & Wheeler, 1999). Subsequent macroarray analyses (Evans & Wheeler, 2000) revealed a clustering of these expressed sequence tags (ESTs) into three distinct groups: genes overexpressed in young (bipotent) larvae, genes overexpressed in fifth-instar queen larvae, and genes overexpressed in fifth-instar worker larvae. A second study, focusing on oxidative metabolism, identified a set of differentially expressed mitochondrial genes (Corona et al., 1999). The third approach was a DDRT-PCR screen for hormone-responsive genes to investigate the mode of action of ecdysteroids in the differentiation of the larval ovary (Hepperle & Hartfelder, 2001). Many of these EST sets could not be properly annotated at that time, either because of a limited number of fully sequenced insect genomes, or because the libraries contained large numbers of transcripts in 3′-gene regions, including poorly conserved untranslated regions (UTRs). The draft assembly for the honey bee genome (Honey Bee Genome Sequencing Consortium, 2006) now permits a much more reliable annotation of this unique set of experimentally validated genes.

Reproductive activity of honey bees is determined in a two-step process. The basic differences in reproductive capacity between queen and workers manifest themselves during larval development by a wave of programmed cell death that leads to the destruction of over 95% of the ovariole primordia in the larval ovary of workers (Schmidt-Capella & Hartfelder, 1998). In the adult life cycle of each caste, the co-ordinated flux of egg production through previtellogenic and vitellogenic growth will require the activity of other sets of genes. Some of these act as determinants of the major egg and also embryonic axes. As the fruit fly is the most well-developed insect model for axis determination (St Johnston & Nüsslein-Volhard, 1992), and maternal factors have not yet been functionally characterized in the honey bee, searching the genome assembly (Honey Bee Genome Sequencing Consortium, 2006) provides the first major opportunity to explore putative patterning networks in honey bees.

The vitellogenic growth phase of the honey bee oocyte has long been the centre of attention as a means of describing differential fertility of the female castes (Engels, 1974). The synthesis of large amounts of vitellogenin by the queen fat body is intimately related to her high reproductive rate. The equally high vitellogenin titres in haemolymph of nonreproducing young worker bees, however, have been an enigma, as their ovaries are inactive in the presence of the queen. Vitellogenin expression has apparently become uncoupled from oocyte growth during the evolution of the sterile worker caste and has acquired secondary functions. It became involved in the production of royal jelly (Amdam et al., 2003) and in the regulation of worker lifespan (Amdam et al., 2004) through an inhibitory effect on the endocrine system (Guidugli et al., 2005). Along with such unique life-history traits related to socially organized reproduction, honey bees also promise to answer new questions involving meiosis, as the honey bee genome exhibits recombination rates that exceed those of all other higher organisms (Hunt & Page, 1995; Solignac et al., 2004), and as honey bee males, being haploid, forego meiosis I in producing gametes.

The honey bee genome sequence database (Honey Bee Genome Sequencing Consortium, 2006) has become an extremely valuable resource, not only for comparative genomics but also for functional genomics. One of the oldest and, for evolutionary biologists, most challenging questions in social insect biology is the development of a reproductive and a nonreproductive caste (Darwin, 1859). Apart from its implications for evolutionary theory in terms of kin selection (Hamilton, 1964), this is essentially a question of how developmental pathways diverge to shape distinct phenotypes, and how oogenesis is regulated to achieve levels of extremely high (queen) and extremely low (worker) fertility.

The annotation of genes related to caste development and differential reproduction in the honey bee has implications well beyond this species. It represents the first genome-wide annotation of a molecular architecture behind reproductive division of labour. In the light of current discussions on the importance of alternative phenotypes in the evolution of novelties (West-Eberhard, 2003), the honey bee genome information is certainly one of the most valuable resources. In the present manuscript we delineate a strategy for moving from a straightforward gene annotation approach to functional studies based on motif analysis of upstream regulatory regions.

Results and discussion

From caste to BLAST: differentially expressed genes in caste development

The full list of genes that are overexpressed in fifth-instar queen or worker larvae is made available online in the Supplementary material (Table 1S). This list includes scaffold number, corresponding EST number(s), GLEAN3-predicted protein sequence, similarity and identity indices to corresponding Drosophila melanogaster orthologs, as well as protein domain information (Pfam).

A general result was that a relatively large subset of genes (nine of 34) overexpressed in honey bee queen larvae is represented by putative Drosophila orthologs for which no Gene Ontology (GO) term for Biological Process is indicated in FlyBase. In contrast, all worker genes correspond to functionally relatively well-defined Drosophila genes. Even when taking into consideration the conceptual limits in attributing GO terms on biological process from Drosophila orthologs to honey bee genes, this finding could have a bearing on basic questions in socioevolution, namely which caste is the novelty, the queen or the worker(s). Phrased in other terms, the genome sequence information now makes it possible to address, at a molecular level, questions that are fundamental to understanding the role of (and evolutionary trends in) ontogenetic processes that structure insect societies, especially in hymenopterans. Such basic questions are: (1) how many degrees of freedom (or release from constraints) may actually have been gained from splitting the functions normally performed by a solitary ancestral hymenopteran female into two or more castes, and (2) how was this release from constraints integrated into postembryonic differentiation processes to generate truly alternative phenotypes? A second observation of potential interest to functional genomics was that a relatively large subset of the caste-related genes maps to chromosome 2 (seven of 51 unique sequences).

Most genes in the caste gene list are represented by one or two EST hits, except for a predicted hexamerin 70b gene (GB10869-PA). This gene was evidenced by 10 ESTs, one in a 5′-located exon and nine in the 3′ region (five ESTs comprising parts of exon 7 and parts of the 3′-UTR, the other four ESTs landing in exons 6 and 7). The macroarray data (Evans & Wheeler, 2000) established this gene as overexpressed in the worker caste. Hexamerins are an important class of storage proteins that show interesting expression patterns related to caste and reproduction in many social insects (Martinez et al., 2001; Hunt et al., 2003; Zhou et al., 2006a,b). A cDNA encoding the honey bee Hexamerin 70b subunit has recently been cloned and sequenced (Cunha et al., 2005), and hormone manipulation experiments showed that the abundance of hexamerin 70b transcripts in larval development is positively correlated with high levels of JH and ecdysteroids. This could actually reflect a regulatory feedback function in JH titre regulation, as exemplified in the termite Reticulitermes flavipes, where the Hex1/Hex2 ratio controls JH availability for caste-specifically differentiating tissues (Zhou et al., 2006b).

Within the honey bee caste genes for which GO information was imported and deduced from their Drosophila orthologs, we noted a predominance of terms clustering as ‘cellular physiological process’ (95%; GO:0050875) and ‘metabolism’ (90%; GO:0008152) in the ‘Biological Process’ (GO:0008150) category (Fig. 1A). GO-statistics differences between queens and workers became apparent in terms clustering as ‘cell differentiation’ (0% for queen and 28.5% for workers; GO:0030154) and ‘metabolism’ (96% for queen and 78.5% for worker; GO:0008152) in the ‘Biological Process’ (GO:0008150) category (Fig. 2A).

With respect to ‘Molecular Function’ (GO:0003674), most terms were related to mRNA translation: ‘nucleic acid binding’ (38%; GO:0003676), ‘structural constituent of ribosome’ (24%; GO:0003735), ‘protein binding’ (12%; GO:0005515), ‘nucleotide binding’ (12%; GO:0000166) and ‘translation factor activity, nucleic acid binding’ (7%; GO:0008135). Further important terms were ‘oxidoreductase activity’ (19%; GO:0016491) and ‘hydrolase activity’ (16.5%; GO:0016787) (Fig. 1B). For these latter two terms we noted potentially interesting differences related to caste, with ‘hydrolase activity’ being overrepresented by worker-transcribed genes, whereas ‘oxidoreductase activity’ was exclusively represented by queen genes (Fig. 2B). Even though these GO assignments on Molecular Function are based on evidence from D. melanogaster, without experimental evidence for Apis mellifera, the corresponding genes are well conserved in sequence and show the relevant protein domains (Supplementary material, Table 1S), and thus are indicative of functional trends.

Figure 1. Dominant gene ontology terms for (A) Biological Process and (B) Molecular Function in honey bee genes with an experimentally validated caste-specific expression pattern during the last larval instar. The graph was generated by a FatiGO analysis set at level 3. Frequencies indicate the appearance of GO terms in the total set of queen and worker differentially expressed genes.

In general terms, the caste-specific separation into metabolic pathway preferences, oxidoreductases vs. hydrolases, may reflect the switch in diet that a worker larva experiences during the fourth and fifth larval instar. This represents a switch from a protein/lipid-rich diet to a more carbohydrate-rich diet (Haydak, 1970), and this switch apparently is accompanied by an increase in the expression of genes coding for proteins with hydrolase activity. Similar switches in gene expression patterns have recently been reported for D. melanogaster in an experiment where larvae were shifted from a cornmeal diet to a banana diet (Carsten et al., 2005), resulting in the up- or downregulation of 55 genes of a test population of 6000. Among these are five genes with dehydrogenase/oxidoreductase activity. These parallels in dietary switch responses are indicative of conserved coregulated gene networks. An open question is, of course, how these can be co-opted to generate different phenotypes, such as the castes of social insects. In this respect, social insects clearly go a big step beyond the simple metabolic switch response seen in Drosophila. They have apparently incorporated divergent metabolic regulation into a network architecture consistent with morphogenetic differentiation. This required that metabolic regulation became integrated, through the endocrine system, with developmental patterning processes.

The importance of metabolic regulation for caste development has also come to light in a recent Representational Difference Analysis (RDA) study on caste development in the highly eusocial stingless bee Melipona quadrifasciata (Judice et al., 2006). This is particularly interesting because in this genus caste development is thought to be based on a genetic predisposition (Kerr, 1950). Metabolic regulation may thus be a sine qua non for caste development, and caste-specific metabolic pathways may be set in motion rather independently of the nature of the initial switch (nutritional or genetic). The question of how this metabolic switch may integrate with the resultant endocrine signature characteristic for each caste is still a wide open field, but recent studies in Drosophila showing an interaction between ecdysone and insulin signalling in the determination of body size (Colombani et al., 2005; Mirth et al., 2005) may provide a lead.

This is also the point to reflect on how justified it is to heuristically rely on Drosophila orthologs and to use their GO attributes in a developmental context (caste differentiation) that has no parallel in Drosophila. A recent gene expression profiling study in the ant Camponotus festinatus, employing a microarray set-up of 384 clones, showed significantly different expression levels for larval vs. adult ants in 91 genes (21 confirmed by qRT–PCR), including an Apis hexamerin 70b ortholog (Goodisman et al., 2005). When comparing the temporal expression patterns of these ant genes with expression profiles for their respective Drosophila orthologs (Arbeitsman et al., 2002), relatively little accord was noted for the two species, leading to the suggestion that these genes may have taken on distinct functions due to the long divergence time between dipterans and hymenopterans (Goodisman et al., 2005). Differences aside, these examples show that in practically all studies on large-scale functional considerations in gene expression we are strongly wedded to Drosophila, and even though functional divergence in orthologs may have occurred, there is little experimental gene-by-gene evidence available for any of the major insect orders outside of Diptera.

Functional studies are clearly profiting from the now available honey bee genome sequence, as evident from the increasing number of RNAi experiments in honey bees (see citations in Honey Bee Genome Sequencing Consortium, 2006). This is still a small number compared with the large-scale RNAi assays established for Drosophila (Boutros et al., 2004), but the development of cell culture approaches in the honey bee (Bergem et al., 2006) represents a step in this direction.

Figure 2. Gene Ontology categories with caste-specific expression patterns. For Biological Process (A), genes classified as part of cell differentiation processes are significantly overexpressed in workers, whereas genes related to metabolism are overexpressed in queen larvae. In the Molecular Function categories (B) we observed an apparent split indicating differential enzyme preferences in queens (overexpress oxidoreductases) and in workers (overexpress hydrolases). The graph was generated by a FatiGO analysis set at level 3. Frequencies indicate the appearance of GO terms in the queen (black bars) and worker differentially expressed genes (grey bars).

Alternatively, regulatory functional associations between genes and their integration into networks can be inferred from the presence of response elements in upstream control regions. In our analysis of differentially expressed genes in queen–worker development we took a bioinformatics approach for a first look into the molecular architecture of a developmental polyphenism.

Motif search in upstream regions of differentially expressed genes

The genes related to caste development are among the first honey bee genes for which experimentally validated expression data were generated (Corona et al., 1999; Evans & Wheeler, 1999, 2000; Hepperle & Hartfelder, 2001; Guidugli et al., 2004). Certainly, these 51 genes do not comprise all the genes involved in caste development, but they are expected to be prominent players, as they were the ones that stood out in the SSH and DDRT-PCR approaches. The 51 caste genes do not represent gene families, but rather fall into many very different molecular function categories. This made us ask whether the observed overexpression pattern of different genes in either queen or worker larvae may be associated with the occurrence of specific regulatory motifs in the upstream control regions (UCR) of these genes.

Three different algorithms, AlignACE (Roth et al., 1998), MEME (Bailey & Elkan, 1995) and MDscan (Liu et al., 2002), were used to construct a pipeline for detecting overrepresented motifs in the two unaligned sets of UCR sequences for the caste-specifically expressed genes. This pipeline was run on a ‘top-10’ set of 12 genes (six for each caste), which showed the most pronounced caste differences in expression (Evans & Wheeler, 2000), and also on a randomly selected set of UCRs (background control). We calculated four different metrics for each motif: MAP score (Roth et al., 1998), a group-specificity score (Church score) (Hughes et al., 2000), and a ROC AUC and MNCP metric (Clarke & Granek, 2003). A first set of filters was used to detect motifs with a potential for regulatory functions (MAP score ≥ 5; ROC AUC ≥ 0.7). This resulted in 46 motifs out of 123 total UCR motifs found in the queen UCR set, and in 71 motifs out of 261 total found in the worker UCR set (Supplementary material, Table 2S).
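The first filtering step can be illustrated with a minimal Python sketch. It is not the pipeline used in the study: the `Motif` record, its field names and the example values in the usage note are hypothetical, and only the two thresholds (MAP score ≥ 5, ROC AUC ≥ 0.7) are taken from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Motif:
    """Hypothetical record for one candidate motif (field names are assumptions)."""
    consensus: str
    map_score: float
    roc_auc: float


def passes_first_filter(motif, min_map=5.0, min_auc=0.7):
    """True if the motif meets both thresholds quoted in the text."""
    return motif.map_score >= min_map and motif.roc_auc >= min_auc


def filter_motifs(motifs, min_map=5.0, min_auc=0.7):
    """Keep only motifs that pass the MAP and ROC AUC thresholds, preserving order."""
    return [m for m in motifs if passes_first_filter(m, min_map, min_auc)]
```

For example, calling `filter_motifs` on a list of made-up motifs keeps those whose scores sit on or above both cut-offs and drops any motif that fails either one.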

A parametric statistical test (MANOVA; P = 0.0001, Wilks’ lambda = 0.78, F = 72) and a nonparametric statistical test (Kolmogorov–Smirnov, Table 1) on ROC AUC and MNCP indices showed that these two sets of filtered motifs are significantly different from a randomly selected set of motifs. The rank-order metrics ROC AUC and MNCP have previously been used to compare the association of short regulatory sequence features with gene expression data (microarray analyses on coregulated genes), and they have been useful in flagging false positives erroneously included in ‘top-10’ sets of differentially expressed genes (Clarke & Granek, 2003).
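As a reminder of what a rank-order metric such as ROC AUC measures, the following sketch computes it directly from its pairwise definition: the fraction of (group gene, background gene) pairs in which the group gene receives the higher motif score, with ties counted as one half. This is a generic illustration under the assumption that each upstream region has been given a single motif score; the function and argument names are hypothetical, and the MNCP metric is not reproduced here.

```python
def roc_auc(group_scores, background_scores):
    """ROC AUC from its pairwise (Mann-Whitney) definition.

    group_scores: motif scores for genes in the group of interest.
    background_scores: motif scores for background genes.
    Returns a value in [0, 1]; 0.5 means the motif does not separate the sets.
    """
    if not group_scores or not background_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for g in group_scores:
        for b in background_scores:
            if g > b:
                wins += 1.0      # group gene ranked above background gene
            elif g == b:
                wins += 0.5      # ties contribute half a win
    return wins / (len(group_scores) * len(background_scores))
```

The quadratic loop is chosen for clarity; for large gene sets a rank-based formulation gives the same value more efficiently.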

To select highly specific motifs found in each data set, we used the group-specificity score (Church score ≤ 1e−05; Hughes et al., 2000) to identify the most likely motifs involved in decision making for pathways leading to queen (two motifs, Fig. 3A) or to worker development (12 motifs with Church score ≤ 1e−07, Fig. 3B). As the SSH and DDRT-PCR approaches on caste development can be expected to retrieve only a subpopulation of such genes, these motifs represent only a partial scenario of the transcriptional regulatory network underlying caste development. The motifs can now be used to screen other GLEAN3-predicted genes to integrate a candidate list of putatively coregulated genes in caste development that can be submitted to further experimental validation.
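A group-specificity score of this kind is usually described as a hypergeometric tail probability: the chance that at least the observed number of the genome's best-matching genes would fall into the input group if genes were drawn at random. The sketch below implements that calculation under this assumption; it is not taken from the study, and the parameter names and the numbers in the usage note are hypothetical.

```python
from math import comb


def group_specificity(total_genes, group_size, top_n, overlap):
    """Hypergeometric tail P(X >= overlap).

    total_genes: number of genes whose upstream regions were scanned.
    group_size:  number of genes in the input group (e.g. worker genes).
    top_n:       number of best-scoring genes considered for the motif.
    overlap:     how many of those top_n genes belong to the input group.
    Smaller values indicate a more group-specific motif.
    """
    if not (0 <= group_size <= total_genes and 0 <= top_n <= total_genes):
        raise ValueError("group_size and top_n must lie between 0 and total_genes")
    if overlap < 0:
        raise ValueError("overlap must be non-negative")
    upper = min(group_size, top_n)
    favourable = sum(
        comb(group_size, i) * comb(total_genes - group_size, top_n - i)
        for i in range(overlap, upper + 1)
    )
    return favourable / comb(total_genes, top_n)
```

With made-up numbers, `group_specificity(10, 5, 5, 5)` is the probability that all five top-scoring genes out of ten come from a five-gene group, namely 1/252.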

Each motif found in UCRs of queen (46) and worker (71) overexpressed genes was compared with the entire set of D. melanogaster cis-regulatory motifs contained in the TRANSFAC database (version 4.0; Wingender et al., 2000). Only alignments passing 80% identity for each position-specific site matrix (PSSM) were considered as significant matches. Whereas none of the most specific motifs for each caste showed similarity to any of the D. melanogaster motifs, some of the more ubiquitous ones did resemble binding sites of transcription factors, such as Antennapedia, Ultrabithorax, zerknüllt, even skipped, trithorax-like, tailless, paired, fushi tarazu and Adh transcription factor 1 (Supplementary material, Table 2S).

When we plotted the positions of the two queen and the 12 worker motifs in the UCRs of the caste-specifically expressed genes (Fig. 4), an interesting pattern emerged for the worker-specific motifs. Some of the worker motifs appeared to be clustered and occurring in tandem; furthermore, they were positioned relatively close to the predicted translation start sites in some of the genes that are overexpressed during worker development (annotation results of these genes are listed in Supplementary material, Table 1S). A position close to the predicted translation start sites is generally taken as a sign of strong regulatory effect (Davidson, 2001).

Table 1. Kolmogorov–Smirnov analysis of ROC AUC and MNCP metric for statistical significance of putative regulatory motifs in upstream control regions of genes with queen- or worker-specific expression patterns. These motifs were contrasted with a random set of motifs detected in a random set of UCRs of GLEAN3-predicted honey bee genes.

Group pairs                  ROC AUC    MNCP
Random × (Queen + Worker)    P > 0.1    P < 0.001
Random × Queen               P > 0.1    P < 0.005
Random × Worker              P > 0.1    P < 0.001
Queen × Worker               P < 0.1    P > 0.1
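The Kolmogorov–Smirnov comparisons summarised in Table 1 rest on the maximum distance between the empirical cumulative distributions of a metric in two motif sets. A minimal sketch of that statistic is given below, assuming each motif set is represented simply as a list of metric values (hypothetical inputs); it returns only the statistic D and does not compute the P-values reported in the table, for which a statistics library such as SciPy (`scipy.stats.ks_2samp`) would normally be used.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic D.

    D is the largest absolute difference between the two empirical
    cumulative distribution functions, evaluated at every observed value.
    """
    if not sample_a or not sample_b:
        raise ValueError("both samples must be non-empty")
    a_sorted = sorted(sample_a)
    b_sorted = sorted(sample_b)
    d = 0.0
    for x in a_sorted + b_sorted:
        # Fraction of each sample that is <= x
        cdf_a = bisect_right(a_sorted, x) / len(a_sorted)
        cdf_b = bisect_right(b_sorted, x) / len(b_sorted)
        d = max(d, abs(cdf_a - cdf_b))
    return d
```

A value of 0 means the two empirical distributions coincide, and a value of 1 means the samples do not overlap at all.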

As caste development is highly dependent on changes in haemolymph titres of JH and ecdysteroids, we also screened the UCRs of the differentially expressed genes for putative nuclear receptor binding sites. Regulatory elements involved in the JH response are not well understood yet, so any prediction in this direction would be elusive (Wheeler & Nijhout, 2003). Functional ecdysone response elements (EcRE) have, however, been identified, and it is now well established that the EcR/USP complex binds to direct or inverted (palindromic) repeats (Riddihough & Pelham, 1987; Antoniewski et al., 1995; Perera et al., 2005). A PSSM search (Wassermann & Sandelin, 2004) based on a canonical representation (rGkTCAaTGamcy) (Perera et al., 2005) did not reveal any putative EcRE motif in the UCRs of the 51 caste-differentially expressed genes. However, this does not rule out that these genes respond to changes in JH and/or ecdysteroid titres, as these hormones require EcR/USP binding primarily in the expression of early genes, but not necessarily for the late response genes (Li & White, 2003; Sullivan & Thummel, 2003).
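To make the idea of screening upstream regions for the canonical EcRE concrete, the sketch below scans a DNA sequence on both strands for matches to the consensus rGkTCAaTGamcy. It is a deliberately simpler stand-in for the PSSM search used in the study: every position is treated as a strict IUPAC code regardless of letter case (an assumption; lower case in the published consensus may mark less conserved positions), and the function names are hypothetical.

```python
import re

# IUPAC nucleotide codes expressed as regular-expression character classes
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "K": "[GT]", "M": "[AC]",
    "S": "[CG]", "W": "[AT]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def consensus_to_regex(consensus):
    """Compile an IUPAC consensus into a regex; the lookahead allows overlapping hits."""
    body = "".join(IUPAC[c] for c in consensus.upper())
    return re.compile("(?=(" + body + "))")


def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_sites(seq, consensus="rGkTCAaTGamcy"):
    """Return sorted (start, strand) pairs; start is 0-based on the forward strand."""
    pattern = consensus_to_regex(consensus)
    seq = seq.upper()
    n, k = len(seq), len(consensus)
    hits = [(m.start(), "+") for m in pattern.finditer(seq)]
    # Matches on the reverse strand are mapped back to forward-strand coordinates
    hits += [(n - m.start() - k, "-") for m in pattern.finditer(reverse_complement(seq))]
    return sorted(hits)
```

A sequence containing AGGTCAATGACCT, one instance permitted by the consensus, is reported as a forward-strand hit at the position where that 13-mer begins; a PSSM would additionally weight each position by how strongly it is conserved.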

In conclusion, the predictions from such a combined strategy, which searches for group-specific and for conserved regulatory motifs in GLEAN3-predicted honey bee genes, represent a major transition from nonhypothesis-driven high-throughput screens to hypothesis-driven searches for context-dependent gene expression in honey bees. Such directed search results can serve as a platform for experimental analyses of genome-wide integration in hormonal control of caste development in bees. In addition, this study exemplifies how existing algorithms for detecting shared regulatory motifs can be joined into a toolkit for predicting coregulated gene expression patterns in honey bees. These methods have been shown to be robust and are gaining acceptance for use in functional and comparative genomics (Liu et al., 2004; Pritsker et al., 2004; Zhu et al., 2005).

Figure 3. Putative regulatory motifs and their consensus sequences in UCRs of queen and worker overexpressed genes. Scores for MAP, Church, ROC AUC and MNCP metrics indicate degree of group specificity and significance level.

Oogenesis and reproduction

As caste development sets the stage for reproductive division of labour, genes involved in reproductive processes are strong candidates for functional analyses. In the present study we performed BLAST searches to identify honey bee orthologs for a list of 32 fly genes with the GO attribute ‘oogenesis’ and for four genes specifically related to ‘vitellogenesis’. The list of fly genes involved in nuclear events in germ cells consisted of 20 genes for ‘female meiosis’, 12 genes for ‘recombination’ and 21 genes under the heading ‘chromosome segregation, including segregation distortion’ (Supplementary material, Table S3). In some cases, these GO attributes for fly genes of course overlapped.

BLASTN and BLASTX searches for these fly genes against the honey bee genome assembly 3.0 and the GLEAN3 Official Set (aa) retrieved statistically well-supported putative bee orthologs for most of these candidates. For the genes involved in meiosis, recombination and chromosome segregation this finding, although not unexpected, is of interest, as meiosis in the haploid honey bee drone is strongly modified when compared with a normal diploid meiosis. The first meiosis is initiated, but the nucleus remains undivided and only the superfluous centrioles are eliminated as cytoplasmic buds (Hoage & Kessel, 1968). An interesting gene, thelytoky (th), has recently been mapped in this context (Lattorff et al., 2005). It almost completely prevents meiotic recombination in the automixis of laying workers of the Cape honey bee. As an indication of the interplay between meiosis and later development, this locus also appears to be an integral part of various gene cascades involved in caste determination (Lattorff et al., 2006).

The fly genes retrieved in the GO searches for ‘oogenesis’ represent a much larger range of Molecular Function categories, such as transcription factors, proteins regulating translation by RNA binding, RNA helicases, enzymes (ubiquitination, transfer of sugar residues, sulfotransferase), GTPase activity and several factors binding to cytoskeletal proteins (Supplementary material, Table S3). This wide range of functional categories is expected, as these genes are involved in a series of different steps during oogenesis in the polytrophic meroistic ovary. Oogenesis starts out with the maintenance of germline and somatic stem cell identity in the germline niche in the upper germarium. A key gene involved in this process is pumilio (Forbes & Lehmann, 1998), which is represented by a highly conserved bee ortholog, GB10504-PA. The second step is the formation of germ cell cysts, the determination of an oocyte within each cyst and the survival of these cysts, involving genes such as benign gonial cell neoplasm (Lin et al., 1994), encore (Hawkins et al., 1996), ovo and ovarian tumour (otu) (Staab & Steinmann-Zwicky, 1995), all well conserved in the honey bee genome. Interestingly, we could not find a clear bee ortholog for bag of marbles (bam), which is one of the prime early response genes in the cystoblast differentiation pathway in Drosophila (McKearin, 1997).

Figure 4. Map of the group-specific motifs found in queen and worker UCRs of caste-specifically expressed genes. The coding region is represented by the GLEAN3 prediction number (assembly 4.0), with arrows indicating the translation start site. Asterisks mark UCRs of the ‘top10’ set used to find the over-represented motifs.

The third step comprises previtellogenic growth of the follicle, and during these stages a number of maternal factors are deposited and anchored, either within the oocyte or in the perivitelline space, that define the egg and the future embryonic axes (for review see St Johnston & Nüsslein-Volhard, 1992). In the list of Drosophila genes involved in early steps of axis determination, a couple of surprises came up in the search for honey bee orthologs. A big surprise was that we could not find a gurken ortholog in the bee, even though this gene sets up both the anterior–posterior and dorsal–ventral axes in the Drosophila egg (González-Reyes et al., 1995), whereas downstream components of the Gurken signalling cascade appear to be preserved in the bee genome (Honey Bee Genome Sequencing Consortium, 2006). Similar apparent gaps in constituents of patterning cascades were noted for the terminal regions of the embryo, such as the lack of a torso ortholog, whereas its ligand, torso-like, is represented by a well-conserved ortholog in the bee (GB18663-PA).

With respect to genes involved in the final processes of oogenesis, we primarily looked at genes that play a part during vitellogenesis. There are four genes of interest in this class, the primary one coding for the yolk protein precursor, vitellogenin. This gene has already been sequenced for the honey bee (Piulachs et al., 2003) and, as expected, it is much more closely related to vitellogenins of other insects, and even vertebrates, than to the Drosophila yolk proteins, which apparently are derived from lipases (Hagedorn et al., 1998). The second gene of interest is the bee ortholog of yolkless, as this (GB16571-PA) could represent a putative vitellogenin receptor. The other two Drosophila genes with clear orthologs in the bee are CG18641 and CG12139, which code for a lipase and an LDL receptor, respectively.

General conclusions

The current analysis made use of previous experimental analyses of differential transcription during caste development of honey bee larvae. In the annotation of these genes, which includes references to Gene Ontology terms associated with their respective Drosophila orthologs, two major conclusions emerged. First, worker genes were better defined in terms of GO attributes than the relatively large number of queen genes that had no GO terms associated with their respective Drosophila orthologs. Even when taking into consideration the conceptual limits in attributing GO terms on molecular function and biological process from Drosophila orthologs to honey bee genes, this finding could have a bearing on a basic question in socioevolution, namely, which caste is more divergent from a nonsocial reproductive female bee prototype or reproductive ground plan: the queen or the worker? Less speculative is the second major conclusion coming out of the GO analysis for Molecular Function, showing and confirming (Eder et al., 1983; Corona et al., 1999) the important role of metabolic regulation in caste development. This facet is demonstrated especially clearly in the caste-specific expression of oxidoreductases (queen) vs. hydrolases (workers).

The honey bee genome information not only provided a much improved annotation platform for caste-specifically expressed ESTs but, even more so, opens the possibility to explore putative regulatory features of the honey bee genome. In the current study we employed modified Gibbs sampling and expectation-maximization algorithms (AlignACE, MDscan, MEME) to detect group-specific motifs in gene regions up to 1000 bp upstream of translation start sites. We detected 14 motifs that were significantly over-represented in the caste genes when compared with corresponding motifs found in a random set of GLEAN3-predicted honey bee genes. The localization of such motifs in UCRs of worker-overexpressed genes revealed a clustering of such motifs close to the predicted basal promoter regions, suggesting strong regulatory effects. Such search strategies and the detected motifs can provide the lead to reveal and unravel cis-regulatory networks for and within specific contexts of honey bee biology.

Caste polyphenism in social insects makes a strong case for the emergence of novelties at a microevolutionary level (West-Eberhard, 2003). Without pretending to discuss exhaustively the mechanisms underlying this surge of developmental plasticity, we note that two major themes become apparent in this and other studies. Regulatory change has been demonstrated in the shut-down of wing disc patterning cascades in ants (Abouheif & Wray, 2002) and is certainly also implicit in observed temporal changes in gene expression during postembryonic development of ants and bumble bees (Goodisman et al., 2005; Pereboom et al., 2005). Such change would be expected to involve cis-regulatory elements, that is, change in transcription factor binding sites in UCRs, as approached in this study, and also evolutionary change in response thresholds to circulating morphogenetic hormones (for review see Hartfelder & Emlen, 2005). The second, and quite unexpected, theme is the acquisition of new systemic functions by evolutionarily rather old proteins, such as vitellogenin and hexamerins. These apparently unspectacular proteins have evolved into key players for caste evolution and reproductive division of labour via novel regulatory connectivity with JH (Amdam et al., 2004; Guidugli et al., 2005; Zhou et al., 2006a,b).

Experimental procedures

Selection and annotation of ESTs representing differentially expressed genes in honey bee caste development

The starting point was a set of 164 entries (mainly 3′-ESTs) in GenBank (BG101532–BG101697) from an SSH library (Evans & Wheeler, 1999; Evans & Wheeler, 2000). When validated by macroarray analyses, a clustering into three major classes became apparent: (I) genes overexpressed in young larvae, (II) genes overexpressed in last instar queen larvae, and (III) genes overexpressed in last instar worker larvae. For this study we excluded the class I ESTs because their expression is not caste-specific, but rather represents expression differences between young (still bipotent) and older larvae. To the class II queen ESTs (82) we added one complete cDNA entry (AY601642) from a DDRT-PCR screen (Corona et al., 1999), and to the class III set of worker ESTs (40) we added seven GenBank dbEST entries (BG149167–BG149173) from a DDRT-PCR screen on ovary development (Hepperle & Hartfelder, 2001).

The EST sequences were submitted to BLASTN searches (parameters: -G 2 -E 3 -W 15 -F 'm D' -U -e 1e-20) against genome sequence assembly Amel_v3.0 to retrieve matches in linked or unlinked genomic contigs, and to exclude no-matches (seven ESTs in queen). ESTs that aligned within the same scaffold were checked for clustering and overlap. This clustering also served to exclude genes that were represented by non-overlapping ESTs from both castes. This procedure generated a set of 51 unique putative gene sequences overexpressed in either queen (34) or worker larvae (17). These 51 nonredundant sequences were submitted to BLASTX searches against the Official Set of GLEAN3-predicted protein sequences (cut-off value at 1e−20). For ESTs with no significant protein sequence matches, the genomic regions adjacent to the mapped EST were searched to find neighbouring ORFs, especially those nearest to putative 3′ UTRs of predicted proteins, as the EST libraries have a bias in this direction.
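The check for clustering and overlap of ESTs on a scaffold can be pictured as merging overlapping alignment intervals. The following is a minimal sketch under that assumption only; the exact clustering rules of the study are not described in the text, and the function name, the tuple layout and the scaffold and EST identifiers are hypothetical.

```python
from collections import defaultdict


def cluster_hits(hits):
    """Merge EST alignments that overlap on the same scaffold.

    hits: iterable of (est_id, scaffold, start, end), 1-based inclusive.
    Returns a list of dicts with keys scaffold, start, end and ests.
    """
    by_scaffold = defaultdict(list)
    for est_id, scaffold, start, end in hits:
        # Normalise coordinates so that start <= end for either orientation.
        by_scaffold[scaffold].append((min(start, end), max(start, end), est_id))

    clusters = []
    for scaffold in sorted(by_scaffold):
        current = None
        for start, end, est_id in sorted(by_scaffold[scaffold]):
            if current is not None and start <= current["end"]:
                # Overlap with the open cluster: extend it.
                current["end"] = max(current["end"], end)
                current["ests"].append(est_id)
            else:
                current = {"scaffold": scaffold, "start": start,
                           "end": end, "ests": [est_id]}
                clusters.append(current)
    return clusters


# Hypothetical alignments: est1 and est2 overlap, est3 and est4 stand alone.
example = [
    ("est1", "Group1.1", 100, 400),
    ("est2", "Group1.1", 350, 700),
    ("est3", "Group1.1", 900, 1200),
    ("est4", "Group2.5", 50, 300),
]
for cluster in cluster_hits(example):
    print(cluster)
```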

Official Set protein sequences were aligned against the Amel_v3.0 sequence assembly using TBLASTN to map protein to genome, and subsequently they were aligned using BLASTP against the GenBank nonredundant (nr) and the Flybase protein sequence databases. The manual feature annotation procedure of the Artemis 7.0 program (Rutherford et al., 2000) was used to map ORFs, putative splice sites of exons, and ESTs to genome coordinates. The final annotation file was generated with a Python script in GFF format (http://www.sanger.ac.uk/Software/formats/GFF).
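As an illustration of the output format, a GFF record is a single line of nine tab-separated fields (sequence name, source, feature, start, end, score, strand, frame, attributes). The sketch below is not the script used in the study; the function name and the example values are hypothetical.

```python
def gff_line(seqname, source, feature, start, end, strand,
             score=".", frame=".", attributes=""):
    """Format one annotation as a nine-column, tab-separated GFF record."""
    if start > end:
        raise ValueError("GFF requires start <= end")
    if strand not in ("+", "-", "."):
        raise ValueError("strand must be '+', '-' or '.'")
    fields = [seqname, source, feature, str(start), str(end),
              str(score), strand, str(frame), attributes]
    return "\t".join(fields)


# Hypothetical record: a CDS on a scaffold, labelled with a GLEAN3 identifier.
print(gff_line("Group1.1", "manual", "CDS", 100, 250, "+",
               frame=0, attributes="GB10504-PA"))
```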

Honey bee sequences annotated as orthologs to D. melanogaster genes were putatively assigned the GO terms listed in the respective Flybase entry. In addition, the definition of new GO terms (Biological Process ontology) related to caste development and polyphenism (GO:0048651 and GO:0048650, respectively) was co-ordinated with the Gene Ontology Consortium (Ashburner et al., 2000). The FatiGO web tool (Al-Shahrour et al., 2004) was used to cluster GO terms (level 3 setting) for Biological Process and Molecular Function.

For the detection of conserved domains, the 51 protein sequences were screened against the Pfam database (http://www.sanger.ac.uk/Software/Pfam) using the HMMER platform (current release 2.3.2; http://hmmer.wustl.edu) with a cut-off value set at 1e−10.

Annotation of oogenesis and reproduction genes

In order to identify putative honey bee orthologs to D. melanogaster genes, we searched the following GO terms in Flybase: ‘oogenesis’ (GO:0009993), ‘vitellogenesis’ (GO:0007296), ‘female meiosis’ (GO:0007143), ‘DNA recombination’ (GO:0006310) and ‘chromosome segregation’ (GO:0007059). Genes related to segregation distortion were searched for in Flybase in phenotypic descriptions and mutant effects of D. melanogaster genes, as this phenomenon is not represented by a GO term. Hence, this group may be more heterogeneous than the others. From this list we removed genes of pleiotropic function (multifaceted GO entries in Biological Process) and genes that lacked defined transcripts in the Drosophila genome database.

For the GO terms ‘oogenesis’ and ‘vitellogenesis’ we performed TBLASTN and BLASTP searches for 42 fruit fly genes against the Amel_v3.0 genome assembly and the GLEAN3-predicted protein sequences (Honey Bee Genome Sequencing Consortium, 2006), respectively. The orthologous D. melanogaster gene was characterized by the same procedure as described above (reciprocal best hit). For the GO terms ‘female meiosis’, ‘DNA recombination’, ‘chromosome segregation’ and the non-GO group ‘segregation distortion’, transcripts of D. melanogaster were searched against nr databases at NCBI using BLASTP. The obtained sequences were searched against the Amel_v3.0 genome assembly and the GLEAN3-predicted protein sequences using TBLASTN. Homologous sequences (threshold 1e−10) were predicted using the BioEdit software (Hall, 1999). ORFs showing significant homology (BLASTP threshold 1e−20) were assembled and used in BLASTP searches against the nr databases at NCBI.
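The reciprocal best hit criterion mentioned here can be sketched as follows: a fly gene and a bee protein are paired only if each is the other's best-scoring match. This is a generic illustration, not the authors' script; the input layout (query, subject, E-value), the function names, the default cut-off and all identifiers other than pumilio and GB10504-PA are assumptions made for the example.

```python
def best_hits(hits, max_evalue):
    """Keep, for every query, the subject with the lowest E-value."""
    best = {}
    for query, subject, evalue in hits:
        if evalue > max_evalue:
            continue  # hit is not significant at the chosen cut-off
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(fly_vs_bee, bee_vs_fly, max_evalue=1e-10):
    """Return sorted (fly gene, bee protein) pairs that are mutual best hits."""
    forward = best_hits(fly_vs_bee, max_evalue)
    backward = best_hits(bee_vs_fly, max_evalue)
    return sorted((fly, bee) for fly, bee in forward.items()
                  if backward.get(bee) == fly)


# Hypothetical hit tables; E-values are invented for illustration.
fly_vs_bee = [
    ("pumilio", "GB10504-PA", 1e-80),
    ("pumilio", "GB99999-PA", 1e-30),
    ("bam", "GB11111-PA", 1e-3),      # too weak to count as a hit
]
bee_vs_fly = [
    ("GB10504-PA", "pumilio", 1e-75),
    ("GB99999-PA", "other_gene", 1e-40),
    ("GB11111-PA", "bam", 1e-2),
]
print(reciprocal_best_hits(fly_vs_bee, bee_vs_fly))
```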

Motif search in upstream regions in caste-specifically expressed genes

In order to detect over-represented motifs in the upstream control regions (UCRs) of the two sets of caste-related genes, we selected gene subsets based on two criteria: (1) those that had shown the highest caste-specificity in the array analyses (Evans & Wheeler, 2000), and (2) those that had a conserved 5′ region when compared with the Drosophila orthologs. These ‘top10’ genes consisted of six queen genes (GB13072, GB11628, GB19380, GB14798, GB16047 and GB18242) and six worker genes (GB10869, GB12371, GB12239, GB10428, GB19006 and GB14758). The motif search was conducted separately on the two sets of UCR sequences using three methods: AlignACE (Roth et al., 1998), MEME (Bailey & Elkan, 1995) and MDscan (Liu et al., 2002). Default parameter values were used in all searches, except that GC content in intergenic regions was set to 25%, representing the background value established for the honey bee UCR database generated in this study.


The database containing 10 156 UCR sequences was generated by parsing the Official Set annotation file (downloaded in GFF format from http://www.beegenome.hgsc.bcm.tmc.edu/beeftp.html) to extract upstream regions starting from the terminal 5′-genomic coordinate of each predicted CDS. The UCRs were arbitrarily set to a size frame of 1000 nucleotides (Roth et al., 1998), but were trimmed whenever another predicted ORF was detected in any of these regions.
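The extraction rule (up to 1000 nucleotides upstream of the CDS, trimmed at any other predicted ORF) can be expressed as a small coordinate calculation. The sketch below is one possible reading of that rule, not the authors' parser: it assumes 1-based inclusive coordinates, takes the neighbouring ORFs as a plain list of (start, end) pairs, and uses a hypothetical function name.

```python
def upstream_region(cds_start, cds_end, strand, other_orfs,
                    max_len=1000, seq_len=None):
    """Return the 1-based inclusive (start, end) of the UCR, or None.

    The window covers up to max_len nucleotides 5' of the CDS and is
    trimmed so that it never overlaps another predicted ORF.
    """
    if strand == "+":
        start, end = max(1, cds_start - max_len), cds_start - 1
        for orf_start, orf_end in other_orfs:
            if orf_start <= end and orf_end >= start:
                # Keep only the part between the intruding ORF and the CDS.
                start = max(start, orf_end + 1)
    elif strand == "-":
        # On the minus strand the 5' end lies at the higher coordinate.
        start, end = cds_end + 1, cds_end + max_len
        if seq_len is not None:
            end = min(end, seq_len)
        for orf_start, orf_end in other_orfs:
            if orf_start <= end and orf_end >= start:
                end = min(end, orf_start - 1)
    else:
        raise ValueError("strand must be '+' or '-'")
    return (start, end) if start <= end else None


# Hypothetical gene at 5000-6000 with a neighbouring ORF at 4200-4500.
print(upstream_region(5000, 6000, "+", [(4200, 4500)]))
```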

The MAP (maximum a priori log likelihood) score, group specificity score (called Church score in this manuscript) (Hughes et al., 2000), ROC AUC (area under the curve for a receiver-operator characteristic plot) metric and MNCP (mean normalized conditional probability) metric (Clarke & Granek, 2003) were used to detect motifs that most likely correspond to biologically significant cis-regulatory elements. The filters run on the UCRs of the subsets of queen and worker genes were a MAP score cut-off value of 50, followed by a ROC AUC cut-off at 0.7, followed by a group specificity score cut-off at 1e−05. The UCR database of all GLEAN3-predicted honey bee genes was used as the background to calculate these metrics.
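To make the filtering step concrete, the sketch below computes a ROC AUC in its rank-based form (the probability that a sequence from the gene group scores higher than a background sequence) and then applies the three cut-offs in sequence. It is an illustration only: the study used the TAMO package for these metrics, the direction of each comparison (higher MAP and AUC are better, lower group specificity is better) is an assumption, and the function names and the dictionary keys are hypothetical. The MAP cut-off is passed explicitly rather than hard-coded.

```python
def roc_auc(group_scores, background_scores):
    """Rank-based ROC AUC: P(group score > background score), ties count 0.5."""
    wins = 0.0
    for g in group_scores:
        for b in background_scores:
            if g > b:
                wins += 1.0
            elif g == b:
                wins += 0.5
    return wins / (len(group_scores) * len(background_scores))


def passes_filters(motif, map_cutoff, auc_cutoff=0.7, specificity_cutoff=1e-05):
    """Apply the MAP, ROC AUC and group specificity cut-offs in sequence."""
    return (motif["map"] >= map_cutoff
            and motif["roc_auc"] >= auc_cutoff
            and motif["specificity"] <= specificity_cutoff)


# Hypothetical per-sequence motif scores for a gene group and a background set.
auc = roc_auc([0.9, 0.8, 0.4], [0.5, 0.3, 0.2, 0.1])
motif = {"map": 60.0, "roc_auc": auc, "specificity": 1e-07}
print(auc, passes_filters(motif, map_cutoff=50))
```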

A parametric test (MANOVA) and a nonparametric test (Kolmogorov–Smirnov) were conducted to identify significance levels for the two sets of filtered motifs found in the UCRs of caste-specifically expressed genes against filtered motifs found in the random UCR set. Random motifs were sampled from a motif database (10 391 motifs) generated by running our script 100 times with random sets of UCR sequences.
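The nonparametric comparison rests on the two-sample Kolmogorov–Smirnov statistic, the largest vertical distance between the empirical distribution functions of the two samples. A minimal sketch of that statistic is given here; it does not compute a P value, it is not the authors' script, and the function name and the example metric values are hypothetical.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic D = max |F_a(x) - F_b(x)|."""
    a, b = sorted(sample_a), sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    d = 0.0
    # The maximum distance is always reached at one of the observed values.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


# Hypothetical metric values for caste-specific and random motifs.
caste_motifs = [0.81, 0.78, 0.92, 0.75]
random_motifs = [0.55, 0.61, 0.72, 0.79, 0.50]
print(ks_statistic(caste_motifs, random_motifs))
```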

The main criterion for identifying known regulatory motifs among these caste-specific ones was the alignment of the PSSM for each bee motif with the Drosophila melanogaster sequences in the TRANSFAC database (release 4.0) (Wingender et al., 2000). Only the alignments passing a threshold of 80% identity for each PSSM were considered as significant matches. In addition, we checked for a specific binding motif, the EcR/USP motif (rGkTCAaTGamcy-3′), known to function in the expression of genes responding to morphogenetic hormone titres (Perera et al., 2005).

Operating system and programming tools

An Ubuntu Linux (version Breezy) operating system was used to implement all scripts and pipelines designed for annotation procedures and motif discovery. The Python programming language (http://www.python.org), Biopython (http://www.biopython.org) and TAMO (Tools for Analysis of Motifs) packages (Gordon et al., 2005) were used in program design. Other web applications were built using the Zope application server (http://www.zope.org) hosted at http://zulu.fmrp.usp.br/beelab.

Acknowledgements

This work was supported by grants from FAPESP (Fundação de Amparo à Pesquisa do Estado de São Paulo, 9900719-6) and CNPq (Conselho Nacional de Desenvolvimento Científico e Tecnológico, 4729632-3-1).

References

Abouheif, E. and Wray, G.A. (2002) Evolution of the genetic network underlying wing polyphenism in ants. Science 297: 249–252.

Al-Shahrour, F., Diaz-Uriarte, R. and Dopazo, J. (2004) FatiGO: a web tool for finding significant associations of Gene Ontology terms with groups of genes. Bioinformatics 20: 578–580.

Amdam, G.V., Norberg, K., Hagen, A. and Omholt, S.W. (2003) Social exploitation of vitellogenin. Proc Natl Acad Sci USA 100: 1799–1802.

Amdam, G.V., Simões, Z.L.P., Hagen, A., Norberg, K., Schroder, K., Mikkelsen, O., et al. (2004) Hormonal control of the yolk precursor vitellogenin regulates immune function and longevity in honeybees. Exp Gerontol 39: 767–773.

Antoniewski, C., O’Grady, M.S., Edmondson, R.G., Lassieur, S.M. and Benes, H. (1995) Characterization of an EcR/USP heterodimer target site that mediates ecdysone responsiveness of the Drosophila Lsp-2 gene. Mol Gen Genet 249: 545–556.

Arbeitman, M.N., Furlong, E.E.M., Imam, F., Johnson, E., Null, B.H., Baker, B.S., et al. (2002) Gene expression during the life cycle of Drosophila melanogaster. Science 297: 2270–2275.

Ashburner, M., Ball, C.A., Blake, J.A., Botstein, D., Butler, H., Cherry, J.M., et al. (2000) Gene Ontology: tool for the unification of biology. Nature Genet 25: 25–29.

Bailey, T.L. and Elkan, C. (1995) Unsupervised learning of multiple motifs in biopolymers using expectation maximization. Machine Learning J 21: 58–83.

Bergem, M., Norberg, K. and Aamodt, R.M. (2006) Long-term maintenance of in vitro cultured honeybee (Apis mellifera) embryonic cells. BMC Dev Biol 6: 17 (online).

Boutros, M., Kiger, A.A., Armknecht, S., Kerr, K., Hild, M., Koch, B., et al. (2004) Genome-wide RNAi analysis of growth and viability in Drosophila cells. Science 303: 832–835.

Carsten, L.D., Watts, T. and Markow, T.A. (2005) Gene expression patterns accompanying a dietary switch in Drosophila melanogaster. Mol Ecol 14: 3203–3208.

Clarke, N.D. and Granek, J.A. (2003) Rank order metrics for quantifying the association of sequence features with gene regulation. Bioinformatics 19: 212–218.

Colombani, J., Bianchini, L., Layalle, S., Pondeville, E., Dauphin-Villemant, C., Antoniewski, C., et al. (2005) Antagonistic actions of ecdysone and insulins determine final size in Drosophila. Science 310: 667–670.

Corona, M., Estrada, E. and Zurita, M. (1999) Differential expression of mitochondrial genes between queens and workers during caste determination in the honeybee Apis mellifera. J Exp Biol 202: 929–938.

Cunha, A.C., Nascimento, A.M., Guidugli, K.R., Simões, Z.L.P. and Bitondi, M.M.G. (2005) Molecular cloning and expression of a hexamerin cDNA from the honey bee. J Insect Physiol 51: 1135–1147.

Darwin, C.R. (1859) On the Origin of Species by Means of Natural Selection. John Murray, London.

Davidson, E.H. (2001) Genomic Regulatory Systems: Development and Evolution. Academic Press, San Diego.

Eder, J., Kremer, J.P. and Rembold, H. (1983) Correlation of cytochrome c titer and respiration in Apis mellifera: adaptive responses to caste determination defines workers, intercastes and queens. Comp Biochem Physiol B 76: 703–716.

Engels, W. (1974) Occurrence and significance of vitellogenins in female castes of social Hymenoptera. Am Zool 14: 1229–1237.

Evans, J.D. and Wheeler, D.E. (1999) Differential gene expression between developing queens and workers in the honey bee, Apis mellifera. Proc Natl Acad Sci USA 96: 5575–5580.


Evans, J.D. and Wheeler, D.E. (2000) Expression profiles during honeybee caste determination. Genome Biol 2: 1, 6 pages, online: http://genomebiology.com/2000/2/1/research/0001.1

Forbes, A. and Lehmann, R. (1998) Nanos and Pumilio have critical roles in the development and function of Drosophila germline stem cells. Development 125: 679–690.

González-Reyes, A., Elliot, H. and St Johnston, D. (1995) Polarization of both major body axes in Drosophila by gurken-torpedo signalling. Nature 375: 654–658.

Goodisman, M.A., Isoe, J., Wheeler, D.E. and Wells, M.A. (2005) Evolution of insect metamorphosis: a microarray-based study of larval and adult gene expression in the ant Camponotus festinatus. Evolution 59: 858–870.

Gordon, D.B., Nekludova, L., McCallum, S. and Fraenkel, E. (2005) TAMO: a flexible, object-oriented framework for analysing transcriptional regulation using DNA-sequence motifs. Bioinformatics 21: 3164–3165.

Guidugli, K.R., Hepperle, C. and Hartfelder, K. (2004) A member of the short-chain dehydrogenase/reductase (SDR) superfamily is a target of the ecdysone response in honey bee (Apis mellifera) caste development. Apidologie 35: 37–47.

Guidugli, K.R., Nascimento, A.M., Amdam, G.V., Barchuk, A.R., Omholt, S.W., Simões, Z.L.P. and Hartfelder, K. (2005) Vitellogenin regulates hormonal dynamics in the worker caste of a eusocial insect. FEBS Lett 579: 4961–4965.

Hagedorn, H.H., Maddison, D.R. and Tu, Z. (1998) The evolution of vitellogenins, cyclorrhaphan yolk proteins and related molecules. Adv Insect Physiol 27: 335–384.

Hall, T.A. (1999) BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucleic Acids Symp Ser 41: 95–98.

Hamilton, W.D. (1964) The genetical theory of social behaviour, I & II. J Theor Biol 7: 1–16, 17–52.

Hartfelder, K. and Emlen, D.J. (2005) Endocrine control of insect polyphenism. In Comprehensive Insect Molecular Science (Gilbert, L.I., Iatrou, K. and Gill, S., eds), Vol. 3, pp. 651–703. Elsevier, Oxford.

Hartfelder, K. and Engels, W. (1998) Social insect polymorphism: Hormonal regulation of plasticity in development and reproduction in the honeybee. Curr Topics Dev Biol 40: 45–77.

Hawkins, N.C., Thorpe, J. and Schüpbach, T. (1996) encore, a gene required for the regulation of germ line mitosis and oocyte differentiation during Drosophila oogenesis. Development 122: 281–290.

Haydak, M.H. (1970) Honey bee nutrition. Annu Rev Entomol 15: 143–156.

Hepperle, C. and Hartfelder, K. (2001) Differentially expressed regulatory genes in honey bee caste development. Naturwissenschaften 88: 113–116.

Hoage, T.R. and Kessel, R.G. (1968) An electron microscopic study of the process of differentiation during spermatogenesis in the drone honey bee (Apis mellifera L.) with special reference to centriole replication and elimination. J Ultrastruct Res 24: 6–32.

Honey Bee Genome Sequencing Consortium (2006) Insights into social insects from the genome of the honeybee Apis mellifera. Nature (in press).

Hughes, J.D., Estep, P.W., Tavazoie, S. and Church, G.M. (2000) Computational identification of cis-regulatory elements associated with groups of functionally related genes in Saccharomyces cerevisiae. J Mol Biol 296: 1205–1214.

Hunt, G.J. and Page, R.E. (1995) Linkage map of the honey bee, Apis mellifera, based on RAPD markers. Genetics 139: 1371–1382.

Hunt, J.H., Buck, N.A. and Wheeler, D.E. (2003) Storage proteins in vespid wasps: characterization, developmental pattern, and occurrence in adults. J Insect Physiol 49: 785–794.

Judice, C.C., Carazzole, M.F., Festa, F., Sogayar, M.C., Hartfelder, K. and Pereira, G.A.G. (2006) Gene expression profiles underlying alternative caste phenotypes in a highly eusocial bee, Melipona quadrifasciata. Insect Mol Biol 15: 33–44.

Kerr, W.E. (1950) Genetic determination of castes in the genus Melipona. Genetics 35: 143–152.

Kropácová, S. and Haslbachová, H. (1971) The influence of queenlessness and of unsealed brood on the development of ovaries in worker honeybees. J Apicult Res 10: 57–61.

Lattorff, H.M.G., Moritz, R.F.A. and Fuchs, S. (2005) A single gene determines thelytokous parthenogenesis in honey bee workers (Apis mellifera capensis). Heredity 94: 533–537.

Lattorff, H.M.G., Moritz, R.F.A., Solignac, M. and Crewe, R.M. (2006) Control of reproductive dominance by the thelytoky gene in honeybees. Biol Lett (in press).

Li, T.R. and White, K.P. (2003) Tissue-specific gene expression and ecdysone-regulated genomic networks in Drosophila. Dev Cell 5: 59–72.

Lin, H., Yue, L. and Spradling, A.C. (1994) The Drosophila fusome, a germline-specific organelle, contains membrane skeletal proteins and functions in cyst formation. Development 120: 947–956.

Liu, X.S., Brutlag, D.L. and Liu, J.S. (2002) An algorithm for finding protein-DNA binding sites with applications to chromatin-immunoprecipitation microarray experiments. Nature Biotechnol 20: 835–839.

Liu, Y., Liu, S., Wei, L., Altman, R.B. and Batzoglou, S. (2004) Eukaryotic regulatory element conservation analysis and identification using comparative genomics. Genome Res 14: 354–366.

Martinez, T., Burmester, T., Veenstra, J.A. and Wheeler, D.E. (2001) Sequence and evolution of a hexamerin from the ant Camponotus festinatus. Insect Mol Biol 9: 427–431.

McKearin, D. (1997) The Drosophila fusome, organelle biogenesis and germ cell differentiation: if you build it… Bioessays 19: 147–152.

Mirth, C., Truman, J.W. and Riddiford, L.M. (2005) The role of the prothoracic gland in determining critical weight for metamorphosis in Drosophila melanogaster. Curr Biol 15: 1–12.

Moritz, R.F.A., Kryger, P. and Allsopp, M.H. (1996) Competition for royalty in bees. Nature 384: 31.

Page, R.E. and Erickson, E.H. (1988) Reproduction by worker honey bees (Apis mellifera L.). Behav Ecol Sociobiol 23: 117–126.

Pereboom, J.J.M., Jordan, W.C., Sumner, S., Hammond, R.L. and Bourke, A.F.G. (2005) Differential gene expression in queen-worker caste determination in bumble bees. Proc R Soc B 272: 1145–1152.

Perera, S.C., Zheng, S., Feng, Q.L., Krell, P.J., Retnakaran, A. and Palli, S.R. (2005) Heterodimerization of ecdysone receptor and ultraspiracle on symmetric and asymmetric response elements. Arch Insect Biochem Physiol 60: 55–70.

Piulachs, M.D., Guidugli, K.R., Barchuk, A.R., Cruz, J., Simões, Z.L.P. and Bellés, X. (2003) The vitellogenin of the honeybee, Apis mellifera: Structural analysis of the cDNA and expression studies. Insect Biochem Mol Biol 33: 459–465.

Pritsker, M., Liu, Y.C., Beer, M.A. and Tavazoie, S. (2004) Whole-genome discovery of transcription factor binding sites by network-level conservation. Genome Res 14: 99–108.

Rachinsky, A., Strambi, C., Strambi, A. and Hartfelder, K. (1990) Caste and metamorphosis – hemolymph titers of juvenile hormone and ecdysteroids in last instar honeybee larvae. Gen Comp Endocr 79: 31–38.

Riddihough, G. and Pelham, H.R.B. (1987) An ecdysone response element in the Drosophila hsp27 promoter. EMBO J 6: 3729–3734.

Roth, F.P., Hughes, J.D., Estep, P.W. and Church, G.M. (1998) Finding DNA regulatory motifs within unaligned noncoding sequences clustered by whole-genome mRNA quantification. Nature Biotechnol 16: 939–945.

Rutherford, K., Parkhill, J., Crook, J., Horsnell, T., Rice, P., Rajandream, M.A. and Barrell, B. (2000) Artemis: sequence visualisation and annotation. Bioinformatics 16: 944–945.

Schmidt-Capella, I.C. and Hartfelder, K. (1998) Juvenile hormone effect on DNA synthesis and apoptosis in caste-specific differentiation of the larval honey bee (Apis mellifera L.) ovary. J Insect Physiol 44: 385–391.

Snodgrass, R.E. (1956) Anatomy of the Honey Bee. Cornell University Press, Ithaca.

Solignac, M., Vautrin, D., Baudry, E., Mougel, F., Loiseau, A. and Cornuet, J.M. (2004) A microsatellite-based linkage map of the honeybee, Apis mellifera L. Genetics 167: 253–262.

St Johnston, D. and Nüsslein-Volhard, C. (1992) The origin of pattern and polarity in the Drosophila embryo. Cell 68: 201–219.

Staab, S. and Steinmann-Zwicky, M. (1995) Female germ cells of Drosophila require zygotic ovo and otu product for survival in larvae and pupae, respectively. Mech Dev 54: 205–210.

Sullivan, A.A. and Thummel, C.S. (2003) Temporal profiles of nuclear receptor gene expression reveal coordinate transcriptional responses during Drosophila development. Mol Endocrinol 17: 2125–2137.

Wasserman, W.W. and Sandelin, A. (2004) Applied bioinformatics for the identification of regulatory elements. Nature Rev Genet 5: 267–287.

West-Eberhard, M.J. (2003) Developmental Plasticity and Evolution. Oxford University Press, Oxford.

Wheeler, D.E. and Nijhout, H.F. (2003) A perspective for understanding the modes of juvenile hormone as a lipid signaling system. Bioessays 25: 994–1001.

Wilson, E.O. (1971) The Insect Societies. Belknap Press of Harvard University Press, Cambridge, MA.

Wingender, E., Chen, X., Hehl, R., Karas, H., Liebich, I., Matys, V., et al. (2000) TRANSFAC: an integrated system for gene expression regulation. Nucleic Acids Res 28: 316–319.

Zhou, X., Oi, F.M. and Scharf, M.E. (2006a) Social exploitation of hexamerin: RNAi reveals a major caste-regulatory factor in termites. Proc Natl Acad Sci USA 103: 4499–4504.

Zhou, X., Tarver, M.R., Bennett, G.W., Oi, F.M. and Scharf, M.E. (2006b) Two hexamerin genes from the termite Reticulitermes flavipes: sequence, expression, and proposed functions in caste regulation. Gene 376: 47–58.

Zhu, Z., Shendure, J. and Church, G.M. (2005) Discovering functional transcription factor combinations in the human cell cycle. Genome Res 15: 848–855.

Supplementary material

The following material is available for this article online:

Table S1. Annotation results of caste-specifically expressed honey bee genes.

Table S2. Caste-specific motifs in UCRs of honey bee genes.

Table S3. Annotation of honey bee orthologs to Drosophila melanogaster genes involved in reproduction.

This material is available as part of the online article from http://www.blackwell-synergy.com

Page 2: Caste development and reproduction: a genome-wide analysis of hallmarks of insect eusociality


was initiated in the late nineties. The main body of currently available data resulted from a cDNA library generated by suppression subtractive hybridization (SSH) that contrasted queen and worker larvae (Evans & Wheeler, 1999). Subsequent macroarray analyses (Evans & Wheeler, 2000) revealed a clustering of these expressed sequence tags (ESTs) into three distinct groups: genes overexpressed in young (bipotent) larvae, genes overexpressed in fifth-instar queen larvae, and genes overexpressed in fifth-instar worker larvae. A second study, focusing on oxidative metabolism, identified a set of differentially expressed mitochondrial genes (Corona et al., 1999). The third approach was a DDRT-PCR screen for hormone-responsive genes to investigate the mode of action of ecdysteroids in the differentiation of the larval ovary (Hepperle & Hartfelder, 2001). Many of these EST sets could not be properly annotated at that time, either because of a limited number of fully sequenced insect genomes or because the libraries contained large numbers of transcripts in 3′-gene regions, including poorly conserved untranslated regions (UTRs). The draft assembly for the honey bee genome (Honey Bee Genome Sequencing Consortium, 2006) now permits a much more reliable annotation of this unique set of experimentally validated genes.

Reproductive activity of honey bees is determined in a two-step process. The basic differences in reproductive capacity between queen and workers manifest themselves during larval development by a wave of programmed cell death that leads to the destruction of over 95% of the ovariole primordia in the larval ovary of workers (Schmidt-Capella & Hartfelder, 1998). In the adult life cycle of each caste, the co-ordinated flux of egg production through previtellogenic and vitellogenic growth will require the activity of other sets of genes. Some of these act as determinants of the major egg and also embryonic axes. As the fruit fly is the best-developed insect model for axis determination (St Johnston & Nüsslein-Volhard, 1992), and maternal factors have not yet been functionally characterized in the honey bee, searching the genome assembly (Honey Bee Genome Sequencing Consortium, 2006) provides the first major opportunity to explore putative patterning networks in honey bees.

The vitellogenic growth phase of the honey bee oocyte has long been the centre of attention as a means of describing differential fertility of the female castes (Engels, 1974). The synthesis of large amounts of vitellogenin by the queen fat body is intimately related to her high reproductive rate. The equally high vitellogenin titres in haemolymph of nonreproducing young worker bees, however, have been an enigma, as their ovaries are inactive in the presence of the queen. Vitellogenin expression has apparently become uncoupled from oocyte growth during the evolution of the sterile worker caste and has acquired secondary functions. It became involved in the production of royal jelly (Amdam et al., 2003) and in the regulation of worker lifespan (Amdam et al., 2004) through an inhibitory effect on the endocrine system (Guidugli et al., 2005). Along with such unique life-history traits related to socially organized reproduction, honey bees also promise to answer new questions involving meiosis, as the honey bee genome exhibits recombination rates that exceed those of all other higher organisms (Hunt & Page, 1995; Solignac et al., 2004), and as honey bee males, being haploid, forego meiosis I in producing gametes.

The honey bee genome sequence database (Honey Bee Genome Sequencing Consortium, 2006) has become an extremely valuable resource, not only for comparative genomics but also for functional genomics. One of the oldest and, for evolutionary biologists, most challenging questions in social insect biology is the development of a reproductive and a nonreproductive caste (Darwin, 1859). Apart from its implications for evolutionary theory in terms of kin selection (Hamilton, 1964), this is essentially a question of how developmental pathways diverge to shape distinct phenotypes and how oogenesis is regulated to achieve levels of extremely high (queen) and extremely low (worker) fertility.

The annotation of genes related to caste development and differential reproduction in the honey bee has implications well beyond this species. It represents the first genome-wide annotation of a molecular architecture behind reproductive division of labour. In the light of current discussions on the importance of alternative phenotypes in the evolution of novelties (West-Eberhard, 2003), the honey bee genome information is certainly one of the most valuable resources. In the present manuscript, we delineate a strategy for moving from a straightforward gene annotation approach to functional studies based on motif analysis of upstream regulatory regions.

Results and discussion

From caste to BLAST: differentially expressed genes in caste development

The full list of genes that are overexpressed in fifth-instar queen or worker larvae is made available online in the Supplementary material (Table 1S). This list includes scaffold number, corresponding EST number(s), GLEAN3-predicted protein sequence, similarity and identity indices to corresponding Drosophila melanogaster orthologs, as well as protein domain information (Pfam).

A general result was that a relatively large subset of genes (nine of 34) overexpressed in honey bee queen larvae is represented by putative Drosophila orthologs for which no Gene Ontology (GO) term for Biological Process is indicated in Flybase. In contrast, all worker genes correspond to functionally relatively well-defined Drosophila genes. Even when taking into consideration the conceptual limits in attributing GO terms on biological process from Drosophila orthologs to honey bee genes, this finding could have a bearing on basic questions in socioevolution, namely, which caste is the novelty: the queen or the worker(s)? Phrased in other terms, the genome sequence information now makes it possible to address, at a molecular level, questions that are fundamental to understanding the role of (and evolutionary trends in) ontogenetic processes that structure insect societies, especially in hymenopterans. Such basic questions are: (1) how many degrees of freedom (or release from constraints) may actually have been gained from splitting the functions normally performed by a solitary ancestral hymenopteran female into two or more castes, and (2) how was this release from constraints integrated into postembryonic differentiation processes to generate truly alternative phenotypes? A second observation of potential interest to functional genomics was that a relatively large subset of the caste-related genes maps to chromosome 2 (seven of 51 unique sequences).

Most genes in the caste gene list are represented by one or two EST hits, except for a predicted hexamerin 70b gene (GB10869-PA). This gene was evidenced by 10 ESTs, one in a 5′-located exon and nine in the 3′ region (five ESTs comprising parts of exon 7 and parts of the 3′-UTR, the other four ESTs landing in exons 6 and 7). The macroarray data (Evans & Wheeler, 2000) established this gene as overexpressed in the worker caste. Hexamerins are an important class of storage proteins that show interesting expression patterns related to caste and reproduction in many social insects (Martinez et al., 2001; Hunt et al., 2003; Zhou et al., 2006a,b). A cDNA encoding the honey bee Hexamerin 70b subunit has recently been cloned and sequenced (Cunha et al., 2005), and hormone manipulation experiments showed that the abundance of hexamerin 70b transcripts in larval development is positively correlated with high levels of JH and ecdysteroids. This could actually reflect a regulatory feedback function in JH titre regulation, as exemplified in the termite Reticulitermes flavipes, where the Hex1/Hex2 ratio controls JH availability for caste-specifically differentiating tissues (Zhou et al., 2006b).

Within the honey bee caste genes for which GO information was imported and deduced from their Drosophila orthologs, we noted a predominance of terms clustering as ‘cellular physiological process’ (95%; GO:0050875) and ‘metabolism’ (90%; GO:0008152) in the ‘Biological Process’ (GO:0008150) category (Fig. 1A). GO-statistics differences between queens and workers became apparent in terms clustering as ‘cell differentiation’ (0% for queen and 28.5% for workers; GO:0030154) and ‘metabolism’ (96% for queen and 78.5% for worker; GO:0008152) in the ‘Biological Process’ (GO:0008150) category (Fig. 2A).
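The per-caste term frequencies quoted above can be thought of as the percentage of genes in a set that carry a given GO term. The following is a minimal sketch of that bookkeeping, not the FatiGO implementation; the function name and the toy gene identifiers are hypothetical, and only the GO identifiers are taken from the text.

```python
from collections import Counter

def term_frequencies(gene_terms):
    """Percentage of genes in a set annotated with each GO term.

    gene_terms maps a gene identifier to the set of GO term IDs
    assigned to it (here: imported from a Drosophila ortholog).
    """
    counts = Counter(term for terms in gene_terms.values() for term in terms)
    n_genes = len(gene_terms)
    return {term: 100.0 * count / n_genes for term, count in counts.items()}

# Toy input (hypothetical genes, real GO identifiers from the text).
worker_genes = {
    "gene_w1": {"GO:0008152"},                # metabolism
    "gene_w2": {"GO:0008152", "GO:0030154"},  # metabolism + cell differentiation
}
worker_freq = term_frequencies(worker_genes)
```

Running the same function separately on the queen and the worker gene sets would give the two bars per term that a caste comparison such as Fig. 2 requires.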

With respect to ‘Molecular Function’ (GO:0003674), most terms were related to mRNA translation: ‘nucleic acid binding’ (38%; GO:0003676), ‘structural constituent of ribosome’ (24%; GO:0003735), ‘protein binding’ (12%; GO:0005515), ‘nucleotide binding’ (12%; GO:0000166) and ‘translation factor activity, nucleic acid binding’ (7%; GO:0008135). Further important terms were ‘oxidoreductase activity’ (19%; GO:0016491) and ‘hydrolase activity’ (16.5%; GO:0016787) (Fig. 1B). For these latter two terms, we noted potentially interesting differences related to caste, with ‘hydrolase activity’ being overrepresented by worker-transcribed genes, whereas ‘oxidoreductase activity’ was exclusively represented by queen genes (Fig. 2B). Even though these GO assignments on Molecular Function are based on evidence from D. melanogaster, without experimental evidence for Apis mellifera, the corresponding genes are well conserved in sequence and show the relevant protein domains (Supplementary material, Table 1S), and thus are indicative of functional trends.

Figure 1. Dominant gene ontology terms for (A) Biological Process and (B) Molecular Function in honey bee genes with an experimentally validated caste-specific expression pattern during the last larval instar. The graph was generated by a FatiGO analysis set at level 3. Frequencies indicate the appearance of GO terms in the total set of queen and worker differentially expressed genes.

In general terms, the caste-specific separation into metabolic pathway preferences, oxidoreductases vs. hydrolases, may reflect the switch in diet that a worker larva experiences during the fourth and fifth larval instar. This represents a switch from a protein/lipid-rich diet to a more carbohydrate-rich diet (Haydak, 1970), and this switch apparently is accompanied by an increase in the expression of genes coding for proteins with hydrolase activity. Similar switches in gene expression patterns have recently been reported for D. melanogaster in an experiment where larvae were shifted from a cornmeal diet to a banana diet (Carsten et al., 2005), resulting in the up- or downregulation of 55 genes of a test population of 6000. Among these are five genes with dehydrogenase/oxidoreductase activity. These parallels in dietary switch responses are indicative of conserved coregulated gene networks. An open question is, of course, how these can be co-opted to generate different phenotypes, such as the castes of social insects. In this respect, social insects clearly go a big step beyond the simple metabolic switch response seen in Drosophila. They have apparently incorporated divergent metabolic regulation into a network architecture consistent with morphogenetic differentiation. This required that metabolic regulation became integrated, through the endocrine system, with developmental patterning processes.

The importance of metabolic regulation on caste development has also come to light in a recent Representational Difference Analysis (RDA) study on caste development in the highly eusocial stingless bee Melipona quadrifasciata (Judice et al., 2006). This is particularly interesting because in this genus caste development is thought to be based on a genetic predisposition (Kerr, 1950). Metabolic regulation may thus be a sine qua non for caste development, and caste-specific metabolic pathways may be set in motion rather independently of the nature of the initial switch (nutritional or genetic). The question of how this metabolic switch may integrate with the resultant endocrine signature characteristic for each caste is still a widely open field, but recent studies in Drosophila showing an interaction between ecdysone and insulin signalling in the determination of body size (Colombani et al., 2005; Mirth et al., 2005) may provide a lead.

This is also the point to reflect on how justified it is to rely heuristically on Drosophila orthologs and to use their GO attributes in a developmental context (caste differentiation) that has no parallel in Drosophila. A recent gene expression profiling study in the ant Camponotus festinatus, employing a microarray set-up of 384 clones, showed significantly different expression levels for larval vs. adult ants in 91 genes (21 confirmed by qRT-PCR), including an Apis hexamerin 70b ortholog (Goodisman et al., 2005). When comparing the temporal expression patterns of these ant genes with expression profiles for their respective Drosophila orthologs (Arbeitsman et al., 2002), relatively little accord was noted for the two species, leading to the suggestion that these genes may have taken on distinct functions due to the long divergence time between dipterans and hymenopterans (Goodisman et al., 2005). Differences aside, these examples show that practically all studies on large-scale functional considerations in gene expression are strongly wedded to Drosophila, and even though functional divergence in orthologs may have occurred, there is little experimental gene-by-gene evidence available for any of the major insect orders outside of Diptera.

Functional studies are clearly profiting from the now available honey bee genome sequence, as evident from the increasing number of RNAi experiments in honey bees (see citations in Honey Bee Genome Sequencing Consortium, 2006). This is still a small number compared with the large-scale RNAi assays established for Drosophila (Boutros et al., 2004), but the development of cell culture approaches in the honey bee (Bergem et al., 2006) represents a step in this direction.

Figure 2. Gene Ontology categories with caste-specific expression patterns. For Biological Process (A), genes classified as part of cell differentiation processes are significantly overexpressed in workers, whereas genes related to metabolism are overexpressed in queen larvae. In the Molecular Function categories (B), we observed an apparent split indicating differential enzyme preferences in queens (overexpress oxidoreductases) and in workers (overexpress hydrolases). The graph was generated by a FatiGO analysis set at level 3. Frequencies indicate the appearance of GO terms in the queen (black bars) and worker differentially expressed genes (grey bars).

Alternatively, regulatory functional associations between genes and their integration into networks can be inferred from the presence of response elements in upstream control regions. In our analysis of differentially expressed genes in queen-worker development, we took a bioinformatics approach for a first look into the molecular architecture of a developmental polyphenism.

Motif search in upstream regions of differentially expressed genes

The genes related to caste development are among the first honey bee genes for which experimentally validated expression data were generated (Corona et al., 1999; Evans & Wheeler, 1999, 2000; Hepperle & Hartfelder, 2001; Guidugli et al., 2004). Certainly, these 51 genes do not comprise all the genes involved in caste development, but they are expected to be prominent players, as they were the ones that stood out in the SSH and DDRT-PCR approaches. The 51 caste genes do not represent gene families, but rather fall into many very different molecular function categories. This made us ask whether the observed overexpression pattern of different genes in either queen or worker larvae may be associated with the occurrence of specific regulatory motifs in the upstream control regions (UCR) of these genes.

Three different algorithms, AlignACE (Roth et al., 1998), MEME (Bailey & Elkan, 1995) and MDscan (Liu et al., 2002), were used to construct a pipeline for detecting overrepresented motifs in the two unaligned sets of UCR sequences for the caste-specifically expressed genes. This pipeline was run on a ‘top-10’ set of 12 genes (six for each caste), which showed the most pronounced caste differences in expression (Evans & Wheeler, 2000), and also on a randomly selected set of UCRs (background control). We calculated four different metrics for each motif: MAP score (Roth et al., 1998), a group-specificity score (Church score) (Hughes et al., 2000), and a ROC AUC and MNCP metric (Clarke & Granek, 2003). A first set of filters was used to detect motifs with a potential for regulatory functions (MAP score ≥ 5; ROC AUC ≥ 0.7). This resulted in 46 motifs out of 123 total UCR motifs found in the queen UCR set, and in 71 motifs out of 261 total found in the worker UCR set (Supplementary material, Table 2S).
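The first filtering step described above amounts to keeping only motifs that pass both thresholds. A minimal sketch follows, assuming each motif has already been scored; the `Motif` record, the motif names and the score values are hypothetical illustrations, and only the two cut-offs (MAP score ≥ 5, ROC AUC ≥ 0.7) come from the text.

```python
from dataclasses import dataclass

@dataclass
class Motif:
    name: str         # hypothetical label for a candidate motif
    map_score: float  # MAP score as reported by the motif finder
    roc_auc: float    # ROC AUC of the motif against the gene group

def passes_first_filter(motif, min_map=5.0, min_auc=0.7):
    """Keep a motif only if it meets both thresholds quoted in the text."""
    return motif.map_score >= min_map and motif.roc_auc >= min_auc

# Invented example scores, for illustration only.
candidates = [
    Motif("motif_a", map_score=12.3, roc_auc=0.81),  # passes both
    Motif("motif_b", map_score=4.2, roc_auc=0.90),   # MAP score too low
    Motif("motif_c", map_score=7.0, roc_auc=0.65),   # ROC AUC too low
]
kept = [m.name for m in candidates if passes_first_filter(m)]
```

Keeping the thresholds as keyword arguments makes it easy to check how sensitive the retained motif count is to the chosen cut-offs.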

A parametric statistical test (MANOVA, P = 0.0001, Wilks’ λ = 0.78, F = 72) and a nonparametric statistical test (Kolmogorov–Smirnov, Table 1) on ROC AUC and MNCP indices showed that these two sets of filtered motifs are significantly different from a randomly selected set of motifs. The rank-order metrics ROC AUC and MNCP have previously been used to compare the association of short regulatory sequence features with gene expression data (microarray analyses on coregulated genes), and they have been useful in flagging false positives erroneously included in ‘top-10’ sets of differentially expressed genes (Clarke & Granek, 2003).
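As an illustration of what a rank-order metric such as ROC AUC measures, the sketch below computes it directly as the probability that a gene from the group of interest receives a higher motif score than a background gene. This is the standard pairwise (Mann-Whitney) formulation, not the code used by Clarke & Granek (2003); the function name and the toy scores are assumptions, and the MNCP metric is not covered here.

```python
def roc_auc(group_scores, background_scores):
    """Area under the ROC curve from two lists of motif scores.

    Equals the fraction of (group, background) pairs in which the group
    gene scores higher; ties contribute one half.
    """
    wins = 0.0
    for g in group_scores:
        for b in background_scores:
            if g > b:
                wins += 1.0
            elif g == b:
                wins += 0.5
    return wins / (len(group_scores) * len(background_scores))

# Toy scores: perfect separation and no separation.
auc_perfect = roc_auc([3.0, 4.0], [1.0, 2.0])
auc_chance = roc_auc([1.0, 2.0], [1.0, 2.0])
```

A value near 0.5 indicates that the motif does not distinguish the group from the background, which is why a cut-off such as 0.7 is a plausible first filter.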

To select highly specific motifs found in each data set, we used the group-specificity score (Church score ≤ 1e−05; Hughes et al., 2000) to identify the most likely motifs involved in decision making for pathways leading to queen (two motifs, Fig. 3A) or to worker development (12 motifs with Church score ≤ 1e−07, Fig. 3B). As the SSH and DDRT-PCR approaches on caste development can be expected to retrieve only a subpopulation of such genes, these motifs represent only a partial scenario of the transcriptional regulatory network underlying caste development. The motifs can now be used to screen other GLEAN3-predicted genes to assemble a candidate list of putatively coregulated genes in caste development that can be submitted to further experimental validation.
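To our understanding, a group-specificity score of the kind introduced by Hughes et al. (2000) is a hypergeometric tail probability: how likely it is that at least x of the best genome-wide matches to a motif fall inside the gene group by chance. The sketch below implements only that generic tail probability; the function name, the argument names and the example numbers are assumptions, and the exact procedure used in this study is not reproduced.

```python
from math import comb

def hypergeom_tail(population, group_size, n_top, overlap):
    """P(X >= overlap) for a hypergeometric draw.

    population: number of UCRs scanned genome-wide
    group_size: number of UCRs in the gene group of interest
    n_top:      number of best-matching UCRs taken for the motif
    overlap:    how many of those best matches lie in the group
    """
    total = comb(population, n_top)
    upper = min(group_size, n_top)
    favourable = sum(
        comb(group_size, i) * comb(population - group_size, n_top - i)
        for i in range(overlap, upper + 1)
    )
    return favourable / total

# Small worked case: 4 UCRs, 2 in the group, top 2 matches both in the group.
p_small = hypergeom_tail(4, 2, 2, 2)  # 1 favourable draw out of 6
```

Smaller values indicate stronger group specificity, which matches the direction of the cut-offs quoted above (≤ 1e−05 and ≤ 1e−07).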

Each motif found in UCRs of queen (46) and worker (71) overexpressed genes was compared with the entire set of D. melanogaster cis-regulatory motifs contained in the TRANSFAC database (version 4.0; Wingender et al., 2000). Only alignments passing 80% identity for each position-specific site matrix (PSSM) were considered as significant matches. Whereas none of the most specific motifs for each caste showed similarity to any of the D. melanogaster motifs, some of the more ubiquitous ones did resemble binding sites of transcription factors, such as Antennapedia, Ultrabithorax, zerknüllt, even skipped, trithorax-like, tailless, paired, fushi tarazu and Adh transcription factor 1 (Supplementary material, Table 2S).

When we plotted the positions of the two queen and the 12 worker motifs in the UCRs of the caste-specifically expressed genes (Fig. 4), an interesting pattern emerged for the worker-specific motifs. Some of the worker motifs appeared to be clustered and occurring in tandem; furthermore, they were positioned relatively close to the predicted translation start sites in some of the genes that are overexpressed during worker development (annotation results of these genes are listed in Supplementary material, Table 1S). A position close to the predicted translation start sites is generally taken as a sign of strong regulatory effect (Davidson, 2001).

Table 1. Kolmogorov–Smirnov analysis of ROC AUC and MNCP metric for statistical significance of putative regulatory motifs in upstream control regions of genes with queen or worker-specific expression patterns. These motifs were contrasted with a random set of motifs detected in a random set of UCRs of GLEAN3-predicted honey bee genes.

Group pairs                  ROC AUC    MNCP
Random × (Queen + Worker)    P > 0.1    P < 0.001
Random × Queen               P > 0.1    P < 0.005
Random × Worker              P > 0.1    P < 0.001
Queen × Worker               P < 0.1    P > 0.1

As caste development is highly dependent on changes in haemolymph titres of JH and ecdysteroids, we also screened the UCRs of the differentially expressed genes for putative nuclear receptor binding sites. Regulatory elements involved in the JH response are not well understood yet, so any prediction in this direction would be speculative (Wheeler & Nijhout, 2003). Functional ecdysone response elements (EcRE) have, however, been identified, and it is now well established that the EcR/USP complex binds to direct or inverted (palindromic) repeats (Riddihough & Pelham, 1987; Antoniewski et al., 1995; Perera et al., 2005). A PSSM search (Wassermann & Sandelin, 2004) based on a canonical representation (rGkTCAaTGamcy) (Perera et al., 2005) did not reveal any putative EcRE motif in the UCRs of the 51 caste-differentially expressed genes. However, this does not rule out that these genes respond to changes in JH and/or ecdysteroid titres, as these hormones require EcR/USP binding primarily in the expression of early genes, but not necessarily for the late response genes (Li & White, 2003; Sullivan & Thummel, 2003).
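A simplified stand-in for such a screen is to expand the canonical EcRE representation into a regular expression using IUPAC nucleotide codes and scan each UCR for exact matches. This is not a PSSM search, which would score partial matches against a weight matrix; the sketch assumes that lower-case letters in the consensus can be read as ordinary IUPAC codes, scans the forward strand only, and uses an invented test sequence.

```python
import re

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "K": "[GT]", "M": "[AC]",
    "S": "[CG]", "W": "[AT]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

def find_sites(consensus, sequence):
    """Return 0-based start positions of all (overlapping) consensus matches."""
    body = "".join(IUPAC[ch] for ch in consensus.upper())
    pattern = re.compile("(?=" + body + ")")  # lookahead keeps overlapping hits
    return [m.start() for m in pattern.finditer(sequence.upper())]

ECRE_CONSENSUS = "rGkTCAaTGamcy"  # canonical representation quoted in the text
toy_ucr = "TTAGGTCAATGACCTGG"     # invented sequence containing one match
hits = find_sites(ECRE_CONSENSUS, toy_ucr)
```

An empty result from an exact-match scan like this is weaker evidence of absence than a thresholded PSSM scan, because degenerate but functional sites are missed by construction.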

In conclusion, the predictions from such a combined strategy, which searches for group-specific and for conserved regulatory motifs in GLEAN3-predicted honey bee genes, represent a major transition from nonhypothesis-driven high-throughput screens to hypothesis-driven searches for context-dependent gene expression in honey bees. Such directed search results can serve as a platform for experimental analyses of genome-wide integration in hormonal control of caste development in bees. In addition, this study exemplifies how existing algorithms for detecting shared regulatory motifs can be joined into a toolkit for predicting coregulated gene expression patterns in honey bees. These methods have been shown to be robust and are gaining acceptance for use in functional and comparative genomics (Liu et al., 2004; Pritsker et al., 2004; Zhu et al., 2005).

Figure 3. Putative regulatory motifs and their consensus sequences in UCRs of queen and worker overexpressed genes. Scores for MAP, Church, ROC AUC and MNCP metrics indicate degree of group specificity and significance level.

Oogenesis and reproduction

As caste development sets the stage for reproductive division of labour, genes involved in reproductive processes are strong candidates for functional analyses. In the present study, we performed BLAST searches to identify honey bee orthologs for a list of 32 fly genes with the GO attribute ‘oogenesis’, and for four genes specifically related to ‘vitellogenesis’. The list for fly genes involved in nuclear events in germ cells consisted of 20 genes for ‘female meiosis’, 12 genes for ‘recombination’, and 21 genes under the heading ‘chromosome segregation, including segregation distortion’ (Supplementary material, Table 3S). In some cases these GO attributes for fly genes, of course, overlapped.

BLASTN and BLASTX searches for these fly genes against the honey bee genome assembly 3.0 and the GLEAN3 Official Set (aa) retrieved statistically well-supported putative bee orthologs for most of these candidates. For the genes involved in meiosis, recombination and chromosome segregation, this finding, although not unexpected, is of interest, as meiosis in the haploid honey bee drone is strongly modified when compared with a normal diploid meiosis. The first meiosis is initiated, but the nucleus remains undivided and only the superfluous centrioles are eliminated as cytoplasmic buds (Hoage & Kessel, 1968). An interesting gene, thelytoky (th), has recently been mapped in this context (Lattorff et al., 2005). It almost completely prevents meiotic recombination in the automixis of laying workers of the Cape honey bee. As an indication of the interplay between meiosis and later development, this locus also appears to be an integral part of various gene cascades involved in caste determination (Lattorff et al., 2006).

The fly genes retrieved in the GO searches for ‘oogenesis’ represent a much larger range of Molecular Function categories, such as transcription factors, proteins regulating translation by RNA binding, RNA helicases, enzymes (ubiquitination, transfer of sugar residues, sulfotransferase), GTPase activity, and several factors binding to cytoskeletal proteins (Supplementary material, Table 3S). This wide range of functional categories is expected, as these genes are involved in a series of different steps during oogenesis in the polytrophic meroistic ovary. Oogenesis starts out with the maintenance of germline and somatic stem cell identity in the germline niche in the upper germarium. A key gene involved in this process is pumilio (Forbes & Lehmann, 1998), which is represented by a highly conserved bee ortholog, GB10504-PA. The second step is the formation of germ cell cysts, the determination of an oocyte within each cyst, and the survival of these cysts, involving genes such as benign gonial cell neoplasm (Lin et al., 1994), encore (Hawkins et al., 1996), ovo and ovarian tumour (otu) (Staab & Steinmann-Zwicky, 1995), all well conserved in the honey bee genome. Interestingly, we could not find a clear bee ortholog for bag of marbles (bam), which is one of the prime early response genes in the cystoblast differentiation pathway in Drosophila (McKearin, 1997).

Figure 4. Map of the group-specific motifs found in queen and worker UCRs of caste-specifically expressed genes. The coding region is represented by the GLEAN3 prediction number (assembly 4.0), with arrows indicating the translation start site. Asterisks mark UCRs of the ‘top-10’ set used to find the over-represented motifs.

The third step comprises previtellogenic growth of the follicle, and during these stages a number of maternal factors that define the egg and the future embryonic axes are deposited and anchored either within the oocyte or in the perivitelline space (for review see St Johnston & Nüsslein-Volhard, 1992). In the list of Drosophila genes involved in early steps of axis determination, a couple of surprises came up in the search for honey bee orthologs. A big surprise was that we could not find a gurken ortholog in the bee, even though this gene sets up both the anterior–posterior and dorsal–ventral axes in the Drosophila egg (González-Reyes et al., 1995), whereas downstream components of the Gurken signalling cascade appear to be preserved in the bee genome (Honey Bee Genome Sequencing Consortium, 2006). Similar apparent gaps in constituents of patterning cascades were noted for the terminal regions of the embryo, such as a lack of a torso ortholog, whereas its ligand torso-like is represented by a well-conserved ortholog in the bee (GB18663-PA).

With respect to genes involved in the final processes of oogenesis, we primarily looked at genes that play a part during vitellogenesis. There are four genes of interest in this class, the primary one coding for the yolk protein precursor, vitellogenin. This gene has already been sequenced for the honey bee (Piulachs et al., 2003) and, as expected, it is much more related to vitellogenins of other insects and even vertebrates than to the Drosophila yolk proteins, which apparently are derived from lipases (Hagedorn et al., 1998). The second gene of interest is the bee ortholog to yolkless, as this (GB16571-PA) could represent a putative vitellogenin receptor. The other two Drosophila genes with clear orthologs in the bee are CG18641 and CG12139, which code for a lipase and an LDL receptor, respectively.

General conclusions

The current analysis made use of previous experimental analyses on differential transcription during caste development of honey bee larvae. In the annotation of these genes, which includes references to Gene Ontology terms associated with their respective Drosophila orthologs, two major configurations emerged. First of all, worker genes were better defined in terms of GO attributes compared with the relatively large number of queen genes that had no GO terms associated with their respective Drosophila orthologs. Even when taking into consideration the conceptual limits in attributing GO terms on molecular function and biological process from Drosophila orthologs to honey bee genes, this finding could have a bearing on general basic questions in socioevolution, namely, which caste is more divergent from a nonsocial reproductive female bee prototype, or reproductive ground plan: the queen or the worker? Less speculative is the second major conclusion coming out of the GO analysis for Molecular Function, showing and confirming (Eder et al., 1983; Corona et al., 1999) the important role of metabolic regulation in caste development. This facet is demonstrated especially clearly in the caste-specific expression of oxidoreductases (queen) vs. hydrolases (workers).

The honey bee genome information provided not only a much improved annotation platform for caste-specifically expressed ESTs but, even more so, opens the possibility to explore putative regulatory features of the honey bee genome. In the current study, we employed modified Gibbs sampling and expectation-maximization algorithms (AlignACE, MDscan, MEME) to detect group-specific motifs in gene regions up to 1000 bp upstream of translation start sites. We detected 14 motifs that were significantly over-represented in the caste genes when compared with corresponding motifs found in a random set of GLEAN3-predicted honey bee genes. The localization of such motifs in UCRs of worker-overexpressed genes revealed a clustering close to the predicted basal promoter regions, suggesting strong regulatory effects. Such search strategies and the detected motifs can provide the lead to reveal and unravel cis-regulatory networks for and within specific contexts of honey bee biology.

Caste polyphenism in social insects makes a strong case for the emergence of novelties at a microevolutionary level (West-Eberhard, 2003). Without the pretension to discuss exhaustively the mechanisms underlying this surge of developmental plasticity, two major themes become apparent in this and other studies. Regulatory change has been demonstrated in the shut-down of wing disc patterning cascades in ants (Abouheif & Wray, 2002) and is certainly also implicit in observed temporal changes in gene expression during postembryonic development of ants and bumble bees (Goodisman et al., 2005; Pereboom et al., 2005). Such change would be expected to involve cis-regulatory elements, that is, change in transcription factor binding sites in UCRs, as approached in this study, and also evolutionary change in response thresholds to circulating morphogenetic hormones (for review see Hartfelder & Emlen, 2005). The second, and quite unexpected, theme is the acquisition of new systemic functions by evolutionarily rather old proteins, such as vitellogenin and hexamerins. These apparently unspectacular proteins have evolved into key players for caste evolution and reproductive division of labour via novel regulatory connectivity with JH (Amdam et al., 2004; Guidugli et al., 2005; Zhou et al., 2006a,b).

Experimental procedures

Selection and annotation of ESTs representing differentially expressed genes in honey bee caste development

The starting point was a set of 164 entries (mainly 3′-ESTs) in GenBank (BG101532–BG101697) from an SSH library (Evans & Wheeler, 1999, 2000). When validated by macroarray analyses, a clustering into three major classes became apparent: (I) genes overexpressed in young larvae, (II) genes overexpressed in last instar queen larvae and (III) genes overexpressed in last instar worker larvae. For this study we excluded the class I ESTs because their expression is not caste-specific but rather represents expression differences between young (still bipotent) and older larvae. To the class II queen ESTs (82) we added one complete cDNA entry (AY601642) from a DDRT-PCR screen (Corona et al., 1999), and to the class III set of worker ESTs (40) we added seven GenBank dbEST entries (BG149167–BG149173) from a DDRT-PCR screen on ovary development (Hepperle & Hartfelder, 2001).

The EST sequences were submitted to BLASTN searches (parameters: -G 2 -E 3 -W 15 -F ‘m D’ -U -e 1e-20) against genome sequence assembly Amel_v3.0 to retrieve matches in linked or unlinked genomic contigs and to exclude no-matches (seven ESTs in queen). ESTs that aligned within the same scaffold were checked for clustering and overlap. This clustering also served to exclude genes that were represented by non-overlapping ESTs from both castes. This procedure generated a set of 51 unique putative gene sequences overexpressed in either queen (34) or worker larvae (17). These 51 nonredundant sequences were submitted to BLASTX searches against the Official Set of GLEAN3-predicted protein sequences (cut-off value at 1e−20). For ESTs with no significant protein sequence matches, the genomic regions adjacent to the mapped EST were searched to find neighbouring ORFs, especially those nearest to putative 3′ UTRs of predicted proteins, as the EST libraries have a bias in this direction.
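The e-value filtering step can be illustrated with a minimal Python sketch. It is not the script used in this study: it assumes the BLAST results were saved in the standard 12-column tabular format (query, subject, % identity, alignment length, mismatches, gap opens, query start, query end, subject start, subject end, e-value, bit score), and the function name and the EST and scaffold identifiers are hypothetical.

```python
def best_hits(tabular_lines, max_evalue=1e-20):
    """Keep, for each query, the highest-scoring hit with e-value <= max_evalue.

    Assumes 12-column tab-separated BLAST output; queries without any hit
    below the cut-off are simply absent from the result (the 'no-matches').
    """
    best = {}
    for line in tabular_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue > max_evalue:
            continue
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best


# Hypothetical example: two hits for est1, one weak hit for est2.
example = [
    "est1\tscaffoldA\t99.0\t300\t3\t0\t1\t300\t1000\t1299\t1e-150\t550",
    "est1\tscaffoldB\t90.0\t100\t10\t0\t1\t100\t50\t149\t1e-25\t120",
    "est2\tscaffoldC\t80.0\t60\t12\t0\t1\t60\t10\t69\t1e-05\t45",
]
hits = best_hits(example)
```

In this example est2 has no hit at or below the cut-off and would be treated as a no-match.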

Official Set protein sequences were aligned against the Amel_v3.0 sequence assembly using TBLASTN to map protein to genome, and subsequently they were aligned using BLASTP against the GenBank nonredundant (nr) and the Flybase protein sequence databases. The manual feature annotation procedure of the Artemis 7.0 program (Rutherford et al., 2000) was used to map ORFs, putative splice sites of exons and ESTs to genome coordinates. The final annotation file was generated with a Python script in GFF format (http://www.sanger.ac.uk/Software/formats/GFF).
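How a mapped feature translates into a GFF record can be sketched as follows. This is only an illustration, assuming the nine tab-separated GFF columns (seqname, source, feature, start, end, score, strand, frame, attributes); the helper name and all values in the example are hypothetical and do not reproduce the annotation script itself.

```python
def gff_line(seqname, source, feature, start, end, strand,
             score=".", frame=".", attributes="."):
    """Format one feature as a nine-column, tab-separated GFF record.

    Coordinates are 1-based and inclusive, with start <= end, as GFF requires.
    """
    if start < 1 or start > end:
        raise ValueError("GFF needs 1 <= start <= end")
    if strand not in ("+", "-", "."):
        raise ValueError("strand must be '+', '-' or '.'")
    columns = [seqname, source, feature, str(start), str(end),
               str(score), strand, str(frame), attributes]
    return "\t".join(columns)


# Hypothetical EST mapped to a hypothetical scaffold.
record = gff_line("scaffoldA", "manual", "EST", 1000, 1299, "+",
                  attributes='EST "example_est"')
```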

Honey bee sequences annotated as orthologs to D. melanogaster genes were putatively assigned the GO terms listed in the respective Flybase entry. In addition, the definition of new GO terms (Biological Process ontology) related to caste development and polyphenism (GO:0048651 and GO:0048650, respectively) was co-ordinated with the Gene Ontology Consortium (Ashburner et al., 2000). The FatiGO web tool (Al-Sharour et al., 2004) was used to cluster GO terms (level 3 setting) for Biological Process and Molecular Function.

For the detection of conserved domains, the 51 protein sequences were screened against the Pfam database (http://www.sanger.ac.uk/Software/Pfam) using the HMMER platform (current release 2.3.2; http://hmmer.wustl.edu) with a cut-off value set at 1e−10.

Annotation of oogenesis and reproduction genes

In order to identify putative honey bee orthologs to D. melanogaster genes we searched the following GO terms in Flybase: ‘oogenesis’ (GO:0009993), ‘vitellogenesis’ (GO:0007296), ‘female meiosis’ (GO:0007143), ‘DNA recombination’ (GO:0006310) and ‘chromosome segregation’ (GO:0007059). Genes related to segregation distortion were searched for in Flybase in phenotypic descriptions and mutant effects of D. melanogaster genes, as this phenomenon is not represented by a GO term. Hence, this group may be more heterogeneous than the others. From this list we removed genes of pleiotropic function (multifaceted GO entries in Biological Process) and genes that lacked defined transcripts in the Drosophila genome database.

For the GO terms ‘oogenesis’ and ‘vitellogenesis’ we performed TBLASTN and BLASTP searches for 42 fruit fly genes against the Amel_v3.0 genome assembly and the GLEAN3-predicted protein sequences (Honey Bee Genome Sequencing Consortium, 2006), respectively. The orthologous D. melanogaster gene was characterized by the same procedure as described above (reciprocal best hit). For the GO terms ‘female meiosis’, ‘DNA recombination’, ‘chromosome segregation’ and the non-GO group ‘segregation distortion’, transcripts of D. melanogaster were searched against nr databases at NCBI using BLASTP. The obtained sequences were searched against the Amel_v3.0 genome assembly and the GLEAN3-predicted protein sequences using TBLASTN. Homologous sequences (threshold 1e−10) were predicted using the BioEdit software (Hall, 1999). ORFs showing significant homology (BLASTP threshold 1e−20) were assembled and used in BLASTP searches against the nr databases at NCBI.
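The reciprocal best hit criterion mentioned above can be written down in a few lines. The sketch below assumes that the best hit of every query has already been extracted in both search directions and stored in two dictionaries; the function name and the gene identifiers are hypothetical.

```python
def reciprocal_best_hits(fly_to_bee, bee_to_fly):
    """Return (fly_gene, bee_gene) pairs that are each other's best hit.

    fly_to_bee maps each fly query to its best bee hit, and bee_to_fly maps
    each bee query to its best fly hit. A pair qualifies only if the bee
    gene points back to the same fly gene.
    """
    return sorted(
        (fly, bee)
        for fly, bee in fly_to_bee.items()
        if bee_to_fly.get(bee) == fly
    )


# Hypothetical identifiers: flyA/beeX are reciprocal, flyB/beeY are not.
fly_to_bee = {"flyA": "beeX", "flyB": "beeY", "flyC": "beeZ"}
bee_to_fly = {"beeX": "flyA", "beeY": "flyA"}
pairs = reciprocal_best_hits(fly_to_bee, bee_to_fly)
```

A fly gene whose best bee hit has a better match elsewhere in the fly proteome (flyB here) is not called an ortholog under this criterion.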

Motif search in upstream regions in caste-specifically expressed genes

In order to detect overrepresented motifs in the upstream control regions (UCRs) of the two sets of caste-related genes we selected gene subsets based on two criteria: (1) those that had shown the highest caste-specificity in the array analyses (Evans & Wheeler, 2000) and (2) those that had a conserved 5′ region when compared with the Drosophila orthologs. These ‘top10’ genes consisted of six queen genes (GB13072, GB11628, GB19380, GB14798, GB16047 and GB18242) and six worker genes (GB10869, GB12371, GB12239, GB10428, GB19006 and GB14758). The motif search was conducted separately on the two sets of UCR sequences using three methods: AlignACE (Roth et al., 1998), MEME (Bailey & Elkan, 1995) and MDscan (Liu et al., 2002). Default parameter values were used in all searches, except that GC content in intergenic regions was set to 25%, representing the background value established for the honey bee UCR database generated in this study.


The database containing 10156 UCR sequences was generated by parsing the Official Set annotation file (downloaded in GFF format from httpwwwbeegenomehgscbcmtmcedubeeftphtml) to extract upstream regions starting from the terminal 5′-genomic coordinate of each predicted CDS. The UCRs were arbitrarily set to a size frame of 1000 nucleotides (Roth et al., 1998), but were trimmed whenever another predicted ORF was detected in any of these regions.
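The windowing and trimming rule can be made concrete with a short sketch. It is an illustration under stated assumptions rather than the parser used here: coordinates are taken as 1-based and inclusive (as in GFF), neighbouring ORFs are passed as (start, end) pairs on the same scaffold, and the function and argument names are hypothetical.

```python
def upstream_window(cds_start, cds_end, strand, neighbours,
                    size=1000, scaffold_length=None):
    """Return the (start, end) of the upstream region, or None if empty.

    For '+' genes the window ends just before cds_start; for '-' genes it
    begins just after cds_end. The window is shortened so that it never
    overlaps a neighbouring ORF lying on the upstream side.
    """
    if strand == "+":
        end = cds_start - 1
        start = max(1, end - size + 1)
        for _, n_end in neighbours:
            if start <= n_end < cds_start:   # neighbour reaches into window
                start = max(start, n_end + 1)
    else:
        start = cds_end + 1
        end = start + size - 1
        if scaffold_length is not None:
            end = min(end, scaffold_length)
        for n_start, _ in neighbours:
            if cds_end < n_start <= end:     # neighbour begins inside window
                end = min(end, n_start - 1)
    return (start, end) if start <= end else None


# Hypothetical gene at 5000-6000 with a neighbouring ORF at 4200-4500.
full = upstream_window(5000, 6000, "+", [])
trimmed = upstream_window(5000, 6000, "+", [(4200, 4500)])
```

Here the untrimmed window spans 4000 to 4999, and the neighbouring ORF shortens it to 4501 to 4999.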

The MAP (maximum a priori log likelihood) score, group specificity score (called Church score in this manuscript) (Hughes et al., 2000), ROC AUC (area under the curve for a receiver-operator characteristic plot) metric and MNCP (mean normalized conditional probability) metric (Clarke & Granek, 2003) were used to detect motifs that most likely correspond to biologically significant cis-regulatory elements. The filters run on the UCRs of the subsets of queen and worker genes were a MAP score cut-off value of 5.0, followed by a ROC AUC cut-off at 0.7, followed by a group specificity score cut-off at 1e−05. The UCR database of all GLEAN3-predicted honey bee genes was used as the background to calculate these metrics.
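The three successive filters can be sketched as a simple cascade. The sketch assumes that the metrics have already been computed for each motif (for instance with TAMO) and that a lower group specificity score indicates a more group-specific motif, so that the cut-off acts as an upper bound; the class, the function and the motif values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MotifScores:
    name: str
    map_score: float
    roc_auc: float
    group_specificity: float


def filter_motifs(motifs, min_map=5.0, min_auc=0.7, max_specificity=1e-5):
    """Apply the MAP, ROC AUC and group specificity cut-offs in that order."""
    kept = [m for m in motifs if m.map_score >= min_map]
    kept = [m for m in kept if m.roc_auc >= min_auc]
    # Assumption: smaller specificity scores mean stronger group association.
    kept = [m for m in kept if m.group_specificity <= max_specificity]
    return kept


# Hypothetical motifs; only the first passes all three filters.
candidates = [
    MotifScores("motif_1", 12.3, 0.85, 2e-7),
    MotifScores("motif_2", 3.1, 0.90, 1e-8),   # fails the MAP cut-off
    MotifScores("motif_3", 8.0, 0.55, 1e-9),   # fails the ROC AUC cut-off
    MotifScores("motif_4", 9.4, 0.75, 1e-3),   # fails group specificity
]
passed = filter_motifs(candidates)
```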

A parametric test (MANOVA) and a nonparametric test (Kolmogorov–Smirnov) were conducted to identify significance levels for the two sets of filtered motifs found in the UCRs of caste-specifically expressed genes against filtered motifs found in the random UCR set. Random motifs were sampled from a motif database (10 391 motifs) generated by running our script 100 times with random sets of UCR sequences.
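For readers unfamiliar with the nonparametric comparison, the two-sample Kolmogorov–Smirnov statistic is the largest vertical distance between the empirical cumulative distributions of the two samples. The sketch below computes only this statistic (not its P value) with the Python standard library; the function name and the input values are hypothetical and are not data from this study.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic D = max |F_a(x) - F_b(x)|."""
    if not sample_a or not sample_b:
        raise ValueError("both samples must be non-empty")
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    for x in sorted(set(a) | set(b)):
        f_a = bisect_right(a, x) / len(a)   # fraction of a that is <= x
        f_b = bisect_right(b, x) / len(b)   # fraction of b that is <= x
        d = max(d, abs(f_a - f_b))
    return d


# Hypothetical ROC AUC values for caste motifs and for random motifs.
caste_auc = [0.3, 0.4, 0.5, 0.6]
random_auc = [0.1, 0.2, 0.3, 0.4]
d_value = ks_statistic(caste_auc, random_auc)
```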

The main criterion for identifying known regulatory motifs among these caste-specific ones was the alignment of the PSSM for each bee motif with the Drosophila melanogaster sequences in the TRANSFAC database (release 4.0) (Wingender et al., 2000). Only the alignments passing a threshold of 80% identity for each PSSM were considered as significant matches. In addition, we checked for a specific binding motif, the EcR/USP motif (rGkTCAaTGamcy-3′), known to function in the expression of genes responding to morphogenetic hormone titres (Perera et al., 2005).
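A consensus such as the EcR/USP motif can be screened for by expanding its IUPAC ambiguity codes into a regular expression. The following sketch is an assumption-laden illustration: it treats lower-case (less conserved) positions as equally constraining as upper-case ones, scans the forward strand only, and uses a hypothetical test sequence.

```python
import re

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "K": "[GT]", "M": "[AC]",
    "S": "[CG]", "W": "[AT]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def iupac_to_regex(motif):
    """Translate an IUPAC consensus into a regular expression string."""
    return "".join(IUPAC[base] for base in motif.upper())


def find_motif(motif, sequence):
    """Return 0-based start positions of all (overlapping) forward matches."""
    pattern = re.compile("(?=(%s))" % iupac_to_regex(motif))
    return [match.start() for match in pattern.finditer(sequence.upper())]


ECR_USP = "rGkTCAaTGamcy"
# Hypothetical upstream sequence carrying one copy of the motif.
positions = find_motif(ECR_USP, "ttAGGTCAATGAACCtt")
```

A reverse-strand scan would additionally require searching the reverse complement of the sequence.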

Operating system and programming tools

An Ubuntu Linux (version Breezy) operating system was used to implement all scripts and pipelines designed for annotation procedures and motif discovery. The Python programming language (http://www.python.org), Biopython (http://www.biopython.org) and TAMO (Tools for Analysis of Motifs) packages (Gordon et al., 2005) were used in program design. Other web applications were built using the Zope application server (http://www.zope.org) hosted at http://zulu.fmrp.usp.br/beelab.

Acknowledgements

This work was supported by grants from FAPESP (Fundação de Amparo à Pesquisa do Estado de São Paulo, 9900719-6) and CNPq (Conselho Nacional de Desenvolvimento Científico e Tecnológico, 4729632-3-1).

References

Abouheif, E. and Wray, G.A. (2002) Evolution of the genetic network underlying wing polyphenism in ants. Science 297: 249–252.

Al-Sharour, F., Diaz-Uriarte, R. and Dopazo, J. (2004) FatiGO: a web tool for finding significant associations of Gene Ontology terms with groups of genes. Bioinformatics 20: 578–580.

Amdam, G.V., Norberg, K., Hagen, A. and Omholt, S.W. (2003) Social exploitation of vitellogenin. Proc Natl Acad Sci USA 100: 1799–1802.

Amdam, G.V., Simões, Z.L.P., Hagen, A., Norberg, K., Schroder, K., Mikkelsen, O., et al. (2004) Hormonal control of the yolk precursor vitellogenin regulates immune function and longevity in honeybees. Exp Gerontol 39: 767–773.

Antoniewski, C., O’Grady, M.S., Edmondson, R.G., Lassieur, S.M. and Benes, H. (1995) Characterization of an EcR/USP heterodimer target site that mediates ecdysone responsiveness of the Drosophila Lsp-2 gene. Mol Gen Genet 249: 545–556.

Arbeitsman, M.N., Furlong, E.E.M., Imam, F., Johnson, E., Null, B.H., Baker, B.S., et al. (2002) Gene expression during the life cycle of Drosophila melanogaster. Science 297: 2270–2275.

Ashburner, M., Ball, C.A., Blake, J.A., Botstein, D., Butler, H., Cherry, J.M., et al. (2000) Gene Ontology: tool for the unification of biology. Nature Genet 25: 25–29.

Bailey, T.L. and Elkan, D. (1995) Unsupervised learning of multiple motifs in biopolymers using expectation maximization. Machine Learning J 21: 58–83.

Bergem, M., Norberg, K. and Aamodt, R.M. (2006) Long-term maintenance of in vitro cultured honeybee (Apis mellifera) embryonic cells. BMC Dev Biol 6: 17 (online).

Boutros, M., Kiger, A.A., Armknecht, S., Kerr, K., Hild, M., Koch, B., et al. (2004) Genome-wide RNAi analysis of growth and viability in Drosophila cells. Science 303: 832–835.

Carsten, L.D., Watts, T. and Markov, T.A. (2005) Gene expression patterns accompanying a dietary switch in Drosophila melanogaster. Mol Ecol 14: 3203–3208.

Clarke, N.D. and Granek, J.A. (2003) Rank order metrics for quantifying the association of sequence features with gene regulation. Bioinformatics 19: 212–218.

Colombani, J., Bianchini, L., Layalle, S., Pondeville, E., Dauphin-Villemant, C., Antoniewski, C., et al. (2005) Antagonistic actions of ecdysone and insulins determine final size in Drosophila. Science 310: 667–670.

Corona, M., Estrada, E. and Zurita, M. (1999) Differential expression of mitochondrial genes between queens and workers during caste determination in the honeybee Apis mellifera. J Exp Biol 202: 929–938.

Cunha, A.C., Nascimento, A.M., Guidugli, K.R., Simões, Z.L.P. and Bitondi, M.M.G. (2005) Molecular cloning and expression of a hexamerin cDNA from the honey bee. J Insect Physiol 51: 1135–1147.

Darwin, C.R. (1859) On the Origin of Species by Means of Natural Selection. John Murray, London.

Davidson, E.H. (2001) Genomic Regulatory Systems: Development and Evolution. Academic Press, San Diego.

Eder, J., Kremer, J.P. and Rembold, H. (1983) Correlation of cytochrome c titer and respiration in Apis mellifera: adaptive responses to caste determination defines workers, intercastes and queens. Comp Biochem Physiol B 76: 703–716.

Engels, W. (1974) Occurrence and significance of vitellogenins in female castes of social Hymenoptera. Am Zool 14: 1229–1237.

Evans, J.D. and Wheeler, D.E. (1999) Differential gene expression between developing queens and workers in the honey bee, Apis mellifera. Proc Natl Acad Sci USA 96: 5575–5580.


Evans, J.D. and Wheeler, D.E. (2000) Expression profiles during honeybee caste determination. Genome Biol 2: 1, 6 pages (online: http://genomebiology.com/2000/2/1/research/0001.1).

Forbes, A. and Lehmann, R. (1998) Nanos and Pumilio have critical roles in the development and function of Drosophila germline stem cells. Development 125: 679–690.

González-Reyes, A., Elliot, H. and St Johnston, D. (1995) Polarization of both major body axes in Drosophila by gurken-torpedo signalling. Nature 375: 654–658.

Goodisman, M.A., Isoe, J., Wheeler, D.E. and Wells, M.A. (2005) Evolution of insect metamorphosis: a microarray-based study of larval and adult gene expression in the ant Camponotus festinatus. Evolution 59: 858–870.

Gordon, D.B., Nekludova, L., McCallum, S. and Fraenkel, E. (2005) TAMO: a flexible, object-oriented framework for analysing transcriptional regulation using DNA-sequence motifs. Bioinformatics 21: 3164–3165.

Guidugli, K.R., Hepperle, C. and Hartfelder, K. (2004) A member of the short-chain dehydrogenase/reductase (SDR) superfamily is a target of the ecdysone response in honey bee (Apis mellifera) caste development. Apidologie 35: 37–47.

Guidugli, K.R., Nascimento, A.M., Amdam, G.V., Barchuk, A.R., Omholt, S.W., Simões, Z.L.P. and Hartfelder, K. (2005) Vitellogenin regulates hormonal dynamics in the worker caste of a eusocial insect. FEBS Lett 579: 4961–4965.

Hagedorn, H.H., Maddison, D.R. and Tu, Z. (1998) The evolution of vitellogenins, cyclorrhaphan yolk proteins and related molecules. Adv Insect Physiol 27: 335–384.

Hall, T.A. (1999) BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucleic Acids Symp Ser 41: 95–98.

Hamilton, W.D. (1964) The genetical theory of social behaviour I & II. J Theor Biol 7: 1–16, 17–52.

Hartfelder, K. and Emlen, D.J. (2005) Endocrine control of insect polyphenism. In Comprehensive Insect Molecular Science (Gilbert, L.I., Iatrou, K. and Gill, S., eds), Vol. 3, pp. 651–703. Elsevier, Oxford.

Hartfelder, K. and Engels, W. (1998) Social insect polymorphism: Hormonal regulation of plasticity in development and reproduction in the honeybee. Curr Topics Dev Biol 40: 45–77.

Hawkins, N.C., Thorpe, J. and Schüpbach, T. (1996) encore, a gene required for the regulation of germ line mitosis and oocyte differentiation during Drosophila oogenesis. Development 122: 281–290.

Haydak, M.H. (1970) Honey bee nutrition. Annu Rev Entomol 15: 143–156.

Hepperle, C. and Hartfelder, K. (2001) Differentially expressed regulatory genes in honey bee caste development. Naturwissenschaften 88: 113–116.

Hoage, T.R. and Kessel, R.G. (1968) An electron microscopic study of the process of differentiation during spermatogenesis in the drone honey bee (Apis mellifera L.) with special reference to centriole replication and elimination. J Ultrastruct Res 24: 6–32.

Honey Bee Genome Sequencing Consortium (2006) Insights into social insects from the genome of the honeybee Apis mellifera. Nature (in press).

Hughes, J.D., Estep, P.W., Tavazoie, S. and Church, G.M. (2000) Computational identification of cis-regulatory elements associated with groups of functionally related genes in Saccharomyces cerevisiae. J Mol Biol 296: 1205–1214.

Hunt, G.J. and Page, R.E. (1995) Linkage map of the honey bee, Apis mellifera, based on RAPD markers. Genetics 139: 1371–1382.

Hunt, J.H., Buck, N.A. and Wheeler, D.E. (2003) Storage proteins in vespid wasps: characterization, developmental pattern and occurrence in adults. J Insect Physiol 49: 785–794.

Judice, C.C., Carazzole, M.F., Festa, F., Sogayar, M.C., Hartfelder, K. and Pereira, G.A.G. (2006) Gene expression profiles underlying alternative caste phenotypes in a highly eusocial bee, Melipona quadrifasciata. Insect Mol Biol 15: 33–44.

Kerr, W.E. (1950) Genetic determination of castes in the genus Melipona. Genetics 35: 143–152.

Kropácová, S. and Haslbachová, H. (1971) The influence of queenlessness and of unsealed brood on the development of ovaries in worker honeybees. J Apicult Res 10: 57–61.

Lattorff, H.M.G., Moritz, R.F.A. and Fuchs, S. (2005) A single gene determines thelytokous parthenogenesis in honey bee workers (Apis mellifera capensis). Heredity 94: 533–537.

Lattorff, H.M.G., Moritz, R.F.A., Solignac, M. and Crewe, R.M. (2006) Control of reproductive dominance by the thelytoky gene in honeybees. Biol Lett (in press).

Li, T.R. and White, K.P. (2003) Tissue-specific gene expression and ecdysone-regulated genomic networks in Drosophila. Dev Cell 5: 59–72.

Lin, H., Yue, L. and Spradling, A.C. (1994) The Drosophila fusome, a germline-specific organelle, contains membrane skeletal proteins and functions in cyst formation. Development 120: 947–956.

Liu, X.S., Brutlag, D.L. and Liu, J.S. (2002) An algorithm for finding protein-DNA binding sites with applications to chromatin-immunoprecipitation microarray experiments. Nature Biotechnol 20: 835–839.

Liu, Y., Liu, S., Wei, L., Altman, R.B. and Batzoglou, S. (2004) Eukaryotic regulatory element conservation analysis and identification using comparative genomics. Genome Res 14: 354–366.

Martinez, T., Burmester, T., Veenstra, J.A. and Wheeler, D.E. (2001) Sequence and evolution of a hexamerin from the ant Camponotus festinatus. Insect Mol Biol 9: 427–431.

McKearin, D. (1997) The Drosophila fusome, organelle biogenesis and germ cell differentiation: if you build it … Bioessays 19: 147–152.

Mirth, C., Truman, J.W. and Riddiford, L.M. (2005) The role of the prothoracic gland in determining critical weight for metamorphosis in Drosophila melanogaster. Curr Biol 15: 1–12.

Moritz, R.F.A., Kryger, P. and Allsopp, M.H. (1996) Competition for royalty in bees. Nature 384: 31.

Page, R.E. and Erickson, E.H. (1988) Reproduction by worker honey bees (Apis mellifera L.). Behav Ecol Sociobiol 23: 117–126.

Pereboom, J.J.M., Jordan, W.C., Sumner, S., Hammond, R.L. and Bourke, A.F.G. (2005) Differential gene expression in queen-worker caste determination in bumble bees. Proc R Soc B 272: 1145–1152.

Perera, S.C., Zheng, S., Feng, Q.L., Krell, P.J., Retnakaran, A. and Palli, S.R. (2005) Heterodimerization of ecdysone receptor and ultraspiracle on symmetric and asymmetric response elements. Arch Insect Biochem Physiol 60: 55–70.

Piulachs, M.D., Guidugli, K.R., Barchuk, A.R., Cruz, J., Simões, Z.L.P. and Bellés, X. (2003) The vitellogenin of the honeybee, Apis mellifera: Structural analysis of the cDNA and expression studies. Insect Biochem Mol Biol 33: 459–465.

Pritsker, M., Liu, Y.C., Beer, M.A. and Tavazoie, S. (2004) Whole genome discovery of transcription factor binding sites by network-level conservation. Genome Res 14: 99–108.

Rachinsky, A., Strambi, C., Strambi, A. and Hartfelder, K. (1990) Caste and metamorphosis – hemolymph titers of juvenile hormone and ecdysteroids in last instar honeybee larvae. Gen Comp Endocr 79: 31–38.

Riddihough, G. and Pelham, H.R.B. (1987) An ecdysone response element in the Drosophila hsp27 promoter. EMBO J 6: 3729–3734.

Roth, F.P., Hughes, J.D., Estep, P.W. and Church, G.M. (1998) Finding DNA regulatory motifs within unaligned noncoding sequences clustered by whole-genome mRNA quantification. Nature Biotechnol 16: 939–945.

Rutherford, K., Parkhill, J., Crook, J., Horsnell, T., Rice, P., Rajandream, M.A. and Barrell, B. (2000) Artemis: sequence visualisation and annotation. Bioinformatics 16: 944–945.

Schmidt-Capella, I.C. and Hartfelder, K. (1998) Juvenile hormone effect on DNA synthesis and apoptosis in caste-specific differentiation of the larval honey bee (Apis mellifera L.) ovary. J Insect Physiol 44: 385–391.

Snodgrass, R.E. (1956) Anatomy of the Honey Bee. Cornell University Press, Ithaca.

Solignac, M., Vautrin, D., Baudry, E., Mougel, F., Loiseau, A. and Cornuet, J.M. (2004) A microsatellite-based linkage map of the honeybee, Apis mellifera L. Genetics 167: 253–262.

St Johnston, D. and Nüsslein-Volhard, C. (1992) The origin of pattern and polarity in the Drosophila embryo. Cell 68: 201–219.

Staab, S. and Steinmann-Zwicky, M. (1995) Female germ cells of Drosophila require zygotic ovo and otu product for survival in larvae and pupae respectively. Mech Dev 54: 205–210.

Sullivan, A.A. and Thummel, C.S. (2003) Temporal profiles of nuclear receptor gene expression profiles reveal coordinate transcriptional responses during Drosophila development. Mol Endocrinol 17: 2125–2137.

Wassermann, W.W. and Sandelin, A. (2004) Applied bioinformatics for the identification of regulatory elements. Nature Rev Genet 5: 267–287.

West-Eberhard, M.J. (2003) Developmental Plasticity and Evolution. Oxford University Press, Oxford.

Wheeler, D.E. and Nijhout, H.F. (2003) A perspective for understanding the modes of juvenile hormone as a lipid signaling system. Bioessays 25: 994–1001.

Wilson, E.O. (1971) The Insect Societies. Belknap Press of Harvard University Press, Cambridge, MA.

Wingender, E., Chen, X., Hehl, R., Karas, H., Liebich, I., Matys, V., et al. (2000) TRANSFAC: an integrated system for gene expression regulation. Nucleic Acids Res 28: 316–319.

Zhou, X., Oi, F.M. and Scharf, M.E. (2006a) Social exploitation of hexamerin: RNAi reveals a major caste-regulatory factor in termites. Proc Natl Acad Sci USA 103: 4499–4504.

Zhou, X., Tarver, M.R., Bennett, G.W., Oi, F.M. and Scharf, M.E. (2006b) Two hexamerin genes from the termite Reticulitermes flavipes: sequence, expression and proposed functions in caste regulation. Gene 376: 47–58.

Zhu, Z., Shendure, J. and Church, G.M. (2005) Discovering functional transcription factor combinations in the human cell cycle. Genome Res 15: 848–855.

Supplementary material

The following material is available for this article online:

Table S1. Annotation results of caste-specifically expressed honey bee genes.

Table S2. Caste-specific motifs in UCRs of honey bee genes.

Table S3. Annotation of honey bee orthologs to Drosophila melanogaster genes involved in Reproduction.

This material is available as part of the online article from http://www.blackwell-synergy.com

Page 3: Caste development and reproduction: a genome-wide analysis of hallmarks of insect eusociality


namely, which caste is the novelty, the queen or the worker(s)? Phrased in other terms, the genome sequence information now permits us to address at a molecular level questions that are fundamental to understanding the role of (and evolutionary trends in) ontogenetic processes that structure insect societies, especially in hymenopterans. Such basic questions are (1) how many degrees of freedom (or release from constraints) may actually have been gained from splitting the functions normally performed by a solitary ancestral hymenopteran female into two or more castes, and (2) how was this release from constraints integrated into postembryonic differentiation processes to generate truly alternative phenotypes? A second observation of potential interest to functional genomics was that a relatively large subset of the caste-related genes maps to chromosome 2 (seven of 51 unique sequences).

Most genes in the caste gene list are represented by one or two EST hits, except for a predicted hexamerin 70b gene (GB10869-PA). This gene was evidenced by 10 ESTs, one in a 5′-located exon and nine in the 3′ region (five ESTs comprising parts of exon 7 and parts of the 3′-UTR, the other four ESTs landing in exons 6 and 7). The macroarray data (Evans & Wheeler, 2000) established this gene as overexpressed in the worker caste. Hexamerins are an important class of storage proteins that show interesting expression patterns related to caste and reproduction in many social insects (Martinez et al., 2001; Hunt et al., 2003; Zhou et al., 2006a,b). A cDNA encoding the honey bee Hexamerin 70b subunit has recently been cloned and sequenced (Cunha et al., 2005), and hormone manipulation experiments showed that the abundance of hexamerin 70b transcripts in larval development is positively correlated with high levels of JH and ecdysteroids. This could actually reflect a regulatory feedback function in JH titre regulation, as exemplified in the termite Reticulitermes flavipes, where the Hex1/Hex2 ratio controls JH availability for caste-specifically differentiating tissues (Zhou et al., 2006b).

Within the honey bee caste genes for which GO information was imported and deduced from their Drosophila orthologs, we noted a predominance of terms clustering as ‘cellular physiological process’ (95%; GO:0050875) and ‘metabolism’ (90%; GO:0008152) in the ‘Biological Process’ (GO:0008150) category (Fig. 1A). GO-statistics differences between queens and workers became apparent in terms clustering as ‘cell differentiation’ (0% for queen and 28.5% for workers; GO:0030154) and ‘metabolism’ (96% for queen and 78.5% for worker; GO:0008152) in the ‘Biological Process’ category (GO:0008150) (Fig. 2A).
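Term frequencies of this kind are simple proportions of annotated genes. The sketch below shows one way to tabulate them per caste; it does not reproduce the FatiGO level 3 clustering, and the function name, gene identifiers and term assignments are hypothetical.

```python
from collections import Counter


def term_frequencies(gene_terms):
    """Percentage of genes annotated with each GO term.

    gene_terms maps a gene identifier to the list of GO terms assigned to
    it; a term is counted once per gene even if it is listed repeatedly.
    """
    if not gene_terms:
        return {}
    counts = Counter(
        term for terms in gene_terms.values() for term in set(terms)
    )
    total = len(gene_terms)
    return {term: 100.0 * n / total for term, n in counts.items()}


# Hypothetical worker gene set with hypothetical annotations.
worker_genes = {
    "gene_1": ["metabolism"],
    "gene_2": ["metabolism", "cell differentiation"],
    "gene_3": [],
    "gene_4": ["metabolism", "metabolism"],
}
worker_freq = term_frequencies(worker_genes)
```

Running the same function on the queen gene set and comparing the two dictionaries gives the caste contrast for each term.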

With respect to ‘Molecular Function’ (GO:0003674), most terms were related to mRNA translation: ‘nucleic acid binding’ (38%; GO:0003676), ‘structural constituent of ribosome’ (24%; GO:0003735), ‘protein binding’ (12%; GO:0005515), ‘nucleotide binding’ (12%; GO:0000166) and ‘translation factor activity, nucleic acid binding’ (7%; GO:0008135). Further important terms were ‘oxidoreductase activity’ (19%; GO:0016491) and ‘hydrolase activity’ (16.5%; GO:0016787) (Fig. 1B). For these latter two terms we noted potentially interesting differences related to caste, with ‘hydrolase activity’ being overrepresented by worker-transcribed genes, whereas ‘oxidoreductase activity’ was exclusively represented by queen genes (Fig. 2B). Even though these GO assignments on Molecular Function are based on evidence from D. melanogaster, without experimental evidence for Apis mellifera, the corresponding genes are well conserved in sequence and show the relevant protein domains (Supplementary material, Table 1S) and thus are indicative of functional trends.

Figure 1. Dominant gene ontology terms for (A) Biological Process and (B) Molecular Function in honey bee genes with an experimentally validated caste-specific expression pattern during the last larval instar. The graph was generated by a FatiGO analysis set at level 3. Frequencies indicate the appearance of GO terms in the total set of queen and worker differentially expressed genes.

In general terms, the caste-specific separation into metabolic pathway preferences, oxidoreductases vs. hydrolases, may reflect the switch in diet that a worker larva experiences during the fourth and fifth larval instar. This represents a switch from a protein/lipid-rich diet to a more carbohydrate-rich diet (Haydak, 1970), and this switch apparently is accompanied by an increase in the expression of genes coding for proteins with hydrolase activity. Similar switches in gene expression patterns have recently been reported for D. melanogaster in an experiment where larvae were shifted from a cornmeal diet to a banana diet (Carsten et al., 2005), resulting in the up- or downregulation of 55 genes of a test population of 6000. Among these are five genes with dehydrogenase/oxidoreductase activity. These parallels in dietary switch responses are indicative of conserved coregulated gene networks. An open question is, of course, how these can be co-opted to generate different phenotypes, such as the castes of social insects. In this respect, social insects clearly go a big step beyond the simple metabolic switch response seen in Drosophila. They have apparently incorporated divergent metabolic regulation into a network architecture consistent with morphogenetic differentiation. This required that metabolic regulation became integrated, through the endocrine system, with developmental patterning processes.

The importance of metabolic regulation for caste development has also come to light in a recent Representational Difference Analysis (RDA) study on caste development in the highly eusocial stingless bee Melipona quadrifasciata (Judice et al., 2006). This is particularly interesting because in this genus caste development is thought to be based on a genetic predisposition (Kerr, 1950). Metabolic regulation may thus be a sine qua non for caste development, and caste-specific metabolic pathways may be set in motion rather independently of the nature of the initial switch (nutritional or genetic). The question of how this metabolic switch may integrate with the resultant endocrine signature characteristic for each caste is still a wide open field, but recent studies in Drosophila showing an interaction between ecdysone and insulin signalling in the determination of body size (Colombani et al., 2005; Mirth et al., 2005) may provide a lead.

This is also the point to reflect on how justified it is to rely heuristically on Drosophila orthologs and to use their GO attributes in a developmental context (caste differentiation) that has no parallel in Drosophila. A recent gene expression profiling study in the ant Camponotus festinatus, employing a microarray set-up of 384 clones, showed significantly different expression levels for larval vs. adult ants in 91 genes (21 confirmed by qRT–PCR), including an Apis hexamerin 70b ortholog (Goodisman et al., 2005). When comparing the temporal expression patterns of these ant genes with expression profiles for their respective Drosophila orthologs (Arbeitsman et al., 2002), relatively little accord was noted for the two species, leading to the suggestion that these genes may have taken on distinct functions due to the long divergence time between dipterans and hymenopterans (Goodisman et al., 2005). Differences aside, these examples show that in practically all studies on large-scale functional considerations in gene expression we are strongly wedded to Drosophila, and even though functional divergence in orthologs may have occurred, there is little experimental gene-by-gene evidence available for any of the major insect orders outside of Diptera.

Functional studies are clearly profiting from the now available honey bee genome sequence, as evident from the increasing number of RNAi experiments in honey bees (see citations in Honey Bee Genome Sequencing Consortium, 2006). This is still a small number compared with the large-scale RNAi assays established for Drosophila (Boutros et al., 2004), but the development of cell culture approaches in the honey bee (Bergem et al., 2006) represents a step in this direction.

Figure 2. Gene Ontology categories with caste-specific expression patterns. For Biological Process (A), genes classified as part of cell differentiation processes are significantly overexpressed in workers, whereas genes related to metabolism are overexpressed in queen larvae. In the Molecular Function categories (B) we observed an apparent split indicating differential enzyme preferences in queens (overexpress oxidoreductases) and in workers (overexpress hydrolases). The graph was generated by a FatiGO analysis set at level 3. Frequencies indicate the appearance of GO terms in the queen (black bars) and worker differentially expressed genes (grey bars).

Alternatively, regulatory functional associations between genes and their integration into networks can be inferred from the presence of response elements in upstream control regions. In our analysis of differentially expressed genes in queen-worker development we took a bioinformatics approach for a first look into the molecular architecture of a developmental polyphenism.

Motif search in upstream regions of differentially expressed genes

The genes related to caste development are among the first honey bee genes for which experimentally validated expression data were generated (Corona et al., 1999; Evans & Wheeler, 1999, 2000; Hepperle & Hartfelder, 2001; Guidugli et al., 2004). Certainly, these 51 genes do not comprise all the genes involved in caste development, but they are expected to be prominent players, as they were the ones that stood out in the SSH and DDRT-PCR approaches. The 51 caste genes do not represent gene families, but rather fall into many very different molecular function categories. This made us ask whether the observed overexpression pattern of different genes in either queen or worker larvae may be associated with the occurrence of specific regulatory motifs in the upstream control regions (UCRs) of these genes.

Three different algorithms, AlignACE (Roth et al., 1998), MEME (Bailey & Elkan, 1995) and MDscan (Liu et al., 2002), were used to construct a pipeline for detecting overrepresented motifs in the two unaligned sets of UCR sequences for the caste-specifically expressed genes. This pipeline was run on a ‘top-10’ set of 12 genes (six for each caste), which showed the most pronounced caste differences in expression (Evans & Wheeler, 2000), and also on a randomly selected set of UCRs (background control). We calculated four different metrics for each motif: MAP score (Roth et al., 1998), a group-specificity score (Church score) (Hughes et al., 2000), and a ROC AUC and MNCP metric (Clarke & Granek, 2003). A first set of filters was used to detect motifs with a potential for regulatory functions (MAP score ≥ 5, ROC AUC ≥ 0.7). This resulted in 46 motifs out of 123 total UCR motifs found in the queen UCR set and in 71 motifs out of 261 total found in the worker UCR set (Supplementary material, Table 2S).
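The first filtering step described above can be pictured with a short Python sketch. It is only an illustration under stated assumptions: the record type `MotifStats`, the function `first_pass` and the three example motifs are hypothetical, and only the two thresholds (MAP score ≥ 5, ROC AUC ≥ 0.7) are taken from the text.

```python
from dataclasses import dataclass


@dataclass
class MotifStats:
    """Hypothetical record holding the metrics computed for one motif."""
    name: str
    map_score: float
    roc_auc: float


def first_pass(motifs, min_map=5.0, min_auc=0.7):
    """Keep motifs that meet both thresholds quoted in the text."""
    return [m for m in motifs if m.map_score >= min_map and m.roc_auc >= min_auc]


# Invented values, for illustration only.
candidates = [
    MotifStats("motif_a", 12.4, 0.83),
    MotifStats("motif_b", 3.1, 0.91),   # fails the MAP cut-off
    MotifStats("motif_c", 8.0, 0.55),   # fails the ROC AUC cut-off
]
kept = first_pass(candidates)
print([m.name for m in kept])
```

Applying the same function separately to the queen and worker motif lists would mirror the two-set design of the analysis.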

A parametric statistical test (MANOVA, P = 0.0001, Wilks' lambda = 0.78, F = 72) and a nonparametric statistical test (Kolmogorov–Smirnov, Table 1) on ROC AUC and MNCP indices showed that these two sets of filtered motifs are significantly different from a randomly selected set of motifs. The rank-order metrics ROC AUC and MNCP have previously been used to compare the association of short regulatory sequence features with gene expression data (microarray analyses on coregulated genes), and they have been useful in flagging false positives erroneously included in ‘top-10’ sets of differentially expressed genes (Clarke & Granek, 2003).
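To make the rank-order idea behind ROC AUC concrete, the following Python sketch computes the area under the ROC curve in its standard pairwise (Mann–Whitney) form: the probability that a gene from the target set carries a higher motif score than a gene from the background. The function name and the example scores are assumptions for illustration; the exact scoring scheme of Clarke & Granek (2003) is not reproduced here.

```python
def roc_auc(target_scores, background_scores):
    """Pairwise form of ROC AUC: fraction of (target, background) pairs
    in which the target scores higher; ties count as one half."""
    wins = 0.0
    for t in target_scores:
        for b in background_scores:
            if t > b:
                wins += 1.0
            elif t == b:
                wins += 0.5
    return wins / (len(target_scores) * len(background_scores))


# Invented motif scores for six target UCRs and eight background UCRs.
target = [9.1, 8.4, 7.7, 7.2, 6.0, 3.5]
background = [6.5, 5.1, 4.8, 4.0, 3.9, 3.2, 2.7, 2.0]
print(round(roc_auc(target, background), 3))
```

A value near 0.5 indicates that the motif does not separate the two sets, whereas values approaching 1 indicate that target UCRs consistently outrank the background.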

To select highly specific motifs found in each data set, we used the group-specificity score (Church score ≤ 1e-05; Hughes et al., 2000) to identify the most likely motifs involved in decision making for pathways leading to queen (two motifs, Fig. 3A) or to worker development (12 motifs with Church score ≤ 1e-07, Fig. 3B). As the SSH and DDRT-PCR approaches on caste development can be expected to retrieve only a subpopulation of such genes, these motifs represent only a partial scenario of the transcriptional regulatory network underlying caste development. The motifs can now be used to screen other GLEAN3-predicted genes to integrate a candidate list of putatively coregulated genes in caste development that can be submitted to further experimental validation.
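A group-specificity score of this kind is commonly expressed as a hypergeometric tail probability: the chance that, among the best-scoring motif sites genome-wide, at least as many fall into the gene group as were observed. The Python sketch below follows that general formulation and is not taken from the authors' scripts. The function name and the example counts of hits and overlap are hypothetical; only the background size of 10 156 UCRs and the group size of six genes come from the text.

```python
from math import comb


def group_specificity(n_total, n_group, n_hits, n_overlap):
    """Hypergeometric tail P(X >= n_overlap), where X is the number of
    group members among n_hits sequences drawn from n_total."""
    upper = min(n_group, n_hits)
    tail = sum(
        comb(n_group, i) * comb(n_total - n_group, n_hits - i)
        for i in range(n_overlap, upper + 1)
    )
    return tail / comb(n_total, n_hits)


# Hypothetical example: 4 of the 6 group UCRs are among the 50 best hits
# in a background of 10 156 UCRs.
score = group_specificity(n_total=10156, n_group=6, n_hits=50, n_overlap=4)
print(score < 1e-5)
```

Smaller values indicate that the motif is more exclusive to the group, which is why the cut-offs in the text are upper bounds.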

Each motif found in UCRs of queen (46) and worker (71) overexpressed genes was compared with the entire set of D. melanogaster cis-regulatory motifs contained in the TRANSFAC database (version 4.0; Wingender et al., 2000). Only alignments passing 80% identity for each position-specific site matrix (PSSM) were considered as significant matches. Whereas none of the most specific motifs for each caste showed similarity to any of the D. melanogaster motifs, some of the more ubiquitous ones did resemble binding sites of transcription factors, such as Antennapedia, Ultrabithorax, zerknüllt, even skipped, trithorax-like, tailless, paired, fushi tarazu and Adh transcription factor 1 (Supplementary material, Table 2S).

When we plotted the positions of the two queen and the 12 worker motifs in the UCRs of the caste-specifically expressed genes (Fig. 4), an interesting pattern emerged for the worker-specific motifs. Some of the worker motifs appeared to be clustered and occurring in tandem; furthermore, they were positioned relatively close to the predicted translation start sites in some of the genes that are overexpressed during worker development (annotation results of these genes are listed in Supplementary material, Table 1S). A position close to the predicted translation start sites is generally taken as a sign of strong regulatory effect (Davidson, 2001).

Table 1. Kolmogorov–Smirnov analysis of ROC AUC and MNCP metric for statistical significance of putative regulatory motifs in upstream control regions of genes with queen or worker-specific expression patterns. These motifs were contrasted with a random set of motifs detected in a random set of UCRs of GLEAN3-predicted honey bee genes.

Group pairs                  ROC AUC    MNCP
Random × (Queen + Worker)    P > 0.1    P < 0.001
Random × Queen               P > 0.1    P < 0.005
Random × Worker              P > 0.1    P < 0.001
Queen × Worker               P < 0.1    P > 0.1

As caste development is highly dependent on changes in haemolymph titres of JH and ecdysteroids, we also screened the UCRs of the differentially expressed genes for putative nuclear receptor binding sites. Regulatory elements involved in the JH response are not well understood yet, so any prediction in this direction would be speculative (Wheeler & Nijhout, 2003). Functional ecdysone response elements (EcRE) have, however, been identified, and it is now well established that the EcR/USP complex binds to direct or inverted (palindromic) repeats (Riddihough & Pelham, 1987; Antoniewski et al., 1995; Perera et al., 2005). A PSSM search (Wassermann & Sandelin, 2004) based on a canonical representation (rGkTCAaTGamcy) (Perera et al., 2005) did not reveal any putative EcRE motif in the UCRs of the 51 caste-differentially expressed genes. However, this does not rule out that these genes respond to changes in JH and/or ecdysteroid titres, as these hormones require EcR/USP binding primarily in the expression of early genes, but not necessarily for the late response genes (Li & White, 2003; Sullivan & Thummel, 2003).
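A simplified way to picture such a screen is to expand the canonical EcRE representation into a regular expression and scan both strands of a UCR. The Python sketch below does this with standard IUPAC codes (r = A/G, k = G/T, m = A/C, y = C/T). It is a strict consensus match rather than the weighted PSSM search used in the study, it treats the lower-case positions as fully constrained (an assumption), and the function names and test sequence are hypothetical.

```python
import re

# Standard IUPAC nucleotide codes expanded to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "K": "[GT]", "M": "[AC]",
    "S": "[CG]", "W": "[AT]", "N": "[ACGT]",
}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

ECRE = "rGkTCAaTGamcy"  # canonical representation quoted in the text


def consensus_to_regex(consensus):
    """Build a lookahead pattern so that overlapping sites are reported."""
    body = "".join(IUPAC[base] for base in consensus.upper())
    return re.compile("(?=(" + body + "))")


def scan_both_strands(sequence, consensus):
    """Return (start, strand) pairs; starts are 0-based on the given strand."""
    sequence = sequence.upper()
    pattern = consensus_to_regex(consensus)
    width = len(consensus)
    hits = [(m.start(), "+") for m in pattern.finditer(sequence)]
    reverse = sequence.translate(_COMPLEMENT)[::-1]
    hits += [(len(sequence) - m.start() - width, "-")
             for m in pattern.finditer(reverse)]
    return sorted(hits)


# Invented test sequence containing one site on the forward strand.
example = "TT" + "AGGTCAATGACCT" + "GG"
print(scan_both_strands(example, ECRE))
```

An empty result for every UCR would correspond to the negative outcome reported above, although a weighted matrix search can tolerate mismatches that this strict version rejects.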

In conclusion, the predictions from such a combined strategy that searches for group-specific and for conserved regulatory motifs in GLEAN3-predicted honey bee genes represent a major transition from nonhypothesis-driven high-throughput screens to hypothesis-driven searches for context-dependent gene expression in honey bees. Such directed search results can serve as a platform for experimental analyses of genome-wide integration in hormonal control of caste development in bees. In addition, this study exemplifies how existing algorithms for detecting shared regulatory motifs can be joined into a toolkit for predicting coregulated gene expression patterns in honey bees. These methods have been shown to be robust and are gaining acceptance for use in functional and comparative genomics (Liu et al., 2004; Pritsker et al., 2004; Zhu et al., 2005).

Figure 3. Putative regulatory motifs and their consensus sequences in UCRs of queen and worker overexpressed genes. Scores for MAP, Church, ROC AUC and MNCP metrics indicate degree of group specificity and significance level.

Oogenesis and reproduction

As caste development sets the stage for reproductive division of labour, genes involved in reproductive processes are strong candidates for functional analyses. In the present study, we performed BLAST searches to identify honey bee orthologs for a list of 32 fly genes with the GO attribute ‘oogenesis’ and for four genes specifically related to ‘vitellogenesis’. The list for fly genes involved in nuclear events in germ cells consisted of 20 genes for ‘female meiosis’, 12 genes for ‘recombination’ and 21 genes under the heading ‘chromosome segregation, including segregation distortion’ (Supplementary material, Table 3S). In some cases, these GO attributes for fly genes of course overlapped.

BLASTN and BLASTX searches for these fly genes against the honey bee genome assembly 3.0 and the GLEAN3 Official Set (aa) retrieved statistically well supported putative bee orthologs for most of these candidates. For the genes involved in meiosis, recombination and chromosome segregation, this finding, although not unexpected, is of interest, as meiosis in the haploid honey bee drone is strongly modified when compared with a normal diploid meiosis. The first meiosis is initiated, but the nucleus remains undivided and only the superfluous centrioles are eliminated as cytoplasmic buds (Hoage & Kessel, 1968). An interesting gene, thelytoky (th), has recently been mapped in this context (Lattorff et al., 2005). It almost completely prevents meiotic recombination in the automixis of laying workers of the Cape honey bee. As an indication of the interplay between meiosis and later development, this locus also appears to be an integral part of various gene cascades involved in caste determination (Lattorff et al., 2006).

The fly genes retrieved in the GO searches for ‘oogenesis’ represent a much larger range of Molecular Function categories, such as transcription factors, proteins regulating translation by RNA binding, RNA helicases, enzymes (ubiquitination, transfer of sugar residues, sulfotransferase), GTPase activity, and several factors binding to cytoskeletal proteins (Supplementary material, Table 3S). This wide range of functional categories is expected, as these genes are involved in a series of different steps during oogenesis in the polytrophic meroistic ovary. Oogenesis starts out with the maintenance of germline and somatic stem cell identity in the germline niche in the upper germarium. A key gene involved in this process is pumilio (Forbes & Lehmann, 1998), which is represented by a highly conserved bee ortholog, GB10504-PA. The second step is the formation of germ cell cysts, the determination of an oocyte within each cyst, and the survival of these cysts, involving genes such as benign gonial cell neoplasm (Lin et al., 1994), encore (Hawkins et al., 1996), ovo and ovarian tumour (otu) (Staab & Steinmann-Zwicky, 1995), all well conserved in the honey bee genome. Interestingly, we could not find a clear bee ortholog for bag of marbles (bam), which is one of the prime early response genes in the cystoblast differentiation pathway in Drosophila (McKearin, 1997).

Figure 4. Map of the group-specific motifs found in queen and worker UCRs of caste-specifically expressed genes. The coding region is represented by the GLEAN3 prediction number (assembly 4.0), with arrows indicating the translation start site. Asterisks mark UCRs of the ‘top10’ set used to find the over-represented motifs.

The third step comprises previtellogenic growth of the follicle, and during these stages a number of maternal factors that define the egg and the future embryonic axes are deposited and anchored, either within the oocyte or in the perivitelline space (for review see St Johnston & Nüsslein-Volhard, 1992). In the list of Drosophila genes involved in early steps of axis determination, a couple of surprises came up in the search for honey bee orthologs. A big surprise was that we could not find a gurken ortholog in the bee, even though this gene sets up both the anterior–posterior and dorsal–ventral axes in the Drosophila egg (González-Reyes et al., 1995), whereas downstream components of the Gurken signalling cascade appear to be preserved in the bee genome (Honey Bee Genome Sequencing Consortium, 2006). Similar apparent gaps in constituents of patterning cascades were noted for the terminal regions of the embryo, such as a lack of a torso ortholog, whereas its ligand torso-like is represented by a well conserved ortholog in the bee (GB18663-PA).

With respect to genes involved in the final processes of oogenesis, we primarily looked at genes that play a part during vitellogenesis. There are four genes of interest in this class, the primary one coding for the yolk protein precursor vitellogenin. This gene has already been sequenced for the honey bee (Piulachs et al., 2003) and, as expected, it is much more closely related to vitellogenins of other insects and even vertebrates than to the Drosophila yolk proteins, which apparently are derived from lipases (Hagedorn et al., 1998). The second gene of interest is the bee ortholog to yolkless, as this (GB16571-PA) could represent a putative vitellogenin receptor. The other two Drosophila genes with clear orthologs in the bee are CG18641 and CG12139, which code for a lipase and an LDL receptor, respectively.

General conclusions

The current analysis made use of previous experimental analyses on differential transcription during caste development of honey bee larvae. In the annotation of these genes, which includes references to Gene Ontology terms associated with their respective Drosophila orthologs, two major configurations emerged. First of all, worker genes were better defined in terms of GO attributes compared with the relatively large number of queen genes that had no GO terms associated to their respective Drosophila orthologs. Even when taking into consideration the conceptual limits in attributing GO terms on molecular function and biological process from Drosophila orthologs to honey bee genes, this finding could have a bearing on general basic questions in socioevolution, namely, which caste is more divergent from a nonsocial reproductive female bee prototype or reproductive ground plan, the queen or the worker? Less speculative is the second major conclusion coming out of the GO analysis for Molecular Function, showing and confirming (Eder et al., 1983; Corona et al., 1999) the important role of metabolic regulation in caste development. This facet is demonstrated especially clearly in the caste-specific expression of oxidoreductases (queen) vs. hydrolases (workers).

The honey bee genome information not only provided a much improved annotation platform for caste-specifically expressed ESTs but, even more so, opens the possibility to explore putative regulatory features of the honey bee genome. In the current study, we employed modified Gibbs sampling and expectation-maximization algorithms (AlignACE, MDscan, MEME) to detect group-specific motifs in gene regions up to 1000 bp upstream of translation start sites. We detected 14 motifs that were significantly overrepresented in the caste genes when compared with corresponding motifs found in a random set of GLEAN3-predicted honey bee genes. The localization of such motifs in UCRs of worker-overexpressed genes revealed a clustering of such motifs close to the predicted basal promoter regions, suggesting strong regulatory effects. Such search strategies and the detected motifs can provide the lead to reveal and unravel cis-regulatory networks for and within specific contexts of honey bee biology.

Caste polyphenism in social insects makes a strong case for the emergence of novelties at a microevolutionary level (West-Eberhard, 2003). Without the pretension to discuss exhaustively the mechanisms underlying this surge of developmental plasticity, two major themes become apparent in this and other studies. Regulatory change has been demonstrated in the shut-down of wing disc patterning cascades in ants (Abouheif & Wray, 2002) and is certainly also implicit in observed temporal changes in gene expression during postembryonic development of ants and bumble bees (Goodisman et al., 2005; Pereboom et al., 2005). Such change would be expected to involve cis-regulatory elements, that is, change in transcription factor binding sites in UCRs, as approached in this study, and also evolutionary change in response thresholds to circulating morphogenetic hormones (for review see Hartfelder & Emlen, 2005). The second and quite unexpected theme is the acquisition of new systemic functions by evolutionarily rather old proteins, such as vitellogenin and hexamerins. These apparently unspectacular proteins have evolved into key players for caste evolution and reproductive division of labour via novel regulatory connectivity with JH (Amdam et al., 2004; Guidugli et al., 2005; Zhou et al., 2006a,b).

Experimental procedures

Selection and annotation of ESTs representing differentially expressed genes in honey bee caste development

The starting point was a set of 164 entries (mainly 3′-ESTs) in GenBank (BG101532–BG101697) from an SSH library (Evans & Wheeler, 1999; Evans & Wheeler, 2000). When validated by macroarray analyses, a clustering into three major classes became apparent: (I) genes overexpressed in young larvae, (II) genes overexpressed in last instar queen larvae, and (III) genes overexpressed in last instar worker larvae. For this study, we excluded the class I ESTs because their expression is not caste-specific but rather represents expression differences between young (still bipotent) and older larvae. To the class II queen ESTs (82), we added one complete cDNA entry (AY601642) from a DDRT-PCR screen (Corona et al., 1999), and to the class III set of worker ESTs (40), we added seven GenBank dbEST entries (BG149167–BG149173) from a DDRT-PCR screen on ovary development (Hepperle & Hartfelder, 2001).

The EST sequences were submitted to BLASTN searches (parameters: -G 2 -E 3 -W 15 -F ‘m D’ -U -e 1e-20) against genome sequence assembly Amel_v3.0 to retrieve matches in linked or unlinked genomic contigs and to exclude no-matches (seven ESTs in queen). ESTs that aligned within the same scaffold were checked for clustering and overlap. This clustering also served to exclude genes that were represented by non-overlapping ESTs from both castes. This procedure generated a set of 51 unique putative gene sequences overexpressed in either queen (34) or worker larvae (17). These 51 nonredundant sequences were submitted to BLASTX searches against the Official Set of GLEAN3-predicted protein sequences (cut-off value at 1e-20). For ESTs with no significant protein sequence matches, the genomic regions adjacent to the mapped EST were searched to find neighbouring ORFs, especially those nearest to putative 3′ UTRs of predicted proteins, as the EST libraries have a bias in this direction.

Official Set protein sequences were aligned against the Amel_v3.0 sequence assembly using TBLASTN to map protein to genome, and subsequently they were aligned using BLASTP against the GenBank nonredundant (nr) and the Flybase protein sequence databases. The manual features annotation procedure of the Artemis 7.0 program (Rutherford et al., 2000) was used to map ORFs, putative splice sites of exons, and ESTs to genome coordinates. The final annotation file was generated with a Python script in GFF format (http://www.sanger.ac.uk/Software/formats/GFF).

Honey bee sequences annotated as orthologs to D. melanogaster genes were putatively assigned the GO terms listed in the respective Flybase entry. In addition, the definition of new GO terms (Biological Process ontology) related to caste development and polyphenism (GO:0048651 and GO:0048650, respectively) was co-ordinated with the Gene Ontology Consortium (Ashburner et al., 2000). The FatiGO web tool (Al-Sharour et al., 2004) was used to cluster GO terms (level 3 setting) for Biological Process and Molecular Function.

For the detection of conserved domains, the 51 protein sequences were screened against the Pfam database (http://www.sanger.ac.uk/Software/Pfam) using the HMMER platform (current release 2.3.2; http://hmmer.wustl.edu) with a cut-off value set at 1e-10.

Annotation of oogenesis and reproduction genes

In order to identify putative honey bee orthologs to D. melanogaster genes, we searched the following GO terms in Flybase: ‘oogenesis’ (GO:0009993), ‘vitellogenesis’ (GO:0007296), ‘female meiosis’ (GO:0007143), ‘DNA recombination’ (GO:0006310) and ‘chromosome segregation’ (GO:0007059). Genes related to segregation distortion were searched for in Flybase in phenotypic descriptions and mutant effects of D. melanogaster genes, as this phenomenon is not represented by a GO term. Hence, this group may be more heterogeneous than the others. From this list, we removed genes of pleiotropic function (multifaceted GO entries in Biological Process) and genes that lacked defined transcripts in the Drosophila genome database.

For the GO terms ‘oogenesis’ and ‘vitellogenesis’, we performed TBLASTN and BLASTP searches for 42 fruit fly genes against the Amel_v3.0 genome assembly and the GLEAN3-predicted protein sequences (Honey Bee Genome Sequencing Consortium, 2006), respectively. The orthologous D. melanogaster gene was characterized by the same procedure as described above (reciprocal best hit). For the GO terms ‘female meiosis’, ‘DNA recombination’, ‘chromosome segregation’ and the non-GO group ‘segregation distortion’, transcripts of D. melanogaster were searched against nr databases at NCBI using BLASTP. The obtained sequences were searched against the Amel_v3.0 genome assembly and the GLEAN3-predicted protein sequences using TBLASTN. Homologous sequences (threshold 1e-10) were predicted using the BioEdit software (Hall, 1999). ORFs showing significant homology (BLASTP threshold 1e-20) were assembled and used in BLASTP searches against the nr databases at NCBI.
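The reciprocal best hit criterion mentioned above can be sketched in a few lines of Python. The sketch assumes that BLAST results have already been reduced to (query, subject, E-value) tuples; the function names, the sequence identifiers and the E-values are all hypothetical, and the default threshold simply reuses the 1e-20 cut-off quoted in the text.

```python
def best_hits(hits, max_evalue=1e-20):
    """Map each query to its lowest-E-value subject among hits passing the cut-off."""
    best = {}
    for query, subject, evalue in hits:
        if evalue > max_evalue:
            continue
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(fly_vs_bee, bee_vs_fly, max_evalue=1e-20):
    """Keep fly/bee pairs that are each other's best hit in both directions."""
    forward = best_hits(fly_vs_bee, max_evalue)
    backward = best_hits(bee_vs_fly, max_evalue)
    return {fly: bee for fly, bee in forward.items() if backward.get(bee) == fly}


# Invented identifiers and E-values, for illustration only.
fly_vs_bee = [("flyA", "bee1", 1e-80), ("flyA", "bee2", 1e-40),
              ("flyB", "bee2", 1e-30), ("flyC", "bee3", 1e-5)]
bee_vs_fly = [("bee1", "flyA", 1e-75), ("bee2", "flyA", 1e-35),
              ("bee2", "flyB", 1e-25), ("bee3", "flyC", 1e-5)]
orthologs = reciprocal_best_hits(fly_vs_bee, bee_vs_fly)
print(orthologs)
```

In this toy data set, flyB is rejected because its best bee hit points back to flyA, and flyC is rejected because its only hit fails the E-value cut-off.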

Motif search in upstream regions in caste-specifically expressed genes

In order to detect overrepresented motifs in the upstream control regions (UCRs) of the two sets of caste-related genes, we selected gene subsets based on two criteria: (1) those that had shown the highest caste-specificity in the array analyses (Evans & Wheeler, 2000), and (2) those that had a conserved 5′ region when compared with the Drosophila orthologs. These ‘top10’ genes consisted of six queen genes (GB13072, GB11628, GB19380, GB14798, GB16047 and GB18242) and of six worker genes (GB10869, GB12371, GB12239, GB10428, GB19006, GB14758). The motif search was conducted separately on the two sets of UCR sequences using three methods: AlignACE (Roth et al., 1998), MEME (Bailey & Elkan, 1995) and MDscan (Liu et al., 2002). Default parameter values were used in all searches, except that GC content in intergenic regions was set to 25%, representing the background value established for the honey bee UCR database generated in this study.
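Why the background GC content matters can be illustrated with a generic log-odds weight matrix: in an AT-rich background, G and C at a motif position are more informative than they would be under uniform base frequencies. The Python sketch below uses a textbook log-odds formulation with a pseudocount and is not the internal scoring of AlignACE, MEME or MDscan. The function names and the aligned example sites are hypothetical; only the 25% GC value comes from the text.

```python
from math import log2

# Background with 25% GC, split evenly between G and C (an assumption).
BACKGROUND = {"A": 0.375, "C": 0.125, "G": 0.125, "T": 0.375}
UNIFORM = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}


def build_log_odds(sites, background=BACKGROUND, pseudocount=1.0):
    """Build one {base: log2(freq / background)} row per aligned position."""
    matrix = []
    for pos in range(len(sites[0])):
        column = [site[pos] for site in sites]
        row = {}
        for base, bg in background.items():
            freq = (column.count(base) + pseudocount * bg) / (len(sites) + pseudocount)
            row[base] = log2(freq / bg)
        matrix.append(row)
    return matrix


def score(matrix, word):
    """Sum the log-odds contributions of each base in the word."""
    return sum(row[base] for row, base in zip(matrix, word))


# Invented aligned sites, for illustration only.
sites = ["GCGC", "GCGC", "GCGA"]
at_rich = build_log_odds(sites)
uniform = build_log_odds(sites, background=UNIFORM)
print(score(at_rich, "GCGC") > score(uniform, "GCGC"))
```

The same GC-rich word scores higher against the AT-rich background than against a uniform one, which is the effect the adjusted parameter is meant to capture.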


The database containing 10 156 UCR sequences was generated by parsing the Official Set annotation file (downloaded in GFF format from http://www.beegenome.hgsc.bcm.tmc.edu/beeftp.html) to extract upstream regions starting from the terminal 5′-genomic coordinate of each predicted CDS. The UCRs were arbitrarily set to a size frame of 1000 nucleotides (Roth et al., 1998), but were trimmed whenever another predicted ORF was detected in any of these regions.
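The coordinate logic of this extraction step can be sketched as follows. This is a minimal Python illustration, assuming 1-based inclusive GFF coordinates and one CDS span per gene; the `CDS` record, the function name and the example coordinates are hypothetical, and the original parsing script is not reproduced. Scaffold ends are not checked for minus-strand genes because scaffold lengths are not part of this toy input.

```python
from typing import NamedTuple


class CDS(NamedTuple):
    """Hypothetical record for one predicted coding region."""
    gene: str
    scaffold: str
    start: int   # 1-based, inclusive (GFF convention)
    end: int
    strand: str  # "+" or "-"


def upstream_region(target, all_cds, size=1000):
    """Return (start, end) of the upstream window, trimmed at neighbouring CDSs."""
    if target.strand == "+":
        lo, hi = max(1, target.start - size), target.start - 1
    else:
        lo, hi = target.end + 1, target.end + size
    for other in all_cds:
        if other.gene == target.gene or other.scaffold != target.scaffold:
            continue
        if other.end < lo or other.start > hi:
            continue  # neighbour does not overlap the current window
        if target.strand == "+":
            lo = max(lo, other.end + 1)    # keep the part adjacent to the start site
        else:
            hi = min(hi, other.start - 1)
    return (lo, hi) if lo <= hi else None


# Invented coordinates, for illustration only.
genes = [
    CDS("g1", "scaffold1", 5000, 6000, "+"),
    CDS("g2", "scaffold1", 4200, 4500, "-"),
    CDS("g3", "scaffold1", 100, 900, "-"),
]
for g in genes:
    print(g.gene, upstream_region(g, genes))
```

Trimming always keeps the stretch adjacent to the translation start site, so the window for g1 shrinks to the 499 nucleotides between g2 and its own start.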

The MAP (maximum a posteriori log likelihood) score, group specificity score (called Church score in this manuscript) (Hughes et al., 2000), ROC AUC (area under the curve for a receiver-operator characteristic plot) metric and MNCP (mean normalized conditional probability) metric (Clarke & Granek, 2003) were used to detect motifs that most likely correspond to biologically significant cis-regulatory elements. The filters run on the UCRs of the subsets of queen and worker genes were a MAP score cut-off value of 5.0, followed by a ROC AUC cut-off at 0.7, followed by a group specificity score cut-off at 1e-05. The UCR database of all GLEAN3-predicted honey bee genes was used as the background to calculate these metrics.

A parametric test (MANOVA) and a nonparametric test (Kolmogorov–Smirnov) were conducted to identify significance levels for the two sets of filtered motifs found in the UCRs of caste-specifically expressed genes against filtered motifs found in the random UCR set. Random motifs were sampled from a motif database (10 391 motifs) generated by running our script 100 times with random sets of UCR sequences.
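For readers unfamiliar with the nonparametric test, the two-sample Kolmogorov–Smirnov statistic is the largest vertical distance between the empirical cumulative distributions of the two samples. The Python sketch below computes only this statistic D; it does not derive a P value, for which a statistics package (for example SciPy's `ks_2samp`) would normally be used. The function name and the example metric values are assumptions for illustration.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Maximum absolute difference between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    for x in sorted(set(a) | set(b)):
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


# Invented metric values for a filtered motif set and a random motif set.
caste_motifs = [0.6, 0.7, 0.8, 0.9]
random_motifs = [0.5, 0.6, 0.7, 0.8]
print(ks_statistic(caste_motifs, random_motifs))
```

Because the statistic depends only on the ordering of values, it makes no assumption about the shape of the underlying distributions, which complements the parametric MANOVA.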

The main criterion for identifying known regulatory motifs among these caste-specific ones was the alignment of the PSSM for each bee motif with the Drosophila melanogaster sequences in the TRANSFAC database (release 4.0) (Wingender et al., 2000). Only the alignments passing a threshold of 80% identity for each PSSM were considered as significant matches. In addition, we checked for a specific binding motif, the EcR/USP motif (5′-rGkTCAaTGamcy-3′), known to function in the expression of genes responding to morphogenetic hormone titres (Perera et al., 2005).

Operating system and programming tools

An Ubuntu Linux (version Breezy) operating system was used to implement all scripts and pipelines designed for annotation procedures and motif discovery. The Python programming language (http://www.python.org), Biopython (http://www.biopython.org) and TAMO (Tools for Analysis of Motifs) packages (Gordon et al., 2005) were used in program design. Other web applications were built using the Zope application server (http://www.zope.org), hosted at http://zulu.fmrp.usp.br/beelab.

Acknowledgements

This work was supported by grants from FAPESP (Fundação de Amparo a Pesquisa do Estado de São Paulo, 9900719-6) and CNPq (Conselho Nacional de Desenvolvimento Científico e Tecnológico, 4729632-3-1).

References

Abouheif, E. and Wray, G.A. (2002) Evolution of the genetic network underlying wing polyphenism in ants. Science 297: 249–252.

Al-Sharour, F., Diaz-Uriarte, R. and Dopazo, J. (2004) FatiGO: a web tool for finding significant associations of Gene Ontology terms with groups of genes. Bioinformatics 20: 578–580.

Amdam, G.V., Norberg, K., Hagen, A. and Omholt, S.W. (2003) Social exploitation of vitellogenin. Proc Natl Acad Sci USA 100: 1799–1802.

Amdam, G.V., Simões, Z.L.P., Hagen, A., Norberg, K., Schroder, K., Mikkelsen, O., et al. (2004) Hormonal control of the yolk precursor vitellogenin regulates immune function and longevity in honeybees. Exp Gerontol 39: 767–773.

Antoniewski, C., O'Grady, M.S., Edmondson, R.G., Lassieur, S.M. and Benes, H. (1995) Characterization of an EcR/USP heterodimer target site that mediates ecdysone responsiveness of the Drosophila Lsp-2 gene. Mol Gen Genet 249: 545–556.

Arbeitman, M.N., Furlong, E.E.M., Imam, F., Johnson, E., Null, B.H., Baker, B.S., et al. (2002) Gene expression during the life cycle of Drosophila melanogaster. Science 297: 2270–2275.

Ashburner, M., Ball, C.A., Blake, J.A., Botstein, D., Butler, H., Cherry, J.M., et al. (2000) Gene Ontology: tool for the unification of biology. Nature Genet 25: 25–29.

Bailey, T.L. and Elkan, C. (1995) Unsupervised learning of multiple motifs in biopolymers using expectation maximization. Machine Learning J 21: 58–83.

Bergem, M., Norberg, K. and Aamodt, R.M. (2006) Long-term maintenance of in vitro cultured honeybee (Apis mellifera) embryonic cells. BMC Dev Biol 6: 17 (online).

Boutros, M., Kiger, A.A., Armknecht, S., Kerr, K., Hild, M., Koch, B., et al. (2004) Genome-wide RNAi analysis of growth and viability in Drosophila cells. Science 303: 832–835.

Carsten, L.D., Watts, T. and Markow, T.A. (2005) Gene expression patterns accompanying a dietary switch in Drosophila melanogaster. Mol Ecol 14: 3203–3208.

Clarke, N.D. and Granek, J.A. (2003) Rank order metrics for quantifying the association of sequence features with gene regulation. Bioinformatics 19: 212–218.

Colombani, J., Bianchini, L., Layalle, S., Pondeville, E., Dauphin-Villemant, C., Antoniewski, C., et al. (2005) Antagonistic actions of ecdysone and insulins determine final size in Drosophila. Science 310: 667–670.

Corona, M., Estrada, E. and Zurita, M. (1999) Differential expression of mitochondrial genes between queens and workers during caste determination in the honeybee Apis mellifera. J Exp Biol 202: 929–938.

Cunha, A.C., Nascimento, A.M., Guidugli, K.R., Simões, Z.L.P. and Bitondi, M.M.G. (2005) Molecular cloning and expression of a hexamerin cDNA from the honey bee. J Insect Physiol 51: 1135–1147.

Darwin, C.R. (1859) On the Origin of Species by Means of Natural Selection. John Murray, London.

Davidson, E.H. (2001) Genomic Regulatory Systems: Development and Evolution. Academic Press, San Diego.

Eder, J., Kremer, J.P. and Rembold, H. (1983) Correlation of cytochrome c titer and respiration in Apis mellifera: adaptive responses to caste determination defines workers, intercastes and queens. Comp Biochem Physiol B 76: 703–716.

Engels, W. (1974) Occurrence and significance of vitellogenins in female castes of social Hymenoptera. Am Zool 14: 1229–1237.

Evans, J.D. and Wheeler, D.E. (1999) Differential gene expression between developing queens and workers in the honey bee, Apis mellifera. Proc Natl Acad Sci USA 96: 5575–5580.


Evans, J.D. and Wheeler, D.E. (2000) Expression profiles during honeybee caste determination. Genome Biol 2(1): 6 pages, online: http://genomebiology.com/2000/2/1/research/0001.1

Forbes, A. and Lehmann, R. (1998) Nanos and Pumilio have critical roles in the development and function of Drosophila germline stem cells. Development 125: 679–690.

González-Reyes, A., Elliot, H. and St Johnston, D. (1995) Polarization of both major body axes in Drosophila by gurken-torpedo signalling. Nature 375: 654–658.

Goodisman, M.A., Isoe, J., Wheeler, D.E. and Wells, M.A. (2005) Evolution of insect metamorphosis: a microarray-based study of larval and adult gene expression in the ant Camponotus festinatus. Evolution 59: 858–870.

Gordon, D.B., Nekludova, L., McCallum, S. and Fraenkel, E. (2005) TAMO: a flexible, object-oriented framework for analysing transcriptional regulation using DNA-sequence motifs. Bioinformatics 21: 3164–3165.

Guidugli, K.R., Hepperle, C. and Hartfelder, K. (2004) A member of the short-chain dehydrogenase/reductase (SDR) superfamily is a target of the ecdysone response in honey bee (Apis mellifera) caste development. Apidologie 35: 37–47.

Guidugli, K.R., Nascimento, A.M., Amdam, G.V., Barchuk, A.R., Omholt, S.W., Simões, Z.L.P. and Hartfelder, K. (2005) Vitellogenin regulates hormonal dynamics in the worker caste of a eusocial insect. FEBS Lett 579: 4961–4965.

Hagedorn, H.H., Maddison, D.R. and Tu, Z. (1998) The evolution of vitellogenins, cyclorrhaphan yolk proteins and related molecules. Adv Insect Physiol 27: 335–384.

Hall, T.A. (1999) BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucleic Acids Symp Ser 41: 95–98.

Hamilton, W.D. (1964) The genetical theory of social behaviour I & II. J Theor Biol 7: 1–16, 17–52.

Hartfelder, K. and Emlen, D.J. (2005) Endocrine control of insect polyphenism. In Comprehensive Insect Molecular Science (Gilbert, L.I., Iatrou, K. and Gill, S., eds), Vol. 3, pp. 651–703. Elsevier, Oxford.

Hartfelder, K. and Engels, W. (1998) Social insect polymorphism: Hormonal regulation of plasticity in development and reproduction in the honeybee. Curr Topics Dev Biol 40: 45–77.

Hawkins, N.C., Thorpe, J. and Schüpbach, T. (1996) encore, a gene required for the regulation of germ line mitosis and oocyte differentiation during Drosophila oogenesis. Development 122: 281–290.

Haydak, M.H. (1970) Honey bee nutrition. Annu Rev Entomol 15: 143–156.

Hepperle, C. and Hartfelder, K. (2001) Differentially expressed regulatory genes in honey bee caste development. Naturwissenschaften 88: 113–116.

Hoage, T.R. and Kessel, R.G. (1968) An electron microscopic study of the process of differentiation during spermatogenesis in the drone honey bee (Apis mellifera L.) with special reference to centriole replication and elimination. J Ultrastruct Res 24: 6–32.

Honey Bee Genome Sequencing Consortium (2006) Insights into social insects from the genome of the honeybee Apis mellifera. Nature (in press).

Hughes, J.D., Estep, P.W., Tavazoie, S. and Church, G.M. (2000) Computational identification of cis-regulatory elements associated with groups of functionally related genes in Saccharomyces cerevisiae. J Mol Biol 296: 1205–1214.

Hunt, G.J. and Page, R.E. (1995) Linkage map of the honey bee, Apis mellifera, based on RAPD markers. Genetics 139: 1371–1382.

Hunt, J.H., Buck, N.A. and Wheeler, D.E. (2003) Storage proteins in vespid wasps: characterization, developmental pattern, and occurrence in adults. J Insect Physiol 49: 785–794.

Judice, C.C., Carazzole, M.F., Festa, F., Sogayar, M.C., Hartfelder, K. and Pereira, G.A.G. (2006) Gene expression profiles underlying alternative caste phenotypes in a highly eusocial bee, Melipona quadrifasciata. Insect Mol Biol 15: 33–44.

Kerr, W.E. (1950) Genetic determination of castes in the genus Melipona. Genetics 35: 143–152.

Kropácová, S. and Haslbachová, H. (1971) The influence of queenlessness and of unsealed brood on the development of ovaries in worker honeybees. J Apicult Res 10: 57–61.

Lattorff, H.M.G., Moritz, R.F.A. and Fuchs, S. (2005) A single gene determines thelytokous parthenogenesis in honey bee workers (Apis mellifera capensis). Heredity 94: 533–537.

Lattorff, H.M.G., Moritz, R.F.A., Solignac, M. and Crewe, R.M. (2006) Control of reproductive dominance by the thelytoky gene in honeybees. Biol Lett (in press).

Li, T.R. and White, K.P. (2003) Tissue-specific gene expression and ecdysone-regulated genomic networks in Drosophila. Dev Cell 5: 59–72.

Lin, H., Yue, L. and Spradling, A.C. (1994) The Drosophila fusome, a germline-specific organelle, contains membrane skeletal proteins and functions in cyst formation. Development 120: 947–956.

Liu, X.S., Brutlag, D.L. and Liu, J.S. (2002) An algorithm for finding protein-DNA binding sites with applications to chromatin-immunoprecipitation microarray experiments. Nature Biotechnol 20: 835–839.

Liu, Y., Liu, S., Wei, L., Altman, R.B. and Batzoglou, S. (2004) Eukaryotic regulatory element conservation analysis and identification using comparative genomics. Genome Res 14: 354–366.

Martinez, T., Burmester, T., Veenstra, J.A. and Wheeler, D.E. (2001) Sequence and evolution of a hexamerin from the ant Camponotus festinatus. Insect Mol Biol 9: 427–431.

McKearin, D. (1997) The Drosophila fusome, organelle biogenesis and germ cell differentiation: if you build it … Bioessays 19: 147–152.

Mirth, C., Truman, J.W. and Riddiford, L.M. (2005) The role of the prothoracic gland in determining critical weight for metamorphosis in Drosophila melanogaster. Curr Biol 15: 1–12.

Moritz, R.F.A., Kryger, P. and Allsopp, M.H. (1996) Competition for royalty in bees. Nature 384: 31.

Page, R.E. and Erickson, E.H. (1988) Reproduction by worker honeybees (Apis mellifera L.). Behav Ecol Sociobiol 23: 117–126.

Pereboom, J.J.M., Jordan, W.C., Sumner, S., Hammond, R.L. and Bourke, A.F.G. (2005) Differential gene expression in queen-worker caste determination in bumble bees. Proc R Soc B 272: 1145–1152.

Perera, S.C., Zheng, S., Feng, Q.L., Krell, P.J., Retnakaran, A. and Palli, S.R. (2005) Heterodimerization of ecdysone receptor and ultraspiracle on symmetric and asymmetric response elements. Arch Insect Biochem Physiol 60: 55–70.

Piulachs, M.D., Guidugli, K.R., Barchuk, A.R., Cruz, J., Simões, Z.L.P. and Bellés, X. (2003) The vitellogenin of the honeybee, Apis mellifera: Structural analysis of the cDNA and expression studies. Insect Biochem Mol Biol 33: 459–465.

Pritsker, M., Liu, Y.C., Beer, M.A. and Tavazoie, S. (2004) Whole-genome discovery of transcription factor binding sites by network-level conservation. Genome Res 14: 99–108.

Rachinsky, A., Strambi, C., Strambi, A. and Hartfelder, K. (1990) Caste and metamorphosis: hemolymph titers of juvenile hormone and ecdysteroids in last instar honeybee larvae. Gen Comp Endocr 79: 31–38.

Riddihough, G. and Pelham, H.R.B. (1987) An ecdysone response element in the Drosophila hsp27 promoter. EMBO J 6: 3729–3734.

Roth, F.P., Hughes, J.D., Estep, P.W. and Church, G.M. (1998) Finding DNA regulatory motifs within unaligned noncoding sequences clustered by whole-genome mRNA quantification. Nature Biotechnol 16: 939–945.

Rutherford, K., Parkhill, J., Crook, J., Horsnell, T., Rice, P., Rajandream, M.A. and Barrell, B. (2000) Artemis: sequence visualisation and annotation. Bioinformatics 16: 944–945.

Schmidt-Capella, I.C. and Hartfelder, K. (1998) Juvenile hormone effect on DNA synthesis and apoptosis in caste-specific differentiation of the larval honey bee (Apis mellifera L.) ovary. J Insect Physiol 44: 385–391.

Snodgrass, R.E. (1956) Anatomy of the Honey Bee. Cornell University Press, Ithaca.

Solignac, M., Vautrin, D., Baudry, E., Mougel, F., Loiseau, A. and Cornuet, J.M. (2004) A microsatellite-based linkage map of the honeybee, Apis mellifera L. Genetics 167: 253–262.

St Johnston, D. and Nüsslein-Volhard, C. (1992) The origin of pattern and polarity in the Drosophila embryo. Cell 68: 201–219.

Staab, S. and Steinmann-Zwicky, M. (1995) Female germ cells of Drosophila require zygotic ovo and otu product for survival in larvae and pupae respectively. Mech Dev 54: 205–210.

Sullivan, A.A. and Thummel, C.S. (2003) Temporal profiles of nuclear receptor gene expression reveal coordinate transcriptional responses during Drosophila development. Mol Endocrinol 17: 2125–2137.

Wasserman, W.W. and Sandelin, A. (2004) Applied bioinformatics for the identification of regulatory elements. Nature Rev Genet 5: 267–287.

West-Eberhard, M.J. (2003) Developmental Plasticity and Evolution. Oxford University Press, Oxford.

Wheeler, D.E. and Nijhout, H.F. (2003) A perspective for understanding the modes of juvenile hormone as a lipid signaling system. Bioessays 25: 994–1001.

Wilson, E.O. (1971) The Insect Societies. Belknap Press of Harvard University Press, Cambridge, MA.

Wingender, E., Chen, X., Hehl, R., Karas, H., Liebich, I., Matys, V., et al. (2000) TRANSFAC: an integrated system for gene expression regulation. Nucleic Acids Res 28: 316–319.

Zhou, X., Oi, F.M. and Scharf, M.E. (2006a) Social exploitation of hexamerin: RNAi reveals a major caste-regulatory factor in termites. Proc Natl Acad Sci USA 103: 4499–4504.

Zhou, X., Tarver, M.R., Bennett, G.W., Oi, F.M. and Scharf, M.E. (2006b) Two hexamerin genes from the termite Reticulitermes flavipes: sequence, expression, and proposed functions in caste regulation. Gene 376: 47–58.

Zhu, Z., Shendure, J. and Church, G.M. (2005) Discovering functional transcription factor combinations in the human cell cycle. Genome Res 15: 848–855.

Supplementary material

The following material is available for this article online:

Table S1. Annotation results of caste-specifically expressed honey bee genes.

Table S2. Caste-specific motifs in UCRs of honey bee genes.

Table S3. Annotation of honey bee orthologs to Drosophila melanogaster genes involved in Reproduction.

This material is available as part of the online article from http://www.blackwell-synergy.com


relevant protein domains (Supplementary material, Table S1) and thus are indicative of functional trends.

In general terms, the caste-specific separation into metabolic pathway preferences, oxidoreductases vs. hydrolases, may reflect the switch in diet that a worker larva experiences during the fourth and fifth larval instar. This represents a switch from a protein/lipid-rich diet to a more carbohydrate-rich diet (Haydak, 1970), and this switch apparently is accompanied by an increase in the expression of genes coding for proteins with hydrolase activity. Similar switches in gene expression patterns have recently been reported for D. melanogaster in an experiment where larvae were shifted from a cornmeal diet to a banana diet (Carsten et al., 2005), resulting in the up- or downregulation of 55 genes of a test population of 6000. Among these are five genes with dehydrogenase/oxidoreductase activity. These parallels in dietary switch responses are indicative of conserved coregulated gene networks. An open question is, of course, how these can be co-opted to generate different phenotypes, such as the castes of social insects. In this respect, social insects clearly go a big step beyond the simple metabolic switch response seen in Drosophila. They have apparently incorporated divergent metabolic regulation into a network architecture consistent with morphogenetic differentiation. This required that metabolic regulation became integrated, through the endocrine system, with developmental patterning processes.

The importance of metabolic regulation in caste development has also come to light in a recent Representational Difference Analysis (RDA) study on caste development in the highly eusocial stingless bee Melipona quadrifasciata (Judice et al., 2006). This is particularly interesting because in this genus caste development is thought to be based on a genetic predisposition (Kerr, 1950). Metabolic regulation may thus be a sine qua non for caste development, and caste-specific metabolic pathways may be set in motion rather independently of the nature of the initial switch (nutritional or genetic). The question of how this metabolic switch may integrate with the resultant endocrine signature characteristic for each caste is still a wide open field, but recent studies in Drosophila showing an interaction between ecdysone and insulin signalling in the determination of body size (Colombani et al., 2005; Mirth et al., 2005) may provide a lead.

This is also the point to reflect on how justified it is to rely heuristically on Drosophila orthologs and to use their GO attributes in a developmental context (caste differentiation) that has no parallel in Drosophila. A recent gene expression profiling study in the ant Camponotus festinatus, employing a microarray set-up of 384 clones, showed significantly different expression levels for larval vs. adult ants in 91 genes (21 confirmed by qRT–PCR), including an Apis hexamerin 70b ortholog (Goodisman et al., 2005). When comparing the temporal expression patterns of these ant genes with expression profiles for their respective Drosophila orthologs (Arbeitman et al., 2002), relatively little accord was noted for the two species, leading to the suggestion that these genes may have taken on distinct functions due to the long divergence time between dipterans and hymenopterans (Goodisman et al., 2005). Differences aside, these examples show that in practically all studies on large-scale functional considerations in gene expression we are strongly wedded to Drosophila, and even though functional divergence in orthologs may have occurred, there is little experimental gene-by-gene evidence available for any of the major insect orders outside of Diptera.

Functional studies are clearly profiting from the now available honey bee genome sequence, as evident from the increasing number of RNAi experiments in honeybees (see citations in Honey Bee Genome Sequencing Consortium, 2006). This is still a small number compared with the large-scale RNAi assays established for Drosophila (Boutros et al., 2004), but the development of cell culture approaches in the honey bee (Bergem et al., 2006) represents a step in this direction.

Figure 2. Gene Ontology categories with caste-specific expression patterns for Biological Process (A). Genes classified as part of cell differentiation processes are significantly overexpressed in workers, whereas genes related to metabolism are overexpressed in queen larvae. In the Molecular Function categories (B), we observed an apparent split indicating differential enzyme preferences in queens (overexpress oxidoreductases) and in workers (overexpress hydrolases). The graph was generated by a FatiGO analysis set at level 3. Frequencies indicate the appearance of GO terms in the queen (black bars) and worker differentially expressed genes (grey bars).

Alternatively, regulatory functional associations between genes and their integration into networks can be inferred from the presence of response elements in upstream control regions. In our analysis of differentially expressed genes in queen-worker development, we took a bioinformatics approach for a first look into the molecular architecture of a developmental polyphenism.

Motif search in upstream regions of differentially expressed genes

The genes related to caste development are among the first honey bee genes for which experimentally validated expression data were generated (Corona et al., 1999; Evans & Wheeler, 1999, 2000; Hepperle & Hartfelder, 2001; Guidugli et al., 2004). Certainly, these 51 genes do not comprise all the genes involved in caste development, but they are expected to be prominent players, as they were the ones that stood out in the SSH and DDRT-PCR approaches. The 51 caste genes do not represent gene families, but rather fall into many very different molecular function categories. This made us ask whether the observed overexpression pattern of different genes in either queen or worker larvae may be associated with the occurrence of specific regulatory motifs in the upstream control regions (UCR) of these genes.

Three different algorithms, AlignACE (Roth et al., 1998), MEME (Bailey & Elkan, 1995) and MDscan (Liu et al., 2002), were used to construct a pipeline for detecting overrepresented motifs in the two unaligned sets of UCR sequences for the caste-specifically expressed genes. This pipeline was run on a 'top-10' set of 12 genes (six for each caste), which showed the most pronounced caste differences in expression (Evans & Wheeler, 2000), and also on a randomly selected set of UCRs (background control). We calculated four different metrics for each motif: a MAP score (Roth et al., 1998), a group-specificity score (Church score) (Hughes et al., 2000), and the ROC AUC and MNCP metrics (Clarke & Granek, 2003). A first set of filters was used to detect motifs with a potential for regulatory functions (MAP score ≥ 5, ROC AUC ≥ 0.7). This resulted in 46 motifs out of 123 total UCR motifs found in the queen UCR set, and in 71 motifs out of 261 total found in the worker UCR set (Supplementary material, Table S2).

A parametric statistical test (MANOVA, P = 0.0001, Wilks' λ = 0.78, F = 72) and a nonparametric statistical test (Kolmogorov–Smirnov, Table 1) on ROC AUC and MNCP indices showed that these two sets of filtered motifs are significantly different from a randomly selected set of motifs. The rank-order metrics ROC AUC and MNCP have previously been used to compare the association of short regulatory sequence features with gene expression data (microarray analyses on coregulated genes), and they have been useful in flagging false positives erroneously included in 'top-10' sets of differentially expressed genes (Clarke & Granek, 2003).
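The two rank-order metrics can be made concrete with a minimal sketch. The Python below is an illustration written for this text, not the code used in the study. It assumes that a motif has been scored on every UCR, that `pos` holds the scores for the caste gene set and `neg` those for the background set, and it follows a common formulation of MNCP as the mean ratio between a positive's normalized rank among positives and its normalized rank among all sequences. All function names are hypothetical.

```python
def rank_desc(scores):
    """1-based ranks with the highest score ranked first; ties share the average rank."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def roc_auc(pos, neg):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mncp(pos, neg):
    """Mean normalized conditional probability (assumed formulation, see lead-in)."""
    ranks_in_all = rank_desc(pos + neg)[:len(pos)]
    ranks_in_pos = rank_desc(pos)
    n_pos, n_all = len(pos), len(pos) + len(neg)
    ratios = [(rp / n_pos) / (ra / n_all)
              for rp, ra in zip(ranks_in_pos, ranks_in_all)]
    return sum(ratios) / n_pos


# Toy scores only; these are not data from the study.
caste_scores = [0.9, 0.8, 0.4]
background_scores = [0.7, 0.3, 0.2, 0.1]
auc_value = roc_auc(caste_scores, background_scores)
mncp_value = mncp(caste_scores, background_scores)
```

Under this formulation, a motif that does not discriminate between the two sets gives a ROC AUC near 0.5 and an MNCP near 1, whereas enrichment of high scores in the caste set pushes both values upward.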

To select highly specific motifs found in each data set, we used the group-specificity score (Church score ≤ 1e−05; Hughes et al., 2000) to identify the most likely motifs involved in decision making for pathways leading to queen (two motifs, Fig. 3A) or to worker development (12 motifs with Church score ≤ 1e−07, Fig. 3B). As the SSH and DDRT-PCR approaches on caste development can be expected to retrieve only a subpopulation of such genes, these motifs represent only a partial scenario of the transcriptional regulatory network underlying caste development. The motifs can now be used to screen other GLEAN3-predicted genes to compile a candidate list of putatively coregulated genes in caste development that can be submitted to further experimental validation.
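As a hedged illustration of what a group-specificity score measures, the sketch below computes a hypergeometric tail probability: the chance of seeing at least the observed overlap between the genes used to find a motif and the genes whose upstream regions carry its best-scoring sites. This is the general form described for the Church score; the parameter names and the toy numbers are assumptions made for this example, not values from the study.

```python
from math import comb


def group_specificity(n_total, n_group, n_targets, n_overlap):
    """Hypergeometric probability of an overlap of at least n_overlap.

    n_total   -- all genes with an upstream region in the genome-wide set
    n_group   -- genes in the input group used to discover the motif
    n_targets -- genes genome-wide carrying the best-scoring motif sites
    n_overlap -- genes found in both the group and the target list
    """
    upper = min(n_group, n_targets)
    numerator = sum(
        comb(n_group, i) * comb(n_total - n_group, n_targets - i)
        for i in range(n_overlap, upper + 1)
    )
    return numerator / comb(n_total, n_targets)


# Toy example: 10 genes in total, a group of 3, and all 3 are among the 3 targets.
toy_score = group_specificity(10, 3, 3, 3)
```

Smaller values indicate that the motif is more specific to the input group, which is why the thresholds above are expressed as upper bounds.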

Each motif found in UCRs of queen (46) and worker (71) overexpressed genes was compared with the entire set of D. melanogaster cis-regulatory motifs contained in the TRANSFAC database (version 4.0; Wingender et al., 2000). Only alignments passing 80% identity for each position-specific site matrix (PSSM) were considered as significant matches. Whereas none of the most specific motifs for each caste showed similarity to any of the D. melanogaster motifs, some of the more ubiquitous ones did resemble binding sites of transcription factors, such as Antennapedia, Ultrabithorax, zerknüllt, even skipped, trithorax-like, tailless, paired, fushi tarazu and Adh transcription factor 1 (Supplementary material, Table S2).

When we plotted the positions of the two queen and the 12 worker motifs in the UCRs of the caste-specifically expressed genes (Fig. 4), an interesting pattern emerged for the worker-specific motifs. Some of the worker motifs appeared to be clustered and occurring in tandem; furthermore, they were positioned relatively close to the predicted translation start sites in some of the genes that are overexpressed during worker development (annotation results of these genes are listed in Supplementary material, Table S1). A position close to the predicted translation start sites is generally taken as a sign of strong regulatory effect (Davidson, 2001).

Table 1. Kolmogorov–Smirnov analysis of ROC AUC and MNCP metrics for statistical significance of putative regulatory motifs in upstream control regions of genes with queen- or worker-specific expression patterns. These motifs were contrasted with a random set of motifs detected in a random set of UCRs of GLEAN3-predicted honey bee genes.

Group pairs                   ROC AUC     MNCP
Random × (Queen + Worker)     P > 0.1     P < 0.001
Random × Queen                P > 0.1     P < 0.005
Random × Worker               P > 0.1     P < 0.001
Queen × Worker                P < 0.1     P > 0.1
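The Kolmogorov–Smirnov comparison summarized in Table 1 rests on the largest gap between two empirical cumulative distributions. A minimal sketch of that statistic follows; it is an illustration with toy numbers, assumes one metric value (for example MNCP) per motif in each group, and leaves the P value to a statistics package (SciPy's `ks_2samp` is one option).

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov D: maximum distance between the two ECDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    points = sorted(set(a) | set(b))
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in points
    )


# Toy metric values for two groups of motifs (not data from the study).
random_motifs = [1.0, 2.0, 3.0, 4.0]
caste_motifs = [3.0, 4.0, 5.0, 6.0]
d_value = ks_statistic(random_motifs, caste_motifs)
```

D ranges from 0 (identical distributions) to 1 (no overlap at all), and the associated P value depends on D and on both sample sizes.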

As caste development is highly dependent on changes in haemolymph titres of JH and ecdysteroids, we also screened the UCRs of the differentially expressed genes for putative nuclear receptor binding sites. Regulatory elements involved in the JH response are not well understood yet, so any prediction in this direction would be elusive (Wheeler & Nijhout, 2003). Functional ecdysone response elements (EcRE) have, however, been identified, and it is now well established that the EcR/USP complex binds to direct or inverted (palindromic) repeats (Riddihough & Pelham, 1987; Antoniewski et al., 1995; Perera et al., 2005). A PSSM search (Wasserman & Sandelin, 2004) based on a canonical representation (rGkTCAaTGamcy) (Perera et al., 2005) did not reveal any putative EcRE motif in the UCRs of the 51 caste-differentially expressed genes. However, this does not rule out that these genes respond to changes in JH and/or ecdysteroid titres, as these hormones require EcR/USP binding primarily in the expression of early genes, but not necessarily for the late response genes (Li & White, 2003; Sullivan & Thummel, 2003).
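For readers who want to repeat a screen of this kind, the sketch below scans a sequence for exact matches to an IUPAC consensus. It is a deliberately simpler stand-in for the PSSM search described above: every position is treated as a strict IUPAC constraint regardless of letter case, only the forward strand is scanned, and the test sequence is invented for the example.

```python
import re

# IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "K": "[GT]", "M": "[AC]",
    "S": "[CG]", "W": "[AT]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def consensus_to_regex(consensus):
    """Compile a consensus into a lookahead pattern so overlapping sites are found."""
    body = "".join(IUPAC[base] for base in consensus.upper())
    return re.compile("(?=(" + body + "))")


def find_sites(sequence, consensus):
    """Return 0-based start positions of all forward-strand matches."""
    pattern = consensus_to_regex(consensus)
    return [match.start() for match in pattern.finditer(sequence.upper())]


ECRE = "rGkTCAaTGamcy"  # canonical representation quoted in the text
hits = find_sites("TTAGGTCAATGACCTCC", ECRE)  # invented test sequence
```

A scan of the reverse complement would be needed to cover both strands, and a weight matrix would tolerate mismatches that this exact-match version rejects.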

In conclusion, the predictions from such a combined strategy that searches for group-specific and for conserved regulatory motifs in GLEAN3-predicted honey bee genes represent a major transition from nonhypothesis-driven high-throughput screens to hypothesis-driven searches for context-dependent gene expression in honey bees. Such directed search results can serve as a platform for experimental analyses of genome-wide integration in hormonal control of caste development in bees. In addition, this study exemplifies how existing algorithms for detecting shared regulatory motifs can be joined into a toolkit for predicting coregulated gene expression patterns in honey bees. These methods have been shown to be robust and are gaining acceptance for use in functional and comparative genomics (Liu et al., 2004; Pritsker et al., 2004; Zhu et al., 2005).

Figure 3. Putative regulatory motifs and their consensus sequences in UCRs of queen and worker overexpressed genes. Scores for MAP, Church, ROC AUC and MNCP metrics indicate degree of group specificity and significance level.

Oogenesis and reproduction

As caste development sets the stage for reproductive division of labour, genes involved in reproductive processes are strong candidates for functional analyses. In the present study we performed BLAST searches to identify honey bee orthologs for a list of 32 fly genes with the GO attribute 'oogenesis' and for four genes specifically related to 'vitellogenesis'. The list for fly genes involved in nuclear events in germ cells consisted of 20 genes for 'female meiosis', 12 genes for 'recombination' and 21 genes under the heading 'chromosome segregation, including segregation distortion' (Supplementary material, Table S3). In some cases these GO attributes for fly genes, of course, overlapped.

BLASTN and BLASTX searches for these fly genes against the honey bee genome assembly 3.0 and the GLEAN3 Official Set (aa) retrieved statistically well supported putative bee orthologs for most of these candidates. For the genes involved in meiosis, recombination and chromosome segregation this finding, although not unexpected, is of interest, as meiosis in the haploid honey bee drone is strongly modified when compared with a normal diploid meiosis. The first meiotic division is initiated, but the nucleus remains undivided and only the superfluous centrioles are eliminated as cytoplasmic buds (Hoage & Kessel, 1968). An interesting gene, thelytoky (th), has recently been mapped in this context (Lattorff et al., 2005). It almost completely prevents meiotic recombination in the automixis of laying workers of the Cape honey bee. As an indication of the interplay between meiosis and later development, this locus also appears to be an integral part of various gene cascades involved in caste determination (Lattorff et al., 2006).

The fly genes retrieved in the GO searches for 'oogenesis' represent a much larger range of Molecular Function categories, such as transcription factors, proteins regulating translation by RNA binding, RNA helicases, enzymes (ubiquitination, transfer of sugar residues, sulfotransferase), GTPase activity, and several factors binding to cytoskeletal proteins (Supplementary material, Table S3). This wide range of functional categories is expected, as these genes are involved in a series of different steps during oogenesis in the polytrophic meroistic ovary. Oogenesis starts out with the maintenance of germline and somatic stem cell identity in the germline niche in the upper germarium. A key gene involved in this process is pumilio (Forbes & Lehmann, 1998), which is represented by a highly conserved bee ortholog, GB10504-PA. The second step is the formation of germ cell cysts, the determination of an oocyte within each cyst, and the survival of these cysts, involving genes such as benign gonial cell neoplasm (Lin et al., 1994), encore (Hawkins et al., 1996), ovo and ovarian tumour (otu) (Staab & Steinmann-Zwicky, 1995), all well conserved in the honey bee genome. Interestingly, we could not find a clear bee ortholog for bag of marbles (bam), which is one of the prime early response genes in the cystoblast differentiation pathway in Drosophila (McKearin, 1997).

Figure 4. Map of the group-specific motifs found in queen and worker UCRs of caste-specifically expressed genes. The coding region is represented by the GLEAN3 prediction number (assembly 4.0), with arrows indicating the translation start site. Asterisks mark UCRs of the 'top10' set used to find the over-represented motifs.

The third step comprises previtellogenic growth of the follicle, and during these stages a number of maternal factors are deposited and anchored, either within the oocyte or in the perivitelline space, that define the egg and the future embryonic axes (for review see St Johnston & Nüsslein-Volhard, 1992). In the list of Drosophila genes involved in early steps of axis determination, a couple of surprises came up in the search for honey bee orthologs. A big surprise was that we could not find a gurken ortholog in the bee, even though this gene sets up both the anterior–posterior and dorsal–ventral axes in the Drosophila egg (González-Reyes et al., 1995), whereas downstream components of the Gurken signalling cascade appear to be preserved in the bee genome (Honey Bee Genome Sequencing Consortium, 2006). Similar apparent gaps in constituents of patterning cascades were noted for the terminal regions of the embryo, such as a lack of a torso ortholog, whereas its ligand torso-like is represented by a well conserved ortholog in the bee (GB18663-PA).

With respect to genes involved in the final processes of oogenesis, we primarily looked at genes that play a part during vitellogenesis. There are four genes of interest in this class, the primary one coding for the yolk protein precursor vitellogenin. This gene has already been sequenced for the honey bee (Piulachs et al., 2003) and, as expected, it is much more related to vitellogenins of other insects and even vertebrates than to the Drosophila yolk proteins, which apparently are derived from lipases (Hagedorn et al., 1998). The second gene of interest is the bee ortholog to yolkless, as this (GB16571-PA) could represent a putative vitellogenin receptor. The other two Drosophila genes with clear orthologs in the bee are CG18641 and CG12139, which code for a lipase and an LDL receptor, respectively.

General conclusions

The current analysis made use of previous experimental analyses on differential transcription during caste development of honey bee larvae. In the annotation of these genes, which includes references to Gene Ontology terms associated with their respective Drosophila orthologs, two major configurations emerged. First of all, worker genes were better defined in terms of GO attributes compared with the relatively large number of queen genes that had no GO terms associated with their respective Drosophila orthologs. Even when taking into consideration the conceptual limits in attributing GO terms on molecular function and biological process from Drosophila orthologs to honey bee genes, this finding could have a bearing on general basic questions in socioevolution, namely, which caste is more divergent from a nonsocial reproductive female bee prototype or reproductive ground plan, the queen or the worker? Less speculative is the second major conclusion coming out of the GO analysis for Molecular Function, showing and confirming (Eder et al., 1983; Corona et al., 1999) the important role of metabolic regulation in caste development. This facet is demonstrated especially clearly in the caste-specific expression of oxidoreductases (queen) vs. hydrolases (workers).

The honey bee genome information not only provided a much improved annotation platform for caste-specifically expressed ESTs but, even more so, opens the possibility to explore putative regulatory features of the honey bee genome. In the current study we employed modified Gibbs sampling and expectation-maximization algorithms (AlignACE, MDscan, MEME) to detect group-specific motifs in gene regions up to 1000 bp upstream of translation start sites. We detected 14 motifs that were significantly overrepresented in the caste genes when compared with corresponding motifs found in a random set of GLEAN3-predicted honey bee genes. The localization of such motifs in UCRs of worker-overexpressed genes revealed a clustering of such motifs close to the predicted basal promoter regions, suggesting strong regulatory effects. Such search strategies and the detected motifs can provide the lead to reveal and unravel cis-regulatory networks for and within specific contexts of honey bee biology.

Caste polyphenism in social insects makes a strong case for the emergence of novelties at a microevolutionary level (West-Eberhard, 2003). Without the pretension to discuss exhaustively the mechanisms underlying this surge of developmental plasticity, two major themes become apparent in this and other studies. Regulatory change has been demonstrated in the shut-down of wing disc patterning cascades in ants (Abouheif & Wray, 2002) and is certainly also implicit in observed temporal changes in gene expression during postembryonic development of ants and bumble bees (Goodisman et al., 2005; Pereboom et al., 2005). Such change would be expected to involve cis-regulatory elements, that is, change in transcription factor binding sites in UCRs, as approached in this study, and also evolutionary change in response thresholds to circulating morphogenetic hormones (for review see Hartfelder & Emlen, 2005). The second, and quite unexpected, theme is the acquisition of new systemic functions by evolutionarily rather old proteins, such as vitellogenin and hexamerins. These apparently unspectacular proteins have evolved into key players for caste evolution and reproductive division of labour via novel regulatory connectivity with JH (Amdam et al., 2004; Guidugli et al., 2005; Zhou et al., 2006a,b).

Experimental procedures

Selection and annotation of ESTs representing differentially expressed genes in honey bee caste development

The starting point was a set of 164 entries (mainly 3′-ESTs) in GenBank (BG101532–BG101697) from an SSH library (Evans & Wheeler, 1999, 2000). When validated by macroarray analyses, a clustering into three major classes became apparent: (I) genes overexpressed in young larvae, (II) genes overexpressed in last instar queen larvae, and (III) genes overexpressed in last instar worker larvae. For this study we excluded the class I ESTs, because their expression is not caste-specific but rather represents expression differences between young (still bipotent) and older larvae. To the class II queen ESTs (82) we added one complete cDNA entry (AY601642) from a DDRT-PCR screen (Corona et al., 1999), and to the class III set of worker ESTs (40) we added seven GenBank dbEST entries (BG149167–BG149173) from a DDRT-PCR screen on ovary development (Hepperle & Hartfelder, 2001).

The EST sequences were submitted to BLASTN searches (parameters: -G 2 -E 3 -W 15 -F 'm D' -U -e 1e-20) against genome sequence assembly Amel_v3.0 to retrieve matches in linked or unlinked genomic contigs and to exclude no-matches (seven ESTs in queen). ESTs that aligned within the same scaffold were checked for clustering and overlap. This clustering also served to exclude genes that were represented by non-overlapping ESTs from both castes. This procedure generated a set of 51 unique putative gene sequences overexpressed in either queen (34) or in worker larvae (17). These 51 nonredundant sequences were submitted to BLASTX searches against the Official Set of GLEAN3-predicted protein sequences (cut-off value at 1e−20). For ESTs with no significant protein sequence matches, the genomic regions adjacent to the mapped EST were searched to find neighbouring ORFs, especially those nearest to putative 3′ UTRs of predicted proteins, as the EST libraries have a bias in this direction.
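The check for clustering and overlap of ESTs on the same scaffold can be sketched as an interval-merging step. The Python below is an illustrative assumption about how such a step might look, not the procedure actually used; the tuple layout, scaffold names and EST identifiers are invented for the example.

```python
from collections import defaultdict


def cluster_hits(hits):
    """Merge overlapping EST alignments that lie on the same scaffold.

    hits -- iterable of (est_id, scaffold, start, end) tuples
    Returns a list of [scaffold, start, end, [est_ids]] clusters.
    """
    by_scaffold = defaultdict(list)
    for est_id, scaffold, start, end in hits:
        # Normalize coordinates so that start <= end regardless of strand.
        by_scaffold[scaffold].append((min(start, end), max(start, end), est_id))

    clusters = []
    for scaffold in sorted(by_scaffold):
        current = None
        for start, end, est_id in sorted(by_scaffold[scaffold]):
            if current is not None and start <= current[2]:
                # Overlaps the open cluster: extend it and record the EST.
                current[2] = max(current[2], end)
                current[3].append(est_id)
            else:
                current = [scaffold, start, end, [est_id]]
                clusters.append(current)
    return clusters


# Invented alignments for illustration only.
example_hits = [
    ("est1", "scaffoldA", 100, 400),
    ("est2", "scaffoldA", 350, 700),
    ("est3", "scaffoldA", 900, 1200),
    ("est4", "scaffoldB", 10, 200),
]
clusters = cluster_hits(example_hits)
```

Each resulting cluster stands for one putative gene, and a cluster that contains ESTs from both castes can then be flagged for exclusion.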

Official Set protein sequences were aligned against the Amel_v3.0 sequence assembly using TBLASTN to map protein to genome, and subsequently they were aligned using BLASTP against the GenBank nonredundant (nr) and the Flybase protein sequence databases. The manual features annotation procedure of the Artemis 7.0 program (Rutherford et al., 2000) was used to map ORFs, putative splice sites of exons, and ESTs to genome coordinates. The final annotation file was generated with a Python script in GFF format (http://www.sanger.ac.uk/Software/formats/GFF).
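The Python script itself is not reproduced here. As a minimal sketch of what writing one GFF record involves, the function below joins the nine tab-separated columns of the format (sequence name, source, feature, start, end, score, strand, frame, attributes). The function name, the example values and the attribute string are hypothetical.

```python
def gff_line(seqname, source, feature, start, end,
             score=".", strand="+", frame=".", attributes=""):
    """Format one GFF record as nine tab-separated fields."""
    if start > end:
        raise ValueError("GFF requires start <= end")
    fields = [seqname, source, feature, str(start), str(end),
              str(score), strand, str(frame), attributes]
    return "\t".join(fields)


# Invented record for illustration.
record = gff_line("scaffoldA", "manual", "exon", 1200, 1450,
                  strand="-", attributes='gene "example_gene"')
```

Coordinates in GFF are 1-based and inclusive, so values taken from 0-based tools need shifting before they are written.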

Honey bee sequences annotated as orthologs to D. melanogaster genes were putatively assigned the GO terms listed in the respective Flybase entry. In addition, the definition of new GO terms (Biological Process ontology) related to caste development and polyphenism (GO:0048651 and GO:0048650, respectively) was co-ordinated with the Gene Ontology Consortium (Ashburner et al., 2000). The FatiGO web tool (Al-Shahrour et al., 2004) was used to cluster GO terms (level 3 setting) for Biological Process and Molecular Function.

For the detection of conserved domains, the 51 protein sequences were screened against the Pfam database (http://www.sanger.ac.uk/Software/Pfam) using the HMMER platform (current release 2.3.2, http://hmmer.wustl.edu) with a cut-off value set at 1e−10.

Annotation of oogenesis and reproduction genes

In order to identify putative honey bee orthologs to D. melanogaster genes, we searched the following GO terms in Flybase: 'oogenesis' (GO:0009993), 'vitellogenesis' (GO:0007296), 'female meiosis' (GO:0007143), 'DNA recombination' (GO:0006310) and 'chromosome segregation' (GO:0007059). Genes related to segregation distortion were searched for in Flybase in phenotypic descriptions and mutant effects of D. melanogaster genes, as this phenomenon is not represented by a GO term. Hence, this group may be more heterogeneous than the others. From this list we removed genes of pleiotropic function (multifaceted GO entries in Biological Process) and genes that lacked defined transcripts in the Drosophila genome database.

For the GO terms 'oogenesis' and 'vitellogenesis' we performed TBLASTN and BLASTP searches for 42 fruit fly genes against the Amel_v3.0 genome assembly and the GLEAN3-predicted protein sequences (Honey Bee Genome Sequencing Consortium, 2006), respectively. The orthologous D. melanogaster gene was characterized by the same procedure as described above (reciprocal best hit). For the GO terms 'female meiosis', 'DNA recombination', 'chromosome segregation' and the non-GO group 'segregation distortion', transcripts of D. melanogaster were searched against nr databases at NCBI using BLASTP. The obtained sequences were searched against the Amel_v3.0 genome assembly and the GLEAN3-predicted protein sequences using TBLASTN. Homologous sequences (threshold 1e−10) were predicted using the BioEdit software (Hall, 1999). ORFs showing significant homology (BLASTP threshold 1e−20) were assembled and used in BLASTP searches against the nr databases at NCBI.
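The reciprocal-best-hit criterion can be sketched as follows, assuming that each BLAST run has already been reduced to (query, subject, e-value) tuples. The function names, the default e-value cut-off and all e-values in the example are illustrative assumptions. GB99999 and GB11111 are invented identifiers, whereas the pumilio/GB10504 pair is the ortholog named in this paper.

```python
def best_hits(rows):
    """Map each query to its subject with the lowest e-value."""
    best = {}
    for query, subject, evalue in rows:
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(fly_vs_bee, bee_vs_fly, max_evalue=1e-10):
    """Keep fly/bee pairs that are each other's best hit below the cut-off."""
    fwd = best_hits(r for r in fly_vs_bee if r[2] <= max_evalue)
    rev = best_hits(r for r in bee_vs_fly if r[2] <= max_evalue)
    return {fly: bee for fly, bee in fwd.items() if rev.get(bee) == fly}


# Made-up e-values for illustration only.
fly_vs_bee = [("pum", "GB10504", 1e-150),
              ("pum", "GB99999", 1e-30),
              ("bam", "GB11111", 1e-3)]
bee_vs_fly = [("GB10504", "pum", 1e-140),
              ("GB99999", "nos", 1e-40)]
orthologs = reciprocal_best_hits(fly_vs_bee, bee_vs_fly)
```

In this toy input the bam query has only a weak hit, so it is reported as lacking a clear ortholog rather than being paired with its nearest match.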

Motif search in upstream regions in caste-specifically expressed genes

In order to detect overrepresented motifs in the upstream control regions (UCRs) of the two sets of caste-related genes, we selected gene subsets based on two criteria: (1) those that had shown the highest caste-specificity in the array analyses (Evans & Wheeler, 2000), and (2) those that had a conserved 5′ region when compared with the Drosophila orthologs. These 'top10' genes consisted of six queen genes (GB13072, GB11628, GB19380, GB14798, GB16047 and GB18242) and of six worker genes (GB10869, GB12371, GB12239, GB10428, GB19006, GB14758). The motif search was conducted separately on the two sets of UCR sequences using three methods: AlignACE (Roth et al., 1998), MEME (Bailey & Elkan, 1995) and MDscan (Liu et al., 2002). Default parameter values were used in all searches, except that GC content in intergenic regions was set to 25%, representing the background value established for the honey bee UCR database generated in this study.


The database, containing 10156 UCR sequences, was generated by parsing the Official Set annotation file (downloaded in GFF format from http://www.beegenome.hgsc.bcm.tmc.edu/beeftp.html) to extract upstream regions starting from the terminal 5′-genomic coordinate of each predicted CDS. The UCRs were arbitrarily set to a size frame of 1000 nucleotides (Roth et al., 1998), but were trimmed whenever another predicted ORF was detected in any of these regions.
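The extraction and trimming rule can be sketched on plain coordinates. This is a minimal illustration, not the parser used for the database: the tuple layout and function name are assumptions, gene models are reduced to a single CDS span, and scaffold ends are not checked for minus-strand genes because scaffold lengths are not passed in.

```python
def upstream_regions(genes, size=1000):
    """Compute trimmed upstream windows.

    genes: list of (gene_id, scaffold, start, end, strand), 1-based inclusive
    (hypothetical layout). Returns {gene_id: (ucr_start, ucr_end)}; genes whose
    window is fully occupied by a neighbour are omitted.
    """
    ucrs = {}
    for gene_id, scaffold, start, end, strand in genes:
        if strand == "+":
            lo, hi = max(1, start - size), start - 1
        else:
            lo, hi = end + 1, end + size
        for other_id, other_scaffold, o_start, o_end, _ in genes:
            if other_id == gene_id or other_scaffold != scaffold:
                continue
            if o_end < lo or o_start > hi:
                continue  # neighbour does not touch the window
            if strand == "+":
                lo = max(lo, o_end + 1)    # trim the distal (left) side
            else:
                hi = min(hi, o_start - 1)  # trim the distal (right) side
        if lo <= hi:
            ucrs[gene_id] = (lo, hi)
    return ucrs


# Made-up gene models on one scaffold.
genes = [("A", "s1", 2000, 3000, "+"),
         ("B", "s1", 1200, 1500, "+"),
         ("C", "s1", 5000, 6000, "-"),
         ("D", "s1", 6400, 7000, "+")]
ucrs = upstream_regions(genes)
```

Gene A loses the part of its window that lies beyond gene B, while genes C and D, which face each other, end up sharing the same intergenic stretch.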

The MAP (maximum a priori log likelihood) score, group specificity score (called Church score in this manuscript) (Hughes et al., 2000), ROC AUC (area under the curve for a receiver-operator characteristic plot) metric and MNCP (mean normalized conditional probability) metric (Clarke & Granek, 2003) were used to detect motifs that most likely correspond to biologically significant cis-regulatory elements. The filters run on the UCRs of the subsets of queen and worker genes were a MAP score cut-off value of 5.0, followed by a ROC AUC cut-off at 0.7, followed by a group specificity score cut-off at 1e−05. The UCR database of all GLEAN3-predicted honey bee genes was used as the background to calculate these metrics.
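The filter cascade and the ROC AUC metric can be sketched as below. The AUC is computed in its rank-based form, that is, the fraction of (group gene, background gene) pairs in which the group gene has the higher motif score, with ties counted as one half. The dictionary keys and function names are assumptions; the published analysis relied on the TAMO package rather than on this code.

```python
def roc_auc(pos_scores, neg_scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def passes_filters(motif, min_map=5.0, min_auc=0.7, max_church=1e-5):
    """Apply the three cut-offs in the order given in the text."""
    return (motif["map"] >= min_map
            and motif["roc_auc"] >= min_auc
            and motif["church"] <= max_church)


# Made-up motif records for illustration.
motifs = [
    {"name": "m1", "map": 12.3, "roc_auc": 0.81, "church": 1e-7},
    {"name": "m2", "map": 4.0, "roc_auc": 0.90, "church": 1e-9},
    {"name": "m3", "map": 8.5, "roc_auc": 0.75, "church": 1e-3},
]
kept = [m["name"] for m in motifs if passes_filters(m)]
```

The pairwise loop is quadratic in the number of genes, which is acceptable for a sketch; a rank-sum formulation would scale better against a background of about 10 000 UCRs.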

A parametric test (MANOVA) and a nonparametric test (Kolmogorov–Smirnov) were conducted to identify significance levels for the two sets of filtered motifs found in the UCRs of caste-specifically expressed genes against filtered motifs found in the random UCR set. Random motifs were sampled from a motif database (10 391 motifs) generated by running our script 100 times with random sets of UCR sequences.
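The two-sample Kolmogorov–Smirnov statistic is the largest vertical distance between the empirical cumulative distribution functions of the two samples. The sketch below computes only that statistic, for example on the ROC AUC values of caste motifs versus random motifs; the function name is an assumption, and a P value would in practice come from a statistics package (scipy.stats.ks_2samp is one option).

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Maximum distance between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    for v in sorted(set(a) | set(b)):
        cdf_a = bisect_right(a, v) / len(a)  # fraction of a that is <= v
        cdf_b = bisect_right(b, v) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


# Made-up values: partially overlapping samples.
d_example = ks_statistic([1, 2, 3, 4], [3, 4, 5, 6])
```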

The main criterion for identifying known regulatory motifs among these caste-specific ones was the alignment of the PSSM for each bee motif with the Drosophila melanogaster sequences in the TRANSFAC database (release 4.0) (Wingender et al., 2000). Only the alignments passing a threshold of 80% identity for each PSSM were considered as significant matches. In addition, we checked for a specific binding motif, the EcR/USP motif (rGkTCAaTGamcy-3′), known to function in the expression of genes responding to morphogenetic hormone titres (Perera et al., 2005).
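A consensus written in IUPAC ambiguity codes, such as the EcR/USP motif above, can be turned into a regular expression and scanned on both strands. This is a simplified exact-consensus scan, assumed here only for illustration; the study used a PSSM-based search, which also tolerates weighted mismatches. The function names and the test sequence are made up.

```python
import re

# IUPAC nucleotide codes needed here, plus a few common extras.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "Y": "[CT]", "K": "[GT]", "M": "[AC]",
         "S": "[CG]", "W": "[AT]", "N": "[ACGT]"}


def iupac_to_regex(motif):
    """Translate an IUPAC consensus into a regular expression."""
    return "".join(IUPAC[base] for base in motif.upper())


def revcomp(seq):
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_motif(seq, motif="rGkTCAaTGamcy"):
    """Return sorted (0-based start on the given strand, strand) hits."""
    # A lookahead lets overlapping occurrences be reported.
    pattern = re.compile("(?=(" + iupac_to_regex(motif) + "))")
    seq = seq.upper()
    hits = [(m.start(), "+") for m in pattern.finditer(seq)]
    rc = revcomp(seq)
    hits += [(len(seq) - m.start() - len(motif), "-")
             for m in pattern.finditer(rc)]
    return sorted(hits)


# Made-up sequence carrying one consensus match at offset 2.
test_seq = "TTAGGTCAATGACCCTT"
forward_hits = find_motif(test_seq)
reverse_hits = find_motif(revcomp(test_seq))
```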

Operating system and programming tools

An Ubuntu Linux (version Breezy) operating system was used to implement all scripts and pipelines designed for annotation procedures and motif discovery. The Python programming language (http://www.python.org), Biopython (http://www.biopython.org) and TAMO (Tools for Analysis of Motifs) packages (Gordon et al., 2005) were used in program design. Other web applications were built using the Zope application server (http://www.zope.org) hosted at http://zulu.fmrp.usp.br/beelab.

Acknowledgements

This work was supported by grants from FAPESP (Fundação de Amparo a Pesquisa do Estado de São Paulo, 9900719-6) and CNPq (Conselho Nacional de Desenvolvimento Científico e Tecnológico, 4729632-3-1).

References

Abouheif, E. and Wray, G.A. (2002) Evolution of the genetic network underlying wing polyphenism in ants. Science 297: 249–252.

Al-Sharour, F., Diaz-Uriarte, R. and Dopazo, J. (2004) FatiGO: a web tool for finding significant associations of Gene Ontology terms with groups of genes. Bioinformatics 20: 578–580.

Amdam, G.V., Norberg, K., Hagen, A. and Omholt, S.W. (2003) Social exploitation of vitellogenin. Proc Natl Acad Sci USA 100: 1799–1802.

Amdam, G.V., Simões, Z.L.P., Hagen, A., Norberg, K., Schroder, K., Mikkelsen, O. et al. (2004) Hormonal control of the yolk precursor vitellogenin regulates immune function and longevity in honeybees. Exp Gerontol 39: 767–773.

Antoniewski, C., O'Grady, M.S., Edmondson, R.G., Lassieur, S.M. and Benes, H. (1995) Characterization of an EcR/USP heterodimer target site that mediates ecdysone responsiveness of the Drosophila Lsp-2 gene. Mol Gen Genet 249: 545–556.

Arbeitsman, M.N., Furlong, E.E.M., Imam, F., Johnson, E., Null, B.H., Baker, B.S. et al. (2002) Gene expression during the life cycle of Drosophila melanogaster. Science 297: 2270–2275.

Ashburner, M., Ball, C.A., Blake, J.A., Botstein, D., Butler, H., Cherry, J.M. et al. (2000) Gene Ontology: tool for the unification of biology. Nature Genet 25: 25–29.

Bailey, T.L. and Elkan, D. (1995) Unsupervised learning of multiple motifs in biopolymers using expectation maximization. Machine Learning J 21: 58–83.

Bergem, M., Norberg, K. and Aamodt, R.M. (2006) Long-term maintenance of in vitro cultured honeybee (Apis mellifera) embryonic cells. BMC Dev Biol 6: 17 (online).

Boutros, M., Kiger, A.A., Armknecht, S., Kerr, K., Hild, M., Koch, B. et al. (2004) Genome-wide RNAi analysis of growth and viability in Drosophila cells. Science 303: 832–835.

Carsten, L.D., Watts, T. and Markov, T.A. (2005) Gene expression patterns accompanying a dietary switch in Drosophila melanogaster. Mol Ecol 14: 3203–3208.

Clarke, N.D. and Granek, J.A. (2003) Rank order metrics for quantifying the association of sequence features with gene regulation. Bioinformatics 19: 212–218.

Colombani, J., Bianchini, L., Layalle, S., Pondeville, E., Dauphin-Villemant, C., Antoniewski, C. et al. (2005) Antagonistic actions of ecdysone and insulins determine final size in Drosophila. Science 310: 667–670.

Corona, M., Estrada, E. and Zurita, M. (1999) Differential expression of mitochondrial genes between queens and workers during caste determination in the honeybee Apis mellifera. J Exp Biol 202: 929–938.

Cunha, A.C., Nascimento, A.M., Guidugli, K.R., Simões, Z.L.P. and Bitondi, M.M.G. (2005) Molecular cloning and expression of a hexamerin cDNA from the honey bee. J Insect Physiol 51: 1135–1147.

Darwin, C.R. (1859) On the Origin of Species by Means of Natural Selection. John Murray, London.

Davidson, E.H. (2001) Genomic Regulatory Systems: Development and Evolution. Academic Press, San Diego.

Eder, J., Kremer, J.P. and Rembold, H. (1983) Correlation of cytochrome c titer and respiration in Apis mellifera: adaptive responses to caste determination defines workers, intercastes and queens. Comp Biochem Physiol B 76: 703–716.

Engels, W. (1974) Occurrence and significance of vitellogenins in female castes of social Hymenoptera. Am Zool 14: 1229–1237.

Evans, J.D. and Wheeler, D.E. (1999) Differential gene expression between developing queens and workers in the honey bee, Apis mellifera. Proc Natl Acad Sci USA 96: 5575–5580.


Evans, J.D. and Wheeler, D.E. (2000) Expression profiles during honeybee caste determination. Genome Biol 2: 1, 6 pages, online http://genomebiology.com/2000/2/1/research/0001.1

Forbes, A. and Lehmann, R. (1998) Nanos and Pumilio have critical roles in the development and function of Drosophila germline stem cells. Development 125: 679–690.

González-Reyes, A., Elliot, H. and St Johnston, D. (1995) Polarization of both major body axes in Drosophila by gurken-torpedo signalling. Nature 375: 654–658.

Goodisman, M.A., Isoe, J., Wheeler, D.E. and Wells, M.A. (2005) Evolution of insect metamorphosis: a microarray-based study of larval and adult gene expression in the ant Camponotus festinatus. Evolution 59: 858–870.

Gordon, D.B., Nekludova, L., McCallum, S. and Fraenkel, E. (2005) TAMO: a flexible, object-oriented framework for analysing transcriptional regulation using DNA-sequence motifs. Bioinformatics 21: 3164–3165.

Guidugli, K.R., Hepperle, C. and Hartfelder, K. (2004) A member of the short-chain dehydrogenase/reductase (SDR) superfamily is a target of the ecdysone response in honey bee (Apis mellifera) caste development. Apidologie 35: 37–47.

Guidugli, K.R., Nascimento, A.M., Amdam, G.V., Barchuk, A.R., Omholt, S.W., Simões, Z.L.P. and Hartfelder, K. (2005) Vitellogenin regulates hormonal dynamics in the worker caste of a eusocial insect. FEBS Lett 579: 4961–4965.

Hagedorn, H.H., Maddison, D.R. and Tu, Z. (1998) The evolution of vitellogenins, cyclorrhaphan yolk proteins and related molecules. Adv Insect Physiol 27: 335–384.

Hall, T.A. (1999) BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucleic Acids Symp Ser 41: 95–98.

Hamilton, W.D. (1964) The genetical theory of social behaviour, I & II. J Theor Biol 7: 1–16, 17–52.

Hartfelder, K. and Emlen, D.J. (2005) Endocrine control of insect polyphenism. In Comprehensive Insect Molecular Science (Gilbert, L.I., Iatrou, K. and Gill, S., eds), Vol. 3, pp. 651–703. Elsevier, Oxford.

Hartfelder, K. and Engels, W. (1998) Social insect polymorphism: hormonal regulation of plasticity in development and reproduction in the honeybee. Curr Topics Dev Biol 40: 45–77.

Hawkins, N.C., Thorpe, J. and Schüpbach, T. (1996) encore, a gene required for the regulation of germ line mitosis and oocyte differentiation during Drosophila oogenesis. Development 122: 281–290.

Haydak, M.H. (1970) Honey bee nutrition. Annu Rev Entomol 15: 143–156.

Hepperle, C. and Hartfelder, K. (2001) Differentially expressed regulatory genes in honey bee caste development. Naturwissenschaften 88: 113–116.

Hoage, T.R. and Kessel, R.G. (1968) An electron microscopic study of the process of differentiation during spermatogenesis in the drone honey bee (Apis mellifera L.) with special reference to centriole replication and elimination. J Ultrastruct Res 24: 6–32.

Honey Bee Genome Sequencing Consortium (2006) Insights into social insects from the genome of the honeybee Apis mellifera. Nature (in press).

Hughes, J.D., Estep, P.W., Tavazoie, S. and Church, G.M. (2000) Computational identification of cis-regulatory elements associated with groups of functionally related genes in Saccharomyces cerevisiae. J Mol Biol 296: 1205–1214.

Hunt, G.J. and Page, R.E. (1995) Linkage map of the honey bee, Apis mellifera, based on RAPD markers. Genetics 139: 1371–1382.

Hunt, J.H., Buck, N.A. and Wheeler, D.E. (2003) Storage proteins in vespid wasps: characterization, developmental pattern, and occurrence in adults. J Insect Physiol 49: 785–794.

Judice, C.C., Carazzole, M.F., Festa, F., Sogayar, M.C., Hartfelder, K. and Pereira, G.A.G. (2006) Gene expression profiles underlying alternative caste phenotypes in a highly eusocial bee, Melipona quadrifasciata. Insect Mol Biol 15: 33–44.

Kerr, W.E. (1950) Genetic determination of castes in the genus Melipona. Genetics 35: 143–152.

Kropácová, S. and Haslbachová, H. (1971) The influence of queenlessness and of unsealed brood on the development of ovaries in worker honeybees. J Apicult Res 10: 57–61.

Lattorff, H.M.G., Moritz, R.F.A. and Fuchs, S. (2005) A single gene determines thelytokous parthenogenesis in honey bee workers (Apis mellifera capensis). Heredity 94: 533–537.

Lattorff, H.M.G., Moritz, R.F.A., Solignac, M. and Crewe, R.M. (2006) Control of reproductive dominance by the thelytoky gene in honeybees. Biol Lett (in press).

Li, T.R. and White, K.P. (2003) Tissue-specific gene expression and ecdysone-regulated genomic networks in Drosophila. Dev Cell 5: 59–72.

Lin, H., Yue, L. and Spradling, A.C. (1994) The Drosophila fusome, a germline-specific organelle, contains membrane skeletal proteins and functions in cyst formation. Development 120: 947–956.

Liu, X.S., Brutlag, D.L. and Liu, J.S. (2002) An algorithm for finding protein-DNA binding sites with applications to chromatin-immunoprecipitation microarray experiments. Nature Biotechnol 20: 835–839.

Liu, Y., Liu, S., Wei, L., Altman, R.B. and Batzoglou, S. (2004) Eukaryotic regulatory element conservation analysis and identification using comparative genomics. Genome Res 14: 354–366.

Martinez, T., Burmester, T., Veenstra, J.A. and Wheeler, D.E. (2001) Sequence and evolution of a hexamerin from the ant Camponotus festinatus. Insect Mol Biol 9: 427–431.

McKearin, D. (1997) The Drosophila fusome, organelle biogenesis and germ cell differentiation: if you build it… Bioessays 19: 147–152.

Mirth, C., Truman, J.W. and Riddiford, L.M. (2005) The role of the prothoracic gland in determining critical weight for metamorphosis in Drosophila melanogaster. Curr Biol 15: 1–12.

Moritz, R.F.A., Kryger, P. and Allsopp, M.H. (1996) Competition for royalty in bees. Nature 384: 31.

Page, R.E. and Erickson, E.H. (1988) Reproduction by worker honey bees (Apis mellifera L.). Behav Ecol Sociobiol 23: 117–126.

Pereboom, J.J.M., Jordan, W.C., Sumner, S., Hammond, R.L. and Bourke, A.F.G. (2005) Differential gene expression in queen-worker caste determination in bumble bees. Proc R Soc B 272: 1145–1152.

Perera, S.C., Zheng, S., Feng, Q.L., Krell, P.J., Retnakaran, A. and Palli, S.R. (2005) Heterodimerization of ecdysone receptor and ultraspiracle on symmetric and asymmetric response elements. Arch Insect Biochem Physiol 60: 55–70.

Piulachs, M.D., Guidugli, K.R., Barchuk, A.R., Cruz, J., Simões, Z.L.P. and Bellés, X. (2003) The vitellogenin of the honeybee, Apis mellifera. Structural analysis of the cDNA and expression studies. Insect Biochem Mol Biol 33: 459–465.

Pritsker, M., Liu, Y.C., Beer, M.A. and Tavazoie, S. (2004) Whole genome discovery of transcription factor binding sites by network-level conservation. Genome Res 14: 99–108.

Rachinsky, A., Strambi, C., Strambi, A. and Hartfelder, K. (1990) Caste and metamorphosis – hemolymph titers of juvenile hormone and ecdysteroids in last instar honeybee larvae. Gen Comp Endocr 79: 31–38.

Riddihough, G. and Pelham, H.R.B. (1987) An ecdysone response element in the Drosophila hsp27 promoter. EMBO J 6: 3729–3734.

Roth, F.P., Hughes, J.D., Estep, P.W. and Church, G.M. (1998) Finding DNA regulatory motifs within unaligned noncoding sequences clustered by whole-genome mRNA quantification. Nature Biotechnol 16: 939–945.

Rutherford, K., Parkhill, J., Crook, J., Horsnell, T., Rice, P., Rajandream, M.A. and Barrell, B. (2000) Artemis: sequence visualisation and annotation. Bioinformatics 16: 944–945.

Schmidt-Capella, I.C. and Hartfelder, K. (1998) Juvenile hormone effect on DNA synthesis and apoptosis in caste-specific differentiation of the larval honey bee (Apis mellifera L.) ovary. J Insect Physiol 44: 385–391.

Snodgrass, R.E. (1956) Anatomy of the Honey Bee. Cornell University Press, Ithaca.

Solignac, M., Vautrin, D., Baudry, E., Mougel, F., Loiseau, A. and Cornuet, J.M. (2004) A microsatellite-based linkage map of the honeybee, Apis mellifera L. Genetics 167: 253–262.

St Johnston, D. and Nüsslein-Volhard, C. (1992) The origin of pattern and polarity in the Drosophila embryo. Cell 68: 201–219.

Staab, S. and Steinmann-Zwicky, M. (1995) Female germ cells of Drosophila require zygotic ovo and otu product for survival in larvae and pupae, respectively. Mech Dev 54: 205–210.

Sullivan, A.A. and Thummel, C.S. (2003) Temporal profiles of nuclear receptor gene expression profiles reveal coordinate transcriptional responses during Drosophila development. Mol Endocrinol 17: 2125–2137.

Wassermann, W.W. and Sandelin, A. (2004) Applied bioinformatics for the identification of regulatory elements. Nature Rev Genet 5: 267–287.

West-Eberhard, M.J. (2003) Developmental Plasticity and Evolution. Oxford University Press, Oxford.

Wheeler, D.E. and Nijhout, H.F. (2003) A perspective for understanding the modes of juvenile hormone as a lipid signaling system. Bioessays 25: 994–1001.

Wilson, E.O. (1971) The Insect Societies. Belknap Press of Harvard University Press, Cambridge, MA.

Wingender, E., Chen, X., Hehl, R., Karas, H., Liebich, I., Matys, V. et al. (2000) TRANSFAC: an integrated system for gene expression regulation. Nucleic Acids Res 28: 316–319.

Zhou, X., Oi, F.M. and Scharf, M.E. (2006a) Social exploitation of hexamerin: RNAi reveals a major caste-regulatory factor in termites. Proc Natl Acad Sci USA 103: 4499–4504.

Zhou, X., Tarver, M.R., Bennett, G.W., Oi, F.M. and Scharf, M.E. (2006b) Two hexamerin genes from the termite Reticulitermes flavipes: sequence, expression, and proposed functions in caste regulation. Gene 376: 47–58.

Zhu, Z., Shendure, J. and Church, G.M. (2005) Discovering functional transcription factor combinations in the human cell cycle. Genome Res 15: 848–855.

Supplementary material

The following material is available for this article online:

Table S1. Annotation results of caste-specifically expressed honey bee genes.

Table S2. Caste-specific motifs in UCRs of honey bee genes.

Table S3. Annotation of honey bee orthologs to Drosophila melanogaster genes involved in reproduction.

This material is available as part of the online article from http://www.blackwell-synergy.com


the large-scale RNAi assays established for Drosophila (Boutros et al., 2004), but the development of cell culture approaches in the honey bee (Bergem et al., 2006) represents a step in this direction.

Alternatively, regulatory functional associations between genes and their integration into networks can be inferred from the presence of response elements in upstream control regions. In our analysis of differentially expressed genes in queen-worker development we took a bioinformatics approach for a first look into the molecular architecture of a developmental polyphenism.

Motif search in upstream regions of differentially expressed genes

The genes related to caste development are among the first honey bee genes for which experimentally validated expression data were generated (Corona et al., 1999; Evans & Wheeler, 1999, 2000; Hepperle & Hartfelder, 2001; Guidugli et al., 2004). Certainly, these 51 genes do not comprise all the genes involved in caste development, but they are expected to be prominent players, as they were the ones that stood out in the SSH and DDRT-PCR approaches. The 51 caste genes do not represent gene families, but rather fall into many very different molecular function categories. This made us ask whether the observed overexpression pattern of different genes in either queen or worker larvae may be associated with the occurrence of specific regulatory motifs in the upstream control regions (UCR) of these genes.

Three different algorithms, AlignACE (Roth et al., 1998), MEME (Bailey & Elkan, 1995) and MDscan (Liu et al., 2002), were used to construct a pipeline for detecting overrepresented motifs in the two unaligned sets of UCR sequences for the caste-specifically expressed genes. This pipeline was run on a 'top-10' set of 12 genes (six for each caste), which showed the most pronounced caste differences in expression (Evans & Wheeler, 2000), and also on a randomly selected set of UCRs (background control). We calculated four different metrics for each motif: MAP score (Roth et al., 1998), a group-specificity score (Church score) (Hughes et al., 2000), and a ROC AUC and MNCP metric (Clarke & Granek, 2003). A first set of filters was used to detect motifs with a potential for regulatory functions (MAP score ≥ 5, ROC AUC ≥ 0.7). This resulted in 46 motifs out of 123 total UCR motifs found in the queen UCR set, and in 71 motifs out of 261 total found in the worker UCR set (Supplementary material, Table 2S).

A parametric statistical test (MANOVA; P = 0.0001, Wilks' = 0.78, F = 72) and a nonparametric statistical test (Kolmogorov–Smirnov, Table 1) on ROC AUC and MNCP indices showed that these two sets of filtered motifs are significantly different from a randomly selected set of motifs. The rank-order metrics ROC AUC and MNCP have previously been used to compare the association of short regulatory sequence features with gene expression data (microarray analyses on coregulated genes), and they have been useful in flagging false positives erroneously included in 'top-10' sets of differentially expressed genes (Clarke & Granek, 2003).

To select highly specific motifs found in each data set, we used the group-specificity score (Church score ≤ 1e−05; Hughes et al., 2000) to identify the most likely motifs involved in decision making for pathways leading to queen (two motifs, Fig. 3A) or to worker development (12 motifs with Church score ≤ 1e−07, Fig. 3B). As the SSH and DDRT-PCR approaches on caste development can be expected to retrieve only a subpopulation of such genes, these motifs represent only a partial scenario of the transcriptional regulatory network underlying caste development. The motifs can now be used to screen other GLEAN3-predicted genes to integrate a candidate list of putatively coregulated genes in caste development that can be submitted to further experimental validation.
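The group-specificity score can be read as a hypergeometric tail probability: the chance of finding at least the observed number of group genes among the genes with the best-scoring motif sites, given the size of the whole UCR set. The sketch below assumes that definition, which is our reading of Hughes et al. (2000) rather than a transcription of the scripts used here; the argument names and the example counts (four of six group genes among 100 top-scoring UCRs) are invented for illustration.

```python
from math import comb


def group_specificity(n_total, n_group, n_targets, n_overlap):
    """Hypergeometric probability of an overlap of at least n_overlap.

    n_total:   UCRs in the background set
    n_group:   genes used to find the motif
    n_targets: genes with the best-scoring sites genome-wide
    n_overlap: group genes found among those targets
    """
    upper = min(n_group, n_targets)
    tail = sum(comb(n_group, i) * comb(n_total - n_group, n_targets - i)
               for i in range(n_overlap, upper + 1))
    return tail / comb(n_total, n_targets)


# Invented counts against the 10156 UCRs of the background database.
score = group_specificity(10156, 6, 100, 4)
```

Smaller values indicate that the motif is more specific to the gene group, which is why the cut-offs quoted above are upper bounds.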

Each motif found in UCRs of queen (46) and worker (71) overexpressed genes was compared with the entire set of D. melanogaster cis-regulatory motifs contained in the TRANSFAC database (version 4.0; Wingender et al., 2000). Only alignments passing 80% identity for each position-specific site matrix (PSSM) were considered as significant matches. Whereas none of the most specific motifs for each caste showed similarity to any of the D. melanogaster motifs, some of the more ubiquitous ones did resemble binding sites of transcription factors, such as Antennapedia, Ultrabithorax, zerknüllt, even skipped, trithorax-like, tailless, paired, fushi tarazu and Adh transcription factor 1 (Supplementary material, Table 2S).

When we plotted the positions of the two queen and the 12 worker motifs in the UCRs of the caste-specifically expressed genes (Fig. 4), an interesting pattern emerged for the worker-specific motifs. Some of the worker motifs appeared to be clustered and occurring in tandem; furthermore, they were positioned relatively close to the predicted translation start sites in some of the genes that are overexpressed during worker development (annotation results of these genes are listed in Supplementary material, Table 1S). A position close to the predicted translation start sites is generally taken as a sign of strong regulatory effect (Davidson, 2001).

Table 1. Kolmogorov–Smirnov analysis of ROC AUC and MNCP metric for statistical significance of putative regulatory motifs in upstream control regions of genes with queen or worker-specific expression patterns. These motifs were contrasted with a random set of motifs detected in a random set of UCRs of GLEAN3-predicted honey bee genes.

Group pairs                  ROC AUC    MNCP
Random × (Queen + Worker)    P > 0.1    P < 0.001
Random × Queen               P > 0.1    P < 0.005
Random × Worker              P > 0.1    P < 0.001
Queen × Worker               P < 0.1    P > 0.1

As caste development is highly dependent on changes in haemolymph titres of JH and ecdysteroids, we also screened the UCRs of the differentially expressed genes for putative nuclear receptor binding sites. Regulatory elements involved in the JH response are not well understood yet, so any prediction in this direction would be elusive (Wheeler & Nijhout, 2003). Functional ecdysone response elements (EcRE) have, however, been identified, and it is now well established that the EcR/USP complex binds to direct or inverted (palindromic) repeats (Riddihough & Pelham, 1987; Antoniewski et al., 1995; Perera et al., 2005). A PSSM search (Wassermann & Sandelin, 2004) based on a canonical representation (rGkTCAaTGamcy) (Perera et al., 2005) did not reveal any putative EcRE motif in the UCRs of the 51 caste-differentially expressed genes. However, this does not rule out that these genes respond to changes in JH and/or ecdysteroid titres, as these hormones require EcR/USP binding primarily in the expression of early genes, but not necessarily for the late response genes (Li & White, 2003; Sullivan & Thummel, 2003).

In conclusion, the predictions from such a combined strategy that searches for group-specific and for conserved regulatory motifs in GLEAN3-predicted honey bee genes represent a major transition from nonhypothesis-driven high-throughput screens to hypothesis-driven searches for context-dependent gene expression in honey bees. Such directed search results can serve as a platform for experimental analyses of genome-wide integration in hormonal control of caste development in bees. In addition, this study exemplifies how existent algorithms for detecting shared regulatory motifs can be joined into a toolkit for predicting coregulated gene expression patterns in honey bees. These methods have been shown to be robust and are gaining acceptance for use in functional and comparative genomics (Liu et al., 2004; Pritsker et al., 2004; Zhu et al., 2005).

Figure 3. Putative regulatory motifs and their consensus sequences in UCRs of queen and worker overexpressed genes. Scores for MAP, Church, ROC AUC and MNCP metrics indicate degree of group specificity and significance level.

Oogenesis and reproduction

As caste development sets the stage for reproductive division of labour, genes involved in reproductive processes are strong candidates for functional analyses. In the present study we performed BLAST searches to identify honey bee orthologs for a list of 32 fly genes with the GO attribute 'oogenesis', and for four genes specifically related to 'vitellogenesis'. The list for fly genes involved in nuclear events in germ cells consisted of 20 genes for 'female meiosis', 12 genes for 'recombination' and 21 genes under the heading 'chromosome segregation, including segregation distortion' (Supplementary material, Table 3S). In some cases these GO attributes for fly genes, of course, overlapped.

BLASTN and BLASTX searches for these fly genes against the honey bee genome assembly 3.0 and the GLEAN3 Official Set (aa) retrieved statistically well supported putative bee orthologs for most of these candidates. For the genes involved in meiosis, recombination and chromosome segregation this finding, although not unexpected, is of interest, as meiosis in the haploid honey bee drone is strongly modified when compared with a normal diploid meiosis. The first meiosis is initiated, but the nucleus remains undivided and only the superfluous centrioles are eliminated as cytoplasmatic buds (Hoage & Kessel, 1968). An interesting gene, thelytoky (th), has recently been mapped in this context (Lattorff et al., 2005). It prevents almost completely meiotic recombination in the automixis of laying workers of the Cape honey bee. As an indication of the interplay between meiosis and later development, this locus also appears to be an integral part of various gene cascades involved in caste determination (Lattorff et al., 2006).

The fly genes retrieved in the GO searches for 'oogenesis' represent a much larger range of Molecular Function categories, such as transcription factors, proteins regulating translation by RNA binding, RNA helicases, enzymes (ubiquitination, transfer of sugar residues, sulfotransferase), GTPase activity, and several factors binding to cytoskeletal proteins (Supplementary material, Table 3S). This wide range of functional categories is expected, as these genes are involved in a series of different steps during oogenesis in the polytrophic meroistic ovary. Oogenesis starts out with the maintenance of germline and somatic stem cell identity in the germline niche in the upper germarium. A key gene involved in this process is pumilio (Forbes & Lehmann, 1998), which is represented by a highly conserved bee ortholog, GB10504-PA. The second step is the formation of germ cell cysts, the determination of an oocyte within each cyst, and the survival of these cysts, involving genes such as benign gonial cell neoplasm (Lin et al., 1994), encore (Hawkins et al., 1996), ovo and ovarian tumour (otu) (Staab & Steinmann-Zwicky, 1995), all well conserved in the honey bee genome. Interestingly, we could not find a clear bee ortholog for bag of marbles (bam), which is one of the prime early response genes in the cystoblast differentiation pathway in Drosophila (McKearin, 1997).

Figure 4. Map of the group-specific motifs found in queen and worker UCRs of caste-specifically expressed genes. The coding region is represented by the GLEAN3 prediction number (assembly 4.0), with arrows indicating the translation start site. Asterisks mark UCRs of the 'top10' set used to find the over-represented motifs.

The third step comprises previtellogenic growth of the follicle, and during these stages a number of maternal factors are deposited and anchored, either within the oocyte or in the perivitelline space, that define the egg and the future embryonic axes (for review see St Johnston & Nüsslein-Volhard, 1992). In the list of Drosophila genes involved in early steps of axis determination, a couple of surprises came up in the search for honey bee orthologs. A big surprise was that we could not find a gurken ortholog in the bee, even though this gene sets up both the anterior–posterior and dorsal–ventral axes in the Drosophila egg (González-Reyes et al., 1995), whereas downstream components of the Gurken signalling cascade appear to be preserved in the bee genome (Honey Bee Genome Sequencing Consortium, 2006). Similar apparent gaps in constituents of patterning cascades were noted for the terminal regions of the embryo, such as a lack of a torso ortholog, whereas its ligand, torso-like, is represented by a well conserved ortholog in the bee (GB18663-PA).

With respect to genes involved in the final processes of oogenesis, we primarily looked at genes that play a part during vitellogenesis. There are four genes of interest in this class, the primary one coding for the yolk protein precursor, vitellogenin. This gene has already been sequenced for the honey bee (Piulachs et al., 2003) and, as expected, it is much more related to vitellogenins of other insects and even vertebrates than to the Drosophila yolk proteins, which apparently are derived from lipases (Hagedorn et al., 1998). The second gene of interest is the bee ortholog to yolkless, as this (GB16571-PA) could represent a putative vitellogenin receptor. The other two Drosophila genes with clear orthologs in the bee are CG18641 and CG12139, which code for a lipase and an LDL receptor, respectively.

General conclusions

The current analysis made use of previous experimental analyses on differential transcription during caste development of honey bee larvae. In the annotation of these genes, which includes references to Gene Ontology terms associated with their respective Drosophila orthologs, two major configurations emerged. First of all, worker genes were better defined in terms of GO attributes compared with the relatively large number of queen genes that had no GO terms associated to their respective Drosophila orthologs. Even when taking into consideration the conceptual limits in attributing GO terms on molecular function and biological process from Drosophila orthologs to honey bee genes, this finding could have a bearing on general basic questions in socioevolution, namely which caste is more divergent from a nonsocial reproductive female bee prototype or reproductive ground plan: the queen or the worker? Less speculative is the second major conclusion coming out of the GO analysis for Molecular Function, showing and confirming (Eder et al., 1983; Corona et al., 1999) the important role of metabolic regulation in caste development. This facet is demonstrated especially clearly in the caste-specific expression of oxidoreductases (queen) vs. hydrolases (workers).

The honey bee genome information provided not only a much improved annotation platform for caste-specifically expressed ESTs, but even more so opens the possibility to explore putative regulatory features of the honey bee genome. In the current study we employed modified Gibbs sampling and expectation-maximization algorithms (AlignACE, MDScan, MEME) to detect group-specific motifs in gene regions up to 1000 bp upstream of translation start sites. We detected 14 motifs that were significantly overrepresented in the caste genes when compared with corresponding motifs found in a random set of GLEAN3-predicted honey bee genes. The localization of such motifs in UCRs of worker-overexpressed genes revealed a clustering of such motifs close to the predicted basal promoter regions, suggesting strong regulatory effects. Such search strategies and the detected motifs can provide the lead to reveal and unravel cis-regulatory networks for and within specific contexts of honey bee biology.

Caste polyphenism in social insects makes a strong case for the emergence of novelties at a microevolutionary level (West-Eberhard, 2003). Without the pretension to discuss exhaustively the mechanisms underlying this surge of developmental plasticity, two major themes become apparent in this and other studies. Regulatory change has been demonstrated in the shut-down of wing disc patterning cascades in ants (Abouheif & Wray, 2002) and is certainly also implicit in observed temporal changes in gene expression during postembryonic development of ants and bumble bees (Goodisman et al., 2005; Pereboom et al., 2005). Such change would be expected to involve cis-regulatory elements, that is, change in transcription factor binding sites in UCRs, as approached in this study, and also evolutionary change in response thresholds to circulating morphogenetic hormones (for review see Hartfelder & Emlen, 2005). The second and quite unexpected theme is the acquisition of new systemic functions by evolutionarily rather old proteins, such as vitellogenin and hexamerins. These apparently unspectacular proteins have evolved into key players for caste evolution and reproductive division of labour via novel regulatory connectivity with JH (Amdam et al., 2004; Guidugli et al., 2005; Zhou et al., 2006a,b).

Experimental procedures

Selection and annotation of ESTs representing differentially expressed genes in honey bee caste development

The starting point was a set of 164 entries (mainly 3′-ESTs) in GenBank (BG101532–BG101697) from an SSH library (Evans & Wheeler, 1999; Evans & Wheeler, 2000). When validated by macroarray analyses, a clustering into three major classes became apparent: (I) genes overexpressed in young larvae, (II) genes overexpressed in last instar queen larvae, and (III) genes overexpressed in last instar worker larvae. For this study we excluded the class I ESTs because their expression is not caste-specific, but rather represents expression differences between young (still bipotent) and older larvae. To the class II queen ESTs (82) we added one complete cDNA entry (AY601642) from a DDRT-PCR screen (Corona et al., 1999), and to the class III set of worker ESTs (40) we added seven GenBank dbEST entries (BG149167–BG149173) from a DDRT-PCR screen on ovary development (Hepperle & Hartfelder, 2001).

The EST sequences were submitted to BLASTN searches (parameters: -G 2 -E 3 -W 15 -F 'm D' -U -e 1e-20) against genome sequence assembly Amel_v3.0 to retrieve matches in linked or unlinked genomic contigs and to exclude no-matches (seven ESTs in queen). ESTs that aligned within the same scaffold were checked for clustering and overlap. This clustering also served to exclude genes that were represented by non-overlapping ESTs from both castes. This procedure generated a set of 51 unique putative gene sequences overexpressed in either queen (34) or in worker larvae (17). These 51 nonredundant sequences were submitted to BLASTX searches against the Official Set of GLEAN3-predicted protein sequences (cut-off value at 1e−20). For ESTs with no significant protein sequence matches, the genomic regions adjacent to the mapped EST were searched to find neighbouring ORFs, especially those nearest to putative 3′ UTRs of predicted proteins, as the EST libraries have a bias in this direction.
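The check for clustering and overlap of ESTs on a scaffold can be illustrated with a short Python sketch. It is not the script used in this study: the function name, the tuple layout of a BLASTN hit and the simple rule that two ESTs belong together when their genomic intervals overlap are assumptions made for illustration only.

```python
from collections import defaultdict


def cluster_hits(hits):
    """Group EST hits whose intervals overlap on the same scaffold.

    hits: iterable of (est_id, scaffold, start, end) with start <= end
          (hypothetical layout, e.g. parsed from tabular BLASTN output).
    Returns a list of clusters, each a sorted list of EST identifiers.
    """
    by_scaffold = defaultdict(list)
    for est_id, scaffold, start, end in hits:
        by_scaffold[scaffold].append((start, end, est_id))

    clusters = []
    for scaffold in sorted(by_scaffold):
        current, current_end = [], None
        # Walk the hits from left to right and merge overlapping intervals.
        for start, end, est_id in sorted(by_scaffold[scaffold]):
            if current and start <= current_end:
                current.append(est_id)
                current_end = max(current_end, end)
            else:
                if current:
                    clusters.append(sorted(current))
                current, current_end = [est_id], end
        if current:
            clusters.append(sorted(current))
    return clusters
```

Each resulting cluster would then correspond to one putative gene sequence; ESTs from both castes that land in the same cluster without overlapping would still need the separate exclusion step described above.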

Official Set protein sequences were aligned against the Amel_v3.0 sequence assembly using TBLASTN to map protein to genome, and subsequently they were aligned using BLASTP against the GenBank nonredundant (nr) and the Flybase protein sequence databases. The manual features annotation procedure of the Artemis 7.0 program (Rutherford et al., 2000) was used to map ORFs, putative splice sites of exons, and ESTs to genome coordinates. The final annotation file was generated with a Python script in GFF format (http://www.sanger.ac.uk/Software/formats/GFF).

Honey bee sequences annotated as orthologs to D. melanogaster genes were putatively assigned the GO terms listed in the respective Flybase entry. In addition, the definition of new GO terms (Biological Process ontology) related to caste development and polyphenism (GO:0048651 and GO:0048650, respectively) was co-ordinated with the Gene Ontology Consortium (Ashburner et al., 2000). The FatiGO web tool (Al-Sharour et al., 2004) was used to cluster GO terms (level 3 setting) for Biological Process and Molecular Function.

For the detection of conserved domains, the 51 protein sequences were screened against the Pfam database (http://www.sanger.ac.uk/Software/Pfam) using the HMMER platform (current release 2.3.2, http://hmmer.wustl.edu) with a cut-off value set at 1e−10.

Annotation of oogenesis and reproduction genes

In order to identify putative honey bee orthologs to D. melanogaster genes, we searched the following GO terms in Flybase: ‘oogenesis’ (GO:0009993), ‘vitellogenesis’ (GO:0007296), ‘female meiosis’ (GO:0007143), ‘DNA recombination’ (GO:0006310) and ‘chromosome segregation’ (GO:0007059). Genes related to segregation distortion were searched for in Flybase in phenotypic descriptions and mutant effects of D. melanogaster genes, as this phenomenon is not represented by a GO term. Hence, this group may be more heterogeneous than the others. From this list we removed genes of pleiotropic function (multifaceted GO entries in Biological Process) and genes that lacked defined transcripts in the Drosophila genome database.

For the GO terms ‘oogenesis’ and ‘vitellogenesis’ we performed TBLASTN and BLASTP searches for 42 fruit fly genes against the Amel_v3.0 genome assembly and the GLEAN3-predicted protein sequences (Honey Bee Genome Sequencing Consortium, 2006), respectively. The orthologous D. melanogaster gene was characterized by the same procedure as described above (reciprocal best hit). For the GO terms ‘female meiosis’, ‘DNA recombination’, ‘chromosome segregation’ and the non-GO group ‘segregation distortion’, transcripts of D. melanogaster were searched against nr databases at NCBI using BLASTP. The obtained sequences were searched against the Amel_v3.0 genome assembly and the GLEAN3-predicted protein sequences using TBLASTN. Homologous sequences (threshold 1e−10) were predicted using the BioEdit software (Hall, 1999). ORFs showing significant homology (BLASTP threshold 1e−20) were assembled and used in BLASTP searches against the nr databases at NCBI.
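The reciprocal best hit criterion can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure as implemented for this study: the function names, the (query, subject, E-value) row layout and the use of the lowest E-value as the sole ranking criterion are hypothetical choices.

```python
def best_hits(rows, max_evalue=1e-20):
    """Return {query: subject} keeping the lowest E-value subject per query.

    rows: iterable of (query, subject, evalue) tuples (assumed layout).
    Hits with an E-value above max_evalue are ignored.
    """
    best = {}
    for query, subject, evalue in rows:
        if evalue > max_evalue:
            continue
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(fly_vs_bee, bee_vs_fly, max_evalue=1e-20):
    """Pairs (fly_gene, bee_gene) that are each other's best hit."""
    forward = best_hits(fly_vs_bee, max_evalue)
    backward = best_hits(bee_vs_fly, max_evalue)
    return sorted(
        (fly, bee) for fly, bee in forward.items() if backward.get(bee) == fly
    )
```

A fly gene whose best bee hit points back to a different fly gene is not reported, which is one way in which a gene could end up without a clear ortholog in such a screen.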

Motif search in upstream regions in caste-specifically expressed genes

In order to detect overrepresented motifs in the upstream control regions (UCRs) of the two sets of caste-related genes, we selected gene subsets based on two criteria: (1) those that had shown the highest caste-specificity in the array analyses (Evans & Wheeler, 2000), and (2) those that had a conserved 5′ region when compared with the Drosophila orthologs. These ‘top10’ genes consisted of six queen genes (GB13072, GB11628, GB19380, GB14798, GB16047 and GB18242) and of six worker genes (GB10869, GB12371, GB12239, GB10428, GB19006, GB14758). The motif search was conducted separately on the two sets of UCR sequences using three methods: AlignACE (Roth et al., 1998), MEME (Bailey & Elkan, 1995) and MDscan (Liu et al., 2002). Default parameter values were used in all searches, except that GC content in intergenic regions was set to 25%, representing the background value established for the honey bee UCR database generated in this study.


The database containing 10 156 UCR sequences was generated by parsing the Official Set annotation file (downloaded in GFF format from httpwwwbeegenomehgscbcmtmcedubeeftphtml) to extract upstream regions starting from the terminal 5′-genomic coordinate of each predicted CDS. The UCRs were arbitrarily set to a size frame of 1000 nucleotides (Roth et al., 1998), but were trimmed whenever another predicted ORF was detected in any of these regions.
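The coordinate arithmetic behind such a UCR extraction can be sketched as follows. This is an illustrative reconstruction, not the parser used to build the database: the function name and argument layout are hypothetical, coordinates are assumed to be 1-based and inclusive as in GFF, and scaffold ends are handled only at the lower boundary.

```python
def ucr_interval(cds_start, cds_end, strand, other_orfs, size=1000):
    """Return the (start, end) interval of the upstream control region.

    cds_start, cds_end: 1-based inclusive CDS coordinates (cds_start < cds_end).
    strand: "+" or "-".
    other_orfs: list of (start, end) intervals of other predicted ORFs
                on the same scaffold (assumed input).
    Returns None when trimming leaves no upstream sequence.
    """
    if strand == "+":
        # Upstream lies to the left of the CDS on the forward strand.
        start, end = max(1, cds_start - size), cds_start - 1
        for orf_start, orf_end in other_orfs:
            if orf_end >= start and orf_start <= end:
                start = max(start, orf_end + 1)  # trim at the neighbouring ORF
    else:
        # Upstream lies to the right of the CDS on the reverse strand.
        start, end = cds_end + 1, cds_end + size
        for orf_start, orf_end in other_orfs:
            if orf_start <= end and orf_end >= start:
                end = min(end, orf_start - 1)
    return (start, end) if start <= end else None
```

For reverse-strand genes the extracted sequence would additionally have to be reverse-complemented before motif searching so that all UCRs are read in the direction of transcription.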

The MAP (maximum a priori log likelihood) score, group specificity score (called Church score in this manuscript) (Hughes et al., 2000), ROC AUC (area under the curve for a receiver-operator characteristic plot) metric and MNCP (mean normalized conditional probability) metric (Clarke & Granek, 2003) were used to detect motifs that most likely correspond to biologically significant cis-regulatory elements. The filters run on the UCRs of the subsets of queen and worker genes were a MAP score cut-off value of 50, followed by a ROC AUC cut-off at 0.7, followed by a group specificity score cut-off at 1e−05. The UCR database of all GLEAN3-predicted honey bee genes was used as the background to calculate these metrics.
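The filter cascade can be sketched in Python as shown below. This is a hedged illustration and not the TAMO-based script of this study: the dictionary keys and function names are hypothetical, the thresholds are passed as arguments rather than hard-coded, and the group specificity score is assumed to be a hypergeometric upper-tail probability in the sense of Hughes et al. (2000).

```python
from math import comb


def hypergeom_tail(population, successes, draws, observed):
    """P(X >= observed) for X ~ Hypergeometric(population, successes, draws).

    Assumed reading of the group specificity score: population is the
    number of background UCRs, successes the size of the gene group,
    draws the number of best-matching UCRs genome-wide, and observed the
    number of those that belong to the group.
    """
    total = comb(population, draws)
    upper = min(successes, draws)
    favourable = sum(
        comb(successes, k) * comb(population - successes, draws - k)
        for k in range(observed, upper + 1)
    )
    return favourable / total


def filter_motifs(motifs, map_min, auc_min, specificity_max):
    """Apply the three cut-offs in sequence; return the surviving names.

    motifs: list of dicts with the hypothetical keys
            'name', 'map', 'roc_auc' and 'specificity'.
    """
    kept = [m for m in motifs if m["map"] >= map_min]
    kept = [m for m in kept if m["roc_auc"] >= auc_min]
    kept = [m for m in kept if m["specificity"] <= specificity_max]
    return [m["name"] for m in kept]
```

Because each cut-off only removes motifs, the order of the three filters does not change the final set; applying them in sequence mainly makes it possible to report how many motifs are lost at each step.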

A parametric test (MANOVA) and a nonparametric test (Kolmogorov–Smirnov) were conducted to identify significance levels for the two sets of filtered motifs found in the UCRs of caste-specifically expressed genes against filtered motifs found in the random UCR set. Random motifs were sampled from a motif database (10 391 motifs) generated by running our script 100 times with random sets of UCR sequences.
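For orientation, the two-sample Kolmogorov–Smirnov statistic that underlies the nonparametric comparison can be computed as in the following sketch. It is a plain-Python illustration with a hypothetical function name, it returns only the statistic D (not a P-value), and it is not the statistical software used for the analysis.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic D.

    D is the largest absolute difference between the empirical
    cumulative distribution functions of the two samples, e.g. motif
    scores from caste genes versus scores from random UCR sets.
    """
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    # The supremum is attained at one of the observed data points.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d
```

A value of D close to 1 indicates that the score distributions of the two motif sets barely overlap, whereas a value near 0 indicates that they are practically indistinguishable.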

The main criterion for identifying known regulatory motifs among these caste-specific ones was the alignment of the PSSM for each bee motif with the Drosophila melanogaster sequences in the TRANSFAC database (release 4.0) (Wingender et al., 2000). Only the alignments passing a threshold of 80% identity for each PSSM were considered as significant matches. In addition, we checked for a specific binding motif, the EcR/USP motif (rGkTCAaTGamcy-3′), known to function in the expression of genes responding to morphogenetic hormone titres (Perera et al., 2005).
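A simplified way to look for the EcR/USP consensus in a UCR is an exact IUPAC pattern scan on both strands, sketched below. Note that this swaps the PSSM search of the study for a stricter regular-expression match; the function names are hypothetical, and the lower-case letters of the consensus are assumed to be ordinary IUPAC codes (r = A/G, k = G/T, m = A/C, y = C/T).

```python
import re

# IUPAC nucleotide codes expanded to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "K": "[GT]", "M": "[AC]",
    "S": "[CG]", "W": "[AT]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def consensus_to_regex(consensus):
    """Compile an IUPAC consensus into a lookahead regex (overlaps allowed)."""
    body = "".join(IUPAC[base] for base in consensus.upper())
    return re.compile("(?=" + body + ")")


def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def scan(seq, consensus):
    """Return sorted (position, strand) pairs of consensus matches.

    Positions are 0-based and refer to the leftmost base of the site on
    the forward strand, for hits on either strand.
    """
    pattern = consensus_to_regex(consensus)
    seq = seq.upper()
    width = len(consensus)
    hits = [(m.start(), "+") for m in pattern.finditer(seq)]
    reverse = reverse_complement(seq)
    hits += [(len(seq) - m.start() - width, "-") for m in pattern.finditer(reverse)]
    return sorted(hits)
```

An exact scan of this kind misses sites that deviate from the consensus at a single weakly conserved position, which is the main reason a PSSM with a score threshold is usually preferred.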

Operating system and programming tools

An Ubuntu Linux (version Breezy) operating system was used to implement all scripts and pipelines designed for annotation procedures and motif discovery. The Python programming language (http://www.python.org), Biopython (http://www.biopython.org) and TAMO (Tools for Analysis of Motifs) packages (Gordon et al., 2005) were used in program design. Other web applications were built using the Zope application server (http://www.zope.org), hosted at http://zulu.fmrp.usp.br/beelab.

Acknowledgements

This work was supported by grants from FAPESP (Fundação de Amparo à Pesquisa do Estado de São Paulo, 9900719-6) and CNPq (Conselho Nacional de Desenvolvimento Científico e Tecnológico, 4729632-3-1).

References

Abouheif, E. and Wray, G.A. (2002) Evolution of the genetic network underlying wing polyphenism in ants. Science 297: 249–252.

Al-Sharour, F., Diaz-Uriarte, R. and Dopazo, J. (2004) FatiGO: a web tool for finding significant associations of Gene Ontology terms with groups of genes. Bioinformatics 20: 578–580.

Amdam, G.V., Norberg, K., Hagen, A. and Omholt, S.W. (2003) Social exploitation of vitellogenin. Proc Natl Acad Sci USA 100: 1799–1802.

Amdam, G.V., Simões, Z.L.P., Hagen, A., Norberg, K., Schroder, K., Mikkelsen, O., et al. (2004) Hormonal control of the yolk precursor vitellogenin regulates immune function and longevity in honeybees. Exp Gerontol 39: 767–773.

Antoniewski, C., O'Grady, M.S., Edmondson, R.G., Lassieur, S.M. and Benes, H. (1995) Characterization of an EcR/USP heterodimer target site that mediates ecdysone responsiveness of the Drosophila Lsp-2 gene. Mol Gen Genet 249: 545–556.

Arbeitsman, M.N., Furlong, E.E.M., Imam, F., Johnson, E., Null, B.H., Baker, B.S., et al. (2002) Gene expression during the life cycle of Drosophila melanogaster. Science 297: 2270–2275.

Ashburner, M., Ball, C.A., Blake, J.A., Botstein, D., Butler, H., Cherry, J.M., et al. (2000) Gene Ontology: tool for the unification of biology. Nature Genet 25: 25–29.

Bailey, T.L. and Elkan, D. (1995) Unsupervised learning of multiple motifs in biopolymers using expectation maximization. Machine Learning J 21: 58–83.

Bergem, M., Norberg, K. and Aamodt, R.M. (2006) Long-term maintenance of in vitro cultured honeybee (Apis mellifera) embryonic cells. BMC Dev Biol 6: 17 (online).

Boutros, M., Kiger, A.A., Armknecht, S., Kerr, K., Hild, M., Koch, B., et al. (2004) Genome-wide RNAi analysis of growth and viability in Drosophila cells. Science 303: 832–835.

Carsten, L.D., Watts, T. and Markov, T.A. (2005) Gene expression patterns accompanying a dietary switch in Drosophila melanogaster. Mol Ecol 14: 3203–3208.

Clarke, N.D. and Granek, J.A. (2003) Rank order metrics for quantifying the association of sequence features with gene regulation. Bioinformatics 19: 212–218.

Colombani, J., Bianchini, L., Layalle, S., Pondeville, E., Dauphin-Villemant, C., Antoniewski, C., et al. (2005) Antagonistic actions of ecdysone and insulins determine final size in Drosophila. Science 310: 667–670.

Corona, M., Estrada, E. and Zurita, M. (1999) Differential expression of mitochondrial genes between queens and workers during caste determination in the honeybee Apis mellifera. J Exp Biol 202: 929–938.

Cunha, A.C., Nascimento, A.M., Guidugli, K.R., Simões, Z.L.P. and Bitondi, M.M.G. (2005) Molecular cloning and expression of a hexamerin cDNA from the honey bee. J Insect Physiol 51: 1135–1147.

Darwin, C.R. (1859) On the Origin of Species by Means of Natural Selection. John Murray, London.

Davidson, E.H. (2001) Genomic Regulatory Systems: Development and Evolution. Academic Press, San Diego.

Eder, J., Kremer, J.P. and Rembold, H. (1983) Correlation of cytochrome c titer and respiration in Apis mellifera: adaptive responses to caste determination defines workers, intercastes and queens. Comp Biochem Physiol B 76: 703–716.

Engels, W. (1974) Occurrence and significance of vitellogenins in female castes of social Hymenoptera. Am Zool 14: 1229–1237.

Evans, J.D. and Wheeler, D.E. (1999) Differential gene expression between developing queens and workers in the honey bee, Apis mellifera. Proc Natl Acad Sci USA 96: 5575–5580.


Evans, J.D. and Wheeler, D.E. (2000) Expression profiles during honeybee caste determination. Genome Biol 2: 1, 6 pages (online: http://genomebiology.com/2000/2/1/research/0001.1).

Forbes, A. and Lehmann, R. (1998) Nanos and Pumilio have critical roles in the development and function of Drosophila germline stem cells. Development 125: 679–690.

González-Reyes, A., Elliot, H. and St Johnston, D. (1995) Polarization of both major body axes in Drosophila by gurken-torpedo signalling. Nature 375: 654–658.

Goodisman, M.A., Isoe, J., Wheeler, D.E. and Wells, M.A. (2005) Evolution of insect metamorphosis: a microarray-based study of larval and adult gene expression in the ant Camponotus festinatus. Evolution 59: 858–870.

Gordon, D.B., Nekludova, L., McCallum, S. and Fraenkel, E. (2005) TAMO: a flexible, object-oriented framework for analysing transcriptional regulation using DNA-sequence motifs. Bioinformatics 21: 3164–3165.

Guidugli, K.R., Hepperle, C. and Hartfelder, K. (2004) A member of the short-chain dehydrogenase/reductase (SDR) superfamily is a target of the ecdysone response in honey bee (Apis mellifera) caste development. Apidologie 35: 37–47.

Guidugli, K.R., Nascimento, A.M., Amdam, G.V., Barchuk, A.R., Omholt, S.W., Simões, Z.L.P. and Hartfelder, K. (2005) Vitellogenin regulates hormonal dynamics in the worker caste of a eusocial insect. FEBS Lett 579: 4961–4965.

Hagedorn, H.H., Maddison, D.R. and Tu, Z. (1998) The evolution of vitellogenins, cyclorrhaphan yolk proteins and related molecules. Adv Insect Physiol 27: 335–384.

Hall, T.A. (1999) BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucleic Acids Symp Ser 41: 95–98.

Hamilton, W.D. (1964) The genetical theory of social behaviour, I & II. J Theor Biol 7: 1–16, 17–52.

Hartfelder, K. and Emlen, D.J. (2005) Endocrine control of insect polyphenism. In Comprehensive Insect Molecular Science (Gilbert, L.I., Iatrou, K. and Gill, S., eds), Vol. 3, pp. 651–703. Elsevier, Oxford.

Hartfelder, K. and Engels, W. (1998) Social insect polymorphism: hormonal regulation of plasticity in development and reproduction in the honeybee. Curr Topics Dev Biol 40: 45–77.

Hawkins, N.C., Thorpe, J. and Schüpbach, T. (1996) encore, a gene required for the regulation of germ line mitosis and oocyte differentiation during Drosophila oogenesis. Development 122: 281–290.

Haydak, M.H. (1970) Honey bee nutrition. Annu Rev Entomol 15: 143–156.

Hepperle, C. and Hartfelder, K. (2001) Differentially expressed regulatory genes in honey bee caste development. Naturwissenschaften 88: 113–116.

Hoage, T.R. and Kessel, R.G. (1968) An electron microscopic study of the process of differentiation during spermatogenesis in the drone honey bee (Apis mellifera L.) with special reference to centriole replication and elimination. J Ultrastruct Res 24: 6–32.

Honey Bee Genome Sequencing Consortium (2006) Insights into social insects from the genome of the honeybee Apis mellifera. Nature (in press).

Hughes, J.D., Estep, P.W., Tavazole, S. and Church, G.M. (2000) Computational identification of cis-regulatory elements associated with groups of functionally related genes in Saccharomyces cerevisiae. J Mol Biol 296: 1205–1214.

Hunt, G.J. and Page, R.E. (1995) Linkage map of the honey bee, Apis mellifera, based on RAPD markers. Genetics 139: 1371–1382.

Hunt, J.H., Buck, N.A. and Wheeler, D.E. (2003) Storage proteins in vespid wasps: characterization, developmental pattern, and occurrence in adults. J Insect Physiol 49: 785–794.

Judice, C.C., Carazzole, M.F., Festa, F., Sogayar, M.C., Hartfelder, K. and Pereira, G.A.G. (2006) Gene expression profiles underlying alternative caste phenotypes in a highly eusocial bee, Melipona quadrifasciata. Insect Mol Biol 15: 33–44.

Kerr, W.E. (1950) Genetic determination of castes in the genus Melipona. Genetics 35: 143–152.

Kropácová, S. and Haslbachová, H. (1971) The influence of queenlessness and of unsealed brood on the development of ovaries in worker honeybees. J Apicult Res 10: 57–61.

Lattorff, H.M.G., Moritz, R.F.A. and Fuchs, S. (2005) A single gene determines thelytokous parthenogenesis in honey bee workers (Apis mellifera capensis). Heredity 94: 533–537.

Lattorff, H.M.G., Moritz, R.F.A., Solignac, M. and Crewe, R.M. (2006) Control of reproductive dominance by the thelytoky gene in honeybees. Biol Lett (in press).

Li, T.R. and White, K.P. (2003) Tissue-specific gene expression and ecdysone-regulated genomic networks in Drosophila. Dev Cell 5: 59–72.

Lin, H., Yue, L. and Spradling, A.C. (1994) The Drosophila fusome, a germline-specific organelle, contains membrane skeletal proteins and functions in cyst formation. Development 120: 947–956.

Liu, X.S., Brutlag, D.L. and Liu, J.S. (2002) An algorithm for finding protein-DNA binding sites with applications to chromatin-immunoprecipitation microarray experiments. Nature Biotechnol 20: 835–839.

Liu, Y., Liu, S., Wei, L., Altman, R.B. and Batzoglou, S. (2004) Eukaryotic regulatory element conservation analysis and identification using comparative genomics. Genome Res 14: 354–366.

Martinez, T., Burmester, T., Veenstra, J.A. and Wheeler, D.E. (2001) Sequence and evolution of a hexamerin from the ant Camponotus festinatus. Insect Mol Biol 9: 427–431.

McKearin, D. (1997) The Drosophila fusome, organelle biogenesis and germ cell differentiation: if you build it … Bioessays 19: 147–152.

Mirth, C., Truman, J.W. and Riddiford, L.M. (2005) The role of the prothoracic gland in determining critical weight for metamorphosis in Drosophila melanogaster. Curr Biol 15: 1–12.

Moritz, R.F.A., Kryger, P. and Allsopp, M.H. (1996) Competition for royalty in bees. Nature 384: 31.

Page, R.E. and Erickson, E.H. (1988) Reproduction by worker honey bees (Apis mellifera L.). Behav Ecol Sociobiol 23: 117–126.

Pereboom, J.J.M., Jordan, W.C., Sumner, S., Hammond, R.L. and Bourke, A.F.G. (2005) Differential gene expression in queen-worker caste determination in bumble bees. Proc R Soc B 272: 1145–1152.

Perera, S.C., Zheng, S., Feng, Q.L., Krell, P.J., Retnakaran, A. and Palli, S.R. (2005) Heterodimerization of ecdysone receptor and ultraspiracle on symmetric and asymmetric response elements. Arch Insect Biochem Physiol 60: 55–70.

Piulachs, M.D., Guidugli, K.R., Barchuk, A.R., Cruz, J., Simões, Z.L.P. and Bellés, X. (2003) The vitellogenin of the honeybee, Apis mellifera. Structural analysis of the cDNA and expression studies. Insect Biochem Mol Biol 33: 459–465.

Pritsker, M., Liu, Y.C., Beer, M.A. and Tavazole, S. (2004) Whole genome discovery of transcription factor binding sites by network-level conservation. Genome Res 14: 99–108.

Rachinsky, A., Strambi, C., Strambi, A. and Hartfelder, K. (1990) Caste and metamorphosis – hemolymph titers of juvenile hormone and ecdysteroids in last instar honeybee larvae. Gen Comp Endocr 79: 31–38.

Riddihough, G. and Pelham, H.R.B. (1987) An ecdysone response element in the Drosophila hsp27 promoter. EMBO J 6: 3729–3734.

Roth, F.P., Hughes, J.D., Estep, P.W. and Church, G.M. (1998) Finding DNA regulatory motifs within unaligned noncoding sequences clustered by whole-genome mRNA quantification. Nature Biotechnol 16: 939–945.

Rutherford, K., Parkhill, J., Crook, J., Horsnell, T., Rice, P., Rajandream, M.A. and Barrell, B. (2000) Artemis: sequence visualisation and annotation. Bioinformatics 16: 944–945.

Schmidt-Capella, I.C. and Hartfelder, K. (1998) Juvenile hormone effect on DNA synthesis and apoptosis in caste-specific differentiation of the larval honey bee (Apis mellifera L.) ovary. J Insect Physiol 44: 385–391.

Snodgrass, R.E. (1956) Anatomy of the Honey Bee. Cornell University Press, Ithaca.

Solignac, M., Vautrin, D., Baudry, E., Mougel, F., Loiseau, A. and Cornuet, J.M. (2004) A microsatellite-based linkage map of the honeybee, Apis mellifera L. Genetics 167: 253–262.

St Johnston, D. and Nüsslein-Volhard, C. (1992) The origin of pattern and polarity in the Drosophila embryo. Cell 68: 201–219.

Staab, S. and Steinmann-Zwicky, M. (1995) Female germ cells of Drosophila require zygotic ovo and otu product for survival in larvae and pupae respectively. Mech Dev 54: 205–210.

Sullivan, A.A. and Thummel, C.S. (2003) Temporal profiles of nuclear receptor gene expression profiles reveal coordinate transcriptional responses during Drosophila development. Mol Endocrinol 17: 2125–2137.

Wassermann, W.W. and Sandelin, A. (2004) Applied bioinformatics for the identification of regulatory elements. Nature Rev Genet 5: 267–287.

West-Eberhard, M.J. (2003) Developmental Plasticity and Evolution. Oxford University Press, Oxford.

Wheeler, D.E. and Nijhout, H.F. (2003) A perspective for understanding the modes of juvenile hormone as a lipid signaling system. Bioessays 25: 994–1001.

Wilson, E.O. (1971) The Insect Societies. Belknap Press of Harvard University Press, Cambridge, MA.

Wingender, E., Chen, X., Hehl, R., Karas, H., Liebich, I., Matys, V., et al. (2000) TRANSFAC: an integrated system for gene expression regulation. Nucleic Acids Res 28: 316–319.

Zhou, X., Oi, F.M. and Scharf, M.E. (2006a) Social exploitation of hexamerin: RNAi reveals a major caste-regulatory factor in termites. Proc Natl Acad Sci USA 103: 4499–4504.

Zhou, X., Tarver, M.R., Bennett, G.W., Oi, F.M. and Scharf, M.E. (2006b) Two hexamerin genes from the termite Reticulitermes flavipes: sequence, expression, and proposed functions in caste regulation. Gene 376: 47–58.

Zhu, Z., Shendure, J. and Church, G.M. (2005) Discovering functional transcription factor combinations in the human cell cycle. Genome Res 15: 848–855.

Supplementary material

The following material is available for this article online

Table S1. Annotation results of caste-specifically expressed honey bee genes.

Table S2. Caste-specific motifs in UCRs of honey bee genes.

Table S3. Annotation of honey bee orthologs to Drosophila melanogaster genes involved in Reproduction.

This material is available as part of the online article from http://www.blackwell-synergy.com


sites is generally taken as a sign of strong regulatory effect (Davidson, 2001).

As caste development is highly dependent on changes in haemolymph titres of JH and ecdysteroids, we also screened the UCRs of the differentially expressed genes for putative nuclear receptor binding sites. Regulatory elements involved in the JH response are not well understood yet, so any prediction in this direction would be elusive (Wheeler & Nijhout, 2003). Functional ecdysone response elements (EcRE) have, however, been identified, and it is now well established that the EcR/USP complex binds to direct or inverted (palindromic) repeats (Riddihough & Pelham, 1987; Antoniewski et al., 1995; Perera et al., 2005). A PSSM search (Wassermann & Sandelin, 2004) based on a canonical representation (rGkTCAaTGamcy) (Perera et al., 2005) did not reveal any putative EcRE motif in the UCRs of the 51 caste-differentially expressed genes. However, this does not rule out that these genes respond to changes in JH and/or ecdysteroid titres, as these hormones require EcR/USP binding primarily in the expression of early genes, but not necessarily for the late response genes (Li & White, 2003; Sullivan & Thummel, 2003).

In conclusion, the predictions from such a combined strategy, which searches for group-specific and for conserved regulatory motifs in GLEAN3-predicted honey bee genes, represent a major transition from nonhypothesis-driven high-throughput screens to hypothesis-driven searches for context-dependent gene expression in honey bees. Such directed search results can serve as a platform for experimental analyses of genome-wide integration in hormonal control of caste development in bees. In addition, this study exemplifies how existent algorithms for detecting shared regulatory motifs can be joined into a toolkit for predicting coregulated gene expression patterns in honey bees. These methods have been shown to be robust and are gaining acceptance for use in functional and comparative genomics (Liu et al., 2004; Pritsker et al., 2004; Zhu et al., 2005).

Figure 3. Putative regulatory motifs and their consensus sequences in UCRs of queen and worker overexpressed genes. Scores for MAP, Church, ROC AUC and MNCP metrics indicate degree of group specificity and significance level.

Oogenesis and reproduction

As caste development sets the stage for reproductive division of labour, genes involved in reproductive processes are strong candidates for functional analyses. In the present study we performed BLAST searches to identify honey bee orthologs for a list of 32 fly genes with the GO attribute ‘oogenesis’ and for four genes specifically related to ‘vitellogenesis’. The list for fly genes involved in nuclear events in germ cells consisted of 20 genes for ‘female meiosis’, 12 genes for ‘recombination’, and 21 genes under the heading ‘chromosome segregation, including segregation distortion’ (Supplementary material, Table S3). In some cases these GO attributes for fly genes, of course, overlapped.

BLASTN and BLASTX searches for these fly genes against the honey bee genome assembly 3.0 and the GLEAN3 Official Set (aa) retrieved statistically well supported putative bee orthologs for most of these candidates. For the genes involved in meiosis, recombination and chromosome segregation this finding, although not unexpected, is of interest, as meiosis in the haploid honey bee drone is strongly modified when compared with a normal diploid meiosis. The first meiosis is initiated, but the nucleus remains undivided and only the superfluous centrioles are eliminated as cytoplasmic buds (Hoage & Kessel, 1968). An interesting gene, thelytoky (th), has recently been mapped in this context (Lattorff et al., 2005). It prevents almost completely meiotic recombination in the automixis of laying workers of the Cape honey bee. As an indication of the interplay between meiosis and later development, this locus also appears to be an integral part of various gene cascades involved in caste determination (Lattorff et al., 2006).

The fly genes retrieved in the GO searches for ‘oogenesis’ represent a much larger range of Molecular Function categories, such as transcription factors, proteins regulating translation by RNA binding, RNA helicases, enzymes (ubiquitination, transfer of sugar residues, sulfotransferase), GTPase activity, and several factors binding to cytoskeletal proteins (Supplementary material, Table S3). This wide range of functional categories is expected, as these genes are involved in a series of different steps during oogenesis in the polytrophic meroistic ovary. Oogenesis starts out with the maintenance of germline and somatic stem cell identity in the germline niche in the upper germarium. A key gene involved in this process is pumilio (Forbes & Lehmann, 1998), which is represented by a highly conserved bee ortholog, GB10504-PA. The second step is the formation of germ cell cysts, the determination of an oocyte within each cyst, and the survival of these cysts, involving genes such as benign gonial cell neoplasm (Lin et al., 1994), encore (Hawkins et al., 1996), ovo and ovarian tumour (otu) (Staab & Steinmann-Zwicky, 1995), all well conserved in the honey bee genome. Interestingly, we could not find a clear bee ortholog for bag of marbles (bam), which is one of the prime early response genes in the cystoblast differentiation pathway in Drosophila (McKearin, 1997).

Figure 4. Map of the group-specific motifs found in queen and worker UCRs of caste-specifically expressed genes. The coding region is represented by the GLEAN3 prediction number (assembly 4.0), with arrows indicating the translation start site. Asterisks mark UCRs of the ‘top10’ set used to find the overrepresented motifs.

The third step comprises previtellogenic growth of thefollicle and during these stages a number of maternalfactors are deposited and anchored either within the oocyteor in the perivitelline space that define the egg and thefuture embryonic axes (for review see St Johnston ampNuumlsslein-Volhard (1992) In the list of Drosophila genesinvolved in early steps of axis determination a couple ofsurprises came up in the search for honey bee orthologsA big surprise was that we could not find a gurken orthologin the bee even though this gene sets up both the anteriorndashposterior and dorsalndashventral axes in the Drosophila egg(Gonzaacutelez-Reyes et al 1995) whereas downstreamcomponents of the Gurken signalling cascade appear tobe preserved in the bee genome (Honey Bee GenomeSequencing Consortium 2006) Similar apparent gaps inconstituents of patterning cascades were noted for theterminal regions of the embryo such as a lack of a torsoortholog whereas its ligand torso-like is represented by awell conserved ortholog in the bee (GB18663-PA)

With respect to genes involved in the final processes of oogenesis, we primarily looked at genes that play a part during vitellogenesis. There are four genes of interest in this class, the primary one coding for the yolk protein precursor, vitellogenin. This gene has already been sequenced for the honey bee (Piulachs et al., 2003) and, as expected, it is much more closely related to vitellogenins of other insects, and even vertebrates, than to the Drosophila yolk proteins, which apparently are derived from lipases (Hagedorn et al., 1998). The second gene of interest is the bee ortholog to yolkless, as this (GB16571-PA) could represent a putative vitellogenin receptor. The other two Drosophila genes with clear orthologs in the bee are CG18641 and CG12139, which code for a lipase and an LDL receptor, respectively.

General conclusions

The current analysis made use of previous experimental analyses on differential transcription during caste development of honey bee larvae. In the annotation of these genes, which includes references to Gene Ontology terms associated with their respective Drosophila orthologs, two major patterns emerged. First, worker genes were better defined in terms of GO attributes than queen genes, a relatively large number of which had no GO terms associated with their respective Drosophila orthologs. Even when taking into consideration the conceptual limits in attributing GO terms on molecular function and biological process from Drosophila orthologs to honey bee genes, this finding could have a bearing on basic questions in socioevolution, namely which caste is more divergent from a nonsocial reproductive female bee prototype, or reproductive ground plan: the queen or the worker? Less speculative is the second major conclusion coming out of the GO analysis for Molecular Function, which confirms (Eder et al., 1983; Corona et al., 1999) the important role of metabolic regulation in caste development. This facet is demonstrated especially clearly in the caste-specific expression of oxidoreductases (queen) vs. hydrolases (workers).

The honey bee genome information not only provided a much improved annotation platform for caste-specifically expressed ESTs but, even more so, opens the possibility to explore putative regulatory features of the honey bee genome. In the current study we employed modified Gibbs sampling and expectation-maximization algorithms (AlignACE, MDscan, MEME) to detect group-specific motifs in gene regions up to 1000 bp upstream of translation start sites. We detected 14 motifs that were significantly overrepresented in the caste genes when compared with corresponding motifs found in a random set of GLEAN3-predicted honey bee genes. The localization of such motifs in UCRs of worker-overexpressed genes revealed a clustering close to the predicted basal promoter regions, suggesting strong regulatory effects. Such search strategies and the detected motifs can provide the lead to unravel cis-regulatory networks within specific contexts of honey bee biology.

Caste polyphenism in social insects makes a strong case for the emergence of novelties at a microevolutionary level (West-Eberhard, 2003). Without attempting to discuss exhaustively the mechanisms underlying this surge of developmental plasticity, two major themes become apparent in this and other studies. Regulatory change has been demonstrated in the shut-down of wing disc patterning cascades in ants (Abouheif & Wray, 2002), and is certainly also implicit in observed temporal changes in gene expression during postembryonic development of ants and bumble bees (Goodisman et al., 2005; Pereboom et al., 2005). Such change would be expected to involve cis-regulatory elements, that is, change in transcription factor binding sites in UCRs, as approached in this study, and also evolutionary change in response thresholds to circulating morphogenetic hormones (for review see Hartfelder & Emlen, 2005). The second, and quite unexpected, theme is the acquisition of new systemic functions by evolutionarily rather old proteins, such as vitellogenin and hexamerins. These apparently unspectacular proteins have evolved into key players for caste evolution and reproductive division of labour via novel regulatory connectivity with JH (Amdam et al., 2004; Guidugli et al., 2005; Zhou et al., 2006a,b).

Experimental procedures

Selection and annotation of ESTs representing differentially expressed genes in honey bee caste development

The starting point was a set of 164 entries (mainly 3′-ESTs) in GenBank (BG101532–BG101697) from an SSH library (Evans & Wheeler, 1999; Evans & Wheeler, 2000). When validated by macroarray analyses, a clustering into three major classes became apparent: (I) genes overexpressed in young larvae, (II) genes overexpressed in last instar queen larvae, and (III) genes overexpressed in last instar worker larvae. For this study we excluded the class I ESTs because their expression is not caste-specific, but rather represents expression differences between young (still bipotent) and older larvae. To the class II queen ESTs (82) we added one complete cDNA entry (AY601642) from a DDRT-PCR screen (Corona et al., 1999), and to the class III set of worker ESTs (40) we added seven GenBank dbEST entries (BG149167–BG149173) from a DDRT-PCR screen on ovary development (Hepperle & Hartfelder, 2001).

The EST sequences were submitted to BLASTN searches (parameters: -G 2 -E 3 -W 15 -F ‘m D’ -U -e 1e-20) against genome sequence assembly Amel_v3.0 to retrieve matches in linked or unlinked genomic contigs and to exclude no-matches (seven ESTs in queen). ESTs that aligned within the same scaffold were checked for clustering and overlap. This clustering also served to exclude genes that were represented by non-overlapping ESTs from both castes. This procedure generated a set of 51 unique putative gene sequences overexpressed in either queen (34) or worker larvae (17). These 51 nonredundant sequences were submitted to BLASTX searches against the Official Set of GLEAN3-predicted protein sequences (cut-off value at 1e−20). For ESTs with no significant protein sequence matches, the genomic regions adjacent to the mapped EST were searched to find neighbouring ORFs, especially those nearest to putative 3′ UTRs of predicted proteins, as the EST libraries have a bias in this direction.
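
The overlap check on ESTs that map to the same scaffold can be illustrated with a minimal Python sketch. It is not the authors' script: the tuple layout `(est_id, scaffold, start, end)` and the function name `cluster_est_hits` are assumptions made for illustration, and two hits are treated as overlapping when their coordinate ranges on the same scaffold share at least one position.

```python
from collections import defaultdict


def cluster_est_hits(hits):
    """Group EST hits that overlap on the same scaffold.

    hits: iterable of (est_id, scaffold, start, end), with start <= end.
    Returns a list of clusters, each a list of EST ids, ordered by
    scaffold name and then by start coordinate.
    """
    by_scaffold = defaultdict(list)
    for est_id, scaffold, start, end in hits:
        by_scaffold[scaffold].append((start, end, est_id))

    clusters = []
    for scaffold in sorted(by_scaffold):
        current, current_end = [], None
        for start, end, est_id in sorted(by_scaffold[scaffold]):
            if current and start <= current_end:
                # Overlaps the running cluster: extend it.
                current.append(est_id)
                current_end = max(current_end, end)
            else:
                # Gap before this hit: close the previous cluster.
                if current:
                    clusters.append(current)
                current, current_end = [est_id], end
        if current:
            clusters.append(current)
    return clusters
```

Each resulting cluster would then correspond to one putative gene; clusters mixing queen and worker ESTs could be flagged in a later step.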

Official Set protein sequences were aligned against the Amel_v3.0 sequence assembly using TBLASTN to map protein to genome, and subsequently they were aligned using BLASTP against the GenBank nonredundant (nr) and the Flybase protein sequence databases. The manual features annotation procedure of the Artemis 7.0 program (Rutherford et al., 2000) was used to map ORFs, putative splice sites of exons, and ESTs to genome coordinates. The final annotation file was generated with a Python script in GFF format (http://www.sanger.ac.uk/Software/formats/GFF).
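
For orientation, a GFF record is a single tab-separated line with the fields sequence name, source, feature type, start, end, score, strand, frame and optional attributes. The sketch below shows how one such line could be written in Python; the helper name `gff_line` and the example values in the usage note are hypothetical and do not come from the authors' script.

```python
def gff_line(seqname, source, feature, start, end, strand,
             score=".", frame=".", attributes=""):
    """Format one GFF record; "." marks an undefined score or frame."""
    if start > end:
        raise ValueError("GFF requires start <= end")
    if strand not in ("+", "-", "."):
        raise ValueError("strand must be '+', '-' or '.'")
    fields = [seqname, source, feature, str(start), str(end),
              str(score), strand, str(frame)]
    if attributes:
        # The attributes column is optional and is appended only if given.
        fields.append(attributes)
    return "\t".join(fields)
```

For example, `gff_line("scaffoldX", "manual", "exon", 100, 250, "+")` yields a record with eight tab-separated columns.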

Honey bee sequences annotated as orthologs to D. melanogaster genes were putatively assigned the GO terms listed in the respective Flybase entry. In addition, the definition of new GO terms (Biological Process ontology) related to caste development and polyphenism (GO:0048651 and GO:0048650, respectively) was co-ordinated with the Gene Ontology Consortium (Ashburner et al., 2000). The FatiGO web tool (Al-Shahrour et al., 2004) was used to cluster GO terms (level 3 setting) for Biological Process and Molecular Function.

For the detection of conserved domains, the 51 protein sequences were screened against the Pfam database (http://www.sanger.ac.uk/Software/Pfam) using the HMMER platform (current release 2.3.2; http://hmmer.wustl.edu) with a cut-off value set at 1e−10.

Annotation of oogenesis and reproduction genes

In order to identify putative honey bee orthologs to D. melanogaster genes, we searched the following GO terms in Flybase: ‘oogenesis’ (GO:0009993), ‘vitellogenesis’ (GO:0007296), ‘female meiosis’ (GO:0007143), ‘DNA recombination’ (GO:0006310) and ‘chromosome segregation’ (GO:0007059). Genes related to segregation distortion were searched for in Flybase in phenotypic descriptions and mutant effects of D. melanogaster genes, as this phenomenon is not represented by a GO term. Hence, this group may be more heterogeneous than the others. From this list we removed genes of pleiotropic function (multifaceted GO entries in Biological Process) and genes that lacked defined transcripts in the Drosophila genome database.

For the GO terms ‘oogenesis’ and ‘vitellogenesis’ we performed TBLASTN and BLASTP searches for 42 fruit fly genes against the Amel_v3.0 genome assembly and the GLEAN3-predicted protein sequences (Honey Bee Genome Sequencing Consortium, 2006), respectively. The orthologous D. melanogaster gene was characterized by the same procedure as described above (reciprocal best hit). For the GO terms ‘female meiosis’, ‘DNA recombination’, ‘chromosome segregation’ and the non-GO group ‘segregation distortion’, transcripts of D. melanogaster were searched against nr databases at NCBI using BLASTP. The obtained sequences were searched against the Amel_v3.0 genome assembly and the GLEAN3-predicted protein sequences using TBLASTN. Homologous sequences (threshold 1e−10) were predicted using the BioEdit software (Hall, 1999). ORFs showing significant homology (BLASTP, threshold 1e−20) were assembled and used in BLASTP searches against the nr databases at NCBI.
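
The reciprocal best hit criterion mentioned above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the pipeline used in the study: BLAST results are assumed to be already parsed into `(query, subject, evalue)` tuples, the function names are hypothetical, and the `max_evalue` default simply reuses one of the thresholds quoted in the text.

```python
def best_hits(rows):
    """Return {query: subject} keeping the lowest e-value per query."""
    best = {}
    for query, subject, evalue in rows:
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(fly_vs_bee, bee_vs_fly, max_evalue=1e-10):
    """Pairs (fly_gene, bee_gene) that are each other's best hit."""
    forward = best_hits(r for r in fly_vs_bee if r[2] <= max_evalue)
    backward = best_hits(r for r in bee_vs_fly if r[2] <= max_evalue)
    return sorted(
        (fly, bee) for fly, bee in forward.items()
        if backward.get(bee) == fly
    )
```

A fly gene whose best bee hit points back to a different fly gene is not reported, which is one way a gene can end up without a clear ortholog.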

Motif search in upstream regions in caste-specifically expressed genes

In order to detect overrepresented motifs in the upstream control regions (UCRs) of the two sets of caste-related genes, we selected gene subsets based on two criteria: (1) those that had shown the highest caste-specificity in the array analyses (Evans & Wheeler, 2000), and (2) those that had a conserved 5′ region when compared with the Drosophila orthologs. These ‘top10’ genes consisted of six queen genes (GB13072, GB11628, GB19380, GB14798, GB16047 and GB18242) and six worker genes (GB10869, GB12371, GB12239, GB10428, GB19006 and GB14758). The motif search was conducted separately on the two sets of UCR sequences using three methods: AlignACE (Roth et al., 1998), MEME (Bailey & Elkan, 1995) and MDscan (Liu et al., 2002). Default parameter values were used in all searches, except that GC content in intergenic regions was set to 25%, representing the background value established for the honey bee UCR database generated in this study.

The database, containing 10 156 UCR sequences, was generated by parsing the Official Set annotation file (downloaded in GFF format from http://www.beegenome.hgsc.bcm.tmc.edu/beeftp.html) to extract upstream regions starting from the terminal 5′-genomic coordinate of each predicted CDS. The UCRs were arbitrarily set to a size frame of 1000 nucleotides (Roth et al., 1998), but were trimmed whenever another predicted ORF was detected in any of these regions.
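
The window-and-trim rule for UCRs can be made concrete with a short Python sketch. It is an assumption-laden illustration, not the parser used to build the database: coordinates are taken as 1-based and inclusive, the function name `upstream_region` is hypothetical, and the scaffold length is not checked for genes on the minus strand.

```python
def upstream_region(gene, other_orfs, size=1000):
    """Return (start, end) of the upstream control region, or None.

    gene: (cds_start, cds_end, strand) on one scaffold.
    other_orfs: list of (start, end) spans of other predicted ORFs
    on the same scaffold.
    """
    cds_start, cds_end, strand = gene
    if strand == "+":
        # Upstream lies at lower coordinates than the CDS start.
        ucr_start, ucr_end = max(1, cds_start - size), cds_start - 1
        for o_start, o_end in other_orfs:
            if o_start <= ucr_end and o_end >= ucr_start:
                ucr_start = max(ucr_start, o_end + 1)
    elif strand == "-":
        # Upstream lies at higher coordinates than the CDS end.
        ucr_start, ucr_end = cds_end + 1, cds_end + size
        for o_start, o_end in other_orfs:
            if o_start <= ucr_end and o_end >= ucr_start:
                ucr_end = min(ucr_end, o_start - 1)
    else:
        raise ValueError("strand must be '+' or '-'")
    return (ucr_start, ucr_end) if ucr_start <= ucr_end else None
```

With this rule a gene packed tightly against its neighbour gets a shorter UCR, and a gene whose window is fully covered by another ORF gets none.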

The MAP (maximum a posteriori log likelihood) score, group specificity score (called Church score in this manuscript) (Hughes et al., 2000), ROC AUC (area under the curve for a receiver operating characteristic plot) metric, and MNCP (mean normalized conditional probability) metric (Clarke & Granek, 2003) were used to detect motifs that most likely correspond to biologically significant cis-regulatory elements. The filters run on the UCRs of the subsets of queen and worker genes were a MAP score cut-off value of 50, followed by a ROC AUC cut-off at 0.7, followed by a group specificity score cut-off at 1e−05. The UCR database of all GLEAN3-predicted honey bee genes was used as the background to calculate these metrics.
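
The sequential filtering described above, together with a rank-based ROC AUC, can be sketched as follows. This is a minimal illustration and not the TAMO-based code used in the study: the dictionary keys `map`, `roc_auc` and `church`, the function names, and the choice of inclusive comparisons are all assumptions, and the thresholds are passed in explicitly rather than hard-coded.

```python
def roc_auc(group_scores, background_scores):
    """Chance that a group UCR outscores a background UCR (ties = 0.5)."""
    if not group_scores or not background_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for g in group_scores:
        for b in background_scores:
            if g > b:
                wins += 1.0
            elif g == b:
                wins += 0.5
    return wins / (len(group_scores) * len(background_scores))


def filter_motifs(motifs, map_min, auc_min, church_max):
    """Apply the three cut-offs one after another, in the stated order."""
    kept = [m for m in motifs if m["map"] >= map_min]
    kept = [m for m in kept if m["roc_auc"] >= auc_min]
    kept = [m for m in kept if m["church"] <= church_max]
    return kept
```

Because every filter must be passed, the order of application does not change the final set; it only changes how many motifs reach the later, more expensive metrics.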

A parametric test (MANOVA) and a nonparametric test (Kolmogorov–Smirnov) were conducted to identify significance levels for the two sets of filtered motifs found in the UCRs of caste-specifically expressed genes against filtered motifs found in the random UCR set. Random motifs were sampled from a motif database (10 391 motifs) generated by running our script 100 times with random sets of UCR sequences.
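
As a reminder of what the nonparametric comparison measures, the two-sample Kolmogorov–Smirnov statistic is the largest vertical distance between the empirical distribution functions of two samples. The pure-Python sketch below computes only that statistic, not a p-value; which motif metric was compared is not specified here, so the inputs are simply two lists of scores, and the function name is hypothetical.

```python
def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic D (between 0 and 1)."""
    if not sample_a or not sample_b:
        raise ValueError("both samples must be non-empty")
    a, b = sorted(sample_a), sorted(sample_b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both empirical CDFs past every value equal to x.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d
```

A value of D near 1 means the caste-gene motifs and the random motifs barely overlap in score; a value near 0 means the two distributions are practically indistinguishable.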

The main criterion for identifying known regulatory motifs among these caste-specific ones was the alignment of the PSSM for each bee motif with the Drosophila melanogaster sequences in the TRANSFAC database (release 4.0) (Wingender et al., 2000). Only the alignments passing a threshold of 80% identity for each PSSM were considered as significant matches. In addition, we checked for a specific binding motif, the EcR/USP motif (rGkTCAaTGamcy-3′), known to function in the expression of genes responding to morphogenetic hormone titres (Perera et al., 2005).
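
A search for a degenerate motif such as the EcR/USP element can be sketched by translating IUPAC nucleotide codes into a regular expression and scanning both strands. The code below is an illustration only: it treats upper- and lower-case motif letters alike (an assumption about the notation), reports exact IUPAC matches rather than PSSM scores, and all function names are hypothetical.

```python
import re

IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "K": "[GT]", "M": "[AC]",
    "S": "[CG]", "W": "[AT]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGTRYKMSWBDHVN", "TGCAYRMKSWVHDBN")


def iupac_to_regex(motif):
    """Translate an IUPAC motif into a regular expression string."""
    return "".join(IUPAC[base] for base in motif.upper())


def reverse_complement(motif):
    """Reverse complement of a sequence or IUPAC motif."""
    return motif.upper().translate(COMPLEMENT)[::-1]


def find_motif(sequence, motif):
    """Sorted (0-based start, strand) hits; overlapping hits allowed."""
    sequence = sequence.upper()
    hits = set()
    for strand, m in (("+", motif), ("-", reverse_complement(motif))):
        # A lookahead lets finditer report overlapping occurrences.
        pattern = re.compile("(?=(%s))" % iupac_to_regex(m))
        for match in pattern.finditer(sequence):
            hits.add((match.start(), strand))
    return sorted(hits)
```

For instance, `find_motif(ucr_sequence, "rGkTCAaTGamcy")` would list every position in a UCR where the motif occurs on either strand, with minus-strand hits given by their plus-strand start coordinate.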

Operating system and programming tools

An Ubuntu Linux (version Breezy) operating system was used to implement all scripts and pipelines designed for annotation procedures and motif discovery. The Python programming language (http://www.python.org), Biopython (http://www.biopython.org) and TAMO (Tools for Analysis of Motifs) packages (Gordon et al., 2005) were used in program design. Other web applications were built using the Zope application server (http://www.zope.org), hosted at http://zulu.fmrp.usp.br/beelab.

Acknowledgements

This work was supported by grants from FAPESP (Fundação de Amparo a Pesquisa do Estado de São Paulo, 9900719-6) and CNPq (Conselho Nacional de Desenvolvimento Científico e Tecnológico, 4729632-3-1).

References

Abouheif, E. and Wray, G.A. (2002) Evolution of the genetic network underlying wing polyphenism in ants. Science 297: 249–252.

Al-Shahrour, F., Diaz-Uriarte, R. and Dopazo, J. (2004) FatiGO: a web tool for finding significant associations of Gene Ontology terms with groups of genes. Bioinformatics 20: 578–580.

Amdam, G.V., Norberg, K., Hagen, A. and Omholt, S.W. (2003) Social exploitation of vitellogenin. Proc Natl Acad Sci USA 100: 1799–1802.

Amdam, G.V., Simões, Z.L.P., Hagen, A., Norberg, K., Schroder, K., Mikkelsen, O. et al. (2004) Hormonal control of the yolk precursor vitellogenin regulates immune function and longevity in honeybees. Exp Gerontol 39: 767–773.

Antoniewski, C., O’Grady, M.S., Edmondson, R.G., Lassieur, S.M. and Benes, H. (1995) Characterization of an EcR/USP heterodimer target site that mediates ecdysone responsiveness of the Drosophila Lsp-2 gene. Mol Gen Genet 249: 545–556.

Arbeitman, M.N., Furlong, E.E.M., Imam, F., Johnson, E., Null, B.H., Baker, B.S. et al. (2002) Gene expression during the life cycle of Drosophila melanogaster. Science 297: 2270–2275.

Ashburner, M., Ball, C.A., Blake, J.A., Botstein, D., Butler, H., Cherry, J.M. et al. (2000) Gene Ontology: tool for the unification of biology. Nature Genet 25: 25–29.

Bailey, T.L. and Elkan, C. (1995) Unsupervised learning of multiple motifs in biopolymers using expectation maximization. Machine Learning J 21: 58–83.

Bergem, M., Norberg, K. and Aamodt, R.M. (2006) Long-term maintenance of in vitro cultured honeybee (Apis mellifera) embryonic cells. BMC Dev Biol 6: 17 (online).

Boutros, M., Kiger, A.A., Armknecht, S., Kerr, K., Hild, M., Koch, B. et al. (2004) Genome-wide RNAi analysis of growth and viability in Drosophila cells. Science 303: 832–835.

Carsten, L.D., Watts, T. and Markow, T.A. (2005) Gene expression patterns accompanying a dietary switch in Drosophila melanogaster. Mol Ecol 14: 3203–3208.

Clarke, N.D. and Granek, J.A. (2003) Rank order metrics for quantifying the association of sequence features with gene regulation. Bioinformatics 19: 212–218.

Colombani, J., Bianchini, L., Layalle, S., Pondeville, E., Dauphin-Villemant, C., Antoniewski, C. et al. (2005) Antagonistic actions of ecdysone and insulins determine final size in Drosophila. Science 310: 667–670.

Corona, M., Estrada, E. and Zurita, M. (1999) Differential expression of mitochondrial genes between queens and workers during caste determination in the honeybee Apis mellifera. J Exp Biol 202: 929–938.

Cunha, A.C., Nascimento, A.M., Guidugli, K.R., Simões, Z.L.P. and Bitondi, M.M.G. (2005) Molecular cloning and expression of a hexamerin cDNA from the honey bee. J Insect Physiol 51: 1135–1147.

Darwin, C.R. (1859) On the Origin of Species by Means of Natural Selection. John Murray, London.

Davidson, E.H. (2001) Genomic Regulatory Systems: Development and Evolution. Academic Press, San Diego.

Eder, J., Kremer, J.P. and Rembold, H. (1983) Correlation of cytochrome c titer and respiration in Apis mellifera: adaptive responses to caste determination defines workers, intercastes and queens. Comp Biochem Physiol B 76: 703–716.

Engels, W. (1974) Occurrence and significance of vitellogenins in female castes of social Hymenoptera. Am Zool 14: 1229–1237.

Evans, J.D. and Wheeler, D.E. (1999) Differential gene expression between developing queens and workers in the honey bee, Apis mellifera. Proc Natl Acad Sci USA 96: 5575–5580.

Evans, J.D. and Wheeler, D.E. (2000) Expression profiles during honeybee caste determination. Genome Biol 2: 1, 6 pages, online: http://genomebiology.com/2000/2/1/research/0001.1.

Forbes, A. and Lehmann, R. (1998) Nanos and Pumilio have critical roles in the development and function of Drosophila germline stem cells. Development 125: 679–690.

González-Reyes, A., Elliot, H. and St Johnston, D. (1995) Polarization of both major body axes in Drosophila by gurken-torpedo signalling. Nature 375: 654–658.

Goodisman, M.A., Isoe, J., Wheeler, D.E. and Wells, M.A. (2005) Evolution of insect metamorphosis: a microarray-based study of larval and adult gene expression in the ant Camponotus festinatus. Evolution 59: 858–870.

Gordon, D.B., Nekludova, L., McCallum, S. and Fraenkel, E. (2005) TAMO: a flexible, object-oriented framework for analysing transcriptional regulation using DNA-sequence motifs. Bioinformatics 21: 3164–3165.

Guidugli, K.R., Hepperle, C. and Hartfelder, K. (2004) A member of the short-chain dehydrogenase/reductase (SDR) superfamily is a target of the ecdysone response in honey bee (Apis mellifera) caste development. Apidologie 35: 37–47.

Guidugli, K.R., Nascimento, A.M., Amdam, G.V., Barchuk, A.R., Omholt, S.W., Simões, Z.L.P. and Hartfelder, K. (2005) Vitellogenin regulates hormonal dynamics in the worker caste of a eusocial insect. FEBS Lett 579: 4961–4965.

Hagedorn, H.H., Maddison, D.R. and Tu, Z. (1998) The evolution of vitellogenins, cyclorrhaphan yolk proteins and related molecules. Adv Insect Physiol 27: 335–384.

Hall, T.A. (1999) BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucleic Acids Symp Ser 41: 95–98.

Hamilton, W.D. (1964) The genetical theory of social behaviour. I & II. J Theor Biol 7: 1–16, 17–52.

Hartfelder, K. and Emlen, D.J. (2005) Endocrine control of insect polyphenism. In Comprehensive Insect Molecular Science (Gilbert, L.I., Iatrou, K. and Gill, S., eds), Vol. 3, pp. 651–703. Elsevier, Oxford.

Hartfelder, K. and Engels, W. (1998) Social insect polymorphism: Hormonal regulation of plasticity in development and reproduction in the honeybee. Curr Topics Dev Biol 40: 45–77.

Hawkins, N.C., Thorpe, J. and Schüpbach, T. (1996) encore, a gene required for the regulation of germ line mitosis and oocyte differentiation during Drosophila oogenesis. Development 122: 281–290.

Haydak, M.H. (1970) Honey bee nutrition. Annu Rev Entomol 15: 143–156.

Hepperle, C. and Hartfelder, K. (2001) Differentially expressed regulatory genes in honey bee caste development. Naturwissenschaften 88: 113–116.

Hoage, T.R. and Kessel, R.G. (1968) An electron microscopic study of the process of differentiation during spermatogenesis in the drone honey bee (Apis mellifera L.) with special reference to centriole replication and elimination. J Ultrastruct Res 24: 6–32.

Honey Bee Genome Sequencing Consortium (2006) Insights into social insects from the genome of the honeybee Apis mellifera. Nature (in press).

Hughes, J.D., Estep, P.W., Tavazoie, S. and Church, G.M. (2000) Computational identification of cis-regulatory elements associated with groups of functionally related genes in Saccharomyces cerevisiae. J Mol Biol 296: 1205–1214.

Hunt, G.J. and Page, R.E. (1995) Linkage map of the honey bee, Apis mellifera, based on RAPD markers. Genetics 139: 1371–1382.

Hunt, J.H., Buck, N.A. and Wheeler, D.E. (2003) Storage proteins in vespid wasps: characterization, developmental pattern, and occurrence in adults. J Insect Physiol 49: 785–794.

Judice, C.C., Carazzole, M.F., Festa, F., Sogayar, M.C., Hartfelder, K. and Pereira, G.A.G. (2006) Gene expression profiles underlying alternative caste phenotypes in a highly eusocial bee, Melipona quadrifasciata. Insect Mol Biol 15: 33–44.

Kerr, W.E. (1950) Genetic determination of castes in the genus Melipona. Genetics 35: 143–152.

Kropácová, S. and Haslbachová, H. (1971) The influence of queenlessness and of unsealed brood on the development of ovaries in worker honeybees. J Apicult Res 10: 57–61.

Lattorff, H.M.G., Moritz, R.F.A. and Fuchs, S. (2005) A single gene determines thelytokous parthenogenesis in honey bee workers (Apis mellifera capensis). Heredity 94: 533–537.

Lattorff, H.M.G., Moritz, R.F.A., Solignac, M. and Crewe, R.M. (2006) Control of reproductive dominance by the thelytoky gene in honeybees. Biol Lett (in press).

Li, T.R. and White, K.P. (2003) Tissue-specific gene expression and ecdysone-regulated genomic networks in Drosophila. Dev Cell 5: 59–72.

Lin, H., Yue, L. and Spradling, A.C. (1994) The Drosophila fusome, a germline-specific organelle, contains membrane skeletal proteins and functions in cyst formation. Development 120: 947–956.

Liu, X.S., Brutlag, D.L. and Liu, J.S. (2002) An algorithm for finding protein-DNA binding sites with applications to chromatin-immunoprecipitation microarray experiments. Nature Biotechnol 20: 835–839.

Liu, Y., Liu, S., Wei, L., Altman, R.B. and Batzoglou, S. (2004) Eukaryotic regulatory element conservation analysis and identification using comparative genomics. Genome Res 14: 354–366.

Martinez, T., Burmester, T., Veenstra, J.A. and Wheeler, D.E. (2001) Sequence and evolution of a hexamerin from the ant Camponotus festinatus. Insect Mol Biol 9: 427–431.

McKearin, D. (1997) The Drosophila fusome, organelle biogenesis and germ cell differentiation: if you build it… Bioessays 19: 147–152.

Mirth, C., Truman, J.W. and Riddiford, L.M. (2005) The role of the prothoracic gland in determining critical weight for metamorphosis in Drosophila melanogaster. Curr Biol 15: 1–12.

Moritz, R.F.A., Kryger, P. and Allsopp, M.H. (1996) Competition for royalty in bees. Nature 384: 31.

Page, R.E. and Erickson, E.H. (1988) Reproduction by worker honey bees (Apis mellifera L.). Behav Ecol Sociobiol 23: 117–126.

Pereboom, J.J.M., Jordan, W.C., Sumner, S., Hammond, R.L. and Bourke, A.F.G. (2005) Differential gene expression in queen-worker caste determination in bumble bees. Proc R Soc B 272: 1145–1152.

Perera, S.C., Zheng, S., Feng, Q.L., Krell, P.J., Retnakaran, A. and Palli, S.R. (2005) Heterodimerization of ecdysone receptor and ultraspiracle on symmetric and asymmetric response elements. Arch Insect Biochem Physiol 60: 55–70.

Piulachs, M.D., Guidugli, K.R., Barchuk, A.R., Cruz, J., Simões, Z.L.P. and Bellés, X. (2003) The vitellogenin of the honeybee, Apis mellifera: Structural analysis of the cDNA and expression studies. Insect Biochem Mol Biol 33: 459–465.

Pritsker, M., Liu, Y.C., Beer, M.A. and Tavazoie, S. (2004) Whole genome discovery of transcription factor binding sites by network-level conservation. Genome Res 14: 99–108.

Rachinsky, A., Strambi, C., Strambi, A. and Hartfelder, K. (1990) Caste and metamorphosis – hemolymph titers of juvenile hormone and ecdysteroids in last instar honeybee larvae. Gen Comp Endocr 79: 31–38.

Riddihough, G. and Pelham, H.R.B. (1987) An ecdysone response element in the Drosophila hsp27 promoter. EMBO J 6: 3729–3734.

Roth, F.P., Hughes, J.D., Estep, P.W. and Church, G.M. (1998) Finding DNA regulatory motifs within unaligned noncoding sequences clustered by whole-genome mRNA quantification. Nature Biotechnol 16: 939–945.

Rutherford, K., Parkhill, J., Crook, J., Horsnell, T., Rice, P., Rajandream, M.A. and Barrell, B. (2000) Artemis: sequence visualisation and annotation. Bioinformatics 16: 944–945.

Schmidt-Capella, I.C. and Hartfelder, K. (1998) Juvenile hormone effect on DNA synthesis and apoptosis in caste-specific differentiation of the larval honey bee (Apis mellifera L.) ovary. J Insect Physiol 44: 385–391.

Snodgrass, R.E. (1956) Anatomy of the Honey Bee. Cornell University Press, Ithaca.

Solignac, M., Vautrin, D., Baudry, E., Mougel, F., Loiseau, A. and Cornuet, J.M. (2004) A microsatellite-based linkage map of the honeybee, Apis mellifera L. Genetics 167: 253–262.

St Johnston, D. and Nüsslein-Volhard, C. (1992) The origin of pattern and polarity in the Drosophila embryo. Cell 68: 201–219.

Staab, S. and Steinmann-Zwicky, M. (1995) Female germ cells of Drosophila require zygotic ovo and otu product for survival in larvae and pupae respectively. Mech Dev 54: 205–210.

Sullivan, A.A. and Thummel, C.S. (2003) Temporal profiles of nuclear receptor gene expression profiles reveal coordinate transcriptional responses during Drosophila development. Mol Endocrinol 17: 2125–2137.

Wasserman, W.W. and Sandelin, A. (2004) Applied bioinformatics for the identification of regulatory elements. Nature Rev Genet 5: 267–287.

West-Eberhard, M.J. (2003) Developmental Plasticity and Evolution. Oxford University Press, Oxford.

Wheeler, D.E. and Nijhout, H.F. (2003) A perspective for understanding the modes of juvenile hormone as a lipid signaling system. Bioessays 25: 994–1001.

Wilson, E.O. (1971) The Insect Societies. Belknap Press of Harvard University Press, Cambridge, MA.

Wingender, E., Chen, X., Hehl, R., Karas, H., Liebich, I., Matys, V. et al. (2000) TRANSFAC: an integrated system for gene expression regulation. Nucleic Acids Res 28: 316–319.

Zhou, X., Oi, F.M. and Scharf, M.E. (2006a) Social exploitation of hexamerin: RNAi reveals a major caste-regulatory factor in termites. Proc Natl Acad Sci USA 103: 4499–4504.

Zhou, X., Tarver, M.R., Bennett, G.W., Oi, F.M. and Scharf, M.E. (2006b) Two hexamerin genes from the termite Reticulitermes flavipes: sequence, expression, and proposed functions in caste regulation. Gene 376: 47–58.

Zhu, Z., Shendure, J. and Church, G.M. (2005) Discovering functional transcription factor combinations in the human cell cycle. Genome Res 15: 848–855.

Supplementary material

The following material is available for this article online:

Table S1. Annotation results of caste-specifically expressed honey bee genes.

Table S2. Caste-specific motifs in UCRs of honey bee genes.

Table S3. Annotation of honey bee orthologs to Drosophila melanogaster genes involved in Reproduction.

This material is available as part of the online article from http://www.blackwell-synergy.com

represents a major transition from nonhypothesis-driven high-throughput screens to hypothesis-driven searches for context-dependent gene expression in honey bees. Such directed search results can serve as a platform for experimental analyses of genome-wide integration in hormonal control of caste development in bees. In addition, this study exemplifies how existing algorithms for detecting shared regulatory motifs can be joined into a toolkit for predicting coregulated gene expression patterns in honey bees. These methods have been shown to be robust and are gaining acceptance for use in functional and comparative genomics (Liu et al., 2004; Pritsker et al., 2004; Zhu et al., 2005).

Oogenesis and reproduction

As caste development sets the stage for reproductive division of labour, genes involved in reproductive processes are strong candidates for functional analyses. In the present study, we performed BLAST searches to identify honey bee orthologs for a list of 32 fly genes with the GO attribute ‘oogenesis’ and for four genes specifically related to ‘vitellogenesis’. The list of fly genes involved in nuclear events in germ cells consisted of 20 genes for ‘female meiosis’, 12 genes for ‘recombination’, and 21 genes under the heading ‘chromosome segregation, including segregation distortion’ (Supplementary material, Table S3). In some cases, these GO attributes for fly genes of course overlapped.

BLASTN and BLASTX searches for these fly genes against the honey bee genome assembly 3.0 and the GLEAN3 Official Set (aa) retrieved statistically well supported putative bee orthologs for most of these candidates. For the genes involved in meiosis, recombination and chromosome segregation, this finding, although not unexpected, is of interest, as meiosis in the haploid honey bee drone is strongly modified when compared with a normal diploid meiosis. The first meiosis is initiated, but the nucleus remains undivided and only the superfluous centrioles are eliminated as cytoplasmic buds (Hoage & Kessel, 1968). An interesting gene, thelytoky (th), has recently been mapped in this context (Lattorff et al., 2005). It almost completely prevents meiotic recombination in the automixis of laying workers of the Cape honey bee. As an indication of the interplay between meiosis and later development, this locus also appears to be an integral part of various gene cascades involved in caste determination (Lattorff et al., 2006).

The fly genes retrieved in the GO searches for ‘oogenesis’ represent a much larger range of Molecular Function categories, such as transcription factors, proteins regulating

Figure 4. Map of the group-specific motifs found in queen and worker UCRs of caste-specifically expressed genes. The coding region is represented by the GLEAN3 prediction number (assembly 4.0), with arrows indicating the translation start site. Asterisks mark UCRs of the ‘top10’ set used to find the over-represented motifs.

710 A S Cristino et al

copy 2006 The AuthorsJournal compilation copy 2006 The Royal Entomological Society Insect Molecular Biology 15 703ndash714

translation by RNA binding RNA helicases enzymes(ubiquitination transfer of sugar residues sulfotransferase)GTPase activity and several factors binding to cytoskeletalproteins (Supplementary material Table 3S) This widerange of functional categories is expected as these genesare involved in a series of different steps during oogenesisin the polytrophic meroistic ovary Oogenesis starts out withthe maintenance of germline and somatic stem cell identityin the germline niche in the upper germarium A key geneinvolved in this process is pumilio (Forbes amp Lehmann1998) which is represented by a highly conserved beeortholog GB10504-PA The second step is the formation ofgerm cell cysts the determination of an oocyte within eachcyst and the survival of these cysts involving genes suchas benign gonial cell neoplasm (Lin et al 1994) encore(Hawkins et al 1996) ovo and ovarian tumour (otu) (Staabamp Steinmann-Zwicky 1995) all well conserved in the honeybee genome Interestingly we could not find a clear beeortholog for bag of marbles (bam) which is one of the primeearly response genes in the cystoblast differentiationpathway in Drosophila (McKearin 1997)

The third step comprises previtellogenic growth of thefollicle and during these stages a number of maternalfactors are deposited and anchored either within the oocyteor in the perivitelline space that define the egg and thefuture embryonic axes (for review see St Johnston ampNuumlsslein-Volhard (1992) In the list of Drosophila genesinvolved in early steps of axis determination a couple ofsurprises came up in the search for honey bee orthologsA big surprise was that we could not find a gurken orthologin the bee even though this gene sets up both the anteriorndashposterior and dorsalndashventral axes in the Drosophila egg(Gonzaacutelez-Reyes et al 1995) whereas downstreamcomponents of the Gurken signalling cascade appear tobe preserved in the bee genome (Honey Bee GenomeSequencing Consortium 2006) Similar apparent gaps inconstituents of patterning cascades were noted for theterminal regions of the embryo such as a lack of a torsoortholog whereas its ligand torso-like is represented by awell conserved ortholog in the bee (GB18663-PA)

With respect to genes involved in the final processes of oogenesis, we primarily looked at genes that play a part during vitellogenesis. There are four genes of interest in this class, the primary one coding for the yolk protein precursor, vitellogenin. This gene has already been sequenced for the honey bee (Piulachs et al., 2003) and, as expected, it is much more closely related to vitellogenins of other insects and even vertebrates than to the Drosophila yolk proteins, which apparently are derived from lipases (Hagedorn et al., 1998). The second gene of interest is the bee ortholog to yolkless, as this (GB16571-PA) could represent a putative vitellogenin receptor. The other two Drosophila genes with clear orthologs in the bee are CG18641 and CG12139, which code for a lipase and an LDL receptor, respectively.

General conclusions

The current analysis made use of previous experimental analyses on differential transcription during caste development of honey bee larvae. In the annotation of these genes, which includes references to Gene Ontology terms associated with their respective Drosophila orthologs, two major configurations emerged. First of all, worker genes were better defined in terms of GO attributes compared with the relatively large number of queen genes that had no GO terms associated with their respective Drosophila orthologs. Even when taking into consideration the conceptual limits in attributing GO terms on molecular function and biological process from Drosophila orthologs to honey bee genes, this finding could have a bearing on basic questions in socioevolution, namely which caste is more divergent from a nonsocial reproductive female bee prototype or reproductive ground plan: the queen or the worker? Less speculative is the second major conclusion coming out of the GO analysis for Molecular Function, showing and confirming (Eder et al., 1983; Corona et al., 1999) the important role of metabolic regulation in caste development. This facet is demonstrated especially clearly in the caste-specific expression of oxidoreductases (queen) vs. hydrolases (workers).

The honey bee genome information not only provided a much improved annotation platform for caste-specifically expressed ESTs but, even more so, opens the possibility to explore putative regulatory features of the honey bee genome. In the current study we employed modified Gibbs sampling and expectation-maximization algorithms (AlignACE, MDScan, MEME) to detect group-specific motifs in gene regions up to 1000 bp upstream of translation start sites. We detected 14 motifs that were significantly overrepresented in the caste genes when compared with corresponding motifs found in a random set of GLEAN3-predicted honey bee genes. The localization of these motifs in UCRs of worker-overexpressed genes revealed a clustering close to the predicted basal promoter regions, suggesting strong regulatory effects. Such search strategies and the detected motifs can provide the lead to reveal and unravel cis-regulatory networks for and within specific contexts of honey bee biology.

Caste polyphenism in social insects makes a strong case for the emergence of novelties at a microevolutionary level (West-Eberhard, 2003). Without pretending to discuss exhaustively the mechanisms underlying this surge of developmental plasticity, two major themes become apparent in this and other studies. Regulatory change has been demonstrated in the shut-down of wing disc patterning cascades in ants (Abouheif & Wray, 2002) and is certainly also implicit in observed temporal changes in gene expression during postembryonic development of ants and bumble bees (Goodisman et al., 2005; Pereboom et al., 2005). Such change would be expected to involve cis-regulatory elements, that is, change in transcription factor binding sites in UCRs, as approached in this study, and also evolutionary change in response thresholds to circulating morphogenetic hormones (for review see Hartfelder & Emlen, 2005). The second and quite unexpected theme is the acquisition of new systemic functions by evolutionarily rather old proteins, such as vitellogenin and hexamerins. These apparently unspectacular proteins have evolved into key players for caste evolution and reproductive division of labour via novel regulatory connectivity with JH (Amdam et al., 2004; Guidugli et al., 2005; Zhou et al., 2006a,b).

Experimental procedures

Selection and annotation of ESTs representing differentially expressed genes in honey bee caste development

The starting point was a set of 164 entries (mainly 3′-ESTs) in GenBank (BG101532–BG101697) from an SSH library (Evans & Wheeler, 1999; Evans & Wheeler, 2000). When validated by macroarray analyses, a clustering into three major classes became apparent: (I) genes overexpressed in young larvae, (II) genes overexpressed in last instar queen larvae, and (III) genes overexpressed in last instar worker larvae. For this study, we excluded the class I ESTs because their expression is not caste-specific but rather represents expression differences between young (still bipotent) and older larvae. To the class II queen ESTs (82) we added one complete cDNA entry (AY601642) from a DDRT-PCR screen (Corona et al., 1999), and to the class III set of worker ESTs (40) we added seven GenBank dbEST entries (BG149167–BG149173) from a DDRT-PCR screen on ovary development (Hepperle & Hartfelder, 2001).

The EST sequences were submitted to BLASTN searches (parameters: -G 2 -E 3 -W 15 -F 'm D' -U -e 1e-20) against genome sequence assembly Amel_v3.0 to retrieve matches in linked or unlinked genomic contigs and to exclude no-matches (seven ESTs in queen). ESTs that aligned within the same scaffold were checked for clustering and overlap. This clustering also served to exclude genes that were represented by non-overlapping ESTs from both castes. This procedure generated a set of 51 unique putative gene sequences overexpressed in either queen (34) or worker larvae (17). These 51 nonredundant sequences were submitted to BLASTX searches against the Official Set of GLEAN3-predicted protein sequences (cut-off value at 1e−20). For ESTs with no significant protein sequence matches, the genomic regions adjacent to the mapped EST were searched to find neighbouring ORFs, especially those nearest to putative 3′ UTRs of predicted proteins, as the EST libraries have a bias in this direction.
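The check for clustering and overlap of ESTs on a scaffold can be illustrated with a short interval-merging sketch in Python. This is not the script used in the study: the tuple layout, the identifiers, and the rule that any shared base counts as an overlap are assumptions made for illustration only.

```python
from collections import defaultdict


def cluster_est_hits(hits):
    """Group EST alignments that overlap on the same scaffold.

    hits: iterable of (est_id, scaffold, start, end), 1-based, start <= end.
    Returns a list of dicts with keys scaffold, start, end and ests.
    """
    by_scaffold = defaultdict(list)
    for est_id, scaffold, start, end in hits:
        by_scaffold[scaffold].append((start, end, est_id))

    clusters = []
    for scaffold in sorted(by_scaffold):
        current = None
        # Walk the hits from left to right and extend the open cluster
        # whenever the next hit starts inside it.
        for start, end, est_id in sorted(by_scaffold[scaffold]):
            if current is not None and start <= current["end"]:
                current["end"] = max(current["end"], end)
                current["ests"].append(est_id)
            else:
                current = {"scaffold": scaffold, "start": start,
                           "end": end, "ests": [est_id]}
                clusters.append(current)
    return clusters


# Hypothetical alignments; names and coordinates are invented.
example_hits = [
    ("est_q1", "scaffold_A", 100, 400),
    ("est_q2", "scaffold_A", 350, 700),
    ("est_w1", "scaffold_A", 5000, 5300),
    ("est_w2", "scaffold_B", 10, 200),
]
clusters = cluster_est_hits(example_hits)
for c in clusters:
    print(c["scaffold"], c["start"], c["end"], c["ests"])
```

Each resulting cluster stands for one putative gene sequence; a cluster that contains ESTs from both castes would be a candidate for exclusion.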

Official Set protein sequences were aligned against the Amel_v3.0 sequence assembly using TBLASTN to map proteins to the genome, and subsequently they were aligned using BLASTP against the GenBank nonredundant (nr) and the Flybase protein sequence databases. The manual features annotation procedure of the Artemis 7.0 program (Rutherford et al., 2000) was used to map ORFs, putative splice sites of exons, and ESTs to genome coordinates. The final annotation file was generated with a Python script in GFF format (http://www.sanger.ac.uk/Software/formats/GFF).
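As an illustration of the output format, a GFF record is a single tab-delimited line with nine columns (sequence name, source, feature, start, end, score, strand, frame, attributes). The sketch below is a hypothetical helper, not the annotation script itself; the scaffold name, source tag and attribute strings are invented.

```python
def gff_line(seqname, source, feature, start, end, strand,
             score=".", frame=".", attributes=""):
    """Format one feature as a nine-column, tab-delimited GFF record."""
    if start > end:
        raise ValueError("start must not exceed end")
    if strand not in ("+", "-", "."):
        raise ValueError("strand must be '+', '-' or '.'")
    fields = [seqname, source, feature, str(start), str(end),
              str(score), strand, str(frame), attributes]
    return "\t".join(fields)


# Hypothetical records for one exon and one mapped EST.
records = [
    gff_line("scaffold_A", "manual", "exon", 1200, 1450, "+",
             attributes="gene_A"),
    gff_line("scaffold_A", "manual", "EST", 1300, 1700, "+",
             attributes="est_1"),
]
print("\n".join(records))
```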

Honey bee sequences annotated as orthologs to D. melanogaster genes were putatively assigned the GO terms listed in the respective Flybase entry. In addition, the definition of new GO terms (Biological Process ontology) related to caste development and polyphenism (GO:0048651 and GO:0048650, respectively) was co-ordinated with the Gene Ontology Consortium (Ashburner et al., 2000). The FatiGO web tool (Al-Sharour et al., 2004) was used to cluster GO terms (level 3 setting) for Biological Process and Molecular Function.

For the detection of conserved domains, the 51 protein sequences were screened against the Pfam database (http://www.sanger.ac.uk/Software/Pfam) using the HMMER platform (current release 2.3.2, http://hmmer.wustl.edu) with a cut-off value set at 1e−10.

Annotation of oogenesis and reproduction genes

In order to identify putative honey bee orthologs to D. melanogaster genes, we searched the following GO terms in Flybase: ‘oogenesis’ (GO:0009993), ‘vitellogenesis’ (GO:0007296), ‘female meiosis’ (GO:0007143), ‘DNA recombination’ (GO:0006310) and ‘chromosome segregation’ (GO:0007059). Genes related to segregation distortion were searched for in Flybase in phenotypic descriptions and mutant effects of D. melanogaster genes, as this phenomenon is not represented by a GO term. Hence, this group may be more heterogeneous than the others. From this list we removed genes of pleiotropic function (multifaceted GO entries in Biological Process) and genes that lacked defined transcripts in the Drosophila genome database.

For the GO terms ‘oogenesis’ and ‘vitellogenesis’, we performed TBLASTN and BLASTP searches for 42 fruit fly genes against the Amel_v3.0 genome assembly and the GLEAN3-predicted protein sequences (Honey Bee Genome Sequencing Consortium, 2006), respectively. The orthologous D. melanogaster gene was characterized by the same procedure as described above (reciprocal best hit). For the GO terms ‘female meiosis’, ‘DNA recombination’, ‘chromosome segregation’ and the non-GO group ‘segregation distortion’, transcripts of D. melanogaster were searched against nr databases at NCBI using BLASTP. The obtained sequences were searched against the Amel_v3.0 genome assembly and the GLEAN3-predicted protein sequences using TBLASTN. Homologous sequences (threshold 1e−10) were predicted using the BioEdit software (Hall, 1999). ORFs showing significant homology (BLASTP threshold 1e−20) were assembled and used in BLASTP searches against the nr databases at NCBI.
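The reciprocal best hit criterion can be sketched in a few lines of Python. This is a generic illustration under stated assumptions, not the pipeline used here: hits are taken to be already parsed into (query, subject, E-value) tuples, the best hit is the one with the lowest E-value, and the identifiers and E-values below are invented.

```python
def best_hits(rows):
    """Map each query to the subject with the lowest E-value.

    rows: iterable of (query, subject, evalue) tuples.
    """
    best = {}
    for query, subject, evalue in rows:
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(fly_vs_bee, bee_vs_fly, max_evalue):
    """Return (fly, bee) pairs that are each other's best hit."""
    forward = best_hits(r for r in fly_vs_bee if r[2] <= max_evalue)
    backward = best_hits(r for r in bee_vs_fly if r[2] <= max_evalue)
    return sorted((fly, bee) for fly, bee in forward.items()
                  if backward.get(bee) == fly)


# Hypothetical search results.
fly_vs_bee = [("fly_A", "bee_1", 1e-80), ("fly_A", "bee_2", 1e-30),
              ("fly_B", "bee_2", 1e-15), ("fly_C", "bee_3", 1e-5)]
bee_vs_fly = [("bee_1", "fly_A", 1e-75), ("bee_2", "fly_A", 1e-28),
              ("bee_2", "fly_B", 1e-14), ("bee_3", "fly_C", 1e-4)]

orthologs = reciprocal_best_hits(fly_vs_bee, bee_vs_fly, max_evalue=1e-10)
print(orthologs)
```

In this toy data set fly_B finds bee_2, but bee_2 finds fly_A first, so only the fly_A/bee_1 pair qualifies. A fly gene without any reciprocal partner corresponds to the "no clear ortholog" outcome reported for some genes.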

Motif search in upstream regions in caste-specifically expressed genes

In order to detect overrepresented motifs in the upstream control regions (UCRs) of the two sets of caste-related genes, we selected gene subsets based on two criteria: (1) those that had shown the highest caste-specificity in the array analyses (Evans & Wheeler, 2000), and (2) those that had a conserved 5′ region when compared with the Drosophila orthologs. These ‘top10’ genes consisted of six queen genes (GB13072, GB11628, GB19380, GB14798, GB16047 and GB18242) and six worker genes (GB10869, GB12371, GB12239, GB10428, GB19006 and GB14758). The motif search was conducted separately on the two sets of UCR sequences using three methods: AlignACE (Roth et al., 1998), MEME (Bailey & Elkan, 1995) and MDscan (Liu et al., 2002). Default parameter values were used in all searches, except that GC content in intergenic regions was set to 25%, representing the background value established for the honey bee UCR database generated in this study.


The database containing 10 156 UCR sequences was generated by parsing the Official Set annotation file (downloaded in GFF format from http://www.beegenome.hgsc.bcm.tmc.edu/beeftp.html) to extract upstream regions starting from the terminal 5′-genomic coordinate of each predicted CDS. The UCRs were arbitrarily set to a size frame of 1000 nucleotides (Roth et al., 1998), but were trimmed whenever another predicted ORF was detected in any of these regions.
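The coordinate logic behind such an extraction can be sketched as follows. This is a minimal illustration, not the parser used to build the database: it assumes 1-based inclusive coordinates, treats the neighbouring ORFs as a plain list of intervals on the same scaffold, and does not check the scaffold length on the minus strand.

```python
def ucr_interval(cds_start, cds_end, strand, neighbours, size=1000):
    """Return the (start, end) of the upstream region, or None if empty.

    cds_start, cds_end: 1-based inclusive CDS coordinates (start <= end).
    strand: '+' or '-'.
    neighbours: (start, end) intervals of other predicted ORFs.
    """
    if strand == "+":
        # Upstream lies to the left of the CDS.
        start, end = max(1, cds_start - size), cds_start - 1
        for n_start, n_end in neighbours:
            if n_start <= end and n_end >= start:
                start = max(start, n_end + 1)   # trim at the neighbour
    else:
        # On the minus strand, upstream lies to the right of the CDS.
        start, end = cds_end + 1, cds_end + size
        for n_start, n_end in neighbours:
            if n_start <= end and n_end >= start:
                end = min(end, n_start - 1)     # trim at the neighbour
    return (start, end) if start <= end else None


# Hypothetical genes and neighbours.
print(ucr_interval(5000, 6000, "+", []))              # full 1000 nt window
print(ucr_interval(5000, 6000, "+", [(4200, 4500)]))  # trimmed on the left
print(ucr_interval(5000, 6000, "-", [(6400, 6900)]))  # trimmed on the right
```

For a minus-strand gene the extracted sequence would additionally have to be reverse-complemented before motif searching.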

The MAP (maximum a posteriori log likelihood) score, group specificity score (called Church score in this manuscript) (Hughes et al., 2000), ROC AUC (area under the curve for a receiver-operator characteristic plot) metric and MNCP (mean normalized conditional probability) metric (Clarke & Granek, 2003) were used to detect motifs that most likely correspond to biologically significant cis-regulatory elements. The filters run on the UCRs of the subsets of queen and worker genes were a MAP score cut-off value of 5.0, followed by a ROC AUC cut-off at 0.7, followed by a group specificity score cut-off at 1e−05. The UCR database of all GLEAN3-predicted honey bee genes was used as the background to calculate these metrics.
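A sequential filter of this kind, together with a rank-based ROC AUC, can be sketched as below. The sketch does not reproduce the TAMO implementation: the dictionary keys, the motif names and all scores are invented, the cut-off values are passed in as arguments, and the AUC is computed by simple pairwise comparison of target and background scores.

```python
def roc_auc(target_scores, background_scores):
    """Probability that a target sequence outscores a background one.

    Ties count as half a win.
    """
    wins = 0.0
    for t in target_scores:
        for b in background_scores:
            if t > b:
                wins += 1.0
            elif t == b:
                wins += 0.5
    return wins / (len(target_scores) * len(background_scores))


def passes_filters(motif, map_min, auc_min, church_max):
    """Apply the three cut-offs in sequence: MAP, ROC AUC, group specificity."""
    if motif["map"] < map_min:
        return False
    if motif["roc_auc"] < auc_min:
        return False
    return motif["church"] <= church_max


# Hypothetical per-sequence motif scores for a gene set and a background.
auc = roc_auc([0.9, 0.8, 0.4], [0.5, 0.3, 0.2, 0.1])

# Hypothetical motifs with precomputed metrics.
motifs = [
    {"name": "motif_1", "map": 8.2, "roc_auc": 0.91, "church": 3e-7},
    {"name": "motif_2", "map": 3.1, "roc_auc": 0.95, "church": 1e-9},
    {"name": "motif_3", "map": 9.0, "roc_auc": 0.62, "church": 1e-8},
    {"name": "motif_4", "map": 6.5, "roc_auc": 0.80, "church": 2e-3},
]
kept = [m["name"] for m in motifs
        if passes_filters(m, map_min=5.0, auc_min=0.7, church_max=1e-5)]
print(round(auc, 3), kept)
```

Because the group specificity score is a probability, smaller values indicate stronger enrichment, which is why that cut-off is applied as an upper bound.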

A parametric test (MANOVA) and a nonparametric test (Kolmogorov–Smirnov) were conducted to identify significance levels for the two sets of filtered motifs found in the UCRs of caste-specifically expressed genes against filtered motifs found in the random UCR set. Random motifs were sampled from a motif database (10 391 motifs) generated by running our script 100 times with random sets of UCR sequences.
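For the nonparametric comparison, the two-sample Kolmogorov–Smirnov statistic is the largest vertical distance between the two empirical distribution functions. The sketch below computes only this statistic, on invented numbers; it says nothing about which motif metric was compared in the study and does not compute a P value, for which an established routine such as scipy.stats.ks_2samp would normally be used.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic D (maximum ECDF distance)."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    # The ECDFs only change at observed values, so checking those suffices.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


# Hypothetical metric values for caste motifs and random motifs.
caste_scores = [1, 2, 3, 4]
random_scores = [3, 4, 5, 6]
d_stat = ks_statistic(caste_scores, random_scores)
print(d_stat)
```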

The main criterion for identifying known regulatory motifs among these caste-specific ones was the alignment of the PSSM for each bee motif with the Drosophila melanogaster sequences in the TRANSFAC database (release 4.0) (Wingender et al., 2000). Only the alignments passing a threshold of 80% identity for each PSSM were considered significant matches. In addition, we checked for a specific binding motif, the EcR/USP motif (rGkTCAaTGamcy-3′), known to function in the expression of genes responding to morphogenetic hormone titres (Perera et al., 2005).
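A search for a degenerate consensus such as the EcR/USP motif can be sketched by translating IUPAC nucleotide codes into a regular expression. The sketch makes several assumptions that are not taken from the study: upper- and lower-case letters in the consensus are treated alike, only exact matches on the forward strand are reported, and the test sequence is invented.

```python
import re

# Standard IUPAC nucleotide codes expressed as regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "K": "[GT]", "M": "[AC]",
    "S": "[CG]", "W": "[AT]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def iupac_to_regex(motif):
    """Translate an IUPAC consensus string into a regex pattern string."""
    return "".join(IUPAC[ch] for ch in motif.upper())


def find_sites(motif, sequence):
    """Return 0-based start positions of all (possibly overlapping) matches."""
    # The lookahead lets overlapping occurrences be reported separately.
    pattern = re.compile("(?=" + iupac_to_regex(motif) + ")")
    return [m.start() for m in pattern.finditer(sequence.upper())]


ECR_USP = "rGkTCAaTGamcy"
# Hypothetical upstream sequence containing one matching site.
sites = find_sites(ECR_USP, "TTAGGTCAATGACCTAA")
print(iupac_to_regex(ECR_USP), sites)
```

Scanning the reverse complement of each UCR with the same function would cover sites on the opposite strand.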

Operating system and programming tools

An Ubuntu Linux (version Breezy) operating system was used to implement all scripts and pipelines designed for annotation procedures and motif discovery. The Python programming language (http://www.python.org), Biopython (http://www.biopython.org) and TAMO (Tools for Analysis of Motifs) packages (Gordon et al., 2005) were used in program design. Other web applications were built using the Zope application server (http://www.zope.org), hosted at http://zulu.fmrp.usp.br/beelab.

Acknowledgements

This work was supported by grants from FAPESP (Fundação de Amparo a Pesquisa do Estado de São Paulo, 99/00719-6) and CNPq (Conselho Nacional de Desenvolvimento Científico e Tecnológico, 4729632-3-1).

References

Abouheif E and Wray GA (2002) Evolution of the genetic network underlying wing polyphenism in ants. Science 297: 249–252.

Al-Sharour F, Diaz-Uriarte R and Dopazo J (2004) FatiGO: a web tool for finding significant associations of Gene Ontology terms with groups of genes. Bioinformatics 20: 578–580.

Amdam GV, Norberg K, Hagen A and Omholt SW (2003) Social exploitation of vitellogenin. Proc Natl Acad Sci USA 100: 1799–1802.

Amdam GV, Simões ZLP, Hagen A, Norberg K, Schroder K, Mikkelsen O, et al. (2004) Hormonal control of the yolk precursor vitellogenin regulates immune function and longevity in honeybees. Exp Gerontol 39: 767–773.

Antoniewski C, O'Grady MS, Edmondson RG, Lassieur SM and Benes H (1995) Characterization of an EcR/USP heterodimer target site that mediates ecdysone responsiveness of the Drosophila Lsp-2 gene. Mol Gen Genet 249: 545–556.

Arbeitsman MN, Furlong EEM, Imam F, Johnson E, Null BH, Baker BS, et al. (2002) Gene expression during the life cycle of Drosophila melanogaster. Science 297: 2270–2275.

Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, et al. (2000) Gene Ontology: tool for the unification of biology. Nature Genet 25: 25–29.

Bailey TL and Elkan D (1995) Unsupervised learning of multiple motifs in biopolymers using expectation maximization. Machine Learning J 21: 58–83.

Bergem M, Norberg K and Aamodt RM (2006) Long-term maintenance of in vitro cultured honeybee (Apis mellifera) embryonic cells. BMC Dev Biol 6: 17 (online).

Boutros M, Kiger AA, Armknecht S, Kerr K, Hild M, Koch B, et al. (2004) Genome-wide RNAi analysis of growth and viability in Drosophila cells. Science 303: 832–835.

Carsten LD, Watts T and Markov TA (2005) Gene expression patterns accompanying a dietary switch in Drosophila melanogaster. Mol Ecol 14: 3203–3208.

Clarke ND and Granek JA (2003) Rank order metrics for quantifying the association of sequence features with gene regulation. Bioinformatics 19: 212–218.

Colombani J, Bianchini L, Layalle S, Pondeville E, Dauphin-Villemant C, Antoniewski C, et al. (2005) Antagonistic actions of ecdysone and insulins determine final size in Drosophila. Science 310: 667–670.

Corona M, Estrada E and Zurita M (1999) Differential expression of mitochondrial genes between queens and workers during caste determination in the honeybee Apis mellifera. J Exp Biol 202: 929–938.

Cunha AC, Nascimento AM, Guidugli KR, Simões ZLP and Bitondi MMG (2005) Molecular cloning and expression of a hexamerin cDNA from the honey bee. J Insect Physiol 51: 1135–1147.

Darwin CR (1859) On the Origin of Species by Means of Natural Selection. John Murray, London.

Davidson EH (2001) Genomic Regulatory Systems: Development and Evolution. Academic Press, San Diego.

Eder J, Kremer JP and Rembold H (1983) Correlation of cytochrome c titer and respiration in Apis mellifera: adaptive responses to caste determination defines workers, intercastes and queens. Comp Biochem Physiol B 76: 703–716.

Engels W (1974) Occurrence and significance of vitellogenins in female castes of social Hymenoptera. Am Zool 14: 1229–1237.

Evans JD and Wheeler DE (1999) Differential gene expression between developing queens and workers in the honey bee, Apis mellifera. Proc Natl Acad Sci USA 96: 5575–5580.


Evans JD and Wheeler DE (2000) Expression profiles during honeybee caste determination. Genome Biol 2: 1 (6 pages, online, http://genomebiology.com/2000/2/1/research/0001.1).

Forbes A and Lehmann R (1998) Nanos and Pumilio have critical roles in the development and function of Drosophila germline stem cells. Development 125: 679–690.

González-Reyes A, Elliot H and St Johnston D (1995) Polarization of both major body axes in Drosophila by gurken-torpedo signalling. Nature 375: 654–658.

Goodisman MA, Isoe J, Wheeler DE and Wells MA (2005) Evolution of insect metamorphosis: a microarray-based study of larval and adult gene expression in the ant Camponotus festinatus. Evolution 59: 858–870.

Gordon DB, Nekludova L, McCallum S and Fraenkel E (2005) TAMO: a flexible, object-oriented framework for analysing transcriptional regulation using DNA-sequence motifs. Bioinformatics 21: 3164–3165.

Guidugli KR, Hepperle C and Hartfelder K (2004) A member of the short-chain dehydrogenase/reductase (SDR) superfamily is a target of the ecdysone response in honey bee (Apis mellifera) caste development. Apidologie 35: 37–47.

Guidugli KR, Nascimento AM, Amdam GV, Barchuk AR, Omholt SW, Simões ZLP and Hartfelder K (2005) Vitellogenin regulates hormonal dynamics in the worker caste of a eusocial insect. FEBS Lett 579: 4961–4965.

Hagedorn HH, Maddison DR and Tu Z (1998) The evolution of vitellogenins, cyclorrhaphan yolk proteins and related molecules. Adv Insect Physiol 27: 335–384.

Hall TA (1999) BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucleic Acids Symp Ser 41: 95–98.

Hamilton WD (1964) The genetical theory of social behaviour I & II. J Theor Biol 7: 1–16, 17–52.

Hartfelder K and Emlen DJ (2005) Endocrine control of insect polyphenism. In Comprehensive Insect Molecular Science (Gilbert LI, Iatrou K and Gill S, eds), Vol. 3, pp. 651–703. Elsevier, Oxford.

Hartfelder K and Engels W (1998) Social insect polymorphism: Hormonal regulation of plasticity in development and reproduction in the honeybee. Curr Topics Dev Biol 40: 45–77.

Hawkins NC, Thorpe J and Schüpbach T (1996) encore, a gene required for the regulation of germ line mitosis and oocyte differentiation during Drosophila oogenesis. Development 122: 281–290.

Haydak MH (1970) Honey bee nutrition. Annu Rev Entomol 15: 143–156.

Hepperle C and Hartfelder K (2001) Differentially expressed regulatory genes in honey bee caste development. Naturwissenschaften 88: 113–116.

Hoage TR and Kessel RG (1968) An electron microscopic study of the process of differentiation during spermatogenesis in the drone honey bee (Apis mellifera L.) with special reference to centriole replication and elimination. J Ultrastruct Res 24: 6–32.

Honey Bee Genome Sequencing Consortium (2006) Insights into social insects from the genome of the honeybee Apis mellifera. Nature (in press).

Hughes JD, Estep PW, Tavazole S and Church GM (2000) Computational identification of cis-regulatory elements associated with groups of functionally related genes in Saccharomyces cerevisiae. J Mol Biol 296: 1205–1214.

Hunt GJ and Page RE (1995) Linkage map of the honey bee, Apis mellifera, based on RAPD markers. Genetics 139: 1371–1382.

Hunt JH, Buck NA and Wheeler DE (2003) Storage proteins in vespid wasps: characterization, developmental pattern, and occurrence in adults. J Insect Physiol 49: 785–794.

Judice CC, Carazzole MF, Festa F, Sogayar MC, Hartfelder K and Pereira GAG (2006) Gene expression profiles underlying alternative caste phenotypes in a highly eusocial bee, Melipona quadrifasciata. Insect Mol Biol 15: 33–44.

Kerr WE (1950) Genetic determination of castes in the genus Melipona. Genetics 35: 143–152.

Kropácová S and Haslbachová H (1971) The influence of queenlessness and of unsealed brood on the development of ovaries in worker honeybees. J Apicult Res 10: 57–61.

Lattorff HMG, Moritz RFA and Fuchs S (2005) A single gene determines thelytokous parthenogenesis in honey bee workers (Apis mellifera capensis). Heredity 94: 533–537.

Lattorff HMG, Moritz RFA, Solignac M and Crewe RM (2006) Control of reproductive dominance by the thelytoky gene in honeybees. Biol Lett (in press).

Li TR and White KP (2003) Tissue-specific gene expression and ecdysone-regulated genomic networks in Drosophila. Dev Cell 5: 59–72.

Lin H, Yue L and Spradling AC (1994) The Drosophila fusome, a germline-specific organelle, contains membrane skeletal proteins and functions in cyst formation. Development 120: 947–956.

Liu XS, Brutlag DL and Liu JS (2002) An algorithm for finding protein-DNA binding sites with applications to chromatin-immunoprecipitation microarray experiments. Nature Biotechnol 20: 835–839.

Liu Y, Liu S, Wei L, Altman RB and Batzoglou S (2004) Eukaryotic regulatory element conservation analysis and identification using comparative genomics. Genome Res 14: 354–366.

Martinez T, Burmester T, Veenstra JA and Wheeler DE (2001) Sequence and evolution of a hexamerin from the ant Camponotus festinatus. Insect Mol Biol 9: 427–431.

McKearin D (1997) The Drosophila fusome, organelle biogenesis and germ cell differentiation: if you build it… Bioessays 19: 147–152.

Mirth C, Truman JW and Riddiford LM (2005) The role of the prothoracic gland in determining critical weight for metamorphosis in Drosophila melanogaster. Curr Biol 15: 1–12.

Moritz RFA, Kryger P and Allsopp MH (1996) Competition for royalty in bees. Nature 384: 31.

Page RE and Erickson EH (1988) Reproduction by worker honey bees (Apis mellifera L.). Behav Ecol Sociobiol 23: 117–126.

Pereboom JJM, Jordan WC, Sumner S, Hammond RL and Bourke AFG (2005) Differential gene expression in queen-worker caste determination in bumble bees. Proc R Soc B 272: 1145–1152.

Perera SC, Zheng S, Feng QL, Krell PJ, Retnakaran A and Palli SR (2005) Heterodimerization of ecdysone receptor and ultraspiracle on symmetric and asymmetric response elements. Arch Insect Biochem Physiol 60: 55–70.

Piulachs MD, Guidugli KR, Barchuk AR, Cruz J, Simões ZLP and Bellés X (2003) The vitellogenin of the honeybee, Apis mellifera: Structural analysis of the cDNA and expression studies. Insect Biochem Mol Biol 33: 459–465.

Pritsker M, Liu YC, Beer MA and Tavazole S (2004) Whole genome discovery of transcription factor binding sites by network-level conservation. Genome Res 14: 99–108.

Rachinsky A, Strambi C, Strambi A and Hartfelder K (1990) Caste and metamorphosis – hemolymph titers of juvenile hormone and ecdysteroids in last instar honeybee larvae. Gen Comp Endocr 79: 31–38.

Riddihough G and Pelham HRB (1987) An ecdysone response element in the Drosophila hsp27 promoter. EMBO J 6: 3729–3734.

Roth FP, Hughes JD, Estep PW and Church GM (1998) Finding DNA regulatory motifs within unaligned noncoding sequences clustered by whole-genome mRNA quantification. Nature Biotechnol 16: 939–945.

Rutherford K, Parkhill J, Crook J, Horsnell T, Rice P, Rajandream MA and Barrell B (2000) Artemis: sequence visualisation and annotation. Bioinformatics 16: 944–945.

Schmidt-Capella IC and Hartfelder K (1998) Juvenile hormone effect on DNA synthesis and apoptosis in caste-specific differentiation of the larval honey bee (Apis mellifera L.) ovary. J Insect Physiol 44: 385–391.

Snodgrass RE (1956) Anatomy of the Honey Bee. Cornell University Press, Ithaca.

Solignac M, Vautrin D, Baudry E, Mougel F, Loiseau A and Cornuet JM (2004) A microsatellite-based linkage map of the honeybee, Apis mellifera L. Genetics 167: 253–262.

St Johnston D and Nüsslein-Volhard C (1992) The origin of pattern and polarity in the Drosophila embryo. Cell 68: 201–219.

Staab S and Steinmann-Zwicky M (1995) Female germ cells of Drosophila require zygotic ovo and otu product for survival in larvae and pupae respectively. Mech Dev 54: 205–210.

Sullivan AA and Thummel CS (2003) Temporal profiles of nuclear receptor gene expression reveal coordinate transcriptional responses during Drosophila development. Mol Endocrinol 17: 2125–2137.

Wassermann WW and Sandelin A (2004) Applied bioinformatics for the identification of regulatory elements. Nature Rev Genet 5: 267–287.

West-Eberhard MJ (2003) Developmental Plasticity and Evolution. Oxford University Press, Oxford.

Wheeler DE and Nijhout HF (2003) A perspective for understanding the modes of juvenile hormone as a lipid signaling system. Bioessays 25: 994–1001.

Wilson EO (1971) The Insect Societies. Belknap Press of Harvard University Press, Cambridge, MA.

Wingender E, Chen X, Hehl R, Karas H, Liebich I, Matys V, et al. (2000) TRANSFAC: an integrated system for gene expression regulation. Nucleic Acids Res 28: 316–319.

Zhou X, Oi FM and Scharf ME (2006a) Social exploitation of hexamerin: RNAi reveals a major caste-regulatory factor in termites. Proc Natl Acad Sci USA 103: 4499–4504.

Zhou X, Tarver MR, Bennett GW, Oi FM and Scharf ME (2006b) Two hexamerin genes from the termite Reticulitermes flavipes: sequence, expression, and proposed functions in caste regulation. Gene 376: 47–58.

Zhu Z, Shendure J and Church GM (2005) Discovering functional transcription factor combinations in the human cell cycle. Genome Res 15: 848–855.

Supplementary material

The following material is available for this article online:

Table S1. Annotation results of caste-specifically expressed honey bee genes.

Table S2. Caste-specific motifs in UCRs of honey bee genes.

Table S3. Annotation of honey bee orthologs to Drosophila melanogaster genes involved in Reproduction.

This material is available as part of the online article from http://www.blackwell-synergy.com

Page 8: Caste development and reproduction: a genome-wide analysis of hallmarks of insect eusociality

710 A S Cristino et al

copy 2006 The AuthorsJournal compilation copy 2006 The Royal Entomological Society Insect Molecular Biology 15 703ndash714

translation by RNA binding RNA helicases enzymes(ubiquitination transfer of sugar residues sulfotransferase)GTPase activity and several factors binding to cytoskeletalproteins (Supplementary material Table 3S) This widerange of functional categories is expected as these genesare involved in a series of different steps during oogenesisin the polytrophic meroistic ovary Oogenesis starts out withthe maintenance of germline and somatic stem cell identityin the germline niche in the upper germarium A key geneinvolved in this process is pumilio (Forbes amp Lehmann1998) which is represented by a highly conserved beeortholog GB10504-PA The second step is the formation ofgerm cell cysts the determination of an oocyte within eachcyst and the survival of these cysts involving genes suchas benign gonial cell neoplasm (Lin et al 1994) encore(Hawkins et al 1996) ovo and ovarian tumour (otu) (Staabamp Steinmann-Zwicky 1995) all well conserved in the honeybee genome Interestingly we could not find a clear beeortholog for bag of marbles (bam) which is one of the primeearly response genes in the cystoblast differentiationpathway in Drosophila (McKearin 1997)

The third step comprises previtellogenic growth of thefollicle and during these stages a number of maternalfactors are deposited and anchored either within the oocyteor in the perivitelline space that define the egg and thefuture embryonic axes (for review see St Johnston ampNuumlsslein-Volhard (1992) In the list of Drosophila genesinvolved in early steps of axis determination a couple ofsurprises came up in the search for honey bee orthologsA big surprise was that we could not find a gurken orthologin the bee even though this gene sets up both the anteriorndashposterior and dorsalndashventral axes in the Drosophila egg(Gonzaacutelez-Reyes et al 1995) whereas downstreamcomponents of the Gurken signalling cascade appear tobe preserved in the bee genome (Honey Bee GenomeSequencing Consortium 2006) Similar apparent gaps inconstituents of patterning cascades were noted for theterminal regions of the embryo such as a lack of a torsoortholog whereas its ligand torso-like is represented by awell conserved ortholog in the bee (GB18663-PA)

With respect to genes involved in the final processes ofoogenesis we primarily looked at genes that play a partduring vitellogenesis There are four genes of interest in thisclass the primary one coding for the yolk protein precursorvitellogenin This gene has already been sequenced for thehoney bee (Piulachs et al 2003) and as expected it ismuch more related to vitellogenins of other insects andeven vertebrates than to the Drosophila yolk proteinswhich apparently are derived from lipases (Hagedorn et al1998) The second gene of interest is the bee ortholog toyolkless as this (GB16571-PA) could represent a putativevitellogenin receptor The other two Drosophila genes withclear orthologs in the bee are CG18641 and CG12139which code for a lipase and an LDL receptor respectively

General conclusions

The current analysis made use of previous experimentalanalyses on differential transcription during caste develop-ment of honey bee larvae In the annotation of these geneswhich includes references to Gene Ontology terms asso-ciated with their respective Drosophila orthologs two majorconfigurations emerged First of all worker genes werebetter defined in terms of GO attributes compared with therelatively large number of queen genes that had no GOterms associated to their respective Drosophila orthologsEven when taking into consideration the conceptual limitsin attributing GO terms on molecular function and biologicalprocess from Drosophila orthologs to honey bee genesthis finding could have a bearing on general basicquestions in socioevolution namely which caste is moredivergent from a nonsocial reproductive female bee proto-type or reproductive ground plan the queen or the workerLess speculative is the second major conclusion comingout of the GO analysis for Molecular Function showing andconfirming (Eder et al 1983 Corona et al 1999) theimportant role of metabolic regulation in caste developmentThis facet is demonstrated especially clearly in thecaste-specific expression of oxidoreductases (queen)vs hydrolases (workers)

The honey bee genome information provided not only amuch improved annotation platform for caste-specificallyexpressed ESTs but even more so opens the possibilityto explore putative regulatory features of the honey beegenome In the current study we employed modified Gibbssampling and expectation-maximization algorithms (Alig-nACE MDScan MEME) to detect group-specific motifs ingene regions up to 1000 bp upstream of translation startsites We detected 14 motifs that were significantly over-represented in the caste genes when compared withcorresponding motifs found in a random set of GLEAN3-predicted honey bee genes The localization of suchmotifs in UCRs of worker-overexpressed genes revealeda clustering of such motifs close to the predicted basalpromotor regions suggesting strong regulatory effectsSuch search strategies and the detected motifs can providethe lead to reveal and unravel cis-regulatory networks forand within specific contexts of honey bee biology

Caste polyphenism in social insects makes a strong case for the emergence of novelties at a microevolutionary level (West-Eberhard, 2003). Without the pretension to discuss exhaustively the mechanisms underlying this surge of developmental plasticity, two major themes become apparent in this and other studies. Regulatory change has been demonstrated in the shut-down of wing disc patterning cascades in ants (Abouheif & Wray, 2002) and is certainly also implicit in observed temporal changes in gene expression during postembryonic development of ants and bumble bees (Goodisman et al., 2005; Pereboom et al., 2005). Such change would be expected to involve cis-regulatory elements, that is, change in transcription factor binding sites in UCRs, as approached in this study, and also evolutionary change in response thresholds to circulating morphogenetic hormones (for review see Hartfelder & Emlen, 2005). The second, and quite unexpected, theme is the acquisition of new systemic functions by evolutionarily rather old proteins, such as vitellogenin and hexamerins. These apparently unspectacular proteins have evolved into key players for caste evolution and reproductive division of labour via novel regulatory connectivity with JH (Amdam et al., 2004; Guidugli et al., 2005; Zhou et al., 2006a,b).

Experimental procedures

Selection and annotation of ESTs representing differentially expressed genes in honey bee caste development

The starting point was a set of 164 entries (mainly 3′-ESTs) in GenBank (BG101532–BG101697) from an SSH library (Evans & Wheeler, 1999; Evans & Wheeler, 2000). When validated by macroarray analyses, a clustering into three major classes became apparent: (I) genes overexpressed in young larvae, (II) genes overexpressed in last instar queen larvae, and (III) genes overexpressed in last instar worker larvae. For this study we excluded the class I ESTs because their expression is not caste-specific, but rather represents expression differences between young (still bipotent) and older larvae. To the class II queen ESTs (82) we added one complete cDNA entry (AY601642) from a DDRT-PCR screen (Corona et al., 1999), and to the class III set of worker ESTs (40) we added seven GenBank dbEST entries (BG149167–BG149173) from a DDRT-PCR screen on ovary development (Hepperle & Hartfelder, 2001).

The EST sequences were submitted to BLASTN searches (parameters: -G 2 -E 3 -W 15 -F ‘m D’ -U -e 1e-20) against genome sequence assembly Amel_v3.0 to retrieve matches in linked or unlinked genomic contigs and to exclude no-matches (seven ESTs in queen). ESTs that aligned within the same scaffold were checked for clustering and overlap. This clustering also served to exclude genes that were represented by non-overlapping ESTs from both castes. This procedure generated a set of 51 unique putative gene sequences overexpressed in either queen (34) or worker larvae (17). These 51 nonredundant sequences were submitted to BLASTX searches against the Official Set of GLEAN3-predicted protein sequences (cut-off value at 1e−20). For ESTs with no significant protein sequence matches, the genomic regions adjacent to the mapped EST were searched to find neighbouring ORFs, especially those nearest to putative 3′ UTRs of predicted proteins, as the EST libraries have a bias in this direction.
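The check for clustering and overlap of ESTs mapped to the same scaffold can be illustrated with a minimal sketch. It assumes that each BLASTN match has already been reduced to a tuple of EST identifier, scaffold name, start and end coordinate; the function name, the tuple layout and the rule that any shared coordinate counts as overlap are assumptions made for illustration and are not taken from the original pipeline.

```python
from collections import defaultdict


def cluster_est_hits(hits):
    """Group EST alignments that overlap on the same scaffold.

    hits: iterable of (est_id, scaffold, start, end) with start <= end.
    Returns a list of clusters, each a sorted list of EST identifiers.
    """
    by_scaffold = defaultdict(list)
    for est_id, scaffold, start, end in hits:
        by_scaffold[scaffold].append((start, end, est_id))

    clusters = []
    for scaffold in sorted(by_scaffold):
        current, current_end = [], None
        # Sweep the alignments from left to right along the scaffold.
        for start, end, est_id in sorted(by_scaffold[scaffold]):
            if current and start <= current_end:
                # Overlaps the running cluster: extend it.
                current.append(est_id)
                current_end = max(current_end, end)
            else:
                if current:
                    clusters.append(sorted(current))
                current, current_end = [est_id], end
        if current:
            clusters.append(sorted(current))
    return clusters
```

Each returned cluster would correspond to one putative gene sequence; ESTs on the same scaffold that fall into different clusters are the non-overlapping cases that would need a manual check.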

Official Set protein sequences were aligned against the Amel_v3.0 sequence assembly using TBLASTN to map protein to genome, and subsequently they were aligned using BLASTP against the GenBank nonredundant (nr) and the Flybase protein sequence databases. The manual features annotation procedure of the Artemis 7.0 program (Rutherford et al., 2000) was used to map ORFs, putative splice sites of exons, and ESTs to genome coordinates. The final annotation file was generated with a Python script in GFF format (http://www.sanger.ac.uk/Software/formats/GFF).
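As an illustration of the final step, the following sketch writes one annotation record in the nine-column, tab-delimited GFF layout (sequence name, source, feature, start, end, score, strand, frame, attributes). The function name and the example values are hypothetical; the original script is not reproduced here.

```python
def gff_line(seqname, source, feature, start, end, strand,
             score=".", frame=".", attributes=""):
    """Format one feature as a tab-delimited GFF record.

    Coordinates are 1-based and inclusive; '.' marks an undefined
    score, strand or frame.
    """
    if start > end:
        raise ValueError("start must not exceed end")
    if strand not in ("+", "-", "."):
        raise ValueError("strand must be '+', '-' or '.'")
    fields = [seqname, source, feature, str(start), str(end),
              str(score), strand, str(frame), attributes]
    return "\t".join(fields)
```

A mapped EST could then be written as gff_line("Group1.1", "caste_annot", "EST", 100, 500, "+", attributes="EST est1"), where all argument values are placeholders.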

Honey bee sequences annotated as orthologs to D. melanogaster genes were putatively assigned the GO terms listed in the respective Flybase entry. In addition, the definition of new GO terms (Biological Process ontology) related to caste development and polyphenism (GO:0048651 and GO:0048650, respectively) was co-ordinated with the Gene Ontology Consortium (Ashburner et al., 2000). The FatiGO web tool (Al-Sharour et al., 2004) was used to cluster GO terms (level 3 setting) for Biological Process and Molecular Function.

For the detection of conserved domains, the 51 protein sequences were screened against the Pfam database (http://www.sanger.ac.uk/Software/Pfam) using the HMMER platform (current release 2.3.2, http://hmmer.wustl.edu), with a cut-off value set at 1e−10.

Annotation of oogenesis and reproduction genes

In order to identify putative honey bee orthologs to D. melanogaster genes we searched the following GO terms in Flybase: ‘oogenesis’ (GO:0009993), ‘vitellogenesis’ (GO:0007296), ‘female meiosis’ (GO:0007143), ‘DNA recombination’ (GO:0006310) and ‘chromosome segregation’ (GO:0007059). Genes related to segregation distortion were searched for in Flybase in phenotypic descriptions and mutant effects of D. melanogaster genes, as this phenomenon is not represented by a GO term. Hence, this group may be more heterogeneous than the others. From this list we removed genes of pleiotropic function (multifaceted GO entries in Biological Process) and genes that lacked defined transcripts in the Drosophila genome database.

For the GO terms ‘oogenesis’ and ‘vitellogenesis’ we performed TBLASTN and BLASTP searches for 42 fruit fly genes against the Amel_v3.0 genome assembly and the GLEAN3-predicted protein sequences (Honey Bee Genome Sequencing Consortium, 2006), respectively. The orthologous D. melanogaster gene was characterized by the same procedure as described above (reciprocal best hit). For the GO terms ‘female meiosis’, ‘DNA recombination’, ‘chromosome segregation’ and the non-GO group ‘segregation distortion’, transcripts of D. melanogaster were searched against nr databases at NCBI using BLASTP. The obtained sequences were searched against the Amel_v3.0 genome assembly and the GLEAN3-predicted protein sequences using TBLASTN. Homologous sequences (threshold 1e−10) were predicted using the BioEdit software (Hall, 1999). ORFs showing significant homology (BLASTP, threshold 1e−20) were assembled and used in BLASTP searches against the nr databases at NCBI.
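The reciprocal best hit criterion mentioned above can be sketched as follows. The sketch assumes that both search directions have been reduced to (query, subject, E-value) rows; the function names, the identifiers in the usage note and the default E-value ceiling are illustrative assumptions rather than the settings of the original analysis.

```python
def best_hits(rows):
    """Return {query: subject} keeping the lowest E-value per query."""
    best = {}
    for query, subject, evalue in rows:
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(fly_vs_bee, bee_vs_fly, max_evalue=1e-10):
    """Pairs (fly_gene, bee_gene) that are each other's best hit.

    Rows whose E-value exceeds max_evalue are ignored in both directions.
    """
    forward = best_hits(r for r in fly_vs_bee if r[2] <= max_evalue)
    backward = best_hits(r for r in bee_vs_fly if r[2] <= max_evalue)
    return sorted((fly, bee) for fly, bee in forward.items()
                  if backward.get(bee) == fly)
```

A fly gene whose best bee match points back to a different fly gene is dropped, which is the conservative behaviour one would want before transferring GO terms between species.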

Motif search in upstream regions in caste-specifically expressed genes

In order to detect overrepresented motifs in the upstream control regions (UCRs) of the two sets of caste-related genes we selected gene subsets based on two criteria: (1) those that had shown the highest caste-specificity in the array analyses (Evans & Wheeler, 2000), and (2) those that had a conserved 5′ region when compared with the Drosophila orthologs. These ‘top10’ genes consisted of six queen genes (GB13072, GB11628, GB19380, GB14798, GB16047 and GB18242) and of six worker genes (GB10869, GB12371, GB12239, GB10428, GB19006, GB14758). The motif search was conducted separately on the two sets of UCR sequences using three methods: AlignACE (Roth et al., 1998), MEME (Bailey & Elkan, 1995) and MDscan (Liu et al., 2002). Default parameter values were used in all searches, except that GC content in intergenic regions was set to 25%, representing the background value established for the honey bee UCR database generated in this study.
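A background GC value of this kind can be estimated directly from a set of upstream sequences. The sketch below is one straightforward way to do so; the function name is hypothetical, and the decision to ignore ambiguous bases such as N is an assumption, as the original calculation is not described in detail.

```python
def gc_fraction(sequences):
    """Pooled G+C fraction over all unambiguous bases in the sequences."""
    gc = total = 0
    for seq in sequences:
        seq = seq.upper()
        gc += seq.count("G") + seq.count("C")
        # Only A, C, G and T contribute to the denominator.
        total += sum(seq.count(base) for base in "ACGT")
    if total == 0:
        raise ValueError("no unambiguous bases found")
    return gc / total
```

Pooling all bases before dividing weights long UCRs more heavily than short ones; averaging per-sequence fractions would be an alternative design with a slightly different result.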

The database containing 10 156 UCR sequences was generated by parsing the Official Set annotation file (downloaded in GFF format from httpwwwbeegenomehgscbcmtmcedubeeftphtml) to extract upstream regions starting from the terminal 5′-genomic coordinate of each predicted CDS. The UCRs were arbitrarily set to a size frame of 1000 nucleotides (Roth et al., 1998), but were trimmed whenever another predicted ORF was detected in any of these regions.
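The coordinate arithmetic behind such an extraction is sketched below for both strands. Coordinates are taken to be 1-based and inclusive, as in GFF; the function name, its arguments and the exact trimming rule (cut the region at the edge of any ORF that reaches into the window) are assumptions for illustration, since the original script is not reproduced here.

```python
def upstream_region(cds_start, cds_end, strand, other_orfs,
                    max_len=1000, scaffold_len=None):
    """Return (start, end) of the upstream region, or None if empty.

    other_orfs: list of (start, end) for neighbouring predicted ORFs
    on the same scaffold.
    """
    if strand == "+":
        # Upstream lies at lower coordinates than the CDS start.
        start, end = max(1, cds_start - max_len), cds_start - 1
        for o_start, o_end in other_orfs:
            if o_start <= end and o_end >= start:
                start = max(start, o_end + 1)
    elif strand == "-":
        # Upstream lies at higher coordinates than the CDS end.
        start, end = cds_end + 1, cds_end + max_len
        if scaffold_len is not None:
            end = min(end, scaffold_len)
        for o_start, o_end in other_orfs:
            if o_start <= end and o_end >= start:
                end = min(end, o_start - 1)
    else:
        raise ValueError("strand must be '+' or '-'")
    return (start, end) if start <= end else None
```

For minus-strand genes the extracted sequence would still need to be reverse-complemented before the motif search so that all UCRs are read in the 5′ to 3′ direction of the gene.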

The MAP (maximum a priori log likelihood) score, group specificity score (called Church score in this manuscript) (Hughes et al., 2000), ROC AUC (area under the curve for a receiver-operator characteristic plot) metric and MNCP (mean normalized conditional probability) metric (Clarke & Granek, 2003) were used to detect motifs that most likely correspond to biologically significant cis-regulatory elements. The filters run on the UCRs of the subsets of queen and worker genes were: a MAP score cut-off value of 50, followed by a ROC AUC cut-off at 0.7, followed by a group specificity score cut-off at 1e−05. The UCR database of all GLEAN3-predicted honey bee genes was used as the background to calculate these metrics.
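The sequential filtering can be sketched as below, together with a rank-based ROC AUC calculation that compares motif scores in the target UCRs against those in the background UCRs. This is a simplified stand-in and not the TAMO implementation; the dictionary keys, the function names and the choice of inclusive comparisons at each cut-off are assumptions.

```python
def roc_auc(target_scores, background_scores):
    """Probability that a target score exceeds a background score.

    Ties count as one half, which equals the area under the ROC curve.
    """
    wins = 0.0
    for t in target_scores:
        for b in background_scores:
            if t > b:
                wins += 1.0
            elif t == b:
                wins += 0.5
    return wins / (len(target_scores) * len(background_scores))


def filter_motifs(motifs, map_min, auc_min, specificity_max):
    """Apply the three cut-offs one after another.

    motifs: list of dicts with the keys 'map', 'roc_auc' and 'church'.
    """
    kept = [m for m in motifs if m["map"] >= map_min]
    kept = [m for m in kept if m["roc_auc"] >= auc_min]
    kept = [m for m in kept if m["church"] <= specificity_max]
    return kept
```

The cut-offs are passed in explicitly so that the same function can be run on the caste gene sets and on the random UCR sets with identical settings.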

A parametric test (MANOVA) and a nonparametric test (Kolmogorov–Smirnov) were conducted to identify significance levels for the two sets of filtered motifs found in the UCRs of caste-specifically expressed genes against filtered motifs found in the random UCR set. Random motifs were sampled from a motif database (10 391 motifs) generated by running our script 100 times with random sets of UCR sequences.
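For the nonparametric comparison, the two-sample Kolmogorov–Smirnov statistic is the largest vertical distance between the empirical cumulative distribution functions of the two samples. A minimal sketch of that statistic follows; it does not compute a P-value, and the function name and inputs (for example, one metric per motif in the caste set and in the random set) are illustrative assumptions.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov D statistic."""
    if not sample_a or not sample_b:
        raise ValueError("both samples must be non-empty")
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    # The empirical CDFs only change at observed values.
    for value in sorted(set(a) | set(b)):
        cdf_a = bisect_right(a, value) / len(a)
        cdf_b = bisect_right(b, value) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d
```

A value of D near 1 indicates almost completely separated score distributions, whereas a value near 0 indicates that caste-specific and random motifs score alike.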

The main criterion for identifying known regulatory motifs among these caste-specific ones was the alignment of the PSSM for each bee motif with the Drosophila melanogaster sequences in the TRANSFAC database (release 4.0) (Wingender et al., 2000). Only the alignments passing a threshold of 80% identity for each PSSM were considered as significant matches. In addition, we checked for a specific binding motif, the EcR/USP motif (rGkTCAaTGamcy-3′), known to function in the expression of genes responding to morphogenetic hormone titres (Perera et al., 2005).
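A search for a degenerate consensus such as the EcR/USP motif can be sketched by translating the IUPAC codes into a regular expression and scanning both strands. The sketch treats lower-case and upper-case letters alike and requires an exact match to the degenerate consensus; both choices, like the function names, are assumptions and may be stricter than the search actually used.

```python
import re

IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "K": "[GT]", "M": "[AC]",
    "S": "[CG]", "W": "[AT]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def iupac_to_regex(motif):
    """Translate an IUPAC consensus into a regular expression."""
    return "".join(IUPAC[base] for base in motif.upper())


def reverse_complement(seq):
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_motif(seq, motif):
    """Return sorted (position, strand) hits; positions are 1-based
    start coordinates on the forward strand."""
    # The lookahead allows overlapping matches to be reported.
    pattern = re.compile("(?=(%s))" % iupac_to_regex(motif))
    n, k = len(seq), len(motif)
    hits = [(m.start() + 1, "+") for m in pattern.finditer(seq.upper())]
    hits += [(n - m.start() - k + 1, "-")
             for m in pattern.finditer(reverse_complement(seq))]
    return sorted(hits)
```

Calling find_motif(ucr_sequence, "rGkTCAaTGamcy") on each UCR would list candidate sites, which could then be compared with the positions of the motifs reported by the discovery programs.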

Operating system and programming tools

An Ubuntu Linux (version Breezy) operating system was used to implement all scripts and pipelines designed for annotation procedures and motif discovery. The Python programming language (http://www.python.org), Biopython (http://www.biopython.org) and TAMO (Tools for Analysis of Motifs) packages (Gordon et al., 2005) were used in program design. Other web applications were built using the Zope application server (http://www.zope.org), hosted at http://zulu.fmrp.usp.br/beelab.

Acknowledgements

This work was supported by grants from FAPESP (Fundação de Amparo à Pesquisa do Estado de São Paulo, 99/00719-6) and CNPq (Conselho Nacional de Desenvolvimento Científico e Tecnológico, 4729632-3-1).

References

Abouheif, E. and Wray, G.A. (2002) Evolution of the genetic network underlying wing polyphenism in ants. Science 297: 249–252.

Al-Sharour, F., Diaz-Uriarte, R. and Dopazo, J. (2004) FatiGO: a web tool for finding significant associations of Gene Ontology terms with groups of genes. Bioinformatics 20: 578–580.

Amdam, G.V., Norberg, K., Hagen, A. and Omholt, S.W. (2003) Social exploitation of vitellogenin. Proc Natl Acad Sci USA 100: 1799–1802.

Amdam, G.V., Simões, Z.L.P., Hagen, A., Norberg, K., Schroder, K., Mikkelsen, O., et al. (2004) Hormonal control of the yolk precursor vitellogenin regulates immune function and longevity in honeybees. Exp Gerontol 39: 767–773.

Antoniewski, C., O’Grady, M.S., Edmondson, R.G., Lassieur, S.M. and Benes, H. (1995) Characterization of an EcR/USP heterodimer target site that mediates ecdysone responsiveness of the Drosophila Lsp-2 gene. Mol Gen Genet 249: 545–556.

Arbeitsman, M.N., Furlong, E.E.M., Imam, F., Johnson, E., Null, B.H., Baker, B.S., et al. (2002) Gene expression during the life cycle of Drosophila melanogaster. Science 297: 2270–2275.

Ashburner, M., Ball, C.A., Blake, J.A., Botstein, D., Butler, H., Cherry, J.M., et al. (2000) Gene Ontology: tool for the unification of biology. Nature Genet 25: 25–29.

Bailey, T.L. and Elkan, C. (1995) Unsupervised learning of multiple motifs in biopolymers using expectation maximization. Machine Learning J 21: 58–83.

Bergem, M., Norberg, K. and Aamodt, R.M. (2006) Long-term maintenance of in vitro cultured honeybee (Apis mellifera) embryonic cells. BMC Dev Biol 6: 17 (online).

Boutros, M., Kiger, A.A., Armknecht, S., Kerr, K., Hild, M., Koch, B., et al. (2004) Genome-wide RNAi analysis of growth and viability in Drosophila cells. Science 303: 832–835.

Carsten, L.D., Watts, T. and Markow, T.A. (2005) Gene expression patterns accompanying a dietary switch in Drosophila melanogaster. Mol Ecol 14: 3203–3208.

Clarke, N.D. and Granek, J.A. (2003) Rank order metrics for quantifying the association of sequence features with gene regulation. Bioinformatics 19: 212–218.

Colombani, J., Bianchini, L., Layalle, S., Pondeville, E., Dauphin-Villemant, C., Antoniewski, C., et al. (2005) Antagonistic actions of ecdysone and insulins determine final size in Drosophila. Science 310: 667–670.

Corona, M., Estrada, E. and Zurita, M. (1999) Differential expression of mitochondrial genes between queens and workers during caste determination in the honeybee Apis mellifera. J Exp Biol 202: 929–938.

Cunha, A.C., Nascimento, A.M., Guidugli, K.R., Simões, Z.L.P. and Bitondi, M.M.G. (2005) Molecular cloning and expression of a hexamerin cDNA from the honey bee. J Insect Physiol 51: 1135–1147.

Darwin, C.R. (1859) On the Origin of Species by Means of Natural Selection. John Murray, London.

Davidson, E.H. (2001) Genomic Regulatory Systems: Development and Evolution. Academic Press, San Diego.

Eder, J., Kremer, J.P. and Rembold, H. (1983) Correlation of cytochrome c titer and respiration in Apis mellifera: adaptive responses to caste determination defines workers, intercastes and queens. Comp Biochem Physiol B 76: 703–716.

Engels, W. (1974) Occurrence and significance of vitellogenins in female castes of social Hymenoptera. Am Zool 14: 1229–1237.

Evans, J.D. and Wheeler, D.E. (1999) Differential gene expression between developing queens and workers in the honey bee, Apis mellifera. Proc Natl Acad Sci USA 96: 5575–5580.

Evans, J.D. and Wheeler, D.E. (2000) Expression profiles during honeybee caste determination. Genome Biol 2: 1, 6 pages, online: http://genomebiology.com/2000/2/1/research/0001.1

Forbes, A. and Lehmann, R. (1998) Nanos and Pumilio have critical roles in the development and function of Drosophila germline stem cells. Development 125: 679–690.

González-Reyes, A., Elliot, H. and St Johnston, D. (1995) Polarization of both major body axes in Drosophila by gurken-torpedo signalling. Nature 375: 654–658.

Goodisman, M.A., Isoe, J., Wheeler, D.E. and Wells, M.A. (2005) Evolution of insect metamorphosis: a microarray-based study of larval and adult gene expression in the ant Camponotus festinatus. Evolution 59: 858–870.

Gordon, D.B., Nekludova, L., McCallum, S. and Fraenkel, E. (2005) TAMO: a flexible, object-oriented framework for analysing transcriptional regulation using DNA-sequence motifs. Bioinformatics 21: 3164–3165.

Guidugli, K.R., Hepperle, C. and Hartfelder, K. (2004) A member of the short-chain dehydrogenase/reductase (SDR) superfamily is a target of the ecdysone response in honey bee (Apis mellifera) caste development. Apidologie 35: 37–47.

Guidugli, K.R., Nascimento, A.M., Amdam, G.V., Barchuk, A.R., Omholt, S.W., Simões, Z.L.P. and Hartfelder, K. (2005) Vitellogenin regulates hormonal dynamics in the worker caste of a eusocial insect. FEBS Lett 579: 4961–4965.

Hagedorn, H.H., Maddison, D.R. and Tu, Z. (1998) The evolution of vitellogenins, cyclorrhaphan yolk proteins and related molecules. Adv Insect Physiol 27: 335–384.

Hall, T.A. (1999) BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucleic Acids Symp Ser 41: 95–98.

Hamilton, W.D. (1964) The genetical theory of social behaviour. I & II. J Theor Biol 7: 1–16, 17–52.

Hartfelder, K. and Emlen, D.J. (2005) Endocrine control of insect polyphenism. In Comprehensive Insect Molecular Science (Gilbert, L.I., Iatrou, K. and Gill, S., eds), Vol. 3, pp. 651–703. Elsevier, Oxford.

Hartfelder, K. and Engels, W. (1998) Social insect polymorphism: Hormonal regulation of plasticity in development and reproduction in the honeybee. Curr Topics Dev Biol 40: 45–77.

Hawkins, N.C., Thorpe, J. and Schüpbach, T. (1996) encore, a gene required for the regulation of germ line mitosis and oocyte differentiation during Drosophila oogenesis. Development 122: 281–290.

Haydak, M.H. (1970) Honey bee nutrition. Annu Rev Entomol 15: 143–156.

Hepperle, C. and Hartfelder, K. (2001) Differentially expressed regulatory genes in honey bee caste development. Naturwissenschaften 88: 113–116.

Hoage, T.R. and Kessel, R.G. (1968) An electron microscopic study of the process of differentiation during spermatogenesis in the drone honey bee (Apis mellifera L.) with special reference to centriole replication and elimination. J Ultrastruct Res 24: 6–32.

Honey Bee Genome Sequencing Consortium (2006) Insights into social insects from the genome of the honeybee Apis mellifera. Nature (in press).

Hughes, J.D., Estep, P.W., Tavazoie, S. and Church, G.M. (2000) Computational identification of cis-regulatory elements associated with groups of functionally related genes in Saccharomyces cerevisiae. J Mol Biol 296: 1205–1214.

Hunt, G.J. and Page, R.E. (1995) Linkage map of the honey bee, Apis mellifera, based on RAPD markers. Genetics 139: 1371–1382.

Hunt, J.H., Buck, N.A. and Wheeler, D.E. (2003) Storage proteins in vespid wasps: characterization, developmental pattern, and occurrence in adults. J Insect Physiol 49: 785–794.

Judice, C.C., Carazzole, M.F., Festa, F., Sogayar, M.C., Hartfelder, K. and Pereira, G.A.G. (2006) Gene expression profiles underlying alternative caste phenotypes in a highly eusocial bee, Melipona quadrifasciata. Insect Mol Biol 15: 33–44.

Kerr, W.E. (1950) Genetic determination of castes in the genus Melipona. Genetics 35: 143–152.

Kropácová, S. and Haslbachová, H. (1971) The influence of queenlessness and of unsealed brood on the development of ovaries in worker honeybees. J Apicult Res 10: 57–61.

Lattorff, H.M.G., Moritz, R.F.A. and Fuchs, S. (2005) A single gene determines thelytokous parthenogenesis in honey bee workers (Apis mellifera capensis). Heredity 94: 533–537.

Lattorff, H.M.G., Moritz, R.F.A., Solignac, M. and Crewe, R.M. (2006) Control of reproductive dominance by the thelytoky gene in honeybees. Biol Lett (in press).

Li, T.R. and White, K.P. (2003) Tissue-specific gene expression and ecdysone-regulated genomic networks in Drosophila. Dev Cell 5: 59–72.

Lin, H., Yue, L. and Spradling, A.C. (1994) The Drosophila fusome, a germline-specific organelle, contains membrane skeletal proteins and functions in cyst formation. Development 120: 947–956.

Liu, X.S., Brutlag, D.L. and Liu, J.S. (2002) An algorithm for finding protein-DNA binding sites with applications to chromatin-immunoprecipitation microarray experiments. Nature Biotechnol 20: 835–839.

Liu, Y., Liu, S., Wei, L., Altman, R.B. and Batzoglou, S. (2004) Eukaryotic regulatory element conservation analysis and identification using comparative genomics. Genome Res 14: 354–366.

Martinez, T., Burmester, T., Veenstra, J.A. and Wheeler, D.E. (2001) Sequence and evolution of a hexamerin from the ant Camponotus festinatus. Insect Mol Biol 9: 427–431.

McKearin, D. (1997) The Drosophila fusome, organelle biogenesis and germ cell differentiation: if you build it … Bioessays 19: 147–152.

Mirth, C., Truman, J.W. and Riddiford, L.M. (2005) The role of the prothoracic gland in determining critical weight for metamorphosis in Drosophila melanogaster. Curr Biol 15: 1–12.

Moritz, R.F.A., Kryger, P. and Allsopp, M.H. (1996) Competition for royalty in bees. Nature 384: 31.

Page, R.E. and Erickson, E.H. (1988) Reproduction by worker honey bees (Apis mellifera L.). Behav Ecol Sociobiol 23: 117–126.

Pereboom, J.J.M., Jordan, W.C., Sumner, S., Hammond, R.L. and Bourke, A.F.G. (2005) Differential gene expression in queen-worker caste determination in bumble bees. Proc R Soc B 272: 1145–1152.

Perera, S.C., Zheng, S., Feng, Q.L., Krell, P.J., Retnakaran, A. and Palli, S.R. (2005) Heterodimerization of ecdysone receptor and ultraspiracle on symmetric and asymmetric response elements. Arch Insect Biochem Physiol 60: 55–70.

Piulachs, M.D., Guidugli, K.R., Barchuk, A.R., Cruz, J., Simões, Z.L.P. and Bellés, X. (2003) The vitellogenin of the honeybee, Apis mellifera: Structural analysis of the cDNA and expression studies. Insect Biochem Mol Biol 33: 459–465.

Pritsker, M., Liu, Y.C., Beer, M.A. and Tavazoie, S. (2004) Whole genome discovery of transcription factor binding sites by network-level conservation. Genome Res 14: 99–108.

Rachinsky, A., Strambi, C., Strambi, A. and Hartfelder, K. (1990) Caste and metamorphosis – hemolymph titers of juvenile hormone and ecdysteroids in last instar honeybee larvae. Gen Comp Endocr 79: 31–38.

Riddihough, G. and Pelham, H.R.B. (1987) An ecdysone response element in the Drosophila hsp27 promoter. EMBO J 6: 3729–3734.

Roth, F.P., Hughes, J.D., Estep, P.W. and Church, G.M. (1998) Finding DNA regulatory motifs within unaligned noncoding sequences clustered by whole-genome mRNA quantification. Nature Biotechnol 16: 939–945.

Rutherford, K., Parkhill, J., Crook, J., Horsnell, T., Rice, P., Rajandream, M.A. and Barrell, B. (2000) Artemis: sequence visualisation and annotation. Bioinformatics 16: 944–945.

Schmidt-Capella, I.C. and Hartfelder, K. (1998) Juvenile hormone effect on DNA synthesis and apoptosis in caste-specific differentiation of the larval honey bee (Apis mellifera L.) ovary. J Insect Physiol 44: 385–391.

Snodgrass, R.E. (1956) Anatomy of the Honey Bee. Cornell University Press, Ithaca.

Solignac, M., Vautrin, D., Baudry, E., Mougel, F., Loiseau, A. and Cornuet, J.M. (2004) A microsatellite-based linkage map of the honeybee, Apis mellifera L. Genetics 167: 253–262.

St Johnston, D. and Nüsslein-Volhard, C. (1992) The origin of pattern and polarity in the Drosophila embryo. Cell 68: 201–219.

Staab, S. and Steinmann-Zwicky, M. (1995) Female germ cells of Drosophila require zygotic ovo and otu product for survival in larvae and pupae respectively. Mech Dev 54: 205–210.

Sullivan, A.A. and Thummel, C.S. (2003) Temporal profiles of nuclear receptor gene expression profiles reveal coordinate transcriptional responses during Drosophila development. Mol Endocrinol 17: 2125–2137.

Wassermann, W.W. and Sandelin, A. (2004) Applied bioinformatics for the identification of regulatory elements. Nature Rev Genet 5: 267–287.

West-Eberhard, M.J. (2003) Developmental Plasticity and Evolution. Oxford University Press, Oxford.

Wheeler, D.E. and Nijhout, H.F. (2003) A perspective for understanding the modes of juvenile hormone as a lipid signaling system. Bioessays 25: 994–1001.

Wilson, E.O. (1971) The Insect Societies. Belknap Press of Harvard University Press, Cambridge, MA.

Wingender, E., Chen, X., Hehl, R., Karas, H., Liebich, I., Matys, V., et al. (2000) TRANSFAC: an integrated system for gene expression regulation. Nucleic Acids Res 28: 316–319.

Zhou, X., Oi, F.M. and Scharf, M.E. (2006a) Social exploitation of hexamerin: RNAi reveals a major caste-regulatory factor in termites. Proc Natl Acad Sci USA 103: 4499–4504.

Zhou, X., Tarver, M.R., Bennett, G.W., Oi, F.M. and Scharf, M.E. (2006b) Two hexamerin genes from the termite Reticulitermes flavipes: sequence, expression, and proposed functions in caste regulation. Gene 376: 47–58.

Zhu, Z., Shendure, J. and Church, G.M. (2005) Discovering functional transcription factor combinations in the human cell cycle. Genome Res 15: 848–855.

Supplementary material

The following material is available for this article online:

Table S1. Annotation results of caste-specifically expressed honey bee genes.

Table S2. Caste-specific motifs in UCRs of honey bee genes.

Table S3. Annotation of honey bee orthologs to Drosophila melanogaster genes involved in Reproduction.

This material is available as part of the online article from http://www.blackwell-synergy.com

Page 9: Caste development and reproduction: a genome-wide analysis of hallmarks of insect eusociality

Genomics of honey bee caste development and reproduction 711

copy 2006 The AuthorsJournal compilation copy 2006 The Royal Entomological Society Insect Molecular Biology 15 703ndash714

et al 2005) Such change would be expected to involvecis-regulatory elements that is change in transcriptionfactor binding sites in UCRs as approached in this studyand also evolutionary change in response thresholds tocirculating morphogenetic hormones (for review see Hart-felder amp Emlen 2005) The second and quite unexpectedtheme is the acquisition of new systemic functions byevolutionary rather old proteins such as vitellogenin andhexamerins These apparently unspectacular proteinshave evolved into key players for caste evolution and repro-ductive division of labour via novel regulatory connectivitywith JH (Amdam et al 2004 Guidugli et al 2005 Zhouet al 2006ab)

Experimental procedures

Selection and annotation of ESTs representing differentially expressed genes in honey bee caste development

The starting point were 164 entries (mainly 3prime-ESTs) in GenBank(BG101532ndashBG101697) from an SSH library (Evans amp Wheeler1999 Evans amp Wheeler 2000) When validated by macroarrayanalyses a clustering into three major classes became apparent(I) genes overexpressed in young larvae (II) genes overexpressedin last instar queen larvae and (III) genes overexpressed in lastinstar worker larvae For this study we excluded the class I ESTsbecause their expression is not caste-specific but rather representsexpression differences between young (still bipotent) and olderlarvae To the class II queen ESTs (82) we added one completecDNA entry (AY601642) from a DDRT-PCR screen (Corona et al1999) and to the class III set of worker ESTs (40) we added sevenGenBank dbEST entries (BG149167ndashBG149173) from a DDRT-PCR screen on ovary development (Hepperle amp Hartfelder 2001)

The EST sequences were submitted to BLASTN searches(parameters -G 2 -E 3 -W 15 -F lsquom Drsquo -U -e 1e-20) against genomesequence assembly Amel_v30 to retrieve matches in linked orunlinked genomic contigs and to exclude no-matches (seven ESTsin queen) ESTs that aligned within the same scaffold werechecked for clustering and overlap This clustering also served toexclude genes that were represented by non-overlapping ESTsfrom both castes This procedure generated a set of 51 uniqueputative gene sequences overexpressed in either queen (34) or inworker larvae (17) These 51 nonredundant sequences weresubmitted to BLASTX searches against the Official Set of GLEAN3-predicted protein sequences (cut-off value at 1eminus20) For ESTs withno significant protein sequence matches the genomic regionsadjacent to the mapped EST were searched to find neighbouringORFs especially those nearest to putative 3prime UTRs of predictedproteins as the EST libraries have a bias in this direction

Official Set protein sequences were aligned against Amel_v30sequence assembly using TBLASTN to map protein to genomeand subsequently they were aligned using BLASTP against theGenBank nonredundant (nr) and the Flybase protein sequencedatabases The manual features annotation procedure of theArtemis 70 program (Rutherford et al 2000) was used to mapORFs putative splice sites of exons and ESTs to genome coordi-nates The final annotation file was generated with a Python scriptin GFF format (httpwwwsangeracukSoftwareformatsGFF)

Honey bee sequences annotated as orthologs to D mela-nogaster genes were putatively assigned the GO terms listed in

the respective Flybase entry In addition the definition of new GOterms (Biological Process ontology) related to caste developmentand polyphenism (GO0048651 and GO0048650 respectively)was co-ordinated with the Gene Ontology Consortium (Ashburneret al 2000) The FatiGO web tool (Al-Sharour et al 2004) wasused to cluster GO terms (level 3 setting) for Biological Processand Molecular Function

For the detection of conserved domains the 51 proteinsequences were screened against the Pfam database (httpwwwsangeracukSoftwarePfam) using the HMMER platform(current release 232 httphmmerwustledu) with a cut-offvalue set at 1eminus10

Annotation of oogenesis and reproduction genes

In order to identify putative honey bee orthologs to D melanogastergenes we searched the following GO terms in Flybase lsquooogenesisrsquo(GO0009993) lsquovitellogenesisrsquo (GO0007296) lsquofemale meiosisrsquo(GO0007143) lsquoDNA recombinationrsquo (GO0006310) and lsquochromo-some segregationrsquo (GO0007059) Genes related to segregationdistortion were searched for in Flybase in phenotypic descriptionsand mutant effects of D melanogaster genes as this phenomenonis not represented by a GO term Hence this group may be moreheterogeneous than the others From this list we removed genesof pleiotropic function (multifaceted GO entries in Biological Process)and genes that lacked defined transcripts in the Drosophilagenome database

For the GO terms lsquooogenesisrsquo and lsquovitellogenesisrsquo we performedTBLASTN and BLASTP searches for 42 fruit fly genes against theAmel_v30 genome assembly and the GLEAN3-predicted proteinsequences (Honey Bee Genome Sequencing Consortium 2006)respectively The orthologous D melanogaster gene was charac-terized by the same procedure as described above (reciprocalbest hit) For the GO terms lsquofemale meiosisrsquo lsquoDNA recombinationrsquolsquochromosome segregationrsquo and the non-GO group lsquosegregationdistortionrsquo transcripts of D melanogaster were searched againstnr databases at NCBI using BLASTP The obtained sequenceswere searched against the Amel_v30 genome assembly and theGLEAN3-predicted protein sequences using TBLASTN Homolo-gous sequences (threshold 1eminus10) were predicted using the BioEditsoftware (Hall 1999) ORFs showing significant homology (BLASTPthreshold 1eminus20) were assembled and used in BLASTP searchesagainst the nr databases at NCBI

Motif search in upstream regions in caste-specifically expressed genes

In order to detect overrepresented motifs in the upstream controlregions (UCRs) of the two sets of caste-related genes we selectedgene subsets based on two criteria (1) those that had shown thehighest caste-specificity in the array analyses (Evans amp Wheeler2000) and (2) those that had a conserved 5prime region when comparedwith the Drosophila orthologs These lsquotop10rsquo genes consisted of sixqueen genes (GB13072 GB11628 GB19380 GB14798 GB16047and GB18242) and of six worker genes (GB10869 GB12371GB12239 GB10428 GB19006 GB14758) The motif search wasconducted separately on the two sets of UCR sequences using threemethods AlignAce (Roth et al 1998) MEME (Bailey amp Elkan1995) and MDscan (Liu et al 2002) Default parameters valueswere used in all searches except that GC content in intergenicregions was set to 25 representing the background value estab-lished for the honey bee UCR database generated in this study

712 A S Cristino et al

copy 2006 The AuthorsJournal compilation copy 2006 The Royal Entomological Society Insect Molecular Biology 15 703ndash714

The database containing 10156 UCR sequences was gener-ated by parsing the Official Set annotation file (downloaded in GFFformat from httpwwwbeegenomehgscbcmtmcedubeeftphtml)to extract upstream regions starting from the terminal 5prime-genomiccoordinate of each predicted CDS The UCRs were arbitrarily setto a size frame of 1000 nucleotides (Roth et al 1998) but weretrimmed whenever another predicted ORF was detected in any ofthese regions

The MAP (maximum a priori log likelihood) score group specificityscore (called Church score in this manuscript) (Hughes et al2000) ROC AUC (area under the curve for a receiver-operatorcharacteristic plot) metric and MNCP (mean normalized condi-tional probability) metric (Clarke amp Granek 2003) were used todetect motifs that most likely correspond to biologically significantcis-regulatory elements The filters ran on the UCRs of the subsetsof queen and worker genes were a MAP score cut-off value of 50followed by a ROC AUC cut-off at 07 followed by a group specificityscore cut-off at 1eminus05 The UCR database of all GLEAN3 predictedhoney bee genes was used as the background to calculatethese metrics

A parametric test (MANOVA) and a nonparametric test (KolmogorovndashSmirnov) were conducted to identify significance levels for the twosets of filtered motifs found in the UCRs of caste-specificallyexpressed genes against filtered motifs found in the random UCRset Random motifs were sampled from a motif database (10 391motifs) generated by running our script 100 times with a randomsets of UCR sequences

The main criterion for identifying known regulatory motifs among these caste-specific ones was the alignment of the position-specific scoring matrix (PSSM) for each bee motif with the Drosophila melanogaster sequences in the TRANSFAC database (release 4.0) (Wingender et al., 2000). Only the alignments passing a threshold of 80% identity for each PSSM were considered as significant matches. In addition, we checked for a specific binding motif, the EcR/USP motif (rGkTCAaTGamcy-3′), known to function in the expression of genes responding to morphogenetic hormone titres (Perera et al., 2005).
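A consensus such as the EcR/USP motif can be searched for by translating its IUPAC symbols into a regular expression and scanning both strands. The sketch below is one way to do this; the function names and the test sequence are hypothetical, and every position of the consensus is treated as a strict IUPAC code regardless of letter case, which is an assumption about how the lower-case positions should be read.

```python
import re

# Standard IUPAC nucleotide codes expressed as regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "K": "[GT]", "M": "[AC]",
    "S": "[CG]", "W": "[AT]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def iupac_to_regex(motif):
    """Translate an IUPAC consensus into a regex pattern string."""
    return "".join(IUPAC[ch] for ch in motif.upper())


def reverse_complement(seq):
    """Reverse-complement a DNA sequence (symbols other than ACGT are kept)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_motif(motif, seq):
    """Return sorted (0-based start on the given strand, strand) hits."""
    # A lookahead makes overlapping occurrences visible to finditer.
    pattern = re.compile("(?=(%s))" % iupac_to_regex(motif))
    seq = seq.upper()
    n = len(motif)
    hits = [(m.start(), "+") for m in pattern.finditer(seq)]
    rc = reverse_complement(seq)
    # Convert reverse-strand positions back to forward-strand coordinates.
    hits += [(len(seq) - m.start() - n, "-") for m in pattern.finditer(rc)]
    return sorted(hits)


ECR_USP = "rGkTCAaTGamcy"
# Hypothetical sequence with one consensus instance embedded at position 4.
toy_ucr = "TTTT" + "AGGTCAATGACCT" + "CCCC"
print(find_motif(ECR_USP, toy_ucr))
```

Strict matching of this kind is more conservative than the PSSM alignment described above, since it allows no mismatch at any position.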

Operating system and programming tools

An Ubuntu Linux (version Breezy) operating system was used to implement all scripts and pipelines designed for annotation procedures and motif discovery. The Python programming language (http://www.python.org), Biopython (http://www.biopython.org) and TAMO (Tools for Analysis of Motifs) packages (Gordon et al., 2005) were used in program design. Other web applications were built using the Zope application server (http://www.zope.org), hosted at http://zulu.fmrp.usp.br/beelab.

Acknowledgements

This work was supported by grants from FAPESP (Fundação de Amparo à Pesquisa do Estado de São Paulo, 99/00719-6) and CNPq (Conselho Nacional de Desenvolvimento Científico e Tecnológico, 4729632-3-1).

References

Abouheif, E. and Wray, G.A. (2002) Evolution of the genetic network underlying wing polyphenism in ants. Science 297: 249–252.

Al-Shahrour, F., Diaz-Uriarte, R. and Dopazo, J. (2004) FatiGO: a web tool for finding significant associations of Gene Ontology terms with groups of genes. Bioinformatics 20: 578–580.

Amdam, G.V., Norberg, K., Hagen, A. and Omholt, S.W. (2003) Social exploitation of vitellogenin. Proc Natl Acad Sci USA 100: 1799–1802.

Amdam, G.V., Simões, Z.L.P., Hagen, A., Norberg, K., Schroder, K., Mikkelsen, O. et al. (2004) Hormonal control of the yolk precursor vitellogenin regulates immune function and longevity in honeybees. Exp Gerontol 39: 767–773.

Antoniewski, C., O'Grady, M.S., Edmondson, R.G., Lassieur, S.M. and Benes, H. (1995) Characterization of an EcR/USP heterodimer target site that mediates ecdysone responsiveness of the Drosophila Lsp-2 gene. Mol Gen Genet 249: 545–556.

Arbeitman, M.N., Furlong, E.E.M., Imam, F., Johnson, E., Null, B.H., Baker, B.S. et al. (2002) Gene expression during the life cycle of Drosophila melanogaster. Science 297: 2270–2275.

Ashburner, M., Ball, C.A., Blake, J.A., Botstein, D., Butler, H., Cherry, J.M. et al. (2000) Gene Ontology: tool for the unification of biology. Nature Genet 25: 25–29.

Bailey, T.L. and Elkan, C. (1995) Unsupervised learning of multiple motifs in biopolymers using expectation maximization. Machine Learning J 21: 58–83.

Bergem, M., Norberg, K. and Aamodt, R.M. (2006) Long-term maintenance of in vitro cultured honeybee (Apis mellifera) embryonic cells. BMC Dev Biol 6: 17 (online).

Boutros, M., Kiger, A.A., Armknecht, S., Kerr, K., Hild, M., Koch, B. et al. (2004) Genome-wide RNAi analysis of growth and viability in Drosophila cells. Science 303: 832–835.

Carsten, L.D., Watts, T. and Markow, T.A. (2005) Gene expression patterns accompanying a dietary switch in Drosophila melanogaster. Mol Ecol 14: 3203–3208.

Clarke, N.D. and Granek, J.A. (2003) Rank order metrics for quantifying the association of sequence features with gene regulation. Bioinformatics 19: 212–218.

Colombani, J., Bianchini, L., Layalle, S., Pondeville, E., Dauphin-Villemant, C., Antoniewski, C. et al. (2005) Antagonistic actions of ecdysone and insulins determine final size in Drosophila. Science 310: 667–670.

Corona, M., Estrada, E. and Zurita, M. (1999) Differential expression of mitochondrial genes between queens and workers during caste determination in the honeybee Apis mellifera. J Exp Biol 202: 929–938.

Cunha, A.C., Nascimento, A.M., Guidugli, K.R., Simões, Z.L.P. and Bitondi, M.M.G. (2005) Molecular cloning and expression of a hexamerin cDNA from the honey bee. J Insect Physiol 51: 1135–1147.

Darwin, C.R. (1859) On the Origin of Species by Means of Natural Selection. John Murray, London.

Davidson, E.H. (2001) Genomic Regulatory Systems: Development and Evolution. Academic Press, San Diego.

Eder, J., Kremer, J.P. and Rembold, H. (1983) Correlation of cytochrome c titer and respiration in Apis mellifera: adaptive responses to caste determination defines workers, intercastes and queens. Comp Biochem Physiol B 76: 703–716.

Engels, W. (1974) Occurrence and significance of vitellogenins in female castes of social Hymenoptera. Am Zool 14: 1229–1237.

Evans, J.D. and Wheeler, D.E. (1999) Differential gene expression between developing queens and workers in the honey bee, Apis mellifera. Proc Natl Acad Sci USA 96: 5575–5580.

Evans, J.D. and Wheeler, D.E. (2000) Expression profiles during honeybee caste determination. Genome Biol 2: 1 (6 pages), online http://genomebiology.com/2000/2/1/research/0001.1

Forbes, A. and Lehmann, R. (1998) Nanos and Pumilio have critical roles in the development and function of Drosophila germline stem cells. Development 125: 679–690.

González-Reyes, A., Elliot, H. and St Johnston, D. (1995) Polarization of both major body axes in Drosophila by gurken-torpedo signalling. Nature 375: 654–658.

Goodisman, M.A., Isoe, J., Wheeler, D.E. and Wells, M.A. (2005) Evolution of insect metamorphosis: a microarray-based study of larval and adult gene expression in the ant Camponotus festinatus. Evolution 59: 858–870.

Gordon, D.B., Nekludova, L., McCallum, S. and Fraenkel, E. (2005) TAMO: a flexible, object-oriented framework for analysing transcriptional regulation using DNA-sequence motifs. Bioinformatics 21: 3164–3165.

Guidugli, K.R., Hepperle, C. and Hartfelder, K. (2004) A member of the short-chain dehydrogenase/reductase (SDR) superfamily is a target of the ecdysone response in honey bee (Apis mellifera) caste development. Apidologie 35: 37–47.

Guidugli, K.R., Nascimento, A.M., Amdam, G.V., Barchuk, A.R., Omholt, S.W., Simões, Z.L.P. and Hartfelder, K. (2005) Vitellogenin regulates hormonal dynamics in the worker caste of a eusocial insect. FEBS Lett 579: 4961–4965.

Hagedorn, H.H., Maddison, D.R. and Tu, Z. (1998) The evolution of vitellogenins, cyclorrhaphan yolk proteins and related molecules. Adv Insect Physiol 27: 335–384.

Hall, T.A. (1999) BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucleic Acids Symp Ser 41: 95–98.

Hamilton, W.D. (1964) The genetical theory of social behaviour I & II. J Theor Biol 7: 1–16, 17–52.

Hartfelder, K. and Emlen, D.J. (2005) Endocrine control of insect polyphenism. In Comprehensive Insect Molecular Science (Gilbert, L.I., Iatrou, K. and Gill, S., eds), Vol. 3, pp. 651–703. Elsevier, Oxford.

Hartfelder, K. and Engels, W. (1998) Social insect polymorphism: Hormonal regulation of plasticity in development and reproduction in the honeybee. Curr Topics Dev Biol 40: 45–77.

Hawkins, N.C., Thorpe, J. and Schüpbach, T. (1996) encore, a gene required for the regulation of germ line mitosis and oocyte differentiation during Drosophila oogenesis. Development 122: 281–290.

Haydak, M.H. (1970) Honey bee nutrition. Annu Rev Entomol 15: 143–156.

Hepperle, C. and Hartfelder, K. (2001) Differentially expressed regulatory genes in honey bee caste development. Naturwissenschaften 88: 113–116.

Hoage, T.R. and Kessel, R.G. (1968) An electron microscopic study of the process of differentiation during spermatogenesis in the drone honey bee (Apis mellifera L.) with special reference to centriole replication and elimination. J Ultrastruct Res 24: 6–32.

Honey Bee Genome Sequencing Consortium (2006) Insights into social insects from the genome of the honeybee Apis mellifera. Nature (in press).

Hughes, J.D., Estep, P.W., Tavazoie, S. and Church, G.M. (2000) Computational identification of cis-regulatory elements associated with groups of functionally related genes in Saccharomyces cerevisiae. J Mol Biol 296: 1205–1214.

Hunt, G.J. and Page, R.E. (1995) Linkage map of the honey bee, Apis mellifera, based on RAPD markers. Genetics 139: 1371–1382.

Hunt, J.H., Buck, N.A. and Wheeler, D.E. (2003) Storage proteins in vespid wasps: characterization, developmental pattern, and occurrence in adults. J Insect Physiol 49: 785–794.

Judice, C.C., Carazzole, M.F., Festa, F., Sogayar, M.C., Hartfelder, K. and Pereira, G.A.G. (2006) Gene expression profiles underlying alternative caste phenotypes in a highly eusocial bee, Melipona quadrifasciata. Insect Mol Biol 15: 33–44.

Kerr, W.E. (1950) Genetic determination of castes in the genus Melipona. Genetics 35: 143–152.

Kropácová, S. and Haslbachová, H. (1971) The influence of queenlessness and of unsealed brood on the development of ovaries in worker honeybees. J Apicult Res 10: 57–61.

Lattorff, H.M.G., Moritz, R.F.A. and Fuchs, S. (2005) A single gene determines thelytokous parthenogenesis in honey bee workers (Apis mellifera capensis). Heredity 94: 533–537.

Lattorff, H.M.G., Moritz, R.F.A., Solignac, M. and Crewe, R.M. (2006) Control of reproductive dominance by the thelytoky gene in honeybees. Biol Lett (in press).

Li, T.R. and White, K.P. (2003) Tissue-specific gene expression and ecdysone-regulated genomic networks in Drosophila. Dev Cell 5: 59–72.

Lin, H., Yue, L. and Spradling, A.C. (1994) The Drosophila fusome, a germline-specific organelle, contains membrane skeletal proteins and functions in cyst formation. Development 120: 947–956.

Liu, X.S., Brutlag, D.L. and Liu, J.S. (2002) An algorithm for finding protein-DNA binding sites with applications to chromatin-immunoprecipitation microarray experiments. Nature Biotechnol 20: 835–839.

Liu, Y., Liu, S., Wei, L., Altman, R.B. and Batzoglou, S. (2004) Eukaryotic regulatory element conservation analysis and identification using comparative genomics. Genome Res 14: 354–366.

Martinez, T., Burmester, T., Veenstra, J.A. and Wheeler, D.E. (2001) Sequence and evolution of a hexamerin from the ant Camponotus festinatus. Insect Mol Biol 9: 427–431.

McKearin, D. (1997) The Drosophila fusome, organelle biogenesis and germ cell differentiation: if you build it … Bioessays 19: 147–152.

Mirth, C., Truman, J.W. and Riddiford, L.M. (2005) The role of the prothoracic gland in determining critical weight for metamorphosis in Drosophila melanogaster. Curr Biol 15: 1–12.

Moritz, R.F.A., Kryger, P. and Allsopp, M.H. (1996) Competition for royalty in bees. Nature 384: 31.

Page, R.E. and Erickson, E.H. (1988) Reproduction by worker honey bees (Apis mellifera L.). Behav Ecol Sociobiol 23: 117–126.

Pereboom, J.J.M., Jordan, W.C., Sumner, S., Hammond, R.L. and Bourke, A.F.G. (2005) Differential gene expression in queen-worker caste determination in bumble bees. Proc R Soc B 272: 1145–1152.

Perera, S.C., Zheng, S., Feng, Q.L., Krell, P.J., Retnakaran, A. and Palli, S.R. (2005) Heterodimerization of ecdysone receptor and ultraspiracle on symmetric and asymmetric response elements. Arch Insect Biochem Physiol 60: 55–70.

Piulachs, M.D., Guidugli, K.R., Barchuk, A.R., Cruz, J., Simões, Z.L.P. and Bellés, X. (2003) The vitellogenin of the honeybee, Apis mellifera: Structural analysis of the cDNA and expression studies. Insect Biochem Mol Biol 33: 459–465.

Pritsker, M., Liu, Y.C., Beer, M.A. and Tavazoie, S. (2004) Whole genome discovery of transcription factor binding sites by network-level conservation. Genome Res 14: 99–108.

Rachinsky, A., Strambi, C., Strambi, A. and Hartfelder, K. (1990) Caste and metamorphosis – hemolymph titers of juvenile hormone and ecdysteroids in last instar honeybee larvae. Gen Comp Endocr 79: 31–38.

Riddihough, G. and Pelham, H.R.B. (1987) An ecdysone response element in the Drosophila hsp27 promoter. EMBO J 6: 3729–3734.

Roth, F.P., Hughes, J.D., Estep, P.W. and Church, G.M. (1998) Finding DNA regulatory motifs within unaligned noncoding sequences clustered by whole-genome mRNA quantification. Nature Biotechnol 16: 939–945.

Rutherford, K., Parkhill, J., Crook, J., Horsnell, T., Rice, P., Rajandream, M.A. and Barrell, B. (2000) Artemis: sequence visualisation and annotation. Bioinformatics 16: 944–945.

Schmidt-Capella, I.C. and Hartfelder, K. (1998) Juvenile hormone effect on DNA synthesis and apoptosis in caste-specific differentiation of the larval honey bee (Apis mellifera L.) ovary. J Insect Physiol 44: 385–391.

Snodgrass, R.E. (1956) Anatomy of the Honey Bee. Cornell University Press, Ithaca.

Solignac, M., Vautrin, D., Baudry, E., Mougel, F., Loiseau, A. and Cornuet, J.M. (2004) A microsatellite-based linkage map of the honeybee, Apis mellifera L. Genetics 167: 253–262.

St Johnston, D. and Nüsslein-Volhard, C. (1992) The origin of pattern and polarity in the Drosophila embryo. Cell 68: 201–219.

Staab, S. and Steinmann-Zwicky, M. (1995) Female germ cells of Drosophila require zygotic ovo and otu product for survival in larvae and pupae respectively. Mech Dev 54: 205–210.

Sullivan, A.A. and Thummel, C.S. (2003) Temporal profiles of nuclear receptor gene expression reveal coordinate transcriptional responses during Drosophila development. Mol Endocrinol 17: 2125–2137.

Wasserman, W.W. and Sandelin, A. (2004) Applied bioinformatics for the identification of regulatory elements. Nature Rev Genet 5: 267–287.

West-Eberhard, M.J. (2003) Developmental Plasticity and Evolution. Oxford University Press, Oxford.

Wheeler, D.E. and Nijhout, H.F. (2003) A perspective for understanding the modes of juvenile hormone as a lipid signaling system. Bioessays 25: 994–1001.

Wilson, E.O. (1971) The Insect Societies. Belknap Press of Harvard University Press, Cambridge, MA.

Wingender, E., Chen, X., Hehl, R., Karas, H., Liebich, I., Matys, V. et al. (2000) TRANSFAC: an integrated system for gene expression regulation. Nucleic Acids Res 28: 316–319.

Zhou, X., Oi, F.M. and Scharf, M.E. (2006a) Social exploitation of hexamerin: RNAi reveals a major caste-regulatory factor in termites. Proc Natl Acad Sci USA 103: 4499–4504.

Zhou, X., Tarver, M.R., Bennett, G.W., Oi, F.M. and Scharf, M.E. (2006b) Two hexamerin genes from the termite Reticulitermes flavipes: sequence, expression, and proposed functions in caste regulation. Gene 376: 47–58.

Zhu, Z., Shendure, J. and Church, G.M. (2005) Discovering functional transcription factor combinations in the human cell cycle. Genome Res 15: 848–855.

Supplementary material

The following material is available for this article online:

Table S1. Annotation results of caste-specifically expressed honey bee genes.

Table S2. Caste-specific motifs in UCRs of honey bee genes.

Table S3. Annotation of honey bee orthologs to Drosophila melanogaster genes involved in Reproduction.

This material is available as part of the online article from http://www.blackwell-synergy.com

Page 11: Caste development and reproduction: a genome-wide analysis of hallmarks of insect eusociality

Genomics of honey bee caste development and reproduction 713

copy 2006 The AuthorsJournal compilation copy 2006 The Royal Entomological Society Insect Molecular Biology 15 703ndash714

Evans JD and Wheeler DE (2000) Expression profiles duringhoneybee caste determination Genome Biol 2 1 6 pagesonline httpgenomebiologycom200021research00011

Forbes A and Lehmann R (1998) Nanos and Pumilio havecritical roles in the development and function of Drosophilagermline stem cells Development 125 679ndash690

Gonzaacutelez-Reyes A Elliot H and St Johnston D (1995)Polarization of both major body axes in Drosophila by gurken-torpedo signalling Nature 375 654ndash658

Goodisman MA Isoe J Wheeler DE and Wells MA (2005)Evolution of insect metamorphosis a microarray-based studyof larval and adult gene expression in the ant Camponotusfestinatus Evolution 59 858ndash870

Gordon DB Nekludova L McCallum S and Fraenkel E(2005) TAMO a flexible object-oriented framework for analys-ing transcriptional regulation using DNA-sequence motifsBioinformatics 21 3164ndash3165

Guidugli KR Hepperle C and Hartfelder K (2004) A memberof the short-chain dehydrogenasereductase (SDR) super-family is a target of the ecdysone response in honey bee (Apismellifera) caste development Apidologie 35 37ndash47

Guidugli KR Nascimento AM Amdam GV Barchuk AROmholt SW Simotildees ZLP and Hartfelder K (2005)Vitellogenin regulates hormonal dynamics in the worker casteof a eusocial insect FEBS Lett 579 4961ndash4965

Hagedorn, H.H., Maddison, D.R. and Tu, Z. (1998) The evolution of vitellogenins, cyclorrhaphan yolk proteins and related molecules. Adv Insect Physiol 27: 335–384.

Hall, T.A. (1999) BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucleic Acids Symp Ser 41: 95–98.

Hamilton, W.D. (1964) The genetical evolution of social behaviour. I & II. J Theor Biol 7: 1–16, 17–52.

Hartfelder, K. and Emlen, D.J. (2005) Endocrine control of insect polyphenism. In Comprehensive Insect Molecular Science (Gilbert, L.I., Iatrou, K. and Gill, S., eds), Vol. 3, pp. 651–703. Elsevier, Oxford.

Hartfelder, K. and Engels, W. (1998) Social insect polymorphism: Hormonal regulation of plasticity in development and reproduction in the honeybee. Curr Topics Dev Biol 40: 45–77.

Hawkins, N.C., Thorpe, J. and Schüpbach, T. (1996) encore, a gene required for the regulation of germ line mitosis and oocyte differentiation during Drosophila oogenesis. Development 122: 281–290.

Haydak, M.H. (1970) Honey bee nutrition. Annu Rev Entomol 15: 143–156.

Hepperle, C. and Hartfelder, K. (2001) Differentially expressed regulatory genes in honey bee caste development. Naturwissenschaften 88: 113–116.

Hoage, T.R. and Kessel, R.G. (1968) An electron microscopic study of the process of differentiation during spermatogenesis in the drone honey bee (Apis mellifera L.) with special reference to centriole replication and elimination. J Ultrastruct Res 24: 6–32.

Honey Bee Genome Sequencing Consortium (2006) Insights into social insects from the genome of the honeybee Apis mellifera. Nature (in press).

Hughes, J.D., Estep, P.W., Tavazoie, S. and Church, G.M. (2000) Computational identification of cis-regulatory elements associated with groups of functionally related genes in Saccharomyces cerevisiae. J Mol Biol 296: 1205–1214.

Hunt, G.J. and Page, R.E. (1995) Linkage map of the honey bee, Apis mellifera, based on RAPD markers. Genetics 139: 1371–1382.

Hunt, J.H., Buck, N.A. and Wheeler, D.E. (2003) Storage proteins in vespid wasps: characterization, developmental pattern, and occurrence in adults. J Insect Physiol 49: 785–794.

Judice, C.C., Carazzole, M.F., Festa, F., Sogayar, M.C., Hartfelder, K. and Pereira, G.A.G. (2006) Gene expression profiles underlying alternative caste phenotypes in a highly eusocial bee, Melipona quadrifasciata. Insect Mol Biol 15: 33–44.

Kerr, W.E. (1950) Genetic determination of castes in the genus Melipona. Genetics 35: 143–152.

Kropácová, S. and Haslbachová, H. (1971) The influence of queenlessness and of unsealed brood on the development of ovaries in worker honeybees. J Apicult Res 10: 57–61.

Lattorff, H.M.G., Moritz, R.F.A. and Fuchs, S. (2005) A single gene determines thelytokous parthenogenesis in honey bee workers (Apis mellifera capensis). Heredity 94: 533–537.

Lattorff, H.M.G., Moritz, R.F.A., Solignac, M. and Crewe, R.M. (2006) Control of reproductive dominance by the thelytoky gene in honeybees. Biol Lett (in press).

Li, T.R. and White, K.P. (2003) Tissue-specific gene expression and ecdysone-regulated genomic networks in Drosophila. Dev Cell 5: 59–72.

Lin, H., Yue, L. and Spradling, A.C. (1994) The Drosophila fusome, a germline-specific organelle, contains membrane skeletal proteins and functions in cyst formation. Development 120: 947–956.

Liu, X.S., Brutlag, D.L. and Liu, J.S. (2002) An algorithm for finding protein-DNA binding sites with applications to chromatin-immunoprecipitation microarray experiments. Nature Biotechnol 20: 835–839.

Liu, Y., Liu, S., Wei, L., Altman, R.B. and Batzoglou, S. (2004) Eukaryotic regulatory element conservation analysis and identification using comparative genomics. Genome Res 14: 354–366.

Martinez, T., Burmester, T., Veenstra, J.A. and Wheeler, D.E. (2001) Sequence and evolution of a hexamerin from the ant Camponotus festinatus. Insect Mol Biol 9: 427–431.

McKearin, D. (1997) The Drosophila fusome, organelle biogenesis and germ cell differentiation: if you build it… Bioessays 19: 147–152.

Mirth, C., Truman, J.W. and Riddiford, L.M. (2005) The role of the prothoracic gland in determining critical weight for metamorphosis in Drosophila melanogaster. Curr Biol 15: 1–12.

Moritz, R.F.A., Kryger, P. and Allsopp, M.H. (1996) Competition for royalty in bees. Nature 384: 31.

Page, R.E. and Erickson, E.H. (1988) Reproduction by worker honey bees (Apis mellifera L.). Behav Ecol Sociobiol 23: 117–126.

Pereboom, J.J.M., Jordan, W.C., Sumner, S., Hammond, R.L. and Bourke, A.F.G. (2005) Differential gene expression in queen-worker caste determination in bumble bees. Proc R Soc B 272: 1145–1152.

Perera, S.C., Zheng, S., Feng, Q.L., Krell, P.J., Retnakaran, A. and Palli, S.R. (2005) Heterodimerization of ecdysone receptor and ultraspiracle on symmetric and asymmetric response elements. Arch Insect Biochem Physiol 60: 55–70.

Piulachs, M.D., Guidugli, K.R., Barchuk, A.R., Cruz, J., Simões, Z.L.P. and Bellés, X. (2003) The vitellogenin of the honeybee, Apis mellifera. Structural analysis of the cDNA and expression studies. Insect Biochem Mol Biol 33: 459–465.

Pritsker, M., Liu, Y.C., Beer, M.A. and Tavazoie, S. (2004) Whole-genome discovery of transcription factor binding sites by network-level conservation. Genome Res 14: 99–108.

Rachinsky, A., Strambi, C., Strambi, A. and Hartfelder, K. (1990) Caste and metamorphosis – hemolymph titers of juvenile hormone and ecdysteroids in last instar honeybee larvae. Gen Comp Endocr 79: 31–38.

Riddihough, G. and Pelham, H.R.B. (1987) An ecdysone response element in the Drosophila hsp27 promoter. EMBO J 6: 3729–3734.

Roth, F.P., Hughes, J.D., Estep, P.W. and Church, G.M. (1998) Finding DNA regulatory motifs within unaligned noncoding sequences clustered by whole-genome mRNA quantification. Nature Biotechnol 16: 939–945.

Rutherford, K., Parkhill, J., Crook, J., Horsnell, T., Rice, P., Rajandream, M.A. and Barrell, B. (2000) Artemis: sequence visualisation and annotation. Bioinformatics 16: 944–945.

Schmidt-Capella, I.C. and Hartfelder, K. (1998) Juvenile hormone effect on DNA synthesis and apoptosis in caste-specific differentiation of the larval honey bee (Apis mellifera L.) ovary. J Insect Physiol 44: 385–391.

Snodgrass, R.E. (1956) Anatomy of the Honey Bee. Cornell University Press, Ithaca.

Solignac, M., Vautrin, D., Baudry, E., Mougel, F., Loiseau, A. and Cornuet, J.M. (2004) A microsatellite-based linkage map of the honeybee, Apis mellifera L. Genetics 167: 253–262.

St Johnston, D. and Nüsslein-Volhard, C. (1992) The origin of pattern and polarity in the Drosophila embryo. Cell 68: 201–219.

Staab, S. and Steinmann-Zwicky, M. (1995) Female germ cells of Drosophila require zygotic ovo and otu product for survival in larvae and pupae respectively. Mech Dev 54: 205–210.

Sullivan, A.A. and Thummel, C.S. (2003) Temporal profiles of nuclear receptor gene expression reveal coordinate transcriptional responses during Drosophila development. Mol Endocrinol 17: 2125–2137.

Wasserman, W.W. and Sandelin, A. (2004) Applied bioinformatics for the identification of regulatory elements. Nature Rev Genet 5: 267–287.

West-Eberhard, M.J. (2003) Developmental Plasticity and Evolution. Oxford University Press, Oxford.

Wheeler, D.E. and Nijhout, H.F. (2003) A perspective for understanding the modes of juvenile hormone as a lipid signaling system. Bioessays 25: 994–1001.

Wilson, E.O. (1971) The Insect Societies. Belknap Press of Harvard University Press, Cambridge, MA.

Wingender, E., Chen, X., Hehl, R., Karas, H., Liebich, I., Matys, V., et al. (2000) TRANSFAC: an integrated system for gene expression regulation. Nucleic Acids Res 28: 316–319.

Zhou, X., Oi, F.M. and Scharf, M.E. (2006a) Social exploitation of hexamerin: RNAi reveals a major caste-regulatory factor in termites. Proc Natl Acad Sci USA 103: 4499–4504.

Zhou, X., Tarver, M.R., Bennett, G.W., Oi, F.M. and Scharf, M.E. (2006b) Two hexamerin genes from the termite Reticulitermes flavipes: sequence, expression, and proposed functions in caste regulation. Gene 376: 47–58.

Zhu, Z., Shendure, J. and Church, G.M. (2005) Discovering functional transcription factor combinations in the human cell cycle. Genome Res 15: 848–855.
