This article appeared in a journal published by Elsevier. The attached copy is furnished to the author for internal non-commercial research and education use, including for instruction at the authors institution and sharing with colleagues. Other uses, including reproduction and distribution, or selling or licensing copies, or posting to personal, institutional or third party websites are prohibited. In most cases authors are permitted to post their version of the article (e.g. in Word or Tex form) to their personal website or institutional repository. Authors requiring further information regarding Elsevier’s archiving and manuscript policies are encouraged to visit: http://www.elsevier.com/copyright

Evolutionary insights into insecticide resistance gene families of Anopheles gambiae

Feb 23, 2023


Author's personal copy


Hemlata Srivastava, Meenu Sharma, Jyotsana Dixit, Aparup Das *

Evolutionary Genomics and Bioinformatics Laboratory, National Institute of Malaria Research, Sector 8, Dwarka, New Delhi 110 077, India

1. Introduction

The emergence and spread of insecticide resistance (IR), especially in insect vectors that transmit fatal infectious diseases such as malaria, dengue and encephalitis (Gubler and Clark, 1991; Thompson et al., 1996; Meslin, 1997; Brogdon and McAllister, 1998), have made control of these diseases difficult. Because the development of resistance is influenced by genetic, biological and operational factors, insecticide resistance has had different outcomes under diverse environmental conditions (Brogdon and McAllister, 1998). Among these factors, the genetic basis of IR has been widely recognized, and with the availability of whole genome sequence information for some insect vectors (Holt et al., 2002; Nene et al., 2007), it is now possible to ascertain the genetic pattern of IR genes and to draw evolutionary inferences about them. Furthermore, such genes are reported to occur in large groups (gene families); thus, characterizing the genes of a particular family and determining their evolutionary inter-relationships is important.

Genetic studies of IR have so far indicated the existence of three major gene families: cytochrome P450 (CYP), glutathione S-transferase (GST) and carboxylesterase (COE) (Ranson et al., 2002). Genes of the CYP family mainly account for resistance to permethrin and other pyrethroids, and for the metabolism of endogenous substrates (Feyereisen, 2006). Genes of the GST family, on the other hand, are known to be responsible for DDT resistance (Prapanthadara et al., 1993), and genes of the COE family are known to confer resistance to organophosphate insecticides (Raymond et al., 1998). Such diverse functions of genes from different families have been discussed in terms of a continuous process of gene duplication and functional divergence or regulation, which resulted in the building up and expansion of these gene families (Ranson et al., 2002). These processes might have helped insects create new genetic backgrounds on which natural selection could act for ecological adaptation. However, the genetic architecture and genomic localization of the genes of a family, the conservation pattern of each gene family across different taxa, and the evolutionary events responsible for shaping the overall genetic architecture of the IR gene families are important aspects to understand for the evolution of IR mechanisms in insects of agricultural and medical importance. Because the control of vector-borne diseases relies mainly on controlling the vectors, a detailed understanding of the evolution of IR genes is urgently needed.

In this respect, malaria is a devastating vector-borne infectious disease, accounting for about 247 million cases and about one

Infection, Genetics and Evolution 10 (2010) 620–628

ARTICLE INFO

Article history:

Received 19 August 2009

Received in revised form 5 April 2010

Accepted 6 April 2010

Available online 13 April 2010

Keywords:

Malaria

Insecticide resistance (IR)

An. gambiae

Gene families

Evolution

ABSTRACT

Insecticide resistance (IR) is one of the major obstacles to strategies for controlling insect pests and insect-borne diseases, and its mechanism is known to be under genetic control. Three major gene families (CYP, GST and COE) have been identified that encode various proteins which metabolize endogenous as well as exogenous compounds and are responsible for IR mechanisms in insects. Understanding the evolutionary patterns of genes with such important functions could provide insights on the basis of which further studies to control various insect-borne infectious diseases could be initiated. We herein utilized the whole genome sequence information of the malaria vector Anopheles gambiae and inferred the evolutionary pattern of the three known IR gene families (CYP, GST and COE). The pattern of conservation of IR genes across 38 other taxa was determined to infer the evolutionary pattern of these gene families. The chromosomal distribution of IR genes was ascertained, and each individual gene of the IR gene families was also mapped on the chromosomal arms of An. gambiae. Differential distributional and quantitative aspects of introns in each gene were determined, and the genetic architecture of genes from all three gene families was compared to infer the differential evolution of the IR gene families. Further, phylogenetic relationships among genes of each of the three gene families were inferred. These results, in correlation with the chromosomal location of each gene, have provided valuable information about the evolutionary history of the IR gene families.

© 2010 Elsevier B.V. All rights reserved.

* Corresponding author. Tel.: +91 11 25307 322; fax: +91 11 25307 377.

E-mail address: [email protected] (A. Das).


Infection, Genetics and Evolution

journal homepage: www.elsevier.com/locate/meegid

1567-1348/$ – see front matter © 2010 Elsevier B.V. All rights reserved.

doi:10.1016/j.meegid.2010.04.002


million deaths (World Malaria Report, 2008). While malaria was almost on the verge of eradication during the 1950s and 1960s owing to effective vector control strategies based on the use of insecticides (e.g. DDT), in less than 10 years it staged a dramatic comeback, mainly because of the emergence of IR mosquito vectors that spread rapidly across all malaria-endemic parts of the globe (World Malaria Report, 2008). Considering that different, locally distributed species of Anopheles contribute to malaria transmission, it is imperative to determine the genetic pattern of IR in such vectors. Unfortunately, however, no whole genome information other than that of An. gambiae is available so far, which precludes an understanding of Anopheles-specific evolutionary genetic patterns of IR gene families. On the other hand, whole genome sequence information for various other taxa has provided opportunities to perform comparative genetic analyses for evolutionary inferences of IR in the model malaria vector, An. gambiae. Such preliminary research would be helpful in understanding the genetic basis of IR in malaria vectors in general, and in An. gambiae in particular.

We herein utilized the fully sequenced and annotated genome of the malaria vector An. gambiae, available in the public domain, and investigated the evolutionary pattern of genes of the three IR gene families (CYP, GST and COE). We ascertained the positions of the genes of each family on the different chromosomes of An. gambiae and determined the conservation pattern of each gene family across different taxa. Genetic architectural characteristics, e.g. gene size and intron number and size, were also retrieved and analyzed to draw evolutionary inferences about the genes in the three IR gene families. Furthermore, phylogenetic relationships among the genes of each individual family were determined and correlated with the chromosomal locations of the genes to hypothesize possible past evolutionary events that might have shaped the present organization of each gene family in the An. gambiae genome.

2. Materials and methods

2.1. Identification of IR genes and retrieval of orthology information

IR genes are spread throughout the genome of An. gambiae and are distributed on both arms of the autosomes and on the X-chromosome. Using the whole genome sequence information of An. gambiae provided in the Ensembl web database (http://www.ensembl.org, versions 49–50, accessed March–July 2008), we scanned each arm of the two autosomes and the whole X-chromosome for IR genes with a sliding window approach. The IR genes were identified on the basis of the functional annotation available at Ensembl, and only genes with known functions were considered for further analysis, leaving novel genes aside. Further, to find the pattern of conservation of the IR gene families, we retrieved orthology information for each IR gene in 38 different taxa from the Ensembl database, based on the hybrid method (Kuzniar et al., 2008).
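A minimal sketch of what a sliding-window scan for annotated IR genes could look like is given below. It assumes a hypothetical gene table of (id, start position, annotated name) tuples for one chromosome arm, treats any annotated name beginning with CYP, GST or COE as an IR gene, and uses illustrative window and step sizes. None of these choices comes from the study, and the study's actual retrieval from Ensembl is not reproduced here.

```python
# Hypothetical sketch: the prefixes, window and step sizes are illustrative
# choices, not parameters reported in the study.
IR_PREFIXES = ("CYP", "GST", "COE")


def sliding_window_ir_scan(genes, arm_length, window=1_000_000, step=500_000):
    """Return IR gene ids found on one chromosome arm, in genomic order.

    genes: iterable of (gene_id, start_bp, annotated_name), where
    annotated_name is None for novel genes without a known function.
    """
    genes = sorted(genes, key=lambda g: g[1])
    found = {}  # dict keeps first-seen order and drops duplicates from overlapping windows
    for w_start in range(0, arm_length, step):
        w_end = w_start + window
        for gene_id, start, name in genes:
            if not (w_start <= start < w_end):
                continue
            if name is None:
                continue  # novel genes are left aside, as described in the text
            if name.upper().startswith(IR_PREFIXES):
                found.setdefault(gene_id, name)
    return list(found)


if __name__ == "__main__":
    # Toy records for one arm (hypothetical ids, positions and names).
    toy_arm = [
        ("gene_a", 120_000, "CYP_example"),
        ("gene_b", 1_200_000, None),
        ("gene_c", 1_650_000, "GST_example"),
        ("gene_d", 2_400_000, "hypothetical kinase"),
    ]
    print(sliding_window_ir_scan(toy_arm, arm_length=3_000_000))  # ['gene_a', 'gene_c']
```

Overlapping windows are used only to mirror the "sliding" description; because each gene is recorded once, the result is the same as a single pass over the arm.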

2.2. Classification, mapping and intron pattern of IR genes

On the basis of orthology information, the IR genes of each of the three gene families (CYP, GST and COE) were classified into two categories: conserved and unique. To identify the chromosomal distribution of IR genes, these two categories were mapped on both arms of the autosomes and on the X-chromosome of An. gambiae. Furthermore, to know whether any gene family-specific intron pattern exists in IR genes, genetic architectural information such as gene length, intron length and intron number for each IR gene of all three gene families was collected from Ensembl. The conserved and unique genes of each of the three IR gene families were further classified on the basis of intron number. Moreover, the proportion of introns in different length ranges was also determined for genes of all three IR gene families. Additionally, to ascertain whether introns contribute to the overall length of a gene, Pearson's correlation coefficients (r) were calculated between gene size and total intron length and between gene size and intron number.
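As an illustration of the correlation step, the sketch below computes Pearson's r and the standard t statistic (with n - 2 degrees of freedom) used to test r against zero. The gene sizes and intron lengths in the usage example are made-up toy values, not data from the study. In practice, `scipy.stats.pearsonr` returns both r and a two-sided P-value.

```python
import math


def pearson_r(x, y):
    """Pearson's product-moment correlation coefficient between two samples."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two samples of equal length (n >= 3)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic with n - 2 degrees of freedom for H0: no correlation."""
    if abs(r) >= 1:
        return math.copysign(math.inf, r)  # perfect correlation: unbounded t
    return r * math.sqrt((n - 2) / (1 - r * r))


if __name__ == "__main__":
    # Toy values (hypothetical, not data from the study).
    gene_size = [1500, 2100, 3400, 5200, 8000]
    total_intron_length = [60, 150, 900, 2400, 4800]
    r = pearson_r(gene_size, total_intron_length)
    print(f"r = {r:.3f}, t = {t_statistic(r, len(gene_size)):.2f}")
```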

2.3. Sequence alignment and reconstruction of phylogenetic tree

To infer the evolutionary status of genes in an IR gene family, the nucleotide sequences of each gene of the three IR gene families were retrieved from Ensembl and aligned using the MegAlign computer program (DNA Star Inc., Madison, WI), based on the Clustal W algorithm. For each gene family, a neighbor-joining (NJ) phylogenetic tree was constructed using the Kimura distance formula (Kimura, 1980). Bootstrap values were calculated by generating 1000 randomized trees. Based on phylogenetic information, the genes of each IR family were classified into two categories: category I, genes that are phylogenetically close to each other; and category II, genes that are phylogenetically distant. Genes of category I were further classified into three subcategories based on their chromosomal locations: (a) phylogenetically close genes present on the same chromosome, (b) phylogenetically close genes present on different autosomes and (c) phylogenetically close genes present on the X-chromosome and an autosome. Genes belonging to category II were not phylogenetically close to any other gene of the family; hence, these genes were not included in the analysis.
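The Kimura (1980) two-parameter distance corrects the observed proportions of transitions (P) and transversions (Q) as d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q), and neighbor joining then builds a tree from the resulting distance matrix. The sketch below implements both in plain Python for illustration only; it is not the MegAlign implementation used in the study, and skipping sites with a gap or ambiguous base is an assumption. Bootstrap support would be obtained by resampling alignment columns with replacement (1000 times in the study) and rebuilding the tree each time, which is omitted here.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def kimura_2p(seq1, seq2):
    """Kimura (1980) two-parameter distance between two aligned sequences.

    Sites where either sequence has a gap or ambiguous base are skipped
    (pairwise deletion, an assumption of this sketch).
    """
    pairs = [(a, b) for a, b in zip(seq1.upper(), seq2.upper())
             if a in "ACGT" and b in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    n = len(pairs)
    diffs = [(a, b) for a, b in pairs if a != b]
    transitions = sum(1 for a, b in diffs
                      if {a, b} <= PURINES or {a, b} <= PYRIMIDINES)
    p = transitions / n                   # proportion of transitions
    q = (len(diffs) - transitions) / n    # proportion of transversions
    if 1 - 2 * p - q <= 0 or 1 - 2 * q <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


def neighbor_joining(names, dist):
    """Saitou-Nei neighbor joining; returns (parent, child, branch_length) edges.

    dist is a symmetric matrix (list of lists) ordered like names.
    Internal nodes are given hypothetical labels node0, node1, ...
    """
    d = {a: {b: dist[i][j] for j, b in enumerate(names)}
         for i, a in enumerate(names)}
    nodes, edges, counter = list(names), [], 0
    while len(nodes) > 2:
        n = len(nodes)
        r = {a: sum(d[a][b] for b in nodes if b != a) for a in nodes}
        best = None  # pair minimising the Q criterion
        for i in range(n):
            for j in range(i + 1, n):
                a, b = nodes[i], nodes[j]
                q = (n - 2) * d[a][b] - r[a] - r[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        u = f"node{counter}"
        counter += 1
        la = 0.5 * d[a][b] + (r[a] - r[b]) / (2 * (n - 2))
        edges += [(u, a, la), (u, b, d[a][b] - la)]
        d[u] = {}
        for c in nodes:
            if c not in (a, b):
                d[u][c] = d[c][u] = 0.5 * (d[a][c] + d[b][c] - d[a][b])
        nodes = [c for c in nodes if c not in (a, b)] + [u]
    a, b = nodes
    edges.append((a, b, d[a][b]))
    return edges
```

For a set of aligned genes, the distance matrix would be filled with `kimura_2p` for every pair and then passed to `neighbor_joining`.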

2.4. Test for gene conversion

To understand whether gene conversion has played any role in the expansion and evolution of gene families, we tested for gene conversion between each pair of genes in each of the three subcategories Ia, Ib and Ic (see above) using the GENECONV computer program version 1.18 (Sawyer, 1999). This program identifies gene conversion events based on identical fragments shared by a pair of sequences in nucleotide alignments. GENECONV assigns P-values to aligned sequences by two methods: (i) a global P-value, obtained by comparing each fragment with all possible fragments of the entire alignment using permutations, and (ii) approximate P-values for pairwise alignments, following the method suggested by Karlin and Altschul (1990). Detected gene conversion events were confirmed by statistically significant global P-values (≤0.05) in aligned sequences.
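GENECONV's actual fragment scoring and options are more elaborate than can be shown here. The sketch below only illustrates the idea behind a permutation-based global P-value under simplifying assumptions: a "fragment" is taken to be an unbroken run of polymorphic sites at which two sequences agree, and the null distribution is generated by shuffling the order of the polymorphic sites. The function names are hypothetical and are not part of GENECONV.

```python
import random
from itertools import combinations


def polymorphic_sites(alignment):
    """Column indices at which the aligned sequences are not all identical."""
    return [i for i in range(len(alignment[0]))
            if len({seq[i] for seq in alignment}) > 1]


def longest_identical_run(a, b, sites):
    """Longest run of consecutive polymorphic sites where a and b agree."""
    best = run = 0
    for i in sites:
        if a[i] == b[i]:
            run += 1
            best = max(best, run)
        else:
            run = 0
    return best


def max_fragment(alignment, sites):
    """Longest identical fragment over all sequence pairs (needs >= 2 sequences)."""
    return max(longest_identical_run(a, b, sites)
               for a, b in combinations(alignment, 2))


def global_p_value(alignment, n_perm=1000, seed=1):
    """Observed longest fragment and its permutation P-value."""
    sites = polymorphic_sites(alignment)
    observed = max_fragment(alignment, sites)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        shuffled = sites[:]
        rng.shuffle(shuffled)  # permute the order of polymorphic sites
        if max_fragment(alignment, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

A fragment would then be reported as a putative conversion tract when the returned P-value is at or below the 0.05 threshold mentioned above.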

3. Results

A total of 119 IR genes were identified from the 12,427 available annotated genes of An. gambiae. Of these, 76 (64%) were from the CYP gene family, 31 (26%) were from the GST family and only 12 (10%) were from the COE family. We utilized all 119 genes to draw evolutionary inferences about IR genes in An. gambiae.
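The family percentages follow directly from the reported counts and can be checked in a few lines:

```python
# Family sizes as reported in the text (119 IR genes in total).
family_counts = {"CYP": 76, "GST": 31, "COE": 12}

total = sum(family_counts.values())
percentages = {family: round(100 * n / total) for family, n in family_counts.items()}

print(total, percentages)  # 119 {'CYP': 64, 'GST': 26, 'COE': 10}
```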

3.1. Conservation of IR genes of An. gambiae in other taxa

We searched for orthology information for all 119 IR genes of An. gambiae in 38 different taxa. Except for six genes of the CYP family, all genes were found to have orthologs in one or more taxa. A major proportion of the genes of both the CYP and the COE families were found to be conserved in Aedes aegypti and Drosophila melanogaster, whereas the majority of genes of the GST family were found to be conserved in Ciona species (Fig. 1). Interestingly, 21 genes of the CYP and 20 genes of the GST family were found to share orthology with Caenorhabditis elegans and Saccharomyces cerevisiae genes,


respectively. Overall, eight genes of the GST and four genes of the CYP family were found to be highly conserved, as these genes were orthologous in almost all of the 38 studied taxa. Thus, the overall conservation pattern of An. gambiae IR genes across the 38 studied taxa reflects comparatively higher conservation of the GST family, followed by the CYP and then the COE family.
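One way to tabulate the conservation pattern described above is sketched below: for each gene, the set of taxa in which an ortholog was found is recorded, from which per-taxon ortholog counts (as plotted in Fig. 1), genes without any ortholog, and "highly conserved" genes can be derived. The input format and the 90% cut-off standing in for "almost all the 38 studied taxa" are hypothetical choices, not values from the study.

```python
from collections import Counter


def summarize_orthology(orthologs, n_taxa=38, high_fraction=0.9):
    """Summarise ortholog presence for a set of genes.

    orthologs: dict gene_id -> set of taxon codes with an ortholog.
    high_fraction: hypothetical cut-off for "almost all" taxa.
    Returns (orthologs per taxon, genes with no ortholog, highly conserved genes).
    """
    per_taxon = Counter(t for taxa in orthologs.values() for t in taxa)
    unique = sorted(g for g, taxa in orthologs.items() if not taxa)
    highly_conserved = sorted(g for g, taxa in orthologs.items()
                              if len(taxa) >= high_fraction * n_taxa)
    return per_taxon, unique, highly_conserved


if __name__ == "__main__":
    all_taxa = [f"taxon{i}" for i in range(38)]  # placeholder taxon codes
    toy = {
        "gst_1": set(all_taxa[:36]),        # orthologous in 36 of 38 taxa
        "cyp_1": {"taxon0", "taxon1"},      # orthologous in two taxa
        "cyp_2": set(),                     # no ortholog: An. gambiae-specific
    }
    per_taxon, unique, high = summarize_orthology(toy)
    print(per_taxon["taxon0"], unique, high)
```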

3.2. Distribution patterns and mapping of IR genes in chromosomes of An. gambiae

Using information from the Ensembl web database, we examined the distribution patterns of all 119 IR genes and mapped them on the autosomes and the X-chromosome of An. gambiae. Although the general overview of distributional patterns indicates randomness, non-random distribution was observed in some cases; e.g. a high proportion (80%) of the genes of the COE family was confined to the left arm of the second chromosome. Similarly, genes of the CYP and GST families were mainly found on the right arms of the second and third chromosomes, including five An. gambiae-specific unique genes of the CYP family (Fig. 2).

We also mapped the location of each IR gene on the different chromosomal arms of An. gambiae (Fig. 3). It is evident from the figure that genes from some of the subfamilies of all three

Fig. 1. Number of orthologous genes in the CYP, GST and COE gene families of An. gambiae in 38 different taxa. Aa: Aedes aegypti, Dm: Drosophila melanogaster, Bt: Bos taurus, Cf: Canis familiaris, Cp: Cavia porcellus, Dr: Danio rerio, Ci: Ciona intestinalis, Cs: Ciona savignyi, Dn: Dasypus novemcinctus, Et: Echinops telfairi, Ec: Equus caballus, Eeu: Erinaceus europaeus, Fc: Felis catus, Gg: Gallus gallus, Ga: Gasterosteus aculeatus, Hs: Homo sapiens, La: Loxodonta africana, Mml: Macaca mulatta, Mmr: Microcebus murinus, Md: Monodelphis domestica, Mms: Mus musculus, Ml: Myotis lucifugus, Op: Ochotona princeps, Oa: Ornithorhynchus anatinus, Oc: Oryctolagus cuniculus, Ol: Oryzias latipes, Og: Otolemur garnettii, Pt: Pan troglodytes, Pp: Pongo pygmaeus, Rn: Rattus norvegicus, Sa: Sorex araneus, St: Spermophilus tridecemlineatus, Tr: Takifugu rubripes, Tn: Tetraodon nigroviridis, Tb: Tupaia belangeri, Xt: Xenopus tropicalis, Sc: Saccharomyces cerevisiae, Ce: Caenorhabditis elegans.

Fig. 2. Distribution of conserved and unique IR genes of the CYP, GST and COE gene families in different chromosomal arms of An. gambiae.

Fig. 3. Chromosomal locations of different IR genes in An. gambiae (maps not to scale).


IR gene families were arranged in clusters; e.g. genes of the CYP subfamilies (CYP4, CYP6 and CYP325) and of the GST subfamilies (GSTD and GSTE) were found on chromosome arms 2R and 3R (Fig. 3). Likewise, genes of the COEB and COEJ subfamilies were found to be clustered on the left arm of the second chromosome (2L) of the An. gambiae genome (Fig. 3).
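The text does not state how a cluster was defined. Purely as an illustration, the sketch below groups genes of the same subfamily on the same chromosome arm whenever consecutive start positions lie within a fixed distance. The 100 kb gap, the minimum cluster size of two and the gene records in the usage example are hypothetical.

```python
from collections import defaultdict


def find_clusters(genes, max_gap=100_000, min_size=2):
    """Group same-subfamily genes on the same arm whose starts lie close together.

    genes: iterable of (gene_id, subfamily, arm, start_bp).
    max_gap and min_size are hypothetical thresholds, not values from the study.
    Returns a list of (subfamily, arm, [gene ids in genomic order]).
    """
    groups = defaultdict(list)
    for gene_id, subfamily, arm, start in genes:
        groups[(subfamily, arm)].append((start, gene_id))

    clusters = []
    for (subfamily, arm), members in sorted(groups.items()):
        members.sort()
        current = [members[0]]
        for start, gene_id in members[1:]:
            if start - current[-1][0] <= max_gap:
                current.append((start, gene_id))  # still within the gap: extend cluster
            else:
                if len(current) >= min_size:
                    clusters.append((subfamily, arm, [g for _, g in current]))
                current = [(start, gene_id)]       # start a new candidate cluster
        if len(current) >= min_size:
            clusters.append((subfamily, arm, [g for _, g in current]))
    return clusters


if __name__ == "__main__":
    toy = [
        ("g1", "CYP6", "2R", 1_000_000), ("g2", "CYP6", "2R", 1_040_000),
        ("g3", "CYP6", "2R", 1_090_000), ("g4", "CYP6", "2R", 5_000_000),
        ("g5", "COEB", "2L", 300_000), ("g6", "COEB", "2L", 350_000),
        ("g7", "GSTE", "3R", 2_000_000),
    ]
    for cluster in find_clusters(toy):
        print(cluster)
```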

3.3. Characteristics and distribution of introns in IR genes

Since introns are thought to contribute to the evolution of the functional properties of genes, we ascertained the presence and distribution of introns in each of the 119 IR genes from all three gene families. Interestingly, all but three of the 119 genes (one unique gene of the CYP family and two genes of the GST family) were found to bear introns. However, the number of introns was restricted to one or two in the majority (68%) of the IR genes, and the rest (32%) contained ≥3 introns. In the CYP family, while all the unique genes contain one or two introns, about 60% of the conserved genes had one or two introns and the remaining 40% contained ≥3 introns (Fig. 4). Furthermore, of these 40% of CYP genes containing ≥3 introns, a major proportion (70%) had >3 introns (Fig. 4). This situation is unique among the IR gene families, as no gene of any other family was found to contain more than three introns. The intron distribution pattern in genes of the GST family looks somewhat different: almost half of the genes were either intron-less or contained only one intron, and the remaining half had only two introns (Fig. 4). No gene of this family was found to have ≥3 introns (Fig. 4). Interestingly, all the genes of the COE family had introns; the majority (75%) contained one or two introns, and as many as 25% had ≥3 introns (Fig. 4). In summary, genes with ≥3 introns were present only in the conserved CYP and the COE families, while genes of the GST family had at most two introns.

To determine the proportion of introns of different sizes in genes of all three IR gene families, we categorized the introns of all 119 IR genes on the basis of length. Interestingly, the major proportion of introns was 1–100 bp long in all three gene families except the conserved CYP class (Fig. 5). The conserved CYP genes contain a high proportion (>60%) of introns longer than 100 bp, while in the GST family this proportion was about 30% (Fig. 5). Surprisingly, in the unique CYP genes, almost all introns were up to 100 bp long, while in the COE and the GST families the proportions of introns up to 100 bp long were 72% and 67%, respectively (Fig. 5). Thus, the majority of introns in the three families are <100 bp long, but only the conserved genes of the CYP family were found to have a substantial proportion of long (>200 bp) introns. Furthermore, to test the null hypothesis of no correlation between the number and size of introns and the size of the gene per se, we calculated Pearson's correlation coefficient (r) independently for each of the three IR gene families (CYP, GST and COE). Statistically significant positive r values for the three gene families (Table 1) strongly indicate that the size of IR genes in An. gambiae is, in fact, dependent on both the number and the length of introns.
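The intron classes used in this section can be expressed as simple binning functions. In this sketch the 0, 1–2 and ≥3 classes for intron number follow the text, whereas the intermediate 101–200 bp class for intron length is an assumption (the text mentions only the 1–100 bp, >100 bp and >200 bp ranges).

```python
from collections import Counter


def intron_number_class(n_introns):
    """Bin a gene by intron count: '0', '1-2' or '>=3' (classes used in the text)."""
    if n_introns == 0:
        return "0"
    return "1-2" if n_introns <= 2 else ">=3"


def intron_length_class(length_bp):
    """Bin an intron by length; the 101-200 bp class is an assumption."""
    if length_bp <= 100:
        return "1-100"
    return "101-200" if length_bp <= 200 else ">200"


def proportions(labels):
    """Fraction of items falling in each class."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


if __name__ == "__main__":
    # Toy intron counts per gene and toy intron lengths (hypothetical values).
    print(proportions(intron_number_class(n) for n in [0, 1, 2, 3, 5]))
    print(proportions(intron_length_class(x) for x in [50, 80, 150, 250]))
```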

3.4. Phylogenetic relationships among genes of the IR gene families

To ascertain evolutionary relationships among the genes of each IR gene family, we constructed three neighbor-joining (NJ) trees (Fig. 6a–c). While many genes showed close phylogenetic relationships, some were very distantly related. This pattern is most clearly visible in the CYP family, the largest of the three IR gene families, and it allowed us to ascertain whether genes that are phylogenetically close are also closely placed in the genome. The NJ trees (Fig. 6a–c) clearly revealed that genes of subcategory (a) were much more prevalent than those of the other two subcategories in all three IR gene families, meaning that most phylogenetically close genes are also physically close to each other in the genome. However, the distribution of genes of this subcategory differed among families: whereas in the GST and COE families the genes of subcategory (a) were distributed in a straightforward manner, in the CYP family they were haphazardly distributed. Genes of subcategories (b) and (c) were found in small stretches in the CYP and GST families (Fig. 6a and b), and no gene of these subcategories was found in the COE family (Fig. 6c). In contrast, a relatively small number of genes with no phylogenetic relationship to other genes (category II) were found in all three gene families, with a fragmented distribution. The distribution of genes in subcategory (c) (between the X-chromosome and

Fig. 4. Proportion of genes with different numbers of introns in the three IR gene families.

Fig. 5. Proportion of introns of various sizes in the three IR gene families.

Table 1. Pearson's correlation coefficient (r) and probability (P) values between gene size and intron number and length, separately for all three IR gene families.

Correlation                    CYP r   CYP P     GST r   GST P     COE r   COE P
Gene size and intron number    0.35    <0.005    0.35    0.050     0.75    <0.05
Gene size and intron length    1       <0.0001   0.99    <0.0001   1       <0.0001


Fig. 6. (a) Neighbor-joining (NJ) phylogenetic tree of the CYP gene family [blue bar represents subcategory (a), yellow bar represents subcategory (b), green bars represent subcategory (c) and white bar represents category II]; (b) neighbor-joining (NJ) phylogenetic tree of the GST gene family; (c) neighbor-joining (NJ) phylogenetic tree of the COE gene family.


autosomes) was biased towards associations between the X-chromosome and the second chromosome in the CYP family (Fig. 6a). This is evident from the fact that, of the six such clusters found in total (indicated by green bars in the figure; five in the CYP and one in the GST family), four consist of genes on the X- and the second chromosomes.
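The assignment of phylogenetically close gene pairs to subcategories (a)–(c), and the tally of which autosome is paired with the X-chromosome in subcategory (c), can be sketched as follows. Counting both arms of an autosome (e.g. 2L and 2R) as "the same chromosome" is an assumption of this sketch, and the arm labels in the usage example are hypothetical.

```python
from collections import Counter


def chromosome_of(arm):
    """Map an arm label such as '2L', '3R' or 'X' to its chromosome ('2', '3', 'X')."""
    return arm.rstrip("LR")


def subcategory(arm1, arm2):
    """Subcategory of a pair of phylogenetically close genes.

    (a) same chromosome, (b) different autosomes, (c) X-chromosome and autosome.
    Treating both arms of an autosome as the same chromosome is an assumption.
    """
    c1, c2 = chromosome_of(arm1), chromosome_of(arm2)
    if c1 == c2:
        return "a"
    if "X" in (c1, c2):
        return "c"
    return "b"


def x_partner_counts(pairs):
    """For subcategory (c) pairs, count which autosome is paired with X."""
    partners = Counter()
    for arm1, arm2 in pairs:
        if subcategory(arm1, arm2) == "c":
            c1, c2 = chromosome_of(arm1), chromosome_of(arm2)
            partners[c2 if c1 == "X" else c1] += 1
    return partners


if __name__ == "__main__":
    toy_pairs = [("X", "2R"), ("2L", "X"), ("X", "3R"), ("2R", "3L")]
    print([subcategory(a, b) for a, b in toy_pairs], x_partner_counts(toy_pairs))
```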

To understand whether any evolutionary event has shaped the present distribution of IR genes in the An. gambiae genome, we tested for gene conversion events in each pair of genes of all three subcategories separately. Gene conversion events were detected only between pairs of genes in subcategory (a) (phylogenetically close genes present on the same chromosome). Statistically significant results were obtained for ten different pairs of genes in the CYP family, two pairs in the GST family and only one pair in the COE family (Fig. 6a–c).

4. Discussion

Members of the three gene families, CYP, GST and COE, collectively contribute to the metabolism, development and IR mechanisms of An. gambiae (Ranson et al., 2002). Adopting comparative genomic approaches, the present study attempts to infer the evolutionary status of genes and gene families conferring IR in An. gambiae.

4.1. Chromosome-biased IR gene family evolution

The results of the present study indicate a biased distribution of IR genes in the genome of An. gambiae. This is evident from the observed accumulation of the majority of the CYP and GST genes on the right arms of both autosomes, and more specifically on the second chromosome (2R). This bias could be due to the sub-metacentric structure of An. gambiae chromosomes, because the large right arms of both the second and the third chromosomes (2R and 3R) could accommodate a greater number of genes (Xia et al., 2008). Furthermore, the frequently observed paracentric inversions in the 2R arm of An. gambiae (Sharakhov et al., 2002), which modulate gene conversion and recombination, could also result in gene family expansion, particularly in the 2R arm. In addition, the fact that positive natural selection favors chromosomes carrying more beneficial mutations in redundant copies than others (Ohta, 1986) might explain the observed chromosomal bias. However, this might not hold for the COE family, as the majority of the COE genes were located on the 2L arm (Fig. 2). This suggests that genes belonging to the COE family follow a different pathway of expansion. Moreover, the observed clustered organization of some IR genes of An. gambiae on different chromosomes (Fig. 3) might be an added advantage, enabling more powerful expression through a synergistic effect for efficient function. This has been proposed for the COE A and COE E genes in Culex pipiens (Rooker et al., 1996): these two genes, present on chromosome II of C. pipiens, contribute to esterase overproduction that confers resistance to organophosphates (Guillemaud et al., 1997), and are thus thought to behave as a single supergene (Raymond et al., 1998).

4.2. Gene orthology patterns and evolution of IR gene families

Almost all the presently studied IR genes (except six of the CYP family) were found to be conserved in one or more other taxa,


signifying that IR genes in An. gambiae are generally conserved. However, the pattern and degree of conservation vary among the three IR gene families. Consistent with earlier observations on the overall genetic closeness of An. gambiae to other closely related insect taxa, e.g. Ae. aegypti and D. melanogaster (Severson et al., 2004), we also detected conservation of many CYP genes and all the COE genes in these two taxa. Further, Ae. aegypti was found to contain the maximum number of orthologs of the CYP and COE gene families. Strikingly, however, the maximum number of GST genes were found to be conserved in Ciona species (Fig. 1). Although no clear explanation for this finding can be provided, (i) the previously reported genetic closeness of Ciona intestinalis to vertebrates rather than to Drosophila (Dehal et al., 2002) and (ii) the long-term vectorial association of anopheline mosquitoes with vertebrates (Capasso, 1998; Colluzi, 1999) lead us to propose that this phenomenon may reflect horizontal gene transfer events between Anopheles and Ciona species, possibly through a vertebrate link. However, further studies are needed to test this hypothesis. Similarly, 21 genes of the CYP and 20 genes of the GST family were found to be conserved in C. elegans and S. cerevisiae, respectively. This may be explained in terms of the functional importance of these genes, whose sequences were preserved by natural selection, as indicated by the conservation of paralogous genes across taxa (Vavouri et al., 2008). Furthermore, while the many (25%) highly conserved genes (i.e. orthologous in the maximum number of taxa) in the GST family reflect its slower evolution (see below), the comparatively small percentage of highly conserved genes (~6%) in the CYP family indicates more rapid and lineage-specific evolution (see below).

The overall evolutionary pattern of the three IR gene families of An. gambiae, based on genetic conservation among various other taxa, was found to differ considerably among the three gene families. For example, as discussed above, a high proportion of highly conserved genes, the absence of An. gambiae-specific unique genes and genetic closeness to Ciona species and S. cerevisiae provide evidence that the GST gene family has a much slower evolutionary rate than the other two (see below). This hypothesis is further substantiated by the fact that genes of this family, which play important roles in conserved physiological pathways (Enayati et al., 2005; Low et al., 2007), are more likely to be under selective constraint. In contrast, the presence of very few highly conserved genes and the detection of An. gambiae-specific genes point towards ongoing and rapid evolution in the CYP family. In turn, the high conservation of genes of the COE family in closely related insect species (Ae. aegypti and D. melanogaster), and the absence of orthologs in any other distant taxa, suggest that this gene family originated fairly recently and is evolving within the lineage.

4.3. Intron organization and IR gene family evolution

The view that introns play a potential role in the functional evolution of genes (Fedorova and Fedorov, 2003) is supported by the presence of introns in almost all genes of the three IR gene families, except for a single unique CYP gene and some genes of the GST family (Fig. 4). The evolutionary status of the three IR gene families was further inferred from the number and size of introns and the correlation of these two parameters with the overall size of the genes. While a major proportion of IR genes were found to have one or two introns, and introns were generally <100 bp long, about 50% of the genes of the CYP and COE families contain >2 introns. This observation supports earlier reports of intron gain over loss in paralogous gene families (Babenko et al., 2004). Furthermore, intron content has been considered an indicator of rapid gene evolution, because introns favor recombination (Gilbert et al., 1997; Duret, 2001) and greatly increase the rate of protein evolution (Fedorova and Fedorov, 2003). Thus, the major proportion of genes with more than two introns in the CYP and COE families indicates the potential for rapid evolution of these genes. In contrast, almost half of the genes of the GST family were either intron-less or had only one intron, reflecting comparatively slower evolution. This intron-poor architecture might be explained by the fact that highly expressed genes favor intron-deficient organization to minimize the cost of transcription and splicing (Castillo-Davis et al., 2002; Sharma et al., 2010). The reported high expression of GST genes during exposure to DDT (Prapanthadara et al., 1993; Ranson et al., 2001; Ortelli et al., 2003), as well as during oxidative stress and lipid peroxidation (Hayes and Pulford, 1995; Lumjuan et al., 2007), supports the view that genes of this family contribute not only to IR but also to several housekeeping metabolic activities that demand high and rapid expression. This could be why a large proportion of GST genes are intron-poor. Such highly expressed genes are expected to evolve slowly (Pal et al., 2001; Jordan et al., 2004; Drummond et al., 2005; Shakhnovich and Koonin, 2006; Sharma et al., 2010), as proposed for the GST family (see above).

Moreover, a considerably larger share of long introns (>200 bp; 34%) was detected in the CYP family than in the GST and COE families, where introns of ≤100 bp were frequently observed (Fig. 6). Thus, considering the number and size of introns in the three gene families, it could be summarized that the conserved CYP and COE families have accumulated introns, whereas the GST family has few introns of considerable length. Since introns are believed to act as raw material on which natural selection can act (Cory, 2000; Bergman, 2001), the observed intron pattern indicates a potential for evolution in the CYP and COE gene families, whereas the scarcity of introns may reflect much slower evolution of the GST gene family, as also corroborated by the conservation pattern (see above). Furthermore, the observed positive correlation of gene size with intron number and length (Table 1) suggests that gene size in all three IR gene families has increased through the accumulation of introns, as was also observed in gene families of P. falciparum (Sharma et al., 2010).
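The positive correlation between gene size and intron number and length (Table 1) can be illustrated with a Pearson correlation coefficient. This is a sketch only: the coefficient actually used for Table 1 is not stated in this section, and the gene records below are hypothetical toy values.

```python
import math

# Hypothetical toy records: (gene length bp, intron count, total intron length bp).
# Illustrative values only; they are NOT taken from Table 1.
GENES = [
    (1500, 1, 70),
    (1800, 2, 160),
    (2300, 3, 420),
    (2900, 4, 700),
    (1400, 0, 0),
]


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


gene_size = [g[0] for g in GENES]
intron_number = [g[1] for g in GENES]
intron_length = [g[2] for g in GENES]

if __name__ == "__main__":
    print("size vs intron number:", round(pearson_r(gene_size, intron_number), 3))
    print("size vs intron length:", round(pearson_r(gene_size, intron_length), 3))
```

A rank-based coefficient (Spearman) would be a reasonable alternative if intron counts are strongly skewed; the choice here is illustrative.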

4.4. Footprints of diverse evolutionary events in IR gene family expansion

Gene families are shaped by different evolutionary processes; however, these processes are not observed directly but are inferred from sequences (Liberles and Dittmar, 2008). In the present study, evidence from DNA sequences was used to ascertain the evolutionary events that might have shaped the present DNA sequence architecture of each gene family.

4.4.1. The CYP gene family

Phylogenetic relationships among genes of the CYP family revealed, in some cases, the evolutionary closeness of unique CYP genes to conserved CYP genes (Fig. 6a, see Supplementary Table 1). This suggests that evolutionary forces might have facilitated the origin of unique genes from existing conserved genes (see Supplementary Table 1, Fig. 6a), and it indicates preferential duplication of conserved genes (Davis and Petrov, 2004) in the CYP gene family. However, newly acquired unique gene copies can generally be retained in the genome only if the duplicated gene has attained functional divergence (Ohno, 1970; Hughes, 1994; Force et al., 1999; Roth et al., 2007). Thus, the presence of such unique CYP genes might bear footprints of the functional versatility (Tijet et al., 2001; Ranson et al., 2002; Feyereisen, 2006) achieved by the CYP gene family. Furthermore, conserved genes that were tandemly placed on the same chromosome were also found to
be phylogenetically close (Fig. 6a), which may be a sign of gene conversion, as such events between duplicated genes can produce similar gene copies (Hahn et al., 2007). This might be the case for genes of subcategory (a), as ten statistically significant gene conversion events were found between different pairs of genes (Figs. 3 and 6a). Moreover, the hypothesis that the frequency of gene conversion is inversely correlated with the physical distance between duplicated gene pairs (Drouin, 2002) is corroborated in the present study, as gene conversion events were found only in subcategory (a) (phylogenetically close genes located in clusters on the same chromosome). It has also been reported that gene conversion is mostly evident in gene families commonly subjected to positive selection, as variants produced by gene conversion are more likely to be retained in the genome (Zimmer et al., 1980; Mondragon-Palomino and Gaut, 2005). Furthermore, the close phylogenetic relationships found between genes on different chromosomes (Fig. 6a) may point to other evolutionary events known to drive the evolution of gene families, such as translocation, segmental duplication and chromosomal rearrangements (Cannon et al., 2004; Leister, 2004). However, duplicated gene pairs found on the X chromosome and autosomes (subcategory (c)) might also be explained by germline-specific X-chromosome inactivation (McKee and Handel, 1993; Handel, 2004; Maciejowski et al., 2005), by which essential genes of the X chromosome might have been copied to another location in the genome, most probably to closely placed autosomes (Maciejowski et al., 2005; Heger and Ponting, 2007). In the present study, duplicated genes between autosomes and the sex chromosome were mainly found between X and 2R, similar to the observed duplication of X-chromosome genes to 3R in D. yakuba (Heger and Ponting, 2007). Hence, relating the chromosomal location of genes to their phylogenetic status further revealed that expansion of the CYP gene family might not be due solely to gene duplication; other processes, e.g. gene conversion, recombination, and gene duplication followed by translocation, might have led to the expansion of the CYP gene family in An. gambiae.
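The three subcategories used in this discussion can be expressed as a rule on the chromosomal locations of a pair of phylogenetically close (duplicated) genes: (a) both genes on the same chromosome, (b) genes on two different autosomes, and (c) one gene on an autosome and one on the X chromosome. The sketch below assumes locations are given as An. gambiae arm labels (X, 2R, 2L, 3R, 3L) and, as an assumption not stated in the text, treats the two arms of one chromosome as the same chromosome; the gene names are hypothetical.

```python
from collections import Counter


def chromosome_of(arm: str) -> str:
    """Map an An. gambiae arm label such as '2R' or '3L' to its chromosome ('2', '3' or 'X')."""
    # Assumption: the two arms of one chromosome count as the same chromosome.
    return arm.rstrip("RL")


def duplication_subcategory(arm_a: str, arm_b: str) -> str:
    """Assign a duplicated gene pair to subcategory 'a', 'b' or 'c' from its locations."""
    chrom_a, chrom_b = chromosome_of(arm_a), chromosome_of(arm_b)
    if chrom_a == chrom_b:
        return "a"  # same chromosome, e.g. tandem clusters
    if "X" in (chrom_a, chrom_b):
        return "c"  # one autosome and the X chromosome
    return "b"      # two different autosomes


# Hypothetical pairs of phylogenetically close genes: (gene, arm, gene, arm).
TOY_PAIRS = [
    ("gene1", "2R", "gene2", "2R"),
    ("gene3", "3R", "gene4", "3L"),
    ("gene5", "2R", "gene6", "3R"),
    ("gene7", "X", "gene8", "2R"),
]

if __name__ == "__main__":
    tally = Counter(duplication_subcategory(a, b) for _, a, _, b in TOY_PAIRS)
    print(dict(tally))  # prints {'a': 2, 'b': 1, 'c': 1}
```

If subcategories were instead defined per chromosome arm, `chromosome_of` could simply return the arm label unchanged.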

4.4.2. The GST gene family

The GST gene family consists of 31 genes, all of which were found to be conserved in one or more taxa. The majority of the GST genes were found to belong to subcategory (a), a pattern that suggests a role for gene conversion in the origin of GST genes, as supported by the identification of two statistically significant gene conversion events in subcategory (a) of the GST phylogeny (Fig. 6b). Duplicated genes between two autosomes (subcategory (b)) and between an autosome and the sex chromosome (subcategory (c)) were observed in only two cases in the phylogenetic tree of the GST gene family (Fig. 6b). Hence, the present observations on GST genes point to a major contribution of gene conversion and duplication to the expansion of the GST gene family. However, the small number of genes in this family may limit the role of other evolutionary events, as large gene families are more likely to undergo non-homologous pairing, unequal crossing over and segmental duplication (Hahn et al., 2005). Thus, the pattern observed in the GST gene family points to gene duplication and gene conversion as the major events shaping its evolution in An. gambiae.
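Gene conversion events in all three families were reported as statistically significant. The reference list includes the GENECONV package (Sawyer, 1999), which detects conversion tracts using a related approach. The sketch below is not GENECONV; it is a simplified illustration of the underlying idea, under stated assumptions: among polymorphic alignment columns, find the longest run of consecutive sites at which two sequences agree, then estimate how often a run at least that long arises when the order of polymorphic sites is randomly permuted. The toy alignment, function names and permutation count are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def polymorphic_columns(alignment: Sequence[str]) -> List[int]:
    """Return indices of columns with more than one distinct non-gap base."""
    columns = []
    for i in range(len(alignment[0])):
        bases = {seq[i] for seq in alignment if seq[i] != "-"}
        if len(bases) > 1:
            columns.append(i)
    return columns


def longest_agreement_run(seq_a: str, seq_b: str, columns: Sequence[int]) -> int:
    """Longest run of consecutive listed columns where seq_a and seq_b share a non-gap base."""
    best = current = 0
    for i in columns:
        if seq_a[i] == seq_b[i] and seq_a[i] != "-":
            current += 1
            best = max(best, current)
        else:
            current = 0
    return best


def conversion_scan(alignment: Sequence[str], idx_a: int, idx_b: int,
                    n_perm: int = 1000, seed: int = 0) -> Tuple[int, float]:
    """Observed longest agreement run for one pair and its permutation p-value."""
    columns = polymorphic_columns(alignment)
    seq_a, seq_b = alignment[idx_a], alignment[idx_b]
    observed = longest_agreement_run(seq_a, seq_b, columns)
    rng = random.Random(seed)
    shuffled = list(columns)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # permute the order of polymorphic sites
        if longest_agreement_run(seq_a, seq_b, shuffled) >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical toy alignment: sequences 0 and 1 share a block that 2 lacks.
TOY_ALIGNMENT = [
    "AAAAAAAAAA",
    "AAAAATTTTT",
    "CCCCCTTTTT",
]

if __name__ == "__main__":
    run, p = conversion_scan(TOY_ALIGNMENT, 0, 1, n_perm=200)
    print(f"longest agreement run = {run}, permutation p = {p:.3f}")
```

GENECONV additionally supports mismatch penalties within fragments and corrects for multiple comparisons across all sequence pairs, neither of which this sketch attempts.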

4.4.3. The COE gene family

The COE gene family consists of only 12 genes, and no species-specific gene was detected in the present study. The phylogeny of COE genes reflects the role of gene conversion in the expansion of the COE gene family, as all genes were found to belong to subcategory (a) and hence to be present on the same chromosome (Fig. 6c). Two statistically significant gene conversion events were detected between COE genes, and genes belonging to subcategories (b) and (c) were entirely absent (Fig. 6c). Thus, the restricted distribution of COE genes and the simple phylogenetic relationships among them might reflect a recent and lineage-specific expansion of the COE gene family in An. gambiae.

As a whole, the results of the present study revealed signatures of different evolutionary forces that have shaped the expansion of the three IR gene families in the most effective malaria vector of Africa. Specifically, they point to rapid evolution of the CYP gene family, much slower evolution of the GST gene family and lineage-specific expansion of the COE gene family in An. gambiae. These family-specific evolutionary conclusions are further supported by genetic diversity studies, e.g. high polymorphism in CYP (Sundberg, 2005) and COE genes (Guillemaud et al., 1996), and by the finding that various GST genes are 'ancestral' and 'conserved' across taxa (Sheehan et al., 2001; Low et al., 2007; Walters et al., 2009). Moreover, as observed here in An. gambiae, the GST gene family is organized in clusters in other model organisms, e.g. D. melanogaster, human and mouse, and gene conversion has been proposed as the mechanism behind such organization (Toung et al., 1993; Nelson et al., 2004). Comparative genomic analysis with D. melanogaster has also revealed considerable expansion of these IR gene families in the mosquito (Waterhouse et al., 2008). Thus, the present results on the evolutionary genetics of the three IR gene families corroborate earlier findings, both in general and in An. gambiae in particular.

In conclusion, the present study not only describes the basic properties of the IR genes of all three families but also establishes evolutionary inter-relationships among the genes and defines the roles of different evolutionary forces in shaping present-day genetic patterns. While past evolutionary events were inferred from phylogenetic inter-relationships and gene conversion events, the present scenario was revealed by the orthology studies. Moreover, the differential intron content observed in the IR gene families may reflect the future evolutionary potential of IR genes in An. gambiae. The results also revealed that all three IR gene families have been evolving independently, possibly owing to differential selective pressures on different genes. Other events, such as unequal crossing over, gene duplication, gene conversion and segmental duplication, might also occur frequently in the An. gambiae genome, leading to the origin and expansion of IR gene families. These events are more likely to have occurred on the right arm of the second chromosome (2R), and the right arms of both autosomes are possible hot-spots for IR gene accumulation. However, the environmental factors that strongly influence the direction of evolution of genes and gene families in the genome cannot be ignored. In any case, because the study is entirely computational, its conclusions need verification with other approaches (e.g. population genetic studies of different IR genes) and in other malaria vectors of local importance.

Acknowledgments

The authors thank Prof. A. P. Dash, former Director of NIMR, for facilities and encouragement, and the Indian Council of Medical Research (ICMR) for intramural funding. Thanks are also due to Rama Rajendran and Dipti Bhakhuni for support during the initial stage of the work, and to two anonymous reviewers for critical and constructive comments on an earlier version of the manuscript.

Appendix A. Supplementary data

Supplementary data associated with this article can be found in the online version at doi:10.1016/j.meegid.2010.04.002.


References

Babenko, V.N., Rogozin, I.B., Mekhedov, S.L., Koonin, E.V., 2004. Prevalence of intron gain over intron loss in the evolution of paralogous gene families. Nucleic Acids Res. 32, 3724–3733.

Bergman, J., 2001. The functions of introns: from junk DNA to designed DNA. Pers. Sci. Christ. Faith 53, 170–178.

Brogdon, W.G., McAllister, J.C., 1998. Insecticide resistance and vector control. Emerg. Infect. Dis. 4, 605–612.

Cannon, S.B., Mitra, A., Baumgarten, A., Young, N.D., May, G., 2004. The role of segmental and tandem gene duplication in the evolution of large gene families in Arabidopsis thaliana. BMC Plant Biol. 4, 10.

Capasso, L., 1998. The origin of human malaria. Int. J. Anthropol. 13, 165–175.

Castillo-Davis, C.I., Mekhedov, S.L., Hartl, D.L., Koonin, E.V., Kondrashov, F.A., 2002. Selection for short introns in highly expressed genes. Nat. Genet. 31, 415–418.

Coluzzi, M., 1999. The clay feet of the malaria giant and its African roots: hypotheses and inferences about origin, spread and control of Plasmodium falciparum. Parassitologia 41, 277–283.

Cory, B., 2000. Evolution’s fuel. New Scientist 2235, 50–51.

Davis, J.C., Petrov, D.A., 2004. Preferential duplication of conserved proteins in eukaryotic genomes. PLoS Biol. 2, 318–326.

Dehal, P., Satou, Y., Campbell, R.K., Chapman, J., Degnan, B., De Tomaso, A., et al., 2002. The draft genome of Ciona intestinalis: insights into chordate and vertebrate origin. Science 298, 2157–2167.

Drouin, G., 2002. Characterization of the gene conversions between the multigene family members of the yeast genome. J. Mol. Evol. 55, 14–23.

Drummond, D.A., Bloom, J.D., Adami, C., Wilke, C.O., Arnold, F.H., 2005. Why highly expressed proteins evolve slowly. Proc. Natl. Acad. Sci. U.S.A. 102, 14338–14343.

Duret, L., 2001. Why do genes have introns? Recombination might add a new piece to the puzzle. Trends Genet. 17, 172–175.

Enayati, A.A., Ranson, H., Hemingway, J., 2005. Insect glutathione transferases and insecticide resistance. Insect Mol. Biol. 14, 3–8.

Fedorova, L., Fedorov, A., 2003. Introns in gene evolution. Genetica 118, 123–131.

Feyereisen, R., 2006. Evolution of insect P450. Biochem. Soc. Trans. 34, 1253–1255.

Force, A., Lynch, M., Pickett, F.B., Amores, A., Yan, Y.L., Postlethwait, J., 1999. Preservation of duplicated genes by complementary, degenerative mutations. Genetics 151, 1531–1545.

Gilbert, W., de Souza, S.J., Long, M., 1997. Origin of genes. Proc. Natl. Acad. Sci. U.S.A. 94, 7698–7703.

Gubler, D.J., Clark, G.G., 1991. Dengue hemorrhagic fever: the emergence of a global health problem. Emerg. Infect. Dis. 1, 55–57.

Guillemaud, T., Makate, N., Raymond, M., Hirst, B., Callaghan, A., 1997. Esterase gene amplification in Culex pipiens. Insect Mol. Biol. 6, 319–327.

Guillemaud, T., Rooker, S., Pasteur, N., Raymond, M., 1996. Testing the unique amplification event and the worldwide migration hypothesis of insecticide resistance genes with sequence data. Heredity 77, 535–543.

Hahn, M.W., De Bie, T., Stajich, J.E., Nguyen, C., Cristianini, N., 2005. Estimating the tempo and mode of gene family evolution from comparative genomic data. Genome Res. 15, 1153–1160.

Hahn, M.W., Han, M.V., Han, S.-G., 2007. Gene family evolution across 12 Drosophila genomes. PLoS Genet. 3, 2135–2146.

Handel, M.A., 2004. The XY body: a specialized meiotic chromatin domain. Exp. Cell Res. 296, 57–63.

Hayes, J.D., Pulford, D.J., 1995. The glutathione S-transferase supergene family: regulation of GST and contribution of isoenzymes to cancer chemoprotection and drug resistance. Crit. Rev. Biochem. Mol. Biol. 30, 445–600.

Heger, A., Ponting, C.P., 2007. Evolutionary rate analyses of orthologs and paralogs from 12 Drosophila genomes. Genome Res. 17, 1837–1849.

Holt, R.A., Subramanian, G.M., Halpern, A., Sutton, G.G., Charlab, R., Nusskern, D.R., et al., 2002. The genome sequence of the malaria mosquito An. gambiae. Science 298, 129–149.

Hughes, A.L., 1994. The evolution of functionally novel proteins after gene duplication. Proc. Biol. Sci. 256, 119–124.

Jordan, I.K., Wolf, Y.I., Koonin, E.V., 2004. Duplicated genes evolve slower than singletons despite the initial rate increase. BMC Evol. Biol. 4, 22.

Karlin, S., Altschul, S.F., 1990. Methods for assessing the statistical significance of molecular sequence features by using general scoring schemes. Proc. Natl. Acad. Sci. U.S.A. 87, 2264–2268.

Kimura, M., 1980. A simple method for estimating evolutionary rate of base substitutions through comparative studies of nucleotide sequences. J. Mol. Evol. 16, 111–120.

Kuzniar, A., van Ham, R.C., Pongor, S., Leunissen, J.A.M., 2008. The quest for orthologs: finding the corresponding gene across genomes. Trends Genet. 24, 539–551.

Leister, D., 2004. Tandem and segmental gene duplication and recombination in the evolution of plant disease resistance genes. Trends Genet. 20, 116–122.

Liberles, D.A., Dittmar, K., 2008. Characterizing gene family evolution. Biol. Proc. Online 10, 66–73.

Low, W.Y., Ng, H.L., Morton, C.J., Parker, M.W., Batterham, P., Robin, C., 2007. Molecular evolution of glutathione S-transferase in genus Drosophila. Genetics 177, 1363–1375.

Lumjuan, N., Stevenson, B.J., Prapanthadara, L., Somboon, P., Brophy, P.M., Loftus, B.J., et al., 2007. The Ae. aegypti glutathione transferase family. Insect Biochem. Mol. Biol. 37, 1026–1035.

Maciejowski, M., Ahn, J.H., Cipriani, P.G., Killian, D.J., Chaudhary, A.L., Lee, J.I., et al., 2005. Autosomal genes of autosomal/X-linked duplicated gene pairs and germ line proliferation in C. elegans. Genetics 169, 1997–2011.

McKee, B.D., Handel, M.A., 1993. Sex chromosomes, recombination, and chromatin conformation. Chromosoma 102, 71–80.

Meslin, F.X., 1997. Global aspects of emerging and potential zoonoses: a WHO perspective. Emerg. Infect. Dis. 3, 223–228.

Mondragon-Palomino, M., Gaut, B.S., 2005. Gene conversion and the evolution of three leucine rich repeat gene families in Arabidopsis thaliana. Mol. Biol. Evol. 22, 2444–2456.

Nelson, D.R., Zeldin, D.C., Hoffman, S.M.G., Maltais, L.J., Wain, H.M., Nebert, D.W., 2004. Comparison of cytochrome P450 (CYP) genes from the mouse and human genomes, including nomenclature recommendations for genes, pseudogenes and alternative-splice variants. Pharmacogenetics 14, 1–18.

Nene, V., Wortman, J.R., Lawson, D., Haas, B., Kodira, C., Tu, Z.J., et al., 2007. Genome sequence of Ae. aegypti, a major arbovirus vector. Science 316, 1718–1723.

Ohno, S., 1970. Evolution by Gene Duplication. Springer-Verlag, New York.

Ohta, T., 1986. Simulating evolution by gene duplication. Genetics 115, 207–213.

Ortelli, F., Rossiter, L.C., Vontas, J., Ranson, H., Hemingway, J., 2003. Heterologous expression of four glutathione transferase genes genetically linked to a major insecticide-resistance locus from the malaria vector An. gambiae. Biochem. J. 373, 957–963.

Pal, C., Papp, B., Hurst, L.D., 2001. Highly expressed genes in yeast evolve slowly. Genetics 158, 927–931.

Prapanthadara, L., Hemingway, J., Ketterman, A.J., 1993. Partial purification and characterization of glutathione S-transferase involved in DDT resistance from mosquito Anopheles gambiae. Pest. Biochem. Physiol. 47, 119–133.

Ranson, H., Claudianos, C., Ortelli, F., Abgrall, C., Hemingway, J., Sharakhova, M.V., et al., 2002. Evolution of supergene families associated with insecticide resistance. Science 298, 179–181.

Ranson, H., Rossiter, L., Ortelli, F., Jensen, B., Wang, X., Charles, W.R., et al., 2001. Identification of a novel class of insect glutathione S-transferases involved in resistance to DDT in the malaria vector Anopheles gambiae. Biochem. J. 359, 295–304.

Raymond, M., Chevillon, C., Guillemaud, T., Lenormand, T., Pasteur, N., 1998. An overview of the evolution of overproduced esterase in mosquito Culex pipiens. Philos. Trans. R. Soc. Lond. B 353, 1701–1711.

Rooker, S., Guillemaud, T., Berge, J., Pasteur, N., Raymond, M., 1996. Coamplification of esterase A and B genes as a single unit in the mosquito Culex pipiens. Heredity 77, 555–561.

Roth, C., Rastogi, S., Arvestad, L., Dittmar, K., Light, S., Ekman, D., et al., 2007. Evolution after gene duplication: model, mechanism, sequences, systems, and organisms. J. Exp. Zool. 308B, 58–73.

Sawyer, S.A., 1999. GENECONV: A Computer Package for the Statistical Detection of Gene Conversion. Distributed by the Author. Department of Mathematics, Washington University, St. Louis.

Severson, D.W., DeBruyn, B., Lovin, D.D., Brown, S.E., Knudson, D.L., Morlais, I., 2004. Comparative genome analysis of the yellow fever mosquito Ae. aegypti with D. melanogaster and the malaria vector mosquito An. gambiae. J. Hered. 95, 103–113.

Shakhnovich, B.E., Koonin, E.V., 2006. Origins and impact of constraints in evolution of gene families. Genome Res. 16, 1529–1536.

Sharakhov, I.V., Serazin, A.C., Grushko, O.G., Dana, A., Lobo, N., Hillenmeyer, M.E., et al., 2002. Inversions and gene order shuffling in An. gambiae and An. funestus. Science 298, 182–185.

Sharma, M., Dash, A.P., Das, A., 2010. Evolutionary genetic insights into Plasmodium falciparum functional genes. Parasitol. Res. 106, 349–355.

Sheehan, D., Meade, G., Foley, V.M., Dowd, C.A., 2001. Structure, function and evolution of glutathione transferases: implications for classification of non-mammalian members of an ancient enzyme superfamily. Biochem. J. 360, 1–16.

Sundberg, I.M., 2005. Genetic polymorphisms of cytochrome P450 2D6 (CYP2D6): clinical consequences, evolutionary aspects and functional diversity. Pharmacogenom. J. 5, 6–13.

Thompson, D.F., Malone, J.B., Harb, M., Faris, R., Huh, O.K., Buck, A.A., et al., 1996. Bancroftian filariasis distribution and diurnal temperature differences in southern Nile delta. Emerg. Infect. Dis. 2, 234–235.

Tijet, N., Helvig, C., Feyereisen, R., 2001. The cytochrome P450 gene superfamily in D. melanogaster: annotation, intron–exon organization and phylogeny. Gene 262, 189–198.

Toung, Y.-P.S., Hsieh, T., Tu, C.P.D., 1993. The glutathione S-transferase D genes: a divergently organized, intronless gene family in Drosophila melanogaster. J. Biol. Chem. 268, 9737–9746.

Vavouri, T., Semple, J.I., Lehner, B., 2008. Widespread conservation of genetic redundancy during a billion years of eukaryotic evolution. Trends Genet. 24, 485–487.

Walters, K.B., Grant, P., Johnson, D.L., 2009. Evolution of the GST omega gene family in 12 Drosophila species. J. Hered. 100, 742–753.

Waterhouse, R.M., Wyder, S., Zdobnov, E.M., 2008. The Aedes aegypti genome: a comparative perspective. Insect Mol. Biol. 17, 1–8.

World Malaria Report, 2008. http://apps.who.int/malaria/wmr2008/malaria2008.pdf.

Xia, A., Sharakhova, M.V., Sharakhov, I.V., 2008. Reconstructing ancestral autosomal arrangements in the Anopheles gambiae complex. J. Comput. Biol. 15, 965–980.

Zimmer, E.A., Martin, S.L., Beverley, S.M., Kan, Y.W., Wilson, A.C., 1980. Rapid duplication and loss of genes coding for the alpha chains of hemoglobin. Proc. Natl. Acad. Sci. U.S.A. 77, 2158–2162.
