Broad Institute – M. Henn Dengue Virus White Paper
3/15/2010 Page 1 of 21
Comparative Genomics of Dengue Virus: genome population structure,
transmission, and understanding differential inflammatory disease
responses
A white paper for Microbial Genome Sequencing submitted by:
Matthew Henn1, Irene Bosch2, and Eva Harris3 in collaboration with the Genomic Resources in Dengue
Consortium (GRID)
1Broad Institute
Microbial Genome Sequencing & Analysis, 320 Charles Street, Cambridge, MA 02141 USA
tel: 617.324.2341 mailto: [email protected]
2University of Massachusetts Medical School, Center for Infectious Disease and Vaccine Research, Worcester, MA 01655 USA
3University of California-Berkeley, School of Public Health, Berkeley, CA 94720 USA
Tuesday, June 21, 2005
Executive Summary
Dengue virus (DEN), a category-A pathogen, is a significant threat to public health worldwide. The
virus is transmitted to humans by the mosquitoes Aedes aegypti and Ae. albopictus. The incidence of
dengue fever (DF), dengue hemorrhagic fever (DHF), and dengue shock syndrome (DSS) is rapidly
increasing, and more than 2.5 billion people live in regions endemic for the disease. Presently
approximately 50-100 million cases of DF occur yearly with more than 500,000 resulting in severe and
potentially fatal forms of the disease (DHF & DSS). Several factors contribute to the threat posed by
dengue. Most significant are lack of cross-reactive immunity for the four DEN serotypes (DEN-1, DEN-
2, DEN-3, and DEN-4), hyperendemic circulation of the four different serotypes in the same geographical
area, frequent worldwide travel, high population density, and lack of effective mosquito control programs.
Other socioeconomic factors only amplify the challenge of dengue control.
Despite the threat of this virus to public health and its importance to research on viral hemorrhagic fevers,
our understanding of the virulence, pathogenesis, and mechanisms of re-emergence of the dengue virus is
limited. As a consequence, no vaccine or anti-viral therapy presently exists for this flavivirus, and the
genetic underpinnings of disease outcomes are unknown. Studies of nucleotide divergence among the
different serotypes have largely been limited to a single gene. This lack of basic information on viral
diversity severely limits vaccine and anti-viral therapy development efforts. In addition, the presence of
immune pathology in secondary DEN infections indicates that a previous non-sterile immune response to
one serotype can exacerbate the immune response to a secondary infection caused by another serotype.
This host-viral interaction also represents a considerable challenge for vaccine development.
We propose to sequence 3500 dengue genomes of distinct geographic origin and disease to build the
genomic infrastructure needed to study and combat this virus. We will define the population structure of
the virus at multiple scales (i.e. within host, local, regional, and continental) which will enable
determination of the impact of introduced strains versus indigenous evolution on disease outcomes.
Additionally, as this project develops, the association of genome sequences to clinical data as well as
human genetic information will provide the first map of genomic distributions with reference to DF,
DHF, and DSS and host immune pressure.
Such a genomic resource will be able to identify important genetic correlates of disease emergence,
virulence, and attenuation in addition to the stability of polymorphisms across the entire genome.
Further, these data will provide the information necessary to develop cost-effective strain diagnostics for
disease tracking and outbreak response.
I) Dengue Virus: a Burgeoning Public Health Threat Worldwide and in the USA
The recent increase in activity of dengue is cause for serious attention to this global vector-borne viral
disease of humans. NIAID has recognized the importance of this RNA virus (family Flaviviridae) and
considers dengue a Category-A organism. More than 2.5 billion people live in areas endemic for the
disease. An estimated 50-100 million new dengue infections occur yearly worldwide [1, 2], with the
severe forms of the disease, dengue hemorrhagic fever (DHF) and dengue shock syndrome (DSS),
occurring in at least 500,000 of the cases [1, 3]. This is an increase from approximately 100,000 cases per
annum in the 1970s (Figure 1) [3]. Morbidity from DF is also of great public health concern. In severe
epidemics, approximately 5% of DHF cases lead to fatalities [4] and rates as high as 40% have been
reported in endemic regions with poor medical facilities [1]. The prevalence of dengue disease
worldwide is surging, particularly in the Americas, and 2001 epidemiological and media reports indicated
the highest levels of the virus ever recorded [5]. This re-emergence is evident in the United States
Commonwealth of Puerto Rico where all serotypes of the disease are co-circulating [6] and in Hawaii,
which in 2001-2002 experienced its first autochthonous dengue epidemic since 1944 [7]. In addition,
from 2001 to 2004, 377 suspected cases of travel-associated dengue infections were reported in the United
States, with CDC-confirmed cases occurring in 22 states [8].
Dengue consists of four known serotypes, DEN-1, DEN-2, DEN-3, and DEN-4, that are no longer
geographically isolated (Figure 2). Increased travel between dengue-endemic regions, globalized
markets, increased population densities, and unplanned urbanization particularly in tropical regions have
significantly impacted the epidemiology of dengue infection. Both the virus and the mosquito vector
have spread across the mid-latitudes (Figure 2). Due to the above societal shifts, all of the endemic
regions are now hyperendemic, with multiple DEN viruses co-circulating. This in turn has had a
tremendous impact on the resurgence of DF, DHF, and DSS. The potential of evolutionary mechanisms
leading to new genotypes is likely enhanced by an increased genetic pool. Molecular studies using the
DEN E-protein indicate a dramatic increase in genetic diversity within serotypes at the onset of
widespread human transmission approximately 100 years ago (Figure 3) [9]. Also apparent is the recent
appearance of DEN-1 and DEN-3 (Figure 3) [9]. How such diversity impacts virulence and greater
inflammatory responses remains unanswered. The presence of immune pathology in secondary dengue
infections indicates that a previous non-sterile immune response to one serotype could exacerbate the innate
immune response to a secondary infection caused by a different serotype (i.e., the Immune
Enhancement Hypothesis). This intricate interaction between the virus and the host impacts disease
outcomes and poses a significant challenge for vaccine development. Tetravalent vaccines that confer sterile
immunity against all four serotypes of the virus are required.
Introduction of dengue into the United States mainland is of particular concern. The mosquito Aedes
aegypti is the dominant vector of dengue in the Americas, while Ae. albopictus is a major vector in Asia.
Significantly, the recent epidemic in Hawaii implicated Aedes albopictus as the vector [7], and this
mosquito is now found in at least 24 states on the US mainland [10, 11]. Dengue virus differs from
related flaviviruses in that it has adapted to humans and can be maintained in an Aedes-human-Aedes
cycle without input from an enzootic cycle [1]. This urbanization of virus transmission increases its
ability to spread beyond its endemic regions as it is not dependent on the forest enzootic life-stage. In
2001, travel between the US mainland and dengue-endemic regions was estimated to be 14 million
passengers [7]. Additionally, the deliberate dispersal of mosquitoes infected with different DEN
serotypes into urban centers could pose a significant threat to public health in multiple regions of the US
where Ae. aegypti and Ae. albopictus are found.
Figure 1. Reported Cases of Dengue Hemorrhagic Fever from 1955-1998. Source: World Health Organization
Figure 2. Worldwide distribution of Aedes aegypti, dengue virus serotypes, and dengue epidemic activity.
Figure is modified from Gubler [3] Figure 2.
Figure 3. Maximum likelihood phylogeny of four dengue serotypes showing
diversity within serotypes and date (A.D.) of most recent common ancestor
(MRCA). Figure is from Twiddy et al. [9].
II) Research Objectives and Goals
Remarkably, for a virus that is considered a Category-A pathogen and that infects ~100 million people
each year, there are only 96 whole genome sequences (10.5 kb) deposited, with the distribution of these
sequences being highly skewed among the serotypes (DEN-1=19, DEN-2=47, DEN-3=26, DEN-4=4).
There is an urgent need for an expansive and collaborative research program that relates viral genomic
sequences with well-defined clinical and epidemiological data as well as genetic information about the
human host. Under this proposal, we will generate the genomic sequence and databases needed to
overcome many present limitations to the dengue research community. Additionally, as a component of
this proposal the Broad Institute, in close collaboration with the greater infectious disease research
community, will develop the necessary framework and infrastructure capable of enabling high-throughput
viral-host interaction studies.
a) Comparative Genomics to Understand Evolution and Transmission of the Virus During an
Epidemic
Using comparative genomics methods we will determine the contributions of mutation, recombination,
genetic drift, gene flow and the natural selection of advantageous mutations to the genetic structure of
dengue virus populations in at least four epidemiologically well-defined settings (Puerto Rico, Venezuela,
Nicaragua, and Vietnam). The immediate availability and density of samples at these sites will enable us
to determine how genetic diversity is changing through time. This will allow us to resolve whether viral
evolution occurs indigenously, or whether new strains are imported from elsewhere, perhaps triggering
major epidemics. Ultimately data from these sites will provide a means to more accurately model how
the virus could respond to the introduction of a vaccine program. The use of viral samples collected
throughout particular epidemics will allow us to test the hypothesis that viruses are selected during the
course of epidemics and are correlated with increased severity [12-14]. This connection will be
investigated by association with the clinical outcome of specific cases as well as with the ratio of severe
to total dengue cases in the population over time (particularly when the epidemic is caused by
predominantly one serotype). Furthermore, by examining the phylogenetic relationships between the viral
sequences sampled (i.e. using coalescent methods) we will estimate the relative population growth rate of
dengue viruses in each geographic region, as well as whether this rate is changing through time and
whether it differs by serotype.
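As a simple illustration of tracking how genetic diversity changes through time, the sketch below computes mean pairwise nucleotide diversity for aligned sequences grouped by sampling year. This is a basic summary statistic and not the coalescent analysis described above. The function names and the toy alignment are hypothetical and do not represent real dengue data.

```python
from collections import defaultdict
from itertools import combinations


def pairwise_diversity(seqs):
    """Mean proportion of differing sites over all pairs of aligned sequences."""
    total = 0.0
    pairs = 0
    for a, b in combinations(seqs, 2):
        if len(a) != len(b):
            raise ValueError("sequences must be aligned to equal length")
        # Compare only positions where both sequences have a called base.
        compared = [(x, y) for x, y in zip(a, b) if x in "ACGT" and y in "ACGT"]
        if not compared:
            continue
        diffs = sum(1 for x, y in compared if x != y)
        total += diffs / len(compared)
        pairs += 1
    return total / pairs if pairs else 0.0


def diversity_by_year(samples):
    """samples: iterable of (year, aligned_sequence). Returns {year: diversity}."""
    by_year = defaultdict(list)
    for year, seq in samples:
        by_year[year].append(seq.upper())
    return {year: pairwise_diversity(seqs) for year, seqs in sorted(by_year.items())}


# Hypothetical toy alignment (assumption for illustration, not real data).
toy = [
    (1994, "ACGTACGT"),
    (1994, "ACGTACGA"),
    (1998, "ACGTTCGA"),
    (1998, "ACGAACGT"),
    (1998, "ACGTACGT"),
]
result = diversity_by_year(toy)
print(result)
```

In practice the grouping key could equally be serotype, municipality, or epidemic phase, depending on the comparison of interest.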
It is important to understand the pattern of genetic change that has occurred in endemic regions. Previous
studies have shown that changes in the DEN population genetic structure are associated in part with strain
introductions from both geographically and socio-economically related populations [15-20]. For
example, throughout the Americas, DEN-2 Asian–American subtype IIIb, linked to DHF/DSS cases,
appears to have displaced the American subtype V, which has only been associated with DF [18-23].
Thus, the first epidemic of DHF/DSS in the Americas in 1981 was attributed to a DEN-2 strain imported
from Southeast Asia, where the severe form of the disease had been endemic [19]. Although the onset of
DHF/DSS in the Americas in 1981 was associated with the introduction of a novel strain of DEN-2,
changes in dengue severity have also been associated with hyperendemic transmission patterns (the co-
occurrence of multiple dengue serotypes in the same locality) [24]. There is strong evidence linking
disease severity with secondary heterologous infection [25]. In endemic regions, secondary infections
have become increasingly frequent over periods of time in which the virus has mutated.
Complete genome sequences are central to advancing the understanding of DEN population structure and
transmission, yet few studies have considered the entire viral genome
(Appendix A). The DEN population genomic map generated via this proposal will allow the dengue
research community to better understand the population dynamics of the dengue genome during disease
outbreaks and the genetic diversity underlying severe forms of the disease. Given the relative lack of
knowledge of the genetic mechanisms of pathogenicity and virulence, full genome sequences will provide
the means to identify important genetic motifs beyond the envelope protein and how the evolution of
these elements is associated with disease re-emergence.
b) Comparative Genomics to Develop Strain Diagnostics and Resolve the Contribution of Viral
Determinants to Disease Severity
Some data suggest that hyperendemicity may not be the only factor contributing to the increased disease
severity and that strain virulence may play an important role [26-30]. Some genotypes, or indeed
serotypes, of dengue virus are likely more virulent than others. Understanding if and how
viral factors contribute to disease pathogenesis could be very useful in the rational design and targeted
deployment of new anti-virals or prophylactic dengue vaccines. It would also provide valuable
information as to whether the spread of particular strains should be monitored with more attention.
Sequence variation in dengue viruses is probably related to virulence [31]. As mentioned above,
epidemiological studies have suggested that the DEN-2 Asian-American genotype, but not the American
genotype, is capable of causing DHF in secondary infections [19, 32]. These observations led to the
hypothesis that DEN-2 Asian-American genotype viruses were more virulent than DEN-2 American
genotype viruses. Similar observations have been reported with respect to genetically distinct variants of
DEN-3 in Sri Lanka, where strains isolated before 1989 were never associated with severe dengue while
those circulating after 1989 were correlated with the emergence of DHF in Sri Lanka [33]. DEN-4
generally leads to less severe disease than other serotypes, particularly DEN-2 and DEN-3 [34].
Dengue virus enters the body via a mosquito bite and is thought to replicate within a number of
immunological cells, including macrophages, monocytes, B cells, mast cells, dendritic cells, and endothelial cells
[35]. A dominant means of viral entry into host cells is thought to occur through the enhanced binding
of viral proteins to Fcγ host-cell receptors. Besides the Fcγ receptor that is implicated in Antibody
Dependent Enhancement (ADE), knowledge of other DEN receptors is limited. Recently genetic
determinants in the E gene and 5’ and 3’ untranslated regions of the DEN genome were associated with
dengue virulence [25], and it is known that dengue virus reactive T-cells vary in their ability to recognize
different serotypes [25]. Additionally, entry and replication of dengue in dendritic cells can be blocked
by antibodies to the carbohydrate recognition domain of DC-SIGN, a receptor that is utilized with
different efficiencies by different DEN serotypes [36]. All of these findings indicate a multiplicity of
genetic determinants of host infection in addition to immune responses, and suggest specific host-viral
associations that are dependent on viral genome changes.
What is urgently needed to understand the viral determinants of disease severity is a genome-wide
strategy that identifies polymorphic loci that are associated with distinct infection outcomes (e.g., DF
versus DHF). Presently there is a lack of sufficient viral genomic data with which to link disease outcome
with viral polymorphisms. Genomic sequence generated under this proposal will directly address this
need. For example, recent studies have shown that the nonstructural regions of dengue virus are involved
in counteracting the antiviral response of the host cell [37, 38]. An appropriately structured genome
database could resolve polymorphisms at these loci and associate this information with the virus’ ability
to block the antiviral response of the host. The ensuing genome sequence reference database we generate
will be invaluable for disease diagnostics and dengue monitoring. A long-term goal of this proposal and
of GRID is the development of high-throughput viral resequencing arrays. As an example, the ability to
quickly type strains using population-specific molecular markers could be applied to track and ultimately
aid the determination of both the potential threat and the appropriate response to outbreaks or the
malicious release of this Category-A biological agent.
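The genome-wide strategy described above, linking polymorphic loci to infection outcome (e.g., DF versus DHF), can be sketched per site as a 2x2 contingency test. The example below is a minimal illustration under stated assumptions: genomes are already aligned, outcomes are labeled only "DF" or "DHF", and a two-sided Fisher exact test is used as one reasonable choice of statistic. The function names and toy data are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of observing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    p_value = 0.0
    # Sum the probabilities of all tables no more likely than the observed one.
    for x in range(lo, hi + 1):
        p = prob(x)
        if p <= p_obs * (1 + 1e-9):
            p_value += p
    return min(1.0, p_value)


def site_association(genomes, outcomes, site, allele):
    """Cross-tabulate presence of `allele` at `site` against DHF versus DF."""
    a = b = c = d = 0
    for seq, outcome in zip(genomes, outcomes):
        has_allele = seq[site] == allele
        severe = outcome == "DHF"
        if has_allele and severe:
            a += 1
        elif has_allele:
            b += 1
        elif severe:
            c += 1
        else:
            d += 1
    return (a, b, c, d), fisher_exact_two_sided(a, b, c, d)


# Hypothetical toy data (assumption for illustration, not real isolates).
genomes = ["ACT", "ACT", "GCT", "ACT", "ACC", "GCC", "ACC", "ACC"]
outcomes = ["DHF"] * 4 + ["DF"] * 4
table, p = site_association(genomes, outcomes, site=2, allele="T")
print(table, p)
```

Scanning every site of a 10.7 kb genome in this way involves thousands of tests, so a real analysis would need a multiple-testing correction and would have to account for shared ancestry among isolates; neither is modeled here.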
c) Comparative Genomics to Understand In Vivo Diversity and Host–Viral Interaction
RNA viruses exhibit a high degree of sequence variation not just between isolates as addressed above, but
also among viruses within an individual. This in part is due to the viral RNA polymerase and its inability
to proofread. As a result, viral infections within an individual exist as a population of closely related
sequences that are called quasispecies [39-44]. The role of quasispecies in disease pathogenesis,
transmission, and viral evolution is suspected to be important for RNA viruses. Recent data on
poliovirus indicate that intra-host genetic diversity is important for pathogenesis and tropism in
mammalian hosts (Raul Andino, personal communication). A study using the dengue envelope (E) gene
showed that genome-defective viruses exist in vivo [43]. Sequence generated under this proposal will
include a pilot initiative to characterize the extent of viral diversity within a host. Of particular interest
are the comparisons between individuals with different disease outcomes and multiple time points within
an individual infection (e.g., first acute sample and sample at defervescence). These data will help to
define within-host DEN diversity and allow an assessment of the capacity of present isolation techniques
from serum to capture actual viral diversity. Generation of these data will assist in project design and
sequencing strategy for this project and future viral initiatives.
Dengue is an example of a complex disease; host genetic, viral and environmental factors likely
contribute to infection outcome. Understanding the contribution of host genetics to dengue disease
susceptibility/resistance is made more difficult by the absence of a robust animal model in which to
identify candidate susceptibility/resistance genes. Notwithstanding these limitations, host genetic
variability at the HLA class I loci, Fcγ receptor, the vitamin D receptor, and the promoter region of
CD209 have all been associated with distinct symptomatic phenotypes [45-49].
There is an emerging theme in viral research that host immune pressures play a significant role in disease
outcomes and in viral evolution both within individuals (i.e. quasispecies) and at the geographic scale.
For example, research on HIV has revealed that mutations within HIV populations are associated
with expression of specific HLA class I alleles [50]. In the case of dengue, secondary exposures to the
virus induce a memory response in immunologically primed individuals that can either clear the infection
or contribute to the virus’ pathology (Immune Enhancement Hypothesis, IEH); the genetic underpinnings
of this are unknown, but specific HLA class I alleles have been associated with the different outcomes
[49, 51]. Combined host-viral genomic studies are beginning to reveal that in the setting of highly
polymorphic pathogens (e.g. DEN), immune pressures eventually drive some escape mutations to
fixation. This results in some regions of the virus no longer being immunogenic to individuals expressing
particular HLA molecules [50]. For HIV, data are beginning to reveal population-specific
polymorphisms in HIV, suggesting the loss of some immunogenic regions of the virus in one population
expressing high levels of a particular HLA allele, while this region remains intact in another population
expressing only low levels of the allele (Todd Allen 2005, personal communication). Generation of an
association database that can identify a protective link between host HLA-type or other candidate genes
and dengue viral genomes will facilitate vaccine development for this highly variable pathogen (Figure
3), and aid in identifying high-risk groups. Such a resource could enable a more focused selection of
specific antigens for inclusion in vaccines aimed at engendering a particular immune response and
avoidance of an immune enhancement response.
While our current efforts are focused on understanding the interaction between human host and dengue,
future efforts are anticipated with the mosquito vector as well to provide a fully integrated genomic view
of the virus’ life-history.
d) Comparative Genomics Enhanced Vaccine Development and Antiviral Therapy
The development of successful vaccines and antiviral therapies against dengue is dependent on a
comprehensive understanding of the genetic variability across all serotypes and genotypes of the entire
viral genome. The antigenic variation of pathogens has implications in the immune response of the host
through mechanisms of immune evasion, immune attenuation and host mimicry. Presently,
sequence information exists in substantial numbers only for the envelope glycoprotein (E) gene for use in
vaccine attenuation studies. Attenuated strains of all four DEN serotypes exist but a lack of animal
models and in vitro markers of attenuation slows further development of these vaccine candidates [52].
This limitation could be overcome by the ability to search whole genome data for polymorphic loci
and regions susceptible to attenuation by single base pair or amino acid changes. Additionally, the
association of genomic diversity with virulence and disease outcome will permit the use of structural
genomics to understand conformational changes in disease-specific proteins.
III) Strain and Site Selection
Three geographic areas were selected for this proposal (the Caribbean, the Americas, and Southeast Asia),
from which four focus collections/areas were selected: Puerto Rico, Venezuela, Nicaragua, and Vietnam.
These sites were selected as they are existing collections with ongoing sample collection that can meet
the criteria of our scientific goals. All four serotypes of the virus are circulating at each of the geographic
locations. These sites provide a continuum of sample resolutions that can assess viral population
structure at highly refined scales (i.e., at the resolution of counties in Puerto Rico, Figure 4), country
scales (i.e., Nicaragua and Venezuela), and global scales (i.e., all sites). The Puerto Rico surveillance
system represents a unique model to conduct phylogenetic analysis of dengue virus over an extended
period of time with documented epidemiological and clinical data in a region with very few reported re-
introductions (Appendix C). In addition, the CDC collections in Puerto Rico and the Americas are the
only known collections with a municipality sampling resolution (Figure 4). In contrast to Puerto Rico,
the other sites represent scenarios in which re-introductions have occurred, but like Puerto Rico they have
well documented epidemiological and clinical data across multiple epidemics and from within individual
outbreaks of the disease. The Venezuela, Nicaragua, and Vietnam sites also represent case studies with
collections of both RNA and serum samples for quasispecies analysis, as well as existing initiatives in
host-viral studies.
A significant effort was made by the Broad and GRID to identify sample collections and ongoing dengue
sampling efforts that could adequately address our defined scientific goals and generate required cDNAs
in a timely manner. A list was generated of existing sample collections (Appendix B) and each collection
was assessed for resolution of sampling, collection size, collection content, existence of appropriate
clinical data, and compatibility with sampling efforts currently ongoing. From this list, our sampling sites
were selected. One or more members of GRID have access to and have agreed to supply required viral
cDNAs and/or host DNAs for each region. Our selections ensure that all serotypes and known genotypes
are included in the sequencing effort. The selection of these sites will result in a highly structured,
comprehensive DEN genome database.
Figure 4. Distribution of DEN-2 and DEN-3 in
Puerto Rico showing resolution of sampling at
the municipality level. The blue shadowing
While the initial focus of this project is on the collection sites noted, GRID is aware of a growing interest
in this project from other key geographic regions and dengue researchers. We anticipate the potential
inclusion of samples from other geographic regions and countries within selected focus areas. Such a
decision will depend on the extent of viral genomic diversity found within our proposed sampling
sites and on the scientific value of broader geographic coverage versus deeper within-site strain coverage
in light of these findings. Approval of this project is anticipated to facilitate removal of current limitations
on sample access, providing the seed needed to justify requests for additional funding from other granting agencies.
IV) DNA Sequencing, Assembly, and Annotation
• Viral Sequencing Strategy (dengue genome size = 10.7kb)
We anticipate generating 3500 genome sequences derived from a combination of low-passage viral
isolates and serum sample amplifications. The viral sequencing in this proposal will support two distinct
sequencing objectives. First, we will determine the entire sequence of individual viral genomes using
RNA from low-passage isolates as a means to capture viral diversity at the geographic scale and to relate
strain genome structure to disease outcomes. Second, we will determine the consensus genome sequence
of viral populations within an individual host (i.e. quasispecies). These “mixed” genome sequences
represent a sample of the diversity within a host and are required for host-pathogen association studies.
Viral Isolates. Low-passage isolates will yield sequence for individual full-length genomes suitable for
geographic diversity and disease associated studies. These samples are limited by their potential
misrepresentation of the frequency of individual sequence polymorphisms within the host’s
quasispecies complex. Therefore these samples are not suitable for our host-viral interaction
studies.
Serum Samples. Serum sample amplifications represent bulk viral RNA from mixed populations of
genotypes within a host (i.e. quasispecies). To describe this diversity we will generate a consensus
genome sequence that represents the mixed population, using either a PCR-based
or a clone-based approach. These two methods have different sensitivities. The clone-based method will
provide enhanced sensitivity for quasispecies analysis as sequence from strains at low dominance within
the host will be captured. The consensus sequences generated are suitable to accurately identify host-
associated and disease related polymorphisms. Employing the two different sequencing approaches (i.e.
PCR versus clone) to a subset of samples will allow us to determine how well PCR-based consensus
sequence generation represents actual viral diversity within the host.
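To make the clone-based consensus idea concrete, the sketch below derives a consensus sequence and lists minor variants from a set of aligned clone sequences for one fragment. It is an illustration under stated assumptions, not the project's pipeline: clones are assumed to be pre-aligned to equal length, the 5% reporting threshold is arbitrary, and all names and data are hypothetical.

```python
from collections import Counter


def column_profile(clones):
    """Per-position base counts for equal-length aligned clone sequences."""
    length = len(clones[0])
    if any(len(c) != length for c in clones):
        raise ValueError("clones must be aligned to equal length")
    return [Counter(c[i] for c in clones) for i in range(length)]


def major_base(col):
    # Most frequent base; ties are broken alphabetically for determinism.
    return min(col, key=lambda base: (-col[base], base))


def consensus(profile):
    """Consensus sequence: the most frequent base at each position."""
    return "".join(major_base(col) for col in profile)


def minor_variants(profile, min_freq=0.05):
    """(position, base, frequency) for non-consensus bases at or above min_freq."""
    found = []
    for pos, col in enumerate(profile):
        depth = sum(col.values())
        major = major_base(col)
        for base, count in sorted(col.items()):
            if base != major and count / depth >= min_freq:
                found.append((pos, base, count / depth))
    return found


# Hypothetical example: 30 clones of one fragment, 3 of which carry a variant.
clones = ["ACGTAC"] * 27 + ["ACGTGC"] * 3
profile = column_profile(clones)
cons = consensus(profile)
variants = minor_variants(profile)
print(cons, variants)
```

With about 30 clones per fragment, a variant seen in a single clone corresponds to a frequency of roughly 3%, which gives a sense of the lowest frequency such a design can resolve. Distinguishing true low-frequency variants from RT-PCR or sequencing errors is not addressed in this sketch.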
Strategy. As identification of nucleotide changes is critical to the analysis, high quality sequence is
required to identify real differences rather than sequencing errors. This will be achieved through the
sequencing of overlapping PCR fragments.
PCR-based sequencing: Our sequencing approach is currently based on the utilization of a set of primer
pairs overlapping the entire DEN genome. cDNA will be produced from RNA extracted from low
passage viral isolates or serum samples. RT-PCR from viral isolate samples will be performed using a
standardized high fidelity PCR protocol at each of the four main focus regions and result in
approximately 5-8 overlapping amplicons. These PCR products will be provided to the Broad Institute
MSC for high throughput sequencing. RT-PCR amplification and sequencing in the forward and reverse
direction of 500bp amplicons generated from the original 5-8 overlapping amplicons will result in an
effective coverage of 8X.
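The relationship between amplicon tiling and coverage can be checked with a short calculation. The sketch below is only one possible reading of the figures above: the 10.7 kb genome length and 500 bp amplicon size come from the text, while the 125 bp offset between amplicon starts and the idea that each forward and reverse read spans its whole amplicon are assumptions, chosen because four overlapping amplicons read in two directions give 8X. The function names are hypothetical.

```python
def tile_amplicons(genome_len, amplicon_len, step):
    """Start/end coordinates (0-based, end-exclusive) of tiled amplicons."""
    if amplicon_len > genome_len:
        raise ValueError("amplicon is longer than the genome")
    starts = list(range(0, genome_len - amplicon_len + 1, step))
    # Add a final amplicon flush with the 3' end if the tiling stops short.
    if starts[-1] + amplicon_len < genome_len:
        starts.append(genome_len - amplicon_len)
    return [(s, s + amplicon_len) for s in starts]


def per_base_coverage(genome_len, amplicons, reads_per_amplicon=2):
    """Read depth per base, assuming each read spans its entire amplicon."""
    depth = [0] * genome_len
    for start, end in amplicons:
        for i in range(start, end):
            depth[i] += reads_per_amplicon
    return depth


GENOME_LEN = 10_700   # from the text (10.7 kb)
AMPLICON_LEN = 500    # from the text
STEP = 125            # ASSUMPTION: offset between amplicon starts, not stated in the text

amplicons = tile_amplicons(GENOME_LEN, AMPLICON_LEN, STEP)
depth = per_base_coverage(GENOME_LEN, amplicons)
# Exclude the genome ends, where fewer amplicons overlap.
interior = depth[AMPLICON_LEN:GENOME_LEN - AMPLICON_LEN]
print(len(amplicons), min(interior), max(interior), depth[0])
```

Under these assumptions the interior of the genome is covered uniformly at 8X, while the extreme 5' end is covered by a single amplicon, which suggests the genome termini would need separate attention in any real design.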
Clone-based sequencing: This approach will also utilize a set of overlapping primer pairs for the entire
DEN genome. Approximately 30 clones will be picked for each PCR fragment (i.e. 30X coverage) and
then sequenced.
Genome data will be assembled at the Broad Institute. Genome assembly is an active area of research at
each MSC and improvements in the algorithms are implemented on an ongoing basis. Annotation teams
at the Broad will use all available evidence and follow standard protocols at the Broad MSC for genome
annotation. The Broad will work in collaboration with members of GRID and the greater dengue research
community on genome analysis and the identification of sequence polymorphisms.
• Host Genetic Typing
While the primary focus of this proposal is on the viral genome sequencing, the importance of
understanding the interaction between host and virus is evident. Sequencing during the initial stages of
this project will be directed at the viral genome. As this project develops, we anticipate the typing of
1500 patients for whom we have associated viral genome sequences and clinical data. To achieve this
goal and maximum impact of the genetic data on finding tangible disease solutions, the Broad is working
in close collaboration with members of GRID and current collaborators from other related projects (i.e.
HCV and HIV) to develop the appropriate technology and framework to successfully implement such a
program. Presently no high throughput method exists for host genetic typing of HLA and disease loci.
The Broad recognizes the significance of this need and is presently developing a strategy capable of
efficiently and effectively sequencing the numerous variants of the human HLA loci as well as other
relevant disease alleles. As needed, existing collaborations can provide the ability to type host loci.
• Future sequencing goals
The Broad, in collaboration with members of GRID, is presently exploring the potential of viral
resequencing arrays to provide the dengue research community with a means for high-throughput viral
strain diagnostics. Resequencing RNA genomes poses two additional technological challenges and
opportunities. First, can RNA be directly hybridized to the DNA arrays, or is cDNA required? The
standard gene expression protocol for Affymetrix microarrays calls for a cRNA-DNA hybridization, so
directly resequencing RNA is conceivable but unproven. Second, the small size of typical RNA viral
genomes relative to the high probe density of current microarrays raises the question of whether complex
variations besides single nucleotide polymorphisms can be detected with additional probes. For example,
all possible single- and double-base deletions and insertions for the dengue genome could be represented
on current microarrays. This novel application of Variation Detection Arrays could break new ground in
microarray-based resequencing, but demands new algorithm development.
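To give a rough sense of scale for the indel example above, the following Python sketch enumerates every sequence reachable from a reference by a single insertion or deletion of one or two bases. It is an illustration only, not the array design method: the function names, the toy sequence, and the genome length of 10,700 nt (an approximate figure for a dengue genome) are assumptions introduced here, and real probe design would tile short windows around each variant rather than store whole-genome variants.

```python
from itertools import product

BASES = "ACGU"          # RNA alphabet; use "ACGT" for cDNA (assumption)
GENOME_LENGTH = 10_700  # approximate dengue genome length in nt (assumption)


def small_indel_variants(seq, max_len=2):
    """Return the distinct sequences reachable from `seq` by one insertion
    or one deletion of 1..max_len consecutive bases."""
    variants = set()
    n = len(seq)
    for k in range(1, max_len + 1):
        # Deletions: remove k consecutive bases starting at position i.
        for i in range(n - k + 1):
            variants.add(seq[:i] + seq[i + k:])
        # Insertions: place every k-mer in every gap, including both ends.
        for i in range(n + 1):
            for kmer in product(BASES, repeat=k):
                variants.add(seq[:i] + "".join(kmer) + seq[i:])
    return variants


def raw_indel_count(n, max_len=2, alphabet=4):
    """Upper bound on the number of indel events for a sequence of length n,
    counted as (position, event) pairs before de-duplication."""
    total = 0
    for k in range(1, max_len + 1):
        total += n - k + 1                # deletions of length k
        total += (n + 1) * alphabet ** k  # insertions of length k
    return total


if __name__ == "__main__":
    toy = "ACGUAAC"  # hypothetical toy sequence, not dengue data
    print(len(small_indel_variants(toy)), "distinct variants for", toy)
    print(raw_indel_count(GENOME_LENGTH), "raw indel events at genome scale")
```

Under these assumptions the raw count for a 10,700 nt genome is 235,419 events, and the number of distinct variant sequences is smaller because indels within homopolymer runs collapse to the same sequence. Whether that fits on a given array depends on how many probes are spent per variant, which this sketch does not model.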
V) Genomic Resources in Dengue Consortium (GRID)
The process for producing this white paper was carried out in a highly collaborative fashion. Over the
past several months, many scientists from the dengue research community with a vital interest in these
data have worked in partnership with the Broad Institute to establish common scientific goals and define
the appropriate genomic infrastructure needed to support them. GRID is the result of this process, and
consists of:
1) Angel Balmaseda, Department of Virology, National Diagnostics and Reference Center, Ministry of
Health, Managua, Nicaragua
2) Bruce Birren, Broad Institute, Cambridge, MA
3) Irene Bosch, University of Massachusetts Medical School, Center for Infectious Disease and Vaccine
Research, Worcester, MA
4) Eva Harris, Division of Infectious Diseases, School of Public Health, University of California -
Berkeley, Berkeley, CA
5) Matthew Henn, Broad Institute, Cambridge, MA
6) Rebeca Rico-Hesse, Dept. Virology & Immunology, Southwest Foundation for Biomedical Research,
San Antonio, TX
7) Jeremy Farrar, Oxford University Clinical Research Unit, Hospital for Tropical Disease, HCMC,
Vietnam
8) Jorge Munoz-Jordán, Centers for Disease Control and Prevention, Division of Vector-Borne Infectious
Disease, San Juan, PR
9) David Kulp, University of Massachusetts, Dept. of Computer Science, Amherst, MA
10) Cameron Simmons, Oxford University Clinical Research Unit, Hospital for Tropical Disease, HCMC,
Vietnam
11) Steve Whitehead, Laboratory of Infectious Disease, NIAID Vaccine Branch, Bethesda, MD
12) Priscilla Yang, Harvard Medical School, Dept. of Microbiology, Boston, MA
GRID sought input and attained consensus concerning sequencing targets through many meetings at the
Broad with various participants, several conference calls with most members of GRID, and numerous e-
mail and telephone exchanges. The Broad has made a significant investment over the past six months in
developing and coordinating this project. A critical component of this effort has been the sharing of
information and data pertaining to goal development and sequencing target selection. In addition, great
weight was given to practical matters in the structure of both the sequencing and analysis components of
the proposed work. The group has worked together to standardize protocols for viral isolation and cDNA
production between the various project sites. Importantly, many consortium members have active
collaborations in our designated focus areas and secure access to the viral collections and sampling
clinics necessary for the acquisition of genomic cDNA for all anticipated sequencing targets.
VI) Management of the Dengue Genome Project by the Research Community
1) Broad Institute (Matthew Henn) 20%
2) Irene Bosch 5%
3) Eva Harris 5%
Both Dr. Bosch and Dr. Harris have been key contacts for the Broad Institute during the development of
this white paper. They both maintain extensive dengue research programs at US universities, and have
well-established collaborations with dengue clinics in the Americas and Asia.
VII) Data Release
In accordance with the NIAID’s principles regarding data release, we will publicly release all data
generated under this contract as rapidly as possible. As required by our contract, NIAID will be provided
with a 21-45 calendar-day period to review and comment upon all data prior to its public release.
Chromatogram Files: Unless otherwise directed by NIAID, we will submit all sequences and trace files
(chromatograms) generated under this proposal to the Trace Archive at NCBI at least weekly. These
submissions will also include information on templates, vectors, and quality values for each
sequence.
Genome Assemblies: Genome assemblies will be made available via GenBank and the MSCs’ websites,
after internal and community validation. Assuming no significant errors are detected during the
validation process, assemblies will be released within 45 calendar days of being generated.
Genome Annotation: Automated annotation data will be made available via GenBank and our web sites
after internal and community validation. Assuming no significant errors are detected during the
validation process, annotation data will be released within 45 calendar days of being generated.
VIII) References
1. Gubler, D.J., The changing epidemiology of yellow fever and dengue, 1900 to 2003: full
circle? Comp Immunol Microbiol Infect Dis, 2004. 27(5): p. 319-30.
2. World Health Organization, Strengthening implementation of the global strategy for
DF/DHF prevention and control. 1999. p.
http://www.who.int/vaccines-disease/research/virus2.shtml.
3. Gubler, D.J., Epidemic dengue/dengue hemorrhagic fever as a public health, social and
economic problem in the 21st century. Trends Microbiol, 2002. 10(2): p. 100-3.
4. World Health Organization, Vaccines, immunization, and biologicals: dengue and
Japanese encephalitis vaccines. 2001. p.
http://www.who.int/vaccines-diseases/research/virus2.htm.
5. Halstead, S.B., Dengue. Current Opinion In Infectious Diseases, 2002. 15(5): p. 471-476.
6. Rigau-Perez, J.G., et al., The reappearance of dengue-3 and a subsequent dengue-4 and
dengue-1 epidemic in Puerto Rico in 1998. Am J Trop Med Hyg, 2002. 67(4): p. 355-62.
7. Effler, P., et al., Dengue Fever, Hawaii, 2001-2002. Emerging Infectious Diseases, 2005.
11(5): p. 742-749.
8. Beatty, M.E., et al., Travel-Associated Dengue Infections - United States, 2001–2004.
Morbidity and Mortality Weekly Report, 2005. 54(22): p. 555-558.
9. Twiddy, S.S., E.C. Holmes, and A. Rambaut, Inferring the rate and time-scale of dengue
virus evolution. Molecular Biology And Evolution, 2003. 20(1): p. 122-129.
10. Moore, C.G., Aedes albopictus in the United States: Current status and prospects for
further spread. Journal Of The American Mosquito Control Association, 1999. 15(2): p.
221-227.
11. Moore, C.G. and C.J. Mitchell, Aedes albopictus in the United States: Ten-year presence
and public health implications. Emerging Infectious Diseases, 1997. 3(3): p. 329-334.
12. Kouri, G.P., M.G. Guzman, and J.R. Bravo, Why dengue haemorrhagic fever in Cuba? 2.
An integral analysis. Trans R Soc Trop Med Hyg, 1987. 81(5): p. 821-3.
13. Rodriguez-Roche, R., et al., Virus evolution during a severe dengue epidemic in Cuba,
1997. Virology, 2005. 334(2): p. 154-9.
14. Guzman, M.G., G. Kouri, and S.B. Halstead, Do escape mutants explain rapid increases
in dengue case-fatality rates within epidemics? Lancet, 2000. 355(9218): p. 1902-3.
15. Fong, M.Y., C.L. Koh, and S.K. Lam, Molecular epidemiology of Malaysian dengue 2
viruses isolated over twenty-five years (1968-1993). Res Virol, 1998. 149(6): p. 457-64.
16. Foster, J.E., et al., Molecular evolution and phylogeny of dengue type 4 virus in the
Caribbean. Virology, 2003. 306(1): p. 126-34.
17. Lewis, J.A., et al., Phylogenetic relationships of dengue-2 viruses. Virology, 1993.
197(1): p. 216-24.
18. Nogueira, R.M., M.P. Miagostovich, and H.G. Schatzmayr, Molecular epidemiology of
dengue viruses in Brazil. Cad Saude Publica, 2000. 16(1): p. 205-11.
19. Rico-Hesse, R., et al., Origins of dengue type 2 viruses associated with increased
pathogenicity in the Americas. Virology, 1997. 230(2): p. 244-51.
20. Uzcategui, N.Y., et al., Molecular epidemiology of dengue type 2 virus in Venezuela:
evidence for in situ virus evolution and recombination. J Gen Virol, 2001. 82(Pt 12): p.
2945-53.
21. Halstead, S.B., et al., Haiti: absence of dengue hemorrhagic fever despite hyperendemic
dengue virus transmission. Am J Trop Med Hyg, 2001. 65(3): p. 180-3.
22. Ruiz, B.H., et al., Phylogenetic comparison of the DEN-2 Mexican isolate with other
flaviviruses. Intervirology, 2000. 43(1): p. 48-54.
23. Tolou, H., et al., Complete genomic sequence of a dengue type 2 virus from the French
West Indies. Biochem Biophys Res Commun, 2000. 277(1): p. 89-92.
24. Gubler, D.J., The global pandemic of dengue/dengue haemorrhagic fever: current status
and prospects for the future. Ann Acad Med Singapore, 1998. 27(2): p. 227-34.
25. Rothman, A.L., Immunology and immunopathogenesis of dengue disease. Adv Virus Res,
2003. 60: p. 397-419.
26. Cologna, R. and R. Rico-Hesse, American genotype structures decrease dengue virus
output from human monocytes and dendritic cells. J Virol, 2003. 77(7): p. 3929-38.
27. Endy, T.P., et al., Spatial and temporal circulation of dengue virus serotypes: a
prospective study of primary school children in Kamphaeng Phet, Thailand. Am J
Epidemiol, 2002. 156(1): p. 52-9.
28. Rico-Hesse, R., Molecular evolution and distribution of dengue viruses type 1 and 2 in
nature. Virology, 1990. 174(2): p. 479-93.
29. Rico-Hesse, R., Microevolution and virulence of dengue viruses. Adv Virus Res, 2003.
59: p. 315-41.
30. Vaughn, D.W., Invited commentary: Dengue lessons from Cuba. Am J Epidemiol, 2000.
152(9): p. 800-3.
31. Leitmeyer, K.C., et al., Dengue virus structural differences that correlate with
pathogenesis. J Virol, 1999. 73(6): p. 4738-47.
32. Watts, D.M., et al., Failure of secondary infection with American genotype dengue 2 to
cause dengue haemorrhagic fever. Lancet, 1999. 354(9188): p. 1431-4.
33. Messer, W.B., et al., Emergence and global spread of a dengue serotype 3, subtype III
virus. Emerg Infect Dis, 2003. 9(7): p. 800-9.
34. Nisalak, A., et al., Serotype-specific dengue virus circulation and dengue disease in
Bangkok, Thailand from 1973 to 1999. Am J Trop Med Hyg, 2003. 68(2): p. 191-202.
35. Malavige, G.N., et al., Dengue viral infections. Postgraduate Medical Journal, 2004.
80(948): p. 588-601.
36. Altmeyer, R., Virus attachment and entry offer numerous targets for antiviral therapy.
Curr Pharm Des, 2004. 10(30): p. 3701-12.
37. Liu, W.J., et al., Inhibition of interferon signaling by the New York 99 strain and Kunjin
subtype of West Nile virus involves blockage of STAT1 and STAT2 activation by
nonstructural proteins. J Virol, 2005. 79(3): p. 1934-42.
38. Munoz-Jordan, J.L., et al., Inhibition of interferon signaling by dengue virus. Proc Natl
Acad Sci U S A, 2003. 100(24): p. 14333-8.
39. Holland, J.J., J.C. De La Torre, and D.A. Steinhauer, RNA virus populations as
quasispecies. Curr Top Microbiol Immunol, 1992. 176: p. 1-20.
40. Lin, S.R., et al., Study of sequence variation of dengue type 3 virus in naturally infected
mosquitoes and human hosts: implications for transmission and evolution. J Virol, 2004.
78(22): p. 12717-21.
41. Lu, M., et al., Analysis of hepatitis C virus quasispecies populations by temperature
gradient gel electrophoresis. J Gen Virol, 1995. 76 (Pt 4): p. 881-7.
42. Martell, M., et al., Hepatitis C virus (HCV) circulates as a population of different but
closely related genomes: quasispecies nature of HCV genome distribution. J Virol, 1992.
66(5): p. 3225-9.
43. Wang, W.K., et al., Dengue type 3 virus in plasma is a population of closely related
genomes: quasispecies. J Virol, 2002. 76(9): p. 4662-5.
44. Zhu, T., et al., Genotypic and phenotypic characterization of HIV-1 patients with primary
infection. Science, 1993. 261(5125): p. 1179-81.
45. Loke, H., et al., Strong HLA class I-restricted T cell responses in dengue hemorrhagic
fever: a double-edged sword? J Infect Dis, 2001. 184(11): p. 1369-73.
46. Loke, H., et al., Susceptibility to dengue hemorrhagic fever in Vietnam: evidence of an
association with variation in the vitamin D receptor and Fc gamma receptor IIa genes.
Am J Trop Med Hyg, 2002. 67(1): p. 102-6.
47. Stephens, H.A., et al., HLA-A and -B allele associations with secondary dengue virus
infections correlate with disease severity and the infecting viral serotype in ethnic Thais.
Tissue Antigens, 2002. 60(4): p. 309-18.
48. Sakuntabhai, A., et al., A variant in the CD209 promoter is associated with severity of
dengue disease. Nat Genet, 2005. 37(5): p. 507-13.
49. Wagenaar, J.F.P., A.T.A. Mairuhu, and E.C. van Gorp, Genetic influences on dengue
virus infections. WHO Dengue Bulletin, 2004. 28: p. 126-134.
50. Moore, C.B., et al., Evidence of HIV-1 adaptation to HLA-restricted immune responses at
a population level. Science, 2002. 296(5572): p. 1439-43.
51. Stephens, H.A., et al., HLA-A and -B allele associations with secondary dengue virus
infections correlate with disease severity and the infecting viral serotype in ethnic Thais.
Tissue Antigens, 2002. 60(4): p. 309-18.
52. Rothman, A.L., Dengue: defining protective versus pathologic immunity. J Clin Invest,
2004. 113(7): p. 946-51.
53. Klungthong, C., et al., The molecular epidemiology of dengue virus serotype 4 in
Bangkok, Thailand. Virology, 2004. 329(1): p. 168-79.
54. Bennett, S.N., et al., Selection-driven evolution of emergent dengue virus. Mol Biol Evol,
2003. 20(10): p. 1650-8.
55. Moncayo, A.C., et al., Dengue emergence and adaptation to peridomestic mosquitoes.
Emerg Infect Dis, 2004. 10(10): p. 1790-6.
56. Wang, E., et al., Evolutionary relationships of endemic/epidemic and sylvatic dengue
viruses. J Virol, 2000. 74(7): p. 3227-34.
57. Uzcategui, N.Y., et al., Molecular epidemiology of dengue virus type 3 in Venezuela. J
Gen Virol, 2003. 84(Pt 6): p. 1569-75.
58. Cologna, R., P.M. Armstrong, and R. Rico-Hesse, Selection for virulent dengue viruses
occurs in humans and mosquitoes. J Virol, 2005. 79(2): p. 853-9.
59. Laille, M. and C. Roche, Comparison of dengue-1 virus envelope glycoprotein gene
sequences from French Polynesia. Am J Trop Med Hyg, 2004. 71(4): p. 478-84.
60. Shurtleff, A.C., et al., Genetic variation in the 3' non-coding region of dengue viruses.
Virology, 2001. 281(1): p. 75-87.
61. Aviles, G., et al., Complete coding sequences of dengue-1 viruses from Paraguay and
Argentina. Virus Res, 2003. 98(1): p. 75-82.
62. Baleotti, F.G., M.L. Moreli, and L.T. Figueiredo, Brazilian Flavivirus phylogeny based
on NS5. Mem Inst Oswaldo Cruz, 2003. 98(3): p. 379-82.
Appendix A
List of published evolutionary studies of dengue virus showing only two studies that used the complete
viral genome
Location | Dengue serotype / No. of clones sequenced | Target sequence | Hypothesis | Reference
---------|--------------------------------------------|-----------------|------------|----------
Taiwan | DEN-3. 88 from human and 50 from mosquitoes | C and E | Higher nucleotide substitutions in human derived virus. Host-derived positive selection | [40]
Americas | DEN-2. 62 Isolates total with 55 DEN-2. | E/NS1 junction | No differences between DF and DHF. No association with pathogenesis | [19]
Thailand | DEN-4. Three genotypes: SEA (I), SEA-America (II), Sylvatic (III). 53 Isolates from 27-year period | E. 6 Thai strains compared with 58 complete coding regions | No geographical segregation of the three genotypes. Immune driven positive selection | [53]
Puerto Rico, USA | DEN-4. 82 isolates | 40% of the genome (4,000 bases of C, PrM, E, NS1, NS2A, NS4B) | Temporal clustering. Antigenic variation of NS2A. Synonymous changes outnumbering non-synonymous changes | [54]
West Africa, SEA, Oceania | DEN-2 | Hypothetic aa replacements in E by Wang et al. | Peridomestic mosquitoes have higher dissemination rates of endemic viruses. Host-derived positive selection | [55]
Malaysia and Thailand | DEN-1, 2, and 4 | E | Sylvatic genetic divergence | [56]
Venezuela | DEN-3. 15 Isolates | E | Introduction of Genotype III; no evidence of re-emergence of type V from 1960. Importation of dengue strains from Asia. | [57]
SEA, West Africa, America | DEN-2. 24 Isolates | E | Out-competing Asia vs American DEN2 strains in mosquito. Selection pressure evolution | [58]
Cuba | DEN-2. 20 isolates and 6 for complete genome | E, complete genome | No a.a. difference in E. Evolution during epidemic in the NS genes. | [13]
Venezuela | DEN-2. 34 isolates | E, PrM, NS1 | In situ evolution of dengue 2 within the same geographical region | [20]
French Polynesia | DEN-1. Genotype V and IV | E | Introduction of DEN1 SEA | [59]
SEA, West Africa, Americas, sylvatic | DEN-1, 2, 3, 4 | 3' NTR | DEN viruses arose from sylvatic progenitors and evolved into human epidemic strains. Variation in the 3'NCR does not correlate with DEN virus pathogenesis | [60]
Argentina, Paraguay | DEN-1. Five isolates | Whole genome | 3- a.a. in NS4A. Co-existence of two genotypes of the five known. | [61]
Brazil | DEN and encephalitic and hemorrhagic in 15 strains of flaviviruses | NS5 | Selection pressure evolution | [62]
Thailand & Venezuela | DEN-2. 11 isolates | Whole genome | Correlated structural differences to pathogenesis | [31]