HIV Sequence Compendium 2009
Editors
Carla Kuiken, Los Alamos National Laboratory
Thomas Leitner, Los Alamos National Laboratory
Brian Foley, Los Alamos National Laboratory
Beatrice Hahn, University of Alabama
Preston Marx, Tulane National Primate Research Center
Francine McCutchan, Henry M. Jackson Foundation
Steven Wolinsky, Northwestern University
Bette Korber, Los Alamos National Laboratory
Project Officer
Geetha Bansal
Division of AIDS, National Institute of Allergy and Infectious Diseases
Los Alamos HIV Sequence Database and Analysis Staff
Werner Abfalterer, Gayathri Athreya, Will Fischer, Bob Funkhouser, Chien-Chi Lo, Jennifer Macke, James J. Szinger, James Thurmond, Hyejin Yoon, Ming Zhang
This publication is funded by the Division of AIDS, National Institute of Allergy and Infectious Diseases, through interagency agreement IAA Y1-AI-8309-1 HIV/SIV Database and Analysis Unit with the U.S. Department of Energy.
Published by
Theoretical Biology and Biophysics Group T-6, Mail Stop K710
Los Alamos National Laboratory
Los Alamos, New Mexico 87545 U.S.A.
LA-UR 09-03280
http://www.hiv.lanl.gov/
LA-UR 09-03280. Approved for public release; distribution is unlimited.
Los Alamos National Laboratory, an affirmative action/equal opportunity employer, is operated by Los Alamos National Security, LLC, for the National Nuclear Security Administration of the U.S. Department of Energy under contract DE-AC52-06NA25396.
This report was prepared as an account of work sponsored by an agency of the U.S. Government. Neither Los Alamos National Security, LLC, the U.S. Government nor any agency thereof, nor any of their employees make any warranty, express or implied, or assume any legal liability or responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed, or represent that its use would not infringe privately owned rights. Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise does not necessarily constitute or imply its endorsement, recommendation, or favoring by Los Alamos National Security, LLC, the U.S. Government, or any agency thereof. The views and opinions of authors expressed herein do not necessarily state or reflect those of Los Alamos National Security, LLC, the U.S. Government, or any agency thereof.
Los Alamos National Laboratory strongly supports academic freedom and a researcher's right to publish; as an institution, however, the Laboratory does not endorse the viewpoint of a publication or guarantee its technical correctness.
This report was prepared as an account of work sponsored by NIH/NIAID/DAIDS under contract number IAA Y1-AI-8309-1 HIV/SIV Database and Analysis Unit.
Contents
I Preface
  I-1 Introduction
  I-2 Acknowledgements
  I-3 Citing the compendium or database
  I-4 About the PDF
  I-5 About the cover
  I-6 Genome maps
  I-7 HIV/SIV proteins
  I-8 Landmarks of the genome
  I-9 Amino acid codes
  I-10 Nucleic acid codes
II HIV-1/SIVcpz Complete Genomes
  II-1 Introduction
  II-2 Annotated features
  II-3 Sequences
  II-4 Alignments
III HIV-2/SIV Complete Genomes
  III-1 Introduction
  III-2 Annotated features
  III-3 Sequences
  III-4 Alignments
IV PLV Complete Genomes
  IV-1 Introduction
  IV-2 Sequences
  IV-3 Alignments
V HIV-1/SIVcpz Proteins
  V-1 Introduction
  V-2 Annotated features
  V-3 Sequences
  V-4 Alignments
VI HIV-2/SIV Proteins
  VI-1 Introduction
  VI-2 Annotated features
  VI-3 Sequences
  VI-4 Alignments
VII PLV Proteins
  VII-1 Introduction
  VII-2 Sequences
  VII-3 Alignments
I Preface
I-1 Introduction
This compendium is an annual printed summary of the data contained in the HIV sequence database. In these compendia we try to present a judicious selection of the data in such a way that it is of maximum utility to HIV researchers. Each of the alignments attempts to display the genetic variability within the different species, groups and subtypes of the virus.
This compendium contains sequences published before January 1, 2009. Hence, though it is called the 2009 Compendium, its contents correspond to the 2008 curated alignments on our website.
The number of sequences in the HIV database is still increasing exponentially. In total, at the time of printing, there were 229,451 sequences in the HIV Sequence Database, an increase of 17% since last year.
The number of near complete genomes (>7000 nucleotides) increased to 2099 by the end of 2008, a smaller increase than in previous years. However, as in previous years, the compendium alignments contain only a small fraction of these.
Included in the alignments are a small number of sequences representing each of the subtypes and the more prevalent circulating recombinant forms (CRFs), such as CRF01 and CRF02, as well as a few outgroup sequences (group O, group N, and SIVcpz). Of the rarer CRFs, we included one representative each. A more complete version of all alignments is available on our website:
http://www.hiv.lanl.gov/content/sequence/NEWALIGN/align.html
Reprints are available from our website in the form of both HTML and PDF files. As always, we are open to complaints and suggestions for improvement. Inquiries and comments regarding the compendium should be addressed to [email protected]
I-2 Acknowledgements
The HIV Sequence Database and Analysis Project is funded by the Vaccine and Prevention Research Program of the AIDS Division of the National Institute of Allergy and Infectious Diseases (Dr. Geetha Bansal, Project Officer) through interagency agreement IAA Y1-AI-8309-1 HIV/SIV Database and Analysis Unit with the U.S. Department of Energy.
I-3 Citing the compendium or database
The LANL HIV Sequence Database may be cited in the same manner as this compendium:
HIV Sequence Compendium 2009. Carla Kuiken, Thomas Leitner, Brian Foley, Beatrice Hahn, Preston Marx, Francine McCutchan, Steven Wolinsky, and Bette Korber, editors. 2009. Publisher: Los Alamos National Laboratory, Theoretical Biology and Biophysics, Los Alamos, New Mexico. LA-UR 09-03280.
I-4 About the PDF
The complete HIV Sequence Compendium 2009 is available in Adobe Portable Document Format (PDF) from our website, http://www.hiv.lanl.gov/. The PDF version is hypertext enabled and features a clickable table of contents, indexes, references and links to external web sites.
This volume is typeset using LaTeX.
I-5 About the cover
The cover of this compendium depicts a phylogenetic analysis method that classifies sequences into known subtype clades with the Branching Index (BI) (Figure I.1). The purpose of the experiment was to examine variation in BI classification accuracy across known subtypes and over the HIV-1 genome, and thereby to establish threshold BI values for inferring whether a sequence belongs to a known subtype clade. The x-axis of each panel depicts position along the HIV-1 genome. The y-axis is the BI, which varies from 0 (complete divergence from other members of the subtype clade) to 1 (no divergence from a known subtype clade, relative to other clades). Each panel depicts a known subtype and results from computing the BI for many sequences of that subtype. Thus, the figure quantifies BI classification accuracy for HIV-1 subtypes. This analysis complements results from the LANL HIV Database PhyloPlace tool by establishing BI error rates and confidence levels for subtype inference.
We sampled 10,000 random-length fragments over the 2005 HIV-1 M-group subtype reference alignment and randomly chose a query sequence for each. We computed the BI twice for each fragment sampled: once with all sequences included (generating the data points used to draw the upper curve, colored by subtype and represented by + symbols), and once with a subset of sequences that excluded the subtype of the query sequence (generating the data points used to draw the lower curve, colored in grayscale and represented by a second symbol). This second scenario emulates conditions where the query sequence is from an unknown subtype. The figure depicts the BI from each fragment as a pair of points. Both the midpoint (+ or the second symbol) and the extent of each fragment are shown. Vertical lines connect paired observations when one is misclassified. We fit smooth curves to the resulting data with loess locally weighted regression. Horizontal lines depict the optimal thresholds for inferring with greatest accuracy that a sequence is of a known subtype, either among all subtypes (black) or per subtype (gray). Sequences yielding BI values above 0.66 are significantly associated with the subtype, with 93.5% confidence. For more information about the BI, see Wilbe et al. 2003 (Virology 316:116-125). For additional information about the generation of these curves and their use in classifying HIV-1 sequences, see Hraber et al. 2008 (J. Gen. Virol. 89:2098-2107).
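The fragment sampling and threshold search described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the published procedure: BI values are assumed to be computed already (the tree-based BI calculation itself is not shown), each fragment contributes a pair (BI with the query's subtype present, BI with it removed), and a threshold counts as correct for a pair when the first value is above it and the second is not. The helper names, the 0.01 grid, and the uniform fragment-length sampling are hypothetical choices.

```python
import random
from typing import List, Optional, Sequence, Tuple


def sample_fragments(aln_len: int, n: int, min_len: int,
                     rng: random.Random) -> List[Tuple[int, int]]:
    """Draw n random-length fragments as half-open (start, end) column ranges.

    Hypothetical helper: lengths are uniform between min_len and aln_len,
    and each start is chosen so the fragment stays inside the alignment.
    """
    if not 1 <= min_len <= aln_len:
        raise ValueError("need 1 <= min_len <= aln_len")
    fragments = []
    for _ in range(n):
        length = rng.randint(min_len, aln_len)    # randint bounds are inclusive
        start = rng.randint(0, aln_len - length)  # guarantees end <= aln_len
        fragments.append((start, start + length))
    return fragments


def threshold_accuracy(pairs: Sequence[Tuple[float, float]], t: float) -> float:
    """Fraction of correct calls at threshold t.

    Each pair is (bi_with_subtype, bi_without_subtype). A call is correct
    when the BI with the query's subtype present is above t, and when the
    BI with that subtype removed is at or below t.
    """
    correct = 0
    for bi_with, bi_without in pairs:
        correct += bi_with > t       # subtype present: should be assigned
        correct += bi_without <= t   # subtype absent: should not be assigned
    return correct / (2 * len(pairs))


def best_threshold(pairs: Sequence[Tuple[float, float]],
                   grid: Optional[Sequence[float]] = None) -> float:
    """Return the grid value with the highest accuracy (lowest value on ties)."""
    if grid is None:
        grid = [i / 100 for i in range(101)]  # 0.00, 0.01, ..., 1.00
    return max(grid, key=lambda t: threshold_accuracy(pairs, t))
```

With the made-up pairs [(0.9, 0.2), (0.8, 0.5), (0.7, 0.6)], best_threshold returns 0.6, the lowest grid value that separates every pair. A per-subtype threshold (the gray lines) would repeat the same search on each subtype's pairs separately.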
[Figure I.1: panels labeled A1, B, C, D, F1, F2, G, and H, one per HIV-1 subtype; y-axis: branching index (0 to 1); x-axis: alignment position (kb, 0 to 10).]
Figure I.1: The branching index classifies HIV-1 subtypes.
I-6 Genome maps
[Figure: genome maps of HIV-1 HXB2, HIV-2 BEN, and SIV Sykes, showing gag, pol, vif, vpr, vpu (HIV-1) or vpx (HIV-2), tat, rev, env, nef and the 5' and 3' LTRs in reading frames 1 to 3; x-axis: genome position (nt), 0 to 9719 for HXB2, 0 to 10359 for BEN, and 0 to 9597 for Sykes.]
Landmarks of the HIV-1, HIV-2, and SIV genomes. Open reading frames are shown as rectangles. The gene start, indicated by the small number in the upper left corner of each rectangle, normally records the position of the a in the atg start codon for that gene, while the number in the lower right records the last position of the stop codon. For pol, the start is taken to be the first t in the sequence ttttttag, which forms part of the stem loop that potentiates ribosomal slippage on the RNA, a resulting -1 frameshift, and the translation of the Gag-Pol polyprotein. The tat and rev spliced exons are shown as shaded rectangles. In HXB2, *5772 marks the position of a frameshift in the vpr gene caused by an extra t relative to most other subtype B viruses; !6062 indicates a defective acg start codon in vpu; 8424 and 9168 mark premature stop codons in tat and nef. See Korber et al., Numbering Positions in HIV Relative to HXB2CG, in Human Retroviruses and AIDS, 1998, p. 102. Available from
http://www.hiv.lanl.gov/content/sequence/HIV/REVIEWS/HXB2.html
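The HXB2 coordinates on the genome map can be used to look up which genes and LTRs overlap a given HXB2 position. The sketch below is a minimal Python illustration, assuming 1-based inclusive coordinates as drawn on the HXB2 map (start at the a of atg, or the first t of ttttttag for pol; end at the last base of the stop codon). The table and function name are hypothetical, and only the HXB2 map is encoded.

```python
from typing import Dict, List, Tuple

# HXB2 regions as 1-based, inclusive (start, end) spans, read from the HXB2
# genome map; tat and rev each have two spliced exons.
HXB2_REGIONS: Dict[str, List[Tuple[int, int]]] = {
    "5' LTR": [(1, 634)],
    "gag": [(790, 2292)],
    "pol": [(2085, 5096)],
    "vif": [(5041, 5619)],
    "vpr": [(5559, 5850)],
    "tat": [(5831, 6045), (8379, 8469)],
    "rev": [(5970, 6045), (8379, 8653)],
    "vpu": [(6062, 6310)],
    "env": [(6225, 8795)],
    "nef": [(8797, 9417)],
    "3' LTR": [(9086, 9719)],
}
HXB2_LENGTH = 9719


def regions_at(position: int) -> List[str]:
    """Return the names of all HXB2 regions that contain the position."""
    if not 1 <= position <= HXB2_LENGTH:
        raise ValueError(f"HXB2 positions run from 1 to {HXB2_LENGTH}")
    return [name for name, spans in HXB2_REGIONS.items()
            if any(start <= position <= end for start, end in spans)]
```

For example, regions_at(2100) returns ['gag', 'pol'], reflecting the gag/pol overlap, and regions_at(8400) returns ['tat', 'rev', 'env'], because the second exons of tat and rev lie within env.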
I-7 HIV/SIV proteins
Name | Size | Function | Localization
Gag: MA | p17 | membrane anchoring; env interaction; nuclear transport of viral core (myristylated protein) | virion
Gag: CA | p24 | core capsid | virion
Gag: NC | p7 | nucleocapsid, binds RNA | virion
Gag: p6 | p6 | binds Vpr | virion
Pol: Protease (PR) | p15 | Gag/Pol cleavage and maturation | virion
Pol: Reverse Transcriptase (RT) | p66, p51 | reverse transcription, RNase H activity | virion
Pol: RNase H | p15 | - | virion
Pol: Integrase (IN) | p31 | DNA provirus integration | virion
Env | gp120/gp41 | external viral glycoproteins bind to CD4 and secondary receptors | plasma membrane, virion envelope
Tat | p16/p14 | viral transcriptional transactivator | primarily in nucleolus/nucleus
Rev | p19 | RNA transport, stability and utilization factor (phosphoprotein) | primarily in nucleolus/nucleus, shuttling between nucleolus and cytoplasm
Vif | p23 | promotes virion maturation and infectivity | cytoplasm (cytosol, membranes), virion
Vpr | p10-15 | promotes nuclear localization of preintegration complex, inhibits cell division, arrests infected cells at G2/M | virion, nucleus (nuclear membrane?)
Vpu | p16 | promotes extracellular release of viral particles; degrades CD4 in the ER (phosphoprotein; only in HIV-1 and SIVcpz) | integral membrane protein
Nef | p27-p25 | CD4 and class I downregulation (myristylated protein) | plasma membrane, cytoplasm, (virion?)
Vpx | p12-16 | Vpr homolog present in HIV-2 and some SIVs, absent in HIV-1 | virion (nucleus?)
Tev | p28 | tripartite tat-env-rev protein (also named Tnv) | primarily in nucleolus/nucleus
I-8 Landmarks of the genome
HIV genomic structural elements
LTR Long terminal repeat, the DNA sequence flanking the genome of integrated proviruses. It contains important regulatory regions, especially those for transcription initiation and polyadenylation.
TAR Target sequence for viral transactivation, the binding site for Tat protein and for cellular proteins; consists of approximately the first 45 nucleotides of the viral mRNAs in HIV-1 (or the first 100 nucleotides in HIV-2 and SIV). TAR RNA forms a hairpin stem-loop structure with a side bulge; the bulge is necessary for Tat binding and function.
RRE Rev responsive element, an RNA element encoded within the env region of HIV-1. It consists of approximately 200 nucleotides (positions 7327 to 7530 from the start of transcription in HIV-1, spanning the border of gp120 and gp41). The RRE is necessary for Rev function; it contains a high-affinity site for Rev; in all, approximately seven binding sites for Rev exist within the RRE RNA. Other lentiviruses (HIV-2, SIV, visna, CAEV) have similar RRE elements in similar locations within env, while HTLVs have an analogous RNA element (RXRE) serving the same purpose within their LTR; RRE is the binding site for Rev protein, while RXRE is the binding site for Rex protein. RRE (and RXRE) form complex secondary structures, necessary for specific protein binding.
PE Psi elements, a set of 4 stem-loop structures preceding and overlapping the Gag start codon. They are the sites recognized by the cysteine-histidine box, a conserved motif with the canonical sequence CysX2CysX4HisX4Cys present in the Gag p7 NC protein. The Psi elements are present in unspliced genomic transcripts but absent from spliced viral mRNAs.
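The CysX2CysX4HisX4Cys box is a fixed-spacing motif, so it maps directly onto a regular expression over one-letter amino acid codes. The sketch below (the function name and the test peptide are hypothetical, chosen for illustration) scans a protein string and reports every match; it only checks the spacing of the four anchor residues, not whether the segment actually folds into a functional zinc knuckle.

```python
import re

# Cys-X2-Cys-X4-His-X4-Cys over one-letter codes; "." matches any residue.
CCHC_PATTERN = r"C.{2}C.{4}H.{4}C"


def find_cchc_boxes(protein: str) -> list[tuple[int, str]]:
    """Return (1-based start, matched segment) for every CCHC box.

    The pattern is wrapped in a zero-width lookahead so that
    overlapping candidates are reported as well.
    """
    return [
        (m.start() + 1, m.group(1))
        for m in re.finditer(f"(?=({CCHC_PATTERN}))", protein.upper())
    ]


if __name__ == "__main__":
    # Hypothetical test peptide containing two CCHC boxes.
    print(find_cchc_boxes("CFNCGKEGHIAKNCRAPRKKGCWKCGKEGHQMKDC"))
    # [(1, 'CFNCGKEGHIAKNC'), (22, 'CWKCGKEGHQMKDC')]
```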
SLIP A TTTTTT slippery site, followed by a stem-loop structure, is responsible for regulating the -1 ribosomal frameshift out of the Gag reading frame into the Pol reading frame.
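The effect of a -1 frameshift on codon reading can be shown with a deliberately simplified model: the ribosome reads triplets in frame 0, and at one chosen codon it moves forward 2 nucleotides instead of 3, so every later codon is read in the -1 frame. This is a sketch under that assumption only; it ignores the stem-loop, the tRNA re-pairing mechanics, and the frameshift efficiency. The function name and the RNA string are hypothetical (a toy containing a six-U run, not the HIV-1 sequence).

```python
from typing import Optional


def read_codons(rna: str, slip_after: Optional[int] = None) -> list[str]:
    """Split `rna` into codons, reading from index 0.

    If `slip_after` is given, it is the 0-based start of the codon after
    which the ribosome slips back by one nucleotide (a -1 frameshift);
    every later codon is read in the -1 frame. A value that is not a
    codon start in frame 0 is never reached, so no slip occurs.
    """
    codons = []
    i = 0
    while i + 3 <= len(rna):
        codons.append(rna[i:i + 3])
        # Normal translocation moves 3 nt; a -1 slip moves 3 - 1 = 2 nt.
        i += 2 if i == slip_after else 3
    return codons


if __name__ == "__main__":
    toy = "AAUUUUUUAGGGAAGAUC"  # hypothetical toy RNA with a UUUUUU run
    print(read_codons(toy))                # frame 0 (Gag-like frame)
    print(read_codons(toy, slip_after=3))  # -1 shift after the UUU at index 3
```

In frame 0 the toy reads AAU UUU UUA GGG AAG AUC; with the slip it reads AAU UUU UUU AGG GAA GAU, i.e. the codons after the slippery run come from the overlapping -1 frame.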
CRS Cis-acting repressive sequences postulated to inhibit structural protein expression in the absence of Rev. One such site was mapped within the pol region of HIV-1. The exact function has not been defined; splice sites have been postulated to act as CRS sequences.
INS Inhibitory/Instability RNA sequences found within the structural genes of HIV-1 and of other complex retroviruses. Multiple INS elements exist within the genome and can act independently; one of the best characterized elements spans nucleotides 414 to 631 in the gag region of HIV-1. The INS elements have been defined by functional assays as elements that inhibit expression posttranscriptionally. Mutation of the RNA elements was shown to lead to INS inactivation and upregulation of gene expression.
Genes and gene products
GAG The genomic region encoding the capsid proteins (group-specific antigens). The precursor is the p55 myristylated protein, which is processed by the viral protease to the p17 (MAtrix), p24 (CApsid), p7 (NucleoCapsid), and p6 proteins. Gag associates with the plasma membrane, where virus assembly takes place. The 55 kDa Gag precursor is called assemblin to indicate its role in viral assembly.
POL The genomic region encoding the viral enzymes protease, reverse transcriptase, RNase, and integrase. These enzymes are produced as a Gag-Pol precursor polyprotein, which is processed by the viral protease; the Gag-Pol precursor is produced by ribosomal frameshifting near the 3′ end of gag.
ENV Viral glycoproteins produced as a precursor (gp160), which is processed to give a noncovalent complex of the external glycoprotein gp120 and the transmembrane glycoprotein gp41. The mature gp120-gp41 proteins are bound by noncovalent interactions and are associated as trimers on the cell surface. A substantial amount of gp120 can be found released into the medium. gp120 contains the binding sites for the CD4 receptor and for the seven-transmembrane-domain chemokine receptors that serve as co-receptors for HIV-1.
TAT Transactivator of HIV gene expression, one of two essential viral regulatory factors (Tat and Rev) for HIV gene expression. Two forms are known: a one-exon Tat (minor form) of 72 amino acids and a two-exon Tat (major form) of 86 amino acids. Low levels of both proteins are found in persistently infected cells. Tat has been localized primarily in the nucleolus/nucleus by immunofluorescence. It acts by binding to the TAR RNA element and activating transcription initiation and elongation from the LTR promoter, preventing the 5′ LTR AATAAA polyadenylation signal from causing premature termination of transcription and polyadenylation. It is the first eukaryotic transcription factor known to interact with RNA rather than DNA and may have similarities with prokaryotic anti-termination factors. Extracellular Tat can be detected, and it can be taken up by cells in culture.
REV The second necessary regulatory factor for HIV expression. A 19 kDa phosphoprotein localized primarily in the nucleolus/nucleus, Rev acts by binding to the RRE and promoting the nuclear export, stabilization, and utilization of the unspliced viral mRNAs containing the RRE. Rev is considered the most functionally conserved regulatory protein of lentiviruses. Rev cycles rapidly between the nucleus and the cytoplasm.
VIF Viral infectivity factor, a basic protein of typically 23 kDa. Vif promotes the infectivity but not the production of viral particles. In the absence of Vif, the viral particles produced are defective, while cell-to-cell transmission of virus is not significantly affected. Found in almost all lentiviruses, Vif is a cytoplasmic protein existing in both a soluble cytosolic form and a membrane-associated form. The latter form of Vif is a peripheral membrane protein that is tightly associated with the cytoplasmic side of cellular membranes. In 2003, it was discovered that Vif prevents the action of the cellular APOBEC-3G protein, which deaminates DNA:RNA heteroduplexes in the cytoplasm.
VPR Vpr (viral protein R) is a 96-amino acid (14 kDa) protein that is incorporated into the virion. It interacts with the p6 Gag part of the Pr55 Gag precursor. Vpr detected in the cell is localized to the nucleus. Proposed functions for Vpr include targeting the nuclear import of preintegration complexes, cell growth arrest, transactivation of cellular genes, and induction of cellular differentiation. In HIV-2, SIV-SMM, SIV-RCM, SIV-MND-2, and SIV-DRL, the Vpx gene is apparently the result of a Vpr gene duplication event, possibly by recombination.
VPU Vpu (viral protein U) is unique to HIV-1, SIVcpz (the closest SIV relative of HIV-1), SIV-GSN, SIV-MUS, SIV-MON, and SIV-DEN. There is no similar gene in HIV-2, SIV-SMM, or other SIVs. Vpu is a 16 kDa (81-amino acid) type I integral membrane protein with at least two different biological functions: (a) degradation of CD4 in the endoplasmic reticulum, and (b) enhancement of virion release from the plasma membrane of HIV-1-infected cells. Env and Vpu are expressed from a bicistronic mRNA. Vpu probably possesses an N-terminal hydrophobic membrane anchor and a hydrophilic moiety. It is phosphorylated by casein kinase II at positions Ser52 and Ser56. Vpu is involved in Env maturation and is not found in the virion. Vpu has been found to increase the susceptibility of HIV-1-infected cells to Fas killing.
NEF A multifunctional 27 kDa myristylated protein produced by an ORF located at the 3′ end of the primate lentivirus genome. Other forms of Nef are known, including nonmyristylated variants. Nef is predominantly cytoplasmic and associated with the plasma membrane via the myristyl residue linked to the conserved second amino acid (Gly). Nef has also been identified in the nucleus and found associated with the cytoskeleton in some experiments. One of the first HIV proteins to be produced in infected cells, it is the most immunogenic of the accessory proteins. The nef genes of HIV and SIV are dispensable in vitro but are essential for efficient viral spread and disease progression in vivo. Nef is necessary for the maintenance of high virus loads and for the development of AIDS in macaques, and viruses with defective Nef have been detected in some HIV-1-infected long-term survivors. Nef downregulates CD4, the primary viral receptor, and MHC class I molecules, and these functions map to different parts of the protein. Nef interacts with components of host cell signal transduction and clathrin-dependent protein sorting pathways. It increases viral infectivity. Nef contains PxxP motifs that bind to SH3 domains of a subset of Src kinases and are required for the enhanced growth of HIV but not for the downregulation of CD4.
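Two of the sequence features mentioned for Nef, the Gly at position 2 that carries the myristyl residue and the PxxP motifs, can be checked on a one-letter protein string. The sketch below assumes the minimal PxxP core (Pro, any two residues, Pro) and reports overlapping matches; real SH3 binding also depends on flanking residues, and real N-myristoylation depends on more than residue 2, neither of which this checks. The function names and the test strings are hypothetical.

```python
import re


def pxxp_motifs(protein: str) -> list[tuple[int, str]]:
    """Return (1-based start, segment) for each minimal PxxP core.

    A zero-width lookahead lets overlapping motifs (e.g. PxxPxxP)
    be reported individually.
    """
    return [
        (m.start() + 1, m.group(1))
        for m in re.finditer(r"(?=(P..P))", protein.upper())
    ]


def has_gly2(protein: str) -> bool:
    """True if residue 2 is Gly, the residue the myristyl group is linked to.

    Only this single position is checked; downstream context is ignored.
    """
    return len(protein) >= 2 and protein[1].upper() == "G"


if __name__ == "__main__":
    toy = "MGPAAPAAPK"  # hypothetical test string, not a real Nef sequence
    print(has_gly2(toy), pxxp_motifs(toy))
    # True [(3, 'PAAP'), (6, 'PAAP')]
```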
VPX A virion protein of 12 kDa found in HIV-2, SIV-SMM, SIV-RCM, SIV-MND-2, and SIV-DRL and not in HIV-1 or other SIVs. This accessory gene is a homolog of HIV-1 vpr, and viruses with Vpx carry both vpr and vpx. Vpx function in relation to Vpr is not fully elucidated; both are incorporated into virions at levels comparable to Gag proteins through interactions with Gag p6. Vpx is necessary for efficient replication of SIV-SMM in PBMCs. Progression to AIDS and death in SIV-infected animals can occur in the absence of Vpr or Vpx. A double mutant virus lacking both vpr and vpx was attenuated, whereas the single mutants were not, suggesting redundancy in the function of Vpr and Vpx related to virus pathogenicity.
Structural proteins/viral enzymes The products of the gag, pol, and env genes, which are essential components of the retroviral particle.
Regulatory proteins Tat and Rev proteins of HIV/SIV and Tax and Rex proteins of HTLVs. They modulate transcriptional and posttranscriptional steps of virus gene expression and are essential for virus propagation.
Accessory or auxiliary proteins Additional virion and non-virion-associated proteins produced by HIV/SIV retroviruses: Vif, Vpr, Vpu, Vpx, Nef. Although the accessory proteins are in general not necessary for viral propagation in tissue culture, they have been conserved in the different isolates; this conservation and experimental observations suggest that their role in vivo is very important. Their functional importance continues to be elucidated.
Complex retroviruses Retroviruses regulating their expression via viral factors and expressing additional proteins (regulatory and accessory) essential for their life cycle.
I-9 Amino acid codes
A Alanine
B Aspartic Acid or Asparagine
C Cysteine
D Aspartic Acid
E Glutamic Acid
F Phenylalanine
G Glycine
H Histidine
I Isoleucine
K Lysine
L Leucine
M Methionine
N Asparagine
P Proline
Q Glutamine
R Arginine
S Serine
T Threonine
V Valine
W Tryptophan
X unknown or other amino acid
Y Tyrosine
Z Glutamic Acid or Glutamine
. gap
- identity
* stop codon
# incomplete codon
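In the printed alignments, a dash stands for identity with the master (top) sequence, which for these alignments is HXB2, and a dot stands for a gap. Recovering a sequence from such a row therefore means copying the master character wherever the row has a dash, then dropping gaps. The sketch below assumes rows and master are already padded to the same aligned length; the function names and the short example strings are hypothetical.

```python
GAP = "."       # gap introduced to maintain the alignment
IDENTITY = "-"  # same character as the master sequence in this column


def expand_row(master: str, row: str) -> str:
    """Rewrite a dash-for-identity row as explicit characters.

    Dashes are replaced by the master's character in the same column;
    everything else (residues, '.', '*', '#') is kept as written.
    """
    if len(master) != len(row):
        raise ValueError("master and row must have the same aligned length")
    return "".join(m if r == IDENTITY else r for m, r in zip(master, row))


def ungapped(seq: str) -> str:
    """Drop gap characters to recover the raw (unaligned) sequence."""
    return seq.replace(GAP, "")


if __name__ == "__main__":
    master = "MGARASVL"  # hypothetical master fragment
    row = "----.-I-"     # hypothetical row: one gap, one substitution
    full = expand_row(master, row)
    print(full, ungapped(full))  # MGAR.SIL MGARSIL
```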
I-10 Nucleic acid codes
A Adenine
C Cytosine
G Guanine
T Thymine
U Uracil
M A or C
R A or G
W A or T
S C or G
Y C or T
K G or T
V A or C or G
H A or C or T
D A or G or T
B C or G or T
N unknown
. gap
- identity
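The ambiguity codes in the nucleic acid table are sets of bases, so they translate naturally into a lookup table. The sketch below builds that table, tests whether two codes can denote the same base, and enumerates every concrete sequence an ambiguous one could stand for. Two assumptions are made for illustration: U is treated as equivalent to T so that RNA and DNA codes can be compared, and N ("unknown") is treated as any of the four bases. The function names are hypothetical.

```python
from itertools import product

# Ambiguity codes from the nucleic acid table. U is mapped to T (an
# assumption so RNA and DNA codes compare directly); N means any base.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "M": "AC", "R": "AG", "W": "AT", "S": "CG", "Y": "CT", "K": "GT",
    "V": "ACG", "H": "ACT", "D": "AGT", "B": "CGT",
    "N": "ACGT",
}


def compatible(code1: str, code2: str) -> bool:
    """True if the two codes share at least one possible base."""
    return bool(set(IUPAC[code1.upper()]) & set(IUPAC[code2.upper()]))


def expand(seq: str) -> list[str]:
    """List every unambiguous sequence that an ambiguous one could denote."""
    return ["".join(p) for p in product(*(IUPAC[c] for c in seq.upper()))]


if __name__ == "__main__":
    print(compatible("R", "A"), compatible("R", "Y"))  # True False
    print(expand("RY"))  # ['AC', 'AT', 'GC', 'GT']
```

Note that `expand` grows multiplicatively with each ambiguous position (a run of k N's yields 4**k sequences), so it is only suitable for short stretches.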
HIV-1/SIVcpz Complete Genomes
Contents
II-1 Introduction 9
II-2 Annotated features 10
II-3 Sequences 12
II-4 Alignments 16
II-1 Introduction
The goal in selecting the sequences to present in the published compendium alignment is to display a set of sequences that represents the genetic variation of HIV-1 worldwide. Selections are made based on the list of identified subtypes and CRFs together with information from phylogenetic trees. Some effort was also put into representing a diversity of geographic locations and sampling times for each subtype, with an emphasis on more recent samples. We include more sequences from the more prevalent subtypes, only one representative for each CRF (except for the widespread CRFs 01 and 02), and group N, group O, and CPZ sequences. Within the limited space of the compendium, this selection is intended to show the worldwide variation of HIV-1 subtypes and CRFs. This alignment contains no URFs, although their number is growing; the web-based version of this alignment contains all URFs and many additional representatives of most subtypes and CRFs.
As always, the HXB2 sequence is the master sequence in this alignment; it is also the HIV Database genome coordinate standard sequence. The alignment was generated by an iterative process combining automated alignment using HMMER and manual editing using BioEdit and Se-Al. As in previous years, the alignment presented is not s