Phylogenetic Analysis of the SARS virus. Zhang Louxin, Dept of Math, NUS.

Dec 14, 2015

Page 1

Phylogenetic Analysis of the SARS virus

Zhang Louxin

Dept of Math, NUS

Page 2

Our Study

Genome-wide analysis of the SARS virus:
a) Comparison to other coronaviruses
b) Comparison between different strains of the SARS virus.

Objective:
a) Search for clues to where the SARS virus came from
b) Provide a technique for tracking the step-by-step transmission of SARS.

Page 3

Genome Structure of the SARS virus (Marra et al., Rota et al., Ruan et al., 2003)

The RNA genome contains about 30k bps, having five major open reading frames (ORFs):
ORF1a and ORF1b: replicase polyprotein (13149, 7887 bps)
S: spike glycoprotein (3768 bps)
E: small envelope protein (231 bps)
M: membrane glycoprotein (666 bps)
N: nucleocapsid protein (1269 bps)
and 7 unknown ORFs, the X's (total 2595 bps)
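The ORF lengths above can be tabulated to see how much of the genome each region covers. This is a minimal sketch: the 30,000 bp genome length is an assumption taken from the slide's "about 30k", and the names `ORF_LENGTHS_BP` and `share_of_genome` are illustrative only.

```python
# ORF lengths in base pairs, as listed on the slide.
ORF_LENGTHS_BP = {
    "ORF1a": 13149,
    "ORF1b": 7887,
    "S": 3768,
    "E": 231,
    "M": 666,
    "N": 1269,
    "X1-X7 (total)": 2595,
}

# Assumed genome length: the slide only says "about 30k".
GENOME_BP = 30_000


def share_of_genome(length_bp, genome_bp=GENOME_BP):
    """Return the percentage of the genome covered by a region."""
    return 100.0 * length_bp / genome_bp


for name, length in ORF_LENGTHS_BP.items():
    print(f"{name:15s} {length:6d} bp  {share_of_genome(length):5.2f}%")
```

Under this assumption the X regions cover 2595 / 30000 = 8.65% of the genome, which agrees with the percentage quoted for them on a later slide.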

Page 4

Part 1: Comparison to other coronaviruses

In comparing the sequences of the SARS virus and each known coronavirus, we did the following:

a) Pairwise alignment between the genomes of SARS-CoV and the coronavirus.

b) Assessment of the quality of a) using VISTA, which measures the match rate f(x) in the window [x-99, x].
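The windowed match rate f(x) can be sketched as follows. This is not VISTA's own implementation: it assumes two already-aligned strings of equal length with "-" for gaps, counts a column as a match only when both bases are identical and not gaps, and indexes x over alignment columns (1-based). The function name `window_match_rate` is hypothetical.

```python
from itertools import accumulate


def window_match_rate(aligned_a, aligned_b, window=100):
    """Return {x: f(x)} where f(x) is the match rate in columns [x-window+1, x].

    Columns are 1-based; only positions with a full window are reported.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # 1 where the column is an identical, non-gap pair, else 0.
    match = [int(a == b and a != "-") for a, b in zip(aligned_a, aligned_b)]
    # prefix[i] = number of matching columns among the first i columns.
    prefix = [0] + list(accumulate(match))
    return {
        x: (prefix[x] - prefix[x - window]) / window
        for x in range(window, len(match) + 1)
    }


# Toy example: the first 20 columns of the second sequence are gaps.
seq_a = "ACGT" * 50
seq_b = "-" * 20 + seq_a[20:]
rates = window_match_rate(seq_a, seq_b)
print(rates[100], rates[200])
```

With window=100 the window ending at x covers exactly [x-99, x], as on the slide. A conservation level such as 50% or 60% then corresponds to keeping the positions where f(x) is at least that value.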

Page 5

[VISTA plot: Cow vs SARS_HK]

Reference: SARS_HK
Conservation level: 60%
Highlighted segments: ORF1a, ORF1b, S, M, N

Page 6

[VISTA plots against SARS_HK: Human (229E), Pig, Turkey, Mouse, Cow, Bird]

Reference: SARS_HK
Conservation level: 50%
Highlighted segments: ORF1a, ORF1b, S, M, N

Page 7

Observations and Questions

The SARS coronavirus is not a recombinant between known coronaviruses (Marra et al., Rota et al., Ruan et al., 2003).

It probably evolved from an ancestor of the coronavirus family (in an unidentified host) for a long time before infecting humans in 2002.

Question 1: What is the host? Civet cat or cow?

Question 2: How did the SARS-CoV evolve? Did it acquire genes from its host, and/or exchange genetic information with other viruses?

Page 8

Bioinformatics Alone is not Enough

Searches of GenBank and genomic databases (with BLAST) indicated that there are no significant sequence matches to ORF1a or to the other ORFs in the last third of the genome.

This motivates us to consider the other ORFs X1-X7, which encode putative proteins and are located in the last third of the genome.

We conducted
a) BLAST searches
b) Analysis of nucleotide base differences among different strains of the SARS-CoV.

Our analysis shows clearly that they are unique to the SARS-CoV.

Page 9

BLAST Searches: No significant hits.

(Each line ends with the bit score and the E-value.)

X1:
gi|30794373|ref|NM_178804.2| Mus musculus slit homolog 2 (D... 44 0.35
gi|20830572|ref|XM_132035.1| Mus musculus slit homolog 2 (D... 44 0.35
gi|5532494|gb|AF144628.1|AF144628 Mus musculus SLIT2 (Slit2... 44 0.35
gi|26343252|dbj|AK053145.1| Mus musculus 0 day neonate lung... 44 0.35
gi|4151258|gb|AF074960.1|AF074960 Mus musculus neurogenic e... 44 0.35
gi|26084277|dbj|AK034918.1| Mus musculus 12 days embryo emb... 44 0.35

X2:
gi|4096106|gb|U23446.1|DMU23446 Drosophila melanogaster tri... 40 3.0
gi|21689975|emb|AL670472.7| Mouse DNA sequence from clone R... 40 3.0
gi|12743807|emb|AL365322.10| Mouse DNA sequence from clone ... 40 3.0

X3:
gi|21212249|emb|AL672015.6| Mouse DNA sequence from clone R... 46 0.019
gi|1065946|gb|U40800.1| Caenorhabditis elegans cosmid D2096... 42 0.29
gi|17539537|ref|NM_069020.1| Caenorhabditis elegans putativ... 42 0.29
gi|13625036|emb|AL390789.13| Human DNA sequence from clone ... 42 0.29
gi|16041563|gb|AC067844.6| Homo sapiens chromosome 8, clone... 40 1.2
gi|18855236|emb|AL645928.9| Mouse DNA sequence from clone R... 40 1.2
gi|30267397|gb|AY261744.1| Rhingia campestris 28S ribosomal... 38 4.6
gi|28372410|gb|AY228336.1| Homo sapiens Kell blood group (K... 38 4.6
gi|21844631|gb|AC122049.2| Mus musculus clone RP24-456N19, ... 38 4.6

X4:
gi|18464246|gb|AC104684.3| Homo sapiens BAC clone RP11-1N7 ... 44 0.15
gi|30682091|ref|NM_120859.2| Arabidopsis thaliana formin ho... 40 2.4
gi|23462974|gb|AC121780.3| Mus musculus chromosome 5 clone ... 40 2.4
gi|19482381|gb|AC023886.7| Homo sapiens BAC clone RP11-402J... 40 2.4
gi|19034012|gb|AC097520.2| Homo sapiens BAC clone RP11-562F... 40 2.4
gi|15668150|gb|AC096670.1| Homo sapiens BAC clone RP11-438K... 40 2.4
gi|28411192|emb|AL662879.4|HS279F22 Homo sapiens chromosome... 40 2.4
gi|16973056|emb|AL590082.9| Human DNA sequence from clone R... 40 2.4
gi|12657180|emb|AL390882.12| Human DNA sequence from clone ... 40 2.4

Page 10

BLAST Searches (Cont’d)

X5:
gi|8927595|gb|AC019026.12| Mus musculus chromosome 6 clone ... 42 0.40
gi|28830180|gb|AC115682.2| Dictyostelium discoideum chromos... 40 1.6
gi|21306527|gb|AC120337.4| Homo sapiens X BAC RP11-804N7 (R... 40 1.6
gi|24286735|gb|AY160107.1| Dictyostelium discoideum nucleot... 40 1.6
gi|18423447|ref|NM_124670.1| Arabidopsis thaliana pyruvate ... 38 6.3
gi|30522907|gb|AC124906.3| Equus caballus clone CH241-268I1... 38 6.3
gi|21403217|gb|AY084507.1| Arabidopsis thaliana clone 10991... 38 6.3
gi|28850462|gb|AF277315.6| Homo sapiens chromosome X clone ... 38 6.3
gi|27802686|gb|AY213194.1| Homo sapiens Cockayne syndrome 1... 38 6.3
gi|25189039|gb|AC116179.6| Mus musculus chromosome 10 clone... 38 6.3

X6:
gi|6513908|gb|AC005875.2|AC005875 citb_188_b_12, complete s... 40 1.9
gi|21111045|gb|AE012104.1| Xanthomonas campestris pv. campe... 38 7.4
gi|30142462|emb|AL954861.11| Zebrafish DNA sequence from cl... 38 7.4
gi|18642949|gb|AC104083.3| Homo sapiens BAC clone RP11-588K... 38 7.4
gi|14196413|gb|AC010881.8| Homo sapiens BAC clone RP11-289J... 38 7.4
gi|7839913|gb|AC016678.4|AC016678 Homo sapiens BAC clone RP... 38 7.4
gi|7740042|gb|AC005703.2|AC005703 Homo sapiens chromosome 1... 38 7.4
gi|26095929|dbj|AK053670.1| Mus musculus 0 day neonate eyeb... 38 7.4
gi|23337151|emb|AL606479.16| Mouse DNA sequence from clone ... 38 7.4

X7:
gi|22833002|gb|AE003426.2| Drosophila melanogaster chromoso... 40 1.3
gi|27676657|ref|XM_218459.1| Rattus norvegicus similar to h... 40 1.3
gi|27597031|gb|AC131788.3| Mus musculus chromosome 7 clone ... 40 1.3
gi|21397252|gb|AC104148.5| Drosophila melanogaster X BAC RP... 40 1.3
gi|18129378|gb|AC098575.6| Drosophila melanogaster X BAC RP... 40 1.3
gi|6634463|emb|AL117344.12|HSJ395C13 Human DNA sequence fro... 40 1.3
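One way to read the hit lists above is to parse the trailing score and E-value from each line and compare the E-value with a cutoff. The sketch below uses only a few hit lines copied from the slides. The cutoff of 1e-3 is an assumption for illustration (the slides do not state one), and `parse_hit` is a hypothetical helper, not part of any BLAST toolkit.

```python
# Assumed significance cutoff; the slides do not state the one that was used.
E_VALUE_CUTOFF = 1e-3

# A few one-line hit summaries copied from the slides (best hit per ORF shown).
HITS = {
    "X1": ["gi|30794373|ref|NM_178804.2| Mus musculus slit homolog 2 (D... 44 0.35"],
    "X3": [
        "gi|21212249|emb|AL672015.6| Mouse DNA sequence from clone R... 46 0.019",
        "gi|1065946|gb|U40800.1| Caenorhabditis elegans cosmid D2096... 42 0.29",
    ],
    "X4": ["gi|18464246|gb|AC104684.3| Homo sapiens BAC clone RP11-1N7 ... 44 0.15"],
}


def parse_hit(line):
    """Split a hit summary line into (description, bit_score, e_value)."""
    description, score, e_value = line.rsplit(None, 2)
    return description, float(score), float(e_value)


def significant_hits(hits, cutoff=E_VALUE_CUTOFF):
    """Return, per ORF, the parsed hits whose E-value is at or below the cutoff."""
    return {
        orf: [h for h in map(parse_hit, lines) if h[2] <= cutoff]
        for orf, lines in hits.items()
    }


for orf, found in significant_hits(HITS).items():
    print(orf, "significant hits:", len(found))
```

Even the lowest E-value listed on the slides (0.019, for X3) is above this assumed cutoff, which is in line with the slide's reading of "no significant hits".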

Page 11

Variations among the SARS-CoV strains: Nucleotide base differences

Vietnam_CDC     tagtggggtt cagtttcgtg ccattccgta ccct
Singapore_2774  tagcggggtt cagtctcgaa tcattttgta ctct
Toronto_02      tagcggggtt cagtctcgta ctatgctata ctct
HongKong        cagcaccgtt caagttcata ctattctgaa ttct
CUHK            tatcggggcc cagtcgtgtg ctactctgta ctcc
Beijing_01      tagcgggtct tcgtcgcgta ctgctctgtc cctt
Guangzhou_01    tagcgggtcc cagtcgcgta ctactctgta ctcc
Taiwan_01       tggcggggtt cagtctcgta ctattctgta ctct

Summary:
a) Differences occur in 34 positions;
b) There are 10 positions where the sequence variant appears in more than one sequence;
c) The base differences could be sequencing errors, mutational noise, or true mutational sites in the SARS genomes.
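The two counts in the summary can be checked directly from the eight rows shown on this slide. The sketch below treats a "variant" as any base other than the most common one in a column; that reading is an assumption, as are the helper names.

```python
from collections import Counter

# The variable columns as shown on the slide; spaces are removed before use.
STRAINS = {
    "Vietnam_CDC":    "tagtggggtt cagtttcgtg ccattccgta ccct",
    "Singapore_2774": "tagcggggtt cagtctcgaa tcattttgta ctct",
    "Toronto_02":     "tagcggggtt cagtctcgta ctatgctata ctct",
    "HongKong":       "cagcaccgtt caagttcata ctattctgaa ttct",
    "CUHK":           "tatcggggcc cagtcgtgtg ctactctgta ctcc",
    "Beijing_01":     "tagcgggtct tcgtcgcgta ctgctctgtc cctt",
    "Guangzhou_01":   "tagcgggtcc cagtcgcgta ctactctgta ctcc",
    "Taiwan_01":      "tggcggggtt cagtctcgta ctattctgta ctct",
}
SEQS = [s.replace(" ", "") for s in STRAINS.values()]


def column_counts(seqs):
    """Count the bases in each alignment column."""
    return [Counter(col) for col in zip(*seqs)]


def variable_positions(seqs):
    """1-based columns in which more than one base occurs."""
    return [i for i, c in enumerate(column_counts(seqs), start=1) if len(c) > 1]


def shared_variant_positions(seqs):
    """1-based columns where a non-majority base occurs in at least two sequences."""
    shared = []
    for i, counts in enumerate(column_counts(seqs), start=1):
        minority = counts.most_common()[1:]  # every base except the most common
        if any(n >= 2 for _, n in minority):
            shared.append(i)
    return shared


print(len(variable_positions(SEQS)), len(shared_variant_positions(SEQS)))
```

Columns whose variant is carried by a single strain are the ones most easily explained by sequencing error or mutational noise. Columns where the variant is shared by several strains are the ones that can group strains in a phylogeny.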

Page 12

Simple Statistics

The number of nucleotide base differences per strain (reference: their consensus sequence)

Strains      ORF1a  ORF1b  S     The rest
VietnamCDC   1      3      1     1
Singapore    0      3      1     0
Toronto      0      1      1     1
HongKong1    4      4      0     2
CUHK         3      3      1     1
Beijing      4      1      1     1
Guangzhou    3      1      1     1
Taiwan       1      0      0     0
Means        2      2      0.75  0.875
Deviations   1.69   1.55   0.46  0.64

All differences appear in 78 positions.
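The summary rows can be recomputed from the per-strain counts. The sketch below assumes that "Deviations" means the sample standard deviation (dividing by n-1), which the slide does not state. With the values as transcribed it reproduces the means and the ORF1a, S, and "rest" deviations, but gives about 1.41 rather than 1.55 for ORF1b, so that column may have been computed from a different value or formula.

```python
from statistics import mean, stdev

# Per-strain counts in table order: VietnamCDC, Singapore, Toronto, HongKong1,
# CUHK, Beijing, Guangzhou, Taiwan.
DIFFS = {
    "ORF1a":    [1, 0, 0, 4, 3, 4, 3, 1],
    "ORF1b":    [3, 3, 1, 4, 3, 1, 1, 0],
    "S":        [1, 1, 1, 0, 1, 1, 1, 0],
    "The rest": [1, 0, 1, 2, 1, 1, 1, 0],
}

# (mean, sample standard deviation rounded to 2 decimals) per region.
summary = {region: (mean(v), round(stdev(v), 2)) for region, v in DIFFS.items()}

for region, (m, sd) in summary.items():
    print(f"{region:9s} mean={m:<6} sd={sd}")
```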

Page 13

No Significant Differences in X’s

Total length is 2595 bps (8.65%). The expected differences in these regions are about 5 to 6.

Therefore, those X’s could be useful as markers for inferring the evolutionary history of the SARS virus.

ORF  Position  CDC  GIS  TOR  HK  CUHK  BJ  GZ  TW
X1   25298     g    g    a    g   g     g   g   g
X1   25569     t    t    t    a   t     t   t   t
X2   -
X3   27243     c    c    c    c   c     t   c   c
X4   -
X5   -
X6   -
X7   -

Page 14

Part 2: Phylogenetic analysis in tracking the step-by-step transmission

One unusual property of viruses is that they evolve rapidly. Such a property is an unfortunate one from the perspective of creating a vaccine.

However, their rapid evolution enables fine-scale phylogenetic analysis. For example, it has been used for
a) finding the origins of the HIV viruses (Gao et al.’99, Hahn et al.’00)
b) tracking man-to-man transmission of the HIV virus (Ou et al.’92)
c) deriving the transmission model of influenza (Fitch et al.’97, Bush et al.’99)

Page 15

Phylogenetic Analysis

A molecular phylogeny summarizes the genetic variations of the molecular sequences being studied.

(Marra et al., 2003)

The rationale behind disease tracking:

If A infects B and then B infects C, then the virus in C is more similar to the virus in B than to the virus in A.
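A toy illustration of this rationale follows. The sequences and mutation sites are hypothetical, and the sketch assumes each transmission step adds a new mutation at a fresh site (no back-mutation), which is the situation in which the statement holds.

```python
def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def mutate(seq, changes):
    """Return a copy of seq with the given {index: base} substitutions applied."""
    bases = list(seq)
    for index, base in changes.items():
        bases[index] = base
    return "".join(bases)


# Hypothetical chain A -> B -> C, one new substitution per transmission step.
virus_a = "acgtacgtacgt"
virus_b = mutate(virus_a, {2: "t"})   # change acquired between A and B
virus_c = mutate(virus_b, {7: "a"})   # further change acquired between B and C

print(hamming(virus_c, virus_b), hamming(virus_c, virus_a))
```

C differs from B only by the most recent change, but from A by both changes, so C sits closer to B.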

Page 16

Phylogenetic Analysis of 8 Strains

a) Compute pairwise alignments over the ORF1b region;

b) Find pairwise distances using the Jukes and Cantor model;

c) Draw the phylogeny using the neighbor-joining method.
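Steps b) and c) can be sketched as follows. The Jukes-Cantor distance is d = -(3/4) ln(1 - 4p/3), where p is the fraction of differing sites. The second function shows only the first neighbor-joining step, choosing the pair that minimizes Q(i, j) = (n-2) d(i, j) - sum_k d(i, k) - sum_k d(j, k). The toy distance matrix and all names are illustrative assumptions, not the study's data or code.

```python
import math
from itertools import combinations


def p_distance(a, b):
    """Fraction of differing sites between two aligned sequences, ignoring gaps."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def jukes_cantor(p):
    """Jukes-Cantor corrected distance for an observed difference fraction p."""
    if not 0 <= p < 0.75:
        raise ValueError("Jukes-Cantor distance is undefined for p >= 0.75")
    return -0.75 * math.log(1 - 4 * p / 3)


def nj_pair_to_join(names, dist):
    """Return the pair of taxa that neighbor joining would merge first."""
    n = len(names)
    total = {i: sum(dist[frozenset((i, k))] for k in names if k != i) for i in names}

    def q(i, j):
        return (n - 2) * dist[frozenset((i, j))] - total[i] - total[j]

    return min(combinations(names, 2), key=lambda pair: q(*pair))


# Toy distance matrix for five hypothetical taxa.
NAMES = ["a", "b", "c", "d", "e"]
RAW = {("a", "b"): 5, ("a", "c"): 9, ("a", "d"): 9, ("a", "e"): 8, ("b", "c"): 10,
       ("b", "d"): 10, ("b", "e"): 9, ("c", "d"): 8, ("c", "e"): 7, ("d", "e"): 3}
DIST = {frozenset(pair): d for pair, d in RAW.items()}

print(jukes_cantor(0.1), nj_pair_to_join(NAMES, DIST))
```

A full neighbor-joining run repeats the pair selection, replacing each merged pair by a new internal node with updated distances, until the tree is resolved.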

Page 17

Phylogenetic Analysis of 8 Strains (Cont’d)

The phylogeny was constructed over the S-E-M-N regions using the parsimony method.
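The slide does not say which parsimony program or variant was used. As one common instance, the sketch below scores a fixed rooted binary tree with Fitch's algorithm, counting the minimum number of substitutions per alignment column. The parsimony method then prefers the tree with the lowest total. The taxa, sequences, and trees are hypothetical.

```python
def fitch(tree, leaf_state):
    """Return (possible_states, substitutions) for one column on a binary tree.

    A tree is either a leaf name (str) or a pair (left_subtree, right_subtree).
    """
    if isinstance(tree, str):
        return {leaf_state[tree]}, 0
    left, right = tree
    left_states, left_cost = fitch(left, leaf_state)
    right_states, right_cost = fitch(right, leaf_state)
    common = left_states & right_states
    if common:
        return common, left_cost + right_cost
    # No shared state: one substitution is needed at this node.
    return left_states | right_states, left_cost + right_cost + 1


def parsimony_score(tree, alignment):
    """Sum the Fitch cost over all columns of an alignment {name: sequence}."""
    names = list(alignment)
    length = len(alignment[names[0]])
    return sum(
        fitch(tree, {name: alignment[name][i] for name in names})[1]
        for i in range(length)
    )


# Hypothetical four-taxon alignment and two candidate trees.
ALIGNMENT = {"A": "ac", "B": "ac", "C": "gc", "D": "gt"}
TREE_1 = (("A", "B"), ("C", "D"))
TREE_2 = (("A", "C"), ("B", "D"))

print(parsimony_score(TREE_1, ALIGNMENT), parsimony_score(TREE_2, ALIGNMENT))
```

Here the tree that groups A with B and C with D needs fewer substitutions, so parsimony would prefer it.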

Page 18

Phylogenetic Analysis of 8 Strains (Cont’d)

Our work indicates that it is possible to track the step-by-step transmission of SARS using molecular data.

However, a lot of work needs to be done in the future. For example, we need to study the effects of mutational noise on the phylogenetic analysis.

Page 19

Conclusion

a) Our genome-wide analysis indicates that the regions that encode unknown proteins in the last third of the SARS genome could be useful as markers for studying the virus.

b) Phylogenetic analysis can be used for tracking the step-by-step transmission of SARS.

The work was discussed with Louis Chen, Choi Kwok Pui, David Chew, Phil Long, P. Kolatkar, Vega Vinsensius, Lin Chin-Yo.

The work was done with Li Quan and Wu YongHui.