Introduction to Bioinformatics

Jan 15, 2016


adelio



LECTURE 6: Natural selection at the molecular basis

* Chapter 6: Fighting HIV


6.1 Acquired Immune Deficiency Syndrome (AIDS)

* First noticed in 1979 as a peculiar disease in the US

* Only in 1981 recognized as a transmissible disease: AIDS

* Infectious agent: HIV (human immunodeficiency virus)

* Still not curable; more than 20 million victims; expensive medication (e.g. AZT) keeps the virus in check

* How does HIV manage to evade our attempts to destroy it?


HIV is a retrovirus

A retrovirus is an enveloped virus that possesses an RNA genome and replicates via a DNA intermediate.

Retroviruses rely on the enzyme reverse transcriptase to transcribe their genome from RNA into DNA, which can then be integrated into the host's genome by the enzyme integrase.


Scanning electron micrograph of HIV-1 budding from a lymphocyte.


THE WORLD

Mark Newman (http://www-personal.umich.edu/~mejn/)


PEOPLE LIVING WITH HIV/AIDS

Mark Newman (http://www-personal.umich.edu/~mejn/)


6.2 Evolution and natural selection

1859: Charles Darwin, On the Origin of Species by Means of Natural Selection.

At the molecular level, natural selection:

* removes deleterious mutations: purifying or negative selection

* promotes the spread of advantageous mutations: positive selection


6.3 HIV and the human immune system

* HIV has a 9.5 kb RNA genome, no DNA

* HIV is a retrovirus: its RNA genome is reverse-transcribed into DNA

* HIV recognizes helper T-cells of the human immune system

* Infected T-cells have viral proteins sticking out that can be recognized by the immune system

* Short reproduction span: 1.5 days to reproduce

* High error rate when copying its RNA genome

Fast reproduction + high error rate = FAST EVOLUTION

Evolutionary arms race between the human immune system and HIV


6.4 Quantifying natural selection on DNA sequences

* Mutations arise in the germ line of one single individual and may eventually become fixed in the population

* We observe fixed mutations as differences between sequences from different species or populations

* Most fixed mutations are neutral: they are fixed by genetic drift

* Some 80-90% of the non-neutral mutations are detrimental to the organism's function

* A very small fraction of mutations is advantageous, but this is the engine of evolution


* How can we measure whether mutations are neutral, deleterious, or advantageous?

* Experimentally this is very difficult: it requires short-lived, simple organisms and large populations (typically a virus)

* Alternative: count the mutations that can change the protein and those that do not

* Synonymous and non-synonymous mutations


Remember the translation from nucleotides to amino acids

(genetic code figure: read from the centre outwards)
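The codon-to-amino-acid translation can be sketched in Python. This is a minimal sketch assuming the standard genetic code (NCBI translation table 1), written as a 64-letter string in TCAG order with `*` for stop codons; `CODON_TABLE` and `translate` are illustrative names, not taken from the lecture.

```python
from itertools import product

# Standard genetic code (NCBI table 1). Codons are enumerated with the bases
# in T, C, A, G order at each position: TTT, TTC, TTA, TTG, TCT, ...
# '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a DNA string, codon by codon, into one-letter amino acids."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


print(translate("ATGGTTTTATAA"))  # MVL*
```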


* Synonymous mutation: the new codon codes for the same amino acid, e.g. GTT (Val) → GTA (Val)

* Non-synonymous mutation: the new codon codes for a different amino acid

* Mutations in the first position are sometimes synonymous (about 5%)

* Mutations in the second position are never synonymous

* Mutations in the third position are mostly synonymous
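These position rules can be checked by brute force. The Python sketch below assumes the standard genetic code, tries every single-nucleotide change from every sense codon, and counts the synonymous ones per codon position; counting a change to a stop codon as non-synonymous is an assumption of this sketch.

```python
from itertools import product

# Standard genetic code (NCBI table 1) in TCAG order; '*' = stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
SENSE_CODONS = [codon for codon, aa in CODON_TABLE.items() if aa != "*"]


def synonymous_changes_by_position():
    """Count, per codon position, the single-nucleotide changes from sense
    codons that keep the amino acid (a change to a stop codon counts as
    non-synonymous). Returns (counts per position, changes per position)."""
    counts = [0, 0, 0]
    for codon in SENSE_CODONS:
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODON_TABLE[mutant] == CODON_TABLE[codon]:
                    counts[pos] += 1
    return counts, 3 * len(SENSE_CODONS)  # 3 alternative bases per position


counts, total = synonymous_changes_by_position()
for pos, n in enumerate(counts, start=1):
    print(f"position {pos}: {n}/{total} synonymous ({n / total:.1%})")
```

With the standard code this gives 8, 0 and 126 synonymous changes out of 183 per position (about 4%, 0% and 69%), consistent with the three bullets.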


* Almost all synonymous mutations are neutral.

* A priori, many more non-synonymous mutations are possible than synonymous ones.

* In most genes, about 70% of the possible mutations are non-synonymous

* KA: number of non-synonymous substitutions per non-synonymous site

* KS: number of synonymous substitutions per synonymous site


Motoo Kimura (1977):

Comparing the non-synonymous with the synonymous substitutions in a gene, i.e. the ratio KA / KS, tells us about the strength and form of natural selection.

Reasoning:

* Advantageous mutations are very rare

* Deleterious mutations will (almost) never spread through a population

* Therefore, most fixed mutations are neutral

Strong negative selection → few non-synonymous substitutions


* f0 = fraction of non-synonymous mutations that are neutral

* v = mutation rate (per site per unit time)

* Non-synonymous substitutions per non-synonymous site after time t: KA = v f0 t

* Synonymous substitutions per synonymous site after time t: KS = v t

* KA / KS = f0

* Strong negative selection: f0 is small, thus KA / KS < 1

* If KA / KS > 1, this is evidence for advantageous non-synonymous mutations
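The argument KA / KS = f0 can be illustrated with a toy simulation in Python. The Poisson mutation process, the parameter values and the name `simulate_ka_ks` are assumptions made for illustration: every synonymous mutation counts as a substitution, a non-synonymous one counts with probability f0 (it is neutral) and is otherwise removed by negative selection. Substitutions are counted directly, so no multiple-hit correction is needed.

```python
import random


def simulate_ka_ks(f0, v=1e-3, t=1000.0, n_syn=10_000, n_non=10_000, seed=0):
    """Toy model of the slide: returns (KA, KS) as substitutions per site.

    Mutations hit each class of sites as a Poisson process with rate v per
    site per unit time, over a period t. Synonymous mutations are all kept;
    non-synonymous mutations are kept with probability f0 (neutral) and
    otherwise removed by negative selection.
    """
    rng = random.Random(seed)

    def poisson_count(rate):
        # Events of a Poisson process in [0, t], from exponential waiting times.
        clock, events = rng.expovariate(rate), 0
        while clock <= t:
            events += 1
            clock += rng.expovariate(rate)
        return events

    syn_subs = poisson_count(v * n_syn)
    non_subs = sum(1 for _ in range(poisson_count(v * n_non)) if rng.random() < f0)
    return non_subs / n_non, syn_subs / n_syn


ka, ks = simulate_ka_ks(f0=0.2)
print(f"KA = {ka:.3f}, KS = {ks:.3f}, KA/KS = {ka / ks:.3f} (f0 = 0.2)")
```

Because the mutation rate v cancels in the ratio, the simulated KA/KS stays close to f0 whatever v and t are chosen.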


* Define: α = fraction of non-synonymous mutations that are advantageous

* Then after time t: KA = v(f0 + α)t

* and: KA / KS = f0 + α

* Thus KA / KS is a gauge of the natural selection acting on a gene

* negative selection dominates: KA / KS < 1

* positive selection dominates: KA / KS > 1

* But this is averaged over the whole gene!
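The warning that KA/KS is averaged over the gene can be made concrete with a small Python sketch. The two regions and all their counts are hypothetical: a short region under positive selection (KA/KS = 3) disappears from view when its sites and substitutions are pooled with a longer conserved region (KA/KS = 0.1). No multiple-hit correction is applied here.

```python
def ka_ks(non_sites, syn_sites, non_subs, syn_subs):
    """KA/KS from site and substitution counts (no multiple-hit correction)."""
    return (non_subs / non_sites) / (syn_subs / syn_sites)


# Hypothetical gene: (non-syn sites, syn sites, non-syn subs, syn subs).
conserved = (900, 300, 18, 60)  # KA = 0.02, KS = 0.2
exposed = (100, 100, 60, 20)    # KA = 0.6,  KS = 0.2
whole_gene = tuple(a + b for a, b in zip(conserved, exposed))

for name, region in [("conserved", conserved), ("exposed", exposed),
                     ("whole gene", whole_gene)]:
    print(f"{name:>10}: KA/KS = {ka_ks(*region):.2f}")
```

The pooled ratio is about 0.39, so a whole-gene analysis would report negative selection even though part of the gene is under strong positive selection.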


6.5 Estimating KA/KS

How do we determine KA/KS?

Simplest way: count the synonymous and non-synonymous sites, and likewise the synonymous and non-synonymous differences, between two aligned sequences

Correct for multiple substitutions (e.g. Jukes-Cantor)

This gives a normalized ratio
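The Jukes-Cantor correction converts an observed proportion p of differing sites into an estimated number of substitutions per site, K = -3/4 ln(1 - 4p/3). A minimal Python sketch; the function name is illustrative.

```python
import math


def jukes_cantor(p: float) -> float:
    """Estimated substitutions per site from the observed proportion p of
    differing sites: K = -3/4 * ln(1 - 4/3 * p)."""
    if not 0.0 <= p < 0.75:
        raise ValueError("the Jukes-Cantor correction needs 0 <= p < 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


for p in (0.01, 0.1, 0.3, 0.5):
    print(f"observed p = {p:.2f} -> corrected K = {jukes_cantor(p):.4f}")
```

The correction grows with p and is undefined for p ≥ 0.75, the expected proportion of differences between two unrelated random sequences.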


Based on this idea, the algorithm of Masatoshi Nei and Takashi Gojobori (1986) assumes that:

the rates of transitions and transversions are the same

there is no bias in codon usage (i.e. no information on the ensuing protein)


Nei-Gojobori algorithm

* Consider two aligned homologous sequences s1 and s2 without gaps

* Sc = #synonymous sites between s1 and s2

* Ac = #non-synonymous sites between s1 and s2

* Sd = #synonymous differences between s1 and s2

* Ad = #non-synonymous differences between s1 and s2


* As the two sequences s1 and s2 are aligned, there is a one-to-one correspondence between their codons.

NOTE: point mutations act on nucleotides, not on codons, but here we analyse whether a mutation results in a different amino acid
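Pairing the codons of two aligned sequences can be sketched as follows. This minimal Python sketch assumes ungapped, in-frame coding sequences of equal length; the function name `codon_pairs` is hypothetical.

```python
def codon_pairs(s1: str, s2: str):
    """Split two aligned, gap-free coding sequences into corresponding codons."""
    s1, s2 = s1.upper(), s2.upper()
    if len(s1) != len(s2):
        raise ValueError("aligned sequences must have equal length")
    if len(s1) % 3 != 0:
        raise ValueError("length must be a multiple of 3 (whole codons)")
    if "-" in s1 or "-" in s2:
        raise ValueError("this sketch assumes an alignment without gaps")
    # The k-th codon of s1 corresponds to the k-th codon of s2.
    return [(s1[i:i + 3], s2[i:i + 3]) for i in range(0, len(s1), 3)]


print(codon_pairs("GTTTTT", "GTAGTA"))  # [('GTT', 'GTA'), ('TTT', 'GTA')]
```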


STEP 1: Count A and S sites


Example:

Consider the alignment:

TTT
TTA

This is, say, the k-th codon of the two sequences.


Now define:

sc(ck) = #synonymous sites in this codon

ac(ck) = 3 - sc(ck) = #non-synonymous sites in this codon

fi: fraction of the possible changes at the i-th position of the codon that result in a synonymous change (i = 1, 2, 3)

Then:

sc(ck) = ∑ fi and: ac(ck) = 3 - sc(ck) = 3 - ∑ fi


In our example:

Codon: TTA codes for leucine

The 6 codons for leucine (table 2.2, chapter 2, p. 27):

CTA CTG CTC CTT TTA TTG

f1: ATA (-), GTA (-), CTA (+): 1 synonymous out of 3 changes, so f1 = 1/3

f2: TAA (-), TGA (-), TCA (-): 0 synonymous out of 3 changes, so f2 = 0/3

f3: TTG (+), TTC (-), TTT (-): 1 synonymous out of 3 changes, so f3 = 1/3

So:

sc(ck) = ∑ fi = 2/3

ac(ck) = 3 - sc(ck) = 3 - ∑ fi = 7/3
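Step 1 for a single codon can be sketched in Python, assuming the standard genetic code. As in the TTA example, each position has 3 possible changes and a change to a stop codon counts as non-synonymous; `Fraction` keeps the results exact so they can be compared with the values above. The function name `synonymous_sites` is illustrative.

```python
from fractions import Fraction
from itertools import product

# Standard genetic code (NCBI table 1) in TCAG order; '*' = stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def synonymous_sites(codon: str) -> Fraction:
    """sc(ck) = f1 + f2 + f3 for one codon, where fi is the fraction of the
    3 possible changes at position i that keep the amino acid. A change to a
    stop codon counts as non-synonymous, as in the TTA example."""
    codon = codon.upper()
    aa = CODON_TABLE[codon]
    if aa == "*":
        raise ValueError("stop codons are not included")
    sc = Fraction(0)
    for pos in range(3):
        synonymous = sum(
            CODON_TABLE[codon[:pos] + base + codon[pos + 1:]] == aa
            for base in BASES
            if base != codon[pos]
        )
        sc += Fraction(synonymous, 3)  # this is fi
    return sc


sc = synonymous_sites("TTA")
print(sc, 3 - sc)  # 2/3 7/3
```

Running it on TTA reproduces sc = 2/3 and ac = 7/3.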


For a DNA sequence of r codons:

$S_c = \sum_{k=1}^{r} s_c(c_k)$

$A_c = 3r - S_c$

For multiple sequences: average these quantities

Note: do not include the STOP codon


STEP 2: Count A and S differences


Now define:

sd(ck) = #synonymous differences in this codon

ad(ck) = n - sd(ck) = #non-synonymous differences in this codon, where n is the number of nucleotide differences between the two codons

Example:

sequence 1: GTT (Val)

sequence 2: GTA (Val)

There is only 1 difference and it is synonymous, so:

sd = 1 and ad = 0


Multiple nucleotide differences between two codons: if there are n differences between two codons (n = 0, 1, 2, 3), then there are n! pathways from the first to the second codon

Example:

sequence 1: TTT (Phe)

sequence 2: GTA (Val)

The two possible pathways are:

pathway 1: TTT (Phe) ↔ GTT (Val) ↔ GTA (Val)

pathway 2: TTT (Phe) ↔ TTA (Leu) ↔ GTA (Val)


Example (continued):

Pathway 1 has 1 non-synonymous and 1 synonymous substitution

Pathway 2 has 2 non-synonymous and 0 synonymous substitutions

Assume that both pathways occur with the same probability.

Therefore:

sd = 1 syn / 2 pathways = 0.5

ad = 3 non-syn / 2 pathways = 1.5
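The pathway averaging of Step 2 can be sketched in Python by enumerating the n! orders in which the differing positions can mutate. This sketch assumes the standard genetic code and gives every pathway equal weight; it does not treat pathways through stop codons specially, a case the slides do not discuss. The name `codon_differences` is illustrative.

```python
from fractions import Fraction
from itertools import permutations, product

# Standard genetic code (NCBI table 1) in TCAG order; '*' = stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codon_differences(c1: str, c2: str):
    """Return (sd, ad) for two codons: synonymous and non-synonymous
    differences averaged, with equal weights, over all n! pathways of single
    point mutations from c1 to c2 (n = number of differing positions)."""
    positions = [i for i in range(3) if c1[i] != c2[i]]
    paths = list(permutations(positions))  # identical codons: one empty path
    syn = non = 0
    for order in paths:
        current = c1
        for pos in order:
            nxt = current[:pos] + c2[pos] + current[pos + 1:]
            if CODON_TABLE[nxt] == CODON_TABLE[current]:
                syn += 1
            else:
                non += 1
            current = nxt
    return Fraction(syn, len(paths)), Fraction(non, len(paths))


print(*codon_differences("GTT", "GTA"))  # 1 0
print(*codon_differences("TTT", "GTA"))  # 1/2 3/2
```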


For a codon with n differences:

* Consider all n! pathways of n point mutations

* Evaluate sd and ad as above

* Average over all pathways with equal weights

* The total numbers of synonymous and non-synonymous differences are:

$S_d = \sum_{k=1}^{r} s_d(c_k)$

$A_d = \sum_{k=1}^{r} a_d(c_k)$

Note: Sd + Ad is the total number of differences between the two sequences


STEP 3: Compute KA and KS


* Approximate the proportions of synonymous (ds) and non-synonymous (da) differences by:

$\hat{d}_s = \dfrac{S_d}{S_c}$ and $\hat{d}_a = \dfrac{A_d}{A_c}$

* Use the Jukes-Cantor correction to find the number of substitutions:

$K = -\dfrac{3}{4} \ln\left(1 - \dfrac{4}{3}\hat{d}\right)$

for both ds and da, to obtain KS and KA.
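Putting Steps 1 to 3 together gives the compact Python sketch below of the Nei-Gojobori procedure as described in these slides (not a reference implementation). Assumptions: standard genetic code; changes to stop codons count as non-synonymous; codon pairs containing a stop codon are dropped; sites are averaged over the two sequences. The function names and example sequences are made up for illustration.

```python
import math
from itertools import permutations, product

# Standard genetic code (NCBI table 1) in TCAG order; '*' = stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def synonymous_sites(codon):
    """Step 1, one codon: sc = f1 + f2 + f3 (changes to stop = non-synonymous)."""
    aa = CODON_TABLE[codon]
    return sum(
        CODON_TABLE[codon[:i] + b + codon[i + 1:]] == aa
        for i in range(3)
        for b in BASES
        if b != codon[i]
    ) / 3


def codon_differences(c1, c2):
    """Step 2, one codon pair: (sd, ad) averaged over all n! pathways."""
    positions = [i for i in range(3) if c1[i] != c2[i]]
    paths = list(permutations(positions))  # one empty path if c1 == c2
    syn = non = 0
    for order in paths:
        current = c1
        for i in order:
            nxt = current[:i] + c2[i] + current[i + 1:]
            if CODON_TABLE[nxt] == CODON_TABLE[current]:
                syn += 1
            else:
                non += 1
            current = nxt
    return syn / len(paths), non / len(paths)


def jukes_cantor(p):
    """Step 3 correction: K = -3/4 ln(1 - 4p/3)."""
    if p >= 0.75:
        raise ValueError("Jukes-Cantor correction undefined for p >= 0.75")
    return -0.75 * math.log(1 - 4 * p / 3)


def ng_counts(s1, s2):
    """Return (Sc, Ac, Sd, Ad) for two aligned, gap-free coding sequences."""
    s1, s2 = s1.upper(), s2.upper()
    if len(s1) != len(s2) or len(s1) % 3 != 0:
        raise ValueError("need equal-length sequences of whole codons")
    pairs = [(s1[i:i + 3], s2[i:i + 3]) for i in range(0, len(s1), 3)]
    # Do not include stop codons: drop any codon pair that contains one.
    pairs = [(a, b) for a, b in pairs if "*" not in (CODON_TABLE[a], CODON_TABLE[b])]
    # Step 1: sites, averaged over the two sequences.
    sc = sum((synonymous_sites(a) + synonymous_sites(b)) / 2 for a, b in pairs)
    ac = 3 * len(pairs) - sc
    # Step 2: differences, summed over codons.
    sd = ad = 0.0
    for a, b in pairs:
        s, n = codon_differences(a, b)
        sd += s
        ad += n
    return sc, ac, sd, ad


def nei_gojobori(s1, s2):
    """Step 3: proportions ds = Sd/Sc and da = Ad/Ac, then Jukes-Cantor."""
    sc, ac, sd, ad = ng_counts(s1, s2)
    if sc == 0 or ac == 0:
        raise ValueError("no synonymous or non-synonymous sites to compare")
    return jukes_cantor(ad / ac), jukes_cantor(sd / sc)  # (KA, KS)


s1 = "ATGGCTAAACCCGGGTTTCATGAAACTTGG"
s2 = "ATGGCCAAACCCGGGTTTCAGGAAACTTGG"
ka, ks = nei_gojobori(s1, s2)
print(f"KA = {ka:.4f}, KS = {ks:.4f}, KA/KS = {ka / ks:.2f}")
```

In the example, GCT → GCC (Ala → Ala) is synonymous and CAT → CAG (His → Gln) is non-synonymous, so Sd = Ad = 1; every step is a single pass over the codons, in line with the linear running time noted in the summary.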


SUMMARY of the Nei-Gojobori algorithm:

see the box on page 105 of the book

Remark: the running time of the algorithm is linear in the length of the sequences


6.6 Case study: natural selection and the HIV genome

* HIV is a fast-evolving virus

* HIV is a different kind of virus: it has an RNA genome and no DNA

* A KA/KS analysis over a whole gene is not very informative, as it averages over positive and negative selection

* A sliding-window plot shows the evolutionary pressure on a smaller scale


* STEP 1: ORF finding
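ORF finding can be sketched as a scan of the three forward reading frames for an ATG followed in frame by a stop codon. This simplified Python sketch assumes the genome is given as a DNA string (T rather than U), ignores the reverse strand, and uses an arbitrary, hypothetical minimum length `min_codons`.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 100):
    """Return sorted (start, end, frame) tuples for forward-strand ORFs: an ATG
    followed in frame by a stop codon, at least min_codons long including the
    stop. Coordinates are 0-based, end exclusive."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG in frame opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # the in-frame stop closes the candidate
    return sorted(orfs)


print(find_orfs("CCATGAAATTTTAGCC", min_codons=2))  # [(2, 14, 2)]
```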


HIV-1 genome


* STEP 2: apply Nei-Gojobori in a sliding-window plot to find regions with high KA/KS ratios
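A sliding-window KA/KS can be sketched in Python by summing per-codon site and difference counts over each window and applying the Jukes-Cantor correction per window. The input format (one `(sc, ac, sd, ad)` tuple per codon, i.e. the Step 1 and Step 2 quantities before summing), the window of 30 codons and the step of 10 codons are assumptions, and the example data are synthetic.

```python
import math


def jukes_cantor(p):
    """K = -3/4 ln(1 - 4p/3), or None where the correction is undefined."""
    return None if p >= 0.75 else -0.75 * math.log(1 - 4 * p / 3)


def sliding_ka_ks(per_codon, window=30, step=10):
    """KA/KS in windows of `window` codons, moved along by `step` codons.

    per_codon holds one (sc, ac, sd, ad) tuple per codon. Returns a list of
    (first codon index, ratio); ratio is None when it cannot be computed.
    """
    results = []
    for start in range(0, len(per_codon) - window + 1, step):
        sc, ac, sd, ad = (sum(col) for col in zip(*per_codon[start:start + window]))
        ka = jukes_cantor(ad / ac) if ac > 0 else None
        ks = jukes_cantor(sd / sc) if sc > 0 else None
        # A KS of None or 0 (no synonymous change) leaves the ratio undefined.
        ratio = ka / ks if ka is not None and ks else None
        results.append((start, ratio))
    return results


# Synthetic data: 30 conserved codons followed by 30 codons with many
# non-synonymous differences (e.g. an exposed epitope region).
per_codon = [(1.0, 2.0, 0.1, 0.02)] * 30 + [(1.0, 2.0, 0.1, 0.5)] * 30
for start, ratio in sliding_ka_ks(per_codon):
    shown = "n/a" if ratio is None else f"{ratio:.2f}"
    print(f"window starting at codon {start}: KA/KS = {shown}")
```

Windows that overlap the synthetic high-ratio region rise above 1, which is the kind of local signal that a whole-gene average hides.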


HIV epitopes: the ENV gene

An epitope is the part of a macromolecule that is recognized by the immune system, specifically by antibodies.

ENV: envelope and docking; strong selection pressure from the human immune system


HIV epitopes: the GAG polyprotein

1500 bp: viral core

Strong selection pressure from the human immune system


Visualisation of the fast evolution of HIV with a phylogenetic tree


END of LECTURE 6