Learning how antibodies are drafted and revised Frederick “Erick” Matsen Fred Hutchinson Cancer Research Center @ematsen http://matsen.fredhutch.org/ with Trevor Bedford (FH), Connor McCoy, Vladimir Minin (UW), and Duncan Ralph (FH)
Transcript
Page 1: Learning how antibodies are drafted and revised

Learning how antibodies are drafted and revised

Frederick “Erick” Matsen

Fred Hutchinson Cancer Research Center

@ematsen http://matsen.fredhutch.org/

with Trevor Bedford (FH), Connor McCoy, Vladimir Minin (UW), and Duncan Ralph (FH)

Page 2: Learning how antibodies are drafted and revised
Page 3: Learning how antibodies are drafted and revised

Jenner’s 1796 vaccine

 

Where are we 200 years later?

Page 4: Learning how antibodies are drafted and revised

RV144 HIV trial: 2003-2009

26,676 volunteers enrolled

16,395 volunteers randomized

125 infections

$105,000,000 and 6 years

 

Prospective studies are expensive, slow, and entail complex moral issues. This does not lend itself to rapid vaccine development.

 

How might we guide vaccine development without disease exposure?

Page 5: Learning how antibodies are drafted and revised

Vaccines manipulate the adaptive immune system

 

 

What can we learn from antibody-making B cells without battle-testing them through disease exposure?

Page 6: Learning how antibodies are drafted and revised

Antibodies bind antigens

Antigen

Light chain

Heavy chain

Page 7: Learning how antibodies are drafted and revised

Too many antigens to code for directly

≈∞⋯ ∞∞

Page 8: Learning how antibodies are drafted and revised

B cell diversification process

[Figure: V genes, D genes, and J genes are combined by VDJ rearrangement (including erosion and nontemplated insertion) into a naive B cell; with antigen, somatic hypermutation and affinity maturation yield an experienced B cell.]

Page 9: Learning how antibodies are drafted and revised

What germline really looks like (Eichler and Breden groups)

Page 10: Learning how antibodies are drafted and revised

Big aim: reconstruct from memory reads

ACATGGCTC...

ATACGTTCC...

TTACGGTTC...

ATCCGGTAC...

ATACAGTCT...

...

reality

...inference

Page 11: Learning how antibodies are drafted and revised

Why reconstruct B cell lineages?

...

1. Vaccine design

This one is really good. How can we elicit it?

Page 12: Learning how antibodies are drafted and revised

Why reconstruct B cell lineages?

...

1. Vaccine design

immunogen 1

immunogen 2

Page 13: Learning how antibodies are drafted and revised

Why reconstruct B cell lineages?

...

1. Vaccine design

?

2. Vaccine assay

Page 14: Learning how antibodies are drafted and revised

Why reconstruct B cell lineages?

...

1. Vaccine design

3. Evolutionary analysis to learn about underlying mechanisms

2. Vaccine assay

Page 15: Learning how antibodies are drafted and revised

Goal 1: find rearrangement groups

ACATGGCTC...

ATACGTTCC...

TTACGGTTC...

ATCCGGTAC...

ATACAGTCT...

...

reality

...rearrangement groups

Page 16: Learning how antibodies are drafted and revised

VDJ annotation problem: from where did each nucleotide come?

[Figure: the biological process (3’ V deletion, VD insertion, 5’ D deletion, 3’ D deletion, DJ insertion, 5’ J deletion, somatic hypermutation) is followed by sequencing (sequencing primer, sequencing error), and then inference.]

 

 

This is a key first step in BCR sequence analysis.

Page 17: Learning how antibodies are drafted and revised

Data: Illumina reads from CDR3 locus

[Figure: the biological process (3’ V deletion, VD insertion, 5’ D deletion, 3’ D deletion, DJ insertion, 5’ J deletion, somatic hypermutation) is followed by sequencing (sequencing primer, sequencing error).]

Total of about 15 million unique 130 nt sequences from memory B cell populations of three healthy individuals A, B, and C.

Page 18: Learning how antibodies are drafted and revised

“Thread” reads onto structure

V genes D genes J genes

...

...

...

Page 19: Learning how antibodies are drafted and revised

HMM intro: dishonest casino

6 6

Page 20: Learning how antibodies are drafted and revised

HMM intro: dishonest casino

6 6

1-p

1-p

p

Page 21: Learning how antibodies are drafted and revised

HMM intro: dishonest casino

6 6

1-p

1-p

p

6 6

Page 22: Learning how antibodies are drafted and revised

HMM intro: dishonest casino

6 6

1-p

1-p

p

6 6

p1-p 1-p 1-p 1-p 1-p 1-p 1-p 1-p 1-p 1-p
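
The dishonest casino can be made concrete with a short sketch. The emission probabilities (a loaded die that shows 6 half the time), the uniform start distribution, and p = 0.1 are assumptions chosen for illustration; the slides only label the transitions p and 1-p. The Viterbi algorithm used here recovers the single most probable sequence of hidden dice.

```python
import math

# Assumed toy parameters (not from the slides): a fair die and a loaded die
# that shows 6 half the time; the casino switches dice with probability p.
STATES = ("fair", "loaded")


def emission(state, roll):
    if state == "fair":
        return 1.0 / 6.0
    return 0.5 if roll == 6 else 0.1


def viterbi(rolls, p=0.1):
    """Return the most probable sequence of hidden dice for the observed rolls."""
    def log_trans(a, b):
        return math.log(1.0 - p) if a == b else math.log(p)

    # best[s] = log-probability of the best path that ends in state s
    best = {s: math.log(0.5) + math.log(emission(s, rolls[0])) for s in STATES}
    backpointers = []
    for roll in rolls[1:]:
        new_best, pointers = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda r: best[r] + log_trans(r, s))
            new_best[s] = best[prev] + log_trans(prev, s) + math.log(emission(s, roll))
            pointers[s] = prev
        best = new_best
        backpointers.append(pointers)

    # Trace back from the best final state.
    state = max(STATES, key=lambda s: best[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]


rolls = [1, 3, 2, 5, 4, 2, 6, 6, 6, 6, 6, 6, 6, 6]
print(viterbi(rolls))
```

VDJ annotation has the same shape: the hidden states are germline positions and insertion states rather than dice, and the observations are nucleotides rather than rolls.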

Page 23: Learning how antibodies are drafted and revised

[Figure: HMM state diagram with many states; transitions labeled p and 1-p.]

Page 24: Learning how antibodies are drafted and revised

[Figure: HMM state diagram with many states; transitions labeled p and 1-p.]

Page 25: Learning how antibodies are drafted and revised

V genes D genes J genes

...

...

...

Page 26: Learning how antibodies are drafted and revised

V genes D genes J genes

...

...

...

Page 27: Learning how antibodies are drafted and revised

V genes D genes J genes

...

...

...

Page 28: Learning how antibodies are drafted and revised

Detour: write HMM inference package 

We wanted to use HMMoC by G. Lunter (Bioinformatics 2007), then tried extending StochHMM by Lott & Korf (Bioinformatics 2014), but it ended up being a complete rewrite by Duncan to make ham.

https://github.com/psathyrella/ham

Takes HMM description in concise & intuitive YAML format (for CpG example, 440 chars for ham vs. 5,961 for HMMoC XML)

Slightly faster and more memory efficient than HMMoC

Continuous integration via Docker

Then write BCR annotation package:

https://github.com/psathyrella/partis

Page 29: Learning how antibodies are drafted and revised

What are probabilities?

V genes D genes J genes

...

...

...

Page 30: Learning how antibodies are drafted and revised

Distributions are reproducibly weird!

[Figure: deletion-length distributions (x-axis: bases, 0 to 10; y-axis: frequency) for individuals A, B, and C. Panels: IGHV2-70*12 V 3' deletion; IGHD1-14*01 D 5' deletion; IGHD7-27*01 D 3' deletion; IGHJ4*02 J 5' deletion.]

Page 31: Learning how antibodies are drafted and revised

Distributions are reproducibly weird!

[Figure: per-position mutation frequency (x-axis: position, around 200 to 250; y-axis: mutation freq) for individuals A, B, and C. Panels: IGHV3-23D*01; IGHV3-33*06.]

Page 32: Learning how antibodies are drafted and revised

Only insertions look simple

[Figure: insertion-length distributions (x-axis: bases; y-axis: frequency) for individuals A, B, and C. Panels: VD insertion; DJ insertion.]

Page 33: Learning how antibodies are drafted and revised

Simulate sequences to benchmark 

 

[Figure: the biological process (3’ V deletion, VD insertion, 5’ D deletion, 3’ D deletion, DJ insertion, 5’ J deletion, somatic hypermutation) is followed by sequencing (sequencing primer, sequencing error), and then inference.]

 

Simulation code independent from inference code.
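
A minimal sketch of what such a simulator does, under loud assumptions: the germline strings below are hypothetical toy segments, deletion and insertion lengths are drawn uniformly (a placeholder, not the empirical distributions), and somatic hypermutation is modeled as uniform point substitution. It is not the simulation code used for the benchmark. The point is that the simulator records the true annotation, so inference output can be scored against it.

```python
import random

# Hypothetical toy germline segments; real IGH genes are much longer.
V_GENE = "CAGGTGCAGCTGGTGCAGTCTGG"
D_GENE = "GGTATAGCAGCAGCTGGTAC"
J_GENE = "ACTACTTTGACTACTGGGGCCAG"


def random_bases(rng, length):
    return "".join(rng.choice("ACGT") for _ in range(length))


def simulate_rearrangement(rng, max_deletion=4, max_insertion=5, mutation_rate=0.05):
    """Simulate one rearranged, mutated sequence and record how it was made."""
    truth = {
        "v_3p_del": rng.randint(0, max_deletion),
        "d_5p_del": rng.randint(0, max_deletion),
        "d_3p_del": rng.randint(0, max_deletion),
        "j_5p_del": rng.randint(0, max_deletion),
    }
    truth["vd_insertion"] = random_bases(rng, rng.randint(0, max_insertion))
    truth["dj_insertion"] = random_bases(rng, rng.randint(0, max_insertion))

    # Erode the segment ends and join them with nontemplated insertions.
    naive = (
        V_GENE[: len(V_GENE) - truth["v_3p_del"]]
        + truth["vd_insertion"]
        + D_GENE[truth["d_5p_del"] : len(D_GENE) - truth["d_3p_del"]]
        + truth["dj_insertion"]
        + J_GENE[truth["j_5p_del"] :]
    )
    truth["naive"] = naive

    # Somatic hypermutation, modeled here as uniform point substitutions.
    mutated = "".join(
        rng.choice([b for b in "ACGT" if b != base])
        if rng.random() < mutation_rate
        else base
        for base in naive
    )
    return mutated, truth


sequence, truth = simulate_rearrangement(random.Random(0))
print(sequence)
print(truth)
```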

Page 34: Learning how antibodies are drafted and revised

Incorporating this complexity is good

[Figure: HTTN distributions (x-axis: hamming distance, 0 to 15; y-axis: frequency) for partis (k=5), partis (k=1), ighutil, iHMMunealign, igblast, and imgt.]

but there are still a number of errors.

Page 35: Learning how antibodies are drafted and revised

Remember goal: find rearrangement groups

ACATGGCTC...

ATACGTTCC...

TTACGGTTC...

ATCCGGTAC...

ATACAGTCT...

...

reality

...rearrangement groups

Page 36: Learning how antibodies are drafted and revised

Say we are given two sequences

Double roll of a single die per turn

vs.

Two independent die rolling games

[Figure: HMM diagrams for the two cases, with transitions labeled p and 1-p.]

Page 37: Learning how antibodies are drafted and revised

Double roll ↔ Pair HMM

[Figure: HMM state diagram with many states; transitions labeled p and 1-p.]

Page 38: Learning how antibodies are drafted and revised

Do two sequences come from a single rearrangement event?

The forward algorithm for HMMs gives the probability of generating an observed sequence $x$ from a given HMM:

$$P(x) = \sum_{\text{paths } \sigma} P(x; \sigma)$$

The pair HMM gives the probability of generating two sequences $x$ and $y$ from the same path through the HMM (summed across paths):

$$P(x, y) = \sum_{\text{paths } \sigma} P(x, y; \sigma)$$
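
Both sums can be computed with the same forward recursion, sketched here on the dishonest casino rather than on the rearrangement HMM. The emission probabilities, the uniform start distribution, p = 0.1, and the requirement that the two sequences have equal length are all assumptions made for illustration. A single sequence is treated as a list of 1-tuples and a pair as a list of 2-tuples emitted from one shared hidden state.

```python
STATES = ("fair", "loaded")


def emission(state, roll):
    # Assumed toy emissions: the loaded die shows 6 half the time.
    if state == "fair":
        return 1.0 / 6.0
    return 0.5 if roll == 6 else 0.1


def forward(columns, p=0.1):
    """Sum over all hidden paths of the probability of the observed columns.

    Each column is a tuple of rolls emitted together from one hidden state.
    """
    def emit(state, column):
        prob = 1.0
        for roll in column:
            prob *= emission(state, roll)
        return prob

    alpha = {s: 0.5 * emit(s, columns[0]) for s in STATES}
    for column in columns[1:]:
        alpha = {
            s: emit(s, column)
            * sum(alpha[r] * ((1.0 - p) if r == s else p) for r in STATES)
            for s in STATES
        }
    return sum(alpha.values())


def prob_single(x, p=0.1):
    """P(x): one sequence generated by the HMM."""
    return forward([(roll,) for roll in x], p)


def prob_pair(x, y, p=0.1):
    """P(x, y): two equal-length sequences generated along the same path."""
    return forward(list(zip(x, y)), p)


x = [6, 6, 1, 6, 6, 6]
y = [6, 6, 6, 6, 2, 6]
print(prob_pair(x, y) / (prob_single(x) * prob_single(y)))
```

A ratio above 1 says the shared-path explanation is more probable than two independent games.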

Page 39: Learning how antibodies are drafted and revised

V genes D genes J genes

...

...

...

Page 40: Learning how antibodies are drafted and revised

Do sets of sequences come from a single rearrangement event?

 

$$\frac{P(A \cup B \mid \text{single rearrangement})}{P(A, B \mid \text{independent rearrangements})} = \frac{P(A \cup B)}{P(A)\,P(B)}$$

 

 

Use this for agglomerative clustering; stop when the ratio < 1.
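
The clustering rule can be sketched with a deliberately simple stand-in for the rearrangement HMM. Everything about the toy model is an assumption: each cluster shares one hidden die (fair or loaded, prior 1/2 each), and each sequence is a list of rolls from that die. Only the merge rule, merging the pair with the largest ratio P(A ∪ B) / (P(A) P(B)) and stopping once the best ratio is below 1, comes from the slide.

```python
import math
from itertools import combinations


def emission(state, roll):
    # Assumed toy emissions: the loaded die shows 6 half the time.
    if state == "fair":
        return 1.0 / 6.0
    return 0.5 if roll == 6 else 0.1


def log_prob_cluster(cluster):
    """log P(cluster), summing over the single hidden die shared by its members."""
    total = 0.0
    for state in ("fair", "loaded"):
        prob = 0.5
        for rolls in cluster:
            for roll in rolls:
                prob *= emission(state, roll)
        total += prob
    return math.log(total)


def log_ratio(a, b):
    """log of P(A ∪ B) / (P(A) P(B))."""
    return log_prob_cluster(a + b) - log_prob_cluster(a) - log_prob_cluster(b)


def agglomerate(sequences):
    """Greedily merge the best-scoring pair until the best ratio drops below 1."""
    clusters = [[seq] for seq in sequences]
    while len(clusters) > 1:
        (i, j), best = max(
            (
                ((i, j), log_ratio(clusters[i], clusters[j]))
                for i, j in combinations(range(len(clusters)), 2)
            ),
            key=lambda item: item[1],
        )
        if best < 0.0:  # ratio < 1 on the log scale
            break
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters


sequences = [
    [6, 6, 1, 6, 6, 6],
    [1, 2, 3, 4, 5, 1],
    [6, 6, 6, 6, 2, 6],
    [2, 3, 4, 5, 1, 2],
]
print(agglomerate(sequences))
```

Working on the log scale avoids underflow when clusters grow; the stopping threshold of 1 becomes 0.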

Page 41: Learning how antibodies are drafted and revised

Preliminary simulation

 

Integrate out annotation uncertainty and win.

Page 42: Learning how antibodies are drafted and revised

Goal 2: how are antibodies revised?

Page 43: Learning how antibodies are drafted and revised

First, investigate BCR mutation patterns

affinity maturation

antigen

naive B cell

experienced B cell

clonal expansion

somatic hypermutation

Page 44: Learning how antibodies are drafted and revised

Use two-taxon “trees” for model fitting

Note: we know the ancestral state within V, D, and J.

[Figure: a read aligned to its germline V, D, and J segments; four regions outside those segments are marked IGNORE.]

Our “trees” have an observed read on the bottom and the corresponding “ancestral” germline sequence on top, connected by a branch representing some amount of divergence.
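
For a two-taxon tree, the maximum likelihood branch length has a closed form under the Jukes-Cantor model. The sketch below uses Jukes-Cantor as a swapped-in simplification, not the General Time Reversible model fit in the talk, and the example sequences are hypothetical.

```python
import math


def jc69_branch_length(germline, read):
    """ML branch length (expected substitutions per site) under Jukes-Cantor."""
    if len(germline) != len(read):
        raise ValueError("sequences must be aligned to the same length")
    if not germline:
        raise ValueError("sequences must be non-empty")
    differences = sum(g != r for g, r in zip(germline, read))
    fraction = differences / len(germline)
    if fraction >= 0.75:
        return math.inf  # saturated: the estimate is unbounded
    return -0.75 * math.log(1.0 - 4.0 * fraction / 3.0)


# Hypothetical germline segment and observed read with one substitution.
germline = "CAGGTGCAGC"
read = "CAGGTGCTGC"
print(jc69_branch_length(germline, read))
```

The estimate is slightly larger than the raw fraction of differing sites because it corrects for multiple substitutions at the same site.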

Page 45: Learning how antibodies are drafted and revised

Model fit: General Time Reversible

[Figure: fitted substitution matrices shown as annotated heatmaps, one column per individual (A, B, C) and one row per segment (IGHV, IGHD, IGHJ); axes are germline base vs. read base (A, G, C, T), and each off-diagonal cell is labeled with its fitted value.]

Page 46: Learning how antibodies are drafted and revised

Best model according to AIC/BIC…

has different matrices and fixed rate multipliers for the different segments.

[Figure: mutation model. Seq. 1, Seq. 2, and Seq. 3 have branch lengths $t_1$, $t_2$, $t_3$ on V, scaled to $r_D t_i$ on D and $r_J t_i$ on J.]

Page 47: Learning how antibodies are drafted and revised

Branch length distribution under this best model

[Figure: histograms of ML branch length (x-axis, 0.0 to 0.4; y-axis: count) for individuals A, B, and C. Panel annotations: IGHD rate 3.36, IGHJ rate 0.62; IGHD rate 4.44, IGHJ rate 0.62; IGHD rate 3.88, IGHJ rate 0.63.]

D segments evolve substantially faster than V

J segments evolve more slowly than V

Individual A has a higher mutational load.

Page 48: Learning how antibodies are drafted and revised

Next consider selection (Goal 2 cont’d)

affinity maturation

antigen

naive B cell

experienced B cell

clonal expansion

somatic hypermutation

Page 49: Learning how antibodies are drafted and revised

 

AAC AAG

GTG GTC

more likely

less likely

In antibodies

Page 50: Learning how antibodies are drafted and revised

 

For selection

CCA (Pro) and CCT (Pro): synonymous

ACC (Thr) and ATC (Ile): nonsynonymous

In antibodies

AAC AAG

GTG GTC

more likely

less likely

Page 51: Learning how antibodies are drafted and revised

Would like per-site selection inference 

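$$\omega \equiv \frac{dN}{dS} \equiv \frac{\text{rate of nonsynonymous substitution}}{\text{rate of synonymous substitution}}$$

[Figure: per-position mutation frequency for IGHV3-23D*01 (x-axis: position, around 200 to 250; y-axis: mutation freq) for individuals A, B, and C.]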

Page 52: Learning how antibodies are drafted and revised

Productive vs. out-of-frame receptors 

Each cell may carry two IGH alleles, but only one is expressed.

 

V D J

V D J

insertion that disrupts frame

Page 53: Learning how antibodies are drafted and revised

$$\omega \equiv \frac{dN}{dS} \equiv \frac{\text{rate of nonsynonymous substitution}}{\text{rate of synonymous substitution}}$$

[Figure: out-of-frame $\lambda^S$ (log scale, 0.1 to 1.0) by site (IMGT numbering, 70 to 100) for individuals A, B, and C.]

 

 

Out-of-frame reads can be used to infer neutral mutation rate!

Page 54: Learning how antibodies are drafted and revised

$\omega_l$ is a ratio of rates in terms of the observed neutral process

$\lambda_l^{(N-I)}$: nonsynonymous in-frame rate for site $l$

$\lambda_l^{(N-O)}$: nonsynonymous out-of-frame rate for site $l$

$\lambda_l^{(S-I)}$: synonymous in-frame rate for site $l$

$\lambda_l^{(S-O)}$: synonymous out-of-frame rate for site $l$

$$\omega_l = \frac{\lambda_l^{(N-I)} / \lambda_l^{(N-O)}}{\lambda_l^{(S-I)} / \lambda_l^{(S-O)}}$$
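
As a worked instance of the ratio, the sketch below computes the per-site coefficient from four rate estimates. The numeric rates are hypothetical, and labeling a site from the point estimate alone is a simplification made here for illustration.

```python
def omega(n_in, n_out, s_in, s_out):
    """(nonsyn in-frame / nonsyn out-of-frame) / (syn in-frame / syn out-of-frame)."""
    return (n_in / n_out) / (s_in / s_out)


# Hypothetical rate estimates for one site (illustrative values only).
site_omega = omega(n_in=0.02, n_out=0.08, s_in=0.05, s_out=0.05)
if site_omega < 1:
    label = "purifying"
elif site_omega > 1:
    label = "diversifying"
else:
    label = "neutral"
print(site_omega, label)
```

Dividing by the out-of-frame rates removes the site-specific mutability that would otherwise be mistaken for selection. The ratio is sensitive to small denominators, so sites with few observed out-of-frame substitutions give noisy values.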

Page 55: Learning how antibodies are drafted and revised

Renaissance count (Lemey, Minin… 2012)

[Figure: codon alignment of seq-1 through seq-5 with amino acid translations, and a sampled mutation history on the tree for one codon site that switches between ACC and ATC.]

Use sampled mutation histories to estimate rates...

but such estimates can be unstable.

Page 56: Learning how antibodies are drafted and revised

Empirical Bayes regularization to stabilize estimates

Say we are doing a per-county smoking survey.

zero smokers? Really?

Use all of the data to fit prior distribution of smoking prevalence, then with given observations obtain per-county posterior.

Page 57: Learning how antibodies are drafted and revised

Estimating selection coefficient $\omega_l$

$\lambda_l^{(N-I)}$: nonsynonymous in-frame rate for site $l$

$\lambda_l^{(N-O)}$: nonsynonymous out-of-frame rate for site $l$

$\lambda_l^{(S-I)}$: synonymous in-frame rate for site $l$

$\lambda_l^{(S-O)}$: synonymous out-of-frame rate for site $l$

$$\omega_l = \frac{\lambda_l^{(N-I)} / \lambda_l^{(N-O)}}{\lambda_l^{(S-I)} / \lambda_l^{(S-O)}}$$

Page 58: Learning how antibodies are drafted and revised

Overall IGHV selection map

[Figure: Individual A, by site (IMGT numbering, 75 to 105). Top: distribution of median estimates of ω (log scale, 0.1 to 10.0). Bottom: distribution of classifications across IGHV genes (count, 0 to 200): purifying, neutral, diversifying.]

Page 59: Learning how antibodies are drafted and revised

Similar across individuals

[Figure: distribution of classifications (purifying, neutral, diversifying) as counts by site (IMGT numbering, 75 to 105), one panel each for individuals A, B, and C.]

Page 60: Learning how antibodies are drafted and revised

antigen

light chain

purifying

neutral

diversifying

Page 61: Learning how antibodies are drafted and revised
Page 62: Learning how antibodies are drafted and revised

Conclusion 

B cell receptors are “drafted” and “revised” randomly, but

… with remarkably consistent deletion and insertion patterns

… with remarkably consistent substitution and selection

 

We can learn about these processes using model-based inference.

 

Paper on annotation with partis is up on arXiv

Selection analysis paper

Page 63: Learning how antibodies are drafted and revised

Thank you

Trevor Bedford, Connor McCoy, Vladimir Minin & Duncan Ralph

Phil Bradley for doing structural work

Molecular work done by Paul Lindau in Phil Greenberg’s lab with support from Harlan Robins and Adaptive Biotechnologies

Adaptive Biotechnologies computational biology team

National Science Foundation and National Institutes of Health

University of Washington Center for AIDS Research (CFAR)

University of Washington eScience Institute

W. M. Keck Foundation

 

Page 64: Learning how antibodies are drafted and revised

Addenda

Page 65: Learning how antibodies are drafted and revised

Measuring clustering agreement

good agreement: [Figure: clusterings $C_x$ and $C_y$]

bad agreement: [Figure: clusterings $C_x$ and $C_y$]

Intuition: “how much variability is there in the color for $C_x$ amongst the items of a given color under $C_y$?”

Page 66: Learning how antibodies are drafted and revised

Mutual information I

Think of cluster identity under $C_x$ for a uniformly selected point as a random variable $X$ (similarly for $C_y$ and $Y$):

$$I(X; Y) = H(X) - H(X \mid Y)$$

where $H(X)$ is the entropy of $X$ (ignoring $Y$), and $H(X \mid Y)$ is the entropy of $X$ given the value for $Y$.

$$I(X; Y) = \sum_{y \in Y} \sum_{x \in X} p(x, y) \log\left(\frac{p(x, y)}{p(x)\,p(y)}\right)$$

$$\mathrm{AMI}(U, V) = \frac{\mathrm{MI}(U, V) - E\{\mathrm{MI}(U, V)\}}{\max\{H(U), H(V)\} - E\{\mathrm{MI}(U, V)\}}$$
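
A minimal sketch of the entropy and mutual information terms, computed from two label vectors over the same points. The label vectors are hypothetical. The chance-correction term E{MI} needed for AMI is not implemented here; scikit-learn's `adjusted_mutual_info_score` is one existing implementation, and its `average_method` parameter would need to be set to `"max"` to match the normalization written above.

```python
import math
from collections import Counter


def entropy(labels):
    """H(X) in nats for the empirical distribution of cluster labels."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(labels_x, labels_y):
    """I(X; Y) in nats from two clusterings of the same points."""
    if len(labels_x) != len(labels_y):
        raise ValueError("both clusterings must label the same points")
    n = len(labels_x)
    joint = Counter(zip(labels_x, labels_y))
    count_x = Counter(labels_x)
    count_y = Counter(labels_y)
    return sum(
        (c / n) * math.log((c / n) / ((count_x[x] / n) * (count_y[y] / n)))
        for (x, y), c in joint.items()
    )


# Hypothetical clusterings of six points.
c_x = [0, 0, 0, 1, 1, 1]
c_y = ["a", "a", "b", "b", "c", "c"]
print(mutual_information(c_x, c_y), entropy(c_x), entropy(c_y))
```

Two clusterings that agree up to relabeling give I(X; Y) = H(X) = H(Y); independent clusterings give 0.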

Page 67: Learning how antibodies are drafted and revised

Estimates of the mutational process are quite consistent between individuals

(each point is a single entry for one of the matrices for a pair of individuals.)

Page 68: Learning how antibodies are drafted and revised

Branch length differences between productive, unproductive

Unproductive rearrangements are more likely to be either: unchanged from germline, or more divergent.

Page 69: Learning how antibodies are drafted and revised

Sites are generally under purifying selection

[Figure: histograms of median log10(ω) (x-axis, −1 to 1; y-axis: count, 0 to 800) for individuals A, B, and C, colored by classification: purifying, diversifying, neutral.]

Page 70: Learning how antibodies are drafted and revised

Similar across individuals (ii)

Page 71: Learning how antibodies are drafted and revised

Distribution of amino acids

beginning of CDR3

selection for aromatic amino acids?

Frequency: left of line = out-of-frame, right of line = in-frame

Page 72: Learning how antibodies are drafted and revised

Stabilize with empirical Bayes regularization

Assume that $\lambda_l$, the substitution rate at site $l$, comes from a Gamma distribution with shape $\alpha$ and rate $\beta$:

$$\lambda_l \sim \mathrm{Gamma}(\alpha, \beta).$$

Model total substitution counts (sampled via stochastic mapping) for a site as Poisson with rate $\lambda_l$:

$$C_l \sim \mathrm{Poisson}(\lambda_l).$$

Fit $\hat\alpha$ and $\hat\beta$ to all data, then draw rates $\lambda_l$ from the posterior:

$$\lambda_l \mid C_l \sim \mathrm{Gamma}(C_l + \hat\alpha,\ 1 + \hat\beta).$$

We extended this regularization to the case of non-constant coverage.
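
The Gamma-Poisson update can be sketched in a few lines. The method-of-moments fit of the prior is an assumption made here for brevity (the slides do not say how the prior is fit), the counts are hypothetical, and constant coverage is assumed.

```python
import random
import statistics


def fit_gamma_prior(counts):
    """Method-of-moments fit of Gamma(shape, rate) to overdispersed counts.

    Uses the negative binomial marginal: mean = shape / rate and
    variance = mean + shape / rate**2.
    """
    mean = statistics.fmean(counts)
    variance = statistics.variance(counts)
    if variance <= mean:
        raise ValueError("counts are not overdispersed; moment fit is undefined")
    rate = mean / (variance - mean)
    shape = mean * rate
    return shape, rate


def posterior_parameters(count, shape, rate):
    """Posterior for lambda_l given C_l: Gamma(C_l + shape, 1 + rate)."""
    return count + shape, 1.0 + rate


def draw_rate(count, shape, rate, rng):
    post_shape, post_rate = posterior_parameters(count, shape, rate)
    # random.gammavariate takes a scale parameter, i.e. 1 / rate.
    return rng.gammavariate(post_shape, 1.0 / post_rate)


# Hypothetical per-site substitution counts.
counts = [0, 3, 1, 7, 0, 2, 12, 1, 0, 4]
shape, rate = fit_gamma_prior(counts)
rng = random.Random(0)
for count in (0, 12):
    post_shape, post_rate = posterior_parameters(count, shape, rate)
    print(count, post_shape / post_rate, draw_rate(count, shape, rate, rng))
```

A site with zero observed substitutions gets a small positive posterior rate instead of exactly zero, and a site with an extreme count is pulled toward the overall mean, which is the same shrinkage as in the per-county smoking survey.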

Page 73: Learning how antibodies are drafted and revised

Sequence counts

status          A           B           C
functional      4,139,983   4,861,800   3,748,306
out-of-frame    533,919     794,845     558,246
stop            104,525     169,423     112,901

Page 74: Learning how antibodies are drafted and revised

Correlation between sequence and GTR matrix

 

Each dot is a pair of genes.

Page 75: Learning how antibodies are drafted and revised

Simulation results for selection inference

[Figure: simulation results by site (0 to 100). Top: ω (log scale, 0.1 to 10.0), with points marked by whether a synonymous change is possible (yes, no) and by type (purifying, neutral, diversifying). Middle: proportion (0.00 to 1.00) by type (N, S). Bottom: coverage (0 to 1000).]

Page 76: Learning how antibodies are drafted and revised

Omega distribution

Page 77: Learning how antibodies are drafted and revised

Random facts

Mean length of D segment in individual A’s naive repertoire is 16.61.

Subject A’s naive sequences were 37% CDR3

Divergence between the various germ-line V genes:

> summary(dist.dna(allele_01, pairwise.deletion=TRUE, model='raw'))
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
0.003846 0.201300 0.344600 0.304700 0.384900 0.539500
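
The 'raw' distance is the proportion of sites that differ between two aligned sequences. A rough Python equivalent for a single pair is sketched below. Treating gaps, N, and ? as missing and skipping those sites pair by pair is an assumption about what pairwise deletion means here, and the sequences are hypothetical.

```python
def raw_distance(a, b, missing="-N?"):
    """Proportion of differing sites, skipping sites missing in either sequence."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in missing or y in missing:
            continue  # pairwise deletion: drop this site for this pair only
        compared += 1
        differing += x != y
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


# Hypothetical aligned germline fragments.
print(raw_distance("CAGGTGCAGCTG", "CAGATGC-GCTC"))
```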