Sep 30, 2020

Page 1: Analysis of adaptive immune receptors and repertoires and ...€¦ · Immune receptors are natural diagnostics and therapeutics • Immune cells use immune receptors on their surface

Analysis of adaptive immune receptors and repertoires and immuneML

Milena Pavlović
Department of Informatics
Department of Immunology

Immune receptors are natural diagnostics and therapeutics

• Immune cells use immune receptors on their surface to recognize antigens (e.g. parts of a virus or bacterium)

• An immune receptor consists of two sequences (chains) of amino acids (~110 amino acids long)

• CDR3: the most variable part of an immune receptor; 5-20 amino acids long

• Immune repertoires consist of all immune receptors in an individual: 10^9 unique receptors per individual with low overlap between individuals

• Immune receptors in a person are specific to diseases the person has or has had before

[Figure: an antigen with its epitope bound by a receptor. Receptor chain layout: variable region (FR1, CDR1, FR2, CDR2, FR3, CDR3 with N1-D-N2, FR4) followed by the constant region (CH1...); heavy chain and light chain.]

Learn the mechanism of receptor generation and antigen recognition

Use it to diagnose a disease or create artificial receptors

Problem formulation, or what to consider when building a diagnostic

Predict if a patient has a disease (multi-instance learning problem - only some receptors out of all receptors in the patient are disease-specific)

Predict if a receptor binds an antigen (describe antigen’s receptor sequence space)

Predict if a receptor sequence binds to antigens (use structural data from multiple antigens - are there shared rules?)

Predict antigens the receptor will bind to
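
The multi-instance framing of the first task can be sketched in a few lines. This is a toy illustration under stated assumptions, not immuneML's method: a repertoire is treated as a bag of CDR3 sequences, and the bag is called positive when at least `min_hits` of its receptors appear in a set of disease-associated sequences. The sequences, the `disease_associated` set and the `min_hits` threshold are all hypothetical.

```python
from typing import Iterable, Set


def count_hits(repertoire: Iterable[str], disease_associated: Set[str]) -> int:
    """Count distinct receptors in the repertoire (the bag) that are disease-associated."""
    return len(set(repertoire) & disease_associated)


def predict_disease(repertoire: Iterable[str], disease_associated: Set[str], min_hits: int = 1) -> bool:
    """Bag-level prediction: positive if enough instances (receptors) are disease-associated."""
    return count_hits(repertoire, disease_associated) >= min_hits


# Hypothetical CDR3 sequences, for illustration only.
disease_associated = {"CASSLGTDTQYF", "CASSPGQGNYGYTF"}
patient = ["CASSLGTDTQYF", "CASSIRSSYEQYF", "CASSLAPGATNEKLFF"]
control = ["CASSIRSSYEQYF", "CASSQDRGNTEAFF"]

print(predict_disease(patient, disease_associated))  # True: one receptor matches
print(predict_disease(control, disease_associated))  # False: no receptor matches
```

In practice the disease-associated set is not known in advance and has to be learned from labeled repertoires, which is what makes the problem hard.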

Disease status and antigen binding prediction

Challenges when building a diagnostic

• Receptor sequence space is low-dimensional (~15), yet receptors recognize many different antigens, so there are many dependencies within the sequence

• Low signal-to-noise ratio for any given disease: only a very low percentage of receptors is specific to the disease (and this percentage is disease-dependent)

• Many confounding factors: genetics (antigen presentation and sequence diversity) and age

⇒ How to represent the (set of) sequence(s) and model the problem?

⇒ How to account for confounders?

• Approaching the challenges: build a platform that allows experimentation with different models and data representations on large immune datasets
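
As a rough illustration of the first challenge, the arithmetic below compares the number of possible sequences with the size of one repertoire. It assumes that the "~15" dimensions are sequence positions, that each position can hold any of the 20 standard amino acids, and that a repertoire holds about 10^9 unique receptors; real repertoires are far from uniformly distributed, so this is only an order-of-magnitude sketch.

```python
AMINO_ACIDS = 20           # standard amino acid alphabet (assumption)
SEQUENCE_LENGTH = 15       # "~15" dimensions, read here as sequence positions (assumption)
REPERTOIRE_SIZE = 10**9    # approximate number of unique receptors per individual

possible_sequences = AMINO_ACIDS ** SEQUENCE_LENGTH
fraction_covered = REPERTOIRE_SIZE / possible_sequences

print(f"possible sequences: {possible_sequences:.2e}")                 # 3.28e+19
print(f"fraction covered by one repertoire: {fraction_covered:.1e}")   # 3.1e-11
```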

immuneML

Immune repertoire, paired and single chain receptor data

• work with immune repertoires for disease status prediction and confounder analysis

• paired chain and single chain receptor data for antigen binding prediction

Simulation, import and preprocessing

• simulation of antigen-specific sequences and diseased repertoires

• import from Adaptive Biotechnologies, MiXCR, VDJdb, AIRR-compliant formats

• preprocess immune data

Comprehensive machine learning, data representation and feature recovery

• nested cross-validation for hyperparameter optimization

• deep learning

• clustering and similarity metrics

• multi-label classification

• encodings and embeddings

• feature recovery

Scalable and modular platform on cloud infrastructure

• public instance available at immuneml.org

immuneML facilitates machine learning applications in the immune receptor domain

• immuneML provides:

• Multiple predefined workflows (hyperparameter optimization, fit / use trained ML method, data splitting and simulation, exploratory analysis on data)

• ML methods for classification (disease status and antigen binding prediction)

• Different data representations

• Integration with popular tools for immune data analysis through the Galaxy framework (iReceptor, MiXCR, immuneSIM, ImmunoProbs)

[Figure: Hyperparameter optimization workflow.
Scenario 1: nested cross-validation with defined report points (inner cross-validation loop: model selection; outer cross-validation loop: model assessment).
Scenario 2: repeated holdout with random splits (Monte Carlo cross-validation) (inner holdout loop: model selection; outer repeated holdout loop: model assessment).
In both panels, models M1-M3 with different hyperparameters are evaluated on the validation (val) splits, the optimal models are trained and scored on the test splits (performance 1-5), and the scores are combined into the task performance.
Report points: data_reports, data_split_reports, model_reports, optimal_model_reports, performance_reports.]
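
The nested cross-validation scheme of scenario 1 can be sketched with the standard library alone. This is a minimal sketch, not immuneML's implementation: the 1-D k-nearest-neighbours classifier, its `n_neighbors` hyperparameter, the synthetic data and all function names are assumptions chosen to keep the example self-contained. The inner loop picks a hyperparameter using only the outer training part; the outer loop scores the selected model on data it has never seen.

```python
import random
from statistics import mean


def k_fold_indices(n, k, rng):
    """Shuffle 0..n-1 and return k (train, test) pairs of index lists."""
    idx = list(range(n))
    rng.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    return [
        ([j for m, fold in enumerate(folds) if m != i for j in fold], folds[i])
        for i in range(k)
    ]


def knn_predict(train_x, train_y, x, n_neighbors):
    """Toy 1-D k-nearest-neighbours classifier for binary labels (stand-in for any ML method)."""
    nearest = sorted(range(len(train_x)), key=lambda j: abs(train_x[j] - x))[:n_neighbors]
    return int(2 * sum(train_y[j] for j in nearest) > len(nearest))


def accuracy(xs, ys, train, test, n_neighbors):
    """Fit on the train indices and return the accuracy on the test indices."""
    train_x = [xs[j] for j in train]
    train_y = [ys[j] for j in train]
    return mean(int(knn_predict(train_x, train_y, xs[j], n_neighbors) == ys[j]) for j in test)


def nested_cv(xs, ys, candidates, outer_k=5, inner_k=5, seed=0):
    rng = random.Random(seed)
    outer_scores = []
    for train, test in k_fold_indices(len(xs), outer_k, rng):
        # Inner loop (model selection): splits are drawn from the outer training part only.
        inner_splits = [
            ([train[j] for j in tr], [train[j] for j in val])
            for tr, val in k_fold_indices(len(train), inner_k, rng)
        ]

        def inner_score(n_neighbors):
            return mean(accuracy(xs, ys, tr, val, n_neighbors) for tr, val in inner_splits)

        best = max(candidates, key=inner_score)
        # Outer loop (model assessment): refit the selected model, score on the held-out test fold.
        outer_scores.append(accuracy(xs, ys, train, test, best))
    return outer_scores


# Synthetic two-class data, for illustration only.
data_rng = random.Random(1)
ys = [i % 2 for i in range(50)]
xs = [data_rng.gauss(y, 0.5) for y in ys]

scores = nested_cv(xs, ys, candidates=[1, 3, 5])
print("performance per outer fold:", scores)
print("task performance:", mean(scores))
```

Replacing the outer call to `k_fold_indices` with repeated random train/test splits would give the repeated holdout (Monte Carlo) variant of scenario 2.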

immuneML facilitates machine learning applications in the immune receptor domain

• immuneML is available as:

• A web tool through integration with Galaxy

• A command line tool with a domain-specific language for analysis definition

• A Python library

definitions:
  datasets:
    d1:
      metadata: "./metadata.csv"
      format: MiXCR
  encodings:
    e1:
      type: KmerFrequency
      params: { k: 3 }
    e2:
      type: Word2Vec
      params: { vector_size: 16, context: sequence }
  ml_methods:
    log_reg1:
      type: LogisticRegression
      params: { C: 0.001 }
  reports:
    r1: { type: SequenceLengthDistribution }
  preprocessing_sequences:
    seq1:
      - filter_chain_B:
          type: DatasetChainFilter
          params: { keep_chain: A }
instructions:
  HPOptimization:
    settings:
      - preprocessing: seq1
        encoding: e1
        ml_method: log_reg1
      - preprocessing: []
        encoding: e2
        ml_method: log_reg1
    assessment:
      split_strategy: random
      split_count: 1
      training_percentage: 70
    selection:
      split_strategy: k-fold
      split_count: 5
      reports:
        data_splits: [r1]
    labels: [CD]
    dataset: d1
    strategy: GridSearch
    metrics: [accuracy, f1_micro]
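
To give a feel for what an encoding such as `KmerFrequency` with `k: 3` represents, the sketch below computes relative frequencies of overlapping 3-mers in a single amino acid sequence. It is a simplified illustration and not immuneML's encoder; the function name `kmer_frequencies` and the example sequence are hypothetical. For a whole repertoire, one would presumably pool the counts over all of its sequences before normalizing.

```python
from collections import Counter
from typing import Dict


def kmer_frequencies(sequence: str, k: int = 3) -> Dict[str, float]:
    """Relative frequency of each overlapping k-mer in one amino acid sequence."""
    kmers = [sequence[i:i + k] for i in range(len(sequence) - k + 1)]
    if not kmers:  # sequence shorter than k: no k-mers to count
        return {}
    counts = Counter(kmers)
    return {kmer: n / len(kmers) for kmer, n in counts.items()}


# Hypothetical sequence, for illustration only.
features = kmer_frequencies("CASSASSF", k=3)
print(features)  # "ASS" occurs twice among the six 3-mers
```

Each sequence becomes a sparse vector indexed by k-mer, which a method such as logistic regression can consume directly.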

image credit: https://usegalaxy.org/

immuneML applications

• To show immuneML’s capabilities, we demonstrate:

• Replication of one of the largest studies for disease state prediction

• Antigen binding prediction from paired chain data

• Benchmarking results with simulated datasets and overview of models and representations

Emerson et al 2017

Use-case: predicting T1D status with immuneML

• Type 1 diabetes (T1D) status prediction from 1600 samples (456 T1D patients, 762 first-degree relatives, 76 second-degree relatives and 224 controls)

• Age and genetic background are the most significant confounding factors

• Goal:

• Build a classifier to predict the disease state

• Recover distinctive features on which the classification is based

Brusko lab, University of Florida

Acknowledgements

Lonneke Scheffer
Andrei Slabodkin