
Applicability Domain - Towards a More Formal Definition ...

Mar 26, 2022

Page 1: Applicability Domain - Towards a More Formal Definition ...

Applicability Domain

Towards a more formal definition

Thierry Hanser

Research Leader

[email protected]

Page 7

Can we trust a specific individual prediction?

Screening (x 1,000,000): global model accuracy estimate (83% accuracy)

Risk assessment (x 1): individual prediction accuracy estimate
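A short binomial calculation shows why a global accuracy figure suits screening but not risk assessment. It assumes, purely for illustration, that every prediction is independently correct with probability 0.83; `correct_call_stats` is a hypothetical helper, not part of any QSAR package.

```python
import math


def correct_call_stats(n_predictions: int, accuracy: float) -> tuple[float, float]:
    """Mean and standard deviation of the number of correct calls,
    treating each prediction as an independent Bernoulli trial (an assumption)."""
    mean = n_predictions * accuracy
    std = math.sqrt(n_predictions * accuracy * (1.0 - accuracy))
    return mean, std


# Screening: one million predictions at 83% global accuracy
mean, std = correct_call_stats(1_000_000, 0.83)
print(f"screening: {mean:.0f} +/- {std:.0f} correct calls")  # screening: 830000 +/- 376 correct calls

# Risk assessment: a single prediction is either right or wrong
mean, std = correct_call_stats(1, 0.83)
print(f"single compound: mean {mean:.2f}, std {std:.2f}")  # single compound: mean 0.83, std 0.38
```

Over a million predictions the spread is about 376 calls (under 0.04% of the screen), so 83% is a dependable planning figure. For one compound the same number only says the call is wrong 17% of the time, not whether this particular call is one of those.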

Page 8

Articulation of the method

• Applicability domain is not a monolithic concept: there are three key layers

• Separation of concerns can help clarify and formalise the notion of AD

• Purpose: initiate a constructive discussion within our QSAR community to build a common understanding together

• Harmonize the way we define and present AD to end users across models and applications

• Remove confusion for the end user and improve the value of our AD models

Page 9

Current understanding and definitions

QSAR principles¹

• A defined endpoint

• An unambiguous algorithm

• A defined domain of applicability

• Appropriate measures of goodness-of-fit, robustness and predictivity

• A mechanistic interpretation, if possible

Common definition²

“AD is the response and chemical structure space in which the model makes predictions with a given reliability.”

¹ Setubal workshop report: Jaworska, J. S.; Comber, M.; Auer, C.; Van Leeuwen, C. Environ. Health Perspect. 2003, 111, 1358–1360

² Guidance Document on the Validation of (Quantitative) Structure-Activity Relationship QSAR Models; OECD Series on Testing and Assessment No. 69; OECD Environment Directorate, Environment, Health and Safety Division: Paris, 2007

Page 10

Current understanding and definitions

The common definition² and the QSAR principles¹, annotated with the concepts they involve:

Boundaries (applicability)

Reliability

Likelihood?

Page 11

A good foundation to build on

Mathea M, Klingspohn W, Baumann K. Chemoinformatic Classification Methods and their Applicability Domain. Mol Inf. 2016 May 1;35(5):160–80.

Gadaleta D, Mangiatordi GF, Catto M, Carotti A, Nicolotti O. Applicability Domain for QSAR Models: Where Theory Meets Reality. International Journal of Quantitative Structure-Property Relationships. 2016 Jan;1(1):45–63.

Norinder U, Rybacka A, Andersson PL. Conformal prediction to define applicability domain – A case study on predicting ER and AR binding. SAR and QSAR in Environmental Research. 2016 Apr 2;27(4):303–16.

Toccaceli P, Nouretdinov I, Gammerman A. Conformal Predictors for Compound Activity Prediction. arXiv:1603.04506 [cs] [Internet]. 2016 Mar 14 [cited 2016 May 11]; Available from: http://arxiv.org/abs/1603.04506

Roy K, Kar S, Ambure P. On a simple approach for determining applicability domain of QSAR models. Chemometrics and Intelligent Laboratory Systems. 2015 Jul 15;145:22–9.

Sheridan RP. The Relative Importance of Domain Applicability Metrics for Estimating Prediction Errors in QSAR Varies with Training Set Diversity. J Chem Inf Model. 2015 Jun 22;55(6):1098–107.

Cherkasov A, Muratov EN, Fourches D, Varnek A, Baskin II, Cronin M, et al. QSAR Modeling: Where have you been? Where are you going to? J Med Chem. 2014 Jun 26;57(12):4977–5010.

Carrió P, Pinto M, Ecker G, Sanz F, Pastor M. Applicability Domain Analysis (ADAN): A Robust Method for Assessing the Reliability of Drug Property Predictions. J Chem Inf Model. 2014 May 27;54(5):1500–11.

Toplak M, Močnik R, Polajnar M, Bosnić Z, Carlsson L, Hasselgren C, et al. Assessment of Machine Learning Reliability Methods for Quantifying the Applicability Domain of QSAR Regression Models. J Chem Inf Model. 2014 Feb 24;54(2):431–41.

Dragos H, Gilles M, Alexandre V. Predicting the Predictability: A Unified Approach to the Applicability Domain Problem of QSAR Models. J Chem Inf Model. 2009 Jul 27;49(7):1762–76.

And many more...

Page 12

Current common methods

Molecule classes

• Organic / organometallic / inorganic

• Class of molecules (e.g. aromatic amines)

Feature representation

• Unseen features

Agreement-based

• RF consensus

• kNN

Descriptor ranges

• Box

• Convex hull

Distance-based methods

• Distance to data points

• Density

Response domain

Page 13

Current common methods: unseen features

Figure: example query compound; no boronic acids in the training set
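The unseen-features check amounts to a set difference between the query's structural features and those present in the training set. The sketch below assumes features are plain string labels (hypothetical names such as `boronic_acid`); a real system would use its own fragment or fingerprint keys.

```python
def unseen_features(query_features: set[str], training_features: set[str]) -> set[str]:
    """Structural features of the query that never occur in the training set."""
    return query_features - training_features


# Hypothetical fragment labels; a real model would use its own feature keys
training_features = {"aromatic_amine", "hydroxyl", "thioether", "secondary_amine"}
query_features = {"boronic_acid", "hydroxyl", "secondary_amine"}

missing = unseen_features(query_features, training_features)
print(sorted(missing), "in domain" if not missing else "out of applicability domain")
# ['boronic_acid'] out of applicability domain
```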

Page 14

Current common methods: agreement-based (kNN, RF consensus)

Figure: nearest neighbours → likelihood; random forest → likelihood
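An agreement-based measure can be sketched with k nearest neighbours: the fraction of the k closest training compounds sharing the majority label acts as a simple likelihood. The two-descriptor data, labels, Euclidean distance and `k=3` are illustrative assumptions; RF consensus follows the same idea with tree votes in place of neighbours.

```python
import math
from collections import Counter


def knn_agreement(query, training, k=3):
    """Majority label among the k nearest training compounds and the fraction that agree with it.

    training is a list of (descriptor_vector, label) pairs; distance is Euclidean."""
    nearest = sorted(training, key=lambda item: math.dist(query, item[0]))[:k]
    label, count = Counter(lbl for _, lbl in nearest).most_common(1)[0]
    return label, count / len(nearest)


# Hypothetical two-descriptor training set
training = [
    ((0.0, 0.0), "inactive"), ((0.1, 0.2), "inactive"), ((0.2, 0.1), "active"),
    ((1.0, 1.0), "active"), ((1.1, 0.9), "active"), ((0.9, 1.1), "active"),
]
print(knn_agreement((1.0, 0.95), training))  # ('active', 1.0)
print(knn_agreement((0.1, 0.1), training))   # majority 'inactive', but only 2 of 3 neighbours agree
```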

Page 15

Current common methods: descriptor ranges (box)

Figure: box
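The box method keeps, for each descriptor, the minimum and maximum seen in the training set and accepts a query only if every descriptor falls inside its range. A minimal sketch with hypothetical descriptor values:

```python
def fit_box(training_descriptors):
    """Per-descriptor (min, max) ranges over the training set."""
    return [(min(column), max(column)) for column in zip(*training_descriptors)]


def in_box(query, box):
    """True if every descriptor of the query lies within its training range."""
    return all(lo <= value <= hi for value, (lo, hi) in zip(query, box))


# Hypothetical training set with two descriptors per compound
training = [(1.2, 2.5), (3.4, 3.1), (2.0, 4.0), (0.5, 1.8)]
box = fit_box(training)
print(box)                      # [(0.5, 3.4), (1.8, 4.0)]
print(in_box((2.5, 3.0), box))  # True
print(in_box((4.1, 3.0), box))  # False
```

A box is cheap to compute but also accepts the empty corners of descriptor space that appear when descriptors are correlated, which is what a convex hull tightens.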

Page 16

Current common methods: descriptor ranges (convex hull)

Figure: convex hull
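A convex-hull check is stricter than the box: the query must lie inside the smallest convex region containing the training points. To stay within the standard library, the sketch below is restricted to two descriptors and uses Andrew's monotone chain algorithm plus a point-in-convex-polygon test; in higher dimensions one would typically use a linear-programming feasibility test or a Delaunay triangulation (e.g. `scipy.spatial.Delaunay(...).find_simplex`, which returns -1 outside) instead. The training points are hypothetical.

```python
def cross(o, a, b):
    """z-component of (a - o) x (b - o); positive when o -> a -> b turns counter-clockwise."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain: hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]  # drop the duplicated end points


def in_hull(query, hull):
    """True if the query lies inside or on a counter-clockwise convex polygon."""
    n = len(hull)
    return all(cross(hull[i], hull[(i + 1) % n], query) >= 0 for i in range(n))


# Hypothetical two-descriptor training set
training = [(0, 0), (4, 0), (4, 4), (0, 4), (2, 2), (1, 3)]
hull = convex_hull(training)
print(hull)                   # [(0, 0), (4, 0), (4, 4), (0, 4)]
print(in_hull((2, 1), hull))  # True
print(in_hull((5, 2), hull))  # False
```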

Page 17

Current common methods: distance-based (distance to data points)

Figure: distance to data points
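Distance to data points can be sketched as the mean distance from the query to its k nearest training compounds, compared with a cut-off derived from the training set itself. The cut-off here is assumed to be the mean plus one standard deviation of the leave-one-out k-NN distances, one common heuristic among many; Euclidean distance, `k=3` and the data are also assumptions.

```python
import math


def mean_knn_distance(query, training, k=3):
    """Mean Euclidean distance from the query to its k nearest training points."""
    distances = sorted(math.dist(query, x) for x in training)
    nearest = distances[:k]
    return sum(nearest) / len(nearest)


def distance_threshold(training, k=3, z=1.0):
    """Assumed cut-off: mean + z * std of the leave-one-out k-NN distances of the training set."""
    d = [mean_knn_distance(x, training[:i] + training[i + 1:], k) for i, x in enumerate(training)]
    mean = sum(d) / len(d)
    std = math.sqrt(sum((v - mean) ** 2 for v in d) / len(d))
    return mean + z * std


# Hypothetical two-descriptor training set
training = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5)]
threshold = distance_threshold(training)
for query in [(0.5, 0.4), (3.0, 3.0)]:
    d = mean_knn_distance(query, training)
    print(query, round(d, 3), "in domain" if d <= threshold else "out of domain")
```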

Page 18

Current common methods: distance-based (density)

Figure: descriptor density (density of data)
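Density-based methods ask how much training data surrounds the query rather than how far away the nearest point is. A minimal sketch with a Gaussian kernel density estimate, assuming Euclidean descriptors, a fixed bandwidth of 0.5, made-up data and a hypothetical threshold equal to the lowest density observed at any training compound:

```python
import math


def gaussian_kde(query, training, bandwidth=0.5):
    """Training-set density at the query, using an isotropic Gaussian kernel."""
    dim = len(query)
    norm = (2 * math.pi * bandwidth ** 2) ** (dim / 2)
    total = sum(math.exp(-math.dist(query, x) ** 2 / (2 * bandwidth ** 2)) for x in training)
    return total / (len(training) * norm)


# Hypothetical 2-D descriptor values for the training set
training = [(0.0, 0.0), (0.2, 0.1), (-0.1, 0.3), (0.4, -0.2), (0.1, 0.1)]

# Hypothetical threshold: the lowest density found at any training compound
min_density = min(gaussian_kde(x, training) for x in training)

for query in [(0.1, 0.0), (2.0, 2.0)]:
    density = gaussian_kde(query, training)
    print(query, f"{density:.4f}", "in domain" if density >= min_density else "out of domain")
```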

Page 19

Current common methods: response domain

Figure: distribution of the predicted property

Page 20

Mixture of different concepts

Reliability (is the prediction reliable?)

Applicability (can I use this model to make a prediction?)

Decidability (can I make a clear decision?)

Figure label: Likelihood


Page 22

Mixture of different concepts

Applicability Domain

Page 23

Towards an extended and more formal framework

Confidence in the prediction if ...

• Applicability domain: my model can be applied to this query compound

• Reliability domain: the prediction is reliable enough for my use case

• Decidability domain: I can make a clear decision

Page 24

Applicability (of the model)

Can I apply my model?

• No: out of applicability domain

• Yes: perform the prediction

Applicability: model boundaries (designer's specifications)

• Is the class of my query compound supported by the model? e.g. exclude polymers, proteins, inorganic molecules, etc.

• Is my query compound within the descriptor range of the training set? e.g. inside the convex hull, minimum information density

• Has my model seen all the structural features present in the query compound? e.g. not in domain: contains an unseen boronic acid functional group

Page 25

Reliability (of the prediction)

Can I apply my model?

• No: out of applicability domain

• Yes: perform the prediction

Can I trust my prediction?

• No: out of reliability domain

• Yes: look at the prediction

Reliability: prediction boundaries (user defined)

• How close are the nearest neighbours?

• How reliable are these nearest data points? e.g. GLP compliance

• How well did my model predict these data points? e.g. performance during cross-validation
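The three reliability questions can be folded into a single local score in many ways. The sketch below is one hypothetical scheme, not the method presented here: it keeps neighbours that are similar enough and backed by reliable data (e.g. GLP compliant), then computes the similarity-weighted fraction that the model predicted correctly during cross-validation. The `Neighbour` record, the 0.5 similarity cut-off and the 0.7 reliability threshold are placeholders for user-defined boundaries.

```python
from dataclasses import dataclass


@dataclass
class Neighbour:
    similarity: float    # e.g. Tanimoto similarity to the query, in [0, 1]
    reliable_data: bool  # e.g. the experimental result is GLP compliant
    cv_correct: bool     # the model predicted this compound correctly during cross-validation


def local_reliability(neighbours: list[Neighbour], min_similarity: float = 0.5) -> float:
    """Similarity-weighted fraction of close, trustworthy neighbours predicted correctly in CV.

    Returns 0.0 when no neighbour passes the similarity and data-quality filters."""
    usable = [n for n in neighbours if n.similarity >= min_similarity and n.reliable_data]
    weight = sum(n.similarity for n in usable)
    if weight == 0:
        return 0.0
    return sum(n.similarity for n in usable if n.cv_correct) / weight


neighbours = [
    Neighbour(0.9, True, True),
    Neighbour(0.8, True, False),
    Neighbour(0.7, False, True),  # ignored: unreliable experimental data
    Neighbour(0.3, True, True),   # ignored: too dissimilar
]
score = local_reliability(neighbours)
print(round(score, 3))  # 0.529
print("reliable" if score >= 0.7 else "out of reliability domain")  # out of reliability domain
```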

Page 26

Decidability (of the outcome)

Can I apply my model?

• No: out of applicability domain

• Yes: perform the prediction

Can I trust my prediction?

• No: out of reliability domain

• Yes: look at the prediction

Can I make a clear call?

• No: equivocal or undecided

• Yes: make a statement

Decidability: likelihood boundaries (user defined)

• Does my evidence converge or conflict? e.g. k nearest neighbours distribution

• Is there a consensus between intermediate conclusions? e.g. RF tree distribution

• Is my posterior likelihood strong enough? e.g. Naïve Bayes posterior probability
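Decidability can be sketched as a check on how strongly the evidence converges: count the votes (k nearest neighbour labels or random forest tree predictions) and make a call only if the majority share reaches a user-defined likelihood boundary. The `decide` helper and the 0.7 boundary are hypothetical.

```python
from collections import Counter


def decide(votes: list[str], min_fraction: float = 0.7) -> str:
    """Return the majority label if at least min_fraction of the votes agree,
    otherwise 'equivocal'. Votes can be k-NN labels or random forest tree outputs."""
    if not votes:
        return "equivocal"
    label, count = Counter(votes).most_common(1)[0]
    return label if count / len(votes) >= min_fraction else "equivocal"


print(decide(["active"] * 9 + ["inactive"]))      # -> active (90% of votes agree)
print(decide(["active"] * 5 + ["inactive"] * 5))  # -> equivocal (evidence conflicts)
```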

Page 27

Implementation

Can I apply my model?

• No: out of applicability domain

• Yes: perform the prediction

Can I trust my prediction?

• No: out of reliability domain

• Yes: look at the prediction

Can I make a clear call?

• No: equivocal or undecided

• Yes: make a statement

Implementation approaches:

• Calibrate (conformal analysis based on likelihood)

• Use local model performance

• Calibrate (null hypothesis)

• Rule driven

Page 28

Confidence

Predictability

Reliability

Confidence (subjective)

Applicability

Page 29

Confidence in the decision

Low reliability

High reliability

Page 30

Confidence in the decision

Low decidability

High decidability

Page 31

Confidence in the decision

Page 32

Confidence in the decision

Can I trust my prediction?

We can't trust this high likelihood!

Outside the reliability domain

Page 33

Confidence in the decision

Can I make a clear call?

Figure regions: confident call; difficult call (activity cliff); equivocal / undecided; outside the reliability domain

Page 34

Reliability and likelihood are different concepts

Reliability and likelihood are not interconvertible: they cannot compensate for each other

(e.g. low likelihood cannot be compensated for by high reliability)

Page 35

Intuitive, non-ambiguous and formal decision framework

Applicability: can I apply my model?

• No: out of applicability domain

• Yes: perform the prediction

Reliability: is my prediction reliable enough?

• No: out of reliability domain

• Yes: look at the prediction

Decidability: can I make a clear call?

• No: equivocal / undecided

• Yes: make a statement
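The three layers of this framework can be expressed as a sequence of gates, which also reflects the point that reliability and likelihood cannot compensate for each other: each is tested against its own boundary. A minimal sketch; the `Outcome` names, scores and thresholds are hypothetical placeholders for whatever applicability rules, reliability measure and likelihood a given model provides.

```python
from enum import Enum


class Outcome(Enum):
    OUT_OF_APPLICABILITY = "out of applicability domain"
    OUT_OF_RELIABILITY = "out of reliability domain"
    EQUIVOCAL = "equivocal / undecided"
    STATEMENT = "make a statement"


def decision_flow(applicable: bool, reliability: float, likelihood: float,
                  min_reliability: float, min_likelihood: float) -> Outcome:
    """Apply the applicability, reliability and decidability gates in order.

    Each gate is checked against its own boundary, so a high score in one layer
    never makes up for a low score in another."""
    if not applicable:                 # model boundaries (designer's specifications)
        return Outcome.OUT_OF_APPLICABILITY
    if reliability < min_reliability:  # prediction boundaries (user defined)
        return Outcome.OUT_OF_RELIABILITY
    if likelihood < min_likelihood:    # likelihood boundaries (user defined)
        return Outcome.EQUIVOCAL
    return Outcome.STATEMENT


# A high likelihood is not trusted when the prediction is unreliable
print(decision_flow(True, reliability=0.4, likelihood=0.99,
                    min_reliability=0.7, min_likelihood=0.7).value)  # out of reliability domain
```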

Page 36

Intuitive, non-ambiguous and formal decision framework

The same three-step flow, with applicability, reliability and decidability grouped together as the Decision domain.

Page 37

Decision Domain

“The Decision Domain (DD) is the scope in which it is possible to apply the model to make a non-equivocal decision based on a reliable prediction.”

Proposal: extend and refine the Applicability Domain into a Decision Domain

Page 40

Transparent decision process

Transparent decision

Decision domain

• Applicability: my model can be applied to this query compound

• Reliability: the prediction is reliable enough for my use case

• Decidability: I can make a clear decision

Decision support

• Interpretation: I understand which assumptions the model has used in its reasoning

• Support: I know which data (examples) support the model's conclusion

Page 41

TARDIS principle

Requirements to usefully support a decision process (TARDIS principle):

• Transparency (of the method)

• Applicability (of the model)

• Reliability (of the prediction)

• Decidability (non-equivocal result)

• Interpretability (of the result)

• Support (evidence supporting the result)

Page 42

Conclusion

• In the context of risk assessment, expert and statistical models are tools to support a decision process.

• These models should provide transparent and interpretable conclusions that can be easily assessed by human experts using their own knowledge.

• Ultimately, it is human experts who make the decision, based on their knowledge and the helpful information provided by the tools.

Page 43

Lhasa Limited

Granary Wharf House, 2 Canal Wharf, Leeds, LS11 5PS

Registered Charity (290866)

Company Registration Number 01765239

+44(0)113 394 6020

[email protected]

www.lhasalimited.org

Thank you

Page 44

Confidence vs observed accuracy

Figure: observed accuracy (y-axis) against likelihood (x-axis)

Accuracy ∼ Confidence?

Page 45

Confidence vs observed accuracy (current)

Confidence descriptors (training set)

• Likelihood: RF, SVM, etc.

• Reliability: local CV accuracy (kNN)

• Polynomial fitting (Sarah)

• Result: individual prediction accuracy estimate. At what level can I trust my prediction?

Meta-model calibration (calibration dataset)

• Likelihood and reliability: null hypothesis (p-value), conformal prediction (target accuracy)

• Confidence metric + desired accuracy A%

• Result: prediction within A%, or no prediction. Can I trust my prediction to a given level?

Page 46

Confidence vs observed accuracy (proposed)

Likelihood and reliability combine into confidence

Reliability and likelihood are not interchangeable: they cannot compensate for each other

(low likelihood cannot be compensated for by high reliability)

C ∼ R ⋅ L

Page 47

Confidence model (proposed)

• Reliability: noise estimate

• Likelihood: accuracy estimate

• Confidence: combines reliability and likelihood

Figure label: Noise

Page 48

Confidence model (proposed)

Figure: observed accuracy against likelihood; low reliability, high RMSE

Page 49

Confidence model (proposed)

Figure: observed accuracy against likelihood; high reliability, low RMSE

Page 50

Confidence model (proposed)

Figure: observed accuracy against likelihood; the RMSE of the fit measures reliability

• Use-case-defined threshold ("...with a given reliability"): which level of likelihood ~ accuracy fit do I need for my use case?

• If the reliability of the prediction meets this criterion, then I can use the likelihood as an accuracy metric.

• Meta model (captured during cross-validation): likelihood ~ accuracy estimate
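The meta model relates likelihood to observed accuracy using cross-validation results. One way to measure how well they agree is a binned calibration RMSE. The sketch below assumes a classifier that reports the likelihood of its call, equal-width bins, an unweighted average over non-empty bins, made-up CV results and a hypothetical use-case threshold of 0.05.

```python
import math


def calibration_rmse(likelihoods: list[float], correct: list[bool], n_bins: int = 5) -> float:
    """RMSE between mean likelihood and observed accuracy over equal-width likelihood bins.

    likelihoods: likelihood the model gave to its call, for each cross-validation prediction
    correct: whether each of those calls matched the experimental result"""
    bins: list[list[tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for p, ok in zip(likelihoods, correct):
        bins[min(int(p * n_bins), n_bins - 1)].append((p, ok))
    squared_errors = []
    for members in bins:
        if members:
            mean_likelihood = sum(p for p, _ in members) / len(members)
            observed_accuracy = sum(ok for _, ok in members) / len(members)
            squared_errors.append((mean_likelihood - observed_accuracy) ** 2)
    if not squared_errors:
        raise ValueError("no predictions to calibrate")
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical cross-validation results: likelihood matches accuracy in both bins
likelihoods = [0.9] * 10 + [0.6] * 10
correct = [True] * 9 + [False] + [True] * 6 + [False] * 4
rmse = calibration_rmse(likelihoods, correct)
max_rmse = 0.05  # hypothetical use-case threshold
print(f"RMSE = {rmse:.3f}:", "use likelihood as accuracy" if rmse <= max_rmse else "not reliable enough")
```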

Page 51

Confidence vs observed accuracy (proposed)

Figure: observed accuracy against likelihood (likelihood ~ accuracy estimate)

• Likelihood is the meta model.

• Meta-model calibration using conformal prediction: the likelihood becomes the non-conformity metric (under the assumption of the requested reliability level).

• Conformity with the calibration dataset, given a required accuracy of A%, yields a prediction with an accuracy of at least A%.

Conformal prediction framework: Norinder U, Carlsson L, Boyer S, Eklund M. Introducing Conformal Prediction in Predictive Modeling. A Transparent and Flexible Alternative to Applicability Domain Determination. J Chem Inf Model. 2014;54(6):1596–1603.
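A minimal sketch of the calibration step, assuming the standard inductive (split) conformal recipe for classification. The likelihood is turned into a non-conformity score as 1 − likelihood of a candidate label, so that larger scores mean stranger examples. Under exchangeability of calibration and query compounds, the true label is excluded from the prediction set with probability at most 1 − A. The calibration scores and likelihoods are made-up numbers and the helper names are hypothetical.

```python
def p_value(score: float, calibration_scores: list[float]) -> float:
    """Conformal p-value: share of calibration compounds at least as non-conforming as the query."""
    at_least = sum(1 for s in calibration_scores if s >= score)
    return (at_least + 1) / (len(calibration_scores) + 1)


def prediction_set(likelihoods: dict[str, float], calibration_scores: list[float],
                   required_accuracy: float) -> set[str]:
    """Labels kept at significance 1 - A, with non-conformity = 1 - likelihood(label).

    calibration_scores must be 1 - likelihood(true label) for a held-out calibration set."""
    significance = 1.0 - required_accuracy
    return {label for label, lik in likelihoods.items()
            if p_value(1.0 - lik, calibration_scores) > significance}


# Made-up non-conformity scores for nine calibration compounds
calibration_scores = [0.05, 0.1, 0.1, 0.2, 0.3, 0.35, 0.4, 0.6, 0.7]

print(sorted(prediction_set({"active": 0.92, "inactive": 0.08}, calibration_scores, 0.8)))  # ['active']
print(sorted(prediction_set({"active": 0.55, "inactive": 0.45}, calibration_scores, 0.8)))  # ['active', 'inactive']
```

A single-label set corresponds to a clear call; a set containing both labels corresponds to an equivocal outcome, and an empty set flags a compound that conforms to none of the labels.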