2004 Sheffield Chemoinformatics Conference, April 21-23

Automated Decision Support for the Screening Process

C.A. Nicolaou¹, D.A. Kleier², T.K. Brunck¹, P.A. Bacha¹

¹ Bioreason, Inc., 121 Sandoval St., Suite 220, Santa Fe, NM, USA
² DuPont Agricultural Products, Stine-Haskell Research Center, Newark, DE, USA
Source: cisrg.shef.ac.uk/shef2004/talks/CNicolaou.pdf

May 25, 2018

Outline

• Decision Support for Lead Optimization
  – Focus on SAR extraction
• Definitions and Goals
• Bioreason Approach
  – Fundamentals
  – RTableGenerator and SARXtractor
• Examples

Decision Support for Lead Optimization

• Lead Optimization:
  – Fine-tuning lead compounds to increase potency and remove undesired biological properties
• Focus on SAR extraction step
  – An effort to understand mechanisms/interactions
  – Suggest optimization paths
  – Enable synthesis/acquisition of compounds with highest probability of generating further crucial knowledge
  – Task performed by highly trained experts
• Wrong decisions can be very costly
  – Time, resources, …

Definitions

• What is SAR/SPR? Ask a chemist…
  – “I know it when I see it”
  – “I definitely don’t want to see a table of numbers – I need a scaffold and R-groups to work with”
• It is usually reported in the form of R-tables on a given scaffold
• Ideally it is reduced to a set of rules
  – Rules can be targeted for use with an expert system engine or on their own
  – Rules can be accumulated into a knowledge base

Example: SAR Information on 3° Amines

What a chemist would like to see:

[Figure: tertiary amine structure (labels N, R, 3) annotated with the following SAR notes]
• Variation of R allowed to retain activity
• Dimethyl substitution allowed
• Methyl = inactive
• Propynyl allowed
• Et, i-Pr = inactive
• EWD groups = inactive
• Sensitive to steric bulk and EWD groups

Goals

• Correctness
• Completeness
• Easily interpretable/usable results
  – As close as possible to the form described by experts
  – Meaningful descriptors
• Usability
  – User friendly
  – Capable of handling screening datasets generated in a modern drug discovery environment

Bioreason Fundamentals

• Technical approach
  – Learn what is important directly from the data
    • Assumptions/predefined knowledge kept to a minimum
  – Form structural classes
    • Unsupervised, solely based on molecular graphs
  – Reason with the classes
    • Overlay activities, other biological attributes
    • Characterize & prioritize classes
    • Build models, etc…

Overview of the classes

Types of Classes

[Figure: two panels, “Defined Rings” and “Rings with Variable Closures”]

Subclass

[Figure: two panels, “Parent Class” and “Subclasses”]

Compounds in classes

SAR Extraction Approach

• Global SAR models
  – Few rules, sometimes quite hard to interpret
• Local models on Bioreason classes?
  – First, learn good classes
    • The scaffolds are the first part of SAR
  – Generate R-tables
  – Choose an appropriate class of descriptors
    • Characterize R-groups
  – From the R-tables and descriptors…
    • Construct models
    • Extract rules

Generating R-Tables

• For classes without variation in activity…
  – No need for R-table
  – Scaffold is indicative/predictive of activity
• For mixed classes…
  – Using scaffold as a starting point, automatically learn R-groups
  – Relate R-groups to available compound activity/property attributes
• Not an easy problem!
  – Enabling user interaction to change alignment
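The decision between "no R-table needed" and "mixed class" can be illustrated with a small sketch. It is not RTableGenerator itself: it assumes the hard step (aligning each compound to the scaffold and assigning its R-groups) has already been done, and the function names, position naming convention, and sample compounds are all hypothetical.

```python
from typing import Dict, List, Optional


def needs_r_table(activities: List[bool]) -> bool:
    """Return True only for a mixed class, i.e. one whose activities vary."""
    return len(set(activities)) > 1


def build_r_table(compounds: List[dict]) -> List[Dict[str, Optional[object]]]:
    """Lay out pre-assigned R-groups as rows with one column per position.

    Assumes position names of the form "R<number>" (hypothetical convention).
    """
    positions = sorted(
        {pos for c in compounds for pos in c["r_groups"]},
        key=lambda pos: int(pos[1:]),  # numeric order, so R2 sorts before R10
    )
    rows = []
    for c in compounds:
        row = {"id": c["id"], "active": c["active"]}
        for pos in positions:
            # A position with no recorded substituent is left as None.
            row[pos] = c["r_groups"].get(pos)
        rows.append(row)
    return rows


# Hypothetical two-compound class; substituents written as SMILES-like strings.
compounds = [
    {"id": "c1", "active": True, "r_groups": {"R1": "C", "R2": "Cl"}},
    {"id": "c2", "active": False, "r_groups": {"R1": "CC"}},
]

# Only a mixed class gets an R-table; a uniform class is described by its scaffold.
r_table = build_r_table(compounds) if needs_r_table([c["active"] for c in compounds]) else []
```

Each row can then be related to the activity column, which is the starting point for the descriptor and modeling steps.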

Automatic R-table Example from acute-toxicity data set

Discovering SAR

• Calculate appropriate SAR descriptors
  – Position-Specific Descriptors (PSD) to characterize R-groups
• Construct models
  – From R-tables and PSD set of each class
• Interpret models
  – Extract concise, meaningful rules

Position-Specific-Descriptors

• Physicochemical
  – Rgroup_Property
  – LogP, molecular weight, and number of H-bond donors, H-bond acceptors, charges, rings, and rotatable bonds, polar surface area, basic sites, …
  – Example: R3_logP == 0.75
• Pseudo-3D pharmacophores
  – Rgroup_number of bonds_pharmacophore point
  – H-bond donor, H-bond acceptor, anion, cation, polar, hydrophobe, aromatic ring, and aliphatic ring
  – Example: R4_3_HBA: Yes
• All descriptors learned for each class modeled
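The two naming patterns above (`Rgroup_Property` and `Rgroup_number of bonds_pharmacophore point`) can be made concrete with a short sketch. The `PSD` type and `parse_psd` helper are hypothetical names introduced only for illustration; the slides give the naming patterns and examples but not how the software represents descriptors internally.

```python
from typing import NamedTuple, Optional


class PSD(NamedTuple):
    """One position-specific descriptor (illustrative representation)."""

    r_group: str                  # substituent position, e.g. "R3"
    feature: str                  # property ("logP") or pharmacophore point ("HBA")
    bonds: Optional[int] = None   # bond count from the scaffold; None if physicochemical

    @property
    def name(self) -> str:
        if self.bonds is None:
            return f"{self.r_group}_{self.feature}"
        return f"{self.r_group}_{self.bonds}_{self.feature}"


def parse_psd(name: str) -> PSD:
    """Split a descriptor name back into its parts."""
    parts = name.split("_")
    if len(parts) == 2:
        return PSD(parts[0], parts[1])
    if len(parts) == 3:
        return PSD(parts[0], parts[2], int(parts[1]))
    raise ValueError(f"unrecognized descriptor name: {name!r}")


logp_at_r3 = PSD("R3", "logP")     # physicochemical descriptor
hba_at_r4 = PSD("R4", "HBA", 3)    # H-bond acceptor 3 bonds from the scaffold
```

Keeping the position in the descriptor name is what lets a later rule read as a statement about one specific R-group rather than about the whole molecule.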

SAR Extraction Algorithm

• Multiple Domain
  – compound can appear in more than one child node
  – compound can contribute to more than one rule
• Multiple Splitting
  – parent node can have more than two children
  – extract as much SAR at each level as is statistically meaningful
  – number of splits controlled by statistical means, e.g. parent-child χ² cutoff (0.7)

[Figure: example SAR tree. Node labels: Dataset (Avg. Prop. 4.2); R4_3_HBA: Yes; R10_MW>31; R3_logP in range (0.45-0.75); R11_2_HYD: Yes; …]
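The two properties of the tree (overlapping children and more than two children per parent) can be sketched as follows. This is an assumption-laden illustration, not the SARXtractor algorithm: the slides do not say how the parent-child χ² is computed or what the 0.7 cutoff refers to, so the sketch uses a plain goodness-of-fit χ² of each child against the parent's active rate and a hypothetical `min_chi2` parameter. The compounds and descriptor values are invented.

```python
from typing import Callable, Dict, List, Tuple

Compound = Dict[str, float]                      # descriptor name -> value, plus "active" (0/1)
Rule = Tuple[str, Callable[[Compound], bool]]    # (label, premise)


def chi2_vs_parent(n_child: int, active_child: int, parent_rate: float) -> float:
    """Goodness-of-fit chi-square of a child node against the parent's active rate."""
    exp_active = n_child * parent_rate
    exp_inactive = n_child * (1.0 - parent_rate)
    return ((active_child - exp_active) ** 2 / exp_active
            + ((n_child - active_child) - exp_inactive) ** 2 / exp_inactive)


def split_node(parent: List[Compound], rules: List[Rule],
               min_chi2: float) -> List[Tuple[str, List[Compound], float]]:
    """Keep every rule whose child differs enough from the parent.

    Children are not mutually exclusive, and any number of them may be kept.
    """
    parent_rate = sum(c["active"] for c in parent) / len(parent)
    if parent_rate in (0.0, 1.0):
        return []  # a pure node has no activity variation left to explain
    children = []
    for label, premise in rules:
        child = [c for c in parent if premise(c)]
        if not child:
            continue
        stat = chi2_vs_parent(len(child), sum(c["active"] for c in child), parent_rate)
        if stat >= min_chi2:
            children.append((label, child, stat))
    return children


# Hypothetical class of four compounds.
parent = [
    {"id": 1, "active": 1, "R3_logP": 0.50, "R10_MW": 40.0, "R4_3_HBA": 1},
    {"id": 2, "active": 1, "R3_logP": 0.60, "R10_MW": 20.0, "R4_3_HBA": 1},
    {"id": 3, "active": 0, "R3_logP": 1.20, "R10_MW": 45.0, "R4_3_HBA": 0},
    {"id": 4, "active": 0, "R3_logP": 1.50, "R10_MW": 10.0, "R4_3_HBA": 0},
]
rules = [
    ("R3_logP in range(0.45-0.75)", lambda c: 0.45 <= c["R3_logP"] <= 0.75),
    ("R10_MW>31", lambda c: c["R10_MW"] > 31),
    ("R4_3_HBA: Yes", lambda c: c["R4_3_HBA"] == 1),
]
children = split_node(parent, rules, min_chi2=1.0)
```

In this toy class the molecular-weight rule is dropped because its child has the same active rate as the parent, while compounds 1 and 2 land in both surviving children.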

SAR Tree Interpretation

• Each node in this type of decision tree is a rule or hypothesis that is easy to express in English
• Each hypothesis has
  – Indication of certainty (statistical)
  – Feature name/range (e.g. logP between x and y)
  – Support (number of examples)
  – List of examples
• Rules with multiple elements are possible
  – Aggregate certainty terms
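One way to picture such a hypothesis is as a small record that can render itself as a sentence. The `Hypothesis` class below is a hypothetical sketch, not the tool's data model; its fields follow the list above, the numbers are taken from the R5 PSA rule in the pesticide example, and the example compound IDs are placeholders.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Hypothesis:
    feature: str               # descriptor the rule is about, e.g. "R5 PSA"
    low: float                 # lower bound of the feature range
    high: float                # upper bound of the feature range
    probability: float         # fraction active among compounds matching the rule
    class_probability: float   # fraction active in the class as a whole
    certainty: float           # statistical certainty reported for the rule
    examples: List[str] = field(default_factory=list)

    @property
    def support(self) -> int:
        """Support is the number of example compounds behind the rule."""
        return len(self.examples)

    def to_english(self) -> str:
        return (
            f"If {self.feature} is within the range of {self.low} to {self.high}, "
            f"then probability of activity is {self.probability:.0%} "
            f"(cf. {self.class_probability:.0%} for class as a whole) "
            f"with a certainty of {self.certainty:.2f}."
        )


rule = Hypothesis("R5 PSA", 33.97, 65.16, 1.0, 0.47, 1.0,
                  examples=["cpd_a", "cpd_b"])  # placeholder IDs
sentence = rule.to_english()
```

How certainty terms are aggregated for rules with several elements is not specified in the slides, so it is left out of the sketch.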

SAR Extraction Example

Analysis of Commercial Pesticides

• Source of compounds
  – The Pesticide Manual: A World Compendium
  – Published by the British Crop Protection Council
• Types of activity considered
  – Herbicide, insecticide, fungicide, plant growth regulation
  – Binary indicator variables used for type of activity
• Task:
  – Identify scaffolds associated with herbicidal activity & features that distinguish herbicides from non-herbicides within the same class
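The binary indicator encoding of activity type can be sketched in a few lines. The function name and the string labels are assumptions made for illustration; the slides state only that one binary variable is used per activity type.

```python
from typing import Dict, Iterable

# The four activity types named on the slide.
ACTIVITY_TYPES = ("herbicide", "insecticide", "fungicide", "plant growth regulation")


def indicator_variables(activities: Iterable[str]) -> Dict[str, int]:
    """Encode a compound's activity types as one 0/1 variable per type."""
    found = set(activities)
    unknown = found - set(ACTIVITY_TYPES)
    if unknown:
        raise ValueError(f"unknown activity type(s): {sorted(unknown)}")
    return {activity: int(activity in found) for activity in ACTIVITY_TYPES}


# A compound listed only as a herbicide.
encoded = indicator_variables(["herbicide"])
```

With this encoding the herbicide task reduces to modeling the single `herbicide` column within each structural class.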

Diphenyl ether subclass is evenly distributed between herbicides and non-herbicides. What substituent features distinguish the herbicides?

[Figure: diphenyl ether scaffold with substituent positions R1 to R7]

R-Table sorted by herbicide activity and displayed at cutoff between herbicides and non-herbicides

If R5 has an HBA located 2 bonds from the scaffold, then probability of activity is 95% (cf. 47% for class as a whole) with a certainty of 1.00.

If R5 PSA is within the range of 33.97 to 65.16, then probability of herbicidal activity is 100% (cf. 47% for class as a whole) with a certainty of 1.00.

If R3 has an AlR center located 5 bonds from the scaffold, probability of herbicidal activity is 0% with a certainty of 1.00.

This rule cleanly differentiates pyrethroid insecticides from diphenyl ether herbicides.

Peptide Deformylase Inhibitors

ClassPharmer™ SAR extraction & pharmacophore perception

Learning SAR Rules for Inhibitors of Peptide Deformylase (PDF)

• Training set of 22 mostly beta-sulfinylhydroxamates
  – Reference: Apfel, et al., J. Med. Chem., 43, 2324 (2000)
• Compounds classified & characterized by MCS using ClassPharmer™ technology
• R-Tables generated for each class
• QSARs learned for each class

Training Set of Hydroxamic Acids

[Figure: structure drawings of training-set compounds, not recoverable from the transcript. Each structure is labeled with a compound ID, a value, and a parenthesized number:]

Cpd ID   Label
1        2.22 (0)
16       1.96 (0)
14       1.51 (0)
5        1.46 (0)
9        1.03 (0)
7        0.80 (0)
12       0.72 (0)
8        -0.04 (0)
2        -1.34 (0)

Classification & R-grouping by ClassPharmer™

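[Figure: class scaffold drawing (labels: N, O, S, R1, R2, R3, X1, A, B), not recoverable from the transcript, followed by an R-table with columns Cpd ID, pIC50, R1, R2, R3, X1. The R-group drawings are not recoverable; the numeric columns are:]

Cpd ID   pIC50 (full precision)   pIC50 (displayed)
16       1.95860731484            1.96
13       1.63827216398            1.64
8        -0.0413926851582         -0.04
2        -1.34242268082           -1.34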

ClassPharmer™ Rule for Desirable R3 Groups

If R3(MW) is in the range of 50 to 74, the probability of activity is significantly enhanced.

• 92% of compounds satisfying the premise are active
• 67% of compounds in class are active

[Figure: structure drawings of example compounds, not recoverable from the transcript. Labels as shown on the slide:]
8o 342 1.96 (1)
8h 295 1.03 (1)
9a 269 1.00 (1)
8e 299 0.85 (1)
9e 348 0.68 (1)
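The two percentages quoted with a rule of this kind are the active fraction among compounds that satisfy the premise and the active fraction in the class as a whole. The sketch below shows that arithmetic on invented data; the function name and the six compounds are hypothetical and are not the Apfel et al. training set.

```python
from typing import Callable, Dict, List, Tuple

Compound = Dict[str, float]


def rule_statistics(compounds: List[Compound],
                    premise: Callable[[Compound], bool]) -> Tuple[float, float, int]:
    """Return (active fraction under the premise, active fraction in class, support)."""
    matching = [c for c in compounds if premise(c)]
    if not matching:
        raise ValueError("no compound satisfies the premise")
    p_rule = sum(c["active"] for c in matching) / len(matching)
    p_class = sum(c["active"] for c in compounds) / len(compounds)
    return p_rule, p_class, len(matching)


# Invented class: R3 molecular weight and a 0/1 activity flag per compound.
compounds = [
    {"R3_MW": 57.0, "active": 1},
    {"R3_MW": 57.0, "active": 1},
    {"R3_MW": 71.0, "active": 1},
    {"R3_MW": 15.0, "active": 0},
    {"R3_MW": 43.0, "active": 1},
    {"R3_MW": 91.0, "active": 0},
]

# Premise in the style of the slide: R3(MW) in the range of 50 to 74.
p_rule, p_class, support = rule_statistics(compounds, lambda c: 50 <= c["R3_MW"] <= 74)
```

Comparing `p_rule` with `p_class` shows how much the premise shifts the odds of activity, and `support` shows how many compounds that comparison rests on.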

Obverse Rule for Undesirable R3 Groups

If R3(MW) is outside the range of 50 to 74, the probability of activity is significantly decreased.

• All compounds satisfying the premise are inactive

[Figure: structure drawings of example compounds, not recoverable from the transcript. Labels as shown on the slide:]
4 0.80 (1)
7 0.80 (1)
8 -0.04 (1)
3 -0.48 (1)
2 -1.34 (1)

R3 pocket in active site of (PDF)Ni(II)

[Figure: CGG49 in active site of E. coli Ni-PDF (Roche); ligand structure shown with R3 = nBu]

Apfel, et al. J. Med. Chem. 2000, 43, 2324-2331

Future Directions

• Expand position-specific descriptor types
  – ADME/Tox analysis
  – Electronic
• Rule Synopsis
• Mine info across screens, libraries, time

Acknowledgements

• Bioreason
  – Terence K. Brunck
  – Pat Bacha
  – Suzanne Sloan
• DuPont
  – Dan A. Kleier
  – A number of forward-thinking and very patient scientists

Getting Close…

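[Figure: tertiary amine structure (labels N, R, 3) annotated with the following SAR notes]
• Variation of R allowed to retain activity
• Dimethyl substitution allowed
• Methyl = inactive
• Propynyl allowed
• Et, i-Pr = inactive
• EWD groups = inactive
• Sensitive to steric bulk and EWD groups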