Active Learning Strategies for Compound Screening
Megon Walker (1) and Simon Kasif (1,2)
(1) Bioinformatics Program, Boston University
(2) Department of Biomedical Engineering, Boston University
229th ACS National Meeting, March 13-17, 2005, San Diego, CA
Transcript
Outline
- Introduction to active learning for compound screening
- Objectives and performance criteria
- Algorithms and procedures
- Thrombin dataset results
- Preliminary conclusions
Introduction: drug discovery
- drug discovery is an iterative process
- goal: to identify many target-binding compounds with minimal screening iterations
- input: data set with positive and negative examples
- output: a classifier o(x) such that for each example x
  - o(x) = 1 if example is positive
  - o(x) = -1 if example is negative
- standard learning
  - classifier trains on a static training set
  - train, then test
- active learning
  - classifier chooses data points for training set
  - classifier "requests" labels
  - iterative rounds of training and testing
Introduction: active learning & compound screening
- Mamitsuka et al. Proceedings of the Fifteenth International Conference on Machine Learning, 1998: 1-9.
- Warmuth et al. J. Chem. Inf. Comput. Sci. 2003, 43: 667-673.
[Figure: two panels, "1st query" and "2nd query", each showing a Compounds x Features matrix with an A/I (active/inactive) label column. Labeled compounds are divided between "train classifier #1" and "train classifier #2"; compounds marked "NOT labeled" and "test" carry "?" labels. More compounds are labeled after the 2nd query than after the 1st.]
Objectives
- exploitation: Hit Performance, Enrichment Factor (EF)
- exploration: Accurate model of
Sample Selection: P(active), uncertainty, density
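The slide names the Enrichment Factor but does not define it. A minimal sketch, assuming the commonly used definition (hit rate among the screened compounds divided by the hit rate of the whole library), could look like this. The function name and the example counts are hypothetical and are not taken from the talk.

```python
def enrichment_factor(n_hits_found, n_screened, n_actives_total, n_compounds_total):
    """Assumed definition: (hits / screened) / (actives / library size).

    Computed by cross-multiplication so that integer inputs lose no
    precision before the single final division.
    """
    if n_screened <= 0 or n_actives_total <= 0 or n_compounds_total <= 0:
        raise ValueError("screened, total actives and library size must be positive")
    return (n_hits_found * n_compounds_total) / (n_screened * n_actives_total)


# Hypothetical example: 30 actives found among 100 screened compounds,
# in a library of 1000 compounds that contains 50 actives overall.
ef = enrichment_factor(n_hits_found=30, n_screened=100,
                       n_actives_total=50, n_compounds_total=1000)
print(ef)
```

Under this definition, a value of 1 corresponds to picking compounds at random, and larger values mean the selection strategy concentrates actives in the screened set.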
Methods: classifier committees
- bagging: uniform sampling distribution
- boosting: compounds misclassified by classifier #1 more likely resampled by classifier #2
[Figure: Compounds x Features matrix with an A/I label column. Labeled compounds are subsampled into "train classifier #1" and "train classifier #2"; "NOT labeled" and "test" compounds carry "?" labels.]
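The difference between the two resampling schemes can be sketched as follows. This is an illustration only: the fixed `boost` factor is a hypothetical stand-in for a real boosting weight update (which the slide does not specify), and the function names are invented for this example.

```python
import random


def bagging_sample(labeled_ids, k, rng):
    """Bagging: draw k compounds uniformly, with replacement."""
    return rng.choices(labeled_ids, k=k)


def boosting_sample(labeled_ids, misclassified, k, rng, boost=3.0):
    """Boosting-style resampling (simplified assumption): compounds that
    classifier #1 misclassified get a sampling weight of `boost`, all
    others a weight of 1, so they are more likely to reach classifier #2."""
    weights = [boost if cid in misclassified else 1.0 for cid in labeled_ids]
    return rng.choices(labeled_ids, weights=weights, k=k)


rng = random.Random(0)
labeled_ids = list(range(10))   # hypothetical compound ids
misclassified = {0, 1}          # hypothetical errors of classifier #1

bag = bagging_sample(labeled_ids, 1000, rng)
boosted = boosting_sample(labeled_ids, misclassified, 1000, rng)
print(sum(c in misclassified for c in bag) / 1000)      # close to 2/10
print(sum(c in misclassified for c in boosted) / 1000)  # noticeably higher
```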
[Flowchart boxes: Start; Input data files; Pick training and testing data for next round of cross validation; 1st batch?; Query training set batch labels; Train classifier committee on labeled training set subsamples; Sample Selection: P(active), uncertainty, density]
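One possible reading of the flowchart, for a single cross-validation round, is sketched below. The arrows of the original figure are not recoverable from the transcript, so the control flow here is an assumption: the first batch is drawn at random because no committee exists yet, and later batches come from a sample selection function. All names (`active_learning_round`, `oracle`, `train_committee`, `select_batch`) and the toy stubs are hypothetical.

```python
import random


def active_learning_round(pool, oracle, train_committee, select_batch,
                          batch_size, n_batches, rng):
    """Run one assumed active learning loop over a pool of compound ids."""
    labeled = {}              # compound id -> label (+1 active, -1 inactive)
    unlabeled = list(pool)
    committee = None
    for batch_no in range(n_batches):
        if not unlabeled:
            break
        if batch_no == 0:
            # "1st batch?": assumed to be a random draw
            batch = rng.sample(unlabeled, min(batch_size, len(unlabeled)))
        else:
            # Sample selection: P(active), uncertainty, or density
            batch = select_batch(committee, unlabeled, labeled, batch_size)
        for cid in batch:
            labeled[cid] = oracle(cid)   # query training set batch labels
            unlabeled.remove(cid)
        committee = train_committee(labeled)
    return committee, labeled


# Toy stubs, only to make the sketch runnable.
def toy_oracle(cid):
    return 1 if cid >= 15 else -1


def toy_train(labeled):
    return {cid for cid, y in labeled.items() if y == 1}


def toy_select(committee, unlabeled, labeled, batch_size):
    return sorted(unlabeled, reverse=True)[:batch_size]


committee, labeled = active_learning_round(
    list(range(20)), toy_oracle, toy_train, toy_select,
    batch_size=4, n_batches=3, rng=random.Random(1))
print(len(labeled), sorted(committee))
```

The outer cross-validation loop and the evaluation on the held-out test set are left out to keep the sketch short.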
Methods: weighted voting
- weighted vote of all classifiers predicts compound activity label
- each vote: perceptron output x perceptron weight
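A minimal sketch of such a weighted vote is given below. It assumes each committee member is a simple linear perceptron with outputs of +1 or -1 and that ties are resolved toward +1; how the vote weights are obtained is not stated on the slide, so the numbers here are hypothetical.

```python
def perceptron_output(weights, bias, x):
    """Return +1 or -1 for feature vector x (ties go to +1 by assumption)."""
    s = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1 if s >= 0 else -1


def committee_score(committee, x):
    """Sum of (perceptron output x perceptron vote weight) over the committee."""
    return sum(vote_weight * perceptron_output(w, b, x)
               for (w, b, vote_weight) in committee)


def committee_vote(committee, x):
    """Predicted activity label: sign of the weighted vote."""
    return 1 if committee_score(committee, x) >= 0 else -1


# Hypothetical committee: (feature weights, bias, vote weight) per perceptron.
committee = [
    ((1.0, -1.0, 0.5), 0.0, 0.6),
    ((-1.0, 1.0, 0.0), 0.0, 0.3),
    ((0.5, 0.5, 0.5), -2.0, 0.2),
]

print(committee_vote(committee, (1, 0, 1)))
print(committee_vote(committee, (0, 1, 0)))
```

For the first example compound, two of the three perceptrons vote inactive, but the single active vote carries the largest weight and decides the label, which is the point of weighting the votes.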
Methods: sample selection
- P(active): select compounds predicted active with highest probability by the committee
- uncertainty: select compounds on which the committee disagrees most strongly
- density with respect to actives: select compounds most similar to previously labeled or predicted actives
  - Tanimoto similarity metric
  - given compound bitstrings A and B
  - a = # bits on in A
  - b = # bits on in B
  - c = # bits on in both A and B
  - Tanimoto similarity = c / (a + b - c)
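The three selection strategies and the Tanimoto metric above can be sketched as scoring functions. This is a simplified illustration: committee votes are treated as unweighted +1/-1 lists, density is taken as the maximum similarity to any known active, and the compound names and fingerprints are made up for the example.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto similarity c / (a + b - c) for two '0'/'1' bitstrings."""
    a = bits_a.count("1")
    b = bits_b.count("1")
    c = sum(1 for x, y in zip(bits_a, bits_b) if x == "1" and y == "1")
    return c / (a + b - c) if (a + b - c) else 0.0


def p_active_score(votes):
    """Fraction of committee members voting 'active' (+1)."""
    return sum(1 for v in votes if v == 1) / len(votes)


def uncertainty_score(votes):
    """Highest (zero) when the committee is split evenly."""
    return -abs(sum(votes))


def density_score(bits, active_fingerprints):
    """Similarity to the closest known active (max is an assumption)."""
    return max(tanimoto(bits, act) for act in active_fingerprints)


def select_top(candidates, score, batch_size):
    """Pick the batch_size candidates with the highest score."""
    return sorted(candidates, key=score, reverse=True)[:batch_size]


# Hypothetical unlabeled compounds: committee votes and fingerprints.
votes = {"c1": [1, 1, 1, 1], "c2": [1, 1, -1, -1],
         "c3": [-1, -1, -1, -1], "c4": [1, 1, 1, -1]}
fps = {"c1": "000011", "c2": "100110", "c3": "110100", "c4": "010000"}
known_actives = ["110100"]

print(select_top(votes, lambda c: p_active_score(votes[c]), 1))
print(select_top(votes, lambda c: uncertainty_score(votes[c]), 1))
print(select_top(fps, lambda c: density_score(fps[c], known_actives), 1))
```

Each strategy picks a different compound here, which illustrates the trade-off between exploiting confident predictions and exploring regions where the committee is unsure.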

- Sample selection
- Bag vs. boost
- Committee vs. single classifier
- Testing set sensitivity
- Trade off: exploration and exploitation