Automated Parameter Setting Based on Runtime Prediction: Towards an Instance-Aware Problem Solver
Frank Hutter, Univ. of British Columbia, Vancouver, Canada
Youssef Hamadi, Microsoft Research, Cambridge, UK
Motivation (1): Why automated parameter setting?
• We want to use the best available heuristic for a problem
  – Strong domain-specific heuristics in tree search
    • Domain knowledge helps to pick good heuristics
    • But maybe you don't know the domain ahead of time ...
  – Local search parameters must be tuned
    • Performance depends crucially on the parameter setting
• New application/algorithm:
  – Restart parameter tuning from scratch
  – Waste of time both for researchers and practitioners
• Comparability
  – Is algorithm A faster than algorithm B because more time was spent tuning it?
Motivation (2): Operational scenario
• A CP solver has to solve instances from a variety of domains
• Domains not known a priori
• Solver should automatically use the best strategy for each instance
• Want to learn from the instances we solve
• Previous work on runtime prediction we base on [Leyton-Brown, Nudelman et al. ’02 & ’04]
• Part I: Automated parameter setting based on runtime prediction
• Part II: Incremental learning for runtime prediction in a priori unknown domains
Previous work on runtime prediction for algorithm selection
• General approach
  – Portfolio of algorithms
  – For each instance, choose the algorithm that promises to be fastest
• Examples
  – [Lobjois and Lemaître, AAAI’98] CSP
    • Mostly propagations of different complexity
  – [Leyton-Brown et al., CP’02] Combinatorial auctions
    • CPLEX + 2 other algorithms (which were thought uncompetitive)
  – [Nudelman et al., CP’04] SAT
    • Many tree-search algorithms from the last SAT competition
    • On average considerably faster than each single algorithm
• The learning problem thus reduces to fitting the weights w = (w1, ..., wm)
• To better capture the vast differences in runtime, estimate the logarithm of runtime: e.g. yi = 5 means the runtime is 10^5 sec
• Features can be computed quickly (in seconds)
  – Basic properties like #vars, #clauses, ratio
  – Estimates of search space size
  – Linear programming bounds
  – Local search probes
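The modelling approach above can be sketched in a few lines of Python/NumPy. All feature names and values below are made up for illustration; the point is only the log-runtime transform and the linear least-squares fit:

```python
import numpy as np

# Hypothetical toy data: one row of features per instance
# (e.g. #vars, #clauses, ratio) and measured runtimes in seconds.
X = np.array([[100.0, 430.0, 4.3],
              [150.0, 600.0, 4.0],
              [200.0, 900.0, 4.5]])
runtimes = np.array([12.0, 340.0, 9800.0])

# Regress against log10(runtime) so that errors are measured on
# orders of magnitude rather than raw seconds.
y = np.log10(runtimes)

# Least-squares fit of the weight vector w (with a bias column).
Xb = np.hstack([X, np.ones((len(X), 1))])
w, *_ = np.linalg.lstsq(Xb, y, rcond=None)

# Predicted runtime in seconds for a new instance.
x_new = np.array([120.0, 500.0, 4.2, 1.0])
predicted_seconds = 10 ** (x_new @ w)
```

Predicting in log space and exponentiating back is what lets one model cover runtimes from milliseconds to days.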
• Linear functions are not very powerful
• But you can use the same methodology to learn more complex functions
  – Let φ = (φ1, ..., φq) be arbitrary combinations of the features x1, ..., xm (so-called basis functions)
  – Learn a linear function of the basis functions: f(φ) = φ * w
• Basis functions used in [Nudelman et al. ’04]
  – Original features: xi
  – Pairwise products of features: xi * xj
  – Only a subset of these (drop useless basis functions)
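A minimal sketch of the quadratic basis-function expansion described above; the subset-selection step from [Nudelman et al. ’04] (dropping useless basis functions) is omitted here:

```python
import numpy as np
from itertools import combinations_with_replacement

def quadratic_basis(x):
    """Expand a raw feature vector x into basis functions phi:
    the original features plus all pairwise products x_i * x_j."""
    x = np.asarray(x, dtype=float)
    pairwise = [x[i] * x[j] for i, j in
                combinations_with_replacement(range(len(x)), 2)]
    return np.concatenate([x, pairwise])

# For x = (2, 3): originals 2, 3 plus products 2*2, 2*3, 3*3.
phi = quadratic_basis([2.0, 3.0])  # -> [2, 3, 4, 6, 9]
```

The model stays linear in the weights, so the same least-squares machinery applies unchanged to the expanded vector.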
Algorithm selection based on runtime prediction [Leyton-Brown, Nudelman et al. ’02 & ’04]
• Given n different algorithms A1, ..., An
• Training:
  – Learn n separate functions fj: Φ → R, j = 1...n, one per algorithm
• Test: predict each algorithm's runtime on the new instance and run the one that promises to be fastest
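The selection step can be sketched as follows; the weight vectors stand in for the learned functions fj and are illustrative only:

```python
import numpy as np

def select_algorithm(models, phi):
    """Pick the algorithm whose learned model predicts the smallest
    (log-)runtime on the basis-function vector phi. `models` is a
    list of weight vectors, one per algorithm in the portfolio."""
    predictions = [phi @ w_j for w_j in models]
    return int(np.argmin(predictions))

# Two hypothetical algorithms; the second predicts a lower runtime here.
models = [np.array([1.0, 0.5]), np.array([0.2, 0.1])]
best = select_algorithm(models, np.array([2.0, 4.0]))  # -> 1
```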
• Previous work on runtime prediction we base on [Leyton-Brown, Nudelman et al. ’02 & ’04]
• Part I: Automated parameter setting based on runtime prediction
• Part II: Incremental learning for runtime prediction in a priori unknown domains
Parameter setting based on runtime prediction
• Finding the best default parameter setting for a problem class
  – Generate special-purpose code [Minton ’93]
  – Minimize estimated error [Kohavi & John ’95]
  – Racing algorithm [Birattari et al. ’02]
  – Local search [Hutter ’04]
  – Experimental design [Adenso-Díaz & Laguna ’05]
  – Decision trees [Srivastava & Mediratta ’05]
• Runtime prediction for algorithm selection on a per-instance basis
  – Predict runtime for each algorithm and pick the best [Leyton-Brown, Nudelman et al. ’02 & ’04]
• Runtime prediction for setting parameters on a per-instance basis
Naive application of runtime prediction for parameter setting
• Given one algorithm with n different parameter settings P1, ..., Pn
• Training:
  – Learn n separate functions fj: Φ → R, j = 1...n
• Test: Given a new instance zt+1
  – Predict yj(t+1) = fj(φt+1) for each of the parameter settings
  – Run the algorithm with the setting Pj with minimal yj(t+1)
• If there are too many parameter configurations:
  – Cannot run each parameter setting on each instance
  – Need to generalize (cf. human parameter tuning)
  – With separate functions there is no way to generalize
Application of runtime prediction for parameter setting
• View the parameters as additional features; learn a single function
• Training: Given a set of instances z1, ..., zt
  – For each instance zi:
    • Compute features xi
    • Pick some parameter settings p1, ..., pn
    • Run the algorithm with settings p1, ..., pn to get runtimes y1(i), ..., yn(i)
    • Basis functions φ1(i), ..., φn(i) include the parameter settings
  – Collect pairs (φj(i), yj(i)) (n data points per instance)
  – Learn only a single function g: Φ → R
• Test: Given a new instance zt+1
  – Compute features xt+1
  – Search over parameter settings pj for the one with minimal predicted runtime
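The single-function approach above can be sketched end to end. Everything below is synthetic: random "instance features", random parameter settings, and log-runtimes generated from a known linear dependence, so that the search over settings has a verifiable optimum:

```python
import numpy as np

rng = np.random.default_rng(0)

# Training data: instance features x and parameter settings p are
# concatenated into a single input vector (the simplest basis).
n_train = 200
features = rng.uniform(size=(n_train, 3))   # instance features x
params = rng.uniform(size=(n_train, 2))     # parameter settings p
inputs = np.hstack([features, params])
# Synthetic log-runtimes with a known dependence on the parameters.
log_runtime = inputs @ np.array([1.0, -0.5, 0.2, 2.0, -1.0])

# Learn one function g over features AND parameters.
w, *_ = np.linalg.lstsq(inputs, log_runtime, rcond=None)

# Test phase: fix the features of a new instance, then search a grid
# of candidate settings for the minimal predicted runtime.
x_new = np.array([0.4, 0.6, 0.1])
candidates = [np.array([a, b])
              for a in np.linspace(0, 1, 11)
              for b in np.linspace(0, 1, 11)]
best = min(candidates, key=lambda p: np.concatenate([x_new, p]) @ w)
```

Because the parameters are just more inputs to one model, information generalizes across settings that were never run on a given instance, which the n-separate-functions approach cannot do.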
Summary of automated parameter setting based on runtime prediction
• Learn a single function that maps features and parameter settings to runtime
• Given a new instance
  – Compute the features (they are fixed)
  – Search for the parameter setting that minimizes predicted runtime for these features
• Previous work on runtime prediction we base on [Leyton-Brown, Nudelman et al. ’02 & ’04]
• Part I: Automated parameter setting based on runtime prediction
• Part II: Incremental learning for runtime prediction in a priori unknown domains
Solution: Sequential Bayesian Linear Regression
Update "knowledge" as new data arrives: a probability distribution over the weights w
• Incremental (one (xi, yi) pair at a time)
  – Seamlessly integrate this new data
  – "Optimal": yields the same result as a batch approach
• Efficient
  – Computation: 1 matrix inversion per update
  – Memory: can drop data we have integrated
• Robust
  – Simple to implement (3 lines of Matlab)
  – Provides estimates of uncertainty in predictions
Sequential Bayesian linear regression – intuition
• Instead of predicting a single runtime y, use a probability distribution P(Y)
• The mean of P(Y) is exactly the prediction of the non-Bayesian approach, but we get uncertainty estimates
• Standard linear regression:
  – Training: given training data φ1:n, y1:n, fit the weights w such that y1:n ≈ φ1:n * w
  – Prediction: yn+1 = φn+1 * w
• Bayesian linear regression:
  – Training: given training data φ1:n, y1:n, infer the probability distribution P(w | φ1:n, y1:n) ∝ P(w) * ∏i P(yi | φi, w)
  – Prediction: P(yn+1 | φn+1, φ1:n, y1:n) = ∫ P(yn+1 | w, φn+1) * P(w | φ1:n, y1:n) dw
  – All of these distributions are Gaussian
Sequential Bayesian linear regression – technical
• "Knowledge" about the weights: Gaussian (μw, Σw)
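A sketch of the sequential update, roughly corresponding to the "3 lines of Matlab" mentioned above, here in Python/NumPy. The observation-noise variance is an assumed hyperparameter, and the inversion-based form is chosen for clarity rather than efficiency:

```python
import numpy as np

def bayes_update(mu, Sigma, phi, y, noise_var=1.0):
    """Fold a single observation (phi, y) into the Gaussian weight
    posterior N(mu, Sigma). Equivalent to the batch solution, so the
    data point can be discarded after the update."""
    Sigma_inv = np.linalg.inv(Sigma) + np.outer(phi, phi) / noise_var
    Sigma_new = np.linalg.inv(Sigma_inv)
    mu_new = Sigma_new @ (np.linalg.inv(Sigma) @ mu + phi * y / noise_var)
    return mu_new, Sigma_new

def predict(mu, Sigma, phi, noise_var=1.0):
    """Predictive distribution for a new basis vector phi: Gaussian
    with mean phi^T mu and variance phi^T Sigma phi + noise_var."""
    return phi @ mu, phi @ Sigma @ phi + noise_var

# Start from a broad Gaussian prior; uncertainty shrinks as data arrives.
mu, Sigma = np.zeros(2), 10.0 * np.eye(2)
for phi, y in [(np.array([1.0, 0.0]), 2.0),
               (np.array([0.0, 1.0]), -1.0)]:
    mu, Sigma = bayes_update(mu, Sigma, phi, y)
mean, var = predict(mu, Sigma, np.array([1.0, 1.0]))
```

The predictive variance is what grows large for instances unlike anything seen so far, which is exactly the signal used for unknown domains below.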
Summary of incremental learning for runtime prediction
• Have a probability distribution over the weights:
  – Start with a Gaussian prior, incrementally update it with more data
• Given the Gaussian weight distribution, the predictions are also Gaussians
  – We know how uncertain our predictions are
  – For new domains, we will be very uncertain and only grow more confident after having seen a couple of data points
• Previous work on runtime prediction we base on [Leyton-Brown, Nudelman et al. ’02 & ’04]
• Part I: Automated parameter setting based on runtime prediction
• Part II: Incremental learning for runtime prediction in a priori unknown domains
Domain for our experiments
• SAT
  – Best-studied NP-hard problem
  – Good features already exist [Nudelman et al. ’04]
  – Lots of benchmarks
• Stochastic Local Search (SLS)
  – Runtime prediction has never been done for SLS before
  – Parameter tuning is very important for SLS
  – Parameters are often continuous
• SAPS algorithm [Hutter, Tompkins, Hoos ’02]
  – Still amongst the state of the art
  – Default setting not always best
  – Well, I also know it well ;-)
• But the approach is applicable to almost anything for which we can compute features!
Stochastic Local Search for SAT: Scaling and Probabilistic Smoothing (SAPS) [Hutter, Tompkins, Hoos ’02]
• Clause-weighting algorithm for SAT; was state-of-the-art in 2002
  – Start with all clause weights set to 1
  – Hill-climbing until you hit a local minimum
  – In local minima:
    • Scaling: scale the weights of unsatisfied clauses: wc ← α * wc
    • Probabilistic smoothing: with probability Psmooth, smooth all clause weights: wc ← ρ * wc + (1 - ρ) * average wc
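The scaling and smoothing updates above can be sketched as follows. The default values for α, ρ, and Psmooth are illustrative stand-ins; these continuous parameters are exactly what the automated tuner would set per instance:

```python
import random

def saps_local_minimum_step(clause_weights, unsat, alpha=1.3,
                            rho=0.8, p_smooth=0.05):
    """Sketch of the SAPS clause-weight update in a local minimum:
    scale the weights of unsatisfied clauses by alpha, and with
    probability p_smooth pull every weight toward the mean weight."""
    for c in unsat:
        clause_weights[c] *= alpha                     # scaling
    if random.random() < p_smooth:                     # probabilistic smoothing
        avg = sum(clause_weights) / len(clause_weights)
        clause_weights[:] = [rho * w + (1 - rho) * avg
                             for w in clause_weights]
    return clause_weights
```

Scaling reshapes the search landscape to escape the local minimum, while occasional smoothing keeps the weights from drifting apart without bound.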
• Only satisfiable instances!
• SAT04rand: SAT ’04 competition instances
• mix: mix of lots of different domains from SATLIB: random, graph colouring, blocksworld, inductive inference, logistics, ...
Where uncertainty helps in practice: qualitative differences in training & test set
• Trained on mix, tested on SAT04rand
Where uncertainty helps in practice (2): zoomed to predictions with low uncertainty
• Previous work on runtime prediction we base on [Leyton-Brown, Nudelman et al. ’02 & ’04]
• Part I: Automated parameter setting based on runtime prediction
• Part II: Incremental learning for runtime prediction in a priori unknown domains
• Automated parameter tuning is needed and feasible
  – Algorithm experts waste their time on it
  – A solver can automatically choose appropriate heuristics based on instance characteristics
• Such a solver could be used in practice
  – Learns incrementally from the instances it solves
Future work along these lines
• Increase predictive performance
  – Better features
  – More powerful ML algorithms
• Active learning
  – Run the most informative probes for new domains (needs the uncertainty estimates)
• Use uncertainty
  – Pick the algorithm with maximal probability of success (not the one with minimal expected runtime!)
• More domains
  – Tree search algorithms
  – CP
Future work along related lines
• If there are no features:
  – Local search in parameter space to find the best default parameter setting [Hutter ’04]
• If we can change strategies while running the algorithm:
  – Reinforcement learning for algorithm selection
• Thanks to
  – Youssef Hamadi
  – Kevin Leyton-Brown
  – Eugene Nudelman
  – You, for your attention