Automatic Algorithm Configuration based on Local Search
EARG presentation, December 13, 2006
Frank Hutter
2
Motivation
• Want to design the “best” algorithm A to solve a problem
  - Many design choices need to be made
  - Some choices are deferred to later: free parameters of the algorithm
  - Set parameters to maximise empirical performance
• Finding the best parameter configuration is non-trivial
  - Many parameter configurations
  - Many test instances
  - Many runs to get realistic estimates for randomised algorithms
  - Tuning is still often done manually, up to 50% of development time
• Let’s automate tuning!
3
Parameters in different research areas
• NP-hard problems: tree search
  - Variable/value heuristics, learning, restarts, …
• NP-hard problems: local search
  - Percentage of random steps, tabu length, strength of escape moves, …
• Nonlinear optimisation: interior point methods
  - Slack, barrier init, barrier decrease rate, bound multiplier init, …
• Computer vision: object detection
  - Locality, smoothing, slack, …
• Supervised machine learning (NOT model parameters)
  - L1/L2 loss, penalizer, kernel, preprocessing, num. optimizer, …
• Compiler optimisation, robotics, …
4
Related work
• Best fixed parameter setting
  - Search approaches [Minton ’93, ’96], [Hutter ’04], [Cavazos & O’Boyle ’05], [Adenso-Diaz & Laguna ’06], [Audet & Orban ’06]
  - Racing algorithms / bandit solvers [Birattari et al. ’02], [Smith et al. ’04–’06]
  - Stochastic optimisation [Kiefer & Wolfowitz ’52], [Geman & Geman ’84], [Spall ’87]
• Per instance
  - Algorithm selection [Knuth ’75], [Rice ’76], [Lobjois & Lemaître ’98], [Leyton-Brown et al. ’02], [Gebruers et al. ’05]
  - Instance-specific parameter setting [Patterson & Kautz ’02]
• During the algorithm run
  - Portfolios [Kautz et al. ’02], [Carchrae & Beck ’05], [Gagliolo & Schmidhuber ’05, ’06]
  - Reactive search [Lagoudakis & Littman ’01, ’02], [Battiti et al. ’05], [Hoos ’02]
5
Static Algorithm Configuration (SAC)
SAC problem instance: 3-tuple (D, A, Θ), where D is a distribution of problem instances, A is a parameterised algorithm, and Θ is the parameter configuration space of A.
Candidate solution: configuration θ ∈ Θ, with expected cost
C(θ) = E_{I~D}[ Cost(A, θ, I) ]
Stochastic Optimisation Problem
6
Static Algorithm Configuration (SAC)
SAC problem instance: 3-tuple (D, A, Θ), where D is a distribution of problem instances, A is a parameterised algorithm, and Θ is the parameter configuration space of A.
Candidate solution: configuration θ ∈ Θ, with cost
C(θ) = statistic[ CD(A, θ) ]
CD(A, θ): cost distribution of algorithm A with parameter configuration θ across instances from D. Variation is due to the randomisation of A and to variation in the instances (see the sampling sketch below).
Stochastic Optimisation Problem
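To make this definition concrete, here is a minimal sketch of how C(θ) could be approximated by sampling. The callable `algorithm`, the instance list, and the default values are hypothetical placeholders, not part of the talk.

```python
import random

def estimate_cost(algorithm, theta, instances, runs_per_instance=3, seed=0):
    """Monte Carlo approximation of C(theta) = statistic[ CD(A, theta) ].

    `algorithm(theta, instance, rng)` is a hypothetical callable that returns
    the cost (e.g. runtime or runlength) of one randomised run; `instances`
    is a finite sample drawn from the instance distribution D.
    """
    rng = random.Random(seed)
    costs = [algorithm(theta, instance, rng)
             for instance in instances
             for _ in range(runs_per_instance)]
    return sum(costs) / len(costs)  # sample mean as the chosen statistic
```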
7
Parameter tuning in practice
• Manual approaches are often fairly ad hoc
  - Full factorial design
    - Expensive (exponential in the number of parameters)
  - Tweak one parameter at a time
    - Only optimal if the parameters are independent
  - Tweak one parameter at a time until no more improvement is possible
    - Local search → local minimum
• Manual approaches are suboptimal
  - Only find poor parameter configurations
  - Very long tuning time
  - Want to automate
11
What is the objective function?
• User-defined objective function
  - E.g. expected runtime across a number of instances
  - Or expected speedup over a competitor
  - Or average approximation error
  - Or anything else
• BUT: must be able to approximate the objective based on a finite (small) number of samples (illustrated below)
  - Statistic is the expectation → sample mean (weak law of large numbers)
  - Statistic is the median → sample median (converges?)
  - Statistic is the 90% quantile → sample 90% quantile (underestimated with small samples!)
  - Statistic is the maximum (supremum) → cannot generally be approximated from a finite sample
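A rough illustration of these finite-sample statistics, assuming exponentially distributed “runtimes”; the helper name and the data are made up for the example.

```python
import random
import statistics

def empirical_statistics(samples):
    """Finite-sample approximations of the statistics discussed above."""
    s = sorted(samples)
    n = len(s)
    return {
        "mean":   statistics.mean(s),            # -> expectation (weak law of large numbers)
        "median": statistics.median(s),          # -> true median as n grows
        "q90":    s[min(n - 1, int(0.9 * n))],   # 90% quantile; biased low for small n
        "max":    s[-1],                         # supremum generally not recoverable from a finite sample
    }

# Tiny illustration with exponentially distributed "runtimes" (made-up data).
rng = random.Random(1)
print(empirical_statistics([rng.expovariate(1.0) for _ in range(20)]))
```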
12
Parameter tuning as “pure” optimisation problem
• Approximate the objective function (a statistic of a distribution) based on a fixed number N of instances
  - Beam search [Minton ’93, ’96]
  - Genetic algorithms [Cavazos & O’Boyle ’05]
  - Experimental design & local search [Adenso-Diaz & Laguna ’06]
  - Mesh adaptive direct search [Audet & Orban ’06]
• But how large should N be?
  - Too large: takes too long to evaluate a configuration
  - Too small: very noisy approximation → over-tuning
13
Minimum is a biased estimator
• Let x1, …, xn be realisations of random variables X1, …, Xn
• Each xi is a priori an unbiased estimator of E[Xi]
  - Let xj = min(x1, …, xn)
  - This is an unbiased estimator of E[min(X1, …, Xn)], but NOT of E[Xj] (because we’re conditioning on it being the minimum!)
• Example (simulated below)
  - Let Xi ~ Exp(λ), i.e. F(x|λ) = 1 − exp(−λx)
  - E[min(X1, …, Xn)] = (1/n) · E[Xi]
  - I.e. if we just take the minimum and report its runtime, we underestimate the cost by a factor of n (over-confidence)
• Similar issues arise for cross-validation etc.
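A small simulation, assuming Exp(λ) run costs, illustrates the factor-of-n bias claimed above; the function name and constants are illustrative only.

```python
import random

def min_bias_demo(n=10, lam=1.0, repetitions=100_000, seed=0):
    """Reporting the best of n runs underestimates E[X] by roughly a factor of n.

    For X_i ~ Exp(lam): E[X_i] = 1/lam, but E[min(X_1,...,X_n)] = 1/(n*lam).
    """
    rng = random.Random(seed)
    mins = [min(rng.expovariate(lam) for _ in range(n)) for _ in range(repetitions)]
    avg_min = sum(mins) / repetitions
    print(f"E[X_i]             ~ {1 / lam:.3f}")
    print(f"E[min of {n} runs] ~ {avg_min:.3f} (theory: {1 / (n * lam):.3f})")

min_bias_demo()
```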
14
Primer: over-confidence for N=1 sample
Plot: y-axis is the runlength of the best θ found so far.
Training: approximation based on N=1 sample (1 run, 1 instance).
Test: 100 independent runs (on the same instance) for each θ.
Median & quantiles over 100 repetitions of the procedure.
15
Over-tuning
• More training → worse performance
  - Training cost monotonically decreases
  - Test cost can increase
• A big error in the cost approximation leads to over-tuning (in expectation); see the toy simulation below:
  - Let θ* ∈ Θ be the optimal parameter configuration
  - Let θ’ ∈ Θ be a suboptimal parameter configuration with better training performance than θ*
  - If the search finds θ* before θ’ (and it is the best one so far), and the search then finds θ’, training cost decreases but test cost increases
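The following toy simulation (all names, costs, and noise distributions are invented for illustration) shows the effect: the incumbent’s training cost only ever decreases, while its true (test) cost can increase when a lucky suboptimal configuration displaces a better one.

```python
import random

def overtuning_demo(n_configs=200, seed=0):
    """Toy illustration: with N=1 noisy training run per configuration,
    the incumbent's training cost only decreases, but its true (test)
    cost can increase when a lucky suboptimal configuration takes over."""
    rng = random.Random(seed)
    best_train, best_true = float("inf"), None
    for _ in range(n_configs):
        true_cost = rng.uniform(1.0, 2.0)              # hypothetical true cost of this configuration
        train_cost = true_cost * rng.expovariate(1.0)  # single noisy training observation
        if train_cost < best_train:                    # new incumbent (better *training* cost)
            if best_true is not None and true_cost > best_true:
                print(f"over-tuning: train cost {best_train:.3f} -> {train_cost:.3f} (down), "
                      f"true cost {best_true:.3f} -> {true_cost:.3f} (up)")
            best_train, best_true = train_cost, true_cost

overtuning_demo()
```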
18
Other approaches without over-tuning
• Another approach: small N for poor configurations θ, large N for good ones
  - Racing algorithms [Birattari et al. ’02, ’05]
  - Bandit solvers [Smith et al. ’04–’06]
  - But these treat all parameter configurations as independent
    - May work well for small configuration spaces
    - But e.g. SAT4J has over a million possible configurations → even a single run for each is infeasible
• My work: a combination of approaches (a rough sketch follows below)
  - Local search in parameter configuration space
  - Start with N=1 for each θ, increase it whenever θ is re-visited
  - Does not have to visit all configurations; good ones are visited often
  - Does not suffer from over-tuning
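A highly simplified sketch of this idea, not the actual implementation from the talk: `neighbours` and `run_cost` are hypothetical, problem-specific callables, and configurations are assumed to be hashable (e.g. tuples of parameter values).

```python
import random

def incremental_local_search(neighbours, run_cost, theta0, steps=100, seed=0):
    """Local search in configuration space with an increasing number of runs
    N(theta): every visit to a configuration adds one run, so good
    (often-visited) configurations accumulate reliable cost estimates.

    `neighbours(theta)` returns a non-empty list of neighbouring configurations;
    `run_cost(theta, rng)` performs one randomised run and returns its cost.
    """
    rng = random.Random(seed)
    runs = {}  # theta -> list of observed costs; len(runs[theta]) is N(theta)

    def visit(theta):
        runs.setdefault(theta, []).append(run_cost(theta, rng))
        return sum(runs[theta]) / len(runs[theta])  # current cost estimate

    incumbent, inc_cost = theta0, visit(theta0)
    for _ in range(steps):
        candidate = rng.choice(neighbours(incumbent))
        cand_cost = visit(candidate)
        inc_cost = visit(incumbent)      # re-visiting the incumbent grows its N too
        if cand_cost < inc_cost:
            incumbent, inc_cost = candidate, cand_cost
    return incumbent, inc_cost
```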
19
Unbiased ILS in parameter space
• Over-confidence for each θ vanishes as N → ∞
• Increase N for good θ
  - Start with N=0 for all θ
  - Increment N for θ whenever θ is visited
• Can prove convergence
  - Simple property, even applies to round-robin
• Experiments: SAPS on qwh
  (Plots: ParamsILS with N=1 vs. Focused ParamsILS)
22
SAC problem instances
• SAPS [Hutter, Tompkins, Hoos ’02]
  - 4 continuous parameters ⟨α, ρ, wp, Psmooth⟩
  - Each discretised to 7 values → 7^4 = 2,401 configurations (enumerated in the sketch below)
• Iterated Local Search for MPE [Hutter ’04]
  - 4 discrete + 4 continuous parameters (2 of them conditional)
  - In total 2,560 configurations
• Instance distributions
  - Two single instances, one easy (qwh), one harder (uf)
  - Heterogeneous distribution with 98 instances from SATLIB for which the SAPS median is 50K–1M steps
  - Heterogeneous distribution with 50 MPE instances: mixed
• Cost: average runlength, q90 runtime, approximation error
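For example, the 7^4 = 2,401 SAPS configurations arise from the full cross-product of the discretised values; the concrete grid values below are placeholders, not the ones used in the talk.

```python
from itertools import product

# Hypothetical 7-value grids for the four SAPS parameters <alpha, rho, wp, Psmooth>;
# the concrete values are placeholders, not those from the talk.
alpha   = [1.05, 1.1, 1.15, 1.2, 1.25, 1.3, 1.35]
rho     = [0.0, 0.17, 0.33, 0.5, 0.67, 0.83, 1.0]
wp      = [0.0, 0.01, 0.02, 0.04, 0.06, 0.08, 0.10]
psmooth = [0.01, 0.03, 0.05, 0.10, 0.15, 0.20, 0.30]

configurations = list(product(alpha, rho, wp, psmooth))
print(len(configurations))  # 7**4 = 2401
```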
26
Local Search for SAC: Summary
• Direct approach for SAC
• Positive
  - Incremental homing-in on good configurations
  - Uses the structure of the parameter space (unlike bandits)
  - No distributional assumptions
  - Natural treatment of conditional parameters
• Limitations
  - Does not learn
    - Which parameters are important?
    - An unnecessary binary parameter doubles the search space
  - Requires discretisation
    - Could be relaxed (hybrid scheme etc.)