Quasi-random resampling
O. Teytaud*, S. Gelly*, S. Lallich**, E. Prudhomme**
* Equipe I&A-TAO, LRI, Université Paris-Sud, Inria, UMR-Cnrs 8623
** Equipe ERIC, Université Lyon 2
Email: [email protected], [email protected]
Koksma & Hlawka: the error in Monte-Carlo integration is bounded by Discrepancy × V
V = total variation (in the sense of Hardy & Krause)
(many generalizations in Hickernell, A Generalized Discrepancy and Quadrature Error Bound, 1997)
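As an aside, the inequality is easy to observe numerically. The sketch below is our own illustration, not from the paper: it compares plain Monte-Carlo with the 1-D van der Corput low-discrepancy sequence on the integral of x² over [0,1], whose exact value is 1/3.

```python
import random

def van_der_corput(i, base=2):
    """Radical inverse of i in the given base: the classic 1-D
    low-discrepancy sequence."""
    x, denom = 0.0, 1.0
    while i > 0:
        i, rem = divmod(i, base)
        denom *= base
        x += rem / denom
    return x

def integrate(points, f):
    """Equal-weight quadrature: average of f over the point set."""
    return sum(f(x) for x in points) / len(points)

f = lambda x: x * x          # exact integral on [0,1] is 1/3
n = 1024
qmc_pts = [van_der_corput(i) for i in range(1, n + 1)]
rng = random.Random(0)
mc_pts = [rng.random() for _ in range(n)]

err_qmc = abs(integrate(qmc_pts, f) - 1 / 3)
err_mc = abs(integrate(mc_pts, f) - 1 / 3)
# the low-discrepancy error is typically far smaller than the pseudo-random one
```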
Which set do you trust?
Which quasi-random numbers ?
« Halton-sequence with a simple scrambling scheme »
● fast (as fast as pseudo-random numbers);
● easy to implement;
● available freely if you don't want to implement it.
(we will not detail how this sequence is built here)
(also: Sobol sequence)
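A minimal sketch of such a sequence, assuming a simple digit-permutation scrambling (one random permutation of the nonzero digits per prime base, keeping the digit 0 fixed); the function names are ours:

```python
import random

def first_primes(d):
    """First d prime numbers, used as the Halton bases."""
    primes, k = [], 2
    while len(primes) < d:
        if all(k % p for p in primes):
            primes.append(k)
        k += 1
    return primes

def scrambled_halton(n, d, seed=0):
    """First n points of a d-dimensional Halton sequence with a simple
    digit scrambling: one random permutation of {1, ..., b-1} per base b,
    with 0 kept fixed so trailing zero digits contribute nothing."""
    rng = random.Random(seed)
    bases = first_primes(d)
    perms = []
    for b in bases:
        p = list(range(1, b))
        rng.shuffle(p)
        perms.append([0] + p)   # digit 0 stays 0
    points = []
    for i in range(1, n + 1):
        point = []
        for b, perm in zip(bases, perms):
            j, x, denom = i, 0.0, 1.0
            while j > 0:
                j, digit = divmod(j, b)
                denom *= b
                x += perm[digit] / denom
            point.append(x)
        points.append(point)
    return points

pts = scrambled_halton(100, 3)
```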
What else besides Monte-Carlo integration?
Thanks to various forms of quasi-random sequences:
● Numerical integration [thousands of papers; Niederreiter 92]
● Learning [Cervellera et al., IEEE TNN 2004; Mary, PhD 2005]
● Optimization [Teytaud et al., EA'2005]
● Modeling of random processes [Gröwe-Kuska et al., BPTP'03, Levy's method]
● Path planning [Tuffin]
... and how to do it in strange spaces?
(1) why resampling is Monte-Carlo integration
(2) quasi-random numbers
(3) quasi-random numbers in strange spaces
(4) applying quasi-random numbers in resampling
(5) when does it work and when doesn't it work ?
Have fun with QR in strange spaces
(3) quasi-random numbers in strange spaces
We have seen that resampling is Monte-Carlo integration, and how Monte-Carlo is replaced by Quasi-Random Monte-Carlo.
But resampling is random in a non-standard space.
We will see how to do Quasi-Random Monte-Carlo in non-standard spaces.
Quasi-random numbers in strange spaces
We have seen hypercubes:
... but we need something else!
Sample of points ---> QR sample of points
Sample of samples ---> QR sample of samples
Quasi-random points in strange spaces
Fortunately, QR-points also exist in various other spaces.
Why not in something isotropic?
How to do it on the sphere? Or for Gaussian distributions?
For the Gaussian: easy!
Generate x in [0,1]^d by quasi-random
Build y such that P(N < y(i)) = x(i), i.e. y(i) = Φ^{-1}(x(i)) with Φ the standard normal CDF
It works because the distribution is the product of the marginal distributions of the y(i)
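In Python, the coordinate-wise inverse CDF is available in the standard library; a minimal sketch (the function name is ours):

```python
from statistics import NormalDist

def gaussian_from_qr(x):
    """Map one quasi-random point x in [0,1]^d to a d-dimensional
    standard Gaussian point, coordinate by coordinate:
    y(i) = Phi^{-1}(x(i)), with Phi the standard normal CDF."""
    inv = NormalDist().inv_cdf
    return [inv(xi) for xi in x]

# x = 0.5 maps to the median 0; symmetric quantiles map to +/- the same value
y = gaussian_from_qr([0.5, 0.975, 0.025])
```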
What about the general case?
Ok!
- generate x in [0,1]^d
- define y(i) such that P(t < y(i) | y(1), y(2), ..., y(i-1)) = x(i) (conditional inverse CDF)
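A minimal sketch of this sequential conditional inversion, on a hypothetical example distribution of our own choosing: the uniform distribution on the triangle {0 ≤ y(2) ≤ y(1) ≤ 1}.

```python
import math

def triangle_from_qr(x1, x2):
    """Sequential conditional inversion for the uniform distribution on
    the triangle {0 <= y2 <= y1 <= 1}:
      - the marginal CDF of y1 is P(Y1 < t) = t^2, so y1 = sqrt(x1);
      - given y1, y2 is uniform on [0, y1], so y2 = x2 * y1."""
    y1 = math.sqrt(x1)
    y2 = x2 * y1
    return y1, y2

y1, y2 = triangle_from_qr(0.25, 0.5)   # (0.5, 0.25)
```

Feeding quasi-random points (x1, x2) into such a map yields a low-discrepancy sample of the target distribution.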
However, this is what we will do
● We do not have anything better than this general method for the strange distributions we are interested in
● At least we can prove the O(1/n) property (see the paper)
● Perhaps there is something much better
● Perhaps there is something much simpler
The QR-numbers in resampling
(4) applying quasi-random numbers in resampling
We have seen that resampling is Monte-Carlo integration, and that we are able to generate quasi-random points for any distribution on continuous domains;
==> it should work
==> let's see in detail how to move the problem to the continuous domain
QR-numbers in resampling
A very particular distribution for QR-points: bootstrap samples. How to move the problem to continuous spaces?
y(i) = x(r(i)) where r(i) is uniformly distributed on {1, ..., n} ==> this is discrete; many solutions exist.
We know: rectangular uniform distribution ---> any continuous distribution
We need: continuous distribution ---> our discrete distribution
What are bootstrap samples?
Our technique works for various forms of resampling:
- subsamples without replacement (random-CV, subagging)
- subsamples with replacement (bagging, bootstrap)
- random partitioning (k-CV).
W.l.o.g., we present here the sampling of n elements from a sample of size n with replacement (= bootstrap resampling).
(useful in e.g. bagging, bias/variance estimation, ...)
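These three forms of resampling, in their plain pseudo-random version, can be sketched as follows (our illustration):

```python
import random

rng = random.Random(0)
data = list(range(10))
n = len(data)

# subsample without replacement (random-CV, subagging)
sub = rng.sample(data, n // 2)

# subsample with replacement (bagging, bootstrap)
boot = [rng.choice(data) for _ in range(n)]

# random partitioning (k-CV): shuffle, then split into k folds
k = 5
shuffled = data[:]
rng.shuffle(shuffled)
folds = [shuffled[i::k] for i in range(k)]
```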
A naive solution
y(i) = x(r(i))
r(1), ..., r(n) = ceil(n × qr) where qr ∈ [0,1]^n
QR in dimension n, with n the number of examples.
Example (n = 5): qr = (0.1, 0.9, 0.84, 0.9, 0.7) ==> r = (1, 5, 5, 5, 4), i.e. counts (1, 0, 0, 1, 3).
==> all permutations of (0.1, 0.9, 0.84, 0.9, 0.7) lead to the same result!
...which does not work.
In practice it does not work better than random.
Two very distinct QR-points can lead to very similar resamples (permutations of a point lead to the same sample).
We have to remove this symmetry.
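The naive mapping and its permutation symmetry can be checked in a few lines (the function name is ours):

```python
import math

def naive_bootstrap_indices(qr_point, n):
    """Naive mapping of one quasi-random point in [0,1]^n to bootstrap
    indices: r(i) = ceil(n * qr(i)), i.e. y(i) = x(r(i)); indices in 1..n."""
    return [max(1, math.ceil(n * q)) for q in qr_point]

r = naive_bootstrap_indices([0.1, 0.9, 0.84, 0.9, 0.7], 5)   # [1, 5, 5, 5, 4]
counts = [r.count(k) for k in range(1, 6)]                    # [1, 0, 0, 1, 3]

# the flaw: permuting the coordinates of the QR-point yields the
# same multiset of indices, hence exactly the same bootstrap sample
r2 = naive_bootstrap_indices([0.9, 0.1, 0.84, 0.7, 0.9], 5)
```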
A less naive solution
z(i) = number of times x(i) appears in the bootstrap sample
Then, we randomly draw the elements in each cluster.
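One way to generate the count vector z directly from a quasi-random point is the sequential conditional inversion of section (3) applied to the multinomial counts: z(1) ~ Binomial(n, 1/n), and z(i), given the previous counts, is binomial over the remaining draws. This is only a sketch of that idea under our own helper names; the paper's actual method additionally reduces dimension via clustering, which we omit here.

```python
def binom_inv_cdf(x, m, p):
    """Smallest k with P(Binomial(m, p) <= k) >= x (inverse CDF by scan)."""
    cdf, pmf = 0.0, (1 - p) ** m
    for k in range(m + 1):
        cdf += pmf
        if cdf >= x or k == m:
            return k
        # recurrence pmf(k+1) = pmf(k) * (m-k)/(k+1) * p/(1-p)
        pmf *= (m - k) / (k + 1) * p / (1 - p)
    return m

def counts_from_qr(qr_point, n):
    """Map a quasi-random point in [0,1]^(n-1) to a bootstrap count
    vector z, where z(i) = multiplicity of x(i), via sequential
    conditional inversion: z(1) ~ Bin(n, 1/n), and
    z(i) | z(1), ..., z(i-1) ~ Bin(remaining, 1/(n-i+1))."""
    z, remaining = [], n
    for i in range(1, n):
        k = binom_inv_cdf(qr_point[i - 1], remaining, 1.0 / (n - i + 1))
        z.append(k)
        remaining -= k
    z.append(remaining)   # the last count is forced
    return z

# the central QR-point maps to the most balanced bootstrap sample
z = counts_from_qr([0.5, 0.5, 0.5, 0.5], 5)
```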
Let's conclude
(1) why resampling is Monte-Carlo integration
(2) quasi-random numbers
(3) quasi-random numbers in strange spaces
(4) applying quasi-random numbers in resampling
(5) when does it work and when doesn't it work ?
Experiments
In our (artificial) experiments:
● QR-randomCV is better than randomCV
● QR-bagging is better than bagging
● QR-subagging is better than subagging
● QR-Bsfd is better than Bsfd (a bootstrap)
But QR-kCV is not better than kCV: kCV already has some derandomization, as each point appears the same number of times in learning.
A typical example
You want to learn a relation x --> y on a huge ordered dataset.
The dataset is too large for your favorite learner.
A traditional solution is subagging: average 100 learners, each trained on a random subset (1/20) of your dataset.
We propose: use QR-sampling to average only 40 learners.
Or do you have a better solution for choosing 40 subsets of 1/20 ?
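For reference, plain (pseudo-random) subagging can be sketched as below; this is not the QR variant, and `train` and `predict` are hypothetical user-supplied callables:

```python
import random

def subag_predict(dataset, train, predict, n_learners=100, frac=0.05, seed=0):
    """Plain subagging sketch: train n_learners models on random subsets
    of size frac * len(dataset) (without replacement), then average the
    models' predictions."""
    rng = random.Random(seed)
    k = max(1, int(frac * len(dataset)))
    models = [train(rng.sample(dataset, k)) for _ in range(n_learners)]
    return lambda x: sum(predict(m, x) for m in models) / n_learners

# toy demo: each "model" is just the mean target of its subset,
# so the averaged prediction approaches the global mean target
dataset = [(float(i), 2.0 * i) for i in range(100)]
train = lambda sub: sum(y for _, y in sub) / len(sub)
predict = lambda m, x: m
f = subag_predict(dataset, train, predict, n_learners=20)
```

The QR version would replace the pseudo-random subset choice by a quasi-random one, as described above.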
Conclusions
Therefore:
● perhaps simpler derandomizations are enough?
● perhaps in cases like CV, in which « symmetrizing » (picking each example the same number of times) is easy, this is useless?
For bagging, subagging and bootstrap, simplifying the approach is not so simple
==> we now use QR-bagging, QR-subagging and QR-bootstrap instead of bagging, subagging and bootstrap.
Further work
Real-world experiments (in progress, for DP applications)
Other dimension reductions (this one involves clustering)
Simplified derandomization methods (jittering, antithetic variables, ...)
Random clustering for dimension reduction? (we have not tested it yet, sorry...)