The MDL principle for arbitrary data: either discrete or continuous or none of them
Joe Suzuki (Osaka University)
WITMSE 2013, Sanjo-Kaikan, University of Tokyo, Japan
August 26, 2013

Road Map
1 Problem
2 The Ryabko measure
3 The Radon-Nikodym theorem
4 Generalization
5 Universal Histogram Sequence
6 Conclusion

The slides of this talk are available online (search keywords: Joe Suzuki slideshare):
http://www.slideshare.net/prof-joe/

Problem
Given $\{(x_i, y_i)\}_{i=1}^n$, identify whether $X \perp\!\!\!\perp Y$ or not
$P^n(x^n|\theta)$, $P^n(y^n|\theta)$, $P^n(x^n, y^n|\theta)$: expressed by a parameter $\theta$
$p$: the prior probability of $X \perp\!\!\!\perp Y$

Bayesian solution
$$X \perp\!\!\!\perp Y \iff p\,Q^n(x^n)\,Q^n(y^n) \ge (1 - p)\,Q^n(x^n, y^n)$$
$$Q^n(x^n) := \int P^n(x^n|\theta)\,w(\theta)\,d\theta, \qquad Q^n(y^n) := \int P^n(y^n|\theta)\,w(\theta)\,d\theta$$
$$Q^n(x^n, y^n) := \int P^n(x^n, y^n|\theta)\,w(\theta)\,d\theta$$
using a weight $w$ over $\theta$.
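The decision rule above can be sketched for finite alphabets. This is only an illustration under assumptions that the slide does not fix: the data are treated as i.i.d., the weight $w$ is taken to be the Dirichlet(1/2, ..., 1/2) (Krichevsky-Trofimov) weight, and the names `log_kt` and `independent` are hypothetical.

```python
from collections import Counter
from math import lgamma, log


def log_kt(seq, alphabet_size):
    """log Q^n(seq) for the Dirichlet(1/2) mixture of i.i.d. distributions.

    Closed form: Gamma(k/2) / Gamma(n + k/2) * prod_x Gamma(c[x] + 1/2) / Gamma(1/2).
    Symbols that never occur contribute a factor of 1 and are skipped.
    """
    n = len(seq)
    out = lgamma(alphabet_size / 2) - lgamma(n + alphabet_size / 2)
    for c in Counter(seq).values():
        out += lgamma(c + 0.5) - lgamma(0.5)
    return out


def independent(xs, ys, kx, ky, p=0.5):
    """Decide X independent of Y iff p Q(x^n) Q(y^n) >= (1 - p) Q(x^n, y^n).

    kx, ky are the alphabet sizes of X and Y (assumed known here);
    the comparison is done in the log domain to avoid underflow.
    """
    lhs = log(p) + log_kt(xs, kx) + log_kt(ys, ky)
    rhs = log(1 - p) + log_kt(list(zip(xs, ys)), kx * ky)
    return lhs >= rhs
```

For instance, `independent([0, 0, 1, 1] * 25, [0, 1, 0, 1] * 25, 2, 2)` favours independence, while feeding the same sequence as both `xs` and `ys` does not.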

Q should be an alternative to P as n grows
$A$: the finite set in which $X$ takes values.

Q is a Bayesian measure
Kraft's inequality:
$$\sum_{x^n \in A^n} Q^n(x^n) \le 1 \qquad (1)$$
For example, $Q^n(x^n) = |A|^{-n}$, $x^n \in A^n$, satisfies (1) but does not converge to $P^n$ in any sense (unless $P$ itself is uniform).
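A small numeric check of the uniform example may help. It is a sketch assuming an i.i.d. source, for which $\frac{1}{n}\log\frac{P^n(x^n)}{Q^n(x^n)}$ tends to $\log|A| - H(P)$ in nats; the helper names are hypothetical.

```python
from itertools import product
from math import log


def uniform_q(seq, alphabet_size):
    """Q^n(x^n) = |A|^{-n}: the data are ignored entirely."""
    return alphabet_size ** (-len(seq))


def kraft_sum(alphabet_size, n):
    """Sum of Q^n over every sequence in A^n; (1) asks for this to be <= 1."""
    return sum(uniform_q(s, alphabet_size)
               for s in product(range(alphabet_size), repeat=n))


def limiting_gap(probs):
    """log|A| - H(P): the per-symbol gap between P^n and the uniform Q^n
    for an i.i.d. source with symbol probabilities `probs` (assumption)."""
    entropy = -sum(p * log(p) for p in probs if p > 0)
    return log(len(probs)) - entropy
```

The Kraft sum equals 1 for every `n`, yet `limiting_gap([0.9, 0.1])` stays strictly positive, so the uniform measure is not universal.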

Universal Bayesian Measures
$$Q^n(x^n) := \int P^n(x^n|\theta)\,w(\theta)\,d\theta, \qquad w(\theta) \propto \prod_{x \in A} \theta[x]^{-a[x]} \ \text{with}\ \{a[x] = 1/2\}_{x \in A} \ \text{(Krichevsky-Trofimov)}$$
$$-\frac{1}{n} \log Q^n(x^n) \to H(P)$$
for any $P^n(x^n|\theta) = \prod_{x \in A} \theta[x]^{c[x]}$, where $c[x]$ is the number of occurrences of $x \in A$ in $x^n \in A^n$.
Shannon-McMillan-Breiman:
$$-\frac{1}{n} \log P^n(x^n|\theta) \to H(P)$$
for any stationary ergodic $P$, so that for $P^n(x^n) := P^n(x^n|\theta)$,
$$\frac{1}{n} \log \frac{P^n(x^n)}{Q^n(x^n)} \to 0. \qquad (2)$$
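The Krichevsky-Trofimov mixture can be evaluated sequentially, since its predictive probability of symbol $a$ after $i$ symbols is $(c[a] + 1/2)/(i + |A|/2)$. The sketch below assumes an i.i.d. source with made-up probabilities and a fixed random seed, and compares the per-symbol code length with the entropy in nats.

```python
import random
from math import log


def kt_neg_log(seq, k):
    """-log Q^n(seq) for the KT mixture over an alphabet {0, ..., k-1}.

    Uses the sequential form Q(a | x^i) = (c[a] + 1/2) / (i + k/2),
    which multiplies out to the Dirichlet(1/2) mixture probability.
    """
    counts = [0] * k
    total = 0.0
    for i, x in enumerate(seq):
        total -= log((counts[x] + 0.5) / (i + k / 2))
        counts[x] += 1
    return total


# Illustrative source (an assumption, not taken from the slides).
random.seed(0)
probs = [0.7, 0.2, 0.1]
n = 20000
seq = random.choices(range(3), weights=probs, k=n)
rate = kt_neg_log(seq, 3) / n                   # -(1/n) log Q^n(x^n)
entropy = -sum(p * log(p) for p in probs)       # H(P)
```

With these settings `rate` and `entropy` should differ only by a small amount that shrinks as `n` grows.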

When X has a density function f
There exists a $g$ s.t.
$$\int_{x^n \in \mathbb{R}^n} g^n(x^n)\,dx^n \le 1 \qquad (3)$$
$$\frac{1}{n} \log \frac{f^n(x^n)}{g^n(x^n)} \to 0 \qquad (4)$$
for any $f$ satisfying a condition mentioned later (Ryabko 2009).

The problem in this paper
Universal Bayesian measure in the general settings
What are (1)(2) and (3)(4) for general random variables?
1 without assuming that the variables are either discrete or continuous
2 removing the constraint that Ryabko imposes on $f$
The Ryabko measure
Ryabko measure: X has a density function f
$A$: the set in which $X$ takes values.
$\{A_j\}_{j=0}^\infty$: $A_0 := \{A\}$, and $A_{j+1}$ is a refinement of $A_j$
For example, for $A = [0, 1)$,
  $A_0 = \{[0, 1)\}$
  $A_1 = \{[0, 1/2), [1/2, 1)\}$
  $A_2 = \{[0, 1/4), [1/4, 1/2), [1/2, 3/4), [3/4, 1)\}$
  ...
$s_j : A \to A_j$: $x \in a \in A_j \implies s_j(x) = a$
$\lambda$: the Lebesgue measure
$$f_j(x) := \frac{P_j(s_j(x))}{\lambda(s_j(x))} \quad \text{for } x \in A$$
$Q_j$: a universal Bayesian measure w.r.t. the finite set $A_j$
$f^n(x^n) := f(x_1) \cdots f(x_n)$
$$g_j^n(x^n) := \frac{Q_j^n(s_j(x_1), \cdots, s_j(x_n))}{\lambda(s_j(x_1)) \cdots \lambda(s_j(x_n))}$$
$$g^n(x^n) := \sum_{j=0}^\infty \omega_j\, g_j^n(x^n) \quad \text{for } \{\omega_j\}_{j=0}^\infty \text{ s.t. } \sum_j \omega_j = 1,\ \omega_j > 0$$
$$\frac{1}{n} \log \frac{f^n(x^n)}{g^n(x^n)} \to 0$$
for any $f$ s.t. the differential entropy $h(f_j) \to h(f)$ as $j \to \infty$ (Ryabko, 2009)
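The mixture over histogram levels can be sketched on $A = [0, 1)$ with the dyadic partitions of the example. Several choices here are assumptions made only for illustration: each $g_j^n$ is taken to be the KT probability of the cell sequence divided by the product of the cell lengths, the infinite sum is truncated at `max_level`, the weights are $\omega_j = 1/((j+1)(j+2))$ (which sum to 1 over all $j$, so the truncated mixture still integrates to at most 1), and the test density $f(x) = 2x$ is made up.

```python
import random
from collections import Counter
from math import exp, lgamma, log


def log_kt(cells, k):
    """log of the KT (Dirichlet(1/2)) mixture probability of a cell sequence."""
    out = lgamma(k / 2) - lgamma(len(cells) + k / 2)
    for c in Counter(cells).values():
        out += lgamma(c + 0.5) - lgamma(0.5)
    return out


def log_g(xs, max_level=12):
    """log of the truncated mixture sum_j w_j g_j^n(x^n) on [0, 1)."""
    n = len(xs)
    terms = []
    for j in range(max_level + 1):
        k = 2 ** j                                    # number of cells in A_j
        cells = [min(int(x * k), k - 1) for x in xs]  # s_j(x) as a cell index
        log_gj = log_kt(cells, k) + n * j * log(2)    # divide by lambda(cell)^n = 2^(-j n)
        terms.append(log(1.0 / ((j + 1) * (j + 2))) + log_gj)
    m = max(terms)                                    # log-sum-exp for stability
    return m + log(sum(exp(t - m) for t in terms))


# Illustrative data: X = sqrt(U) has density f(x) = 2x on [0, 1).
random.seed(0)
n = 2000
xs = [random.random() ** 0.5 for _ in range(n)]
log_f = sum(log(2 * x) for x in xs)
gap = (log_f - log_g(xs)) / n                         # (1/n) log f^n / g^n
```

The quantity `gap` is the per-symbol log ratio in (4); it is expected to be small for this smooth density and to shrink further as `n` increases.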
The Radon-Nikodym theorem
In general, exactly when does a density function exist?
$\mathcal{B}$: the collection of all Borel sets of $\mathbb{R}$
$\mu(D) := P(X \in D)$: the probability of $(X \in D)$ for $D \in \mathcal{B}$
$F_X$: the distribution function of $X$

$\mu$ is absolutely continuous w.r.t. $\lambda$ ($\mu \ll \lambda$)
The following two are equivalent:
1 $f : \mathbb{R} \to \mathbb{R}$ exists s.t. $P(X \le x) = F_X(x) = \int_{t \le x} f(t)\,dt$
2 for any $D \in \mathcal{B}$, $\lambda(D) := \int_D dx = 0 \implies \mu(D) = 0$.
$$f(x) = \frac{dF_X(x)}{dx}$$

Even discrete variables have density functions!
$B$: a countable subset of $\mathbb{R}$
$\mu(D) := P(X \in D)$: the probability of $(X \in D)$ for $D \subseteq B$
$r : B \to \mathbb{R}$

$\mu$ is absolutely continuous w.r.t. $\eta$ ($\mu \ll \eta$)
The following two are equivalent:
1 $f : B \to \mathbb{R}$ exists s.t. $P(X \in D) = \sum_{x \in D} f(x)\,r(x)$, $D \subseteq B$
2 for any $D \subseteq B$, $\eta(D) := \sum_{x \in D} r(x) = 0 \implies \mu(D) = 0$.
$$f(x) = \frac{P(X = x)}{r(x)}$$
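The discrete density can be checked with exact arithmetic. This is a minimal sketch with a made-up three-point distribution and made-up weights $r$; `density` and `prob` are hypothetical helper names.

```python
from fractions import Fraction


def density(pmf, r):
    """f(x) = P(X = x) / r(x) for a discrete X.

    Raises if mu << eta fails, i.e. r(x) = 0 while P(X = x) > 0.
    """
    f = {}
    for x, p in pmf.items():
        if r[x] == 0:
            if p != 0:
                raise ValueError(f"mu is not absolutely continuous w.r.t. eta at x={x}")
            f[x] = Fraction(0)      # any value works on an eta-null point
        else:
            f[x] = Fraction(p) / Fraction(r[x])
    return f


def prob(D, f, r):
    """P(X in D) recovered as the sum over D of f(x) r(x)."""
    return sum(f[x] * r[x] for x in D)


# Illustrative numbers (assumptions, not from the slides).
pmf = {1: Fraction(1, 2), 2: Fraction(1, 4), 3: Fraction(1, 4)}
r = {1: Fraction(1), 2: Fraction(2), 3: Fraction(1, 2)}
f = density(pmf, r)
```

Here `f[2]` is `Fraction(1, 8)`, and `prob({2, 3}, f, r)` returns `Fraction(1, 2)`, which matches $P(X \in \{2, 3\})$.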

Radon-Nikodym
$\mu$, $\eta$: $\sigma$-finite measures over a $\sigma$-field $\mathcal{F}$

$\mu$ is absolutely continuous w.r.t. $\eta$ ($\mu \ll \eta$)
The following two are equivalent:
1 an $\mathcal{F}$-measurable $f$ exists s.t. for any $A \in \mathcal{F}$, $\mu(A) = \int_A f(t)\,d\eta(t)$
2 for any $A \in \mathcal{F}$, $\eta(A) = 0 \implies \mu(A) = 0$
$$\int_A f(t)\,d\eta(t) := \sup_{\{A_i\}} \sum_i \Big[\inf_{x \in A_i} f(x)\Big]\,\eta(A_i)$$
where the supremum is over finite partitions $\{A_i\}$ of $A$.
$\frac{d\mu}{d\eta} := f$ is the density function w.r.t. $\eta$ when $\mu$ is a probability measure.
Generalization
When Y has a density function w.r.t. η s.t. µ ≪ η
$B$: the set in which $Y$ takes values.
$\{B_k\}_{k=0}^\infty$: $B_0 := \{B\}$, and $B_{k+1}$ is a refinement of $B_k$
For example, for $B = \mathbb{N} := \{1, 2, \cdots\}$,
  $B_0 = \{B\}$
  $B_1 := \{\{1\}, \{2, 3, \cdots\}\}$
  $B_2 := \{\{1\}, \{2\}, \{3, 4, \cdots\}\}$
  ...
  $B_k := \{\{1\}, \{2\}, \cdots, \{k\}, \{k+1, k+2, \cdots\}\}$
  ...
$t_k : B \to B_k$: $y \in b \in B_k \implies t_k(y) = b$
$\eta$: a measure s.t. $\mu \ll \eta$
$$f_k(y) := \frac{P_k(t_k(y))}{\eta(t_k(y))} \quad \text{for } y \in B$$
$Q_k$: a universal Bayesian measure w.r.t. the finite set $B_k$
Similarly,
$$\frac{1}{n} \log \frac{f^n(y^n)}{g^n(y^n)} \to 0$$
for any $f$ s.t. $h(f_k) \to h(f)$ as $k \to \infty$, where
$$h(f) := \int -f(y) \log f(y)\,d\eta(y)$$
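The histogram densities $f_k$ for $B = \mathbb{N}$ can be computed exactly in a toy case. The sketch below assumes a geometric $\mu$ with $P(Y = y) = (1 - q)q^{y-1}$ and a reference measure $\eta(\{y\}) = 2^{-y}$; both are illustrative choices, and the function names are hypothetical.

```python
from fractions import Fraction


def cell(y, k):
    """t_k(y) as an index: y itself if y <= k, else k + 1 for the tail {k+1, k+2, ...}."""
    return y if y <= k else k + 1


def mu_cell(c, k, q):
    """mu of a cell of B_k under the geometric law (tail mass is q^k)."""
    return (1 - q) * q ** (c - 1) if c <= k else q ** k


def eta_cell(c, k):
    """eta of a cell of B_k with eta({y}) = 2^(-y) (tail mass is 2^(-k))."""
    return Fraction(1, 2) ** c if c <= k else Fraction(1, 2) ** k


def f_k(y, k, q):
    """f_k(y) = P_k(t_k(y)) / eta(t_k(y))."""
    c = cell(y, k)
    return mu_cell(c, k, q) / eta_cell(c, k)


q = Fraction(1, 3)
f_true = (1 - q) * q ** 2 / Fraction(1, 2) ** 3   # f(3) = P(Y = 3) / eta({3})
```

For `y = 3`, the coarse level `k = 2` lumps `3` into the tail and gives `Fraction(4, 9)`, while every level `k >= 3` isolates `{3}` and returns `f_true = Fraction(16, 27)`.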

Generalization
$$\mu^n(D^n) := \int_{D^n} f^n(y^n)\,d\eta^n(y^n), \quad D^n \in \mathcal{B}^n$$
$$\nu^n(D^n) := \int_{D^n} g^n(y^n)\,d\eta^n(y^n), \quad D^n \in \mathcal{B}^n$$
$$\frac{f^n(y^n)}{g^n(y^n)} = \frac{d\mu^n}{d\eta^n}(y^n) \Big/ \frac{d\nu^n}{d\eta^n}(y^n) = \frac{d\mu^n}{d\nu^n}(y^n)$$
$$D(\mu\|\nu) := \int d\mu \log \frac{d\mu}{d\nu}$$
$$h(f) := \int -f(y) \log f(y)\,d\eta(y) = -\int \frac{d\mu}{d\eta}(y) \log \frac{d\mu}{d\eta}(y)\,d\eta(y) = -D(\mu\|\eta)$$
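The identity $h(f) = -D(\mu\|\eta)$ can be checked numerically on a finite set. This sketch uses made-up numbers: with the counting measure as $\eta$ the quantity is the Shannon entropy, and with a probability measure as $\eta$ it is a negative Kullback-Leibler divergence.

```python
from math import log


def entropy_wrt(mu, eta):
    """h(f) = -sum_y f(y) log f(y) eta(y), with f = d(mu)/d(eta) = mu(y) / eta(y)."""
    total = 0.0
    for y, m in mu.items():
        if m > 0:
            f = m / eta[y]
            total -= f * log(f) * eta[y]
    return total


def kl(mu, nu):
    """D(mu || nu) = sum_y mu(y) log(mu(y) / nu(y)); nu need not be normalized."""
    return sum(m * log(m / nu[y]) for y, m in mu.items() if m > 0)


# Illustrative measures on {0, 1, 2} (assumptions).
mu = {0: 0.5, 1: 0.25, 2: 0.25}
counting = {0: 1.0, 1: 1.0, 2: 1.0}
uniform = {0: 1 / 3, 1: 1 / 3, 2: 1 / 3}
shannon = 1.5 * log(2)     # entropy of (1/2, 1/4, 1/4) in nats
```

`entropy_wrt(mu, counting)` agrees with `shannon`, and `entropy_wrt(mu, uniform)` equals `-kl(mu, uniform)`, which is non-positive.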

Result 1
Proposition 1 (Suzuki, 2011)
If $\mu \ll \eta$, there exists a $\nu \ll \eta$ s.t. $\nu^n(\mathbb{R}^n) \le 1$ and
$$\frac{1}{n} \log \frac{d\mu^n}{d\nu^n}(y^n) \to 0$$
for any $\mu$ s.t. $D(\mu_k\|\eta) \to D(\mu\|\eta)$ as $k \to \infty$.

Further generalization
Proposition 1 assumes
  a specific histogram sequence $\{B_k\}$; and
  $\mu$ should satisfy $D(\mu_k\|\eta) \to D(\mu\|\eta)$ as $k \to \infty$

$\{B_k\}$ should be universal
Construct $\{B_k\}$ s.t. $D(\mu_k\|\eta) \to D(\mu\|\eta)$ as $k \to \infty$ for any $\mu$
Universal Histogram Sequence
$B = \mathbb{R}$ and $\mu \ll \lambda$
$\{B_k\} = \{C_k\}$
For each $y \in B$, there exist $K \in \mathbb{N}$ and a unique $\{(a_k, b_k]\}_{k=K}^\infty$ s.t.
$$\begin{cases} y \in (a_k, b_k] \in B_k, & k = K, K+1, \cdots \\ |a_k - b_k| \to 0, & k \to \infty \end{cases}$$
$F_Y$: the distribution function of $Y$
$$f_k(y) = \frac{P(Y \in (a_k, b_k])}{\lambda((a_k, b_k])} = \frac{F_Y(b_k) - F_Y(a_k)}{b_k - a_k} \to f(y), \quad y \in B$$
$h(f_k) \to h(f)$ as $k \to \infty$ for any $f$
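The convergence of the difference quotient to the density can be illustrated numerically. The sketch assumes a standard normal $Y$ and dyadic cells $((m-1)/2^k, m/2^k]$; the transcript does not define $\{C_k\}$, so the dyadic choice is only a stand-in, and the function names are hypothetical.

```python
from math import ceil, erf, exp, pi, sqrt


def cdf(y):
    """Distribution function F_Y of the standard normal."""
    return 0.5 * (1 + erf(y / sqrt(2)))


def pdf(y):
    """Density f of the standard normal."""
    return exp(-y * y / 2) / sqrt(2 * pi)


def dyadic_cell(y, k):
    """The half-open cell (a, b] of width 2^(-k) with a < y <= b."""
    m = ceil(y * 2 ** k)
    return (m - 1) / 2 ** k, m / 2 ** k


def f_k(y, k):
    """(F_Y(b_k) - F_Y(a_k)) / (b_k - a_k) on the level-k cell containing y."""
    a, b = dyadic_cell(y, k)
    return (cdf(b) - cdf(a)) / (b - a)
```

At `y = 0.3` the error `abs(f_k(0.3, k) - pdf(0.3))` is noticeably smaller at `k = 12` than at `k = 4`, in line with $f_k(y) \to f(y)$.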

can be obtained via µ = 1, σ = 1.
For each $y \in B$, there exist $K \in \mathbb{N}$ and a unique $\{D_k\}_{k=1}^\infty$ s.t.
$$\begin{cases} y \in D_k \in B_k, & k = 1, 2, \cdots \\ \{y\} = D_k \in B_k, & k = K, K+1, \cdots \end{cases}$$
$$f_k(y) = \frac{P(Y \in D_k)}{\eta(D_k)} \to f(y) = \frac{P(Y = y)}{\eta(\{y\})}, \quad y \in B$$
$h(f_k) \to h(f)$ as $k \to \infty$ for any $f$

Result 2
Theorem 1
If $\mu \ll \eta$, there exists a $\nu \ll \eta$ s.t. $\nu^n(\mathbb{R}^n) \le 1$ and, for any $\mu$,
$$\frac{1}{n} \log \frac{d\mu^n}{d\nu^n}(y^n) \to 0$$
The proof is based on the following observation:

Billingsley: Probability & Measure, Problem 32.13
$$\lim_{h \to 0} \frac{\mu((x - h, x + h])}{\eta((x - h, x + h])} = f(x), \quad x \in \mathbb{R}$$
This removes the condition Ryabko posed: "for any $\mu$ s.t. $D(\mu_k\|\eta) \to D(\mu\|\eta)$ as $k \to \infty$"
Conclusion
Summary and Discussion

Universal Bayesian Measure
  the random variables may be either discrete or continuous
  a universal histogram sequence to remove Ryabko's condition

Many Applications
  Bayesian network structure estimation (DCC 2012)
  The Bayesian Chow-Liu Algorithm (PGM 2012)
  Markov order estimation even when $\{X_i\}$ is continuous
  Extending MDL:
    $g^n(y^n|m)$: the universal Bayesian measure w.r.t. model $m$ given $y^n \in B^n$
    $p_m$: the prior probability of model $m$
    $$-\log g^n(y^n|m) - \log p_m \to \min$$
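The extended MDL criterion can be sketched for Markov order estimation in the simplest discrete case. The assumptions here are illustrative only: the model $m$ is the Markov order, $g^n(y^n|m)$ is a KT mixture applied per context, the first $m$ symbols are coded uniformly, and the prior $p_m$ is uniform over the candidate orders. The function names are hypothetical.

```python
from collections import defaultdict
from math import log


def neg_log_g(seq, order, k=2):
    """-log g^n(seq | order): KT mixture per context of length `order`.

    The first `order` symbols have no full context and are charged
    log(k) each (an assumption made for this sketch).
    """
    counts = defaultdict(lambda: [0] * k)
    total = order * log(k)
    for i in range(order, len(seq)):
        c = counts[tuple(seq[i - order:i])]
        total -= log((c[seq[i]] + 0.5) / (sum(c) + k / 2))
        c[seq[i]] += 1
    return total


def mdl_order(seq, max_order=3, k=2):
    """Return the order m minimizing -log g^n(seq | m) - log p_m."""
    prior = 1.0 / (max_order + 1)          # uniform p_m (assumption)
    scores = {m: neg_log_g(seq, m, k) - log(prior)
              for m in range(max_order + 1)}
    return min(scores, key=scores.get)
```

On the alternating sequence `[0, 1] * 100` the criterion selects order 1, and on the constant sequence `[0] * 50` it selects order 0, since higher orders only add description length.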