A survey on mixing coefficients: computation and estimation.

Vitaly Kuznetsov

Courant Institute of Mathematical Sciences, New York University

October 29, 2013



Introduction

Binary classification

Receive a sample $X_1, \ldots, X_m$ with labels in $\{0, 1\}$. Choose a hypothesis $h$ that has good expected performance on unseen data.

$X_1, \ldots, X_m$ are typically assumed to be i.i.d.


Introduction (continued)

Much of learning theory operates under the assumption that data comes from an i.i.d. source.

In certain scenarios this assumption is not appropriate, e.g. time series analysis.

To extend learning theory to these scenarios we need to find a suitable relaxation of the i.i.d. requirement.

One common approach found in the literature is imposing various “mixing conditions”.

Under these mixing conditions the strength of dependence between random variables is measured using “mixing coefficients”.


Outline

Mixing conditions and coefficients: definitions and basic properties.

Computational aspects.

Estimating mixing coefficients.

Discussion.


How can we measure dependence between random variables?

Common measures of dependence are so-called “mixing” coefficients.

Originally introduced to prove laws of large numbers for sequences of dependent variables.


α-mixing coefficient between two σ-algebras

Given a probability space $(\Omega, \mathcal{F}, P)$ and two sub-σ-algebras $\sigma_1$ and $\sigma_2$, define the α-mixing coefficient

\[ \alpha(\sigma_1, \sigma_2) = \sup_{A, B} |P(A)P(B) - P(A \cap B)| \]

where the supremum is taken over all $A \in \sigma_1$ and $B \in \sigma_2$.


ϕ-mixing coefficient

Define the ϕ-mixing coefficient

\[ \phi(\sigma_1 \mid \sigma_2) = \sup_{A, B} |P(A) - P(A \mid B)| \]

where the supremum is taken over all $A \in \sigma_1$ and $B \in \sigma_2$ with $P(B) > 0$.

Note that the ϕ coefficient is not symmetric.


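β-mixing coefficient

Define the β-mixing coefficient between two σ-algebras $\sigma_1$ and $\sigma_2$:

\[ \beta(\sigma_1, \sigma_2) = E \sup_{A} |P(A) - P(A \mid \sigma_2)| \]

where the supremum is taken over all $A \in \sigma_1$.

We can rewrite the β-mixing coefficient as follows:

\[ \beta(\sigma_1, \sigma_2) = \frac{1}{2} \sup \sum_{i=1}^{I} \sum_{j=1}^{J} |P(A_i)P(B_j) - P(A_i \cap B_j)| \]

where the supremum is taken over all finite partitions $A_1, \ldots, A_I$ and $B_1, \ldots, B_J$ of $\Omega$ such that $A_i \in \sigma_1$ and $B_j \in \sigma_2$.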


Alternative definitions of the β-mixing coefficient

This leads to yet another characterization of the β-mixing coefficient:

\[ \beta(\sigma_1, \sigma_2) = \| P_{\sigma_1} \otimes P_{\sigma_2} - P_{\sigma_1 \otimes \sigma_2} \| \]

where $\| \cdot \|$ denotes the total variation distance, i.e. $\|P - Q\| = \sup_A |P(A) - Q(A)|$.

Assuming distributions $P$ and $Q$ have densities $f$ and $g$ respectively,

\[ \|P - Q\| = \frac{1}{2} \int |f - g| \]


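Relations between mixing coefficients

We have the following:

\[ 2\alpha(\sigma_1, \sigma_2) \le \beta(\sigma_1, \sigma_2) \le \phi(\sigma_1, \sigma_2) \]

The second inequality is immediate from the definition.

Proof of the first inequality: for any $A \in \sigma_1$ and $B \in \sigma_2$, the partitions $\{A, A^c\}$ and $\{B, B^c\}$ give

\[ |P(A)P(B) - P(A \cap B)| + |P(A)P(B^c) - P(A \cap B^c)| + |P(A^c)P(B) - P(A^c \cap B)| + |P(A^c)P(B^c) - P(A^c \cap B^c)| \le 2\beta(\sigma_1, \sigma_2) \]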
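The step from this bound to $2\alpha \le \beta$ can be spelled out as follows. This is a short derivation using only the partition form of β with $I = J = 2$; the shorthand $\Delta = P(A)P(B) - P(A \cap B)$ is introduced here for convenience and is not part of the original slides.

```latex
\begin{align*}
P(A)P(B^c) - P(A \cap B^c) &= P(A)\bigl(1 - P(B)\bigr) - \bigl(P(A) - P(A \cap B)\bigr) = -\Delta, \\
P(A^c)P(B) - P(A^c \cap B) &= \bigl(1 - P(A)\bigr)P(B) - \bigl(P(B) - P(A \cap B)\bigr) = -\Delta, \\
P(A^c)P(B^c) - P(A^c \cap B^c) &= \bigl(1 - P(A)\bigr)\bigl(1 - P(B)\bigr) - \bigl(1 - P(A) - P(B) + P(A \cap B)\bigr) = \Delta.
\end{align*}
```

All four terms therefore have absolute value $|\Delta|$, so the left-hand side equals $4|\Delta| \le 2\beta(\sigma_1, \sigma_2)$, i.e. $2|\Delta| \le \beta(\sigma_1, \sigma_2)$. Taking the supremum over $A$ and $B$ gives $2\alpha(\sigma_1, \sigma_2) \le \beta(\sigma_1, \sigma_2)$.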


From two variables to stochastic processes (i)

Let $\{X_t\}_{t=-\infty}^{\infty}$ be a doubly infinite sequence of random variables.

Notation:

$X_i^j = (X_i, X_{i+1}, \ldots, X_j)$

$P_i^j$ is the joint probability distribution of $X_i^j$

$\sigma_i^j$ is the σ-algebra generated by $X_i^j$


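From two variables to stochastic processes (ii)

Define the following mixing coefficients:

\[ \alpha(a) = \sup_t \alpha(\sigma_{-\infty}^{t}, \sigma_{t+a}^{\infty}) \]

\[ \beta(a) = \sup_t \beta(\sigma_{-\infty}^{t}, \sigma_{t+a}^{\infty}) \]

\[ \phi(a) = \sup_t \phi(\sigma_{-\infty}^{t}, \sigma_{t+a}^{\infty}) \]

We say that a sequence of random variables $X_{-\infty}^{\infty}$ is α-, β- or ϕ-mixing if the corresponding mixing coefficient tends to 0 as $a \to \infty$.

These coefficients measure the dependence between the future and the past separated by $a$ time units.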


Stationary stochastic processes

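A stochastic process $X_{-\infty}^{\infty}$ is (strictly) stationary if for any $t \in \mathbb{Z}$ and $k, n \in \mathbb{N}$ the distribution of $X_t^{t+n}$ is the same as the distribution of $X_{t+k}^{t+k+n}$.

For stationary processes the mixing coefficients simplify to

\[ \alpha(a) = \alpha(\sigma_{-\infty}^{0}, \sigma_{a}^{\infty}) \]

\[ \beta(a) = \beta(\sigma_{-\infty}^{0}, \sigma_{a}^{\infty}) \]

\[ \phi(a) = \phi(\sigma_{-\infty}^{0}, \sigma_{a}^{\infty}) \]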


Connections to machine learning

Theorem (M. Mohri, A. Rostamizadeh, 2009): Let $H = \{h : X \to Y\}$ be a set of hypotheses and $L$ be an $M$-bounded loss function. Let $S$ be a sample of size $m = 2\mu a$ from a stationary β-mixing process on $X \times Y$. Then for any $\delta > 4(\mu - 1)\beta(a)$, with probability at least $1 - \delta'$ the following holds for all $h \in H$:

\[ E[L(h(X), Y)] \le \frac{1}{m} \sum_{i=1}^{m} L(h(X_i), Y_i) + \widehat{R}_{S_\mu}(L \circ H) + 3M \sqrt{\log \frac{4}{\delta'}} \]

where $\widehat{R}_{S_\mu}$ denotes the empirical Rademacher complexity and $\delta' = \delta - 4(\mu - 1)\beta(a)$.

Other results of a similar nature are due to R. Meir, M. Mohri and A. Rostamizadeh, and I. Steinwart et al., to name a few.


Can we compute mixing coefficients?

Theorem (M. Ahsen, M. Vidyasagar, 2013): Suppose $X$ and $Y$ are discrete random variables with known joint and marginal probability distributions. Then computing the α-mixing coefficient is NP-hard (equivalent to the “partition problem”).

Ahsen and Vidyasagar also give efficiently computable upper and lower bounds.
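To make the hardness statement concrete, here is a minimal brute-force sketch in Python. It evaluates the definition of α directly by enumerating every pair of events, which takes exponential time in the number of values of X and Y. The function name `alpha_bruteforce` and the list-of-rows input format are assumptions made for this illustration; this is not the bounding procedure of Ahsen and Vidyasagar.

```python
def alpha_bruteforce(theta):
    """alpha(sigma(X), sigma(Y)) for a joint pmf theta[i][j] = P(X = i, Y = j).

    Enumerates every subset A of X-values and B of Y-values, so the cost grows
    like 2**(n_x + n_y): usable only for very small tables.
    """
    n_x, n_y = len(theta), len(theta[0])
    mu = [sum(row) for row in theta]                                   # marginal of X
    nu = [sum(theta[i][j] for i in range(n_x)) for j in range(n_y)]    # marginal of Y
    best = 0.0
    for mask_a in range(1 << n_x):
        rows = [i for i in range(n_x) if (mask_a >> i) & 1]
        p_a = sum(mu[i] for i in rows)
        for mask_b in range(1 << n_y):
            cols = [j for j in range(n_y) if (mask_b >> j) & 1]
            p_b = sum(nu[j] for j in cols)
            p_ab = sum(theta[i][j] for i in rows for j in cols)
            best = max(best, abs(p_a * p_b - p_ab))
    return best
```

For an independent pair the result is 0, while for two perfectly coupled fair coins (`[[0.5, 0.0], [0.0, 0.5]]`) it is 1/4.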


Can we compute mixing coefficients? (continued)

Theorem (M. Ahsen, M. Vidyasagar, 2013): Suppose $X$ and $Y$ are discrete random variables with known joint distribution $\theta_{ij}$ and marginal probability distributions $\mu_i$ and $\nu_j$. Then

\[ \beta(\sigma(X), \sigma(Y)) = \frac{1}{2} \sum_i \sum_j |\gamma_{ij}| \]

\[ \phi(\sigma(X), \sigma(Y)) = \max_j \frac{1}{\nu_j} \sum_i \max(\gamma_{ij}, 0) \]

where $\gamma_{ij} = \theta_{ij} - \mu_i \nu_j$. Thus $\beta(\sigma(X), \sigma(Y))$ and $\phi(\sigma(X), \sigma(Y))$ are both computable in polynomial time.
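The two closed-form expressions translate almost line by line into code. The sketch below is a plain-Python illustration; the function name `beta_phi_discrete` and the list-of-rows representation of θ are assumptions made here, not notation from the cited paper.

```python
def beta_phi_discrete(theta):
    """Return (beta, phi) for a joint pmf theta[i][j] = P(X = i, Y = j)."""
    n_x, n_y = len(theta), len(theta[0])
    mu = [sum(theta[i]) for i in range(n_x)]                           # marginal of X
    nu = [sum(theta[i][j] for i in range(n_x)) for j in range(n_y)]    # marginal of Y
    # gamma_ij = theta_ij - mu_i * nu_j
    gamma = [[theta[i][j] - mu[i] * nu[j] for j in range(n_y)] for i in range(n_x)]
    beta = 0.5 * sum(abs(g) for row in gamma for g in row)
    # Columns with nu_j = 0 carry no mass and are skipped.
    phi = max(
        sum(max(gamma[i][j], 0.0) for i in range(n_x)) / nu[j]
        for j in range(n_y) if nu[j] > 0
    )
    return beta, phi
```

Both quantities cost O(n_x · n_y) operations. As a small check, θ = [[0.1, 0], [0, 0.9]] gives β = 0.18 and ϕ = 0.9, consistent with β ≤ ϕ.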


Estimation of mixing coefficients: naive approach (i)

Question: Given i.i.d. samples $(X_1, Y_1), \ldots, (X_m, Y_m)$ from a joint distribution of real-valued $(X, Y)$, can we estimate any of the mixing coefficients?

Define the following estimators of the joint and marginal distributions:

\[ \hat{F}_X(x) = \frac{1}{m} \sum_{i=1}^{m} I_{\{X_i \le x\}} \]

\[ \hat{F}_Y(y) = \frac{1}{m} \sum_{i=1}^{m} I_{\{Y_i \le y\}} \]

\[ \hat{F}_{X,Y}(x, y) = \frac{1}{m} \sum_{i=1}^{m} I_{\{X_i \le x, Y_i \le y\}} \]

Let $\hat{\beta}$ and $\hat{\phi}$ be estimators of β and ϕ based on the empirical c.d.f.'s.


Estimation of mixing coefficients: naive approach (ii)

Theorem (M. Ahsen, M. Vidyasagar, 2013):

\[ \hat{\phi} \ge \hat{\beta} = \frac{m - 1}{m} \to 1 \quad \text{as } m \to \infty \]

Justification: Under the empirical probability distribution each sample has mass $1/m$. The marginals are also uniform, and hence the product distribution assigns mass $1/m^2$ to each point in the grid $(x_i, y_j)$. The conclusion now follows from the above formula for discrete β.
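The justification can be checked by direct summation over the grid. The sketch below assumes all sample values are distinct (which holds almost surely for continuous distributions) and uses exact rational arithmetic; the function name `naive_beta` is hypothetical.

```python
from fractions import Fraction

def naive_beta(m):
    """Discrete beta between the empirical joint and the product of empirical
    marginals, for m samples with pairwise distinct x-values and y-values."""
    joint_mass = Fraction(1, m)          # empirical joint: 1/m on each (x_i, y_i)
    product_mass = Fraction(1, m * m)    # product of marginals: 1/m^2 on each (x_i, y_j)
    total = Fraction(0)
    for i in range(m):
        for j in range(m):
            theta_ij = joint_mass if i == j else Fraction(0)
            total += abs(theta_ij - product_mass)
    return total / 2
```

The m diagonal cells contribute m(1/m − 1/m²) and the m² − m off-diagonal cells contribute (m² − m)/m², so the sum is 2(m − 1)/m and the estimator equals (m − 1)/m whatever the true dependence is.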


Estimation of mixing coefficients: histograms (i)

A histogram estimator $\hat{f}$ of a density $f$ based on a sample $X_1, \ldots, X_m$ is

\[ \hat{f}(x) = \sum_{j=1}^{J} \frac{p_j}{m w_j} I_{B_j}(x) \]

where

the $B_j$'s are bins partitioning the region with observations

$p_j = \sum_{i=1}^{m} I_{B_j}(X_i)$ counts the number of samples in bin $B_j$

$w_j$ is the width of the $j$-th bin
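A minimal Python sketch of this estimator follows. The helper names `bin_index` and `histogram_density`, and the convention that bins are half-open with the last bin closed on the right, are assumptions made for this illustration.

```python
from bisect import bisect_right

def bin_index(x, edges):
    """Index j with edges[j] <= x < edges[j+1]; the last bin also includes
    its right endpoint. Returns None when x lies outside all bins."""
    if x < edges[0] or x > edges[-1]:
        return None
    if x == edges[-1]:
        return len(edges) - 2
    return bisect_right(edges, x) - 1

def histogram_density(sample, edges):
    """Build f_hat(x) = sum_j p_j / (m * w_j) * 1[x in B_j] for sorted bin edges."""
    m = len(sample)
    n_bins = len(edges) - 1
    counts = [0] * n_bins                      # p_j: samples falling in bin B_j
    for x in sample:
        j = bin_index(x, edges)
        if j is not None:
            counts[j] += 1
    widths = [edges[j + 1] - edges[j] for j in range(n_bins)]   # w_j

    def f_hat(x):
        j = bin_index(x, edges)
        return 0.0 if j is None else counts[j] / (m * widths[j])

    return f_hat
```

With the sample `[0.1, 0.2, 0.3, 1.5]` and edges `[0.0, 1.0, 3.0]`, the estimate is 0.75 on the first bin and 0.125 on the second, which integrates to 1.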


Estimation of mixing coefficients: histograms (ii)

Given $m$ samples, choose $J_m$ intervals on $\mathbb{R}$ so that each bin contains $\lfloor m/J_m \rfloor$ or $\lfloor m/J_m \rfloor + 1$ samples from both $X$ and $Y$.

Theorem (M. Ahsen, M. Vidyasagar, 2013): Suppose $(X, Y) \sim \theta$, $X \sim \mu$ and $Y \sim \nu$ with $\theta$ being absolutely continuous with respect to $\mu \otimes \nu$. Then $\hat{\beta}$ converges to β provided that $J_m/m \to 0$. If, in addition, the density $f \in L_\infty$, then $\hat{\alpha}$ and $\hat{\phi}$ also converge to α and ϕ respectively.

The measure-theoretic arguments used in the proof establish consistency of the estimators but do not yield error rates.
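One way to realize the binning rule is to assign bins by rank and then plug the resulting contingency table into the discrete formula for β. The sketch below assumes there are no ties within the X-values or within the Y-values; the function names are hypothetical, and the code is an illustration of the idea rather than the authors' implementation.

```python
def equal_frequency_bins(values, n_bins):
    """Bin label per value; each bin receives floor(m/J) or floor(m/J)+1 points."""
    m = len(values)
    order = sorted(range(m), key=lambda i: values[i])   # indices sorted by value
    labels = [0] * m
    for rank, i in enumerate(order):
        labels[i] = rank * n_bins // m
    return labels

def beta_hat(xs, ys, n_bins):
    """Plug-in estimate of beta from paired samples using n_bins bins per axis."""
    m = len(xs)
    bx = equal_frequency_bins(xs, n_bins)
    by = equal_frequency_bins(ys, n_bins)
    theta = [[0.0] * n_bins for _ in range(n_bins)]     # empirical joint on the bin grid
    for i, j in zip(bx, by):
        theta[i][j] += 1.0 / m
    mu = [sum(row) for row in theta]
    nu = [sum(theta[i][j] for i in range(n_bins)) for j in range(n_bins)]
    return 0.5 * sum(abs(theta[i][j] - mu[i] * nu[j])
                     for i in range(n_bins) for j in range(n_bins))
```

Setting `n_bins` equal to the sample size reproduces the degenerate value (m − 1)/m of the naive approach, which is one way to see why the number of bins must grow more slowly than m.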


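Estimation of mixing coefficients: stochastic processes (i)

Two-step approximation

\[ |\hat{\beta}_d(a) - \beta(a)| \le |\hat{\beta}_d(a) - \beta_d(a)| + |\beta_d(a) - \beta(a)| \]

where $\beta_d(a) = \sup_t \beta(\sigma_{t-d}^{t}, \sigma_{t+a}^{t+a+d})$ and $\hat{\beta}_d(a)$ is an estimator based on

\[ \hat{\beta}_d(a) = \frac{1}{2} \int |\hat{f}_d \otimes \hat{f}_d - \hat{f}_{2d}| \]

with $\hat{f}_d$, $\hat{f}_{2d}$ being $d$- and $2d$-dimensional histogram estimators.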
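For intuition, the plug-in formula can be written out in the simplest case of one-dimensional blocks. The sketch below is an assumption-laden illustration: it takes the past block to be $X_t$ and the future block to be $X_{t+a}$, uses equal-width bins over the observed range, and requires a non-constant series. With equal-width cells the integral reduces to a sum over cells of |p_i p_j − p_ij|, where p are bin probabilities. The name `beta_hat_d1` is hypothetical.

```python
def beta_hat_d1(series, a, n_bins):
    """Histogram plug-in estimate of beta_d(a) for one-dimensional blocks."""
    lo, hi = min(series), max(series)
    width = (hi - lo) / n_bins               # assumes hi > lo

    def bin_of(x):
        # Equal-width bins; the maximum value is placed in the last bin.
        return min(int((x - lo) / width), n_bins - 1)

    # One-dimensional histogram (bin probabilities) from the whole series.
    p = [0.0] * n_bins
    for x in series:
        p[bin_of(x)] += 1.0 / len(series)

    # Two-dimensional histogram from the pairs (X_t, X_{t+a}).
    pairs = [(series[t], series[t + a]) for t in range(len(series) - a)]
    joint = [[0.0] * n_bins for _ in range(n_bins)]
    for x, y in pairs:
        joint[bin_of(x)][bin_of(y)] += 1.0 / len(pairs)

    # (1/2) * integral |f_d (x) f_d - f_2d| reduces to a sum over grid cells.
    return 0.5 * sum(abs(p[i] * p[j] - joint[i][j])
                     for i in range(n_bins) for j in range(n_bins))
```

For larger block lengths the same computation runs over a grid with `n_bins` raised to the power of twice the block length, which is where the accuracy concerns about high-dimensional histograms come from.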


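Estimation of mixing coefficients: stochastic processes (ii)

Theorem (D. McDonald, C. Shalizi, M. Schervish, 2011): Let $X_1^m$ be a sample from a stationary β-mixing process. For $m = 2\mu_m b_m$ and $d \le \mu_m$ we have that

\[ P(|\hat{\beta}_d(a) - \beta_d(a)| \ge \varepsilon) \le 2\exp\left(-\frac{\mu_m \varepsilon_1^2}{2}\right) + 2\exp\left(-\frac{\mu_m \varepsilon_2^2}{2}\right) + 4(\mu_m - 1)\beta(b_m) \]

where $\varepsilon_1 = \varepsilon/2 - E\left[\int |\hat{f}_d - f_d|\right]$ and $\varepsilon_2 = \varepsilon - E\left[\int |\hat{f}_{2d} - f_{2d}|\right]$.

The proof is based on the blocking technique.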


Estimation of mixing coefficients: stochastic processes (iii)

For the term $|\beta_d(a) - \beta(a)|$, a measure-theoretic argument can be used to show that it tends to 0 as $d \to \infty$.

Under the assumption that the densities $f_d$ and $f_{2d}$ are in the Sobolev space $H^2$, McDonald, Shalizi and Schervish argue that $\hat{f}_{2d}$ and $\hat{f}_d$ are consistent.

Choosing $d_m = O(\exp(W(\log m)))$ and $w_m = O(m^{-k_m})$, where

\[ k_m = \frac{W(\log m) + \frac{1}{2}\log m}{\log m \left(\frac{1}{2}\exp(W(\log m)) + 1\right)} \]

and $W$ is the inverse of $w \exp(w)$, they show that the estimator of β based on histograms is consistent.
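To get a feel for how slowly $d_m$ grows, the expressions can be evaluated numerically. The sketch below computes $W$ with a hand-written Newton iteration (an implementation choice made here so that only the standard library is needed) and returns the raw expressions with the unspecified $O(\cdot)$ constants taken to be 1, which is an assumption. The function names are hypothetical.

```python
import math

def lambert_w(y, tol=1e-12):
    """Principal branch of W for y >= 0: the w >= 0 solving w * exp(w) = y."""
    w = math.log1p(y)                  # starting guess, never below the root for y >= 0
    for _ in range(100):
        ew = math.exp(w)
        step = (w * ew - y) / (ew * (w + 1.0))   # Newton step for w * exp(w) - y
        w -= step
        if abs(step) < tol:
            break
    return w

def histogram_schedule(m):
    """Return (d_m, k_m) for sample size m > 1, with O(.) constants set to 1."""
    log_m = math.log(m)
    w = lambert_w(log_m)
    d_m = math.exp(w)
    k_m = (w + 0.5 * log_m) / (log_m * (0.5 * d_m + 1.0))
    return d_m, k_m
```

Since $W(\log m)\exp(W(\log m)) = \log m$, the value $d = \exp(W(\log m))$ satisfies $d \log d = \log m$, i.e. $d^d = m$, so the block dimension grows extremely slowly with the sample size.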


Estimation of mixing coefficients: discussion

The results do not provide convergence rates.

High-dimensional histogram estimation may not be accurate.

Instead of estimating β directly, an intermediate step is used to estimate densities.

Estimators based on kernels instead of histograms?
