Generalized Notions of Data Depth

Jan 13, 2017

Mukund Raj
Page 1: Generalized Notions of Data Depth

Generalized Notions of Data Depth
Spring 2015 Data Reading Seminar

Mukund Raj

12th Mar, 2015

Page 2: Generalized Notions of Data Depth

Outline

1 Data Depth Background
    What is Data Depth?
    Geometrical Data Depth
    General Properties of Data Depth

2 Generalized Notions of Data Depth
    Functions
    Multivariate Curves
    Sets
    Paths (on a graph)

3 Discussion
    Relaxed Formulations
    Advantages and Limitations of Data Depth

Page 3: Generalized Notions of Data Depth

What is Data Depth?

A means of measuring how deep a data point p is within a cloud of points {p_1, ..., p_n}.

A multivariate data analysis approach that generates order statistics which capture high-dimensional features and relationships.

A descriptive, nonparametric method of statistical analysis.

Page 4: Generalized Notions of Data Depth

Why is Data Depth Interesting?

Estimate location by ordering points from the center outward (with respect to the parent distribution).

Identify outliers.

Formulate quantitative and graphical methods for analyzing distributional characteristics such as location and scale, as well as for hypothesis testing.

Robustness.

Page 5: Generalized Notions of Data Depth

Various Formulations of Data Depth

Geometrical (for Data in Euclidean Space)

L2 depth

Mahalanobis depth

Oja depth

Expected convex hull depth

Zonoid depth

Simplicial depth

Halfspace depth (also called Tukey depth or location depth)

Generalized (for Complex Data)

Functional Band Depth

Depth for Multivariate Curves

Sets

Paths on a Graph

Page 6: Generalized Notions of Data Depth

Geometrical data depth

Depth based on distances / volumes

L2 depth

Mahalanobis depth

Oja depth

Depth based on weighted means

Zonoid depth

Expected Convex Hull depth

Depth based on half spaces and simplices

Tukey depth

Simplicial depth

[Mosler 2012]
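As one concrete instance of a distance-based depth, the sample Mahalanobis depth can be sketched as below. This is a minimal illustration, not code from the talk: the function name `mahalanobis_depth`, the restriction to 2-D points, and the toy point cloud are all assumptions made for the example. It uses the common definition depth(x) = 1 / (1 + d^2), where d^2 is the squared Mahalanobis distance of x from the sample mean.

```python
def mahalanobis_depth(x, points):
    """Sample Mahalanobis depth of a 2-D point x w.r.t. a list of 2-D points.

    Hypothetical helper for illustration; restricted to 2-D so the
    covariance matrix can be inverted by hand without extra libraries.
    """
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    # Entries of the sample covariance matrix (dividing by n - 1).
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    if det <= 0:
        raise ValueError("covariance matrix is singular")
    dx, dy = x[0] - mx, x[1] - my
    # Squared Mahalanobis distance: (x - mean)^T Sigma^{-1} (x - mean).
    d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return 1.0 / (1.0 + d2)


# Toy cloud (assumed data): four corners of a square plus its center.
cloud = [(0, 0), (2, 0), (0, 2), (2, 2), (1, 1)]
center_depth = mahalanobis_depth((1, 1), cloud)
corner_depth = mahalanobis_depth((2, 2), cloud)
far_depth = mahalanobis_depth((10, 10), cloud)
```

The depth is largest at the sample mean and decays as the point moves away from it, which is the center-outward ordering that the other geometrical depths also aim to provide.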

Page 7: Generalized Notions of Data Depth

General Properties of Data Depth

1 Zero at infinity

2 Maximality at Center

3 Monotonicity

4 Affine Invariance

[Zuo and Serfling, 2000]
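Stated a little more formally, the four properties can be sketched as follows. The notation is an assumption introduced here rather than taken from the slides: D(x; F) is the depth of a point x with respect to a distribution F on R^d, and θ is a center of F.

```latex
\begin{align*}
&\text{Zero at infinity:} && D(x; F) \to 0 \ \text{as}\ \|x\| \to \infty \\
&\text{Maximality at center:} && D(\theta; F) = \sup_{x \in \mathbb{R}^d} D(x; F) \\
&\text{Monotonicity:} && D(x; F) \le D(\theta + \alpha (x - \theta); F), \quad \alpha \in [0, 1] \\
&\text{Affine invariance:} && D(Ax + b; F_{AX + b}) = D(x; F_X), \quad A \ \text{nonsingular}
\end{align*}
```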

Page 9: Generalized Notions of Data Depth

Function Ensembles

A function ensemble can be defined as $\{x_i(t),\ i = 1, \dots, n,\ t \in I\}$, where $I$ is an interval in $\mathbb{R}$ and $x_i : \mathbb{R} \to \mathbb{R}$.

Examples: time-series observations such as annual trends of temperature or precipitation, prices of commodities, heights of children versus age, etc.

Page 10: Generalized Notions of Data Depth

Motivation for Functional Band Depth

Challenge with regular multivariate analysis of functions

Curve ensembles that are sampled at different points.

Curse of dimensionality with current methods (e.g. PCA).

Contribution by [Lopez-Pintado et al. 2009]

Given an ensemble of functions (sampled from a distribution), a formulation of the data depth associated with a function.

Page 11: Generalized Notions of Data Depth

Functional Band Depth Formulation

Figure: A functional band [Lopez-Pintado et al. 2009].

Functional band formulation:

$g \subset B(f_1, \dots, f_j) \iff \forall x:\ \min_{i \in \{1, \dots, j\}} f_i(x) \le g(x) \le \max_{i \in \{1, \dots, j\}} f_i(x)$ (1)

Functional band depth formulation:

$BD_j(g) = P\left(g \subset B(f_1, \dots, f_j)\right)$ (2)
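For a finite ensemble, the probability in the band depth formulation is usually estimated as the fraction of j-element subsets whose band contains g. The sketch below assumes curves sampled on a common grid and stored as lists of values; the names `in_band` and `band_depth` and the toy ensemble are hypothetical, introduced only for illustration.

```python
from itertools import combinations


def in_band(g, band_curves):
    """True if g lies between the pointwise min and max of band_curves
    at every sample position."""
    return all(
        min(f[t] for f in band_curves) <= g[t] <= max(f[t] for f in band_curves)
        for t in range(len(g))
    )


def band_depth(g, ensemble, j=2):
    """Fraction of j-subsets of the ensemble whose band contains g."""
    subsets = list(combinations(ensemble, j))
    return sum(in_band(g, s) for s in subsets) / len(subsets)


# Toy ensemble (assumed data): three flat curves and one that crosses them.
curves = [
    [0.0, 0.0, 0.0],
    [1.0, 1.0, 1.0],
    [2.0, 2.0, 2.0],
    [0.0, 3.0, 0.0],
]
depths = [band_depth(c, curves) for c in curves]
deepest = max(range(len(curves)), key=lambda i: depths[i])
```

In this toy ensemble the middle flat curve receives the highest depth, so it would serve as the median curve. The subsets here include the curve being evaluated, which is one common convention for the sample estimate.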

Page 12: Generalized Notions of Data Depth

Visualization of Data Depth for Functions

Figure: Visualization of function ensemble [Lopez-Pintado et al. 2009].

Figure: Boxplot visualization of function ensemble [Sun et al. 2011, Whitaker et al. 2013].

Page 13: Generalized Notions of Data Depth

Multivariate Curve Ensembles

A parameterized curve can be defined in terms of an independent parameter s as:

$c(s) = x(s), \quad c : D \to R, \quad D \subset \mathbb{R}, \quad R \subset \mathbb{R}^d$

Hurricane paths.

Brain tractography data.

Pathline ensemble in fluid simulation.

Figure: A synthetic ensemble of multivariate curves in [Mirzargar et al. 2014]

Page 14: Generalized Notions of Data Depth

Data Depth Formulation for Multivariate Curves

(a) (b)

Figure: Band formed by 3 multivariate curves [Lopez-Pintado et al. 2014, Mirzargar et al. 2014]

Curve band formulation:

$g \subset B(c_{i_1}, \dots, c_{i_j}) \iff \forall x:\ g(x) \in \operatorname{simplex}\left(c_{i_1}(x), \dots, c_{i_j}(x)\right)$ (3)

Curve band depth formulation:

$SBD_j(g) = P\left(g \subset B(c_{i_1}, \dots, c_{i_j})\right)$ (4)
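For curves in the plane with j = 3, the simplex at each parameter value is a triangle, so the band test reduces to a point-in-triangle check at every sample. The sketch below assumes 2-D curves sampled at a common set of parameter values and stored as lists of (x, y) tuples; `cross`, `in_triangle`, `in_curve_band`, `curve_band_depth`, and the toy ensemble are hypothetical names and data for illustration only.

```python
from itertools import combinations


def cross(o, a, b):
    """z-component of the cross product (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def in_triangle(p, a, b, c):
    """True if p lies in the closed triangle abc (boundary included)."""
    d1, d2, d3 = cross(a, b, p), cross(b, c, p), cross(c, a, p)
    if min(d1, d2, d3) < 0 and max(d1, d2, d3) > 0:
        return False  # p is on different sides of two edges
    # The bounding-box check only matters for degenerate (collinear) triangles.
    xs, ys = (a[0], b[0], c[0]), (a[1], b[1], c[1])
    return min(xs) <= p[0] <= max(xs) and min(ys) <= p[1] <= max(ys)


def in_curve_band(g, band_curves):
    """True if g(s) lies in the triangle of the three band curves at every sample."""
    c1, c2, c3 = band_curves
    return all(in_triangle(g[s], c1[s], c2[s], c3[s]) for s in range(len(g)))


def curve_band_depth(g, ensemble):
    """Fraction of 3-subsets of the ensemble whose band contains g."""
    subsets = list(combinations(ensemble, 3))
    return sum(in_curve_band(g, s) for s in subsets) / len(subsets)


# Toy ensemble (assumed data): three outer curves and one curve between them.
curves = [
    [(0, 0), (1, 0)],
    [(4, 0), (5, 0)],
    [(0, 4), (1, 4)],
    [(1, 1), (2, 1)],
]
depths = [curve_band_depth(c, curves) for c in curves]
```

In d dimensions the same idea needs j = d + 1 curves and a point-in-simplex test, which this 2-D sketch does not cover.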

Page 15: Generalized Notions of Data Depth

Visualization of Data Depth for Curves

Figure: Chinese script replicated 100 times [Lopez-Pintado 2014].

Figure: Curve boxplot for hurricane path ensemble [Mirzargar et al. 2014]

Page 16: Generalized Notions of Data Depth

Set / Isocontour Ensembles

Given an ensemble of real-valued functions f(x, y), the sublevel and superlevel sets for any particular isovalue.

Isocontours of temperature field.

Isocontours of pressure field in fluid dynamics simulations.

Figure: A synthetic ensemble of contours in [Whitaker et al. 2013]

Page 17: Generalized Notions of Data Depth

Data Depth Formulation for Sets

Figure: Examples of set band [Whitaker et al. 2013]

Set band formulation:

$S \in sB(S_1, \dots, S_j) \iff \bigcap_{k=1}^{j} S_k \subset S \subset \bigcup_{k=1}^{j} S_k$ (5)

Set band depth formulation:

$sBD_j(S) = P\left(S \in sB(S_1, \dots, S_j)\right)$ (6)
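The set band test translates directly into subset checks: a set is in the band when it contains the intersection of the band-forming sets and is contained in their union. The sketch below assumes each ensemble member is a Python set of discrete cells (for example, grid indices inside a contour); `in_set_band`, `set_band_depth`, and the toy ensemble are hypothetical, chosen only to illustrate the formulation.

```python
from itertools import combinations


def in_set_band(s, band_sets):
    """True if intersection(band_sets) <= s <= union(band_sets)."""
    inter = set.intersection(*band_sets)
    union = set.union(*band_sets)
    return inter <= s <= union


def set_band_depth(s, ensemble, j=2):
    """Fraction of j-subsets of the ensemble whose set band contains s."""
    subsets = list(combinations(ensemble, j))
    return sum(in_set_band(s, b) for b in subsets) / len(subsets)


# Toy ensemble (assumed data): three nested 1-D "regions" and a shifted one.
ensemble = [
    {2, 3},
    {1, 2, 3, 4},
    {0, 1, 2, 3, 4, 5},
    {3, 4, 5, 6, 7},
]
depths = [set_band_depth(s, ensemble) for s in ensemble]
```

Here the middle nested set `{1, 2, 3, 4}` has the highest depth, matching the intuition that it is the most central member.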

Page 18: Generalized Notions of Data Depth

Visualization of Data Depth for Sets

(a)

(b)

Figure: Contour boxplot for an ensemble of isocontours of pressure field [Whitaker et al. 2013]

Page 19: Generalized Notions of Data Depth

Paths (on a graph)

Let G = {V, E, W}. A path p can be denoted as $p : I \to V$, where the index set is I = (1, ..., m).

Paths of packets in computer networks.

Paths on transportation networks modelled as graphs.

Figure: A synthetic ensemble of paths on a graph.

Page 20: Generalized Notions of Data Depth

Data Depth Formulation for Paths

Figure: Illustration of band formed by 3 paths.

Path band formulation:

$p \in B[P_j] \iff p(l) \in H[p_1(l), \dots, p_j(l)] \quad \forall l \in I$ (7)

Path band depth formulation:

$pBD_j(p) = E\left[\chi\left(p \in B[P_j]\right)\right]$ (8)

Page 21: Generalized Notions of Data Depth

Visualization of Data Depth for Paths

(a) (b)

Figure: Path boxplots for paths on AS and road graphs.

Page 23: Generalized Notions of Data Depth

Relaxed formulations

1 Modified Band Depth - Instead of an indicator function, measure the proportion of the object that lies inside the band.

2 ε Subsets - Indicator function with a relaxed threshold.
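The first relaxation can be sketched for sampled functions as follows: rather than scoring each band 0 or 1, the score is the fraction of sample positions at which the curve lies inside the band, averaged over all j-subsets. The name `modified_band_depth` and the toy ensemble are assumptions for illustration, and curves are assumed to share a common sampling grid.

```python
from itertools import combinations


def modified_band_depth(g, ensemble, j=2):
    """Average, over all j-subsets, of the fraction of sample positions
    at which g lies between the pointwise min and max of the subset."""
    subsets = list(combinations(ensemble, j))
    total = 0.0
    for s in subsets:
        inside = sum(
            min(f[t] for f in s) <= g[t] <= max(f[t] for f in s)
            for t in range(len(g))
        )
        total += inside / len(g)
    return total / len(subsets)


# Toy ensemble (assumed data): three flat curves and one that crosses them.
curves = [
    [0.0, 0.0, 0.0],
    [1.0, 1.0, 1.0],
    [2.0, 2.0, 2.0],
    [0.0, 3.0, 0.0],
]
mbd = [modified_band_depth(c, curves) for c in curves]
```

The crossing curve is fully contained in only half of the pairwise bands, but it gets partial credit from the others here, so the relaxed score separates curves that the indicator-based depth would tie.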

Page 24: Generalized Notions of Data Depth

Advantages and Limitations

For Combinatorial Data Depth Formulations for Complex Data

Advantages

No assumption required for the underlying distribution.

Captures nonlocal relationships.

Robust.

Limitations

Computationally expensive for large ensembles.

Page 25: Generalized Notions of Data Depth

Thank You

Questions?
