Elizabeth Munch SOED 2016

Jan 25, 2017

Colleen Ganley
Transcript
Page 1: Elizabeth Munch SOED 2016

What does it mean for data to have shape?

Elizabeth Munch

University at Albany – SUNY:: Dept. of Mathematics & Statistics

Apr 7, 2016

Liz Munch (UAlbany) TDA Apr 7, 2016 1 / 24

Page 3: Elizabeth Munch SOED 2016

(-0.02,-1.62) (-1.38,-0.93) (1.22,1.55) (-0.71,-1.48) (-0.17,-0.99) (0.25,-1.19) (-0.48,-1.71) (1.21,1.06) (-0.4,-1.73) (0.21,-1.87) (-0.09,1.23) (-0.95,0.33) (1.07,0.22) (1.87,-0.17) (-1.69,0.06) (-0.76,-0.9) (0.38,1.49) (-0.22,-1.31) (0.67,-1.58) (1.39,1.13) (-1.07,1.2) (1.26,1.02) (0.63,-1.01) (-1.13,0.37) (0.82,1.26) (0.92,0.46) (0.27,-1.22) (1.24,-1.56) (-1.38,1.0) (1.43,0.98) (-0.96,0.98) (1.77,-0.08) (-0.27,1.64) (1.48,1.2) (1.08,1.3)
(-1.16,-0.3) (-1.29,1.5) (-0.14,-1.93) (0.32,1.78) (-1.5,0.72) (-1.28,-0.63) (0.03,1.1) (1.57,-1.05) (-1.5,-0.34) (-0.22,-1.53) (0.39,-1.59) (-1.81,0.59)
(-0.38,-1.63) (-0.69,1.62) (-0.5,1.25) (-1.71,-1.03) (1.1,-0.11) (-0.02,-1.48) (-1.3,-0.25) (-1.37,0.84) (-0.88,-1.39) (-0.38,-1.77) (0.0,1.72) (-0.61,1.75) (0.15,1.74) (-0.11,-1.55) (-1.53,0.2) (-0.96,0.43) (-0.87,0.79) (-0.36,1.03) (1.59,0.15) (-0.13,1.18) (1.21,-0.35) (1.18,-0.85) (-1.2,1.27) (-1.43,-0.91)
(-1.44,-0.06) (-1.86,-0.55) (0.5,-1.24) (-1.78,-0.07) (0.48,-1.22) (-0.43,1.02) (1.37,-0.91) (-1.59,0.98) (1.15,-0.1) (-1.59,-0.6) (0.09,1.25) (0.32,1.53) (0.89,-1.43) (1.15,-1.22) (0.29,1.84) (-0.4,1.61) (-1.57,-1.07)
(-0.29,-1.55) (1.42,-0.99) (0.86,-1.81) (1.43,-1.15) (-0.53,1.65) (-1.18,-0.72) (-0.59,1.22) (-1.22,-0.61) (0.19,-1.26) (1.82,-0.84)
(-0.06,1.36) (-1.27,0.59)

Page 5: Elizabeth Munch SOED 2016

Large Data Sets

Main goal of Topological Data Analysis (TDA)

Find and quantify structure in big data.

Goals of this talk

What tools are available?

How do we fit educational data into this pipeline?

Spoiler alert: I don’t know how to do this...

Page 8: Elizabeth Munch SOED 2016

1 Persistent Homology

2 Reeb graphs and Mapper

Page 10: Elizabeth Munch SOED 2016

What does it mean for data to have shape?

Topology ≠ Topography

Topology: Mathematical study of spaces preserved under continuous deformations

stretching and bending

not tearing or gluing

Topography: Study of the shape and features of the surface of the Earth

Page 11: Elizabeth Munch SOED 2016

History

Leonhard Euler (1707-1783)

Euler Characteristic

Images: Wikipedia

Page 12: Elizabeth Munch SOED 2016

History Pt 2

Esoteric field of study 1700-2000
- Algebraic topology
- Applications/intersections with dynamical systems
- Would never be considered “applied” in traditional sense.

Topology, the pinnacle of human thought. In four centuries it may be useful.
- Alexander Solzhenitsyn, “The First Circle” 1968

Things change ca. 2000
- Introduction of Persistent Homology

Page 15: Elizabeth Munch SOED 2016

Main questions

How do we quantify the structure we see?

Can we calculate something to represent the structure?

Page 16: Elizabeth Munch SOED 2016

Very small radius is just dots.

Very large radius is just a blob.

Some range of radii lets us see the big circle.

Some small circles appear and disappear quickly... maybe we get to just call these noise!

How do we quantify this?
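
One way to make the first two observations concrete is to count the connected pieces of the union of balls as the radius grows. The following is a minimal sketch under stated assumptions, not the method used in the talk: the sample points on a circle and the function name `count_components` are hypothetical, and the code only tracks connected pieces. Detecting the big circle itself needs 1-dimensional homology, which dedicated persistence software computes.

```python
import math

def count_components(points, radius):
    """Count connected pieces of the union of closed balls of the given radius.

    Two balls of radius r touch or overlap when their centers are at most
    2r apart, so we flood-fill the graph that links such centers.
    """
    n = len(points)
    seen = [False] * n
    components = 0
    for start in range(n):
        if seen[start]:
            continue
        components += 1            # found a piece we have not visited yet
        seen[start] = True
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if not seen[j] and math.dist(points[i], points[j]) <= 2 * radius:
                    seen[j] = True
                    stack.append(j)
    return components

# Hypothetical sample: 12 evenly spaced points on the unit circle.
circle = [
    (math.cos(2 * math.pi * k / 12), math.sin(2 * math.pi * k / 12))
    for k in range(12)
]

for r in (0.1, 0.3, 1.0):
    print(r, count_components(circle, r))
```

With this sample the count drops from 12 separate dots at radius 0.1 to a single piece at radius 0.3, and stays at one for larger radii.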

Page 22: Elizabeth Munch SOED 2016

Homology & Persistent Homology

What is Homology?

A topological invariant which assigns a sequence of vector spaces, $H_k(X)$, to a given topological space $X$.

What is Persistent Homology?

A way to watch how the homology of a filtration (sequence) of topological spaces changes so that we can understand something about the space.
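
As a rough illustration of the idea in the simplest case, the sketch below computes persistence for connected components only ($H_0$) of a growing union of balls. It is an assumption-laden toy, not the talk's software: the function name `h0_persistence` and the four sample points are hypothetical, and higher-dimensional features such as circles require a full persistent homology package.

```python
import math
from itertools import combinations

def h0_persistence(points):
    """Return (birth, death) pairs for connected components (H_0).

    Filtration: union of balls of radius r around the points. Every
    component is born at r = 0. One component dies each time two of
    them merge, which happens at half the length of the connecting edge.
    """
    parent = list(range(len(points)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    pairs = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:           # two components merge: one of them dies
            parent[root_i] = root_j
            pairs.append((0.0, length / 2))
    pairs.append((0.0, math.inf))      # the last component never dies
    return pairs

# Hypothetical data: two tight pairs of points that are far from each other.
points = [(0.0, 0.0), (0.2, 0.0), (5.0, 0.0), (5.2, 0.0)]
print(h0_persistence(points))
```

For this input there are two short-lived components (dying near radius 0.1), one long-lived component (dying near radius 2.4, when the two pairs finally merge), and one that never dies. The long bar is the "real" structure; the short ones are the kind of feature one might call noise.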

Page 25: Elizabeth Munch SOED 2016

Understanding a persistence diagram

Page 26: Elizabeth Munch SOED 2016

Circles are useful when you least expect it.

Caveat:

Persistence does more than circles....

Page 28: Elizabeth Munch SOED 2016

Machining Dynamics

[Figure labels: Workpiece, feed; panels: Stable, Unstable]

Images: Firas Khasawneh, SUNY Polytechnic Institute; and Boeing.

Page 29: Elizabeth Munch SOED 2016

Chatter

Page 30: Elizabeth Munch SOED 2016

Delay embedding

Definition

Given a time series $X(t)$, the delay embedding is

$\psi_\eta^m : t \mapsto \bigl(X(t), X(t+\eta), \ldots, X(t+(m-1)\eta)\bigr).$
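
The definition can be sketched for a sampled signal as follows. This is a minimal illustration, assuming the delay $\eta$ is a whole number of samples; the function name `delay_embedding` and the sine-wave test signal are hypothetical and not taken from the talk.

```python
import math

def delay_embedding(series, m, eta):
    """Delay embedding of a sampled time series.

    series: samples X(0), X(1), ...
    m:      embedding dimension
    eta:    delay, measured in samples (an integer in this sketch)

    Returns the points (X(t), X(t + eta), ..., X(t + (m - 1) * eta))
    for every t where all m samples exist.
    """
    if m < 1 or eta < 1:
        raise ValueError("m and eta must be positive integers")
    n_points = max(len(series) - (m - 1) * eta, 0)
    return [
        tuple(series[t + k * eta] for k in range(m))
        for t in range(n_points)
    ]

# Hypothetical signal: a sine wave sampled 40 times per period.
signal = [math.sin(2 * math.pi * t / 40) for t in range(200)]

# A delay of a quarter period pairs sin with cos, so the embedded
# points lie on a circle in the plane.
cloud = delay_embedding(signal, m=2, eta=10)
print(len(cloud), cloud[0])
```

A periodic signal turns into a closed loop in the embedded space, which is why a tool that detects circles becomes useful for time series.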

Page 31: Elizabeth Munch SOED 2016

Differentiation by Max Persistence

[Figure: three rows of panels, one for each parameter pair [0.9, 0.07], [1.42, 0.05], and [1.48, 0.25]. Each row shows the Signal, its Takens Embedding (Y(t) against Y(t+2.13), Y(t+1.62), and Y(t+1.56) respectively), and the resulting Persistence Diagram (Birth Radius against Death Radius).]

Page 33: Elizabeth Munch SOED 2016

Turning Model

Results

Warm colors ⇒ High max persistence ⇒ Chatter

Cool colors ⇒ Low max persistence ⇒ No Chatter

Combination with Machine Learning Methods ⇒ 97% classification accuracy
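
The quantity behind the colors can be sketched in a few lines. This is an illustration only: the two diagrams and the cutoff `THRESHOLD` below are made-up values, and the fixed cutoff is a stand-in for the machine learning step mentioned above, not a reproduction of it.

```python
def max_persistence(diagram):
    """Largest lifetime (death - birth) among the points of a persistence diagram."""
    return max((death - birth for birth, death in diagram), default=0.0)

# Hypothetical 1-dimensional diagrams as (birth radius, death radius) pairs.
noisy_diagram = [(0.10, 0.15), (0.20, 0.22), (0.30, 0.36)]   # only short-lived loops
circular_diagram = [(0.10, 0.15), (0.25, 1.40)]              # one long-lived loop

THRESHOLD = 0.5  # assumed cutoff, chosen for illustration only

for name, diagram in [("noisy", noisy_diagram), ("circular", circular_diagram)]:
    score = max_persistence(diagram)
    label = "high max persistence" if score > THRESHOLD else "low max persistence"
    print(name, round(score, 2), label)
```

A point far from the diagonal of the diagram (large death minus birth) signals a prominent loop in the embedded signal, which is the feature the slide associates with chatter.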

Page 34: Elizabeth Munch SOED 2016

1 Persistent Homology

2 Reeb graphs and Mapper

Page 35: Elizabeth Munch SOED 2016

Clustering

Page 36: Elizabeth Munch SOED 2016

1-Dimensional Structure

Page 38: Elizabeth Munch SOED 2016

Original Reeb Graph construction

Page 40: Elizabeth Munch SOED 2016

Mapper

Image: Nicolau Levine Carlsson, PNAS 2011

Page 41: Elizabeth Munch SOED 2016

Mapper

Breast cancer gene expression data

Determine a good filter function

Run Mapper

Found new type of breast cancer (c-MYB+) with high survival rate

Image: Nicolau, Levine, Carlsson, PNAS 2011
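
To show what "pick a filter function, then run Mapper" involves mechanically, here is a toy sketch of the construction in plain Python. Everything in it is an assumption made for illustration: the names `mapper` and `clusters_within`, the x-coordinate filter, the interval cover, and the distance-threshold clustering. It is not the pipeline or the software used in the cited study.

```python
import math
from itertools import combinations

def clusters_within(indices, points, eps):
    """Single-linkage clusters: chains of points with steps of at most eps."""
    remaining = set(indices)
    clusters = []
    while remaining:
        seed = remaining.pop()
        cluster, stack = {seed}, [seed]
        while stack:
            i = stack.pop()
            near = {j for j in remaining if math.dist(points[i], points[j]) <= eps}
            remaining -= near
            cluster |= near
            stack.extend(near)
        clusters.append(frozenset(cluster))
    return clusters

def mapper(points, filter_fn, n_intervals, overlap, eps):
    """Return (nodes, edges) of a Mapper graph for a real-valued filter.

    nodes: list of clusters (sets of point indices)
    edges: pairs of node positions whose clusters share a data point
    """
    values = [filter_fn(p) for p in points]
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_intervals
    pad = width * overlap / 2            # widen each interval on both sides
    nodes = []
    for k in range(n_intervals):
        a, b = lo + k * width - pad, lo + (k + 1) * width + pad
        inside = [i for i, v in enumerate(values) if a <= v <= b]
        nodes.extend(clusters_within(inside, points, eps))
    edges = {
        (u, v)
        for u, v in combinations(range(len(nodes)), 2)
        if nodes[u] & nodes[v]
    }
    return nodes, edges

# Hypothetical data: 24 points on a circle, filtered by their x-coordinate.
circle = [
    (math.cos(2 * math.pi * k / 24), math.sin(2 * math.pi * k / 24))
    for k in range(24)
]
nodes, edges = mapper(circle, filter_fn=lambda p: p[0],
                      n_intervals=4, overlap=0.5, eps=0.3)
print(len(nodes), "nodes,", len(edges), "edges")
```

On this circle the sketch produces six nodes joined in a single loop, so the graph keeps the one feature that matters about the data. The choice of filter, cover, and clustering all affect the output, which is why the slide lists choosing a good filter function as a step of its own.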

Page 43: Elizabeth Munch SOED 2016

Conclusions

Topology can help find structure in data that is not obvious by other means.

Lots of tools available, lots of open-source code for computation!
- Mapper, Reeb graph, Contour Tree, Merge tree
  - Python mapper - danifold.net/mapper/
- Persistence
  - Perseus - sas.upenn.edu/~vnanda/perseus/
  - Dionysus - mrzv.org/software/dionysus/
  - R TDA - cran.r-project.org/web/packages/TDA/
  - PHAT - bitbucket.org/phat-code/phat

Input from domain scientists is imperative!
- What is the right question?
- What is the right tool?
- How do we interpret the output?

Page 46: Elizabeth Munch SOED 2016

Thank you!

Collaborators

Jose Perea (MSU)

Firas Khasawneh (SUNY Poly)

[email protected]
