What does it mean for data to have shape?
Elizabeth Munch
University at Albany – SUNY :: Dept. of Mathematics & Statistics
Apr 7, 2016
Liz Munch (UAlbany) TDA Apr 7, 2016
Data Point
(-0.02,-1.62) (-1.38,-0.93) (1.22,1.55) (-0.71,-1.48) (-0.17,-0.99) (0.25,-1.19) (-0.48,-1.71) (1.21,1.06) (-0.4,-1.73) (0.21,-1.87) (-0.09,1.23) (-0.95,0.33) (1.07,0.22) (1.87,-0.17) (-1.69,0.06) (-0.76,-0.9) (0.38,1.49) (-0.22,-1.31) (0.67,-1.58) (1.39,1.13) (-1.07,1.2) (1.26,1.02) (0.63,-1.01) (-1.13,0.37) (0.82,1.26) (0.92,0.46) (0.27,-1.22) (1.24,-1.56) (-1.38,1.0) (1.43,0.98) (-0.96,0.98) (1.77,-0.08) (-0.27,1.64) (1.48,1.2) (1.08,1.3)
(-1.16,-0.3) (-1.29,1.5) (-0.14,-1.93) (0.32,1.78) (-1.5,0.72) (-1.28,-0.63) (0.03,1.1) (1.57,-1.05) (-1.5,-0.34) (-0.22,-1.53) (0.39,-1.59) (-1.81,0.59)
(-0.38,-1.63) (-0.69,1.62) (-0.5,1.25) (-1.71,-1.03) (1.1,-0.11) (-0.02,-1.48) (-1.3,-0.25) (-1.37,0.84) (-0.88,-1.39) (-0.38,-1.77) (0.0,1.72) (-0.61,1.75) (0.15,1.74) (-0.11,-1.55) (-1.53,0.2) (-0.96,0.43) (-0.87,0.79) (-0.36,1.03) (1.59,0.15) (-0.13,1.18) (1.21,-0.35) (1.18,-0.85) (-1.2,1.27) (-1.43,-0.91)
(-1.44,-0.06) (-1.86,-0.55) (0.5,-1.24) (-1.78,-0.07) (0.48,-1.22) (-0.43,1.02) (1.37,-0.91) (-1.59,0.98) (1.15,-0.1) (-1.59,-0.6) (0.09,1.25) (0.32,1.53) (0.89,-1.43) (1.15,-1.22) (0.29,1.84) (-0.4,1.61) (-1.57,-1.07)
(-0.29,-1.55) (1.42,-0.99) (0.86,-1.81) (1.43,-1.15) (-0.53,1.65) (-1.18,-0.72) (-0.59,1.22) (-1.22,-0.61) (0.19,-1.26) (1.82,-0.84)
(-0.06,1.36) (-1.27,0.59)
Large Data Sets
Main goal of Topological Data Analysis (TDA)
Find and quantify structure in big data.
Goals of this talk
What tools are available?
How do we fit educational data into this pipeline?
Spoiler alert: I don’t know how to do this...
What does it mean for data to have shape?
Topology ≠ Topography
Topology: the mathematical study of properties of spaces that are preserved under continuous deformations
  - stretching and bending
  - not tearing or gluing
Topography: the study of the shape and features of the surface of the Earth
History
Leonhard Euler (1707-1783)
Euler Characteristic
Images: Wikipedia
History Pt 2
Esoteric field of study 1700-2000
  - Algebraic topology
  - Applications/intersections with dynamical systems
  - Would never be considered “applied” in the traditional sense.
Topology, the pinnacle of human thought. In four centuries it may be useful.
- Alexander Solzhenitsyn, “The First Circle”, 1968
Things change ca. 2000
  - Introduction of Persistent Homology
Main questions
How do we quantify the structure we see?
Can we calculate something to represent the structure?
Very small radius is just dots.
Very large radius is just a blob.
Some range of radii lets us see the big circle.
Some small circles appear and disappear quickly... maybe we get to just call these noise!
How do we quantify this?
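One way to make the dots / circle / blob picture concrete is to count connected pieces and loops at each radius. The sketch below is only an illustration, not the computation used in the talk: it assumes the Vietoris-Rips complex (two points are joined when balls of the given radius around them overlap, and a triangle is filled in when all three of its edges are present), and it computes Betti numbers from boundary-matrix ranks over the reals. The function name `betti_numbers` and the 12-point test circle are made up for this example.

```python
from itertools import combinations

import numpy as np


def betti_numbers(points, radius):
    """Return (b0, b1): the number of connected pieces and of loops.

    Assumption for this sketch: we use the Vietoris-Rips complex, joining two
    points when balls of the given radius around them overlap.
    """
    n = len(points)
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    edges = [(i, j) for i, j in combinations(range(n), 2)
             if dist[i, j] <= 2 * radius]
    edge_index = {e: k for k, e in enumerate(edges)}
    triangles = [t for t in combinations(range(n), 3)
                 if all(pair in edge_index for pair in combinations(t, 2))]

    # Boundary matrix d1 sends each edge (i, j) to vertex j minus vertex i.
    d1 = np.zeros((n, len(edges)))
    for k, (i, j) in enumerate(edges):
        d1[i, k], d1[j, k] = -1.0, 1.0

    # Boundary matrix d2 sends triangle (a, b, c) to (b,c) - (a,c) + (a,b).
    d2 = np.zeros((len(edges), len(triangles)))
    for k, (a, b, c) in enumerate(triangles):
        d2[edge_index[(b, c)], k] = 1.0
        d2[edge_index[(a, c)], k] = -1.0
        d2[edge_index[(a, b)], k] = 1.0

    rank1 = np.linalg.matrix_rank(d1) if edges else 0
    rank2 = np.linalg.matrix_rank(d2) if triangles else 0
    b0 = n - rank1                   # connected pieces
    b1 = len(edges) - rank1 - rank2  # independent loops
    return int(b0), int(b1)


# Hypothetical test data: 12 evenly spaced points on the unit circle.
angles = np.linspace(0.0, 2.0 * np.pi, 12, endpoint=False)
circle = np.column_stack([np.cos(angles), np.sin(angles)])

for r in (0.1, 0.3, 1.25):
    print(r, betti_numbers(circle, r))
```

For these 12 points the small radius gives 12 pieces and no loop, the middle radius gives one piece with one loop, and the large radius gives one piece with no loop, matching the dots, circle, and blob described above.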
Homology & Persistent Homology
What is Homology?
A topological invariant which assigns a sequence of vector spaces, H_k(X), to a given topological space X.
What is Persistent Homology?
A way to watch how the homology of a filtration (sequence) of topological spaces changes so that we can understand something about the space.
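As a small illustration of watching homology change along a filtration, the sketch below tracks only dimension 0 (connected components), which needs nothing beyond a union-find structure. It assumes a filtration indexed by pairwise distance: every point is born at scale 0, and a component dies at the length of the edge that merges it into another one. The function name `h0_persistence` is hypothetical, and loops (dimension 1) would need one of the dedicated persistence packages instead.

```python
import math
from itertools import combinations


def h0_persistence(points):
    """Return (birth, death) pairs for connected components.

    Assumption for this sketch: the scale parameter is the pairwise distance
    at which two points become joined by an edge.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Follow parent links to the representative, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Process edges from shortest to longest, as the filtration grows.
    edges = sorted((math.dist(points[i], points[j]), i, j)
                   for i, j in combinations(range(n), 2))
    pairs = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj              # two components merge: one dies
            pairs.append((0.0, length))
    pairs.append((0.0, math.inf))        # the last component never dies
    return pairs


# Hypothetical data: two well-separated clusters on a line.
clusters = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
print(h0_persistence(clusters))
```

The three short-lived pairs come from points merging inside each cluster; the pair that dies at distance 8 records the gap between the two clusters, and the infinite pair is the single component that remains.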
Circles are useful when you least expect it.
Caveat:
Persistence does more than circles....
Machining Dynamics
[Figure labels: Workpiece, feed, Stable, Unstable]
Images: Firas Khasawneh, SUNY Polytechnic Institute; and Boeing.
Delay embedding
Definition
Given a time series X(t), the delay embedding is
\[ \psi^m_\eta : t \longmapsto \bigl(X(t),\, X(t+\eta),\, \dots,\, X(t+(m-1)\eta)\bigr). \]
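For a sampled time series, the definition above can be sketched in a few lines of NumPy. This is a minimal illustration under one assumption: the delay is given as a whole number of samples (the parameter `delay` below), so η equals `delay` times the sampling interval. The function name `delay_embedding` and the sine test signal are made up for this example.

```python
import numpy as np


def delay_embedding(x, m, delay):
    """Return the point cloud of delay vectors of a sampled time series.

    Row i is (x[i], x[i + delay], ..., x[i + (m - 1) * delay]).
    Assumption for this sketch: delay is measured in samples.
    """
    x = np.asarray(x, dtype=float)
    n_vectors = len(x) - (m - 1) * delay
    if n_vectors <= 0:
        raise ValueError("time series too short for this m and delay")
    # Column k holds the series shifted by k * delay samples.
    return np.column_stack(
        [x[k * delay : k * delay + n_vectors] for k in range(m)]
    )


# Hypothetical signal: two periods of a sine wave, embedded in the plane.
t = np.linspace(0.0, 4.0 * np.pi, 200)
cloud = delay_embedding(np.sin(t), m=2, delay=25)
print(cloud.shape)
```

A periodic signal such as this one traces out a closed loop in the embedding, which is why circles in the point cloud are a useful thing to detect.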
Differentiation by Max Persistence
[Figure: three groups of panels, one for each parameter pair [0.9, 0.07], [1.42, 0.05], and [1.48, 0.25]. Each group shows the signal, its Takens embedding (Y(t) against Y(t + 2.13), Y(t + 1.62), and Y(t + 1.56) respectively), and the persistence diagram (birth radius against death radius).]
Turning Model
Results
Warm colors ⇒ High max persistence ⇒ Chatter
Cool colors ⇒ Low max persistence ⇒ No Chatter
Combination with Machine Learning Methods ⇒ 97% classification accuracy
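Max persistence is the longest lifetime (death minus birth) among the points of a persistence diagram. The sketch below shows that one number being computed; it assumes the diagram is given as a list of (birth, death) pairs and skips classes that never die. The function name `max_persistence` and the sample diagram are hypothetical, and the step from this number to a chatter decision is not shown here.

```python
import math


def max_persistence(diagram):
    """Return the largest death - birth over the finite points of a diagram.

    Assumption for this sketch: the diagram is a list of (birth, death)
    pairs, and pairs with infinite death are ignored.
    """
    lifetimes = [death - birth
                 for birth, death in diagram
                 if math.isfinite(death)]
    return max(lifetimes, default=0.0)


# Hypothetical diagram: two short-lived loops and one prominent loop.
diagram = [(0.0, 0.125), (0.25, 1.5), (0.5, 0.75)]
print(max_persistence(diagram))
```

A diagram with one point far from the diagonal, like the prominent loop here, yields a high value; a diagram whose points all sit near the diagonal yields a low one.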
Mapper
Breast cancer gene expression data
Determine a good filter function
Run mapper
Found a new type of breast cancer (c-MYB+) with high survival rate
Image: Nicolau, Levine, Carlsson, PNAS 2011
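The "pick a filter function, then run mapper" steps can be sketched in plain NumPy. This is a toy version written for illustration, not the Python Mapper package and not the pipeline of the cited study. It assumes a one-dimensional filter, a cover by equal-length overlapping intervals, and clustering by connected components of the graph that joins points within distance `eps`. All names and parameters (`mapper`, `components`, `n_intervals`, `overlap`, `eps`) are hypothetical.

```python
import numpy as np


def components(points, eps):
    """Label connected components of the graph joining points within eps."""
    labels = -np.ones(len(points), dtype=int)
    current = 0
    for start in range(len(points)):
        if labels[start] >= 0:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            near = np.linalg.norm(points - points[i], axis=1) <= eps
            for j in np.flatnonzero(near & (labels < 0)):
                labels[j] = current
                stack.append(j)
        current += 1
    return labels


def mapper(points, filter_values, n_intervals, overlap, eps):
    """Return (nodes, edges) of a toy Mapper graph.

    Each node is a set of point indices (one cluster inside one interval of
    the cover); two nodes are joined when they share at least one point.
    """
    lo, hi = filter_values.min(), filter_values.max()
    length = (hi - lo) / (n_intervals - (n_intervals - 1) * overlap)
    step = length * (1.0 - overlap)
    nodes = []
    for k in range(n_intervals):
        a = lo + k * step
        b = hi if k == n_intervals - 1 else a + length
        idx = np.flatnonzero((filter_values >= a) & (filter_values <= b))
        if len(idx) == 0:
            continue
        labels = components(points[idx], eps)
        for c in range(labels.max() + 1):
            nodes.append(set(idx[labels == c].tolist()))
    edges = {(i, j)
             for i in range(len(nodes))
             for j in range(i + 1, len(nodes))
             if nodes[i] & nodes[j]}
    return nodes, edges


# Hypothetical data: 40 points on a circle, filtered by their x-coordinate.
angles = np.linspace(0.0, 2.0 * np.pi, 40, endpoint=False)
circle = np.column_stack([np.cos(angles), np.sin(angles)])
nodes, edges = mapper(circle, circle[:, 0], n_intervals=3, overlap=0.5, eps=0.2)
print(len(nodes), len(edges))
```

On this circle the middle interval splits into a top arc and a bottom arc, so the graph has four nodes joined in a cycle: a coarse summary that still records the loop in the data.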
Conclusions
Topology can help find structure in data that is not obvious by other means.
Lots of tools available, lots of open-source code for computation!
  - Mapper, Reeb graph, Contour Tree, Merge tree
      - Python mapper - danifold.net/mapper/
  - Persistence
      - Perseus - sas.upenn.edu/~vnanda/perseus/
      - Dionysus - mrzv.org/software/dionysus/
      - R TDA - cran.r-project.org/web/packages/TDA/
      - PHAT - bitbucket.org/phat-code/phat
Input from domain scientists is imperative!
  - What is the right question?
  - What is the right tool?
  - How do we interpret the output?
Thank you!
Collaborators
Jose Perea (MSU)
Firas Khasawneh (SUNY Poly)