
Post on 14-Oct-2020


Isomap + t-SNE

Machine Learning for Data Science (CS4786), Lecture 9


Course webpage: http://www.cs.cornell.edu/Courses/cs4786/2017fa/

MANIFOLD BASED DIMENSIONALITY REDUCTION

Key Assumption: Points live on a low dimensional manifold

Manifold: a space that locally looks Euclidean

Given data, can we uncover this manifold?

Can we unfold this?

METHOD I: ISOMAP

1 For every point, find its k nearest neighbors

2 Form the nearest-neighbor graph

3 For every pair of points A and B, the distance between A and B is the shortest-path distance between A and B on the graph

4 Find points in a low-dimensional space such that the distances between points in this space equal the distances on the graph.
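The four steps above can be sketched in a few lines of NumPy (a minimal illustration, not a full Isomap implementation: the function name is ours, shortest paths use Floyd-Warshall, and step 4 is done via classical MDS on the squared graph distances):

```python
import numpy as np

def isomap(X, k=5, d=2):
    """Minimal Isomap sketch: k-NN graph -> shortest paths -> classical MDS."""
    n = X.shape[0]
    # Steps 1-2: pairwise Euclidean distances and the k-NN graph.
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    G = np.full((n, n), np.inf)
    np.fill_diagonal(G, 0.0)
    for i in range(n):
        nn = np.argsort(D[i])[1:k + 1]   # skip self at position 0
        G[i, nn] = D[i, nn]
        G[nn, i] = D[i, nn]              # symmetrize the graph
    # Step 3: all-pairs shortest-path distances (Floyd-Warshall).
    for m in range(n):
        G = np.minimum(G, G[:, m:m + 1] + G[m:m + 1, :])
    # Step 4: classical MDS on the squared graph distances.
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    B = -0.5 * J @ (G ** 2) @ J
    vals, vecs = np.linalg.eigh(B)               # ascending eigenvalues
    idx = np.argsort(vals)[::-1][:d]             # keep the top d
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0.0))
```

For data that already lies on a line, the embedding recovers the positions along the line (up to sign and shift).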


Pair-wise distance matrix


ISOMAP: PITFALLS

1 If we don’t take enough nearest neighbors, the graph may not be connected

2 If we connect points that are too far away, points that should not be connected can get connected

3 There may not be a right number of nearest neighbors to use!
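Pitfall 1 is easy to check in practice: run a graph search over the k-NN graph and see whether every point is reachable from the first one. A small sketch (the helper name and test data are ours):

```python
import numpy as np

def knn_connected(X, k):
    """BFS check: is the symmetrized k-NN graph of the rows of X connected?"""
    n = X.shape[0]
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    adj = [set() for _ in range(n)]
    for i in range(n):
        for j in np.argsort(D[i])[1:k + 1]:  # k nearest, skipping self
            adj[i].add(j)
            adj[j].add(i)                    # symmetrize
    seen, stack = {0}, [0]
    while stack:
        for j in adj[stack.pop()]:
            if j not in seen:
                seen.add(j)
                stack.append(j)
    return len(seen) == n
```

With two well-separated clusters of 5 points each, a small k keeps all edges inside the clusters and the graph splits; increasing k eventually forces a cross-cluster edge.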

STOCHASTIC NEIGHBORHOOD EMBEDDING

Use a probabilistic notion of which points are neighbors.

Nearby points are neighbors with high probability. E.g., for point x_t, point x_s is picked as a neighbor with probability

p_{t \to s} = \frac{\exp(-\|x_s - x_t\|^2 / 2\sigma^2)}{\sum_{u \neq t} \exp(-\|x_u - x_t\|^2 / 2\sigma^2)}

The probability that points s and t are connected is P_{s,t} = P_{t,s} = \frac{p_{t \to s} + p_{s \to t}}{2n}.

Goal: Find y_1, \dots, y_n with stochastic neighborhood distribution Q such that “P and Q are similar”, i.e. minimize

KL(P \| Q) = \sum_{s,t} P_{s,t} \log \frac{P_{s,t}}{Q_{s,t}} = \sum_{s,t} P_{s,t} \log P_{s,t} - \sum_{s,t} P_{s,t} \log Q_{s,t}
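In code, the construction of P and the KL objective look like this (a sketch with a single fixed σ; practical SNE instead tunes a per-point σ_t via a perplexity target, and the function names are ours):

```python
import numpy as np

def neighbor_probs(X, sigma=1.0):
    """P_{s,t} from the slide: symmetrized Gaussian neighbor probabilities."""
    n = X.shape[0]
    D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    logits = -D2 / (2 * sigma ** 2)
    np.fill_diagonal(logits, -np.inf)   # u != t: a point is never its own neighbor
    p = np.exp(logits)
    p /= p.sum(axis=1, keepdims=True)   # row t holds p_{t->s}
    return (p + p.T) / (2 * n)          # P_{s,t} = (p_{t->s} + p_{s->t}) / 2n

def kl_divergence(P, Q, eps=1e-12):
    """KL(P || Q), with a small eps guarding against log(0)."""
    return float(np.sum(P * (np.log(P + eps) - np.log(Q + eps))))
```

Note that P sums to 1 over all pairs (each of the n rows of p sums to 1, and the symmetrization divides by 2n), so P is a genuine distribution over pairs.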


CHOICE FOR Q

Just like we defined P, we can define Q for a given y_1, \dots, y_n by

q_{t \to s} = \frac{\exp(-\|y_s - y_t\|^2 / 2\sigma^2)}{\sum_{u \neq t} \exp(-\|y_u - y_t\|^2 / 2\sigma^2)}

and then set Q_{s,t} = \frac{q_{t \to s} + q_{s \to t}}{2n}.

However, we are faced with the crowding problem: in high dimension there is a lot of room. E.g., in d dimensions we can have d + 1 equidistant points.

For d-dimensional Gaussians, most points are found at distance roughly \sqrt{d} from the mean!

If we use Gaussians in both the high- and low-dimensional spaces, all the points get squished into a small space: too many points crowd the center.
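The √d claim is easy to verify empirically (a quick simulation; the sample count is arbitrary):

```python
import numpy as np

# Draw standard Gaussian samples in increasing dimension and compare the
# average norm to sqrt(d): the mass concentrates at distance ~sqrt(d).
rng = np.random.default_rng(0)
for d in (2, 10, 100, 1000):
    Z = rng.standard_normal((5000, d))
    mean_norm = np.linalg.norm(Z, axis=1).mean()
    print(f"d={d:5d}  mean ||z|| = {mean_norm:8.3f}  sqrt(d) = {np.sqrt(d):8.3f}")
```

As d grows, the mean norm hugs √d ever more tightly, which is why a Gaussian Q in two dimensions cannot reproduce the neighbor structure of a Gaussian P in high dimensions.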


METHOD II: T-SNE

Instead, for Q we use the Student t-distribution, which is heavy-tailed:

q_{t \to s} = \frac{(1 + \|y_s - y_t\|^2)^{-1}}{\sum_{u \neq t} (1 + \|y_u - y_t\|^2)^{-1}}

and then set Q_{s,t} = \frac{q_{t \to s} + q_{s \to t}}{2n}.

It can be verified that

\nabla_{y_t} KL(P \| Q) = 4 \sum_{s=1}^{n} (P_{s,t} - Q_{s,t}) (y_t - y_s) (1 + \|y_s - y_t\|^2)^{-1}

Algorithm: Find y_1, \dots, y_n by performing gradient descent.
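Putting the pieces together, the gradient formula above yields a minimal t-SNE loop (a sketch only: fixed learning rate, no momentum or early exaggeration, which the practical algorithm adds; function names and defaults are ours):

```python
import numpy as np

def low_dim_Q(Y):
    """Student-t affinities from the slide, normalized per point t."""
    n = Y.shape[0]
    D2 = ((Y[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    W = 1.0 / (1.0 + D2)
    np.fill_diagonal(W, 0.0)                 # u != t
    q = W / W.sum(axis=1, keepdims=True)     # q[t, s] = q_{t->s}
    return (q + q.T) / (2 * n)               # Q_{s,t} = (q_{t->s} + q_{s->t}) / 2n

def tsne(P, d=2, steps=200, lr=10.0, seed=0):
    """Gradient descent on KL(P||Q) using the slide's gradient formula."""
    n = P.shape[0]
    Y = np.random.default_rng(seed).standard_normal((n, d)) * 1e-2
    for _ in range(steps):
        Q = low_dim_Q(Y)
        D2 = ((Y[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
        coef = 4.0 * (P - Q) / (1.0 + D2)    # (P_{s,t}-Q_{s,t})(1+||y_s-y_t||^2)^-1
        grad = (coef[:, :, None] * (Y[:, None, :] - Y[None, :, :])).sum(axis=1)
        Y -= lr * grad                       # descend on KL(P||Q)
    return Y
```

Row t of `grad` is exactly the sum over s of (P_{s,t} − Q_{s,t})(y_t − y_s)(1 + ‖y_s − y_t‖²)^{−1}, scaled by 4.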


Demo
