Isomap + TSNE
Machine Learning for Data Science (CS4786) Lecture 9
Course Webpage: http://www.cs.cornell.edu/Courses/cs4786/2017fa/
MANIFOLD BASED DIMENSIONALITY REDUCTION
Key Assumption: Points live on a low-dimensional manifold
Manifold: a subspace that locally looks Euclidean
Given data, can we uncover this manifold?
Can we unfold this?
METHOD I: ISOMAP
1 For every point, find its k nearest neighbors
2 Form the nearest-neighbor graph, with edges weighted by Euclidean distance
3 For every pair of points A and B, define the distance from A to B as the shortest-path distance between A and B on the graph
4 Find points in a low-dimensional space such that the distances between points in this space equal the distances on the graph
(Figure: the pair-wise graph-distance matrix produced by step 3 and fed to step 4.)
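The four steps above can be sketched in a few lines of NumPy/SciPy. This is a minimal illustration under simplifying assumptions (no duplicate points, a connected neighbor graph), not an optimized implementation; the function name `isomap` and its defaults are ours, and in practice one would use a library routine such as `sklearn.manifold.Isomap`.

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path

def isomap(X, n_neighbors=5, n_components=2):
    # Steps 1-2: build the k-nearest-neighbor graph; edges are weighted
    # by Euclidean distance and np.inf marks "no edge".
    n = X.shape[0]
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    W = np.full((n, n), np.inf)
    for i in range(n):
        nn = np.argsort(D[i])[1:n_neighbors + 1]   # index 0 is the point itself
        W[i, nn] = D[i, nn]
    # Step 3: geodesic distance = shortest-path distance on the graph.
    G = shortest_path(W, method="D", directed=False)
    # Step 4: classical MDS -- embed so Euclidean distances match G.
    J = np.eye(n) - np.ones((n, n)) / n            # centering matrix
    B = -0.5 * J @ (G ** 2) @ J                    # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(B)
    top = np.argsort(vals)[::-1][:n_components]    # largest eigenvalues
    return vecs[:, top] * np.sqrt(np.maximum(vals[top], 0.0))
```

Step 4 here is classical multidimensional scaling: the eigendecomposition of the double-centered squared-distance matrix yields coordinates whose pairwise Euclidean distances best match the graph distances.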
ISOMAP: PITFALLS
1 If we don’t take enough nearest neighbors, the graph may not be connected
2 If we connect points that are too far apart, points that should not be connected can get connected
3 There may not be a right number of nearest neighbors to use!
STOCHASTIC NEIGHBORHOOD EMBEDDING
Use a probabilistic notion of which points are neighbors.
Nearby points are neighbors with high probability. E.g., for point x_t, point x_s is picked as a neighbor with probability

    p_{t→s} = exp(−‖x_s − x_t‖² / 2σ²) / ∑_{u≠t} exp(−‖x_u − x_t‖² / 2σ²)

Probability that points s and t are connected: P_{s,t} = P_{t,s} = (p_{t→s} + p_{s→t}) / 2n

Goal: Find y_1, . . . , y_n with stochastic neighborhood distribution Q such that “P and Q are similar”,
i.e. minimize:

    KL(P‖Q) = ∑_{s,t} P_{s,t} log(P_{s,t} / Q_{s,t}) = ∑_{s,t} P_{s,t} log P_{s,t} − ∑_{s,t} P_{s,t} log Q_{s,t}
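The definitions of p_{t→s} and P_{s,t} above translate directly into NumPy. A minimal sketch, assuming a single shared bandwidth σ for all points (real SNE/t-SNE implementations calibrate a per-point σ_t against a target perplexity); the function name is ours.

```python
import numpy as np

def neighbor_probabilities(X, sigma=1.0):
    # Squared pairwise distances ||x_s - x_t||^2.
    n = X.shape[0]
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    logits = -sq / (2.0 * sigma ** 2)
    np.fill_diagonal(logits, -np.inf)   # a point never picks itself as neighbor
    p = np.exp(logits)
    p /= p.sum(axis=1, keepdims=True)   # row t holds p_{t->s}; each row sums to 1
    P = (p + p.T) / (2.0 * n)           # P_{s,t} = (p_{t->s} + p_{s->t}) / 2n
    return p, P
```

Note that each row of p is a probability distribution over neighbors, while the symmetrized P sums to 1 over all pairs, making KL(P‖Q) well defined.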
CHOICE FOR Q
Just like we defined P, we can define Q for given y_1, . . . , y_n by

    q_{t→s} = exp(−‖y_s − y_t‖² / 2σ²) / ∑_{u≠t} exp(−‖y_u − y_t‖² / 2σ²)

and then set Q_{s,t} = (q_{t→s} + q_{s→t}) / 2n.

However, we are then faced with the crowding problem:
In high dimensions there is a lot of room. E.g., in d dimensions we can place d + 1 mutually equidistant points.
For a d-dimensional Gaussian, most points are found at distance ≈ √d from the mean!
If we use Gaussians in both the high- and low-dimensional spaces, all the points get squished into a small region: too many points crowd the center!
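The √d concentration claim above is easy to check empirically: draw standard Gaussian samples in increasing dimension and look at their norms. A quick sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
for d in (2, 10, 100, 1000):
    norms = np.linalg.norm(rng.standard_normal((20000, d)), axis=1)
    # The mean norm approaches sqrt(d) while the relative spread shrinks,
    # so almost all the mass sits in a thin shell of radius ~ sqrt(d).
    print(d, norms.mean() / np.sqrt(d), norms.std() / np.sqrt(d))
```

As d grows, the ratio of mean norm to √d tends to 1 and the normalized spread shrinks toward 0, which is exactly why a Gaussian Q in low dimension cannot reproduce the shell structure of high-dimensional data.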
METHOD II: T-SNE
Instead, for Q we use the Student t-distribution (with one degree of freedom), which is heavy-tailed:

    q_{t→s} = (1 + ‖y_s − y_t‖²)^{−1} / ∑_{u≠t} (1 + ‖y_u − y_t‖²)^{−1}

and then set Q_{s,t} = (q_{t→s} + q_{s→t}) / 2n.

It can be verified that

    ∇_{y_t} KL(P‖Q) = 4 ∑_{s=1}^{n} (P_{s,t} − Q_{s,t}) (y_t − y_s) (1 + ‖y_s − y_t‖²)^{−1}

Algorithm: Find y_1, . . . , y_n by performing gradient descent.
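The gradient formula above translates into a small gradient-descent loop. This is a bare sketch with our own function names: here Q is normalized jointly over all pairs (the form used in the original t-SNE paper, a minor variant of the row-wise normalization on the slide), and practical implementations add momentum, early exaggeration, and perplexity-calibrated P.

```python
import numpy as np

def tsne_grad(Y, P):
    # Heavy-tailed affinities (1 + ||y_s - y_t||^2)^{-1}.
    sq = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
    inv = 1.0 / (1.0 + sq)
    np.fill_diagonal(inv, 0.0)
    Q = inv / inv.sum()                    # joint normalization over all pairs
    diff = Y[:, None, :] - Y[None, :, :]   # diff[t, s] = y_t - y_s
    # grad[t] = 4 * sum_s (P_{s,t} - Q_{s,t}) (y_t - y_s) (1 + ||y_s - y_t||^2)^{-1}
    return 4.0 * np.einsum('ts,ts,tsd->td', P - Q, inv, diff)

def tsne(P, n_components=2, steps=300, lr=1.0, seed=0):
    rng = np.random.default_rng(seed)
    Y = 1e-2 * rng.standard_normal((P.shape[0], n_components))
    for _ in range(steps):
        Y -= lr * tsne_grad(Y, P)          # plain gradient descent
    return Y
```

One useful sanity check: because P and Q are symmetric and (y_t − y_s) is antisymmetric, the gradients summed over all points cancel, so the embedding's center of mass stays fixed during descent.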
Demo