Unsupervised learning with Random Forest predictors
Abhishek Singh
MSAN 2015-16
Based on the research article by Tao Shi and Steve Horvath. Source: Journal of Computational and Graphical Statistics, Vol. 15, No. 1 (Mar., 2006), pp. 118-138
Understanding Random Forest classifier
Bootstrapping: create multiple samples, drawn with replacement, from the original training data
Multiple decision trees: grow one decision tree from each of these samples
Plurality of votes principle: take the majority vote of the trees as the final prediction
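A minimal sketch of the three ideas above in plain Python, under simplifying assumptions: each "tree" is a one-split decision stump, and the random feature selection a real Random Forest performs at each split is omitted. The names `bootstrap_sample`, `fit_stump`, `fit_forest` and `predict` are hypothetical illustrations, not a library API.

```python
import random
from collections import Counter


def bootstrap_sample(X, y, rng):
    """Draw len(X) rows with replacement: one bootstrap sample."""
    idx = [rng.randrange(len(X)) for _ in range(len(X))]
    return [X[i] for i in idx], [y[i] for i in idx]


def fit_stump(X, y):
    """Fit a one-split tree (decision stump) with the fewest training errors.

    Simplification: a stump stands in for a full decision tree, and no
    random feature subsetting is done, to keep the sketch short.
    """
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[j] <= t]
            right = [lab for row, lab in zip(X, y) if row[j] > t]
            left_lab = Counter(left).most_common(1)[0][0]
            right_lab = Counter(right).most_common(1)[0][0] if right else left_lab
            errors = sum(lab != left_lab for lab in left) + sum(lab != right_lab for lab in right)
            if best is None or errors < best[0]:
                best = (errors, j, t, left_lab, right_lab)
    _, j, t, left_lab, right_lab = best
    return lambda row: left_lab if row[j] <= t else right_lab


def fit_forest(X, y, n_trees=25, seed=0):
    """Grow one stump per bootstrap sample."""
    rng = random.Random(seed)
    return [fit_stump(*bootstrap_sample(X, y, rng)) for _ in range(n_trees)]


def predict(forest, row):
    """Plurality vote over the individual trees' predictions."""
    return Counter(tree(row) for tree in forest).most_common(1)[0][0]


# Toy data: one feature, two well-separated classes
X = [[0.0], [1.0], [2.0], [10.0], [11.0], [12.0]]
y = [0, 0, 0, 1, 1, 1]
forest = fit_forest(X, y)
print(predict(forest, [-1.0]), predict(forest, [20.0]))
```

A few bootstrap samples may contain only one class, so their stumps vote the same way everywhere; the majority vote over all trees smooths this out.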
Applying Random Forest (RF) in Unsupervised learning
1. Compute Dissimilarity Matrix
2. Project the Dissimilarity Matrix into Euclidean space
3. Create Random Forest clusters
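How can a classifier be trained on unlabeled data? The usual device, which the underlying article builds on, is to label the observed rows as class 1, create a synthetic class 2 by sampling each variable independently from its observed values, and train the forest to tell the two apart. The sketch below covers only this data-preparation part; `make_synthetic` and `real_vs_synthetic` are hypothetical names.

```python
import random


def make_synthetic(X, seed=0):
    """Sample each column independently from its observed values.

    The synthetic rows keep every variable's marginal distribution but
    break the dependence between variables.
    """
    rng = random.Random(seed)
    n_rows, n_cols = len(X), len(X[0])
    columns = [[row[j] for row in X] for j in range(n_cols)]
    return [[rng.choice(columns[j]) for j in range(n_cols)] for _ in range(n_rows)]


def real_vs_synthetic(X, seed=0):
    """Stack observed rows (label 1) and synthetic rows (label 2)."""
    synthetic = make_synthetic(X, seed)
    return X + synthetic, [1] * len(X) + [2] * len(synthetic)


X = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
data, labels = real_vs_synthetic(X)
print(len(data), labels)
```

If the forest can separate the two classes, it has picked up structure (dependence between variables) that exists only in the real data, and that is what the dissimilarity then reflects.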
Step 1: Dissimilarity Matrix
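A sketch of step 1, assuming the forest has already been trained on real vs. synthetic data and that we know which terminal node each observed row reaches in each tree (scikit-learn's `RandomForestClassifier.apply` returns such an n_samples × n_trees matrix). The proximity of two rows is the fraction of trees in which they share a terminal node, and the dissimilarity is sqrt(1 − proximity), the definition used in the underlying article. `rf_dissimilarity` and the leaf values are hypothetical.

```python
import math


def rf_dissimilarity(leaves):
    """leaves[i][t] is the terminal node reached by observation i in tree t.

    proximity(i, j)     = fraction of trees where i and j share a terminal node
    dissimilarity(i, j) = sqrt(1 - proximity(i, j))
    """
    n, n_trees = len(leaves), len(leaves[0])
    diss = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            shared = sum(leaves[i][t] == leaves[j][t] for t in range(n_trees))
            d = math.sqrt(1.0 - shared / n_trees)
            diss[i][j] = diss[j][i] = d
    return diss


# Hypothetical leaf assignments: 3 observations, 4 trees
leaves = [[3, 5, 2, 7], [3, 5, 4, 7], [1, 6, 4, 8]]
for row in rf_dissimilarity(leaves):
    print([round(v, 3) for v in row])
```

Only the observed rows enter this matrix; the synthetic rows serve just to train the forest.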
Step 2: Project the Dissimilarity Matrix into Euclidean space
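A sketch of step 2 using classical (Torgerson) multidimensional scaling, which we assume is the projection meant here: square the dissimilarities, double-center them, and use the top eigenvectors, scaled by the square roots of their eigenvalues, as coordinates. To stay dependency-free it finds eigenpairs by shifted power iteration with deflation; with NumPy one would call an eigensolver instead. `classical_mds` is a hypothetical name.

```python
import math
import random


def classical_mds(D, k=2, n_iter=500, seed=0):
    """Classical (Torgerson) MDS: embed dissimilarities D into k Euclidean dims."""
    n = len(D)
    sq = [[d * d for d in row] for row in D]
    row_means = [sum(r) / n for r in sq]
    grand = sum(row_means) / n
    # Double centering: B = -1/2 * J D^2 J (D is symmetric, so column means = row means)
    B = [[-0.5 * (sq[i][j] - row_means[i] - row_means[j] + grand)
          for j in range(n)] for i in range(n)]
    rng = random.Random(seed)
    coords = [[0.0] * k for _ in range(n)]
    for dim in range(k):
        # Shift by a Gershgorin bound so power iteration finds the largest
        # (most positive) eigenvalue of B, not the largest in magnitude
        shift = max(sum(abs(x) for x in row) for row in B)
        v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(n_iter):
            w = [sum(B[i][j] * v[j] for j in range(n)) + shift * v[i] for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # nothing left to extract
                break
            v = [x / norm for x in w]
        Bv = [sum(B[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = sum(a * b for a, b in zip(v, Bv))  # Rayleigh quotient
        scale = math.sqrt(max(lam, 0.0))  # negative eigenvalues (non-Euclidean part) are dropped
        for i in range(n):
            coords[i][dim] = v[i] * scale
        # Deflate so the next pass finds the next eigenpair
        B = [[B[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords


points = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]
D = [[math.dist(p, q) for q in points] for p in points]
emb = classical_mds(D, k=2)
print([[round(math.dist(a, b), 6) for b in emb] for a in emb])
```

For dissimilarities that are exactly Euclidean, like the triangle above, the embedding reproduces the input distances (up to rotation and reflection). RF dissimilarities are generally not exactly Euclidean, so the projection is an approximation.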
Step 3: Random Forest (RF) Clusters
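A sketch of step 3. Because the input is a distance matrix, a medoid-based method fits naturally; this is a simplified alternating k-medoids (assign points, then re-pick medoids), not the full PAM swap search, and the choice of method is an assumption. It can be run on Euclidean distances between the projected coordinates or on the RF dissimilarities directly. `k_medoids` is a hypothetical helper.

```python
def k_medoids(D, k, init=None, max_iter=100):
    """Simplified k-medoids on an n x n distance matrix D.

    Alternates two steps until the medoids stop changing:
    assign each point to its nearest medoid, then move each medoid to the
    cluster member with the smallest total distance to the other members.
    (PAM additionally searches medoid/non-medoid swaps; omitted here.)
    """
    n = len(D)

    def assign(meds):
        return [min(range(k), key=lambda c: D[i][meds[c]]) for i in range(n)]

    medoids = list(init) if init is not None else list(range(k))
    for _ in range(max_iter):
        labels = assign(medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:  # keep the old medoid for an empty cluster
                new_medoids.append(medoids[c])
            else:
                new_medoids.append(min(members, key=lambda m: sum(D[m][j] for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return assign(medoids), medoids


# Toy 1-D data: distances between the values 0, 1, 2, 10, 11, 12
values = [0, 1, 2, 10, 11, 12]
D = [[abs(a - b) for b in values] for a in values]
labels, medoids = k_medoids(D, 2)
print(labels, medoids)
```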
But clustering can be done much more easily on this unlabeled data:
Euclidean Distance (ED) Clusters: create clusters based on Euclidean distances between points
Random Forest (RF) Clusters: compute the dissimilarity matrix, project it into Euclidean space, then create clusters
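For contrast, a sketch of the ED baseline: clustering the raw points directly by Euclidean distance. The slides do not name the algorithm; k-means (Lloyd's iterations) is used here purely as an assumed example, and `kmeans` is a hypothetical helper.

```python
import math


def kmeans(points, k, init=None, max_iter=100):
    """Lloyd's k-means: each point joins the centroid nearest in Euclidean distance."""
    centroids = [tuple(p) for p in (init if init is not None else points[:k])]

    def assign(cents):
        return [min(range(k), key=lambda c: math.dist(p, cents[c])) for p in points]

    for _ in range(max_iter):
        labels = assign(centroids)
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:  # keep an empty cluster's centroid where it was
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return assign(centroids), centroids


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
labels, centroids = kmeans(points, 2)
print(labels, centroids)
```

This baseline treats every variable's raw scale as meaningful, whereas the RF dissimilarity depends only on which terminal nodes observations share.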
Then why do the harder Random Forest clustering?
Tumor data classification: gauging the effectiveness of Random Forest (RF) clusters
TASK: Group tumor expressions into 2 groups: with cancer or without cancer
1. Group the tumor expressions into RF Cluster 1 (without cancer) or RF Cluster 2 (with cancer)
2. Group the same tumor expressions into Euclidean Cluster 1 (without cancer) or Euclidean Cluster 2 (with cancer)
3. Colour the tumor expressions by their membership in the 2 clusters
[Figure: tumor expressions coloured by cluster membership, cross-classified as RF Cluster 1 / RF Cluster 2 against ED Cluster 1 / ED Cluster 2]
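The RF-versus-ED comparison above can also be summarized as numbers: a 2 × 2 cross-tabulation counting how many samples fall in each (RF cluster, ED cluster) cell. A sketch with made-up labels, not the article's data; `cross_tab` is a hypothetical helper.

```python
from collections import Counter


def cross_tab(rf_labels, ed_labels):
    """Count how many samples fall in each (RF cluster, ED cluster) cell."""
    counts = Counter(zip(rf_labels, ed_labels))
    rows, cols = sorted(set(rf_labels)), sorted(set(ed_labels))
    return rows, cols, [[counts[(r, c)] for c in cols] for r in rows]


rf = [1, 1, 1, 2, 2, 2]  # hypothetical RF cluster labels
ed = [1, 1, 2, 2, 2, 2]  # hypothetical ED cluster labels
rows, cols, table = cross_tab(rf, ed)
for r, line in zip(rows, table):
    print(f"RF Cluster {r}:", dict(zip((f"ED Cluster {c}" for c in cols), line)))
```

Off-diagonal counts mark the samples on which the two methods disagree.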
And the results are...
Random Forest clusters outperformed the Euclidean clusters:
– RF clusters showed a larger difference in mean survival time
– RF cluster results had much higher concordance with the true biological results
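A sketch of the two comparison measures named above, on invented toy numbers: concordance as the fraction of samples whose cluster matches the true class (trying both cluster numberings, since they are arbitrary), and the gap in mean survival time between the two clusters. Both helpers are hypothetical, and the survival gap ignores censoring, which a real survival analysis would have to handle.

```python
from statistics import mean


def concordance(clusters, truth):
    """Agreement between a 2-cluster labelling and true classes, both coded 1/2.

    Cluster numbering is arbitrary, so the better of the two matchings is used.
    """
    n = len(truth)
    direct = sum(c == t for c, t in zip(clusters, truth)) / n
    swapped = sum((3 - c) == t for c, t in zip(clusters, truth)) / n
    return max(direct, swapped)


def survival_gap(clusters, survival):
    """Absolute difference in mean survival time between clusters 1 and 2.

    Ignores censoring; a proper analysis would compare survival curves.
    """
    s1 = [s for c, s in zip(clusters, survival) if c == 1]
    s2 = [s for c, s in zip(clusters, survival) if c == 2]
    return abs(mean(s1) - mean(s2))


truth = [1, 1, 1, 2, 2, 2]           # toy true classes
survival = [60, 55, 50, 12, 10, 8]   # toy survival times (e.g. months)
rf = [2, 2, 2, 1, 1, 1]              # toy RF clusters
ed = [1, 1, 2, 2, 1, 2]              # toy ED clusters
for name, cl in (("RF", rf), ("ED", ed)):
    print(name, round(concordance(cl, truth), 3), round(survival_gap(cl, survival), 3))
```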