Unsupervised learning with Random forest predictors. Abhishek Singh, MSAN 2015-16. Based on the research article by Tao Shi and Steve Horvath. Source: Journal of Computational and Graphical Statistics, Vol. 15, No. 1 (Mar., 2006), pp. 118-138
Transcript
Page 1: Technical Presentation

Unsupervised learning with Random forest predictors

Abhishek Singh

MSAN 2015-16

Based on the research article by Tao Shi and Steve Horvath. Source: Journal of Computational and Graphical Statistics, Vol. 15, No. 1 (Mar., 2006), pp. 118-138

Page 2: Technical Presentation

Understanding Random Forest classifier

Bootstrapping: Create multiple samples from the original training data

Page 3: Technical Presentation

Understanding Random Forest classifier

Bootstrapping: Create multiple samples from the original training data

Multiple Decision trees: Creating decision trees from each of these samples

Page 4: Technical Presentation

Understanding Random Forest classifier

Bootstrapping: Create multiple samples from the original training data

Multiple Decision trees: Creating decision trees from each of these samples

Plurality of votes principle: Taking the majority vote as the final vote
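The three steps on this slide can be sketched in a few lines of plain Python. This is a minimal illustration, not the presenter's code: one-split "stumps" stand in for full decision trees, and the names `fit_stump`, `fit_forest` and `predict` are hypothetical.

```python
import random
from collections import Counter


def fit_stump(rows, labels, rng):
    """Fit a one-split tree on one randomly chosen feature (assumption: a stump replaces a full tree)."""
    feature = rng.randrange(len(rows[0]))
    best = None
    for threshold in sorted({r[feature] for r in rows}):
        left = [y for r, y in zip(rows, labels) if r[feature] <= threshold]
        right = [y for r, y in zip(rows, labels) if r[feature] > threshold]
        if not left or not right:
            continue
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0]
        errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    if best is None:
        # No usable split in this sample: always predict its majority label.
        majority = Counter(labels).most_common(1)[0][0]
        return feature, float("inf"), majority, majority
    return (feature,) + best[1:]


def fit_forest(rows, labels, n_trees=25, seed=0):
    rng = random.Random(seed)
    n = len(rows)
    forest = []
    for _ in range(n_trees):
        # Bootstrapping: draw n training rows with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        forest.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx], rng))
    return forest


def predict(forest, row):
    # Plurality of votes: every tree votes, the most common label wins.
    votes = Counter(
        left if row[feature] <= threshold else right
        for feature, threshold, left, right in forest
    )
    return votes.most_common(1)[0][0]
```

A real random forest grows deep trees and samples a random subset of features at every split; the stump here only keeps the example short.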

Page 5: Technical Presentation

Applying Random Forest (RF) in Unsupervised learning

1. Compute Dissimilarity Matrix

2. Project the Dissimilarity Matrix into Euclidean space

3. Create Random Forest clusters

Page 6: Technical Presentation

Dissimilarity Matrix
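The slide does not spell out how the matrix is computed. In the Shi and Horvath article, the observed data is contrasted with synthetic data drawn independently from each variable's empirical distribution, a forest is trained to tell the two apart, and the dissimilarity of two observations is the square root of one minus the fraction of trees in which they share a terminal node. The sketch below assumes the forest is already fitted and that `leaf_ids[t][i]` (a hypothetical input) holds the terminal node of observation `i` in tree `t`.

```python
import math
import random


def make_synthetic(rows, seed=0):
    """Draw each feature independently from its observed values, breaking dependence between features."""
    rng = random.Random(seed)
    columns = list(zip(*rows))
    return [tuple(rng.choice(col) for col in columns) for _ in rows]


def rf_dissimilarity(leaf_ids):
    """leaf_ids[t][i] is the terminal node of observation i in tree t (assumed to be given)."""
    n_trees = len(leaf_ids)
    n = len(leaf_ids[0])
    dissim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Proximity: fraction of trees in which i and j land in the same terminal node.
            shared = sum(tree[i] == tree[j] for tree in leaf_ids)
            proximity = shared / n_trees
            dissim[i][j] = dissim[j][i] = math.sqrt(1.0 - proximity)
    return dissim
```

Only the observed rows need to appear in `leaf_ids`; the synthetic rows are used to train the forest and can be dropped afterwards.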

Page 7: Technical Presentation

Applying Random Forest (RF) in Unsupervised learning

1. Compute Dissimilarity Matrix

2. Project the Dissimilarity Matrix into Euclidean space

3. Create Random Forest clusters

Page 8: Technical Presentation

Project the Dissimilarity Matrix into Euclidean space
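One standard way to carry out such a projection is classical multidimensional scaling; the slide does not name the method, so treat that choice as an assumption. The sketch below double-centres the squared dissimilarities and extracts the leading eigenvectors by power iteration, in plain Python. The function name `classical_mds` is hypothetical.

```python
import math
import random


def classical_mds(dissim, k=2, iters=500, seed=0):
    """Embed n items in k dimensions so Euclidean distances approximate a symmetric `dissim`."""
    n = len(dissim)
    sq = [[dissim[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(row) / n for row in sq]
    grand_mean = sum(row_mean) / n
    # Double-centre the squared dissimilarities: B = -1/2 * J * D^2 * J.
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]
    rng = random.Random(seed)
    coords = [[0.0] * k for _ in range(n)]
    for dim in range(k):
        # Power iteration for the dominant eigenvector of the deflated matrix.
        v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
        v_norm = math.sqrt(sum(x * x for x in v))
        v = [x / v_norm for x in v]
        bv = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        eigval = sum(x * y for x, y in zip(v, bv))  # Rayleigh quotient
        if eigval <= 1e-12:
            continue  # assumes leading eigenvalues are positive; otherwise this axis stays zero
        for i in range(n):
            coords[i][dim] = math.sqrt(eigval) * v[i]
        # Deflate so the next pass finds the next eigenvector.
        for i in range(n):
            for j in range(n):
                b[i][j] -= eigval * v[i] * v[j]
    return coords
```

For anything beyond a toy matrix, a linear-algebra library's symmetric eigensolver would be the more sensible choice; power iteration is used here only to keep the example dependency-free.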

Page 9: Technical Presentation

Applying Random Forest (RF) in Unsupervised learning

Random Forest (RF) Clusters

1. Compute Dissimilarity Matrix

2. Project the Dissimilarity Matrix into Euclidean space

3. Create Random Forest clusters
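Any clustering method that accepts a dissimilarity matrix can produce the final clusters. The sketch below uses a simple alternating k-medoids loop as a stand-in; it is not the full PAM swap procedure, and the name `k_medoids` and its deterministic initialisation are assumptions made for illustration.

```python
def k_medoids(dissim, k=2, max_iter=100):
    """Cluster items from a dissimilarity matrix; returns (labels, medoid indices)."""
    n = len(dissim)

    def assign(medoids):
        # Each item joins the cluster of its nearest medoid.
        return [min(range(k), key=lambda c: dissim[i][medoids[c]]) for i in range(n)]

    # Deterministic start: the most central item, then items farthest from chosen medoids.
    medoids = [min(range(n), key=lambda i: sum(dissim[i]))]
    while len(medoids) < k:
        medoids.append(max(range(n), key=lambda i: min(dissim[i][m] for m in medoids)))

    for _ in range(max_iter):
        labels = assign(medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:
                new_medoids.append(medoids[c])  # keep the old medoid for an empty cluster
                continue
            # The new medoid minimises total dissimilarity to its cluster members.
            new_medoids.append(min(members, key=lambda i: sum(dissim[i][j] for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return assign(medoids), medoids
```

Passing a Random Forest dissimilarity matrix gives RF clusters; passing a matrix of Euclidean distances gives the conventional clusters the later slides compare against.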

Page 10: Technical Presentation

But clustering can be done much more easily on this unlabeled data

Random Forest (RF) Clusters

1. Compute Dissimilarity Matrix

2. Project the Dissimilarity Matrix into Euclidean space

3. Create Random Forest clusters

Euclidean Distance (ED) Clusters

Create clusters based on Euclidean distances between points
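For comparison, the Euclidean route needs only a pairwise distance matrix computed directly from the features. A minimal sketch, with the hypothetical helper name `euclidean_distance_matrix`:

```python
import math


def euclidean_distance_matrix(rows):
    """Pairwise Euclidean distances between feature vectors of equal length."""
    return [[math.dist(a, b) for b in rows] for a in rows]
```

The resulting matrix can be handed to any distance-based clustering routine. Unlike the Random Forest dissimilarity, it depends on the scale of each feature, so features measured in large units dominate unless the data is standardised first.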

Page 11: Technical Presentation

Then why do the harder Random Forest clustering?

Page 12: Technical Presentation

Tumor data classification: evaluating the effectiveness of Random Forest (RF) clusters

TASK: Group tumor expressions into 2 groups: with cancer or without cancer

1. Group tumor expressions into RF Cluster 1 (without cancer) or RF Cluster 2 (with cancer)

2. Group the same tumor expressions into Euclidean Cluster 1 (without cancer) or Euclidean Cluster 2 (with cancer)

3. Colour tumor expressions based on the 2 clusters

[Figure: tumor expressions grouped by cluster; labels ED Cluster 1, ED Cluster 2, RF Cluster 1, RF Cluster 2]
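A comparison of the two groupings can be summarised as a cross-tabulation of RF cluster against ED cluster. The sketch below is illustrative only; `cross_tabulate` is a hypothetical helper, and the labels in the usage note are made-up toy values, not the tumor data.

```python
from collections import Counter


def cross_tabulate(rf_labels, ed_labels):
    """Count samples for every (RF cluster, ED cluster) pair."""
    if len(rf_labels) != len(ed_labels):
        raise ValueError("both label lists must cover the same samples")
    return Counter(zip(rf_labels, ed_labels))
```

For example, `cross_tabulate([1, 1, 2, 2], [1, 2, 2, 2])` counts one sample in (RF 1, ED 1), one in (RF 1, ED 2) and two in (RF 2, ED 2). Large off-diagonal counts mean the two methods split the samples differently.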

Page 13: Technical Presentation

And the results are...

Random Forest clusters outperformed the Euclidean clusters

– RF clusters had a larger difference in mean survival time

– RF cluster results had much higher concordance with the true biological results
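Both criteria are easy to compute once cluster labels are available. The sketch below shows one plausible reading of each, assuming exactly two clusters; the slides do not say how the measures were actually computed, and the helper names are hypothetical.

```python
from statistics import mean


def mean_survival_gap(survival_times, labels):
    """Absolute difference in mean survival time between the two clusters."""
    groups = {}
    for time, label in zip(survival_times, labels):
        groups.setdefault(label, []).append(time)
    if len(groups) != 2:
        raise ValueError("expected exactly two clusters")
    first, second = (mean(times) for times in groups.values())
    return abs(first - second)


def concordance(cluster_labels, true_labels):
    """Share of samples whose cluster matches the true group.

    Assumes both lists use the same two label values; cluster numbering is
    arbitrary, so the better of the two possible matchings is reported.
    """
    n = len(true_labels)
    agree = sum(c == t for c, t in zip(cluster_labels, true_labels))
    return max(agree, n - agree) / n
```

With made-up toy values, `mean_survival_gap([5, 6, 7, 20, 22, 24], [1, 1, 1, 2, 2, 2])` gives a gap of 16, and `concordance([1, 1, 2, 2], [2, 2, 1, 2])` gives 0.75.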
