Faithful Sampling for Spectral Clustering to Analyze High Throughput Flow Cytometry Data Parisa Shooshtari School of Computing Science, Simon Fraser University,

Faithful Sampling for Spectral Clustering to Analyze High Throughput Flow Cytometry Data

Parisa Shooshtari

School of Computing Science, Simon Fraser University, Burnaby

Brinkman’s Lab, Terry Fox Laboratory, BC Cancer Agency, Vancouver

Outline:

• Flow Cytometry (FCM) Data
• Clustering of FCM data
• Spectral Clustering
• Faithful Sampling for Spectral Clustering
• Result
• Summary

Basics of Flow Cytometry Technique

[Figure. Recoverable labels: Sample, Wave Length, MHC-II, CD-11c, Int-1, Int-2]

Cell Population Identification in Flow Cytometry (FCM)

Adapted from the Science Creative Quarterly (2)

[Figure. Recoverable axis labels: Parameter 1, Parameter 4]

Importance of FCM Data Clustering

• Manual gating is
– Subjective
– Error-prone
– Time-consuming
– Blind to the multivariate nature of the data

• Analyzing large FCM data sets (with up to 19 dimensions and 1,000,000 points) is impractical without the aid of automated techniques

Which Clustering Algorithm Is Suitable?

• Model-based algorithms like FlowClust, FlowMerge and FLAME are not suitable for clusters with non-elliptical shapes.

[Figure panels: FlowMerge | A Good Clustering]

Our Motivation for Using Spectral Clustering

• Spectral clustering does not require any a priori assumption on cluster size, shape or distribution

• It is not sensitive to outliers, noise or the shape of clusters

Spectral Clustering in One Slide

Represent data sets by a similarity graph

Construct the graph:
• Vertices: data points p_1, p_2, …, p_n
• Weights of edges: similarity values S_{i,j} [defining equation missing in source]

Clustering: find a cut through the graph
• Define a cut objective function
• Solve it
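The graph construction and cut steps can be illustrated with a minimal two-way sketch. Two things here are assumptions rather than content from the slides: the Gaussian form of the similarity S_{i,j} (with a hypothetical bandwidth `sigma`) and the normalized cut as the objective, relaxed to an eigenvector problem. The function name `spectral_bipartition` is illustrative only.

```python
import numpy as np


def spectral_bipartition(points, sigma=1.0):
    """Split points into two groups via a relaxed normalized cut (sketch)."""
    points = np.asarray(points, dtype=float)
    n = len(points)

    # Edge weights: Gaussian similarity (an ASSUMED choice for S_{i,j}).
    diff = points[:, None, :] - points[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    weights = np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(weights, 0.0)  # no self-loops

    # Symmetric normalized Laplacian: I - D^{-1/2} W D^{-1/2}.
    degree = weights.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(degree)
    laplacian = np.eye(n) - d_inv_sqrt[:, None] * weights * d_inv_sqrt[None, :]

    # eigh returns eigenvalues in ascending order; column 1 belongs to the
    # second-smallest eigenvalue. Its sign pattern defines the two-way cut.
    _, eigvecs = np.linalg.eigh(laplacian)
    fiedler = d_inv_sqrt * eigvecs[:, 1]
    return (fiedler > 0).astype(int)


# Two small, well-separated groups of points (toy data, not FCM data).
points = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (0.5, 0.5),
          (4.0, 0.0), (4.5, 0.0), (4.0, 0.5), (4.5, 0.5)]
labels = spectral_bipartition(points, sigma=1.0)
print(labels)  # which group gets label 0 or 1 is arbitrary
```

The dense n-by-n `weights` matrix and the full eigendecomposition in this sketch are exactly the parts that scale poorly with the number of cells.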

The Bottleneck of Spectral Clustering

• Serious empirical barriers when applying this algorithm to large datasets

• Time complexity: O(n^3) → 2 years for 300,000 data points (cells)

• Required memory: O(n^2) → 5 terabytes for 300,000 data points (cells)
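A back-of-the-envelope sketch of this scaling is given here. The 8 bytes per entry (one double-precision number) and the sample count of 3,000 are hypothetical values chosen for illustration; the sketch counts a single dense similarity matrix only and makes no attempt to reproduce the 5 terabyte or 2 year figures above.

```python
def dense_matrix_bytes(n, bytes_per_entry=8):
    """Bytes needed for one dense n-by-n similarity matrix (assumed float64)."""
    return n * n * bytes_per_entry


def reduction_factors(n, m):
    """Memory (quadratic) and time (cubic) savings when n points shrink to m."""
    ratio = n / m
    return ratio ** 2, ratio ** 3


n_cells = 300_000    # data set size quoted on the slide
m_samples = 3_000    # HYPOTHETICAL number of retained samples

print(dense_matrix_bytes(n_cells) / 1e12, "TB for one dense matrix")
memory_gain, time_gain = reduction_factors(n_cells, m_samples)
print(memory_gain, time_gain)
```

Because the costs grow quadratically and cubically, cutting the number of points by a factor of 100 cuts memory by a factor of 10,000 and running time by a factor of 1,000,000, which is why reducing n before clustering pays off so strongly.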

Faithful Sampling: Our Solution for Applying Spectral Clustering to Large Data

• Uniform Sampling: Low-density populations close to dense ones may not remain distinguishable

• Faithful Sampling: Tends to choose more samples from non-dense parts of the data.

How Does Our Faithful Sampling Preserve Information?

1. Space Uniform Sampling: It preserves low-density parts of the data by selecting more samples from them than uniform sampling does.

2. Keeping the list of points in the neighbourhood of samples: This will be used to define similarities between communities.
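The two ideas above can be sketched as follows. This is one possible reading, not a procedure confirmed by the slides: a random unregistered point becomes a sample, every unregistered point within a hypothetical radius `h` joins its community, and the process repeats until no point is left. The community similarity shown (a sum of Gaussian similarities over all cross-community pairs, with hypothetical bandwidth `sigma`) is likewise an assumption; the names `faithful_sample` and `community_similarity` are illustrative.

```python
import numpy as np


def faithful_sample(points, h, seed=0):
    """Pick space-uniform samples and record each sample's neighbourhood."""
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    unregistered = np.ones(len(points), dtype=bool)
    representatives, communities = [], []
    while unregistered.any():
        # Choose a random point that no community has claimed yet.
        rep = int(rng.choice(np.flatnonzero(unregistered)))
        dist = np.linalg.norm(points - points[rep], axis=1)
        # Its community: all still-unclaimed points within radius h.
        members = np.flatnonzero(unregistered & (dist <= h))
        unregistered[members] = False
        representatives.append(rep)
        communities.append(members.tolist())
    return representatives, communities


def community_similarity(points, comm_a, comm_b, sigma=1.0):
    """ASSUMED definition: sum of pairwise Gaussian similarities."""
    points = np.asarray(points, dtype=float)
    a, b = points[comm_a], points[comm_b]
    sq_dist = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return float(np.exp(-sq_dist / (2.0 * sigma ** 2)).sum())


# Toy data: 50 tightly packed points plus 3 isolated ones.
dense = [(0.001 * i, 0.0) for i in range(50)]
sparse = [(5.0, 0.0), (7.0, 0.0), (9.0, 0.0)]
reps, comms = faithful_sample(dense + sparse, h=1.0)
print(len(reps), sorted(len(c) for c in comms))
```

In this toy data the 50 dense points collapse into a single sample while each isolated point keeps its own, so the sparse region is represented far more heavily than it would be under uniform sampling. The stored community lists retain the density information that the samples alone would lose.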

Clustering Result

• Low-density populations surrounded by dense ones

Clustering Result

• Populations with non-elliptical shapes

• Subpopulations of a major population

[Figure panels: SamSPECTRAL | flowMerge | FLAME]

Summary

• Spectral clustering can now be applied to large data sets by our proposed Faithful (Information Preserving) sampling.

• This sampling method can be used in combination with other graph-based clustering algorithms with different objective functions to reduce the size of the data.

• We have shown that SamSPECTRAL has an advantage over model-based clustering methods in identifying
– Cell populations with non-elliptical shapes
– Low-density populations surrounded by dense ones
– Sub-populations of a major population

Acknowledgement

• Committee:
– Dr. Arvind Gupta
– Dr. Ryan Brinkman
– Dr. Tobias Kollman

• Co-authors on SamSPECTRAL
– Habil Zare

• Data Providers
– Connie Eaves
– Peter Landsdrop
– Keith Humphries

Thanks for Your Attention!
