FLOCK: A Density-Based Clustering Method for Automated Cell Population Identification in High-Dimensional Flow Cytometry Data and the Cell Ontology
Richard H. Scheuermann, Ph.D.
Department of Pathology and Division of Biomedical Informatics
U.T. Southwestern Medical Center, Dallas, TX
• Method:
– Stain cell population with fluorescent reagents that bind to specific molecules, e.g. fluorescein-conjugated anti-CD40 antibodies
– Measure fluorescence properties of each cell using a flow cytometer
• Direct and indirect measurement of individual cell characteristics, e.g. cell size, membrane protein expression, secreted protein expression, cell cycle state, DNA ploidy, signal transduction activation
Uses of Flow Cytometry (FCM)
• Differences in cell populations between specimens
• Study of normal cell activation, differentiation and function
• Study of abnormal cell activation, differentiation and function
• Isolate cells from mixture based on their molecular characteristics
– Some cell populations are relatively sparse, even in 2D space
• Compositions
– Events that pile up on an axis can change the data distribution
• Positions
– Some populations are very close together while others are far apart
• Sizes
– From several events to hundreds of thousands of events
FLOCK APPROACH
Grid-based Clustering Approach
• Divide n-dimensional space with hyper-grids
• Identify dense hyper-regions
• Merge neighboring dense hyper-regions to define k populations
• Determine centroids of each population
• Cluster data using k centroids to seed
2D example
Divide with hyper-grids
Find dense hyper-regions
Merge neighboring dense hyper-regions
Clustering based on region centers
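The grid-based steps above can be illustrated with a small NumPy sketch. This is a toy illustration under stated assumptions, not the FLOCK implementation: the function name `grid_cluster`, the parameters `bins` and `min_count`, the fixed equal-width grid, the "at least `min_count` events" density rule, and the rule that cells touching by a face, edge, or corner count as neighbors are all hypothetical choices made for the example.

```python
from collections import deque

import numpy as np


def grid_cluster(points, bins=10, min_count=5):
    """Toy grid-based density clustering (illustrative, not FLOCK itself)."""
    points = np.asarray(points, dtype=float)
    lo = points.min(axis=0)
    span = points.max(axis=0) - lo
    span[span == 0] = 1.0  # avoid division by zero in constant dimensions

    # 1. Divide the space with a hyper-grid: map each event to an integer cell.
    cells = np.minimum((bins * (points - lo) / span).astype(int), bins - 1)

    # 2. Identify dense hyper-regions (assumed rule: >= min_count events).
    uniq, inverse, counts = np.unique(
        cells, axis=0, return_inverse=True, return_counts=True
    )
    inverse = inverse.reshape(-1)
    dense_idx = np.flatnonzero(counts >= min_count)
    if dense_idx.size == 0:
        raise ValueError("no dense hyper-regions; lower min_count or bins")
    dense_cells = uniq[dense_idx]

    # 3. Merge neighboring dense hyper-regions. Two cells are treated as
    #    neighbors when their indices differ by at most 1 in every dimension.
    diff = np.abs(dense_cells[:, None, :] - dense_cells[None, :, :])
    adjacent = diff.max(axis=2) <= 1
    group = -np.ones(len(dense_cells), dtype=int)
    k = 0
    for start in range(len(dense_cells)):
        if group[start] != -1:
            continue
        group[start] = k
        queue = deque([start])
        while queue:  # breadth-first search over connected dense cells
            cur = queue.popleft()
            for nb in np.flatnonzero(adjacent[cur] & (group == -1)):
                group[nb] = k
                queue.append(nb)
        k += 1

    # 4. Determine the centroid of each group from the events in its cells.
    cell_group = -np.ones(len(uniq), dtype=int)
    cell_group[dense_idx] = group
    event_group = cell_group[inverse]
    centroids = np.array(
        [points[event_group == g].mean(axis=0) for g in range(k)]
    )

    # 5. Use the k centroids to seed one round of nearest-centroid assignment.
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1), centroids
```

The number of populations k is not supplied by the user here; it falls out of how many connected groups of dense cells the grid contains, which is the main appeal of the grid-based approach over seeding a fixed k by hand.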
FLOCK v2.0 STEPS
1. File Conversion - Convert binary .fcs file into a data matrix
2. Data Cleansing - Remove boundary events (noise) in FSC and SSC dimensions
3. Data Shrinking - Collapse data toward distribution modes
4. Normalization - Z-score normalization for values in each dimension ((x_i - µ)/SD)
5. Dimension Selection - Select most informative dimensions based on measures of dispersion and distortion
6. FLOCK LoD
   i. Partition each dimension to generate a hyper-grid
   ii. Identify dense hyper-regions in hyper-grid
   iii. Merge neighboring dense hyper-regions to define hyper-region groups (n)
   iv. Determine centroids for each hyper-region group
   v. Use n centroids to seed single round of distance-based clustering
7. FLOCK HiD - Refine population definition based on histogram partitioning
8. Group Merging - Merge close hyper-region groups based on [distance metric]
9. Centroid Calculation - Compute centroid for each hyper-region group
10. Clustering - Cluster events to nearest centroid
11. Population statistics - Summarize population proportions, intensity levels, etc.
12. Visualization
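Three of the steps above (4, 10 and 11) are simple enough to sketch directly. The following is a minimal NumPy illustration, not code from FLOCK: the function names are hypothetical, the population standard deviation (ddof = 0) is an assumption since the slides only say "SD", and Euclidean distance is assumed for the nearest-centroid step.

```python
import numpy as np


def zscore(data):
    """Step 4: per-dimension z-score, (x_i - mean) / SD."""
    data = np.asarray(data, dtype=float)
    mu = data.mean(axis=0)
    sd = data.std(axis=0)  # assumption: population SD (ddof=0)
    sd[sd == 0] = 1.0      # assumption: constant dimensions map to 0
    return (data - mu) / sd


def assign_to_nearest(data, centroids):
    """Step 10: label each event with the index of its nearest centroid."""
    data = np.asarray(data, dtype=float)
    centroids = np.asarray(centroids, dtype=float)
    dists = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)


def population_stats(data, labels):
    """Step 11: proportion and mean intensity of each population."""
    data = np.asarray(data, dtype=float)
    stats = {}
    for k in np.unique(labels):
        members = data[labels == k]
        stats[int(k)] = {
            "proportion": len(members) / len(data),
            "mean_intensity": members.mean(axis=0),
        }
    return stats
```

In a real pipeline the centroids would come from steps 6 to 9 and the clustering would run on the normalized, dimension-selected data; the statistics could then be reported on either the normalized or the original intensity scale.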
Data
• Source: University of Rochester (Sanz)
• Normal human PBMC sample stained with:
– FITC-IgD
– PE-CD1c
– PE-Alexa610-CD24
– PE-Cy5-IgG
– PerCP-Cy5.5-CD3
– PE-Cy7-B220
– PacificBlue-CD38
– PacificOrange-Aqua dead cell staining
– APC-CD27
– APC-Cy7-CD19