Identifying Behavioral Strategies through Large Scale Phenotyping and Statistical Analysis
Stephen Helms, Ph.D.
April 8, 2014 – EYR Global
FOM Institute AMOLF, Amsterdam, Netherlands
Leon Avery (VCU), Greg Stephens (VU Amsterdam/OIST), Tom Shimizu (AMOLF)
How Do We Understand Complex Systems With Many Parts?
(Also a general “big data” question!)
A Model Complex System
Traditional approaches for understanding complex biological systems
Statistical approach for understanding biological systems
Data and computation problems
Proposed computational platform
Outlook for the future
A Simple Model Nervous System: C. elegans
The Worm
• ~1000 total cells
• 302 neurons
• 95 muscles
• ~20000 genes
Stimuli
• Smell (volatile odors)
• Taste (soluble chemicals)
• Feel (touch, heat)
Response
• Movement
• Neural activity
• Biochemical reactions
A Biologist’s Toolbox
• Genetics: Break individual parts, see what happens
• Biochemistry: Look at how parts chemically interact
• Cell Biology: See where the parts are
End result:
• A list of lots of details about what individual genes and proteins are doing
• But no clear view on what the system as a whole does
Idea: Finding Simple Models Through Quantitative, Comparative Studies
• Build quantitative models that are just complicated enough to explain the phenotypes we can observe and care about
• Compare models across multiple strains and species to see what phenotypes biology cares about
• The molecular and cellular details can be filled in later using traditional approaches
• Model system: Motile behavior
  – Behavior is the output of all the complicated systems of an organism
Gray and Lissmann (1964) J. Exp. Biol. 41:135-54.
Croll (1975) J. Zool. 176:159-176.
Croll (1975) Adv. Parasitol. 13:71-122.
Pierce-Shimomura et al. (1999) J. Neurosci. 19:9557-69.
Iino, Y. & Yoshida, K. (2009) J. Neurosci. 29:5370-80.
Helms (2013) Figshare. http://dx.doi.org/10.6084/m9.figshare.705155
Experimental Overview
• Record video of freely moving worms for up to 30 minutes
• Extract behavioral data
• Develop models
Sampling Behavioral Variability: Individual, Intra- and Inter-Species
Holovachov, O. et al. (2009) Nematology 11(6):927-950.
Chiang, J.-T.A. et al. (2006) J. Exp. Biol. 209(10):1859-73.
Andersen, E.C. et al. (2012) Nat. Genet. 44(3):285-90.
Up to 20 individuals per strain
Building Quantitative Models
• Deterministic dynamics: correlation functions, phase spaces, fitting linear models
• Stochastic components: distributions
• Monte Carlo simulations
• Comparison with statistics of data
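The modeling steps listed above can be illustrated with a minimal sketch. It is not the analysis used in this project: it assumes a single scalar behavioral variable, uses a first-order autoregressive (AR(1)) model as the "linear model", and runs on synthetic data. All function names are hypothetical.

```python
import numpy as np

def autocorrelation(x, max_lag):
    """Normalized autocorrelation of a 1-D series for lags 0..max_lag."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    var = np.dot(x, x) / len(x)
    return np.array([np.dot(x[:len(x) - k], x[k:]) / (len(x) * var)
                     for k in range(max_lag + 1)])

def fit_ar1(x):
    """Least-squares fit of x[t+1] = a * x[t] + noise; returns (a, noise_std)."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    a = np.dot(x[:-1], x[1:]) / np.dot(x[:-1], x[:-1])
    residuals = x[1:] - a * x[:-1]  # the stochastic component
    return a, residuals.std()

def simulate_ar1(a, noise_std, n, rng):
    """Monte Carlo simulation of the fitted model with Gaussian noise."""
    x = np.zeros(n)
    noise = rng.normal(0.0, noise_std, size=n)
    for t in range(1, n):
        x[t] = a * x[t - 1] + noise[t]
    return x

rng = np.random.default_rng(0)

# Synthetic stand-in for a measured behavioral variable (not real worm data).
data = simulate_ar1(0.9, 1.0, 20000, rng)

# Deterministic part (a) and stochastic part (noise_std) from the data.
a_hat, noise_std_hat = fit_ar1(data)

# Simulate the fitted model and compare a statistic with the data.
simulated = simulate_ar1(a_hat, noise_std_hat, 20000, rng)
acf_data = autocorrelation(data, 20)
acf_sim = autocorrelation(simulated, 20)
max_acf_gap = np.max(np.abs(acf_data - acf_sim))
print(f"fitted a = {a_hat:.3f}, largest autocorrelation gap = {max_acf_gap:.3f}")
```

A small gap between the two correlation functions suggests the model reproduces that statistic of the data; a large gap suggests the model is too simple.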
• Videos are large
  – 240 GB/h raw
  – 12 GB/h compressed
• Using ~1 TB of storage for a proof of concept project
• Want to scale up:
  – Number of individuals by 10-fold
  – Sampling rate by 3-fold
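The storage figures above imply some quick arithmetic, sketched here. Three assumptions are not on the slide: 1 TB is taken as 1000 GB, the ~1 TB is treated as entirely compressed video, and storage is assumed to grow linearly with both the number of individuals and the sampling rate.

```python
RAW_GB_PER_HOUR = 240         # from the slide
COMPRESSED_GB_PER_HOUR = 12   # from the slide
PROOF_OF_CONCEPT_GB = 1000    # "~1 TB", assuming 1 TB = 1000 GB

# Compression shrinks the video 20-fold.
compression_ratio = RAW_GB_PER_HOUR / COMPRESSED_GB_PER_HOUR

# Rough hours of video in the proof of concept, if all of it is compressed video.
hours_of_video = PROOF_OF_CONCEPT_GB / COMPRESSED_GB_PER_HOUR

# Planned scale-up: 10-fold more individuals, 3-fold higher sampling rate.
# Assumes linear scaling in both, which compressed video may not follow exactly.
scale_factor = 10 * 3
scaled_tb = PROOF_OF_CONCEPT_GB * scale_factor / 1000

print(f"compression ratio: {compression_ratio:.0f}x")
print(f"hours of video: about {hours_of_video:.0f}")
print(f"storage after scale-up: about {scaled_tb:.0f} TB")
```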
Processing
• >3-fold slower than data collection on a desktop computer
• Results in:
  – A backlog of data to analyze
  – A long delay before experiments can be interpreted
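A minimal sketch of how the backlog grows, treating the 3-fold slowdown as a lower bound (the slide says ">3-fold") and assuming a single desktop that processes continuously while recording is under way. The function names are hypothetical.

```python
def processing_hours(recorded_hours, slowdown=3.0):
    """Total compute time needed for a given amount of recorded video."""
    return slowdown * recorded_hours

def backlog_after_recording(recorded_hours, slowdown=3.0):
    """Compute time still outstanding when recording stops.

    Assumes processing ran in parallel with recording the whole time,
    so `recorded_hours` of compute have already been spent.
    """
    return (slowdown - 1.0) * recorded_hours

recorded = 10.0  # hours of video, an arbitrary example value
total = processing_hours(recorded)
backlog = backlog_after_recording(recorded)
print(f"{recorded:.0f} h of video needs at least {total:.0f} h of processing")
print(f"at least {backlog:.0f} h of processing remain when recording ends")
```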
Sharing
• Videos are too big to regularly transfer around
• Extracted data is also big
  – 2 GB for the proof of concept project
• Limited ability for others to explore the data themselves
Need to record data on many individuals for a long time at high frequency
Proposal: Centrally located data processing and analysis services at SURFsara
SURFsara
• Video storage
• Video processing
• Standard analyses
Experimental Users (AMOLF, VCU, etc.)
• Generate videos
• Visualize data
• Develop analyses
Theory Users (VU, OIST, etc.)
• Visualize data
• Develop analyses
Data flows
• Upload videos, download datasets (hundreds of GBs, daily at peak)
• Download datasets (tens of GBs, weekly)
• Exchange datasets and analysis results (few GBs, weekly)
• Loading large (>10 GB) videos
• Processing 10^4 to 10^6 frames per video
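One common way to handle videos that are too large to load at once is to stream frames in fixed-size batches. The sketch below illustrates that pattern only; it is not the project's pipeline. `frame_source` is a hypothetical stand-in for a real video reader, and the mean intensity per frame stands in for whatever per-frame measurement the real analysis extracts.

```python
from itertools import islice

import numpy as np

def frame_source(n_frames, shape=(48, 64), seed=0):
    """Hypothetical stand-in for a video reader: yields one frame at a time."""
    rng = np.random.default_rng(seed)
    for _ in range(n_frames):
        yield rng.integers(0, 256, size=shape, dtype=np.uint8)

def batches(frames, batch_size):
    """Group an iterable of frames into arrays of at most batch_size frames."""
    it = iter(frames)
    while True:
        chunk = list(islice(it, batch_size))
        if not chunk:
            return
        yield np.stack(chunk)

def mean_intensity_per_frame(frames, batch_size=256):
    """Placeholder per-frame measurement, computed one batch at a time."""
    results = []
    for batch in batches(frames, batch_size):
        results.append(batch.reshape(len(batch), -1).mean(axis=1))
    return np.concatenate(results) if results else np.empty(0)

# Only batch_size frames are held in memory at any time.
means = mean_intensity_per_frame(frame_source(1000))
print(means.shape)
```

Because each batch is independent, the same structure could be split across cloud or grid workers, although how the real code is parallelized is not stated in the slides.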
How EYR Is Helping
Storage
• SURFsara will provide up to 20 TB of storage for the video data
Processing
• SURFsara will provide computing resources
  – Cloud or grid
• eScience Center is helping with migrating analysis code to run on HPC infrastructure
Sharing
• Internet2 and SURFnet are connecting the involved institutes with SURFsara using high-speed lightpath connections
  – FOM Institute AMOLF
  – VU
  – Okinawa Institute of Science and Technology
  – Virginia Commonwealth University
Growth Prospects
• Open source aspects of C. elegans community