BIG DATA: A Life Sciences Perspective
Scott Novogoratz, CIO
College of Veterinary Medicine & Biomedical Sciences
Infectious Disease Research Center
Among the world's leaders in researching West Nile Virus, drug-resistant Tuberculosis, Yellow Fever, Dengue, Hantavirus, Plague, Tularemia and other zoonotic and human diseases
“BIG DATA is data that exceeds the processing capacity of conventional database systems.”
Edd Dumbill, Editor in Chief, Big Data
Omics & Ologies - Life Sciences BIG DATA
Omics
– Genomics
– Transcriptomics
– Proteomics
– Metabolomics
– Metagenomics
BIG DATA Devices
– Gene sequencing
– Mass spectrometry
– Imaging
– Microarrays
– Liquid chromatography
Ology(ies)
– Radiology
– Gastroenterology
– Cardiology
– Pathology
Medical Imaging BIG DATA Demands
Increases due to:
• Avg. Size/Study
• More Digitized Data
– Pathology
– Endoscopy
– Pictures
• More Imaging Procedures
Endoscopy
Canine Duodenum Endoscopy Procedure
How Big is a Genome?
Paris japonica: 152 Billion Base Pairs
Human: 3 Billion Base Pairs
E. coli: 4 Million Base Pairs
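To put the base-pair counts above into storage terms, here is a minimal back-of-envelope sketch. It assumes each base is packed into 2 bits (enough to distinguish A, C, G and T); that encoding, and the helper names `packed_size_bytes` and `human_readable`, are illustrative assumptions and do not come from the slides.

```python
# Back-of-envelope storage for one uncompressed copy of each genome.
# Assumption (not from the slides): each base (A, C, G, T) is packed into 2 bits.
BITS_PER_BASE = 2

# Base-pair counts as quoted on the slide.
GENOME_BASE_PAIRS = {
    "Paris japonica": 152_000_000_000,
    "Human": 3_000_000_000,
    "E. coli": 4_000_000,
}


def packed_size_bytes(base_pairs: int, bits_per_base: int = BITS_PER_BASE) -> int:
    """Return the bytes needed to store `base_pairs` bases, rounded up."""
    return (base_pairs * bits_per_base + 7) // 8


def human_readable(num_bytes: int) -> str:
    """Format a byte count using decimal (SI) units."""
    size = float(num_bytes)
    for unit in ("B", "kB", "MB", "GB"):
        if size < 1000:
            return f"{size:.2f} {unit}"
        size /= 1000
    return f"{size:.2f} TB"


if __name__ == "__main__":
    for name, base_pairs in GENOME_BASE_PAIRS.items():
        print(f"{name}: {human_readable(packed_size_bytes(base_pairs))}")
```

Under this assumption a single human genome fits in under 1 GB. Raw sequencer output is typically much larger than this, because each position is read many times and reads usually carry quality scores.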
The scale of biological data is increasing exponentially, with sequencing technologies now producing data at a rate that exceeds the growth in computing power predicted by Moore’s Law (a 10,000-fold improvement in sequencing vs. a 16-fold improvement in computing).
From the Big Data article Unraveling the Complexities, Higdon et al.
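One way to compare the two fold-improvements quoted above is to express each as a number of doublings. The short sketch below does only that arithmetic; the slide does not state the period over which the improvements were measured, so no doubling time is derived. The function name `doublings` is an illustrative choice.

```python
import math

# Fold improvements as quoted on the slide (attributed to Higdon et al.).
SEQUENCING_FOLD = 10_000
COMPUTING_FOLD = 16


def doublings(fold_improvement: float) -> float:
    """Number of successive doublings that yield the given fold improvement."""
    return math.log2(fold_improvement)


if __name__ == "__main__":
    seq = doublings(SEQUENCING_FOLD)
    comp = doublings(COMPUTING_FOLD)
    print(f"Sequencing: {seq:.1f} doublings")
    print(f"Computing:  {comp:.1f} doublings")
    print(f"Ratio of doublings: {seq / comp:.1f}")
```

Over whatever period the figures cover, sequencing went through roughly three times as many doublings as computing did.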
What Do Life Science Researchers Want?
1. Reliable Data
2. Statistically Valid Results
3. Analysis Tools with User-Friendly Interfaces
4. Transparent Reporting of Results
5. Ability to Share Data
From a University of Washington study assessing the data and analysis needs of life scientists
Conclusions
• Recognize that BIG DATA storage issues differ based on the purpose and use of data
• Maximize the value of biological research by improving the capability to store, catalog, share and compare research through:
– Low cost and shared storage mechanisms
– Universal and easy-to-use tools that provide researchers with the capability to compare their findings with libraries of information