Big Data and Large Scale Data Analysis, Andrew Mead, School of Life Sciences, 23rd October 2013

Jan 04, 2016

Transcript
Page 1:

Big Data and Large Scale Data Analysis

Andrew Mead, School of Life Sciences

23rd October 2013

Page 2:

Big Data

• Modern technologies make it increasingly easy to collect large quantities of data
  – ‘Omics revolution
  – Remote sensing
  – Weather (and hence climate change applications)
  – Internet applications
  – Social networking
  – Shopping preferences
  – Health applications
  – …

• But how do we make the most of these data?

Page 3:

Gene expression microarrays

• Data on many thousands of genes (spots) on each array

• Comparisons of multiple samples (treatments, time, individual plants or animals, …)

• Processing of data for each gene separately or in combination
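As an illustration of processing each gene separately, the sketch below computes a Welch t-statistic per gene for two groups of arrays. The data are simulated, and the names (`per_gene_t_statistic`, `control`, `treated`), the group sizes and the choice of test are assumptions made for this example, not something taken from the talk.

```python
import numpy as np

def per_gene_t_statistic(control, treated):
    """Welch t-statistic for each gene (row); columns are replicate arrays."""
    mean_diff = treated.mean(axis=1) - control.mean(axis=1)
    # Unbiased sample variances (ddof=1), one value per gene.
    var_control = control.var(axis=1, ddof=1)
    var_treated = treated.var(axis=1, ddof=1)
    standard_error = np.sqrt(
        var_control / control.shape[1] + var_treated / treated.shape[1]
    )
    return mean_diff / standard_error

# Simulated (hypothetical) log-expression values: 5000 genes, 4 arrays per group.
rng = np.random.default_rng(0)
n_genes, n_reps = 5000, 4
control = rng.normal(8.0, 1.0, size=(n_genes, n_reps))
treated = rng.normal(8.0, 1.0, size=(n_genes, n_reps))
treated[:50] += 3.0  # assumption: the first 50 genes respond to the treatment

t_stats = per_gene_t_statistic(control, treated)
# Indices of the ten genes with the largest absolute t-statistic.
top_genes = np.argsort(-np.abs(t_stats))[:10]
```

With thousands of genes tested at once, some large statistics arise by chance alone, so a real analysis would also need a multiple-testing adjustment.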

Page 4:

Landscape data

• Land-use/cover for each land-parcel

• Basis for simulation studies of changes in land-use

• Summary of spatial data into simple statistics

[Figure: JCA101 - Simulation01 - Run001 - Year2009]
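One simple way to summarise spatial data into simple statistics is to compute the fraction of the landscape under each land-use class. The sketch below does this for a raster of integer class codes. The grid is randomly generated, and its size, the class codes and the function name `land_use_proportions` are assumptions made for illustration.

```python
import numpy as np

def land_use_proportions(grid):
    """Return {land-use code: fraction of cells} for a 2-D array of codes."""
    codes, counts = np.unique(grid, return_counts=True)
    return {int(code): float(count / grid.size) for code, count in zip(codes, counts)}

# Hypothetical 250 x 200 landscape with three land-use classes (0, 1, 2).
rng = np.random.default_rng(1)
grid = rng.choice([0, 1, 2], size=(250, 200), p=[0.5, 0.3, 0.2])

proportions = land_use_proportions(grid)
```

Applying the same function to the grid from each simulated year would give a compact time series of land-use change.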

Page 5:

Challenges

• Storage of big data sets
• Management
  – Structured
  – Unstructured
• Analysis
  – Often the same questions as for smaller data sets
  – Can become computationally intractable as data volume increases

Page 6:

Multivariate Statistics and Data Mining

• Dimensionality reduction
  – Find the important combinations of variables
  – Use these in models

• Use computing power to search for “patterns”

• Challenge in connecting the analysis process to the data

• Distributed computing, massively parallel processing (MPP), machine learning, search-based applications (SBA), …
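Principal component analysis (PCA) is one widely used form of dimensionality reduction: it finds the linear combinations of variables that carry most of the variance. The slides do not name a specific method, so PCA is a choice made for this example. The sketch below implements it with a singular value decomposition on simulated data, and the function name `pca` and all the data are assumptions.

```python
import numpy as np

def pca(data, n_components):
    """Project the rows of `data` onto the top principal components."""
    centred = data - data.mean(axis=0)
    # Rows of vt are the principal directions, ordered by singular value.
    _, singular_values, vt = np.linalg.svd(centred, full_matrices=False)
    explained = singular_values**2 / np.sum(singular_values**2)
    scores = centred @ vt[:n_components].T
    return scores, explained[:n_components]

# Simulated data: 50 observed variables driven by 2 hidden factors plus noise.
rng = np.random.default_rng(2)
latent = rng.normal(size=(200, 2))
loadings = rng.normal(size=(2, 50))
data = latent @ loadings + 0.1 * rng.normal(size=(200, 50))

scores, explained = pca(data, n_components=2)
```

Here `scores` holds two derived variables per observation, which could replace the 50 original variables in a later model, and `explained` gives the fraction of total variance each component captures.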

Page 7:

Statistics and Big Data

• Computing power is probably crucial!
• But statistical approaches are important
  – Designing the data collection
    • Sub-sampling?
  – Defining the problem
  – Managing the data
  – Dimension reduction

• Finding the signal amidst the noise!
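For the sub-sampling question, one standard option when the data arrive as a stream too large to hold in memory is reservoir sampling (Algorithm R), which draws a uniform random sample of fixed size in a single pass. The talk does not say which scheme it had in mind. The sketch below is a minimal version, with the function name `reservoir_sample` and the example stream chosen for illustration.

```python
import random

def reservoir_sample(stream, k, seed=None):
    """Return a uniform random sample of up to k items from an iterable."""
    rng = random.Random(seed)
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Keep the new item with probability k / (i + 1).
            j = rng.randint(0, i)  # both endpoints inclusive
            if j < k:
                reservoir[j] = item
    return reservoir

# Hypothetical stream of one million records, sub-sampled down to 1000.
sample = reservoir_sample(range(1_000_000), k=1000, seed=42)
```

Because the stream is read once and only `k` items are stored, memory use does not grow with the size of the data set.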