Data Mining and the OptIPuter Padhraic Smyth University of California, Irvine
Jan 17, 2016

Transcript
Page 1: Data Mining and the OptIPuter

Data Mining and the OptIPuter

Padhraic Smyth
University of California, Irvine

Page 2: Data Mining and the OptIPuter

Data Mining of Spatio-Temporal Scientific Data

– Modern scientific data analysis
  • increasingly data-driven
  • data often consist of massive spatio-temporal streams

– Research focus
  • characterizing spatio-temporal structure in data
  • statistical models for object shapes, trajectories, patterns...
  • data mining from scientific data streams (NSF, OptIPuter)
  • recognition of waveforms in time-series archives (JPL, NASA)
  • inference of dynamic gene-regulation networks from data (NIH)
  • Markov models for spatio-temporal weather patterns (DOE)
  • clustering and modeling of storm trajectories (LLNL)

Page 3: Data Mining and the OptIPuter

[Figure: image; horizontal axis ticks 100 to 600, vertical axis ticks 50 to 450]

Image-voxel Data (“slices” of olfactory bulb in rats)

Automatic segmentation of cellular structures of interest (glomerular layer)

Thematic maps
Data mining
Scientific discovery

Page 4: Data Mining and the OptIPuter

Image-voxel Data (Remote sensing AVIRIS spectral data)

Focus of attention on wavelengths of interest

Thematic maps
Data mining
Scientific discovery

Page 5: Data Mining and the OptIPuter

What’s wrong with this information flow?

• “One-way”
  – Flow of information is from data to scientist

• Real scientific investigation is “two-way”
  – Scientist interacts, explores, queries the data
  – Most current data mining/analysis tools are relatively poor at handling interaction
    • Algorithms are “black-box”, do not allow scientists to be “in the loop”
    • Algorithms have no representation of the scientist’s prior knowledge or goals (no user models)

• OptIPuter project
  – “next generation” data mining tools for effective exploration of massive 2d/3d data sets

Page 6: Data Mining and the OptIPuter

OptIPuter focus in Data Mining

• Data
  – 2d (or multi-d) spatio-temporal image/voxel data

• Goals
  – Allow scientists to explore these massive data sets in an efficient and flexible manner, leveraging the OptIPuter architecture
  – Produce software tools that allow scientists to explore massive data in an interactive manner:
    • automated segmentation, thematic maps, focus of interest

• Technical Challenges
  – Scaling statistical algorithms to massive data streams
  – Providing mechanisms for effective scientific interaction
  – Developing algorithms for automated “focus-of-attention”

Page 7: Data Mining and the OptIPuter

Analysis of Extra-Tropical Cyclones

• Extra-tropical cyclone = mid-latitude storm

• Practical Importance
  – Highly damaging weather over Europe
  – Important water-source in United States

• Scientific Importance
  – Influence of climate on cyclone frequency, strength, etc.
  – Impact of cyclones on local weather patterns

[with Scott Gaffney (UCI), Andy Robertson (IRI/Columbia), Michael Ghil (UCLA)]

Page 8: Data Mining and the OptIPuter

Sea-Level Pressure Data

– Mean sea-level pressure (SLP) on a 2.5° by 2.5° grid
– Four times a day, every 6 hours, over 20 years

[Figure: SLP map; blue indicates low pressure]
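
A rough, back-of-the-envelope sketch of the data volume implied by the two bullets above. The slide does not state the spatial extent of the grid or the storage format, so the global coverage (144 by 73 points), the 365-day year, and the 4 bytes per value are all assumptions made for illustration.

```python
# Rough size of the sea-level pressure archive described on the slide.
GRID_STEP_DEG = 2.5     # from the slide
SAMPLES_PER_DAY = 4     # every 6 hours, from the slide
YEARS = 20              # from the slide

# Assumption: the grid covers the whole globe, both poles included.
n_lon = int(360 / GRID_STEP_DEG)       # longitudes
n_lat = int(180 / GRID_STEP_DEG) + 1   # latitudes, counting both poles

# Assumption: 365-day years (leap days ignored) and 4-byte floats.
n_times = SAMPLES_PER_DAY * 365 * YEARS
n_values = n_lon * n_lat * n_times
size_mb = n_values * 4 / 1e6

print(f"{n_lon} x {n_lat} grid, {n_times} time steps, about {size_mb:.0f} MB")
```

Under these assumptions a single variable is on the order of a gigabyte; a regional grid or a different precision would change the figure proportionally.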

Page 9: Data Mining and the OptIPuter

Winter Cyclone Trajectories

Page 10: Data Mining and the OptIPuter

Clustering Methodology

• Mixtures of curves
  – model as mixtures of noisy linear/quadratic curves
    • note: true paths are not linear
    • use the model as a first-order approximation for clustering

• Advantages
  – allows for variable-length trajectories
  – allows coupling of other “features” (e.g., intensity)
  – provides a quantitative (e.g., predictive) model
  – [contrast with k-means for example]

Page 11: Data Mining and the OptIPuter

Clusters of Trajectories

Page 12: Data Mining and the OptIPuter

Applications

• Visualization and Exploration
  – improved understanding of cyclone dynamics

• Change Detection
  – can quantitatively compare cyclone statistics over different eras or from different models

• Linking cyclones with climate and weather
  – correlation of clusters with NAO index
  – correlation with wind speeds in Northern Europe
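
One way the cluster-to-index correlation above might be computed is to count, for each season, how many trajectories fall in each cluster and correlate those counts with the seasonal index value. The sketch below assumes that approach; the function name, the per-season counting, and all numbers are hypothetical placeholders, not real NAO or cyclone data.

```python
import numpy as np

def cluster_index_correlation(labels, seasons, index_by_season, K):
    """Pearson correlation between per-season cluster counts and an index.

    labels: cluster label of each trajectory (ints in 0..K-1)
    seasons: season id of each trajectory (same length as labels)
    index_by_season: dict mapping season id to an index value (e.g. NAO)
    """
    labels = np.asarray(labels)
    seasons = np.asarray(seasons)
    season_ids = sorted(index_by_season)
    index = np.array([index_by_season[s] for s in season_ids], dtype=float)
    corr = np.empty(K)
    for k in range(K):
        # Number of trajectories from cluster k observed in each season.
        counts = np.array(
            [np.sum((labels == k) & (seasons == s)) for s in season_ids],
            dtype=float,
        )
        corr[k] = np.corrcoef(counts, index)[0, 1]
    return corr

# Placeholder values for illustration only.
index_by_season = {2001: -1.0, 2002: 0.0, 2003: 1.0}
seasons = [2001] * 4 + [2002] * 4 + [2003] * 4
labels = [0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1]

corr = cluster_index_correlation(labels, seasons, index_by_season, K=2)
print(corr)
```

The same per-season counts could feed the change-detection comparison across eras. A cluster whose count is identical in every season has zero variance, so its correlation is undefined and `np.corrcoef` returns NaN for it.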