Applications of Statistical Data Analysis at CCNY and the Graphyte Toolkit Irina Gladkova Michael Grossberg Dept. of Computer Science, CCNY, CUNY NOAA/CREST Thursday, July 23, 2009

Applications of Statistical Data Analysis at CCNY and the Graphyte Toolkit

Irina Gladkova

Michael Grossberg

Dept. of Computer Science, CCNY, CUNY

NOAA/CREST

Thursday, July 23, 2009


Flood of Data

• GOES 9, 10, 12
• NOAA-15, 16, 17, 18
• LandSat 5, 7
• DMSP F13, 14, 15, 16
• Meteosat 6, 7, 8, 9
• CBERS-2, 2B
• SPOT-2, 4, 5
• ENVISAT
• Resourcesat 1
• CARTOSAT-1, 2, 2A
• RADARSAT-1, 2
• KOMPSAT-1
• THEO-1
• GOMs
• GMS-5
• METEOR-3
• OKEAN
• Feng-Yun

More than 50 Multi-sensor Platforms

1 Sensor (MODIS) = 125 GB/day


Moore’s Law

WJet System - 3376 CPUs

NOAA High Performance Computing Systems

Exponential Growth Continues


Complex Relationships

• High Dimensionality:

• Hyper-spectral images

• High resolution

• Non-linear relationships

• Statistical Analysis:

• Starting point for physical modeling

• Pre-processing for visualizations


Application and Data Driven

• Built tools

• Developed expertise

• Applying statistical analysis to NOAA data and problems in collaboration with NOAA Scientists


Reality: Detectors Break

Band 6: 1628 - 1652 nm

Manufacturing Flaws

Launch Damage

Space is Harsh

Band 6: 15/20 Detectors Noisy or Totally Non-Functional


Lost Opportunity

• NASA MYD10_L2: "Aqua MODIS band 7 is used in the algorithm. The test for snow in dense vegetation in the algorithm was disabled because it was observed to result in frequent erroneous snow mapping in some situations." (http://modis-snow-ice.gsfc.nasa.gov/val.html)

• The National Snow and Ice Data Center: "Version 4 (V004) MYD29 data, the most current version available, uses Aqua/MODIS band 7 instead of band 6." (http://www-nsidc.colorado.edu/data/myd29.html)

• NOAA/STAR: "On Aqua the retrievals are made in band 7 (2.119 µm) because of poor quality data from band 6." (Ignatov A., et al., "Two MODIS Aerosol Products over Ocean on the Terra and Aqua CERES SSF Datasets")


What is ‘Plan B’?

NASA: Column-wise Interpolation?

Bad: Visible Artifacts

Worse: Derivatives (Gradient) Fully Corrupted

Essential Features Destroyed

Simulate Aqua Damage with Terra for Evaluation

[Figure: image panels labeled Damaged, Interpolated, and Ground Truth, shown for Values and for Gradient]

Hue = Gradient Direction, Value = Gradient Magnitude
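
The evaluation idea on this slide (remove detector rows from a clean Terra band, interpolate down the columns, and compare with the untouched original) can be sketched as follows. This is a minimal illustration on a synthetic image, not the NASA procedure: the assumption that row r comes from detector r % 20, the arbitrary choice of which 15 detectors are dead, and the function names `simulate_damage` and `interpolate_columns` are all made up for the example.

```python
import numpy as np

def simulate_damage(band, dead_detectors, n_detectors=20):
    """Return a copy of `band` with rows from dead detectors set to NaN.

    Assumption: row r is produced by detector r % n_detectors.
    """
    damaged = band.astype(float).copy()
    rows = np.arange(band.shape[0])
    dead_rows = np.isin(rows % n_detectors, dead_detectors)
    damaged[dead_rows, :] = np.nan
    return damaged

def interpolate_columns(damaged):
    """Fill NaN rows by 1-D linear interpolation down each column."""
    filled = damaged.copy()
    rows = np.arange(damaged.shape[0])
    for c in range(damaged.shape[1]):
        col = damaged[:, c]
        good = ~np.isnan(col)
        filled[:, c] = np.interp(rows, rows[good], col[good])
    return filled

# Synthetic "ground truth" with structure in both directions.
y, x = np.mgrid[0:80, 0:60]
truth = np.sin(x / 7.0) + np.cos(y / 3.0)

# 15 of 20 detectors marked dead; which ones is an arbitrary choice.
dead = [1, 2, 3, 5, 6, 7, 9, 10, 11, 13, 14, 15, 17, 18, 19]
damaged = simulate_damage(truth, dead)
restored = interpolate_columns(damaged)

rmse = np.sqrt(np.mean((restored - truth) ** 2))
print("RMSE of column-wise interpolation:", rmse)
```

Because the ground truth is known, the same comparison can be repeated for any other restoration method on identical damage.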


Gradients

Hue = Gradient Direction, Value = Gradient Magnitude
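
The color coding named above (hue for gradient direction, value for gradient magnitude) can be reproduced with a few lines of NumPy. The sketch below is one plausible way to build such an image, not necessarily how the slide figures were made; full saturation and normalization by the maximum magnitude are assumptions.

```python
import colorsys
import numpy as np

def gradient_to_rgb(image):
    """Encode gradient direction as hue and gradient magnitude as value."""
    gy, gx = np.gradient(image.astype(float))    # derivatives along rows, columns
    magnitude = np.hypot(gx, gy)
    direction = np.arctan2(gy, gx)                # radians in [-pi, pi]
    hue = (direction + np.pi) / (2 * np.pi)       # rescale to [0, 1]
    peak = magnitude.max()
    value = magnitude / peak if peak > 0 else np.zeros_like(magnitude)
    saturation = np.ones_like(hue)                # assumption: fully saturated
    r, g, b = np.vectorize(colorsys.hsv_to_rgb)(hue, saturation, value)
    return np.stack([r, g, b], axis=-1)

y, x = np.mgrid[0:32, 0:32]
rgb = gradient_to_rgb(np.sin(x / 5.0) + np.cos(y / 4.0))
print(rgb.shape)
```

Flat regions come out black, so any stripe introduced by interpolation shows up as a band of uniform hue.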


Not Much Proposed

• Only 2 papers try to fix band 6

• Both Use Band 7 to Predict Band 6

• 2006: Global Polynomial Regression

• 2009: Local Polynomial Regression

Fundamental Problem: Band 6 is not a function of band 7


More Information Available

• 500m Bands have Significant Correlations

• Why not use all available information?

[Figure: axes labeled with bands 3, 4, 5, 6, 7]


Statistical Approach

• Hypothesis:

We can predict band 6 from bands 3,4,5,7.

• MODIS on Terra has same bands

• Quantify prediction accuracy from test data (not used to build predictor)


Train Using Terra

[Diagram: training and evaluation pipeline]

• Training Data: Terra Radiance Band 3,4,5,6,7 -> Training -> Predictor Parameters

• Testing Data: Terra Radiance Band 3,4,5,6,7; Band 3,4,5,7 -> Prediction -> Predicted Band 6

• Evaluate Errors: Predicted Band 6 vs. Band 6

Prediction used for quantitative restoration
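
The train/test pipeline in this diagram can be sketched in a few lines. The slides do not say which predictor was used, so the quadratic least-squares model below is only a stand-in, and the "radiances" are synthetic numbers generated from a made-up relation; `make_features` and every constant are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_features(bands):
    """Quadratic feature map: constant, each band, and all pairwise products."""
    n, d = bands.shape
    cols = [np.ones(n)] + [bands[:, i] for i in range(d)]
    cols += [bands[:, i] * bands[:, j] for i in range(d) for j in range(i, d)]
    return np.column_stack(cols)

# Synthetic stand-in for co-located Terra radiances (columns: bands 3, 4, 5, 7).
n_pixels = 5000
inputs = rng.uniform(0.0, 1.0, size=(n_pixels, 4))
# Hypothetical non-linear relation, used only to generate a "band 6" target.
band6 = (0.3 * inputs[:, 2] + 0.5 * inputs[:, 3]
         + 0.2 * inputs[:, 0] * inputs[:, 1]
         + rng.normal(0.0, 0.01, n_pixels))

# Split so that test pixels are never used to build the predictor.
split = int(0.7 * n_pixels)
train_x, test_x = inputs[:split], inputs[split:]
train_y, test_y = band6[:split], band6[split:]

# Training: least-squares fit of the predictor parameters.
params, *_ = np.linalg.lstsq(make_features(train_x), train_y, rcond=None)

# Prediction and error evaluation on the held-out data.
predicted = make_features(test_x) @ params
rmse = np.sqrt(np.mean((predicted - test_y) ** 2))
print("held-out RMSE:", rmse)
```

Once the parameters are fitted on Terra, the same `make_features(...) @ params` step could be applied to Aqua bands 3, 4, 5, 7, where no trustworthy band 6 exists to compare against.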


Preliminary Terra Evaluation

[Figure: image panels labeled Damaged, Interpolated, Ground Truth, and Predicted (Restored), shown for Values and for Gradient]


Histogram Of Angles


Aqua Restoration


Aqua Restoration


Evaluate For Products

• Work with STAR to evaluate potential impact for aerosol M and A products

• Investigate use for snow mapping, and cloud mask algorithms

• Adapt prediction for products directly

• Collaborate with STAR to explain physical models driving prediction


Sensor Synthesis

[Diagram: Available Bands (e.g., Band 3,4,5,7) -> Statistical Prediction -> Desired Band (e.g., Band 6)]

Old Elements: Prediction ~ Regression ~ Estimation

New Elements:

• More and higher quality data

• Much faster computers

• Able to handle non-linear multivariate problems in higher dimensions


No Green on GOES-R


6 Channels close to visible


Why is Green Band Important?

• Primary reason: generate color images (RGB)

• GOES-R will have Red 640nm, and Cyan 470nm

• Current methods use lookup tables to predict green then produce RGB

Problem: Human color vision is not based on narrow-band RGB


Vision: Wide Band Response


Tristimulus and XYZ

I(λ), I’(λ): spectral power distributions

Two objects have the same color <=> XYZ = X’Y’Z’

Don’t estimate green! Estimate XYZ and get accurate RGB
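
The point that color is determined by XYZ rather than by the full spectrum can be checked numerically. Each tristimulus value is the spectral power distribution integrated against a wide-band matching function, so two different spectra can give the same XYZ. The sketch below uses Gaussian stand-ins for the matching functions (they are NOT the CIE tables) and a 5 nm grid; both are assumptions chosen to keep the example self-contained.

```python
import numpy as np

wavelengths = np.arange(400.0, 701.0, 5.0)   # nm
step = 5.0                                    # grid spacing in nm

def bump(center, width):
    return np.exp(-0.5 * ((wavelengths - center) / width) ** 2)

# Stand-in matching functions (not the CIE data): broad, overlapping curves.
cmf = np.vstack([bump(600, 40) + 0.35 * bump(445, 20),   # x-bar-like
                 bump(555, 45),                            # y-bar-like
                 1.7 * bump(450, 25)])                     # z-bar-like

def tristimulus(spectrum):
    """Discrete form of X = sum of I(lambda) * xbar(lambda) * d_lambda, etc."""
    return cmf @ spectrum * step

spectrum_a = 1.0 + 0.5 * bump(520, 60)

# Build a different spectrum with the same XYZ by adding a direction that
# the matching functions cannot see (a null-space vector of `cmf`).
_, _, vt = np.linalg.svd(cmf)
invisible = vt[3]
spectrum_b = spectrum_a + 0.5 * invisible

print("XYZ a:", tristimulus(spectrum_a))
print("XYZ b:", tristimulus(spectrum_b))
print("max spectral difference:", np.abs(spectrum_a - spectrum_b).max())
```

The two spectra differ visibly band by band, yet their XYZ values agree to rounding error, which is why a single narrow "green" band is not the quantity that matters for an accurate color image.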


Hyperion as Spectrometer

[Diagram: Hyperion Data, 220 bands -> Spectral Projection -> XYZ; Hyperion Data, 220 bands -> Spectral Projection -> GOES-R Bands 1,2,3,4,5,6; Statistical Prediction relates GOES-R Bands 1,2,3,4,5,6 to XYZ]
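
A rough sketch of this pipeline, under heavy assumptions: spectra with 220 channels are projected once onto XYZ-like curves and once onto six sensor-like band responses, and a linear predictor is then fitted from the six bands to XYZ. The spectra are synthetic, the response curves are Gaussian placeholders (the band centers are nominal values and the widths are invented), and linear least squares stands in for whatever predictor the authors used.

```python
import numpy as np

rng = np.random.default_rng(1)
wl = np.linspace(400.0, 2500.0, 220)          # nm, 220 channels

def gaussian(center, width):
    return np.exp(-0.5 * ((wl - center) / width) ** 2)

# Projection matrices (rows = output channels, columns = input channels).
to_xyz = np.vstack([gaussian(600, 40), gaussian(555, 45), gaussian(450, 25)])
to_sensor = np.vstack([gaussian(c, 30) for c in (470, 640, 865, 1378, 1610, 2250)])

# Synthetic spectra: random mixtures of a few smooth "endmember" curves.
endmembers = np.vstack([gaussian(c, 400) for c in (500, 900, 1500, 2100)])
abundances = rng.uniform(0.0, 1.0, size=(2000, 4))
spectra = abundances @ endmembers              # shape (2000, 220)

xyz = spectra @ to_xyz.T                       # spectral projection to XYZ
sensor = spectra @ to_sensor.T                 # spectral projection to 6 bands

# Statistical prediction: linear least squares from the 6 bands to XYZ,
# fitted on the first half of the spectra and checked on the second half.
design = np.column_stack([np.ones(len(sensor)), sensor])
half = len(sensor) // 2
coef, *_ = np.linalg.lstsq(design[:half], xyz[:half], rcond=None)
pred = design[half:] @ coef
print("max abs error on held-out spectra:", np.abs(pred - xyz[half:]).max())
```

Because these synthetic spectra are mixtures of only four curves, the fit is close to exact here; real scenes have far more spectral variety, and the held-out error is the number that would have to be reported.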


Proof of Concept Results


Equalized Images

Equalization is used simply to magnify differences
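
For readers unfamiliar with the operation, a minimal histogram equalization is sketched below. The slides do not say which variant was applied, so this is the textbook form (mapping each pixel through the image's own cumulative histogram), and the bin count and test image are assumptions.

```python
import numpy as np

def equalize(image, n_bins=256):
    """Map pixel values through the image's own cumulative histogram."""
    flat = image.ravel().astype(float)
    hist, edges = np.histogram(flat, bins=n_bins)
    cdf = np.cumsum(hist).astype(float)
    cdf /= cdf[-1]                                  # normalize to [0, 1]
    centers = 0.5 * (edges[:-1] + edges[1:])
    return np.interp(flat, centers, cdf).reshape(image.shape)

rng = np.random.default_rng(2)
# Low-contrast test image: values crowded into a narrow range.
image = 0.5 + 0.02 * rng.standard_normal((64, 64))
equalized = equalize(image)
print("input range:", image.min(), image.max())
print("output range:", equalized.min(), equalized.max())
```

The mapping preserves the ordering of pixel values and only stretches contrast, so it exaggerates differences between two images without creating them.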


Beyond Prediction

• Statistical Estimation applies to clustering and classification tasks

• Example Clustering Problem (from Paul Menzel)

• What bands are most important for separating different cloud states?

• How do statistical clusters compare with those predicted by physics models?


Library of Algorithms

• Many different statistical clustering algorithms

• Hard to evaluate: what defines a good cluster?

• We built a library: implements/wraps major clustering algorithms

Agglomerative

Agglomerative Hierarchical

Average Link

Best One Element Move Consensus

Best of K Consensus

CC Average Link

CC Pivot

Competitive Learning

Connected Component

Connected Components

Expectation Maximization

Fuzzy K-means

Graph Cut

Hierarchical Dimensionality Reduction

K-means

Leader Follower

Majority Rule Consensus

Mean Shift

Multi-Dimensional Scaling

Spectral Clustering

Stepwise Optimal Hierarchical

Available Clustering Algorithms


Eg: Competitive Learning

Input: 7 dimensions/pixel

MODIS

Cluster 1

Cluster 2

Cluster 3

Cluster 4

Cluster 5
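
As a rough illustration of the example on this slide, the sketch below runs a basic winner-take-all competitive learning rule on 7-dimensional vectors with 5 prototypes. It is not the library's implementation: the learning rate, number of epochs, initialization, and the synthetic stand-in for MODIS pixels are all assumptions.

```python
import numpy as np

def competitive_learning(pixels, n_clusters=5, epochs=10, rate=0.05, seed=0):
    """Online winner-take-all clustering; returns prototypes and labels."""
    rng = np.random.default_rng(seed)
    # Start the prototypes at randomly chosen pixels.
    prototypes = pixels[rng.choice(len(pixels), n_clusters, replace=False)].copy()
    for _ in range(epochs):
        for i in rng.permutation(len(pixels)):
            winner = np.argmin(((prototypes - pixels[i]) ** 2).sum(axis=1))
            # Only the winning prototype moves toward the sample.
            prototypes[winner] += rate * (pixels[i] - prototypes[winner])
    distances = ((pixels[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=2)
    return prototypes, distances.argmin(axis=1)

rng = np.random.default_rng(3)
# Synthetic stand-in for MODIS pixels: 5 groups in 7 dimensions.
centers = rng.uniform(0.0, 1.0, size=(5, 7))
pixels = np.vstack([c + 0.02 * rng.standard_normal((200, 7)) for c in centers])

prototypes, labels = competitive_learning(pixels)
print(prototypes.shape, np.bincount(labels, minlength=5))
```

Reshaping `labels` back to the image grid gives one mask per cluster, which is how per-cluster panels like the ones listed above could be produced.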


Consensus Clustering

Consensus Label Agreement
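
One simple way to obtain a consensus labeling and an agreement measure from several clusterings is sketched below. It uses a "best of K" rule (pick the input clustering with the smallest total pairwise disagreement with the others) and a per-pair agreement fraction. Whether this matches the library's "Best of K Consensus" exactly is an assumption, and the function names and toy labelings are hypothetical.

```python
import numpy as np

def co_membership(labels):
    """Boolean matrix: entry (i, j) is True when items i and j share a label."""
    labels = np.asarray(labels)
    return labels[:, None] == labels[None, :]

def pair_disagreement(a, b):
    """Number of item pairs grouped together in one clustering but not the other."""
    diff = co_membership(a) != co_membership(b)
    return int(np.triu(diff, k=1).sum())

def best_of_k(clusterings):
    """Index of the input clustering closest, in total, to all the others."""
    totals = [sum(pair_disagreement(a, b) for b in clusterings) for a in clusterings]
    return int(np.argmin(totals))

def agreement(clusterings):
    """Per-pair fraction of clusterings that place the two items together."""
    return np.mean([co_membership(c) for c in clusterings], axis=0)

clusterings = [
    [0, 0, 1, 1, 2, 2],
    [1, 1, 0, 0, 2, 2],   # same partition as the first, labels permuted
    [0, 0, 0, 1, 2, 2],   # item 2 moved into the first group
]
best = best_of_k(clusterings)
print("consensus pick:", best, clusterings[best])
print(agreement(clusterings))
```

Comparing co-membership rather than raw labels makes the result independent of how each algorithm happens to number its clusters.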


Data Clustering For Classification

What multispectral signatures correlate with presence of a bloom?

[Figure: Algae Blooms. Algae Bloom Bulletin: bloom outlined in red. Clustering Result: bloom shown in red.]



Modeled Remote Sensing Reflectance Spectra

The solid green spectra show the simulation with chlorophyll fluorescence excluded; the solid red spectra show the simulation with fluorescence included, assuming 0.75% quantum yield. Bands 13 and 14 are MODIS bands centered at 667 nm and 678 nm respectively.

S. Ahmet et al., “Novel optical techniques for detecting and classifying toxic dinoflagellate Karenia brevis blooms using satellite imagery”


Graphyte Tool Kit

• Web based interface to:

• Data

• Computation

• Algorithms

• 2D/3D graphical interactive tools

• Data Exploration

• Data Visualization


Hardware Architecture


Edit Code In Browser


Software Architecture


Interactive 3D Scatter Plot


Rich Internet Application


Edit/Run Code Through Browser


Near Universal Availability


Conclusion

• Provide Expertise

• High Dimensionality

• Large Data Sets

• Statistical Clustering, Estimation, Classification

• Provide Tools for

• Computation

• Data Access

• Visualization

• Remote Collaboration
