A Data Quality Screening Service for Remote Sensing Data


Christopher Lynnes, NASA/GSFC (P.I.)
Edward Olsen, NASA/JPL
Peter Fox, RPI
Bruce Vollmer, NASA/GSFC
Robert Wolfe, NASA/GSFC

Contributions from: R. Strub, T. Hearty, Y-I Won, M. Hegde, V. Jayanti, S. Zednik, P. West, N. Most, S. Ahmad, C. Praderas, K. Horrocks, I. Tcherednitchenko, and A. Rezaiyan-Nojani

Advancing Collaborative Connections for Earth System Science (ACCESS) Program


Outline

- Why a Data Quality Screening Service?
- Making Quality Information Easier to Use via the Data Quality Screening Service (DQSS)
- DEMO
- DQSS Architecture
- DQSS Accomplishments
- DQSS Going Forward
- The Complexity of Data Quality


Why a Data Quality Screening Service?

The quality of data can vary considerably.

AIRS Variable              Best (%)   Good (%)   Do Not Use (%)
Total Precipitable Water      38         38            24
Carbon Monoxide               64          7            29
Surface Temperature            5         44            51

Version 5 Level 2 Standard Retrieval Statistics

Quality schemes can be relatively simple…

[Figure: Total Column Precipitable Water (kg/m²), screened by the Qual_H2O flag into Best, Good, and Do Not Use]

…or they can be more complicated.

PBest: the maximum pressure for which the quality value is "Best" in temperature profiles.

[Figure: Air temperature at 300 mbar over Hurricane Ike, viewed by the Atmospheric Infrared Sounder (AIRS)]
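Screening against PBest amounts to keeping only the levels of a profile at pressures up to that value. A minimal sketch in Python (function and variable names are hypothetical illustrations, not DQSS code):

    import numpy as np

    def screen_profile_by_pbest(temperature, pressure, pbest):
        """Keep temperature levels whose pressure is <= PBest, i.e. where the
        quality is 'Best'; replace deeper levels with NaN.
        Illustrative only; actual AIRS handling differs in detail."""
        keep = np.asarray(pressure) <= pbest
        return np.where(keep, temperature, np.nan)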

Quality flags are also sometimes packed together into bytes.

Big-endian arrangement for the Cloud_Mask_SDS variable in atmospheric products from the Moderate Resolution Imaging Spectroradiometer (MODIS):

- Cloud Mask Status Flag: 0=Undetermined, 1=Determined
- Cloud Mask Cloudiness Flag: 0=Confident cloudy, 1=Probably cloudy, 2=Probably clear, 3=Confident clear
- Day/Night Flag: 0=Night, 1=Day
- Sunglint Flag: 0=Yes, 1=No
- Snow/Ice Flag: 0=Yes, 1=No
- Surface Type Flag: 0=Ocean, deep lake/river; 1=Coast, shallow lake/river; 2=Desert; 3=Land
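Using such packed flags means extracting each field with bit shifts and masks. A minimal sketch, assuming the fields occupy consecutive low-to-high bits of the first mask byte in the order listed above (the authoritative bit layout must come from the MODIS product documentation):

    import numpy as np

    def unpack_cloud_mask_byte(byte0):
        """Split a packed cloud-mask byte into its flag fields.
        Bit positions are an assumption for illustration only."""
        byte0 = np.asarray(byte0, dtype=np.uint8)
        return {
            "status":     byte0        & 0b1,   # 0=Undetermined, 1=Determined
            "cloudiness": (byte0 >> 1) & 0b11,  # 0=Confident cloudy ... 3=Confident clear
            "day_night":  (byte0 >> 3) & 0b1,   # 0=Night, 1=Day
            "sunglint":   (byte0 >> 4) & 0b1,   # 0=Yes, 1=No
            "snow_ice":   (byte0 >> 5) & 0b1,   # 0=Yes, 1=No
            "surface":    (byte0 >> 6) & 0b11,  # 0=Ocean ... 3=Land
        }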

Recommendations for quality filtering can be confusing.

AIRS Quality Indicators:
  Use for data assimilation:  Best       = 0
  Use for climatic studies:   Good       = 1
  Do not use:                 Do Not Use = 2

MODIS Aerosols Confidence Flags (ocean and land):
  3 = Very Good
  2 = Good
  1 = Marginal
  0 = Poor
  "Use these flags in order to stay within expected error bounds"
  (Land: ±0.05 ± 0.15 τ; Ocean: ±0.03 ± 0.10 τ)

- Different framing for recommendations
- Opposite direction for QC codes
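Because the codes run in opposite directions (for AIRS, 0 is best; for MODIS aerosols, 3 is best), any screening code has to encode the direction per product. A small illustrative sketch (helper names are hypothetical):

    def airs_acceptable(quality, worst_allowed=1):
        """AIRS: lower codes are better; keep Best (0) through worst_allowed."""
        return quality <= worst_allowed

    def modis_aerosol_acceptable(confidence, least_allowed=3):
        """MODIS aerosols: higher codes are better; keep >= least_allowed."""
        return confidence >= least_allowed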


Current user scenarios (repeated for each user)...

Nominal scenario:
- Search for and download data
- Locate documentation on handling quality
- Read and understand documentation on quality
- Write custom routine to filter out bad pixels

Equally likely scenario (especially in user communities not familiar with satellite data):
- Search for and download data
- Assume that quality has a negligible effect

The effect of bad-quality data is often not negligible.

[Figure: Total Column Precipitable Water (kg/m²) for Hurricane Ike, 9/10/2008, with pixels classed as Best, Good, and Do Not Use]

Neglecting quality may introduce bias (a more subtle effect).

[Figure: AIRS relative humidity compared against dropsondes, with and without PBest quality-flag filtering; boxed data points indicate AIRS relative humidity with a dry bias > 20%]

From a study by Sun Wong (JPL) on specific humidity in the Atlantic Main Development Region for tropical storms.

The percent of biased-high data in MODIS Aerosols over land increases as the confidence flag decreases.

[Figure: Fractions of compliant*, biased-low, and biased-high retrievals (0%–100%) for each confidence flag, from Very Good down through Good and Marginal to Poor]

*Compliant data are within ±0.05 ± 0.2 τ_Aeronet.

Statistics derived from Hyer, E., J. Reid, and J. Zhang, 2010: An over-land aerosol optical depth data set for data assimilation by filtering, correction, and aggregation of MODIS Collection 5 optical depth retrievals, Atmos. Meas. Tech., 3, 4091–4167.


Making Quality Information Easier to Use via the Data Quality Screening Service (DQSS)


DQSS Team

- P.I.: Christopher Lynnes
- Software implementation: Goddard Earth Sciences Data and Information Services Center
  - Implementation: Richard Strub
  - Local domain experts (AIRS): Thomas Hearty and Bruce Vollmer
- AIRS domain expert: Edward Olsen, AIRS/JPL
- MODIS implementation
  - Implementation: Neal Most, Ali Rezaiyan, Cid Praderas, Karen Horrocks, Ivan Tcherednitchenko
  - Domain experts: Robert Wolfe and Suraiya Ahmad
- Semantic engineering: Tetherless World Constellation @ RPI (Peter Fox, Stephan Zednik, Patrick West)

The DQSS filters out bad pixels for the user.

Default user scenario:
- Search for data
- Select the science team recommendation for quality screening (filtering)
- Download screened data

More advanced scenario:
- Search for data
- Select custom quality screening options
- Download screened data

DQSS replaces bad-quality pixels with fill values.

[Figure: Original data array (Total Column Precipitable Water); a mask is built from the user criteria (quality level < 1*), and only good-quality data pixels are retained]

The output file has the same format and structure as the input file (except for extra mask and original_data fields).

*0 = Best
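The masking step itself amounts to replacing pixels that fail the user's criteria with a fill value, while keeping the mask and the original array for the output file. A minimal NumPy sketch (names and the fill value are illustrative, not the DQSS implementation):

    import numpy as np

    FILL_VALUE = -9999.0  # illustrative; real products define their own fill value

    def apply_screening(data, quality, worst_allowed=0):
        """Replace pixels whose quality code exceeds worst_allowed (0 = Best)
        with a fill value; return the screened array plus the mask and the
        original data, mirroring the extra fields in the DQSS output file."""
        mask = quality <= worst_allowed            # True where quality is acceptable
        screened = np.where(mask, data, FILL_VALUE)
        return screened, mask, data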

Visualizations help users see the effect of different quality filters.

DQSS can encode the science team recommendations on quality screening.

AIRS Level 2 Standard Product:
- Use only Best for data assimilation uses
- Use Best+Good for climatic studies

MODIS Aerosols:
- Use only VeryGood (highest value) over land
- Use Marginal+Good+VeryGood over ocean


Initial settings are based on the Science Team recommendation (note: "Good" retains retrievals that are Good or better). Users can choose settings for all parameters at once, variable by variable, or select their own criteria, as in the sketch below.
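One way such recommendations could be encoded is as per-product presets mapping a use case to the retained quality levels. A hypothetical sketch (DQSS actually stores this knowledge in its quality ontology, not a Python dict):

    # Hypothetical preset table for illustration only.
    SCIENCE_TEAM_PRESETS = {
        ("AIRS L2 Standard", "data assimilation"): {"keep": ["Best"]},
        ("AIRS L2 Standard", "climatic studies"):  {"keep": ["Best", "Good"]},
        ("MODIS Aerosols", "land"):  {"keep": ["VeryGood"]},
        ("MODIS Aerosols", "ocean"): {"keep": ["Marginal", "Good", "VeryGood"]},
    }

    def default_screening(product, use_case):
        """Look up the science team's recommended screening for a use case."""
        return SCIENCE_TEAM_PRESETS[(product, use_case)]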


DEMO

http://ladsweb.nascom.nasa.gov
- Select Terra or Aqua Level 2 Atmosphere
- Select "51 - Collection 5.1"
- Add results to the Shopping Cart (do not go directly to "Order Now")
- Go to the shopping cart and ask to "Post-Process and Order"

http://mirador.gsfc.nasa.gov
- Search for 'AIRX2RET'


DQSS Architecture


DQSS Flow

[Diagram: The user's selection goes to the Screener, which resolves it through an ontology query against the Quality Ontology and generates a quality mask from the data file; the Masker applies the mask to produce the screened data file, delivered to the end user along with the mask.]
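In code form, the flow separates mask generation from mask application. A minimal sketch of the two stages (hypothetical function names, with the ontology query reduced to a simple threshold for illustration):

    import numpy as np

    def screener(quality, user_selection):
        """Build the quality mask from the user's criteria. In DQSS the meaning
        of the quality field comes from an ontology query; here it is stubbed
        out as a plain threshold."""
        return quality <= user_selection["worst_allowed"]

    def masker(data, mask, fill_value=-9999.0):
        """Apply the quality mask, filling rejected pixels."""
        return np.where(mask, data, fill_value)

    # e.g.: screened = masker(data, screener(quality, {"worst_allowed": 0}))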

DQSS Ontology (The Whole Enchilada)

- Data Field Binding Module
- Data Field Semantics Module
- Quality View Module

DQSS Quality View Zoom

DQSS encapsulates ontology and screening parameters.

[Diagram: The LAADS Web GUI sends the data product and screening options to the DQSS Ontology Service and receives a screening recipe (XML); the selected options and recipe pass through the MODAPS database to the MODAPS processing nodes ("Minions"), which invoke DQSS with the screening parameters (XML).]

Encapsulation enables reuse in other data centers with diverse environments.


DQSS Accomplishments

- DQSS is operational at two EOSDIS data centers (TRL = 9):
  - MODIS L2 Aerosols
  - MLS L2 Water Vapor
  - AIRS Level 2 Retrievals
  - MODIS L2 Water Vapor
- Metrics are being collected routinely
- DQSS is almost released (all known paperwork filed)

Papers and Presentations

- Managing Data Quality for Collaborative Science workshop (peer-reviewed paper + talk)
- Sounder Science meeting talk
- ESDSWG poster
- ESIP poster
- A-Train User Workshop (part of GES DISC presentation)
- AGU: Ambiguity of Data Quality in Remote Sensing Data


Metrics

- DQSS is included in web access logs sent to the EOSDIS Metrics System (EMS), tagged with "Protocol" DQSS
- An EMS-to-MCT (ESDS Metrics) bridge has been implemented for both DAACs*
- Metrics from EMS: 208 users; 73,452 downloads
- Still mostly AIRS, some MLS; MODIS L2 Aerosol may be the "breakthrough" product

*DQSS is the first multi-DAAC bridge for ACCESS

ESDSWG Participation by DQSS

- Active participant (Lynnes) in Technology Infusion Working Group subgroups: Semantic Web, Interoperability, and Processes and Strategies
- Established and led two ESIP clusters as technology infusion activities: Federated Search (now Discovery) and Earth Science Collaboratory
- Participated in the ESDSWG reorganization tiger team
- Founded the ESDSWG colloquium series


DQSS Spinoff Benefits

- DQSS staff unearthed a subtle error in the MODIS Dark Target Aerosols algorithm: quality confidence flags are sometimes set to Very Good, Good, or Marginal when they should be marked "Bad"
  - Fixed in Version 6 of the algorithm
  - Also fixed in the Aerostat ACCESS project
- Semantic Web technology infusion in DQSS enables future infusion of the RPI component of the Multi-Sensor Data Synergy Advisor into the Atmospheric Composition Portal: shared skills, shared language, shared tools
- The Earth Science Collaboratory concept was born in a discussion with Kuo about DQSS at the ESDSWG poster session in New Orleans


Going Forward

- Maintain software as part of GES DISC core services
  - Modify as necessary for new releases; new data product versions have simpler quality schemes
  - Add data products as part of data support
  - Sustainability test: how long to add a product? Answer: a couple of days for MLS Ozone
- Continue outreach: demonstrated to scientists in NASA ARSET
- Release software and technology


DQSS Recap

- Screening satellite data can be difficult and time-consuming for users
- The Data Quality Screening Service provides an easy-to-use service
- The result should be: more attention to quality on users' part, and more accurate handling of quality information... with less user effort

The Complexity of Data Quality, Part 1

Quality Control ≠ Quality Assessment
- QC represents the algorithm's "happiness" with an individual retrieval or profile at the "pixel" level
- QA is the science team's statistical assessment of quality at the product level, based on cal/val campaigns

[Diagram: Algorithm version N produces Data Product version N, whose QC is a guess at quality; the science team's quality assessment feeds improved QC estimation in Algorithm version N+1, which produces Data Product version N+1, and the cycle repeats.]

QC vs. QA – User View

- Quality Control answers: "Give me just the good-quality data values."
- Quality Assessment answers: "Tell me how good the dataset is."

Both come from the data provider.


Quality (Priority) is in the Eye of the Beholder

- Climate researchers: long-term consistency, temporal coverage, spatial coverage
- Algorithm developers: accuracy of retrievals, information content
- Data assimilators: spatial consistency
- Applications: latency, spatial resolution
- Education/Outreach: usability, simplicity

Recommendation 1: Harmonize Quality Terms

Start with the ISO 19115/19157 Data Quality model, but...

Q: Examples of "Temporal Consistency" quality issues?
ISO: Temporal Consistency = "correctness of the order of events"

[Figure: Land surface temperature anomaly from the Advanced Very High Resolution Radiometer, showing a trend artifact from orbital drift and a discontinuity artifact from a change in satellites]

Recommendation 2: Address More Dimensions of Quality

- Accuracy: measurement bias + dispersion
  - Accuracy of data with low-quality flags
  - Accuracy of grid cell aggregations
- Consistency: spatial, temporal, observing conditions

Recommendation #2 (cont.): More dimensions of quality

- Completeness
  - Temporal: time range, diurnal coverage, revisit frequency
  - Spatial: coverage and grid cell representativeness
  - Observing conditions: cloudiness, surface reflectance
- N.B.: Incompleteness affects accuracy via sampling bias
  - AIRS dry sampling bias at high latitudes, due to incompleteness in high-cloud conditions, which tend to have rain and convective clouds with icy tops
  - AIRS wet sampling bias where low clouds are prevalent

Quality Indicator: AOD spatial completeness (coverage)

[Figure: Average percentage of non-fill values in daily gridded products, MODIS Aqua vs. MISR]

Due to its wider swath, MODIS AOD covers more area than MISR; the seasonal and zonal patterns are rather similar.

Quality Indicator: Diurnal coverage for MODIS Terra in summer

[Figure: Probability (%) of an "overpass"* in a given day, by local hour of day. *That is, being included in a MODIS L2 pixel during that hour of the day.]

Because Terra is sun-synchronous with a 10:30 equator crossing, observations are limited to a short period during the day at all but high latitudes (Arctic in summer).

Recommendation 3: Address Fitness for Purpose Directly*

- Standardize terms of recommendation
- Enumerate more positive realms and examples
- Enumerate negative examples

*A bit controversial

Recommendation #4: More Outreach...

...once we know what we want to say.
- Quality screening: no longer any excuse not to screen, for DQSS-served datasets. Should NASA supply DQSS for more datasets?
- Quality assessment: dependent on Recommendations 1-3

Venues:
- Science team meetings... but for QC this is "preaching to the choir"
- NASA Applied Remote Sensing Education and Training (ARSET): get users at the start of their NASA data usage! DQSS and Aerostat have been demonstrated to ARSET. Any others like ARSET?
- Training workshops?


Backup Slides


Lessons Learned

- The tall pole is acquiring knowledge about what QC indicators really mean and how they should be used
- Is anyone out there actually using the quality documentation?
  - Detailed analysis by DQSS of quality documentation turned up errors, ambiguities, inconsistencies, even an algorithm bug
  - Yet the help desks are not getting questions about these...
- Seemed like a good idea at the time...
  - Visualizations: good for validating screening, but do users use them?
  - HDF-Java: not quite the panacea for portability we hoped
  - DQSS-OPeNDAP gateway: OPeNDAP access to DQSS-screened data enables use by IDV/McIDAS-V, but it is hard to integrate into a search interface (not giving up yet...)

Possible Client-Side Concept

[Diagram: The data provider (GES DISC) publishes dataset-specific instance info from the ontology as XML files and serves the data files; a web form collects the end user's screening criteria; the Screener and Masker then run on the end user's side to screen the downloaded data files.]

DQSS Target Uses and Users

User type             Routine   Visualization   Quick Recon.   Machine-level   Metrics
Interdisciplinary        X            X
Educational              X            X
Expert/Power             ?                            X               X
Applications                                                          X
Algorithm Developers                                                                  X

ESDSWG Activities

Contributions to all 3 TIWG subgroups:
- Semantic Web: contributed a use case for the Semantic Web tutorial @ ESIP; participation in the ESIP Information Quality Cluster
- Services Interoperability and Orchestration: combining ESIP OpenSearch with servicecasting and datacasting standards; spearheading the ESIP Federated OpenSearch cluster (5 servers, 2 more soon; 4+ clients), now the ESIP Discovery cluster (see above)
- Processes and Strategies: developed use cases for Decadal Survey missions: DESDynI-Lidar, SMAP, and CLARREO

File-level quality statistics are not always useful for data selection.

[Figure: Study area vs. percent cloud cover]

Level 3 grid cell standard deviation is difficult to interpret due to its dependence on magnitude.

[Figure: MODIS Aerosol Optical Thickness at 550 nm, Level 3 grid: mean and standard deviation]

Neither pixel count nor standard deviation alone expresses how representative the grid cell value is.

[Figure: MODIS Aerosol Optical Thickness at 0.55 microns: Level 2 swath vs. Level 3 grid AOT mean, standard deviation, and input pixel count]

top related