Vega: A Flexible Data Model for Environmental Time Series Data L. A. Winslow, B. J. Benson, K. E. Chiu, P. C. Hanson, T. K. Kratz


Mar 27, 2015


Storing High Resolution Sensor Data in a Relational Database

• Deploy system
• Create data table
  – Date/Time column
  – Each variable is a unique column

[Figure: example Mendota_Buoy_Table]
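As a concrete picture of this layout, here is a minimal sketch using Python's built-in sqlite3 module. The slide specifies only a Date/Time column plus one column per variable; the column names and the sample row below are hypothetical.

```python
import sqlite3

# In-memory database standing in for the buoy's relational store.
conn = sqlite3.connect(":memory:")

# One table for the deployment: a Date/Time column plus one column per
# measured variable (column names are hypothetical examples).
conn.execute(
    """
    CREATE TABLE Mendota_Buoy_Table (
        DateTime       TEXT PRIMARY KEY,
        AirTemp_C      REAL,
        WindSpeed_ms   REAL,
        WaterTemp_0M_C REAL
    )
    """
)

# Each timestamp is one row holding every variable sampled at that time
# (made-up values).
conn.execute(
    "INSERT INTO Mendota_Buoy_Table VALUES (?, ?, ?, ?)",
    ("2007-01-01 12:00:00", -4.0, 3.2, 1.8),
)

row = conn.execute(
    "SELECT AirTemp_C FROM Mendota_Buoy_Table WHERE DateTime = ?",
    ("2007-01-01 12:00:00",),
).fetchone()
cols = [r[1] for r in conn.execute("PRAGMA table_info(Mendota_Buoy_Table)")]
print(cols, row[0])
```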


Accommodate Additional Site

• Create additional table
• Table name from site name

[Figure: example tables Mendota_Buoy_Table (1) and Long_Lake_Buoy_Table (2)]

• What about 5 sites?
• Or 10?
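The cost of the table-per-site approach shows up as soon as a query spans sites. In this sketch (Python sqlite3; the site list, column, and values are hypothetical), every cross-site query needs one SELECT per site table, so the SQL grows with each new site and has to be regenerated whenever one is added.

```python
import sqlite3

# Hypothetical site list; each site gets its own table named after it.
SITES = ["Mendota", "Long_Lake"]

conn = sqlite3.connect(":memory:")

for site in SITES:
    # Identical columns in every table, but a separate table per site.
    conn.execute(
        f"CREATE TABLE {site}_Buoy_Table (DateTime TEXT PRIMARY KEY, AirTemp_C REAL)"
    )

conn.execute("INSERT INTO Mendota_Buoy_Table VALUES ('2007-01-01 12:00:00', -4.0)")
conn.execute("INSERT INTO Long_Lake_Buoy_Table VALUES ('2007-01-01 12:00:00', -6.5)")


def cross_site_sql(sites):
    """Build one query covering air temperature at every listed site.

    The statement contains one SELECT per site table, joined by UNION ALL,
    so it changes every time a site is added.
    """
    parts = [
        f"SELECT '{s}' AS Site, DateTime, AirTemp_C FROM {s}_Buoy_Table"
        for s in sites
    ]
    return " UNION ALL ".join(parts)


rows = conn.execute(cross_site_sql(SITES) + " ORDER BY Site").fetchall()
print(rows)
```

Site names are interpolated into the SQL as identifiers only because they come from a fixed, trusted list; table names cannot be bound as `?` parameters.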


Changes in Measured Variables

• Add or remove variables

• End up with many NULL fields

• ‘Legacy Structure’
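A small sketch of how NULLs accumulate when sensors come and go in a one-column-per-variable table (Python sqlite3; the column names and values are hypothetical).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Mendota_Buoy_Table "
    "(DateTime TEXT PRIMARY KEY, AirTemp_C REAL, DO_05M REAL)"
)
conn.execute(
    "INSERT INTO Mendota_Buoy_Table VALUES ('2006-06-01 00:00:00', 18.2, 9.1)"
)

# A new sensor is deployed: add a column. Every existing row gets NULL for it.
conn.execute("ALTER TABLE Mendota_Buoy_Table ADD COLUMN Chlorophyll_RFU REAL")

# The DO sensor is retired. Dropping its column would discard its history,
# so new rows simply leave it NULL and the column lingers as legacy structure.
conn.execute(
    "INSERT INTO Mendota_Buoy_Table (DateTime, AirTemp_C, Chlorophyll_RFU) "
    "VALUES ('2008-06-01 00:00:00', 21.0, 1.4)"
)

# Count the NULL cells created by the two schema changes.
nulls = conn.execute(
    "SELECT SUM(DO_05M IS NULL) + SUM(Chlorophyll_RFU IS NULL) "
    "FROM Mendota_Buoy_Table"
).fetchone()[0]
print(nulls)  # 2
```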


Add Complex Metadata

• Add Metadata
  – Sensor Info
  – Data steward
  – Offset (depth, height)
  – Sampling Method

• Combine in Field Name
  – DO_05M
  – DO_DOPTO_05M
  – DO_YSI_10M
  – DO_YSI_CALIBRATED_10M
  – WIND_SPEED_VECTOR_AVG
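Packing metadata into column names means software has to parse it back out. This hypothetical Python parser, run on the field names listed above, recovers a trailing depth token (assuming `05M` means 5 m) but is left with unlabeled tokens: nothing in the name says whether `YSI` is a sensor, `CALIBRATED` a processing level, or `VECTOR_AVG` part of the variable.

```python
import re

# Field names from the slide; the meaning assigned to tokens is an assumption.
FIELDS = [
    "DO_05M",
    "DO_DOPTO_05M",
    "DO_YSI_10M",
    "DO_YSI_CALIBRATED_10M",
    "WIND_SPEED_VECTOR_AVG",
]

# Assumed depth convention: digits followed by "M", e.g. "05M" -> 5 m.
DEPTH = re.compile(r"^(\d+)M$")


def parse_field(name):
    """Split a packed field name into a depth (if any) and leftover tokens.

    Only the trailing depth token follows a recognizable pattern; the other
    tokens (variable, sensor, processing) carry no labels, and their number
    and order vary from name to name.
    """
    tokens = name.split("_")
    depth_m = None
    match = DEPTH.match(tokens[-1])
    if match:
        depth_m = int(match.group(1))
        tokens = tokens[:-1]
    return {"depth_m": depth_m, "tokens": tokens}


for field in FIELDS:
    print(field, parse_field(field))
```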


Long-term datasets are becoming more common

[Figure: Sparkling Lake: Air Temperature; Air Temp (°C) vs. Date, 1/1/1987 to 1/1/2011]


Vega Data Model

• Goals
  – Accommodate dataset changes over time
  – Eliminate legacy structure
  – Easy to understand and develop software
  – Maintain rapid query times

• Inspired by the CUAHSI Observations Data Model (ODM)


Central Concepts

• Values
  – Individual observation (floating point format)
  – e.g., air temp at airport at 12:00 1-1-2007 (-5.1 °C)
  – Individually linked to metadata

• Data Streams
  – Group of Values which vary only in time
  – Individual time series
  – e.g., all air temp sampled at airport

• Wind speed is a different ‘Data Stream’
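A minimal sketch of the Values / Data Streams split in Python's sqlite3. It assumes a `Values` table with `DateTime`, `StreamID`, and `Value` columns (the names used on the indexing slide) and a hypothetical `Streams` table carrying per-stream metadata; the actual Vega schema may have different or additional columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical stream table: one row per time series, holding its metadata.
conn.execute(
    """
    CREATE TABLE Streams (
        StreamID INTEGER PRIMARY KEY,
        Site     TEXT NOT NULL,
        Variable TEXT NOT NULL,
        Units    TEXT NOT NULL,
        Offset_m REAL            -- sensor depth or height, if known
    )
    """
)

# One narrow table for every observation from every site and variable.
# "Values" is an SQL keyword, so the table name is quoted.
conn.execute(
    """
    CREATE TABLE "Values" (
        DateTime TEXT    NOT NULL,
        StreamID INTEGER NOT NULL REFERENCES Streams(StreamID),
        Value    REAL    NOT NULL
    )
    """
)

conn.executemany(
    "INSERT INTO Streams VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Airport", "Air temperature", "degC", None),
        (2, "Airport", "Wind speed", "m/s", None),
    ],
)
conn.executemany(
    'INSERT INTO "Values" VALUES (?, ?, ?)',
    [
        ("2007-01-01 12:00:00", 1, -5.1),  # the slide's example value
        ("2007-01-01 12:00:00", 2, 4.0),   # made-up wind speed
    ],
)

# A value gets its meaning by joining to its stream's metadata.
row = conn.execute(
    """
    SELECT s.Site, s.Variable, v.DateTime, v.Value, s.Units
    FROM "Values" v JOIN Streams s ON s.StreamID = v.StreamID
    WHERE s.Variable = 'Air temperature'
    """
).fetchone()
print(row)
```

In this layout, adding a site or a variable means inserting a row into `Streams`; neither table's structure changes.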


Vega: Simple


Indexing

• Speeds up searching through large tables
  – Vega impossible without it

• Similar to an alphabetized phonebook

• With index: time ~ log(number of rows)

• Without index: time ~ number of rows

• Values index (also unique)
  – DateTime
  – StreamID
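SQLite's `EXPLAIN QUERY PLAN` gives a quick way to see the difference described above. This sketch (hypothetical index name and query) asks for one stream over a time window before and after creating a unique index on (DateTime, StreamID): without the index the plan is a full SCAN of the table, and with it a SEARCH through the index's B-tree.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE "Values" (DateTime TEXT NOT NULL, StreamID INTEGER NOT NULL, Value REAL)'
)

# Hypothetical query: one stream over one month.
QUERY = (
    'SELECT DateTime, Value FROM "Values" '
    "WHERE StreamID = ? AND DateTime BETWEEN ? AND ?"
)
PARAMS = (1, "2007-01-01", "2007-02-01")


def plan(sql, params):
    """Return SQLite's query-plan description for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(r[-1] for r in rows)  # last column holds the detail text


before = plan(QUERY, PARAMS)  # full scan: cost grows with the number of rows

# Unique index on (DateTime, StreamID), as listed on the slide.
conn.execute(
    'CREATE UNIQUE INDEX ix_values_dt_stream ON "Values" (DateTime, StreamID)'
)
after = plan(QUERY, PARAMS)  # index search: cost grows with log(number of rows)

print(before)
print(after)
```

Column order matters here: for queries that fix one StreamID and a time range, an index leading with StreamID, i.e. (StreamID, DateTime), would let the search read only that stream's entries, whereas (DateTime, StreamID) reads every stream's entries within the window. Which order best suits Vega's full query mix is not covered by the slides.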


Performance

• Time to query a 40-million-value database
  – One value: 0.07 sec
  – ~20k values: 0.5 sec

• Data volumes
  – GLEON: ~90,000 new values per day
  – Currently storing 30 million values
  – Values table: 2.6 GB
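A back-of-envelope reading of these numbers, assuming "GB" means 10^9 bytes, that the 2.6 GB covers exactly the 30 million stored values, and that ingest stays near 90,000 values per day. None of the derived figures appear in the original; they only illustrate the scale.

```python
# Figures quoted on the slide.
VALUES_STORED = 30_000_000
TABLE_BYTES = 2.6e9            # assumption: GB = 10**9 bytes
NEW_VALUES_PER_DAY = 90_000
BULK_QUERY_SEC = 0.5           # ~20k values returned
BULK_QUERY_VALUES = 20_000

# Average on-disk size per stored value (whatever overhead the 2.6 GB includes).
bytes_per_value = TABLE_BYTES / VALUES_STORED

# Yearly growth if the ingest rate stays constant.
values_per_year = NEW_VALUES_PER_DAY * 365
gb_per_year = values_per_year * bytes_per_value / 1e9

# Amortized retrieval time per value in the ~20k-value query.
sec_per_value_bulk = BULK_QUERY_SEC / BULK_QUERY_VALUES

print(round(bytes_per_value))   # about 87 bytes per value
print(values_per_year)          # about 32.9 million new values per year
print(round(gb_per_year, 2))    # about 2.85 GB of table growth per year
print(sec_per_value_bulk)       # about 25 microseconds per value
```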


Software Development Gains

• Software for one site works for all sites

• Example: HTML
  – Many document formatting standards
  – HTML emerged as standard
  – Millions of websites can be read by one browser
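To make "software for one site works for all sites" concrete: with a shared Values/Streams layout, one function can serve any site. This Python sqlite3 sketch reuses the site names Mendota and Long Lake, but the table layout, the function name `get_series`, and the values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Streams (StreamID INTEGER PRIMARY KEY, Site TEXT, Variable TEXT);
    CREATE TABLE "Values" (DateTime TEXT, StreamID INTEGER, Value REAL);
    INSERT INTO Streams VALUES (1, 'Mendota', 'Air temperature'),
                               (2, 'Long Lake', 'Air temperature');
    INSERT INTO "Values" VALUES ('2007-01-01 12:00:00', 1, -4.0),
                                ('2007-01-01 12:00:00', 2, -6.5);
    """
)


def get_series(site, variable, start, end):
    """Fetch one time series; the same code serves every site and variable."""
    sql = """
        SELECT v.DateTime, v.Value
        FROM "Values" v JOIN Streams s ON s.StreamID = v.StreamID
        WHERE s.Site = ? AND s.Variable = ? AND v.DateTime BETWEEN ? AND ?
        ORDER BY v.DateTime
    """
    return conn.execute(sql, (site, variable, start, end)).fetchall()


for site in ("Mendota", "Long Lake"):
    print(site, get_series(site, "Air temperature", "2007-01-01", "2007-01-02"))
```

A new site only adds rows to `Streams` and `Values`; `get_series` needs no change.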


Current software for GLEON and Madison LTER: Data Acquisition


Data Retrieval: dbBadger.gleonrcn.org


Data QA/QC


Vision

• Simple software package
  – No IT support required
  – Facilitate web-enabled data sharing

• Future
  – Expand to all GLEON sites
  – Include those with a custom information management (IM) system in place


Acknowledgements

• This work was supported by National Science Foundation grants DEB-0217533, DBI-0639229, and DBI-0446017 and by the Gordon and Betty Moore Foundation.


Performance

[Figure: Time to Execute Query (sec) vs. Values Stored; series labeled ODM:37316, ODM:15537, ODM:7245, ODM:1 and Vega:44639, Vega:17279, Vega:8639, Vega:1]