Tools of the Data Smithe’s Trade Joe Smithe, Tim Hunter, Tad Slawecki, Steve Ruberg

Smithe’s Trade Tools of the Datajoeseph/docs/IAGLR2016IJCTalk.pdf · CUAHSI Hydroserver, THREDDS, MapServer, GeoServer, and Deegree implement above Web services (accessibility)

Feb 02, 2020



Tools of the Data Smithe’s Trade

Joe Smithe, Tim Hunter, Tad Slawecki, Steve Ruberg


Before we begin, a thank you:

Drew Gronewold, Tim Hunter, Steve Ruberg, Ron Muzzi, more…

Special thanks to the IJC for the invite


Recommendations to the IJC

● The end point: storage, access, analysis, presentation
  ○ Products of sensor technology infrastructure
  ○ Data from sensors to users, decision makers, etc.
● Some old tech are fine
● Some new tech are begging to be adopted
● Do what is socially sustainable and secure
  ○ Account for the retiring generations and the up and coming working ones
  ○ Adopt technologies with support from many people
    ■ Fair chance of hackers, greater chance of good programmers who can fix things fast


Labyrinths of data, hard to get around...


http://www.ibmbigdatahub.com/infographic/four-vs-big-data


http://s382.photobucket.com/user/Gandalf-lotr/media/Gandalfsfirework.jpg.html


http://corecanvas.s3.amazonaws.com/theonering-0188db0e/gallery/original/pippinmerry011128a.jpg


http://iihtofficialblog.blogspot.com/2014/07/5-vs-of-hadoop-big-data.html



Overview of Infrastructure Technology


DISCLAIMER: I HAVE NOT WORKED WITH ALL OF THESE TECHNOLOGIES. THIS IS MERELY A CATALOG OF TOOLS TO DISCUSS.


Overview

1. Target Platforms
2. Data Storage Formats
3. Data Management
4. Model Coupling/Combining
5. Probabilistic Modelling
6. Distributed processing
7. Modelling Services, Processing, Presentation
8. Visualization and interaction


Target Platforms

● Desktop
  ○ MS Windows
    ■ .NET
    ■ OneCore
  ○ Apple
    ■ OS X and Xcode
  ○ *nix
    ■ Various (Linux)
● Mobile or Tablet
  ○ Win Phone
  ○ Apple
    ■ iOS
  ○ Android
● Web
  ○ Microsoft
    ■ ASP.NET
  ○ Linux
    ■ LAMP
  ○ Other
    ■ WordPress
    ■ Drupal
    ■ Many more


Storage formats

● Plain text
  ○ “Future proof”
  ○ Growth can prove challenging
  ○ Examples: XML, WaterML, [other]ML, CSV
● Binary
  ○ Computers eat this stuff up, but humans don’t. Good to have transformers to create downloadable and ingestible copies
  ○ Examples: GRIB, NetCDF

BluePenguino - Photobucket
http://culturepopped.blogspot.com/2014/12/the-legends-of-pac-man.html
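
The "transformer" idea on the Storage formats slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the fixed-width record layout, the field names, and the sample values are all hypothetical stand-ins, and a real GRIB or NetCDF file would need a dedicated reader rather than the standard-library struct module used here.

```python
import csv
import io
import struct

# Hypothetical record layout (an assumption, not a real GRIB/NetCDF layout):
# station id (int32), epoch seconds (int64), water temperature in deg C
# (float64), little-endian with no padding.
RECORD = struct.Struct("<iqd")


def pack_records(rows):
    """Encode (station_id, epoch_seconds, temp_c) tuples as raw bytes."""
    return b"".join(RECORD.pack(*row) for row in rows)


def binary_to_csv(blob):
    """Decode the binary blob and return the same records as CSV text."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["station_id", "epoch_seconds", "temp_c"])
    for record in RECORD.iter_unpack(blob):
        writer.writerow(record)
    return out.getvalue()


# Made-up sample values, used only to exercise the round trip.
rows = [(45007, 1464739200, 12.5), (45007, 1464742800, 12.75)]
blob = pack_records(rows)      # compact, machine-friendly copy
csv_text = binary_to_csv(blob)  # plain-text copy a person can open and read
```

The binary copy stays the working format; the CSV copy is generated on demand for download, so there is still a single authoritative source.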


Data management

● Data provenance (origin): copies aren’t great, and version control systems offer limited help. Authoritative sources, and citations to them, mitigate noise and stray copies.
● Structured directories, even on the web
● Relational Database Management Systems (RDBMSs)
  ○ PostgreSQL (recommended), MySQL, SQLite
    ■ http://ask.metafilter.com/92162/MySQL-vs-PostgreSQL
  ○ Big Data - NoSQL, SciDB
  ○ Geospatial - PostGIS, SpatiaLite, MySQL Spatial
    ■ CUAHSI HydroServer, THREDDS, MapServer, GeoServer, and deegree implement the above
    ■ Web services (accessibility)
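
One way to picture provenance inside an RDBMS is to make every observation point at a cited source. The sketch below uses SQLite through Python's standard sqlite3 module only because it needs no server; the table names, column names, citation text, and water levels are all hypothetical, and the same schema idea would carry over to PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE source (
    id       INTEGER PRIMARY KEY,
    citation TEXT NOT NULL UNIQUE
);
CREATE TABLE observation (
    id        INTEGER PRIMARY KEY,
    source_id INTEGER NOT NULL REFERENCES source(id),
    lake      TEXT NOT NULL,
    obs_date  TEXT NOT NULL,
    level_m   REAL NOT NULL
);
""")

# Hypothetical citation and made-up values, for illustration only.
conn.execute(
    "INSERT INTO source (id, citation) VALUES (1, 'Hypothetical agency gauge record')"
)
conn.executemany(
    "INSERT INTO observation (source_id, lake, obs_date, level_m) VALUES (?, ?, ?, ?)",
    [(1, "Erie", "2016-06-01", 174.5), (1, "Erie", "2016-06-02", 174.25)],
)

# Every row comes back with the citation of its authoritative source.
rows = conn.execute(
    "SELECT o.lake, o.obs_date, o.level_m, s.citation "
    "FROM observation AS o JOIN source AS s ON s.id = o.source_id "
    "ORDER BY o.obs_date"
).fetchall()

# An observation that names an unknown source is rejected outright.
try:
    conn.execute(
        "INSERT INTO observation (source_id, lake, obs_date, level_m) "
        "VALUES (99, 'Erie', '2016-06-03', 174.0)"
    )
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```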


Data management - new tech to adopt

● GRAPH DATABASES
  ○ Fund them
  ○ Power++
    ■ Utilizes the power of graphs to explore relationships between data points
    ■ Understand, investigate many to many, one to many, many to one relationships with ease
  ○ http://cyanohub.earth.lsa.umich.edu/
  ○ For more: http://neo4j.com/developer/graph-db-vs-rdbms/ and http://mashable.com/2012/09/26/graph-databases/
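
To give a feel for why graphs make many-to-many relationships easy to explore, here is a small in-memory sketch. It is not a graph database and does not use any graph database API; the node names and relationship labels are hypothetical, and a plain breadth-first search stands in for the kind of traversal query a graph database would run.

```python
from collections import defaultdict, deque

# Hypothetical (source, relationship, target) triples.
edges = [
    ("sensor:buoy_a", "MEASURES", "variable:chlorophyll_a"),
    ("sensor:buoy_a", "LOCATED_IN", "lake:erie"),
    ("sensor:buoy_b", "MEASURES", "variable:chlorophyll_a"),
    ("sensor:buoy_b", "MEASURES", "variable:phosphorus"),
    ("sensor:buoy_b", "LOCATED_IN", "lake:huron"),
    ("model:bloom_forecast", "USES", "variable:chlorophyll_a"),
]

# Store each edge in both directions so relationships can be walked either way.
graph = defaultdict(set)
for src, _label, dst in edges:
    graph[src].add(dst)
    graph[dst].add(src)


def related(start, max_hops):
    """Breadth-first search: map each node within max_hops of start to its distance."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen[neighbour] = seen[node] + 1
                queue.append(neighbour)
    del seen[start]
    return seen


# Which sensors ultimately feed the forecast model? Two hops answers it.
feeds_forecast = related("model:bloom_forecast", 2)
```

In a relational schema the same question would need a join through at least one linking table per relationship type; here it is a single traversal.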


Model coupling or combining

● Java-based Object Modelling System
● OpenMI (Open Modelling Interface, C# and Java)
  ○ GUIs - OpenMI Configuration Editor, Pipistrelle

A lot of specialized models focus on limited domains, and via coupling, we can attain a modelling domain that spans current problems...
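
As a rough picture of what coupling means, the sketch below chains two toy models so that one model's output becomes the other's input at every time step. It does not use OpenMI or the Object Modelling System; the class names, parameters, and numbers are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
class RunoffModel:
    """Toy land-surface model: a fixed fraction of rainfall becomes runoff."""

    def __init__(self, runoff_fraction):
        self.runoff_fraction = runoff_fraction

    def step(self, rainfall_mm):
        return rainfall_mm * self.runoff_fraction


class LakeModel:
    """Toy lake model: storage rises with inflow and falls with a fixed outflow."""

    def __init__(self, storage_mm, outflow_mm):
        self.storage_mm = storage_mm
        self.outflow_mm = outflow_mm

    def step(self, inflow_mm):
        self.storage_mm = max(0.0, self.storage_mm + inflow_mm - self.outflow_mm)
        return self.storage_mm


def run_coupled(runoff_model, lake_model, rainfall_series):
    """At each time step, hand the runoff model's output to the lake model."""
    return [lake_model.step(runoff_model.step(rain)) for rain in rainfall_series]


# Made-up rainfall series (mm per step), half of which runs off into the lake.
levels = run_coupled(RunoffModel(0.5), LakeModel(100.0, 2.0), [0.0, 10.0, 4.0])
```

Each model only knows its own domain; the coupling function is the one place that knows how they connect, which is the role a coupling standard plays at larger scale.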


Probabilistic Modelling

● Bayesian hierarchical modelling is becoming a very popular approach in many problems where estimates are many but conclusions are few or divergent
  ○ JAGS
  ○ Stan
● Cha, Y. and C.A. Stow. 2014. A Bayesian network incorporating observation error to predict phosphorus and chlorophyll a in Saginaw Bay. Environmental Modelling & Software, 57: 90-100
● Gronewold, A.D., J. Bruxer, D. Durnford, J. Smith, A. Clites, F. Seglenieks, T. Hunter, S. Qian, V. Fortin (Accepted, 2016). Hydrological drivers of record-setting water level rise on Earth’s largest lake system. Water Resources Research.
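
The "many estimates, one conclusion" situation can be illustrated with the simplest Bayesian building block: a normal prior updated by several independent normal estimates with known variances. This is not a hierarchical model and is not how JAGS or Stan work internally (they sample from the posterior); it is only the closed-form conjugate update, with a hypothetical function name and made-up numbers.

```python
def pool_estimates(prior_mean, prior_var, estimates):
    """Combine a normal prior with independent normal estimates.

    estimates is a list of (value, variance) pairs. Returns the posterior
    mean and variance, where each input is weighted by its precision
    (1 / variance).
    """
    precision = 1.0 / prior_var
    weighted_sum = prior_mean / prior_var
    for value, var in estimates:
        precision += 1.0 / var
        weighted_sum += value / var
    return weighted_sum / precision, 1.0 / precision


# Made-up example: a vague prior and two estimates that disagree.
post_mean, post_var = pool_estimates(
    prior_mean=0.0,
    prior_var=4.0,
    estimates=[(10.0, 1.0), (14.0, 4.0)],
)
```

The more certain estimate (variance 1.0) pulls the pooled answer toward itself, and the pooled variance is smaller than that of any single input.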


Distributed processing

● High Performance Computers (HPCs, formerly supercomputers)
● MapReduce (key/value pairs as input)
  ○ programming model, similar to the Message Passing Interface (MPI)
  ○ scalable
  ○ reputation for fault tolerance (robust)
    ■ Apache Hadoop (an implementation)
    ■ R and Hadoop Integrated Processing Environment (RHIPE)
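
The MapReduce programming model itself fits in a few lines when run on a single machine. The sketch below is not Hadoop or RHIPE code and has none of their distribution or fault tolerance; the function names and the station temperatures are hypothetical, and the point is only to show the map, shuffle, and reduce phases operating on key/value pairs.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """Apply the mapper to every record, yielding (key, value) pairs."""
    for record in records:
        yield from mapper(record)


def shuffle(pairs):
    """Group all values that share a key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    """Collapse each key's list of values into a single result."""
    return {key: reducer(key, values) for key, values in groups.items()}


def station_temp(record):
    """Mapper: emit the station id as key and the temperature as value."""
    station, temp_c = record
    yield station, temp_c


def max_temp(_station, temps):
    """Reducer: keep the highest temperature seen for a station."""
    return max(temps)


# Made-up (station, temperature in deg C) readings.
records = [("A", 11.0), ("B", 9.5), ("A", 13.5), ("B", 10.0)]
result = reduce_phase(shuffle(map_phase(records, station_temp)), max_temp)
```

Because the mapper looks at one record at a time and the reducer at one key at a time, a framework is free to spread both phases across many machines.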


Modelling Services, Processing, Presentation

● MATLAB, R, Python (Anaconda distribution), assisted with shell scripting
  ○ http://www.talyarkoni.org/blog/2013/11/18/the-homogenization-of-scientific-computing-or-why-python-is-steadily-eating-other-languages-lunch/
● Julia
● Web Development
  ○ PHP, JavaScript (and packages, more later)
  ○ Frameworks under Java, Python, Ruby on Rails
  ○ *.NET Frameworks (Microsoft)
  ○ Backbone.js, Django
  ○ Content Management Systems (CMSs) such as Drupal, CKAN


Fireworks (Visualization)

● Often cast as the data themselves...
● JavaScript Packages: jqPlot, Flot, Processing (language), Raphaël, D3 (successor to Protovis), Google Charts, and Dygraphs
● Apache Flex
● Mapping: OpenLayers, Google Earth/Maps
● Interfaces: CUAHSI HydroShare, QGIS (like ArcGIS), uDig
● Desktop plotting packages:
  ○ R: ggplot2, ggvis, rgl, and default packages
  ○ Python: Matplotlib, Plotly, PyChart...
    ■ https://wiki.python.org/moin/NumericAndScientific/Plotting


jpTheSmithe.com


Relevant parchments:

All from Environmental Modelling & Software:
● Web technologies for environmental big data (Open Access), Vitolo et al. (2015)
● Web based visualization of large climate data sets, J. R. Alder and S.W. Hostetler (2015)
● A review of open source software solutions for developing water resources web applications, Swain et al. (2015)

And we’ll probably do this again in 5-10 years next year!


Recommendations to the IJC

● The end point: storage, access, analysis, presentation
  ○ Products of sensor technology infrastructure
  ○ Data from sensors to users, decision makers, etc.
● Some old tech are fine
● Some new tech are begging to be adopted
● Do what is socially sustainable and secure
  ○ Account for the retiring generations and the up and coming working ones
  ○ Adopt technologies with support from many people
    ■ Fair chance of hackers, greater chance of good programmers who can fix things fast