Bernadette Hyland, CEO & co-founder
11911 Freedom Drive, Suite 850, Reston, VA 20190
Tel. +1-571-331-3758
[email protected] @BernHyland
[email protected] @3RoundStones
Extend Your Reach. Linked Data for Smarter Decisions.
Follow-up information prepared for Robin Thottungal, Chief Data Scientist / Director of Analytics, US Environmental Protection Agency - Feb 26, 2016
3 Round Stones Briefing to U.S. EPA's Chief Data Scientist on Open Data
Credit: Frederick Giasson, Data Scientist & Software Developer, http://fgiasson.com/blog/index.php/2014/07/23/big-structures-where-the-semantic-web-meets-artificial-intelligence/
Hadoop Integration
»While over 90% of the world’s data has been created in the last two years, EPA has a tremendous variety of data that requires the “right tool for the job”
»Historic data (“short, wide, complex data”) vs.
»Granular sensor & GIS data (“long skinny data”)
»Core mission-based systems with robust historic data, includes:
»Toxics Release Inventory (TRI)
»Facilities Registry (FRS)
»RCRA Handler
»EPA’s enterprise information architecture should include a data platform that leverages Hadoop: HDFS and MapReduce, and accommodates EPA’s robust data landscape.
»Must support modern, open source tools for application development, visualizations, crowdsourcing, and deployment on the Web
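The contrast between “short, wide” historic data and “long skinny” sensor data can be illustrated with a minimal sketch. The record below is made up for illustration (the facility identifier, column names, and values are assumptions, not EPA data); it only shows how one wide row becomes several narrow rows.

```python
# Hypothetical "short, wide" record: one row per facility and year, many columns.
# All field names and values here are invented for illustration.
wide_record = {
    "facility_id": "FAC-001",
    "year": 2014,
    "benzene_lbs": 120.0,
    "toluene_lbs": 45.5,
    "lead_lbs": 3.2,
}


def wide_to_long(record, key_fields=("facility_id", "year")):
    """Turn one wide record into narrow (keys..., measure, value) rows."""
    keys = tuple(record[k] for k in key_fields)
    return [
        keys + (name, value)
        for name, value in record.items()
        if name not in key_fields
    ]


long_rows = wide_to_long(wide_record)
for row in long_rows:
    print(row)
```

Wide records tend to suit reporting systems with many attributes per entity, while the long form is the shape that high-volume sensor feeds usually arrive in.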
One option - MarkLogic Integrates Hadoop Ecosystem & EPA’s Robust Data Landscape
Governments Worldwide are using a Linked Data Approach
Linked Data Apps use data from many
EPA programs and other Open Data Sources
Linked Data Management System
For government open data publishing
Funded by
Linked Data Platform is in QA now! https://usepa.3roundstones.net Anticipated to move to production in 2016.
shared innovation™
Search for facilities where we live. Unlike many EPA Web portals, linked data is human AND machine readable data. No screen scraping is required. Encourages re-use (discourages data silos)
The EPA Linked Data service CONNECTS data silos, and provides familiar map and table data views
Click to drill down to pollution reports that combine data from 5 previously unconnected data silos.
Click through to the source of the pollution data via the source reports (TRI).
EPA collects granular pollution data. Linked Data opens up the data to a much wider audience in a human readable format.
Previously, only people who employed complex screen scraping techniques could get at this data. Now, EPA open data is available using an international data standard, with one click!
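As a rough sketch of what machine-readable access can look like, a client can ask a linked data server for RDF instead of HTML by setting the HTTP Accept header. The host below is the QA address named in this briefing; the resource path is hypothetical, and whether the service answers a `text/turtle` request is an assumption. The request is only constructed here, not sent.

```python
from urllib.request import Request

BASE_URL = "https://usepa.3roundstones.net"   # QA host named in this briefing
HYPOTHETICAL_PATH = "/example/facility/123"   # made-up path, for illustration only


def build_rdf_request(url, media_type="text/turtle"):
    """Build (but do not send) a GET request that asks for an RDF serialization."""
    return Request(url, headers={"Accept": media_type})


req = build_rdf_request(BASE_URL + HYPOTHETICAL_PATH)
print(req.get_method(), req.full_url)
print("Accept:", req.get_header("Accept"))
```

Passing the request to `urllib.request.urlopen` would perform the actual fetch; the same URL opened in a browser would normally return the human-readable page.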
Good news story!
Pollution graphs created in one week using Open Source Software & EPA Linked Data
Shared vocabularies, e.g. Places, Geographies, Dublin Core, Geo, FOAF, ORG, and vCard, are the “lingua franca” of data interoperability
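To make the idea of shared vocabularies concrete, here is a minimal sketch that describes one resource using terms from Dublin Core, FOAF, and the W3C WGS84 Geo vocabulary, written out as N-Triples. The namespace URIs are the published ones; the facility URI, name, homepage, and coordinates are invented, and reading “Geo” as the W3C WGS84 vocabulary is an assumption.

```python
# Published namespace URIs for three widely shared vocabularies.
DCTERMS = "http://purl.org/dc/terms/"
FOAF = "http://xmlns.com/foaf/0.1/"
GEO = "http://www.w3.org/2003/01/geo/wgs84_pos#"


def triple(subject, predicate, obj, literal=False):
    """Format one N-Triples statement (no escaping of special characters)."""
    o = '"%s"' % obj if literal else "<%s>" % obj
    return "<%s> <%s> %s ." % (subject, predicate, o)


# Hypothetical resource and values, for illustration only.
facility = "http://example.org/facility/123"

lines = [
    triple(facility, DCTERMS + "title", "Example Facility", literal=True),
    triple(facility, FOAF + "homepage", "http://example.org/"),
    triple(facility, GEO + "lat", "38.95", literal=True),
    triple(facility, GEO + "long", "-77.35", literal=True),
]
print("\n".join(lines))
```

Because the predicates are shared URIs rather than local column names, any consumer that already understands Dublin Core or FOAF can interpret these statements without a custom mapping.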
Case Study: Using EPA Linked Data to assist chronic asthma/COPD patients with timely weather alerts
Funded by
[Diagram labels: User; NOAA; US EPA AirNow; DBpedia; National Library of Medicine; US EPA SunWise]
Case Study: Orgpedia
An open organizational data project on public & private companies
Funded by
Using the Callimachus Open Source Data Platform, we rapidly built a crowdsourcing platform.
3 Round Stones provides commercial application
support on the cloud or behind the enterprise firewall using
Callimachus Enterprise customers are creating data-driven applications with data from leading graph
databases:
Callimachus is a scalable Web application server for publishing and consuming open data
Who uses it?
• Government, international publishers, healthcare / life sciences
What pain does Callimachus address?
• Integration of data silos where a graph approach is needed
• Rapid creation of visualizations, dashboards (mashups) & infographics
• Less expensive alternative to a data warehouse