Next Generation Environmental Informatics as exemplified by the Tetherless World Semantic Water Quality Portal

Ping Wang 1, Jin Guang Zheng 1, Linyun Fu 1, Evan W. Patton 1, Timothy Lebo 1, Li Ding 1, Joanne S. Luciano 1, and Deborah L. McGuinness 1
(1 Rensselaer Polytechnic Institute, 110 8th St., Troy, NY, 12180, United States)

Poster: IN31B-1438

Glossary:
EPA – U.S. Environmental Protection Agency
MPN – Most Probable Number
PML 2 – Proof Markup Language (PML) version 2
RPI – Rensselaer Polytechnic Institute
TWC – Tetherless World Constellation at Rensselaer Polytechnic Institute
USGS – United States Geological Survey

Motivation
In late 2009, E. coli contaminated the public water supply in Bristol County, RI, causing illnesses in the population, particularly among young children. Residents requested information about when the contamination began, how it happened, and what measures were being taken to monitor and prevent future occurrences. That event reflected the increasing demand for direct and transparent access to ecological and environmental information, and it inspired the Semantic Water Quality Portal (SemantAqua) project.

Next Generation Environmental Informatics
Starting with the domain of water quality, we are investigating a general framework called SemantEco that can support dynamic environmental informatics portals via semantically enabled approaches, including:
• capture of the semantics of domain knowledge using a family of modular, simple OWL2 ontologies,
• integration of environmental monitoring and regulation data from multiple sources following Linked Data principles,
• preservation of provenance metadata using the Proof Markup Language (PML) version 2,
• inference of environmental pollution events using OWL2 inference.
Combined with distributed sensor networks and incremental OWL2 classification, this work could provide a scaffold for deploying near real-time reporting of pollution events in communities.

SemantAqua Workflow

Location-based Information Retrieval
Users input a ZIP Code™ to identify the area for their search. SemantAqua uses Geonames to look up additional information, e.g. city and state, to generate a location-based query over the USGS and EPA datasets. The mobile interface also takes advantage of the W3C geolocation APIs to find polluted sites near the user.

Enabling Context-Sensitive Actions
To help users take an active role in monitoring water quality where they live, SemantAqua attempts to identify useful links where users can report problems with their local water supplies. Currently, the portal supports reporting to the EPA and to some state departments concerned with environmental preservation and protection (e.g. the California Department of Fish and Game). Work is still ongoing to identify the appropriate links to external authorities that accept reports within their jurisdictions.

Provenance-based Query
SemantAqua captures provenance information during the data integration stages and encodes it in the Proof Markup Language version 2 Provenance Interlingua. The provenance information is being used to support provenance-based queries.
For example, the system allows users to inspect data source information and choose to rely only on data from sources they trust. This will be particularly important as the portal expands to include other sources of data (see Future Work).

Using Ontologies as Facets
Regulations are encoded as ontologies, and each ontology offers a potential view of the world. Users can select from a number of different regulation ontologies to classify the data, allowing them to see differences between state regulations and the regulations set forth by the EPA. In addition, type information from the water ontology, which describes the different types of measurement sites and their pollution status, gives the user some control over what is displayed on the map.

More Customized Queries
The Characteristic, Health Concern, and Time Frame facets enable the user to customize his/her query. The user can issue the queries most relevant to his/her interests:
• What sites/facilities in this area are polluted with these specific contaminants, e.g. fecal coliform, lead?
• What polluted sites/facilities are contaminated with pollutants that could cause symptoms or health problems, e.g. Di[...]
• What sites/facilities were polluted in the past two years?

Data Presentation
Different icons are used to differentiate polluted sites from clean sites. Clicking on a polluted site displays a popup window with more details about the pollution events: names of contaminants, measured values, limit values, time of measurement, and health effects.

[Figure: workflow diagram; stage labels: Archive, CSV2RDF4LOD Direct, CSV2RDF4LOD Enhance, Publish, Reason, Visualize; edge labels: derive, archive]

Connecting to Health Issues
To help citizens investigate the health impacts of water pollution, SemantAqua links water quality data to some known health considerations. We have generated an initial ontology describing potential health impacts of overexposure to contaminants. Initial content came from the EPA. For example, exposure to E. coli results in abdominal cramping and diarrhea, and if left untreated can result in high blood pressure and kidney damage.
This health information is presented to the user together with the pollution details (see Data Presentation) and is also used to customize information retrieval (see More Customized Queries).

Time Series Visualization
The time series visualization retrieves water quality data for a selected water site or facility by querying the triple store and displays the data as a time series. The user selects a particular permit for a facility, the characteristic of the water, and the test type (if any) associated with that characteristic. For the EPA data there are up to five different test types that take measurements in different ways and compute the limits differently: Quantity Average, Quantity Max, Concentration Min, Concentration Average, Concentration Max. The visualization on the right shows the quality of the water released by the Southeast Water Pollution Control Plant located in San Francisco. The plot shows the enterococci measurements in green and the regulation-defined limit in blue. Three severe violations (in red) occurred during 2009 and 2010. Access to such information can help citizens be more informed and make requests to the state administrator to improve the handling of the water at the local facilities.

Future Work
Currently, twenty-seven states out of fifty have been encoded in RDF using the SemantEco and SemantAqua ontologies, and work continues on converting the remaining states. The current portal contains the regulatory information [...] the fifty states. An effort is underway to encode additional regulatory information from different states and to identify which states simply defer to the EPA on different pollutants, as the EPA regulations have already been encoded. In addition, work on linking contaminants to external resources such as DBpedia and symptom a[...] information from sources such as WebMD will provide the data needed to answer the more interesting questions about the health impacts of pollution.
We have also initiated work on linking to reporting systems at the federal [...] so that users can report potential issues in their neighborhoods, thus making this portal a [...] environmental change. Lastly, we plan to augment the portal to generate data reports of users' query results that contain the query specification, identified pollution events, relevant converted and source data, and provenance [...]. These data reports can be useful when users report their findings to authorities or environmental organizations.

Sponsors:
Visit our project page at: http://tw.rpi.edu/web/project/SemantAQ
Try it out: http://aquarius.tw.rpi.edu/projects/sem
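
Illustrative sketch: The idea behind Using Ontologies as Facets, where the same measurements are classified as polluted or clean depending on which regulation the user selects, can be shown with a small Python example. This is only a sketch under stated assumptions: the portal itself performs this classification with OWL2 inference over regulation ontologies, whereas the code below substitutes a plain threshold comparison. All names (Measurement, REGULATIONS, find_violations, polluted_sites), the site identifiers, and the numeric limits are hypothetical and chosen for illustration; they are not actual EPA or state limits and not part of the SemantAqua code base.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Measurement:
    """One water quality reading at a measurement site (hypothetical schema)."""
    site: str
    characteristic: str  # e.g. "lead", "enterococci"
    value: float
    date: str  # ISO date string


# Hypothetical limits per regulation; NOT real regulatory thresholds.
# Each regulation plays the role of one selectable regulation ontology.
REGULATIONS: Dict[str, Dict[str, float]] = {
    "regulation_a": {"lead": 10.0},
    "regulation_b": {"lead": 5.0},
}

# Made-up sample readings used only to exercise the functions below.
SAMPLE: List[Measurement] = [
    Measurement("site-1", "lead", 7.0, "2010-03-01"),
    Measurement("site-2", "lead", 12.0, "2010-03-02"),
    Measurement("site-3", "arsenic", 50.0, "2010-03-03"),
]


def find_violations(measurements: List[Measurement], regulation: str) -> List[Measurement]:
    """Return readings whose value exceeds the limit of the chosen regulation.

    Characteristics that the regulation does not cover are skipped,
    since no limit is available to compare against.
    """
    limits = REGULATIONS[regulation]
    return [
        m for m in measurements
        if m.characteristic in limits and m.value > limits[m.characteristic]
    ]


def polluted_sites(measurements: List[Measurement], regulation: str) -> List[str]:
    """Return the sorted names of sites with at least one violation."""
    return sorted({m.site for m in find_violations(measurements, regulation)})


if __name__ == "__main__":
    for name in REGULATIONS:
        print(name, polluted_sites(SAMPLE, name))
```

With the sample data, selecting the stricter hypothetical regulation_b flags both site-1 and site-2, while regulation_a flags only site-2. This mirrors how switching the regulation facet changes which sites appear as polluted on the map.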