Abstracts for the Workshop on Semantics in Geospatial Architectures:
Applications and Implementation 28-29 October 2013
Pyle Center, University of Wisconsin-Madison (Abstracts, listed alphabetically by first author, linked to page number)
Semantic Issues in Land Use and Land Cover Studies – Foundations, Application and Future Directions (Ola Ahlqvist) 2
Enhanced Semantics for Gazetteers (Kate Beard) 4
Working on Common Frameworks for Data Infrastructures (Gary Berg-Cross) 6
Open Geospatial Consortium (OGC) (Luis Bermudez) 7
Enabling Semantic Search in Geospatial Metadata Catalogue to Support Polar Sciences (Vidit Bhatia [Wenwen Li]) 8
World Historical Ontology Research in CHIA Project (Kai Cao) 10
Enabling Semantic Mediation in OGC SWE (Janet Fredericks) 12
The Many Semantic Domains of Spatial Data Infrastructure (Mark Gahegan) 14
The iPlant Collaborative Semantic Web Platform (Damian Gessler) 16
Semantic Portals for Semantic Spatial Data Infrastructures (Francis Harvey) 18
Implementation Issues in using Parliament for GeoSPARQL (Dave Kolas) 19
Geosearch: A System Utilizing Ontology and Knowledge Reasoning to Support Geospatial Data Discovery (Kai Liu [Chaowei Yang]) 20
Developing Semantics Rules using Evolutionary Computation for Information Extraction from Remotely Sensed Imagery (Henrique Momm) 21
Reading News with Maps (Hanan Samet) 22
Spatial Semantics enhanced Geoscience Interoperability, Analytics, and Applications (Krishnaprasad Thirunarayan)(T.K. Prasad) [work done with Amit Sheth] 23
The Need to Determine Ontology System Requirements for Online Graduate Students (Dalia Varanka) 24
The GeoQuery Tool for Parliament SPARQL (James Wilson) 26
1
Semantic Issues in Land Use and Land Cover Studies – Foundations, Application and Future Directions (Ola Ahlqvist)

Open Geospatial Consortium (OGC) (Luis Bermudez)

The Open Geospatial Consortium (OGC) is an international industry consortium of 474
companies, government agencies, and universities participating in a consensus process to
develop publicly available interface standards. These standards support interoperable solutions
that "geo-enable" the Web, wireless and location-based services, and mainstream IT. The
standards empower technology developers to make complex spatial information and services
accessible and useful with all kinds of applications.
This presentation will address the importance of standards and how to implement semantic
capabilities in statewide geospatial information systems. These systems will likely use
standards such as the OGC Web Feature Service (WFS) to publish vector data (e.g., polygons
representing parcels or state boundaries). In its last two testbeds (OWS-8 and OWS-9), OGC
has advanced cross-community interoperability. Feature types could be mapped to other states’
data to better share information across states, or features could be categorized depending on the
purpose (customer or decision maker).
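As a concrete illustration of the standards-based access described above, the sketch below builds a WFS 2.0 GetFeature request for a statewide parcel layer. The endpoint URL and feature type name are invented for the example, not an actual state service:

```python
from urllib.parse import urlencode

# Hypothetical statewide WFS endpoint (illustrative only).
base_url = "https://gis.example.state.us/wfs"
params = {
    "service": "WFS",
    "version": "2.0.0",
    "request": "GetFeature",
    "typeNames": "state:parcels",  # a published feature type, e.g. parcel polygons
    "outputFormat": "application/gml+xml; version=3.2",
    "count": "10",                 # limit the response to ten features
}
request_url = f"{base_url}?{urlencode(params)}"
print(request_url)
```

A client issuing this HTTP GET would receive GML-encoded features, which is what makes the published vector data machine-harvestable across state lines.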
7
Enabling Semantic Search in Geospatial Metadata Catalogue to Support Polar Sciences
Wenwen Li1 and Vidit Bhatia2
1GeoDa Center for Geospatial Analysis and Computation, School of Geographical Sciences and Urban Planning, Arizona State University, Tempe, AZ 85287
2Department of Computer Science, Arizona State University, Tempe, AZ 85287 {wenwen, vidit.bhatia}@asu.edu
Studies of polar regions have become increasingly important in recent years because of (1) increasing interest in mining and natural resource exploration; (2) the sensitivity of both poles to human activities and to global environmental and climate change; and (3) the role of polar regions as key drivers of the Earth’s climate. In May 2013, the White House released the “President’s National Strategy for the Arctic Region” and identified “increasing understanding of the Arctic through scientific research and traditional knowledge” and “making decisions using the best available information” as the overarching stewardship objectives to achieve in the coming decade.

Fortunately, we are entering the era of big data. Pervasive technologies for Earth observation, such as sensor networks, high-resolution telescopes, and polar satellites, enable the retrieval of large amounts of polar data to accelerate the scientific discovery process. Several data centers, including ACADIS (https://www.aoncadis.org), NSIDC (http://nsidc.org/data/search/data-search.html) and the Antarctic and Southern Ocean Data Portal (http://www.marine-geo.org/portals/antarctic/), have been established to share these resources. These portals usually provide a metadata catalogue to support the discovery of data of interest through a keyword-based search interface. Currently, Lucene-based text search is widely used in these portals, and this approach hinders the retrieval of semantically related datasets whose content is described using a different keyword set from the user’s query. To enable intelligent search and a smart connection between an end user and the most relevant datasets, semantic search comes into play. This search strategy can be categorized into two classes: ontology-based semantic expansion and smart search based on knowledge mining.
The ontology-based approach can be considered a top-down approach: possible semantic linkages are populated by domain experts and encoded in a machine-understandable format, and a user’s query is then expanded by traversing these predefined semantic linkages/relationships. This approach assumes that the semantic relationships in the data can be well captured in advance. However, different people tend to have different perspectives on how an ontology should be established, and it is extremely difficult to build a complete knowledge base to serve various search purposes. To overcome this issue, in this work we employ a bottom-up approach that relies on mining the dataset itself to discover the latent semantic relationships between keywords/terms in the metadata corpus. A latent semantic indexing technique combined with the Paice stemming algorithm is performed on top of the Lucene indexing to further improve
the search results. A new ranking algorithm based on a revised cosine similarity and a two-tier ranking scheme ensures high precision of the top search results. In addition, we integrated this approach into a popular metadata catalogue, GeoNetwork (http://geonetwork-opensource.org/), to broadly share this semantic search capability with peers. We expect this work to greatly enhance the capability of data search in existing polar data portals and geospatial data discovery at large.
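The bottom-up strategy described above can be sketched in miniature. The toy example below, with an invented four-record metadata corpus, applies latent semantic indexing via a truncated SVD of a term-document matrix and ranks documents by cosine similarity in concept space; a production system would build on Lucene’s index and apply Paice stemming first:

```python
import numpy as np

# Toy metadata corpus: each "document" is a short dataset description.
docs = [
    "sea ice extent arctic satellite",
    "glacier melt greenland ice sheet",
    "ocean temperature sensor buoy",
    "arctic sea ice concentration",
]
query = "polar ice coverage"  # shares few literal keywords with the docs

# Term-document matrix of raw counts (a real system would use TF-IDF).
vocab = sorted({w for d in docs for w in d.split()} | set(query.split()))

def vectorize(text):
    words = text.split()
    return np.array([words.count(t) for t in vocab], dtype=float)

A = np.stack([vectorize(d) for d in docs], axis=1)  # terms x docs

# Latent semantic indexing: truncated SVD projects terms/docs into k concepts.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
Uk, sk = U[:, :k], s[:k]

def to_concept_space(vec):
    return (vec @ Uk) / sk  # fold a term vector into the concept space

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

q = to_concept_space(vectorize(query))
scores = [cosine(q, to_concept_space(vectorize(d))) for d in docs]
best = int(np.argmax(scores))
print(docs[best])
```

Because ranking happens in concept space rather than keyword space, a query term like "polar" can still retrieve ice-related records it never literally matches.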
World Historical Ontology Research in CHIA Project (Kai Cao)

metadata—the description of values and variables in each data set and the recording of
the sources and compilers of data. The incorporation of such existing detailed
classifications means that data-ingest work can start before the high-level framework –
the overall project ontology – is finalized. Later stages of ontology development include more
comprehensive categorization of types of data, definitions and classifications for the
linkage and aggregation of datasets, and definitions for the analysis and visualization of
data.
As one of the primary researchers on the NSF-funded CHIA project, I believe that, through
this workshop, world historical ontology research, and even the study of world historical
gazetteers, could benefit greatly from the other successful applications and
implementations of semantics in geospatial architectures.
I hope to be one of the attendees of this workshop (with no presentation).
Many thanks.
Best regards,
Postdoctoral Research Associate, University of Pittsburgh
Visiting Research Fellow, Harvard University
11
Janet Fredericks
Woods Hole Oceanographic Institution
[email protected]

Enabling Semantic Mediation in OGC SWE

The OGC has developed core standards to provide a framework that enables machine-to-machine harvesting of observational geospatial data and metadata. What is under the hood doesn’t matter: data can be stored in native data systems. Upon an HTTP request, the services return the OGC-adopted encodings that encapsulate the information, enabling machine harvesting of data selected through geospatial and temporal queries, as well as other specifications, depending on the implementation. The use of the OGC standards supports brokering activities that can provide translations across standards. But use of the adopted service standards in a collaborative environment requires an implementation designed to enable semantic mediation.

The OGC Sensor Observation Service (SOS) offers a standards-based framework in which to describe observational provenance (SensorML) as well as observational data (O&M). OGC SOS has been adopted in real-time ocean observing systems, such as the NOAA IOOS and the associated regional associations (NFRA). Through participation in the EarthCube Brokering Team Hackathons, a demonstration SOS delivering oceanic wave data was tested on three brokering sites. Through the NOAA ERDDAP broker, the WHOI/Q2O (Quality to OGC) SOS implementation (q2o.whoi.edu/node/129) was translated into ISO metadata with NetCDF or TSV requested output. Brokering services enable users to choose to work in frameworks beyond the primary data offering without installing or developing translation tools. Through the ESRI Geoportal and the Data Access Broker, catalogue services were automatically populated with information relating to geospatial and temporal coverage along with basic metadata, enabling data discovery and access. The Q2O SOS demonstration was developed, with funding from NOAA, to enable dynamic quality assessment.
It is content-rich and delivers information about how the observations came to be, as well as information about quality tests and associated real-time results. The implementation also demonstrates the ability to enable the development of ontologies by integrating links within the SOS to URLs that resolve to SKOS-encoded terms (Figure 1). These ontologies can be utilized in collaborative environments where terms with the same or similar meanings may have different names but must have associations. For example (Figure 2), one provider’s QC test result is called pass and another’s is called _1, and a data aggregator can map each to the same meaning. One can also use the mapped terms to harmonize encoded code values: one provider’s pass may have a value of one (1), while another provider may use a value of zero (0) to represent a passed QC test. Through inclusion of links to encoded terms, these values can be mapped to have the same meaning when integrating and filtering data offerings. The use of standards in geospatial data access is important. But without the inclusion of references to registered terms in a semantics framework, ontologies cannot be developed, making automated data assessment and integration nearly impossible.
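The term-mapping idea described above can be sketched as a simple lookup structure; the provider names, flag spellings, and concept URIs below are hypothetical stand-ins for SKOS-registered terms:

```python
# Hypothetical mapping of provider-specific QC flag terms/codes to shared
# SKOS-style concept URIs (the URIs are illustrative, not real registries).
SHARED_PASS = "http://example.org/qc#pass"
SHARED_FAIL = "http://example.org/qc#fail"

provider_mappings = {
    "provider_a": {"pass": SHARED_PASS, "fail": SHARED_FAIL},
    "provider_b": {"_1": SHARED_PASS, "_0": SHARED_FAIL},  # underscore terms
    "provider_c": {0: SHARED_PASS, 1: SHARED_FAIL},        # 0 means "passed" here
}

def harmonize(provider, flag):
    """Map a provider-specific QC flag to the shared concept, or None."""
    return provider_mappings.get(provider, {}).get(flag)

# A data aggregator can now filter observations uniformly across providers:
obs = [("provider_a", "pass"), ("provider_b", "_1"), ("provider_c", 0)]
passed = [o for o in obs if harmonize(*o) == SHARED_PASS]
print(len(passed))
```

All three observations map to the same shared "pass" concept even though each provider spells (or numbers) its flag differently, which is exactly what makes cross-provider filtering possible.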
The Many Semantic Domains of Spatial Data Infrastructure
Mark Gahegan and Ben Adams
The Centre for eResearch, University of Auckland
New Zealand
Collections of geospatial data can become overwhelming to search and to organize, yet the
successful management and description of data is an essential step towards an effective
Spatial Data Infrastructure (SDI). The following list introduces five very difficult challenges,
none of them solved as yet.
First, there’s the volume of the data, from massive imagery collections, new sensing
technologies, crowd-sourced data and more, along with better availability of more traditional
data, such as roads and census data. The data production rate is staggering.
Second, there’s the complexity of the data itself, often with a generous array of attributes,
with intricate spatial encodings and complex geometric and topologic relations.
Third, there’s the variety of conceptual models used, encompassing imagery, collections of
features (objects), thematic maps, regular and irregular networks, graphs, point clouds, even
place-based data that is devoid of any explicit geography.
Fourth, there is the problem of attribute semantics. Some data have rich schemas and
ontologies to help describe their meaning, but most do not. Where schemas do exist, they
often do not align well with each other. So being certain of meaning, or harmonizing data to
ensure a consistent meaning, is challenging at best.
Fifth, there’s the difficulty of establishing the authority of data. Within an increasingly
complicated network of data suppliers, how does a researcher know which data to trust, or
which data has been through a quality control process, or which data is the most reliable?
Buried within this issue are all the traditional problems of provenance, accuracy and fitness
for use.
We tend to think of the ‘semantic problem’ as pertaining only to attribute data, but in fact
there are semantic issues across all of the above domains (and possibly more domains
besides). So judging the ‘worth’ or ‘utility’ of a dataset for a given task requires that we
reason across all of these domains together—in an SDI, there really is no point in solving any
of the problems in isolation, since even datasets that harmonize correctly are no use at all if
they have the wrong conceptual model or are not at all trustworthy.
When searching for, or trying to understand, a dataset, how should each of these domains
or dimensions be presented and explored? Can they all be harmonized into a single
conceptual model for an SDI? Can some kind of ‘fitness-for-use’ score be calculated across
the combined space, and used as an aid to locating suitable datasets?
In this talk, we:
- Briefly recap each of the above issues.
- Tentatively propose an over-arching framework to organize the above domains into a series of dimensions that can be stacked together to support some kind of distance metrics. Such metrics allow us to represent the similarity between datasets—given some objective function defined by the user—and thus the appropriateness of a dataset for a given task.
- Introduce some of the semantic challenges that need to be overcome in order to reason over such a complex and multi-faceted space.
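One way to make the dimension-stacking idea concrete is a weighted distance between a dataset's profile and a user-supplied ideal profile. The sketch below is a minimal illustration only; the dimension names, scores, and weighting scheme are assumptions, not the authors' actual metric:

```python
import math

# Illustrative dataset profiles scored 0-1 along the five SDI semantic
# domains discussed above; all names and scores here are hypothetical.
DIMENSIONS = ["volume", "complexity", "conceptual_model",
              "attribute_semantics", "authority"]

def fitness_distance(dataset, requirement, weights=None):
    """Weighted Euclidean distance between a dataset's profile and the
    user's objective function (their ideal profile for the task)."""
    weights = weights or {d: 1.0 for d in DIMENSIONS}
    return math.sqrt(sum(
        weights[d] * (dataset[d] - requirement[d]) ** 2 for d in DIMENSIONS
    ))

ideal = dict(volume=0.3, complexity=0.5, conceptual_model=1.0,
             attribute_semantics=0.9, authority=1.0)
candidates = {
    "landcover_a": dict(volume=0.4, complexity=0.5, conceptual_model=0.9,
                        attribute_semantics=0.8, authority=0.9),
    "landcover_b": dict(volume=0.9, complexity=0.2, conceptual_model=0.3,
                        attribute_semantics=0.4, authority=0.5),
}
ranked = sorted(candidates, key=lambda n: fitness_distance(candidates[n], ideal))
print(ranked[0])  # the candidate closest to the user's ideal profile
```

The point of such a combined score is the one made in the talk: a dataset that harmonizes well on attribute semantics alone can still rank poorly if its conceptual model or authority is wrong for the task.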
15
Damian Gessler
Semantic Web Architect
The iPlant Collaborative
University of Arizona
Tucson, AZ 85721
[email protected]

Indication: Can attend the meeting and discuss iPlant’s Semantic Web Platform at the ‘Workshop on Semantics in Geospatial Architectures: Applications and Implementation.’

The iPlant Collaborative Semantic Web Platform

Geospatial semantics has huge promise. Yet implementing semantics in large infrastructures is challenging. Early implementers face significant obstacles in migrating from research-grade proof-of-concept applications to production-grade, value-added platforms. The well-informed perceived promise of semantics may cloud inconvenient “details” that significantly hinder operational maturity. Yet the promise is real, and the complexity of today’s earth science challenges implies that computational semantics has an important role. To get from promise to realization, we need a sober understanding of the challenges and solutions.

Cyberinfrastructure semantics is challenging because there is no generally adopted technology stack that integrates the various technology layers and social norms into a readily accessible platform for the end user. Thus semantic technologies—from RDF (Resource Description Framework) and OWL (Web Ontology Language) to pseudo-semantic ontologies such as OBO (Open Biological and Biomedical Ontologies), Darwin Core, schema.org, OGC (Open Geospatial Consortium), and LOD (Linked Open Data)—exist in a disjointed ecosystem of ad hoc installations and social contracts. Indeed, the implied semantics inherent in making any individual system operate often outweigh the explicit semantics that are needed for computational and integrative maturity. The iPlant Collaborative—an NSF-funded large cyberinfrastructure for the plant sciences—approaches this challenge with a three-tier architecture.
At the Foundational layer is a tight collaboration with NSF XSEDE resources (Extreme Science and Engineering Discovery Environment; https://www.xsede.org). This delivers world-class high-performance computing clusters (“big iron”) at the peta-FLOPS and petabyte scale [O(10^15) floating point operations per second and storage capacity, respectively]. The next tier is an Enterprise layer, consisting of a Web-accessible Discovery Environment and a virtual machine farm. The former delivers a breadth of applications (approximately 300 bioinformatic applications accessible in a virtual desktop interface), while the latter delivers depth (scientists and labs can configure customized virtual machines with specific software and workflows). The final tier is the semantic layer of iPlant’s production-grade semantic platform using SSWAP: Simple Semantic Web Architecture and Protocol. SSWAP (http://sswap.info) uses open, Just-In-Time ontologies and transaction-time OWL reasoning to bridge Foundational resources with third-party Web sites and distributed scientific
16
offerings. SSWAP is a light-weight OWL protocol that allows any Web resource to describe its offering—its mapping of some input to some output—in simple, first-order description logic. iPlant runs a semantic Discovery Server that allows users to “discover” these resources, send data, invoke and execute services, and daisy-chain services into semantic pipelines. Visitors to third-party Web sites can click a button and have requests sent to iPlant for real-time semantic service discovery and invocation. Actual Web service execution is performed at the separate, distributed semantic Web service sites. Thus iPlant’s Semantic Web Platform is a semantic broker performing both vertical and horizontal semantic integration: it is not simply a feeder of data into iPlant, but an integrator of third-party and/or iPlant semantic resources across the Web. For the geospatial context, iPlant collaborators at TreeGenes have implemented a Web resource called CartograTree for tree scientists. Scientists can visually select tree samples as displayed by their lat/long coordinates and then send data into just-in-time TreeGenes and iPlant semantic pipelines. A worked example is at http://sswap.info/example.
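The daisy-chaining described above can be sketched as type-driven composition: a broker chains services whose advertised output type matches the next service's input type. The service names and types below are invented for illustration and are not actual SSWAP resources:

```python
# Hypothetical service registry: each service advertises its mapping of
# some input type to some output type (the essence of a SSWAP-style offering).
services = [
    {"name": "locate_samples", "input": "SpeciesName",     "output": "GeoCoordinates"},
    {"name": "fetch_climate",  "input": "GeoCoordinates",  "output": "ClimateRecord"},
    {"name": "model_growth",   "input": "ClimateRecord",   "output": "GrowthPrediction"},
]

def build_pipeline(start_type, goal_type, services):
    """Greedily chain services by matching each output type to the next input."""
    chain, current = [], start_type
    while current != goal_type and len(chain) < len(services):
        nxt = next((s for s in services if s["input"] == current), None)
        if nxt is None:
            break  # no service can consume the current type
        chain.append(nxt["name"])
        current = nxt["output"]
    return chain if current == goal_type else None

print(build_pipeline("SpeciesName", "GrowthPrediction", services))
```

A real discovery server reasons over OWL descriptions rather than string-equal types, so subsumption (a more specific output satisfying a more general input) also counts as a match; the sketch uses exact matching for brevity.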
17
Semantic Portals for Semantic Spatial Data Infrastructures
Francis Harvey
University of Minnesota
[email protected]
This position paper suggests that a challenge in developing semantic interoperability for next-generation spatial data infrastructures lies in the creation of portal-level semantics. What this means is that architectures have to conceptually blend the work on semantic interoperability with portal designs that support domain needs and requirements. Why? Portals support specific applications or ranges of applications. Semantic interoperability, however, has focused on dataset-level documentation and operationalization. Merging these two approaches leads to architectures that support domain semantics through terminological and interface bridges that connect to robust dataset-level description languages. This approach seems to offer a helpful way to resolve the current arms race in portal building and harness the strengths of semantic interoperability solutions.

The idea comes as the University of Minnesota is beginning to develop an interoperable data management system for geospatial data. At a large research university, with over 52,000 students and funded research projects totaling over $749 million in 2012, UMN researchers produce practically every conceivable kind of spatial data. Spatial and temporal resolutions, object footprints, semantics, etc. vary enormously. Moreover, disciplinary and legal requirements lead to a broad range of data practices across the sciences. All attempts to the contrary, it seems likely that a large, perhaps ultimately unknowable, number of portals to facilitate researcher access to data resources will develop. Getting ahead of this development and providing a suite of portals that support researcher needs to ingest, edit, display, search and visualize semantic data in a user-friendly and meaningful way seems a wise strategy in an era of diminishing resources. Indeed, how can an information infrastructure support the diversity of a research university?
How can it do this especially when the nature of research encourages a multiplicity of management approaches and organizations of research data? Lessons from experiences with spatial data infrastructures suggest that multiple means to participate, balancing researcher control and institutional management, offer the best concepts for architectures that support data sharing without encumbering researchers with bureaucracy. The challenge lies in creating user-friendly interfaces to display, browse and query data while the underlying Semantic Web technology remains opaque to users unfamiliar with the RDF triple format. Instead of attempting to create a single universal geospatial data portal for the university, we are exploring the concept of supporting multiple portals that gain access to data through a linked open data design for metadata and data held by researchers or archived by the institution. This has similarities with the Semantic Web portals proposed by a number of researchers, including Ding et al. 2010. We are currently exploring this idea with participants in the U-Spatial project through a user-oriented design process. The concept described in this brief paper suggests the initial idea; it will certainly be altered through the design process before implementation begins.
Developing Semantics Rules using Evolutionary Computation for Information Extraction from Remotely Sensed Imagery
Henrique Momm

The difference between the low-level information extracted by traditional pixel-based classification methods and the high-level information extracted by a human analyst is often referred to as the “semantic gap”. Human analysts use a complex combination of different image cues such as color (spectral information), image texture, object geometry (geometry of image regions), and context (relationships between image regions). Because human analysis of large areas, often over multiple periods of time (multiple images), is costly and time consuming, scientists have recognized the importance of developing more sophisticated semi-automated or automated methods to convert large quantities of imagery into actionable information. The challenge resides in multifaceted problems where the relationships between image regions are too complex to be defined by explicit programming, and therefore stochastic algorithms are being investigated as a plausible alternative.

Evolutionary computation algorithms were integrated with standard image processing and unsupervised clustering algorithms to derive individual image cues in a “learn-from-examples” mode. Genetic programming was selected as the evolutionary engine because these methods represent candidate solutions as mathematical equations (human readable), do not require assumptions about target data statistics, and can develop robust models even when the relationships among parameters are not fully understood. The principal objective is to bridge the semantic gap by subdividing the overall information extraction task into sequential steps. In the first steps, the evolutionary framework derives candidate solutions based on spectral and texture image cues through an optimized search for spectral transformations and image texture operators (or sequences of texture operators) that maximize the influence of the feature of interest and minimize the influence of the remaining image background.
Based on the findings of the initial steps, the evolutionary framework is used to evolve solutions that identify features of interest based on geometric properties of image regions. Future research opportunities are also discussed herein. Potential developments should include the addition of further steps to investigate ontologies between features (relationships between image regions). The efficient, optimized learn-from-examples scheme of evolutionary computation algorithms could be utilized to generate the most appropriate ontology representation to replicate our ability to perceive spatial relationships. Enhancements should also be made to combine all individual image-cue solutions into a single decision-making procedure, just as human analysts do. Finally, a database of semantics rules should be implemented and shared with the scientific community, allowing for collaborative contributions and enhancements. I am interested in presenting.
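As a minimal illustration of the learn-from-examples idea, the sketch below evolves a two-band spectral transformation that separates feature pixels from background pixels. It uses a simple mutation-and-selection loop over band weights rather than full genetic programming over expression trees, and all pixel values and parameters are invented:

```python
import random

random.seed(42)

# Toy "pixels": (band1, band2) reflectances for feature vs. background examples.
feature = [(0.8, 0.2), (0.75, 0.25), (0.85, 0.15)]
background = [(0.3, 0.6), (0.35, 0.55), (0.25, 0.65)]

def separability(weights):
    """Fitness: distance between class means of the transformed band values.
    Weights are normalized so that scaling them up cannot inflate fitness."""
    a, b = weights
    n = (a * a + b * b) ** 0.5 or 1.0
    a, b = a / n, b / n
    f = [a * x + b * y for x, y in feature]
    g = [a * x + b * y for x, y in background]
    return abs(sum(f) / len(f) - sum(g) / len(g))

# Minimal evolutionary loop: keep the fittest candidates, mutate them.
pop = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(20)]
for _ in range(50):
    pop.sort(key=separability, reverse=True)
    parents = pop[:5]
    pop = parents + [
        (p[0] + random.gauss(0, 0.1), p[1] + random.gauss(0, 0.1))
        for p in parents for _ in range(3)
    ]
best = max(pop, key=separability)
print(round(separability(best), 2))
```

The same maximize-foreground/minimize-background fitness idea carries over to the texture-operator and geometric steps described in the abstract; genetic programming additionally evolves the form of the transformation, not just its coefficients.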
Reading News with Maps (Hanan Samet)

NewsStand is an example application of a general framework that we are developing to enable
people to search for information using a map query interface, where the information results from
monitoring the output of over 10,000 RSS news sources and is available for retrieval within
minutes of publication. The advantage of doing so is that a map, coupled with an ability to vary
the zoom level at which it is viewed, provides an inherent granularity to the search process that
facilitates an approximate search. This distinguishes it from today's prevalent keyword-based
conventional search methods that provide a very limited facility for approximate searches which
are realized primarily by permitting a match via use of a subset of the keywords. However, it is
often the case that users do not have a firm grasp of which keyword to use, and thus would
welcome the capability for the search to also take synonyms into account. In the case of queries
to spatially-referenced data, the map query interface is a step in this direction as the act of
pointing at a location (e.g., by the appropriate positioning of a pointing device) and making the
interpretation of the precision of this positioning specification dependent on the zoom level, is
equivalent to permitting the use of spatial synonyms. The issues that arise in the design of such a
system include the identification of words that correspond to geographic locations, and the
disambiguation of the ones that do.
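The zoom-dependent interpretation of a map click can be sketched as a tolerance radius that halves with each zoom step; the gazetteer, coordinates, and scaling constant below are invented for illustration:

```python
# Hypothetical sketch: the precision implied by a map click depends on the
# zoom level, so one click matches a whole neighborhood of "spatial synonyms".

def click_radius_km(zoom, base_km=10000.0):
    """Ground tolerance of a click at a given zoom level; precision doubles
    (radius halves) with each zoom step, in the style of Web map tiles."""
    return base_km / (2 ** zoom)

def spatial_matches(click, places, zoom):
    """Return place names whose coordinates fall within the click tolerance."""
    r = click_radius_km(zoom)
    cx, cy = click
    return [name for name, (x, y) in places.items()
            if ((x - cx) ** 2 + (y - cy) ** 2) ** 0.5 <= r]

# Toy gazetteer in planar km coordinates (illustrative, not real locations).
places = {"Springfield": (10.0, 12.0), "Shelbyville": (80.0, 95.0)}
print(spatial_matches((12.0, 14.0), places, zoom=8))   # coarse zoom: loose match
print(spatial_matches((12.0, 14.0), places, zoom=13))  # fine zoom: strict match
```

At a coarse zoom the click tolerantly matches a nearby place (a spatial synonym); at a fine zoom the same click matches nothing unless the pointer is nearly exact.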
22
Spatial Semantics enhanced Geoscience Interoperability, Analytics, and Applications Krishnaprasad Thirunarayan and Amit Sheth
Kno.e.sis – Ohio Center of Excellence in Knowledge-enabled Computing
Wright State University, Dayton, OH 45435
[email protected], [email protected]
We present our research ideas for developing cyberinfrastructure for Geoscience
applications developed in the context of the EarthCube initiative, and our NSF-sponsored work on incorporating spatial-temporal-thematic semantics for enhanced querying and feature extraction from sensor data streams.

(1) Semantics-empowered cyberinfrastructure for Geoscience applications

Rapidly maturing semantic technologies, based in part on Semantic Web standards, have the potential to increase opportunities for interdisciplinary research by providing support and incentives for sharing, publishing, accessing and discovering heterogeneous data. Our thesis is that associating machine-processable lightweight semantics with the long tail of science data can overcome challenges associated with data discovery, integration and interoperability caused by data heterogeneity. To demonstrate this, we propose to develop cyberinfrastructure (CI) utilizing lightweight semantic capabilities to serve individual researchers. Specifically, the focus is on ease of use, low upfront cost, and shallow semantics that appeal to, and are most likely to be used by, the broad community of geoscientists. The choice of controlled vocabularies and lightweight ontologies, as compared with formal ontologies in OWL, reduces complexity and training effort, enabling wider and faster adoption by scientists not skilled in computer science techniques. We propose to use existing, community-ratified and enhanced ontologies that scientists can employ with minimal training to easily annotate (tag) their data, publish it, and discover relevant data in support of scientific discoveries. Coarse-grained annotations can facilitate semantic search, while fine-grained annotations and extraction can be used to create Linked Open Datasets (LOD).
Using LOD, which is increasingly being adopted by open government and open science initiatives, data can be translated to a form that makes it readily available, reusable, and amenable to automatic processing, while supporting conceptual richness of data representation. Our research is aligned with the National Science Foundation’s EarthCube initiative.

(2) Expressive search and integration using geospatial information
We have developed expressive extensions to RDF and SPARQL that associate spatio-temporal information with triples via annotations and employ rich operators to support inferencing [1]. This framework, extended using geospatial knowledge to support spatial semantics, can support interoperability and complex analysis [2].
In the context of the Semantic Sensor Web [3], to process multimodal sensor data streams, we have used spatio-temporal context in the Semantic Sensor Observation Service (SemSOS) to aggregate and combine primitive weather sensor data to obtain weather features, and we exploit the GeoNames portion of LOD to map place names to GPS coordinates, to locate relevant sensors, and to provide easy-to-use and natural query interfaces [4].

[1] http://knoesis.org/research/semweb/projects/stt/
[2] http://knoesis.org/library/resource.php?id=903
[3] http://knoesis.wright.edu/research/semsci/application_domain/sem_sensor/
[4] http://www.slideshare.net/patniharshal/real-time-semantic-analysis-of-streaming-sensor-data
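The aggregation of primitive observations into higher-level weather features, as done in SemSOS, can be illustrated with a toy rule; the thresholds and observation names below are hypothetical, not the actual SemSOS rules:

```python
# Illustrative rule-based feature extraction in the spirit of SemSOS:
# primitive sensor observations are combined into a higher-level weather
# feature. Feature name, field names, and thresholds are hypothetical.

def detect_blizzard(observations):
    """A toy rule: sustained high wind plus snowfall and low visibility."""
    return (observations.get("wind_speed_m_s", 0) >= 15
            and observations.get("snowfall_cm_h", 0) > 0
            and observations.get("visibility_km", 99) < 0.4)

obs = {"wind_speed_m_s": 18.0, "snowfall_cm_h": 2.5, "visibility_km": 0.2}
print(detect_blizzard(obs))
```

A semantic layer adds value beyond such hard-coded rules by letting the feature ("blizzard") and its defining conditions be expressed against ontology terms, so the same rule applies across providers whose raw observations use different vocabularies.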
23
The Need to Determine Ontology System Requirements for Online Graduate Students