Abstracts for the Workshop on Semantics in Geospatial Architectures:
Applications and Implementation 28-29 October 2013
Pyle Center, University of Wisconsin-Madison (Abstracts, listed alphabetically by first author, linked to page number)
Semantic Issues in Land Use and Land Cover Studies – Foundations, Application and Future Directions (Ola Ahlqvist) 2
Enhanced Semantics for Gazetteers (Kate Beard) 4
Working on Common Frameworks for Data Infrastructures (Gary Berg-Cross) 6
Open Geospatial Consortium (OGC) (Luis Bermudez) 7
Enabling Semantic Search in Geospatial Metadata Catalogue to Support Polar Sciences (Vidit Bhatia [Wenwen Li]) 8
World Historical Ontology Research in CHIA Project (Kai Cao) 10
Enabling Semantic Mediation in OGC SWE (Janet Fredericks) 12
The Many Semantic Domains of Spatial Data Infrastructure (Mark Gahegan) 14
The iPlant Collaborative Semantic Web Platform (Damian Gessler) 16
Semantic Portals for Semantic Spatial Data Infrastructures (Francis Harvey) 18
Implementation Issues in using Parliament for GeoSPARQL (Dave Kolas) 19
Geosearch: A System Utilizing Ontology and Knowledge Reasoning to Support Geospatial Data Discovery (Kai Liu [Chaowei Yang]) 20
Developing Semantics Rules using Evolutionary Computation for Information Extraction from Remotely Sensed Imagery (Henrique Momm) 21
Reading News with Maps (Hanan Samet) 22
Spatial Semantics enhanced Geoscience Interoperability, Analytics, and Applications (Krishnaprasad Thirunarayan)(T.K. Prasad) [work done with Amit Sheth] 23
The Need to Determine Ontology System Requirements for Online Graduate Students (Dalia Varanka) 24
The GeoQuery Tool for Parliament SPARQL (James Wilson) 26
1
Semantic Issues in Land Use and Land Cover Studies – Foundations, Application and Future Directions (Ola Ahlqvist)

Open Geospatial Consortium (OGC) (Luis Bermudez)

The Open Geospatial Consortium (OGC) is an international industry consortium of 474
companies, government agencies, and universities participating in a consensus process to
develop publicly available interface standards. These standards support interoperable solutions
that "geo-enable" the Web, wireless and location-based services, and mainstream IT. The
standards empower technology developers to make complex spatial information and services
accessible and useful with all kinds of applications.
This presentation will address the importance of standards and how to implement semantic
capabilities in statewide geospatial information systems. These systems will likely use
standards such as the OGC Web Feature Service (WFS) to publish vector data (e.g., polygons
representing parcels or state boundaries). In its last two testbeds (OWS-8 and OWS-9), OGC
has advanced cross-community interoperability. Feature types could be mapped to other states’
data to better share information across states, or features could be categorized depending on the
purpose (customer or decision maker).
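As a concrete illustration of the standards-based access described above, the sketch below builds a WFS 2.0 GetFeature request for a statewide parcel layer. The endpoint URL and feature type name are invented for the example, not an actual state service:

```python
from urllib.parse import urlencode

# Hypothetical statewide WFS endpoint (illustrative only).
base_url = "https://gis.example.state.us/wfs"
params = {
    "service": "WFS",
    "version": "2.0.0",
    "request": "GetFeature",
    "typeNames": "state:parcels",  # a published feature type, e.g. parcel polygons
    "outputFormat": "application/gml+xml; version=3.2",
    "count": "10",                 # limit the response to ten features
}
request_url = f"{base_url}?{urlencode(params)}"
print(request_url)
```

A client issuing this HTTP GET would receive GML-encoded features, which is what makes the published vector data machine-harvestable across state lines.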
7
Enabling Semantic Search in Geospatial Metadata Catalogue to Support Polar Sciences
Wenwen Li1 and Vidit Bhatia2
1GeoDa Center for Geospatial Analysis and Computation, School of Geographical Sciences and Urban Planning, Arizona State University, Tempe, AZ 85287
2Department of Computer Science, Arizona State University, Tempe, AZ 85287 {wenwen, vidit.bhatia}@asu.edu
Studies of polar regions have become increasingly important in recent years because of (1) increasing interest in mining and natural resource exploration; (2) the sensitivity of both poles to human activities and to global environmental and climate change; and (3) the role of polar regions as key drivers of the Earth’s climate. In May 2013, the White House released the “President’s National Strategy for the Arctic Region” and identified “increasing understanding of the Arctic through scientific research and traditional knowledge” and “making decisions using the best available information” as the overarching stewardship objectives to achieve in the coming decade.

Fortunately, we are entering the era of big data. Pervasive technologies for Earth observation, such as sensor networks, high-resolution telescopes, and polar satellites, enable the retrieval of large amounts of polar data to accelerate the scientific discovery process. Several data centers, including ACADIS (https://www.aoncadis.org), NSIDC (http://nsidc.org/data/search/data-search.html) and the Antarctic and Southern Ocean Data Portal (http://www.marine-geo.org/portals/antarctic/), have been established to share these resources. These portals usually provide a metadata catalogue to support the discovery of data of interest through a keyword-based search interface. Currently, Lucene-based text search is widely used in these portals, and this approach hinders the retrieval of semantically related datasets whose content is described using a different keyword set from the user’s query. To enable intelligent search and a smart connection between an end user and the most relevant datasets, semantic search comes into play. This search strategy can be categorized into two classes: ontology-based semantic expansion and smart search based on knowledge mining.
The ontology-based approach can be considered a top-down approach: possible semantic linkages are populated by domain experts and encoded in a machine-understandable format, and a user’s query is then expanded by traversing these predefined semantic linkages/relationships. This approach assumes that the semantic relationships in the data can be well captured in advance. However, different people tend to have different perspectives on how an ontology should be established, and it is extremely difficult to build a complete knowledge base to serve various search purposes. To overcome this issue, in this work we employ a bottom-up approach that relies on mining the dataset itself to discover the latent semantic relationships between keywords/terms in the metadata corpus. A latent semantic indexing technique combined with the Paice stemming algorithm is performed on top of the Lucene indexing to further improve
the search results. A new ranking algorithm based on a revised cosine similarity and a two-tier ranking scheme ensures high precision of the top search results. In addition, we integrated this approach into a popular metadata catalogue, GeoNetwork (http://geonetwork-opensource.org/), to broadly share this semantic search capability with peers. We expect this work to greatly enhance the capability of data search in existing polar data portals and geospatial data discovery at large.
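The bottom-up strategy described above can be sketched in miniature. The toy example below, with an invented four-record metadata corpus, applies latent semantic indexing via a truncated SVD of a term-document matrix and ranks documents by cosine similarity in concept space; a production system would build on Lucene’s index and apply Paice stemming first:

```python
import numpy as np

# Toy metadata corpus: each "document" is a short dataset description.
docs = [
    "sea ice extent arctic satellite",
    "glacier melt greenland ice sheet",
    "ocean temperature sensor buoy",
    "arctic sea ice concentration",
]
query = "polar ice coverage"  # shares few literal keywords with the docs

# Term-document matrix of raw counts (a real system would use TF-IDF).
vocab = sorted({w for d in docs for w in d.split()} | set(query.split()))

def vectorize(text):
    words = text.split()
    return np.array([words.count(t) for t in vocab], dtype=float)

A = np.stack([vectorize(d) for d in docs], axis=1)  # terms x docs

# Latent semantic indexing: truncated SVD projects terms/docs into k concepts.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
Uk, sk = U[:, :k], s[:k]

def to_concept_space(vec):
    return (vec @ Uk) / sk  # fold a term vector into the concept space

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

q = to_concept_space(vectorize(query))
scores = [cosine(q, to_concept_space(vectorize(d))) for d in docs]
best = int(np.argmax(scores))
print(docs[best])
```

Because ranking happens in concept space rather than keyword space, a query term like "polar" can still retrieve ice-related records it never literally matches.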
World Historical Ontology Research in CHIA Project (Kai Cao)

metadata—the description of values and variables in each data set and the recording of
the sources and compilers of data. The incorporation of such existing detailed
classifications means that data-ingest work can start before the high-level framework –
the overall project ontology – is finalized. Later stages of ontology development include more
comprehensive categorization of types of data, definitions and classifications for the
linkage and aggregation of datasets, and definitions for the analysis and visualization of
data.
As one of the primary researchers on the NSF-funded CHIA project, I believe that, through
this workshop, world historical ontology research, and even the study of world historical
gazetteers, could benefit greatly from the other successful applications and
implementations of semantics in geospatial architectures.
I hope to be one of the attendees of this workshop (with no presentation).
Many thanks.
Best regards,
Postdoctoral Research Associate, University of Pittsburgh
Visiting Research Fellow, Harvard University
11
Janet Fredericks
Woods Hole Oceanographic Institution
[email protected]

Enabling Semantic Mediation in OGC SWE

The OGC has developed core standards to provide a framework that enables machine-to-machine harvesting of observational geospatial data and metadata. What is under the hood doesn’t matter: data can be stored in native data systems. Upon an HTTP request, the services return the OGC-adopted encodings that encapsulate the information, enabling machine harvesting of data selected through geospatial and temporal queries, as well as other specifications, depending on the implementation. The use of the OGC standards supports brokering activities that can provide translations across standards. But use of the adopted service standards in a collaborative environment requires an implementation designed to enable semantic mediation.

The OGC Sensor Observation Service (SOS) offers a standards-based framework in which to describe observational provenance (SensorML) as well as observational data (O&M). OGC SOS has been adopted in real-time ocean observing systems, such as the NOAA IOOS and the associated regional associations (NFRA). Through participation in the EarthCube Brokering Team Hackathons, a demonstration SOS delivering oceanic wave data was tested on three brokering sites. Through the NOAA ERDDAP broker, the WHOI/Q2O (Quality to OGC) SOS implementation (q2o.whoi.edu/node/129) was translated into ISO metadata with NetCDF or TSV requested output. Brokering services enable users to choose to work in frameworks beyond the primary data offering without installing or developing translation tools. Through the ESRI Geoportal and the Data Access Broker, catalogue services were automatically populated with information relating to geospatial and temporal coverage along with basic metadata, enabling data discovery and access. The Q2O SOS demonstration was developed, with funding from NOAA, to enable dynamic quality assessment.
It is content-rich and delivers information about how the observations came to be, as well as information about quality tests and associated real-time results. The implementation also demonstrates the ability to enable the development of ontologies by integrating links within the SOS to URLs that resolve to SKOS-encoded terms (Figure 1). These ontologies can be utilized in collaborative environments where terms with the same or similar meanings may have different names but must have associations. For example (Figure 2), one provider’s QC test result is called pass and another’s is called _1, and a data aggregator can map each to the same meaning. One can also use the mapped terms to harmonize encoded code values: one provider’s pass may have a value of one (1), while another provider may use a value of zero (0) to represent a passed QC test. Through inclusion of links to encoded terms, these values can be mapped to have the same meaning when integrating and filtering data offerings. The use of standards in geospatial data access is important. But without the inclusion of references to registered terms in a semantics framework, ontologies cannot be developed, making automated data assessment and integration nearly impossible.
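The term-mapping idea described above can be sketched as a simple lookup structure; the provider names, flag spellings, and concept URIs below are hypothetical stand-ins for SKOS-registered terms:

```python
# Hypothetical mapping of provider-specific QC flag terms/codes to shared
# SKOS-style concept URIs (the URIs are illustrative, not real registries).
SHARED_PASS = "http://example.org/qc#pass"
SHARED_FAIL = "http://example.org/qc#fail"

provider_mappings = {
    "provider_a": {"pass": SHARED_PASS, "fail": SHARED_FAIL},
    "provider_b": {"_1": SHARED_PASS, "_0": SHARED_FAIL},  # underscore terms
    "provider_c": {0: SHARED_PASS, 1: SHARED_FAIL},        # 0 means "passed" here
}

def harmonize(provider, flag):
    """Map a provider-specific QC flag to the shared concept, or None."""
    return provider_mappings.get(provider, {}).get(flag)

# A data aggregator can now filter observations uniformly across providers:
obs = [("provider_a", "pass"), ("provider_b", "_1"), ("provider_c", 0)]
passed = [o for o in obs if harmonize(*o) == SHARED_PASS]
print(len(passed))
```

All three observations map to the same shared "pass" concept even though each provider spells (or numbers) its flag differently, which is exactly what makes cross-provider filtering possible.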
The Many Semantic Domains of Spatial Data Infrastructure
Mark Gahegan and Ben Adams
The Centre for eResearch, University of Auckland
New Zealand
Collections of geospatial data can become overwhelming to search and to organize, yet the
successful management and description of data is an essential step towards an effective
Spatial Data Infrastructure (SDI). The following list introduces five very difficult challenges,
none of them solved as yet.
First, there’s the volume of the data, from massive imagery collections, new sensing
technologies, crowd-sourced data and more, along with better availability of more traditional
data, such as roads and census data. The data production rate is staggering.
Second, there’s the complexity of the data itself, often with a generous array of attributes,
with intricate spatial encodings and complex geometric and topologic relations.
Third, there’s the variety of conceptual models used, encompassing imagery, collections of
features (objects), thematic maps, regular and irregular networks, graphs, point clouds, even
place-based data that is devoid of any explicit geography.
Fourth, there is the problem of attribute semantics. Some data have rich schemas and
ontologies to help describe their meaning, but most do not. Where schemas do exist, they
often do not align well with each other. So being certain of meaning, or harmonizing data to
ensure a consistent meaning, is challenging at best.
Fifth, there’s the difficulty of establishing the authority of data. Within an increasingly
complicated network of data suppliers, how does a researcher know which data to trust, or
which data has been through a quality control process, or which data is the most reliable?
Buried within this issue are all the traditional problems of provenance, accuracy and fitness
for use.
We tend to think of the ‘semantic problem’ as pertaining only to attribute data, but in fact
there are semantic issues across all of the above domains (and possibly more domains
besides). So judging the ‘worth’ or ‘utility’ of a dataset for a given task requires that we
reason across all of these domains together—in an SDI, there really is no point in solving any
of the problems in isolation, since even datasets that harmonize correctly are no use at all if
they have the wrong conceptual model or are not at all trustworthy.
When searching for, or trying to understand, a dataset, how should each of these domains
or dimensions be presented and explored? Can they all be harmonized into a single
conceptual model for an SDI? Can some kind of ‘fitness-for-use’ score be calculated across
the combined space, and used as an aid to locating suitable datasets?
In this talk, we:
- Briefly recap each of the above issues.
- Tentatively propose an over-arching framework to organize the above domains into a series of dimensions that can be stacked together to support some kind of distance metrics. Such metrics allow us to represent the similarity between datasets—given some objective function defined by the user—and thus the appropriateness of a dataset for a given task.
- Introduce some of the semantic challenges that need to be overcome in order to reason over such a complex and multi-faceted space.
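One way to make the dimension-stacking idea concrete is a weighted distance between a dataset's profile and a user-supplied ideal profile. The sketch below is a minimal illustration only; the dimension names, scores, and weighting scheme are assumptions, not the authors' actual metric:

```python
import math

# Illustrative dataset profiles scored 0-1 along the five SDI semantic
# domains discussed above; all names and scores here are hypothetical.
DIMENSIONS = ["volume", "complexity", "conceptual_model",
              "attribute_semantics", "authority"]

def fitness_distance(dataset, requirement, weights=None):
    """Weighted Euclidean distance between a dataset's profile and the
    user's objective function (their ideal profile for the task)."""
    weights = weights or {d: 1.0 for d in DIMENSIONS}
    return math.sqrt(sum(
        weights[d] * (dataset[d] - requirement[d]) ** 2 for d in DIMENSIONS
    ))

ideal = dict(volume=0.3, complexity=0.5, conceptual_model=1.0,
             attribute_semantics=0.9, authority=1.0)
candidates = {
    "landcover_a": dict(volume=0.4, complexity=0.5, conceptual_model=0.9,
                        attribute_semantics=0.8, authority=0.9),
    "landcover_b": dict(volume=0.9, complexity=0.2, conceptual_model=0.3,
                        attribute_semantics=0.4, authority=0.5),
}
ranked = sorted(candidates, key=lambda n: fitness_distance(candidates[n], ideal))
print(ranked[0])  # the candidate closest to the user's ideal profile
```

The point of such a combined score is the one made in the talk: a dataset that harmonizes well on attribute semantics alone can still rank poorly if its conceptual model or authority is wrong for the task.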
15
Damian Gessler
Semantic Web Architect
The iPlant Collaborative
University of Arizona
Tucson, AZ 85721
[email protected]

Indication: Can attend the meeting and discuss iPlant’s Semantic Web Platform at the ‘Workshop on Semantics in Geospatial Architectures: Applications and Implementation.’

The iPlant Collaborative Semantic Web Platform

Geospatial semantics has huge promise. Yet implementing semantics in large infrastructures is challenging. Early implementers face significant obstacles in migrating from research-grade proof-of-concept applications to production-grade, value-added platforms. The well-informed perceived promise of semantics may cloud inconvenient “details” that significantly hinder operational maturity. Yet the promise is real, and the complexity of today’s earth science challenges implies that computational semantics has an important role. To get from promise to realization, we need a sober understanding of the challenges and solutions.

Cyberinfrastructure semantics is challenging because there is no generally adopted technology stack that integrates the various technology layers and social norms into a readily accessible platform for the end user. Thus semantic technologies—from RDF (Resource Description Framework) and OWL (Web Ontology Language) to pseudo-semantic ontologies such as OBO (Open Biological and Biomedical Ontologies), Darwin Core, schema.org, OGC (Open Geospatial Consortium), and LOD (Linked Open Data)—exist in a disjointed ecosystem of ad hoc installations and social contracts. Indeed, the implied semantics inherent in making any individual system operate often outweigh the explicit semantics that are needed for computational and integrative maturity. The iPlant Collaborative—an NSF-funded large cyberinfrastructure for the plant sciences—approaches this challenge with a three-tier architecture.
At the Foundational layer is a tight collaboration with NSF XSEDE resources (Extreme Science and Engineering Discovery Environment; https://www.xsede.org). This delivers world-class high-performance computing clusters (“big iron”) at the peta-FLOPS and petabyte scale [O(10^15) floating point operations per second and storage capacity, respectively]. The next tier is an Enterprise layer, consisting of a Web-accessible Discovery Environment and a virtual machine farm. The former delivers a breadth of applications (approximately 300 bioinformatic applications accessible in a virtual desktop interface), while the latter delivers depth (scientists and labs can configure customized virtual machines with specific software and workflows). The final tier is the semantic layer of iPlant’s production-grade semantic platform using SSWAP: Simple Semantic Web Architecture and Protocol. SSWAP (http://sswap.info) uses open, Just-In-Time ontologies and transaction-time OWL reasoning to bridge Foundational resources with third-party Web sites and distributed scientific
16
offerings. SSWAP is a light-weight OWL protocol that allows any Web resource to describe its offering—its mapping of some input to some output—in simple, first-order description logic. iPlant runs a semantic Discovery Server that allows users to “discover” these resources, send data, invoke and execute services, and daisy-chain services into semantic pipelines. Visitors to third-party Web sites can click a button and have requests sent to iPlant for real-time semantic service discovery and invocation. Actual Web service execution is performed at the separate, distributed semantic Web service sites. Thus iPlant’s Semantic Web Platform is a semantic broker performing both vertical and horizontal semantic integration: it is not simply a feeder of data into iPlant, but an integrator of third-party and/or iPlant semantic resources across the Web. For the geospatial context, iPlant collaborators at TreeGenes have implemented a Web resource called CartograTree for tree scientists. Scientists can visually select tree samples as displayed by their lat/long coordinates and then send data into just-in-time TreeGenes and iPlant semantic pipelines. A worked example is at http://sswap.info/example.
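The daisy-chaining described above can be sketched as type-driven composition: a broker chains services whose advertised output type matches the next service's input type. The service names and types below are invented for illustration and are not actual SSWAP resources:

```python
# Hypothetical service registry: each service advertises its mapping of
# some input type to some output type (the essence of a SSWAP-style offering).
services = [
    {"name": "locate_samples", "input": "SpeciesName",     "output": "GeoCoordinates"},
    {"name": "fetch_climate",  "input": "GeoCoordinates",  "output": "ClimateRecord"},
    {"name": "model_growth",   "input": "ClimateRecord",   "output": "GrowthPrediction"},
]

def build_pipeline(start_type, goal_type, services):
    """Greedily chain services by matching each output type to the next input."""
    chain, current = [], start_type
    while current != goal_type and len(chain) < len(services):
        nxt = next((s for s in services if s["input"] == current), None)
        if nxt is None:
            break  # no service can consume the current type
        chain.append(nxt["name"])
        current = nxt["output"]
    return chain if current == goal_type else None

print(build_pipeline("SpeciesName", "GrowthPrediction", services))
```

A real discovery server reasons over OWL descriptions rather than string-equal types, so subsumption (a more specific output satisfying a more general input) also counts as a match; the sketch uses exact matching for brevity.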
17
Semantic Portals for Semantic Spatial Data Infrastructures
Francis Harvey
University of Minnesota
[email protected]
This position paper suggests that a challenge in developing semantic interoperability for next-generation spatial data infrastructures lies in the creation of portal-level semantics. What this means is that architectures have to conceptually blend the work on semantic interoperability with portal designs that support domain needs and requirements. Why? Portals support specific applications or ranges of applications. Semantic interoperability, however, has focused on dataset-level documentation and operationalization. Merging these two approaches leads to architectures that support domain semantics through terminological and interface bridges that connect to robust dataset-level description languages. This approach seems to offer a helpful way to resolve the current arms race in portal building and harness the strengths of semantic interoperability solutions.

The idea comes as the University of Minnesota is beginning to develop an interoperable data management system for geospatial data. At a large research university, with over 52,000 students and funded research projects totaling over $749 million in 2012, UMN researchers produce practically every conceivable kind of spatial data. Spatial and temporal resolutions, object footprints, semantics, etc. vary enormously. Moreover, disciplinary and legal requirements lead to a broad range of data practices across the sciences. All attempts to the contrary, it seems likely that a large, perhaps ultimately unknowable, number of portals to facilitate researcher access to data resources will develop. Getting ahead of this development and providing a suite of portals that support researcher needs to ingest, edit, display, search and visualize semantic data in a user-friendly and meaningful way seems a wise strategy in an era of diminishing resources. Indeed, how can an information infrastructure support the diversity of a research university?
How can it do this especially when the nature of research encourages a multiplicity of management approaches and organizations of research data? Lessons from experiences with spatial data infrastructures suggest that multiple means to participate, balancing researcher control and institutional management, offer the best concepts for architectures that support data sharing without encumbering researchers with bureaucracy. The challenge lies in creating user-friendly interfaces to display, browse and query data while the underlying Semantic Web technology remains opaque to users unfamiliar with the RDF triple format. Instead of attempting to create a single universal geospatial data portal for the university, we are exploring the concept of supporting multiple portals that gain access to data through a linked open data design for metadata and data held by researchers or archived by the institution. This has similarities with the Semantic Web portals proposed by a number of researchers, including Ding et al. 2010. We are currently exploring this idea with participants in the U-Spatial project through a user-oriented design process. The concept described in this brief paper suggests the initial idea; it will certainly be altered through the design process before implementation begins.
Developing Semantics Rules using Evolutionary Computation for Information Extraction from Remotely Sensed Imagery
Henrique Momm

The difference between the low-level information extracted by traditional pixel-based classification methods and the high-level information extracted by a human analyst is often referred to as the “semantic gap”. Human analysts use a complex combination of different image cues such as color (spectral information), image texture, object geometry (geometry of image regions), and context (relationships between image regions). Because human analysis of large areas, often over multiple periods of time (multiple images), is costly and time consuming, scientists have recognized the importance of developing more sophisticated semi-automated or automated methods to convert large quantities of imagery into actionable information. The challenge resides in multifaceted problems where the relationships between image regions are too complex to be defined by explicit programming, and therefore stochastic algorithms are being investigated as a plausible alternative.

Evolutionary computation algorithms were integrated with standard image processing and unsupervised clustering algorithms to derive individual image cues in a “learn-from-examples” mode. Genetic programming was selected as the evolutionary engine because these methods represent candidate solutions as mathematical equations (human readable), do not require assumptions about target data statistics, and can develop robust models even when the relationships among parameters are not fully understood. The principal objective is to bridge the semantic gap by subdividing the overall information extraction task into sequential steps. In the first steps, the evolutionary framework derives candidate solutions based on spectral and texture image cues through an optimized search for spectral transformations and image texture operators (or sequences of texture operators) that maximize the influence of the feature of interest and minimize the influence of the remaining image background.
Based on the findings of the initial steps, the evolutionary framework is used to evolve solutions that identify features of interest based on geometric properties of image regions. Future research opportunities are also discussed herein. Potential developments should include the addition of further steps to investigate ontologies between features (relationships between image regions). The efficient, optimized learn-from-examples scheme of evolutionary computation algorithms could be utilized to generate the most appropriate ontology representation to replicate our ability to perceive spatial relationships. Enhancements should also be made to combine all individual image-cue solutions into a single decision-making procedure, just as human analysts do. Finally, a database of semantics rules should be implemented and shared with the scientific community, allowing for collaborative contributions and enhancements. I am interested in presenting.
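As a minimal illustration of the learn-from-examples idea, the sketch below evolves a two-band spectral transformation that separates feature pixels from background pixels. It uses a simple mutation-and-selection loop over band weights rather than full genetic programming over expression trees, and all pixel values and parameters are invented:

```python
import random

random.seed(42)

# Toy "pixels": (band1, band2) reflectances for feature vs. background examples.
feature = [(0.8, 0.2), (0.75, 0.25), (0.85, 0.15)]
background = [(0.3, 0.6), (0.35, 0.55), (0.25, 0.65)]

def separability(weights):
    """Fitness: distance between class means of the transformed band values.
    Weights are normalized so that scaling them up cannot inflate fitness."""
    a, b = weights
    n = (a * a + b * b) ** 0.5 or 1.0
    a, b = a / n, b / n
    f = [a * x + b * y for x, y in feature]
    g = [a * x + b * y for x, y in background]
    return abs(sum(f) / len(f) - sum(g) / len(g))

# Minimal evolutionary loop: keep the fittest candidates, mutate them.
pop = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(20)]
for _ in range(50):
    pop.sort(key=separability, reverse=True)
    parents = pop[:5]
    pop = parents + [
        (p[0] + random.gauss(0, 0.1), p[1] + random.gauss(0, 0.1))
        for p in parents for _ in range(3)
    ]
best = max(pop, key=separability)
print(round(separability(best), 2))
```

The same maximize-foreground/minimize-background fitness idea carries over to the texture-operator and geometric steps described in the abstract; genetic programming additionally evolves the form of the transformation, not just its coefficients.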
Reading News with Maps (Hanan Samet)

NewsStand is an example application of a general framework that we are developing to enable
people to search for information using a map query interface, where the information results from
monitoring the output of over 10,000 RSS news sources and is available for retrieval within
minutes of publication. The advantage of doing so is that a map, coupled with an ability to vary
the zoom level at which it is viewed, provides an inherent granularity to the search process that
facilitates an approximate search. This distinguishes it from today's prevalent keyword-based
conventional search methods that provide a very limited facility for approximate searches which
are realized primarily by permitting a match via use of a subset of the keywords. However, it is
often the case that users do not have a firm grasp of which keyword to use, and thus would
welcome the capability for the search to also take synonyms into account. In the case of queries
to spatially-referenced data, the map query interface is a step in this direction as the act of
pointing at a location (e.g., by the appropriate positioning of a pointing device) and making the
interpretation of the precision of this positioning specification dependent on the zoom level, is
equivalent to permitting the use of spatial synonyms. The issues that arise in the design of such a
system include the identification of words that correspond to geographic locations, and the
disambiguation of the ones that do.
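The zoom-dependent interpretation of a map click can be sketched as a tolerance radius that halves with each zoom step; the gazetteer, coordinates, and scaling constant below are invented for illustration:

```python
# Hypothetical sketch: the precision implied by a map click depends on the
# zoom level, so one click matches a whole neighborhood of "spatial synonyms".

def click_radius_km(zoom, base_km=10000.0):
    """Ground tolerance of a click at a given zoom level; precision doubles
    (radius halves) with each zoom step, in the style of Web map tiles."""
    return base_km / (2 ** zoom)

def spatial_matches(click, places, zoom):
    """Return place names whose coordinates fall within the click tolerance."""
    r = click_radius_km(zoom)
    cx, cy = click
    return [name for name, (x, y) in places.items()
            if ((x - cx) ** 2 + (y - cy) ** 2) ** 0.5 <= r]

# Toy gazetteer in planar km coordinates (illustrative, not real locations).
places = {"Springfield": (10.0, 12.0), "Shelbyville": (80.0, 95.0)}
print(spatial_matches((12.0, 14.0), places, zoom=8))   # coarse zoom: loose match
print(spatial_matches((12.0, 14.0), places, zoom=13))  # fine zoom: strict match
```

At a coarse zoom the click tolerantly matches a nearby place (a spatial synonym); at a fine zoom the same click matches nothing unless the pointer is nearly exact.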
22
Spatial Semantics enhanced Geoscience Interoperability, Analytics, and Applications Krishnaprasad Thirunarayan and Amit Sheth
Kno.e.sis – Ohio Center of Excellence in Knowledge-enabled Computing
Wright State University, Dayton, OH 45435
[email protected], [email protected]
We present our research ideas for developing cyberinfrastructure for Geoscience
applications developed in the context of the EarthCube initiative, and our NSF-sponsored work on incorporating spatial-temporal-thematic semantics for enhanced querying and feature extraction from sensor data streams.

(1) Semantics-empowered cyberinfrastructure for Geoscience applications

Rapidly maturing semantic technologies, based in part on Semantic Web standards, have the potential to increase opportunities for interdisciplinary research by providing support and incentives for sharing, publishing, accessing and discovering heterogeneous data. Our thesis is that associating machine-processable lightweight semantics with the long tail of science data can overcome challenges associated with data discovery, integration and interoperability caused by data heterogeneity. To demonstrate this, we propose to develop cyberinfrastructure (CI) utilizing lightweight semantic capabilities to serve individual researchers. Specifically, the focus is on ease of use, low upfront cost, and shallow semantics that appeal to, and are most likely to be used by, the broad community of geoscientists. The choice of controlled vocabularies and lightweight ontologies, as compared with formal ontologies in OWL, reduces complexity and training effort, enabling wider and faster adoption by scientists not skilled in computer science techniques. We propose to use existing, community-ratified and enhanced ontologies that scientists can employ with minimal training to easily annotate (tag) their data, publish it, and discover relevant data in support of scientific discoveries. Coarse-grained annotations can facilitate semantic search, while fine-grained annotations and extraction can be used to create Linked Open Datasets (LOD).
Using LOD, which is increasingly being adopted by open government and open science initiatives, data can be translated to a form that makes it readily available, reusable, and amenable to automatic processing, while supporting conceptual richness of data representation. Our research is aligned with the National Science Foundation’s EarthCube initiative.

(2) Expressive search and integration using geospatial information
We have developed expressive extensions to RDF and SPARQL that associate spatio-temporal information with triples via annotations and employ rich operators to support inferencing [1]. This framework, extended using geospatial knowledge to support spatial semantics, can support interoperability and complex analysis [2].
In the context of the Semantic Sensor Web [3], to process multimodal sensor data streams, we have used spatio-temporal context in the Semantic Sensor Observation Service (SemSOS) to aggregate and combine primitive weather sensor data to obtain weather features, and we exploit the GeoNames portion of LOD to map place names to GPS coordinates, to locate relevant sensors, and to provide easy-to-use and natural query interfaces [4].

[1] http://knoesis.org/research/semweb/projects/stt/
[2] http://knoesis.org/library/resource.php?id=903
[3] http://knoesis.wright.edu/research/semsci/application_domain/sem_sensor/
[4] http://www.slideshare.net/patniharshal/real-time-semantic-analysis-of-streaming-sensor-data
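The aggregation of primitive observations into higher-level weather features, as done in SemSOS, can be illustrated with a toy rule; the thresholds and observation names below are hypothetical, not the actual SemSOS rules:

```python
# Illustrative rule-based feature extraction in the spirit of SemSOS:
# primitive sensor observations are combined into a higher-level weather
# feature. Feature name, field names, and thresholds are hypothetical.

def detect_blizzard(observations):
    """A toy rule: sustained high wind plus snowfall and low visibility."""
    return (observations.get("wind_speed_m_s", 0) >= 15
            and observations.get("snowfall_cm_h", 0) > 0
            and observations.get("visibility_km", 99) < 0.4)

obs = {"wind_speed_m_s": 18.0, "snowfall_cm_h": 2.5, "visibility_km": 0.2}
print(detect_blizzard(obs))
```

A semantic layer adds value beyond such hard-coded rules by letting the feature ("blizzard") and its defining conditions be expressed against ontology terms, so the same rule applies across providers whose raw observations use different vocabularies.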
23
The Need to Determine Ontology System Requirements for Online Graduate Students