Knowledge Evolution in Distributed Geoscience Datasets and the Role of Semantic Technologies

Jul 14, 2015

Page 1

Knowledge Evolution in Distributed Geoscience Datasets and the Role of Semantic Technologies

Xiaogang (Marshall) Ma
Tetherless World Constellation (TWC)
Rensselaer Polytechnic Institute

@MarshallXMa
x.marshall.ma
rpi.edu/~max7
0000-0002-9110-7369
MarshallXMa

Page 2

William Smith's 1815 geologic map of England and Wales with part of Scotland

William Smith (1769-1839)

(Image source: Geological Society of London)

Page 8

Evolution of the Geological Map of British Islands / UK

Map editions shown: 1874, 1906, 1939, 1969, 2007, 2013

(Image source: British Geological Survey)

Page 9

Definition of “Quaternary” in several versions of the International Stratigraphic Chart

Chart versions shown: 2004, 2005, 2008, 2009

Page 11

Distributed datasets: Regional geologic time scales

(Haq, 2007)

Page 14

Distributed datasets: Mismatches of geological units across political boundaries

Italy/France near Cuneo/Colmar: Cambrian / Carboniferous (Asch et al., 2012)

Wyoming/Colorado: Felsic and hornblendic gneisses / Granitic rocks (Ma et al., 2014)

(Base map courtesy: OneGeology-Europe and USGS)

Page 15

• Data and models, vocabularies, and ontologies
– Have we ever had model-independent datasets?
• Ontology dynamics and a data life cycle

CONCEPT
*Initial concepts
*Questions and answers
*Grant info

COLLECTION
*Questionnaire
*Coded instrument
*CAI metadata
*Paradata

PROCESSING
*Data specs
*Recodes
*Summary descriptive info

DISTRIBUTION
*Terms of use
*Citation
*Packaging info

DISCOVERY
*Catalog record
*Indexing
*Related publications

ANALYSIS
*Replication code
*Publications

ARCHIVING
*Preservation metadata
*Confidentiality
*Additional processing

REPURPOSING
*Post-hoc harmonization
*Data transformations

Diagram reproduced from (Spencer, 2012)

Page 16

Ontology dynamics

• Ontology Mapping
• Ontology Morphism
• Ontology Matching
• Ontology Articulation
• Ontology Translation
• Ontology Evolution
• Ontology Debugging
• Ontology Versioning
• Ontology Integration
• Ontology Merging

(Flouris et al., 2008)

Page 17

Potential challenges

• Reworking of the extant data in a data center
– e.g. caused by ontology/vocabulary versioning
• Semantic mismatch among data sources
– e.g. heterogeneity in ontologies of the same topic
• Differing understanding of the same dataset between data providers and data users
– e.g. a data provider understands Quaternary as 1.806 Ma-present, and a data user understands it as 2.588 Ma-present
• Error propagation in cross-discipline data re-use
– e.g. heterogeneous datasets may cause misconceptions in subsequent work

(Ma et al., 2014)
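
The Quaternary example above can be made concrete with a small sketch. It is an illustration only, not material from the slides: the two age bounds are the ones quoted in the bullet, while the class name, the function names, and the "provider"/"user" keys are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgeInterval:
    """A geologic time interval in Ma (million years before present)."""
    older_ma: float    # base of the interval
    younger_ma: float  # top of the interval; 0.0 stands for the present


# The two readings of "Quaternary" quoted on the slide.
# The keys "provider" and "user" are hypothetical labels for this sketch.
QUATERNARY = {
    "provider": AgeInterval(older_ma=1.806, younger_ma=0.0),
    "user": AgeInterval(older_ma=2.588, younger_ma=0.0),
}


def contains(interval: AgeInterval, age_ma: float) -> bool:
    """True if an age (in Ma) falls inside the interval, bounds included."""
    return interval.younger_ma <= age_ma <= interval.older_ma


def is_ambiguous(age_ma: float, definitions: dict) -> bool:
    """True if the definitions disagree on whether the age is Quaternary."""
    verdicts = {contains(iv, age_ma) for iv in definitions.values()}
    return len(verdicts) > 1


if __name__ == "__main__":
    for age in (1.0, 2.0, 3.0):
        print(age, "Ma ambiguous:", is_ambiguous(age, QUATERNARY))
```

Under these assumptions, any record dated between 1.806 Ma and 2.588 Ma is Quaternary for the data user but not for the data provider, which is the kind of silent mismatch the bullet describes.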

Page 18

A few recent works of interest

OneGeology-Europe

• 20 European nations providing national geologic maps at scale ~1:1M
• Harmonized geological terms and map legends
• Multilingual labels in 18 languages
• Central portal for data browsing/query among distributed data sources

A contribution to INSPIRE

http://www.onegeology-europe.org

Page 19

Federated query: Result of geologic units with age ‘Cenozoic - from 66 million years to today’
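
As a rough sketch of what such a federated age query does, the snippet below applies one age filter to several independent sources and merges the hits. It is not the OneGeology-Europe implementation: the source names, unit identifiers, and age values are hypothetical placeholders, and only the Cenozoic bounds (66 Ma to present) come from the slide.

```python
# Cenozoic bounds from the slide: 66 million years ago to today.
CENOZOIC_OLDER_MA = 66.0
CENOZOIC_YOUNGER_MA = 0.0

# Hypothetical distributed sources; every record below is a made-up placeholder.
SOURCES = {
    "source_a": [
        {"unit": "A1", "older_ma": 23.0, "younger_ma": 5.3},
        {"unit": "A2", "older_ma": 145.0, "younger_ma": 100.5},
    ],
    "source_b": [
        {"unit": "B1", "older_ma": 2.6, "younger_ma": 0.0},
    ],
}


def within(record: dict, older_ma: float, younger_ma: float) -> bool:
    """True if the record's age range lies entirely inside the queried range."""
    return record["older_ma"] <= older_ma and record["younger_ma"] >= younger_ma


def federated_query(sources: dict, older_ma: float, younger_ma: float) -> list:
    """Apply the same age filter to every source and merge the hits."""
    hits = []
    for source_name, records in sources.items():
        for record in records:
            if within(record, older_ma, younger_ma):
                hits.append((source_name, record["unit"]))
    return hits


if __name__ == "__main__":
    print(federated_query(SOURCES, CENOZOIC_OLDER_MA, CENOZOIC_YOUNGER_MA))
```

The filter only gives comparable results when every source expresses ages against the same time scale, which is why harmonized terms matter for a query like this.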

Page 20

Recently finished CGI vocabularies

Earth Resource Form
Environmental Impact Value
Exploration Activity Type
Exploration Result
UNFC Value
Earth Resource Expression
Earth Resource Shape
Enduse Potential
Mineral Occurrence Type
Mining Activity Type
Processing Activity Type
Mining Waste Type Value
Commodity Code
Mineral Deposit Group
Mineral Deposit Type
Product Value

• Construct a collection of vocabularies for populating information interchange documents and enabling interoperability
• Provide labels for concepts, scoped to various communities defined by language, science domain, or application domain

CGI Geoscience Terminology Workgroup
http://cgi-iugs.org/tech_collaboration/geoscience_terminology_working_group.html
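
The idea of one concept carrying labels for several language communities can be sketched as a simple lookup with a fallback language. This is an illustration, not the CGI vocabulary format: the identifier "ex:granite", the table layout, and the function name are hypothetical.

```python
# One concept, several language-specific labels.
# "ex:granite" is a hypothetical identifier used only for this sketch.
LABELS = {
    "ex:granite": {"en": "granite", "de": "Granit", "it": "granito"},
}


def pref_label(concept_id: str, lang: str, fallback: str = "en") -> str:
    """Return the label in the requested language, else the fallback label."""
    labels = LABELS[concept_id]
    return labels.get(lang, labels[fallback])


if __name__ == "__main__":
    print(pref_label("ex:granite", "de"))
    print(pref_label("ex:granite", "sv"))  # no Swedish label, falls back to English
```

Keeping the identifier stable while labels vary by community is what lets distributed datasets refer to the same concept regardless of the language shown to the user.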

Page 21

USGS Online Geologic Maps

• Standardized vocabulary with detailed annotation
• Forward and backward queries between spatial data and attribute data
• Links to further data sources, e.g. aeromagnetic survey, mineral resources data, soils, geochemical samples, etc.

http://mrdata.usgs.gov/geology/state/map.html

Page 22

Records of a point in the San Francisco area

Page 23

Recommendations

• Communities of practice on ontology and vocabulary
– Bottom-up, self-organized, with loose top-down control
• Formalize the ‘Concept’ step in a data life cycle
– Top-down, adopting outputs from the bottom-up approach
• Make it a virtuous circle between the bottom-up and top-down approaches

Thanks for listening.

@MarshallXMa