
Integrating Heterogeneous and Distributed Information about Marine Species through a Top Level Ontology

Jan 13, 2015


On 21 November 2013, Yannis Tzitzikas (FORTH) presented the paper “Integrating Heterogeneous and Distributed Information about Marine Species through a Top Level Ontology” at the 7th Metadata and Semantics Research Conference (MTSR) in Thessaloniki, Greece.
Transcript
Page 1

Yannis Tzitzikas et al., MTSR 2013, Thessaloniki

Integrating Heterogeneous and Distributed Information about Marine Species through a Top Level Ontology

Y. Tzitzikas 1,2, C. Allocca 1, C. Bekiari 1, Y. Marketakis 1, P. Fafalios 1,2, M. Doerr 1, N. Minadakis 1, T. Patkos 1, L. Candela 3

1 Institute of Computer Science, FORTH-ICS

2 Computer Science Department, University of Crete, GREECE

3 Consiglio Nazionale delle Ricerche, CNR-ISTI, Pisa, Italy

7th Metadata and Semantics Research Conference (MTSR), Thessaloniki, Nov 19-22, 2013

Page 2

Outline

• Context, Problem, Objectives

• Main Approaches for Integration

• The Followed Approach

– The Ontology MarineTLO

• Objectives, Benefits, Architecture

– The MarineTLO-based Warehouse

• Exploitation Scenarios

• Concluding Remarks


Context: iMarine


Identity: an FP7 Research Infrastructure project (2011-2014)

Final goal: launch an initiative aimed at establishing and operating an e-infrastructure supporting the principles of the Ecosystem Approach to fisheries management and conservation of marine living resources.

Partners: [partner logos not reproduced]

Page 3

Problem and objectives

The Problem

• There are several sources of marine-domain information, but each of them stores complementary information, structured according to its own needs.

Our objective

• Harmonize and integrate (link, connect) information from the marine domain

– A specific motivating scenario and use cases are given at the end


Marine Information: in several sources

WoRMS: World Register of Marine Species

Registers more than 200K species

ECOSCOPE: A Knowledge Base about Marine Ecosystems (IRD, France)

FLOD (Fisheries Linked Data) of the Food and Agriculture Organization (FAO) of the United Nations

FishBase: Probably the largest and most extensively accessed online database of fish species.

DBpedia

Page 4

Marine Information: in several sources

Taxonomic information

Ecosystem information (e.g. which fish eats which fish)

Commercial codes

General information, occurrence data, including information from other sources

General information, figures

Storing complementary information

Marine Information: in several sources

Web services (SOAP/WSDL)

RDF + OWL files

SPARQL Endpoint

Relational Database

SPARQL Endpoint

Using, and accessed through, different technologies

Page 5

Main approaches for integration

In general, there are two main approaches to integration:

Warehouse approach (materialized integration)

• Design Phase: The underlying sources (and their parts) have to be selected

• Creation Phase: Process for fetching the data and creating the warehouse

• Maintenance Phase: Ability to recreate the warehouse from scratch and/or to update parts of it

• Mappings are exploited to extract information from the data sources, to transform it into the target model and then to store it in the central repository
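The warehouse pipeline described above has three stages. Records are extracted, each source's mapping transforms them into the target model, and the result is stored centrally. The Python sketch below illustrates this under loud assumptions. The source record layouts, the functions `flod_to_tlo` and `worms_to_tlo`, and the property names `hasFAOCode` and `hasAuthority` are hypothetical stand-ins, not the actual MarineTLO mappings. A set of tuples stands in for the RDF triple store.

```python
from typing import Callable, Dict, List, Set, Tuple

Triple = Tuple[str, str, str]
Record = Dict[str, str]
MappingFn = Callable[[Record], List[Triple]]


def species_uri(name: str) -> str:
    # Hypothetical URI scheme: one warehouse URI per scientific name.
    return "species/" + name.strip().replace(" ", "_")


def flod_to_tlo(rec: Record) -> List[Triple]:
    # Hypothetical mapping for a FLOD-like record {"name", "faoCode"}.
    s = species_uri(rec["name"])
    return [(s, "rdf:type", "MarineSpecies"), (s, "hasFAOCode", rec["faoCode"])]


def worms_to_tlo(rec: Record) -> List[Triple]:
    # Hypothetical mapping for a WoRMS-like record {"scientificname", "authority"}.
    s = species_uri(rec["scientificname"])
    return [(s, "rdf:type", "MarineSpecies"), (s, "hasAuthority", rec["authority"])]


def build_warehouse(sources: Dict[str, Tuple[List[Record], MappingFn]]) -> Set[Triple]:
    """Extract every record, transform it with its source's mapping into the
    target model, and store the resulting triples in one central set."""
    warehouse: Set[Triple] = set()
    for records, mapping in sources.values():
        for rec in records:                 # extract
            warehouse.update(mapping(rec))  # transform, then store
    return warehouse


# Illustrative records only.
sources = {
    "FLOD": ([{"name": "Thunnus albacares", "faoCode": "YFT"}], flod_to_tlo),
    "WoRMS": ([{"scientificname": "Thunnus albacares",
                "authority": "(Bonnaterre, 1788)"}], worms_to_tlo),
}
warehouse = build_warehouse(sources)
```

Both mappings produce the same species URI, so the duplicate `rdf:type` triple collapses. The warehouse ends up holding three triples about one species. In this sketch, rebuilding from scratch (the maintenance phase) simply means calling `build_warehouse` again on freshly fetched records.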

Mediator approach (virtual integration)

• The mediator receives a query formulated in terms of the unified model/schema. The mappings are used to enable query translation. The derived sub-queries are sent to the wrappers of the individual sources, which transform them into queries over the underlying sources. The results of these sub-queries are sent back to the mediator where they are assembled to form the final answer
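The mediator flow can be sketched in the same spirit. A query over the unified schema is split into sub-queries. Each wrapper translates its sub-query into the source's native field names and answers it, and the mediator assembles the partial results. The `Wrapper` class, the field maps and the rows below are all hypothetical. Real wrappers would issue SPARQL, SQL or web-service calls against the live source rather than scan an in-memory list.

```python
from typing import Dict, List


class Wrapper:
    """Hypothetical wrapper: translates a unified-schema sub-query into the
    source's native field names and evaluates it over the source's rows."""

    def __init__(self, name: str, rows: List[Dict[str, str]], field_map: Dict[str, str]):
        self.name = name
        self.rows = rows
        self.field_map = field_map  # unified attribute -> native field name

    def supports(self, attr: str) -> bool:
        return attr in self.field_map

    def answer(self, key_attr: str, key_value: str, attr: str) -> List[str]:
        key, field = self.field_map[key_attr], self.field_map[attr]
        return [r[field] for r in self.rows if r.get(key) == key_value and field in r]


def mediate(wrappers: List[Wrapper], key_attr: str, key_value: str,
            attrs: List[str]) -> Dict[str, List[str]]:
    """Split a unified query into one sub-query per (attribute, wrapper) pair,
    send each to a wrapper able to answer it, and assemble the results."""
    result: Dict[str, List[str]] = {a: [] for a in attrs}
    for attr in attrs:
        for w in wrappers:
            if w.supports(key_attr) and w.supports(attr):
                result[attr].extend(w.answer(key_attr, key_value, attr))
    return result


flod = Wrapper("FLOD", [{"name": "Thunnus albacares", "code": "YFT"}],
               {"scientificName": "name", "faoCode": "code"})
dbpedia = Wrapper("DBpedia", [{"binomial": "Thunnus albacares", "label": "Yellowfin tuna"}],
                  {"scientificName": "binomial", "commonName": "label"})
combined = mediate([flod, dbpedia], "scientificName", "Thunnus albacares",
                   ["faoCode", "commonName"])
```

Nothing is copied ahead of time. If a source row changes, the next call to `mediate` sees it. That is the real-time reflection of updates mentioned below, and its price is contacting the sources on every query.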


Main approaches for integration (cont.)

Warehouse

• Benefit: Flexibility in transformation logic (including the ability to curate and fix problems)

• Benefit: Decoupling of the release management of the integrated resource from the management cycles of the underlying sources

• Benefit: Decoupling of access load from the underlying sources.

• Benefit: Faster responses (in query answering but also in other tasks, e.g. if one wants to use it for applying an entity matching technique).

• Shortcomings: You have to pay the cost of hosting the warehouse, and you have to refresh the warehouse periodically


Mediator

• Benefit: One advantage (but in some cases a disadvantage) of virtual integration is that source updates are reflected in the integrated access in real time

• Comment: The higher complexity of the system (and the quality of service demands on the sources) is only justified if immediate access to updates is indeed required.


Page 6

Main approaches for integration (cont.)

In both cases, we need a unified model/schema


The Ontology MarineTLO (Marine Top Level Ontology)

Page 7

MarineTLO: Objectives

• MarineTLO aims to be a global core model that

– provides a common and agreed-upon understanding of the concepts and relationships holding in the marine domain, to enable knowledge sharing, information exchange and integration between heterogeneous sources,

– covers the marine domain with suitable abstractions to enable the most fundamental queries,

– can be extended to any level of detail on demand, and

– allows data originating from distinct sources to be adequately mapped and integrated

• MarineTLO is not intended to be the single ontology covering the entirety of what exists


MarineTLO: Benefits from a Top-Level Ontology

• The adoption of a global core model has various benefits:

– reduced effort for improving and evolving the model

• the focus is on one model rather than many (the results benefit the entire community)

– reduced effort for constructing mappings

• this approach avoids the inevitable combinatorial explosion and complexity that result from pairwise mappings between individual metadata formats and/or ontologies


Page 8

MarineTLO: Key Design Principles

• Formulation

– It is an object-oriented semantic model, expressed in a form comprehensible to both documentation experts and information scientists, that can readily be converted to machine-readable formats such as RDF Schema, OWL, etc.

• Metaclasses

– certain types of inference about classes are supported, analogously to the way classes support certain types of inference about instances

• Monotonicity

– It aims to be monotonic in the sense of Domain Theory: the existing constructs, and the deductions made from them, should remain valid and well-formed even as new constructs are added to MarineTLO


MarineTLO: Query capabilities

It allows formulating complex queries, e.g.:

1. Given the scientific name of a species, find its predators, together with their taxon-rank classification and the different codes that organizations use to refer to them.

2. Given the scientific name of a species, find the ecosystems, water areas and countries to which this species is native, and the common names used for this species in each of these countries.
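Query 1 needs facts that, in practice, come from different sources: predator relations, taxonomic ranks and organization codes. Once these facts sit in one graph under shared identifiers, the query reduces to a simple join. The sketch below runs it over an in-memory triple list. All property names (`scientificName`, `isPredatorOf`, `hasGenus`, `hasFamily`, `hasCode`) are hypothetical placeholders, not MarineTLO terms. The predator data is likewise placeholder data, not real biology.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical integrated graph; comments note which kind of source might supply each fact.
GRAPH: List[Triple] = [
    ("sp:Thunnus_albacares", "scientificName", "Thunnus albacares"),
    ("sp:Predator_A", "isPredatorOf", "sp:Thunnus_albacares"),  # ecosystem source
    ("sp:Predator_A", "hasGenus", "Genus_A"),                   # taxonomic source
    ("sp:Predator_A", "hasFamily", "Family_A"),                 # taxonomic source
    ("sp:Predator_A", "hasCode", "org1:A1"),                    # codes, organization 1
    ("sp:Predator_A", "hasCode", "org2:A2"),                    # codes, organization 2
]

RANKS = ("hasGenus", "hasFamily")


def predators_with_codes(graph: List[Triple], name: str) -> Dict[str, Dict]:
    """For the species with this scientific name, return each predator's
    taxon-rank classification and the codes used to refer to it."""
    species = {s for s, p, o in graph if p == "scientificName" and o == name}
    predators = {s for s, p, o in graph if p == "isPredatorOf" and o in species}
    result = {}
    for pred in sorted(predators):
        facts = [(p, o) for s, p, o in graph if s == pred]
        result[pred] = {
            "classification": {p: o for p, o in facts if p in RANKS},
            "codes": sorted(o for p, o in facts if p == "hasCode"),
        }
    return result
```

No single source in this toy graph holds all three kinds of fact. That mirrors the claim that such queries cannot be answered by any one source individually.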


Page 9

The notion of competence queries as a driver


# | Query: for the scientific name of a species (e.g. Thunnus albacares or Poromitra crassiceps), find/give me

Q1 the biological environments (e.g. ecosystems) in which the species has been introduced, and more general descriptive information about it (such as the country)

Q2 its common names and their complementary info (e.g. languages and countries where they are used)

Q3 the water areas, and their FAO codes, in which the species is native

Q4 the countries in which the species lives

Q5 the water areas and the FAO partitioning code associated with a country

Q6 the presentation w.r.t. Country, Ecosystem, Water Area and Exclusive Economic Zone (of the water area)

Q7 the projection w.r.t. Ecosystem and Competitor, providing for each competitor the identification information (e.g. several codes provided by different organizations)

Q8 a map w.r.t. Country and Predator, providing for each predator both the identification information and the biological classification

Q9 who discovered it, in which year, the biological classification, the identification information, and the common names, providing for each common name the language and the countries where it is used

MarineTLO as a Product

• The “full” version of MarineTLO (Version 3.0.0)

– aims at covering any part of the marine domain

– contains 70 classes and 41 properties

• The “operational” version, for the needs of iMarine (Version 3.0.0)

– used for building the MarineTLO Warehouse (Version 3.0.0)

– contains 92 classes and 41 properties

– applied for integrating data mainly from the FLOD, ECOSCOPE, WoRMS (part of) and FishBase sources

• URL: www.ics.forth.gr/isl/MarineTLO


Page 10

FORTH, i-Marine, Ostend, January 2013

[Figure: MarineTLO Class Level (excerpt, Version 3.0.0). Classes shown include TLO Entity, Temporal Phenomenon, Persistent Item, Actor, Physical Man Made Thing, Man Made Thing, Conceptual Object, Physical Thing, Event, Exclusive Economic Zone, Codification System, Identifier, EEZCode, FAOGearTypeIdentifier, FAOVesselTypeIdentifier, Man Made Object, Vessel, Water Area, Area, Sub Area, Division, Sub Division, Ecosystem, Human Activity, Attribute Assignment, Country Code Assignment, Ecosystem Code Assignment, Scientific Name Assignment, Common Name Assignment, Water Area Code Assignment and Country.]


[Figure: MarineTLO Meta Class Level (excerpt, Version 3.0.0). Metaclasses shown include TLO Entity Type, Temporal Phenomenon Type, Persistent Item Type, Actor Type, Digital Object Type, Conceptual Object Type, Identifier Type, Physical Thing Type, Event Type, Equipment Type, Gear Type, Vessel Type, Ecosystem Type, Human Activity Type, Marine Ecosystem Type, Attribute Assignment Type, Biotic Element Type, Marine Animal Type, and the source-specific FishBase, DBpedia, WoRMS, FLOD and ECOSCOPE Marine Animal Types.]


Page 11

Example 1: Thunnus albacares


Example 2: Scientific name assignment

[Figure: the MarineSpecies instance Thunnus_albacares is linked (relatedIdentifierAssignment, relatedAuthorshipAssignment) to blank_node_Thunnus_albacares, a Scientific Name Assignment (an Attribute Assignment Event) with assignedName “Thunnus Albacares” and assignedDate “1788”. The Actor blank_node_Bonnaterre, with name “Bonnaterre”, is also connected to it. Other classes shown: PersistentItem.]


Page 12

Example 3: Species Establishment

[Figure: the Marine Species Poromitra crassiceps is linked by usuallyIsBioticElementOf (qualified by nativeIntroducedEndemic) to an Ecosystem (“Antarctic”), a Country (“Elephant I”) and a Water Area (“Atlantic, Antarctic”), which are linked to each other by isAssociatedWith.]
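With this structure, Q4 ("the countries in which the species lives") becomes a two-hop lookup. The query follows `usuallyIsBioticElementOf` from the species and keeps the places typed as Country. The sketch below assumes, as the figure suggests, that each edge links the species directly to a place. The URIs are hypothetical, and the nativeIntroducedEndemic qualifier is omitted because its values are not given in the example.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]

# Example 3 as triples (hypothetical URIs; nativeIntroducedEndemic qualifier omitted).
GRAPH: Set[Triple] = {
    ("Poromitra_crassiceps", "rdf:type", "MarineSpecies"),
    ("Poromitra_crassiceps", "usuallyIsBioticElementOf", "Antarctic"),
    ("Poromitra_crassiceps", "usuallyIsBioticElementOf", "Elephant_I"),
    ("Poromitra_crassiceps", "usuallyIsBioticElementOf", "Atlantic_Antarctic"),
    ("Antarctic", "rdf:type", "Ecosystem"),
    ("Elephant_I", "rdf:type", "Country"),
    ("Atlantic_Antarctic", "rdf:type", "WaterArea"),
}


def places_of(graph: Set[Triple], species: str, place_type: str) -> List[str]:
    """Places of the given type (Country, Ecosystem, WaterArea) of which the
    species is usually a biotic element."""
    places = {o for s, p, o in graph if s == species and p == "usuallyIsBioticElementOf"}
    return sorted(pl for pl in places if (pl, "rdf:type", place_type) in graph)
```

The same function answers the ecosystem and water-area parts of the competence queries by changing `place_type`.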


Exploiting MarineTLO

Page 13

Ways to use/exploit MarineTLO

1. For constructing semantic warehouses, which:

– can answer queries that cannot be answered by any of the underlying sources individually

– can aid the construction of mappings between instances

– can be exploited for various other tasks

• We shall see how they are exploited in the context of semantic post-processing of search results

2. Various other uses

– For publishing Linked Data

– For mashing up facts


[Figure: ways to use MarineTLO: constructing warehouses offering complex query answering; semantic post-processing of search results; publishing Linked Data and mashups]

Page 14

The MarineTLO-based Warehouse


Warehouse construction and evolution process


1. Define requirements in terms of competence queries

2. Fetch the data from the selected sources (SPARQL endpoints, services, etc.)

3. Transform and ingest the data into the Warehouse

4. Inspect the connectivity of the Warehouse

5. Formulate rules creating sameAs relationships (rules for instance matching)

6. Apply the rules to the Warehouse, which produces sameAs triples

7. Ingest the sameAs relationships into the Warehouse

8. Test and evaluate the Warehouse (using the competence queries)

(The diagram marks three of these steps as using MaTWare.)
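The instance-matching part of this process can be sketched in Python. It consists of formulating a rule, applying it to produce sameAs triples, ingesting those triples, and re-running a competence query. Several assumptions apply. The matching rule (equal scientific names after case and whitespace normalization), the URIs and the property names are hypothetical, and the rules actually used for the warehouse are not given here. A name-equality rule also ignores synonyms and homonyms.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]
SAME = "owl:sameAs"


def normalize(name: str) -> str:
    # Case- and whitespace-insensitive form of a scientific name.
    return " ".join(name.lower().split())


def same_name_rule(triples: Set[Triple]) -> Set[Triple]:
    """Hypothetical instance-matching rule: URIs whose scientific names
    normalize to the same string are declared owl:sameAs."""
    by_name: Dict[str, Set[str]] = defaultdict(set)
    for s, p, o in triples:
        if p == "scientificName":
            by_name[normalize(o)].add(s)
    return {(a, SAME, b) for uris in by_name.values()
            for a, b in combinations(sorted(uris), 2)}


def equivalents(triples: Set[Triple], uri: str) -> Set[str]:
    # All URIs reachable through sameAs links, followed in both directions.
    seen, stack = {uri}, [uri]
    while stack:
        u = stack.pop()
        for s, p, o in triples:
            if p == SAME and u in (s, o):
                other = o if u == s else s
                if other not in seen:
                    seen.add(other)
                    stack.append(other)
    return seen


def competence_query(triples: Set[Triple], name: str) -> Set[Tuple[str, str]]:
    """All facts about the species with exactly this name, merged across
    sameAs-equivalent URIs."""
    starts = {s for s, p, o in triples if p == "scientificName" and o == name}
    uris = set().union(*(equivalents(triples, u) for u in starts))
    return {(p, o) for s, p, o in triples
            if s in uris and p not in ("scientificName", SAME)}


warehouse = {
    ("flod:yft", "scientificName", "Thunnus albacares"),
    ("flod:yft", "hasFAOCode", "YFT"),
    ("worms:t_albacares", "scientificName", "Thunnus Albacares"),
    ("worms:t_albacares", "hasAuthority", "(Bonnaterre, 1788)"),
}
before = competence_query(warehouse, "Thunnus albacares")   # records not yet connected
links = same_name_rule(warehouse)                           # formulate and apply the rule
after = competence_query(warehouse | links, "Thunnus albacares")  # ingest, re-test
```

Before matching, the competence query sees only the FLOD record. After the sameAs triples are ingested, it also sees the WoRMS facts. A connectivity check could compare such results across many queries.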

Page 15

The MarineTLO-based warehouse’s contents: used sources


[Figure: FLOD, ECOSCOPE, WoRMS (part of), DBpedia (part of) and FishBase (part of) are each replicated and transformed, through the FLOD-to-TLO, ECOSCOPE-to-TLO, WoRMS-to-TLO, DBpedia-to-TLO and FishBase-to-TLO mappings respectively, into an RDF triple store structured according to MarineTLO.]

The MarineTLO-based warehouse’s contents: in numbers

iMarine 2nd Review, September 2013, Brussels

Source      Species number
DBpedia     14,291
FLOD        10,849
WoRMS       1,124
Ecoscope    277
FishBase    31,277

Common species (size of pairwise intersections)

            FLOD    WoRMS   Ecoscope   FishBase
DBpedia     3,046   731     56         9,833
FLOD                768     73         6,141
WoRMS                       53         1,288
Ecoscope                               53

• The warehouse now contains information about 37,000 distinct marine species (including FishBase). Number of triples: 2,970,058
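The figures above relate in a simple way. The per-source counts are set sizes, and the "common species" table holds pairwise intersection sizes. The distinct total is the size of the union, which is why it is smaller than the sum of the per-source counts. The sketch below uses toy species sets, not the warehouse's data. It assumes species are identified by exact scientific name, whereas in the warehouse that identity is what the instance-matching (sameAs) step establishes.

```python
from itertools import combinations
from typing import Dict, Set, Tuple


def pairwise_overlaps(sources: Dict[str, Set[str]]) -> Dict[Tuple[str, str], int]:
    # Size of the intersection for every unordered pair of sources.
    return {(a, b): len(sources[a] & sources[b]) for a, b in combinations(sources, 2)}


def distinct_species(sources: Dict[str, Set[str]]) -> int:
    # Size of the union: species present in several sources are counted once.
    return len(set().union(*sources.values()))


# Toy sets for illustration only.
toy = {
    "DBpedia": {"Thunnus albacares", "Xiphias gladius", "Gadus morhua"},
    "FLOD": {"Thunnus albacares", "Gadus morhua", "Merluccius merluccius"},
    "WoRMS": {"Thunnus albacares", "Poromitra crassiceps"},
}
overlaps = pairwise_overlaps(toy)
total = distinct_species(toy)
```

Here the per-source sizes sum to 8, but only 5 distinct species exist once the overlaps are counted once.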


Page 16

The MarineTLO-based warehouse’s contents: concepts


[Table: concepts provided by each source (Ecoscope, FLOD, WoRMS, DBpedia, FishBase). Concepts: Species, Scientific Names, Authorships, Common Names, Predators, Ecosystems, Countries, Water Areas, Vessels, Gears, EEZ]

Exploiting the MarineTLO-based Warehouse for Semantic Post-Processing of Search Results

Page 17

For Semantic Post-Processing: The process

[Figure: the query terms are submitted and the top-L results (+ metadata) are retrieved; Entity Mining identifies entities in the contents; Semantic Analysis uses semantic data from the MarineTLO Warehouse for grouping, ranking and retrieving more properties; Visualization/Interaction offers faceted search, entity exploration, annotation, top-k graphs, etc., alongside web browsing of the contents.]
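The entity-mining and grouping/ranking stages of this pipeline can be sketched with a dictionary (gazetteer) approach. This rests on three assumptions. First, the gazetteer stands in for entity names that would come from the MarineTLO Warehouse. Second, the snippets stand in for the top-L results. Third, ranking entities by the number of results that mention them is one simple choice, not necessarily the one XSearch uses.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple


def mine_entities(snippet: str, gazetteer: List[str]) -> List[str]:
    # Dictionary-based entity mining: a name matches on word boundaries, ignoring case.
    return [name for name in gazetteer
            if re.search(r"\b" + re.escape(name) + r"\b", snippet, re.IGNORECASE)]


def group_and_rank(results: List[str], gazetteer: List[str]) -> List[Tuple[str, List[int]]]:
    """Group result indices by the entities mined from them, ranking entities
    by how many results mention them (ties broken alphabetically)."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for i, snippet in enumerate(results):
        for name in mine_entities(snippet, gazetteer):
            groups[name].append(i)
    return sorted(groups.items(), key=lambda kv: (-len(kv[1]), kv[0]))


# Hypothetical gazetteer and result snippets.
GAZETTEER = ["Thunnus albacares", "Poromitra crassiceps", "Xiphias gladius"]
RESULTS = [
    "Stock assessment of Thunnus albacares",
    "Thunnus albacares and Xiphias gladius catch statistics",
    "Notes on Poromitra crassiceps",
]
ranked = group_and_rank(RESULTS, GAZETTEER)
```

Each group could then be enriched by fetching more properties of the entity from the warehouse, which is the "retrieving more properties" step in the figure.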

XSearch-Portlet Screenshot

[Screenshot with callouts: Search Results; Result of Entity Mining; Result of textual clustering; The Warehouse is used]

Page 18

Example of an EntityCard of XSearch (if the entity’s type is Species)

[Screenshot: an entity card combining information from FLOD, DBpedia, Ecoscope and WoRMS; the Warehouse is used]

XSearch as a bookmarklet

[Screenshot: annotating entities over the original page; entity exploration; the Warehouse is used]

Page 19

Concluding Remarks


• To address the need for integrated sets of facts about marine species, and thus to assist research on species and biodiversity, we have described a top-level ontology for that domain.

– It provides a unified and coherent core model for schema mapping which enables formulating and answering queries which cannot be answered by any individual source.

• We detailed the process of constructing MarineTLO-based warehouses. The current warehouse contains information about more than 37K marine species.

• We have identified and described particular use cases and applications that exploit this ontology and its warehouse.


Page 20

Future Work and Research

• Next steps

– Finalize and make accessible the next release of the warehouse (in 2013)

• Current and Future Research

– Focus on quality/connectivity issues


Links

• MarineTLO

• http://www.ics.forth.gr/isl/MarineTLO/

• Triple stores

– MarineTLO-Warehouse: http://virtuoso.i-marine.d4science.org:8890/sparql

– also browsable through http://virtuoso.i-marine.d4science.org:8890/fct

• Systems

– X-Search and gCube Search

• Portlet: https://i-marine.d4science.org/ (in various VREs, e.g. FCPPS, iSearch)

• Web Applications:

– http://62.217.127.118/x-search/ (over Bing and MarineTLO-Warehouse)

– http://62.217.127.118/x-search-fao/ (over ECOSCOPE and MarineTLO-Warehouse)
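The SPARQL endpoint listed above can be queried over plain HTTP following the W3C SPARQL 1.1 Protocol, using a GET request with a `query` parameter. The sketch below uses only the Python standard library. The query is a generic placeholder that assumes nothing about MarineTLO's vocabulary. The endpoint dates from 2013 and may no longer be online, so the network call is left commented out.

```python
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "http://virtuoso.i-marine.d4science.org:8890/sparql"


def build_sparql_request(query: str, endpoint: str = ENDPOINT) -> Request:
    # SPARQL 1.1 Protocol: query via GET, URL-encoded in the "query" parameter;
    # the Accept header asks for SPARQL JSON results.
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"
request = build_sparql_request(QUERY)

# To send it (requires network access and a live endpoint):
# import json
# from urllib.request import urlopen
# with urlopen(request, timeout=30) as response:
#     rows = json.load(response)["results"]["bindings"]
```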


Page 21

Thank you for your attention


Visit and send us feedback: www.ics.forth.gr/isl/MarineTLO