Semantic Interoperability: Automatically Resolving Vocabularies 4th Semantic Interoperability Conference February 10, 2006 Chuck Mosher 8500 Leesburg Pike Vienna, VA [email protected]

Mar 27, 2015


Jose Bryant
Transcript
Page 1

Semantic Interoperability: Automatically Resolving Vocabularies

4th Semantic Interoperability Conference February 10, 2006

Chuck Mosher
8500 Leesburg Pike
Vienna, VA [email protected]

Page 2

Interoperable Information Backbone

• Enterprise-wide data abstraction layer for applications
• Integrated views of data from multiple sources
– Relational databases, applications, files
• Re-useable Data Services for data consistency
• Metadata-driven data management and integration
• Complements other data integration tools (ETL, EAI, quality, etc.)

[Diagram: Applications, MetaMatrix Enterprise Data Service Layer, Data Sources]

Page 3

Data Services

• A type of Web Service
• Does all of the work to transform any data in any format to a W3C compliant service
– Implements all of the logic to effect the transformation
– Provides access to data sources, regardless of source API, technology
• Does not implement application logic
• Decouples the data from the application while making the data discoverable and accessible

Page 4

Model-Based Approach Maximizes Re-use: Data Abstraction Without Coding

[Diagram; recoverable labels]
Information Consumers: Custom Apps; Web Services, Business Processes; Packaged Apps; Reporting, Analytics; EAI, Data warehouses
Exposed Information Services: ODBC, JDBC, SOAP; <WSDL> (contract)
Reusable Integrated Business Objects
Enterprise Information Sources (EIS): xml (sample: <sale> <value/> </sale>), databases, warehouses, spreadsheets, services, geo-spatial, rich media

Page 5

Meta Object Facility (MOF)

[Diagram: layers labeled Data, Model, Meta-model]

Page 6

MetaMatrix MetaBase Modeler

• Model disparate information sources
– Relational DBs
– Content Management Systems
– Files
– Services
– Applications
• Uses and retains domain-specific modeling terminology
– Relational models have “Tables”, “Foreign Keys”, “Columns”, etc.
– UML models have “Packages”, “Classes”, “Attributes”, etc.

Page 7

MetaMatrix MetaBase Modeler

• Define reusable data services/business objects

• Transformations defined with:
– Selects
– Joins
– Criteria
– Unions
– Functions
– User defined

• Perform schema and semantic matching, data type conversion
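
A minimal sketch of what such a transformation can look like, using Python's built-in sqlite3 module as a stand-in. The table and column names (crm_customer, billing_account, customer) are hypothetical, and this is plain SQL rather than the modeler's own transformation language. It shows a select, a union, a function, a data type conversion, and a criterion over the resulting reusable view.

```python
import sqlite3


def integrated_customers():
    """Build a virtual 'customer' view over two differently shaped sources."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Two hypothetical sources holding the same kind of record.
        CREATE TABLE crm_customer (cust_id TEXT, full_name TEXT);
        CREATE TABLE billing_account (account_no INTEGER, holder TEXT);
        INSERT INTO crm_customer VALUES ('0042', 'ada lovelace');
        INSERT INTO billing_account VALUES (7, 'Alan Turing');

        -- The reusable business object: select + union + function + type conversion.
        CREATE VIEW customer AS
            SELECT CAST(cust_id AS INTEGER) AS customer_id,
                   UPPER(full_name)         AS name
            FROM crm_customer
            UNION ALL
            SELECT account_no, UPPER(holder)
            FROM billing_account;
        """
    )
    # A consumer applies its own criteria against the view, not the sources.
    rows = conn.execute(
        "SELECT customer_id, name FROM customer "
        "WHERE customer_id > 10 ORDER BY customer_id"
    ).fetchall()
    conn.close()
    return rows


print(integrated_customers())
```

The consumer only sees customer_id and name; which source a row came from, and how its key was typed there, stays inside the view definition.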

Page 8

Semantic Mediation: The Problem

[Diagram; recoverable labels]

Multiple Internal/External Information Sources

Data Sources
- Authoritative
- Redundant
- Overlapping

Field names shown across sources: bldg_id, SITENUM, Facility_ID, Location_ID, bldg_type, Depot_Number, Location_Type

Logical Data Model: Enterprise-wide or COI-driven Data Model
• Rationalization and Semantic Mediation Layer
• Harmonization
• Data Catalog/Dictionary

Aggregate Data Services (Virtual XML Document: <a> </a> <b> </b> …):
• Relational or XML
• Application-specific
• Access via ODBC, JDBC, or SOAP APIs

Access paths: ODBC/JDBC, JDBC, SOAP

Consumers: Web Services, Portal Applications, Business Intelligence Applications

Page 9

Building Enterprise Semantic Model(s)

[Diagram; recoverable labels]

Multiple Internal/External Information Sources

Data Sources
- Authoritative
- Redundant
- Overlapping

Source domains:
J-1 Manpower / Personnel
J-2 Intelligence
J-3 Operations
J-4 Logistics (GCSS)
J-5 Plans & Policy
J-6 C4CS
J-7 Operational Plans
J-8 Force Structure

Enterprise-wide or COI-driven Data Models
• Rationalization
• Harmonization
• Data Catalogs

Access paths: ODBC/JDBC, JDBC, SOAP

Consumers: Web Services, Portal Applications, Business Intelligence Applications

Page 10

Biggest Challenge in Creating Data Services?

• Semantics!!!

• Structural differences are straightforward

• Differing definitions among data sources

• Differing vocabularies among COIs

• Established, emerging, and evolving data standards
– C2IEDM, JC3IEDM, GJXDM, NIEM, GFM, many more

• Not addressed by ETL, EAI, SOA

Page 11

A Previously Intractable Problem

• TWPDES has 1000+ core entities

• NIEM has 100,000+!

• Even a limited program with a dozen data sources could yield tens of thousands of potential mappings

• Humans cannot address this without help

• Indeed, it has stopped many data integration/reconciliation programs in their tracks.
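
A back-of-the-envelope sketch of why the numbers grow so quickly. The figure of 2,000 columns per source is an assumption chosen for illustration, not a number from the talk; the 1,000 target entities follows the order of magnitude quoted above for TWPDES.

```python
def mapping_workload(sources: int, columns_per_source: int, target_entities: int):
    """Return (mappings to decide, candidate pairs a matcher must consider)."""
    # Every source column needs a decision about where it maps in the common model.
    mappings_needed = sources * columns_per_source
    # Without any pruning, each column could pair with any target entity.
    candidate_pairs = mappings_needed * target_entities
    return mappings_needed, candidate_pairs


needed, pairs = mapping_workload(sources=12, columns_per_source=2_000, target_entities=1_000)
print(f"{needed:,} columns to map, {pairs:,} candidate pairs to consider")
```

Under these assumed sizes that is 24,000 mapping decisions drawn from 24,000,000 candidate pairs, which is the scale at which manual review alone stops being realistic.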

Page 12

Automated Semantic Matching

Page 13

DISCLAIMER

• Semantic matching can't really be done automatically yet!

• Requires intelligence to understand the context and semantics.

• So use computers to do most of the work but then have the user confirm or check the result.

Page 14

The Matching Problem

• Given two symbols, calculate a measure of the relationship between them:

amount quantity

Doesn’t seem so hard…

Page 15

The Matching Problem

• Given two symbols, calculate a measure of the relationship between them:

ftuqky aqfkyeyr

This is what a computer “sees.”

Page 16

The Matching Problem

• Even after extracting likely symbols, matching is a difficult problem.

• Symbols alone are not enough to generate good matches:
– “ID” -> “SocialSecurityNumber” or “NY”

• The solution relies on context:
– “NJ”, “MA”, “CA”, “ID”
– “Ego”, “SuperEgo”, “ID”

• MatchIt provides that context
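
One way to picture context-based disambiguation is to score each candidate sense of a symbol by how many of its cue words appear among the symbol's neighbors. The sense inventory and cue words below are invented for illustration; they are not the product's knowledge base, and the overlap count is a deliberately simple stand-in for whatever the tool actually does.

```python
# Hypothetical sense inventory: symbol -> {sense: cue words that suggest it}.
SENSES = {
    "ID": {
        "identifier": {"number", "code", "key", "ssn", "name"},
        "Idaho (US state)": {"nj", "ma", "ca", "ny", "state"},
        "id (psychology)": {"ego", "superego"},
    }
}


def pick_sense(symbol, neighbors):
    """Pick the sense whose cue words overlap most with neighboring symbols."""
    context = {n.lower() for n in neighbors}
    candidates = SENSES.get(symbol.upper(), {})
    scored = {sense: len(cues & context) for sense, cues in candidates.items()}
    best = max(scored, key=scored.get, default=None)
    # No overlap at all means the context gave no evidence; report that honestly.
    return best if best is not None and scored[best] > 0 else None


print(pick_sense("ID", ["NJ", "MA", "CA"]))
print(pick_sense("ID", ["Ego", "SuperEgo"]))
```

The same two letters resolve to a state abbreviation in the first list and to the psychological term in the second, purely because of what sits next to them.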

Page 17

MatchIT 1.0

• Integrated component of the MetaMatrix Semantic Data Services product

• Based on ontology-driven semantic knowledge base
– Word relationships, dictionaries, lexicons, thesauri

• Plug-in architecture

• Standards-compliant:
– OWL
– RDF
– Inference engines
– OSGi
– Eclipse
– JDBC

Page 18

(Semi-)Automated Semantic Mediation

Data Source Services: FBI, CBP, NYC, NY, NJ

Matched (Confidence of 90%): “Gender ID” and “Person Sex Code”

Ontology: “Sex” semantically related to “Gender”

* An extensible semantic knowledge base provides dictionary- and thesaurus-like information on “words”, their “meanings”, and their relationships to other words.

* A sophisticated set of matching algorithms provides string similarity matches and semantic matches with confidence ratings and explanations.

Page 19

Matching Techniques

• MatchIT uses two types of matching techniques:
– String Matching: attempts to determine the similarity of two strings based on the lexical distance between them.
– Semantic Matching: attempts to determine the similarity of two strings based on the ontological distance between them within a semantic ontology.

• Generate Match Sets
• Can be run individually or in combinations
• Pluggable architecture allows for algorithmic extensibility
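
A rough sketch of how pluggable matchers might be combined into a ranked match set. Everything here is an assumption for illustration: the Match record, the match_sets function, and the tiny synonym table are hypothetical, and difflib's SequenceMatcher stands in for whatever string algorithm the product uses.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Callable, Dict, List

# A matcher is any function scoring two symbols between 0.0 and 1.0.
Matcher = Callable[[str, str], float]


def string_matcher(a: str, b: str) -> float:
    """Lexical similarity from the standard library's SequenceMatcher."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


# Hypothetical stand-in for a semantic knowledge base.
SYNONYMS = {frozenset({"sex", "gender"}), frozenset({"amount", "quantity"})}


def synonym_matcher(a: str, b: str) -> float:
    a, b = a.lower(), b.lower()
    return 1.0 if a == b or frozenset({a, b}) in SYNONYMS else 0.0


@dataclass
class Match:
    source: str
    target: str
    score: float
    explanation: str


def match_sets(sources: List[str], targets: List[str],
               matchers: Dict[str, Matcher]) -> List[Match]:
    """Score every source/target pair with every plugged-in matcher."""
    results = []
    for s in sources:
        for t in targets:
            scores = {name: fn(s, t) for name, fn in matchers.items()}
            best = max(scores, key=scores.get)
            results.append(Match(s, t, scores[best], f"best matcher: {best}"))
    return sorted(results, key=lambda m: m.score, reverse=True)


plugins = {"string": string_matcher, "synonym": synonym_matcher}
for m in match_sets(["sex", "amount"], ["gender", "quantity"], plugins):
    print(m)
```

Adding a new algorithm means adding one entry to the plugins dictionary; running a single technique means passing only that entry.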

Page 20

String Matching

• What is the lexical distance between two symbols?
– “PUZZLE”, “PUZZ”
– “ID”, “IDENTIFIER”
– “STRONG”, “SONG”
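
One common way to measure lexical distance is Levenshtein edit distance, sketched below. The slides do not say which string algorithm the tool uses, so treat this as one illustrative choice; the normalized similarity function is likewise an assumption.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character inserts, deletes, or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute (or keep)
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Scale the distance to 0..1, where 1.0 means identical (case-insensitive)."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


for left, right in [("PUZZLE", "PUZZ"), ("ID", "IDENTIFIER"), ("STRONG", "SONG")]:
    print(left, right, levenshtein(left, right), round(similarity(left, right), 2))
```

The three pairs come out at distances 2, 8, and 2. Note that “ID” and “IDENTIFIER” score as far apart even though one abbreviates the other, while “STRONG” and “SONG” score as close despite being unrelated, which is why string matching on its own is not enough.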

Page 21

Semantic Matching

• How semantically similar are two concepts?

[Diagram: “is a” hierarchy over the concepts car, truck, motor vehicle, self-propelled vehicle, wheeled vehicle, vehicle, craft, aircraft, heavier-than-air craft, airplane]

Car and truck are very similar

Car and airplane are less similar
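
Ontological distance can be sketched as the number of “is a” hops between two concepts through their nearest shared ancestor. The edges below are a reading of the flattened diagram (nine “is a” links) and should be taken as an assumption, as should the choice of plain hop counting as the distance measure.

```python
# child -> parent, as read from the slide's "is a" hierarchy (assumed layout).
IS_A = {
    "car": "motor vehicle",
    "truck": "motor vehicle",
    "motor vehicle": "self-propelled vehicle",
    "self-propelled vehicle": "wheeled vehicle",
    "wheeled vehicle": "vehicle",
    "airplane": "heavier-than-air craft",
    "heavier-than-air craft": "aircraft",
    "aircraft": "craft",
    "craft": "vehicle",
}


def ancestors(term):
    """Map the term and each of its ancestors to the number of hops upward."""
    hops, out = 0, {}
    while term is not None:
        out[term] = hops
        term = IS_A.get(term)
        hops += 1
    return out


def ontological_distance(a, b):
    """Hops from a up to the nearest shared ancestor, plus hops down to b."""
    up_a, up_b = ancestors(a), ancestors(b)
    common = set(up_a) & set(up_b)
    if not common:
        return None  # the two concepts share no ancestor in this ontology
    return min(up_a[c] + up_b[c] for c in common)


print(ontological_distance("car", "truck"))
print(ontological_distance("car", "airplane"))
```

With these edges, car and truck are 2 hops apart (they meet at motor vehicle), while car and airplane are 8 hops apart (they only meet at vehicle), matching the slide's ranking.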

Page 22

Semantic Matching Objectives

• Find and rank the potential matches, but let the user review and decide for sure.

• I.e., eliminate 99+% of the things that don't match, and let the user review the <1%.

• Many times, a user can visually scan a small list of the top 1% and very quickly agree or disagree with the results.

• Favor false positives over false negatives.
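
These objectives amount to a filter-then-rank step, sketched here. The scores, the 0.3 threshold, and the top_k cutoff are invented for illustration; the deliberately low threshold reflects the preference for false positives, since a reviewer can discard a bad suggestion but cannot confirm one that was never shown.

```python
def shortlist(candidates, threshold=0.3, top_k=3):
    """Drop clear non-matches, then return the best few for human review."""
    kept = [(target, score) for target, score in candidates if score >= threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:top_k]


# Hypothetical scores for the source symbol "Gender ID" against four targets.
scored = [
    ("Depot_Number", 0.05),
    ("Person Name", 0.35),
    ("Person Sex Code", 0.90),
    ("Location_Type", 0.02),
]
print(shortlist(scored))
```

Two of the four candidates survive for review; the borderline “Person Name” is kept on purpose so the user, not the threshold, makes the final call.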

Page 23

Semantic Matching in MetaMatrix

[Architecture diagram; recoverable labels]

Sources: Enterprise Information Sources; Custom, AnySource, XML, File System, JDBC, RDBMS
Frameworks: MetaMatrix Connector Framework; MetaMatrix Importer Framework
Access paths: Metadata Access; Data/Content Access; Ontological Semantics Access
Matching: Instance-level Match; Schema-level Match; MatchIt Ontology; Semantic Knowledge Base; Lexicons; Fact Repository; Onomasticons
Modeling: MetaBase Modeler; Find Matches; Analyze; Visualize; Collaborate; Transform; Import; Export
Representations: Conceptual/Logical/Physical Data Models; Ontologies [OWL/RDF]; Domain [UML/ER]; Relational; XML
Repository: MetaBase Repository; Models & Files [versioned]; Search Index; Web Reporting
Outcome: Data Harmonization Complete

Page 24

Example

Page 25

Overall process

• Import two nontrivial vocabularies
– ERwin model of large data warehouse
– TWPDES XML schema

• Extract symbols
– Schema-specific tokenization algorithms

• Assign semantics to each
– Symbols are keys into dictionaries

• Perform semantic matching between them

• Analyze results
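
The symbol-extraction and dictionary-lookup steps can be sketched as follows. The tokenization rules (split on separators, then on camelCase boundaries) and the small abbreviation dictionary are assumptions for illustration, not the product's schema-specific algorithms.

```python
import re

# Hypothetical dictionary: extracted symbols are the keys, meanings the values.
ABBREVIATIONS = {"bldg": "building", "num": "number", "id": "identifier"}


def tokenize(symbol: str):
    """Split a column or element name into lowercase word-like symbols."""
    tokens = []
    # First split on underscores, hyphens, and whitespace ...
    for part in re.split(r"[_\-\s]+", symbol):
        # ... then on camelCase boundaries, keeping acronyms and digits whole.
        tokens += re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", part)
    return [t.lower() for t in tokens]


def expand(tokens):
    """Look each symbol up in the dictionary; unknown symbols pass through."""
    return [ABBREVIATIONS.get(t, t) for t in tokens]


for name in ["bldg_id", "Facility_ID", "PersonSexCode", "SITENUM"]:
    print(name, "->", expand(tokenize(name)))
```

A name like SITENUM has no separator or case change to split on, so it survives as a single symbol; handling such cases is where a dictionary-aware or schema-specific tokenizer would have to do better than this sketch.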

Page 26

ERwin Data Warehouse Model

Page 27

TWPDES XML Schema

Mapping Classes for each XML fragment in hierarchy

Page 28

Generated Symbol Dictionary (TWPDES)

Page 29

Generated Symbol Dictionary (ERwin model)

Page 30

Editing the Dictionary

Modify Definition

Page 31

Editing the Semantics

Control Senses

Page 32

Target Model

Match Results

Page 33

Examine Details

Page 34

Match Details

Page 35

Matches Used to Build Mappings

Page 36

The Integrating Function of the Common Semantic Model – via Domain-level Mapping

From Pat Cassidy & COSMO

[Diagram: “Obligation” and “Duty”, each connected by a SameAs link to “GenericObligation”]

Page 37

MatchIt Semantic Matching Tool

• A way to use ontologies in a world where nearly 100% of what already exists is not in an ontology.

• Map connections between ontologies that are being built and artifacts currently in use:
– RDBMS schemas
– XML and XSD files
– Spreadsheet data
– More coming, including ontologies!

• Map an imported model to a Vocabulary, and a Vocabulary to an Ontological structure

Page 38

Thank you