Open Group Internet Workshop: Enterprise Vocabulary. Lightweight Vocabularies / Ontologies for the Semantic Web / Web of Data. Brand Niemann, Senior Enterprise Architect, U.S. EPA, Washington, DC. May 20, 2009. http://semanticommunity.net

1

Open Group Internet Workshop: Enterprise Vocabulary

Lightweight Vocabularies / Ontologies for the Semantic Web / Web of Data

Brand Niemann
Senior Enterprise Architect
U.S. EPA, Washington, DC

May 20, 2009
http://semanticommunity.net

2

Background

• The Open Group:
  – The Open Group is a vendor-neutral and technology-neutral consortium whose vision of Boundaryless Information Flow™ will enable access to integrated information, within and among enterprises, based on open standards and global interoperability.
  – Semantic Interoperability Work Group:
    • The Internet and the World-Wide Web have solved the basic problems of information transmission; the next major advance will come from resolving the deeper issues of semantic interoperability.

3

Background

• The workshop will include a series of brief contributions on vocabularies and their use, including contributions from:
  – Dennis Attinger, Philips, on why Philips should use vocabularies;
  – Ron Schuldt, Lockheed Martin, on controlled vocabularies;
  – Brand Niemann, EPA, on Lightweight Vocabularies / Ontologies for the Semantic Web / Web of Data.

4

Background

• These contributions will be the basis for discussion of:
  – The Problem Space: why enterprises use vocabularies, how enterprises use vocabularies, and what problems enterprises have in using vocabularies;
  – What should vocabularies contain?
  – Are there common principles that apply to the seemingly different approaches?
  – The discussions and conclusions will be summarised in a report, which will be distributed to attendees and others who have provided input.

• Proceedings (password required)

5

Overview

• 1. Some Examples:
  – Dublin Core, FOAF, and DOAP: Metadata, People, & Projects
  – SKOS: Semantic Web Topic Hierarchy
  – Gist: “The Minimalist Upper Ontology” (Organizations)

• 2. U.S. Federal Data Reference Model:
  – SICoP Special Conferences: February 6, 2007, February 5, 2008, and February 17, 2009
  – Semantic Technology Conferences 2008 and 2009
  – DRM 3.0, Data.gov, and Data Modeling

• 3. Recent Activities:
  – DAMA Data Management Body of Knowledge Glossary
  – Interagency Working Group on Digital Data
  – 2009 Ontology Summit (April 5-6th) Pilot Projects
  – Vocabulary Camp (May 30th)

6

1. Some Examples

• Dublin Core:
  – The Dublin Core metadata element set is a standard for cross-domain information resource description. It provides a simple and standardised set of conventions for describing things online in ways that make them easier to find. Dublin Core is widely used to describe digital materials such as video, sound, image, text, and composite media like web pages. Implementations of Dublin Core typically use XML and are based on the Resource Description Framework (RDF). Dublin Core is defined in ISO Standard 15836 and NISO Standard Z39.85-2007.

– The Simple Dublin Core Metadata Element Set (DCMES) consists of 15 metadata elements: Title, Creator, Subject, Description, Publisher, Contributor, Date, Type, Format, Identifier, Source, Language, Relation, Coverage, and Rights.

– See http://en.wikipedia.org/wiki/Dublin_Core
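To make the element set concrete, here is a minimal sketch that describes one resource with a few of the 15 DCMES elements and writes it as RDF N-Triples. It assumes the standard DCMES namespace http://purl.org/dc/elements/1.1/; the resource URI and values are hypothetical, and a production system would normally use an RDF library rather than hand-built strings.

```python
# Minimal sketch: a Dublin Core (DCMES) record serialized as RDF N-Triples.
# Only the DC namespace is standard; the resource URI and values are hypothetical.
DC = "http://purl.org/dc/elements/1.1/"

# The 15 elements of the Simple Dublin Core Metadata Element Set.
DCMES_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


def escape_literal(text: str) -> str:
    """Escape backslashes, quotes, and line breaks for an N-Triples literal."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def dc_record_to_ntriples(resource_uri: str, record: dict) -> list:
    """Return one N-Triples statement per (element, value) pair in the record."""
    lines = []
    for element, value in record.items():
        if element not in DCMES_ELEMENTS:
            raise ValueError(f"not a DCMES element: {element}")
        lines.append(f'<{resource_uri}> <{DC}{element}> "{escape_literal(value)}" .')
    return lines


record = {
    "title": "Lightweight Vocabularies / Ontologies for the Semantic Web / Web of Data",
    "creator": "Brand Niemann",
    "date": "2009-05-20",
    "language": "en",
}
for line in dc_record_to_ntriples("http://example.org/slides/vocabularies", record):
    print(line)
```

Rejecting unknown keys keeps the record within the 15-element set; a DCMI Terms profile would need a larger whitelist.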

7

1. Some Examples

• FOAF:
  – FOAF (an acronym of Friend of a Friend) is a machine-readable ontology describing persons, their activities, and their relations to other people and objects. Anyone can use FOAF to describe himself or herself. FOAF allows groups of people to describe social networks without the need for a centralised database.

  – FOAF is a descriptive vocabulary expressed using the Resource Description Framework (RDF) and the Web Ontology Language (OWL).

– See: http://en.wikipedia.org/wiki/FOAF_(software)
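The "no centralised database" point can be sketched as follows: each person publishes a separate FOAF profile, and a consumer merges them into one social graph. This assumes the real FOAF namespace (http://xmlns.com/foaf/0.1/) and plain Python tuples as triples; the person URIs and names are hypothetical.

```python
# Minimal sketch: independently published FOAF profiles merged into one social graph.
# The FOAF and RDF namespaces are real; person URIs and names are hypothetical.
FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Each person publishes their own profile as (subject, predicate, object) triples.
alice_profile = {
    ("http://example.org/alice#me", RDF_TYPE, FOAF + "Person"),
    ("http://example.org/alice#me", FOAF + "name", "Alice"),
    ("http://example.org/alice#me", FOAF + "knows", "http://example.net/bob#me"),
}
bob_profile = {
    ("http://example.net/bob#me", RDF_TYPE, FOAF + "Person"),
    ("http://example.net/bob#me", FOAF + "name", "Bob"),
    ("http://example.net/bob#me", FOAF + "knows", "http://example.com/carol#me"),
}


def merge(*graphs):
    """Merge graphs by set union; valid here because the profiles use no blank nodes."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def name_of(graph, person):
    """Return the person's foaf:name, or the URI itself if no name was published."""
    for s, p, o in graph:
        if s == person and p == FOAF + "name":
            return o
    return person


def acquaintances(graph):
    """Return sorted (name, name) pairs for every foaf:knows link in the graph."""
    return sorted((name_of(graph, s), name_of(graph, o))
                  for s, p, o in graph if p == FOAF + "knows")


social_graph = merge(alice_profile, bob_profile)
print(acquaintances(social_graph))
```

Carol has not published a profile, so she appears only by URI. Shared URIs are what let separately hosted profiles join up.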

8

1. Some Examples

• DOAP:
  – Description of a Project (DOAP) is an RDF schema and XML vocabulary for describing open-source projects. It was created and initially developed by Edd Dumbill to convey semantic information associated with open-source software projects. It is currently used on the Mozilla Foundation's project page and in several other software repositories.

  – Generators, validators, viewers, and converters are available to help more projects be included in the Semantic Web.

– See http://en.wikipedia.org/wiki/DOAP
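As an illustration of what a DOAP generator might emit, here is a minimal sketch producing a project description in Turtle. The doap: and foaf: namespaces and the properties used (doap:name, doap:shortdesc, doap:programming-language, doap:maintainer, doap:homepage) are real; the project data is hypothetical, and treating name and homepage as "required" is this sketch's own validation rule, not part of DOAP.

```python
# Minimal sketch: generate a DOAP project description in Turtle.
# Namespaces and properties are real DOAP/FOAF terms; the project data is hypothetical.
# The REQUIRED check is an assumption of this sketch, not a DOAP rule.
PREFIXES = [
    "@prefix doap: <http://usefulinc.com/ns/doap#> .",
    "@prefix foaf: <http://xmlns.com/foaf/0.1/> .",
    "",
]
REQUIRED = ("name", "homepage")


def lit(text):
    """Quote a string as a Turtle literal, escaping backslashes and quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def doap_turtle(project_uri, project):
    missing = [key for key in REQUIRED if key not in project]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    body = [f"doap:name {lit(project['name'])}"]
    if "shortdesc" in project:
        body.append(f"doap:shortdesc {lit(project['shortdesc'])}")
    if "language" in project:
        body.append(f"doap:programming-language {lit(project['language'])}")
    if "maintainer" in project:
        # The maintainer is described inline as a FOAF person (a blank node).
        body.append(f"doap:maintainer [ a foaf:Person ; foaf:name {lit(project['maintainer'])} ]")
    body.append(f"doap:homepage <{project['homepage']}>")
    # In Turtle, predicate-object pairs for one subject are separated by ';' and end with '.'.
    statement = f"<{project_uri}> a doap:Project ;\n    " + " ;\n    ".join(body) + " ."
    return "\n".join(PREFIXES + [statement])


print(doap_turtle("http://example.org/projects/demo", {
    "name": "demo-vocab-tools",
    "homepage": "http://example.org/projects/demo",
    "shortdesc": "Hypothetical tools for publishing vocabularies",
    "language": "Python",
    "maintainer": "Jane Doe",
}))
```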

9

1. Some Examples

• SKOS:
  – Simple Knowledge Organisation Systems (SKOS) is a family of formal languages designed for the representation of thesauri, classification schemes, taxonomies, subject-heading systems, or any other type of structured controlled vocabulary. SKOS is built upon RDF and RDFS, and its main objective is to enable easy publication of controlled structured vocabularies for the Semantic Web. SKOS is currently developed within the W3C framework.

– See http://en.wikipedia.org/wiki/SKOS
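Publishing a controlled vocabulary in SKOS amounts to declaring a concept scheme, giving each concept a preferred label, and linking concepts with skos:broader. The sketch below does this in Turtle. The skos: terms (ConceptScheme, Concept, prefLabel, inScheme, topConceptOf, broader) come from the W3C SKOS vocabulary; the scheme URI and concept identifiers are hypothetical.

```python
# Minimal sketch: publish a small controlled vocabulary as SKOS in Turtle.
# skos: terms are from the W3C SKOS vocabulary; the scheme URI and ids are hypothetical.
SKOS_PREFIX = "@prefix skos: <http://www.w3.org/2004/02/skos/core#> ."


def skos_turtle(scheme_uri, concepts):
    """concepts maps a local id to (English preferred label, broader id or None).

    Labels are assumed not to contain double quotes.
    """
    out = [SKOS_PREFIX, "", f"<{scheme_uri}> a skos:ConceptScheme ."]
    for cid, (label, broader) in concepts.items():
        out.append(f"<{scheme_uri}#{cid}> a skos:Concept ;")
        out.append(f'    skos:prefLabel "{label}"@en ;')
        if broader is None:
            # Top concepts are attached directly to the scheme.
            out.append(f"    skos:topConceptOf <{scheme_uri}> .")
        elif broader in concepts:
            out.append(f"    skos:inScheme <{scheme_uri}> ;")
            out.append(f"    skos:broader <{scheme_uri}#{broader}> .")
        else:
            raise KeyError(f"unknown broader concept: {broader}")
    return "\n".join(out)


concepts = {
    "kos": ("Controlled vocabulary", None),
    "thesaurus": ("Thesaurus", "kos"),
    "taxonomy": ("Taxonomy", "kos"),
    "subjectHeadings": ("Subject-heading system", "kos"),
}
print(skos_turtle("http://example.org/vocab", concepts))
```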

10

1. Some Examples

• SKOS:
  – Semantic Web Topic Hierarchy (in OWL):
    • Taxonomy of Semantic Web Topics adopted by the European projects Knowledge Web:
      – 1.0 Foundations
      – 2.0 Semantic Web: Core Topics
      – 3.0 Semantic Web Special Topics
    • and REWERSE:
      – Knowledge Engineering / Ontology Engineering
      – Knowledge Representation and Reasoning
      – Basic Web Technologies
      – Information Access
      – Ontologies on the Semantic Web
      – Rules
      – Security / Trust / Privacy in the Semantic Web
      – Application Domains
      – Special Topics
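A topic hierarchy like this one can be navigated once it is expressed as broader links. The sketch below encodes a subset of the outline above as SKOS-style skos:broader pairs, computes the chain of ancestors that skos:broaderTransitive denotes, and renders the tree as an indented outline. The identifiers (such as "kw-2.0") and the grouping under one root are hypothetical conveniences; the source hierarchy itself is in OWL, and this is not its actual encoding.

```python
# Sketch: a subset of the slide's topic outline as SKOS-style broader links.
# Identifiers and the single root are hypothetical; labels are taken from the slide.
BROADER = {
    "knowledgeWeb": "root",
    "rewerse": "root",
    "kw-1.0": "knowledgeWeb",
    "kw-2.0": "knowledgeWeb",
    "kw-3.0": "knowledgeWeb",
    "rw-rules": "rewerse",
    "rw-ontologies": "rewerse",
}
LABELS = {
    "root": "Semantic Web Topic Hierarchy",
    "knowledgeWeb": "Knowledge Web topics",
    "rewerse": "REWERSE topics",
    "kw-1.0": "1.0 Foundations",
    "kw-2.0": "2.0 Semantic Web: Core Topics",
    "kw-3.0": "3.0 Semantic Web Special Topics",
    "rw-rules": "Rules",
    "rw-ontologies": "Ontologies on the Semantic Web",
}


def broader_transitive(concept):
    """Return all ancestors, nearest first (the closure skos:broaderTransitive denotes)."""
    ancestors, seen = [], {concept}
    while concept in BROADER:
        concept = BROADER[concept]
        if concept in seen:
            raise ValueError("cycle in broader hierarchy")
        seen.add(concept)
        ancestors.append(concept)
    return ancestors


def outline(concept="root", depth=0):
    """Render the hierarchy as an indented outline, children in insertion order."""
    lines = ["  " * depth + LABELS[concept]]
    for child, parent in BROADER.items():
        if parent == concept:
            lines.extend(outline(child, depth + 1))
    return lines


print("\n".join(outline()))
print([LABELS[c] for c in broader_transitive("kw-2.0")])
```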

11

1. Some Examples

• Gist: The Minimalist Upper Ontology:
  – Introduced in 2006 and at the 2007 Ontology Summit and Semantic Technology Conference:
    • It is different from other upper ontologies in that we have attempted to do two things simultaneously:
      – cover a very broad range of future applications
      – cover them with the fewest number of concepts

  – See http://www.gist-ont.com/

12

1. Some Examples

• Gist: The Minimalist Upper Ontology:
  – gist is an OWL 2 upper ontology with fewer than 200 concepts. It is freely available to use or modify. We have used it on several client engagements and have found that it covers most of the concepts needed for a large enterprise. Most of the distinctions we find are specializations of gist concepts rather than new concepts.

  – In this session we will briefly survey the world of upper ontologies. We will describe the structure and organization of gist. We will then show several design patterns contained in gist, including those that make use of OWL 2 features.

• At the conclusion of this talk, participants should be able to:
  – Access gist and use it as a learning aid
  – Understand how to specialize gist for an enterprise
  – Appreciate the need for some of the new OWL 2 features
  – Understand why committing to a minimalist upper ontology will reduce integration effort internally and externally

• This talk assumes some previous knowledge of OWL.

  – See http://www.semantic-conference.com/session/2054/
    • Dave McComb, President, Semantic Arts

  – 2009 Semantic Technology Conference, June 14-18, 2009, San Jose, California, June 16, 2009, 2:15-3:15 p.m.

13

Gist: the minimalist upper ontology

http://www.gist-ont.com/data/50643f60-cdd7-4e9d-b080-a543eb7c62a1/files//Gist%20Diagram3.vsd

Entities (130)

14

Gist: the minimalist upper ontology

http://www.gist-ont.com/data/50643f60-cdd7-4e9d-b080-a543eb7c62a1/files//Gist%20Diagram3.vsd

Properties (86)

15

2. U.S. Federal Data Reference Model

• Brief History:– DRM 1.0 – Mid-2005 (not accepted)– DRM 2.0 – December 2006 (widely accepted)– DRM 3.0 – June 2007 and Recently (Best Practices

Committee)• Workshops: February 6, 2007, February 5, 2008, and

February 17, 2009.– Lucian Russell wrote White Paper: Ontologies in the OWL-DL

sense should be created or referenced for each data item as needed, but class names should only be nouns. Non-lexical terms should only be specified as a specialization of a lexical term and specific inclusion/exclusion rules should be provided.

» Best Practice: NASA Global Change Master Directory– Professor Selmer Bringsjord: Using Sorted Logic to overcome

schema mismatch for semantic interoperability (ontology) across multiple relational databases.

16

2. U.S. Federal Data Reference Model

• Federal Enterprise Architecture Reference Model Revision Submission (April 10th):
  – Data Description:
    • Uniform Resource Identifiers (URI)
  – Data Context:
    • Taxonomy/Ontology:
      – Information: Topic and Subtopic
      – Data: Data Table and Data Elements
      – Information and Data Modeling: build on David Hay’s “Data Model Patterns” (2009)
  – Data Sharing:
    • Data and Metadata “Travel Together”

17

2. U.S. Federal Data Reference Model

• Work on DRM 3.0:
  – 2008: Getting to Web Semantics for Spreadsheets in the U.S. Government
  – 2009: Real World Semantic Query of Organizational Data
  – Recovery.gov and Data.gov Pilots

18

Steps in the Semantic Web @ EPA

Stage                       | Knowledgebase     | URL
Pre-RDF                     | Infobase          | http://www.sdi.gov
Ontology for Multiple RDBMS | OWL               | Not publicly accessible
RDF Access to RDBMS         | RDF Triple Stores | Proceedings of the 2008 Semantic Technology Conference
Semantic Publishing         | Web 2.0/3.0 Wikis | http://semanticommunity.net/
Dynamic Ontology            | Blackbook2        | In process (1)

(1) See Semantic Web Project Methodology

19

Getting to Web Semantics for Spreadsheets in the U.S. Government

• Every year, the U.S. Census Bureau publishes the Annual Statistical Abstract, "the authoritative and comprehensive summary of statistics on the social, political, and economic organization of the United States" as a large set of downloadable Excel spreadsheet files. This government data is not readily accessible to Web search engines and cannot readily be shared, reused, and analyzed in new contexts.

• This talk will present joint efforts between Cambridge Semantics, the U.S. EPA, and the Federal Semantic Interoperability Community of Practice (SICoP) to integrate semantic technologies, spreadsheets, and the Web to overcome many of these shortcomings. In particular, by representing information in the Census Bureau's spreadsheets as RDF data backed by definitions in a common semantic repository, shared concepts and relationships between different agencies' data are easily discovered and exploited. And by treating the spreadsheet as a user interface for manipulating semantic data, the data can easily be presented on the Web, where it is automatically updated when the underlying data tables change. This presentation will demonstrate the following in the context of the data that comprises the U.S. Government's Annual Statistical Abstract:

– The use of Cambridge Semantics' SHAPE middleware platform to extract semantic information from Microsoft Excel spreadsheets.

– A semantic repository containing shared definitions of data table columns that can be created, extended, and reused via a tightly integrated user interface in Excel.

– Real-time changes to information that are reflected in other spreadsheets.

– Repurposing the spreadsheet-based data tables onto the Web, while maintaining a live connection to the authoritative spreadsheet tables.

– Guided search and query across the data from different spreadsheets.

• http://www.semantic-conference.com/2008/session/588/index.html
  – Lee Feigenbaum, VP Technology and Standards, Cambridge Semantics Inc., and Brand Niemann, Senior Enterprise Architect, US EPA.
    • 2008 Semantic Technology Conference, May 18-22, 2008, San Jose, California, Wednesday, May 21, 2008, 08:30 AM - 09:30 AM.
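The core idea of "RDF data backed by definitions in a common semantic repository" can be sketched as mapping each spreadsheet column header to a shared property URI, so that every row becomes a resource and every cell a triple. This is not Cambridge Semantics' SHAPE platform; it is a plain-Python stand-in that assumes the spreadsheet has been exported to CSV. The repository URIs and the placeholder values are hypothetical, not Statistical Abstract figures.

```python
import csv
import io

# Hypothetical shared repository: each column header maps to one property URI that is
# defined once and reused by every spreadsheet carrying the same kind of data.
SHARED_COLUMNS = {
    "State": "http://example.org/def/state",
    "Year": "http://example.org/def/year",
    "Population": "http://example.org/def/population",
}


def table_to_triples(csv_text, table_uri):
    """Turn each spreadsheet row into a resource with one triple per cell."""
    triples = []
    for i, row in enumerate(csv.DictReader(io.StringIO(csv_text)), start=1):
        row_uri = f"{table_uri}/row/{i}"
        for header, value in row.items():
            prop = SHARED_COLUMNS.get(header)
            if prop is None:
                # Undefined columns are reported rather than given a private, unshared URI.
                raise KeyError(f"column {header!r} has no shared definition")
            triples.append((row_uri, prop, value))
    return triples


# Placeholder values for illustration only.
sheet = "State,Year,Population\nState A,2007,1000\nState B,2007,2000\n"
for triple in table_to_triples(sheet, "http://example.org/tables/t1"):
    print(triple)
```

Because two agencies' tables that both use "State" resolve to the same property URI, their rows can be joined without a bespoke mapping per pair of spreadsheets.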

20

Real World Semantic Query of Organizational Data

• Our experience in enterprise data integration over many years has taught us that for a new technology such as the Semantic Web to succeed, we need a solution that requires zero programming to implement; we deem this an essential prerequisite for mainstream adoption. We have built such a solution and show it in action, providing a queryable interface to 300+ Environmental Protection Agency spreadsheets and an Oracle RDBMS. We believe this is the first time that the benefit of the Semantic Web in this context (making it possible for end users to ask any query across dozens of spreadsheets and databases via an ontology) has been exposed to a mainstream audience.
  – http://www.semantic-conference.com/session/1559/

• Brian Donnelly, CEO, Semantic Discovery System, and Brand Niemann, Senior Enterprise Architect, US EPA.

– 2009 Semantic Technology Conference, June 14-18, 2009, San Jose, California, Wednesday, June 17, 2009, 05:00 PM - 06:00 PM.
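The principle behind querying "across dozens of spreadsheets and databases via an ontology" can be sketched by mapping each source's local column names to shared ontology terms once, so a query is written only in ontology terms. This is not the Semantic Discovery Systems product; it is a swapped-in plain-Python illustration using an in-memory SQLite table as a stand-in for the Oracle RDBMS. The ontology terms, mappings, and facility names are hypothetical.

```python
import sqlite3

# Hypothetical ontology terms, plus per-source mappings from local column names to them.
ONTOLOGY_TERMS = {"facilityName", "state"}
SPREADSHEET_MAP = {"Facility": "facilityName", "ST": "state"}
DATABASE_MAP = {"fac_nm": "facilityName", "state_cd": "state"}


def spreadsheet_records(rows):
    """Yield spreadsheet rows re-keyed by ontology term."""
    for row in rows:
        yield {SPREADSHEET_MAP[k]: v for k, v in row.items()}


def database_records(conn):
    """Yield database rows re-keyed by ontology term."""
    cur = conn.execute("SELECT fac_nm, state_cd FROM facilities")
    cols = [d[0] for d in cur.description]
    for values in cur:
        yield {DATABASE_MAP[c]: v for c, v in zip(cols, values)}


def query(sources, **criteria):
    """Return records from all sources whose ontology-term values match the criteria.

    Each source is an iterator and is consumed once.
    """
    unknown = set(criteria) - ONTOLOGY_TERMS
    if unknown:
        raise KeyError(f"not ontology terms: {sorted(unknown)}")
    return [rec for src in sources for rec in src
            if all(rec.get(k) == v for k, v in criteria.items())]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facilities (fac_nm TEXT, state_cd TEXT)")
conn.executemany("INSERT INTO facilities VALUES (?, ?)", [("Plant 1", "VA"), ("Plant 2", "MD")])
sheet = [{"Facility": "Site A", "ST": "VA"}, {"Facility": "Site B", "ST": "DC"}]

results = query([spreadsheet_records(sheet), database_records(conn)], state="VA")
print(sorted(r["facilityName"] for r in results))
```

The user-facing query never mentions "ST" or "state_cd"; adding a new source means adding one mapping, not rewriting queries.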

21

Recovery.gov and Data.gov Pilots

http://federaldata.wik.is/May_13%2c_2009_Semantic_Web_Meetup

22

3. Recent Activities

• The DAMA Dictionary of Data Management (Version 1.0) was announced at the DAMA International Symposium & Wilshire Meta Data Conference in San Diego, March 16-20, 2008:
  – Over 800 terms defining a common data management vocabulary for IT professionals, data stewards, and business leaders.

  – Over 40 topics including finance and accounting, knowledge management, architecture, data modeling, XML, and analytics.

• The DAMA Dictionary of Data Management was developed as the glossary for the DAMA-DMBOK Guide. Version 1.1 of the Dictionary will be published in conjunction with the DAMA-DMBOK Guide in 2009.

23

3. Recent Activities

• Interagency Working Group on Digital Data (IWGDD):
  – Formed under the auspices of the NSTC Committee for Science, the IWGDD exists to develop and promote the implementation of a strategic plan for the Federal government to cultivate an open, interoperable framework that will ensure reliable preservation of, and effective access to, digital data for research, development, and education in science, technology, and engineering.

• See Harnessing the Power of Digital Data for Science and Society

Role                       | Request                 | DRM 3.0 and Data.gov
Senior Advisor             | Value-added Data        | Annual Statistical Abstract of the Census Bureau
Chief Enterprise Architect | SOA Data Services Layer | Use a Web 2.0 Wiki with Web Oriented Architecture
Data Architect             | Data Model              | Data Model Patterns (2009) by David Hay (work on data model for the Federal Government in progress)
Data Standards             | Ontologies              | Pilot Projects from Recent Ontology Summit 2009 at NIST
Information Architect      | Semantic Web            | Open Linked Data with RDF (W3C Standard for Data on the Web)
Users                      | Innovation Ideas (1)    | The National Dialogue on IT Solutions for Recovery.gov

(1) http://federaldata.wik.is/The_Recovery_Dialogue_on_IT_Solutions

24

3. Recent Activities

• 2009 Ontology Summit (April 5-6th) Pilot Projects:
  – Connecting ISO/IEC 11179 to Data Sets:
    • ISO/IEC 11179-3 Edition 3 is expected to provide a standard metamodel for (among other things) defining the semantics of Data Elements in terms of formally defined concepts, as defined by formal ontologies. The connection between Data Elements and the actual data is, however, beyond the scope of 11179. Realization of the "Data Web" will require closing this gap, to connect datasets with the ontologies that define their semantics. A complete solution will need to address an array of dataset forms, including XBRL, SDMX, domain-specific XML schemas and "microformats", and relational and non-relational DBMSs. Some of this may be supported by OMG CWM and/or forthcoming IMM standards, but a broader framework is called for. Details here.
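The gap described above can be sketched in code. In ISO/IEC 11179, a data element pairs a data element concept (an object class plus a property, which can point at ontology terms) with a value domain. The missing step is an explicit binding from an actual dataset column to that data element. The sketch below is a simplified, hypothetical model (it is not the 11179-3 metamodel, CWM, or IMM): it binds a column to a data element and checks the data against the value domain. All URIs, names, and permissible values are illustrative.

```python
from dataclasses import dataclass

# Simplified sketch of the 11179 pattern: data element = concept + value domain.
# The column-to-element binding is the step the slide says lies outside 11179.
# All URIs, names, and permissible values are hypothetical.


@dataclass(frozen=True)
class DataElementConcept:
    object_class_uri: str  # ontology class the element is about
    property_uri: str      # ontology property the element records


@dataclass(frozen=True)
class DataElement:
    name: str
    concept: DataElementConcept
    permissible_values: frozenset  # an enumerated value domain


def check_dataset(rows, bindings):
    """Return (row index, column, value) for each bound value outside its value domain.

    Columns without a binding are ignored.
    """
    problems = []
    for i, row in enumerate(rows):
        for column, value in row.items():
            element = bindings.get(column)
            if element is not None and value not in element.permissible_values:
                problems.append((i, column, value))
    return problems


state_code = DataElement(
    name="FacilityStateCode",
    concept=DataElementConcept("http://example.org/onto#Facility",
                               "http://example.org/onto#locatedInState"),
    permissible_values=frozenset({"DC", "MD", "VA"}),
)
bindings = {"ST": state_code}  # dataset column -> data element
rows = [{"ST": "VA", "Name": "Site A"}, {"ST": "XX", "Name": "Site B"}]
print(check_dataset(rows, bindings))
print(bindings["ST"].concept.property_uri)
```

Once a column is bound, its meaning is reachable through the concept's ontology URIs, which is the link between datasets and ontologies that the pilot project calls for.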

25

3. Recent Activities

• 2009 Ontology Summit (April 5-6th) Pilot Projects:
  – Suggested Structure for Work:

    • Of – Existing Standards: Re-engineering, FEA Reference Models (e.g. FEA Reference Model Ontology), etc.

    • For – New Standards: Compliance, Privacy (e.g. Rick Murphy's Ontology of the Privacy Act of 1974), etc.

    • and By – New Standards: Harmonization, Executable, Acquisition (e.g. TopQuadrant's work for GSA), etc.

26

3. Recent Activities

• Vocabulary Camp, May 30-31, 2009, Washington DC:
  – Level of knowledge about ontologies:
    • Beginner: you are completely new to ontologies and semantic web standards in general.
    • Intermediate: you are familiar with ontologies and semantic web standards but inexperienced at building ontologies in practice.
    • Expert: you have actually developed an ontology before.

  – Mike Lang of Revelytix is the organizer.

– See http://vocamp.org/wiki/VoCampDCMay2009