An Ontology Creation Methodology: A Phased Approach

Jon Atle Gulla, Norwegian University of Science and Technology, Norway, jag@idi.ntnu.no
Vijay Sugumaran, Oakland University, USA

Mar 28, 2015

Page 2

Agenda

• Ontology development
• Traditional ontology learning
• Limitations of ontology learning
• A phased approach to ontology learning

Page 3

The Challenge

• How to develop large, complex ontologies?
• How to keep ontologies up to date in dynamic domains?

Page 4

Ontology Modeling vs. Learning

• Traditional ontology engineering approach
  – Project: Form a team of ontology and domain experts
  – Ontology & domain experts: Collaborative manual modeling process
  – Domain experts: Verify the ontology against domain knowledge
  – Ontology experts: Verify the ontology against syntactic and semantic quality measures
• Expensive and time-consuming approach
• Stable domains assumed

• Ontology learning approach
  – Domain experts: Find representative domain text
  – Tool: Extract candidate classes, individuals and properties automatically from domain texts
  – Ontology & domain experts: Verify candidate structures and complete the ontology
• Can also be used to verify the domain quality of an existing ontology

• Cost-effective approach
• Problematic in dynamic domains

Page 5

Agenda

• Ontology development
• Traditional ontology learning
• Limitations of ontology learning
• A phased approach to ontology learning

Page 6

Ontology Learning Basis

• People communicate using domain-specific concepts
• People document using domain-specific concepts
• Ontology learning: Extract ontology structures from written documentation

• Requirements:
  – Documents representative of the domain terminology
  – Documents cover all the terminology
  – Well-defined and consistent use of terminology in the domain

[Figure labels: ontology discussions; realm of ontology learning; realm of ontology engineering; ontology in use]

Page 7

Levels of Ontology Learning

• Terms: sponsors, costs, charter
• Synonyms: (leader, manager, lead)
• Concepts: PROJECT
• Concept hierarchies: is_a(MANAGER, EMPLOYEE)
• Relations: FINANCE(ag: SPONSOR, go: PROJECT)
• Rules: $\forall x, y\, (\mathit{manager}(x,y) \rightarrow \mathit{report}(y,x))$

Degree of difficulty increases from terms to rules.
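To make the rule level concrete, here is a minimal Python sketch that applies the example rule to a set of facts. It assumes that manager(x, y) reads "x is the manager of y" and stores facts as hypothetical (predicate, arg1, arg2) tuples; neither the encoding nor the function name comes from the slides.

```python
# Facts are hypothetical (predicate, arg1, arg2) tuples.
# Assumed reading: manager(x, y) means "x is the manager of y".
Fact = tuple[str, str, str]


def apply_manager_rule(facts: set[Fact]) -> set[Fact]:
    """Apply  forall x, y: manager(x, y) -> report(y, x)  to every fact.

    One pass already reaches a fixpoint, because the derived report/2
    facts never match the rule's manager/2 premise.
    """
    derived = {("report", y, x) for (pred, x, y) in facts if pred == "manager"}
    return facts | derived


facts = {("manager", "alice", "bob"), ("manager", "alice", "carol")}
for fact in sorted(apply_manager_rule(facts)):
    print(fact)
```

A rule learner would have to propose the implication itself; the sketch only shows what such a learned rule means once it is in the ontology.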

Page 8

Ontology Learning Strategies

• Term extraction
  – Linguistic analysis
  – Statistical analysis
• Synonyms
  – Classification-based techniques
  – Distribution-based techniques
• Concept formation
  – Structure recognition
  – Keyphrase generation
  – Instance learning
• Concept hierarchy
  – Clustering
  – Lexico-syntactic patterns
  – Head-modifier approaches
  – Subsumption approaches
  – Classification-based techniques
• Relations
  – Association rules
  – Concept vectors
• Rules
  – Structure recognition for meta-property recognition
  – Dependency trees and path similarities
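As an illustration of the lexico-syntactic pattern strategy for concept hierarchies, the sketch below implements one classic Hearst pattern ("X such as A, B and C" suggests is_a(A, X), is_a(B, X), is_a(C, X)) with Python's `re` module. Approximating every noun phrase by a single word is an assumption made to keep the example short; in a pipeline like the one in these slides the pattern would run over noun-phrase chunks and lemmatized text, so that "genres" becomes "genre".

```python
import re

# Hearst pattern "NP such as NP, NP, and/or NP", with every NP crudely
# approximated as a single word (a simplifying assumption).
# The negative lookahead stops ", and" from being read as another item.
SUCH_AS = re.compile(
    r"(\w+),? such as (\w+(?:, (?!and\b|or\b)\w+)*(?:,? (?:and|or) \w+)?)"
)
# Separators inside the enumeration: ", ", " and ", ", and ", " or ", ...
SEPARATOR = re.compile(r",? (?:and|or) |, ")


def hearst_such_as(text: str) -> list[tuple[str, str]]:
    """Return (hyponym, hypernym) candidate pairs, i.e. is_a(hyponym, hypernym)."""
    pairs = []
    for match in SUCH_AS.finditer(text):
        hypernym = match.group(1).lower()
        for hyponym in SEPARATOR.split(match.group(2)):
            pairs.append((hyponym.lower(), hypernym))
    return pairs


print(hearst_such_as("Videoload offers genres such as drama, comedy, and thriller."))
# [('drama', 'genres'), ('comedy', 'genres'), ('thriller', 'genres')]
```

Such patterns are precise but sparse, which is one reason the slide lists them alongside clustering and subsumption approaches.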

Page 9

Ontology Learning Process

[Figure: domain text; reference set; PMBOK; concept candidates (scope management, WBS, business need, constituent components, product description, ...); search ontology; abstract elements, constraints, properties, rules]

• Automatic extraction of concept and relationship candidates
• Manual selection of candidates and completion of model

Page 10

Ex 1. Learning Concept/Individual Candidates

Input sentence:
Scope planning is the process of progressively elaborating and documenting the project work (project scope) that produces the product of the project.

• POS tagging:
  Scope/NNP planning/NN is/VBZ the/DT process/NN of/IN progressively/RB elaborating/VBG and/CC documenting/VBG the/DT project/NN work/NN (/( project/NN scope/NN )/) that/WDT produces/VBZ the/DT product/NN of/IN the/DT project/NN ./.
• Stopword removal (571 words)
• Lemmatization/stemming (POS tags not shown):
  Scope plan process progress elaborate document project work project scope produce product project
• Select consecutive nouns as candidate phrases:
  {scope planning, process, project work, project scope, product, project}
• Calculate tf.idf score for phrases:
  {(scope planning, 0.0097), (project scope, 0.0047), (product, 0.0043), (project work, 0.0008), (project, 0.0001), (process, 0.0000)}
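The candidate-selection and scoring steps can be sketched in Python as follows. The tagged input is the slide's own POS-tagged sentence; stopword removal and lemmatization are left out, and the tf.idf variant (term frequency normalized by document length, idf = log(N/df)) is an assumption, since the slide does not say which weighting produced its scores. The extra corpus documents are hypothetical, so the numbers will not match the slide.

```python
import math

TAGGED = (
    "Scope/NNP planning/NN is/VBZ the/DT process/NN of/IN progressively/RB "
    "elaborating/VBG and/CC documenting/VBG the/DT project/NN work/NN (/( "
    "project/NN scope/NN )/) that/WDT produces/VBZ the/DT product/NN of/IN "
    "the/DT project/NN ./."
)


def noun_runs(tagged: str) -> list[str]:
    """Join maximal runs of consecutive noun tokens (NN, NNS, NNP, NNPS)
    in 'word/TAG' text into lower-cased candidate phrases."""
    phrases, current = [], []
    for token in tagged.split():
        word, _, tag = token.rpartition("/")
        if tag.startswith("NN"):
            current.append(word.lower())
        elif current:
            phrases.append(" ".join(current))
            current = []
    if current:
        phrases.append(" ".join(current))
    return phrases


def tf_idf(phrase: str, doc: list[str], corpus: list[list[str]]) -> float:
    """tf = relative frequency of phrase in doc; idf = log(N / df)."""
    tf = doc.count(phrase) / len(doc)
    df = sum(1 for d in corpus if phrase in d)
    return tf * math.log(len(corpus) / df) if df else 0.0


doc = noun_runs(TAGGED)
corpus = [doc, ["project", "budget"], ["project", "risk", "process"]]  # hypothetical
for phrase in sorted(set(doc), key=lambda p: (-tf_idf(p, doc, corpus), p)):
    print(f"{phrase}: {tf_idf(phrase, doc, corpus):.4f}")
```

With this toy corpus, a phrase that occurs in every document ("project") gets idf = 0 and drops to the bottom of the ranking, while phrases unique to the input document rank highest.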

Page 11

Classes Relevant to the Drama Genre

• Data sources: IMDB, Wikipedia, Videoload
• Keyphrase extraction technique
• Noun phrases ranked according to various statistical measures

Page 12

Ex 2. Learning Relationship Candidates

[Figure: processing pipeline with the components Tokenizer, GATE Sentence splitter, GATE Tagger, GATE Lemmatizer, GATE Noun phrase extractor, Noun phrase indexer, Lucene Document indexer, Lucene Paragraph indexer, Lucene Sentence indexer, Light stemmer, Concept profile builder, Concept similarity calculation, Association rules miner and Relationship merger; intermediate outputs: concept profiles, association rules]

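Concept similarity (cosine of the vectors q and d):

$$\mathrm{sim}(q,d) = \frac{\sum_{i=1}^{n} q_i \, d_i}{\sqrt{\sum_{i=1}^{n} q_i^2}\;\sqrt{\sum_{i=1}^{n} d_i^2}} = \frac{q \cdot d}{\sqrt{q \cdot q}\;\sqrt{d \cdot d}}$$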
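The similarity formula translates directly into code. In this sketch each concept profile is a sparse vector stored as a dict from context term to weight; the profiles and their weights are hypothetical, as the slides do not specify how the concept profile builder weights terms.

```python
import math


def cosine(q: dict[str, float], d: dict[str, float]) -> float:
    """sim(q, d) = sum_i q_i*d_i / (sqrt(sum_i q_i^2) * sqrt(sum_i d_i^2)).

    Terms missing from a profile have weight 0, so only shared terms
    contribute to the dot product.
    """
    dot = sum(w * d[t] for t, w in q.items() if t in d)
    norm_q = math.sqrt(sum(w * w for w in q.values()))
    norm_d = math.sqrt(sum(w * w for w in d.values()))
    if norm_q == 0 or norm_d == 0:
        return 0.0  # an empty profile is treated as dissimilar to everything
    return dot / (norm_q * norm_d)


# Hypothetical concept profiles (context term -> co-occurrence weight).
actor = {"film": 2.0, "role": 3.0, "award": 1.0}
character = {"film": 3.0, "role": 2.0, "plot": 1.0}
print(round(cosine(actor, character), 4))  # 0.8571
```

Storing profiles as dicts keeps the computation proportional to the number of non-zero weights, which matters when profiles are built over a large vocabulary of noun phrases.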

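Association rule: $X \Rightarrow Y$, where $X \subset I$, $Y \subset I$ and $X \cap Y = \emptyset$ ($I$ is the set of items).

A rule $X \Rightarrow Y$ holds in the transaction set $D$ with confidence $c$ if $c\%$ of the transactions in $D$ that contain $X$ also contain $Y$. The rule $X \Rightarrow Y$ has support $s$ in the transaction set $D$ if $s\%$ of the transactions in $D$ contain $X \cup Y$.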
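Support and confidence as defined above can be computed directly. This sketch treats each transaction as the set of concepts extracted from one text unit (for example one sentence or paragraph, an assumption prompted by the sentence, paragraph and document indexers in the pipeline) and enumerates single-concept rules X ⇒ Y by brute force rather than with an Apriori-style miner. Fractions replace the percentages used in the definition; the thresholds and data are hypothetical.

```python
from itertools import permutations

Transaction = set[str]


def support(itemset: frozenset[str], transactions: list[Transaction]) -> float:
    """Fraction of transactions that contain every item of itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(x: frozenset[str], y: frozenset[str],
               transactions: list[Transaction]) -> float:
    """Fraction of the transactions containing X that also contain Y."""
    sx = support(x, transactions)
    return support(x | y, transactions) / sx if sx else 0.0


def pair_rules(transactions: list[Transaction], min_support: float,
               min_confidence: float) -> list[tuple[str, str, float, float]]:
    """Brute-force all rules {a} => {b} that meet both thresholds."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in permutations(items, 2):
        x, y = frozenset([a]), frozenset([b])
        s = support(x | y, transactions)
        c = confidence(x, y, transactions)
        if s >= min_support and c >= min_confidence:
            rules.append((a, b, s, c))
    return rules


# Hypothetical transactions: concepts co-occurring in four text units.
units = [{"actor", "film", "director"}, {"actor", "film"},
         {"film", "director"}, {"actor", "award"}]
for a, b, s, c in pair_rules(units, min_support=0.5, min_confidence=0.6):
    print(f"{a} => {b}  support={s:.2f}  confidence={c:.2f}")
```

Note that the rules are directional: director => film has confidence 1.0 here, while film => director has only about 0.67, because "film" also occurs in units without "director".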

Page 13

Relationships Relevant to the Drama Genre

• Association rules on extracted concepts

Page 14

Automatic OWL Generation
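A minimal sketch of automatic OWL generation in Python, using only the standard library's `xml.etree.ElementTree`: candidate classes and is_a pairs are written out as owl:Class elements with rdfs:subClassOf links in RDF/XML. The ontology IRI, the function name and the requirement that class names already be IRI-safe identifiers are assumptions for illustration; a production tool would more likely use a dedicated RDF/OWL library.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"
BASE = "http://example.org/videoload#"  # hypothetical ontology IRI

# Use the conventional prefixes when serializing.
for prefix, uri in (("rdf", RDF), ("rdfs", RDFS), ("owl", OWL)):
    ET.register_namespace(prefix, uri)


def to_owl(classes: set[str], is_a: set[tuple[str, str]]) -> str:
    """Serialize classes and is_a(sub, super) pairs as OWL in RDF/XML.

    Class names are assumed to be IRI-safe (e.g. 'Manager', not 'scope planning').
    """
    root = ET.Element(f"{{{RDF}}}RDF")
    ET.SubElement(root, f"{{{OWL}}}Ontology", {f"{{{RDF}}}about": BASE.rstrip("#")})
    for name in sorted(classes):
        cls = ET.SubElement(root, f"{{{OWL}}}Class", {f"{{{RDF}}}about": BASE + name})
        for sub, sup in sorted(is_a):
            if sub == name:
                ET.SubElement(cls, f"{{{RDFS}}}subClassOf",
                              {f"{{{RDF}}}resource": BASE + sup})
    return ET.tostring(root, encoding="unicode")


print(to_owl({"Employee", "Manager"}, {("Manager", "Employee")}))
```

Using the is_a(MANAGER, EMPLOYEE) example from the levels slide, the output contains an owl:Class for Manager with an rdfs:subClassOf link to Employee.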

Page 15

Agenda

• Ontology development
• Traditional ontology learning
• Limitations of ontology learning
• A phased approach to ontology learning

Page 16

Limitations of Ontology Learning

• Different techniques produce different results
• Different data sources produce different results
• Loss of control over the process
• Extensive verification of the final ontology needed
• New data hard to combine with old data

Page 17

Agenda

• Ontology development
• Traditional ontology learning
• Limitations of ontology learning
• A phased approach to ontology learning

Page 18

Ontology Learning for Entertainment Domain

Ontology evolution for Deutsche Telekom's Videoload download service

• What does "Brangelina" mean?
• Should "Pitt" refer to Brad Pitt or Michael Pitt?
• "Actor" vs. "Schauspieler" (German for actor)?
• All movies with Brad Pitt?
• Most recent movie with Pitt?

Page 19

Ontology Learning Project

• Duration: Nov 2007 – Nov 2009
• Domain: movie download service
• Ontology analysis and creation based on indexed noun phrases from movie documents
• Ontology used for search and navigation on top of the FAST search platform

• Ontology learning challenges:
  – Domain changes from one day to the next
  – No consistent domain terminology
  – No professional domain terminology
  – Multiple languages
  – Movies can be about anything: unlimited domain
  – Ontology needs to be up to date to support search

Page 20

Ontology Workbench

• Three phases that are carried out independently:
  – Crawling into Lucene indices
  – Supervised extraction of candidates
  – Combining candidates into ontology structures

[Figure: workbench phases Collection, Analysis and Creation; steps: crawling of web documents, linguistic pre-processing, statistical ontology extraction, set-theoretical ontology operations, OWL creation; artifacts: document & term statistics, extracted ontology elements, document ontology, OWL ontology]

Page 21

Interactive Ontology Development

• Expandable indices
• Subset of data source
• Focus of analysis
• List of techniques
• Partial results
• Stored results
• Set operations for combining results
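The set operations for combining stored results can be sketched in Python: each stored run (one technique applied to one data source) is a set of candidate concepts, and a left fold applies union, intersection or difference across the selected runs. The run names, candidate sets and the `combine` helper are hypothetical.

```python
from functools import reduce

OPERATIONS = {
    "union": set.union,                # candidates found by any run
    "intersection": set.intersection,  # candidates confirmed by every run
    "difference": set.difference,      # in the first run but in none of the others
}


def combine(results: dict[str, set[str]], runs: list[str], op: str) -> set[str]:
    """Fold the chosen set operation over the stored results of the given runs."""
    return set(reduce(OPERATIONS[op], (results[run] for run in runs)))


# Hypothetical stored results from two runs of the workbench.
results = {
    "tfidf/imdb": {"actor", "director", "sequel", "award"},
    "keyphrase/wikipedia": {"actor", "director", "screenplay"},
}
runs = ["tfidf/imdb", "keyphrase/wikipedia"]
print(sorted(combine(results, runs, "intersection")))  # ['actor', 'director']
print(sorted(combine(results, runs, "difference")))    # ['award', 'sequel']
```

Because the set operations return new sets (and `combine` copies the result), the stored results are never modified, so they can be recombined later as new runs are added.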

Page 22

Thank you