
Asian Language Resources Summit, Phuket, March 2009. KYOTO (ICT-211423): Knowledge Yielding Ontologies for Transition-Based Organization. FP7: Intelligent Content and Semantics.

Mar 27, 2015


Lillian Roche
Transcript
Page 1

Asian Language Resources Summit, Phuket, March, 2009

KYOTO (ICT-211423)
Knowledge Yielding Ontologies for Transition-Based Organization
FP7: Intelligent Content and Semantics

http://www.kyoto-project.eu/

Piek Vossen, VU University Amsterdam

Page 2

Overview

• Background information

• Baseline for retrieval in environment domain

• System architecture

• Knowledge mining

• Conclusions

Page 3

KYOTO (ICT-211423) Overview

• Title: Knowledge Yielding Ontologies for Transition-Based Organization

• Funded:
– 7th Framework Program-ICT of the European Union: Intelligent Content and Semantics
– Taiwan and Japan funded by national grants

• Goal:
– Open and free platform for knowledge sharing across languages and cultures
– Wiki environment that allows people in the field to maintain their knowledge and agree on meaning without knowledge engineering skills
– Bootstrap through open text mining & concept learning
– Enables knowledge transition and information search across different target groups, transgressing linguistic, cultural and geographic boundaries
– Enables deep semantic search for facts and knowledge

• URL: http://www.kyoto-project.eu/

• Duration:
– March 2008 – March 2011

• Effort:
– 364 person months of work

Page 4

Consortium

1. Vrije Universiteit Amsterdam (Amsterdam, The Netherlands)
2. Consiglio Nazionale delle Ricerche (Pisa, Italy)
3. Berlin-Brandenburg Academy of Sciences and Humanities (Berlin, Germany)
4. Euskal Herriko Unibertsitatea (San Sebastian, Spain)
5. Academia Sinica (Taipei, Taiwan)
6. National Institute of Information and Communications Technology (Kyoto, Japan)
7. Irion Technologies (Delft, The Netherlands)
8. Synthema (Rome, Italy)
9. European Centre for Nature Conservation (Tilburg, The Netherlands)

• Subcontractors:
– World Wide Fund for Nature (Zeist, The Netherlands)
– Masaryk University (Brno, Czech Republic)

Page 5

KYOTO (ICT-211423) Overview

• Languages: – English, Dutch, Italian, Spanish, Basque, Chinese, Japanese

• Domain: – Environmental domain, BUT usable in any domain

• Global: – Both European and non-European languages

• Available: – Free: as open source system and data (GPL)

• Future perspective: – Content standardization that supports world wide communication

Page 6

State of the art in the environment domain

Page 7

Baseline for environment domain

• Mainly use Google, first 10 hits, no advanced options

• Textual search with linguistic enhancements but no real semantic search:
– polluted water….
– polluting water….

• Growing time & information pressure:
– deliver current information from diverse & dynamic sources
– regional, local situations ► no general source
– various subdomains ► government, legal, biology, health, industry
– difficult access ► scientific publications
– no time to read ► too much information and work pressure
– dependent on trust: scientists ► environmentalists ► government ► general public

Page 8

High-level targets & Low-level questions

• High-level target (about 300 questions collected):
– Are there huge negative effects with regard to ecological networks and alien invasive species?

• Low-level facts that support answering the high-level targets:
– cases of alien invasion
– number of species
– causal relations associated with these (increments of) invasions
– causes related to ecological networks
– limit in the same time and location boundary

Page 9

Page 10

Baseline retrieval results: 6 persons, 30 high-level questions

Result rank    CONFIRMED       DISAPPROVED     UNDECIDED       Total
0              13  20.31%      27  20.30%      10  15.87%      50  19.23%
1               6   9.38%       9   6.77%       9  14.29%      24   9.23%
2               8  12.50%      13   9.77%       7  11.11%      28  10.77%
3               5   7.81%       6   4.51%       3   4.76%      14   5.38%
4               8  12.50%       6   4.51%       2   3.17%      16   6.15%
5               2   3.13%       7   5.26%       3   4.76%      12   4.62%
6               2   3.13%       6   4.51%       4   6.35%      12   4.62%
7               2   3.13%       2   1.50%       1   1.59%       5   1.92%
8               4   6.25%       3   2.26%       1   1.59%       8   3.08%
9               1   1.56%       5   3.76%       0   0.00%       6   2.31%
-1             13  20.31%      49  36.84%      23  36.51%      85  32.69%
Total          64  24.62%     133  51.15%      63  24.23%     260

Percentages in the rank rows are shares of the column total; percentages in the Total row are shares of all 260 judgments.
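The percentages in the baseline table can be re-derived from the raw counts. The sketch below is only an illustration: the counts are copied from the table, while the function names are hypothetical and the slide does not define what rank -1 stands for, so it is treated here simply as one more row.

```python
# Counts per result rank, copied from the baseline table.
# Rank -1 is kept as an ordinary row; its meaning is not stated on the slide.
COUNTS = {
    "CONFIRMED":   {0: 13, 1: 6, 2: 8, 3: 5, 4: 8, 5: 2, 6: 2, 7: 2, 8: 4, 9: 1, -1: 13},
    "DISAPPROVED": {0: 27, 1: 9, 2: 13, 3: 6, 4: 6, 5: 7, 6: 6, 7: 2, 8: 3, 9: 5, -1: 49},
    "UNDECIDED":   {0: 10, 1: 9, 2: 7, 3: 3, 4: 2, 5: 3, 6: 4, 7: 1, 8: 1, 9: 0, -1: 23},
}


def column_percentages(counts):
    """Share of each rank within one column, in percent (two decimals)."""
    total = sum(counts.values())
    return {rank: round(100 * n / total, 2) for rank, n in counts.items()}


def grand_total(table):
    """Number of judgments over all columns and ranks."""
    return sum(sum(column.values()) for column in table.values())


def share_at_rank(table, rank):
    """Share of all judgments that fall on the given rank, in percent."""
    hits = sum(column[rank] for column in table.values())
    return round(100 * hits / grand_total(table), 2)


if __name__ == "__main__":
    print(column_percentages(COUNTS["CONFIRMED"]))
    print(grand_total(COUNTS), share_at_rank(COUNTS, -1))
```

Running the helpers against the copied counts is a quick way to confirm that the reconstructed table is internally consistent.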

Page 11

KYOTO's Solution

• Text mining:
– Massive and accurate indexing of facts from vast amounts of text;
– In any language/culture from scattered sources;
– Again and again to detect trends and changes;
– Direct relation between knowledge modeling effort and text mining

• Knowledge modeling:
– automatic learning of terms and concepts from text in any language;
– formalization of knowledge in computer usable format -> wordnets & ontologies

• Community software:
– For experts in the field and not knowledge engineers
– Continuous and collaborative effort:
  • adapt to the changing domain;
  • consensus in the field;
  • consensus across languages and cultures
– Produce interoperable, formal, standardized knowledge structures;
– Relate knowledge structure to expressions in languages

Page 12

[Figure: KYOTO overview diagram. Labels recovered from the slide:]

• Ontology layers: Top, Middle, Domain
• Ontology concepts shown: Abstract, Physical, Process, Substance, H2O, CO2
• Domain terms shown: CO2 Emission, H2O Pollution, Greenhouse Gas
• Wordnets
• Tybot: term yielding robot
• Kybot: knowledge yielding robot
• Wikyoto
• Distributed, diverse & dynamic data
• Users: Citizens, Governments, Companies, Environmental organizations

Numbered steps in the diagram:
1. Capture text: "Sudden increase of CO2 emissions in 2008 in Europe"
2. CO2 emission
3. Wikyoto: maintain terms & concepts
4. Index facts: Process: Emission; Involves: CO2; Property: increase, sudden; When: 2008; Where: Europe
5. Text & Fact Index
6. Semantic Search
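The indexed fact in the overview diagram (Process: Emission, Involves: CO2, Property: increase, sudden, When: 2008, Where: Europe) can be pictured as a small record in a searchable index. The following is a minimal sketch under that assumption; the class names `Fact` and `FactIndex` and the exact-match search are hypothetical and do not describe the actual KYOTO fact format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    # Field names follow the labels on the slide; the types are assumptions.
    process: str
    involves: str
    properties: tuple
    when: str
    where: str


class FactIndex:
    """Toy in-memory fact index with exact-match search on fact fields."""

    def __init__(self):
        self._facts = []

    def add(self, fact):
        self._facts.append(fact)

    def search(self, **criteria):
        # Keep only facts whose fields equal every given criterion.
        return [
            fact for fact in self._facts
            if all(getattr(fact, name) == value for name, value in criteria.items())
        ]


# The fact indexed from "Sudden increase of CO2 emissions in 2008 in Europe".
EXAMPLE = Fact(
    process="Emission",
    involves="CO2",
    properties=("increase", "sudden"),
    when="2008",
    where="Europe",
)

INDEX = FactIndex()
INDEX.add(EXAMPLE)

if __name__ == "__main__":
    print(INDEX.search(involves="CO2", where="Europe"))
```

A query such as `INDEX.search(involves="CO2", where="Europe")` illustrates the difference from keyword search: it matches on the roles a term plays in a fact, not on the words of the source sentence.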

Page 13

System architecture

Page 14

Data Flow Diagram of Kyoto System

[Figure: data flow diagram with three numbered stages (1, 2, 3). Labels recovered from the slide:]

• Original Document Base, Keyword Search, End User
• Linguistic Processor, Semantic & Syntactic Base, Kyoto Annotation Format (KAF), Semantic Search, End User
• Tybot: Term Extractor, Term Base
• Kybot: Fact Extractor, Fact Base, Fact User
• Wikyoto: Wiki Term Editor, Concept User
• Multilingual Knowledge Base: Wordnets and Ontologies, interlinked

Page 15

Kyoto Annotation Format KAF

• Kyoto Annotation Format (Level 1), a multi-layered annotation format for:
– Tokenization and word form segmentation
– POS tagging
– Lemmatization and Term extraction
– Constituency Tagging
– Dependency Tagging

ENG-3.0-107695012-N

Page 16

Semantic Annotation

• Semantic Annotation Format for:
– Named Entity Recognition (time, events, quant. …)
– Word Sense Disambiguation (D-WSD)
– Semantic Role Labeling (SRL)

no synsets

KAF level 2 (SemKAF)

ENG-3.0-107630294-N

Page 17

KAF annotation: WSD

<term tid="t4" type="open" lemma="population" pos="N">
  <span> <target id="w4"/> </span>
  <senseAlt>
    <sense sensecode="EN-17-00861095-n" />
    <sense sensecode="EN-17-00859568-n" />
    .......

<term tid="t4" type="open" lemma="population" pos="N">
  <span> <target id="w4"/> </span>
  <senseAlt>
    <sense sensecode="EN-17-00859568-n" confidence="0.80" />
    <sense sensecode="EN-17-00257849-n" confidence="0.13" />
    <sense sensecode="EN-17-00962397-n" confidence="0.07" />
  </senseAlt>
</term>
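A downstream component that consumes the disambiguated KAF term could, for instance, pick the sense with the highest confidence. The sketch below assumes the element and attribute names exactly as they appear on the slide (`term`, `senseAlt`, `sense`, `sensecode`, `confidence`); the function `best_sense` is hypothetical and not part of any KYOTO tool.

```python
import xml.etree.ElementTree as ET

# The disambiguated term from the slide, with the attribute quoting repaired.
KAF_TERM = """
<term tid="t4" type="open" lemma="population" pos="N">
  <span> <target id="w4"/> </span>
  <senseAlt>
    <sense sensecode="EN-17-00859568-n" confidence="0.80"/>
    <sense sensecode="EN-17-00257849-n" confidence="0.13"/>
    <sense sensecode="EN-17-00962397-n" confidence="0.07"/>
  </senseAlt>
</term>
"""


def best_sense(term_xml):
    """Return (lemma, sensecode, confidence) of the top-ranked sense, or None."""
    term = ET.fromstring(term_xml.strip())
    senses = term.findall("./senseAlt/sense")
    if not senses:
        return None
    # Senses without a confidence attribute are treated as confidence 0.
    best = max(senses, key=lambda s: float(s.get("confidence", "0")))
    return term.get("lemma"), best.get("sensecode"), float(best.get("confidence", "0"))


if __name__ == "__main__":
    print(best_sense(KAF_TERM))
```

Keeping all alternatives in `senseAlt` and choosing only at read time means later modules can still revisit a lower-ranked sense.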

Page 18

Data formats

Level of annotation            Standard format
1. Morpho-syntax annotation    KAF <= (MAF, SYNAF, SEMAF)
2. Semantic annotation         KAF <= (MAF, SYNAF, SEMAF)
3. Terms representation        TMF
4. Facts annotation            KAF
5. Wordnets                    Wordnet-LMF
6. Ontologies                  OWL

Page 19

Knowledge mining

Page 20

Knowledge mining

• Concept mining (Tybots):
– Extract terms and relations in a language
– Map the terms to an existing wordnet
– Ontologize terms to concepts and axioms

• Fact mining (Kybots):
– Define logical patterns
– Define expression rules in a language

Page 21

What Tybots do...

• Input: text documents

• Linguistic processors generate KAF annotation (sequential):
– morpho-syntactic analysis
– semantic roles
– named entities
– wordnet and ontology mappings

• Output: term hierarchies in TMF (generic):
– structural parent relations
– quantified structural and semantic relations
– statistical data
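One way to picture the "structural parent relations" in a term hierarchy is to link each multi-word term to the shorter term obtained by dropping its first modifier, so that "greenhouse gas" sits under "gas" and "agricultural area" under "area". This is only an illustrative sketch that assumes head-final English noun compounds; the function names are hypothetical and this is not the actual Tybot algorithm or the TMF format.

```python
def structural_parent(term):
    """Drop the first modifier of a multi-word term; single words have no parent."""
    words = term.split()
    return " ".join(words[1:]) if len(words) > 1 else None


def build_hierarchy(terms):
    """Map every term to the list of its direct structural children."""
    hierarchy = {}
    for term in terms:
        parent = structural_parent(term)
        if parent is not None:
            hierarchy.setdefault(parent, []).append(term)
        hierarchy.setdefault(term, [])
    return hierarchy


if __name__ == "__main__":
    terms = ["gas", "greenhouse gas", "area", "agricultural area"]
    for parent, children in build_hierarchy(terms).items():
        print(parent, "->", children)
```

A purely structural hierarchy like this says nothing about meaning, which is why the mapping to wordnets and the ontology is a separate step.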

Page 22

[Figure: concept mining by Tybots. Labels recovered from the slide:]

• Source Documents, Linguistic Processors, Morpho-syntactic analysis:
  [[the emission]NP [of greenhouse gases]PP [in agricultural areas]PP] NP
• TYBOT Concept Miners
• Term hierarchy: emission, gas, greenhouse gas, CO2, area, agricultural area (linked by "of" and "in")
• English Wordnet: natural process:1, emission:2, emission:3, gas:1, greenhouse gas:1, substance:1, area:1, rural area:1, geographical area:1, region:3, location:3, farmland:2, CO2
• Ontology: Abstract, Physical, Process, Chemical Reaction, Substance, H2O, CO2, CO2Emission, WaterPollution, GlobalWarming, GreenhouseGas
• Steps: Synthesize, Ontologize, Axiomatize, Conceptual modeling
• Axiom example: (instance s1 Substance) (instance e1 Warming) (katalyist s1 e1)

Page 23

What Kybots do

• Input:
– KAF annotations of text: sequential & encoded by language
– Conceptual frame from the ontology
– Expression rules for frame to language mapping:
  • Wordnet in a language
  • Morpho-syntactic mapping rules

• Output: a database of facts in FactAF (generic):
– aggregated facts
– inferred facts
– language neutral

Page 24

Fact mining

• KYBOT = Knowledge Yielding Robot

• Logical expression:
– (instance, e1, Burn) (instance, e2, Warming) (cause, e1, e2)
– (instance, s1, CO2) (instance, e1, GlobalWarming) (katalyist, s1, e1)

• Expression rules per language:
– [N[s1]V[e1]]S e.g. "CO2 is emitted", "fine dust blocks sun-light"
– [N[s1]N[e1]]N e.g. "CO2 emission", "sun-light blocking"
– [[N[e1]][prep][N[s2]]]NP e.g. "emission of CO2", "sun light blocking by fine dust"

• Ontology * Wordnets:
– Capabilities: WNT -> adjectives ("explosive", "toxic"), WNT -> nouns ("explosive", "poison")
– Causes: WNT -> verbs ("eat"), WNT -> nouns ("consumption")
– Process: DamageProcess, ProduceProcess

• Kybot compiler:
– kybots = logical pattern + ontology + WN[Lx] + ER[Lx]
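The combination of a logical pattern, ontology classes, a wordnet and per-language expression rules can be illustrated with a toy matcher. The sketch below assumes a tiny hand-written lexicon in place of a wordnet and three part-of-speech patterns modelled on the expression rules above; the names `LEXICON`, `RULES`, `extract_fact` and the relation `involves` are hypothetical and not the real Kybot compiler.

```python
# Hypothetical mini-lexicon standing in for a wordnet:
# word form -> (part of speech, ontology class).
LEXICON = {
    "CO2": ("N", "CO2"),
    "emission": ("N", "Emission"),
    "emitted": ("V", "Emission"),
    "of": ("prep", None),
}

# Expression rules: POS pattern -> token positions of the substance and the event.
RULES = [
    (("N", "V"), {"substance": 0, "event": 1}),          # "CO2 is emitted"
    (("N", "N"), {"substance": 0, "event": 1}),          # "CO2 emission"
    (("N", "prep", "N"), {"event": 0, "substance": 2}),  # "emission of CO2"
]


def extract_fact(phrase):
    """Return a language-neutral fact as logical triples, or None if no rule fires."""
    # Words missing from the lexicon (e.g. the auxiliary "is") are skipped.
    tokens = [tok for tok in phrase.split() if tok in LEXICON]
    tags = tuple(LEXICON[tok][0] for tok in tokens)
    for pattern, roles in RULES:
        if tags == pattern:
            substance = LEXICON[tokens[roles["substance"]]][1]
            event = LEXICON[tokens[roles["event"]]][1]
            return [
                ("instance", "s1", substance),
                ("instance", "e1", event),
                ("involves", "s1", "e1"),
            ]
    return None


if __name__ == "__main__":
    for phrase in ("CO2 is emitted", "CO2 emission", "emission of CO2"):
        print(phrase, "->", extract_fact(phrase))
```

All three phrasings yield the same triples, which is the point of separating the logical pattern from the language-specific expression rules.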

Page 25

Fact mining by Kybots

[Figure: fact mining pipeline. Labels recovered from the slide:]

• Source Documents, Linguistic Processors, Morpho-syntactic analysis (KAF):
  [[the emission]NP [of greenhouse gases]PP [in agricultural areas]PP] NP
• Ontology (Generic and Domain): Abstract, Physical, Process, Chemical Reaction, Substance, H2O, CO2, CO2 emission, water pollution
• Wordnets & Linguistic Expressions
• Logical Expressions
• Fact analysis:
  [the emission]NP: Process: e1
  [of greenhouse gases]PP: Patient: s2
  [in agricultural areas]PP: Location: a3

• semantic role labelling
• time & place
• aggregation from all relevant phrases and documents
• inferencing
• adding trust and reliability

Page 26

Wikyoto

Page 27

[Figure: Wikyoto interview workflow, divided into a "Shown" and a "Hidden" part around the Kyoto Server. Labels recovered from the slide:]

• Text fragments: ".... populations declined", "..... terrestrial and marine species ..", "in forests ..... declined", ".... populations such as terrestrial and marine species ....."
• Interview questions: "Do populations always consist of marine species?", "Do populations consist of marine species?", "Are terrestrial species never marine species?", "Are terrestrial species a type of populations?"
• Term list: A ..... decline ... population ..... Z
• Simplified Term Fragment: population, marine species, terrestrial species
• Simplified Ontology Fragment: ?Population, Group
• Processing labels: pdf, Smart Kytext, KAF, Tybots, DE-TN, Kybots, FactAF, Facts in RDF, plugin, DE-KON, DE-WN, G-WN, G-KON
• Wordnets in LMF, Ontologies in OWL-DL
• External resources: WIKIPEDIA, SUMO, DOLCE, GEO, FRAMENET

Page 28

Kyoto Knowledge Base

[Figure: seven wordnets (WnIT, WnEN, WnEU, WnNL, WnJP, WnCH, WnES), each with a Domain extension, arranged around a shared Ontology with a Domain Ontology.]

Page 29

Potential impact

Page 30

Ultimate goal

• Global standardization and anchoring of meaning such that:
– Machines can start to approach text understanding -> semantic web connects to the current web
– Communities can dynamically maintain knowledge, concepts and their terms in an easy to use system
– Cross-linguistic and cross-cultural sharing and communication of knowledge is enabled

• Establish a Global-Wordnet-Grid: formalization of Wikipedia for humans AND machines across languages

Page 31

Global WordNet Grid

[Figure: wordnets of several languages linked to a shared Inter-Lingual Ontology. Labels recovered from the slide:]

• Inter-Lingual Ontology: Object, Device, TransportDevice
• English Words: vehicle; car, train
• Czech Words: dopravní prostředník; auto, vlak
• French Words: véhicule; voiture, train
• Estonian Words: liiklusvahend; auto, killavoor
• German Words: Fahrzeug; Auto, Zug
• Spanish Words: vehículo; auto, tren
• Italian Words: veicolo; auto, treno
• Dutch Words: voertuig; auto, trein
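The idea behind the grid, words of different languages anchored to one shared concept, can be sketched as a lookup table keyed by concept. In the sketch below the words come from the slide, but the link from "vehicle" and its translations to TransportDevice is an assumption, and the concept names `Car` and `Train` are hypothetical since the slide shows no ontology node for them.

```python
# Concept -> {language code: word}. Only TransportDevice is named on the slide;
# "Car" and "Train" are hypothetical concept identifiers added for illustration.
GRID = {
    "TransportDevice": {"en": "vehicle", "nl": "voertuig", "de": "Fahrzeug",
                        "it": "veicolo", "es": "vehículo", "fr": "véhicule"},
    "Car": {"en": "car", "nl": "auto", "de": "Auto",
            "it": "auto", "es": "auto", "fr": "voiture"},
    "Train": {"en": "train", "nl": "trein", "de": "Zug",
              "it": "treno", "es": "tren", "fr": "train"},
}


def concept_of(word, lang):
    """Find the shared concept a word is anchored to in the given language."""
    for concept, words in GRID.items():
        if words.get(lang) == word:
            return concept
    return None


def translate(word, src, dst):
    """Go from a word to its concept and back out into another language."""
    concept = concept_of(word, src)
    if concept is None:
        return None
    return GRID[concept].get(dst)


if __name__ == "__main__":
    print(translate("vehicle", "en", "nl"))
    print(translate("trein", "nl", "it"))
```

Because every language links to the concept and not to another language, adding a new wordnet needs one set of links instead of one per language pair.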

Page 32

Linking Open Data dataset cloud

http://richard.cyganiak.de/2007/10/lod/

[Figure: Linking Open Data cloud with wordnet, ontology and fact datasets added. Labels recovered from the slide:]

• Environment: Ontology environment concepts, environment facts, Wordnet environment terms (label repeated five times)
• Medical: Ontology medical concepts, Wordnet medical terms, medical facts
• Legal: Ontology legal concepts, Wordnet legal terms, legal facts
• Sailing: Ontology sailing concepts, Wordnet sailing terms

Page 33

Conclusions

Page 34

Kyoto main assets

• Wiki platform (WIKYOTO) for connecting, transferring and controlling knowledge and information across people and computers

• Term yielding robots (TYBOT): software that extracts terms and concepts from documents

• Knowledge yielding robots (KYBOT): fact extraction software that generates a comprehensive list of facts from a collection of sources

• Fact repositories & fact alert: reports changes in facts on a collection of sources

• Domain WORDNETS and domain ONTOLOGIES

• Create the backbone for the Global Wordnet Grid

Page 35

What makes KYOTO unique?

• Integrates & combines all ► knowledge engineering, language engineering, wikis, term & concept learning, fact mining from text in and across languages, & standardization

• Direct relation between concept modeling and text mining ► make it worth the effort

• Wikyoto community tool ► hides technology and complex knowledge and language representation

• Operated by community people and not by knowledge engineers and language technology people ► exploits massive labor force of communities all over the world

Page 36

What makes KYOTO unique?

• Text mining and ontology learning developed for separate languages
– ► KYOTO multi and cross-lingual & cultural
– ► cross-lingual and cross-cultural semantic interoperability

• Text mining and ontology learning is often limited to a specific domain and/or application ► KYOTO for any domain and application

• Text mining and ontology learning does not relate the terms and concepts to generic language and knowledge resources ► KYOTO anchors knowledge from a community to general vocabulary and likewise to other communities

Page 37

Free, open source license (GPL)

Thank you for your attention

Page 38

Contribution of KYOTO

• Sources:
– hundreds of thousands of sources in the environment domain
– in many different languages
– spread all over the world
– changing every day

• KYOTO learns terms and concepts from text documents
• Stored as structures that people and computers understand

• KYOTO delivers a Web 2.0 environment for community based control
• Connects people across language and cultures
• Establish consensus and knowledge transition

• KYOTO enables semantic search and fact extraction
• Software can partially understand language and exploit web 1 data
• Understanding is helped by the terms and concepts defined for each language

[Diagram labels: html, xls, pdf; TYBOT, KYBOT, WIKYOTO; Wordnet environment terms (repeated); Ontology environment concepts; environment facts]