Page 1
© Paul Buitelaar – LREC ISO meeting
May 2008
Lexical Markup Framework &
LingInfo
Paul Buitelaar
Competence Center Semantic Web & Language Technology Lab
DFKI GmbH - Saarbrücken, Germany
contributions by Michael Sintek and others (LingInfo); Nils Reiter (lexical enrichment)
Overview
Lexical Markup Framework (LMF)
  Motivation, Model, Example
LingInfo
  Motivation, Model, Example
Discussion
LMF Motivation
Provide a common model for the creation and use of lexical resources
Manage the exchange of data between and among these resources
Enable the merging of individual electronic resources to form extensive global electronic resources
LMF Model – “Core Package”
LMF Model – “NLP Semantics”
LMF Example – ‘Homonymy’
LingInfo Motivation
Lexicalized Ontologies
  Representation of terms instead of ontology class labels
  Lexical (morphosyntactic) information for such terms, to handle morphological & syntactic term variations and disambiguation
  (Lexical) semantics is strictly in the (domain) ontology
Lexical Ontology Enrichment
  Increasing number of ontologies published on the (Semantic) Web
  Reuse of knowledge in semantic annotation, ontology-based information extraction, question answering, etc.
  Enrichment/integration of ontologies with lexical knowledge: terminology, synonyms, translations, morphosyntactic information
Ontologies – Multilingual Labels (RDF)
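[Figure: ontology graph with multilingual labels.
Classes: University, School, Student, Staff, Campus
Relations between classes: is_part_of, studies_at, works_at, located_at
Label properties on School: has_German_term "Fakultät", has_US-English_term "School", has_Dutch_term "Faculteit"]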
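The labelling scheme on this slide can be sketched as plain subject-predicate-object triples. This is a minimal illustration in Python, not the author's implementation: the tuple layout and the helper `labels_for` are assumptions made here, while the property names and the three labels are taken from the slide.

```python
# Triples (subject, predicate, object). Property names and labels come from
# the slide; storing them as Python tuples is an assumption for illustration.
TRIPLES = [
    ("School", "has_German_term", "Fakultät"),
    ("School", "has_US-English_term", "School"),
    ("School", "has_Dutch_term", "Faculteit"),
]


def labels_for(cls, triples):
    """Return {property: label} for every per-language label property on cls.

    Hypothetical helper: a property counts as a label property when its
    name has the form has_<language>_term.
    """
    return {
        pred: obj
        for subj, pred, obj in triples
        if subj == cls and pred.startswith("has_") and pred.endswith("_term")
    }


for prop, label in labels_for("School", TRIPLES).items():
    print(prop, "->", label)
```

In this layout every language needs its own property, so supporting a further language means adding a further property name rather than a further value.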
Ontologies – Multilingual Terms (~OMV)
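[Figure: ontology graph with terms modelled as instances of a class Term.
Classes: University, School, Student, Staff, Campus, Term
Relations between classes: is_part_of, studies_at, works_at, located_at, has_term
Instances of Term, each with a language property: "Fakultät" (DE), "faculteit" (NL), "school" (EN-US)]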
Lexicalized Ontologies - LingInfo
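[Figure: LingInfo representation of the Dutch term "fakulteitsgebouw" ("department building" / "school"), attached to the class SCHOOL through hasLingInfo.
Term-1 (instanceOf LingInfo): hasOrthographicForm "fakulteitsgebouw", hasLang NL, hasMorphSynInfo WordForm-1
WordForm-1: hasPoS N, hasStem Term-2, hasStem Term-3
Term-2: hasOrthographicForm "fakulteit" ("department" / "school")
Term-3: hasOrthographicForm "gebouw" ("building")]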
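The structure on this slide can be written down as a small data model. The sketch below is only an illustration under stated assumptions: the class and field names are Python renderings of the property names on the slide (hasOrthographicForm, hasLang, hasMorphSynInfo, hasPoS, hasStem), treating Term-2 and Term-3 as LingInfo instances without a language value is an assumption, and the dictionary `ONTOLOGY` is a hypothetical stand-in for the hasLingInfo link from the class SCHOOL.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class WordForm:
    pos: str                                                # hasPoS
    stems: List["LingInfo"] = field(default_factory=list)   # hasStem


@dataclass
class LingInfo:
    orthographic_form: str                      # hasOrthographicForm
    lang: Optional[str] = None                  # hasLang
    morph_syn_info: Optional[WordForm] = None   # hasMorphSynInfo


# Values as shown on the slide.
term2 = LingInfo("fakulteit")
term3 = LingInfo("gebouw")
term1 = LingInfo("fakulteitsgebouw", "NL", WordForm("N", [term2, term3]))

# hasLingInfo: ontology class -> attached LingInfo instances (assumed layout).
ONTOLOGY = {"SCHOOL": [term1]}


def stem_forms(term):
    """Return the orthographic forms of a term's stems, or [] if it has none."""
    if term.morph_syn_info is None:
        return []
    return [stem.orthographic_form for stem in term.morph_syn_info.stems]


print(stem_forms(ONTOLOGY["SCHOOL"][0]))
```

Keeping the linguistic description in its own objects leaves the ontology class itself free of lexical detail, in line with the motivation that semantics stays in the domain ontology.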
Mapping Lexical to Semantic Structure
[Figure: the structure for "fakulteitsgebouw" ("department building" / "school"), with lexical parts linked to ontology classes.
Term-1: hasOrthographicForm "fakulteitsgebouw", hasLang NL, hasMorphSynInfo WordForm-1
WordForm-1: hasPoS N, hasStem Term-2, hasStem Term-3
Term-2: hasOrthographicForm "fakulteit" ("department" / "school")
Term-3: hasOrthographicForm "gebouw" ("building")
Classes SCHOOL and BUILDING each have a hasLingInfo link to an instance of LingInfo and are connected through isLocatedAt]
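The idea of mapping the parts of a compound to ontology classes can be sketched as a lookup. The pairing of "fakulteit" with SCHOOL and "gebouw" with BUILDING is read off the glosses on the slide and should be treated as an assumption; the names `STEM_TO_CLASS`, `COMPOUNDS` and `classes_for` are hypothetical. The direction of isLocatedAt is not recoverable from the slide text, so it is not modelled here.

```python
# Assumed stem -> class links, read off the glosses on the slide.
STEM_TO_CLASS = {"fakulteit": "SCHOOL", "gebouw": "BUILDING"}

# Compound -> stems, as given by the hasStem links on the slide.
COMPOUNDS = {"fakulteitsgebouw": ["fakulteit", "gebouw"]}


def classes_for(term):
    """Map a term to the ontology classes of its stems.

    A term without a known decomposition is treated as its own single stem.
    Stems without a class link are skipped.
    """
    stems = COMPOUNDS.get(term, [term])
    return [STEM_TO_CLASS[s] for s in stems if s in STEM_TO_CLASS]


print(classes_for("fakulteitsgebouw"))  # ['SCHOOL', 'BUILDING']
print(classes_for("gebouw"))            # ['BUILDING']
```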
Lexical Enrichment
Derive synonyms from WordNet
  Check if class names are lexical entries in WordNet
  Extract synonyms from corresponding synsets
  Sense disambiguation (pick the right synset)
Derive translations from Wikipedia
  Check if class names correspond to Wikipedia pages
  Extract translations through "Interlanguage links"
  Sense disambiguation
Derive morphosyntactic information from corpora
  Reverse engineer the lexicon behind PoS-tagger / Morph Analyzer (advantage: disambiguation in context)
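The synonym step (look up the class name, pick a synset, extract its synonyms) can be sketched as follows. This is a toy illustration, not the procedure used in the work presented: `TOY_SYNSETS` is invented stand-in data rather than a call to WordNet, and the disambiguation is a simple gloss-overlap heuristic chosen here as an assumption, since the slide does not say how the right synset is picked.

```python
# Toy stand-in for WordNet: lemma -> list of (synonym set, gloss).
# The entries are invented for illustration.
TOY_SYNSETS = {
    "school": [
        ({"school", "schoolhouse"},
         "a building where young people receive education"),
        ({"school", "shoal"},
         "a large group of fish"),
    ],
}


def pick_synset(lemma, context_words):
    """Pick the synset whose gloss shares the most words with the context.

    Returns None when the lemma is not a lexical entry in the toy resource.
    """
    candidates = TOY_SYNSETS.get(lemma.lower())
    if not candidates:
        return None
    context = {w.lower() for w in context_words}
    return max(candidates, key=lambda s: len(context & set(s[1].split())))


def synonyms_for_class(class_name, context_words):
    """Return synonyms for an ontology class name, excluding the name itself."""
    synset = pick_synset(class_name, context_words)
    if synset is None:
        return set()
    return synset[0] - {class_name.lower()}


# Context words could be the names of neighbouring classes in the ontology.
print(synonyms_for_class("School", ["university", "education", "building"]))
```

With the education-related context the first synset wins and the result is `{'schoolhouse'}`; a context such as `["fish", "group"]` would select the other synset instead.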
LingInfo Info http://olp.dfki.de/LingInfo/
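Discussion – Homonymy (bank)
[Figure: three ontologies side by side: Finance Ontology, Geo Ontology, LingInfo Ontology. The LingInfo Ontology contains LingInfo, WordForm and ISO LangCode, with the properties hasLang and hasMorphSynInfo.
X100: hasLingInfo X100.1 (hasOrthographicForm "river", hasLang EN, hasMorphSynInfo ...)
X123: hasLingInfo X123.1 (hasOrthographicForm "bank", hasLang EN, hasMorphSynInfo ...)
Z278: hasLingInfo Z278.1 (hasOrthographicForm "money", hasLang EN, hasMorphSynInfo ...)
A further node Y345 and several instanceOf links appear in the figure; their endpoints are not recoverable from the extracted text]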
Discussion – Metonymy (car-engine)
[Figure: SUMO, DOLCE, LingInfo Ontology (LingInfo, hasLang, hasMorphSynInfo) and Automotive Ontology.
4657: instanceOf 39890290 'artifact'; hasLingInfo 4657.1 (hasOrthographicForm "car", hasLang EN, hasMorphSynInfo)
8393: instanceOf 23908239 'process'; hasLingInfo 8393.1 (hasOrthographicForm "engine", hasLang EN, hasMorphSynInfo)
4657 and 8393 are connected through hasPart]
“switch off the car” > ‘switch off the engine (process) of the car’
Discussion – Syst. Polysemy (act/human)
[Figure: SUMO, DOLCE, LingInfo Ontology (LingInfo, hasLang, hasMorphSynInfo) and Sports Ontology.
223AX: instanceOf 39495672 'human'; hasLingInfo 223AX.1 (hasOrthographicForm "catcher", hasLang EN, hasMorphSynInfo)
345ZD: instanceOf 468934208 'act'; hasLingInfo 345ZD.1 (hasOrthographicForm "to catch", hasLang EN, hasMorphSynInfo)
223AX and 345ZD are connected through hasAction]
“John is an excellent catcher / a fraud / a new arrival / a pushover / ...”
http://www.dfki.de/~paulb/corelex.html
Thanks for Your Attention!
http://olp.dfki.de/LingInfo/
http://olp.dfki.de/OntoSelect/
http://www.dfki.de/~paulb/corelex.html