GRANT AGREEMENT: 601138 | SCHEME FP7 ICT 2011.4.3 Promoting and Enhancing Reuse of Information throughout the Content Lifecycle taking account of Evolving Semantics [Digital Preservation]
“This project has received funding from the European Union’s Seventh Framework Programme for research, technological development and demonstration under grant agreement no. 601138”.
DETECTING SEMANTIC DRIFT FOR ONTOLOGY MAINTENANCE
Sándor Darányi (University of Borås, Sweden)
Panos Mitzias (CERTH/ITI, Greece)
Detecting Semantic Drift for ontology maintenance - Acting on Change 2016
▶Evolving Semantics & Digital Preservation
▶The PERICLES Approach
▶PERICLES Tools
◦ Somoclu
◦ SemaDrift
▶Putting it All Together
◦ Data
◦ Workflow
◦ Sample Results
▶Conclusions
Outline
▶Schlieder (2010) gives three reasons why long-term digital preservation (LTDP) is paramount:
◦ Because hardware and software evolve (technology drift)
◦ Because language changes (semantic drift)
◦ Because the value systems underlying societies change (social value shifts)
▶Apart from digital preservation (DP), formalizing change scenarios so that computers can manage them is also a hot research topic in:
◦ Semantic Web, Knowledge Engineering & Management, Natural Language Processing, Document Engineering, Digital Humanities, Data Science
Evolving Semantics & Digital Preservation
▶DP operates roughly on a 5 to 50 year horizon; recent advances in LTDP look at longer ranges
◦ 2000 years: the use of DNA for very long term DP [Grass et al., 2015]
◦ 13.8 billion years: DNA combined with nanostructured glass storage [Kazansky et al., 2016]
▶The ultimate question is the return on investment in DP and LTDP, should one lose access to already preserved content
◦ Currently proposed preventive measure: develop scalable methodologies of context-aware content interpretation by monitoring semantic vs. conceptual drifts
Evolving Semantics & LTDP Horizons
▶We address evolving semantics from two perspectives:
◦ Change-sensitive ontologies necessitate logic
◦ Scalability and the distributed nature of content call for statistical processing
▶These two major components complement and inform each other and become tools of the model-driven DP paradigm
◦ E.g. collection-specific domain ontologies and change monitoring options help appraisal
The PERICLES Approach
▶Time-dependent content displacement in vector space affects categorization & retrieval
▶Model such content dynamics on a vector field, by metaphorical use of physical concepts
▶Somoclu (Self-Organizing Map Over a CLUster), using the ESOM (Emergent Self-Organizing Maps) algorithm
◦ Fastest massively parallel open-source SOM algorithm available, developed in PERICLES
▶Reduces a high-dimensional space to a low-dimensional one (2-d)
▶Preserves local topology
▶Suitable for drift detection of feature/object locations
▶After training the algorithm, each data instance has a node (Best Matching Unit, BMU) on the map
▶Intense colours on the map indicate high distances between the original data points
Somoclu and Self-Organizing Maps
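The BMU idea above can be illustrated with a small, library-free sketch. It does not use Somoclu itself: the 2x2 `codebook`, the term vectors, and the helper names `best_matching_unit` and `bmu_displacement` are all hypothetical, and the map is assumed to be already trained. The sketch looks up the BMU of each term in two epochs and reports how far the term moved on the map grid.

```python
import math

def best_matching_unit(vector, codebook):
    """Return the (row, col) of the map node whose weights are closest to `vector`."""
    best_node, best_dist = None, math.inf
    for node, weights in codebook.items():
        dist = math.dist(vector, weights)  # Euclidean distance in feature space
        if dist < best_dist:
            best_node, best_dist = node, dist
    return best_node

def bmu_displacement(epoch_a, epoch_b, codebook):
    """Map each term present in both epochs to the grid distance between its two BMUs."""
    moves = {}
    for term in epoch_a.keys() & epoch_b.keys():
        r1, c1 = best_matching_unit(epoch_a[term], codebook)
        r2, c2 = best_matching_unit(epoch_b[term], codebook)
        moves[term] = math.hypot(r2 - r1, c2 - c1)
    return moves

# Hypothetical trained 2x2 map: node position -> weight vector (assumption, toy data).
codebook = {
    (0, 0): (0.0, 0.0), (0, 1): (0.0, 1.0),
    (1, 0): (1.0, 0.0), (1, 1): (1.0, 1.0),
}
# Hypothetical term vectors observed in two time slices.
epoch_a = {"video": (0.1, 0.1), "film": (0.9, 0.2)}
epoch_b = {"video": (0.2, 0.9), "film": (0.8, 0.1)}

moves = bmu_displacement(epoch_a, epoch_b, codebook)
for term in sorted(moves):
    print(term, moves[term])
```

In this toy data "video" changes its BMU between the two epochs while "film" stays on the same node, which is the kind of location change the drift detection step looks for.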
Drift Detection Workflow
Problem
▶To measure semantic drift in ontologies across time & versions
▶Related to ontology evolution, versioning, drift/shift/decay
PERICLES Tools - SemaDrift
▶A suite of tools for measuring drift in ontologies across time/versions
◦ SemaDrift Library (API)
◦ SemaDrift Protege Plugin (GUI desktop application)
◦ SemaDrift FX (GUI desktop application)
▶Cross-domain, no prior programming knowledge required
▶Apache License 2.0
▶Two proof-of-concept use cases: Tate and OWL-S
PERICLES Tools - SemaDrift
SemaDrift Workflow
▶Input: series of ontologies
▶For all concepts, compare each concept to all concepts of the next ontology:
◦ Collection of rdfs:labels > concept-to-concept Label Drift
◦ Collection of property triples > concept-to-concept Intensional Drift
◦ Collection of instance URIs > concept-to-concept Extensional Drift
▶Concept-to-concept Whole Drift: average of label, intensional and extensional drift
▶Output: average Label, Intension, Extension Drift for the series (average of all concepts)
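A rough sketch of this workflow, for illustration only. It does not call the SemaDrift API. It assumes Jaccard set overlap as a stand-in measure for all three aspects (the slides do not name the measure), treats each score as a stability value where 1.0 means no change, and uses hypothetical concept data and helper names.

```python
def jaccard(a, b):
    """Set overlap in [0, 1]; two empty sets count as identical."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def compare_concepts(c1, c2):
    """Label, intensional, extensional and whole score for one concept pair."""
    scores = {
        "label": jaccard(c1["labels"], c2["labels"]),
        "intension": jaccard(c1["properties"], c2["properties"]),
        "extension": jaccard(c1["instances"], c2["instances"]),
    }
    scores["whole"] = sum(scores.values()) / 3  # average of the three aspects
    return scores

def compare_versions(old, new):
    """Compare each concept of `old` to all concepts of `new`."""
    return {(n1, n2): compare_concepts(c1, c2)
            for n1, c1 in old.items() for n2, c2 in new.items()}

def series_average(versions, aspect):
    """Average one aspect over same-named concepts in consecutive versions (assumption)."""
    values = []
    for old, new in zip(versions, versions[1:]):
        table = compare_versions(old, new)
        values += [table[(n, n)][aspect] for n in old if n in new]
    return sum(values) / len(values)

# Hypothetical two-version series with one concept whose instance set grows.
v1 = {"Artwork": {"labels": {"artwork"},
                  "properties": {("hasMedium", "Medium")},
                  "instances": {"ex:a1", "ex:a2"}}}
v2 = {"Artwork": {"labels": {"artwork"},
                  "properties": {("hasMedium", "Medium")},
                  "instances": {"ex:a1", "ex:a2", "ex:a3", "ex:a4"}}}

pair = compare_versions(v1, v2)[("Artwork", "Artwork")]
print(pair)
print(series_average([v1, v2], "extension"))
```

Here the label and intensional scores stay at 1.0 while the extensional score drops, so only the instance set of the concept has changed between the two versions.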
SemaDrift GUI Desktop Applications
▶Extracted offline
▶In this scenario, extensional drift shows clearly
Morphing Chains
▶Monitor feature (index term) drifts over time
▶Apply threshold to at-risk (splitting/merging) index terms
▶Extract index terms above threshold
▶Ontology Creation (creation of a Digital Ecosystem Model, DEM)
◦ SOMOCLU > Propose least volatile terms to be included in the model
▶Ontology Maintenance
◦ SOMOCLU+SemaDrift > Assess at-risk terminology, update model, alert user
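The thresholding and term selection steps above might look like the following minimal sketch. The drift scores, the threshold value, and the function names `at_risk_terms` and `least_volatile_terms` are hypothetical; in practice the scores would come from the drift monitoring step.

```python
def at_risk_terms(drift_by_term, threshold):
    """Index terms whose drift exceeds the threshold (candidates for a maintenance alert)."""
    return sorted(t for t, d in drift_by_term.items() if d > threshold)

def least_volatile_terms(drift_by_term, k):
    """The k terms with the smallest drift (candidates for inclusion in the model)."""
    ranked = sorted(drift_by_term, key=lambda t: (drift_by_term[t], t))
    return ranked[:k]

# Hypothetical per-term drift scores accumulated over time (toy data).
drift = {"video": 1.0, "film": 0.0, "installation": 2.2, "software": 0.5}

risky = at_risk_terms(drift, threshold=0.9)    # hypothetical threshold
stable = least_volatile_terms(drift, k=2)
print("at risk:", risky)
print("least volatile:", stable)
```

Ties in drift are broken alphabetically so the ranking is deterministic; any other tie-breaking rule would serve equally well.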