The evolution of semantic technology evaluation in my own flesh (The 15 tips for technology evaluation)
Slides of my talk given at IMATI-CNR on October 15th 2013.
If you like them, I am available for gigs!
Abstract: In this talk I will describe how semantic technology evaluation has evolved over the last ten years, focusing on my own research and experiences. It starts with evaluation as a one-time, one-user activity and shows the progress towards mature evaluations that are community-driven and supported by rich methods and infrastructures. Throughout the talk, I will unveil the 15 tips for technology evaluation, which should be of interest to anyone interested in the topic.
Transcript
Speaker: Raúl García-Castro. Talk at IMATI-CNR, October 15th 2013, Genova, Italy
The evolution of semantic technology evaluation
in my own flesh (The 15 tips for technology evaluation)
Raúl García-Castro
Ontology Engineering Group. Universidad Politécnica de Madrid, Spain
The Semantic Web is:
• "An extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation" [Berners-Lee et al., 2001]
• A common framework for sharing and reusing data across applications
• Distinctive characteristics:
  - Use of W3C standards
  - Use of ontologies as data models
  - Inference of new information (illustrated in the sketch after this list)
  - Open world assumption
• High heterogeneity:
  - Different functionalities, in general and in particular
  - Different KR formalisms, with different expressivity and different reasoning capabilities
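As a minimal illustration of "inference of new information" (this sketch is mine, not part of the original slides), the following Python code uses the rdflib library with a hypothetical example ontology: the fact that Rex is an Animal is never asserted, yet it follows from the subclass hierarchy and can be retrieved with a SPARQL property path.

```python
# Minimal sketch (assumed example, hypothetical namespace): deriving
# implicit class membership from an explicit subclass hierarchy.
from rdflib import Graph, Namespace
from rdflib.namespace import RDF, RDFS

EX = Namespace("http://example.org/")  # hypothetical namespace
g = Graph()
g.add((EX.Dog, RDFS.subClassOf, EX.Animal))  # asserted: Dog is a kind of Animal
g.add((EX.Rex, RDF.type, EX.Dog)))           # asserted: Rex is a Dog

# "Rex is an Animal" is never asserted; it follows from RDFS semantics.
q = """
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?x WHERE { ?x rdf:type/rdfs:subClassOf* <http://example.org/Animal> }
"""
for row in g.query(q):
    print(row.x)  # prints http://example.org/Rex
```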
[Diagram: a component-based framework of Semantic Web application components, grouped into Ontology Development & Management (e.g., ontology editor, browser, searcher, visualizer, evaluator, profiler, modularizer, versioner, distributed ontology repository), Ontology Customization (e.g., ontology selector, ranker, localizer, configuration manager), Ontology Alignment (e.g., ontology aligner, matcher, merger, integrator, transformer, reconciler, distributed alignment repository), Ontology Evolution (e.g., ontology evolution manager, ontology evolution visualizer), Ontology Instance Generation (e.g., manual and automatic annotation, ontology populator, instance editor, ontology learner), Querying and Reasoning (e.g., semantic query editor, semantic query processor, query answering), Data Management (e.g., distributed data, instance and annotated-data repositories, information directory manager, distributed registry) and Semantic Web Services (e.g., service discoverer, service composer, service orchestration, service choreography engine, service process mediator, service non-functional selector, service directory manager)]
García-Castro, R.; Muñoz-García, O.; Gómez-Pérez, A.; Nixon L. "Towards a component-based framework for developing Semantic Web applications". 3rd Asian Semantic Web Conference (ASWC 2008). 2-5 February, 2009. Bangkok, Thailand.
• Generates and inserts into the tool synthetic ontologies according to:
  - Load factor (X): defines the size of the ontology data
  - Ontology structure: depends on the benchmarks

Benchmark        Operation                             Execution needs
benchmark1_1_08  Inserts N concepts in an ontology     1 ontology
benchmark1_1_09  Inserts a concept in N ontologies     N ontologies
benchmark1_3_20  Removes N concepts from an ontology   1 ontology with N concepts
benchmark1_3_21  Removes a concept from N ontologies   N ontologies with 1 concept

For executing all the benchmarks, the ontology structure includes the execution needs of all the benchmarks.
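A minimal sketch of what such a workload generator might look like, using Python and rdflib as a stand-in for the ontology management APIs the suite actually targeted; all function and class names here are hypothetical, not the suite's API.

```python
# Hypothetical sketch: build a synthetic ontology of X classes (the load
# factor) so that a benchmark such as benchmark1_1_08 ("insert N concepts
# in an ontology") runs against a tool already loaded with X concepts.
import time
from rdflib import Graph, Namespace
from rdflib.namespace import RDF, OWL

EX = Namespace("http://example.org/bench#")  # hypothetical namespace

def generate_workload(x: int) -> Graph:
    """Create a synthetic ontology with x classes (load factor X)."""
    g = Graph()
    for i in range(x):
        g.add((EX[f"Class{i}"], RDF.type, OWL.Class))
    return g

def benchmark_insert_concepts(g: Graph, n: int) -> float:
    """benchmark1_1_08-style operation: insert N concepts, return seconds."""
    start = time.perf_counter()
    for i in range(n):
        g.add((EX[f"NewClass{i}"], RDF.type, OWL.Class))
    return time.perf_counter() - start

g = generate_workload(x=5000)                   # X = 5000, as in the slides
elapsed = benchmark_insert_concepts(g, n=400)   # N = 400, as in the slides
print(f"insertion of 400 concepts took {elapsed:.4f} s")
```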
Result analysis - Latency
• Metric for the execution time: the median of the execution times of a method
[Chart: medians of the execution times per method; N=400, X=5000]
• Metric for the variability of the execution time: the interquartile range of the execution times of a method
[Chart: interquartile ranges of the execution times per method; N=400, X=5000]
• Metric for anomalies in the execution times: the percentage of outliers in the execution times of a method
[Chart: percentage of outliers per method; N=400, X=5000]
• Effect of changes in method parameters: comparison of the medians of the execution times of the benchmarks that use the same method
[Chart: comparison of medians per method; N=400, X=5000]
Findings:
• 8 methods with execution times > 800 ms
• 3 methods with IQR > 11 ms
• 2 methods with % of outliers > 5%
• 5 methods with differences in execution times > 60 ms
• Scalability: the slope of the function estimated by simple linear regression of the medians of the execution times from a minimum load (X=500) to a maximum one (X=5000). (A sketch of these measures follows.)
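The four analysis measures above can be computed with the Python standard library alone. The sketch below is mine, not from the talk; it assumes Python 3.10+ (for statistics.linear_regression) and Tukey's 1.5·IQR fences for outlier detection, since the slides do not state which outlier rule was used.

```python
# Sketch of the result-analysis metrics named in the slides: median
# execution time, interquartile range, percentage of outliers (assuming
# Tukey's 1.5*IQR fences), and the scalability slope from a simple linear
# regression of the medians over the load factors X = 500 .. 5000.
import statistics

def median_ms(times):
    return statistics.median(times)

def iqr_ms(times):
    q1, _q2, q3 = statistics.quantiles(times, n=4)
    return q3 - q1

def outlier_pct(times):
    q1, _q2, q3 = statistics.quantiles(times, n=4)
    lo, hi = q1 - 1.5 * (q3 - q1), q3 + 1.5 * (q3 - q1)
    return 100.0 * sum(t < lo or t > hi for t in times) / len(times)

def scalability_slope(loads, medians):
    """Least-squares slope of the median execution time vs. load factor X."""
    slope, _intercept = statistics.linear_regression(loads, medians)
    return slope

# Example with made-up timings (milliseconds) for a single method:
times = [12.1, 12.3, 12.2, 30.5, 12.4, 12.2, 12.3]
print(median_ms(times), iqr_ms(times), outlier_pct(times))
```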
• Analysis of results was difficult:
  - The evaluation was executed 10 times with different load factors: 500, 1000, 1500, 2000, 2500, 3000, 3500, 4000, 4500, and 5000
  - 128 benchmarks × 10 executions = 1,280 files with results!
[Architecture: the Performance Benchmark Suite comprises a Workload Generator, a Benchmark Suite Executor, a Measurement Data Library and a Statistical Analyser, and connects to the ontology development tools under evaluation]
García-Castro, R.; Gómez-Pérez, A. "Guidelines for Benchmarking the Performance of Ontology Management APIs". 4th International Semantic Web Conference (ISWC 2005), LNCS 3729. November 2005. Galway, Ireland.
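The components in the architecture above suggest a simple pipeline. The following Python sketch is a hypothetical rendering of it; the class and method names are mine, not the suite's, and Python 3.9+ is assumed for the built-in generic types.

```python
# Hypothetical sketch of the benchmark-suite architecture: a Workload
# Generator feeds a Benchmark Suite Executor, raw timings go to a
# Measurement Data Library, and a Statistical Analyser summarises them.
import statistics
from typing import Callable

class MeasurementDataLibrary:
    """Stores the raw measurements produced by the benchmark executions."""
    def __init__(self) -> None:
        self.runs: dict[str, list[float]] = {}

    def record(self, benchmark: str, seconds: float) -> None:
        self.runs.setdefault(benchmark, []).append(seconds)

class StatisticalAnalyser:
    """Summarises the stored measurements (here: the median per benchmark)."""
    def summarise(self, library: MeasurementDataLibrary) -> dict[str, float]:
        return {name: statistics.median(times)
                for name, times in library.runs.items()}

def execute_suite(benchmarks: dict[str, Callable[[], float]],
                  executions: int,
                  library: MeasurementDataLibrary) -> None:
    """Benchmark Suite Executor: run every benchmark `executions` times."""
    for name, run in benchmarks.items():
        for _ in range(executions):
            library.record(name, run())
```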
• Interoperability is the ability of Semantic Web technologies to interchange ontologies and use them:
  - At the information level, not at the system level
  - In terms of knowledge reuse, not information integration
• In the real world it is not feasible to use a single system or a single formalism
• Different behaviours appear in interchanges between different formalisms: subclass of class, subclass of restriction, value constraints, cardinality + object property, cardinality + datatype property, set operators
Group                                                           No.
Class hierarchies                                               17
Class equivalences                                              12
Classes defined with set operators                               2
Property hierarchies                                             4
Properties with domain and range                                10
Relations between properties                                     3
Global cardinality constraints and logical property
characteristics                                                  5
Single individuals                                               3
Named individuals and properties                                 5
Anonymous individuals and properties                             3
Individual identity                                              3
Syntax and abbreviation                                         15
TOTAL                                                           82
David, S.; García-Castro, R.; Gómez-Pérez, A. "Defining a Benchmark Suite for Evaluating the Import of OWL Lite Ontologies". Second International Workshop OWL: Experiences and Directions 2006 (OWL2006). November 2006. Athens, Georgia, USA.
• Execution informs about the correct execution:
  - OK: no execution problem
  - FAIL: some execution problem
  - Comparer Error (C.E.): comparer exception
  - Not Executed (N.E.): second step not executed
• Information added or lost is measured in terms of triples
• Interchange informs whether the ontology has been interchanged correctly, with no addition or loss of information:
  - SAME if Execution is OK and both Information added and Information lost are void
  - DIFFERENT if Execution is OK but Information added or Information lost are not void
  - NO if Execution is FAIL, N.E. or C.E.
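A minimal sketch of this decision rule in Python, using rdflib's graph comparison as a stand-in for the external ontology comparers the infrastructure actually delegated to; this is my illustration, not IBSE code.

```python
# Sketch of the interchange verdict rule described above. rdflib's graph
# diff stands in for the external ontology comparers.
from rdflib import Graph
from rdflib.compare import to_isomorphic, graph_diff

def interchange_verdict(execution: str, original: Graph, result: Graph) -> str:
    """execution is one of 'OK', 'FAIL', 'N.E.', 'C.E.' (see above)."""
    if execution != "OK":
        return "NO"
    _in_both, in_original, in_result = graph_diff(
        to_isomorphic(original), to_isomorphic(result))
    lost, added = len(in_original), len(in_result)  # triples lost / added
    return "SAME" if lost == 0 and added == 0 else "DIFFERENT"
```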
• Automatically executes experiments between all the tools
• Allows configuring different execution parameters
• Uses ontologies to represent benchmarks and results
• Depends on external ontology comparers (KAON2 OWL Tools and RDF-…)
García-Castro, R.; Gómez-Pérez, A., Prieto-González J. "IBSE: An OWL Interoperability Evaluation Infrastructure". Third International Workshop OWL: Experiences and Directions 2007 (OWL2007). June, 2007. Innsbruck, Austria.
• Different perspectives for analysis:
  - Results per tool / pair of tools
  - Results per component
  - Result evolution over time
  - …
[Chart: evolution of results over four evaluations (04-2005, 05-2005, 10-2005, 01-2006), counts from 0 to 50. Tool import/export legend: models and executes; does not model and executes; models and fails; does not model and fails; not executed. Ontology comparison legend: same information; more information; less information; tool fails; comparer fails; not valid ontology]
Combinations                                                        K-K  P-P  W-W  K-P  K-W  P-W  K-P-W
Classes (2)                                                          Y    Y    Y    Y    Y    Y    Y
Classes instance of a single metaclass (4)                           Y    Y    -    N    -    -    -
Classes instance of multiple metaclasses (1)                         Y    N    -    N    -    -    -
Class hierarchies without cycles (3)                                 Y    Y    Y    Y    Y    Y    Y
Class hierarchies with cycles (2)                                    -    -    -    -    -    -    -
Classes related through object or datatype properties (6)            -    -    -    -    -    -    -
Datatype properties without domain or range (7)                      Y    Y    -    N    -    -    -
Datatype properties with multiple domains (3)                        Y    -    -    -    -    -    -
Datatype properties whose range is String (5)                        Y    Y    Y    N    N    Y    N
Datatype properties whose range is an XML Schema datatype (2)        Y    -    Y    -    Y    -    -
Object properties without domain or range (8)                        Y    Y    -    Y    -    -    -
Object properties with a domain and a range (2)                      Y    Y    Y    Y    Y    Y    Y
Object properties with multiple domains or ranges (5)                Y    -    -    -    -    -    -
Instances of undefined resources (1)                                 -    -    -    -    -    -    -
Instances of a single class (2)                                      Y    Y    Y    Y    Y    Y    Y
Instances of multiple classes (1)                                    Y    N    -    N    -    -    -
Instances related via object properties (7)                          Y    Y    Y    Y    Y    Y    Y
Instances related via datatype properties (2)                        Y    Y    Y    N    Y    Y    N
Instances related via datatype properties with range an XML
Schema datatype (2)                                                  -    -    Y    -    -    -    -
Instances related via undefined object or datatype properties (3)    -    -    -    -    -    -    -
• Clear picture of the interoperability between different tools
• Low interoperability and few clusters of interoperable tools
• Interoperability depends on: …
• Tools have improved
• Involvement of tool developers is needed:
  - Tool developers have been informed
  - Tool improvement is out of our scope
• Results are expected to change:
  - Continuous evaluation is needed
García-Castro, R.; Gómez-Pérez, A. "Interoperability results for Semantic Web technologies using OWL as the interchange language". Web Semantics: Science, Services and Agents in the World Wide Web. ISSN: 1570-8268. Elsevier. Volume 8, number 4. pp. 278-291. November 2010.
García-Castro, R.; Gómez-Pérez, A. "RDF(S) Interoperability Results for Semantic Web Technologies". International Journal of Software Engineering and Knowledge Engineering. ISSN: 0218-1940. Editor: Shi-Kuo Chang. Volume 19, number 8. pp. 1083-1108. December 2009.
Method for benchmarking interoperability:
• Common for different Semantic Web technologies
• Problem-focused instead of tool-focused
• Manual vs. automatic experiments:
  - The choice depends on the specific needs of the benchmarking
  - Automatic: cheaper, more flexible and extensible
  - Manual: higher quality of results

Resources for benchmarking interoperability:
• All the benchmark suites, software and results are publicly available
• Independent of:
  - The interchange language
  - The input ontologies
[Diagram: the benchmarking resources. Automatic infrastructures (IBSE, rdfsbs, IRIBA) and manual experiments cover the OWL Lite Import, RDF(S) Import, RDF(S) Export and RDF(S) Interoperability benchmark suites, applied to interchanges between pairs of tools (Tool X, Tool Y) in the RDF(S) and OWL interoperability benchmarking]
García-Castro, R. "Benchmarking Semantic Web technology". Studies on the Semantic Web vol. 3. AKA Verlag – IOS Press. ISBN: 978-3-89838-622-7. January 2010.
Universidad Politécnica de Madrid, Spain (coordinator)
University of Sheffield, UK
Forschungszentrum Informatik, Germany
University of Innsbruck, Austria
Institut National de Recherche en Informatique et en Automatique, France
University of Mannheim, Germany
University of Zurich, Switzerland
STI International, Austria
Open University, UK
Oxford University, UK
Wrigley S.; García-Castro R.; Nixon L. "Semantic Evaluation At Large Scale (SEALS)". 21st International World Wide Web Conference (WWW 2012). European projects track. pp. 299-302. Lyon, France. 16-20 April 2012.
García-Castro R.; Esteban-Gutiérrez M.; Kerrigan M.; Grimm S. "An Ontology Model to Support the Automatic Evaluation of Software". 22nd International Conference on Software Engineering and Knowledge Engineering (SEKE 2010). pp. 129-134. Redwood City, USA. 1-3 July 2010.
García-Castro R.; Esteban-Gutiérrez M.; Gómez-Pérez A. "Towards an Infrastructure for the Evaluation of Semantic Technologies". eChallenges e-2010 Conference (e-2010). pp. 1-8. Warsaw, Poland. 27-29 October 2010.
García-Castro R.; Gómez-Pérez A. "A Keyword-driven Approach for Generating Ontology Language Conformance Test Data". Engineering Applications of Artificial Intelligence. ISSN: 0952-1976. Elsevier. Editor: B. Grabot.
Grangel-González I.; García-Castro R. "Automatic Conformance Test Data Generation Using Existing Ontologies in the Web". Second International Workshop on Evaluation of Semantic Technologies (IWEST 2012). 28 May 2012. Heraklion, Greece.
First evaluation campaign: 29 tools from 8 countries

Campaign               Tool                Provider                           Country
Ontology engineering   Jena                HP Labs                            UK
                       Sesame              Aduna                              Netherlands
                       Protégé 4           Stanford University                USA
                       Protégé OWL         Stanford University                USA
                       NeOn Toolkit        NeOn Foundation                    Europe
                       OWL API             University of Manchester           UK
Reasoning              HermiT              University of Oxford               UK
                       jcel                Technische Universität Dresden     Germany
                       FaCT++              University of Manchester           UK
Matching               AROMA               INRIA                              France
                       ASMOV               INFOTECH Soft                      USA
                       Aroma               Nantes University                  France
                       Falcon-AO           Southeast University               China
                       Lily                Southeast University               China
                       RiMOM               Tsinghua University                China
                       MapPSO              FZI                                Germany
                       CODI                University of Mannheim             Germany
                       AgreementMaker      Advances in Computing Lab          USA
                       GeRoMe*             RWTH Aachen                        Germany
                       Ef2Match            Nanyang Technological University   China
Semantic search        K-Search            K-Now Ltd                          UK
                       Ginseng             University of Zurich               Switzerland
                       NLP-Reduce          University of Zurich               Switzerland
                       PowerAqua           KMi, Open University               UK
                       Jena ARQ            HP Labs, Talis                     UK
Semantic web services  4 OWLS-MX variants  DFKI                               Germany
Nixon L.; García-Castro R.; Wrigley S.; Yatskevich M.; Trojahn-dos-Santos C.; Cabral L. "The state of semantic technology today – overview of the first SEALS evaluation campaigns". 7th International Conference on Semantic Systems (I-SEMANTICS2011). Graz, Austria. 7-9 September 2011.
Second evaluation campaign: 41 tools from 13 countries

WP  Tool                  Provider                            Country
10  Jena                  HP Labs                             UK
    Sesame                Aduna                               Netherlands
    Protégé 4             Stanford University                 USA
    Protégé OWL           Stanford University                 USA
    NeOn Toolkit          NeOn Foundation                     Europe
    OWL API               University of Manchester            UK
11  HermiT                University of Oxford                UK
    jcel                  Technische Universität Dresden      Germany
    FaCT++                University of Manchester            UK
    WSReasoner            University of New Brunswick         Canada
12  AgrMaker              University of Illinois at Chicago   USA
    Aroma                 INRIA Grenoble Rhône-Alpes          France
    AUTOMSv2              VTT Technical Research Centre       Finland
    CIDER                 Universidad Politécnica de Madrid   Spain
    CODI                  Universität Mannheim                Germany
    CSA                   University of Ho Chi Minh City      Vietnam
    GOMMA                 Universität Leipzig                 Germany
    Hertuda               Technische Universität Darmstadt    Germany
    LDOA                  Tunis El Manar University           Tunisia
    Lily                  Southeast University                China
    LogMap                University of Oxford                UK
    LogMapLt              University of Oxford                UK
    MaasMtch              Maastricht University               Netherlands
    MapEVO                FZI Forschungszentrum Informatik    Germany
    MapPSO                FZI Forschungszentrum Informatik    Germany
    MapSSS                Wright State University             USA
    Optima                University of Georgia               USA
    WeSeEMtch             Technische Universität Darmstadt    Germany
    YAM++                 LIRMM                               France
13  K-Search              K-Now Ltd                           UK
    Ginseng               University of Zurich                Switzerland
    NLP-Reduce            University of Zurich                Switzerland
    PowerAqua             KMi, Open University                UK
    Jena ARQ v2.8.2       HP Labs, Talis                      UK
    Jena ARQ v2.9.0       HP Labs, Talis                      UK
    rdfQuery v0.5.1-beta  University of Southampton           UK
    Semantic Crystal      University of Zurich                Switzerland
    Affective Graphs      University of Sheffield             UK
14  WSMO-LITE-OU          KMi, Open University                UK
    SAWSDL-OU             KMi, Open University                UK
    OWLS-URJC             Universidad Rey Juan Carlos         Spain
    OWLS-M0               DFKI                                Germany
Radulovic, F.; García-Castro, R. "Extending Software Quality Models - A Sample in the Domain of Semantic Technologies". 23rd International Conference on Software Engineering and Knowledge Engineering (SEKE 2011). Miami, USA. July 2011.
[Workflow: a user states quality requirements ("I need a robust ontology engineering tool and a semantic search tool with the highest precision"); the Semantic Technology Recommendation component combines a semantic technology quality model with the SEALS Platform's Tools Repository Service and Results Repository Service; the user receives a recommendation ("You should use Sesame v2.6.5 and Arq v2.9.0. The reason for this is... Alternatively, you can use ...")]
Radulovic F.; García-Castro R. "Semantic Technology Recommendation Based on the Analytic Network Process". 24th Int. Conference on Software Engineering and Knowledge Engineering (SEKE 2012). Redwood City, CA, USA. 1-3 July 2012. 3rd Best Paper Award!
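The cited approach ranks tools with the Analytic Network Process. The sketch below is a deliberately simplified stand-in, meant only to show the shape of the inputs (user quality requirements as weights, per-tool quality measures from the results repository) using a plain weighted sum; tool names and measure values are illustrative, not actual SEALS results.

```python
# Simplified stand-in for the recommendation step: rank tools by a
# weighted sum of normalised quality measures (the real approach is ANP).
def recommend(tools, weights):
    """tools: {name: {measure: value in [0, 1]}};
    weights: {measure: importance}, from the user quality requirements."""
    def score(measures):
        return sum(weights.get(m, 0.0) * v for m, v in measures.items())
    return max(tools, key=lambda name: score(tools[name]))

tools = {"Sesame v2.6.5": {"robustness": 0.9, "precision": 0.7},
         "Arq v2.9.0":    {"robustness": 0.8, "precision": 0.95}}
print(recommend(tools, {"precision": 1.0}))  # -> "Arq v2.9.0"
```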
• The SEALS Platform facilitates:
  - Comparing tools under common settings
  - Reproducibility of evaluations
  - Reusing evaluation resources, completely or partially, or defining new ones
  - Managing evaluation resources using platform services
  - Computational resources for demanding evaluations
[Figure keywords: machine-processable; combined for many software products of different types; high availability and quality; community]
…characteristics of such software products. This workflow is supported by evaluation software that can be used to assess any software product of the type covered by the evaluation; the software product must have previously implemented the required mechanisms to be integrated with the evaluation software. Test data and evaluation results are machine-processable; therefore, they can be reused. Furthermore, the results can be combined for all the software products of the same type.
D. Level 4. Integrated

At this level, several teams in collaboration with relevant stakeholders (e.g., users or providers) define a generic evaluation framework that can be used with any type of software product. This generic framework for software evaluation allows building evaluation resources (i.e., evaluation workflow, tools, test data, and results) upon shared principles and reusing common parts. Here, evaluation workflows are defined in a machine-interpretable format so they can be automated. An evaluation infrastructure gives support both to the evaluation of multiple types of software products, taking into account their different characteristics, and to the management of the different evaluation resources. Test data can be reused across different evaluations, and the evaluation results can be combined for software products of different types.
E. Level 5. Optimized

At this level the whole community has adopted a generic framework for software evaluation in which evaluation workflows are measured and optimized. The centralized scenario of the previous levels has now evolved into a federation of autonomous evaluation infrastructures. These evaluation infrastructures must support not only the evaluation workflow but also new requirements, such as the interchange of evaluation resources or the implementation of policies for data access, interchange, and use. This federation of infrastructures permits satisfying any software or hardware requirements of the different software products; customizing, optimizing, and curating test data; and improving the availability and quality of the evaluation results.
One of the notions behind the maturity model, as Figure 2 shows, is that a higher maturity level implies higher integration of evaluation efforts in one field, ranging from isolated evaluations in the lower maturity level to fully-integrated evaluations in the higher level. In this scenario, maturity evolves from a starting point of decentralized efforts into centralized infrastructures and ends with networks of federated infrastructures.

Another notion to consider in this model is that of cost. While the cost of defining new evaluations decreases when the maturity level increases, mainly due to the reuse of existing resources, the cost associated to the evaluation infrastructure (hardware and infrastructure development and maintenance) significantly increases.
VI. ASSESSMENTS IN THE SEMANTIC RESEARCH FIELD
This section presents how we have used SET-MM to assess the maturity of software evaluation technologies in a specific research field.

Other maturity models provide appraisal methods for comparing against the maturity model. However, we do not propose any appraisal method because our scope is a whole research field and, therefore, it would be difficult to obtain objective metrics, since any judgment would be subjective.

Therefore, our approach has been, first, to identify some evaluation efforts that stand out because of their impact in the field and, second, to try to assess the maturity of the software evaluation technologies used in them.
Software Evaluation Technology Maturity Model
[Figure: the UPM-FBI evaluation effort, comprising the automatic IBSE, rdfsbs and IRIBA infrastructures and manual experiments over the OWL Lite Import, RDF(S) Import, RDF(S) Export and RDF(S) Interoperability benchmark suites for tool pairs, positioned against the Software Evaluation Technology Maturity Model]
García-Castro R. "SET-MM – A Software Evaluation Technology Maturity Model". 23rd International Conference on Software Engineering and Knowledge Engineering (SEKE2011). pp. 660-665. Miami Beach, USA. 7-9 July 2011.