Algorithm to populate Telecom domain OWL-DL ontology with A-box object properties derived from Technical Support Documents 1 Kouznetsov A, 2 Shoebottom B, 1 Baker CJO 1 Department of Computer Science and Applied Statistics, University of New Brunswick, Saint John, Canada 2 Innovatia, Inc, Saint John, Canada
Motivation: Why Ontology-Centric?
• Problem: To respond to information requests in a timely manner, contact center workers need to search through many types of knowledge resources.
• Challenge: Increase quality of service while decreasing contact center costs.
• Solution: Use an ontology-centric platform:
  - less escalation to more experienced workers
  - less time spent resolving cases
  - greatly reduced training time
Motivation: Why Text Mining?
• Problem: Highly educated experts spend significant time populating the ontology.
• Challenge: Reduce the workload.
• Solution: Apply text mining, a semi-automatic method for extracting information (specifically named entities and their relations) from texts and populating a domain ontology.
Focus
• We are focused on the problem of accurately extracting and populating relations between the named entities and presenting them as object properties between A-box individuals in an OWL-DL ontology.
Populate A-box Object Property. Single Property
Diagram, T-Box: Domain Class Man, Object Property hasSister, Range Class Woman.
Diagram, A-Box: Domain Instance Samuel, Range Instance Mary. Does hasSister hold between them?
Populate A-box Object Property. Multi-properties
Diagram, T-Box: Domain Class Man, Range Class Woman, Object Properties hasSister and hasMother.
Diagram, A-Box: Domain Instance Samuel, Range Instance Mary. Candidate properties: hasSister ?, hasMother ?
A more complicated case
Diagram, A-Box: Domain Instance Samuel, Range Instance Mary. Candidate properties: hasSister ?, hasMother ?, hasSameLastName ?
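The question marks in the diagrams above can be read as a list of A-box candidates: every pair of instances whose classes match a property's domain and range in the T-box. The following is a minimal sketch of that enumeration, not the authors' implementation. The names `ObjectProperty` and `candidates` are hypothetical, and only hasSister and hasMother are included because the slides give a domain and range for those two alone.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class ObjectProperty:
    """A T-box object property with its domain and range classes."""
    name: str
    domain_class: str
    range_class: str


def candidates(properties, instances):
    """List (domain instance, property, range instance) triples the T-box allows."""
    triples = []
    for prop in properties:
        domains = instances.get(prop.domain_class, [])
        ranges = instances.get(prop.range_class, [])
        for d, r in product(domains, ranges):
            triples.append((d, prop.name, r))
    return triples


# T-box and A-box individuals taken from the slides.
tbox = [
    ObjectProperty("hasSister", "Man", "Woman"),
    ObjectProperty("hasMother", "Man", "Woman"),
]
abox_instances = {"Man": ["Samuel"], "Woman": ["Mary"]}

cands = candidates(tbox, abox_instances)
for triple in cands:
    print(triple)
```

Each triple is only a candidate; deciding which of them actually hold is the job of the scoring and decision steps described later.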
Methodology
• Ontology-based information retrieval applies Natural Language processing (NLP) to link text segments, named entities and relations between named entities to existing ontologies.
• The algorithm leverages customized gazetteer lists, including lists of object property synonyms.
• A-box property candidates are scored using functions of the distance between co-occurring terms.
• A-box property prediction and population are based on these scores (thresholds, fuzzy approach).
Main Implementation tools
Java
GATE/JAPE
OWLAPI
Semi-Automatic Ontology populating pipeline
Pipeline diagram, labels by stage:
• Inputs: Source Documents (XML), Synonyms Lists, unpopulated Ontology (OWL), Term List (Excel)
• Preprocessing
• Text Segments Processing: Text Segments Separation into Sentences, Tables, and Other Text Segments
• Ontology Population: Named Entities, Single Relations, Multi Relations
• Populated Ontology
• Using Ontology: Reasoning, Visualizing, Visual Queries, Connecting Resources
Populating Ontology
Diagram components:
• Scoring Framework: Co-occurrence Based Scores generator; Relation Framework for A-box candidates extraction
• Decision Framework: Decision module
• Other labels: Candidate, Scores, Reasoning, Ontology, Focus, Labelled Data, Tres
Co-occurrence Based Scores generator
Co-occurrence Based Scores generator (Light version)
Diagram components: A-box Candidate, All related content, Scores, Relations Framework, Relation Object, Tokenizer, Gazetteer, Score calculator, Integrator, Fragments Processor, Synonyms List
Generation of Scores
• Relation Collection: framework to process Relation objects
• Relation Object: integrates an object property with:
  - all types of related text fragments
  - ontology objects
  - intermediate and final score-processing results
  It is identified as: Domain Class : Domain Instance : Object Property : Range Class : Range Instance
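As an illustration of the Relation Object and Relation Collection described above, here is a minimal sketch of a data structure that bundles a property candidate with its text fragments and scores, keyed by the five-part identifier. The class and field names are hypothetical, the fragment strings are placeholders, and the actual design is not shown in the slides.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Relation:
    """One A-box object property candidate plus its evidence (hypothetical layout)."""
    domain_class: str
    domain_instance: str
    object_property: str
    range_class: str
    range_instance: str
    fragments: List[str] = field(default_factory=list)          # related text fragments
    fragment_scores: List[float] = field(default_factory=list)  # intermediate results
    final_score: float = 0.0                                     # final result

    def key(self) -> str:
        """Identifier in the order given on the slide."""
        return " : ".join((self.domain_class, self.domain_instance,
                           self.object_property, self.range_class,
                           self.range_instance))


class RelationCollection:
    """Keeps one Relation per identifier and attaches new fragments to it."""

    def __init__(self) -> None:
        self._by_key: Dict[str, Relation] = {}

    def add_fragment(self, relation: Relation, fragment: str) -> Relation:
        stored = self._by_key.setdefault(relation.key(), relation)
        stored.fragments.append(fragment)
        return stored

    def __len__(self) -> int:
        return len(self._by_key)


collection = RelationCollection()
r1 = collection.add_fragment(
    Relation("Man", "Samuel", "hasSister", "Woman", "Mary"), "placeholder fragment 1")
r2 = collection.add_fragment(
    Relation("Man", "Samuel", "hasSister", "Woman", "Mary"), "placeholder fragment 2")
print(r1.key(), len(r1.fragments))
```

Keying on the full identifier means that evidence from sentences, tables, and other segments for the same candidate accumulates in a single object.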
Scores Generator: Details
Score Calculator:
• Calculates scores for the text fragments associated with the Relation.
• The current version is based on the distance between co-occurring entities and the number of text fragments in which they co-occur.
• Included by the Text Fragments Processor and the Integrator.
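The slides say that the score depends both on term distance and on the number of fragments with co-occurrence, but not how the two are combined. One simple possibility, shown here purely as an assumption, is to sum a per-fragment distance term so that each additional co-occurring fragment adds evidence. The function name `integrate` is hypothetical, and the distances in the example are illustrative inputs.

```python
def integrate(fragment_distances):
    """Sum 1/(distance + 1) over the fragments where both terms co-occur.

    Assumption: summation is used to combine fragments; the slides do not
    state the actual combination rule.
    """
    return sum(1.0 / (d + 1.0) for d in fragment_distances)


one_fragment = integrate([6])
three_fragments = integrate([6, 19, 3])
print(one_fragment, three_fragments)
```

Under this reading, a candidate seen in many fragments at moderate distance can outscore one seen once at short distance.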
2-terms and 3-terms scoring system
Diagram components: Tokenizer, Score Gazetteer, Score Processor; Domain Synonyms list, Range Synonyms list, Object Property Synonyms list; Tokenized sentence, sentence score.
Legend: Legacy (2 terms) system; Modified/Added in the new (3 terms) system.
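The difference between the two systems can be sketched as a gazetteer lookup over a tokenized sentence: a 2-term match needs a domain synonym and a range synonym, and a 3-term match additionally looks for an object property synonym. This is a plain-Python sketch, not the GATE/JAPE implementation. The helper names are hypothetical, and two points are assumptions: that distance counts the tokens strictly between the two mentions (which gives 3 for the slide's "Device X does not support Device Y" example), and that the property synonym may appear anywhere in the sentence.

```python
import re


def tokenize(text):
    """Lower-case word tokens; a stand-in for the pipeline's tokenizer."""
    return re.findall(r"\w+", text.lower())


def find_spans(tokens, synonyms):
    """Return (start, end) token spans, end exclusive, where a synonym occurs."""
    spans = []
    for synonym in synonyms:
        syn_tokens = tokenize(synonym)
        n = len(syn_tokens)
        if n == 0:
            continue
        for i in range(len(tokens) - n + 1):
            if tokens[i:i + n] == syn_tokens:
                spans.append((i, i + n))
    return spans


def gap(a, b):
    """Number of tokens strictly between two spans (0 if they overlap)."""
    if a[1] <= b[0]:
        return b[0] - a[1]
    if b[1] <= a[0]:
        return a[0] - b[1]
    return 0


def match(sentence, domain_syns, range_syns, property_syns=None):
    """Return (distance, property token count), or None without co-occurrence."""
    tokens = tokenize(sentence)
    d_spans = find_spans(tokens, domain_syns)
    r_spans = find_spans(tokens, range_syns)
    if not d_spans or not r_spans:
        return None
    distance = min(gap(d, r) for d in d_spans for r in r_spans)
    property_tokens = 0
    if property_syns is not None:  # 3-term system only
        p_spans = find_spans(tokens, property_syns)
        property_tokens = max((end - start for start, end in p_spans), default=0)
    return distance, property_tokens


sentence = "Device X does not support Device Y"
two_term = match(sentence, ["Device X"], ["Device Y"])
three_term = match(sentence, ["Device X"], ["Device Y"], ["support", "not support"])
print(two_term, three_term)
```

Taking the longest matching property synonym mirrors the "Best Bonus" field reported with the example sentences, where the longer match ("not support") earns the larger bonus.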
Multiple Formats Score Generation
Technical documentation contains knowledge displayed in multiple formats, each requiring different processing subroutines:
• Table Processing
• Sentence Processing
• Other segments
sentence before cleaning: ["<Paragraph></Action> <Figure Numbered="Unnumbered" Position="Inline" TextSize="medium" Width="column" frame="all" id="DLM-11334063" xml:lang="en"><image border-style="none" border-width="medium" xml:lang="en" href="ERGNN46205-301Loosening_screws_on_the_SDM_FW4_8010co_chassis33b.png"/></Figure></Step><Step xml:lang="en"><Action><Paragraph xml:lang="en">Rotate the insert/extractlevers to eject the 8660 SDM from the chassis.]
Final Score=9.99000999000999E-4, Best Bonus=0.0, Final Distance=1000.0
Domain Synonyms:
• 8010co chassis
• 8010co Chassis
• 8010 CO chassis
• 8010co
• 8010CO chassis
Range Synonyms:
• Screws
• screws
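The example above shows why cleaning matters: before cleaning, the synonyms occur only inside markup (the image file name), not in the visible text. Below is a minimal sketch of such cleaning using a regular expression. The regex approach and the name `clean` are assumptions, since the slides do not say how cleaning is done, and the fragment is abbreviated from the one on the slide. The reported score also equals 1/(1000+1), which would be consistent with a large default distance being assigned when the terms do not co-occur in the text, but that reading is an inference.

```python
import re

# Abbreviated from the slide's raw fragment (several Figure attributes omitted).
raw = (
    '<Figure id="DLM-11334063" xml:lang="en">'
    '<image href="ERGNN46205-301Loosening_screws_on_the_SDM_FW4_8010co_chassis33b.png"/>'
    '</Figure></Step><Step xml:lang="en"><Action><Paragraph xml:lang="en">'
    'Rotate the insert/extractlevers to eject the 8660 SDM from the chassis.'
)


def clean(fragment):
    """Drop XML tags (including their attributes) and collapse whitespace."""
    text = re.sub(r"<[^>]*>", " ", fragment)
    return re.sub(r"\s+", " ", text).strip()


cleaned = clean(raw)
print(cleaned)

# The synonyms are present in the markup but absent from the visible text.
print("screws" in raw.lower(), "screws" in cleaned.lower())
print("8010co" in raw.lower(), "8010co" in cleaned.lower())
```

Without this step, a match on the file name would count as a co-occurrence even though the sentence itself never mentions screws or the 8010co chassis.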
Example Sentence Type 2
sentence after cleaning: In a chassis that includes two power supplies in a non redundant power configuration, you must start both restrictions dual power supplies power supply units within 2 seconds of each other.
Final Score=10.05, Best Bonus=10.0, Final Distance=19
Diagram: token positions labelled D, R, and P in a sentence.
Distance: 6, Bonus Constant = 10, Tokens in Property = 2, Score = 1/(6+1) + 2*10 = 20.14
Distance: 6, Bonus Constant = 10, Tokens in Property = 1, Score = 1/(6+1) + 1*10 = 10.14
Bonus = Bonus Constant * Number of tokens in property
Sentence example: Device X does not support Device Y
Object Property | Number of Tokens | Obtained Score
Support | 1 | 1/(3+1) + 1*10 = 10.25
Not Support | 2 | 1/(3+1) + 2*10 = 20.25 ✓
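The worked numbers above all follow one pattern: Score = 1/(Distance + 1) + Bonus Constant * Number of tokens in property. The sketch below restates that arithmetic so the slide values can be re-derived. The function name `fragment_score` is hypothetical, and the default of 10 is the bonus constant used in the slide examples.

```python
BONUS_CONSTANT = 10.0  # value used in the slide examples


def fragment_score(distance, property_tokens=0, bonus_constant=BONUS_CONSTANT):
    """Score = 1/(distance + 1) + bonus_constant * property_tokens."""
    return 1.0 / (distance + 1.0) + bonus_constant * property_tokens


# "Device X does not support Device Y": distance 3
support = fragment_score(3, property_tokens=1)      # "Support"
not_support = fragment_score(3, property_tokens=2)  # "Not Support"

# Diagram example: distance 6
two_token = fragment_score(6, property_tokens=2)
one_token = fragment_score(6, property_tokens=1)

# Power-supply sentence: Final Distance=19, Best Bonus=10.0
power_supply = fragment_score(19, property_tokens=1)

for value in (support, not_support, two_token, one_token, power_supply):
    print(round(value, 2))
```

Because the bonus grows with the number of property tokens, the more specific synonym ("Not Support") outranks the shorter one ("Support") whenever both match the same sentence.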
Normalization
• Norm coefficient for an A-box object property:
Log(1.0 + (NSD + 1.0/Cd) * (NSR + 1.0/Cr))
where
NSD: number of sentences in which the domain occurred
Cd: cardinality of the domain synonyms list
NSR: number of sentences in which the range occurred
Cr: cardinality of the range synonyms list
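A small sketch of the coefficient follows, with three assumptions stated up front: the expression is read literally with standard operator precedence (NSD + 1.0/Cd, not (NSD + 1.0)/Cd), Log is taken to be the natural logarithm, and the function only computes the coefficient because the slides do not say how it is applied to the scores. The name `norm_coefficient` is hypothetical.

```python
import math


def norm_coefficient(nsd, cd, nsr, cr):
    """Log(1.0 + (NSD + 1.0/Cd) * (NSR + 1.0/Cr)), natural log assumed.

    nsd: number of sentences in which the domain occurred
    cd:  cardinality of the domain synonyms list
    nsr: number of sentences in which the range occurred
    cr:  cardinality of the range synonyms list
    """
    return math.log(1.0 + (nsd + 1.0 / cd) * (nsr + 1.0 / cr))


# Cd=5 and Cr=2 match the synonym lists in the 8010co chassis / screws example;
# the sentence counts 3 and 4 are made-up inputs for illustration only.
coefficient = norm_coefficient(nsd=3, cd=5, nsr=4, cr=2)
print(coefficient)
```

Under the literal reading, the coefficient grows with the number of sentences mentioning either term and shrinks slightly as the synonym lists get longer.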
Gold Standard and Evaluation Framework
Diagram components:
• Ontologies and outputs: A-Box Ontology, T-Box Ontology, Labels, Evaluation Report
• Population pipeline: Source Documents (XML), Preprocessing, Synonyms Lists, Text Segments Processing, Text Segments Separation (Sentences, Tables, Bullet Lists), unpopulated Ontology (OWL), Term List (Excel), Ontology Population (Named Entities, Single Relations, Multi Relations), Populated Ontology, Using Ontology (Reasoning, Visualizing, Visual Queries, Connecting Resources), Populate Ontology
• Prediction evaluation Framework: Evaluate predicted Properties / Update DB, Golden Standard Database, Import labels, Knowledge Engineer
Thresholds: Decision Boundary
All scores for each A-box property candidate are aggregated over the eligible sources of evidence for the A-box property in question.
The threshold in use determines the trade-off between recall and precision.
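The trade-off can be made concrete with a small sketch that accepts every candidate whose aggregated score reaches a threshold and then measures positive-class precision and recall against gold-standard labels. The candidate names, scores, and labels below are made up for illustration, the `>=` comparison is an assumption, and `evaluate` is a hypothetical helper rather than part of the evaluation framework in the slides.

```python
def evaluate(scores, gold_positive, threshold):
    """Return (precision, recall) for candidates with score >= threshold."""
    predicted = {cand for cand, score in scores.items() if score >= threshold}
    true_positives = len(predicted & gold_positive)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold_positive) if gold_positive else 0.0
    return precision, recall


# Toy data (hypothetical): aggregated score per candidate and gold labels.
scores = {"a": 20.25, "b": 10.14, "c": 10.05, "d": 0.2, "e": 0.001}
gold_positive = {"a", "b", "d"}

strict = evaluate(scores, gold_positive, threshold=15.0)
lenient = evaluate(scores, gold_positive, threshold=0.1)
print(strict, lenient)
```

On this toy data the strict threshold keeps only the top-scoring candidate (perfect precision, low recall), while the lenient one recovers every gold positive at the cost of one false positive, which is the same pattern as the result slides that follow.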
Results for Tables: Baseline result
Focus on Positive class Recall and Positive class Precision
Class of interest (Positive class): Recall = 0.80, Precision = 0.85
Results for Tables: Continued
Focus on Positive class Precision
Class of interest (Positive class): Recall = 0.25, Precision = 1.0
Results for Tables: Continued
Focus on Positive class Recall
Class of interest (Positive class): Recall = 1.0, Precision = 0.775
Results for Sentences
Focus on Positive class Precision
Class of interest (Positive class): Recall = 0.14, Precision = 1.0
Results for Sentences and Tables
Focus on Positive class Precision
Class of interest (Positive class): Recall = 0.4, Precision = 1.0
Synergistic effect of using Sentences and Tables (at Precision = 1.0): Recall = 0.4 combined, versus 0.25 for Tables alone and 0.14 for Sentences alone.