Ontology-Based Information Extraction and Structuring Stephen W. Liddle † School of Accountancy and Information Systems Brigham Young University Douglas M. Campbell, David W. Embley, ‡ and Randy D. Smith Research funded in part by † Faneuil Research and ‡ Novell, Inc. Copyright 1998
Motivation
Database-style queries are effective
  – Find red cars, 1993 or newer, < $5,000
    • Select * From Car Where Color=“red” And Year >= 1993 And Price < 5000
Web is not a database
  – Uses keyword search
  – Retrieves documents, not records
  – Assuming we have a range operator:
    • “red” and (1993 to 1998) and (1 to 5000)
Solutions
Web query languages
Wait for XML to emerge
  – Interoperation/Standards?
  – XML query language?
Wrappers
  – Hand-written or semi-automatically generated parsers
  – Specific to source site, subject to change
Our Approach
Automatic wrapper generation
Based on application ontology
  – Augmented conceptual model
  – Defines constants, keywords, their relationships
Best for:
  – Narrow ontological breadth
  – Data-rich documents
Car-Ad Ontology
Object-Relationship Model + Data Frames
[Figure: object-relationship diagram. Object sets: Car, Year, Make, Model, Mileage, Price, Feature, PhoneNr, Extension. Relationship sets are labeled “has” and “is for”, with participation constraints 0..1, 0..*, and 1..*.]
Graphical
Car [0:1] has Year [1:*];
Year {regexp[2]: “\d{2} : \b’\d{2}\b”, … };
Car [0:1] has Make [1:*];
Make {regexp[10]: “\bchev\b”, “\bchevy\b”, … };
Car [0:1] has Model [1:*];
Model {…};
Car [0:1] has Mileage [1:*];
Mileage {regexp[8]: “\b[1-9]\d{1,2}k”, “[1-9]\d?,\d{3} : [^\$\d][1-9]\d?,\d{3}[^\d]” }
  {context: “\bmiles\b”, “\bmi\.”, “\bmi\b”};
Car [0:*] has Feature [1:*];
Feature {regexp[20]:
  -- Colors
  “\baqua\s+metallic\b”, “\bbeige\b”, …
  -- Transmission
  “(5|6)\s*spd\b”, “auto : \bauto(\.|,)”,
  -- Accessories
  “\broof\s+rack\b”, “\bspoiler\b”, …
...
Textual
(See Figures 2 & 3 of Paper)
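As a rough illustration of how data-frame regular expressions like those above might be applied to an ad, the Python sketch below runs a handful of patterns over the sample text. The names `DATA_FRAMES` and `recognize` are hypothetical, and the Year, Mileage, and color patterns are simplified assumptions loosely modeled on the slide; this is not the authors' implementation.

```python
import re

# Hypothetical, simplified data frames: object set -> list of regex patterns.
# Patterns are assumptions modeled on the slide, not the real ontology.
DATA_FRAMES = {
    "Year": [r"'\d{2}\b"],
    "Make": [r"\bchev\b", r"\bchevy\b"],
    "Mileage": [r"\b[1-9]\d?,\d{3}(?=\s*miles\b)"],
    "Feature": [r"\bred\b", r"(5|6)\s*spd\b"],
}

def recognize(text):
    """Return (object_set, matched_string, start_offset) for every match."""
    hits = []
    for object_set, patterns in DATA_FRAMES.items():
        for pattern in patterns:
            for m in re.finditer(pattern, text, flags=re.IGNORECASE):
                hits.append((object_set, m.group(0), m.start()))
    # Order hits by position in the document.
    return sorted(hits, key=lambda h: h[2])

ad = "'97 CHEV Cavalier, Red, 5 spd, only 7,000 miles on her."
for hit in recognize(ad):
    print(hit)
```

Each hit records the object set, the matched string, and its offset, which is roughly the kind of information a table of recognized constants would need to carry.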
Fixed Processes
[Figure: system architecture. Components: Application Ontology, Ontology Parser, Constant/Keyword Matching Rules, Unstructured Document, Constant/Keyword Recognizer, Data-Record Table, List of Objects, Relationships, and Constraints, Database-Instance Generator, Database Scheme, Populated Database.]
(See Figure 1 of Paper)
Constant/Keyword Matching Rules:
Make : \bchev\b
…
KEYWORD(Mileage) : \bmiles\b
KEYWORD(Mileage) : \bmi\.
...

Database Scheme:
create table Car ( Car integer, Year varchar(2), … );
create table CarFeature ( Car integer, Feature varchar(10));
...

List of Objects, Relationships, and Constraints:
Object: Car;
...
Car: Year [0:1];
Car: Make [0:1];
…
CarFeature: Car [0:*] has Feature [1:*];

Unstructured Document:
'97 CHEV Cavalier, Red, 5 spd, only 7,000 miles on her. Previous owner heart broken! Asking only $11,995. #1415. JERRY SEINER MIDVALE, 566-3800
Populated Database:
insert into Car values(1001, “97”, “CHEV”, “Cavalier”, “7,000”, “11,995”, “566-3800”)
insert into CarFeature values(1001, “Red”)
insert into CarFeature values(1001, “5 spd”)
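The step from a resolved record to insert statements can be sketched in Python with an in-memory SQLite database. This is only an illustration under assumptions: the `record` dictionary, the column names beyond `Car`, `Year`, and `Feature`, and the use of `text` columns are all hypothetical, since the slide elides most of the scheme.

```python
import sqlite3

# Hypothetical record, as the recognizer and heuristics might have resolved it.
record = {
    "Car": 1001,
    "Year": "97", "Make": "CHEV", "Model": "Cavalier",
    "Mileage": "7,000", "Price": "11,995", "PhoneNr": "566-3800",
    "Feature": ["Red", "5 spd"],  # many-valued, so it goes to CarFeature
}

conn = sqlite3.connect(":memory:")
# Column names and types are assumptions; the slide elides most of the scheme.
conn.execute("create table Car (Car integer, Year text, Make text, Model text,"
             " Mileage text, Price text, PhoneNr text)")
conn.execute("create table CarFeature (Car integer, Feature text)")

# Single-valued object sets become one row of Car.
single_valued = ["Car", "Year", "Make", "Model", "Mileage", "Price", "PhoneNr"]
conn.execute("insert into Car values (?, ?, ?, ?, ?, ?, ?)",
             [record[name] for name in single_valued])
# Each feature becomes its own row of CarFeature.
conn.executemany("insert into CarFeature values (?, ?)",
                 [(record["Car"], f) for f in record["Feature"]])

rows = conn.execute(
    "select Year, Make, Feature from Car join CarFeature using (Car)"
    " order by Feature").fetchall()
print(rows)
```

Keeping Feature in a separate table mirrors the slide's CarFeature relation, where one car may have any number of features.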
Heuristics
Keyword proximity
Subsumed and overlapping constants
Functional relationships
Nonfunctional relationships
First occurrence without constraint
'97 CHEV Cavalier, Red, 5 spd, only 7,000 miles on her. Previous owner heart broken! Asking only $11,995. #1415. JERRY SEINER MIDVALE, 566-3800, 566-3802
'97 CHEV Cavalier, Red, 5 spd, only 7,000 miles on her. Previous owner heart broken! Asking only $11,995. #1415. JERRY SEINER MIDVALE, 566-3800
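One of the heuristics, keyword proximity, might look roughly like the Python sketch below: when several constants could fill the same slot, prefer the one closest to a matching keyword. The function name `nearest_to_keyword`, the character-gap distance measure, and the generic number pattern are assumptions made for illustration; the slides do not specify how proximity is measured.

```python
import re

def nearest_to_keyword(text, constant_pattern, keyword_pattern):
    """Pick the constant match whose span lies closest to any keyword match."""
    constants = list(re.finditer(constant_pattern, text))
    keywords = list(re.finditer(keyword_pattern, text))
    if not constants or not keywords:
        return None

    def distance(c, k):
        # Gap in characters between two spans (0 if they touch or overlap).
        return max(c.start() - k.end(), k.start() - c.end(), 0)

    best = min(constants, key=lambda c: min(distance(c, k) for k in keywords))
    return best.group(0)

ad = ("'97 CHEV Cavalier, Red, 5 spd, only 7,000 miles on her. "
      "Previous owner heart broken! Asking only $11,995. #1415.")
number = r"\d{1,3},\d{3}"  # matches both 7,000 and 11,995 in this ad
mileage = nearest_to_keyword(ad, number, r"\bmiles\b|\bmi\b")
print(mileage)
```

Here both 7,000 and 11,995 match the number pattern, and the keyword "miles" sits next to 7,000, so that constant is selected as the mileage.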
Recall & Precision
\[ \text{Recall} = \frac{C}{N} \qquad \text{Precision} = \frac{C}{C + I} \]
N = number of facts in source
C = number of facts declared correctly
I = number of facts declared incorrectly
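The two ratios can be computed directly from the three counts. The short Python sketch below uses the definitions of N, C, and I given above; the function names and the sample counts are made up for illustration and are not results from the paper.

```python
def recall(correct, facts_in_source):
    """Recall = C / N: share of the source's facts that were declared correctly."""
    return correct / facts_in_source

def precision(correct, incorrect):
    """Precision = C / (C + I): share of declared facts that are correct."""
    return correct / (correct + incorrect)

# Hypothetical counts, chosen only to exercise the formulas.
N, C, I = 4, 3, 1
print(recall(C, N), precision(C, I))
```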