Snapshot of Semantic Web Commercial State of the Art (presented at Science on the Semantic Web, Rutgers, October 2002), Amit Sheth, CTO, Semagix Inc.
Snapshot of Semantic Web Commercial State of the Art
(presented at Science on the Semantic Web, Rutgers, October 2002)
Amit Sheth
CTO, Semagix Inc.
Large Scale Distributed Information Systems (LSDIS) Lab
It is interesting to note that SW = Software has moved to SW = Semantic Web.
Fundamental Issue
• Ontology creation and maintenance
– Human consensus + automatic KB (assertion) extraction
• Automatic Semantic Annotation
• Extremely fast computations exploiting semantic metadata
– Especially named relationships
Central Role of Metadata
• Produce / Aggregate: Where is the content? Whose is it?
• Catalog / Index: What is this content about?
• Integrate / Syndicate: What other content is it related to?
• Personalize: What is the right content for this user?
• Interactive Marketing: What is the best way to monetize this interaction?
Broadcast, Wireline, Wireless, Interactive TV
Semantic Metadata
Applications, Back End
"A Web content repository without metadata is like a library without an index." - Jack Jia, IWOV
"Metadata increases content value in each step of content value chain." - Amit Sheth
A Metadata Classification
• Data (heterogeneous types/media)
• Content Dependent Metadata (size, max colors, rows, columns...)
• Direct Content Based Metadata (inverted lists, document vectors, LSI)
• Domain Independent (structural) Metadata (C++ class-subclass relationships, HTML/SGML Document Type Definitions, C program structure...)
• Domain Specific Metadata (area, population (Census); land-cover, relief (GIS); metadata concept descriptions from ontologies)
Intelligent Content = What You Asked for + What you need to know!
[Figure labels: Syntax Metadata; Semantic Metadata; led by; Same entity; Human-assisted inference; Knowledge-based & Manual Associations]
Blended Semantic Browsing and Querying (Intelligence Analyst Workbench)
Innovations that affect User Experience
• BSBQ: Blended Semantic Browsing and Querying
– Ability to query and browse relevant desired content in a highly contextual manner
• Seamless access/processing of Content, Metadata and Knowledge
– Ability to retrieve relevant content, view related metadata, access relevant knowledge and switch between all of the above, allowing the user to follow their train of thought
• dACE: dynamic Automatic Content Enhancement
– Ability to provide enhanced annotation features, allowing the user to retrieve relevant knowledge about significant pieces of content during content consumption
• Semantic Engine APIs with XML output
– Ability to create customized APIs for the Semantic Engine involving Semantic Associations with XML output to cater to any user application
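To illustrate the last point, the semantic associations of an entity could be serialized to XML for a client application. This is a minimal sketch only: the function name, the element and attribute names (`entity`, `association`, `relationship`, `target`), and the sample data are hypothetical and do not reflect the actual Semagix API.

```python
import xml.etree.ElementTree as ET


def associations_to_xml(entity, associations):
    """Serialize (relationship, target) pairs for one entity as an XML string.

    The XML vocabulary here is invented for illustration.
    """
    root = ET.Element("entity", name=entity)
    for relationship, target in associations:
        ET.SubElement(root, "association",
                      relationship=relationship, target=target)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical sample data.
    print(associations_to_xml("Person A", [("appearsOn", "Watchlist X")]))
```

A consuming application can parse the string with any XML parser and walk the `association` elements.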
[Diagram labels: Visionics, AcSys, Security Portal; Check-in, Interrogation, Boarding Gate; Airport, Airspace; Semagix Ontology, Metabase; Threat Scoring; Gov’t Watchlists, News Media, Web Info; LexisNexis, RiskWise; Passenger Records, Reservation Data; Airline Data, Airport Data; Airline and Airport Data; Future and Current Risks; Airport LEO; ARC AvSec Manager; Data Management; Data Mining; IPG]
Sources Used
Knowledge Sources:
FBI - Most Wanted Terrorists
Denied Persons Lists
Terrorism Files
ICT
Office of Foreign Asset Control (OFAC)
Hamas terrorists
CNN Locations
FAA_Airport_Codes
About.com
Comtex_International
Hindustan Times
JerusalemPost
CNN
Newstrove_Hamas
Content Sources:
Africa News Service
AFX News – Asia/UK/Europe
AP Worldstream
Asia Pulse
BusinessWire
ComputerWire (CTW)
EFE News Services
FWN Select
Itar-TASS
Knight Ridder News (Open)
Knight-Ridder Open
M2 - International
M2 Airline Industry Information
New World Publishing
PR Newswire
PRLine (PRL)
Resource News International
RosBusiness
United Press International
UPI Spotlights
Semagix’s Semantic Technology enables flight authorities to:
- take a quick look at the passenger’s history
- check quickly if the passenger is on any official watchlist
- interpret and understand the passenger’s links to other organizations (possibly terrorist)
- verify if the passenger has boarded the flight from a “high risk” region
- verify if the passenger originally belongs to a “high risk” region
- check if the passenger’s name has been mentioned in any news article along with the name of a known bad guy
Interrogation Kiosk – Unique Advantages of Semagix
SmithJohn
Threat Score Components
LEXIS NEXIS ANNOTATION
Action: Information about or related to the passenger returned by Lexis Nexis is enhanced by linking important entities to Semagix’s rich ontology
Ability Proven: Ability to automatically aggregate relevant rich domain knowledge, recognize entities in a piece of text, and automatically correlate them with other data in the ontology to present a clear picture of the passenger to the flight official
Flight Country Check 45 0.15
Person Country Check 25 0.15
Nested Organizations Check 75 0.8
Aggregate Link Analysis Score: 17.7
LINK ANALYSIS
Action: Semantic analysis of the various components (watchlist, Lexis Nexis, ontology search, metabase search, etc.) to come up with an aggregate threat score for the passenger
Ability Proven: Ability to automatically aggregate relevant rich domain knowledge, recognize entities in a piece of text, automatically correlate them with other data in the ontology, and search for relevant content to present an overall idea of the threat level of the passenger, allowing the flight official to take quick action
appearsOn watchList:
FBI
ONTOLOGY SEARCH
Action: Semagix’s rich ontology is searched for this name, and associated information such as position, aliases, and relationships (past or present) of this name to other organizations, watchlists, countries, etc. is retrieved
Ability Proven: Ability to automatically aggregate relevant rich domain knowledge about a passenger and automatically correlate it with other data in the ontology to present a visual association picture to the flight official
METABASE SEARCH
Action: Semagix’s rich metabase is searched for this name, and associated content stories mentioning the passenger’s name are retrieved
Ability Proven: Ability to automatically aggregate and retrieve relevant content stories, field reports, etc. about the passenger that can be used by flight officials to determine if the passenger has any connections with known bad people or organizations
WATCHLIST ANALYSIS
Action: Semagix’s rich ontology is automatically searched for the possible appearance of this name on any of the watchlists
Ability Proven: Ability to automatically aggregate relevant rich domain knowledge, correlate it, and rank the threat factors to indicate the threat level of the passenger on the watchlist front
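The watchlist checks described above amount to following named relationships in the ontology. The following is a toy sketch, assuming the ontology can be viewed as (subject, relationship, object) triples. The relationship name `appearsOn` echoes the label shown earlier, while `memberOf`, `affiliatedWith`, the function names, and all entities are invented for illustration and are not Semagix's schema or engine.

```python
from collections import defaultdict

# Toy ontology as (subject, relationship, object) triples.
# Every entity and every relationship except "appearsOn" is hypothetical.
TRIPLES = [
    ("Person A", "appearsOn", "Watchlist X"),
    ("Person B", "memberOf", "Org 1"),
    ("Org 1", "appearsOn", "Watchlist X"),
    ("Person C", "memberOf", "Org 2"),
    ("Org 2", "affiliatedWith", "Org 1"),
]


def build_index(triples):
    """Index objects by (subject, relationship) for constant-time lookups."""
    index = defaultdict(set)
    for subject, relationship, obj in triples:
        index[(subject, relationship)].add(obj)
    return index


INDEX = build_index(TRIPLES)


def objects(subject, relationship):
    """Return all objects linked to `subject` by the named relationship."""
    return INDEX.get((subject, relationship), set())


def direct_watchlists(person):
    """Watchlists the person appears on directly."""
    return set(objects(person, "appearsOn"))


def organization_watchlists(person):
    """Watchlists reached through the person's organizations."""
    hits = set()
    for org in objects(person, "memberOf"):
        hits |= objects(org, "appearsOn")
    return hits


def nested_organization_watchlists(person):
    """Watchlists reached through organizations related to the person's organizations."""
    hits = set()
    for org in objects(person, "memberOf"):
        for related in objects(org, "affiliatedWith"):
            hits |= objects(related, "appearsOn")
    return hits
```

Each check is a fixed number of lookups over named relationships, which is the kind of computation the deck attributes to semantic metadata.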
What it would take an RDBMS to support the flight security application
Link Analysis Component | # Queries (Voquette) | # Queries (RDBMS) | Time (Voquette) | Time (RDBMS)
Direct Watchlist Match (person name)
  look up person entity | 1 CACS Request | 5-10 SQL Queries | .05 sec | 5-10 sec
  retrieve person's relationships to watchlists | 1 SQL Query | 1 SQL Query | .005 sec | .005 sec
Organization Watchlist Match (person name, organization name)
  look up person entity | 1 CACS Request | 5-10 SQL Queries | .05 sec | 5-10 sec
  retrieve person's relationships to organizations | 1 SQL Query | 1 SQL Query | .005 sec | .005 sec
  retrieve the organizations' relationships to watchlists | 1 SQL Query | 1 SQL Query | .005 sec | .005 sec
  look up organization entity | 1 CACS Request | 5-10 SQL Queries | .05 sec | 5-10 sec
  retrieve the organizations' relationships to watchlists | 1 SQL Query | 1 SQL Query | .005 sec | .005 sec
Nested Organization Watchlist Match (person name, organization name)
  look up organization entity | 1 CACS Request | 5-10 SQL Queries | .05 sec | 5-10 sec
  retrieve the organization's relationships to organizations | 1 SQL Query | 1 SQL Query | .005 sec | .005 sec
  retrieve the organizations' relationships to watchlists | 1 SQL Query | 1 SQL Query | .005 sec | .005 sec
Flight Origin (country name)
  retrieve country entity | 1 SQL Query | 1 SQL Query | .005 sec | .005 sec
  see if country is on a list containing "high-risk" countries | 1 SQL Query | 1 SQL Query | .005 sec | .005 sec
Person Origin (person name)
  look up person entity | 1 CACS Request | 5-10 SQL Queries | .05 sec | 5-10 sec
  retrieve person's home country | 1 SQL Query | 1 SQL Query | .005 sec | .005 sec
  retrieve the organization's relationships to lists containing "high-risk" countries | 1 SQL Query | 1 SQL Query | .005 sec | .005 sec
Field Report Search (person name)
  perform SSE query for field reports that mention this person | 1 SSE Request | 2 SQL Queries | .03 sec | 5-30 sec
  retrieve a list of people associated with these field reports | 1 SQL Query | 1 SQL Query | .005 sec | .005 sec
  determine which people are on watchlists, terrorists, etc. | 1 SQL Query | 1 SQL Query | .005 sec | .005 sec
Total | 18 requests | 39-64 SQL Queries | .33 sec | 30-80 sec
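As a cross-check on the totals, the per-step figures can be tallied. The sketch below is an illustration only: the dictionary layout and names are assumptions, and the numbers are copied from the rows above. It reproduces 18 requests and 39-64 SQL queries. The Voquette times sum to about .34 sec, close to the .33 sec shown, and the RDBMS times come to roughly 30-80 sec.

```python
# Cost profiles copied from the comparison rows; ranges are (low, high).
CACS = {"rdbms_queries": (5, 10), "semagix_sec": 0.05, "rdbms_sec": (5.0, 10.0)}
SSE = {"rdbms_queries": (2, 2), "semagix_sec": 0.03, "rdbms_sec": (5.0, 30.0)}
SQL = {"rdbms_queries": (1, 1), "semagix_sec": 0.005, "rdbms_sec": (0.005, 0.005)}

# One entry per step, in the order the steps are listed for each component.
STEPS = {
    "Direct Watchlist Match": [CACS, SQL],
    "Organization Watchlist Match": [CACS, SQL, SQL, CACS, SQL],
    "Nested Organization Watchlist Match": [CACS, SQL, SQL],
    "Flight Origin": [SQL, SQL],
    "Person Origin": [CACS, SQL, SQL],
    "Field Report Search": [SSE, SQL, SQL],
}


def totals(steps):
    """Sum request counts, query counts, and times over all steps."""
    flat = [step for group in steps.values() for step in group]
    return {
        "requests": len(flat),
        "rdbms_queries": (
            sum(s["rdbms_queries"][0] for s in flat),
            sum(s["rdbms_queries"][1] for s in flat),
        ),
        "semagix_sec": round(sum(s["semagix_sec"] for s in flat), 3),
        "rdbms_sec": (
            round(sum(s["rdbms_sec"][0] for s in flat), 2),
            round(sum(s["rdbms_sec"][1] for s in flat), 2),
        ),
    }


if __name__ == "__main__":
    print(totals(STEPS))
```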
Query Comparison: Semagix vs. RDBMS
Performance
Population/update rate in an ontology with 1 million entities/relationships: > 10,000 entities/relationships per hr.
Incremental index update frequency: 1 minute (near real-time)