Databases & Information Retrieval
Maya Ramanath
(Further Reading: Combining Database and Information-Retrieval Techniques for Knowledge Discovery. G. Weikum, G. Kasneci, M. Ramanath and F.M. Suchanek, CACM, April 2009
DB & IR: Both Sides Now. G. Weikum, Keynote at SIGMOD 2007)
DB and IR: Different Motivations
• Both deal with large amounts of information, but…
              DB                              IR
Applications  online reservation, banking     libraries
Emphasis      data consistency, efficiency    result quality, user satisfaction
Data          structured records              unstructured text
Queries       precise                         interpretations vary
Results       exact match / all results       ranked / top-k results
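The last two rows of the comparison can be made concrete with a small sketch. The flight records, the document strings, and the term-count scoring below are all hypothetical, chosen only to contrast a precise query that returns every exact match with a keyword query that returns a ranked top-k list.

```python
import heapq

# Hypothetical structured records (DB side).
FLIGHTS = [
    {"id": "F1", "destination": "Paris", "price": 150},
    {"id": "F2", "destination": "Paris", "price": 320},
    {"id": "F3", "destination": "Rome", "price": 90},
]

# Hypothetical unstructured texts (IR side).
DOCUMENTS = [
    "database systems and transaction processing",
    "information retrieval and web search",
    "retrieval models for text retrieval",
]

def exact_match(records, destination, max_price):
    """DB style: a precise predicate; every matching record is returned, unranked."""
    return [r for r in records
            if r["destination"] == destination and r["price"] <= max_price]

def top_k(documents, query, k):
    """IR style: score each document by query-term occurrences, return the best k."""
    terms = query.lower().split()

    def score(doc):
        words = doc.lower().split()
        return sum(words.count(t) for t in terms)

    ranked = heapq.nlargest(k, documents, key=score)
    return [d for d in ranked if score(d) > 0]

if __name__ == "__main__":
    print(exact_match(FLIGHTS, "Paris", 200))
    print(top_k(DOCUMENTS, "text retrieval", k=2))
```

The term-count score is a stand-in for a real relevance model; the point is only that the IR side orders results by a score and cuts off at k, while the DB side applies a Boolean test.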
Why Combine Now?
• The applications drive the need
– The need to manage both structured and unstructured data in an integrated manner
• Healthcare example
– Find young patients in central Europe who have been reported, in the last two weeks, to have symptoms of tropical virus diseases and an indication of anomalies.
• Newspaper archives, product catalogues, etc.
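The healthcare query mixes structured conditions (age, region, report date) with free-text conditions (symptoms, anomalies). A minimal sketch of such a combined query follows; the patient records, the age threshold for "young", the country list for "central Europe", and the keyword-count score are all assumptions made for illustration, not part of the slides.

```python
from datetime import date, timedelta

# Hypothetical records: structured fields plus a free-text report.
PATIENTS = [
    {"id": 1, "age": 24, "country": "Austria", "reported": date(2009, 4, 10),
     "report": "high fever and rash, possible dengue virus, liver anomalies"},
    {"id": 2, "age": 67, "country": "Austria", "reported": date(2009, 4, 11),
     "report": "fever, suspected dengue virus"},
    {"id": 3, "age": 19, "country": "Germany", "reported": date(2009, 4, 12),
     "report": "mild fever"},
    {"id": 4, "age": 22, "country": "Brazil", "reported": date(2009, 4, 12),
     "report": "fever and virus anomalies"},
]

# Assumed reading of "central Europe"; the slides do not define it.
CENTRAL_EUROPE = {"Austria", "Germany", "Switzerland", "Poland", "Hungary"}

def search(patients, keywords, today, max_age=30, days=14):
    """Filter on structured fields (DB part), then rank by keyword hits (IR part)."""
    cutoff = today - timedelta(days=days)
    hits = []
    for p in patients:
        # DB part: exact predicates on structured attributes.
        if (p["age"] > max_age or p["country"] not in CENTRAL_EUROPE
                or p["reported"] < cutoff):
            continue
        # IR part: score the free-text report by keyword occurrences.
        words = p["report"].lower().replace(",", " ").split()
        score = sum(words.count(k) for k in keywords)
        if score > 0:
            hits.append((score, p["id"]))
    return sorted(hits, reverse=True)

if __name__ == "__main__":
    print(search(PATIENTS, ["fever", "virus", "anomalies"], today=date(2009, 4, 15)))
```

Here the structured predicates prune candidates exactly, and only the surviving records are ranked by the text score.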
Integrating DB & IR
[Figure: DB Systems (structured queries / Boolean-match results, SQL) and IR Systems, shown with the integration topics: top-k processing, keyword search on graphs, extracting entities and relationships, ranking for entities]
• Easy to query with keywords, instead of SQL/XQuery/SPARQL
• Results are the top-k interconnections between the keywords
3. Keyword Search on Graphs (2/3)
3. Keyword Search on Graphs (3/3)
Query: “Einstein”, “Bohr”
[Figure: graph with nodes Einstein, Bohr, Nobel Prize, vegetarian, Tom Cruise and 1962, and edges labeled isa, isa, bornIn, diedIn, won, won]
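The idea of returning the top-k interconnections between keywords can be sketched on the example graph. The edge assignments below are an assumed reconstruction of the slide's figure (the transcript preserves only node and edge labels), and ranking connections by path length is one simple choice, not necessarily the scoring used in the talk.

```python
from collections import defaultdict

# Assumed reconstruction of the figure: (subject, edge label, object).
TRIPLES = [
    ("Einstein", "won", "Nobel Prize"),
    ("Bohr", "won", "Nobel Prize"),
    ("Einstein", "isa", "vegetarian"),
    ("Tom Cruise", "isa", "vegetarian"),
    ("Tom Cruise", "bornIn", "1962"),
    ("Bohr", "diedIn", "1962"),
]

def build_adjacency(triples):
    """Treat every labeled edge as undirected for the purpose of connecting keywords."""
    adj = defaultdict(list)
    for subj, label, obj in triples:
        adj[subj].append((label, obj))
        adj[obj].append((label, subj))
    return adj

def top_k_connections(triples, source, target, k=2, max_edges=6):
    """Enumerate simple paths from source to target and return the k shortest.

    Each path alternates node, edge label, node, ...
    """
    adj = build_adjacency(triples)
    found = []
    stack = [(source, [source])]
    while stack:
        node, path = stack.pop()
        if node == target:
            found.append(path)
            continue
        if (len(path) - 1) // 2 >= max_edges:
            continue
        for label, nxt in adj[node]:
            if nxt not in path[::2]:  # nodes sit at even positions; avoid cycles
                stack.append((nxt, path + [label, nxt]))
    found.sort(key=len)
    return found[:k]

if __name__ == "__main__":
    for path in top_k_connections(TRIPLES, "Einstein", "Bohr"):
        print(" ".join(path))
```

Under these assumed edges, the shortest connection runs through Nobel Prize, and the longer one runs through vegetarian, Tom Cruise and 1962, which is why a ranking over connections is needed.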
4. Entity and Relationship Extraction (1/2)
Information Extraction (or Knowledge Harvesting)
Bill Gates was the founder of Microsoft and later its CEO.
Apple was established on April 1, 1976 by Steve Jobs, Steve Wozniak, and Ronald Wayne.
Infosys was founded on 2 July 1981 by seven entrepreneurs: N. R. Narayana Murthy, Nandan Nilekani, …
Company    Founder
Microsoft  Bill Gates
Apple      Steve Jobs
Apple      Steve Wozniak
Infosys    N. R. Narayana Murthy
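One way to picture the step from sentences to (Company, Founder) rows is a pair of hand-written extraction rules. The two regular expressions below are illustrative assumptions that fit the first two example sentences only; real extractors need far more robust patterns or learned models.

```python
import re

SENTENCES = [
    "Bill Gates was the founder of Microsoft and later its CEO.",
    "Apple was established on April 1, 1976 by Steve Jobs, Steve Wozniak, and Ronald Wayne.",
]

# A capitalized name of one or more words (hypothetical, deliberately naive).
NAME = r"[A-Z][\w.]*(?: [A-Z][\w.]*)*"

# Rule 1: "<person> was the founder of <company>"
FOUNDER_OF = re.compile(rf"(?P<founder>{NAME}) was the founder of (?P<company>[A-Z]\w*)")

# Rule 2: "<company> was established/founded on <date> by <list of people>."
FOUNDED_BY = re.compile(
    r"(?P<company>[A-Z]\w*) was (?:established|founded) on .*? by (?P<founders>.+?)\.$"
)

def extract_facts(sentences):
    """Apply both rules to each sentence and collect (company, founder) pairs."""
    facts = []
    for sentence in sentences:
        text = sentence.strip()
        match = FOUNDER_OF.search(text)
        if match:
            facts.append((match["company"], match["founder"]))
        match = FOUNDED_BY.search(text)
        if match:
            # Split "A, B, and C" into individual names.
            for name in re.split(r",\s*(?:and\s+)?|\s+and\s+", match["founders"]):
                facts.append((match["company"], name))
    return facts

if __name__ == "__main__":
    for company, founder in extract_facts(SENTENCES):
        print(company, "|", founder)
```

Unlike the sample table, this sketch also emits a row for Ronald Wayne, because the second rule lists every name after "by".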
4. Entity and Relationship Extraction (2/2)
• How to build a knowledge base of facts?
– Structure Wikipedia
– Construct rules for extraction
• How do I acquire all the facts in the world?
– Extract “everything”
– Don’t stop extracting
5. Ranking and Structured Data
• Not the same as top-k processing
• Given: Data with structure in it
– Relational tables (flat)
– XML (trees/graphs)
– Text documents consisting of entities
• Task: Rank the query results
– SQL/XQuery/“typed” keywords