Entity Search: Are you searching for what you want? Kevin C. Chang. Joint work with: Bin He, Zhen Zhang, Chengkai Li, Govind Kabra, Shui-Lung Chuang, Joe Kelley, Tao Cheng, Bill Davis, Mitesh Patel, Dave Killian
Jan 02, 2016
2
What have you been reading lately?
Let’s start with the new universal greeting…
What have you been searching lately?
3
From the MetaQuerier to WISDM: I am becoming superficial…
Access
Structure
Deep Web Surface Web
Kevin’s 2 projects in the 4 quadrants:
4
First Question:
Where is U. of Illinois?
Can we search it?
5
What have you been searching lately?
The university and area of Kevin Chang?
The email of Marc Snir?
Customer service phone number of Amazon?
What profs are doing databases at UIUC?
The papers and presentations of ICDE 2007?
Due date of SIGMOD 2007?
Sale price of “Canon PowerShot A400”?
“Hamlet” books available at bookstores?
6
Are we searching for what we want?
Challenge of the surface Web: Despite all the glorious search engines…
7
What you search is not what you want.
8
Function follows view:
What is “the Web”? Or: How do search engines view the Web?
9
They say: Web is a corpus of PAGES.
10
We take an entity view of the Web:
11
What is an “entity”? Your target of information – or, anything.
Phone number
Email address
PDF
Image
Person name
Book title, author, …
Price (of something)
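To make the entity view concrete, here is a minimal sketch of pulling typed entity occurrences out of page text. The regular expressions, the confidence values, and the `EntityOccurrence` type are all assumptions for illustration; the slides do not say how WISDM tags entities.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class EntityOccurrence:
    entity_type: str   # e.g. "email", "phone", "price"
    value: str         # the matched text
    position: int      # character offset in the page
    confidence: float  # how sure the tagger is (made-up numbers below)


# Hypothetical patterns and confidences, chosen only for this example.
PATTERNS = {
    "email": (re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"), 0.95),
    "phone": (re.compile(r"\(?\d{3}\)?[ -]\d{3}-\d{4}"), 0.80),
    "price": (re.compile(r"\$\d+(?:\.\d{2})?"), 0.90),
}


def extract_entities(text):
    """Return all entity occurrences in text, ordered by position."""
    found = []
    for entity_type, (pattern, confidence) in PATTERNS.items():
        for match in pattern.finditer(text):
            found.append(
                EntityOccurrence(entity_type, match.group(), match.start(), confidence)
            )
    return sorted(found, key=lambda e: e.position)


page = "Contact: jdoe@example.edu, (217) 555-0100. Hamlet costs $10.99."
for occurrence in extract_entities(page):
    print(occurrence)
```

The point of the sketch is only that a page becomes a bag of typed, positioned, uncertain entities rather than a single retrievable unit.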
12
From pages to entities
Traditional Search    Entity Search
13
Demo. We build Ver. 0.1, to understand the promises and issues.
Three scenarios:
Academic: CS sites, DBLP homepages.
ECommerce: Books, Cellphones.
Yellowpage: Comprehensive corpus.
14
Special Thanks: Data from Stanford WebBase.
15
Example application: Question answering
Q: Who are DB profs at UIUC?
WISDM
query: #dtf-nnuw100(#entity(professor) #entity(university) #entity(research Database Systems, Data Mining, IR))
results: ranked list of (<prof, univ, research>, )
Query Generation
Querying
Filtering & Validation
A: Geneva Belford, Kevin C. Chang, AnHai Doan, Jiawei Han, Marianne Winslett, ChengXiang Zhai
16
Example application: Relation construction
<prof, phone, email>
prof: Winslett, email: [email protected]
prof: DeWitt, email: [email protected]
…
WISDM
tagging: #entity(prof)
query: #tf-nnow50(#entity(professor) #tf-nnuw20(#entity(email) #entity(phone)))
results: ranked list of (<prof, phone, email>, )
App-specific Entity Tagging
Querying
Relation Construction
17
Example application: Best-effort integration
Price of “Hamlet”?
WISDM
query: #od50(#entity(title Hamlet) #entity(price))
results: ranked list of (<title, price>, )
Buy.com: $10.99, Amazon.com: $12.00, …
Query Generation
Querying
Validation & Ranking
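A rough sketch of what a pattern like #od50(title, price) might do on one page, assuming #od50 means "in this order, within 50 tokens" (by analogy with ordered-window operators in other query languages; the slides do not define it). The tokenizer, the price pattern, and the function name are hypothetical.

```python
import re
import string

# Hypothetical price recognizer for this example only.
PRICE = re.compile(r"^\$\d+(?:\.\d{2})?$")


def ordered_window_matches(tokens, keyword, window=50):
    """Pair each occurrence of keyword with the first price token that
    follows it within `window` tokens (ordered-window reading of #od50)."""
    results = []
    for i, token in enumerate(tokens):
        # Compare the keyword ignoring surrounding punctuation and case.
        if token.strip(string.punctuation + "“”").lower() != keyword.lower():
            continue
        for j in range(i + 1, min(i + 1 + window, len(tokens))):
            candidate = tokens[j].rstrip(",.;")
            if PRICE.match(candidate):
                results.append((keyword, candidate))
                break
    return results


page = "Hamlet by William Shakespeare paperback our price $10.99 free shipping"
print(ordered_window_matches(page.split(), "Hamlet", window=50))
```

Running the same matcher over many store pages and collecting the pairs is what would feed the validation and ranking step; that part is not shown here.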
18
How different is “entity search”?
How to define such searches?
19
Why is Entity Search different…
Probabilistic entities vs. A page is for sure a page.
Contextual patterns vs. Match a page by its content.
Holistic aggregates vs. A page occurs only once.
Associative results vs. We never search for pairs of pages.
20
Consider the entire process: Page Retrieval
1. Input: pages.
2. Criteria: content keywords.
3. Scope: Each page itself.
4. Output: one page per result.
Marc Snir
Marc Snir
21
Entity search is thus different…
Entity Search
1. Input: probabilistic entities.
2. Criteria: contextual patterns.
3. Scope: holistic aggregates.
4. Output: associative results.
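One way to picture "holistic aggregates" and "associative results": the same tuple can be observed on many pages, and the result is the tuple with its support, not any single page. The sketch below simply counts distinct supporting pages per tuple. The page ids and tuples are made up, and plain counting is an assumption for illustration, not the ranking the system uses.

```python
from collections import Counter


def aggregate(observations):
    """observations: iterable of (page_id, tuple_of_entities).
    Returns [(tuple_of_entities, number_of_distinct_pages)], best first."""
    support = Counter()
    seen = set()
    for page_id, entity_tuple in observations:
        if (page_id, entity_tuple) in seen:
            continue  # count each page at most once per tuple
        seen.add((page_id, entity_tuple))
        support[entity_tuple] += 1
    return support.most_common()


# Made-up observations: (page, <prof, email>) pairs.
observations = [
    ("page1", ("Prof. A", "a@example.edu")),
    ("page2", ("Prof. A", "a@example.edu")),
    ("page2", ("Prof. A", "a@example.edu")),  # repeated on the same page
    ("page3", ("Prof. A", "b@example.edu")),
]
print(aggregate(observations))
```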
22
What are technical challenges?
Or, how to write (reviewer-friendly) papers?
23
Issue #1. EntityRank: How to rank entities?
Say, Jiawei Han with #email, #phone, #researcharea
Entity matters: Is “jhan@” an email? Is “2-3457” a phone?
Context matters: Order, distance
Frequency matters: How often is Jiawei Han – “data mining”?
Associativity matters: “[email protected]” “algorithm”
Source matters: Where did you get this info from?
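To show how several of these factors could enter one score, here is a toy scoring function. It is not the EntityRank formula, which the slides do not give. It assumes each observed tuple comes with per-entity extraction confidences, a token distance between the entities, and a per-page source weight, and it combines them by a simple product summed over pages. Associativity is not modeled. All names and numbers are hypothetical.

```python
from collections import defaultdict


def score_tuples(occurrences, source_weight):
    """occurrences: list of dicts with keys
         "tuple"       - the candidate result, e.g. (prof, email)
         "page"        - id of the page it was seen on
         "confidences" - extraction confidence of each entity (entity matters)
         "span"        - token distance between the entities (context matters)
    source_weight: page id -> weight (source matters); unknown pages get 1.0.
    Summing over occurrences rewards repeated evidence (frequency matters)."""
    scores = defaultdict(float)
    for occ in occurrences:
        entity_confidence = 1.0
        for confidence in occ["confidences"]:
            entity_confidence *= confidence
        context = 1.0 / (1.0 + occ["span"])  # closer entities score higher
        weight = source_weight.get(occ["page"], 1.0)
        scores[occ["tuple"]] += weight * entity_confidence * context
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


occurrences = [
    {"tuple": ("Prof. A", "a@example.edu"), "page": "p1", "confidences": [1.0, 1.0], "span": 1},
    {"tuple": ("Prof. A", "a@example.edu"), "page": "p2", "confidences": [1.0, 0.5], "span": 3},
    {"tuple": ("Prof. A", "b@example.edu"), "page": "p1", "confidences": [0.5, 0.5], "span": 0},
]
for entity_tuple, score in score_tuples(occurrences, {"p2": 2.0}):
    print(entity_tuple, score)
```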
24
Issue #2: Query Processing: How to optimize?
Q: #tf-nnow50(#entity(professor[David DeWitt]) fax #entity(phone))
[Figure: query plan with nodes phone, tf, nnow50, #entity(professor) with prof=“…”, and “fax”-#entity(phone) (pre-materialized context index)]
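As a sketch of why an index helps here: if the positions of entities and keywords are stored per document ahead of time, a window pattern can be answered by joining position lists instead of rescanning pages. The `ContextIndex` class below is a hypothetical, simplified positional index, and the ordered-within-window join is one assumed reading of the nnow50 operator; neither comes from the slides.

```python
from collections import defaultdict


class ContextIndex:
    """Toy positional index: key -> doc_id -> sorted [(position, value)]."""

    def __init__(self):
        self.postings = defaultdict(lambda: defaultdict(list))

    def add(self, key, doc_id, position, value):
        self.postings[key][doc_id].append((position, value))
        self.postings[key][doc_id].sort()

    def ordered_join(self, keys, window):
        """Return (doc_id, values) where the keys occur in the given order
        and the first and last match are at most `window` tokens apart."""
        docs = set(self.postings[keys[0]])
        for key in keys[1:]:
            docs &= set(self.postings[key])
        results = []
        for doc in sorted(docs):
            for start, first_value in self.postings[keys[0]][doc]:
                position, values = start, [first_value]
                for key in keys[1:]:
                    # First occurrence of the next key after the current position.
                    following = next(
                        ((p, v) for p, v in self.postings[key][doc] if p > position),
                        None,
                    )
                    if following is None:
                        break
                    position = following[0]
                    values.append(following[1])
                else:
                    if position - start <= window:
                        results.append((doc, tuple(values)))
        return results


index = ContextIndex()
# Made-up postings: token positions of tagged entities and a keyword.
index.add("#professor", "d1", 3, "Prof. X")
index.add("#phone", "d1", 5, "555-0199")   # appears before "fax", so skipped
index.add("fax", "d1", 10, "fax")
index.add("#phone", "d1", 12, "555-0100")
print(index.ordered_join(["#professor", "fax", "#phone"], window=50))
```

A real plan would also apply the selection on the professor's name and could store the “fax”-phone pairs themselves; the sketch keeps only the position join.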
25
Conclusion: One step at a time towards …
Integration Mining
Search
surface
deep
What You Search Is What You Want!
26
Thank You!
Chengkai Li, Zhen Zhang, Shui-Lung Chuang, Tao Cheng, Govind Kabra
And the warriors behind …
Arpit Jain, Amit Behal, David Killian, Yuping Tseng, Hanna Zhong, Ngoc Bui, Sonia Jahid, Aniruddh Nath, Paul Yuan, Raj Sodhi, Quoc Le, Hemanta Maji, Sung-Eun Kim