Data Intelligence Lab Yonsei University, Korea Seung-won Hwang http://dilab.yonsei.ac.kr/~swhwang

Apr 21, 2022

Page 1: Yonsei University, Korea Seung-won Hwang  ...

Data Intelligence Lab, Yonsei University, Korea

Seung-won Hwang, http://dilab.yonsei.ac.kr/~swhwang

Page 2

Research background

PhD (2005) work on the left-hand side

Recent work on the right-hand side

Page 3

Entity search as a platform

Browser Spreadsheet

Page 4

Entity search as a platform

Mobile search And beyond!!

Page 5

Knowledge Web vs. Semantic Web: human-readable vs. machine-readable contents

Humans define standards for data formats and models

Explicit and precise specification of knowledge representation that everyone has to agree upon

Machine reads human-readable contents

Machine learns to conflate different formats of the same thing

Latent and fuzzy representation of knowledge learned by mining big data

Excerpt from Kuansan Wang’s keynote slides at ICADL 2015

Page 6

Recent Work

Harvesting, completion (#1, #3): AAAI, ICDE, VLDB, VLDB Journal

Linking, multilingual linking (#2): ACL, EMNLP, ACM TOIS, IEEE TKDE

Performance: SIGIR, WSDM, VLDB


isA(Bordeaux, wine) = ??

isProperty(wine | age, texture, aroma) = 0.8

Verb?

황승원 黃升嫄

Page 7

Conflation: Graph


#1 Compute R_ij, the confidence that node G.i matches node G′.j

#2 Propagate the matching confidence of G.i and G′.j to their neighbors

#3 Repeat #1 and #2 until convergence
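The three steps above can be sketched as an iterate-until-convergence loop in the spirit of similarity flooding. This is a minimal sketch under stated assumptions, not the method from the cited papers: the update rule (blend the seed confidence with the average confidence of neighbor pairs, weighted by a hypothetical `alpha`), the function name `propagate_matches`, and the toy English/Korean graphs are all illustrative.

```python
from itertools import product


def propagate_matches(g1, g2, init, alpha=0.5, tol=1e-6, max_iter=100):
    """Iteratively refine R[(i, j)], the confidence that node i of g1 matches node j of g2.

    g1, g2: adjacency dicts mapping each node to a set of neighbor nodes.
    init:   seed confidences in [0, 1], e.g. from name or dictionary similarity.
    """
    pairs = list(product(g1, g2))
    r = {p: init.get(p, 0.0) for p in pairs}
    for _ in range(max_iter):
        new_r = {}
        for i, j in pairs:
            # #2: collect the confidences of all neighbor pairs (u of i, v of j).
            neighbor_pairs = [(u, v) for u in g1[i] for v in g2[j]]
            support = (sum(r[p] for p in neighbor_pairs) / len(neighbor_pairs)
                       if neighbor_pairs else 0.0)
            # #1 (recomputed): blend the seed confidence with neighbor support.
            new_r[(i, j)] = (1 - alpha) * init.get((i, j), 0.0) + alpha * support
        delta = max(abs(new_r[p] - r[p]) for p in pairs)
        r = new_r
        # #3: stop once the confidences no longer change.
        if delta < tol:
            break
    return r


g1 = {"Seattle": {"Space Needle"}, "Space Needle": {"Seattle"}}
g2 = {"시애틀": {"스페이스 니들"}, "스페이스 니들": {"시애틀"}}
seed = {("Seattle", "시애틀"): 0.9}  # hypothetical seed, e.g. a known translation pair
r = propagate_matches(g1, g2, seed)
print(round(r[("Space Needle", "스페이스 니들")], 3))  # 0.3
```

Only one pair is seeded, yet "Space Needle" ends up matched to "스페이스 니들" (0.3) while the cross pair "Space Needle"/"시애틀" stays at 0, because confidence flows only through structurally corresponding neighbors. Since each update averages old confidences scaled by `alpha < 1`, the iteration is a contraction and converges.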

Page 8

Search performance as a platform


Diverse software generates search queries

Consistent low latency is crucial

Figure: a prediction model produces a cost prediction for each query (e.g., “Microsoft”: long vs. short), which a resource manager then acts on

SIGIR 2014, WSDM 2015 (Best Paper Runner-up)

Page 9

Cost prediction features


Figure: inverted index for “Microsoft”, with postings Doc 1, Doc 2, Doc 3, …, Doc N-2, Doc N-1, Doc N sorted by static rank (highest to lowest); the legend marks postings as processed or not evaluated

Features: score distribution (mean, max, variance), #postings, etc.
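The feature list above can be made concrete with a small sketch. The `Posting` type, the function names, and the linear `predict_cost_ms` model are hypothetical stand-ins; the slide does not say which learner maps these features to a cost, so treat the weights as placeholders to be learned.

```python
import statistics
from dataclasses import dataclass


@dataclass
class Posting:
    doc_id: int
    static_rank: float  # postings are stored in decreasing static-rank order
    score: float        # query-dependent score of this document for the term


def cost_features(postings):
    """Features from the slide: score distribution (mean, max, var) and #postings."""
    scores = [p.score for p in postings]
    return {
        "num_postings": len(scores),
        "score_mean": statistics.fmean(scores) if scores else 0.0,
        "score_max": max(scores, default=0.0),
        "score_var": statistics.pvariance(scores) if scores else 0.0,
    }


def predict_cost_ms(features, weights, bias=0.0):
    """Hypothetical linear cost model; a real system would learn these weights."""
    return bias + sum(weights.get(name, 0.0) * value for name, value in features.items())


postings = [Posting(1, 0.9, 1.0), Posting(2, 0.7, 2.0), Posting(3, 0.4, 3.0)]
feats = cost_features(postings)
print(feats["num_postings"], feats["score_mean"], feats["score_max"])  # 3 2.0 3.0
print(predict_cost_ms(feats, {"num_postings": 0.5, "score_max": 1.0}, bias=1.0))  # 5.5
```

All features are cheap to precompute per term from the index, which is what makes them usable for predicting cost before the query is actually run.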

Page 10

Advanced features for automatic refinement


<Fields related to query execution plan> rank=BM25F enablefresh=1 partialmatch=1 language=en location=us …

<Fields related to search keywords> Redmond (MS or Microsoft)
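The two field groups above suggest features beyond the posting-list statistics. A minimal sketch, assuming the plan arrives as whitespace-separated `key=value` tokens like the example: the field names are copied from the slide, while `plan_features` and the chosen feature set (boolean flags, term and OR counts) are hypothetical.

```python
import re


def plan_features(plan: str, keywords: str) -> dict:
    """Turn query-plan fields and the keyword part of a query into model features."""
    # "key=value" tokens, e.g. rank=BM25F enablefresh=1 ...
    fields = dict(tok.split("=", 1) for tok in plan.split() if "=" in tok)
    tokens = re.findall(r"\w+", keywords)
    operators = {"and", "or", "not"}
    return {
        "rank": fields.get("rank", ""),
        "enablefresh": fields.get("enablefresh") == "1",
        "partialmatch": fields.get("partialmatch") == "1",
        "language": fields.get("language", ""),
        "num_terms": sum(1 for t in tokens if t.lower() not in operators),
        "num_or": sum(1 for t in tokens if t.lower() == "or"),
    }


print(plan_features("rank=BM25F enablefresh=1 partialmatch=1 language=en location=us",
                    "Redmond (MS or Microsoft)"))
```

For the slide's example this yields three terms (Redmond, MS, Microsoft) and one OR. Intuitively, options such as partial matching or an OR over a frequent term like Microsoft can change how much of the index is scanned.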

Page 11

Performance when deployed

Plot: response time (ms) vs. query arrival rate (QPS) for Sequential, Degree=3, and Predictive execution

50% throughput increase
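The three curves correspond to three parallelization policies. The toy simulation below illustrates why parallelizing only the predicted-long queries can help throughput. It assumes (hypothetically) a perfect cost predictor, ideal linear speedup, and an 80 ms threshold; none of these numbers come from the deployment above.

```python
def predictive(predicted_ms, threshold_ms=80.0, degree=3):
    """Parallelize only queries predicted to be long (hypothetical threshold)."""
    return degree if predicted_ms > threshold_ms else 1


def sequential(predicted_ms):
    return 1


def fixed_degree_3(predicted_ms):
    return 3


def simulate(costs_ms, policy):
    """Return (total threads used, worst-case latency) assuming ideal speedup."""
    degrees = [policy(c) for c in costs_ms]
    threads = sum(degrees)
    worst_latency = max(c / d for c, d in zip(costs_ms, degrees))
    return threads, worst_latency


costs = [10.0, 12.0, 15.0, 200.0]  # one long query dominates the tail
for policy in (sequential, fixed_degree_3, predictive):
    threads, worst = simulate(costs, policy)
    print(f"{policy.__name__}: threads={threads}, worst latency={worst:.1f} ms")
```

Running the code gives: sequential uses 4 threads with a 200.0 ms worst case; fixed degree 3 cuts the tail to 66.7 ms but uses 12 threads; predictive reaches the same 66.7 ms with only 6 threads. The threads saved on short queries are the capacity left for serving more queries per second.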

Page 12

Spatial KB and search as a platform

Devices as a producer/consumer of information

Location as a first-class context

Page 13

Page 14

Conflation for spatial entities [AAAI16, ICDM15]

KB harvesting, map translation

Intelligent query expansion (“seattle center” → “seattle center” OR “space needle” OR “Chihuly museum”)
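The expansion in the example above rewrites a place name into a disjunction of the place and the entities it contains. A minimal sketch, assuming a hypothetical `related` mapping that in practice would come from the harvested spatial KB; `expand_query` is an illustrative name, not the system's API.

```python
def expand_query(phrase, related):
    """OR the original phrase with related entity names (hypothetical KB lookup)."""
    alternatives = [phrase] + [r for r in related.get(phrase, []) if r != phrase]
    return " OR ".join(f'"{a}"' for a in alternatives)


related = {"seattle center": ["space needle", "Chihuly museum"]}
print(expand_query("seattle center", related))
# "seattle center" OR "space needle" OR "Chihuly museum"
```

The original phrase is always kept first, so documents that mention only "seattle center" still match; the expansion only adds recall.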

Page 15

Page 16

Performance [VLDB16]

Automatic query expansion: restaurant → restaurant OR banquet; “seattle center” → “seattle center” OR “space needle”

Multiple keywords → complex AND/OR combined with location

Example

T = ((restaurant OR banquet) AND (vegetarian OR halal)) OR ((hotel OR resort) AND wifi) OR …

S = user location (Seoul)
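To make the query semantics concrete, here is a brute-force baseline for a query (T, S): keep the objects whose keywords satisfy T, then return the k nearest to S. This is only a reference semantics, not the optimized processing the following slides discuss. The nested-tuple encoding of T, the planar coordinates, and the sample objects are hypothetical, and the `…` branch of T is left out.

```python
import math


def matches(expr, keywords):
    """Evaluate a nested ("AND" | "OR", ...) keyword expression against a keyword set."""
    if isinstance(expr, str):
        return expr in keywords
    op, *args = expr
    if op == "AND":
        return all(matches(a, keywords) for a in args)
    if op == "OR":
        return any(matches(a, keywords) for a in args)
    raise ValueError(f"unknown operator: {op}")


def spatial_keyword_knn(objects, t_expr, s, k):
    """Brute-force baseline: filter by T, then return the k objects closest to S."""
    hits = [o for o in objects if matches(t_expr, o["keywords"])]
    hits.sort(key=lambda o: math.dist(o["loc"], s))
    return hits[:k]


T = ("OR",
     ("AND", ("OR", "restaurant", "banquet"), ("OR", "vegetarian", "halal")),
     ("AND", ("OR", "hotel", "resort"), "wifi"))
objects = [
    {"name": "A", "loc": (0.0, 1.0), "keywords": {"restaurant", "halal"}},
    {"name": "B", "loc": (0.0, 0.5), "keywords": {"hotel", "wifi"}},
    {"name": "C", "loc": (0.0, 0.1), "keywords": {"restaurant"}},
    {"name": "D", "loc": (0.0, 2.0), "keywords": {"banquet", "vegetarian"}},
]
print([o["name"] for o in spatial_keyword_knn(objects, T, s=(0.0, 0.0), k=2)])  # ['B', 'A']
```

Object C is closest to S but is excluded because "restaurant" alone satisfies neither conjunct. The scan touches every object, which is exactly the cost that index-based plans try to avoid.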

Page 17

Additional technical challenges


S2I: text-first index, good at selective keyword predicates

IR-tree: augmented R-tree, good at selective spatial predicates

Crane: good at the narrow-necked vessel

Fox: good at the bowl
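The crane-and-fox analogy says neither index wins everywhere: the better one depends on which predicate is more selective. A minimal sketch of that rule of thumb, assuming hypothetical cardinality estimates are available; `choose_index` is an illustrative name, and a real optimizer would use a full cost model rather than this single comparison.

```python
def choose_index(keyword_matches, spatial_matches):
    """Pick the index whose leading predicate leaves fewer candidates to verify.

    keyword_matches: estimated objects satisfying the keyword predicate
    spatial_matches: estimated objects satisfying the spatial predicate
    """
    # S2I probes the keyword (text) side first; IR-tree descends the spatial side first.
    return "S2I" if keyword_matches <= spatial_matches else "IR-tree"


print(choose_index(keyword_matches=40, spatial_matches=30_000))    # S2I
print(choose_index(keyword_matches=250_000, spatial_matches=120))  # IR-tree
```

A rare keyword such as "halal" favors the text-first index, while a small search radius in a dense city favors the spatial tree.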

Page 18

Additional technical challenges

Challenges: cost model design

Exponentially many possible plans (solution space)

Efficient optimization

Theoretical guarantee

Our approach: measuring the problem (cost model)

Proposing the solution (optimization)

Page 19

Additional technical challenges

Base mapping (spatial keyword processing part) and intersection (keyword predicate processing part)

Optimized solution: the base mapping is optimized with the following five techniques.

Name | Space reduction (↑ better) | Alg. cost (↓ better) | Theoretic bound (↓ better)
T1 Single verification pop-up | 2^K | Linear | OPT
T2 Intersection push-down | 2^F | Linear | (5/3)^F × OPT
T3 Least selective intersection first | ∏_{i=1}^{N} M_i! · C_{M_i} | Sorting | OPT
T4 Modified Huffman union tree | C_N | Sorting | OPT
T5 Verification selection | 2^{Σ_{i=1}^{N} (M_i − 1)} | Exp. / Linear | OPT / 2× OPT
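T4 in the table above builds a union tree Huffman-style. The slide does not describe the modification, so the sketch below shows only the standard Huffman merge order it builds on: repeatedly union the two smallest inputs. The cost model is an assumption. Each union is charged s1 + s2, which is exact for disjoint posting lists and an upper bound otherwise. `huffman_union_plan` and the list sizes are hypothetical.

```python
import heapq
from itertools import count


def huffman_union_plan(list_sizes):
    """Order pairwise unions of posting lists Huffman-style (smallest two first).

    list_sizes: dict keyword -> posting-list length.
    Returns (plan, cost): plan is a nested tuple of keywords; cost sums the sizes
    of every intermediate union, treating each union's size as s1 + s2.
    """
    if not list_sizes:
        raise ValueError("need at least one posting list")
    tie = count()  # tie-breaker so heapq never compares plans of equal size
    heap = [(size, next(tie), name) for name, size in list_sizes.items()]
    heapq.heapify(heap)
    cost = 0
    while len(heap) > 1:
        s1, _, p1 = heapq.heappop(heap)
        s2, _, p2 = heapq.heappop(heap)
        cost += s1 + s2
        heapq.heappush(heap, (s1 + s2, next(tie), (p1, p2)))
    _, _, plan = heap[0]
    return plan, cost


print(huffman_union_plan({"wifi": 5, "halal": 10, "banquet": 20}))
# ((('wifi', 'halal'), 'banquet'), 50)
```

Merging the two largest lists first, (banquet ∪ halal) ∪ wifi, would cost 30 + 35 = 65 under the same model. The greedy smallest-first order is the classic optimal merge pattern.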

Page 20

Additional technical challenges


Base mapping [134.7 ms] vs. Optimized solution [1.8 ms]

Plot: response time (ms) at the 99th percentile and over all queries, for Base, kNN, and TopK queries; compared methods: Baseline-I, Baseline-P, Base mapping, Optimized solution

Up to 11 times faster

Figure: query plan trees over the spatial index lookup 𝒮𝐼(S), the keyword index lookups 𝒦𝐼(wifi), 𝒦𝐼(halal), 𝒦𝐼(banquet), and V nodes

Page 21

Thanks!!!

Understanding Emerging Spatial Entities, AAAI 2016

Fine-grained Semantic Conceptualization of FrameNet, AAAI 2016

Verb Pattern: A Probabilistic Semantic Representation of Verbs, AAAI 2016

Processing and Optimizing Main Memory Spatial-Keyword Queries, VLDB 2016

KSTR: Keyword-aware Skyline Travel Route Recommendation, ICDM 2015

Delayed-Dynamic-Selective (DDS) Prediction for Reducing Extreme Tail Latencies in Web Search, WSDM 2015 (Best Paper Runner-up)

Predictive Parallelization: Taming Tail Latencies in Web Search, SIGIR 2014

Overcoming Asymmetry in Entity Graphs, IEEE TKDE 2014

ARIA: Asymmetry-Resistant Instance Alignment, AAAI 2014

Bootstrapping Entity Translation on Weakly Comparable Corpora, ACL 2013

Entity Translation Mining from Comparable Corpora: Combining Graph Mapping with Corpus Latent Features, IEEE TKDE 2013

Efficient Entity Translation Mining: A Parallelized Graph Alignment Approach, ACM TOIS 2012

Web Scale Taxonomy Cleansing, VLDB 2011

Mining Entity Translations from Comparable Corpora: A Holistic Graph Mapping Approach, CIKM 2011

SocialSearch: Enhancing Entity Search with Social Network Matching, EDBT 2011


Any questions? Visit dilab.yonsei.ac.kr/~swhwang