Exemplar Queries: Knowledge Exploration using Information Graphs
Davide Mottin, University of Trento
August 20, 2015 @ RMIT University, Melbourne


Department of Information Engineering and Computer Science

2 Exemplar Queries: Knowledge Exploration Using Information Graphs – Davide Mottin @ RMIT

Short Bio

Education
• April 2015 – Now in the job market!: PhD in computer science from University of Trento
  • Thesis title: “Advanced Query Paradigms for the Novice User”
  • Advisors: Prof. Themis Palpanas, Prof. Yannis Velegrakis
• 2010/08: MSc/BSc in computer science

Working Experience
• 2012: Yahoo! Labs, Barcelona under Dr. Francesco Bonchi
• 2011: Microsoft Research, Beijing under Dr. Haixun Wang


Traditional Query Answering

owns=Search Engine, based=California, produces=Mobiles

Database


Hardly Expressible Queries

Query???

Does not know how to describe other companies

Database


The Exemplar Queries perspective

“I think the greatest way to learn is to learn by someone's example.”

Tobey Maguire


A different need


Existing Search Engines

acquisitions like Google Youtube

Yahoo!-Tumblr or Microsoft-Skype are not present as interesting acquisitions.

Cannot be solved by Related Queries [Boldi11,Bordino13] and Query Relaxation [Mottin13,Mishra09].


A new perspective


Exemplar Queries

Input: Qe, an example element of interest
Output: set of elements in the desired result set

Exemplar Query Evaluation
• evaluate Qe in a database D, finding a sample s
• find the set of elements a similar to s given a similarity relation

[PVLDB 2014, SIGMOD 2014 (Demo)]


Challenges

• Define the similarity between sample and answers
• Determine the best data model for the problem
• Find answers efficiently


Our Approach

Exemplar Queries
• The user query is an indication of the structure of the answers


Problem

Solution Overview [SIGMOD Record 2014]


General Solution

Input: User Query Q, an example of the expected results
Output: Set of expected results

Procedure:
- Detect the sample for the query Q
- Find the structures similar to the sample
- Rank the results


Data Model: Knowledge graph


Strict equality: Edge Isomorphism


S A1 A2


Similarity: Edge Isomorphism

D. Mottin et al. Exemplar queries: Give me an example of what you need. PVLDB, 7(5), 2014.


Solution

Input: User Query Q, an example of the expected results.
Output: Set of expected results

Procedure:
- Detect the sample for the query Q
- Prune the non-matching nodes
- Find the structures edge isomorphic to the sample
- Rank the results

subgraph isomorphism is NP-complete [Cook71]

Solution
1. IterativePruning: fast reject non-matching nodes


d-neighborhood

                 distance 1     distance 2
                 a   b   c      a   b   c
Graph node 1     2   0   0      1   2   1
Query node q1    1   0   0      0   1   1
Difference       1   0   0      1   1   0

Theorem


d-neighborhood

                 distance 1     distance 2
                 a   b   c      a   b   c
Graph node 2     1   1   1      2   1   0
Query node q1    1   0   0      0   1   1
Difference       0   1   1      2   0  -1

Theorem
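The theorem statement itself did not survive in the transcript. A minimal sketch of the check the two d-neighborhood examples suggest, assuming the rule is "a graph node can match a query node only if every entry of the difference (graph node minus query node) is non-negative". The function name `can_match` is hypothetical.

```python
def can_match(query_vec, node_vec):
    """Assumed pruning rule: keep the graph node only if, for every
    (distance, label) entry, it has at least as many neighbors as the query node."""
    return all(n - q >= 0 for q, n in zip(query_vec, node_vec))

# Vectors copied from the two slides: distance 1 (a, b, c), then distance 2 (a, b, c).
q1 = [1, 0, 0, 0, 1, 1]
graph_node_1 = [2, 0, 0, 1, 2, 1]
graph_node_2 = [1, 1, 1, 2, 1, 0]

print(can_match(q1, graph_node_1))  # all differences are >= 0
print(can_match(q1, graph_node_2))  # the last difference is -1
```

Under this assumption graph node 1 stays a candidate for q1, while graph node 2 is rejected because of its negative entry.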


The IterativePruning Algorithm

1. Start from a query node q
2. Match q with the graph nodes
3. For each adjacent node of q
4. Find nodes in the graph from the candidate map of q matching the edge
5. Repeat 2. with an adjacent node of q until all nodes have been visited

Theorem (Pruning Completeness)
No subgraph isomorphic solution is discarded by the IterativePruning algorithm
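The pruning loop can be sketched as a fixpoint over candidate sets. This is a simplified illustration in the spirit of the steps above, not the published algorithm: it assumes directed, labeled edges given as `(source, label, target)` triples, starts every query node with all graph nodes as candidates, and repeatedly drops candidates that lack a matching edge. The names `iterative_pruning`, `toy_graph`, `toy_query`, and the edge labels are hypothetical.

```python
from collections import defaultdict

def iterative_pruning(graph_edges, query_edges):
    """Return a candidate map: query node -> graph nodes that survive pruning."""
    successors = defaultdict(set)    # (node, label) -> targets
    predecessors = defaultdict(set)  # (node, label) -> sources
    nodes = set()
    for src, label, dst in graph_edges:
        successors[(src, label)].add(dst)
        predecessors[(dst, label)].add(src)
        nodes.update((src, dst))

    query_nodes = {n for src, _, dst in query_edges for n in (src, dst)}
    candidates = {q: set(nodes) for q in query_nodes}

    changed = True
    while changed:  # repeat until no candidate set shrinks
        changed = False
        for q_src, label, q_dst in query_edges:
            keep_src = {n for n in candidates[q_src]
                        if successors[(n, label)] & candidates[q_dst]}
            keep_dst = {n for n in candidates[q_dst]
                        if predecessors[(n, label)] & candidates[q_src]}
            if keep_src != candidates[q_src] or keep_dst != candidates[q_dst]:
                candidates[q_src], candidates[q_dst] = keep_src, keep_dst
                changed = True
    return candidates

# Toy data for illustration only.
toy_graph = [
    ("Google", "acquired", "YouTube"), ("Google", "based_in", "Menlo Park"),
    ("Yahoo!", "acquired", "Tumblr"), ("Yahoo!", "based_in", "Sunnyvale"),
    ("CBS", "acquired", "CNET"),
]
toy_query = [("q1", "acquired", "q2"), ("q1", "based_in", "q3")]
print(iterative_pruning(toy_graph, toy_query))
```

Only nodes that cannot take part in any match are removed, so the surviving candidates still have to be checked by the actual isomorphism search.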


Solution

Input: User Query Q, an example of the expected results.
Output: Set of expected results

Procedure:
- Detect the sample for the query Q
- Restrict the search space
- Prune the non-matching nodes
- Find the structures edge isomorphic to the sample
- Rank the results

subgraph isomorphism is NP-complete [Cook71]

Solution
1. IterativePruning: fast reject non-matching nodes
2. RelevantNeighborhood: restrict the search space to “near” nodes


Restricting the search space


S A1 A2

User Query

Idea
1. Not all the nodes are equally relevant
2. Nodes “far” from the query are less related


The Relevant Neighborhood Algorithm

Prune the search space by identifying the valuable portions:
• Based on an approximation of Personalized PageRank
• Transition matrix A with non-uniform edge weights based on inverse frequency

Procedure
1. Assign each node in the sample a fixed number of particles
2. Distribute the particles on neighbor nodes favoring sample edge-labels
3. Repeat 2 until the number of particles is less than a threshold
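A rough sketch of the particle procedure follows. It is an illustration under stated assumptions rather than the talk's implementation: edges are treated as undirected, each edge weight is the inverse frequency of its label, a fraction `1 - damping` of the particles at a node is credited to that node's score at every step, and shares smaller than `threshold` are discarded. The extra bias toward the sample's own edge labels is left out for brevity. All names and default values are hypothetical.

```python
from collections import Counter, defaultdict

def relevant_neighborhood(edges, sample_nodes, particles=100.0,
                          damping=0.85, threshold=1.0):
    """Spread particles from the sample nodes; return node -> accumulated score."""
    label_freq = Counter(label for _, label, _ in edges)
    adjacency = defaultdict(list)
    for src, label, dst in edges:
        weight = 1.0 / label_freq[label]  # rare labels get heavier edges
        adjacency[src].append((dst, weight))
        adjacency[dst].append((src, weight))

    scores = defaultdict(float)
    frontier = {node: particles for node in sample_nodes}
    while frontier:
        next_frontier = defaultdict(float)
        for node, mass in frontier.items():
            scores[node] += (1.0 - damping) * mass
            total = sum(w for _, w in adjacency[node])
            for neighbor, w in adjacency[node]:
                share = damping * mass * w / total
                if share >= threshold:  # drop negligible particle groups
                    next_frontier[neighbor] += share
        frontier = next_frontier
    return dict(scores)

# Toy path graph a - b - c - d; the sample is the single node "a".
toy_edges = [("a", "x", "b"), ("b", "x", "c"), ("c", "y", "d")]
toy_scores = relevant_neighborhood(toy_edges, ["a"])
print(sorted(toy_scores, key=toy_scores.get, reverse=True))
```

Nodes whose score ends up below some cut-off can then be excluded from the search, which is the "restrict the search space" step.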


Similarity: Simulation

D. Mottin et al. Exemplar queries: a New Way of Searching. Submitted for publication.


Strict equality: Edge Isomorphism

S A1 A2

Why are Yahoo! and Tumblr not present?


More freedom: Simulation

S A1 A2

Tumblr matches both an acquisition and a website

Match edge-label sequences instead of structures


Algorithms for Simulation

• Use Strong Simulation [Ma14], with:
  • bounded matchings
  • node-topology preserving

Issue: Strong Simulation preserves node labels
Idea: Apply the Strong Simulation algorithm on a graph where edges become nodes with label equal to the original edge.

Pruning:
• d-neighborhood becomes a boolean vector
• a node matches a query node if the boolean AND between the two vectors is positive

Theorem
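The "edges become nodes" idea can be illustrated with a small transformation. This is a sketch under assumptions: each labeled edge `(source, label, target)` is replaced by a new node carrying the edge label, connected as source → new node → target, and the original nodes are left unlabeled (`None`). The function name `edges_to_nodes` and this exact wiring are hypothetical; the slides do not spell out the construction.

```python
def edges_to_nodes(edges):
    """Turn every labeled edge into a labeled node placed between its endpoints."""
    node_labels = {}
    new_edges = []
    for i, (src, label, dst) in enumerate(edges):
        edge_node = ("edge", i)            # fresh identifier for the edge-node
        node_labels[edge_node] = label     # the edge label becomes a node label
        node_labels.setdefault(src, None)  # original nodes carry no label here
        node_labels.setdefault(dst, None)
        new_edges.append((src, edge_node))
        new_edges.append((edge_node, dst))
    return node_labels, new_edges

labels, unlabeled_edges = edges_to_nodes([("Yahoo!", "acquired", "Tumblr")])
print(labels)
print(unlabeled_edges)
```

A node-label-based matcher such as strong simulation can then run on the transformed graph and effectively compare edge labels.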


Ranking results


S A1 A2

User Query

Google Yahoo! CBS

Combination of two factors
1. Structural: similarity of two nodes in terms of neighbor relationships
2. Distance-based: the PageRank already computed


Experimental Setup

Dataset
• Freebase: 76M nodes, 314M edges (entire!)
• Freebase Internet Domain: 2M nodes, 6M edges
• Synthetic datasets
• Testset: 100 queries manually mapped from AOL query logs
• Baseline: NeMa [Khan13]: approximate answers on graphs

Measures
• Algorithms total time
• User study asking to evaluate the usefulness of our approach


Scalability results (10M nodes)


Time
• RelevantNeighborhood is stable on the number of answers
• <150ms to get the answers


Usefulness

Quality
• 92% of people say that Exemplar Queries are useful
• 62% already had the need for such a service

Comparison
Which method is preferred?
• 64% Exemplar Queries
• 30% Other approaches


Simulation vs Isomorphism


Analysis
• Simulation finds more answers (up to 48%) but aggregates results
• Isomorphism runs faster than simulation (fewer operations on simple queries)


Qualitative Evaluation


Query: Google – YouTube – Menlo Park

Approximate Graph Query Answering [Khan13]

Edge Isomorphism

Simulation

Answers are collapsed

More interesting answers


Size increment for Simulation

25% to 46% more edges than isomorphism: Answers are collapsed


Dealing with too many results

“One of the effects of living with electric information is that we live habitually in a state of information overload. There's always more than you can cope with.”

Marshall McLuhan


Result Refinement


Information overload


I want to know about IT company acquisitions


Too many results to visualize


Dealing with Information Overload

• Faceted Search
  • present aspects of the results [Roy08]
• Query reformulation
  • Modify some of the query conditions
  • In structured databases [Mishra09]
  • In web search [Dang10]

First study of the problem on graphs


Graph Search


Graph Query Reformulation

Results

Query

Reformulations: query supergraphs

Exponential number of reformulations


Challenges

• The number of reformulations is exponential
• Quantify the interestingness of a reformulation
• Finding query reformulations is NP-complete


A Naïve Approach: k-most frequent super-graphs

Query

480 matches

450 matches

100 matches

Supergraphs

30 matches
420 matches

Until k reformulations are found:
- Retrieve the most frequent super-pattern

Frequent ≠ Interesting!


Our Approach

Graph Query Reformulation with Diversity
• Finds k meaningful reformulations efficiently

D. Mottin, F. Bonchi, F. Gullo. Graph Query Reformulation with Diversity, KDD 2015.


Finding meaningful Reformulations

Results

Query

Find k meaningful reformulations:
1. Coverage: span all the results
2. Diversity: present different aspects of the results

?


Diversity Matters

Results

Query

Objective function f(Q)

λ = 1
• Non optimal: f({Q1’,Q2’}) = 7
• Optimal: f({Q3’,Q4’}) = 8
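The formula for f(Q) is not in the transcript. A minimal sketch of one plausible form, assuming coverage is the number of results a reformulation matches and diversity between two reformulations is the size of the symmetric difference of their result sets, combined as coverage plus λ times pairwise diversity. These definitions and the toy result sets are assumptions and are not meant to reproduce the values 7 and 8 on the slide.

```python
from itertools import combinations

def objective(result_sets, lam=1.0):
    """Assumed objective: total coverage plus lam times total pairwise diversity."""
    coverage = sum(len(r) for r in result_sets)
    diversity = sum(len(a ^ b) for a, b in combinations(result_sets, 2))
    return coverage + lam * diversity

# Two reformulations that cover more results but overlap heavily ...
overlapping = [{1, 2, 3, 4}, {1, 2, 3, 5}]
# ... versus two that cover fewer results but share none.
disjoint = [{1, 2, 3}, {4, 5, 6}]

print(objective(overlapping, lam=1.0), objective(disjoint, lam=1.0))
print(objective(overlapping, lam=0.0), objective(disjoint, lam=0.0))
```

With λ = 0 only coverage counts and the overlapping pair scores higher; with λ = 1 the disjoint pair wins, which is the effect the slide title points at.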


Problem

Graph Query Reformulation with Diversity


Theorem (NP-hardness)
The problem is NP-hard, by reduction from the MAX-SUM Diversification Problem

[KDD 2015]


Solution: Greedy Algorithm

Greedy
While k reformulations are not found
1. Find the reformulation leading to the maximum increment of the objective function (marginal gain)
2. Add the reformulation to the results

Theorem
The algorithm is a ½-approximation

Finding the maximum gain is #P-complete [Valiant79]

Solution
Fast_MMPG: Branch and bound algorithm with quality guarantees
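The greedy loop can be sketched over an explicit list of candidate reformulations. This is only an illustration: it assumes each candidate is given with its result set, that the marginal gain of a candidate is its coverage plus λ times its symmetric-difference diversity against the reformulations already chosen, and that all candidates can be enumerated (which is exactly what Fast_MMPG avoids). No approximation guarantee is claimed for this sketch, and all names are hypothetical.

```python
def marginal_gain(candidate, chosen_sets, lam):
    """Assumed gain: coverage of the candidate plus lam * diversity vs. chosen ones."""
    return len(candidate) + lam * sum(len(candidate ^ c) for c in chosen_sets)

def greedy_reformulations(candidates, k, lam=1.0):
    """Pick up to k reformulation names, each time taking the largest marginal gain."""
    chosen = []
    remaining = dict(candidates)
    while remaining and len(chosen) < k:
        chosen_sets = [candidates[name] for name in chosen]
        best = max(remaining,
                   key=lambda name: marginal_gain(remaining[name], chosen_sets, lam))
        chosen.append(best)
        del remaining[best]
    return chosen

toy_candidates = {"Q1": {1, 2, 3, 4}, "Q2": {1, 2, 3, 5}, "Q3": {6, 7, 8}}
print(greedy_reformulations(toy_candidates, k=2, lam=1.0))
print(greedy_reformulations(toy_candidates, k=2, lam=0.0))
```

With λ = 1 the second pick is the small but different Q3; with λ = 0 the loop behaves like a coverage-only baseline and picks Q2.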


The multiplicity vector

Results

0 0 0 0 0
1 1 0 0 0
2 2 1 1 0
2 2 2 2 0
2 3 3 3 1

Output set of reformulations
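A small sketch of how such a vector could be maintained, assuming the multiplicity of a result is the number of reformulations in the output set that match it. The four result sets below are an assumption: they were chosen so that adding them one by one reproduces the five vectors shown on the slide.

```python
def multiplicity_vector(result_ids, reformulation_results):
    """Count, for each result, how many chosen reformulations contain it."""
    return [sum(r in results for results in reformulation_results)
            for r in result_ids]

result_ids = ["r1", "r2", "r3", "r4", "r5"]
# Hypothetical result sets of four reformulations, in the order they are added.
added = [
    {"r1", "r2"},
    {"r1", "r2", "r3", "r4"},
    {"r3", "r4"},
    {"r2", "r3", "r4", "r5"},
]

history = [multiplicity_vector(result_ids, added[:i]) for i in range(len(added) + 1)]
for vector in history:
    print(vector)
```

Each row is the vector after one more reformulation joins the output set, starting from the empty set.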


Upper bound on the Marginal gain

Lemma
The marginal gain increases if the multiplicity of the considered item is where |Q| is the number of reformulations in the reformulated set constructed so far.

Upper bound : is the value of the objective function considering only results with multiplicity

Theorem


Upper bound

Results

0 0 0 0 0
1 2 1 1 1

Output set of reformulations

1 2 1 1 1


The Fast_MMPG Algorithm

Until the reformulation with the maximum upper bound and marginal gain is found
1. Expand the reformulation with the max upper bound
2. Prune reformulations with marginal gain smaller than the upper bound so far

upper bound
marginal gain


Experimental Setup

• Datasets:
  • AIDS: 10k chemical compounds
  • Financial: 17k transaction workflows
  • Web: 13k interactions with a recommender system
• Baseline algorithms:
  • k-freq: returns top-k frequent supergraphs of a query
  • LIndex: informative patterns index
• Experiments:
  • Time and objective function value varying k, query size, λ
  • Anecdotal
  • Scalability


Time Comparison

Number of reformulations (k)
1. k-freq runs only slightly faster
2. Time increases linearly in k
3. Fast_MMPG has real-time performance

Query size
1. Fast_MMPG comparable to k-freq
2. Time decreases with query size (fewer reformulations)


Objective function gain

Analysis
1. Lambda correctly moves the objective function towards diversity
2. k-freq only captures coverage


Qualitative evaluation

[Figure: chemical-structure reformulations of the query returned by k-freq and by Fast_MMPG]

Analysis
• k-freq finds reformulations of the same superquery
• Fast_MMPG returns reformulations with more diversified structures


Conclusions

Hardly Expressible Queries
• Exemplar Queries: user query is an example of the desired results
• Efficient algorithmic solution scaling on real knowledge graphs
• Study of 2 similarity measures for query answering

Information Overload
• Study of the problem in graph databases
• Principled objective function optimizing coverage and diversity
• Algorithmic solutions with quality guarantees


Other Studied Problems

“There are no right answers to wrong questions.”

Ursula K. Le Guin


Empty-Answer Problem

COMPANYDB

Company   Based       Revenue   Mobile   Search   Hardware   Cloud
Apple     Cupertino   $62B      0        0        0          1
Google    M.View      $80B      0        1        1          0
HP        Palo Alto   $30B      0        0        1          0
Yahoo!    Sunnyvale   $16B      0        1        0          0

query = Mobile, Search, Hardware

{}

No answer
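The empty answer can be reproduced with a few lines over the COMPANYDB rows. The sketch assumes the query is a conjunction (a company must have a 1 in every requested column) and shows the simplest possible relaxation, dropping one condition at a time. The function names are hypothetical, and this enumeration is not the interactive method discussed in the talk.

```python
# Boolean columns copied from the COMPANYDB table (Based and Revenue omitted).
COMPANYDB = [
    {"Company": "Apple", "Mobile": 0, "Search": 0, "Hardware": 0, "Cloud": 1},
    {"Company": "Google", "Mobile": 0, "Search": 1, "Hardware": 1, "Cloud": 0},
    {"Company": "HP", "Mobile": 0, "Search": 0, "Hardware": 1, "Cloud": 0},
    {"Company": "Yahoo!", "Mobile": 0, "Search": 1, "Hardware": 0, "Cloud": 0},
]

def answer(query, db=COMPANYDB):
    """Companies that satisfy every condition of the conjunctive query."""
    return [row["Company"] for row in db if all(row[attr] == 1 for attr in query)]

def single_relaxations(query, db=COMPANYDB):
    """Drop one condition at a time and report the answers of each relaxed query."""
    return {dropped: answer([a for a in query if a != dropped], db)
            for dropped in query}

query = ["Mobile", "Search", "Hardware"]
print(answer(query))              # no company satisfies all three conditions
print(single_relaxations(query))  # answers after dropping each condition in turn
```

On this table only dropping the Mobile condition yields a non-empty answer, which hints at why the choice of relaxation matters.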


Dealing with the Empty Answer Problem

• Ranking results based on user preferences
  • IR [Baeza11] and database solutions [Chaudhuri04]
• Query relaxation
  • Modify some of the query conditions [Mishra09]
  • (-) Suggests all the modifications together
  • (-) Does not take user feedback into account


Our Solution: Interactive Query Relaxation

• Suggests one relaxation at a time
• Takes user feedback into account
• Models user preferences
• Optimization-centric relaxation suggestions
  • User-centric (effort, relevance)
  • System-centric (profit)

[PVLDB 2013, SIGMOD 2014 (Demo)]


Conclusions

We propose

• Exemplar Query Framework on Information Graphs: user query is an example of the desired results

We study

• Exemplar Query Answering: efficiently answering and ranking of exemplar queries

• Graph Query Reformulation: provide insights of the exemplar query answers

We show

• Solutions scaling on real size information graphs

• Principled approaches with quality guarantee

• Practical applicability of the problem


Future Directions

Query reformulation in connected graphs
• Current: set of small graphs (simulated in big graphs)

Include user preferences
• In exemplar queries
• In graph query reformulation

Multiple exemplar queries
• Current: single exemplar queries
• With multiple exemplar queries the semantics changes


Questions?

Thank you!


Publications

Hardly Expressible Queries
• D. Mottin, M. Lissandrini, Y. Velegrakis, T. Palpanas. Exemplar queries: Give me an example of what you need. PVLDB, 7(5), 2014.
• D. Mottin, M. Lissandrini, Y. Velegrakis, T. Palpanas. Searching with XQ: the eXemplar Query Search Engine. SIGMOD, 2014.
• M. Lissandrini, D. Mottin, D. Papadimitriou, T. Palpanas, Y. Velegrakis. Unleashing the power of information graphs. SIGMOD Record, 43(4), 2014.
• D. Mottin, M. Lissandrini, Y. Velegrakis, T. Palpanas. Exemplar queries: A new Way of Searching. (under submission)

Information Overload
• D. Mottin, F. Bonchi, F. Gullo. Graph Query Reformulation with Diversity. KDD, 2015.

Empty-Answer
• D. Mottin, A. Marascu, S. B. Roy, G. Das, T. Palpanas, Y. Velegrakis. A probabilistic optimization framework for the empty-answer problem. PVLDB, 6(14), 2013.
• D. Mottin, A. Marascu, S. B. Roy, G. Das, T. Palpanas, Y. Velegrakis. IQR: An interactive query relaxation system for the empty-answer problem. SIGMOD, 2014.
• D. Mottin, A. Marascu, S. B. Roy, G. Das, T. Palpanas, Y. Velegrakis. A holistic and principled approach for the empty-answer problem. (under submission)


Bibliography

[Mishra09] C. Mishra and N. Koudas. Interactive query refinement. In EDBT, 2009.

[Roy08] S. Basu Roy, H. Wang, G. Das, U. Nambiar, and M. Mohania. Minimum-effort driven dynamic faceted search in structured databases. In CIKM, 2008.

[Chaudhuri04] S. Chaudhuri, G. Das, V. Hristidis, and G. Weikum. Probabilistic ranking of database query results. In VLDB, 2004.

[Baeza11] R. A. Baeza-Yates and B. A. Ribeiro-Neto. Modern Information Retrieval. 2011.

[Haveliwala02] T. H. Haveliwala. Topic-sensitive pagerank. In WWW, 2002.

[Cook71] S. A. Cook. The complexity of theorem-proving procedures. In Symposium on Theory of Computing, 1971.

[Ma14] S. Ma, Y. Cao, W. Fan, J. Huai, and T. Wo. Strong simulation: Capturing topology in graph pattern matching. TODS, 2014.


Bibliography

[Valiant79] Leslie G Valiant. The complexity of computing the permanent. Theoretical computer science, 1979.

[Dang10] V. Dang and W. B. Croft. Query reformulation using anchor text. In WSDM, 2010.

[Bordino13] I. Bordino, G. De F. Morales, I. Weber, and F. Bonchi. From machu picchu to rafting the urubamba river: anticipating information needs via the entity-query graph. In WSDM, 2013.

[Boldi11] P. Boldi, F. Bonchi, C. Castillo, and S. Vigna. Query reformulation mining: models, patterns, and applications. Information Retrieval, 2011.

[Khan13] A. Khan, Y. Wu, C. C. Aggarwal, and X. Yan. NeMa: Fast graph search with label similarity. PVLDB, 6(3), 2013.


Research Topics

Probabilistic databases
• Consider probabilistic knowledge bases to capture noise and uncertainty
• Propose solutions that cope with many world semantics
• Propose novel similarity measures for exemplar queries
• Define reformulations in a probabilistic fashion

Exemplar Query Answering Framework
• Study the problem of identifying exemplar queries need
• Propose solutions for keyword queries to graph samples
• Extend current solution with incomplete queries or multiple queries
• Include reformulation capabilities
• Study exemplar queries in other contexts (research papers, newspapers, …)


Back-up slides


RelevantNeighborhood
