Cross-Lingual Query Suggestion Using Query Logs of Different Languages, SIGIR 07

Dec 29, 2015



Abstract

• Query suggestion
  – To suggest relevant queries for a given query
  – To help users better specify their information needs
• Cross-Lingual Query Suggestion (CLQS):
  – For a query in one language, we suggest similar or relevant queries in other languages.
    • cross-lingual keyword bidding (Search Engine)
    • cross-language information retrieval (CLIR)


Introduction

• CLQS vs. Cross-Lingual Query Expansion
  – CLQS suggests full queries formulated by users in another language.
• The users of search engines
  – similar interests in the same period of time
  – queries on similar topics in different languages
• Key point
  – How to learn a similarity measure between two queries
  – MLQS: Term Co-Occurrence based MI and χ²


Estimating Cross-Lingual Query Similarity

• Discriminative Model for Estimating Cross-Lingual Query Similarity
• Monolingual Query Similarity Measure Based on Click-through Information
• Features Used for Learning Cross-Lingual Query Similarity Measure
  – Bilingual Dictionary
  – Parallel Corpora
  – Online Mining for Related Queries
  – Monolingual Query Suggestion
• Estimating Cross-lingual Query Similarity


Discriminative Model for Estimating Cross-Lingual Query Similarity – 1/2

– qf : a source language query

– qe : a target language query

– simML : Monolingual query similarity

– simCL : Cross-lingual query similarity

– Tqf : translation of qf in the target language


Discriminative Model for Estimating Cross-Lingual Query Similarity – 2/2

• Learning: LIBSVM regression algorithm
  – f : feature functions
  – φ : mapping feature space onto kernel space
  – w : weight vector in the kernel space
  – relevant vs. irrelevant
  – strongly relevant, weakly relevant or irrelevant
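
The equations that these symbols belong to were images in the deck and did not survive in the transcript. A reconstruction under stated assumptions: the cross-lingual similarity is fitted to the monolingual similarity between the translation of qf and qe, and the learned model is linear in kernel space. The exact typography is a guess; only the symbols qf, qe, Tqf, simML, simCL, f, φ and w come from the slides.

```latex
% Assumed target of the regression: monolingual similarity of the translation
\mathrm{sim}_{CL}(q_f, q_e) = \mathrm{sim}_{ML}(T_{q_f}, q_e)

% Assumed form of the learned model (f: features, \phi: kernel mapping, w: weights)
\mathrm{sim}_{CL}(q_f, q_e) = w \cdot \phi\bigl(f(q_f, q_e)\bigr)
```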




Monolingual Query Similarity Measure Based on Click-through Information

• click-through information in query logs [26]

• KN(x) : number of keywords in a query x

• RD(x) : number of clicked URLs for a query x

• α = 0.4, β = 0.6
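
The slide lists the two counts and the two weights, but the formula itself was an image. A minimal sketch, assuming the similarity is a weighted sum of keyword overlap and clicked-URL overlap, each normalised by the larger of the two counts. The function name `sim_ml` and the list-based inputs are illustrative and not taken from the slides.

```python
ALPHA, BETA = 0.4, 0.6  # weights quoted on the slide


def sim_ml(keywords_p, keywords_q, clicks_p, clicks_q, alpha=ALPHA, beta=BETA):
    """Assumed form: alpha * keyword overlap + beta * clicked-URL overlap."""
    kp, kq = set(keywords_p), set(keywords_q)
    cp, cq = set(clicks_p), set(clicks_q)
    kn = max(len(kp), len(kq))  # max(KN(p), KN(q))
    rd = max(len(cp), len(cq))  # max(RD(p), RD(q))
    keyword_part = len(kp & kq) / kn if kn else 0.0
    click_part = len(cp & cq) / rd if rd else 0.0
    return alpha * keyword_part + beta * click_part


# Two queries sharing one of two keywords and one of two clicked URLs
score = sim_ml(["cheap", "flights"], ["cheap", "hotels"], ["u1", "u2"], ["u2", "u3"])
```

With these weights, click overlap counts for more than keyword overlap, so two queries with no words in common can still be rated similar when users click the same URLs.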




1. Bilingual Dictionary – 1/2

– 120,000 unique entries (built in-house)
– Given an input query qf={wf1,wf2,…,wfn} (in source language)
– By bilingual dictionary D: D(wfi)={ti1,ti2,…,tim}
– C(x,y) is the number of queries in the log containing both x and y.
– C(x) is the number of queries in the log containing x.
– N is the total number of queries in the log.
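
The counts C(x,y), C(x) and N defined above are the inputs to a co-occurrence based mutual information score, whose formula was an image on the slide. A minimal sketch, assuming the usual form MI(x,y) = P(x,y) log(P(x,y) / (P(x)P(y))) with P estimated as count / N. The function name and the representation of the log as a list of term sets are illustrative.

```python
import math


def mutual_information(x, y, queries):
    """MI from query-log counts; `queries` is a list of sets of terms."""
    n = len(queries)                                        # N
    c_x = sum(1 for q in queries if x in q)                 # C(x)
    c_y = sum(1 for q in queries if y in q)                 # C(y)
    c_xy = sum(1 for q in queries if x in q and y in q)     # C(x, y)
    if n == 0 or c_xy == 0:
        return 0.0  # terms never co-occur: no evidence of cohesion
    p_x, p_y, p_xy = c_x / n, c_y / n, c_xy / n
    return p_xy * math.log(p_xy / (p_x * p_y))


toy_log = [{"a", "b"}, {"a", "b"}, {"c"}, {"d"}]
mi_ab = mutual_information("a", "b", toy_log)
```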


1. Bilingual Dictionary – 2/2

– The set of top-4 query translations is denoted as S(Tqf)

– For each T ∈ S(Tqf)
  • Retrieve all queries containing T in the target language and assign Sdict(T) as their value
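
How the top-4 translations are chosen is not spelled out in the transcript. One way to sketch it, assuming each candidate translation picks one dictionary entry per source word and is scored by summing a pairwise cohesion score (for example a mutual information score) over its terms. The brute-force enumeration and the names `top_translations` and `cohesion` are illustrative assumptions.

```python
from itertools import combinations, product


def top_translations(term_translations, cohesion, k=4):
    """Rank every one-translation-per-word combination by summed pairwise cohesion.

    term_translations: list of translation lists, one per source word.
    cohesion: function (term_a, term_b) -> float, supplied by the caller.
    Returns the k best (score, combination) pairs, best first.
    """
    scored = []
    for combo in product(*term_translations):
        score = sum(cohesion(a, b) for a, b in combinations(combo, 2))
        scored.append((score, combo))
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:k]


# Toy cohesion: only the pair (x1, y1) is known to co-occur
toy_cohesion = lambda a, b: 1.0 if {a, b} == {"x1", "y1"} else 0.0
ranked = top_translations([["x1", "x2"], ["y1", "y2"]], toy_cohesion)
```

Enumerating all combinations grows exponentially with query length, which is tolerable only because web queries are short.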


2. Parallel Corpora

– Given a pair of queries
  • qf : in the source language
  • qe : in the target language
– Bi-Directional Translation Score
  • IBM model 1 & GIZA++ tool
  • P(yj|xi) is the word-to-word translation probability
– The top 10 queries {qe} with the highest bi-directional translation scores with qf are retrieved from the query log
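
The scoring formula itself is missing from the transcript. A minimal sketch, assuming the standard IBM Model 1 sentence probability built from word-to-word probabilities P(yj|xi), and assuming the two directions are combined by a geometric mean. The translation tables here are toy dictionaries; in practice they would be estimated from a parallel corpus with a tool such as GIZA++.

```python
import math


def ibm1_prob(target, source, t_table, epsilon=1.0):
    """p(target | source) under IBM Model 1 with a NULL source word.

    t_table maps (target_word, source_word) -> P(target_word | source_word).
    """
    src = ["NULL"] + list(source)
    prob = epsilon / (len(src) ** len(target))
    for y in target:
        prob *= sum(t_table.get((y, x), 0.0) for x in src)
    return prob


def bidirectional_score(qf, qe, t_f_given_e, t_e_given_f):
    """Assumed combination: geometric mean of the two translation directions."""
    return math.sqrt(ibm1_prob(qf, qe, t_f_given_e) * ibm1_prob(qe, qf, t_e_given_f))


# Toy tables (hypothetical probabilities)
t_fe = {("maison", "house"): 0.8}
t_ef = {("house", "maison"): 0.5}
s = bidirectional_score(["maison"], ["house"], t_fe, t_ef)
```

Scoring in both directions penalises pairs where one query translates well into the other but not the reverse, which is common when one query is much more specific.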


3. Online Mining for Related Queries – 1/3

• Out-of-vocabulary (OOV) terms are a major knowledge bottleneck for query translation and CLIR
• Assumption:
  – If a query in the target language co-occurs with the source query in many web pages, the two are probably semantically related
  – However, the mined results contain a large amount of noise


3. Online Mining for Related Queries – 2/3

– Frequency in the Snippets
  • For example:
    – Given a query q = abc in the source language
    – By dictionary: a={a1,a2,a3}, b={b1,b2} and c={c1}
    – Web query: q ∧ (a1 ∨ a2 ∨ a3) ∧ (b1 ∨ b2) ∧ (c1), searched in the target language
    – From 700 snippets, the 10 most frequent target queries are kept
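
The example above can be sketched in two steps: build the boolean web query from the dictionary, then count which known target-language queries appear most often in the returned snippets. The `AND` / `OR` operator syntax, the substring matching, and both function names are assumptions for illustration; the slides do not say how snippets are matched against the log.

```python
from collections import Counter


def build_web_query(source_query, dictionary):
    """q AND (a1 OR a2 ...) AND (b1 OR b2 ...) ..., one group per source word."""
    parts = ['"' + source_query + '"']
    for word in source_query.split():
        translations = dictionary.get(word, [])
        if translations:  # words missing from the dictionary add no group
            parts.append("(" + " OR ".join(translations) + ")")
    return " AND ".join(parts)


def most_frequent_candidates(snippets, log_queries, k=10):
    """Return up to k target-language log queries found most often in the snippets."""
    counts = Counter()
    for snippet in snippets:
        text = snippet.lower()
        for q in log_queries:
            if q in text:
                counts[q] += 1
    return [q for q, _ in counts.most_common(k)]


web_query = build_web_query("a b c", {"a": ["a1", "a2", "a3"], "b": ["b1", "b2"], "c": ["c1"]})
found = most_frequent_candidates(
    ["Cheap flights to Paris", "Paris flights", "flights again", "hotel"],
    ["flights", "paris", "train"],
)
```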


3. Online Mining for Related Queries – 3/3

– Each query qe mined from the web is associated with the feature CODC (Co-Occurrence Double-Check) Measure, with value SCODC(qf,qe)


4. Monolingual Query Suggestion

• Q0 : candidate queries (in target language)
  – For each target query qe,
    • SQML(qe) : monolingual source query of qe
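
The definition of SQML(qe) was an image on the slide. A minimal sketch, assuming the monolingual source query is the candidate in Q0 with the highest monolingual similarity to qe, and that this similarity is then used as the feature value. The Jaccard word overlap used here is only a stand-in for the click-through based measure, and all function names are illustrative.

```python
def jaccard(query_a, query_b):
    """Stand-in monolingual similarity: word-set overlap (illustration only)."""
    a, b = set(query_a.split()), set(query_b.split())
    return len(a & b) / len(a | b) if a | b else 0.0


def monolingual_source_query(qe, candidates, sim_ml=jaccard):
    """Return (SQML(qe), similarity): the candidate in Q0 most similar to qe."""
    best = max(candidates, key=lambda c: sim_ml(qe, c))
    return best, sim_ml(qe, best)


q0 = ["cheap flights", "hotel paris"]
source, value = monolingual_source_query("cheap flights paris", q0)
```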




Estimating Cross-lingual Query Similarity

• Four categories of features are used to learn the cross-lingual query similarity.

• cross-lingual query similarity score
  – Learning: LIBSVM regression algorithm
    • f : feature functions
    • φ : mapping feature space onto kernel space
    • w : weight vector in the kernel space


Performance Evaluation – Log Data

• Data Resources:
  – MSN Search Engine
• French (source language) vs. English (target language)
  – A one-month English query log
  – 7 million unique English queries
  – Occurrence frequency more than 5
• 5,000 French queries
  – 4,171 queries have their translations in the English queries
  – 70% for training the weights of LIBSVM
  – 10% development data
  – 20% testing


Performance Evaluation - CLIR

• Data Resources:
  – TREC6 CLIR data (AP88-90 newswire, 750MB)
  – 25 short French-English query pairs (CL1-CL25)
    • average length of 3.3 words
    • match the web query logs used for training CLQS

[Figure: CLIR setup. Labels: Source Language (qf), CLQS, Target Language ({qe}), BM25, CLIR]


• CLQS


• CLIR


Conclusion

• Cross-lingual query suggestion

• Query Logs

• French to English

• TREC6 French to English CLIR task
  – CLQS demonstrates high quality