KIWI A Multi-Lingual Usage Consultation Tool based on Internet Searching Kumiko TANAKA-Ishii Masato YAMAMOTO Hiroshi NAKAGAWA Language Informatics Laboratory, University of Tokyo

Jan 07, 2016

Transcript
Page 1: KIWI A Multi-Lingual Usage Consultation Tool based on Internet Searching

KIWI: A Multi-Lingual Usage Consultation Tool based on Internet Searching

Kumiko TANAKA-Ishii, Masato YAMAMOTO, Hiroshi NAKAGAWA

Language Informatics Laboratory, University of Tokyo

Page 2: KIWI A Multi-Lingual Usage Consultation Tool based on Internet Searching

How do you say 「無線 LAN」 (wireless LAN) in French?

無線 = wireless

Never found in dictionaries…

Page 3: KIWI A Multi-Lingual Usage Consultation Tool based on Internet Searching

What could be done

1. Look up part of the key in a dictionary

   無線 (wireless) → sans fil

2. Enter the translation into a search engine

   

Page 4: KIWI A Multi-Lingual Usage Consultation Tool based on Internet Searching

What could be done

1. Look up part of the key in a dictionary

   無線 = sans fil

2. Enter the translation into a search engine

   Top 20: le réseau sans fil, le net sans fil, l’accès sans fil

Page 5: KIWI A Multi-Lingual Usage Consultation Tool based on Internet Searching

What could be done

1. Look up part of the key in a dictionary

   無線 = sans fil

2. Enter the translation into a search engine

   Sum up top 500: les réseaux sans fil, l’internet sans fil
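The "sum up" step can be pictured as a tally over result snippets. The following is only a minimal sketch, not the tool's actual implementation: it assumes the snippets are already available as plain strings, and the function name `tally_left_context` and the two-word window are hypothetical choices made for illustration.

```python
import re
from collections import Counter

def tally_left_context(snippets, key, n_words=2):
    """Count the word sequences (up to n_words long) that directly precede `key`."""
    # Hypothetical helper: capture 1..n_words words (letters, apostrophes)
    # that sit immediately before the key, starting at a word boundary.
    pattern = re.compile(
        r"\b((?:[\w'’]+\s+){1,%d})%s" % (n_words, re.escape(key)),
        re.IGNORECASE,
    )
    counts = Counter()
    for text in snippets:
        for match in pattern.finditer(text):
            phrase = (match.group(1) + key).lower()
            # Normalize internal whitespace before counting.
            counts[" ".join(phrase.split())] += 1
    return counts

# Made-up snippets, for illustration only (not real search engine output).
snippets = [
    "Les réseaux sans fil sont partout.",
    "Installer les réseaux sans fil",
    "l'internet sans fil arrive",
]
counts = tally_left_context(snippets, "sans fil")
```

With a few hundred snippets instead of three, `counts.most_common()` would give the kind of summed-up list the slide describes.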

Page 6: KIWI A Multi-Lingual Usage Consultation Tool based on Internet Searching

Similar occasions…

• Up-to-date expression: * sans fil → les réseaux sans fil
• Commonness of expression: (le réseau)/(les réseaux) sans fil
• Simple Q&A: * Zidane
• Grammar check
   - noun gender: un/une langage, un/une langue
   - preposition: discuter ?
   - articles: du/de Japon, du/de Nancy

Have clues but can’t remember exactly…

Page 7: KIWI A Multi-Lingual Usage Consultation Tool based on Internet Searching

Our Idea

• Multiple candidates: which one?
• Minority candidate: the 300th candidate?

Impossible to manually scan 500 candidates!

A tool for scanning search engine results: Kiwi

Page 8: KIWI A Multi-Lingual Usage Consultation Tool based on Internet Searching

Related work 1: www.webcorp.org.uk

The Web as Corpus (1999)

Page 9: KIWI A Multi-Lingual Usage Consultation Tool based on Internet Searching

Related work 1: www.webcorp.org.uk

The Web as Corpus

• English only
• Sums up fixed-length word sequences
• Slow!!

Page 10: KIWI A Multi-Lingual Usage Consultation Tool based on Internet Searching

Related work 2: Google Fight, Google Duel

• Compare the frequency of 2 phrases
• Multilingual

Page 11: KIWI A Multi-Lingual Usage Consultation Tool based on Internet Searching

Related work 2 : Google Fight, Google Duel

•Comparison of two phrases only

Page 12: KIWI A Multi-Lingual Usage Consultation Tool based on Internet Searching

Kiwi’s characteristics

• Flexible query
   - comparison: A/B
   - wild card: *A, A*B, B*

• Multilingual aspect
   - String-based processing (vs. language-dependent analysis)

User has clues

English web users: 36.5% (globstats 2002)
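The A/B comparison query can be illustrated with a small sketch. This is an assumption about how such a query might be handled, not a description of Kiwi itself: the slash syntax is read here as "alternatives within one whitespace-separated token", and the names `expand_alternatives` and `compare_alternatives` are hypothetical.

```python
import re
from itertools import product

def expand_alternatives(query):
    """Expand 'un/une langue' into ['un langue', 'une langue']."""
    # Each whitespace-separated token may offer several choices split by '/'.
    slots = [token.split("/") for token in query.split()]
    return [" ".join(choice) for choice in product(*slots)]

def compare_alternatives(snippets, query):
    """Count whole-phrase occurrences of every alternative in the snippets."""
    counts = {}
    for phrase in expand_alternatives(query):
        pattern = re.compile(r"\b%s\b" % re.escape(phrase), re.IGNORECASE)
        counts[phrase] = sum(len(pattern.findall(text)) for text in snippets)
    return counts

# Made-up snippets, for illustration only.
snippets = [
    "Une langue vivante",
    "apprendre une langue étrangère",
    "un langage de programmation",
]
gender_langue = compare_alternatives(snippets, "un/une langue")
gender_langage = compare_alternatives(snippets, "un/une langage")
```

The alternative with the higher count is the more common usage in the sampled text; string matching alone is enough, so the same sketch works for any language that the regular expression engine can tokenize at word boundaries.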

Page 13: KIWI A Multi-Lingual Usage Consultation Tool based on Internet Searching

Online Language Populations

http://www.glreach.com/globstats

Page 14: KIWI A Multi-Lingual Usage Consultation Tool based on Internet Searching

The Process

1. Obtain search results (summaries only)

2. Extract candidates (*A, A*)

3. Order candidates

Page 15: KIWI A Multi-Lingual Usage Consultation Tool based on Internet Searching

Characteristics of candidates (for an entry of the form A *)

• Frequent

• Moderately long

• Various succeeding characters

Page 16: KIWI A Multi-Lingual Usage Consultation Tool based on Internet Searching

Characteristics of candidates (for an entry of the form A *)

• Frequent → Ordering

• Moderately long → Ordering

• Various succeeding characters → Extraction

Page 17: KIWI A Multi-Lingual Usage Consultation Tool based on Internet Searching

Extraction: number of kinds of succeeding characters

[Figure: tree of the characters that follow the entry "human *". With longer context the branching degree decreases; where it increases again, the string is cut.]
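The cut rule in the figure can be sketched as follows. This is one possible reading of the diagram, under stated assumptions: the input is the list of text continuations found after the entry (for example, after "human "), the branching degree of a prefix is the number of distinct characters seen right after it, and a candidate is cut wherever that number rises compared with the previous, shorter prefix. The function names are hypothetical.

```python
from collections import defaultdict

def successor_kinds(continuations):
    """Map each prefix to the set of distinct characters that follow it."""
    kinds = defaultdict(set)
    for text in continuations:
        for i in range(len(text)):
            kinds[text[:i]].add(text[i])
    return kinds

def extract_candidates(continuations, max_len=30):
    """Cut a candidate wherever the branching degree increases."""
    kinds = successor_kinds(continuations)
    candidates = set()
    for text in continuations:
        prev = len(kinds.get("", ()))
        for i in range(1, min(len(text), max_len) + 1):
            prefix = text[:i]
            cur = len(kinds.get(prefix, ()))
            if cur > prev:
                # Branching went up: treat the prefix as a complete unit.
                cleaned = prefix.strip()
                if cleaned:
                    candidates.add(cleaned)
            prev = cur
    return candidates

# Made-up continuations of "human ", for illustration only.
continuations = [
    "nature is", "nature and", "nature of",
    "rights are", "rights in", "rights of",
    "being",
]
found = extract_candidates(continuations)
```

In this toy data the branching degree stays at 1 inside "nature" and "rights" and jumps to 3 after the following space, so both strings are cut there. "being" occurs only once, never branches, and yields no candidate. Because the rule looks only at characters, it needs no word segmentation, which is what makes it usable for Japanese, Chinese or Korean text as well.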

Page 18: KIWI A Multi-Lingual Usage Consultation Tool based on Internet Searching

Ordering

- Shorter strings are more frequent

  Ex. “international” includes “in”

Eval-fun(candidate) = freq(candidate) × log(length(candidate) + 1)

(empirically defined)
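The scoring function is simple enough to write out directly. A minimal sketch, assuming candidate frequencies are already counted; the slide does not state the base of the logarithm, so the natural logarithm is used here (the base only rescales the scores and does not change the ranking). The name `rank_candidates` and the counts are hypothetical.

```python
import math

def eval_fun(freq, candidate):
    """freq(candidate) * log(length(candidate) + 1), as given on the slide."""
    return freq * math.log(len(candidate) + 1)

def rank_candidates(counts):
    """Order candidates by descending score; `counts` maps candidate -> frequency."""
    return sorted(counts, key=lambda c: eval_fun(counts[c], c), reverse=True)

# Made-up frequencies: the short string is twice as frequent as the long one.
counts = {"in": 10, "international": 5}
ranking = rank_candidates(counts)
```

Here "in" scores 10 × log 3 ≈ 10.99 and "international" scores 5 × log 14 ≈ 13.20, so the longer string ranks first even though it is half as frequent. The length factor is what compensates for short substrings being counted inside every longer string that contains them.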

Page 19: KIWI A Multi-Lingual Usage Consultation Tool based on Internet Searching

Examples

german_demo_viewlet_swf.html

Page 20: KIWI A Multi-Lingual Usage Consultation Tool based on Internet Searching

German

“Atemwegssyndrom”

Other candidates

・ Respiratorische Syndrom

・ oder Chronische

Gesundheits ・ Erkrankung

Etc…

Page 21: KIWI A Multi-Lingual Usage Consultation Tool based on Internet Searching

Japanese

“ 重症急性呼吸器”( SARS)

Other candidates

・シックハウス (Sick Building)

・エコノミークラス (Economy class)

・慢性疲労 (Chronic Fatigue)

Etc…

Page 22: KIWI A Multi-Lingual Usage Consultation Tool based on Internet Searching

French

“Respiratoire Aigu Sévère”

Other candidates

・ de Marfan
・ Prémenstruel
・ de la classe économique

Etc…

Page 23: KIWI A Multi-Lingual Usage Consultation Tool based on Internet Searching

Chinese

“ 嚴重急性呼吸道”

Other candidates

・經前 ・電腦視覺 ・腕道 ・睡眠呼吸中止 ・後天免疫缺乏

Etc…

Page 24: KIWI A Multi-Lingual Usage Consultation Tool based on Internet Searching

Korean

“ 중증급성호흡기”

Other candidates

・급성호흡기 ・만성피로 ・ 과민성 대장

Etc…

Page 25: KIWI A Multi-Lingual Usage Consultation Tool based on Internet Searching

Evaluation: English collocations

Kiwi: 1000 matches totalized; examine top n (exact match)
Baseline: search engine results; top n (included or not)

Test set:
   tail: and so on → and so * (300)
   head: in spite of → * spite of (300)

Page 26: KIWI A Multi-Lingual Usage Consultation Tool based on Internet Searching

Results

[Figure: results for head and tail at n = 1, n = 10 and n = 1000; ③ is the upper bound of Kiwi]

Page 27: KIWI A Multi-Lingual Usage Consultation Tool based on Internet Searching

Results

③ − ②: Extraction error
② − ①: Ordering error
+ test set problem, Ex. be anxious for / to

≠ search engine

[Figure: results for head and tail at n = 1, n = 10 and n = 1000]

Page 28: KIWI A Multi-Lingual Usage Consultation Tool based on Internet Searching

Data amount: rank transition of best & correct candidate

[Figure: ranking against number of matches, with regions labeled "Insufficient" and "Sufficient"]

Page 29: KIWI A Multi-Lingual Usage Consultation Tool based on Internet Searching

Evaluation: using different search engines

                  Answer in Top 1   In Top 10   In candidates   Mean reciprocal
                  (n = 1)           (n = 10)    (n = 1000)      ranking
AltaVista head    77.0%             93.3%       97.0%           0.83
AllTheWeb head    74.8%             91.5%       97.6%           0.80
Google head       76.4%             92.7%       97.2%           0.82
AltaVista tail    78.5%             92.8%       96.3%           0.85
AllTheWeb tail    73.6%             93.2%       97.8%           0.80
Google tail       75.8%             93.8%       98.1%           0.82

The best scores are shown in red on the slide.
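For reference, the two kinds of measure reported here (answer within the top n, and mean reciprocal ranking) can be computed as in the sketch below. It assumes each test item is a pair of a ranked candidate list and one correct answer compared by exact match; the data and the function names are hypothetical, not the authors' evaluation code.

```python
def reciprocal_rank(ranked, answer):
    """1 / position of the answer in the ranked list, or 0 if it is absent."""
    for position, candidate in enumerate(ranked, start=1):
        if candidate == answer:
            return 1.0 / position
    return 0.0

def evaluate(runs, n_values=(1, 10)):
    """Top-n accuracy and mean reciprocal rank over (ranked, answer) pairs."""
    total = len(runs)
    scores = {}
    for n in n_values:
        hits = sum(1 for ranked, answer in runs if answer in ranked[:n])
        scores["top%d" % n] = hits / total
    scores["mrr"] = sum(reciprocal_rank(r, a) for r, a in runs) / total
    return scores

# Made-up test items: answer ranked first, ranked second, and missing.
runs = [
    (["on", "forth"], "on"),
    (["a", "of"], "of"),
    (["x"], "y"),
]
scores = evaluate(runs)
```

The three toy items contribute reciprocal ranks of 1, 1/2 and 0, so the mean reciprocal rank is 0.5, while one of three answers is in the top 1 and two of three are in the top 10.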

Page 30: KIWI A Multi-Lingual Usage Consultation Tool based on Internet Searching

Related work 3: NL based on search engines

Ex. Q&A, Brill et al. (2002)

Obtain Q&A answers from search engine results

already totalized results

What does this mean to NL?

Page 31: KIWI A Multi-Lingual Usage Consultation Tool based on Internet Searching

Comparison of Results (top n candidates)

                                      head                       tail
                                      n = 1   n = 3   n = 10     n = 1   n = 3   n = 10
Using different search engines
  AltaVista - AllTheWeb               87.4%   72.9%   56.2%      84.5%   71.0%   57.8%
  AllTheWeb - Google                  86.2%   75.1%   59.4%      87.0%   75.8%   61.8%
  Google - AltaVista                  87.8%   72.2%   57.9%      82.3%   75.5%   59.9%
Using different segments of search results (AltaVista)
  1st 1000 matches - next 1000        91.1%   69.1%   60.0%      87.0%   70.9%   59.9%
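The slide does not say how these percentages were computed. One plausible reading, shown here purely as an assumption, is the share of top-n candidates that two runs have in common, averaged over all queries. The function names and data below are hypothetical.

```python
def top_n_overlap(ranked_a, ranked_b, n):
    """Fraction of the top-n slots shared by two ranked candidate lists."""
    shared = set(ranked_a[:n]) & set(ranked_b[:n])
    return len(shared) / n

def mean_overlap(pairs, n):
    """Average top-n overlap over (ranked_a, ranked_b) pairs, one per query."""
    return sum(top_n_overlap(a, b, n) for a, b in pairs) / len(pairs)

# Made-up rankings for one query from two sources.
engine_a = ["on", "forth", "far"]
engine_b = ["on", "far", "much"]
overlap_1 = top_n_overlap(engine_a, engine_b, 1)
overlap_3 = top_n_overlap(engine_a, engine_b, 3)
```

Under this reading the two toy lists agree fully at n = 1 and share two of three candidates at n = 3, which mirrors the general pattern in the table: agreement is highest for the top candidate and drops as n grows.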

Page 32: KIWI A Multi-Lingual Usage Consultation Tool based on Internet Searching

Conclusion

• Usage consultation tool

   up-to-date expression, grammar check

• Totalize search engine results

• Multi-lingual & flexible entry

• String based candidate extraction and ordering

• Evaluation

Page 33: KIWI A Multi-Lingual Usage Consultation Tool based on Internet Searching

Thank you!

Demonstration at ACL

(demo session & Univ.Tokyo booth)

Please come!