Measuring Closeness of Search Engine - Identification of Outliers - Visualization of Closeness. Wang Hua (王化), fourth-year student, Department of Information Science.

Jan 21, 2016

Motivation

Too many search engines
  More than 20 major general-purpose engines
  Even more special-purpose engines

Simple aggregation of rankings is popular.

We address the need to quantify and visualize the closeness between search engines.

Too Many Search Engines with Different Policies

Major search engines: Yahoo, AltaVista, Google, Lycos, etc.

Distinct ranking policies: directory type, robot type, and PageRank type based on hyperlinks

Outline of Methods

Ranking

List distance measure

Distance between search engines

Ranking

Partial list
  The case for WWW web sites
  Top 100 list

List of results from search engines

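Footrule Distance among Ranking Lists

$\sigma, \tau$: ranking lists, where $\sigma(i)$ is the position of item $i$ in list $\sigma$

$\sum_i |\sigma(i) - \tau(i)|$

Example: [a,b,c,d,e] and [a,d,e,c,b]

Summing the position differences of a, b, c, d, e: 0+3+1+2+2 = 8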
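The footrule sum above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the study: the function name `footrule_distance` is hypothetical, and the sketch assumes both lists rank exactly the same items (the partial-list case of real top-100 results is not handled).

```python
def footrule_distance(sigma, tau):
    """Sum of absolute position differences between two rankings.

    Assumes (for this sketch) that both lists contain the same items.
    """
    if len(sigma) != len(tau) or set(sigma) != set(tau):
        raise ValueError("both rankings must contain the same items")
    # Position of every item in tau (1-based, as on the slide).
    pos_tau = {item: rank for rank, item in enumerate(tau, start=1)}
    return sum(abs(rank - pos_tau[item])
               for rank, item in enumerate(sigma, start=1))


print(footrule_distance(list("abcde"), list("adecb")))  # 8, as in the example
```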

Kendall-tau Distance

Definition [Dwork, WWW10, 2001]: counts the number of pairwise disagreements between two lists

$|\{(i, j) : i < j,\ \sigma(i) < \sigma(j) \text{ but } \tau(i) > \tau(j)\}|$

Example: [a,b,c,d] and [a,d,c,b]

6 pairs: (a,b) (a,c) (a,d) (b,c) (b,d) (c,d)

0+0+0+1+1+1 = 3
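The pairwise count above can be written directly as a loop over all pairs of items. The sketch below is illustrative only: the name `kendall_tau_distance` is hypothetical, it assumes both lists rank the same items, and it uses the straightforward quadratic approach.

```python
from itertools import combinations


def kendall_tau_distance(sigma, tau):
    """Number of item pairs that the two rankings order differently.

    Assumes (for this sketch) that both lists contain the same items.
    """
    pos_sigma = {item: rank for rank, item in enumerate(sigma)}
    pos_tau = {item: rank for rank, item in enumerate(tau)}
    disagreements = 0
    for x, y in combinations(sigma, 2):
        # The pair disagrees when the two rankings order x and y oppositely.
        if (pos_sigma[x] - pos_sigma[y]) * (pos_tau[x] - pos_tau[y]) < 0:
            disagreements += 1
    return disagreements


print(kendall_tau_distance(list("abcd"), list("adcb")))  # 3, as in the example
```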

Characteristics of the Distance

Kendall-tau has O(n log n)-time complexity

Satisfies the triangle inequality and the other properties of a distance
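One common way to reach the O(n log n) bound is to relabel one list by the positions of its items in the other list and count inversions with merge sort. The sketch below follows that idea; it is an assumption about how the bound can be achieved, not a description of the code used in the study, and the names `count_inversions` and `kendall_tau_fast` are hypothetical.

```python
def count_inversions(seq):
    """Return (sorted copy of seq, number of inversions) via merge sort."""
    if len(seq) <= 1:
        return list(seq), 0
    mid = len(seq) // 2
    left, inv_left = count_inversions(seq[:mid])
    right, inv_right = count_inversions(seq[mid:])
    merged = []
    inversions = inv_left + inv_right
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            # right[j] precedes every remaining element of left:
            # each such pair is one inversion.
            merged.append(right[j])
            j += 1
            inversions += len(left) - i
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, inversions


def kendall_tau_fast(sigma, tau):
    """Kendall-tau distance in O(n log n), assuming identical item sets."""
    pos_tau = {item: rank for rank, item in enumerate(tau)}
    relabeled = [pos_tau[item] for item in sigma]
    return count_inversions(relabeled)[1]


print(kendall_tau_fast(list("abcd"), list("adcb")))  # 3
```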

Matrix of Distance

Keyword = “university”

Engines   Dmos  Alta  Yahoo  Overture  Excite  Lycos  Aol   Sprinks  Galaxy
Dmos      -     441   100    132       121     190    213   211      42
Alta            -     490    737       574     895    915   100      720
Yahoo                 -      2324      2123    1349   879   1221     1766
Overture                     -         7162    7113   6254  945      312
Excite                                 -       8927   9699  282      192
Lycos                                          -      8712  462      354
Aol                                                   -     461      365
Sprinks                                                     -        123
Galaxy                                                               -

Table 4.2 The Closeness of Search Engines
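A matrix like this one can be built by applying a list distance to every pair of engines. The sketch below uses toy data: the engine names and rankings are hypothetical, the footrule distance stands in for whichever measure is chosen, and flagging the engine with the largest mean distance is only one simple assumption about how an outlier might be identified.

```python
from itertools import combinations


def footrule(sigma, tau):
    """Footrule distance, assuming both lists rank the same items."""
    pos_tau = {item: rank for rank, item in enumerate(tau)}
    return sum(abs(rank - pos_tau[item]) for rank, item in enumerate(sigma))


def distance_matrix(rankings, distance):
    """Upper-triangular matrix as a dict keyed by (engine_a, engine_b)."""
    return {(a, b): distance(rankings[a], rankings[b])
            for a, b in combinations(rankings, 2)}


def mean_distances(rankings, matrix):
    """Average distance from each engine to all the others."""
    totals = {name: 0 for name in rankings}
    for (a, b), d in matrix.items():
        totals[a] += d
        totals[b] += d
    others = len(rankings) - 1
    return {name: total / others for name, total in totals.items()}


# Hypothetical rankings, for illustration only.
toy_rankings = {
    "engine_A": ["p", "q", "r", "s"],
    "engine_B": ["p", "r", "q", "s"],
    "engine_C": ["s", "r", "q", "p"],
}
matrix = distance_matrix(toy_rankings, footrule)
means = mean_distances(toy_rankings, matrix)
outlier = max(means, key=means.get)
print(matrix)
print(outlier)  # engine_C is farthest from the others on average
```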

Visualization

Kernighan-Lin Algorithm

Kamada Spring Model

Comparison of the two methods

Kernighan-Lin Method

Brief explanation

Kernighan-Lin by Color Coding

Keyword 1 = “Totti”, Keyword 2 = “Nakata”

Kernighan-Lin by Color Coding

Keyword 1 = “Gucci”, Keyword 2 = “Hermes”

Kamada Spring Model

Brief explanation

An example

Kamada Spring Model

Keyword 1 = “Totti”, Keyword 2 = “Nakata”

Comparison of the two methods

Results

Distances between search engines vary from pair to pair.

Different fields show different characteristics.

Some search engines, such as Sprinks, are far away from the others.

Excite and Aol are close to each other in most cases.

Conclusion

Addressed the need to quantify and visualize the closeness between search engines.

Provide users with a GUI to see the closeness of search engines.

Help users select the proper search engines.

Help users see the features of each search engine in various fields.

Future Work

Use more search engines

Use both general-purpose and special-purpose search engines

Use hyperlinks to find the resemblance

Apply this idea to other fields