Truth Discovery with Multiple Conflicting Information Providers on the Web
Xiaoxin Yin, Jiawei Han, Philip S. Yu
Industrial and Government Track short paper

Advisor: Dr. Koh Jia-Ling
Speaker: Che-Wei Liang
Date: 2007.11.20

Dec 18, 2015


Outline

• Introduction
• Problem Definitions
• Computational Model
  – Web Site Trustworthiness and Fact Confidence
  – Iterative Computation
• Empirical Study
• Conclusions


Introduction

• The World Wide Web
  – has become a necessary part of our lives,
  – e.g., Amazon.com, ShopZilla.com.
• Is the World Wide Web always trustworthy?
  – There is no guarantee that the information on the web is correct.


Introduction

• Example 1: Authors of books
  [Figure: author listings for books, some annotated "incomplete!" and others "incorrect!"]


Introduction

• Ranking web pages
  – Pages are ranked by authority, based on hyperlinks.
  – e.g., authority-hub analysis, PageRank, and more general link-based analysis.
• Does the authority or popularity of a web site imply that its information is accurate?


Introduction

• Veracity problem
  – Discover the true fact about each object from conflicting information provided by multiple web sites.


Problem Definitions

• Definition 1: Confidence of facts
  – The confidence of a fact f, denoted s(f), is the probability that f is correct.
• Definition 2: Trustworthiness of web sites
  – The trustworthiness of a web site w, denoted t(w), is the expected confidence of the facts provided by w.


Problem Definitions

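• Facts may conflict with or support each other.
  – e.g., “Jennifer Widom” and “J. Widom” support each other.
• Concept of implication
  – imp(f1 → f2): the influence of f1 on the confidence of f2; it is positive if f1 supports f2 and negative if f1 conflicts with f2.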
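The slide only names the implication function. As an illustration, here is a hypothetical imp(f1 → f2) for author-name facts in Python. The matching rule (same last name, first names equal or one an initial of the other) and the fixed magnitude `strength = 0.5` are assumptions made for this sketch, not the paper's definition; `name_implication` and `_name_parts` are hypothetical names.

```python
def _name_parts(name: str) -> tuple[str, str]:
    """Split an author name into (first token, last token), lower-cased."""
    tokens = name.replace(".", " ").split()
    return tokens[0].lower(), tokens[-1].lower()


def name_implication(f1: str, f2: str, strength: float = 0.5) -> float:
    """Hypothetical imp(f1 -> f2) for two author-name facts about the same book.

    Positive (supportive) when the last names match and the first names are
    equal or one is the initial of the other; negative (conflicting) otherwise.
    """
    first1, last1 = _name_parts(f1)
    first2, last2 = _name_parts(f2)
    if last1 != last2:
        return -strength
    compatible = first1 == first2 or first1 == first2[0] or first2 == first1[0]
    return strength if compatible else -strength


print(name_implication("Jennifer Widom", "J. Widom"))    # 0.5  (supportive)
print(name_implication("Jennifer Widom", "John Smith"))  # -0.5 (conflicting)
```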


Basic heuristic

• Basic heuristics
  1. Usually there is only one true fact for a property of an object.
  2. This true fact appears to be the same or similar on different web sites.


Basic heuristic (cont.)

• Basic heuristics
  3. False facts on different web sites are less likely to be the same or similar.
  4. In a given domain, a web site that provides mostly true facts for many objects is likely to provide true facts for other objects as well.


Web Site Trustworthiness and Fact Confidence

• Trustworthiness t(w)

  $$t(w) = \frac{\sum_{f \in F(w)} s(f)}{|F(w)|}$$

  where F(w) is the set of facts provided by w.
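A minimal Python sketch of this step, assuming t(w) is the plain average of s(f) over F(w), i.e. the expected confidence of Definition 2. The function and argument names (`trustworthiness`, `site_facts`, `confidence`) and the sample values are hypothetical.

```python
from statistics import mean


def trustworthiness(site_facts: dict[str, list[str]],
                    confidence: dict[str, float]) -> dict[str, float]:
    """t(w) = (sum of s(f) for f in F(w)) / |F(w)|, for every web site w.

    site_facts maps each site w to F(w); confidence maps each fact f to s(f).
    Every site is assumed to provide at least one fact.
    """
    return {w: mean(confidence[f] for f in facts) for w, facts in site_facts.items()}


# Made-up confidences for two facts provided by two sites.
print(trustworthiness({"w1": ["f1", "f2"], "w2": ["f2"]}, {"f1": 1.0, "f2": 0.5}))
# {'w1': 0.75, 'w2': 0.5}
```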


Web Site Trustworthiness and Fact Confidence

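• It is more difficult to estimate the confidence of a fact.
  – It depends on the trustworthiness of the web sites that provide the fact and on the other facts about the same object.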


Web Site Trustworthiness and Fact Confidence

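• Simple case
  – f1 is the only fact about object o1, and it is provided by web sites w1 and w2.
  – Assume w1 and w2 are independent. Then f1 is wrong only if both w1 and w2 are wrong, so s(f1) = 1 − (1 − t(w1))(1 − t(w2)).
• Confidence s(f)

  $$s(f) = 1 - \prod_{w \in W(f)} \bigl(1 - t(w)\bigr)$$

  where W(f) is the set of web sites providing f.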
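A Python sketch of the simple case, assuming s(f) = 1 − ∏ (1 − t(w)) over the sites W(f) that provide f, with the sites erring independently. `fact_confidence` and the trust values are hypothetical.

```python
from math import prod


def fact_confidence(providers: list[str], trust: dict[str, float]) -> float:
    """s(f) = 1 - prod over w in W(f) of (1 - t(w)).

    Assumes the sites in W(f) err independently: f is wrong only if
    every site providing it is wrong.
    """
    return 1.0 - prod(1.0 - trust[w] for w in providers)


# Simple case: f1 is provided by w1 and w2 (trust values are made up).
trust = {"w1": 0.5, "w2": 0.5}
print(fact_confidence(["w1", "w2"], trust))  # 0.75
```

Each additional independent provider can only raise s(f), which matches the intuition that a fact repeated by many sites is more likely to be true.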


Web Site Trustworthiness and Fact Confidence

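• Trustworthiness score of a web site

  $$\tau(w) = -\ln\bigl(1 - t(w)\bigr)$$

• τ(w) ranges from 0 to +∞ and better characterizes how accurate w is.
  – e.g., t(w1) = 0.9 and t(w2) = 0.99:
    t(w2) = 1.1 × t(w1), but
    τ(w2) = 2 × τ(w1),
    since w2's error rate (1%) is one-tenth of w1's (10%).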
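The example can be checked numerically. This Python sketch assumes the score τ(w) = −ln(1 − t(w)); `trust_score` is a hypothetical name.

```python
import math


def trust_score(t: float) -> float:
    """tau(w) = -ln(1 - t(w)); maps t in [0, 1) to [0, +inf)."""
    return -math.log(1.0 - t)


tau1 = trust_score(0.9)   # t(w1) = 0.9
tau2 = trust_score(0.99)  # t(w2) = 0.99
print(round(0.99 / 0.9, 2), round(tau2 / tau1, 2))  # 1.1 2.0
```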


Web Site Trustworthiness and Fact Confidence

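• Confidence score of a fact

  $$\sigma(f) = -\ln\bigl(1 - s(f)\bigr)$$

  – Property: taking −ln of both sides of 1 − s(f) = ∏_{w ∈ W(f)} (1 − t(w)) gives

  $$\sigma(f) = \sum_{w \in W(f)} \tau(w)$$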
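A small Python check of the property, assuming σ(f) = −ln(1 − s(f)) and τ(w) = −ln(1 − t(w)): summing τ over the providers and converting back with s(f) = 1 − e^{−σ(f)} should give the same value as the product formula for s(f). The helper names and trust values are hypothetical.

```python
import math


def tau(t: float) -> float:
    """Trustworthiness score tau(w) = -ln(1 - t(w))."""
    return -math.log(1.0 - t)


def sigma_of_fact(site_trusts: list[float]) -> float:
    """Confidence score sigma(f) = sum of tau(w) over the sites W(f) providing f."""
    return sum(tau(t) for t in site_trusts)


def confidence_from_sigma(sigma: float) -> float:
    """Inverts sigma(f) = -ln(1 - s(f)), giving s(f) = 1 - exp(-sigma(f))."""
    return 1.0 - math.exp(-sigma)


site_trusts = [0.9, 0.8]  # made-up t(w) values for the sites in W(f)
via_scores = confidence_from_sigma(sigma_of_fact(site_trusts))
via_product = 1.0 - math.prod(1.0 - t for t in site_trusts)
print(math.isclose(via_scores, via_product))  # True
```

Working with sums of scores instead of products of probabilities is what makes the later adjustment (adding weighted scores of related facts) a simple linear step.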


Web Site Trustworthiness and Fact Confidence

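• Adjusted confidence score of a fact f

  $$\sigma^*(f) = \sigma(f) + \rho \cdot \sum_{o(f') = o(f),\ f' \neq f} \sigma(f') \cdot \mathrm{imp}(f' \to f)$$

  where o(f) is the object that f is about, and ρ (0 ≤ ρ ≤ 1) controls how strongly the other facts about the same object influence f.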
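A Python sketch of the adjustment, assuming σ*(f) = σ(f) + ρ · Σ σ(f') · imp(f' → f) over the other facts f' about the same object. The σ values and the imp table below are made up for illustration ("John Smith" is a hypothetical conflicting author), and `adjusted_sigma` is a hypothetical name.

```python
def adjusted_sigma(sigma: dict[str, float],
                   imp: dict[tuple[str, str], float],
                   rho: float = 0.5) -> dict[str, float]:
    """sigma*(f) = sigma(f) + rho * sum over f' != f of sigma(f') * imp(f' -> f).

    sigma holds the confidence scores of the facts about ONE object;
    imp[(g, f)] is imp(g -> f). Missing pairs are treated as 0 (no influence).
    """
    return {
        f: s + rho * sum(sigma[g] * imp.get((g, f), 0.0) for g in sigma if g != f)
        for f, s in sigma.items()
    }


sigma = {"Jennifer Widom": 2.0, "J. Widom": 1.0, "John Smith": 1.0}
imp = {
    ("Jennifer Widom", "J. Widom"): 0.5, ("J. Widom", "Jennifer Widom"): 0.5,
    ("Jennifer Widom", "John Smith"): -0.5, ("John Smith", "Jennifer Widom"): -0.5,
    ("J. Widom", "John Smith"): -0.5, ("John Smith", "J. Widom"): -0.5,
}
print(adjusted_sigma(sigma, imp, rho=0.5))
# {'Jennifer Widom': 2.0, 'J. Widom': 1.25, 'John Smith': 0.25}
```

The supported name "J. Widom" gains score, while the conflicting fact loses score; with ρ = 1.0 the conflicting fact's σ* would drop below zero.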


Web Site Trustworthiness and Fact Confidence

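• Compute the confidence of f from σ*(f) in the same way as from σ(f):

  $$s^*(f) = 1 - e^{-\sigma^*(f)}$$

• However, the assumption that different web sites are independent is not always correct, so add a dampening factor γ, 0 < γ < 1:

  $$s^*(f) = 1 - e^{-\gamma \cdot \sigma^*(f)}$$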
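A Python sketch of the dampened conversion, assuming s*(f) = 1 − e^{−γ·σ*(f)} with the default γ = 0.3 taken from the experimental settings. `damped_confidence` is a hypothetical name. The second check previews the problem discussed next: a negative σ*(f) gives a negative value.

```python
import math


def damped_confidence(sigma_star: float, gamma: float = 0.3) -> float:
    """s*(f) = 1 - exp(-gamma * sigma*(f)), with 0 < gamma < 1."""
    return 1.0 - math.exp(-gamma * sigma_star)


print(damped_confidence(2.0, gamma=1.0) > damped_confidence(2.0))  # True: damping lowers s*(f)
print(damped_confidence(-1.0) < 0)  # True: a negative sigma*(f) yields a negative s*(f)
```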


Web Site Trustworthiness and Fact Confidence

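• Negative-confidence problem
  – If a fact f conflicts with some facts provided by trustworthy web sites, then σ*(f) < 0 and hence s*(f) < 0, which is unreasonable for a probability.
• Use a logistic function instead:

  $$s(f) = \frac{1}{1 + e^{-\gamma \cdot \sigma^*(f)}}$$

  – If γ · σ*(f) is well above 0, s(f) is very close to s*(f).
  – If γ · σ*(f) is well below 0, s(f) is close to zero but still positive.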
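A Python sketch of the logistic conversion s(f) = 1 / (1 + e^{−γ·σ*(f)}), again assuming γ = 0.3 by default. Branching on the sign of the exponent is a standard way to avoid overflow in `math.exp`; it is an implementation detail of this sketch, not something the slide specifies. `logistic_confidence` is a hypothetical name.

```python
import math


def logistic_confidence(sigma_star: float, gamma: float = 0.3) -> float:
    """s(f) = 1 / (1 + exp(-gamma * sigma*(f))), computed without overflow."""
    x = gamma * sigma_star
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # x < 0, so exp(x) <= 1 and cannot overflow
    return z / (1.0 + z)


print(logistic_confidence(0.0))             # 0.5
print(0 < logistic_confidence(-1.0) < 0.5)  # True: negative sigma*(f), positive s(f)
print(1 - math.exp(-15.0), logistic_confidence(50.0))  # nearly equal when gamma * sigma* is large
```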


Iterative Computation

• TRUTHFINDER: an iterative method
  – Initially, TruthFinder has very little information about the web sites and the facts.
  – In each iteration, it improves its knowledge of web-site trustworthiness and fact confidence.
  – It stops when the computation reaches a stable state.
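A compact Python sketch of such an iteration, under stated assumptions. Each round computes τ(w) = −ln(1 − t(w)), σ(f) = Σ τ(w) over W(f), σ*(f) = σ(f) + ρ · Σ σ(f') · imp(f' → f) over the other facts on the same object, s(f) = 1 / (1 + e^{−γ·σ*(f)}), and finally t(w) as the average s(f) over F(w). Assumptions not on the slide: every site starts at t0 = 0.9, t(w) is capped just below 1 so the logarithm stays finite, and "stable state" is taken to mean that no t(w) changes by more than `tol`. Defaults ρ = 0.5 and γ = 0.3 follow the experimental settings. All names are hypothetical; this is not the authors' implementation.

```python
import math


def _logistic(x: float) -> float:
    """Numerically stable 1 / (1 + exp(-x))."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def truthfinder(claims, imp=None, rho=0.5, gamma=0.3, t0=0.9, max_iter=100, tol=1e-6):
    """Sketch of a TruthFinder-style iteration.

    claims: iterable of (site, obj, fact) triples.
    imp:    optional dict {(f1, f2): imp(f1 -> f2)} for facts about the same
            object; missing pairs count as 0.
    Returns (trust, conf): t(w) per site and s(f) per (obj, fact).
    """
    imp = imp or {}
    providers = {}  # (obj, fact) -> set of sites W(f)
    provided = {}   # site -> set of (obj, fact), i.e. F(w)
    for site, obj, fact in claims:
        providers.setdefault((obj, fact), set()).add(site)
        provided.setdefault(site, set()).add((obj, fact))
    facts_by_obj = {}
    for obj, fact in providers:
        facts_by_obj.setdefault(obj, []).append(fact)

    trust = {w: t0 for w in provided}  # assumed uniform starting trust
    conf = {}
    for _ in range(max_iter):
        # tau(w) = -ln(1 - t(w)), with t(w) capped just below 1
        tau = {w: -math.log(1.0 - min(t, 1.0 - 1e-12)) for w, t in trust.items()}
        # sigma(f) = sum of tau(w) over W(f)
        sigma = {k: sum(tau[w] for w in ws) for k, ws in providers.items()}
        for obj, facts in facts_by_obj.items():
            for f in facts:
                # sigma*(f): add rho-weighted influence of the other facts on obj
                adj = sigma[(obj, f)] + rho * sum(
                    sigma[(obj, g)] * imp.get((g, f), 0.0) for g in facts if g != f
                )
                conf[(obj, f)] = _logistic(gamma * adj)  # s(f)
        # t(w) = average s(f) over F(w)
        new_trust = {w: sum(conf[k] for k in ks) / len(ks) for w, ks in provided.items()}
        change = max((abs(new_trust[w] - trust[w]) for w in trust), default=0.0)
        trust = new_trust
        if change < tol:  # assumed stopping rule for a "stable state"
            break
    return trust, conf


# Made-up claims: two sites agree on fact "A" for object "b1".
claims = [("w1", "b1", "A"), ("w2", "b1", "A"), ("w3", "b1", "B"),
          ("w1", "b2", "C"), ("w3", "b2", "D")]
trust, conf = truthfinder(claims)
```

In this toy input, "A" ends up with higher confidence than "B", and sites w1 and w2 end up more trustworthy than w3, which in turn lifts w1's other fact "C" above w3's "D".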


Empirical Study

• Baseline for comparison: VOTING
  – chooses the fact that is provided by the most web sites.
• Platform: Intel PC with a 1.66 GHz dual-core processor, 1 GB memory, Windows XP Professional.
• Parameters: ρ = 0.5 and γ = 0.3.
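A Python sketch of the VOTING baseline as described: for each object, pick the fact provided by the most web sites. Counting a repeated (site, object, fact) claim only once and breaking ties by first occurrence are assumptions of this sketch; `voting` and the sample claims are hypothetical.

```python
from collections import Counter


def voting(claims):
    """VOTING baseline: for each object, choose the fact provided by the most web sites.

    claims: iterable of (site, obj, fact) triples. A site repeating the same
    claim is counted once; ties go to the fact seen first.
    """
    votes = {}
    for site, obj, fact in dict.fromkeys(claims):  # de-duplicate, keep order
        votes.setdefault(obj, Counter())[fact] += 1
    return {obj: counts.most_common(1)[0][0] for obj, counts in votes.items()}


claims = [("w1", "b1", "A"), ("w2", "b1", "A"), ("w3", "b1", "B"),
          ("w1", "b2", "C"), ("w3", "b2", "D")]
print(voting(claims))  # {'b1': 'A', 'b2': 'C'}
```

VOTING treats every web site equally, so a tie such as the one on "b2" can only be broken arbitrarily; it has no notion of which site is more trustworthy.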


Conclusions

• Introduced and formulated the Veracity problem
  – resolving conflicting facts from multiple web sites, and
  – finding the true facts among them.
• Proposed TRUTHFINDER
  – uses web-site trustworthiness and fact confidence to find trustworthy web sites and true facts.
• In the experiments, TRUTHFINDER achieves high accuracy.
