Bayesian Models, Prior Knowledge, and Data Fusion for Monitoring Messages and Identifying Authors. Paul Kantor Rutgers May 14, 2007

Dec 19, 2015

Page 1

Bayesian Models, Prior Knowledge, and Data Fusion for

Monitoring Messages and Identifying Authors.

Paul Kantor Rutgers

May 14, 2007

Page 2

Outline

• The Team

• Bayes’ Methods

• Method of Evaluation

• A toy Example

• Expert Knowledge

• Efficiency issues

• Entity Resolution

• Conclusions

Page 3

Many collaborators

• Principals
– Fred Roberts
– David Madigan
– Dave D. Lewis
– Paul Kantor

• Programmers
– Vladimir Menkov
– Alex Genkin

• Now Ph.D.s
– Suhrid Balakrishnan
– Dmitriy Fradkin
– Aynur Dayanik
– Andrei Anghelescu

• REU Students
– Ross Sowell
– Diana Michalek
– Jordana Chord
– Melissa Mitchell

Page 4
Page 5

Overview of Bayes

• Personally I

– go to Frequentist Church on Sunday

– shop at Bayes’ the rest of the week

Page 6

Making accurate predictions

• Be careful not to overfit the training data

– Way to avoid this: use a prior distribution

• 2 types: Gaussian and Laplace
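To make the role of the prior concrete, here is a minimal sketch of the penalized objective whose minimizer is the posterior mode. It assumes labels coded as +1/-1, a zero-mean Gaussian prior with variance `tau`, and a zero-mean Laplace prior with rate `lam`; the function names and the toy data are illustrative only and are not taken from the BBR software.

```python
import math

def neg_log_likelihood(w, X, y):
    """Average logistic loss; each label y_i is +1 or -1."""
    total = 0.0
    for x_i, y_i in zip(X, y):
        margin = y_i * sum(w_j * x_j for w_j, x_j in zip(w, x_i))
        total += math.log1p(math.exp(-margin))
    return total / len(X)

def gaussian_penalty(w, tau):
    """Negative log of a zero-mean Gaussian prior with variance tau, up to a constant."""
    return sum(w_j ** 2 for w_j in w) / (2.0 * tau)

def laplace_penalty(w, lam):
    """Negative log of a zero-mean Laplace prior with rate lam, up to a constant."""
    return lam * sum(abs(w_j) for w_j in w)

# Toy data (hypothetical): four examples, two features.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.5]]
y = [1, -1, 1, -1]
w = [2.0, -1.0]

loss = neg_log_likelihood(w, X, y)
ridge_objective = loss + gaussian_penalty(w, tau=0.5)   # Gaussian prior
lasso_objective = loss + laplace_penalty(w, lam=1.0)    # Laplace prior
print(loss, ridge_objective, lasso_objective)
```

Both penalties grow as the weights move away from zero, which is what discourages fitting noise in the training data; they differ in how they grow (quadratically versus linearly).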

Page 7

If you use a Gaussian Prior

• For Gaussian prior (ridge):

[Figure: "Posterior Modes with Varying Hyperparameter - Gaussian". Horizontal axis: tau, from 0 to 0.3. Vertical axis: posterior mode, from -0.10 to 0.10. One curve per coefficient: intercept, npreg, glu, bp, skin, bmi/100, ped, age/100.]

Every feature enters the model

Page 8

If you use a Laplace Prior

• For Laplace prior (lasso):

$$\hat{w} = \arg\inf_{w} \; \frac{1}{n} \sum_{i=1}^{n} \log\left(1 + \exp(-w^{T} x_i y_i)\right) + \lambda \sum_{j} |w_j|$$

[Figure: "Posterior Modes with Varying Hyperparameter - Laplace". Horizontal axis: lambda, from 120 down to 0. Vertical axis: posterior mode, from -0.10 to 0.10. One curve per coefficient: intercept, npreg, glu, bp, skin, bmi/100, ped, age/100.]

Features are added slowly, and require stronger evidence
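The contrast between the two priors can be seen in a one-dimensional simplification. The sketch below swaps the logistic loss for a quadratic loss 0.5*(w - z)^2, where z stands for the unpenalized estimate (an assumption made only so the posterior mode has a closed form). Under that assumption the Gaussian penalty rescales every coefficient, while the Laplace penalty sets a coefficient to exactly zero unless |z| exceeds lambda.

```python
def ridge_mode(z, lam):
    """Minimizer of 0.5*(w - z)**2 + 0.5*lam*w**2: pure rescaling, never exactly zero."""
    return z / (1.0 + lam)

def lasso_mode(z, lam):
    """Minimizer of 0.5*(w - z)**2 + lam*abs(w): soft thresholding."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

evidence = [0.05, 0.3, 1.2]  # hypothetical unpenalized estimates
lam = 0.5
ridge = [ridge_mode(z, lam) for z in evidence]
lasso = [lasso_mode(z, lam) for z in evidence]
print(ridge)  # every entry is non-zero
print(lasso)  # only the entry with |z| > lam survives
```

This is the sense in which a feature needs stronger evidence to enter a lasso model.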

Page 9

Success Qualifier

AUC = average distance, from the bottom of the list, of the items we’d like to see at the top. Null hypothesis: distributed as the average (scaled sum) of P uniform variates.
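One common way to compute this quantity is to count, for each item we want at the top, how many unwanted items rank below it, and normalize by the number of such pairs. The sketch below does that directly; the function name and the toy scores are illustrative, and ties are counted as half a win (a conventional choice, not something the slide specifies).

```python
def auc_from_scores(scores, labels):
    """Fraction of (positive, negative) pairs in which the positive scores higher."""
    positives = [s for s, label in zip(scores, labels) if label == 1]
    negatives = [s for s, label in zip(scores, labels) if label == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative item")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half
    return wins / (len(positives) * len(negatives))

scores = [0.9, 0.8, 0.7, 0.4, 0.2]   # hypothetical classifier outputs
labels = [1, 0, 1, 0, 0]             # 1 = item we would like to see at the top
auc = auc_from_scores(scores, labels)
print(auc)
```

Here the two positives sit above 3 and 2 negatives respectively, so 5 of the 6 pairs are ordered correctly.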

Page 10

Test Corpus

• We took ten authors who were prolific between 1997 and 2002 and whose papers were easy to disambiguate manually, so that we could check the results of BBR. We then chose six kinds of features from these authors’ work for training and testing BBR, in the hope that one kind of feature might prove more useful than another in identifying authors.

• Keywords

• Co-Author Names

• Addresses (words)

• Abstract

• Addresses (n-grams)

• Title
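As an illustration of the "Addresses (n-grams)" feature kind, here is a minimal sketch that extracts overlapping character n-grams from an address string. The slide does not say whether the n-grams were over characters or words, nor which n was used; character trigrams are assumed here, and the function name is hypothetical.

```python
def char_ngrams(text, n=3):
    """Overlapping character n-grams after lowercasing and collapsing whitespace."""
    normalized = " ".join(text.lower().split())
    return [normalized[i:i + n] for i in range(len(normalized) - n + 1)]

grams = char_ngrams("Rutgers  Univ")
print(grams)
```

Character n-grams tolerate small spelling and abbreviation differences between address strings better than whole-word features do, which is one reason to try both.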

Page 11
Page 12

Domain Knowledge and Optimization in Bayesian Logistic Regression

Thanks to Dave Lewis, David D. Lewis Consulting, LLC

Page 13

Outline

• Bayesian logistic regression

• Advances

• Using domain knowledge to reduce the need for training data

• Speeding up training and classification

• Online training

Page 14

Logistic Regression in Text and Data Mining

• Classification as a fundamental primitive

– Text categorization: classes = content distinctions

– Filtering: classes = user interests

– Entity resolution: classes = entities

• Bayesian logistic regression

– Probabilistic classifier allows combining outputs

– Prior allows creating sparse models and combining training data and domain knowledge

– Our KDD-funded BBR and BMR software now widely used

Page 15

A Lasso Logistic Model (category “grain”)

Word         Beta     Word        Beta
corn         29.78    formal      -1.15
wheat        20.56    holder      -1.43
rice         11.33    hungarian   -6.15
sindt        10.56    rubber      -7.12
madagascar    6.83    special     -7.25
import        6.79    …           …
grain         6.77    beet        -13.24
contract      3.08    rockwood    -13.61
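To show how such a table is used, the sketch below scores two made-up documents with the listed coefficients. The intercept is not given on the slide and is assumed to be 0 here; the feature values are hypothetical stand-ins for normalized term weights, and coefficients hidden behind the "…" row are simply absent.

```python
import math

# Coefficients copied from the slide for category "grain".
GRAIN_BETAS = {
    "corn": 29.78, "wheat": 20.56, "rice": 11.33, "sindt": 10.56,
    "madagascar": 6.83, "import": 6.79, "grain": 6.77, "contract": 3.08,
    "formal": -1.15, "holder": -1.43, "hungarian": -6.15, "rubber": -7.12,
    "special": -7.25, "beet": -13.24, "rockwood": -13.61,
}
INTERCEPT = 0.0  # assumption: the slide does not show the intercept

def grain_probability(features):
    """Logistic model: sigmoid of intercept plus weighted sum of feature values."""
    score = INTERCEPT + sum(
        GRAIN_BETAS.get(word, 0.0) * value for word, value in features.items()
    )
    return 1.0 / (1.0 + math.exp(-score))

doc_a = {"wheat": 0.10, "import": 0.05, "contract": 0.05}  # hypothetical
doc_b = {"rubber": 0.10, "special": 0.05}                  # hypothetical
print(grain_probability(doc_a), grain_probability(doc_b))
```

Words missing from the table contribute nothing, which is the practical payoff of a sparse (lasso) model.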

Page 16

Using Expert Knowledge in Text Classification

• What do we know about a category:

– Category description (e.g. MeSH, Medical Subject Headings)

– Human knowledge of good predictor words

– Reference materials (e.g. CIA Factbook)

• All give clues to good predictor words for a category

– We convert these to a prior on parameter values for words

– Other classification tasks, e.g. entity resolution, have expert knowledge also

Page 17

Constructing Informative Prior Distributions from Domain Knowledge in Text Classification

Aynur Dayanik, David D. Lewis, David Madigan, Vladimir Menkov, and Alexander Genkin, January 2006

Page 18

Corpora

• TREC Genomics. Presence or absence of certain MeSH headings

• ModApte “top 10 categories” (Wu and Srihari)

• RCV1 A-B (see next slide)

Page 19

Categories. We selected a subset of the Reuters Region categories whose names exactly matched the names of geographical regions with entries in the CIA World Factbook (see below) and which had one or more positive examples in our large (23,149-document) training set. There were 189 such matches, from which we chose the 27 with names beginning with the letter A or B to work with, reserving the rest for future use.

Page 20

Some experimental results

• Documents are represented by the so-called tf.idf representation.

• Prior information can be used either to change the variance, or set the mode (an offset or bias). Those results are shown in red. Lasso with no prior information is shown in black.
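The two ways of using prior information can be sketched as follows: for words flagged by domain knowledge, either widen the prior variance (so the weight is penalized less for moving away from zero) or shift the prior mode (so the weight is pulled toward a non-zero value). Everything numeric below is hypothetical, as are the function names and the example words; this is an illustration of the idea, not the procedure used in the experiments.

```python
import math

def build_priors(vocabulary, knowledge_words, strategy,
                 base_variance=1.0, boosted_variance=10.0, boosted_mode=1.0):
    """Return {word: (mode, variance)}; all default values are hypothetical."""
    if strategy not in ("variance", "mode"):
        raise ValueError("strategy must be 'variance' or 'mode'")
    priors = {}
    for word in vocabulary:
        mode, variance = 0.0, base_variance
        if word in knowledge_words:
            if strategy == "variance":
                variance = boosted_variance
            else:
                mode = boosted_mode
        priors[word] = (mode, variance)
    return priors

def laplace_penalty(weights, priors):
    """Negative log Laplace prior, up to a constant, with per-word mode and variance."""
    total = 0.0
    for word, weight in weights.items():
        mode, variance = priors[word]
        rate = math.sqrt(2.0 / variance)  # a Laplace prior with rate r has variance 2/r^2
        total += rate * abs(weight - mode)
    return total

vocabulary = ["austria", "vienna", "the"]
knowledge_words = {"austria", "vienna"}  # e.g. words found in a reference text
weights = {"austria": 1.0, "vienna": 0.5, "the": 0.0}

penalty_plain = laplace_penalty(weights, build_priors(vocabulary, set(), "variance"))
penalty_var = laplace_penalty(weights, build_priors(vocabulary, knowledge_words, "variance"))
penalty_mode = laplace_penalty(weights, build_priors(vocabulary, knowledge_words, "mode"))
print(penalty_plain, penalty_var, penalty_mode)
```

With either strategy, the same weights on the knowledge words cost less than under the uninformed prior, so less training data is needed to justify them.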

Page 21

Large Training Sets (3700 to 23000 examples)

              Bio Articles   ModApte   RCV1-v2
Lasso             54.2         84.1      62.9
Ridge             26.3         82.9      42.2
Mode/TFIDF        41.9         83.1      62.9
Var/TFIDF         55.2         84.6      70.8
Mode/TFIDF        53.3         83.6      64.5
Var/TFIDF         52.2         83.8      68.9

Page 22

Tiny Training Sets (5 positive examples + 5 random examples)

              Bio Articles   ModApte   RCV1-v2
Lasso             29.6         42.7      52.1
Ridge             18.8         27.1      23.0
Mode/TFIDF        33.9         62.1      48.8
Var/TFIDF         34.3         61.3      50.7
Mode/TFIDF        36.4         58.5      51.5
Var/TFIDF         35.7         61.5      53.0

Page 23

Findings

• For lots of training data, adding domain knowledge doesn’t help much

• For little training data, it helps more, and more often.

• It is more effective when used to set the priors, than when used as “additional training data”.

Page 24

Speeding Up Classification

• Completed new version of BMRclassify

– Replaces old BBRclassify and BMRclassify

• More flexible

– Can apply 1000’s of binary and polytomous classifiers simultaneously

– Allows meaningful names for features

• Inverted index to classifier suites for speed

– 25x speedup over old BMRclassify and BBRclassify
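The inverted-index idea can be sketched in a few lines: instead of looping over every classifier for every document, map each feature to the classifiers that give it a non-zero weight, and touch only those. The data layout and names below are hypothetical and are not the BMRclassify file format or API.

```python
from collections import defaultdict

def build_inverted_index(classifiers):
    """Map each feature to the (classifier name, weight) pairs that use it."""
    index = defaultdict(list)
    for name, model in classifiers.items():
        for feature, weight in model["weights"].items():
            if weight != 0.0:
                index[feature].append((name, weight))
    return index

def score_document(document, classifiers, index):
    """Linear score of one sparse document under every classifier at once."""
    scores = {name: model["intercept"] for name, model in classifiers.items()}
    for feature, value in document.items():
        for name, weight in index.get(feature, ()):
            scores[name] += weight * value
    return scores

# Hypothetical suite of two sparse linear classifiers.
classifiers = {
    "grain": {"intercept": -1.0, "weights": {"wheat": 2.0, "corn": 3.0}},
    "trade": {"intercept": 0.5, "weights": {"import": 1.5, "wheat": 0.5}},
}
index = build_inverted_index(classifiers)
document = {"wheat": 1.0, "import": 2.0, "unseen": 4.0}
scores = score_document(document, classifiers, index)
print(scores)
```

The work per document is proportional to the number of (feature, classifier) matches rather than to the number of classifiers, which is where a large speedup can come from when models are sparse.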

Page 25

Online & Reduced Memory Training

• Rapid updating of classifiers as new data arrives

• Use training sets too big to fit in memory

– Larger training sets, when available, give higher accuracy
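A minimal sketch of online training follows: the classifier is updated one example at a time, so the full training set never needs to be held in memory. It uses a plain stochastic gradient step on the logistic loss with a Gaussian prior, which is a generic technique chosen for illustration and not the algorithm implemented in BBR or BMR; the learning rate, the prior variance, and the stream are all hypothetical.

```python
import math

def online_update(weights, features, label, learning_rate=0.1, prior_variance=10.0):
    """One stochastic gradient step; label is +1 or -1, features is a sparse dict."""
    margin = label * sum(weights.get(f, 0.0) * v for f, v in features.items())
    # Derivative of log(1 + exp(-margin)) with respect to the margin.
    slope = -1.0 / (1.0 + math.exp(margin))
    for f, v in features.items():
        w = weights.get(f, 0.0)
        # Simplification: the prior term is applied only to features present in this example.
        gradient = slope * label * v + w / prior_variance
        weights[f] = w - learning_rate * gradient
    return weights

def stream():
    """Stand-in for examples arriving one at a time (hypothetical data)."""
    for _ in range(50):
        yield {"wheat": 1.0}, 1
        yield {"rubber": 1.0}, -1

weights = {}
for features, label in stream():
    online_update(weights, features, label)
print(weights)
```

Only the weight dictionary persists between examples, so memory use depends on the number of distinct features seen, not on the number of training examples.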

Page 26

KDD Entity Resolution Challenges

• ER1b: Is this pair of persons who have been possibly renamed, both really the same person?

• ER2a: Which of these persons, in the author list, is using a pseudonym?

Page 27

KDD Entity Resolution Challenges

• ER1b: Is this pair of persons who have been possibly renamed, both really the same person? (NO)

• Smith and Jones

• Smith Jones and Wesson

• ER2a: Which of these persons, in the author list, is using a pseudonym?

YES

This one!

Page 28

Conclusions drawn from KDD-1

• On ER1b several of our submissions topped the rankings based on accuracy (the only measure used).

• Best: dimacs-er1b-modelavg-tdf-aaan

– Probabilities for all document pairs from 11 CLUTO methods and 1 Doc. Sim. model; summed

– Vectors included some author address information

Page 29

And ….

– dimacs-er1b-modelavg-tdf-noaaan was second

– no AAAN info in the vectors.

– Third: dimacs-er1b-modelavg-binary-noaaan

– combining many models, no binary representation.

– CONCLUSION: Model averaging (alias: data fusion, combination) is better than any of the parts.
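Model averaging of this kind can be sketched as summing, for each document pair, the probabilities that several models assign to "same entity", then deciding on the average. The three models, the pair identifiers, and the 0.5 threshold below are hypothetical; the actual submissions combined 11 CLUTO methods and 1 document-similarity model.

```python
def fuse_by_summing(model_outputs):
    """Sum, pair by pair, the probabilities produced by several models."""
    fused = {}
    for probabilities in model_outputs:
        for pair, p in probabilities.items():
            fused[pair] = fused.get(pair, 0.0) + p
    return fused

def decide_same_entity(fused, n_models, threshold=0.5):
    """Call a pair 'same entity' when the average probability reaches the threshold."""
    return {pair: total / n_models >= threshold for pair, total in fused.items()}

# Hypothetical outputs of three models for two document pairs.
model_outputs = [
    {("d1", "d2"): 0.9, ("d1", "d3"): 0.2},
    {("d1", "d2"): 0.6, ("d1", "d3"): 0.4},
    {("d1", "d2"): 0.3, ("d1", "d3"): 0.1},
]
fused = fuse_by_summing(model_outputs)
decisions = decide_same_entity(fused, n_models=len(model_outputs))
print(decisions)
```

Note that the third model alone would have rejected the pair (d1, d2); the fused decision accepts it because the other two models outvote it.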

Page 30

Conclusions on ER2a

• On ER2a measures were accuracy, squared error, ROC area (AUC) and cross entropy.

– Ours: dimacs-er2a-single-X; 3rd by accuracy; 4th by AUC.

– Trained binary logistic regression, using binary vectors (with AAAN info).

– Probability for “no replacement” = product of conditional probabilities for individual authors. Some post-processing using AAAN info.

– No information from the text itself
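The product rule for "no replacement" can be illustrated as below. The per-author probabilities would come from the trained logistic regression models; the values and author labels here are hypothetical, and picking the least probable author as the suspect is an illustrative follow-on step rather than something stated on the slide.

```python
import math

def prob_no_replacement(author_probs):
    """Product of the per-author probabilities that each listed author is genuine."""
    return math.prod(author_probs.values())

def most_suspicious(author_probs):
    """Author with the lowest probability of being genuine."""
    return min(author_probs, key=author_probs.get)

# Hypothetical conditional probabilities for one paper's author list.
author_probs = {"author_1": 0.95, "author_2": 0.90, "author_3": 0.40}
p_clean = prob_no_replacement(author_probs)
suspect = most_suspicious(author_probs)
print(p_clean, suspect)
```

Because the probabilities multiply, a single doubtful author is enough to pull the "no replacement" probability down sharply.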

Page 31

And …...

– Omitting the AAAN post-processing (dimacs-er2a-single), somewhat worse (4th by accuracy and 6th by AUC).

• Every kind of information helps, even if it is there by accident.

Page 32

ER2: Which authors belong: The affinity of authors to each other (naïve)

Page 33

ER2: Which authors belong? More sophisticated model

• Update the probability that y is an author of D, yielding, after some work, the formula:

[Formula not recoverable from this transcript; see the DIMACS technical report listed under "Read More at:".]

Page 34

Read More at:

• Simulated Entity Resolution by Diverse Means: DIMACS Work on the KDD Challenge of 2005

Andrei Anghelescu, Aynur Dayanik, Dmitriy Fradkin, Alex Genkin, Paul Kantor, David Lewis, David Madigan, Ilya Muchnik and Fred Roberts, December 2005, DIMACS Technical Report 2005-42

Page 35

Read more at

• http://www.stat.rutgers.edu/~madigan/PAPERS/tc-dk.pdf

Page 36

Streaming algorithms

• L1

• Good performance

• Algorithms for Sparse Linear Classifiers in the Massive Data Setting. S. Balakrishnan and D. Madigan. Journal of Machine Learning Research, submitted, 2006.

Page 37

Summary of findings

• With little training data, Bayesian methods work better when they can use general knowledge about the target group

• To determine whether several “records” refer to the same “person” there is no “magic bullet”, and combining many methods is feasible and powerful

• To detect an imposter in a group, “social” methods based on combining probabilities are effective (we did not use info in the papers)

Page 38
Page 39