Transcript
Page 1:

Text classification: In Search of a Representation

Stan Matwin

School of Information Technology and Engineering
University of Ottawa
stan@site.uottawa.ca

Page 2:

Outline

Supervised learning = classification
ML/DM at U of O
Classical approach
Attempt at a linguistic representation
N-grams – how to get them?
Labelling and co-learning
Next steps? …

Page 3:

Supervised learning (classification)

Given:
a set of training instances T = {⟨e, t⟩}, where each t is a class label: one of the classes C1, …, Ck
a concept with k classes C1, …, Ck (but the definition of the concept is NOT known)

Find: a description for each class which will perform well in determining (predicting) class membership for unseen instances
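A minimal sketch of this task in Python; scikit-learn and the toy data are assumptions of the sketch, not tools or examples from the talk:

```python
# Training instances are <e, t> pairs: attribute vectors e with class labels t
# drawn from C1, ..., Ck. The learner builds a description of each class and
# predicts the class of unseen instances.
from sklearn.tree import DecisionTreeClassifier

X_train = [[1, 0, 3], [0, 2, 1], [1, 1, 0], [0, 3, 2]]   # attribute vectors e
y_train = ["C1", "C2", "C1", "C2"]                        # class labels t

clf = DecisionTreeClassifier()
clf.fit(X_train, y_train)              # find a description for each class

X_unseen = [[1, 0, 2], [0, 3, 1]]
print(clf.predict(X_unseen))           # predict class membership for unseen instances
```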

Page 4:

Classification

Prevalent practice: examples are represented as vectors of values of attributes

Theoretical wisdom, confirmed empirically: the more examples, the better the predictive accuracy

Page 5:

ML/DM at U of O

Learning from imbalanced classes: applications in remote sensing

A relational, rather than propositional, representation: learning the maintainability concept

Learning in the presence of background knowledge: Bayesian belief networks and how to get them; application to distributed databases

Page 6:

Why text classification?

Automatic file saving
Internet filters
Recommenders
Information extraction
…

Page 7:

Bag of words

Text classification: standard approach

1. Remove stop words and markup
2. Remaining words are all attributes
3. A document becomes a vector of <word, frequency> pairs

4. Train a boolean classifier for each class

5. Evaluate the results on an unseen sample
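A hedged sketch of these five steps in Python, with scikit-learn as a stand-in (the talk names no library; documents and classes below are invented):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

train_docs = ["the machine learning workshop", "wheat and grain exports rose",
              "neural networks learn representations", "corn prices fell sharply"]
train_labels = ["AI", "commodities", "AI", "commodities"]
test_docs = ["grain exports and corn prices", "learning with neural networks"]

# steps 1-3: drop stop words, keep the remaining words as attributes,
# and turn each document into a <word, frequency> vector
vectorizer = CountVectorizer(stop_words="english")
X_train = vectorizer.fit_transform(train_docs)
X_test = vectorizer.transform(test_docs)

# step 4: one boolean (in-class / not-in-class) classifier per class
clf = OneVsRestClassifier(LogisticRegression())
clf.fit(X_train, train_labels)

# step 5: evaluate on an unseen sample
print(clf.predict(X_test))
```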

Page 8:

Text classification: tools

RIPPER
A “covering” learner
Works well with large sets of binary features

Naïve Bayes
Efficient (no search)
Simple to program
Gives “degree of belief”
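For the Naïve Bayes option, a short sketch of the “degree of belief” it gives; scikit-learn's MultinomialNB and the toy documents are assumptions of the sketch, not the talk's implementation:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

docs = ["machine learning and search", "grain and wheat markets",
        "learning algorithms for text", "corn and grain exports"]
labels = ["AI", "commodities", "AI", "commodities"]

vec = CountVectorizer()
X = vec.fit_transform(docs)
nb = MultinomialNB().fit(X, labels)       # no search: just counting word frequencies

query = vec.transform(["learning about grain markets"])
for cls, p in zip(nb.classes_, nb.predict_proba(query)[0]):
    print(f"P({cls} | doc) = {p:.2f}")    # posterior = degree of belief in each class
```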

Page 9:

“Prior art”

Yang: best results using k-NN: 82.3% micro-averaged accuracy

Joachims' results using Support Vector Machines + unlabelled data

SVMs are insensitive to high dimensionality and sparseness of examples

Page 10:

SVM in Text classification

SVM: maximum separation margin

Transductive SVM: maximum margin over the test set

Training with 17 examples in the 10 most frequent categories gives test performance of 60% on 3000+ test cases available during training

Page 11:

Problem 1: aggressive feature selection

Word frequencies by category:

        “Machine”   “Learning”   “Machine Learning”
AI         50%          75%            50%
EP          4%          75%             0%
MT         80%           5%             0%

RIPPER (B.O.W.): machine & learning = AI

FLIPPER (Cohen): machine & learning & near & after = AI

RIPPER (Phrases): “machine learning” = AI
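A small sketch of the phrase idea: keeping two-word phrases as attributes makes a rule such as “machine learning” = AI expressible. CountVectorizer and the toy documents are assumptions of the sketch:

```python
from sklearn.feature_extraction.text import CountVectorizer

docs = ["machine learning of rules", "the learning organisation in practice",
        "machine translation of learning materials"]

vec = CountVectorizer(ngram_range=(1, 2))     # unigrams plus two-word phrases
X = vec.fit_transform(docs)
print("machine learning" in vec.vocabulary_)  # True: the phrase is now an attribute
```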

Page 12:

Problem 2: semantic relationships are missed

Figure: knife, gun, dagger, sword, rifle, slingshot – all related to “weapon”

Semantically related words may be sparsely distributed through many documents

A statistical learner may be able to pick up these correlations

A rule-based learner is disadvantaged

Page 13:

Proposed solution (Sam Scott)

Get noun phrases and/or key phrases (Extractor) and add to the feature list

Add hypernyms

Page 14:

Hypernyms - WordNet

“synset” => SYNONYM
“is a” => HYPERNYM
“instance of” => HYPONYM

Figure: fragment of the WordNet hierarchy – “weapon” is a hypernym (“is a”) of “gun” and “knife”; the synset “pistol, revolver” is an instance of “gun”.
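A sketch of looking up hypernyms; the talk names only WordNet itself, so NLTK's WordNet interface and the example words are assumptions of the sketch:

```python
# Requires: pip install nltk   and   nltk.download("wordnet")
from nltk.corpus import wordnet as wn

for word in ["gun", "knife"]:
    synset = wn.synsets(word, pos=wn.NOUN)[0]          # first noun sense
    hypernyms = synset.hypernyms()                      # "is a" links one level up
    print(word, "->", [h.lemma_names() for h in hypernyms])
# Adding such hypernyms (e.g. "weapon") as extra features lets documents that
# mention different weapons share an attribute.
```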

Page 15:

Evaluation (Lewis)

• Vary the “loss ratio” parameter
• For each parameter value:
  • Learn a hypothesis for each class (binary classification)
  • Micro-average the confusion matrices (add component-wise)
  • Compute precision and recall
• Interpolate (or extrapolate) to find the point where micro-averaged precision and recall are equal
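A hedged sketch of the micro-averaged break-even computation; the per-class counts and the two parameter settings below are invented for illustration:

```python
def micro_pr(per_class_counts):
    """per_class_counts: list of (tp, fp, fn), one tuple per binary classifier."""
    tp = sum(c[0] for c in per_class_counts)
    fp = sum(c[1] for c in per_class_counts)
    fn = sum(c[2] for c in per_class_counts)
    return tp / (tp + fp), tp / (tp + fn)      # micro-averaged precision, recall

# one (precision, recall) point per setting of the "loss ratio" parameter
(p1, r1) = micro_pr([(80, 10, 40), (60, 5, 30)])     # conservative setting
(p2, r2) = micro_pr([(110, 45, 10), (85, 40, 5)])    # liberal setting

# interpolate along the segment between the two points to where precision == recall
t = (p1 - r1) / ((p1 - r1) - (p2 - r2))
breakeven = p1 + t * (p2 - p1)
print(f"micro-averaged break-even = {breakeven:.3f}")
```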

Page 16:

Results

No gain over BW in alternative representations

But…

Comprehensibility…

Micro-averaged b.e.   Reuters   DigiTrad
BW                     .821      .359
BWS                    .810      .360
NP                     .827      .357
NPS                    .819      .356
KP                     .817      .288e
KPS                    .816      .297e
H0                     .741e     .283
H1                     .734e     .281
NPW                    .823      N/A

Page 17:

Combining classifiers

Comparable to best known results (Yang)

                  Reuters                       DigiTrad
#   representations            b.e.    representations            b.e.
1   NP                         .827    BWS                        .360
3   BW, NP, NPS                .845    BW, BWS, NP                .404e
5   BW, NP, NPS, KP, KPS       .849    BW, BWS, NP, KPS, KP       .422e

Page 18:

Other possibilities

Using hypernyms with a small training set (avoids ambiguous words)

Use Bayes+Ripper in a cascade scheme (Gama)

Other representations (see the following slides: collocations, n-grams, grammar induction)

Page 19:

Collocations

Do not need to be noun phrases, just pairs of words possibly separated by stop words

Only the well-discriminating ones are chosen

These are added to the bag of words, and…

Ripper
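A hedged sketch of collocation extraction: word pairs that co-occur within a small window (so stop words may sit between them), filtered to the pairs that discriminate between classes. The window size, filter, and documents are invented:

```python
from collections import Counter

STOP = {"the", "a", "of", "and", "in"}

def pairs(doc, window=3):
    words = doc.lower().split()
    out = set()
    for i, w in enumerate(words):
        if w in STOP:
            continue
        for v in words[i + 1:i + 1 + window]:
            if v not in STOP:
                out.add((w, v))     # nearest non-stop-word to the right
                break
    return out

docs = [("machine learning of rules", "AI"),
        ("rules of the machine shop", "MT"),
        ("machine learning in practice", "AI")]

by_class = {"AI": Counter(), "MT": Counter()}
for text, cls in docs:
    by_class[cls].update(pairs(text))

# keep pairs that appear only in one class: these get added to the bag of words
good = [p for p in by_class["AI"] if by_class["MT"][p] == 0]
print(good)    # e.g. [('machine', 'learning'), ...]
```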

Page 20:

N-grams

N-grams are substrings of a given length
Good results on Reuters [Mladenic, Grobelnik] with Bayes; we try RIPPER

A different task: classifying text files
  Attachments
  Audio/video
  Coded

From n-grams to relational features

Page 21:

How to get good n-grams?

We use Ziv-Lempel for frequent substring detection (.gz!)

Example: Ziv-Lempel dictionary built from the string “abababaa” (trie figure)
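A sketch of the dictionary-building idea: an LZ78-style parse (a simpler relative of the compression behind .gz) turns repeated substrings into dictionary phrases that can serve as candidate n-grams. Pure Python; the example string is the one from the slide:

```python
def lz_phrases(text):
    """Return the LZ78 dictionary phrases of `text`, in order of discovery."""
    phrases, seen, current = [], set(), ""
    for ch in text:
        current += ch
        if current not in seen:          # new phrase: add it to the dictionary
            seen.add(current)
            phrases.append(current)
            current = ""
    if current:                          # leftover suffix at the end of the input
        phrases.append(current)
    return phrases

print(lz_phrases("abababaa"))            # ['a', 'b', 'ab', 'aba', 'a']
```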

Page 22:

N-grams

Counting
Pruning: substring occurrence ratio < acceptance threshold

Building relations: string A almost always precedes string B

Feeding into relational learner (FOIL)
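A hedged sketch of the counting, pruning, and relation-building steps; the acceptance threshold, the “almost always” cutoff, and the strings are invented, and the resulting ground facts are the kind of input a relational learner such as FOIL expects:

```python
from collections import Counter

docs = ["abababaa", "abbaab", "bababa"]
candidates = ["ab", "ba", "aa", "bb", "aba"]

# counting + pruning: keep substrings whose occurrence ratio across documents
# reaches the acceptance threshold
THRESHOLD = 0.6
counts = Counter(s for s in candidates for d in docs if s in d)
kept = [s for s in candidates if counts[s] / len(docs) >= THRESHOLD]

# building relations: precedes(A, B) holds if A almost always occurs before B
def precedes(a, b, ratio=0.9):
    pos = [(d.find(a), d.find(b)) for d in docs if a in d and b in d]
    return pos and sum(i < j for i, j in pos) / len(pos) >= ratio

facts = [("precedes", a, b) for a in kept for b in kept if a != b and precedes(a, b)]
print(kept, facts)    # ground facts like these would be fed to FOIL
```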

Page 23:

Using grammar induction (text files)

Idea: detect patterns of substrings
Patterns are regular languages
Methods of automata induction: a recognizer for each class of files
We use a modified version of RPNI2 [Dupont, Miclet]

Page 24:

What’s new…

Work with marked up text (Word, Web)

XML with semantic tags: mixed blessing for DM/TM

Co-learning
Text mining

Page 25:

Co-learning

How to use unlabelled data? Or: how to limit the number of examples that need to be labelled?

Two classifiers and two redundantly sufficient representations

Train both, run both on test set, add best predictions to training set
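A hedged sketch of such a co-training loop (after Blum & Mitchell): two classifiers, each trained on its own redundantly sufficient view, repeatedly hand their most confident predictions on unlabelled examples to the shared training set. The base learner (Naïve Bayes via scikit-learn), the views, and all counts are assumptions of the sketch:

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

def cotrain(view_a, view_b, labels, rounds=5, per_round=1):
    """labels: one label per labelled example; the rest form the unlabelled pool."""
    n_labelled = len(labels)
    vec_a, vec_b = CountVectorizer(), CountVectorizer()
    Xa, Xb = vec_a.fit_transform(view_a), vec_b.fit_transform(view_b)
    labelled = list(range(n_labelled))
    pool = list(range(n_labelled, len(view_a)))
    y = list(labels) + [None] * len(pool)
    for _ in range(rounds):
        clf_a = MultinomialNB().fit(Xa[labelled], [y[i] for i in labelled])
        clf_b = MultinomialNB().fit(Xb[labelled], [y[i] for i in labelled])
        for clf, X in ((clf_a, Xa), (clf_b, Xb)):
            if not pool:
                break
            proba = clf.predict_proba(X[pool])
            best = np.argsort(proba.max(axis=1))[-per_round:]    # most confident
            for i in sorted(best, reverse=True):
                idx = pool.pop(i)
                labelled.append(idx)
                y[idx] = clf.classes_[proba[i].argmax()]         # adopt the prediction
    return clf_a, clf_b

# toy usage, echoing Mitchell's web-page task: view A = page text, view B = anchor text;
# the first four examples are labelled, the last two form the unlabelled pool.
pages   = ["course syllabus and homework", "research project funding",
           "lecture notes and exam", "grant proposal for students",
           "homework due before the exam", "project results paper"]
anchors = ["cs101 course page", "lab project page", "course home",
           "research lab", "my course", "our project"]
cotrain(pages, anchors, labels=["course", "project", "course", "project"])
```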

Page 26:

Co-learning

Training set grows as… each learner predicts independently, due to redundant sufficiency (different representations)

Would it also work with our learners if we used Bayes?

Would work for classifying emails

Page 27:

Co-learning

Mitchell experimented with the task of classifying web pages (profs, students, courses, projects) – a supervised learning task

Used:
  Anchor text
  Page contents

Error rate halved (from 11% to 5%)

Page 28:

Cog-sci?

Co-learning seems to be cognitively justified

Model: students learning in groups (pairs)

What other social learning mechanisms could provide models for supervised learning?

Page 29:

Conclusion

A practical task that needs a solution
No satisfactory solution so far
Fruitful ground for research