Machine Learning: Real World Applications
Rob Jasper, Intelligent Results
http://fac-staff.seattleu.edu/jasperr
[email protected]
Jun 14, 2015
04/13/23 2
Goals
Convince you that:
- You can use machine learning (ML) techniques to solve difficult real-world problems
- Real-world programmers and programs use ML techniques
- Applications for ML abound (especially in text processing)

Provide an overview of:
- A variety of applications in just one small area (text processing)
- Classification, the quintessential ML problem
- A variety of techniques for solving classification problems
- Issues involved in building classifiers
- Advanced techniques for dealing with particular problems
Overview
- Machine learning
  - Definition
  - Supervised versus unsupervised
- Machine learning in NLP
  - Part of speech (PoS) tagging
  - Named entity extraction
  - Key phrase extraction
  - Spelling correction
  - (Text) classification
- Classification: the quintessential ML problem
- Classification techniques
  - k-nearest neighbor
  - Rocchio
  - Support vector machines (SVM)
- Ensemble techniques
  - Bagging
  - Boosting
Machine Learning
“Machine Learning is the study of computer algorithms that improve automatically through experience.” —Tom Mitchell
“A computer program is said to learn from experience E w.r.t. some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.”—Tom Mitchell
Machine Learning Example
Backgammon:
- Task (T): playing backgammon
- Performance (P): percent of games won against opponents
- Experience (E): playing practice games against itself

TD-Gammon (Tesauro 1992, 1995) learned to play at the level of world champions by playing games against itself.
What are other approaches to this problem?
Unsupervised versus Supervised Learning
- Unsupervised learning: "Learning in which the system parameters are adapted using only the information of the input and are constrained by prespecified internal rules."
- Supervised learning: "Learning or adaptation in which a desired response can be used by the system to guide the learning."
Is learning backgammon supervised or unsupervised?
Problem Setting (from Machine Learning, Tom Mitchell, 1997)
- X: the set of instances over which target functions can be defined
- C: the set of target concepts our learner might want to learn
- Each concept c in C can be viewed as a subset of X
- Training examples are generated by drawing instances x of X at random according to some distribution D
Concepts and training examples
[Figure: instance space X containing positive (+) and negative (-) training examples, with the target concept c drawn as a region of X]
General Model of Learning
- Learner L considers a set of hypotheses H based on properties of x
- L observes a sequence of training examples <x, c(x)>
- L outputs a hypothesis h, which is its estimate of c
- We evaluate h over new instances of X drawn according to D
Error of hypothesis
[Figure: instance space X with the target concept c and hypothesis h drawn as overlapping regions; the error of h is the region where c and h disagree]
An Operational Model of Machine Learning
[Diagram: the Learner consumes Training Data and produces a Model; the Execution Engine applies the Model to Production Data to produce Tagged Data]

Training Data -> Learner -> Model
Production Data + Model -> Execution Engine -> Tagged Data
Machine Learning in Natural Language Processing
NLP: "The branch of information science that deals with processing natural language"

Applications include:
- Part of speech (PoS) tagging
- Named entity extraction
- Key phrase extraction
- Spelling correction
- (Text) categorization
PoS Tagging
PoS tagging:
- Task (T): tag word tokens with the correct part of speech
- Measure (P): percent of correctly tagged words
- Experience (E): manually tagged text

Input: "The dogmatic dog danced delightfully."
Output: "The<article> dogmatic<adjective> dog<noun> danced<verb> delightfully<adverb>"
2002-3 SU Masters Project
Named Entity Extraction
Named entity extraction:
- Task (T): tag entities (e.g., people, places, things)
- Measure (P): precision and recall
- Experience (E): manually tagged text (e.g., MUC)

Input: "George saw the New York skyline in the 50's"
Output: "George<Person> saw the New<Place-start> York<Place-end> skyline in the 50's<Date>"
2003-4 SU Masters Project
Key Phrase Extraction
Key phrase extraction:
- Task (T): extract key phrases from a body of text
- Measure (P): precision and recall
- Experience (E): manually tagged text (identifying key phrases)

Input: "DRESDEN, Germany (Reuters) - U.S. semiconductor maker Advanced Micro Devices is set to announce it will build a new chip plant in the eastern German city of Dresden, industry sources told Reuters on Saturday."
Output: "Advanced Micro Devices", "new chip plant", "Dresden"
Spelling Correction
Spelling correction:
- Task (T): identify and rank suitable replacements for misspelled words
- Measure (P): agreement with an ideal ranking
- Experience (E): misspellings, correctly spelled words, logical replacements

Input: "Fuedng" -> {"Feeding", "Feudal", "Feuding", "Feed", "Feud"}
Output: "Fuedng" -> {"Feuding", "Feeding", "Feudal", "Feud"}
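One plausible way to rank candidate replacements is by edit (Levenshtein) distance to the misspelling. This is an illustrative sketch, not necessarily the ranking method behind the slide; note that raw edit distance alone prefers "Feeding" over "Feuding" here, which is one reason real spelling correctors also weigh word frequency and context.

```python
# Rank candidate corrections by Levenshtein distance (a sketch; the
# slide's ideal ranking likely uses more signals than edit distance).

def edit_distance(a, b):
    """Classic dynamic-programming Levenshtein distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (ca != cb))) # substitution
        prev = cur
    return prev[-1]

candidates = ["Feeding", "Feudal", "Feuding", "Feed", "Feud"]
ranked = sorted(candidates,
                key=lambda w: edit_distance("fuedng", w.lower()))
print(ranked[0])  # → Feeding (edit distance alone prefers it)
```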
Text Categorization
Text categorization:
- Task (T): identify the proper category among a pre-defined set of categories
- Measure (P): precision and recall
- Experience (E): text documents tagged with a pre-defined set of categories (e.g., Reuters-21578)

Input: "Shortly after Phish wraps up their four-night run in Miami this December, Page will begin a short tour up the East Coast with Vida Blue." -> {Music, Sports, Business}
Output: Music
Document Representation
“Shortly after Phish wraps up their four-night run in Miami this December, Page will begin a short tour up the East Coast with Vida Blue Page, Russell and Oteil will be joined by the six-member Spam Allstars, who back Vida Blue…”
Term     Count
Phish    1
Page     4
Russell  3
Trey     2
Record   6
CD       2
begin    3
short    1
tour     1
...
- Remove non-content-bearing (stop) terms: articles, conjunctions, etc.
- Count the content-bearing words in the document
- Create a vector: each word is a dimension, and the counts give the magnitude along each dimension
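The three steps above can be sketched in a few lines; the stop-word list here is a tiny invented sample, not the one the slides used:

```python
# Sketch of document representation: drop stop words, count the
# remaining content-bearing terms. Stop-word list is illustrative only.

from collections import Counter

STOP_WORDS = {"the", "a", "an", "and", "or", "in", "this", "will",
              "be", "by", "up", "their", "with", "who", "back"}

def to_vector(text):
    """Lowercase, tokenize, drop stop words, count remaining terms."""
    tokens = [t.strip(".,") for t in text.lower().split()]
    return Counter(t for t in tokens if t and t not in STOP_WORDS)

vec = to_vector("Shortly after Phish wraps up their four-night run "
                "in Miami this December, Page will begin a short tour")
print(vec["phish"], vec["page"], vec["tour"])  # → 1 1 1
```

The resulting counter is exactly the sparse term-count vector the next slides operate on.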
Vector Example
[Figure: example documents d1 and d2 plotted as vectors in the Phish/Trey dimensions]
Vector Comparison
[Figure: documents d1 and d2 in the Phish/Trey dimensions, compared by the angle between them]

$$\cos(x, y) = \frac{\sum_i x_i y_i}{\sqrt{\sum_i x_i^2}\,\sqrt{\sum_i y_i^2}}$$
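The cosine measure translates directly into code; the vectors below are invented values in the Phish/Trey dimensions:

```python
# Cosine similarity between two term-count vectors:
# cos(x, y) = sum(x_i * y_i) / (||x|| * ||y||)

import math

def cosine(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    norms = (math.sqrt(sum(a * a for a in x)) *
             math.sqrt(sum(b * b for b in y)))
    return dot / norms

d1 = [1, 4]   # counts in the (Phish, Trey) dimensions
d2 = [2, 8]   # same direction as d1, so similarity ≈ 1.0
print(cosine(d1, d2))
```

Because cosine compares direction rather than length, a long document and a short one about the same topic still score as similar.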
kNN classification
kNN--k nearest neighbors
[Figure: an unlabeled point "?" surrounded by labeled points (M = Music, S = Sports, B = Business); the predicted label is the majority among the k nearest neighbors, and can change as k grows from 1 to 5 to 10]
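The figure's voting rule is a few lines of code; the 2-D points and labels below are invented for illustration:

```python
# Minimal k-nearest-neighbors classifier: classify a query point by
# majority vote among its k closest labeled examples.

from collections import Counter

def knn_classify(query, examples, k):
    """examples: list of ((x, y), label); vote among the k nearest."""
    by_dist = sorted(examples,
                     key=lambda e: (e[0][0] - query[0]) ** 2 +
                                   (e[0][1] - query[1]) ** 2)
    votes = Counter(label for _, label in by_dist[:k])
    return votes.most_common(1)[0][0]

examples = [((1, 1), "M"), ((1, 2), "M"), ((2, 1), "M"),
            ((5, 5), "S"), ((5, 6), "S"),
            ((9, 1), "B")]
print(knn_classify((1.5, 1.5), examples, k=3))  # → M
```

As the slide's figure suggests, rerunning with a larger k can flip the answer once more distant neighbors join the vote.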
Rocchio Classifier
[Figure: Rocchio represents each category (Sports, Music, Business) by a characteristic vector built from its training documents; a document is assigned to a category when its similarity to that vector exceeds a threshold]
Rocchio Formula
Compute the classifier $\vec{c} = \langle w_1, \ldots, w_{|T|} \rangle$, where

$$w_i = \frac{1}{|POS|} \sum_{d_j \in POS} w_{ij} \;-\; \frac{1}{|NEG|} \sum_{d_j \in NEG} w_{ij}$$

and $w_{ij}$ is the weight of term $t_i$ in document $d_j$.
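The Rocchio formula (centroid of positive examples minus centroid of negative examples, with unit coefficients) is a one-liner per dimension; the toy 2-D document vectors are invented:

```python
# Rocchio class vector: centroid(POS) - centroid(NEG), component-wise.

def rocchio(pos, neg):
    """pos, neg: lists of equal-length document vectors (tuples)."""
    dims = len(pos[0])
    return [sum(d[i] for d in pos) / len(pos) -
            sum(d[i] for d in neg) / len(neg)
            for i in range(dims)]

pos = [(4.0, 5.0), (6.0, 5.0)]   # documents inside the class
neg = [(1.0, 1.0), (1.0, 3.0)]   # documents outside it
print(rocchio(pos, neg))  # → [4.0, 3.0]
```

A new document is then scored by its similarity (e.g., cosine) to this characteristic vector and accepted if the score clears the threshold.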
Rocchio Example
[Figure: positive and negative examples plotted in the Phish/Sales dimensions, with Centroid+ and Centroid- marked; the Rocchio vector separates the two clusters]
Support Vector Machines
[Figure: a hyperplane separating + examples from - examples with the widest possible margin; the examples closest to the boundary are the support vectors]
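A toy way to see the idea is sub-gradient descent on the hinge loss for a linear separator. This is only a sketch of the objective SVMs optimize, not how real SVM solvers work, and the data, learning rate, and regularization constant are invented:

```python
# Toy linear "SVM": minimize lam/2 * ||w||^2 + mean hinge loss by
# sub-gradient descent. Illustrative only; not a production solver.

def train_linear_svm(points, labels, lam=0.01, lr=0.05, epochs=2000):
    w, b, n = [0.0, 0.0], 0.0, len(points)
    for _ in range(epochs):
        gw, gb = [lam * w[0], lam * w[1]], 0.0
        for (x1, x2), y in zip(points, labels):
            if y * (w[0] * x1 + w[1] * x2 + b) < 1:  # inside the margin
                gw[0] -= y * x1 / n
                gw[1] -= y * x2 / n
                gb -= y / n
        w = [w[0] - lr * gw[0], w[1] - lr * gw[1]]
        b -= lr * gb
    return w, b

def predict(w, b, x):
    return 1 if w[0] * x[0] + w[1] * x[1] + b >= 0 else -1

# Linearly separable toy data: '+' points lie above the line x1 + x2 = 3.
pts = [(1, 1), (1, 0), (0, 1), (3, 3), (3, 2), (2, 3)]
ys = [-1, -1, -1, 1, 1, 1]
w, b = train_linear_svm(pts, ys)
print([predict(w, b, p) for p in pts])
```

The hinge loss only penalizes points inside the margin, which is why the learned boundary ends up determined by the border examples (the support vectors).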
Issues
- Very few training examples
- Distribution of training examples isn't very representative of "real data"
- Classifier works very well on training data but poorly on new data (overfitting)
  - Not a big issue with SVM
  - An issue with kNN, Rocchio, C4.5, and many others
  - Bagging and boosting are typical responses
Bagging
- Create a whole gaggle of classifiers, each trained on a different set of data
- Sample the training data with replacement to build bootstrap sets T1, T2, ..., Tn
- The majority vote of the sub-classifiers is the final answer

[Figure: the Music training data is resampled into bootstrap sets T1 through Tn]
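The bootstrap-and-vote loop can be sketched as follows; the base learner here is a trivial nearest-class-mean rule on 1-D data, chosen only to keep the example self-contained (the slides don't prescribe a base learner):

```python
# Bagging sketch: train one simple model per bootstrap sample (drawn
# with replacement), then take a majority vote over all models.

import random
from collections import Counter

def train_base(sample):
    """Base learner: predict the class whose mean is nearest to x."""
    sums, counts = {}, {}
    for x, y in sample:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    means = {y: sums[y] / counts[y] for y in sums}
    return lambda x: min(means, key=lambda y: abs(x - means[y]))

def bagged_classifier(data, n_models=15, seed=0):
    rng = random.Random(seed)
    models = [train_base([rng.choice(data) for _ in range(len(data))])
              for _ in range(n_models)]
    return lambda x: Counter(m(x) for m in models).most_common(1)[0][0]

data = [(0.5, "Music"), (1.0, "Music"), (1.5, "Music"),
        (8.0, "Sports"), (9.0, "Sports"), (9.5, "Sports")]
clf = bagged_classifier(data)
print(clf(1.2), clf(8.8))  # → Music Sports
```

Because each model sees a slightly different resampling of the data, the vote smooths out the quirks any single model picks up from its sample.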
Boosting
Similar to bagging, run multiple classifiers on altered training data, combining the results into a final answer.
AdaBoost:
- Assign each training example a weight (all equal at the start)
- Boost for a number of rounds:
  - Build a classifier using the weighted examples
  - Classify the training examples
  - Increase the weight of wrongly classified examples
- Create a weighted-majority classifier from the round classifiers (better classifiers get higher weights)
AdaBoost Algorithm
Given: $(x_1, y_1), \ldots, (x_m, y_m)$ where $x_i \in X$, $y_i \in \{-1, +1\}$

Initialize: $D_1(i) = 1/m$

For $t = 1, \ldots, T$:
- Train the weak learner using distribution $D_t$
- Get weak hypothesis $h_t : X \to \{-1, +1\}$ with error $\epsilon_t = \Pr_{i \sim D_t}[h_t(x_i) \neq y_i]$
- Let $\alpha_t = \frac{1}{2} \ln\left(\frac{1 - \epsilon_t}{\epsilon_t}\right)$
- Update: $D_{t+1}(i) = \frac{D_t(i)}{Z_t} \times \begin{cases} e^{-\alpha_t} & \text{if } h_t(x_i) = y_i \\ e^{\alpha_t} & \text{if } h_t(x_i) \neq y_i \end{cases}$, where $Z_t$ is a normalization factor

Output: $H(x) = \mathrm{sign}\left(\sum_{t=1}^{T} \alpha_t h_t(x)\right)$
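The AdaBoost loop, instantiated with decision stumps on a toy 1-D problem; stumps are a common choice of weak learner, though the slides don't prescribe one, and the data is invented. No single stump classifies this data correctly, but three boosted rounds do:

```python
# AdaBoost with decision stumps h(x) = p if x > theta else -p.

import math

def best_stump(xs, ys, d):
    """Return the stump (theta, p) minimizing weighted error, and the error."""
    best_err, best_h = None, None
    for theta in [x + 0.5 for x in [-1] + list(xs)]:
        for p in (1, -1):
            err = sum(w for x, y, w in zip(xs, ys, d)
                      if (p if x > theta else -p) != y)
            if best_err is None or err < best_err:
                best_err, best_h = err, (theta, p)
    return best_h, best_err

def adaboost(xs, ys, T):
    m = len(xs)
    d = [1.0 / m] * m                       # D_1(i) = 1/m
    models = []                             # (alpha_t, theta_t, p_t)
    for _ in range(T):
        (theta, p), eps = best_stump(xs, ys, d)
        alpha = 0.5 * math.log((1 - eps) / eps)
        models.append((alpha, theta, p))
        # Up-weight mistakes, down-weight correct answers, renormalize.
        d = [w * math.exp(-alpha * y * (p if x > theta else -p))
             for x, y, w in zip(xs, ys, d)]
        z = sum(d)
        d = [w / z for w in d]
    def H(x):
        s = sum(a * (p if x > t else -p) for a, t, p in models)
        return 1 if s >= 0 else -1
    return H

xs = [0, 1, 2, 3, 4, 5]
ys = [1, 1, -1, -1, 1, 1]   # no single threshold gets this right
H = adaboost(xs, ys, T=3)
print([H(x) for x in xs])
```

Each round's stump only has to beat random guessing on the reweighted data; the weighted vote of all three stumps then fits the +/-/+ pattern exactly.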
Summary
Machine learning (ML) provides a way to solve complex problems where programming would be difficult
Many problems can be framed as general classification problems
There are numerous (well known) techniques for solving these kinds of problems
Challenges are mainly collecting good training examples and identifying salient features
Resources
“Machine Learning”, Tom Mitchell, McGraw Hill, 1997
"Machine Learning in Automated Text Categorization", Fabrizio Sebastiani, ACM Computing Surveys, March 2002
"A Short Introduction to Boosting", Freund & Schapire, Journal of Japanese Society for Artificial Intelligence, Sept. 1999