Machine Learning: Real World Applications
Rob Jasper, Intelligent Results
http://fac-staff.seattleu.edu/jasperr
[email protected]
Jun 14, 2015
04/13/23 2
Goals
Convince you that:
- You can use machine learning (ML) techniques to solve difficult real-world problems
- Real-world programmers and programs use ML techniques
- Applications for ML abound (especially in text processing)

Provide an overview of:
- A variety of applications in just one small area (text processing)
- Classification, the quintessential ML problem
- A variety of techniques for solving classification problems
- Issues involved in building classifiers
- Advanced techniques for dealing with particular problems
Overview
- Machine learning
  - Definition
  - Supervised versus unsupervised
- Machine learning in NLP
  - Part of speech (PoS) tagging
  - Named entity extraction
  - Key phrase extraction
  - Spelling correction
  - (Text) classification
- Classification: the quintessential ML problem
- Classification techniques
  - k-nearest neighbor
  - Rocchio
  - Support vector machines (SVM)
- Ensemble techniques
  - Bagging
  - Boosting
Machine Learning
“Machine Learning is the study of computer algorithms that improve automatically through experience.” —Tom Mitchell
“A computer program is said to learn from experience E w.r.t. some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.”—Tom Mitchell
Machine Learning Example
Backgammon:
- Task (T): playing backgammon
- Performance (P): percent of games won against opponents
- Experience (E): playing practice games against itself

TD-Gammon (Tesauro 1992, 1995) learned to play at the level of world champions by playing games against itself.
What are other approaches to this problem?
Unsupervised versus Supervised Learning
- Unsupervised learning: "Learning in which the system parameters are adapted using only the information of the input and are constrained by prespecified internal rules."
- Supervised learning: "Learning or adaptation in which a desired response can be used by the system to guide the learning."
Is learning backgammon supervised or unsupervised?
Problem Setting (from Machine Learning, Tom Mitchell, 1997)
- X: the set of instances over which target functions can be defined
- C: the set of target concepts our learner might want to learn
- Each concept c in C can be viewed as a subset of X
- Training examples are generated by drawing instances x of X at random according to some distribution D
Concepts and training examples
[Figure: instance space X containing positive (+) and negative (-) training examples, with the target concept c drawn as a region of X]
General Model of Learning
- Learner L considers a set of hypotheses H based on properties of x
- L observes a sequence of training examples <x, c(x)>
- L outputs a hypothesis h, which is its estimate of c
- We evaluate h over new instances of X drawn according to D
Error of hypothesis
[Figure: instance space X with the target concept c and hypothesis h drawn as overlapping regions; the error of h is the region where c and h disagree]
An Operational Model of Machine Learning
[Diagram: the Learner consumes Training Data and produces a Model; the Execution Engine applies the Model to Production Data to produce Tagged Data]

Training Data -> Learner -> Model
Production Data + Model -> Execution Engine -> Tagged Data
Machine Learning in Natural Language Processing
NLP: "The branch of information science that deals with processing natural language"

Applications include:
- Part of speech (PoS) tagging
- Named entity extraction
- Key phrase extraction
- Spelling correction
- (Text) categorization
PoS Tagging
PoS tagging:
- Task (T): tag word tokens with the correct part of speech
- Measure (P): percent of correctly tagged words
- Experience (E): manually tagged text

Input: "The dogmatic dog danced delightfully."
Output: "The<article> dogmatic<adjective> dog<noun> danced<verb> delightfully<adverb>"
2002-3 SU Masters Project
Named Entity Extraction
Named entity extraction:
- Task (T): tag entities (e.g., people, places, things)
- Measure (P): precision and recall
- Experience (E): manually tagged text (e.g., MUC)

Input: "George saw the New York skyline in the 50's"
Output: "George<Person> saw the New<Place-start> York<Place-end> skyline in the 50's<Date>"
2003-4 SU Masters Project
Key Phrase Extraction
Key phrase extraction:
- Task (T): extract key phrases from a body of text
- Measure (P): precision and recall
- Experience (E): manually tagged text (identifying key phrases)

Input: "DRESDEN, Germany (Reuters) - U.S. semiconductor maker Advanced Micro Devices is set to announce it will build a new chip plant in the eastern German city of Dresden, industry sources told Reuters on Saturday."
Output: "Advanced Micro Devices", "new chip plant", "Dresden"
Spelling Correction
Spelling correction:
- Task (T): identify and rank suitable replacements for misspelled words
- Measure (P): agreement with an ideal ranking
- Experience (E): misspellings, correctly spelled words, logical replacements

Input: "Fuedng" -> {"Feeding", "Feudal", "Feuding", "Feed", "Feud"}
Output: "Fuedng" -> {"Feuding", "Feeding", "Feudal", "Feud"}
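One plausible way to rank candidate replacements is by edit (Levenshtein) distance to the misspelling. This is an illustrative sketch, not necessarily the ranking method behind the slide; note that raw edit distance alone prefers "Feeding" over "Feuding" here, which is one reason real spelling correctors also weigh word frequency and context.

```python
# Rank candidate corrections by Levenshtein distance (a sketch; the
# slide's ideal ranking likely uses more signals than edit distance).

def edit_distance(a, b):
    """Classic dynamic-programming Levenshtein distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (ca != cb))) # substitution
        prev = cur
    return prev[-1]

candidates = ["Feeding", "Feudal", "Feuding", "Feed", "Feud"]
ranked = sorted(candidates,
                key=lambda w: edit_distance("fuedng", w.lower()))
print(ranked[0])  # → Feeding (edit distance alone prefers it)
```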
Text Categorization
Text categorization:
- Task (T): identify the proper category among a pre-defined set of categories
- Measure (P): precision and recall
- Experience (E): text documents tagged with a pre-defined set of categories (e.g., Reuters-21578)

Input: "Shortly after Phish wraps up their four-night run in Miami this December, Page will begin a short tour up the East Coast with Vida Blue." -> {Music, Sports, Business}
Output: Music
Document Representation
“Shortly after Phish wraps up their four-night run in Miami this December, Page will begin a short tour up the East Coast with Vida Blue Page, Russell and Oteil will be joined by the six-member Spam Allstars, who back Vida Blue…”
Term     Count
Phish    1
Page     4
Russell  3
Trey     2
Record   6
CD       2
begin    3
short    1
tour     1
...
- Remove non-content-bearing (stop) terms: articles, conjunctions, etc.
- Count the content-bearing words in the document
- Create a vector: each word is a dimension, and the counts give the magnitude along each dimension
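The three steps above can be sketched in a few lines; the stop-word list here is a tiny invented sample, not the one the slides used:

```python
# Sketch of document representation: drop stop words, count the
# remaining content-bearing terms. Stop-word list is illustrative only.

from collections import Counter

STOP_WORDS = {"the", "a", "an", "and", "or", "in", "this", "will",
              "be", "by", "up", "their", "with", "who", "back"}

def to_vector(text):
    """Lowercase, tokenize, drop stop words, count remaining terms."""
    tokens = [t.strip(".,") for t in text.lower().split()]
    return Counter(t for t in tokens if t and t not in STOP_WORDS)

vec = to_vector("Shortly after Phish wraps up their four-night run "
                "in Miami this December, Page will begin a short tour")
print(vec["phish"], vec["page"], vec["tour"])  # → 1 1 1
```

The resulting counter is exactly the sparse term-count vector the next slides operate on.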
Vector Example
[Figure: example documents d1 and d2 plotted as vectors in the Phish/Trey dimensions]
Vector Comparison
[Figure: documents d1 and d2 in the Phish/Trey dimensions, compared by the angle between them]

$$\cos(x, y) = \frac{\sum_i x_i y_i}{\sqrt{\sum_i x_i^2}\,\sqrt{\sum_i y_i^2}}$$
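The cosine measure translates directly into code; the vectors below are invented values in the Phish/Trey dimensions:

```python
# Cosine similarity between two term-count vectors:
# cos(x, y) = sum(x_i * y_i) / (||x|| * ||y||)

import math

def cosine(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    norms = (math.sqrt(sum(a * a for a in x)) *
             math.sqrt(sum(b * b for b in y)))
    return dot / norms

d1 = [1, 4]   # counts in the (Phish, Trey) dimensions
d2 = [2, 8]   # same direction as d1, so similarity ≈ 1.0
print(cosine(d1, d2))
```

Because cosine compares direction rather than length, a long document and a short one about the same topic still score as similar.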
kNN classification
kNN--k nearest neighbors
[Figure: an unlabeled point "?" surrounded by labeled points (M = Music, S = Sports, B = Business); the predicted label is the majority among the k nearest neighbors, and can change as k grows from 1 to 5 to 10]
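The figure's voting rule is a few lines of code; the 2-D points and labels below are invented for illustration:

```python
# Minimal k-nearest-neighbors classifier: classify a query point by
# majority vote among its k closest labeled examples.

from collections import Counter

def knn_classify(query, examples, k):
    """examples: list of ((x, y), label); vote among the k nearest."""
    by_dist = sorted(examples,
                     key=lambda e: (e[0][0] - query[0]) ** 2 +
                                   (e[0][1] - query[1]) ** 2)
    votes = Counter(label for _, label in by_dist[:k])
    return votes.most_common(1)[0][0]

examples = [((1, 1), "M"), ((1, 2), "M"), ((2, 1), "M"),
            ((5, 5), "S"), ((5, 6), "S"),
            ((9, 1), "B")]
print(knn_classify((1.5, 1.5), examples, k=3))  # → M
```

As the slide's figure suggests, rerunning with a larger k can flip the answer once more distant neighbors join the vote.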
Rocchio Classifier
[Figure: Rocchio represents each category (Sports, Music, Business) by a characteristic vector built from its training documents; a document is assigned to a category when its similarity to that vector exceeds a threshold]
Rocchio Formula
Compute the classifier $\vec{c} = \langle w_1, \ldots, w_{|T|} \rangle$, where

$$w_i = \frac{1}{|POS|} \sum_{d_j \in POS} w_{ij} \;-\; \frac{1}{|NEG|} \sum_{d_j \in NEG} w_{ij}$$

and $w_{ij}$ is the weight of term $t_i$ in document $d_j$.
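The Rocchio formula (centroid of positive examples minus centroid of negative examples, with unit coefficients) is a one-liner per dimension; the toy 2-D document vectors are invented:

```python
# Rocchio class vector: centroid(POS) - centroid(NEG), component-wise.

def rocchio(pos, neg):
    """pos, neg: lists of equal-length document vectors (tuples)."""
    dims = len(pos[0])
    return [sum(d[i] for d in pos) / len(pos) -
            sum(d[i] for d in neg) / len(neg)
            for i in range(dims)]

pos = [(4.0, 5.0), (6.0, 5.0)]   # documents inside the class
neg = [(1.0, 1.0), (1.0, 3.0)]   # documents outside it
print(rocchio(pos, neg))  # → [4.0, 3.0]
```

A new document is then scored by its similarity (e.g., cosine) to this characteristic vector and accepted if the score clears the threshold.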
Rocchio Example
[Figure: positive and negative examples plotted in the Phish/Sales dimensions, with Centroid+ and Centroid- marked; the Rocchio vector separates the two clusters]
Support Vector Machines
[Figure: a hyperplane separating + examples from - examples with the widest possible margin; the examples closest to the boundary are the support vectors]
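A toy way to see the idea is sub-gradient descent on the hinge loss for a linear separator. This is only a sketch of the objective SVMs optimize, not how real SVM solvers work, and the data, learning rate, and regularization constant are invented:

```python
# Toy linear "SVM": minimize lam/2 * ||w||^2 + mean hinge loss by
# sub-gradient descent. Illustrative only; not a production solver.

def train_linear_svm(points, labels, lam=0.01, lr=0.05, epochs=2000):
    w, b, n = [0.0, 0.0], 0.0, len(points)
    for _ in range(epochs):
        gw, gb = [lam * w[0], lam * w[1]], 0.0
        for (x1, x2), y in zip(points, labels):
            if y * (w[0] * x1 + w[1] * x2 + b) < 1:  # inside the margin
                gw[0] -= y * x1 / n
                gw[1] -= y * x2 / n
                gb -= y / n
        w = [w[0] - lr * gw[0], w[1] - lr * gw[1]]
        b -= lr * gb
    return w, b

def predict(w, b, x):
    return 1 if w[0] * x[0] + w[1] * x[1] + b >= 0 else -1

# Linearly separable toy data: '+' points lie above the line x1 + x2 = 3.
pts = [(1, 1), (1, 0), (0, 1), (3, 3), (3, 2), (2, 3)]
ys = [-1, -1, -1, 1, 1, 1]
w, b = train_linear_svm(pts, ys)
print([predict(w, b, p) for p in pts])
```

The hinge loss only penalizes points inside the margin, which is why the learned boundary ends up determined by the border examples (the support vectors).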
Issues
- Very few training examples
- Distribution of training examples isn't very representative of "real data"
- Classifier works very well on training data but poorly on new data (overfitting)
  - Not a big issue with SVM
  - An issue with kNN, Rocchio, C4.5, and many others
  - Bagging and boosting are typical responses
Bagging
- Create a whole gaggle of classifiers, each trained on a different set of data
- Sample the training data with replacement to build bootstrap sets T1, T2, ..., Tn
- The majority vote of the sub-classifiers is the final answer

[Figure: the Music training data is resampled into bootstrap sets T1 through Tn]
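The bootstrap-and-vote loop can be sketched as follows; the base learner here is a trivial nearest-class-mean rule on 1-D data, chosen only to keep the example self-contained (the slides don't prescribe a base learner):

```python
# Bagging sketch: train one simple model per bootstrap sample (drawn
# with replacement), then take a majority vote over all models.

import random
from collections import Counter

def train_base(sample):
    """Base learner: predict the class whose mean is nearest to x."""
    sums, counts = {}, {}
    for x, y in sample:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    means = {y: sums[y] / counts[y] for y in sums}
    return lambda x: min(means, key=lambda y: abs(x - means[y]))

def bagged_classifier(data, n_models=15, seed=0):
    rng = random.Random(seed)
    models = [train_base([rng.choice(data) for _ in range(len(data))])
              for _ in range(n_models)]
    return lambda x: Counter(m(x) for m in models).most_common(1)[0][0]

data = [(0.5, "Music"), (1.0, "Music"), (1.5, "Music"),
        (8.0, "Sports"), (9.0, "Sports"), (9.5, "Sports")]
clf = bagged_classifier(data)
print(clf(1.2), clf(8.8))  # → Music Sports
```

Because each model sees a slightly different resampling of the data, the vote smooths out the quirks any single model picks up from its sample.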
Boosting
Similar to bagging, run multiple classifiers on altered training data, combining the results into a final answer.
AdaBoost:
- Assign each training example a weight (all equal at the start)
- Boost for a number of rounds:
  - Build a classifier using the weighted examples
  - Classify the training examples
  - Increase the weight of wrongly classified examples
- Create a weighted-majority classifier from the round classifiers (better classifiers get higher weights)
AdaBoost Algorithm
Given: $(x_1, y_1), \ldots, (x_m, y_m)$ where $x_i \in X$, $y_i \in \{-1, +1\}$

Initialize: $D_1(i) = 1/m$

For $t = 1, \ldots, T$:
- Train the weak learner using distribution $D_t$
- Get weak hypothesis $h_t : X \to \{-1, +1\}$ with error $\epsilon_t = \Pr_{i \sim D_t}[h_t(x_i) \neq y_i]$
- Let $\alpha_t = \frac{1}{2} \ln\left(\frac{1 - \epsilon_t}{\epsilon_t}\right)$
- Update: $D_{t+1}(i) = \frac{D_t(i)}{Z_t} \times \begin{cases} e^{-\alpha_t} & \text{if } h_t(x_i) = y_i \\ e^{\alpha_t} & \text{if } h_t(x_i) \neq y_i \end{cases}$, where $Z_t$ is a normalization factor

Output: $H(x) = \mathrm{sign}\left(\sum_{t=1}^{T} \alpha_t h_t(x)\right)$
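The AdaBoost loop, instantiated with decision stumps on a toy 1-D problem; stumps are a common choice of weak learner, though the slides don't prescribe one, and the data is invented. No single stump classifies this data correctly, but three boosted rounds do:

```python
# AdaBoost with decision stumps h(x) = p if x > theta else -p.

import math

def best_stump(xs, ys, d):
    """Return the stump (theta, p) minimizing weighted error, and the error."""
    best_err, best_h = None, None
    for theta in [x + 0.5 for x in [-1] + list(xs)]:
        for p in (1, -1):
            err = sum(w for x, y, w in zip(xs, ys, d)
                      if (p if x > theta else -p) != y)
            if best_err is None or err < best_err:
                best_err, best_h = err, (theta, p)
    return best_h, best_err

def adaboost(xs, ys, T):
    m = len(xs)
    d = [1.0 / m] * m                       # D_1(i) = 1/m
    models = []                             # (alpha_t, theta_t, p_t)
    for _ in range(T):
        (theta, p), eps = best_stump(xs, ys, d)
        alpha = 0.5 * math.log((1 - eps) / eps)
        models.append((alpha, theta, p))
        # Up-weight mistakes, down-weight correct answers, renormalize.
        d = [w * math.exp(-alpha * y * (p if x > theta else -p))
             for x, y, w in zip(xs, ys, d)]
        z = sum(d)
        d = [w / z for w in d]
    def H(x):
        s = sum(a * (p if x > t else -p) for a, t, p in models)
        return 1 if s >= 0 else -1
    return H

xs = [0, 1, 2, 3, 4, 5]
ys = [1, 1, -1, -1, 1, 1]   # no single threshold gets this right
H = adaboost(xs, ys, T=3)
print([H(x) for x in xs])
```

Each round's stump only has to beat random guessing on the reweighted data; the weighted vote of all three stumps then fits the +/-/+ pattern exactly.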
Summary
Machine learning (ML) provides a way to solve complex problems where programming would be difficult
Many problems can be framed as general classification problems
There are numerous (well known) techniques for solving these kinds of problems
Challenges are mainly collecting good training examples and identifying salient features
Resources
“Machine Learning”, Tom Mitchell, McGraw Hill, 1997
"Machine Learning in Automated Text Categorization", Fabrizio Sebastiani, ACM Computing Surveys, March 2002
"A Short Introduction to Boosting", Freund & Schapire, Journal of Japanese Society for Artificial Intelligence, Sept. 1999