Machine Learning Computer Sciences 760
Spring 2014
www.cs.wisc.edu/~dpage/cs760/
Class capacity
• Typically limited to 30
• demand has grown to over 150, hence the course now runs twice a year
• I’ve allowed 45 to register
• > 50 are on the waiting list
• if you’re thinking about dropping, please do so as soon as you make the decision
CS 760: Machine Learning
• Professor: David Page
  email: page@biostat.wisc.edu
  office hours: 11am-Noon Tues, 1pm-2pm Thurs, or by appointment
  office: 6743 Medical Sciences Center
• TA: TBA
Finding my office
• 6743 Medical Sciences Center
• Easiest to enter from Charter St. and take elevator immediately to your right
Monday, Wednesday and Friday?
• We’ll have 30 lectures in all, just like a standard TR class
• Most weeks we’ll just meet Mon and Wed
• Some weeks we’ll meet on Friday
• This arrangement facilitates making up for days I’m out of town
• First three weeks we will meet MWF
• I will give 2 weeks’ advance notice for other Fridays that the class meets
First Assignment
• read Chapters 1 and 3 of Mitchell
• www.cs.wisc.edu/~page/cs760/
Course emphases
• a variety of learning settings: supervised learning, unsupervised learning, reinforcement learning, active learning, etc.
• a broad toolbox of machine-learning methods: decision trees, nearest neighbor, Bayesian networks, SVMs, etc.
• some underlying theory: bias-variance tradeoff, PAC learning, mistake-bound theory, etc.
• experimental methodology for evaluating learning systems: cross validation, ROC and PR curves, hypothesis testing, etc.
Two major goals
1. Understand what a learning system should do
2. Understand how (and how well) existing systems work
Course requirements
• 4 homework assignments: ~40%
  – programming
  – computational experiments (e.g. measure the effect of varying parameter x in algorithm y)
  – written exercises
• “midterm” exam (actually in April): ~35%
• project: ~ 25%
Expected background
• CS 540 (Intro to Artificial Intelligence) or equivalent
  – search
  – first-order logic
  – unification
  – deduction
• reasonable programming skills
• basics of probability (but we’ll review)
• calculus
  – partial derivatives
Programming languages
• for the programming assignments, you can use C, C++, Java, Perl, Python, or R
• programs must be callable from the command line
Course readings
• Machine Learning. T. Mitchell. McGraw Hill, 1997.
• additional on-line articles, surveys, and chapters
What is machine learning?
• the study of algorithms that improve their performance P at some task T with experience E
• to have a well defined learning task, we must specify: < T, P, E >
ML example: spam filtering
• T: given new mail message, classify as spam vs. other
• P: minimize misclassification costs
• E: previously classified (filed) messages
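The ⟨T, P, E⟩ framing can be made concrete with a toy sketch. The keyword-scoring scheme below is purely illustrative (all function names, messages, and the scoring rule are invented, not an actual spam filter): E is the set of labeled messages, T is classifying a new message, and P would be the misclassification cost measured on messages like these.

```python
# Hypothetical sketch of <T, P, E> for spam filtering (toy data).
# E: previously labeled messages; T: classify a new message;
# P: would be measured by counting misclassifications.

def train_keyword_scores(labeled_messages):
    """E -> per-word scores (+1 each time seen in spam, -1 in other)."""
    scores = {}
    for text, label in labeled_messages:
        for word in text.lower().split():
            scores[word] = scores.get(word, 0) + (1 if label == "spam" else -1)
    return scores

def classify(scores, text):
    """T: predict 'spam' vs. 'other' for a new message."""
    total = sum(scores.get(w, 0) for w in text.lower().split())
    return "spam" if total > 0 else "other"

experience = [
    ("win free money now", "spam"),
    ("free prize claim now", "spam"),
    ("lunch meeting tomorrow", "other"),
    ("project meeting notes", "other"),
]
scores = train_keyword_scores(experience)
print(classify(scores, "claim your free money"))  # 'spam'
```

A real filter would use better features and probabilities (e.g. naive Bayes), but the ⟨T, P, E⟩ decomposition is the same.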
ML example: mammography [Burnside et al., Radiology 2009]
• T: given new mammogram, classify as benign vs. malignant
• P: minimize misclassification costs
• E: previously encountered patient histories (mammograms + subsequent outcomes)
ML example: predictive text input
• T: given (partially) typed word, predict the word the user intended to type
• P: minimize misclassifications
• E: words previously typed by the user (+ lexicon of common words + knowledge of keyboard layout, i.e. domain knowledge)
ML example: Netflix Prize
• T: given a user/movie pair, predict the user’s rating (1–5 stars) of the movie
• P: minimize difference between predicted and actual rating
• E: histories of previously rated movies (user, movie, rating triples)
ML example: Stock Forecasting
• T: given a stock, predict the value tomorrow/next week/next month
• P: minimize difference between predicted and actual value
• E: histories of this stock, other stocks

• Alternatives in specification:
  – T: given NYSE, choose an investment strategy
  – P: maximize profit
  – E: might also include background information about companies
ML example: Personalized Medicine

The Electronic Health Record (EHR)

Demographics
  ID  Year of Birth  Gender
  P1  1946.03.10     M

Diagnoses
  ID  Date       Diagnosis            Sign/Symptom
  P1  6.2.2011   Atrial fibrillation  Discomfort
  P1  7.3.2011   Atrial fibrillation  Dizziness, Nausea
  P1  2.29.2012  Stroke               Schizophasia

An EHR comprises several linked tables — demographics, diagnoses, lab results, vitals, and medications — for example:

Demographics
  Patient ID  Gender  Birthdate
  P1          M       3/22/1963

Diagnoses
  Patient ID  Date      Physician  Symptoms      Diagnosis
  P1          1/1/2001  Smith      palpitations  hypoglycemic
  P1          2/1/2001  Jones      fever, aches  influenza

Lab Results
  Patient ID  Date      Lab Test       Result
  P1          1/1/2001  blood glucose  42
  P1          1/9/2001  blood glucose  45

Vitals
  Patient ID  Date      Observation  Result
  P1          1/1/2001  Height       5'11
  P2          1/9/2001  BMI          34.5

Medications
  Patient ID  Date Prescribed  Date Filled  Physician  Medication  Dose  Duration
  P1          5/17/1998        5/18/1998    Jones      Prilosec    10mg  3 months
Personalized Medicine

[Figure: genetic, clinical, and environmental (G + C + E) data for an individual patient feed state-of-the-art machine learning, which produces a predictive model for disease susceptibility and treatment response, which in turn guides personalized treatment]
ML example: Personalized Medicine
• T: given a patient and disease diagnosis, choose best treatment
• P: cure disease
• E: treatment and outcomes for other patients with same disease (+ electronic health records (EHRs) + genome sequences)

• Alternatives in specification:
  – T: given a patient, choose lifestyle and treatment plan
  – P: maximize patient health as measured by survey questions
  – E: might also include answers to questionnaire about lifestyle
Goals for this part of lecture
• define the supervised and unsupervised learning tasks
• consider how to represent instances as fixed-length feature vectors
• understand the concepts:
  – instance (example)
  – feature (attribute)
  – feature space
  – feature types
  – supervised learning
  – classification (concept learning)
  – regression
  – i.i.d. assumption
  – generalization
Goals for the lecture (continued)
• understand the concepts:
  – unsupervised learning
  – clustering
  – anomaly detection
  – dimensionality reduction
Can I eat this mushroom?
I don’t know what type it is – I’ve never seen it before. Is it edible or poisonous?
Can I eat this mushroom?
suppose we’re given examples of edible and poisonous mushrooms (we’ll refer to these as training examples or training instances)
edible
poisonous
can we learn a model that can be used to classify other mushrooms?
Representing instances using feature vectors
• we need some way to represent each instance
• one common way to do this: use a fixed-length vector to represent the features (a.k.a. attributes) of each instance
• also represent the class label of each instance

x1 = ⟨bell, fibrous, gray, false, foul, …⟩      y1 = edible
x2 = ⟨convex, scaly, purple, false, musty, …⟩   y2 = poisonous
x3 = ⟨bell, smooth, red, true, musty, …⟩        y3 = edible
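The feature-vector idea translates directly into code. A minimal illustrative sketch, assuming the first five features from the slide (cap-shape, cap-surface, cap-color, bruises?, odor):

```python
# Fixed-length feature vectors plus parallel class labels (toy sketch).
# Assumed feature order: cap-shape, cap-surface, cap-color, bruises?, odor.
X = [
    ("bell",   "fibrous", "gray",   False, "foul"),
    ("convex", "scaly",   "purple", False, "musty"),
    ("bell",   "smooth",  "red",    True,  "musty"),
]
y = ["edible", "poisonous", "edible"]

# every instance lives in the same d-dimensional feature space
assert all(len(x) == len(X[0]) for x in X)
print(len(X[0]))  # d = 5 features per instance
```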
Standard feature types
• nominal (including Boolean)
  – no ordering among possible values
    e.g. color ∈ {red, blue, green}  (vs. color = 1000 Hertz)
• linear (or ordinal)
  – possible values of the feature are totally ordered
    e.g. size ∈ {small, medium, large}  ← discrete
         weight ∈ [0…500]               ← continuous
• hierarchical
  – possible values are partially ordered in an ISA hierarchy
    e.g. shape → closed; closed → {polygon, continuous};
         polygon → {triangle, square}; continuous → {circle, ellipse}
Feature hierarchy example
[Lawrence et al., Data Mining and Knowledge Discovery 5(1-2), 2001]

Product → {Pet Foods, Tea, …}
Pet Foods → {Canned Cat Food, Dried Cat Food, …}
Canned Cat Food → {Friskies Liver 250g, …}

99 product classes, 2,302 product subclasses, ~30K products — the structure of one feature!
Feature space
example: optical properties of oceans in three spectral bands [Traykovski and Sosik, Ocean Optics XIV Conference Proceedings, 1998]
we can think of each instance as representing a point in a d-dimensional feature space where d is the number of features
Another view of the feature-vector representation: a single database table
            feature 1  feature 2  …  feature d  class
instance 1  0.0        small      …  red        true
instance 2  9.3        medium     …  red        false
instance 3  8.2        small      …  blue       false
…
instance n  5.7        medium     …  green      true
The supervised learning task
problem setting
• set of possible instances: X
• unknown target function: f : X → Y
• set of models (a.k.a. hypotheses): H = { h | h : X → Y }

given
• training set of instances of unknown target function f:
  ⟨x1, y1⟩, ⟨x2, y2⟩, …, ⟨xn, yn⟩

output
• model h ∈ H that best approximates the target function
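As a concrete sketch of this setting, here is one very simple choice of model class H: 1-nearest-neighbor classifiers. The two-feature mushroom data below is invented for illustration; the point is only that a learner maps training pairs ⟨xi, yi⟩ to a model h : X → Y.

```python
# Toy sketch of supervised learning: H = 1-nearest-neighbor classifiers.
# The learner returns the h in H that memorizes the training pairs and
# predicts the label of the closest stored instance.

def nearest_neighbor_model(train):
    def h(x):
        def dist(pair):
            xi, _ = pair
            return sum((a - b) ** 2 for a, b in zip(xi, x))
        _, label = min(train, key=dist)  # label of nearest training instance
        return label
    return h

train = [((0.0, 0.0), "edible"), ((1.0, 1.0), "poisonous"),
         ((0.2, 0.1), "edible"), ((0.9, 1.2), "poisonous")]
h = nearest_neighbor_model(train)
print(h((0.1, 0.0)))  # 'edible'
```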
The supervised learning task
• when y is discrete, we term this a classification task (or concept learning)
• when y is continuous, it is a regression task
• later in the semester, we will consider tasks in which each y is a more structured object (e.g. a sequence of discrete labels)
i.i.d. instances
• we often assume that training instances are independent and identically distributed (i.i.d.) – sampled independently from the same unknown distribution
• later in the course we’ll consider cases where this assumption does not hold
  – cases where sets of instances have dependencies
    • instances sampled from the same medical image
    • instances from time series
    • etc.
  – cases where the learner can select which instances are labeled for training (active learning)
  – cases where the target function changes over time (concept drift)
Generalization
• The primary objective in supervised learning is to find a model that generalizes – one that accurately predicts y for previously unseen x
Can I eat this mushroom that was not in my training set?
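The standard way to estimate how well a model generalizes is to hold some labeled instances out of training and measure accuracy on them. A minimal sketch with invented 1-d data (the threshold learner here is made up for illustration, not a course algorithm):

```python
# Hedged sketch: estimating generalization with a held-out test set.
# Toy 1-d data (invented); the true rule is x >= 0.5 -> poisonous.
import random

random.seed(0)
data = [((i / 10.0,), "poisonous" if i >= 5 else "edible") for i in range(10)]
random.shuffle(data)
train, test = data[:7], data[7:]

# learner: threshold at the midpoint between the largest edible and the
# smallest poisonous feature value seen in training
edible_max = max(x[0] for x, label in train if label == "edible")
poison_min = min(x[0] for x, label in train if label == "poisonous")
threshold = (edible_max + poison_min) / 2

def h(x):
    return "poisonous" if x[0] >= threshold else "edible"

train_acc = sum(h(x) == label for x, label in train) / len(train)
test_acc = sum(h(x) == label for x, label in test) / len(test)
print(train_acc, test_acc)  # test accuracy is the generalization estimate
```

Training accuracy alone can be misleadingly high (the model has seen those y’s); the held-out accuracy is what approximates performance on previously unseen x.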
Model representations
throughout the semester, we will consider a broad range of representations for learned models, including
• decision trees
• neural networks
• support vector machines
• Bayesian networks
• logic clauses
• ensembles of the above
• etc.
Mushroom features (from the UCI Machine Learning Repository)
cap-shape: bell=b, conical=c, convex=x, flat=f, knobbed=k, sunken=s
cap-surface: fibrous=f, grooves=g, scaly=y, smooth=s
cap-color: brown=n, buff=b, cinnamon=c, gray=g, green=r, pink=p, purple=u, red=e, white=w, yellow=y
bruises?: bruises=t, no=f
odor: almond=a, anise=l, creosote=c, fishy=y, foul=f, musty=m, none=n, pungent=p, spicy=s
gill-attachment: attached=a, descending=d, free=f, notched=n
gill-spacing: close=c, crowded=w, distant=d
gill-size: broad=b, narrow=n
gill-color: black=k, brown=n, buff=b, chocolate=h, gray=g, green=r, orange=o, pink=p, purple=u, red=e, white=w, yellow=y
stalk-shape: enlarging=e, tapering=t
stalk-root: bulbous=b, club=c, cup=u, equal=e, rhizomorphs=z, rooted=r, missing=?
stalk-surface-above-ring: fibrous=f, scaly=y, silky=k, smooth=s
stalk-surface-below-ring: fibrous=f, scaly=y, silky=k, smooth=s
stalk-color-above-ring: brown=n, buff=b, cinnamon=c, gray=g, orange=o, pink=p, red=e, white=w, yellow=y
stalk-color-below-ring: brown=n, buff=b, cinnamon=c, gray=g, orange=o, pink=p, red=e, white=w, yellow=y
veil-type: partial=p, universal=u
veil-color: brown=n, orange=o, white=w, yellow=y
ring-number: none=n, one=o, two=t
ring-type: cobwebby=c, evanescent=e, flaring=f, large=l, none=n, pendant=p, sheathing=s, zone=z
spore-print-color: black=k, brown=n, buff=b, chocolate=h, green=r, orange=o, purple=u, white=w, yellow=y
population: abundant=a, clustered=c, numerous=n, scattered=s, several=v, solitary=y
habitat: grasses=g, leaves=l, meadows=m, paths=p, urban=u, waste=w, woods=d
sunken is one possible value of the cap-shape feature
A learned decision tree
if odor=almond, predict edible
if odor=none ∧ spore-print-color=white ∧ gill-size=narrow ∧ gill-spacing=crowded, predict poisonous
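These two rules can be written out directly as a prediction function; the sketch below covers only the two branches quoted above (the rest of the learned tree is not shown on the slide, so other inputs fall through):

```python
# The two decision-tree rules above, as a hypothetical prediction function
# over a dict of mushroom features. Only these two branches are known from
# the slide; all other paths of the tree are omitted here.

def predict(mushroom):
    if mushroom.get("odor") == "almond":
        return "edible"
    if (mushroom.get("odor") == "none"
            and mushroom.get("spore-print-color") == "white"
            and mushroom.get("gill-size") == "narrow"
            and mushroom.get("gill-spacing") == "crowded"):
        return "poisonous"
    return None  # would be decided by branches of the tree not shown

print(predict({"odor": "almond"}))  # 'edible'
```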
Classification with a learned decision tree
once we have a learned model, we can use it to classify previously unseen instances
x = ⟨bell, fibrous, brown, false, foul, …⟩

y = edible or poisonous?
Unsupervised learning
in unsupervised learning, we’re given a set of instances x1, x2, …, xn without y’s

goal: discover interesting regularities that characterize the instances

common unsupervised learning tasks
• clustering
• anomaly detection
• dimensionality reduction
Clustering
given
• training set of instances x1, x2, …, xn

output
• model h ∈ H that divides the training set into clusters such that there is intra-cluster similarity and inter-cluster dissimilarity
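One classic clustering algorithm is k-means, which the course’s toolbox touches on later; a minimal plain-Python sketch with invented 1-d data (k = 2) shows the idea — the algorithm sees only the x’s, never any y’s:

```python
# Minimal k-means sketch (k = 2, toy 1-d data) illustrating clustering.

def kmeans(points, centers, iters=10):
    clusters = [[], []]
    for _ in range(iters):
        clusters = [[], []]
        # assignment step: each point joins the cluster of its nearest center
        for p in points:
            j = min((0, 1), key=lambda i: abs(p - centers[i]))
            clusters[j].append(p)
        # update step: each center moves to the mean of its cluster
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

points = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9]
centers, clusters = kmeans(points, centers=[0.0, 6.0])
print(sorted(round(c, 2) for c in centers))  # [1.0, 5.07]
```

The two recovered centers sit at the means of the two obvious groups — intra-cluster similarity, inter-cluster dissimilarity.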
Clustering example
Clustering irises using three different features (the colors represent clusters identified by the algorithm, not y’s provided as input)
Anomaly detection
learning task
given
• training set of instances x1, x2, …, xn
output
• model h ∈ H that represents “normal” x

performance task
given
• a previously unseen x
determine
• whether x looks normal or anomalous
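A minimal sketch of both tasks, assuming the simplest possible model of “normal”: a mean and standard deviation fit to the training instances, flagging any new x more than three standard deviations away (data and threshold invented for illustration):

```python
# Toy anomaly detector: "normal" is modeled by the training mean and
# standard deviation; a new x is anomalous if it is > 3 sigma away.
import statistics

train = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2, 10.1, 9.7]  # learning task: fit model
mu = statistics.mean(train)
sigma = statistics.stdev(train)

def is_anomalous(x, k=3.0):
    """Performance task: does a previously unseen x look anomalous?"""
    return abs(x - mu) > k * sigma

print(is_anomalous(10.0), is_anomalous(25.0))  # False True
```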
Anomaly detection example
Does the data for 2012 look anomalous?
Dimensionality reduction
given
• training set of instances x1, x2, …, xn

output
• model h ∈ H that represents each x with a lower-dimension feature vector while still preserving key properties of the data
Dimensionality reduction example
We can represent a face using all of the pixels in a given image
More effective method (for many tasks): represent each face as a linear combination of eigenfaces
Dimensionality reduction example
represent each face as a linear combination of eigenfaces
[face 1] = α1,1 × [eigenface 1] + α1,2 × [eigenface 2] + … + α1,20 × [eigenface 20]
[face 2] = α2,1 × [eigenface 1] + α2,2 × [eigenface 2] + … + α2,20 × [eigenface 20]

x1 = ⟨α1,1, α1,2, …, α1,20⟩
x2 = ⟨α2,1, α2,2, …, α2,20⟩
# of features is now 20 instead of # of pixels in images
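The coefficient idea can be sketched in a few lines. Real eigenfaces come from PCA on face images; here the “faces” are 4-pixel vectors and the orthonormal basis is made up, purely to show that each high-dimensional x becomes a short vector of α’s:

```python
# Toy sketch of the eigenface representation: describe a vector by its
# coefficients (alphas) on a small fixed orthonormal basis.
# (Invented 4-pixel "faces" and a 2-vector basis; real eigenfaces
# would be computed by PCA on many face images.)

basis = [
    (1.0, 0.0, 0.0, 0.0),   # "eigenface" 1
    (0.0, 1.0, 0.0, 0.0),   # "eigenface" 2
]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

face = (3.0, -2.0, 0.0, 0.0)            # 4 "pixels"
alphas = [dot(face, b) for b in basis]  # project onto each basis vector
print(alphas)  # the 4-pixel face is now described by 2 numbers
```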
Other learning tasks
later in the semester we’ll cover other learning tasks that are not strictly supervised or unsupervised
• reinforcement learning
• semi-supervised learning
• etc.