Lecture 10: Introduction to Machine Learning
Course: Biomedical Informatics
Parisa Rashidi
Fall 2014

Reminder
Your project progress reports are due on Tuesday, 10/28.
~2 pages in length (excluding references), formatted using the IEEE template:
http://www.ieee.org/conferences_events/conferences/publishing/templates.html

Agenda
Today:
- Introduction to machine learning
- Different types of machine learning methods
- Walkthrough: a machine learning process
Later:
- More machine learning methods
- NLP

Software
RapidMiner:
http://sourceforge.net/projects/rapidminer/files/1.%20RapidMiner/5.3/

Artificial Intelligence
Artificial Intelligence (AI) has many subfields:
- Machine Learning (ML)
- Natural Language Processing (NLP)
- Vision

What is Learning?
Machine learning is programming computers to optimize a performance criterion using example data or past experience.

"You were not made to live like beasts, but to follow virtue and knowledge." (Dante Alighieri)

* Roberto Battiti and Mauro Brunato, The LION Way: Machine Learning plus Intelligent Optimization.

What We Talk About When We Talk About Learning
Learning general models from data of particular examples.
Data is cheap and abundant (data warehouses, data marts); knowledge is expensive and scarce.
- Example 1: adverse drug-drug interactions
- Example 2: customer behavior: "People who bought Blink also bought David and Goliath" (www.amazon.com)
Goal: build a model that is a good and useful approximation to the data.

Relation with Other Fields
ML draws on ideas from many fields:
- Statistics
- Control Theory
- Computer Science
- Optimization
- Neuroscience
- Economics
- Statistical Physics

To Understand ML
You need:
- Basic knowledge of computer science
- Linear algebra
- Calculus
- Probability and statistics
- Optimization

Example ML Algorithms
- Linear regression
- Decision trees, neural networks, support vector machines, ...

[Figure: a simple decision tree splitting on Total Energy (very low -> stand, very high -> run, low -> check Main Frequency: low -> sit, high -> walk), shown alongside an illustration of support vector machines.]

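As a quick illustration, here is a minimal scikit-learn sketch of learning such a decision tree. The tiny activity dataset and its feature values are invented for the example.

```python
# Minimal sketch: learn a small decision tree from (hypothetical)
# activity data with two features: [total_energy, main_frequency].
from sklearn.tree import DecisionTreeClassifier

X = [[0.05, 0.1],  # very low energy              -> stand
     [0.5, 0.2],   # low energy, low frequency    -> sit
     [0.6, 2.0],   # low energy, high frequency   -> walk
     [9.0, 3.0]]   # very high energy             -> run
y = ["stand", "sit", "walk", "run"]

clf = DecisionTreeClassifier(random_state=0).fit(X, y)
print(clf.predict([[8.5, 2.7]]))   # likely "run"
```
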
Generic Applications
Almost everywhere:
- Speech recognition, face recognition, search engines, bioinformatics, fraud detection
And it will be everywhere:
- Smart homes, smart vehicles, smart cities

Biomedical Applications
- Mobile health monitoring solutions
- Electronic Health Record (EHR) mining
- Genome-wide association studies (GWAS)
- Smart homes for the elderly
- Biomarker discovery

Challenges & Competitions
- Example: predict the likelihood that an HIV patient's infection will become less severe
- A great way to improve your skills (and maybe make some money!)
- Many other competitions at Kaggle: http://www.kaggle.com/competitions

Supervised vs. Unsupervised Learning

Supervised Machine Learning
Goal is prediction.
Example:
- Input: examples of benign and malignant tumors, defined in terms of tumor shape, radius, ...
- Output: predict whether a previously unseen example is benign or malignant

[Diagram: tumor examples -> machine learning algorithm -> model; new instance -> model -> "benign or malignant?"]

Supervised Learning Toy Example: Classification
Example: surgery risk — differentiating between low-risk and high-risk patients.

[Figure: patients plotted by cell size uniformity (x) and cell shape uniformity (y).]

Rule: IF x > a AND y > b THEN low-risk

Supervised Learning Toy Example: Regression
Example: child mortality.
- x: maternal education
- y: child mortality

Model: y = g(x | θ), where g(·) is the model and θ its parameters.
Linear case: y = wx + w0.

[Figure: child mortality plotted against maternal education, with a fitted line.]

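A minimal sketch of fitting the linear case y = wx + w0 by least squares; the (education, mortality) pairs below are made up for illustration.

```python
# Fit y = w*x + w0 by least squares on made-up data.
import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0, 10.0])      # maternal education (years)
y = np.array([90.0, 70.0, 55.0, 38.0, 20.0])  # child mortality (per 1000)

w, w0 = np.polyfit(x, y, deg=1)               # theta = (w, w0)
print(f"y = {w:.2f}x + {w0:.2f}")
```
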
Supervised Learning: Uses
- Prediction of future cases: use the rule to predict the output for future inputs
- Knowledge extraction: the rule is easy to understand
- Compression: the rule is simpler than the data it explains
- Outlier detection: exceptions that are not covered by the rule, e.g., fraud

Unsupervised Machine Learning
Also known as data mining.
Goal is knowledge discovery.
Example:
- Input: a DNA sequence as a long string over {A, C, G, T}
- Output: frequent subsequences (gene patterns)

[Diagram: DNA sequence "AACGTAACGGGACTCCAC" -> data mining algorithm -> model -> gene pattern, e.g., "AC"]

Unsupervised Learning Example: Learning Associations
It started with market basket analysis.
P(Y | X): the probability that somebody who buys X also buys Y, where X and Y are products/services.

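A small sketch of estimating such a conditional probability from raw baskets; the transactions below are invented for illustration.

```python
# Estimate P(Y | X) = (# baskets with X and Y) / (# baskets with X).
baskets = [
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"bread"},
    {"milk", "eggs"},
]
X, Y = "bread", "milk"

n_x = sum(1 for b in baskets if X in b)
n_xy = sum(1 for b in baskets if X in b and Y in b)
print(f"P({Y} | {X}) = {n_xy / n_x:.2f}")   # 2 of 3 bread baskets -> 0.67
```
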
Unsupervised Learning
Learning what normally happens; no labels.
Example method — clustering: grouping similar instances.
Example applications:
- Image compression: color quantization
- Bioinformatics: learning motifs

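For example, a minimal k-means sketch with scikit-learn; the 2-D points are invented, and the algorithm groups them without any labels.

```python
# Group similar instances into 2 clusters; no labels are provided.
from sklearn.cluster import KMeans

X = [[1, 1], [1.5, 2], [1, 2], [8, 8], [8.5, 9], [9, 8]]
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)   # e.g., [0 0 0 1 1 1]: one cluster per group
```
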
You Don't Always Need Machine Learning!
Machine learning definition (supervised): the ability to learn and to improve with experience instead of using pre-determined rules.
Consider the following two tasks:
- Recognizing handwritten digits
- Testing for prime numbers: Is m a prime number? Solution: test divisors up to sqrt(m) to see whether m can be factored into two values. No learning needed.

You Don't Always Need Machine Learning!
Unsupervised learning definition (rather unofficial): automatic analysis of data to extract previously unknown interesting patterns.
Consider the following two tasks:
- DNA sequence mining
- Regular expression matching: find all patterns matching the regular expression A*C. Solution: simple string matching (a finite state machine). No learning needed.

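For instance, the A*C task needs only a regular-expression engine; this sketch reuses the DNA string from the earlier slide.

```python
# Plain pattern matching, no learning: find substrings matching A*C.
import re

sequence = "AACGTAACGGGACTCCAC"
print(re.findall(r"A*C", sequence))   # ['AAC', 'AAC', 'AC', 'C', 'C', 'AC']
```
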
When Is Learning Needed?
There is no need to learn to calculate payroll.
Learning is used when:
- Human expertise does not exist (navigating on Mars)
- Humans are unable to explain their expertise (speech recognition)
- The solution changes over time (routing on a computer network)
- The solution needs to be adapted to particular cases (user biometrics)

Supervised vs. Unsupervised Learning
Supervised learning ("learn from my example"):
- Goal: a program that performs a task as well as humans.
- TASK: well defined (the target function)
- EXPERIENCE: training data provided by a human
- PERFORMANCE: metric error/accuracy on the task
Unsupervised learning ("see what you can find"):
- Goal: to find some kind of structure in the data.
- TASK: vaguely defined
- No EXPERIENCE: no labeled data
- No PERFORMANCE metric (but there are some evaluation metrics)

* Takis Metaxas, CS 315: Web Search and Data Mining

Terminology
A Simple Example: Tumor Classification
Benign: -1; Malignant: +1

| Uniformity of Cell Size | Uniformity of Cell Shape | Marginal Adhesion | Single Epithelial Cell Size | Bare Nuclei | Bland Chromatin | Normal Nucleoli | Mitoses | Class Label |
|---|---|---|---|---|---|---|---|---|
| 2 | 5 | 1 | 1 | 1 | 2 | 1 | 3 | -1 |
| 2 | 5 | 4 | 4 | 5 | 7 | 10 | 3 | +1 |
| 3 | 2 | 1 | 1 | 1 | 2 | 5 | 4 | ? |

(In the original dataset the class label is encoded as benign = 2, malignant = 4; here it is mapped to -1/+1.)

Terminology: Feature
Features = the set of attributes associated with an example (aka independent variable in statistics).
Each column of the table above (except the class label) is a feature.

Terminology: Instance
Example = an instance of data = data point = x_i.
Each row of the table above is a data instance.

Terminology: Label
Label = class = the feature to be predicted = the category associated with an object.
Denoted by y_i (aka dependent variable in statistics).
The label is usually provided by an expert.
In the table above, the class label column is the label.

Data Representation
We usually represent data in a matrix: each row is an instance, each column is a feature, and the labels (-1, +1, ?) form a separate vector y.

From the data matrix X we can form:
- the covariance matrix, X^T X (feature x feature)
- the Gram matrix, X X^T (instance x instance)

Note: we can also assign a probability to each label (we'll discuss this later).

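A small NumPy sketch of this view, using the two labeled rows of the tumor table; calling X^T X the covariance matrix glosses over mean-centering, as the slide does.

```python
# Data matrix X (instances x features) and label vector y.
import numpy as np

X = np.array([[2, 5, 1, 1, 1, 2, 1, 3],
              [2, 5, 4, 4, 5, 7, 10, 3]])
y = np.array([-1, +1])

print((X.T @ X).shape)   # (8, 8): feature-by-feature matrix
print((X @ X.T).shape)   # (2, 2): Gram matrix, instance-by-instance
```
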
Summary of Key Terms
- Instance = example = data point
- Feature = independent variable
- Class label = dependent variable
- Decision boundary = separates examples in different classes

Algorithms
Availability of Labeled Data
- Supervised learning: all data is labeled
- Semi-supervised learning: a small amount of data is labeled
- Unsupervised learning: data is not labeled
- Transfer learning: labeled data is available in another domain
- Active learning: the algorithm has access to a human oracle to ask for the labels of a few data points

[Decision chart: "Do you have labeled data?" Yes -> supervised; a little -> semi-supervised; no -> unsupervised; in another domain -> transfer learning; by asking an oracle -> active learning.]

Task Type
What is your output type?
- Categorical -> classification task (classifier)
- Continuous -> regression task
- Ordered -> ranking task

Input Representation
The most common type: simple records in tables.
- Can be analyzed using regular machine learning techniques.
- Most other data types are converted to this type. (Not always: there are methods that directly process other data types.)

A simple record:

| ID | WGT | HGT | Cholesterol | Risk (Class) |
|---|---|---|---|---|
| 1 | high | short | 260 | high |
| 2 | high | med | 254 | high |
| 3 | high | tall | 142 | med |

Input Representation (cont.)
- Image, video: preprocessed using vision techniques.
- Text: preprocessed using NLP techniques.
- Continuous measures along time (time series): preprocessed using time series analysis.
- Graphs: preprocessed using graph theory tools.

More Details
Important Steps
1. Determine relevant features (expert knowledge).
2. Collect data (and label data).
3. Split labeled data into training and test datasets.
4. Use training data to train the machine learning algorithm.
5. Predict labels of examples in test data.
6. Evaluate the algorithm.

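A minimal end-to-end sketch of steps 3-6 with scikit-learn; its bundled breast-cancer dataset stands in for data we would collect and label ourselves.

```python
# Steps 3-6: split, train, predict, evaluate.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_breast_cancer(return_X_y=True)

# Step 3: split labeled data into training and test sets.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Step 4: train the learning algorithm on the training data.
clf = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)

# Steps 5-6: predict test labels and evaluate.
print(accuracy_score(y_te, clf.predict(X_te)))
```
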
Features Are Important!
- Should be rich enough to capture the problem
- Should be simple enough to allow learning the model
- Too many features: makes learning more difficult
- Not enough features: impacts generalization power

Feature Extraction
- Typically results in a significant reduction in dimensionality
- Domain-specific

* Image taken from Jeff Howbert's slides

How to Split Data?
- Holdout: training set, (validation set), test set
- K-fold cross-validation, e.g., 10-fold cross-validation

Methods of Sampling
- Holdout: e.g., reserve 2/3 for training and 1/3 for testing
- Random subsampling
- Cross-validation: partition data into k disjoint subsets
  - k-fold: train on k-1 partitions, test on the remaining one
  - Leave-one-out: k = n
- Stratified sampling
- Bootstrap: sampling with replacement

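A sketch of 10-fold cross-validation with scikit-learn; each instance is tested exactly once, by a model trained on the other nine folds.

```python
# 10-fold cross-validation: average accuracy over the folds.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=10)
print(scores.mean())
```
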
Decision Boundary
We seek to find this boundary.

[Figure: labeled benign and malignant examples plotted by x1 (radius) and x2 (uniformity), with an outlier, the true decision boundary, and the learned decision boundary.]

Why Noise?
Noise might be due to different reasons:
- Imprecision in recording the input data
- Errors in labeling data
- We might not have considered additional features (latent, or hidden, features)
When there is noise, the decision boundary becomes more complex.

Overfitting
Data are well described by our model, but the predictions do not generalize to new data. Typical causes:
- A very rich hypothesis space
- A training set that is too small

Overfitting and Underfitting
Underfitting: the hypothesis is less complex than the actual function.
- Example: using a straight line to model data generated by a third-order polynomial.
Overfitting: the hypothesis is more complex than the actual function.
- Example: using a fifth-order polynomial to model data generated by a second-order polynomial.

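A small NumPy sketch of both failure modes: synthetic data generated by a second-order polynomial, fit with degrees 1 (underfits), 2 (matches), and 5 (overfits).

```python
# Training error by polynomial degree on quadratic data plus noise.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 12)
y = 2 * x**2 + 0.1 * rng.standard_normal(x.size)

for degree in (1, 2, 5):
    coeffs = np.polyfit(x, y, degree)
    mse = np.mean((np.polyval(coeffs, x) - y) ** 2)
    print(degree, mse)   # degree 5 chases the noise: lowest training error
```
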
Bias-Variance
- Bias = assumptions, restrictions on the model
- Variance = variation of the prediction of the model
- Simple linear model => high bias (under-fitting)
- Complex model => high variance (over-fitting)

Model Evaluation
- Metrics for performance evaluation: how to evaluate the performance of a model?
- Methods for model comparison: how to compare the relative performance among competing models?

Metrics for Performance Evaluation
Focus on the predictive capability of a model, rather than how fast it classifies or builds models, scalability, etc.

Confusion matrix:

|                  | PREDICTED Class=Yes | PREDICTED Class=No |
|------------------|---------------------|--------------------|
| ACTUAL Class=Yes | a                   | b                  |
| ACTUAL Class=No  | c                   | d                  |

a: TP (true positive), b: FN (false negative), c: FP (false positive), d: TN (true negative)

Metrics for Performance Evaluation
Most widely-used metric:

Accuracy = (a + d) / (a + b + c + d) = (TP + TN) / (TP + TN + FP + FN)

Cost Matrix

| C(i|j)           | PREDICTED Class=Yes | PREDICTED Class=No |
|------------------|---------------------|--------------------|
| ACTUAL Class=Yes | C(Yes|Yes)          | C(No|Yes)          |
| ACTUAL Class=No  | C(Yes|No)           | C(No|No)           |

C(i|j): cost of misclassifying a class j example as class i

Computing Cost of Classification

Cost matrix C(i|j):

| ACTUAL \ PREDICTED | + | - |
|---|---|---|
| + | -1 | 100 |
| - | 1 | 0 |

Model M1:

| ACTUAL \ PREDICTED | + | - |
|---|---|---|
| + | 150 | 40 |
| - | 60 | 250 |

Accuracy = 80%
Cost = 3910

Model M2:

| ACTUAL \ PREDICTED | + | - |
|---|---|---|
| + | 250 | 45 |
| - | 5 | 200 |

Accuracy = 90%
Cost = 4255

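These numbers can be checked directly; here is a short sketch reproducing the accuracy and cost of M1 and M2 from the tables above.

```python
# Keys are (actual, predicted); values come from the tables above.
cost = {("+", "+"): -1, ("+", "-"): 100, ("-", "+"): 1, ("-", "-"): 0}
m1 = {("+", "+"): 150, ("+", "-"): 40, ("-", "+"): 60, ("-", "-"): 250}
m2 = {("+", "+"): 250, ("+", "-"): 45, ("-", "+"): 5, ("-", "-"): 200}

for name, cm in (("M1", m1), ("M2", m2)):
    acc = (cm[("+", "+")] + cm[("-", "-")]) / sum(cm.values())
    total_cost = sum(n * cost[k] for k, n in cm.items())
    print(name, f"accuracy = {acc:.0%}, cost = {total_cost}")
# M1: accuracy = 80%, cost = 3910; M2: accuracy = 90%, cost = 4255
```
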
Limitation of Accuracy
Consider a 2-class problem:
- Number of Class 0 examples = 9990
- Number of Class 1 examples = 10
If the model predicts everything to be Class 0, accuracy is 9990/10000 = 99.9%.
Accuracy is misleading because the model does not detect any Class 1 example.

Other Measures
Precision (p) = a / (a + c) = true positives / all items predicted as positive
Recall (r) = a / (a + b) = true positives / all actual positive items
F-measure (F) = 2rp / (r + p) = 2a / (2a + b + c)

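As a worked example, computed on model M1's counts from the cost slide (a = 150 TP, b = 40 FN, c = 60 FP).

```python
# Precision, recall, and F-measure from confusion-matrix counts.
a, b, c = 150, 40, 60    # TP, FN, FP (model M1 above)

p = a / (a + c)          # precision
r = a / (a + b)          # recall
f = 2 * r * p / (r + p)  # F-measure = 2a / (2a + b + c)
print(f"p = {p:.2f}, r = {r:.2f}, F = {f:.2f}")
```
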
Triple Tradeoff
There is a tradeoff between three factors:
- Complexity of the hypothesis space, C
- Amount of training data, N
- Generalization error on new data, E
As N increases, E decreases. As C increases, E first decreases and then increases.

Learning Curve
A learning curve shows how accuracy (or error) changes with varying sample size.

More on Bias vs. Variance
Typical learning curve for high variance:
- Test error is still decreasing as m increases. Suggests a larger training set will help.
- Large gap between training and test error.

* Andrew Y. Ng, Advice for Applying Machine Learning, Stanford

More on Bias vs. Variance
Typical learning curve for high bias:
- Even the training error is unacceptably high.
- Small gap between training and test error.

* Andrew Y. Ng, Advice for Applying Machine Learning, Stanford

Diagnosis
Fixes to try:

| Solution                            | Fixes the problem of |
|-------------------------------------|----------------------|
| Try getting more training examples  | high variance        |
| Try a smaller set of features       | high variance        |
| Try a larger set of features        | high bias            |
| Try different features              | high bias            |

* Andrew Y. Ng, Advice for Applying Machine Learning, Stanford

Model Evaluation (revisited)
- Metrics for performance evaluation: how to evaluate the performance of a model?
- Methods for model comparison: how to compare the relative performance among competing models? We will look at this next time!

Putting It All Together
Differentiate between walking and jogging using an accelerometer (Kwapisz et al., SIGKDD Explorations, 2010):
1. Data preprocessing: sample d = (x, y, z) at 60 Hz; segment and label the stream.
2. Feature extraction: compute f_1, f_2, f_3, ... for each segment.
3. Feature selection: select some of the features.
4. Train: e.g., the simple decision tree model from earlier, splitting on total energy and main frequency to distinguish sit, stand, walk, and run.
5. Evaluate.

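A rough sketch of the feature-extraction step above, assuming raw (x, y, z) samples at 60 Hz; the window length and the two features (total energy, main frequency) follow the tree, but their exact definitions here are invented.

```python
# Turn one segment of accelerometer data into a feature vector.
import numpy as np

def extract_features(window):
    """window: (n_samples, 3) array of x, y, z readings."""
    magnitude = np.linalg.norm(window, axis=1)       # per-sample magnitude
    total_energy = np.sum(magnitude ** 2)
    spectrum = np.abs(np.fft.rfft(magnitude - magnitude.mean()))
    main_frequency_bin = int(np.argmax(spectrum))    # dominant frequency bin
    return [total_energy, main_frequency_bin]

window = np.random.rand(180, 3)   # 3 s at 60 Hz (synthetic)
print(extract_features(window))   # f_1, f_2 for one labeled segment
```
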
References
Slides partially based on:
- Lecture notes for E. Alpaydın, Introduction to Machine Learning, 2nd ed., The MIT Press, 2010 (V1.0).

Resources for You
Tools
- RapidMiner
- Weka
- R
- scikit-learn
- Matlab
More here: https://sites.google.com/site/parisar/links (you can also find some publicly available free e-books on machine learning)

Resources: Datasets
- UCI Repository: http://www.ics.uci.edu/~mlearn/MLRepository.html
- UCI KDD Archive: http://kdd.ics.uci.edu/summary.data.application.html
- Statlib: http://lib.stat.cmu.edu/
- Delve: http://www.cs.utoronto.ca/~delve/

Resources: Journals
- IEEE Transactions on Knowledge and Data Engineering
- Journal of Machine Learning Research (www.jmlr.org)
- Machine Learning
- Neural Computation
- Neural Networks
- IEEE Transactions on Neural Networks
- IEEE Transactions on Pattern Analysis and Machine Intelligence
- Annals of Statistics
- Journal of the American Statistical Association
- ...

Resources: Conferences
- International Conference on Knowledge Discovery and Data Mining (KDD)
- International Conference on Machine Learning (ICML)
- European Conference on Machine Learning (ECML)
- Neural Information Processing Systems (NIPS)
- Uncertainty in Artificial Intelligence (UAI)
- Computational Learning Theory (COLT)
- International Conference on Artificial Neural Networks (ICANN)
- International Conference on AI & Statistics (AISTATS)
- International Conference on Pattern Recognition (ICPR)
- ...