1 © 2013 The MathWorks, Inc.
Machine Learning with MATLAB
U. M. Sundar
Senior Application Engineer
MathWorks India
2
Agenda
Machine Learning Overview
Machine Learning with MATLAB
– Unsupervised Learning
Clustering
– Supervised Learning
Classification
Regression
Learn More
[Figure: scatter plot of the sample dataset, colored by Group1 through Group8]
3
Machine Learning Overview: Types of Learning, Categories of Algorithms
Supervised Learning (develop predictive model based on both input and output data)
– Classification
– Regression
Unsupervised Learning (group and interpret data based only on input data)
– Clustering
4
Machine Learning: When and Where Is It Used?
When to use it
– Predict a future outcome based on
Historical data (many variables)
Specific patterns
– Define a system that is
based on inputs and outputs from the system
too complex to define using governing equations (e.g., black-box modeling)
Examples
– Pattern recognition (speech, images)
– Financial algorithms (credit scoring, algo trading)
– Energy forecasting (load, price)
– Biology (tumor detection, drug discovery)
[Table: example credit-scoring output, a rating classification matrix, values in %]

        AAA     AA      A    BBB     BB      B    CCC      D
AAA   93.68   5.55   0.59   0.18   0.00   0.00   0.00   0.00
AA     2.44  92.60   4.03   0.73   0.15   0.00   0.00   0.06
A      0.14   4.18  91.02   3.90   0.60   0.08   0.00   0.08
BBB    0.03   0.23   7.49  87.86   3.78   0.39   0.06   0.16
BB     0.03   0.12   0.73   8.27  86.74   3.28   0.18   0.64
B      0.00   0.00   0.11   0.82   9.64  85.37   2.41   1.64
CCC    0.00   0.00   0.00   0.37   1.84   6.24  81.88   9.67
D      0.00   0.00   0.00   0.00   0.00   0.00   0.00 100.00
5
Basic Concepts in Machine Learning
Start with an initial set of data
“Learn” from this data
– “Train” your algorithm
with this data
Use the resulting model
to predict outcomes
for new data sets
[Figure: scatter plot of the sample dataset, colored by Group1 through Group8]
6
Machine Learning Process
Exploration → Modeling → Evaluation → Deployment
7
Exploratory Data Analysis
Gain insight from visual examination
– Identify trends and interactions
– Detect patterns
– Remove outliers
– Shrink data
– Select and pare predictors
– Feature transformation
[Figure: scatter plot matrix of MPG, Acceleration, Displacement, Weight, Horsepower]
8
Data Exploration: Interactions Between Variables
Plot Matrix by Group Parallel Coordinates Plot Andrews’ Plot
Glyph Plot Chernoff Faces
[Figures: plot matrix by group, parallel coordinates plot, Andrews' plot, glyph plot, and Chernoff faces for the car dataset (MPG, Acceleration, Displacement, Weight, Horsepower; glyph labels include chevrolet chevelle malibu, buick skylark 320, plymouth satellite, amc rebel sst, ford torino, ford galaxie 500, chevrolet impala, plymouth fury iii, pontiac catalina)]
9
Unsupervised Learning: Clustering
Group and interpret data based only on input data
Clustering algorithms covered:
– K-means, Fuzzy K-means
– Hierarchical
– Neural Network
– Gaussian Mixture
10
Dataset We’ll Be Using
Cloud of randomly generated points
– Each cluster center is randomly chosen inside specified bounds
– Each cluster contains a specified number of points
– Each cluster point is sampled from a Gaussian distribution
– Multi-dimensional dataset
[Figure: scatter plot of the sample dataset, colored by Group1 through Group8]
11
Clustering Overview
What is clustering?
– Segment data into groups,
based on data similarity
Why use clustering?
– Identify outliers
– Resulting groups may be
the matter of interest
How is clustering done?
– Can be achieved by various algorithms
– It is an iterative process (involving trial and error)
[Figure: unlabeled scatter plot of the sample dataset]
12
Clustering: K-Means Clustering
K-means is a partitioning method
Partitions data into k mutually
exclusive clusters
Each cluster has a
centroid (or center)
– Sum of distances from
all objects to the center
is minimized
Statistics Toolbox
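The partitioning described above can be sketched with the Statistics Toolbox kmeans function; the data matrix X and the cluster count are illustrative placeholders:

```matlab
% X: n-by-2 matrix of observations (illustrative); ask for k = 8 clusters
k = 8;
[idx, centroids] = kmeans(X, k, 'Replicates', 5);  % multiple restarts guard against poor local minima

% Color points by assigned cluster and mark the centroids
gscatter(X(:,1), X(:,2), idx);
hold on
plot(centroids(:,1), centroids(:,2), 'kx', 'MarkerSize', 12, 'LineWidth', 2);
hold off
```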
13
Clustering: Neural Networks
Networks consist of one or more layers
Outputs are computed by applying a nonlinear
transfer function to a weighted sum of
inputs
Trained by letting the network
continually adjust itself
to new inputs (determines weights)
Interactive apps for easily creating and training networks
Multi-layered networks created by cascading
(provide better accuracy)
Example architectures for clustering:
– Self-organizing maps
– Competitive layers
[Diagram: a single neuron: input variables, weights, bias, transfer function, output variable]
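A self-organizing map, one of the clustering architectures named above, can be set up along these lines with the Neural Network Toolbox; the map size and data matrix X are illustrative:

```matlab
inputs = X';                    % the toolbox expects variables in rows, samples in columns
net = selforgmap([4 4]);        % 4-by-4 map, i.e. up to 16 clusters
net = train(net, inputs);       % the network adjusts its weights iteratively to the inputs
outputs = net(inputs);          % one-hot cluster indicators
clusterIdx = vec2ind(outputs);  % cluster index for each sample
```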
14
Clustering: Gaussian Mixture Models
Good when clusters have different
sizes and are correlated
Assume that data is drawn
from a fixed number K
of normal distributions
[Figure: surface plot of a Gaussian mixture density over the unit square]
Statistics Toolbox
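A minimal Gaussian-mixture sketch using R2013-era Statistics Toolbox syntax (later releases introduced fitgmdist); X and K are illustrative:

```matlab
K = 8;
gm = gmdistribution.fit(X, K);  % fit K normal components (means, covariances, weights) via EM
idx = cluster(gm, X);           % hard assignment: most likely component for each point
p = posterior(gm, X);           % soft assignment: membership probabilities per component
```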
15
Cluster Analysis Summary
Segments data into groups, based on data similarity
No method is perfect (depends on data)
Process is iterative; explore different algorithms
Beware of local minima (global optimization can help)
Clustering
K-means,
Fuzzy K-means
Hierarchical
Neural Network
Gaussian
Mixture
16
Model Development Process
Exploration → Modeling → Evaluation → Deployment
17
Supervised Learning: Classification for Predictive Modeling
Develop predictive model based on both input and output data
Classification algorithms covered:
– Decision Tree
– Ensemble Method
– Neural Network
– Support Vector Machine
18
Classification Overview
What is classification?
– Predicting the best group for each point
– “Learns” from labeled observations
– Uses input features
Why use classification?
– Accurately group data never seen before
How is classification done?
– Can use several algorithms to build a predictive model
– Good training data is critical
[Figure: scatter plot of the sample dataset, colored by Group1 through Group8]
19
Classification - Decision Trees
Builds a tree from training data
– Model is a tree where each node is a
logical test on a predictor
Traverse tree by comparing
features with threshold values
The “leaf” of the tree
specifies the group
Statistics Toolbox
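A decision-tree sketch using the R2013-era call (newer releases use fitctree); Xtrain, ytrain, and Xtest are illustrative placeholders:

```matlab
tree = ClassificationTree.fit(Xtrain, ytrain);  % each node becomes a logical test on a predictor
view(tree, 'Mode', 'graph');                    % inspect the tree and its thresholds interactively
ypred = predict(tree, Xtest);                   % traverse the tree; the leaf gives the group
```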
20
Classification - Ensemble Learners Overview
Decision trees are “weak” learners
– Good to classify data used to train
– Often not very good with new data
– Note rectangular groups
What are ensemble learners?
– Combine many decision trees to create a
“strong” learner
– Uses “bootstrapped aggregation”
Why use ensemble methods?
– Classifier has better predictive power
– Note improvement in cluster shapes
Statistics Toolbox
[Figure: ensemble classification regions for group1 through group8 in the (x1, x2) plane]
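Bootstrap aggregation of trees can be sketched with the TreeBagger class from the Statistics Toolbox; the tree count and variable names are illustrative, and parameter names vary slightly across releases:

```matlab
ens = TreeBagger(100, Xtrain, ytrain, 'OOBPrediction', 'on');  % 100 bootstrapped trees
ypred = predict(ens, Xtest);   % majority vote across the trees
oobErr = oobError(ens);        % out-of-bag error estimate as trees are added
```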
21
Classification - Support Vector Machines Overview
Good for modeling with complex
boundaries between groups
– Can be very accurate
– No restrictions on the predictors
What does it do?
– Uses non-linear “kernel” to
calculate the boundaries
– Can be computationally intensive
Version in Statistics Toolbox only
classifies into two groups
Statistics Toolbox
(as of R2013a)
[Figure: two-class SVM decision boundary with support vectors highlighted]
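The two-class workflow can be sketched with the R2013-era functions (since replaced by fitcsvm); the variable names are illustrative:

```matlab
% Nonlinear radial-basis kernel to handle complex boundaries between the two groups
svmModel = svmtrain(Xtrain, ytrain, 'kernel_function', 'rbf');
ypred = svmclassify(svmModel, Xtest);   % classify new observations
```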
22
K-Nearest Neighbor Classification
One of the simplest classifiers
Takes the K nearest points
from the training set, and
chooses the majority class
of those K points
No training phase – all the
work is done during the
application of the model
[Figure: k-nearest-neighbor classification regions for group1 through group8 in the (x1, x2) plane]
Statistics Toolbox
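A k-nearest-neighbor sketch using the R2013-era call (newer releases use fitcknn); the neighbor count and variable names are illustrative:

```matlab
knn = ClassificationKNN.fit(Xtrain, ytrain, 'NumNeighbors', 5);
ypred = predict(knn, Xtest);   % majority class among the 5 nearest training points
```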
23
Classification Summary
No absolute best method
Simple does not
mean inefficient
Watch for overfitting
– Decision trees and neural networks may overfit the noise
– Use ensemble learning and cross-validation
Parallelize for speedup
Decision Tree
Ensemble
Method
Neural Network
Support Vector
Machine
Classification
24
Supervised Learning: Regression for Predictive Modeling
Develop predictive model based on both input and output data
Regression categories covered:
– Linear
– Non-linear
– Non-parametric
25
Regression
Why use regression?
– Predict the continuous response
for new observations
Type of predictive modeling
– Specify a model that describes
Y as a function of X
– Estimate coefficients that
minimize the difference
between predicted and actual
You can apply techniques from earlier sections with
regression as well
Statistics Toolbox
Curve Fitting Toolbox
26
Linear Regression
Y is a linear function of the regression coefficients
Common examples:
– Straight line: Y = B0 + B1*X1
– Plane: Y = B0 + B1*X1 + B2*X2
– Polynomial: Y = B0 + B1*X1^3 + B2*X1^2 + B3*X1
– Polynomial with cross terms: Y = B0 + B1*X1^2 + B2*(X1*X2) + B3*X2^2
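The straight-line case can be sketched as follows (LinearModel.fit in the R2013-era Statistics Toolbox, later also available as fitlm; x, y, and xnew are illustrative):

```matlab
mdl = LinearModel.fit(x, y);          % estimates B0 (intercept) and B1 (slope)
coeffs = mdl.Coefficients.Estimate;   % [B0; B1], chosen to minimize squared error
ypred = predict(mdl, xnew);           % evaluate the fitted line at new points

% Base MATLAB alternative for the polynomial case:
p = polyfit(x, y, 3);                 % coefficients of a cubic fit, highest power first
```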
27
Nonlinear Regression
Y is a nonlinear function of the regression coefficients
Common examples and their MATLAB formula syntax:
– Fourier series: y = b0 + b1*cos(b3*x) + b2*sin(b3*x)
Formula string: y ~ b0 + b1*cos(x*b3) + b2*sin(x*b3)
– Exponential growth: N = N0*e^(k*t)
Anonymous function: @(b,t) b(1)*exp(b(2)*t)
– Logistic growth: P(t) = b0 / (1 + b1*e^(-k*t))
Anonymous function: @(b,x) 1/(b(1) + exp(-b(2)*x))
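The exponential-growth example can be fitted with nlinfit from the Statistics Toolbox; t, N, and the starting guess are illustrative:

```matlab
model = @(b, t) b(1) .* exp(b(2) .* t);  % b(1) = N0, b(2) = growth rate k
beta0 = [1; 0.1];                        % starting guess (illustrative)
betaHat = nlinfit(t, N, model, beta0);   % iteratively minimizes squared residuals
```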
28
Generalized Linear Models
Extends the linear model
– Define relationship between model and response variable
– Model error distributions other than normal
Logistic regression
– Response variable is binary (true / false)
– Results are typically expressed as an odds ratio
Poisson regression
– Model count data (non-negative integers)
– Response variable comes from a Poisson distribution
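A logistic-regression sketch with glmfit and glmval from the Statistics Toolbox; X, y, and Xnew are illustrative, with y a 0/1 vector:

```matlab
b = glmfit(X, y, 'binomial', 'link', 'logit');  % coefficients on the log-odds scale
phat = glmval(b, Xnew, 'logit');                % predicted probability of the "true" class
oddsRatios = exp(b(2:end));                     % odds ratio per unit change in each predictor
```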
29
Machine Learning with MATLAB
Interactive environment
– Visual tools for exploratory data analysis
– Easy to evaluate and choose best algorithm
– Apps available to help you get started (e.g., neural network tool, curve fitting tool)
Multiple algorithms to choose from
– Clustering
– Classification
– Regression
30
Learn More: Machine Learning with MATLAB
http://www.mathworks.com/discovery/machine-learning.html
– Classification with MATLAB
– Credit Risk Modeling with MATLAB
– Multivariate Classification in the Life Sciences
– Electricity Load and Price Forecasting
– Data Driven Fitting with MATLAB
– Regression with MATLAB
31
MathWorks India – Services and Offerings
Local website: www.mathworks.in
Technical Support India: www.mathworks.in/myservicerequests
Customer Service for non-technical questions: info@mathworks.in
Application Engineering
Product Training:
www.mathworks.in/training
Consulting
32
Scheduled Public Training for Sep–Dec 2013
Course Name                                                 Location    Training dates
Statistical Methods in MATLAB                               Bangalore   02-03 Sep 2013
MATLAB based Optimization Techniques                        Bangalore   04 Sep 2013
Physical Modeling of Multi-Domain Systems using Simscape    Bangalore   05 Sep 2013
MATLAB Fundamentals                                         Delhi       23-25 Sep 2013
                                                            Pune        07-09 Oct 2013
                                                            Bangalore   21-23 Oct 2013
                                                            Web based   05-07 Nov 2013
                                                            Chennai     09-11 Dec 2013
Simulink for System and Algorithm Modeling                  Delhi       26-27 Sep 2013
                                                            Pune        10-11 Oct 2013
                                                            Bangalore   24-25 Oct 2013
                                                            Web based   12-13 Nov 2013
                                                            Chennai     12-13 Dec 2013
MATLAB Programming Techniques                               Bangalore   18-19 Nov 2013
MATLAB for Data Processing and Visualization                Bangalore   20 Nov 2013
MATLAB for Building Graphical User Interface                Bangalore   21 Nov 2013
Generating HDL Code from Simulink                           Bangalore   28-29 Nov 2013

Email: training@mathworks.in  URL: http://www.mathworks.in/services/training  Phone: 080-6632-6000
33
MathWorks Certification Program: for the first time in India!
MathWorks Certified MATLAB Associate Exam
Why certification?
– Validates proficiency with MATLAB
– Can help accelerate professional growth
– Can help increase productivity and project success, and thereby prove to be a strategic investment
Certification exam administered in English at MathWorks facilities in Bangalore on Nov 27, 2013
34
MathWorks India Contact Details
URL: http://www.mathworks.in
E-mail: info@mathworks.in
Technical Support: www.mathworks.in/myservicerequests
Tel: +91-80-6632 6000
Fax: +91-80-6632 6010
Thank You for Attending
Talk to Us – We are Happy to Support You
MathWorks India Private Limited
Salarpuria Windsor Building
Third Floor,
No.3 Ulsoor Road
Bangalore - 560042, Karnataka
India