1 © 2013 The MathWorks, Inc.
Machine Learning with MATLAB
U. M. Sundar
Senior Application Engineer
MathWorks India
2
Agenda
Machine Learning Overview
Machine Learning with MATLAB
– Unsupervised Learning
Clustering
– Supervised Learning
Classification
Regression
Learn More
[Figure: scatter plot of the sample dataset, colored by Group1 through Group8]
3
Machine Learning Overview: Types of Learning, Categories of Algorithms
Supervised Learning (develop predictive model based on both input and output data)
– Classification
– Regression
Unsupervised Learning (group and interpret data based only on input data)
– Clustering
4
Machine Learning: When and Where Is It Used?
When to use it
– Predict a future outcome based on
Historical data (many variables)
Specific patterns
– Define a system that is
based on inputs and outputs from the system
too complex to define using governing equations (e.g., black-box modeling)
Examples
– Pattern recognition (speech, images)
– Financial algorithms (credit scoring, algo trading)
– Energy forecasting (load, price)
– Biology (tumor detection, drug discovery)
[Table: example credit-scoring output, a rating classification matrix, values in %]

        AAA     AA      A    BBB     BB      B    CCC      D
AAA   93.68   5.55   0.59   0.18   0.00   0.00   0.00   0.00
AA     2.44  92.60   4.03   0.73   0.15   0.00   0.00   0.06
A      0.14   4.18  91.02   3.90   0.60   0.08   0.00   0.08
BBB    0.03   0.23   7.49  87.86   3.78   0.39   0.06   0.16
BB     0.03   0.12   0.73   8.27  86.74   3.28   0.18   0.64
B      0.00   0.00   0.11   0.82   9.64  85.37   2.41   1.64
CCC    0.00   0.00   0.00   0.37   1.84   6.24  81.88   9.67
D      0.00   0.00   0.00   0.00   0.00   0.00   0.00 100.00
5
Basic Concepts in Machine Learning
Start with an initial set of data
“Learn” from this data
– “Train” your algorithm
with this data
Use the resulting model
to predict outcomes
for new data sets
[Figure: scatter plot of the sample dataset, colored by Group1 through Group8]
6
Machine Learning Process
Exploration → Modeling → Evaluation → Deployment
7
Exploratory Data Analysis
Gain insight from visual examination
– Identify trends and interactions
– Detect patterns
– Remove outliers
– Shrink data
– Select and pare predictors
– Feature transformation
[Figure: scatter plot matrix of MPG, Acceleration, Displacement, Weight, Horsepower]
8
Data Exploration: Interactions Between Variables
Plot Matrix by Group Parallel Coordinates Plot Andrews’ Plot
Glyph Plot Chernoff Faces
[Figures: plot matrix by group, parallel coordinates plot, Andrews' plot, glyph plot, and Chernoff faces for the car dataset (MPG, Acceleration, Displacement, Weight, Horsepower; glyph labels include chevrolet chevelle malibu, buick skylark 320, plymouth satellite, amc rebel sst, ford torino, ford galaxie 500, chevrolet impala, plymouth fury iii, pontiac catalina)]
9
Unsupervised Learning: Clustering
Group and interpret data based only on input data
Clustering algorithms covered:
– K-means, Fuzzy K-means
– Hierarchical
– Neural Network
– Gaussian Mixture
10
Dataset We’ll Be Using
Cloud of randomly generated points
– Each cluster center is randomly chosen inside specified bounds
– Each cluster contains a specified number of points
– Each cluster point is sampled from a Gaussian distribution
– Multi-dimensional dataset
[Figure: scatter plot of the sample dataset, colored by Group1 through Group8]
11
Clustering Overview
What is clustering?
– Segment data into groups,
based on data similarity
Why use clustering?
– Identify outliers
– Resulting groups may be
the matter of interest
How is clustering done?
– Can be achieved by various algorithms
– It is an iterative process (involving trial and error)
[Figure: unlabeled scatter plot of the sample dataset]
12
Clustering: K-Means Clustering
K-means is a partitioning method
Partitions data into k mutually
exclusive clusters
Each cluster has a
centroid (or center)
– Sum of distances from
all objects to the center
is minimized
Statistics Toolbox
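The partitioning described above can be sketched with the Statistics Toolbox kmeans function; the data matrix X and the cluster count are illustrative placeholders:

```matlab
% X: n-by-2 matrix of observations (illustrative); ask for k = 8 clusters
k = 8;
[idx, centroids] = kmeans(X, k, 'Replicates', 5);  % multiple restarts guard against poor local minima

% Color points by assigned cluster and mark the centroids
gscatter(X(:,1), X(:,2), idx);
hold on
plot(centroids(:,1), centroids(:,2), 'kx', 'MarkerSize', 12, 'LineWidth', 2);
hold off
```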
13
Clustering: Neural Networks
Networks consist of one or more layers
Outputs are computed by applying a nonlinear
transfer function to a weighted sum of
inputs
Trained by letting the network
continually adjust itself
to new inputs (determines weights)
Interactive apps for easily creating and training networks
Multi-layered networks created by cascading
(provide better accuracy)
Example architectures for clustering:
– Self-organizing maps
– Competitive layers
[Diagram: a single neuron: input variables, weights, bias, transfer function, output variable]
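A self-organizing map, one of the clustering architectures named above, can be set up along these lines with the Neural Network Toolbox; the map size and data matrix X are illustrative:

```matlab
inputs = X';                    % the toolbox expects variables in rows, samples in columns
net = selforgmap([4 4]);        % 4-by-4 map, i.e. up to 16 clusters
net = train(net, inputs);       % the network adjusts its weights iteratively to the inputs
outputs = net(inputs);          % one-hot cluster indicators
clusterIdx = vec2ind(outputs);  % cluster index for each sample
```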
14
Clustering: Gaussian Mixture Models
Good when clusters have different
sizes and are correlated
Assume that data is drawn
from a fixed number K
of normal distributions
[Figure: surface plot of a Gaussian mixture density over the unit square]
Statistics Toolbox
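A minimal Gaussian-mixture sketch using R2013-era Statistics Toolbox syntax (later releases introduced fitgmdist); X and K are illustrative:

```matlab
K = 8;
gm = gmdistribution.fit(X, K);  % fit K normal components (means, covariances, weights) via EM
idx = cluster(gm, X);           % hard assignment: most likely component for each point
p = posterior(gm, X);           % soft assignment: membership probabilities per component
```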
15
Cluster Analysis Summary
Segments data into groups, based on data similarity
No method is perfect (depends on data)
Process is iterative; explore different algorithms
Beware of local minima (global optimization can help)
Clustering
K-means,
Fuzzy K-means
Hierarchical
Neural Network
Gaussian
Mixture
16
Model Development Process
Exploration → Modeling → Evaluation → Deployment
17
Supervised Learning: Classification for Predictive Modeling
Develop predictive model based on both input and output data
Classification algorithms covered:
– Decision Tree
– Ensemble Method
– Neural Network
– Support Vector Machine
18
Classification Overview
What is classification?
– Predicting the best group for each point
– “Learns” from labeled observations
– Uses input features
Why use classification?
– Accurately group data never seen before
How is classification done?
– Can use several algorithms to build a predictive model
– Good training data is critical
[Figure: scatter plot of the sample dataset, colored by Group1 through Group8]
19
Classification - Decision Trees
Builds a tree from training data
– Model is a tree where each node is a
logical test on a predictor
Traverse tree by comparing
features with threshold values
The “leaf” of the tree
specifies the group
Statistics Toolbox
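A decision-tree sketch using the R2013-era call (newer releases use fitctree); Xtrain, ytrain, and Xtest are illustrative placeholders:

```matlab
tree = ClassificationTree.fit(Xtrain, ytrain);  % each node becomes a logical test on a predictor
view(tree, 'Mode', 'graph');                    % inspect the tree and its thresholds interactively
ypred = predict(tree, Xtest);                   % traverse the tree; the leaf gives the group
```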
20
Classification - Ensemble Learners Overview
Decision trees are “weak” learners
– Good to classify data used to train
– Often not very good with new data
– Note rectangular groups
What are ensemble learners?
– Combine many decision trees to create a
“strong” learner
– Uses “bootstrapped aggregation”
Why use ensemble methods?
– Classifier has better predictive power
– Note improvement in cluster shapes
Statistics Toolbox
[Figure: ensemble classification regions for group1 through group8 in the (x1, x2) plane]
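Bootstrap aggregation of trees can be sketched with the TreeBagger class from the Statistics Toolbox; the tree count and variable names are illustrative, and parameter names vary slightly across releases:

```matlab
ens = TreeBagger(100, Xtrain, ytrain, 'OOBPrediction', 'on');  % 100 bootstrapped trees
ypred = predict(ens, Xtest);   % majority vote across the trees
oobErr = oobError(ens);        % out-of-bag error estimate as trees are added
```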
21
Classification - Support Vector Machines Overview
Good for modeling with complex
boundaries between groups
– Can be very accurate
– No restrictions on the predictors
What does it do?
– Uses non-linear “kernel” to
calculate the boundaries
– Can be computationally intensive
Version in Statistics Toolbox only
classifies into two groups
Statistics Toolbox
(as of R2013a)
[Figure: two-class SVM decision boundary with support vectors highlighted]
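The two-class workflow can be sketched with the R2013-era functions (since replaced by fitcsvm); the variable names are illustrative:

```matlab
% Nonlinear radial-basis kernel to handle complex boundaries between the two groups
svmModel = svmtrain(Xtrain, ytrain, 'kernel_function', 'rbf');
ypred = svmclassify(svmModel, Xtest);   % classify new observations
```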
22
K-Nearest Neighbor Classification
One of the simplest classifiers
Takes the K nearest points
from the training set, and
chooses the majority class
of those K points
No training phase – all the
work is done during the
application of the model
[Figure: k-nearest-neighbor classification regions for group1 through group8 in the (x1, x2) plane]
Statistics Toolbox
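A k-nearest-neighbor sketch using the R2013-era call (newer releases use fitcknn); the neighbor count and variable names are illustrative:

```matlab
knn = ClassificationKNN.fit(Xtrain, ytrain, 'NumNeighbors', 5);
ypred = predict(knn, Xtest);   % majority class among the 5 nearest training points
```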
23
Classification Summary
No absolute best method
Simple does not
mean inefficient
Watch for overfitting
– Decision trees and neural networks may overfit the noise
– Use ensemble learning and cross-validation
Parallelize for speedup
Decision Tree
Ensemble
Method
Neural Network
Support Vector
Machine
Classification
24
Supervised Learning: Regression for Predictive Modeling
Develop predictive model based on both input and output data
Regression categories covered:
– Linear
– Non-linear
– Non-parametric
25
Regression
Why use regression?
– Predict the continuous response
for new observations
Type of predictive modeling
– Specify a model that describes
Y as a function of X
– Estimate coefficients that
minimize the difference
between predicted and actual
You can apply techniques from earlier sections with
regression as well
Statistics Toolbox
Curve Fitting Toolbox
26
Linear Regression
Y is a linear function of the regression coefficients
Common examples:
– Straight line: Y = B0 + B1*X1
– Plane: Y = B0 + B1*X1 + B2*X2
– Polynomial: Y = B0 + B1*X1^3 + B2*X1^2 + B3*X1
– Polynomial with cross terms: Y = B0 + B1*X1^2 + B2*(X1*X2) + B3*X2^2
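The straight-line case can be sketched as follows (LinearModel.fit in the R2013-era Statistics Toolbox, later also available as fitlm; x, y, and xnew are illustrative):

```matlab
mdl = LinearModel.fit(x, y);          % estimates B0 (intercept) and B1 (slope)
coeffs = mdl.Coefficients.Estimate;   % [B0; B1], chosen to minimize squared error
ypred = predict(mdl, xnew);           % evaluate the fitted line at new points

% Base MATLAB alternative for the polynomial case:
p = polyfit(x, y, 3);                 % coefficients of a cubic fit, highest power first
```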
27
Nonlinear Regression
Y is a nonlinear function of the regression coefficients
Common examples and their MATLAB formula syntax:
– Fourier series: y = b0 + b1*cos(b3*x) + b2*sin(b3*x)
Formula string: y ~ b0 + b1*cos(x*b3) + b2*sin(x*b3)
– Exponential growth: N = N0*e^(k*t)
Anonymous function: @(b,t) b(1)*exp(b(2)*t)
– Logistic growth: P(t) = b0 / (1 + b1*e^(-k*t))
Anonymous function: @(b,x) 1/(b(1) + exp(-b(2)*x))
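The exponential-growth example can be fitted with nlinfit from the Statistics Toolbox; t, N, and the starting guess are illustrative:

```matlab
model = @(b, t) b(1) .* exp(b(2) .* t);  % b(1) = N0, b(2) = growth rate k
beta0 = [1; 0.1];                        % starting guess (illustrative)
betaHat = nlinfit(t, N, model, beta0);   % iteratively minimizes squared residuals
```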
28
Generalized Linear Models
Extends the linear model
– Define relationship between model and response variable
– Model error distributions other than normal
Logistic regression
– Response variable is binary (true / false)
– Results are typically expressed as an odds ratio
Poisson regression
– Model count data (non-negative integers)
– Response variable comes from a Poisson distribution
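A logistic-regression sketch with glmfit and glmval from the Statistics Toolbox; X, y, and Xnew are illustrative, with y a 0/1 vector:

```matlab
b = glmfit(X, y, 'binomial', 'link', 'logit');  % coefficients on the log-odds scale
phat = glmval(b, Xnew, 'logit');                % predicted probability of the "true" class
oddsRatios = exp(b(2:end));                     % odds ratio per unit change in each predictor
```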
29
Machine Learning with MATLAB
Interactive environment
– Visual tools for exploratory data analysis
– Easy to evaluate and choose best algorithm
– Apps available to help you get started (e.g., neural network tool, curve fitting tool)
Multiple algorithms to choose from
– Clustering
– Classification
– Regression
30
Learn More: Machine Learning with MATLAB
http://www.mathworks.com/discovery/machine-learning.html
– Classification with MATLAB
– Credit Risk Modeling with MATLAB
– Multivariate Classification in the Life Sciences
– Electricity Load and Price Forecasting
– Data Driven Fitting with MATLAB
– Regression with MATLAB
31
MathWorks India – Services and Offerings
Local website: www.mathworks.in
Technical Support India: www.mathworks.in/myservicerequests
Customer Service for non-technical questions: info@mathworks.in
Application Engineering
Product Training:
www.mathworks.in/training
Consulting
32
Scheduled Public Training for Sep–Dec 2013
Course Name                                                 Location    Training dates
Statistical Methods in MATLAB                               Bangalore   02-03 Sep 2013
MATLAB based Optimization Techniques                        Bangalore   04 Sep 2013
Physical Modeling of Multi-Domain Systems using Simscape    Bangalore   05 Sep 2013
MATLAB Fundamentals                                         Delhi       23-25 Sep 2013
                                                            Pune        07-09 Oct 2013
                                                            Bangalore   21-23 Oct 2013
                                                            Web based   05-07 Nov 2013
                                                            Chennai     09-11 Dec 2013
Simulink for System and Algorithm Modeling                  Delhi       26-27 Sep 2013
                                                            Pune        10-11 Oct 2013
                                                            Bangalore   24-25 Oct 2013
                                                            Web based   12-13 Nov 2013
                                                            Chennai     12-13 Dec 2013
MATLAB Programming Techniques                               Bangalore   18-19 Nov 2013
MATLAB for Data Processing and Visualization                Bangalore   20 Nov 2013
MATLAB for Building Graphical User Interface                Bangalore   21 Nov 2013
Generating HDL Code from Simulink                           Bangalore   28-29 Nov 2013

Email: training@mathworks.in  URL: http://www.mathworks.in/services/training  Phone: 080-6632-6000
33
MathWorks Certification Program: for the first time in India!
MathWorks Certified MATLAB Associate Exam
Why certification?
– Validates proficiency with MATLAB
– Can help accelerate professional growth
– Can help increase productivity and project success, and thereby prove to be a strategic investment
Certification exam administered in English at MathWorks facilities in Bangalore on Nov 27, 2013
34
MathWorks India Contact Details
URL: http://www.mathworks.in
E-mail: info@mathworks.in
Technical Support: www.mathworks.in/myservicerequests
Tel: +91-80-6632 6000
Fax: +91-80-6632 6010
Thank You for Attending
Talk to Us – We are Happy to Support You
MathWorks India Private Limited
Salarpuria Windsor Building
Third Floor,
No.3 Ulsoor Road
Bangalore - 560042, Karnataka
India