Alex ACM SC Machine Learning Day | Introduction to Machine Learning, by Eng. Ibrahim Sabek (7/30/2019)
Introduction to Machine Learning
Ibrahim Sabek
Computer and Systems Engineering Department, Faculty of Engineering,
Alexandria University, Egypt
Agenda
1 Machine learning overview and applications
2 Supervised vs. Unsupervised learning
3 Generative vs. Discriminative models
4 Overview of Classification
5 The big picture
6 Bayesian inference
7 Summary
8 Feedback
Machine learning overview and applications
What is Machine Learning (ML)?
Definition: algorithms for inferring unknowns from knowns.
What do we mean by inferring? How do we get unknowns from knowns?
ML applications
Spam detection
Handwriting detection
Speech recognition
Netflix's recommendation system
Classes of ML models
Supervised vs. Unsupervised
Generative vs. Discriminative
Supervised vs. Unsupervised learning
Supervised vs. UnsupervisedSupervised: Given (x1,y1), (x2,y2), ......, (xn,yn), choose a
function f(xi) = yi
xi R2, xi = data pointsyi = class/valueClassification: yi {finite set}
Regression: yi
R
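The classification/regression split can be sketched in a few lines. This is my own toy illustration (data and function names are assumptions, not from the slides): the same kind of training pairs feed a least-squares fit when yi is a real number and a nearest-point labeler when yi comes from a finite set.

```python
# Regression: y_i is a real value, so fit a least-squares line y = a*x + b.
def fit_line(pairs):
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    a = (sum((x - mx) * (y - my) for x, y in pairs)
         / sum((x - mx) ** 2 for x, _ in pairs))
    return a, my - a * mx

# Classification: y_i comes from a finite set, so predict a label;
# here simply the label of the nearest training point.
def nearest_label(pairs, x):
    return min(pairs, key=lambda p: abs(p[0] - x))[1]

reg_data = [(0.0, 0.1), (1.0, 0.9), (2.0, 2.1), (3.0, 2.9)]
a, b = fit_line(reg_data)             # slope close to 1
cls_data = [(0.0, "spam"), (1.0, "spam"), (5.0, "ham")]
label = nearest_label(cls_data, 4.0)  # "ham": the nearest point is x = 5.0
```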
Supervised vs. Unsupervised learning
Unsupervised: given (x1, x2, ..., xn), find patterns in the data.
xi ∈ R², xi = data points
Clustering
Density estimation
Dimensionality reduction
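As a minimal sketch of the first item (clustering): plain 2-means on one-dimensional points, standard library only. The data and names are my own illustration, not from the slides.

```python
import random

def kmeans(points, k=2, iters=20, seed=0):
    """Plain k-means: assign each point to its nearest center, move each
    center to the mean of its cluster, repeat."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[min(range(k), key=lambda j: abs(p - centers[j]))].append(p)
        centers = [sum(c) / len(c) if c else centers[j]
                   for j, c in enumerate(clusters)]
    return sorted(centers)

data = [0.9, 1.0, 1.1, 9.0, 9.1, 8.9]  # two obvious groups near 1 and 9
centers = kmeans(data)                  # converges to centers near 1.0 and 9.0
```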
Variations on Supervised and Unsupervised
Semi-supervised: given (x1, y1), (x2, y2), ..., (xk, yk) and xk+1, xk+2, ..., xn, predict yk+1, yk+2, ..., yn
Active learning: the learner chooses which points to query labels for.
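A sketch of the semi-supervised setup above, under a deliberately simple strategy of my own choosing (not from the slides): each unlabeled xj takes the label of its nearest labeled neighbor.

```python
def predict_unlabeled(labeled, unlabeled_xs):
    # For each unlabeled x, copy the label of the closest labeled point.
    return [min(labeled, key=lambda p: abs(p[0] - x))[1] for x in unlabeled_xs]

labeled = [(0.0, "A"), (10.0, "B")]           # (x_1, y_1), ..., (x_k, y_k)
unlabeled = [1.0, 2.0, 9.0]                   # x_{k+1}, ..., x_n
preds = predict_unlabeled(labeled, unlabeled)  # ['A', 'A', 'B']
```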
Variations on Supervised and Unsupervised
Decision theory: measure the prediction performance on unlabeled data.
Reinforcement learning:
maximize rewards (minimize losses) through actions
maximize the overall lifetime reward
Generative vs. Discriminative models
Given (x1, y1), (x2, y2), ..., (xn, yn) and a new point (x, y):
Discriminative: estimate p(y = 1|x) and p(y = 0|x) for y ∈ {0, 1}
Generative: estimate the joint distribution p(x, y)
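With discrete data, the two routes can be contrasted by plain counting. This is my own toy example (data and names assumed): the discriminative route estimates p(y|x) directly from conditional counts, while the generative route estimates the joint p(x, y) first and then conditions.

```python
from collections import Counter

data = [("sunny", 1), ("sunny", 1), ("sunny", 0), ("rainy", 0), ("rainy", 0)]

def discriminative(data, x, y):
    # Estimate p(y|x) directly: fraction of matching-x rows with label y.
    rows = [yy for xx, yy in data if xx == x]
    return rows.count(y) / len(rows)

def generative(data, x, y):
    # Estimate the joint p(x, y), then condition: p(y|x) = p(x, y) / p(x).
    joint = Counter(data)
    p_xy = joint[(x, y)] / len(data)
    p_x = sum(1 for xx, _ in data if xx == x) / len(data)
    return p_xy / p_x

d = discriminative(data, "sunny", 1)  # 2/3
g = generative(data, "sunny", 1)      # also 2/3: same answer on count data
```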
Overview of Classification
k-Nearest Neighbor classification (kNN)
Given D = {(x1, y1), (x2, y2), ..., (xn, yn)} and a new point (x, y), where xi ∈ R, yi ∈ {0, 1}:
Dissimilarity metric: d(x, x′) = ||x − x′||² (with k = 1, predict the nearest point's label)
Probabilistic interpretation: given fixed k, p(y) = fraction of points xi in Nk(x) such that yi = y
ŷ = argmax_y p(y|x, D)
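The kNN rule above can be sketched directly, assuming scalar x and 0/1 labels (function names and data are mine):

```python
from collections import Counter

def knn_predict(D, x, k=3):
    # N_k(x): the k training points closest to x under |x - x'|
    neighbors = sorted(D, key=lambda p: abs(p[0] - x))[:k]
    counts = Counter(y for _, y in neighbors)   # p(y) = fraction with y_i = y
    return counts.most_common(1)[0][0]          # argmax_y p(y | x, D)

D = [(0.0, 0), (0.5, 0), (1.0, 0), (4.0, 1), (4.5, 1), (5.0, 1)]
pred = knn_predict(D, 4.2, k=3)  # 1: the 3 nearest neighbors all have y = 1
```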
Classification trees (CART)
Given D = {(x1, y1), (x2, y2), ..., (xn, yn)} and a new x, where xi ∈ R, yi ∈ {0, 1}:
Build a binary tree
Minimize the error in each leaf
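A one-level sketch of "build a binary tree, minimize the error in each leaf" (a depth-1 stump of my own construction, not the full CART algorithm): try every threshold and keep the split whose two leaves, each predicting its majority label, misclassify the fewest points.

```python
def majority(ys):
    return max(set(ys), key=ys.count)

def best_stump(D):
    best = None
    for t, _ in D:                               # candidate thresholds
        left = [y for x, y in D if x <= t]
        right = [y for x, y in D if x > t]
        if not left or not right:
            continue
        # total leaf error: points disagreeing with their leaf's majority label
        err = (sum(y != majority(left) for y in left)
               + sum(y != majority(right) for y in right))
        if best is None or err < best[0]:
            best = (err, t, majority(left), majority(right))
    return best  # (error, threshold, left-leaf label, right-leaf label)

D = [(1.0, 0), (2.0, 0), (3.0, 1), (4.0, 1)]
stump = best_stump(D)  # (0, 2.0, 0, 1): splitting at x <= 2 is error-free
```

A full CART tree would recurse on each leaf until the leaves are pure enough; the stump is the single split at the root.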
Regression trees (CART)
Given D = {(x1, y1), (x2, y2), ..., (xn, yn)} and a new x, where xi ∈ R, yi ∈ R
Bootstrap aggregation (Bagging)
Given D = {(x1, y1), (x2, y2), ..., (xn, yn)} drawn i.i.d. from P, and a new x where xi ∈ R, yi ∈ R, we need to find its y value.
Intuition: averaging makes your prediction close to the true label.
Draw different training datasets: (xi^k, yi^k) ~ Uniform(D), i.i.d. (sampling with replacement).
The final label y is the average of the labels generated from the different datasets.
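The steps above can be sketched for regression; the base learner here is a deliberately crude nearest-point predictor of my own choosing (any weak learner would do):

```python
import random
import statistics

def crude_predict(sample, x):
    # weak base learner: y value of the nearest point in this bootstrap sample
    return min(sample, key=lambda p: abs(p[0] - x))[1]

def bagging_predict(D, x, B=50, seed=0):
    rng = random.Random(seed)
    preds = []
    for _ in range(B):
        boot = [rng.choice(D) for _ in D]   # (x_i, y_i) ~ Uniform(D), i.i.d.
        preds.append(crude_predict(boot, x))
    return statistics.mean(preds)           # final y: average over the datasets

D = [(0.0, 0.0), (1.0, 1.2), (2.0, 1.8), (3.0, 3.1)]
pred = bagging_predict(D, 1.5)  # averages noisy guesses, landing between them
```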
Random forests
Given D = {(x1, y1), (x2, y2), ..., (xn, yn)} where xi ∈ R, yi ∈ R:
For i = 1, ..., B:
Choose a bootstrap sample Di from D.
Construct a tree Ti using Di such that, at each node, a random subset of the features is chosen and only splits on these features are considered.
Given x, take the majority vote (for classification) or the average (for regression).
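The loop above can be sketched for classification with 2-D points, assuming depth-1 trees and a random feature "subset" of size one (all names and data are mine; real forests grow deeper trees):

```python
import random
from collections import Counter

def stump(sample, feat):
    # best threshold on the chosen feature; each side predicts its majority label
    best = None
    for t in [p[0][feat] for p in sample]:
        left = [y for x, y in sample if x[feat] <= t]
        right = [y for x, y in sample if x[feat] > t]
        if not left or not right:
            continue
        lmaj = max(set(left), key=left.count)
        rmaj = max(set(right), key=right.count)
        err = sum(y != lmaj for y in left) + sum(y != rmaj for y in right)
        if best is None or err < best[0]:
            best = (err, feat, t, lmaj, rmaj)
    return best

def forest_predict(D, x, B=25, seed=0):
    rng = random.Random(seed)
    votes = []
    for _ in range(B):
        Di = [rng.choice(D) for _ in D]       # bootstrap sample D_i
        s = stump(Di, rng.randrange(len(x)))  # random feature for this tree
        if s is None:
            continue                          # degenerate sample: skip
        _, feat, t, lmaj, rmaj = s
        votes.append(lmaj if x[feat] <= t else rmaj)
    return Counter(votes).most_common(1)[0][0]  # majority vote

D = [((0, 0), 0), ((1, 0), 0), ((0, 1), 0),
     ((5, 5), 1), ((6, 5), 1), ((5, 6), 1)]
pred = forest_predict(D, (5, 5))  # the clusters are well separated: vote is 1
```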
The big picture
Given the expected loss function E[L(y, f(x))] and D = {(x1, y1), (x2, y2), ..., (xn, yn)} where xi ∈ R, yi ∈ R, we want to estimate p(y|x).
Discriminative: estimate p(y|x) directly using D (kNN, trees, SVM).
Generative: estimate p(x, y) directly using D, and then p(y|x) = p(x, y) / p(x); we also have p(x, y) = p(x|y) p(y).
Parameters/latent variables θ: by including parameters, we have p(x, y|θ).
For a discrete parameter space: p(y|x, D) = Σ_θ p(y|x, D, θ) p(θ|x, D)
p(y|x, D, θ) is nice
p(θ|x, D) is nasty (called the posterior distribution on θ)
the summation (or integration, for a continuous space) is nasty and often intractable
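The sum over a discrete parameter space can be made concrete with a tiny numeric sketch: three candidate coin biases θ, an assumed posterior over them (the numbers here are purely illustrative, not derived from any dataset), and a prediction averaged over θ.

```python
thetas = [0.2, 0.5, 0.8]     # candidate parameters theta
posterior = [0.1, 0.3, 0.6]  # p(theta | x, D), assumed already computed

def p_y_given_theta(y, theta):
    # p(y | x, D, theta): a Bernoulli(theta) likelihood
    return theta if y == 1 else 1 - theta

# p(y=1 | x, D) = sum over theta of p(y=1 | x, D, theta) * p(theta | x, D)
p_y1 = sum(p_y_given_theta(1, th) * w for th, w in zip(thetas, posterior))
# 0.2*0.1 + 0.5*0.3 + 0.8*0.6 = 0.65
```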
The big picture
p(y|x, D) = Σ_θ p(y|x, D, θ) p(θ|x, D)
Exact inference: multivariate Gaussian; graphical models
Point estimate of θ: Maximum Likelihood Estimation (MLE); Maximum A Posteriori (MAP), estimating θ̂ = argmax_θ p(θ|x, D)
Deterministic approximation: Laplace approximation; variational methods
Stochastic approximation: importance sampling; Gibbs sampling
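The two point estimates can be contrasted on a Bernoulli parameter θ with a conjugate Beta(a, b) prior; the closed forms below are the standard ones, and a = b = 2 is an illustrative choice of mine.

```python
def mle(heads, n):
    # argmax_theta p(D | theta) for Bernoulli data: the sample frequency
    return heads / n

def map_estimate(heads, n, a=2, b=2):
    # argmax_theta p(theta | D) with a Beta(a, b) prior (posterior mode)
    return (heads + a - 1) / (n + a + b - 2)

h, n = 9, 10
theta_mle = mle(h, n)           # 0.9
theta_map = map_estimate(h, n)  # (9+1)/(10+2) = 0.8333...: prior pulls toward 0.5
```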
Bayesian inference
Put distributions on everything, and then use the rules of probability to infer values.
Aspects of Bayesian inference:
Priors: assume a prior distribution p(θ)
Procedures: minimize the expected loss (averaging over θ)
Pros:
Directly answers questions.
Avoids overfitting.
Cons:
Must assume a prior.
Exact computation can be intractable.
Directed graphical models
Bayesian networks, or conditional independence diagrams:
Why? Tractable inference.
Factorization of the probabilistic model.
A notational device.
Visualization for inference algorithms.
Example of thinking graphically about p(a, b, c):
p(a, b, c) = p(c|a, b) p(a, b) = p(c|a, b) p(b|a) p(a)
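The factorization above can be checked numerically on a tiny joint table over binary a, b, c (the probabilities are made up, chosen only to sum to 1):

```python
import itertools

joint = {abc: p for abc, p in zip(
    itertools.product([0, 1], repeat=3),
    [0.10, 0.05, 0.15, 0.10, 0.20, 0.05, 0.05, 0.30])}

def marg(**fixed):
    # marginal probability that the named variables take the fixed values
    return sum(p for (a, b, c), p in joint.items()
               if all({"a": a, "b": b, "c": c}[k] == v for k, v in fixed.items()))

a, b, c = 1, 0, 1
lhs = joint[(a, b, c)]
# p(c|a,b) * p(b|a) * p(a), each factor computed from marginals
rhs = ((marg(a=a, b=b, c=c) / marg(a=a, b=b))
       * (marg(a=a, b=b) / marg(a=a))
       * marg(a=a))
# lhs == rhs: the chain rule holds exactly for any joint distribution
```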
Summary
Machine learning is an essential field for our lives.
Machine learning is a broad world; we have only scratched the surface in this session :D
Feedback
Your feedback is welcome at alex.acm.org/feedback/machine/