Alex ACM SC Machine Learning Day [Materials] | Introduction to Machine Learning By Eng. Ibrahim Sabek

Apr 14, 2018

Transcript

    Introduction to Machine Learning

    Ibrahim Sabek

Computer and Systems Engineering Department, Faculty of Engineering,

    Alexandria University, Egypt


    Agenda

1 Machine learning overview and applications
2 Supervised vs. Unsupervised learning

    3 Generative vs. Discriminative models

    4 Overview of Classification

    5 The big picture

    6 Bayesian inference

    7 Summary

    8 Feedback



Machine learning overview and applications

What is Machine Learning (ML)?
Definition: algorithms for inferring unknowns from knowns.

What do we mean by inferring? How do we get unknowns from knowns?

ML applications

Spam detection
Handwriting recognition
Speech recognition
Netflix recommendation system

Classes of ML models

Supervised vs. Unsupervised
Generative vs. Discriminative


Supervised vs. Unsupervised learning

Supervised: Given (x1, y1), (x2, y2), ..., (xn, yn), choose a function f such that f(xi) = yi
xi ∈ R^2: data points
yi: class/value
Classification: yi ∈ {finite set}
Regression: yi ∈ R
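To make the supervised setup concrete, here is a minimal Python sketch with hypothetical hand-made data: a finite-label f for classification and a real-valued f for regression. The nearest-neighbor rules are illustrative choices of f, not anything the slides prescribe.

```python
# Classification: yi comes from a finite set, here {0, 1}.
train_c = [(1.0, 0), (2.0, 0), (8.0, 1), (9.0, 1)]

def f_classify(x):
    # Predict the label of the closest training point.
    return min(train_c, key=lambda p: abs(p[0] - x))[1]

# Regression: yi is real-valued; average the two nearest labels.
train_r = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]

def f_regress(x):
    nearest = sorted(train_r, key=lambda p: abs(p[0] - x))[:2]
    return sum(y for _, y in nearest) / 2

print(f_classify(8.5))  # → 1, a label from the finite set
print(f_regress(2.5))   # → 2.5, a real value
```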


Supervised vs. Unsupervised learning

Unsupervised: Given (x1, x2, ..., xn), find patterns in the data.
xi ∈ R^2: data points
Clustering
Density estimation
Dimensionality reduction
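The clustering bullet can be illustrated with a tiny k-means sketch. The 1-D data and starting centers are hypothetical; the slides do not prescribe any particular clustering algorithm.

```python
def kmeans_1d(xs, centers, iters=20):
    for _ in range(iters):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in centers]
        for x in xs:
            i = min(range(len(centers)), key=lambda j: abs(x - centers[j]))
            clusters[i].append(x)
        # Update step: move each center to the mean of its cluster.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers

data = [1.0, 1.2, 0.8, 9.0, 9.3, 8.7]       # two obvious groups
print(sorted(kmeans_1d(data, [0.0, 5.0])))  # centers near 1.0 and 9.0
```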


Variations on Supervised and Unsupervised

Semi-supervised: Given (x1, y1), (x2, y2), ..., (xk, yk), xk+1, xk+2, ..., xn, predict yk+1, yk+2, ..., yn

Active learning: the learner chooses which unlabeled points to query for labels


Variations on Supervised and Unsupervised

Decision theory: measure the prediction performance on unlabeled data
Reinforcement learning:
maximize rewards (minimize losses) through actions
maximize the overall lifetime reward
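As a sketch of the reward-maximization idea, here is a hypothetical two-armed bandit with an epsilon-greedy agent, one of the simplest reinforcement-learning strategies. The arm payouts and exploration rate are made up for illustration.

```python
import random

random.seed(0)
true_means = [0.3, 0.8]   # arm 1 pays off more often (unknown to the agent)
counts = [0, 0]           # pulls per arm
values = [0.0, 0.0]       # running estimate of each arm's reward

for t in range(1000):
    # Explore with probability 0.1, otherwise exploit the best-looking arm.
    arm = random.randrange(2) if random.random() < 0.1 else values.index(max(values))
    reward = 1.0 if random.random() < true_means[arm] else 0.0
    counts[arm] += 1
    values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean

print(counts)  # the better arm ends up pulled far more often
```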


Generative vs. Discriminative models

Given (x1, y1), (x2, y2), ..., (xn, yn), and a new point (x, y):

Discriminative: estimate p(y = 1|x) and p(y = 0|x) directly, for y ∈ {0, 1}

Generative: estimate the joint distribution p(x, y)
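A small numeric sketch of the generative route, using made-up count data: estimate the joint p(x, y) from frequencies, then recover p(y|x) by dividing by the marginal p(x).

```python
from collections import Counter

# Hypothetical observed (x, y) pairs over binary x and y.
data = [(0, 0), (0, 0), (0, 1), (1, 1), (1, 1), (1, 0)]
joint = Counter(data)
n = len(data)

def p_joint(x, y):
    # Empirical estimate of the joint distribution p(x, y).
    return joint[(x, y)] / n

def p_y_given_x(y, x):
    # p(y|x) = p(x, y) / p(x), with p(x) obtained by summing out y.
    px = sum(p_joint(x, yy) for yy in (0, 1))
    return p_joint(x, y) / px

print(p_y_given_x(1, x=0))  # p(y=1 | x=0)
```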

Overview of Classification

k-Nearest Neighbor classification (kNN)

Given D = {(x1, y1), (x2, y2), ..., (xn, yn)} and a new point (x, y), where xi ∈ R, yi ∈ {0, 1}
Dissimilarity metric: d(x, x') = ||x − x'||^2 (for k = 1)
Probabilistic interpretation:
Given fixed k, p(y) = fraction of points xi in Nk(x) s.t. yi = y
ŷ = argmax_y p(y|x, D)
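The slide translates almost line-for-line into code: the squared-distance metric, the k nearest neighbors Nk(x), and the argmax over label fractions. The 1-D dataset is hypothetical.

```python
from collections import Counter

D = [(1.0, 0), (1.5, 0), (2.0, 0), (8.0, 1), (8.5, 1), (9.0, 1)]

def knn_predict(x, k=3):
    # Sort by the dissimilarity metric d(x, xi) = ||x - xi||^2.
    neighbors = sorted(D, key=lambda p: (p[0] - x) ** 2)[:k]
    votes = Counter(y for _, y in neighbors)   # p(y) = fraction of each label
    return votes.most_common(1)[0][0]          # argmax_y p(y|x, D)

print(knn_predict(1.8))  # → 0
print(knn_predict(8.2))  # → 1
```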


Classification trees (CART)

Given D = {(x1, y1), (x2, y2), ..., (xn, yn)} and a new x, where xi ∈ R, yi ∈ {0, 1}
Build a binary tree
Minimize the error in each leaf
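A full CART implementation recurses on the best split; as a sketch of the "minimize the error in each leaf" idea, here is a single-split tree (a decision stump) over hypothetical 1-D data: try every threshold and keep the split with the fewest misclassified points in its two leaves.

```python
D = [(1.0, 0), (2.0, 0), (3.0, 0), (7.0, 1), (8.0, 1), (9.0, 0)]

def majority(labels):
    return max(set(labels), key=labels.count)

def fit_stump(data):
    best = None
    for t in sorted({x for x, _ in data}):
        left = [y for x, y in data if x <= t]
        right = [y for x, y in data if x > t]
        if not left or not right:
            continue
        # Leaf error: points whose label differs from their leaf's majority.
        err = sum(y != majority(left) for y in left) + \
              sum(y != majority(right) for y in right)
        if best is None or err < best[0]:
            best = (err, t, majority(left), majority(right))
    return best  # (error, threshold, left-leaf label, right-leaf label)

err, t, yl, yr = fit_stump(D)
predict = lambda x: yl if x <= t else yr
print(t, predict(2.5), predict(7.5))  # best split at 3.0; leaves predict 0 and 1
```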


Regression trees (CART)

Given D = {(x1, y1), (x2, y2), ..., (xn, yn)} and a new x, where xi ∈ R, yi ∈ R


Bootstrap aggregation (Bagging)

Given D = {(x1, y1), (x2, y2), ..., (xn, yn)} drawn iid from P, and a new x where xi ∈ R, yi ∈ R, we need to find its y value

Intuition: averaging makes your prediction close to the true label
Build different training datasets, with each (xik, yik) drawn iid from Uniform(D).
The final label y is the average of the labels generated from the different datasets.
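The two bullets map directly onto code. In this sketch the base learner is deliberately weak (it predicts the mean label of its bootstrap sample and ignores x entirely; a real learner would fit a tree here), and D is hypothetical; the point is only the resample-then-average mechanics.

```python
import random

random.seed(1)
D = [(1.0, 1.2), (2.0, 1.9), (3.0, 3.1), (4.0, 4.2)]

def bagged_predict(x, B=200):
    preds = []
    for _ in range(B):
        # A bootstrap dataset: n iid draws from Uniform(D), with replacement.
        boot = [random.choice(D) for _ in D]
        # Weak base learner: the mean label of the bootstrap sample.
        preds.append(sum(y for _, y in boot) / len(boot))
    # Final label: average the predictions from the different datasets.
    return sum(preds) / B

print(bagged_predict(2.5))  # close to the overall mean label 2.6
```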


Random forests

Given D = {(x1, y1), (x2, y2), ..., (xn, yn)} where xi ∈ R, yi ∈ R
For i = 1, ..., B:
Choose a bootstrap sample Di from D
Construct a tree Ti using Di s.t. at each node you choose a random subset of the features and only consider splitting on these features.
Given x, take the majority vote (for classification) or the average (for regression).
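The steps above can be sketched as a classification forest. For brevity each tree is shrunk to a one-split stump, the "random subset of features" is a single randomly chosen feature of two, and the data are hypothetical; the bootstrap-plus-feature-subsampling-plus-vote structure is what the slide describes.

```python
import random

random.seed(2)
# Hypothetical 2-feature points: ((f0, f1), label); both features informative.
D = [((1.0, 1.0), 0), ((2.0, 2.0), 0), ((8.0, 8.0), 1), ((9.0, 9.0), 1)]

def fit_random_stump(data, feat):
    # Best single split on the allowed feature, by leaf misclassification count.
    labels = [y for _, y in data]
    maj = max(set(labels), key=labels.count)
    best = (len(data) + 1, float("inf"), maj, maj)  # fallback: constant leaf
    for t in sorted({x[feat] for x, _ in data}):
        left = [y for x, y in data if x[feat] <= t]
        right = [y for x, y in data if x[feat] > t]
        if not left or not right:
            continue
        yl = max(set(left), key=left.count)
        yr = max(set(right), key=right.count)
        err = sum(y != yl for y in left) + sum(y != yr for y in right)
        if err < best[0]:
            best = (err, t, yl, yr)
    return (feat,) + best[1:]

def fit_forest(B=50):
    forest = []
    for _ in range(B):
        Di = [random.choice(D) for _ in D]  # bootstrap sample Di from D
        feat = random.randrange(2)          # random feature subset (size 1 here)
        forest.append(fit_random_stump(Di, feat))
    return forest

def forest_predict(forest, x):
    votes = [yl if x[f] <= t else yr for f, t, yl, yr in forest]
    return max(set(votes), key=votes.count)  # majority vote

forest = fit_forest()
print(forest_predict(forest, (8.5, 8.5)))
```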


The big picture

Given the expected loss function E[L(y, f(x))] and D = {(x1, y1), (x2, y2), ..., (xn, yn)} where xi ∈ R, yi ∈ R, we want to estimate p(y|x)

Discriminative: estimate p(y|x) directly using D.
kNN, Trees, SVM
Generative: estimate p(x, y) directly using D, and then p(y|x) = p(x, y) / p(x); we also have p(x, y) = p(x|y)p(y)
Params/Latent variables θ: by including parameters, we have p(x, y|θ)
For a discrete space: p(y|x, D) = Σ_θ p(y|x, D, θ) p(θ|x, D)
p(y|x, D, θ) is nice
p(θ|x, D) is nasty (called the posterior distribution on θ)
The summation (or integration, for a continuous space) is nasty and often intractable


The big picture

p(y|x, D) = Σ_θ p(y|x, D, θ) p(θ|x, D)

Exact inference:
Multivariate Gaussian
Graphical models
Point estimate of θ:
Maximum Likelihood Estimation (MLE)
Maximum A Posteriori (MAP): θ̂ = argmax_θ p(θ|x, D)
Deterministic approximation:
Laplace approximation
Variational methods
Stochastic approximation:
Importance sampling
Gibbs sampling
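To ground the point-estimate bullet: MLE for a coin's bias θ from hypothetical flips. For Bernoulli data the argmax has the closed form heads/n; the grid search below just confirms that numerically. MAP would additionally weight each θ by a prior p(θ).

```python
flips = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]  # 7 heads out of 10 (made-up data)

def likelihood(theta):
    # p(D | theta) for iid Bernoulli flips.
    p = 1.0
    for f in flips:
        p *= theta if f == 1 else (1 - theta)
    return p

# theta^ = argmax_theta p(D | theta), searched on a fine grid.
grid = [i / 1000 for i in range(1, 1000)]
theta_mle = max(grid, key=likelihood)
print(theta_mle)  # → 0.7, matching heads / n
```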


Bayesian inference

Put distributions on everything, then use the rules of probability to infer values

Aspects of Bayesian inference:
Priors: assume a prior distribution p(θ)
Procedures: minimize the expected loss (averaging over θ)

Pros:
Directly answers questions
Avoids overfitting

Cons:
Must assume a prior
Exact computation can be intractable


Directed graphical models

Bayesian networks, or conditional independence diagrams:
Why? Tractable inference.
Factorization of the probabilistic model
Notational device
Visualization for inference algorithms
Example of thinking graphically about p(a, b, c):
p(a, b, c) = p(c|a, b)p(a, b) = p(c|a, b)p(b|a)p(a)
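The factorization above can be checked numerically for a hypothetical joint over three binary variables, with each conditional computed from the table by division.

```python
import itertools

# A made-up joint p(a, b, c); the eight entries sum to 1.
joint = {(0, 0, 0): 0.10, (0, 0, 1): 0.05, (0, 1, 0): 0.15, (0, 1, 1): 0.10,
         (1, 0, 0): 0.20, (1, 0, 1): 0.05, (1, 1, 0): 0.05, (1, 1, 1): 0.30}

def marg(**fixed):
    # Marginal probability of the fixed variables, summing out the rest.
    return sum(p for (a, b, c), p in joint.items()
               if all({'a': a, 'b': b, 'c': c}[k] == v for k, v in fixed.items()))

for a, b, c in itertools.product((0, 1), repeat=3):
    lhs = joint[(a, b, c)]
    # p(c|a,b) * p(b|a) * p(a), each conditional built by division.
    rhs = (joint[(a, b, c)] / marg(a=a, b=b)) \
        * (marg(a=a, b=b) / marg(a=a)) \
        * marg(a=a)
    assert abs(lhs - rhs) < 1e-9

print("factorization holds for every (a, b, c)")
```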


    Summary

Machine learning is an essential field for our lives.

Machine learning is a broad world; we have only just started exploring it in this session :D :D


    Feedback

Your feedback is welcome at alex.acm.org/feedback/machine/