Intro to machine learning

Artificial Intelligence [ECS 801] Presentation

Subject Professor: Dr Y.N. Singh

Topic: Introduction to Machine Learning

Presented by:

Akshay Kanchan (1205210006)

Mohd Iqbal (1305210903)

Institute of Engineering and Technology Lucknow

In Artificial Intelligence, an intelligent machine should be able to:

1. Think and act rationally

2. Store and retrieve knowledge

3. Adapt and learn in new environments and with new data (Machine Learning)

What is machine learning?

“Field of study that gives computers the ability to learn without being explicitly programmed.”

Traditional Programming: Data + Program → Computer → Output

Machine Learning: Data + Output → Computer → Program
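As a rough illustration of this contrast, the sketch below writes one rule by hand and derives a second rule from labelled data. The temperature data, the "hot" labels, and the function names are all made up for this example.

```python
# Traditional programming: a person writes the rule (the "program") by hand.
def handwritten_rule(temperature_c):
    return "hot" if temperature_c >= 30 else "not hot"   # cut-off picked by the programmer

# Machine learning: the rule is derived from data plus the desired outputs.
def learn_threshold(temperatures, labels):
    """Return the cut-off that misclassifies the fewest training examples."""
    def errors(threshold):
        return sum((t >= threshold) != (label == "hot")
                   for t, label in zip(temperatures, labels))
    return min(sorted(temperatures), key=errors)

# Hypothetical training data (inputs) and labels (desired outputs).
temperatures = [12, 18, 22, 25, 27, 29, 31, 35]
labels = ["not hot", "not hot", "not hot", "not hot", "hot", "hot", "hot", "hot"]

threshold = learn_threshold(temperatures, labels)   # the learned "program"

def learned_rule(temperature_c):
    return "hot" if temperature_c >= threshold else "not hot"
```

Here the data place the cut-off at 27, so the two rules disagree on a 28 degree day: the hand-written rule encodes the programmer's guess, the learned rule encodes what the examples say.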

Where is Machine Learning being Used

- autonomous, self-driving cars

- predicting election results

- developing pharmaceutical drugs (combinatorial chemistry)

- predicting tastes in music (Pandora)

- predicting tastes in movies/shows (Netflix)

- search engines (Google)

- predicting interests (Facebook)

- predicting other books you might like (Amazon)

ML in our daily lives

More Places where ML is being used

Brief History

• 1950: Alan Turing creates the “Turing Test” to determine whether a computer has real intelligence.

• 1952: Arthur Samuel writes the first computer learning program, which plays the game of checkers.

• 1957: Frank Rosenblatt designs the first neural network for computers (the perceptron).

• 1967: The “nearest neighbour” algorithm is written, allowing computers to begin using very basic pattern recognition.

• 1979: Students at Stanford University invent the “Stanford Cart”, which can navigate obstacles in a room on its own.

• 1990s: Work on machine learning shifts from a knowledge-driven approach to a data-driven approach. Scientists begin creating programs for computers to analyze large amounts of data and draw conclusions, or “learn”, from the results.

• 2000: Honda introduces ASIMO, a humanoid robot it designed and developed.

• 2016: Google's program AlphaGo beats a professional world Go champion by 4 games to 1.

Formal Definition

In machine learning, a computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E.
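One way to make E, T and P concrete is the toy sketch below. The task, the hidden rule `x >= 50`, and the eight training examples are assumptions invented for the demo; the point is only that the score P rises as more of the experience E is used.

```python
def train(examples):
    """Learn a cut-off halfway between the largest 'low' and the smallest 'high' example."""
    largest_low = max(x for x, y in examples if y == 0)
    smallest_high = min(x for x, y in examples if y == 1)
    return (largest_low + smallest_high) / 2

def accuracy(threshold, test_set):
    """Performance measure P: fraction of test points labelled correctly."""
    correct = sum((x >= threshold) == (y == 1) for x, y in test_set)
    return correct / len(test_set)

# Task T: decide whether a number is "high" (1) or "low" (0).
# The hidden rule x >= 50 is made up for this demo.
test_set = [(x, int(x >= 50)) for x in range(100)]

# Experience E: labelled examples, revealed a few at a time.
experience = [(5, 0), (60, 1), (20, 0), (58, 1), (35, 0), (54, 1), (48, 0), (51, 1)]

# P measured after seeing 2, 4, 6 and 8 examples.
scores = [accuracy(train(experience[:n]), test_set) for n in (2, 4, 6, 8)]
```

With this data the scores never decrease and reach 1.0 once all eight examples have been seen, which is what "learning" means under the definition.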

Why is Machine Learning Important?

• Some tasks cannot be defined well, except by examples (e.g., recognizing people).

• Relationships and correlations can be hidden within large amounts of data. Machine Learning may be able to find these relationships.


Areas of Influence for Machine Learning

• Statistics: How best to use samples drawn from unknown probability distributions to help decide from which distribution some new sample is drawn.

• Psychology: How to model human performance on various learning tasks?

• Economics: How to write algorithms to maximize profits.

• Neural/Brain Models: How to model certain aspects of biological evolution to improve the performance of computer programs?


Steps involved in Learning:

• Prepare Data: remove noise, smooth, extract features, reduce dimensionality.

• Choose an Algorithm: linear or non-linear, weighing complexity, speed, and accuracy.

• Train a Model: prevent overfitting and underfitting.

• Test the Model.

• Use for Prediction.
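The five steps listed on this slide can be sketched end to end in a few lines. Everything specific here is an assumption made for illustration: the cat/dog measurements are invented, min-max scaling stands in for "prepare data", and a nearest-centroid classifier stands in for the chosen algorithm.

```python
import random

def prepare(rows):
    """Prepare data: rescale every feature to the range [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    spans = [(max(c) - min(c)) or 1.0 for c in columns]
    return [[(v - lo) / s for v, lo, s in zip(row, lows, spans)] for row in rows]

def train(features, labels):
    """Train a model: here, one mean point (centroid) per class."""
    model = {}
    for label in set(labels):
        members = [f for f, lab in zip(features, labels) if lab == label]
        model[label] = [sum(col) / len(members) for col in zip(*members)]
    return model

def predict(model, point):
    """Use for prediction: pick the class whose centroid is closest."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(model, key=lambda label: sq_dist(model[label], point))

# Hypothetical data: [height_cm, weight_kg] -> "cat" or "dog".
raw = [[23, 4], [25, 5], [24, 4.5], [22, 3.8], [60, 30], [55, 25], [65, 32], [58, 28]]
labels = ["cat"] * 4 + ["dog"] * 4

features = prepare(raw)
indices = list(range(len(features)))
random.Random(0).shuffle(indices)                # fixed seed so the split is repeatable
train_idx, test_idx = indices[:6], indices[6:]   # hold out 2 rows for testing

model = train([features[i] for i in train_idx], [labels[i] for i in train_idx])

# Test the model on rows it never saw during training.
test_accuracy = sum(predict(model, features[i]) == labels[i] for i in test_idx) / len(test_idx)
```

Scoring on held-out rows is what exposes overfitting: a model that only memorised its training rows would score well there and poorly here. In real use, new inputs would have to be rescaled with the same minimums and spans as the training data.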

http://in.mathworks.com/help/stats/supervised-learning-machine-learning-workflow-and-algorithms.html?requestedDomain=www.mathworks.com#bswlxhb

http://in.mathworks.com/help/stats/supervised-learning-machine-learning-workflow-and-algorithms.html?requestedDomain=www.mathworks.com#bswlxht


Learning: Training and Test Data

Prediction

Learning Example

[Figure: Training: training images and training labels → image features → training → learned model. Testing: test image → image features → learned model.]

Reinforcement Learning


Supervised learning

The correct classes of the training data are known.

Supervised Learning Algorithms

1. Naïve Bayes

2. k-Nearest Neighbours

3. Support Vector Machine

4. Decision Tree

5. Neural Network

6. Bayesian Network

7. Random Forest

Etc.

K-nearest neighbor

[Figure: scatter plot of two classes (x and o) over features x1 and x2, with new query points marked +.]

The principle behind nearest neighbour methods is to find a predefined number of training samples closest in distance to the new point, and predict the label from these.

1-nearest neighbor

[Figure: the same x/o scatter plot over x1 and x2, with query points + classified using 1 nearest neighbor.]

3-nearest neighbor

[Figure: the same x/o scatter plot over x1 and x2, with query points + classified using 3 nearest neighbors.]
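The difference between 1-nearest and 3-nearest neighbour can be seen in a small sketch. The points below are made up in the spirit of the x/o scatter plots and do not reproduce the figures; `knn_predict` is a name chosen for this example.

```python
from collections import Counter
import math

def knn_predict(training_points, training_labels, query, k):
    """Label `query` by majority vote among its k nearest training points."""
    by_distance = sorted(
        zip(training_points, training_labels),
        key=lambda pair: math.dist(pair[0], query),   # Euclidean distance
    )
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Made-up 2-D points: four 'x' in the upper left, four 'o' in the lower right,
# plus one stray 'x' at (4, 3.2) sitting close to the 'o' group.
points = [(1, 5), (2, 6), (2, 4), (3, 5), (4, 3.2), (5, 2), (6, 1), (6, 3), (7, 2)]
labels = ["x", "x", "x", "x", "x", "o", "o", "o", "o"]

query = (4.5, 2.8)                                   # the new point, '+' on the slides
label_1nn = knn_predict(points, labels, query, k=1)  # decided by the stray 'x' alone
label_3nn = knn_predict(points, labels, query, k=3)  # two 'o' neighbours outvote it
```

With k = 1 the single closest point decides, so the stray 'x' wins; with k = 3 the vote goes to 'o'. Larger k makes the prediction less sensitive to individual outliers.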

Naïve Bayes

• Naive Bayes methods are a set of supervised learning algorithms based on applying Bayes’ theorem with the “naive” assumption of independence between every pair of features.

• Uses a probabilistic approach to assign a label to data.

• Based on Bayes' rule.

• It combines the prior probability, the likelihood, and the evidence to compute the posterior probability used for classification.
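A minimal sketch of these ideas follows, for yes/no features only. The spam/ham data, the two features, and the add-one smoothing are assumptions of this example, not something the slides specify.

```python
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    """Count what is needed for the priors P(class) and likelihoods P(value | class)."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)   # (feature index, class) -> Counter of values
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
    return class_counts, value_counts

def posterior(model, row):
    """Return P(class | row) for every class under the naive independence assumption."""
    class_counts, value_counts = model
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = count / total                                  # prior
        for i, value in enumerate(row):
            seen = value_counts[(i, label)][value]
            # Likelihood with add-one smoothing; the "+ 2" assumes two possible values per feature.
            score *= (seen + 1) / (count + 2)
        scores[label] = score
    evidence = sum(scores.values())                            # normalising constant
    return {label: score / evidence for label, score in scores.items()}

# Hypothetical emails: (contains "offer", contains "meeting") -> label.
rows = [("yes", "no"), ("yes", "no"), ("yes", "yes"),
        ("no", "yes"), ("no", "yes"), ("no", "no")]
labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

model = train_naive_bayes(rows, labels)
result = posterior(model, ("yes", "no"))
```

Multiplying the per-feature likelihoods is exactly where the "naive" independence assumption enters: each feature is treated as if it told us nothing about the others once the class is known.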

Support Vector Machine

Support vector machines (SVMs) are a set of supervised learning methods used for classification, regression and outlier detection.

• Effective in high-dimensional spaces.

• Still effective in cases where the number of dimensions is greater than the number of samples.

• Uses a subset of training points in the decision function (called support vectors), so it is also memory efficient.

• Versatile: different kernel functions can be specified for the decision function. Common kernels are provided, but it is also possible to specify custom kernels.

http://scikit-learn.org/stable/modules/svm.html#svm-classification

http://scikit-learn.org/stable/modules/svm.html#svm-regression

http://scikit-learn.org/stable/modules/svm.html#svm-outlier-detection

http://scikit-learn.org/stable/modules/svm.html#svm-kernels

SVM (cont’d)

• SVMs try to maximize the margin of the separating hyperplane.

• SVMs use kernel functions that take a low-dimensional input space and map it to a higher-dimensional space, e.g. (X, Y) → kernel → (X1, X2, X3).

• An SVM is configured through parameters such as gamma, C, and the kernel.

Kernel function
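The mapping from (X, Y) to (X1, X2, X3) can be illustrated with one specific kernel. The slide does not say which kernel it has in mind, so the degree-2 polynomial kernel and the `feature_map` below are assumptions chosen because they turn a 2-D point into a 3-D one.

```python
import math

def feature_map(point):
    """Map a 2-D point (x, y) to 3-D: (x^2, sqrt(2)*x*y, y^2)."""
    x, y = point
    return (x * x, math.sqrt(2) * x * y, y * y)

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def polynomial_kernel(a, b):
    """Degree-2 polynomial kernel, computed entirely in the original 2-D space."""
    return dot(a, b) ** 2

a, b = (1.0, 2.0), (3.0, 0.5)
explicit = dot(feature_map(a), feature_map(b))   # inner product after mapping to 3-D
implicit = polynomial_kernel(a, b)               # same value without building the 3-D vectors
```

Both routes give the same number, which is the reason kernels are practical: the SVM behaves as if it worked in the higher-dimensional space while only ever computing with the original inputs.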

SVM (cont’d)

1. Kernel: can be linear or non-linear (for example polynomial or RBF).

2. Gamma: describes how far the influence of a single training example reaches. For low gamma values the influence reaches far; for high gamma values it stays close to the example.

3. C parameter: controls whether the decision boundary is smooth or follows the training data closely. It is a trade-off between bias and variance.

Low C value: smooth decision boundary.

High C value: tries to classify every training example correctly, giving a more complex boundary.
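The effect of gamma can be checked numerically with the RBF kernel, exp(-gamma * squared distance). The sample points and the two gamma values below are arbitrary choices for illustration; the sketch does not cover the C parameter.

```python
import math

def rbf_kernel(a, b, gamma):
    """RBF kernel: exp(-gamma * squared Euclidean distance between a and b)."""
    squared_distance = sum((p - q) ** 2 for p, q in zip(a, b))
    return math.exp(-gamma * squared_distance)

training_example = (0.0, 0.0)
near, far = (0.5, 0.0), (3.0, 0.0)

# For each gamma: how strongly the training example influences a near and a far point.
influence = {
    gamma: (rbf_kernel(training_example, near, gamma),
            rbf_kernel(training_example, far, gamma))
    for gamma in (0.1, 10.0)
}
```

With gamma = 0.1 the far point still receives a sizeable kernel value, so the example's influence reaches far. With gamma = 10 the value is essentially zero at the far point and already small at the near one, so the influence stays close.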

Decision Tree

• Decision Trees (DTs) are a non-parametric supervised learning method. The goal is to create a model that predicts the value of a target variable by learning simple decision rules inferred from the data features.

• Uses a white box model: if a given situation is observable in the model, the explanation for the condition is easily expressed in Boolean logic.

• The problem of learning an optimal decision tree is known to be NP-complete, so practical algorithms are greedy heuristics that make a locally optimal decision at each node.
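A single greedy step of tree building might look like the sketch below. Gini impurity is used as the split criterion, which is one common choice rather than anything the slides specify, and the customer data is invented.

```python
def gini(labels):
    """Gini impurity: 0 when all labels agree, larger when classes are mixed."""
    total = len(labels)
    return 1.0 - sum((labels.count(c) / total) ** 2 for c in set(labels))

def best_split(rows, labels):
    """Greedy step: try every (feature, threshold), keep the lowest weighted impurity."""
    best = None
    for feature in range(len(rows[0])):
        for threshold in sorted({row[feature] for row in rows}):
            left = [lab for row, lab in zip(rows, labels) if row[feature] <= threshold]
            right = [lab for row, lab in zip(rows, labels) if row[feature] > threshold]
            if not left or not right:
                continue                      # a split must send rows to both sides
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, feature, threshold)
    return best

# Hypothetical data: [age, income_k] -> whether a customer bought the product.
rows = [[22, 20], [25, 45], [30, 25], [35, 60], [40, 30], [45, 80], [50, 35], [55, 90]]
labels = ["no", "yes", "no", "yes", "no", "yes", "no", "yes"]

score, feature, threshold = best_split(rows, labels)
```

The result reads directly as a rule, "income_k <= 35 means no, otherwise yes", which is the white-box property in practice. A full tree would repeat this step on each side of the split until the leaves are pure or a depth limit is reached.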

Regression

• Regression analysis includes many techniques for modelling and analysing several variables, when the focus is on the relationship between a dependent variable and one or more independent variables (or 'predictors').

• It is also used to understand which of the independent variables are related to the dependent variable, and to explore the forms of these relationships.

https://en.wikipedia.org/wiki/Dependent_variable

https://en.wikipedia.org/wiki/Independent_variable
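The simplest instance of regression, one independent variable fitted by ordinary least squares, can be sketched as follows. The area and price figures are invented and deliberately lie on a straight line so the fit is easy to check by hand.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one independent variable: y ~ slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    return slope, mean_y - slope * mean_x

# Hypothetical data: house area in square metres (independent variable)
# and price in thousands (dependent variable).
areas = [50, 70, 90, 110, 130]
prices = [150, 200, 250, 300, 350]

slope, intercept = fit_line(areas, prices)
predicted_price = slope * 100 + intercept   # prediction for a 100 square metre house
```

The slope describes the form of the relationship (how much the price changes per extra square metre), and the fitted line gives a continuous prediction for inputs that were never observed.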

Classification vs Regression

• Classification means assigning the output to a class.

• Example: using classification to predict the type of a tumor, i.e. harmful or not harmful, from training data.

• If the output is a discrete/categorical variable, it is a classification problem.

• Regression means predicting a numeric output value from training data.

• Example: using regression to predict a house price from training data.

• If the output is a real/continuous number, it is a regression problem.

Unsupervised learning

The correct classes of the training data are not known.

Clustering

• Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense or another) to each other than to those in other groups (clusters).

K means clustering

• The algorithm clusters data by trying to separate samples in n groups of equal variance, minimizing a criterion known as the inertia or within-cluster sum-of-squares.

• This algorithm requires the number of clusters to be specified.

• It scales well to large numbers of samples and has been used across a wide range of application areas in many different fields.

http://scikit-learn.org/stable/modules/inertia
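A bare-bones version of k-means (Lloyd's algorithm) and its inertia criterion is sketched below. The points and the starting centroids are made up, and real implementations add smarter initialisation and a convergence check instead of a fixed iteration count.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Lloyd's algorithm: alternate between assigning points and moving centroids."""
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster (keep it if empty).
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
    return centroids, clusters

def inertia(centroids, clusters):
    """Within-cluster sum of squares: the criterion k-means tries to minimise."""
    return sum(math.dist(p, c) ** 2 for c, cluster in zip(centroids, clusters) for p in cluster)

# Two made-up blobs; the number of clusters (2) is fixed by the starting centroids.
points = [(1, 1), (1, 2), (2, 1), (2, 2), (8, 8), (8, 9), (9, 8), (9, 9)]
centroids, clusters = kmeans(points, centroids=[(1, 1), (2, 2)])
total_inertia = inertia(centroids, clusters)
```

Even though both starting centroids sit inside the same blob, the alternating steps pull one of them over to the second blob within a couple of iterations, and the inertia settles at the sum of squared distances of each point to its blob's centre.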

K means Clustering Example

http://scikit-learn.org/stable/auto_examples/cluster/plot_kmeans_assumptions.html

Reinforcement learning

The algorithm is presented with a state that depends on the input data, takes an action, and is rewarded or punished for that action. This loop continues over time.
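The state, action, reward loop can be sketched with tabular Q-learning, one common reinforcement learning method that the slides do not name. The five-cell corridor, the reward values, and the learning constants are all invented for this example.

```python
import random

N_STATES, GOAL = 5, 4          # corridor cells 0..4; reaching cell 4 ends an episode
LEFT, RIGHT = 0, 1
ALPHA, DISCOUNT = 0.5, 0.9     # learning rate and discount factor (arbitrary demo values)

def step(state, action):
    """Environment: return (next_state, reward) for the chosen action."""
    if action == RIGHT:
        next_state = state + 1
        return next_state, (1.0 if next_state == GOAL else 0.0)   # rewarded at the goal
    if state == 0:
        return 0, -0.1                                            # punished for hitting the wall
    return state - 1, 0.0

rng = random.Random(0)                       # fixed seed so the run is repeatable
q = [[0.0, 0.0] for _ in range(N_STATES)]    # q[state][action] = estimated long-term reward

for episode in range(200):
    state = 0
    while state != GOAL:
        action = rng.choice((LEFT, RIGHT))               # explore at random while learning
        next_state, reward = step(state, action)
        target = reward + DISCOUNT * max(q[next_state])  # reward now plus best estimate later
        q[state][action] += ALPHA * (target - q[state][action])
        state = next_state

# Greedy policy read off the learned table for every non-goal cell.
policy = ["right" if q[s][RIGHT] > q[s][LEFT] else "left" for s in range(GOAL)]
```

Nobody tells the agent that moving right is correct. It only sees rewards and penalties, yet the learned table ends up preferring "right" in every cell because that action leads to the reward soonest.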

Markov model

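• Markov models are closely tied to reinforcement learning: Markov decision processes describe its states, actions, and rewards. A hidden Markov model (HMM) is a Markov model whose states are not directly observed.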

• There are three fundamental problems for HMMs:

1. Given the model parameters and observed data, estimate the optimal sequence of hidden states.

2. Given the model parameters and observed data, calculate the likelihood of the data.

3. Given just the observed data, estimate the model parameters.
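The first two problems have short dynamic-programming solutions, sketched below with the Viterbi and forward algorithms. The two-state weather model and all of its probabilities are a made-up textbook-style example, and the third problem (usually solved with the Baum-Welch algorithm) is left out.

```python
# Hypothetical HMM: the weather is hidden, only the person's activity is observed.
states = ("Rainy", "Sunny")
start = {"Rainy": 0.6, "Sunny": 0.4}
transition = {"Rainy": {"Rainy": 0.7, "Sunny": 0.3},
              "Sunny": {"Rainy": 0.4, "Sunny": 0.6}}
emission = {"Rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},
            "Sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1}}

def most_likely_states(observations):
    """Problem 1 (Viterbi): the single most probable hidden-state sequence and its probability."""
    best = {s: (start[s] * emission[s][observations[0]], [s]) for s in states}
    for obs in observations[1:]:
        best = {
            s: max(
                ((prob * transition[p][s] * emission[s][obs], path + [s])
                 for p, (prob, path) in best.items()),
                key=lambda item: item[0],
            )
            for s in states
        }
    return max(best.values(), key=lambda item: item[0])

def likelihood(observations):
    """Problem 2 (forward algorithm): probability of the observations over all hidden paths."""
    alpha = {s: start[s] * emission[s][observations[0]] for s in states}
    for obs in observations[1:]:
        alpha = {s: sum(alpha[p] * transition[p][s] for p in states) * emission[s][obs]
                 for s in states}
    return sum(alpha.values())

observed = ("walk", "shop", "clean")
path_probability, path = most_likely_states(observed)
data_likelihood = likelihood(observed)
```

The two functions differ in one operation: Viterbi keeps the maximum over previous states, the forward algorithm sums over them. That is why the likelihood of the data is always at least as large as the probability of the best single path.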

HMM example for 2 classes

[Figure: state diagram with nodes numbered 1 to 5.]

References

1. All definitions and explanations: http://scikit-learn.org/

2. Machine Learning History: http://www.forbes.com/

3. Images: online lectures of CMU Prof Sebastian Thrun.


Conclusion

In every field, current technologies are being replaced by smart machines: the stock market, e-commerce, personalized customer experience, and so on.

In the future, maybe presentations will be prepared and given by robots!

Thank you