www.edureka.co/r-for-analytics Linear Regression in R

Linear Regression in R

Jan 25, 2017

Edureka!
Transcript
Page 1: Linear Regression in R

Linear Regression in R

Page 2: Linear Regression in R

What will you learn today?

What is Linear Regression?

How to Design a Linear Regression Model?

How to Compare Regression Models?

Hands-On: Linear Regression in R

Page 3: Linear Regression in R

Problem

Let’s assume you own a restaurant where “tips” are part of a waiter’s pay. The amount of the tip depends on the amount of the total bill.

Let’s see how we can predict the amount of the tip from the bill using linear regression.

Page 4: Linear Regression in R

Predicting the Tip

Suppose you don’t have the bill amounts, so the only data you have is the tip amount for each order.

For the first meal order the waiter got a $5 tip, and for the second meal order the waiter got a $17 tip.

Page 5: Linear Regression in R

Let’s visualize the data that we have

Page 6: Linear Regression in R

How to predict the next tip?

How can I predict the tip amount?

Page 7: Linear Regression in R

How to predict the next tip?

Since the only data we have is the tip amounts, all we can do is take their mean.

Page 8: Linear Regression in R

Conclusion

So the best estimate we can make for the tip amount from the data we have is $10, which is the mean of all the tip amounts:

Mean = ($5 + $17 + $11 + $8 + $14 + $5) / 6 = $10

Note that when you have only one variable and no other information, the best prediction you can make (the one with the smallest sum of squared errors) is the mean of the sample data itself.
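The mean calculation above can be checked in a few lines. This is a minimal sketch in Python using only the standard library; the variable names are illustrative and not part of the slides.

```python
from statistics import mean

# Tip amounts (in dollars) for the six meal orders listed on the slide.
tips = [5, 17, 11, 8, 14, 5]

# With no other variable available, the sample mean is the baseline prediction.
predicted_tip = mean(tips)
print(predicted_tip)  # 10
```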

Page 9: Linear Regression in R

Residuals (Errors)

The deviation between an actual value and its estimated value is called a residual (or error).

Page 10: Linear Regression in R

Residuals (Errors)

Note that the residuals around the mean always sum to zero: if you add up all the positive and negative deviations you get zero. In other words, the total positive deviation always equals the total negative deviation.
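As a quick check of this property, the sketch below computes each residual against the mean of the six tip amounts and adds them up. It is an illustrative Python snippet, not code from the webinar.

```python
# Tip amounts (in dollars) from the slides.
tips = [5, 17, 11, 8, 14, 5]

# The estimate for every order is the mean tip.
mean_tip = sum(tips) / len(tips)

# Residual = actual value minus estimated value.
residuals = [tip - mean_tip for tip in tips]

print(residuals)       # [-5.0, 7.0, 1.0, -2.0, 4.0, -5.0]
print(sum(residuals))  # 0.0
```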

Page 11: Linear Regression in R

Sum of Squared Residuals (Errors)

Note that for the mean-based estimate the sum of squared errors (SSE) is 120.
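The figure of 120 can be reproduced from the six tip amounts. A minimal Python sketch, with illustrative variable names:

```python
# Tip amounts (in dollars) from the slides.
tips = [5, 17, 11, 8, 14, 5]
mean_tip = sum(tips) / len(tips)

# Square each residual so positive and negative deviations no longer cancel,
# then add the squares up.
sse = sum((tip - mean_tip) ** 2 for tip in tips)
print(sse)  # 120.0
```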

Page 12: Linear Regression in R

Why Square the Residuals?

What do we get from squaring the residuals?

Page 13: Linear Regression in R

Key Points

By squaring the residuals (errors) we achieve the following:

It emphasizes larger deviations and makes them more obvious

It helps in comparing different analysis models

The goal of linear regression is to create a linear model that minimizes the sum of squared residuals/errors (SSE)
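To illustrate how SSE lets us compare models, the sketch below scores three constant predictions for the tip data. The candidate values 9 and 11 are arbitrary choices made for this illustration; 10 is the mean from the earlier slide. The helper name `sse` is hypothetical.

```python
# Tip amounts (in dollars) from the slides.
tips = [5, 17, 11, 8, 14, 5]

def sse(actual, prediction):
    """Sum of squared errors when every value is predicted by one constant."""
    return sum((value - prediction) ** 2 for value in actual)

# Compare three candidate "models", each predicting a single constant tip.
for candidate in (9, 10, 11):
    print(candidate, sse(tips, candidate))
# 9 126
# 10 120
# 11 126
```

The mean (10) gives the smallest SSE of the three, which is why it serves as the baseline to beat.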

Page 14: Linear Regression in R

Improving the Current Model

The waiter’s tip depends on the amount of the bill.

Until now we have used only the values of previous tips to estimate the value of the next tip.

Next we will design a linear regression model that estimates the tip amount from the bill amount.

Page 15: Linear Regression in R

Let’s visualize the data that we have

Note that the tip amount is the dependent variable, which depends on the bill amount, and the bill amount is the independent variable.

Page 16: Linear Regression in R

Linear Regression

Note that in linear regression the predicted value of the dependent variable (e.g. tip amount) is the mean of the values expected for a given value of the independent variable, not just a single observed value.

Linear Regression Equation

Page 17: Linear Regression in R

Linear Regression Types

A linear regression model whose data points are narrowly distributed around the regression line is much better than a model whose points are broadly distributed.

Narrow Distribution Broad Distribution

Page 18: Linear Regression in R

Linear Regression – a closer look

To draw a linear regression line we need the value of the slope (b1) and the value of the intercept (b0), as shown below:
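In the standard form of simple linear regression, the line is predicted value = b0 + b1 × x. The sketch below expresses that as a small Python function. The function name and the coefficient values used in the example call are placeholders chosen for illustration, not values from the slides.

```python
def predict(x, b0, b1):
    """Value on the regression line: intercept b0 plus slope b1 times x."""
    return b0 + b1 * x

# Placeholder coefficients, purely for illustration.
example = predict(20, b0=1.0, b1=0.5)
print(example)  # 11.0
```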

Page 19: Linear Regression in R

Linear Regression - Calculation

Page 20: Linear Regression in R

Linear Regression – Calculating Slope

The value of the slope (b1) is 0.1462, as calculated below:
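The usual least-squares formula for the slope is b1 = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)². The sketch below applies it in Python. The tip amounts are the ones listed on the earlier slide, but the bill amounts do not appear in this transcript, so the `bills` list is an assumed, illustrative data set; with these assumed values the result rounds to the slope quoted above.

```python
# ASSUMPTION: illustrative bill amounts; the transcript does not list them.
bills = [34, 108, 64, 88, 99, 51]
# Tip amounts (in dollars) from the slides.
tips = [5, 17, 11, 8, 14, 5]

mean_bill = sum(bills) / len(bills)
mean_tip = sum(tips) / len(tips)

# Numerator: sum of products of the deviations of x and y from their means.
sxy = sum((x - mean_bill) * (y - mean_tip) for x, y in zip(bills, tips))
# Denominator: sum of squared deviations of x from its mean.
sxx = sum((x - mean_bill) ** 2 for x in bills)

b1 = sxy / sxx
print(round(b1, 4))  # 0.1462
```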

Page 21: Linear Regression in R

Linear Regression – Calculating Y Intercept

The value of the Y intercept (b0) is -0.8188, as calculated below:
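The least-squares line passes through the point of means, so the intercept follows from b0 = ȳ − b1 × x̄. The sketch below uses the mean tip of 10 and the slope of 0.1462 from the slides. The mean bill of 74 is an assumption: the transcript does not state it, but it is the value consistent with the quoted slope and intercept.

```python
mean_tip = 10    # mean of the six tips, from the slides
b1 = 0.1462      # slope quoted on the previous slide
mean_bill = 74   # ASSUMPTION: not stated in the transcript

# The regression line passes through (mean_bill, mean_tip).
b0 = mean_tip - b1 * mean_bill
print(round(b0, 4))  # -0.8188
```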

Page 22: Linear Regression in R

Linear Regression – Putting the values

Let’s put the values of the slope and Y intercept into the linear regression equation

Page 23: Linear Regression in R

Linear Regression – Predicting Tip amount

Let’s calculate the predicted tip amount
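Plugging the quoted slope (0.1462) and intercept (-0.8188) into the line gives a predicted tip for any bill. In the Python sketch below, the bill of $100 is a hypothetical input chosen for illustration, and the function name is not from the slides.

```python
B0 = -0.8188  # Y intercept quoted in the slides
B1 = 0.1462   # slope quoted in the slides

def predict_tip(bill):
    """Predicted tip (in dollars) for a given bill amount."""
    return B0 + B1 * bill

# Hypothetical bill of $100.
predicted = predict_tip(100)
print(round(predicted, 2))  # 13.8
```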

Page 24: Linear Regression in R

Linear Regression – Drawing the regression line

Page 25: Linear Regression in R

Linear Regression – Calculating Residuals

Let’s calculate the residuals (errors)
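For the regression model, each residual is the actual tip minus the tip predicted from the bill. The sketch below shows the calculation in Python with the quoted coefficients. The bill amounts are not listed in this transcript, so `bills` is an assumed, illustrative data set.

```python
B0 = -0.8188  # Y intercept quoted in the slides
B1 = 0.1462   # slope quoted in the slides

# ASSUMPTION: illustrative bill amounts; the transcript does not list them.
bills = [34, 108, 64, 88, 99, 51]
# Tip amounts (in dollars) from the slides.
tips = [5, 17, 11, 8, 14, 5]

# Predicted tip for each bill, then residual = actual - predicted.
predicted = [B0 + B1 * bill for bill in bills]
residuals = [tip - pred for tip, pred in zip(tips, predicted)]

for bill, tip, res in zip(bills, tips, residuals):
    print(bill, tip, round(res, 3))

# As with the mean-based estimate, the residuals sum to (approximately) zero.
print(abs(sum(residuals)) < 1e-9)  # True
```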

Page 26: Linear Regression in R

Linear Regression – Regression line with residuals

Page 27: Linear Regression in R

Linear Regression – Squaring the residuals (errors)

Let’s calculate the sum of squared residuals

Page 28: Linear Regression in R

Summing it up - Comparison

As shown, the second approach provides a better estimate because it decreases the sum of squared errors (SSE).
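The comparison can be sketched end to end in Python: compute the SSE of the mean-only model, fit a least-squares line, and compute its SSE. The bill amounts are not in this transcript, so `bills` is an assumed, illustrative data set, and the regression SSE printed here depends on that assumption.

```python
# ASSUMPTION: illustrative bill amounts; the transcript does not list them.
bills = [34, 108, 64, 88, 99, 51]
# Tip amounts (in dollars) from the slides.
tips = [5, 17, 11, 8, 14, 5]

mean_bill = sum(bills) / len(bills)
mean_tip = sum(tips) / len(tips)

# First approach: predict the mean tip for every order.
sse_mean = sum((y - mean_tip) ** 2 for y in tips)

# Second approach: least-squares line tip = b0 + b1 * bill.
b1 = (sum((x - mean_bill) * (y - mean_tip) for x, y in zip(bills, tips))
      / sum((x - mean_bill) ** 2 for x in bills))
b0 = mean_tip - b1 * mean_bill
sse_regression = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(bills, tips))

print(sse_mean)                   # 120.0
print(round(sse_regression, 2))   # much smaller than 120 for this data
print(sse_regression < sse_mean)  # True
```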

Page 29: Linear Regression in R

Hands-on: Linear Regression in R

Page 30: Linear Regression in R

Survey

Your feedback is vital for us, be it a compliment, a suggestion or a complaint. It helps us to make your experience better!

Please spare a few minutes to take the survey after the webinar.

Page 31: Linear Regression in R

Thank You …

Questions/Queries/Feedback

Recording and presentation will be made available to you within 24 hours