FHMM1034 Chapter 5 Correlation and Regression (Student Version)

Jan 07, 2016

Allen Fourever

Transcript
  • FHMM1034 Mathematics III
    Centre For Foundation Studies
    Department of Sciences and Engineering
    Chapter 5: Correlation and Regression

  • Content
    5.1 Introduction
    5.2 Linear Correlation
    5.3 Simple Linear Regression
    5.4 Coefficient of Determination
    5.5 Regression Analysis: A Complete Example

  • 5.1 Introduction

  • Introduction
    The main objective of this chapter is to analyze a collection of paired sample data (or bivariate data) and determine whether there appears to be a relationship between the two variables.
    Example: What is the relationship between cholesterol levels and the incidence of heart disease?

  • Introduction
    The two most common procedures for examining relationships between measured variables are:
    1. Correlation Analysis
       Is there a relationship between two (or more) variables?
       If there is, what is the strength of the relationship?
    2. Regression Analysis
       Develop a model that relates Y to X.
       Predict future values of the Y variable.

  • Bivariate Data
    When two variables are measured on a single experimental unit, the resulting data are called bivariate data.
    You can describe each variable individually, and you can also explore the relationship between the two variables.
    Bivariate data can be described with:
    Graphs
    Numerical measures

  • Examining Relationship
    The dependent variable (also known as the Y variable) measures the outcome of a study. It is the variable that is being predicted or estimated.
    The independent variable (also known as the X variable) attempts to explain the variation in Y. It is the predictor variable.

  • Scatter Diagram
    When both variables are quantitative, call one variable x and the other y. A single measurement is a pair of numbers (x, y), for example (2, 5), that can be plotted on a two-dimensional graph called a scatter plot.
    A scatter diagram (scatter plot) is a plot of paired observations that portrays the relationship between the X and Y variables.

  • Example 1
    The incomes and food expenditures of seven households are listed below. Using this information, draw a scatter diagram.

    Income (hundreds of RM)    Food expenditure (hundreds of dollars)
    35                         9
    49                         15
    21                         7
    39                         11
    15                         5
    28                         8
    25                         9

  • Example 1 (cont.)
    The scatter diagram: [figure not included in the transcript]

  • 5.2 Linear Correlation

  • Correlation Analysis
    A group of techniques for measuring the association between variables.
    Examples:
    Time spent studying and exam grade.
    Salary and years of working experience.
    Age and blood pressure.
    Smoking and lung cancer.

  • Linear Correlation Coefficient
    The linear correlation coefficient measures the strength of the linear association between two variables, that is, how closely the points in a scatter diagram are spread around the regression line.
    The correlation coefficient calculated for population data is denoted by ρ, and for sample data by r.
    The value of the correlation coefficient always lies in the range −1 to 1; that is, −1 ≤ ρ ≤ 1 and −1 ≤ r ≤ 1.

  • Linear Correlation Coefficient
    The linear correlation coefficient, r (also called the Pearson product-moment correlation coefficient), measures the strength of the linear relationship between the paired quantitative x and y values in a sample.

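  • Linear Correlation Coefficient
    $r = \dfrac{S_{XY}}{\sqrt{S_{XX}\,S_{YY}}}$
    where
    $S_{XY} = \sum xy - \dfrac{(\sum x)(\sum y)}{n}$,
    $S_{XX} = \sum x^2 - \dfrac{(\sum x)^2}{n}$,
    $S_{YY} = \sum y^2 - \dfrac{(\sum y)^2}{n}$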

  • Linear Correlation
    Perfect positive linear correlation: when r = 1, all points in the scatter diagram lie on a straight line that slopes upward from left to right.

  • Linear Correlation
    Perfect negative linear correlation: when r = −1, all points in the scatter diagram lie on a straight line that slopes downward from left to right.

  • Linear Correlation
    Properties of the linear correlation coefficient, r:
    (i) The value of r is always between −1 and 1 inclusive. That is, −1 ≤ r ≤ 1.
    (ii) r measures the strength of a linear relationship. It is not designed to measure the strength of a relationship that is not linear.
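
Property (ii) can be illustrated with a small sketch. The data here are made up for illustration (they are not from the course notes), and the helper name `pearson_r` is an assumption; it uses the usual computational form r = S_XY / sqrt(S_XX · S_YY). The first data set follows y = x² exactly, so x and y are strongly related, yet r comes out as zero because the relationship has no linear trend.

```python
import math

def pearson_r(xs, ys):
    """Sample linear correlation coefficient r = S_XY / sqrt(S_XX * S_YY)."""
    n = len(xs)
    s_xy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    s_xx = sum(x * x for x in xs) - sum(xs) ** 2 / n
    s_yy = sum(y * y for y in ys) - sum(ys) ** 2 / n
    return s_xy / math.sqrt(s_xx * s_yy)

# Hypothetical data: symmetric x values.
xs = [-3, -2, -1, 0, 1, 2, 3]

# y depends exactly on x, but the dependence is a curve, not a line.
r_curved = pearson_r(xs, [x ** 2 for x in xs])

# y lies exactly on an upward-sloping straight line.
r_line = pearson_r(xs, [2 * x + 1 for x in xs])

print(r_curved, r_line)
```

A value of r near zero therefore rules out a linear pattern only; a scatter diagram is still needed to spot curved relationships.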

  • Example 2
    Calculate the correlation coefficient for the data on incomes and food expenditures of the seven households in Example 1.
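
One way to check a hand calculation for Example 2 is sketched below. It assumes the computational forms S_XY = Σxy − (Σx)(Σy)/n, S_XX = Σx² − (Σx)²/n and S_YY = Σy² − (Σy)²/n, and the variable names are illustrative rather than taken from the notes.

```python
import math

# Example 1 data: income and food expenditure of seven households.
income = [35, 49, 21, 39, 15, 28, 25]
food = [9, 15, 7, 11, 5, 8, 9]
n = len(income)

# Column totals, as they would appear at the foot of a working table.
sum_x = sum(income)
sum_y = sum(food)
sum_xy = sum(x * y for x, y in zip(income, food))
sum_x2 = sum(x * x for x in income)
sum_y2 = sum(y * y for y in food)

# Corrected sums of squares and cross-products.
s_xy = sum_xy - sum_x * sum_y / n
s_xx = sum_x2 - sum_x ** 2 / n
s_yy = sum_y2 - sum_y ** 2 / n

r = s_xy / math.sqrt(s_xx * s_yy)

print(f"S_XY = {s_xy:.4f}, S_XX = {s_xx:.4f}, S_YY = {s_yy:.4f}")
print(f"r = {r:.4f}")
```

With these data r is roughly 0.96, which indicates a strong positive linear relationship between income and food expenditure.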

  • 5.3 Simple Linear Regression

  • Simple Linear Regression
    What is the relationship between food expenditure and income?
    The simple regression equation (model) expresses a relationship between two variables: one independent variable and one dependent variable.

  • Simple Linear Regression
    Variable x: independent variable, predictor variable, explanatory variable.
    Variable y: dependent variable, response variable.

  • Simple Linear Regression
    A (simple) regression model that gives a straight-line relationship between two variables is called a linear regression model, y = A + Bx, where
    y = dependent variable
    x = independent variable
    A = y-intercept
    B = slope

  • Simple Linear Regression
    Given a collection of paired sample data, the regression equation describes the relationship between the two variables algebraically.
    The graph of the regression equation is called the regression line.

  • Simple Linear Regression
    For the least squares regression line $\hat{y} = a + bx$,
    $b = \dfrac{S_{XY}}{S_{XX}}$ and $a = \bar{y} - b\,\bar{x}$

    The least squares regression line is also called the regression of y on x.

  • Interpretation of a and b
    Note: When b is positive, an increase in x leads to an increase in y and a decrease in x leads to a decrease in y. Such a relationship between x and y is called a positive linear relationship.
    When b is negative, an increase in x leads to a decrease in y and a decrease in x leads to an increase in y. Such a relationship between x and y is called a negative linear relationship.

  • Using the Regression Equation for Prediction
    If there is a linear correlation between x and y, the best predicted y-value is found by substituting the x-value into the regression equation.

  • Example 3
    The table below shows the incomes and food expenditures (in hundreds of dollars) of seven households.

    Income    Food Expenditure
    35        9
    49        15
    21        7
    39        11
    15        5
    28        8
    25        9

  • Example 3
    (a) Find the least squares regression line for the data on incomes and food expenditures of the seven households.
    (b) What is the predicted food expenditure for a household with an income of RM3000?
    (c) Give a brief interpretation of the values of a and b calculated in part (a).
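
Parts (a) and (b) can be checked with a sketch like the one below. It assumes the least squares estimates b = S_XY / S_XX and a = ȳ − b·x̄, and it assumes that an income of RM3000 corresponds to x = 30 because the table records income in hundreds. The function names `fit_line` and `predict` are illustrative only.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least squares line y_hat = a + b * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    s_xx = sum(x * x for x in xs) - sum(xs) ** 2 / n
    b = s_xy / s_xx          # slope
    a = y_bar - b * x_bar    # y-intercept
    return a, b

def predict(a, b, x):
    """Substitute x into the fitted regression equation."""
    return a + b * x

# Example 3 data, both columns in hundreds.
income = [35, 49, 21, 39, 15, 28, 25]
food = [9, 15, 7, 11, 5, 8, 9]

a, b = fit_line(income, food)
y_hat_30 = predict(a, b, 30)  # assumed: RM3000 -> x = 30 (hundreds)

print(f"y_hat = {a:.4f} + {b:.4f} x")
print(f"predicted food expenditure at x = 30: {y_hat_30:.4f} (hundreds)")
```

Carrying full precision gives a slightly different intercept from a hand calculation that rounds b to four decimal places first, so small differences in the last digits are expected.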

  • 5.4 Coefficient of Determination

  • Error Sum of Squares, SSE
    The error sum of squares, denoted by SSE, is
    $SSE = \sum e^2 = \sum (y - \hat{y})^2$

    The values of a and b that give the minimum SSE are called the least squares estimates of A and B, and the regression line obtained with these estimates is called the least squares line.

  • Standard Deviation of Random Errors, Se
    The standard deviation of errors tells how widely the errors, and hence the values of y, are spread for a given x.
    $S_e = \sqrt{\dfrac{SSE}{n - 2}}$

    For calculation, we will use
    $S_e = \sqrt{\dfrac{S_{YY} - b\,S_{XY}}{n - 2}}$

  • Example 4
    Compute the standard deviation of errors, Se, for the data on monthly incomes and food expenditures of the seven households given in Example 3.
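
A sketch for Example 4 follows. It computes Se in two ways, which should agree: directly from the residuals, using Se = sqrt(SSE / (n − 2)), and with the computational shortcut Se = sqrt((S_YY − b·S_XY) / (n − 2)). Both formulas are the standard ones and are assumed here; the variable names are illustrative.

```python
import math

income = [35, 49, 21, 39, 15, 28, 25]
food = [9, 15, 7, 11, 5, 8, 9]
n = len(income)

s_xx = sum(x * x for x in income) - sum(income) ** 2 / n
s_yy = sum(y * y for y in food) - sum(food) ** 2 / n
s_xy = sum(x * y for x, y in zip(income, food)) - sum(income) * sum(food) / n

# Least squares estimates.
b = s_xy / s_xx
a = sum(food) / n - b * sum(income) / n

# Route 1: square and add the residuals e = y - (a + b x).
sse_residuals = sum((y - (a + b * x)) ** 2 for x, y in zip(income, food))
se_from_residuals = math.sqrt(sse_residuals / (n - 2))

# Route 2: computational shortcut that avoids listing every residual.
se_shortcut = math.sqrt((s_yy - b * s_xy) / (n - 2))

print(f"Se (residuals) = {se_from_residuals:.4f}")
print(f"Se (shortcut)  = {se_shortcut:.4f}")
```

The divisor is n − 2 rather than n because two quantities, a and b, are estimated from the data before the residuals are computed.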

  • Total Sum of Squares, SST
    The total sum of squares, denoted by SST, is given by
    $SST = \sum (y - \bar{y})^2 = \sum y^2 - \dfrac{(\sum y)^2}{n} = S_{YY}$

  • Example 5
    For the regression line in Example 3, find the values of SSE and SST.

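  • Regression Sum of Squares, SSR
    Using the regression line instead of the mean of y to predict y reduces the sum of squared errors from SST to SSE. This reduction in squared errors is called the regression sum of squares and is denoted by SSR. Thus,
    $SSR = SST - SSE$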

  • Coefficient of Determination, r²
    Measures how well the independent variable explains the dependent variable in the regression model.

  • Coefficient of Determination, r²
    The coefficient of determination, denoted by r², represents the proportion of SST that is explained by the use of the linear regression model.
    $r^2 = \dfrac{SSR}{SST} = \dfrac{SST - SSE}{SST}$

    The computational formula for r² is:
    $r^2 = \dfrac{b\,S_{XY}}{S_{YY}}$

  • Example 6
    For the data in Example 3, calculate the coefficient of determination. Interpret your answer.
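
Examples 5 and 6 can be worked together, since r² is built from the same sums of squares. The sketch below assumes the definitions SST = Σ(y − ȳ)², SSE = Σ(y − ŷ)², SSR = SST − SSE and r² = SSR / SST, and cross-checks the result against the shortcut r² = b·S_XY / S_YY. Variable names are illustrative.

```python
income = [35, 49, 21, 39, 15, 28, 25]
food = [9, 15, 7, 11, 5, 8, 9]
n = len(income)

x_bar = sum(income) / n
y_bar = sum(food) / n

# Definitional (deviation) forms of S_XX and S_XY.
s_xx = sum((x - x_bar) ** 2 for x in income)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(income, food))

b = s_xy / s_xx
a = y_bar - b * x_bar
fitted = [a + b * x for x in income]

sst = sum((y - y_bar) ** 2 for y in food)               # total variation in y
sse = sum((y - f) ** 2 for y, f in zip(food, fitted))   # variation left unexplained
ssr = sst - sse                                         # variation explained by the line

r_squared = ssr / sst
r_squared_shortcut = b * s_xy / sst   # SST equals S_YY

print(f"SST = {sst:.4f}, SSE = {sse:.4f}, SSR = {ssr:.4f}")
print(f"r^2 = {r_squared:.4f}")
```

Here r² is about 0.92, so roughly 92% of the variation in food expenditure is explained by income under the fitted line, and the remainder is left in the errors.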

  • 5.5 Regression Analysis: A Complete Example

  • Regression Analysis: A Complete Example
    A random sample of eight drivers insured with a company and having similar auto insurance policies was selected. The following table lists their driving experience (in years) and monthly auto insurance premiums (in dollars).

  • Regression Analysis: A Complete Example

    Driving Experience (in years)    Monthly Auto Insurance Premium (in dollars)
    5                                64
    2                                87
    12                               50
    9                                71
    15                               44
    6                                56
    25                               42
    16                               60

  • Regression Analysis: A Complete Example
    (a) Does the insurance premium depend on the driving experience, or does the driving experience depend on the insurance premium? Do you expect a positive or a negative relationship between these two variables?
    (b) Compute $S_{XX}$, $S_{YY}$ and $S_{XY}$.
    (c) Find the least squares regression line by choosing appropriate dependent and independent variables based on your answer to part (a).

  • Regression Analysis: A Complete Example
    (d) Interpret the meaning of the values of a and b calculated in part (c).
    (e) Plot the scatter diagram and the regression line.
    (f) Calculate r and r² and explain what they mean.
    (g) Calculate the standard deviation of errors.
    (h) Predict the monthly auto insurance premium for a driver with 10 years of driving experience.
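
The numerical parts (b), (c), (f), (g) and (h) can be checked with a single sketch. It assumes driving experience is the independent variable x and the premium is the dependent variable y, which is the natural reading of part (a), and it uses the standard computational formulas. The function name `simple_regression` and the dictionary keys are hypothetical, chosen only for this illustration.

```python
import math

def simple_regression(xs, ys):
    """Summary statistics for the least squares line y_hat = a + b * x."""
    n = len(xs)
    s_xx = sum(x * x for x in xs) - sum(xs) ** 2 / n
    s_yy = sum(y * y for y in ys) - sum(ys) ** 2 / n
    s_xy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    b = s_xy / s_xx
    a = sum(ys) / n - b * sum(xs) / n
    r = s_xy / math.sqrt(s_xx * s_yy)
    se = math.sqrt((s_yy - b * s_xy) / (n - 2))
    return {"s_xx": s_xx, "s_yy": s_yy, "s_xy": s_xy,
            "a": a, "b": b, "r": r, "r2": r * r, "se": se}

# Assumed roles: x = driving experience (years), y = monthly premium (dollars).
experience = [5, 2, 12, 9, 15, 6, 25, 16]
premium = [64, 87, 50, 71, 44, 56, 42, 60]

fit = simple_regression(experience, premium)
premium_at_10 = fit["a"] + fit["b"] * 10   # part (h)

for name, value in fit.items():
    print(f"{name:>4} = {value:.4f}")
print(f"predicted premium at 10 years = {premium_at_10:.2f}")
```

The slope is negative, matching the expectation that premiums fall as experience grows. The x-values in the sample range from 2 to 25 years, so a prediction at 10 years stays inside the observed data.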

  • The End of Chapter 5