Fu Ch11 Linear Regression (1)

Jan 18, 2016

Covers linear regression in a nutshell.
  • Chapter 11: Regression and Correlation Methods

    EPI 809/Spring 2008

  • Learning Objectives

    As a result of this class, you will be able to:
    Describe the linear regression model
    State the regression modeling steps
    Explain ordinary least squares
    Compute regression coefficients
    Understand and check model assumptions
    Predict the response variable
    Comment on SAS output

  • Learning Objectives

    Correlation models
    Link between a correlation model and a regression model
    Test of the coefficient of correlation

  • Models


  • What is a Model?

    Representation of Some Phenomenon

    Non-Math/Stats Model


  • What is a Math/Stats Model?

    Often describes the relationship between variables

    Types:
    Deterministic models (no randomness)
    Probabilistic models (with randomness)

  • Deterministic Models

    Hypothesize exact relationships
    Suitable when prediction error is negligible
    Example: body mass index (BMI) is a measure of body fat based on height and weight

    Metric formula: $\text{BMI} = \dfrac{\text{weight (kg)}}{[\text{height (m)}]^2}$

    Non-metric formula: $\text{BMI} = \dfrac{\text{weight (lb)} \times 703}{[\text{height (in)}]^2}$
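The two BMI formulas can be sketched in Python as below. The function names and the example person (70 kg, 1.75 m) are hypothetical and only illustrate that a deterministic model returns the same output for the same input, with no error term.

```python
def bmi_metric(weight_kg: float, height_m: float) -> float:
    """BMI from weight in kilograms and height in metres."""
    return weight_kg / height_m ** 2


def bmi_imperial(weight_lb: float, height_in: float) -> float:
    """BMI from weight in pounds and height in inches (703 converts the units)."""
    return weight_lb * 703 / height_in ** 2


# Hypothetical person: 70 kg and 1.75 m tall.
print(round(bmi_metric(70, 1.75), 1))
```

Feeding the same person's weight and height in pounds and inches to `bmi_imperial` should give nearly the same value, since 703 is only a unit-conversion factor.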

  • Probabilistic Models

    Hypothesize 2 components: deterministic and random error
    Example: systolic blood pressure (SBP) of newborns is 6 times the age in days plus random error
    $\text{SBP} = 6 \times \text{age (days)} + \varepsilon$
    The random error may be due to factors other than age in days (e.g., birthweight)
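A minimal Python sketch of the two components in the newborn example. The slide does not specify an error distribution, so the normal error and its standard deviation of 2 are assumptions made purely for illustration, as are the function names and the seed.

```python
import random


def sbp_deterministic(age_days: float) -> float:
    """Deterministic component from the slide: SBP = 6 x age in days."""
    return 6 * age_days


def sbp_simulated(age_days: float, rng: random.Random, error_sd: float = 2.0) -> float:
    """Deterministic component plus a random error term.

    The normal error with standard deviation `error_sd` is an assumption;
    the slide only says that a random error exists.
    """
    return sbp_deterministic(age_days) + rng.gauss(0, error_sd)


rng = random.Random(809)  # fixed seed so the run is repeatable
for age in (1, 2, 3):
    print(age, sbp_deterministic(age), round(sbp_simulated(age, rng), 1))
```

The deterministic column is identical on every run, while the simulated column scatters around it, which is the behaviour a probabilistic model is meant to capture.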

  • Types of Probabilistic Models


  • Regression Models


  • Regression Models

    Describe the relationship between one dependent variable and one or more explanatory variables
    Use an equation to set up the relationship
    Numerical dependent (response) variable
    1 or more numerical or categorical independent (explanatory) variables
    Used mainly for prediction and estimation

  • Regression Modeling Steps

    1. Hypothesize the deterministic component
       Estimate the unknown parameters
    2. Specify the probability distribution of the random error term
       Estimate the standard deviation of the error
    3. Evaluate the fitted model
    4. Use the model for prediction and estimation

  • Model Specification


  • Specifying the Deterministic Component

    1. Define the dependent variable and the independent variable
    2. Hypothesize the nature of the relationship
       Expected effects (i.e., signs of the coefficients)
       Functional form (linear or non-linear)
       Interactions

  • Model Specification Is Based on Theory

    1. Theory of the field (e.g., epidemiology)
    2. Mathematical theory
    3. Previous research
    4. Common sense

  • Thinking Challenge: Which Is More Logical?

    [Figure: four candidate plots of CD+ counts against years since seroconversion]

    Note: with a positive linear relationship, the response increases without bound. Discuss the concept of relevant range.

  • OB/GYN Study


  • Types of Regression Models

    Regression models
      Simple (1 explanatory variable): linear or non-linear
      Multiple (2+ explanatory variables): linear or non-linear

    This classification is based on the number of explanatory variables and the nature of the relationship between X and Y.

  • Linear Regression Model


  • Linear Equations

  • Linear Regression Model

    1. The relationship between the variables is a linear function

    $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$

    $Y_i$: dependent (response) variable (e.g., CD+ count)
    $X_i$: independent (explanatory) variable (e.g., years since seroconversion)
    $\beta_0$: population Y-intercept
    $\beta_1$: population slope
    $\varepsilon_i$: random error

  • Population & Sample Regression Models

    [Figure: a random sample is drawn from a population in which the relationship between X and Y is unknown]

  • Population Linear Regression Model

    [Figure: population regression line with observed values; $\varepsilon_i$ = random error]

  • Sample Linear Regression Model

    [Figure: sample regression line with an observed value and an unsampled observation; $\hat{\varepsilon}_i$ = random error]

  • Estimating Parameters: Least Squares Method

  • Scatter Plot

    1. Plot of all $(X_i, Y_i)$ pairs
    2. Suggests how well the model will fit

    [Figure: scatter plot of Y against X, both axes running from 0 to 60]

  • Thinking Challenge

    How would you draw a line through the points? How do you determine which line fits best?

    [Figure: the same scatter plot of Y against X (axes 0 to 60) with candidate lines: slope changed and intercept unchanged; slope unchanged and intercept changed; both slope and intercept changed]

  • Least Squares

    1. Best fit means the differences between the actual Y values and the predicted Y values are a minimum. But positive differences offset negative ones, so square the errors!
    2. Least squares (LS) minimizes the sum of the squared differences (errors), the SSE
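The point about offsetting errors can be checked numerically. The sketch below borrows the estriol and birthweight data from the worked example later in this chapter; the two non-least-squares candidate lines and the function names are hypothetical choices for illustration.

```python
def sse(xs, ys, intercept, slope):
    """Sum of squared differences between actual and predicted Y."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


def sum_of_errors(xs, ys, intercept, slope):
    """Plain (unsquared) sum of differences; positives and negatives cancel."""
    return sum(y - (intercept + slope * x) for x, y in zip(xs, ys))


# Estriol (X) and birthweight (Y) data from the chapter's worked example.
xs = [1, 2, 3, 4, 5]
ys = [1, 1, 2, 2, 4]

# Candidate lines as (intercept, slope). The flat line Y = 2 is a deliberately poor fit.
candidates = {
    "flat line": (2.0, 0.0),
    "steeper line": (-0.1, 0.8),
    "least squares line": (-0.1, 0.7),
}

for name, (b0, b1) in candidates.items():
    print(name, round(sum_of_errors(xs, ys, b0, b1), 2), round(sse(xs, ys, b0, b1), 2))
```

The flat line has a plain error sum of essentially zero even though it ignores the trend, so the unsquared sum cannot rank the lines. The SSE does separate them, and it is smallest for the least squares line.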

  • Least Squares Graphically


  • Coefficient Equations

    Prediction equation

    Sample slope

    Sample Y-intercept
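The formulas behind these three labels did not survive in the transcript. The standard least-squares expressions, written in the computational form that matches the column totals of the chapter's computation table, are given below as a reconstruction rather than a copy of the original slide.

```latex
\[
\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i
\]
\[
\hat{\beta}_1 =
\frac{\sum X_i Y_i - \dfrac{\left(\sum X_i\right)\left(\sum Y_i\right)}{n}}
     {\sum X_i^2 - \dfrac{\left(\sum X_i\right)^2}{n}}
\]
\[
\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}
\]
```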

  • Derivation of Parameters (1)

    Least Squares (L-S): minimize squared error
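The derivation itself is missing from the transcript. A standard version, offered here as a sketch and not as the slide's original working, sets the partial derivatives of the SSE to zero and solves for the two estimates.

```latex
\[
\text{SSE} = \sum_{i=1}^{n} \left( Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i \right)^2
\]
\[
\frac{\partial\,\text{SSE}}{\partial \hat{\beta}_0}
  = -2 \sum \left( Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i \right) = 0
  \;\Rightarrow\;
  \hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}
\]
\[
\frac{\partial\,\text{SSE}}{\partial \hat{\beta}_1}
  = -2 \sum X_i \left( Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i \right) = 0
\]
Substituting $\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}$ into the second equation:
\[
\sum X_i \left( Y_i - \bar{Y} \right) = \hat{\beta}_1 \sum X_i \left( X_i - \bar{X} \right)
  \;\Rightarrow\;
  \hat{\beta}_1 = \frac{\sum X_i Y_i - n \bar{X} \bar{Y}}{\sum X_i^2 - n \bar{X}^2}
\]
```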

  • Computation Table

    Xi      Yi      Xi²     Yi²     XiYi
    X1      Y1      X1²     Y1²     X1Y1
    X2      Y2      X2²     Y2²     X2Y2
    :       :       :       :       :
    Xn      Yn      Xn²     Yn²     XnYn
    ΣXi     ΣYi     ΣXi²    ΣYi²    ΣXiYi
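The computation table maps directly onto a few sums. The Python sketch below builds the column totals and then applies the standard computational least-squares formulas to them; the function names, dictionary keys, and the three-point data set are hypothetical.

```python
def computation_table(xs, ys):
    """Return the column totals of the computation table as a dict."""
    return {
        "n": len(xs),
        "sum_x": sum(xs),
        "sum_y": sum(ys),
        "sum_x2": sum(x * x for x in xs),
        "sum_y2": sum(y * y for y in ys),
        "sum_xy": sum(x * y for x, y in zip(xs, ys)),
    }


def coefficients_from_totals(t):
    """Least-squares (intercept, slope) computed from the column totals."""
    n = t["n"]
    slope = (t["sum_xy"] - t["sum_x"] * t["sum_y"] / n) / (t["sum_x2"] - t["sum_x"] ** 2 / n)
    intercept = t["sum_y"] / n - slope * t["sum_x"] / n
    return intercept, slope


# Hypothetical data lying exactly on the line Y = 1 + 2X.
totals = computation_table([0, 1, 2], [1, 3, 5])
print(totals)
print(coefficients_from_totals(totals))
```

The Yi² total is not needed for the slope or intercept; it is kept because the table carries it.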

  • Interpretation of Coefficients

    1. Slope ($\hat{\beta}_1$): estimated Y changes by $\hat{\beta}_1$ for each 1-unit increase in X
       If $\hat{\beta}_1 = 2$, then Y is expected to increase by 2 for each 1-unit increase in X
    2. Y-intercept ($\hat{\beta}_0$): average value of Y when X = 0
       If $\hat{\beta}_0 = 4$, then average Y is expected to be 4 when X is 0

  • Parameter Estimation Example

    Obstetrics: what is the relationship between a mother's estriol level and birthweight, using the following data?

    Estriol (mg/24h)    Birthweight (g/1000)
    1                   1
    2                   1
    3                   2
    4                   2
    5                   4

  • Scatterplot: Birthweight vs. Estriol Level

    [Figure: scatter plot of birthweight (vertical axis) against estriol level (horizontal axis)]

  • Parameter Estimation Solution Table

    Xi      Yi      Xi²     Yi²     XiYi
    1       1       1       1       1
    2       1       4       1       2
    3       2       9       4       6
    4       2       16      4       8
    5       4       25      16      20
    ------------------------------------
    15      10      55      26      37      (column totals)

  • Parameter Estimation Solution

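The worked arithmetic for this slide is missing from the transcript. Plugging the column totals of the solution table (n = 5) into the standard computational formulas gives the values quoted on the interpretation slide and in the SAS output:

```latex
\[
\hat{\beta}_1
  = \frac{\sum X_i Y_i - \dfrac{(\sum X_i)(\sum Y_i)}{n}}{\sum X_i^2 - \dfrac{(\sum X_i)^2}{n}}
  = \frac{37 - \dfrac{(15)(10)}{5}}{55 - \dfrac{15^2}{5}}
  = \frac{37 - 30}{55 - 45}
  = 0.70
\]
\[
\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X} = 2 - (0.70)(3) = -0.10
\]
\[
\hat{Y} = -0.10 + 0.70\,X
\]
```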

  • Coefficient Interpretation Solution

    1. Slope ($\hat{\beta}_1$): birthweight (Y) is expected to increase by 0.7 units for each 1-unit increase in estriol (X)
    2. Intercept ($\hat{\beta}_0$): average birthweight (Y) is -0.10 units when estriol level (X) is 0
       This is difficult to explain, because birthweight should always be positive

  • SAS Code for Fitting a Simple Linear Regression

    Data BW; /* Read the (estriol, birthweight) pairs into SAS */
      input estriol birthw @@; /* @@ allows several observations per data line */
      cards;
    1 1  2 1  3 2  4 2  5 4
    ;
    run;

    PROC REG data=BW; /* Fit the linear regression model */
      model birthw = estriol; /* response = explanatory variable */
    run;

  • Parameter Estimation: SAS Computer Output

    Parameter Estimates

                       Parameter    Standard
    Variable     DF    Estimate     Error      t Value    Pr > |t|
    Intercept     1    -0.10000     0.63509      -0.16      0.8849
    Estriol       1     0.70000     0.19149       3.66      0.0354

    The Parameter Estimate column gives $\hat{\beta}_0$ (Intercept) and $\hat{\beta}_1$ (Estriol).
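As a cross-check on the SAS table, the estimates, standard errors, and t values can be recomputed by hand. The Python sketch below uses the textbook simple-regression formulas with n − 2 degrees of freedom; the variable names are hypothetical, and the p-values are not reproduced because the standard library has no t distribution.

```python
import math

# Estriol (X) and birthweight (Y) data from the example.
xs = [1, 2, 3, 4, 5]
ys = [1, 1, 2, 2, 4]
n = len(xs)

x_bar = sum(xs) / n
y_bar = sum(ys) / n
sxx = sum((x - x_bar) ** 2 for x in xs)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

slope = sxy / sxx
intercept = y_bar - slope * x_bar

# Residual variance uses n - 2 degrees of freedom (two estimated parameters).
sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
mse = sse / (n - 2)

se_slope = math.sqrt(mse / sxx)
se_intercept = math.sqrt(mse * (1 / n + x_bar ** 2 / sxx))

print("Intercept", round(intercept, 5), round(se_intercept, 5), round(intercept / se_intercept, 2))
print("Estriol  ", round(slope, 5), round(se_slope, 5), round(slope / se_slope, 2))
```

Each t value is simply the estimate divided by its standard error, which is how the t Value column in the output is formed.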

  • Parameter Estimation Thinking Challenge

    You're a vet epidemiologist for the county cooperative. You gather the following data:

    Food (lb.)    Milk yield (lb.)
    4             3.0
    6             5.5
    10            6.5
    12            9.0

    What is the relationship between cows' food intake and milk yield?

  • Scattergram: Milk Yield vs. Food Intake*

    [Figure: scatter plot of milk yield (lb.) against food intake (lb.)]

  • Parameter Estimation Solution Table*

    Xi      Yi      Xi²     Yi²       XiYi
    4       3.0     16      9.00      12
    6       5.5     36      30.25     33
    10      6.5     100     42.25     65
    12      9.0     144     81.00     108
    --------------------------------------
    32      24.0    296     162.50    218     (column totals)

  • Parameter Estimation Solution*

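The worked arithmetic for this slide is missing from the transcript. Using the column totals of the milk-yield solution table (n = 4) in the standard computational formulas reproduces the coefficients quoted on the interpretation slide:

```latex
\[
\hat{\beta}_1
  = \frac{218 - \dfrac{(32)(24)}{4}}{296 - \dfrac{32^2}{4}}
  = \frac{218 - 192}{296 - 256}
  = \frac{26}{40}
  = 0.65
\]
\[
\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X} = 6 - (0.65)(8) = 0.80
\]
\[
\hat{Y} = 0.80 + 0.65\,X
\]
```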

  • Coefficient Interpretation Solution*

    1. Slope ($\hat{\beta}_1$): milk yield (Y) is expected to increase by 0.65 lb. for each 1 lb. increase in food intake (X)
    2. Y-intercept ($\hat{\beta}_0$): average milk yield (Y) is expected to be 0.8 lb. when food intake (X) is 0
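The fitted line can also be used for prediction, one of the chapter's learning objectives. The Python sketch below plugs food intakes into the fitted equation and flags values outside the sampled range of 4 to 12 lb., in the spirit of the earlier note on relevant range. The function name and the two trial intakes are hypothetical.

```python
def predict_milk_yield(food_lb: float) -> float:
    """Predicted milk yield (lb.) from the fitted line: 0.8 + 0.65 x food intake."""
    return 0.8 + 0.65 * food_lb


# Food intakes observed in the sample ranged from 4 to 12 lb.
OBSERVED_RANGE = (4, 12)

for food in (8, 20):  # 8 lb. is inside the observed range, 20 lb. is not
    inside = OBSERVED_RANGE[0] <= food <= OBSERVED_RANGE[1]
    label = "within observed range" if inside else "extrapolation, treat with caution"
    print(food, round(predict_milk_yield(food), 2), label)
```

Predictions outside the sampled range rest on the untested assumption that the same straight line continues to hold there.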
