
Relationship Between Two Quantitative Variables: Scatterplots; Covariance, Correlation; Outliers; Linear Regression; Residuals, Variation in the Model; Lurking Variables and Causation

Mar 28, 2020


dariahiddleston

• Scatterplots

• Covariance, Correlation

• Outliers

• Linear Regression

• Residuals, Variation in the Model

• Lurking Variables and Causation

Relationship Between Two Quantitative Variables

Lecture 5, Sections 4.1-4.11

Scatterplot

• Scatterplot: graphical display of the relationship between two quantitative variables
• Response Variable: variable plotted along the y-axis that we are trying to explain or predict

• Predictor Variable: variable plotted along the x-axis that we use to explain changes in the response variable

• Observations are plotted as ordered pairs (x, y)

Example: Response vs. Predictor

• Scenario: We want to know whether a student's verbal SAT score gives any information about their math SAT score, using a random sample of 44 high school seniors.

• Question: What should be the response variable? What should be the predictor variable?

• Answer:
• Response: _________________________

• Predictor: _________________________


Describing a Scatterplot

• Direction: As the predictor variable increases:
• Positive: response tends to increase

• Negative: response tends to decrease

• Neither: no obvious change in response

• Form:
• Linear: response tends to change at about the same rate across all values of the predictor

• Curved: rate at which the response changes depends on the value of the predictor

• No Pattern

• Strength: How tightly clustered together are the points?
• Usually described as strong, moderate, or weak.

Example: Describing a Scatterplot

• Scenario: We want to know whether a student's verbal SAT score gives any information about their math SAT score, using a random sample of 44 high school seniors.

• Question: How should we describe the relationship?

• Answer: ______________________________
• Math scores change at ______________________________________________________________

• Math scores tend to _____________________ ____________________________________________

Example: Describing a Scatterplot

• Scenario: A supermarket wants to know what price to charge for a gallon of milk. It changes the price every day for 3 weeks and records the number of units sold.

• Question: How should we describe the relationship?

• Answer: ______________________________
• Sales tend to _______________________________

• Sales decline _______________________________ at lower values than at higher values
• Not ________________________


Covariance

• Covariance: measure of joint variability between two quantitative variables

$$s_{xy} = \frac{(x_1 - \bar{x})(y_1 - \bar{y}) + \cdots + (x_n - \bar{x})(y_n - \bar{y})}{n - 1}$$

• Sign of covariance dictates direction of relationship

• Covariance is unbounded: values range from $-\infty$ to $\infty$

• Problem: Does not help us interpret the strength of the relationship
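As a sketch (not from the lecture), the covariance formula translates directly into Python: sum the cross-products of deviations from the means, then divide by n - 1. The function name `sample_covariance` and the toy data are hypothetical. The second call shows why the magnitude is hard to interpret: multiplying y by 10 (as a change of units would) multiplies the covariance by 10 even though the pattern of points is unchanged.

```python
from statistics import mean


def sample_covariance(xs, ys):
    """Sample covariance s_xy: sum of (x_i - x_bar)(y_i - y_bar), divided by n - 1."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    x_bar, y_bar = mean(xs), mean(ys)
    cross_products = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return cross_products / (len(xs) - 1)


# Hypothetical toy data, not from the lecture.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(sample_covariance(x, y))                    # 1.5 (positive -> positive direction)
print(sample_covariance(x, [10 * v for v in y]))  # 15.0 (same points, new units)
```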

Example: Covariance

• Scenario: Math vs. verbal SAT scores on the left. Comparison of a sample of students' midterm and final exam scores on the right.

• Question: Which scatterplot has the stronger linear relationship?

• Answer: _________________________________________________________
• Points are more _______________________________________________________

Example: Covariance

• Scenario: Math vs. verbal SAT scores on the left. Comparison of a sample of students' midterm and final exam scores on the right.

• Question: What does the covariance tell us?

• Answer: __________________________________
• Covariance will be large if the _______________ of the observations are large, regardless of how ______________________________________________

$s_{xy}$ = __________

$s_{xy}$ = ______


Correlation

• Correlation: measure of the strength and direction of the linear relationship between two quantitative variables

• Two Types:
• Population Correlation: denoted by ρ (Greek letter "rho")
• Parameter → generally unknown

• Sample Correlation: denoted by r
• Statistic → used to estimate ρ

$$r = \frac{s_{xy}}{s_x s_y}$$

where $s_{xy}$ is the sample covariance, $s_x$ is the standard deviation of the predictor values, and $s_y$ is the standard deviation of the response values.
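The definition of r can be sketched in Python by dividing the sample covariance by the product of the two sample standard deviations. The helper names and toy data are hypothetical; `statistics.stdev` is the standard-library sample standard deviation. Dividing by the standard deviations is what removes the units, so rescaling y leaves r unchanged.

```python
from statistics import mean, stdev


def sample_covariance(xs, ys):
    """Sample covariance s_xy (n - 1 denominator)."""
    x_bar, y_bar = mean(xs), mean(ys)
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (len(xs) - 1)


def sample_correlation(xs, ys):
    """Sample correlation r = s_xy / (s_x * s_y)."""
    # statistics.stdev also divides by n - 1, so the n - 1 factors cancel
    # and r has no units.
    return sample_covariance(xs, ys) / (stdev(xs) * stdev(ys))


# Hypothetical toy data, not from the lecture.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(sample_correlation(x, y), 3))                    # 0.775
print(round(sample_correlation(x, [10 * v for v in y]), 3))  # 0.775 (unchanged by units)
```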

Correlation Facts

• Bounded between -1 and +1
• Sign dictates direction of relationship (positive or negative)

• Magnitude dictates strength of relationship
• Rule of Thumb: If the magnitude is…

• Between 0.70 and 1.00, the strength of the relationship is strong.

• Between 0.40 and 0.70, the strength of the relationship is moderate.

• Between 0.10 and 0.40, the strength of the relationship is weak.

• Between 0.00 and 0.10, there is little to no relationship.

• Scatterplot must have a linear form for the correlation to make sense.
• As the scatterplot becomes more curved, the correlation becomes less accurate as a means of describing the relationship.
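The rule of thumb can be written as a small lookup. The slide's ranges share their endpoints (0.10, 0.40, 0.70), so this sketch assumes each endpoint belongs to the stronger category; the function name `describe_strength` is hypothetical. It only labels the number and cannot check the linear-form requirement, which still has to be judged from the scatterplot.

```python
def describe_strength(r):
    """Label a correlation using the lecture's rule of thumb.

    Assumption: shared endpoints (0.10, 0.40, 0.70) go to the stronger category.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation must lie between -1 and +1")
    magnitude = abs(r)
    if magnitude >= 0.70:
        strength = "strong"
    elif magnitude >= 0.40:
        strength = "moderate"
    elif magnitude >= 0.10:
        strength = "weak"
    else:
        return "little to no linear relationship"
    # Sign of r gives the direction of the relationship.
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction}"


print(describe_strength(0.85))   # strong positive
print(describe_strength(-0.55))  # moderate negative
print(describe_strength(0.05))   # little to no linear relationship
```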

Example: Calculating Correlation

• Scenario: Comparing height and weight of 15 adult males.

• Question: What is the correlation between height and weight?

• Answer: __________________________________

• Question: How should we describe the linear relationship?

• Answer: __________________________________

Variable | Mean | Std. Dev.
Height (in.) | 68.66 | 1.76
Weight (lbs.) | 192.91 | 16.55

Covariance (height, weight): 12.92
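Because the table already gives the covariance and both standard deviations, r follows from a single division. A short check in Python (the helper name is hypothetical); which variable is treated as x does not matter, since r is symmetric in x and y.

```python
def correlation_from_summary(covariance, sd_x, sd_y):
    """Sample correlation r = s_xy / (s_x * s_y) from summary statistics."""
    return covariance / (sd_x * sd_y)


# Summary statistics from the height/weight table (15 adult males).
r = correlation_from_summary(covariance=12.92, sd_x=1.76, sd_y=16.55)
print(round(r, 2))  # 0.44
```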


Using Excel

Example: Approximating Correlations

• Scenario: Scatterplots show observations with predictors from 1 to 25 and responses from 0 to 100.

• Task: Sort the scatterplots from weakest correlation to strongest.

• Answer: __________________

• Question: Approximately what are the correlations displayed in each scatterplot?

• Answers:

A. r = _____    B. r = _____    C. r = ______

Outliers

• Outlier: point on a scatterplot that is unusually far away from the rest of the data
• Has the potential to drastically change the value and/or direction of the correlation

• If an outlier exists, report the correlation both with and without it included in the calculation
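A sketch of the with/without comparison on made-up data (not the NCAA scores from the next slide): five points lie exactly on a rising line, and one far-off point is appended. Dropping that single point moves r from negative to a perfect +1, illustrating how one outlier can change both the value and the direction of the correlation. The helper `correlation` is hypothetical and computes Pearson's r from sums of squares and cross-products.

```python
from math import sqrt
from statistics import mean


def correlation(xs, ys):
    """Pearson correlation r from sums of squares and cross-products."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical data: five points on the line y = 2x plus one far-off point (6, -10).
x = [1, 2, 3, 4, 5, 6]
y = [2, 4, 6, 8, 10, -10]

r_with = correlation(x, y)
r_without = correlation(x[:-1], y[:-1])  # drop the last (outlying) point
print(f"With outlier: r = {r_with:.2f}; without: r = {r_without:.2f}")
# With outlier: r = -0.30; without: r = 1.00
```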


Example: Outliers

• Scenario: Scatterplot shows final scores of first-round games between 1 and 16 seeds in the NCAA Tournament over the last 2 years.

• Question: Which observation is an outlier?

• Answer: ___________________ → 1 seed: ____; 16 seed: ____
• First time a _____________________________________________

• Question: What impact does this point have on the correlation?

• Answer: Significantly __________________
• With: r = ________; Without: r = ________

• Takeaway: Outliers can ______________________ what appears to be an otherwise __________________________.

Virginia vs. UMBC

Least Squares Regression Line

• Least Squares Regression Line: the line that best fits the data on a scatterplot, in the sense that it minimizes the sum of the squared vertical distances between the points and the line
• Summarizes the general pattern to help us understand how the variables are related

• Allows us to make predictions about the response (Y) given some value of the predictor (X)

• Almost never perfect, but gives us a good idea of how the response changes as the predictor changes

Example: Preparing for Linear Regression

Necessary for finding the equation of the regression line

Not necessary for finding the regression line, but helps us understand the scope of the data


Least Squares Regression Line

• Least Squares Regression Line:

$$\hat{y} = b_0 + b_1 x$$

where $\hat{y}$ ("y-hat") is the predicted value of the response, $b_0$ is the intercept, $b_1$ is the slope, and $x$ is the value of the predictor.

• Slope: $b_1 = r \dfrac{s_y}{s_x}$

• Estimated change in the predicted response for a one-unit increase in the predictor

• Intercept: $b_0 = \bar{y} - b_1 \bar{x}$
• Predicted value of the response variable when the predictor equals 0
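The slope and intercept formulas can be sketched in Python by computing r from the data first. The function name and toy data are hypothetical. Because $b_1 = r\,s_y/s_x$ simplifies to $s_{xy}/s_x^2$, this gives the usual least squares slope, and the intercept formula forces the line through the point $(\bar{x}, \bar{y})$.

```python
from statistics import mean, stdev


def least_squares_line(xs, ys):
    """Return (b0, b1) for y_hat = b0 + b1 * x, using b1 = r * s_y / s_x."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (len(xs) - 1)
    r = s_xy / (s_x * s_y)
    b1 = r * s_y / s_x       # slope
    b0 = y_bar - b1 * x_bar  # intercept: line passes through (x_bar, y_bar)
    return b0, b1


# Hypothetical toy data, not from the lecture.
b0, b1 = least_squares_line([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(f"y-hat = {b0:.2f} + {b1:.2f}x")  # y-hat = 2.20 + 0.60x
```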

Example: Calculating Slope and Intercept

• Scenario: Using the number of attractions at an amusement park to predict the admission price

• Question: What is the equation of the regression line?

• Answer:

• Slope: ______________________________________________

• Intercept: ______________________________________________

• Regression Line: _____________________________

Variable | Mean | Std. Dev.
Attractions (X) | 30 | 10.23
Admission Price (Y) | 64 | 21.53

Correlation (X, Y): 0.588
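The same formulas work directly from summary statistics. This sketch plugs in the amusement park table; the helper name is hypothetical. It also shows that the intercept depends on whether the slope is rounded first: the unrounded slope gives an intercept of about 26.87, while rounding the slope to 1.24 before computing the intercept gives 26.80.

```python
def line_from_summary(x_bar, y_bar, s_x, s_y, r):
    """Slope and intercept of the least squares line from summary statistics."""
    b1 = r * s_y / s_x
    b0 = y_bar - b1 * x_bar
    return b0, b1


b0, b1 = line_from_summary(x_bar=30, y_bar=64, s_x=10.23, s_y=21.53, r=0.588)
print(round(b1, 2), round(b0, 2))  # 1.24 26.87

# Rounding the slope to two decimals before computing the intercept:
print(round(64 - round(b1, 2) * 30, 2))  # 26.8
```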

Example: Interpreting Slope and Intercept

• Scenario: Using the number of attractions at an amusement park to predict the admission price. The equation of the regression line is:

$$\hat{y} = 26.80 + 1.24x$$

• Question: What do the intercept and slope tell us?

• Answer:
• Intercept: A park that has _________________ would have a ___________________ ________________________________

• Slope: For every additional ________________________________________, the ________________________________________________________________.

Warning: Make sure you include the word "predicted" or "expected". The regression line only approximates values.


Example: Making Predictions

• Scenario: Using the number of attractions at an amusement park to predict the admission price. The equation of the regression line is:

$$\hat{y} = 26.80 + 1.24x$$

• Question: What would we expect to pay to enter an amusement park with 25 attractions?

• Answer: _______________________________

__________________
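Making a prediction is a single multiply-and-add. This sketch assumes the fitted line $\hat{y} = 26.80 + 1.24x$ (these coefficients reproduce the $57.80 prediction quoted later in the lecture); the function name `predict` is hypothetical.

```python
def predict(x, b0=26.80, b1=1.24):
    """Predicted admission price y_hat = b0 + b1 * x (assumed coefficients)."""
    return b0 + b1 * x


print(f"${predict(25):.2f}")  # $57.80
```

Predictions are only trustworthy for predictor values within the range of the data used to fit the line.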

Residuals

• Residual: the difference between the observed value of the response from the data and the value predicted by the regression line

$$e = y - \hat{y}$$

• Observations that are:
• Close to the regression line will have small residuals

• Far away from the regression line will have large residuals

• Above the regression line will have positive residuals

• Below the regression line will have negative residuals

In $e = y - \hat{y}$: $e$ is the residual (error), $y$ is the observed value from the data, and $\hat{y}$ is the predicted value from the regression line.

Example: Residuals

• Scenario: Using the number of attractions at an amusement park to predict the admission price. The amusement park that had 25 attractions had an admission price of $66.

• Question: What is the residual?

• Answer: ___________________________________

• Question: What does the residual mean?

• Answer: Admission price is _______________________ than what would be ______________________________________________.
• Observation lies ___________________________________________
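The residual is observed minus predicted, and its sign tells us which side of the line the point falls on. This sketch assumes the predicted price of $57.80 for 25 attractions (the value given later in the lecture); the helper name is hypothetical.

```python
def residual(observed, predicted):
    """Residual e = y - y_hat: positive when the point lies above the line."""
    return observed - predicted


e = residual(observed=66.00, predicted=57.80)
position = "above" if e > 0 else "below" if e < 0 else "on"
print(f"e = {e:.2f}; the observation lies {position} the regression line")
# e = 8.20; the observation lies above the regression line
```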


Assumptions in Linear Regression

• Quantitative Data Condition: Regression only applies to quantitative data.

• Linearity Assumption: The relationship between the predictor and response is fairly linear.

• Outlier Condition: No individual point drastically changes the slope of the regression line.

• Independence Assumption: Observations in the data are selected independently of one another.

• Equal Spread Condition: The spread of the observations around the regression line is consistent across all values of the predictor.

Equal Spread Condition

• Residual Plot: plot of predictor values against the corresponding residual for each observation
• Want: Plot with no discernible patterns (no direction and no shape)

[Figure: two residual plots. Equal Spread Condition Satisfied: no patterns; range of residuals is the same across all predictor values. Equal Spread Condition Not Satisfied: magnitude of residual depends on value of predictor; sign of residual depends on value of predictor.]

Example: Residual Plot

• Scenario: Using the number of attractions at an amusement park to predict the admission price. Residual plot shown below.

• Question: What does the residual plot reveal?

• Answer: We would expect the actual admission price of an amusement park to be _________________________________________________ _____________________________________

• Question: Does the equal spread condition appear to hold?

• Answer: ________
• Residuals appear to _________________ ________________________________________ ________________________________________


Standard Deviation of Residuals

• Standard Deviation of Residuals: measure of the size of a typical residual

$$s_e = \sqrt{\frac{\sum e^2}{n - 2}}$$

• Just like the "regular" standard deviation, residuals that are:
• Within one standard deviation of the regression line are quite common

• More than three standard deviations from the regression line are likely outliers

Note: The standard deviation of the residuals is rarely (if ever) calculated by hand. We'll see how to use Excel later to get this value.
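For intuition only (software normally reports this value), here is a sketch that fits the least squares line to hypothetical toy data, computes each residual, and applies the $s_e$ formula. The names and data are illustrative assumptions. The slope is computed as $s_{xy}/s_x^2$, which equals $r\,s_y/s_x$.

```python
from math import sqrt
from statistics import mean


def residual_std(xs, ys):
    """s_e = sqrt(sum(e_i^2) / (n - 2)) for the least squares line fit to (xs, ys)."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Slope as sum of cross-products over sum of squares (same as r * s_y / s_x).
    b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
          / sum((x - x_bar) ** 2 for x in xs))
    b0 = y_bar - b1 * x_bar
    residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
    # Divide by n - 2 because two quantities (b0 and b1) were estimated from the data.
    return sqrt(sum(e ** 2 for e in residuals) / (len(xs) - 2))


# Hypothetical toy data, not from the lecture.
print(round(residual_std([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]), 3))  # 0.894
```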

Example: Standard Deviation of Residual

• Scenario: The admission price for an amusement park with 25 rides was $66. The predicted value was $57.80 and the standard deviation of the residuals is $17.76.

• Question: How unusual was our observation?

• Answer: __________________________
• Actual Residual: ____________

• Standard Deviation of Residuals: ____________
• Actual residual is _________________: observed value within ___________________________ _____________________________________
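One way to make "how unusual" concrete is to divide the residual by the standard deviation of the residuals and compare the result with the one- and three-standard-deviation guidelines. A minimal sketch using the numbers from this example; the function name is hypothetical.

```python
def residual_in_sds(observed, predicted, s_e):
    """Express a residual as a multiple of the residual standard deviation s_e."""
    return (observed - predicted) / s_e


z = residual_in_sds(observed=66.00, predicted=57.80, s_e=17.76)
print(round(z, 2))  # 0.46
if abs(z) <= 1:
    print("within one standard deviation of the line: quite common")
elif abs(z) > 3:
    print("more than three standard deviations away: likely an outlier")
else:
    print("between one and three standard deviations away")
```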

Variation in the Linear Model

• Two extremes:
• Perfect Linear Model: regression line perfectly predicts the response all the time; all residuals would be zero

• "Useless" Linear Model: predictor gives no information about the response

• Most linear models fall somewhere in between
• Predictor yields some information about the response, but predictions are not perfect and result in some residuals

[Figure panels: Perfect, Useless, Typical]


Variation in the Linear Model

• R-Squared: the fraction of the variability in the response (Y) that is explained by the predictor (X)
• Calculated by squaring the correlation: $r^2$

• No linear relationship between the variables if $r^2 = 0$

• Observations lie in a straight line if $r^2 = 1$

• Generally reported as a percentage

Note: The remainder of the variation, $(1 - r^2)$, is due to natural fluctuations of the response in the sample and comes from the residuals.
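Squaring r and formatting it as a percentage gives both the explained fraction and the remaining $(1 - r^2)$. A sketch using the correlation of 0.588 from the amusement park example; the function name is hypothetical.

```python
def r_squared(r):
    """Fraction of the variability in Y explained by X, computed as r squared."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation must lie between -1 and +1")
    return r ** 2


explained = r_squared(0.588)
print(f"Explained: {explained:.1%}; unexplained (1 - r^2): {1 - explained:.1%}")
# Explained: 34.6%; unexplained (1 - r^2): 65.4%
```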

Example: R-Squared

• Scenario: Using the number of attractions at an amusement park to predict the admission price. The correlation is 0.588.

• Question: What is the R-squared value?

• Answer: _____________________________________

• Question: What does the R-squared mean in context?

• Answer: _________ of the variability _____________________ is explained by _________________________________________________________
• The remaining ____________ is due to ___________________________________________ caused by other factors such as:
• ______________________

• ______________________

• ______________________

Using Excel

• To do regression in Excel, you must install the Data Analysis ToolPak, which is an add-in available on:
• Excel 2007, 2010, 2013, and 2016 for PC

• Excel 2016 for Mac

• To do regression in Excel:
• Choose "Data Analysis" from the "Data" tab to get the menu below

Links to videos are posted on CourseWeb to help you install the ToolPak.


Using Excel

Note: Focus on the parts of the output boxed in red. We'll worry about the rest once we get to inference later in the semester.

Appropriateness of Regression

• To determine if regression is appropriate:
• Ensure variables are quantitative and observations are independent

• Check the scatterplot for:

• Linearity

• Outliers

• Calculate the equation of the regression line

• Use the regression line to calculate the residuals

• Create the residual plot and check the equal spread condition

• Report final results

Example: Appropriateness of Regression

• Scenario: Comparing the area of states according to the region of the country, where 1 = Northeast, 2 = South, 3 = Midwest, 4 = West. The equation of the regression line is 2( = &G>H8G9 + GIHJK8$.

• Question: Is linear regression appropriate to use?

• Answer: ________
• Linear regression is _____________________________________________________________________________
• Could have assigned _______________________________

• Slope has ______________ because region ___________________

• Should use ______________________________________


Example: Appropriateness of Regression

• Scenario: Comparing a person's self-described happiness on a scale from 0-15 with the number of close friends they have.

• Question: Is linear regression appropriate to use?

• Answer: ________
• Residual plot _____________________________________

• Happiness ______________________________________________________________________

• Happiness ______________________________________________________

Lurking Variables

• Lurking Variable: a variable that could potentially confound or confuse a relationship between two other observed variables
• May make it seem as if the predictor (X) causes the response (Y) to occur

• In reality, the relationship between X and Y is a result of the lurking variable (L) being related to both

[Diagram: observed relationship between X and Y, with the lurking variable L related to both]

Example: Lurking Variables

• Scenario: Scatterplot comparing the number of firefighters at a fire with the amount of damage caused (in dollars).

• Question: Does having more firefighters cause more damage?

• Answer: ____________________

• Question: What is the lurking variable?

• Answer: ____________________________
• _____________________ require more help

[Diagram: Firefighters and Damage, both related to the lurking variable _________________]