August 2016
Introduction to Econometrics is an introductory book for undergraduate students in economics
and finance. It is designed to give students an understanding of why econometrics is necessary,
and to provide them with a working knowledge of basic econometric tools so that (1) they can
apply these tools to modeling, estimation, inference, and forecasting in the context of real-world
economic problems; (2) they can evaluate critically the results and conclusions from others who
use basic econometric tools; (3) they have a foundation and understanding for further study of
econometrics, …. It is assumed that students have taken courses in the principles of economics
and had review of probability and statistics. You will need to remember this material from previous semesters to do well in this class.
• There are three types of data which econometricians might use for analysis:
  1. Time series data
  2. Cross-sectional data
  3. Panel data, a combination of 1 and 2.
• The data may be quantitative (e.g. exchange rates, stock prices, number of
shares outstanding), or qualitative (e.g. day of the week).
• Examples of time series data:
  Series                           Frequency
  GNP or unemployment              monthly or quarterly
  government budget deficit       annually
  money supply                     weekly
  value of a stock market index    as transactions occur
• Continuous data can take on any value and are not confined to take specific numbers.
• Their values are limited only by precision. o For example, the rental yield on a property could be 6.2%, 6.24%, or 6.238%.
• On the other hand, discrete data can only take on certain values, which are usually integers o For instance, the number of people in a particular underground carriage or the number
of shares traded during a day.
• They do not necessarily have to be integers (whole numbers), though, and are often defined to be count numbers. o For example, until recently when they became 'decimalised', many financial asset prices were quoted to the nearest 1/16 or 1/32 of a dollar.
• Another way in which we could classify numbers is according to whether they are
cardinal, ordinal, or nominal.
• Cardinal numbers are those where the actual numerical values that a particular variable takes have meaning, and where there is an equal distance between the numerical values.
o Examples of cardinal numbers would be the price of a share or of a building, and the
number of houses in a street.
• Ordinal numbers can only be interpreted as providing a position or an ordering.
o Thus, for cardinal numbers, a figure of 12 implies a measure that is `twice as good' as a figure of 6. On the other hand, for an ordinal scale, a figure of 12 may be viewed as `better' than a figure of 6, but could not be considered twice as good. Examples of ordinal numbers would be the position of a runner in a race.
• Nominal numbers occur where there is no natural ordering of the values at all.
o Such data often arise when numerical values are arbitrarily assigned, such as telephone
numbers or when codings are assigned to qualitative data (e.g. when describing the exchange that a US stock is traded on).
• Cardinal, ordinal and nominal variables may require different modelling approaches or at least different treatments, as should become evident in the subsequent chapters.
• It is preferable not to work directly with asset prices, so we usually convert the raw prices into a series of returns. There are two ways to do this:

  Simple returns: R_t = ((p_t − p_{t−1})/p_{t−1}) × 100%;  log returns: R_t = ln(p_t/p_{t−1}) × 100%,

  where R_t denotes the return at time t, p_t denotes the asset price at time t, and ln denotes the natural logarithm.
• We also ignore any dividend payments, or alternatively assume that the price series have already been adjusted for dividends. A short Python sketch of both return definitions follows.
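A minimal sketch of the two return definitions; the price series here is invented purely for illustration, and numpy is an assumed dependency:

import numpy as np

# Hypothetical daily closing prices (illustrative values only)
p = np.array([100.0, 101.5, 99.8, 102.3])

# Simple returns: R_t = (p_t - p_{t-1}) / p_{t-1} x 100%
simple_returns = (p[1:] - p[:-1]) / p[:-1] * 100

# Log returns: R_t = ln(p_t / p_{t-1}) x 100%
log_returns = np.log(p[1:] / p[:-1]) * 100

print(simple_returns)  # approximately [1.5, -1.675, 2.505]
print(log_returns)     # close to the simple returns when changes are small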
• Overall task: analyze data to inform a (business) decision.
• Assume data relevant to the problem has been collected.
• Intermediate task: identify and summarize the data.
• Example: we've moved to a new city and wish to buy a home.
• Data: Y = selling price (in $ thousands) for n = 30 randomly sampled single-family homes.
• Consider the lowest home price, represented by "1" in the stem and "6" in the leaf.
• This represents a number between 155 and 164.9 (thousand dollars).
• In particular, it is the lowest price of $155,500.
• What does this graph tell you about home prices in this market?
• Sample mean, mY, measures the "central tendency" of the Y-values.
• Median also measures central tendency, but is less sensitive to very small/large values.
• Sample standard deviation, sY, measures spread/variation.
• Minimum and maximum.
• Percentiles, e.g., 25th percentile: 25% of Y-values are smaller and 75% of Y-values are larger.
• Question: what's another name for the 50th percentile?
• Standardizing calibrates a list of numbers (Y) to a common scale.
• Subtract the mean and divide by the standard deviation:

  Z = (Y − mY) / sY.

• Sample mean of the Z-values? 0.
• Sample standard deviation of the Z-values? 1.
• Exercise: use statistical software to create graphs, find summary statistics, and calculate standardized values (one way to do this in Python is sketched below).
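A sketch of the summary statistics and the standardizing step, using a hypothetical sample in place of the course data file:

import numpy as np

rng = np.random.default_rng(1)
y = rng.normal(280, 54, size=30)       # hypothetical home prices, $ thousands

m_y = y.mean()                         # sample mean, mY
med = np.median(y)                     # 50th percentile (the median)
s_y = y.std(ddof=1)                    # sample standard deviation, sY
q25, q75 = np.percentile(y, [25, 75])  # 25th and 75th percentiles

z = (y - m_y) / s_y                    # standardized values
print(z.mean(), z.std(ddof=1))         # approximately 0 and 1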
• Population: entire collection of objects of interest.
• Sample: (random) subset of the population.
• Statistical thinking: draw inferences about the population by using sample data.
• Model: mathematical abstraction of the real world used to make statistical inferences.
• Assumptions:
  ◦ the model provides a reasonable fit to the sample data;
  ◦ the sample is representative of the population.
• Normal distribution: simple, effective model ("bell-curve").
• Drawback to the CLT: we need to know the population standard deviation, SD(Y), to use it.
• Since we rarely know SD(Y), what would be a good estimate to use instead? The sample s.d., sY.
• Replacing SD(Y) with sY requires use of a t-distribution rather than the normal:
  ◦ the t-distribution is like the normal but more spread out (fatter tails) to reflect additional uncertainty;
  ◦ the additional uncertainty is due to using sY instead of assuming we know SD(Y);
  ◦ sY is a better estimate of SD(Y) for large n;
  ◦ the t-distribution accounts for this using degrees of freedom (df = n − 1 in this case);
  ◦ as df becomes large, the t-distribution looks more and more like the normal.
• Horizontal axis values are called critical values.
• Tail areas (under the density curve) represent probabilities.
• Example: Pr(t29 > 1.699) = 0.05.
• Note that critical values get closer to those for the normal as df gets larger (easy to verify with software, as sketched below).
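These tail areas and percentiles can be checked directly; scipy is an assumed choice here, and any statistical software works:

from scipy import stats

print(stats.t.sf(1.699, df=29))    # Pr(t29 > 1.699), approximately 0.05
print(stats.t.ppf(0.95, df=29))    # 95th percentile of t29, approximately 1.699

# critical values approach the normal's as df grows
print(stats.t.ppf(0.95, df=1000))  # approximately 1.646
print(stats.norm.ppf(0.95))        # approximately 1.645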
• Randomly sample Y1, Y2, ..., Yn from a population with mean E(Y).
• CLT: t-statistic = (mY − E(Y)) / (sY/√n) ∼ t_{n−1} (a t-distribution with n − 1 df).
• Assume home prices Y1, ..., Y30 have E(Y) = 280.
• Sample standard deviation, sY, is 53.8656.
• What is the 95th percentile of mY?
• Pr(t29 > 1.699) = 0.05.
• Goal: estimate the population mean E(Y).
• Best point estimate: the sample mean mY.
• How far off might we be? Can we quantify our uncertainty?
• Confidence interval: point estimate ± uncertainty.
• Example: the 80% confidence interval for E(Y) in the home prices application is 278.603 ± 12.893 = (265.710, 291.496).
• In other words, based on this dataset, we are 80% confident that the population mean home price is between $266,000 and $291,000.
• This leaves quite a bit of room for error (20%), so 90% and 95% intervals are more common.
• Question: will a 90% interval be narrower or wider than the 80% interval?
• Example: home prices Y1, ..., Y30.
• Sample mean, mY, is 278.603.
• Sample standard deviation, sY, is 53.8656.
• Calculate an 80% confidence interval for E(Y).
• The 90th percentile of t29 is 1.311.
• mY ± 90th percentile × (sY/√n) = 278.603 ± 1.311 × (53.8656/√30) = 278.603 ± 12.893 = (265.710, 291.496).
• Calculate a 90% confidence interval for E(Y) (a software sketch follows).
• Loosely speaking: based on this dataset, we are 80% confident that the population mean home price is between $266,000 and $291,000.
• More precisely: if we were to take a large number of random samples of size 30 from a population of sale prices and calculate an 80% confidence interval for each, then 80% of those confidence intervals would contain the (unknown) population mean.
• E.g., 10 confidence intervals for samples from a population with E(Y) marked by the vertical line:
• 8 of the intervals contain E(Y), while 2 don't.
• Confidence intervals tell us a range of plausible values for E(Y) with a specified confidence level.
• By contrast, hypothesis tests ask whether a particular value is plausible or not.
• Example: does a population mean of $255,000 seem plausible given our sample of 30 home prices?
  ◦ Upper-tail test: can we reject the possibility that E(Y) = 255 in favor of E(Y) > 255?
  ◦ Lower-tail test: can we reject the possibility that E(Y) = 255 in favor of E(Y) < 255?
  ◦ Two-tail test: can we reject the possibility that E(Y) = 255 in favor of E(Y) ≠ 255?
• Upper-tail test: null hypothesis NH: E(Y) = 255 versus alternative hypothesis AH: E(Y) > 255.
• If NH is true, then the sampling distribution of the t-statistic = (mY − E(Y)) / (sY/√n) is t_{n−1}.
• Recall that t_{n−1} has a bell shape centered at zero with most of its area (≈ 95%) between −2 and +2.
• So, if the value of the t-statistic is "not too far" from zero, we cannot reject NH.
• Conversely, a t-statistic much larger than zero favors AH (larger since this is an upper-tail test).
• How large does the t-statistic have to be before we reject NH in favor of AH?
• The significance level (e.g., 5%) determines a rejection region beyond a critical value (e.g., the 95th percentile of t_{n−1}).
• Upper-tail test: null hypothesis NH: E(Y) = 255 versus alternative hypothesis AH: E(Y) > 255.
• t-statistic = (mY − E(Y)) / (sY/√n) = (278.603 − 255) / (53.8656/√30) = 2.40.
• Significance level = 5%.
• The critical value is the 95th percentile of t29, which is 1.699.
• Since the t-statistic (2.40) > critical value (1.699), we reject NH in favor of AH.
• In other words, the sample data suggest that the population mean is greater than $255,000 (at a 5% significance level).
• Upper-tail test: null hypothesis NH: E(Y) = 255 versus alternative hypothesis AH: E(Y) > 255.
• If NH is true, then the sampling distribution of the t-statistic = (mY − E(Y)) / (sY/√n) is t_{n−1}.
• Recall that t_{n−1} has a bell shape centered at zero with most of its area (≈ 95%) between −2 and +2.
• So, if the upper-tail area beyond the t-statistic is "not too small," we cannot reject NH.
• Conversely, a very small upper-tail area favors AH.
• How small does the upper-tail area, called the p-value, have to be before we reject NH in favor of AH?
• Smaller than the significance level (e.g., 5%).
• Upper-tail test: null hypothesis NH: E(Y) = 255 versus alternative hypothesis AH: E(Y) > 255.
• t-statistic = (mY − E(Y)) / (sY/√n) = (278.603 − 255) / (53.8656/√30) = 2.40.
• Significance level = 5%.
• Since the t-statistic (2.40) is between 2.045 and 2.462, the p-value must be between 0.01 and 0.025.
• Since p-value < significance level, we reject NH in favor of AH.
• In other words, the sample data suggest that the population mean is greater than $255,000 (at a 5% significance level). An exact p-value is easy to obtain with software, as sketched below.
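A sketch of the exact p-value, which the table only brackets between 0.01 and 0.025:

from scipy import stats
import math

n, m_y, s_y, mu0 = 30, 278.603, 53.8656, 255

t_stat = (m_y - mu0) / (s_y / math.sqrt(n))  # 2.40
p_value = stats.t.sf(t_stat, df=n - 1)       # upper-tail area beyond 2.40
print(t_stat, p_value)                       # p is roughly 0.012

# reject NH at the 5% significance level since p_value < 0.05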
                    Do not reject NH     Reject NH
                    in favor of AH       in favor of AH
Reality  NH true    correct decision     type 1 error
         NH false   type 2 error         correct decision

• Pr(type 1 error) = significance level; the analyst selects this.
• But setting it too low can increase the chance of a type 2 error occurring.
• Trade-off: set the significance level at 5% (sometimes 1% or 10%); reduce the chance of a type 2 error by having n as large as possible and by using sound statistical methods.
• Also, use hypothesis tests judiciously and always keep in mind the possibility of making these errors.
• New problem: predict an individual Y-value picked at random from the population.
• Is this easier or more difficult than estimating the population mean?
• More difficult: imagine predicting the sale price of a new home on the market (versus estimating the average sale price of homes in this market); which answer would you be less certain about?
• Approach: calculate a prediction interval, like a confidence interval but with a larger range of uncertainty.
• Confidence interval: point estimate ± estimation uncertainty.
• Prediction interval: point estimate ± prediction uncertainty.
• Model: Yi = E(Y) + ei (i = 1, ..., n).
• Y-value to be predicted: Y* = E(Y) + e*.
• Point estimate of Y*? The sample mean, mY.
• Prediction error: Y* − mY = (E(Y) − mY) + e*.
• Variance of the estimation error (E(Y) − mY): sY²/n.
• Variance of the random error (e*): sY².
• Variance of the prediction error (Y* − mY): sY²(1 + 1/n).
• Confidence interval for E(Y): mY ± t-percentile × (sY/√n).
• Prediction interval for Y*: mY ± t-percentile × (sY √(1 + 1/n)).
• Example: home prices Y1, ..., Y30.
• Sample mean, mY, is 278.603.
• Sample standard deviation, sY, is 53.8656.
• Calculate an 80% prediction interval for Y.
• The 90th percentile of t29 is 1.311.
• mY ± 90th percentile × (sY √(1 + 1/n)) = 278.603 ± 1.311 × (53.8656 √(1 + 1/30)) = 278.603 ± 71.785 = (206.818, 350.388).
• We're 80% confident the sale price of an individual, randomly selected home in this market will be between $207,000 and $350,000.
• Calculate a 90% prediction interval for Y (a software sketch follows).
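The same arithmetic in Python; note how much wider this interval is than the corresponding confidence interval:

from scipy import stats
import math

n, m_y, s_y = 30, 278.603, 53.8656
t_90 = stats.t.ppf(0.90, df=n - 1)           # 1.311

half_width = t_90 * s_y * math.sqrt(1 + 1 / n)
print((m_y - half_width, m_y + half_width))  # roughly (206.8, 350.4)

# the 80% confidence interval half-width was only 12.893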
• Y is a quantitative response variable (a.k.a. dependent, outcome, or output variable).
• X is a quantitative predictor variable (a.k.a. independent or input variable, or covariate).
• The two variables play different roles, so it is important to identify which is which and define them carefully, e.g.:
  ◦ Y is sale price, in $ thousands;
  ◦ X is floor size, in thousands of square feet.
• How much do we expect Y to change by when we change the value of X?
• What do we expect the value of Y to be when we set the value of X at 2?
• Note: association (observational data), not causation (experimental data).
• Simple linear regression models straight-line relationships (like the upper two plots on the last slide).
• Suppose sale price is (on average) $190,300 plus 40.8 times floor size.
  ◦ E(Y|Xi) = 190.3 + 40.8Xi, where E(Y|Xi) means "the expected value of Y given that X is equal to Xi".
• Individual sale prices can deviate from this expected value by an amount ei (called a "random error").
  ◦ Yi|Xi = 190.3 + 40.8Xi + ei (i = 1, ..., n).
  ◦ Yi|Xi = deterministic part + random error.
• The error, ei, represents variation in Y due to factors other than X which we haven't measured, e.g., lot size, # beds/baths, age, garage, schools.
• Estimated equation: Ŷ = b0 + b1X = 190.3 + 40.8X.
• We expect Y = b0 when X = 0, but only if this makes sense and we have data close to X = 0 (not the case here).
• We expect Y to change by b1 when X increases by one unit, i.e., we expect sale price to increase by $40,800 when floor size increases by 1000 sq. feet.
• For this example, it is more meaningful to say we expect sale price to increase by $4080 when floor size increases by 100 sq. feet.
Model   Multiple R   R Squared   Adjusted R Squared   Regression Std. Error
1       0.972 a      0.945       0.927                2.7865
a Predictors: (Intercept), X.
• The regression standard error, s, estimates the standard deviation of the simple linear regression random errors:

  s = √(SSE / (n − 2)).

• The unit of measurement for s is the same as the unit of measurement for Y.
• Approximately 95% of the observed Y-values lie within plus or minus 2s of their fitted Y-values.
• Since 2s = 5.57, we can expect to predict an unobserved sale price from a particular floor size to within approx. ±$5570 (at a 95% confidence level). A fitting sketch in Python follows.
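A sketch of fitting the straight line by least squares; the five (floor size, sale price) pairs are invented stand-ins for the course data file, and statsmodels is an assumed library choice:

import numpy as np
import statsmodels.api as sm

# Hypothetical data: x = floor size (k sq. feet), y = sale price ($k)
x = np.array([1.683, 1.812, 1.950, 2.100, 2.269])
y = np.array([258.0, 264.5, 270.1, 276.0, 283.5])

fit = sm.OLS(y, sm.add_constant(x)).fit()
b0, b1 = fit.params                  # intercept and slope estimates
s = np.sqrt(fit.ssr / (len(y) - 2))  # regression standard error
print(b0, b1, s, fit.rsquared)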
• Without the model, we estimate Y with the sample mean mY.
• With the model, we estimate Y using the fitted Y-value, Ŷ.
• How much do we reduce our error when we do this?
• Total error without the model: TSS = Σ_{i=1}^{n} (Yi − mY)², the variation in Y about mY.
• Remaining error with the model: SSE = Σ_{i=1}^{n} (Yi − Ŷi)², the unexplained variation.
• Proportional reduction in error: R² = (TSS − SSE)/TSS.
• Home prices example: R² = (423.4 − 23.3)/423.4 = 0.945.
• 94.5% of the variation in sale price (about its mean) can be explained by a straight-line relationship between sale price and floor size.
Model   Multiple R   R Squared   Adjusted R Squared   Regression Std. Error
1       0.972 a      0.945       0.927                2.7865
a Predictors: (Intercept), X.
• R² measures the proportion of variation in Y (about its mean) that can be explained by a straight-line relationship between Y and X.
• If TSS = SSE then R² = 0: using X to predict Y hasn't helped and we might as well use mY to predict Y regardless of the value of X.
• If SSE = 0 then R² = 1: using X allows us to predict Y perfectly (with no random errors).
• Such extremes rarely occur; usually R² lies between zero and one, with higher values of R² indicating a more useful model.
• Significance level = 5%.
• The critical value is 3.182 (the 97.5th percentile of t3).
• Since the t-statistic (7.18) is between 5.841 and 10.215, the two-tail p-value is between 0.002 and 0.01.
• Since the t-statistic (7.18) > critical value (3.182) and the p-value < significance level, we reject NH in favor of AH.
• In other words, the sample data favor a nonzero slope (at a significance level of 5%).
• Exercise: do an upper-tail test for this example.
• Loosely speaking: based on this dataset, we are 95% confident that the population slope, b1, is between 22.7 and 58.9.
• More precisely: if we were to take a large number of random samples of size 5 from our population of homes and calculate a 95% confidence interval for each, then 95% of those confidence intervals would contain the (unknown) population slope.
• Exercise: calculate a 90% confidence interval for b1.
Four assumptions about the random errors, e = Y − E(Y) = Y − b0 − b1X:
• the probability distribution of e at each value of X has a mean of zero;
• the probability distribution of e at each value of X has constant variance;
• the probability distribution of e at each value of X is normal;
• the value of e for one observation is independent of the value of e for any other observation.
Checking the model assumptions
• Calculate the residuals, ê = Y − Ŷ = Y − b0 − b1X.
• Draw a residual plot with ê along the vertical axis and X along the horizontal axis (sketched in code below).
  ◦ Assess the zero mean assumption: do the residuals average out to zero as we move across the plot from left to right?
  ◦ Assess the constant variance assumption: is the (vertical) variation of the residuals similar as we move across the plot from left to right?
  ◦ Assess the independence assumption: do the residuals look "random," with no systematic patterns?
• Draw a histogram and QQ-plot of the residuals.
  ◦ Assess the normality assumption: does the histogram look approximately bell-shaped and symmetric, and do the QQ-plot points lie close to the line?
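One way to produce these diagnostic plots; the data are simulated from the text's fitted model rather than taken from the course data file, so this is a sketch only:

import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x = rng.uniform(1.6, 2.3, size=30)             # hypothetical floor sizes
y = 190.3 + 40.8 * x + rng.normal(0, 2.8, 30)  # text's model plus noise

fit = sm.OLS(y, sm.add_constant(x)).fit()

fig, axes = plt.subplots(1, 3, figsize=(12, 3))
axes[0].scatter(x, fit.resid)                  # zero mean? constant variance?
axes[0].axhline(0, color="gray")
axes[1].hist(fit.resid)                        # roughly bell-shaped?
sm.qqplot(fit.resid, line="s", ax=axes[2])     # points close to the line?
plt.show()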
• Assessing the assumptions in practice can be difficult and time-consuming.
• Taking the time to check the assumptions is worthwhile and can provide additional support for any modeling conclusions.
• A clear violation of one or more assumptions could mean the results are questionable and should probably not be used (possible remedies to come in Chapters 3 and 4).
• Regression results tend to be quite robust to mild violations of the assumptions.
• Checking assumptions when n is very small (or very large) can be particularly challenging.
• Example: CARS2 data file; is weight or horsepower better for predicting cost?
Interpreting model results
• We found a statistically significant straight-line relationship (at a 5% significance level) between Y = sale price ($k) and X = floor size (k sq. feet).
• Estimated equation: Ŷ = b0 + b1X = 190.3 + 40.8X.
• X = 0 does not make sense for this application, nor do we have data close to X = 0, so we cannot meaningfully interpret b0 = 190.3.
• We expect sale price to increase $4080 when floor size increases 100 sq. feet, for 1683–2269 sq. feet homes (95% confident sale price increases between $2270 and $5890 when floor size increases 100 sq. feet).
• We can expect a prediction of an unobserved sale price from a particular floor size to be accurate to within approximately ±$5570 (with 95% confidence).
• 94.5% of the variation in sale price (about its mean) can be explained by a straight-line relationship between sale price and floor size.
• Recall the confidence interval for a univariate population mean, E(Y): mY ± t-percentile × (sY/√n).
• Also, a prediction interval for an individual univariate Y-value: mY ± t-percentile × (sY √(1 + 1/n)).
• A similar distinction between confidence and prediction intervals holds for simple linear regression.
• The confidence interval for the population mean, E(Y), at a particular X-value is Ŷ ± t-percentile × (s_Ŷ).
• The prediction interval for an individual Y-value at a particular X-value is Ŷ ± t-percentile × (s_Ŷ*).
• Which should be wider? Is it harder to estimate a mean or predict an individual value?
3.1 Probability model for (X1, X2, ...) and Y
Multiple linear regression
• Y is a quantitative response variable (a.k.a. dependent, outcome, or output variable).
• (X1, X2, ...) are quantitative predictor variables (a.k.a. independent/input variables, or covariates).
• It is important to identify the variables and define them carefully, e.g.:
  ◦ Y is final exam score, out of 100;
  ◦ X1 is time spent partying during the last week of term, in hours;
  ◦ X2 is average time spent studying during term, in hours per week.
• How much do we expect Y to change by when we change the values of X1 and/or X2?
• What do we expect the value of Y to be when X1 = 7.5 and X2 = 1.3?
• A matrix of scatterplots shows all bivariate relationships in a multivariate dataset (e.g., the previous slide).
• However, patterns cannot tell us whether a multiple linear regression model can provide a useful mathematical approximation to these bivariate relationships.
• It is primarily useful for identifying any strange patterns or odd-looking values that might warrant further investigation before we start modeling.
• Home prices example: no odd values to worry about.
• The random errors, e, represent variation in Y due to factors other than X1 and X2 that we haven't measured, e.g., numbers of bedrooms/bathrooms, property age, garage size, or nearby schools.
• Use least squares to estimate the deterministic part of the model, E(Y), as Ŷ = b0 + b1X1 + b2X2.
  ◦ i.e., use statistical software to find the values of b0, b1, and b2 that minimize SSE = Σ_{i=1}^{n} (Yi − b0 − b1X1i − b2X2i)².
• Fitted model: Ŷ = 122.36 + 61.98X1 + 7.09X2 (a fitting sketch in Python follows).
• We expect Y = b0 when X1 = X2 = 0, but only if this makes sense and we have data close to X1 = X2 = 0.
• We expect Y to change by b1 when X1 increases by one unit and the other predictor X-variables stay constant, i.e., we expect sale price to increase $6200 when floor size increases 100 sq. feet and lot size stays constant.
• We expect Y to change by b2 when X2 increases by one unit and the other predictor X-variables stay constant, i.e., we expect sale price to increase $7090 when lot size increases one category and floor size stays constant.
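A sketch of the two-predictor fit; the six rows of data are hypothetical placeholders for the course data file:

import numpy as np
import statsmodels.api as sm

# Hypothetical data: X1 = floor size (k sq. feet), X2 = lot size category
X = np.column_stack([
    [1.7, 1.9, 2.0, 2.1, 2.3, 2.2],  # X1
    [2.0, 3.0, 3.0, 4.0, 4.0, 5.0],  # X2
])
y = np.array([252.0, 262.5, 270.0, 278.0, 285.5, 290.0])

fit = sm.OLS(y, sm.add_constant(X)).fit()
print(fit.params)                      # b0, b1, b2 minimizing SSE
print(fit.rsquared, fit.rsquared_adj)  # R-squared and adjusted R-squared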
Model   Multiple R   R Squared   Adjusted R Squared   Regression Std. Error
1       0.986 a      0.972       0.953                2.4753
a Predictors: (Intercept), X1, X2.
• The regression standard error, s, estimates the standard deviation of the multiple linear regression random errors:

  s = √(SSE / (n − k − 1)).

• The unit of measurement for s is the same as the unit of measurement for Y.
• Approximately 95% of the observed Y-values lie within plus or minus 2s of their fitted Y-values.
• 2s = 4.95, so we expect to predict an unobserved sale price from particular floor and lot size values to within approx. ±$4950 (at a 95% confidence level).
• Without the model, we estimate Y with the sample mean mY.
• With the model, we estimate Y using the fitted Y-value, Ŷ.
• How much do we reduce our error when we do this?
• Total error without the model: TSS = Σ_{i=1}^{n} (Yi − mY)², the variation in Y about mY.
• Remaining error with the model: SSE = Σ_{i=1}^{n} (Yi − Ŷi)², the unexplained variation.
• Proportional reduction in error: R² = (TSS − SSE)/TSS.
• Home prices example: R² = 0.972.
• 97.2% of the variation in sale price (about its mean) can be explained by a multiple linear regression relationship between sale price and (floor size, lot size).
Model   Multiple R   R Squared   Adjusted R Squared   Regression Std. Error
1       0.986 a      0.972       0.953                2.4753
a Predictors: (Intercept), X1, X2.
• R² measures the proportion of variation in Y (about its mean) that can be explained by a multiple linear regression relationship between Y and (X1, X2, ...).
• If TSS = SSE then R² = 0: using (X1, X2, ...) to predict Y hasn't helped and we may as well use mY to predict Y regardless of the (X1, X2, ...) values.
• If SSE = 0 then R² = 1: using (X1, X2, ...) allows us to predict Y perfectly (with no random errors).
• Such extremes rarely occur; usually R² lies between zero and one, with higher values of R² indicating a more useful model.
• Model building: what is the best way to model the relationship between Y and (X1, X2, ..., Xk)?
  ◦ e.g., should we use all k predictors, or just a subset?
• Consider a sequence of nested models, with each model in the sequence adding predictors to the previous model.
• Which model would R² say is the "best" model? The final model with all k predictors.
• Geometrical argument: start with a regression line on a 2D scatterplot, then add a second predictor to make the line a plane in a 3D scatterplot.
• In other words, R² always increases (or stays the same) as you add predictors to a model.
• R² has a clear interpretation since it represents the proportion of variation in Y (about its mean) explained by a multiple linear regression relationship between Y and (X1, X2, ...).
• But R² is not appropriate for finding a model that captures the major, important population relationships without overfitting every slight twist and turn in the sample relationships.
• We need an alternative criterion, one which penalizes models that contain too many unimportant predictor variables:

  adjusted R² = 1 − ((n − 1)/(n − k − 1))(1 − R²).

• In practice, we can obtain the value for adjusted R² directly from statistical software (or compute it by hand, as sketched below).
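The formula is simple enough to check directly; a small helper using the numbers from the home prices example (n = 6 is inferred from the ANOVA table shown later):

def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R-squared = 1 - ((n-1)/(n-k-1)) * (1 - R^2)."""
    return 1 - (n - 1) / (n - k - 1) * (1 - r2)

# Two-predictor home prices model: n = 6, k = 2, R^2 = 0.972
print(adjusted_r2(0.972, n=6, k=2))  # approximately 0.953, matching the table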
Model   Multiple R   R Squared   Adjusted R Squared   Regression Std. Error
1       0.826 a      0.682       0.603                7.1775
2       0.986 b      0.972       0.953                2.4753
a Predictors: (Intercept), X1.
b Predictors: (Intercept), X1, X2.

• Since adjusted R² is 0.603 for the single-predictor model but 0.953 for the two-predictor model, the two-predictor model is better than the single-predictor model (according to this criterion).
• In other words, there is no indication that adding X2 = lot size to the model causes overfitting.
• What happens to R² and s?
• Y = weekly labor hours.
• X1 = total weight shipped, in thousands of pounds.
• X2 = proportion shipped by truck.
• X3 = average shipment weight, in pounds.
• X4 = week.
• Compare two models:
• Since adjusted R² is 0.786 for the two-predictor model but 0.771 for the four-predictor model, the two-predictor model is better than the four-predictor model (according to this criterion).
• In other words, there is a suggestion that adding X2 = truck proportion and X4 = week to the model causes overfitting.
Model   Multiple R   R Squared   Adjusted R Squared   Regression Std. Error
1       0.986 a      0.972       0.953                2.4753
a Predictors: (Intercept), X1, X2.
• The multiple correlation coefficient, multiple R, measures the strength and direction of linear association between the observed Y-values and the fitted Ŷ-values from the model.
• Multiple linear regression: multiple R = +√R².
  ◦ e.g., 0.986 = √0.972 for the home prices example above.
• Beware: intuition about correlation can be seriously misleading when it comes to multiple linear regression (see the next two slides).
Model         Sum of Squares   df   Mean Square   Global F-stat   Pr(>F)
1 Regression  630.259          2    315.130       51.434          0.005 b
  Residual    18.381           3    6.127
  Total       648.640          5
a Response variable: Y.
b Predictors: (Intercept), X1, X2.

• Global F-stat = ((TSS − SSE)/k) / (SSE/(n − k − 1)) = ((648.640 − 18.381)/2) / (18.381/(6 − 2 − 1))
                = (R²/k) / ((1 − R²)/(n − k − 1)) = (0.97166/2) / ((1 − 0.97166)/(6 − 2 − 1)) = 51.4.
• The critical value, FINV(0.05, 2, 3), is 9.55.
• The p-value, FDIST(51.4, 2, 3), is 0.005.
• Reject NH in favor of AH; at least one of the predictors, (X1, X2), is linearly related to Y (a software sketch follows).
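The same global usefulness test in Python; FINV and FDIST above are Excel-style functions, and scipy's f.ppf and f.sf are their counterparts:

from scipy import stats

TSS, SSE, n, k = 648.640, 18.381, 6, 2

f_stat = ((TSS - SSE) / k) / (SSE / (n - k - 1))  # 51.4
crit = stats.f.ppf(0.95, dfn=k, dfd=n - k - 1)    # FINV(0.05, 2, 3) = 9.55
p_val = stats.f.sf(f_stat, dfn=k, dfd=n - k - 1)  # FDIST(51.4, 2, 3) = 0.005
print(f_stat, crit, p_val)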
Model         Sum of Squares   df   Mean Square   Global F-stat   Pr(>F)
1 Regression  5646.052         4    1411.513      ?               ? b
  Residual    1242.898         15   82.860
  Total       6888.950         19
a Response variable: Y.
b Predictors: (Intercept), X1, X2, X3, X4.

• Global F-stat = ((TSS − SSE)/k) / (SSE/(n − k − 1)) = (R²/k) / ((1 − R²)/(n − k − 1)) = ?
• The critical value, FINV(0.05, 4, 15), is 3.06.
• The p-value, FDIST(?, 4, 15), is ?
• Reject NH in favor of AH; at least one of the predictors, (X1, X2, X3, X4), is linearly related to Y.
• Suppose a global usefulness test suggests at least one of (X1, X2, ..., Xk) is linearly related to Y.
• Can a reduced model with fewer than k predictor variables be better than the complete k-predictor model?
  ◦ Yes, if a subset of the X's provides no useful information about Y beyond the information provided by the other X's.
• Complete k-predictor model: SSE_C. Reduced r-predictor model: SSE_R.
• Which is larger? (Recall the geometric argument.)
• Which model is favored if it is a lot larger? Which model is favored if it is just a little larger?
• NH: b2 = b4 = 0; AH: at least one of b2 or b4 is not equal to 0.
• Nested F-stat = 0.472.
• Significance level = 5%.
• The critical value, FINV(0.05, 2, 15), is 3.68.
• The p-value, FDIST(0.472, 2, 15), is 0.633.
• We cannot reject NH in favor of AH.
• Neither X2 nor X4 appears to provide useful information about Y beyond the information provided by X1 and X3.
Model   R Squared   Adjusted R Squared   Std. Error   F-stat (change)   df1   df2   Pr(>F)
R       0.808 a     0.786                8.815
C       0.820 b     0.771                9.103        0.472             2     15    0.633
a Predictors: (Intercept), X1, X3.
b Predictors: (Intercept), X1, X2, X3, X4.
• There is a suggestion that adding X2 = truck proportion and X4 = week to the model causes overfitting. Why?
  ◦ Adjusted R² is higher for the reduced model.
  ◦ The regression standard error, s, is lower for the reduced model.
  ◦ The nested F-stat is not significant (high p-value), so the reduced model is favored (a software sketch of this test follows).
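A sketch of the nested-test arithmetic with scipy, using the degrees of freedom from this example; when both fitted models are available, statsmodels' compare_f_test method does the same job:

from scipy import stats

# Complete model: k = 4 predictors; reduced model: r = 2; sample size n = 20
f_stat = 0.472                        # nested F-statistic from the text
dfn, dfd = 4 - 2, 20 - 4 - 1          # 2 and 15

crit = stats.f.ppf(0.95, dfn, dfd)    # FINV(0.05, 2, 15), about 3.68
p_val = stats.f.sf(f_stat, dfn, dfd)  # FDIST(0.472, 2, 15), about 0.633
print(crit, p_val)                    # cannot reject NH: favor reduced model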
• Which predictors should we test in a nested model test?
• One possible approach is to consider the regression parameters individually.
• What do the sample estimates, b̂1, b̂2, ..., b̂k, tell us about likely values for the population parameters, b1, b2, ..., bk?
• An individual t-test for bp considers whether there is evidence that Xp provides useful information about Y beyond the information provided by the other k − 1 predictors. In other words:
  ◦ should we retain Xp in the model with the other k − 1 predictors (evidence suggests bp ≠ 0);
  ◦ or should we consider removing Xp from the model and retain only the other k − 1 predictors (evidence suggests bp = 0)?
• Significance level = 5%.
• The critical value, TINV(0.05, 15), is 2.13.
• The p-value, TDIST(2.28, 15, 2), is 0.038.
• Since the t-statistic (2.28) > critical value (2.13) and the p-value < significance level, we reject NH in favor of AH.
• The sample data favor b1 ≠ 0 (at a 5% significance level).
• There appears to be a linear relationship between Y and X1, once X2, X3, and X4 have been accounted for (or holding X2, X3, and X4 constant).
• The last two columns show individual t-stats and two-tail p-values.
• Low p-values indicate potentially useful predictors that should be retained (i.e., X1 and X3 here).
• High p-values indicate possible candidates for removal from the model (i.e., X2 and X4 here).
• However, the high p-value for X2 means we can remove X2, but only if we retain X1, X3, and X4.
• Similarly, the high p-value for X4 means we can remove X4, but only if we retain X1, X2, and X3.
• We can do individual regression parameter t-tests to:
  ◦ remove just one redundant predictor at a time;
  ◦ or identify which predictors to investigate with a nested model F-test.
• We need to do a nested model F-test to remove more than one predictor at a time.
• Using nested model F-tests allows us to use fewer hypothesis tests overall to help identify redundant predictors (so that the remaining predictors appear to explain Y adequately).
  ◦ This also lessens the chance of making any hypothesis test errors.
• Loosely speaking: based on this dataset, we are 95% confident that the population regression parameter, b1, is between 0.40 and 11.75 in the model E(Y) = b0 + b1X1 + b2X2 + b3X3 + b4X4.
• More precisely: if we were to take a large number of random samples of size 20 from our population of shipping numbers and calculate a 95% confidence interval for b1 in each, then 95% of those confidence intervals would contain the true (unknown) population regression parameter.
• What happens to this interval in the model E(Y) = b0 + b1X1 + b3X3?
• Use a global usefulness test to determine whether any of the potential predictors in a dataset are useful.
• Use nested model F-tests and individual parameter t-tests to identify the most important predictors.
• Employ tests judiciously to avoid conducting too many tests and to reduce the chance of making mistakes.
• If possible, identification of the important predictors should also be guided by practical considerations and background knowledge about the application.
• When k is very large, computer-intensive methods can help get things started:
  ◦ Forward selection: predictors added sequentially to an initial zero-predictor model;
  ◦ Backward elimination: predictors excluded sequentially from the full k-predictor model;
  ◦ Combined stepwise method: can proceed forwards or backwards at each stage;
  ◦ Other machine learning/data mining methods.
Four assumptions about the random errors, e = Y − E(Y) = Y − b0 − b1X1 − ··· − bkXk:
• the probability distribution of e at each set of values (X1, X2, ..., Xk) has a mean of zero;
• the probability distribution of e at each set of values (X1, X2, ..., Xk) has constant variance;
• the probability distribution of e at each set of values (X1, X2, ..., Xk) is normal;
• the value of e for one observation is independent of the value of e for any other observation.
Checking the model assumptions
• Calculate the residuals, ê = Y − Ŷ = Y − b0 − b1X1 − ··· − bkXk.
• Draw a residual plot with ê along the vertical axis and a function of (X1, X2, ..., Xk) along the horizontal axis (e.g., Ŷ or one of the X's).
  ◦ Assess the zero mean assumption: do the residuals average out to zero as we move across the plot from left to right?
  ◦ Assess the constant variance assumption: is the (vertical) variation of the residuals similar as we move across the plot from left to right?
  ◦ Assess the independence assumption: do the residuals look "random," with no systematic patterns?
• Draw a histogram and QQ-plot of the residuals.
  ◦ Assess the normality assumption: does the histogram look approximately bell-shaped and symmetric, and do the QQ-plot points lie close to the line?
• Assessing the assumptions in practice can be difficult and time-consuming.
• Taking the time to check the assumptions is worthwhile and can provide additional support for any modeling conclusions.
• A clear violation of one or more assumptions could mean the results are questionable and should probably not be used.
• Possible remedy: try a different subset of the available predictors (further ideas to come in Chapter 4).
• Regression results tend to be quite robust to mild violations of the assumptions.
• Checking assumptions when n is very small (or very large) can be particularly challenging.
• Example: MLRA data file.
Model 1 on the left: E(Y) = b0 + b1X1 + b2X2. Model 2 on the right: E(Y) = b0 + b1X1 + b2X2 + b3X3.

[Figure: residual plots for Model 1 (left) and Model 2 (right), with residuals on the vertical axis and X3 (0.0 to 0.8) on the horizontal axis.]

The plots include "loess fitted lines" (a computational method for applying the "slicing/averaging" technique). Do either of the models fail the zero mean assumption?
There is no evidence at the 5% significance level that X2 (proportion shipped by truck) or X4 (week) provide useful information about Y (weekly labor hours) beyond the information provided by X1 (total weight shipped in thousands of pounds) and X3 (average shipment weight in pounds).
Interpreting model results
• We found a statistically significant straight-line relationship (at a 5% significance level) between Y and X1 (holding X3 constant) and between Y and X3 (holding X1 constant).
• Estimated equation: Ŷ = 110.43 + 5.00X1 − 2.01X3.
• X1 = X3 = 0 makes no sense for this application, nor do we have data close to X1 = X3 = 0, so we cannot meaningfully interpret b0 = 110.43.
• We expect an increase of 5 weekly labor hours when total weight increases 1000 pounds and average shipment weight remains constant, for total weights of 2000–10,000 pounds and average weights of 10–30 pounds (95% confident the increase is 0.23–9.77).
• We expect a decrease of 2.01 weekly labor hours when average weight increases 1 pound and total weight remains constant, for total weights of 2000–10,000 pounds and average weights of 10–30 pounds (95% confident the decrease is 0.60–3.42).
• We can expect a prediction of unobserved weekly labor hours from particular values of total weight shipped and average shipment weight to be accurate to within approximately ±17.6 (with 95% confidence).
• 80.8% of the variation in weekly labor hours (about its mean) can be explained by a multiple linear regression relationship between labor hours and (total weight shipped, average shipment weight).
• Estimate the mean (or expected) value of Y at particular values of (X1, X2, ..., Xk).
• Formula: Ŷ ± t-percentile × (s_Ŷ).
• The interval is narrower:
  ◦ when n is large;
  ◦ when the X's are close to their sample means;
  ◦ when the regression standard error, s, is small;
  ◦ for lower levels of confidence.
• Example: for the shipping example's two-predictor model, the 95% confidence interval for E(Y) when X1 = 6 and X3 = 20 is (95.4, 105.0).
• Interpretation: we're 95% confident that expected weekly labor hours is between 95.4 and 105.0 when total weight shipped is 6000 pounds and average shipment weight is 20 pounds.
Prediction interval for an individual Y-value
• Predict an individual value of Y at particular values of (X1, X2, ..., Xk).
• Formula: Ŷ ± t-percentile × (s_Ŷ*).
• The interval is narrower:
  ◦ when n is large;
  ◦ when the X's are close to their sample means;
  ◦ when the regression standard error, s, is small;
  ◦ for lower levels of confidence.
• Since s_Ŷ* > s_Ŷ, the prediction interval is wider than the confidence interval.
• Example: for the shipping example's two-predictor model, the 95% prediction interval for Y* when X1 = 6 and X3 = 20 is (81.0, 119.4).
• Interpretation: we're 95% confident that actual labor hours in a week is between 81.0 and 119.4 when total weight shipped is 6000 pounds and average shipment weight is 20 pounds. (A software sketch of both intervals follows.)
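Statistical software computes both intervals from the fitted model; a sketch with statsmodels, using invented shipping data in place of the course file:

import numpy as np
import statsmodels.api as sm

# Hypothetical data: X1 = total weight (k lbs), X3 = ave. shipment weight (lbs)
X = np.column_stack([[2, 4, 5, 6, 8, 10], [12, 18, 20, 22, 26, 30]])
y = np.array([112.0, 120.5, 118.0, 125.0, 133.5, 140.0])

fit = sm.OLS(y, sm.add_constant(X)).fit()
pred = fit.get_prediction([[1.0, 6.0, 20.0]])  # constant term, X1 = 6, X3 = 20

print(pred.conf_int(alpha=0.05))            # 95% confidence interval for E(Y)
print(pred.conf_int(obs=True, alpha=0.05))  # wider 95% prediction interval for Y*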
Table B.1 contains critical values or percentiles for t-distributions; a description of how to use the table precedes it. Figure B.1 illustrates how to use the table to find bounds for an upper-tail p-value. Bounds for a lower-tail p-value involve a similar procedure for the negative (left-hand) side of the density curve. To find bounds for a two-tail p-value, multiply each bound for the corresponding upper-tail p-value by 2; for example, the two-tail p-value for the situation in Figure B.1 lies between 0.05 and 0.10.

Use Table B.1 and Figure B.2 to find critical values or percentiles for t-distributions; each row of the table corresponds to a t-distribution with the degrees of freedom shown in the left-hand column. The critical values in the body of the table represent values along the horizontal axis of the figure. Each upper-tail significance level in bold at the top of the table represents the area under the curve to the right of a critical value. For example, if the curve in the figure represents a t-distribution with 60 degrees of freedom, the right-hand shaded area under the curve to the right of the critical value 2.000 represents an upper-tail significance level of 0.025. Each two-tail significance level in bold at the bottom of the table represents the sum of the areas to the right of a critical value and to the left of the negative of that critical value. For example, for a t-distribution with 60 degrees of freedom, the sum of the shaded areas under the curve to the right of the critical value 2.000 and to the left of −2.000 represents a two-tail significance level of 0.05.

For t-distributions with degrees of freedom not in the table (e.g., 45), to be conservative you should use the table row corresponding to the next lowest number (i.e., 40 for 45 degrees of freedom), although you will lose some accuracy when you do this. Alternatively, use
computer help #8 in the software information files available from the book website to find exact percentiles (or critical values). For example, computer software will show that the 97.5th percentile of the t-distribution with 40 degrees of freedom is 2.021, while the 97.5th percentile of the t-distribution with 45 degrees of freedom is 2.014. Be careful to input the correct significance level when using software to find t-percentiles. For example, if you enter "0.05," software that expects a one-tail significance level will return the 95th percentile, whereas software that expects a two-tail significance level will return the 97.5th percentile. Use computer help #9 to turn these calculations around and find tail areas (or p-values). Again, be careful about whether the software is working with one- or two-tail areas. For example, the upper-tail area corresponding to the test statistic 2.021 for a t-distribution with 40 degrees of freedom is 0.025, while the two-tail area corresponding to the test statistic 2.021 for a t-distribution with 40 degrees of freedom is 0.05.
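For readers working in Python rather than the book's software, scipy reproduces these lookups (the numbers below are the ones quoted in the passage):

from scipy import stats

print(stats.t.ppf(0.975, df=40))     # 97.5th percentile with 40 df: 2.021
print(stats.t.ppf(0.975, df=45))     # 97.5th percentile with 45 df: 2.014

print(stats.t.sf(2.021, df=40))      # upper-tail area: about 0.025
print(2 * stats.t.sf(2.021, df=40))  # two-tail area: about 0.05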
Figure B.1 Density curve for a t-distribution showing two critical values from Table B.1 to the left and to the right of a calculated test statistic. The upper-tail p-value is between the corresponding upper-tail significance levels at the top of the table, in this case 0.025 and 0.05.

Figure B.2 Density curve for a t-distribution showing critical values (or percentiles or t-statistics) along the horizontal axis and significance levels (or probabilities or p-values) as areas under the curve.
Table B.1 Percentiles or critical values for t-distributions. The final row of the table, labeled Z, represents the standard normal distribution (equivalent to a t-distribution with infinite degrees of freedom).
t-distribution upper-tail significance level:
df        0.1      0.05     0.025    0.01     0.005    0.001
2         1.886    2.920    4.303    6.965    9.925    22.327
3         1.638    2.353    3.182    4.541    5.841    10.215
4         1.533    2.132    2.776    3.747    4.604    7.173
5         1.476    2.015    2.571    3.365    4.032    5.893
6         1.440    1.943    2.447    3.143    3.707    5.208
7         1.415    1.895    2.365    2.998    3.499    4.785
8         1.397    1.860    2.306    2.896    3.355    4.501
9         1.383    1.833    2.262    2.821    3.250    4.297
10        1.372    1.812    2.228    2.764    3.169    4.144
11        1.363    1.796    2.201    2.718    3.106    4.025
12        1.356    1.782    2.179    2.681    3.055    3.930
13        1.350    1.771    2.160    2.650    3.012    3.852
14        1.345    1.761    2.145    2.624    2.977    3.787
15        1.341    1.753    2.131    2.602    2.947    3.733
16        1.337    1.746    2.120    2.583    2.921    3.686
17        1.333    1.740    2.110    2.567    2.898    3.646
18        1.330    1.734    2.101    2.552    2.878    3.610
19        1.328    1.729    2.093    2.539    2.861    3.579
20        1.325    1.725    2.086    2.528    2.845    3.552
21        1.323    1.721    2.080    2.518    2.831    3.527
22        1.321    1.717    2.074    2.508    2.819    3.505
23        1.319    1.714    2.069    2.500    2.807    3.485
24        1.318    1.711    2.064    2.492    2.797    3.467
25        1.316    1.708    2.060    2.485    2.787    3.450
26        1.315    1.706    2.056    2.479    2.779    3.435
27        1.314    1.703    2.052    2.473    2.771    3.421
28        1.313    1.701    2.048    2.467    2.763    3.408
29        1.311    1.699    2.045    2.462    2.756    3.396
30        1.310    1.697    2.042    2.457    2.750    3.385
40        1.303    1.684    2.021    2.423    2.704    3.307
50        1.299    1.676    2.009    2.403    2.678    3.261
60        1.296    1.671    2.000    2.390    2.660    3.232
70        1.294    1.667    1.994    2.381    2.648    3.211
80        1.292    1.664    1.990    2.374    2.639    3.195
90        1.291    1.662    1.987    2.368    2.632    3.183
100       1.290    1.660    1.984    2.364    2.626    3.174
200       1.286    1.653    1.972    2.345    2.601    3.131
500       1.283    1.648    1.965    2.334    2.586    3.107
1000      1.282    1.646    1.962    2.330    2.581    3.098
Z         1.282    1.645    1.960    2.326    2.576    3.090
          0.2      0.1      0.05     0.02     0.01     0.002
t-distribution two-tail significance level.
Notation and Formulas

Upper-tail critical value for testing E(Y): t-percentile from t_{n−1} (significance level = area to the right)
Lower-tail critical value for testing E(Y): t-percentile from t_{n−1} (significance level = area to the left)
Two-tail critical value for testing E(Y): t-percentile from t_{n−1} (significance level = sum of tail areas)
Upper-tail p-value for testing E(Y): area under the t_{n−1} curve to the right of the t-statistic
Lower-tail p-value for testing E(Y): area under the t_{n−1} curve to the left of the t-statistic
Two-tail p-value for testing E(Y): 2 × area under the t_{n−1} curve beyond the t-statistic
Model for univariate data: Y = E(Y) + e
Point estimate for E(Y): mY
Confidence interval for E(Y): mY ± (t-percentile from t_{n−1})(sY/√n)
Point estimate for Y* (prediction): mY
Prediction interval for Y*: mY ± (t-percentile from t_{n−1})(sY √(1 + 1/n))
C.2 SIMPLE LINEAR REGRESSION

Notation and Formulas

Response values: Y; predictor values: X; sample size: n
Simple linear regression model: Y = E(Y) + e = b0 + b1X + e
Fitted regression model for E(Y): Ŷ = b̂0 + b̂1X
Estimated errors or residuals: ê = Y − Ŷ
Residual sum of squares: RSS = Σ_{i=1}^{n} êᵢ²
Regression standard error: s = √(RSS/(n − 2))
(with 95% confidence, we can expect to predict Y to within approx. ±2s)
Total sum of squares: TSS = Σ_{i=1}^{n} (Yi − mY)²
Coefficient of determination: R² = 1 − RSS/TSS
(linear association between Y and X explains R² of the variation in Y)
Coefficient of correlation: r = Σ_{i=1}^{n} (Yi − mY)(Xi − mX) / √(Σ_{i=1}^{n} (Yi − mY)² Σ_{i=1}^{n} (Xi − mX)²)
(r tells us the strength and direction of any linear association between Y and X)
t-statistic for testing b1: (b̂1 − b1)/s_{b̂1} (the test value, b1, is usually 0)
Upper-tail critical value for testing b1: t-percentile from t_{n−2} (significance level = area to the right)
Lower-tail critical value for testing b1: t-percentile from t_{n−2} (significance level = area to the left)
Two-tail critical value for testing b1: t-percentile from t_{n−2} (significance level = sum of tail areas)
Upper-tail p-value for testing b1: area under the t_{n−2} curve to the right of the t-statistic
Lower-tail p-value for testing b1: area under the t_{n−2} curve to the left of the t-statistic
Two-tail p-value for testing b1: 2 × area under the t_{n−2} curve beyond the t-statistic
Confidence interval for b1: b̂1 ± (t-percentile from t_{n−2})(s_{b̂1})
Point estimate for E(Y) (estimation) at Xp: Ŷ = b̂0 + b̂1Xp
Confidence interval for E(Y) at Xp: Ŷ ± (t-percentile from t_{n−2})(s_Ŷ)
Point estimate for Y* (prediction) at Xp: Ŷ = b̂0 + b̂1Xp
Prediction interval for Y* at Xp: Ŷ ± (t-percentile from t_{n−2})(s_Ŷ*)
Standard error of estimation: s_Ŷ = s √(1/n + (Xp − mX)²/Σ_{i=1}^{n} (Xi − mX)²)
Standard error of prediction: s_Ŷ* = s √(1 + 1/n + (Xp − mX)²/Σ_{i=1}^{n} (Xi − mX)²)
C.3 MULTIPLE LINEAR REGRESSION

Notation and Formulas

Response values: Y; predictor values: X1, X2, ..., Xk; sample size: n
Multiple linear regression model: Y = E(Y) + e = b0 + b1X1 + b2X2 + ··· + bkXk + e
Interpreting regression parameters in models such as E(Y) = b0 + b1X1 + b2X2:
b1 = expected change in Y when X1 increases by one unit (and X2 stays fixed)
Fitted regression model for E(Y): Ŷ = b̂0 + b̂1X1 + b̂2X2 + ··· + b̂kXk
Estimated errors or residuals: ê = Y − Ŷ
Residual sum of squares: RSS = Σ_{i=1}^{n} êᵢ²
Regression standard error: s = √(RSS/(n − k − 1))
(with 95% confidence, we can expect to predict Y to within approx. ±2s)
Total sum of squares: TSS = Σ_{i=1}^{n} (Yi − mY)²
Coefficient of determination: R² = 1 − RSS/TSS
[the linear regression model for (X1, ..., Xk) explains R² of the variation in Y]
Adjusted R² = 1 − ((n − 1)/(n − k − 1))(1 − R²)
Multiple R = √R² (the correlation between the observed Y-values and the fitted Ŷ-values)
Global F-statistic for testing b1 = b2 = ··· = bk = 0:
((TSS − RSS)/k) / (RSS/(n − k − 1)) = (R²/k) / ((1 − R²)/(n − k − 1))
Critical value: F-percentile from F_{k, n−k−1} (significance level = area to the right)
p-value: area under the F_{k, n−k−1} curve to the right of the F-statistic
Critical value: F-percentile from F^- r^ - i (significance level = area to the right) 105 p-value: area under the Ft-r^-i curve to the right of the F-statistic 105
bn ~bn
t-statistic for testing b„: -*■ - (the test value, b„, is usually 0) 109 Upper-tail critical value for testing bp: t-percentile from f„_/t_i
(significance level = area to the right) 111 Lower-tail critical value for testing bp: t-percentile from fn-t-i
(significance level = area to the left) 111 Two-tail critical value for testing bp: t-percentile from f„-*_i
(significance level = sum of tail areas) 110 Upper-tail p-value for testing bp: area under /„_*-] curve to right of t-statistic 111 Lower-tail p-value for testing bp: area under f„_*_i curve to left of t-statistic 111 Two-tail p-value for testing bp: 2 x area under f„_t_i curve beyond t-statistic 111 Confidence interval for bp: bp ± (t-percentile from /„_*_!) (s^ ) 113 Regression parameter standard errors: square roots of the diagonal entries of s2 (XTX) ~' 118 Point estimate for E(Y) (estimation) at (X\ ,X2,...,X^):
Y = b0 + blXi+b2X2+-+bkXk 126 Confidence interval forE{Y) at(Xi,X2,...,Xk):
Y ± (t-percentile from f„_n )(sf) ^6
Point estimate for Y* (prediction) at (X\ ,X2, ■ ■. ,Xk):
Y = bo + blXl+b2X2 + --+bkXk 128 Prediction interval for Y* at (X) ,X2,... ,Xk):
Y ± (t-percentile from r^t- i ) isf«) 128 Standard error of estimation: Sy = s\/xT(XJX)~lx 129 Standard error of prediction: sY, = s\/l +xT(XTX)_ 1x 129 Models with loge(F) as the response, for example, E(loge(y)) = bo + b\ X\ + b2X2:
exp(fc]) — 1 = proportional change in Y when X\ increases by 1 unit 155 (and X2 stays fixed)
Standardized residual: r,- = ,'< . 193
Studentized residual: /, = n J " ^ ^ 193
Leverages: diagonal entries of H = X(XTX)" * XT 196 Leverage (alternate formula): hi = , . j ; . ,2 'A 199
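The matrix formulas in this table are straightforward to check numerically. The sketch below (Python with numpy and scipy; the data and variable names are hypothetical, ours rather than the book's) computes the least squares estimates, the global F-statistic with its p-value, the parameter standard errors from $s^2 (X^{\mathsf{T}} X)^{-1}$, and the leverages and standardized residuals:

```python
import numpy as np
from scipy import stats

# Hypothetical data: n observations, k = 2 predictors
X1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])
X2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0, 8.0])
Y  = np.array([3.1, 3.9, 6.2, 6.8, 9.1, 9.7, 12.3])
n, k = len(Y), 2

X = np.column_stack([np.ones(n), X1, X2])   # design matrix with intercept
XtX_inv = np.linalg.inv(X.T @ X)
b_hat = XtX_inv @ X.T @ Y                   # least squares estimates

resid = Y - X @ b_hat
rss = np.sum(resid ** 2)
tss = np.sum((Y - Y.mean()) ** 2)
s = np.sqrt(rss / (n - k - 1))              # regression standard error
r2 = (tss - rss) / tss

# Global F-statistic for b1 = b2 = 0 and its p-value
F = ((tss - rss) / k) / (rss / (n - k - 1))
p_value = stats.f.sf(F, k, n - k - 1)       # area to the right of F

# Parameter standard errors: sqrt of diagonal of s^2 (X'X)^{-1}
se_b = s * np.sqrt(np.diag(XtX_inv))
t_stats = b_hat / se_b                      # t-statistics for testing b_p = 0

# Leverages: diagonal of the hat matrix H = X (X'X)^{-1} X'
h = np.diag(X @ XtX_inv @ X.T)
r_std = resid / (s * np.sqrt(1 - h))        # standardized residuals

print(b_hat, s, r2, F, p_value, se_b, t_stats)
```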
GLOSSARY

ANOVA test See Global usefulness test and Nested model test.
Autocorrelation Data collected over time can result in regression model residuals that violate the independence assumption because they are highly dependent across time (p. 202). Also called serial correlation.
Average See Mean.
Bivariate Datasets with two variables measured on a sample of observations (p. 35).
Categorical See Qualitative.
Collinearity See Multicollinearity.
Confidence interval A range of values that we are reasonably confident (e.g., 95%) contains an unknown population parameter such as a population mean or a regression parameter (p. 16). Also called a mean confidence interval.
Cook's distance A measure of the potential influence of an observation on a regression
model, due to either outlyingness or high leverage (p. 196).
Correlation A measure of linear association between two quantitative variables (p. 50).
Covariate(s) See Predictor variable(s).
Critical value A percentile from a probability distribution (e.g., t or F) that defines the
rejection region in a hypothesis test (p. 20).
Degrees of freedom Whole numbers for t, F, and $\chi^2$ distributions that determine the
shape of the density function, and therefore also critical values and p-values (p. 14).
Density curve Theoretical smoothed histogram for a probability distribution that shows the relative frequency of particular values for a random variable (p. 6).
Dependent variable See Response variable.
Distribution Theoretical model that describes how a random variable varies, that is, which values it can take and their associated probabilities (p. 5).
Dummy variables See Indicator variables.
Expected value The population mean of a variable.
Extrapolation Using regression model results to estimate or predict a response value for an observation with predictor values that are very different from those in our sample (p. 213).
Fitted value The estimated expected value, $\hat{Y}$, of the response variable in a regression model (p. 88). Also called an (unstandardized) predicted value.
Global usefulness test Hypothesis test to see whether any of the predictors in a multiple
linear regression model are significant (p. 101). An example of an ANOVA test.
Hierarchy A modeling guideline that suggests including lower-order predictor terms when also using higher-order terms, for example, keep $X_1$ when using $X_1^2$, keep $X_1$ and $X_2$ when using $X_1 X_2$, and keep $X_2$ when using $D X_2$ (p. 145).
Histogram A bar chart showing relative counts (frequencies) within consecutive ranges
(bins) of a variable (p. 3).
Hypothesis test A method for deciding which of two competing hypotheses about a
population parameter seems more reasonable (p. 19).
Imputation One method for dealing with missing data by replacing the missing values with imputed numbers, which might be sample means, model predictions, and so on (p. 215).
Independent variable(s) See Predictor variable(s).
Indicator variables Variables derived from qualitative variables that have values of 1 for
one category and 0 for all other categories (p. 167). Also called dummy variables.
Individual prediction interval See Prediction interval.
Input variable(s) See Predictor variable(s).
Interaction When the effect of one predictor variable on a response variable depends on
the value of another predictor variable (p. 159).
Least squares The computational criterion used to derive regression parameter estimates by minimizing the residual sum of squares, where the residuals are the differences between observed $Y$-values and fitted $\hat{Y}$-values (p. 88).
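As a small illustration of this criterion (a sketch with hypothetical data, using numpy's general-purpose least squares routine rather than anything specific to this book):

```python
import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.0, 4.1, 5.9, 8.2, 9.8])

A = np.column_stack([np.ones_like(X), X])      # columns: intercept, X
b_hat, *_ = np.linalg.lstsq(A, Y, rcond=None)  # minimizes the RSS
resid = Y - A @ b_hat
print(b_hat, np.sum(resid ** 2))               # estimates and minimized RSS
```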
Leverage A measure of the potential influence of a sample observation on a fitted regression model (p. 194).
Loess fitted line A smooth line for a scatterplot that fits a general nonlinear curve
representing the association between the variables on the two axes (p. 120).
Mean A measure of the central tendency of a variable, also known as the average (p. 4).
Median An alternative measure of the central tendency of a variable, which is greater
than half the sample values and less than the other half (p. 4).
Multicollinearity When there is excessive correlation between quantitative predictor variables that can lead to unstable multiple regression models and inflated standard errors (p. 206). Also called collinearity.
Multiple R The correlation between the observed $Y$-values and the fitted $\hat{Y}$-values from a regression model (p. 100).
Multivariate Datasets with two or more variables measured on a sample of observations
(p. 83).
Natural logarithm transformation A mathematical transformation for positive-valued quantitative variables which spreads out low values and pulls in high values; that is, it makes positively skewed data look more normal (p. 142).
Nested model test Hypothesis test to see whether a subset of the predictors in a multiple linear regression model is significant (p. 104). An example of an ANOVA test. Also called an R-squared change test.
Nominal See Qualitative.
Normal probability plot See QQ-plot.
Observed significance level See p-value.
Ordinal See Qualitative.
Outcome variable See Response variable.
Outlier A sample observation in a linear regression model with a studentized residual less than $-3$ or greater than $+3$ (p. 190).
Output variable See Response variable.
p-value The probability of observing a test statistic as extreme as the one observed or
even more extreme (in the direction that favors the alternative hypothesis) (p. 21).
Parameter A numerical summary measure for a population such as a population mean
or a regression parameter (p. 11).
Percentile A number that is greater than a certain percentage (say, 95%) of the sample
values and less than the remainder (5% in this case) (p. 4). Also called a quantile.
Point estimate A single number used as an estimate of a population parameter. For
example, the sample mean is a point estimate of the population mean (p. 15).
Polynomial transformation A mathematical transformation involving increasing powers
of a quantitative variable, for example, $X$, $X^2$, and $X^3$ (p. 144).
Population The entire collection of objects of interest about which we would like to
make statistical inferences (p. 5).
Predicted value See Fitted value.
Prediction interval A range of values that we are reasonably confident (e.g., 95%) contains an unknown data value (such as for univariate data or for a regression response variable) (p. 25). Also called an individual prediction interval.
Predictor effect plot A line graph that shows how a regression response variable varies
with a predictor variable holding all other predictors constant (p. 224).
Predictor variable(s) Variable(s) in a regression model that we use to help estimate or predict the response variable; also known as independent or input variable(s), or covariate(s) (p. 83).
Probability Mathematical method for quantifying the likelihood of particular events
occurring (p. 9).
QQ-plot A scatterplot used to assess the normality of some sample values (p. 8).
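For example, a QQ-plot of regression residuals might be drawn as follows (a sketch assuming scipy and matplotlib are available; the residuals here are simulated stand-ins):

```python
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

# Simulated stand-in for residuals from a fitted regression model
resid = np.random.default_rng(1).normal(size=50)

# Sample quantiles plotted against theoretical normal quantiles;
# points close to the reference line support the normality assumption
stats.probplot(resid, dist="norm", plot=plt)
plt.show()
```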
Quadratic A particular type of polynomial transformation that uses a variable and its square, for example, $X$ and $X^2$ (p. 145).
Qualitative Data variable that contains labels for categories to which each sample observation belongs (p. 166). Also called categorical, nominal (if there is no natural order to the categories, e.g., male/female), or ordinal (if there is a natural order to the categories, e.g., small/medium/large).
Quantile See Percentile.
Quantitative Data variable that contains meaningful numerical values that measure some
characteristic for each sample observation. Also called a scale measure (p. 35).
R-squared ($R^2$) The proportion of variation in a regression response variable (about its
mean) explained by the model (p. 94).
R-squared change test See Nested model test.
Reciprocal transformation A mathematical transformation that divides a quantitative variable into 1, for example, $1/X$ (p. 147).
Reference level One of the categories of a qualitative variable selected to be the comparison level for all the other categories. It takes the value zero for each of the indicator variables used (p. 174).
Regression coefficients See Regression parameters.
Regression parameters The numbers multiplying the predictor values in a multiple linear regression model, that is, $(b_1, b_2, \ldots)$ in $E(Y) = b_0 + b_1 X_1 + b_2 X_2 + \cdots$. Also called (unstandardized) regression coefficients (p. 86).
Regression standard error (s) An estimate of the standard deviation of the random errors in a multiple linear regression model (p. 93). Also called standard error of the estimate in SPSS, root mean squared error in SAS, and residual standard error in R.
Rejection region The range of values for a probability distribution that leads to rejection of a null hypothesis if the test statistic falls in this range (p. 20).
Residual The difference, $\hat{e}$, between a response $Y$-value and a fitted $\hat{Y}$-value in a regression model (p. 119).
Residual standard error R terminology for regression standard error.
Response variable Variable, Y, in a regression model that we would like to estimate or
predict (p. 83). Also known as a dependent, outcome, or output variable.
Root mean squared error SAS terminology for regression standard error.
Sample A (random) subset of the population for which we have data values (p. 11).
Sampling distribution The probability distribution of a test statistic under (hypothetical)
repeated sampling (p. 12).
Scatterplot A graph representing bivariate data with one variable on the vertical axis and
the other on the horizontal axis (p. 37).
Scatterplot matrix A matrix of scatterplots representing all bivariate associations in a
set of variables (p. 89).
Serial correlation See Autocorrelation.
Significance level The probability of falsely rejecting a null hypothesis when it is true— used as a threshold for determining significance when a p-value is less than this (p. 20).
Standardize Rescale a variable by subtracting a sample mean value and dividing by a sample standard deviation value. The resulting Z-value has a mean equal to 0 and a standard deviation equal to 1 (p. 4).
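A quick numerical check of this definition (hypothetical values):

```python
import numpy as np

x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
z = (x - x.mean()) / x.std(ddof=1)  # subtract mean, divide by sample SD
print(np.round(z.mean(), 12), z.std(ddof=1))  # approximately 0 and 1
```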
Standard deviation A measure of the spread of a variable, with most of the range of a
normal random variable contained within 3 standard deviations of the mean (p. 4).
Standard error An estimate of a population standard deviation, often used to quantify the sampling variability of a test statistic or model estimate (p. 26).
Standard error of a regression parameter A standard deviation estimate used in hypothesis tests and confidence intervals for regression parameters (p. 111).
Standard error of estimation A standard deviation estimate used in hypothesis tests and
confidence intervals for a univariate population mean (p. 26).
Standard error of estimation for regression A standard deviation estimate used in
confidence intervals for the population mean in a regression model (p. 126).
Standard error of prediction A standard deviation estimate used in prediction intervals
for a univariate prediction (p. 26).
Standard error of prediction for regression A standard deviation estimate used in
prediction intervals for an individual response value in a regression model (p. 128).
Standard error of the estimate SPSS terminology for regression standard error.
Statistic A numerical summary measure for a sample such as a sample mean or an
estimated regression parameter (p. 11).
Stem-and-leaf plot A variant on a histogram where numbers in the plot represent actual
sample values or rounded sample values (p. 2).
Test statistic A rescaled numerical summary measure for a sample that has a known sampling distribution under a null hypothesis, for example, a t-statistic for a univariate mean or a t-statistic for a regression parameter (p. 19).
Unbiased When a statistic is known to estimate the value of the population parameter
correctly on average under repeated sampling (p. 11).
Univariate Datasets with a single variable measured on a sample of observations (p. 1).
Variance The square of the standard deviation (p. 10).
Variance inflation factor (VIF) An estimate of how much larger the variance of a regression parameter estimate becomes when the corresponding predictor is included in the model (p. 206).
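One common way to compute a VIF, sketched below with hypothetical data (the vif helper is our own illustration, not a function from any particular library), is to regress each predictor on all the others and set $\mathrm{VIF}_j = 1/(1 - R_j^2)$:

```python
import numpy as np

def vif(X):
    """VIF for each column of predictor matrix X (no intercept column):
    regress column j on the other columns and return 1 / (1 - R_j^2)."""
    n, p = X.shape
    out = []
    for j in range(p):
        y = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        b, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ b
        r2 = 1 - resid @ resid / np.sum((y - y.mean()) ** 2)
        out.append(1 / (1 - r2))
    return np.array(out)

# Hypothetical, strongly correlated predictors
rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
x2 = x1 + 0.1 * rng.normal(size=100)   # nearly collinear with x1
print(vif(np.column_stack([x1, x2])))  # both VIFs come out very large
```

Because x2 is nearly a copy of x1 here, both VIFs are very large, reflecting exactly the instability and inflated standard errors that the Multicollinearity entry warns about.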
Z-value See Standardize.