Diploma in Statistics: Introduction to Regression
Lecture 2.1
1. Review of Lecture 1.1
2. Correlation
3. Pitfalls with Regression and Correlation
4. Introducing Multiple Linear Regression
– Job times case study
– Stamp sales case study
5. Homework
Review of Lecture 1.1
Scatter plot of US mail handling data, exceptions deleted
[Figure: Manhours (about 550 to 700) plotted against Volume (about 150 to 190)]
Always look at your data!
"Although regression can be done without ever looking at a scatter plot, that is the statistical equivalent of flying blind"
Amy Lap Mui Choi, JF MSISS, 1993/94.
"Decision-making under risk is when you know what will probably happen and
decision-making under uncertainty is when you probably know what will happen."
Anon., JF MSISS 1995/96
Simple linear regression model with Normal model for chance variation
[Figure: Manhours plotted against Volume]
$Y = \alpha + \beta X + \varepsilon$, with $\varepsilon \sim N(0, \sigma^2)$
The prediction formula
Prediction equation: $\hat{Y} = \hat{\alpha} + \hat{\beta}X$
Prediction equation allowing for chance variation: $Y = \hat{\alpha} + \hat{\beta}X \pm 2\hat{\sigma}$
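As a minimal sketch, the prediction formula with its allowance of two standard deviations for chance variation can be coded directly. The estimates below come from the Minitab output for the mail data (intercept 50.44, slope 3.3454, S = 18.93 as the estimate of σ); the function name `predict` and the volume X = 170 are illustrative choices, not part of the lecture.

```python
def predict(alpha_hat: float, beta_hat: float, sigma_hat: float, x: float):
    """Return the point prediction and the rough interval alpha + beta*x +/- 2*sigma."""
    y_hat = alpha_hat + beta_hat * x
    margin = 2 * sigma_hat  # allowance for chance variation
    return y_hat, (y_hat - margin, y_hat + margin)


# Estimates from the Minitab output for Manhours versus Volume
y_hat, (low, high) = predict(50.44, 3.3454, 18.93, x=170)
print(f"Predicted manhours: {y_hat:.1f}, interval ({low:.1f}, {high:.1f})")
```

Note that 2 × 18.93 is close to 38, so rounding the allowance to ±40 makes little practical difference.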
Homework
Use the prediction formula
$Y = 50 + 3.35X \pm 40$
to predict the extra manpower requirement during the Christmas period, based on the experience of Period 7, Fiscal 1963, when Y was 1,070 and X was 270.
Compare with actual.
Comment.
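The arithmetic for this homework can be checked with a few lines, a sketch using the rounded formula Y = 50 + 3.35X ± 40. When commenting, bear in mind that the volumes in the scatter plot run from about 150 to 190, so X = 270 is well outside the range of the data used to fit the line.

```python
# Rounded prediction formula from the homework: Y = 50 + 3.35 X +/- 40
x_christmas = 270  # Volume, Period 7, Fiscal 1963
y_actual = 1070    # Manhours actually recorded

y_hat = 50 + 3.35 * x_christmas
low, high = y_hat - 40, y_hat + 40
print(f"Predicted {y_hat:.1f}, interval ({low:.1f}, {high:.1f}), actual {y_actual}")
print("Actual inside interval:", low <= y_actual <= high)
```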
Application 1: Confidence interval for marginal change
Recall confidence interval for
or
Confidence interval for $\beta$: $\hat{\beta} \pm 2\,SE(\hat{\beta})$
Small sample: $\hat{\beta} \pm t_{n-2,.05}\,SE(\hat{\beta})$
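Applied to the mail data, a sketch of both versions of the interval uses the Volume coefficient and its standard error from the Minitab output, together with the critical value $t_{21,.05} = 2.08$ that the lecture quotes for this data set (n − 2 = 21).

```python
beta_hat = 3.3454  # slope estimate (Volume coefficient)
se_beta = 0.3401   # its standard error
t_crit = 2.08      # t with 21 degrees of freedom, .05 level

large_sample = (beta_hat - 2 * se_beta, beta_hat + 2 * se_beta)
small_sample = (beta_hat - t_crit * se_beta, beta_hat + t_crit * se_beta)
print("beta +/- 2 SE:   (%.3f, %.3f)" % large_sample)
print("beta +/- t21 SE: (%.3f, %.3f)" % small_sample)
```

The two intervals differ only in the third significant figure, which is why the rough multiplier of 2 is usually adequate in practice.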
Selected critical values for the t-distribution, for probabilities .25, .10, .05, .02, .01, .002, .001
Application 2: Testing the statistical significance of the intercept
Formal test:
H0: $\alpha = 0$
Test statistic: $Z = \dfrac{\hat{\alpha} - 0}{SE(\hat{\alpha})} = \dfrac{\hat{\alpha}}{SE(\hat{\alpha})}$
Critical value: 2 (or $t_{21,.05} = 2.08$)
Calculated value: 0.848
Comparison: Z < 2 (or t < 2.08)
Conclusion: Accept H0
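A short sketch of the formal test, using the intercept estimate 50.4394 and its standard error 59.46 for the mail data, reproduces the calculated value 0.848 and compares it with both critical values.

```python
alpha_hat = 50.4394  # intercept estimate
se_alpha = 59.46     # its standard error

z = (alpha_hat - 0) / se_alpha  # test statistic under H0: alpha = 0
print(f"Z = {z:.3f}")
for label, crit in [("rough, 2", 2.0), ("t21,.05", 2.08)]:
    verdict = "reject H0" if abs(z) > crit else "accept H0"
    print(f"critical value {label}: {verdict}")
```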
Testing the statistical significance of the intercept
Informal test:
$\hat{\alpha} = 50.4394$ is less than its standard error, $SE(\hat{\alpha}) = 59.46$.
Draw a picture!
More on Minitab results
Regression Analysis: Manhours versus Volume
The regression equation is
Manhours = 50.4 + 3.35 Volume

Predictor   Coef     SE Coef   T      P
Constant    50.44    59.46     0.85   0.406
Volume      3.3454   0.3401    9.84   0.000

S = 18.9300
Homework
In a study of a wholesaler's distribution costs, undertaken with a view to cost control, the volume of goods handled and the overall costs were recorded for one month in each of ten depots in a distribution network. The results are presented in the following table.
The simple linear regression of costs (Y) on volume (X) was calculated, and resulted in the following numerical summary.
Regression Analysis: Costs versus Volume
The regression equation is
Costs = 2.98 + 0.332 Volume

Predictor   Coef      SE Coef   T       P
Constant    2.982     1.646     1.81    0.108
Volume      0.33174   0.03182   10.42   0.000
S = 0.667603
Homework
(i) Draw a scatter plot for these data. Comment. Interpret the numerical summary in context.
(ii) Calculate a prediction interval for costs next month when Volume in Depot 1 is planned to be £40,000, and Volume in Depot 2 is planned to be £51,000.
(iii) Next month, when the two depots recorded volumes of £40,000 and £51,000 as planned, costs were £1,700 and £2,300 respectively. Comment on each case. Illustrate with an enhancement of your scatter plot.
Homework Solution (i)
[Figure: Scatterplot of Costs vs Volume; Costs (about 15 to 24) against Volume (about 40 to 65)]
There appears to be a strong positive relationship between Costs and Volume.
Homework Solution (i)
Costs increase approximately linearly with Volume, by around £33.20 for every £1,000 increase in Volume, from a base of around £300.
(Costs = 2.98 + 0.332 Volume)
The cost for a given volume is subject to chance variation with a standard deviation of around £67.
(S = 0.667603)
Homework Solution
(ii) Volume = £40,000: Costs (£1,491, £1,759)
Volume = £51,000: Costs (£1,857, £2,124)
(iii) £1,700 is within the corresponding prediction interval, satisfactory.
£2,300 is outside the corresponding prediction interval, too high. An investigation is needed.
Illustrate
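A sketch of parts (ii) and (iii) with the rough interval α̂ + β̂X ± 2S follows. The units are an assumption inferred from the worked solution (£33.20 per £1,000 of volume, a base of about £300, S of about £67): Costs are taken to be in £100s and Volume in £1,000s. The helper name `rough_interval` is hypothetical.

```python
# Minitab estimates for Costs versus Volume
# (assumed units: Costs in £100s, Volume in £1,000s)
ALPHA_HAT, BETA_HAT, S = 2.982, 0.33174, 0.667603


def rough_interval(volume_thousands: float) -> tuple[float, float]:
    """Prediction interval alpha + beta*X +/- 2S, converted to pounds."""
    centre = ALPHA_HAT + BETA_HAT * volume_thousands
    return 100 * (centre - 2 * S), 100 * (centre + 2 * S)


for volume, actual in [(40, 1700), (51, 2300)]:
    low, high = rough_interval(volume)
    status = "within" if low <= actual <= high else "outside"
    print(f"Volume £{volume},000: (£{low:,.0f}, £{high:,.0f}); actual £{actual:,} is {status}")
```

The printed limits agree with the quoted intervals to within £1 of rounding.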
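More precise formulas
Confidence interval for mean response:
$\hat{\alpha} + \hat{\beta}X \pm 2s\sqrt{\dfrac{1}{n} + \dfrac{(X - \bar{X})^2}{(n-1)s_X^2}}$
Prediction interval for next response:
$Y = \hat{\alpha} + \hat{\beta}X \pm 2s\sqrt{1 + \dfrac{1}{n} + \dfrac{(X - \bar{X})^2}{(n-1)s_X^2}}$
(ii) Volume = £40,000: Costs (£1,444, £1,807)
Volume = £51,000: Costs (£1,829, £2,151)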
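The more precise formulas need $\bar{X}$ and $s_X$, and the depot table is not reproduced here. As an assumption for this sketch, both are recovered from the reported standard errors using $SE(\hat{\beta}) = s/\sqrt{S_{XX}}$ and $SE(\hat{\alpha}) = s\sqrt{1/n + \bar{X}^2/S_{XX}}$, where $S_{XX} = (n-1)s_X^2$. The second identity is simply the mean-response formula evaluated at X = 0. With the multiplier 2 replaced by $t_{8,.05} = 2.306$ (n = 10 depots), the sketch reproduces the quoted intervals to within about £1. This suggests, though it does not prove, that the slide figures used the t value. Costs are assumed to be in £100s and Volume in £1,000s.

```python
import math

# Reported Minitab summary for Costs versus Volume (n = 10 depots)
n, s = 10, 0.667603
alpha_hat, beta_hat = 2.982, 0.33174
se_alpha, se_beta = 1.646, 0.03182
t_crit = 2.306  # t with n - 2 = 8 degrees of freedom, two-sided .05

# Assumption: recover Sxx = (n - 1) s_X^2 and the mean volume from
#   SE(beta)  = s / sqrt(Sxx)
#   SE(alpha) = s * sqrt(1/n + xbar^2 / Sxx)
sxx = (s / se_beta) ** 2
xbar = math.sqrt(((se_alpha / s) ** 2 - 1 / n) * sxx)


def interval(x: float, prediction: bool) -> tuple[float, float]:
    """CI for the mean response, or PI for the next response if prediction=True."""
    leverage = 1 / n + (x - xbar) ** 2 / sxx
    half_width = t_crit * s * math.sqrt((1 if prediction else 0) + leverage)
    centre = alpha_hat + beta_hat * x
    return centre - half_width, centre + half_width


for x in (40, 51):
    low, high = interval(x, prediction=True)
    print(f"Volume £{x},000: prediction interval (£{100 * low:,.0f}, £{100 * high:,.0f})")
```

Because 51 is close to the recovered mean volume (about 51.3), its interval is barely wider than 1 + 1/n alone would give. The interval at 40 is noticeably wider.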
Standard error
• of prediction
• of estimation
Ref: "The Standard Error of Prediction"
Extra Notes folder in mstuart/get or Diploma webpage
The shelf life of packaged foods depends on many factors. Dry cereal (such as corn flakes) is considered to be a moisture-sensitive product, with the shelf life determined primarily by moisture. In a study of the shelf life of one brand of cereal, packets of cereal were stored in controlled conditions (23°C and 50% relative humidity) for a range of times, and moisture content was measured. The results were as follows.
Draw a scatter diagram. Comment. What action is suggested? Why?
Following appropriate action, the following regression was computed.
The regression equation is
Moisture Content = 2.86 + 0.0417 Storage Time

Predictor      Coef       SE Coef    T        P
Constant       2.86122    0.02488    115.01   0.000
Storage Time   0.041660   0.001177   35.40    0.000
S = 0.0493475
Calculate a 95% confidence interval for the daily change in moisture content; show details.
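A sketch of the calculation with the ±2 SE form of the interval follows. The number of packets is not in the surviving text, so the exact small-sample multiplier $t_{n-2,.05}$ cannot be filled in here. It would replace the 2 below. Storage Time is assumed to be in days, as the wording "daily change" implies.

```python
beta_hat = 0.041660  # Storage Time coefficient (assumed: moisture change per day)
se_beta = 0.001177   # its standard error

low, high = beta_hat - 2 * se_beta, beta_hat + 2 * se_beta
print(f"95% CI for daily change: ({low:.5f}, {high:.5f})")
```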
Was the action you suggested on studying the scatter diagram in part (a) justified? Explain.
Predict the moisture content of a packet of cereal stored under these conditions for 3 weeks; calculate a prediction interval.
What would be the effect on your interval of not taking the action you suggested on studying the scatter diagram? Why?
Taste tests indicate that this brand of cereal is unacceptably soggy when the moisture content exceeds 4. Based on your prediction interval, do you think that a box of cereal that has been on the shelf for 3 weeks will be acceptable? Explain.
What about 4 weeks? 5 weeks? What is acceptable?
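The shelf-life questions can be explored with a sketch under stated assumptions. Storage Time is taken to be in days, and the simple ±2S prediction interval is used rather than the more precise formula, which needs the mean and spread of the storage times (not reproduced here). If 3 to 5 weeks lies beyond the storage times in the study, these intervals are extrapolations and should be treated with extra caution.

```python
INTERCEPT, SLOPE, S = 2.86122, 0.041660, 0.0493475
SOGGY_LIMIT = 4.0  # moisture content above which taste tests say "soggy"

results = {}
for weeks in (3, 4, 5):
    days = 7 * weeks  # assumption: Storage Time is measured in days
    centre = INTERCEPT + SLOPE * days
    low, high = centre - 2 * S, centre + 2 * S  # rough +/- 2S prediction interval
    results[weeks] = (centre, low, high)
    verdict = "acceptable" if high <= SOGGY_LIMIT else "may exceed the soggy limit"
    print(f"{weeks} weeks ({days} days): {centre:.3f}, interval ({low:.3f}, {high:.3f}): {verdict}")

# Longest storage time whose rough upper prediction limit stays at or below 4
max_days = (SOGGY_LIMIT - 2 * S - INTERCEPT) / SLOPE
print(f"Upper limit reaches {SOGGY_LIMIT} after about {max_days:.1f} days")
```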
Reading
SA Sections 6.4, 6.5
4. Introducing Multiple Linear Regression
• SLR (simple linear regression): explaining variation in Y in terms of variation in X
• MLR (multiple linear regression): explaining variation in Y in terms of variation in several X's
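Least squares extends from one X to several. The sketch below fits Y = a + b1·X1 + ... + bk·Xk by solving the normal equations (X'X)b = X'y with Gaussian elimination in pure Python. The function name `fit_mlr` is hypothetical. The toy numbers are made up purely to check the arithmetic (Y is built as exactly 1 + 2·X1 + 3·X2) and are not data from the lecture. In practice Minitab or a statistics library does this, with better numerical safeguards.

```python
def fit_mlr(xs: list[list[float]], ys: list[float]) -> list[float]:
    """Least-squares coefficients [a, b1, ..., bk] for Y = a + b1*X1 + ... + bk*Xk.

    Solves the normal equations (X'X) b = X'y by Gaussian elimination with
    partial pivoting; adequate for a handful of predictors.
    """
    rows = [[1.0] + list(x) for x in xs]  # prepend intercept column
    p = len(rows[0])
    # Augmented normal-equation matrix [X'X | X'y]
    a = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * y for r, y in zip(rows, ys))] for i in range(p)]
    for col in range(p):
        # Swap in the row with the largest pivot, then eliminate below it
        pivot = max(range(col, p), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, p):
            factor = a[r][col] / a[col][col]
            for c in range(col, p + 1):
                a[r][c] -= factor * a[col][c]
    # Back-substitution
    coef = [0.0] * p
    for i in reversed(range(p)):
        coef[i] = (a[i][p] - sum(a[i][j] * coef[j] for j in range(i + 1, p))) / a[i][i]
    return coef


# Hypothetical toy data (not from the lecture): Y = 1 + 2*X1 + 3*X2 exactly
xs = [[1, 2], [2, 1], [3, 5], [4, 3], [5, 4]]
ys = [1 + 2 * x1 + 3 * x2 for x1, x2 in xs]
coef = fit_mlr(xs, ys)
print([round(c, 6) for c in coef])
```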
Example 1: What determines the taste of mature cheese?
• X1 = Acetic Acid
• X2 = Hydrogen Sulphide
• X3 = Lactic Acid
• Y = Taste Score
Example 2: Explaining crime rates
Variable   Description
M          percentage of males aged 14–24
So         indicator variable for a southern state
Ed         mean years of schooling
Po1        police expenditure in 1960
Po2        police expenditure in 1959
LF         labour force participation rate
M.F        number of males per 1000 females
Pop        state population
NW         number of nonwhites per 1000 people
U1         unemployment rate of urban males 14–24
U2         unemployment rate of urban males 35–39
GDP        gross domestic product per head
Ineq       income inequality
Prob       probability of imprisonment
Time       average time served in state prisons
Crime      rate of crimes in a particular category per head of population
Example 3: Estimating tree volume / timber yield
For a sample of 31 black cherry trees in the Allegheny National Forest, Pennsylvania, measure
• Y = volume (cubic feet)
• X1 = height (feet)
• X2 = diameter (inches), measured 54 inches above ground
Example 4: The Stamp Sales Case Study
The problem
• January 1984, An Post established
• New business plan; sales forecasts required
• Historical sales data available
bring in a consultant!
Example 5: A production prediction problem
• The problem
• The data
• Initial data analysis
– dotplots
– lineplots (time series plots)
– scatterplot matrix
• Model fitting / estimation
• Model criticism
• Application
Erie Metal Products: The problem
Metal products fabrication:
customers order varying quantities of products of varying complexity;
customers demand accurate and precise order delivery times.
Stephan Clark Metal Products
A specially designed cabinet; rear view
Stephan Clark Metal Products
Instrument casing; another view
Stephan Clark Metal Products
Instrument casing, oblique view; lockers
Stephan Clark Metal Products
• "One customer is an international manufacturer of petrochemical equipment."
• "Stephen Clark supplies painted metalwork components, panels and fabrications, which are used throughout the customer's product range."
• "Stephen Clark plays an important part in them being able to cope with frequent scheduling changes."
• "Through careful program management, we are able to offer excellent flexibility of supply, delivering finished product against weekly call-offs."
Table 8.1 Times, in hours, to complete jobs with varying numbers of units, numbers of operations per unit and priority status (normal or rushed)
Order number   Job time (hours)   Units   Operations per unit   Normal (0) or Rushed (1)?
1              153                100     6                     0
the price of a stamp for a standard sealed letter for internal delivery, and divided by 1,000,000.
Trend projection?
[Figure: time series of Stamp Sales, Meter Sales and Total Sales against Year, from 1950]
Factors influencing sales
• Economic growth
• Stamp prices
• Alternative product prices
measurement problems!
Project: develop a sales forecasting system for An Post
Terms of reference
1. Identify and collect the relevant macro-economic data.
2. Establish a data base containing the data needed for model building;
3. Identify, estimate and check a dynamic regression model suitable for the purposes outlined below:
(a) medium-term (one to five years) forecasting of aggregate demand for postal services;
(b) analysis of the effects of levels of general economic activity, postal prices and the prices of competing services, on aggregate demand for postal services;
(c) use as a benchmark for the analysis of the effects of demand stimulation activities.