Preview of Chapters 10 and 11, Tuesday 30 September 2008

Preview of Chapters 10 and 11, Tuesday 30 September 2008. Chapter 10 Correlation and Regression 1. Correlation 2. Regression 3. Variation and Prediction Intervals.

Mar 31, 2015

Meadow Scarr
Transcript
Page 1

Preview of Chapters 10 and 11

Tuesday 30 September 2008

Page 2

Chapter 10 Correlation and Regression

1. Correlation

2. Regression

3. Variation and Prediction Intervals

4. Rank Correlation

Page 3

1. Correlation

• Relationship between two measured variables in a data set, at the interval or ratio level

• In this book: linear relationships only

• Pay attention to the requirements!

• Measure: Pearson product-moment correlation, r (sample) or rho (population)

• No correlation: r = 0; maximum correlation: r = -1 or +1

• Critical values: Table A-6

Page 4

Scatterplots of Paired Data

Figure 10-2

Page 5

Scatterplots of Paired Data

Figure 10-2

Page 6

Formula 10-1

$$r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{n(\sum x^2) - (\sum x)^2}\,\sqrt{n(\sum y^2) - (\sum y)^2}}$$

The linear correlation coefficient r measures the strength of a linear relationship between the paired values in a sample.

Calculators can compute r
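As a rough illustration of Formula 10-1, the sums in the formula can be computed directly. This is a minimal Python sketch, not the textbook's procedure; the function name `pearson_r` and the data values are made up for illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Linear correlation coefficient r, computed term by term as in Formula 10-1."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long lists with at least 2 pairs")
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator

# Hypothetical paired data, invented for this example only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(round(r, 3))  # 0.775
```

The resulting r would then be compared with the critical value from Table A-6 for the given n.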

Page 7

Figure 10-3

Hypothesis Test for a Linear Correlation

Page 8

2. Regression

• Follows on from correlation

• Calculation of the regression line in the scatterplot: the line that best fits the cloud of points

• Goal: predicting values

Page 9

Regression

The typical equation of a straight line, $y = mx + b$, is expressed in the form $\hat{y} = b_0 + b_1 x$, where $b_0$ is the y-intercept and $b_1$ is the slope.

The regression equation expresses a relationship between x (called the independent variable, predictor variable or explanatory variable) and $\hat{y}$ (called the dependent variable or response variable).

Page 10

Formulas for b0 and b1

Formula 10-2 (slope):

$$b_1 = \frac{n(\sum xy) - (\sum x)(\sum y)}{n(\sum x^2) - (\sum x)^2}$$

Formula 10-3 (y-intercept):

$$b_0 = \bar{y} - b_1 \bar{x}$$

Calculators or computers can compute these values
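Formulas 10-2 and 10-3 translate almost directly into code. The Python sketch below is only an illustration under stated assumptions: the function name `regression_line` and the data are hypothetical and are not the Old Faithful values from Table 10-1.

```python
def regression_line(xs, ys):
    """Return (b0, b1) for the least-squares line y-hat = b0 + b1 * x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long lists with at least 2 pairs")
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    # Formula 10-2: slope
    b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    # Formula 10-3: y-intercept, using the sample means of y and x
    b0 = sum_y / n - b1 * (sum_x / n)
    return b0, b1

# Hypothetical paired data, invented for this example only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = regression_line(x, y)
predicted = b0 + b1 * 6  # predicted y for a new x value of 6
print(round(b0, 3), round(b1, 3), round(predicted, 3))  # 2.2 0.6 5.8
```

Predicting with the line is only sensible when the linear correlation is significant and the new x value lies close to the range of the sample data.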

Page 11

Given the sample data in Table 10-1, find the regression equation.

Example: Old Faithful - cont

Page 12

Procedure for Predicting

Figure 10-7

Page 13

3. Variation and Prediction Intervals

• Follows on from the regression line

• (Ch. 7) Confidence interval = interval estimate of population parameters: proportion, mean, variance

• Here: interval estimate of the predicted value of a variable

Page 14

Key Concept

In this section we proceed to consider a method for constructing a prediction interval, which is an interval estimate of a predicted value of y.

Page 15

Prediction Interval for an Individual y

$$\hat{y} - E < y < \hat{y} + E$$

where

$$E = t_{\alpha/2}\, s_e \sqrt{1 + \frac{1}{n} + \frac{n(x_0 - \bar{x})^2}{n(\sum x^2) - (\sum x)^2}}$$

$x_0$ represents the given value of x

$t_{\alpha/2}$ has n - 2 degrees of freedom
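The margin E can be sketched in Python once the standard error of estimate and the critical t value are known. In this illustration the function name `prediction_margin`, the x values and the value of `s_e` are hypothetical; the critical value must be looked up in a t table with n - 2 degrees of freedom (3.182 is the two-tailed 0.05 value for 3 degrees of freedom).

```python
from math import sqrt

def prediction_margin(xs, x0, s_e, t_crit):
    """Margin E of the prediction interval for an individual y at x = x0."""
    n = len(xs)
    sum_x = sum(xs)
    sum_x2 = sum(x * x for x in xs)
    x_bar = sum_x / n
    inside = 1 + 1 / n + n * (x0 - x_bar) ** 2 / (n * sum_x2 - sum_x ** 2)
    return t_crit * s_e * sqrt(inside)

# Hypothetical inputs, invented for this example only.
x = [1, 2, 3, 4, 5]
s_e = 0.894      # assumed standard error of estimate
t_crit = 3.182   # t table value: two tails, alpha = 0.05, df = n - 2 = 3
y_hat = 5.8      # assumed predicted value at x0 = 6
E = prediction_margin(x, 6, s_e, t_crit)
print(y_hat - E, y_hat + E)  # lower and upper limits of the interval
```

Because of the term with $(x_0 - \bar{x})^2$, the interval becomes wider as the given x value moves away from the sample mean.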

Page 16

Standard Error of Estimate

Definition

The standard error of estimate, denoted by $s_e$, is a measure of the differences (or distances) between the observed sample y-values and the predicted values $\hat{y}$ that are obtained using the regression equation.
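The slide gives the definition in words only. A minimal Python sketch, assuming the usual formula $s_e = \sqrt{\sum (y - \hat{y})^2 / (n - 2)}$, could look as follows; the function name, the data and the coefficients are hypothetical.

```python
from math import sqrt

def standard_error_of_estimate(xs, ys, b0, b1):
    """s_e: typical distance between observed y-values and predicted y-hat values."""
    n = len(xs)
    if n <= 2:
        raise ValueError("need more than 2 pairs (n - 2 degrees of freedom)")
    # Sum of squared residuals (observed y minus predicted y-hat)
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    return sqrt(sse / (n - 2))

# Hypothetical data and regression coefficients, invented for this example only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
s_e = standard_error_of_estimate(x, y, b0=2.2, b1=0.6)
print(round(s_e, 3))  # 0.894
```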

Page 17

4. Rank Correlation

• Nonparametric method = distribution-free test = no assumptions about the distribution in the population

• Test of association between two variables

• Spearman's: $r_s$ (sample) or, for the population, $\rho_s$

• Procedure in Fig. 10.10 (p. 537)
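One common way to obtain Spearman's $r_s$ is to replace the values by their ranks (ties receive the mean of their ranks) and then compute the Pearson correlation of the ranks. The Python sketch below follows that idea; it is not the procedure of Fig. 10.10 itself, and the function names and data are hypothetical.

```python
from math import sqrt

def average_ranks(values):
    """Rank values 1..n; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rs(xs, ys):
    """Spearman rank correlation: Pearson correlation applied to the ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)

# Hypothetical paired data, invented for this example only.
rs = spearman_rs([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
print(round(rs, 3))  # 0.8
```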

Page 18

Example

Page 19

Chapter 11 Multinomial Experiments and Contingency Tables

1. Goodness-of-fit: multinomial

2. Contingency tables

3. Analysis of variance (ANOVA)

Page 20

Overview

We focus on analysis of categorical (qualitative or attribute) data that can be separated into different categories (often called cells).

Use the $\chi^2$ (chi-square) test statistic (Table A-4).

The goodness-of-fit test uses a one-way frequency table (single row or column).

The contingency table uses a two-way frequency table (two or more rows and columns).

Page 21

1. Goodness-of-fit: multinomial

• Does the observed distribution of a nominal variable agree with an expected distribution?

• H0: p1 = x, p2 = y, p3 = z, p4 = etc.

• H1: At least one of the observed proportions differs from its expected probability.

Page 22

Goodness-of-Fit Test in Multinomial Experiments

Test Statistic

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

Critical Values

1. Found in Table A-4 using k - 1 degrees of freedom, where k = number of categories.

2. Goodness-of-fit hypothesis tests are always right-tailed.
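The goodness-of-fit statistic is a short sum over the categories. The following Python sketch is an illustration only: the function name `chi_square_gof` and the observed counts are hypothetical (they are not the digits of Table 11-2), and equal expected probabilities are assumed unless others are supplied.

```python
def chi_square_gof(observed, expected_probs=None):
    """Return (chi-square statistic, degrees of freedom) for a one-way table."""
    n = sum(observed)
    k = len(observed)
    if expected_probs is None:
        # Default claim: all k categories are equally likely
        expected_probs = [1 / k] * k
    expected = [n * p for p in expected_probs]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi2, k - 1

# Hypothetical observed counts for 5 categories, invented for this example only.
chi2, df = chi_square_gof([18, 22, 20, 25, 15])
critical = 9.488  # chi-square table value for alpha = 0.05 and df = 4
print(round(chi2, 3), df, chi2 > critical)  # 2.9 4 False
```

Because the test is right-tailed, H0 is rejected only when the statistic exceeds the critical value.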

Page 23

Example: Last Digit Analysis

Test the claim that the digits in Table 11-2 do not occur with the same frequency.

Page 24

Relationships Among the $\chi^2$ Test Statistic, P-Value, and Goodness-of-Fit

Figure 11-3

Page 25

2. Contingency tables

• In this section we consider contingency tables (or two-way frequency tables), which include frequency counts for categorical data arranged in a table with at least two rows and at least two columns.

• We present a method for testing the claim that the row and column variables are independent of each other.

• We will use the same method for a test of homogeneity, whereby we test the claim that different populations have the same proportion of some characteristics.

Page 26

Case-Control Study of Motorcycle Drivers

                             Black   White   Yellow/Orange   Row Totals
Controls (not injured)         491     377              31          899
Cases (injured or killed)      213     112               8          333
Column Totals                  704     489              39         1232

$$E = \frac{(\text{row total})(\text{column total})}{(\text{grand total})}$$

For the upper left-hand cell:

$$E = \frac{(899)(704)}{1232} = 513.714$$

Page 27

Case-Control Study of Motorcycle Drivers

Observed frequencies, with expected frequencies in parentheses:

                             Black            White            Yellow/Orange   Row Totals
Controls (not injured)       491 (513.714)    377 (356.827)    31 (28.459)            899
Cases (injured or killed)    213 (190.286)    112 (132.173)     8 (10.541)            333
Column Totals                704              489              39                    1232

$$\chi^2 = \sum \frac{(O - E)^2}{E} = \frac{(491 - 513.714)^2}{513.714} + \cdots + \frac{(8 - 10.541)^2}{10.541} = 8.775$$

Page 28

Case-Control Study of Motorcycle Drivers

H0: Row and column variables are independent.

H1: Row and column variables are dependent.

The test statistic is $\chi^2 = 8.775$

$\alpha = 0.05$

The number of degrees of freedom is

(r - 1)(c - 1) = (2 - 1)(3 - 1) = 2.

The critical value (from Table A-4) is $\chi^2_{0.05,\,2} = 5.991$.
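The expected frequencies, the test statistic and the degrees of freedom for this case-control table can be reproduced in a few lines. The Python sketch below uses the observed counts from the slides; the function name `chi_square_independence` is an assumption made for illustration, and 5.991 is the Table A-4 critical value quoted above.

```python
def chi_square_independence(table):
    """Return (expected table, chi-square statistic, degrees of freedom)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    # E = (row total)(column total) / (grand total) for every cell
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return expected, chi2, df

# Observed counts: helmet color (Black, White, Yellow/Orange)
observed = [
    [491, 377, 31],  # Controls (not injured)
    [213, 112, 8],   # Cases (injured or killed)
]
expected, chi2, df = chi_square_independence(observed)
print(round(expected[0][0], 3))  # 513.714
print(round(chi2, 3), df)        # 8.775 2
print(chi2 > 5.991)              # True: the statistic exceeds the critical value
```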

Page 29

We reject the null hypothesis. It appears there is an association between helmet color and motorcycle safety.

Case-Control Study of Motorcycle Drivers

Figure 11-4

Page 30

3. Analysis of variance (ANOVA)

• ANalysis Of VAriance

• H0: several population means are equal

• F distribution (Table A7)

• Test based on the P-value
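The slide names only the ingredients of the test. As a hedged illustration, the F statistic of a one-way ANOVA is the variance between the group means divided by the variance within the groups. The function name and the three samples in this Python sketch are hypothetical, and the P-value itself still has to come from an F table or statistical software.

```python
def one_way_anova_f(groups):
    """Return (F statistic, df between groups, df within groups)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of the individual values around their own group mean
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical samples from three populations, invented for this example only.
f_stat, df_b, df_w = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(round(f_stat, 3), df_b, df_w)  # 13.0 2 6
```

A large F (small P-value) is evidence against the claim that all population means are equal.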

Page 31
Page 32
Page 33

FINALLY: Bayesian statistics

• Texts and 2 assignments (will be handed out)

• 1. Intuitive approach

• 2. Formal approach

Page 34

Example problem

• Given: in Orange County, USA, 51% of the population is male; 9.5% of the men smoke cigars, compared with 1.7% of the women

• Question: what is the probability that a randomly selected cigar smoker is a man?

Page 35

1. Intuitive approach

Page 36

2. Formal approach
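Both approaches to the cigar-smoker problem can be sketched in Python. This is an illustration, not the handout's worked solution: the function name and the population size of 100,000 are assumptions, and everyone who is not male is treated as female (49%).

```python
def posterior_man_given_cigar(p_man, p_cigar_given_man, p_cigar_given_woman):
    """Formal approach: Bayes' theorem for P(man | cigar smoker)."""
    p_woman = 1 - p_man
    joint_man = p_man * p_cigar_given_man        # P(man and cigar smoker)
    joint_woman = p_woman * p_cigar_given_woman  # P(woman and cigar smoker)
    return joint_man / (joint_man + joint_woman)

formal = posterior_man_given_cigar(0.51, 0.095, 0.017)

# Intuitive approach: count people in a hypothetical population of 100,000.
population = 100_000
men = population * 0.51              # about 51,000 men
women = population - men             # about 49,000 women
men_smokers = men * 0.095            # about 4,845 cigar-smoking men
women_smokers = women * 0.017        # about 833 cigar-smoking women
intuitive = men_smokers / (men_smokers + women_smokers)

print(round(formal, 3), round(intuitive, 3))  # 0.853 0.853
```

Both routes give a probability of about 0.853 that a randomly selected cigar smoker is a man.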

Page 37

End of preview

• Next week (week 6):
– Question hour
– No new material
– Preparation for the practice exam