EARTH SC \ ENVIR SC \ GEOG 3MB3 STATISTICAL ANALYSIS SECTION 4 INFERENTIAL STATISTICS (cont’d)
Nov 27, 2015
Two-Sample Difference of Means Tests
We may want to form hypotheses comparing two populations: does a significant difference exist between them?
Examples: Two similar cars are introduced at the same time and at the same price. After 5 years, have the two cars’ values depreciated by the same amount?
In China, is there a significant difference between the number of children born to women in coastal regions (heavy policing of the one-child policy) and in inland regions (weak policing of the one-child policy)?
Slight alterations of the one-sample difference of means Z and t tests allow us to compare two populations.
Two-Sample Difference of Means Z Test
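$Z = \frac{E(X_1) - E(X_2)}{\sigma_{E(X_1) - E(X_2)}}$, where $\sigma_{E(X_1) - E(X_2)} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$
Where:
• E(X1) is the mean of sample 1
• E(X2) is the mean of sample 2
• σ1² is the variance of the population from which sample 1 is drawn
• σ2² is the variance of the population from which sample 2 is drawn
• n1 is the size of sample 1
• n2 is the size of sample 2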
Like the one-sample Z-test, we use the two-sample difference of means Z-test when both n1 and n2 ≥ 30.
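As a minimal sketch (not part of the original course material), the two-sample Z statistic, Z = [E(X1) − E(X2)] / √(σ1²/n1 + σ2²/n2), can be computed directly. The function name `two_sample_z` and the input values are hypothetical, and the population variances are assumed to be known.

```python
import math


def two_sample_z(mean1, mean2, var1, var2, n1, n2):
    """Two-sample difference of means Z statistic.

    var1 and var2 are the known population variances; n1 and n2 are
    the sample sizes (both assumed to be at least 30).
    """
    # Standard error of the difference between the two sample means
    se = math.sqrt(var1 / n1 + var2 / n2)
    return (mean1 - mean2) / se


# Hypothetical inputs chosen so that the standard error is sqrt(2)
z = two_sample_z(mean1=52.0, mean2=50.0, var1=36.0, var2=49.0, n1=36, n2=49)
print(f"Z = {z:.3f}")
```

The resulting Z is then compared with the normal table exactly as in the one-sample case.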
Two-Sample Difference of Means t Test
In most cases we do not know the population variances, so we estimate them from the sample variances (s²) using the two-sample difference of means t test:
$t = \frac{E(X_1) - E(X_2)}{\hat{\sigma}_{E(X_1) - E(X_2)}}$
Though the formulas for the Z and t tests look the same, the denominator of the t test is estimated from the sample variances. There are two ways to do this:
1. Assume the population variances are equal (σ1² = σ2²) and calculate a weighted average of the two sample variances, called a pooled variance estimate (PVE).
2. Assume the population variances are unequal (σ1² ≠ σ2²) and substitute the sample variances directly for the population variances, called a separate variance estimate (SVE).
Two-Sample Difference of Means t Test (cont’d)
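Pooled Variance Estimate (σ1² = σ2²)
$\hat{\sigma}_{E(X_1) - E(X_2)} = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}} \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$
Separate Variance Estimate (σ1² ≠ σ2²)
$\hat{\sigma}_{E(X_1) - E(X_2)} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$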
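The two standard error estimates can be sketched in Python as below; the function names `pooled_se` and `separate_se` are illustrative, not from the course. When n1 = n2 the two estimates coincide, so the choice between them matters most when the sample sizes differ.

```python
import math


def pooled_se(s1_sq, s2_sq, n1, n2):
    """Standard error of E(X1) - E(X2) using a pooled variance estimate (PVE)."""
    # Weighted average of the two sample variances, weighted by n - 1
    pooled_var = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)
    return math.sqrt(pooled_var) * math.sqrt(1 / n1 + 1 / n2)


def separate_se(s1_sq, s2_sq, n1, n2):
    """Standard error of E(X1) - E(X2) using a separate variance estimate (SVE)."""
    # Sample variances substituted directly for the population variances
    return math.sqrt(s1_sq / n1 + s2_sq / n2)


# Equal sample sizes: both estimates reduce to sqrt((4 + 9) / 10)
print(pooled_se(4.0, 9.0, 10, 10), separate_se(4.0, 9.0, 10, 10))
```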
Two-Sample Difference of Means t Test Example
A researcher calculated the mean house prices in Dundas and Ancaster from a record of 2012 housing sales:
Ancaster: mean = $462,579, n = 23, s = 35,000, s² = 900,000,000
Dundas: mean = $455,891, n = 17, s = 15,000, s² = 225,000,000
Is the mean house price in Ancaster significantly higher than in Dundas?
H0 : µA = µD
HA : µA > µD
Two-Sample Difference of Means t Test Example (cont’d)
We cannot use the Z test because we do not know the variances of the populations the two samples were taken from (we were only given the sample variances), and both samples are smaller than 30. We assume the population variances are unequal, so we use the separate variance estimate.
Two-Sample Difference of Means t Test Example (cont’d)
The t-value of 0.94, according to the t table, corresponds to A = 0.3159.
Therefore, the p-value is:
p-value = 0.5000 – 0.3159 = 0.1841
This is a relatively high p-value. We cannot reject H0 at either α = 0.10 or α = 0.05.
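As an arithmetic check, the sketch below recomputes the separate variance t statistic from the Ancaster and Dundas summary figures. The listed Ancaster values do not agree with each other (35,000² = 1,225,000,000, not 900,000,000); the sketch assumes the listed s² = 900,000,000 is the intended variance.

```python
import math

# Summary figures from the Ancaster/Dundas example (2012 sales).
# Assumption: Ancaster's variance is the listed s² = 900,000,000.
mean_a, n_a, var_a = 462_579, 23, 900_000_000
mean_d, n_d, var_d = 455_891, 17, 225_000_000

# Separate variance estimate (population variances assumed unequal)
se = math.sqrt(var_a / n_a + var_d / n_d)
t = (mean_a - mean_d) / se
print(f"difference = {mean_a - mean_d}, SE = {se:.1f}, t = {t:.3f}")
```

This gives t ≈ 0.92 rather than 0.94 (and about 0.82 if s = 35,000 is used instead), so the reported value may reflect rounding or a different input. In every case the one-tailed p-value stays well above 0.10, so the conclusion does not change.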
Two-Sample Difference of Proportions Test
Used to compare two sample proportions for a difference. Assumption: the variable being considered is binary (i.e., there are only two types of observation, such as yes/no or male/female).
$Z_p = \frac{p_1 - p_2}{\sigma_{p_1 - p_2}}$, where $\sigma_{p_1 - p_2} = \sqrt{\hat{p}(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$ and $\hat{p} = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2}$
Where:
• p1 = proportion of sample 1 in the category of focus
• p2 = proportion of sample 2 in the category of focus
• p̂ = pooled estimate of the focus category proportion for the population
We define the focus category as one of the two possible responses. Ex. Proportions in Sample 1: Yes – 0.86; No – 0.14. If we choose “Yes” as the focus category, we use 0.86 for calculations.
Two-Sample Difference of Proportions Example
A sample was taken from a county regarding proposed legislation. Participants were divided into two categories: rural and urban. We want to know if there is a significant difference of opinion on the legislation between rural and urban citizens.
Category   Sample Size (n)   Proportion in Favour   Proportion Against
Urban      79                0.63                   0.37
Rural      44                0.59                   0.41
H0 : purban = prural
HA : purban ≠ prural
Substitute the pooled estimate into the equation for the standard error of the difference.
Then put that expression into the Zp equation.
Two-Sample Difference of Proportions Example (cont’d)
Zp = -0.43714, which corresponds to A = 0.1700
p-value = [0.5000 – 0.1700] × 2 = 0.3300 × 2 = 0.6600
We multiply by 2 because we have a non-directional HA.
This p-value of 0.6600 is very large. We cannot reject the null hypothesis that there is no difference between urban and rural opinions on the new legislation.
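A minimal sketch of the difference of proportions test, assuming the pooled estimate p̂ = (n1p1 + n2p2)/(n1 + n2) and a standard normal reference distribution. The helper names `two_proportion_z` and `normal_cdf` are hypothetical; `normal_cdf` uses `math.erf` in place of a printed normal table.

```python
import math


def two_proportion_z(p1, n1, p2, n2):
    """Two-sample difference of proportions Z statistic using a pooled estimate."""
    # Pooled estimate of the focus-category proportion for the population
    p_hat = (n1 * p1 + n2 * p2) / (n1 + n2)
    # Standard error of the difference between the two sample proportions
    se = math.sqrt(p_hat * (1 - p_hat) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se


def normal_cdf(z):
    """Standard normal cumulative probability via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


# Urban (sample 1) vs rural (sample 2), with "in favour" as the focus category
z = two_proportion_z(0.63, 79, 0.59, 44)
p_value = 2 * (1 - normal_cdf(abs(z)))  # non-directional HA, so both tails count
print(f"Zp = {z:.5f}, p-value = {p_value:.4f}")
```

With urban as sample 1 this returns Zp ≈ +0.437; a negative value simply reflects subtracting in the opposite order, which does not affect a two-tailed test. The exact p-value is about 0.662, slightly different from 0.6600 because the table lookup rounds Zp to 0.44.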
Matched-Pairs Tests
Matched-pairs tests are used to analyze dependent samples.
Dependent samples: samples that are related, so that the results of one sample give information about the other sample.
Examples:
1. Two measurements of each participant’s non-commute driving distance, taken before and after an oil crisis. Did driving distances decrease?
2. A random sample of men and women from the same Mexican villages, used to determine the average male and female life expectancy in these villages. Do male and female life expectancies differ in these villages?
Each sample observation has two values, which together form a matched pair:
In the first example, matched-pairs would be formed from each participant’s before and after distances.
In the second, the life expectancies of men and women from the same village are dependent and constitute a matched-pair. This is because people in the same village are affected by the same social, economic, and environmental factors.
Matched-Pairs Tests (cont’d)
Matched-Pairs t Test
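A parametric test that compares matched pairs using the mean of their differences
$t = \frac{E(d)}{\sigma_d}$, where $\sigma_d = \frac{s_d}{\sqrt{n}}$ and $s_d = \sqrt{\frac{\sum [d_i - E(d)]^2}{n - 1}}$
Where:
• di = difference of matched pair i
• E(d) = mean of the matched-pair differences
• σd = standard error of the mean matched-pair difference
• sd = standard deviation of the matched-pair differences
• n = number of matched pairs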
Matched-Pairs t Test Example
Let’s perform a matched-pairs t test on the crop yield data
The average difference, E(d), is 1.5333, and the term Σ[di − E(d)]² is 177.73422 (the full calculations take up too much space to show here).
Matched-Pairs t Test Example (cont’d)
The calculated t statistic is 1.67, which corresponds to A = 0.4444.
p = 0.5000 – 0.4444
p = 0.0556
This is a relatively low p-value. We can reject the null hypothesis at α = 0.10, but not at α = 0.05 or α = 0.01.
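The matched-pairs t statistic can be reproduced with a short sketch. It assumes the “crop yield data” are the 2011 and 2012 yields tabulated in the Wilcoxon matched-pairs example later in this section; with those 15 pairs it gives E(d) = 1.5333 and t ≈ 1.67. The function name `matched_pairs_t` is hypothetical.

```python
import math

# Assumed data: crop yields (2011, 2012) for the 15 pairs in the Wilcoxon example
yield_2011 = [92, 91, 84, 86, 87, 90, 92, 91, 97, 102, 107, 102, 89, 90, 92]
yield_2012 = [87, 90, 78, 86, 89, 87, 93, 94, 92, 95, 101, 101, 93, 91, 92]


def matched_pairs_t(x, y):
    d = [a - b for a, b in zip(x, y)]            # d_i = X_i - Y_i
    n = len(d)
    mean_d = sum(d) / n                          # E(d)
    ss = sum((di - mean_d) ** 2 for di in d)     # sum of [d_i - E(d)]^2
    s_d = math.sqrt(ss / (n - 1))                # standard deviation of differences
    se_d = s_d / math.sqrt(n)                    # standard error of E(d)
    return mean_d, ss, mean_d / se_d


mean_d, ss, t = matched_pairs_t(yield_2011, yield_2012)
print(f"E(d) = {mean_d:.4f}, sum of squares = {ss:.4f}, t = {t:.2f}")
```

The sum of squared deviations comes out at about 177.733, which agrees with the value above to two decimal places; the statistic has n − 1 = 14 degrees of freedom.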
Parametric and Nonparametric Tests
Up to this point, we have made assumptions about the populations we have tested for differences in means and proportions:
• Population parameters (μ, ρ, and σ)
• Populations are normally distributed with mean μ and standard deviation σ
Parametric Tests
Tests that require knowledge of population parameters and make certain assumptions about the population’s distribution. They can only be used with interval/ratio scale data.
Parametric and Nonparametric Tests (cont’d)
Nonparametric Tests
Tests that require no knowledge of population parameters and make few assumptions about the population’s distribution. They can only be used with data in ordinal form.
Sometimes data are only available in ordinal form; at other times, we choose to downgrade interval/ratio data to ordinal data so that we can use nonparametric tests.
We also use nonparametric tests for non-normally distributed data, as in the following example.
Non-normally Distributed Data
Turbidity, the measure of haziness or cloudiness of a fluid caused by suspended solids, is a key test of water quality. Turbidity values are generally very high upstream and drop off downstream.
[Figure 1: Water Quality Along a Stream. Turbidity versus distance downstream from an arbitrarily chosen starting point (km), with a non-normal distribution and a normal distribution fitted to the data.]
As the figure shows, the non-normal distribution (green) fits the data better than the normal distribution (red), so we should use a nonparametric test to analyze these data.
Wilcoxon Rank Sum W Test
A nonparametric test of sample mean difference, which only works for ordinal data. It assumes that the two population distributions have the same shape.
Procedure:
1. Combine the results of the two samples and rank them, assigning the lowest value a rank of 1.
2. If there is a tie, assign each tied value the average of the ranks they span (e.g., if the values ranked 7 through 11 all equal 34.3, assign each of them a rank of 9).
3. Put the ranked values back into their original samples.
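$Z_W = \frac{W - E(W)}{\sigma_W}$, where $E(W) = \frac{n_1(n_1 + n_2 + 1)}{2}$ and $\sigma_W = \sqrt{\frac{n_1 n_2 (n_1 + n_2 + 1)}{12}}$
Where:
• W = sum of ranks for the smaller sample
• E(W) = expected value of W under H0 (the mean rank sum of the smaller sample)
• σW = standard deviation of W
• n1 = size of the smaller sample; n2 = size of the other sample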
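The three-step ranking procedure can be sketched as below. The function name `average_ranks` and the data values are hypothetical, chosen so that five values equal to 34.3 occupy ranks 7 through 11 and therefore each receive rank 9, as in the tie rule above.

```python
def average_ranks(values):
    """Rank values from lowest (rank 1) to highest; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        # Extend `end` over the run of values tied with values[order[start]]
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        # Sorted positions start..end hold ranks start+1..end+1; use their average
        avg_rank = (start + 1 + end + 1) / 2
        for k in range(start, end + 1):
            ranks[order[k]] = avg_rank
        start = end + 1
    return ranks


# Hypothetical samples: the five values of 34.3 occupy ranks 7 to 11
sample_1 = [3.1, 34.3, 8.0, 34.3, 40.2]
sample_2 = [1.5, 34.3, 5.5, 34.3, 2.2, 34.3, 6.7]

combined = sample_1 + sample_2          # Step 1: combine both samples before ranking
ranks = average_ranks(combined)         # Step 2: ties receive the average rank (here 9)
ranks_1 = ranks[:len(sample_1)]         # Step 3: return ranks to their original samples
ranks_2 = ranks[len(sample_1):]
print(ranks_1, sum(ranks_1))
print(ranks_2, sum(ranks_2))
```

Whatever the ties, the ranks of the combined samples always sum to N(N + 1)/2 (here 12 × 13 / 2 = 78), which is a quick check on hand calculations.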
Wilcoxon Matched-Pairs Signed-Ranks Test
A nonparametric test comparing matched-pair differences using their absolute differences
Steps:
1. For all pairs, determine the sign of the difference and the absolute value of the difference.
2. Exclude all pairs with an absolute difference of 0.
3. Order the remaining pairs from smallest to largest absolute difference.
4. Rank the pairs, starting with the smallest as 1. Ties receive a rank equal to the average of the ranks they span.
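Steps 1 to 4 can be sketched as a small function that returns Tp, Tn, and the number of non-zero pairs n. The function name `signed_rank_sums` is hypothetical, and the example data are the 2011 and 2012 crop yields from the crop yield example later in this section.

```python
def signed_rank_sums(x, y):
    """Return (Tp, Tn, n) for the Wilcoxon matched-pairs signed-ranks test."""
    # Steps 1-2: signed differences, excluding pairs whose difference is 0
    diffs = [a - b for a, b in zip(x, y) if a != b]
    # Step 3: order the pairs by absolute difference
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    # Step 4: rank from 1 upward; tied absolute differences share their average rank
    ranks = [0.0] * len(diffs)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and abs(diffs[order[end + 1]]) == abs(diffs[order[start]]):
            end += 1
        avg_rank = (start + end + 2) / 2  # average of ranks start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = avg_rank
        start = end + 1
    t_pos = sum(r for r, d in zip(ranks, diffs) if d > 0)   # Tp
    t_neg = sum(r for r, d in zip(ranks, diffs) if d < 0)   # Tn
    return t_pos, t_neg, len(diffs)


# Crop yields (2011 = X, 2012 = Y) from the crop yield example
yield_2011 = [92, 91, 84, 86, 87, 90, 92, 91, 97, 102, 107, 102, 89, 90, 92]
yield_2012 = [87, 90, 78, 86, 89, 87, 93, 94, 92, 95, 101, 101, 93, 91, 92]
tp, tn, n = signed_rank_sums(yield_2011, yield_2012)
print(f"n = {n}, Tp = {tp}, Tn = {tn}")
```

On the crop yield data this gives n = 13, Tp = 66.5 and Tn = 24.5. The four pairs with |X − Y| = 1 span ranks 1 to 4, so under step 4 each receives rank 2.5; a useful check is that Tp + Tn must equal n(n + 1)/2 = 91.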
Wilcoxon Matched-Pairs Signed-Ranks Test (cont’d)
$Z = \frac{T - \frac{n(n + 1)}{4}}{\sqrt{\frac{n(n + 1)(2n + 1)}{24}}}$
Where:
• n = number of matched pairs with a non-zero difference; must be > 10
• T = rank sum
There are two possible values for T: Tp (rank sum for positive differences) and Tn (rank sum for negative differences). Which to use depends on how HA is stated.
If non-directional, we choose the smaller of Tp and Tn
If directional, we choose the rank sum for the sign that HA predicts will occur less often (i.e., if more differences are expected to be positive, we choose Tn).
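The normal approximation for T, with mean n(n + 1)/4 and standard deviation √[n(n + 1)(2n + 1)/24], can be sketched as below. This is assumed to be the formula behind the crop yield example: with n = 13 and T = 23.5 it gives Z ≈ −1.54, the value reported there. The function name `signed_rank_z` is hypothetical.

```python
import math


def signed_rank_z(t, n):
    """Normal approximation Z for the signed-ranks statistic T (n > 10 non-zero pairs)."""
    mean_t = n * (n + 1) / 4                           # E(T) under H0
    sd_t = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)   # standard deviation of T
    return (t - mean_t) / sd_t


print(f"{signed_rank_z(23.5, 13):.4f}")
print(f"{signed_rank_z(24.5, 13):.4f}")
```

With T = 24.5 (the rank sum obtained when the four pairs tied at |X − Y| = 1 each receive rank 2.5) the same function gives Z ≈ −1.47.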
Wilcoxon Rank Sum W Test Example
The Canadian government decides to present Canada’s gross exports for 2013 by dividing the country into 20 geographic regions. Instead of providing exact dollar values for each region, they rank them. Researchers are interested in determining whether there is an appreciable difference in gross exports between Eastern and Western Canada.
Regions A–K are classified as “Eastern” and regions L–T as “Western”.
H0 : ∑RE = ∑RW
HA : ∑RE ≠ ∑RW
Region   Location   Rank
A        East       11
B        East       12
C        East       3
D        East       16
E        East       6
F        East       14
G        East       2
H        East       5
I        East       13
J        East       20
K        East       4
L        West       10
M        West       9
N        West       17
O        West       1
P        West       18
Q        West       7
R        West       19
S        West       8
T        West       15
Region   Location   Rank
O        West       1
G        East       2
C        East       3
K        East       4
H        East       5
E        East       6
Q        West       7
S        West       8
M        West       9
L        West       10
A        East       11
B        East       12
I        East       13
F        East       14
T        West       15
D        East       16
N        West       17
P        West       18
R        West       19
J        East       20
Wilcoxon Rank Sum W Test Example (cont’d)
We first calculate the sums of the Western and Eastern ranks, respectively.
Sum of Western Ranks
∑RW = 10 + 9 + 17 + 1 + 18 + 7 + 19 + 8 + 15 = 104
Sum of Eastern Ranks
∑RE = 11 + 12 + 16 + 3 + 6 + 14 + 2 + 5 + 20 + 13 + 4 = 106
Wilcoxon Rank Sum W Test Example (cont’d)
** Calculate W using the values from the sample with the smaller sample size: here the nine Western regions, so W = ∑RW = 104.
Wilcoxon Rank Sum W Test Example (cont’d)
The Z-score calculated using the Wilcoxon Rank Sum Test was 0.722
According to the normal table, that Z-score corresponds to A = 0.2642
p-value = 2(0.5000 – 0.2642) = 0.4716
This is a very high p-value, so we cannot reject H0 that there is no appreciable difference in gross exports between the Eastern and Western regions.
H0 : WE = WW
HA : WE ≠ WW
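The rank sum Z-score can be recomputed with the usual normal approximation for W, using E(W) = n1(n1 + n2 + 1)/2 and σW = √[n1n2(n1 + n2 + 1)/12] with n1 the size of the smaller sample. This is assumed to be the formula used here, and it does reproduce Z ≈ 0.722 for the export rankings. The function name `rank_sum_z` is hypothetical.

```python
import math

# Ranks of the nine Western regions (L to T); the Eastern sample has 11 regions
west_ranks = [10, 9, 17, 1, 18, 7, 19, 8, 15]
n_east = 11


def rank_sum_z(small_ranks, n_other):
    """Normal approximation for the Wilcoxon rank sum W of the smaller sample."""
    n1, n2 = len(small_ranks), n_other
    w = sum(small_ranks)                            # W
    expected_w = n1 * (n1 + n2 + 1) / 2             # E(W) under H0
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # standard deviation of W
    return w, expected_w, (w - expected_w) / sd_w


w, expected_w, z = rank_sum_z(west_ranks, n_east)
# Two-tailed p-value from the standard normal distribution (non-directional HA)
p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
print(f"W = {w}, E(W) = {expected_w}, Z = {z:.3f}, p = {p_value:.4f}")
```

The exact two-tailed p-value is about 0.470, close to the table-based value, which rounds Z to 0.72.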
Wilcoxon Matched-Pairs Signed-Ranks Test Example
A farmer recorded his crop yields for two consecutive years, 2011 (Year 1) and 2012 (Year 2). The average rainfall in the growing season of Year 1 was much greater than in Year 2. Is there a significant difference in crop yields between the two years?
i    2011 (X)   2012 (Y)   Sign of X−Y   |X−Y|
1    92         87         pos           5
2    91         90         pos           1
3    84         78         pos           6
4    86         86         N/A           0
5    87         89         neg           2
6    90         87         pos           3
7    92         93         neg           1
8    91         94         neg           3
9    97         92         pos           5
10   102        95         pos           7
11   107        101        pos           6
12   102        101        pos           1
13   89         93         neg           4
14   90         91         neg           1
15   92         92         N/A           0
i    2011 (X)   2012 (Y)   Sign of X−Y   |X−Y|   Rank
2    91         90         pos           1       2.5
7    92         93         neg           1       2.5
14   90         91         neg           1       2.5
12   102        101        pos           1       2.5
5    87         89         neg           2       5
8    91         94         neg           3       6.5
6    90         87         pos           3       6.5
13   89         93         neg           4       8
9    97         92         pos           5       9.5
1    92         87         pos           5       9.5
3    84         78         pos           6       11.5
11   107        101        pos           6       11.5
10   102        95         pos           7       13
** We exclude pairs where X − Y = 0 (pairs 4 and 15). The four pairs with |X − Y| = 1 span ranks 1 to 4, so each receives the average rank, (1 + 2 + 3 + 4)/4 = 2.5.
H0: Yield2011 = Yield2012
HA: Yield2011 > Yield2012
Wilcoxon Matched-Pairs Signed-Ranks Test Example (cont’d)
We now find the sum of ranks for pairs with positive and negative differences.
Sum of Ranks for Positive Differences (Tp)
ΣTp = 2.5 + 2.5 + 6.5 + 9.5 + 9.5 + 11.5 + 11.5 + 13 = 66.5
Sum of Ranks for Negative Differences (Tn)
ΣTn = 2.5 + 2.5 + 5 + 6.5 + 8 = 24.5
As a check, Tp + Tn = 91 = n(n + 1)/2 for n = 13 non-zero pairs.
Since our HA is directional (the 2011 yield is greater than the 2012 yield), we expect more positive than negative differences (the data contain 8 positive and 5 negative differences). We use the rank sum of the group with the smaller number of differences.
Thus, we use the rank sum for negative differences, Tn = 24.5.
Wilcoxon Matched-Pairs Signed-Ranks Test Example (cont’d)
With n = 13, E(T) = 13(14)/4 = 45.5 and σT = √[13(14)(27)/24] ≈ 14.31, so Z = (24.5 − 45.5)/14.31 ≈ −1.47, which corresponds to A = 0.4292.
p = 0.5000 – 0.4292
p = 0.0708
This p-value rejects the null hypothesis at α = 0.10, but fails to reject it at α = 0.05 and α = 0.01.