Non-Parametric Statistics: A Presentation by Rob McMullen for AP Statistics

What are Non-Parametric Statistics?

Non-parametric statistics are a special form of statistics which help statisticians with a problem occurring in parametric statistics. In order to understand what non-parametric statistics are, it is first necessary to know what parametric statistics are.

What are Parametric Statistics?

In AP statistics, when we refer to a distribution we often make certain assumptions about it that enable us to work with it. One thing that helps us with this is the CLT, which allows us to assume that many sampling distributions are approximately normal.

This theorem, the Central Limit Theorem, tells us that for any distribution with a finite mean and variance, the sampling distribution of the sample mean, taken over all samples of a given (sufficiently large) sample size, is approximately normally distributed.
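The Central Limit Theorem can be seen in a small simulation. The sketch below is only an illustration: the exponential population, the sample size of 50, the 2000 repetitions, and the function name `sample_means` are all assumptions chosen for the example, not part of the presentation.

```python
import random
import statistics

def sample_means(n, reps, seed=0):
    """Draw `reps` samples of size `n` from a skewed (exponential) population
    and return the mean of each sample."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    means = []
    for _ in range(reps):
        # Exponential population with mean 1 and standard deviation 1 (strongly skewed).
        sample = [rng.expovariate(1.0) for _ in range(n)]
        means.append(statistics.mean(sample))
    return means

means = sample_means(n=50, reps=2000)
center = statistics.mean(means)    # should sit near the population mean, 1
spread = statistics.stdev(means)   # should sit near 1 / sqrt(50), about 0.141
# For a roughly normal curve, about 68% of values lie within one sd of the center.
within_one_sd = sum(abs(m - center) <= spread for m in means) / len(means)

print(f"mean of sample means: {center:.3f}")
print(f"sd of sample means:   {spread:.3f}")
print(f"share within 1 sd:    {within_one_sd:.3f}")
```

Even though the population is far from normal, the sample means cluster around the population mean in a roughly bell-shaped way.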

When are Parametric Statistics not useful?

When we do significance tests, we rely on the assumption that the sampling distribution of samples taken follows the t-distribution or the z-distribution, depending on the situation. When this assumption is not true, none of our tests, which are called “parametric statistical inference tests,” are reliable. Everything we have done in AP stats has been in the field of “parametric statistics.”

Why does lack of normality cause problems?

When we calculate the p-value for an inference test, we find the probability that a sample would differ this much from what the null hypothesis predicts due to sampling variability alone. Basically, we are trying to see if a recorded value occurred by chance and chance alone. When we look for a p-value, we are assuming that the means of all samples of the given sample size are normally distributed around the population mean. This is why the test statistic, which is the number of standard deviations (of the sampling distribution) the sample mean lies from the population mean, is able to be used. Therefore, without normality, these tests cannot give a reliable p-value.
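To make the role of the normality assumption concrete, here is a minimal sketch of a one-sample z test. The numbers (a sample mean of 52, a hypothesized population mean of 50, a population standard deviation of 6, and a sample size of 36) and the function names are made up for illustration. The p-value step uses the normal curve directly, which is exactly the assumption that can fail.

```python
from math import erfc, sqrt

def z_statistic(sample_mean, pop_mean, pop_sd, n):
    """Number of standard deviations (of the sampling distribution)
    between the sample mean and the hypothesized population mean."""
    return (sample_mean - pop_mean) / (pop_sd / sqrt(n))

def two_sided_p_value(z):
    """Two-sided tail area under the standard normal curve beyond |z|.
    This is only meaningful if the sampling distribution really is normal."""
    return erfc(abs(z) / sqrt(2))

# Hypothetical numbers, chosen only for the example.
z = z_statistic(sample_mean=52.0, pop_mean=50.0, pop_sd=6.0, n=36)
p = two_sided_p_value(z)
print(f"z = {z:.2f}, p = {p:.4f}")
```

If the sampling distribution is not close to normal, the tail area computed by `two_sided_p_value` no longer describes how often such a sample would occur by chance.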

What are Non-Parametric Statistics?

The way in which statisticians deal with this problem of parametric statistics is the field of non-parametric statistics. These are tests that can be done without the assumption of normality, approximate normality, or symmetry. These tests do not require a mean and standard deviation. Since a standard deviation is a poor summary of spread for strongly skewed distributions, it is not useful for many distributions anyway.

What is different about Non-Parametric Statistics?

In non-parametric statistics, one deals with the median rather than the mean. Since a mean can be easily influenced by outliers or skewness, and we are not assuming normality, a mean no longer makes sense. The median is another measure of location, which makes more sense in a non-parametric test. The median is considered the center of a distribution.
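A quick illustration of why the median is preferred when outliers are possible. The two small data sets below are invented for the example; the second differs from the first only in its last value.

```python
from statistics import mean, median

typical = [2, 3, 4, 5, 6]
with_outlier = [2, 3, 4, 5, 60]  # same data, but the largest value is an outlier

# The mean is pulled toward the outlier; the median does not move.
print("mean:  ", mean(typical), "->", mean(with_outlier))
print("median:", median(typical), "->", median(with_outlier))
```

One extreme value changes the mean from 4 to 14.8, while the median stays at 4 in both cases.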

Sometimes statisticians use what is called “ordinal” data. This data is obtained by taking the raw data and giving each observation a rank. These ranks are then used to create test statistics.

Non-parametric tests are similar to the tests covered in AP stats, but each is slightly different. Many parametric tests have a non-parametric counterpart. The following table shows how some of the tests match up.

Parametric Test | Goal for Parametric Test | Non-Parametric Test | Goal for Non-Parametric Test
Two-Sample T-Test | To see if two samples have identical population means | Wilcoxon Rank-Sum Test | To see if two samples have identical population medians
One-Sample T-Test | To test a hypothesis about the mean of the population a sample was taken from | Wilcoxon Signed Ranks Test | To test a hypothesis about the median of the population a sample was taken from
Chi-Squared Test for Goodness of Fit | To see if a sample fits a theoretical distribution, such as the normal curve | Kolmogorov-Smirnov Test | To see if a sample could have come from a certain distribution
ANOVA | To see if two or more sample means are significantly different | Kruskal-Wallis Test | To test if two or more sample medians are significantly different

ANOVA

What is an ANOVA?

How does one carry out an ANOVA?

When are ANOVAs useful?

ANOVA: What is an ANOVA?

Since ANOVAs were not covered in AP stats, I will now explain them. An ANOVA is a way to compare multiple sample means to see if they are significantly different. The name comes from a phrase that describes what the test does:

ANalysis Of VAriance = ANOVA.

An ANOVA looks at the variance between the sample means, and decides if the differences among them are significant or not. This can be done to compare two or more samples.

ANOVA: When are ANOVAs useful?

An ANOVA can be used when one wants to compare any number of samples. This test can be done to see if many samples could have come from the same population. This test can also tell you about the differences between two or more areas. For example, if a survey is conducted in many different towns, you can see if their average responses differ significantly. Similarly, you can take samples of plant growth in different climates, soil, or with different treatments. In all cases, an ANOVA can be used to see if the means vary significantly.

ANOVA: How does one carry out an ANOVA?

An ANOVA is conducted by first putting all the samples into one large sample. The standard deviation of this sample is then found, and called σ. Next, the value for the range of variation in sample means is found. If the variation between the means is greater than the range of variation, the null hypothesis is rejected. The range of variation is found by finding σ / N½ (N½ is the square root of N), where N is the number of observations in each sample. The difference between each pair of sample means is then found, which is the variation of the means. If any one of these differences is greater than the range of variation, then those two means are significantly different from each other. Depending on your goal, this may cause you to reject your null hypothesis.
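The procedure described on this slide can be sketched in a few lines. This follows the simplified description above literally; the F statistic found in textbook treatments of ANOVA is computed differently. The function name `pairwise_mean_check`, the example data, the use of the population form of the standard deviation (`pstdev`), and the reading of the lost symbol as σ are assumptions made for the illustration.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, pstdev

def pairwise_mean_check(samples):
    """Apply the slide's simplified procedure to equal-sized samples.

    Returns the range of variation (sigma / sqrt(N)) and the index pairs
    of samples whose means differ by more than that amount."""
    n = len(samples[0])                          # N: observations in each sample
    pooled = [x for s in samples for x in s]     # all samples combined into one
    sigma = pstdev(pooled)                       # assumption: population-style sd
    range_of_variation = sigma / sqrt(n)
    means = [mean(s) for s in samples]
    flagged = [
        (i, j)
        for i, j in combinations(range(len(samples)), 2)
        if abs(means[i] - means[j]) > range_of_variation
    ]
    return range_of_variation, flagged

# Made-up data: three samples of four observations each.
samples = [[1, 2, 3, 4], [2, 3, 4, 5], [11, 12, 13, 14]]
threshold, flagged = pairwise_mean_check(samples)
print(f"range of variation: {threshold:.3f}")
print("pairs of samples flagged as different:", flagged)
```

Here the first two samples have means only 1 apart, so they are not flagged, while the third sample is flagged against both of the others.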

EXAMPLE

Now that I have explained the background principles of Non-Parametric Statistics, I will now carry out an example of one of the tests. I have chosen the Wilcoxon Rank-Sum Test (also called the Wilcoxon Mann-Whitney Test) because it is the most commonly used test.

The Wilcoxon Rank-Sum Test

The Wilcoxon Rank-Sum Test is used in place of the two-sample t-test when the sampling distributions of the variables being compared are not normal. This test requires two samples of sample size n1 and n2. The test is carried out as follows.

Items in green are the steps to the test.

Items in white are an example of a real test.


1: The first step in this procedure is to collect two samples.

Sample 1: {3,2,12,9,13,7,9,11,4,5,6} n1=11

Sample 2: {1,8,4,15,12,6,10,14,3,3} n2=10


2: The second step is to combine the two samples into one large sample. Simply take all the data values from each sample and make one large group. Make sure to keep track of the original samples, as the data will have to be separated back into its original groups later.

{3,2,12,9,13,7,9,11,4,5,6} and {1,8,4,15,12,6,10,14,3,3}

becomes:

{3,2,12,9,13,7,9,11,4,5,6,1,8,4,15,12,6,10,14,3,3}

Combined sample size: n1+n2 = 11+10 = 21


3: Once all the data is in one sample, the data must be put into order by size. The data should go from smallest to largest.

{3,2,12,9,13,7,9,11,4,5,6,1,8,4,15,12,6,10,14,3,3}

In order is:

{1,2,3,3,3,4,4,5,6,6,7,8,9,9,10,11,12,12,13,14,15}


4: Each data value is given a rank based on size. If two or more data have the same value, their rank is the average of the ranks. This step is when the raw data becomes ordinal data, or ranked data.

Combined sample in order is: (sample size 21)

{1,2,3,3,3,4,4,5,6,6,7,8,9,9,10,11,12,12,13,14,15}

Each data value is ranked 1-21:

RANK:      1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21
RAW DATA:  1  2  3  3  3  4  4  5  6  6  7  8  9  9 10 11 12 12 13 14 15

When two or more data have the same value, their ranks are averaged. For example, the three 3s occupy ranks 3, 4, and 5, so each receives the rank 4. Therefore, the data

RANK:      1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21
RAW DATA:  1  2  3  3  3  4  4  5  6  6  7  8  9  9 10 11 12 12 13 14 15

becomes:

RANK:      1  2  4  4  4  6.5  6.5  8  9.5  9.5  11  12  13.5  13.5  15  16  17.5  17.5  19  20  21
RAW DATA:  1  2  3  3  3  4    4    5  6    6    7   8   9     9     10  11  12    12    13  14  15
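The averaging of tied ranks can be sketched as follows, using the combined sample from this example. The function name `average_ranks` is my own choice for the illustration.

```python
def average_ranks(values):
    """Map each distinct value to its rank, averaging the ranks of tied values."""
    positions = {}
    # Walk through the sorted data, recording every position (1-based) a value occupies.
    for position, value in enumerate(sorted(values), start=1):
        positions.setdefault(value, []).append(position)
    # A tied value receives the average of the positions it occupies.
    return {value: sum(p) / len(p) for value, p in positions.items()}

combined = [3, 2, 12, 9, 13, 7, 9, 11, 4, 5, 6, 1, 8, 4, 15, 12, 6, 10, 14, 3, 3]
rank_of = average_ranks(combined)
for value in sorted(rank_of):
    print(value, "->", rank_of[value])
```

The three 3s occupy positions 3, 4, and 5, so each gets rank 4; the two 4s occupy positions 6 and 7, so each gets rank 6.5.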


5: The data are then put back into their original sampling groups as ranked data.

RANK:      1  2  4  4  4  6.5  6.5  8  9.5  9.5  11  12  13.5  13.5  15  16  17.5  17.5  19  20  21
RAW DATA:  1  2  3  3  3  4    4    5  6    6    7   8   9     9     10  11  12    12    13  14  15

Original Sample 1: {3,2,12,9,13,7,9,11,4,5,6}

Original Sample 2: {1,8,4,15,12,6,10,14,3,3}

Ranked Sample 1: {4,2,17.5,13.5,19,11,13.5,16,6.5,8,9.5}

Ranked Sample 2: {1,12,6.5,21,17.5,9.5,15,20,4,4}


6: The sum of the ranks is taken for each sample. This is the test statistic.

Ranked Sample 1: {4,2,17.5,13.5,19,11,13.5,16,6.5,8,9.5}

Ranked Sample 2: {1,12,6.5,21,17.5,9.5,15,20,4,4}

Sum of sample 1: 120.5

Sum of sample 2: 110.5

The Wilcoxon Rank-Sum Test

SUMMARY:

1: Two samples are taken.

2: The samples are combined to make one distribution of sample size (n1+n2).

3: The data are put into order, based on size.

4: Each data value is given a rank based on size. If two or more data have the same value, their rank is the average of the ranks.

5: The data are then put back into their original sampling groups as ranked data.

6: The sum of the ranks is taken for each sample. This is the test statistic.
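The six steps can be put together in one short function. This is a minimal sketch using the two samples from the example; the function name `rank_sums` is my own, and it stops at the rank sums, just as the steps above do.

```python
def rank_sums(sample1, sample2):
    """Return the sum of ranks for each sample (steps 2 through 6)."""
    combined = sorted(sample1 + sample2)          # steps 2 and 3: combine, then order

    # Step 4: assign ranks, averaging over runs of equal values.
    rank_of = {}
    i = 0
    while i < len(combined):
        j = i
        while j < len(combined) and combined[j] == combined[i]:
            j += 1
        # The tied values occupy 1-based positions i+1 through j; use their average.
        rank_of[combined[i]] = (i + 1 + j) / 2
        i = j

    # Step 5: return the ranks to their original groups.
    ranked1 = [rank_of[x] for x in sample1]
    ranked2 = [rank_of[x] for x in sample2]

    # Step 6: the rank sums.
    return sum(ranked1), sum(ranked2)

sample1 = [3, 2, 12, 9, 13, 7, 9, 11, 4, 5, 6]    # step 1: the two samples
sample2 = [1, 8, 4, 15, 12, 6, 10, 14, 3, 3]
w1, w2 = rank_sums(sample1, sample2)
print("Sum of sample 1:", w1)   # 120.5
print("Sum of sample 2:", w2)   # 110.5
```

A useful check is that the two sums always add up to (n1+n2)(n1+n2+1)/2, which is 21 × 22 / 2 = 231 here.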

Non-Parametric Statistics

This concludes my presentation. Are there any topics which have been covered that are not clear, which you would like to see again?

Explanation of an ANOVA

Introduction to Non-Parametric Statistics

Chart comparing Significance Tests

Wilcoxon Rank-Sum Test explanation/example

THANK YOU

I would like to thank you for taking the time to view this presentation. If you have any questions regarding this topic, you may email me at [email protected].

I hope that this has been informative and that you now clearly understand what non-parametric statistics are.
