Page 1: Statistical Inference Two Statistical Tasks 1. Description 2. Inference.

Statistical Inference

Two Statistical Tasks
1. Description
2. Inference

Thus far, we have completed:
1. Descriptive Statistics
   a. Central tendency
      i. discrete variables
      ii. continuous variables
   b. Variation
      i. discrete variables
      ii. continuous variables
   c. Association
      i. discrete variables

Now we begin:
2. Inferential Statistics
   a. Estimation
   b. Hypothesis testing

Inferential statistics are based on random sampling. A sample is a subset of some universe (or population [set]). If (and only if) the sample is selected according to the laws of probability, we can make inferences about the universe from known (statistical) characteristics of the sample.

“Random” means selected so that each element in the universe has exactly the same chance of being picked for the sample (sometimes called an equi-probability sample).

Put differently, the only difference between elements selected into the sample and those not selected is pure chance (i.e., “the luck of the draw”).
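
An equi-probability sample can be sketched in a few lines of Python. This is only an illustration: the universe of 100 numbered elements, the function name, and the seed are all assumptions made for the example, not part of the original slides.

```python
import random

def draw_equal_probability_sample(universe, n, seed=None):
    """Draw n distinct elements so that every element is equally likely to be picked."""
    rng = random.Random(seed)  # the seed only makes the illustration reproducible
    return rng.sample(list(universe), n)

# Hypothetical universe of 100 numbered elements (an assumption for this sketch).
universe = list(range(1, 101))
sample = draw_equal_probability_sample(universe, n=4, seed=1)
print(sample)
```

Which four elements end up in the sample depends on nothing but the draw itself, which is the sense of "the luck of the draw" used here.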

All inferential statistics evaluate the probability that unlucky selection in creating a random sample (the “luck of the draw,” technically called “sampling error”) explains the statistical outcomes obtained from random samples.

Sample 1: 75% cardinal (n1 = 4)

Sample 2: 0% cardinal (n2 = 4)

Sample 3: 25% cardinal (n3 = 4)

Percent cardinal    f
  0                 lowest
 25                 medium
 50                 highest
 75                 medium
100                 lowest

[Figure: sampling distribution of percent cardinal in random samples; horizontal axis marked 0%, 25%, 50%, 75%, 100%]

All statistics calculated on variables from a random sample have a (known) sampling distribution. Sampling distributions are the theoretically possible distributions of statistical outcomes from an infinite number of random samples of the same size.
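
For the percent-cardinal samples of size 4 shown earlier, the sampling distribution can be computed exactly rather than by drawing many samples. The sketch below assumes a universe that is exactly 50% cardinal and draws that are independent of one another (a very large universe, or sampling with replacement). Neither assumption is stated in the slides; they are chosen because they reproduce the symmetric lowest/medium/highest/medium/lowest pattern.

```python
from math import comb

def sampling_distribution(n, p):
    """Map each possible sample percentage to its probability (binomial model)."""
    return {
        100 * k / n: comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(n + 1)
    }

# Assumed: samples of size 4 from a universe that is 50% cardinal.
dist = sampling_distribution(n=4, p=0.5)
for percent, prob in dist.items():
    print(f"{percent:5.0f}%  {prob:.4f}")
```

Under these assumptions a sample that is 50% cardinal is the single most likely outcome, and 0% or 100% the least likely, which is why one random sample tends to land near the center of its sampling distribution.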

Knowing this, we do not actually need to draw an infinite number of random samples. When we draw ONE (large) random sample, CHANCES ARE that its characteristics will be closer to the center of its sampling distribution than the extremes. That is, any sample statistic is likely to be close to (rather than very different from) the actual (unknown) value (parameter) in the universe.

For example, when we find that the value of χ² for the association between two variables in a large random sample is 13.748, chances are that the (unknown) value of χ² for the universe (the so-called “true” value) is similar rather than very different.

The question is: Does this sample value of χ² permit us to infer that the two variables are (probably) related or are (probably) independent in the universe? The answer requires knowing how to use the Chi-Square sampling distribution(s).

Sampling distributions allow us to identify the probability that a sample statistic has a similar value in the universe from which the random sample was drawn (that is, whether the value holds in general, not merely for the sample).

Unfortunately, χ² has not one but several sampling distributions, each differently shaped. The one that is relevant for the specific inference we wish to make can be identified by knowing the number of degrees of freedom involved in the calculation of this sample statistic.

In the case of contingency tables (crosstabulations), degrees of freedom associated with χ² are a function of the size of the table (i.e., the number of rows and columns). Specifically,

df = (R – 1)(C – 1)

For example, a contingency table having two rows and two columns (i.e., a 2 x 2 table) has only one degree of freedom:

df = (R – 1)(C – 1) = (2 – 1)(2 – 1) = (1)(1) = 1
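
The same formula as a small Python helper, in case it is useful for checking larger tables. The function name is hypothetical; only the formula comes from the slides.

```python
def degrees_of_freedom(rows, cols):
    """Degrees of freedom for chi-square on an R x C contingency table."""
    return (rows - 1) * (cols - 1)

print(degrees_of_freedom(2, 2))  # the 2 x 2 table from the example
print(degrees_of_freedom(3, 2))  # a table with three rows and two columns
```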

             Column
Row       One    Two    Total
One         ?      ?      100
Two         ?      ?      200
Total     200    100      300

             Column
Row       One    Two    Total
One        96      ?      100
Two         ?      ?      200
Total     200    100      300

             Column
Row       One    Two    Total
One        96      4      100
Two         ?      ?      200
Total     200    100      300

             Column
Row       One    Two    Total
One        96      4      100
Two       104      ?      200
Total     200    100      300

             Column
Row       One    Two    Total
One        96      4      100
Two       104     96      200
Total     200    100      300
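
The sequence of tables above shows why a 2 x 2 table has one degree of freedom: once the marginal totals are fixed, a single cell determines the other three. A minimal sketch of that bookkeeping follows; the function and argument names are hypothetical.

```python
def complete_2x2(top_left, row_totals, col_totals):
    """Fill in a 2 x 2 table from one known cell and the fixed marginal totals."""
    top_right = row_totals[0] - top_left         # rest of row One
    bottom_left = col_totals[0] - top_left       # rest of column One
    bottom_right = row_totals[1] - bottom_left   # rest of row Two
    return [[top_left, top_right], [bottom_left, bottom_right]]

# Marginals from the tables above: rows total 100 and 200, columns 200 and 100.
table = complete_2x2(96, row_totals=(100, 200), col_totals=(200, 100))
print(table)
```

Only the first argument is free to vary; every other cell follows by subtraction.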

An Example

                         Year
Party Preference    1984     1985     Total
Democrat             545      595     1,140
Independent          528      462       990
Republican           370      455       825
Total              1,443    1,512     2,955

For the crosstabulation in this table, χ² = 13.748. Is the association in this table confined to the sample, or does this mean that there was a “real” shift in party identification from one year to the next? There are several steps in answering this question.
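
The reported statistic can be checked from the cell counts. The sketch below assumes the usual Pearson formula, χ² = Σ (O − E)² / E with each expected count E = (row total × column total) / N. These slides do not restate the formula, so treat it as an assumption about how 13.748 was obtained.

```python
def chi_square(observed):
    """Pearson chi-square statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            # Count expected in this cell if the two variables were independent
            expected = row_totals[i] * col_totals[j] / n
            stat += (count - expected) ** 2 / expected
    return stat

# Party preference (rows) by year (columns: 1984, 1985), from the table above.
observed = [
    [545, 595],  # Democrat
    [528, 462],  # Independent
    [370, 455],  # Republican
]
chi2 = chi_square(observed)
print(f"{chi2:.3f}")
```

The result should agree with the reported 13.748 to within rounding of the expected counts.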

Since these data are from a large random sample, we can use the laws of chance to infer whether this value represents a “real” shift in the universe (i.e., among people in the U.S. in general) or is merely an artifact of sampling (bad luck in randomly selecting 2,955 people who are NOT like the rest of the population).

We know that 13.748 is ONE of the values on a sampling distribution of χ², but which sampling distribution? Since df = 2 [i.e., (3-1)(2-1)], we can determine that the sampling distribution is the one whose values are located in row 2 of the table in Appendix 4, the “Critical Values of Chi-Square.”

We need a DECISION RULE or CUT POINT to decide whether this represents a true shift or merely the result of chance in drawing the random sample.

We must decide what chance of being wrong we want to entertain in deciding between a “true” relationship between changes over time and political party preference (i.e., one that actually exists in the universe) and an artifact of sampling (i.e., a relationship that exists nowhere else except in our sample due to the “luck of the draw”). Actually, with Appendix 4 we are limited to some conventional probabilities of deciding incorrectly: 10 percent (.10, column 1), 5 percent (.05, column 2), 1 percent (.01, column 3), or 1/10 of 1 percent (.001, column 4). Until we have introduced some additional criteria, let's stick with a 5 percent chance of incorrectly deciding between a real association and chance.

This is known as an alpha level (or significance level) and is expressed as:

α = 0.05

It means that we accept a 5 percent chance of concluding that there is a true association in the universe when the association is in fact due to chance (and exists only in the sample). In other words, if the two variables were actually independent in the universe, we would decide correctly 95 percent of the time.

Having decided on an alpha level of .05 (i.e., accepting a 5 percent chance that we will decide incorrectly) and knowing the appropriate Chi-Square sampling distribution (the one defined by 2 degrees of freedom), we can find the critical value of χ². From row 2 (df = 2) and column 2 (α = .05) of Appendix 4, we find that the appropriate critical value is 5.99. Since χ² for the data was calculated to be 13.748 and since 13.748 is GREATER than the critical value, we conclude that the odds favor there being a true association between party preference and year of poll. In other words, if there were no association in the universe, there would be less than a 5 percent chance of obtaining a χ² this large through chance alone (by randomly selecting people who are atypical of the rest of the population).
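
The table lookup and the decision rule can be reproduced without Appendix 4 for this particular case. With exactly 2 degrees of freedom, the upper-tail probability of chi-square has the closed form P(χ² > x) = exp(−x/2), so the critical value is −2 ln(α). This shortcut is specific to df = 2; other degrees of freedom need the printed table or a statistics library. The function names are hypothetical.

```python
from math import exp, log

def critical_value_df2(alpha):
    """Chi-square critical value for df = 2 only (closed form: -2 * ln(alpha))."""
    return -2.0 * log(alpha)

def tail_probability_df2(chi2):
    """Chance of a chi-square at least this large under independence, df = 2 only."""
    return exp(-chi2 / 2.0)

alpha = 0.05
chi2 = 13.748  # value reported for the party preference by year table
critical = critical_value_df2(alpha)
reject_independence = chi2 > critical  # decision rule: compare with the critical value

print(f"critical value at alpha = {alpha}: {critical:.2f}")
print(f"tail probability of {chi2}: {tail_probability_df2(chi2):.4f}")
print("conclude association" if reject_independence else "cannot rule out chance")
```

Calling critical_value_df2 with .10, .01, or .001 gives the other three df = 2 critical values, which can be compared against row 2 of Appendix 4.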

Recapitulation

1. Statistical inference involves “generalizing” from a sample to a (statistical) universe.
2. Statistical inference is only possible with random samples.
3. Statistical inference estimates the probability that a sample result could be due to chance (in sample selection).
4. Sampling distributions are the “keys” that connect (known) sample statistics and (unknown) universe parameters.
5. Alpha levels are used to identify “critical values” on sampling distributions.

