
Inferential Statistics Definition: Statistics, derived from sample data, that are used to make inferences about the population from which the sample was.

Dec 26, 2015


Charles Mccoy
Transcript
Page 1

Inferential Statistics

Definition: Statistics, derived from sample data, that are used to make inferences about the population from which the sample was drawn.

Generalizability is important in this type of statistic because it is the ability to use the results of data collected from a sample to reach conclusions about the characteristics of the population.

Descriptive Statistics

Definition: Statistics used to describe the characteristics of a distribution of scores. They apply only to the members of a sample or population from which data have been collected.

Generalizability to the population is not the objective of descriptive statistics.

Page 2

Population

Definition: The collection of cases that comprise the entire set of cases with the specified characteristics (e.g., “All living adult males in the United States”).

Example: In order to find the average salary of Psychology majors who graduated from college in 2004, collect information about the salaries of all the 2004 Psychology graduates and derive an average from that data.

Any value generated from or applied to the population is a parameter.

Page 3

Sample

Definition: A collection of cases selected from a larger population.

Example: In order to find the average salary of Psychology majors who graduated from college in 2004, you select (randomly or non-randomly) some of these graduates and derive a mean from their salaries.

Any value derived from the sample, such as the mean, is a statistic.
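The distinction between a parameter and a statistic can be sketched in a few lines of Python. The salary figures below are invented purely for illustration (they are not data from the slides), and the seed is fixed only so the run is repeatable.

```python
import random
import statistics

# Hypothetical salaries (in dollars) for a tiny "population" of graduates.
population = [32000, 35000, 38000, 41000, 44000,
              47000, 50000, 53000, 56000, 59000]

# Parameter: a value computed from every case in the population.
population_mean = statistics.mean(population)

# Statistic: a value computed from a sample of cases drawn from that population.
rng = random.Random(0)               # fixed seed (an arbitrary choice)
sample = rng.sample(population, 4)   # four cases, selected at random
sample_mean = statistics.mean(sample)

print("parameter (population mean):", population_mean)
print("statistic (sample mean):    ", sample_mean)
```

The sample mean will usually be close to, but not exactly equal to, the population mean.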

Page 4

Sampling Methods

RANDOM

Definition: Selecting cases from a population in a manner that ensures each member of the population has an equal chance of being selected into the sample.

One of the most useful methods, but also one of the most difficult to use.

The major benefit of random sampling is that any differences between the sample and the population from which the sample was selected will not be systematic.

CONVENIENCE

Definition: Selecting a sample based on ease of access or availability.

This method of selecting a sample is less labor-intensive than selecting a random or representative sample.

In order for it to be an acceptable method, the sample cannot differ from the population of interest in ways that influence the outcome of the study.

REPRESENTATIVE

Definition: A method of selecting a sample in which members are purposely selected to create a sample that represents the population on some characteristic(s) of interest (e.g., when a sample is selected to have the same percentages of various ethnic groups as the larger population).

• This type of sampling can be expensive and time consuming; however, it ensures that your sample looks like the population on some important variables, therefore increasing the generalizability of the sample.
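As a rough illustration of the three sampling methods, the sketch below draws each kind of sample from a made-up population of 100 people, 60% in group "A" and 40% in group "B". The function names, the population, and the proportional-quota approach to the representative sample are assumptions chosen for this example, not a procedure given in the slides.

```python
import random
from collections import Counter

rng = random.Random(42)  # fixed seed (arbitrary) so the run is repeatable

# Hypothetical population of (person_id, group) pairs: 60 in "A", 40 in "B".
population = [(i, "A") for i in range(60)] + [(i, "B") for i in range(60, 100)]

def random_sample(cases, n):
    """Random: every case has an equal chance of being selected."""
    return rng.sample(cases, n)

def convenience_sample(cases, n):
    """Convenience: take whichever n cases are easiest to reach (here, the first n)."""
    return cases[:n]

def representative_sample(cases, n):
    """Representative: match the population's group percentages.
    Quotas are rounded, so they may not sum exactly to n for awkward proportions."""
    counts = Counter(group for _, group in cases)
    chosen = []
    for group, count in counts.items():
        members = [case for case in cases if case[1] == group]
        quota = round(n * count / len(cases))
        chosen.extend(rng.sample(members, quota))
    return chosen

for name, picker in [("random", random_sample),
                     ("convenience", convenience_sample),
                     ("representative", representative_sample)]:
    groups = Counter(group for _, group in picker(population, 10))
    print(name, dict(groups))
```

In this toy setup the convenience sample contains only group "A", which is the kind of systematic difference from the population that makes such a sample risky.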

Page 5

Variable

Any construct with more than one value that is examined in research.

Examples include income, gender, age, height, attitudes about school, score on a measure of depression, etc.

Page 6

Types of Variables

Quantitative (continuous) variable: A variable that has assigned values and the values are ordered and meaningful, such that 1 is less than 2, 2 is less than 3, etc.

Qualitative (categorical) variable: A variable that has discrete categories. If the categories are given numerical values, the values have meaning as nominal references but not as numerical values (e.g., in 1 = “male” and 2 = “female”, 1 is not more or less than 2).

Page 7

Scales of Measurement for Variables

Nominally (or categorically) scaled variable: A variable in which the numerical values assigned to each category are simply labels rather than meaningful numbers.

Ordinal variable: Variables measured with numerical values where the numbers are meaningful (e.g., 2 is larger than 1) but the distance between the numbers is not constant.

Interval or Ratio variable: Variables measured with numerical values with equal distance, or space, between each number (e.g., the distance between 1 and 2 is the same as the distance between 2 and 3). On a ratio scale, which also has a true zero point, 2 is twice as much as 1 and 4 is twice as much as 2.

Page 8

Collecting Data

Collecting data produces a group of scores on one or more variables.

To get the distribution of scores you must arrange the scores from lowest to highest.

Researchers are usually interested in central tendency, a set of distribution characteristics that consist of the mean, median, and mode.

Page 9

The Mean

Definition: The arithmetic average of a distribution of scores.

Provides a single, simple number that gives a rough summary of the distribution.

The most commonly used statistic in all social science research.

Useful, but does not tell you anything about how spread out the scores are (i.e., variance) or how many scores in the distribution are close to the mean.

Page 10

The Median

Definition: The score in a distribution that marks the 50th percentile. It is the score at which 50% of the distribution falls below and 50% falls above.

Used when dividing distribution scores into two groups (median split).

Useful statistic to examine when the scores in a distribution are skewed or when there are a few extreme scores at the high end or the low end of the distribution.

Page 11

The Mode

Definition: The score in the distribution that occurs most frequently.

Least used of the measures of central tendency; provides the least amount of information.

Page 12

Formula for calculating the mean of a distribution

$\mu = \dfrac{\Sigma X}{N}$ (population) or $\bar{X} = \dfrac{\Sigma X}{n}$ (sample)

$\bar{X}$ is the sample mean

$\mu$ is the population mean

$\Sigma$ means “the sum of”

$X$ is an individual score in the distribution

$n$ is the number of scores in the sample

$N$ is the number of scores in the population

Calculating the Mean

1. Add, or sum, all of the scores in a distribution

2. Divide by the number of scores

OR

1. Multiply each value by the frequency with which the value occurred

2. Add all of these products

3. Divide by the number of scores
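Both ways of calculating the mean give the same result, which can be checked with a short Python sketch. The scores and the function names below are made up for illustration.

```python
from collections import Counter

def mean_from_scores(scores):
    """Add all of the scores, then divide by the number of scores."""
    return sum(scores) / len(scores)

def mean_from_frequencies(freq):
    """Multiply each value by its frequency, add the products,
    then divide by the number of scores."""
    total = sum(value * count for value, count in freq.items())
    n = sum(freq.values())
    return total / n

scores = [2, 3, 3, 4, 4, 4, 5]   # hypothetical scores
freq = Counter(scores)           # value -> how often it occurred

print(mean_from_scores(scores))
print(mean_from_frequencies(freq))
```

The frequency version is convenient when the data arrive as a frequency table rather than as a list of individual scores.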

Page 13

Calculating The Median

1. Arrange all of the scores in the distribution in order, from smallest to largest

2. Find the middle score in the distribution

If there is an odd number of scores...

there will be a single score that marks the middle of the distribution

If there is an even number of scores in the distribution...

the median is the average of the two scores in the middle of the distribution (as long as the scores are arranged in order)

Finding the average

add the two scores in the middle together and divide by two
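The median procedure, including the odd and even cases, can be written as a small Python function. This is a minimal sketch; the function name and the example scores are illustrative.

```python
def median(scores):
    """Median of a list of scores, following the two-step procedure."""
    ordered = sorted(scores)          # step 1: arrange from smallest to largest
    n = len(ordered)
    middle = n // 2                   # step 2: locate the middle of the distribution
    if n % 2 == 1:
        # Odd number of scores: a single score marks the middle.
        return ordered[middle]
    # Even number of scores: average the two scores in the middle.
    return (ordered[middle - 1] + ordered[middle]) / 2

print(median([7, 1, 5]))      # 5
print(median([7, 1, 5, 3]))   # 4.0
```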

Page 14

Finding The Mode

Remember, the mode is simply the category in the distribution that has the highest number of scores, or the highest frequency.

Multimodal: When a distribution of scores has two or more values that have the highest frequency of scores.

• Example - Bimodal distribution: A distribution that has two values that have the highest frequency of scores; often occurs when people respond to controversial questions that tend to polarize the public.

Example of bimodal distribution

On the following scale, please indicate how you feel about capital punishment.

1 ----- 2 ----- 3 ----- 4 ----- 5
Strongly Opposed          Strongly In Favor

Category of Responses on the Scale:        1    2    3    4    5
Frequency of Responses in Each Category:  45    3    4    3   45
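Finding the mode (or modes) from a frequency table amounts to picking every value that shares the highest frequency. The sketch below uses the response frequencies from the capital punishment example; the function name is an assumption for illustration.

```python
def modes(freq):
    """Return every value that has the highest frequency."""
    highest = max(freq.values())
    return sorted(value for value, count in freq.items() if count == highest)

# Response category -> frequency, taken from the example above.
responses = {1: 45, 2: 3, 3: 4, 4: 3, 5: 45}

print(modes(responses))   # [1, 5]
```

Two values are returned, so the distribution is bimodal.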

Page 15

Example: The Mean, Median, and Mode of a Distribution

The following distribution of test scores is given:

86   90   96   96   100   105   115   121

Mean = (86 + 90 + 96 + 96 + 100 + 105 + 115 + 121) / 8 = 101.13

Median = (96 + 100) / 2 = 98

Mode = 96

Calculating the mean: Add up all the scores, then divide by the number of scores. In this case, there are 8 scores.

Calculating the median: Because there is an even number of scores, sum the two scores that are found in the middle of the distribution when it is put into numerical order, then divide by two.

Calculating the mode: 96 is the score that occurs most frequently.
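The three values in this example can be double-checked with Python's standard `statistics` module, a convenience check rather than part of the original slides.

```python
import statistics

scores = [86, 90, 96, 96, 100, 105, 115, 121]

mean = statistics.mean(scores)       # 809 / 8 = 101.125, reported as 101.13 above
median = statistics.median(scores)   # (96 + 100) / 2 = 98.0
mode = statistics.mode(scores)       # 96 occurs twice, more than any other score

print(mean, median, mode)
```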

Page 16

Skewed Distribution

Definition: A distribution of scores that has a high number of scores clustered at one end of the distribution with relatively few scores spread out toward the other end of the distribution, forming a tail.

When working with a skewed distribution, the mean, median, and mode are usually all at different points rather than at the center of the distribution.

Similarities between a skewed and normal distribution:

• The procedures used to calculate a mean, median, and mode are the same

Differences between a skewed and normal distribution:

• The position of the three measures of central tendency in the distribution

[Figures: left (negative) skew; right (positive) skew]

Page 17

Skewness

Skewness Ranges

If skewness is less than −1 or greater than +1, the distribution is highly skewed.

If skewness is between −1 and −½ or between +½ and +1, the distribution is moderately skewed.

If skewness is between −½ and +½, the distribution is approximately symmetric.
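The slides do not say which skewness formula they have in mind. As a sketch, the code below assumes the common moment coefficient (third central moment divided by the 1.5 power of the second central moment) and then applies the ranges above. The function names and example scores are illustrative.

```python
def skewness(scores):
    """Moment coefficient of skewness: m3 / m2**1.5 (an assumed formula)."""
    n = len(scores)
    mean = sum(scores) / n
    m2 = sum((x - mean) ** 2 for x in scores) / n   # second central moment
    m3 = sum((x - mean) ** 3 for x in scores) / n   # third central moment
    return m3 / m2 ** 1.5

def describe_skew(value):
    """Apply the rule-of-thumb ranges for skewness."""
    if value < -1 or value > 1:
        return "highly skewed"
    if value < -0.5 or value > 0.5:
        return "moderately skewed"
    return "approximately symmetric"

symmetric = [1, 2, 3, 4, 5]        # hypothetical scores
right_tail = [1, 1, 1, 2, 10]      # a few low scores and one extreme high score

for scores in (symmetric, right_tail):
    value = skewness(scores)
    print(scores, round(value, 2), describe_skew(value))
```

A positive value indicates a tail toward the high end (right skew); a negative value indicates a tail toward the low end (left skew).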

Page 18

Kurtosis

If a distribution is symmetric, the next question is about the central peak: is it high and sharp, or short and broad?

The reference standard is a normal distribution, which has a kurtosis of 3. Often the excess kurtosis is presented: excess kurtosis = kurtosis − 3.

A normal distribution has kurtosis exactly 3 (excess kurtosis exactly 0). Any distribution with kurtosis ≈3 (excess ≈0) is called mesokurtic.

A distribution with kurtosis <3 (excess kurtosis <0) is called platykurtic. Compared to a normal distribution, its central peak is lower and broader, and its tails are shorter and thinner.

A distribution with kurtosis >3 (excess kurtosis >0) is called leptokurtic. Compared to a normal distribution, its central peak is higher and sharper, and its tails are longer and fatter.

[Figure panels: kurtosis = 1.8, excess = −1.2; kurtosis = 3, excess = 0; kurtosis = 4.2, excess = 1.2]
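A minimal Python sketch of these labels follows. It assumes the moment definition of kurtosis (fourth central moment divided by the squared second central moment). The `tolerance` used to decide what counts as "≈3" is an arbitrary choice made for this example and is not a value given in the slides.

```python
def kurtosis(scores):
    """Moment kurtosis: m4 / m2**2 (a normal distribution gives 3)."""
    n = len(scores)
    mean = sum(scores) / n
    m2 = sum((x - mean) ** 2 for x in scores) / n   # second central moment
    m4 = sum((x - mean) ** 4 for x in scores) / n   # fourth central moment
    return m4 / m2 ** 2

def describe_kurtosis(value, tolerance=0.1):
    """Label a kurtosis value; `tolerance` is a hypothetical cutoff for 'about 3'."""
    excess = value - 3
    if abs(excess) <= tolerance:
        return "mesokurtic"
    return "leptokurtic" if excess > 0 else "platykurtic"

# The three kurtosis values shown in the figure panels.
for value in (1.8, 3, 4.2):
    print(value, round(value - 3, 1), describe_kurtosis(value))

# Evenly spread hypothetical scores have a flat shape, so kurtosis is well below 3.
print(kurtosis([1, 2, 3, 4, 5]))
```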

Page 19

Measures of Central Tendency vs. Measures of Variability

Measures of central tendency provide useful information, but are limited.

Measures of central tendency provide insufficient information on the dispersion of scores in a distribution or, in other words, the variety of the scores in a distribution.

Three measures of dispersion that researchers typically examine: range, variance, and standard deviation. Standard deviation is the most informative and widely used of the three.

Page 20

Range

Definition: The range is the difference between the largest score (maximum value) and the smallest score (minimum value) of a distribution.

Gives researchers a quick sense of how spread out the scores of a distribution are.

Not practical; misleading at times.

Helps see whether all or most of the points on a scale, such as a survey, were covered.

Page 21

Interquartile Range (IQR)

Definition: The difference between the 75th percentile (third quartile) and 25th percentile (first quartile) scores in a distribution.

IQR contains scores in the two middle quartiles if scores in a distribution were arranged in order numerically.
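The range and the IQR can both be computed with Python's standard library, as in the sketch below. The scores are illustrative. Note that textbooks and software use slightly different conventions for locating quartiles, so the exact IQR may vary by method; this example relies on the default method of `statistics.quantiles` (Python 3.8+).

```python
import statistics

scores = [86, 90, 96, 96, 100, 105, 115, 121]   # illustrative test scores

# Range: largest score minus smallest score.
score_range = max(scores) - min(scores)

# Quartiles: cut points that divide the ordered scores into four groups.
q1, q2, q3 = statistics.quantiles(scores, n=4)

# IQR: third quartile (75th percentile) minus first quartile (25th percentile).
iqr = q3 - q1

print("range:", score_range)
print("quartiles:", q1, q2, q3)
print("IQR:", iqr)
```

Unlike the range, the IQR ignores the highest and lowest quarters of the scores, so a single extreme score does not inflate it.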

Page 22

Variance

Definition: The sum of the squared deviations divided by the number of cases in the population, or by the number of cases minus one in the sample.

Provides a statistical average of the amount of dispersion in a distribution of scores.

Researchers rarely look at variance by itself because it is not on the same scale as the original measure of a variable (it is expressed in squared units). Even so, it is helpful for the calculation of other statistics (e.g., analysis of variance, regression).

Page 23

Standard Deviation

When combined, the mean and standard deviation provide a pretty good picture of what the distribution of scores is like.

Definition: The average deviation between the individual scores in the distribution and the mean for the distribution.

To understand standard deviation, consider the meanings of the two words:

• Standard: typical or average

• Deviation: refers to the difference between an individual score and the average score for the distribution

Useful statistic; provides a handy measure of how spread out the scores are in the distribution.

Page 24

Sample Statistics as Estimates of Population Parameters

For the most part, researchers are concerned with what a sample tells us about the population from which the sample was drawn. This is important because most of the statistics, although generated from sample data, are used to make inferences about the population.

The formulas for calculating the variance and standard deviation of sample data are actually designed to make sample statistics better estimates of the population parameters (i.e., the population variance and standard deviation).

Page 25

Making Sense of the Formulas for Calculating the Variance

Here the interest is not in the average score of the distribution, but rather in the average difference, or deviation, between each score in the distribution and the mean of the distribution.

First, calculate a deviation score for each individual score in the distribution.

See next slide for formula

Page 26: Inferential Statistics Definition: Statistics, derived from sample data, that are used to make inferences about the population from which the sample was.

Variance and Standard Deviation FormulasVariance and Standard Deviation Formulas

PopulationPopulation Estimate Based on SampleEstimate Based on Sample

VarianceVariance

XX

NN

sumsum

a score in the a score in the distributiondistribution

the population meanthe population mean

the number of cases the number of cases in the populationin the population

XX

NN

sumsum

a score in the distributiona score in the distribution

the sample meanthe sample mean

the number of cases in the the number of cases in the sample sample

Standard Standard DeviationDeviation

XX

NN

to sumto sum

a score in the a score in the distributiondistribution

the population meanthe population mean

the number of cases the number of cases in the populationin the population

XX

NN

sumsum

a score in the distributiona score in the distribution

the sample meanthe sample mean

the number of cases in the the number of cases in the samplesample

N

X 2)(

1

)( 22

n

XXs

1

)( 2

n

XXs

X

X

N

X 22 )(

Similarities Between the Variance and Standard Deviation Formulas

Formulas for calculating the variance and the standard deviation are virtually identical. The square root in the standard deviation formula is the only difference.

Calculating the variance is the same for both sample and population data except for the denominator of the sample formula, which is n – 1

Formula for calculating the variance is known as the deviation score formula
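The population and sample formulas can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function names, the `sample` flag, and the four example scores are all assumptions chosen for the demonstration.

```python
import math


def variance(scores, sample=True):
    """Deviation score formula: sum of squared deviations over N or n - 1."""
    n = len(scores)
    mean = sum(scores) / n
    sum_of_squares = sum((x - mean) ** 2 for x in scores)
    # The only difference between the two formulas is the denominator.
    return sum_of_squares / (n - 1 if sample else n)


def standard_deviation(scores, sample=True):
    """The standard deviation is the square root of the variance."""
    return math.sqrt(variance(scores, sample))


scores = [2, 4, 6, 8]  # hypothetical scores, mean = 5
pop_var = variance(scores, sample=False)   # 20 / 4
samp_var = variance(scores, sample=True)   # 20 / 3
pop_sd = standard_deviation(scores, sample=False)
samp_sd = standard_deviation(scores, sample=True)
```

The sample versions are always slightly larger than the population versions for the same scores, because the same sum of squared deviations is divided by a smaller number.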


Differences Between the Variance and Standard Deviation Formulas: Why n – 1?

Brief explanation:

• If population mean is unknown, use the sample mean as an estimate. But the sample mean probably will differ from the population mean

• Whenever using a number other than the actual mean to calculate the variance, a larger variance will be found. This will be true regardless of whether the number used in the formula is smaller or larger than the actual mean

• Because the sample mean usually differs from the population mean, the variance and standard deviation will probably be smaller than they would have been if the population mean had been used

• When using the sample mean to generate an estimate of the population variance or standard deviation, it will actually underestimate the size of the population variance or standard deviation

• To adjust for the underestimation: use n – 1 in the denominator in sample formulas

• Smaller denominators produce larger overall variance and standard deviation statistics, making them more accurate estimates of the population parameters
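The underestimation can be checked directly on a tiny example. The sketch below is an illustration only: it assumes a hypothetical five-score population and enumerates every possible sample of two scores drawn with replacement, using exact fractions so no rounding is involved.

```python
from fractions import Fraction
from itertools import product


def mean(scores):
    return Fraction(sum(scores), len(scores))


def sum_sq_dev(scores, center):
    """Sum of squared deviations of the scores around a chosen center."""
    return sum((Fraction(x) - center) ** 2 for x in scores)


population = [1, 2, 3, 4, 5]  # hypothetical population
mu = mean(population)
pop_variance = sum_sq_dev(population, mu) / len(population)

# Every ordered sample of size 2, drawn with replacement (25 samples).
samples = list(product(population, repeat=2))

# Divide by n: uses the sample mean and the unadjusted denominator.
unadjusted = [sum_sq_dev(s, mean(s)) / len(s) for s in samples]
# Divide by n - 1: the adjusted sample formula.
adjusted = [sum_sq_dev(s, mean(s)) / (len(s) - 1) for s in samples]

avg_unadjusted = sum(unadjusted) / len(samples)
avg_adjusted = sum(adjusted) / len(samples)

# Deviations around the sample mean are never larger in total than
# deviations around the population mean.
never_larger = all(sum_sq_dev(s, mean(s)) <= sum_sq_dev(s, mu) for s in samples)
```

In this example the population variance is 2. Averaged over all samples, dividing by n gives 1, an underestimate, while dividing by n – 1 gives exactly 2.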


Working with a Population Distribution

Researchers usually assume they are working with a sample that represents a larger population

How much difference there is between using N and n – 1 in the denominator depends on the size of the sample
• If sample is large, virtually no difference
• If sample is small, relatively large difference between the results produced by the population and sample formulas
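One way to see the effect of sample size is to look at the ratio between the two denominators. For the same scores, the sample-formula variance equals the population-formula variance multiplied by n / (n – 1). The helper name below is hypothetical and the sample sizes are chosen only for illustration.

```python
def denominator_ratio(n):
    """Sample-formula variance divided by population-formula variance."""
    if n < 2:
        raise ValueError("the sample formula needs at least two cases")
    return n / (n - 1)


# Print the ratio for a small, a moderate, and a large sample.
for n in (5, 30, 491):
    print(n, round(denominator_ratio(n), 4))
```

With 5 cases the sample formula gives a variance 25% larger than the population formula; with several hundred cases the two results are nearly the same.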


Why Have Variance?

Why not go straight to standard deviation?

• We need to calculate the variance before finding the standard deviation. That is because we need to square the deviation scores (so they will not sum to zero). These squared deviations produce the variance. Then we need to take the square root to find the standard deviation.

• The fundamental piece of the variance formula, which is the sum of the squared deviations, is used in a number of other statistics, most notably analysis of variance (ANOVA)
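The steps above can be traced on a handful of numbers. This is a small sketch with hypothetical scores, showing that the raw deviation scores sum to zero while the squared deviations do not.

```python
scores = [1, 3, 5, 7, 9]  # hypothetical scores
mean = sum(scores) / len(scores)

# Step 1: deviation score for each individual score.
deviations = [x - mean for x in scores]

# Raw deviations cancel out, so their sum carries no information.
sum_of_deviations = sum(deviations)

# Step 2: square the deviations and sum them (the sum of squares).
sum_of_squares = sum(d ** 2 for d in deviations)

# Step 3: divide by n - 1 for the sample variance, then take the square root.
sample_variance = sum_of_squares / (len(scores) - 1)
sample_sd = sample_variance ** 0.5
```

Here the deviations are -4, -2, 0, 2, and 4, which sum to 0; the sum of squares is 40 and the sample variance is 10.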


Students’ responses to the item “I would feel really good if I were the only one who could answer the teacher’s question in class.”

Range does not provide very much information.

The mean of 2.92 is not particularly informative because from the mean it is impossible to determine whether:

• Most students circled a 3 on the scale

• Roughly equal numbers of students circled each of the five numbers on the response scale

• Almost half of the students circled 1 whereas the other half circled 5

Sample Size = 491
Mean = 2.92
Standard Deviation = 1.43
Variance = (1.43)^2 = 2.04
Range = 5 – 1 = 4

[Figure: bar chart of frequency by score on the desire to demonstrate ability item. Frequencies for scores 1 through 5: 115, 81, 120, 77, 98.]
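The summary statistics on this slide can be recomputed from the bar heights. The sketch below assumes the five frequencies shown in the chart (115, 81, 120, 77, 98) belong to scores 1 through 5 in that order; that mapping is read from the chart, not stated in the text, though it is consistent with the reported sample size and mean.

```python
import math

# Assumed mapping of response score -> number of students.
frequencies = {1: 115, 2: 81, 3: 120, 4: 77, 5: 98}

n = sum(frequencies.values())
mean = sum(score * count for score, count in frequencies.items()) / n

# Sum of squared deviations, weighting each score by its frequency.
sum_of_squares = sum(
    count * (score - mean) ** 2 for score, count in frequencies.items()
)

sample_variance = sum_of_squares / (n - 1)
sample_sd = math.sqrt(sample_variance)
score_range = max(frequencies) - min(frequencies)

print(n, round(mean, 2), round(sample_sd, 2), score_range)
```

Under that assumption the sample size, mean, standard deviation, and range match the slide. The slide's variance of 2.04 comes from squaring the rounded standard deviation of 1.43; the unrounded variance is slightly larger.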


Drawing Conclusions…

Consider the standard deviation in conjunction with the mean

• Predicting what the size of the standard deviation will be:

If almost all of the students circled a 2 or a 3 on the response scale, expect a fairly small standard deviation

If half of the students circled 1 whereas the other half circled 5, expect a large standard deviation (about 2.0) because each score would be about two units away from the mean

If the responses are fairly evenly spread out across the five response categories, expect a moderately sized standard deviation (about 1.50)
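These three predictions can be checked with made-up response sets. The sketch below uses hypothetical groups of 100 students and the population formula from Python's standard `statistics` module; the group sizes and exact splits are assumptions for illustration.

```python
import statistics

# Hypothetical response sets on the 1-5 scale, 100 students each.
clustered = [2] * 50 + [3] * 50        # almost everyone circles 2 or 3
polarized = [1] * 50 + [5] * 50        # half circle 1, half circle 5
even = [1, 2, 3, 4, 5] * 20            # equal numbers in each category

sd_clustered = statistics.pstdev(clustered)
sd_polarized = statistics.pstdev(polarized)
sd_even = statistics.pstdev(even)

print(round(sd_clustered, 2), round(sd_polarized, 2), round(sd_even, 2))
```

The clustered set gives a small standard deviation (0.5), the polarized set gives 2.0, and the perfectly even set gives about 1.41, close to the "about 1.50" rule of thumb and to the 1.43 observed in the actual data.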

1. Boxplot for the desire to appear able variable
2. Presented for the same variable that is represented in the previous graph, wanting to demonstrate ability

Conclusions:

• The distribution looks somewhat symmetrical due to the mean of 2.92 being somewhat in the middle

• From the standard deviation of 1.43, we know that the scores are pretty well spread out across the five response categories
