Research Methods Chapter 8 Data Analysis. Two Types of Statistics Descriptive –Allows you to describe relationships between variables Inferential –Allows.

Research Methods

Chapter 8

Data Analysis

Two Types of Statistics

• Descriptive
  – Allows you to describe relationships between variables

• Inferential
  – Allows one to test hypotheses & see if results are generalizable

Descriptive Statistics

• Often begins with univariate analysis
  – Displays the variation of a variable
  – Several ways to display variation
    • Bar Chart, Frequency Polygon, Histogram, etc.

[Figure: Rates of Church Affiliation, U.S., 1776-1995 (frequency polygon). X-axis: Year (1776, 1850, 1860, 1870, 1890, 1906, 1916, 1926, 1952, 1980, 1995). Y-axis: Percent of Church Membership (0 to 70).]

– 3 features of the shape of variation are important:
  • Central Tendency: the most common value, or the value around which cases tend to center
    – a.k.a. averages like mean, median, mode
  • Variability: the degree to which cases are spread out or clustered together
  • Skewness: the extent to which cases are clustered more at one or the other end of a distribution
    » Can be absent (no skew), positive, or negative

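Negative Skew: Test Too Easy

[Figure: frequency (Freq.) by Score, 0 to 100; scores cluster near the high end]

Positive Skew: Test Too Hard

[Figure: frequency (Freq.) by Score, 0 to 100; scores cluster near the low end]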

Frequency Distribution of Voting in 1992 Presidential Election

Value           Frequency    Valid Percent
Voted               1,909         71.5%
Did not vote          762         28.5%
Not eligible          183          ---
Refused                10          ---
Don’t know             38          ---
No answer               2          ---
Total               2,904        100.0%
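The "Valid Percent" column can be reproduced with a short Python sketch. It assumes, as the dashes in the table suggest, that the four non-response categories are treated as missing and left out of the base; the variable names are illustrative only.

```python
# Frequencies copied from the table above.
frequencies = {
    "Voted": 1909,
    "Did not vote": 762,
    "Not eligible": 183,
    "Refused": 10,
    "Don't know": 38,
    "No answer": 2,
}

# Assumption: these categories are "missing" and excluded from the percent base.
missing = {"Not eligible", "Refused", "Don't know", "No answer"}

# The base for valid percents is the number of valid (non-missing) cases.
valid_total = sum(n for value, n in frequencies.items() if value not in missing)

valid_percent = {
    value: round(100 * n / valid_total, 1)
    for value, n in frequencies.items()
    if value not in missing
}

print(sum(frequencies.values()))  # 2904 (all cases)
print(valid_total)                # 2671 (valid cases only)
print(valid_percent)              # {'Voted': 71.5, 'Did not vote': 28.5}
```

Note that the percentages are based on the 2,671 valid cases, not on all 2,904 respondents.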

Ungrouped and Grouped Age Distributions

Ungrouped            Grouped
Age    Percent       Age      Percent
18       0.2%        18-19      1.4%
19       1.2%        20-29     19.0%
20       1.4%        30-39     24.0%
21       1.3%        40-49     21.5%
And so on...

Calculating the Mean

X̄ (the mean) = Sum of Scores / Number of Scores

• So if you had the following test scores: (5, 10, 15, 10, 5, 10, 5, 15, 15, 10)

• What would be the mean?

• Answer: 10 (100/10)
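The same arithmetic as a minimal Python sketch, using the test scores from the slide (variable names are illustrative):

```python
scores = [5, 10, 15, 10, 5, 10, 5, 15, 15, 10]

total = sum(scores)          # sum of scores
count = len(scores)          # number of scores
mean = total / count         # mean = sum of scores / number of scores

print(total, count, mean)    # 100 10 10.0
```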

Calculating the Mode

• Mode = The most frequent value in a distribution

• So if you had the following test scores: (10, 5, 10, 15, 10, 10, 5, 10, 5, 15, 15, 10)

• What would be the mode?

• Answer: 10! (There are more 10’s than any other number)
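Counting how often each score occurs makes the answer easy to check. This is a minimal sketch using `collections.Counter` from the Python standard library on the scores listed above:

```python
from collections import Counter

scores = [10, 5, 10, 15, 10, 10, 5, 10, 5, 15, 15, 10]

# Tally how many times each value appears.
counts = Counter(scores)

# most_common(1) returns the single most frequent (value, frequency) pair.
mode, mode_count = counts.most_common(1)[0]

print(counts[10], counts[5], counts[15])  # 6 3 3
print(mode)                               # 10
```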

Calculating the Median

• Median = The value in the middle of a distribution

• Example: (22, 25, 34, 35, 41, 41, 46, 46, 46, 47, 49, 54, 54, 59, 60)

• Several steps to calculate the median
  – Arrange all observations in order of size, from smallest to largest
  – Determine the number of values in the distribution (N)
    • N in this case = 15
  – Plug N into the following formula to find the position of the median
    • (N + 1)/2 = (15 + 1)/2 = 16/2 = 8
  – If you get a whole number (in this case you got an “8”), count up that many values in the ordered distribution
    • (22, 25, 34, 35, 41, 41, 46, 46, 46, 47, 49, 54, 54, 59, 60)
    • The 8th value is “46”; thus, the median is “46”

• If you don’t get a whole number, then you have to add a step
  – Example: 8, 13, 14, 16, 23, 26, 28, 33, 39, 61
  – Find the N (in this case, the N is “10”)
  – (N + 1)/2 = (10 + 1)/2 = 5.5
  – Thus, counting up 5.5 gets you to the point between “23” & “26”
  – The extra step: average the two values on either side of that point
    • (23 + 26)/2 = 49/2 = 24.5
    • Thus, the median in this case is 24.5
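The steps above can be sketched as a small Python function. The function name `median_by_position` is made up for illustration; it follows the (N + 1)/2 counting rule from the slides rather than any particular library's implementation.

```python
def median_by_position(values):
    """Median via the (N + 1)/2 position rule described above."""
    if not values:
        raise ValueError("median of an empty list is undefined")

    ordered = sorted(values)      # step 1: arrange from smallest to largest
    n = len(ordered)              # step 2: determine N
    position = (n + 1) / 2        # step 3: 1-based position of the median

    if position.is_integer():
        # Whole number: count up that many values (lists are 0-based).
        return ordered[int(position) - 1]

    # Not a whole number: average the values on either side of the position.
    lower = ordered[int(position) - 1]
    upper = ordered[int(position)]
    return (lower + upper) / 2


odd_example = [22, 25, 34, 35, 41, 41, 46, 46, 46, 47, 49, 54, 54, 59, 60]
even_example = [8, 13, 14, 16, 23, 26, 28, 33, 39, 61]

print(median_by_position(odd_example))   # 46
print(median_by_position(even_example))  # 24.5
```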

Determine the Mean, Median and Mode

• 2, 2, 2, 2, 2

• 1,2,2,2,5,5,10,10,15,25

• 17, 18, 9, 9, 5

• 7, 7, 14, 3, 11, 27, 498

• 11, 67, 43, 2, 2, 2, 6

Answers

• 2, 2, 2, 2, 2
  – Mean = 10/5 = 2
  – Median: position = (5 + 1)/2 = 6/2 = 3; the 3rd value is 2
  – Mode = 2

• 1, 2, 2, 2, 5, 5, 10, 10, 15, 25
  – Mean = 77/10 = 7.7
  – Median: position = (10 + 1)/2 = 11/2 = 5.5; then (5 + 5)/2 = 10/2 = 5
  – Mode = 2

• 17, 18, 9, 9, 5
  – Mean = 58/5 = 11.6
  – Median: position = (5 + 1)/2 = 3; in order (5, 9, 9, 17, 18), the 3rd value is 9
  – Mode = 9

• 7, 7, 14, 3, 11, 27, 498
  – Mean = 567/7 = 81
  – Median: position = (7 + 1)/2 = 4; in order (3, 7, 7, 11, 14, 27, 498), the 4th value is 11
  – Mode = 7

• 11, 67, 43, 2, 2, 2, 6
  – Mean = 133/7 = 19
  – Median: position = (7 + 1)/2 = 4; in order (2, 2, 2, 6, 11, 43, 67), the 4th value is 6
  – Mode = 2
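One way to double-check these answers is with Python's standard `statistics` module. This is only a verification sketch; `multimode` (Python 3.8+) returns a list because a distribution can have more than one mode.

```python
from statistics import mean, median, multimode

datasets = [
    [2, 2, 2, 2, 2],
    [1, 2, 2, 2, 5, 5, 10, 10, 15, 25],
    [17, 18, 9, 9, 5],
    [7, 7, 14, 3, 11, 27, 498],
    [11, 67, 43, 2, 2, 2, 6],
]

# One (mean, median, modes) tuple per dataset, in the same order as above.
results = [(mean(data), median(data), multimode(data)) for data in datasets]

for data, (m, med, modes) in zip(datasets, results):
    print(data, "mean:", m, "median:", med, "mode(s):", modes)
```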

Suppose You Had the Following

1 person making $45,000


2 people making $10,000

1 person making $5,700

3 people making $5,000




What Did You Get?

• Mean = $142,500 / 25 = $5,700

• Median = $3,000 (there are 12 people above and 12 below)

• Mode = $2,000 (occurs the most frequently)

Mean Vs. Median Vs. Mode

• Generally use the mean for interval or ratio levels of measurement
  – E.g. Fahrenheit temperatures, age, income

• Look at the shape of the distribution first, however
  – If there are lots of outliers, the median might be preferable
    • E.g. income, if including Bill Gates

• Use the mode for nominal levels of measurement
  – E.g. gender
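To see why outliers favor the median, here is a small sketch that reuses one of the practice sets from earlier (7, 7, 14, 3, 11, 27, 498) and compares the mean and median with and without the extreme value. Treating 498 as "the outlier" is an assumption made for illustration.

```python
from statistics import mean, median

values = [7, 7, 14, 3, 11, 27, 498]

# Assumption for illustration: 498 is the outlier we drop.
without_outlier = [v for v in values if v != 498]

print(mean(values), median(values))                    # 81 11
print(mean(without_outlier), median(without_outlier))  # 11.5 9.0
```

Dropping one case moves the mean from 81 to 11.5, while the median only moves from 11 to 9, which is why the median is the safer summary for badly skewed variables such as income.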

Measures of Variation

• Central tendency (mean, median, mode), although valuable, only shows us a small piece of the picture
  – Relying only on central tendency may give us an incomplete and misleading picture
    • Three towns may have the same mean and median income but be very different in social character
      – One may be mostly middle class with a few rich and many poor
      – One may have an equal number of rich, middle class, & poor

• Looking at measures of variation can help us see past the limitations of central tendency

The Four Popular Measures of Variation

1 Range
  – Calculated by taking the highest value in a distribution and subtracting the lowest value, and then adding 1
  – Shows us the range of possible values that may be encountered
  – Weakness: The range can be drastically altered by just one exceptionally high or low value (known as an “outlier”)

2 Interquartile Range
  – Avoids the problem created by outliers
  – Quartiles are the points in a distribution corresponding to the first 25%, the first 50%, and the first 75% of the cases
    • The second quartile (50%) is the median
  – The interquartile range is the distance between the first quartile (25%) and the third quartile (75%)

3 Variance
  – The average of the squared deviations from the mean

Variance

X        X - X̄     (X - X̄)²      X²
3         -6          36           9
4         -5          25          16
6         -3           9          36
12         3           9         144
20        11         121         400
Total                200         605

X̄ = 9

4 Standard Deviation
  – Gives an “average distance” between all scores and the mean
  – Calculated by taking the square root of the variance
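The variance table can be worked through in Python as a rough sketch. It assumes the slide's definition (the average of the squared deviations, so dividing by N); a sample variance would divide by N - 1 instead and give 50 here. Variable names are illustrative.

```python
import math

scores = [3, 4, 6, 12, 20]                    # the X column from the table
n = len(scores)

mean = sum(scores) / n                        # 9.0
deviations = [x - mean for x in scores]       # X minus the mean
squared = [d ** 2 for d in deviations]        # squared deviations
sum_of_squares = sum(squared)                 # 200.0, the table's total

# Assumption: divide by N, following "average of the squared deviations".
variance = sum_of_squares / n                 # 40.0
std_dev = math.sqrt(variance)                 # square root of the variance

# The X² column supports a shortcut: mean of X² minus the squared mean.
shortcut = sum(x ** 2 for x in scores) / n - mean ** 2   # 605/5 - 81 = 40.0

# Range as defined on the slide: highest minus lowest, plus 1.
value_range = max(scores) - min(scores) + 1   # 18

print(variance, round(std_dev, 2), shortcut, value_range)  # 40.0 6.32 40.0 18
```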

Crosstabulation

                             Family Income
Voting      <$17,500   $17,500-$34,999   $35,000-$59,999   $60,000+
Voted          60%           73%               75%            84%
Did not        40%           27%               25%            16%
Total         100%          100%              100%           100%
(n)           (424)         (550)             (541)          (433)

Crosstabulating Variables

• Crosstabulations reveal 4 aspects of the association between 2 variables:
  – Existence: Is there a correlation?
  – Strength: How strong does the correlation appear to be?
  – Direction: Positive or negative correlation?
  – Pattern: Are changes in the percentage distribution of the dependent variable fairly regular (simply increasing or decreasing), or do they vary?
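As a rough illustration, the four aspects can be read off the "Voted" row of the income crosstabulation with a few lines of Python. The rules used here (percentage-point spread for strength, sign of the step-to-step changes for direction and pattern) are simplifying assumptions for this sketch, not formal measures of association.

```python
income_groups = ["<$17,500", "$17,500-$34,999", "$35,000-$59,999", "$60,000+"]
pct_voted = [60, 73, 75, 84]   # "Voted" row of the crosstabulation

# Change in percent voted from each income group to the next.
steps = [b - a for a, b in zip(pct_voted, pct_voted[1:])]

# Existence: do the percentages differ at all across groups?
exists = len(set(pct_voted)) > 1

# Strength (rough): percentage-point gap between highest and lowest group.
spread = max(pct_voted) - min(pct_voted)

# Direction and pattern: are the steps consistently up or consistently down?
if all(s > 0 for s in steps):
    direction, pattern = "positive", "regular"
elif all(s < 0 for s in steps):
    direction, pattern = "negative", "regular"
else:
    direction, pattern = "mixed", "irregular"

print(steps)                               # [13, 2, 9]
print(exists, spread, direction, pattern)  # True 24 positive regular
```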

Evaluating Association

• Inferential statistics are used to determine the likelihood that an association exists in the larger population from which the sample is drawn

• Thus, researchers often calculate probability levels that express how likely it is that a result is due to chance
  – E.g. p < .05 means that, if there were no association in the population, an association this strong would turn up by chance in fewer than 5 out of 100 samples, or 5%

• Generally looking for at least p < .05, but some want p < .01 or p < .001
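The slides do not say which significance test is used. One common choice for a crosstabulation is the chi-square test, sketched below on the voting-by-income table. Two assumptions are worth flagging: the cell counts are rebuilt by rounding percent × n, so they are approximate, and the critical values are the standard chi-square table entries for 3 degrees of freedom.

```python
# Percent voted and column sizes, copied from the crosstabulation table.
pct_voted = [60, 73, 75, 84]
column_n = [424, 550, 541, 433]

# Assumption: approximate counts rebuilt from rounded percentages
# (integer arithmetic, rounding to the nearest whole person).
voted = [(p * n + 50) // 100 for p, n in zip(pct_voted, column_n)]
not_voted = [n - v for n, v in zip(column_n, voted)]

observed_rows = [voted, not_voted]
total = sum(column_n)

# Chi-square: sum of (observed - expected)^2 / expected over all cells,
# where expected = row total * column total / grand total.
chi_square = 0.0
for row in observed_rows:
    row_total = sum(row)
    for observed, n in zip(row, column_n):
        expected = row_total * n / total
        chi_square += (observed - expected) ** 2 / expected

# Degrees of freedom = (rows - 1) * (columns - 1).
df = (len(observed_rows) - 1) * (len(column_n) - 1)

# Standard critical values for df = 3.
critical = {0.05: 7.815, 0.01: 11.345, 0.001: 16.266}
significant_at = [level for level, cutoff in critical.items() if chi_square > cutoff]

print(voted, not_voted)
print(round(chi_square, 1), df, significant_at)
```

With these approximate counts the statistic lands far above the .001 cutoff, so the income and voting association would be very unlikely to arise by chance alone.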

Controlling for a Third Variable

• Associations, however, do not necessarily mean causation

• Use elaboration analysis to determine whether an association is due to a causal relationship or to another variable

• Three types: intervening, extraneous, and specification

Intervening Variables

Income → Perceived Efficacy → Voting

Extraneous Variables

Education → Income

Education → Voting

Findings

• The 3 criteria
  – Time Order
    • Asked the following questions:
      – How long have they attended church? Used only those who had attended for a year or more
      – Eight questions about their deviant acts WITHIN THE PAST YEAR!!
  – Correlation
    • The data indicated a correlation between the two variables (church attendance and delinquency)
  – Spuriousness
    • Could another variable be the determining factor for delinquency instead of church attendance? (Elaboration Analysis)
      – Race
      – School
      – Grade
      – Gender

Findings

• The hypothesis was not supported!

• The correlation between church attendance and delinquency is spurious
  – The third variable of gender appears to be an extraneous variable
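A sketch of what a spurious association looks like under elaboration analysis. The counts below are HYPOTHETICAL, invented purely for illustration, and are not the study's data: they are built so that attenders look less delinquent overall, yet the difference vanishes once gender is held constant.

```python
# HYPOTHETICAL counts (not from the study):
# (gender, attends church) -> (number delinquent, group size)
counts = {
    ("female", True): (8, 80),
    ("female", False): (2, 20),
    ("male", True): (8, 20),
    ("male", False): (32, 80),
}


def delinquency_rate(cells):
    """Percent delinquent across a list of (delinquent, total) cells."""
    delinquent = sum(d for d, _ in cells)
    total = sum(t for _, t in cells)
    return 100 * delinquent / total


# Zero-order association: ignore gender.
overall = {
    attends: delinquency_rate([v for (_, a), v in counts.items() if a == attends])
    for attends in (True, False)
}

# Elaboration: look at the association separately within each gender.
by_gender = {
    gender: {
        attends: delinquency_rate([counts[(gender, attends)]])
        for attends in (True, False)
    }
    for gender in ("female", "male")
}

print(overall)     # {True: 16.0, False: 34.0}
print(by_gender)   # {'female': {True: 10.0, False: 10.0}, 'male': {True: 40.0, False: 40.0}}
```

In this made-up example gender is related to both attendance and delinquency, so the overall gap (16% vs. 34%) disappears within each gender, which is the signature of an extraneous variable.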
