Research Methods Chapter 8 Data Analysis
Two Types of Statistics
• Descriptive
  – Allows you to describe relationships between variables
• Inferential
  – Allows you to test hypotheses and see whether results are generalizable
Descriptive Statistics
• Often begins with univariate analysis
  – Displays the variation of a variable
  – Several ways to display variation
    • Bar chart, frequency polygon, histogram, etc.
Rates of Church Affiliation, U.S., 1776-1995
[Figure: Frequency polygon. Vertical axis: Percent of Church Membership (0 to 70). Horizontal axis: Year (1776, 1850, 1860, 1870, 1890, 1906, 1916, 1926, 1952, 1980, 1995).]
– 3 features of the shape of variation are important:
  • Central Tendency: the most common value, or the value around which cases tend to center
    – a.k.a. averages such as the mean, median, and mode
  • Variability: the degree to which cases are spread out or clustered together
  • Skewness: the extent to which cases are clustered more at one end of a distribution than at the other
    » Can be none (symmetric), positive, or negative
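Negative Skew: Test Too Easy
[Figure: frequency curve with scores clustered at the high end. Vertical axis: Freq. Horizontal axis: Score (0 to 100).]
Positive Skew: Test Too Hard
[Figure: frequency curve with scores clustered at the low end. Vertical axis: Freq. Horizontal axis: Score (0 to 100).]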
Frequency Distribution of Voting in 1992 Presidential Election
Value           Frequency   Valid Percent
Voted               1,909        71.5%
Did not vote          762        28.5%
Not eligible          183         ---
Refused                10         ---
Don’t know             38         ---
No answer               2         ---
Total               2,904       100.0%
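The "Valid Percent" column can be reproduced with a short sketch. It assumes that only "Voted" and "Did not vote" count as valid responses, which is what the dashes in the table suggest; the variable names are illustrative.

```python
# Frequencies copied from the 1992 voting table.
frequencies = {
    "Voted": 1909,
    "Did not vote": 762,
    "Not eligible": 183,
    "Refused": 10,
    "Don't know": 38,
    "No answer": 2,
}

# Assumption: only these two categories are treated as valid responses.
valid_categories = ["Voted", "Did not vote"]

# Valid percent divides by the number of valid cases, not by all 2,904 cases.
valid_total = sum(frequencies[c] for c in valid_categories)
valid_percent = {
    c: round(100 * frequencies[c] / valid_total, 1) for c in valid_categories
}

print(valid_total)    # 2671
print(valid_percent)  # {'Voted': 71.5, 'Did not vote': 28.5}
```

Dividing by all 2,904 cases instead would understate both percentages, because the ineligible and missing cases would sit in the denominator.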
Ungrouped and Grouped Age Distributions
Ungrouped            Grouped
Age    Percent       Age      Percent
18      0.2%         18-19     1.4%
19      1.2%         20-29    19.0%
20      1.4%         30-39    24.0%
21      1.3%         40-49    21.5%
And so on...
Calculating The Mean
X̄ (the mean) = Sum of Scores / Number of Scores
• So if you had the following test scores: (5, 10, 15, 10, 5, 10, 5, 15, 15, 10)
• What would be the mean?
• Answer: 10 (100/10)
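As a quick check, the same calculation can be written in a few lines of Python. This is only a sketch of the formula above; the variable names are illustrative.

```python
# Test scores from the example.
scores = [5, 10, 15, 10, 5, 10, 5, 15, 15, 10]

# Mean = sum of scores / number of scores.
total = sum(scores)
mean = total / len(scores)

print(total, len(scores), mean)  # 100 10 10.0
```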
Calculating the Mode
• Mode = The most frequent value in a distribution
• So if you had the following test scores: (10, 5, 10, 15, 10, 10, 5, 10, 5, 15, 15, 10)
• What would be the mode?
• Answer: 10 (there are more 10s than any other number)
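Finding the mode amounts to counting how often each value occurs. A minimal sketch using Python's standard `collections.Counter` (the variable names are illustrative):

```python
from collections import Counter

# Test scores from the example.
scores = [10, 5, 10, 15, 10, 10, 5, 10, 5, 15, 15, 10]

# Tally each value, then take the value with the highest count.
counts = Counter(scores)
mode, mode_count = counts.most_common(1)[0]

print(mode, mode_count)  # 10 6
```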
Calculating the Median
• Median = The value in the middle of a distribution
• Example: (22, 25, 34, 35, 41, 41, 46, 46, 46, 47, 49, 54, 54, 59, 60)
• Several steps to calculate the median
  – Arrange all observations in order of size, from smallest to largest
  – Determine the number of values in the distribution (N)
    • N in this case = 15
  – Plug N into the following formula
    • (N + 1)/2 = (15 + 1)/2 = 16/2 = 8
  – If you get a whole number (in this case, 8), count up that many values in the ordered distribution
    • (22, 25, 34, 35, 41, 41, 46, 46, 46, 47, 49, 54, 54, 59, 60)
    • The 8th value is 46, so the median is 46
• If you don’t get a whole number, you have to add a step
  • Example: 8, 13, 14, 16, 23, 26, 28, 33, 39, 61
  • Find N (in this case, N is 10)
  • (N + 1)/2 = (10 + 1)/2 = 5.5
  • Counting up 5.5 gets you to the point between the 5th value (23) and the 6th value (26)
  • The extra step: average the two values on either side of that point
    • (23 + 26)/2 = 49/2 = 24.5
    • Thus, the median in this case is 24.5
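The steps above can be sketched as a small function. The name `median_by_position` is hypothetical; the logic follows the (N + 1)/2 rule described in the slides.

```python
def median_by_position(values):
    """Median using the (N + 1) / 2 position rule."""
    ordered = sorted(values)   # Step 1: smallest to largest
    n = len(ordered)           # Step 2: N
    position = (n + 1) / 2     # Step 3: 1-based position of the median
    if position == int(position):
        # Whole number: count up to that position.
        return ordered[int(position) - 1]
    # Not a whole number: average the two values on either side.
    lower = ordered[int(position) - 1]
    upper = ordered[int(position)]
    return (lower + upper) / 2


odd_example = [22, 25, 34, 35, 41, 41, 46, 46, 46, 47, 49, 54, 54, 59, 60]
even_example = [8, 13, 14, 16, 23, 26, 28, 33, 39, 61]

print(median_by_position(odd_example))   # 46
print(median_by_position(even_example))  # 24.5
```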
Determine the Mean, Median and Mode
• 2, 2, 2, 2, 2
• 1,2,2,2,5,5,10,10,15,25
• 17, 18, 9, 9, 5
• 7, 7, 14, 3, 11, 27, 498
• 11, 67, 43, 2, 2, 2, 6
Answers
• 2, 2, 2, 2, 2
  – Mean = 10/5 = 2
  – Median position = (5 + 1)/2 = 6/2 = 3. Then: count up 3 values to get to 2
  – Mode = 2
• 1, 2, 2, 2, 5, 5, 10, 10, 15, 25
  – Mean = 77/10 = 7.7
  – Median position = (10 + 1)/2 = 11/2 = 5.5. Then: (5 + 5)/2 = 10/2 = 5
  – Mode = 2
• 17, 18, 9, 9, 5 (in order: 5, 9, 9, 17, 18)
  – Mean = 58/5 = 11.6
  – Median position = (5 + 1)/2 = 3. Then: the 3rd value is 9
  – Mode = 9
• 7, 7, 14, 3, 11, 27, 498 (in order: 3, 7, 7, 11, 14, 27, 498)
  – Mean = 567/7 = 81
  – Median position = (7 + 1)/2 = 4. Then: the 4th value is 11
  – Mode = 7
• 11, 67, 43, 2, 2, 2, 6 (in order: 2, 2, 2, 6, 11, 43, 67)
  – Mean = 133/7 = 19
  – Median position = (7 + 1)/2 = 4. Then: the 4th value is 6
  – Mode = 2
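These answers can be checked with Python's standard `statistics` module. This is just a verification sketch; the list names are illustrative.

```python
import statistics

# The five practice distributions.
datasets = [
    [2, 2, 2, 2, 2],
    [1, 2, 2, 2, 5, 5, 10, 10, 15, 25],
    [17, 18, 9, 9, 5],
    [7, 7, 14, 3, 11, 27, 498],
    [11, 67, 43, 2, 2, 2, 6],
]

# (mean, median, mode) for each distribution; the module sorts as needed.
results = [
    (statistics.mean(d), statistics.median(d), statistics.mode(d))
    for d in datasets
]

for data, (mean, median, mode) in zip(datasets, results):
    print(data, "mean:", mean, "median:", median, "mode:", mode)
```

Note how the single value 498 in the fourth distribution pulls the mean up to 81 while the median stays at 11.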
Suppose You Had the Following
1 person making $45,000
1 person making $15,000
2 people making $10,000
1 person making $5,700
3 people making $5,000
4 people making $3,700
1 person making $3,000
12 people making $2,000
What Did You Get?
• Mean
  – $142,500 / 25 = $5,700
• Median
  – $3,000 (12 people earn more and 12 people earn less)
• Mode
  – $2,000 (occurs the most frequently)
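The income example can be reproduced by expanding the list into one entry per person. A minimal sketch, with illustrative variable names:

```python
import statistics

# (income, number of people) pairs from the example.
income_counts = [
    (45000, 1), (15000, 1), (10000, 2), (5700, 1),
    (5000, 3), (3700, 4), (3000, 1), (2000, 12),
]

# Expand to one income per person (25 people in total).
incomes = [amount for amount, people in income_counts for _ in range(people)]

total_income = sum(incomes)
mean_income = total_income / len(incomes)
median_income = statistics.median(incomes)
mode_income = statistics.mode(incomes)

print(len(incomes), total_income)               # 25 142500
print(mean_income, median_income, mode_income)  # 5700.0 3000 2000
```

The three "averages" differ by a wide margin here because a few high incomes pull the mean well above what most people earn.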
Mean Vs. Median Vs. Mode
• Generally use the mean for interval or ratio levels of measurement
  – E.g., Fahrenheit temperatures, age, income
• Look at the shape of the distribution first, however
  – If there are lots of outliers, the median might be preferable
    • E.g., income if the sample includes Bill Gates
• Use the mode for nominal levels of measurement
  – E.g., gender
Measures of Variation
• Central tendency (mean, median, mode), although valuable, shows us only a small piece of the picture
  – Relying only on central tendency may give us an incomplete and misleading picture
    • Three towns may have the same mean and median income but be very different in social character
      – One may be mostly middle class with a few rich and many poor
      – One may have an equal number of rich, middle class, & poor
• Looking at measures of variation can help us see past the limitations of central tendency
The Four Popular Measures of Variation
1 Range
  – Calculated by taking the highest value in a distribution, subtracting the lowest value, and then adding 1
  – Shows us the range of possible values that may be encountered
  – Weakness: The range can be drastically altered by just one exceptionally high or low value (known as an “outlier”)
2 Interquartile Range
  – Avoids the problem created by outliers
  – Quartiles are the points in a distribution corresponding to the first 25%, the first 50%, and the first 75% of the cases
    • The second quartile (50%) is the median
  – The interquartile range is the distance between the first quartile (25%) and the third quartile (75%)
3 Variance
  – The average of the squared deviations from the mean
Variance
X        X - X̄     (X - X̄)²     X²
3         -6          36           9
4         -5          25          16
6         -3           9          36
12         3           9         144
20        11         121         400
Total                200         605
X̄ = 9
4 Standard Deviation
  – Gives an “average distance” between all scores and the mean
  – Calculated by taking the square root of the variance
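The four measures can be computed for the five scores in the variance table (3, 4, 6, 12, 20). This is a sketch under two assumptions that the slides do not settle: the variance divides by N (the "average" of the squared deviations) rather than N - 1, and the quartiles use the "inclusive" method of Python's `statistics.quantiles`; other quartile conventions give slightly different values.

```python
import math
import statistics

# Scores from the variance table.
x = [3, 4, 6, 12, 20]
n = len(x)
mean = sum(x) / n                        # 9.0

# 1. Range, using the slide's definition (highest - lowest + 1).
value_range = max(x) - min(x) + 1        # 18

# 2. Interquartile range (assumption: "inclusive" quartile method).
q1, q2, q3 = statistics.quantiles(x, n=4, method="inclusive")
iqr = q3 - q1                            # 12.0 - 4.0 = 8.0

# 3. Variance as the average squared deviation (assumption: divide by N).
sum_sq_dev = sum((v - mean) ** 2 for v in x)   # 200.0, the table total
variance = sum_sq_dev / n                      # 40.0

# The X² column presumably supports the shortcut: mean of X² minus mean².
variance_shortcut = sum(v * v for v in x) / n - mean ** 2   # 605/5 - 81 = 40.0

# 4. Standard deviation is the square root of the variance.
std_dev = math.sqrt(variance)

print(value_range, iqr, variance, round(std_dev, 2))  # 18 8.0 40.0 6.32
```

Dividing by N - 1 instead (the usual sample variance) would give 200/4 = 50.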
Crosstabulation: Voting by Family Income
                              Family Income
Voting         <$17,500   $17,500-$34,999   $35,000-$59,999   $60,000+
Voted             60%           73%               75%            84%
Did not vote      40%           27%               25%            16%
Total            100%          100%              100%           100%
(n)              (424)         (550)             (541)          (433)
Crosstabulating Variables
• Crosstabulations reveal 4 aspects of the association between 2 variables:
  – Existence: Is there a correlation?
  – Strength: How strong does the correlation appear to be?
  – Direction: Is the correlation positive or negative?
  – Pattern: Are changes in the percentage distribution of the dependent variable fairly regular (simply increasing or decreasing), or do they vary?
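The four aspects can be read off the voting-by-income crosstab with a short sketch. The summary measures used here (differences between adjacent income groups, and the gap between the highest and lowest group as a rough indicator of strength) are illustrative choices, not measures named in the slides.

```python
# Percent who voted in each family income group, from the crosstab.
income_groups = ["<$17,500", "$17,500-$34,999", "$35,000-$59,999", "$60,000+"]
percent_voted = [60, 73, 75, 84]

# Change in percent voted from each income group to the next.
differences = [b - a for a, b in zip(percent_voted, percent_voted[1:])]

# Existence: does the percentage change at all across groups?
exists = any(d != 0 for d in differences)

# Direction and pattern: do the percentages move in one direction throughout?
if all(d > 0 for d in differences):
    direction = "positive"
elif all(d < 0 for d in differences):
    direction = "negative"
else:
    direction = "mixed"
regular_pattern = direction != "mixed"

# Strength (rough indicator): gap between highest and lowest percentage.
spread = max(percent_voted) - min(percent_voted)

print(differences)                                  # [13, 2, 9]
print(exists, direction, regular_pattern, spread)   # True positive True 24
```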
Evaluating Association
• Inferential statistics are used to determine the likelihood that an association exists in the larger population from which the sample is drawn
• Thus, researchers often calculate probability levels, which give the probability that an association is due to chance
  – E.g., p < .05 means that the probability that the association is due to chance is less than 5 out of 100, or 5%
• Generally looking for p < .05 at a minimum, but some want p < .01 or p < .001
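One common way to obtain such a probability level for a crosstab is a chi-square test of independence. The slides do not name a specific test, so this is an illustrative sketch only. It also assumes cell counts, which the slides do not report: they are approximated by applying the published column percentages to each group's n, so the statistic is approximate.

```python
# Published column percentages and group sizes from the voting crosstab.
percent_voted = [60, 73, 75, 84]
group_sizes = [424, 550, 541, 433]

# Assumption: approximate cell counts, rounded to the nearest person.
voted = [(p * n + 50) // 100 for p, n in zip(percent_voted, group_sizes)]
not_voted = [n - v for n, v in zip(group_sizes, voted)]

# Expected counts if voting were unrelated to income.
total = sum(group_sizes)
share_voted = sum(voted) / total

chi_square = 0.0
for n, v, nv in zip(group_sizes, voted, not_voted):
    expected_v = n * share_voted
    expected_nv = n * (1 - share_voted)
    chi_square += (v - expected_v) ** 2 / expected_v
    chi_square += (nv - expected_nv) ** 2 / expected_nv

# Degrees of freedom = (rows - 1) * (columns - 1) = (2 - 1) * (4 - 1) = 3.
df = (2 - 1) * (len(group_sizes) - 1)

# Standard chi-square critical values for 3 degrees of freedom.
critical_values = {0.05: 7.815, 0.01: 11.345, 0.001: 16.266}
significant_at = [
    level for level, cutoff in critical_values.items() if chi_square > cutoff
]

print(voted, not_voted)
print(round(chi_square, 1), df, significant_at)
```

With these approximate counts the statistic exceeds even the .001 cutoff, so an association this large would be very unlikely to arise by chance alone.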
Controlling for a Third Variable
• Associations, however, do not necessarily mean causation
• Use elaboration analysis to determine whether an association is due to a causal relationship or to another variable
• Three types of third variable: intervening, extraneous, and specification
Intervening Variables
[Diagram: Income → Perceived Efficacy → Voting]
Extraneous Variables
[Diagram: Education → Income; Education → Voting]
Findings
• The 3 criteria for causality
  – Time Order
    • Asked the following questions:
      – How long have they attended church? Used only those who had attended for a year or more
      – Eight questions about their deviant acts WITHIN THE PAST YEAR
  – Correlation
    • The data indicated a correlation between the two variables (church attendance and delinquency)
  – Spuriousness
    • Could another variable be the determining factor for delinquency instead of church attendance? (Elaboration Analysis)
      – Race
      – School
      – Grade
      – Gender
Findings
• The hypothesis was not supported!
• The correlation between church attendance and delinquency is spurious
  – The third variable of gender appears to be an extraneous variable