Basic Data Analysis Guidelines for Research Students
Isaac V. Gusukuma, Ph.D., LMSW-IPR, ACSW
University of Mary Hardin-Baylor
Social Work Program
January 30, 2012
Reproduction of any part of the guidelines is not permitted without the author’s permission.
August, 2008
Example of SPSS pie graph with percentages for the variable Religious Preference:
Brief Interpretation of an Analysis of the Variable Religious Preference Using the Mode
The 1,486 respondents in this survey most often reported they were of a Protestant faith followed by those reporting they were of the Catholic faith.
Brief Interpretation of an Analysis of the Variable Religious Preference Using Percentages
Of the 1,486 total respondents, 59.6% reported they were Protestant, followed by those reporting they were Catholic (24.7%) and Jewish (1.7%), while 9.8% reported they had no religious preference, 3.5% noted they had another religious preference, and 0.6% were “missing,” meaning they did not respond to the question.
Brief Interpretation of an Analysis of the Variable Religious Preference Using a Ratio
Slightly less than three of every five respondents reported they were of the Protestant faith.
Analysis of an Ordinal Level Variable
An ordinal variable is a categorical variable in which there is some inherent rank, hierarchy,
or order to the categories. The concept of “rank” in this instance does not imply that respondents
in a higher category are in some way better than other respondents. Instead, hierarchy or rank
means that the established categories allow the respondents to be arranged along some dimension
or in some order. Common examples of ordinal level variables include Economic Status (Low,
Middle, High), Class Standing (Senior, Junior, Sophomore, Freshman), and attitudinal variables,
such as Satisfaction with Services (High, Medium, Low).
The following statistics may be appropriate for ordinal variables/data:
o Frequencies (mode, median)
o Percentages
o Quartiles
Example of a survey question and SPSS output for an ordinal level variable
Example of a survey question and ordinal response categories:
Note on the SPSS output: Numeric values represent various income groups. See the next table.
Brief Interpretation of an Analysis of the Variable Family Income
Though the annual family income most often reported was between $40,000 and $49,999, the median annual family income for the 1,405 valid respondents was between $30,000 and $34,999. Twenty-five percent of the families reported an annual income of less than $17,500 while the upper 25% reported an annual income of more than $50,000.
Analysis of an Interval/Ratio Level Variable
Unlike nominal and ordinal variables, which are categorical, interval and ratio level variables
are numeric or scaled variables. For these variables, the numbers are ordered and ranked, and the
distance between adjacent numbers is the same everywhere on the scale (i.e., $5.00 is higher than
$4.00 by the same amount as $99.00 is higher than $98.00). Interval variables are like ratio
variables except that interval variables do not have a true zero: a value of zero does not mean
the absence of the characteristic, so ratios between values of an interval variable are not
meaningful. For example, age is a ratio level variable because an age of zero means no time has
passed since birth, and someone 20 years of age is twice as old as someone who is age 10. IQ
score is an interval level variable because an IQ of 100 does not mean a person has twice the
intelligence of a person with an IQ of 50. Statistically, however, interval and ratio level data are
treated the same way. Variables often used in social research that are interval/ratio level include
number of children in a family, number of therapy or counseling sessions, number of times
married, and number of days hospitalized.
The following statistics may be appropriate for interval/ratio variables/data:
o Frequencies (mode, median, mean)
o Quartiles
o Standard deviation
Example of a survey question and SPSS output for an interval level variable
1. How old were you when you were first married?
____ Years of age
Brief Interpretation of an Analysis of Age When First Married
When asked their age when they were first married, 590 of the 1,486 respondents gave a valid response. The average age at first marriage was 22.64 years (sd = 4.710), while the median age at first marriage was 22 years. The most frequently reported age at first marriage was 21 years. The youngest reported age at first marriage was 13 years and the oldest was 57 years. The lower twenty-five percent of the respondents reported that they first married by age 19, while the upper twenty-five percent reported they first married at age 25 or older.
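The statistics named in this interpretation (mean, median, mode, standard deviation, and quartiles) can also be computed outside SPSS. The following is a minimal sketch using Python's standard `statistics` module. The list `ages` is hypothetical data made up for illustration; it is not the survey data summarized above, and quartile values may differ slightly between software packages depending on the interpolation rule used.

```python
import statistics

# Hypothetical ages at first marriage (illustration only, NOT the survey data)
ages = [19, 21, 21, 22, 23, 25, 30]

mean_age = statistics.mean(ages)      # arithmetic average
median_age = statistics.median(ages)  # middle value of the sorted data
mode_age = statistics.mode(ages)      # most frequently reported value
sd_age = statistics.stdev(ages)       # sample standard deviation (n - 1 denominator)

# Quartile cut points: 25% of cases fall at or below q1, 75% at or below q3
q1, q2, q3 = statistics.quantiles(ages, n=4)

print(f"n = {len(ages)}, mean = {mean_age:.2f}, sd = {sd_age:.3f}")
print(f"median = {median_age}, mode = {mode_age}")
print(f"range = {min(ages)} to {max(ages)}, Q1 = {q1}, Q3 = {q3}")
```

Reporting the mean together with the standard deviation, and the median together with the quartiles, mirrors the structure of the written interpretation above.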
While univariate analysis of data is an important and helpful procedure to describe a variable,
even more information about the data can be gathered by conducting bivariate data analysis. The
next section presents a discussion on the more common types of bivariate data analysis
procedures.
Bivariate (2 variables) Data Analysis
The more common bivariate statistical analysis procedures (ones you will most likely use)
include the chi square, t-test, analysis of variance (ANOVA), and Pearson’s r (correlation). Each
procedure is discussed in the sections below, including the assumptions for each statistical
procedure, conditions for selecting a particular statistical procedure, and an SPSS example with a
brief statement describing the analysis of the SPSS output.
Chi Square Test (Test of Independence)
The chi square is a nonparametric test for the bivariate analysis of two nominal level
variables. When conducting the chi square test, the data are often displayed as a cross tabulation
(crosstab) or contingency table. Chi square is actually the name of the test statistic used to
determine whether there is a significant relationship between the two nominal variables. The
computation itself is covered in most statistics textbooks. In SPSS the crosstabs procedure
may also be used to determine the association between two ordinal level variables and between
nominal/interval level variables, though doing so requires specific and special data analysis
procedures. Consult with your professor or a statistician if you think you will need to conduct an
analysis of ordinal or interval level variables using the crosstabs procedure.
Assumptions of the chi square:
o Probability sampling design
o 80% or more of the cells in your contingency table should have an expected cell frequency of 5 or greater
o Observations are independent, meaning you should not use the chi square test for matched pairs.
o You should apply Yates’ correction factor for 2x2 contingency tables when the cell frequencies are 5 or more, but less than 10.
o You should apply Fisher’s exact test when the sample size for a 2x2 contingency table is 20 or less.
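The expected cell frequency assumption can be checked by hand before running the test. The following is a minimal sketch in plain Python, not SPSS output. The function name `expected_counts` is made up for this illustration, and the observed counts are taken from the General Happiness by Marital Status table in Example 1 later in this section.

```python
def expected_counts(observed):
    """Expected count for each cell: (row total x column total) / N."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

# Rows: very happy, pretty happy, not too happy
# Columns: married, widowed, divorced, separated, never married
observed = [
    [81, 8, 12, 1, 15],
    [143, 33, 52, 8, 83],
    [18, 8, 11, 2, 19],
]

cells = [e for row in expected_counts(observed) for e in row]
small_cells = [e for e in cells if e < 5]
percent_ok = 100 * (len(cells) - len(small_cells)) / len(cells)

print(f"{len(small_cells)} of {len(cells)} cells have an expected count below 5")
print(f"minimum expected count = {min(cells):.2f}")
print(f"{percent_ok:.1f}% of cells meet the rule (80% or more is required)")
```

For this table the check agrees with the SPSS footnote in Example 1: 2 cells (13.3%) have an expected count below 5, so the 80% guideline is met.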
SPSS procedures to request in the Crosstabs dialogue box (see Appendix 2 Crosstab and Chi
Square SPSS Screens):
o Click the “Statistics” button and check the “Chi square” box in the upper left corner
o Check the appropriate measure of association box (most likely “Phi and Cramer’s V”)
o Click the “Cells” button and check the row and/or column and/or total percentages box (based on how you prefer to look at/analyze the table)
o For Residuals, standardized residuals are recommended. Cells with standardized residual values greater than +1.0 may reveal the cells that contribute to a significant chi square test.
Measures of association for 2 nominal variables:
o Phi - for 2x2 tables
o Cramer’s V - for other tables (2x3, 3x3, etc.)
Example 1 - Chi square test
The contingency table and data analysis examines the relationship between general happiness
and marital status of the respondents. Examples of possible survey questions are noted below.
1. What is your current marital status?
___1 Married ___2 Widowed ___3 Divorced ___4 Separated ___5 Never married
2. What is your current level of general happiness with life?
___ Very happy ___ Pretty happy ___ Not too happy
SPSS Output of a Contingency Table and Chi Square Test
GENERAL HAPPINESS by MARITAL STATUS (N=494)
                                                       MARITAL STATUS
GENERAL HAPPINESS                          MARRIED  WIDOWED  DIVORCED  SEPARATED  NEVER MARRIED   Total
VERY HAPPY     Count                            81        8        12          1             15     117
               % within Marital Status       33.5%    16.3%     16.0%       9.1%          12.8%   23.7%
               Std. Residual                   3.1     -1.1      -1.4       -1.0           -2.4
PRETTY HAPPY   Count                           143       33        52          8             83     319
               % within Marital Status       59.1%    67.3%     69.3%      72.7%          70.9%   64.6%
               Std. Residual                  -1.1       .2        .5         .3             .9
NOT TOO HAPPY  Count                            18        8        11          2             19      58
               % within Marital Status        7.4%    16.3%     14.7%      18.2%          16.2%   11.7%
               Std. Residual                  -2.0       .9        .7         .6            1.4
Total          Count                           242       49        75         11            117     494
               % within Marital Status      100.0%   100.0%    100.0%     100.0%         100.0%  100.0%
Chi-Square Tests
                                Value   df   Asymp. Sig. (2-sided)
Pearson Chi-Square             29.537    8   .000
Likelihood Ratio               30.445    8   .000
Linear-by-Linear Association   22.369    1   .000
N of Valid Cases                  494
a. 2 cells (13.3%) have expected count less than 5. The minimum expected count is 1.29.
Items to note in the output:
o Standardized residuals > +1.0
o Level of significance (p) is < .05
o Number of cells (%) that have an expected frequency of less than 5
o Chi square value and degrees of freedom (df)
Symmetric Measures
                                  Value   Approx. Sig.
Nominal by Nominal   Phi           .245   .000
                     Cramer's V    .173   .000
N of Valid Cases                    494
a. Not assuming the null hypothesis.
b. Using the asymptotic standard error assuming the null hypothesis.
Brief Interpretation of the Analysis of General Happiness with the Marital Status of the Respondents
There is a significant (χ² = 29.537, df = 8, p < .01) but weak association (Cramer’s V = .173) between one’s level of general happiness and marital status. Persons who are married are significantly more likely to report they are very happy while persons who have never married are more likely to report they are not too happy.
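The values reported in this example can be reproduced from the cell counts alone. The following is a minimal sketch in plain Python that computes the Pearson chi square, its degrees of freedom, the standardized residuals, and Cramer’s V for the General Happiness by Marital Status table. It illustrates the standard textbook formulas and is not a description of how SPSS computes them internally; the variable names are chosen for this illustration only.

```python
import math

# Observed counts from the contingency table in Example 1
# Rows: very happy, pretty happy, not too happy
# Columns: married, widowed, divorced, separated, never married
observed = [
    [81, 8, 12, 1, 15],
    [143, 33, 52, 8, 83],
    [18, 8, 11, 2, 19],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

chi_square = 0.0
std_residuals = []
for i, row in enumerate(observed):
    residual_row = []
    for j, count in enumerate(row):
        expected = row_totals[i] * col_totals[j] / n
        chi_square += (count - expected) ** 2 / expected
        # Standardized residual: (observed - expected) / sqrt(expected)
        residual_row.append((count - expected) / math.sqrt(expected))
    std_residuals.append(residual_row)

n_rows, n_cols = len(observed), len(observed[0])
df = (n_rows - 1) * (n_cols - 1)
cramers_v = math.sqrt(chi_square / (n * min(n_rows - 1, n_cols - 1)))

print(f"chi square = {chi_square:.3f}, df = {df}, Cramer's V = {cramers_v:.3f}")
for residual_row in std_residuals:
    print([round(r, 1) for r in residual_row])
```

The largest positive standardized residual belongs to the married, very happy cell, which is the cell the interpretation above singles out.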
Example 2 - Chi square test
The table and data analysis examines the relationship between favoring or opposing the death
penalty for the crime of murder and gender of the respondent. Possible survey questions are also
provided below.
1. What is your sex?
___1 Male ___2 Female
2. Do you favor or oppose the death penalty for the crime of murder?
___1 Favor ___2 Oppose
SPSS Output for a Cross Tabulation and Chi Square Test
FAVOR OR OPPOSE THE DEATH PENALTY FOR MURDER by SEX
                                Value   df   Asymp. Sig. (2-sided)   Exact Sig. (2-sided)   Exact Sig. (1-sided)
Likelihood Ratio                5.385    1   .020
Fisher's Exact Test                                                  .025                   .013
Linear-by-Linear Association    5.291    1   .021
N of Valid Cases                  558
a. Computed only for a 2x2 table
b. 0 cells (.0%) have expected count less than 5. The minimum expected count is 55.31.
Symmetric Measures
                                  Value   Approx. Sig.
Nominal by Nominal   Phi           .097   .021
                     Cramer's V    .097   .021
N of Valid Cases                    558
a. Not assuming the null hypothesis.
b. Using the asymptotic standard error assuming the null hypothesis.
Brief Interpretation of the Analysis of Attitude Toward the Death Penalty for Murder with Respondent’s Sex
There is a significant (χ² = 5.301, df = 1, p = .021) but weak association (Phi = .097) between a person favoring or opposing the death penalty for the crime of murder and the person’s sex. Women are significantly more likely to oppose the death penalty for the crime of murder than are men.
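For a 2x2 table, Phi and the significance level follow directly from the chi square value and the number of valid cases. The following is a minimal sketch in plain Python using the two figures reported above (chi square = 5.301, N = 558). The p value formula applies only when df = 1, where the chi square statistic is the square of a standard normal deviate; the variable names are chosen for this illustration.

```python
import math

chi_square = 5.301  # Pearson chi square from the interpretation above
n = 558             # N of valid cases from the SPSS output

# Phi for a 2x2 table: square root of chi square divided by N
phi = math.sqrt(chi_square / n)

# Upper-tail probability of chi square with df = 1
p_value = math.erfc(math.sqrt(chi_square / 2))

print(f"Phi = {phi:.3f}, p = {p_value:.3f}")
```

Both results round to the values shown in the SPSS output (Phi = .097, p = .021), which illustrates why a statistically significant chi square can still reflect a weak association when the sample is large.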
Statistics is like grout - the word feels decidedly unpleasant in the mouth, but it describes something
essential for holding a mosaic in place.
- Ramsey & Schafer -
Appendix 1 Frequencies SPSS Screens
Highlight and move the variables from the variable list to the “Variable(s)” box by clicking the arrowhead.
Click the “OK” button to run Frequencies. Output for the first variable, “quality,” is noted below.
quality Quality of Svc
                      Frequency   Percent   Valid Percent   Cumulative Percent
Valid   1 Poor                1        .9              .9                   .9
        2 Fair                2       1.9             1.9                  2.8
        3 Good               17      15.9            15.9                 18.7
        4 Excellent          87      81.3            81.3                100.0
        Total               107     100.0           100.0
Items labeled in the output: Variable Name (quality), Variable Label (Quality of Svc), Values (1 Poor through 4 Excellent)
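The Percent and Cumulative Percent columns in this Frequencies output can be reproduced from the frequency counts. The following is a minimal sketch in plain Python using the counts shown above for the variable “quality”. The variable names are chosen for this illustration, and the median category is found as the first category whose cumulative percent reaches 50%.

```python
# Frequency counts for "quality" (Quality of Svc), in ranked order
counts = {"Poor": 1, "Fair": 2, "Good": 17, "Excellent": 87}
total = sum(counts.values())

rows = []
running = 0
for label, freq in counts.items():
    running += freq
    percent = round(100 * freq / total, 1)
    cumulative = round(100 * running / total, 1)
    rows.append((label, freq, percent, cumulative))

# Mode: the category with the highest frequency
mode_category = max(counts, key=counts.get)
# Median: the first category whose cumulative percent reaches 50%
median_category = next(label for label, _, _, cum in rows if cum >= 50)

for label, freq, percent, cumulative in rows:
    print(f"{label:<10}{freq:>5}{percent:>8}{cumulative:>8}")
print(f"mode = {mode_category}, median = {median_category}")
```

Because the categories are ordinal, the mode and median are reported as categories rather than as numeric averages.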
Appendix 2 Crosstab and Chi Square SPSS Screens
Click “Statistics” to get the “Crosstabs: Statistics” dialogue box
Click “Cells” to get the “Crosstabs: Cell Display” dialogue box
References
Holcomb, Z. C. (2006). SPSS basics: Techniques for a first course in statistics. Glendale, CA:
Pyrczak Publishing.
Kachigan, S. K. (1986). Statistical analysis: An interdisciplinary introduction to univariate and
multivariate methods. New York: Radius Press.
Keller, G. (2001). Applied statistics with Microsoft® Excel. Pacific Grove, CA: Duxbury.
Norusis, M. J. (2011). IBM SPSS Statistics 19 guide to data analysis. Upper Saddle River, NJ:
Prentice Hall.
Ramsey, F. L., & Schafer, D. W. (2002). The statistical sleuth: A course in methods of data analysis
(2nd ed.). Belmont, CA: Duxbury Press.
Rubin, A., & Babbie, E. (2008). Research methods for social work (7th ed.). Belmont, CA: