Data Analysis
Data Analysis. In most social research the data analysis involves three major steps, done in roughly this order: Cleaning and organizing the data for.

Jan 11, 2016

In most social research the data analysis involves three major steps, done in roughly this order:

• Cleaning and organizing the data for analysis (Data Preparation)
• Describing the data (Descriptive Statistics)
• Testing Hypotheses and Models (Inferential Statistics)

Data Preparation

• involves checking or logging the data in; checking the data for accuracy; entering the data into the computer; transforming the data; and developing and documenting a database structure that integrates the various measures.

Descriptive Statistics

• Used to describe the basic features of the data in a study. They provide simple summaries about the sample and the measures. Together with simple graphics analysis, they form the basis of virtually every quantitative analysis of data. With descriptive statistics you are simply describing what is, what the data shows.

Inferential statistics

• investigate questions, models and hypotheses. In many cases, the conclusions from inferential statistics extend beyond the immediate data alone.

• For instance, we use inferential statistics to try to infer from the sample data what the population thinks. Or, we use inferential statistics to make judgments of the probability that an observed difference between groups is a dependable one or one that might have happened by chance in this study.

Types of Statistical Analysis

• Univariate Statistical Analysis
– Tests of hypotheses involving only one variable.
– Testing of statistical significance.
• Bivariate Statistical Analysis
– Tests of hypotheses involving two variables.
• Multivariate Statistical Analysis
– Statistical analysis involving three or more variables or sets of variables.

Statistical Analysis: Key Terms

• Hypothesis
– Unproven proposition: a supposition that tentatively explains certain facts or phenomena.
– An assumption about the nature of the world.
• Null Hypothesis
– No difference in sample and population.
• Alternative Hypothesis
– Statement that indicates the opposite of the null hypothesis.

Choosing the Appropriate Statistical Technique

• Choosing the correct statistical technique requires considering:
– Type of question to be answered
– Number of variables involved
– Level of scale measurement

Univariate analysis

Univariate analysis involves the examination across cases of one variable at a time. There are three major characteristics of a single variable that we tend to look at:
– the distribution
– the central tendency
– the dispersion

In most situations, we would describe all three of these characteristics for each of the variables in our study.

The Distribution

The distribution is a summary of the frequency of individual values or ranges of values for a variable. The simplest distribution would list every value of a variable and the number of persons who had each value.

Distributions may also be displayed using percentages. For example, you could use percentages to describe the:

• percentage of people in different income levels
• percentage of people in different age ranges
• percentage of people in different ranges of standardized test scores
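As a rough illustration of a frequency distribution with percentages, the sketch below tabulates the eight test scores used later in this deck. The variable names (`scores`, `freq_table`) are assumptions made for this example, not part of the original slides.

```python
from collections import Counter

# The eight test scores from the central tendency example in this deck.
scores = [15, 20, 21, 20, 36, 15, 25, 15]

counts = Counter(scores)
n = len(scores)

# Map each distinct value to (frequency, percentage of all cases).
freq_table = {
    value: (count, 100 * count / n)
    for value, count in sorted(counts.items())
}

print("Value  Count  Percent")
for value, (count, pct) in freq_table.items():
    print(f"{value:>5}  {count:>5}  {pct:6.1f}%")
```

For variables with many distinct values (income, age), the same idea applies after grouping the values into ranges.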

Central Tendency

The central tendency of a distribution is an estimate of the "center" of a distribution of values. There are three major types of estimates of central tendency:

• Mean
• Median
• Mode

Consider the following eight scores: 15, 20, 21, 20, 36, 15, 25, 15

The mean is the sum of the values divided by the number of values. The sum of these 8 values is 167, so the mean is 167/8 = 20.875.

The median is the score at the exact middle of the ordered set of values. If we order the 8 scores shown above, we would get: 15, 15, 15, 20, 20, 21, 25, 36

There are 8 scores, and scores #4 and #5 represent the halfway point. Since both of these scores are 20, the median is 20. If the two middle scores had different values, you would have to interpolate to determine the median.

To determine the mode, you might again order the scores as shown above, and then count each one. The most frequently occurring value is the mode. In our example, the value 15 occurs three times and is the mode.
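The three estimates of central tendency can be checked with a few lines of Python. This is a minimal sketch using the standard library `statistics` module; the variable names are chosen for this example only.

```python
import statistics

# The eight scores from the example above.
scores = [15, 20, 21, 20, 36, 15, 25, 15]

mean = statistics.mean(scores)      # sum of values / number of values
median = statistics.median(scores)  # average of the two middle scores when N is even
mode = statistics.mode(scores)      # most frequently occurring value

print(f"mean={mean}, median={median}, mode={mode}")
```

Note that `statistics.median` averages the two middle scores when the number of cases is even, which is the interpolation step mentioned above.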

Dispersion

Dispersion refers to the spread of the values around the central tendency. There are two common measures of dispersion, the range and the standard deviation.

The range is simply the highest value minus the lowest value. In our example distribution, the high value is 36 and the low is 15, so the range is 36 - 15 = 21.

The Standard Deviation

The standard deviation is a more accurate and detailed estimate of dispersion than the range, because an outlier can greatly exaggerate the range (as was true in this example, where the single outlier value of 36 stands apart from the rest of the values). The standard deviation shows the relation that the set of scores has to the mean of the sample.

15 - 20.875 = -5.875
20 - 20.875 = -0.875
21 - 20.875 = +0.125
20 - 20.875 = -0.875
36 - 20.875 = +15.125
15 - 20.875 = -5.875
25 - 20.875 = +4.125
15 - 20.875 = -5.875

Statistic        Value
N                8
Mean             20.8750
Median           20.0000
Mode             15.00
Std. Deviation   7.0799
Variance         50.1250
Range            21.00
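The summary figures above can be reproduced step by step. The sketch below assumes the sample formula, dividing the sum of squared deviations by N - 1, which is the version that matches the reported variance of 50.125. Variable names are illustrative.

```python
import math

scores = [15, 20, 21, 20, 36, 15, 25, 15]
n = len(scores)
mean = sum(scores) / n

# Step 1: deviation of each score from the mean.
deviations = [x - mean for x in scores]

# Step 2: square the deviations and add them up.
sum_of_squares = sum(d ** 2 for d in deviations)

# Step 3: divide by N - 1 (sample variance), then take the square root.
variance = sum_of_squares / (n - 1)
std_dev = math.sqrt(variance)

# The range is the highest value minus the lowest value.
value_range = max(scores) - min(scores)

print(f"variance={variance:.4f}, std_dev={std_dev:.4f}, range={value_range}")
```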

Bivariate analysis

The correlation is one of the most common and most useful statistics.

A correlation is a single number that describes the degree of relationship between two variables.

Let's assume that we want to look at the relationship between two variables, height (in inches) and self esteem.

Height is measured in inches. Self esteem is measured as the average of ten items, each rated on a 1-to-5 scale (where higher scores mean higher self esteem).

Person   Height   Self Esteem
1        68       4.1
2        71       4.6
3        62       3.8
4        75       4.4
5        58       3.2
6        60       3.1
7        67       3.8
8        68       4.1
9        71       4.3
10       69       3.7
11       68       3.5
12       67       3.2
13       63       3.7
14       62       3.3
15       60       3.4
16       63       4.0
17       65       4.1
18       67       3.8
19       63       3.4
20       61       3.6

Variable      Mean    StDev    Variance   Sum    Minimum   Maximum   Range
Height        65.4    4.4057   19.4105    1308   58        75        17
Self Esteem   3.755   0.4261   0.18155    75.1   3.1       4.6       1.5

Calculating the Correlation

The correlation for our twenty cases is .73, which is a fairly strong positive relationship.
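The calculation itself is not shown on this slide. As a minimal sketch, the standard Pearson formula (the sum of cross-products of deviations divided by the square root of the product of the two sums of squared deviations) can be applied to the twenty cases from the data table. The function name `pearson_r` is an assumption for this example.

```python
import math

# Height (inches) and self esteem for the twenty persons in the data table.
heights = [68, 71, 62, 75, 58, 60, 67, 68, 71, 69,
           68, 67, 63, 62, 60, 63, 65, 67, 63, 61]
self_esteem = [4.1, 4.6, 3.8, 4.4, 3.2, 3.1, 3.8, 4.1, 4.3, 3.7,
               3.5, 3.2, 3.7, 3.3, 3.4, 4.0, 4.1, 3.8, 3.4, 3.6]


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(heights, self_esteem)
print(f"r = {r:.2f}")
```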

Testing the Significance of a Correlation

Once you've computed a correlation, you can determine the probability that the observed correlation occurred by chance. That is, you can conduct a significance test. Most often you are interested in determining the probability that the correlation is a real one and not a chance occurrence. In this case, you are testing the mutually exclusive hypotheses:

Null Hypothesis: r = 0

Alternative Hypothesis: r ≠ 0

To test the significance of a correlation, you first need to choose the significance level. Here, we use the common significance level of alpha = .05.

Next, compute the degrees of freedom (df). The df is simply equal to N-2 or, in this example, 20-2 = 18.

Finally, decide whether you are doing a one-tailed or two-tailed test. In this example, since there is no strong prior theory to suggest whether the relationship between height and self esteem would be positive or negative, we opt for the two-tailed test.

With these three pieces of information (the significance level, alpha = .05; the degrees of freedom, df = 18; and the type of test, two-tailed), you can look up the critical value in a statistical table. Here the critical value is .4438. This means that if our correlation is greater than .4438 or less than -.4438 (remember, this is a two-tailed test), we can conclude that the odds are less than 5 out of 100 that this is a chance occurrence. Since our correlation of .73 is higher than the critical value, we conclude that it is not a chance finding and that the correlation is "statistically significant".

The null hypothesis is rejected and the alternative is accepted.
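The decision rule above can be sketched in code. The critical value .4438 is taken from the slide rather than computed. The conversion to a t statistic, t = r * sqrt(df / (1 - r^2)), is the standard equivalent form of the test and is added here as an assumption about how one might cross-check the table value.

```python
import math

r = 0.73             # observed correlation from the example
n = 20               # number of cases
r_critical = 0.4438  # two-tailed critical value for alpha = .05, df = 18 (from the slide)

df = n - 2

# Decision rule from the slide: compare |r| with the critical value.
significant = abs(r) > r_critical

# Equivalent t statistics for the observed and the critical correlation.
t_observed = r * math.sqrt(df / (1 - r ** 2))
t_critical = r_critical * math.sqrt(df / (1 - r_critical ** 2))

print(f"df={df}, t_observed={t_observed:.2f}, t_critical={t_critical:.2f}")
print("Reject the null hypothesis" if significant else "Fail to reject the null hypothesis")
```

The critical t of about 2.10 agrees with the usual two-tailed t table entry for 18 degrees of freedom at alpha = .05.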

Pearson Product-Moment Correlation Matrix for Salesperson

Other Correlations

The specific type of correlation illustrated here is known as the Pearson Product Moment Correlation. It is appropriate when both variables are measured at an interval level. However, there are a wide variety of other types of correlations for other circumstances. For instance, if you have two ordinal variables, you could use the Spearman Rank Order Correlation (rho) or the Kendall Rank Order Correlation (tau). When one measure is a continuous interval-level one and the other is dichotomous (i.e., two-category), you can use the Point-Biserial Correlation. For other situations, consult the web-based statistics selection program, Selecting Statistics, at http://trochim.human.cornell.edu/selstat/ssstart.htm.

Regression Analysis

• Simple (Bivariate) Linear Regression

– A measure of linear association that investigates straight-line relationships between a continuous dependent variable and an independent variable that is usually continuous, but can be a categorical dummy variable.

• The Regression Equation (Y = α + βX)
– Y = the continuous dependent variable
– X = the independent variable
– α = the Y intercept (where the regression line intercepts the Y axis)
– β = the slope coefficient (rise over run)
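To make the equation concrete, the sketch below estimates α and β by ordinary least squares. It reuses the height and self esteem data from the correlation example purely as an illustration; treating self esteem as the dependent variable is an assumption of this example, and the function name `fit_line` is hypothetical.

```python
# Height (X, inches) and self esteem (Y) from the correlation example.
heights = [68, 71, 62, 75, 58, 60, 67, 68, 71, 69,
           68, 67, 63, 62, 60, 63, 65, 67, 63, 61]
self_esteem = [4.1, 4.6, 3.8, 4.4, 3.2, 3.1, 3.8, 4.1, 4.3, 3.7,
               3.5, 3.2, 3.7, 3.3, 3.4, 4.0, 4.1, 3.8, 3.4, 3.6]


def fit_line(x, y):
    """Return (alpha, beta) for the least-squares line Y = alpha + beta * X."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    beta = sxy / sxx                 # slope: rise over run
    alpha = mean_y - beta * mean_x   # intercept: value of Y when X = 0
    return alpha, beta


alpha, beta = fit_line(heights, self_esteem)
predicted_at_70 = alpha + beta * 70  # predicted self esteem at 70 inches

print(f"alpha={alpha:.3f}, beta={beta:.4f}, prediction at X=70: {predicted_at_70:.2f}")
```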

The Regression Equation

• Parameter Estimate Choices
– β is indicative of the strength and direction of the relationship between the independent and dependent variable.
– α (Y intercept) is a fixed point that is considered a constant (how much Y can exist without X).
• Standardized Regression Coefficient (β)
– Estimated coefficient of the strength of relationship between the independent and dependent variables.
– Expressed on a standardized scale where higher absolute values indicate stronger relationships (range is from -1 to 1).
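One way to see what the standardized coefficient means: in a simple (one-predictor) regression, rescaling the raw slope b by the ratio of the standard deviations, b * (s_x / s_y), gives the Pearson correlation. The sketch below checks this on the height and self esteem data; the variable names are assumptions for this example.

```python
import statistics

heights = [68, 71, 62, 75, 58, 60, 67, 68, 71, 69,
           68, 67, 63, 62, 60, 63, 65, 67, 63, 61]
self_esteem = [4.1, 4.6, 3.8, 4.4, 3.2, 3.1, 3.8, 4.1, 4.3, 3.7,
               3.5, 3.2, 3.7, 3.3, 3.4, 4.0, 4.1, 3.8, 3.4, 3.6]

mean_x = statistics.mean(heights)
mean_y = statistics.mean(self_esteem)

# Raw (unstandardized) slope from least squares.
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(heights, self_esteem))
sxx = sum((x - mean_x) ** 2 for x in heights)
b = sxy / sxx

# Standardized slope: the raw slope rescaled by the two standard deviations.
beta_std = b * statistics.stdev(heights) / statistics.stdev(self_esteem)

print(f"raw slope b={b:.4f}, standardized beta={beta_std:.2f}")
```

The standardized value of about .73 matches the correlation reported earlier, and it does not depend on the units in which height is measured.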

Simple Regression Results Example

What is Multivariate Data Analysis?

• Research that involves three or more variables, or that is concerned with underlying dimensions among multiple variables, will involve multivariate statistical analysis.
– Methods analyze multiple variables or even multiple sets of variables simultaneously.
– Business or economic problems involve multivariate data analysis:
• most employee motivation research
• customer psychographic profiles
• research that seeks to identify viable market segments

Which Multivariate Approach Is Appropriate?

Classifying Multivariate Techniques

• Dependence Techniques
– Explain or predict one or more dependent variables.
– Needed when hypotheses involve distinction between independent and dependent variables.
– Types:
• Multiple regression analysis
• Multiple discriminant analysis
• Multivariate analysis of variance

Classifying Multivariate Techniques (cont’d)

• Interdependence Techniques
– Give meaning to a set of variables or seek to group things together.
– Used when researchers examine questions that do not distinguish between independent and dependent variables.
– Types:
• Factor analysis
• Cluster analysis
• Multidimensional scaling

Classifying Multivariate Techniques (cont’d)

• Influence of Measurement Scales
– The nature of the measurement scales will determine which multivariate technique is appropriate for the data.
– Selection of a multivariate technique requires consideration of the types of measures used for both independent and dependent sets of variables.
– Nominal and ordinal scales are nonmetric.
– Interval and ratio scales are metric.

Which Multivariate Dependence Technique Should I Use?

Which Multivariate Interdependence Technique Should I Use?

Interpreting Multiple Regression

• Multiple Regression Analysis

– An analysis of association in which the effects of two or more independent variables on a single, interval-scaled dependent variable are investigated simultaneously.

$Y_i = b_0 + b_1 X_1 + b_2 X_2 + b_3 X_3 + \cdots + b_n X_n + e_i$

• Dummy variable
– The way a dichotomous (two group) independent variable is represented in regression analysis by assigning a 0 to one group and a 1 to the other.

Multiple Regression Analysis

• A Simple Example

– Assume that a toy manufacturer wishes to explain store sales (dependent variable) using a sample of stores from Canada and Europe.

– Several hypotheses are offered:

• H1: Competitor’s sales are related negatively to sales.

• H2: Sales are higher in communities with a sales office than when no sales office is present.

• H3: Grammar school enrollment in a community is related positively to sales.
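A rough sketch of how such a model could be fitted follows. All numbers are synthetic and invented for illustration: the sales figures are generated without noise from hypothetical coefficients so that the fit can be checked, and they are not results from the toy manufacturer example. The sales office is coded as a 0/1 dummy variable, and the helper names (`solve`, `ols`) are assumptions. The fit solves the least-squares normal equations in plain Python.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(rows, y):
    """Least-squares coefficients [b0, b1, ..., bn] via the normal equations."""
    x = [[1.0] + list(r) for r in rows]  # prepend a column of 1s for the intercept
    k = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(x, y)) for i in range(k)]
    return solve(xtx, xty)


# Synthetic stores: (competitor sales, sales office dummy 0/1, school enrollment).
stores = [
    (100, 1, 500), (150, 0, 650), (80, 1, 400), (200, 0, 550),
    (120, 1, 300), (90, 0, 700), (170, 1, 450), (60, 0, 600),
]

# Hypothetical "true" coefficients b0..b3, chosen to agree in sign with H1-H3.
true_coefficients = [100.0, -0.5, 20.0, 0.1]
sales = [
    true_coefficients[0]
    + true_coefficients[1] * comp
    + true_coefficients[2] * office
    + true_coefficients[3] * enroll
    for comp, office, enroll in stores
]

coefficients = ols(stores, sales)
print([round(c, 3) for c in coefficients])
```

With real data the coefficients would not be recovered exactly, and each hypothesis would be judged by the sign and significance of its coefficient.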

Multiple Regression Analysis (cont’d)

• Regression Coefficients in Multiple Regression
– Partial correlation
• The correlation between two variables after taking into account the fact that they are correlated with other variables too.
• R² in Multiple Regression
– The coefficient of multiple determination in multiple regression indicates the percentage of variation in Y explained by all independent variables.
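The "percentage of variation explained" can be computed as 1 minus the ratio of residual variation to total variation. For brevity, the sketch below uses a single predictor (the height and self esteem data from the correlation example). The same formula applies when the predictions come from several independent variables. Names such as `r_squared` are assumptions for this example.

```python
heights = [68, 71, 62, 75, 58, 60, 67, 68, 71, 69,
           68, 67, 63, 62, 60, 63, 65, 67, 63, 61]
self_esteem = [4.1, 4.6, 3.8, 4.4, 3.2, 3.1, 3.8, 4.1, 4.3, 3.7,
               3.5, 3.2, 3.7, 3.3, 3.4, 4.0, 4.1, 3.8, 3.4, 3.6]

n = len(heights)
mean_x = sum(heights) / n
mean_y = sum(self_esteem) / n

# Fit the least-squares line and compute the predicted values.
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(heights, self_esteem))
sxx = sum((x - mean_x) ** 2 for x in heights)
slope = sxy / sxx
intercept = mean_y - slope * mean_x
predicted = [intercept + slope * x for x in heights]

# Residual (unexplained) and total variation in Y.
ss_residual = sum((y - p) ** 2 for y, p in zip(self_esteem, predicted))
ss_total = sum((y - mean_y) ** 2 for y in self_esteem)

r_squared = 1 - ss_residual / ss_total
print(f"R^2 = {r_squared:.3f}")
```

With one predictor, R² equals the squared correlation (.73 squared is about .53), so height accounts for roughly half of the variation in self esteem in this sample.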

Interpreting Multiple Regression Results

ANOVA (n-way) and MANOVA

• Multivariate Analysis of Variance (MANOVA)
– A multivariate technique that predicts multiple continuous dependent variables with multiple categorical independent variables.

ANOVA (n-way) and MANOVA (cont’d)

Interpreting N-way (Univariate) ANOVA

1. Examine overall model F-test result. If significant, proceed.

2. Examine individual F-tests for individual variables.

3. For each significant categorical independent variable, interpret the effect by examining the group means.

4. For each significant, continuous covariate, interpret the parameter estimate (b).

5. For each significant interaction, interpret the means for each combination.

Discriminant Analysis

• A statistical technique for predicting the probability that an object will belong in one of two or more mutually exclusive categories (dependent variable), based on several independent variables.
– To calculate discriminant scores, the linear function used is:

$Z_i = b_1 X_{1i} + b_2 X_{2i} + \cdots + b_n X_{ni}$

Factor Analysis

• A type of analysis used to discern the underlying dimensions or regularity in phenomena. Its general purpose is to summarize the information contained in a large number of variables into a smaller number of factors.

Multidimensional Scaling

• Multidimensional Scaling
– Measures objects in multidimensional space on the basis of respondents’ judgments of the similarity of objects.