MULTIVARIATE ANALYSIS
SPSS OPERATION AND
APPLICATION
STUDENT NAME: DENIZ YILMAZ
STUDENT NUMBER: M0987107
CONTENTS
1. INTRODUCTION
2. RELIABILITY ANALYSIS
3. CORRELATIONS
4. COMPARE MEANS
5. GENERAL LINEAR MODEL
6. FACTOR ANALYSIS
7. REGRESSION ANALYSIS
1. INTRODUCTION
In this study the data set consists of 70 soccer players described by 12 variables: name, nationality, income, marital status, weight, height, performance, goals, age, red cards, yellow cards, and disabilities.
SPSS allows for a great deal of flexibility in the data format.
It provides the user with a comprehensive set of procedures for data transformation and file manipulation.
It offers the researcher a large number of statistical analysis procedures commonly used in the social sciences.
2. RELIABILITY ANALYSIS
Reliability is the correlation of an item, scale, or instrument with a hypothetical one that truly measures what it is supposed to. Since the true instrument is not available, reliability is estimated in one of four ways:
Internal consistency: Cronbach's alpha
Split-half reliability: the Spearman-Brown coefficient
Test-retest reliability: the Spearman-Brown coefficient
Inter-rater reliability: intraclass correlation, of which there are six types
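As a cross-check on SPSS's Reliability Analysis output, the following is a minimal Python sketch of Cronbach's alpha computed from its definition. The DataFrame `items` and its values are hypothetical, made up purely for illustration.

```python
import numpy as np
import pandas as pd

def cronbach_alpha(items: pd.DataFrame) -> float:
    """alpha = k/(k-1) * (1 - sum of item variances / variance of the total score)."""
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)
    total_variance = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

# Made-up responses from five respondents on three scale items.
items = pd.DataFrame({"item1": [4, 3, 5, 2, 4],
                      "item2": [4, 2, 5, 3, 4],
                      "item3": [5, 3, 4, 2, 5]})
print(round(cronbach_alpha(items), 3))
```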
3. CORRELATIONS
Correlations and Nonparametric Correlations
There are two types of correlations: bivariate and partial. A bivariate correlation is a correlation between two variables. A partial correlation looks at the relationship between two variables while controlling for the effect of one or more additional variables. Pearson's product-moment correlation coefficient and Spearman's rho are examples of bivariate correlation coefficients.
Pearson's Correlation Coefficient
Pearson correlation requires only that the data are interval for it to be an accurate measure of the linear relationship between two variables.
Partial Correlation
Partial correlation is used to examine the relationship between two variables while controlling for the effects of one or more additional variables. The sign of the coefficient tells us the direction of the relationship, and the size of the coefficient describes the strength of the relationship.
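For readers who want to reproduce these coefficients outside SPSS, here is a small Python sketch using SciPy. The `age`, `income`, and `goals` arrays are simulated stand-ins for the study variables, and the partial correlation is computed from regression residuals.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
age = rng.uniform(20, 40, 70)                 # hypothetical ages of 70 players
income = 2 * age + rng.normal(0, 5, 70)       # income positively related to age
goals = 60 - age + rng.normal(0, 5, 70)       # goals negatively related to age

# Bivariate correlations: Pearson's r and Spearman's rho.
r, p = stats.pearsonr(income, goals)
rho, p_rho = stats.spearmanr(income, goals)

# Partial correlation of income and goals controlling for age, computed as the
# Pearson correlation between the two sets of regression residuals.
def residuals(y, x):
    slope, intercept = np.polyfit(x, y, 1)
    return y - (intercept + slope * x)

r_partial, p_partial = stats.pearsonr(residuals(income, age), residuals(goals, age))
print(round(r, 3), round(rho, 3), round(r_partial, 3))
```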
CORRELATIONS
Pearson correlation requires only that the data are interval for it to be an accurate measure of the linear relationship between two variables. The figure provides a matrix of the correlation coefficients for the three variables. Income is negatively related to goals. The output also shows that age is positively related to income. Finally, goals appear to be negatively related to age.
4. COMPARE MEANS
One-Sample T-Test
A one-sample t-test is a statistical procedure used to test the difference between a sample mean and a known population mean. In a one-sample t-test, we know the population mean. We draw a random sample from the population, compare the sample mean with the population mean, and make a statistical decision as to whether or not the sample mean is different from the population mean.
In our table the population mean of AGE is 1.87. If the significance value is less than the predetermined significance level, we reject the null hypothesis and conclude that the population mean and the sample mean are statistically different. If the significance value is greater than the predetermined significance level, we retain the null hypothesis and conclude that the population mean and the sample mean are not statistically different.
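A minimal sketch of the same decision rule in Python, assuming a hypothetical `age` sample and the test value 1.87 mentioned above; the numbers are invented for illustration, not the study's data.

```python
import numpy as np
from scipy import stats

# Hypothetical coded AGE values for a small sample of players.
age = np.array([1, 2, 2, 1, 3, 2, 1, 2, 3, 2])

# Test whether the sample mean differs from the hypothesised population mean of 1.87.
t_stat, p_value = stats.ttest_1samp(age, popmean=1.87)

alpha = 0.05
decision = "reject" if p_value < alpha else "fail to reject"
print(f"t = {t_stat:.3f}, p = {p_value:.3f}: {decision} the null hypothesis")
```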
COMPARE MEANS
Independent T-Test
The independent-samples t-test compares the mean scores of two groups on a given variable.
Hypotheses:
Null: The means of the two groups are not significantly different.
Alternative: The means of the two groups are significantly different.
We take a closer look at GOALS: here I want to compare the mean goals of soccer players aged between 30 and 40 with those under 30 years old.
In the independent-samples test output we see Levene's Test for Equality of Variances. This tells us whether we have met our second assumption (the two groups have approximately equal variance on the dependent variable). If Levene's test is significant (the value under "Sig." is less than .05), the two variances are significantly different. If it is not significant (Sig. is greater than .05), the two variances are not significantly different; that is, the two variances are approximately equal. Here, we see that the significance is .985, which is greater than .05, so we can assume that the variances are approximately equal. As the t value of -0.595 with 47 degrees of freedom is not significant (p = 0.555, greater than our 0.05 significance level), we fail to reject the null hypothesis:
t(47) = -0.595; p = 0.555, NS
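The same two-step logic (Levene's test, then the t-test) can be sketched in Python as follows, with made-up goal counts standing in for the two age groups.

```python
import numpy as np
from scipy import stats

# Hypothetical goal counts for two age groups (not the study's data).
goals_under_30 = np.array([112, 98, 105, 120, 101, 95, 118, 107])
goals_30_to_40 = np.array([108, 115, 99, 111, 104, 121, 97, 110])

# Levene's test for equality of variances (the assumption check described above).
lev_stat, lev_p = stats.levene(goals_under_30, goals_30_to_40)
equal_var = lev_p > 0.05

# Independent-samples t-test; use Welch's correction if the variances differ.
t_stat, p_value = stats.ttest_ind(goals_under_30, goals_30_to_40, equal_var=equal_var)
print(f"Levene p = {lev_p:.3f}, t = {t_stat:.3f}, p = {p_value:.3f}")
```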
COMPARE MEANS
Paired T-Test
The paired-sample t-test is a statistical technique used to compare two population means in the case of two samples that are correlated.
Hypotheses:
Null hypothesis: the mean difference between the paired observations is zero.
Alternative hypothesis: the mean difference between the paired observations is not zero.
The level of significance: in the paired-sample t-test, after stating the hypotheses, we choose the level of significance. In most cases the significance level is 5%.
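For completeness, a short Python sketch of a paired-sample t-test at the 5% level, using invented before/after scores for the same players.

```python
import numpy as np
from scipy import stats

# Hypothetical performance scores for the same players before and after training.
before = np.array([6.1, 5.8, 7.0, 6.4, 5.9, 6.8, 7.2, 6.0])
after = np.array([6.5, 6.0, 7.3, 6.2, 6.4, 7.1, 7.5, 6.3])

# Paired-sample t-test at the conventional 5% significance level.
t_stat, p_value = stats.ttest_rel(before, after)
print(f"t = {t_stat:.3f}, p = {p_value:.3f}, reject H0: {p_value < 0.05}")
```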
COMPARE MEANS
One-Way ANOVA
The one-way ANOVA compares the means of two or more groups based on one independent variable (or factor).
Assumptions: the groups have approximately equal variance on the dependent variable. We can check this by looking at Levene's test.
Hypotheses:
Null: There are no significant differences between the groups' mean scores.
Alternative: There is a significant difference between the groups' mean scores.
In a one-way ANOVA:
First, we look at the descriptive statistics.
Next, we see the results of Levene's Test of Homogeneity of Variance.
Lastly, we see the results of our one-way ANOVA.
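A compact Python sketch of the same sequence (Levene's test, then the one-way ANOVA) with three made-up groups of goal counts.

```python
import numpy as np
from scipy import stats

# Hypothetical goals for three groups of players (not the study's data).
group1 = np.array([95, 102, 110, 98, 105])
group2 = np.array([120, 115, 125, 118, 122])
group3 = np.array([101, 99, 108, 104, 100])

# Levene's test for homogeneity of variance, then the one-way ANOVA.
lev_stat, lev_p = stats.levene(group1, group2, group3)
f_stat, p_value = stats.f_oneway(group1, group2, group3)
print(f"Levene p = {lev_p:.3f}, F = {f_stat:.3f}, p = {p_value:.4f}")
```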
COMPARE MEANS
One-Way ANOVA
Post-hoc comparisons: we can look at the results of the post-hoc comparisons to see exactly which pairs of groups are significantly different. There are three parts in the post-hoc tests: Tukey's test, Scheffé, and LSD results.
Homogeneous subsets (the Tukey range test): gives information similar to the post-hoc tests, but in a different format. The important point is whether Sig. is greater than 0.05 or less than 0.05.
Mean plots: used to see whether the mean varies between different groups of the data.
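As a sketch of post-hoc testing outside SPSS, statsmodels' Tukey HSD routine can be used; the `goals` values and group labels below are invented for illustration.

```python
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Hypothetical goals and group labels for three groups of players.
goals = np.array([95, 102, 110, 98, 105,
                  120, 115, 125, 118, 122,
                  101, 99, 108, 104, 100])
groups = np.array(["A"] * 5 + ["B"] * 5 + ["C"] * 5)

# Tukey's HSD post-hoc comparisons: which pairs of groups differ significantly?
result = pairwise_tukeyhsd(endog=goals, groups=groups, alpha=0.05)
print(result.summary())
```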
5. GENERAL LINEAR MODEL (GLM)
General Linear Model
The general linear model can be seen as an extension of linear multiple regression for a single dependent variable, and understanding the multiple regression model is fundamental to understanding the general linear model. The general purpose of multiple regression (the term was first used by Pearson, 1908) is to quantify the relationship between several independent or predictor variables and a dependent or criterion variable.
The General Linear Model menu includes:
Univariate GLM
Multivariate GLM
Repeated Measures
Variance Components
GENERAL LINEAR MODEL (GLM)
Univariate GLM
Univariate GLM is the general linear model, now often used to implement such long-established statistical procedures as regression and members of the ANOVA family.
The Between-Subjects Factors information table in the figure is an example of GLM output. This table displays any value labels defined for the levels of the between-subjects factors. In this table, we see that GOALS = 1, 2, and 3 correspond to under 100, between 100 and 150, and over 150 goals, respectively.
GENERAL LINEAR MODEL (GLM)
Univariate GLM
Tests of Between-Subjects Effects: the Type III sums of squares in the figure show how the sums of squares and other statistics differ across the effects. The ANOVA table in the figure demonstrates that the PERFORMANCE by GOALS interaction effect is not significant at p = 0.815.
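A rough statsmodels analogue of this univariate GLM, assuming a hypothetical `players` table with a numeric outcome `income` and two categorical factors; Type III sums of squares are requested to mirror SPSS's default.

```python
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

# Hypothetical balanced data: a numeric outcome and two categorical factors.
players = pd.DataFrame({
    "income": [55, 60, 48, 72, 66, 59, 80, 75, 70, 52, 63, 68],
    "performance": ["low", "low", "low", "high", "high", "high"] * 2,
    "goals_group": ["under100"] * 6 + ["over100"] * 6,
})

# Fit the linear model with main effects and the PERFORMANCE x GOALS interaction;
# sum-to-zero contrasts keep the Type III sums of squares meaningful.
model = smf.ols("income ~ C(performance, Sum) * C(goals_group, Sum)", data=players).fit()
print(anova_lm(model, typ=3))  # analogous to SPSS's Tests of Between-Subjects Effects
```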
GENERAL LINEAR MODEL (GLM)
Multivariate GLM
Multivariate GLM is often used to implement two long-established statistical procedures: MANOVA and MANCOVA.
Tests of Between-Subjects Effects (test of overall model significance): the overall F test appears in the "Corrected Model" row, illustrated below, and answers the question, "Is the model significant for each dependent variable?" There will be an F significance level for each dependent variable. That is, the F test tests the null hypothesis that there is no difference in the means of each dependent variable across the groups formed by categories of the independent variables. For the example below, the multivariate GLM is found to be not significant for all three dependent variables.
GENERAL LINEAR MODEL (GLM)
Multivariate GLM
Between-Subjects SSCP Matrix: contains the sums of squares attributable to the model effects. These values are used in estimates of effect size.
Multivariate Tests (test of individual effects overall): in contrast to the overall F test, these answer the question, "Is each effect significant for at least one of the dependent variables?" That is, where the F test focuses on the dependents, the multivariate tests focus on the independents and their interactions.
Test statistics for the individual effects:
Hotelling's T-square
Wilks' lambda
Pillai-Bartlett trace
Roy's greatest characteristic root (GCR)
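A minimal multivariate GLM sketch with statsmodels, using an invented data frame with three dependent variables and one factor; the printout reports the same family of multivariate statistics (Wilks' lambda, Pillai's trace, Hotelling-Lawley trace, Roy's greatest root).

```python
import pandas as pd
from statsmodels.multivariate.manova import MANOVA

# Hypothetical data: three dependent variables and one grouping factor.
df = pd.DataFrame({
    "income": [55, 60, 48, 72, 66, 59, 80, 75, 70, 52, 63, 68],
    "goals":  [90, 95, 85, 120, 110, 105, 130, 125, 118, 88, 97, 102],
    "weight": [74, 78, 70, 82, 80, 76, 85, 83, 81, 72, 75, 79],
    "performance": ["low", "low", "low", "mid", "mid", "mid",
                    "high", "high", "high", "low", "mid", "high"],
})

# Multivariate GLM (MANOVA) of the three dependents on the factor "performance".
manova = MANOVA.from_formula("income + goals + weight ~ performance", data=df)
print(manova.mv_test())
```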
GENERAL LINEAR MODEL (GLM)
Multivariate GLM
Box's M tests MANOVA's assumption of homogeneity of covariance matrices using the F distribution. If p(M) < .05, the covariance matrices are significantly different and the assumption is violated. Thus we want M not to be significant, so that we retain the null hypothesis that the covariance matrices are homogeneous. In the SPSS output below, Box's M shows that the assumption of equality of covariances among the set of dependent variables is violated with respect to the groups formed by the categorical independent factor "performance".
Levene's test: SPSS also outputs Levene's test as part of MANOVA. If Levene's test is significant, the data fail the assumption of equal group error variances. In the figure, Levene's test shows that the assumption of homogeneity of error variances among the groups of "performance" is violated for two of the three dependent variables listed.
6. FACTOR ANALYSIS
Factor analysis attempts to identify underlying variables, or factors, that explain the pattern of correlations within a set of observed variables. Factor analysis is often used in data reduction to identify a small number of factors that explain most of the variance observed in a much larger number of manifest variables. Factor analysis requires that you have data in the form of correlations, so all of the assumptions that apply to correlations are relevant.
Types of factor analysis:
Principal component analysis
Common factor analysis
FACTOR ANALYSIS
Correlation Matrix: we can use the correlation matrix to check the pattern of relationships. First scan the significance values and look for any variable for which the majority of values are greater than 0.05.
All we want to see in this figure is that the determinant is not 0. If the determinant is 0, there will be computational problems with the factor analysis. In the figure, the determinant = .835, which is greater than the necessary value of 0.00001, so we can say there is no problem with these data.
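A short Python sketch of this determinant check, using simulated item responses in place of the real variables.

```python
import numpy as np
import pandas as pd

# Hypothetical item responses; `data` stands in for the variables
# entered into the factor analysis.
rng = np.random.default_rng(1)
data = pd.DataFrame(rng.normal(size=(70, 4)), columns=["q1", "q2", "q3", "q4"])

# Correlation matrix (R-matrix) and its determinant; values well above 0.00001
# suggest no serious multicollinearity problem for factor analysis.
R = data.corr()
print(R.round(3))
print("Determinant:", round(np.linalg.det(R.values), 5))
```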
FACTOR ANALYSIS
KMO and Bartlett's Test: two important parts of the output.
a. Kaiser-Meyer-Olkin Measure of Sampling Adequacy: this measure varies between 0 and 1, and values closer to 1 are better. Kaiser recommends accepting values greater than 0.5 (a minimum of 0.6 is also often suggested). For these data (figure) the KMO value is 0.472, below the suggested minimum, which means factor analysis is not appropriate for these data.
b. Bartlett's Test of Sphericity: tests the null hypothesis that the original correlation matrix is an identity matrix. For factor analysis to work we need some relationships between the variables, and if the R-matrix were an identity matrix then all correlation coefficients would be zero. Therefore we want this test to be significant. Here Bartlett's test is not significant (p = 0.061, above the 0.05 level), so again factor analysis is not appropriate.
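Bartlett's test can be reproduced from its textbook chi-square formula; the sketch below uses simulated data and is only an illustration of the computation SPSS performs.

```python
import numpy as np
import pandas as pd
from scipy import stats

def bartlett_sphericity(data: pd.DataFrame):
    """Chi-square test of H0: the correlation matrix is an identity matrix."""
    n, p = data.shape
    det_r = np.linalg.det(data.corr().values)
    chi2 = -(n - 1 - (2 * p + 5) / 6) * np.log(det_r)
    df = p * (p - 1) / 2
    return chi2, df, stats.chi2.sf(chi2, df)

# Simulated stand-in data for the items entered into the factor analysis.
rng = np.random.default_rng(2)
data = pd.DataFrame(rng.normal(size=(70, 4)), columns=["q1", "q2", "q3", "q4"])
chi2, df, p_value = bartlett_sphericity(data)
print(f"chi2({df:.0f}) = {chi2:.3f}, p = {p_value:.3f}")  # significant => proceed with factor analysis
```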
FACTOR ANALYSIS
Total Variance Explained: the figure lists the eigenvalues associated with each linear component (factor) before extraction, after extraction, and after rotation. Before extraction, SPSS has identified 4 linear components within the data set. The eigenvalue associated with each factor represents the variance explained by that particular linear component, and SPSS also displays the eigenvalue in terms of the percentage of variance explained (so factor 1 explains 35.360% of the total variance). It should be clear that the first two factors explain a relatively large amount of variance (especially factor 1), whereas subsequent factors explain only small amounts of variance.
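The eigenvalue and percent-of-variance figures in this table can be recomputed from the correlation matrix, as in this small sketch with simulated data.

```python
import numpy as np
import pandas as pd

# Hypothetical data; in practice these would be the observed items.
rng = np.random.default_rng(3)
data = pd.DataFrame(rng.normal(size=(70, 4)), columns=["q1", "q2", "q3", "q4"])

# Eigenvalues of the correlation matrix correspond to the "Total Variance Explained"
# table: each eigenvalue is the variance captured by one linear component.
eigenvalues = np.linalg.eigvalsh(data.corr().values)[::-1]   # sort descending
pct_variance = 100 * eigenvalues / eigenvalues.sum()
for i, (ev, pct) in enumerate(zip(eigenvalues, pct_variance), start=1):
    print(f"Component {i}: eigenvalue = {ev:.3f}, % of variance = {pct:.2f}")
```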
FACTOR ANALYSIS
Communalities & Component Matrix: the figure shows the table of communalities before and after extraction. Principal component analysis works on the initial assumption that all variance is common; therefore, before extraction the communalities are all 1. The communalities in the column labelled Extraction reflect the common variance in the data structure. So, we can say that 69.7% of the variance associated with question 1 is common. This output also shows the component matrix before rotation. This matrix contains the loadings of each variable onto each factor. At this stage SPSS has extracted two factors.
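A sketch of how the loadings and the extraction communalities relate to each other, computed from the correlation matrix of simulated data with two components retained; the data and column names are hypothetical.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(4)
data = pd.DataFrame(rng.normal(size=(70, 4)), columns=["q1", "q2", "q3", "q4"])

# Principal-component loadings: eigenvectors of the correlation matrix scaled
# by the square roots of their eigenvalues. Keep the first two components.
eigenvalues, eigenvectors = np.linalg.eigh(data.corr().values)
order = np.argsort(eigenvalues)[::-1][:2]
loadings = eigenvectors[:, order] * np.sqrt(eigenvalues[order])

# A variable's communality is the sum of its squared loadings on the extracted
# components (the "Extraction" column of the communalities table).
communalities = (loadings ** 2).sum(axis=1)
print(pd.DataFrame(loadings, index=data.columns, columns=["PC1", "PC2"]).round(3))
print(pd.Series(communalities, index=data.columns, name="communality").round(3))
```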
7. REGRESSION ANALYSIS
Regression analysis includes any techniques for modeling and analyzing several variables when the focus is on the relationship between a dependent variable and one or more independent variables.
In its simplest form, regression analysis involves finding the best straight-line relationship to explain how the variation in an outcome (or dependent) variable, Y, depends on the variation in a predictor (or independent, or explanatory) variable, X. Once the relationship has been estimated we are able to use the equation:
Y = β0 + β1X
The basic technique for determining the coefficients β0 and β1 is Ordinary Least Squares (OLS): values for β0 and β1 are chosen so as to minimize the sum of the squared residuals (SSR). The SSR may be written as:
SSR = Σ (Yi − β0 − β1Xi)²
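A minimal OLS sketch with SciPy's `linregress`, using simulated X and Y values; it also evaluates the SSR that the fitted coefficients minimize.

```python
import numpy as np
from scipy import stats

# Hypothetical predictor and outcome; the OLS line minimizes
# SSR = sum((Y - b0 - b1*X)^2).
rng = np.random.default_rng(5)
x = rng.uniform(20, 40, 70)                 # e.g. age
y = 5 + 1.5 * x + rng.normal(0, 3, 70)      # e.g. an outcome linearly related to age

result = stats.linregress(x, y)             # simple OLS fit of Y on X
print(f"b0 = {result.intercept:.3f}, b1 = {result.slope:.3f}")

ssr = np.sum((y - (result.intercept + result.slope * x)) ** 2)
print(f"Sum of squared residuals: {ssr:.2f}")
```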
REGRESSION ANALYSIS
Model Summary: from the model summary table we can see how well the model fits the data. This table displays R, R squared, adjusted R squared, and the standard error. R is the correlation between the observed and predicted values of the dependent variable; its values range from -1 to 1. R squared is the proportion of variation in the dependent variable explained by the regression model; its values range from 0 to 1. Adjusted R squared corrects for the fact that, when one has a large number of independent variables, R squared can become artificially high simply because more predictors have been added.
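The adjustment can be written out explicitly: adjusted R² = 1 − (1 − R²)(n − 1)/(n − p − 1), where n is the sample size and p the number of predictors. The sketch below computes both quantities for a simulated simple regression.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
x = rng.uniform(20, 40, 70)
y = 5 + 1.5 * x + rng.normal(0, 3, 70)

result = stats.linregress(x, y)
n, p = len(y), 1                               # sample size and number of predictors
r_squared = result.rvalue ** 2                 # proportion of variance explained
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - p - 1)
print(f"R = {result.rvalue:.3f}, R^2 = {r_squared:.3f}, adjusted R^2 = {adj_r_squared:.3f}")
```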
REGRESSION ANALYSIS
ANOVA: besides R squared, we can use the ANOVA (analysis of variance) table to check how well the model fits the data.
The F statistic is the regression mean square (MSR) divided by the residual mean square (MSE). If the significance value of the F statistic is small (smaller than 0.05), the independent variables do a good job of explaining the variation in the dependent variable. If the significance value of F is larger than 0.05, the independent variables do not explain the variation in the dependent variable, and the null hypothesis that all the population regression coefficients are 0 cannot be rejected.
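A sketch of the F statistic assembled from MSR and MSE for a simulated simple regression, matching the ratio reported in the regression ANOVA table.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
x = rng.uniform(20, 40, 70)
y = 5 + 1.5 * x + rng.normal(0, 3, 70)

fit = stats.linregress(x, y)
y_hat = fit.intercept + fit.slope * x
n, p = len(y), 1                                # observations and predictors

# F = MSR / MSE, the ratio reported in the regression ANOVA table.
msr = np.sum((y_hat - y.mean()) ** 2) / p       # regression mean square
mse = np.sum((y - y_hat) ** 2) / (n - p - 1)    # residual mean square
f_stat = msr / mse
p_value = stats.f.sf(f_stat, p, n - p - 1)
print(f"F({p}, {n - p - 1}) = {f_stat:.2f}, p = {p_value:.4g}")
```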
REGRESSION ANALYSIS
The Collinearity Diagnostics table in SPSS is an alternative method of assessing whether there is too much multicollinearity in the model. High eigenvalues indicate dimensions (factors) that account for a lot of the variance in the crossproduct matrix. Eigenvalues close to 0 indicate dimensions that explain little variance. Multiple eigenvalues close to 0 indicate an ill-conditioned crossproduct matrix, meaning there may be a problem with multicollinearity.
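SPSS's table is not reproduced directly here, but a rough Python analogue computes variance inflation factors and eigenvalue-based condition indices for simulated predictors; the near-collinear `x2` is deliberate, and the names and data are hypothetical.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Hypothetical predictors, with x2 deliberately close to a multiple of x1.
rng = np.random.default_rng(8)
x1 = rng.normal(size=70)
x2 = 2 * x1 + rng.normal(scale=0.1, size=70)      # nearly collinear with x1
x3 = rng.normal(size=70)
X = sm.add_constant(pd.DataFrame({"x1": x1, "x2": x2, "x3": x3}))

# Variance inflation factors: values well above 10 usually signal multicollinearity.
for i, name in enumerate(X.columns):
    print(name, round(variance_inflation_factor(X.values, i), 2))

# A rough analogue of the condition indices: the square root of the largest
# eigenvalue of X'X divided by each eigenvalue; small eigenvalues produce large
# indices and indicate an ill-conditioned crossproduct matrix.
eigvals = np.linalg.eigvalsh(X.values.T @ X.values)
print(np.sqrt(eigvals.max() / eigvals).round(2))
```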