Class Meeting #11 Data Analysis. Types of Statistics Descriptive Statistics used to describe things, frequently groups of people. Central Tendency

Class Meeting #11

Data Analysis

Types of Statistics

Descriptive

Statistics used to describe things, frequently groups of people.

Central Tendency

Variability

Relative Standing

Relationship

Inferential

Statistics used to make inferences and draw conclusions.

Parametric (t-test, ANOVA, multiple regression)

Non-Parametric (chi-square)

Types of Analysis

Univariate – looks at one variable at a time.

Bivariate – looks at variables two at a time.

Multivariate – looks at three or more variables at a time.

Types of Variables

Independent (or Predictor)

The variable measured first in time and from which a prediction is made. The “cause” variable.

Dependent (or Predicted)

The variable measured later in time and which is desirable to predict. The “effect” variable.

Dayton, C. M. & Stunkard, C. L. (1971). Statistics for Problem Solving. New York: McGraw-Hill Book Company.

Charles, C. M. & Mertler, C. A. (2002). Introduction to Educational Research, 4th Edition. Boston: Allyn and Bacon.

Measurement Scales

Nominal – A scale that measures data by name only, such as gender, hair color, race.

Ordinal – A scale that measures data by rank order only, such as medical condition, military rank, socioeconomic status.

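Interval – A scale that measures data by using equal intervals, such as temperature, percentage correct on a test.

Ratio – A scale that measures data by using equal intervals and a true zero point, such as height or weight.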

Nominal Scales

A number is used to represent a category.

The number has no meaning beyond serving as a label.

Categories are mutually exclusive but qualitatively different.

Ordinal Scales

A number is used to represent a category.

The number has no meaning beyond serving as a label.

Categories are mutually exclusive but qualitatively different.

The categories are ordered in a meaningful way.

Differences between consecutive units of measurement can be unequal.

Interval Scales

A number is used to represent a specific amount.

The numbers are meaningful in that they represent equal-sized units that correspond to equal increases in amounts of the underlying attribute.

The scale may include a zero value, but the zero is not meaningful. It is only a convenient starting point for measurement.

Ratio Scales

A number is used to represent a specific amount.

The numbers are meaningful in that they represent equal-sized units that correspond to equal increases in amounts of the underlying attribute.

In addition, there is a true zero on the scale that represents a true absence of the attribute being measured.
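The difference between an interval scale and a ratio scale can be sketched with temperature. The example below assumes Celsius readings (an interval scale, since 0 °C is only a convenient starting point) and converts them to kelvins (a ratio scale with a true zero). The specific readings are illustrative and not from the slides.

```python
def celsius_to_kelvin(celsius):
    """Convert a Celsius reading to kelvins (0 K is a true zero)."""
    return celsius + 273.15

# Illustrative readings, chosen only for this example.
low_c, high_c = 10.0, 20.0

# Differences are meaningful on an interval scale.
difference = high_c - low_c

# Ratios are not: 20 C is not "twice as hot" as 10 C.
naive_ratio = high_c / low_c

# On the ratio scale, the same two temperatures differ by about 3.5%.
kelvin_ratio = celsius_to_kelvin(high_c) / celsius_to_kelvin(low_c)

print(difference, naive_ratio, round(kelvin_ratio, 4))
```

The difference of 10 degrees is the same on both scales, but only the kelvin ratio describes the relative amount of the underlying attribute.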

Organizing Data

Frequency Distribution

A table showing the number of test takers who received each of the scores possible (simple frequency distribution), or the number of test takers who scored within a specified interval range (grouped frequency distribution).

X (score)    f (frequency)
9            3
8            6
7            8
6            4
5            2
4            4
3            3
2            1
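Both kinds of frequency distribution can be sketched in a few lines. The example below rebuilds the 31 raw scores from the table above, counts them back into a simple frequency distribution, and then groups them into intervals. The interval width of 2 and the helper name `grouped_frequencies` are assumptions made for illustration.

```python
from collections import Counter

# Score -> frequency, taken from the table above.
table = {9: 3, 8: 6, 7: 8, 6: 4, 5: 2, 4: 4, 3: 3, 2: 1}

# Expand the table into one raw score per test taker.
scores = [x for x, f in table.items() for _ in range(f)]

# Simple frequency distribution: one count per distinct score.
simple = Counter(scores)

def grouped_frequencies(values, width, start):
    """Count values per interval [low, low + width - 1], beginning at start."""
    groups = Counter()
    for x in values:
        low = start + ((x - start) // width) * width
        groups[(low, low + width - 1)] += 1
    return dict(groups)

# Grouped frequency distribution with an illustrative interval width of 2.
grouped = grouped_frequencies(scores, width=2, start=2)

for x in sorted(simple, reverse=True):
    print(x, simple[x])
for interval in sorted(grouped, reverse=True):
    print(interval, grouped[interval])
```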

Displaying Data

Histogram (bar graph)

Frequency Polygon (line graph)

Scatter Plots

Bar Graph

Sometimes referred to as “column graph”

Useful in presenting or comparing differences between groups

Sometimes used to show how groups differ over time

Nichol & Pexman

[Figure: example bar graph comparing East, West, and North across the 1st through 4th quarters; vertical axis from 0 to 90.]

Effective Elements for Bar Graphs

Dependent variable is on the vertical axis.

Independent variable is on the horizontal axis.

Length of vertical axis should be 2/3 to 3/4 the length of the horizontal axis.

Positive values increase to the right (horizontal axis) or up (vertical axis).

Negative values increase to the left (horizontal axis) or down (vertical axis).

Highest value on either scale is larger than the highest data value.

Bars are clearly differentiated from one another.

Bars are of the same width.

Nichol & Pexman

Line Graph

Used to present a change in one or more dependent variables as a function of an independent variable

Particularly useful in demonstrating a trend or an interaction

Must have at least 3 data points

Nichol & Pexman

Effective Elements for Line Graphs

Dependent variable is on the vertical axis.

Independent variable is on the horizontal axis.

Length of vertical axis should be 2/3 to 3/4 that of the horizontal axis.

Positive values increase to the right (horizontal) or up (vertical).

Negative values increase to the left (horizontal) or down (vertical).

No more than four lines or curves per graph.

Lines within the graph can be clearly differentiated from one another.

Nichol & Pexman

Scatter Plot

Presents the values of single events as a function of two variables scaled along the vertical and horizontal axes.

Purpose is usually to explore the relationship between two variables.

A linear relationship (high correlation) may be indicated if the data points are clustered along the diagonal within the area of the plot.

Nichol & Pexman

Effective Elements for Plots

Length of vertical axis should be 2/3 to 3/4 the length of the horizontal axis.

Zero points are indicated on the axes.

Data points are represented by symbols that are approximately the same size as lowercase letters used in text on the figure.

Nichol & Pexman

Measures of Central Tendency

Mean (arithmetic average)

Median (middle score in the distribution, better known as the 50th percentile)

Mode (most frequently occurring score)
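As a worked example, the three measures can be computed for the 31 scores in the frequency distribution shown earlier. This is a minimal sketch using Python's standard `statistics` module; the variable names are chosen for illustration.

```python
import statistics

# Score -> frequency, from the frequency distribution table.
table = {9: 3, 8: 6, 7: 8, 6: 4, 5: 2, 4: 4, 3: 3, 2: 1}
scores = [x for x, f in table.items() for _ in range(f)]

mean_score = statistics.mean(scores)      # sum of scores / number of scores
median_score = statistics.median(scores)  # middle (16th of 31) score
mode_score = statistics.mode(scores)      # most frequently occurring score

print(round(mean_score, 2), median_score, mode_score)
```

For these scores the mean is 192 / 31, or about 6.19, while the median and the mode are both 7.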

Comparing Measures of Central Tendency

The mean is more stable over time because each score in the distribution enters into the computation. It is, however, more affected by extreme scores.

The median is less affected by extreme scores.

The mode is easiest to determine but is the least stable.

Extreme Scores

Extreme scores, or “outliers,” are individual low or high values in a group (or distribution) of scores that greatly affect the value of the mean.
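The effect of a single extreme score can be sketched with a small made-up data set. The scores below are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import statistics

# Hypothetical test scores (not from the slides).
scores = [70, 72, 75, 78, 80]
with_outlier = scores + [5]  # one extremely low score is added

mean_before = statistics.mean(scores)          # 375 / 5
median_before = statistics.median(scores)
mean_after = statistics.mean(with_outlier)     # 380 / 6
median_after = statistics.median(with_outlier)

print(mean_before, median_before)
print(round(mean_after, 2), median_after)
```

Here the mean drops from 75 to about 63.33 while the median only moves from 75 to 73.5, which illustrates why the median is described as less affected by extreme scores.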

Measures of Variability

Range (R)

The difference between the highest and lowest scores in a distribution.

Standard Deviation (SD)

The estimate of variability that accompanies the mean in describing a distribution.
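Both measures can be computed for the 31 scores in the frequency distribution shown earlier. The slides do not say whether the population or the sample formula for the standard deviation is intended, so this sketch computes both and labels them; treat that choice as an open assumption.

```python
import statistics

# Score -> frequency, from the frequency distribution table.
table = {9: 3, 8: 6, 7: 8, 6: 4, 5: 2, 4: 4, 3: 3, 2: 1}
scores = [x for x, f in table.items() for _ in range(f)]

# Range: highest score minus lowest score.
score_range = max(scores) - min(scores)

# Standard deviation, population formula (divide by N).
population_sd = statistics.pstdev(scores)

# Standard deviation, sample formula (divide by N - 1).
sample_sd = statistics.stdev(scores)

print(score_range, round(population_sd, 2), round(sample_sd, 2))
```

The range depends on only two scores, whereas the standard deviation uses every score in the distribution.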

Comparing Measures of Variability

Standard deviation is more reliable than range.

Standard deviation is used in calculation of other statistics such as standard scores and error scores.

Measures of Relationship

Paired Samples t-test compares the means of two variables. It computes the difference between the two variables for each case, and tests to see if the average difference is significantly different from zero.
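The computation described above can be sketched as follows: take the difference for each case, then divide the mean difference by its standard error. The function name `paired_t_statistic` and the pretest and posttest scores are hypothetical. The sketch returns only the t statistic and degrees of freedom; looking up the p value requires a t table or a statistics package.

```python
import math
import statistics

def paired_t_statistic(first, second):
    """Return (t, degrees of freedom) for two paired lists of scores."""
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    # Difference between the two variables for each case.
    diffs = [b - a for a, b in zip(first, second)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    # Standard error of the mean difference (sample SD / sqrt(n)).
    standard_error = statistics.stdev(diffs) / math.sqrt(n)
    return mean_diff / standard_error, n - 1

# Hypothetical scores for the same four people measured twice.
pretest = [10, 12, 9, 11]
posttest = [12, 14, 10, 13]

t_value, df = paired_t_statistic(pretest, posttest)
print(round(t_value, 3), df)
```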

t-test for Independent Samples compares the mean scores of two groups on a given variable.
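A minimal sketch of the independent-samples comparison follows. It assumes the classic pooled-variance (equal variances) form of the test, which the slides do not specify; the function name and the group scores are hypothetical. Only the t statistic and degrees of freedom are computed.

```python
import math
import statistics

def independent_t_statistic(group_a, group_b):
    """Return (t, degrees of freedom) using the pooled-variance formula."""
    n_a, n_b = len(group_a), len(group_b)
    mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    df = n_a + n_b - 2
    # Pooled variance: weighted average of the two sample variances.
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    # Standard error of the difference between the two group means.
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / standard_error, df

# Hypothetical scores for two separate groups on the same variable.
group_a = [1, 2, 3]
group_b = [4, 5, 6]

t_value, df = independent_t_statistic(group_a, group_b)
print(round(t_value, 3), df)
```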


One-Way ANOVA (Analysis of Variance)

Used to test for differences among two or more independent groups.
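The F ratio behind a one-way ANOVA can be sketched as the variability between group means divided by the variability within groups. The function name `one_way_f` and the three small groups are hypothetical, and the sketch stops at the F statistic and its degrees of freedom rather than computing a p value.

```python
import statistics

def one_way_f(groups):
    """Return (F, df_between, df_within) for a list of groups of scores."""
    all_scores = [x for g in groups for x in g]
    grand_mean = statistics.mean(all_scores)
    k, n_total = len(groups), len(all_scores)

    # Between-groups sum of squares: group means around the grand mean.
    ss_between = sum(
        len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups
    )
    # Within-groups sum of squares: scores around their own group mean.
    ss_within = sum(
        (x - statistics.mean(g)) ** 2 for g in groups for x in g
    )

    df_between, df_within = k - 1, n_total - k
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within

# Hypothetical scores for three independent groups.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]

f_value, df_between, df_within = one_way_f(groups)
print(round(f_value, 3), df_between, df_within)
```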


Pearson’s Chi Square

A general test for the existence of a relationship between two or more nominal level variables.
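For two nominal variables, the chi-square statistic compares the observed counts in a contingency table with the counts expected if the variables were unrelated. The sketch below uses a hypothetical 2 x 2 table and a hypothetical function name, and returns only the statistic and degrees of freedom.

```python
def chi_square_statistic(observed):
    """Return (chi-square, degrees of freedom) for a table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)

    chi_square = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            # Expected count if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / n
            chi_square += (count - expected) ** 2 / expected

    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi_square, df

# Hypothetical counts: rows are one nominal variable, columns the other.
observed = [[10, 20],
            [20, 10]]

chi_square, df = chi_square_statistic(observed)
print(round(chi_square, 3), df)
```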

Coefficient of Correlation (r)

Expresses the degree of relationship between two sets of scores.
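A minimal sketch of Pearson's r follows: the sum of the products of paired deviations, divided by the square root of the product of the two sums of squared deviations. The function name `pearson_r` and the two sets of scores are hypothetical.

```python
import math
import statistics

def pearson_r(x_scores, y_scores):
    """Return Pearson's correlation coefficient for two paired lists."""
    if len(x_scores) != len(y_scores):
        raise ValueError("both sets of scores must have the same length")
    mean_x = statistics.mean(x_scores)
    mean_y = statistics.mean(y_scores)
    dev_x = [x - mean_x for x in x_scores]
    dev_y = [y - mean_y for y in y_scores]
    # Sum of cross-products of the paired deviations.
    sum_xy = sum(dx * dy for dx, dy in zip(dev_x, dev_y))
    # Sums of squared deviations for each set of scores.
    sum_xx = sum(dx ** 2 for dx in dev_x)
    sum_yy = sum(dy ** 2 for dy in dev_y)
    return sum_xy / math.sqrt(sum_xx * sum_yy)

# Hypothetical paired scores for five people.
x_scores = [1, 2, 3, 4, 5]
y_scores = [2, 4, 5, 4, 5]

r = pearson_r(x_scores, y_scores)
print(round(r, 3))
```

Values of r range from -1 to +1, with values near 0 indicating little or no linear relationship.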

Statistical Significance

p > .05 means that differences this large could have occurred by chance more than 5 times in 100 samples. (NOT significant)

p < .05 means that differences this large could have occurred by chance less than 5 times in 100 samples. (significant)

p < .01 means that differences this large could have occurred by chance less than 1 time in 100 samples. (more significant)
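The decision rule above can be sketched as a small function. The function name and return labels are hypothetical, and the slides do not say how to treat a p value of exactly .05; this sketch treats it as not significant.

```python
def interpret_p_value(p):
    """Label a p value using the .05 and .01 cutoffs from the slides."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("a p value must be between 0 and 1")
    if p < 0.01:
        return "significant at .01"
    if p < 0.05:
        return "significant at .05"
    # Assumption: p exactly equal to .05 falls here.
    return "not significant"

for p in (0.20, 0.03, 0.004):
    print(p, interpret_p_value(p))
```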

Error

Type I – You conclude that a relationship exists between variables when in reality there is none.

Type II – You conclude that a relationship does not exist between variables when in reality there is one.
