Class Meeting #11 Data Analysis
Dec 27, 2015
Types of Statistics
Descriptive – Statistics used to describe things, frequently groups of people.
  Central Tendency
  Variability
  Relative Standing
  Relationship
Inferential – Statistics used to make inferences and draw conclusions.
  Parametric (t-test, ANOVA, multiple regression)
  Non-Parametric (chi-square)
Types of Analysis
Univariate – looks at one variable at a time.
Bivariate – looks at variables two at a time.
Multivariate – looks at three or more variables at a time.
Types of Variables
Independent (or Predictor) – The variable measured first in time and from which a prediction is made. The “cause” variable.
Dependent (or Predicted) – The variable measured later in time and which is desirable to predict. The “effect” variable.
Dayton, C. M. & Stunkard, C. L. (1971). Statistics for Problem Solving. New York: McGraw-Hill Book Company.
Charles, C. M. & Mertler, C. A. (2002). Introduction to Educational Research, 4th Edition. Boston: Allyn and Bacon.
Measurement Scales
Nominal – A scale that measures data by name only, such as gender, hair color, race.
Ordinal – A scale that measures data by rank order only, such as medical condition, military rank, socioeconomic status.
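Interval – A scale that measures data by using equal intervals, such as temperature, percentage correct on a test.
Ratio – A scale that measures data by using equal intervals and a true zero, such as height or weight.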
Nominal Scales
A number is used to represent a category.
The number has no meaning beyond serving as a label.
Categories are mutually exclusive but qualitatively different.
Ordinal Scales
A number is used to represent a category.
The number has no meaning beyond serving as a label.
Categories are mutually exclusive but qualitatively different.
The categories are ordered in a meaningful way.
Differences between consecutive units of measurement can be unequal.
Interval Scales
A number is used to represent a specific amount.
The numbers are meaningful in that they represent equal-sized units that correspond to equal increases in amounts of the underlying attribute.
The scale may include a zero value, but the zero is not meaningful. It is only a convenient starting point for measurement.
Ratio Scales
A number is used to represent a specific amount.
The numbers are meaningful in that they represent equal-sized units that correspond to equal increases in amounts of the underlying attribute.
In addition, there is a true zero on the scale that represents a true absence of the attribute being measured.
Organizing Data
Frequency Distribution – A table showing the number of test takers who received each of the scores possible (simple frequency distribution), or the number of test takers who scored within a specified interval range (grouped frequency distribution).
X (score)    f (frequency)
9            3
8            6
7            8
6            4
5            2
4            4
3            3
2            1
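The two kinds of frequency distribution can be sketched in a few lines of Python. This is a minimal illustration using the scores from the table above. The helper name grouped_frequency and the interval width of 2 are assumptions chosen for the example, not part of the original slides.

```python
from collections import Counter

# Scores and frequencies taken from the table above.
table = {9: 3, 8: 6, 7: 8, 6: 4, 5: 2, 4: 4, 3: 3, 2: 1}

# Expand the table into a flat list of raw scores (one entry per test taker).
scores = [x for x, f in table.items() for _ in range(f)]

# Simple frequency distribution: how many test takers received each score.
simple = Counter(scores)


def grouped_frequency(values, width):
    """Count how many values fall in each interval of the given width.

    Hypothetical helper for illustration; intervals start at multiples of width.
    """
    groups = Counter()
    for v in values:
        low = (v // width) * width
        groups[(low, low + width - 1)] += 1
    return dict(sorted(groups.items(), reverse=True))


# Grouped frequency distribution with an assumed interval width of 2.
grouped = grouped_frequency(scores, 2)

print("X  f")
for x in sorted(simple, reverse=True):
    print(f"{x}  {simple[x]}")
print(grouped)
```

The grouped version trades detail for compactness: four intervals summarize the same 31 test takers that the simple distribution lists score by score.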
Displaying Data
Histogram (bar graph)
Frequency Polygon (line graph)
Scatter Plots
Bar Graph
Sometimes referred to as “column graph”
Useful in presenting or comparing differences between groups
Sometimes used to show how groups differ over time
Nichol & Pexman
[Figure: example bar graph. Vertical axis: 0 to 90. Horizontal axis: 1st Qtr, 2nd Qtr, 3rd Qtr, 4th Qtr. Series: East, West, North.]
Effective Elements for Bar Graphs
Dependent variable is on the vertical axis.
Independent variable is on the horizontal axis.
Length of vertical axis should be 2/3 to 3/4 the length of the horizontal axis.
Positive values increase to the right (horizontal axis) or up (vertical axis).
Negative values increase to the left (horizontal axis) or down (vertical axis).
Highest value on either scale is larger than the highest data value.
Bars are clearly differentiated from one another.
Bars are of the same width.
Nichol & Pexman
Line Graph
Used to present a change in one or more dependent variables as a function of an independent variable
Particularly useful in demonstrating a trend or an interaction
Must have at least 3 data points
Nichol & Pexman
Effective Elements for Line Graphs
Dependent variable is on the vertical axis.
Independent variable is on the horizontal axis.
Length of vertical axis should be 2/3 to 3/4 that of the horizontal axis.
Positive values increase to the right (horizontal) or up (vertical).
Negative values increase to the left (horizontal) or down (vertical).
No more than four lines or curves per graph.
Lines within the graph can be clearly differentiated from one another.
Nichol & Pexman
Scatter Plot
Presents values of single events as a function of two variables scaled along the vertical and horizontal axes.
Purpose is usually to explore the relationship between two variables.
A linear relationship (high correlation) may be indicated if the data points are clustered along the diagonal within the area of the plot.
Nichol & Pexman
Effective Elements for Plots
Length of vertical axis should be 2/3 to 3/4 the length of the horizontal axis.
Zero points are indicated on the axes.
Data points are represented by symbols that are approximately the same size as lowercase letters used in text on the figure.
Nichol & Pexman
Measures of Central Tendency
Mean (arithmetic average)
Median (middle score in the distribution, better known as the 50th percentile)
Mode (most frequently occurring score)
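As a rough illustration, the three measures can be computed with Python's standard statistics module. The data are the 31 scores from the frequency distribution shown earlier in these notes. Reusing that data set is a choice made for this example.

```python
import statistics

# Scores and frequencies from the frequency distribution earlier in the notes.
table = {9: 3, 8: 6, 7: 8, 6: 4, 5: 2, 4: 4, 3: 3, 2: 1}
scores = [x for x, f in table.items() for _ in range(f)]

mean = statistics.mean(scores)      # arithmetic average: sum of scores / number of scores
median = statistics.median(scores)  # middle score of the sorted distribution
mode = statistics.mode(scores)      # most frequently occurring score

print(f"mean = {mean:.2f}, median = {median}, mode = {mode}")
```

For this distribution the median and mode coincide at 7, while the mean (192 / 31, about 6.19) is pulled lower by the handful of low scores.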
Comparing Measures of Central Tendency
The mean is more stable over time because each score in the distribution enters into the computation. It is, however, more affected by extreme scores.
The median is less affected by extreme scores.
The mode is easiest to determine but is the least stable.
Extreme Scores
Extreme scores, or “outliers,” are individual low or high values in a group (or distribution) of scores that greatly affect the value of the mean.
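A small sketch makes the effect of an extreme score concrete. The scores below are hypothetical values invented for this example. Adding a single outlier of 40 roughly doubles the mean but barely moves the median.

```python
import statistics

# Hypothetical scores, invented for illustration only.
scores = [4, 5, 5, 6, 7]
with_outlier = scores + [40]  # same scores plus one extreme high value

mean_before = statistics.mean(scores)
median_before = statistics.median(scores)
mean_after = statistics.mean(with_outlier)
median_after = statistics.median(with_outlier)

print(f"without outlier: mean = {mean_before:.2f}, median = {median_before}")
print(f"with outlier:    mean = {mean_after:.2f}, median = {median_after}")
```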
Measures of Variability
Range (R)
The difference between the highest and lowest scores in a distribution.
Standard Deviation (SD)
The estimate of variability that accompanies the mean in describing a distribution.
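Both measures can be sketched with the standard library, again using the 31 scores from the frequency distribution earlier in these notes. The slides do not say whether SD means the sample or the population formula, so both are shown. Which one applies is an assumption the reader must settle for their own data.

```python
import statistics

# Scores and frequencies from the frequency distribution earlier in the notes.
table = {9: 3, 8: 6, 7: 8, 6: 4, 5: 2, 4: 4, 3: 3, 2: 1}
scores = [x for x, f in table.items() for _ in range(f)]

# Range: difference between the highest and lowest scores.
score_range = max(scores) - min(scores)

# Standard deviation, sample formula (divides by n - 1).
sd_sample = statistics.stdev(scores)

# Standard deviation, population formula (divides by n).
sd_population = statistics.pstdev(scores)

print(f"R = {score_range}")
print(f"SD (sample) = {sd_sample:.3f}, SD (population) = {sd_population:.3f}")
```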
Comparing Measures of Variability
Standard deviation is more reliable than range because it uses every score in the distribution rather than only the two extremes.
Standard deviation is used in calculation of other statistics such as standard scores and error scores.
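As one example of such a derived statistic, a standard score (z-score) expresses each raw score as the number of standard deviations it lies above or below the mean. The sketch below uses hypothetical scores and the population formula for SD. Both are assumptions made for the example.

```python
import statistics

# Hypothetical scores, invented for illustration only.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(scores)
sd = statistics.pstdev(scores)  # population SD; an assumption for this sketch

# z = (raw score - mean) / SD
z_scores = [(x - mean) / sd for x in scores]

for x, z in zip(scores, z_scores):
    print(f"score {x}: z = {z:+.2f}")
```

A z-score of +2.00 for the score of 9 means it sits two standard deviations above the mean of this hypothetical group.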
Measures of Relationship
Paired Samples t-test compares the means of two variables. It computes the difference between the two variables for each case, and tests to see if the average difference is significantly different from zero.
t-test for Independent Samples compares the mean scores of two groups on a given variable.
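The two tests can be sketched as hand computations of the t statistic. This is a minimal illustration, not a full test: it stops at the t value, which would then be compared against a critical value from a t table (n - 1 degrees of freedom for the paired test, n1 + n2 - 2 for the independent test). The function names and data are hypothetical, and the independent-samples version assumes equal variances (the pooled formula).

```python
import math
import statistics


def paired_t(first, second):
    """t statistic for paired samples: mean difference over its standard error."""
    diffs = [b - a for a, b in zip(first, second)]
    n = len(diffs)
    standard_error = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / standard_error


def independent_t(group1, group2):
    """t statistic for two independent groups, assuming equal variances."""
    n1, n2 = len(group1), len(group2)
    pooled_variance = (
        (n1 - 1) * statistics.variance(group1)
        + (n2 - 1) * statistics.variance(group2)
    ) / (n1 + n2 - 2)
    standard_error = math.sqrt(pooled_variance * (1 / n1 + 1 / n2))
    return (statistics.mean(group1) - statistics.mean(group2)) / standard_error


# Hypothetical data: the same four cases measured twice.
pretest = [5, 6, 7, 8]
posttest = [6, 8, 8, 10]
t_paired = paired_t(pretest, posttest)

# Hypothetical data: two separate groups measured on one variable.
group_a = [1, 2, 3, 4, 5]
group_b = [3, 4, 5, 6, 7]
t_independent = independent_t(group_a, group_b)

print(f"paired t = {t_paired:.3f}")
print(f"independent t = {t_independent:.3f}")
```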
One-Way ANOVA (Analysis of Variance)
Used to test for differences among two or more independent groups.
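The F statistic behind a one-way ANOVA compares variability between group means with variability within groups. The sketch below computes it by hand for three hypothetical groups. The function name and data are invented for the example, and the resulting F would still need to be compared against a critical value with (k - 1, N - k) degrees of freedom.

```python
import statistics


def one_way_anova_f(*groups):
    """F statistic for a one-way ANOVA: between-group over within-group mean square."""
    all_values = [v for g in groups for v in g]
    grand_mean = statistics.mean(all_values)
    k = len(groups)      # number of groups
    n = len(all_values)  # total number of observations

    # Variability of the group means around the grand mean.
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)

    # Variability of individual scores around their own group mean.
    ss_within = sum((v - statistics.mean(g)) ** 2 for g in groups for v in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within


# Hypothetical scores for three independent groups.
f_value = one_way_anova_f([1, 2, 3], [2, 3, 4], [5, 6, 7])
print(f"F = {f_value:.2f}")
```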
Pearson’s Chi Square
A general test for the existence of a relationship between two or more nominal level variables.
Coefficient of Correlation (r)
Expresses the degree of relationship between two sets of scores.
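Both statistics above can be sketched as short hand computations. The function names and data are hypothetical. The chi-square function handles the common two-variable case (a contingency table of observed counts) and returns only the statistic, not a p value.

```python
import math


def chi_square(observed):
    """Pearson chi-square statistic for a table of observed counts (rows x columns)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            # Expected count if the two variables were unrelated.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (count - expected) ** 2 / expected
    return statistic


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two sets of paired scores."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical 2 x 2 table of counts for two nominal variables.
chi2 = chi_square([[10, 20], [20, 10]])

# Hypothetical paired scores.
r = pearson_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])

print(f"chi-square = {chi2:.3f}")
print(f"r = {r:.3f}")
```

Values of r range from -1 to +1, with values near zero indicating little linear relationship between the two sets of scores.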
Statistical Significance
p > .05 means that differences this large could have occurred by chance more than 5 times in 100 samples. (NOT significant)
p < .05 means that differences this large could have occurred by chance fewer than 5 times in 100 samples. (significant)
p < .01 means that differences this large could have occurred by chance less than 1 time in 100 samples. (significant at a stricter level)
Error
Type I – You conclude that a relationship exists between variables when in reality there is none.
Type II – You conclude that a relationship does not exist between variables when in reality there is one.