DESCRIPTIVE STATISTICS
Adapted from the Presentation of Mrs. Zennifer L. Oberio
Presented by: Balbido, Aileen U.; Latap, Kenneth John R.; Tejuco, Kerwin Chester C.
STATISTICS
The study of how to collect, organize, analyze, and interpret numerical information
DESCRIPTIVE STATISTICS
STATISTICS: Two Categories
• Descriptive Methods: Used to summarize or describe a data set (a set of measurements obtained on some variable)
• Inferential Methods: Used to draw conclusions or to make inferences about a population based on the observation of a sample
DESCRIPTIVE STATISTICS
Nature of Statistical Data/Levels of Measurement
1. Nominal
2. Ordinal
3. Interval
4. Ratio
Importance: The nature of a set of data may suggest the use of particular statistical techniques
DESCRIPTIVE STATISTICS
Nature of Statistical Data/ Levels of Measurement
1. NOMINAL
Categories or qualities
Numbers are used simply as labels for groups or classes
Numbers convey no numerical information
Example: 1 for YES, 2 for NO; 1 – Red, 2 – Yellow, 3 – Green
DESCRIPTIVE STATISTICS
Nature of Statistical Data/ Levels of Measurement
2. ORDINAL
Data may be ordered using inequality according to their size or quality
Example: Mohs’ Scale of Hardness: 1 – Talc, 2 – Gypsum, 3 – Calcite, 4 – Fluorite
DESCRIPTIVE STATISTICS
Nature of Statistical Data/ Levels of Measurement
2. ORDINAL
Example: Mohs’ Scale of Hardness: 1 – Talc, 2 – Gypsum, 3 – Calcite, 4 – Fluorite
Data may be ranked (but there is no indication of how much of the variable exists)
3 > 2: Calcite is harder than gypsum.
Differences and ratios between data values are meaningless. Neither of the following conclusions is valid:
2 – 1 = 4 – 3: "The difference in hardness between gypsum and talc is equal to the difference in hardness between fluorite and calcite."
4 ÷ 2 = 2: "Fluorite is twice as hard as gypsum."
DESCRIPTIVE STATISTICS
Nature of Statistical Data/ Levels of Measurement
3. INTERVAL
Differences between data values represent equal amounts in the magnitude of the variable measured
No true zero (a value of zero does not mean the complete absence of the variable measured)
Example: Temperatures in degrees Fahrenheit and degrees Celsius
DESCRIPTIVE STATISTICS
Nature of Statistical Data/ Levels of Measurement
3. INTERVAL
Example: Temperatures in degrees Fahrenheit and degrees Celsius
Ranking and taking differences are permitted.
100˚F > 98˚F: 100˚F is warmer than 98˚F.
100˚F – 98˚F = 52˚F – 50˚F: The same amount of heat is required to raise the temperature of an object from 98˚F to 100˚F and from 50˚F to 52˚F.
Ratios are meaningless.
It is not valid to say that 100˚F is twice as hot as 50˚F: in degrees Celsius, 100˚F is 37.8˚C and 50˚F is 10˚C, and 37.8 is not twice 10.
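The point about differences and ratios on an interval scale can be checked numerically. The following is a minimal sketch; the function name `fahrenheit_to_celsius` is introduced here for illustration and is not part of the original presentation. It shows that equal differences stay equal after converting to Celsius, while the ratio of the same two temperatures depends on the scale used.

```python
import math


def fahrenheit_to_celsius(deg_f):
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (deg_f - 32.0) * 5.0 / 9.0


# Differences: a 2-degree Fahrenheit step has the same size anywhere on the scale,
# and the two steps are still equal to each other after conversion to Celsius.
step_high_c = fahrenheit_to_celsius(100.0) - fahrenheit_to_celsius(98.0)
step_low_c = fahrenheit_to_celsius(52.0) - fahrenheit_to_celsius(50.0)
differences_agree = math.isclose(step_high_c, step_low_c)

# Ratios: the same two temperatures give different ratios on the two scales.
ratio_f = 100.0 / 50.0
ratio_c = fahrenheit_to_celsius(100.0) / fahrenheit_to_celsius(50.0)

print(differences_agree)
print(ratio_f, round(ratio_c, 2))
```

Because the ratio changes when only the unit changes, it says nothing about the temperatures themselves, which is why ratios are not used with interval data.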
DESCRIPTIVE STATISTICS
Nature of Statistical Data/ Levels of Measurement
4. RATIO
Has a true zero as a starting point for all measurements.
Example: length, height, elapsed time, volume
Taking ratios and differences, and ranking are permitted
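The four levels can be summarized as a small lookup of which operations the slides treat as meaningful at each level. This is only an illustrative sketch: the names `PERMITTED` and `is_permitted` are hypothetical, and the operation labels simply restate the wording used above.

```python
# Operations described as meaningful at each level of measurement.
PERMITTED = {
    "nominal": set(),                               # numbers are labels only
    "ordinal": {"ranking"},                         # order, but not "how much"
    "interval": {"ranking", "differences"},         # no true zero, so no ratios
    "ratio": {"ranking", "differences", "ratios"},  # true zero
}


def is_permitted(level, operation):
    """Return True if the operation is meaningful for data at the given level."""
    return operation in PERMITTED[level.lower()]


print(is_permitted("ordinal", "differences"))   # Mohs' scale
print(is_permitted("interval", "differences"))  # degrees Fahrenheit
print(is_permitted("ratio", "ratios"))          # elapsed time
```

A lookup like this is one way to apply the earlier remark that the nature of the data suggests which statistical techniques are appropriate.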
DESCRIPTIVE STATISTICS
Generating Data
The Mach Pyramid
Is this a pyramid with a small square projecting out towards you?
OR
Is this a room with the small square as the far wall?
DESCRIPTIVE STATISTICS
Generating Data
The Mach Pyramid
Within a 1-minute period, how long (in seconds) does it take for a person to see this as a pyramid?
DESCRIPTIVE STATISTICS
The Data Set from 25 subjects (The Mach Pyramid Test)
1 2 2 3 3
3 4 4 5 5
5 6 6 7 7
7 7 8 8 9
9 10 10 11 11
DESCRIPTIVE STATISTICS
Descriptive Statistics – provides a picture about the data set
1. What is the shape of the distribution? Do the values tend to fall into some recognizable pattern?
2. What is the location of the variable? That is, where are the numbers centered?
3. How much variation is involved? Are the values widely dispersed or are they all fairly close in value?
DESCRIPTIVE STATISTICS
Picturing the Distribution
The distribution of a data set is a listing of the frequencies of occurrence of the measurements in the data set.
Tools:
1. Stem-and-Leaf Display
2. Dot plot
3. Frequency Distributions
4. Graphical Presentations
DESCRIPTIVE STATISTICS
DOTPLOT: Data Set from 25 subjects (The Mach Pyramid Test)
1 2 2 3 3 3 4 4 5 5 5 6 6 7 7 7 7 8 8 9 9 10 10 11 11
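A dot plot places one mark per observation above its value on a number line. The following is a rough text-only sketch of that idea for the 25 values above; the function name `dot_plot` and the choice of "o" as the mark are assumptions made for illustration.

```python
from collections import Counter

times = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 5, 6, 6, 7, 7, 7, 7,
         8, 8, 9, 9, 10, 10, 11, 11]


def dot_plot(values, mark="o"):
    """Return a text dot plot: one mark per observation, stacked above its value."""
    counts = Counter(values)
    axis = list(range(min(values), max(values) + 1))
    tallest = max(counts.values())
    rows = []
    # Build the plot from the top row down to the row just above the axis.
    for level in range(tallest, 0, -1):
        cells = [mark if counts[v] >= level else " " for v in axis]
        rows.append("".join(f"{cell:>3}" for cell in cells))
    # The last row is the horizontal axis with the data values.
    rows.append("".join(f"{v:>3}" for v in axis))
    return "\n".join(rows)


print(dot_plot(times))
```

The tallest stack sits above 7, the most frequent value, and the stacks show at a glance how the values cluster around the middle of the range.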
DESCRIPTIVE STATISTICS
The Shape of The Distribution
Symmetric bell-shaped (data tend to cluster about a center point)
Skewed right or positively skewed (data are clustered off-center to the left)
Skewed left or negatively skewed (data are clustered off-center to the right)
DESCRIPTIVE STATISTICS
Describing the ‘Center’ of the Data Set: Measures of Central Location
1. Mean: The ratio of the sum of all the values in the data set to the total number of values in the data set.
2. Median: The middle value of an ordered data set if the total number of values is odd. If the total number of values is even, it is the mean of the two middle values.
3. Mode: The most frequently occurring value in the data set.
DESCRIPTIVE STATISTICS
Measures of Central Location
1. Mean = 6.12
The ratio of the sum of all the values in the data set to the total number of values in the data set.
1 2 2 3 3
3 4 4 5 5
5 6 6 7 7
7 7 8 8 9
9 10 10 11 11
Mean = 153 ÷ 25 = 6.12
Data set from 25 subjects
DESCRIPTIVE STATISTICS
Measures of Central Location
1. Mean = 6.12
2. Median = 6
The middle value of an ordered data set if the total number of values is odd. If the total number of values is even, it is the mean of the two middle values.
3. Mode = 7
The most frequently occurring value in the data set.
1 2 2 3 3
3 4 4 5 5
5 6 6 7 7
7 7 8 8 9
9 10 10 11 11
Data set from 25 subjects
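The three measures of central location can be reproduced with Python's standard `statistics` module. This is a minimal sketch using the 25 values from the slides; the variable names are chosen here for illustration.

```python
import statistics

times = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 5, 6, 6, 7, 7, 7, 7,
         8, 8, 9, 9, 10, 10, 11, 11]

mean = statistics.mean(times)      # sum of the values divided by their count
median = statistics.median(times)  # middle value of the ordered data (13th of 25)
mode = statistics.mode(times)      # most frequently occurring value

print(sum(times), len(times))
print(mean, median, mode)
```

With an odd number of values the median is a single observation (the 13th in order); the value 7 occurs four times, more often than any other value.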
DESCRIPTIVE STATISTICS
Describing the ‘Center’ of the Data Set: Measures of Central Location
1. Mean: Summarizes all the information in the data set.
2. Median: Splits the data set into two halves: there are an equal number of values above and below it.
3. Mode: The most common value in the data set.
DESCRIPTIVE STATISTICS
Measures of Central Location: Mean, Median, Mode
MEAN
Merits: Takes into account every value in the data set. Always exists and is unique. Most useful of the three for inferential statistics.
Shortcomings: Can be influenced by extremely high or low values (outliers).
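The sensitivity of the mean to outliers can be seen by changing a single value. In this sketch the replacement value of 110 seconds is hypothetical (it is not in the original data) and is used only to show the effect; the median is printed alongside for contrast.

```python
import statistics

times = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 5, 6, 6, 7, 7, 7, 7,
         8, 8, 9, 9, 10, 10, 11, 11]

# Hypothetical outlier: replace one of the largest times (11 s) with 110 s.
with_outlier = sorted(times)[:-1] + [110]

print(statistics.mean(times), statistics.median(times))
print(statistics.mean(with_outlier), statistics.median(with_outlier))
```

One extreme value moves the mean from 6.12 to 10.08, while the middle value of the ordered data stays at 6.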
DESCRIPTIVE STATISTICS
Measures of Central Location: Mean, Median, Mode
MEDIAN
Merits: Not easily affected by outliers (extreme values). Always exists and is unique.
Shortcomings: Less reliable than the mean – the medians of many samples drawn from the same population will vary more widely than the corresponding sample means.
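The claim about sample-to-sample variability can be explored with a small simulation. This is a sketch under stated assumptions, not part of the original presentation: the population is taken to be normal with mean 6 and standard deviation 3, and the sample size, number of samples, and seed are arbitrary choices.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

SAMPLE_SIZE = 25     # assumed sample size
NUM_SAMPLES = 2000   # assumed number of repeated samples

sample_means = []
sample_medians = []
for _ in range(NUM_SAMPLES):
    # Draw one sample from the assumed normal population.
    sample = [rng.gauss(6.0, 3.0) for _ in range(SAMPLE_SIZE)]
    sample_means.append(statistics.mean(sample))
    sample_medians.append(statistics.median(sample))

# Compare how widely the two statistics vary from sample to sample.
spread_of_means = statistics.stdev(sample_means)
spread_of_medians = statistics.stdev(sample_medians)
print(spread_of_means < spread_of_medians)
```

For a bell-shaped population like the one assumed here, the sample medians scatter more than the sample means; the comparison may come out differently for other population shapes.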
DESCRIPTIVE STATISTICS
Measures of Central Location: Mean, Median, Mode
MODE
Merits: Requires no calculation, only counting.
Shortcomings: Not a stable measure – it depends on only a few values. May not exist. May not be unique.
DESCRIPTIVE STATISTICS
The Shape of The Distribution
Symmetric bell-shaped
Skewed right or positively skewed
Skewed left or negatively skewed
DESCRIPTIVE STATISTICS
Locating the “Centers” in the DOTPLOT
Mean = 6.12   Median = 6   Mode = 7
[Dot plot of the data set; horizontal axis from 1 to 11]
DESCRIPTIVE STATISTICS
Describing the ‘Spread’ of the Data: Measures of Variability
1. Range: The largest value minus the smallest value in a data set.
2. Semi-Interquartile Range (SIQR): One half of the difference between the 75th percentile and the 25th percentile.
3. Standard Deviation: The square root of the average of the squared deviations from the mean.
DESCRIPTIVE STATISTICS
Describing the ‘Spread’ of the Data:
Measures of Variability
1. Range = 10
1 2 2 3 3
3 4 4 5 5
5 6 6 7 7
7 7 8 8 9
9 10 10 11 11
Data set from 25 subjects
The largest value minus the smallest value in a data set
Range = 11 – 1 = 10
DESCRIPTIVE STATISTICS
Measures of Variability: Range, SIQR, SD
RANGE
Merits: A ‘quick and easy’ indication of variability.
Shortcomings: Provides no indication concerning the dispersion of the values which fall between the two extremes. Relatively unstable measure of variability because it can be influenced by a change in the highest or lowest value.
DESCRIPTIVE STATISTICS
Measures of Variability: Range, SIQR, SD
SEMI-INTERQUARTILE RANGE
Merits: More resistant to extreme values than the range.
Shortcomings: Does not utilize all the values in the data set for its computation.
DESCRIPTIVE STATISTICS
Measures of Variability: Range, SIQR, SD
STANDARD DEVIATION
Merits: Uses all the values in the data set for its computation.
DESCRIPTIVE STATISTICS
Descriptive Statistics: What to Use?
Considerations in choosing:
• The scale of the measurement represented by the data set
• The shape of the distribution
• The intended use of the descriptive statistics for further statistical analysis
DESCRIPTIVE STATISTICS
Describing the ‘Spread’ and the Center of the Data
1. Range = 10
Indicates the variation between the smallest and the largest values in the data set, but does not tell how much the other values vary
Mean = 6.12   Median = 6   Mode = 7
Range = 10
Describing the ‘Spread’ and the Center of the Data
2. Semi-interquartile range = 2.375
Describes the spread of the values in the data set for 25% of the total items above and below the median. If the SIQR is small, the values are concentrated near the median.
Mean = 6.12   Median = 6   Mode = 7
Below the median: 3.625 (6 – 2.375)
Above the median: 8.375 (6 + 2.375)
Describing the ‘Spread’ and the Center of the Data
3. Standard Deviation = 2.934
A measure of variability based on how far each value is from the mean of the data set
DESCRIPTIVE STATISTICS
Empirical Rule for symmetric bell-shaped distributions
About 68% of the values will lie within 1 standard deviation of the mean
About 95% of the values will lie within 2 standard deviations of the mean
About 99.7% of the values will lie within 3 standard deviations of the mean
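As a rough check, the share of the 25 Mach Pyramid values that fall within 1, 2, and 3 standard deviations of the mean can be counted directly. This is only a sketch: the helper name `share_within` is hypothetical, and since the data set is small and only roughly bell-shaped, the shares need not match 68%, 95%, and 99.7%.

```python
import statistics

times = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 5, 6, 6, 7, 7, 7, 7,
         8, 8, 9, 9, 10, 10, 11, 11]

mean = statistics.mean(times)
sd = statistics.stdev(times)  # sample standard deviation (n - 1 in the denominator)


def share_within(values, center, spread, k):
    """Fraction of values lying within k spreads of the center."""
    inside = [v for v in values if abs(v - center) <= k * spread]
    return len(inside) / len(values)


for k in (1, 2, 3):
    print(k, share_within(times, mean, sd, k))
```

For this data set, 15 of the 25 values (60%) lie within 1 standard deviation of the mean, and all 25 lie within 2 standard deviations.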
Describing the ‘Spread’ and the Center of the Data
3. Standard Deviation = 2.934
If the standard deviation is small, the values are concentrated near the mean.
If the standard deviation is large, the values are scattered widely about the mean.
Mean = 6.12   Median = 6   Mode = 7
1 SD below the mean: 3.186 (6.12 – 2.934)
1 SD above the mean: 9.054 (6.12 + 2.934)
Describing the ‘Spread’ of the Data: Measures of Variability
2. Semi-interquartile range
One half of the difference between the 75th and the 25th percentile
Data Set from 25 subjects
1 2 2 3 3
3 4 4 5 5
5 6 6 7 7
7 7 8 8 9
9 10 10 11 11
DESCRIPTIVE STATISTICS
75th percentile – the value in the data set which is exceeded by 25% of the total number of items in the set
25 × (0.75) = 18.75
18.75: rank of the 75th percentile (between the 18th and 19th items)
18th item = 8, 19th item = 8
75th percentile = 8
Describing the ‘Spread’ of the Data:
Measures of Variability
2. Semi-interquartile range
One half of the difference between the 75th and the 25th percentile
Data Set from 25 subjects
1 2 2 3 3
3 4 4 5 5
5 6 6 7 7
7 7 8 8 9
9 10 10 11 11
DESCRIPTIVE STATISTICS
25th percentile – the value in the data set which is exceeded by 75% of the total number of items in the set
25 × (0.25) = 6.25
6.25: rank of the 25th percentile (between the 6th and 7th items)
6th item = 3, 7th item = 4
25th percentile = 3 + (0.25)(4 – 3) = 3.25
Describing the ‘Spread’ of the Data:
Measures of Variability
2. Semi-interquartile range = 2.375
One half of the difference between the 75th and the 25th percentile
Data Set from 25 subjects
1 2 2 3 3
3 4 4 5 5
5 6 6 7 7
7 7 8 8 9
9 10 10 11 11
DESCRIPTIVE STATISTICS
75th percentile = 8
25th percentile = 3.25
SIQR = ½ (8 - 3.25)
SIQR = 2.375
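The percentile steps used in these slides (rank = n × fraction, then interpolate between the two neighboring items) can be sketched as below. The function name `percentile_by_rank` is hypothetical, and the handling of ranks that fall outside the data is an assumption added to keep the sketch safe. Other percentile conventions exist, so library functions may return slightly different quartiles.

```python
def percentile_by_rank(values, fraction):
    """Percentile using rank = n * fraction, interpolating between neighbors."""
    ordered = sorted(values)
    n = len(ordered)
    rank = n * fraction
    whole = int(rank)    # position of the lower neighbor (1-based)
    part = rank - whole  # fractional distance toward the next item
    # Assumption: ranks outside the data are clamped to the smallest/largest value.
    if whole < 1:
        return float(ordered[0])
    if whole >= n:
        return float(ordered[-1])
    lower = ordered[whole - 1]
    upper = ordered[whole]
    return lower + part * (upper - lower)


times = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 5, 6, 6, 7, 7, 7, 7,
         8, 8, 9, 9, 10, 10, 11, 11]

p75 = percentile_by_rank(times, 0.75)  # rank 18.75: between the 18th and 19th items
p25 = percentile_by_rank(times, 0.25)  # rank 6.25: between the 6th and 7th items
siqr = (p75 - p25) / 2

print(p75, p25, siqr)
```

Run on the 25 values, this reproduces the slide results: 75th percentile = 8, 25th percentile = 3.25, SIQR = 2.375.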
Describing the ‘Spread’ of the Data:
Measures of Variability
3. Standard Deviation = 2.934
The square root of the average of the squared deviations from the mean (the sum of the squared deviations is divided by n – 1)
Data Set from 25 subjects
1 2 2 3 3
3 4 4 5 5
5 6 6 7 7
7 7 8 8 9
9 10 10 11 11
DESCRIPTIVE STATISTICS
$$SD = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = 2.934$$
Mean ($\bar{x}$) = 6.12
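The standard deviation can be computed step by step from its definition. This is a minimal sketch using the 25 values and the n – 1 denominator shown in the formula; the variable names are chosen here for illustration.

```python
import math

times = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 5, 6, 6, 7, 7, 7, 7,
         8, 8, 9, 9, 10, 10, 11, 11]

n = len(times)
mean = sum(times) / n                               # 153 / 25
squared_deviations = [(x - mean) ** 2 for x in times]
variance = sum(squared_deviations) / (n - 1)        # divide by n - 1, as in the formula
sd = math.sqrt(variance)

print(round(sum(squared_deviations), 2), round(variance, 2), round(sd, 3))
```

The squared deviations sum to 206.64; dividing by 24 gives 8.61, and the square root is 2.934, matching the value on the slide. Dividing by n = 25 instead would give a slightly smaller result, about 2.875.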