DESCRIPTIVE STATISTICS

Adapted from the presentation of Mrs. Zennifer L. Oberio

Presented by:
Balbido, Aileen U.
Latap, Kenneth John R.
Tejuco, Kerwin Chester C.

STATISTICS

The study of how to collect, organize, analyze, and interpret numerical information.

DESCRIPTIVE STATISTICS

STATISTICS: Two Categories

• Descriptive methods: used to summarize or describe a data set (a set of measurements obtained on some variable)

• Inferential methods: used to draw conclusions or to make inferences about a population based on the observation of a sample


Nature of Statistical Data / Levels of Measurement

1. Nominal
2. Ordinal
3. Interval
4. Ratio

Importance: The nature of a set of data may suggest the use of particular statistical techniques.


1. NOMINAL

Categories or qualities
• Numbers are used simply as labels for groups or classes.
• Numbers convey no numerical information.

Example: 1 for YES, 2 for NO; 1 – Red, 2 – Yellow, 3 – Green




2. ORDINAL

Data may be ordered using inequalities according to their size or quality.

Example: Mohs’ Scale of Hardness: 1 – Talc, 2 – Gypsum, 3 – Calcite, 4 – Fluorite


ORDINAL (continued)

• Data may be ranked (but there is no indication of how much of the variable exists).
  3 > 2: Calcite is harder than gypsum.
• Differences and ratios between data values are meaningless, so neither of these statements is valid:
  2 – 1 = 4 – 3: "The difference in hardness between gypsum and talc is equal to the difference in hardness between fluorite and calcite."
  4 ÷ 2 = 2: "Fluorite is twice as hard as gypsum."




3. INTERVAL

• Differences between data values represent equal amounts in the magnitude of the variable measured.
• There is no true zero (the complete absence of the variable measured).

Example: Temperatures in degrees Fahrenheit and degrees Celsius




INTERVAL (continued)

• Ranking and taking differences are permitted.
  100˚F > 98˚F: 100˚F is warmer than 98˚F.
  100˚F – 98˚F = 52˚F – 50˚F: The same amount of heat is required to raise the temperature of an object from 98˚F to 100˚F and from 50˚F to 52˚F.
• Ratios are meaningless. It is not valid to say that 100˚F is twice as hot as 50˚F: in degrees Celsius, 100˚F is 37.8˚C and 50˚F is 10˚C, a ratio of about 3.8 rather than 2.
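The temperature example can be checked numerically. The short Python sketch below (function names are illustrative, not from the slides) converts the readings with the standard formula C = (F − 32) × 5/9 and shows that equal Fahrenheit differences stay equal in Celsius while the 2:1 ratio does not survive.

```python
def fahrenheit_to_celsius(f: float) -> float:
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (f - 32) * 5 / 9


def celsius_difference(a_f: float, b_f: float) -> float:
    """Difference a - b after converting both Fahrenheit readings to Celsius."""
    return fahrenheit_to_celsius(a_f) - fahrenheit_to_celsius(b_f)


# Equal differences in Fahrenheit remain equal in Celsius (interval scale).
d1 = celsius_difference(100, 98)  # raising 98 F to 100 F
d2 = celsius_difference(52, 50)   # raising 50 F to 52 F

# Ratios change with the scale, because 0 F and 0 C are arbitrary zero
# points rather than the complete absence of heat.
ratio_f = 100 / 50
ratio_c = fahrenheit_to_celsius(100) / fahrenheit_to_celsius(50)

print(f"differences in C: {d1:.3f} and {d2:.3f}")          # 1.111 and 1.111
print(f"ratio in F: {ratio_f:.2f}, in C: {ratio_c:.2f}")   # 2.00 and 3.78
```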



4. RATIO

• Has a true zero as a starting point for all measurements.
  Example: length, height, elapsed time, volume
• Ranking, taking differences, and taking ratios are all permitted.
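The four levels can be summarized as a lookup of which operations give meaningful results. This is a minimal Python sketch; the `Level` enum, the operation names ("classify", "rank", "difference", "ratio"), and `is_meaningful` are hypothetical labels chosen to mirror the slides, where each level permits everything the level below it permits.

```python
from enum import IntEnum


class Level(IntEnum):
    """Levels of measurement, ordered from least to most informative."""
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4


# Operations that give meaningful results at each level (hypothetical names).
PERMITTED = {
    Level.NOMINAL: {"classify"},
    Level.ORDINAL: {"classify", "rank"},
    Level.INTERVAL: {"classify", "rank", "difference"},
    Level.RATIO: {"classify", "rank", "difference", "ratio"},
}


def is_meaningful(level: Level, operation: str) -> bool:
    """Return True if `operation` is meaningful for data at `level`."""
    return operation in PERMITTED[level]


# Mohs hardness is ordinal, Fahrenheit temperature is interval, length is ratio.
print(is_meaningful(Level.ORDINAL, "rank"))         # True: calcite (3) is harder than gypsum (2)
print(is_meaningful(Level.ORDINAL, "ratio"))        # False: fluorite is not "twice as hard"
print(is_meaningful(Level.INTERVAL, "difference"))  # True
print(is_meaningful(Level.INTERVAL, "ratio"))       # False: 100 F is not "twice as hot" as 50 F
print(is_meaningful(Level.RATIO, "ratio"))          # True
```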


Generating Data

The Mach Pyramid
Is this a pyramid with a small square projecting out toward you?

OR

Is this a room with the small square as the far wall?


Within a 1-minute period, how long (in seconds) does it take for a person to see this as a pyramid?


The Data Set from 25 Subjects (The Mach Pyramid Test)

1 2 2 3 3
3 4 4 5 5
5 6 6 7 7
7 7 8 8 9
9 10 10 11 11


Descriptive Statistics – provides a picture of the data set

1. What is the shape of the distribution? Do the values tend to fall into some recognizable pattern?

2. What is the location of the variable? That is, where are the numbers centered?

3. How much variation is involved? Are the values widely dispersed, or are they all fairly close in value?


Picturing the Distribution

The distribution of a data set is a listing of the frequencies of occurrence of the measurements in the data set.

Tools:

1. Stem-and-Leaf Display

2. Dot plot

3. Frequency Distributions

4. Graphical Presentations
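A frequency distribution for the Mach Pyramid data can be tabulated in a few lines. This Python sketch assumes the 25 times are held in a plain list named `data` (a name chosen here for illustration) and uses `collections.Counter` to count each value.

```python
from collections import Counter

# Seconds each of the 25 subjects needed to see the Mach Pyramid as a pyramid.
data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 5, 6, 6, 7, 7, 7, 7, 8, 8, 9, 9, 10, 10, 11, 11]

# Count how often each value occurs.
freq = Counter(data)

print("Value  Frequency")
for value in sorted(freq):
    print(f"{value:>5}  {freq[value]:>9}")
print(f"Total  {sum(freq.values()):>9}")
```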


DOTPLOT: Data Set from 25 Subjects (The Mach Pyramid Test)

[Dot plot: one dot per subject, plotted over the values 1 to 11 seconds]

1 2 2 3 3 3 4 4 5 5 5 6 6 7 7 7 7 8 8 9 9 10 10 11 11
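Since the dot plot image is not reproduced here, a text version can be generated from the data. This is a sketch assuming non-negative integer values; `text_dotplot` is a hypothetical helper that stacks one symbol per occurrence above each value on the axis.

```python
from collections import Counter

data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 5, 6, 6, 7, 7, 7, 7, 8, 8, 9, 9, 10, 10, 11, 11]


def text_dotplot(values, symbol="o"):
    """Draw a dot plot of non-negative integer data as text: one column per
    value on the axis and one symbol stacked per occurrence of that value."""
    counts = Counter(values)
    axis = range(min(values), max(values) + 1)
    width = len(str(max(values))) + 1  # room for the widest axis label
    lines = []
    for height in range(max(counts.values()), 0, -1):  # top row first
        row = "".join((symbol if counts[v] >= height else "").rjust(width) for v in axis)
        lines.append(row.rstrip())
    lines.append("".join(str(v).rjust(width) for v in axis))
    return "\n".join(lines)


print(text_dotplot(data))
```

The tallest column sits over 7, which is also the mode of the data set.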


The Shape of The Distribution

Symmetric bell-shaped (data tend to cluster about a center point)

Skewed right or positively skewed (data are clustered off-center to the left)

Skewed left or negatively skewed (data are clustered off-center to the right)


Describing the ‘Center’ of the Data Set: Measures of Central Location

1. Mean: The sum of all the values in the data set divided by the total number of values in the data set.

2. Median: The middle value of an ordered data set if the total number of values is odd. If the total number of values is even, it is the mean of the two middle values.

3. Mode: The most frequently occurring value in the data set.
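The three definitions translate directly into code. This Python sketch implements them from scratch (the function names are illustrative); `modes` returns a list because the mode may not be unique, and it returns an empty list when every distinct value occurs equally often, which is one common convention for "no mode" and an assumption here.

```python
from collections import Counter


def mean(values):
    """Sum of all the values divided by the number of values."""
    return sum(values) / len(values)


def median(values):
    """Middle value of the ordered data (n odd), or the mean of the
    two middle values (n even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def modes(values):
    """Most frequently occurring value(s); [] if all values tie (assumed convention)."""
    counts = Counter(values)
    top = max(counts.values())
    if len(counts) > 1 and all(c == top for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == top)


data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 5, 6, 6, 7, 7, 7, 7, 8, 8, 9, 9, 10, 10, 11, 11]
print(mean(data), median(data), modes(data))  # 6.12 6 [7]
```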


Measures of Central Location

1. Mean = 6.12
The sum of all the values in the data set divided by the total number of values in the data set.

1 2 2 3 3

3 4 4 5 5

5 6 6 7 7

7 7 8 8 9

9 10 10 11 11

Mean = 153 ÷ 25 = 6.12

Data set from 25 subjects


Measures of Central Location

1. Mean = 6.12

2. Median = 6
The middle value of an ordered data set if the total number of values is odd (here, the 13th of the 25 ordered values). If the total number of values is even, it is the mean of the two middle values.

3. Mode = 7
The most frequently occurring value in the data set.

1 2 2 3 3

3 4 4 5 5

5 6 6 7 7

7 7 8 8 9

9 10 10 11 11



Describing the ‘Center’ of the Data Set: Measures of Central Location

1. Mean: Summarizes all the information in the data set.

2. Median: Splits the data set into two halves; there are an equal number of values above and below it.

3. Mode: The most common value in the data set.


Measures of Central Location: Mean, Median, Mode

MEAN
Merits: Takes into account every value in the data set. Always exists and is unique. Most useful of the three for inferential statistics.
Shortcomings: Can be influenced by extremely high or low values (outliers).

MEDIAN
Merits: Not easily affected by outliers (extreme values). Always exists and is unique.
Shortcomings: Less reliable than the mean: the medians of many samples drawn from the same population will vary more widely than the corresponding sample means.

MODE
Merits: Requires no calculation, only counting.
Shortcomings: Not a stable measure, since it depends on only a few values. May not exist. May not be unique.
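The contrast between the mean's sensitivity to outliers and the median's resistance can be seen by changing a single value. This sketch uses Python's `statistics.mean` and `statistics.median`; the outlier (the slowest subject taking the full 60 seconds instead of 11) is a hypothetical change, not part of the original data.

```python
from statistics import mean, median

data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 5, 6, 6, 7, 7, 7, 7, 8, 8, 9, 9, 10, 10, 11, 11]

# Hypothetical change: the slowest subject takes 60 s instead of 11 s.
with_outlier = data[:-1] + [60]

for label, values in [("original", data), ("with outlier", with_outlier)]:
    print(f"{label:>12}: mean = {mean(values):.2f}, median = {median(values)}")
```

The mean rises from 6.12 to 8.08, while the median stays at 6 because the middle (13th) ordered value is unchanged.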


The Shape of The Distribution

Symmetric bell-shaped

Skewed right or positively skewed

Skewed left or negatively skewed


Locating the “Centers” in the DOTPLOT

Mean = 6.12, Median = 6, Mode = 7

[Dot plot of the data set over the values 1 to 11, with the mean, median, and mode marked]


Describing the ‘Spread’ of the Data: Measures of Variability

1. Range: The largest value minus the smallest value in a data set.

2. Semi-interquartile range (SIQR): One half of the difference between the 75th percentile and the 25th percentile.

3. Standard deviation: The square root of the sum of the squared deviations from the mean divided by n – 1, where n is the number of values (roughly, the square root of the average squared deviation).


Describing the ‘Spread’ of the Data:

Measures of Variability

1. Range = 10

1 2 2 3 3

3 4 4 5 5

5 6 6 7 7

7 7 8 8 9

9 10 10 11 11


The largest value minus the smallest value in a data set:

Range = 11 – 1 = 10


Measures of Variability: Range, SIQR, SD

RANGE
Merits: A ‘quick and easy’ indication of variability.
Shortcomings: Provides no indication of the dispersion of the values that fall between the two extremes. A relatively unstable measure of variability, because it can be influenced by a change in the highest or lowest value.

SEMI-INTERQUARTILE RANGE
Merits: More resistant to extreme values than the range.
Shortcomings: Does not utilize all the values in the data set for its computation.

STANDARD DEVIATION
Merits: Uses all the values in the data set for its computation.
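The merits and shortcomings above can be compared by again replacing the largest value with a hypothetical 60 seconds. In this Python sketch, `percentile` follows the rank rule worked through later in these slides (rank = n × p, then linear interpolation between neighboring ordered items); `percentile`, `value_range`, and `siqr` are illustrative names, and the standard deviation comes from `statistics.stdev`.

```python
from statistics import stdev

data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 5, 6, 6, 7, 7, 7, 7, 8, 8, 9, 9, 10, 10, 11, 11]


def percentile(values, p):
    """Rank = n * p; interpolate between the items at that rank and the next (1-based)."""
    ordered = sorted(values)
    n = len(ordered)
    rank = n * p
    k = int(rank)       # whole part of the rank
    frac = rank - k     # fractional part used for interpolation
    if k < 1:
        return ordered[0]
    if k >= n:
        return ordered[-1]
    return ordered[k - 1] + frac * (ordered[k] - ordered[k - 1])


def value_range(values):
    """Largest value minus smallest value."""
    return max(values) - min(values)


def siqr(values):
    """Half the distance between the 75th and 25th percentiles."""
    return (percentile(values, 0.75) - percentile(values, 0.25)) / 2


# Hypothetical extreme value: the slowest time becomes 60 s instead of 11 s.
with_outlier = data[:-1] + [60]

for label, values in [("original", data), ("with outlier", with_outlier)]:
    print(f"{label:>12}: range = {value_range(values)}, "
          f"SIQR = {siqr(values)}, SD = {stdev(values):.3f}")
```

The range jumps from 10 to 59 and the SD grows because it uses every value, while the SIQR stays at 2.375 since the 6th, 7th, 18th, and 19th ordered items do not change.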


Descriptive Statistics: What to Use?

Considerations in choosing:

• The scale of the measurement represented by the data set

• The shape of the distribution

• The intended use of the descriptive statistics for further statistical analysis


Describing the ‘Spread’ and the Center of the Data

1. Range = 10
Indicates the variation between the smallest and the largest values in the data set, but does not tell how much the other values vary.

DESCRIPTIVE STATISTICS: Mean = 6.12, Median = 6, Mode = 7, Range = 10

Describing the ‘Spread’ and the Center of the Data

2. Semi-interquartile range = 2.375
Describes the spread of the values around the median: the interval median ± SIQR covers roughly the middle half of the data (about 25% of the total items above the median and 25% below it). If the SIQR is small, the values are concentrated near the median.

Median + SIQR = 6 + 2.375 = 8.375 (above the median)
Median – SIQR = 6 – 2.375 = 3.625 (below the median)

Describing the ‘Spread’ and the Center of the Data

3. Standard Deviation = 2.934
A measure of variability based on how far each value is from the mean of the data set.


Empirical Rule for symmetric bell-shaped distributions

• About 68% of the values will lie within 1 standard deviation of the mean.
• About 95% of the values will lie within 2 standard deviations of the mean.
• About 99.7% of the values will lie within 3 standard deviations of the mean.
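The empirical rule can be compared against the Mach Pyramid data. This Python sketch uses `statistics.mean` and `statistics.stdev` (the n − 1 standard deviation used in these slides); `share_within` is a hypothetical helper that reports the fraction of values within k standard deviations of the mean.

```python
from statistics import mean, stdev

data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 5, 6, 6, 7, 7, 7, 7, 8, 8, 9, 9, 10, 10, 11, 11]


def share_within(values, k):
    """Fraction of the values lying within k standard deviations of the mean."""
    m, s = mean(values), stdev(values)
    inside = [x for x in values if m - k * s <= x <= m + k * s]
    return len(inside) / len(values)


for k, rule in [(1, "68%"), (2, "95%"), (3, "99.7%")]:
    print(f"within {k} SD: {share_within(data, k):.0%} (empirical rule: about {rule})")
```

For this data set, 15 of the 25 values (4 through 9 seconds) fall within 1 standard deviation, i.e. 60% rather than 68%, and all values fall within 2. A small sample that is only roughly bell-shaped need not match the rule closely.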

Describing the ‘Spread’ and the Center of the Data

3. Standard Deviation = 2.934
If the standard deviation is small, the values are concentrated near the mean. If the standard deviation is LARGE, the values are scattered widely about the mean.

1 SD below the mean: 6.12 – 2.934 = 3.186
1 SD above the mean: 6.12 + 2.934 = 9.054

Describing the ‘Spread’ of the Data: Measures of Variability

2. Semi-interquartile range
One half of the difference between the 75th and the 25th percentile

Data Set from 25 subjects

1 2 2 3 3

3 4 4 5 5

5 6 6 7 7

7 7 8 8 9

9 10 10 11 11


75th percentile: the value in the data set that is exceeded by 25% of the total number of items in the set.
25 × 0.75 = 18.75, the rank of the 75th percentile, so it lies between the 18th and 19th items.
The 18th and 19th items both equal 8, so the 75th percentile = 8.



2. Semi-interquartile range
One half of the difference between the 75th and the 25th percentile

Data Set from 25 subjects

1 2 2 3 3

3 4 4 5 5

5 6 6 7 7

7 7 8 8 9

9 10 10 11 11


25th percentile: the value in the data set that is exceeded by 75% of the total number of items in the set.
25 × 0.25 = 6.25, the rank of the 25th percentile, so it lies between the 6th and 7th items.
6th item = 3, 7th item = 4
25th percentile = 3 + (0.25)(4 – 3) = 3.25



2. Semi-interquartile range = 2.375
One half of the difference between the 75th and the 25th percentile

Data Set from 25 subjects

1 2 2 3 3

3 4 4 5 5

5 6 6 7 7

7 7 8 8 9

9 10 10 11 11


75th percentile = 8

25th percentile = 3.25

SIQR = ½ (8 - 3.25)

SIQR = 2.375
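The percentile steps above can be reproduced in Python. The `percentile` helper is a hypothetical name that implements the slides' rule (rank = n × p, then interpolate between the item at that rank and the next one). For comparison, the sketch also calls `statistics.quantiles`, whose default `method="exclusive"` ranks by (n + 1) × p, so it reports different quartiles for the same data; which convention to use is a choice the analyst has to state.

```python
from statistics import quantiles

data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 5, 6, 6, 7, 7, 7, 7, 8, 8, 9, 9, 10, 10, 11, 11]


def percentile(values, p):
    """Rank = n * p; interpolate between the items at that rank and the next (1-based)."""
    ordered = sorted(values)
    n = len(ordered)
    rank = n * p
    k = int(rank)       # whole part of the rank
    frac = rank - k     # fractional part used for interpolation
    if k < 1:
        return ordered[0]
    if k >= n:
        return ordered[-1]
    return ordered[k - 1] + frac * (ordered[k] - ordered[k - 1])


q1 = percentile(data, 0.25)  # rank 6.25: 3 + 0.25 * (4 - 3) = 3.25
q3 = percentile(data, 0.75)  # rank 18.75: 8 + 0.75 * (8 - 8) = 8
siqr = (q3 - q1) / 2
print(f"25th percentile = {q1}, 75th percentile = {q3}, SIQR = {siqr}")

# Python's default quartile rule differs: this prints [3.5, 6.0, 8.5].
print(quantiles(data, n=4))
```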



3. Standard Deviation = 2.934
The square root of the sum of the squared deviations from the mean divided by n – 1

Data Set from 25 subjects

1 2 2 3 3

3 4 4 5 5

5 6 6 7 7

7 7 8 8 9

9 10 10 11 11


$$
SD = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{206.64}{24}} = 2.934
$$

where x̄ = 6.12 is the mean and n = 25 is the number of values.
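The formula can be evaluated step by step and checked against Python's `statistics` module. In this sketch, `statistics.stdev` uses the same n − 1 divisor as the slide, while `statistics.pstdev` divides by n and therefore gives a slightly smaller value; the intermediate variable names are illustrative.

```python
from math import sqrt
from statistics import pstdev, stdev

data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 5, 6, 6, 7, 7, 7, 7, 8, 8, 9, 9, 10, 10, 11, 11]

n = len(data)
x_bar = sum(data) / n                                       # mean = 6.12
squared_deviations = sum((x - x_bar) ** 2 for x in data)    # sum of (x - mean)^2

sd_sample = sqrt(squared_deviations / (n - 1))              # formula on the slide
print(f"sum of squared deviations = {squared_deviations:.2f}")  # 206.64
print(f"SD (n - 1)        = {sd_sample:.3f}")                   # 2.934
print(f"statistics.stdev  = {stdev(data):.3f}")                 # same n - 1 formula
print(f"statistics.pstdev = {pstdev(data):.3f}")                # divides by n: 2.875
```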
