STATISTICS I COURSE INSTRUCTOR: TEHSEEN IMRAAN
STATISTICS I COURSE INSTRUCTOR: TEHSEEN IMRAAN. CHAPTER 4 DESCRIBING DATA.

Dec 30, 2015

Ashley Craig
STATISTICS I

COURSE INSTRUCTOR: TEHSEEN IMRAAN

CHAPTER 4

DESCRIBING DATA

INTRODUCTION

We continue our study of descriptive statistics with additional ways to describe data: dot plots, stem-and-leaf displays, quartiles, percentiles, and box plots. Dot plots, stem-and-leaf displays, and box plots give additional insight into where the values are concentrated, how they are dispersed, and the general shape of the data. Finally, we consider bivariate data, where we observe two variables for each individual or observation selected.

DOT PLOTS

• A graph for displaying a set of data. Each numerical value is represented by a dot placed above a horizontal number line.

• To develop a dot plot we display a dot for each observation along a horizontal number line, indicating the value of each piece of data. When several observations share the same value, we stack the dots on top of each other.

STEPS TO CONSTRUCT DOT PLOT

• Sort the data from smallest to largest.

• Draw and label a number line.

• Place a dot for each observation.

FOR EXAMPLE

Length of Service (in years)

7 6 2 10 6 6

5 8 4 8 4 7

6 5 3 3 7 5

Step 1: Sort the data from smallest to largest.

2 3 3 4 4 5 5 5 6 6 6 6 7 7 7 8 8 10

Step 2: Draw the number line and label it as shown.

Step 3: Place a dot for each observation.
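
The three steps can be sketched in a few lines of Python. This is only an illustrative text rendering of the length-of-service data, not the chart from the slides; the function name `dot_plot_rows` is made up for this example.

```python
from collections import Counter

# Length of service (in years), copied from the example above.
service_years = [7, 6, 2, 10, 6, 6, 5, 8, 4, 8, 4, 7, 6, 5, 3, 3, 7, 5]

def dot_plot_rows(values):
    """Return the lines of a text dot plot, top row first (hypothetical helper)."""
    counts = Counter(values)                    # Step 1: tally each value
    axis = range(min(values), max(values) + 1)  # Step 2: the number line
    tallest = max(counts.values())
    rows = []
    # Step 3: one dot per observation, stacked when a value repeats.
    for level in range(tallest, 0, -1):
        row = "".join(" . " if counts[v] >= level else "   " for v in axis)
        rows.append(row.rstrip())
    rows.append("".join(f"{v:^3}" for v in axis).rstrip())  # axis labels
    return rows

print("\n".join(dot_plot_rows(service_years)))
```

The tallest stack sits above 6, the value that occurs most often, and the gap above 9 shows that no employee has exactly nine years of service.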

STEM AND LEAF DISPLAYS

• A statistical technique for displaying a set of data. Each numerical value is divided into two parts: the leading digit(s) become the stem, and the trailing digit becomes the leaf. The stems are located along the main vertical axis, and the leaf for each observation along the horizontal axis.

• To develop a stem-and-leaf display, the first step is to locate the largest value and the smallest value. This provides the range of the stem values. The stem is the leading digit or digits of the number, and the leaf is the trailing digit. For example, the number 15 has a stem value of 1 and a leaf value of 5. As another example, the number 231 has a stem value of 23 and a leaf value of 1.

FOR EXAMPLE

$12 $28 $32 $24 $17 $6

$34 $18 $22 $42 $36 $26

FOR EXAMPLE

Leading Digit (Stem) | Trailing Digits (Leaves)

0 | 6

1 | 2 7 8

2 | 2 4 6 8

3 | 2 4 6

4 | 2
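
As a rough sketch, the same display can be built in Python by splitting each value into a stem (all digits but the last) and a leaf (the last digit). The function name `stem_and_leaf` is an assumption made for this example, and it only handles non-negative whole numbers.

```python
from collections import defaultdict

# Dollar amounts from the example above.
amounts = [12, 28, 32, 24, 17, 6, 34, 18, 22, 42, 36, 26]

def stem_and_leaf(values):
    """Map each stem to its sorted leaves (hypothetical helper)."""
    stems = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)  # e.g. 28 -> stem 2, leaf 8
        stems[stem].append(leaf)
    # Include every stem between the smallest and largest, even if empty.
    return {s: stems.get(s, []) for s in range(min(stems), max(stems) + 1)}

for stem, leaves in stem_and_leaf(amounts).items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```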

OTHER MEASURES OF DISPERSION

QUARTILES:

– First Quartile: The point below which one-fourth or 25% of the ranked data values lie. (It is designated Q1.)

– Third Quartile: The point below which three-fourths or 75% of the ranked data values lie. (It is designated Q3.)

– Logically, the median is the Second Quartile (designated Q2). The values corresponding to Q1, Q2, and Q3 divide a set of data into four equal parts.

DECILES AND PERCENTILES

• Just as quartiles divide a distribution into 4 equal parts, deciles divide a distribution into ten equal parts; and percentiles divide a distribution into 100 equal parts.

• To find a quartile, decile, or percentile for ungrouped data, first order the data from smallest to largest, then use text formula [4-1] to locate the position of the desired value.

DECILES AND PERCENTILES

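Location of a Percentile: $L_p = (n + 1)\dfrac{P}{100}$ [4-1], where n is the number of observations and P is the desired percentile.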
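
A minimal sketch of the procedure, assuming the location formula is the common textbook form L_p = (n + 1)P/100 and that a fractional location is resolved by linear interpolation between the two neighbouring ordered values. The function name `percentile` and the clamping at the ends of the data are choices made for this example.

```python
# Dollar amounts from the stem-and-leaf example.
amounts = [12, 28, 32, 24, 17, 6, 34, 18, 22, 42, 36, 26]

def percentile(values, p):
    """Value at percentile p, located at position (n + 1) * p / 100."""
    ordered = sorted(values)
    n = len(ordered)
    location = (n + 1) * p / 100   # 1-based position in the ordered data
    lower = int(location)          # whole part of the location
    fraction = location - lower    # fractional part, used to interpolate
    if lower < 1:                  # assumption: clamp to the smallest value
        return ordered[0]
    if lower >= n:                 # assumption: clamp to the largest value
        return ordered[-1]
    below, above = ordered[lower - 1], ordered[lower]
    return below + fraction * (above - below)

print(percentile(amounts, 25))  # first quartile, location 3.25
print(percentile(amounts, 50))  # median, location 6.5
print(percentile(amounts, 75))  # third quartile, location 9.75
```

With 12 observations the first quartile sits at location 3.25, a quarter of the way between the third and fourth ordered values (17 and 18).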

BOX PLOTS

• A graphical display based on five statistics: the minimum value, Q1 (the first quartile), Q2 (the median), Q3 (the third quartile), and the maximum value.

• To construct a box plot we therefore need these five pieces of information: the minimum value, Q1, the median, Q3, and the maximum value.
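
The five values can be gathered with Python's standard `statistics` module. This is a sketch only: it assumes the "exclusive" method of `statistics.quantiles` is an acceptable way to compute the quartiles (it appears to place them at the (n + 1)P/100 positions), and `five_number_summary` is a name invented for the example.

```python
import statistics

# Dollar amounts from the stem-and-leaf example.
amounts = [12, 28, 32, 24, 17, 6, 34, 18, 22, 42, 36, 26]

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a box plot."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="exclusive")
    return min(values), q1, q2, q3, max(values)

minimum, q1, median, q3, maximum = five_number_summary(amounts)
print(f"min={minimum}  Q1={q1}  median={median}  Q3={q3}  max={maximum}")
```

The box is drawn from Q1 to Q3 with a line at the median, and the whiskers extend to the minimum and maximum.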

RELATIVE DISPERSION

• Coefficient of variation: The ratio of the standard deviation to the arithmetic mean, expressed as a percent.

FORMULA FOR CV

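Coefficient of Variation: $CV = \dfrac{s}{\bar{X}}(100)$ [4-2], where s is the standard deviation and $\bar{X}$ is the arithmetic mean.

Multiplying by 100 converts the decimal to a percent.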

COEFFICIENT OF VARIATION

• Characteristics of the coefficient of variation are:

– It reports the variation relative to the mean.

– It is useful for comparing distributions with different units.
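
To illustrate the comparison across units, the sketch below computes the coefficient of variation for the two data sets used earlier, one in years and one in dollars. It assumes the sample standard deviation is intended; `coefficient_of_variation` is a name chosen for this example.

```python
import statistics

# Data from the earlier examples: years of service and dollar amounts.
service_years = [7, 6, 2, 10, 6, 6, 5, 8, 4, 8, 4, 7, 6, 5, 3, 3, 7, 5]
amounts = [12, 28, 32, 24, 17, 6, 34, 18, 22, 42, 36, 26]

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean, as a percent."""
    return statistics.stdev(values) / statistics.mean(values) * 100

print(f"Service (years):   CV = {coefficient_of_variation(service_years):.1f}%")
print(f"Amounts (dollars): CV = {coefficient_of_variation(amounts):.1f}%")
```

Because each result is a percent of its own mean, the two figures can be compared directly even though the raw data are measured in different units.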

SKEWNESS

• Four shapes of distribution

• Coefficient of skewness: A measure that describes the degree and direction of skewness in a distribution.

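Text Formula [4–3] is Pearson's Coefficient of Skewness: $sk = \dfrac{3(\bar{X} - \text{Median})}{s}$, where $\bar{X}$ is the mean and s is the standard deviation.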

Characteristics of the coefficient of skewness are:

• The coefficient of skewness, designated sk, measures the amount of skewness and may range from -3.0 to +3.0.

• A value near -3, such as -2.57, indicates considerable negative skewness.

• A value such as 1.63 indicates moderate positive skewness.

• A value of 0, which will occur when the mean and median are equal, indicates the distribution is symmetrical and that there is no skewness.
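
A small sketch of these characteristics, assuming Pearson's coefficient takes its usual form sk = 3(mean - median) / s with s the sample standard deviation. The function name and the two short data sets are hypothetical and exist only to show the sign of the result.

```python
import statistics

def pearson_skewness(values):
    """Pearson's coefficient of skewness: 3 * (mean - median) / s."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    return 3 * (mean - median) / statistics.stdev(values)

symmetric = [1, 2, 3, 4, 5]      # hypothetical data: mean equals median
right_tailed = [1, 2, 3, 4, 20]  # hypothetical data: one large value

print(pearson_skewness(symmetric))     # zero, no skewness
print(pearson_skewness(right_tailed))  # positive, mean pulled above median
```

A single large value pulls the mean above the median, so the coefficient turns positive; a long left tail would make it negative.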

SUMMARY OF CHARTS

RELATIONSHIP BETWEEN TWO VARIABLES

• Bivariate data: A collection of paired data values.

• Scatter diagram: A graph in which paired data values are plotted on X and Y axes.

• The steps to follow in developing a scatter diagram are:

– We need two variables.

– We scale one variable (x) along the horizontal axis (X-axis) of a graph and the corresponding variable (y) along the vertical axis (Y-axis).

– Place a dot for each (x, y) pair of observations.
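
The steps above can be sketched as a plain-text scatter diagram. The paired values below are hypothetical and are not taken from the slides; the function `scatter_rows` is likewise invented for illustration and only handles whole-number coordinates.

```python
# Hypothetical (x, y) pairs, for illustration only.
pairs = [(1, 2), (2, 3), (3, 3), (4, 5), (5, 6)]

def scatter_rows(points):
    """Return text rows with y on the vertical axis and x on the horizontal."""
    marked = set(points)
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    rows = []
    for y in range(max(ys), min(ys) - 1, -1):  # top of the Y-axis first
        cells = "".join(
            "*" if (x, y) in marked else " "
            for x in range(min(xs), max(xs) + 1)
        )
        rows.append(f"{y:>3} | {cells}".rstrip())
    return rows

print("\n".join(scatter_rows(pairs)))
```

Each "*" marks one (x, y) pair. Points that rise from lower left to upper right, as here, suggest that the two variables increase together.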

GRAPH

CONTINGENCY TABLE

• A table used to classify sample observations according to two or more identifiable characteristics.

• When we study the relationship between two or more variables, one or both of which are nominal or ordinal scale, we tally the results into a two-way table. This two-way table is referred to as a contingency table.

CONTINGENCY TABLE

                        Gender
Bought Lunch     Boys    Girls    Total
0 up to 10        10        5       15
10 up to 20       20       25       45
Total             30       30       60

CONTINGENCY TABLE

• A contingency table is a cross tabulation that simultaneously summarizes two variables of interest and their relationship.

• A survey of 60 school children classified each as to gender and the number of times lunch was purchased at school during a four-week period. Each respondent is classified according to two criteria – the number of times lunch was purchased and gender.
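
As an illustration of the tallying, the sketch below rebuilds one record per child from the cell counts in the table and then cross-tabulates them. The raw survey records are not in the source, so the `records` list is a reconstruction from the table, and `cross_tabulate` is a name chosen for this example.

```python
from collections import Counter

# Cell counts copied from the contingency table: (lunches bought, gender).
cell_counts = {
    ("0 up to 10", "Boys"): 10,
    ("0 up to 10", "Girls"): 5,
    ("10 up to 20", "Boys"): 20,
    ("10 up to 20", "Girls"): 25,
}

# Reconstructed records: one (lunches, gender) pair per child surveyed.
records = [key for key, count in cell_counts.items() for _ in range(count)]

def cross_tabulate(rows):
    """Tally cells, row totals, and column totals from paired records."""
    cells = Counter(rows)
    row_totals = Counter(lunches for lunches, _ in rows)
    column_totals = Counter(gender for _, gender in rows)
    return cells, row_totals, column_totals

cells, row_totals, column_totals = cross_tabulate(records)
print(dict(row_totals))     # totals by number of lunches bought
print(dict(column_totals))  # totals by gender
```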