In this chapter, we will look at some charts and graphs used to summarize quantitative data. We will also look at numerical analysis of such data.

Post on 17-Dec-2015


Chapter 3: Summarizing Quantitative Data

Stem and Leaf Plots

A way of listing all data values in a condensed format:

while not required, it helps to have the data sorted

choose the digit to be the stem (10’s place, 100’s place…)

put the stems in increasing (or decreasing) order in a column

next to each stem, put leaves in increasing order, left to right
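The steps above can be sketched in Python. This is a minimal illustration, not the course's method: the function name stem_and_leaf is made up here, the code assumes non-negative whole-number data with the 10's digit as the stem, and because the "ACSC" wingspans are not reproduced in this transcript, it borrows the female heights listed in Example 5.

```python
def stem_and_leaf(values):
    """Return the rows of a stem and leaf display (stem = 10's digit)."""
    rows = {}
    for v in sorted(values):            # sorting puts leaves in increasing order
        stem, leaf = divmod(v, 10)      # e.g. 72 -> stem 7, leaf 2
        rows.setdefault(stem, []).append(leaf)
    lines = []
    # List every stem from smallest to largest, even ones with no leaves.
    for stem in range(min(rows), max(rows) + 1):
        leaves = " ".join(str(leaf) for leaf in rows.get(stem, []))
        lines.append(f"{stem} | {leaves}".rstrip())
    return lines

# Heights of females from Example 5(a), used here only as sample input.
heights = [61, 62, 62, 63, 63, 64, 65, 65, 66, 66, 69, 70, 70, 70, 72]
for line in stem_and_leaf(heights):
    print(line)
# 6 | 1 2 2 3 3 4 5 5 6 6 9
# 7 | 0 0 0 2
```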

Example 1

Construct a stem and leaf display for wingspans in “ACSC” using the 10’s digit as the stem.

Stem and Leaf Plots: Repeated Stems

Sometimes, if the data are clumped together in a small range of values, we use repeated stems – that is, each stem is listed twice.

next to the first copy of the stem, all leaves from the lower half of the possible leaf values (0 through 4) are listed

next to the second copy of the stem, all leaves from the upper half of the possible leaf values (5 through 9) are listed
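A sketch of repeated stems in Python follows. It assumes whole-number data, the 10's digit as the stem, and leaves 0 through 4 as the lower half; the function name split_stem_and_leaf is hypothetical, and the sample input is the female heights from Example 5 rather than the "ACSC" wingspans, which are not included in this transcript.

```python
def split_stem_and_leaf(values):
    """Stem and leaf rows with each stem listed twice (leaves 0-4, then 5-9)."""
    rows = {}
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        half = 0 if leaf <= 4 else 1     # 0 = first copy, 1 = second copy
        rows.setdefault((stem, half), []).append(leaf)
    stems = [stem for stem, _ in rows]
    lines = []
    for stem in range(min(stems), max(stems) + 1):
        for half in (0, 1):              # print both copies of every stem
            leaves = " ".join(str(x) for x in rows.get((stem, half), []))
            lines.append(f"{stem} | {leaves}".rstrip())
    return lines

# Heights of females from Example 5(a), used here only as sample input.
heights = [61, 62, 62, 63, 63, 64, 65, 65, 66, 66, 69, 70, 70, 70, 72]
for line in split_stem_and_leaf(heights):
    print(line)
# 6 | 1 2 2 3 3 4
# 6 | 5 5 6 6 9
# 7 | 0 0 0 2
# 7 |
```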

Example 2

Construct a stem and leaf display for wingspans in “ACSC” using the 10’s digit as the stem and using repeated stems.

Histograms

The quantitative data equivalent of a bar chart:

the horizontal axis has the possible values of the variable

the width of each rectangle is called the class interval or bin width

the vertical axis should be appropriately scaled for representing either frequencies or relative frequencies

the height of each rectangle corresponds to the frequency or relative frequency of each interval

the lower value of each of the class intervals is included in the count but the upper value is not included
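The counting rule for class intervals (lower value included, upper value excluded) can be sketched as below. The function name bin_counts and the starting value of 60 are choices made for this illustration, and the input is the female heights from Example 5, since the wingspan data is not part of this transcript.

```python
def bin_counts(values, start, width):
    """Count values in intervals [lower, upper): lower included, upper excluded.

    Assumes every value is at least `start`.
    """
    n_bins = int((max(values) - start) // width) + 1
    counts = [0] * n_bins
    for v in values:
        counts[int((v - start) // width)] += 1   # floor division picks the bin
    return [(start + i * width, start + (i + 1) * width, c)
            for i, c in enumerate(counts)]

# Heights of females from Example 5(a), used here only as sample input.
heights = [61, 62, 62, 63, 63, 64, 65, 65, 66, 66, 69, 70, 70, 70, 72]

for lower, upper, count in bin_counts(heights, start=60, width=5):
    rel = count / len(heights)                   # relative frequency
    print(f"[{lower}, {upper}): frequency {count}, relative frequency {rel:.2f}")
```

Note that a height of exactly 70 is counted in the interval starting at 70, not in the one ending at 70.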

Example 3

Construct a histogram for wingspans in “ACSC” with bins 10 wide.

Example 4

Construct a histogram for wingspans in “ACSC” with bins 5 wide.

Histogram Shape: Modes

The humps in a histogram are called modes.

If the histogram has one distinct hump, it is called unimodal.

If the histogram has two distinct humps, it is called bimodal.

If the histogram has three or more humps, it is called multimodal.

If the histogram has no clear modes (all rectangles are about the same height), then it is called uniform.

Histogram Shape: Symmetric?

If there exists a vertical line that could be drawn through the “middle” of the histogram such that both the right and the left sides are pretty close to the same, the distribution is called symmetric.

Histogram Shape: Skewed?

If one side of the histogram is stretched out farther than the other, then the histogram is said to be skewed in the direction of the longer tail.

(Figure: an example histogram that is skewed to the left, with its longer tail on the left side.)

Histogram Shape: Outliers?

Any observation that stands away from the body of the distribution could be an outlier.

Numerical Analysis: Non-Symmetric Data

Center:

If the data is non-symmetric, its center is measured as the median of the set.

The median of a data set is the middle value of the ordered set

If n is odd, the median is the value that cuts the list in half

If n is even, the median is the average of the two middle values

Example 5

Find the median of the given data sets. The first is the heights of females in “ACSC” while the second is the heights of females with brown hair in “ACSC”.

(a) 61 62 62 63 63 64 65 65 66 66 69 70 70 70 72

(b) 62 63 64 66 69 70
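As a check on the hand calculation, the odd and even cases of the median rule can be written out in Python. This is a small sketch; the function and variable names are chosen for this illustration only.

```python
def median(values):
    """Middle value of the ordered data; average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                          # n odd: single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2     # n even: average the middle pair

heights_all = [61, 62, 62, 63, 63, 64, 65, 65, 66, 66, 69, 70, 70, 70, 72]  # (a), n = 15
heights_brown = [62, 63, 64, 66, 69, 70]                                    # (b), n = 6

print(median(heights_all))    # 65, the 8th of 15 ordered values
print(median(heights_brown))  # 65.0, the average of 64 and 66
```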

Numerical Analysis: Non-Symmetric Data

Spread:

The range of a data set is the difference between the maximum and minimum values.

The interquartile range (IQR) is the range of the middle 50% of the data set

The lower quartile (LQ or Q1) is the median of the lower half of the data

The upper quartile (UQ or Q3) is the median of the upper half of the data

IQR = UQ – LQ

Example 6

Find the range and interquartile range of the heights of females in “ACSC”.

61 62 62 63 63 64 65 65 66 66 69 70 70 70 72
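The range and IQR for this data set can be checked with a short Python sketch. One assumption is made loudly here: when n is odd, the median itself is left out of both halves before the quartiles are found. Other conventions exist and can give slightly different quartiles. The function name quartiles is hypothetical.

```python
from statistics import median

def quartiles(values):
    """Return (LQ, UQ) as medians of the lower and upper halves.

    Assumption: for odd n, the overall median is excluded from both halves.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    lower_half = ordered[:half]
    upper_half = ordered[-half:]
    return median(lower_half), median(upper_half)

heights = [61, 62, 62, 63, 63, 64, 65, 65, 66, 66, 69, 70, 70, 70, 72]

data_range = max(heights) - min(heights)
lq, uq = quartiles(heights)
iqr = uq - lq

print("range =", data_range)          # 72 - 61 = 11
print("LQ =", lq, "UQ =", uq)         # 63 and 70
print("IQR =", iqr)                   # 70 - 63 = 7
```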

Numerical Analysis: 5-Number Summary

The five values: min, Q1, median, Q3, and max are called the 5-number summary of a data set.

These can be found by hand as described in the previous slides, or using technology.
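As one example of "using technology" other than the calculator, the 5-number summary can be sketched in Python. The function name five_number_summary is made up for this illustration, and the quartiles assume the median is excluded from both halves when n is odd; software that uses a different quartile convention may report slightly different Q1 and Q3.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max).

    Assumption: for odd n, the median is excluded from both halves.
    Needs at least two values.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    q1 = median(ordered[:half])       # median of the lower half
    q3 = median(ordered[-half:])      # median of the upper half
    return ordered[0], q1, median(ordered), q3, ordered[-1]

# Data sets from Example 5.
heights_all = [61, 62, 62, 63, 63, 64, 65, 65, 66, 66, 69, 70, 70, 70, 72]
heights_brown = [62, 63, 64, 66, 69, 70]

print(five_number_summary(heights_all))    # (61, 63, 65, 70, 72)
print(five_number_summary(heights_brown))  # (62, 63, 65.0, 69, 70)
```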

Numerical Analysis: 5-Number Summary via TI 83/84

• press STAT, then ENTER (to choose 1:Edit), and then enter the data in L1

• press STAT, arrow right to the CALC menu, and press ENTER to select 1-Var Stats

• press ENTER to perform the command, then scroll down to see results

Boxplot

Once we have the 5-number summary of a quantitative data set, we can represent the data set in a boxplot.

(Figure: a boxplot drawn above a numerically scaled axis.)

Boxplot: Details

(Figure: a boxplot above a numerically scaled axis, with its parts labeled A through F.)

A = lower quartile

B = median

C = upper quartile

D = Lower Fence = smallest data value that is at least LQ – 1.5(IQR)

E = Upper Fence = largest data value that is at most UQ + 1.5(IQR)

F = Outliers = values greater than the upper fence or less than the lower fence
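The fence and outlier rules can be sketched in Python as follows. Several things here are assumptions for illustration: the function name boxplot_fences, the quartile convention (median excluded from both halves when n is odd), and the extra value 85, which is a hypothetical observation added only to show an outlier. The shoe sizes from "ACSC" are not in this transcript, so the female heights from Example 5 are used instead.

```python
from statistics import median

def boxplot_fences(values):
    """Return (lower fence D, upper fence E, outliers F) as defined above.

    Assumption: for odd n, the median is excluded from both halves.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    lq = median(ordered[:half])
    uq = median(ordered[-half:])
    iqr = uq - lq
    low_limit = lq - 1.5 * iqr
    high_limit = uq + 1.5 * iqr
    inside = [v for v in ordered if low_limit <= v <= high_limit]
    outliers = [v for v in ordered if v < low_limit or v > high_limit]
    # D is the smallest value inside the limits, E the largest.
    return inside[0], inside[-1], outliers

heights = [61, 62, 62, 63, 63, 64, 65, 65, 66, 66, 69, 70, 70, 70, 72]
print(boxplot_fences(heights))          # (61, 72, []): limits are 52.5 and 80.5

# 85 is a made-up value, not from the data set, added to show an outlier.
print(boxplot_fences(heights + [85]))   # (61, 72, [85])
```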

Example 7

Construct a boxplot for shoe sizes in “ACSC”.

Numerical Analysis: Symmetric Data

Center:

If the data is symmetric, its center is measured as the sample mean of the set.

The sample mean of a data set is the average of the values: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$

the population mean is denoted $\mu$

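Numerical Analysis: Symmetric Data

Spread:

• the sample variance is calculated as $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$, where $\bar{x}$ is the sample mean

• the population variance is calculated as $\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2$, where $\mu$ is the population mean

For our purposes, this measurement will rarely be used.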

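Numerical Analysis: Symmetric Data

Spread:

The standard deviation is the positive square root of the variance, and it is a measurement of how, on average, observations vary from the mean.

• sample standard deviation is $s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$

• population standard deviation is $\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2}$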
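The usual textbook formulas for mean, variance, and standard deviation can be written out directly in Python. This is a sketch for checking hand work: the function names are chosen here for illustration, and the input is the female heights from Example 5, because the wingspan data is not included in this transcript. The sample versions divide by n - 1 and the population versions divide by the number of values.

```python
import math

def mean(values):
    """Average of the values."""
    return sum(values) / len(values)

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    m = mean(values)
    return sum((x - m) ** 2 for x in values) / (len(values) - 1)

def population_variance(values):
    """Sum of squared deviations from the mean, divided by the number of values."""
    m = mean(values)
    return sum((x - m) ** 2 for x in values) / len(values)

def sample_std(values):
    """Positive square root of the sample variance."""
    return math.sqrt(sample_variance(values))

def population_std(values):
    """Positive square root of the population variance."""
    return math.sqrt(population_variance(values))

heights = [61, 62, 62, 63, 63, 64, 65, 65, 66, 66, 69, 70, 70, 70, 72]

print("mean:", round(mean(heights), 3))
print("sample variance:", round(sample_variance(heights), 3))
print("sample standard deviation:", round(sample_std(heights), 3))
print("population standard deviation:", round(population_std(heights), 3))
```

Python's standard statistics module offers the same quantities as statistics.mean, statistics.variance, statistics.pvariance, statistics.stdev, and statistics.pstdev.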

Numerical Analysis: Symmetric Data

Both mean and standard deviation can be found by hand using these formulas.

It is much more common to use technology (the calculator for our purposes).

The 1-Var Stats command introduced earlier for the 5-number summary of a data set also has the mean and standard deviation.

Example 8

Find the mean, variance, standard deviation, and the 5-number summary of wingspans from the data in “ACSC”.
