
Wednesday, August 11 (131 minutes) - Lee County Schools · Web viewBlackberry Storm 330 Motorola Cliq 360 Samsung Moment 330 Blackberry Tour 300 HTC Droid 460 Ex 2: Time Spent on

Jan 20, 2021


dariahiddleston

AP STATISTICS: Chapter 1 Exploring Data    Name ___________________________
Intro: Making Sense of Data    Date _____________ Period ________

What is statistics?

What is data analysis?

Definition:
Individuals-

Variable-

Ex 1: Identify the individuals and variables for a high school’s student database.

2 questions to ask when you 1st meet a new set of data:
1.

2.

Definition:
Categorical Variable-

Quantitative variable-

Ex 2: Identify categorical and quantitative variables for a high school’s student database.

Do we ever use numbers to describe the values of a categorical variable? Give some examples.


EX 3:

a) Who are the individuals in this data set?

b) What variables were used? Identify as categorical or quantitative.

c) Describe the individual in the highlighted row.

Rows vs Columns:

What is a distribution?

How to explore data:

1.1 Analyzing Categorical Data

Pie Chart: Use this type of graph when you want to emphasize each category’s relation to the whole (displays categorical data).

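Bar Graph: These graphs display the distribution of a categorical variable. Use this type of graph when you want to compare the counts or percents of the categories with one another; unlike a pie chart, a bar graph does not require the categories to make up a whole.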

Two-Way Table: a table that describes two categorical variables in counts or percents.

What is the difference between a frequency table and a relative frequency table?


Ex 1: What Personal Media Do You Own?

Here are the percents of 15-18 year olds who own each of the following personal media devices, according to the Kaiser Family Foundation:

Device                        Percent who Own
Cell Phone                    85%
MP3 Player                    83%
Handheld Video Game Player    41%
Laptop                        38%
Portable CD/Tape Player       20%

a) Make a well labeled bar graph to display the data. Describe what you see.

b) Would it be appropriate to make a pie chart for this data? Why or why not?

What are some common ways to make a misleading graph? (pp. 11&12)

What is wrong with the following graph?

The marginal distribution of one of the categorical variables in a two-way table of counts is the distribution of values of that variable among all individuals described by the table.

A conditional distribution of a variable describes the values of that variable among individuals who have a specific value of another variable. There is a separate conditional distribution for each value of the other variable.

We say that there is an association between two variables if specific values of one variable tend to occur in common with specific values of the other.


------------------------------------------------------------
Ex 2: Cell phones

The Pew Research Center asked a random sample of 2024 adult cell phone owners from the United States which type of cell phone they own: iPhone, Android, or other (including non-smart phones). Here are the results, broken down by age category.
http://pewinternet.org/Reports/2013/Smartphone-Ownership-2013.aspx

          18-34   35-54   55+   Total
iPhone      169     171   127
Android     214     189   100
Other       134     277   643
Total

Questions:
1. Use the cell phone data to calculate the marginal distribution (in percents) of type of cell phone.
2. Make a graph to display the marginal distribution. Describe what you see.
3. What percent of each age group is an iPhone user (conditional distribution)?
4. Make a side-by-side bar graph of cell phone type by age group.
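One way to check hand calculations for questions 1 and 3 is a short Python sketch like the one below. It is not part of the original notes; the variable names are made up for illustration, and the counts are copied from the table above.

```python
# Counts from the Pew two-way table (rows: phone type, columns: age group).
counts = {
    "iPhone":  {"18-34": 169, "35-54": 171, "55+": 127},
    "Android": {"18-34": 214, "35-54": 189, "55+": 100},
    "Other":   {"18-34": 134, "35-54": 277, "55+": 643},
}
ages = ["18-34", "35-54", "55+"]

# Totals for the margins: across each row, down each column, and overall.
row_totals = {phone: sum(by_age.values()) for phone, by_age in counts.items()}
col_totals = {age: sum(counts[phone][age] for phone in counts) for age in ages}
grand_total = sum(row_totals.values())

# Marginal distribution of phone type: each row total as a percent of everyone.
marginal = {phone: 100 * total / grand_total for phone, total in row_totals.items()}

# Conditional distribution of phone type within each age group:
# each cell as a percent of its own column total.
conditional = {
    age: {phone: 100 * counts[phone][age] / col_totals[age] for phone in counts}
    for age in ages
}

print("Grand total:", grand_total)
for phone, pct in marginal.items():
    print(f"{phone}: {pct:.1f}% of all owners")
for age in ages:
    print(f"iPhone share among ages {age}: {conditional[age]['iPhone']:.1f}%")
```

The grand total should come out to 2024, matching the sample size in the problem statement, which is a useful check that the table was copied correctly.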

1.2 Displaying Quantitative Data with Graphs

Stemplot: Displays distributions of quantitative data. These graphs work best for small numbers of observations that are all greater than 0. This will give you a quick picture of the shape of a distribution while including the actual numerical values in the graphs. When you wish to compare two related distributions, a back-to-back stemplot with common stems is useful.

Histograms: This graph shows the distribution of counts or percents among the values of a single quantitative variable.
• Classes (or bins or bars) should be of equal width. A good rule of thumb is to use a minimum of five classes.
• Be sure to pay attention to whether you are reading/creating a frequency histogram (number in a group) or a percent histogram.

Dotplot: One of the simplest ways to graphically represent quantitative data.


How to Examine the Distribution of a Quantitative Variable

Ex 1: Smart Phone Battery Life
Here is the estimated battery life for each of 9 different smart phones (in minutes). Make a dotplot of the data and describe what you see.

Ex 2: Time Spent on Internet: Graph the data using a histogram.

Time on the Internet (min)    Frequency
0                             7
10                            1
20                            3
30                            7
40                            1
45                            1
60                            15
90                            3
120                           14
180                           10
210                           1
240                           10
270                           2
300                           9
360                           3

Definition: Symmetric and Skewed Distributions (describing shape)

Smart Phone Battery Life (minutes)
Apple iPhone        300
Motorola Droid      385
Palm Pre            300
Blackberry Bold     360
Blackberry Storm    330
Motorola Cliq       360
Samsung Moment      330
Blackberry Tour     300
HTC Droid           460
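A rough text dotplot of the battery-life data can be sketched in Python as follows. This is only an illustration, not part of the original notes: it turns the dotplot on its side, one row per distinct value and one `*` per phone, and the variable names are made up.

```python
from collections import Counter

# Battery life in minutes for the 9 phones listed above.
battery_minutes = [300, 385, 300, 360, 330, 360, 330, 300, 460]

# Count how many phones share each battery-life value.
dots = Counter(battery_minutes)

# One row per distinct value, one star per phone.
for value in sorted(dots):
    print(f"{value:>4} | {'*' * dots[value]}")
```

Note that the rows are not spaced to scale, so the gap between 385 and 460 looks smaller here than it would on a real number line.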


Illustrate the following distribution shapes: Symmetric Skewed right Skewed left

Unimodal Bimodal Uniform

What is the most important thing to remember when you are asked to compare two distributions?

Ex 3: Energy Cost: Top vs. Bottom Freezers
How do the annual energy costs (in dollars) compare for refrigerators with top freezers and refrigerators with bottom freezers? The data below are from the May 2010 issue of Consumer Reports.

1.3 Describing Quantitative Data with Numbers

CENTER: Mean or Median
Mean: average value of a population (μ, "mu") or sample (x̄, "x-bar"). BE CAREFUL: the mean is sensitive to the influence of a few extreme observations! We say that the mean is not a resistant measure of center, so we use it to describe the center of roughly symmetric distributions.

Median: middle value of a data set when the data are in order from least to greatest. It is the value such that 50% of the data falls below and 50% falls above (which makes it the 50th percentile). The median is more resistant to extreme observations than the mean, so it is used to describe the center of skewed distributions. Sometimes, the median is referred to as Q2.

Comparing the Mean and Median: The mean and the median of a roughly symmetric distribution are close together. If the distribution is exactly symmetric, the mean and median will be exactly the same. In a skewed distribution, the mean is farther out in the long tail than the median.
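To see what "resistant" means in practice, the sketch below reuses the nine smart phone battery lives from Section 1.2 and recomputes the mean and median after dropping the largest value (460). It is an illustration only, not part of the original notes, and the variable names are made up.

```python
from statistics import mean, median

# Battery life in minutes for the 9 phones from Section 1.2.
battery_minutes = [300, 385, 300, 360, 330, 360, 330, 300, 460]

# Same data with the single largest observation (460) removed.
without_max = sorted(battery_minutes)[:-1]

print(f"all 9 phones: mean = {mean(battery_minutes):.1f}, median = {median(battery_minutes)}")
print(f"without 460:  mean = {mean(without_max):.1f}, median = {median(without_max)}")
```

The mean drops by about 14 minutes when the one extreme phone is removed, while the median stays at 330. With all nine phones the mean sits above the median, which is what the notes predict for a distribution with a long right tail.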

SPREAD: Interquartile Range
Range is the difference between the maximum observation and the minimum observation (range = max - min).

The first quartile (Q1) is the value in a data set such that 25% of the data falls below it (25th percentile).


10 30 5 25 40 20 10 15 30 20
15 20 85 15 65 15 60 60 40 45

The third quartile (Q3) is the value in a data set such that 75% of the data falls below it (75th percentile).

To calculate the quartiles:
1. Arrange the observations in increasing order and locate the median M.
2. The first quartile Q1 is the median of the first half of the observations (the observations to the left of M).
3. The third quartile Q3 is the median of the last half of the observations (the observations to the right of M).

The interquartile range is the distance between the quartiles (Q3-Q1) and is a measure of spread that is more resistant to outliers than the range (IQR = Q3-Q1). IQR is used as a measure of spread for skewed distributions (along with median for center).

Example 1: People say that it takes a long time to get to work in New York State due to the heavy traffic near big cities. What do the data say? Here are the travel times in minutes of 20 randomly chosen New York workers:

1. Make a stemplot of the data. Be sure to include a key.
2. Find and interpret the median of the travel times.
3. Find and interpret the IQR of travel times.
4. Find the mean of the travel times. How does the mean compare to the median? What does this confirm for you about the shape of the distribution of travel times?
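The quartile procedure described in this section (median of each half) can be sketched in Python to check hand work on Example 1. This sketch is not part of the original notes. It assumes the twenty values printed earlier in these notes are the Example 1 travel times, and the function name `quartiles` is made up. Library quartile functions often use a different rule, so the halves are split by hand here.

```python
from statistics import mean, median

def quartiles(values):
    """Return (Q1, M, Q3) using the median-of-halves method. Needs at least 2 values."""
    ordered = sorted(values)
    half = len(ordered) // 2       # the middle value is left out of both halves when n is odd
    lower = ordered[:half]         # observations to the left of M
    upper = ordered[-half:]        # observations to the right of M
    return median(lower), median(ordered), median(upper)

# Assumed to be the 20 New York travel times (minutes) from Example 1.
travel_minutes = [10, 30, 5, 25, 40, 20, 10, 15, 30, 20,
                  15, 20, 85, 15, 65, 15, 60, 60, 40, 45]

q1, m, q3 = quartiles(travel_minutes)
iqr = q3 - q1
print(f"Q1 = {q1}, median = {m}, Q3 = {q3}, IQR = {iqr}")
print(f"mean = {mean(travel_minutes)}")
```

Under that assumption the mean (31.25 minutes) is well above the median (22.5 minutes), which is the pattern expected when the distribution has a long right tail.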

UNUSUAL FEATURES: Identifying Outliers
An observation is called an outlier if it falls more than 1.5 x IQR above the third quartile or below the first quartile.

upper bound = Q3 + 1.5 (IQR)
lower bound = Q1 - 1.5 (IQR)

Example 2: Determine if the distribution of travel times from Example 1 has an outlier. Show your calculations and justify your answer.
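The 1.5 x IQR rule can be checked with a few lines of Python. This is a sketch, not part of the original notes. The helper name `iqr_fences` is made up, the twenty values are assumed to be the Example 1 travel times, and Q1 = 15 and Q3 = 42.5 are the quartiles that the median-of-halves method gives for those values.

```python
def iqr_fences(q1, q3):
    """Return (lower bound, upper bound) from the 1.5 x IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Assumed to be the 20 New York travel times (minutes) from Example 1.
travel_minutes = [10, 30, 5, 25, 40, 20, 10, 15, 30, 20,
                  15, 20, 85, 15, 65, 15, 60, 60, 40, 45]

# Quartiles of the travel times by the median-of-halves method.
lower_bound, upper_bound = iqr_fences(q1=15, q3=42.5)

# Any observation outside the bounds is flagged as an outlier.
outliers = [x for x in travel_minutes if x < lower_bound or x > upper_bound]

print(f"lower bound = {lower_bound}, upper bound = {upper_bound}")
print("outliers:", outliers)
```

A negative lower bound simply means no travel time can be a low outlier, since times cannot be below 0.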


The Five-Number Summary and Boxplots
The five-number summary of a set of observations consists of the minimum, first quartile (Q1), the median M (Q2), third quartile (Q3), and the maximum written in order from smallest to largest.

Min Q1 M Q3 Max

These five numbers divide each distribution roughly into quarters. About 25% of the data values fall between the minimum and Q1, about 25% are between Q1 and the median, about 25% are between the median and Q3, and about 25% are between Q3 and the maximum. This five-number summary leads us to a new graph, the boxplot (aka “box and whisker plot”).

● A central box spans the quartiles Q1 and Q3.
● A line in the box marks the median, M.
● Lines extend from the box out to the smallest and largest observations.

Example 3: Create a box and whisker plot of the distribution of travel times from Example 1.

Example 4: The 2009 roster of the Dallas Cowboys professional football team included 10 offensive linemen. Their weights (in pounds) were

338 318 353 313 318 326 307 317 311 311

1. Find the five-number summary for these data using the calculator.
2. Calculate the IQR. Interpret this value in context.
3. Determine whether there are any outliers using the 1.5 x IQR rule.
4. Draw a box plot of the data.
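As a cross-check on the calculator output for Example 4, the five-number summary can be sketched in Python. This is an illustration, not part of the original notes. The function name `five_number_summary` is made up, and the quartiles are found as medians of the lower and upper halves, as described earlier in this section.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, M, Q3, max); Q1 and Q3 are medians of the lower and upper halves."""
    ordered = sorted(values)
    half = len(ordered) // 2  # the middle value is left out of both halves when n is odd
    return (ordered[0], median(ordered[:half]), median(ordered),
            median(ordered[-half:]), ordered[-1])

# Weights (pounds) of the 10 offensive linemen from Example 4.
weights = [338, 318, 353, 313, 318, 326, 307, 317, 311, 311]

minimum, q1, m, q3, maximum = five_number_summary(weights)
print("Min, Q1, M, Q3, Max:", minimum, q1, m, q3, maximum)
print("IQR =", q3 - q1)
```

The box in the boxplot would run from Q1 to Q3, with a line at M.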

SPREAD: Standard Deviation
The standard deviation measures the average distance of the observations (data points) from their mean. It is calculated by finding an average of the squared distances and then taking the square root. This average squared distance is called the variance. Standard deviation should be used as the measure of spread for symmetric distributions (along with mean for center).


Males: 127 44 28 83 0 6 78 5 213 73 20 214 28 11
Females: 112 203 102 54 379 305 179 24 127 65 41 27 298 6 130 0

From the AP Formula Sheet: $s_x = \sqrt{\frac{1}{n-1}\sum (x_i - \bar{x})^2}$

Variance (not on the AP Formula Sheet) = $s_x^2$

Example 5: The heights (in inches) of the five starters on a basketball team are 67, 72, 76, 76, and 84.
1. Find and interpret the mean.
2. Make a table that shows, for each value, its deviation from the mean and its squared deviation from the mean.
3. Show how to calculate the variance and standard deviation from the values in your table.
4. Interpret the meaning of the standard deviation in this setting.
5. Use your calculator to confirm the standard deviation of the heights. What is the difference between σ and s?
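The table-based calculation in Example 5 can be sketched in Python as a check on hand work. This is an illustration, not part of the original notes, and the variable names are made up. It builds the deviation table step by step, divides by n - 1 as in the formula-sheet definition of s_x, and then compares against the standard library's `stdev` (divides by n - 1) and `pstdev` (divides by n, the population version σ).

```python
from math import sqrt
from statistics import pstdev, stdev

# Heights (inches) of the five starters from Example 5.
heights = [67, 72, 76, 76, 84]
n = len(heights)
x_bar = sum(heights) / n

# Table of deviations and squared deviations from the mean.
print("value  deviation  squared deviation")
squared_devs = []
for x in heights:
    dev = x - x_bar
    squared_devs.append(dev ** 2)
    print(f"{x:>5}  {dev:>9.0f}  {dev ** 2:>17.0f}")

# Sample variance divides the sum of squared deviations by n - 1.
variance = sum(squared_devs) / (n - 1)
s_x = sqrt(variance)

print(f"mean = {x_bar}, variance = {variance}, s_x = {s_x:.2f}")
print(f"library check: s = {stdev(heights):.2f}, sigma = {pstdev(heights):.2f}")
```

The two library values differ only in the divisor: s uses n - 1 because the heights are treated as a sample, while σ uses n and treats them as the whole population.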

Choosing Measures of Center and Spread: We now have a choice between two descriptions of the center and spread of a distribution: the median and IQR, OR mean and standard deviation. So, how do we know which one to use?

The median and IQR work for everything! You definitely want to use them when describing a skewed distribution since the median and IQR are resistant to outliers.

Use the mean and standard deviation only for reasonably symmetric distributions that don’t have outliers.

Example 6: For their final project, a group of AP Statistics students investigated their belief that females text more than males. They asked a random sample of students from their school to record the number of text messages sent and received over a two-day period. Here are their data:

What conclusion should the students draw? Give appropriate evidence to support your answer.