STATISTICS 13 Lecture 2 Mar 31, 2010

STATISTICS 13 - University of California, Davis
anson.ucdavis.edu/~tcmlee/sta13a2010spr/lec02.pdf



What is Statistics?

As a colloquial term:
– Sports statistics (e.g., basketball scores, goals scored in a season)
– Business statistics (e.g., gross domestic product, total volume of transactions)

As a discipline: the science of making numerical conjectures about puzzling questions:
– What are the effects of a new drug?
– Why do parents and children show resemblance?
– Why does a casino make a profit at roulette?
– Who is going to win the next election?


What is Statistics? (Cont.)

A statistical study involves:
– Gathering data
– Analyzing data
– Making decisions under uncertainty


Example: Population of the US

Population statistics of the US (from the Population Division, U.S. Census Bureau)

Statistical analysis:
– Collect data: by census
– Analyze data: what is the trend of the population size in the US?
– Make predictions: what would the population size of the US be in 2010?

Year   Population
1990   249,464,396
1991   252,153,09
1992   255,029,699
1993   257,782,608
1994   260,327,021
1995   262,803,276
1996   265,228,572
1997   267,783,607
1998   270,248,003
1999   272,690,813


Example (cont.): “Linear trend” from 1990 to 1999

Displaying data in a graph helps us understand it.

[Figure: graph titled “US population”, years 1990 to 1999 on the horizontal axis, population from 235,000,000 to 275,000,000 on the vertical axis]
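The slides describe a “linear trend” and ask what the population would be in 2010, but they do not say how the line is obtained. A minimal sketch, assuming an ordinary least-squares line (our choice of technique, not necessarily the lecture's), fits the Census figures from the table and extrapolates to 2010. The 1991 entry is truncated in the source (252,153,09), so it is left out rather than guessed; `fit_line` is a hypothetical helper name.

```python
# US population by year, copied from the Census table above.
# The 1991 figure is truncated in the source, so it is omitted here.
population = {
    1990: 249_464_396,
    1992: 255_029_699,
    1993: 257_782_608,
    1994: 260_327_021,
    1995: 262_803_276,
    1996: 265_228_572,
    1997: 267_783_607,
    1998: 270_248_003,
    1999: 272_690_813,
}


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


years = sorted(population)
counts = [population[year] for year in years]
intercept, slope = fit_line(years, counts)

# Extrapolate the fitted line to 2010.
prediction_2010 = intercept + slope * 2010
print(f"estimated growth: {slope:,.0f} people per year")
print(f"predicted 2010 population: {prediction_2010:,.0f}")
```

The slope is the estimated yearly growth. Extrapolating 11 years past the last data point assumes the 1990s trend simply continues, which the data alone cannot confirm.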


Variables

A variable is a characteristic that varies from subject to subject and/or changes over time in a study.

Examples:
– Population size of the US (changes over time)
– Hair color (varies from person to person)
– Blood pressure (varies over time and from person to person)
– Car mileage (varies over time and from car to car)
– Annual rainfall in CA (varies from year to year)


Types of Variables

Variables
– Qualitative
– Quantitative
   – Discrete
   – Continuous


Types of Variables (cont.)

Qualitative variables measure a quality or a characteristic on a subject. Examples:
– Eye color (black, brown, blue, hazel, …)
– Gender (male, female)
– Make of car (GM, Honda, Toyota, …)
– State of birth (California, Pennsylvania, Alaska, …)


Types of Variables (cont.)

Quantitative variables measure a numerical quantity on a subject:
– Discrete if it assumes only a finite or countable number of different values
– Continuous if it can assume infinitely many values corresponding to the points on an interval of the real line


Examples

Quantitative discrete:
– Number of wins by a baseball team in a season (0, 1, 2, 3, etc.)
– Number of fire incidents on campus in a year (0, 1, 2, etc.)
– Number of customers of a store in a day

Quantitative continuous:
– Time it takes me to commute between home and school (between 5 min and 15 min)
– Time to complete a survey


Data

– An experimental unit is the subject on which a variable is measured.
– A measurement results when a variable is actually measured on an experimental unit.
– A set of measurements is called data.


Population vs. Sample

A population is the set of subjects of interest in the current study.
– The definition of the population depends on your study. For example, if we want to study the weight of US residents, then the population consists of all US residents; if we want to study the weight of Chinese people, then the population consists of all Chinese people.


Population vs. Sample (Cont.)

Studying the whole population is usually impractical. Only part of it can be examined, and this subset of the population is called the sample.
– Investigators want to generalize from the part to the whole; in other words, they want to make inferences from the sample to the population.
– To make inferences from the sample to the population, the sample needs to be representative of the population. Therefore, how to draw a sample is important.
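The slide stresses that how the sample is drawn matters but does not name a method. One standard way to aim for a representative sample, offered here as an illustration rather than the lecture's prescription, is a simple random sample, in which every subset of n units is equally likely to be chosen. A minimal sketch using Python's `random.Random.sample`; the population of 500 student ID numbers, the seed, and the helper name `simple_random_sample` are all hypothetical.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw n distinct units so that every subset of size n is equally likely."""
    rng = random.Random(seed)  # separate seeded generator for reproducible draws
    return rng.sample(population, n)


# Hypothetical population: ID numbers 1..500 for the enrolled students.
population = list(range(1, 501))
sample = simple_random_sample(population, 50, seed=13)
print(len(sample), sorted(sample)[:5])
```

Because every student has the same chance of selection, no group (for example, students who prefer early classes) is systematically favored, which is what a convenience sample such as a single discussion section cannot guarantee.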


Example 1: Trout

Study: the goal is to investigate the weight of trout in Lake Tahoe; 100 trout are captured and measured.
– Variable: weight (varies from fish to fish)
– Population: all trout in Lake Tahoe
– Sample: the 100 captured trout
– An experimental unit: one trout from the catch
– A measurement: the weight of that trout
– Data: weights of all 100 captured trout


Example 2

Study: the goal is to investigate the lifestyle of students in Statistics 13, Section A.
– Variables: variable 1, favorite movie genre (drama, comedy, horror, suspense, romance, etc.); variable 2, wake-up time (early bird / not early bird)
– Population: everyone in our class (500 enrolled!)
– Sample: students in section A01 (T: 8:00-9:00am, currently 50 students)
– Question: Is this sample representative of the population?


How Many Variables Have You Measured in Your Study?

Univariate data: One variable is measured on a single experimental unit

Bivariate data: Two variables are measured on a single experimental unit

Multivariate data: More than two variables are measured on a single experimental unit


Example

– “Trout weight” study: univariate data, one variable (weight)
– “Life style” study: bivariate data, two variables (favorite movie genre and wake-up time)

If we also measure the “favorite food” of the students in the “life style” study, then it becomes multivariate data.
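The distinction depends only on how many variables are recorded for each experimental unit. A minimal sketch, assuming each unit's measurements are stored as a Python dict with one key per variable; the field names and values are hypothetical, made up for illustration.

```python
def data_type(record):
    """Classify one experimental unit's record by how many variables it holds."""
    k = len(record)
    if k == 0:
        raise ValueError("a record must contain at least one variable")
    if k == 1:
        return "univariate"
    if k == 2:
        return "bivariate"
    return "multivariate"


# Hypothetical records; the values are invented for illustration only.
trout = {"weight_kg": 1.4}
student = {"favorite_genre": "comedy", "early_bird": False}
student_with_food = {**student, "favorite_food": "pizza"}  # adds a third variable

for name, record in [("trout", trout), ("student", student),
                     ("student_with_food", student_with_food)]:
    print(name, data_type(record))
```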


Summarize Data in a Statistical Table

Answer two questions:
– What values of the variable have been observed in your data?
– How often has each value occurred?

“How often” can be measured in terms of:
• Frequency: exact number of occurrences
• Relative frequency = Frequency / Total number of measurements
• Percent = Relative frequency × 100%


Summarize Data in a Table (cont.)

Data: A bag of M&Ms contains 25 candies: Bl, R, G, Br, Y, R, Bl, G, Y, O, O, G, Bl, G, Br, O, Bl, Y, R, Y, O, Bl, Br, O, Bl

Variable of interest: color of candy (varies from candy to candy)

Summarizing the information in a table:

Color    Tally                Frequency   Relative Frequency   Percent
Red      R R R                3           3/25 = .12           12%
Blue     Bl Bl Bl Bl Bl Bl    6           6/25 = .24           24%
Green    G G G G              4           4/25 = .16           16%
Orange   O O O O O            5           5/25 = .20           20%
Brown    Br Br Br             3           3/25 = .12           12%
Yellow   Y Y Y Y              4           4/25 = .16           16%
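The tally can be reproduced in code. A minimal sketch using Python's `collections.Counter`: frequency is the count, relative frequency is the frequency divided by the 25 candies, and percent is the relative frequency times 100. The helper name `frequency_table` is hypothetical.

```python
from collections import Counter

# The 25 candies from the bag, in the order listed above.
candies = ["Bl", "R", "G", "Br", "Y", "R", "Bl", "G", "Y", "O", "O", "G", "Bl",
           "G", "Br", "O", "Bl", "Y", "R", "Y", "O", "Bl", "Br", "O", "Bl"]


def frequency_table(data):
    """Return {value: (frequency, relative frequency, percent)}."""
    total = len(data)
    table = {}
    for value, freq in Counter(data).items():
        rel = freq / total              # relative frequency
        table[value] = (freq, rel, rel * 100)
    return table


table = frequency_table(candies)
for color, (freq, rel, pct) in sorted(table.items()):
    print(f"{color:>2}  {freq}  {rel:.2f}  {pct:.0f}%")
```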


Graphing Qualitative Variables

Pie Chart: circular graph that shows how the measurements are distributed among the categories

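The angle for each slice is 360° × relative frequency:
– Red: 360 × 0.12 = 43.2°
– Blue: 360 × 0.24 = 86.4°
– Green: 360 × 0.16 = 57.6°
– Orange: 360 × 0.20 = 72°
– Brown: 360 × 0.12 = 43.2°
– Yellow: 360 × 0.16 = 57.6°

[Figure: pie chart of candy colors: Green 16.0%, Orange 20.0%, Blue 24.0%, Red 12.0%, Yellow 16.0%, Brown 12.0%]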
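The angle arithmetic generalizes to any set of categories: each slice gets 360° times its relative frequency, so the angles always total 360°. A minimal sketch using the relative frequencies from the M&M table; the helper name `pie_angles` is hypothetical.

```python
# Relative frequencies from the M&M frequency table (frequency / 25).
relative_frequency = {
    "Red": 3 / 25, "Blue": 6 / 25, "Green": 4 / 25,
    "Orange": 5 / 25, "Brown": 3 / 25, "Yellow": 4 / 25,
}


def pie_angles(rel_freq):
    """Angle in degrees of each slice: 360 * relative frequency."""
    return {category: 360 * p for category, p in rel_freq.items()}


angles = pie_angles(relative_frequency)
for category, angle in angles.items():
    print(f"{category}: {angle:.1f} degrees")
print(f"total: {sum(angles.values()):.1f} degrees")
```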


Graphing Qualitative Variables (cont.)

Bar Chart: shows how the measurements are distributed among the categories with the height of the bar measuring how often a particular category is observed

[Figure: bar chart with Color (Green, Orange, Blue, Red, Yellow, Brown) on the horizontal axis and Frequency (0 to 6) on the vertical axis]
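A bar chart like this is normally drawn with plotting software. As a dependency-free stand-in (an assumption, not the lecture's tool), a text bar chart makes each bar's length proportional to the frequency, using the counts from the M&M table; the helper name `text_bar_chart` is hypothetical.

```python
# Frequencies from the M&M table, in the same order as the chart's categories.
frequencies = {"Green": 4, "Orange": 5, "Blue": 6, "Red": 3, "Yellow": 4, "Brown": 3}


def text_bar_chart(freqs, symbol="#"):
    """Return one line per category; bar length equals the frequency."""
    width = max(len(name) for name in freqs)  # align the bars
    return [f"{name:<{width}} | {symbol * count} {count}"
            for name, count in freqs.items()]


for line in text_bar_chart(frequencies):
    print(line)
```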


Example: US Population Groups

The American population over the past 50 years has been grouped by nicknames that try to describe common traits. According to a recent magazine article, the numbers of Americans in each of the four generational categories are shown in the table.

Generation                       Number of Americans (millions)
Matures (born before 1946)       68.3
Baby boomers (born 1946-1964)    77.6
GenXers (born 1965-1976)         44.6
Others (born after 1976)         72.4


Example (cont.)

What graphical methods could you use to describe the data?
– Either a bar chart or a pie chart would be appropriate.

When raw counts are given, people often use a bar chart; when the data are already presented as percentages of the whole group, a pie chart is a common choice.

Draw a bar chart:

[Figure: bar chart of Number of Americans (millions), axis from 0 to 90, by Generation: Matures, Baby Boomers, GenXers, Others]


Example (cont.)

Pie chart: first calculate the percentage of each category.

Generation                       Percentage
Matures (born before 1946)       26%
Baby boomers (born 1946-1964)    30%
GenXers (born 1965-1976)         17%
Others (born after 1976)         27%

[Figure: pie chart with slices Matures, Baby boomers, GenXers, Others]
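The percentages come from dividing each count by the total, 68.3 + 77.6 + 44.6 + 72.4 = 262.9 million, and multiplying by 100; each pie slice's angle is then 360° times the proportion. A minimal sketch of that arithmetic; the helper name `percentages` is hypothetical. To one decimal the shares are 26.0%, 29.5%, 17.0% and 27.5%. Note that 27.5% would usually round to 28%; the table's 27% for Others keeps the whole-number column summing to 100%.

```python
# Number of Americans (in millions) per generation, from the table above.
counts = {
    "Matures": 68.3,
    "Baby boomers": 77.6,
    "GenXers": 44.6,
    "Others": 72.4,
}


def percentages(data):
    """Percent of the total for each category: 100 * count / total."""
    total = sum(data.values())
    return {category: 100 * value / total for category, value in data.items()}


pct = percentages(counts)
for generation, p in pct.items():
    angle = 360 * p / 100  # pie-slice angle in degrees
    print(f"{generation}: {p:.1f}% ({angle:.1f} degrees)")
```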