STATISTICS 13 Lecture 2 Mar 31, 2010
What is Statistics?
As a colloquial term:
– Sports statistics (e.g. basketball scores, goals scored in a season)
– Business statistics (e.g. gross domestic product, total volume of transactions)
As a discipline: the science of making numerical conjectures about puzzling questions:
– What are the effects of a new drug?
– Why do parents and children resemble each other?
– Why does a casino make a profit at roulette?
– Who is going to win the next election?
What is Statistics? (Cont.)
A statistical study involves:
– Gathering data
– Analyzing data
– Making decisions under uncertainty
Example: Population of the US
Population statistics of the US (from Population Division, U.S. Census Bureau):

Year    Population
1990    249,464,396
1991    252,153,09
1992    255,029,699
1993    257,782,608
1994    260,327,021
1995    262,803,276
1996    265,228,572
1997    267,783,607
1998    270,248,003
1999    272,690,813

Statistical analysis:
– Collect data: by census
– Analyze data: what is the trend of the US population size?
– Make predictions: what will the US population size be in 2010?
Example (cont.)
“Linear trend” from 1990 to 1999: displaying data in a graph helps us understand it.
[Figure: US population by year, 1990 to 1999; vertical axis from 235,000,000 to 275,000,000]
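One possible way to put a number on the linear trend is an ordinary least-squares line. The sketch below is only an illustration, not necessarily the method used in the lecture. The helper name `fit_line` is hypothetical, and the 1991 figure is left out because it appears truncated in the data listed earlier.

```python
# Census figures from the example. 1991 is omitted because its value
# appears truncated in the listing (assumption: dropping it is acceptable
# for a rough trend estimate).
population = {
    1990: 249_464_396,
    1992: 255_029_699,
    1993: 257_782_608,
    1994: 260_327_021,
    1995: 262_803_276,
    1996: 265_228_572,
    1997: 267_783_607,
    1998: 270_248_003,
    1999: 272_690_813,
}


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


years = sorted(population)
slope, intercept = fit_line(years, [population[y] for y in years])

# Extrapolating the fitted line to 2010 is a prediction under uncertainty:
# it assumes the 1990s trend simply continues.
prediction_2010 = intercept + slope * 2010

print(f"Estimated growth per year: {slope:,.0f}")
print(f"Extrapolated 2010 population: {prediction_2010:,.0f}")
```

The slope estimates how many people the population gains per year, and the extrapolation is only as trustworthy as the assumption that the trend stays linear.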
Variables
A variable is a characteristic that varies from subject to subject and/or changes over time in a study.
Examples:
– Population size of the US (changes over time)
– Hair color (varies from person to person)
– Blood pressure (varies over time and from person to person)
– Car mileage (varies over time and from car to car)
– Annual rainfall in CA (varies from year to year)
Types of Variables (cont.)
Qualitative variables measure a quality or a characteristic on a subject.
Examples:
– Eye color (black, brown, blue, hazel, …)
– Gender (Male, Female)
– Make of car (GM, Honda, Toyota, …)
– State of birth (California, Pennsylvania, Alaska, …)
Types of Variables (cont.)
Quantitative variables measure a numerical quantity on a subject.
– Discrete if it can assume only a finite or countable number of different values
– Continuous if it can assume infinitely many values corresponding to the points on an interval of the real line
Examples
Quantitative discrete:
– Number of wins by a baseball team in a season (0, 1, 2, 3, etc.)
– Number of fire incidents on campus in a year (0, 1, 2, etc.)
– Number of customers of a store in a day
Quantitative continuous:
– Time it takes me to commute between home and school (between 5 min and 15 min)
– Time to complete a survey
Data
An experimental unit is the subject on which a variable is measured.
A measurement results when a variable is actually measured on an experimental unit.
A set of measurements is called data.
Population vs. Sample
A population is the set of subjects of interest in the current study.
– The definition of the population depends on your study. For example, if we want to study the weight of US people, the population consists of all US residents; if we want to study the weight of Chinese people, the population consists of all Chinese people.
Population vs. Sample (Cont.)
Studying the whole population is usually impractical. Only part of it can be examined, and this subset of the population is called the sample.
– Investigators want to make generalizations from the part to the whole; in other words, they want to make inferences from the sample to the population.
– To make inferences from the sample to the population, the sample needs to be representative of the population. Therefore, “how to draw a sample” is important.
Example 1: Trout
Study: the goal is to investigate the weight of trout in Lake Tahoe; 100 trout are captured and measured
Variable: weight (varies from fish to fish)
Population: all trout in Lake Tahoe
Sample: the 100 captured trout
An experimental unit: one trout from the catch
A measurement: the weight of that trout
Data: weights of all 100 captured trout
Example 2
Study: the goal is to investigate the lifestyle of students in Statistics 13, Section A
Variables:
– Variable 1: favorite movie genre (drama, comedy, horror, suspense, romance, etc.)
– Variable 2: time to get up (early bird / not early bird)
Population: everyone in our class (500 enrollment!)
Sample: students in session A01 (T: 8:00-9:00am, currently 50 students)
Question: Is this sample representative of the population?
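One common way to make a sample more likely to be representative is to draw it at random. The sketch below is an illustration only: simple random sampling is not named in the lecture, the student IDs 1 to 500 are made up to stand in for the class roster, and the fixed seed exists only so the run is repeatable.

```python
import random

# Hypothetical roster: student IDs 1..500 stand in for the class enrollment.
population = list(range(1, 501))

# Fixed, arbitrary seed so the illustration gives the same draw every run.
rng = random.Random(13)

# Simple random sample of 50 students, drawn without replacement,
# so every student has the same chance of being selected.
sample = rng.sample(population, k=50)

print(sorted(sample))
```

By contrast, taking everyone in session A01 selects students by their schedule (an 8:00am session), which may be related to the “time to get up” variable being studied.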
How Many Variables Have You Measured in Your Study?
Univariate data: One variable is measured on a single experimental unit
Bivariate data: Two variables are measured on a single experimental unit
Multivariate data: More than two variables are measured on a single experimental unit
Example
“Trout weight” study: univariate data (one variable: weight)
“Life style” study: bivariate data (two variables: favorite movie genre and time to get up)
If we also measure the “favorite food” of the students in the “life style” study, then it becomes multivariate data
Summarize Data in a Statistical Table
Answer two questions:
– What values of the variable have been observed in your data?
– How often has each value occurred?
“How often” can be measured in terms of:
• Frequency: exact number of occurrences
• Relative frequency = Frequency / Total number of measurements
• Percent = Relative frequency x 100%
Summarize Data in a Table (cont.)
Data: A bag of M&Ms contains 25 candies: Bl, R, G, Br, Y, R, Bl, G, Y, O, O, G, Bl, G, Br, O, Bl, Y, R, Y, O, Bl, Br, O, Bl
Variable of interest: color of candy (varies from candy to candy)
Summarizing information in a table:

Color   Tally              Frequency  Relative Frequency  Percent
Red     R R R              3          3/25 = .12          12%
Blue    Bl Bl Bl Bl Bl Bl  6          6/25 = .24          24%
Green   G G G G            4          4/25 = .16          16%
Orange  O O O O O          5          5/25 = .20          20%
Brown   Br Br Br           3          3/25 = .12          12%
Yellow  Y Y Y Y            4          4/25 = .16          16%
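The tallying can also be done programmatically. The following is a minimal sketch that recomputes the frequency, relative frequency, and percent columns from the 25 candies listed in the data; the variable names are chosen for illustration.

```python
from collections import Counter

# The 25 candies, in the order given in the data.
candies = (
    "Bl R G Br Y R Bl G Y O O G Bl G Br O Bl Y R Y O Bl Br O Bl"
).split()

# Frequency: exact number of occurrences of each color.
frequency = Counter(candies)
total = len(candies)

# Relative frequency = frequency / total number of measurements.
relative_frequency = {color: n / total for color, n in frequency.items()}

# Percent = relative frequency x 100%.
percent = {color: 100 * rf for color, rf in relative_frequency.items()}

for color in ["R", "Bl", "G", "O", "Br", "Y"]:
    print(
        f"{color:<3} {frequency[color]:>2} "
        f"{relative_frequency[color]:.2f} {percent[color]:.0f}%"
    )
```

A useful check on any such table is that the frequencies add up to the number of measurements and the relative frequencies add up to 1.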
Graphing Qualitative Variables
Pie Chart: circular graph that shows how the measurements are distributed among the categories
The angle for each part (in degrees):
– R: 360*0.12 = 43.2
– Bl: 360*0.24 = 86.4
– G: 360*0.16 = 57.6
– O: 360*0.20 = 72
– Br: 360*0.12 = 43.2
– Y: 360*0.16 = 57.6
[Pie chart: Green 16.0%, Orange 20.0%, Blue 24.0%, Red 12.0%, Yellow 16.0%, Brown 12.0%]
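The slice angles follow directly from the relative frequencies. As a small sketch (variable names are illustrative), each angle is the relative frequency times 360 degrees:

```python
# Relative frequencies of the M&M colors.
relative_frequency = {
    "Red": 0.12,
    "Blue": 0.24,
    "Green": 0.16,
    "Orange": 0.20,
    "Brown": 0.12,
    "Yellow": 0.16,
}

# Each slice gets its share of the full 360-degree circle.
angles = {color: 360 * rf for color, rf in relative_frequency.items()}

for color, angle in angles.items():
    print(f"{color:<7} {angle:6.1f} degrees")

# The slices together should cover the whole circle.
total_angle = sum(angles.values())
```

Because the relative frequencies sum to 1, the angles sum to 360 degrees, apart from floating-point rounding.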
Graphing Qualitative Variables (cont.)
Bar Chart: shows how the measurements are distributed among the categories with the height of the bar measuring how often a particular category is observed
[Bar chart: Frequency (0 to 6) by Color: Green, Orange, Blue, Red, Yellow, Brown]
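A rough text version of the bar chart can be printed without any plotting library. This is only a sketch: the helper `bar_chart_lines` is a hypothetical name, and the bars are drawn horizontally, so bar length plays the role that bar height plays on the slide.

```python
# Frequencies of the M&M colors, in the order shown on the chart.
frequency = {
    "Green": 4,
    "Orange": 5,
    "Blue": 6,
    "Red": 3,
    "Yellow": 4,
    "Brown": 3,
}


def bar_chart_lines(counts, symbol="#"):
    """Return one text line per category; bar length equals the count."""
    width = max(len(label) for label in counts)
    return [
        f"{label:<{width}} | {symbol * n} {n}"
        for label, n in counts.items()
    ]


lines = bar_chart_lines(frequency)
print("\n".join(lines))
```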
Example: US Population Groups
The American population over the past 50 years has been grouped by nicknames that try to describe their common traits. According to a recent magazine article, the numbers of Americans in each of the four age categories are as shown in the table
Generation                       Number of Americans (in millions)
Matures (born before 1946)       68.3
Baby boomers (born 1946-1964)    77.6
GenXers (born 1965-1976)         44.6
Others (born after 1976)         72.4
Example (cont.)
What graphical methods could you use to describe the data?
– Either a bar chart or a pie chart would be appropriate. When raw numbers are given, people often use a bar chart; when the data are already presented as percentages of the whole group, a pie chart is often chosen.
Draw a bar chart:
[Bar chart: Number of Americans (millions), axis from 0 to 90, by Generation: Matures, Baby Boomers, GenXers, Others]
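If a pie chart is preferred, the raw numbers first need to be converted to percentages of the whole group. A minimal sketch of that conversion, using the four values from the table (variable names are illustrative):

```python
# Number of Americans in each generation, in millions.
generations = {
    "Matures (born before 1946)": 68.3,
    "Baby boomers (born 1946-1964)": 77.6,
    "GenXers (born 1965-1976)": 44.6,
    "Others (born after 1976)": 72.4,
}

# Total across the four groups, in millions.
total = sum(generations.values())

# Percent of the whole group = 100 * group size / total.
percent = {name: 100 * n / total for name, n in generations.items()}

for name, p in percent.items():
    print(f"{name:<30} {generations[name]:5.1f} million  {p:5.1f}%")
```

The raw numbers suit the bar chart as they are, while the percentages are what the pie chart slices would represent.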