Source: The Nature of Probability and Statistics, UC Denver, math.ucdenver.edu/~ssantori/MATH2830SP13/Math2830Chapter1.pdf
Ch1: The Nature of Probability and Statistics Santorico - Page 1
The Nature of Probability and Statistics
Chapter 1
Statistics is the science of conducting studies to collect, organize, summarize, analyze, and draw conclusions from data.
Why Study Statistics?
Example: Archaeology
Four measurements were made of male Egyptian skulls from five different time periods ranging from 4000 B.C. to 150 A.D. Are there differences in the skull sizes between the time periods? The researchers theorize that a change in skull size over time is evidence of the interbreeding of the Egyptians with immigrant populations over the years. Thirty skulls are measured from each of the 5 time periods.
Measurements:
1. MB: Maximal Breadth of Skull
2. BH: Basibregmatic Height of Skull
3. BL: Basialveolar Length of Skull
4. NH: Nasal Height of Skull
Sec 1-1: Descriptive and Inferential Statistics
Main Areas of Statistics
Descriptive statistics consists of the collection, organization, summarization, and presentation of data.
Inferential statistics consists of generalizing from samples to populations, performing estimation and hypothesis tests, determining relationships among variables, and making predictions.
Inferential statistics uses probability (the likelihood of an outcome occurring) to draw conclusions and make predictions.
A population consists of all subjects that are being studied.
A sample is a group of subjects selected from a population.
A variable is a characteristic or attribute that can assume different values.
A random variable is a variable whose values are determined by chance.
Data are the values (measurements or observations) that the variables can assume.
A data set is a collection of data values.
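As a rough illustration of how these terms fit together, the sketch below builds a small made-up population, draws a sample from it, and treats one attribute as the variable of interest. The student records, the `credit_hours` field, and the sample size of 25 are all hypothetical and chosen only for illustration.

```python
import random
from statistics import mean

rng = random.Random(1)  # fixed seed so the sketch is repeatable

# Hypothetical population: ALL subjects being studied (500 made-up students).
population = [{"id": i, "credit_hours": rng.randint(3, 18)} for i in range(500)]

# A sample is a group of subjects selected from the population.
sample = rng.sample(population, 25)

# The variable is the attribute of interest; the data are the values it takes.
data_set = [student["credit_hours"] for student in sample]

# Descriptive statistics summarize the data we actually collected.
sample_mean = mean(data_set)

# Inferential statistics would use sample_mean to estimate this quantity,
# which is normally unknown because the whole population is rarely measured.
population_mean = mean(student["credit_hours"] for student in population)

print(f"sample mean = {sample_mean:.2f}, population mean = {population_mean:.2f}")
```

The two means will usually be close but not equal, which is why inferential statistics needs probability to describe how far off a sample-based estimate might be.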
Sec 1-2: Variables and Types of Data
[Diagram: Data divides into Qualitative and Quantitative; Quantitative divides into Discrete and Continuous.]
Types of Variables
Qualitative variables are variables that can be placed into categories, according to some characteristic or attribute.
Example:
Quantitative variables are numeric in nature and can be ordered or ranked. Quantitative variables can be either discrete or continuous.
Example:
Qualitative variables can be nominal or ordinal.
The nominal level of measurement classifies the data into categories on which no meaningful order or ranking can be imposed.
The ordinal level of measurement classifies the data into categories that can be meaningfully ranked or ordered.
More on Quantitative variables:
A continuous variable is a quantitative variable that can assume ANY numerical value between any two specific values.
Obtained by measuring.
May include fractions and decimals.
A discrete variable is a quantitative variable that has either a finite number of possible values or a countable number of possible values.
Countable means that the values result from counting, such as 0, 1, 2, 3…
Examples: Determine if the variable would be discrete or continuous for the following examples:
Weight (lbs):
Number of car accidents in which you’ve been involved:
Temperature (F):
Continuous data measurements must be rounded because of the limits of the measuring device. Typically, values are recorded to the nearest unit of the measuring device.
The boundaries of a measurement give the range of possible values that could have led to the recorded value: from the lower bound up to, but not including, the upper bound.
Example:
Recorded Value    Boundaries
12 in             [11.5 – 12.5)
0.57 sec
3.8 g
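The boundary rule can be sketched in a few lines of code. This is only one way to write it: the function name `boundaries` is made up for illustration, and it assumes the recorded value is passed as text so that the number of recorded decimal places is known. The boundaries then sit half of one recorded unit below and above the value.

```python
from decimal import Decimal

def boundaries(recorded: str) -> tuple[Decimal, Decimal]:
    """Return (lower, upper) for a recorded value; the interval is [lower, upper)."""
    value = Decimal(recorded)
    # Size of the last recorded digit: "12" -> 1, "3.8" -> 0.1, "0.57" -> 0.01
    unit = Decimal(1).scaleb(value.as_tuple().exponent)
    half_unit = unit / 2
    return value - half_unit, value + half_unit

for text in ["12", "0.57", "3.8"]:
    lower, upper = boundaries(text)
    print(f"{text}: [{lower} - {upper})")
```

Decimal arithmetic is used here instead of floats so that a bound such as 0.565 is represented exactly.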
The measurement scale/level of a variable describes how the variable is categorized, counted, or measured.
Qualitative variables: nominal and ordinal
Quantitative variables: interval and ratio
The interval level of measurement ranks (quantitative) data, and differences between values are meaningful, but the “zero” value is arbitrary.
The “zero” value is the point at which the variable equals zero. It is arbitrary if it does NOT mean a total absence of the quantity being measured (e.g., 0 degrees Fahrenheit does not mean there is no temperature).
The ratio level of measurement ranks (quantitative) data and has a true zero value. Additionally, true ratios exist when the same variable is measured on two different members of the population.
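A short numeric sketch may make the interval/ratio distinction concrete. The temperatures and weights below are made-up values. Fahrenheit is an interval scale, so dividing two readings gives a number with no physical meaning; converting to Kelvin, which has a true zero, gives a ratio that does.

```python
def fahrenheit_to_kelvin(deg_f: float) -> float:
    """Convert a Fahrenheit reading (interval scale) to Kelvin (ratio scale)."""
    return (deg_f - 32.0) * 5.0 / 9.0 + 273.15

warm_f, cool_f = 80.0, 40.0          # hypothetical readings

# Differences are meaningful on an interval scale: it is 40 degrees warmer.
difference_f = warm_f - cool_f

# Ratios are NOT meaningful on an interval scale: 80 F is not "twice as hot".
naive_ratio = warm_f / cool_f

# On a scale with a true zero, the ratio is meaningful (and much smaller).
true_ratio = fahrenheit_to_kelvin(warm_f) / fahrenheit_to_kelvin(cool_f)

# Weight is a ratio-level variable: 200 lb really is twice 100 lb.
weight_ratio = 200.0 / 100.0

print(f"difference = {difference_f} F")
print(f"naive ratio = {naive_ratio:.2f}, ratio in Kelvin = {true_ratio:.2f}")
print(f"weight ratio = {weight_ratio:.1f}")
```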
Sec 1-3: Data Collection and Sampling Techniques
Data are often collected via surveys:
Telephone Surveys
Mailed Questionnaire
Personal Interview
Internet Survey
What are the advantages and disadvantages of data collection through surveys?
Telephone Surveys
  Advantages:
    Less costly than personal interview.
    People may be more candid.
  Disadvantages:
    Not everyone has a phone.
    Cell phones typically not included.
    Tone of interviewer’s voice may affect response.
Mailed Questionnaire
  Advantages:
    Can cover wider geographic area than phone survey or personal interview.
    Respondents can remain anonymous.
    Less expensive.
  Disadvantages:
    Low number of responses.
    Inappropriate answers to questions.
    Low reading abilities or not understanding questions may create useless responses.
Personal Interview
  Advantages:
    In-depth responses to questions.
  Disadvantages:
    Interviewers must be trained in asking questions and recording responses (which is costly).
    Interviewer may be biased in the selection of participants.
Internet Survey
  Advantages (as with telephone and mail surveys):
    Inexpensive, often free.
    Candor.
    Large geographic coverage.
    Anonymity.
  Disadvantages:
    Will miss demographics without computer access.
    May have inappropriate answers if questions are misunderstood.
Sampling Techniques
Researchers use samples to collect data and information about a particular variable from a population.
Samples save time and money, and may actually allow a researcher to collect better information.
Samples need to be representative of the population, or they are useless for drawing conclusions about the population.
Sampling must be done in a way that the samples are unbiased, meaning that each subject in the population has an equal chance of being in the sample.
Scenario: Suppose we are interested in studying how the University of Colorado Denver undergraduate population feels about the outcome of the presidential election.
Technique: Random sampling
Description: Uses chance methods or random numbers to select the sample. Everyone or everything from the population has the same chance of being selected for the sample. It is the best way of obtaining a representative sample.
Example:

Technique: Systematic sampling
Description: Numbers each subject of the population and then selects every kth subject.
Example:

Technique: Convenience sampling
Description: Selects subjects that are convenient for the researcher. These samples are typically of no statistical value.
Example:

Technique: Stratified sampling
Description: Divides the population into groups (called strata) according to some characteristic that is important to the study, then randomly samples subjects from each group.
Example:

Technique: Cluster sampling
Description: Divides the population into groups called clusters by some means such as geographic area or schools in a school district, etc. Then randomly selects some of the clusters and uses ALL members of the selected clusters.
Example:
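The four chance-based techniques above can be sketched in code as follows. This is a minimal illustration, not a prescribed method: the roster of 200 students, the `year` and `building` fields, and all function names are hypothetical. The random starting point in the systematic sample is also an assumption, since the description only says to select every kth subject.

```python
import random

def simple_random_sample(population, n, rng):
    """Every subject has the same chance of being selected."""
    return rng.sample(population, n)

def systematic_sample(population, k, rng):
    """Select every kth subject, starting from a random position below k (assumed)."""
    start = rng.randrange(k)
    return population[start::k]

def stratified_sample(population, stratum_of, n_per_stratum, rng):
    """Group subjects into strata, then randomly sample within EACH stratum."""
    strata = {}
    for subject in population:
        strata.setdefault(stratum_of(subject), []).append(subject)
    chosen = []
    for members in strata.values():
        chosen.extend(rng.sample(members, min(n_per_stratum, len(members))))
    return chosen

def cluster_sample(population, cluster_of, n_clusters, rng):
    """Randomly pick whole clusters and keep ALL members of the picked clusters."""
    clusters = {}
    for subject in population:
        clusters.setdefault(cluster_of(subject), []).append(subject)
    picked = rng.sample(sorted(clusters), n_clusters)
    return [subject for name in picked for subject in clusters[name]]

# Hypothetical roster standing in for an undergraduate population.
rng = random.Random(2013)
years = ["freshman", "sophomore", "junior", "senior"]
buildings = ["A", "B", "C", "D", "E"]
students = [
    {"id": i, "year": years[i % 4], "building": buildings[i % 5]}
    for i in range(200)
]

srs = simple_random_sample(students, 20, rng)
every_tenth = systematic_sample(students, 10, rng)
by_year = stratified_sample(students, lambda s: s["year"], 5, rng)
by_building = cluster_sample(students, lambda s: s["building"], 2, rng)

print(len(srs), len(every_tenth), len(by_year), len(by_building))
```

Note the contrast between the last two functions: stratified sampling takes a few subjects from every group, while cluster sampling takes every subject from a few groups. Convenience sampling is left out because it involves no chance mechanism.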
Sec 1-4: Observational and Experimental Studies
Observational study - the researcher merely observes what is happening or what has happened in the past and tries to draw conclusions based on these observations.
Example:
Experimental study - the researcher manipulates one of the variables and tries to determine how the manipulation influences other variables.
Example:
Experiments have at least two groups:
Treatment Group – the group(s) in the sample that receive a treatment or experimental condition.
Control Group – the group in the sample that is treated identically in all respects to the treatment group EXCEPT that they don’t receive the active treatment.
Using a control group allows us to see what would have happened to the response variable if treatments had not been applied.
Placebo – a treatment that looks like a real drug but has no active ingredient (meaning it doesn’t do anything!).
Placebo Effect – when people take a placebo and it works like the treatment or better.
This is usually because of psychological reasons. Our minds are powerful!
Good experiments include a placebo group when humans are involved.
Independent Variable – the variable that is being manipulated by the researcher (also called the explanatory variable).
Dependent Variable – the response to the independent variable or the result of the explanatory variable (also called the response or outcome variable).
Example: Taking nicotine patch and smoking status.
Advantages of Experiments
The effect of an explanatory variable can be studied more precisely.
Researcher has (some) control over selecting participants, assigning them to groups, and manipulating the independent variable.
Cause and effect relationships can be established using randomized experiments (e.g., smoking causes cancer in lab rats).
Note: In order to make cause and effect conclusions in an experiment, the subjects must be randomly assigned among the treatment groups.
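Random assignment, as required by the note above, can be sketched as a shuffle followed by a split. The subject labels, the group sizes, and the function name `randomly_assign` are hypothetical; a real study might use more than two groups or unequal sizes.

```python
import random

def randomly_assign(subjects, rng):
    """Shuffle a copy of the subjects and split it into two groups by chance alone."""
    shuffled = list(subjects)   # copy, so the original roster is left untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

rng = random.Random(42)  # fixed seed so the assignment can be reproduced
volunteers = [f"subject_{i:02d}" for i in range(20)]
treatment, control = randomly_assign(volunteers, rng)

print("treatment:", sorted(treatment))
print("control:  ", sorted(control))
```

Because chance alone decides who lands in each group, other characteristics of the subjects tend to balance out between the groups, which is what allows a difference in the response to be attributed to the treatment.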
Disadvantages of Experiments
May occur in unnatural settings (e.g., laboratories).
Hawthorne Effect - when subjects know they are participating in an experiment and change their behavior in ways that affect the results of the study (e.g., weight loss studies).
Not all variables can be controlled for in a study.
Advantages of Observational Studies
Occur in natural settings.
Allow us to study situations for which it would be illegal/unethical to conduct an experiment (e.g., rape, suicide, illegal drug use).
Disadvantages of Observational Studies
Cannot make cause and effect conclusions because of confounding variables.
Data quality may be poor if researcher didn’t collect the data.
Confounding variable – a variable that influences the dependent or outcome variable but was not separated from the independent variable (e.g., vitamins and health, weight and income).
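A small simulation can show how a confounding variable misleads an observational study. Everything in it is invented for illustration: the trait "health conscious", the probabilities 0.8 and 0.2, and the health scores of 70 and 50 are assumptions, not data. In this made-up world vitamins have no effect at all, yet vitamin takers still look healthier overall.

```python
import random
from statistics import mean

rng = random.Random(0)
people = []
for _ in range(10_000):
    # Confounder: being health conscious (assumed to apply to half of the people).
    health_conscious = rng.random() < 0.5
    # The confounder makes taking vitamins more likely...
    takes_vitamins = rng.random() < (0.8 if health_conscious else 0.2)
    # ...and it also raises the health score. Vitamins themselves add NOTHING.
    health = (70 if health_conscious else 50) + rng.gauss(0, 5)
    people.append((health_conscious, takes_vitamins, health))

def vitamin_gap(rows):
    """Mean health of vitamin takers minus mean health of everyone else."""
    takers = [h for _, v, h in rows if v]
    others = [h for _, v, h in rows if not v]
    return mean(takers) - mean(others)

overall_gap = vitamin_gap(people)
gap_conscious = vitamin_gap([p for p in people if p[0]])
gap_not_conscious = vitamin_gap([p for p in people if not p[0]])

print(f"overall gap: {overall_gap:.1f}")
print(f"gap among the health conscious: {gap_conscious:.1f}")
print(f"gap among everyone else: {gap_not_conscious:.1f}")
```

The overall gap is large, but it nearly vanishes once people are compared within the same level of the confounder. A real observational study often cannot do that comparison, because the confounder may not have been measured.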
Examples of Confounding Variables
Age and income
Vitamins and health
Weight tends to be higher among lower socio-economic groups.
What are the confounding variables?
Sections 1-5 and 1-6 (Read on your own)
Misuses of statistics.
Computers and calculators.
We will come back to this for a longer discussion, but it is good to have a look now and start thinking about it:
Should you believe the results of a study?
Eight Guidelines for Evaluating a Statistical Study
1. Identify the goal of the study, the population considered, and type of study.
2. Consider the source, particularly with regard to whether the researcher may be biased.
3. Look for bias that may prevent a sample from being representative of the population.
   a. Selection bias occurs whenever researchers select their sample in a way that tends to make it unrepresentative of the population.
   b. Participation bias occurs primarily with surveys and polls; it arises whenever people choose whether to participate.
4. Look for problems in defining or measuring the variables of interest, which can make it difficult to interpret results.
5. Watch out for confounding variables that can invalidate the conclusions of a study.
   a. Are there viable alternate explanations of the results?
6. Consider the setting and the wording of questions in any survey, looking for anything that might tend to produce inaccurate or dishonest responses.
7. Check that the results are presented fairly in graphs and concluding statements.
8. Stand back and consider the conclusions.
   a. Did it achieve its goals?
   b. Do conclusions make sense?
   c. Do results have any practical significance?