
The Nature of Probability and Statistics - UC Denver (math.ucdenver.edu/~ssantori/MATH2830SP13/Math2830Chapter1.pdf)

May 02, 2018


Ch1: The Nature of Probability and Statistics Santorico - Page 1

The Nature of Probability and Statistics

Chapter 1

Statistics is the science of conducting studies to collect, organize, summarize, analyze, and draw conclusions from data.

Why Study Statistics?


Example: Archaeology

Four measurements were made of male Egyptian skulls from five different time periods ranging from 4000 B.C. to 150 A.D. Are there differences in skull size between the time periods? The researchers theorize that a change in skull size over time is evidence of interbreeding of the Egyptians with immigrant populations over the years. Thirty skulls were measured from each of the five time periods.

Measurements:

1. MB: Maximal Breadth of Skull
2. BH: Basibregmatic Height of Skull
3. BL: Basialveolar Length of Skull
4. NH: Nasal Height of Skull


Sec 1-1: Descriptive and Inferential Statistics

Main Areas of Statistics

Descriptive statistics consists of the collection, organization, summarization, and presentation of data.

Inferential statistics consists of generalizing from samples to populations, performing estimation and hypothesis tests, determining relationships among variables, and making predictions.

Inferential statistics uses probability (the likelihood of an outcome occurring) to make conclusions and predictions.
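A small simulation makes the difference between the two areas concrete. This is a minimal Python sketch under stated assumptions. The population is a hypothetical list of 10,000 simulated measurements; its mean of 100 and standard deviation of 15 are arbitrary choices, as is the sample size of 50. Summarizing the sample is descriptive statistics. Using the sample mean to estimate the population mean is inferential statistics.

```python
import random
from statistics import fmean

rng = random.Random(1)  # fixed seed so the sketch is reproducible

# Hypothetical population: one measurement on each of 10,000 subjects.
population = [rng.gauss(100, 15) for _ in range(10_000)]

# Descriptive statistics: organize and summarize the data actually collected.
sample = rng.sample(population, 50)
sample_mean = fmean(sample)
print(f"Sample of {len(sample)} subjects: mean = {sample_mean:.1f}")

# Inferential statistics: generalize from the sample to the population by
# using the sample mean as an estimate of the population mean.
# (In a real study the population mean is unknown; here we can compare.)
population_mean = fmean(population)
print(f"Estimate = {sample_mean:.1f}, population mean = {population_mean:.1f}")
```

In practice only the sample is available, so the population mean could not be printed. The comparison works here only because the population is simulated.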


A population consists of all subjects that are being studied.

A sample is a group of subjects selected from a population.

A variable is a characteristic or attribute that can assume different values.

A random variable is a variable whose values are determined by chance.

Data are the values (measurements or observations) that the variables can assume.

A data set is a collection of data values.


MB    BH    BL    NH    Time Period
131   138   89    49    -4000
125   131   92    48    -4000
131   132   99    50    -4000
139   130   108   48    -4000
125   136   93    48    -4000
131   134   102   51    -4000
134   134   99    51    -4000
.......
138   136   92    46    150
131   129   97    44    150
132   127   97    52    150
137   125   85    57    150
129   128   81    52    150
140   135   103   48    150
147   129   87    48    150
136   133   97    51    150
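The table lines up with the definitions on the previous page:

- each row is one skull (a subject);
- each column is a variable;
- each number is a data value;
- the rows together form a data set.

The minimal Python sketch below uses only the rows reproduced above (the full data set has 30 skulls per period). It groups the rows by time period and computes a descriptive summary of MB. The names `rows` and `mean_by_period` are illustrative and do not come from the source.

```python
from statistics import fmean

VARIABLES = ("MB", "BH", "BL", "NH")

# Each tuple is one skull: (MB, BH, BL, NH, time period), with -4000 = 4000 B.C.
# Only the rows shown in the table are included here.
rows = [
    (131, 138, 89, 49, -4000), (125, 131, 92, 48, -4000),
    (131, 132, 99, 50, -4000), (139, 130, 108, 48, -4000),
    (125, 136, 93, 48, -4000), (131, 134, 102, 51, -4000),
    (134, 134, 99, 51, -4000),
    (138, 136, 92, 46, 150), (131, 129, 97, 44, 150),
    (132, 127, 97, 52, 150), (137, 125, 85, 57, 150),
    (129, 128, 81, 52, 150), (140, 135, 103, 48, 150),
    (147, 129, 87, 48, 150), (136, 133, 97, 51, 150),
]

def mean_by_period(rows, variable):
    """Mean of one variable (column) for each time period."""
    column = VARIABLES.index(variable)
    groups = {}
    for row in rows:
        groups.setdefault(row[-1], []).append(row[column])
    return {period: fmean(values) for period, values in groups.items()}

mb_means = mean_by_period(rows, "MB")
print(mb_means[150])  # 136.25
```

These means describe only the listed rows, so they should not be read as results for the full study.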

Population?

Sample?

Variable?

Random variable?

Data?

Data set?


Sec 1-2: Variables and Types of Data

Diagram: Data are either Qualitative or Quantitative; Quantitative data are either Discrete or Continuous.

Types of Variables

Qualitative variables are variables that can be placed into categories according to some characteristic or attribute.

Example:

Quantitative variables are numeric in nature and can be ordered or ranked.

Quantitative variables can be either discrete or continuous.

Example:


Qualitative variables can be nominal or ordinal.

The nominal level of measurement classifies the data into categories on which no meaningful order or ranking can be imposed.

The ordinal level of measurement classifies the data into categories that can be meaningfully ranked or ordered.
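The difference between the two levels shows up directly in code. This is a minimal Python sketch using the standard `enum` module. `EyeColor` and `Rating` are hypothetical example variables that do not come from the notes. A plain `Enum` refuses `<` comparisons, which mirrors nominal data. An `IntEnum` can be sorted, which mirrors ordinal data.

```python
from enum import Enum, IntEnum

class EyeColor(Enum):
    """Nominal (hypothetical example): categories with no meaningful order."""
    BROWN = "brown"
    BLUE = "blue"
    GREEN = "green"

class Rating(IntEnum):
    """Ordinal (hypothetical example): categories with a meaningful rank."""
    POOR = 1
    FAIR = 2
    GOOD = 3
    EXCELLENT = 4

def can_rank(a, b):
    """Return True if Python allows comparing a and b with <."""
    try:
        a < b
        return True
    except TypeError:
        return False

print(can_rank(EyeColor.BROWN, EyeColor.BLUE))  # False: nominal data cannot be ranked
print(can_rank(Rating.POOR, Rating.GOOD))       # True: ordinal data can be ranked

ratings = [Rating.GOOD, Rating.POOR, Rating.EXCELLENT, Rating.FAIR]
print([r.name for r in sorted(ratings)])  # ['POOR', 'FAIR', 'GOOD', 'EXCELLENT']
```

One caution about this design: `IntEnum` also permits arithmetic such as `Rating.GOOD - Rating.POOR`. For ordinal data only the order is meaningful, not the size of the difference between ranks.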


More on Quantitative variables:

A continuous variable is a quantitative variable that can assume ANY numerical value between any two specific values.

Obtained by measuring.

May include fractions and decimals.

A discrete variable is a quantitative variable that has either a finite number of possible values or a countable number of possible values.

Countable means that the values result from counting, such as 0, 1, 2, 3…

Examples: Determine whether each of the following variables is discrete or continuous:

Weight (lbs):

Number of car accidents in which you’ve been involved:

Temperature (F):


Continuous data measurements must be rounded because of the limits of the measuring device. Typically, values are rounded to the nearest given unit.

The boundaries of a measurement give the range of possible values that could have led to the recorded value, from the lower boundary up to (but not including) the upper boundary.

Example:

Recorded Value    Boundaries
12 in             [11.5 – 12.5)
0.57 sec
3.8 g
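The boundaries lie half a unit of the last recorded digit on either side of the value. The minimal Python sketch below assumes the recorded value is given as a string whose last digit marks the precision of the measurement, so "12" means rounded to the nearest whole unit. The helper name `boundaries` is hypothetical. `Decimal` is used so that values such as 0.565 are represented exactly.

```python
from decimal import Decimal

def boundaries(recorded: str):
    """Return (lower, upper) boundaries of a rounded measurement.

    The true value lies in [lower, upper): half a unit of the last recorded
    digit below the value, up to (but not including) half a unit above it.
    """
    value = Decimal(recorded)
    last_digit_exponent = value.as_tuple().exponent  # 0 for "12", -2 for "0.57"
    half_unit = Decimal(5).scaleb(last_digit_exponent - 1)
    return value - half_unit, value + half_unit

for recorded, unit in [("12", "in"), ("0.57", "sec"), ("3.8", "g")]:
    lower, upper = boundaries(recorded)
    print(f"{recorded} {unit}: [{lower} – {upper})")
```

The first line printed matches the worked row in the table: 12 in has boundaries [11.5 – 12.5).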


The measurement scale/level of a variable describes how the variable is categorized, counted, or measured.

Qualitative variables: nominal and ordinal
Quantitative variables: interval and ratio

The interval level of measurement ranks (quantitative) data, but the “zero” value is arbitrary.

The “zero” value is the value at which the variable equals zero. It is arbitrary if it does NOT mean a total absence of that variable (e.g., 0 °F does not mean there is no temperature).

The ratio level of measurement ranks quantitative data and has a true zero value. Additionally, true ratios exist when the same variable is measured on two different members of the population (e.g., a 200 lb person weighs twice as much as a 100 lb person).
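The difference between an arbitrary zero and a true zero can be checked numerically. This Python sketch converts Fahrenheit to kelvins, a temperature scale whose zero is a true zero, using K = (F − 32) × 5/9 + 273.15. The temperatures (40 °F and 80 °F) and weights (100 lb and 200 lb) are hypothetical values chosen for illustration.

```python
def fahrenheit_to_kelvin(f):
    """Convert degrees Fahrenheit to kelvins; 0 K is a true zero."""
    return (f - 32) * 5 / 9 + 273.15

LB_TO_KG = 0.45359237  # kilograms per pound (exact by definition)

# Interval scale: 0 °F is an arbitrary zero, so the raw ratio is misleading.
naive_ratio = 80 / 40
kelvin_ratio = fahrenheit_to_kelvin(80) / fahrenheit_to_kelvin(40)
print(naive_ratio, round(kelvin_ratio, 2))  # 2.0 1.08

# Ratio scale: 0 lb is a true zero, so the ratio does not depend on the unit.
ratio_in_lb = 200 / 100
ratio_in_kg = (200 * LB_TO_KG) / (100 * LB_TO_KG)
print(ratio_in_lb, ratio_in_kg)
```

Doubling the Fahrenheit reading raises the kelvin temperature by only about 8%, so 80 °F is not "twice as hot" as 40 °F. The 2:1 weight ratio, by contrast, is the same in pounds and kilograms.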


Sec. 1-3: Data Collection and Sampling Techniques

Data are often collected via surveys:
Telephone Surveys
Mailed Questionnaire
Personal Interview
Internet Survey

What are the advantages and disadvantages of collecting data through surveys?


Telephone Surveys
Advantages: Less costly than personal interview. People may be more candid.
Disadvantages: Not everyone has a phone. Cell phones typically not included. Tone of interviewer’s voice may affect response.

Mailed Questionnaire
Advantages: Can cover a wider geographic area than a phone survey or personal interview. Respondents can remain anonymous. Less expensive.
Disadvantages: Low number of responses. Inappropriate answers to questions. Low reading abilities or not understanding questions may create useless responses.


Personal Interview
Advantages: In-depth responses to questions.
Disadvantages: Interviewers must be trained in asking questions and recording responses (which is costly). Interviewer may be biased in the selection of participants.

Internet Survey
Advantages (as with telephone and mail surveys): Inexpensive, often free. Candor. Large geographic coverage. Anonymity.
Disadvantages: Will miss demographics without computer access. May have inappropriate answers if questions are misunderstood.


Sampling Techniques

Researchers use samples to collect data and information about a particular variable from a population.

Samples save time and money, and may actually allow a researcher to collect better information.

Samples need to be representative of the population or they are meaningless in drawing conclusions about the population.

Sampling must be done in a way that makes the samples unbiased, meaning that each subject in the population has an equal chance of being in the sample.

Scenario: Suppose we are interested in studying how the University of Colorado Denver undergraduate population feels about the outcome of the presidential election.


Random sampling
Description: Uses chance methods or random numbers to select the sample. Everyone or everything in the population has the same chance of being selected for the sample, and it is the best way of obtaining a representative sample.
Example:

Systematic sampling
Description: Numbers each subject of the population and then selects every kth subject.
Example:

Convenience sampling
Description: Selects subjects that are convenient for the researcher. These samples are typically of no statistical value.
Example:
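Random and systematic sampling are easy to express in code. This is a minimal Python sketch assuming a hypothetical population of 1,000 numbered students; the helper names are also hypothetical. The systematic sample starts at a random position among the first k subjects. That random start is a common convention assumed here, not something stated in the notes. Convenience sampling is omitted because it involves no chance mechanism.

```python
import random

def simple_random_sample(population, n, rng):
    """Random sampling: every subject has the same chance of being chosen."""
    return rng.sample(population, n)

def systematic_sample(population, k, rng):
    """Systematic sampling: take every kth subject, starting from a randomly
    chosen position among the first k (the random start is an assumption)."""
    start = rng.randrange(k)
    return population[start::k]

# Hypothetical population: 1,000 numbered undergraduates.
students = [f"student-{i:04d}" for i in range(1, 1001)]
rng = random.Random(2830)  # fixed seed for reproducibility

random_sample = simple_random_sample(students, 50, rng)
every_20th = systematic_sample(students, 20, rng)
print(len(random_sample), len(every_20th))  # 50 50
```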


Stratified sampling
Description: Divides the population into groups (called strata) according to some characteristic that is important to the study, then randomly samples subjects from each group.
Example:

Cluster sampling
Description: Divides the population into groups called clusters by some means such as geographic area or schools in a school district. Then randomly selects some of the clusters and uses ALL members of the selected clusters.
Example:
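Stratified and cluster sampling differ in which groups end up in the sample. Stratified sampling draws some subjects from every stratum, while cluster sampling draws whole clusters. This is a minimal Python sketch assuming a hypothetical population of 1,000 students, in which class level serves as the stratum and course section as the cluster. All names are illustrative.

```python
import random
from collections import Counter, defaultdict

def group_by(population, key):
    """Split the population into groups according to key(subject)."""
    groups = defaultdict(list)
    for subject in population:
        groups[key(subject)].append(subject)
    return groups

def stratified_sample(population, stratum_of, n_per_stratum, rng):
    """Stratified sampling: randomly sample n subjects from EVERY stratum."""
    sample = []
    for members in group_by(population, stratum_of).values():
        sample.extend(rng.sample(members, n_per_stratum))
    return sample

def cluster_sample(population, cluster_of, n_clusters, rng):
    """Cluster sampling: randomly choose whole clusters, keep ALL their members."""
    clusters = group_by(population, cluster_of)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [subject for name in chosen for subject in clusters[name]]

# Hypothetical population: (student id, class level, course section).
LEVELS = ("freshman", "sophomore", "junior", "senior")
students = [(i, LEVELS[i % 4], f"section-{i % 25:02d}") for i in range(1000)]
rng = random.Random(2830)

by_level = stratified_sample(students, lambda s: s[1], 10, rng)
by_section = cluster_sample(students, lambda s: s[2], 3, rng)
print(Counter(s[1] for s in by_level))  # 10 students from each class level
print(len(by_section))                   # 120 = 3 sections x 40 students
```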


Sec 1-4: Observational and Experimental Studies

Observational study - the researcher merely observes what is happening or what has happened in the past and tries to draw conclusions based on these observations.

Example:

Experimental study - the researcher manipulates one of the variables and tries to determine how the manipulation influences other variables.

Example:


Experiments have at least two groups:
Treatment Group – the group(s) in the sample that receive a treatment or experimental condition.
Control Group – the group in the sample that is treated identically in all respects to the treatment group EXCEPT that it does not receive the active treatment.

Using a control group allows us to see what would have happened to the response variable if the treatment had not been applied.

Placebo – a treatment that looks like a real drug but has no active ingredient (meaning it doesn’t do anything!).

Placebo Effect – when people who take a placebo respond as if they had received the real treatment, or even better.

This usually happens for psychological reasons. Our minds are powerful!

Good experiments include a placebo group when humans are involved.


Independent Variable – the variable that is being manipulated by the researcher (also called the explanatory variable).

Dependent Variable – the response to the independent variable or the result of the explanatory variable (also called the response or outcome variable).

Example: Taking a nicotine patch and smoking status.

Explanatory (independent) variable –
Response (dependent) variable –

Example: Completing homework and grades.

Explanatory (independent) variable –
Response (dependent) variable –


Advantages of Experiments

The effect of an explanatory variable can be studied more precisely.
The researcher has (some) control over selecting participants, assigning them to groups, and manipulating the independent variable.
Cause and effect relationships can be established using randomized experiments (e.g., smoking causes cancer in lab rats).

Note: In order to make cause and effect conclusions in an experiment, the subjects must be randomly assigned among the treatment groups.

Disadvantages of Experiments
Experiments may occur in unnatural settings (e.g., laboratories).
Hawthorne Effect – when subjects know they are participating in an experiment and change their behavior in ways that affect the results of the study (e.g., weight loss studies).
Not all variables can be controlled for in a study.
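The note above on cause and effect calls for subjects to be randomly assigned to groups. This is a minimal Python sketch assuming a hypothetical list of 20 volunteers and a simple two-group design with one treatment group and one control group of equal size. `randomly_assign` is a hypothetical helper.

```python
import random

def randomly_assign(subjects, rng):
    """Randomly assign subjects to a treatment group and a control group."""
    shuffled = list(subjects)
    rng.shuffle(shuffled)       # every ordering is equally likely
    half = len(shuffled) // 2
    return {"treatment": shuffled[:half], "control": shuffled[half:]}

# Hypothetical volunteers for a nicotine-patch experiment.
volunteers = [f"volunteer-{i}" for i in range(1, 21)]
groups = randomly_assign(volunteers, random.Random(7))
print(len(groups["treatment"]), len(groups["control"]))  # 10 10
```

Because chance alone decides who gets the treatment, other characteristics of the volunteers tend to be balanced between the groups. This is what lets a difference in outcomes be attributed to the treatment.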


Advantages of Observational Studies
They occur in natural settings.
They allow us to study situations for which it would be illegal or unethical to conduct an experiment (e.g., rape, suicide, illegal drug use).

Disadvantages of Observational Studies
Cause and effect conclusions cannot be made because of confounding variables.
Data quality may be poor if the researcher didn’t collect the data.

Confounding variable – a variable that influences the dependent or outcome variable but was not separated from the independent variable (e.g., vitamins and health, weight and income).
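A small simulation shows how a confounding variable can create an association between two variables that have no causal link. In this hypothetical Python sketch, a made-up confounder (labeled `lifestyle`) drives both vitamin use and health, while vitamin use has no effect on health at all. The effect sizes and the 20,000 simulated subjects are arbitrary assumptions.

```python
import math
import random
from statistics import fmean

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(42)
lifestyle, vitamins, health = [], [], []
for _ in range(20_000):
    z = rng.gauss(0, 1)                    # hypothetical confounder
    lifestyle.append(z)
    vitamins.append(z + rng.gauss(0, 1))   # vitamin use depends only on z
    health.append(z + rng.gauss(0, 1))     # health depends only on z, not on vitamins

overall_r = pearson(vitamins, health)

# Hold the confounder (nearly) fixed: keep subjects with lifestyle close to 0.
similar = [i for i, z in enumerate(lifestyle) if abs(z) < 0.1]
within_r = pearson([vitamins[i] for i in similar], [health[i] for i in similar])
print(f"all subjects: r = {overall_r:.2f}; similar lifestyle: r = {within_r:.2f}")
```

Across all subjects the correlation comes out around 0.5. Among subjects with nearly the same value of the confounder, it is close to 0. An observational study that recorded only vitamin use and health could wrongly conclude that vitamins improve health.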


Examples of Confounding Variables

Age and income

Vitamins and health

Weight tends to be higher among lower socio-economic groups.

What are the confounding variables?

Sections 1-5 and 1-6 (read on your own): Misuses of statistics; Computers and calculators.


We will come back to this for a longer discussion, but it is good to have a look now and start thinking about it:

Should you believe the results of a study?

Eight Guidelines for Evaluating a Statistical Study

1. Identify the goal of the study, the population considered, and the type of study.
2. Consider the source, particularly with regard to whether the researcher may be biased.
3. Look for bias that may prevent a sample from being representative of the population.
   a. Selection bias occurs whenever researchers select their sample in a way that tends to make it unrepresentative of the population.
   b. Participation bias occurs primarily with surveys and polls; it arises whenever people choose whether to participate.


4. Look for problems in defining or measuring the variables of interest, which can make it difficult to interpret results.
5. Watch out for confounding variables that can invalidate the conclusions of a study.
   a. Are there viable alternative explanations of the results?
6. Consider the setting and the wording of questions in any survey, looking for anything that might tend to produce inaccurate or dishonest responses.
7. Check that the results are presented fairly in graphs and concluding statements.
8. Stand back and consider the conclusions.
   a. Did the study achieve its goals?
   b. Do the conclusions make sense?
   c. Do the results have any practical significance?