SAMPLING AND
SAMPLE SIZE CALCULATION
Danaida B. Marcelo, MS
Clinical Epidemiology Unit, Research Division
De La Salle Health Sciences Institute
THE RESEARCH PROCESS
• Problem Identification
• Objective Formulation
• Review of Related Literature
• Research Design
• Sampling Design and Sample Size
• Data Collection Method
• Data Analysis
• Dissemination of Results
• Writing the Report
Learning Objectives:
At the end of this session, learners should be able to:
1. Understand the concepts of sampling and sample size
2. Define sampling and sampling error
3. Know the different sampling methods
4. Know the requirements for sample size calculation
5. Recognize OPEN EPI/EPIINFO for sample size calculation for cross-sectional, cohort and case-control studies
What is sampling?
a procedure of drawing a fraction of a population for the purpose of determining certain characteristics of the population
TARGET POPULATION
SAMPLE POPULATION
Why do we need to sample?
we cannot study all elements of the population we are interested in
Advantages
• quicker
• less expensive
• more efficient
Basic Concepts in Sampling
• target population - group of interest
• sample population - representative subset
• sampling frame - list of sampling units (ex. list of names, or places)
• sampling unit - the unit of selection
• elementary unit - unit of measurement
The Concept of Sampling
Example: The researcher wants to determine prevalence of Positive PPD among 1-10 yr old children in Muntinlupa
• target population - all 1-10 yr old children in Muntinlupa
• sample population - ex. 100 children (1-10 yrs old) living in Muntinlupa
• sampling frame - list of the sampling units: list of names of all 1-10 yr old children, or list of the barangays, or list of the households
• sampling unit - the unit of selection: barangays or households or the children
• elementary unit - unit of measurement: child, 1-10 yrs old
Sampling Error
SAMPLING ERROR
- error due to chance
- random error
- the difference between the sample value and the unknown true value
- cannot be eliminated, but can be minimized
How do we do sampling?
Non-probability sampling
• Judgment or purposive
• Accidental or haphazard
Probability sampling
• Simple random
• Systematic random
• Stratified random
• Cluster random
• Multi-stage random
How do we do sampling?
Probability                    | Non-probability
random selection               | non-random selection
sampling frame is needed       | sampling frame is not required
can compute sampling error     | cannot compute sampling error
results can be generalized     | results cannot be generalized
Non-probability sampling: Judgment or purposive
Expert sampling involves the assembling of a sample of persons with known or demonstrable experience and expertise in some area.
In snowball sampling, the process starts by identifying someone who meets the criteria for inclusion in the study. The respondent is then asked to recommend others whom they may know who also meet the criteria.
How do we do sampling?
Probability Sampling
• Simple random
• Systematic random
• Stratified random
• Cluster random
• Multi-stage random
Simple Random Sampling (SRS)
Example: The researcher wants to determine prevalence of Positive PPD among 1-10 yr old children in Muntinlupa
• target population - all 1-10 yr old children in Muntinlupa
  • Assume N = 1000
• sample population - ex. 100 1-10 yr old children in Muntinlupa
• sampling frame - list of names of all the 1-10 yr old children (assign numbers 0001 to 1000)
• Randomly generate 100 numbers (between 0001 and 1000)
  • By using calculators
  • By using a table of random numbers
  • By using software
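The "by using software" option for the example above could look like the following minimal Python sketch. It assumes the sampling frame is simply the ID numbers 1 to 1000; the seed value is arbitrary and is only there so the draw can be repeated.

```python
import random

N = 1000  # size of the sampling frame (children numbered 1..1000)
n = 100   # desired sample size

rng = random.Random(2024)  # arbitrary fixed seed so the draw is reproducible

# sample() draws without replacement, so no child is selected twice
selected_ids = sorted(rng.sample(range(1, N + 1), n))

print(selected_ids[:10])  # first few selected ID numbers
```

Every child in the frame has the same chance of selection, which is what makes this a simple random sample.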
Stratified random sampling
the population is first divided into groups or strata
a Simple Random Sample is then selected from each stratum
subgroups of interest are represented adequately
Example: The researcher wants to determine prevalence of Positive PPD among 1-10 yr old children in Muntinlupa
• target population - all 1-10 yr old children in Muntinlupa
  • Assume N = 1000
• sample population - ex. 100 1-10 yr old children in Muntinlupa
• sampling frame - list of 1-10 yr old children per barangay
  • N = 1000: Bgy A = 500; Bgy B = 300; Bgy C = 200
• From each barangay, select the number of children using SRS (proportionate sampling)
  • n = 100: Bgy A = 50; Bgy B = 30; Bgy C = 20
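The proportionate allocation in this example can be sketched in Python as below. The function name `proportional_allocation` is made up for illustration, and children are assumed to be numbered 1..N_h within each barangay. With other stratum sizes, simple rounding may not add up exactly to n.

```python
import random

strata_sizes = {"Bgy A": 500, "Bgy B": 300, "Bgy C": 200}  # N = 1000
n_total = 100

def proportional_allocation(sizes, n):
    """Allocate n across strata in proportion to stratum size."""
    total = sum(sizes.values())
    # Rounding is exact here; in general the rounded parts may not sum to n
    return {name: round(n * size / total) for name, size in sizes.items()}

allocation = proportional_allocation(strata_sizes, n_total)

# Simple random sample within each stratum (IDs 1..N_h per barangay)
rng = random.Random(2024)  # arbitrary seed for reproducibility
sample = {
    name: sorted(rng.sample(range(1, strata_sizes[name] + 1), k))
    for name, k in allocation.items()
}

print(allocation)
```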
Systematic random sampling
selection of every kth unit in the population
$k = \dfrac{\text{total number in population}}{\text{calculated sample size}}$
the first unit is selected randomly from among the first k units
Example: The researcher wants to determine prevalence of Positive PPD among 1-10 yr old children in Muntinlupa
• target population - all 1-10 yr old children in Muntinlupa
  • Assume N = 1000
• sample population - ex. 100 1-10 yr old children in Muntinlupa
• sampling frame - list of all 1-10 yr old children in Muntinlupa
  • k = 1000/100 = 10
  • Choose the random start (from nos. 1 to 10)
  • Chosen random start = 3; then the child with ID no. 3 is included in the sample, then the 13th in the list, then the 23rd, and so on
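The selection rule in this example can be written as a short Python sketch. The function name `systematic_sample` is illustrative only, and the sketch assumes N is an exact multiple of n so that k is a whole number.

```python
import random

def systematic_sample(N, n, start=None, rng=random):
    """Select every k-th unit from a frame numbered 1..N."""
    k = N // n  # sampling interval
    if start is None:
        start = rng.randint(1, k)  # random start among the first k units
    return [start + i * k for i in range(n)]

# Reproduce the slide's example: N = 1000, n = 100, random start = 3
ids = systematic_sample(1000, 100, start=3)
print(ids[:3], ids[-1])
```

With a random start of 3, the selected IDs are 3, 13, 23, and so on up to 993.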
Cluster Sampling
the population is first divided into clusters, usually based on geographical proximity
a random sample of such clusters is selected
all units in the clusters are selected
Example: The researcher wants to determine prevalence of Positive PPD among 1-10 yr old children in Muntinlupa
• target population - all 1-10 yr old children in Muntinlupa
  • Assume N = 1000
• sample population - ex. 100 1-10 yr old children in Muntinlupa
• Clusters = barangays
• Sampling frame - list of barangays
• Select clusters (barangays) using simple random sampling
• Include all children living in the selected barangays
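A minimal Python sketch of these steps follows. The barangay names, the child ID ranges and the choice of 2 clusters are all hypothetical, chosen only to make the example runnable.

```python
import random

# Hypothetical frame: each barangay (cluster) maps to the IDs of its children
frame = {
    "Bgy A": list(range(1, 501)),
    "Bgy B": list(range(501, 801)),
    "Bgy C": list(range(801, 1001)),
}

def cluster_sample(frame, n_clusters, rng):
    """Randomly pick clusters, then include every unit in the chosen clusters."""
    chosen = rng.sample(sorted(frame), n_clusters)  # SRS of cluster names
    children = [child for bgy in chosen for child in frame[bgy]]
    return chosen, children

chosen, children = cluster_sample(frame, 2, random.Random(2024))
print(chosen, len(children))
```

Note that only the list of barangays is needed as a sampling frame; the children are listed only within the selected barangays.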
Multi-stage sampling design
for sample surveys of wide coverage, e.g. nationwide surveys
Example:
• 15 regions
• 2 provinces per region
• 4 towns per province
• 50 elderly per town
RANDOM ALLOCATION in EXPERIMENTAL Studies
Random Allocation – the process of assigning subjects to different treatments by using random numbers
Example: Effect of Probiotic Treatment of Acute Tonsillopharyngitis in children 2-5 years of age: A randomized double blind trial
Assuming the sample size calculation gives 50 per group, which patients will receive the probiotic?
Use software: http://mahmoodsaghaei.tripod.com/Softwares/randalloc.html
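As an alternative illustration, a simple (unblocked) random allocation for 50 patients per group can be sketched in Python. This is not the method used by the software linked above; the arm label "control", the function name `allocate` and the seed are assumptions made for the example.

```python
import random

def allocate(n_per_group, arms=("probiotic", "control"), seed=None):
    """Randomly assign consecutive patient numbers to equally sized arms."""
    rng = random.Random(seed)
    # Build a list with n_per_group slots for each arm, then shuffle it
    assignments = [arm for arm in arms for _ in range(n_per_group)]
    rng.shuffle(assignments)
    # Patient 1 gets the first shuffled slot, patient 2 the second, and so on
    return {patient_no: arm for patient_no, arm in enumerate(assignments, start=1)}

schedule = allocate(50, seed=2024)  # arbitrary seed for reproducibility
print(schedule[1], schedule[2], schedule[3])
```

In a double blind trial, the resulting schedule would be kept by someone not involved in treating or assessing the patients.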
What makes a good sample population?
“A GOOD SAMPLE must be
(1) selected at random to reduce bias,
(2) representative to improve validity, and
(3) large enough to increase precision.”
How many subjects are to be included in the sample?
SAMPLE SIZE CALCULATION
Why calculate?
• for planning purposes
• for “power” of the study
• for meaningful results
• to minimize sampling error
Sample size calculation
Things to know:
• type of the study: descriptive or analytic (cohort, case-control, clinical trial)?
• study objective: proportions or means?
• usual values?
• amount of deviation from the true value? Clinically important difference?
• confidence level? power?
• one-tailed or two-tailed hypotheses?
Confidence level, Power
Errors in Hypothesis Testing
                                        | TRUTH: Groups are the same | TRUTH: Groups differ
DATA support                            |                            |
Do not reject Ho: Groups are the same   | OK                         | Type II error
Reject Ho: Groups differ                | Type I error               | OK ($1-\beta$) or power
Confidence level, Power
Type I error
-- rejecting a true Ho
-- $\alpha$: probability of committing Type I error
-- $1-\alpha$: the confidence level
-- usual values: $\alpha = 0.05$, $1-\alpha = 0.95$
Type II error
-- not rejecting a false Ho
-- $\beta$: probability of committing Type II error
-- $1-\beta$: power of the study; ability to detect a true difference
-- usual values: $\beta = 0.20$, $1-\beta = 0.80$
How do we calculate sample size?
-- Using formulas
-- Using tables of sample sizes
-- Using statistical calculators (StatCalc of EpiInfo, Open EPI)
How do we calculate sample size?
A.J. Dobson’s formula (SIMPLE RANDOM SAMPLE)
Sample size for descriptive studies
1. Estimation of a population proportion
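$$ n = \frac{p\,(100 - p)\, f(1-\alpha)}{\delta^{2}} $$
where
n = computed sample size
p = estimate of the proportion (in percent)
$\delta$ = the desired half-width of the confidence interval (margin of error), in percentage points
$1-\alpha$ = confidence level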
Sample size for descriptive studies
1. Estimation of a population proportion
Table 1. Values of $f(1-\alpha)$ for various confidence levels $100(1-\alpha)\%$
$(1-\alpha)$    | 0.8   | 0.9   | 0.95  | 0.99
$f(1-\alpha)$*  | 1.642 | 2.706 | 3.842 | 6.635
* $f(1-\alpha)$ is the square of the upper $\alpha/2$ point of the standard normal distribution
Sample size for descriptive studies
1. Estimation of a population proportion
A researcher wants to estimate the prevalence of positive PPD among 1-10 yr old children in Muntinlupa. What is the sample size if the expected prevalence is 15% and a 95% confidence interval of ±4% (11-19%) is desired?
$$ n = \frac{15\,(100 - 15)\,(3.842)}{4^{2}} \approx 306 $$
To estimate the prevalence of positive PPD among 1-10 yr old children in Muntinlupa with a 4% margin of error at a 95% confidence level, assuming that the population prevalence is 15%, 306 children should be included in the sample.
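The calculation above can be checked with a short Python sketch. The function names `f` and `sample_size_proportion` are made up for illustration. The sketch computes $f(1-\alpha)$ from the standard normal distribution instead of reading it from Table 1, and rounds to the nearest whole number as the slide does.

```python
from statistics import NormalDist

def f(confidence):
    """Square of the upper alpha/2 point of the standard normal distribution."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z ** 2

def sample_size_proportion(p, delta, confidence=0.95):
    """n = p(100 - p) f(1 - alpha) / delta^2, with p and delta in percent."""
    return p * (100 - p) * f(confidence) / delta ** 2

# Expected prevalence 15%, margin of error 4 percentage points, 95% confidence
n = sample_size_proportion(p=15, delta=4)
print(round(n))  # 306
```

Many references round the result up rather than to the nearest whole number; here both give a sample of about 306 to 307 children.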
Sample size calculation using EPI-Info6
http://www.cdc.gov/epiinfo/Epi6/ei6.htm
STATCALC program
http://www.openepi.com/Menu/OpenEpiMenu.htm
Calculate sample size: RCT
Example: Efficacy of VCO as an adjunct in primary TB Therapy among children ages 2-9 years old
Objective: To compare resolution of radiologic signs between patients given VCO and those given placebo
VCO group (Exposed): (+) resolution / (-) resolution
Placebo group (Unexposed): (+) resolution / (-) resolution
Expected outcomes (from related literature):
• 50% with (+) resolution in Placebo group
• 75% with (+) resolution in VCO group
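For a rough check of what a calculator might return for these inputs, the sketch below uses one common normal-approximation formula for comparing two proportions with equal group sizes and no continuity correction. This is an assumption, not necessarily the formula StatCalc or OpenEpi applies, so their outputs may differ. The two-sided $\alpha = 0.05$ and power of 0.80 are the usual values given earlier, and the function name `n_per_group` is illustrative.

```python
from math import ceil, sqrt
from statistics import NormalDist

def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Sample size per group for a two-sided comparison of two proportions."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for power = 0.80
    p_bar = (p1 + p2) / 2                          # average of the two proportions
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return ceil(numerator / (p1 - p2) ** 2)

# 75% resolution expected with VCO vs 50% with placebo
n = n_per_group(0.75, 0.50)
print(n)  # 58 per group under these assumptions
```

Formulas with a continuity correction give somewhat larger numbers, so the figure here is best read as a lower-end estimate.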
SUMMARY
• Statistical inference allows us to generalize sample results to the target population
• random sampling ensures the “representativeness” of the sample
• sample size is based on
  • the research objectives/design
  • sample estimates, variability from previous studies
  • power, level of confidence
  • operational constraints (time, resources)
THANK YOU