[email protected]/2010 Population & Sample Sudigdo Sastroasmoro

Metlit-02 Populasi, Sampel & Variabel - Prof. dr. Sudigdo S, SpA(K).ppt

Oct 26, 2015


Laurencia Leny

Research methodology
Transcript
Page 1: Metlit-02 Populasi, Sampel & Variabel - Prof. dr. Sudigdo S, SpA(K).ppt

[email protected]/2010

Population & Sample

Sudigdo Sastroasmoro

Page 2: Metlit-02 Populasi, Sampel & Variabel - Prof. dr. Sudigdo S, SpA(K).ppt

Population is a large group of study subjects (humans, animals, tissues, blood specimens, medical records, etc.) with defined characteristics [“Population is a group of study subjects defined by the researcher as population”]

Sample is a subset of the population which will be directly investigated. The sample should be (or be assumed to be) representative of the population; otherwise all statistical analyses will be invalid

All investigations are always performed in the sample, and the results are then applied to the population

Page 3: Metlit-02 Populasi, Sampel & Variabel - Prof. dr. Sudigdo S, SpA(K).ppt

Avoid using ambiguous terms

Sample population
Sampled population
Populasi sampel (Indonesian: "sample population")
Study population ~ sample

Page 4: Metlit-02 Populasi, Sampel & Variabel - Prof. dr. Sudigdo S, SpA(K).ppt

Gap between Das Sein & Das Sollen

Literature study

Research question(s) / Hypothesis

Methods / Design

Data collection & analyses

Conclusions

In the real world (“Population”)

In the sample

Infer

Page 5: Metlit-02 Populasi, Sampel & Variabel - Prof. dr. Sudigdo S, SpA(K).ppt

The sample is assumed to be representative of the population. In research, measurements are always done in the sample, and the results are then applied to the population.

[Diagram: sample (S) and population (P)]

Page 6: Metlit-02 Populasi, Sampel & Variabel - Prof. dr. Sudigdo S, SpA(K).ppt

[Diagram: Population (P) → Sampling → Sample (S) → Investigation → Results in S → Inference → P]

Page 7: Metlit-02 Populasi, Sampel & Variabel - Prof. dr. Sudigdo S, SpA(K).ppt

[Diagram: Target population, Accessible population, Intended sample, Actual study subjects]

Page 8: Metlit-02 Populasi, Sampel & Variabel - Prof. dr. Sudigdo S, SpA(K).ppt

Target population = domain = the population to which the results of the study will be applied. In clinical research it is usually characterized by demographic & clinical characteristics; e.g. normal infants, teens with epilepsy, post-menopausal women with osteoporosis.

Accessible population = the subset of the target population which can be accessed by the investigator. Frame: time & place. Example: teens with epilepsy in RSCM, 2000-2005; women with osteoporosis, 2002 RSGS

Intended sample = subjects who meet the eligibility criteria and are selected to be included in the study

Actual study subjects = subjects who actually completed their participation in the study

Page 9: Metlit-02 Populasi, Sampel & Variabel - Prof. dr. Sudigdo S, SpA(K).ppt

Target population (demographic, clinical)
   ↓ Usually based on practical purposes
Accessible population (+ time, place)
   ↓ Appropriate sampling technique
Intended sample [Subjects selected for study]
   ↓ [Non-response, drop outs, withdrawals, loss to follow-up]
Actual study subjects [Subjects who completed the study]

Page 10: Metlit-02 Populasi, Sampel & Variabel - Prof. dr. Sudigdo S, SpA(K).ppt

Target population (TP, domain)
   External validity II: Does AP represent TP?
Accessible population (AP)
   [External validity I: Does IS represent AP?]
Intended sample (IS)
   [Internal validity: Does ASS represent IS?]
Actual study subjects (ASS)

Page 11: Metlit-02 Populasi, Sampel & Variabel - Prof. dr. Sudigdo S, SpA(K).ppt

Validity: Internal & external

Internal validity: how well the study was done (usually measurement, but also including whether the actual study subjects represent the intended sample or not). Many drop outs? Loss to follow-up? Low compliance?

External validity I: whether the intended sample represents the accessible population (random sampling? convenience sampling?)

External validity II: whether the accessible population represents the target population. This cannot be calculated, but can be judged by common sense & general knowledge

Page 12: Metlit-02 Populasi, Sampel & Variabel - Prof. dr. Sudigdo S, SpA(K).ppt

Sampling methods

A. Probability sampling
   Simple random sampling (random number table, computer generated)
   Stratified random sampling
   Systematic sampling
   Cluster sampling
   Others: two stage cluster sampling, etc.

B. Non-probability sampling
   Consecutive sampling
   Convenience sampling
   Judgmental sampling / Purposive sampling

Page 13: Metlit-02 Populasi, Sampel & Variabel - Prof. dr. Sudigdo S, SpA(K).ppt

Predicting the 1936 Election

In 1936, Literary Digest mailed questionnaires to 10 million people, asking who they would vote for in the upcoming presidential election. The list was compiled from magazine subscribers, car owners and telephone directories. Based on the 2.3 million responses, they predicted a victory for Republican Landon over Roosevelt by a 60 to 40 margin.

Roosevelt won with 61% of the vote, to 36% for Landon.

George Gallup correctly predicted the election (and the results of the Literary Digest poll!) to within 1 percent, using random samples.

Page 14: Metlit-02 Populasi, Sampel & Variabel - Prof. dr. Sudigdo S, SpA(K).ppt

Probability sampling (1)

Simple random sampling: select 50 out of 900 students

1. Using a random number table
   Example: 146*72 2*238*9 12*970 *127*63 8*759*0 29*874 *390*48 6*8301

2. Using computer generated random numbers (pseudo-random)
   Command: How many subjects do you have? 900
   How many do you want to select? 50
   Enter → 017, 068, 113, 142, etc.

Repeating the procedure exactly will result in completely different numbers
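The computer-generated option can be sketched in a few lines of Python. This is only an illustration, not the program used on the slide: the function name `simple_random_sample` and the optional `seed` parameter are assumptions. With no seed, each run gives different numbers, as the slide notes; passing a seed makes the draw reproducible.

```python
import random

def simple_random_sample(population_size, sample_size, seed=None):
    """Draw subject numbers from 1..population_size without replacement."""
    rng = random.Random(seed)  # seed=None: a different selection on each run
    return sorted(rng.sample(range(1, population_size + 1), sample_size))

# Select 50 out of 900 students, as in the slide.
selected = simple_random_sample(900, 50)
print(selected[:4])
```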

Page 15: Metlit-02 Populasi, Sampel & Variabel - Prof. dr. Sudigdo S, SpA(K).ppt

Simple Random Sample: n = 20, N= 2000

Page 16: Metlit-02 Populasi, Sampel & Variabel - Prof. dr. Sudigdo S, SpA(K).ppt

Probability sampling (2)

Systematic sampling: every m-th subject is selected, starting from a selected number k

Example: k = 3, m = 10: 3, 13, 23, 33, 43, etc.

Better (more representative) than SRS if there are no natural trends or strata

Page 17: Metlit-02 Populasi, Sampel & Variabel - Prof. dr. Sudigdo S, SpA(K).ppt

Systematic sample: N = 2000, n = 20, m = 100, k = 45

45, 145, 245, ..., 1945
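A minimal sketch of the systematic sample on this slide (N = 2000, m = 100, k = 45). The function name `systematic_sample` is an assumption, and so is the choice to draw k at random between 1 and m when no starting number is given.

```python
import random

def systematic_sample(population_size, interval, start=None, seed=None):
    """Select every `interval`-th subject, beginning at subject number `start`."""
    if start is None:
        # Assumption: the starting number k is drawn at random from 1..interval.
        start = random.Random(seed).randint(1, interval)
    return list(range(start, population_size + 1, interval))

sample = systematic_sample(2000, 100, start=45)
print(len(sample), sample[:3], sample[-1])  # 20 [45, 145, 245] 1945
```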

Page 18: Metlit-02 Populasi, Sampel & Variabel - Prof. dr. Sudigdo S, SpA(K).ppt

Probability sampling (2)

Stratified [random] sampling: random sampling is done in each stratum separately, e.g., by sex, age group, stage of disease, etc.

The results are then combined

Page 19: Metlit-02 Populasi, Sampel & Variabel - Prof. dr. Sudigdo S, SpA(K).ppt

Stratified sample of 20 from 4 strata
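A stratified sample of 20 from 4 strata could be drawn as sketched below. The strata here are hypothetical (four equal groups of 500 subject numbers), and taking the same number from each stratum is only one possible allocation; the slide does not say how the 20 were split.

```python
import random

def stratified_sample(strata, per_stratum, seed=None):
    """Draw a simple random sample of `per_stratum` subjects within each stratum."""
    rng = random.Random(seed)
    return {name: sorted(rng.sample(members, per_stratum))
            for name, members in strata.items()}

# Hypothetical strata: 2000 subjects split into 4 groups of 500.
strata = {f"stratum_{i + 1}": list(range(i * 500 + 1, (i + 1) * 500 + 1))
          for i in range(4)}
sample = stratified_sample(strata, per_stratum=5, seed=2010)
total = sum(len(members) for members in sample.values())
print(total)  # 20
```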

Page 20: Metlit-02 Populasi, Sampel & Variabel - Prof. dr. Sudigdo S, SpA(K).ppt

Probability sampling (3)

Cluster sampling

Subjects are selected separately according to cluster or place (RT, RW, district, etc.)

Page 21: Metlit-02 Populasi, Sampel & Variabel - Prof. dr. Sudigdo S, SpA(K).ppt

Cluster Sample of 20 (cluster size = 4)
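One way to obtain a cluster sample of 20 with clusters of size 4 is to pick 5 whole clusters at random, as in this sketch. The population of 2000 subjects numbered consecutively into clusters of four is a hypothetical setup, and `cluster_sample` is an illustrative name.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Pick whole clusters at random and include every subject in them."""
    rng = random.Random(seed)
    chosen = rng.sample(range(len(clusters)), n_clusters)
    return [subject for idx in sorted(chosen) for subject in clusters[idx]]

# Hypothetical population: 2000 subjects in 500 clusters of size 4.
clusters = [list(range(start, start + 4)) for start in range(1, 2001, 4)]
sample = cluster_sample(clusters, n_clusters=5, seed=2010)
print(len(sample))  # 20
```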

Page 22: Metlit-02 Populasi, Sampel & Variabel - Prof. dr. Sudigdo S, SpA(K).ppt

Non-probability sampling (1)

Consecutive sampling:

Subjects are selected according to their appearance on the list

Most commonly used in clinical studies

Can be expected to resemble random sampling if the time span is long enough

This is the best of the non-probability sampling methods

Page 23: Metlit-02 Populasi, Sampel & Variabel - Prof. dr. Sudigdo S, SpA(K).ppt

Non-probability sampling (2)

Convenience sampling
Judgmental sampling

They are rarely justified except under certain conditions, e.g. normal values

Page 24: Metlit-02 Populasi, Sampel & Variabel - Prof. dr. Sudigdo S, SpA(K).ppt

Note

All statistical analyses (inferences) are based on (simple) random sampling

Whether or not a sample is representative of the population depends on whether or not it resembles the sample that random sampling would have produced

Page 25: Metlit-02 Populasi, Sampel & Variabel - Prof. dr. Sudigdo S, SpA(K).ppt

How to generalize results in the sample to the population:

Introduction to statistical inference

Page 26: Metlit-02 Populasi, Sampel & Variabel - Prof. dr. Sudigdo S, SpA(K).ppt

IMPORTANT!!! Statistical significance vs. clinical importance

A negligible clinical difference may be statistically very significant if the number of subjects is very large, e.g., a difference in reduction of cholesterol level of 3 mg/dl, n1 = n2 = 10,000; p = 0.00002

A large clinical difference may be statistically non-significant if the number of subjects is very small, e.g., a 30% difference in cure rate, n1 = n2 = 10; p = 0.74
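The effect of sample size on the p value can be illustrated with a rough two-sample z-test. The slide does not give the standard deviation of the cholesterol reduction, so the value of 50 mg/dl used here is purely an assumption, chosen because it gives a p value of the same order as the one quoted. The function name is also illustrative.

```python
from math import sqrt
from statistics import NormalDist

def two_sample_z_p_value(mean_diff, sd, n_per_group):
    """Two-sided p value for a difference in means, equal SD and group size."""
    se = sd * sqrt(2 / n_per_group)
    z = mean_diff / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Same 3 mg/dl difference, assumed SD of 50 mg/dl, two different sample sizes.
p_large = two_sample_z_p_value(mean_diff=3, sd=50, n_per_group=10_000)
p_small = two_sample_z_p_value(mean_diff=3, sd=50, n_per_group=10)
print(f"n = 10,000 per group: p = {p_large:.5f}")
print(f"n = 10 per group:     p = {p_small:.2f}")
```

The difference itself is identical in both calls; only the number of subjects changes the p value.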

Page 27: Metlit-02 Populasi, Sampel & Variabel - Prof. dr. Sudigdo S, SpA(K).ppt

Clinical importance vs. statistical significance

[Diagram: subjects randomized (R) to standard treatment (n = 10000) or new treatment (n = 10000). Cholesterol level, mg/dl: mean 300 mg/dl in both groups at the start, means of 200 and 197 afterwards. t-test: df = 9998, p = 0.00002. Labels: Clinical, Statistical]

Page 28: Metlit-02 Populasi, Sampel & Variabel - Prof. dr. Sudigdo S, SpA(K).ppt

Clinical significance vs. statistical significance

              Cured   Died
Standard Rx   0       10 (100%)
New Rx        3       7 (70%)

Clinical: absolute risk reduction = 30%
Statistical: Fisher exact test, p = 0.211
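The Fisher exact p value for this 2 x 2 table can be checked with the standard library alone. The sketch below uses the usual two-sided definition (summing the probabilities of all tables that are no more likely than the observed one); the function name is an assumption.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p value for the 2 x 2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c

    def weight(x):
        # Number of ways to get x in the top-left cell with fixed margins.
        return comb(row1, x) * comb(row2, col1 - x)

    lo, hi = max(0, col1 - row2), min(row1, col1)
    observed = weight(a)
    extreme = sum(weight(x) for x in range(lo, hi + 1) if weight(x) <= observed)
    return extreme / comb(row1 + row2, col1)

# Standard Rx: 0 cured, 10 died; New Rx: 3 cured, 7 died.
p = fisher_exact_two_sided(0, 10, 3, 7)
print(round(p, 3))  # 0.211
```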

Page 29: Metlit-02 Populasi, Sampel & Variabel - Prof. dr. Sudigdo S, SpA(K).ppt

Abstract
• Objectives:
• Methods:
• Results: After 2 months of treatment, there was a significant difference in LDL (P = 0.0032) and HDL (P = 0.048), but there was no significant difference in triglyceride (P = 0.073) between the 2 groups.
• Conclusion:

Page 30: Metlit-02 Populasi, Sampel & Variabel - Prof. dr. Sudigdo S, SpA(K).ppt

Introduction to statistical inference

Can the results of the study (in the sample) be applied to the accessible or target population?

Hypothesis testing & confidence interval

Page 31: Metlit-02 Populasi, Sampel & Variabel - Prof. dr. Sudigdo S, SpA(K).ppt

Statistic and Parameter

An observed value drawn from the sample is called a statistic (cf. statistics, the science)

The corresponding value in the population is called a parameter

We measure, analyze, etc. statistics and translate them into parameters

Page 32: Metlit-02 Populasi, Sampel & Variabel - Prof. dr. Sudigdo S, SpA(K).ppt

Examples of statistics:

Proportion
Percentage
Mean
Median
Mode
Difference in proportion/mean
OR
RR
Sensitivity
Specificity
Kappa
LR
NNT

Page 33: Metlit-02 Populasi, Sampel & Variabel - Prof. dr. Sudigdo S, SpA(K).ppt

There are 2 ways of inferring a parameter from a statistic:

Hypothesis testing: p value
Estimation: confidence interval (CI)

P value & CI express the same concept in different ways

Page 34: Metlit-02 Populasi, Sampel & Variabel - Prof. dr. Sudigdo S, SpA(K).ppt

P value

The probability of obtaining the observed result (or a more extreme one) if the null hypothesis (Ho) were true, i.e., if chance alone were at work

Page 35: Metlit-02 Populasi, Sampel & Variabel - Prof. dr. Sudigdo S, SpA(K).ppt

Group   Success    Failure    Total
C       30 (60%)   20 (40%)   50
E       40 (80%)   10 (20%)   50

X2 = ; df = 1; p = 0.0432

Page 36: Metlit-02 Populasi, Sampel & Variabel - Prof. dr. Sudigdo S, SpA(K).ppt

Group   Success    Failure    Total
C       30 (60%)   20 (40%)   50
E       40 (80%)   10 (20%)   50

X2 = ; df = 1; p = 0.0432

If drugs E and C were equally effective, we could still obtain the above result (a difference in success rate of 20%), but the probability is small (4.32%)

In other words: if drugs E and C were equally effective, the probability of obtaining a difference this large merely by chance is 4.32%

If we define in advance that p < 0.05 is significant, then the result is called statistically significant

Page 37: Metlit-02 Populasi, Sampel & Variabel - Prof. dr. Sudigdo S, SpA(K).ppt

A similar interpretation applies to ALL hypothesis testing: t-test, Anova, non-parametric tests, Pearson correlation, multivariate tests, etc.:

If the null hypothesis were true, the probability of obtaining the result would be ... (for example 0.02 or 2%, etc.)

Page 38: Metlit-02 Populasi, Sampel & Variabel - Prof. dr. Sudigdo S, SpA(K).ppt

Confidence Intervals

Estimate the range of values (parameter) in the population using a statistic in the sample (as point estimate)

Page 39: Metlit-02 Populasi, Sampel & Variabel - Prof. dr. Sudigdo S, SpA(K).ppt

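If the observed result in the sample is X, what is the figure in the population?

[Diagram: a statistic X (point estimate) observed in the sample (S), and the CI around X as the range for the corresponding value in the population (P)]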

Page 40: Metlit-02 Populasi, Sampel & Variabel - Prof. dr. Sudigdo S, SpA(K).ppt

Most commonly used CI:

CI 90% corresponds to p 0.10
CI 95% corresponds to p 0.05
CI 99% corresponds to p 0.01

Note:
p value only for analytical studies
CI for descriptive and analytical studies

Page 41: Metlit-02 Populasi, Sampel & Variabel - Prof. dr. Sudigdo S, SpA(K).ppt

How to calculate CI

General Formula:

$$ \mathrm{CI} = p \pm Z_{\alpha} \times SE $$

• p = point estimate, a value drawn from the sample (a statistic)

• Z_α = standard normal deviate for α; if α = 0.05, Z = 1.96 (~ 95% CI)

• SE = standard error of the point estimate

Page 42: Metlit-02 Populasi, Sampel & Variabel - Prof. dr. Sudigdo S, SpA(K).ppt

Example 1

100 FKUI students, 60 females (p = 0.6)

What is the proportion of females among Indonesian FK (medical faculty) students? (assuming FKUI represents FK in Indonesia)

Page 43: Metlit-02 Populasi, Sampel & Variabel - Prof. dr. Sudigdo S, SpA(K).ppt

Example 1

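$$ SE(p) = \sqrt{\frac{pq}{n}} = \sqrt{\frac{0.6 \times 0.4}{100}} \approx \frac{0.5}{10} = 0.05 $$

$$ 95\%\ \mathrm{CI} = 0.6 \pm 1.96 \times 0.05 \approx 0.6 \pm 0.1 = 0.5;\ 0.7 $$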
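The same calculation for Example 1 (60 females among 100 students) can be written as a small function. This is a sketch of the normal-approximation interval only; the name `proportion_ci` is an assumption.

```python
from math import sqrt

def proportion_ci(p, n, z=1.96):
    """Normal-approximation CI for a proportion: p ± z * sqrt(p * q / n)."""
    se = sqrt(p * (1 - p) / n)
    return p - z * se, p + z * se

low, high = proportion_ci(0.6, 100)
print(f"{low:.2f} to {high:.2f}")  # 0.50 to 0.70
```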

Page 44: Metlit-02 Populasi, Sampel & Variabel - Prof. dr. Sudigdo S, SpA(K).ppt

Example 2: CI of the mean

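• 100 newborn babies, mean BW = 3000 (SD = 400) grams, what is the 95% CI?

$$ 95\%\ \mathrm{CI} = \bar{x} \pm 1.96 \times SEM, \qquad SEM = \frac{SD}{\sqrt{n}} $$

$$ 95\%\ \mathrm{CI} = 3000 \pm 1.96 \times \frac{400}{\sqrt{100}} \approx 3000 \pm 80 = (3000 - 80);\ (3000 + 80) = 2920;\ 3080 $$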
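A short sketch of Example 2, using the unrounded value 1.96 x 40 = 78.4 instead of the slide's rounded 80. The function name `mean_ci` is an assumption.

```python
from math import sqrt

def mean_ci(mean, sd, n, z=1.96):
    """CI for a mean: mean ± z * SEM, where SEM = SD / sqrt(n)."""
    sem = sd / sqrt(n)
    return mean - z * sem, mean + z * sem

low, high = mean_ci(3000, 400, 100)
print(f"{low:.1f} to {high:.1f} grams")  # 2921.6 to 3078.4 grams
```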

Page 45: Metlit-02 Populasi, Sampel & Variabel - Prof. dr. Sudigdo S, SpA(K).ppt

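Example 3: CI of difference between proportions (p1-p2)

• 50 patients with drug A, 30 cured (p1=0.6)
• 50 patients with drug B, 40 cured (p2=0.8)

$$ 95\%\ \mathrm{CI}(p_1 - p_2) = (p_1 - p_2) \pm 1.96 \times SE(p_1 - p_2) $$

$$ SE(p_1 - p_2) = \sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}} = \sqrt{\frac{0.6 \times 0.4}{50} + \frac{0.8 \times 0.2}{50}} = \sqrt{\frac{0.4}{50}} \approx 0.09 $$

Taking the size of the difference as 0.8 - 0.6 = 0.2:

$$ 95\%\ \mathrm{CI} = 0.2 \pm 1.96 \times 0.09 = (0.2 - 0.18);\ (0.2 + 0.18) = 0.02;\ 0.38 $$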
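As a check on the arithmetic, the interval for the difference between the two cure rates can be computed directly from the general formula, difference ± 1.96 x SE. The function name is an assumption, and the difference is taken as drug B minus drug A so that it is positive. With the unrounded SE the interval is roughly 0.025 to 0.375.

```python
from math import sqrt

def diff_proportions_ci(p1, n1, p2, n2, z=1.96):
    """CI for p2 - p1 using SE = sqrt(p1*q1/n1 + p2*q2/n2)."""
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p2 - p1
    return diff - z * se, diff + z * se

# Drug A: 30 of 50 cured (0.6); drug B: 40 of 50 cured (0.8).
low, high = diff_proportions_ci(0.6, 50, 0.8, 50)
print(f"{low:.3f} to {high:.3f}")  # 0.025 to 0.375
```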

Page 46: Metlit-02 Populasi, Sampel & Variabel - Prof. dr. Sudigdo S, SpA(K).ppt

Example 4: CI for difference between 2 means

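Mean systolic BP:
50 smokers = 146.4 (SD 18.5) mmHg
50 non-smokers = 140.4 (SD 16.8) mmHg

x1 - x2 = 6.0 mmHg

$$ 95\%\ \mathrm{CI}(\bar{x}_1 - \bar{x}_2) = (\bar{x}_1 - \bar{x}_2) \pm 1.96 \times SE(\bar{x}_1 - \bar{x}_2) $$

$$ SE(\bar{x}_1 - \bar{x}_2) = s \times \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} $$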

Page 47: Metlit-02 Populasi, Sampel & Variabel - Prof. dr. Sudigdo S, SpA(K).ppt

Example 4: CI for difference between 2 means

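$$ s = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}} = \sqrt{\frac{(49 \times 18.5^2) + (49 \times 16.8^2)}{98}} = 17.7 $$

$$ SE(\bar{x}_1 - \bar{x}_2) = 17.7 \times \sqrt{\frac{1}{50} + \frac{1}{50}} \approx 3.53 $$

$$ 95\%\ \mathrm{CI} = 6.0 \pm (1.96 \times 3.53) \approx -1.0;\ 13.0 $$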
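The pooled-SD calculation for Example 4 can be sketched as follows, using the SDs given with the data (18.5 and 16.8 mmHg). The function name is an assumption; the unrounded limits are about -0.9 and 12.9 mmHg, which the slide rounds further.

```python
from math import sqrt

def diff_means_ci(mean1, sd1, n1, mean2, sd2, n2, z=1.96):
    """CI for mean1 - mean2 using the pooled standard deviation."""
    pooled_sd = sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    se = pooled_sd * sqrt(1 / n1 + 1 / n2)
    diff = mean1 - mean2
    return diff - z * se, diff + z * se

# Smokers: 146.4 (SD 18.5), n = 50; non-smokers: 140.4 (SD 16.8), n = 50.
low, high = diff_means_ci(146.4, 18.5, 50, 140.4, 16.8, 50)
print(f"{low:.1f} to {high:.1f} mmHg")  # -0.9 to 12.9 mmHg
```

The interval includes 0, so this difference of 6 mmHg would not be statistically significant at the 0.05 level.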

Page 48: Metlit-02 Populasi, Sampel & Variabel - Prof. dr. Sudigdo S, SpA(K).ppt

Other commonly supplied CI

Relative risk (RR)
Odds ratio (OR)
Sensitivity, specificity (Se, Sp)
Likelihood ratio (LR)
Relative risk reduction (RRR)
Number needed to treat (NNT)

Page 49: Metlit-02 Populasi, Sampel & Variabel - Prof. dr. Sudigdo S, SpA(K).ppt

Altman & Gore

• Statistics with confidence

Page 50: Metlit-02 Populasi, Sampel & Variabel - Prof. dr. Sudigdo S, SpA(K).ppt

Suggested CI presentation:

• 95% CI: 1.5 to 4.5
• 95% CI: -2.5 to 4.3
• 95% CI: -12 to -6

• Not recommended: 3 ± 1.5
• Not recommended: -9 ± -3

Page 51: Metlit-02 Populasi, Sampel & Variabel - Prof. dr. Sudigdo S, SpA(K).ppt

In contrast to the CI for a proportion, a mean, or a difference between proportions/means, where the CI limits are symmetrical around the point estimate, CIs for RR, OR, LR, NNT are asymmetrical because the calculations involve logarithms
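To see where the asymmetry comes from, here is a sketch of one common method for the CI of a relative risk: the interval is computed on the log scale and then transformed back. The 2 x 2 counts (20 events among 100 exposed, 10 among 100 unexposed) are hypothetical and are not taken from the slides.

```python
from math import exp, log, sqrt

def relative_risk_ci(a, n1, c, n2, z=1.96):
    """RR and its CI via the log method.

    a events among n1 exposed subjects, c events among n2 unexposed subjects.
    """
    rr = (a / n1) / (c / n2)
    se_log_rr = sqrt(1 / a - 1 / n1 + 1 / c - 1 / n2)
    low = exp(log(rr) - z * se_log_rr)
    high = exp(log(rr) + z * se_log_rr)
    return rr, low, high

# Hypothetical data: 20/100 events in the exposed, 10/100 in the unexposed.
rr, low, high = relative_risk_ci(20, 100, 10, 100)
print(f"RR = {rr:.1f} (95% CI {low:.2f} ; {high:.2f})")
```

The limits are symmetrical around ln(RR) but not around RR itself: the upper limit lies much further from the point estimate than the lower one.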

Page 52: Metlit-02 Populasi, Sampel & Variabel - Prof. dr. Sudigdo S, SpA(K).ppt

Examples

RR = 5.6 (95% CI 1.2 ; 23.7)
OR = 12.8 (95% CI 3.6 ; 44.2)
NNT = 12 (95% CI 9 ; 26)

Page 53: Metlit-02 Populasi, Sampel & Variabel - Prof. dr. Sudigdo S, SpA(K).ppt

If the p value is < 0.05, then the 95% CI:
• excludes 0 (for a difference), because if A = B then A - B = 0 and p > 0.05
• excludes 1 (for a ratio), because if A = B then A/B = 1 and p > 0.05

For a small number of subjects, a computer-calculated CI may not follow this rule because of the continuity correction automatically applied by the computer

Page 54: Metlit-02 Populasi, Sampel & Variabel - Prof. dr. Sudigdo S, SpA(K).ppt

Concluding remarks

In every study the sample should be (or be assumed to be) representative of the population. Otherwise all statistical calculations are not valid

The p value (hypothesis testing) gives the probability of obtaining the result in the sample if chance alone were at work; it does not give the magnitude and direction of the difference

The confidence interval (estimation) indicates the estimated value in the population given one result in the sample; it gives the magnitude and direction of the difference

Page 55: Metlit-02 Populasi, Sampel & Variabel - Prof. dr. Sudigdo S, SpA(K).ppt

Concluding remarks

The p value alone tends to equate statistical significance with clinical importance

The CI avoids this confusion because it provides an estimate of the clinical values and also includes statistical significance

Whenever applicable, supply the CI, especially for the main results of the study

In critical appraisal of study results, the focus should be on the CI rather than on the p value.