Scot Exec Course Nov/Dec 04 Ambitious title? Confidence intervals, design effects and significance tests for surveys. How to calculate sample numbers when planning a survey.

Dec 24, 2015


Ralph Potter

Scot Exec Course Nov/Dec 04

Ambitious title?

Confidence intervals, design effects and significance tests for surveys.

How to calculate sample numbers when planning a survey.


Summary

• Statistical inference

– Design based

– Model based

• Confidence intervals and hypothesis tests - general

• Their modification for survey designs

– Design effects and design factors

• Calculation of sample numbers for studies

– Their modification for complex surveys


Statistical inference

• Making inferences about some aspect of the population, using observation to draw conclusions about the population as it is now, or about how it will evolve in future

• Data are what we are given

• Inference allows us to turn them into information


Elements needed for statistical inference – design based

• Want to learn something about a population

• You have

– A model of how the sample was selected from the population.

– Some data obtained from the sample

– Knowledge of how to estimate!

• E.g. obtain data on the income of 10,000 people from a population of 5 million.

• Need inference to estimate the income distribution of the whole 5 million and to know how close this is to the population value


Elements needed for statistical inference – model based

• You have

– A model that could have generated the data for your population, along with ideas about what current and future populations this might generalise to.

– Some data that can be assumed to be generated by this model.

– Knowledge of how to carry out the inference!

• E.g. obtain data on the income of 10,000 people from a population, and assume that the income distribution follows some mathematical distribution

• Need inference about the assumed model for the income distribution of the whole 5 million and how close your estimate will be to the true value


How do design and model based inferences differ?

• Conceptually poles apart

• In practice they give the same answers

• Except when numbers are small

• Or when a large proportion of the population has been sampled

• But it's good to think about what you are doing and decide which type fits your problem


Next set of results

• Apply to a simple unstructured sample

– No clustering

– No stratification

– No weighting

• Taken from a population with replacement (not a problem in model based inference)

• Exactly the same large-sample results apply for model-based and design-based inferences


Mean of 9 x's

[Figure: distribution of $x$, with st. dev. $\sigma$, and of the mean $\bar{x}$, with st. dev. $\sigma/\sqrt{n}$]


Standard error of the mean

$\bar{x}$ is approximately normally distributed with s.d. $\sigma/\sqrt{n}$.

The data are fixed, so this tells us where $\mu$ is likely to be.

$\sigma/\sqrt{n}$ is called the standard error of the sample mean.

Sometimes written s.e.mean - it measures the expected distance of the “true” mean from the mean of the observed sample.

A $100(1-\alpha)\%$ confidence interval for $\mu$ from the normal distribution is $\bar{x} \pm z_{\alpha/2} \times \text{s.e.mean}$.
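
As a rough illustration of the standard error and the normal confidence interval for a mean, here is a minimal Python sketch. The function name and the income values are made up for the example, and the sample standard deviation is assumed to stand in for the population s.d. The slide's result is a large-sample one, so a sample as small as this one is only used to keep the example readable.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(data, level=0.95):
    """Large-sample normal confidence interval for the mean of a simple sample."""
    n = len(data)
    x_bar = mean(data)
    # Standard error of the mean: sample s.d. divided by sqrt(n).
    se_mean = stdev(data) / sqrt(n)
    # Z value that leaves (1 - level) / 2 in each tail of the normal distribution.
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return x_bar - z * se_mean, x_bar + z * se_mean


# Hypothetical incomes (in thousands), invented for illustration only.
incomes = [18.2, 25.1, 21.7, 30.4, 19.9, 27.3, 22.8, 24.6, 20.5]
low, high = mean_confidence_interval(incomes)
print(f"mean = {mean(incomes):.2f}, 95% c.i. = ({low:.2f}, {high:.2f})")
```

The interval is symmetric about the sample mean, and its half-width is the Z value times the standard error.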


Values of Z for confidence intervals

• 95% c.i. gives Z = 1.96

• 99% c.i. gives Z = 2.58

• 68% c.i. gives Z = 1

• 90% c.i. gives Z = 1.64
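
These Z values can be reproduced from the standard normal distribution. The sketch below uses Python's standard library; the helper name `z_for_level` is an assumption made for this example.

```python
from statistics import NormalDist


def z_for_level(level):
    """Two-sided Z value for a confidence level given as a fraction (e.g. 0.95)."""
    # Put half of the excluded probability in each tail.
    return NormalDist().inv_cdf(1 - (1 - level) / 2)


for level in (0.68, 0.90, 0.95, 0.99):
    print(f"{level:.0%} c.i.: Z = {z_for_level(level):.2f}")
```

The 68% value comes out very slightly below 1, which the slide rounds to Z = 1.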


We can use it for proportions too

• Want to estimate a proportion $\pi$ - e.g. the proportion of 20 year olds who use the internet

– Then $\hat{\pi} = r/n$ estimates $\pi$

– with standard error $\sqrt{\pi(1-\pi)/n}$

– to use this formula we replace $\pi$ with $\hat{\pi}$

• A rule of thumb is that this approximation is OK if the smaller of r and (n-r) is >5.
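
The proportion calculation can be sketched in Python as follows. The function name and the counts (160 internet users out of 200 respondents) are invented for illustration, and the check on the rule of thumb is one possible way to enforce it rather than anything prescribed by the course.

```python
from math import sqrt
from statistics import NormalDist


def proportion_confidence_interval(r, n, level=0.95):
    """Normal-approximation c.i. for a proportion from r 'successes' in a simple sample of n."""
    # Rule of thumb from the slide: the smaller of r and (n - r) should exceed 5.
    if min(r, n - r) <= 5:
        raise ValueError("normal approximation not recommended: min(r, n - r) <= 5")
    p_hat = r / n
    # Standard error with the unknown proportion replaced by its estimate.
    se = sqrt(p_hat * (1 - p_hat) / n)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return p_hat, se, (p_hat - z * se, p_hat + z * se)


# Hypothetical data: 160 of 200 sampled 20 year olds use the internet.
p_hat, se, (low, high) = proportion_confidence_interval(r=160, n=200)
print(f"estimate = {p_hat:.3f}, s.e. = {se:.4f}, 95% c.i. = ({low:.3f}, {high:.3f})")
```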


Are these formulae good enough?

• Yes – unless your survey is too small to be any use

• They extend easily to differences in means and proportions

• Similar approximate results apply to regression models and logistic regressions

• BUT – they only apply to simple samples
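
To make the extension to differences concrete, the usual large-sample formulae are written out below. They assume two independent simple samples of sizes $n_1$ and $n_2$, which is an assumption added here; the subscripted symbols are notation introduced for this illustration.

```latex
\[
\text{s.e.}(\bar{x}_1 - \bar{x}_2) = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}},
\qquad
\text{s.e.}(\hat{\pi}_1 - \hat{\pi}_2) =
  \sqrt{\frac{\hat{\pi}_1(1-\hat{\pi}_1)}{n_1} + \frac{\hat{\pi}_2(1-\hat{\pi}_2)}{n_2}}
\]
\[
\text{c.i. for the difference in proportions: }
(\hat{\pi}_1 - \hat{\pi}_2) \pm z_{\alpha/2}\,\text{s.e.}(\hat{\pi}_1 - \hat{\pi}_2)
\]
```

The variances add because the two samples are taken to be independent. As with the single-sample case, these hold only for simple samples.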


But my data are more complicated than this

And nobody will let me put standard errors or confidence intervals in my report

• A goal of a good statistical report is that it should not include any tables or graphs where what seems to be information is just the result of chance variation (noise).

– Set out your task in terms of an outcome predicted from other factors

– Carry out a set of regression predictions

– Base the tables to go in the report on the regression models that are found to be more than chance effects


Inferences for complex surveys

• The usual formulae and regression models don’t hold

• Most surveys use weighting

• And allowances for clustering and stratification have to be made

• Software that modifies the results we have just discussed and calculates them correctly for complex surveys is now available


Two main methods are used

• Taylor linearisation – theory of this all worked out in the 1940s and 50s

• Replication methods, jackknives and bootstraps – 1960s and 1970s

• Only now is software readily available to do things properly
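
To give a feel for what a replication method does, here is a minimal delete-one jackknife in Python for a simple unstructured sample. This is a sketch only: in a complex survey the replicates are normally formed by dropping whole PSUs within strata rather than single observations, and the function name and data are invented for the example.

```python
from math import sqrt
from statistics import mean, stdev


def jackknife_se(data, estimator):
    """Delete-one jackknife standard error of `estimator` for a simple sample."""
    n = len(data)
    # Re-compute the estimate n times, leaving out one observation each time.
    replicates = [estimator(data[:i] + data[i + 1:]) for i in range(n)]
    centre = mean(replicates)
    # Jackknife variance: (n - 1) / n times the sum of squared deviations.
    variance = (n - 1) / n * sum((t - centre) ** 2 for t in replicates)
    return sqrt(variance)


# Hypothetical measurements, invented for illustration only.
values = [4.1, 5.3, 3.8, 6.0, 5.5, 4.7, 5.1, 4.4]
se_jack = jackknife_se(values, mean)
se_formula = stdev(values) / sqrt(len(values))
print(f"jackknife s.e. = {se_jack:.4f}, formula s.e. = {se_formula:.4f}")
```

For the sample mean the jackknife reproduces the usual formula exactly. Its value is that the same recipe also works for estimators with no simple formula.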


Getting by without the correct software

• Carry out an analysis using an ordinary computer package (e.g. the simple procedures in SAS or SPSS)

• But use a weight in the analysis to get results that will correct the bias in the estimates

• Your weighted analysis will get you the wrong standard errors and wrong tests, but the estimates will be about right.

• Use design effect tables to get some idea of the standard errors
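
The weighted estimate itself is straightforward to compute. The sketch below shows a weighted mean in Python. The function name, values and weights are invented for illustration, and the code deliberately returns only the estimate, since a naive standard error from a weighted analysis would be wrong for the reasons given above.

```python
def weighted_mean(values, weights):
    """Weighted estimate of a mean: sum of weight * value over sum of weights."""
    total_weight = sum(weights)
    return sum(w * v for v, w in zip(values, weights)) / total_weight


# Hypothetical data: the third respondent represents twice as many people.
values = [10.0, 20.0, 30.0]
weights = [1.0, 1.0, 2.0]
estimate = weighted_mean(values, weights)
print(f"weighted mean = {estimate}, unweighted mean = {sum(values) / len(values)}")
```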


Using the correct software

• Is not difficult – PEAS web site explains how

• Routines are available in SAS, SPSS, STATA and R

• But it does mean that you need to get details of the survey design

• E.g. PSU, stratification variables need to be available

• Easier for you than for me


Getting by without the correct software

• Use a table of design effects (DE)

• Often published with the surveys

• To get a s.e. from a complex survey

– Calculate the design factor (DF) as the square root of the DE

– Multiply the s.e. from a simple analysis by DF

• For most household surveys DEs vary from about 0.8 to 2 or 3.

• This is a rough and ready method and will only work if weights are not too far from 1.0
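
The adjustment is a one-line calculation, sketched here in Python. The function name and the numbers (a simple-analysis s.e. of 0.02 and a design effect of 2.25) are made up for the example.

```python
from math import sqrt


def adjusted_standard_error(simple_se, design_effect):
    """Approximate complex-survey s.e.: simple-analysis s.e. times the design factor."""
    design_factor = sqrt(design_effect)  # DF is the square root of DE
    return simple_se * design_factor


# Hypothetical values: s.e. of 0.02 from a simple analysis, published DE of 2.25.
complex_se = adjusted_standard_error(simple_se=0.02, design_effect=2.25)
print(f"design factor = {sqrt(2.25)}, adjusted s.e. = {complex_se:.3f}")
```

Confidence intervals widen by the same factor, since their half-width is the Z value times the s.e.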


Disadvantages of this

• DEs are not constant for a survey

• They are also different (usually lower) when subgroups of a survey are selected

• They may also be lower in complicated models, like regressions where it is also very hard to know how to apply them.

• Methods are approximate


Uses of design effects (DEs)

• They tell you about how well your survey design has worked

• Most survey software produces estimates of design effects with its output

• A design effect of 2 means your effective sample size is halved

• It is good to have such estimates when planning sample numbers for surveys.
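
The effective sample size is the achieved sample size divided by the design effect. A small Python sketch, with a hypothetical function name and sample size:

```python
def effective_sample_size(n, design_effect):
    """Size of the simple sample that would give the same precision as the complex one."""
    return n / design_effect


# Hypothetical survey of 10,000 respondents with a design effect of 2.
n_eff = effective_sample_size(n=10_000, design_effect=2.0)
print(f"effective sample size = {n_eff:.0f}")
```

A design effect below 1, which can happen with good stratification, gives an effective sample size larger than the number actually interviewed.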


Sample numbers for planning studies

• Think ahead about the sort of comparisons you might want to make

• Are you interested in time trends?

• Or in comparisons between certain groups?

– If so, what proportions in each?

• Do you want to estimate something (e.g. % of children in poverty)?


Use the spreadsheet sample numbers.xls


To modify these for surveys

• Simply multiply your answer by an estimate of the design effect

• Or try to do the next survey better by getting a smaller design effect
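
As an illustration of the multiplication step, the Python sketch below computes the sample number needed to estimate a proportion to within a chosen margin of error and then inflates it by a design effect. The contents of the course spreadsheet are not reproduced here, so the simple-sample formula used (n = Z² p(1-p) / margin²) is an assumption based on the standard normal-approximation result. The function name and the input values are also invented.

```python
from math import ceil
from statistics import NormalDist


def sample_number_for_proportion(p, margin, level=0.95, design_effect=1.0):
    """Sample number needed so the c.i. for a proportion is roughly p +/- margin."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    # Number needed for a simple sample, from margin = z * sqrt(p * (1 - p) / n).
    n_simple = z ** 2 * p * (1 - p) / margin ** 2
    # For a complex survey, multiply by an estimate of the design effect.
    return ceil(n_simple * design_effect)


# Hypothetical planning values: expected proportion 20%, margin of 3 percentage points.
n_simple = sample_number_for_proportion(p=0.20, margin=0.03)
n_complex = sample_number_for_proportion(p=0.20, margin=0.03, design_effect=1.5)
print(f"simple sample: {n_simple}, with design effect 1.5: {n_complex}")
```

The design effect used at the planning stage is itself only an estimate, usually borrowed from an earlier survey with a similar design.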