SADC Course in Statistics Common complications when analysing survey data Module I3 Sessions 14 to 16

Mar 28, 2015


John Roberts

SADC Course in Statistics

Common complications when analysing survey data

Module I3 Sessions 14 to 16


Objectives of these three sessions

You should be able to:

• Explain why weights are sometimes needed in analysing survey data

• Produce weighted tables of counts and other statistics

• Suggest ways of adjusting analyses when there are missing values

• Analyse multiple response data

• Cope with data containing zero values


Contents

• Review
  • Why these sessions?

• There can be zero values
  • That may have to be analysed separately

• Multiple responses are common
  • and are an example of data at multiple levels

• Weights are often needed
  • Because observations represent different fractions of the population

• Missing values can distort an analysis
  • Simple options are explored


Review

• Describe data well
  • can use Excel, or a statistics package
  • we repeat briefly, with a statistics package

• Real data sets introduce surprises in analysis
  • That are not present with artificial training exercises
  • They need practice during training courses
  • Or they will be a problem to analyse later

• But some complications are predictable
  • And very common
  • Like multiple response questions, or the need for weights

• These are the complications we cover here


How to describe data well – (repeat slide)

• Look for oddities in the data
  • and be prepared to adapt the summaries that you calculate

• Study the data as tables and graphs

• Use frequencies and percentages
  • to summarize categorical variables

• Use averages and measures of variability
  • to summarize numeric variables

• Identify any structure in the data
  • and use it in producing your summaries


Look at the data (repeat slide)

The 2 types of variable are summarized in different ways


Analysis to meet objectives (repeat slide)

Simple objectives

Not so simple objectives


Meeting simple objectives (repeat slide)

These summaries were made with Instat – see practical 1


Answering more complicated objectives

AND explaining some of the variability

These were also with Instat


Practicals 1 and 2

• Practical 1
  • Reviews the construction of tables
  • Using a statistics package

• Particularly to look at percentages
  • Because percentages have to be understood clearly
  • to analyse multiple response data

• Practical 2
  • Looks at the analysis of data containing zeros
  • And shows that calculating averages needs to be done carefully, when there is structure in the data

• Both practicals give more practice
  • In the use of a statistics package


Zero values

• Zeros may be a simple part of the data
  • For example: List the assets – radio, bicycle, etc
  • Some may have zero assets

• Often however zero is a special value

• And should be analysed in a special way

• Examples:
  • How many livestock do you have?
  • What was your yield of maize?
  • How much rain fell yesterday?

• What is different here?


Example

Obs.  Value
1     3
2     8
3     0
4     0
5     5
6     6
7     0
8     7
9     0
10    1

Possible analysis

• Total = 30

• n = 10

• mean = 3

• median = 2

• etc

This does nothing special

The zeros are analysed with all the other values


Example continued

Obs.  Value
1     3
2     8
3     0
4     0
5     5
6     6
7     0
8     7
9     0
10    1

Alternative analysis

• Total = 30

• n = 10

• number of zeros = 4

• proportion of zeros = 0.4 (40%)

• n = 6 are non-zero

• mean = 5 of the non-zero values

• median = 5.5 of the non-zero values

• etc
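
The two analyses of the ten example values can be reproduced with a short script. This is only a sketch using Python's standard library (the course practicals use a statistics package); the function name `two_step_summary` is made up for illustration.

```python
from statistics import mean, median

# The ten observations from the example slide.
values = [3, 8, 0, 0, 5, 6, 0, 7, 0, 1]


def two_step_summary(data):
    """Report the zeros as a proportion, then summarise the non-zero values."""
    non_zero = [x for x in data if x != 0]
    n_zero = len(data) - len(non_zero)
    return {
        "n": len(data),
        "total": sum(data),
        "n_zero": n_zero,
        "prop_zero": n_zero / len(data),
        "n_non_zero": len(non_zero),
        # Guard against data that are all zeros.
        "mean_non_zero": mean(non_zero) if non_zero else None,
        "median_non_zero": median(non_zero) if non_zero else None,
    }


# One-step analysis: zeros treated like any other value.
overall_mean = mean(values)
overall_median = median(values)

# Two-step analysis: zeros separated out first.
summary = two_step_summary(values)

print("one-step:", overall_mean, overall_median)
print("two-step:", summary)
```

The one-step figures (mean 3, median 2) and the two-step figures (40% zeros, mean 5 and median 5.5 among the non-zero values) describe the same data; which to report depends on the objective.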


Which is better?

• As usual both are valid
  • It depends on the precise objective
  • And on the type of data

• Often the 2-step analysis is appropriate

• The data are split into 2
  • For example: Do you have cattle?
  • Then (if you do) how many do you have?

• Analysis
  • 60% of farmers owned cattle
  • Among the cattle owners, the mean was 5 per household


Multiple response questions?

From Tanzania agricultural survey

These are NOT multiple responses because the question asks for the main source

Ask for ALL sources used to make it multiple response


Multiple responses?

Not multiple response Multiple response

You may own more than 1 asset


Livestock survey examples


Analysis of multiple responses

• For individual species
  • it is easy
  • What % keep cattle?
  • What % keep sheep?

• Nothing special needed

• Looking at all species together
  • Needs thought
  • what % keep livestock
  • does livestock keeping depend on type of household
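
A small sketch may help show why the combined question needs thought. The records below are invented for illustration (they are not from the Tanzania survey), with one 0/1 indicator column per species, which is one possible layout for multiple response data. The point is that percentages are taken out of households, not out of responses, so the per-species percentages can add up to more than 100%.

```python
# Hypothetical records: one row per household, one 0/1 column per species.
households = [
    {"head": "male", "cattle": 1, "sheep": 0, "goats": 1},
    {"head": "female", "cattle": 0, "sheep": 1, "goats": 0},
    {"head": "male", "cattle": 0, "sheep": 0, "goats": 0},
    {"head": "female", "cattle": 0, "sheep": 0, "goats": 0},
    {"head": "male", "cattle": 1, "sheep": 1, "goats": 1},
]
SPECIES = ["cattle", "sheep", "goats"]


def pct(count, total):
    """Percentage of households, guarding against an empty group."""
    return 100 * count / total if total else float("nan")


def pct_keeping(rows, species):
    """% of households keeping one given species."""
    return pct(sum(r[species] for r in rows), len(rows))


def pct_keeping_any(rows):
    """% of households keeping at least one species."""
    keepers = sum(1 for r in rows if any(r[s] for s in SPECIES))
    return pct(keepers, len(rows))


# Individual species: nothing special needed.
by_species = {s: pct_keeping(households, s) for s in SPECIES}

# All species together: count households, not ticks.
any_livestock = pct_keeping_any(households)

# Does livestock keeping depend on type of household (here, sex of head)?
by_head = {}
for head in sorted({r["head"] for r in households}):
    group = [r for r in households if r["head"] == head]
    by_head[head] = pct_keeping_any(group)

print(by_species, any_livestock, by_head)
```

In this made-up sample each species is kept by 40% of households, yet 60% keep livestock of some kind, not 120%.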


Practicals 3 and 4

• Multiple response analysis
  • Using a simple example
  • With three different layouts of the data

• Then some real examples!
  • Using data from the Tanzania agriculture survey


Introducing weights

• Suppose a sample of 2 farmers

• Farmer  Yield
  A       1 t/ha
  B       2 t/ha

• What is the mean?

• Obviously it is (1 + 2)/2 = 1.5 t/ha!

• But…


Introducing weights - continued

• Suppose a sample of 2 farmers

• Farmer  Area    Yield   Production
  A       5 ha    1 t/ha  5 tons
  B       0.5 ha  2 t/ha  1 ton

• Now what is the mean?

• It could still be (1 + 2)/2 = 1.5 t/ha

• Or it could be (5 + 1)/5.5 = 1.1 t/ha


But which is right?

• They are both right, but they answer different questions

• Take food security
  • Are you interested in the farmer
  • Or the production
  • Or both

• If the farmer is the unit of interest
  • Then there are 2 farmers
  • The mean is 1.5

• If the area is the unit of interest
  • Then there are 5.5 ha
  • And Farmer A is 10 times as important as farmer B
  • So a weighted mean is produced


The weighted mean

• So if the area is of interest – then with

• Farmer  Area    Yield
  A       5 ha    1 t/ha
  B       0.5 ha  2 t/ha

• Weight each yield by the area it represents

• mean = (1*5 + 2*0.5)/5.5 = 1.1

• Here the areas are the “weights”

• They are used when different observations represent different proportions of the “population”
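
The two-farmer calculation can be written as a small general function. This is a minimal sketch in plain Python; the name `weighted_mean` is illustrative and not taken from any package used in the practicals.

```python
def weighted_mean(values, weights):
    """Mean of values in which each value counts in proportion to its weight."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


yields = [1.0, 2.0]  # t/ha for farmers A and B
areas = [5.0, 0.5]   # ha for farmers A and B, used as the weights

# Farmer as the unit of interest: every farmer counts equally.
simple_mean = sum(yields) / len(yields)

# Area as the unit of interest: every hectare counts equally.
area_weighted_mean = weighted_mean(yields, areas)

print(simple_mean)                   # 1.5 t/ha
print(round(area_weighted_mean, 1))  # 1.1 t/ha
```

Giving both farmers the same weight reproduces the simple mean, which is a quick check that the function behaves as expected.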


Weights in the Tanzania agriculture survey

The number of people in the population represented by each observation

It was roughly a 1% sample, so the weights are about 100

The technical guide explains the calculations


Practical 5

• Weights using a statistics package

• First the rice survey
  • Weighting by the size of field

• Then the Tanzania agriculture survey
  • Investigate ownership of radios
  • By sex of household head
  • And then by type of farming household


Possession of radio by type of farming

Unweighted analysis

The observed numbers and percentages in the sample

Look at livestock – but numbers small


Possession of radio by type of farming

Weighted analysis

The estimated numbers and percentages in the region of Tanzania

Look at livestock now – what do you conclude?


Why such a large change with weighting?

Examine the weights for these 2 groups

Average weight = 60 (the 10 without a radio); average weight = 20 (the 42 with a radio)

So estimated % with radio = 100*(42*20)/(10*60+42*20) ≈ 58%
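
The effect of the weights on this percentage can be checked directly. The sketch below reads the formula above as 10 sampled households without a radio at an average weight of 60, and 42 with a radio at an average weight of 20. That reading is inferred from the formula, and a package using the individual household weights rather than the group averages could give a slightly different figure.

```python
# Sample counts and average weights, as read from the formula on the slide.
n_without_radio, avg_weight_without = 10, 60
n_with_radio, avg_weight_with = 42, 20

# Unweighted: percentage of the sampled households that have a radio.
unweighted_pct = 100 * n_with_radio / (n_with_radio + n_without_radio)

# Weighted: each household counts for the number it represents.
est_with_radio = n_with_radio * avg_weight_with
est_without_radio = n_without_radio * avg_weight_without
weighted_pct = 100 * est_with_radio / (est_with_radio + est_without_radio)

print(round(unweighted_pct, 1))  # about 80.8
print(round(weighted_pct, 1))    # about 58.3
```

The percentage drops because the few households without a radio each represent three times as many households in the population as those with one.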


And always take care with small numbers

Large sample overall

But still a small sample of livestock-only farmers


Missing values

Survey of countries on principles of official statistics

Non-response is one form of missing value

Here 82 of the 194 countries did not respond


More missing values

This “non-response” is missing responses to questions within the 112 who responded overall


Practical 6: Non-response and missing values

• The data on the principles of official statistics
  • are re-analysed in a new way

• Which adjusts for the missing values
  • The countries who did not respond

• Then the missing values are considered
  • Within the responses that were available


Coping with missing values

• They should be stated in the reporting
  • Which they were in the report on the principles

• Can they be ignored?
  • Often the missing values are simply ignored
  • The analysis of the principles ignores them

• If their absence is uninformative
  • Then ignoring them is usually OK

• Otherwise you could look to compensate
  • We show one way here
  • By using a weighted analysis

• The main message is to think carefully
  • Don’t be quick to let the computer “impute” values


Non-response in the Principles survey

• The adjustment may present a fairer picture
  • Of the 194 countries

• But it adds a worrying component
  • Would it be better to present the results separately
  • For each type of country?

• And the 15 countries from the “Least Developed” group
  • Have a large weight
  • To compensate for those that are missing
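
One common way to make such an adjustment is to give each responding country a weight equal to the number of countries in its group divided by the number that responded. The sketch below illustrates the idea only. From the slides it takes just the totals (194 countries, 112 respondents) and the 15 “Least Developed” respondents; the split into two groups, the group sizes and the counts of "yes" answers are invented, and the practical may calculate the weights differently.

```python
# Only the totals (194, 112) and the 15 respondents come from the slides.
# Group sizes and "yes" counts are hypothetical, for illustration only.
groups = {
    "Least Developed": {"countries": 50, "responded": 15, "yes": 6},
    "Other": {"countries": 144, "responded": 97, "yes": 80},
}


def nonresponse_weight(group):
    """Number of countries represented by each responding country."""
    return group["countries"] / group["responded"]


weights = {name: nonresponse_weight(g) for name, g in groups.items()}

total_countries = sum(g["countries"] for g in groups.values())
total_responded = sum(g["responded"] for g in groups.values())
total_yes = sum(g["yes"] for g in groups.values())

# Ignoring non-response: percentage of respondents answering "yes".
unadjusted_pct = 100 * total_yes / total_responded

# Adjusting: each "yes" counts for the countries its respondent represents.
weighted_yes = sum(weights[name] * g["yes"] for name, g in groups.items())
adjusted_pct = 100 * weighted_yes / total_countries

for name, w in weights.items():
    print(f"{name}: weight {w:.2f}")
print(f"unadjusted {unadjusted_pct:.1f}%, adjusted {adjusted_pct:.1f}%")
```

With these invented figures each “Least Developed” respondent stands for more than three countries, so a handful of answers has a large influence on the adjusted result. That is the worrying component noted above.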


Missing values within the data

• There are also a few missing values
  • For example Principle 4 has only 11 responses

• Here there is much more information
  • From the other responses from this country

• Possible actions are:

1. Do nothing
  • That was how the results were reported
  • There are so few missing
  • Any adjustment will make very little difference

2. Change the weights
  • For the questions with missing values

3. Impute missing values
  • Simply, or using special software


Can you now?

• Cope with data containing zero values

• Explain why weights are sometimes needed in analysing survey data

• Produce weighted tables of counts and other statistics

• Suggest ways of adjusting analyses when there are missing values

• Analyse multiple response data


The next sessions are to practise in groups all you have covered here so far