SADC Course in Statistics Statistical concepts Module B2, Session3

Mar 28, 2015


Ethan Shepherd

SADC Course in Statistics

Statistical concepts

Module B2, Session3


Objectives

At the end of this session students will be able to:

• Define statistics

• Enter simple datasets once the data entry form is set up

• Recognise the type of each variable in a dataset

• Know some ways to summarise data of each main type

• Explain how statistical investigations deal with variability

• Differentiate between descriptive and inferential statistics


Activities

1. This introduction

2. Entry of the data from the CAST survey

3. Discussion/presentation on statistical concepts
   1. Using the data entered
   2. And other case studies

4. The statistical glossary
   1. For when you need to remind yourself about terminology


What is statistics - 1?

From RSS webpage:

1. Statistics changes numbers into information.

2. Statistics is the art and science of deciding:
• what are the appropriate data to collect,
• deciding how to collect them efficiently
• and then using them to give information,
• answer questions,
• draw inferences
• and make decisions.


What is statistics - 2?

3. Statistics is making decisions when there is uncertainty.

• We have to make decisions all the time,
• in everyday life,
• and as part of our jobs.
• Statistics helps us make better decisions.

4. Statistics is NOT just collecting a lot of numbers
• It is collecting numbers for a purpose


What is statistics - 3?

From Wikipedia:

5. Statistics is a mathematical science pertaining to the
• collection,
• analysis,
• interpretation or explanation
• and presentation
of data.

6. Statistics are used for making informed decisions
• and misused for other reasons
in all areas of business and government


What is statistics - 4?

From the book “Statistics: A guide to the unknown”:

7. Statistics is the science of learning from data.

Question 1 in the practical sheet

• From these 7 definitions – in the practical sheet
• either choose the one you think is most appropriate
• or make your own

a) A one-line definition

b) A longer definition


Data checking and entry – Question 2

• What can we learn from the data you collected?
• Work in pairs or small groups
• First check the data from the CAST survey
• Check each other's, not your own

• Is it legible?
• Can it be entered into the computer?
• Is the response to the open-ended question clear?
• Can the text be simplified?
• If there are many points, ask the respondent to state which are the most important 2 or 3.

• Brief notes (as a report) to be made in the exercise sheet
• to establish the data are ready for entry


Data entry into Excel

Just type the number. The label is automatic.


Data entry and checking – Question 3

• The data are now entered

• This can be a class exercise
• on a single computer

• Data is entered by someone else
• for each respondent (never by themselves)

• Then it must be checked
• read it out
• check by reading back

• Put the record number from the Excel form
• on your original sheet
• or add your names as another field in the Excel sheet

• Why might it be better to just have a number?


Data entry and checking

• You should now have completed question 3
• On the practical sheet

• How long do you estimate
• For 1000 records to be entered?


Once the data are entered

• Remember: “Statistics is the science of learning from data.”

• To learn as much as possible
• we must have confidence in the data
• so they must be entered and checked well

• This is what we have done in the groups

• Now the data are ready for the analysis

• Before that, look at some other data sets
• Look for the common points
• That apply to all the sets
• and look for differences


Types of data - 1

• The analysis depends on the type of data

• What are the types here?

• For questions 1 to 6
• Your answer was one of 5 categories
• e.g. 1: Strongly agree, 2: Agree, … 5: Strongly disagree
• These categories have an ordering
• from strongly agree to strongly disagree

• This type of data is called
• categorical
• or factor
• or qualitative

• With the ordering, they are sometimes called
• ordered categorical data
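As a rough illustration of ordered categorical data, the Python sketch below stores answers as the codes 1 to 5 and reports the counts in category order. Only the labels for codes 1, 2 and 5 appear on the slide, so codes 3 and 4 get a neutral placeholder rather than a guessed label. The function names and the five example answers are invented for this sketch.

```python
from collections import Counter

# Labels quoted on the slide; the labels for codes 3 and 4 are not given
# there, so they are deliberately left out rather than guessed.
LABELS = {1: "Strongly agree", 2: "Agree", 5: "Strongly disagree"}

def label(code):
    """Return the text label for a code, or a neutral placeholder."""
    return LABELS.get(code, f"Category {code}")

def tabulate(codes):
    """Count the answers and list them in category order (1 to 5)."""
    counts = Counter(codes)
    return [(label(c), counts.get(c, 0)) for c in range(1, 6)]

# Hypothetical answers from five students to one question (not real data).
answers = [2, 1, 5, 2, 1]
table = tabulate(answers)
for name, count in table:
    print(f"{name}: {count}")
```

Keeping the numeric codes, rather than only the text labels, is what preserves the ordering when the table is printed.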


Types of data - 2

• The last question in the survey
• was a sentence or two that was written

• This is also an example of qualitative data

• It is an open-ended response

• These data can be reported – and reporting the sentences can be very useful

• So it is good if they are entered as they stand

• To summarise perhaps the responses can be coded?


Coding open-ended questions – Question 4

• This is question 4 in the practical sheet

• Looking at the responses in your groups
• Could you code them?
• What different codes would you have?
• How would you enter the codes?

• Might you lose anything by coding?

• For a quick analysis
• Could you enter the complete texts
• And analyse the other columns
• And then code later?

• What might you lose by coding?


Coding and entering open-ended data

• Discuss the suggestions for the codes.

• If some points are made by many students then prepare a summary,
• how many as a frequency
• and as a percentage

• With the small number of responses
• there is no need to enter them into the computer

• But discuss how it could be done

• It is an example of a multiple response question
• because respondents may give no points
• or more than one point

• If you ask for the most important observation
• then it becomes a single qualitative response


Other data sets

• Zambia rainfall data

• Tanzania agriculture survey

• Look for the layout of the data
• is it the same as for the simple CAST survey?

• Look for the types of data

• Which are the qualitative variables?
• are they ordered?

• Which are the quantitative variables?
• which of them are discrete?
• and which are continuous?
• have any been coded to become qualitative?


Annual climatic data from Zambia


Survey data from Tanzania - 1


Survey data from Tanzania - 2


Discussion – Question 5

• The layout of the data
• Was always the same!
• In a rectangle

• Each row is a record
• There are as many records (rows of data)
• as there were respondents, or students, or units

• Each column is a variable
• Variables can be qualitative
• or they can be quantitative

• Discuss which type they are
• For each data set
• complete the tables in the practical sheet, question 5
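The rectangular layout can be pictured in code as a list of records that all share the same variables. The Python sketch below is only an illustration. The variable names and the three records are invented and do not come from any of the course data sets.

```python
# Hypothetical records; the variable names and values are invented.
records = [
    {"id": 1, "sex": "F", "hhsize": 4},
    {"id": 2, "sex": "M", "hhsize": 6},
    {"id": 3, "sex": "M", "hhsize": 3},
]

def column(records, name):
    """Pull out one variable (column) across all records (rows)."""
    return [row[name] for row in records]

def is_rectangular(records):
    """True when every record has the same set of variables."""
    return all(row.keys() == records[0].keys() for row in records)

print(len(records), "records")
print(column(records, "hhsize"))
```

Here `sex` would be a qualitative variable and `hhsize` a quantitative one, so the two columns call for different summaries.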


Qualitative variables

• They are categorical

• They may be nominal (which implies there is no ordering)

• Give some examples from the Tanzania survey

• They may be ordered – as in the CAST survey

• Give an ordered example from the Tanzania survey


Examples of analysis – Tanzania survey – Question 6

• There are 3223 records,
• but just take the 18 you can see in the figure

• Count the values for Q0123 – head of household
• There were 6 Females and 12 Males
• So 2/3 of the 18 households had a male head
• That’s about 70%
• but percentages are a bit misleading with so few numbers

• Now you give a similar summary for Q021
• type of agricultural household

• And also Q3464
• how often did the household have food problems
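The count-and-percentage summary for Q0123 can be reproduced with a few lines of Python. This is a sketch rather than the course's method (the course uses Excel from Session 4). The list is rebuilt from the counts quoted above, and the order of the values and the function name are assumptions.

```python
from collections import Counter

# The 18 visible values of Q0123, rebuilt from the counts on the slide
# (6 Females, 12 Males); the order of the values is an assumption.
head_of_household = ["Female"] * 6 + ["Male"] * 12

def frequency_table(values):
    """Return {category: (count, percentage)} for a qualitative variable."""
    n = len(values)
    return {cat: (k, 100 * k / n) for cat, k in Counter(values).items()}

table = frequency_table(head_of_household)
for cat, (count, pct) in table.items():
    print(f"{cat}: {count} of {len(head_of_household)} ({pct:.0f}%)")
```

Printing the count next to the percentage keeps the small base of 18 households in view.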


Add a simple chart

• A simple chart can also be sketched

• Here is one by Excel

• But a sketch can be “by hand”
• Excel will be used for these tasks from Session 4


Examples of analysis – CAST survey – Question 7

• Do a similar analysis of the CAST survey

• To make it quick
• each group could initially process just one question
• then report the results to the class

• Include a hand drawn chart
• Sketch a simple bar chart
• and include the numbers on the chart
• as shown earlier


Quantitative variables – Question 8

• They may be discrete (whole numbers)

• Give examples from the climatic data
• And the Tanzania survey

• They may be (conceptually) continuous
• Give examples from the data sets

• Also they may be coded into (ordered) categories
• Give an example from the Tanzania survey


Examples of analysis – Tanzania survey

• An analysis of the 18 values in Q3462 – the number of times meat was eaten last week

• minimum = 0
• maximum = 5
• adding the values: total = 31,
• so the mean = 31/18, about 1.7 times per week

• Note: the mean does not have to be an integer
• just because the individual values are whole numbers

• Repeat this analysis
• for Q3463 – times fish eaten last week
• and HHsize
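The same four summaries can be computed for any column of whole numbers. In the Python sketch below, the function name and the five example values are invented, and the example values are not the survey's values for Q3462. The last line only checks the mean quoted above from its own total and count.

```python
def summarise(values):
    """Minimum, maximum, total and mean of a quantitative variable."""
    total = sum(values)
    return {
        "min": min(values),
        "max": max(values),
        "total": total,
        "mean": total / len(values),
    }

# Made-up values, NOT the survey's 18 values for Q3462.
example = [0, 1, 2, 2, 5]
result = summarise(example)
print(result)

# The mean quoted on the slide, from its own total (31) and count (18).
slide_mean = round(31 / 18, 1)
print(slide_mean)
```

The mean is returned as a decimal number even though every input is a whole number, which matches the note above.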


Data analysis

• As the layout of the data is always the same
• Once you know how to analyse one data set
• You will have the principles to analyse them all
• And we have just done one analysis!

• You have seen that
• The appropriate analysis depends on the type of data

• So what are the principles
• of analysing (summarising) data
• of the different types?


The methods of analysis

• How many?
• are questions for qualitative variables
• for example the CAST survey, the Tanzania survey

• You used summaries
• Like counts, or proportions or percentages

• How large?
• How variable?
• are questions for quantitative variables
• for example the climatic data or the Tanzania survey

• We used summaries
• Like averages, extremes and measures of spread


A toolkit for analysis

• Different types of graph are also used

• Qualitative data
• “how many”

• Quantitative data
• how large
• how variable
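The idea that the summary depends on the type of variable can be sketched as a small Python function. The function name, the two type names passed as `kind` and the example values are assumptions for this illustration. The standard deviation is used as one possible measure of spread, since the slides do not name a particular one.

```python
from collections import Counter
from statistics import mean, stdev

def describe(values, kind):
    """Pick a summary according to the type of variable.

    kind is "qualitative" or "quantitative" (names assumed for this sketch).
    """
    if kind == "qualitative":
        # "How many?" -> counts per category
        return dict(Counter(values))
    if kind == "quantitative":
        # "How large?" and "How variable?"
        return {
            "mean": mean(values),
            "min": min(values),
            "max": max(values),
            "sd": stdev(values),
        }
    raise ValueError(f"unknown kind: {kind}")

# Hypothetical values for illustration.
qual = describe(["yes", "no", "yes"], "qualitative")
quant = describe([2, 4, 6], "quantitative")
print(qual)
print(quant)
```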


Statistics and variation

• In the CAST survey - why not just ask one student?

• In the climatic data - why not just use one year?

• In the agriculture survey - why not just use one household?

• Because there is variation between the responses

• Remember this definition?
• “Statistics is making decisions when there is uncertainty.”


Variation is everywhere!

• In the book “Statistics: A guide to the unknown”

• “Variation is everywhere.
• Individuals vary
• Repeated measurements on the same individual vary

• The science of statistics
• provides tools for dealing with variation”

• So statistics is concerned with making sense from data, when there is variation


Fighting the curse of variation

• To do good statistics you must
• tame variation
• fight the curse of variation

• You have 2 main strategies for overcoming variation

• 1. Take enough observations
• In the Tanzania survey there were 3223 households just from this one region

• 2. Measure characteristics that explain variation
• Variation itself is not necessarily the problem
• Variation you do not understand is the problem
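To see why taking enough observations helps, the Python sketch below simulates many samples and measures how much their means differ from one sample to the next. Everything in it is an assumption for illustration: the function name, the normal distribution, its mean of 50 and standard deviation of 10, and the sample sizes. None of it comes from the course data.

```python
import random
from statistics import stdev

def spread_of_sample_means(n, repeats=200, seed=1):
    """Standard deviation of the means of `repeats` samples of size n.

    Observations are simulated from a normal distribution with mean 50
    and standard deviation 10 (arbitrary choices for this sketch).
    """
    rng = random.Random(seed)
    means = [
        sum(rng.gauss(50, 10) for _ in range(n)) / n
        for _ in range(repeats)
    ]
    return stdev(means)

small = spread_of_sample_means(10)
large = spread_of_sample_means(1000)
print(f"n=10:   sample means vary with sd {small:.2f}")
print(f"n=1000: sample means vary with sd {large:.2f}")
```

The means of the larger samples should vary much less, so a summary based on many observations is more stable than one based on a few.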


An example: explaining variation

• Take the CAST survey

• Add a new record for an imaginary student
• Make it VERY DIFFERENT to the existing records
• So if most students were positive about CAST
• Then make this record very negative, etc

• You have added variation

• Now what could you (should you) have measured
• to explain this variation?


What you could have measured

• This little survey only asked about CAST

• It did not ask about you, e.g.
• male/female
• experience
• age
• computer access
• etc

• These measurements could help
• to understand the difference with this new student

• The Tanzania survey also asked about
• Education
• Possessions, etc

• Why – to be able to understand/explain variation


Analysis and variation together

• For statistical analysis you have:
• summarised columns of data
• i.e. summarised individual variables

• You did this for qualitative and quantitative variables

• To fight the curse of variation
• You take measurements
• So you add to the rows of data

• That helps you to explain the variation

• That’s statistics for you!
• You analyse the columns, i.e. the variables
• And you understand variability by looking at the rows


Types of statistics

• Wikipedia says roughly:

• Statistical methods can be used to summarize
• or describe a collection of data;
• this is called descriptive statistics.

• In addition, patterns in the data may be modelled
• and then used to draw inferences about the process or population being studied;
• this is called inferential statistics.

• Both descriptive and inferential statistics
• comprise applied statistics.


Descriptive and inferential statistics

• We have just done descriptive statistics

• We will only do descriptive statistics in this module

• The sample in the Tanzania agricultural survey
• was 3223 households

• That’s just under 1% of the households in the region
• See the column called WT – with values like 137
• So each observation “represents” 137 households

• But with such a large sample
• The inferences for the whole region
• Will be quite precise

• So most of what we need now is descriptive tools
• In the Higher level modules
• we add ideas of inferential statistics
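The link between the weight of 137 and the "just under 1%" figure can be checked with simple arithmetic, sketched here in Python. It assumes that every record carries the same weight of 137. The slide only says the WT column has "values like 137", so the result is a rough order of magnitude and not the survey's actual population total.

```python
SAMPLE_SIZE = 3223      # households in the sample (from the slide)
TYPICAL_WEIGHT = 137    # a typical value of WT (from the slide)

# Assumption: every record carries the same weight. The slide only says
# the weights have "values like 137", so the real total will differ.
estimated_households = SAMPLE_SIZE * TYPICAL_WEIGHT
sampling_fraction = SAMPLE_SIZE / estimated_households

print(estimated_households)
print(f"{100 * sampling_fraction:.2f}%")
```

Under this assumption the sample is about 1 household in every 137, which is consistent with the figure of just under 1%.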


Glossary of statistical terms

• Each subject becomes easier
• when you understand the terms

• A glossary is supplied
• Called the SSC Statistical Glossary

• It explains most of the terms
• For the 3 levels of this course

• So some terms may be new to you now

• An example is on the next slide
• You can print the glossary if you wish
• But it is good to look on-line
• Then all the terms in blue are links
• So you can easily move about in the document


Example from the glossary

• Descriptive statistics
• If you have a large set of data, then descriptive statistics provides graphical (e.g. boxplots) and numerical (e.g. summary tables, means, quartiles) ways to make sense of the data.

• The branch of statistics devoted to the exploration, summary and presentation of data is called descriptive statistics.

• If you need to do more than descriptive summaries and presentations, the aim is to use the data to make inferences about some larger population.

• Inferential statistics is the branch of statistics devoted to making generalizations.


Learning objectives

• Define statistics

• Enter simple datasets once the data entry form is set up

• Recognise the type of each variable in a dataset

• Know some ways to summarise data of each main type

• Explain how statistical investigations deal with variability

• Differentiate between descriptive and inferential statistics


The end

• Next we move to the use of Excel

• To produce the tables and graphs

• So you can analyse all 3223 records – not just 18