School of Computing FACULTY OF ENGINEERING MJ11 (COMP1640) Modelling, Analysis & Algorithm Design Vania Dimitrova Lecture 18 Statistical Data Analysis: Types of Data, Sampling Methods, Descriptive Statistics November 2011

Mar 28, 2015

Page 1:

School of Computing

FACULTY OF ENGINEERING

MJ11 (COMP1640)

Modelling, Analysis & Algorithm Design

Vania Dimitrova

Lecture 18

Statistical Data Analysis: Types of Data, Sampling Methods, Descriptive Statistics

November 2011

Page 2:

In the previous lectures

Mathematical Modelling

Identify Factors

Make assumptions

Formulate model

Examine behaviour

Assumptions are crucial

Estimated values for parameters

Assumed dependencies between variables

Check validity of assumptions

Grounded on data analysis

Page 3:

In this series of lectures

Analysis of data

How to collect data samples?

How to make estimations of values (point/interval)?

How to infer possible dependencies between variables?

How to check the validity of a hypothesis?

Page 4:

Do you agree with these statements?

Average earnings in the UK grow steadily.

There are more overseas visits to the UK than UK visits abroad.

House prices are dependent on average family income.

Young people prefer to shop online.

Advertisement improves sales figures.

TV advertisement is more powerful than Radio advertisement.

Women are more likely to buy computer games than men.

Men are more likely to buy cosmetics products than women.

Page 5:

Population

Large collection of objects or events which vary in respect of some characteristics

The whole set of measurements or counts about which we want to draw a conclusion

Characteristics of population:

height, age, reading abilities, fitness level

What is the population for each of the claims on the previous slide?

Page 6:

Sample

Subset of the population: a set of some of the measurements or characteristics of the population.

Measures describing population characteristics: PARAMETERS

Measures describing sample characteristics: STATISTICS

Statistics estimate the parameters

Systematic error: the sample is not representative of the population

Sampling error: influenced by the size of the sample and the variation in the population

Page 7:

Sampling methods

Random sampling: selecting members of the population in a random order

Pros & Cons

Systematic sampling: selecting members of the population in a systematic order (quasi-random)

Pros & Cons
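
The two methods on this slide can be sketched in Python. This is only an illustration under stated assumptions: the population is a hypothetical list of 100 member IDs, the function names are made up for this example, and the sample size is assumed to be no larger than the population. The systematic version takes every step-th member after a random starting offset, which is one common reading of "systematic order".

```python
import random

def simple_random_sample(population, k, seed=None):
    """Draw k distinct members uniformly at random (without replacement)."""
    rng = random.Random(seed)
    return rng.sample(population, k)

def systematic_sample(population, k, seed=None):
    """Take every step-th member, starting at a random offset in the first interval.

    Assumes 0 < k <= len(population).
    """
    step = len(population) // k
    rng = random.Random(seed)
    start = rng.randrange(step)
    return population[start::step][:k]

# Hypothetical population: member IDs 1..100.
population = list(range(1, 101))

random_ids = simple_random_sample(population, 10, seed=1)
systematic_ids = systematic_sample(population, 10, seed=1)

print("random:    ", sorted(random_ids))
print("systematic:", systematic_ids)
```

The systematic sample is evenly spaced, so it can be misleading if the population list itself has a periodic pattern; the random sample has no such structure but needs a full list of the population to draw from.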

Page 8:

Sampling methods (cont.)

Stratified sampling: dividing the population into homogeneous groups and selecting at random within each group

Pros & Cons

Cluster sampling: when the population is too big, we may select certain clusters (e.g. UK students)

Pros & Cons

Stage sampling – random selection of clusters
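
A minimal Python sketch of stratified and stage sampling follows. Everything concrete in it is an assumption made for illustration: the student records, the "UG"/"PG" strata, the university clusters and the function names are all hypothetical. The stage sampling function picks clusters at random and then picks members at random inside each chosen cluster, which is one possible reading of the slide.

```python
import random
from collections import defaultdict

def stratified_sample(members, stratum_of, fraction, rng):
    """Group members into strata, then sample the same fraction at random from each."""
    strata = defaultdict(list)
    for member in members:
        strata[stratum_of(member)].append(member)
    sample = []
    for key in sorted(strata):
        group = strata[key]
        k = max(1, round(len(group) * fraction))  # at least one member per stratum
        sample.extend(rng.sample(group, k))
    return sample

def stage_sample(clusters, n_clusters, per_cluster, rng):
    """Pick clusters at random, then pick members at random inside each chosen cluster."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: rng.sample(clusters[name], per_cluster) for name in chosen}

rng = random.Random(7)

# Hypothetical population: 80 undergraduates and 20 postgraduates.
students = [(f"s{i:03d}", "UG" if i < 80 else "PG") for i in range(100)]
stratified = stratified_sample(students, lambda s: s[1], 0.10, rng)

# Hypothetical clusters: student IDs grouped by university.
clusters = {
    "Leeds": [f"L{i}" for i in range(50)],
    "York": [f"Y{i}" for i in range(40)],
    "Hull": [f"H{i}" for i in range(30)],
}
staged = stage_sample(clusters, n_clusters=2, per_cluster=5, rng=rng)

print(len(stratified), "students in the stratified sample")
print({name: len(ids) for name, ids in staged.items()})
```

Sampling the same fraction from every stratum keeps the sample's group proportions equal to the population's (here 8 undergraduates and 2 postgraduates), which is the main attraction of stratified sampling.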

Page 9:

Sample size

What precision do we want?

Increase the size to get better precision

What is the likely variability in the population?

Increase the size to account for higher variability
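
Both points can be read off the standard error of the sample mean. The formula below is not on the slide; it is the standard result for simple random sampling, with $\sigma$ taken as the population standard deviation and $n$ as the sample size.

```latex
\[
  \mathrm{SE}(\bar{x}) = \frac{\sigma}{\sqrt{n}}
\]
```

A larger $\sigma$ (more variability) widens the error, and a larger $n$ narrows it, but only with the square root: halving the standard error takes roughly four times as many observations.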

Page 10:

Types of data

Nominal

Categories, classes

Ordinal

Nominal with order

Discrete

Numbers that are distinct points on a scale

Continuous

Can take any values between points on a scale

GIVE EXAMPLES

Page 11:

Descriptive Statistics

Mean – average score

$$\bar{x} = \frac{\sum_{j=1}^{n} x_j}{n}$$

Median – middle point on the scale of measurement (the middle value once the data are ordered); helpful for oddly shaped distributions

General description of the sample

2 3 5 7 9 10 12 13 14 16 18 20 21

3 4 4 5 5 5 5 6 7
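
As a quick check, the two rows of numbers above can be run through Python's standard `statistics` module. Treating them as two separate example samples is an assumption; the transcript does not say how the slide used them, and the names `sample_a` and `sample_b` are made up here.

```python
from statistics import mean, median

# The two rows of values from the slide, treated as separate samples (assumption).
sample_a = [2, 3, 5, 7, 9, 10, 12, 13, 14, 16, 18, 20, 21]
sample_b = [3, 4, 4, 5, 5, 5, 5, 6, 7]

mean_a, median_a = mean(sample_a), median(sample_a)
mean_b, median_b = mean(sample_b), median(sample_b)

print(f"A: mean = {mean_a:.2f}, median = {median_a}")  # A: mean = 11.54, median = 12
print(f"B: mean = {mean_b:.2f}, median = {median_b}")  # B: mean = 4.89, median = 5
```

Both samples have an odd number of values, so the median is simply the middle one after sorting (the 7th of 13 and the 5th of 9).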

Page 12:

Distribution of scores

Standard deviation

$$\sigma = \sqrt{\frac{\sum_{j=1}^{n} (x_j - \bar{x})^2}{n}}$$

Variance

$$\sigma^2$$

Coefficient of variation

$$V = \frac{\sigma}{\bar{x}} \cdot 100\%$$
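
The three measures can be computed directly from their definitions. The sketch below assumes the divide-by-n form of the standard deviation; the data set and the function name `describe_spread` are hypothetical and chosen so the arithmetic comes out in round numbers.

```python
from math import sqrt

def describe_spread(data):
    """Return (standard deviation, variance, coefficient of variation in %).

    Uses the divide-by-n form; assumes a non-empty sample with a non-zero mean.
    """
    n = len(data)
    x_bar = sum(data) / n
    variance = sum((x - x_bar) ** 2 for x in data) / n
    sigma = sqrt(variance)
    cv_percent = sigma / x_bar * 100
    return sigma, variance, cv_percent

# Hypothetical scores: mean 5, squared deviations sum to 32.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
sigma, variance, cv_percent = describe_spread(scores)

print(sigma, variance, cv_percent)  # 2.0 4.0 40.0
```

In the standard library, `statistics.pstdev` uses the same divide-by-n form, while `statistics.stdev` divides by n - 1, which is the usual choice when a sample is used to estimate the population value.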

Page 13:

Example (EU-Area-Current-Accounts.xls)

http://epp.eurostat.ec.europa.eu/

Page 14:

Normal and Skewed Distribution

[Figure: a normal distribution curve and a skewed distribution curve; image source: Wikipedia]

Page 15:

Approximating Normal Distribution

As the sample size increases, the shape of the sampling distribution approaches the normal distribution (see also the Central Limit Theorem)

http://www.statsoft.com/textbook/esc.html
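
A small simulation can make this visible. The sketch below is an assumption-laden illustration, not part of the lecture: it builds a hypothetical, strongly skewed population from an exponential distribution, draws many samples of size n with replacement, and reports the spread and skewness of the resulting sample means. The helper names `skewness` and `sample_means` are made up for this example.

```python
import random
from statistics import mean, pstdev

def skewness(values):
    """Moment-based skewness: near 0 for a symmetric shape, positive for a long right tail."""
    m, s = mean(values), pstdev(values)
    return mean([(v - m) ** 3 for v in values]) / s ** 3

def sample_means(population, n, trials, rng):
    """Means of `trials` random samples of size n, drawn with replacement."""
    return [mean(rng.choices(population, k=n)) for _ in range(trials)]

rng = random.Random(42)

# Hypothetical skewed population (exponential, long right tail).
population = [rng.expovariate(1.0) for _ in range(10_000)]

results = {}
for n in (2, 10, 50):
    means = sample_means(population, n, trials=2000, rng=rng)
    results[n] = (pstdev(means), skewness(means))
    print(f"n={n:2d}  spread={results[n][0]:.3f}  skewness={results[n][1]:.2f}")
```

As n grows, the spread of the sample means shrinks and their skewness moves towards 0, which is what a distribution looks like as it becomes more nearly normal.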

Page 16:

Correlation between two variables

Measure of the relations between two or more variables

Correlation coefficient r

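Negative correlation: r → -1

Positive correlation: r → 1

$$-1 \le r \le 1$$

Different methods to calculate r

Simplest: based on deviations from the mean

$$r = \frac{\sum_{j=1}^{n} (x_j - \bar{x})(y_j - \bar{y})}{\sqrt{\sum_{j=1}^{n} (x_j - \bar{x})^2 \sum_{j=1}^{n} (y_j - \bar{y})^2}}$$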
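
The deviations-from-the-mean calculation of r can be sketched in a few lines of Python. The function name `pearson_r` and the data are hypothetical; the paired values are small enough to check by hand.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Correlation coefficient r from deviations about the two means.

    Assumes equal-length sequences with at least two points and
    some variation in both variables (otherwise the denominator is 0).
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    denominator = sqrt(
        sum((x - x_bar) ** 2 for x in xs) * sum((y - y_bar) ** 2 for y in ys)
    )
    return numerator / denominator

# Hypothetical paired observations.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = pearson_r(xs, ys)
r_reversed = pearson_r(xs, ys[::-1])

print(f"r = {r:.3f}")                    # r = 0.775
print(f"reversed y: r = {r_reversed:.3f}")  # reversed y: r = -0.775
```

By hand: the means are 3 and 4, the sum of products of deviations is 6, and the sums of squared deviations are 10 and 6, so r = 6 / sqrt(60), about 0.775. Reversing the order of the y values flips the sign of the numerator and leaves the denominator unchanged.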

Page 17:

Example: Positive Correlation

[Scatter plot: horizontal axis from 40 to 80, vertical axis from 40 to 75]

r = 0.998

Page 18:

Example: Negative Correlation

[Scatter plot: horizontal axis from 40 to 80, vertical axis from 40 to 80]

r = -0.99

Page 19:

Example: Limited (or No) Correlation

[Scatter plot: horizontal axis from 40 to 80, vertical axis from 40 to 75]

r = 0.179

Page 20:

Summary

Types of data

Population vs Sample

Sampling methods

Descriptive statistics

Normal & Skewed distribution

Correlation between variables

References

Rees, D.G., Essential Statistics, Chapman & Hall/CRC, 2000.

Cohen, L., Holliday, M., Practical Statistics for Students, Chapman, 1996.

http://www.statsoft.com/textbook/esc.html