Measurement and Representation of Hydrological Quantities Leonardo da Vinci - Vitruvian Man, ca 1487 photo by Luc Viatour, www.lucnix.be Riccardo Rigon Sunday, September 12, 2010
Jan 29, 2015
Objectives:
• In these pages, the spatio-temporal variability of measurements of hydrological quantities is discussed by means of examples.
• It follows that statistical tools must be used to describe these quantities.
[Figure: Hydrometric height at Frickenhausen, on the River Main]
Hydrological Data have Complex Trends 1/2
The hydrological cycle is controlled by innumerable factors: hence it depends on innumerable degrees of freedom. Only a small portion of these factors can be taken into consideration, while the rest must be modelled as a boundary condition or as “background noise” (this noise is either modelled or eliminated with statistical tools).
The dynamics of the hydrological cycle are non-linear. Both the hydrodynamics and the thermodynamics of the processes, which involve numerous phase changes, are non-linear. Another non-linear characteristic is that many of these processes are activated when some regulating quantity exceeds a threshold value. For example, the condensation of water vapour into raindrops is triggered when air humidity exceeds saturation; landslides are triggered when the internal friction forces of the material are overcome by the thrust of water within the pores of the soil; the channels of a hydrographic network begin to form when running water exerts a force per unit area above a certain value.
Hydrological Data have Complex Trends 2/2
The dynamics include processes that are linearly unstable: for example, the baroclinic instability that drives meteorological processes at the middle latitudes.
The dynamics of climate and hydrology are dissipative. That is to say, they transfer and transform mechanical energy into thermal energy. The hydrodynamic process of turbulence transports energy from the larger spatial scales to the smaller ones, where the energy is dissipated through friction. Wave phenomena of various kinds (e.g. gravity waves) transport the energy contained in water and in air.
Some Typical Problems: precipitation
Some Typical Problems: incident solar radiation
Some Typical Problems: Flow of the River Adige at San Lorenzo Bridge
[Figure: time series of river discharge; x-axis: year, 1990 to 2005; y-axis: discharge (m^3/s), 0 to 1400]
Some Typical Problems: Distribution of monthly river flows in Trento
Some Typical Problems: Annual water budget for the Lake of Serraia catchment
[Figure: annual water budget of the catchment (2000); x-axis: time (month-year), January 2000 to December 2000; y-axis: value (m^3/s), -0.3 to 1; series: P = precipitation, ET = evapotranspiration, Inv = stored volume (storage), R = release; labelled values: 0.8675, 0.343, -0.184, 0.797]
Some Typical Problems: Water content of the soil in the Little Washita catchment (Oklahoma)
Some Typical Problems: Spatial distribution of precipitation
Some Typical Problems: Spatial pattern of the hydrographic network
Statistical Inference and Descriptive Statistics
Lucio Fontana - Expectations (MoMA), 1959
Objectives:
•In these pages the fundamental elements of statistical analysis will be recalled.
•Population, sample and various elementary statistics, such as mean, variance and covariance, will be defined.
•The existence of statistics and their value will be argued.
•The concept of random sampling will be introduced.
Statistics
Population and Sample
Statistical inference assumes that a dataset is a subset of cases, called the sample, drawn from among all the possible cases. All the possible cases together constitute the population from which the dataset has been extracted. While the sample is known, the population generally is not. Hypotheses are implicitly made about the population.
Exploratory Data Analysis: temporal representation and histogram
A set of n data constitutes, therefore, a sample of data.
These data can be represented in various forms. Each form of representation emphasises certain characteristics.
[Figure: a) Bergen: September temperature, time series from 1860 to 2000; axes: time, temperature (°C). b) Bergen: September temperature distribution (1861-1997), histogram; axes: temperature (°C), frequency]
Sample Means
Given a sample, various statistics can be calculated. For example:
Temporal Mean
$\bar{x} := \frac{1}{n} \sum_{t=1}^{n} x_{,t}$
Spatial Mean
$\langle x \rangle := \frac{1}{n} \sum_{i=1}^{n} x_i$
The mean is an indicator of position.
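As a minimal sketch of the two sample means, the R lines below use a small hypothetical table of measurements (the numbers, the matrix `x` and the 3 stations by 4 times layout are assumptions made only for illustration). The temporal mean averages along time for each station, the spatial mean averages across stations for each time.

```r
# Hypothetical data: 3 stations (rows) observed at 4 times (columns)
x <- matrix(c(10, 12, 11, 13,
              20, 22, 21, 23,
              30, 32, 31, 33),
            nrow = 3, byrow = TRUE)

temporal_mean <- rowMeans(x)  # mean over time, one value per station
spatial_mean  <- colMeans(x)  # mean over stations, one value per time

print(temporal_mean)
print(spatial_mean)
```

The two averages generally differ, which is why it matters to state over which index the mean is taken.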
Statistical Inference
Corrado Caudek
• Statistical inference is the process that allows one to draw conclusions about a population on the basis of a sample of observations drawn at random from that population.
• Central to classical statistical inference is the notion of the sample distribution (more commonly called the sampling distribution), that is to say, how the statistics of the samples vary when random samples of the same size n are repeatedly drawn from the population.
• Even though, in each practical application of statistical inference, the researcher has only one random sample of size n, the possibility of repeating the sampling provides the conceptual foundation for deciding how informative the observed sample is about the population as a whole.
Exploratory Data Analysis
The mean is not the only indicator of position.
[Figure: data distribution with the mode indicated]
Median and Mode
The mode represents the most frequent value.
The median represents the value for which 50% of the data have a lower value and (obviously!) the other 50% have a greater value.
If the histogram clearly presents several distinct maxima, the dataset is said to be multimodal, though deciding this risks being controversial.
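A short R sketch of the two indicators, on a hypothetical sample `h` (the values are invented for illustration). R has a built-in `median()`, while the mode is obtained here by counting repeated values with `table()`; for continuous data the mode would instead depend on how the histogram classes are chosen.

```r
h <- c(12, 15, 15, 18, 20, 15, 22, 18, 30)  # hypothetical sample

med <- median(h)                             # value splitting the sorted data in half

counts <- table(h)                           # frequency of each distinct value
mode_value <- as.numeric(names(counts)[as.vector(counts) == max(counts)])

print(med)
print(mode_value)
```

If several values share the maximum frequency, `mode_value` contains all of them, which is one simple way of noticing a multimodal dataset.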
Empirical Distribution Function
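Given the dataset
$h_i = \{h_1, \cdots, h_n\}$
and having derived from this the ordered set in ascending order
$h_j = (h_1, \cdots, h_n), \quad h_1 \le h_2 \le \cdots \le h_n$
the empirical cumulative distribution function is defined as
$ECDF_i(h) := \frac{1}{n} \sum_{j=1}^{i} 1 = \frac{i}{n}$
that is, the fraction of the data that do not exceed the i-th ordered value.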
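The construction can be sketched in R as follows. The data vector `h` is hypothetical; `ecdf()` is the built-in function of the `stats` package and is compared here with the manual value i/n at the i-th ordered datum.

```r
h <- c(35, 12, 50, 27, 41)     # hypothetical rainfall depths [mm]
n <- length(h)

h_sorted <- sort(h)            # ordered set h_1 <= ... <= h_n
ecdf_manual <- seq_len(n) / n  # ECDF at the i-th ordered value: i / n

Fn <- ecdf(h)                  # built-in empirical CDF (a step function)

print(cbind(h = h_sorted, manual = ecdf_manual, builtin = Fn(h_sorted)))
```

Calling `plot(Fn)` would draw the step function, with h on the abscissa and the frequency of non-exceedance on the ordinate.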
ECDF
The empirical cumulative distribution function can be represented as illustrated.
The ordinate value identified by the curve is called the frequency of non-exceedance; the value of h at which a given frequency is reached is the corresponding quantile.
[Figure: ECDF; x-axis: h [mm]; y-axis: frequency of non-exceedance, P[H<h]]
The 0.5 quantile is the value of h at which the ECDF reaches the ordinate 0.5: it separates the data distribution in half.
And so the median is identified.
[Figure: ECDF with the 0.5 quantile (the median) marked; x-axis: h [mm]; y-axis: frequency of non-exceedance, P[H<h]]
Box and Whisker Diagrams
The procedure can be generalised and represented with a box and whisker diagram.
The box and whisker diagram is another way of representing the data distribution.
[Figure: ECDF with the 0.25, 0.5 and 0.75 quantiles marked, and the corresponding box and “whisker”; x-axis: h [mm]; y-axis: frequency of non-exceedance, P[H<h]]
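A possible R sketch of the quantiles behind a box and whisker diagram, using a hypothetical sample `h`. `quantile()` and `boxplot.stats()` are standard R functions; note that R's default whisker rule (1.5 times the box height) is a convention of that function and is not taken from these slides.

```r
h <- c(12, 18, 25, 27, 33, 35, 41, 50, 62)    # hypothetical sample [mm]

q <- quantile(h, probs = c(0.25, 0.5, 0.75))  # 0.25, 0.5 and 0.75 quantiles

# Five numbers used to draw the diagram:
# lower whisker, lower hinge, median, upper hinge, upper whisker
stats5 <- boxplot.stats(h)$stats

print(q)
print(stats5)
# boxplot(h, horizontal = TRUE)  # uncomment to draw the diagram
```

The edges of the box are the 0.25 and 0.75 quantiles and the line inside the box is the median.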
Parameters and Statistics
A parameter is a number that describes a certain aspect of the population.
• For example, the (true) mean annual precipitation at a weather station is a parameter. Let us suppose that this mean is $\mu_h = 980$ mm.
• In any concrete situation the parameters are unknown.
A statistic is a number that can be calculated from the data of a sample, without any knowledge of the parameters of the population.
• Let us suppose, for example, that the random sample of precipitation data covers 30 years of measurement and that the mean annual precipitation, on the basis of the sample, is $\bar{h} = 1002$ mm.
• This mean is a statistic.
Other Statistics: the Range
$R_x := \max(x) - \min(x)$
The range is the simplest indicator of the spread of the data. It is an indicator of the scale of the data. However, it only considers two data points and ignores the other n-2 that make up the sample.
Other Statistics: Variance and Standard Deviation
$Var(x) := \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2$
$\sigma_x := \sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2}$
The variance is an indicator of “scale” that considers all the data of the sample.
Other Statistics: Variance and Standard Deviation, “corrected” version (unbiased)
$Var(x) := \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$
$\sigma_x := \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$
The unbiased version of the variance takes into account that only n-1 data are independent, their mean being fixed.
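The difference between the two versions can be checked with a few lines of R. The sample `x` reuses the values 2, 3, 5, 9 that appear later in these slides; the variable names are illustrative. R's built-in `var()` and `sd()` use the n-1 denominator.

```r
x <- c(2, 3, 5, 9)
n <- length(x)

var_biased   <- sum((x - mean(x))^2) / n        # denominator n
var_unbiased <- sum((x - mean(x))^2) / (n - 1)  # denominator n - 1

print(c(biased = var_biased, unbiased = var_unbiased, builtin = var(x)))
print(c(sd_unbiased = sqrt(var_unbiased), builtin = sd(x)))
```

The two versions differ by the factor (n-1)/n, so the distinction matters mostly for small samples.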
Coefficient of Variation
• The coefficient of variation (CV) of a data sample is defined as the ratio between the standard deviation and the mean:
$CV_x := \frac{\sigma_x}{\bar{x}}$
• The greater the coefficient of variation, the less informative and indicative the mean is in relation to the future trends of the population.
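Other Statistics: Skewness and Kurtosis
Skewness is a measure of the asymmetry of the data distribution:
$sk_x := \sum_{i=1}^{n} \frac{1}{n} \left( \frac{x_i - \bar{x}}{\sigma_x} \right)^3$
Kurtosis is a measure of the “peakedness” of the data distribution:
$k_x := -3 + \sum_{i=1}^{n} \frac{1}{n} \left( \frac{x_i - \bar{x}}{\sigma_x} \right)^4$
With the constant $-3$, a normal distribution has $k_x = 0$.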
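A minimal R sketch of these two statistics follows. It assumes the 1/n form of the standard deviation and takes kurtosis as the fourth standardised moment minus 3 (excess kurtosis); the function name `skew_kurt` and the two test samples are hypothetical.

```r
# Skewness and excess kurtosis from standardised deviations (1/n moments)
skew_kurt <- function(x) {
  n <- length(x)
  s <- sqrt(sum((x - mean(x))^2) / n)  # standard deviation with denominator n
  z <- (x - mean(x)) / s               # standardised data
  c(skewness = sum(z^3) / n,
    kurtosis = sum(z^4) / n - 3)
}

sym   <- skew_kurt(c(1, 2, 3, 4, 5))   # symmetric sample
right <- skew_kurt(c(1, 1, 1, 2, 10))  # long right tail

print(sym)
print(right)
```

A symmetric sample gives zero skewness, while a sample with a long right tail gives a positive value.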
Estimation and Hypothesis Testing
Usually, we are not interested in the statistics for themselves, but in what the statistics tell us about the population of interest.
• We could, for example, use the mean annual precipitation, measured at all hydro-meteorological stations, to estimate the mean annual precipitation for the Italian Peninsula.
• Or, we could use the mean of the sample to establish whether the mean annual precipitation has changed over the period covered by the sample.
These two questions belong to the two main branches of classical statistical inference:
• The estimation of parameters
• Statistical hypothesis testing
Sample Variability
A fundamental aspect of sample statistics is that they vary from one sample to the next. In the case of annual precipitation, it is very improbable that the mean of the sample, 1002 mm, will coincide with the mean of the population.
• The variability of a sample statistic from sample to sample is called sample variability.
– When sample variability is very high, the sample is uninformative, or even misleading, about the population parameter.
– When sample variability is small, the statistic is informative, even though it is practically impossible for the statistic of a sample to be exactly equal to the population parameter.
Simulation
Sample variability will be illustrated as follows:
1. we will consider a discrete variable that can only assume a small number of possible values (N = 4);
2. all possible samples of size n = 2 will be listed;
3. the mean will be calculated for each possible sample of size n = 2;
4. the distribution of means of the samples of size n = 2 will be examined.
The mean $\mu$ and the variance $\sigma^2$ of the population will be calculated. It must be noted that $\mu$ and $\sigma^2$ are parameters, while the mean $\bar{x}_i$ and the variance $s_i^2$ of each sample are statistics.
Techniques in Psychological Research and Data Analysis 8
• The experiment in this example consists of n = 2 draws, with replacement, of a marble $x_i$ from an urn that contains N = 4 marbles.
• The marbles are numbered as follows: 2, 3, 5, 9
• Drawing with replacement corresponds to a population of infinite size (it is in fact always possible to draw a marble from the urn).
• For each sample of size n = 2 the mean of the values of the marbles drawn is calculated:
$\bar{x} = \sum_{i=1}^{2} \frac{x_i}{2}$
• For example, if the marbles drawn are $x_1 = 2$ and $x_2 = 3$, then:
$\bar{x} = \frac{2 + 3}{2} = \frac{5}{2} = 2.5$
Three Distributions
We must distinguish between three distributions:
1. the population distribution
2. the distribution of a sample
3. the sample distribution of the means of all possible samples
1. The Population Distribution
The population distribution: the distribution of X (the value of the marble drawn) in the population. In this specific case the population is of infinite size and has the following probability distribution:
x_i     p_i
2       1/4
3       1/4
5       1/4
9       1/4
Total   1
• The mean of the population is:
$\mu = \sum x_i p_i = 4.75$
• The variance of the population is:
$\sigma^2 = \sum (x_i - \mu)^2 p_i = 7.1875$
2. The Distribution of a Sample
The distribution of a sample: the distribution of X in a specific sample.
• If, for example, $x_1 = 2$ and $x_2 = 3$, then the mean of this sample is $\bar{x} = 2.5$ and the variance is $s^2 = 0.5$.
3. The Sample Distribution of the Means
The sample distribution of the means: the distribution of the means of all the possible samples.
• If the size of the samples is n = 2, then there are 4 × 4 = 16 possible samples. We can therefore list their means.
sample   mean     sample   mean
3, 2     2.5      2, 3     2.5
5, 2     3.5      2, 5     3.5
9, 2     5.5      2, 9     5.5
5, 3     4.0      3, 5     4.0
9, 3     6.0      3, 9     6.0
9, 5     7.0      5, 9     7.0
2, 2     2.0      3, 3     3.0
5, 5     5.0      9, 9     9.0
• The sample distribution of the means has the following probability distribution:
mean    p_i
2.0     1/16
2.5     2/16
3.0     1/16
3.5     2/16
4.0     2/16
5.0     1/16
5.5     2/16
6.0     2/16
7.0     2/16
9.0     1/16
Total   1
• The mean of the sample distribution of the means is:
$\mu_{\bar{x}} = \sum \bar{x}_i p_i = 4.75$
• The variance of the sample distribution of the means is:
$\sigma^2_{\bar{x}} = \sum (\bar{x}_i - \mu_{\bar{x}})^2 p_i = 3.59375$
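The exact enumeration for the urn with marbles 2, 3, 5, 9 can be reproduced with a short R sketch. `expand.grid()` is used here only as a convenient way of listing the 16 ordered samples; the variable names are illustrative.

```r
X <- c(2, 3, 5, 9)                      # values of the four marbles

samples <- expand.grid(x1 = X, x2 = X)  # all 16 ordered samples of size n = 2
means <- rowMeans(samples)              # mean of each sample

p <- table(means) / length(means)       # probability of each distinct mean

mu_xbar  <- mean(means)                 # mean of the distribution of the means
var_xbar <- mean((means - mu_xbar)^2)   # its variance (denominator 16)

print(p)
print(c(mu_xbar = mu_xbar, var_xbar = var_xbar))
```

Because every ordered sample is equally likely, averaging over the 16 rows is equivalent to weighting each distinct mean by its probability.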
The example we have seen is very particular in that the population is known. In practice the population distribution is never known.
However, we can take note of two important properties of the sample distribution of the means:
• The mean of the sample distribution of means, $\mu_{\bar{x}}$, is the same as the population mean $\mu$.
• The variance of the sample distribution of means, $\sigma^2_{\bar{x}}$, is equal to the ratio of the variance of the population, $\sigma^2$, to the size n of the sample:
$\sigma^2_{\bar{x}} = \frac{\sigma^2}{n} = \frac{7.1875}{2} = 3.59375$
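For reference, the two properties follow from a standard short derivation, sketched below under the assumption that the n draws are independent and identically distributed with mean $\mu$ and variance $\sigma^2$ (which holds for drawing with replacement):

```latex
\begin{aligned}
E[\bar{x}] &= E\!\left[\frac{1}{n}\sum_{i=1}^{n} x_i\right]
            = \frac{1}{n}\sum_{i=1}^{n} E[x_i]
            = \frac{1}{n}\, n\,\mu = \mu \\
Var(\bar{x}) &= Var\!\left(\frac{1}{n}\sum_{i=1}^{n} x_i\right)
              = \frac{1}{n^2}\sum_{i=1}^{n} Var(x_i)
              = \frac{n\,\sigma^2}{n^2} = \frac{\sigma^2}{n}
\end{aligned}
```

The second line uses independence: without it, covariance terms between the draws would also appear in the sum.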
The two things to note can be summarised as follows:
• The mean and variance of the sample distribution of means are determined by the mean and variance of the population:
$\mu_{\bar{x}} = \mu, \quad \sigma^2_{\bar{x}} = \frac{\sigma^2}{n}$
• The variance of the sample distribution of the means is smaller than the variance of the population.
In what follows, we will use the properties of the sample distribution to make inferences about the parameters of the population even when the population distribution is not known.
Three Distributions
Therefore, we have distinguished between three distributions:
1. the population distribution: $\Omega = \{2, 3, 5, 9\}$, $\mu = 4.75$, $\sigma^2 = 7.1875$
2. the distribution of a sample: $\Omega_i = \{2, 3\}$, $\bar{x} = 2.5$, $s^2 = 0.5$
3. the sample distribution of the means of all possible samples: $\Omega_{\bar{x}} = \{2.0, 2.5, 3.0, 3.5, 4.0, 5.0, 5.5, 6.0, 7.0, 9.0\}$, $\mu_{\bar{x}} = 4.75$, $\sigma^2_{\bar{x}} = 3.59375$
1. The population distribution: this is the distribution that contains all possible observations. The mean and variance of this distribution are indicated by $\mu$ and $\sigma^2$.
2. The distribution of a sample: this is the distribution of the values of the population that make up a particular random sample of size n. The single values are indicated by $x_1, \ldots, x_n$, and the mean and variance are indicated by $\bar{x}$ and $s^2$.
3. The sample distribution of the means of the samples: this is the distribution of the $\bar{x}_i$ for all the possible samples of size n that can be drawn from the population being considered. The mean and variance of the sample distribution of means are indicated by $\mu_{\bar{x}}$ and $\sigma^2_{\bar{x}}$.
The distribution that is the basis of statistical inference is the sample distribution.
Definition: the sample distribution of a statistic is the distribution of values that the specific statistic assumes for all samples of size n that can be drawn from the population.
It must be noted that if the simulation considers fewer samples than all those theoretically possible, then the resulting distribution will only be an approximation of the true sample distribution.
Estimation and Hypothesis Testing
Having calculated different statistics, we can now formulate some hypotheses. For example:
• Do the samples all have the same mean and the same variance?
• Does the mean depend on the size of the sample?
• Does the variance depend on the size of the sample?
If the samples do not all have the same mean, a trend may be present.
The variance can vary with the size of the sample!
If it does not stabilise as the amount of data in the sample increases, then the data are said to have “Infinite Variance Syndrome”.
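One way to look for this behaviour is to plot the variance of the first k data as k grows. The R sketch below is only illustrative: the helper `running_var` is hypothetical, and the Cauchy distribution is chosen here as a textbook example of a heavy-tailed law without finite variance, not as something taken from these slides.

```r
set.seed(1)  # fixed seed so the random draws are reproducible

# Variance (denominator k) of the first k values, for k = 1, ..., length(x)
running_var <- function(x) {
  k <- seq_along(x)
  m <- cumsum(x) / k        # running mean
  cumsum(x^2) / k - m^2     # running mean of squares minus squared mean
}

rv_normal <- running_var(rnorm(10000))    # finite variance: settles near 1
rv_cauchy <- running_var(rcauchy(10000))  # heavy tails: keeps jumping

print(rv_normal[c(100, 1000, 10000)])
print(rv_cauchy[c(100, 1000, 10000)])
```

For the normal draws the running variance settles close to 1, while for the Cauchy draws it typically jumps whenever an extreme value enters the sample.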
Null Hypothesis
We will have a chance to look at hypothesis testing in detail in future lectures. However, it is worth remembering the following:
• Generally, it is not possible to definitively prove anything. One can only attempt to prove that a hypothesis is not true.
• Let $H_0$ be the (null) hypothesis to be tested. If $H_0$ cannot be rejected, then one can affirm that “it is true” with a certain degree of confidence.
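Other Statistics: Covariance
Given two datasets, for example:
$h_i = \{h_1, \cdots, h_n\}$ and $l_i = \{l_1, \cdots, l_n\}$
the covariance between these two datasets is defined as:
$Cov(h_i, l_i) := \frac{1}{N-1} \sum_{i=1}^{n} (l_i - \bar{l}_i)(h_i - \bar{h}_i)$
where $N = n$ is the number of data pairs.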
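Other Statistics: Correlation
Given two datasets, for example:
$h_i = \{h_1, \cdots, h_n\}$ and $l_i = \{l_1, \cdots, l_n\}$
the correlation between these two datasets is defined as:
$\rho_{lh} := \frac{Cov(l, h)}{\sqrt{\sigma_h^2 \, \sigma_l^2}} = \frac{Cov(l, h)}{\sigma_h \, \sigma_l}$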
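A brief R sketch of covariance and correlation on two hypothetical series `h` and `l` (the values are invented). The manual formulas use the n-1 denominator and are compared with the built-in `cov()` and `cor()`.

```r
h <- c(10, 20, 30, 40, 50)  # hypothetical series
l <- c(12, 18, 33, 41, 46)  # hypothetical series of the same length
n <- length(h)

cov_manual <- sum((l - mean(l)) * (h - mean(h))) / (n - 1)
rho_manual <- cov_manual / (sd(h) * sd(l))  # sd() also uses n - 1

print(c(cov_manual = cov_manual, cov_builtin = cov(h, l)))
print(c(rho_manual = rho_manual, rho_builtin = cor(h, l)))
```

The correlation is dimensionless and lies between -1 and 1, whereas the covariance carries the product of the units of the two series.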
Please observe that one can consider the correlation between two sample series of equal length, obtained from the same dataset by shifting it by one instant:
$h_i = \{h_1, \cdots, h_{n-1}\}$ and $h_{i+1} = \{h_2, \cdots, h_n\}$
Resulting in:
$Cov(h_i, h_{i+1}) := \frac{1}{N-1} \sum_{j=1}^{n-1} (h_j - \bar{h}_i)(h_{j+1} - \bar{h}_{i+1})$
Repeating this operation for series that are progressively reduced in length and separated by r instants, the resulting series are:
$h_i^r = \{h_1, \cdots, h_{n-r}\}$ and $h_{i+r} = \{h_{r+1}, \cdots, h_n\}$
From which:
$Cov(h_i^r, h_{i+r}) := \frac{1}{N-1} \sum_{j=1}^{n-r} (h_j - \bar{h}_i^r)(h_{j+r} - \bar{h}_{i+r})$
$\rho(h_i^r, h_{i+r}) := \frac{Cov(h_i^r, h_{i+r})}{\sigma_i^r \, \sigma_{i+r}}$
Other Statistics: Autocorrelation
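The correlation between a series and a copy of itself shifted by r instants can be sketched in R as below. The helper `lag_cor` and the series `h` are hypothetical. R's built-in `acf()` estimates the same quantity with a slightly different normalisation, so its values need not coincide exactly with this sketch.

```r
# Correlation between the series and itself shifted by r instants
lag_cor <- function(h, r) {
  n <- length(h)
  cor(h[1:(n - r)], h[(r + 1):n])  # both sub-series have length n - r
}

h <- c(3, 5, 4, 6, 8, 7, 9, 11, 10, 12)  # hypothetical time series

rho <- sapply(1:3, function(r) lag_cor(h, r))  # lags 1, 2 and 3
print(rho)
```

Plotting `rho` against the lag r gives the empirical autocorrelation function of the series.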
Random Sampling
Within the strategy of creating and analysing data samples, the selection (or, sometimes, the generation) of random samples plays an important role.
A sample of n events, selected from a population, is random if the probability of that sample being selected is the same as that of any other sample of the same size.
If the data are generated, then one is carrying out a random experiment. Some examples of this are:
• tossing a coin;
• counting the rainy days in a year; and
• counting the days when the river flow at the Bridge of San Lorenzo, Trento, is greater than a predetermined value.
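In R, both the selection of a random sample and a simple random experiment can be sketched with the built-in `sample()` function. The population `1:100`, the seed and the variable names are assumptions made only for this illustration.

```r
set.seed(42)  # fixed seed so the example is reproducible

# A random experiment: ten tosses of a fair coin
coin <- sample(c("H", "T"), size = 10, replace = TRUE)

# A simple random sample of 5 items from a hypothetical population of 100,
# drawn without replacement: every subset of size 5 is equally likely
population <- 1:100
s <- sample(population, size = 5)

print(coin)
print(s)
```

With `replace = TRUE` the same element can be drawn more than once, which is the situation of the urn example in these slides.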
Sample Variability
Simulation 2
Let us consider another example where sample variability is illustrated as follows:
1. the same population as in the previous example will be used (N = 4);
2. by means of the computer program R, 50,000 samples of size n = 2 will be drawn, with replacement, from the population;
3. the mean will be calculated for each of these samples of size n = 2;
4. the mean and variance of the distribution of means of the 50,000 samples of size n = 2 will be calculated.
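# Simulation 2
N <- 4                                  # number of values in the population
n <- 2                                  # sample size
nSamples <- 50000                       # number of samples to draw
X <- c(2, 3, 5, 9)                      # population values
Mean <- mean(X)                         # population mean
Var <- var(X) * (N - 1) / N             # population variance (var() divides by N - 1)
SampDistr <- rep(0, nSamples)           # storage for the sample means
for (i in 1:nSamples) {
  samp <- sample(X, n, replace = TRUE)  # draw n values with replacement
  SampDistr[i] <- mean(samp)            # mean of this sample
}
# mean and variance of the 50,000 sample means
MeanSampDistr <- mean(SampDistr)
VarSampDistr <- var(SampDistr) * (nSamples - 1) / nSamples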
Results of the simulation with R:
> Mean
[1] 4.75
> Var
[1] 7.1875
> MeanSampDistr
[1] 4.73943
> VarSampDistr
[1] 3.578548
> Var/n
[1] 3.59375
• Population: $\mu = 4.75$, $\sigma^2 = 7.1875$
• Sample distribution of the means: $\mu_{\bar{x}} = 4.75$, $\sigma^2_{\bar{x}} = 3.59375$
• Results of the R simulation: $\mu_{\bar{x}} = 4.73943$, $\sigma^2_{\bar{x}} = 3.578548$
Thank you for your attention!
G. Ulrici - Man after having worked on the slides, 2000?