
Measurement and Representation of Hydrological Quantities

Leonardo da Vinci - Vitruvian Man, ca. 1487, photo by Luc Viatour, www.lucnix.be

Riccardo Rigon, Sunday, September 12, 2010


Objectives:

• In these pages the spatio-temporal variability of measurements of hydrological quantities is discussed by means of examples.

• One deduces that statistical tools must be used to describe these quantities.



[Figure: hydrometric height at Frickenhausen, on the River Main]



Hydrological Data have Complex Trends 1/2

The hydrological cycle is controlled by innumerable factors; hence it depends on innumerable degrees of freedom. Only a small portion of these factors can be taken into consideration, while the remainder must be modelled as a boundary condition or as “background noise” (this noise is either modelled or eliminated with statistical tools).

The dynamics of the hydrological cycle are non-linear. Both the hydrodynamics and the thermodynamics of the processes, which involve numerous phase changes, are non-linear. Another non-linear characteristic is that many of these processes are activated when some regulating quantity surpasses a threshold value. For example, the condensation of water vapour into raindrops is triggered when air humidity exceeds saturation; landslides are triggered when the internal friction forces of the material are overcome by the thrust of water within the pores of the soil; the channels of a hydrographic network begin to form when running water reaches a certain value of force per unit area.


Hydrological Data have Complex Trends 2/2

The dynamics include processes which are linearly unstable: for example, the baroclinic instability that drives meteorological processes at the middle latitudes.

The dynamics of climate and hydrology are dissipative. That is to say, they transfer and transform mechanical energy into thermal energy. The hydrodynamic process of turbulence transports energy from the larger spatial scales to the smaller ones, where the energy is dissipated through friction. Wave phenomena of various kinds (e.g. gravity waves) transport the energy contained in water and in air.


Some Typical Problems: precipitation


Some Typical Problems: incident solar radiation


Some Typical Problems: flow of the River Adige at San Lorenzo Bridge

[Figure: discharge (m^3/s), axis from 0 to 1400, against year, 1990 to 2005]


Some Typical Problems: distribution of monthly river flows in Trento


Some Typical Problems: annual water budget for the Lake of Serraia catchment

[Figure: annual water budget chart for the catchment (2000). Horizontal axis: time (month-year), Jan-00 to Dec-00. Vertical axis: value (m^3/s), from -0.3 to 1. Series: P - precipitation, ET - evapotranspiration, Inv - stored volume (accumulation), R - release. Values labelled on the chart: 0.8675, 0.343, -0.184, 0.797.]


Some Typical Problems: water content of the soil in the Little Washita catchment (Oklahoma)

Some Typical Problems: spatial distribution of precipitation

Some Typical Problems: spatial pattern of the hydrographic network


Statistical Inference and Descriptive Statistics

Lucio Fontana - Expectations (MoMA), 1959


Objectives:

• In these pages the fundamental elements of statistical analysis will be recalled.

• Population, sample and various elementary statistics, such as mean, variance and covariance, will be defined.

• The existence of statistics and their value will be argued.

• The concept of random sampling will be introduced.


Statistics

Population and Sample

Statistical inference assumes that a dataset is a subset of cases, called the sample, drawn from all the possible cases. All the possible cases make up the population from which the dataset has been extracted. While the sample is known, the population generally is not. Hypotheses are implicitly made about the population.


Exploratory Data Analysis: temporal representation - histogram

[Figure: a) Bergen September temperature, temperature (°C) against time, 1860 to 2000; b) Bergen September temperature distribution (1861-1997), frequency against temperature (°C)]

A set of n data constitutes, therefore, a sample of data.

These data can be represented in various forms. Each representation form emphasises certain characteristics.
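As a minimal sketch of these two representations, the R lines below build a synthetic yearly temperature series and derive both the temporal view and the histogram counts. The values are generated for illustration and are not the Bergen observations; the names `years`, `temp`, `breaks` and `h` are assumptions made for this example.

```r
# Synthetic data standing in for a yearly temperature series (assumption:
# normally distributed values; these are NOT real observations).
set.seed(42)
years <- 1861:1997
temp  <- rnorm(length(years), mean = 11, sd = 1.2)

# Histogram: count how many values fall into each class, 1 degree wide.
breaks <- seq(floor(min(temp)), ceiling(max(temp)), by = 1)
h <- hist(temp, breaks = breaks, plot = FALSE)
print(h$counts)

# Draw both representations only in an interactive session.
if (interactive()) {
  plot(years, temp, type = "l", xlab = "time", ylab = "Temperature (deg C)")
  plot(h, xlab = "Temperature (deg C)", ylab = "Frequency", main = "")
}
```

The time series emphasises when each value occurred, while the histogram discards the ordering and emphasises how often each class of values occurs.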


Sample Means

Given a sample, various statistics can be calculated. For example:

Temporal Mean

$$\bar{x} := \frac{1}{n} \sum_{t=1}^{n} x_t$$

Spatial Mean

$$\langle x \rangle := \frac{1}{n} \sum_{i=1}^{n} x_i$$

The mean is an indicator of position.
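A small R sketch may help to separate the temporal mean from the spatial mean. It assumes a hypothetical matrix `x` whose rows are measurement stations and whose columns are time steps; the numbers are invented for illustration.

```r
# Hypothetical measurements: 3 stations (rows) observed at 4 time steps (columns).
x <- matrix(c(10, 12, 14, 16,
              20, 22, 24, 26,
              30, 32, 34, 36),
            nrow = 3, byrow = TRUE)

# Temporal mean: average over time, giving one value per station.
temporal_mean <- rowMeans(x)

# Spatial mean: average over stations, giving one value per time step.
spatial_mean <- colMeans(x)

print(temporal_mean)  # 13 23 33
print(spatial_mean)   # 20 22 24 26
```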



Statistical Inference

Credit: Corrado Caudek



• Statistical inference is the process which allows one to formulate conclusions about a population on the basis of a sample of observations drawn at random from that population.

• Central to classical statistical inference is the notion of the sample distribution (more commonly called the sampling distribution), that is to say how the statistics of the samples vary if random samples of the same size n are repeatedly drawn from the population.

• Even though, in each practical application of statistical inference, the researcher only has one random sample of size n, the possibility that the sampling can be repeated provides the conceptual foundation for deciding how informative the observed sample is about the population as a whole.


Exploratory Data Analysis

The mean is not the only indicator of position.

[Figure: data distribution with the mode marked]

Median and Mode

The mode represents the most frequent value.

The median represents the value below which 50% of the data lie and (obviously!) above which the other 50% lie.

If the histogram shows several distinct maxima, the dataset is said to be multimodal, though deciding this can be controversial.
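The two indicators can be computed in a few lines of R. This is only a sketch on invented values: base R provides `median()`, while the mode is obtained here by counting occurrences with `table()`; the names `h`, `counts`, `median_h` and `mode_h` are illustrative.

```r
# Hypothetical sample of measurements.
h <- c(12, 15, 15, 18, 20, 15, 22, 18, 30)

# Median: the central value of the sorted data.
median_h <- median(h)

# Mode: the value(s) with the highest count. More than one value is
# returned if several values share the maximum count.
counts <- table(h)
mode_h <- as.numeric(names(counts)[as.vector(counts) == max(counts)])

print(median_h)  # 18
print(mode_h)    # 15
```

For continuous data, where exact repetitions are rare, the mode is usually read from the tallest class of a histogram rather than from raw counts.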


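Empirical Distribution Function

Given the dataset

$$h_i = \{h_1, \cdots, h_n\}$$

and having derived from this the ordered set in ascending order

$$\hat{h}_j = (\hat{h}_1, \cdots, \hat{h}_n), \quad \hat{h}_1 \leq \hat{h}_2 \leq \cdots \leq \hat{h}_n$$

the empirical cumulative distribution function is defined as

$$ECDF_i(\hat{h}) := \frac{1}{n} \sum_{j=1}^{i} 1 = \frac{i}{n}$$

that is, the fraction of the data that do not exceed the i-th ordered value.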
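The construction can be sketched in R by sorting the data and assigning the frequency i/n to the i-th ordered value, then comparing with the built-in `ecdf()` function. The data values are invented, the names are illustrative, and the comparison assumes there are no tied values.

```r
# Hypothetical precipitation heights in mm (no repeated values).
h <- c(35, 20, 50, 80, 65)

# Ordered set in ascending order and its size.
h_sorted <- sort(h)
n <- length(h_sorted)

# Frequency of non-exceedance of the i-th ordered value: i / n.
ecdf_by_hand <- seq_len(n) / n

# Built-in step function, evaluated at the same ordered values.
Fn <- ecdf(h)
ecdf_builtin <- Fn(h_sorted)

print(cbind(h_sorted, ecdf_by_hand, ecdf_builtin))
```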


ECDF

The empirical cumulative distribution function can be represented as illustrated.

The ordinate value identified by the curve is called the frequency of non-exceedance; the value of h at which a given frequency is reached is the corresponding quantile.

[Figure: ECDF, frequency of non-exceedance P[H<h] (0.0 to 1.0) against h [mm] (20 to 80)]


The 0.5 quantile separates the data distribution in half in relation to the ordinate, and so the median is identified.

[Figure: ECDF, P[H<h] against h [mm], with the 0.5 quantile marked on the ordinate and the median on the abscissa]


Box and Whisker Diagrams

The procedure can be generalised and represented with a box and whisker diagram. The box and whisker diagram is another way of representing the data distribution.

[Figure: ECDF, P[H<h] against h [mm], with the 0.25, 0.5 and 0.75 quantiles marked, alongside the corresponding box and its “whisker”]
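As a sketch of how the quantiles behind a box and whisker diagram can be obtained in R, the lines below use `quantile()` and `boxplot.stats()` on invented data. Note that several conventions exist for computing quantiles (the `type` argument of `quantile()`), so results can differ slightly between tools; the default is used here.

```r
# Hypothetical precipitation heights in mm.
h <- c(20, 35, 50, 65, 80)

# The 0.25, 0.5 and 0.75 quantiles (R's default quantile definition).
q <- quantile(h, probs = c(0.25, 0.5, 0.75))

# The five numbers drawn by a box and whisker diagram:
# lower whisker, lower hinge, median, upper hinge, upper whisker.
box <- boxplot.stats(h)$stats

print(q)    # 35 50 65
print(box)  # 20 35 50 65 80

# Draw the diagram only in an interactive session.
if (interactive()) boxplot(h, horizontal = TRUE, xlab = "h [mm]")
```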


Parameters and Statistics

A parameter is a number that describes a certain aspect of the population.

• For example, the (real) mean annual precipitation at a weather station is a parameter. Let us suppose that this mean is $\mu_h = 980$ mm.

• In any concrete situation the parameters are unknown.


A statistic is a number that can be calculated on the basis of data given by a sample, without any knowledge of the parameters of the population.

• Let us suppose, for example, that the random sample of precipitation data covers 30 years of measurement and that the mean annual precipitation, on the basis of the sample, is $\bar{h} = 1002$ mm.

• This mean is a statistic.


Other Statistics: the Range

$$R_x := \max(x) - \min(x)$$

The range is the simplest indicator of the spread of the data. It is an indicator of the scale of the data. However, it only considers two data points and ignores the other n-2 that make up the sample.


Other Statistics: Variance and Standard Deviation

$$Var(x) := \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2$$

$$\sigma_x := \sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2}$$

The variance is an indicator of “scale” that considers all the data of the sample.


Other Statistics: Variance and Standard Deviation, “corrected” version (unbiased)

$$Var(x) := \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$$

$$\sigma_x := \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$$

The unbiased version of the variance takes into account that only n-1 of the deviations are independent, their mean being fixed.
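The difference between the two versions of the variance can be checked numerically. The sketch below uses four invented values; R's built-in `var()` and `sd()` implement the unbiased (n-1) version, so the 1/n version is computed by hand.

```r
# Hypothetical sample.
x <- c(2, 3, 5, 9)
n <- length(x)

# Sum of squared deviations from the sample mean.
ss <- sum((x - mean(x))^2)

var_biased   <- ss / n        # 1/n version
var_unbiased <- ss / (n - 1)  # "corrected" 1/(n-1) version, same as var(x)

print(var_biased)    # 7.1875
print(var_unbiased)
print(var(x))        # equal to var_unbiased
print(sd(x))         # square root of var_unbiased
```

The two versions differ by the factor (n-1)/n, which becomes negligible for large samples.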


Coefficient of Variation

• The coefficient of variation (CV) of a data sample is defined as the ratio between the standard deviation and the mean:

$$CV_x := \frac{\sigma_x}{\bar{x}}$$

• The greater the coefficient of variation, the less informative and indicative the mean is in relation to the future trends of the population.
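A one-line R computation illustrates the coefficient of variation. The three annual totals below are invented, and the sketch assumes the unbiased standard deviation returned by `sd()`.

```r
# Hypothetical annual precipitation totals in mm.
p <- c(900, 1000, 1100)

# Coefficient of variation: standard deviation divided by the mean.
# sd() uses the unbiased (n-1) version of the standard deviation.
cv <- sd(p) / mean(p)

print(cv)  # 0.1
```

Being a ratio of two quantities with the same units, the CV is dimensionless, which makes it convenient for comparing the variability of quantities measured on different scales.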


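Other Statistics: Skewness and Kurtosis

Skewness is a measure of the asymmetry of the data distribution:

$$sk_x := \sum_{i=1}^{n} \frac{1}{n} \left( \frac{x_i - \bar{x}}{\sigma_x} \right)^3$$

Kurtosis is a measure of the “peakedness” of the data distribution:

$$k_x := -3 + \sum_{i=1}^{n} \frac{1}{n} \left( \frac{x_i - \bar{x}}{\sigma_x} \right)^4$$

With the offset of -3, $k_x$ is zero for a normal distribution.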
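Base R has no built-in skewness or kurtosis function, so the following sketch defines them by hand. It assumes the 1/n version of the standard deviation and the excess form of kurtosis (offset of -3, so that a normal distribution gives zero); the function names `standardise`, `skewness` and `kurtosis` are illustrative.

```r
# Standardised deviations, using the 1/n standard deviation (assumption).
standardise <- function(x) {
  sigma <- sqrt(mean((x - mean(x))^2))
  (x - mean(x)) / sigma
}

# Skewness: mean of the cubed standardised deviations.
skewness <- function(x) mean(standardise(x)^3)

# Excess kurtosis: mean of the fourth powers, minus 3.
kurtosis <- function(x) mean(standardise(x)^4) - 3

# A symmetric sample has zero skewness; a sample with a long right tail
# has positive skewness.
print(skewness(c(1, 2, 3)))
print(skewness(c(2, 3, 5, 9)))
print(kurtosis(c(1, 2, 3)))
```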


Estimation and Hypothesis Testing

Usually, we are not interested in the statistics for themselves, but in what the statistics tell us about the population of interest.

• We could, for example, use the annual mean precipitation, measured at all hydro-meteorological stations, to estimate the mean annual precipitation for the Italian Peninsula.

• Or, we could use the mean of the sample to establish whether the mean annual precipitation has changed over the period covered by the sample.


These two questions belong to the two main branches of classical statistical inference:

• The estimation of parameters

• Statistical hypothesis testing



Sample Variability

A fundamental aspect of sample statistics is that they vary from one sample to the next. In the case of annual precipitation, it is very improbable that the mean of the sample, 1002 mm, will coincide with the mean of the population.

• The variability of a sample statistic from sample to sample is called sample variability.

– When sample variability is very high, the statistic is uninformative about the population parameter.

– When sample variability is small, the statistic is informative, even though it is practically impossible for the statistic of a sample to be exactly the same as the population parameter.



Simulation

Sample variability will be illustrated as follows:

1. we will consider a discrete variable that can only assume a small number of possible values (N = 4);

2. all possible samples of size n = 2 will be listed;

3. the mean will be calculated for each possible sample of size n = 2;

4. the distribution of the means of the samples of size n = 2 will be examined.

The mean $\mu$ and the variance $\sigma^2$ of the population will be calculated. It must be noted that $\mu$ and $\sigma^2$ are parameters, while the mean $\bar{x}_i$ and the variance $s_i^2$ of each sample are statistics.

Source: Techniques in Psychological Research and Data Analysis



• The experiment in this example consists of n = 2 draws, with replacement, of a marble $x_i$ from an urn that contains N = 4 marbles.

• The marbles are numbered as follows: 2, 3, 5, 9.

• Drawing with replacement corresponds to a population of infinite size (it is in fact always possible to draw a marble from the urn).



• For each sample of size n = 2 the mean of the values of the marbles drawn is calculated:

$$\bar{x} = \frac{\sum_{i=1}^{2} x_i}{2}$$

• For example, if the marbles drawn are $x_1 = 2$ and $x_2 = 3$, then:

$$\bar{x} = \frac{2 + 3}{2} = \frac{5}{2} = 2.5$$



Three Distributions

We must distinguish between three distributions:

1. the population distribution

2. the distribution of a sample

3. the sample distribution of the means of all possible samples



๏ 1. The Population Distribution

The population distribution: the distribution of X (the value of the marble drawn) in the population. In this specific case the population is of infinite size and has the following probability distribution:

x_i      p_i
2        1/4
3        1/4
5        1/4
9        1/4
Total    1



• The mean of the population is:

$$\mu = \sum x_i p_i = 4.75$$

• The variance of the population is:

$$\sigma^2 = \sum (x_i - \mu)^2 p_i = 7.1875$$



๏ 2. The Distribution of a Sample

The distribution of a sample: the distribution of X in a specific sample.

• If, for example, $x_1 = 2$ and $x_2 = 3$, then the mean of this sample is $\bar{x} = 2.5$ and the variance is $s^2 = 0.5$.



๏ 3. The Sample Distribution of the Means

The sample distribution of the means: the distribution of the means of all the possible samples.

• If the size of the samples is n = 2, then there are 4 × 4 = 16 possible samples. We can therefore list their means.

sample   mean     sample   mean
3, 2     2.5      2, 3     2.5
5, 2     3.5      2, 5     3.5
9, 2     5.5      2, 9     5.5
5, 3     4.0      3, 5     4.0
9, 3     6.0      3, 9     6.0
9, 5     7.0      5, 9     7.0
2, 2     2.0      3, 3     3.0
5, 5     5.0      9, 9     9.0
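The list of all possible samples and their means can also be generated rather than written by hand. The sketch below uses `expand.grid()` to enumerate every ordered pair of marbles; the names `samples` and `prob` are illustrative.

```r
# Values of the marbles in the urn.
X <- c(2, 3, 5, 9)

# Every ordered pair (x1, x2) drawn with replacement: 4 x 4 = 16 samples.
samples <- expand.grid(x1 = X, x2 = X)

# Mean of each sample.
samples$mean <- (samples$x1 + samples$x2) / 2

# Probability of each distinct mean: each of the 16 samples is equally likely.
prob <- table(samples$mean) / nrow(samples)

print(samples)
print(prob)
```

Each mean obtained from two different marbles appears twice (once for each order of drawing), which is why those means carry probability 2/16.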



• The sample distribution of the means has the following probability distribution:

mean     p_i
2.0      1/16
2.5      2/16
3.0      1/16
3.5      2/16
4.0      2/16
5.0      1/16
5.5      2/16
6.0      2/16
7.0      2/16
9.0      1/16
Total    1



• The mean of the sample distribution of the means is:

$$\mu_{\bar{x}} = \sum \bar{x}_i p_i = 4.75$$

• The variance of the sample distribution of the means is:

$$\sigma^2_{\bar{x}} = \sum (\bar{x}_i - \mu_{\bar{x}})^2 p_i = 3.59375$$



The example we have seen is very particular in that the population is known. In practice the population distribution is never known.

However, we can take note of two important properties of the sample distribution of the means:

• The mean of the sample distribution of the means, $\mu_{\bar{x}}$, is the same as the population mean $\mu$.

• The variance of the sample distribution of the means, $\sigma^2_{\bar{x}}$, is equal to the ratio of the variance of the population, $\sigma^2$, to the size n of the sample:

$$\sigma^2_{\bar{x}} = \frac{\sigma^2}{n} = \frac{7.1875}{2} = 3.59375$$
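Both properties can be verified exactly by enumerating every possible sample, here for n = 1, 2 and 3. This is a sketch under the same assumptions as the urn example (equally likely values, draws with replacement); the helper names `pop_var` and `means_of_all_samples` are introduced only for illustration.

```r
# Values of the marbles in the urn.
X <- c(2, 3, 5, 9)

# Variance in its population (1/N) form.
pop_var <- function(v) mean((v - mean(v))^2)

# Means of all possible ordered samples of size n drawn with replacement.
means_of_all_samples <- function(X, n) {
  all_samples <- do.call(expand.grid, rep(list(X), n))
  rowMeans(all_samples)
}

for (n in 1:3) {
  m <- means_of_all_samples(X, n)
  cat("n =", n,
      " mean of means =", mean(m),
      " variance of means =", pop_var(m),
      " sigma^2 / n =", pop_var(X) / n, "\n")
}
```

For every n the mean of the means stays at the population mean, while the variance of the means shrinks in proportion to 1/n.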



The two things to note can be summarised as follows:

• The mean and variance of the sample distribution of the means are determined by the mean and variance of the population:

$$\mu_{\bar{x}} = \mu, \qquad \sigma^2_{\bar{x}} = \frac{\sigma^2}{n}$$

• The variance of the sample distribution of the means is smaller than the variance of the population (for n > 1).



In what follows, we will use the properties of the sample distribution to make inferences about the parameters of the population even when the population distribution is not known.



Three Distributions

Therefore, we have distinguished between three distributions:

1. the population distribution: $\Omega = \{2, 3, 5, 9\}$, $\mu = 4.75$, $\sigma^2 = 7.1875$

2. the distribution of a sample: $\Omega_i = \{2, 3\}$, $\bar{x} = 2.5$, $s^2 = 0.5$

3. the sample distribution of the means of all possible samples: $\Omega_{\bar{x}} = \{2.0, 2.5, 3.0, 3.5, 4.0, 5.0, 5.5, 6.0, 7.0, 9.0\}$, $\mu_{\bar{x}} = 4.75$, $\sigma^2_{\bar{x}} = 3.59375$



1. The population distribution: this is the distribution that contains all possible observations. The mean and variance of this distribution are indicated with $\mu$ and $\sigma^2$.

2. The distribution of a sample: this is the distribution of the values of the population that make up a particular random sample of size n. The single values are indicated $x_1, \ldots, x_n$, and the mean and variance are indicated $\bar{x}$ and $s^2$.

3. The sample distribution of the means of the samples: this is the distribution of the $\bar{x}_i$ for all the possible samples of size n that can be extracted from the population being considered. The mean and variance of the sample distribution of the means are indicated by $\mu_{\bar{x}}$ and $\sigma^2_{\bar{x}}$.



The distribution that is the basis of statistical inference is the sample distribution.

Definition: the sample distribution of a statistic is the distribution of the values that the statistic assumes over all samples of size n that can be extracted from the population.

It must be noted that if the simulation considers fewer samples than all those theoretically possible, then the resulting distribution will only be an approximation of the real sample distribution.



Having created different statistics, we can now formulate some hypotheses. For example:

• Do the samples all have the same mean and the same variance?

• Does the mean depend on the size of the sample?

• Does the variance depend on the size of the sample?




If the samples do not have the same mean, a trend may be present.




The variance can vary with the size of the sample!

If it does not stabilise as the number of data in the sample increases, then the data are said to have “Infinite Variance Syndrome”.
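One way to look for this behaviour is to track the variance as the sample grows. The sketch below compares synthetic normal data, whose variance is finite, with synthetic Cauchy data, a textbook distribution without a finite variance. The choice of these two distributions and the helper name `running_variance` are assumptions made for illustration.

```r
# Variance of the first m values, for m = 2, ..., length(x).
running_variance <- function(x) {
  sapply(2:length(x), function(m) var(x[1:m]))
}

set.seed(1)
n <- 2000
rv_normal <- running_variance(rnorm(n))    # expected to settle near 1
rv_cauchy <- running_variance(rcauchy(n))  # not expected to settle

# Variance estimated from the first 500, 1000 and 2000 values.
print(rv_normal[c(499, 999, 1999)])
print(rv_cauchy[c(499, 999, 1999)])

# Plot both curves only in an interactive session.
if (interactive()) {
  plot(2:n, rv_cauchy, type = "l", log = "y",
       xlab = "sample size", ylab = "running variance")
  lines(2:n, rv_normal, lty = 2)
}
```

With real data, a running variance that keeps jumping as new observations arrive is a warning sign, although a finite record can never prove that the variance is infinite.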




Null Hypothesis

We will have a chance to look at hypothesis testing in detail in future lectures. However, it is worth remembering the following:

• Generally, it is not possible to definitively prove anything. One can only attempt to prove that a hypothesis is not true.

• Let $H_0$ be the (null) hypothesis to be tested. If $H_0$ cannot be rejected, then one can affirm that “it is true” with a certain degree of confidence.



Other Statistics: Covariance

Given two datasets, for example:

$$h_i = \{h_1, \cdots, h_n\} \quad \text{and} \quad l_i = \{l_1, \cdots, l_n\}$$

the covariance between these two datasets is defined as:

$$Cov(h_i, l_i) := \frac{1}{n-1} \sum_{i=1}^{n} (l_i - \bar{l})(h_i - \bar{h})$$



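Other Statistics: Correlation

Given two datasets, for example:

$$h_i = \{h_1, \cdots, h_n\} \quad \text{and} \quad l_i = \{l_1, \cdots, l_n\}$$

the correlation between these two datasets is defined as:

$$\rho_{lh} := \frac{Cov(l, h)}{\sigma_h \, \sigma_l}$$

where $\sigma_h$ and $\sigma_l$ are the standard deviations of the two datasets.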
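Covariance and correlation can be computed by hand and checked against R's built-in `cov()` and `cor()`. The two short series below are invented, and the sketch assumes the unbiased (n-1) forms, which are the ones R uses.

```r
# Two hypothetical datasets of equal length.
h <- c(1, 2, 3, 4, 5)
l <- c(2, 4, 5, 4, 5)
n <- length(h)

# Covariance: mean product of the deviations, with the n-1 correction.
cov_hl <- sum((l - mean(l)) * (h - mean(h))) / (n - 1)

# Correlation: covariance scaled by the two standard deviations.
rho_hl <- cov_hl / (sd(h) * sd(l))

print(cov_hl)     # 1.5
print(cov(h, l))  # same value from the built-in function
print(rho_hl)
print(cor(h, l))  # same value from the built-in function
```

Unlike the covariance, the correlation is dimensionless and always lies between -1 and 1.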



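Other Statistics: Autocorrelation

Please observe that one can consider the correlation between two sample series of equal length taken from the same dataset:

$$h_i = \{h_1, \cdots, h_{n-1}\} \quad \text{and} \quad h_{i+1} = \{h_2, \cdots, h_n\}$$

Resulting in:

$$Cov(h_i, h_{i+1}) := \frac{1}{N-1} \sum_{j=1}^{n-1} (h_j - \bar{h}_i)(h_{j+1} - \bar{h}_{i+1})$$

where $\bar{h}_i$ and $\bar{h}_{i+1}$ are the means of the two series and N is their common length (here N = n-1).

Repeating this operation for series which are gradually reduced in length and separated by r instants, the resulting series are:

$$h^r_i = \{h_1, \cdots, h_{n-r}\} \quad \text{and} \quad h_{i+r} = \{h_{1+r}, \cdots, h_n\}$$

From which (now with N = n-r):

$$Cov(h^r_i, h_{i+r}) := \frac{1}{N-1} \sum_{j=1}^{n-r} (h_j - \bar{h}^r_i)(h_{j+r} - \bar{h}_{i+r})$$

$$\rho(h^r_i, h_{i+r}) := \frac{Cov(h^r_i, h_{i+r})}{\sigma^r_i \, \sigma_{i+r}}$$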
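The lagged correlation can be sketched in R by pairing a series with a copy of itself shifted by r instants. The function name `lag_correlation` and the test series are illustrative; note that R's built-in `acf()` uses a slightly different estimator (a single overall mean and variance), so its values need not coincide with this one.

```r
# Correlation between a series and itself shifted by r instants.
lag_correlation <- function(h, r) {
  n <- length(h)
  head_series <- h[1:(n - r)]   # h_1, ..., h_(n-r)
  tail_series <- h[(1 + r):n]   # h_(1+r), ..., h_n
  cov(head_series, tail_series) / (sd(head_series) * sd(tail_series))
}

# A hypothetical series that alternates perfectly between two values.
h <- c(1, -1, 1, -1, 1, -1, 1, -1)

print(lag_correlation(h, 1))  # -1: each value is the opposite of the previous one
print(lag_correlation(h, 2))  #  1: each value repeats after two instants
```

Plotting this quantity against r gives the autocorrelation function, which shows how quickly the memory of a series decays.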



Random Sampling

Within the strategy of creating and analysing data samples, the selection (or, sometimes, the generation) of random samples plays an important role.

A sample of n events selected from a population is random if it has the same probability of being selected as any other sample of the same size.

If the data are generated, then one is carrying out a random experiment. Some examples of this are:

• tossing a coin;

• counting the rainy days in a year; and

• counting the days when the river flow at the Bridge of San Lorenzo, Trento, is greater than a predetermined value.
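These ideas can be tried out with R's `sample()` function. In the sketch below the coin tosses and the selection from a population use `sample()`, while the daily flows are synthetic log-normal values with an arbitrary threshold; they are assumptions for illustration and not measurements of any river.

```r
set.seed(123)

# Random experiment: tossing a coin 10 times.
tosses <- sample(c("heads", "tails"), size = 10, replace = TRUE)

# Random sample of n = 5 events from a population of 20: every subset of
# size 5 has the same probability of being selected.
population <- 1:20
one_sample <- sample(population, size = 5)

# Counting the days on which a (synthetic) daily flow exceeds a threshold.
flow <- rlnorm(365, meanlog = log(200), sdlog = 0.5)  # m^3/s, invented
threshold <- 400
days_above <- sum(flow > threshold)

print(tosses)
print(one_sample)
print(days_above)
```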



Simulation 2

Let us consider another example where sample variability is illustrated as follows:

1. the same population as in the previous example shall be used (N = 4);

2. by means of the computer programme R, 50,000 samples of size n = 2 will be drawn, with replacement, from the population;

3. the mean will be calculated for each of these samples of size n = 2;

4. the mean and variance of the distribution of the means of the 50,000 samples of size n = 2 will be calculated.



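    N <- 4                  # number of distinct values in the population
    n <- 2                  # size of each sample
    nSamples <- 50000       # number of samples to draw
    X <- c(2, 3, 5, 9)      # values of the marbles

    # Mean and variance of the population; var() divides by N-1,
    # so it is rescaled to the population (1/N) form.
    Mean <- mean(X)
    Var <- var(X)*(N-1)/N

    # 50,000 samples are drawn with replacement; the mean of each is stored.
    SampDistr <- rep(0, nSamples)
    for (i in 1:nSamples) {
      samp <- sample(X, n, replace=TRUE)
      SampDistr[i] <- mean(samp)
    }

    # Mean and variance (1/nSamples form) of the simulated distribution of the means.
    MeanSampDistr <- mean(SampDistr)
    VarSampDistr <- var(SampDistr)*(nSamples-1)/nSamples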




Results of analysis with R:

    > Mean
    [1] 4.75
    > Var
    [1] 7.1875
    > MeanSampDistr
    [1] 4.73943
    > VarSampDistr
    [1] 3.578548
    > Var/n
    [1] 3.59375



๏ Population: $\mu = 4.75$, $\sigma^2 = 7.1875$

๏ Sample distribution of the means: $\mu_{\bar{x}} = 4.75$, $\sigma^2_{\bar{x}} = 3.59375$

๏ Results of the R simulation: $\hat{\mu}_{\bar{x}} = 4.73943$, $\hat{\sigma}^2_{\bar{x}} = 3.578548$


Thank you for your attention!

G. Ulrici - Man after working on the slides, 2000?

