Statistics with R Chapter 1: Introduction to statistics · Statistics with R Chapter 1: Introduction to statistics TabeaRebafka October 2018 MasterAIMS2018–19 Tabea Rebafka Statistics

Statistics with R

Chapter 1: Introduction to statistics

Tabea Rebafka

October 2018

Master AIMS 2018–19

Tabea Rebafka Statistics with R Introduction to statistics

Outline

1 What is statistics?

2 Example: Coin tossing

3 Refresher on probability theory

4 Statistical modelling


What is statistics? I

What is the aim of statistics? The analysis and interpretation of data (observations, measurements) in order to

- understand an observed phenomenon by statistical inference (i.e. modelling, estimation and testing);
- recover unobserved features (prediction).


What is statistics? II

Statistical approach

- Use a probabilistic model to explain the nature of the data (as opposed to pure data analysis).
- Let $x_1, \dots, x_n$ be the data. A statistician assumes that $(x_1, \dots, x_n)$ is the realization of a random variable $X$ with distribution $P$.
- The distribution $P$ is unknown (as opposed to probability theory, where $P$ is given).


What is statistics? III


Example: Coin tossing I

Data

- Observations: the outcomes of $n$ tosses of the same coin.
- Heads is encoded by 1, tails by 0.
- Data: $x_1, \dots, x_n$ with $x_i \in \{0, 1\}$. The number $n$ is called the sample size.

Probabilistic model

- Consider the $x_i$ as independent realizations of a Bernoulli distribution.
- More precisely, let $X_1, \dots, X_n$ be i.i.d. (independent and identically distributed) random variables with Bernoulli distribution $\mathcal B(p)$ with parameter $p \in (0, 1)$, i.e.

$$P(X_i = 1) = p = 1 - P(X_i = 0).$$

- The Bernoulli parameter $p$ is unknown.


Example: Coin tossing II

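Fit the model

Estimate the Bernoulli parameter $p$ from the data $x_1, \dots, x_n$.

Simple idea: we know that for $X_i \sim \mathcal B(p)$ i.i.d., we have

$$E[X_1] = p \quad\text{and}\quad \bar X_n = \frac{1}{n}\sum_{i=1}^{n} X_i \xrightarrow{P} p \quad (n \to \infty)$$

by the law of large numbers. Use the sample mean as an estimate of $p$:

$$\hat p_n = \bar x_n = \frac{1}{n}\sum_{i=1}^{n} x_i.$$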
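The estimator takes only a few lines. The course itself works in R; the sketch below is in Python, and the helper name `estimate_p`, the simulated "true" value $p = 0.6$ and the random seed are illustrative assumptions, not part of the original material. Simulating tosses with a known $p$ shows $\hat p_n$ approaching $p$ as $n$ grows.

```python
import random

def estimate_p(xs):
    """Sample mean of 0/1 coin-toss data, i.e. the estimate p_hat_n = x_bar_n."""
    if not xs:
        raise ValueError("need at least one observation")
    return sum(xs) / len(xs)

# Hypothetical setting: in practice p is unknown; here we fix it to simulate data.
rng = random.Random(1)
true_p = 0.6
for n in (10, 100, 10_000):
    # n tosses: 1 (heads) with probability true_p, otherwise 0 (tails)
    xs = [1 if rng.random() < true_p else 0 for _ in range(n)]
    print(f"n = {n:>6}: p_hat = {estimate_p(xs):.4f}")
```

For $n = 10$ the estimate can be far from 0.6; for $n = 10\,000$ it is typically within about 0.01 of it.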


Example: Coin tossing III

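Properties of the estimator $\hat p_n$ of $p$

- $\hat p_n = \bar X_n \xrightarrow{P} p$ as $n \to \infty$, i.e. when the sample size $n$ is large, $\hat p_n$ tends to be close to $p$ (consistency).
- $E[\hat p_n] = p$, i.e. on average $\hat p_n$ takes the target value $p$ (unbiasedness).
- Mean squared error (MSE):

$$E[(\hat p_n - p)^2] = \frac{p(1-p)}{n} \longrightarrow 0 \quad (n \to \infty).$$

- Limit distribution and rate of convergence:

$$\sqrt{n}\,(\hat p_n - p) \xrightarrow{d} \mathcal N(0, p(1-p)) \quad (n \to \infty).$$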
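Unbiasedness and the MSE formula can be checked by Monte Carlo simulation: repeat the $n$-toss experiment many times and average $\hat p_n$ and $(\hat p_n - p)^2$ over the repetitions. A Python sketch; the values $p = 0.3$, $n = 50$, the number of repetitions and the seed are arbitrary choices, and the empirical values are only expected to be close to (not equal to) $p$ and $p(1-p)/n$.

```python
import random

def simulate_estimates(p, n, reps, seed=0):
    """Repeat the n-toss experiment `reps` times and return the estimates p_hat_n."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        # one experiment: n i.i.d. Bernoulli(p) tosses, then the sample mean
        successes = sum(1 for _ in range(n) if rng.random() < p)
        estimates.append(successes / n)
    return estimates

p, n, reps = 0.3, 50, 20_000
est = simulate_estimates(p, n, reps)
mean_est = sum(est) / reps                        # approximates E[p_hat_n]
mse_est = sum((e - p) ** 2 for e in est) / reps   # approximates E[(p_hat_n - p)^2]
print(f"mean of p_hat: {mean_est:.4f}   (p = {p})")
print(f"empirical MSE: {mse_est:.5f}  (p(1-p)/n = {p * (1 - p) / n:.5f})")
```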


Example: Coin tossing IV

Quantify the uncertainty of the estimate

Instead of a point estimator $\hat p_n$, compute an interval $I$ that depends on the data (i.e. $I = I(x_1, \dots, x_n)$) and that contains the target $p$ with a given probability $\gamma$ (confidence interval):

$$P(p \in I) \ge \gamma.$$

The length of the interval $I$ indicates the uncertainty about our estimate of $p$.


Example: Coin tossing V

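Confidence interval for $p$ in the Bernoulli model

An asymptotic confidence interval (i.e. $P(p \in I_n) \to \gamma$ as $n \to \infty$) is given by

$$I_n = \left[\hat p_n + q^{\mathcal N}_{\gamma_1}\sqrt{\frac{\hat p_n(1-\hat p_n)}{n}},\; \hat p_n + q^{\mathcal N}_{\gamma_2}\sqrt{\frac{\hat p_n(1-\hat p_n)}{n}}\right]$$

with $\gamma_1 = (1-\gamma)/2$ and $\gamma_2 = (1+\gamma)/2$, where $q^{\mathcal N}_\alpha$ denotes the $\alpha$-quantile of the standard normal distribution $\mathcal N(0, 1)$, defined by

$$P(Z \le q^{\mathcal N}_\alpha) = \alpha \quad\text{for } Z \sim \mathcal N(0, 1).$$

Interval length, using $q^{\mathcal N}_{\gamma_1} = -q^{\mathcal N}_{\gamma_2}$:

$$2\, q^{\mathcal N}_{\gamma_2}\sqrt{\frac{\hat p_n(1-\hat p_n)}{n}}.$$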
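A Python sketch of the interval $I_n$ (the course itself works in R; the function name `wald_interval` and the data are hypothetical). This construction is commonly known as the Wald interval. The quantile $q^{\mathcal N}_{\gamma_2}$ comes from `statistics.NormalDist().inv_cdf` in the standard library, and $q^{\mathcal N}_{\gamma_1}$ follows by symmetry.

```python
from statistics import NormalDist

def wald_interval(xs, gamma=0.95):
    """Asymptotic gamma-confidence interval I_n for p from 0/1 data xs."""
    n = len(xs)
    p_hat = sum(xs) / n
    # q_{gamma_2}: the (1 + gamma)/2 quantile of N(0, 1); q_{gamma_1} = -q_{gamma_2}
    q = NormalDist().inv_cdf((1 + gamma) / 2)
    half_length = q * (p_hat * (1 - p_hat) / n) ** 0.5
    return p_hat - half_length, p_hat + half_length

# Hypothetical data: 60 heads and 40 tails
xs = [1] * 60 + [0] * 40
lo, hi = wald_interval(xs)
print(f"[{lo:.3f}, {hi:.3f}]")
```

With 60 heads out of 100 tosses and $\gamma = 0.95$ this prints `[0.504, 0.696]`. Because the interval is only asymptotic, its actual coverage for small $n$, or for $\hat p_n$ close to 0 or 1, can fall noticeably below $\gamma$.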


Example: Coin tossing VI

Statistical testing

Answer questions such as: is the coin fair?

- Mathematically speaking: is $p = 1/2$ or $p \neq 1/2$?
- Estimate $p$ and evaluate the uncertainty of the estimate.
- If the estimate is too far away from $1/2$, decide that $p \neq 1/2$. Otherwise, keep the hypothesis that $p = 1/2$.
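One way to make "too far away" precise, sketched here under our own assumption rather than as the course's definition of a test, is to decide $p \neq 1/2$ whenever $1/2$ falls outside the asymptotic $\gamma$-confidence interval for $p$. A Python sketch; the function name `looks_unfair` and the toss counts are hypothetical.

```python
from statistics import NormalDist

def looks_unfair(xs, gamma=0.95):
    """Decide 'p != 1/2' iff 1/2 lies outside the asymptotic gamma-confidence interval."""
    n = len(xs)
    p_hat = sum(xs) / n
    q = NormalDist().inv_cdf((1 + gamma) / 2)     # q_{gamma_2}
    half_length = q * (p_hat * (1 - p_hat) / n) ** 0.5
    # "too far from 1/2" = farther than the half-length of the interval
    return abs(p_hat - 0.5) > half_length

print(looks_unfair([1] * 55 + [0] * 45))  # False: keep p = 1/2
print(looks_unfair([1] * 70 + [0] * 30))  # True: decide p != 1/2
```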


Refresher on probability theory

Definition

A sample space is any finite or infinite set $\Omega$ (it is thought of as the set of all possible outcomes of a random experiment). Any subset $A \subset \Omega$ is called an event, including $\Omega$ itself and the empty set $\emptyset$.

Example: Dice

Sample space of rolling a die: $\Omega = \{1, \dots, 6\}$. Some events:

$$A = \{2\}, \quad B = \{2, 4, 6\} = \{\text{the result is even}\}, \quad C = \emptyset, \quad D = \Omega.$$


Probability measures I

Basically, a probability measure assigns a probability to every event.

Definition

Let $\Omega$ be a sample space. A probability measure $P$ on $\Omega$ is a map

$$P : \text{Events} \to [0, 1]$$

such that

- $P(\emptyset) = 0$ and $P(\Omega) = 1$;
- (countable additivity) for every sequence of disjoint events $A_1, A_2, \dots$,

$$P\Big(\bigcup_{n \ge 1} A_n\Big) = \sum_{n \ge 1} P(A_n).$$

A pair (Ω,P) is called a probability space.


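Probability measures II

Examples

- The uniform measure on a finite set $\Omega$ is defined by

$$\mu(A) = \frac{\operatorname{card}(A)}{\operatorname{card}(\Omega)}.$$

- The Dirac measure (or Dirac mass) at some point $a$, denoted by $\delta_a$, puts all the mass on $a$:

$$\delta_a(A) = \begin{cases} 1 & \text{if } a \in A,\\ 0 & \text{otherwise.}\end{cases}$$

- The Lebesgue measure on $\mathbb R$ is the measure $\lambda$ that assigns to each interval $[a, b]$ its length:

$$\lambda([a, b]) = b - a.$$

It is not a probability measure, since its values are not restricted to $[0, 1]$ (for instance, $\lambda([0, 2]) = 2$).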
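On the die example, the uniform measure and Dirac measures can be written as plain Python functions on events represented as sets. A sketch; exact fractions (`fractions.Fraction`) avoid rounding, and the names `uniform` and `dirac` are our own.

```python
from fractions import Fraction

OMEGA = frozenset(range(1, 7))  # sample space of one roll of a die

def uniform(A):
    """Uniform measure on OMEGA: card(A) / card(OMEGA), as an exact fraction."""
    if not A <= OMEGA:
        raise ValueError("A must be a subset of OMEGA")
    return Fraction(len(A), len(OMEGA))

def dirac(a):
    """Dirac measure delta_a: maps an event A to 1 if a is in A, else 0."""
    return lambda A: 1 if a in A else 0

A, B, C, D = frozenset({2}), frozenset({2, 4, 6}), frozenset(), OMEGA
print(uniform(A), uniform(B), uniform(C), uniform(D))  # 1/6 1/2 0 1
print(dirac(2)(B), dirac(3)(B))                         # 1 0
```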


Probability measures III

Proposition

Let $(\Omega, P)$ be a probability space.

(i) If $A \subset B$, then $P(A) \le P(B)$.
(ii) For any event $A$, $P(A^c) = 1 - P(A)$.
(iii) For any events $A, B$,

$$P(A \cup B) = P(A) + P(B) - P(A \cap B),$$

in particular $P(A \cup B) \le P(A) + P(B)$.
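For a finite sample space, properties (i) to (iii) can be verified exhaustively. The Python sketch below does this for the uniform measure on a die: it checks every one of the $2^6 = 64$ events and every pair of events. This only illustrates the proposition in one probability space; it is not a proof.

```python
from fractions import Fraction
from itertools import chain, combinations

OMEGA = frozenset(range(1, 7))  # fair die

def P(A):
    """Uniform probability measure on OMEGA."""
    return Fraction(len(A), len(OMEGA))

def all_events(omega):
    """All 2^card(omega) subsets of omega, as frozensets."""
    items = sorted(omega)
    subsets = chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))
    return [frozenset(s) for s in subsets]

events = all_events(OMEGA)
for A in events:
    assert P(OMEGA - A) == 1 - P(A)                     # (ii) complement
    for B in events:
        if A <= B:
            assert P(A) <= P(B)                         # (i) monotonicity
        assert P(A | B) == P(A) + P(B) - P(A & B)       # (iii) inclusion-exclusion
        assert P(A | B) <= P(A) + P(B)
print(f"(i)-(iii) verified for all {len(events)} events")  # 64 events
```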


Probability measures IV

Proposition

(iv) (Union bound) More generally, for any sequence of sets $(A_n)_{n \ge 1}$ (not necessarily disjoint),

$$P\Big(\bigcup_{n \ge 1} A_n\Big) \le \sum_{n \ge 1} P(A_n).$$

(v) (Law of total probability) Let $A$ be an event and $B_1, B_2, \dots$ a sequence of disjoint sets such that $\bigcup_{n \ge 1} B_n = \Omega$. Then

$$P(A) = \sum_{n \ge 1} P(A \cap B_n).$$
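The union bound and the law of total probability hold for any probability measure, not only uniform ones. The Python sketch below checks them on a hypothetical loaded die (the weights are invented for illustration and sum to 1), using overlapping events for (iv) and the partition $\{1,2\}, \{3,4\}, \{5,6\}$ for (v).

```python
from fractions import Fraction

# Hypothetical loaded die: probabilities of the outcomes 1..6 (they sum to 1)
WEIGHTS = {1: Fraction(1, 12), 2: Fraction(1, 12), 3: Fraction(1, 6),
           4: Fraction(1, 6), 5: Fraction(1, 4), 6: Fraction(1, 4)}
OMEGA = frozenset(WEIGHTS)

def P(A):
    """Probability of an event A (a subset of OMEGA) under the loaded die."""
    return sum((WEIGHTS[w] for w in A), Fraction(0))

# (iv) union bound with overlapping events
events = [frozenset({1, 2}), frozenset({2, 3}), frozenset({3, 4, 5})]
union = frozenset().union(*events)
print(P(union), "<=", sum(P(E) for E in events))    # 3/4 <= 1

# (v) law of total probability with the partition {1,2}, {3,4}, {5,6}
partition = [frozenset({1, 2}), frozenset({3, 4}), frozenset({5, 6})]
A = frozenset({2, 4, 6})                             # "the result is even"
print(P(A), sum(P(A & Bn) for Bn in partition))      # 1/2 1/2
```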


Random variables I

From now on, we work on a fixed probability space $(\Omega, P)$, where $P$ is a probability measure.

Elements of $\Omega$ are often denoted by $\omega$.

Definition

Any function $X : \Omega \to \mathbb R$ is a random variable.


Random variables II

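Example: Indicator function

For a given event $A$, the indicator function of $A$ is denoted by $\mathbb 1_A$ and defined as

$$\mathbb 1_A(\omega) = \begin{cases} 1 & \text{if } \omega \in A,\\ 0 & \text{otherwise.}\end{cases}$$

Indicator functions are closely related to Dirac measures, since $\mathbb 1_A(\omega) = \delta_\omega(A)$.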
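The identity $\mathbb 1_A(\omega) = \delta_\omega(A)$ says that the same 0/1 quantity can be read either as a function of the outcome (fixed $A$) or as a function of the event (fixed $\omega$). A Python sketch on the die; `indicator` and `dirac` are our own helper names.

```python
OMEGA = range(1, 7)  # outcomes of one roll of a die

def indicator(A):
    """Indicator function 1_A: maps an outcome omega to 1 if omega is in A, else 0."""
    return lambda omega: 1 if omega in A else 0

def dirac(a):
    """Dirac measure delta_a: maps an event A to 1 if a is in A, else 0."""
    return lambda A: 1 if a in A else 0

B = {2, 4, 6}
print([indicator(B)(w) for w in OMEGA])                   # [0, 1, 0, 1, 0, 1]
# 1_B(omega) = delta_omega(B) for every outcome omega
print(all(indicator(B)(w) == dirac(w)(B) for w in OMEGA))  # True
```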


Random variables III

Definition

The distribution or law of $X$, denoted by $P_X$, is the probability measure on $\mathbb R$ such that for any set $A \subset \mathbb R$

$$P_X(A) = P(\{\omega : X(\omega) \in A\}) = P(X \in A).$$

We write $X \sim P_X$.

The cumulative distribution function (or simply distribution function) of $X$ is the function $F_X : \mathbb R \to [0, 1]$ defined by

$$F_X(t) = P(X \le t) \quad\text{for every } t \in \mathbb R.$$

Theorem

$X$ and $Y$ have the same law $\iff$ $F_X(t) = F_Y(t)$ for every $t$.
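Pushing the uniform measure on a die forward through two different functions illustrates both the definition of $P_X$ and the theorem. $X$ = "the roll is even" and $Y$ = "the roll is at least 4" are different random variables, yet both have law $\mathcal B(1/2)$ and hence the same distribution function. A Python sketch; the choice of $X$ and $Y$ is our own illustration.

```python
from fractions import Fraction

OMEGA = range(1, 7)           # outcomes of one roll of a fair die
P_OUTCOME = Fraction(1, 6)    # uniform measure: each outcome has probability 1/6

def law(X):
    """Law P_X of a random variable X on OMEGA, as a dict {value: probability}."""
    dist = {}
    for omega in OMEGA:
        dist[X(omega)] = dist.get(X(omega), Fraction(0)) + P_OUTCOME
    return dist

def cdf(X, t):
    """F_X(t) = P(X <= t) = P({omega : X(omega) <= t})."""
    return sum((P_OUTCOME for omega in OMEGA if X(omega) <= t), Fraction(0))

def X(omega):
    return 1 if omega % 2 == 0 else 0   # 1 if the roll is even

def Y(omega):
    return 1 if omega >= 4 else 0       # 1 if the roll is at least 4

print(X(2), Y(2))                        # 1 0: X and Y are different functions ...
print(law(X) == law(Y))                  # True: ... but they have the same law
print([cdf(X, t) == cdf(Y, t) for t in (-1, 0, 0.5, 1, 2)])  # all True
```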


Random variables IV

Properties of the distribution function

(i) $F_X$ is non-decreasing.
(ii) $F_X$ is right-continuous.
(iii) $\lim_{t \to -\infty} F_X(t) = 0$ and $\lim_{t \to +\infty} F_X(t) = 1$.

Theorem

Any function $F$ with properties (i), (ii) and (iii) above is the distribution function of some random variable.


Discrete distribution I

Definition

We say that $X$ has a discrete distribution if $X$ takes its values in a finite or countable set $\{x_1, x_2, \dots\}$.

Discrete distributions are entirely described by their probability mass function $p(x) = P(X = x)$ for $x \in \{x_1, x_2, \dots\}$.


Discrete distribution II

Examples of discrete distributions

- Bernoulli distribution $\mathcal B(p)$ with parameter $p \in [0, 1]$ and values in $\{0, 1\}$:

$$P(X = 1) = p, \quad P(X = 0) = 1 - p.$$

Model of the success or failure of an experiment.

- Binomial distribution $\mathcal B(n, p)$ with parameters $n \ge 1$ and $p \in [0, 1]$:

$$P(X = k) = \binom{n}{k} p^k (1-p)^{n-k} \quad\text{for } k = 0, 1, \dots, n.$$

Model of the number of successes in $n$ independent Bernoulli trials.
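Both probability mass functions are direct translations of the formulas. A Python sketch using `math.comb` for the binomial coefficient, with the parameters $n = 8$, $p = 0.4$ chosen for illustration. For $n = 1$ the binomial pmf reduces to the Bernoulli pmf.

```python
from math import comb

def bernoulli_pmf(x, p):
    """P(X = x) for X ~ B(p); zero outside {0, 1}."""
    if x == 1:
        return p
    if x == 0:
        return 1 - p
    return 0.0

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p); zero outside {0, ..., n}."""
    if not 0 <= k <= n:
        return 0.0
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 8, 0.4
probs = [binomial_pmf(k, n, p) for k in range(n + 1)]
print([round(q, 4) for q in probs])
print(sum(probs))  # 1 up to floating-point rounding
```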


Discrete distribution III

Examples of discrete distributions

- Geometric distribution with parameter $p \in (0, 1]$:

$$P(X = k) = (1-p)^{k-1} p \quad\text{for } k = 1, 2, \dots$$

Model of the number of Bernoulli trials until the first success.

- Poisson distribution with parameter $\lambda > 0$:

$$P(X = k) = e^{-\lambda}\frac{\lambda^k}{k!} \quad\text{for } k = 0, 1, 2, \dots$$

- Discrete uniform distribution on a finite set of values $\{x_1, \dots, x_m\}$:

$$P(X = x_k) = \frac{1}{m} \quad\text{for } k = 1, \dots, m.$$
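A Python sketch of the geometric and Poisson pmfs. Their infinite sums equal 1, so truncated sums should be very close to 1. The simulation illustrates the interpretation of the geometric distribution: repeat Bernoulli($p$) trials until the first success and count the trials. The parameter values $p = 0.3$ and $\lambda = 3$, the truncation points and the seed are arbitrary choices.

```python
import random
from math import exp, factorial

def geometric_pmf(k, p):
    """P(X = k) for the geometric distribution: first success at trial k >= 1."""
    return (1 - p) ** (k - 1) * p if k >= 1 else 0.0

def poisson_pmf(k, lam):
    """P(X = k) for the Poisson distribution with parameter lam > 0."""
    return exp(-lam) * lam**k / factorial(k) if k >= 0 else 0.0

# Both pmfs sum to 1; the truncated sums leave only a negligible tail.
geo_total = sum(geometric_pmf(k, 0.3) for k in range(1, 201))
poi_total = sum(poisson_pmf(k, 3.0) for k in range(0, 101))
print(geo_total, poi_total)

def trials_until_success(p, rng):
    """Run Bernoulli(p) trials until the first success; return the number of trials."""
    k = 1
    while rng.random() >= p:   # failure, which happens with probability 1 - p
        k += 1
    return k

rng = random.Random(0)
draws = [trials_until_success(0.3, rng) for _ in range(50_000)]
print(draws.count(2) / len(draws), geometric_pmf(2, 0.3))  # both close to 0.21
```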


Discrete distribution IV

Bernoulli distribution with parameter p = 0.4

[Figure: probabilities P(X = x) plotted against x, and the CDF plotted against x.]


Discrete distribution V

Binomial distribution with parameters n = 8 and p = 0.4

[Figure: probabilities P(X = x) plotted against x, and the CDF plotted against x.]


Discrete distribution VI

The cumulative distribution function of any discrete distribution is a step function:

- the jumps are located at the values taken by the random variable, and
- the height of each jump equals the associated probability.
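For the binomial distribution $\mathcal B(8, 0.4)$, the step-function shape of $F_X$ can be checked numerically. $F_X$ is constant between two consecutive integers and jumps at each value $k$ by exactly $P(X = k)$. A Python sketch; the evaluation points 2.0, 2.7 and 2.999 are arbitrary.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p), k in {0, ..., n}."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def binomial_cdf(t, n, p):
    """F_X(t) = sum of P(X = k) over the values k <= t (a step function of t)."""
    return sum(binomial_pmf(k, n, p) for k in range(n + 1) if k <= t)

n, p = 8, 0.4
# Flat between integers ...
print(binomial_cdf(2.0, n, p) == binomial_cdf(2.7, n, p))   # True
# ... and jumping at each value k by exactly P(X = k)
jump_at_3 = binomial_cdf(3, n, p) - binomial_cdf(2.999, n, p)
print(abs(jump_at_3 - binomial_pmf(3, n, p)) < 1e-12)       # True
```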


Continuous distribution I

Definition

We say that $X$ has a continuous distribution if $X$ takes its values in $\mathbb R$ (or in an interval of $\mathbb R$) and if there is a non-negative function $f$ such that for any event $A$

$$P(X \in A) = \int_A f(x)\,dx.$$

The function $f$ is called the density of $X$.

Any density function $f$ is non-negative and $\int_{\mathbb R} f(x)\,dx = 1$.

The density entirely describes the distribution of the random variable.


Continuous distribution II

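Examples of continuous distributions

- Uniform distribution $\mathcal U[a, b]$ on $[a, b]$:

$$f(x) = \frac{1}{b-a}\,\mathbb 1_{[a,b]}(x).$$

- Exponential distribution $\mathcal E(\lambda)$ with parameter $\lambda > 0$:

$$f(x) = \lambda \exp(-\lambda x)\,\mathbb 1_{\{x \ge 0\}}.$$

- Normal distribution or Gaussian distribution $\mathcal N(\mu, \sigma^2)$ with parameters $\mu \in \mathbb R$, $\sigma^2 > 0$:

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\exp\Big(-\frac{(x-\mu)^2}{2\sigma^2}\Big).$$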
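The three densities can be coded directly, and the property $\int_{\mathbb R} f(x)\,dx = 1$ can be checked with a simple numerical integration. The Python sketch below uses the midpoint rule, which is our own choice of quadrature, over ranges that contain essentially all of the mass. The parameters match the plotted examples $\mathcal U[-1, 3]$, $\mathcal E(1)$ and $\mathcal N(2, 1)$.

```python
import math

def uniform_pdf(x, a, b):
    """Density of U[a, b]."""
    return 1.0 / (b - a) if a <= x <= b else 0.0

def exponential_pdf(x, lam):
    """Density of E(lam)."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0

def normal_pdf(x, mu, sigma2):
    """Density of N(mu, sigma2); the second parameter is the variance."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

def integrate(f, lo, hi, steps=100_000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))

print(integrate(lambda x: uniform_pdf(x, -1, 3), -2, 4))      # close to 1
print(integrate(lambda x: exponential_pdf(x, 1.0), 0, 40))    # close to 1
print(integrate(lambda x: normal_pdf(x, 2, 1), -8, 12))       # close to 1
```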


Continuous distribution III

Exponential distribution E(1) with parameter λ = 1

[Figure: density plotted against x, and the CDF plotted against x.]


Continuous distribution IV

Normal distribution N(2, 1) with parameters µ = 2 and σ² = 1

[Figure: density plotted against x, and the CDF plotted against x.]


Continuous distribution V

Uniform distribution U[−1, 3] on [−1, 3]

[Figure: density plotted against x, and the CDF plotted against x.]


Continuous distribution VI

The cumulative distribution function of any continuous distribution is continuous.

We have $F_X(t) = \int_{-\infty}^{t} f(x)\,dx$ for all $t$, and

$$f(t) = F_X'(t) \quad\text{for almost all } t.$$
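For the exponential distribution $\mathcal E(1)$, with $F(t) = 1 - e^{-t}$ for $t \ge 0$ and $0$ otherwise, a difference quotient of $F$ recovers the density away from 0. At $t = 0$ the one-sided quotients disagree, so $F$ is not differentiable there. This shows why the relation only holds "for almost all $t$". A Python sketch; the step `h` and the point $t = 1.3$ are arbitrary choices.

```python
import math

LAM = 1.0  # parameter of E(LAM), chosen for illustration

def exp_cdf(t, lam=LAM):
    """F(t) = 1 - exp(-lam t) for t >= 0 and 0 otherwise (cdf of E(lam))."""
    return -math.expm1(-lam * t) if t >= 0 else 0.0

def exp_pdf(t, lam=LAM):
    """Density lam * exp(-lam t) for t >= 0 and 0 otherwise."""
    return lam * math.exp(-lam * t) if t >= 0 else 0.0

h = 1e-6
t = 1.3
central = (exp_cdf(t + h) - exp_cdf(t - h)) / (2 * h)   # approximates F'(1.3)
print(central, exp_pdf(t))

# At t = 0 the cdf has a kink: the one-sided difference quotients disagree,
# so F is not differentiable there (a single point, hence a null set).
left = (exp_cdf(0.0) - exp_cdf(-h)) / h
right = (exp_cdf(h) - exp_cdf(0.0)) / h
print(left, right)
```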


Continuous distribution VII

There exist random variables that are neither discrete nor continuous! For instance, $X = \min\{1, Y\}$ where $Y \sim \mathcal E(1)$ (censored distribution).
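Why is $X = \min\{1, Y\}$ neither discrete nor continuous? It puts positive mass $P(X = 1) = P(Y \ge 1) = e^{-1}$ on the single point 1, so it is not continuous. On $[0, 1)$ it has the continuous part $F_X(t) = 1 - e^{-t}$, so it is not discrete either. The Python sketch below simulates $X$ with `random.expovariate` and also codes this exact cdf. The seed and the number of draws are arbitrary.

```python
import math
import random

rng = random.Random(42)
reps = 100_000
# X = min(1, Y) with Y ~ E(1)
xs = [min(1.0, rng.expovariate(1.0)) for _ in range(reps)]

atom = sum(1 for x in xs if x == 1.0) / reps
print(f"P(X = 1) estimated: {atom:.4f}, exact P(Y >= 1) = e^-1 = {math.exp(-1):.4f}")

def cdf_X(t):
    """Exact cdf of X = min(1, Y): continuous on [0, 1), jump of size e^-1 at t = 1."""
    if t < 0:
        return 0.0
    if t < 1:
        return 1 - math.exp(-t)
    return 1.0
```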


Statistical modelling I

In statistics, the data $x = (x_1, \dots, x_n)$ are considered as a realization of a random vector $X = (X_1, \dots, X_n)$ with distribution $P$. The distribution $P$ is unknown.

Statistical model

We introduce a family $\mathcal P$ of (known) probability distributions and suppose that $P$ belongs to this family, i.e.

$$P \in \mathcal P.$$

$\mathcal P$ is called a statistical model: it is the set of candidate distributions for $P$.


Statistical modelling II

A statistical model $\mathcal P$ is determined by using

- our prior knowledge of the observed phenomenon, and
- tools from descriptive statistics.

Any model is false. A model is only an approximation of reality.

A model is always a trade-off between a precise description of a complex reality and mathematical convenience.


Statistical modelling III

Model parameter

In general, we write $\mathcal P = \{P_\theta, \theta \in \Theta\}$, where $\theta$ is the model parameter and $\Theta$ the parameter set. Denote by $\theta_0 \in \Theta$ the “true value” of the parameter, such that $P = P_{\theta_0}$. The problem of estimating $P$ then becomes the problem of estimating the parameter $\theta_0$ from the data.

Identifiability

The model $\mathcal P$ is said to be identifiable if and only if

$$\forall \theta, \theta' \in \Theta, \quad P_\theta = P_{\theta'} \implies \theta = \theta'.$$


Statistical modelling IV

Example: Coin tossing

The data $x = (x_1, \dots, x_n)$ are considered as a realization of the random vector $X = (X_1, \dots, X_n)$ with $X_i \sim \mathcal B(p)$ i.i.d. and unknown parameter $p \in (0, 1)$. In other words, we suppose that the distribution $P$ of $X$ belongs to the family

$$\mathcal P = \{\mathcal B(p)^{\otimes n}, p \in (0, 1)\}.$$

Here, p is the model parameter.
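By independence, each candidate $\mathcal B(p)^{\otimes n}$ in the model assigns to a 0/1 vector $x$ the probability $\prod_{i=1}^n p^{x_i}(1-p)^{1-x_i} = p^{\sum_i x_i}(1-p)^{n-\sum_i x_i}$. Different values of $p$ give different distributions (already $P_p(X_1 = 1) = p$), so this model is identifiable. A Python sketch; the function name `prob_of_sample` and the observed vector are hypothetical.

```python
def prob_of_sample(xs, p):
    """Probability of the 0/1 vector xs under B(p)^{(x)n}: product over i of p^x_i (1-p)^(1-x_i)."""
    prob = 1.0
    for x in xs:
        prob *= p if x == 1 else 1 - p
    return prob

xs = [1, 0, 1, 1, 0]  # hypothetical observed tosses
for p in (0.2, 0.5, 0.8):
    # each parameter value p gives a different distribution in the family
    print(p, prob_of_sample(xs, p))
```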


Parameter estimation I

How can we estimate $\theta_0$ from the data $x = (x_1, \dots, x_n)$?

Definition

Any function $S = S(x)$ of the data $x$ is called a statistic. Examples: $S_1(x) = 0$ for all $x$; $S_2(x) = \bar x_n$. A statistic is called an estimator of $\theta_0$ if it is intended to approximate $\theta_0$.


Parameter estimation II

There are different estimation approaches depending on the size of the parameter set $\Theta$.

- If $\Theta \subset \mathbb R^d$ (i.e. if $\theta$ is a $d$-vector) for some $d < \infty$, the model is called parametric.
- If no parametrization of $\mathcal P$ exists such that $\Theta$ is of finite dimension, the model is called nonparametric.

Examples of nonparametric models:

- $\mathcal P$ = the set of all probability measures
- $\mathcal P$ = the set of all absolutely continuous probability measures
- $\mathcal P$ = the set of all absolutely continuous probability measures with continuous density

