______________________________ Département TECHNIQUES DE COMMERCIALISATION

Online lessons : ENT, section « outils pédagogiques », platform Claroline, category TC, course « MATHS3 ».

________ MATHEMATICS ________

Semester 3

Probability distributions
Statistical inference

IUT of Saint-Etienne - TC - J.F.Ferraris - Math - S3 - InferStats - CoursEx - Rev2014n - page 1 / 44
Page 1: Probability distributions Statistical inference - Accueiljff-dut-tc.weebly.com/uploads/1/4/7/9/14799044/es3_-_inferstats...Probability distributions Statistical inference ... Introduction


TABLE OF CONTENTS

Introduction and history 3

Lessons and tutorials 5

I Discrete probability laws 5
I-1 Generalities : reminder 5
I-2 Hypergeometric law 6
I-3 Binomial law 8
I-4 Poisson's law 10

II A continuous probability law : the normal law 12
II-1 Convergence of discrete laws 12
II-2 Continuous random variable 13
II-3 Normal law (or Laplace's law) 16
II-4 Approximation of other laws by a normal law 21

III Sampling distributions 23
III-1 Introduction 23
III-2 Sampling distribution of means 23
III-3 Sampling distribution of proportions 24

IV Estimation 26
IV-1 Point estimates 26
IV-2 Estimate of µ by a confidence interval 27
IV-3 Estimate of π by a confidence interval 28

V Statistical hypothesis testing 30
V-1 Chi-squared (χ²) goodness-of-fit tests 31
V-2 Conformance tests 33

Exercises 36

Formula sheet 45


INTRODUCTION AND HISTORY

A quick story of the normal law

In the late 17th century, Jacques Bernoulli opened the way to the binomial law, calculating the chances of success when a given experiment is performed several times. His manual calculations became horribly complicated as the numbers grew, owing to the length of the factorial computations.

In the first half of the 18th century, Abraham de Moivre, working on the calculus of chances, discovered a formula that gives (approximately) the factorial of a natural number n :

Stirling-de Moivre formula : n! ≈ n^n e^(-n) √(2πn)   (for n > 8, the deviation is below 1 %; as n increases, the percentage of deviation decreases)

This formula was then improved by Euler in the middle of the century, leading to an equality :

n! = ∫ from 0 to +∞ of x^n e^(-x) dx

The function inside the integral has a typical "bell" curve, whose apex coordinates are (n , n^n e^(-n)). Laplace later gave a new proof of this formula, using Euler's works.

With Euler, and then with Laplace and Legendre, a new theory was developed : the theory of errors (born to simplify astronomers' work). Among several fluctuating measures of the same object or phenomenon (fluctuations due to errors, lack of sharpness, dilatation of materials, …), what unique value could be considered representative of reality ? Laws of distribution thus had to be created : distributions of values and distributions of sample means.

There are infinitely many such distributions of values, one for each concrete example. The general case of the theory of errors is still an unsolved problem today.

Between 1790 and 1800, Gauss, the "prince of mathematicians", invented and developed the least-squares method. He applied it to the theory of errors, arguing that the best representative value of a series of values x_i is the one, x̄, that minimises Σ(x_i - x̄)².

This way, for simple distributions, x̄ turns out to be the arithmetic mean of the x_i ; this result also holds for bell-shaped distributions (which are generally typical of a distribution of sample means, with same-sized samples taken from the same parent population).

These works are the only ones in which Gauss mentions the now famous "bell curve" ; but he didn't draw one, and its function already existed, which is why calling it the "Gauss curve" is inappropriate.
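Gauss's claim, that the arithmetic mean minimises the sum of squared deviations, can be checked numerically; a minimal sketch with made-up measures (the data are illustrative, not from the text):

```python
# Check that the arithmetic mean minimises c -> sum((x_i - c)^2).
xs = [3.1, 2.9, 3.4, 3.0, 3.2]   # hypothetical repeated measures
mean = sum(xs) / len(xs)

def sse(c):
    """Sum of squared deviations from the candidate value c."""
    return sum((x - c) ** 2 for x in xs)

# Scan candidate representative values on a fine grid and keep the best one:
best = min((i / 1000 for i in range(2500, 3701)), key=sse)
print(best, mean)   # both close to 3.12
```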

Laplace soon objected to these works of Gauss : if a bell-shaped population leads to a bell-shaped sample distribution, nothing is said about the numerous concrete situations whose population doesn't behave this way (bell curve). According to Laplace, Gauss's works are only theoretical thoughts and, moreover, circular (bell leads to bell… because it's bell !).

In the 1810s, he demonstrated that if the values are uniformly distributed on an interval (a constant probability density on an interval of mean x̄), then the distribution of the means of n-sized samples (n big enough) is bell-shaped, with mean x̄ and standard deviation about x̄/√(3n).

He then stated a theorem that is the cornerstone of statistical inference :

Laplace's theorem (nowadays the central limit theorem) :
Whatever the distribution of the values, for n big enough, the distribution of the means of the n-sized samples is normal (bell curve) ; its mean is the arithmetic mean of the values, and its standard deviation is given by a simple formula (which always looks like the one given above).

Thus he created Laplace's law, that is to say the normal law.

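Laplace's theorem can be illustrated by simulation; a sketch (sample size and counts chosen arbitrarily) drawing values uniformly on [0, 2], whose mean x̄ is 1, so the theorem predicts sample means spread with standard deviation about x̄/√(3n):

```python
import random
import statistics

random.seed(0)
n = 48                      # size of each sample
num_samples = 20_000        # number of samples drawn

# Means of n-sized samples from the uniform distribution on [0, 2]:
means = [statistics.fmean(random.uniform(0, 2) for _ in range(n))
         for _ in range(num_samples)]

obs_mean = statistics.fmean(means)    # close to 1
obs_sd = statistics.pstdev(means)     # close to 1 / sqrt(3 * n)
print(obs_mean, obs_sd)
```

A histogram of `means` would show the bell shape even though the underlying values are uniform.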


In the 19th century the profession of statistician appeared (in every field, people need to know how a population behaves). The most famous and prolific at this time was the Belgian Adolphe Quételet, who published an analysis of Laplace's philosophy, along with numerous concrete data series showing bell-shaped distributions (for instance the "chest sizes of 4000 Scottish soldiers", whose distribution fits the theoretical normal curve almost perfectly. Indeed, the chest size of a man results from the sum of several random and independent factors - genetics, education, feeding, activity, … - and Laplace's theorem asserts that the distribution of a sum, like that of a mean, is normal !)

It should also be reported that Quételet was the first to draw one of these famous normal bell curves ! (Neither Gauss nor Laplace felt the need to draw one while working on the theory.)

Not everything is necessarily normal

During the second half of the 19th century, statisticians showed that not everything is normally distributed (the symmetry of the normal law isn't representative of everything that happens in the world). Consequently, other continuous or discrete laws were created to model various concrete situations. For instance :

* Poisson's law, quite asymmetrical, for rare events,
* Pareto's law for income distributions, asymmetrical too,
* the exponential law and others based on the same model, for lifetimes, asymmetrical again, …

Other laws had been created before the normal law existed :

* the uniform law, where the probability of any value is the same (throw of a die),
* the binomial law (from Bernoulli),
* the geometric law, dealing with the number of tries you need to get your first success (binomial setting),
* the hypergeometric law, similar to the binomial law, but in which repetition isn't allowed, ...

In the early 20th century, laws of higher order were built, dealing with more than one variable, or involving degrees of freedom :

* Student's law (sampling distribution of means, built with two variables : mean and standard deviation),
* the χ² law - "chi-squared" - (to evaluate the differences between a theoretical law and a real distribution).

At this time, English statisticians like Pearson, Student (William Sealy Gosset) or Fisher began to develop a true methodology in statistics, that is to say a well-formalized theory of inference (drawing conclusions about a population while knowing only one of its samples), by creating new probability laws to describe phenomena.

Between 1900 and 1950 they imposed an "objectivist" or "frequentist" interpretation of the concept of probability. From the 1950s, a school known as "neo-Bayesian" argued that statistical inference shouldn't be based on the collected data alone, but also needs the knowledge and use of underlying probabilistic models. This is the "subjectivist" school.

Calculation tools are increasingly powerful

With data processing (computing), a new practice took off : "multidimensional data analysis". It consists in describing, sorting and simplifying large sets of collected data (e.g. a survey of 3000 persons, with 80 pieces of information collected on each). Observed and cross-tabulated results may suggest laws (already existing or not), models or explanations, sparing statisticians from having to compare the data with arbitrary, previously created laws.


LESSONS AND TUTORIALS

I. Discrete probability distributions

I.1 General case : reminder

What you must remember from the last chapter…

Consider an object or a set of objects and conceive a random experiment on it, whose outcomes form a sample space partitioned into a certain number of events.

e.g. :
objects : two dice
experiment : roll them, then calculate their sum
sample space : Ω = {2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12} (non-equally likely outcomes)
partition of Ω : E1 : "less than 7" ; E2 : "from 7 to 10" ; E3 : "11 or 12"

Each event Ei can be associated with a real value x_i : a gain. These gains are reached at random, like each (unpredictable) outcome of the experiment ; in this game, "gain" is therefore called a random variable, written X.

e.g. :
events :       E1   E2   E3
gain X (€) :   -3    1    5

For each value of the gain, we must be able to calculate the probability of the associated event. That's called "getting the probability law of X".

e.g. (former chapter, tutorial 6) :
gain X (€) :          -3      1      5
p_i or p(X = x_i) :  15/36  18/36   3/36

Interpretation and purpose of these probabilities :

If you play this game many times, you will lose or win approximately in the proportions announced by these probabilities. With our example : every 36 games, you will have on average 15 losses of 3 €, 18 gains of 1 € and 3 gains of 5 € ; combining them, a global loss of 12 €, on average, every 36 games. This global result can be expressed for one game : 12/36 ≈ 0.33. Playing long-term, you'll have an average loss of 33 cents per try. This value is called the expected value of X : E(X).

This expected value of the gain is in every case calculated with a single formula :

E(X) = Σ p_i·x_i   (sum for i from 1 to n)

where n is the number of different values of X.

These long-term forecasts allow us to regard the former array as a statistical series in which the probabilities play the role of real frequencies of occurrence of the gains (though they are, short-term, only "ideal" frequencies). In this context, the array can be interpreted in a statistical way, giving for instance a standard deviation of X, σ(X) :

V(X) = Σ p_i·x_i² - [E(X)]²   (sum for i from 1 to n)   = E(X²) - [E(X)]²   ;   σ(X) = √V(X)
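As a check, the formulas above applied to the dice-game example (gains -3, 1, 5 with probabilities 15/36, 18/36, 3/36):

```python
from math import sqrt

xs = [-3, 1, 5]                  # gains
ps = [15/36, 18/36, 3/36]        # their probabilities

E = sum(p * x for p, x in zip(ps, xs))              # expected value
V = sum(p * x**2 for p, x in zip(ps, xs)) - E**2    # variance
sigma = sqrt(V)

print(E)        # -1/3: an average loss of about 0.33 EUR per game
print(sigma)    # standard deviation of the gain
```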


I.2 Hypergeometric law

Its study will be restricted to a simple partition of Ω into two subsets (one event and its contrary).

I.2.1 Definition and implementation

The distribution of a random variable X is hypergeometric iff :

* an experiment is performed n times with no repetition of an outcome, leading to combinations ;
* X is defined as the number of successes obtained after the n attempts.

Given a sample space Ω containing N outcomes, partitioned into two events : A, containing a outcomes, called "success", and A̅, containing the N - a remaining outcomes, called "failure". An experiment is performed n times, with no possibility of repeating an outcome (hence n ≤ N). Once finished, we will have obtained :

* k successes, random, at most n (number of attempts) and at most a (total number of available success outcomes),
* and n - k failures, at most N - a (total number of available failure outcomes).

Defining X as the random variable « number of successes after n attempts », the probability distribution of X is hypergeometric, with parameters n, a and N. We write it : H(n , a , N).

I-2.2 Calculating probabilities

Total number of possibilities when drawing n outcomes among N : C(N, n)
Number of possibilities giving k successes and n - k failures : C(a, k) × C(N-a, n-k)

Therefore, the probability of getting k successes is :

p(X = k) = [C(a, k) × C(N-a, n-k)] / C(N, n)

I-2.3 Mean and variance

The expected value of X and its variance are :

E(X) = n × (a/N)
V(X) = n × (a/N) × ((N-a)/N) × ((N-n)/(N-1))

Writing "p" and "q" for the probabilities of success and failure at the first attempt, we can simplify these formulas a little. As p = a/N and q = (N-a)/N :

E(X) = np
V(X) = npq × (N-n)/(N-1)


T1 : Recognize and work with a hypergeometric distribution

8 different letters are drawn from our alphabet ; then we look at them. A "success" letter is a vowel. The random variable X is the number of successes among the 8 drawn letters.

a. Justify that the probability distribution of X is hypergeometric and give its parameters.
b. Calculate p(X = 0), p(X = 3), p(X = 8).
c. Calculate the expected value and the standard deviation of X ; comment on E(X).
d. Build a bar chart of this probability distribution. (First, build a list of values with your calculator.)
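A sketch of the T1 computation, assuming a 26-letter alphabet with 6 vowels (a, e, i, o, u, y; the exact vowel count is not stated in the exercise):

```python
from math import comb, sqrt

n, a, N = 8, 6, 26          # attempts, available successes, population size

def p(k):
    """Hypergeometric probability p(X = k) for H(n, a, N)."""
    return comb(a, k) * comb(N - a, n - k) / comb(N, n)

E = n * a / N
V = n * (a / N) * ((N - a) / N) * ((N - n) / (N - 1))

print(p(0), p(3), p(8))     # p(X = 8) is 0: only 6 vowels are available
print(E, sqrt(V))           # expected value and standard deviation
```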


I.3 Binomial law

I.3.1 Definition and implementation

The distribution of a random variable X is binomial iff :

* the experiment is conducted n times with possible repetition of an outcome, through a partition of Ω into one event (success) and its contrary (failure) ;
* X counts the number of successes reached after the n attempts.

a. Bernoulli's scheme

Consider a random experiment leading to a sample space Ω. The event A, called success, has a probability of occurring, p(A), written p. Its contrary, called failure, has probability q = 1 - p. The choice tree we can draw from this situation is called a Bernoulli scheme.

b. Binomial law

Consider an experiment following a Bernoulli scheme, and repeat it n times under identical conditions, i.e. p is constant. Let X be the random variable that counts the number k of successes at the end of the n tries. The probability law of X is a binomial law with parameters n and p : B(n ; p).

I.3.2 Calculating a probability

A tree shows why the general formula below is relevant. In this example, the experiment is performed three times : n = 3 ; A is the success.

path       value of X (# of successes)   probability of the intersection
A A A                 3                    p³
A A A̅                 2                    p²q
A A̅ A                 2                    pqp = p²q
A A̅ A̅                 1                    pq²
A̅ A A                 2                    qp² = p²q
A̅ A A̅                 1                    qpq = pq²
A̅ A̅ A                 1                    q²p = pq²
A̅ A̅ A̅                 0                    q³

The probability of the event A is invariable : p ; likewise its contrary : q = 1 - p. The possible numbers of successes after 3 tries, the values of X, are matched with the probabilities of all possible intersections of events, on the right of the tree.

The probability that X = 1 is the sum of the probabilities matching the value 1, which are all equal to pq². We can write : p(X = 1) = 3 × pq². (Why are there, in the tree, exactly 3 paths leading to X = 1 ?)

More generally, the probability of getting k successes is :

p(X = k) = C(n, k) × p^k × q^(n-k)

I.3.3 Mean and variance

In this context, they are given by simple formulas : E(X) = np and V(X) = npq, so σ(X) = √(npq).

I.3.4 Approximation of a hypergeometric law by a binomial law

When N ≥ 20n, the law H(n , a , N) can be accurately approximated by the law B(n , p) with p = a/N.



T2 : Recognize and work with a binomial distribution

A wheel (roulette) is divided into 14 same-sized sectors : 5 sectors are white and the others are red. After spinning the wheel, the success is : "it stops on a white sector". The random variable X gives, after 8 successive spins, the total number of successes.

a. Explain why the probability distribution of X is binomial and give its parameters.
b. Calculate p(X = 2). On your calculator, compute the list of probabilities of each possible value of X.
c. Graph these results (bar diagram of probabilities).
d. Calculate the expected value and the standard deviation of X, then interpret these values.

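A sketch of the T2 computation for B(8, 5/14):

```python
from math import comb, sqrt

n, p = 8, 5 / 14            # spins, probability of landing on white
q = 1 - p

def prob(k):
    """Binomial probability p(X = k) for B(n, p)."""
    return comb(n, k) * p**k * q**(n - k)

print(prob(2))                       # p(X = 2)
print(n * p, sqrt(n * p * q))        # E(X) and sigma(X)
```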


I.4 Poisson's law

I.4.1 Why it was created

In many cases, the number of different values that a variable X can take is very large. So, calculating a probability may involve very large numbers of combinations (and also large powers if the law is binomial), which even a calculator might not handle. Moreover, when a success is a rare event, not every result is useful : there is little point in computing the extremely low probabilities of many unrealistic situations implying a large number of successes (unrealistic because very far from the small expected number of successes).

In the context of a binomial law with a small value of p, another formula will be used instead of the binomial one, based on a Poisson's law, whose results turn out to be close enough to reality.

Concrete examples of use :

* Examining a sample taken from a large quantity of products, or a large harvest, when the probability p that an element is defective is low. Here, the n elements of the sample are taken among N elements without possible repetition, which gives a hypergeometric law ; but n is very small compared to N, so we can treat the situation as if repetition were allowed. This case can therefore be handled by a binomial law, whose results will be reliable. Moreover, the low value of p allows us to use a Poisson's law instead of the binomial law.
* Problems of queue length.
* Predicting a maximum number of accidents, failures or other rare events concerning a large population (for insurance companies, or the study of rare diseases, for instance).

I.4.2 Definition, calculating a probability

This law is designed for a discrete variable X whose values are infinite in number : all the natural numbers (0, 1, 2, … "until" infinity). The probability of a value is here given by :

p(X = k) = e^(-λ) × λ^k / k!

where k is the number of successes, in theory between 0 and infinity ;
e is the exponential number ;
λ is the average of X, i.e. λ = E(X).

The probability law of X is the Poisson's law with unique parameter λ : P(λ).

Using a Poisson's law in an exercise must be justified :

* either the exercise states that the law is a Poisson's one ;
* or a binomial law logically leads to the corresponding Poisson's law (I-3.4).

I.4.3 Mean and variance

In this context, the parameters are known instantly : E(X) = λ and V(X) = λ.

I-4.4 Approximation of other laws by a Poisson's law

Given a random variable X distributed by B(n , p) : for n "big enough" (n > 30) and p "weak" (p ≤ 0.1), such that npq ≤ 10, the law B(n , p) can be approximated by the law P(λ) where λ = E(X) = np.

Given a random variable X distributed by H(n , a , N) : for N ≥ 20n, n > 30 and p = a/N ≤ 0.1, the law H(n , a , N) can be approximated by the law P(λ) where λ = E(X) = na/N.



T3 : Using a Poisson's law

The variable X is distributed by the binomial law B(n = 50 ; p = 0.06).

a. Obtain (calculator list) p(X = i) for each integer i from 0 to 7.
b. Justify the approximation of this law by a Poisson's law whose parameter has to be given.
c. Give, using the Poisson's law table, the probabilities asked above. Compare them to the ones obtained with the binomial law.
d. Using the formula given on page 10, display those probabilities in a new list on your calculator.
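The comparison asked in T3 can be sketched as follows; B(50, 0.06) meets the criteria of I-4.4 (n > 30, p ≤ 0.1), so P(λ = np = 3) should be close:

```python
from math import comb, exp, factorial

n, p = 50, 0.06
lam = n * p                  # lambda = E(X) = 3

for k in range(8):
    binom = comb(n, k) * p**k * (1 - p)**(n - k)
    poisson = exp(-lam) * lam**k / factorial(k)
    # the two columns agree to about two decimal places
    print(k, round(binom, 4), round(poisson, 4))
```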


II. A continuous probability law : the normal law

II.1 Convergence of discrete laws

Let's graph some discrete probability distributions (hypergeometric, binomial and Poisson's laws), varying n, a, N :

n = 10, a = 100, N = 500 ; n = 30, a = 100, N = 500 ; n = 100, a = 100, N = 500

Some facts can be noticed :

* Here, p = 0.2. This probability of success isn't very low, which explains the visible differences between the Poisson and binomial graphs.
* When the population's size (N = 500) is rather big compared to n, the results of the hypergeometric and binomial laws are rather similar.
* As n increases, the distributions look more and more symmetric, around a central value that is in every case the expected value of the law.
* As n increases, the distributions follow more and more closely a curve that could probably be described by a single family of functions.

Could we then, under certain conditions on n and p, define a unique law which may reliably and quickly describe reality ?

* When n is high, looking for the probability of a lone value of X among many possibilities becomes irrelevant. It makes more sense to look for the probability that X lies in some interval.

Could this unique law be described in terms of intervals instead of separate values ?

We can now imagine the opportunity for a new, continuous law that would replace our discrete laws in the case of large populations or large numbers (of tries).


II.2 Continuous random variable

II.2.1 Statistical notion of "continuous" distribution

T4 : Bell-shaped statistical distribution

1 - Consider statistical data from a large population, showing a symmetric distribution in which most values are central.

e.g. : many objects were manufactured and weighed. Their theoretical mass is 3.8 kg, and the results of the weighings of 200 objects are given in the table further below.

Let's graph the frequency histogram of this series :
on abscissas : the variable (mass, kg)
on ordinates : frequency concentration (in % of objects per kg)
the areas of the rectangles are proportional to the frequencies

What's the probability, for an object taken at random, of weighing less than 3.77 kg ?

2 - Now, the 200 results can be given more precisely, in a greater number of thinner intervals. The new frequency histogram is point-based (each point is the midpoint of the top side of a rectangle).

A bell curve appears, typical of numerous distributions in many concrete fields (production, economy, biology, ecology, …).

How could we use this histogram to find the probability that an object's mass (chosen at random) is less than 3.7 kg ? Between 3.7 kg and 3.9 kg ?

mass (kg) :   [3.5 ; 3.7[   [3.7 ; 3.77[   [3.77 ; 3.8[   [3.8 ; 3.83[   [3.83 ; 3.9[   [3.9 ; 4.1[   total
effective :        9             27             63             60             29             12          200
frequency :      0.045          0.135          0.315          0.3            0.145          0.06           1
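From the table, a probability is obtained by adding the frequencies of the classes contained in the interval, e.g. p(mass < 3.77):

```python
# Frequency table of the 200 weighings: (lower, upper) -> frequency.
freqs = {
    (3.5, 3.7): 0.045, (3.7, 3.77): 0.135, (3.77, 3.8): 0.315,
    (3.8, 3.83): 0.3, (3.83, 3.9): 0.145, (3.9, 4.1): 0.06,
}

# Classes entirely below 3.77 kg:
p_below = sum(f for (lo, hi), f in freqs.items() if hi <= 3.77)
print(p_below)    # 0.045 + 0.135 = 0.18
```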


3 - We could consider weighing far more than 200 pieces, with far more accurate results. The histogram would then contain a large number of rectangles and become difficult to draw and to read ! The only useful graph would be a point cloud, which would closely follow a bell-shaped curve that could be modeled by a function f. In this context, how would we calculate the probabilities asked above ?

Notice : many concrete distributions aren't symmetric (incomes, …), but they won't be studied here.


II.2.2 Continuous random variable

Let's now place ourselves in an ideal case where the random variable X can take any value among the real numbers, from an infinite population.

def  We say that a random variable is continuous if the set of its possible values is an interval I of R (possibly R itself).

Thus, the "frequency concentration" becomes a "probability density".

def  A probability density of X is a function f, positive and continuous on R, such that ∫ over R of f(x) dx = 1 (the area of the closed surface between its curve and (Ox) is 1).

e.g., from the former tutorial, the probability density might be represented as a bell curve over the mass axis (kg), where a probability is seen as a crosshatched area between the curve and the (Ox) axis. Thus, the probability that an object's mass is less than 3.7 kg is an integral of f.

def  The distribution function of X is the function F that, with any value x, associates the number F(x) = p(X < x). So :

F(x) = ∫ from -∞ to x of f(t) dt

Direct consequences :   p(a < X < b) = F(b) - F(a)   ;   p(X > x) = 1 - F(x)



II.3 Normal law (or Laplace's law)

As seen before, with a large population and a large number of measures on it (or of drawings), many concrete phenomena, and many discrete probability laws, can be modeled by typically shaped probability densities. These functions f have a general expression :

f(x) = k·e^(-a(x-b)²), with a > 0.

Their curves are "bell curves".

II.3.1 General definition : normal law N(µ , σ)

def  Let X be a variable of mean µ and standard deviation σ. X may take any real value x. Its probability law is called N(µ , σ) as long as its probability density is expressed by :

f(x) = 1/(σ√(2π)) × e^(-(1/2)·((x-µ)/σ)²)

(this way, the total area between the curve and (Ox) is 1)

e.g. : probability density of the law N(25 , 10).

Comment 1 : these curves are named "bell curves" or Gauss curves.

Comment 2 : a Gauss curve has two inflection points, whose abscissas are µ - σ and µ + σ ; the standard deviation can thus be read on the graph.

Comment 3 : some results to remember :
p(µ - σ < X < µ + σ) = 68.3 %
p(µ - 1.96σ < X < µ + 1.96σ) = 95 %
p(µ - 2σ < X < µ + 2σ) = 95.4 %
p(µ - 2.58σ < X < µ + 2.58σ) = 99 %

Comment 4 : the term "normal" can't be defined for one individual. Only a population may show a normal distribution ; the adjective is used because it's known that these functions fit many concrete situations.

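The landmark probabilities of comment 3 can be recovered from the density via the error function, since for a normal law p(X < x) = (1/2)(1 + erf((x - µ)/(σ√2))); a quick check:

```python
from math import erf, sqrt

def phi(u):
    """Standard normal cumulative probability p(U < u)."""
    return 0.5 * (1 + erf(u / sqrt(2)))

for c in (1, 1.96, 2, 2.58):
    # p(mu - c*sigma < X < mu + c*sigma), the same for every normal law:
    print(c, round(phi(c) - phi(-c), 3))
```

The loop prints 0.683, 0.95, 0.954 and 0.99 respectively, matching the values to remember.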


II.3.2 The standard normal distribution N(0 , 1)

We will systematically use it as a reference. For this particular law, with mean zero and standard deviation 1, the variable will be named U and its values u. With N(0 , 1), comment 3 above (II-3.1) gives us :

p(-1 < U < 1) = 68.3 %
p(-1.96 < U < 1.96) = 95 %   (represent it on the graph)
p(-2 < U < 2) = 95.4 %
p(-2.58 < U < 2.58) = 99 %

How to use the table :

Many values F(u) = p(U < u) are given in a table (see the formula sheet). It provides probabilities p(U < u) for positive values of u. The other cases can be solved using the following formulas :

It's been seen in part II-2.2 that :   p(a < U < b) = p(U < b) - p(U < a)   (1)
                                       p(U > a) = 1 - p(U < a)              (2)
By symmetry of the curve :             p(U < -a) = 1 - p(U < a)             (3)

Illustrations : (1) (2) (3)


T5 : Using the table of standard normal law

Using this table, determine :

p(U < 1) = p(U < 1.96) = p(U < 2.58) =

p(U > 1) =

p(U > 1.63) =

p(U > 0.35) =

p(1 < U < 2) =

p(0.42 < U < 1.07) =

p(U < -1) =

p(U < -0.88) =

p(U > -0.5) =

p(U > -2.23) =
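A few of the T5 answers can be checked with the standard normal CDF and rules (1)-(3); a sketch:

```python
from math import erf, sqrt

def phi(u):
    """Standard normal cumulative probability p(U < u)."""
    return 0.5 * (1 + erf(u / sqrt(2)))

print(round(phi(1), 4))              # p(U < 1)
print(round(1 - phi(1), 4))          # p(U > 1), rule (2)
print(round(phi(2) - phi(1), 4))     # p(1 < U < 2), rule (1)
print(round(1 - phi(0.88), 4))       # p(U < -0.88), rule (3)
```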


p(-1.85 < U < -1.07) =

p(-1.12 < U < 0.6) =

II.3.3 Variable change : transition from N (µ , σ) to N (0 , 1)

The standard normal law, with its table, is our only tool to get probability values.

In every case, then, we'll have to translate our normal law into the law N (0 , 1).

prop Let X be a random variable, with mean µ and standard deviation σ, whose law is N (µ , σ).

Then, the variable U = (X – µ)/σ has a N (0 , 1) distribution.

Thus :

Whatever the variable (X or U), the probability we want is the crosshatched area.

e.g., the abscissa µ + 2σ on the X axis corresponds to the abscissa 2 on the U axis (according to the variable change), and then p(X < µ + 2σ) = p(U < 2).

p(X < x) = p(U < (x – µ)/σ)
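The variable change can be sketched as follows (a hedged illustration with hypothetical parameters µ = 50 and σ = 10, not taken from the course) :

```python
from statistics import NormalDist

mu, sigma = 50.0, 10.0                 # hypothetical parameters of N(µ, σ)
x = mu + 2 * sigma                     # the abscissa µ + 2σ discussed above

u = (x - mu) / sigma                   # variable change : u = (x - µ)/σ, here 2.0
p_via_table = NormalDist(0, 1).cdf(u)  # p(U < 2), read from the N(0, 1) table
p_direct = NormalDist(mu, sigma).cdf(x)

# Both routes give the same probability p(X < µ + 2σ).
print(u, round(p_via_table, 4), round(p_direct, 4))
```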


T6 : Calculate probabilities in a general case

Determine the requested probabilities, using the given normal distributions :

1 - law of X : N (50 , 10). Calculate p(X < 60), p(X < 43), p(45 < X < 55)

2 - law of X : N (3 , 0.45). Calculate p(X > 4), p(X < 2.55), p(3.2 < X < 3.7)


II.4 Normal approximation of other laws

In part II-1, it was stated that when n becomes a large number, the discrete laws become close to the normal law.

So, we're allowed to use the normal law instead of another one, under the following conditions :

Approximation criteria for the replacement of a binomial law :

Starting with B (n , p ), if n > 30 and npq > 5,

then we can use N (µ , σ ) with µ = np and σ = √(npq )

Approximation criteria for the replacement of a Poisson's law :

Starting with P (λ ), if λ > 20,

then we can use N (µ , σ ) with µ = λ and σ = √λ
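These criteria can be illustrated numerically on the binomial case ; this is a sketch with hypothetical parameters (n = 50, p = 0.4, not from the course), comparing the exact binomial p(X = k) with the normal approximation of a single value through p(k – 0.5 < X < k + 0.5) :

```python
from math import comb, sqrt
from statistics import NormalDist

# Hypothetical binomial case B(n, p) meeting the criteria n > 30 and npq > 5.
n, p = 50, 0.4
q = 1 - p
assert n > 30 and n * p * q > 5        # the approximation criteria above

mu, sigma = n * p, sqrt(n * p * q)     # µ = np, σ = √(npq)
law = NormalDist(mu, sigma)

k = 20
exact = comb(n, k) * p**k * q**(n - k)        # binomial p(X = k)
approx = law.cdf(k + 0.5) - law.cdf(k - 0.5)  # p(k - 0.5 < X < k + 0.5)

print(round(exact, 4), round(approx, 4))
```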

Probability of a single value :

In a discrete case, in which the variable X may only take natural values, for instance, we may wish to calculate p(X = k).

However, the normal law is only able to calculate probabilities of inequalities.

In this case, the rule is :

p(X = k ) = p(k - 0.5 < X < k + 0.5)

Alternatively, you can use a binomial or a Poisson's law…

T7 : Approximations from other laws to normal law

Justify that you can use a normal law, then calculate the probabilities :

1 - In a country, 30 % of the companies do exports. If we choose 80 companies at random, what's the probability that more than 30 do exports ?


What's the probability that exactly 30 companies do exports ?

2 - The number of items sold each day is distributed following a Poisson's law whose parameter is 25.

What's the probability that, one day, less than 20 items would be sold ?


III. Sampling distributions

III.1 Introduction

Do you know an operation where the entire population is surveyed ?

…to collect several pieces of information ?

The means deployed are huge. It takes more than a year to collect and analyse the whole data set, and also an impressive number of surveyors to walk through the whole country.

Of course, this work can't be carried out for any survey…

By selecting a part of the population, you can get a pretty good representation of reality.

This selection, more or less "representative" of reality, is called a sample.

Survey methods exist to build a sample as representative of the population as possible.

In this section, our aim is, given a completely known population, to be able to tell how its set of samples will behave.

Naming conventions :

Population's parameters will be written using the Greek alphabet :

mean : µ ; standard deviation : σ ; proportion : π

Sample's parameters will be written using the Latin alphabet :

mean : x̄ ; standard deviation : s ; proportion : p

III.2 Sampling distribution of means

Taking individuals from a population, we aim to study a random variable X.

Once a natural number n is chosen, we can theoretically extract every n-sized sample, and for each sample # k we are able to calculate its mean x̄k.

def We name mean random variable of n-sized samples the variable X̄ whose values are the different means x̄k of the n-sized samples.

def We name sampling distribution of means the distribution of the list of x̄k values, i.e. the probability law of the variable X̄.

theorem

Let us consider a big population, on which is studied a quantitative variable X, normally distributed, whose mean is µ and standard deviation σ.

Then, the law of X̄ is N (µ ; σ/√n) (on SRS*, i.e. simple random sampling, "EAS" in French)

or N (µ ; (σ/√n) × √((N – n)/(N – 1))) (on exhaustive sampling*)

* see next page

Comment 1 : in case N > 20n (small sample compared to the population), we can approximate the coefficient √((N – n)/(N – 1)) as if it were equal to 1, and so drop it.

Comment 2 : in an exercise, if no comparison between N and n is possible, then we will consider that we are in the SRS case.

Comment 3 : a consequence of the "central limit" theorem.

When n tends to infinity (concretely : when the sample size is large), the probability law of X̄ is normal, whatever the probability law of X.
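A small simulation can make these statements concrete ; a sketch with hypothetical values (µ = 120, σ = 40, n = 10 ; the sample count of 20 000 is arbitrary) :

```python
import random
from math import sqrt
from statistics import mean, pstdev

random.seed(0)

mu, sigma, n = 120.0, 40.0, 10   # hypothetical normal population, sample size n
n_samples = 20_000               # number of simulated SRS samples (arbitrary)

# One mean per simulated n-sized sample : the values of the variable X-bar.
sample_means = [mean(random.gauss(mu, sigma) for _ in range(n))
                for _ in range(n_samples)]

print(round(mean(sample_means), 1))    # close to µ
print(round(pstdev(sample_means), 2))  # close to σ/√n ≈ 12.65
```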



What's the meaning of these results ?

e.g. :

Let's take the following population of figures : Ω = {0 ; 1 ; 2 ; 3 ; 4 ; 5}

Its mean is : µ = 2.5 and its standard deviation is : σ = 1.7078.

1 - We can list below all its 2-sized samples (SRS, drawing with replacement) :

(bold : the sample ; next to it : sample's mean)

00 0 10 0.5 20 1 30 1.5 40 2 50 2.5

01 0.5 11 1 21 1.5 31 2 41 2.5 51 3

02 1 12 1.5 22 2 32 2.5 42 3 52 3.5

03 1.5 13 2 23 2.5 33 3 43 3.5 53 4

04 2 14 2.5 24 3 34 3.5 44 4 54 4.5

05 2.5 15 3 25 3.5 35 4 45 4.5 55 5

Now, let's have a closer look at the distribution of these sample means :

their mean is : 2.5 !

their standard deviation is : 1.2076, and indeed σ/√n = 1.7078/√2 = 1.2076

2 - We can list below all its 2-sized samples (exhaustive, drawing without replacement) :

(bold : the sample ; next to it : sample's mean)

01 0.5 12 1.5 23 2.5 34 3.5 45 4.5

02 1 13 2 24 3 35 4

03 1.5 14 2.5 25 3.5

04 2 15 3

05 2.5

Now, let's have a closer look at the distribution of these sample means :

their mean is : 2.5 !

their standard deviation is : 1.0801, and indeed (σ/√n) × √((N – n)/(N – 1)) = 1.2076 × √(4/5) = 1.0801
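The two enumerations above can be reproduced programmatically ; this sketch recomputes the means and standard deviations of both sampling distributions :

```python
from itertools import combinations, product
from statistics import mean, pstdev

population = [0, 1, 2, 3, 4, 5]   # the population Ω of the example

# 1 - all 2-sized samples with replacement (SRS) : 36 ordered samples
srs_means = [mean(s) for s in product(population, repeat=2)]

# 2 - all 2-sized samples without replacement (exhaustive) : 15 samples
exh_means = [mean(s) for s in combinations(population, 2)]

print(mean(srs_means), round(pstdev(srs_means), 4))   # 2.5 and 1.2076
print(mean(exh_means), round(pstdev(exh_means), 4))   # 2.5 and 1.0801
```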

III.3 Sampling distribution of proportions

Taking individuals from a population, we aim to study on them the presence (or not) of a character A.

a is the number of individuals having A ; N is the total size of the population.

def The proportion of individuals having A inside the population is the number π = a/N.

In an n-sized sample, this measured proportion will be named p.

def We name random variable proportion the variable P , list of p values extracted from

the set of all n -sized samples.

def We name sampling distribution of proportions the distribution of variable P , i.e.

its probability law.

Let's define the variable Y giving, for each n -sized sample, its number of individuals having A.

th The law of Y is the binomial law, with parameters n and π : Y ~ B (n , π )

mean and variance of Y : E (Y ) = n π and V (Y ) = n π (1 - π)

Furthermore : P = Y /n

th With a population in which a character's proportion is π, the sampling distribution of proportions p of n-sized samples is the one of the variable P such that :

law of P : N (π ; √(π(1 – π)/n)) (SRS) or N (π ; √(π(1 – π)/n) × √((N – n)/(N – 1))) (exhaustive sampling)

csq The mean of P is π and its variance is π(1 – π)/n
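Since P = Y/n with Y ~ B(n, π), the mean and variance of P can be checked exactly ; a sketch with hypothetical values π = 0.3 and n = 50 :

```python
from math import comb

pi, n = 0.3, 50   # hypothetical population proportion and sample size

# Law of Y ~ B(n, pi) ; the variable P = Y/n takes the values k/n.
probs = [comb(n, k) * pi**k * (1 - pi)**(n - k) for k in range(n + 1)]

mean_P = sum((k / n) * probs[k] for k in range(n + 1))
var_P = sum((k / n - mean_P) ** 2 * probs[k] for k in range(n + 1))

# mean of P = π, and variance of P = π(1 - π)/n
print(round(mean_P, 6), round(var_P, 6))
```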



T8 : Sampling distributions

1 - From a normal population - mean 120, standard deviation 40 - all SRS samples of sizes n = 10 and n = 50 are taken.

a. What are the laws of sampling distributions of means of 10-sized samples ? 50 ?

b. Quickly draw these two distributions on the same graph.

c. What's the probability that the mean of a random 10-sized sample would be more than 130 ?

d. Same question for a 50-sized sample.

2 - In the world's population, several years ago, there were 3.38 billion women and 3.12 billion men.

P is the variable giving the proportion of women in every sample of 100 persons.

a. What is the probability law of P ?

b. What is the probability that, in a sample, there would be more men than women ?


IV. Estimation

The problem : a large population is to be studied ; it's partially or totally unknown. A unique n-sized sample is taken from it. To what extent does this sample represent the whole population ? Is the information taken from this sample reliable enough to estimate the reality of the unknown population ?

As it’s a large population, we will systematically consider SRS samples.

IV.1 Point estimates

A ^ will be placed upon an unknown parameter to express an estimate of it.

IV.1.1 Estimate of a mean, of a proportion

th x̄ is a point estimate of µ : µ̂ = x̄.

X̄ = (1/n) × Σ(i=1..n) Xi is an unbiased estimator of µ, i.e. its expected value is µ.

(indeed, we noticed in part III that the mean of the sample means was µ)

th p is a point estimate of π : π̂ = p.

IV.1.2 Estimate of a variance, of a standard deviation

th (1/(n – 1)) × Σ(i=1..n) (xi – x̄)² is a point estimate of σ².

S², the variable “variances of samples”, is a biased estimator of σ², i.e. the mean of the s² values isn't σ².

From a sample, s² = (1/n) × Σ(i=1..n) (xi – x̄)², so : σ̂² = s² × n/(n – 1) and σ̂ = s × √(n/(n – 1)).
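The correction n/(n – 1) can be checked with Python's statistics module (the sample values below are hypothetical) ; note that statistics.pvariance uses the denominator n while statistics.variance already uses n – 1 :

```python
from math import sqrt
from statistics import pvariance, variance

sample = [9.2, 10.1, 9.8, 10.4, 9.5]   # hypothetical sample
n = len(sample)

s2 = pvariance(sample)                 # s², denominator n : biased
sigma2_hat = s2 * n / (n - 1)          # point estimate of σ²
sigma_hat = sqrt(s2) * sqrt(n / (n - 1))

# statistics.variance applies the n - 1 denominator directly :
print(round(s2, 6), round(sigma2_hat, 6), round(variance(sample), 6))
```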

IV.1.3 Comment

Knowing a point estimate doesn’t give us any information about the accuracy of this result. The

actual value of population’s parameter might be very different, because a random sample might

badly represent our population.


IV.2 Estimation of µ by a confidence interval

Confidence intervals were created to answer the question raised in the former comment. For instance, around a sample's mean we'll build an interval “having 95% chances to include µ”. The building method of this interval actually depends on whether σ is known.

def We name significance level, α, the probability that the interval does not include µ.

We name confidence level, 1 – α, the probability that the interval includes µ.

Two values of α are commonly used : 5% and 1%, which means a 95% or a 99% confidence level.

The following scheme shows a 95% confidence level interval :

IV.2.1 Confidence interval in case σ is known

Reminder : the law of X̄ is N (µ ; σ/√n).

By variable change, we can deduce that the law of the variable U = (X̄ – µ) / (σ/√n) is N (0 ; 1).

So, in any case, the confidence interval is built with the formula : Iα = [ x̄ – u·σ/√n ; x̄ + u·σ/√n ],

where the coefficient u, from the law N (0 ; 1), depends on the chosen significance level.

For instance : if α = 5%, u = ; and if α = 1%, u = .
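This construction can be sketched in Python (the inputs below are hypothetical ; inv_cdf plays the role of reading u in the N (0 ; 1) table) :

```python
from math import sqrt
from statistics import NormalDist

def mean_ci_known_sigma(xbar, sigma, n, alpha):
    """Confidence interval for µ when σ is known : x̄ ± u·σ/√n."""
    u = NormalDist(0, 1).inv_cdf(1 - alpha / 2)   # ≈ 1.96 for α = 5 %
    half = u * sigma / sqrt(n)
    return xbar - half, xbar + half

# Hypothetical sample : x̄ = 50, σ = 10, n = 25, α = 5 %
low, high = mean_ci_known_sigma(50, 10, 25, 0.05)
print(round(low, 2), round(high, 2))   # 46.08 53.92
```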

IV.2.2 Confidence interval in case σ is unknown

That’s the most common occurrence. In this case we must use the mean x and the standard

deviation s of our sample. Dealing with two variables doesn’t allow us to use the normal law : we

have to use Student law. with a variable T (another letter to differentiate it from U).

The law of the variable T = (X̄ – µ) / (s/√(n – 1)) is the Student law, St (0 ; 1), with n – 1 degrees of freedom.

So, in any case, the confidence interval is built with the formula : Iα = [ x̄ – t·s/√(n – 1) ; x̄ + t·s/√(n – 1) ],

where the coefficient t, from the law St (0 ; 1), depends on the chosen significance level and on the number of degrees of freedom.

For instance, with n = 10 : if α = 5%, t = ; and if α = 1%, t = .
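A sketch of the same construction with hypothetical inputs. The standard library has no Student distribution, so the coefficient t is passed in by hand, read from a t-table (t ≈ 2.262 for 9 degrees of freedom and α = 5 %) :

```python
from math import sqrt

def mean_ci_unknown_sigma(xbar, s, n, t):
    """Confidence interval for µ when σ is unknown : x̄ ± t·s/√(n - 1).

    t must be read from a Student table with n - 1 degrees of freedom."""
    half = t * s / sqrt(n - 1)
    return xbar - half, xbar + half

# Hypothetical sample of size n = 10 (9 degrees of freedom), α = 5 %.
low, high = mean_ci_unknown_sigma(50, 10, 10, 2.262)
print(round(low, 2), round(high, 2))   # 42.46 57.54
```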

[Scheme : with probability 1 – α = 95%, our interval [lower value ; upper value] includes µ ; with probability α/2 = 2.5%, µ is less than the lower value ; with probability α/2 = 2.5%, µ is more than the upper value.]


IV.3 Estimation of π by a confidence interval

Using a similar reasoning, we can build an interval around a proportion p obtained from a sample :

Iα = [ p – u·√(p(1 – p)/n) ; p + u·√(p(1 – p)/n) ]

(notice that, here, you will only use the normal law, with its coefficient u)
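And a sketch for the proportion interval (the inputs p = 0.55 and n = 200 are hypothetical) :

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(p, n, alpha):
    """Confidence interval for π : p ± u·√(p(1 - p)/n)."""
    u = NormalDist(0, 1).inv_cdf(1 - alpha / 2)
    half = u * sqrt(p * (1 - p) / n)
    return p - half, p + half

low, high = proportion_ci(0.55, 200, 0.05)   # hypothetical sample
print(round(low, 3), round(high, 3))   # 0.481 0.619
```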

T9 : Point estimates and confidence intervals

A sample of companies of the same industry provided the following results :

turnover (M€) [0 ; 2[ [2 ; 3[ [3 ; 4[ [4 ; 5[ [5 ; 7[

size (# of companies) 6 12 17 10 5

a. Give point estimates of the mean and standard deviation for the turnover of the whole

population of companies in this industry.

b. Give the 95% confidence interval of the mean turnover in this industry.


c. Give a point estimate of the proportion of companies whose turnover is more than 4.5 M€.

d. Give the 99% confidence interval of this proportion in this industry.


V. Statistical hypothesis testing

Knowing one sample from an unknown population, we can formulate a hypothesis : an assumption that we want to test. A suitable statistical test will, or won't, allow us to reject the tested hypothesis, named the null hypothesis, H0.

In many tests, an alternative hypothesis, H1, has to be worded too.

We name significance level of a test the probability, α, of being wrong when rejecting H0.

The value 1 – α is then the confidence level of the test.

Here, we’ll only deal with two kinds of tests :

- adequacy χ² test, between an observed distribution and a theoretical law (Pearson's test)

e.g. : does the distribution of these results correspond to a normal one ?

- conformance of a parameter to a value (z-test, t-test)

e.g. : is 1.78 m the average height of the population ?

graph illustrating the Pearson’s test :


V.1 χ² test of adequacy between a distribution and a law

The null hypothesis H0 has to be : “there's a perfect adequacy between them”.

Our aim is to know whether it's risky or not to reject H0 (this risk α, of being wrong when rejecting it, will be chosen ; on the other hand, the risk β of being wrong when accepting H0 can't be known).

Methodology : n observations are performed (one per individual) ; k different values are noted.

1. Calculations :

values | observed # | theoretical # | (obs – th)² / th
val 1 | obs1 | th1 | χ² part 1
val 2 | obs2 | th2 | χ² part 2
… | … | … | …
val k | obsk | thk | χ² part k
sum | n | n | χ²calc

2. Reading the table :

A χ² law has to be used, with k – 1 degrees of freedom.

Once a risk α (significance level of the test) is chosen, we can go to the right place in the χ² table (form) and read there the value of our “χ²lim”.

3. Comparison and decision :

“χ²calc” and “χ²lim” have to be compared.

A χ² value is an expression of the global difference between the observations and the tested law.

If the “χ²calc” value is higher than the “χ²lim” from the table, you will be allowed to reject H0, and so to reject the adequacy ; but you won't be 100% sure to be right doing that : your risk of being wrong will be less than α, but not zero.
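The χ²calc column sum can be sketched as a small function (the observed/theoretical counts below are hypothetical, not an exercise from the course) :

```python
def chi2_calc(observed, theoretical):
    """Global difference between observations and the tested law :
    χ²calc = Σ (obs - th)² / th."""
    return sum((o - t) ** 2 / t for o, t in zip(observed, theoretical))

# Hypothetical distribution over k = 4 values, n = 100 observations.
obs = [20, 30, 28, 22]
th = [25, 25, 25, 25]          # theoretical counts under H0
print(chi2_calc(obs, th))      # to be compared to χ²lim with k - 1 = 3 d.o.f.
```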



T10 : adequacy χ² test

120 throws of the same die have been performed. The table below shows the results.

Considering this sample of results, can we say this die is not loaded, at a 2% significance level ?

result 1 2 3 4 5 6

observed # of throws 26 15 14 24 25 16

theor. # of throws

(obs – th)² / th


V.2 Conformance testing of a mean, of a proportion

V.2.1 Principle

Given a data set of n observations, from a sample, we have to decide whether a value µ0 can, or can't, correspond to the mean of the population (same reasoning for a value π0 representing a proportion).

Thus, the null hypothesis is H0 : µ = µ0 (or π = π0)

Following the situation, the alternative hypothesis can be :

H1 : µ ≠ µ0 (two-sided test)

H1 : µ < µ0 (one-sided test) or H1 : µ > µ0 (another one-sided test)

H0 will be rejected if the sample's mean x̄ (or the sample's proportion p) is far enough from the tested value µ0 (or π0).

V.2.2 Conformance testing of a mean

X is the list of individual values measured in a sample ; n is their number.

X̄ is the list of the means of all n-sized samples.

S is the list of the standard deviations of all n-sized samples.

If the standard deviation of the population, σ, is known : z-test, using the normal law.

Under H0 (considering that it is the reality), the associated decision variable is : U = (X̄ – µ0) / (σ/√n), distributed by N (0, 1).

(this holds in case X is normally distributed, or in case n > 30)

If the standard deviation of the population, σ, is unknown : t-test, using a Student's law.

Under H0 (considering that it is the reality), the associated decision variable is : T = (X̄ – µ0) / (S/√(n – 1)), distributed by St (0, 1).

(this holds in case X is normally distributed, or in case n > 30)

V.2.3 Conformance testing of a proportion

n is the size of our sample, p is the proportion in it.

P is the list of proportions in all n-sized samples.

Testing a proportion is always done by a z-test, using the normal law.

Under H0 (considering that it is the reality), the associated decision variable is : U = (P – π0) / √(π0(1 – π0)/n), distributed by N (0, 1).

(this holds in case P is normally distributed, or in case n > 30)
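The three decision variables can be sketched as small helpers (the numeric inputs below are hypothetical) :

```python
from math import sqrt

def z_stat_mean(xbar, mu0, sigma, n):
    """z-test statistic for H0 : µ = µ0, σ known : (x̄ - µ0)/(σ/√n)."""
    return (xbar - mu0) / (sigma / sqrt(n))

def t_stat_mean(xbar, mu0, s, n):
    """t-test statistic for H0 : µ = µ0, σ unknown : (x̄ - µ0)/(s/√(n - 1))."""
    return (xbar - mu0) / (s / sqrt(n - 1))

def z_stat_prop(p, pi0, n):
    """z-test statistic for H0 : π = π0 : (p - π0)/√(π0(1 - π0)/n)."""
    return (p - pi0) / sqrt(pi0 * (1 - pi0) / n)

# Hypothetical observations :
print(z_stat_mean(52, 50, 10, 25))            # 1.0
print(round(z_stat_prop(0.56, 0.5, 100), 2))  # 1.2
```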


T11 : one-sided conformance test of a mean (σ known)

A greengrocer wishes to buy vegetables from a new supplier. This supplier claims that its beans measure 10 cm on average. If this value is plausible, or if the estimate is even higher, then the greengrocer will choose this supplier. Of course, he won't in case a sample gives a too low average size. The greengrocer fixed his risk level at 5%.

Let X be the random variable "size of a bean (cm)", distributed by N (µ ; 2.3).

After taking a sample of n = 25 beans, the calculated average was x̄ = 9.5 cm.

a. Hypothesis :

H0 : b. Statistics (according to H0) :

H1 :

U =

c. Significance level :

d. Rejection area

ulim =

uobs = OR x̄lim =

e. Decision


T12 : two-sided conformance test of a mean (σ unknown)

A quarry should produce 300 tons of ore on daily average, not more, not less. It is assumed that the daily mass of produced ore is normally distributed. The quantities examined over 10 days gave the following results, in tons :

302 287 315 322 341 324 329 345 392 289

Can we consider that the all-days mean is 300 tons, at a 1% significance level ?

a. Hypothesis :

H0 : b. Statistics (according to H0) :

H1 :

T =

c. Significance level :

d. Rejection area

tlim =

tobs = OR x̄lim =

e. Decision


EXERCISES

I. DISCRETE LAWS

Exercise 1

A supermarket sells 24 fruit species, 8 of which have the “bio” label. A blind control consists in choosing 10 fruits of different species. The variable X gives the number of “bio” species among these 10.

1. Give, with explanations, the probability law of X.

2. Calculate its expected value and standard deviation.

3. What is the probability that less than two “bio” species would be chosen ?

Exercise 2

A car driver meets five signal lights on his way. They share the same durations of red and green lighting : 40 seconds green and 20 seconds red. Unfortunately, they are not synchronized, so that the color of one light is independent of the color of another one.

1. When approaching the first light, what’s the probability it will be green ?

2. What’s the probability that the lights will all be green ?

3. What’s the probability that at least two lights will be red ?

4. What’s the mean expected number of green lights driving this way ?

Exercise 3

The germination capacity of a seed is 0.8 (probability to germinate).

1. 8 seeds are sown. Calculate the probabilities of the following events :

a. exactly 5 seeds will germinate.

b. At least 7 seeds will germinate.

2. When a seed has germinated, the probability that a slug eats the young plant is 0.4.

a. Calculate the probability that a seed will finally become a grown plant.

b. How many seeds must be sown to get more than 99% chances of getting at least

one grown plant ?

Exercise 4

According to a survey, 80% of the customers of a product “A” are satisfied.

Choosing randomly 10 customers of this product, what’s the probability that…

a. they’re all satisfied ?

b. 80% of them are satisfied ?

c. at least 80% are satisfied ?

Exercise 5

6% of French people are clients of the mobile phone operator “Yellow”.

A survey consists in asking 50 randomly chosen French people who their mobile phone operator is. The variable X gives the number of “Yellow” clients among these 50 people.


1. a. Justify and give the probability law of X.

b. What are the chances that the population proportion of clients would be the same in the sample ?

c. What is special with this former probability ?

d. What’s the probability that none of the 50 people would be a “Yellow” client ?

e. What’s the probability that there would be at least 4 “Yellow” clients ?

2. In this part, the number of persons to call is unknown. How many people would have to be called, to get more than 99% chances of finding at least one “Yellow” client ?

Exercise 6

The shop “HighTech” sells computers. The variable number of daily sales is distributed like a

Poisson’s law whose parameter is 4. Calculate the probability that the next day…

a. No computer would be sold

b. At least one computer would be sold

c. Exactly 2 computers would be sold.

Exercise 7 (determine X, n, p and justify the use of a Poisson’s law)

On a survey implying a large number of persons, only 2% of them accept to give their name.

Given that one of the investigators has to interview 250 people, calculate the probability that…

a. All these people won’t give their name.

b. At least 5 people will give their name.

Exercise 8

A box contains 250 matches. It has been exposed to moisture, so that 20% of the matches won't light. Taking 10 matches at random, the variable X gives the number of matches that will light.

1. Demonstrate that the law of X is binomial and give its parameters and expected value.

2. Calculate the following probabilities :

a. No match will light

b. They will all light

c. At least 3 won't light

3. a. Look again for the above probabilities, this time using a Poisson’s law.

b. Explain the differences of your answers between questions 2 and 3.

Exercise 9

In a large population there are, on average, 0.4% of blind people.

1. Into a sample of 100 people, what’s the probability there’s no blind people ? at least 2 ?

2. Answer these questions using the correct Poisson’s law (justify its use).


II. NORMAL LAW

Exercise 10 (with solutions)

It’s been stated that the variable X “mass (kg) of a newborn baby” is distributed by the law

N (3.1 ; 0.5).

1. What’s the probability that a newborn baby weights more than 4 kg ?

variable change : U = (X – 3.1)/0.5 is distributed by the standard normal law N (0 ; 1).

our U value : u = (4 – 3.1)/0.5 = 1.8 ; from the table : F(1.8) = p(U < 1.8) = 0.9641.

our answer : p(X > 4 kg) = p(U > 1.8) = 1 – 0.9641 = 0.0359.

2. What’s the probability that a newborn baby weights less than 3 kg ?

variable change : U = (X – 3.1)/0.5 is distributed by the standard normal law N (0 ; 1).

our U value : u = (3 – 3.1)/0.5 = -0.2 ; from the table : F(0.2) = p(U < 0.2) = 0.5793.

our answer : p(X < 3 kg) = p(U < -0.2) = 1 - p(U < 0.2) = 1 - 0.5793 = 0.4207.

3. What’s the probability that a newborn baby’s weight is between 3 and 4 kg ?

interval formula : p(a < X < b) = p(X < b) – p(X < a).

our answer : p(3 < X < 4) = p(X < 4) – p(X < 3) = 0.9641 – 0.4207 = 0.5434.
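These three answers can be re-checked in Python (the last value comes out as 0.5433 by direct computation, against 0.5434 obtained above from rounded table values) :

```python
from statistics import NormalDist

X = NormalDist(3.1, 0.5)          # the law N (3.1 ; 0.5) of the exercise

p_more_4 = 1 - X.cdf(4)           # question 1 : ≈ 0.0359
p_less_3 = X.cdf(3)               # question 2 : ≈ 0.4207
p_between = X.cdf(4) - X.cdf(3)   # question 3 : ≈ 0.5433

print(round(p_more_4, 4), round(p_less_3, 4), round(p_between, 4))
```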

Exercise 11

1. The variable U is distributed by the standard normal law N (0 ; 1).

a. Calculate : * p(U < 0.86) * p(U > 1.96) * p(U > -1.39) * p(-0.63 < U < 0.63)

b. Give the value u0 such that : * p(U < u0) = 0.8944 * p(-u0 < U < u0) = 0.98

2. The variable X is distributed by the normal law N (25 ; 7).

a. Calculate : * p(X > 35.5) * p(X < 18)

b. Give the value x0 such that : p(X > x0) = 0.0516

Exercise 12

A company manufactures beacons (flashing lights) for all types of machines, in large quantities.

The probability that a beacon is defective is p = 0.04.

A random sample of 600 beacons is taken from the production. X is the random variable that gives

the number of defective beacons among the 600.

1. Show that the random variable X is a binomial distribution whose parameters are to be

specified.

2. Show that we can approximate the distribution of X by a normal distribution.

3. Determine µ and σ, mean and standard deviation of the variable X for the normal

distribution.

4. Then calculate with precision allowed by the tables, the probability of having at least 27

defective flashing lights in the drawing of 600 beacons.


Exercise 13

A commercial agent is assigned to telephone solicitation. On average, one phone call in five leads to an order.

1. We name X the random variable “number of orders obtained after 60 calls”.

a. Give the name and the parameters of the probability distribution of X.

b. Justify that the law can be approximated by a normal distribution, giving its

parameters.

c. Using the normal distribution, calculate the following probabilities :

* p(X > 15) * p(X < 10) * p(X = 12)

2. Find the minimum number of phone calls the sales agent must make so that his chances of getting at least 15 orders exceed 75%.

Exercise 14

It is assumed that, on average, you're checked once every 20 bus trips by a controller. M.A makes 800 trips a year on the line.

1. What is the probability that M.A. would be checked between 30 and 50 times a year?

2. M.A always travels without a ticket. An annual subscription would cost 320 € / year.

At what level must the company set the fine, so that at least 75% of cheaters would be better off taking an annual subscription ? 99% ?

III. SAMPLING

Exercise 15

In a production of light bulbs, it is assumed that the lifetime of a bulb is a normal random variable whose mean is 900 hours and standard deviation 80 hours. Calculate the probability that, in a random sample of 100 bulbs (SRS), the average lifetime of the bulbs exceeds 910 hours.

Exercise 16

A candidate obtained 55% of votes cast in an election.

1. What is the probability that, in a sample of 100 people, his result be less than 50% ?

2. Same question for a sample of 2000 people.

3. How many people do we have to take so as the probability that less than 50% of them

voted for him drops below 1% ?


Exercise 17

In one region, during the summer period, it is assumed that the number of tourists present in a day follows a normal distribution whose mean is 50,000 and standard deviation 8,000.

1. The prefecture considers that tourism is "manageable" (reception, environment,

pollution, ...) when the probability of receiving fewer than 55,000 people in a day exceeds 70%.

Is this the case?

2. Officials want to base their analysis on samples of 10 vacation days.

a. What is the law of X̄ : "average daily number of vacationers in a sample of 10 days" ?

b. What is the probability that, in such a sample, the average daily number of tourists

is less than 55,000 ?

Exercise 18 (with solutions)

An elevator can carry a load of 580 kg. It is assumed that the mass of a person chosen at random

among the users of the elevator, expressed in kilograms, is a random variable following a normal

distribution N(µ, σ) with µ = 70 kg and σ = 16 kg.

What is the maximum number of persons that may be allowed together in the elevator if we want

the risk of overload not to exceed 0.01 ?

Consider a sample of n (unknown) people in this elevator. There is an overload if the total

mass (variable X) exceeds 580 kg. The average mass of the sample is distributed following

N(70 ; 16/√n), and the total mass is n times the average mass.

Hence, the distribution of the total mass is N(70n ; 16√n).

Change of variable : U = (X − 70n)/(16√n) follows N(0 ; 1).

We want p(X > 580) ≤ 0.01 and, on the other hand, we know that p(U > 2.33) = 0.01.

So, we must have (580 − 70n)/(16√n) = 2.33 (and even ≥ 2.33), and then 70n + 37.28√n − 580 = 0 (and even ≤ 0).

We can solve this equation as a quadratic in √n (discriminant ∆, …) or test several values of n :

the conclusion is n < 7. The elevator will therefore display : “6 people maximum”.
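The "test several values of n" step can be sketched directly (illustrative; 2.33 is the table value for p(U > u) = 0.01):

```python
import math

# Overload risk <= 0.01 requires (580 - 70n) / (16 sqrt(n)) >= 2.33,
# i.e. 70n + 2.33 * 16 * sqrt(n) <= 580. Test successive values of n:
n_max = 0
for n in range(1, 20):
    if 70 * n + 2.33 * 16 * math.sqrt(n) <= 580:
        n_max = n
print(n_max)  # 6
```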

Exercise 19

A large population took an IQ test. The results are normally distributed with µ = 102 and σ = 15.

1. What’s the proportion of people whose IQ is less than 100 ?

2. We want to analyze the results of a few samples of this population. For this, we form

groups of 20 individuals selected by simple random sampling (SRS), and the average IQ of

each group will be calculated.

a. Give the parameters of the normal distribution of IQ means of all 20-sized samples.

b. What is the probability that our selected group has an average IQ below 100 ?

c. Instead of 20, how many people would we have to choose so that there is less than a

5% chance that the average IQ of this new group is below 100 ?

3. Using the answer to question 1, what is the probability that, in a group of 20 people, the

proportion (of individuals whose IQ is less than 100) is more than 50 % ?
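Questions 1 and 2b can be sketched as follows (illustrative; the mean IQ of a 20-sized SRS follows N(102, 15/√20) by the sampling-distribution formula):

```python
import math

def phi(x):
    """Standard normal CDF, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

mu, sigma = 102, 15

# Question 1: proportion of individuals with IQ below 100
p_ind = phi((100 - mu) / sigma)

# Question 2b: mean IQ of a 20-sized sample, distributed N(102, 15/sqrt(20))
p_group = phi((100 - mu) / (sigma / math.sqrt(20)))

print(round(p_ind, 4), round(p_group, 4))
```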


IV. ESTIMATION

Exercise 20

From a vine, 10 bunches of grapes have been picked at random and weighed, giving the

following results in kilograms : 2.4 ; 3.2 ; 3.6 ; 4.1 ; 4.3 ; 4.7 ; 5.3 ; 5.4 ; 6.5 ; 6.9

1. Give the mean and standard deviation of a bunch's mass in this sample.

2. Give a point estimate of the standard deviation of the bunches' mass over the whole vine

(population).

3. Give a 95% confidence interval for the mean mass of the bunches in the whole population.

4. Calculate the minimum number of bunches that would have to be studied so that this

interval has a width of 1 kg, assuming that the estimated standard deviation is the true one of

the population.
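Questions 1 to 3 can be sketched with the standard library (illustrative; t = 2.262 is the Student table value for 9 degrees of freedom at 95%):

```python
import math
import statistics

masses = [2.4, 3.2, 3.6, 4.1, 4.3, 4.7, 5.3, 5.4, 6.5, 6.9]
n = len(masses)

x_bar = statistics.mean(masses)       # sample mean
s = statistics.pstdev(masses)         # sample standard deviation (divisor n)
sigma_hat = statistics.stdev(masses)  # point estimate: s * sqrt(n / (n - 1))

# 95% confidence interval: x_bar +/- t * s / sqrt(n - 1), with t = 2.262 (9 df)
t = 2.262
half_width = t * s / math.sqrt(n - 1)
ci = (x_bar - half_width, x_bar + half_width)
print(round(x_bar, 2), round(sigma_hat, 3), [round(b, 3) for b in ci])
```

Note that s/√(n−1) equals sigma_hat/√n, so both forms of the course formula give the same interval.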

Exercise 21

A laboratory wishes to analyze the level of contamination of trees by acid rain in a given

territory. After examining a sample of 100 trees, 8 affected trees were found.

Give an estimate of the proportion π of affected trees in this territory, using a 90% confidence

interval.
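A sketch of this interval (illustrative; u = 1.645 is the standard normal value for a 90% confidence level):

```python
import math

# 90% confidence interval for a proportion: p +/- u * sqrt(p(1 - p) / n)
p, n, u = 8 / 100, 100, 1.645
half_width = u * math.sqrt(p * (1 - p) / n)
ci = (p - half_width, p + half_width)
print([round(b, 4) for b in ci])
```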

Exercise 22

In managing a grain elevator, one wants to determine the minimal safety stock needed to have a 99%

chance of satisfying customers at any time. For this, the weekly consumption of grain has been

recorded over a sample of 15 weeks. The following results were obtained :

consumption (in tons) 4.6 4.7 4.8 4.9 5 5.1 5.2 5.3

number of weeks 1 0 2 3 5 2 1 1

1. Give the mean x̄ and standard deviation s of the consumption in this sample.

2. We name X the variable “weekly consumption of grain” at any time, and we assume that

its distribution is normal.

a. Give point estimates of µ and σ.

b. Using this normal law, calculate the value of X that has a 99% chance of not being

exceeded.

3. a. Using the results of question 2, build a 99% confidence interval of the average

weekly consumption.

b. What is the probability that, in a given week, consumption exceeds the upper limit of this interval ?


Exercise 23

A company wants to specialize in the delivery of large packages. Those already carried are

considered a representative sample of all future packages.

Data on the large packages already carried :

volume (m³) 0.2 to 0.4 0.4 to 0.5 0.5 to 0.6 0.6 to 1

# of packages 15 40 60 10

1. Give point estimates for the mean and standard deviation of the future packages’ volume.

2. Give a 99% confidence interval for the average volume of future packages.

3. In this question, the standard deviation of the population is considered known, its

value being the one you found in question 1. We want to use a confidence interval for the

average volume whose width would be 0.05 m³. What would be the confidence level of such

an interval ?

V. TESTS

Exercise 24

Experiment : a die is thrown 3 times. Each time, the success is : getting a 5 or a 6.

X is the random variable “number of successes at the end of the experiment” ; the possible values of

X are 0, 1, 2 or 3. We assume that p(X = 0) = 8/27, p(X = 1) = 12/27 and p(X = 3) = 1/27.

Moreover, the experiment will be performed 54 times !

1. Complete the following table :

number of successes 0 1 2 3 total

theoretical # of experiments 54

observed # of experiments 14 20 16 4 54

2. By a χ² test, with a 5% significance level, say whether the observed results are consistent

with the expected theoretical ones.
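The χ² statistic can be sketched as follows (illustrative; p(X = 2) = 6/27 by complement, and 7.82 is the χ² table value for 3 degrees of freedom at a 5% level):

```python
# Theoretical counts over 54 experiments, from the given probabilities
probs = [8 / 27, 12 / 27, 6 / 27, 1 / 27]
expected = [54 * p for p in probs]  # 16, 24, 12, 2
observed = [14, 20, 16, 4]

# chi-squared distance between observed and theoretical counts
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(round(chi2, 2))  # compare with 7.82 (3 df, 5% level)
```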

Exercise 25 (parts 1 and 2 are independent)

For five French groups in the same industry, the annual budget for promotion on the Internet

has been reported alongside the overall annual promotion budget :

group A B C D E

Internet budget (k€) 47 55 58 63 72

global budget (k€) 558 545 587 560 585

Part 1

1. Determine in this sample the proportion of firms for which the Internet budget exceeds

10% of the overall budget.

2. a. Determine the 95% confidence interval of the proportion that could be observed in

all French groups in this industry.


b. This industry actually consists of 58 groups in France. What is the minimum number of

groups whose Internet budget exceeds 10% of their overall budget, that can be asserted

with a confidence level of 80% ?

Part 2

Perform a χ² test to tell, with a significance level of 5%, whether the data observed for

these five companies are consistent with the following assertion : “in France, the Internet

budget is worth 10% of the overall budget”.

Exercise 26 (parts 1 and 2 are independent)

A study was conducted on a sample of 50 plastics companies. For each of them, the 2009 net

income was recorded. Net income is a variable R, expressed in M€. The table below groups the

incomes into classes :

net income R (M€) [-1 ; 1[ [1 ; 1.5[ [1.5 ; 2[ [2 ; 3[ [3 ; 5[

# of companies 3 10 18 15 4

Part 1

1. Give the income’s mean and standard deviation of this sample.

2. Give a 99% confidence interval for the mean net income in the whole (large) population of

plastics companies (note that the population's standard deviation is unknown).

Part 2

Our aim in this part is to decide whether the net income distribution is consistent

with the normal law N(2 ; 0.9).

1. With variable X distributed as N(2 ; 0.9), calculate :

* p(-1 < X < 1) * p(1 < X < 1.5) * p(1.5 < X < 2) * p(2 < X < 3) * p(3 < X < 5).

2. Explain why, under this particular normal law, and hence in accordance with the

five probabilities you just calculated, a theoretical sample of 50 plastics companies would

give the following table :

net income R (M€) [-1 ; 1[ [1 ; 1.5[ [1.5 ; 2[ [2 ; 3[ [3 ; 5[

# of companies 6.675 7.71 10.615 18.325 6.675

3. Then, perform a χ² goodness-of-fit test between this normal law and the observed data,

choosing a 95% confidence level. Explain in detail what this “confidence” means.

Exercise 27

The study of 320 families with 5 children has given the distribution in the following table.

children 5 boys 4 boys 3 boys 2 boys 1 boy 0 boy

0 girl 1 girl 2 girls 3 girls 4 girls 5 girls

# of families 18 56 110 88 40 8

Are these results compatible with the hypothesis that the birth of a boy and a girl are equally

likely events ?
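Under the hypothesis of equally likely births, the number of boys among 5 children follows B(5, 1/2); the resulting χ² statistic can be sketched as follows (illustrative; the 5% critical value for 5 degrees of freedom is 11.1):

```python
from math import comb

# Expected counts among 320 families, from B(5, 1/2), ordered 5 boys ... 0 boys
expected = [320 * comb(5, k) / 2 ** 5 for k in range(5, -1, -1)]  # 10, 50, 100, 100, 50, 10
observed = [18, 56, 110, 88, 40, 8]

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(round(chi2, 2))  # compare with 11.1 (5 df, 5% level)
```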


Exercise 28

A string manufacturer states that the strings it produces have an average tensile strength of 300

kg (with a standard deviation of 30 kg). It is assumed that the variable “strength of a string” is

normally distributed. Experiments on 10 strings gave the following breaking strengths :

251 277 255 305 341 324 329 314 272 289

Can we consider, analyzing this sample, that the average tensile strength of the population

of strings is equal to 300 kg? (significance level : 10%)

Exercise 29

For 1000 French baccalaureate candidates chosen at random, 675 were successful. Test at a 10%

significance level the assumption that the success rate in France is 70%.

Exercise 30

In several countries, the weather forecast is given as a probability.

The forecast "the probability of rain tomorrow is 0.4" was issued 50 times during the past year, and

it rained the following day 26 times. Test the accuracy of the prediction with a

5% α-level.
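A sketch of the test statistic for this proportion test (illustrative; 1.96 is the two-sided 5% critical value of the standard normal):

```python
import math

# H0: the forecast probability pi0 = 0.4 is accurate; observed: 26 rainy days out of 50
p_obs, pi0, n = 26 / 50, 0.4, 50
u = (p_obs - pi0) / math.sqrt(pi0 * (1 - pi0) / n)
print(round(u, 2))  # compare |u| with 1.96
```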


IUT TC – Semester 3 global formula sheet – MATHEMATICS

Probability distributions

Hypergeometric distribution H(n, a, N)

n : number of draws ; a : number of “success” individuals ; N : population size

k : desired number of successes among the n draws

p(X = k) = C(a,k) × C(N−a, n−k) / C(N,n) ; E(X) = n a/N ; V(X) = n × (a/N) × ((N−a)/N) × (N−n)/(N−1)

Approximation of a hypergeometric distribution by a binomial : if N ≥ 20n

Binomial distribution B(n, p) — n : number of draws ; p, q : probabilities of success, failure

p(X = k) = C(n,k) p^k q^(n−k) ; E(X) = np ; V(X) = npq

Approximation of a binomial by a Poisson distribution : if n ≥ 30 and p < 0.1 and npq < 10

Poisson distribution P(λ)

p(X = k) = e^(−λ) λ^k / k! ; E(X) = λ ; V(X) = λ

Approximation of a binomial distribution B(n, p) by a normal distribution N(µ, σ) :

if n ≥ 30 and npq ≥ 5 ; take µ = np and σ = √(npq)

Approximation of a Poisson distribution P(λ) by a normal distribution N(µ, σ) :

if λ ≥ 20 ; take µ = λ and σ = √λ

Sampling

Sampling distribution of the means

Let a population of large size N on which a variable X of mean µ and standard deviation σ is

studied. Consider all samples of size n > 30.

The distribution of X̄ is N(µ ; σ/√n) in SRS (N ≥ 20n), or N(µ ; (σ/√n) × √((N−n)/(N−1))) otherwise.

Sampling distribution of the proportions

Let a population of large size N on which a characteristic of observed proportion π is

studied. Consider all samples of size n > 30.

The distribution of P is N(π ; √(π(1−π)/n)) in SRS, or N(π ; √(π(1−π)/n) × √((N−n)/(N−1))) otherwise.

Estimation

Point estimates of µ, σ, π : µ̂ = x̄ ; σ̂ = s × √(n/(n−1)) ; π̂ = p

Estimation of µ by confidence interval :

σ known : I = [ x̄ − u_α σ/√n ; x̄ + u_α σ/√n ]

σ unknown : I = [ x̄ − t_α s/√(n−1) ; x̄ + t_α s/√(n−1) ]

Estimation of π by confidence interval :

I = [ p − u_α √(p(1−p)/n) ; p + u_α √(p(1−p)/n) ]


Tables

Poisson distribution table

Probability table : values of p(X = k) for various Poisson distributions

λ 0,1 0,2 0,3 0,4 0,5 0,6 0,7 0,8 0,9

k 0 0,90484 0,81873 0,74082 0,67032 0,60653 0,54881 0,49659 0,44933 0,40657

1 0,09048 0,16375 0,22225 0,26813 0,30327 0,32929 0,34761 0,35946 0,36591

2 0,00452 0,01637 0,03334 0,05363 0,07582 0,09879 0,12166 0,14379 0,16466

3 0,00015 0,00109 0,00333 0,00715 0,01264 0,01976 0,02839 0,03834 0,04940

4 0,00000 0,00005 0,00025 0,00072 0,00158 0,00296 0,00497 0,00767 0,01111

5 0,00000 0,00000 0,00002 0,00006 0,00016 0,00036 0,00070 0,00123 0,00200

6 0,00000 0,00000 0,00000 0,00000 0,00001 0,00004 0,00008 0,00016 0,00030

λ 1 1,5 2 2,5 3 3,5 4 4,5 5

k 0 0,36788 0,22313 0,13534 0,08208 0,04979 0,03020 0,01832 0,01111 0,00674

1 0,36788 0,33470 0,27067 0,20521 0,14936 0,10569 0,07326 0,04999 0,03369

2 0,18394 0,25102 0,27067 0,25652 0,22404 0,18496 0,14653 0,11248 0,08422

3 0,06131 0,12551 0,18045 0,21376 0,22404 0,21579 0,19537 0,16872 0,14037

4 0,01533 0,04707 0,09022 0,13360 0,16803 0,18881 0,19537 0,18981 0,17547

5 0,00307 0,01412 0,03609 0,06680 0,10082 0,13217 0,15629 0,17083 0,17547

6 0,00051 0,00353 0,01203 0,02783 0,05041 0,07710 0,10420 0,12812 0,14622

7 0,00007 0,00076 0,00344 0,00994 0,02160 0,03855 0,05954 0,08236 0,10444

8 0,00001 0,00014 0,00086 0,00311 0,00810 0,01687 0,02977 0,04633 0,06528

9 0,00000 0,00002 0,00019 0,00086 0,00270 0,00656 0,01323 0,02316 0,03627

10 0,00000 0,00000 0,00004 0,00022 0,00081 0,00230 0,00529 0,01042 0,01813

11 0,00000 0,00000 0,00001 0,00005 0,00022 0,00073 0,00192 0,00426 0,00824

12 0,00000 0,00000 0,00000 0,00001 0,00006 0,00021 0,00064 0,00160 0,00343

λ 5,5 6 6,5 7 7,5 8 8,5 9 9,5 10

k 0 0,00409 0,00248 0,00150 0,00091 0,00055 0,00034 0,00020 0,00012 0,00007 0,00005

1 0,02248 0,01487 0,00977 0,00638 0,00415 0,00268 0,00173 0,00111 0,00071 0,00045

2 0,06181 0,04462 0,03176 0,02234 0,01556 0,01073 0,00735 0,00500 0,00338 0,00227

3 0,11332 0,08924 0,06881 0,05213 0,03889 0,02863 0,02083 0,01499 0,01070 0,00757

4 0,15582 0,13385 0,11182 0,09123 0,07292 0,05725 0,04425 0,03374 0,02540 0,01892

5 0,17140 0,16062 0,14537 0,12772 0,10937 0,09160 0,07523 0,06073 0,04827 0,03783

6 0,15712 0,16062 0,15748 0,14900 0,13672 0,12214 0,10658 0,09109 0,07642 0,06306

7 0,12345 0,13768 0,14623 0,14900 0,14648 0,13959 0,12942 0,11712 0,10371 0,09008

8 0,08487 0,10326 0,11882 0,13038 0,13733 0,13959 0,13751 0,13176 0,12316 0,11260

9 0,05187 0,06884 0,08581 0,10140 0,11444 0,12408 0,12987 0,13176 0,13000 0,12511

10 0,02853 0,04130 0,05578 0,07098 0,08583 0,09926 0,11039 0,11858 0,12350 0,12511

11 0,01426 0,02253 0,03296 0,04517 0,05852 0,07219 0,08530 0,09702 0,10666 0,11374

12 0,00654 0,01126 0,01785 0,02635 0,03658 0,04813 0,06042 0,07277 0,08444 0,09478

13 0,00277 0,00520 0,00893 0,01419 0,02110 0,02962 0,03951 0,05038 0,06171 0,07291

14 0,00109 0,00223 0,00414 0,00709 0,01130 0,01692 0,02399 0,03238 0,04187 0,05208

15 0,00040 0,00089 0,00180 0,00331 0,00565 0,00903 0,01359 0,01943 0,02652 0,03472

16 0,00014 0,00033 0,00073 0,00145 0,00265 0,00451 0,00722 0,01093 0,01575 0,02170


Standard normal distribution table

The table gives the probability p(U < u)

u 0 1 2 3 4 5 6 7 8 9

0 0,5000 0,5040 0,5080 0,5120 0,5160 0,5199 0,5239 0,5279 0,5319 0,5359

0,1 0,5398 0,5438 0,5478 0,5517 0,5557 0,5596 0,5636 0,5675 0,5714 0,5753

0,2 0,5793 0,5832 0,5871 0,5910 0,5948 0,5987 0,6026 0,6064 0,6103 0,6141

0,3 0,6179 0,6217 0,6255 0,6293 0,6331 0,6368 0,6406 0,6443 0,6480 0,6517

0,4 0,6554 0,6591 0,6628 0,6664 0,6700 0,6736 0,6772 0,6808 0,6844 0,6879

0,5 0,6915 0,6950 0,6985 0,7019 0,7054 0,7088 0,7123 0,7157 0,7190 0,7224

0,6 0,7257 0,7291 0,7324 0,7357 0,7389 0,7422 0,7454 0,7486 0,7517 0,7549

0,7 0,7580 0,7611 0,7642 0,7673 0,7704 0,7734 0,7764 0,7794 0,7823 0,7852

0,8 0,7881 0,7910 0,7939 0,7967 0,7995 0,8023 0,8051 0,8078 0,8106 0,8133

0,9 0,8159 0,8186 0,8212 0,8238 0,8264 0,8289 0,8315 0,8340 0,8365 0,8389

1 0,8413 0,8438 0,8461 0,8485 0,8508 0,8531 0,8554 0,8577 0,8599 0,8621

1,1 0,8643 0,8665 0,8686 0,8708 0,8729 0,8749 0,8770 0,8790 0,8810 0,8830

1,2 0,8849 0,8869 0,8888 0,8907 0,8925 0,8944 0,8962 0,8980 0,8997 0,9015

1,3 0,9032 0,9049 0,9066 0,9082 0,9099 0,9115 0,9131 0,9147 0,9162 0,9177

1,4 0,9192 0,9207 0,9222 0,9236 0,9251 0,9265 0,9279 0,9292 0,9306 0,9319

1,5 0,9332 0,9345 0,9357 0,9370 0,9382 0,9394 0,9406 0,9418 0,9429 0,9441

1,6 0,9452 0,9463 0,9474 0,9484 0,9495 0,9505 0,9515 0,9525 0,9535 0,9545

1,7 0,9554 0,9564 0,9573 0,9582 0,9591 0,9599 0,9608 0,9616 0,9625 0,9633

1,8 0,9641 0,9649 0,9656 0,9664 0,9671 0,9678 0,9686 0,9693 0,9699 0,9706

1,9 0,9713 0,9719 0,9726 0,9732 0,9738 0,9744 0,9750 0,9756 0,9761 0,9767

2 0,9772 0,9778 0,9783 0,9788 0,9793 0,9798 0,9803 0,9808 0,9812 0,9817

2,1 0,9821 0,9826 0,9830 0,9834 0,9838 0,9842 0,9846 0,9850 0,9854 0,9857

2,2 0,9861 0,9864 0,9868 0,9871 0,9875 0,9878 0,9881 0,9884 0,9887 0,9890

2,3 0,9893 0,9896 0,9898 0,9901 0,9904 0,9906 0,9909 0,9911 0,9913 0,9916

2,4 0,9918 0,9920 0,9922 0,9925 0,9927 0,9929 0,9931 0,9932 0,9934 0,9936

2,5 0,9938 0,9940 0,9941 0,9943 0,9945 0,9946 0,9948 0,9949 0,9951 0,9952

2,6 0,9953 0,9955 0,9956 0,9957 0,9959 0,9960 0,9961 0,9962 0,9963 0,9964

2,7 0,9965 0,9966 0,9967 0,9968 0,9969 0,9970 0,9971 0,9972 0,9973 0,9974

2,8 0,9974 0,9975 0,9976 0,9977 0,9977 0,9978 0,9979 0,9979 0,9980 0,9981

2,9 0,9981 0,9982 0,9982 0,9983 0,9984 0,9984 0,9985 0,9985 0,9986 0,9986

3 0,99865 0,99869 0,99874 0,99878 0,99882 0,99886 0,99889 0,99893 0,99896 0,99900

3,1 0,99903 0,99906 0,99910 0,99913 0,99916 0,99918 0,99921 0,99924 0,99926 0,99929

3,2 0,99931 0,99934 0,99936 0,99938 0,99940 0,99942 0,99944 0,99946 0,99948 0,99950

3,3 0,99952 0,99953 0,99955 0,99957 0,99958 0,99960 0,99961 0,99962 0,99964 0,99965

3,4 0,99966 0,99968 0,99969 0,99970 0,99971 0,99972 0,99973 0,99974 0,99975 0,99976

3,5 0,99977 0,99978 0,99978 0,99979 0,99980 0,99981 0,99981 0,99982 0,99983 0,99983

3,6 0,999841 0,999847 0,999853 0,999858 0,999864 0,999869 0,999874 0,999879 0,999883 0,999888

3,7 0,999892 0,999896 0,999900 0,999904 0,999908 0,999912 0,999915 0,999918 0,999922 0,999925

3,8 0,999928 0,999931 0,999933 0,999936 0,999938 0,999941 0,999943 0,999946 0,999948 0,999950

3,9 0,9999519 0,9999539 0,9999557 0,9999575 0,9999593 0,9999609 0,9999625 0,9999641 0,9999655 0,9999670

[Figure : standard normal density of U ; the shaded area to the left of u represents p(U < u).]


Student's t distribution table

The table gives the values t

such that p(-t < T < t) = p

1 - p 1 - p

df 0,2 0,1 0,05 0,02 0,01 df 0,2 0,1 0,05 0,02 0,01

1 3,078 6,314 12,706 31,821 63,657 28 1,313 1,701 2,048 2,467 2,763

2 1,886 2,920 4,303 6,965 9,925 29 1,311 1,699 2,045 2,462 2,756

3 1,638 2,353 3,182 4,541 5,841 30 1,310 1,697 2,042 2,457 2,750

4 1,533 2,132 2,776 3,747 4,604 31 1,309 1,696 2,040 2,453 2,744

5 1,476 2,015 2,571 3,365 4,032 32 1,309 1,694 2,037 2,449 2,738

6 1,440 1,943 2,447 3,143 3,707 33 1,308 1,692 2,035 2,445 2,733

7 1,415 1,895 2,365 2,998 3,499 34 1,307 1,691 2,032 2,441 2,728

8 1,397 1,860 2,306 2,896 3,355 35 1,306 1,690 2,030 2,438 2,724

9 1,383 1,833 2,262 2,821 3,250 36 1,306 1,688 2,028 2,434 2,719

10 1,372 1,812 2,228 2,764 3,169 37 1,305 1,687 2,026 2,431 2,715

11 1,363 1,796 2,201 2,718 3,106 38 1,304 1,686 2,024 2,429 2,712

12 1,356 1,782 2,179 2,681 3,055 39 1,304 1,685 2,023 2,426 2,708

13 1,350 1,771 2,160 2,650 3,012 40 1,303 1,684 2,021 2,423 2,704

14 1,345 1,761 2,145 2,624 2,977 41 1,303 1,683 2,020 2,421 2,701

15 1,341 1,753 2,131 2,602 2,947 42 1,302 1,682 2,018 2,418 2,698

16 1,337 1,746 2,120 2,583 2,921 43 1,302 1,681 2,017 2,416 2,695

17 1,333 1,740 2,110 2,567 2,898 44 1,301 1,680 2,015 2,414 2,692

18 1,330 1,734 2,101 2,552 2,878 45 1,301 1,679 2,014 2,412 2,690

19 1,328 1,729 2,093 2,539 2,861 46 1,300 1,679 2,013 2,410 2,687

20 1,325 1,725 2,086 2,528 2,845 47 1,300 1,678 2,012 2,408 2,685

21 1,323 1,721 2,080 2,518 2,831 48 1,299 1,677 2,011 2,407 2,682

22 1,321 1,717 2,074 2,508 2,819 49 1,299 1,677 2,010 2,405 2,680

23 1,319 1,714 2,069 2,500 2,807 50 1,299 1,676 2,009 2,403 2,678

24 1,318 1,711 2,064 2,492 2,797 51 1,298 1,675 2,008 2,402 2,676

25 1,316 1,708 2,060 2,485 2,787 100 1,290 1,660 1,984 2,364 2,626

26 1,315 1,706 2,056 2,479 2,779 ∞ 1,282 1,645 1,960 2,327 2,576

27 1,314 1,703 2,052 2,473 2,771

[Figure : Student density of T ; the shaded area between -t and t represents p.]


χ² distribution table

The table gives the values χ²lim

such that p(χ² < χ²lim) = p

1 - p 1 - p 1 - p 1 - p

df 1% 2% 5% 10% df 1% 2% 5% 10% df 1% 2% 5% 10% df 1% 2% 5% 10%

1 6,64 5,41 3,84 2,71 26 45,6 42,9 38,9 35,6 51 77,4 73,8 68,7 64,3 76 108 103 97,4 92,2

2 9,21 7,82 5,99 4,61 27 47 44,1 40,1 36,7 52 78,6 75 69,8 65,4 77 109 105 98,5 93,3

3 11,3 9,84 7,82 6,25 28 48,3 45,4 41,3 37,9 53 79,8 76,2 71 66,5 78 110 106 99,6 94,4

4 13,3 11,7 9,49 7,78 29 49,6 46,7 42,6 39,1 54 81,1 77,4 72,2 67,7 79 111 107 101 95,5

5 15,1 13,4 11,1 9,24 30 50,9 48 43,8 40,3 55 82,3 78,6 73,3 68,8 80 112 108 102 96,6

6 16,8 15 12,6 10,6 31 52,2 49,2 45 41,4 56 83,5 79,8 74,5 69,9 81 114 109 103 97,7

7 18,5 16,6 14,1 12 32 53,5 50,5 46,2 42,6 57 84,7 81 75,6 71 82 115 110 104 98,8

8 20,1 18,2 15,5 13,4 33 54,8 51,7 47,4 43,7 58 86 82,2 76,8 72,2 83 116 112 105 99,9

9 21,7 19,7 16,9 14,7 34 56,1 53 48,6 44,9 59 87,2 83,4 77,9 73,3 84 117 113 106 101

10 23,2 21,2 18,3 16 35 57,3 54,2 49,8 46,1 60 88,4 84,6 79,1 74,4 85 118 114 108 102

11 24,7 22,6 19,7 17,3 36 58,6 55,5 51 47,2 61 89,6 85,8 80,2 75,5 86 119 115 109 103

12 26,2 24,1 21 18,5 37 59,9 56,7 52,2 48,4 62 90,8 87 81,4 76,6 87 121 116 110 104

13 27,7 25,5 22,4 19,8 38 61,2 58 53,4 49,5 63 92 88,1 82,5 77,7 88 122 117 111 105

14 29,1 26,9 23,7 21,1 39 62,4 59,2 54,6 50,7 64 93,2 89,3 83,7 78,9 89 123 118 112 106

15 30,6 28,3 25 22,3 40 63,7 60,4 55,8 51,8 65 94,4 90,5 84,8 80 90 124 120 113 108

16 32 29,6 26,3 23,5 41 65 61,7 56,9 52,9 66 95,6 91,7 86 81,1 91 125 121 114 109

17 33,4 31 27,6 24,8 42 66,2 62,9 58,1 54,1 67 96,8 92,9 87,1 82,2 92 126 122 115 110

18 34,8 32,3 28,9 26 43 67,5 64,1 59,3 55,2 68 98 94 88,3 83,3 93 128 123 117 111

19 36,2 33,7 30,1 27,2 44 68,7 65,3 60,5 56,4 69 99,2 95,2 89,4 84,4 94 129 124 118 112

20 37,6 35 31,4 28,4 45 70 66,6 61,7 57,5 70 100 96,4 90,5 85,5 95 130 125 119 113

21 38,9 36,3 32,7 29,6 46 71,2 67,8 62,8 58,6 71 102 97,6 91,7 86,6 96 131 127 120 114

22 40,3 37,7 33,9 30,8 47 72,4 69 64 59,8 72 103 98,7 92,8 87,7 97 132 128 121 115

23 41,6 39 35,2 32 48 73,7 70,2 65,2 60,9 73 104 99,9 93,9 88,9 98 133 129 122 116

24 43 40,3 36,4 33,2 49 74,9 71,4 66,3 62 74 105 101 95,1 90 99 135 130 123 117

25 44,3 41,6 37,7 34,4 50 76,2 72,6 67,5 63,2 75 106 102 96,2 91,1 100 136 131 124 118

[Figure : χ² density ; the shaded area to the left of χ²lim represents p.]