Good Morning
Jan 18, 2017
NORMAL CURVE
Seminar-1
By Dr. M.S. Bala Vidyadhar
Contents:
Introduction
Basis for statistical analysis
Probability Distributions
Normal Distribution/Curve
History
Description
Standard Normal Variate
Variations
Normal curve interpretation
Comparisons
Normality tests
Conclusion
Previous Year Questions
References
The word statistics comes from the Italian word 'statista',
meaning statesman, or the German word 'statistik', which
means political state.
The science of statistics has existed since the time of early Egypt
through the Roman Empire, when it was used to count families.
John Graunt (1620-1674) is known as the Father of Health
Statistics.
Introduction
Statistics is the science of compiling, classifying and tabulating numerical data and expressing the results in a mathematical or graphical form.
Biostatistics is the branch of statistics concerned with mathematical facts and data related to biological events.
Introduction:
Statistical analyses are based on three primary entities:
1. (U) Population of interest,
2. (V) set of characteristics or variables of the units of
this population,
3. (P) Probability Distribution of the variables in the
given population.
Basis for Statistical Analysis
It is the most crucial link between the population and
its variables, which allows us to draw inferences on the
population based on the sample observations.
It is a way to enumerate the different values the
variable can have, and how frequently each value
appears in the population.
Probability Distributions
The three probability distributions useful in
medicine/health care are:
1. Normal distribution
2. Binomial distribution
3. Poisson distribution.
Probability distributions
Binomial distribution: useful where an event or variable has only binary
outcomes (e.g. yes/no; positive/negative).
Poisson distribution: useful where the outcome is the number of times an
event occurs in the population, hence very helpful in determining the probability of rare events/diseases.
Both these distributions apply to discrete data only.
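For illustration, both probability mass functions can be computed with the Python standard library; the clinical numbers below are hypothetical examples, not from the text:

```python
from math import comb, exp, factorial

def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent binary yes/no trials)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """P(exactly k events) when events occur at an average rate lam."""
    return exp(-lam) * lam ** k / factorial(k)

# Hypothetical example: a finding present in 10% of patients.
# Probability that exactly 2 of 10 examined patients show it:
p_binom = binomial_pmf(2, 10, 0.10)

# Hypothetical rare disease averaging 3 cases/year; P(no cases in a year):
p_pois = poisson_pmf(0, 3)

print(round(p_binom, 4))  # 0.1937
print(round(p_pois, 4))   # 0.0498
```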
When data is collected from a very large population and a frequency distribution is made with narrow class intervals, the resulting curve is smooth, symmetrical and is called a normal distribution curve.
Also called the Gaussian Distribution.
The normal distribution is continuous, so it can take on
any value.
Normal Distribution
Was first discovered by Abraham de Moivre and published in 1733.
History
Two mathematician-astronomers, Pierre-Simon Laplace (France) and Karl Friedrich Gauss (Germany), established the scientific principles of the normal distribution.
But Gauss' name was given to the distribution because he applied it to the theory of motions of heavenly bodies.
The normal distribution curve is a smooth, bell-shaped curve and is symmetric about the mean of the distribution, symbolized by the letter μ (mu).
The standard deviation is denoted by the Greek letter sigma (σ).
Sigma is the horizontal distance between the mean and the point of inflection on the curve.
Description
Normal distribution curve
The mean and the standard deviation are the two parameters that completely determine the location on the number line and the shape of the normal curve.
Thus many normal curves are possible, one for every pair of mean and standard deviation values, but the area under every probability distribution curve equals 1.
Description contd.
The curve will be approximately normal for values like height, weight, hemoglobin, PCV, BP, etc.
For all normal curves the mean, median and mode are equal and coincide on the graph.
The horizontal distance between the central point and 1 SD to both left and right of the mean marks one confidence limit.
Normal Curve
In the case of the normal curve, individual subjects are distributed symmetrically about the mean in terms of SD.
Between -1 SD and +1 SD lie 68.27% of the observations.
Between -2 SD and +2 SD lie 95.45% of the observations.
Between -3 SD and +3 SD lie 99.73% of the observations of the population.
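These areas can be verified numerically; a minimal sketch using the standard-library error function, which gives the normal cumulative probability:

```python
from math import erf, sqrt

def std_normal_cdf(z):
    """P(Z <= z) for the standard normal distribution."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def area_within(k):
    """Fraction of a normal population lying within +/- k SD of the mean."""
    return std_normal_cdf(k) - std_normal_cdf(-k)

for k in (1, 2, 3):
    print(k, round(100 * area_within(k), 2))
# 1 68.27
# 2 95.45
# 3 99.73
```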
Confidence Limits
These limits are called the confidence limits, and the range between
the two is called the confidence interval.
Observations lying within -2 SD to +2 SD are regarded as within the
conventional (5%) level of significance.
Data lying outside this area are said to be significantly different
from the population mean value.
Such extreme values will occur only about 5 times in 100 observations.
As the normal curve is symmetrical, the coefficient of skewness is
equal to 0.
The central limit theorem states that under certain (fairly common) conditions, the sum of many random variables will have an approximately normal distribution.
More specifically, where X1, ..., Xn are independent and identically distributed random variables with the same arbitrary distribution, zero mean, and variance σ², and Z is their mean scaled by √n, i.e. Z = √n · (X1 + ... + Xn)/n, then the distribution of Z approaches the normal distribution N(0, σ²) as n grows.
Central Limit Theorem
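The theorem can be seen in a small simulation; a sketch using only the standard library, with an exponential parent distribution chosen for illustration:

```python
import random
import statistics

random.seed(0)

def sample_means(draw, n, reps=2000):
    """Means of n i.i.d. draws from `draw`, repeated `reps` times."""
    return [statistics.fmean(draw() for _ in range(n)) for _ in range(reps)]

# Start from a clearly non-normal parent: exponential with mean 1 (SD 1).
means = sample_means(lambda: random.expovariate(1.0), n=50)

# By the CLT the sample means cluster around 1 with SD about
# 1/sqrt(50), roughly 0.14, in an approximately normal shape.
print(round(statistics.fmean(means), 2))
print(round(statistics.stdev(means), 2))
```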
A normal distribution with parameters μ and σ has the following properties.
a. The curve is bell-shaped.
b. It is symmetrical (non-skew).
c. The mean, median and mode are equal.
d. The curve is asymptotic to the X-axis; that is, the curve touches the X-axis only at -∞ and +∞.
e. The curve has points of inflexion at μ - σ and μ + σ.
Properties of the Normal Curve
For the distribution:
a. Standard deviation = σ
b. Quartile deviation = (2/3)σ (approximately)
c. Mean deviation = (4/5)σ (approximately)
For the distribution:
a. The odd-order central moments are equal to zero.
b. The even-order central moments give μ₂ = σ² and μ₄ = 3σ⁴.
The distribution is mesokurtic; that is, β₂ = 3.
Total area under the curve is unity.
P[a < X ≤ b] = area bounded by the curve and the ordinates at a and b.
a. P[μ - σ < X ≤ μ + σ] = 0.6826 = 68.26%
b. P[μ - 2σ < X ≤ μ + 2σ] = 0.9544 = 95.44%
c. P[μ - 3σ < X ≤ μ + 3σ] = 0.9974 = 99.74%
Deviation of an individual observation from the mean in a normal distribution or curve is called standard normal variate and is given the symbol Z.
It is measured in terms of standard deviations (SDs) and indicates how much an observation is bigger or smaller than the mean, in units of standard deviation.
Standard Normal Variate
So Z is a ratio, calculated as:

Z = (X - μ) / σ

where X stands for the individual observation, and μ and σ stand for the mean and SD as usual.
Z is also called the standard normal deviate or relative normal deviate.
The normal curve is completely determined by two
parameters mean(µ) and SD(σ).
So, a different normal distribution is specified for each
different value of µ and σ.
Variations of mean and SD values affect the normal
curve in different ways.
Comparisons
[Figure: The effects of µ and σ. Curves with σ = 2, 3 and 4 show how the standard deviation affects the shape of f(x); curves with µ = 10, 11 and 12 show how the mean affects the location of f(x).]
Different values of μ shift the graph of the distribution along the x-axis.
Different values of σ (SD) determine the degree of flatness or peakedness of the graph of the distribution.
The area under the cumulative normal distribution can also be plotted;
it shows the cumulative probability by levels of mean ± standard error.
The two variations of the normal curve are due to two
characteristics of the curve:
1. Skewness
2. Kurtosis
The normal curve is symmetric; frequently, however,
our data distributions, especially with small sample sizes, will show some degree of asymmetry, or departure from symmetry.
Variations of the Normal Curve
Skewness is a statistic to measure the degree of asymmetry.
If the distribution has a longer "tail" to the right of the peak than to the left, the distribution is skewed to the right or has positive skewness.
If the reverse is true the distribution is said to be skewed to the left or to have a negative skewness.
Skewness
The value of skewness can be computed as the third central moment divided by the cube of the standard deviation:

skewness = Σ(X - X̄)³ / (n·s³)

where X is each individual score, X̄ the mean, s the standard deviation, and n the number of observations. The value of skewness is zero when the distribution is a
completely symmetric bell-shaped curve.
A positive value indicates that the distribution is skewed to the right (i.e.,positive skewness) and a negative value indicates that the distribution is skewed to the left (i.e.negative skewness).
While skewness describes the degree of symmetry of a distribution, kurtosis measures the height of a distribution curve.
To compute kurtosis (as excess kurtosis, so that the normal curve scores zero), we use the formula

kurtosis = Σ(X - X̄)⁴ / (n·s⁴) - 3
Kurtosis
A positive kurtosis indicates that the distribution has a relatively high peak; this is called leptokurtic.
A negative kurtosis indicates that the distribution is relatively flat-topped; this is called platykurtic.
A normal distribution has a kurtosis of zero; this is called mesokurtic.
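The two moment formulas above can be sketched directly in Python; the tiny data sets are hypothetical and chosen only to show the sign conventions:

```python
import statistics

def skewness(xs):
    """Third central moment over s^3: 0 symmetric, >0 right-skewed."""
    m, s, n = statistics.fmean(xs), statistics.pstdev(xs), len(xs)
    return sum((x - m) ** 3 for x in xs) / (n * s ** 3)

def excess_kurtosis(xs):
    """Fourth central moment over s^4, minus 3: 0 for a normal curve."""
    m, s, n = statistics.fmean(xs), statistics.pstdev(xs), len(xs)
    return sum((x - m) ** 4 for x in xs) / (n * s ** 4) - 3

symmetric = [1, 2, 3, 4, 5]        # symmetric around its mean of 3
right_skewed = [1, 1, 1, 2, 10]    # long tail to the right

print(skewness(symmetric))                    # 0.0
print(round(skewness(right_skewed), 2))       # positive: right skew
print(round(excess_kurtosis(symmetric), 2))   # negative: platykurtic
```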
Skewness and Kurtosis provide distributional
information about the data.
In statistical tests that assume a normal distribution of the
data, skewness and kurtosis can be used to examine this
assumption, called normality.
With measurements whose distributions are not normal, a simple
transformation of the scale of the measurement may induce
approximate normality.
The square root √x, and the logarithm, log x, are often used as
transformations in this way.
These transformations are found useful for the flexible use of some
tests of significance, like Student's t test.
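As an illustration of how such a transformation can tame skewness, a sketch with hypothetical right-skewed measurements (the skewness function is the moment coefficient described earlier):

```python
import math
import statistics

def skewness(xs):
    """Moment coefficient of skewness."""
    m, s, n = statistics.fmean(xs), statistics.pstdev(xs), len(xs)
    return sum((x - m) ** 3 for x in xs) / (n * s ** 3)

# Hypothetical right-skewed measurements (a few very large values).
raw = [1, 2, 2, 3, 3, 4, 5, 8, 15, 40]
logged = [math.log(x) for x in raw]    # log transformation
rooted = [math.sqrt(x) for x in raw]   # square-root transformation

print(round(skewness(raw), 2))     # strongly positive
print(round(skewness(rooted), 2))  # reduced
print(round(skewness(logged), 2))  # much closer to 0
```

Here the log transform brings the distribution closest to symmetry, which is typical for long right tails.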
Non Normal Distributions
Even if the distribution in the original population is far from normal, the distribution of sample averages tends to become normal, under a wide variety of conditions, as the size of the sample increases.
This is the single most important reason for the use of the normal distribution.
Also, many results that are useful in statistical work, although strictly true only when the population is normal, hold well enough for rough and ready use when samples come from non-normal populations.
When presenting such results, we can indicate how well they stand up under non-normality.
Normal Curve INTERPRETATION

Mathematical representation: a random variable X with mean µ and standard deviation σ is normally distributed if its probability density function is given by

f(x) = (1 / (σ√(2π))) · e^(-(1/2)((x - µ)/σ)²),  -∞ < x < ∞

where π ≈ 3.14159... and e ≈ 2.71828...
The Shape of Normal Distributions

Normal distributions are bell-shaped and symmetrical around µ.

Why symmetrical? Let µ = 100 and σ = 10. Suppose x = 110:

f(110) = (1/(10√(2π))) · e^(-(1/2)((110 - 100)/10)²) = (1/(10√(2π))) · e^(-1/2)

Now suppose x = 90:

f(90) = (1/(10√(2π))) · e^(-(1/2)((90 - 100)/10)²) = (1/(10√(2π))) · e^(-1/2)

So f(110) = f(90): values equidistant from the mean have equal density.
The expected value (also called the mean) E(X) (or µ) can be any number
The standard deviation can be any nonnegative number
The total area under every normal curve is 1.
There are infinitely many normal distributions.
Normal Probability Distributions
Total area =1; symmetric around µ
[Figure: The effects of μ and σ. Curves with σ = 2, 3 and 4 show how the standard deviation affects the shape of f(x); curves with μ = 10, 11 and 12 show how the expected value affects the location of f(x).]

A family of bell-shaped curves that differ only in their means and standard deviations: µ = the mean of the distribution, σ = the standard deviation.

[Figure: Normal curves plotted over x = 0 to 12 with µ = 3, σ = 1; µ = 6, σ = 1; and µ = 6, σ = 2.]
Probability = area under the density curve.
P(6 < X < 8) = area under the density curve between a = 6 and b = 8, for µ = 6 and σ = 2.

[Figure: Normal curve with µ = 6, σ = 2 over x = 0 to 12, with the area between 6 and 8 shaded to show P(6 < X < 8).]
Probabilities = areas under the graph of f(x):

P(a < X < b) = area under the density curve between a and b.
Since P(X = a) = 0, P(a ≤ X ≤ b) = P(a < X < b).

P(a ≤ X ≤ b) = ∫ₐᵇ f(x) dx
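The area interpretation can be checked numerically; a minimal sketch using the standard library's NormalDist, with the µ = 6, σ = 2 example from the figure:

```python
from statistics import NormalDist

X = NormalDist(mu=6, sigma=2)

# P(6 < X < 8) = area under the density curve between a = 6 and b = 8:
p = X.cdf(8) - X.cdf(6)
print(round(p, 4))  # 0.3413, the area between the mean and +1 SD
```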
Suppose X ~ N(µ, σ). Form a new random variable by subtracting the mean μ
from X and dividing by the standard deviation σ: (X - µ)/σ.
This process is called standardizing the random variable X.

Standardizing

(X - µ)/σ is also a normal random variable; we will denote it by Z:

Z = (X - µ)/σ

Z has mean 0 and standard deviation 1: E(Z) = 0; SD(Z) = 1.
The probability distribution of Z is called the standard normal distribution.
Standardizing (cont.)
If X has mean µ and standard deviation σ, standardizing a particular value x tells how many standard deviations x is above or below the mean µ.
Exam 1: µ = 80, σ = 10; exam 1 score: 92. Exam 2: µ = 80, σ = 8; exam 2 score: 90. Which score is better?

Standardizing (cont.)

z₁ = (92 - 80)/10 = 12/10 = 1.2

z₂ = (90 - 80)/8 = 10/8 = 1.25

Since 1.25 > 1.2, 90 on exam 2 is better than 92 on exam 1.
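The exam comparison above reduces to two z-score calculations; a minimal sketch:

```python
def z_score(x, mu, sigma):
    """Number of SDs the value x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

z1 = z_score(92, mu=80, sigma=10)  # exam 1 score
z2 = z_score(90, mu=80, sigma=8)   # exam 2 score

print(z1, z2)   # 1.2 1.25
print(z2 > z1)  # True: 90 on exam 2 is the better relative score
```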
[Figure: X ~ N(6, 2) plotted over x = 0 to 12 is transformed by Z = (X - 6)/2 into the standard normal (µ = 0, σ = 1) over z = -3 to 3, with area .5 on each side of the mean.]
Pdf of a standard normal curve

A normal random variable X has the pdf

f(x) = (1 / (σ√(2π))) · e^(-(1/2)((x - µ)/σ)²)

Substituting µ = 0 and σ = 1 for Z ~ N(0, 1), the pdf for the standard normal random variable becomes

f(z) = (1 / √(2π)) · e^(-z²/2)

Z = standard normal random variable, with µ = 0 and σ = 1.
Standard Normal Distribution
[Figure: Standard normal curve over z = -3 to 3, symmetric about 0 with area .5 on each side.]
Table Z is the standard Normal table. We have to convert our data to z-scores before using the table.
The figure shows us how to find the area to the left when we have a z-score of 1.80:
Finding Normal Percentiles by Hand (cont.)
Areas Under the Z Curve: Using the Table

P(0 < Z < 1) = .8413 - .5 = .3413

[Figure: Standard normal curve with area .3413 shaded between 0 and 1, leaving .1587 in the right tail.]
Standard normal probabilities have been calculated and are provided in table Z.
The tabulated probabilities correspond to the area between Z = -∞ and some z₀, i.e. P(-∞ < Z < z₀).
z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
0.0 0.5000 0.5040 0.5080 0.5120 0.5160 0.5199 0.5239 0.5279 0.5319 0.5359
0.1 0.5398 0.5438 0.5478 0.5517 0.5557 0.5596 0.5636 0.5675 0.5714 0.5753
0.2 0.5793 0.5832 0.5871 0.5910 0.5948 0.5987 0.6026 0.6064 0.6103 0.6141
… … … …
1 0.8413 0.8438 0.8461 0.8485 0.8508 0.8531 0.8554 0.8577 0.8599 0.8621
… … … …
1.2 0.8849 0.8869 0.8888 0.8907 0.8925 0.8944 0.8962 0.8980 0.8997 0.9015
… … … …
Example (continued): X ~ N(60, 8). Find P(X < 70).

P(X < 70) = P((X - 60)/8 < (70 - 60)/8) = P(Z < 1.25)

In this example z₀ = 1.25; from the table row z = 1.2, column 0.05:

P(Z < 1.25) = 0.8944
Examples

P(0 ≤ Z ≤ 1.27) = .8980 - .5 = .3980

[Figure: area .3980 shaded between z = 0 and z = 1.27.]

P(Z ≥ .55) = A1 = 1 - A2 = 1 - .7088 = .2912

[Figure: A2 = area to the left of z = .55; A1 is the remaining right tail.]
P(-2.24 ≤ Z ≤ 0) = .5 - .0125 = .4875

[Figure: area .4875 shaded between z = -2.24 and 0, with .0125 in the left tail.]

P(Z ≤ -1.85) = .0322
Examples (cont.)

P(-1.18 ≤ Z ≤ 2.73) = A - A1 = .9968 - .1190 = .8778

[Figure: A = area left of z = 2.73 (.9968); A1 = area left of z = -1.18 (.1190).]

P(-1 ≤ Z ≤ 1) = .8413 - .1587 = .6826
Find k such that P(Z < k) = .2514.
Is k positive or negative? The direction of the inequality and the magnitude of the probability decide: since .2514 < .5, k must be negative.
Look up .2514 in the body of the table; the corresponding entry is -.67, so k = -.67.
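Both the forward table lookup and this inverse lookup can be sketched with the standard library's NormalDist:

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mu = 0, sigma = 1

# Forward lookup, as in the printed table: P(Z < -0.67)
print(round(Z.cdf(-0.67), 4))  # 0.2514

# Inverse lookup: find k with P(Z < k) = .2514
k = Z.inv_cdf(0.2514)
print(round(k, 2))             # -0.67, negative since .2514 < .5
```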
Examples (cont.)

For X ~ N(275, 43):

P(X > 250) = P((X - 275)/43 > (250 - 275)/43) = P(Z > -.58)
= 1 - .2810 = .7190
Examples (cont.)

P(225 < X < 375) = P((225 - 275)/43 < (X - 275)/43 < (375 - 275)/43)
= P(-1.16 < Z < 2.33) = .9901 - .1230 = .8671
X ~ N(275, 43): find k so that P(X < k) = .9846.

.9846 = P(X < k) = P((X - 275)/43 < (k - 275)/43) = P(Z < (k - 275)/43)

From the standard normal table, P(Z < 2.16) = .9846, so

(k - 275)/43 = 2.16, giving k = 2.16(43) + 275 = 367.88.

[Figure: standard normal curve with area .9846 to the left of z = 2.16.]
A machine regulates blue dye for mixing paint; it can be set to discharge an average of μ ml of dye per can of paint.
Amount discharged: X ~ N(µ, .4 ml). If more than 6 ml is discharged into a paint can, the shade of blue is unacceptable.
Determine the setting μ so that only 1% of the cans of paint will be unacceptable.
Example
Solution

X = amount of dye discharged into a can, X ~ N(µ, .4); determine µ so that P(X > 6) = .01.

.01 = P(X > 6) = P((X - µ)/.4 > (6 - µ)/.4) = P(Z > (6 - µ)/.4)

From the standard normal table, P(Z > 2.33) = .01, so

(6 - µ)/.4 = 2.33, giving µ = 6 - 2.33(.4) = 5.068.
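The same answer falls out of an inverse-CDF call; a sketch (note the exact 99th-percentile z is 2.326..., which the table rounds to 2.33):

```python
from statistics import NormalDist

# P(X > 6) = .01 means the 99th percentile of N(mu, 0.4) must sit at 6:
z99 = NormalDist().inv_cdf(0.99)  # ~2.326; the table rounds this to 2.33
mu = 6 - z99 * 0.4
print(round(mu, 2))  # 5.07 (the hand calculation with z = 2.33 gives 5.068)
```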
In statistics, normality tests are used to determine whether a data set is well modelled by a normal distribution, and to compute how likely it is that the random variable underlying the data set is normally distributed.
More precisely, the tests are a form of model selection, and can be interpreted several ways, depending on one's interpretations of probability.
Tests for Normality
Graphical methods
An informal approach to testing normality is to compare a histogram of
the sample data to a normal probability curve.
The empirical distribution of the data (the histogram) should be
bell-shaped and resemble the normal distribution. This might be difficult to
see if the sample is small.
In this case one might proceed by regressing the data against the
quantiles of a normal distribution with the same mean and variance as the
sample.
Lack of fit to the regression line suggests a departure from normality.
A graphical tool for assessing normality is the normal probability plot, a quantile-quantile plot (QQ plot) of the standardized data against the standard normal distribution.
Here the correlation between the sample data and normal quantiles (a measure of the goodness of fit) measures how well the data is modeled by a normal distribution.
For normal data the points plotted in the QQ plot should fall approximately on a straight line, indicating high positive correlation.
These plots are easy to interpret and also have the benefit that outliers are easily identified.
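This QQ-plot correlation can be sketched without any plotting library by correlating the sorted data with standard normal quantiles; the simulated data sets are illustrations only:

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

def qq_correlation(xs):
    """Correlation between sorted data and standard normal quantiles.

    Values near 1 indicate a straight-line QQ plot, i.e. near-normality.
    """
    n = len(xs)
    sample = sorted(xs)
    theory = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
    mx, my = fmean(sample), fmean(theory)
    sxy = sum((x - mx) * (y - my) for x, y in zip(sample, theory))
    sxx = sum((x - mx) ** 2 for x in sample)
    syy = sum((y - my) ** 2 for y in theory)
    return sxy / sqrt(sxx * syy)

random.seed(1)
normal_data = [random.gauss(0, 1) for _ in range(200)]
skewed_data = [random.expovariate(1.0) for _ in range(200)]

print(round(qq_correlation(normal_data), 3))  # close to 1
print(round(qq_correlation(skewed_data), 3))  # noticeably lower
```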
A simple back-of-the-envelope test takes the sample maximum and minimum and computes their z-score, or more properly t-statistic (the number of sample standard deviations by which a sample lies above or below the sample mean), and compares it to the 68-95-99.7 rule.
This test is useful in cases where one faces kurtosis risk (where large deviations matter) and has the benefit that it is
very easy to compute and to communicate: non-statisticians can easily grasp that "6σ events don't happen in normal distributions".
Back-of-the-envelope test
Tests of univariate normality include:
D'Agostino's K-squared test
Jarque–Bera test
Anderson–Darling test
Cramér–von Mises criterion
Lilliefors test for normality (itself an adaptation of the Kolmogorov–Smirnov test)
Shapiro–Wilk test
Pearson's chi-squared test
Shapiro–Francia test
Frequentist tests
A 2011 paper from The Journal of Statistical Modeling and Analytics concludes that Shapiro-Wilk has the best power for a given significance, followed closely by Anderson- Darling when comparing the Shapiro-Wilk, Kolmogorov- Smirnov, Lilliefors, and Anderson-Darling tests.
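Shapiro–Wilk is available in scientific libraries (e.g. scipy.stats.shapiro). As a dependency-free illustration, here is a sketch of another test from the list, the Jarque–Bera test, which rejects normality when sample skewness and excess kurtosis are jointly too large; the simulated data sets are hypothetical:

```python
import random
import statistics

def jarque_bera(xs):
    """Jarque-Bera statistic n/6*(S^2 + K^2/4); ~chi-squared(2) if normal."""
    n = len(xs)
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    skew = sum((x - m) ** 3 for x in xs) / (n * s ** 3)
    ex_kurt = sum((x - m) ** 4 for x in xs) / (n * s ** 4) - 3
    return n / 6 * (skew ** 2 + ex_kurt ** 2 / 4)

random.seed(2)
normal_data = [random.gauss(50, 10) for _ in range(500)]
skewed_data = [random.expovariate(0.1) for _ in range(500)]

# The 5% critical value of chi-squared with 2 df is about 5.99:
# normal data typically stays below it, skewed data far exceeds it.
print(round(jarque_bera(normal_data), 2))
print(round(jarque_bera(skewed_data), 2))  # far above 5.99: reject normality
```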
Ralph B. D'Agostino (1986). “Tests for the Normal Distribution”. In D'Agostino, R.B. and Stephens, M.A. Goodness-of-Fit Techniques. New York: Marcel Dekker. ISBN 0-8247-7487-6.
More recent tests of normality include the energy test (Székely and
Rizzo) and the tests based on the empirical characteristic function
(ecf), e.g. Epps and Pulley, Henze–Zirkler, and BHEP (the Baringhaus–
Henze–Epps–Pulley multivariate normality test).
The energy and the ecf tests are powerful tests that apply for testing
univariate or multivariate normality and are statistically consistent
against general alternatives.
Kullback–Leibler divergences between the whole posterior distributions
of the slope and variance do not indicate non-normality.
However, the ratio of expectations of these posteriors and the
expectation of the ratios give similar results to the Shapiro–Wilk statistic
except for very small samples, when non-informative priors are used.
Spiegelhalter suggests using a Bayes factor to compare normality with a
different class of distributional alternatives. This approach has been
extended by Farrell and Rogers-Stewart.
Bayesian tests
One application of normality tests is to the residuals from a linear regression model. If they are not normally distributed, the residuals should not be used in Z tests or in any other tests derived from the normal distribution, such as t tests, F tests and chi-squared tests.
If the residuals are not normally distributed, then the dependent variable or at least one explanatory variable may have the wrong functional form, or important variables may be missing, etc.
Correcting one or more of these systematic errors may produce residuals that are normally distributed.
Results of Normality tests
Most of the statistical analyses presented are based on the bell-shaped or normal distribution.
The major importance of the normal distribution is the statistical inference of how often an observation can occur normally in a population.
The normal distribution is the most important and most widely used distribution in statistics.
The normal distribution is very useful in practice and makes statistical analysis easy.
Conclusion
1. Essentials of Community Dentistry by Soben Peter.
2. Basic and Clinical Biostatistics by Dawson and Trapp.
3. Biostatistics by Dr. Vishweswara Rao.
4. Health Research Methodology by Okolo.
5. Biostatistics by Sarmakaddam.
6. Biostatistics by Kim and Dialey.
7. http://en.wikipedia.org/w/index.php?title=File:Normal_Distribution_PDF
8. Introduction to Normal Distributions by David M. Lane.
9. Introduction to Statistics Online Edition by David M. Lane. Other authors: David Scott, Mikki Hebl, Rudy Guerra, Dan Osherson, and Heidi Ziemer.
Spiegelhalter, D.J. (1980). An omnibus test for normality for small samples. Biometrika, 67, 493–496. doi:10.1093/biomet/67.2.493
Farrell, P.J., Rogers-Stewart, K. (2006) “Comprehensive study of tests for normality and symmetry: extending the Spiegelhalter test”. Journal of Statistical Computation and Simulation, 76(9), 803 – 816. doi:10.1080/10629360500109023
References
RGUHS - April 2000, Sept. 2007: normal curve (10 marks).
Sumandeep University - April 2012: normal curve (10 marks).
Manipal University - April 2007: normal curve (10 marks).
Mangalore University - July 1993, December 1997 (10 marks).
Previous year questions
THANK YOU