Bio Statistics
Introduction
• Any science needs precision for its development.
• For precision, facts, observations or measurements have to be
expressed in figures.
• “It has been said when you can measure what you are speaking about
and express it in numbers, you know something about it, but when
you cannot express it in numbers your knowledge is of meagre and
unsatisfactory kind.” - Lord Kelvin
• Similarly in medicine, be it diagnosis, treatment or research
everything depends on measurement.
• E.g. you have to count the number of missing teeth or measure the
vertical dimension and express it as a number so that it makes
sense.
• A statistic or datum is a measured or counted fact or piece of
information stated as a figure, such as the height of one person or
the birth weight of a baby.
• Statistics or data is the plural of the same.
• Statistics is also the science of figures.
• Biostatistics is the term used when the tools of statistics are applied
to data derived from biological sciences such as medicine.
Applications and uses of bio statistics as a science
• In physiology and anatomy
– To define the limits of normality for variables such as height,
weight or blood pressure in a population.
– Variation beyond the natural limits may be pathological, i.e.
abnormal due to the play of certain external factors.
– To find correlation between two variables like height and
weight.
• In pharmacology
– To find the action of drugs
– To compare the action of two drugs or two successive dosages
of same drug
– To find the relative potency of a new drug with respect to a
standard drug
• In medicine
– To compare the efficiency of a particular drug, operation or line
of treatment
– To find association between two attributes such as cancer and
smoking
– To identify signs and symptoms of disease
• In community medicine and public health
– To test usefulness of sera or vaccine in the field
– In epidemiologic studies the role of causative factors is
statistically tested
• In research
– It helps in compiling data, drawing conclusions and
making recommendations.
• For students
– By learning the methods in biostatistics a student learns to
evaluate articles published in medical and dental journals or
papers read in medical and dental conferences.
– He also understands the basic methods of observation in his
clinical practice and research.
Common Statistical Terms
• Constant
– Quantities that do not vary e.g. in biostatistics, mean, standard
deviation are considered constant for a population
• Variable
– A characteristic which takes different values for different
persons, places or things, such as height, weight or blood pressure
• Population
– Population includes all persons, events and objects under study.
It may be finite or infinite.
• Sample
– Defined as a part of a population generally selected so as to be
representative of the population whose variables are under
study
• Parameter
– It is a constant that describes a population e.g. in a college there
are 40% girls. This describes the population, hence it is a
parameter.
• Statistic
– A statistic is a constant that describes the sample, e.g. in a
sample of 200 students of the same college 45% are girls. This 45%
is a statistic as it describes the sample
• Attribute
– A characteristic by which the population can be divided
into categories or classes, e.g. gender, caste, religion.
Source of data
• The main sources for collection of data
– Experiments
– Surveys
– Records
• Experiments
– Experiments are performed to collect data for investigations
and research by one or more workers.
• Surveys
– Carried out for Epidemiological studies in the field by trained
teams to find incidence or prevalence of health or disease in a
community.
• Records
– Records are maintained as a routine in registers and books over
a long period of time
– They provide ready-made data.
Types of data
• Data is of two types
• Qualitative or discrete data
• In such data there is no notion of magnitude or size of an
attribute as the same cannot be measured.
• The number of persons having the same attribute is variable
and is counted
• e.g. out of 100 people 75 have class I occlusion, 15 have
class II occlusion and 10 have class III occlusion.
• Classes I, II and III are attributes which cannot be measured in
figures; only the number of people having each can be determined
• Quantitative or continuous data
• In this the attribute has a magnitude. Both the attribute and the
number of persons having the attribute vary
• E.g. freeway space. It varies for every patient. It is a quantity
with a different value for each individual and is measurable. It
is continuous as it can take any value between 2 and 4 mm, such
as 2.10, 2.55 or 3.07.
Data presentation
• Statistical data once collected should be systematically arranged and
presented
– To arouse interest of readers
– For data reduction
– To bring out important points clearly and strikingly
– For easy grasp and meaningful conclusions
– To facilitate further analysis
– To facilitate communication
• Two main types of data presentation are
– Tabulation
– Graphic representation with charts and diagrams
Tabulation
• It is the most common method
• Data presentation is in the form of columns and rows
• It can be of the following types
– Simple tables
– Frequency distribution tables
Simple Table
Number of patients at KIDS, Bgm
Month       Number of patients
Jan 06      2,800
Feb 06      1,900
March 06    1,750
Frequency distribution table
• In a frequency distribution table, the data is first split into convenient
groups (class intervals) and the number of items (frequency) which
occur in each group is shown in the adjacent column.
Number of Cavities    Number of Patients
0 to 3                78
3 to 6                67
6 to 9                32
9 and above           16
Charts and diagrams
• Useful method of presenting statistical data
• Powerful impact on imagination of the people
They are
• Bar chart
• Histogram
• Frequency polygon
• Frequency curve
• Line diagram
• Cumulative frequency diagram or ogive
• Scatter diagram
• Pie chart
• Pictogram
• Spot map or map diagram
Bar chart
• Length of bars drawn vertical or horizontal is proportional to
frequency of variable.
• A suitable scale is chosen
• Bars are usually equally spaced
• They are of three types
– Simple bar chart
– Multiple bar chart
• two or more variables are grouped together
– Component bar chart
• bars are divided into two or more parts
• each part represents a certain item and is proportional to the
magnitude of that item
[Figures: Simple Bar Chart, Multiple Bar Chart, Component Bar Chart]
Histogram
• pictorial presentation of frequency distribution
• consists of series of rectangles
• class intervals are given on the horizontal axis and frequencies on the vertical axis
• area of rectangle is proportional to the frequency
Frequency polygon
• obtained by joining midpoints of histogram blocks at the height of
frequency by straight lines usually forming a polygon
Frequency curve
• when the number of observations is very large and the class interval is
reduced, the frequency polygon loses its angulations, becoming a
smooth curve known as the frequency curve
Line diagram
• line diagrams are used to show the trends of events with the passage of
time
Cumulative Frequency Diagram
• graphical representation of cumulative frequency.
• the cumulative frequency of a class is obtained by adding its frequency
to the frequencies of all previous classes
Scatter or Dot diagram
• shows relationship between two variables
• If the dots are clustered showing a straight line, it shows a relationship
of linear nature
Pie chart
• In this, frequencies of the groups are shown as segments of a circle
• Degree of angle denotes the frequency
• Angle is calculated by
– Angle = (class frequency / total observations) × 360°
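As a rough illustration of the angle formula, the short Python sketch below converts class frequencies into pie chart angles. The counts are the occlusion figures quoted earlier under types of data (75, 15 and 10 out of 100); the function name pie_angles is made up for this example.

```python
def pie_angles(frequencies):
    """Return the pie chart angle (in degrees) for each class frequency."""
    total = sum(frequencies.values())
    # angle = (class frequency / total observations) x 360
    return {label: freq / total * 360 for label, freq in frequencies.items()}


# Occlusion counts taken from the qualitative data example in the text
occlusion = {"Class I": 75, "Class II": 15, "Class III": 10}
angles = pie_angles(occlusion)

for label, angle in angles.items():
    print(f"{label}: {angle:.0f} degrees")
```

The angles always add up to 360 degrees, which is a quick check that no class has been left out.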
Pictogram
• Popular method of presenting data to the common man
Spot map or map diagram
• These maps are prepared to show geographic distribution of
frequencies of characteristics
Measures of statistical averages or central tendency
• Average value in a distribution is the one central value around which
all the other observations are concentrated
• Average value helps
– to find most characteristic value of a set of measurements
– to find which group is better off by comparing the average of
one group with that of the other
• the most commonly used averages are
– mean
– median
– mode
Mean
• refers to arithmetic mean
• it is the summation of all the observations divided by the total number
of observations (n)
• denoted by x̄ for a sample and µ for a population
• x̄ = (x1 + x2 + x3 + … + xn) / n
• Advantages – it is easy to calculate
• Disadvantages – influenced by extreme values
Median
• When all the observations are arranged in either ascending or
descending order, the middle observation is known as the median
• In case of an even number of observations the average of the two middle values is taken
• Median is a better indicator of central value when extreme values are
present, as it is not affected by them
Mode
• The most frequently occurring observation in a data set is called the mode
• Not often used in medical statistics.
Example
• Number of decayed teeth in 10 children
2,2,4,1,3,0,10,2,3,8
• Mean = 35 / 10 = 3.5
• Median: arranged in order (0,1,2,2,2,3,3,4,8,10) the two middle
values are 2 and 3, so median = (2 + 3) / 2 = 2.5
• Mode = 2 (occurs 3 times).
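The three averages for the decayed teeth example can be checked with Python's standard statistics module. This is only a sketch using the ten values listed above; the variable names are chosen for illustration. Note that the ten values add up to 35.

```python
import statistics

# Number of decayed teeth in 10 children (values from the example)
decayed_teeth = [2, 2, 4, 1, 3, 0, 10, 2, 3, 8]

total = sum(decayed_teeth)                  # sum of all observations
mean = statistics.mean(decayed_teeth)       # total / number of observations
median = statistics.median(decayed_teeth)   # average of the two middle values
mode = statistics.mode(decayed_teeth)       # most frequently occurring value

print("Total :", total)
print("Mean  :", mean)
print("Median:", median)
print("Mode  :", mode)
```

The mean is pulled upwards by the two children with 8 and 10 decayed teeth, while the median is not, which is the point made about extreme values above.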
Types of variability
• There are three types of variability
– Biological variability
– Real variability
– Experimental variability
• Experimental variability is of three subtypes
– Observer Error
– Instrumental Error
– Sampling Error
Biological variability
• It is the natural difference which occurs in individuals due to age,
gender and other attributes which are inherent
• This difference is small and occurs by chance and is within certain
accepted biological limits
• e.g. vertical dimension may vary from patient to patient
Real Variability
• such variability is more than the normal biological limits
• the cause of difference is not inherent or natural and is due to some
external factors
• e.g. difference in incidence of cancer among smokers and non
smokers may be due to excessive smoking and not due to chance only
Experimental Variability
• it occurs due to the experimental study
• they are of three types
– Observer error
• the investigator may alter some information or not record
the measurement correctly
– Instrumental error
• this is due to defects in the measuring instrument
• both the observer and the instrumental error are called non
sampling errors
– Sampling error or error of bias
• this is the error which occurs when the samples are not
chosen at random from the population.
• Thus the sample does not truly represent the population
Measures of variation or dispersion
• Biological data collected by measurement shows variation
• e.g. BP of an individual can show variation even if taken by
standardized method and measured by the same person.
• Thus one should know what is the normal variation and how to
measure it.
• The various measures of variation or dispersion are
• Range
• Mean or average deviation
• Standard deviation
• Coefficient of variation
Range
• It is the simplest
• Defined as the difference between the highest and the lowest figures
in a sample
• Defines the normal limits of a biological characteristic e.g. freeway
space ranges between 2-4 mm
• Not satisfactory as based on two extreme values only
Mean deviation
• It is the average of the deviations from the mean in any
distribution, ignoring the + or – sign
• Denoted by MD
MD = ∑ |x – x̄| / n
x = observation
x̄ = mean
n = number of observations
Standard deviation
• Also called root mean square deviation
• It is an improvement over mean deviation and is the measure used most
commonly in statistical analysis
• Denoted by SD or s for a sample and σ for a population
• Given by the formula
SD = √( ∑ (x – x̄)² / n )
(n – 1 is used in place of n for a sample)
• Greater the standard deviation, greater will be the magnitude of
dispersion from mean
• Small standard deviation means a high degree of uniformity of the
observations
• Usually measurements beyond the range of mean ± 2 SD are considered rare
or unusual in any distribution
• Uses of Standard Deviation
• It summarizes the deviation of a large distribution from its
mean.
• It helps in finding the suitable size of sample e.g. greater
deviation indicates the need for larger sample to draw
meaningful conclusions
• It helps in calculation of standard error which helps us to
determine whether the difference between two samples is by
chance or real
Coefficient of variation
• It is used to compare the variability of characteristics measured in
different units, e.g. height and weight
• Denoted by CV
CV = (SD / Mean) × 100
• It is expressed as a percentage
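The measures of dispersion described above can be worked through in a few lines of Python. The sketch below reuses the decayed teeth counts from the averages example as sample data (an illustrative choice, not one made in the text) and computes the range, mean deviation, standard deviation with both the n and n – 1 divisors, and the coefficient of variation.

```python
import statistics

# Illustrative data: decayed teeth in 10 children, from the earlier example
data = [2, 2, 4, 1, 3, 0, 10, 2, 3, 8]
n = len(data)
mean = sum(data) / n

# Range: highest minus lowest observation
value_range = max(data) - min(data)

# Mean deviation: average of absolute deviations from the mean
mean_deviation = sum(abs(x - mean) for x in data) / n

# Standard deviation: divisor n (population) and n - 1 (sample)
sd_population = statistics.pstdev(data)
sd_sample = statistics.stdev(data)

# Coefficient of variation: SD as a percentage of the mean
cv = sd_sample / mean * 100

print(f"Range           : {value_range}")
print(f"Mean deviation  : {mean_deviation:.2f}")
print(f"SD (divisor n)  : {sd_population:.2f}")
print(f"SD (divisor n-1): {sd_sample:.2f}")
print(f"CV              : {cv:.1f}%")
```

With only 10 observations the two versions of the SD differ noticeably; the gap shrinks as the sample size grows.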
Normal distribution or normal curve
• So much physiological variation occurs in any observation that it is
necessary to
– Define normal limits
– Determine the chances of an observation being normal
– Determine the proportion of observations that lie within a
given range
• The normal distribution or normal curve, used most commonly in statistics,
helps us to find these
• Large number of observations with a narrow class interval gives a
frequency curve called the normal curve
It has the following characteristics
• Bell shaped
• Bilaterally symmetrical
• Frequency increases from one side reaches its highest and decreases
exactly the way it had increased
• The highest point denotes mean, median and mode which coincide
• Mean ± 1 SD includes 68.27% of all observations. Such
observations are fairly common
• Mean ± 2 SD includes 95.45% of all observations, i.e. by convention
values beyond this range are uncommon or rare. Their chance of
being normal is 100 – 95.45%, i.e. only 4.55%.
• Mean ± 3 SD includes 99.73%. Values beyond this range are very rare. Their
chance of being normal is only 0.27%
• These limits on either side of the mean are called confidence
limits
• The look of a frequency distribution curve may vary depending on its mean
and SD. Thus it becomes necessary to standardize it.
• E.g. one study has an SD of 3 and another has an SD of 2, thus it becomes
difficult to compare them
• Thus normal curve is standardized by using the unit of standard
deviation to place any measurement with reference to mean.
• The curve that emerges through this procedure is called standard
normal curve
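The percentages quoted for mean ± 1, 2 and 3 SD can be reproduced from the normal curve itself. The sketch below uses the error function from Python's math module; the helper name coverage_within is an assumption made for this example.

```python
import math


def coverage_within(k):
    """Fraction of a normal distribution lying within mean +/- k SD."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    inside = coverage_within(k) * 100
    print(f"mean +/- {k} SD: {inside:.2f}% inside, {100 - inside:.2f}% outside")
```

Because the calculation is expressed in units of SD, it gives the same answer whatever the mean and SD of the original measurements are, which is why the curve is standardized.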
Properties of standard normal curve
• smooth bell shaped
• perfectly symmetrical
• based on infinite number of observations thus curve does not touch X
axis
• mean is zero
• SD is always 1
• total area under the curve is 1
• mean median mode coincide
• the unit of SD here is the relative or standard normal deviate and is
denoted by Z
• Z = (Observation – Mean) / SD
• With the help of Z value we can find the area under the curve from a
table
• This area helps to give the P value
Sampling
• It is not possible to include each and every member of a population as it
would be time consuming, costly and laborious.
• Therefore sampling is done
• Sampling is a process by which some units of a population or universe
are selected for study and, by subjecting them to statistical
computation, conclusions are drawn about the population from which
these units are drawn
• The sample will be representative of the entire population only if
• It is sufficiently large
• It is unbiased
• Such a sample will have its statistics almost equal to the parameters of
the entire population
Two main characteristics of a representative sample are
• Precision
• Unbiased character
Precision
• Precision depends on sample size
• Ordinarily sample size should not be less than 30
• Precision = √n/s
• n = sample size , s = standard deviation
• Precision is directly proportional to square root of sample size, greater
the sample size greater the precision
• Also greater the SD, less will be the precision
• Thus in such cases to obtain precision, sample size needs to be
increased
Unbiased character
• The sample should be unbiased i.e. every individual should have an
equal chance to be selected in the sample.
• Thus a standard random sampling method should be used
• Non sampling errors can be taken care of by
– Using standardized instruments and criteria
– By single, double or triple blind trials
– Use of a control group
Determination of sample size
For Quantitative Data
• The investigator needs to decide how large an error due to sampling
defect is allowable i.e. allowable error L
• Either the investigator should start with assumed SD or do a pilot
study to estimate SD
Sample size n = 4 SD² / L²
• E.g. the mean pulse rate of a population is 70 beats per min with a standard
deviation of 8 beats. What will be the sample size if the allowable error is
± 1 beat?
n = (4 × 8 × 8) / (1 × 1) = 256
• If L is smaller n will be larger, i.e. the larger the sample size, the smaller the error.
For qualitative data
• In such data we deal with proportion
Sample size n = 4pq / L²
• p = proportion of positive character
• q = proportion of negative character
• q = 1 – p (or 100 – p if expressed in percent)
• L = allowable error, usually 10% of p
• E.g. the incidence rate in the last influenza epidemic was found to be 5% of
the population exposed. What should be the size of the sample
to find the incidence rate in the current epidemic if the allowable error is 10%?
• p = 5%, q = 95%
• L = 10% of p = 0.5%
n = (4 × 5 × 95) / (0.5 × 0.5) = 7600
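Both sample size formulas, for quantitative and for qualitative data, can be written as small Python functions. The sketch below reproduces the pulse rate and influenza examples from the text; the function names are made up for illustration, and proportions are taken in percent as in the worked example.

```python
def sample_size_quantitative(sd, allowable_error):
    """n = 4 * SD^2 / L^2 for quantitative data."""
    return 4 * sd ** 2 / allowable_error ** 2


def sample_size_qualitative(p_percent, allowable_error_percent):
    """n = 4 * p * q / L^2 for qualitative data, with p, q and L in percent."""
    q_percent = 100 - p_percent
    return 4 * p_percent * q_percent / allowable_error_percent ** 2


# Pulse rate example: SD = 8 beats, allowable error = 1 beat
n_pulse = sample_size_quantitative(sd=8, allowable_error=1)

# Influenza example: p = 5%, allowable error = 10% of p = 0.5%
p = 5
L = 0.10 * p
n_influenza = sample_size_qualitative(p_percent=p, allowable_error_percent=L)

print("Pulse rate study :", round(n_pulse))
print("Influenza study  :", round(n_influenza))
```

Halving the allowable error in either function multiplies the required sample size by four, since L appears squared in the denominator.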
Probability or p value
• Concept of probability is very important in statistics
• Probability is the chance of occurrence of any event or of any permutation
or combination of events.
• It is denoted by p for sample and P for population
• In various tests of significance we are often interested to know
whether the observed difference between 2 samples is due to chance
(sampling variation) or due to some external factor.
• Here the probability or p value is used
• P ranges from 0 to 1
• 0 = there is no chance that the observed difference could be due to
sampling variation
• 1 = it is absolutely certain that observed difference between 2 samples
is due to sampling variation
• However such extreme values are rare.
• P = 0.4 i.e. chances that the difference is due to sampling variation is
4 in 10
• Obviously the chances that it is not due to sampling variation will be 6
in 10
• The essence of any test of significance is to find out p value and draw
inference
• If p value is 0.05 or more
• it is customary to accept that difference is due to chance
(sampling variation) .
• The observed difference is said to be statistically not
significant.
• If p value is less than 0.05
• the observed difference is not due to chance but due to the role of some
external factors.
• The observed difference here is said to be statistically
significant.
From shape of normal curve
• We know that about 95% of observations lie within mean ± 2 SD. Thus the
probability of a value falling outside this range is about 5%
From probability tables
• p value is also determined from probability tables in the case of
Student's t test or chi square test
By area under normal curve
• Here z, the standard normal deviate, is calculated
• Corresponding to the z value, the area under the curve between the mean
and z is determined (A)
• Probability is given by 2(0.5 – A)
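The area method can be sketched in Python without a printed table, since the area A between the mean and z is available through the error function. The function names below are assumptions for this example, and the z values tried are arbitrary illustrations.

```python
import math


def area_mean_to_z(z):
    """Area A under the standard normal curve between the mean (0) and z."""
    return 0.5 * math.erf(abs(z) / math.sqrt(2))


def two_tailed_p(z):
    """Probability of a value at least this far from the mean: 2(0.5 - A)."""
    return 2 * (0.5 - area_mean_to_z(z))


for z in (1.0, 1.96, 2.0, 2.58):
    a = area_mean_to_z(z)
    print(f"z = {z:<4}  A = {a:.4f}  p = {two_tailed_p(z):.4f}")
```

The factor of 2 counts both tails of the curve, so this is the two tailed probability; z = 1.96 gives p very close to 0.05, which is why z > 2 is used as a convenient rule of thumb for significance.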
Tests of significance
• Whatever be the sampling procedure or the care taken while selecting
sample, the sample statistics will differ from the population
parameters
• Variations between 2 samples drawn from the same population
may also occur
• i.e. differences in the results between two research workers for the
same investigation may be observed
• Thus it becomes important to find out the significance of this
observed variation
• i.e. whether it is due to
• chance or biological variation (statistically not significant) OR
• the influence of some external factors (statistically
significant)
• To test whether the variation observed is of significance, the various
tests of significance are done. The test of significance can be broadly
classified as
1. Parametric tests
2. Non parametric tests
Parametric tests
• Parametric tests are those tests in which certain assumptions are made
about the population
– Population from which sample is drawn has normal distribution
– The variances of the samples do not differ significantly
– The observations found are truly numerical thus arithmetic
procedure such as addition, division, and multiplication can be
used
• Since these tests make assumptions about the population parameters
they are called parametric tests.
• These are usually used to test the difference
• They are:
– Student's t test (paired or unpaired)
– ANOVA
– Test of significance between two means
Non parametric tests
• In many biological investigations, the research worker may not know
the nature of distribution or other required values of the population.
• Also some biological measurements may not be true numerical values
hence arithmetic procedures are not possible in such cases.
• In such cases distribution free or non parametric tests are used, in
which no assumptions are made about the population parameters, e.g.
• Mann Whitney test
• Chi square test
• Phi coefficient test
• Fisher's Exact test
• Sign Test
• Friedman's Test
• Tests of significance can also be divided into one tailed and two tailed tests
Two tailed test
• This test determines if there is a difference between the two groups
without specifying whether difference is higher or lower
• It includes both ends or tails of the normal distribution
• Such a test is called a two tailed test
• E.g. when one wants to know if the mean IQ in malnourished children is
different from that in well nourished children but does not specify if it is
more or less
One tailed test
• In this test of significance one wants to know specifically whether the
difference between the two groups is higher or lower
• i.e. the direction, plus or minus side, is specified.
• Then one end or tail of the distribution is excluded
• E.g. if one wants to know if malnourished children have a lower mean IQ
than well nourished children, then the higher side of the distribution will be
excluded
• Such a test of significance is called a one tailed test
Stages in performing test of significance
• State the null hypothesis
• State the alternative hypothesis
• Accept or reject the null hypothesis
• Finally determine the p value
State the null hypothesis
• Null hypothesis
• It is a hypothesis of no difference between statistics of a sample and
parameter of the population or between statistics of two samples
• It nullifies the claim that the experimental result is different from or
better than the one observed already
State the Alternative hypothesis
• It is the hypothesis stating that the sample result is different, i.e. larger or
smaller than the value of the population, or that the statistic of one sample is
different from that of the other
Accept or reject the null hypothesis
• Null Hypothesis is accepted or rejected depending on whether the
result falls in zone of acceptance or zone of rejection
• If the result of a sample falls in the area of mean ± 2SE the null
hypothesis is accepted.
• This area of normal curve is called zone of acceptance for null
hypothesis
• If the result of the sample falls beyond the area of mean ± 2 SE, the
null hypothesis of no difference is rejected and the alternative hypothesis is
accepted
• This area of normal curve is called zone of rejection for null
hypothesis
Finally determine the p value
• P value is determined using any of the previously mentioned methods
• If p ≥ 0.05 the difference is attributed to chance and is not statistically
significant, but if
• p < 0.05 the difference is attributed to some external factor and is statistically
significant
Types of error
• While drawing conclusions in a study we are likely to commit two
types of error.
– Type I error
– Type II error
Type I error
• This type of error occurs when we conclude that the difference is
significant when in fact there is no real difference in the population,
i.e. we reject the null hypothesis when it is true
• Denoted by α
Type II error
• This type of error occurs when we say that the difference is not
significant when in fact there is a real difference between the populations,
i.e. the null hypothesis is not rejected when it is actually false
• It is denoted by β
Tests of significance for large samples
• These tests are used for sample size greater than 30
• The test used is Z test
• Z is the standard normal deviate and has been discussed under normal
distribution
Z = (observation – mean) / SD
• However in the Z test the standard deviation is replaced by the standard error
In Z test, Z = observed difference / standard error
• We know that the standard deviation measures the variation within a
sample
• Standard error is the measure of the difference in values occurring
– between a sample and the population
– between two samples of the same population
• Standard error used in Z test can be
– Standard error of mean
– Standard error of proportion
– Standard error of difference between 2 means
– Standard error of difference between 2 proportions
• If in the Z test Z > 2, i.e. if the observed difference between the 2
means or proportions is greater than 2 times the standard error of the
difference, then p < 0.05 according to the given table
Z    1.6    2.0    2.3    2.6
P    0.1    0.05   0.02   0.01
• Thus the difference is not due to chance and may be due to the influence of
some external factor, i.e. the difference is statistically significant
Standard error of mean
• Used for quantitative data
• Standard error of mean measures the difference between the sample mean and
the population mean, and is given by
SE of mean = SD of sample / √n
• Also the population mean will lie within sample mean ± 2 standard errors of mean
• This will enable us to know whether the sample mean is within the
limits of the population mean
Here Z = (sample mean – population mean) / SE of mean
• E.g. in a random sample of 100 the mean blood sugar is 80 mg% with an SD of
6 mg%. Within what limits will the population mean lie? What can
be said about another sample whose mean is 82 mg%?
SE = 6 / √100 = 6 / 10 = 0.6
• Thus the population mean will be 80 ± 2 × 0.6 = 78.8 to 81.2
• A sample with a mean of 82 mg% is not within the limits of the population mean, thus
it does not seem to be drawn from the same population
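The blood sugar example can be followed step by step in Python. This is a minimal sketch using the figures given in the example; the variable names are chosen for illustration.

```python
import math

# Figures from the blood sugar example
n = 100             # sample size
sample_mean = 80.0  # mg%
sd = 6.0            # mg%

# Standard error of mean = SD / sqrt(n)
se_mean = sd / math.sqrt(n)

# Limits for the population mean: sample mean +/- 2 SE
lower = sample_mean - 2 * se_mean
upper = sample_mean + 2 * se_mean

# Second sample with a mean of 82 mg%
other_mean = 82.0
within_limits = lower <= other_mean <= upper
z = (other_mean - sample_mean) / se_mean

print(f"SE of mean : {se_mean:.2f}")
print(f"Limits     : {lower:.1f} to {upper:.1f}")
print(f"82 mg% within limits? {within_limits} (Z = {z:.2f})")
```

The second mean lies more than 3 standard errors from the first, which is why it is judged unlikely to come from the same population.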
Standard error of difference between 2 means
• Used for quantitative data
• It measures the difference between the means of two samples drawn from the
same population
• It helps to know the significance of the difference obtained by 2
research workers for the same investigation
SE (x̄1 – x̄2) = √( SD1² / n1 + SD2² / n2 )
• E.g. find the significance of the difference in mean heights of 50 girls and
50 boys with the following values
         Mean     SD
Girls    147.4    6.6
Boys     151.6    6.3
SE = √( (6.6)² / 50 + (6.3)² / 50 )
= 1.29
Z = observed difference / SE
Z = (151.6 – 147.4) / 1.29
= 3.26
• Since the Z value is more than 2, p will be less than 0.05
• Thus the difference is statistically significant and it can be concluded that
boys are taller than girls
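The height comparison can be reproduced with a short Python sketch, using the means, SDs and sample sizes from the example. The function name se_difference_of_means is made up for illustration. Carrying full precision gives Z of about 3.25 rather than 3.26, a small difference that comes only from rounding the SE to 1.29 in the hand calculation.

```python
import math


def se_difference_of_means(sd1, n1, sd2, n2):
    """Standard error of the difference between two sample means."""
    return math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)


# Values from the example
girls_mean, girls_sd, girls_n = 147.4, 6.6, 50
boys_mean, boys_sd, boys_n = 151.6, 6.3, 50

se_diff = se_difference_of_means(girls_sd, girls_n, boys_sd, boys_n)
z = (boys_mean - girls_mean) / se_diff

print(f"SE of difference: {se_diff:.2f}")
print(f"Z               : {z:.2f}")
print("Significant at the 5% level (Z > 2):", z > 2)
```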
Standard error of proportion
• In case of qualitative data where character remains same but its
frequency varies we express it in proportion instead of mean
• p is the proportion of individuals having the special character
• q is the proportion of individuals not having the character
• p + q = 1, or 100 if expressed as a percentage
• Standard error of proportion is the unit which measures variation in the
proportion of a character from sample to population
SE of proportion = √( p × q / n )
p = proportion of positive character
q = proportion of negative character
n = sample size
• Also proportion of population = proportion of sample ± 2 SEP
• Thus one can determine whether the proportion of sample is within
limits of population proportion
• E.g. the proportion of blood group B among Indians is 30%. If in a sample of 100
individuals it is 25%, what is your conclusion about the group?
SEP = √( p × q / n ) = √( 25 × 75 / 100 ) = 4.33
Z = observed difference / SE = (30 – 25) / 4.33 = 1.15
• Since Z is < 2, p will be more than 0.05, thus the difference is not
significant.
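The blood group example translates directly into Python. The sketch below follows the text in using the sample proportion (25%) when computing the standard error; the function name is an assumption for this example.

```python
import math


def se_proportion(p_percent, n):
    """Standard error of a proportion, with p and q expressed in percent."""
    q_percent = 100 - p_percent
    return math.sqrt(p_percent * q_percent / n)


# Values from the example
population_p = 30  # % with blood group B among Indians
sample_p = 25      # % with blood group B in the sample
n = 100

sep = se_proportion(sample_p, n)
z = (population_p - sample_p) / sep

print(f"SE of proportion: {sep:.2f}")
print(f"Z               : {z:.2f}")
print("Significant at the 5% level (Z > 2):", z > 2)
```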
Standard error of difference between 2 proportion
• Measures the difference in proportion of a character from sample to
sample
SE (p1 – p2) = √( p1 q1 / n1 + p2 q2 / n2 )
• If typhoid mortality in one sample of 100 is 20% and in another sample of
100 is 30%, is this difference in mortality rate significant?
• p1 = 20, q1 = 80, n1 = 100
• p2 = 30, q2 = 70, n2 = 100
• SE (p1 – p2) = √( 20 × 80 / 100 + 30 × 70 / 100 ) = √37 = 6.08
• Z = (30 – 20) / 6.08 = 1.64
• Z < 2, so p > 0.05, thus the difference observed is not significant
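The typhoid mortality comparison can be checked the same way. This is a sketch with the figures from the example; the function name is made up for illustration and proportions are in percent.

```python
import math


def se_difference_of_proportions(p1, n1, p2, n2):
    """Standard error of the difference between two proportions (in percent)."""
    q1, q2 = 100 - p1, 100 - p2
    return math.sqrt(p1 * q1 / n1 + p2 * q2 / n2)


# Values from the typhoid mortality example
p1, n1 = 20, 100
p2, n2 = 30, 100

se_diff = se_difference_of_proportions(p1, n1, p2, n2)
z = (p2 - p1) / se_diff

print(f"SE of difference: {se_diff:.2f}")
print(f"Z               : {z:.2f}")
print("Significant at the 5% level (Z > 2):", z > 2)
```

Since Z is below 2, the corresponding p value is above 0.05 and the difference in mortality is not statistically significant.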
Test of significance for small samples
• In the case of samples of size less than 30 the Z value will not follow the normal
distribution
• Hence the Z test will not give the correct level of significance.
• In such cases Student's t test is used
• It was given by W. S. Gosset, whose pen name was Student
• There are two types of student t Test
1. Unpaired t test
2. Paired t test
Unpaired t test
• Applied to unpaired data of observation made on individuals of 2
separate groups to find the significance of difference between 2 means
• Sample size is less than 30
• e.g. difference in accuracy in an impression using two different
impression materials
Steps in unpaired t Test are
• Calculate the mean of two samples
• Calculate the combined standard deviation
• Calculate the standard error of the difference between means, which is given by
SE = SD × √( 1/n1 + 1/n2 )
• Calculate the observed difference between means, x̄1 – x̄2
• Calculate t value = observed difference / standard error
• Determine the degrees of freedom, which for one sample is one less than the
number of observations in the sample (n – 1)
• Here the combined degrees of freedom will be (n1 – 1) + (n2 – 1)
• Refer to table and find the probability of the t value corresponding to
degree of freedom
• P< 0.05 states difference is significant
• P> 0.05 states difference is not significant
• In a nutritional study 13 children in group A are given usual diet along
with vitamin A and vitamin D while 12 children in group B take the
usual diet.
• The gain in weight in pounds for both groups after 12 months is
shown in the table
Group A    Group B
5          1
3          3
4          2
3          4
2          2
6          1
3          3
2          4
3          3
6          2
7          2
5          3
3          -
• Are vitamins A and D responsible for the gain in weight?
• Mean of group A = 4
• Mean of group B = 2.5
• Combined SD = 1.37
• SE = 0.548
• t = Observed difference / SE
• t = (4 – 2.5) / 0.548 = 2.74
• Combined degrees of freedom = n1 + n2 – 2
• = 13 + 12 – 2 = 23
• The p value is checked corresponding to the t value at 23 d.f. from the t
table
• It is < 0.02
• Thus the difference is statistically significant
• and is attributed to the role of vitamins A and D
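The steps of the unpaired t test can be traced in Python on the weight gain data from the table. This is a sketch only: the combined SD is computed as the pooled SD (sum of squared deviations of both groups divided by the combined degrees of freedom), which is an assumption about how the figure of 1.37 was obtained, and the critical value 2.069 is the usual two tailed 5% entry for 23 d.f. in a standard t table.

```python
import math
import statistics

# Gain in weight (pounds) from the table
group_a = [5, 3, 4, 3, 2, 6, 3, 2, 3, 6, 7, 5, 3]  # diet + vitamins A and D
group_b = [1, 3, 2, 4, 2, 1, 3, 4, 3, 2, 2, 3]     # usual diet only

n1, n2 = len(group_a), len(group_b)
mean_a = statistics.mean(group_a)
mean_b = statistics.mean(group_b)

# Combined (pooled) SD: assumed method, see the note above
ss_a = sum((x - mean_a) ** 2 for x in group_a)
ss_b = sum((x - mean_b) ** 2 for x in group_b)
df = (n1 - 1) + (n2 - 1)
combined_sd = math.sqrt((ss_a + ss_b) / df)

# Standard error of the difference between means
se = combined_sd * math.sqrt(1 / n1 + 1 / n2)

t = (mean_a - mean_b) / se

T_CRITICAL_5_PERCENT_23_DF = 2.069  # from a standard t table (two tailed)

print(f"Means       : {mean_a} and {mean_b}")
print(f"Combined SD : {combined_sd:.2f}")
print(f"SE          : {se:.3f}")
print(f"t = {t:.2f} at {df} d.f.")
print("Significant at the 5% level:", t > T_CRITICAL_5_PERCENT_23_DF)
```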
Paired t test
• It is applied to paired data of observations from one sample only.
• Used for samples of size less than 30
• The individual gives a pair of observation i.e. observation before and
after taking a drug
• The steps involved are
• Calculate the difference in each pair of observations, i.e. before and after:
x1 – x2 = y
• Calculate the mean of these differences = ȳ
• Calculate the SD of the differences
• Calculate SE = SD / √n
• Determine t = ȳ / SE
• Determine the degree of freedom
• Since there is one sample df = n-1
• Refer to table and find the probability of the t value corresponding to
degree of freedom
• P< 0.05 states difference is significant
• P> 0.05 states difference is not significant
E.g. systolic BP of normal individuals before and after injection of a hypotensive drug is
given in the table. Does the drug lower the BP?
BP before drug (x1)    BP after drug (x2)    Difference x1 – x2 = y
122                    120                   2
121                    118                   3
120                    115                   5
115                    110                   5
126                    122                   4
130                    130                   0
120                    116                   4
125                    124                   1
128                    125                   3
• Mean of difference ȳ = ∑ y / n = 27 / 9 = 3
• SD = √( ∑ (y – ȳ)² / (n – 1) ) = 1.73
• SE = SD / √n = 1.73 / √9 = 1.73 / 3 = 0.58
• t = ȳ / SE = 3 / 0.58 = 5.17
• Degrees of freedom = n – 1 = 9 – 1 = 8
• p value corresponding to t = 5.17 and d.f. 8 is < 0.001
• Thus the difference is highly significant
• Thus the decrease in BP is due to the drug
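The paired t test steps can be followed in Python on the blood pressure readings from the table. This is a sketch; the variable names are illustrative, and the critical value 2.306 is the usual two tailed 5% entry for 8 d.f. in a standard t table. With unrounded intermediate values t comes to about 5.2, slightly above the hand-calculated 5.17, because the SE is not rounded to 0.58.

```python
import math
import statistics

# Systolic BP before and after the drug, from the table
before = [122, 121, 120, 115, 126, 130, 120, 125, 128]
after = [120, 118, 115, 110, 122, 130, 116, 124, 125]

# Difference for each pair: y = x1 - x2
differences = [x1 - x2 for x1, x2 in zip(before, after)]
n = len(differences)

mean_diff = statistics.mean(differences)
sd_diff = statistics.stdev(differences)  # divisor n - 1
se = sd_diff / math.sqrt(n)              # SE = SD / sqrt(n)
t = mean_diff / se
df = n - 1

T_CRITICAL_5_PERCENT_8_DF = 2.306  # from a standard t table (two tailed)

print("Differences :", differences)
print(f"Mean = {mean_diff}, SD = {sd_diff:.2f}, SE = {se:.2f}")
print(f"t = {t:.2f} at {df} d.f.")
print("Significant at the 5% level:", t > T_CRITICAL_5_PERCENT_8_DF)
```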
Chi square test
• Chi square test unlike z and t test is a non parametric test
• The test involves calculation of a quantity called chi square .
• Chi square is denoted by χ²
• It was developed by Karl Pearson
• The most important application of chi square test in medical statistics
are
• Test of proportion
• Test of association
• Test of goodness of fit
• Test of proportion
• Used as an alternate test to find the significance of difference in
2 or more than 2 proportions
• Test of association
• To measure the probability of association between 2 discrete
attributes, e.g. smoking and cancer
• Test of goodness of fit
• Tests whether the observed values of a character differ from the
expected value by chance or due to play of some external factor
χ² = ∑ (O – E)² / E
• χ² denotes chi square
• O = Observed Value
• E = Expected Value
Steps in Chi Square Test
• State the null hypothesis
• Determine the Chi square value
• Find the degree of freedom
• Refer to the chi square table to find the probability value corresponding
to the degrees of freedom
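The steps above can be sketched in Python for a table of observed counts. The counts used here are those of the vaccine field trial in the example that follows. Two things are assumptions of this sketch rather than statements from the text: expected values are computed from the row and column totals under the null hypothesis, and the critical value 3.84 is the usual 5% entry for 1 degree of freedom in a standard chi square table.

```python
def chi_square(observed):
    """Return (chi square, degrees of freedom) for a table of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            # Expected count if there were no difference between the rows
            e = row_totals[i] * col_totals[j] / grand_total
            chi2 += (o - e) ** 2 / e

    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df


# Rows: vaccine A, vaccine B; columns: attacked, not attacked
vaccine_trial = [[22, 68],
                 [14, 72]]

chi2, df = chi_square(vaccine_trial)

CHI2_CRITICAL_5_PERCENT_1_DF = 3.84  # from a standard chi square table

print(f"Chi square = {chi2:.2f} at {df} d.f.")
print("Significant at the 5% level:", chi2 > CHI2_CRITICAL_5_PERCENT_1_DF)
```

The same function works for larger tables, since the degrees of freedom are taken as (rows – 1) × (columns – 1).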
Let us consider the following example
• We are making a field trial of 2 vaccines
• The results of field trial are
Vaccine    Attacked    Not Attacked    Total    Attack Rate
A          22          68              90       24.4%
B          14          72              86       16.3%
Total      36          140             176
• Vaccine B seems to be superior to Vaccine A
• We perform Chi Square test to verify if the vaccine B is superior to