Page 1: BioStatistics

Chapter 1 Introduction

The teaching of theory (3 hours)

Objective:
1. Master common statistical terms: homogeneity and variation; variable, population and sample; the types of data; parameter and statistic; sampling and sampling error; probability, etc.

2. Know well what biostatistics is, the main applications and uses of biostatistics as a science, and how to learn the subject well.

3. Understand the scope of biostatistics and the association among medical statistics, health statistics and vital statistics.

Emphasis:
1. Master common statistical terms and their notations.
2. Know well what biostatistics is and the main applications and uses of biostatistics as a science.
3. Understand the scope of biostatistics and the association among medical statistics, health statistics and vital statistics.

Difficulty: Homogeneity and variation; probability.

Contents:
1. What is biostatistics, and how to learn the subject well? Webster's International Dictionary: biostatistics is a science dealing with the collection, analysis and presentation of masses of numerical data. Dictionary of Epidemiology: biostatistics is the science and art of dealing with variation in data through collection, classification and analysis in such a way as to obtain reliable results.
2. The applications of biostatistics as a science, such as finding the limits of normality, finding whether the difference between means or proportions is significant, finding the correlation between variables, and so on.
3. The scope of biostatistics, the main contents of the textbook, and the association among medical statistics, health statistics and vital statistics.
4. Common statistical terms, including homogeneity and variation; variable, observation unit, observation, data; the types of data (quantitative and qualitative); population and sample; parameter and statistic; sampling and sampling error; probability; and the notations of these terms.

Chapter 2 The process of statistical work

The teaching of theory (5 hours)

Objective:
1. Master the process of statistical work (collection of data, sorting or classification of data, analysis of data) and the steps of drawing a frequency distribution table.

2. Know well the sources and presentation of data, the methods of sorting and analyzing data, and the uses of the frequency distribution table.

3. Understand the association between scientific design and statistical conclusion in research work.

Emphasis:
1. The process of statistical work.


2. The main sources and presentation of data.
3. The steps of drawing a frequency distribution table.
4. The methods for analysis of data: descriptive statistics and inferential statistics.

Difficulty: The methods of inferential statistics.

Contents:
1. The process of statistical work: collection of data, sorting or classification of data, analysis of data.
2. Sources and presentation of data: records (the routine, ready-made information in medical work); experiments on individuals in the laboratory; surveys or investigations in the community or at other specific sites, etc.
3. Sorting or classification of data: first correct the mistakes in the original records, then classify the data; drawing a frequency distribution table is often used in this process.
4. Analysis of data. Descriptive statistics: statistical indices, statistical tables and graphs. Inferential statistics: estimation of population parameters and tests of hypotheses, such as the t-test, Z-test, χ² (chi-square) test, analysis of variance (ANOVA), linear correlation and regression, etc.
5. How to draw the frequency distribution table for continuous and discrete quantitative data: locate the maximal and minimal values; work out the range; estimate the number of groups and the class interval; list the limits of the groups.
6. The uses of the frequency distribution table: find whether the distribution of the data is symmetrical or asymmetrical; find the characteristics of the frequency distribution (central tendency and tendency of dispersion); find extreme values easily; choose suitable indices or methods to analyze the data.

The teaching of practice (2 hours)

Emphasis:
1. Master the process of statistical work.
2. Master the steps of drawing the frequency distribution table.
3. Know well the uses of the frequency table.

Contents:
1. What is the process of statistical work?
2. Review the steps of drawing a frequency distribution table for continuous quantitative data.
3. Exercise: draw a frequency table for the given data.

The heights (cm) of 110 seven-year-old boys in a certain city in 1992 are listed below. Draw the frequency distribution table, describe its characteristics and state its type of distribution.

112.4 117.2 122.7 123.0 113.0 110.8 118.2 108.2 118.9 118.1 123.5 118.3 120.3 116.2 114.7 119.7 114.8 119.6 113.2 120.0 119.7

116.8 119.8 122.5 119.7 120.7 114.3 122.0 117.0 122.5 119.8 122.9 128.0 121.5 126.1 117.7 124.1 129.3 121.8 112.7 120.2 120.8

126.6 120.0 130.5 120.0 121.5 114.3 124.1 117.2 124.4 116.4 119.0 117.1 114.9 129.1 118.4 113.2 116.0 120.4 112.3 114.9 124.4

112.2 125.2 116.3 125.8 121.0 115.4 121.2 117.9 120.1 118.4 122.8 120.1 112.4 118.5 113.0 120.8 114.8 123.8 119.1 122.8 120.7

117.4 126.2 122.1 125.2 118.0 120.7 116.3 125.1 120.5 114.3 123.1 122.4 110.3 119.3 125.0 111.5 116.8 125.6 123.2 119.5 120.5

127.1 120.6 132.5 116.3 130.8
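The steps listed in the contents of this chapter (locate the extremes, work out the range, choose the class interval, list the group limits, count) can be sketched in Python for the height data above. The class interval of 2 cm, the starting limit of 108 cm and the helper name `frequency_table` are assumptions made for illustration, not values prescribed by the syllabus; "ll9.8" in the transcript is read as 119.8.

```python
import math

# Heights (cm) of the 110 boys, copied from the exercise above.
heights = [float(x) for x in """
112.4 117.2 122.7 123.0 113.0 110.8 118.2 108.2 118.9 118.1 123.5 118.3 120.3 116.2 114.7 119.7 114.8 119.6 113.2 120.0 119.7
116.8 119.8 122.5 119.7 120.7 114.3 122.0 117.0 122.5 119.8 122.9 128.0 121.5 126.1 117.7 124.1 129.3 121.8 112.7 120.2 120.8
126.6 120.0 130.5 120.0 121.5 114.3 124.1 117.2 124.4 116.4 119.0 117.1 114.9 129.1 118.4 113.2 116.0 120.4 112.3 114.9 124.4
112.2 125.2 116.3 125.8 121.0 115.4 121.2 117.9 120.1 118.4 122.8 120.1 112.4 118.5 113.0 120.8 114.8 123.8 119.1 122.8 120.7
117.4 126.2 122.1 125.2 118.0 120.7 116.3 125.1 120.5 114.3 123.1 122.4 110.3 119.3 125.0 111.5 116.8 125.6 123.2 119.5 120.5
127.1 120.6 132.5 116.3 130.8
""".split()]


def frequency_table(values, width, start):
    """Count the values falling in each class [lower, lower + width)."""
    n_classes = math.floor((max(values) - start) / width) + 1
    counts = [0] * n_classes
    for v in values:
        counts[int((v - start) // width)] += 1
    return [(start + k * width, counts[k]) for k in range(n_classes)]


# Step 1: locate the minimal and maximal values; step 2: work out the range.
lowest, highest = min(heights), max(heights)
value_range = highest - lowest

# Steps 3 and 4: choose the class interval (assumed 2 cm) and list the limits.
table = frequency_table(heights, width=2, start=108)

# Print each lower limit, its frequency and a crude text histogram.
for lower, freq in table:
    print(f"{lower:>4}~ {freq:>3} {'*' * freq}")
```

The text histogram printed at the end gives a first impression of whether the distribution looks symmetrical.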


Chapter 3 The describing indices for quantitative data

The teaching of theory (6 hours)

Objective:
1. Master the names and conditions of application of the indices describing the central tendency of quantitative data.

2. Master the names and conditions of application of the indices describing the tendency of dispersion of quantitative data.

3. Know well the calculation of the describing indices for quantitative data.
4. Understand the meaning and calculation of the geometric mean.

Emphasis:
1. Master the names, meaning and conditions of application of the indices describing the central tendency of quantitative data.
2. Master the names, meaning and conditions of application of the indices describing the tendency of dispersion of quantitative data.
3. How to choose suitable indices for given data.

Difficulty:
1. The meaning and calculation of the percentile, median, quartile range, variance and standard deviation.

2. The conditions of applying the CV.

Contents:
The indices for describing quantitative data fall into two parts: central tendency and tendency of dispersion.
1. The indices describing the central tendency of quantitative data, such as the mean, median and mode, and the conditions for using each. The mean is applied when the data are symmetrically distributed, especially normally distributed; the median is applied to data with an asymmetric (unevenly spread) distribution, or when one or more values at the ends are wide apart.
2. Calculation of the mean for small and large samples:

Calculation of the median for small and large samples: for a small sample, when n is odd, the median is the middle value after all the observations are arranged in ascending (or descending) order; when n is even, the median is the arithmetic mean of the two middle values after the observations are arranged in order. For a large sample, use the equation below, in which the median is the 50th percentile (x = 50):

$$P_x = L + \frac{i}{f_x}\left(n \cdot x\% - f_L\right)$$

In the equation, Px: the x-th percentile;

L: the lower limit of the group in which the median lies;


i: the class interval of the group in which the median lies;
fx: the frequency of the group in which the median lies;
fL: the cumulative frequency before the group in which the median lies;
n: the sample size.

3. The indices describing the tendency of dispersion of quantitative data, such as the range, quartile range (Q), variance, standard deviation (SD) and coefficient of variation (CV), and the conditions for using each. Q is applied mainly to asymmetric distributions; variance and SD are applied to symmetric distributions, especially normally distributed data; CV is applied to: ① compare the variation of two groups of data that have different measurement units; ② compare the variation of two groups of data whose means differ greatly.

4. Calculation of the quartile range, SD and CV for quantitative data: Q = QU - QL = P75 - P25
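The indices named in this chapter can be worked out with Python's standard `statistics` module. The sketch below is illustrative only: the small sample and the grouped frequency table are invented, and `grouped_percentile` is a hypothetical helper that follows the usual grouped-data percentile formula built from the symbols L, i, fx, fL and n defined above. The CV is taken as SD divided by the mean, times 100%.

```python
import statistics


def coefficient_of_variation(values):
    """CV (%) = sample SD / mean * 100."""
    return statistics.stdev(values) / statistics.mean(values) * 100


def grouped_percentile(lower_limits, width, freqs, x):
    """Px = L + i / fx * (n * x% - fL) for a grouped frequency table."""
    n = sum(freqs)
    target = n * x / 100          # n * x%
    cumulative = 0                # fL: cumulative frequency before the group
    for lower, f in zip(lower_limits, freqs):
        if f > 0 and cumulative + f >= target:
            return lower + width / f * (target - cumulative)
        cumulative += f
    raise ValueError("x must lie between 0 and 100")


# Small sample (invented): mean and SD suit a symmetrical distribution.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
mean = statistics.mean(sample)
median = statistics.median(sample)   # n is even: mean of the two middle values
sd = statistics.stdev(sample)        # sample SD, divisor n - 1
cv = coefficient_of_variation(sample)

# Grouped data (invented): classes 0~, 10~, 20~, 30~ with class interval 10.
lower_limits = [0, 10, 20, 30]
freqs = [5, 10, 20, 5]
p25 = grouped_percentile(lower_limits, 10, freqs, 25)
p50 = grouped_percentile(lower_limits, 10, freqs, 50)   # the median
p75 = grouped_percentile(lower_limits, 10, freqs, 75)
quartile_range = p75 - p25                              # Q = P75 - P25
```

For asymmetrically distributed data the median and quartile range would be reported instead of the mean and SD, as the contents above state.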

The teaching of practice (4 hours)

Emphasis:
1. Master the names and conditions of application of the indices describing the central tendency of quantitative data.
2. Master the names and conditions of application of the indices describing the tendency of dispersion of quantitative data.
3. Know well how to select suitable indices for given data and how to calculate them.

Contents:
1. Introduction to the calculator (fx-82TL): common calculation and statistical calculation; learn to use the calculator to work out the indices describing the characteristics of quantitative data.
2. Calculate the mean and SD for symmetrically distributed data, and the median and quartile range for asymmetrically distributed data.
3. Calculate the coefficient of variation (CV) for given data.

Chapter 4 Normal distribution and normal curve

The teaching of theory (4 hours)

Objective:
1. Master the concept and the characteristics of the normal distribution.
2. Know well the association between the interval of individual values and the area under the normal curve.


3. Understand the standardized normal distribution and the law of area distribution under the standard normal curve.
4. Master the applications of the normal distribution and learn to choose suitable methods to work out the normal limits for a variable in medical data.
5. Understand the principle and methods of quality control in medical study.

Emphasis:
1. The concept and the characteristics of the normal distribution.
2. The law of area distribution under the standard normal curve, and the association between probability and the standard normal deviate (Z).
3. Choosing suitable methods to work out the normal limits for given data.

Difficulty: The transition from a normal distribution to the standard normal distribution; the relation between probability and the standard normal deviate (Z).

Contents:
1. The concept of the normal distribution: the frequencies are highest in the middle and fewer at the extremes, decreasing smoothly towards both sides. A distribution of this shape is called the normal distribution (or Gaussian distribution).
2. The characteristics of the normal distribution or normal curve:

(1) Centrality: the distribution is centred at μ, and the curve is highest above x = μ on the abscissa.

(2) Symmetry: the curve is symmetrical about the vertical line x = μ.

(3) The normal distribution has two parameters: μ is the location parameter and σ is the shape parameter.

(4) The normal curve has two inflexion points, at x = μ ± σ.

(5) The total area under the normal curve is 1 (100%), and the area is distributed according to a fixed law.

3. The law of area distribution under the normal curve:
(1) The area in the range μ ± 1.96σ is 95% of the total area under the normal curve.
(2) 95% of all observations lie in the range μ ± 1.96σ.
(3) If an individual is drawn from the population at random, the probability that it lies in the range μ ± 1.96σ is 95%.

The standard normal distribution follows the same law. For ease of application, statisticians have worked out a table showing the relation between the area and the z value of the standard normal distribution (Appendix I, after p. 325).

5. The main applications of the normal distribution:
(1) Finding normality limits:

① select a large number of "normal" persons at random to form a representative sample; ② decide on one-tailed or two-tailed normality limits according to professional knowledge; ③ decide on a suitable proportion: 80%, 90%, 95% or 99%; ④ select suitable methods to work out the normal limits.

(2) Quality control: μ ± 3σ are the control lines, μ ± 2σ are the warning lines, and μ is the central line.
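The area law and the two applications above can be checked numerically with `statistics.NormalDist` from the Python standard library. In this sketch the mean of 120 and SD of 5 are invented values used only for illustration, and `normal_limits` is a hypothetical helper for two-tailed limits under the assumption that the variable is normally distributed.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal distribution: mean 0, SD 1

# Area under the standard normal curve within +/-1.96 and +/-2.58.
area_196 = z.cdf(1.96) - z.cdf(-1.96)   # about 0.95
area_258 = z.cdf(2.58) - z.cdf(-2.58)   # about 0.99


def normal_limits(mean, sd, proportion=0.95):
    """Two-tailed normal limits: mean +/- z * SD for the chosen proportion."""
    k = z.inv_cdf(0.5 + proportion / 2)
    return mean - k * sd, mean + k * sd


# Invented example: a variable with mean 120 and SD 5.
mean, sd = 120.0, 5.0
lower, upper = normal_limits(mean, sd, 0.95)

# Probability that a randomly drawn individual exceeds 124.
p_above_124 = 1 - NormalDist(mean, sd).cdf(124)

# Quality-control lines as described above.
warning_lines = (mean - 2 * sd, mean + 2 * sd)
control_lines = (mean - 3 * sd, mean + 3 * sd)
```

A one-tailed limit would use `z.inv_cdf(proportion)` on one side only; which form to use depends on professional knowledge, as step ② says.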


The teaching of practice (3 hours)

Emphasis:
1. Master the association between probability and the standard normal deviate (Z).
2. Master selecting suitable methods to work out the normal limits.

Contents:
1. What proportion of the 110 boys aged 7 years have a height above 124 cm?

2. What proportion of the boys have a height between 116 and 122 cm?

3. In which range do 90% of the boys lie?

4. Check whether the actual frequency is consistent with the theoretical frequency by counting the number of values in the range of

5. Work out the 95% normal limits of the height of the 110 seven-year-old boys.

6. The height of one 7-year-old boy is 110.2 cm. Judged by the 95% normal limits, is the boy normal or abnormal?

Chapter 5 The describing indices for qualitative data

The teaching of theory (5 hours)

Objective:
1. Master the concept and categories of relative numbers, the indices applied to describe qualitative data.
2. Know well the calculation of the indices, such as rate, proportion and ratio.
3. Master the points to pay attention to when applying relative numbers.
4. Master the difference between the mortality rate or death rate (CDR) and the case fatality rate (CFR), and between the incidence rate (IR) and the prevalence rate (PR).

5. Know well the indices in demography pertaining to vital events.
6. Understand the analysis of dynamic time series data.

Emphasis:
1. Master the indices applied to describe qualitative data.
2. Master the points to pay attention to when applying relative numbers.
3. Differentiate the mortality rate or death rate (CDR) from the case fatality rate (CFR), and the incidence rate (IR) from the prevalence rate (PR).

Difficulty: The difference between rate and proportion when applying them.

Contents:
1. Review what qualitative data are, and introduce the relative number as the describing index for this type of data.
2. Categories of relative numbers: proportion, rate, ratio.
Points to pay attention to when applying relative numbers:
(1) The denominator should not be too small when calculating a rate.
(2) Do not confuse rate and proportion.
(3) Calculate the total rate correctly.


(4) Pay attention to whether two rates (or proportions) are comparable when comparing them.
(5) When comparing two rates (or proportions), a statistical hypothesis test should be carried out.
5. Common indices in vital statistics, including the indices in demography pertaining to vital events, death events and disease events, such as population size, proportion of population, dependency ratio; mortality rate or death rate (CDR), case fatality rate (CFR); incidence rate (IR), prevalence rate (PR).
6. What are dynamic time series data, and which indices are used to analyze this type of data?

The teaching of practice (3 hours)

Emphasis:
1. Master the concept and categories of relative numbers, and the meaning of rate, proportion and ratio.
2. Master the points to pay attention to when applying relative numbers; in particular, do not confuse rate and proportion in medical study.
3. Know well how to calculate the indices.

Contents:
1. Categories of relative numbers, and the meaning of rate, proportion and ratio.
2. The calculation of rate and proportion, and how to use them correctly.
3. The points to pay attention to when applying relative numbers.
4. Do the exercises listed below:

(1) Fill in the blanks in the table and describe the data in brief.
(2) Describe the data using the indices you have learned.

Age      Population  Deaths  Deaths caused  Proportion of cancer  Death rate of cancer  Age-specific
(years)                      by cancer      in total deaths (%)   (per 100,000)         death rate (‰)
0~       82920       ( )     4              2.90                  ( )                   ( )
20~      ( )         63      ( )            19.05                 25.73                 ( )
40~      28161       172     42             ( )                   ( )                   ( )
60~      ( )         ( )     32             ( )                   ( )                   ( )
Total    167090      715     90             12.59                 ( )                   ( )
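As a hedged illustration of how proportion and rate differ, the sketch below works only on the total row of the table (population 167090, deaths 715, cancer deaths 90). The variable names are invented; the multipliers follow the column headings (%, per 100,000, ‰). The proportion uses total deaths as the denominator, while the rates use the population.

```python
# Total row of the exercise table above.
population = 167090
deaths = 715
cancer_deaths = 90

# Proportion: share of cancer among all deaths (%).
proportion_cancer = cancer_deaths / deaths * 100

# Rate: cancer deaths relative to the population at risk (per 100,000).
cancer_death_rate = cancer_deaths / population * 100_000

# Death rate from all causes (per 1,000).
death_rate = deaths / population * 1000

print(f"Proportion of cancer in total deaths: {proportion_cancer:.2f}%")
print(f"Death rate of cancer: {cancer_death_rate:.2f} per 100,000")
print(f"Death rate, all causes: {death_rate:.2f} per 1,000")
```

The computed proportion reproduces the 12.59% already printed in the table, which is a useful check before filling in the age-specific rows the same way.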

2. From a health service survey we obtained the following data: proportion of the population by age group and sex in a certain area

Age group  Male(%)  Female(%)    Age group  Male(%)  Female(%)
0~         4.2      4.0          45~        2.4      2.7
5~         3.2      3.1          50~        2.1      2.4
10~        4.4      4.2          55~        1.2      2.2
15~        5.5      5.3          60~        1.3      2.4
20~        5.1      5.2          65~        1.1      1.4
25~        6.0      6.1          70~        0.8      1.2
30~        4.3      4.5          75~        0.5      0.9
35~        3.2      3.3          80~        0.2      0.5
40~        2.3      2.5          85~        0.1      0.2


(1) Calculate the proportion of elders.
(2) Calculate the dependency ratio.
(3) Calculate the proportion of women aged 15~49 years.
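One possible way to set up these three calculations is sketched below. Two definitions are assumptions, since the syllabus does not state them here: "elders" is taken as age 65 and over, and the dependency ratio as (population aged 0~14 plus population aged 65 and over) divided by the population aged 15~64, times 100. The helper `share` is hypothetical. With a different cut-off (for example 60 years) the figures change.

```python
# Percentage of the total population by age group: (male, female).
age_structure = {
    0: (4.2, 4.0), 5: (3.2, 3.1), 10: (4.4, 4.2), 15: (5.5, 5.3),
    20: (5.1, 5.2), 25: (6.0, 6.1), 30: (4.3, 4.5), 35: (3.2, 3.3),
    40: (2.3, 2.5), 45: (2.4, 2.7), 50: (2.1, 2.4), 55: (1.2, 2.2),
    60: (1.3, 2.4), 65: (1.1, 1.4), 70: (0.8, 1.2), 75: (0.5, 0.9),
    80: (0.2, 0.5), 85: (0.1, 0.2),
}


def share(min_age, max_age, sex="both"):
    """Sum the percentages of age groups starting in [min_age, max_age)."""
    total = 0.0
    for age, (male, female) in age_structure.items():
        if min_age <= age < max_age:
            if sex in ("both", "male"):
                total += male
            if sex in ("both", "female"):
                total += female
    return total


children = share(0, 15)        # aged 0~14
working_age = share(15, 65)    # aged 15~64
elders = share(65, 200)        # aged 65 and over (assumed cut-off)

dependency_ratio = (children + elders) / working_age * 100
women_15_49 = share(15, 50, sex="female")
```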

Chapter 6 Statistical tables and graphs

The teaching of theory (3 hours)

Objective:
1. Know well the basic concepts of the statistical table and the statistical graph.

2. Master the categories of statistical tables and statistical graphs, and the kind of data each is used for.

3. Master the principles of drawing statistical tables and statistical graphs.

4. Know well how to choose a suitable statistical graph to describe the data in research work.

Emphasis:
1. Master the categories of statistical tables and statistical graphs, and the kind of data each is used for.

2. Master the principles of drawing statistical tables and statistical graphs.

Difficulty: How to choose a suitable statistical graph to describe the data.

Contents:
1. Statistical tables and statistical graphs are important ways to describe or express data; they make the data legible and clear at a glance.

A statistical table uses the table form to describe the data.

A statistical graph uses geometrical forms such as points, lines and areas to describe the data.

2. The categories of statistical table include the simple table and the combined table; statistical graphs include the bar graph, histogram, proportion graph, line chart, scatter diagram, map diagram, etc.

3. What kind of data is each statistical graph applied to? We should select different graphs for different study objectives and different types of data.

(1) The bar graph is applied to discrete data; the height of bars of equal width indicates the magnitude.

(2) The histogram is applied to continuous data; the area of each rectangle indicates the frequency of each group.

(3) The proportional graph (circle or percent bar) uses the length or area of segments of a bar, or the area of sectors of a circle, to show the proportion of the different parts of one event.

(4) The line chart is generally applied to continuous data; it shows the rising, falling or fluctuating trend of an event over a period of time, such as the birth rate, death rate or cancer deaths.

(5) The scatter diagram uses points to show the nature of the correlation between two variables X and Y measured in the same person(s) or group(s).

(6) Map diagram: see pages 33-34.

4. The principles of drawing a statistical table.

(1) Title: placed at the top middle. It expresses the main contents of the table and generally includes the time, the area and the event.

(2) Lines: not too many. Generally there are three lines (top line, secondary line, bottom line); add another line above the bottom line when the table has a "total" row.


(3) Attributes: on the left in a simple table, and on the left and top middle in a combined table; indices should be written under the second attribute.

(4) Figures: write Arabic numerals, keep the same number of decimal places for one index, and leave no cell blank: fill in "0" if the value is zero, "…" if the value is missing, and "—" if the value does not exist.

(5) Notes: if a figure needs explaining, mark it with "*" and explain its meaning at the bottom of the table.

5. The principles of drawing a statistical graph.
(1) The title is placed at the bottom middle; if there are several graphs in the same paper, number them Fig 1, Fig 2, Fig 3, etc.
(2) Generally, the ratio of height to width is 5:7 in the bar graph, histogram and scatter diagram. The ordinate begins from "0"; when necessary, use "//" to break the axis.
(3) Write the units of the attributes on the X-axis and Y-axis.
(4) If there are two or more attributes, use different lines or different colors to distinguish them, and add a legend to explain them.

The teaching of practice (2 hours)

Emphasis:
1. Master the categories of statistical tables and statistical graphs, and the kind of data each is used for.

2. Master the principles of drawing statistical tables and statistical graphs.

Contents:
1. The categories of statistical table include the simple table and the combined table; statistical graphs include the bar graph, histogram, proportion graph, line chart, scatter diagram, map diagram, etc.

2. What kind of data is each statistical graph applied to? We should select different graphs for different study objectives and different types of data.

3. The principles of drawing statistical tables and statistical graphs.

4. Drawing a statistical table and a statistical graph for given data.

Choose a suitable table or graph to describe the following data.

1. The second national health service survey found that 63.84% of urban women delivered in hospital, 20.76% in a maternal and child health service station, 7.67% in a township hospital, and 7.77% in other places. Among rural women, 20.38% delivered in hospital, 4.66% in a maternal and child health service station, 16.38% in a township hospital, and 58.58% in other places.

2. The mortality from three causes of death in a certain area in 1952 and 1992 (per 100,000)

Cause of death    1952     1992

Tuberculosis      165.2    27.4

Heart diseases    72.5     83.6

Tumor             57.2     178.2

Chapter 7 Standard error and the estimation of parameters

The teaching of theory (6 hours)

Objective:
1. Master the meaning and calculation of the standard error of means and proportions.

2. Master the difference between the standard deviation and the standard error of the mean.

3. Master the meaning of the limits of desired confidence, especially the 95% limits of desired confidence.


4. Know well the applications and uses of the SE of the mean and of proportions.

5. Know well the process of calculating the confidence interval of the population mean and proportion.

Emphasis:
1. Master and comprehend the meaning of the standard error of means and proportions.

2. Master and comprehend the meaning of the limits of desired confidence.

Difficulty: How to use the different equations to estimate the confidence interval of means and proportions.

Contents:
1. The standard errors of the mean and of the proportion are important measures of chance variation. Whatever the sampling procedure or the care taken in selecting the sample, the sample estimates (statistics) will differ from the population parameters because of chance error or biological variability.

2. They measure chance variation and sampling error, that is, the difference between the sample mean or proportion and the population mean or proportion, so they should not be regarded as errors or mistakes.

3. Calculation of the standard error of the mean.

⑴ To calculate the SE directly, find the mean (μ) of the sample means and then the differences of the individual sample means from this grand mean. Use the following formula:

⑵ Usually only one large sample is drawn and its standard deviation is calculated. The SE of the mean is then calculated by the following formula:

⑶ The above formula links the SE closely to the SD.

4. Applications and uses of the SE of the mean

⑴ To work out the limits of desired confidence within which the population mean would lie.

⑵ To determine whether a sample is drawn from a known population when the population mean is known.

⑶ To calculate the desired confidence limits, that is, to estimate the population parameter.

5. Estimation of the limits of desired confidence of the population mean.

⑴ The t distribution method: use the formula below when the population SD is unknown and the sample size is small.

⑵ The normal distribution method, which covers two situations. The first is when the population SD is known and the sample is large enough; according to the standard normal distribution, use the following formula:

⑶ The second is when the population SD is unknown and the sample is large enough (n>50); according to the standard normal distribution, use the following formula, where 1.96 is the value for 95% confidence limits:

$$\bar{x} \pm z_{\alpha/2}\, s_{\bar{x}}$$

$$\bar{x} \pm 1.96\, s_{\bar{x}}$$

6. The standard error of a proportion can be calculated by the following formula:

7. Applications and uses of the SE of a proportion (SEP)


⑴ To find the confidence limits of the population proportion when the sample proportion is known.

⑵ To determine whether a sample is drawn from a known population when the population proportion is known.

⑶ To find the standard error of the difference between two proportions in order to judge its statistical significance.

8. The standard error of the difference between two proportions, denoted SE(p1-p2), can be calculated by the following formula:

The teaching of practice (3 hours)

Emphasis:
1. Master the difference between the standard deviation and the standard error of the mean.

2. Master and comprehend the process of calculating the limits of desired confidence.

3. Master the meaning of the limits of desired confidence.

Contents:
1. Think and answer: summarize the difference between 95% normal limits and 95% confidence limits. (Hint: consider the meaning, the formula and the application.)

2. Calculate and analyze the data:

The total cholesterol (mmol/L) of 50 male adults aged 40-50 is as follows:

4.47 3.37 6.14 3.95 3.56 4.23 4.31 4.71 5.69 4.12 4.56 4.37 5.39 6.30 5.21 7.22 5.54 3.93 5.21 6.51

5.18 5.77 4.79 5.12 5.20 5.10 4.70 4.74 3.50 4.69 4.38 4.89 6.25 5.32 4.63 3.61 4.44 4.43 4.25 4.03

4.50 4.25 4.03 5.85 4.09 3.35 4.08 4.79 5.30 4.97

(1) Calculate the SE.

(2) Estimate the 95% and 99% confidence limits of the population mean, compare the two intervals and explain the difference.

3. If typhoid mortality in a sample of 100 is 20% and in another sample of 100 it is 30%, find the standard error of the difference between the two proportions.
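A minimal sketch of how exercises 2 and 3 might be set up in Python follows. It rests on stated assumptions rather than on formulas printed in this transcript: the SE of the mean is taken as the sample SD divided by the square root of n, the confidence limits use the normal (z) method even though n = 50 sits at the boundary given in the theory section (a t-based interval would be slightly wider), and the SE of the difference between two proportions uses the common unpooled form sqrt(p1q1/n1 + p2q2/n2). All function names are hypothetical.

```python
import math
import statistics
from statistics import NormalDist

# Total cholesterol (mmol/L) of the 50 men, copied from the exercise above.
cholesterol = [float(x) for x in """
4.47 3.37 6.14 3.95 3.56 4.23 4.31 4.71 5.69 4.12 4.56 4.37 5.39 6.30 5.21 7.22 5.54 3.93 5.21 6.51
5.18 5.77 4.79 5.12 5.20 5.10 4.70 4.74 3.50 4.69 4.38 4.89 6.25 5.32 4.63 3.61 4.44 4.43 4.25 4.03
4.50 4.25 4.03 5.85 4.09 3.35 4.08 4.79 5.30 4.97
""".split()]


def standard_error_mean(values):
    """SE of the mean, assumed here to be sample SD / sqrt(n)."""
    return statistics.stdev(values) / math.sqrt(len(values))


def confidence_limits(values, confidence=0.95):
    """Normal-approximation limits: mean +/- z * SE."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    mean = statistics.mean(values)
    se = standard_error_mean(values)
    return mean - z * se, mean + z * se


def se_diff_proportions(p1, n1, p2, n2):
    """Unpooled SE of the difference between two proportions."""
    return math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)


se = standard_error_mean(cholesterol)
ci95 = confidence_limits(cholesterol, 0.95)
ci99 = confidence_limits(cholesterol, 0.99)

# Exercise 3: mortality of 20% and 30% in two samples of 100.
se_diff = se_diff_proportions(0.20, 100, 0.30, 100)
```

The 99% limits come out wider than the 95% limits because a larger multiplier is applied to the same SE, which is the comparison exercise 2(2) asks for.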

Chapter 8 Design of experiment and sampling techniques in a survey

The teaching of theory (4 hours)

Objective:
1. Know well the process of experimental study.

2. Master the essential factors and basic principles of the design of experiment.

3. Master the methods of design of experiment, such as the paired design, completely random design, randomized block design, etc.

4. Master the sampling techniques in a survey.

5. Understand the methods of multistage sampling and multiphase sampling.

Emphasis:
1. Master the essential factors and basic principles of the design of experiment.

2. Master the methods of design of experiment, such as the paired design, completely random design, randomized block design, etc.

3. Understand experimental error and how to reduce or eliminate it.

4. Master the sampling techniques in a survey.

Difficulty:
1. The three essential factors and four basic principles of the design of experiment.


2. How to control experimental error.

Contents:
1. The process of design of experiment.

(1) Definition of the problem: define the problem you intend to study.

(2) Aims and objectives: define the aims and objectives of the study.

(3) Review of literature: critically review the literature on the problem under study.

(4) Hypothesis: state your hypothesis or assumption about the problem.

(5) Plan of action: prepare an overall plan or design of your study. Steps of the plan: definition of the population under study; selection of the sample; specifying the nature of the study; ruling out observer and instrument error; recording of data; work schedule.

2. The three important elements of the design of experiment are the study subjects, the treatment (study factor) and the experimental effect.

(1) Study subjects are the units to which the treatment is applied.

(2) The treatment is the specific experimental condition applied to the study subjects.

(3) The experimental effect is a characteristic measured after the treatment has been applied to the study subjects.

3. The four principles of the design of experiment are control, randomization, replication and balance (equilibrium).

4. The common methods of design of experiment.

(1) Paired design: pair the study subjects according to the main factors that are not probed in the study, then randomly allocate the two subjects of every pair to the control group and the trial group.

(2) Completely random design: randomly allocate homogeneous study subjects to several trial groups.

(3) Randomized block design: divide the study subjects into blocks according to the main factors that are not probed in the study, then randomly allocate the subjects of every block to the trial groups.

5. The sampling techniques in a survey.

(1) Simple random sampling: a sampling procedure that assures that every unit in the population has an equal chance of being selected. The method is applicable when the population is small, homogeneous and readily available.

(2) Systematic sampling: from the sampling frame, a starting point is chosen at random, and units are then selected at regular intervals. Suppose that the N units in the population are numbered 1 to N in some order. To select a systematic sample of n units, take k≈N/n; then every k-th unit is selected, commencing with a randomly chosen number between 1 and k.

(3) Stratified sampling: the whole population is divided into several subgroups or strata, and units are then selected randomly from each stratum.

(4) Cluster sampling: the entire population is divided into groups, or clusters, and several clusters are selected at random; all observations in the selected clusters become the study objects.

(5) Multistage sampling: the sampling is carried out in several stages using random sampling techniques. This is employed in large country-wide surveys. In the first stage, a random number of districts is chosen in each state, followed by a random number of villages and then of units.

(6) Multiphase sampling: part of the information is collected from the whole sample and part from a subsample.

6. Experimental error.

(1) Systematic errors are reproducible inaccuracies that are consistently in the same direction, for example those caused by a miscalibrated measurement device or a flawed method.

(2) Random errors are statistical fluctuations in the measured data due to the precision limitations of the measurement device or to incidental or uncontrolled factors.
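The first four sampling techniques described in item 5 can be sketched with Python's `random` module. Everything named here is hypothetical: the population of 100 numbered units, the two strata, the clusters of ten units and the fixed seed are invented for illustration only.

```python
import random


def simple_random_sample(units, n, rng):
    """Every unit has an equal chance of being selected."""
    return rng.sample(units, n)


def systematic_sample(units, n, rng):
    """Random start within the first k units, then every k-th unit."""
    k = len(units) // n
    start = rng.randrange(k)  # index 0..k-1, i.e. unit number 1..k
    return [units[start + j * k] for j in range(n)]


def stratified_sample(strata, fraction, rng):
    """Draw the same fraction at random from each stratum."""
    return {
        name: rng.sample(members, max(1, round(len(members) * fraction)))
        for name, members in strata.items()
    }


def cluster_sample(clusters, n_clusters, rng):
    """Select whole clusters at random and keep all their units."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [unit for name in chosen for unit in clusters[name]]


rng = random.Random(0)              # fixed seed so the sketch is repeatable
population = list(range(1, 101))    # units numbered 1 to N, with N = 100

srs = simple_random_sample(population, 10, rng)
systematic = systematic_sample(population, 10, rng)
strata = {"urban": population[:60], "rural": population[60:]}
stratified = stratified_sample(strata, 0.10, rng)
clusters = {f"village_{c}": population[c * 10:(c + 1) * 10] for c in range(10)}
clustered = cluster_sample(clusters, 2, rng)
```

Multistage sampling could be sketched by applying these functions in sequence, for example choosing clusters first and then a simple random sample within each chosen cluster.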


The teaching of practice (2 hours)

Emphasis:
1. Know well the procedure of experimental study.

2. Master the essential factors and basic principles of the design of experiment.

3. Master the methods of design of experiment, such as the paired design, completely random design, randomized block design, etc.

4. Master the sampling techniques in a survey.

Contents:
1. What are the essential factors in an experimental study?

2. How to use design methods such as the paired design, completely random design and randomized block design in practice.

Exercise:
1. Does salted drinking water affect the blood pressure of mice? Point out the study subjects, the treatment (study factor) and the experimental effect in the experiment.

2. Following on from the above, if 20 mice and water containing 1% NaCl are provided, how would you design this experimental study?
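One possible answer to exercise 2 is a completely random design: number the 20 mice and allocate them at random to a treatment group (1% NaCl water) and a control group (plain water). The sketch below is only one option; the group labels, the equal group sizes and the fixed seed are assumptions, and a paired or randomized block design (for example pairing by sex and body weight) would be an equally valid choice.

```python
import random


def completely_random_allocation(subject_ids, group_names, rng):
    """Shuffle the subjects, then deal them into equal-sized groups."""
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)
    size = len(shuffled) // len(group_names)
    return {
        name: sorted(shuffled[i * size:(i + 1) * size])
        for i, name in enumerate(group_names)
    }


rng = random.Random(2024)      # fixed seed so the allocation is repeatable
mice = list(range(1, 21))      # mice numbered 1 to 20
groups = completely_random_allocation(
    mice, ["salted_water", "plain_water"], rng
)

for name, members in groups.items():
    print(name, members)
```

Blood pressure measured after the exposure period would then be the experimental effect compared between the two groups.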

Chapter 9 Significance of difference in means (testing statistical hypotheses)
The teaching of theory (6 hours)
Objective:
1. Know well the objective and principle of testing a statistical hypothesis.
2. Master the methods of testing a statistical hypothesis for data from different designs.
3. Master the basic process of testing a statistical hypothesis.
4. Master type Ⅰ and type Ⅱ errors and the meaning of the power of a test.
5. Master the criteria for applying different methods of statistical testing.
6. Understand the association between the confidence interval (CI) and statistical tests.
7. Understand the normality test and the test for equality of variances.
Emphasis:
1. Master the basic process of testing a statistical hypothesis.
2. Master the methods of testing a statistical hypothesis for data from different designs, such as the t-test, Z-test, etc.
3. Master the criteria for applying different methods of statistical testing.
4. Master type Ⅰ and type Ⅱ errors and the meaning of the power of a test.

Difficulty:
1. The principle of testing a statistical hypothesis.
2. The meaning of the null hypothesis (H0).
Contents:
1. What is the testing of statistical hypotheses? It is the process of inferring, from the sample data, whether population parameters are the same or not. Use an example to show the objective of hypothesis testing.

2. The principle of hypothesis testing: we first suppose that the population parameters are the same (the null hypothesis), then use the sample data to calculate the test statistic and from it the P value, i.e. the probability of obtaining a result at least as extreme as the one observed if the null hypothesis were true. If P is large, we do not reject the null hypothesis; if P is very small (generally P<0.05 or P<0.01), we reject it.

3. Two hypotheses in testing: there are two possible reasons for a difference between the mean of the sample and the mean of the population: ① the sample came from the known population, i.e. the difference is due to chance; ② the sample did not come from the known population but from another population, so the difference is not due to chance but reflects a real difference between the populations.

Corresponding to these two reasons, we have two hypotheses. H0 (the null hypothesis) states that there is no difference between the population mean represented by the sample and the known population mean. If H0 is true, we can infer that the observed difference between the sample mean and the known population mean is due to chance, i.e. sampling error. H1 (the alternative hypothesis) states that the population mean represented by the sample differs from the known population mean. If H1 is true, we can infer that the observed difference between the two means is real and not only due to sampling error.

4. The process of testing a statistical hypothesis.

① Establish the hypotheses and the level of significance α; ② choose a suitable test method and calculate the test statistic: t-test, Z-test or F-test for quantitative data, χ²-test or Z-test for qualitative data, and so on; ③ judge the P value and draw the conclusion. Using the t-test as an example: if |t| ≥ t(α,ν), then P ≤ α; reject H0 and accept H1, and at the level α conclude that the difference is statistically significant. If |t| < t(α,ν), then P > α; do not reject H0, and at the level α conclude that the difference is not statistically significant and may be due to chance alone.
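The three steps can be sketched for a one-sample t-test. The sketch below uses the BPD data and the known mean of 9.3 cm from the practice exercise of this chapter; the function name `one_sample_t` is an assumption, and the critical value 1.796 is the one-sided t(0.05, 11) read from a standard t table rather than computed.

```python
import math
import statistics

def one_sample_t(sample, mu0):
    """Return (t, degrees of freedom) for a one-sample t-test against mu0."""
    n = len(sample)
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)      # sample standard deviation (n - 1)
    se = sd / math.sqrt(n)             # standard error of the mean
    return (mean - mu0) / se, n - 1

# Step 1: H0: mu = 9.3, H1: mu > 9.3 (one-sided), alpha = 0.05.
bpd = [9.95, 9.33, 9.49, 9.00, 10.09, 9.15,
       9.52, 9.33, 9.16, 9.37, 9.11, 9.27]

# Step 2: calculate the test statistic.
t, df = one_sample_t(bpd, 9.3)

# Step 3: compare with the tabulated critical value and conclude.
T_CRIT = 1.796                         # one-sided t(0.05, 11), from a t table
reject_h0 = t >= T_CRIT
print(f"t = {t:.3f}, df = {df}, reject H0: {reject_h0}")
```

Comparing the statistic with a table value mirrors the hand calculation; statistical software would report the P value directly.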

5. Different methods of testing for data from different designs.

① One-sample test ② Paired-samples test ③ Two-independent-samples test

When the sample size is larger than 30, the Z-test can be used for each of these designs. Pay attention to selecting the method or formula according to the conditions: type of data, design of the study, and size of the sample.
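The paired and the two-independent-samples designs lead to different calculations. The following sketch applies them to the tranquilliser data and the pigeon/rabbit blood glucose data from the practice exercises of this chapter; the function names are assumptions, and the two-sample version assumes equal population variances (pooled variance).

```python
import math
import statistics

def paired_t(before, after):
    """Paired design: a one-sample t-test on the within-pair differences."""
    d = [b - a for b, a in zip(before, after)]
    n = len(d)
    se = statistics.stdev(d) / math.sqrt(n)
    return statistics.mean(d) / se, n - 1

def two_sample_t(x, y):
    """Two independent samples, pooled-variance t-test (equal variances assumed)."""
    n1, n2 = len(x), len(y)
    pooled_var = ((n1 - 1) * statistics.variance(x)
                  + (n2 - 1) * statistics.variance(y)) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (statistics.mean(x) - statistics.mean(y)) / se, n1 + n2 - 2

before = [22, 18, 17, 19, 22, 12, 14, 11, 19, 7]
after = [19, 11, 14, 17, 23, 11, 15, 19, 11, 8]
t_paired, df_paired = paired_t(before, after)

pigeons = [200, 186, 176, 184, 170, 172, 170, 163, 176, 173]
rabbits = [145, 125, 100, 112, 127, 139, 151, 140, 159, 132]
t_ind, df_ind = two_sample_t(pigeons, rabbits)

print(f"paired: t = {t_paired:.3f}, df = {df_paired}")
print(f"independent: t = {t_ind:.3f}, df = {df_ind}")
```

The degrees of freedom differ (n - 1 pairs versus n1 + n2 - 2), which is why the design must be identified before a table value is looked up.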

6. Criteria for applying the t-test: ① random sample; ② quantitative data; ③ the variable is normally distributed; ④ the population variances are homogeneous across the samples being compared. In general, if the sample size is less than 30 we apply the t-test; otherwise we can use the Z-test (for large samples).

7. Type Ⅰ error, type Ⅱ error and the power of a test.

When H0 is true but is rejected on the basis of our sample, we commit a type Ⅰ error. If α=0.05, we would theoretically commit this type of error in 5 out of 100 samples. When H0 is false but is not rejected on the basis of our sample, we commit a type Ⅱ error. The probability of a type Ⅱ error is β, which is usually unknown. In general, for a fixed sample size, when α increases, β decreases. Power of a test: if a difference between two means really exists, the power is the ability of the test to detect that difference at the level α; it is denoted 1-β. If 1-β=0.9, we can expect to conclude that the difference is statistically significant in 90 out of 100 tests.
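The trade-off between α and β can be made concrete for the simplest case, a two-sided one-sample Z-test with known standard deviation. This is a sketch under that assumption only; the function name `z_test_power` and the numbers (true difference 0.5, σ = 1, n = 25) are hypothetical.

```python
import math
from statistics import NormalDist

def z_test_power(delta, sigma, n, alpha=0.05):
    """Power (1 - beta) of a two-sided one-sample Z-test for a true difference delta."""
    z = NormalDist()                       # standard normal distribution
    z_crit = z.inv_cdf(1 - alpha / 2)      # two-sided critical value
    shift = abs(delta) * math.sqrt(n) / sigma
    # Probability that the statistic falls in either rejection region.
    return (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)

power_05 = z_test_power(delta=0.5, sigma=1.0, n=25, alpha=0.05)
power_10 = z_test_power(delta=0.5, sigma=1.0, n=25, alpha=0.10)
beta_05, beta_10 = 1 - power_05, 1 - power_10

print(f"alpha = 0.05: power = {power_05:.3f}, beta = {beta_05:.3f}")
print(f"alpha = 0.10: power = {power_10:.3f}, beta = {beta_10:.3f}")
```

With no true difference (delta = 0) the function returns α itself, which is the type Ⅰ error rate.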

8. The association between the CI and hypothesis testing: we can also use a confidence interval to test the significance of a difference between means, but the confidence interval does not give us the exact P value (see the textbook).

9. The normality test (see Chapter 15: computer software for analyzing of data) and the test for equality of variances, or variance ratio test (see page 151 of the textbook).

The teaching of practice (3 hours)
Emphasis:
1. Master the basic process of testing a statistical hypothesis.
2. Master the methods of testing a statistical hypothesis for data from different designs.
3. Master the criteria for applying different methods of statistical testing.
Contents:
1. The process of testing a statistical hypothesis: ① establish the hypotheses and the level of significance; ② choose a suitable test method and calculate the test statistic; ③ judge the P value and draw the conclusion.

2. Different methods of testing for data from different designs.

① One-sample test ② Paired-samples test ③ Two-independent-samples test

When the sample size is larger than 30, the Z-test can be applied. Pay attention to selecting the method or formula according to the conditions: type of data, design of the study, and size of the sample.

3. Criteria for applying the t-test: ① random sample; ② quantitative data; ③ the variable is normally distributed; ④ the population variances are homogeneous across the samples being compared. In general, if the sample size is less than 30 we apply the t-test; otherwise we can use the Z-test (for large samples).

4. Review the basic concepts of hypothesis testing using examples.

When H0 is true but is rejected on the basis of our sample, we commit a type Ⅰ error; when H0 is false but is not rejected on the basis of our sample, we commit a type Ⅱ error. The probability of a type Ⅱ error is β, which is usually unknown. In general, for a fixed sample size, when α increases, β decreases. Power of a test: if a difference between two means really exists, the power is the ability of the test to detect that difference at the level α; it is denoted 1-β.

5. Do the exercises listed below:

(1) Many studies show that the mean biparietal diameter (BPD) of normal male neonates is 9.3 cm. A doctor investigated 12 normal male neonates from a mountainous area and recorded their BPD (cm) as follows: 9.95 9.33 9.49 9.00 10.09 9.15 9.52 9.33 9.16 9.37 9.11 9.27. Test whether the BPD of male neonates in the mountainous area is greater than that of neonates in general.

(2) In a clinical trial to assess the value of a new tranquilliser in psychoneurotic patients, each patient was given a week's treatment with the drug. The drug was considered effective if it lowered the anxiety score after treatment. Test the efficacy of the drug using the following results.

Before treatment: 22 18 17 19 22 12 14 11 19 7

After treatment: 19 11 14 17 23 11 15 19 11 8

(3) The blood glucose level of pigeons is known to be higher than that of rabbits. Demonstrate this by applying a proper statistical test to the following data.

Blood glucose level per 100 ml
No.   Pigeons   Rabbits
1     200       145
2     186       125
3     176       100
4     184       112
5     170       127
6     172       139
7     170       151
8     163       140
9     176       159
10    173       132

Chapter 10 Analysis of Variance
The teaching of theory (4 hours)
Objective:
1. Master the applications of analysis of variance (ANOVA), or the F test.
2. Master the criteria for applying ANOVA.
3. Know well the principle of analysis of variance.
4. Master the process of analysis of variance.
5. Know well the comparison between any two means using the q test (Newman-Keuls method) or the Dunnett t test.
6. Understand transformations of variables when analyzing data.
Emphasis:
1. Master the applications of analysis of variance (ANOVA).
2. Master the criteria for applying analysis of variance.
3. Master the process of analysis of variance.
4. Know well the principle of analysis of variance.
Difficulty: the partition of the variation in the data and the principle of analysis of variance.

Contents:
1. Applications of analysis of variance

(1) In general, we use the F-test to compare three or more means, to find whether the difference among them is significant or not.

(2) To analyse the interaction between two or more factors.

(3) To test a regression equation.

(4) For the variance ratio test (P151).

2. Criteria for applying analysis of variance.

(1) All the samples are independent; (2) all the samples come from normally distributed populations; (3) the population variances of the samples are equal, i.e. $\sigma_1^2 = \sigma_2^2 = \dots = \sigma_k^2$.

3. The principle of ANOVA:

The total variation is divided into two parts: between-class variation and within-class variation. Total variation, denoted SStotal, is the sum of squared deviations of every x from the overall mean:

$SS_{total} = \sum_i \sum_j (x_{ij} - \bar{x})^2$

Between-class variation, denoted SSbetween, reflects the effect of the treatment factor; it is calculated as the sum of squared deviations of each sample mean from the overall mean, weighted by the sample size:

$SS_{between} = \sum_i n_i (\bar{x}_i - \bar{x})^2$

Within-class variation, denoted SSwithin, reflects the size of the random error (individual variation and measurement error); it is the sum of squared deviations of every x from the mean of its class:

$SS_{within} = \sum_i \sum_j (x_{ij} - \bar{x}_i)^2$

To express the variation of each part more reasonably, each SS is divided by its degrees of freedom to give a mean square, and the ratio of the mean squares is taken:

$MS_{between} = SS_{between}/\nu_{between}, \quad MS_{within} = SS_{within}/\nu_{within}, \quad F = MS_{between}/MS_{within}$

where $\nu_{between} = k-1$ and $\nu_{within} = N-k$ for k classes and N observations.

If the treatment has not produced an effect, $F \approx 1$.

If the treatment factor did produce an effect, $F > 1$.

The larger the effect of the treatment, the larger the variation between classes and the further F exceeds 1. Beyond which limit is the difference statistically significant? We use the F table to draw the conclusion:

If F ≥ Fα(ν1,ν2), then P ≤ α, and we conclude that the treatment factor produced an effect;

if F < Fα(ν1,ν2), then P > α, and we conclude that the treatment factor did not produce an effect.

4. The process of analysis of variance:

(1) Establish the hypotheses and the level of significance.

H0: all the population means are the same; H1: the population means are not all the same.

(2) Choose a suitable test method and calculate the test statistic: to apply the F-test, first calculate the basic quantities, such as Σx and Σx², then calculate SS and MS for every part, and finally work out F, judge the size of P and draw the corresponding conclusion.
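The calculation of SS, MS and F can be sketched for a completely random design. The sketch below uses the mouse survival data (typhoid, chincough and control groups) from the practice exercise of this chapter; the function name `one_way_anova` is an assumption, and the critical value 3.35 is F0.05(2, 27) read from a standard F table rather than computed.

```python
def one_way_anova(groups):
    """Partition the variation of a one-way layout and return SS, MS and F."""
    values = [x for g in groups for x in g]
    n_total, k = len(values), len(groups)
    grand_mean = sum(values) / n_total
    means = [sum(g) / len(g) for g in groups]

    ss_total = sum((x - grand_mean) ** 2 for x in values)
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_between, df_within = k - 1, n_total - k
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "ss_total": ss_total, "ss_between": ss_between, "ss_within": ss_within,
        "df": (df_between, df_within), "F": ms_between / ms_within,
    }

typhoid = [5, 7, 8, 9, 9, 10, 10, 11, 11, 12]
chincough = [6, 6, 7, 8, 8, 9, 9, 10, 10, 11]
control = [8, 9, 10, 10, 10, 11, 12, 12, 14, 16]

result = one_way_anova([typhoid, chincough, control])
F_CRIT = 3.35                     # F0.05(2, 27), from an F table
reject_h0 = result["F"] >= F_CRIT
print(result, reject_h0)
```

The identity SStotal = SSbetween + SSwithin is a useful check on any hand calculation.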

5. Comparisons between any two means: apply the q test (Newman-Keuls method) for pairwise comparisons, or the Dunnett t test when the means of several experimental groups are each compared with that of the control group.

6. Transformations of variables: when the data do not meet the requirements for applying the methods mentioned above, we can consider a transformation of the original data, such as the logarithmic transformation, the square root transformation, the arcsine transformation, etc.

The teaching of practice (3 hours)
Emphasis:
1. Master the applications of analysis of variance (ANOVA).
2. Master the criteria for applying analysis of variance.
3. Master the process of analysis of variance and the q test or Dunnett t test between two means.
Contents:
1. Applications of analysis of variance

(1) In general, we use the F-test to compare three or more means, to find whether the difference among them is significant or not.

(2) To analyse the interaction between two or more factors.

(3) To test a regression equation.

(4) For the variance ratio test.

2. Criteria for applying analysis of variance.

(1) All the samples are independent; (2) all the samples come from normally distributed populations; (3) the population variances of the samples are equal.

3. The process of analysis of variance:

First calculate the basic quantities, such as Σx and Σx², then calculate SS, then MS and F = MSbetween/MSwithin, and judge the size of P. If P≤α, we conclude that the means differ significantly, and we can then compare any two means using the Newman-Keuls q test or the Dunnett t test.

4. Do the exercise: mice were inoculated with typhoid vaccine or chincough (whooping cough) vaccine after they were infected with poliomyelitis, and their survival days were recorded. Does the vaccine affect the survival days significantly?

Typhoid   Chincough   Control group
5         6           8
7         6           9
8         7           10
9         8           10
9         8           10
10        9           11
10        9           12
11        10          12
11        10          14
12        11          16

Chapter 11 Chi-square test (χ² test)
The teaching of theory (6 hours)
Objective:
1. Know well the characteristics of the χ² distribution.
2. Master the applications of the χ² test and the principle of the χ² test.
3. Master the χ² test for completely random designed data in a fourfold table and an R×C table, and the conditions for applying them.
4. Master the χ² test for paired designed data in a fourfold table.
5. Understand the method of exact probability and the method of χ² division.
Emphasis:
1. Master the applications of the χ² test and the principle of the χ² test.
2. Master the χ² test for completely random designed data in a fourfold table and an R×C table, and the conditions for applying them.
3. Master the χ² test for paired designed data in a fourfold table.
Difficulty: the principle of the χ² test and the characteristics of the χ² distribution.

Contents:
1. The χ² distribution is a probability distribution of a continuous random variable. It originates from the standard normal distribution: if Z is a standard normal variable, Z² follows the χ² distribution with 1 degree of freedom; if there are k independent standard normal variables Z1, Z2, Z3, …, Zk, then Z1² + Z2² + … + Zk² follows the χ² distribution with ν = k degrees of freedom, which gives a family of χ² distribution curves indexed by ν.

① χ² is non-negative; its value ranges from 0 to +∞. ② The shape of the χ² curve depends on the degrees of freedom ν: when ν is small the curve is positively skewed, and as ν becomes larger the curve tends towards the normal distribution. ③ When ν=1, the distribution is that of the square of a standard normal variable.

2. The applications of the χ² test: ① to find the difference between or among proportions or rates from independent groups; ② to test the association of two variables or attributes; ③ to test the goodness of fit to a certain distribution.

3. The principle of the χ² test: the theoretical frequency (T) is calculated under the supposition that H0 is true. When the study factor produces an effect, the actual frequency (A) should differ greatly from T, so the χ² value should be large; when it exceeds the critical value χ²(α,ν), we can infer that the two rates or proportions differ significantly. Conversely, when the study factor has not produced an effect, A should be close to T and the χ² value should be small; when it is less than the critical value χ²(α,ν), we infer that the two rates or proportions do not differ significantly.

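4. The χ² test for completely random designed data in a fourfold table and an R×C table, and the conditions for applying them:

① Fourfold table data

$\chi^2 = \sum \frac{(A-T)^2}{T}$, or equivalently $\chi^2 = \frac{(ad-bc)^2 N}{(a+b)(c+d)(a+c)(b+d)}$, with $\nu = 1$

The condition for applying the two formulas above: N ≥ 40 and all T ≥ 5. When N ≥ 40 and 1 ≤ T < 5 in some cell, we should use the adjusted (continuity-corrected) formula for the chi-square test:

$\chi^2_c = \sum \frac{(|A-T|-0.5)^2}{T}$

When N<40 or T<1, we should calculate the exact probability.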
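The choice between the ordinary formula, the adjusted formula and the exact probability can be sketched as below. The sketch uses the usual fourfold-table shortcut χ² = (ad-bc)²N / ((a+b)(c+d)(a+c)(b+d)) and its continuity-corrected form, applied to the occupational dermatitis data from the practice exercise of this chapter; the function name `fourfold_chi2` is an assumption, and 3.84 is the tabulated χ²(0.05, 1).

```python
def fourfold_chi2(a, b, c, d):
    """Chi-square for a 2x2 table, choosing the formula by N and the smallest T."""
    n = a + b + c + d
    row_totals, col_totals = (a + b, c + d), (a + c, b + d)
    t_min = min(r * col / n for r in row_totals for col in col_totals)
    denom = (a + b) * (c + d) * (a + c) * (b + d)

    if n >= 40 and t_min >= 5:
        return (a * d - b * c) ** 2 * n / denom, "ordinary", t_min
    if n >= 40 and t_min >= 1:
        # Continuity-corrected (adjusted) formula.
        return (abs(a * d - b * c) - n / 2) ** 2 * n / denom, "adjusted", t_min
    return None, "exact probability required", t_min

# New suit: 1 positive, 14 negative; former suit: 10 positive, 18 negative.
chi2, method, t_min = fourfold_chi2(1, 14, 10, 18)
CHI2_CRIT = 3.84                      # chi-square(0.05, 1), from a table
print(f"{method}: chi2 = {chi2:.3f}, smallest T = {t_min:.2f}")
print("reject H0:", chi2 >= CHI2_CRIT)
```

Here N = 43 but the smallest theoretical frequency lies between 1 and 5, so the adjusted formula is selected.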

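② R×C table data

$\chi^2 = N\left(\sum \frac{A^2}{n_R n_C} - 1\right)$, with $\nu = (R-1)(C-1)$, where $n_R$ and $n_C$ are the row total and column total of each cell.

The conditions for applying the formula above: ① no cell has T<1; ② cells with 1≤T<5 make up no more than 1/5 of all cells.

After H0 is rejected, we only know that the population rates are not all the same; a detailed comparison of any two rates requires the method of χ² division.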
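For an R×C table the statistic is commonly computed as χ² = N(ΣA²/(nR·nC) - 1) with ν = (R-1)(C-1). The sketch below applies this to the aflatoxin counts of the three areas from the practice exercise of this chapter, entered as the two outcome counts per area; the function name `rxc_chi2` is an assumption, and 5.99 is the tabulated χ²(0.05, 2).

```python
def rxc_chi2(table):
    """Chi-square, degrees of freedom and theoretical frequencies for an R x C table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    total = sum(
        a * a / (row_totals[i] * col_totals[j])
        for i, row in enumerate(table)
        for j, a in enumerate(row)
    )
    chi2 = n * (total - 1)
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    return chi2, df, expected

areas = [[6, 23],     # A area
         [30, 14],    # B area
         [8, 3]]      # C area

chi2, df, expected = rxc_chi2(areas)
smallest_t = min(min(row) for row in expected)
CHI2_CRIT = 5.99                      # chi-square(0.05, 2), from a table
print(f"chi2 = {chi2:.2f}, df = {df}, smallest T = {smallest_t:.2f}")
print("reject H0:", chi2 >= CHI2_CRIT)
```

Checking the smallest theoretical frequency first confirms that the conditions for the formula are met.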

5. The χ² test for paired designed data in a fourfold table: the same objects are examined by two different methods, and the final results can be arranged as a cross table.

             A method
B method     +        -        Total
+            a        b        a+b
-            c        d        c+d
Total        a+c      b+d      a+b+c+d

For this kind of data, we can use the former χ² test formula to test the association between the two variables or attributes.

The teaching of practice (4 hours)
Emphasis:
1. Master the χ² test for completely random designed data in a fourfold table and an R×C table, and the conditions for applying them.
2. Master the χ² test for paired designed data in a fourfold table.
Contents:
1. The applications of the χ² test: ① to find the difference between or among proportions or rates from independent groups; ② to test the association of two variables or attributes; ③ to test the goodness of fit to a certain distribution.

2. The χ² test for completely random designed data in a fourfold table and an R×C table, and the conditions for applying them:

① Fourfold table data

The condition for applying the ordinary formulas: N ≥ 40 and all T ≥ 5. When N ≥ 40 and 1 ≤ T < 5 in some cell, we should use the adjusted formula for the chi-square test. When N<40 or T<1, we should calculate the exact probability.

② R×C table data

The conditions for applying the R×C formula: ① no cell has T<1; ② cells with 1≤T<5 make up no more than 1/5 of all cells. After H0 is rejected, we only know that the population rates are not all the same; a detailed comparison of any two rates requires the method of χ² division.

3. The χ² test for paired designed data in a fourfold table: the same objects are examined by two different methods, and the final results can be arranged as a cross table. For this kind of data, we can use the former χ² test formula to test the association between the two variables or attributes.

4. Do the exercises listed below:

(1) Some workers in a mineral powder plant have developed occupational dermatitis. To protect their health, a new protective suit was made. To test its effectiveness, 15 workers were selected at random to wear the new suit, while the others continued to use the former suit. The data are shown below; compare whether there is a difference between the two groups.

Occupational dermatitis prevalence for the two types of protective suit
Type of suit   Positive number   Negative number   Total   Prevalence rate (%)
New            1                 14                15      6.7
Former         10                18                28      35.7
Total          11                32                43      25.6

(2) Officers of the FDA want to examine aflatoxin contamination of peanuts from 3 areas. The results are given below; test whether the aflatoxin polluting rates of the three areas differ.

Comparison of the aflatoxin polluting rates of the three areas
Area      Not polluted   Polluted   Total   Polluting rate (%)
A area    6              23         29      79.3
B area    30             14         44      31.8
C area    8              3          11      27.3
Total     44             40         84      47.6

(3) Two methods were used to examine 120 patients with breast cancer. One method found 60% of the patients positive, the other method found 50% positive, and both methods were positive at the same time in 35%. Is there an association between the two methods? Which method is more effective?

Chapter 12 Significance of difference in proportions of large samples
The teaching of theory (3 hours)
Objective:
1. Master the application and calculation of the standard error of proportion (SEP).
2. Master the methods of hypothesis testing for rates or proportions from large samples.
3. Know well the standard error of the difference between two proportions, SE(p1-p2).
4. Understand the association between the chi-square test and the Z test for comparing rates or proportions from large samples.
Emphasis:
1. Master the application and calculation of the standard error of proportion (SEP).
2. Master the methods of hypothesis testing for rates or proportions from large samples.
Difficulty: the meaning and calculation of the standard error of the difference between two proportions.
Contents:
1. The meaning and calculation of the standard error of proportion (SEP)

The standard error of proportion may be defined as a unit that measures the variation which occurs by chance in the proportion of a character from sample to sample or from sample to population. It is calculated as:

$SE_p = \sqrt{\frac{pq}{n}}$, where $q = 1 - p$

2. Applications of SEP:

(1) To find the confidence limits of the population proportion (P) when the sample proportion (p) is known; see Chapter 7. (2) To determine whether or not a sample is drawn from a known population when the population proportion P is known. (3) To find the standard error of the difference between two proportions. (4) To find the size of the sample.

3. The meaning and calculation of the standard error of the difference between two proportions, SE(p1-p2).

The differences between pairs of proportions or percentages of samples drawn from the same population are also normally distributed, as was seen for the difference between two means.

$SE(p_1 - p_2) = \sqrt{PQ\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$

P and Q are the combined percentages of positive and negative characteristics in both samples.

4. The methods of hypothesis testing for rates or proportions from large samples.

(1) Hypothesis test for a proportion from one large sample against a known population proportion:

$Z = \frac{p - P}{\sqrt{PQ/n}}$

(2) Hypothesis test for two proportions from two large independent samples:

In actual practice, we do not know the value of the population proportion and we have only the two samples, so we substitute sample values in place of P: P and Q are estimated by the combined proportions of the two samples. The assumptions are: ① n1 and n2 are large; ② the samples are selected at random. The significance of the difference is found by the normal deviate, the Z test:

$Z = \frac{p_1 - p_2}{SE(p_1 - p_2)}$

5. The association between the chi-square test and the Z test for comparing rates or proportions from large samples.

When we compare two rates or proportions from large samples, the degree of freedom is 1 and the two statistics are related by χ² = Z²; this follows from the relationship between the χ² distribution and the Z distribution.
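The relation χ² = Z² can be checked numerically. The sketch below uses the rural/urban diabetes counts from the practice exercise of this chapter, with the pooled proportion as the estimate of P and the ordinary (uncorrected) fourfold-table χ²; the function names are assumptions introduced for this illustration.

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """Z test for two proportions from large independent samples (pooled P)."""
    p1, p2 = x1 / n1, x2 / n2
    p_pooled = (x1 + x2) / (n1 + n2)
    q_pooled = 1 - p_pooled
    se = math.sqrt(p_pooled * q_pooled * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

def fourfold_chi2_uncorrected(a, b, c, d):
    """Ordinary chi-square for a 2x2 table, without continuity correction."""
    n = a + b + c + d
    return (a * d - b * c) ** 2 * n / ((a + b) * (c + d) * (a + c) * (b + d))

# Rural: 45 with diabetes out of 3495; urban: 107 out of 3516.
z = two_proportion_z(45, 3495, 107, 3516)
chi2 = fourfold_chi2_uncorrected(45, 3450, 107, 3409)

print(f"Z = {z:.3f}, Z squared = {z * z:.3f}, chi2 = {chi2:.3f}")
```

The two values agree up to floating-point rounding only when the χ² is computed without the continuity correction.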

The teaching of practice (2 hours)
Emphasis:
1. Master the calculation of the standard error of proportion (SEP) and the standard error of the difference between two proportions, SE(p1-p2).
2. Master the methods of hypothesis testing for rates or proportions from large samples.
Contents:
1. Hypothesis test for a proportion from one large sample against a known population proportion.

2. Hypothesis test for two proportions from two large independent samples:

In actual practice, we do not know the value of the population proportion and we have only the two samples, so we substitute sample values in place of P: P and Q are estimated by the combined proportions of the two samples. The assumptions are: ① n1 and n2 are large; ② the samples are selected at random. The significance of the difference is found by the normal deviate, the Z test.

3. Do the exercises:

(1) In a locality with an unprotected population of 1000, 8 percent died of smallpox in a specified year. Of the unprotected, 250 were vaccinated and only 12 of them died in the following year. The vaccinator claimed that vaccination was responsible for reducing the mortality in the vaccinated population. Justify his claim.

(2) In an epidemiological study of diabetes in the urban and rural populations of Ahmedabad district, the following data were obtained. Compute the prevalence in the two areas and determine whether the results differ statistically. Apply both the chi-square test and the Z test, and verify the association between the two test statistics.

Area Diabetes No diabetes Total

Rural 45 3450 3495

Urban 107 3409 3516

Total 152 6859 7011

Chapter 13 Nonparametric statistics
The teaching of theory (6 hours)
Objective:
1. Master the concept of nonparametric methods in statistics and the conditions for applying them.
2. Master the methods of the rank sum test for data from different designs.
3. Know well the principle of the rank sum test.
4. Understand the methods of comparison between any two groups.
Emphasis:
1. Master the concept of nonparametric methods in statistics and the conditions for applying them.
2. Master the methods of the rank sum test for data from different designs.
Difficulty: the principle of the rank sum test.
Contents:
1. Nonparametric methods make no assumption about the form of the distribution of the variable; we only infer from the samples whether the population distributions are the same or differ significantly.

Nonparametric tests are applied mainly under the conditions listed below:

(1) Quantitative data that are not normally distributed.

(2) The distribution is not known.

(3) Data from samples of ordinal categories.

(4) The population variances are not equal.

2. The principle of the rank sum test: when the two populations have the same distribution, the mean ranks of the two samples should be very close after all the values are ranked together in ascending order. Conversely, when the two populations are not the same, the rank sums of the two samples should differ markedly, and we can infer that the two population distributions differ significantly.

3. The methods of the rank sum test for data from different designs.

(1) One-sample test:

Calculate the difference between every value and the known population median, rank the differences by absolute value in ascending order, and sum the ranks of the positive and negative differences separately. Record the sums as T+ and T-, select the smaller one as the test statistic, then draw the conclusion using the table of critical T values.

(2) Paired-samples test:

Calculate the difference within every pair, rank the differences by absolute value in ascending order, and sum the ranks of the positive and negative differences separately. Record the sums as T+ and T-, select the smaller one as the test statistic, then draw the conclusion using the table of critical T values. Remember to assign the mean rank to values that tie.

(3) Two-independent-samples test:

Rank all the values from both samples together in ascending order and sum the ranks of the two groups separately. Record the sums as T1 and T2. If the two sample sizes differ, take the rank sum of the smaller sample as the test statistic; if the two sample sizes are equal, take the smaller rank sum. Finally draw the conclusion using the table of critical T values. Again, assign the mean rank to values that tie.

(4) Test for three or more independent samples:

Rank all the values from all samples together in ascending order and sum the ranks of each group separately, assigning the mean rank to values that tie. Record the sums as R1, R2, R3, …, Rk. Then calculate the test statistic H using the following equation, and finally draw the conclusion using the table of critical H values.

$H = \frac{12}{N(N+1)} \sum_i \frac{R_i^2}{n_i} - 3(N+1)$
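The ranking steps above, including mean ranks for ties, can be sketched as follows. The data are the blood Pb values of the two groups of workers from the practice exercise of this chapter; the function names are assumptions, and the H statistic is the uncorrected Kruskal-Wallis form H = 12/(N(N+1)) · ΣRi²/ni - 3(N+1), shown here for two groups only to illustrate the formula.

```python
def average_ranks(values):
    """Rank values in ascending order, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1          # positions i..j share one rank
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def rank_sums(groups):
    """Rank all groups together and return the rank sum of each group."""
    pooled = [x for g in groups for x in g]
    ranks = average_ranks(pooled)
    sums, start = [], 0
    for g in groups:
        sums.append(sum(ranks[start:start + len(g)]))
        start += len(g)
    return sums

def kruskal_wallis_h(groups):
    """H statistic without tie correction."""
    n_total = sum(len(g) for g in groups)
    sums = rank_sums(groups)
    s = sum(r * r / len(g) for r, g in zip(sums, groups))
    return 12 / (n_total * (n_total + 1)) * s - 3 * (n_total + 1)

exposed = [0.82, 0.37, 0.97, 1.21, 1.64, 2.08, 2.13]
unexposed = [0.24, 0.24, 0.29, 0.33, 0.44, 0.58, 0.63, 0.72, 0.87, 1.01]

t1, t2 = rank_sums([exposed, unexposed])
h = kruskal_wallis_h([exposed, unexposed])
print(f"T1 = {t1}, T2 = {t2}, H = {h:.3f}")
```

The rank sums must add up to N(N+1)/2, which gives a quick check on the ranking.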

4. Comparison between any two groups.

When H0 is rejected, we only know that the population distributions are not all the same; a detailed comparison of any two populations requires a further method of comparison between any two samples.

The teaching of practice (2 hours)
Emphasis:
1. Master the conditions for applying nonparametric methods.
2. Master the methods of the rank sum test for data from different designs.
Contents:
1. The conditions for applying a nonparametric test: (1) quantitative data that are not normally distributed; (2) the distribution is not known; (3) data from samples of ordinal categories; (4) the population variances are not equal, etc.

2. The methods of the rank sum test for data from different designs: (1) one-sample test; (2) paired-samples test; (3) two-independent-samples test; (4) test for three or more independent samples. Go over the process of each method, take care to rank the values in ascending order, and assign the mean rank to values that tie.

3. Finish the following exercises:

(1) To find the effect of long-distance running on heart function, 15 male students were sampled at random. Their pulse rates were measured before training and again after 5 months of long-distance running. The data are given below. Does long-distance running affect the pulse rate significantly?

No.               1   2   3   4   5   6   7   8   9   10  11  12  13  14  15
Before training   70  76  56  63  63  56  58  60  67  65  75  66  56  59  72
After training    48  54  60  64  48  55  54  45  50  48  56  48  62  49  50

(2) The following data were measured in two groups of workers. Does the Pb level in blood differ significantly between the two groups?

Workers in Pb environment:     0.82 0.37 0.97 1.21 1.64 2.08 2.13
Workers not in Pb environment: 0.24 0.24 0.29 0.33 0.44 0.58 0.63 0.72 0.87 1.01

Chapter 14 Linear correlation and regression
The teaching of theory (6 hours)
Objective:
1. Know well how to draw the scatter diagram, and the types of correlation.
2. Master the analysis of linear correlation and regression, and the process of analysis.
3. Master the conditions for linear correlation and regression.
4. Know well the application of the regression equation.
5. Master the association between linear correlation and regression.
6. Understand Spearman's rank order correlation and the types of linear correlation.
Emphasis:
1. Master the analysis of linear correlation and regression, and the process of analysis.
2. Master the conditions for linear correlation and regression.
3. Master the association between linear correlation and regression.
Difficulty: the hypothesis tests of r and b.
Contents:
1. The process of linear correlation analysis.

(1) Draw the scatter diagram.

The types of correlation: perfect correlation; moderate correlation; absolutely no correlation.

(2) Calculate the correlation coefficient r.

(3) The hypothesis test of r: first, state the hypotheses, such as H0: ρ=0, H1: ρ≠0, α = 0.05.

Then calculate the test statistic $t = r \Big/ \sqrt{\frac{1-r^2}{n-2}}$ with ν = n-2 (use the t value when n ≤ 50, or the Z value when n > 50); alternatively, regard r itself as the statistic with ν = n-2.

Finally, compare with the critical r value, infer the size of the P value and draw the statistical conclusion.

2. The process of linear regression analysis.

(1) From the known data, draw a scatter diagram to display the relationship between the two sets of results.

(2) Calculate b and a, and write out the regression equation.

(3) The hypothesis test of the regression equation.

The hypothesis test of the regression coefficient is equivalent to that of the correlation coefficient for the same data, so we can substitute the hypothesis test of correlation for that of the regression equation.

(4) The conditions for correlation and regression:

For linear correlation: continuous or quantitative data; both associated variables are normally distributed.

For linear regression: the variable Y must follow a normal distribution, while X may be a variable that is measured precisely and controlled strictly.
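The calculations of r, b, a and the t statistic can be sketched together. The sketch below uses the weight and vital capacity data of the 12 female college students from the practice exercise of this chapter; the function name and the use of the sums of squares lxx, lyy and lxy are assumptions of this illustration, and 2.228 is the two-sided t(0.05, 10) read from a standard t table.

```python
import math

def correlation_and_regression(x, y):
    """Return r, the regression line y = a + b*x, and the t statistic for H0: rho = 0."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    lxx = sum((xi - mean_x) ** 2 for xi in x)
    lyy = sum((yi - mean_y) ** 2 for yi in y)
    lxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

    r = lxy / math.sqrt(lxx * lyy)           # correlation coefficient
    b = lxy / lxx                            # regression coefficient (slope)
    a = mean_y - b * mean_x                  # intercept
    t = r * math.sqrt((n - 2) / (1 - r * r)) # same t serves for r and for b
    return {"r": r, "a": a, "b": b, "t": t, "df": n - 2}

weight = [42, 42, 46, 46, 46, 50, 50, 50, 52, 52, 58, 58]
vital_capacity = [2.55, 2.20, 2.75, 2.40, 2.80, 2.81,
                  3.41, 3.10, 3.46, 2.85, 3.50, 3.00]

res = correlation_and_regression(weight, vital_capacity)
T_CRIT = 2.228                               # two-sided t(0.05, 10), from a t table
print({k: round(v, 4) for k, v in res.items()})
print("reject H0:", abs(res["t"]) >= T_CRIT)
```

Because the same t statistic tests both ρ = 0 and β = 0 in simple linear regression, a single calculation covers both hypothesis tests.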

3. The application of the regression equation.

(1) Describing the dependence relationship between two variables.

(2) Using the regression equation to make forecasts.

4. The association between linear correlation and regression.

(1) In terms of the type of data: for regression, the variable Y must follow a normal distribution, while X may be measured precisely and controlled strictly; for correlation, both variables (X, Y) must follow a normal distribution.

(2) In terms of applications: regression describes the numerical relationship, whereas correlation only expresses the degree and direction of the relationship.

5. Spearman's rank order correlation.

(1) The conditions for Spearman's rank order correlation.

The two variables do not follow a normal distribution, or the measurements of one or both variables are ordinal data.

(2) The calculation of the coefficient of Spearman's rank order correlation.
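One common way to calculate the coefficient, valid when there are no tied values, is rs = 1 - 6Σd²/(n(n²-1)), where d is the difference between the two ranks of each subject. The sketch below assumes that formula; the data are hypothetical and chosen without ties, and the function names are introduced only for this illustration.

```python
def simple_ranks(values):
    """Ranks 1..n in ascending order; assumes there are no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank
    return ranks

def spearman_rs(x, y):
    """Spearman rank correlation by the d-squared formula (no ties assumed)."""
    n = len(x)
    rank_x, rank_y = simple_ranks(x), simple_ranks(y)
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))

# Hypothetical paired observations, no ties.
x = [10, 20, 30, 40, 50]
y = [3.1, 2.0, 5.5, 4.2, 7.0]

rs = spearman_rs(x, y)
print(f"rs = {rs:.2f}")
```

When ties are present, mean ranks are needed and the shortcut formula is only approximate.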

the teaching of practice (3 hours)Emphasis:1.master the process of linear correlation analysis, especially, calculation of correlation coefficient of r and it’s

hypothesis testing.

2. master the process of linear regression analysis, especially, calculation of regression coefficient of b and a,

hypothesis testing of b.

Contents:1.the process of linear correlation analysis: drawing of scatter diagram, calculation of coefficient of correlation,

hypothesis testing of r.

2. the process of linear regression analysis: drawing of scatter diagram, calculation of coefficient of regression,

Page 27: BioStatistics

hypothesis testing of b; work out regression equation and draw then regression line.

3.introduction of using calculator to do linear correlation and regression.

The step of the use of calculator:①Select mode for calculating;②Clear away the memory of

calculator;③Input the data of X and Y together;④recall the answer such as r, a ,b,etc.

4. Do the exercise below:

The weight (X) and vital capacity (Y) of 12 female college students are given below. Carry out a correlation analysis.

Weight (kg)          42    42    46    46    46    50    50    50    52    52    58    58
Vital capacity (L)   2.55  2.20  2.75  2.40  2.80  2.81  3.41  3.10  3.46  2.85  3.50  3.00

(1) Draw a scatter diagram to display the relationship between the two variables.

(2) Calculate the correlation coefficient.

(3) Test the hypothesis about r.

(4) Derive the regression equation of Y on X.

(5) Test the hypothesis about b.

(6) Draw the regression line.
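Steps (2) to (5) can be checked with a short script. The sketch below is one possible way to do the arithmetic with the usual sums of squares and products (written here as lxx, lyy, lxy); the function and variable names are illustrative assumptions, and the t statistic still has to be compared with the critical value for n - 2 degrees of freedom from a t table.

```python
import math

# Data from the exercise above.
weight = [42, 42, 46, 46, 46, 50, 50, 50, 52, 52, 58, 58]            # X, kg
vital_capacity = [2.55, 2.20, 2.75, 2.40, 2.80, 2.81,
                  3.41, 3.10, 3.46, 2.85, 3.50, 3.00]                 # Y, L


def correlation_and_regression(x, y):
    """Return r, the regression line Y = a + bX, and the t statistic for r."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    lxx = sum((xi - mean_x) ** 2 for xi in x)   # sum of squares of X
    lyy = sum((yi - mean_y) ** 2 for yi in y)   # sum of squares of Y
    lxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

    r = lxy / math.sqrt(lxx * lyy)              # correlation coefficient
    b = lxy / lxx                               # regression coefficient (slope)
    a = mean_y - b * mean_x                     # intercept
    t = r * math.sqrt((n - 2) / (1 - r * r))    # test statistic for H0: rho = 0
    return {"r": r, "a": a, "b": b, "t": t, "df": n - 2}


result = correlation_and_regression(weight, vital_capacity)
for name, value in result.items():
    print(name, "=", round(value, 4))
```

In simple linear regression the t statistic for b equals the one for r, so the value computed here serves for both step (3) and step (5).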

Chapter 15 Computer software for data analysis

The teaching of theory (4 hours)

Objective:

1. Know well the common software for data analysis, such as SAS (Statistical Analysis System), SPSS (Statistical Package for the Social Sciences), STATA (Statistics/Graphics/Data management), etc.

2. Master the programs for handling the data obtained from research work, locate the main results, and draw the conclusion.

3. Know well the criteria for choosing among the different methods of statistical hypothesis testing; students should check for themselves, using SAS programs, which criteria the data meet.

Emphasis:

1. The SAS programs for analyzing the data obtained from research work, locating the main results, and drawing the conclusion.

2. Under different criteria, different methods should be used to test the statistical hypothesis; under different criteria the conclusions usually differ.

Difficulty:

1. Writing a correct program, and editing a program when it is wrong.

2. Locating the main results and drawing the conclusion correctly.

Contents:

1. Programs and main results for quantitative data

(1) Paired-sample test: ① normality test; ② t-test for paired-design data

(2) One-sample test: ① normality test; ② t-test for one-sample data

(3) Two-independent-samples test: ① normality test; ② test of equality of variances; ③ t-test for two independent samples

2. Programs and main results for qualitative data

(1) Fourfold table data: Tmin and the continuity-adjusted chi-square (Continuity Adj. Chi-Square)

(2) R×C table data: the criteria for applying the R×C χ² test

3. Programs and main results for linear correlation and regression


(1) Locate the correlation coefficient and the P value of its hypothesis test.

(2) Locate the regression coefficient and the P value of its hypothesis test.

(3) Write out the regression equation from the main results.

(4) Compare the hypothesis-test results for the correlation coefficient and the regression coefficient.
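For item 2 of the contents above (qualitative data), the quantities named there can be computed by hand as a check on the software output. The sketch below is written in Python rather than SAS and is only an illustration: it computes the expected counts T, the minimum expected count Tmin, the chi-square statistic, and the continuity-adjusted chi-square for a fourfold table. The counts are taken from exercises (3) and (4) of the practice session in this chapter; all function names are hypothetical.

```python
def expected_counts(table):
    """Expected count for each cell: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square(table, continuity=False):
    """Chi-square statistic and degrees of freedom for an R x C table.

    With continuity=True, 0.5 is subtracted from each |O - T| before
    squaring (the continuity adjustment, meant for fourfold tables).
    """
    expected = expected_counts(table)
    statistic = 0.0
    for observed_row, expected_row in zip(table, expected):
        for o, t in zip(observed_row, expected_row):
            diff = abs(o - t)
            if continuity:
                diff = max(diff - 0.5, 0.0)
            statistic += diff ** 2 / t
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


# Fourfold table from exercise (3): rows are single and compound medication.
fourfold = [[2, 10], [14, 14]]
t_min = min(min(row) for row in expected_counts(fourfold))
chi2_adj, df_2x2 = chi_square(fourfold, continuity=True)
print("Tmin =", round(t_min, 2), " adjusted chi-square =", round(chi2_adj, 3))

# R x C table from exercise (4): rows are areas A, B, C.
rc_table = [[6, 23], [30, 14], [8, 3]]
chi2_rc, df_rc = chi_square(rc_table)
print("R x C chi-square =", round(chi2_rc, 3), " df =", df_rc)
```

The statistic is then compared with the chi-square critical value for the given degrees of freedom; whether the adjusted or unadjusted value applies depends on the criteria (n and Tmin) taught in the course.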

The teaching of practice (6 hours)

Emphasis:

Students apply SAS programs to analyze data, find the main results, and draw the conclusions by themselves.

Contents:

1. Programs and main results for quantitative data: (1) paired-sample test; (2) one-sample test; (3) two-independent-samples test.

2. Programs and main results for qualitative data: (1) fourfold table data; (2) R×C table data.

3. Programs and main results for linear correlation and regression: (1) locate the correlation and regression coefficients; (2) compare the hypothesis tests for the correlation coefficient and the regression coefficient; (3) write out the regression equation.

4. Do the exercises:

(1) Many studies show that the mean biparietal diameter (BPD) of normal male neonates is 9.3 cm. A doctor investigated 12 normal male neonates from a mountainous area and recorded their BPD (cm) as follows: 9.95 9.33 9.49 9.00 10.09 9.15 9.52 9.33 9.16 9.37 9.11 9.27. Test whether the BPD of male neonates in the mountainous area is greater than that of neonates in general.

(2) In a nutritional study, 13 children were given a usual diet plus vitamin A and D tablets, while a second comparable group of 12 children was given the usual diet only. After one year, the gain in weight in pounds was noted as given in the table below. Can we say that vitamins A and D were responsible for this difference?

Children on usual diet: 1 3 2 4 2 1 3 4 3 4 3 2 2 3

Children on vitamins: 5 3 4 3 2 6 3 2 3 6 7 5 3

(3) Patients with lymphoma were randomly divided into two groups, treated with single and compound medication respectively. The numbers of patients who got better are given below. Test whether the two rates are significantly different.

Number of lymphoma patients treated

Group                  Getting better   Not getting better
Single medication             2                 10
Compound medication          14                 14

(4) FDA officers want to examine the aflatoxin pollution of peanuts from 3 areas. The results are given below. Compare whether the aflatoxin pollution rates of the three areas differ.

Comparison of the aflatoxin pollution rates of the three areas (counts are numbers of samples)

Area     Not polluted   Polluted   Total   Pollution rate (%)
A              6           23        29          79.3
B             30           14        44          31.8
C              8            3        11          27.3
Total         44           40        84          47.6


(5) In a laboratory experiment, the muscular contractions of a frog muscle were measured at different doses of a given drug. The height of the curve was taken as the response to the drug. The observations were as follows.

Serial number of experiment   1     2     3     4     5
Dose of drug                  0.3   0.4   0.6   0.8   0.9
Response to drug              54.0  59.0  60.0  65.0  70.0

a. Calculate the correlation coefficient and test its significance.

b. Determine the regression coefficient ‘b’.

c. Determine the expected value of Y for the given values of X using the regression equation.
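Exercise (5) can also be worked outside SAS as a cross-check on the program output. The following sketch (variable names are illustrative assumptions) computes r, the regression equation, and the expected Y for each dose, and then tests b through its standard error so that the result can be compared with the test of r.

```python
import math

# Data from exercise (5).
dose = [0.3, 0.4, 0.6, 0.8, 0.9]             # X
response = [54.0, 59.0, 60.0, 65.0, 70.0]    # Y

n = len(dose)
mean_x = sum(dose) / n
mean_y = sum(response) / n
lxx = sum((x - mean_x) ** 2 for x in dose)
lyy = sum((y - mean_y) ** 2 for y in response)
lxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dose, response))

# a. Correlation coefficient and its t statistic (df = n - 2).
r = lxy / math.sqrt(lxx * lyy)
t_r = r * math.sqrt((n - 2) / (1 - r ** 2))

# b. Regression coefficient b and intercept a of Y = a + bX.
b = lxy / lxx
a = mean_y - b * mean_x

# c. Expected value of Y at each observed dose.
fitted = [a + b * x for x in dose]

# Test of b: t_b = b / s_b, where s_b comes from the residual standard deviation.
residual_ss = sum((y - f) ** 2 for y, f in zip(response, fitted))
s_yx = math.sqrt(residual_ss / (n - 2))
s_b = s_yx / math.sqrt(lxx)
t_b = b / s_b

print("r =", round(r, 4), " t_r =", round(t_r, 3))
print("Y =", round(a, 3), "+", round(b, 3), "X", " t_b =", round(t_b, 3))
print("expected Y:", [round(f, 2) for f in fitted])
```

The two t statistics agree apart from rounding, which is why the hypothesis tests for the correlation coefficient and the regression coefficient lead to the same conclusion for the same data.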