Introduction to biostatistics
Lecture plan
1. Basics
2. Variable types
3. Descriptive statistics:
   - Categorical data
   - Numerical data
4. Inferential statistics:
   - Confidence intervals
   - Hypothesis testing
DEFINITIONS
STATISTICS can mean 2 things:
- the numbers we get when we measure and count things (data)
- a collection of procedures for describing and analysing data.
BIOSTATISTICS: the application of statistics in the natural sciences, when biomedical problems are analysed.
Why do we need statistics?
Basic parts of statistics:
Graphical presentation of frequencies
Normal distributions
- Most observations lie around the center.
- Fewer observations lie above and below the central values, in approximately the same proportions.
- Most often a Gaussian distribution.
Not normal distributions
More observations in one part.
Asymmetrical distribution
How would you describe/present your respondents if the data are numeric?
2 groups of measures:
1. Central tendency (central value, average)
2. Variability (spread)
MEASURES OF CENTRAL TENDENCY
Arithmetic mean ($\bar{X}$, $\mu$)
Median (Me): the middle value or 50th percentile (the value of the observation that divides the sorted data into two almost equal parts).
It is found this way: the position of the median in the sorted data is $(n + 1)/2$.
When n is odd: the median is the middle observation.
When n is even: the median is the average of the values of the two middle observations.
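The odd/even rule above can be sketched in a few lines of Python. The function name `median_by_position` and the sample values are made up for illustration.

```python
def median_by_position(values):
    """Median found via the (n + 1) / 2 position rule on the sorted data."""
    data = sorted(values)
    n = len(data)
    if n == 0:
        raise ValueError("the median of empty data is undefined")
    mid = n // 2
    if n % 2 == 1:
        # n odd: position (n + 1) / 2 is a whole number, i.e. index `mid` (0-based)
        return data[mid]
    # n even: position (n + 1) / 2 falls between two observations, so average them
    return (data[mid - 1] + data[mid]) / 2


odd_sample = [7, 3, 9, 1, 5]        # sorted: 1 3 5 7 9
even_sample = [7, 3, 9, 1, 5, 11]   # sorted: 1 3 5 7 9 11

print(median_by_position(odd_sample))   # 5
print(median_by_position(even_sample))  # 6.0
```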
Mode (Mo): the most common value. There can be more than one mode.
Quartiles (Q1, Q2, Q3, Q4): the sorted sample is divided into 4 equal parts, with 25% of the observations in each of them.
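As a rough illustration, the mean, the mode(s) and the quartile cut points can be computed with Python's standard `statistics` module. The `ages` data are invented, and `statistics.quantiles` uses one particular interpolation convention, so other software may report slightly different Q1 and Q3.

```python
import statistics

ages = [23, 25, 25, 27, 30, 31, 31, 35, 40, 62]  # made-up respondent ages

mean = statistics.mean(ages)                   # arithmetic mean
modes = statistics.multimode(ages)             # every most-common value
q1, q2, q3 = statistics.quantiles(ages, n=4)   # three cut points, four parts

print("mean:", mean)
print("modes:", modes)                         # two modes in this sample
print("Q1, Q2, Q3:", q1, q2, q3)               # Q2 coincides with the median
```

Here the single large value (62) pulls the mean above the median, which is one reason the median is preferred for asymmetrical data.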
Is a measure of central tendency enough to describe respondents?
MEASURES OF VARIABILITY
- Min and max
- Range
- Standard deviation: the square root of the variance
- Interquartile range (IQR): Q3 - Q1, i.e. the 75th percentile minus the 25th percentile
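A minimal sketch of these measures in Python follows. The data are invented, and the sample (n - 1) versions of the variance and standard deviation are an assumption, because the formula on the original slide did not survive.

```python
import statistics

values = [4, 8, 15, 16, 23, 42]  # made-up measurements

minimum, maximum = min(values), max(values)
value_range = maximum - minimum

variance = statistics.variance(values)  # sample variance (divides by n - 1)
sd = statistics.stdev(values)           # square root of the sample variance

q1, _, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1                           # interquartile range

print("min/max:", minimum, maximum)
print("range:", value_range)
print("variance, SD:", variance, sd)
print("IQR:", iqr)
```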
What measures are to be used for sample description?
If the distribution is NORMAL:
- Mean
- Variance (or standard deviation)
If the distribution is NOT NORMAL:
- Median
- IQR or min/max
The measures for non-normal data are also used with numeric ordinal data.
$\bar{X}$, Mo, Me
Mean ≈ Median ≈ Mode, SD and the empirical rule
EMPIRICAL RULE
Percentage of observations within 1, 2 and 2.5 SD of the mean, if the distribution is normal
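The percentages themselves did not survive in the transcript, but for an exactly normal distribution they can be computed from the error function. The helper name `share_within` is made up for this sketch.

```python
import math


def share_within(k):
    """Proportion of a normal distribution lying within k SD of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 2.5):
    print(f"within {k} SD of the mean: {100 * share_within(k):.1f}%")
```

This gives roughly 68%, 95% and 99% for 1, 2 and 2.5 SD respectively.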
Interval where the “true” value is most likely to lie.
The variability of samples and their measures
[Figure: labels $\mu$, $\sigma$, $p_0$, and four samples with their own measures: $\bar{X}_1$, $SD_1$, $p_1$; $\bar{X}_2$, $SD_2$, $p_2$; $\bar{X}_3$, $SD_3$, $p_3$; $\bar{X}_4$, $SD_4$, $p_4$.]
The variability of samples and confidence intervals
[Figure: labels $\mu$, $p_0$.]
Confidence interval
Statistical definition:
If the study were carried out 100 times, giving 100 results and 100 CIs (95% CIs), the “true” value would lie within the interval in 95 of the 100 cases and outside it in the remaining 5.
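This repeated-study reading can be checked with a small simulation. The sketch below assumes a normally distributed population with made-up parameters, repeats the "study" many times, and counts how often the 95% CI for the mean contains the true mean. All names and numbers are illustrative.

```python
import random
import statistics


def ci_coverage(true_mean=120.0, true_sd=15.0, n=50, studies=1000, seed=1):
    """Share of simulated studies whose 95% CI contains the true mean."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    hits = 0
    for _ in range(studies):
        sample = [rng.gauss(true_mean, true_sd) for _ in range(n)]
        mean = statistics.mean(sample)
        se = statistics.stdev(sample) / n ** 0.5
        low, high = mean - 1.96 * se, mean + 1.96 * se
        if low <= true_mean <= high:
            hits += 1
    return hits / studies


coverage = ci_coverage()
print(f"CIs containing the true mean: {100 * coverage:.1f}%")
```

The printed share should land close to 95%, not exactly on it, because the number of simulated studies is finite and the SD is estimated from each sample.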
Confidence intervals (general, most common calculation)
95% CI: $\bar{X} \pm 1.96 \cdot SE$, giving the limits $\bar{X}_{min}$; $\bar{X}_{max}$
Note: for a normal distribution, when n is large
95% CI: $p \pm 1.96 \cdot SE$, giving the limits $p_{min}$; $p_{max}$
Note: when $p$ and $1 - p$ are both $> 5/n$
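The two formulas share the same shape, estimate ± 1.96 × SE, so one helper covers both. In this sketch the function names `ci95` and `normal_approx_ok` and the example numbers are made up, and the SE is taken as already known.

```python
def ci95(estimate, se):
    """95% confidence limits: estimate - 1.96 * SE and estimate + 1.96 * SE."""
    z = 1.96
    return estimate - z * se, estimate + z * se


def normal_approx_ok(p, n):
    """The note for proportions: both p and 1 - p must exceed 5 / n."""
    return p > 5 / n and (1 - p) > 5 / n


# Mean of 120 with SE 2 (made-up numbers)
mean_low, mean_high = ci95(120.0, 2.0)
print(f"mean: {mean_low:.2f} to {mean_high:.2f}")

# Proportion of 0.30 with SE 0.05, observed in n = 84 respondents (made-up)
p_low, p_high = ci95(0.30, 0.05)
print(f"proportion: {p_low:.3f} to {p_high:.3f}, approximation ok: {normal_approx_ok(0.30, 84)}")
```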
Standard error (SE)
Numeric data ($\bar{X}$)
Categorical data ($p$)
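The slide formulas for the standard error were lost in extraction. The sketch below uses the usual textbook forms, SD/√n for a mean and √(p(1 − p)/n) for a proportion, which is an assumption about what the slide showed. The function names are made up.

```python
import math
import statistics


def se_mean(values):
    """Standard error of the mean: sample SD divided by sqrt(n)."""
    return statistics.stdev(values) / math.sqrt(len(values))


def se_proportion(p, n):
    """Standard error of a proportion: sqrt(p * (1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)


measurements = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up numeric data
print("SE of the mean:", round(se_mean(measurements), 3))
print("SE of p = 0.5 with n = 100:", round(se_proportion(0.5, 100), 3))
```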
Width of the confidence interval depends on:
a) Sample size;
b) Confidence level (guarantee: usually 95%, but any % is possible);
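Both effects can be seen numerically. This sketch assumes a CI for a mean with a given SD (the `sd` argument is needed to compute the SE and is an assumption of the example) and takes the multiplier for any confidence level from the standard normal distribution.

```python
from statistics import NormalDist


def ci_width(sd, n, level=0.95):
    """Full width of a normal-approximation CI for a mean."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for level = 0.95
    return 2 * z * sd / n ** 0.5


# a) Larger samples give narrower intervals: 4 times the n halves the width
print(round(ci_width(10, 100), 2), round(ci_width(10, 400), 2))

# b) Higher confidence levels give wider intervals
print(round(ci_width(10, 100, 0.90), 2), round(ci_width(10, 100, 0.99), 2))
```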
Hypothesis testing
Test for P value (t-test, χ², etc.).
The P value is the probability of getting a difference (association) at least as large as the one observed, if the null hypothesis is true.
OR
The P value is the probability of getting the difference (association) due to chance alone, when the null hypothesis is true.
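One way to make "due to chance alone" concrete is a permutation test. It is not one of the tests named in this lecture and is used here only as an illustration. If the null hypothesis is true, group labels are interchangeable, so we shuffle them many times and count how often the shuffled difference in means is at least as large as the observed one. The function name and the data are made up.

```python
import random
import statistics


def permutation_p_value(group_a, group_b, shuffles=2000, seed=0):
    """Share of label shuffles whose mean difference is >= the observed one."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    observed = abs(statistics.mean(group_a) - statistics.mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(shuffles):
        rng.shuffle(pooled)  # reassign the group labels at random
        diff = abs(statistics.mean(pooled[:n_a]) - statistics.mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    return extreme / shuffles


treated = [1, 2, 3, 4, 5, 6, 7, 8]                   # made-up values
control = [101, 102, 103, 104, 105, 106, 107, 108]   # made-up values
print(permutation_p_value(treated, control))          # very small P value
print(permutation_p_value(treated, treated))          # no difference at all
```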
Statistical conventions
If P < 0.05, we say that the results can't be explained by chance alone, therefore we reject H0 and accept HA.
If P ≥ 0.05, we say that the difference found can be due to chance alone, therefore we don't reject H0.
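The convention can be sketched for a z test. The example below assumes a two-sided test and made-up numbers: a sample mean of 123 against a hypothesised mean of 120 with SE 1.2. The function names are illustrative.

```python
from statistics import NormalDist


def two_sided_p(z):
    """Two-sided P value for a z statistic under the standard normal."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


def decide(p, alpha=0.05):
    """Apply the convention: reject H0 only when P < alpha."""
    return "reject H0" if p < alpha else "do not reject H0"


z = (123 - 120) / 1.2   # observed difference divided by its SE
p = two_sided_p(z)
print(f"z = {z:.2f}, P = {p:.4f}: {decide(p)}")
```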
Tests
The test depends on:
- Study design,
- Variable type and distribution,
- Number of groups, etc.
Tests (probability distributions):
- z test
- t test (one sample, two independent, paired)
- χ² (+ trend)
- F test
- Fisher exact test
- Mann-Whitney
- Wilcoxon
and others.
The P value tells whether there is a statistically significant difference (association).
The CI gives the interval where the true value can be.
Neither the P value nor the CI accounts for other explanations of the result (bias and confounding).
Neither the P value nor the CI tells anything about the biological, clinical or public health meaning of the results.