L1.1 Biostatistics II (PUBH5769) TOPIC 1 UNIT OVERVIEW AND INTRODUCTORY MATERIAL 1.1 UNIT OVERVIEW This course covers biostatistical methods commonly used in epidemiological and clinical research. It focuses on modern regressio n (and a few classical) methods for• Quanti tati ve outcomes • Bi nary ou tc omes • Count (number of e vents) out comes • Time-t o- event out comes. It considers and describes the application of these methods to data from • cr os s- se ctional , • case-control, • cohort a nd • cl in ic al st ud ie s The emphasis is on • How to int ell ige ntl y us e t hese methods • How to use SAS to do the cal cul ati ons • How t o i nt er pr et the r esul ts And not on •The underlying statistical theory •Memorising formulae
This document is posted to help you gain knowledge. Please leave a comment to let me know what you think about it! Share it to your friends and learn new things together.
• A knowledge of basic statistical methods and concepts(as taught in introductory biostatistics/statistics coursessuch as PUBH8753/4401 Biostatistics 1)
• A basic familiarity with epidemiological/clinical study designs(copy of a chapter from a book is on LMS)
• Familiarity with hand-held calculatorsYou must have your own calculator and know how to use it. Also, it must have an ‘approved’ UWA sticker to take into exam.
• Familiarity with computing in a Windows environment
• Experience with at least one statistical analysis package(such as SPSS).
SPSS
Learning outcomes
• Understand and be able to apply standard biostatisticalmethods commonly used in epidemiological and clinicalresearch including
ANOVA and multiple linear regression2 x K frequency table methods, logistic regressionIncidence rates and Poisson regression
Kaplan-Meier survival curves and Cox regression
• Be able to use the package SAS to carry out statisticalanalyses of epidemiological/clinical data.
• Understand the statistical content of articles inepidemiological/clinical literature.
The Course Reader contains a self-explanatory Introduction to SAS. Sign up for anIntroduction to SAS Session with Mark Divitini if you want help with gettingstarted.
The expectation is that you do the SAS computing activities on your own (orpreferably with a class mate) at a place and time that suits you. Some of theProblems in the Course reader involve using SAS.
SAS is available on the computers in theSPH Postgrad Computer Lab (Rm 1.27, Clifton St Building) for SPH students onlySPH Main Computer Lab on the Nedlands Campus next to cafeteria (all students)
Bring a USB flash drive to save and keep your work!
If you use SAS elsewhere, download a copy of the files (datasets and programs) from
LMS or email Mark Divitini and he will send them to you.UWA has a site license for SAS and all enrolled students can get a copy. See LMS
for more details and forms.
If you need help with SAS during the semester, email Mark Divitini with yourquestions or to arrange a time to see him.
SAS
Assessment
• Three assignments each 20% (Due Sept 4, Oct 9, Nov 1)
• Final (written) examination 40% (Exam period Nov 9-23)
• Assignments involve data analysis using hand-calculation as well asSAS, providing written answers to questions that test understanding,and reviewing articles from the literature.
Hand assignments (with cover sheet) to SPH Reception.There is a penalty for (unapproved) late assignments.
• The final exam involves providing written answers to problems. Many
will involve use of a hand-calculator. Some will include SAS output.The exam is “open book”.
• All supplementary unit materials are placed on LMS(including assignments)
• The recording of the lecture should be available soon after thelecture has finished (same day).
• All students enrolled for credit (i.e. doing assessment) areautomatically given permission to access the LMS unit materials.Others who need access should contact lecturer (and provide anemail address).
Study plan and workload• Study each topic each week by reviewing the lecture notes, reading
appropriate sections of the books, doing the related SAS computing,and doing the problems in the course reader.
• Attempt the problems before coming to the tutorial.
• Begin assignments early, don’t leave until last few days.
• You should spend about 9 hours per week on this course
1 – 2 = Difference in log(odds of disease) for control vs treated groups
= ratio of odds of disease for control vs treated groups
= 2.0 ie odds of disease is twice as big in control vs treated
Numbers of events outcome variable
Example: number of deaths during follow-up of groupSummary measure: incidence rateEffect measure: incidence rate ratioRegression method : Poisson regression
For grouped data from follow-up studies, a common summarystatistic is the incidence rate.
risk attime- persontotal
up-followduringeventsnewof numberrateIncidence
eg. Incidence rate for stroke in women aged 50-59 years is 0.002 perperson per year or 2 per 1000 persons per year.
Example: number of deaths during follow-up of groupSummary measure: incidence rateEffect measure: incidence rate ratioRegression method : Poisson regression
Examples of Poisson regression models
log (rate) = + 1 Age
1 = Difference in log (rate) if Age increases by 1 year
CIs and p-values are calculated from the sampling distributions of theestimator and test statistic.The 95% CI indicates how well we have estimated a populationparameter. Narrower CIs indicate greater accuracy.P-values measure the amount of evidence in the sample data against
the null hypothesis. The closer the p-value to zero, the greater theevidence.
Most of the commonly used CI and p-value formulae are based on theassumption that the estimator (or a transformation of it) has a Normalsampling distribution. For moderate and large sample sizes, this isapproximately true.
General formulaeThe general form of these approximate formulae for a single parameter is
95% CI estimate (Z95 or t95) x SE with Z95=1.96
SE
valueedhypothesis-estimate statisticTest
With p-value from Z (or t) distribution
Most (but not all) approximate formulae come from asymptotic likelihood theory
which provides an estimator (maximum likelihood estimator MLE)
an approximate standard error (and hence a 95% CI)
and three approximate p-value formulae (Wald, Score, Likelihood ratio)
The p-value method described above is the Wald p-value.
When we are testing hypotheses involving more than one p arameter , the teststatistic often has an asymptotic chi-squared distribution.When we are comparing variances, the test statistic often has an F distribution.
These are approximate, the exact CI and p-value are
based on Poisson calculations.
Statistical inference formulae in other situations
Formulae for estimation and testing of “effect” measures insimple situations such as the comparison of two groups arereadily available (and can also be done on hand-calculator).
Comparing two means via difference (see Section 2.1)Comparing two proportions via odds ratio (see section 3.1)
Comparing two rates via rate ratio (see section 4.1)Comparing two survival curves (see section 4.3)
In more complex situations the formulae are more complicatedand we must rely on computer programs to obtain the estimate,its SE, its 95% CI and the test statistic value.
DYSPNOEA get shortness of breath0 = never, 1 = hurrying or walking up a hill2 = 1 and when walking with people of same age on level ground3 = 1, 2 and when walking at my own pace on level ground
EXERCISE number of days exercise per week
This dataset relates to baseline (in 1981) and mortality follow-updata (to 1995) for the 1552 participants aged 40-69 years in the1981 Busselton Health survey.
Busselton 1981 survey data (more)
FEV forced expiratory volume in 1 second in L
FVC forced vital capacity in L
HAYFEVER ever had hayfever 0 = no, 1 = yes
HEIGHT height in m
MARITAL 1 = single, 2 = divorce, widowed, separated, 3 = married or de facto
MYOCARD ever had myocardial infarction 0 = no, 1 = yes