1
Basic descriptive epidemiology
you can analyze your data!
The GWU YES Center Evaluation Team and
Manya Magnus, PhD, MPH
The George Washington University
School of Public Health and Health Services
Department of Epidemiology and Biostatistics
July 2007
The data are pouring in: use ’em!
2
Outline
• Research question development
– Research question matrix
• Data description
• Descriptive epidemiologic study designs
– Case reports
– Case series
– Ecologic studies
– Outbreak investigations (introduced)
3
You’ve seen this before!
4
Research questions
• What is a research question?
– A carefully articulated question that you are asking
– Examples
• Is there an association between exposure to the dairy products at a buffet and vomiting?
• Is there an association between exposure to a new virus and severe respiratory distress?
• Is there an association between highly active antiretroviral treatment (HAART) for HIV and HIV viral load?
5
Null and alternative hypotheses (1)
• The research question drives development
of formalized null and alternative
statements
– The null hypothesis is the statement of no
difference
– The alternative hypothesis is generally the
formalized hypothesis underlying the research
question—what you think you will find
6
Null and alternative hypotheses (2)
• Examples
– Research question: Is there an association between highly active antiretroviral treatment (HAART) for HIV and HIV viral load?
– H0: There is no association between receipt of HAART and HIV viral load; that is, viral loads among patients on HAART = viral loads among patients not on HAART
– HA: There is an association between receipt of HAART and HIV viral load; that is, viral loads among patients on HAART ≠ viral loads among patients not on HAART
7
Null and alternative hypotheses (3)
– Research question: Is there an association between exposure to the quiche at the buffet and Salmonella food poisoning?
– H0: There is no association between exposure to the quiche and Salmonella infection; that is, Salmonella infection rates among persons exposed to the quiche = Salmonella infection rates among persons not exposed to the quiche
– HA: There is an association between exposure to the quiche and Salmonella infection; that is, Salmonella infection rates among persons exposed to the quiche ≠ Salmonella infection rates among persons not exposed to the quiche
8
Null and alternative hypotheses (4)
• These are two-sided hypotheses; one-sided hypotheses are also possible, as follows
– Research question: Is HAART associated with decreased HIV viral loads?
– H0: Receipt of HAART is associated with increased or equal HIV viral loads; that is, viral loads among patients on HAART are ≥ viral loads among patients not on HAART
– HA: Receipt of HAART is associated with decreased HIV viral loads; that is, viral loads among patients on HAART are < viral loads among patients not on HAART
9
Null and alternative hypotheses (5)
• Note that in a one-sided H0 and HA, the equal sign remains in the H0 (the ≥), not in the HA
• The HA is the statement that indicates a difference, whereas the null is the statement of no difference
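• This choice has statistical testing implications: with a symmetric test statistic, a one-sided test places the entire rejection region in one tail, so when the observed difference lies in the hypothesized direction its p-value is half the two-sided p-value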
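The following minimal Python sketch makes the testing implication concrete. It assumes that the test statistic is approximately standard normal (a z statistic). The function name `p_values_from_z` and the value z = -1.8 are hypothetical and serve only as illustration.

```python
import math

def p_values_from_z(z):
    """Return (two_sided, one_sided_lower) p-values for a standard normal statistic z.

    one_sided_lower matches HA: viral loads on HAART < viral loads not on HAART,
    so only large negative z counts as evidence against H0.
    """
    # Standard normal CDF via the complementary error function: Phi(z) = 0.5 * erfc(-z / sqrt(2))
    one_sided_lower = 0.5 * math.erfc(-z / math.sqrt(2))  # P(Z <= z)
    two_sided = math.erfc(abs(z) / math.sqrt(2))           # 2 * P(Z >= |z|)
    return two_sided, one_sided_lower

# Hypothetical statistic: viral loads lower among patients on HAART
two, one = p_values_from_z(-1.8)
print(f"two-sided p = {two:.4f}, one-sided p = {one:.4f}")
```

With z = -1.8, the one-sided p-value is about 0.036 and the two-sided p-value is about 0.072. Depending on which hypotheses were chosen, the same data can therefore land on different sides of a 0.05 cutoff. This is why the hypotheses are set before the data are examined.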
10
Variable selection (1)
• Essential to select appropriate variables to
evaluate and to operationalize (define)
them in advance
– Independent variable=predictor=x=exposure
– Dependent variable=outcome=y=disease
11
Variable selection (2)
• Examples
– Is a social marketing campaign promoting DEET-based insect repellent associated with West Nile virus infection?
– Is substance abuse status associated with
unmet need and immunocompromise?
12
Research question matrix (1)
• The research question matrix is a useful
tool to ensure that you are considering all
independent, dependent, confounder
(potential), and effect modifier (potential)
variables in your study
• Helps ensure that you are able to answer
your research question in the end
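A research question matrix can be kept as a small record for each question. The Python sketch below shows one hypothetical layout. The class name, field names, and example variables are illustrative and do not follow any standard format. `missing_roles()` only flags columns that have been left blank.

```python
from dataclasses import dataclass, field

@dataclass
class ResearchQuestionMatrix:
    """Hypothetical record for one row of a research question matrix."""
    question: str
    independent: list[str] = field(default_factory=list)        # predictor / x / exposure
    dependent: list[str] = field(default_factory=list)          # outcome / y / disease
    potential_confounders: list[str] = field(default_factory=list)
    potential_effect_modifiers: list[str] = field(default_factory=list)

    def missing_roles(self) -> list[str]:
        """List the variable roles that have not been filled in yet."""
        roles = {
            "independent": self.independent,
            "dependent": self.dependent,
            "potential_confounders": self.potential_confounders,
            "potential_effect_modifiers": self.potential_effect_modifiers,
        }
        return [name for name, variables in roles.items() if not variables]

row = ResearchQuestionMatrix(
    question="Is there an association between HAART and HIV viral load?",
    independent=["receipt of HAART (yes/no)"],
    dependent=["HIV viral load (copies/mL)"],
    potential_confounders=["baseline CD4 count"],  # illustrative only
)
print(row.missing_roles())  # → ['potential_effect_modifiers']
```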
13
Research question matrix (2)
14
Steps in preparing a study
15
Background
• Descriptive epidemiology
– Describing public health events, including those concerning infectious disease, with respect to person, place, and time
– Provides information in a format that can be readily interpreted
– Suggests studies that will eventually test hypotheses about the associations between exposures (e.g., organisms) and outcomes (e.g., infectious disease); this role is known as hypothesis generation
– One of the first, and still one of the most important, facets and methods of epidemiologic inquiry
16
Basic tenets of descriptive epidemiology (1)
• Descriptive epidemiology is the foundation
for all subsequent analytic methods
– No matter what analytic designs are later
selected, a solid description of the data at
hand is required and is always a first step
– Description must nevertheless be systematic: the absence of a hypothesis test does not make methodological rigor optional
17
Example: Case reports and HIV/AIDS (2)
• Noticing differences in person, place, and/or
time characteristics is often the first step towards
stopping disease; recognition is crucial
• The first case reports of Pneumocystis carinii pneumonia (PCP), published in 1981, ushered in the era of HIV/AIDS
– Without the keen observation skills of clinical providers who noticed unusual characteristics of the disease, it could have taken much longer to recognize that a problem existed
18
Example: Case reports and HIV/AIDS (3)
A snapshot of the original MMWR report of the first 5 documented cases of PCP among what later became known to be HIV+ men
19
Case reports and HIV/AIDS (4)
• Looking at the profile (the description) of the cases is what first suggested that there might be a problem
– PCP usually occurs among the elderly and/or the severely immunocompromised, and seldom among healthy, young individuals
– A cluster of a relatively rare type of pneumonia
– Behavioral clues
– Similar presentation vis-à-vis other comorbidities (e.g., CMV)
• This approach, looking at individual case reports, is one of the descriptive epidemiologic designs
20
Descriptive epidemiologic methods (1)
• Case reports are just one of the methods used in
descriptive epidemiology
• Descriptive methods share the same goals
– Describe and evaluate trends in health outcomes or
behaviors
– Provide data for planning needs
– Suggest future research needs
21
Descriptive epidemiologic methods (2)
• Comprehensive description of the person, place,
and time of the events is the hallmark of
descriptive epidemiology
• What are these exactly?
22
Descriptive epidemiologic methods (3)
• Person
– Characteristics of individuals affected by the disease
of interest
• Examples of person
– Who are they?
– Gender?
– Race? Ethnicity?
– Age?
– Religion?
– Educational attainment?
23
Descriptive epidemiologic methods (4)
– What unites the cases?
– What do they have in common? What do they have
that differs from non-cases or from each other?
– Health status, including specific comorbidities, places where they access care, treatments, etc.
– Community
– Socioeconomic status
– Behavior, both licit and illicit
– Many, many more
24
Descriptive epidemiologic methods (5)
• Place
– Characteristics of where the cases took place
• Examples of place
– Country, state, locality, zip code, census tract?
– One establishment? What type? School?
– Where were the events in relation to each other? Proximal? Isolated?
– Urban, suburban, rural?
– Where was the disease contracted vs. where was it diagnosed (e.g., after air travel)?
• Time
– Characteristics of when the cases took place
• Examples of time
– When did the events first occur? When did the symptoms first occur?
– What was happening around that time?
– Month, day, year?
– Day of week?
– Time of day?
– Daylight or darkness?
– Relation to other events, such as sewage release? Toxic emissions? Summer break? Holiday?
– Geospatial and temporal clustering
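Time is often the first dimension to be tabulated. This minimal Python sketch uses hypothetical symptom-onset dates. It counts cases per day, which gives a crude text epidemic curve; days with zero cases are kept so that gaps remain visible. It also counts cases per day of week. The function names are illustrative only.

```python
from collections import Counter
from datetime import date, timedelta

WEEKDAYS = ("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday")

def cases_by_day(onset_dates):
    """Count cases per symptom-onset date, keeping zero-case days in the sequence."""
    parsed = [date.fromisoformat(d) for d in onset_dates]
    counts = Counter(parsed)
    first, last = min(parsed), max(parsed)
    n_days = (last - first).days + 1
    return {first + timedelta(days=i): counts[first + timedelta(days=i)] for i in range(n_days)}

def cases_by_weekday(onset_dates):
    """Count cases by day of week, e.g., to look for weekday vs. weekend patterns."""
    return Counter(WEEKDAYS[date.fromisoformat(d).weekday()] for d in onset_dates)

# Hypothetical onset dates (YYYY-MM-DD) from a line list of cases
onsets = ["2007-07-02", "2007-07-02", "2007-07-03", "2007-07-03",
          "2007-07-03", "2007-07-04", "2007-07-06"]

# Crude text epidemic curve: one '#' per case on each day
for day, n in cases_by_day(onsets).items():
    print(day.isoformat(), "#" * n)
print(cases_by_weekday(onsets))
```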
26
Descriptive epidemiologic methods (7)
• Gathering data systematically is essential
– Haphazard person, place, and time data collection is
seldom fruitful
– Instruments (e.g., standardized questionnaires) are frequently used to direct the interviewing and data collection process, and they can be essential
27
Descriptive epidemiologic methods (8)
• Once data have been collected systematically,
they need to be described
• Statistical packages as well as qualitative
analysis software can be used to describe data
obtained
• Additional coursework will be necessary to do
this, but the concepts are straightforward
– We want to comprehensively describe the data in as
many ways as possible to truly understand what is
going on with respect to person, place, and time
28
Steps in describing data (1)
• Calculate the frequencies of categorical
variables
– This will inform you of how the sample is distributed
among different categories of independent variables
– Summary data of your outcomes are especially
important because they reveal the proportion of
missing data, which can impact your study
enormously
– See how data are communicated in reports and
journals you usually read
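The minimal Python sketch below builds a frequency table in which missing values stay visible instead of being silently dropped. The variable and its values are assumptions for illustration, as is the choice of `None` and the empty string as missing codes.

```python
from collections import Counter

def frequency_table(values, missing=(None, "")):
    """Return (category, count, percent) rows, counting missing values explicitly."""
    # Map every missing code to one visible label so missingness is counted, not dropped
    labeled = ["(missing)" if v in missing else v for v in values]
    counts = Counter(labeled)
    total = len(labeled)
    return [(category, n, 100 * n / total) for category, n in counts.most_common()]

# Hypothetical outcome variable: vomiting after the buffet (yes/no), with some blanks
vomiting = ["yes", "no", "yes", None, "no", "no", "", "yes", "no", "no"]
for category, n, pct in frequency_table(vomiting):
    print(f"{category:<10} {n:>3} {pct:5.1f}%")
```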
29
Steps in describing data (2)
• Calculate measures of central tendency (mean,
median, mode) and dispersion (standard
deviation or variance) for continuous variables
– How are variables distributed? Do they follow a
normal distribution (that is, like a bell curve)? Or are
they skewed left or right? Are the tails heavy or
skinny?
– This can be assessed visually to some degree, and
tested quantitatively as well
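Python's standard `statistics` module can compute the summary measures above. This sketch uses hypothetical ages. The skewness line uses the simple moment-based estimate (g1), which is only one of several definitions; here it serves as a rough numeric check alongside a visual one.

```python
import statistics

def describe(values):
    """Central tendency, dispersion, and a rough skewness estimate for a continuous variable.

    Assumes at least two distinct values (stdev and skewness are undefined otherwise).
    """
    n = len(values)
    mean = statistics.mean(values)
    # Moment-based sample skewness (g1): > 0 suggests a long right tail, < 0 a long left tail
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return {
        "n": n,
        "mean": mean,
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "sd": statistics.stdev(values),          # sample standard deviation (n - 1)
        "variance": statistics.variance(values), # sample variance (n - 1)
        "skewness": m3 / m2 ** 1.5,
    }

# Hypothetical ages (years)
ages = [16, 18, 18, 19, 20, 21, 21, 21, 23, 25, 29, 34, 41, 50]
for name, value in describe(ages).items():
    print(f"{name:>9}: {value:.2f}")
```

For these ages, the mean lies above the median and the skewness is positive. Both point to a right-skewed distribution rather than a bell curve.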
30
Steps in describing data (3)
• Plot the continuous data one variable at a time,
using box plots, stem and leaf plots, or other
graphic displays at your disposal
– This describes the data variable by variable
– In addition, it helps identify out-of-range values and missing values, and gives a general description of your continuous data
31
Steps in describing data (4)
Stem-and-leaf plot for age (age of index)
1f | 45555
1s | 6667777777
1. | 88888888888888888899999999999999
2* | 0000000000000000000000111111111111111
2t | 2222222222222222222222233333333
2f | 4444444444444444444444455555555555555
2s | 66666666666666667777777777
2. | 88888888899999999999
3* | 000000000000111111111111
3t | 22222222222222222333333333
3f | 4444444444455555555555555555555
3s | 66666666677777777777
3. | 88888888888888889999999
4* | 000001111111111111
4t | 222222233333333333333
4f | 44444444555555
4s | 666777
4. | 8888999
5* | 0000000011111
A histogram
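The stem-and-leaf display above uses five rows per stem. The suffix * holds leaves 0 to 1, t holds 2 to 3, f holds 4 to 5, s holds 6 to 7, and . holds 8 to 9. Below is a minimal Python sketch that builds the same style of display. It assumes non-negative whole numbers such as age in years; the ages are hypothetical.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return the rows of a stem-and-leaf plot with five rows per stem (*, t, f, s, .).

    Assumes non-negative whole numbers (e.g., age in years).
    """
    rows = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        # Each stem is split into five rows; leaf // 2 picks the row (0..4)
        rows[stem * 5 + leaf // 2].append(str(leaf))
    lines = []
    # Walk every row between the smallest and largest so empty rows still appear
    for key in range(min(rows), max(rows) + 1):
        stem, part = divmod(key, 5)
        lines.append(f"{stem}{'*tfs.'[part]} | {''.join(rows[key])}")
    return lines

# Hypothetical ages (years)
ages = [14, 15, 15, 16, 17, 18, 19, 20, 21, 22, 26, 31]
for line in stem_and_leaf(ages):
    print(line)
```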
32
Steps in describing data (5)
• Plot the data in a scatterplot, placing the dependent
(outcome) variable on the Y-axis (the vertical axis) and
the independent (potential predictor) variable on the X-
axis (the horizontal axis)
– What do the data look like?
– How do they relate to each other?
– Is there any discernible pattern or relationship between the independent variables and the dependent variable under study? Is there any discernible pattern among the independent variables?
– (Remember that we are still looking for “clues”: failing to see a pattern does not mean none exists, just as seeing one does not prove it is real!)
33
Steps in describing data (6)
• A scatterplot
34
Steps in describing data (7)
• Look at outliers, data points that stand out from the rest of the distribution
– For continuous variables, outliers can be quantified as data points that extend beyond a set level (e.g., two standard deviations above or below the mean)
– What are they? Get to know each of these outliers. Investigate them. Are they data entry errors? Documentation problems? Or are they true?
– There are a number of techniques available to diagnose outliers and treat them appropriately
– However, sometimes, if the data are correct, the outlier can be a substantive “clue” towards figuring out the problem at hand. Each might represent an acute case, a pronounced relationship, or something “different” that can be extremely useful
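The two-standard-deviation rule from this slide takes only a few lines of Python. The ages in this sketch are hypothetical and include one value (91) that might be a keying error for 19. Flagged points are candidates to investigate, not values to delete automatically.

```python
import statistics

def flag_outliers(values, k=2.0):
    """Return (index, value) pairs lying more than k sample SDs from the mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(i, x) for i, x in enumerate(values) if abs(x - mean) > k * sd]

# Hypothetical ages; record 6 may be a data entry error (19 keyed as 91?)
ages = [18, 21, 19, 23, 20, 22, 91, 24, 20, 19]
print(flag_outliers(ages))
```

An extreme value also inflates the standard deviation it is judged against. As a result, this rule can miss outliers in small samples. The quartile-based rule behind box plots is less affected by this.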
35
Steps in describing data (5a)
• A box-and-whisker plot (figure), annotated as follows:
– The box shows the 75th percentile (upper edge), the median (50th percentile), and the 25th percentile (lower edge)
– The whiskers show the upper and lower values
– The dots are out-of-range values, plotted to help you see outliers
Steps in describing data (8)
A box-and-whisker plot
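The numbers behind a box-and-whisker plot can be computed directly. This Python sketch assumes Tukey's common convention. Under that convention, dots mark points more than 1.5 times the interquartile range beyond the box, and whiskers end at the most extreme points inside that range. Plotting software may use other conventions. `statistics.quantiles` runs with its default "exclusive" method here, so its quartiles can differ slightly from other packages. The ages are hypothetical.

```python
import statistics

def box_plot_summary(values):
    """Quartiles, whisker ends, and outlying dots under Tukey's 1.5 x IQR rule."""
    q1, median, q3 = statistics.quantiles(values, n=4)  # default 'exclusive' method
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [x for x in values if low_fence <= x <= high_fence]
    return {
        "q1": q1, "median": median, "q3": q3,
        # Whiskers end at the most extreme observations still inside the fences
        "whisker_low": min(inside), "whisker_high": max(inside),
        # Points beyond the fences are drawn as individual dots
        "outliers": sorted(x for x in values if x < low_fence or x > high_fence),
    }

# Hypothetical ages
ages = [18, 21, 19, 23, 20, 22, 91, 24, 20, 19]
print(box_plot_summary(ages))
```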
36
Specific descriptive methods (1)
• There are several basic descriptive methods
– Case reports
– Case series
– Ecologic studies
– Outbreak investigations
– Descriptive data emerging from cross-sectional
studies (to be covered in a later lecture)
37
Specific descriptive methods (2)
• Case reports
– Identification of “unusual” cases that are shared with the medical
or public health communities
• Generally consist of one or more of these hallmark
characteristics
– A known disease in an unusual population
– Presentation of a previously unrecognized syndrome or disease
– Presentation of a disease more or less severe than previously
seen, or with a different characteristic than noted before
– Presentation of a disease that was transmitted in a mode not
generally seen or suspected
38
Specific descriptive methods (3)
• Case series
– A series of case reports put together in a systematic
(or semi-systematic) fashion
– May include some denominator data, though these will be historical in nature and may not have been systematically collected
39
Specific descriptive methods (4)
• Ecologic studies
– These differ from other descriptive (and analytic)
studies in that they utilize aggregate data
– These are data collected on the group level, such as
• Means (averages) per state
• Proportions per city
• And so forth
40
Specific descriptive methods (5)
• Ecologic studies have one major limitation
– The ecologic fallacy
• This arises because we do not know whether the association seen at the aggregate level holds at the individual level
• Still, an extremely important and useful study design
– Hypothesis generating
– Make possible comparisons between large groups (for example, continents) that would otherwise be impossible
– Can be done with minimal resources
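Here is a small worked example of the ecologic fallacy, written in Python with entirely hypothetical counts. At the group level, the city with more exposure also has more illness. Within each city, however, exposed individuals have a lower risk than unexposed ones, so the aggregate association does not hold for individuals.

```python
# Hypothetical counts for illustration only: (city, exposed?, number of people, number ill)
strata = [
    ("City A", True, 80, 24),
    ("City A", False, 20, 8),
    ("City B", True, 20, 2),
    ("City B", False, 80, 16),
]

def ecologic_view(strata):
    """Aggregate to one row per city: (proportion exposed, proportion ill)."""
    totals = {}
    for city, exposed, n, ill in strata:
        t = totals.setdefault(city, {"n": 0, "exposed": 0, "ill": 0})
        t["n"] += n
        t["ill"] += ill
        if exposed:
            t["exposed"] += n
    return {city: (t["exposed"] / t["n"], t["ill"] / t["n"]) for city, t in totals.items()}

def individual_view(strata):
    """Risk of illness for exposed vs. unexposed individuals within each city."""
    return {(city, exposed): ill / n for city, exposed, n, ill in strata}

for city, (p_exposed, p_ill) in ecologic_view(strata).items():
    print(f"{city}: {p_exposed:.0%} exposed, {p_ill:.0%} ill")
for (city, exposed), risk in individual_view(strata).items():
    print(f"{city}, {'exposed' if exposed else 'unexposed'}: risk {risk:.0%}")
```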
41
Descriptive methods (6)
• Descriptive studies are just the beginning
• Once foundational information is gained about the
disease under study, public health action is required
– That may include additional studies
• Observational: case-control or cohort studies
• Intervention, experimental designs
– Or working with the community of other scientists toward a greater understanding of the organism in question
– Or working with the public health community to prevent future
spread of the disease—engaging in preventive measures