Some Information on Case Control, Cohort and Survival Studies
By
RAMNATH TAKIAR
Ex-scientist G
National Cancer Registry Programme
(National Centre for Disease Informatics and Research, Bangalore)
Bangalore
NOVEMBER 2014
A cohort study is a longitudinal, observational
study design used in medicine, the social sciences
and ecology.
A cohort is a group of people who share a common
characteristic or an experience within a defined
period (e.g., are born, are exposed to a drug or a
vaccine, etc.). Thus a group of people who were born
in a particular year (period), say 1948 (1945-50),
form a birth cohort.
Cohort studies
In a cohort study, the population selected is usually free from a certain disease or health condition. Information is obtained to determine which subjects either have a particular characteristic (e.g., blood group A) that is suspected of being related to the development of the disease under investigation or have been exposed to a possible etiological agent (e.g., cigarette smoking, alcohol drinking). The entire study population is then followed up for a specified period of time, and the incidence of the disease in the exposed individuals is compared with the incidence in those not exposed.
Healthy Population
  Exposed (P1):      Diseased (N1)    Non-diseased (P1-N1)
  Non-Exposed (P2):  Diseased (N2)    Non-diseased (P2-N2)
R1 = (N1/P1)    R2 = (N2/P2)
IF R1 > R2 => Exposure increases the disease risk
IF R1 < R2 => Exposure decreases the disease risk
IF R1 = R2 => Exposure does not change the risk of occurrence of disease
Cohort Study :
Outcome    Exposure
           Yes     No
Yes        a       b
No         c       d
Total      a+c     b+d
Risk in exposed group (P1) = a/(a+c)
Risk in unexposed group (P2) = b/(b+d)
Risk ratio = P1/P2
Risk difference = P1-P2
Outcome (Cancer)   Exposure (Smoking)
                   Yes (Ever)   No (Never)   Total
Yes                392          323          715
No                 16953        13114        30067
Total              17345        13437        30782
Risk in exposed group (P1) = 392/17345 = 0.0226 = 2.3%
Risk in unexposed group (P2) = 323/13437 = 0.0240 = 2.4%
Risk ratio = 0.0226/0.0240 = 0.94 ; NS (not significant)
Risk difference = 0.0226 - 0.0240 = -0.0014 = -0.1%
Source: Karunagappally Cohort study (1990-97), published in Health Physics Society 2009
Study population: 359619; 10 year duration
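The risk, risk ratio and risk difference formulas above can be checked with a few lines of Python. This is a minimal sketch using the smoking and cancer counts from the table; the function name `cohort_risks` is an illustrative assumption, not part of the original study. Because it works from unrounded risks, the ratio may differ in the third decimal from a value derived from rounded figures.

```python
def cohort_risks(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Return the risk in each group, the risk ratio and the risk difference."""
    p1 = exposed_cases / exposed_total      # risk in exposed group, a/(a+c)
    p2 = unexposed_cases / unexposed_total  # risk in unexposed group, b/(b+d)
    return {
        "p1": p1,
        "p2": p2,
        "risk_ratio": p1 / p2,
        "risk_difference": p1 - p2,
    }


# Counts from the smoking and cancer table: 392 of 17345 ever-smokers and
# 323 of 13437 never-smokers developed cancer.
result = cohort_risks(392, 17345, 323, 13437)
print(f"P1 = {result['p1']:.4f}, P2 = {result['p2']:.4f}")
print(f"Risk ratio = {result['risk_ratio']:.3f}")
print(f"Risk difference = {result['risk_difference']:.4f}")
```

A risk ratio below 1 together with a negative risk difference means the observed risk was slightly lower in the exposed group in this table.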
Outcome (Lung Cancer)   Exposure (Smoking)
                        Ever     Never    Total
Yes                     12       14       26
No                      11560    19031    30591
Total                   11572    19045    30617
Risk in exposed group (P1) = 12/11572 = 0.0010 = 0.10% (103)*
Risk in unexposed group (P2) = 14/19045 = 0.0007 = 0.07% (73)*
Risk ratio = 0.0010/0.0007 = 1.41 ; NS (not significant)
Risk difference = 0.0010 - 0.0007 = 0.0003 = 0.03%
Source: Karunagappally Cohort study (1990-97), published in Int J Cancer 2008
Study population: 359619; 8 year duration (1997-2004)
* Represents crude rate per 100,000
If study subjects have unequal follow-up periods,
this must be taken into account in the analysis.
Follow-up durations may differ markedly if subjects
were recruited into the study population over a
relatively long period of time, or if some are lost to
follow-up during the course of the study. One way
of handling variable follow-up periods is to
calculate rates which use person years at risk as the
denominator.
Calculation of Person years for Incidence rate
A hypothetical group of 5 persons with a certain risk factor was followed up for 7 years (2001-2007), two of whom developed the disease of interest (marked +).
SUBJECT   Developed disease   Time at risk (years)
A                             2
B         +                   4
C         +                   4
D                             5
E                             5
Total years at risk           20
Incidence = 2 persons per 20 person years
          = 10 per 100 person years of observation
Here, we are dealing with the time for which the population is exposed.
Person years:
It is the cumulative sum of the time periods for which the
population is exposed to the risk of developing a certain disease
condition during the specified period of time.
In the above example, you can notice three things:
1. Registration of cases is not in the same year.
2. Follow-up periods differ from one subject to another.
3. Loss to follow-up exists in the study.
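The person-years example for subjects A to E can be reproduced as a short Python sketch. It assumes that the "+" marks in the table identify the two subjects (B and C) who developed the disease; the variable names are illustrative.

```python
# Time at risk (years) and disease status for the five hypothetical subjects.
# Assumption: "+" in the table marks a subject who developed the disease.
follow_up = {
    "A": (2, False),
    "B": (4, True),
    "C": (4, True),
    "D": (5, False),
    "E": (5, False),
}

# Person years are the sum of each subject's time at risk.
person_years = sum(years for years, _ in follow_up.values())
cases = sum(1 for _, diseased in follow_up.values() if diseased)

# Incidence rate expressed per 100 person years of observation.
rate_per_100 = cases / person_years * 100
print(cases, person_years, rate_per_100)
```

Using person years as the denominator lets subjects with different entry dates and follow-up lengths contribute exactly the time they were observed.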
Outcome                Exposure
                       Yes     No
Cases                  a       b
Person-time at risk    x       y
Rate in exposed group (r1) = a/x
Rate in unexposed group (r2) = b/y
Rate ratio = r1/r2
Rate difference = r1-r2
Example: Incidence of breast cancer among nurses aged 45-49 years at the time of their entry into the cohort was examined in relation to use of oral contraceptives.
                        Oral contraceptive use
                        Ever      Never     Total
Cases                   204       240       444
Person years at risk    94029     128528    222557
Rate per 100000 pyrs    217       187       199
Rate ratio = Relative risk = 217/187 = 1.16
95% confidence interval for the rate ratio = 0.96-1.40
Rate difference = 217-187 = 30 per 100,000 person years
Source: IARC 1999: Cancer Epidemiology: Principles and Methods; Page 179
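As a rough check of the oral contraceptive example, the rates, rate ratio and rate difference can be computed directly from the cases and person years. This is a sketch only; the helper `rate_per` and its `scale` parameter are illustrative names, not taken from the source.

```python
def rate_per(cases, person_years, scale=100_000):
    """Incidence rate per `scale` person years."""
    return cases / person_years * scale


# Breast cancer cases and person years at risk from the table.
r_ever = rate_per(204, 94029)    # ever users of oral contraceptives
r_never = rate_per(240, 128528)  # never users

rate_ratio = r_ever / r_never
rate_difference = r_ever - r_never

print(f"Rate (ever)  = {r_ever:.0f} per 100,000 person years")
print(f"Rate (never) = {r_never:.0f} per 100,000 person years")
print(f"Rate ratio = {rate_ratio:.2f}, rate difference = {rate_difference:.0f}")
```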
The Nurses' Health Study, established in 1976 by Dr. Frank Speizer, and the Nurses' Health Study II, established in 1989 by Dr. Walter Willett, are the most definitive long-term epidemiological studies conducted to date on women's health. The US study has followed 121,700 female registered nurses, initially aged 30 to 55, since 1976 and 116,000 female nurses since 1989 to assess risk factors for cancer and cardiovascular disease. The studies are among the largest investigations into risk factors for major chronic diseases in women ever conducted.
Over time, additional questions have been added, most notably the dietary assessment added in 1980. Deaths, usually reported by kin or by postal authorities, were followed up.
In a cohort study carried out among 78140 women aged 30-84 years in Karunagappally, Kerala, an attempt was made to examine the relationship between chewing habits and development of oral cancers.
Baseline information on lifestyle, including tobacco chewing, and on socio-economic factors was collected during the period 1990-97.
By the end of 2005, 92 oral cancers were identified.
Total person years covered were 921051 years.
Tobacco Chewing and Oral Cancer among women
Outcome                  Chewing habit
                         Current     Former      Never
Oral cancer              53          14          25
Person years at risk     183749      26804       706872
Incidence per 100000     28.8        52.2        3.5
Relative risk            8.2         14.8        1
95% CI                   5.1-13.1    7.7-28.4    -
Source: PA Jayalekshmi, P Gangadharan, S Akiba et al.: Tobacco Chewing and Female Oral Cavity cancer risk in Karunagappally cohort, India; British J of Cancer (2009), Table 3.
Calculation of Confidence Limit for Relative Risk
(current chewers vs. never chewers; LL = lower limit, UL = upper limit)
RR = 8.2 ; ln(RR) = 2.09869
SE of ln(RR) = [(1/25)+(1/53)]^0.5 = 0.243
For ln(RR): LL = 2.09869 - (1.96*0.243) = 1.62
            UL = 2.09869 + (1.96*0.243) = 2.57
Anti log:   LL = 5.07 ; UL = 13.12
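The confidence limit calculation for the relative risk can be sketched in Python as follows. It assumes the same log-based approximation used in the worked figures, with the standard error of ln(RR) taken as the square root of the sum of the reciprocals of the two case counts; the function name `rate_ratio_ci` is hypothetical.

```python
import math


def rate_ratio_ci(cases_exp, py_exp, cases_ref, py_ref, z=1.96):
    """Rate ratio with an approximate 95% CI computed on the log scale."""
    rr = (cases_exp / py_exp) / (cases_ref / py_ref)
    se_log_rr = math.sqrt(1 / cases_exp + 1 / cases_ref)
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# Current chewers vs. never chewers (reference category).
rr_cur, ll_cur, ul_cur = rate_ratio_ci(53, 183749, 25, 706872)
# Former chewers vs. never chewers.
rr_for, ll_for, ul_for = rate_ratio_ci(14, 26804, 25, 706872)

print(f"Current: RR = {rr_cur:.1f} ({ll_cur:.1f}-{ul_cur:.1f})")
print(f"Former:  RR = {rr_for:.1f} ({ll_for:.1f}-{ul_for:.1f})")
```

Working on the log scale keeps the interval asymmetric around the ratio, which is why the limits are obtained by taking the anti-log at the end.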
Major steps followed in a Cohort Study:
1. Definition of the objectives
2. Choice of the study population
3. Choice of the comparison group
4. Measurement of exposure(s)
5. Measurement of outcome(s)
6. Follow-up of the subjects
7. Follow up periods
8. Analysis
1. Definition of the objectives: It is essential that a clear
hypothesis is formulated before the start of a
cohort study. This should include a clear definition
of the exposure(s) and outcome(s) of interest.
2. Choice of the study population: The choice of the
study population mainly depends on the specific
hypothesis under investigation. The cohort chosen
may be a general population group, such as the
residents of a community, or a more narrowly defined
population that can be readily identified and
followed up.
3. Choice of the comparison group: The selection of
the unexposed group is the most critical aspect in the design
of a cohort study. The unexposed group should be as
similar as possible to the exposed group with respect
to the distribution of all factors that may be related to
the outcome(s) of interest, except the exposure under
investigation.
Two main types of comparison group may be used in a
cohort study: internal and external.
General population cohorts tend to be heterogeneous
with respect to many exposures and hence their
members can be classified into different exposure
categories.
In such circumstances, an internal comparison group
can be utilized. That is, the experience of those
members of the cohort who are either unexposed or
exposed to low levels can be used as the comparison
group.
4. Measurement of exposure: Measurement of the
exposure(s) of interest is a critical aspect in the design
of a cohort study. Information should be obtained on
age at first exposure, dates at which exposure started
and stopped, dose and pattern of exposure, and
changes over time.
Information on the exposure(s) of interest may be
obtained from a number of sources, such as:
i) Information provided by subjects through personal
interviews or questionnaires;
ii) Data obtained by medical examination or other
testing of the participants;
iii) Biological specimens; or
iv) Direct measurements of the environment in which
cohort members have lived or worked.
5. Measurement of outcome(s):
A major advantage of cohort studies is that it is
possible to examine the effect of a particular exposure
on multiple outcomes (e.g., malnutrition vs. height, weight and menarche) or
the effect of multiple exposures on a single outcome (e.g., age,
parity and number of sexual partners, and their effect on the occurrence of cervical cancer).
Many cohort studies make use of existing routine
surveillance systems to ascertain the outcomes of
interest. Such systems include cancer registries and
death certification.
Case-control studies
Case-control studies are particularly suitable for the
study of relatively rare diseases with a long induction
period, such as cancer. This is because a case-control
study starts with subjects who have already
developed the condition of interest, so that there is
no need to wait for time to elapse between exposure
and the occurrence of disease.
In an unmatched study, the numbers of cases
and controls found to have been exposed and not
exposed to the factor under investigation can be
arranged in a 2x2 table as follows.
Subjects    Exposed   Unexposed   Total
Cases       a         b           a+b
Controls    c         d           c+d
Total       a+c       b+d         N
Odds of exposure in the cases = a/b
Odds of exposure in the controls = c/d
Odds ratio = Odds of exposure in the cases / Odds of exposure in the controls
           = (a/b) / (c/d) = ad/bc
In a case-control study it is not possible to estimate the disease incidence in the exposed and unexposed groups. However, it is possible to calculate the odds of exposure in the cases and in the controls.
Case-control study:
Study Population (selection of cases and controls)
  Cases (N1):     Exposed (P1)    Non-exposed (N1-P1)
  Controls (N2):  Exposed (P2)    Non-exposed (N2-P2)
Odds in Cases = O1 = P1/(N1-P1)    Odds in Controls = O2 = P2/(N2-P2)
IF O1 > O2 => Exposure increases the disease prevalence
IF O1 < O2 => Exposure decreases the disease prevalence
IF O1 = O2 => Exposure does not change the risk of occurrence of disease
A population-based case-control study was carried out in Spain and Colombia to assess the relationship between cervical cancer and exposure to human papilloma virus (HPV), selected aspects of sexual and reproductive behaviour, use of oral contraceptives, screening practices, smoking, and possible interactions between them. The study included 436 incident cases of histologically confirmed invasive squamous-cell carcinoma of the cervix and 387 controls of similar age randomly selected from the general population that generated the cases (Munoz et al., 1992).
In a cervical cancer case-control study conducted in Colombia and Spain, the risk of developing cervical cancer was examined in relation to the lifetime number of sexual partners. (Based on pooled data of Colombia and Spain)
Outcome                      Number of Sexual partners
                             0-1      2-5          6+
Cervical cancer cases (a)    265      125          46
Controls (b)                 305      74           8
Odds (a/b)                   0.87     1.69         5.75
Odds ratio*                  1        1.94**       6.62**
95% CI                       -        1.39-2.70    3.07-14.28
* Keeping the 0-1 category as reference
** Significantly different from the reference category
Source: IARC 1999: Cancer Epidemiology: Principles and Methods: Page 207
Calculation of OR by classical method
Outcome                  Number of Sexual partners
                         2-5        0-1
Cervical cancer cases    125 (a)    265 (b)
Controls                 74 (c)     305 (d)
Odds in cervical cancer cases = a/b = 125/265
Odds in controls = c/d = 74/305
OR = (a/b)/(c/d) = ad/bc = 125*305/(74*265) = 1.944
Calculation of 95% Confidence Interval:
The formula for calculation of the SE of ln(OR) is given by
SE = sqrt (1/a + 1/b + 1/c + 1/d)
   = sqrt { (1/125) + (1/265) + (1/74) + (1/305) }
   = (0.02856)^0.5 = 0.1690
LL = ln(1.944) - 1.96*0.1690
   = 0.6647 - 0.3312 = 0.3335 ; exp(0.3335) = 1.396
UL = ln(1.944) + 1.96*0.1690
   = 0.6647 + 0.3312 = 0.9960 ; exp(0.9960) = 2.707
So, the 95% Confidence Interval = 1.396 - 2.707
Calculation of OR by classical method
Outcome                  Number of Sexual partners
                         6+        0-1
Cervical cancer cases    46 (a)    265 (b)
Controls                 8 (c)     305 (d)
Odds in cervical cancer cases = a/b = 46/265
Odds in controls = c/d = 8/305
OR = (a/b)/(c/d) = ad/bc = 46*305/(265*8) = 6.618
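The classical odds ratio calculation for the sexual-partner categories, together with a log-based 95% confidence interval using SE = sqrt(1/a + 1/b + 1/c + 1/d), can be sketched in Python. The function name `odds_ratio_ci` is an illustrative assumption; the counts are those of the pooled Colombia and Spain data, with the 0-1 category as reference.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio ad/bc with an approximate 95% CI on the log scale.

    a, b: exposed and reference-category cases
    c, d: exposed and reference-category controls
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# 2-5 partners vs. 0-1 partners.
or_2_5, ll_2_5, ul_2_5 = odds_ratio_ci(125, 265, 74, 305)
# 6+ partners vs. 0-1 partners.
or_6, ll_6, ul_6 = odds_ratio_ci(46, 265, 8, 305)

print(f"2-5 partners: OR = {or_2_5:.3f} ({ll_2_5:.2f}-{ul_2_5:.2f})")
print(f"6+ partners:  OR = {or_6:.3f} ({ll_6:.2f}-{ul_6:.2f})")
```

The interval for the 6+ category is much wider because only 8 controls fall in that category, so 1/c dominates the standard error.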
Nested Case Control Study:
In a traditional cohort study all study individuals are subjected to the same procedures (interviews, health examinations, laboratory measurements, etc.) at the time of their entry into the study and throughout the follow-up period. Alternatively, a cohort may be identified and followed up until a sufficient number of cases are obtained. More detailed information is then collected and analysed, but only for the cases and for a sample of the disease-free individuals (controls), not for all members of the cohort. This type of case-control study conducted within a fixed cohort is called a nested case-control study.
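The selection step of a nested case-control study can be illustrated with a small Python sketch. It makes several assumptions that are not in the text: the cohort is a made-up list of 200 subjects, controls are drawn by simple random sampling from the disease-free members (real studies often match controls on age or time of diagnosis), and two controls are taken per case. The function name `nested_case_control_sample` is hypothetical.

```python
import random


def nested_case_control_sample(cohort, controls_per_case=2, seed=0):
    """Select all cases and a random sample of disease-free controls.

    cohort: list of (subject_id, is_case) tuples.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    cases = [sid for sid, is_case in cohort if is_case]
    disease_free = [sid for sid, is_case in cohort if not is_case]
    n_controls = min(len(disease_free), controls_per_case * len(cases))
    controls = rng.sample(disease_free, n_controls)
    return cases, controls


# Hypothetical cohort of 200 subjects in which every 25th subject is a case.
cohort = [(i, i % 25 == 0) for i in range(1, 201)]
cases, controls = nested_case_control_sample(cohort)
print(len(cases), "cases,", len(controls), "controls")
```

Detailed information would then be collected only for the subjects in `cases` and `controls`, rather than for the whole cohort.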
Survival Rate :
It indicates the percentage of people in a study or treatment group who are alive for a given period of time after diagnosis. Survival rates are important for prognosis; for example, whether a type of cancer has a good or bad prognosis can be determined from its survival rate.
Patients with a certain disease can die directly from that disease or from an unrelated cause such as a car accident or poisoning. When deaths are counted without specifying the precise cause of death, the measure is called the overall survival rate or observed survival rate.
Survival rate is often expressed over standard time periods like one, three and five years. For example, prostate cancer has a much higher one year overall survival rate than pancreatic cancer and thus has a better prognosis.
Relative Survival :
It is calculated by dividing the overall survival after diagnosis of a disease by the survival rate observed in a similar population that was not diagnosed with the disease. A similar population is composed of individuals who are similar, at least in age and gender, to those diagnosed with the disease.
Calculation of Relative survival - Mumbai
                                       Total    1 year   3 years   5 years
Number of breast cancer cases alive    7294     5682     4128      3355
% Absolute survival                    100      77.9     56.6      46.0
% General survival                     100      97.7     93.7      89.5
% Relative survival                    100      79.7     60.4      51.4
*Source: R. Sankaranarayanan, R. Swaminathan; Cancer Survival in Africa, Asia, the Caribbean and Central America; IARC Scientific Publications No. 162 (2011)
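The Mumbai breast cancer figures can be reproduced with a short Python sketch. It assumes that the "general survival" row is the expected survival (%) of a comparable population without the disease, as described in the definition of relative survival; the variable names are illustrative.

```python
total_cases = 7294

# Number of breast cancer cases still alive at 1, 3 and 5 years.
alive = {1: 5682, 3: 4128, 5: 3355}
# Assumed expected (general population) survival in percent at the same times.
general_survival = {1: 97.7, 3: 93.7, 5: 89.5}

absolute = {}
relative = {}
for years, n_alive in alive.items():
    # Absolute (observed) survival as a percentage of all cases.
    absolute[years] = n_alive / total_cases * 100
    # Relative survival = observed survival / expected survival.
    relative[years] = absolute[years] / general_survival[years] * 100
    print(f"{years} year(s): absolute {absolute[years]:.1f}%, "
          f"relative {relative[years]:.1f}%")
```

Relative survival is always at least as high as absolute survival here because some deaths among the patients would have been expected even without cancer.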
1992-94 -- > 1999; 1995-99 --> 2003
One year Absolute and Relative Survival Rates (%) for selected cancer sites
                            Absolute Survival                   Relative Survival
Site          ICD10 code    Chennai  Karunagappally  Mumbai     Chennai  Karunagappally  Mumbai
Tongue        C01-02        51.6     62.6            56.5       53.3     65.1            58.3
Oral cavity   C03-06        60.9     65.6            60.5       62.9     68.3            62.2
Oesophagus    C15           32.1     27.0            36.8       33.2     28.3            38.3
Stomach       C16           34.5     22.1            33.6       35.7     23.1            34.8
Larynx        C32           65.6     69.6            59.9       68.0     72.5            62.3
Lung          C33-34        31.9     22.2            28.4       33.0     23.1            29.6
Breast        C50           79.2     85.8            77.9       81.0     87.2            79.7
Cervix        C53           77.0     82.9            75.2       78.4     85.7            76.6
Ovary         C56           60.3     62.9            49.7       61.4     64.1            50.7
Prostate      C61           50.2     93.5            64.3       51.3     101.2           77.9
Source: Cancer Survival in Africa, Asia, the Caribbean and Central America - IARC Scientific
Publications No. 162; Edited by R. Sankaranarayanan and R. Swaminathan - 2011
Five years Absolute and Relative Survival Rates (%) for selected cancer sites
                            Absolute Survival                   Relative Survival
Site          ICD10 code    Chennai  Karunagappally  Mumbai     Chennai  Karunagappally  Mumbai
Tongue        C01-02        19.4     25.9            25.3       23.0     31.9            29.3
Oral cavity   C03-06        30.5     33.1            32.3       35.7     41.2            37.0
Oesophagus    C15           6.9      2.9             13.0       8.3      3.5             15.4
Stomach       C16           8.6      2.6             12.8       10.1     3.3             14.8
Larynx        C32           30.7     29.6            28.6       36.8     35.1            34.6
Lung          C33-34        6.5      5.3             10.9       7.6      6.4             13.2
Breast        C50           43.7     46.8            46.0       48.6     51.2            51.4
Cervix        C53           54.0     46.7            42.2       59.4     56.3            46.1
Ovary         C56           27.4     26.0            22.8       29.7     28.1            24.6
Prostate      C61           -        22.1            24.0       -        34.6            35.9
Source: Cancer Survival in Africa, Asia, the Caribbean and Central America - IARC Scientific
Publications No. 162; Edited by R. Sankaranarayanan and R. Swaminathan - 2011