ORIGINAL INVESTIGATION

Strengthening the reporting of genetic association studies (STREGA): an extension of the STROBE Statement

Julian Little · Julian P. T. Higgins · John P. A. Ioannidis · David Moher · France Gagnon · Erik von Elm · Muin J. Khoury · Barbara Cohen · George Davey-Smith · Jeremy Grimshaw · Paul Scheet · Marta Gwinn · Robin E. Williamson · Guang Yong Zou · Kim Hutchings · Candice Y. Johnson · Valerie Tait · Miriam Wiens · Jean Golding · Cornelia van Duijn · John McLaughlin · Andrew Paterson · George Wells · Isabel Fortier · Matthew Freedman · Maja Zecevic · Richard King · Claire Infante-Rivard · Alex Stewart · Nick Birkett

Received: 20 March 2008 / Accepted: 9 November 2008 / Published online: 1 February 2009
© The Author(s) 2009. This article is published with open access at Springerlink.com

Abstract Making sense of rapidly evolving evidence on genetic associations is crucial to making genuine advances in human genomics and the eventual integration of this information in the practice of medicine and public health. Assessment of the strengths and weaknesses of this evidence, and hence the ability to synthesize it, has been limited by inadequate reporting of results. The STrengthening the REporting of Genetic Association studies (STREGA) initiative builds on the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) Statement and provides additions to 12 of the 22 items on the STROBE checklist. The additions concern population stratification, genotyping errors, modeling haplotype variation, Hardy–Weinberg equilibrium, replication, selection of participants, rationale for choice of genes and variants, treatment effects in studying quantitative traits,

In order to encourage dissemination of the STREGA Statement, this article has also been published by Annals of Internal Medicine, European Journal of Clinical Investigation, European Journal of Epidemiology, Genetic Epidemiology, Journal of Clinical Epidemiology, and PLoS Medicine. The authors jointly hold the copyright of this article.

J. Little
Canada Research Chair in Human Genome Epidemiology, Ottawa, Canada

J. Little (corresponding author) · D. Moher · K. Hutchings · C. Y. Johnson · V. Tait · M. Wiens · N. Birkett
Department of Epidemiology and Community Medicine, University of Ottawa, 451 Smyth Rd., Ottawa, ON K1H 8M5, Canada

J. P. T. Higgins
MRC Biostatistics Unit, Cambridge, UK

J. P. A. Ioannidis
Department of Hygiene and Epidemiology, School of Medicine, University of Ioannina, Ioannina 45110, Greece

J. P. A. Ioannidis
Center for Genetic Epidemiology and Modeling, Tufts University School of Medicine, Boston, MA 02111, USA

F. Gagnon
CIHR New Investigator and Canada Research Chair in Genetic Epidemiology, University of Toronto, Dalla Lana School of Public Health, 155 College Street, Toronto, ON M5T 3M7, Canada

E. von Elm
Institute of Social and Preventive Medicine, University of Bern, Finkenhubelweg 11, 3012 Bern, Switzerland

E. von Elm
Department of Medical Biometry and Medical Informatics, German Cochrane Centre, University Medical Centre, Freiburg, Germany

M. J. Khoury · M. Gwinn
National Office of Public Health Genomics, Centers for Disease Control and Prevention, Atlanta, USA

B. Cohen
Public Library of Science, San Francisco, CA, USA

G. Davey-Smith
Department of Social Medicine, MRC Centre for Causal Analyses in Translational Epidemiology, University of Bristol, Bristol, UK

J. Grimshaw
Canada Research Chair in Health Knowledge Transfer and Uptake, Clinical Epidemiology Program, Department of Medicine, Ottawa Health Research Institute, University of Ottawa, Ottawa, Canada

Hum Genet (2009) 125:131–151
DOI 10.1007/s00439-008-0592-7
common single-nucleotide polymorphisms (SNPs) (for example, copy number variants, rare variants), and eventually routine full sequencing of samples from large populations. Our recommendations are not intended to support or oppose the choice of any particular study design or method. Instead, they are intended to maximize the transparency, quality and completeness of reporting of what was done and found in a particular study.
Methods
A multidisciplinary group developed the STREGA Statement using literature review, workshop presentations and discussion, and iterative electronic correspondence after the workshop. Thirty-three of 74 invitees participated in the STREGA workshop in Ottawa, Ontario, Canada, in June, 2006. Participants included epidemiologists, geneticists, statisticians, journal editors, and graduate students.
Before the workshop, an electronic search was performed to identify existing reporting guidance for genetic association studies. Workshop participants were also asked to identify any additional guidance. They prepared brief presentations on existing reporting guidelines, empirical evidence on reporting of genetic association studies, the development of the STROBE Statement, and several key areas for discussion that were identified on the basis of consultations before the workshop. These areas included the selection and participation of study participants, rationale for choice of genes and variants investigated, genotyping errors, methods for inferring haplotypes, population stratification, assessment of Hardy–Weinberg equilibrium (HWE), multiple testing, reporting of quantitative (continuous) outcomes, selectively reporting study results, joint effects and inference of causation in single studies. Additional resources to inform workshop participants were the HuGENet handbook (Little and Higgins 2006; Higgins et al. 2007), examples of data extraction forms from systematic reviews or meta-analyses, articles on guideline development (Altman et al. 2001; Moher et al. 2001) and the checklists developed for STROBE. To harmonize our recommendations for genetic association studies with those for observational epidemiologic studies, we communicated with the STROBE group during the development process and sought their comments on the STREGA draft documents. We also provided comments on the developing STROBE Statement and its associated explanation and elaboration document (Vandenbroucke et al. 2007).
Results
In Table 1, we present the STREGA recommendations, an extension to the STROBE checklist (von Elm et al. 2007) for genetic association studies. The resulting STREGA checklist provides additions to 12 of the 22 items on the STROBE checklist. During the workshop and subsequent consultations, we identified five main areas of special interest that are specific to, or especially relevant in, genetic association studies: genotyping errors, population stratification, modeling haplotype variation, HWE, and replication. We elaborate on each of these areas, starting each section with the corresponding STREGA recommendation, followed by a brief outline of the issue and an explanation for the recommendations. Table 2 presents complementary information on these areas, together with the rationale for additional STREGA recommendations relating to selection of participants, choice of genes and variants, treatment effects in studying quantitative traits, statistical methods, relatedness, reporting of descriptive and outcome data, and issues of data volume.
Genotyping errors
Recommendation for reporting of methods (Table 1, item 8(b)): Describe laboratory methods, including source and storage of DNA, genotyping methods and platforms (including the allele calling algorithm used, and its version), error rates, and call rates. State the laboratory/center where genotyping was done. Describe comparability of laboratory methods if there is more than one group. Specify whether genotypes were assigned using all of the data from the study simultaneously or in smaller batches.
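Item 8(b) asks authors to report error rates and call rates. As a rough sketch of how these two figures might be tallied, the Python below assumes each genotype is coded as the number of copies of the minor allele (0, 1 or 2), with None marking a failed call. It computes a per-SNP call rate and the discordance rate between blind duplicate pairs, one common way of estimating genotyping error (compare the blind duplicates described by Mitchell et al. 2003). The coding scheme and the function names are illustrative assumptions, not part of STREGA.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical coding: 0, 1 or 2 copies of the minor allele; None = no call.
Genotype = Optional[int]


def call_rate(calls: Sequence[Genotype]) -> float:
    """Fraction of attempted genotypes that produced a call."""
    if not calls:
        raise ValueError("no genotypes attempted")
    return sum(g is not None for g in calls) / len(calls)


def duplicate_discordance(pairs: Sequence[Tuple[Genotype, Genotype]]) -> float:
    """Fraction of blind duplicate pairs whose two calls disagree.

    Only pairs in which both members were called are counted, so failed
    calls affect the call rate but not the discordance rate.
    """
    called = [(a, b) for a, b in pairs if a is not None and b is not None]
    if not called:
        raise ValueError("no duplicate pair was called twice")
    return sum(a != b for a, b in called) / len(called)


if __name__ == "__main__":
    snp_calls = [0, 1, 2, None, 1, 1, 0, 2, None, 1]
    print(f"call rate: {call_rate(snp_calls):.2f}")
    dup_pairs = [(0, 0), (1, 1), (2, 1), (1, None), (0, 0)]
    print(f"duplicate discordance: {duplicate_discordance(dup_pairs):.2f}")
```

Here 8 of the 10 attempted calls succeed (call rate 0.80) and 1 of the 4 fully called duplicate pairs disagrees (discordance 0.25). Because either member of a pair can be wrong, the discordance rate is roughly twice the per-genotype error rate when errors are rare and independent, so whichever convention is used is worth stating.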
Recommendation for reporting of results (Table 1, item 13(a)): Report numbers of individuals in whom genotyping was attempted and numbers of individuals in whom genotyping was successful.
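The counts requested by item 13(a) are most informative when tabulated separately for cases and controls, as the footnote to Table 1 asks. The sketch below is a minimal illustration under stated assumptions: the Sample record is hypothetical, and "successful" genotyping is defined here by a per-sample call-rate threshold of 0.95, an illustrative choice rather than a STREGA requirement. A study should report whatever criterion it actually applied.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass
class Sample:
    """Hypothetical per-sample genotyping summary."""
    sample_id: str
    group: str       # e.g. "case" or "control"
    attempted: int   # number of assays attempted for this sample
    called: int      # number of those assays that returned a genotype


def genotyping_counts(samples: Iterable[Sample],
                      min_call_rate: float = 0.95) -> Dict[str, Tuple[int, int]]:
    """Return {group: (n_attempted, n_successful)}.

    A sample counts as successfully genotyped when its own call rate
    reaches min_call_rate; samples with no attempted assays never do.
    """
    attempted: Counter = Counter()
    successful: Counter = Counter()
    for s in samples:
        attempted[s.group] += 1
        if s.attempted and s.called / s.attempted >= min_call_rate:
            successful[s.group] += 1
    return {g: (attempted[g], successful[g]) for g in attempted}


if __name__ == "__main__":
    samples = [
        Sample("S1", "case", 1000, 990),
        Sample("S2", "case", 1000, 900),
        Sample("S3", "control", 1000, 998),
        Sample("S4", "control", 1000, 960),
    ]
    for group, (n_att, n_ok) in sorted(genotyping_counts(samples).items()):
        print(f"{group}: attempted {n_att}, successful {n_ok}")
```

In this toy example genotyping succeeds in 1 of 2 cases but 2 of 2 controls. A marked difference of this kind is worth reporting and discussing, because differential call rates between the groups being compared can themselves introduce bias.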
Genotyping errors can occur as a result of effects of the DNA sequence flanking the marker of interest, poor quality or quantity of the DNA extracted from biological samples, biochemical artefacts, poor equipment precision or equipment failure, or human error in sample handling, conduct of the array or handling the data obtained from the array (Pompanon et al. 2005). A commentary published in 2005 on the possible causes and consequences of genotyping errors observed that an increasing number of researchers were aware of the problem, but that the effects of such errors had largely been neglected (Pompanon et al. 2005). The magnitude of genotyping errors has been reported to vary between 0.5 and 30% (Pompanon et al. 2005; Akey et al. 2001; Dequeker et al. 2001; Mitchell et al. 2003). In high-throughput centers, an error rate of 0.5% per genotype has been observed for blind duplicates that were run on the same gel (Mitchell et al. 2003). This lower error rate reflects an explicit choice of markers for which genotyping rates have been found to be highly repeatable and whose individual polymerase chain reactions (PCR) have been optimized.

Non-differential genotyping errors, that is, those that do not differ systematically according to outcome status, will usually bias associations towards the null (Rothman et al. 1993; Garcia-Closas et al. 2004), just as for other non-differential errors. The most marked bias occurs when genotyping sensitivity is poor and genotype prevalence is high (>85%) or, as the corollary, when genotyping specificity is poor and genotype prevalence is low (<15%) (Rothman et al. 1993). When measurement of the environmental exposure has substantial error, genotyping errors of the order of 3% can lead to substantial under-estimation of the magnitude of an interaction effect (Wong et al. 2004). When there are systematic differences in genotyping according to outcome status (differential error), bias in any direction may occur. Unblinded assessment may lead to differential misclassification. For genome-wide association studies of SNPs, differential misclassification between comparison groups (for example, cases and controls) can occur because of differences in DNA storage, collection or processing protocols, even when the genotyping itself meets the highest possible standards (Clayton et al. 2005). In this situation, using samples blinded to comparison group to determine the parameters for allele calling could still lead to differential misclassification. To minimize such

Table 1 STREGA reporting recommendations, extended from STROBE Statement
For each item, the STROBE guideline is given first, followed by the extension for genetic association studies (STREGA) where one applies.

Title and Abstract (item 1)
STROBE: (a) Indicate the study's design with a commonly used term in the title or the abstract. (b) Provide in the abstract an informative and balanced summary of what was done and what was found.

Introduction
Background/rationale (item 2)
STROBE: Explain the scientific background and rationale for the investigation being reported.
Objectives (item 3)
STROBE: State specific objectives, including any prespecified hypotheses.
STREGA: State if the study is the first report of a genetic association, a replication effort, or both.

Methods
Study design (item 4)
STROBE: Present key elements of study design early in the paper.
Setting (item 5)
STROBE: Describe the setting, locations and relevant dates, including periods of recruitment, exposure, follow-up, and data collection.
Participants (item 6)
STROBE: (a) Cohort study: give the eligibility criteria, and the sources and methods of selection of participants. Describe methods of follow-up. Case–control study: give the eligibility criteria, and the sources and methods of case ascertainment and control selection. Give the rationale for the choice of cases and controls. Cross-sectional study: give the eligibility criteria, and the sources and methods of selection of participants. (b) Cohort study: for matched studies, give matching criteria and number of exposed and unexposed. Case–control study: for matched studies, give matching criteria and the number of controls per case.
STREGA: (a) Give information on the criteria and methods for selection of subsets of participants from a larger study, when relevant.
Variables (item 7)
STROBE: (a) Clearly define all outcomes, exposures, predictors, potential confounders, and effect modifiers. Give diagnostic criteria, if applicable.
STREGA: (b) Clearly define genetic exposures (genetic variants) using a widely-used nomenclature system. Identify variables likely to be associated with population stratification (confounding by ethnic origin).
Data sources/measurement (item 8; see note a)
STROBE: (a) For each variable of interest, give sources of data and details of methods of assessment (measurement). Describe comparability of assessment methods if there is more than one group.
STREGA: (b) Describe laboratory methods, including source and storage of DNA, genotyping methods and platforms (including the allele calling algorithm used, and its version), error rates and call rates. State the laboratory/center where genotyping was done. Describe comparability of laboratory methods if there is more than one group. Specify whether genotypes were assigned using all of the data from the study simultaneously or in smaller batches.
Bias (item 9)
STROBE: (a) Describe any efforts to address potential sources of bias.
STREGA: (b) For quantitative outcome variables, specify if any investigation of potential bias resulting from pharmacotherapy was undertaken. If relevant, describe the nature and magnitude of the potential bias, and explain what approach was used to deal with this.
Study size (item 10)
STROBE: Explain how the study size was arrived at.
Quantitative variables (item 11)
STROBE: Explain how quantitative variables were handled in the analyses. If applicable, describe which groupings were chosen, and why.
STREGA: If applicable, describe how effects of treatment were dealt with.
Statistical methods (item 12)
STROBE: (a) Describe all statistical methods, including those used to control for confounding. (b) Describe any methods used to examine subgroups and interactions. (c) Explain how missing data were addressed. (d) Cohort study: if applicable, explain how loss to follow-up was addressed. Case–control study: if applicable, explain how matching of cases and controls was addressed. Cross-sectional study: if applicable, describe analytical methods taking account of sampling strategy. (e) Describe any sensitivity analyses.
STREGA: (a) State software version used and options (or settings) chosen. (f) State whether Hardy–Weinberg equilibrium was considered and, if so, how. (g) Describe any methods used for inferring genotypes or haplotypes. (h) Describe any methods used to assess or address population stratification. (i) Describe any methods used to address multiple comparisons or to control risk of false-positive findings. (j) Describe any methods used to address and correct for relatedness among subjects.

Results
Participants (item 13; see note a)
STROBE: (a) Report the numbers of individuals at each stage of the study (e.g., numbers potentially eligible, examined for eligibility, confirmed eligible, included in the study, completing follow-up, and analyzed). (b) Give reasons for non-participation at each stage. (c) Consider use of a flow diagram.
STREGA: (a) Report numbers of individuals in whom genotyping was attempted and numbers of individuals in whom genotyping was successful.
Descriptive data (item 14; see note a)
STROBE: (a) Give characteristics of study participants (e.g., demographic, clinical, social) and information on exposures and potential confounders. (b) Indicate the number of participants with missing data for each variable of interest. (c) Cohort study: summarize follow-up time, e.g., average and total amount.
STREGA: (a) Consider giving information by genotype.
Table 1 continued

Outcome data (item 15; see note a)
STROBE: Cohort study: report numbers of outcome events or summary measures over time. Case–control study: report numbers in each exposure category, or summary measures of exposure. Cross-sectional study: report numbers of outcome events or summary measures.
STREGA: Cohort study: report outcomes (phenotypes) for each genotype category over time. Case–control study: report numbers in each genotype category. Cross-sectional study: report outcomes (phenotypes) for each genotype category.
Main results (item 16)
STROBE: (a) Give unadjusted estimates and, if applicable, confounder-adjusted estimates and their precision (e.g., 95% confidence intervals). Make clear which confounders were adjusted for and why they were included. (b) Report category boundaries when continuous variables were categorized. (c) If relevant, consider translating estimates of relative risk into absolute risk for a meaningful time period.
STREGA: (d) Report results of any adjustments for multiple comparisons.
Other analyses (item 17)
STROBE: (a) Report other analyses done (e.g., analyses of subgroups and interactions, and sensitivity analyses).
STREGA: (b) If numerous genetic exposures (genetic variants) were examined, summarize results from all analyses undertaken. (c) If detailed results are available elsewhere, state how they can be accessed.

Discussion
Key results (item 18)
STROBE: Summarize key results with reference to study objectives.
Limitations (item 19)
STROBE: Discuss limitations of the study, taking into account sources of potential bias or imprecision. Discuss both direction and magnitude of any potential bias.
Interpretation (item 20)
STROBE: Give a cautious overall interpretation of results considering objectives, limitations, multiplicity of analyses, results from similar studies, and other relevant evidence.
Generalizability (item 21)
STROBE: Discuss the generalizability (external validity) of the study results.

Other information
Funding (item 22)
STROBE: Give the source of funding and the role of the funders for the present study and, if applicable, for the original study on which the present article is based.

STREGA, Strengthening the REporting of Genetic Association studies; STROBE, Strengthening the Reporting of Observational Studies in Epidemiology.
a Give information separately for cases and controls in case–control studies and, if applicable, for exposed and unexposed groups in cohort and cross-sectional studies.
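The genotyping-errors discussion notes that non-differential errors usually bias associations towards the null, most markedly when sensitivity is poor and genotype prevalence high, or specificity poor and prevalence low (Rothman et al. 1993). The Python sketch below checks this with expected (not sampled) counts. It is illustrative only and assumes a two-category classification (genotype of interest versus all others) with the same sensitivity (Se) and specificity (Sp) in cases and controls; the function names and scenario values are hypothetical. The observed prevalence is Se·p + (1 − Sp)(1 − p), and the observed odds ratio compares the resulting odds in cases and controls.

```python
def observed_prevalence(true_prev: float, sensitivity: float,
                        specificity: float) -> float:
    """Expected proportion classified as having the genotype of interest."""
    return sensitivity * true_prev + (1.0 - specificity) * (1.0 - true_prev)


def odds(p: float) -> float:
    return p / (1.0 - p)


def observed_or(control_prev: float, true_or: float,
                sensitivity: float, specificity: float) -> float:
    """Expected odds ratio under non-differential misclassification.

    control_prev is the true genotype prevalence among controls; the true
    prevalence among cases follows from true_or. The same sensitivity and
    specificity are applied to both groups.
    """
    case_odds = true_or * odds(control_prev)
    case_prev = case_odds / (1.0 + case_odds)
    p1 = observed_prevalence(case_prev, sensitivity, specificity)
    p0 = observed_prevalence(control_prev, sensitivity, specificity)
    return odds(p1) / odds(p0)


if __name__ == "__main__":
    scenarios = [
        ("perfect genotyping, prevalence 0.20", 0.20, 1.00, 1.00),
        ("Se = Sp = 0.95, prevalence 0.20", 0.20, 0.95, 0.95),
        ("poor specificity, prevalence 0.05", 0.05, 1.00, 0.95),
        ("poor sensitivity, prevalence 0.05", 0.05, 0.95, 1.00),
    ]
    for label, prev, se, sp in scenarios:
        print(f"{label}: observed OR = {observed_or(prev, 2.0, se, sp):.2f}")
```

With a true odds ratio of 2.0, the four scenarios print 2.00, 1.80, 1.51 and 1.99. At a low genotype prevalence, a 5% loss of specificity attenuates the estimate far more than the same loss of sensitivity, consistent with the pattern described above.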
Table 2 Rationale for inclusion of topics in the STREGA recommendations
For each specific issue in genetic association studies, the table gives the rationale for inclusion in STREGA, the relevant item(s) in STREGA, and specific suggestions for reporting.
Main areas of special interest
Genotyping errors (misclassification of exposure)
Rationale for inclusion in STREGA: Non-differential genotyping errors will usually bias associations towards the null (Rothman et al. 1993; Garcia-Closas et al. 2004). When there are systematic differences in genotyping according to outcome status (differential error), bias in any direction may occur.
Item(s) in STREGA:
8(b) Describe laboratory methods, including source and storage of DNA, genotyping methods and platforms (including the allele calling algorithm used, and its version), error rates and call rates. State the laboratory/center where genotyping was done. Describe comparability of laboratory methods if there is more than one group. Specify whether genotypes were assigned using all of the data from the study simultaneously or in smaller batches.
13(a) Report numbers of individuals in whom genotyping was attempted and numbers of individuals in whom genotyping was successful.
Specific suggestions for reporting:
Factors affecting the potential extent of misclassification (information bias) of genotype include the types and quality of samples, timing of collection, and the method used for genotyping (Little et al. 2002; Pompanon et al. 2005; Steinberg and Gallagher 2004).
When high-throughput platforms are used, it is important to report not only the platform used but also the allele calling algorithm and its version. Different calling algorithms have different strengths and weaknesses [(McCarthy et al. 2008) and supplementary information in (Wellcome Trust Case Control Consortium 2007)]. For example, some of the currently used algorithms are notably less accurate in assigning genotypes to single-nucleotide polymorphisms with low minor allele frequencies (<0.10) than to single-nucleotide polymorphisms with higher minor allele frequencies (Pearson and Manolio 2008). Algorithms are continually being improved. Reporting the allele calling algorithm and its version will help readers to interpret reported results, and it is critical for reproducing the results of the study given the same intermediate output files summarizing intensity of hybridization.
For some high-throughput platforms, the user may choose to assign genotypes using all of the data from the study simultaneously, or in smaller batches, such as by plate [(Clayton et al. 2005; Plagnol et al. 2007) and supplementary information in (Wellcome Trust Case Control Consortium 2007)]. This choice can affect both the overall call rate and the robustness of the calls.
For case–control studies, whether genotyping was done blind to case–control status should be reported, along with the reason for this decision.
Population stratification (confounding by ethnic origin)
Rationale for inclusion in STREGA: When study sub-populations differ both in allele (or genotype) frequencies and disease risks, then confounding will occur if these sub-populations are unevenly distributed across exposure groups (or between cases and controls).
Item(s) in STREGA:
12(h) Describe any methods used to assess or address population stratification.
Specific suggestions for reporting:
In view of the debate about the potential implications of population stratification for the validity of genetic association studies, transparent reporting of the methods used, or stating that none was used, to address this potential problem is important for allowing the empirical evidence to accrue.
Ethnicity information should be presented (see for example (Winker 2006)), as should genetic markers or other variables likely to be associated with population stratification. Details of case-family control designs should be provided if they are used.
As several methods of adjusting for population stratification have been proposed (Balding 2006), explicit documentation of the methods is needed.
Modeling haplotype variation
Rationale for inclusion in STREGA: In designs considered in this article, haplotypes have to be inferred because of lack of available family information. There are diverse methods for inferring haplotypes.
Item(s) in STREGA:
12(g) Describe any methods used for inferring genotypes or haplotypes.
Specific suggestions for reporting:
When discrete “windows” are used to summarize haplotypes, variation in the definition of these may complicate comparisons across studies, as results may be sensitive to choice of windows. Related “imputation” strategies are also in use (Wellcome Trust Case Control Consortium 2007; Scott et al. 2007; Scuteri et al. 2007).
It is important to give details on haplotype inference and, when possible, uncertainty. Additional considerations for reporting include the strategy for dealing with rare haplotypes, window size and construction (if used) and choice of software.
Hardy–Weinberg equilibrium (HWE)
Rationale for inclusion in STREGA: Departure from Hardy–Weinberg equilibrium may indicate errors or peculiarities in the data (Salanti et al. 2005). Empirical assessments have found that 20–69% of genetic associations were reported with some indication about conformity with Hardy–Weinberg equilibrium, and that among some of these, there were limitations or errors in its assessment (Salanti et al. 2005).
Item(s) in STREGA:
12(f) State whether Hardy–Weinberg equilibrium was considered and, if so, how.
Specific suggestions for reporting:
Any statistical tests or measures should be described, as should any procedure to allow for deviations from Hardy–Weinberg equilibrium in evaluating genetic associations (Zou and Donner 2006).
Replication
Rationale for inclusion in STREGA: Publications that present and synthesize data from several studies in a single report are becoming more common.
Item(s) in STREGA:
3: State if the study is the first report of a genetic association, a replication effort, or both.
Specific suggestions for reporting:
The selected criteria for claiming successful replication should also be explicitly documented.
Additional issues
Selection of participants
Rationale for inclusion in STREGA: Selection bias may occur if (i) genetic associations are investigated in one or more subsets of participants (sub-samples) from a particular study; or (ii) there is differential non-participation in groups being compared; or, (iii) there are differential genotyping call rates in groups being compared.
Item(s) in STREGA:
6(a) Give information on the criteria and methods for selection of subsets of participants from a larger study, when relevant.
13(a) Report numbers of individuals in whom genotyping was attempted and numbers of individuals in whom genotyping was successful.
Specific suggestions for reporting:
Inclusion and exclusion criteria, sources and methods of selection of sub-samples should be specified, stating whether these were based on a priori or post hoc considerations.
Rationale for choice of genes and variants investigated
Rationale for inclusion in STREGA: Without an explicit rationale, it is difficult to judge the potential for selective reporting of study results. There is strong empirical evidence from randomised controlled trials that reporting of trial outcomes is frequently incomplete and biased in favor of statistically significant findings (Chan et al. 2004a, b; Chan and Altman 2005). Some evidence is also available in pharmacogenetics (Contopoulos-Ioannidis et al. 2006).
Item(s) in STREGA:
7(b) Clearly define genetic exposures (genetic variants) using a widely-used nomenclature system. Identify variables likely to be associated with population stratification (confounding by ethnic origin).
Specific suggestions for reporting:
The scientific background and rationale for investigating the genes and variants should be reported.
For genome-wide association studies, it is important to specify what initial testing platforms were used and how gene variants are selected for further testing in subsequent stages. This may involve statistical considerations (for example, selection of P value threshold), functional or other biological considerations, fine mapping choices, or other approaches that need to be specified.
Guidelines for human gene nomenclature have been published by the Human Gene Nomenclature Committee (Wain et al. 2002a, b). Standard reference numbers for nucleotide sequence variations, largely but not only SNPs, are provided in dbSNP, the National Center for Biotechnology Information's database of genetic variation (Sherry et al. 2001). For variations not listed in dbSNP that can be described relative to a specified version, guidelines have been proposed (Antonarakis 1998; den Dunnen and Antonarakis 2000).
Treatment effects in studies of quantitative traits
Rationale for inclusion in STREGA: A study of a quantitative variable may be compromised when the trait is subjected to the effects of a treatment, for example, the study of a lipid-related trait for which several individuals are taking lipid-lowering medication. Without appropriate correction, this can lead to bias in estimating the effect and loss of power.
Item(s) in STREGA:
9(b) For quantitative outcome variables, specify if any investigation of potential bias resulting from pharmacotherapy was undertaken. If relevant, describe the nature and magnitude of the potential bias, and explain what approach was used to deal with this.
11: If applicable, describe how effects of treatment were dealt with.
Specific suggestions for reporting:
Several methods of adjusting for treatment effects have been proposed (Tobin et al. 2005). As the approach to deal with treatment effects may have an important impact on both the power of the study and the interpretation of the results, explicit documentation of the selected strategy is needed.
Statistical methods
Rationale for inclusion in STREGA: Analysis methods should be transparent and replicable, and genetic association studies are often performed using specialized software.
Item(s) in STREGA:
12(a) State software version used and options (or settings) chosen.
Relatedness
Rationale: The methods of analysis used in family-based studies are different from those used in studies that are based on unrelated cases and controls. Moreover, even in the studies that are based on apparently unrelated cases and controls, some individuals may have some connection and may be (distant) relatives, and this is particularly common in small, isolated populations, for example, Iceland. This may need to be probed with appropriate methods and adjusted for in the analysis of the data.
Item 12(j): Describe any methods used to address and correct for relatedness among subjects.
Suggestions for reporting: For the great majority of studies in which samples are drawn from large, non-isolated populations, relatedness is typically negligible and results would not be altered depending on whether relatedness is taken into account. This may not be the case in isolated populations or those with considerable inbreeding. If investigators have assessed for relatedness, they should state the method used (Lynch and Ritland 1999; Slager and Schaid 2001; Voight and Pritchard 2005) and how the results are corrected for identified relatedness.
Reporting of descriptive and outcome data
Rationale: The synthesis of findings across studies depends on the availability of sufficiently detailed data.
Item 14(a): Consider giving information by genotype.
Item 15:
Cohort study: Report outcomes (phenotypes) for each genotype category over time.
Case-control study: Report numbers in each genotype category.
Cross-sectional study: Report outcomes (phenotypes) for each genotype category.
Volume of data
Rationale: The key problem is the possibility of false-positive results and the selective reporting of these. Type I errors are particularly relevant to the conduct of genome-wide association studies. A large search among hundreds of thousands of genetic variants can be expected by chance alone to find thousands of false-positive results (odds ratios significantly different from 1.0).
Item 12(i): Describe any methods used to address multiple comparisons or to control risk of false-positive findings.
Item 16(d): Report results of any adjustments for multiple comparisons.
Item 17(b): If numerous genetic exposures (genetic variants) were examined, summarize results from all analyses undertaken.
Item 17(c): If detailed results are available elsewhere, state how they can be accessed.
Suggestions for reporting: Genome-wide association studies collect information on a very large number of genetic variants concomitantly. Initiatives to make the entire database transparent and available online may supply a definitive solution to the problem of selective reporting (Khoury et al. 2007). Availability of raw data may help interested investigators reproduce the published analyses and also pursue additional analyses. A potential drawback of public data availability is that investigators using the data second-hand may not be aware of limitations or other problems that were originally encountered, unless these are also transparently reported. In this regard, collaboration of the data users with the original investigators may be beneficial. Issues of consent and confidentiality (Homer et al. 2008; Zerhouni and Nabel 2008) may also complicate what data can be shared, and how. It would be useful for published reports to specify not only what data can be accessed and where, but also briefly mention the procedure. For articles that have used publicly available data, it would be useful to clarify whether the original investigators were also involved, and if so, how. The volume of data analyzed should also be considered in the interpretation of findings. Examples of methods of summarizing results include giving the distribution of P values (frequentist statistics), the distribution of effect sizes, and specifying false discovery rates.
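The point in the Volume of data row about chance findings rests on simple arithmetic. If n tests are run on true null hypotheses at level α, about nα of them are expected to be "significant." The Python sketch below shows that calculation. It also shows a Bonferroni per-test threshold and the Benjamini-Hochberg step-up procedure, which is one common way of controlling the false discovery rate. The 500,000-variant scan, the significance levels and the example P values are hypothetical. STREGA does not prescribe any particular correction method; the procedures here are illustrative choices only.

```python
"""Sketch: expected chance findings in a large scan, plus two common
corrections (Bonferroni and Benjamini-Hochberg). Illustrative only."""


def expected_false_positives(n_tests: int, alpha: float) -> float:
    # If every null hypothesis is true, each test is "significant" with
    # probability alpha, so the expected count is n_tests * alpha
    # (by linearity of expectation, whether or not tests are independent).
    return n_tests * alpha


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    # Per-test P value threshold that keeps the family-wise error rate <= alpha.
    return alpha / n_tests


def benjamini_hochberg(p_values: list[float], q: float) -> list[bool]:
    """Flag which tests are declared discoveries when controlling the false
    discovery rate at level q with the Benjamini-Hochberg step-up procedure
    (valid for independent tests and some forms of positive dependence)."""
    m = len(p_values)
    # Indices of the tests, ordered from smallest to largest P value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest rank k such that the k-th smallest P value <= k * q / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            k_max = rank
    # Step-up rule: reject every hypothesis ranked at or below k_max.
    reject = [False] * m
    for rank, idx in enumerate(order, start=1):
        reject[idx] = rank <= k_max
    return reject


if __name__ == "__main__":
    n_variants = 500_000  # hypothetical genome-wide scan
    print(expected_false_positives(n_variants, 0.01))  # 5000.0
    print(bonferroni_threshold(0.05, n_variants))
    example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
    print(benjamini_hochberg(example_p, q=0.05))
```

With the eight example P values, only the two smallest (0.001 and 0.008) fall under their Benjamini-Hochberg thresholds of 0.00625 and 0.0125. The procedure therefore returns `True` for those two and `False` for the rest. Reporting which procedure and which level were used is the kind of detail items 12(i) and 16(d) ask for.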
differential misclassification, it would be necessary to
calibrate the software separately for each group. This is one
of the reasons for our recommendation to specify whether
genotypes were assigned using all of the data from the
study simultaneously or in smaller batches.
Population stratification
Recommendation for reporting of methods (Table 1, item
12(h)): Describe any methods used to assess or address
population stratification.
Population stratification is the presence within a population
of subgroups among which allele (or genotype; or
haplotype) frequencies and disease risks differ. When the
groups compared in the study differ in their proportions of
the population subgroups, an association between the
genotype and the disease being investigated may reflect the
genotype being an indicator identifying a population subgroup
rather than a causal variant. In this situation,
population subgroup is a confounder because it is associated
with both genotype frequency and disease risk. The
potential implications of population stratification for the
validity of genetic association studies have been debated
(Knowler et al. 1988; Gelernter et al. 1993; Kittles et al.
2002; Thomas and Witte 2002; Wacholder et al. 2002;
Cardon and Palmer 2003; Wacholder et al. 2000; Ardlie
et al. 2002; Edland et al. 2004; Millikan 2001; Wang et al.
2004; Ioannidis et al. 2004; Marchini et al. 2004; Freedman
et al. 2004; Khlat et al. 2004). Modeling the possible effect
of population stratification (when no effort has been made
to address it) suggests that the effect is likely to be small in
most situations (Wacholder et al. 2000; Ardlie et al. 2002;
Millikan 2001; Wang et al. 2004; Ioannidis et al. 2004).
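The confounding mechanism described above can be made concrete with a small worked example. The Python sketch below assumes two hypothetical subgroups of equal size. Their carrier frequencies (0.5 vs 0.1) and disease risks (2% vs 1%) are invented for illustration. Within each subgroup, risk does not depend on genotype. The sketch computes expected case and control counts for the whole population rather than drawing a sample. The numbers are deliberately extreme so that the bias is visible, and they say nothing about how large stratification effects typically are in real studies.

```python
"""Sketch: spurious genotype-disease association from population
stratification. All subgroup sizes, frequencies and risks are hypothetical."""

from dataclasses import dataclass


@dataclass
class Subgroup:
    size: int            # number of people in the subgroup
    carrier_freq: float  # proportion carrying the variant
    risk: float          # disease risk, identical for carriers and non-carriers


def two_by_two(groups: list[Subgroup]) -> tuple[float, float, float, float]:
    """Expected (case carriers, case non-carriers, control carriers,
    control non-carriers), summed over the given subgroups."""
    a = b = c = d = 0.0
    for g in groups:
        carriers = g.size * g.carrier_freq
        non_carriers = g.size - carriers
        a += carriers * g.risk            # cases who carry the variant
        b += non_carriers * g.risk        # cases without the variant
        c += carriers * (1 - g.risk)      # controls who carry the variant
        d += non_carriers * (1 - g.risk)  # controls without the variant
    return a, b, c, d


def odds_ratio(groups: list[Subgroup]) -> float:
    # Cross-product odds ratio from the (pooled) 2x2 table.
    a, b, c, d = two_by_two(groups)
    return (a * d) / (b * c)


group_a = Subgroup(size=100_000, carrier_freq=0.5, risk=0.02)
group_b = Subgroup(size=100_000, carrier_freq=0.1, risk=0.01)

print(round(odds_ratio([group_a]), 2))           # within subgroup A: 1.0
print(round(odds_ratio([group_b]), 2))           # within subgroup B: 1.0
print(round(odds_ratio([group_a, group_b]), 2))  # pooled: 1.36
```

Each within-subgroup odds ratio is 1.0, yet the pooled odds ratio is about 1.36. The difference arises only because carriers are over-represented in the higher-risk subgroup. Stratifying by, or adjusting for, subgroup membership removes it.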
Meta-analyses of 43 gene-disease associations comprising