Effects of categorization and self-report bias on estimates of the association between obesity and mortality

Samuel H. Preston (a), Ezra Fishman (a), and Andrew Stokes (b)

Samuel H. Preston: [email protected]; Ezra Fishman: [email protected]; Andrew Stokes: [email protected]

(a) Population Studies Center, University of Pennsylvania, 3718 Locust Walk, McNeil 239, Philadelphia, PA, 19104, USA

(b) Department of Global Health, Boston University, 801 Massachusetts Ave., CT362, Boston, MA, 02118, USA

Abstract

Purpose: The health consequences of obesity are often assessed using categorical, self-reported data on body mass index (BMI). This paper investigates the combined effects of categorization and self-report bias on the estimated association between obesity and mortality.

Methods: We used the National Health and Nutrition Examination Survey (1988–2008) linked to death records through 2011. Cox models and age-standardized death rates were used to evaluate the effects of categorization and self-report bias on the mortality risks and percent of deaths attributable to obesity.

Results: Despite a correlation between measured and self-reported BMI of 0.96, self-reports miscategorized 20% of adults. Hazard ratios using self-reports were overstated for the obese 1 (BMI 30–35 kg/m²) and obese 2 (BMI ≥35 kg/m²) categories. The bias was much smaller using a continuous measure of BMI. In contrast, the percent of deaths attributable to excess weight was lower using self-reported versus measured data because self-reports led to systematic downward bias in the BMI distribution.

Conclusions: Categorization of BMI and self-report bias combine to produce substantial error in the estimated hazard ratios and percent of deaths attributable to obesity. Future studies should use caution when estimating the association between obesity and mortality using categorical self-reported data.

Keywords

Body mass index; Obesity; Body weight; Mortality; Epidemiologic methods

Correspondence to: Andrew Stokes, [email protected].

Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

HHS Public Access. Author manuscript. Ann Epidemiol. Author manuscript.


In 2011–2012, an estimated 34.9% of American adults ages 20 and older were obese based on measured height and weight collected in the National Health and Nutrition Examination Survey (NHANES) [1]. In contrast, recent estimates from the Behavioral Risk Factor Surveillance System and Gallup Poll, based on self-reported height and weight, place the national obesity rate in adults ages 18 and above at 29.4% and 27.7% respectively [2,3].

The lower rates reported in these surveys are in part attributable to systematic misreporting of height and weight, which has been documented in numerous studies [4–7]. A comprehensive review of the accuracy of self-reported weight and height concluded that both sexes have a tendency to overreport their height and underreport their weight, with body mass index (BMI) underestimated as a result [8].

Misreporting may create especially large analytic errors in epidemiological studies when combined with categorization of BMI. Misreporting that incorrectly transfers someone across one of the conventional 5-unit BMI categories connotes large changes in an individual's implied weight. For someone 5'10", such a transfer would imply a movement of 35 pounds from the midpoint of the actual class to the midpoint of the adjacent reported class, even though weight might be misreported by as little as a pound.
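The 35-pound figure follows from the definition of BMI. A minimal sketch of the arithmetic, assuming the common imperial approximation BMI = 703 × weight (lb) / height (in)²; the constant and function name are illustrative and not part of the original analysis:

```python
# Sketch: pounds implied by a shift of a given number of BMI units.
# Assumes the imperial approximation BMI = 703 * weight_lb / height_in**2.
LB_IN_FACTOR = 703.0  # approximate kg/m^2 conversion constant (assumption)


def pounds_per_bmi_units(height_in: float, bmi_units: float = 5.0) -> float:
    """Weight change in pounds equal to `bmi_units` of BMI at this height."""
    return bmi_units * height_in ** 2 / LB_IN_FACTOR


height_in = 5 * 12 + 10  # 5'10" expressed in inches
print(round(pounds_per_bmi_units(height_in), 1))       # width of one 5-unit class
print(round(pounds_per_bmi_units(height_in, 1.0), 1))  # one BMI unit
```

At this height one BMI unit is roughly 7 pounds, so a reporting error of a pound or two near a category boundary is enough to move a person between classes whose midpoints lie about 35 pounds apart.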

Despite the biases in self-reports, many of the largest and most influential studies of the mortality consequences of obesity are based on self-reported weight and height [9–13]. Keith et al. [14] explored the differences in estimated relations between BMI and mortality when categorical self-reported data are used rather than measured data. The authors described the biases from using self-reports as “complicated and inconsistent.” We build upon that study by taking advantage of a more recent data set, by relating patterns of misreporting to biases in estimated relative risks, by using age-standardized death rates in addition to hazard ratios, and by introducing a continuous measure of BMI. The continuous measure enables us to demonstrate how misreporting and categorization combine to produce bias. Finally, we also examine the effect of misreporting and categorization on the fraction of deaths attributable to obesity.

Methods

NHANES is a series of nationally representative samples of the non-institutionalized U.S. population conducted by the National Center for Health Statistics (NCHS). The survey includes an examination component in which extensive medical data, including height and weight, are collected by trained nurses in mobile clinics or in home-visits [15]. Survey design and sample characteristics are described in detail elsewhere [16,17]. We used data from NHANES III (conducted 1988 to 1994) and NHANES Continuous, 1999–2008, which are linked to National Death Index death records through 2011 [18]. NHANES data from 2009–2010 were not incorporated because we delay analysis of mortality for three years after baseline to reduce the effects of reverse causality [19]. We used an age range of 40–75 and terminated follow-up when individuals reached their 80th birthday because of declines with age in the mortality risks of obesity [20,21]. Younger individuals were excluded because of the small number of deaths in the sample at these ages.


There were 20,580 non-pregnant individuals aged 40 to 74 at baseline. We excluded subjects who lacked measured height or weight (n=323) or self-reported height or weight (n=781). We further excluded individuals with very low (BMI < 15 kg/m²) or very high (BMI ≥ 75 kg/m²) values for measured or self-reported BMI (n=24). Subjects missing data on smoking status (n=9) or education (n=53) were also excluded. To reduce the possibility that illness was driving weight loss (reverse causality), we excluded individuals who reported a diagnosis of the following smoking-related health conditions: emphysema (n=527) or cancers of the bladder, esophagus, kidney, larynx, lung, mouth/tongue/lip, or pancreas (n=117) [19]. We also excluded persons who had lost significant weight in the 10-year period prior to baseline based on self-reported weight, using a cut-off of the 75th percentile of weight loss among weight losers, or 4.0 BMI units (n=1,046). Finally, after incorporating mortality follow-up data, we excluded 15 subjects who lacked a date of death or censoring information.

Given our focus on the relation between excess weight and mortality, individuals with measured BMI below 18.5 were excluded from survival models using measured data (n=204) and individuals with self-reported BMI below 18.5 were excluded from survival models using self-reported data (n=154). For models using self-reports, there were 16,970 subjects with 1,979 deaths in 128,616 person-years of observation. For models using measured BMI, there were 16,928 subjects with 1,974 deaths in 128,333 person-years of observation. Median follow-up time was 8.4 years.

We used Cox proportional hazards models to estimate the relation between BMI and mortality, using age as exposure time, and compared coefficients estimated using self-reported data to those estimated using measured data. The continuous model used a BMI variable of (BMI – 25.0); individuals with BMI values below 25 were excluded because the focus was on the incremental risks associated with overweight and obesity. This approach was also used in the two largest analyses of pooled data [10,22]. The Prospective Studies Collaboration pooled data on approximately 900,000 individuals in 57 studies and demonstrated that death rates rose exponentially (i.e., linearly on a log scale) above a BMI of 25 [22]. Such a pattern lends itself to efficient estimation of the mortality hazards associated with overweight and obesity by using linear BMI in a hazards model.

In models using discrete categories of BMI, we grouped people into the following categories recommended by the World Health Organization: Normal weight: 18.5 ≤ BMI < 25; Overweight: 25 ≤ BMI < 30; Obese 1: 30 ≤ BMI < 35; and Obese 2: BMI ≥ 35. We conducted two sensitivity analyses of the model. In one, the sample was restricted to non-smokers in order to assess the impact on estimates of an unusual pattern of BMI misreporting among smokers. The second dropped the three-year exclusion of observations between survey and mortality.
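The category boundaries can be written as a small function. The sketch below is illustrative only: the function name and the example BMI values are hypothetical and are not taken from the study data.

```python
def bmi_category(bmi: float) -> str:
    """Assign a BMI value (kg/m^2) to the categories used in the text."""
    if bmi < 18.5:
        return "Underweight"
    if bmi < 25.0:
        return "Normal"
    if bmi < 30.0:
        return "Overweight"
    if bmi < 35.0:
        return "Obese 1"
    return "Obese 2"


# Hypothetical respondent: a small underreport near a boundary
# changes the assigned category.
measured_bmi = 30.2
self_reported_bmi = 29.6
print(bmi_category(measured_bmi), "vs", bmi_category(self_reported_bmi))
```

A difference of 0.6 BMI units here moves the respondent from Obese 1 to Overweight, which is the mechanism by which small reporting errors produce miscategorization.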

Covariates in the survival models were gender, race/ethnicity (non-Hispanic Black, Hispanic, White and Other), smoking status (current, former, never), and educational attainment (< high school, high school, > high school). To supplement the analysis of the relative hazards of death associated with excess BMI, we estimated age-standardized death rates by BMI category, standardizing in 5-year age groups to the age distribution of the 2000 U.S. Census population.
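Age standardization of this kind can be sketched as a weighted average of age-specific death rates. The example below assumes the direct method; the counts and standard-population weights are hypothetical placeholders, not NHANES or Census values, and survey weighting is omitted.

```python
def age_standardized_rate(deaths, person_years, standard_pop):
    """Directly standardized death rate.

    All three arguments are dicts keyed by age group. Each age-specific
    rate is weighted by that group's share of the standard population.
    """
    total_standard = sum(standard_pop.values())
    rate = 0.0
    for group, std_count in standard_pop.items():
        age_specific_rate = deaths[group] / person_years[group]
        rate += (std_count / total_standard) * age_specific_rate
    return rate


# Hypothetical inputs for one BMI category (illustration only).
deaths = {"40-44": 2, "45-49": 5, "50-54": 9}
person_years = {"40-44": 4000.0, "45-49": 3500.0, "50-54": 3000.0}
standard_pop = {"40-44": 22.0, "45-49": 20.0, "50-54": 17.0}

rate = age_standardized_rate(deaths, person_years, standard_pop)
print(f"{1000 * rate:.2f} deaths per 1,000 person-years")
```

Because every BMI category is weighted to the same standard age distribution, differences in the resulting rates are not driven by differences in age composition across categories.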

To assess the fraction of deaths attributable to overweight and obesity, we used the following formula for the Population Attributable Fraction [23]:

$$\mathrm{PAF} = \sum_{i} pd_i \, \frac{HR_i - 1}{HR_i} \qquad (1)$$

where $pd_i$ is the proportion of decedents in BMI category $i$ and $HR_i$ is the hazard ratio with respect to mortality for an individual in category $i$. Individuals in the Normal and Underweight categories were assigned a hazard ratio of 1.0. To implement this formula in our continuous models, we used 0.5-unit wide BMI categories and associated each category with the predicted hazard ratio at the midpoint of the category.
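The calculation can be sketched in a few lines. The example assumes the decedent-based form of the attributable fraction, PAF = Σ pd_i (HR_i − 1) / HR_i, which matches the definitions of pd_i and HR_i given here; the decedent shares and hazard ratios are hypothetical inputs, not the study's estimates.

```python
def population_attributable_fraction(decedent_shares, hazard_ratios):
    """PAF = sum over categories of pd_i * (HR_i - 1) / HR_i.

    decedent_shares: proportion of decedents in each category (sums to 1).
    hazard_ratios: hazard ratio for each category (reference = 1.0).
    """
    return sum(
        share * (hazard_ratios[cat] - 1.0) / hazard_ratios[cat]
        for cat, share in decedent_shares.items()
    )


# Hypothetical inputs (illustration only, not values from the tables).
decedent_shares = {"Normal": 0.35, "Overweight": 0.35, "Obese 1": 0.20, "Obese 2": 0.10}
hazard_ratios = {"Normal": 1.0, "Overweight": 1.05, "Obese 1": 1.25, "Obese 2": 1.80}

paf = population_attributable_fraction(decedent_shares, hazard_ratios)
print(f"PAF = {paf:.3f}")
```

Categories with a hazard ratio of 1.0 contribute nothing to the sum, so the result depends both on the hazard ratios and on how decedents are distributed across the higher BMI categories.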

All analyses were weighted to the U.S. civilian non-institutionalized population and adjusted for the complex survey design of NHANES using the svy: routine in Stata 13 (StataCorp, Texas, USA). Variances were estimated using Taylor series linearization, as recommended by NHANES [15].

Results

The correlation between measured BMI and BMI based on self-reported weight and height was 0.96 for each sex. Despite this high correlation, 19.5% were reported to be in a different 5-unit BMI category from their correct category (18.5% of men and 20.3% of women). Table 1 shows a cross-classification of BMI by reported and measured values. Row proportions (i.e., the proportion of individuals in the measured BMI class who are in the given self-reported cell) are in parentheses. Among those who belong in Obese 2, 27.0% were reported to be in a lower category. Among those measured to be in Obese 1, 32.0% were reported in the wrong category, 28.9% in a lower category and 3.1% in a higher category. Except for the very small Underweight category, Obese 1 had the lowest percentage of people who belonged in the class who correctly reported themselves to be in the class as well as the lowest percentage of people who reported themselves in the class who belonged in the class.

Table 2 shows the mean of self-reported and measured BMI values within various self-reported BMI categories. In every reporting category for both sexes, the mean measured BMI was higher than the mean self-reported BMI, consistent with a general tendency to underreport BMI. The difference between the two means was greater in BMI classes above Normal, which is likely to impart an upward bias into estimates of the relative risk of death in higher self-reported BMI categories.

The largest discrepancy between actual and reported BMI for any category of self-reports occurred among self-reported Obese Class 1 women, who were an average of 1.03 BMI units above what was reported. This discrepancy arose from underreports of weight by an average of 4.40 pounds among these women, with a small additional contribution from overreported height. The bias was primarily a product of a large inflow into Obese 1 from Obese 2 and a large outflow from Obese 1 to Overweight, as shown in Table 1.


A second reporting tendency that may raise the estimated risks of excess weight was a differential pattern of error by smoking status. Current smokers were more likely to overreport their BMI by more than one unit (15.3%) than were former or never smokers (8.2% for both groups) and less likely to underreport their BMI by more than one unit (26.7% vs. 33.3% and 34.2%) (results not shown). This pattern of errors changed the distribution of smoking by BMI class in such a way as to produce relatively more smokers in higher BMI classes when using self-reports than when using measured values. By skewing the smoking distribution toward higher weight classes, this pattern may combine with residual confounding to bias upwards the risk of death associated with obesity when using self-reports compared to the risk using measured values [24].

Figure 1 shows the distribution of errors in BMI. While there was evidently some bias towards underreporting BMI, misreporting occurred in both directions. The magnitude of the misreporting was generally small. 57.9% of the population reported weight and height values that produced a BMI within 1 unit of the measured value, 84.0% were within 2 units, and 96.4% were within 4 units. The distribution of errors appears nearly symmetrical among men and skewed to the left among women. Many of the errors in reported weight were associated with strong preferences to report a weight ending in zero or five (Appendix A).

Table 3 presents the hazard ratios and confidence intervals of hazard models using both discrete and continuous measures of BMI. We failed to reject the proportional hazards assumption with regard to BMI (P>0.05). The top panel of Table 3 shows that models based on continuous, linear BMI variables produced similar hazard ratios regardless of whether self-reports or measured data were used. For each 5-unit increment in BMI above 25, mortality rose by the factor 1.28 when using measured data and 1.30 using self-reports. Since random error would bias coefficients downwards [25,26], the fact that hazard ratios were higher for self-reported BMI is likely a product of systematic errors in those data.
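Under the log-linear specification, a hazard ratio per 5 BMI units translates into a hazard ratio at any BMI above 25. The sketch below assumes a single coefficient on (BMI − 25) and uses only the two per-5-unit factors quoted above; the function name is illustrative.

```python
import math

HR_PER_5_MEASURED = 1.28  # per 5 BMI units above 25, measured data
HR_PER_5_SELF = 1.30      # per 5 BMI units above 25, self-reports


def hazard_ratio(bmi: float, hr_per_5: float, reference: float = 25.0) -> float:
    """Hazard ratio relative to `reference` under a log-linear BMI effect."""
    beta = math.log(hr_per_5) / 5.0  # implied coefficient per BMI unit
    return math.exp(beta * (bmi - reference))


for bmi in (30.0, 35.0, 40.0):
    measured = hazard_ratio(bmi, HR_PER_5_MEASURED)
    self_rep = hazard_ratio(bmi, HR_PER_5_SELF)
    print(f"BMI {bmi:.0f}: measured {measured:.2f}, self-reported {self_rep:.2f}")
```

Because the effect compounds, the implied hazard ratio at BMI 35 is the per-5-unit factor squared, so the small gap between 1.28 and 1.30 widens only gradually at higher BMI.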

When discrete BMI categories were employed, there was a sizeable discrepancy between self-reported and measured estimates for the Obese 1 category. Mortality was 40.4% higher than the reference category using self-reports but only 25.0% higher using measured values. We showed earlier that the Obese 1 self-reported category was subject to the largest reporting errors.

The Obese 2 category also showed higher relative risks of death when self-reported values were used rather than measured values. The “migration” of 25.6% of individuals who belong in Obese 2 into self-reported Obese 1 (from Table 1) raises the relative risks in both groups; their mean BMI of 36.9 is below the mean of 40.1 among people who belong in Obese 2 and above the mean of 32.1 among people who belong in Obese 1.

All confidence intervals associated with Obese 1, Obese 2, and continuous BMI in Table 3 were wider when self-reports were used than when measured BMI was used. This result is expected when error is added to an independent variable. However, none of the confidence intervals would have provided a misleading assessment of the significance of coefficients at a 5% level of significance when self-reports were used.


Table 4 shows which BMI categories had the largest distortions in age-standardized death rates when self-reports were used. The category with the largest error in the age-standardized death rate based on self-reports, not surprisingly, was Obese 1, where death rates using self-reports were overstated by 12.0%. Other self-reported categories had death rates within 5% of the values corresponding to measured BMI. Death rates in the self-reported Normal category proved particularly robust to reporting errors, perhaps because it has the widest range of BMI units. Table 2 showed that those self-reported in the Normal category had the smallest mean discrepancy between actual and self-reported BMI.

We performed two analyses of the sensitivity of results to changes in the model. Results are presented in Appendix B. The first repeats the analysis in Table 3 but excludes current smokers. As shown in Table B.1, the bias in self-reported coefficients for Obese 1 and 2 is maintained and in fact strengthens relative to results in Table 3. So it does not appear that the bias from misreporting shown in Table 3 is attributable to the unusual BMI reporting pattern among smokers. (However, the slight bias in the coefficient of the continuous variable using self-reports in Table 3 is eliminated when smokers are excluded.) The second sensitivity analysis eliminates the three-year delay between baseline survey and mortality follow-up. As shown in Table B.2, the coefficients and patterns of bias are little changed by the additional observations.

How do errors in self-reports of height and weight affect the proportion of deaths attributable to obesity? One might expect that the higher hazards associated with obesity in self-reported data would result in greater fractions of deaths being attributed to obesity. At the same time, reporting errors bias the BMI distribution towards lower values. Table 5 shows that these tendencies largely offset one another. Using categories of BMI, 8.9% of deaths were attributed to obesity when weight and height were measured, compared to 8.4% when self-reports were used. Using a continuous measure of BMI also suggested that self-reports produced a smaller fraction of deaths attributable to obesity than measured BMI. Paradoxically, the use of self-reports makes obesity appear to be a greater hazard for individuals but a slightly smaller hazard for the population.

Discussion

The largest meta-analysis of the association between obesity and mortality conducted to date stratified studies by self-reported vs. measured data. It reported finding lower risks and less heterogeneity in studies using measured data than in studies using self-reported data [27]. A follow-up analysis reported similar patterns although differences in relative risks were small [28]. Another meta-analysis, limited to older adults, found that obesity was hazardous in studies using self-reported data but not in studies using measured data [29]. The direction of bias reported in these meta-analyses is consistent with our findings.

Keith et al. [14] provided four sets of hazard ratios, separately analyzing NHANES 2 (with death follow-up through 1992) and NHANES 3 (follow-up through 2000). Within each set of surveys they showed results for the full sample and for a healthy subgroup. Of the 12 pairs of self-reported vs. measured coefficients pertaining to overweight and obese categories, seven pairs agreed that coefficients were insignificant, four found significance for the measured variable but not the self-reported variable, and one found significance for the self-reported but not the measured variable. None of the pairs agreed that results were significant. In contrast, our results indicated that both self-reported and measured coefficients pertaining to Obese 1 and Obese 2 categories were significant. Possibly contributing to the generally smaller and insignificant hazard ratios in the Keith et al. [14] study is that their age range for linking to mortality had no upper limit.

Our findings, although applicable to studies of obesity, might be relevant to the study of other exposures, such as physical activity, that are often modeled categorically using self-reported data [30,31], when in fact they are measurable, continuous quantities.

The use of NHANES data represents both a strength and a weakness of this study. NHANES provides the only national data that permit comparison of self-reported weight to measured weight. However, many NHANES respondents know that their heights and weights will be measured, which is likely to reduce the extent of misreporting relative to other data sets [7].

We have shown that the use of self-reports produces an upward bias in estimates of the mortality hazard associated with obesity. That bias is mild when BMI is used as a continuous variable. A 5-unit increment in BMI above 25.0 is associated with a relative risk of 1.28 using measured data and a relative risk of 1.30 using self-reports. The bias from using self-reports is greater when discrete categories of BMI are used. A strong bias appears in the estimated relative risk associated with being Obese 1, undoubtedly related to the fact that Obese 1 has the lowest reporting quality of any category. Age-standardized death rates confirm the upward bias in mortality rates in the Obese 1 category, whereas death rates in the Normal category are scarcely affected by misreporting.

Despite the higher mortality hazards associated with obesity when self-reports are used in place of measured data, the proportion of deaths attributable to obesity is slightly lower using self-reports. The reason is that downward bias in the BMI distribution produced by self-reporting outweighs the upward bias in hazard ratios. Paradoxically, the use of self-reports makes obesity appear a greater hazard for an individual but a smaller hazard for the population.

In conclusion, it is clear that the use of continuous BMI avoids many of the erratic results observed in categorical data. In the latter, even small errors in self-reports transfer high proportions across broad BMI categories. Using continuous BMI helps assure that small errors have relatively small effects. Future studies should use caution when estimating the association between obesity and mortality using categorical self-reported data.

Acknowledgements

This project was supported by Grant Number R01AG040212 from the National Institute on Aging. We are grateful to Jere Behrman, Irma Elo and Lucia Tiererova for comments and suggestions.

References

1. Ogden CL, Carroll MD, Kit BK, Flegal KM. Prevalence of childhood and adult obesity in the United States, 2011–2012. JAMA. 2014; 311:806–814. [PubMed: 24570244]


2. Behavioral Risk Factor Surveillance System. Prevalence and Trends Data [Internet]. 2013 [cited 2015 Feb 15]. Available from: http://apps.nccd.cdc.gov/brfss/

3. Gallup Healthways Well-Being Index. State of American Well-Being. 2014.

4. Ezzati M, Martin H, Skjold S, Hoorn SV, Murray CJL. Trends in national and state-level obesity in the USA after correction for self-report bias: analysis of health surveys. J R Soc Med. 2006; 99(5):250.

5. Hattori A, Sturm R. The obesity epidemic and changes in self-report biases in BMI. Obesity. 2013 Apr; 21(4):856–860. [PubMed: 23712990]

6. Merrill R, Richardson J. Validity of self-reported height, weight, and body mass index: findings from the National Health and Nutrition Examination Survey, 2001–2006. Prev Chronic Dis. 2009; 6(4). [PubMed: 19754997]

7. Stommel M, Schoenborn C. Accuracy and usefulness of BMI measures based on self-reported weight and height: findings from the NHANES & NHIS 2001–2006. BMC Public Health. 2009 Jan; 9:421. [PubMed: 19922675]

8. Gorber S, Tremblay M, Moher D, Gorber B. A comparison of direct vs. self-report measures for assessing height, weight and body mass index: a systematic review. Obes Rev. 2007 Jul; 8(4):307–326. [PubMed: 17578381]

9. Calle EE, Rodriguez C, Walker-Thurmond K, Thun MJ. Overweight, Obesity, and Mortality from Cancer in a Prospectively Studied Cohort of U.S. Adults. N Engl J Med. 2003; 348(17):1625–1638. [PubMed: 12711737]

10. De Gonzalez AB, Hartge P, Cerhan JR, Flint AJ, Hannan L, MacInnis RJ, et al. Body-Mass Index and Mortality among 1.46 Million White Adults. N Engl J Med. 2010; 363:2211–2219.

11. Masters RK, Reither EN, Powers D, Yang YC, Burger AE, Link BG. The Impact of Obesity on US Mortality Levels: The Importance of Age and Cohort Factors in Population Estimates. Am J Public Health. 2013 Oct; 103(10):1895–1901. [PubMed: 23948004]

12. Zajacova A, Burgard S. Shape of the BMI-Mortality Association by Cause of Death, Using Generalized Additive Models: NHIS 1986–2006. J Aging Health. 2012; 24:191–211. [PubMed: 21558491]

13. Park Y, Wang S, Kitahara CM, Moore SC, De Gonzalez AB, Bernstein L, et al. Body mass index and risk of death in Asian Americans. Am J Public Health. 2014; 104:520–525.

14. Keith SW, Fontaine KR, Pajewski NM, Mehta T, Allison DB. Use of self-reported height and weight biases the body mass index-mortality association. Int J Obes. Nature Publishing Group. 2011 Mar; 35(3):401–408.

15. National Center for Health Statistics. National Health and Nutrition Examination Survey: Analytic Guidelines, 1999–2010. Vital Heal Stat. 2013; 2(161).

16. National Center for Health Statistics. Plan and operation of the third National Health and Nutrition Examination Survey, 1988–94. Vital Heal Stat. 1994; 1(32).

17. National Center for Health Statistics. National Health and Nutrition Examination Survey: Plan and Operations, 1999–2010. Vital Heal Stat. 2010; 1(56):1999–2010.

18. National Center for Health Statistics. Hyattsville, MD: 2009. Office of Analysis and Epidemiology. The Third National Health and Nutrition Examination Survey (NHANES III) Linked Mortality File, mortality follow-up through 2006: Matching Methodology.

19. Preston SH, Stokes A. Obesity paradox: conditioning on disease enhances biases in estimating the mortality risks of obesity. Epidemiology. 2014 May; 25(3):454–461. [PubMed: 24608666]

20. Bender R, Jöckel KH, Trautner C, Spraul M, Berger M. Effect of age on excess mortality in obesity. JAMA. 1999; 281:1498–1504. [PubMed: 10227319]

21. Stevens J, Cai J, Pamuk E, Williamson D, Thun MJ, Wood J. The effect of age on the association between body-mass index and mortality. N Engl J Med. Mass Medical Soc. 1998; 338(1):1–7.

22. Whitlock G, Lewington S, Sherliker P, Clarke R, Emberson J, Halsey J, et al. Body-mass index and cause-specific mortality in 900 000 adults: collaborative analyses of 57 prospective studies. Lancet. Elsevier Ltd. 2009 Mar 28; 373:1083–1096.

23. Benichou J. A Review of Adjusted Estimators of Attributable Risk. Stat Methods Med Res. 2001 Jun; 10:195–216. [PubMed: 11446148]


24. Renehan AG, Leitzmann MF, Zwahlen M. Re: body mass index and risk of lung cancer among never, former, and current smokers. J Natl Cancer Inst. 2012 Nov 7; 104(21):1680–1681; author reply 1681. [PubMed: 22933730]

25. Carroll RJ, Ruppert D, Stefanski L, Crainiceanu C. Measurement Error in Nonlinear Models: A Modern Perspective. 2nd ed. Taylor & Francis; 2006.

26. Greene W, Cai J. Measurement error in covariates in the marginal hazards model for multivariate failure time data. Biometrics. 2004 Dec:987–996. [PubMed: 15606419]

27. Flegal K, Kit B, Orpana H, Graubard B. Association of all-cause mortality with overweight and obesity using standard body mass index categories: a systematic review and meta-analysis. JAMA. 2013 Jan 2; 309(1):71. [PubMed: 23280227]

28. Flegal KM, Kit BK, Graubard BI. Body mass index categories in observational studies of weight and risk of death. Am J Epidemiol. 2014; 180(4):288–296. [PubMed: 24893710]

29. Janssen I, Mark AE. Elevated body mass index and mortality risk in the elderly. Obes Rev. 2007; 8(17):41–59. [PubMed: 17212795]

30. Woodcock J, Franco OH, Orsini N, Roberts I. Non-vigorous physical activity and all-cause mortality: systematic review and meta-analysis of cohort studies. International Journal of Epidemiology. 2011; 40(1):121–138. [PubMed: 20630992]

31. Zhao G, Li C, Ford ES, Fulton JE, Carlson SA, Okoro CA, Wen XJ, Balluz LS. Leisure-time aerobic physical activity, muscle-strengthening activity and mortality risks among US adults: the NHANES linked mortality study. British Journal of Sports Medicine. 2014; 48(3):244–249. [PubMed: 24096895]

Appendix A

Digit preference in self reports

Many of the errors in reporting were associated with digit preference in reported weight. Figure A.1 shows the proportions reporting weights that ended in digits 0 or 5 compared to the proportions based on measured weight. Altogether, 69.3% of men and 62.7% of women reported a weight ending in 0 or 5, compared to 20.6% and 19.7% when measured weight (rounded to the nearest integer) was used. Of those who reported a weight ending in 0 or 5, 35.2% underreported their BMI by more than one unit while 9.8% overreported it by more than one unit. That downward bias was especially strong among women reporting a weight ending in 0 or 5 (40.1% versus 7.6%). Among people reporting a weight ending in another digit, 26.7% underreported their BMI by more than one unit and 9.9% overreported it. A downward bias was still present but was weaker. Digit preference on weight helps to explain the high frequency and generally small magnitude of error in reported BMI; many people evidently rounded their weight down to a value ending in 0 or 5.

Height reporting showed much less digit preference, perhaps because the normal range of height in inches is much smaller than the range of weights in pounds, and because repeated measurements typically identify a stable value of height.

Figure A.1: Proportion of population with weight ending in digits 5 or 0, self-reported vs. measured weight
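The digit-preference tabulation described in this appendix can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name `share_ending_in_0_or_5` and both weight lists are hypothetical, and the lists are made-up values rather than NHANES data. Measured weights are rounded to the nearest integer pound before the terminal digit is checked, as the appendix describes.

```python
def share_ending_in_0_or_5(weights_lb):
    """Fraction of weights whose nearest-integer value ends in the digit 0 or 5."""
    if not weights_lb:
        raise ValueError("weights_lb must not be empty")
    # Round half up to the nearest pound (weights are assumed non-negative);
    # an integer ends in 0 or 5 exactly when it is divisible by 5.
    hits = sum(1 for w in weights_lb if int(w + 0.5) % 5 == 0)
    return hits / len(weights_lb)


# Hypothetical illustration data (pounds), NOT values from the survey.
reported = [150, 165, 180, 172, 200, 135, 190, 148]
measured = [153.4, 167.8, 184.6, 171.2, 203.9, 137.1, 195.3, 149.6]

print(f"self-reported: {share_ending_in_0_or_5(reported):.3f}")
print(f"measured:      {share_ending_in_0_or_5(measured):.3f}")
```

If terminal digits were uniformly distributed, about 20% of weights would end in 0 or 5, which is close to the shares the appendix reports for measured weight.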


Appendix B

Sensitivity analysis of hazard models

We performed two analyses of the sensitivity of our hazard model results to changes in the model specification. The first repeats the analysis in Table 3 but excludes current smokers. The second includes all smoking categories and removes the 3-year delay between survey and observation of mortality. Results are presented in Tables B.1 and B.2.

Table B.1

Hazard ratios and confidence intervals from multivariate hazard models predicting mortality rates as a function of body mass index: Non-smokers only (never-smokers and former-smokers)

| | Self-reported BMI | Measured BMI |
|---|---|---|
| Continuous BMI>25 | 1.303 [1.191, 1.427] | 1.310 [1.213, 1.415] |
| N subjects | 9629 | 10071 |
| N Deaths; PY Lived | 942; 70678 | 986; 73929 |
| Categorical BMI: | | |
| Over | 0.862 [0.709, 1.049] | 0.829 [0.665, 1.034] |
| Obese 1 | 1.337 [1.055, 1.693] | 1.113 [0.887, 1.395] |
| Obese 2 | 1.868 [1.397, 2.498] | 1.676 [1.292, 2.174] |
| N subjects | 13181 | 13167 |
| N Deaths; PY Lived | 1263; 98506 | 1265; 98375 |

Models using categorized self-reported data excluded individuals with self-reported BMI < 18.5 kg/m² and models using categorized measured data excluded individuals with measured BMI < 18.5 kg/m². Models using continuous self-reported data excluded individuals with self-reported BMI < 25 kg/m² and models using continuous measured data excluded individuals with measured BMI < 25 kg/m². “Self-reported BMI” is BMI calculated based on self-reported weight and height.

The first three years of follow-up data are excluded.

Variables controlled in all regressions: sex, race/ethnicity (non-Hispanic Black, Hispanic, White and Other), educational attainment (less than high school, high school, more than high school), and an indicator for former smoker.

95% confidence intervals in brackets.

Hazard ratios for models with continuous BMI represent the change in the hazard of death associated with a five-unit increment in BMI.

Definitions of categories: Normal: 18.5 ≤ BMI < 25 (reference group); Over: 25 ≤ BMI < 30; Obese 1: 30 ≤ BMI < 35; Obese 2: BMI ≥ 35.

Table B.2

Hazard ratios and confidence intervals from multivariate hazard models predicting mortality rates as a function of body mass index. First three years of follow-up included.

| | Self-reported BMI | Measured BMI |
|---|---|---|
| Continuous BMI>25 | 1.308 [1.211, 1.413] | 1.287 [1.199, 1.381] |
| N subjects | 14504 | 15076 |
| N Deaths; PY Lived | 1705; 129659 | 1774; 134665 |
| Categorical BMI: | | |
| Over | 0.874 [0.761, 1.004] | 0.925 [0.792, 1.080] |
| Obese 1 | 1.335 [1.133, 1.573] | 1.215 [1.057, 1.397] |
| Obese 2 | 1.913 [1.548, 2.364] | 1.901 [1.556, 2.321] |
| N subjects | 20406 | 20352 |
| N Deaths; PY Lived | 2477; 186402 | 2468; 185614 |

Models using categorized self-reported data excluded individuals with self-reported BMI < 18.5 kg/m² and models using categorized measured data excluded individuals with measured BMI < 18.5 kg/m². Models using continuous self-reported data excluded individuals with self-reported BMI < 25 kg/m² and models using continuous measured data excluded individuals with measured BMI < 25 kg/m². “Self-reported BMI” is BMI calculated based on self-reported weight and height.

Variables controlled in all regressions: sex, race/ethnicity (non-Hispanic Black, Hispanic, White and Other), smoking status (current smoker, former smoker, never smoked), and educational attainment (less than high school, high school, more than high school).

95% confidence intervals in brackets.

Hazard ratios for models with continuous BMI represent the change in the hazard of death associated with a five-unit increment in BMI.

Definitions of categories: Normal: 18.5 ≤ BMI < 25 (reference group); Over: 25 ≤ BMI < 30; Obese 1: 30 ≤ BMI < 35; Obese 2: BMI ≥ 35.
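How much the Obese 1 hazard ratio is overstated under self-report can be compared across the three model specifications using only the printed point estimates. The sketch below is illustrative arithmetic on the published values in Tables 3, B.1 and B.2, not a re-estimation of the models; the helper name `relative_overstatement` is an assumption introduced here.

```python
# Obese 1 hazard ratios as printed: (self-reported BMI, measured BMI).
obese1_hr = {
    "Table 3 (main model)": (1.40, 1.25),
    "Table B.1 (non-smokers only)": (1.337, 1.113),
    "Table B.2 (first three years included)": (1.335, 1.215),
}


def relative_overstatement(hr_self, hr_measured):
    """Percent by which the self-report hazard ratio exceeds the measured one."""
    return 100.0 * (hr_self / hr_measured - 1.0)


overstatement = {
    name: relative_overstatement(hr_self, hr_measured)
    for name, (hr_self, hr_measured) in obese1_hr.items()
}

for name, pct in overstatement.items():
    print(f"{name}: {pct:+.1f}%")
```

In all three specifications the self-report estimate for Obese 1 is the larger of the two, so the direction of the discrepancy does not depend on the treatment of smokers or of early follow-up. The confidence intervals overlap substantially, so the size of the gap should be read with caution.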


Figure 1. Distribution of errors in BMI based on self-report

Data: NHANES III (1988–94) & NHANES Continuous, 1999–2008, nonpregnant U.S. adults aged 40–74 with valid data for self-reported height and weight, measured height and weight, education, and smoking status (n=12,932). All values are survey-weighted. BMI = Body Mass Index (kg/m²). “Self-reported BMI” is BMI calculated based on self-reported weight and height.


Table 1

Joint Distribution of Population by Self-Reported and Measured Body Mass Index

Rows give the measured class; columns give the self-reported class.

| Measured class | Under | Normal | Over | Obese 1 | Obese 2 | Total |
|---|---|---|---|---|---|---|
| Under | 0.74% | 0.54% | 0.00% | 0.02% | 0.00% | 1.29% |
| Normal | 0.33% | 28.08% | 2.31% | 0.02% | 0.00% | 30.74% |
| Over | 0.01% | 5.24% | 29.61% | 1.50% | 0.03% | 36.39% |
| Obese 1 | 0.00% | 0.09% | 5.43% | 13.22% | 0.61% | 19.35% |
| Obese 2 | 0.00% | 0.01% | 0.21% | 3.13% | 8.87% | 12.22% |
| Total | 1.08% | 33.94% | 37.56% | 17.90% | 9.52% | 100.00% |

Definitions of classes: Body mass index: Under: BMI < 18.5 kg/m²; Normal: 18.5 ≤ BMI < 25; Over: 25 ≤ BMI < 30; Obese 1: 30 ≤ BMI < 35; Obese 2: BMI ≥ 35. “Self-reported BMI” is BMI calculated based on self-reported weight and height.
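The share of adults placed in the wrong BMI class by self-report can be recovered from the Table 1 cells. The sketch below is a minimal check on the printed percentages, taking rows as measured class and columns as self-reported class; the variable names are illustrative and sums can differ from the printed totals in the second decimal because the cells are rounded.

```python
# Table 1 cell percentages. Rows = measured class, columns = self-reported class,
# both ordered Under, Normal, Over, Obese 1, Obese 2.
joint = [
    [0.74, 0.54, 0.00, 0.02, 0.00],
    [0.33, 28.08, 2.31, 0.02, 0.00],
    [0.01, 5.24, 29.61, 1.50, 0.03],
    [0.00, 0.09, 5.43, 13.22, 0.61],
    [0.00, 0.01, 0.21, 3.13, 8.87],
]
n = len(joint)

# Diagonal cells: self-reported class equals measured class.
agree = sum(joint[i][i] for i in range(n))
# Below the diagonal: self-reported class is lower than measured class.
reported_lower = sum(joint[i][j] for i in range(n) for j in range(i))
# Above the diagonal: self-reported class is higher than measured class.
reported_higher = sum(joint[i][j] for i in range(n) for j in range(i + 1, n))

print(f"correctly classified:     {agree:.2f}%")
print(f"self-report class lower:  {reported_lower:.2f}%")
print(f"self-report class higher: {reported_higher:.2f}%")
```

Roughly one in five adults falls off the diagonal, and cells in which the self-reported class is lower than the measured class outweigh the reverse by nearly three to one, which is the downward shift in the BMI distribution noted in the abstract.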


Table 2

Mean Body Mass Index and Mean Error in Reporting Weight and Height, by Self-Reported BMI class

| Class based on self-reported BMI | Mean BMI, self-reported (kg/m²) | Mean BMI, measured (kg/m²) | Mean Error: Weight (lb) | Mean Error: Height (in) |
|---|---|---|---|---|
| A) Men | | | | |
| Under | 17.78 | 18.03 | 2.26 | 1.10 |
| Normal | 22.98 | 23.20 | 1.46 | 0.64 |
| Over | 27.22 | 27.66 | −0.19 | 0.51 |
| Obese 1 | 32.09 | 32.61 | −0.71 | 0.44 |
| Obese 2 | 39.36 | 40.01 | −1.66 | 0.36 |
| B) Women | | | | |
| Under | 17.70 | 18.23 | −1.42 | 0.49 |
| Normal | 22.31 | 22.84 | −1.98 | 0.26 |
| Over | 27.29 | 28.17 | −3.40 | 0.33 |
| Obese 1 | 32.18 | 33.21 | −4.40 | 0.24 |
| Obese 2 | 40.34 | 41.30 | −4.59 | 0.13 |

Definitions of classes: Under: Body mass index (BMI) < 18.5 kg/m²; Normal: 18.5 ≤ BMI < 25; Over: 25 ≤ BMI < 30; Obese 1: 30 ≤ BMI < 35; Obese 2: BMI ≥ 35. Error is calculated as (Self-reported value) − (Measured value), so negative errors indicate under-reporting. “Self-reported BMI” is BMI calculated based on self-reported weight and height.
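Because BMI divides weight by height squared, under-reporting weight and over-reporting height both push self-reported BMI downward. The sketch below shows the size of the combined effect for one hypothetical person. The measured weight and height are made-up illustration values, and applying the Table 2 mean errors for women in the Obese 1 class to a single individual is an assumption made purely for illustration; the function name `bmi_from_us_units` is also introduced here.

```python
LB_TO_KG = 0.45359237  # kilograms per pound (exact by definition)
IN_TO_M = 0.0254       # metres per inch (exact by definition)


def bmi_from_us_units(weight_lb, height_in):
    """Body mass index in kg/m^2 from weight in pounds and height in inches."""
    if weight_lb <= 0 or height_in <= 0:
        raise ValueError("weight and height must be positive")
    return (weight_lb * LB_TO_KG) / (height_in * IN_TO_M) ** 2


# Hypothetical measured values for one woman (illustration only).
measured_weight_lb, measured_height_in = 190.0, 64.0
# Mean reporting errors for women, Obese 1 class, from Table 2
# (self-reported minus measured).
weight_error_lb, height_error_in = -4.40, 0.24

bmi_measured = bmi_from_us_units(measured_weight_lb, measured_height_in)
bmi_reported = bmi_from_us_units(
    measured_weight_lb + weight_error_lb,
    measured_height_in + height_error_in,
)
bmi_error = bmi_reported - bmi_measured

print(f"measured BMI:      {bmi_measured:.2f}")
print(f"self-reported BMI: {bmi_reported:.2f}")
print(f"error:             {bmi_error:+.2f}")
```

For this hypothetical person the two errors together lower BMI by about one unit, the same order of magnitude as the gap between the mean self-reported and mean measured BMI for that class in Table 2.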


Table 3

Hazard ratios and confidence intervals from multivariate hazard models predicting mortality rates as a function of body mass index

| | Self-reported BMI | Measured BMI |
|---|---|---|
| Continuous BMI>25 | 1.30 [1.20, 1.42] | 1.28 [1.19, 1.38] |
| N subjects | 11,962 | 12,430 |
| N Deaths; PY Lived | 1,377; 88,902 | 1,424; 92,303 |
| Categorical BMI: | | |
| Over | 0.91 [0.78, 1.06] | 0.93 [0.78, 1.11] |
| Obese 1 | 1.40 [1.18, 1.67] | 1.25 [1.07, 1.47] |
| Obese 2 | 1.98 [1.57, 2.51] | 1.90 [1.54, 2.35] |
| N subjects | 16,970 | 16,928 |
| N Deaths; PY Lived | 1,979; 128,616 | 1,974; 128,333 |

Models using categorized self-reported data excluded individuals with self-reported BMI < 18.5 kg/m² and models using categorized measured data excluded individuals with measured BMI < 18.5 kg/m². Models using continuous self-reported data excluded individuals with self-reported BMI < 25 kg/m² and models using continuous measured data excluded individuals with measured BMI < 25 kg/m².

The first three years of follow-up data are excluded.

Variables controlled in all regressions: sex, race/ethnicity (non-Hispanic Black, Hispanic, White and Other), smoking status (never, former, current), and educational attainment (less than high school, high school, more than high school).

95% confidence intervals in brackets.

Hazard ratios for models with continuous BMI represent the change in the hazard of death associated with a five-unit increment in BMI.

Definitions of categories: Normal: 18.5 ≤ BMI < 25 (reference group); Over: 25 ≤ BMI < 30; Obese 1: 30 ≤ BMI < 35; Obese 2: BMI ≥ 35. “Self-reported BMI” is BMI calculated based on self-reported weight and height.
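The continuous-BMI hazard ratios are scaled to a five-unit increment, so the per-unit hazard ratio is the fifth root of the printed value. The sketch below performs that conversion and also backs out an approximate standard error of the log hazard ratio from the printed interval. The second step assumes the interval is a Wald-type interval symmetric on the log scale with a normal critical value, which the table does not state (survey-weighted models may use a different critical value), so the result is only a rough guide. Function names are hypothetical.

```python
import math


def per_unit_hr(hr_per_five_units):
    """Hazard ratio per one BMI unit implied by a hazard ratio per five units."""
    return hr_per_five_units ** (1.0 / 5.0)


def approx_log_se(ci_lower, ci_upper, z=1.959964):
    """Approximate SE of log(HR), ASSUMING a symmetric Wald interval on the log scale."""
    return (math.log(ci_upper) - math.log(ci_lower)) / (2.0 * z)


# Continuous BMI > 25 rows of Table 3: hazard ratio and 95% CI per five BMI units.
hr_self, ci_self = 1.30, (1.20, 1.42)
hr_meas, ci_meas = 1.28, (1.19, 1.38)

hr1_self = per_unit_hr(hr_self)
hr1_meas = per_unit_hr(hr_meas)
se_self = approx_log_se(*ci_self)
se_meas = approx_log_se(*ci_meas)

print(f"per-unit HR, self-reported: {hr1_self:.4f} (log-scale SE per 5 units ~ {se_self:.4f})")
print(f"per-unit HR, measured:      {hr1_meas:.4f} (log-scale SE per 5 units ~ {se_meas:.4f})")
```

On the per-unit scale the two estimates are close (each unit of BMI above 25 is associated with roughly a 5% higher hazard under either measure), consistent with the small bias reported for the continuous specification.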


Table 4

Age-Standardized Death Rates by BMI Category, Self-Reported and Measured BMI

(Death rate per 1,000 person-years)

| Category | Self-Reported BMI | Measured BMI | Error in Self-Report |
|---|---|---|---|
| Normal (18.5 ≤ BMI < 25) | 12.30 | 12.22 | 0.7% |
| Overweight (25 ≤ BMI < 30) | 10.92 | 11.42 | −4.4% |
| Obese 1 (30 ≤ BMI < 35) | 15.26 | 13.62 | 12.0% |
| Obese 2 (BMI ≥ 35) | 17.88 | 17.98 | 0.6% |

All values are survey-weighted. Individuals were censored at age 80.

Error = 100 × [(death rate using self-reported BMI) − (death rate using measured BMI)] / (death rate using measured BMI).

Negative errors indicate that the mortality rate among individuals in a given category based on self-reported height and weight was lower than the mortality rate among individuals in the same category based on measured height and weight.

The self-reported "Normal" category includes individuals whose measured BMI was < 18.5 but whose self-reported BMI was normal. Similarly, the measured "Normal" category includes individuals whose self-reported BMI was < 18.5 but whose measured BMI was normal. “Self-reported BMI” is BMI calculated based on self-reported weight and height.

Mortality rates are standardized to the age distribution of the population aged 40 to 80 in the 2000 U.S. Census.
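The error column can be recomputed from the two rate columns using the formula in the table note. The sketch below does this on the printed, rounded rates; the function name `pct_error` is an assumption introduced here. For Obese 2 the rounded rates give a small negative value, whereas the table prints 0.6%; the difference may reflect rounding of the underlying rates or a dropped sign, and the source does not say which.

```python
# Age-standardized death rates per 1,000 person-years from Table 4:
# (rate using self-reported BMI, rate using measured BMI).
rates = {
    "Normal": (12.30, 12.22),
    "Overweight": (10.92, 11.42),
    "Obese 1": (15.26, 13.62),
    "Obese 2": (17.88, 17.98),
}


def pct_error(rate_self, rate_measured):
    """Percent error of the self-report death rate relative to the measured one."""
    return 100.0 * (rate_self - rate_measured) / rate_measured


errors = {category: pct_error(*pair) for category, pair in rates.items()}

for category, err in errors.items():
    print(f"{category:<10} {err:+.1f}%")
```

The Obese 1 category stands out: its death rate is about 12% higher when the category is defined by self-report, while the other categories differ by a few percent or less.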


Table 5

Population Attributable Fractions (PAF) for Obesity

| | Self-reported BMI | Measured BMI | Discrepancy |
|---|---|---|---|
| Continuous BMI | 0.1651 | 0.1792 | −7.86% |
| Categorical BMI | 0.0838 | 0.0894 | −6.29% |

Discrepancy = [(PAF based on self-reports) − (PAF based on measured)] / (PAF based on measured), so negative values indicate that the PAF when using self-reports is less than the PAF when using measured height and weight. Results of hazard models used in PAF calculations are shown in Table 3 above. “Self-reported BMI” is BMI calculated based on self-reported weight and height.
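The discrepancy column follows directly from the definition in the table note. The sketch below applies that definition to the printed PAF values; the function name `discrepancy_pct` is introduced here for illustration. Because the PAFs are printed to four decimals, the recomputed percentages can differ from the printed ones in the second decimal.

```python
# Population attributable fractions from Table 5:
# (PAF using self-reported BMI, PAF using measured BMI).
paf = {
    "Continuous BMI": (0.1651, 0.1792),
    "Categorical BMI": (0.0838, 0.0894),
}


def discrepancy_pct(paf_self, paf_measured):
    """Relative difference of the self-report PAF from the measured PAF, in percent."""
    return 100.0 * (paf_self - paf_measured) / paf_measured


discrepancy = {name: discrepancy_pct(*pair) for name, pair in paf.items()}

for name, pct in discrepancy.items():
    print(f"{name}: {pct:+.2f}%")
```

Both values are negative: even where self-report inflates a category hazard ratio, the downward shift in the self-reported BMI distribution leaves fewer deaths attributed to excess weight.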
