What Makes Data Analysis GOOD: Ethics & Accountability
Jonathan Gelfond MD PhD
Dept of Epidemiology & Biostatistics
UT Health Science Center San Antonio, Texas
Objectives
• Describe why incorrect analyses are harmful to science and society
• Discuss the connections among professional ethics, data quality, and statistical analyses
• List key elements of ethical data analyses
• Describe the role of accountability in data analysis
Translational Research
• Perform meaningful research
• Accelerate the impact of scientific discovery
• Broaden the role of policy makers & basic scientists to improve the human condition
• Requires evidence-based practice
Most Novel Drugs Don’t Work!
Kola I, Landis J. Can the pharmaceutical industry reduce attrition rates? Nature Reviews Drug Discovery 2004; 3: 711-715.
Why we rely on testing interventions, not our intuition!
Roles of Statistical Analysis
• Construct predictive & validating models of scientific theories
• Guard against the use of ineffective or harmful interventions
• Compare effectiveness of new vs. old policies
• Monitor for patterns of harms from current policies
Statistical Analysis Is Critical in Translational Research
1. Incorrect stats could harm study participants
2. Invalid stats may squander resources
3. Poor stats diminish the scientific value of resultant discoveries
4. Poor stats promote public distrust in research
5. Poor analysis diminishes the good of improving care
6. Poor stats harm the reputation of the statistical profession
Scientific Process & Levels of Reproducibility
[Figure: scientific study process (Design & Planning, Conducting Study, Data Collection, Data Analysis, Numerical Results, Publication) annotated with levels of reproducibility (Reproducible Experiment, Reproducible Analysis, Reproducible Publication), the analysis input, procedure, and output, and the scope of internal and external validity]
Isn’t Science Self-correcting?
• Testing by experiment replication is
  ◦ Impossible without details
  ◦ Expensive
  ◦ Unlikely because there is so much research
  ◦ Relatively inefficient
• It is much better not to publish or propagate errors in the first place
Ioannidis, J. P. A. Why Science Is Not Necessarily Self-Correcting. Perspect Psychol Sci 7, 645-654, doi:10.1177/1745691612464056 (2012).
Types of Statistical Errors
• Unavoidable risks
  ◦ Type I Error
  ◦ Type II Error
• Unethical Errors
  ◦ Intentional misrepresentation
  ◦ Errors of neglect
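Type I and Type II errors are "unavoidable" in the sense that they are long-run rates built into any test. A minimal simulation sketch makes this concrete. It assumes, purely for illustration, a two-sample z-test with a known standard deviation of 1, 30 participants per group, and a true effect of 0.5 SD when the null is false; none of these settings come from the talk.

```python
import math
import random


def z_test_p_value(x, y, sigma=1.0):
    """Two-sided p-value for H0: mean(x) == mean(y), assuming a known common SD sigma."""
    n, m = len(x), len(y)
    diff = sum(x) / n - sum(y) / m
    se = sigma * math.sqrt(1 / n + 1 / m)
    z = diff / se
    # P(|Z| >= |z|) for a standard normal Z equals erfc(|z| / sqrt(2))
    return math.erfc(abs(z) / math.sqrt(2))


def rejection_rate(true_diff, n=30, alpha=0.05, reps=5000, seed=1):
    """Fraction of simulated studies in which H0 is rejected at level alpha."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        x = [rng.gauss(true_diff, 1.0) for _ in range(n)]
        y = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if z_test_p_value(x, y) < alpha:
            rejections += 1
    return rejections / reps


# H0 true (no difference): every rejection is a Type I error
type_i = rejection_rate(true_diff=0.0)
# H0 false (difference of 0.5 SD): every failure to reject is a Type II error
type_ii = 1 - rejection_rate(true_diff=0.5)
print(f"Estimated Type I error rate:  {type_i:.3f}")
print(f"Estimated Type II error rate: {type_ii:.3f}")
```

The Type I rate should land near the nominal 0.05, while the Type II rate for this modest effect and sample size is roughly one half. Neither error is unethical in itself; the unethical errors on this slide are the ones an analyst could have prevented.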
Ethical Standards for Statistics
• Atul Gawande surgical checklist
  ◦ Execution of key elements is highly effective
• What are the critical elements of “good” data analyses?
• Good analyses must have several critical elements:
  ◦ Parsimonious Model Selection
  ◦ Appropriate Study Design
  ◦ Quantification of Evidence
  ◦ Avoidance of Misinterpretation
  ◦ Verification of Assumptions
  ◦ Research Timeline
  ◦ Objectivity
  ◦ Multidisciplinary Expertise (Statistical & Scientific)
  ◦ Openness & Transparency
  ◦ Accuracy of Data & Computation
Gelfond J, Heitman E, Pollock B, Klugman C: Principles for the Ethical Analysis of Clinical and Translational Research. Statistics in Medicine 2011, 30(23):2785-2792.
Ethical Guidelines
Professional Ethics
• Should nonstatisticians perform analyses?
• Should a layperson perform an appendectomy?
• Legal/Regulatory Problems
• Problems of Competence
• Resultant Harms
• Specific Fiduciary Concerns
Ethics and Data Analysis
• Statistics is a profession with its own codes of ethics (e.g., ASA & international guidelines)
• The ethical codes of a profession should be adhered to by all practitioners
• Basic standards of statistical practice should also be adhered to
Statistical Accreditation of Individual Professionals
• Exists in Australia, Canada, and the United Kingdom
• Enacted by the American Statistical Association:
  ◦ Advanced knowledge & training
  ◦ Track record of competence & expertise
  ◦ Effective communicator
  ◦ Assent to ethical standards
AMSTAT News, June 2010, p11
Need for Expertise
“I will not use the knife, not even on sufferers from stone, but will withdraw in favor of such [people] as are engaged in this work.”
Edelstein L. The Hippocratic Oath: text, translation and interpretation. In: Temkin O, Temkin CL, eds. Ancient medicine: selected papers of Ludwig Edelstein. Baltimore: Johns Hopkins University Press, 1967:3-64.
Need for Statistical Expertise
“The experiment should be conducted only by scientifically qualified persons. The highest degree of skill and care should be required through all stages of the experiment of those who conduct or engage in the experiment.”
Nuremberg Code # 8
Need for Expertise
• Scientists and statisticians should work in harmony
• The complexity of statistical science is expanding
• The analyst needs authenticity
  ◦ Understand & believe in the methods & results
• Effective scrutiny requires knowledge
  ◦ “The weight of evidence for an extraordinary claim must be proportioned to its strangeness.” (Laplace)
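One common way to read Laplace's maxim is through Bayes' theorem in odds form: posterior odds equal prior odds multiplied by the Bayes factor, so a claim with a tiny prior probability needs far stronger evidence to become credible. The sketch below is an illustration of that reading only; the prior probabilities and the Bayes factor of 10 are hypothetical numbers chosen for contrast.

```python
def posterior_probability(prior_prob, bayes_factor):
    """Posterior probability of a claim from its prior probability and a Bayes factor.

    Posterior odds = prior odds * Bayes factor (Bayes' theorem in odds form).
    """
    if not 0 < prior_prob < 1:
        raise ValueError("prior_prob must be strictly between 0 and 1")
    prior_odds = prior_prob / (1 - prior_prob)
    posterior_odds = prior_odds * bayes_factor
    return posterior_odds / (1 + posterior_odds)


# The same evidence (Bayes factor of 10) applied to an ordinary and a "strange" claim
ordinary = posterior_probability(0.5, 10)         # about 0.91
extraordinary = posterior_probability(0.001, 10)  # still below 0.01
print(f"{ordinary:.3f} {extraordinary:.4f}")
```

With a prior of 0.001, the evidence would need a Bayes factor of about 999 just to bring the claim to even odds, which is one way of quantifying "proportioned to its strangeness."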
Statistics for Researchers (not just statisticians)
… Where is this division of labor to end? and what object does it finally serve? No doubt another may also think for me; but it is not therefore desirable that he should do so to the exclusion of my thinking for myself.
Henry Thoreau, Walden
Chefs may cook for you.
Statisticians may analyze your data.
In the end, no one can think for us.
Hence, we must learn statistics even if we collaborate.
Objectivity
• Statistical analysis is beautiful because it is an objective measure of truth
• Statistical analyses contain subjective decisions, and these should be stated
• Develop the analytical plan prior to seeing the data
• How can objectivity be identified?
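One way to show that an analytical plan predates the data is to fingerprint the plan text with a cryptographic hash at the time it is written. This is a minimal sketch under stated assumptions: the plan contents are hypothetical, and a timestamp you store yourself proves little to outsiders, so in practice the hash (or the plan itself) would be deposited with an independent party such as a public registry.

```python
import hashlib
from datetime import datetime, timezone


def register_plan(plan_text):
    """Fingerprint an analysis plan before any outcome data are examined."""
    digest = hashlib.sha256(plan_text.encode("utf-8")).hexdigest()
    return {"sha256": digest, "recorded_at_utc": datetime.now(timezone.utc).isoformat()}


def plan_unchanged(plan_text, record):
    """True only if the plan text is byte-for-byte identical to the registered version."""
    return hashlib.sha256(plan_text.encode("utf-8")).hexdigest() == record["sha256"]


# Hypothetical pre-specified plan
plan = """Primary outcome: systolic blood pressure at 12 weeks.
Test: two-sample t-test, two-sided, alpha = 0.05.
Covariates: none (unadjusted primary analysis)."""
record = register_plan(plan)
print(record["sha256"][:16], record["recorded_at_utc"])
```

Any later edit to the plan, even a single added outcome, changes the hash, so deviations from the pre-specified analysis become visible and must be stated.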
Element 4 Openness and Transparency: All relevant data and analyses must be presented.
http://victrixmedia.com.au/
Openness & Transparency
• Within the bounds of Privacy Concerns!
• Disclosure of Conflicts & Competing interests
• Report all analyses (published and unpublished)
• Vioxx
  ◦ Suppressed reporting of cardiovascular events
  ◦ Vioxx associated with >7,000 CVD cases
Curfman, G. D., Morrissey, S. & Drazen, J. M. Expression of concern reaffirmed. N. Engl. J. Med. 354, 1193-1193 (2006).
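Reporting all analyses matters because selectively reporting the "significant" ones inflates false positives. A minimal sketch, assuming k independent tests whose null hypotheses are all true (a simplification, and not a model of the Vioxx case): the chance of at least one p < alpha is 1 - (1 - alpha)^k, and a simulation that reports only the smallest of k null p-values agrees.

```python
import random


def familywise_error(alpha, k):
    """P(at least one p < alpha) among k independent tests when every null hypothesis is true."""
    return 1 - (1 - alpha) ** k


def simulate_selective_reporting(k=20, alpha=0.05, reps=10_000, seed=7):
    """Fraction of simulated projects that can report at least one 'significant' result.

    Under a true null, a continuous test's p-value is Uniform(0, 1), so each
    project draws k uniform p-values and reports only the smallest one.
    """
    rng = random.Random(seed)
    hits = sum(min(rng.random() for _ in range(k)) < alpha for _ in range(reps))
    return hits / reps


print(f"Exact FWER, 20 tests:        {familywise_error(0.05, 20):.3f}")
print(f"Simulated FWER, 20 tests:    {simulate_selective_reporting():.3f}")
print(f"With Bonferroni (0.05 / 20): {familywise_error(0.05 / 20, 20):.3f}")
```

With 20 null analyses the chance of at least one spurious "finding" is about 0.64; testing each at 0.05/20 (Bonferroni) brings it just under 0.05. Readers can only make such an adjustment if every analysis is disclosed.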
“The [person] of science has learned to believe in justification, not by faith, but by verification.”
Thomas H. Huxley (1825–95), English biologist. On the Advisableness of Improving Natural Knowledge, 1866.
Assessment of Assumptions
• All statistical methods have assumptions
  ◦ Independence, normality, equal variance
• These assumptions must be checked
• Testing assumptions is more difficult than implementing methods
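The three assumptions named above can at least be screened with simple summary statistics before a two-group comparison. This is a rough sketch, not a substitute for formal tests (e.g., Shapiro-Wilk or Levene) and residual plots; the cutoffs (variance ratio 4, |skewness| 1, |lag-1 autocorrelation| 0.5) are assumed rules of thumb, and the autocorrelation check is only meaningful when the values are in collection order.

```python
import statistics


def variance_ratio(a, b):
    """Ratio of the larger to the smaller sample variance (1.0 means equal)."""
    va, vb = statistics.variance(a), statistics.variance(b)
    return max(va, vb) / min(va, vb)


def sample_skewness(x):
    """Moment-based sample skewness g1; values far from 0 suggest non-normality."""
    n = len(x)
    m = statistics.fmean(x)
    m2 = sum((v - m) ** 2 for v in x) / n
    m3 = sum((v - m) ** 3 for v in x) / n
    return m3 / m2 ** 1.5


def lag1_autocorrelation(x):
    """Lag-1 autocorrelation of values in collection order; large |r| hints at dependence."""
    m = statistics.fmean(x)
    num = sum((x[i] - m) * (x[i + 1] - m) for i in range(len(x) - 1))
    den = sum((v - m) ** 2 for v in x)
    return num / den


def check_assumptions(a, b, ratio_limit=4.0, skew_limit=1.0, acf_limit=0.5):
    """Return warnings for a two-group comparison; the limits are rough rules of thumb."""
    warnings = []
    if variance_ratio(a, b) > ratio_limit:
        warnings.append("unequal variances")
    for name, x in (("a", a), ("b", b)):
        if abs(sample_skewness(x)) > skew_limit:
            warnings.append(f"group {name}: skewed")
        if abs(lag1_autocorrelation(x)) > acf_limit:
            warnings.append(f"group {name}: serial correlation")
    return warnings


# Hypothetical data: group a drifts upward over collection time
print(check_assumptions(list(range(1, 11)), [1, 2] * 5))
```

The functions assume non-constant data (a zero variance would cause a division by zero), which is itself worth checking first.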
When good data goes bad
• Statistical formulae don’t care about data quality
• ALL kinds of data have errors
• Data quality validation
• How do we detect these errors?
• What statistics are useful for detecting errors?
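A few simple checks catch many data errors before any model is fit: duplicate identifiers, missing values, values outside a plausible range, and univariate outliers. The sketch below is one possible approach; the field names, plausible ranges, and records are hypothetical, and Tukey's fences (1.5 × IQR beyond the quartiles) are a conventional screen, not a rule for deleting data.

```python
import statistics


def validate_records(records, valid_ranges):
    """Flag duplicate IDs, missing values, and out-of-range values.

    records: list of dicts with an "id" key; valid_ranges: {field: (low, high)}.
    Returns a list of (row_index, problem) tuples.
    """
    problems = []
    seen_ids = set()
    for i, rec in enumerate(records):
        rec_id = rec.get("id")
        if rec_id in seen_ids:
            problems.append((i, "duplicate id"))
        seen_ids.add(rec_id)
        for field, (low, high) in valid_ranges.items():
            value = rec.get(field)
            if value is None:
                problems.append((i, f"missing {field}"))
            elif not low <= value <= high:
                problems.append((i, f"{field} out of range"))
    return problems


def iqr_outliers(values, k=1.5):
    """Values beyond k * IQR outside the quartiles (Tukey's fences)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return [v for v in values if v < q1 - k * iqr or v > q3 + k * iqr]


# Hypothetical records
records = [
    {"id": 1, "age": 34, "sbp": 120},
    {"id": 2, "age": 250, "sbp": 118},   # implausible age
    {"id": 2, "age": 41, "sbp": None},   # duplicate id and a missing value
]
print(validate_records(records, {"age": (0, 120), "sbp": (60, 260)}))
print(iqr_outliers([10, 11, 12, 13, 14, 15, 16, 100]))
```

Flagged rows should be traced back to the source documents and corrected or explained, not silently dropped.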
Accuracy
• No Fabrication of Data
• Errors by neglect are unethical
• Substandard practices promote errors
  ◦ Use of Excel
  ◦ Use of point-and-click software
  ◦ e.g., published p-values are often wrong
• Reproducibility in Statistics
  ◦ Not the same as reproducibility in Science
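One inexpensive check on a reported p-value is to recompute it from the reported test statistic and compare, allowing for rounding. This sketch assumes the statistic is a standard normal z; t, F, or chi-square statistics would need their own distributions (for example via a statistics library). The reported results are hypothetical.

```python
import math


def two_sided_p_from_z(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def p_value_consistent(z_reported, p_reported, decimals):
    """True if the reported p agrees with the recomputed p up to rounding.

    decimals: number of decimal places the p-value was reported with.
    """
    tolerance = 0.5 * 10 ** (-decimals)
    return abs(two_sided_p_from_z(z_reported) - p_reported) <= tolerance


# Hypothetical reported results: (z statistic, p-value, decimals shown)
reported = [(1.96, 0.05, 2), (2.58, 0.01, 2), (1.50, 0.03, 2)]
for z, p, d in reported:
    status = "ok" if p_value_consistent(z, p, d) else "INCONSISTENT"
    print(f"z = {z:.2f}, reported p = {p}: {status}")
```

Because the check is scripted rather than done by hand, it can be rerun on every table of a manuscript, which is exactly where point-and-click workflows tend to let errors slip through.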
Accuracy
• Example: Clinical trial involving a gene expression predictor
• Independent bioinformaticists show that the gene names were scrambled (!)
  ◦ Data from a particular gene upholds the original investigator’s hypothesis (?)
• Internal investigation by Duke finds no impropriety
• NYT & others reveal that the investigator had a pattern of misrepresentation on the CV
Coombes, K. R., Wang, J. & Baggerly, K. A. Microarrays: retracing steps. Nature Medicine 13, 1276-1277 (2007).
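The slide does not describe how the gene names became scrambled, so the following is only a general safeguard, not a reconstruction of that case: join measurements to labels through a shared identifier rather than by row position, and stop with an error when the identifiers disagree. The probe IDs, values, and gene labels are hypothetical.

```python
def merge_by_id(values_by_id, labels_by_id):
    """Pair each measurement with its label through a shared identifier.

    Raises instead of guessing when the two tables disagree on identifiers,
    so a dropped, duplicated, or shifted row cannot silently relabel the data.
    """
    only_in_one = set(values_by_id) ^ set(labels_by_id)
    if only_in_one:
        raise ValueError(f"identifiers found in only one table: {sorted(only_in_one)}")
    return {key: (labels_by_id[key], values_by_id[key]) for key in values_by_id}


# Hypothetical probe IDs, expression values, and gene labels
values = {"probe_1": 2.5, "probe_2": 0.3, "probe_3": 1.1}
labels = {"probe_3": "GENE_C", "probe_1": "GENE_A", "probe_2": "GENE_B"}
print(merge_by_id(values, labels))

# Pairing by row position instead silently mislabels when the row orders differ
positional = dict(zip(labels.values(), values.values()))
print(positional)  # GENE_C gets probe_1's value
```

The positional version runs without complaint, which is the danger: nothing in the output signals that every label is attached to the wrong measurement.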
Peng RD (2008). "Caching and distributing statistical analyses in R," Journal of Statistical Software, 26 (7), 1-24.
Schematic of Reproducible Research
http://devnet.jetbrains.net/thread/304042
[Figure: schematic linking Raw Data to Published Results and Unpublished Results]
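A reproducible analysis ties every result, published or not, back to the exact raw data and settings that produced it. The sketch below is loosely in the spirit of the caching idea described by Peng (2008), but it is a hypothetical Python illustration and does not reproduce that R package's interface: results are stored under a SHA-256 fingerprint of the data plus the analysis parameters. The trimmed-mean "analysis" is a placeholder.

```python
import hashlib
import json
import statistics


def fingerprint(raw_data, params):
    """SHA-256 of the raw data together with the analysis parameters."""
    payload = json.dumps({"data": raw_data, "params": params}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def analyze(raw_data, params):
    """Hypothetical analysis step: a trimmed mean that drops `trim` values from each end."""
    trim = params["trim"]
    kept = sorted(raw_data)[trim:len(raw_data) - trim]
    return {"n": len(kept), "mean": statistics.fmean(kept)}


_results_cache = {}


def run_analysis(raw_data, params):
    """Return (fingerprint, result), reusing the stored result for identical inputs."""
    key = fingerprint(raw_data, params)
    if key not in _results_cache:
        _results_cache[key] = analyze(raw_data, params)
    return key, _results_cache[key]


data = [3, 1, 4, 1, 5, 9, 2, 6]
key, result = run_analysis(data, {"trim": 1})
print(key[:12], result)
```

Any change to the data or the parameters yields a new fingerprint, so a reader holding the raw data can confirm which inputs a reported number came from. The analysis code itself is not covered by the hash and would need to be versioned separately.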
Appropriate Design & Sample Size
“The experiment should be such as to yield fruitful results for the good of society.”
Nuremberg Code # 2
Appropriate Design & Sample Size
• Poorly designed experiments have little positive value and are seldom recoverable by statistical analyses
• Too few patients make a study noninformative & inefficient
• Too many patients may cause excessive harm
• Statistical monitoring plans are required
  ◦ TGN1412
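Balancing "too few" against "too many" patients usually starts with a sample size calculation. A minimal sketch using the standard normal-approximation formula for comparing two means with a two-sided test; the effect size, standard deviation, alpha, and power below are hypothetical inputs, and t-based calculations or dedicated software typically add a participant or two.

```python
import math
from statistics import NormalDist


def n_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Per-group sample size for a two-sided, two-sample comparison of means.

    Normal approximation: n = 2 * (z_{1-alpha/2} + z_{power})^2 * sigma^2 / delta^2,
    rounded up to the next whole participant.
    """
    z = NormalDist()  # standard normal
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2)


print(n_per_group(delta=0.5, sigma=1.0))  # 63 per group
print(n_per_group(delta=1.0, sigma=1.0))  # 16 per group
```

Halving the detectable effect roughly quadruples the required sample, which is why the assumed effect size deserves as much scrutiny as the formula.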
Element 8 Parsimonious Model Selection: Weighed Against Precision, Bias, and Validity
William of Ockham, 1287-1347
Parsimonious Model Selection
“The aim of science is to seek the simplest explanation of complex facts.”
Alfred North Whitehead (1861–1947), English mathematician and philosopher. The Concept of Nature, p. 163. 1919.
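One common way to weigh goodness of fit against model complexity is Akaike's information criterion (AIC), which adds a penalty of 2 per estimated parameter; lower is better. The sketch below compares an intercept-only model with a straight-line fit on hypothetical, nearly linear data. AIC is offered as one illustrative criterion, not as the procedure prescribed in this talk, and parsimony still has to be weighed against precision, bias, and validity.

```python
import math


def rss_intercept_only(y):
    """Residual sum of squares when every y is predicted by the sample mean."""
    m = sum(y) / len(y)
    return sum((v - m) ** 2 for v in y)


def rss_linear(x, y):
    """Residual sum of squares for an ordinary least-squares line y = a + b*x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))


def gaussian_aic(rss, n, k):
    """AIC (up to a constant shared by all models) for a Gaussian model.

    k counts the mean parameters; one more is added for the error variance.
    """
    return n * math.log(rss / n) + 2 * (k + 1)


# Hypothetical, nearly linear data
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0, 13.8, 16.2]
n = len(y)
aic_mean = gaussian_aic(rss_intercept_only(y), n, k=1)
aic_line = gaussian_aic(rss_linear(x, y), n, k=2)
print(f"AIC intercept-only: {aic_mean:.1f}, AIC linear: {aic_line:.1f}")
```

Here the extra slope parameter earns its penalty many times over; on data with no trend, the simpler intercept-only model would usually win.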