To Explain or To Predict? Predictive Analytics in IS Research
3rd Taiwan Summer Workshop on Information Management
July 2015
Galit Shmuéli
Aug 13, 2015
Galit Shmueli (徐茉莉)
www.galitshmueli.com
❶ 1994-2000 Israel Institute of Technology: MSc + PhD, Statistics
❷ 2000-2002 Carnegie Mellon Univ.: Visiting Assistant Prof., Dept. of Statistics
❸ 2002-2012 Univ. of Maryland, College Park: Assistant then Associate Prof. of Statistics & Management Science, R H Smith School of Business
2008-2014 Rigsum Institute (Bhutan): Co-Director, Rigsum Research Lab
❹ 2011-2014 Indian School of Business: SRITNE Chaired Prof. of Data Analytics, Associate Prof. of Statistics & Info Systems
2014-… NTHU, Institute of Service Science: Director, Center for Service Innovation & Analytics
Research in Data Analytics
www.galitshmueli.com
• Statistical strategy
• ‘Entrepreneurial’ statistical & data mining modeling (new conditions & environments)
• Business analytics
In progress…
Road Map
Definitions
Explanatory-dominated MIS
Explanatory modeling ≠ predictive modeling
Why?
Different modeling paths
Explanatory power vs. predictive power
How do I use this?
Definitions
Explanatory modeling: Theory-based, statistical testing of causal hypotheses
Explanatory power: Strength of relationship in statistical model
Definitions
Predictive modeling: Empirical method for predicting new observations
Predictive power: Ability to accurately predict new observations
Statistical modeling in MIS research
Purpose: test causal theory (“explain”)
Association-based statistical models
Prediction nearly absent
Explanatory modeling à la MIS
1. Start with a causal theory
2. Generate causal hypotheses on constructs
3. Operationalize constructs → measurable variables
4. Fit statistical model
5. Statistical inference → causal conclusions
“Empirical prediction alone is un-scientific”
Some statisticians share this view:
The two goals in analyzing data... I prefer to describe as “management” and “science”. Management seeks profit... Science seeks truth.
- Parzen, Statistical Science 2001
Prediction in top research journals in Information Systems
Predictive goal?
Predictive modeling?
Predictive assessment?
1990-2006
Why Predict? for Scientific Research
generate new theory
develop measures
compare theories
improve theory
assess relevance
evaluate predictability
Shmueli & Koppius, “Predictive Analytics in IS Research”, MIS Quarterly, 2011
“A good explanatory model will also predict well”
“You must understand the underlying causes in order to predict”
Philosophy of Science
“Explanation and prediction have the same logical structure”
Hempel & Oppenheim, 1948
“It becomes pertinent to investigate the possibilities of predictive procedures autonomous of those used for explanation”
Helmer & Rescher, 1959
“Theories of social and human behavior address themselves to two distinct goals of science: (1) prediction and (2) understanding”
Dubin, Theory Building, 1969
Explanatory Model: Test/quantify causal effect for “average” record in population
Predictive Model: Predict new individual observations
Different Scientific Goals
Different generalization
Notation
Theoretical constructs: $\mathcal{X}, \mathcal{Y}$
Causal theoretical model: $\mathcal{Y} = \mathcal{F}(\mathcal{X})$
Measurable variables: $X, Y$
Statistical model: $E(Y) = f(X)$
Four aspects
1. Theory – Data
2. Causation – Association
3. Retrospective – Prospective
4. Bias – Variance
$\mathcal{Y} = \mathcal{F}(\mathcal{X})$
$E(Y) = f(X)$
“The goal of finding models that are predictively accurate differs from the goal of finding models that are true.”
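The bias–variance aspect is commonly formalized by the standard decomposition of expected squared prediction error. The following is a sketch under assumptions that are not on the slides: a new observation is generated as $Y = f(x) + \varepsilon$ with $E(\varepsilon) = 0$ and $\mathrm{Var}(\varepsilon) = \sigma^2$, and $\hat{f}$ denotes the model estimated from data.

```latex
\mathrm{EPE}(x)
  = E\!\left[\big(Y - \hat{f}(x)\big)^2\right]
  = \underbrace{\sigma^2}_{\text{irreducible noise}}
  + \underbrace{\big(E[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathrm{Var}\big(\hat{f}(x)\big)}_{\text{estimation variance}}
```

Read this way, explanatory modeling puts the weight on driving the bias term down so that $f$ represents $\mathcal{F}$ faithfully, whereas predictive modeling aims at the sum of bias and variance, so a "wrong" but more stable model can predict better than the "true" one.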
Predict ≠ Explain
“we tried to benefit from an extensive set of attributes describing each of the movies in the dataset. Those attributes certainly carry a significant signal and can explain some of the user behavior. However… they could not help at all for improving the [predictive] accuracy.”
Bell et al., 2008
Explain ≠ Predict
The FDA considers two products bioequivalent if the 90% CI of the relative mean of the generic to brand formulation is within 80%-125%.
“We are planning to… develop predictive models for bioavailability and bioequivalence”
Lester M. Crawford, 2005
Acting Commissioner of Food & Drugs
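The 80%-125% rule quoted above can be written as a one-line check. This is only a minimal sketch: the function name `is_bioequivalent` and the example intervals are hypothetical, and the limits are simply those stated on the slide, applied to a confidence interval that is assumed to have been computed elsewhere.

```python
# Hypothetical helper illustrating the 80%-125% rule from the slide.
# The example intervals below are made up for illustration only.

LOWER_LIMIT = 0.80
UPPER_LIMIT = 1.25


def is_bioequivalent(ci_low: float, ci_high: float) -> bool:
    """Return True when the whole 90% CI of the generic/brand mean ratio
    lies inside [0.80, 1.25]."""
    if ci_low > ci_high:
        raise ValueError("ci_low must not exceed ci_high")
    return LOWER_LIMIT <= ci_low and ci_high <= UPPER_LIMIT


examples = {
    "product A": (0.92, 1.08),  # entirely inside the limits
    "product B": (0.78, 1.02),  # lower bound falls below 0.80
    "product C": (1.01, 1.31),  # upper bound exceeds 1.25
}
for name, (low, high) in examples.items():
    print(name, is_bioequivalent(low, high))
```

The check concerns the population mean ratio only. It says nothing about how an individual patient will respond, which is the gap the quoted call for predictive models points to.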
“For a long time, we thought that Tamoxifen was roughly 80% effective for breast cancer patients.
But now we know much more: we know that it’s 100% effective in 70%-80% of the patients, and ineffective in the rest.”
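The arithmetic behind this quote can be sketched in a few lines. The function `average_effectiveness` is a hypothetical illustration, assuming a population split into full responders and non-responders as the quote describes.

```python
def average_effectiveness(share_responders: float,
                          effect_in_responders: float = 1.0,
                          effect_in_others: float = 0.0) -> float:
    """Population-average effectiveness of a treatment that works fully
    in one subgroup and not at all in the rest (illustrative assumption)."""
    return (share_responders * effect_in_responders
            + (1.0 - share_responders) * effect_in_others)


# A drug that is 100% effective in 70%-80% of patients looks
# "roughly 70%-80% effective" when only the average is reported.
for share in (0.70, 0.80):
    avg = average_effectiveness(share)
    print(f"responders={share:.0%}  average effectiveness={avg:.0%}")
```

The average describes the population, while each individual outcome is all or nothing. Predicting which patients respond is a different task from estimating the average effect.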
Goal Definition
Design & Collection
Data Preparation
EDA
Variables?
Methods?
Evaluation, Validation & Model Selection
Model Use & Reporting
Study design & data collection
Hierarchical data
Observational or experiment?
Primary or secondary data?
Instrument (reliability + validity vs. measurement accuracy)
How much data?
How to sample?
Which Variables?
Multicollinearity?
causation – associations
endogeneity – ex-post availability
A, B, A*B?
Evaluation, Validation & Model Selection
Training data
Empirical model
Holdout data
Predictive power
Over-fitting analysis
Theoretical model
Empirical model
Data
Validation
Model fit ≠ Explanatory power
Inference
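The training/holdout idea and the over-fitting analysis can be illustrated with a small simulation. This is a sketch under assumptions that are not in the slides: the data are simulated as y = sin(x) + noise, and a k-nearest-neighbour averager stands in for the "empirical model" because it can be written with the standard library alone.

```python
import math
import random

random.seed(0)


def make_data(n):
    """Simulated data: y = sin(x) + noise (an assumption for illustration)."""
    xs = [random.uniform(0.0, 6.0) for _ in range(n)]
    return [(x, math.sin(x) + random.gauss(0.0, 0.3)) for x in xs]


def knn_predict(train, x, k):
    """Average the y values of the k training points closest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / k


def rmse(train, data, k):
    """Root mean squared error of the k-NN model on `data`."""
    errors = [(y - knn_predict(train, x, k)) ** 2 for x, y in data]
    return math.sqrt(sum(errors) / len(errors))


data = make_data(200)
train, holdout = data[:140], data[140:]  # holdout is never used for fitting

results = {}
for k in (1, 15):
    results[k] = (rmse(train, train, k), rmse(train, holdout, k))
    print(f"k={k:2d}  training RMSE={results[k][0]:.3f}  "
          f"holdout RMSE={results[k][1]:.3f}")
```

With k=1 every training point is its own nearest neighbour, so the training error is zero while the holdout error is not. That gap between training and holdout performance is the over-fitting signal, and it is why fit on the training data is not evidence of predictive power.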
Model Use
test causal theory
generate new theory
develop measures
compare theories
improve theory
assess relevance
evaluate predictability
Predictive performance
Over-fitting analysis
Null hypothesis
Naïve/baseline
out-of-sample
Performance Metrics
Type I, II errors
goodness-of-fit
p-values
over-fitting
costs
prediction accuracy
interpretation
Training vs. holdout
R²
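The contrast between in-sample fit (R²) and out-of-sample accuracy against a naïve baseline can be sketched as follows. Everything here is an illustrative assumption: the data are simulated as y = 2x + noise, the model is a simple least-squares line, and the naïve benchmark predicts the training mean for every holdout record.

```python
import math
import random

random.seed(1)

# Simulated data (an assumption): y = 2x + noise.
data = []
for _ in range(100):
    x = random.uniform(0.0, 10.0)
    data.append((x, 2.0 * x + random.gauss(0.0, 1.0)))
train, holdout = data[:70], data[70:]


def fit_line(points):
    """Least-squares intercept and slope for a single predictor."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


intercept, slope = fit_line(train)
train_mean_y = sum(y for _, y in train) / len(train)

# Explanatory-style metric: goodness of fit on the data used for fitting.
ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in train)
ss_tot = sum((y - train_mean_y) ** 2 for _, y in train)
r_squared = 1.0 - ss_res / ss_tot

# Predictive-style metrics: holdout error versus the naive benchmark.
holdout_y = [y for _, y in holdout]
model_rmse = rmse(holdout_y, [intercept + slope * x for x, _ in holdout])
naive_rmse = rmse(holdout_y, [train_mean_y] * len(holdout))

print(f"in-sample R^2      = {r_squared:.3f}")
print(f"holdout RMSE model = {model_rmse:.3f}")
print(f"holdout RMSE naive = {naive_rmse:.3f}")
```

R² is computed on the training data and speaks to fit, while the two RMSE values are computed on records the model never saw. A model has predictive value only to the extent that it beats the naïve benchmark out of sample.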
The predictive power of an explanatory model has important scientific value
Relevance, reality check, predictability
Current State in Social Sciences (and MIS)
“While the value of scientific prediction… is beyond question… the inexact sciences [do not] have…the use of predictive expertise well in hand.”
Helmer & Rescher, 1959
Distinction blurred
Unfamiliarity with predictive modeling/assessment
Prediction underappreciated
State-of-the-art in Industry
Distinction blurred
Prediction over-appreciated
“Big Data” synonymous with prediction
What can be done?
Acknowledge difference
Learn/teach prediction
Leverage prediction in research
BUT focus on its scientific uses:
generate new theory
develop measures
compare theories
improve theory
assess relevance
evaluate predictability
Why Predict? for Scientific Research
Shmueli & Koppius, “Predictive Analytics in IS Research” MIS Quarterly, 2011