To Explain or To Predict? Predictive Analytics in IS Research, 3rd Taiwan Summer Workshop on Information Management, July 2015, Galit Shmuéli

Predictive analytics in Information Systems Research (TSWIM 2015 keynote)

Aug 13, 2015


Data & Analytics

Galit Shmueli
Transcript
Page 1: Predictive analytics in Information Systems Research (TSWIM 2015 keynote)

To Explain or To Predict? Predictive Analytics in IS Research

3rd Taiwan Summer Workshop on Information Management

July 2015

Galit Shmuéli

Page 2: Predictive analytics in Information Systems Research (TSWIM 2015 keynote)

Galit Shmueli (徐茉莉)

www.galitshmueli.com

❶ 1994-2000 Israel Institute of Technology: MSc + PhD, Statistics

❷ 2000-2002 Carnegie Mellon Univ.: Visiting Assistant Prof., Dept. of Statistics

❸ 2002-2012 Univ. of Maryland College Park: Assistant then Associate Prof. of Statistics & Management Science, R H Smith School of Business

2008-2014 Rigsum Institute (Bhutan): Co-Director, Rigsum Research Lab

❹ 2011-2014 Indian School of Business: SRITNE Chaired Prof. of Data Analytics, Associate Prof. of Statistics & Info Systems

2014-… NTHU: Institute of Service Science; Director, Center for Service Innovation & Analytics

Page 3: Predictive analytics in Information Systems Research (TSWIM 2015 keynote)

Research in Data Analytics

www.galitshmueli.com

• Statistical strategy

• ‘Entrepreneurial’ statistical & data mining modeling (new conditions & environments)

• Business analytics

In progress…

Page 4: Predictive analytics in Information Systems Research (TSWIM 2015 keynote)

www.iss.nthu.edu.tw

Page 5: Predictive analytics in Information Systems Research (TSWIM 2015 keynote)

Road Map

Definitions

Explanatory-dominated MIS

Explanatory modeling ≠ predictive modeling

Why?

Different modeling paths

Explanatory power vs. predictive power

How do I use this?

Page 6: Predictive analytics in Information Systems Research (TSWIM 2015 keynote)

Definitions

Explanatory modeling: Theory-based, statistical testing of causal hypotheses

Explanatory power: Strength of relationship in statistical model

Page 7: Predictive analytics in Information Systems Research (TSWIM 2015 keynote)

Definitions

Predictive modeling: Empirical method for predicting new observations

Predictive power: Ability to accurately predict new observations

Page 8: Predictive analytics in Information Systems Research (TSWIM 2015 keynote)

Explain, Predict, Describe

Matching Game

Social Sciences (MIS included)

Machine learning

Statistics

Page 9: Predictive analytics in Information Systems Research (TSWIM 2015 keynote)

Statistical modeling in MIS research

Purpose: test causal theory (“explain”)

Association-based statistical models

Prediction nearly absent

Page 10: Predictive analytics in Information Systems Research (TSWIM 2015 keynote)

Explanatory modeling à-la MIS

1. Start with a causal theory

2. Generate causal hypotheses on constructs

3. Operationalize constructs → Measurable variables

4. Fit statistical model

5. Statistical inference → Causal conclusions

Page 11: Predictive analytics in Information Systems Research (TSWIM 2015 keynote)

In MIS,

data analysis is mainly used for testing causal theory.

“If it explains, it predicts”

Page 12: Predictive analytics in Information Systems Research (TSWIM 2015 keynote)

“Empirical prediction alone is un-scientific”

Some statisticians share this view:

The two goals in analyzing data... I prefer to describe as “management” and “science”. Management seeks profit... Science seeks truth.

- Parzen, Statistical Science 2001

Page 13: Predictive analytics in Information Systems Research (TSWIM 2015 keynote)

Prediction in top research journals in Information Systems

Predictive goal?

Predictive modeling?

Predictive assessment?

1990-2006

Page 14: Predictive analytics in Information Systems Research (TSWIM 2015 keynote)

52 “predictive” articles among 1,072 in Information Systems top journals

Page 15: Predictive analytics in Information Systems Research (TSWIM 2015 keynote)

Why Predict? for Scientific Research

generate new theory

develop measures

compare theories

improve theory

assess relevance

evaluate predictability

Shmueli & Koppius, “Predictive Analytics in IS Research” MIS Quarterly, 2011

Page 16: Predictive analytics in Information Systems Research (TSWIM 2015 keynote)

“A good explanatory model will also predict well”

“You must understand the underlying causes in order to predict”

Page 17: Predictive analytics in Information Systems Research (TSWIM 2015 keynote)

Philosophy of Science

“Explanation and prediction have the same logical structure”

Hempel & Oppenheim, 1948

“It becomes pertinent to investigate the possibilities of predictive procedures autonomous of those used for explanation”

Helmer & Rescher, 1959

“Theories of social and human behavior address themselves to two distinct goals of science: (1) prediction and (2) understanding”

Dubin, Theory Building, 1969

Page 18: Predictive analytics in Information Systems Research (TSWIM 2015 keynote)

Why statistical explanatory modeling differs from predictive modeling

Page 19: Predictive analytics in Information Systems Research (TSWIM 2015 keynote)

Explanatory Model: Test/quantify causal effect for “average” record in population

Predictive Model: Predict new individual observations

Different Scientific Goals

Different generalization

Page 20: Predictive analytics in Information Systems Research (TSWIM 2015 keynote)

Theory vs. its manifestation

?

Page 21: Predictive analytics in Information Systems Research (TSWIM 2015 keynote)

Notation

Theoretical constructs: $\mathcal{X}, \mathcal{Y}$

Causal theoretical model: $\mathcal{Y} = \mathcal{F}(\mathcal{X})$

Measurable variables: $X, Y$

Statistical model: $E(Y) = f(X)$

Page 22: Predictive analytics in Information Systems Research (TSWIM 2015 keynote)

Four aspects

1. Theory – Data

2. Causation – Association

3. Retrospective – Prospective

4. Bias - Variance

$\mathcal{Y} = \mathcal{F}(\mathcal{X})$

$E(Y) = f(X)$
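The fourth aspect, bias vs. variance, can be written out with the standard decomposition of expected squared prediction error. This is a sketch under stated assumptions, not a formula taken from the slide: squared-error loss, a new observation $Y = f(x) + \varepsilon$ with $E(\varepsilon) = 0$ and $\mathrm{Var}(\varepsilon) = \sigma^2$, and an estimated model $\hat{f}$ fitted on data independent of the new observation. The symbols $\hat{f}$, $\varepsilon$, and $\sigma^2$ are introduced here for illustration.

```latex
\begin{aligned}
\mathrm{EPE}(x)
  &= E\left\{ \left( Y - \hat{f}(x) \right)^2 \right\} \\
  &= \underbrace{\sigma^2}_{\text{irreducible noise}}
   + \underbrace{\left( E[\hat{f}(x)] - f(x) \right)^2}_{\text{bias}^2}
   + \underbrace{\mathrm{Var}\left( \hat{f}(x) \right)}_{\text{estimation variance}}
\end{aligned}
```

Read this way, explanatory modeling concentrates on keeping the bias term small so that $f$ represents the theory faithfully, whereas predictive modeling seeks to minimize the sum of the last two terms and may accept some bias in exchange for lower variance.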

Page 23: Predictive analytics in Information Systems Research (TSWIM 2015 keynote)

“The goal of finding models that are predictively accurate differs from the goal of finding models that are true.”

Page 24: Predictive analytics in Information Systems Research (TSWIM 2015 keynote)

Point #1

Best explanatory model ≠ Best predictive model

Page 25: Predictive analytics in Information Systems Research (TSWIM 2015 keynote)

Four aspects

1. Theory - Data

2. Causation – Association

3. Retrospective – Prospective

4. Bias - Variance

$\mathcal{Y} = \mathcal{F}(\mathcal{X})$

$E(Y) = f(X)$

Page 26: Predictive analytics in Information Systems Research (TSWIM 2015 keynote)

Predict ≠ Explain

+ ?

“we tried to benefit from an extensive set of attributes describing each of the movies in the dataset. Those attributes certainly carry a significant signal and can explain some of the user behavior. However… they could not help at all for improving the [predictive] accuracy.”

Bell et al., 2008

Page 27: Predictive analytics in Information Systems Research (TSWIM 2015 keynote)

Predict ≠ Explain

Page 28: Predictive analytics in Information Systems Research (TSWIM 2015 keynote)

Explain ≠ Predict

The FDA considers two products bioequivalent if the 90% CI of the relative mean of the generic to brand formulation is within 80%-125%.

“We are planning to… develop predictive models for bioavailability and bioequivalence”

Lester M. Crawford, 2005

Acting Commissioner of Food & Drugs
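The acceptance rule quoted on this slide can be sketched as a simple interval check. This is a minimal illustration, assuming the 90% confidence interval for the generic-to-brand mean ratio has already been computed elsewhere; the function name and the example interval values are hypothetical and are not taken from the talk or from any FDA document.

```python
def within_bioequivalence_limits(ci_low, ci_high, lower=0.80, upper=1.25):
    """Check whether a 90% CI for the generic/brand mean ratio lies
    entirely inside the acceptance limits (80%-125% by default)."""
    if ci_low > ci_high:
        raise ValueError("ci_low must not exceed ci_high")
    return lower <= ci_low and ci_high <= upper


# Hypothetical intervals, for illustration only.
print(within_bioequivalence_limits(0.92, 1.10))  # whole interval inside the limits
print(within_bioequivalence_limits(0.78, 1.05))  # lower end falls below 0.80
```

Note that the rule is a statement about the population mean ratio. It says nothing about how an individual patient will respond, which is the gap between explaining and predicting that the slide points to.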

Page 29: Predictive analytics in Information Systems Research (TSWIM 2015 keynote)

“For a long time, we thought that Tamoxifen was roughly 80% effective for breast cancer patients.

But now we know much more: we know that it’s 100% effective in 70%-80% of the patients, and ineffective in the rest.”

Page 30: Predictive analytics in Information Systems Research (TSWIM 2015 keynote)

Goal Definition

Design & Collection

Data Preparation

EDA

Variables?

Methods?

Evaluation, Validation & Model Selection

Model Use & Reporting

Page 31: Predictive analytics in Information Systems Research (TSWIM 2015 keynote)

Study design & data collection

Hierarchical data

Observational or experiment?

Primary or secondary data?

Instrument (reliability + validity vs. measurement accuracy)

How much data?

How to sample?

Page 32: Predictive analytics in Information Systems Research (TSWIM 2015 keynote)

Data Preprocessing

reduced-feature models

missing

partitioning

Page 33: Predictive analytics in Information Systems Research (TSWIM 2015 keynote)

Data exploration & reduction

PCA

SVD

Interactive visualization

Page 34: Predictive analytics in Information Systems Research (TSWIM 2015 keynote)

Which Variables?

Multicollinearity?

causation / associations

endogeneity / ex-post availability

A, B, A*B?

Page 35: Predictive analytics in Information Systems Research (TSWIM 2015 keynote)

Methods / Models

Blackbox / interpretable

Mapping to theory

ensembles

Shrinkage models

variance / bias
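The bias/variance and shrinkage entries on this slide can be illustrated with a small analytic example. This is a sketch under stated assumptions, not something shown in the talk: the target is a population mean mu, the estimator is c times the sample mean of n observations with standard deviation sigma, and all numeric values are hypothetical.

```python
def mse_of_shrunken_mean(c, mu, sigma, n):
    """Bias, variance, and mean squared error of the estimator
    c * sample_mean when the true mean is mu."""
    bias = (c - 1.0) * mu                 # zero only when c == 1
    variance = (c ** 2) * sigma ** 2 / n  # shrinking (c < 1) lowers variance
    return bias, variance, bias ** 2 + variance


# Hypothetical setting: weak signal, noisy data, small sample.
mu, sigma, n = 1.0, 3.0, 10

# Unbiased estimator: the plain sample mean (c = 1).
unbiased = mse_of_shrunken_mean(1.0, mu, sigma, n)

# Shrinkage factor that minimizes MSE for this known mu (illustration only;
# in practice mu is unknown and the factor is tuned, e.g. by validation).
c_opt = mu ** 2 / (mu ** 2 + sigma ** 2 / n)
shrunk = mse_of_shrunken_mean(c_opt, mu, sigma, n)

print("unbiased (bias, variance, MSE):", unbiased)
print("shrunken (bias, variance, MSE):", shrunk)
```

In this setting the shrunken estimator is biased yet has a lower mean squared error than the unbiased one, which is the kind of trade that predictive methods such as shrinkage models and ensembles are designed to exploit.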

Page 36: Predictive analytics in Information Systems Research (TSWIM 2015 keynote)

Evaluation, Validation & Model Selection

Training data

Empirical model

Holdout data

Predictive power

Over-fitting analysis

Theoretical model

Empirical model

Data

Validation

Model fit ≠ Explanatory power
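The contrast on this slide between in-sample fit and holdout-based predictive power can be sketched in a few lines. This is a minimal illustration using only the Python standard library, assuming simulated data, a simple one-predictor linear regression, a 140/60 training/holdout partition, and a naive baseline that predicts the training mean; none of these choices come from the talk.

```python
import math
import random


def fit_simple_ols(xs, ys):
    """Least-squares intercept and slope for a one-predictor linear model."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )


# Simulated data (hypothetical): y depends linearly on x plus noise.
random.seed(0)
xs = [random.uniform(0, 10) for _ in range(200)]
ys = [2.0 + 1.0 * x + random.gauss(0, 1) for x in xs]

# Partition: records are already in random order, so slice them.
train_x, hold_x = xs[:140], xs[140:]
train_y, hold_y = ys[:140], ys[140:]

# Fit on the training data only.
intercept, slope = fit_simple_ols(train_x, train_y)

# In-sample fit: R-squared computed on the training data.
mean_train_y = sum(train_y) / len(train_y)
train_pred = [intercept + slope * x for x in train_x]
ss_res = sum((y - p) ** 2 for y, p in zip(train_y, train_pred))
ss_tot = sum((y - mean_train_y) ** 2 for y in train_y)
r2_train = 1.0 - ss_res / ss_tot

# Predictive power: error on holdout data, compared with a naive baseline.
hold_pred = [intercept + slope * x for x in hold_x]
rmse_model = rmse(hold_y, hold_pred)
rmse_naive = rmse(hold_y, [mean_train_y] * len(hold_y))

print(f"training R^2:        {r2_train:.3f}")
print(f"holdout RMSE, model: {rmse_model:.3f}")
print(f"holdout RMSE, naive: {rmse_naive:.3f}")
```

The training R² describes how well the model fits the data it was estimated on, while the two holdout RMSE values describe how well it predicts records it has not seen, relative to a benchmark. The two kinds of measure answer different questions and one should not be read as a stand-in for the other.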

Page 37: Predictive analytics in Information Systems Research (TSWIM 2015 keynote)

Inference

Model Use

test causal theory

generate new theory

develop measures

compare theories

improve theory

assess relevance

evaluate predictability

Predictive performance

Over-fitting analysis

Null hypothesis

Naïve/baseline

Page 38: Predictive analytics in Information Systems Research (TSWIM 2015 keynote)

Point #2

Explanatory Power ≠ Predictive Power

Cannot infer one from the other

Page 39: Predictive analytics in Information Systems Research (TSWIM 2015 keynote)

out-of-sample

Performance Metrics

type I,II errors

goodness-of-fit

p-values

over-fitting

costs

prediction accuracy

interpretation

Training vs. holdout

R²

Page 40: Predictive analytics in Information Systems Research (TSWIM 2015 keynote)

[Figure: Predictive Power plotted against Explanatory Power]

Page 41: Predictive analytics in Information Systems Research (TSWIM 2015 keynote)

The predictive power of an explanatory model has important scientific value

Relevance, reality check, predictability

Page 42: Predictive analytics in Information Systems Research (TSWIM 2015 keynote)

Current State in Social Sciences (and MIS)

“While the value of scientific prediction… is beyond question… the inexact sciences [do not] have…the use of predictive expertise well in hand.”

Helmer & Rescher, 1959

Distinction blurred

Unfamiliarity with predictive modeling/assessment

Prediction underappreciated

Page 43: Predictive analytics in Information Systems Research (TSWIM 2015 keynote)

How does this impact

Scientific Research?

Page 44: Predictive analytics in Information Systems Research (TSWIM 2015 keynote)

State-of-the-art in Industry

Distinction blurred

Prediction over-appreciated

“Big Data” synonymous with prediction

Page 45: Predictive analytics in Information Systems Research (TSWIM 2015 keynote)

How does this impact an organization’s actions?

…and our lives?

Page 46: Predictive analytics in Information Systems Research (TSWIM 2015 keynote)

What can be done?

Acknowledge difference

Learn/teach prediction

Leverage prediction in research

BUT focus on its scientific uses:

Page 47: Predictive analytics in Information Systems Research (TSWIM 2015 keynote)

Why Predict? for Scientific Research

generate new theory

develop measures

compare theories

improve theory

assess relevance

evaluate predictability

Shmueli & Koppius, “Predictive Analytics in IS Research” MIS Quarterly, 2011

Page 48: Predictive analytics in Information Systems Research (TSWIM 2015 keynote)

Shmueli (2010) “To Explain or To Predict?”, Statistical Science

Shmueli & Koppius (2011) “Predictive Analytics in IS Research”, MISQ