When Prediction Met PLS: What We Learned in 3 Years of Marriage
Galit Shmueli, National Tsing Hua University, Taiwan
PLS 2017, June 17, Macau
Jan 22, 2018
Crossing Modeling Borders: Using Predictive Models for Causal Explanation and Using Explanatory Models for Prediction
Point #1: Best explanatory model ≠ Best predictive model
Point #2: Explanatory power ≠ Predictive power (cannot infer one from the other)
Shmueli (2010) “To Explain or To Predict?”, Statistical Science; Shmueli & Koppius (2011) “Predictive Analytics in IS Research”, MISQ
PLS vs NN (SCECR 2010, NY)
The Future of PLS-PM: Prediction or Explanation?
Mediator & Prediction
Timeline: 2010, 2014, 2015, 2016, 2017
Prediction with models for observable data (regression, machine learning algorithms)
Prediction with latent variable models (PLS, CB-SEM)
Generating Predictions & Prediction Errors
Evaluating Predictive Performance
Conducting Predictive Simulation Studies
Using PLS Predictions
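Simple PLS Model
[Figure: exogenous factors X1 (items x11–x13, weights w11–w13) and X2 (items x21–x23, weights w21–w23) affect the endogenous factor Y1 through path coefficients β1 and β2, with structural error z1; Y1 is measured by items y11–y14 with loadings λ31–λ34 and measurement errors ε11–ε14. Labels: Structural model (inner model); Measurement model (outer model).]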
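A minimal sketch of how point predictions flow through this model, assuming standardized items and already-estimated outer weights, path coefficients and loadings (the helper name and all numbers are hypothetical, not output of any PLS package): composite scores for X1 and X2 come from the outer weights, the inner model gives the predicted Y1 composite, and the loadings map it back to the Y1 items. For brevity the sketch skips the rescaling of composite scores to unit variance that PLS software performs.

```python
import numpy as np

def predict_y1_items(x1_items, x2_items, w1, w2, beta, loadings):
    """Predict the items y11..y14 of Y1 from the exogenous items (hypothetical helper).

    x1_items, x2_items: (n, 3) arrays of standardized items x11..x13 and x21..x23
    w1, w2:             outer weights w11..w13 and w21..w23
    beta:               path coefficients (beta1, beta2) for X1 -> Y1 and X2 -> Y1
    loadings:           loadings lambda31..lambda34 of the Y1 items
    """
    # Outer model: each composite score is a weighted sum of its items
    x1_score = x1_items @ w1
    x2_score = x2_items @ w2
    # Inner model: predicted Y1 composite; the structural error z1 is set to its mean (0)
    y1_hat = beta[0] * x1_score + beta[1] * x2_score
    # Measurement model: item prediction = loading * predicted composite (errors set to 0)
    return np.outer(y1_hat, loadings)
```

For one record with x1 = (1.0, 0.5, −0.2), x2 = (0, −1, 0.3), w1 = (0.4, 0.4, 0.3), w2 = (0.5, 0.3, 0.3), β = (0.5, 0.3) and loadings (0.8, 0.7, 0.75, 0.6), the predicted Y1 composite is 0.207 and the item predictions are 0.1656, 0.1449, 0.15525 and 0.1242. The same point prediction serves both the average case and a specific record.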
3D Prediction Landscape (latent models)
D1: Construct vs. Item
D2: In-sample vs. Out-of-sample
D3: Average case vs. Case-wise
(Figure label: Machine learning)
Generating Predictions & Prediction Errors
Q#1: What to predict?
? Should we predict items or composites? (We can predict both!)
Answer: Depends on the required action
Ability to generate testable predictions:
1. Generate predictions
2. Evaluate accuracy of predictions
Challenge: PLS models can generate testable predictions for items but untestable predictions for composites
6 Types of Prediction from PLS Models (Lohmöller, 1989): valid, structural, communal, redundant, latent, operative; each shown for IN (in-sample) and OUT (out-of-sample)
Predict outcome -> Evaluate predictions in-sample and out-of-sample: over-fitting?
Average Case vs. Case-wise
[Figure: the simple PLS model with every item indexed by record k, annotated with the item value $y_{ijk}$ (item j of construct i for record k).]
Why Predict the Average Case?
Some social scientists think:
• Predicting the behavior of individuals is difficult
• Predicting the behavior of groups is possible
[Figure: the simple PLS model.]
Q#2: Prediction errors of what?
? RMSE per item or per composite?
[Figure: the simple PLS model with item-level prediction errors $e_{ij}$ computed for each record k.]
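The two options in Q#2 can be contrasted in a few lines, assuming holdout matrices of actual and predicted item values (records in rows, items in columns). Per-item RMSE is fully testable. A composite-level RMSE needs an "actual" composite, which is never observed and has to be built from the items with some weights; here those weights are a hypothetical input, which illustrates why composite predictions are harder to test.

```python
import numpy as np

def rmse_per_item(actual, predicted):
    """RMSE for each item (column) separately, from the errors e_ij."""
    errors = np.asarray(actual, dtype=float) - np.asarray(predicted, dtype=float)
    return np.sqrt(np.mean(errors ** 2, axis=0))

def rmse_composite(actual, predicted, weights):
    """RMSE of a weighted composite of the items.

    The 'actual' composite is constructed from the observed items with the
    supplied (assumed) weights, so the result depends on that weighting.
    """
    w = np.asarray(weights, dtype=float)
    comp_actual = np.asarray(actual, dtype=float) @ w
    comp_pred = np.asarray(predicted, dtype=float) @ w
    return float(np.sqrt(np.mean((comp_actual - comp_pred) ** 2)))
```

With equal weights the composite RMSE averages out item errors of opposite sign, so it can look better or worse than the per-item figures depending on how errors correlate across items.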
Q#3: Computing prediction intervals
? How to estimate prediction variance for the average case? For case-wise?
Point predictions are the same for case-wise and average case:
$\hat{y}_{ijk} = \hat{y}_{ij} = \hat{y}_{ij\cdot}$
Answer: Average case -> use bootstrap; Case-wise -> bootstrap + error
Scenario: We have a new record.
Option 1: Predict the value for “that kind of record”
Option 2: Predict the value for that specific record
Prediction Interval for Average Case
[Figure: the simple PLS model re-estimated on each of the B bootstrap samples, each fit producing a prediction for the new record.]
Training sample size n
1. Get B bootstrap samples of the training data
2. Fit the PLS model to each bootstrap sample (B models)
3. Get B predictions for the new record: $\hat{y}^{1}_{ij,\mathrm{NEW}}, \ldots, \hat{y}^{B}_{ij,\mathrm{NEW}}$
4. Use the 5th and 95th percentiles of the B predictions to get a 90% PI (for the average case)
Captures uncertainty due to PLS model estimation
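The four steps above translate almost line for line into code. In this sketch the PLS estimation step is replaced by an ordinary least-squares stand-in (`ols_fit_predict`, hypothetical) so it runs without a PLS library; any function that refits the model on a bootstrap sample and returns the prediction for the new record can be passed in instead. The result is a percentile bootstrap interval, so it reflects only the estimation uncertainty noted on the slide.

```python
import numpy as np

def ols_fit_predict(X, y, x_new):
    """Hypothetical stand-in for 'fit the PLS model, then predict the new record':
    ordinary least squares with an intercept. X is (n, p), x_new is (p,)."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return float(np.concatenate(([1.0], x_new)) @ coef)

def average_case_pi(X, y, x_new, fit_predict, B=1000, level=0.90, seed=0):
    """Percentile bootstrap prediction interval for the average case.

    fit_predict(X_train, y_train, x_new) must refit the model and return a
    single prediction for x_new.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    preds = np.empty(B)
    for b in range(B):
        idx = rng.integers(0, n, size=n)               # step 1: resample n records with replacement
        preds[b] = fit_predict(X[idx], y[idx], x_new)  # steps 2-3: refit, predict the new record
    alpha = (1 - level) / 2                            # 0.05 for a 90% interval
    lo, hi = np.quantile(preds, [alpha, 1 - alpha])    # step 4: 5th and 95th percentiles
    return float(lo), float(hi), preds
```

Usage: `average_case_pi(X, y, np.array([0.5]), ols_fit_predict, B=500)` returns the interval bounds plus the B bootstrap predictions, which are handy for plotting.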
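Prediction Interval for Individual Record
Training sample size n
1. Get B bootstrap samples of the training data
2. Fit the PLS model to each bootstrap sample (B models)
3. For each bootstrap sample b:
• Get predictions for the new record and for each training record: $\hat{y}^{b}_{ijk}$ (k = 1, …, n) and $\hat{y}^{b}_{ij,\mathrm{NEW}}$
• Compute the n training prediction errors: $e^{b}_{ijk} = y^{b}_{ijk} - \hat{y}^{b}_{ijk}$
• Add a randomly selected error to $\hat{y}^{b}_{ij,\mathrm{NEW}}$: $\hat{y}^{*b}_{ij,\mathrm{NEW}} = \hat{y}^{b}_{ij,\mathrm{NEW}} + e^{b}_{ijk}$
4. Use the 5th and 95th percentiles of the B predictions $\hat{y}^{*1}_{ij,\mathrm{NEW}}, \ldots, \hat{y}^{*B}_{ij,\mathrm{NEW}}$ to get a 90% case-wise PI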
[Figure: the simple PLS model re-estimated on each bootstrap sample, with training prediction errors for records k added to the new record's prediction.]
Uncertainty due to PLS model estimation + uncertainty due to deviation from the average
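A sketch of the case-wise procedure, with an ordinary least-squares stand-in (`ols_fit`, `ols_predict`, hypothetical) in place of PLS estimation. Each bootstrap round refits the model, computes the n training errors on that bootstrap sample, and adds one randomly drawn error to the new record's prediction, so the percentiles reflect both sources of uncertainty. Training errors are in-sample residuals, which tend to be somewhat smaller than genuine out-of-sample errors, so the interval may come out slightly too narrow.

```python
import numpy as np

def ols_fit(X, y):
    """Hypothetical stand-in for PLS estimation: OLS coefficients with an intercept."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def ols_predict(coef, X):
    """Predictions from the stand-in model; X is (n, p) or a single record (p,)."""
    X = np.atleast_2d(X)
    return np.column_stack([np.ones(len(X)), X]) @ coef

def casewise_pi(X, y, x_new, B=1000, level=0.90, seed=0):
    """Bootstrap + error prediction interval for one specific new record."""
    rng = np.random.default_rng(seed)
    n = len(y)
    draws = np.empty(B)
    for b in range(B):
        idx = rng.integers(0, n, size=n)             # bootstrap sample b
        Xb, yb = X[idx], y[idx]
        coef = ols_fit(Xb, yb)                       # model b
        errors = yb - ols_predict(coef, Xb)          # n training errors e^b_k = y^b_k - yhat^b_k
        y_new_hat = ols_predict(coef, x_new)[0]      # yhat^b_NEW
        draws[b] = y_new_hat + rng.choice(errors)    # add one randomly selected error
    alpha = (1 - level) / 2
    lo, hi = np.quantile(draws, [alpha, 1 - alpha])  # 5th and 95th percentiles for 90%
    return float(lo), float(hi)
```

On the same data, this interval is typically much wider than the average-case interval, because the added error term dominates the estimation uncertainty.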
Plowing Through the PLS Path Model
Q#4: Which items to use as inputs for predicting Y2?
[Figure: the PLS model extended with a second endogenous factor Y2 (items y21–y24, loadings λ41–λ44, errors ε21–ε24, structural error z2), which depends on Y1 through path β3.]
? Multiple possible sets of predictors (prediction paths)
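The choice of inputs matters because different prediction paths generally give different predictions for the same record. A sketch with hypothetical helper names and parameter values: path 1 uses only the exogenous items (X1, X2 → Y1 → Y2), path 2 uses the observed Y1 items as inputs, combined with hypothetical Y1 weights. Which path is appropriate depends on which items will actually be available at prediction time.

```python
import numpy as np

def path_via_exogenous(x1_items, x2_items, w1, w2, beta1, beta2, beta3, lam_y2):
    """Path 1: X items -> X1, X2 composites -> Y1 -> Y2 -> Y2 items (y21..y24)."""
    y1_hat = beta1 * (x1_items @ w1) + beta2 * (x2_items @ w2)  # structural error z1 set to 0
    y2_hat = beta3 * y1_hat                                     # structural error z2 set to 0
    return np.outer(y2_hat, lam_y2)

def path_via_y1_items(y1_items, w_y1, beta3, lam_y2):
    """Path 2: observed Y1 items -> Y1 composite -> Y2 -> Y2 items (y21..y24)."""
    y1_score = y1_items @ w_y1  # w_y1: hypothetical weights for y11..y14
    return np.outer(beta3 * y1_score, lam_y2)
```

Both helpers return an (n, 4) array of item predictions for Y2, so their holdout errors can be compared directly.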
Mediator & Prediction
Evaluating Predictive Performance
“Classic” Out-of-Sample Performance Evaluation (in general, not path models)
Dataset: training sample for estimation, holdout sample for prediction
predictions - actuals = residuals (training and holdout)
[Figure: 10-fold cross-validation; the data are split into folds 1–10, and each fold serves once as the holdout.]
$\mathrm{RMSE} = \sqrt{\mathrm{Avg}(\text{residuals}^2)}$
Visualization of residuals; predictive power
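A compact sketch of k-fold cross-validation with the RMSE formula of this slide. The model is passed in as a `fit`/`predict` pair; the least-squares pair below is a hypothetical stand-in rather than a PLS routine. Every record lands in the holdout exactly once, and residuals are defined as predictions minus actuals, as on the slide (the sign does not affect RMSE).

```python
import numpy as np

def ols_fit(X, y):
    """Hypothetical stand-in model: OLS with an intercept."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def ols_predict(coef, X):
    return np.column_stack([np.ones(len(X)), X]) @ coef

def kfold_rmse(X, y, fit, predict, k=10, seed=0):
    """k-fold cross-validated RMSE; each record is in the holdout exactly once."""
    n = len(y)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), k)    # random assignment to k folds
    residuals = np.empty(n)
    for holdout in folds:
        train = np.setdiff1d(np.arange(n), holdout)  # all other folds form the training set
        model = fit(X[train], y[train])
        residuals[holdout] = predict(model, X[holdout]) - y[holdout]  # predictions - actuals
    return float(np.sqrt(np.mean(residuals ** 2)))   # sqrt(Avg(residuals^2))
```

Passing a "simple average" pair (fit returns `y.mean()`, predict repeats it) gives the most basic benchmark for comparison.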
Q#5: What to benchmark against?
? • Simple average
• Linear regression model
• Machine learning algorithm (specifically, neural net)
• A different (simpler) PLS model
[Figure: the extended PLS path model (X1, X2 → Y1 → Y2).]
PLS: Causal theory dictates latent variables, path model structure, measurement model, and arrow directions [path model + path coefficients important]. Architecture requirements/options: each construct has items; mediation possible.
Linear Regression: Causal theory dictates independent variables (X’s), dependent variable (Y), implicit construct-to-variable mapping, linear model [model coefficients important]. Mediation requires multiple models.
Neural Net: User dictates (not causal) inputs (X’s), output (Y), hidden layer, nodes [model coefficients unimportant; no “mediation”]. Algorithm constraints: arrow direction left-to-right; only input + output have data; one “item” per node.
Example: TAM (Gefen & Straub, CAIS 2005): PU, PEOU → USE (SCECR 2010, NY)
Neural network: iterative estimation, node-level errors (Hackl & Westlund, TQM 2000; Hsu, Chen & Hsieh, TQM 2006)

Method               Holdout RMSE
PLS (reflective)     2.10
PLS (formative)      2.18
Neural Net           1.94
Linear Regression    1.84

Predicts “3”:
• Insufficient data?
• 5-point Likert scale?
Figure 1. Model for product returns in online retail
Neural Networks as an Approximation to Probabilistic Graphical Models: Using SEM for Predictive Analytics
• Model complex, non-linear causal relationships
• Large-scale datasets with millions of records and tens or hundreds of thousands of dimensions (attributes)
• Neural networks statistically approximate structural equation models, in which both the outer and the inner model are defined by logistic regression models
Q#6: How to Measure Out-of-Sample Predictive Power?
? • Holdout: RMSE, MAD, MAPE
• In-sample: R², Q² ?
• New to PLS: AIC, BIC, GM, … (in-sample)
To get some answers, we need a simulation study: which measure selects the best predictive PLS model?
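A sketch of the holdout measures, plus regression-style AIC and BIC computed from an error sum of squares (SSE), n records and p parameters. Assumptions: MAD is taken as the mean absolute error, MAPE and SMAPE are reported in percent, and SMAPE follows one common variant among several; the AIC/BIC forms are the textbook regression versions and may differ in detail from those used in the PLS literature.

```python
import numpy as np

def holdout_metrics(actual, predicted):
    """Common out-of-sample accuracy measures on a holdout set."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    err = actual - predicted
    return {
        "RMSE": float(np.sqrt(np.mean(err ** 2))),
        "MAD": float(np.mean(np.abs(err))),                  # mean absolute error (assumed meaning)
        "MAPE": float(100 * np.mean(np.abs(err / actual))),  # undefined if any actual value is 0
        "SMAPE": float(100 * np.mean(2 * np.abs(err) / (np.abs(actual) + np.abs(predicted)))),
    }

def aic_bic(sse, n, p):
    """Regression-style in-sample criteria from the error sum of squares."""
    aic = n * np.log(sse / n) + 2 * p
    bic = n * np.log(sse / n) + p * np.log(n)
    return float(aic), float(bic)
```

Note that the holdout measures need data the model never saw, while AIC and BIC are computed from the training fit alone.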
Predictive model selection: Two lenses
1. Prediction only (P):
– Focus only on comparing the predictive accuracy of models (Gregor, 2006)
– Limited or no role of theory (no causal explanation)
– Select the model with the best out-of-sample predictive accuracy
– Out-of-sample criteria (e.g., RMSE) are the gold standard for judging
2. Explanation with Prediction (EP):
– Focus on balancing causal explanation and prediction (Gregor, 2006)
– Prominent role of theory (causal explanation is foremost)
– Requires a trade-off in predictive power to accommodate explanatory power
Prediction-oriented model selection in PLS-PM (Sharma et al. 2017, submitted)
Conducting Predictive Simulation Studies
Simulation Study
Typical Steps in Predictive Simulation Study
1. Simulate data from a specific PLS model, manipulating factors of interest
2. Partition data into training and holdout samples
3. Estimate relevant PLS models from the training sample
4. Generate holdout predictions using each estimated model
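The four steps can be sketched as a small Monte Carlo loop. To stay self-contained, the generating model is a linear regression with one irrelevant predictor and the candidate models are fitted by least squares, a stand-in for PLS estimation; the model labels loosely echo the model types in this deck, and all coefficients, sample sizes and the holdout share are hypothetical. Holdout RMSE decides the winner in each replication.

```python
import numpy as np

# Hypothetical candidate models: which of the three predictors each one uses
CANDIDATES = {"parsimonious": [0], "generating": [0, 1], "overspecified": [0, 1, 2]}

def simulate(n, rng, betas=(0.4, 0.2, 0.0)):
    """Step 1: data from a known generating model (the third predictor is irrelevant)."""
    X = rng.normal(size=(n, 3))
    y = X @ np.array(betas) + rng.normal(size=n)
    return X, y

def holdout_rmse(X_tr, y_tr, X_ho, y_ho, cols):
    """Steps 3-4: estimate on the training sample, predict the holdout."""
    A_tr = np.column_stack([np.ones(len(X_tr)), X_tr[:, cols]])
    coef, *_ = np.linalg.lstsq(A_tr, y_tr, rcond=None)
    A_ho = np.column_stack([np.ones(len(X_ho)), X_ho[:, cols]])
    return float(np.sqrt(np.mean((y_ho - A_ho @ coef) ** 2)))

def run_study(n=200, holdout_share=0.3, reps=200, seed=0):
    """Count how often each candidate has the lowest holdout RMSE."""
    rng = np.random.default_rng(seed)
    wins = {name: 0 for name in CANDIDATES}
    for _ in range(reps):
        X, y = simulate(n, rng)
        n_ho = int(n * holdout_share)  # Step 2: rows are i.i.d., so the first n_ho form the holdout
        scores = {name: holdout_rmse(X[n_ho:], y[n_ho:], X[:n_ho], y[:n_ho], cols)
                  for name, cols in CANDIDATES.items()}
        wins[min(scores, key=scores.get)] += 1
    return wins
```

Varying `n`, `holdout_share` or `betas` across calls mimics manipulating the factors of interest.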
We learned: Simulation is important, and not straightforward
Q#7: How to simulate data for PLS fitting?
  simsem in R; SEGIRLS in R (Schlittgen, 2015)
Q#8: Which factors to vary?
  Path model, coefficients, factor loadings, sample size
Q#9: How big of a holdout set?
  Large: more reliable out-of-sample evaluation; Small: more realistic in PLS studies
Q#10: Role of the “generating model”
  Should a good predictive model recover the generating model? Include the generating model in the consideration set?
[Figure: eight candidate models (Models 1, 3, 4, 6: incorrect; Model 2: parsimonious; Model 5: data generation model; Model 7: saturated; Model 8: overspecified), with path coefficients 0.2, 0.4 and 0.1 shown.]
Example: Model Comparison Study
Prediction-oriented model selection in PLS-PM (Sharma et al. 2017, submitted)
Performance Measures Choose Which Model?

Model #           1      2      3      4      5      6      7      8
PLS criteria
R²            0.000  0.273  0.000  0.003  0.019  0.000  0.695  0.009
Adjusted R²   0.000  0.537  0.000  0.005  0.074  0.000  0.303  0.081
GoF           0.000  0.001  0.000  0.000  0.037  0.000  0.962  0.000
Q²            0.003  0.305  0.000  0.004  0.224  0.002  0.179  0.281
Information theoretic criteria
FPE           0.000  0.638  0.000  0.006  0.091  0.000  0.163  0.101
CP            0.000  0.686  0.000  0.006  0.100  0.001  0.096  0.111
GM            0.000  0.743  0.000  0.006  0.109  0.007  0.011  0.123
AIC           0.000  0.638  0.000  0.006  0.091  0.000  0.164  0.101
AICu          0.000  0.688  0.000  0.006  0.099  0.002  0.093  0.112
AICc          0.000  0.649  0.000  0.006  0.093  0.001  0.146  0.104
BIC           0.000  0.731  0.000  0.006  0.107  0.005  0.032  0.120
HQ            0.000  0.695  0.000  0.006  0.100  0.001  0.085  0.112
HQc           0.000  0.705  0.000  0.006  0.102  0.002  0.070  0.114
Out-of-sample criteria
MAD           0.000  0.351  0.000  0.000  0.183  0.000  0.236  0.229
RMSE          0.000  0.365  0.000  0.000  0.186  0.000  0.218  0.230
MAPE          0.094  0.044  0.247  0.076  0.044  0.347  0.090  0.058
SMAPE         0.000  0.365  0.000  0.000  0.123  0.000  0.343  0.168
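Each row of this table sums to about 1, which suggests (an assumption, not stated on the slide) that it reports the share of simulation runs in which each criterion selected each of the eight models. Given per-run criterion values, such a tally can be sketched as follows; criteria such as R² or Q² are maximized, while AIC, BIC or RMSE are minimized.

```python
import numpy as np

def selection_shares(criterion_values, higher_is_better=False):
    """Share of simulation runs in which each candidate model is selected.

    criterion_values: (runs, models) array holding one criterion value per
    run and candidate model.
    """
    values = np.asarray(criterion_values, dtype=float)
    # Index of the selected model in every run
    picks = values.argmax(axis=1) if higher_is_better else values.argmin(axis=1)
    # Count selections per model, including models that were never chosen
    counts = np.bincount(picks, minlength=values.shape[1])
    return counts / values.shape[0]
```

Ties are resolved in favor of the lower-numbered model, since `argmin`/`argmax` return the first occurrence.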
A simulation study can take long to run:
• Bootstrap
• Parallelizing?
• Pilot runs (fewer bootstrap rounds)
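One way to parallelize the replications of a simulation study while keeping results reproducible is to give each replication its own child seed from NumPy's `SeedSequence.spawn`. The replication body here is a hypothetical placeholder (a simple slope estimate); a real study would simulate PLS data and refit the models. Threads keep the sketch portable; for CPU-bound Python code a `ProcessPoolExecutor` is usually faster. A pilot run is simply a call with a small `reps`.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def one_replication(seed_seq, n=200):
    """One simulation run with its own independent random stream (placeholder body)."""
    rng = np.random.default_rng(seed_seq)
    x = rng.normal(size=n)
    y = 0.5 * x + rng.normal(size=n)
    return np.polyfit(x, y, 1)[0]  # stand-in for fitting the model of interest

def run_parallel(reps=100, seed=2017, workers=4):
    """Spawn one child seed per replication, so results don't depend on scheduling."""
    children = np.random.SeedSequence(seed).spawn(reps)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return np.array(list(pool.map(one_replication, children)))
```

Because each replication owns its seed, the results are identical whether they run on one worker or many.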
Using PLS Predictions
How to use it?
[Figure: predicted mean and standard deviation for each Y2 item: (μ_y21, σ_y21), (μ_y22, σ_y22), (μ_y23, σ_y23).]
Assess Relevance
Evaluate Predictability
Low/no predictive power:
• weakness in the theoretical model
• quality of the measured items
• phenomenon is naturally unpredictable
• model sufficient only for explanation but not prediction (e.g., user behavior)
• External validity (overfitting): compare in-sample vs. out-of-sample prediction
“If we can predict successfully on the basis of a certain explanation we have a good reason, and perhaps the best sort of reason, to accept the explanation”
Kaplan (1964), The Conduct of Inquiry: Methodology for Behavioral Science
Generate new theory
Develop measures
Low/no predictive power of existing model:
1. Opportunity for generating new theory
2. Identify constructs that yield poor predictions (boost traditional rigorous measurement practices)
Improve existing theory
[Figure: the extended PLS path model shown twice, the second time with an added interaction construct X1*X2 formed from item products (e.g., x11·x21) with weights W31–W33 and its own path coefficient.]
Asymmetric Predictions: predictive accuracy/precision varies for different subgroups
Create more nuanced theories
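Asymmetric predictive accuracy can be checked by splitting the holdout errors by subgroup. A minimal sketch, assuming arrays of actual values, predictions and subgroup labels (all hypothetical inputs):

```python
import numpy as np

def rmse_by_group(actual, predicted, groups):
    """Holdout RMSE computed separately for each subgroup label."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    groups = np.asarray(groups)
    result = {}
    for g in np.unique(groups):
        mask = groups == g
        # .item() turns the NumPy scalar label into a plain Python value
        result[g.item()] = float(np.sqrt(np.mean((actual[mask] - predicted[mask]) ** 2)))
    return result
```

Large gaps between subgroup RMSEs point to where the theory predicts well and where it does not.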
[Figure: excerpt from Ray, Kim & Morris (2014), Information Systems Research 25(3), p. 540, showing Table 3 (Structural Results of Proposed and Alternative Models: proposed model, first alternative, second alternative; outcomes CE, SAT, KC, WOM) and Figure 2 (Structural Results of Proposed Model). Abbreviations: CI, community identification; SIV, self-identity verification; EFF, knowledge self-efficacy; CE, community engagement; SAT, satisfaction; KC, knowledge contribution; WOM, positive word of mouth. Path significances: *p < 0.05; **p < 0.01; ***p < 0.001.]
Excerpt: “Thus, engagement and satisfaction appear to fully mediate (Baron and Kenny 1986) the influence of identity factors on prosocial intentions.” … “The theory-free alternative models did not yield any additional advantage when both power and parsimony were considered.”
Ray, S., Kim, S. S., and Morris, J. G. 2014. “The Central Role of Engagement in Online Communities,” Information Systems Research (25:3), pp. 528–546.
Reduced form: Ma & Agarwal (2007)
Compare competing theories
Model Comparison & Selection
• Fundamental to scientific work
• PLS as exploratory
• p-values challenge in large samples
Compare alternative models
[Figure: the eight candidate models (Models 1, 3, 4, 6: incorrect; Model 2: parsimonious; Model 5: data generation model; Model 7: saturated; Model 8: overspecified), with path coefficients 0.2, 0.4 and 0.1 shown.]
Different Types of Models (generating, parsimonious, incorrect, saturated, overspecified)
Sharma et al. 2017
Generating Predictions & Prediction Errors
Evaluating Predictive Performance
Conducting Predictive Simulation Studies
Using PLS Predictions
More Open Qs
? What is “good” prediction accuracy? Precision?
? Evaluating construct-level predictions
? Which parts transfer to CB-SEM?
Analytics, Humanity, Responsibility
Galit Shmueli 徐茉莉, Institute of Service Science