Heuristic Design of Experiments with Meta-Gradient Search
of Model Training Parameters
SF Bay ACM, Data Mining SIG, Feb 28, 2011
http://www.sfbayacm.org/?p=2464
[email protected]/in/GregMakowski
Choice is good…
But can be overwhelming

Key Questions Discussed
• You (a data miner) have many algorithms or libraries you can use, with many choices…
  – How to stay organized among all the choices?
    • Algorithm parameters
    • Adjustments in Cost vs. Profit (Type I vs. II error bias)
    • Metric selection (Lift if acting on top % vs. RMSE or ROC)
    • Ensemble Modeling, boosting, bagging, stacking
    • Data versions, preprocessing, trying new fields
  – How to plan, and learn as you go?
  – How simple should you stay, to keep descriptiveness vs. Occam's Razor?
Outline
Model Training Parameters in SAS Enterprise Miner
Tracking Conservative Results in a "Model Notebook"
How to Measure Progress
Meta-Gradient Search of Model Training Parameters
How to Plan and dynamically adapt
How to Describe Any Complex System – Sensitivity
Enterprise Miner: Sample Data Flow for a Project
(Boxes are expanded in later slides)

Stratified Sampling
Learning, Tuning, Validation

Type I vs. II Error Weights, Profit-Loss Ratios
In the Data Source, NOT the Model Engines
In other software, may use a weight field
Need to stay organized regardless
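As a rough illustration of how profit-loss weights can bias a model's decision between Type I and Type II errors, the sketch below compares the expected profit of acting on a record with the expected profit of skipping it. The dictionary keys, the function names, and the payoff values are hypothetical and are not taken from Enterprise Miner.

```python
def expected_profit(p_target, profit_matrix):
    """Expected profit of acting vs. skipping one record.

    profit_matrix maps (decision, outcome) to a payoff. The keys and
    values used in this sketch are hypothetical.
    """
    act = (p_target * profit_matrix[("act", "target")]
           + (1 - p_target) * profit_matrix[("act", "non_target")])
    skip = (p_target * profit_matrix[("skip", "target")]
            + (1 - p_target) * profit_matrix[("skip", "non_target")])
    return act, skip


def break_even_probability(profit_matrix):
    """Target probability above which acting has the higher expected profit."""
    gain_if_target = (profit_matrix[("act", "target")]
                      - profit_matrix[("skip", "target")])
    loss_if_non_target = (profit_matrix[("skip", "non_target")]
                          - profit_matrix[("act", "non_target")])
    return loss_if_non_target / (gain_if_target + loss_if_non_target)


# Hypothetical weights: +20 for a hit, -5 for a wasted action, 0 otherwise
weights = {
    ("act", "target"): 20.0,
    ("act", "non_target"): -5.0,
    ("skip", "target"): 0.0,
    ("skip", "non_target"): 0.0,
}
print(break_even_probability(weights))
```

With these example weights a record is worth acting on once its forecast probability exceeds the break-even value, so raising the penalty for a wasted action pushes the cutoff up.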
Regression
• It is always good to find the best linear solution early on
  – Like testing a null hypothesis: (linear vs. non-linear) problem
• Can feed "score" or "residual error" as a source field into non-linear models
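A minimal sketch of the second bullet, assuming a plain least-squares fit stands in for the regression node: it fits the linear solution, then appends the linear score and the residual error to each record as extra source fields. The function names are illustrative. Note that the residual needs the known target, so it is only available on labeled data.

```python
def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b_i] for row, b_i in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


def fit_linear(X, y):
    """Least-squares coefficients [a0, a1, ...] via the normal equations."""
    rows = [[1.0] + list(r) for r in X]  # leading 1.0 is the intercept term
    k = len(rows[0])
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    return solve_linear_system(XtX, Xty)


def add_score_and_residual(X, y):
    """Append the linear score and residual error to every record."""
    coef = fit_linear(X, y)
    augmented = []
    for r, t in zip(X, y):
        score = coef[0] + sum(c * v for c, v in zip(coef[1:], r))
        augmented.append(list(r) + [score, t - score])
    return augmented, coef


X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
y = [1.0 + 2.0 * a + 3.0 * b for a, b in X]
augmented, coef = add_score_and_residual(X, y)
print(coef)
```

The augmented records can then be offered to a neural net or tree, which only has to model what the linear solution missed.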
Neural Net Architecture and Parameters
[Figure: scatter plot of field 1 vs. field 2 with "$" and "c" records. A Neural Net Solution is "Non-Linear": several regions which are not adjacent. Panels: MLP, RBF]

A Comparison of a Neural Net and Regression
A Logistic regression formula:
  Y = f( a0 + a1*X1 + a2*X2 + a3*X3)
  a* are coefficients

Backpropagation, cast in a similar form:
  H1 = f(w0 + w1*I1 + w2*I2 + w3*I3)
  H2 = f(w4 + w5*I1 + w6*I2 + w7*I3)
  :
  Hn = f(w8 + w9*I1 + w10*I2 + w11*I3)
  O1 = f(w12 + w13*H1 + .... + w15*Hn)
  On = ....

w* are weights, AKA coefficients.
I1..In are input nodes or input variables.
H1..Hn are hidden nodes, which extract features of the data.
O1..On are the outputs, which group disjoint categories.
f() is the SIGMOID function, a non-linear "S" curve.
It is very noisy in the brain – chemical depletion of neurotransmitters

[Diagram: regression with inputs X1, X2, X3, coefficients a0, a1, a2, a3 and output Y; neural net with Input 1, I2, I3, Bias, hidden nodes H1 and Hidden 2, Output, weights w1, w2, w3, and a direct connection from inputs to output]
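The two formulas above can be sketched side by side in a few lines. This is only an illustration of the forward calculation (no training); the function names and the weight layout, with the bias stored first in each weight list, are assumptions made for the example.

```python
import math


def sigmoid(z):
    """The non-linear "S" curve f()."""
    return 1.0 / (1.0 + math.exp(-z))


def node(inputs, weights):
    """One unit: f(w0 + w1*I1 + w2*I2 + ...), with weights[0] as the bias."""
    return sigmoid(weights[0] + sum(w * i for w, i in zip(weights[1:], inputs)))


def logistic_regression(x, a):
    """Y = f(a0 + a1*X1 + a2*X2 + a3*X3)."""
    return node(x, a)


def mlp_forward(x, hidden_weights, output_weights):
    """Hidden nodes H1..Hn feed a single output node O1."""
    hidden = [node(x, w) for w in hidden_weights]
    return node(hidden, output_weights), hidden


x = [1.0, 2.0, 3.0]
print(logistic_regression(x, [0.0, 0.5, -0.25, 0.1]))
output, hidden = mlp_forward(
    x,
    hidden_weights=[[0.0, 0.5, -0.25, 0.1], [0.1, -0.3, 0.2, 0.0]],
    output_weights=[0.0, 1.0, -1.0],
)
print(output, hidden)
```

Each hidden node has the same form as a logistic regression; the network differs only in stacking several of them and feeding their outputs into another sigmoid.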
Neural Net
• Network Architecture can be linear (MLP) or circular (many RBF)
• Network Direct Connection allows inputs to connect to output (to find the simple, linear solution first)
• Network Hidden Units can go up to 64 (much better than 8)
• Profit/Loss uses settings in Data Source
What does a Decision Tree Look Like?
[Figure: scatter plot of Age vs. Income with "$" and "c" records, partitioned by Split 1, Split 2 and Split 3 into Leaf 1, Leaf 2, Leaf 3 and Leaf 4]

If (Age < Split1) then:
  … If (Income > Split2) then Leaf1 with dollar_avg1
  … If (Income < Split2) then Leaf2 with dollar_avg2
If (Age > Split1) then:
  … If (Income > Split3) then Leaf3 with dollar_avg3
  … If (Income < Split3) then Leaf4 with dollar_avg4

[Diagram: tree with Split 1 at the top, Split 2 and Split 3 below it, and Leaf 1, Leaf 2, Leaf 3, Leaf 4 at the bottom]
TreeDepth = 2
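The if-then rules above translate directly into code. In this sketch the split values and leaf dollar averages are made-up numbers, and records that fall exactly on a split (which the slide leaves unspecified) are sent down the "less than" branch for Income and the "greater than" branch for Age.

```python
def tree_leaf(age, income, split1, split2, split3):
    """Return the leaf name for one record in the depth-2 tree."""
    if age < split1:
        return "Leaf1" if income > split2 else "Leaf2"
    return "Leaf3" if income > split3 else "Leaf4"


def tree_forecast(age, income, splits, leaf_dollar_avg):
    """Forecast = the average dollar value of the leaf the record lands in."""
    return leaf_dollar_avg[tree_leaf(age, income, *splits)]


# Hypothetical split points and leaf averages, for illustration only
splits = (40, 50000, 80000)
leaf_dollar_avg = {"Leaf1": 120.0, "Leaf2": 35.0, "Leaf3": 300.0, "Leaf4": 60.0}
print(tree_forecast(30, 60000, splits, leaf_dollar_avg))
```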
Decision Tree
• Primary Parameters to vary
  – Criterion
    • Probchisq (Default)
    • Entropy
    • Gini
  – Assessment (Decision vs. Lift)
  – Tree size (depth, leaf size, Xvalid)
Gradient Boosting (Tree Based)
Based on "Greedy Function Approximation: A Gradient Boosting Machine" by Jerome Friedman
Each new CART tree:
• Is on a 60% random sample
• Is a small, general tree
• Forecasts the error from the forecast from all previous trees summed
• May have 50 to 2,000 trees in a sequence
• Evaluate how far "back" in sequence to prune
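The bullets above can be sketched with one-split trees (stumps) on a single numeric input. This is a simplified illustration under stated assumptions, not the Enterprise Miner implementation: the shrinkage value, the stump learner, and all function names are choices made for the example. Each stump is fit on a 60% random sample of the current residuals, and `n_used` lets you evaluate a forecast that is pruned back to the first few trees.

```python
import random


def fit_stump(xs, residuals):
    """Least-squares split of one numeric input into two leaves."""
    pairs = sorted(zip(xs, residuals))
    n = len(pairs)
    total = sum(r for _, r in pairs)
    best = None
    left_sum = 0.0
    for i in range(n - 1):
        left_sum += pairs[i][1]
        if pairs[i][0] == pairs[i + 1][0]:
            continue  # cannot split between equal x values
        left_mean = left_sum / (i + 1)
        right_mean = (total - left_sum) / (n - i - 1)
        # Maximizing this quantity is equivalent to minimizing squared error
        gain = (i + 1) * left_mean ** 2 + (n - i - 1) * right_mean ** 2
        if best is None or gain > best[0]:
            threshold = (pairs[i][0] + pairs[i + 1][0]) / 2
            best = (gain, threshold, left_mean, right_mean)
    if best is None:  # all x values equal: fall back to a constant
        mean = total / n
        return (pairs[0][0], mean, mean)
    return best[1:]


def stump_value(stump, x):
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value


def boost(xs, ys, n_trees=50, shrink=0.1, sample_frac=0.6, seed=0):
    """Fit a sequence of stumps, each on the error of the summed forecast."""
    rng = random.Random(seed)
    base = sum(ys) / len(ys)
    forecast = [base] * len(ys)
    stumps = []
    n_sample = max(2, int(sample_frac * len(ys)))
    for _ in range(n_trees):
        idx = rng.sample(range(len(ys)), n_sample)
        stump = fit_stump([xs[i] for i in idx],
                          [ys[i] - forecast[i] for i in idx])
        stumps.append(stump)
        forecast = [f + shrink * stump_value(stump, x)
                    for f, x in zip(forecast, xs)]
    return base, stumps


def predict(x, base, stumps, shrink=0.1, n_used=None):
    """Forecast from the first n_used stumps (all of them if None)."""
    return base + shrink * sum(stump_value(s, x) for s in stumps[:n_used])


xs = [i / 199 for i in range(200)]
ys = [2.0 if x > 0.5 else 0.0 for x in xs]
base, stumps = boost(xs, ys)


def mse(n_used=None):
    errors = [(predict(x, base, stumps, n_used=n_used) - y) ** 2
              for x, y in zip(xs, ys)]
    return sum(errors) / len(errors)


print(mse(5), mse())
```

In practice the pruning point would be chosen by comparing this error on a validation hold-out for different values of `n_used`, rather than on the training records as printed here.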
DM Algorithms Available in Packages
# Modules per Forecasting Family in DM Software
(one row per DM software package; package names are not shown)

Regression  Lasso Reg  Decision Tree  Neural Net  Support Vector Mach  Other  TOT
2           1          0              0           0                    1      4
0           0          1              0           0                    0      1
3           0          3              3           0                    3      12
1           0          1              0           1                    1      4
0           0          4              0           0                    0      4
3           2          5              3           2                    3      18
0           0          0              0           0                    5      5
Feel Overwhelmed on Lots of Complex Algorithm Parameters? GOOD!
• A deep understanding of algorithms, math and assumptions helps significantly (heuristics)
  – e.g. typically, regression has a problem with correlated inputs because the solution calculation uses matrix inversion (if you are worried about weight sign inversion)
  – SVMs or Bayesian Nets do not have this problem, because they are solved differently.
    • They don't have a problem with correlated inputs; input selection becomes more random, but you still get a decent solution
• How can you manage the details?
  – I am glad you asked…. Moving on to the next section
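To make the matrix-inversion point concrete, here is a small sketch (an illustration with synthetic data, not part of the original talk) that solves the 2x2 normal equations for two inputs. When the inputs are nearly identical, the determinant being divided by is close to zero relative to its scale, so the individual weights become unstable and can flip sign, even though their sum stays well determined.

```python
import random


def fit_two_inputs(x1, x2, y):
    """Least squares for y ~ b1*x1 + b2*x2 (no intercept), normal equations."""
    s11 = sum(a * a for a in x1)
    s22 = sum(b * b for b in x2)
    s12 = sum(a * b for a, b in zip(x1, x2))
    t1 = sum(a * t for a, t in zip(x1, y))
    t2 = sum(b * t for b, t in zip(x2, y))
    det = s11 * s22 - s12 * s12      # determinant of X'X, the matrix inverted
    b1 = (s22 * t1 - s12 * t2) / det
    b2 = (s11 * t2 - s12 * t1) / det
    return b1, b2, det / (s11 * s22)  # last value near 0 means near-singular


rng = random.Random(0)
n = 200
x1 = [rng.gauss(0, 1) for _ in range(n)]
y = [a + 0.1 * rng.gauss(0, 1) for a in x1]          # true effect: y ~ 1.0 * x1

x2_independent = [rng.gauss(0, 1) for _ in range(n)]
x2_correlated = [a + 0.001 * rng.gauss(0, 1) for a in x1]

print(fit_two_inputs(x1, x2_independent, y))
print(fit_two_inputs(x1, x2_correlated, y))
```

In the correlated case the two weights are free to trade off against each other, which is the sign-inversion risk mentioned above.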
Model Exploration Process
• Scientific Method of Hypothesis Test
  – If you change ONE thing, then any change in the results is because of that one change
  – Design of Experiments (DOE), test plan
  – Best to compare model settings on the same data version
    • New data versions add new preprocessed fields, or new months (records)
  – Key design objective: all experiments are reproducible
    • SAME random split between Learning – Test – Validation, with a consistent random seed
      – LTV split before loading data in a tool, so same partitioning for all tools/libraries/algorithms
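One way to get the same Learning – Test – Validation partition everywhere is to assign it once, outside any modeling tool, and store it as a field on each record. The sketch below assumes each record has a unique id; the 60/20/20 fractions and the seed value are illustrative, not from the talk.

```python
import random


def ltv_split(record_ids, fractions=(0.6, 0.2, 0.2), seed=12345):
    """Map each record id to "L", "T" or "V", reproducibly."""
    ids = sorted(record_ids)              # fixed starting order
    random.Random(seed).shuffle(ids)      # same seed -> same shuffle
    n_learn = round(fractions[0] * len(ids))
    n_test = round(fractions[1] * len(ids))
    partition = {}
    for position, record_id in enumerate(ids):
        if position < n_learn:
            partition[record_id] = "L"
        elif position < n_learn + n_test:
            partition[record_id] = "T"
        else:
            partition[record_id] = "V"
    return partition


partition = ltv_split(range(1000))
counts = {label: list(partition.values()).count(label) for label in "LTV"}
print(counts)
```

Because the ids are sorted before shuffling, the assignment does not depend on the order in which records happen to be read.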
Model Notebook

Input Parameters (Param1, Param2, Param3, …) and Outcomes (Lift in Top 10%)
Bad vs. Good

DataVer  Algor   ModNum  vars offerd  var selct  Vars Seltd  Trn Time  Train  Val    Gap = Abs(Trn-Val)  Consrv Result
1        Regrsn  1       27           stepw      9           12        5.77   5.94   0.17                5.60

DataVer  Algor   ModNum  vars offerd  Hidn Nodes  Direct Conn  Arch  Vars Seltd  Trn Time  Train  Val    Gap = Abs(Trn-Val)  Consrv Result
1        Neural  1       27           3           n            MLP   all         77        6.65   10.89  4.24                2.41
1        Neural  2       27           10          n            MLP   all         40        6.88   6.73   0.15                6.58
1        Neural  3       27           10          Y            MLP   all         36        6.40   6.93   0.53                5.87
1        Neural  4       27           10          n            RBF   all         34        5.67   5.54   0.13                5.41
1        Neural  5       27           10          Y            RBF   all         35        5.95   7.92   1.97                3.98
Model Notebook Outcome Details
• My Heuristic Design Objectives: (yours may be different)
  – Accuracy in deployment
  – Reliability and consistent behavior, a general solution
    • Use one or more hold-out data sets to check consistency
    • Penalize more, as the forecast becomes less consistent
  – No penalty for model complexity (if it validates consistently)
    • Let me drive a car to work, instead of limiting me to a bike
  – Message for check writer
  – Don't consider only Occam's Razor: value consistent good results
  – Develop a "smooth, continuous metric" to sort and find models that perform "best" in future deployment

Model Notebook Outcome Details
• Training = results on the training set
• Validation = results on the validation hold out
• Gap = abs( Training – Validation )
  A bigger gap (volatility) is a bigger concern for deployment, a symptom
  Minimize Senior VP Heart attacks! (one penalty for volatility)
  Set expectations & meet expectations
  Regularization helps significantly
• Conservative Result = worst( Training, Validation ) + Gap_penalty
  Corr / Lift / Profit, higher is better: Cons Result = min(Trn, Val) - Gap
  MAD / RMSE / Risk, lower is better: Cons Result = max(Trn, Val) + Gap
Business Value or Pain ranking = function of( conservative result )
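The conservative result is simple enough to compute directly. The sketch below implements the two formulas above and ranks the six example models from the Model Notebook slide by their Train and Val lift; the function name and the `higher_is_better` flag are naming choices for this illustration.

```python
def conservative_result(train, val, higher_is_better=True):
    """Worst of Train and Val, penalized again by the gap between them."""
    gap = abs(train - val)
    if higher_is_better:                 # Corr / Lift / Profit
        return min(train, val) - gap
    return max(train, val) + gap         # MAD / RMSE / Risk


# (model, Train, Val) lift values from the Model Notebook slide
notebook = [
    ("Regrsn 1", 5.77, 5.94),
    ("Neural 1", 6.65, 10.89),
    ("Neural 2", 6.88, 6.73),
    ("Neural 3", 6.40, 6.93),
    ("Neural 4", 5.67, 5.54),
    ("Neural 5", 5.95, 7.92),
]

ranked = sorted(notebook,
                key=lambda row: conservative_result(row[1], row[2]),
                reverse=True)
for name, train, val in ranked:
    print(f"{name}: {conservative_result(train, val):.2f}")
```

Note how Neural 1 has the highest validation lift but ranks last, because its large Train/Val gap signals an inconsistent model.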
Model Notebook Process: Tracking Detail, Training the Data Miner

General column layout:
DataVer, Author, Algor, ModNum, chng from prior, Param 1 … Param 7, VarSel, Trn time (sec), then three outcome groups, each with Train, Val, Gap = Abs(Trn-Val), Consrv Result

Columns: DataVer Author Algor ModNum chngfromprior varsoffered varselectn VarSel TrnTime | Train Val Gap ConsrvResult | Train Val Gap ConsrvResult | Train Val Gap ConsrvResult
1 GM B logistic 1 0 27 stepws 10 | 12.04 8.12 3.92 4.20 | 7.59 4.85 2.74 2.11
1 GM B logistic 2 1 19 stepws 10 | 12.04 8.12 3.92 4.20 | 7.59 4.85 2.74 2.11
1 GM B logistic 3 1 6, no dbc stepws 4 | 7.51 1.98 5.53 -3.55 | 4.90 3.96 0.94 3.02 | investigate inconsistency
1 GM B logistic 4 1 13, only dbc stepws 7 | 9.58 7.33 2.25 5.08 | 6.59 5.25 1.34 3.91
Columns: DataVer Author Algor ModNum chngfromprior varsoffered regr type var selectn 2-factor interact polynom VarSel TrnTime | Train Val Gap ConsrvResult | Train Val Gap ConsrvResult | Train Val Gap ConsrvResult
1 GM regr 1 0 27 logistic stepws n 9 12 | 5.77 5.94 0.17 5.60 | 3.35 4.46 1.11 2.24 | 2.25 3.02 0.77 1.48
1 GM regr 2 1 27 logistic stepws Yes 9 16 | 5.76 5.94 0.18 5.58 | 3.35 4.46 1.11 2.24 | 2.25 3.02 0.77 1.48
1 GM regr 3 1 27 logistic stepws n 2 10 57 | 5.86 6.93 1.07 4.79 | 3.48 5.03 1.55 1.93 | 2.32 2.61 0.29 2.03
1 GM regr 4 1 27 logistic stepws Yes 2 11 58 | 5.86 6.93 1.07 4.79 | 3.48 5.04 1.56 1.92 | 2.32 2.92 0.60 1.72
4 GM regr 5 4 3 logistic stepwise Yes 2 8 63 | 12.88 13.40 0.52 12.36 | 6.65 6.89 0.24 6.41 | 3.53 3.64 0.11 3.43
4 GM regr 6 5 28 logistic stepwise Yes 2
4 GM regr 7 5 3 logistic stepwise n 2 63 | 12.88 13.40 0.52 12.36 | 6.65 6.89 0.24 6.41 | 3.53 3.64 0.11 3.43
4 GM regr 8 5 3 logistic stepwise n 1 | 12.88 13.40 0.52 12.36 | 6.65 6.89 0.24 6.41 | 3.53 3.64 0.11 3.43
4 GM regr 9 5 3 logistic stepwise Yes 1 | 12.88 13.40 0.52 12.36 | 6.65 6.89 0.24 6.41 | 3.53 3.64 0.11 3.43
4 GM regr 10 8 28 logistic stepwise n 1 | 12.88 13.40 0.52 12.36 | 6.65 6.89 0.24 6.41 | 3.53 3.64 0.11 3.43
4 GM regr 11 5 3 logistic stepwise Yes 3 6 78 | 15.98 16.06 0.08 15.89 | 8.61 8.03 0.58 7.45 | 4.81 4.39 0.41 3.98
4 GM regr 12 5 3 logistic stepwise Yes 4 2 78 | 15.98 16.06 0.08 15.89 | 8.61 8.03 0.58 7.45 | 4.81 4.39 0.41 3.98
4n GM regr 13 11 3 logistic stepwise Yes 3 6 78 | 18.39 18.79 0.39 18.00 | 9.58 9.55 0.03 9.52 | 4.96 4.92 0.03 4.89
4n GM regr 14 11 3 6 78 | 12.49 12.12 0.36 11.76 | 7.63 7.42 0.20 7.22 | 4.29 4.47 0.18 4.12

1 GM DM Regr 1 0 27 logistic stepws 13 15 | 12.00 3.17 8.83 -5.66 | 7.21 4.16 3.05 1.11 | 4.28 3.07 1.21 1.86
4 GM DM Regr 2 0 28 (max v = 3000, min rsq = 0.005, use aov16 var = YES) 6 72 | 16.27 15.76 0.52 15.24 | 8.67 8.03 0.64 7.39 | 4.58 4.24 0.34 3.90

1 GM PLS 1 0
1 GM PLS 2 1 27 default default default default 4 18 | 11.26 3.08 8.18 -5.10 | 7.12 4.85 2.27 2.58 | 4.28 3.12 1.16 1.96
1 GM PLS 3 1 Test Set Cros Val didn't finish, don't use Xvalidation
4 GM PLS 4 0 28 PLS NIPALS 200 28 122 | 16.63 15.76 0.87 14.89 | 8.93 8.03 0.90 7.13 | 4.76 4.32 0.45 3.87
Columns: DataVer Author Algor ModNum chngfromprior varsoffered hidden DirectConn? arch VarSel TrnTime | Train Val Gap ConsrvResult | Train Val Gap ConsrvResult | Train Val Gap ConsrvResult
1 GM AutoNrl 1 0 27 2 n MLP all 35 | 4.19 3.76 0.43 3.33 | 2.47 2.57 0.10 2.37 | 1.77 1.88 0.11 1.66
1 GM AutoNrl 2 1 27 6 n MLP all 189 | 4.37 2.77 1.60 1.17 | 2.82 1.78 1.04 0.74 | 1.98 1.93 0.05 1.88
1 GM AutoNrl 3 1 27 8 n MLP (trn action = search) all 532 | 0.83 0.56 0.27 0.29 | 0.83 0.56 0.27 0.29 | 0.83 0.56 0.27 0.29
1 GM AutoNrl 4 1 27 8 n MLP (activ = logistic) all 356 | 5.12 2.97 2.15 0.82 | 3.02 3.37 0.35 2.67 | 1.90 2.57 0.67 1.23
1 GM AutoNrl 5 1 27 6 n MLP (arch = block) all 130 | 0.89 0.97 0.08 0.81
1 GM AutoNrl 6 1 27 6 n MLP (arch = funnel) all 595 | 1.36 1.08 0.28 0.80
4 GM AutoNrl 7 1 28 6 n MLP all 1201 | 16.2722 15.76 0.51 15.24 | 8.65 7.88 0.77 7.11 | 4.46 4.24 0.22 4.03
Columns: DataVer Author Algor ModNum chngfromprior varsoffered hidden DirectConn? arch Decay DecisionWeight VarSel TrnTime | Train Val Gap ConsrvResult | Train Val Gap ConsrvResult | Train Val Gap ConsrvResult
1 GM Neural 1 0 27 3 n MLP all 77 | 6.65 10.89 4.24 2.41 | 3.90 6.53 2.63 1.27 | 2.52 3.96 1.44 1.08
1 GM Neural 2 1 27 10 n MLP all 40 | 6.88 6.73 0.15 6.58 | 3.97 4.55 0.58 3.39 | 2.56 3.02 0.46 2.10
1 GM Neural 3 1 27 10 Y MLP all 36 | 6.40 6.93 0.53 5.87 | 3.49 5.45 1.96 1.53 | 2.32 3.22 0.90 1.42
1 GM Neural 4 1 27 10 n RBF (orbfeq) all 34 | 5.67 5.54 0.13 5.41 | 3.25 4.85 1.60 1.65 | 2.20 3.22 1.02 1.18
1 GM Neural 5 1 27 10 Y RBF all 35 | 5.95 7.92 1.97 3.98 | 3.48 4.85 1.37 2.11 | 2.31 3.17 0.86 1.45
js1 JS Neural 6 0 17 5 n MLP Softmax 10,-5,-1,0 all | 6.03 6.53 0.50 5.53 | 3.40 4.55 1.15 2.25 | 2.67 3.36 0.69 1.98
js1 JS Neural 7 6 15 5 Y MLP Softmax 10,-5,-1,0 all | 6.14 5.74 0.40 5.34 | 3.59 2.97 0.62 2.35 | 2.77 2.37 0.40 1.97
js1 JS Neural 8 6 15 3 Y MLP Softmax 0.5 10,-5,-1,0 all | 6.27 7.13 0.86 5.41 | 3.54 3.56 0.02 3.52 | 2.74 2.57 0.17 2.40
js1 JS Neural 9 6 15 3 n MLP Softmax 0.5 10,-5,-1,0 all | 6.27 6.33 0.06 6.21 | 3.57 4.65 1.08 2.49 | 2.76 2.82 0.06 2.70
2 GM Neural 10 2 35 12 Y MLP 20,0,-1,0 all
3 GM Neural 11 2 45 20 n MLP 20,0,-1,0 all 18 | 6.26 7.76 1.50 4.76 | 3.54 4.22 0.68 2.86 | 2.18 2.46 0.28 1.91
3 GM Neural 12 11 45 20 n MLP 0.8 20,0,-1,0 all 16 | 6.26 7.76 1.50 4.76 | 3.54 4.22 0.68 2.86 | 2.18 2.46 0.28 1.91
3 GM Neural 13 11 45 20 n MLP 0.6 20,0,-1,0 all 16 | 6.26 7.76 1.50 4.76 | 3.54 4.22 0.68 2.86 | 2.18 2.46 0.28 1.91
4 GM Neural 14 11 3 20 n MLP 0.01 20,0,-1,0 all 204 | 16.39 15.15 1.24 13.91 | 8.67 8.03 0.64 7.39 | 4.82 4.39 0.43 3.97
4 GM Neural 15 11 28 20 n MLP 0.01 20,0,-1,0 all 713 | 16.39 15.76 0.63 15.12 | 8.54 7.88 0.66 7.22 | 4.40 4.25 0.15 4.11
4 GM Neural 16 15 31 40 n MLP 0.01 20,0,-1,0 all 782 | 18.02 18.18 0.16 17.86 | 9.21 9.55 0.34 8.87 | 4.60 4.77 0.17 4.44
4 GM Neural 17 15 same, max iter 20 --> 50 all 1754 | 18.02 18.18 0.16 17.86 | 9.21 9.55 0.34 8.87 | 4.66 4.77 0.11 4.55
4 GM Neural 18 16 29 (no twoYr) 40 0 0 all 18.386 | 18.98 18.18 0.80 17.38 | 9.25 9.59 0.34 8.90 | 4.67 4.86 0.20 4.47
4n GM DMNeural 19 0 13 3 n all 19 | 10.60 2.57 8.03 -5.46 | 6.93 4.36 2.57 1.79 | 4.14 2.57 1.57 1.00

Notes on the sheet:
didn't finish, out of memory
recent_serrtrn_dbc changed to recent_serrtrn_flag (does DBC on ser patt help? YES)
add Feb & Mar to recent*
same, max iter 20 --> 50
Model Notebook
Project = Transit, Last Update 5/6/2010
Column groups: Input Parameters | Outcomes: Lift in Top 5% Over File Avg | Outcomes: Lift in Top 10% Over File Avg | Outcomes: Lift in Top 20% Over File Avg | Status
Input / Test Outcome
Callouts: Regression, AutoNeural, Neural; Top 5%, Top 10%, Top 20%; Yippeee!; More
Heuristic Strategy:
1) Try a few models of many algorithm types (seed the search)
2) Opportunistically spend more effort on what is working (invest in top stocks)
3) Still try a few trials on medium success (diversify, limited by project time-box)
4) Try ensemble methods, combining model forecasts & top source vars w/ model

The Data Mining Battle Field
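Steps 2 and 3 of the strategy can be read as a budgeting rule. The sketch below is one hypothetical way to express it, not the procedure used in the talk: every algorithm family keeps a small floor of trials (diversify), and the rest of a fixed trial budget (the time-box) is split in proportion to each family's best conservative result so far. The function name, the floor, and the budget are all assumptions; the example scores are a few of the best Data Version 1 conservative results listed in the notebook.

```python
def allocate_trials(best_result, budget, floor=1):
    """Split a trial budget across algorithm families.

    best_result: best conservative result seen so far per family.
    Every family gets `floor` trials; the remainder is shared in
    proportion to the (non-negative) best results.
    """
    names = list(best_result)
    if budget < floor * len(names):
        raise ValueError("budget too small for the diversification floor")
    plan = {name: floor for name in names}
    remaining = budget - floor * len(names)
    weights = {name: max(best_result[name], 0.0) for name in names}
    total = sum(weights.values())
    if total == 0 or remaining == 0:
        return plan
    shares = {name: remaining * weights[name] / total for name in names}
    for name in names:
        plan[name] += int(shares[name])
    # Hand out trials lost to rounding, largest fractional share first
    leftover = budget - sum(plan.values())
    by_fraction = sorted(names, key=lambda n: shares[n] - int(shares[n]),
                         reverse=True)
    for name in by_fraction[:leftover]:
        plan[name] += 1
    return plan


best_so_far = {
    "Dec Tree": 11.30,
    "Grad Boost": 11.97,
    "Neural": 6.58,
    "Regression": 5.60,
    "AutoNeural": 3.33,
}
plan = allocate_trials(best_so_far, budget=20, floor=1)
print(plan)
```

Re-running the allocation after each batch of experiments lets the plan adapt as the notebook fills in.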
Model Notebook Process: Tracking Detail, Training the Data Miner

Columns: Mcnt DataVer Author Algor ModNum chngfromprior varsoffered criterion maxdepth leafsize (asses = 5% Lift) DecisionWeight VarSel TrnTime | Train Val Gap ConsrvResult | Train Val Gap ConsrvResult | Train Val Gap ConsrvResult
47 1 GM Dec Tree 1 0 27 default 6 5 20,0,-5,0 7 13 | 13.71 9.59 4.12 5.47 | 7.67 5.35 2.32 3.03 | 4.33 3.80 0.53 3.27
48 1 GM Dec Tree 2 1 27 probchisq 6 5 20,0,-5,0 7 16 | 13.71 9.59 4.12 5.47 | 7.67 5.35 2.32 3.03 | 4.33 3.80 0.53 3.27
49 1 GM Dec Tree 3 1 27 entropy 6 5 20,0,-5,0 6 16 | 13.94 12.62 1.32 11.30 | 7.49 7.09 0.40 6.69 | 4.27 4.09 0.18 3.91
50 1 GM Dec Tree 4 1 27 gini 6 5 20,0,-5,0 10 22 | 13.76 11.28 2.48 8.80 | 7.70 6.10 1.60 4.50 | 4.32 3.71 0.61 3.10
51 1 GM Dec Tree 5 3 27 entropy 12 5 20,0,-5,0 6 13 | 13.94 12.62 1.32 11.30 | 7.49 7.09 0.40 6.69 | 4.27 4.09 0.18 3.91
52 1 GM Dec Tree 6 3 27 entropy 6 10 20,0,-5,0 6 13 | 13.94 12.62 1.32 11.30 | 7.49 7.09 0.40 6.69 | 4.27 4.09 0.18 3.91
53 1 GM Dec Tree 7 3 27 entropy 6 100 20,0,-5,0 6 17 | 13.94 12.62 1.32 11.30 | 7.49 7.09 0.40 6.69 | 4.27 4.09 0.18 3.91
54 1 GM Dec Tree 8 3 27 entropy 6 100 xval = Y 20,0,-5,0 8 32 | 14.51 12.82 1.69 11.13 | 8.95 7.42 1.53 5.89 | 4.72 4.13 0.59 3.54
55 1 GM Dec Tree 9 3 27 entropy 6 5 xval = Y 20,0,-5,0 8 32 | 14.51 12.82 1.69 11.13 | 8.95 7.42 1.53 5.89 | 4.72 4.13 0.59 3.54
56 1 GM Dec Tree 10 3 27 entropy 6 5 obs import =Y 20,0,-5,0 6 17 | 13.94 12.62 1.32 11.30 | 7.49 7.09 0.40 6.69 | 4.27 4.09 0.18 3.91
57 1 GM Dec Tree 11 3 27 entropy 6 5 asses = 5% Lift 20,0,-5,0 6 12 | 13.94 12.62 1.32 11.30 | 7.49 7.09 0.40 6.69 | 4.27 4.09 0.18 3.91
58 1 GM Dec Tree 12 3 27 entropy 10 2 20,0,-5,0 6 12 | 13.94 12.62 1.32 11.30 | 7.49 7.09 0.40 6.69 | 4.27 4.09 0.18 3.91
46 2 GM Dec Tree 13 3 33 entropy 6 5 a=5% lift 20,0,-5,0 7 16 | 15.92 14.96 0.96 14.00 | 8.29 7.84 0.45 7.39 | 4.40 4.17 0.23 3.94
47 2 GM Dec Tree 14 13 33 entropy 6 5 a=5% lift 10,-2.5,-1,0 13 15 | 16.32 15.05 1.27 13.78 | 9.07 8.00 1.07 6.93 | 4.63 4.08 0.55 3.53
48 2 GM Dec Tree 15 13 33 entropy 6 5 a=5% lift 1,-1,1,-1 8 15 | 15.30 14.34 0.96 13.38 | 7.98 7.53 0.45 7.08 | 4.25 4.05 0.20 3.85
49 2 GM Dec Tree 16 13 33 entropy 6 5 a=5% lift 10,-1,1,-1 12 16 | 16.32 15.05 1.27 13.78 | 8.96 8.14 0.82 7.32 | 4.62 4.23 0.39 3.84
50 2 GM Dec Tree 17 13 33 entropy 6 5 a=5% lift 20,-5,0,0 12 15 | 16.32 15.60 0.72 14.88 | 8.79 8.26 0.53 7.73 | 4.47 4.21 0.26 3.95
51 2 GM Dec Tree 18 13 33 entropy 6 5 a=5% lift 20,-1,0,0 12 15 | 16.32 15.60 0.72 14.88 | 8.79 8.26 0.53 7.73 | 4.47 4.21 0.26 3.95
52 2 GM Dec Tree 19 13 33 entropy 6 5 a=5% lift xval = no 20,0,-1,0 6 15 | 15.87 15.52 0.35 15.17 | 8.26 8.12 0.14 7.98 | 4.40 4.32 0.08 4.24
53 2 GM Dec Tree 20 13 33 entropy 6 5 a=5% lift 20,-5,-1,1 12 16 | 16.32 15.05 1.27 13.78 | 8.96 8.14 0.82 7.32 | 4.62 4.23 0.39 3.84
54 2 GM Dec Tree 21 13 33 entropy 6 5 a=5% lift xval = no 20,0,0,1 9 16 | 16.17 15.57 0.60 14.97 | 8.74 8.25 0.49 7.76 | 4.44 4.21 0.23 3.98
55 2 GM Dec Tree 22 19 33 gini 6 5 a=5% lift 20,0,-1,0 8 16 | 15.17 13.17 2.00 11.17 | 8.02 7.32 0.70 6.62 | 4.40 4.26 0.14 4.12
56 2 GM Dec Tree 23 19 33 probchisq 6 5 a=5% lift 20,0,-1,0 8 16 | 15.17 13.17 2.00 11.17 | 8.02 7.32 0.70 6.62 | 4.40 4.26 0.14 4.12
57 2 GM Dec Tree 24 19 33 entropy 20 5 a=5% lift 20,0,-1,0 19 26 | 18.94 15.42 3.52 11.90 | 9.67 7.78 1.89 5.89 | 4.90 4.06 0.84 3.22
58 2 GM Dec Tree 25 19 33 entropy 20 20 a=5% lift 20,0,-1,0 19 26 | 18.94 13.80 5.14 8.66 | 9.67 7.78 1.89 5.89 | 4.90 4.06 0.84 3.22
59 2 GM Dec Tree 26 19 33 entropy 20 40 a=5% lift 20,0,-1,0 7 27 | 16.06 15.29 0.77 14.52 | 8.36 8.00 0.36 7.64 | 4.41 4.23 0.18 4.05
60 2 GM Dec Tree 27 19 33 entropy 20 60 a=5% lift 20,0,-1,0 7 27 | 16.06 15.29 0.77 14.52 | 8.36 8.00 0.36 7.64 | 4.41 4.23 0.18 4.05
61 2 GM Dec Tree 28 19 33 entropy 7 5 a=5% lift 20,0,-1,0 10 33 | 16.73 14.57 2.16 12.41 | 8.90 7.75 1.15 6.60 | 4.60 4.06 0.54 3.52
62 2 GM Dec Tree 29 19 33 entropy 7 10 a=5% lift 20,0,-1,0 10 33 | 16.73 14.57 2.16 12.41 | 8.90 7.75 1.15 6.60 | 4.60 4.06 0.54 3.52
63 2 GM Dec Tree 30 19 33 entropy 7 20 a=5% lift 20,0,-1,0 7 37 | 16.04 14.66 1.38 13.28 | 8.35 7.69 0.66 7.03 | 4.41 4.07 0.34 3.73
64 2 GM Dec Tree 31 19 35 entropy 7 40 a=5% lift (itmledratio, itm_to_led) 20,0,-1,0 7 36 | 15.90 15.36 0.54 14.82 | 8.28 8.03 0.25 7.78 | 4.40 4.27 0.13 4.14
65 2 GM Dec Tree 32 19 35 entropy 7 60 a=5% lift 20,0,-1,0 6 35 | 15.90 15.36 0.54 14.82 | 8.28 8.03 0.25 7.78 | 4.40 4.27 0.13 4.14
66 2 GM Dec Tree 33 19 35 entropy 7 80 a=5% lift 20,0,-1,0 6 35 | 15.90 15.36 0.54 14.82 | 8.28 8.03 0.25 7.78 | 4.40 4.27 0.13 4.14
67 2 GM Dec Tree 34 19 35 entropy 7 100 a=5% lift 20,0,-1,0 6 35 | 15.90 15.36 0.54 14.82 | 8.28 8.03 0.25 7.78 | 4.40 4.27 0.13 4.14
68 2 GM Dec Tree 35 19 35 entropy 7 150 a=5% lift 20,0,-1,0 5 37 | 14.53 13.08 1.45 11.63 | 7.75 7.19 0.56 6.63 | 4.36 4.29 0.07 4.22
64 2 GM Dec Tree 36 19 35 entropy 6 5 a=5% lift 20,0,-1,0 7 29 | 15.91 14.95 0.96 13.99 | 8.29 7.83 0.46 7.37 | 4.40 4.17 0.23 3.94
(ex=20k, node smp = 30k)
65 2 GM Dec Tree 37 19 14, raw only entropy 6 5 a=5% lift 0 20,0,-1,0 7 16 | 13.92 11.81 2.11 9.69 | 7.46 6.54 0.93 5.61 | 4.24 3.91 0.33 3.57
5.28 2.15 0.41
66 3 GM Dec Tree 38 19 45 entropy 8 5 a=5% lift xval = no 20,0,-5,1 3 39 | 13.41 15.52 2.11 11.30 | 7.50 8.47 0.97 6.54 | 4.01 4.44 0.43 3.58
67 3 GM Dec Tree 39 38 45 gini 8 5 a=5% lift xval = no 20,0,-5,1 3 71 | 13.41 15.52 2.11 11.30 | 7.50 8.47 0.97 6.54 | 4.01 4.44 0.43 3.58
68 3 GM Dec Tree 40 38 45 propchi 8 5 a=5% lift xval = no 20,0,-5,1 3 42 | 13.41 15.52 2.11 11.30 | 7.50 8.47 0.97 6.54 | 4.01 4.44 0.43 3.58
69 3 GM Dec Tree 41 38 45 entropy 20 5 a=5% lift subtr= 20,0,-5,1 33 91 | 20.00 14.81 5.19 9.61 | 10.00 7.54 2.46 5.08 | 5.00 3.90 1.10 2.80
70 3 GM Dec Tree 42 38 45 entropy 20 100 a=5% lift sub=lrg 20,0,-5,1 25 70 | 19.09 16.25 2.84 13.42 | 10.00 8.17 1.83 6.35 | 5.00 4.19 0.81 3.38
71 3 GM Dec Tree 43 38 45 entropy 20 200 a=5% lift sub=lrg 20,0,-5,1 23 64 | 17.67 16.67 1.01 15.66 | 9.81 8.54 1.27 7.28 | 5.00 4.34 0.66 3.67
72 3 GM Dec Tree 44 38 45 entropy 20 400 a=5% lift sub=lrg 20,0,-5,1 21 59 | 15.87 17.08 1.21 14.67 | 9.02 8.96 0.06 8.89 | 4.97 4.69 0.28 4.41
73 3 GM Dec Tree 45 38 45 entropy 20 800 a=5% lift sub=lrg 20,0,-5,1 16 52 | 14.35 16.16 1.81 12.53 | 8.46 8.96 0.50 7.96 | 4.78 4.79 0.01 4.78
74 3 GM Dec Tree 46 38 45 entropy 20 1600 a=5% lift sub=lrg 20,0,-5,1 16 47 | 14.25 16.02 1.78 12.47 | 8.26 8.59 0.34 7.92 | 4.58 4.42 0.17 4.25
75 3 GM Dec Tree 47 38 45 entropy 20 3200 a=5% lift sub=lrg 20,0,-5,1 10 39 | 12.45 14.35 1.91 10.54 | 7.49 8.31 0.82 6.67 | 4.36 4.48 0.12 4.24
76 3 GM Dec Tree 48 43 45 entropy 20 150 a=5% lift sub=lrg 20,0,-5,1 23 68 | 18.57 16.25 2.32 13.93 | 10.00 8.14 1.86 6.27 | 5.00 4.17 0.83 3.34
77 3 GM Dec Tree 49 43 45 entropy 20 300 a=5% lift sub=lrg 20,0,-5,1 23 62 | 16.45 17.86 1.41 15.03 | 9.31 8.96 0.35 8.61 | 5.00 4.60 0.40 4.20
78 3 GM Dec Tree 50 43 45 entropy 20 250 a=5% lift sub=lrg 20,0,-5,1 24 65 | 16.64 17.71 1.07 15.57 | 9.56 8.96 0.60 8.36 | 5.00 4.61 0.39 4.21
79 3 GM Dec Tree 51 43 45 entropy 20 350 a=5% lift sub=lrg 20,0,-5,1 24 67 | 16.07 17.50 1.43 14.64 | 9.19 8.96 0.23 8.73 | 5.00 4.59 0.41 4.18
80 3 GM Dec Tree 52 43 45 entropy 20 225 a=5% lift sub=lrg 20,0,-5,1 23 63 | 17.85 16.67 1.18 15.49 | 9.83 8.96 0.87 8.09 | 5.00 4.53 0.48 4.05
81 3 GM Dec Tree 53 43 45 entropy 20 175 a=5% lift sub=lrg 20,0,-5,1 26 68 | 18.15 16.25 1.90 14.35 | 9.97 8.13 1.84 6.28 | 5.00 4.16 0.84 3.32
82 3 GM Dec Tree 54 43 45 entropy 20 200 a=5% lift sub=lrg 20,0,-5.0 23 65 | 17.67 16.67 1.01 15.66 | 9.81 8.54 1.27 7.28 | 5.00 4.34 0.66 3.67
83 3 GM Dec Tree 55 43 45 entropy 20 200 a=5% lift sub=lrg 20,0,-1,0 23 65 | 17.67 16.67 1.01 15.66 | 9.81 8.54 1.27 7.28 | 5.00 4.34 0.66 3.67
84 3 GM Dec Tree 56 43 45 entropy 20 200 a=5% lift sub=lrg 20,-5,0,0 23 65 | 17.67 16.67 1.01 15.66 | 9.81 8.54 1.27 7.28 | 5.00 4.34 0.66 3.67
85 4 GM Dec Tree 57 43 146 entropy 20 200 a=5% lift sub=lrg 20,0,-5,1 9 149 | 20.00 14.09 5.91 8.19 | 10.00 7.20 2.80 4.40 | 5.00 3.76 1.24 2.51
86 4 GM Dec Tree 58 57 107 18 115 | 20.00 16.09 3.91 12.18 | 10.00 8.15 1.85 6.29 | 5.00 4.18 0.82 3.35
87 4 GM Dec Tree 59 57 107 entropy 20 500 a=5% lift sub=lrg 20,0,-5,1 13 110 | 19.46 14.79 4.68 10.11 | 10.00 7.64 2.36 5.29 | 5.00 3.95 1.05 2.91
88 4 GM Dec Tree 60 57 107 entropy 20 1000 a=5% lift sub=lrg 20,0,-5,1 10 89 | 18.94 14.47 4.47 10.00 | 10.00 7.44 2.56 4.88 | 5.00 3.86 1.14 2.73
89 4 GM Dec Tree 61 57 107 entropy 20 2000 a=5% lift sub=lrg 20,0,-5,1 7 81 | 14.41 13.91 0.50 13.41 | 9.54 8.02 1.51 6.51 | 6.61 4.25 2.36 1.90
90 4 GM Dec Tree 62 57 107 entropy 20 3000 a=5% lift sub=lrg 20,0,-5,1 5 71 | 9.89 7.91 1.98 5.94 | 8.74 6.39 2.35 4.04 | 5.00 3.70 1.30 2.40
91 4 GM Dec Tree 63 57 107 entropy 20 1500 a=5% lift sub=lrg 20,0,-5,1 9 60 | 16.17 14.66 1.50 13.16 | 9.89 8.18 1.71 6.47 | 5.00 3.38 1.62 1.76
92 4 GM Dec Tree 64 57 107 entropy 20 1750 a=5% lift sub=lrg 20,0,-5,1 7 60 | 15.23 14.32 0.92 13.40 | 9.68 8.07 1.61 6.46 | 5.00 4.26 0.75 3.51
93 4 GM Dec Tree 65 57 107 entropy 20 2250 a=5% lift sub=lrg 20,0,-5,1 5 60 | 15.43 11.00 4.43 6.56 | 9.55 6.30 3.25 3.05 | 5.00 3.70 1.30 2.40
94 4 GM Dec Tree 66 61 58 entropy 20 2000 a=5% lift sub=lrg 20,0,-5,1 8 105 | 14.07 13.92 0.15 13.77 | 8.45 7.88 0.57 7.30 | 4.74 4.02 0.73 3.29
95 4 GM Dec Tree 67 61 80 entropy 20 2000 a=5% lift sub=lrg 20,0,-5,1 8 97 | 14.25 13.94 0.30 13.64 | 9.25 7.88 1.37 6.51 | 5.00 4.25 0.75 3.49
96 4 GM Dec Tree 68 61 103 entropy 20 2000 a=5% lift sub=lrg 20,0,-5,1 7 103 | 14.41 13.72 0.69 13.03 | 9.54 8.02 1.52 6.50 | 5.00 4.25 0.75 3.50
97 4n GM Dec Tree 69 61 3 entropy 20 2000 a=5% lift sub=lrg 20,0,-5,0 7 | 14.61 15.54 0.93 13.68 | 8.83 8.99 0.16 8.67 | 4.88 4.73 0.15 4.58
98 4n GM Dec Tree 70 0 20 entropy 20 2000 a=5% lift sub=lrg 20,0,-5,0 10 | 11.50 11.12 0.38 10.74 | 7.08 7.29 0.21 6.87 | 4.24 3.94 0.30 3.64

use RAW vars ONLY, to test value of my preprocessing
Columns: Mcnt DataVer Author Algor ModNum chngfromprior binarymodel cleanupmodel maxnumrips VarSel TrnTime | Train Val Gap ConsrvResult | Train Val Gap ConsrvResult | Train Val Gap ConsrvResult
94 1 GM Rule Ind 1 0 tree neural 16 32 | 10.77 9.92 0.85 9.07 | 6.28 5.60 0.68 4.92 | 3.35 3.09 0.26 2.83
95 1 GM Rule Ind 2 1 regr neural 16 36 | 5.95 7.52 1.57 4.38 | 3.55 4.85 1.30 2.25 | 2.35 3.17 0.82 1.53
96 1 GM Rule Ind 3 1 neural tree 16 121 | 5.95 7.92 1.97 3.98 | 3.52 5.64 2.12 1.40 | 2.34 3.31 0.97 1.37
97 1 GM Rule Ind 4 3 neural tree 4 121 | 5.95 7.92 1.97 3.98 | 3.52 5.64 2.12 1.40 | 2.34 3.31 0.97 1.37
98 1 GM Rule Ind 5 3 neural tree 32 121 | 5.95 7.92 1.97 3.98 | 3.53 5.64 2.11 1.42 | 2.34 3.32 0.98 1.36
99 1 GM Rule Ind 6 1 tree neural 32 32 | 7.25 5.26 1.99 3.27 | 6.45 5.17 1.28 3.89 | 3.43 3.09 0.34 2.75
100 1 GM Rule Ind 7 1 tree neural 4 32 | 7.25 5.26 1.99 3.27 | 6.45 5.17 1.28 3.89 | 3.43 3.09 0.34 2.75
101 3 GM Rule Ind 8 1 tree neural 16 107 | 12.29 14.45 2.16 10.13 | 6.35 7.37 1.02 5.32 | 3.38 3.83 0.45 2.92
102 3 GM Rule Ind 9 1 tree neural 8 107 | 12.29 14.45 2.16 10.13 | 6.35 7.37 1.02 5.32 | 3.38 3.83 0.45 2.92
103 3 GM Rule Ind 10 1 tree neural 32 107 | 12.29 14.45 2.16 10.13 | 6.35 7.37 1.02 5.32 | 3.38 3.83 0.45 2.92
Columns: Mcnt DataVer Author Algor ModNum chngfromprior N Iter Shrink maxdepth Perf Huber M-Regr DecisionWeight VarSel TrnTime | Train Val Gap ConsrvResult | Train Val Gap ConsrvResult | Train Val Gap ConsrvResult
(Perf column, models 9 to 11: "stop investigating")
104 1 GM Grad Boost 1 0 50 0.10 2 disk No 20,0,-5,0 6 12 | 11.42 12.61 1.19 10.23 | 7.57 6.96 0.61 6.35 | 4.29 4.00 0.29 3.71
105 1 GM Grad Boost 2 1 50 0.10 2 RAM No 20,0,-5,0 6 59 | 11.42 12.61 1.19 10.23 | 7.57 6.96 0.61 6.35 | 4.29 4.00 0.29 3.71
106 1 GM Grad Boost 3 1 200 0.10 2 disk No 20,0,-5,0 19 429 | 13.77 12.67 1.10 11.57 | 8.28 7.23 1.05 6.18 | 4.53 4.06 0.47 3.59
107 1 GM Grad Boost 4 1 50 0.10 4 disk No 20,0,-5,0 18 139 | 11.69 12.67 0.98 10.71 | 7.67 6.86 0.81 6.05 | 4.30 4.01 0.29 3.72
108 1 GM Grad Boost 5 1 200 0.10 4 disk No 20,0,-5,0 13 401 | 11.69 12.67 0.98 10.71 | 7.67 6.86 0.81 6.05 | 4.30 4.01 0.29 3.72
109 1 GM Grad Boost 6 1 1000 0.10 2 disk No 20,0,-5,0 13 2003 | 11.60 12.67 1.07 10.53 | 7.67 6.86 0.81 6.05 | 4.30 4.01 0.29 3.72
110 1 GM Grad Boost 7 3 200 0.05 2 disk No 20,0,-5,0 16 367 | 13.49 12.67 0.82 11.85 | 7.85 7.33 0.52 6.81 | 4.49 4.11 0.38 3.73
111 1 GM Grad Boost 8 3 200 0.20 2 disk No 20,0,-5,0 13 263 | 13.49 12.48 1.01 11.47 | 7.80 7.43 0.37 7.06 | 4.53 4.08 0.45 3.63
112 1 GM Grad Boost 9 3 200 0.02 2 No 20,0,-5,0 10 285 | 11.26 12.67 1.41 9.85 | 7.53 7.08 0.45 6.63 | 4.42 4.12 0.30 3.82
113 1 GM Grad Boost 10 3 200 0.07 2 No 20,0,-5,0 19 410 | 13.77 12.87 0.90 11.97 | 7.91 7.23 0.68 6.55 | 4.51 4.11 0.40 3.71
114 1 GM Grad Boost 11 10 400 0.07 2 No 20,0,-5,0 19 618 | 13.77 12.87 0.90 11.97 | 7.91 7.23 0.68 6.55 | 4.51 4.11 0.40 3.71
115 1 GM Grad Boost 12 10 400 0.07 2 Huber .6 20,0,-5,0 17 569 | 13.46 12.28 1.18 11.10 | 8.13 7.43 0.70 6.73 | 4.45 4.06 0.39 3.67
116 1 GM Grad Boost 13 10 200 0.07 2 Huber .8 20,0,-5,0 17 364 | 13.46 12.28 1.18 11.10 | 8.13 7.43 0.70 6.73 | 4.45 4.06 0.39 3.67
117 1 GM Grad Boost 14 10 200 0.07 2 Huber 1 20,0,-5,0 17 366 | 13.46 12.28 1.18 11.10 | 8.13 7.43 0.70 6.73 | 4.45 4.06 0.39 3.67
Columns: Mcnt DataVer Author Algor ModNum chngfromprior N Iter Shrink maxdepth Leaf Fraction Huber M-Regr DecisionWeight VarSel TrnTime | Train Val Gap ConsrvResult | Train Val Gap ConsrvResult | Train Val Gap ConsrvResult
118 JS 1 JS Grad Boost 15 1 50 0.10 2 0.05 No 10,-2.5,-1,0 ? ? | 13.22 13.77 0.55 12.67 | 8.14 7.64 0.50 7.14 | 4.41 4.44 0.03 4.38
119 JS 1 JS Grad Boost 16 15 100 0.08 2 0.05 No 10,-2.5,-1,0 ? ? | 16.27 13.33 2.94 10.39 | 8.47 7.33 1.14 6.19 | 4.49 4.22 0.27 3.95
120 JS 1 JS Grad Boost 17 15 100 0.15 2 0.05 No 10,-2.5,-1,0 ? ? | 15.93 13.33 2.60 10.73 | 9.15 7.78 1.37 6.41 | 4.54 4.29 0.25 4.04
121 JS 1 JS Grad Boost 18 15 100 0.20 2 0.05 No 10,-2.5,-1,0 ? ? | 15.93 13.77 2.16 11.61 | 8.57 8.22 0.35 7.87 | 4.53 4.44 0.09 4.35
122 2 GM Grad Boost 19 10 400 0.07 2 default No 20,0,-5,0 23 1717 | 16.09 14.79 1.30 13.49 | 8.84 8.13 0.71 7.42 | 4.56 3.50 1.06 2.44
123 2 GM Grad Boost 20 15 50 0.10 2 0.05 No 10,-2.5,-1,0 14 239 | 14.31 13.75 0.56 13.19 | 8.36 7.71 0.65 7.06 | 4.53 4.43 0.10 4.33
124 2 GM Grad Boost 21 10 200 0.10 2 0.1 (ex=20k, node smp = 30k) 20,0,-1,0 19 818 | 15.38 14.38 1.00 13.38 | 8.62 7.81 0.81 7.00 | 4.44 4.38 0.06 4.32
125 2 GM Grad Boost 22 19 50 0.10 3 0.05 20,0,-5,0 21 355 | 14.93 13.96 0.97 12.99 | 8.44 8.02 0.42 7.60 | 4.56 4.43 0.13 4.30
126 2 GM Grad Boost 23 19 500 0.10 3 0.05 20,0,-5,0 28 4058 | 17.33 15.42 1.91 13.51 | 9.11 8.54 0.57 7.97 | 4.71 4.53 0.18 4.35
127 3 GM Grad Boost 24 23 200 0.07 3 0.05 20,0,-1,0 20 2915 | 13.83 16.67 2.84 10.99 | 7.36 8.54 1.18 6.18 | 4.01 4.58 0.58 3.43
128 3 GM Grad Boost 25 23 500 0.07 3 0.05 20,0,-5,0 21 8012 | 14.10 16.25 2.15 11.94 | 7.66 9.38 1.72 5.94 | 4.08 4.69 0.61 3.47
129 3 GM Grad Boost 26 23 1000 0.07 3 0.05 20,0,-5,0 22 14243 | 14.10 16.25 2.15 11.96 | 7.66 9.38 1.72 5.94 | 4.10 4.79 0.69 3.42
130 3 GM Grad Boost 27 23 200 0.10 3 0.05 20,0,-5,0 19 3100 | 13.88 16.67 2.78 11.10 | 7.58 8.85 1.28 6.30 | 3.98 4.69 0.71 3.27
131 3 GM Grad Boost 28 23 200 0.15 3 0.05 20,0,-5,0 19 3088 | 13.88 15.83 1.95 11.94 | 7.52 8.75 1.23 6.29 | 4.02 4.69 0.67 3.35
132 3 GM Grad Boost 29 23 200 0.20 3 0.05 20,0,-5,0 17 2570 | 14.16 16.25 2.09 12.07 | 7.44 9.17 1.73 5.71 | 4.03 4.69 0.66 3.37
133 3 GM Grad Boost 30 23 200 0.25 3 0.05 20,0,-5,0 17 2798 | 14.16 16.25 2.09 12.07 | 7.63 9.17 1.54 6.10 | 4.16 4.90 0.74 3.42
134 3 GM Grad Boost 31 23 200 0.30 3 0.05 20,0,-5,0 17 2882 | 14.38 17.08 2.70 11.68 | 7.69 8.96 1.27 6.41 | 4.27 4.90 0.62 3.65
135 3 GM Grad Boost 32 23 600 0.20 3 0.05 20,0,-5,0 27 10956 | 15.70 17.08 1.38 14.32 | 8.29 9.58 1.29 7.00 | 4.44 5.00 0.57 3.87
136 4 GM Grad Boost 33 32 50 0.20 3 0.05 20,0,-5,0 1 852 | 0.44 0.49 0.05 0.39 | 0.44 0.49 0.05 0.39 | 0.44 0.49 0.05 0.39
137 4 GM Grad Boost 34 32 1
138 4 GM Grad Boost 35 32 15 980 | 14.62 13.94 0.68 13.26 | 8.14 7.42 0.71 6.71 | 4.22 4.03 0.19 3.84

Notes on the sheet:
(tree settings the same, dropped INT* categorical vars, not DBC)
(same Grad Boost settings. Exclude 3 & 4 way INT*dbc vars, like the same bad results, 1 var =
(same Grad Boost settings. Exclude all INT*dbc vars, using just 62)
improvement gain in Conservative Lift from new variables (vs. DecTree-d2-m19)
interactions are getting selected, improve Trn results but decrease Val results. Perhaps I should regen the INT*dbc with a larger number of min records.
Decision Tree, Data Version 1
More
Decision Tree, Data Version 2
Decision Tree, Data Version 3
Decision Tree, Data Version 4

"Agile Software Design"
Get something simple, fully working and tested early on (Data Version 1)

Data Version 2…4
Working, incremental improvements
Incremental complexity
Different preprocessing
Add more fields, records
Add & test more complexity
Model Notebook ProcessTracking Detail Training the Data Miner
Mcnt
DataVer
Author
AlgorModNum
chngfromprior
varsoffered
criterionmax
depthleaf size
asses =5% Lift
DecisionWeight
VarSel
TrnTime
Train Val GapConsrvResult
Train Val GapConsrvResult
Train Val GapConsrvResult
47 1 GM Dec Tree 1 0 27 default 6 5 20,0,-5,0 7 13 13.71 9.59 4.12 5.47 7.67 5.35 2.32 3.03 4.33 3.80 0.53 3.2748 1 GM Dec Tree 2 1 27 probchisq 6 5 20,0,-5,0 7 16 13.71 9.59 4.12 5.47 7.67 5.35 2.32 3.03 4.33 3.80 0.53 3.2749 1 GM Dec Tree 3 1 27 entropy 6 5 20,0,-5,0 6 16 13.94 12.62 1.32 11.30 7.49 7.09 0.40 6.69 4.27 4.09 0.18 3.9150 1 GM Dec Tree 4 1 27 gini 6 5 20,0,-5,0 10 22 13.76 11.28 2.48 8.80 7.70 6.10 1.60 4.50 4.32 3.71 0.61 3.1051 1 GM Dec Tree 5 3 27 entropy 12 5 20,0,-5,0 6 13 13.94 12.62 1.32 11.30 7.49 7.09 0.40 6.69 4.27 4.09 0.18 3.9152 1 GM Dec Tree 6 3 27 entropy 6 10 20,0,-5,0 6 13 13.94 12.62 1.32 11.30 7.49 7.09 0.40 6.69 4.27 4.09 0.18 3.9153 1 GM Dec Tree 7 3 27 entropy 6 100 20,0,-5,0 6 17 13.94 12.62 1.32 11.30 7.49 7.09 0.40 6.69 4.27 4.09 0.18 3.9154 1 GM Dec Tree 8 3 27 entropy 6 100 xval = Y 20,0,-5,0 8 32 14.51 12.82 1.69 11.13 8.95 7.42 1.53 5.89 4.72 4.13 0.59 3.5455 1 GM Dec Tree 9 3 27 entropy 6 5 xval = Y 20,0,-5,0 8 32 14.51 12.82 1.69 11.13 8.95 7.42 1.53 5.89 4.72 4.13 0.59 3.54
56 1 GM Dec Tree 10 3 27 entropy 6 5obs
import =Y
20,0,-5,0 6 17 13.94 12.62 1.32 11.30 7.49 7.09 0.40 6.69 4.27 4.09 0.18 3.91
57 1 GM Dec Tree 11 3 27 entropy 6 5asses =5% Lift
20,0,-5,0 6 12 13.94 12.62 1.32 11.30 7.49 7.09 0.40 6.69 4.27 4.09 0.18 3.91
58 1 GM Dec Tree 12 3 27 entropy 10 2 20,0,-5,0 6 12 13.94 12.62 1.32 11.30 7.49 7.09 0.40 6.69 4.27 4.09 0.18 3.91
46 2 GM Dec Tree 13 3 33 entropy 6 5 a=5% lift 20,0,-5,0 7 16 15.92 14.96 0.96 14.00 8.29 7.84 0.45 7.39 4.40 4.17 0.23 3.94
47 2 GM Dec Tree 14 13 33 entropy 6 5 a=5% lift 10,-2.5,-1,0 13 15 16.32 15.05 1.27 13.78 9.07 8.00 1.07 6.93 4.63 4.08 0.55 3.53
48 2 GM Dec Tree 15 13 33 entropy 6 5 a=5% lift 1,-1,1,-1 8 15 15.30 14.34 0.96 13.38 7.98 7.53 0.45 7.08 4.25 4.05 0.20 3.85
49 2 GM Dec Tree 16 13 33 entropy 6 5 a=5% lift 10,-1,1,-1 12 16 16.32 15.05 1.27 13.78 8.96 8.14 0.82 7.32 4.62 4.23 0.39 3.84
50 2 GM Dec Tree 17 13 33 entropy 6 5 a=5% lift 20,-5,0,0 12 15 16.32 15.60 0.72 14.88 8.79 8.26 0.53 7.73 4.47 4.21 0.26 3.95
51 2 GM Dec Tree 18 13 33 entropy 6 5 a=5% lift 20,-1,0,0 12 15 16.32 15.60 0.72 14.88 8.79 8.26 0.53 7.73 4.47 4.21 0.26 3.95
52 2 GM Dec Tree 19 13 33 entropy 6 5 a=5% lift xval = no 20,0,-1,0 6 15 15.87 15.52 0.35 15.17 8.26 8.12 0.14 7.98 4.40 4.32 0.08 4.24
53 2 GM Dec Tree 20 13 33 entropy 6 5 a=5% lift 20,-5,-1,1 12 16 16.32 15.05 1.27 13.78 8.96 8.14 0.82 7.32 4.62 4.23 0.39 3.84
54 2 GM Dec Tree 21 13 33 entropy 6 5 a=5% lift xval = no 20,0,0,1 9 16 16.17 15.57 0.60 14.97 8.74 8.25 0.49 7.76 4.44 4.21 0.23 3.98
55 2 GM Dec Tree 22 19 33 gini 6 5 a=5% lift 20,0,-1,0 8 16 15.17 13.17 2.00 11.17 8.02 7.32 0.70 6.62 4.40 4.26 0.14 4.12
56 2 GM Dec Tree 23 19 33 probchisq 6 5 a=5% lift 20,0,-1,0 8 16 15.17 13.17 2.00 11.17 8.02 7.32 0.70 6.62 4.40 4.26 0.14 4.12
57 2 GM Dec Tree 24 19 33 entropy 20 5 a=5% lift 20,0,-1,0 19 26 18.94 15.42 3.52 11.90 9.67 7.78 1.89 5.89 4.90 4.06 0.84 3.22
58 2 GM Dec Tree 25 19 33 entropy 20 20 a=5% lift 20,0,-1,0 19 26 18.94 13.80 5.14 8.66 9.67 7.78 1.89 5.89 4.90 4.06 0.84 3.22
59 2 GM Dec Tree 26 19 33 entropy 20 40 a=5% lift 20,0,-1,0 7 27 16.06 15.29 0.77 14.52 8.36 8.00 0.36 7.64 4.41 4.23 0.18 4.05
60 2 GM Dec Tree 27 19 33 entropy 20 60 a=5% lift 20,0,-1,0 7 27 16.06 15.29 0.77 14.52 8.36 8.00 0.36 7.64 4.41 4.23 0.18 4.05
61 2 GM Dec Tree 28 19 33 entropy 7 5 a=5% lift 20,0,-1,0 10 33 16.73 14.57 2.16 12.41 8.90 7.75 1.15 6.60 4.60 4.06 0.54 3.52
62 2 GM Dec Tree 29 19 33 entropy 7 10 a=5% lift 20,0,-1,0 10 33 16.73 14.57 2.16 12.41 8.90 7.75 1.15 6.60 4.60 4.06 0.54 3.52
63 2 GM Dec Tree 30 19 33 entropy 7 20 a=5% lift 20,0,-1,0 7 37 16.04 14.66 1.38 13.28 8.35 7.69 0.66 7.03 4.41 4.07 0.34 3.73
64 2 GM Dec Tree 31 19 35 entropy 7 40 a=5% lift itmledratio itm_to_led 20,0,-1,0 7 36 15.90 15.36 0.54 14.82 8.28 8.03 0.25 7.78 4.40 4.27 0.13 4.14
65 2 GM Dec Tree 32 19 35 entropy 7 60 a=5% lift 20,0,-1,0 6 35 15.90 15.36 0.54 14.82 8.28 8.03 0.25 7.78 4.40 4.27 0.13 4.14
66 2 GM Dec Tree 33 19 35 entropy 7 80 a=5% lift 20,0,-1,0 6 35 15.90 15.36 0.54 14.82 8.28 8.03 0.25 7.78 4.40 4.27 0.13 4.14
67 2 GM Dec Tree 34 19 35 entropy 7 100 a=5% lift 20,0,-1,0 6 35 15.90 15.36 0.54 14.82 8.28 8.03 0.25 7.78 4.40 4.27 0.13 4.14
68 2 GM Dec Tree 35 19 35 entropy 7 150 a=5% lift 20,0,-1,0 5 37 14.53 13.08 1.45 11.63 7.75 7.19 0.56 6.63 4.36 4.29 0.07 4.22
64 2 GM Dec Tree 36 19 35 entropy 6 5 a=5% lift 20,0,-1,0 7 29 15.91 14.95 0.96 13.99 8.29 7.83 0.46 7.37 4.40 4.17 0.23 3.94
ex=20k node smp = 30k
65 2 GM Dec Tree 37 19 14, raw only entropy 6 5 a=5% lift 0 20,0,-1,0 7 16 13.92 11.81 2.11 9.69 7.46 6.54 0.93 5.61 4.24 3.91 0.33 3.57
5.28 2.15 0.41
66 3 GM Dec Tree 38 19 45 entropy 8 5 a=5% lift xval = no 20,0,-5,1 3 39 13.41 15.52 2.11 11.30 7.50 8.47 0.97 6.54 4.01 4.44 0.43 3.58
67 3 GM Dec Tree 39 38 45 gini 8 5 a=5% lift xval = no 20,0,-5,1 3 71 13.41 15.52 2.11 11.30 7.50 8.47 0.97 6.54 4.01 4.44 0.43 3.58
68 3 GM Dec Tree 40 38 45 propchi 8 5 a=5% lift xval = no 20,0,-5,1 3 42 13.41 15.52 2.11 11.30 7.50 8.47 0.97 6.54 4.01 4.44 0.43 3.58
69 3 GM Dec Tree 41 38 45 entropy 20 5 a=5% lift subtr= 20,0,-5,1 33 91 20.00 14.81 5.19 9.61 10.00 7.54 2.46 5.08 5.00 3.90 1.10 2.80
70 3 GM Dec Tree 42 38 45 entropy 20 100 a=5% lift sub=lrg 20,0,-5,1 25 70 19.09 16.25 2.84 13.42 10.00 8.17 1.83 6.35 5.00 4.19 0.81 3.38
71 3 GM Dec Tree 43 38 45 entropy 20 200 a=5% lift sub=lrg 20,0,-5,1 23 64 17.67 16.67 1.01 15.66 9.81 8.54 1.27 7.28 5.00 4.34 0.66 3.67
72 3 GM Dec Tree 44 38 45 entropy 20 400 a=5% lift sub=lrg 20,0,-5,1 21 59 15.87 17.08 1.21 14.67 9.02 8.96 0.06 8.89 4.97 4.69 0.28 4.41
73 3 GM Dec Tree 45 38 45 entropy 20 800 a=5% lift sub=lrg 20,0,-5,1 16 52 14.35 16.16 1.81 12.53 8.46 8.96 0.50 7.96 4.78 4.79 0.01 4.78
74 3 GM Dec Tree 46 38 45 entropy 20 1600 a=5% lift sub=lrg 20,0,-5,1 16 47 14.25 16.02 1.78 12.47 8.26 8.59 0.34 7.92 4.58 4.42 0.17 4.25
75 3 GM Dec Tree 47 38 45 entropy 20 3200 a=5% lift sub=lrg 20,0,-5,1 10 39 12.45 14.35 1.91 10.54 7.49 8.31 0.82 6.67 4.36 4.48 0.12 4.24
76 3 GM Dec Tree 48 43 45 entropy 20 150 a=5% lift sub=lrg 20,0,-5,1 23 68 18.57 16.25 2.32 13.93 10.00 8.14 1.86 6.27 5.00 4.17 0.83 3.34
77 3 GM Dec Tree 49 43 45 entropy 20 300 a=5% lift sub=lrg 20,0,-5,1 23 62 16.45 17.86 1.41 15.03 9.31 8.96 0.35 8.61 5.00 4.60 0.40 4.20
78 3 GM Dec Tree 50 43 45 entropy 20 250 a=5% lift sub=lrg 20,0,-5,1 24 65 16.64 17.71 1.07 15.57 9.56 8.96 0.60 8.36 5.00 4.61 0.39 4.21
79 3 GM Dec Tree 51 43 45 entropy 20 350 a=5% lift sub=lrg 20,0,-5,1 24 67 16.07 17.50 1.43 14.64 9.19 8.96 0.23 8.73 5.00 4.59 0.41 4.18
80 3 GM Dec Tree 52 43 45 entropy 20 225 a=5% lift sub=lrg 20,0,-5,1 23 63 17.85 16.67 1.18 15.49 9.83 8.96 0.87 8.09 5.00 4.53 0.48 4.05
81 3 GM Dec Tree 53 43 45 entropy 20 175 a=5% lift sub=lrg 20,0,-5,1 26 68 18.15 16.25 1.90 14.35 9.97 8.13 1.84 6.28 5.00 4.16 0.84 3.32
82 3 GM Dec Tree 54 43 45 entropy 20 200 a=5% lift sub=lrg 20,0,-5.0 23 65 17.67 16.67 1.01 15.66 9.81 8.54 1.27 7.28 5.00 4.34 0.66 3.67
83 3 GM Dec Tree 55 43 45 entropy 20 200 a=5% lift sub=lrg 20,0,-1,0 23 65 17.67 16.67 1.01 15.66 9.81 8.54 1.27 7.28 5.00 4.34 0.66 3.67
84 3 GM Dec Tree 56 43 45 entropy 20 200 a=5% lift sub=lrg 20,-5,0,0 23 65 17.67 16.67 1.01 15.66 9.81 8.54 1.27 7.28 5.00 4.34 0.66 3.67
85 4 GM Dec Tree 57 43 146 entropy 20 200 a=5% lift sub=lrg 20,0,-5,1 9 149 20.00 14.09 5.91 8.19 10.00 7.20 2.80 4.40 5.00 3.76 1.24 2.51
86 4 GM Dec Tree 58 57 107 18 115 20.00 16.09 3.91 12.18 10.00 8.15 1.85 6.29 5.00 4.18 0.82 3.35
87 4 GM Dec Tree 59 57 107 entropy 20 500 a=5% lift sub=lrg 20,0,-5,1 13 110 19.46 14.79 4.68 10.11 10.00 7.64 2.36 5.29 5.00 3.95 1.05 2.91
88 4 GM Dec Tree 60 57 107 entropy 20 1000 a=5% lift sub=lrg 20,0,-5,1 10 89 18.94 14.47 4.47 10.00 10.00 7.44 2.56 4.88 5.00 3.86 1.14 2.73
89 4 GM Dec Tree 61 57 107 entropy 20 2000 a=5% lift sub=lrg 20,0,-5,1 7 81 14.41 13.91 0.50 13.41 9.54 8.02 1.51 6.51 6.61 4.25 2.36 1.90
90 4 GM Dec Tree 62 57 107 entropy 20 3000 a=5% lift sub=lrg 20,0,-5,1 5 71 9.89 7.91 1.98 5.94 8.74 6.39 2.35 4.04 5.00 3.70 1.30 2.40
91 4 GM Dec Tree 63 57 107 entropy 20 1500 a=5% lift sub=lrg 20,0,-5,1 9 60 16.17 14.66 1.50 13.16 9.89 8.18 1.71 6.47 5.00 3.38 1.62 1.76
92 4 GM Dec Tree 64 57 107 entropy 20 1750 a=5% lift sub=lrg 20,0,-5,1 7 60 15.23 14.32 0.92 13.40 9.68 8.07 1.61 6.46 5.00 4.26 0.75 3.51
93 4 GM Dec Tree 65 57 107 entropy 20 2250 a=5% lift sub=lrg 20,0,-5,1 5 60 15.43 11.00 4.43 6.56 9.55 6.30 3.25 3.05 5.00 3.70 1.30 2.40
94 4 GM Dec Tree 66 61 58 entropy 20 2000 a=5% lift sub=lrg 20,0,-5,1 8 105 14.07 13.92 0.15 13.77 8.45 7.88 0.57 7.30 4.74 4.02 0.73 3.29
95 4 GM Dec Tree 67 61 80 entropy 20 2000 a=5% lift sub=lrg 20,0,-5,1 8 97 14.25 13.94 0.30 13.64 9.25 7.88 1.37 6.51 5.00 4.25 0.75 3.49
96 4 GM Dec Tree 68 61 103 entropy 20 2000 a=5% lift sub=lrg 20,0,-5,1 7 103 14.41 13.72 0.69 13.03 9.54 8.02 1.52 6.50 5.00 4.25 0.75 3.50
97 4n GM Dec Tree 69 61 3 entropy 20 2000 a=5% lift sub=lrg 20,0,-5,0 7 14.61 15.54 0.93 13.68 8.83 8.99 0.16 8.67 4.88 4.73 0.15 4.58
98 4n GM Dec Tree 70 0 20 entropy 20 2000 a=5% lift sub=lrg 20,0,-5,0 10 11.50 11.12 0.38 10.74 7.08 7.29 0.21 6.87 4.24 3.94 0.30 3.64
use RAW vars ONLY, to test value of my preprocessing
Columns: Mcnt, DataVer, Author, Algor, ModNum, chng from prior, binary model, cleanup model, max num rips, VarSel, TrnTime, then three groups of (Train, Val, Gap, ConsrvResult)
94 1 GM Rule Ind 1 0 tree neural 16 32 10.77 9.92 0.85 9.07 6.28 5.60 0.68 4.92 3.35 3.09 0.26 2.83
95 1 GM Rule Ind 2 1 regr neural 16 36 5.95 7.52 1.57 4.38 3.55 4.85 1.30 2.25 2.35 3.17 0.82 1.53
96 1 GM Rule Ind 3 1 neural tree 16 121 5.95 7.92 1.97 3.98 3.52 5.64 2.12 1.40 2.34 3.31 0.97 1.37
97 1 GM Rule Ind 4 3 neural tree 4 121 5.95 7.92 1.97 3.98 3.52 5.64 2.12 1.40 2.34 3.31 0.97 1.37
98 1 GM Rule Ind 5 3 neural tree 32 121 5.95 7.92 1.97 3.98 3.53 5.64 2.11 1.42 2.34 3.32 0.98 1.36
99 1 GM Rule Ind 6 1 tree neural 32 32 7.25 5.26 1.99 3.27 6.45 5.17 1.28 3.89 3.43 3.09 0.34 2.75
100 1 GM Rule Ind 7 1 tree neural 4 32 7.25 5.26 1.99 3.27 6.45 5.17 1.28 3.89 3.43 3.09 0.34 2.75
101 3 GM Rule Ind 8 1 tree neural 16 107 12.29 14.45 2.16 10.13 6.35 7.37 1.02 5.32 3.38 3.83 0.45 2.92
102 3 GM Rule Ind 9 1 tree neural 8 107 12.29 14.45 2.16 10.13 6.35 7.37 1.02 5.32 3.38 3.83 0.45 2.92
103 3 GM Rule Ind 10 1 tree neural 32 107 12.29 14.45 2.16 10.13 6.35 7.37 1.02 5.32 3.38 3.83 0.45 2.92
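The (Train, Val, Gap, ConsrvResult) groups in the notebook rows above are consistent with a simple rule: take the weaker of the training and validation results, then subtract the gap between them. The sketch below shows that reading in Python. The function name `conservative_result` is an illustrative assumption, and the formula is inferred from the rows rather than quoted from the slides.

```python
def conservative_result(train: float, val: float) -> float:
    """Penalize both a weak result and a large train/validation gap.

    Assumed formula, inferred from the notebook rows:
        min(train, val) - |train - val|
    """
    gap = abs(train - val)
    return min(train, val) - gap


# Rule Ind model 1 in the notebook: Train 10.77, Val 9.92, Gap 0.85, ConsrvResult 9.07
print(round(conservative_result(10.77, 9.92), 2))
# Rule Ind model 2 in the notebook: Train 5.95, Val 7.52, Gap 1.57, ConsrvResult 4.38
print(round(conservative_result(5.95, 7.52), 2))
```

Under this reading, a model that scores well on training data but poorly on validation is ranked below a slightly weaker model whose two results agree.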
Columns: Mcnt, DataVer, Author, Algor, ModNum, chng from prior, N Iter, Shrink, max depth, Perf, Huber M-Regr, Decision Weight, VarSel, TrnTime, then three groups of (Train, Val, Gap, ConsrvResult)
104 1 GM Grad Boost 1 0 50 0.10 2 disk No 20,0,-5,0 6 12 11.42 12.61 1.19 10.23 7.57 6.96 0.61 6.35 4.29 4.00 0.29 3.71
105 1 GM Grad Boost 2 1 50 0.10 2 RAM No 20,0,-5,0 6 59 11.42 12.61 1.19 10.23 7.57 6.96 0.61 6.35 4.29 4.00 0.29 3.71
106 1 GM Grad Boost 3 1 200 0.10 2 disk No 20,0,-5,0 19 429 13.77 12.67 1.10 11.57 8.28 7.23 1.05 6.18 4.53 4.06 0.47 3.59
107 1 GM Grad Boost 4 1 50 0.10 4 disk No 20,0,-5,0 18 139 11.69 12.67 0.98 10.71 7.67 6.86 0.81 6.05 4.30 4.01 0.29 3.72
108 1 GM Grad Boost 5 1 200 0.10 4 disk No 20,0,-5,0 13 401 11.69 12.67 0.98 10.71 7.67 6.86 0.81 6.05 4.30 4.01 0.29 3.72
109 1 GM Grad Boost 6 1 1000 0.10 2 disk No 20,0,-5,0 13 2003 11.60 12.67 1.07 10.53 7.67 6.86 0.81 6.05 4.30 4.01 0.29 3.72
110 1 GM Grad Boost 7 3 200 0.05 2 disk No 20,0,-5,0 16 367 13.49 12.67 0.82 11.85 7.85 7.33 0.52 6.81 4.49 4.11 0.38 3.73
111 1 GM Grad Boost 8 3 200 0.20 2 disk No 20,0,-5,0 13 263 13.49 12.48 1.01 11.47 7.80 7.43 0.37 7.06 4.53 4.08 0.45 3.63
112 1 GM Grad Boost 9 3 200 0.02 2 (stop investigating) No 20,0,-5,0 10 285 11.26 12.67 1.41 9.85 7.53 7.08 0.45 6.63 4.42 4.12 0.30 3.82
113 1 GM Grad Boost 10 3 200 0.07 2 No 20,0,-5,0 19 410 13.77 12.87 0.90 11.97 7.91 7.23 0.68 6.55 4.51 4.11 0.40 3.71
114 1 GM Grad Boost 11 10 400 0.07 2 No 20,0,-5,0 19 618 13.77 12.87 0.90 11.97 7.91 7.23 0.68 6.55 4.51 4.11 0.40 3.71
115 1 GM Grad Boost 12 10 400 0.07 2 Huber .6 20,0,-5,0 17 569 13.46 12.28 1.18 11.10 8.13 7.43 0.70 6.73 4.45 4.06 0.39 3.67
116 1 GM Grad Boost 13 10 200 0.07 2 Huber .8 20,0,-5,0 17 364 13.46 12.28 1.18 11.10 8.13 7.43 0.70 6.73 4.45 4.06 0.39 3.67
117 1 GM Grad Boost 14 10 200 0.07 2 Huber 1 20,0,-5,0 17 366 13.46 12.28 1.18 11.10 8.13 7.43 0.70 6.73 4.45 4.06 0.39 3.67
Columns: Mcnt, DataVer, Author, Algor, ModNum, chng from prior, N Iter, Shrink, max depth, Leaf Fraction, Huber M-Regr, Decision Weight, VarSel, TrnTime, then three groups of (Train, Val, Gap, ConsrvResult)
118 JS 1 JS Grad Boost 15 1 50 0.10 2 0.05 No 10,-2.5,-1,0 ? ? 13.22 13.77 0.55 12.67 8.14 7.64 0.50 7.14 4.41 4.44 0.03 4.38
119 JS 1 JS Grad Boost 16 15 100 0.08 2 0.05 No 10,-2.5,-1,0 ? ? 16.27 13.33 2.94 10.39 8.47 7.33 1.14 6.19 4.49 4.22 0.27 3.95
120 JS 1 JS Grad Boost 17 15 100 0.15 2 0.05 No 10,-2.5,-1,0 ? ? 15.93 13.33 2.60 10.73 9.15 7.78 1.37 6.41 4.54 4.29 0.25 4.04
121 JS 1 JS Grad Boost 18 15 100 0.20 2 0.05 No 10,-2.5,-1,0 ? ? 15.93 13.77 2.16 11.61 8.57 8.22 0.35 7.87 4.53 4.44 0.09 4.35
122 2 GM Grad Boost 19 10 400 0.07 2 default No 20,0,-5,0 23 1717 16.09 14.79 1.30 13.49 8.84 8.13 0.71 7.42 4.56 3.50 1.06 2.44
123 2 GM Grad Boost 20 15 50 0.10 2 0.05 No 10,-2.5,-1,0 14 239 14.31 13.75 0.56 13.19 8.36 7.71 0.65 7.06 4.53 4.43 0.10 4.33
124 2 GM Grad Boost 21 10 200 0.10 2 0.1 ex=20k node smp= 30k 20,0,-1,0 19 818 15.38 14.38 1.00 13.38 8.62 7.81 0.81 7.00 4.44 4.38 0.06 4.32
125 2 GM Grad Boost 22 19 50 0.10 3 0.05 20,0,-5,0 21 355 14.93 13.96 0.97 12.99 8.44 8.02 0.42 7.60 4.56 4.43 0.13 4.30
126 2 GM Grad Boost 23 19 500 0.10 3 0.05 20,0,-5,0 28 4058 17.33 15.42 1.91 13.51 9.11 8.54 0.57 7.97 4.71 4.53 0.18 4.35
127 3 GM Grad Boost 24 23 200 0.07 3 0.05 20,0,-1,0 20 2915 13.83 16.67 2.84 10.99 7.36 8.54 1.18 6.18 4.01 4.58 0.58 3.43
128 3 GM Grad Boost 25 23 500 0.07 3 0.05 20,0,-5,0 21 8012 14.10 16.25 2.15 11.94 7.66 9.38 1.72 5.94 4.08 4.69 0.61 3.47
129 3 GM Grad Boost 26 23 1000 0.07 3 0.05 20,0,-5,0 22 14243 14.10 16.25 2.15 11.96 7.66 9.38 1.72 5.94 4.10 4.79 0.69 3.42
130 3 GM Grad Boost 27 23 200 0.10 3 0.05 20,0,-5,0 19 3100 13.88 16.67 2.78 11.10 7.58 8.85 1.28 6.30 3.98 4.69 0.71 3.27
131 3 GM Grad Boost 28 23 200 0.15 3 0.05 20,0,-5,0 19 3088 13.88 15.83 1.95 11.94 7.52 8.75 1.23 6.29 4.02 4.69 0.67 3.35
132 3 GM Grad Boost 29 23 200 0.20 3 0.05 20,0,-5,0 17 2570 14.16 16.25 2.09 12.07 7.44 9.17 1.73 5.71 4.03 4.69 0.66 3.37
133 3 GM Grad Boost 30 23 200 0.25 3 0.05 20,0,-5,0 17 2798 14.16 16.25 2.09 12.07 7.63 9.17 1.54 6.10 4.16 4.90 0.74 3.42
134 3 GM Grad Boost 31 23 200 0.30 3 0.05 20,0,-5,0 17 2882 14.38 17.08 2.70 11.68 7.69 8.96 1.27 6.41 4.27 4.90 0.62 3.65
135 3 GM Grad Boost 32 23 600 0.20 3 0.05 20,0,-5,0 27 10956 15.70 17.08 1.38 14.32 8.29 9.58 1.29 7.00 4.44 5.00 0.57 3.87
136 4 GM Grad Boost 33 32 50 0.20 3 0.05 20,0,-5,0 1 852 0.44 0.49 0.05 0.39 0.44 0.49 0.05 0.39 0.44 0.49 0.05 0.39
137 4 GM Grad Boost 34 32 1
138 4 GM Grad Boost 35 32 15 980 14.62 13.94 0.68 13.26 8.14 7.42 0.71 6.71 4.22 4.03 0.19 3.84
(tree settings the same, dropped INT* categorical vars, not DBC)
(same Grad Boost settings. Exclude 3 & 4 way INT*dbc vars, like the same bad results, 1 var =
(same Grad Boost settings. Exclude all INT*dbc vars, using just 62)
improvement gain in Conservative Lift from new variables (vs. DecTree-d2-m19)
Interactions are getting selected, improve Trn results but decrease Val results. Perhaps I should regen the INT*dbc with a larger number of min records.
More
Can treat the model notebook table as meta-data (i.e. 144 records or models)
Train models on meta-data
– Source vars = model parameters
– Target 1 = conservative result, or
– Target 2 = training time
Perform sensitivity analysis to answer questions:
Q) Which model training parameters lead to the best results?
Q) …most training time?
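As a rough illustration of treating the notebook as meta-data, the Python sketch below takes a handful of the gradient boosting rows and asks which training parameter moves each target the most. It is a simple main-effects summary standing in for a trained meta-model, not the method used in the talk. The numbers are transcribed from the notebook; the column assignment (N Iter, Shrink, max depth, TrnTime, first ConsrvResult) and the names `NOTEBOOK` and `main_effects` are assumptions made for illustration.

```python
from collections import defaultdict

# Gradient boosting models 1 and 3-8 from the notebook, one dict per record.
# Column assignment is an assumption based on the table header order.
NOTEBOOK = [
    {"n_iter": 50,   "shrink": 0.10, "max_depth": 2, "trn_time": 12,   "consrv": 10.23},
    {"n_iter": 200,  "shrink": 0.10, "max_depth": 2, "trn_time": 429,  "consrv": 11.57},
    {"n_iter": 50,   "shrink": 0.10, "max_depth": 4, "trn_time": 139,  "consrv": 10.71},
    {"n_iter": 200,  "shrink": 0.10, "max_depth": 4, "trn_time": 401,  "consrv": 10.71},
    {"n_iter": 1000, "shrink": 0.10, "max_depth": 2, "trn_time": 2003, "consrv": 10.53},
    {"n_iter": 200,  "shrink": 0.05, "max_depth": 2, "trn_time": 367,  "consrv": 11.85},
    {"n_iter": 200,  "shrink": 0.20, "max_depth": 2, "trn_time": 263,  "consrv": 11.47},
]
PARAMS = ["n_iter", "shrink", "max_depth"]


def main_effects(records, target, params):
    """For each parameter, average the target per setting and report the
    spread (largest mean minus smallest mean) as a crude sensitivity score."""
    effects = {}
    for p in params:
        by_level = defaultdict(list)
        for r in records:
            by_level[r[p]].append(r[target])
        means = [sum(v) / len(v) for v in by_level.values()]
        effects[p] = max(means) - min(means)
    return effects


time_effects = main_effects(NOTEBOOK, "trn_time", PARAMS)
result_effects = main_effects(NOTEBOOK, "consrv", PARAMS)
for label, effects in [("training time", time_effects),
                       ("conservative result", result_effects)]:
    ranked = sorted(effects, key=effects.get, reverse=True)
    print(label, "is most sensitive to:", ranked)
```

With so few, unbalanced records the scores are only indicative; on the full notebook a regression or tree trained on the same columns would play the role of the meta-model.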
Outline
Model Training Parameters in SAS Enterprise Miner
Tracking Conservative Results in a “Model Notebook”
How to Measure Progress
Meta-Gradient Search of Model Training Parameters
How to Plan and dynamically adapt
How to Describe Any Complex System – Sensitivity
Design of Experiments (DOE) Parameter Search
• Ideally, vary one parameter at a time, quantify the results
– Bigger challenge in BIG DATA: compute per model
• Exhaustive Grid Search O(3^P)
– for Param A = Low, Med, High (test 3 settings)
– for Param B = Low, Med, High
– for Param C = Low, Med, High
– easy to implement, not the most efficient
– Can use Fractional Factorial design (i.e. 10%)
• Scales less effectively for many parameters
• Stochastic Search (Genetic Algorithms) O(100^2)
– Directed Random Search is more efficient than Grid Search, but…
– Can be overkill in complexity: (100 models / generation) * (100s of generations)
• Taguchi Analysis (works with this DOE approach)
– Efficient multivariate orthogonal search
– test landing pages w/ Offermatica (acquired by Omniture in 2007 for DOE)
– http://en.wikipedia.org/wiki/Taguchi_methods
– Does not use domain knowledge of parameter interactions - OPPORTUNITY
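The model counts behind these options can be made concrete with a short Python sketch. It enumerates the exhaustive grid for three parameters at three settings each and compares grid size against the genetic-algorithm budget quoted on the slide. The names `grid_size` and `ga_budget` are illustrative assumptions.

```python
from itertools import product

LEVELS = ["Low", "Med", "High"]
PARAM_NAMES = ["A", "B", "C"]

# Exhaustive grid: every combination of settings, 3**P models to train.
grid = [dict(zip(PARAM_NAMES, combo))
        for combo in product(LEVELS, repeat=len(PARAM_NAMES))]
print(len(grid), "models for", len(PARAM_NAMES), "parameters")


def grid_size(num_params, num_levels=3):
    """Number of models an exhaustive grid search must train."""
    return num_levels ** num_params


def ga_budget(models_per_generation=100, generations=100):
    """Number of models for a genetic algorithm at the slide's rough scale."""
    return models_per_generation * generations


for p in (3, 5, 8, 10):
    print(f"P={p}: grid={grid_size(p)}, GA={ga_budget()}")
```

The grid is cheap for a few parameters but overtakes the GA budget once P reaches about nine, which is the motivation for fractional and heuristic designs.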
Taguchi Design
[Figure: Taguchi design points plotted on three parameter axes A, B, C]
• Not a full grid search
• Can we improve with experience and a heuristic process?
http://www.itl.nist.gov/div898/handbook/pri/section5/pri56.htm
http://www.jmp.com/support/downloads/pdf/jmp_design_of_experiments.pdf
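To show what "not a full grid search" can look like, the Python sketch below builds a 9-run, 3-level orthogonal array of the kind used in Taguchi designs and checks its balance. The construction from two orthogonal Latin squares is a standard one, chosen here for illustration; it is an assumption that the slide's figure shows a design of this type, and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations, product


def l9_array():
    """9 runs x 4 factors x 3 levels, built from two orthogonal Latin squares."""
    return [(a, b, (a + b) % 3, (a + 2 * b) % 3)
            for a, b in product(range(3), repeat=2)]


def is_orthogonal(runs):
    """True if every pair of columns contains each level pair equally often."""
    n_cols = len(runs[0])
    for i, j in combinations(range(n_cols), 2):
        counts = Counter((r[i], r[j]) for r in runs)
        if len(counts) != 9 or len(set(counts.values())) != 1:
            return False
    return True


LEVELS = ["Low", "Med", "High"]
runs = l9_array()
for a, b, c, _ in runs:  # first three columns serve as Params A, B, C
    print(LEVELS[a], LEVELS[b], LEVELS[c])
print(len(runs), "runs instead of", 3 ** 3, "for the full grid")
print("balanced:", is_orthogonal(runs))
```

Each pair of parameters still sees every Low/Med/High combination once, so main effects can be estimated from a third of the grid. What the array cannot do is use prior knowledge about which parameters interact.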
Model Parameters vs. Model Training Parameters

                 Algorithm Searches    Meta-Search by a Data Miner:
                                       Design of Experiments (DOE) Over Your Choices
Algorithm        Model Parameters      Model Training Parameters
Regression       weights               variable select (forward, step)
Neural net       weights               step size; learning rate
Decision Tree    (spend < $1000)       max depth; (Gini, Entropy)
Heuristic Planning Your Design of Experiments (DOE)
• Assumptions about Data Mining Project
– May be on BIG DATA, with practical constraints
– May be training 4 to 400 models (not 4000+ like GA)
– Want diversity, to investigate different algorithms
– Want to generalize process to future deployments
• Heuristic Strategies
– Use knowledge of interacting parameters (parallel tests)
• (Cost+profit weights) and (boosting weights) fight each other
– Delay searching compute intensive parameters
• First stabilize most other “computationally reasonable” params
• Large decision tree depth,
• neural nets w/ lots of connections
– Opportunistically spend time by algorithm success
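Gradient Descent Numerical Methods
Searching to Find Minima
[Figure: error surface over Weight Parameter 1 and Weight Parameter 2, drawn as terrain. Legend from High Error to Low Error: Forest, Fields, Beach, Water, Deep Water. Hill tops, beach and water are labeled; three minima are marked “Min”.]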
Gradient Descent Numerical Methods
Searching to Find Minima
[Figure: the same terrain-style error surface over Weight Parameter 1 and Weight Parameter 2, from High Error (Forest, Fields, hill tops) to Low Error (Beach, Water, Deep Water), with a start point X and two local minima M.]
“Ski Down” from the mountains to Lake Tahoe
Moving = adjust param
X = starting position
M = a local minimum
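The "ski down" picture corresponds to plain gradient descent. The Python sketch below uses a made-up two-parameter error surface with two minima (an assumption chosen only so that different starting positions X end at different local minima M). The names `error`, `gradient`, and `ski_down` are illustrative.

```python
def error(w1, w2):
    """Toy error surface with two minima, at (-1, 0) and (1, 0)."""
    return (w1 ** 2 - 1) ** 2 + w2 ** 2


def gradient(w1, w2):
    """Partial derivatives of error() with respect to w1 and w2."""
    return 4 * w1 * (w1 ** 2 - 1), 2 * w2


def ski_down(start, step_size=0.05, steps=500):
    """Repeatedly move a small step downhill from the starting position."""
    w1, w2 = start
    for _ in range(steps):
        g1, g2 = gradient(w1, w2)
        w1 -= step_size * g1
        w2 -= step_size * g2
    return w1, w2


for x in [(0.5, 1.0), (-0.5, 1.0)]:
    m = ski_down(x)
    print("start", x, "-> minimum near", (round(m[0], 3), round(m[1], 3)),
          "error", round(error(*m), 6))
```

Which minimum is reached depends only on the starting position, which is why the choice of starting point matters in the slides that follow.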
Conservative Result with Respect to Model Training Parameters
[Figure: the same terrain-style surface, now over Model Parameter 1 and Model Param 2, from High Error (Forest, Fields) to Low Error (Beach, Water, Deep Water), with a start point X and two local minima M.]
“Ski Down” from the mountains to Lake Tahoe
Moving = adjust param
X = starting position
M = a local minimum
Heuristic Planning Your Design of Experiments (DOE)
• Start with a reasonable default setting of parameters,
– the “center of the daisy” (the gradient check)
• Vary one parameter at a time from the center
– “each petal of the daisy” (gradient search trial)
• Move to the next “reasonable multivariate start”
– The “stem of the daisy” (steepest descent)
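The daisy steps can be sketched as a small search loop in Python. This is one possible automated reading of the heuristic, not the author's procedure: `evaluate` stands in for training a model and reading its conservative result (higher is better), the toy scoring function and its optimum are invented for the demonstration, and every trial is logged with a model number and a "change from prior" pointer to the center it was varied from.

```python
def daisy_search(evaluate, center, steps, max_rounds=20):
    """Center -> petals (one change each) -> stem (combined move), repeated."""
    notebook = []

    def run(params, prior):
        entry = {"mod_num": len(notebook) + 1, "chng_from_prior": prior,
                 "params": dict(params), "score": evaluate(params)}
        notebook.append(entry)
        return entry

    best = run(center, 0)                       # center of the daisy
    for _ in range(max_rounds):
        move = {}                               # best improving change per parameter
        for name, step in steps.items():
            for delta in (-step, step):         # one petal per single change
                trial = dict(best["params"])
                trial[name] += delta
                petal = run(trial, best["mod_num"])
                gain = petal["score"] - best["score"]
                if gain > move.get(name, (0, 0))[1]:
                    move[name] = (delta, gain)
        if not move:
            break                               # no petal improved: stop here
        stem = dict(best["params"])             # combine the improving petals
        for name, (delta, _) in move.items():
            stem[name] += delta
        candidate = run(stem, best["mod_num"])  # retrained even if it equals a petal
        if candidate["score"] <= best["score"]:
            break                               # combined move did not help
        best = candidate                        # new center of the next daisy
    return best, notebook


def toy_conservative_result(p):
    """Hypothetical score surface, best at max_depth=12 and leaf_size=40."""
    return -((p["max_depth"] - 12) ** 2) - ((p["leaf_size"] - 40) / 10) ** 2


best, notebook = daisy_search(toy_conservative_result,
                              center={"max_depth": 6, "leaf_size": 10},
                              steps={"max_depth": 2, "leaf_size": 10})
print("best:", best["params"], "after", len(notebook), "models")
```

A human-run version would add judgment at the stem step, for example skipping petals for parameters that are known to interact or that are expensive to train.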
Heuristic “Meta-Gradient Search” of Model Training Parameters
[Figure: error surface from High Error to Low Error over Parameter 1 and Parameter 2, with the search heading toward a minimum M]
Heuristic “Meta-Gradient Search” of Model Training Parameters vs. Taguchi DOE
[Figure: search over Parameter 1 and Parameter 2 toward a minimum M]
Art vs. Science?
No, a practical complement using existing num. methods
Heuristic “Meta-Gradient Search” of Model Training Parameters
Can you give a more tangible example? This sounds a bit vague.

ModNum  chng from prior  vars offered  criterion  max depth  leaf size
1       0                27            default    6          5
2       1                27            probchisq  6          5
3       1                27            entropy    6          5
4       1                27            gini       6          5
5       3                27            entropy    12         5
6       3                27            entropy    6          10
7       3                27            entropy    6          100
8       3                27            entropy    6          100
9       3                27            entropy    6          5
10      3                27            entropy    6          5
11      3                27            entropy    6          5
12      3                27            entropy    10         2

Change from Prior Model – tracks change from the “center of a daisy” (Model 1 or 3)
Heuristic “Meta-Gradient Search” of Model Training Parameters
• After stabilizing most of the “fast” and “medium” compute time parameters, search the “long compute time settings”
• With the final parameter settings, if 2x or 10x more data is available, perform a “final bake in,” long training run
• Then try Ensemble Methods
– Stacking, boosting, bagging: combining many of the best models
– Gradient Boosting over residual error
– Select models whose residual errors correlate the least
– Use a 2nd stage model to combine 1st stage models and top preprocessed fields (for context switching)
– Last year’s KDD Cup winners
– Netflix winners used Ensemble methods
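One of the ensemble steps, choosing models whose residual errors correlate the least, can be sketched in a few lines of Python. The residual values below are hypothetical, made up for the demonstration, and the helper names `pearson` and `least_correlated_pair` are assumptions.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def least_correlated_pair(residuals):
    """Pair of model names whose residuals have the smallest |correlation|."""
    return min(combinations(sorted(residuals), 2),
               key=lambda pair: abs(pearson(residuals[pair[0]],
                                            residuals[pair[1]])))


# Hypothetical validation residuals (actual - predicted) for three models.
residuals = {
    "dec_tree":   [0.5, -0.2, 0.1, -0.4, 0.3, -0.3],
    "grad_boost": [0.4, -0.1, 0.2, -0.5, 0.2, -0.2],
    "neural_net": [-0.1, 0.3, 0.2, -0.2, -0.3, 0.1],
}
for a, b in combinations(sorted(residuals), 2):
    print(a, b, round(pearson(residuals[a], residuals[b]), 3))
print("combine first:", least_correlated_pair(residuals))
```

Models that make different mistakes add the most when stacked, so the least-correlated pair is a reasonable first candidate for the 2nd stage model.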
Outline
Model Training Parameters in SAS Enterprise Miner
Tracking Conservative Results in a “Model Notebook”
How to Measure Progress
Meta-Gradient Search of Model Training Parameters
How to Plan and dynamically adapt
How to Describe Any Complex System
Sensitivity Analysis
Needs to Describe Forecast Alg
• Many Data Mining solutions need description
– To check writer (to SVP, owner, business unit, …): business reality check before deployment
– “What if” analysis, to fine tune larger system
• Feed Operations Research or Revenue Management systems
– Need a modeling “descriptive simulation” (political donations)
– When evaluating credit, by law required to offer 4 “reason codes” for each person scored – when they are declined
• Should the Data Miner cut algorithm choices?
– NO! “I understand how a bike works, but I drive a car to work”
– how much detailed understanding is needed?
– Provide enough info to “drive the car” vs. “build the car”
• Check writer does not need to understand B-tree to buy SQL
Sensitivity Analysis (OAT) One At a Time*
*Some catch interactions
[Figure: (S) source fields feed an Arbitrarily Complex Data Mining System, which outputs the target field; the measured quantity is the delta in forecast.]
Present record N, S times, each input 5% bigger (fixed input delta)
Record delta change in output, S times per record
Aggregate: average(abs(delta)), target change per input field delta
For source fields with binned ranges, sensitivity tells you importance of the range, i.e. “low”, …. “high”
Can put sensitivity values in Pivot Tables or Cluster
Record Level “Reason codes” can be extracted from the most important bins that apply to the given record
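The one-at-a-time procedure treats the model as a black box, so it can be sketched in Python against any scoring function. In the sketch below, `toy_model`, the field names, and the records are hypothetical stand-ins; the 5% bump and the average(abs(delta)) aggregate follow the slide.

```python
def oat_sensitivity(model, records, fields, bump=0.05):
    """Average absolute change in the forecast when each input, one at a
    time, is made 5% bigger while the other inputs are held fixed."""
    totals = {f: 0.0 for f in fields}
    for rec in records:
        base = model(rec)
        for f in fields:                  # S scoring passes per record
            bumped = dict(rec)
            # A multiplicative bump leaves zero-valued inputs unchanged.
            bumped[f] = rec[f] * (1 + bump)
            totals[f] += abs(model(bumped) - base)
    return {f: totals[f] / len(records) for f in fields}


def toy_model(rec):
    """Hypothetical black box: uses spend, visits and tenure, ignores zip_digit."""
    return 3.0 * rec["spend"] + 0.5 * rec["visits"] * rec["tenure"]


FIELDS = ["spend", "visits", "tenure", "zip_digit"]
records = [
    {"spend": 100.0, "visits": 4.0, "tenure": 2.0, "zip_digit": 7.0},
    {"spend": 250.0, "visits": 1.0, "tenure": 5.0, "zip_digit": 3.0},
    {"spend": 40.0,  "visits": 9.0, "tenure": 1.0, "zip_digit": 9.0},
]
sensitivity = oat_sensitivity(toy_model, records, FIELDS)
for field in sorted(sensitivity, key=sensitivity.get, reverse=True):
    print(field, round(sensitivity[field], 3))
```

Because each record is re-scored with the other fields left at their actual values, the per-record deltas also reflect interactions present in that record, which is the sense of the slide's footnote.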
Descriptions of Predictive Models
Reason Codes – Ranked by Sensitivity Analysis
• Reason codes are specific to the model and record
• Ranked predictive fields     Mr. Smith   Mr. Jones
  max_late_payment_120d        0           1
  max_late_payment_90d         1           0
  bankrupt_in_last_5_yrs       1           1
  max_late_payment_60d         0           0
• Mr. Smith’s reason codes include:
  max_late_payment_90d         1
  bankrupt_in_last_5_yrs       1
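The table above translates directly into a lookup in Python: walk the fields in sensitivity-rank order and keep those that apply to the record. The field names and flag values come from the slide; the function name `reason_codes` and the cap of four codes (taken from the credit requirement mentioned earlier) are illustrative choices.

```python
# Fields in descending order of sensitivity, as ranked on the slide.
RANKED_FIELDS = [
    "max_late_payment_120d",
    "max_late_payment_90d",
    "bankrupt_in_last_5_yrs",
    "max_late_payment_60d",
]


def reason_codes(record, ranked_fields=RANKED_FIELDS, max_codes=4):
    """Top-ranked fields whose flag is set for this particular record."""
    codes = [f for f in ranked_fields if record.get(f, 0) == 1]
    return codes[:max_codes]


mr_smith = {"max_late_payment_120d": 0, "max_late_payment_90d": 1,
            "bankrupt_in_last_5_yrs": 1, "max_late_payment_60d": 0}
mr_jones = {"max_late_payment_120d": 1, "max_late_payment_90d": 0,
            "bankrupt_in_last_5_yrs": 1, "max_late_payment_60d": 0}

print("Mr. Smith:", reason_codes(mr_smith))
print("Mr. Jones:", reason_codes(mr_jones))
```

The ranking is shared by everyone scored with the model, while the list returned is specific to each record.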
Summary
• Conservative Result (How to Measure)
– Continuous metric to select accurate and general models
• Heuristic Meta-Gradient Search (How to Plan)
– An automated or human process to plan a Design of Experiments (DOE)
– Searches the training parameters that a data miner adjusts in data mining software (“meta-parameter search”)
– Heuristic DOE improvements
• Most systems can be “reasonably described”
– Focus on repeatable business benefit (accuracy) over description or blind Occam’s Razor on a tech metric
SF Bay ACM, Data Mining SIG, Feb 28, 2011
http://www.sfbayacm.org/?p=2464
[email protected]/in/GregMakowski
Take Away: The process of going from design objectives to heuristic design