Volodymyrk Bayesian Model Averaging Bayesian Mixer, 27.09.2016 London, UK

Bayesian model averaging

Apr 14, 2017

Transcript
Page 1: Bayesian model averaging

Volodymyrk

Bayesian Model Averaging, Bayesian Mixer, 27.09.2016

London, UK

Page 2

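Bayesian Model Averaging (BMA) - 1-minute version

New Project - how much is it worth?

Model M1 (CFO): Net Present Value $50m, CEO belief 30%
Model M2 (VP of Growth): Net Present Value $100m, CEO belief 70%

(CEO belief: the probability the CEO assigns to each model after evaluating both models and the market data)

BMA estimate with K = 2 models: 0.3 × $50m + 0.7 × $100m = $15m + $70m = $85m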
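The 1-minute version is just a probability-weighted average. A minimal Python sketch of that arithmetic, using the numbers from the slide (the function name `bma_estimate` is illustrative, not from any library):

```python
# Combine per-model estimates by weighting each with the probability
# the decision maker (here, the CEO) assigns to that model.
npv = {"M1": 50.0, "M2": 100.0}     # Net Present Value in $m (CFO, VP of Growth)
belief = {"M1": 0.3, "M2": 0.7}     # CEO's probability for each model

def bma_estimate(estimates, weights):
    """Weighted average of per-model estimates; weights must sum to 1."""
    total = sum(weights.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("model probabilities must sum to 1")
    return sum(weights[m] * estimates[m] for m in estimates)

print(f"BMA estimate: ${bma_estimate(npv, belief):.0f}m")
```

The same function works unchanged for any number of models K, as long as the model probabilities sum to 1.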

Page 3

Bayesian Model Averaging (BMA) - 3-minute version

VP of Growth: sensitivity analysis for M2 (NPV in $m under different CAC and CLV assumptions)

           CLV $10   CLV $12   CLV $15
CAC $4        72       129       149
CAC $6        62       112       133
CAC $8        51        92       101

Average = $100.11m
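The slide's $100.11m is the plain mean of the nine cells, which amounts to treating every CAC/CLV combination as equally likely. A small sketch that reproduces the figure under that equal-weight assumption:

```python
# M2 sensitivity grid from the slide: NPV ($m) per CAC row,
# columns are CLV $10, $12, $15.
npv_grid = {
    4: [72, 129, 149],   # CAC $4
    6: [62, 112, 133],   # CAC $6
    8: [51, 92, 101],    # CAC $8
}

def average_npv(grid):
    """Mean over all scenarios, i.e. equal probability for every CAC/CLV pair."""
    values = [v for row in grid.values() for v in row]
    return sum(values) / len(values)

print(f"Average = ${average_npv(npv_grid):.2f}m")
```

Replacing the equal weights with a prior over the CAC/CLV assumptions would turn this into a weighted average, in the same spirit as the model weights on the previous slide.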

Page 4

Bayesian Model Averaging (BMA) - 5-minute version

Reference: "Bayesian Model Averaging: A Tutorial", Jennifer A. Hoeting, David Madigan, Adrian E. Raftery and Chris T. Volinsky

Prior probability of each model: how much do you trust your VP and CFO before you look at their models?

Scary normalising term that you can ignore

Prior probability for the model parameters
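The annotations above label terms of the standard BMA equations from the cited tutorial; the equations themselves did not survive extraction. Written out, with Δ the quantity of interest, D the data, M_1 to M_K the candidate models and θ_k the parameters of M_k, they read as follows. Here p(M_k) is the prior trust in each model, the sum in the denominator is the normalising term, and p(θ_k | M_k) is the prior for the model parameters.

```latex
p(\Delta \mid D) = \sum_{k=1}^{K} p(\Delta \mid M_k, D)\, p(M_k \mid D)

p(M_k \mid D) = \frac{p(D \mid M_k)\, p(M_k)}{\sum_{l=1}^{K} p(D \mid M_l)\, p(M_l)}

p(D \mid M_k) = \int p(D \mid \theta_k, M_k)\, p(\theta_k \mid M_k)\, d\theta_k
```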

Page 5

Bayesian answer to overfitting

Frequentist:
- model selection
- regularisation

Bayesian:
- BMA
- marginalisation

Page 6

Case Study

You just got the best job in the galaxy

Page 7

Your new boss

Business domain

Modelling case

Always test your models on synthetic data that you understand and control

Page 8

Use cases:
- Fraud detection
- Inventory sourcing

Data

Page 9

Modelling goals

- A prediction range is needed, so that you can identify fraudulent transactions (Sand People under-reporting the real transaction size and pocketing the profit)

- The sale price should be easily explainable as a function of various droid features, so that Jabba can invest in appropriate scavenging/sourcing projects

- You want the lowest prediction error possible, so that you are not fed to the Sarlacc
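One way to turn a prediction range into a fraud check, sketched under the assumption that the model provides a normal predictive distribution (mean and standard deviation) for each sale: flag any reported price below the lower bound of the interval. The function names and numbers are hypothetical.

```python
from statistics import NormalDist

def prediction_interval(mean, sd, level=0.95):
    """Central interval of a normal predictive distribution N(mean, sd^2)."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return mean - z * sd, mean + z * sd

def flag_underreported(reported_price, mean, sd, level=0.95):
    """Flag a sale whose reported price is below the interval's lower bound."""
    lower, _upper = prediction_interval(mean, sd, level)
    return reported_price < lower

# Hypothetical droid: the model predicts 1000 credits with predictive sd 100.
print(prediction_interval(1000.0, 100.0))
print(flag_underreported(700.0, 1000.0, 100.0))   # far below the range
print(flag_underreported(950.0, 1000.0, 100.0))   # inside the range
```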

Page 10

Data Generation (diagram): droid classes Class-1, Class-2, Class-3 and Class-4; variables durability, circuitry, height, weight, age, ..., price
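Following the advice to test on synthetic data you understand and control, here is a hypothetical generator with the same overall shape as the diagram (class drives features, features drive price). The class profiles, coefficients and noise level are invented for illustration; this is not the author's data generation script.

```python
import random

# HYPOTHETICAL generator: all profiles and coefficients are made up.
CLASS_PROFILES = {          # class -> (mean height, mean weight)
    "Class-1": (1.0, 40.0),
    "Class-2": (1.5, 80.0),
    "Class-3": (0.5, 20.0),
    "Class-4": (2.0, 150.0),
}
TRUE_PRICE_COEFS = {"height": 100.0, "weight": 5.0, "age": -3.0}  # known ground truth

def generate_droids(n, seed=0, noise_sd=50.0):
    """Draw n droids: class -> features -> price, with Gaussian price noise."""
    rng = random.Random(seed)
    droids = []
    for _ in range(n):
        droid_class = rng.choice(sorted(CLASS_PROFILES))
        mean_height, mean_weight = CLASS_PROFILES[droid_class]
        features = {
            "height": max(0.1, rng.gauss(mean_height, 0.2)),
            "weight": max(1.0, rng.gauss(mean_weight, 10.0)),
            "age": rng.uniform(0.0, 30.0),
        }
        price = sum(TRUE_PRICE_COEFS[k] * v for k, v in features.items())
        price += rng.gauss(0.0, noise_sd)
        droids.append({"class": droid_class, **features, "price": price})
    return droids

droids = generate_droids(1000, seed=42)
print(droids[0])
```

Because `TRUE_PRICE_COEFS` is known, you can check whether a fitting method recovers it, and vary `noise_sd` or the profiles to find inputs where a method breaks down.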

Page 11

Data Collection

Page 12

Model Selection - classical method

credits ~ height + weight + power + dents + rad + wheels + legs + red + blue + black + temperature + lat + long + ir_emit + dents_log + height_log + weight_log + power_log + rad_log

Adj. R²: 0.884974385182
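Adjusted R² penalises R² for the number of features p: 1 - (1 - R²)(n - 1)/(n - p - 1). A NumPy sketch for an OLS fit with an intercept, run on hypothetical synthetic data (not the droid dataset):

```python
import numpy as np

def fit_ols(X, y):
    """Ordinary least squares with an intercept; returns [intercept, coefs...]."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta

def adjusted_r2(X, y):
    """Adjusted R^2 = 1 - (1 - R^2) (n - 1) / (n - p - 1), with p features."""
    n, p = X.shape
    beta = fit_ols(X, y)
    residuals = y - np.column_stack([np.ones(n), X]) @ beta
    r2 = 1.0 - residuals @ residuals / np.sum((y - y.mean()) ** 2)
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

# Hypothetical data: two informative features and one pure-noise feature.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.5, size=200)
print(round(adjusted_r2(X, y), 3))
```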

Page 13

Model Selection - backward elimination
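The slide does not say which criterion drove the elimination. A sketch of one common variant, assumed here: repeatedly drop the feature whose removal most improves adjusted R², and stop when no single removal helps (p-value thresholds are another frequent choice). The data are hypothetical.

```python
import numpy as np

def adjusted_r2(X, y):
    """Adjusted R^2 of an OLS fit with intercept on the columns of X."""
    n, p = X.shape
    design = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta
    r2 = 1.0 - residuals @ residuals / np.sum((y - y.mean()) ** 2)
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

def backward_elimination(X, y):
    """Greedily drop the feature whose removal most increases adjusted R^2.
    Returns (indices of kept columns, final adjusted R^2)."""
    kept = list(range(X.shape[1]))
    best = adjusted_r2(X[:, kept], y)
    while len(kept) > 1:
        candidates = []
        for j in kept:
            remaining = [c for c in kept if c != j]
            candidates.append((adjusted_r2(X[:, remaining], y), j))
        score, drop = max(candidates)
        if score <= best:          # no removal improves the criterion
            break
        kept.remove(drop)
        best = score
    return kept, best

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 6))
y = 3.0 * X[:, 0] + 1.5 * X[:, 1] + rng.normal(size=300)
kept, score = backward_elimination(X, y)
print(kept, round(score, 3))
```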

Page 14

Final Model

credits ~ weight + power + dents + rad + wheels + blue + black + temperature + lat + dents_log + height_log + weight_log + power_log

Adj. R²: 0.903544333611

Page 15

Model Evaluation (out-of-sample)
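Out-of-sample evaluation fits on one part of the data and measures error on rows the model never saw. A minimal holdout sketch with NumPy; the 70/30 split, the seed and the data are assumptions:

```python
import numpy as np

def holdout_split(X, y, test_frac=0.3, seed=0):
    """Randomly hold out a fraction of rows for out-of-sample evaluation."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    n_test = int(len(y) * test_frac)
    test, train = idx[:n_test], idx[n_test:]
    return X[train], X[test], y[train], y[test]

def ols_predict(X_train, y_train, X_new):
    """Fit OLS with an intercept on the training rows and predict new rows."""
    design = np.column_stack([np.ones(len(X_train)), X_train])
    beta, *_ = np.linalg.lstsq(design, y_train, rcond=None)
    return np.column_stack([np.ones(len(X_new)), X_new]) @ beta

def mse(y_true, y_pred):
    return float(np.mean((y_true - y_pred) ** 2))

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 20))     # hypothetical: 20 features, mostly noise
y = X[:, 0] + rng.normal(size=100)
X_tr, X_te, y_tr, y_te = holdout_split(X, y)
print("in-sample MSE:    ", mse(y_tr, ols_predict(X_tr, y_tr, X_tr)))
print("out-of-sample MSE:", mse(y_te, ols_predict(X_tr, y_tr, X_te)))
```

With many noisy features, the in-sample MSE is typically optimistic compared with the held-out MSE.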

Page 16

Ridge regression (L2 regularisation)
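Ridge regression adds an L2 penalty λ‖b‖² to the least-squares loss; on centred data it has the closed form b = (XᵀX + λI)⁻¹Xᵀy. A NumPy sketch on hypothetical data (in practice `lam` would be chosen by cross-validation and features are often standardised first; both steps are omitted here):

```python
import numpy as np

def fit_ridge(X, y, lam):
    """Ridge regression: minimise ||y - b0 - X b||^2 + lam * ||b||^2.
    Centring X and y first leaves the intercept b0 unpenalised."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    b = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ yc)
    b0 = y_mean - x_mean @ b
    return b0, b

def predict_ridge(b0, b, X):
    return b0 + X @ b

rng = np.random.default_rng(3)
X = rng.normal(size=(50, 10))
y = X @ np.linspace(1.0, 0.0, 10) + rng.normal(size=50)
for lam in (0.0, 1.0, 10.0, 100.0):
    b0, b = fit_ridge(X, y, lam)
    print(lam, round(float(np.linalg.norm(b)), 3))   # coefficients shrink as lam grows
```

With `lam = 0` this reduces to ordinary least squares.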

Page 17

Bayesian Model Averaging for Linear Models - a special case

Regression coefficients are averaged across all possible models, each model weighted by its posterior probability; this also gives every coefficient an inclusion probability

Number of models = all include/exclude combinations of the K features = 2^K
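For linear models the 2^K models can be enumerated directly while K is small (the 19 candidate features of the full model above would already give 2^19 = 524,288 models). The sketch below approximates each posterior model probability by exp(-BIC/2) under a uniform model prior; this BIC shortcut is a swapped-in approximation, not the coefficient priors used by dedicated BMA packages. Data are hypothetical.

```python
import itertools
import numpy as np

def bic(X, y, cols):
    """BIC of an OLS model with intercept using the feature columns in `cols`."""
    n = len(y)
    design = np.column_stack([np.ones(n)] + [X[:, c] for c in cols])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = float(np.sum((y - design @ beta) ** 2))
    return n * np.log(rss / n) + design.shape[1] * np.log(n)

def model_posteriors(X, y):
    """Score all 2^K feature subsets; p(M_k | D) ~ exp(-BIC_k / 2), normalised
    over all models (assumes a uniform prior over models)."""
    K = X.shape[1]
    subsets = [cols for r in range(K + 1)
               for cols in itertools.combinations(range(K), r)]
    bics = np.array([bic(X, y, cols) for cols in subsets])
    weights = np.exp(-(bics - bics.min()) / 2.0)   # shift by min to avoid underflow
    weights /= weights.sum()
    return list(zip(subsets, weights))

rng = np.random.default_rng(4)
X = rng.normal(size=(150, 4))
y = 2.0 * X[:, 0] - 1.0 * X[:, 2] + rng.normal(size=150)
models = model_posteriors(X, y)
best_cols, best_prob = max(models, key=lambda m: m[1])
print(len(models), best_cols, round(float(best_prob), 3))
```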

Page 18

How to actually do BMA? (in R)

- BMA (cran.r-project.org/web/packages/BMA): mature, a.k.a. "the original"

- BMS (cran.r-project.org/web/packages/BMS): developed during PhD research, not maintained

- BAS (cran.r-project.org/web/packages/BAS): newest, maintained by the Chair of the Department of Statistical Science at Duke

Page 19

BMA using BMS (R) package

        Model Selection   L2 Regularisation   BMA
MSE     9736.49           7782.21             7329.44

It worked! But you can find inputs to the data generator script for which it will not work as well!
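The BMA prediction is the posterior-weighted average of the individual models' predictions. A toy sketch with two models and invented numbers; whether the average beats the best single model depends on the data, as the slide warns.

```python
def bma_predict(predictions, weights):
    """Posterior-weighted average of per-model predictions.
    predictions: one list of predictions per model; weights: p(M_k | D)."""
    n_points = len(predictions[0])
    return [sum(w * preds[i] for preds, w in zip(predictions, weights))
            for i in range(n_points)]

def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Invented toy numbers: two models, three test points.
preds_m1 = [10.0, 20.0, 30.0]
preds_m2 = [14.0, 18.0, 34.0]
weights = [0.25, 0.75]
y_true = [13.0, 19.0, 33.0]

averaged = bma_predict([preds_m1, preds_m2], weights)
for name, preds in [("M1", preds_m1), ("M2", preds_m2), ("BMA", averaged)]:
    print(name, round(mse(y_true, preds), 3))
```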

Page 20

Nice things you get from BMA

Posterior Inclusion Probability! How cool is that!
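A feature's posterior inclusion probability is the total posterior probability of the models that contain it. A sketch that takes (feature subset, probability) pairs; the three models and their probabilities are invented:

```python
def posterior_inclusion_probabilities(models, n_features):
    """models: list of (included feature indices, posterior probability).
    PIP_j = sum of the posterior probabilities of models that include feature j."""
    pip = [0.0] * n_features
    for cols, prob in models:
        for j in cols:
            pip[j] += prob
    return pip

models = [((0, 1), 0.5), ((0,), 0.3), ((1, 2), 0.2)]
print(posterior_inclusion_probabilities(models, 3))
```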

Page 21

Model ranking!

MCMC can be used if the number of features is large

Best model, according to BMA
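When 2^K is too large to enumerate, a sampler over model space can be used instead. A sketch of MC3 (Markov chain Monte Carlo model composition), again assuming the exp(-BIC/2) approximation and a uniform model prior: each step proposes adding or removing one feature, and ranking models by visit frequency gives a model ranking and a best model. Data are hypothetical.

```python
import math
import random
import numpy as np

def log_score(X, y, cols):
    """-BIC/2 of an OLS model with intercept: a rough log marginal likelihood."""
    n = len(y)
    design = np.column_stack([np.ones(n)] + [X[:, c] for c in sorted(cols)])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = float(np.sum((y - design @ beta) ** 2))
    return -0.5 * (n * math.log(rss / n) + design.shape[1] * math.log(n))

def mc3(X, y, n_iter=2000, seed=0):
    """Metropolis random walk over models: flip one random feature in/out and
    accept with probability min(1, p(M'|D) / p(M|D)). Returns visit counts."""
    rng = random.Random(seed)
    K = X.shape[1]
    cache = {}

    def score(model):
        if model not in cache:
            cache[model] = log_score(X, y, model)
        return cache[model]

    current = frozenset()
    visits = {}
    for _ in range(n_iter):
        proposal = current ^ {rng.randrange(K)}      # add or remove one feature
        log_ratio = score(proposal) - score(current)
        if rng.random() < math.exp(min(0.0, log_ratio)):
            current = proposal
        visits[current] = visits.get(current, 0) + 1
    return visits

rng = np.random.default_rng(5)
X = rng.normal(size=(200, 8))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(size=200)
visits = mc3(X, y)
ranking = sorted(visits.items(), key=lambda kv: kv[1], reverse=True)
for model, count in ranking[:3]:
    print(sorted(model), count / sum(visits.values()))
```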

Page 22

Can we use it for more complex models?

normalising term that you can ignore

http://www.ssc.wisc.edu/~bhansen/718/NonParametrics15.pdf http://www.ejwagenmakers.com/2004/aic.pdf

Warning: Very questionable math. Does not work
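The links appear to point towards approximating the model evidence with an information criterion. For reference, a sketch of the usual Akaike-weight formula, w_i = exp(-Δ_i/2) / Σ_j exp(-Δ_j/2) with Δ_i = AIC_i - min AIC; keep the slide's warning in mind, since the author found this approach unreliable for more complex models. The AIC values are hypothetical.

```python
import math

def information_criterion_weights(scores):
    """Akaike-style weights from AIC (or BIC) values: lower score, higher weight."""
    best = min(scores)
    raw = [math.exp(-(s - best) / 2.0) for s in scores]
    total = sum(raw)
    return [r / total for r in raw]

aic = [100.0, 102.0, 110.0]   # hypothetical AIC values for three models
print(information_criterion_weights(aic))
```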

Page 23

Can we use BMA to combine complex (incl. hierarchical) models?

(Figure: model ranking, shown as 1, 3, 2)

Model order is somewhat similar. Relative probabilities are not. We need a working Reversible-Jump MCMC or something more sophisticated.

Not yet available in common Bayesian MCMC packages.

Page 24

In Summary

- BMA is a Bayesian version of ML model ensembles
- The math behind it is quite beautiful

- Model averaging is useful for interpretation, not only prediction

- Invest in synthetic data generation before applying new modelling techniques to real-world data

- Even if you are not using BMA, fit different models, and combine them if your goal is prediction

- BMA works very well for common GLMs, but does not work yet for arbitrary models

- Do try it next time you need to fit OLS, though!

Page 25

Of course we are hiring!

● (Snr, Mid) Data Scientists

● Solutions Architect

● Ruby Developer

● Data Engineer

● Senior Artist

● Technical Artist

● Unity Developers

● Senior Product Manager

● Product Director

http://jobs.productmadness.com/