No-Bullshit Data Science Szilárd Pafka, PhD Chief Scientist, Epoch R/Finance Conference Chicago, May 2017

No-Bullshit Data Science - R in Financepast.rinfinance.com/agenda/2017/talk/SzilardPafka.pdf · - Use R/Python and high performance packages (e.g. data.table, xgboost) - Do data reduction

Apr 27, 2020


Transcript

No-Bullshit Data Science

Szilárd Pafka, PhD, Chief Scientist, Epoch

R/Finance Conference, Chicago, May 2017


Disclaimer:

I am not representing my employer (Epoch) in this talk

I can neither confirm nor deny whether Epoch is using any of the methods, tools, results, etc. mentioned in this talk


Example #1


https://deads.gitbooks.io/paratext-bench/content/teaser.html June 2016


[Figure: time [s] for aggregation of 100M rows into 1M groups, and for a join of 100M rows x 1M rows]
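To make the two benchmarked operations concrete, here is a minimal pure-Python sketch at toy scale: a hash-based group-by sum (aggregation) and an inner hash join. The table layouts, key/value meanings, and data are hypothetical. The packages compared in such benchmarks use heavily optimized native code, so this only illustrates the logic of the operations, not how any of those tools implement them.

```python
from collections import defaultdict


def group_sum(rows):
    """Aggregation: sum `value` per `key` using a hash table (one pass over the rows)."""
    totals = defaultdict(float)
    for key, value in rows:
        totals[key] += value
    return dict(totals)


def hash_join(left, right):
    """Inner join: build a hash table on the smaller table, probe it with the larger one."""
    # Assumption: keys are unique on the right side, like a dimension/lookup table
    lookup = {key: label for key, label in right}
    return [(key, value, lookup[key]) for key, value in left if key in lookup]


# Hypothetical toy data: a larger "fact" table and a smaller "dimension" table
facts = [(1, 10.0), (2, 5.0), (1, 2.5), (3, 7.0)]
dims = [(1, "a"), (2, "b")]

print(group_sum(facts))        # {1: 12.5, 2: 5.0, 3: 7.0}
print(hash_join(facts, dims))  # [(1, 10.0, 'a'), (2, 5.0, 'b'), (1, 2.5, 'a')]
```

Building the hash table on the smaller side mirrors the 100M x 1M shape of the benchmark: only the 1M-row table needs to be held in the lookup structure.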

[Figure: largest data analyzed]
[Figure: Gradient Boosting Machines, training time [s] vs. data size [M]; annotation: 10x]

[Figure: accuracy vs. data size. Annotations: linear tops off; more data & better algo; random forest on 1% of data beats linear on all data]


Summary / Tips for analyzing “big” data:

- Get lots of RAM (physical/cloud)

- Use R/Python and high performance packages (e.g. data.table, xgboost)

- Do data reduction in database (analytical db/big data system)

- (Only) distribute embarrassingly parallel tasks (e.g. hyperparameter search for machine learning)

- Let engineers (store and) ETL the data (“scalable”)

- Use statistics/domain knowledge/thinking

- Use “big data tools” only if the above tips are not enough
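Hyperparameter search is embarrassingly parallel because every configuration is trained and scored independently, with no communication between jobs. The sketch below shows that structure with Python's standard `concurrent.futures`. The parameter names `learn_rate` and `max_depth` echo the GBM settings in Example #2, but `evaluate` is a hypothetical toy scoring function standing in for "train a model, return a validation score"; it is not a real model or real results.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def evaluate(params):
    """HYPOTHETICAL stand-in for "train a model with `params`, return a validation score".

    A real version would fit e.g. a GBM on training data and compute AUC on a
    validation set; this toy formula only keeps the sketch self-contained.
    """
    lr, depth = params["learn_rate"], params["max_depth"]
    return 1.0 - (lr - 0.05) ** 2 - ((depth - 10) / 100) ** 2


def grid_search(grid, evaluate_fn=evaluate, max_workers=4):
    """Evaluate every configuration in `grid` independently and return the best one."""
    names = sorted(grid)
    configs = [dict(zip(names, values)) for values in product(*(grid[n] for n in names))]
    # No configuration depends on another, so they can run concurrently.
    # ThreadPoolExecutor keeps the sketch portable; for CPU-bound pure-Python work,
    # ProcessPoolExecutor or one job per machine would be the usual choice.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        scores = list(pool.map(evaluate_fn, configs))
    best_score, best_config = max(zip(scores, configs), key=lambda pair: pair[0])
    return best_config, best_score


grid = {"learn_rate": [0.01, 0.1], "max_depth": [6, 16]}
best_config, best_score = grid_search(grid)
print(best_config)  # {'learn_rate': 0.01, 'max_depth': 6}
```

Only the executor line changes when moving from threads to processes or to separate machines; the per-configuration work stays the same, which is what makes this kind of task a good candidate for distribution.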


Example #2


I usually use other people’s code [...] I can find open source code for what I want to do, and my time is much better spent doing research and feature engineering -- Owen Zhang, http://blog.kaggle.com/2015/06/22/profiling-top-kagglers-owen-zhang-currently-1-in-the-world/


binary classification, 10M records, numeric & categorical features, non-sparse


http://www.cs.cornell.edu/~alexn/papers/empirical.icml06.pdf

http://lowrank.net/nikos/pubs/empirical.pdf

- R packages 30%
- Python scikit-learn 40%
- Vowpal Wabbit 8%
- H2O 10%
- xgboost 8%
- Spark MLlib 6%
- a few others

EC2


n = 10K, 100K, 1M, 10M, 100M

Training time, RAM usage, AUC, CPU % by core; read data, pre-process, score test data
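AUC is one of the metrics listed above. As a hedged illustration of what it measures, this pure-Python sketch computes ROC AUC through its rank (Mann-Whitney U) formulation: the probability that a randomly chosen positive example is scored above a randomly chosen negative one, with ties counted as one half. It assumes 0/1 labels with 1 as the positive class; the example scores are made up. Sorting once keeps the cost at O(n log n), whereas comparing every positive/negative pair would be quadratic, which matters at the 10M to 100M sizes above.

```python
def auc(labels, scores):
    """ROC AUC via the rank (Mann-Whitney U) formulation.

    labels: 0/1 class labels (1 = positive class, an assumption of this sketch)
    scores: predicted scores, higher meaning "more likely positive"
    """
    labels, scores = list(labels), list(scores)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative example")

    # Sort by score once: O(n log n)
    pairs = sorted(zip(scores, labels))
    rank_sum_pos = 0.0
    i = 0
    while i < len(pairs):
        # Find the run of tied scores pairs[i..j]
        j = i
        while j + 1 < len(pairs) and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # tied scores share their average 1-based rank
        n_pos_in_run = sum(y for _, y in pairs[i:j + 1])
        rank_sum_pos += avg_rank * n_pos_in_run
        i = j + 1

    # U statistic for the positive class, normalized to [0, 1]
    u_pos = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u_pos / (n_pos * n_neg)


print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```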


10x


http://datascience.la/benchmarking-random-forest-implementations/#comment-53599


Best linear: 71.1


learn_rate = 0.1, max_depth = 6, n_trees = 300
learn_rate = 0.01, max_depth = 16, n_trees = 1000
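As a reading aid for the two settings above, the standard gradient boosting model after $M$ rounds is a shrunken sum of trees. The mapping to the slide's parameter names is an assumption of this note: learn_rate as the shrinkage factor $\nu$, n_trees as the number of rounds $M$, and max_depth as the depth limit on each tree $h_m$.

```latex
F_M(x) = F_0(x) + \nu \sum_{m=1}^{M} h_m(x),
\qquad \nu = \texttt{learn\_rate}, \quad M = \texttt{n\_trees}, \quad \operatorname{depth}(h_m) \le \texttt{max\_depth}
```

Because each tree's contribution is scaled by $\nu$, a smaller learn_rate usually needs more trees to reach a comparable fit, which is consistent with the second setting pairing 0.01 with 1000 trees.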


...


Summary
