Distributed Machine Learning with Zero ETL · Yury Babak, Head of development, GridGain · © 2018 GridGain Systems, Inc.

Aug 21, 2020

Page 1

© 2018 GridGain Systems, Inc.

Distributed Machine Learning with Zero ETL

Yury Babak

Head of development, GridGain

Page 2

Long ETL

Page 3

Long ETL

- X%

- X%

Page 4

Distributed Training

Page 5

Node Crash

Page 6

Apache Ignite

Page 7

Apache Ignite: Replicated Caches

[Diagram: Server Node 1, Server Node 2, Server Node 3, Server Node 4, Client]

Page 8

Map Reduce
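
The slide gives only the title, so the following is a minimal sketch of the general map-reduce pattern, not of any Ignite API. It assumes hypothetical in-memory lists standing in for the data partitions held by separate nodes: each partition is summarized locally (map), and the small partial results are merged (reduce).

```python
from functools import reduce

# Hypothetical data: each inner list stands for the rows held by one node.
partitions = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]

def map_partition(rows):
    """Map step: summarize one partition locally as (sum, count)."""
    return (sum(rows), len(rows))

def combine(a, b):
    """Reduce step: merge two partial summaries into one."""
    return (a[0] + b[0], a[1] + b[1])

# In a real cluster the map step would run where each partition lives;
# only the (sum, count) pairs would travel over the network.
partials = [map_partition(p) for p in partitions]
total, count = reduce(combine, partials)
mean = total / count
```

The point of the pattern is that the raw rows never leave their partition; only fixed-size summaries are moved.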

Page 9

Iterative Optimization Algorithm

Page 10

Partition Based Data Set

Page 11

Restoration of partitions after a failure

Page 12

Recovering calculations after failure

Page 13

OLS sample

Loss function

Gradient of loss function

Node 1, Node 2, ..., Node M
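
The formulas on this slide did not survive extraction. As a hedged illustration, the standard ordinary least squares loss is L(w) = sum_i (x_i . w - y_i)^2, with gradient 2 * sum_i (x_i . w - y_i) * x_i. Because the gradient is a sum over rows, each of the M nodes can compute the sum over its own rows and the partial results can simply be added. The sketch below assumes hypothetical toy data (y = 2x + 1) and plain gradient descent; the data, function names, learning rate, and step count are illustrative and not taken from the talk.

```python
# Hypothetical rows (features, label) split across three "nodes".
# The first feature is a constant 1.0 (intercept); labels follow y = 2*x + 1.
partitions = [
    [([1.0, 0.0], 1.0), ([1.0, 1.0], 3.0)],
    [([1.0, 2.0], 5.0)],
    [([1.0, 3.0], 7.0), ([1.0, 4.0], 9.0)],
]

def partial_gradient(rows, w):
    """Gradient of sum_i (x_i . w - y_i)^2 over one partition's rows."""
    grad = [0.0] * len(w)
    for x, y in rows:
        residual = sum(xj * wj for xj, wj in zip(x, w)) - y
        for j, xj in enumerate(x):
            grad[j] += 2.0 * residual * xj
    return grad

def full_gradient(parts, w):
    """Reduce step: element-wise sum of the per-partition gradients."""
    grads = [partial_gradient(rows, w) for rows in parts]
    return [sum(g[j] for g in grads) for j in range(len(w))]

def train(parts, steps=2000, lr=0.01):
    """Plain gradient descent; steps and lr are illustrative choices."""
    w = [0.0, 0.0]
    for _ in range(steps):
        g = full_gradient(parts, w)
        w = [wj - lr * gj for wj, gj in zip(w, g)]
    return w

w = train(partitions)  # approaches [1.0, 2.0] (intercept, slope)
```

Only the current weight vector and one gradient vector per node cross the network in each iteration; the rows themselves stay where they are stored.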

Page 14

Sample 2: LSQR
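
LSQR accesses the matrix A only through the products A v and A^T u, and both decompose over row partitions: A v is the concatenation of the per-block products A_k v, and A^T u is the sum of the per-block products A_k^T u_k. The sketch below shows just these two building blocks, not a full LSQR solver, on a hypothetical 3x2 matrix split across two "nodes"; all names are illustrative and whether the talk presented it this way is an assumption.

```python
# Hypothetical row blocks of a 3x2 matrix A, one block per "node".
blocks = [
    [[1.0, 2.0]],               # node 1 holds row 0
    [[3.0, 4.0], [5.0, 6.0]],   # node 2 holds rows 1 and 2
]

def local_matvec(block, v):
    """A_k v: each node produces its own slice of the result."""
    return [sum(a * vj for a, vj in zip(row, v)) for row in block]

def local_rmatvec(block, u_slice):
    """A_k^T u_k: each node produces a full-length partial sum."""
    out = [0.0] * len(block[0])
    for row, ui in zip(block, u_slice):
        for j, a in enumerate(row):
            out[j] += a * ui
    return out

def matvec(blocks, v):
    """A v, assembled by concatenating the per-node slices."""
    result = []
    for block in blocks:
        result.extend(local_matvec(block, v))
    return result

def rmatvec(blocks, u):
    """A^T u, assembled by summing the per-node partial results."""
    total = [0.0] * len(blocks[0][0])
    offset = 0
    for block in blocks:
        part = local_rmatvec(block, u[offset:offset + len(block)])
        total = [t + p for t, p in zip(total, part)]
        offset += len(block)
    return total

Av = matvec(blocks, [1.0, 1.0])
Atu = rmatvec(blocks, [1.0, 1.0, 1.0])
```

The A^T u product is the map-reduce-shaped one: every node returns a vector with one entry per feature, and the results are summed.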

Page 15

Limitations of Applicability

Iteration time

Number of Iterations

SGD BS 1 000

BS 10

Training time
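
The chart on this slide did not survive extraction, so the relation below is an assumption about how its three labels fit together rather than a formula from the talk: total training time is roughly the number of iterations multiplied by the time per iteration. The split of iteration time into a compute part and a synchronization part is likewise an illustrative assumption.

```latex
T_{\text{train}} \approx N_{\text{iter}} \cdot t_{\text{iter}},
\qquad
t_{\text{iter}} = t_{\text{compute}} + t_{\text{sync}}
```

Under this reading, an algorithm that needs many cheap iterations can still train slowly in a cluster, because every iteration pays the synchronization cost regardless of how little it computes.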

Page 16

Want to learn more?

https://ignite.apache.org

https://apacheignite.readme.io/docs

https://github.com/apache/ignite

[email protected]