Apache Zeppelin The (very) short field trip by G.Alléon & G.Dupont TDS meetup - 2016.06.30
Who are we?
Guillaume Alléon - AIRBUS Group Innovation (corporate research center)
Research leader for more than 30 people from the UK to China, tackling problems in massive data processing and information extraction.
Was already in “big data” when it was still called HPC…
Gerard Dupont - AIRBUS Defence & Space (space systems)
Technical coordinator for R&T studies on distributed processing systems.
Spent way too much time processing web data for intelligence, now looking to the sky (satellite data ;-)
Zeppelin motto
“A web-based notebook that enables interactive data analytics.”
Origins & history
Missing piece in the Hadoop landscape: a modern analytic playground.
2012.12 - Data analytics solution (NFLabs)
2013.10 - Opensourced
2014.12 - ASF incubation
2015 - 3 stable releases
2016.05 - Maturing to Apache top level project
3000 feet view
What’s cool about Zeppelin
⊕ interactive
⊕ out-of-the-box Spark integration
⊕ out-of-the-box visualization options
⊕ direct access to DOM for customized visualization
⊕ nice UI (Bootstrap & Angular)
⊕ notebook run scheduler
⊕ easy to configure
⊕ extensibility, extensibility and extensibility...
… the dark side
⊝ hard to install
⊝ need to build from source (for a customized version)
⊝ not (yet) multi-user
⊝ still “young”
⊝ resource-greedy
Overview/look & feel
Interpreter text (aka your code)
Interpreter config
Interactive results
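To make the "your code" / "interactive results" split concrete, here is a minimal sketch of what a paragraph body could look like. It assumes a Python-capable interpreter is bound to the note and relies on Zeppelin's display system, where printed output starting with %table (tab-separated columns, one row per line, header first) is rendered as a table with the built-in charts. The helper name and the sample data are made up for illustration.

```python
def to_zeppelin_table(header, rows):
    """Build a %table payload: tab-separated columns, one row per line."""
    lines = ["\t".join(str(cell) for cell in header)]
    lines.extend("\t".join(str(cell) for cell in row) for row in rows)
    return "%table " + "\n".join(lines)


# Made-up sample data, for illustration only.
payload = to_zeppelin_table(
    ["city", "visitors"],
    [["Toulouse", 120], ["Paris", 300]],
)

# Printing the payload from a paragraph is what hands it to the display system.
print(payload)
```

Run outside Zeppelin, this only prints the raw text; the table and chart rendering happens in the notebook front end.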
Under the hood
○Interpreter isolation with their own JVM
○Dynamic dependencies loading
○REST & websocket on front
○Thrift in back (or whatever you add)
○Process scheduler (cron-like)
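As a rough illustration of the REST front, the sketch below prepares two calls against the notebook API: listing notes and running all paragraphs of one note. The endpoint paths (/api/notebook and /api/notebook/job/<noteId>) are as we recall them from the Zeppelin documentation of the 0.5/0.6 era and should be checked against your version; the server address and the note id are hypothetical. Nothing is sent over the network here.

```python
from urllib.parse import quote
from urllib.request import Request


def notebook_list_url(base_url):
    """URL that lists the notes known to the server (GET)."""
    return base_url.rstrip("/") + "/api/notebook"


def run_note_url(base_url, note_id):
    """URL that asks the server to run every paragraph of a note (POST)."""
    return base_url.rstrip("/") + "/api/notebook/job/" + quote(note_id, safe="")


# Hypothetical server address and note id, for illustration only.
base = "http://localhost:8080/"
list_request = Request(notebook_list_url(base), method="GET")
run_request = Request(run_note_url(base, "NOTE123"), method="POST")

# The requests are only built, not sent; urllib.request.urlopen(run_request)
# would submit the job to a running server.
print(list_request.get_method(), list_request.full_url)
print(run_request.get_method(), run_request.full_url)
```

The cron-like scheduler covers the recurring case from the UI; a call like the second one is an option when the trigger has to come from an external tool.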
Roadmap
Enterprise Ready
○Multi-tenancy
○Job scheduler
○HA
Usability Improvement
○UX improvement
○Table data support
○Dynamic interpreter integration
○Reusable analytic application catalog
Thx
Official website: https://zeppelin.apache.org/
Notebook sample: https://www.zeppelinhub.com/viewer
Source code: https://github.com/apache/incubator-zeppelin
Mailing lists: http://zeppelin.apache.org/community.html
This TDS notebook: http://tinyurl.com/zeppelin-tds
Sources for this presentation:
○ http://www.slideshare.net/FlinkForward/moon-soo-lee-data-science-lifecycle-with-apache-flink-and-apache-zeppelin/23
○ http://www.slideshare.net/HadoopSummit/apache-zeppelin-helium-and-beyond
○ http://www.slideshare.net/felixcss/interactive-data-science-from-scratch-with-apache-zeppelin-and-apache-spark
○ http://www.slideshare.net/BrunoBonnin/explorez-vos-donnes-avec-apache-zeppelin
credits: https://www.weasyl.com/~uszatyarbuz
BACKUP
Origins & history
Active core teams
Decent number of external contributors
Plenty of interpreters (official and external)
0.6.0-SNAPSHOT (pending stabilization)
3000 feet view