@doanduyhai Apache Zeppelin, the missing component for your BigData ecosystem DuyHai DOAN, Cassandra Technical Advocate

Apache Zeppelin, the missing component for your BigData … · 2017-12-14 · Apache Zeppelin, the missing component for your BigData ecosystem DuyHai DOAN, Cassandra Technical Advocate

May 20, 2020


Who Am I?

Duy Hai DOAN, Cassandra technical advocate
•  talks, meetups, confs
•  open-source devs (Achilles, …)
•  OSS Cassandra point of contact

[email protected] ☞ @doanduyhai


DataStax

•  Founded in April 2010
•  We contribute a lot to Apache Cassandra™
•  400+ customers (25 of the Fortune 100), 400+ employees
•  Headquartered in the San Francisco Bay Area
•  EU headquarters in London, offices in France and Germany
•  DataStax Enterprise = OSS Cassandra + extra features


What is Apache Zeppelin?

•  Presentation
•  Architecture


Zeppelin Presentation


Zeppelin Architecture

[Figure: architecture diagram. Labels: Zeppelin Server (REST, WebSocket), Zeppelin Engine, Zeppelin Interpreter Factory, Spark Interpreter Group (Spark, SparkSQL), Tajo Interpreter, Flink Interpreter, Cassandra Interpreter; five boxes labelled JVM.]


What does Zeppelin provide?

•  Front-end & display system for free
•  Generic back-end with REST APIs & WebSocket
•  Pluggable interpreters system
•  Task scheduler (à la CRON)


Zeppelin UI Layout

•  Notebook
•  Paragraph
•  UI elements


Demo https://github.com/doanduyhai/incubator-zeppelin/tree/ZeppelinPresentation


Zeppelin Display System

•  Raw, Table, HTML
•  Available graphs
•  View modes
•  Dynamic form
•  Iframe export


Demo https://github.com/doanduyhai/incubator-zeppelin/tree/ZeppelinPresentation


Interpreter system

•  Core interpreters
•  Third-party interpreters
•  Interpreters conf & usage


Interpreter processing lifecycle

①  Receive input commands/data
    •  as raw text
    •  from form data
②  Process the input commands/data by the external back-end
③  Format the response using Zeppelin display system
④  Send response back to the Zeppelin engine
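The four lifecycle steps can be sketched in plain Java. This is only an illustration under stated assumptions: `LifecycleSketch`, `interpret` and the fake back-end are hypothetical names, not Zeppelin's Interpreter API. The sketch also assumes the display system accepts a `%table` prefix followed by tab-separated cells and newline-separated rows.

```java
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch of the lifecycle; not Zeppelin's actual Interpreter API.
final class LifecycleSketch {

    // backend stands in for any external system that turns a command into rows.
    static String interpret(String rawInput, Function<String, List<String[]>> backend) {
        // Step 1: receive the input command as raw text.
        String command = rawInput.trim();

        // Step 2: let the external back-end process the command.
        List<String[]> rows = backend.apply(command);

        // Step 3: format the response for the display system
        // (assumed format: "%table", tab-separated cells, one row per line).
        StringBuilder out = new StringBuilder("%table ");
        for (int i = 0; i < rows.size(); i++) {
            if (i > 0) {
                out.append('\n');
            }
            out.append(String.join("\t", rows.get(i)));
        }

        // Step 4: hand the formatted response back to the caller (the engine).
        return out.toString();
    }

    public static void main(String[] args) {
        // A fake back-end that reports the command and its length.
        Function<String, List<String[]>> fakeBackend = cmd -> List.of(
            new String[] {"command", "length"},
            new String[] {cmd, String.valueOf(cmd.length())});
        System.out.println(interpret("  SELECT 1  ", fakeBackend));
    }
}
```

Passing the back-end as a function keeps the sketch independent of any particular system, which mirrors the idea that an interpreter is a thin adapter around an external engine.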


Core interpreters

•  Spark (Spark core, SparkSQL/DataFrame, PySpark)
    •  Spark core = default (or %spark)
    •  SparkSQL = %sql
•  Shell (%sh)
•  Markdown (%md)
•  AngularJS (%angular)
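To illustrate how a `%name` prefix can select an interpreter, here is a minimal sketch that splits a paragraph into an interpreter name and a body, falling back to Spark when no prefix is present. `ParagraphRouter` and `route` are hypothetical names; Zeppelin's real parsing logic may differ.

```java
// Hypothetical sketch: choose an interpreter from a paragraph's "%name" prefix.
final class ParagraphRouter {

    // Assumed default, matching "Spark core = default (or %spark)".
    static final String DEFAULT_INTERPRETER = "spark";

    // Returns a two-element array: {interpreterName, paragraphBody}.
    static String[] route(String paragraph) {
        String text = paragraph.stripLeading();
        if (!text.startsWith("%")) {
            return new String[] {DEFAULT_INTERPRETER, text};
        }
        // The interpreter name runs from after '%' to the first whitespace.
        int end = 1;
        while (end < text.length() && !Character.isWhitespace(text.charAt(end))) {
            end++;
        }
        String name = text.substring(1, end);
        String body = text.substring(end).stripLeading();
        return new String[] {name, body};
    }

    public static void main(String[] args) {
        String[] routed = route("%sql SELECT * FROM logs");
        System.out.println(routed[0] + " <- " + routed[1]);
    }
}
```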


Third-party interpreters

•  Hive
•  Phoenix
•  Tajo
•  Flink
•  Ignite
•  Lens
•  Cassandra
•  Geode
•  PostgreSQL
•  Kylin


Interpreter conf & usage https://github.com/doanduyhai/incubator-zeppelin/tree/ZeppelinPresentation


Writing An Interpreter

•  How To
•  Simple interpreter example (AsciiDoc)
•  Complex interpreter example (Cassandra)


Steps to write your own interpreter

•  Create a class that extends the Interpreter base class

•  Register it in a static block

    static {
        // Register the interpreter under its name, pointing at the implementing class.
        Interpreter.register("MyInterpreterName", MyClassName.class.getName());
    }

•  Optionally define default config params

    static {
        // Same registration, plus a default property with its description.
        Interpreter.register(
            "MyInterpreterName",
            MyClassName.class.getName(),
            new InterpreterPropertyBuilder()
                .add("property1", "default value", "Description of property1")
                .build());
    }
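The static-block registration pattern can be tried without Zeppelin on the classpath. The sketch below is a self-contained stand-in: `RegistrySketch`, `register`, `load` and `EchoInterpreter` are hypothetical names that only mimic the shape of `Interpreter.register(...)`. It also shows why something must load the class by name first: a static block runs only when its class is initialized.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a registry filled from static blocks.
final class RegistrySketch {

    // Maps an interpreter name to the fully qualified name of its class.
    static final Map<String, String> REGISTERED = new ConcurrentHashMap<>();

    static void register(String name, String className) {
        REGISTERED.put(name, className);
    }

    // A hypothetical interpreter that registers itself when its class is initialized.
    static final class EchoInterpreter {
        static {
            RegistrySketch.register("echo", EchoInterpreter.class.getName());
        }
    }

    // Loads (and initializes) a class by name, which runs its static block.
    static void load(String className) {
        try {
            Class.forName(className);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("Cannot load " + className, e);
        }
    }

    public static void main(String[] args) {
        // Referencing the class literal alone does not run the static block.
        String className = EchoInterpreter.class.getName();
        System.out.println("before load: " + REGISTERED.keySet());
        load(className);
        System.out.println("after load: " + REGISTERED.keySet());
    }
}
```

This is one plausible reason a plug-in host needs a list of class names in its configuration: the host has to load each class before any self-registration can happen.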


To register your interpreter as default

•  Edit the enum ZeppelinConfiguration.ConfVars

•  Add your interpreter FQCN in the property ZEPPELIN_INTERPRETERS


To register your interpreter in config files

•  Create conf/zeppelin-site.xml from conf/zeppelin-site.xml.template

•  Add your interpreter FQCN in the property zeppelin.interpreters

<property>
  <name>zeppelin.interpreters</name>
  <value>org.apache.zeppelin.spark.SparkInterpreter,org.apache.zeppelin.spark.PySparkInterpreter,
    org.apache.zeppelin.spark.SparkSqlInterpreter,org.apache.zeppelin.spark.DepInterpreter,
    org.apache.zeppelin.markdown.Markdown,org.apache.zeppelin.shell.ShellInterpreter,
    org.apache.zeppelin.hive.HiveInterpreter,com.me.MyNewInterpreter
  </value>
</property>
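As an illustration of how a comma-separated class list like the one above could be consumed, here is a small sketch that splits the value and trims surrounding whitespace and line breaks. `InterpreterListParser` is a hypothetical helper; whether Zeppelin itself trims entries this way is an assumption, not something the slides state.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: turn a comma-separated property value into class names.
final class InterpreterListParser {

    static List<String> parse(String value) {
        List<String> classNames = new ArrayList<>();
        for (String part : value.split(",")) {
            // Trim spaces and line breaks left over from the XML layout.
            String name = part.trim();
            if (!name.isEmpty()) {
                classNames.add(name);
            }
        }
        return classNames;
    }

    public static void main(String[] args) {
        String value = "org.apache.zeppelin.shell.ShellInterpreter,\n"
            + "    com.me.MyNewInterpreter\n";
        System.out.println(parse(value));
    }
}
```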


Simple AsciiDoc Interpreter

[Figure: flow between the Zeppelin Server (Zeppelin Engine, JVM) and the AsciiDoc Interpreter (JVM), steps ① to ④. Labels: Raw Text Block, Raw Text Block, Converted To HTML, HTML Output.]
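The raw-text-to-HTML flow can be sketched with a toy converter. This is not the AsciiDoc interpreter from the demo branch, and it does not use a real AsciiDoc library: `ToyAsciiDoc` is a hypothetical class that handles only `= Title`, `== Section` and plain paragraphs. It prefixes the result with `%html` so that the display system would treat the output as HTML.

```java
// Hypothetical toy converter for a tiny AsciiDoc subset; not a real AsciiDoc engine.
final class ToyAsciiDoc {

    // Escapes the characters that would otherwise be read as HTML markup.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Converts each non-empty line to a heading or a paragraph.
    static String toHtml(String source) {
        StringBuilder html = new StringBuilder("%html ");
        for (String line : source.split("\n")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            if (trimmed.startsWith("== ")) {
                html.append("<h2>").append(escape(trimmed.substring(3))).append("</h2>");
            } else if (trimmed.startsWith("= ")) {
                html.append("<h1>").append(escape(trimmed.substring(2))).append("</h1>");
            } else {
                html.append("<p>").append(escape(trimmed)).append("</p>");
            }
        }
        return html.toString();
    }

    public static void main(String[] args) {
        System.out.println(toHtml("= Zeppelin\n\n== Interpreters\nPluggable back-ends."));
    }
}
```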


Simple interpreter (AsciiDoc) https://github.com/doanduyhai/incubator-zeppelin/tree/ZeppelinPresentation


Cassandra Interpreter Architecture

[Figure: flow between the Zeppelin Server (JVM) and the Cassandra Interpreter (JVM). Labels: ① ② Raw Text Block, Raw Text Block; Async CQL statements sent to Cassandra through the Cassandra Java Driver; Display Results as HTML; ④ Render HTML.]


Cassandra Interpreter Commands

•  Native CQL statements: SELECT * FROM …; INSERT INTO …; …
•  Schema commands: DESCRIBE TABLE …; DESCRIBE KEYSPACE …; …
•  Prepared statement commands: @prepare …; @bind …; @remove_prepared …;
•  Help command: HELP;
•  Option commands: @consistency …; @retryPolicy …; @fetchSize …;
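The command families above could be told apart with a simple prefix check, as in the sketch below. It uses only the command names shown on the slide; `CommandClassifier` and `Kind` are hypothetical, and the real interpreter presumably uses a proper parser rather than string prefixes.

```java
import java.util.Locale;

// Hypothetical sketch: sort a statement into one of the command families.
final class CommandClassifier {

    enum Kind { PREPARED_STATEMENT, OPTION, SCHEMA, HELP, NATIVE_CQL }

    static Kind classify(String statement) {
        String s = statement.trim();
        // Prepared statement commands listed on the slide.
        if (s.startsWith("@prepare") || s.startsWith("@bind") || s.startsWith("@remove_prepared")) {
            return Kind.PREPARED_STATEMENT;
        }
        // Option commands listed on the slide.
        if (s.startsWith("@consistency") || s.startsWith("@retryPolicy") || s.startsWith("@fetchSize")) {
            return Kind.OPTION;
        }
        String upper = s.toUpperCase(Locale.ROOT);
        if (upper.startsWith("DESCRIBE ")) {
            return Kind.SCHEMA;
        }
        if (upper.equals("HELP;") || upper.equals("HELP")) {
            return Kind.HELP;
        }
        // Everything else is treated as a native CQL statement.
        return Kind.NATIVE_CQL;
    }

    public static void main(String[] args) {
        System.out.println(classify("DESCRIBE TABLE users;"));
        System.out.println(classify("SELECT * FROM users;"));
    }
}
```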


Complex interpreter (Cassandra) https://github.com/doanduyhai/incubator-zeppelin/tree/ZeppelinPresentation


Zeppelin future

•  Roadmap


Roadmap & future

•  More graph options (Map viz, ZEPPELIN-157)
•  Helium project: packaging Zeppelin view, logic (code) & resource into Applications
•  Interpreters packaging re-design
    •  ship & compile core interpreters only
    •  third-party interpreters can be pulled from a repository
    •  which interpreters are core? Who will maintain them? The community…
•  Integrate security (Apache Shiro, ZEPPELIN-53)


Roadmap & future

•  Graduate from incubation to become a first-class Apache project


Q & A


Thank You @doanduyhai

[email protected]

http://zeppelin.incubator.apache.org/