Livy: A REST Web Service for Spark
River IQ

Jan 21, 2018

Ashish Kumar

What is Livy?

A service that manages long-running Spark contexts in your cluster.

• A service that provides interaction with an Apache Spark cluster through a REST interface.

• Open source, Apache licensed.

• Suited to multi-tenant environments, as it manages multiple Spark contexts efficiently.

• Removes the need for a local Spark environment, so jobs can be submitted from mobile or web applications.

• Fine-grained job submission.

• Retrieve job results over REST, asynchronously or synchronously.

• Client APIs in Java and Scala, and soon in Python.


Features of Livy

• Interactive Scala, Python, and R shells

• Batch submissions in Scala, Java, and Python

• Handles multiple Spark jobs at the same time.

• Reliable for multi-tenant execution.

• Jobs can be submitted from anywhere over REST.

• Supports Spark 1 and Spark 2, and Scala 2.10 and 2.11, within one build.

• 100% open source, Apache licensed.

• Supports impersonation, so multiple users can share the same server.

• Existing code needs no changes, except that instead of creating its own SparkContext, it uses the SparkContext that Livy predefines (sc).

• Share cached RDDs or DataFrames between multiple jobs or clients.
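Batch submissions go through Livy's /batches endpoint rather than /sessions. Below is a minimal sketch of building such a request with Python's standard library, assuming a Livy server at http://localhost:8998 as in the later curl examples. The helper build_batch_request is a hypothetical name, not part of a Livy client library, and the jar path, script path and class name are made up for illustration.

```python
import json
import urllib.request

LIVY_URL = "http://localhost:8998"  # assumed Livy server address


def build_batch_request(file, class_name=None, args=(), base_url=LIVY_URL):
    """Build (but do not send) a POST /batches request.

    file       -- application jar or Python script, at a path the cluster can read
    class_name -- main class of a JVM application; leave as None for Python scripts
    args       -- command-line arguments passed to the application
    """
    payload = {"file": file}
    if class_name is not None:
        payload["className"] = class_name
    if args:
        payload["args"] = list(args)
    return urllib.request.Request(
        f"{base_url}/batches",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical paths and class name, for illustration only.
scala_job = build_batch_request(
    "hdfs:///apps/wordcount.jar", "com.example.WordCount", ["/data/input.txt"]
)
python_job = build_batch_request("hdfs:///apps/wordcount.py", args=["/data/input.txt"])
```

Sending either request with urllib.request.urlopen returns a JSON description of the batch, whose id can then be used to poll GET /batches/{id}.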


Jupyter-Spark Integration via Livy

Sparkmagic is an open-source library that Microsoft is incubating under the Jupyter Incubator program. Thousands of Spark clusters in production provide feedback to further improve the experience.

Architectural Advantages of Jupyter integration via Livy

• Run Spark code completely remotely; no Spark components need to be installed on the Jupyter server

• Multi-language support; the Python, Scala and R kernels are equally feature-rich

• Support for multiple endpoints; you can use a single notebook to start multiple Spark jobs in different languages and against different remote clusters

• Easy integration with any Python library for data science or visualization, like Pandas or Plotly


Manage multiple independent Spark Contexts


User Impersonation


Zeppelin Livy Interaction


Interactive Session – Create Session

Request:

curl -X POST --data '{"kind": "spark"}' -H "Content-Type: application/json" localhost:8998/sessions

Response:

{"state":"starting","proxyUser":null,"id":1,"kind":"spark","log":[]}

[Diagram (steps 1-4): Livy Client → Livy Server → Spark Interactive Session (Spark Context), with the request and response exchanged between client and server.]
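The same create-session call can be made from Python with only the standard library. This is a minimal sketch, assuming the server address from the curl example above; build_create_session_request and create_session are hypothetical helper names, and nothing is sent over the network until create_session calls urlopen.

```python
import json
import urllib.request

LIVY_URL = "http://localhost:8998"  # server address used in the curl example


def build_create_session_request(kind="spark", base_url=LIVY_URL):
    """Build (but do not send) the POST /sessions request from the curl example.

    kind -- interpreter for the session, e.g. "spark" (Scala), "pyspark" or "sparkr"
    """
    return urllib.request.Request(
        f"{base_url}/sessions",
        data=json.dumps({"kind": kind}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def create_session(kind="spark", base_url=LIVY_URL):
    """Send the request; Livy replies with the new session's id and state."""
    with urllib.request.urlopen(build_create_session_request(kind, base_url)) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

The session starts in the state starting; it is ready for code once GET /sessions/{id} reports idle. Each call creates a separate session with its own SparkContext.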


Interactive Session – Execute Code

Request:

curl http://localhost:8998/sessions/0/statements -X POST -H 'Content-Type: application/json' -d '{"code":"sc.parallelize(0 to 100).sum()"}'

Response:

{"id":0,"state":"running","output":null}

[Diagram (steps 1-4): Livy Client → Livy Server → Spark Interactive Session (SparkContext), with the request and response exchanged between client and server.]
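Because a statement is first reported as running (as in the response above), a client that wants the result synchronously has to poll GET /sessions/{id}/statements/{statement_id} until the statement reaches a final state such as available. The sketch below shows that polling loop, assuming the server address from the curl example. run_statement and http_json are hypothetical helpers, and the http parameter lets any transport (for example a test double) replace the standard-library one used by default.

```python
import json
import time
import urllib.request

LIVY_URL = "http://localhost:8998"  # server address used in the curl examples

# Statement states that mean "not finished yet"; anything else is treated as final.
PENDING_STATES = ("waiting", "running", "cancelling")


def http_json(url, payload=None):
    """GET url, or POST payload as JSON when given, and return the decoded JSON reply."""
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    req = urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="GET" if payload is None else "POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def run_statement(session_id, code, http=http_json, base_url=LIVY_URL,
                  interval=1.0, timeout=60.0):
    """Submit code to an interactive session and block until the statement finishes."""
    stmt = http(f"{base_url}/sessions/{session_id}/statements", {"code": code})
    status_url = f"{base_url}/sessions/{session_id}/statements/{stmt['id']}"
    deadline = time.monotonic() + timeout
    while stmt["state"] in PENDING_STATES:
        if time.monotonic() > deadline:
            raise TimeoutError(f"statement {stmt['id']} still {stmt['state']}")
        time.sleep(interval)
        stmt = http(status_url)  # GET the latest statement description
    return stmt
```

When the state is available, the result is under output, typically with status set to ok and the text result in data["text/plain"]. Skipping the loop and returning the first response gives the asynchronous style instead.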


SparkContext Sharing

[Diagram: Clients 1, 2 and 3 connect to one Livy Server, which routes Session-1 requests to SparkSession-1 and Session-2 requests to SparkSession-2, each with its own SparkContext.]
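Sharing works because every statement sent to the same session id runs in the same interpreter and SparkContext, so state created by one client is visible to the next. The sketch below shows two clients addressing one session, assuming a server at http://localhost:8998 and an already running Scala session with the hypothetical id 0; build_statement_request is an illustrative helper, and the requests are only built here, not sent.

```python
import json
import urllib.request

LIVY_URL = "http://localhost:8998"  # assumed server address, as in the slides
SHARED_SESSION_ID = 0               # hypothetical id of an already running session


def build_statement_request(session_id, code, base_url=LIVY_URL):
    """Build (but do not send) POST /sessions/{id}/statements for one code snippet."""
    return urllib.request.Request(
        f"{base_url}/sessions/{session_id}/statements",
        data=json.dumps({"code": code}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Client 1 defines and caches an RDD inside the shared session's SparkContext.
client_1 = build_statement_request(
    SHARED_SESSION_ID,
    "val numbers = sc.parallelize(0 to 100).cache()\nnumbers.count()",
)

# Client 2 reuses the cached RDD by name; this only works because both requests
# target the same session, whose interpreter keeps state between statements.
client_2 = build_statement_request(SHARED_SESSION_ID, "numbers.sum()")
```

Each request would then be sent with urllib.request.urlopen and its statement polled as usual. Sending client_2 to a different session would fail, because numbers is not defined there.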


Livy Security

[Diagram: Client → Livy Server via SPNEGO; Livy Server → SparkSession via impersonation and a shared secret.]

• Only authorized users can launch a Spark session or submit code.

• Each user can access only their own session.

• Only the Livy server can securely submit jobs to a Spark session.
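The access rules above can be modelled as a small check. This is an illustrative sketch, not Livy's implementation: SessionRecord, can_access and the set of superusers are assumptions introduced here to show the rule "authenticated users only, and each user only on their own session".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SessionRecord:
    """Hypothetical server-side record of an interactive session."""
    session_id: int
    owner: str  # authenticated (e.g. via SPNEGO) user who created the session


def can_access(session: SessionRecord, principal: Optional[str],
               superusers=frozenset()) -> bool:
    """Return True if the authenticated principal may use this session.

    Unauthenticated requests (principal is None) are always rejected; otherwise
    only the session owner or a configured superuser is allowed.
    """
    if principal is None:
        return False
    return principal == session.owner or principal in superusers
```

For example, a session created by alice is usable by alice, rejected for bob, and rejected for any request that failed authentication.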


SPNEGO

Client (Kerberos TGT)

Livy Server (SPNEGO enabled)

• Simple and Protected GSSAPI Negotiation Mechanism (SPNEGO), often pronounced "spen-go".

• It is a GSSAPI "pseudo mechanism" used by client-server software to negotiate the choice of security technology.

• Handshake: the client sends HTTP GET http://site/a.html; the server answers 401 Unauthorized; the client repeats the HTTP GET request with the header Authorization: Negotiate <token>.
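The two-step exchange on this slide can be simulated without a Kerberos setup. In this sketch send and make_token are hypothetical stand-ins: a real client would obtain the token through GSSAPI from the user's Kerberos TGT (Python libraries such as requests-kerberos wrap this step). The header format, Negotiate followed by a base64 token, follows RFC 4559.

```python
import base64


def negotiate_header(token: bytes) -> str:
    """Format a SPNEGO token as an HTTP Authorization header value."""
    return "Negotiate " + base64.b64encode(token).decode("ascii")


def fetch_with_spnego(send, url, make_token):
    """Perform the SPNEGO exchange: plain GET, 401 challenge, authenticated retry.

    send(url, headers) -> (status, response_headers, body) is any HTTP transport;
    make_token() -> bytes stands in for the GSSAPI call that turns the user's
    Kerberos TGT into a token for the Livy server.
    """
    status, headers, body = send(url, {})
    if status == 401 and headers.get("WWW-Authenticate", "").startswith("Negotiate"):
        # The server demands SPNEGO: retry the same request with a Negotiate token.
        status, headers, body = send(url, {"Authorization": negotiate_header(make_token())})
    return status, body
```

Keeping the transport and token source as parameters separates the protocol logic from Kerberos itself, which is what makes the handshake testable with a fake server.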


Impersonation

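[Diagram: Alice (Kerberos TGT) and Bob (Kerberos TGT) each authenticate to the Livy Server (super user: livy) via SPNEGO; the server runs a separate Spark Session for each of them and communicates with each session through a shared secret.]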
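A client asks for impersonation by adding proxyUser to the session-creation body, the same field that appears in the create-session response. The rule deciding which user a session runs as is modelled below as a hypothetical sketch of this slide's flow: create_session_body and effective_user are illustrative names, and whether a user may set proxyUser to someone other than themselves depends on the server's configuration.

```python
import json


def create_session_body(kind="spark", proxy_user=None):
    """JSON body for POST /sessions; proxyUser asks Livy to run the session as that user."""
    body = {"kind": kind}
    if proxy_user is not None:
        body["proxyUser"] = proxy_user
    return json.dumps(body)


def effective_user(authenticated, proxy_user, superusers):
    """Illustrative rule: users may impersonate only themselves unless they are superusers.

    authenticated -- user identified by SPNEGO (e.g. "alice")
    proxy_user    -- requested proxyUser, or None to run as the authenticated user
    superusers    -- users allowed to impersonate others (hypothetical setting)
    """
    if proxy_user is None or proxy_user == authenticated:
        return authenticated
    if authenticated in superusers:
        return proxy_user
    raise PermissionError(f"{authenticated} may not impersonate {proxy_user}")
```

In the slide's picture, Alice's and Bob's requests both reach the server running as livy, but each Spark Session runs as the user who asked for it.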


Shared Secret

• The Livy server generates a secret key.

• The Livy server passes the secret key to the Spark session when launching it.

• The Livy server and the Spark session use the secret key to communicate with each other.

[Diagram: Livy Server and Spark Session holding the same shared secret.]
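The three steps can be illustrated with a keyed message authentication code. HMAC-SHA256 is used here only to show how two parties prove knowledge of a shared secret; it is a swapped-in technique, not necessarily the mechanism Livy's internal RPC uses, and how the secret travels to the session at launch is not shown. generate_secret, sign and verify are hypothetical names.

```python
import hashlib
import hmac
import secrets


def generate_secret() -> str:
    """Livy-server side: create a random 256-bit secret for one Spark session."""
    return secrets.token_hex(32)


def sign(secret: str, message: bytes) -> str:
    """Tag a message so the peer can check it came from a holder of the secret."""
    return hmac.new(secret.encode("ascii"), message, hashlib.sha256).hexdigest()


def verify(secret: str, message: bytes, tag: str) -> bool:
    """Check a tag in constant time to avoid leaking information through timing."""
    return hmac.compare_digest(sign(secret, message), tag)
```

A session that was launched with the secret accepts only messages whose tags verify, so a third party without the key cannot submit work to it.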
