Couchbase & Spark: efficient data crunching in a fast-moving world

Spark with Couchbase to Electrify Your Data Processing – Couchbase Live New York 2015

Apr 08, 2017


Couchbase
Transcript
Page 1: Spark with Couchbase to Electrify Your Data Processing – Couchbase Live New York 2015

Couchbase & Spark: efficient data crunching in a fast-moving world

Page 2

©2015 Couchbase Inc. 2

The Big Picture

Page 3

Apache Spark… is a fast, general-purpose engine for small- and large-scale data processing…

Page 4

Components: Spark Core

Resilient Distributed Datasets

Clustering

Execution

Page 5

Components: Spark SQL

Structured through DataFrames

Distributed querying with SQL

Page 6

Components: Spark Streaming

Fault-tolerant streaming applications

Page 7

Components: Spark MLlib

Built-In Machine Learning Algorithms

Page 8

Components: Spark GraphX

Graph processing and graph-parallel computations

Page 9

How does it work?

Source: http://spark.apache.org/docs/latest/cluster-overview.html

Page 10

Spark Benefits

Linearly scalable to 1000+ worker nodes

Simpler to use than Hadoop MapReduce

Only partial recompute on failure

For developers and data scientists
– machine learning
– R integration

Tight but not mandatory Hadoop integration
– sources, sinks
– scheduler
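"Only partial recompute on failure" refers to lineage: a Spark RDD records the transformations that produced it, so a lost partition can be rebuilt from its parent data instead of rerunning the whole job. The sketch below is a deliberately simplified single-process model in plain Python; `ToyRDD`, `lose` and `recomputed` are hypothetical names for illustration, not Spark API.

```python
from typing import Callable, Dict, List, Optional


class ToyRDD:
    """Hypothetical single-process model of an RDD with lineage (not Spark's API)."""

    def __init__(self, source: List[List[int]],
                 steps: Optional[List[Callable[[int], int]]] = None) -> None:
        self.source = source             # input data, already split into partitions
        self.steps = steps or []         # lineage: transformations, in order
        self.cache: Dict[int, List[int]] = {}
        self.recomputed: List[int] = []  # partition indices computed so far

    def map(self, fn: Callable[[int], int]) -> "ToyRDD":
        # Lazy transformation: only the lineage grows, nothing is computed yet.
        return ToyRDD(self.source, self.steps + [fn])

    def _compute(self, idx: int) -> List[int]:
        # Rebuild one partition by replaying its lineage over the source data.
        self.recomputed.append(idx)
        out = self.source[idx]
        for fn in self.steps:
            out = [fn(x) for x in out]
        return out

    def partition(self, idx: int) -> List[int]:
        if idx not in self.cache:
            self.cache[idx] = self._compute(idx)
        return self.cache[idx]

    def collect(self) -> List[int]:
        return [x for i in range(len(self.source)) for x in self.partition(i)]

    def lose(self, idx: int) -> None:
        # Simulate a failed worker dropping one cached partition.
        self.cache.pop(idx, None)


rdd = ToyRDD([[1, 2], [3, 4], [5, 6]]).map(lambda x: x * 10).map(lambda x: x + 1)
print(rdd.collect())   # [11, 21, 31, 41, 51, 61]
rdd.lose(1)            # partition 1 is lost...
print(rdd.collect())   # ...and rebuilt: [11, 21, 31, 41, 51, 61]
print(rdd.recomputed)  # [0, 1, 2, 1]: only partition 1 ran twice
```

In real Spark the scheduler decides where the lost partition is rebuilt; the point of the model is only that unaffected partitions are not recomputed.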

Page 11

Spark vs Hadoop

Spark works mainly in RAM, while Hadoop MapReduce is mainly bound to HDFS (disk)

Fully compatible with Hadoop input and output formats

Easier to develop against thanks to functional composition

Hadoop certainly more mature, but Spark ecosystem growing fast
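"Functional composition" means a job is written as a chain of small transformations rather than as separate mapper and reducer classes. In Spark's Scala API the classic word count is `lines.flatMap(_.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)`. The plain-Python sketch below mirrors that chain with hypothetical helpers `flat_map` and `reduce_by_key`; it runs in one process and shows only the shape of the code, not Spark's distribution.

```python
from itertools import chain
from typing import Callable, Dict, Iterable, Iterator, Tuple


def flat_map(fn: Callable[[str], Iterable[str]], items: Iterable[str]) -> Iterator[str]:
    # Like RDD.flatMap: apply fn to each item and flatten the results.
    return chain.from_iterable(fn(x) for x in items)


def reduce_by_key(fn: Callable[[int, int], int],
                  pairs: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    # Like reduceByKey: merge all values that share a key using fn.
    acc: Dict[str, int] = {}
    for key, value in pairs:
        acc[key] = fn(acc[key], value) if key in acc else value
    return acc


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    words = flat_map(str.split, lines)        # lines -> words
    pairs = ((word, 1) for word in words)     # word -> (word, 1)
    return reduce_by_key(lambda a, b: a + b, pairs)


print(word_count(["spark and couchbase", "spark streaming"]))
# {'spark': 2, 'and': 1, 'couchbase': 1, 'streaming': 1}
```

Each step is an ordinary function over a collection, which is what makes such pipelines easy to read, test and recombine.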

Page 12

Couchbase in the Spark Landscape

Transparent generation and persistence of:
– RDDs
– DataFrames
– DStreams

Spark SQL and N1QL are a natural fit

Linearly scale your data and application layer

Share data between Spark applications

The perfect storage companion for your Spark applications.

Source: http://spark.apache.org/docs/latest/cluster-overview.html

Page 13

Cluster Communication

[Diagram: six Couchbase Server nodes (Couchbase Server 1 to 6). Each node runs a Cluster Manager, a Managed Cache, Storage, and the Data, Index and Query Services, and holds a set of shards (e.g. SHARD5, SHARD7, SHARD9). Two Spark Workers are shown alongside the cluster.]
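The diagram pairs Spark workers with Couchbase nodes. Key-based access can go straight to the right node because a Couchbase bucket is split into shards, called vBuckets: a client hashes the key to a vBucket and then looks up which node currently holds that vBucket in the cluster map. The sketch below assumes the CRC32-based key hashing used by Couchbase SDKs and 1024 vBuckets per bucket (the usual default on Linux). The round-robin cluster map and the node names are hypothetical, since a real map is published by the Cluster Manager and changes during rebalance; this is not the connector's implementation.

```python
import zlib
from typing import Dict, Iterable, List

NUM_VBUCKETS = 1024  # assumption: default vBucket count per bucket


def vbucket_for(key: str, num_vbuckets: int = NUM_VBUCKETS) -> int:
    # Assumed SDK-style mapping: take bits 16..30 of CRC32(key), then mod the count.
    crc = zlib.crc32(key.encode("utf-8"))
    return ((crc >> 16) & 0x7FFF) % num_vbuckets


def round_robin_map(nodes: List[str], num_vbuckets: int = NUM_VBUCKETS) -> List[str]:
    # Hypothetical cluster map: vBucket i is active on nodes[i % len(nodes)].
    return [nodes[i % len(nodes)] for i in range(num_vbuckets)]


def group_keys_by_node(keys: Iterable[str], vbucket_map: List[str]) -> Dict[str, List[str]]:
    # Batch keys per owning node so each worker can talk to that node directly.
    plan: Dict[str, List[str]] = {}
    for key in keys:
        node = vbucket_map[vbucket_for(key, len(vbucket_map))]
        plan.setdefault(node, []).append(key)
    return plan


nodes = [f"couchbase-server-{i}" for i in range(1, 7)]  # hypothetical names
vb_map = round_robin_map(nodes)
plan = group_keys_by_node(["airline_10", "airline_137", "route_5966"], vb_map)
for node, keys in sorted(plan.items()):
    print(node, keys)
```

Because the mapping is deterministic, every client that holds the same cluster map sends a given key to the same node without asking a coordinator first.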

Page 14

Ecosystem Flexibility

[Diagram labels: RDBMS; Streams; Web APIs; DCP, KV, N1QL, Views; Batching; Data Archive; OLTP Data]

Page 15

Infrastructure Consolidation

Page 16

The Connector

Page 17

Couchbase Connector

Spark Core
– Automatic cluster and resource management
– Creating and persisting RDDs
– Java APIs in addition to Scala

Spark SQL
– Easy JSON handling and querying
– Tight N1QL integration

Spark Streaming
– Persisting DStreams
– DCP source (experimental)

Page 19

Connection Management

Page 20

Connection Management

Page 21

Creating RDDs

Page 22

Persisting RDDs

Page 23

Spark SQL Integration

Page 24

Spark Streaming with DCP

Page 25

What's next?

Page 26

Couchbase Connector 1.1.0 plans

– Upgrade to Spark 1.5
– Stabilize DCP support
– Extend, optimize, fix bugs…

We need your feedback!