Revolutionizing Big Data in the Enterprise with Spark Ion Stoica October 28, 2015

Spark Summit EU 2015: Revolutionizing Big Data in the Enterprise with Spark

Jan 11, 2017

Databricks

We Have Seen a Lot

Worked with hundreds of companies to run Spark in production over five years

Collaborate with all major Hadoop and Big Data vendors

How Does Spark Change Enterprise Big Data?

• Unifying data sources

• Unifying data processing

Unifying Data Sources

Traditional Data Warehouses

Need to process data from
• Multiple sources
• Different data stores and locations
• Different formats

Traditional solutions: ETL data into data warehouse, …

[Diagram labels: ETL, Data Warehouse]

Slow to access and combine data

Just-In-Time (JIT) Data Warehouse

JIT Data Warehouse

Process data in place or stream it
• No need to wait for data to be ETLed

[Diagram labels: ETL, Data Warehouse]

JIT Data Warehouse

Process data in place or stream it
• No need to wait for data to be ETLed

Cache data in memory or SSDs

Low latency and easy to combine data: value!
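
The "process in place, then cache" idea can be sketched in a few lines. This is a minimal plain-Python illustration, not Spark code and not the speaker's implementation: the two inline sources, the `CachedSource` class, and the `clicks_per_customer` query are all hypothetical names invented for the example. Each source stays in its own format, is read only when a query first needs it, and is then served from memory.

```python
import csv
import io
import json

# Hypothetical stand-ins for two sources that stay in their own formats.
CUSTOMERS_CSV = "customer_id,name\n1,Ada\n2,Grace\n"
CLICKS_JSONL = (
    '{"customer_id": 1, "page": "/home"}\n'
    '{"customer_id": 1, "page": "/cart"}\n'
    '{"customer_id": 2, "page": "/home"}\n'
)


class CachedSource:
    """Reads a source in place on first use, then serves it from memory."""

    def __init__(self, loader):
        self._loader = loader
        self._rows = None
        self.loads = 0  # how many times the underlying source was read

    def rows(self):
        if self._rows is None:
            self._rows = list(self._loader())
            self.loads += 1
        return self._rows


def read_customers():
    # Parse the CSV source where it lives; no separate load step beforehand.
    for row in csv.DictReader(io.StringIO(CUSTOMERS_CSV)):
        yield {"customer_id": int(row["customer_id"]), "name": row["name"]}


def read_clicks():
    # Parse the JSON-lines source where it lives.
    for line in CLICKS_JSONL.splitlines():
        yield json.loads(line)


def clicks_per_customer(customers, clicks):
    """Join the two sources at query time and count clicks per customer name."""
    names = {c["customer_id"]: c["name"] for c in customers.rows()}
    counts = {}
    for click in clicks.rows():
        name = names[click["customer_id"]]
        counts[name] = counts.get(name, 0) + 1
    return counts


customers = CachedSource(read_customers)
clicks = CachedSource(read_clicks)
first = clicks_per_customer(customers, clicks)   # reads both sources
second = clicks_per_customer(customers, clicks)  # served from the in-memory cache
print(first)
```

The second query touches neither source again, which is the latency benefit the slide points at; a real deployment would rely on the engine's own caching rather than a hand-written wrapper like this one.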

Analogy

Stream/cache & Play

Download & Play

Analogy

ETL & Query
[Diagram labels: Data Source A, Data Source B, ETL, Data Warehouse]

Stream/Cache + Query
[Diagram labels: Data Source A, Data Source B]

Top-3 Media Company

Data sources
• Traditional data warehouse: Customer transaction and profile data
• S3: Clickstream and historical logs
• Elasticsearch: User-submitted reviews and comments
• Kafka: Streaming online event data

Build Spark-based JIT Data Warehouse to perform real-time analytics

Unifying Data Processing

Spark: Unified Engine

Unified support for
• Batch
• Streaming
• ML/Graphs
• …

[Diagram labels: Spark SQL, Spark Streaming, MLlib, GraphX, SparkR, Core]

Easy to manage, learn, and combine functionality
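
One way to picture "combine functionality" is a single piece of logic reused by a batch job and a streaming job. The sketch below is plain Python rather than Spark code, and every name in it (`count_by_type`, `run_batch`, `run_streaming`, the sample `EVENTS`) is an assumption made up for illustration. The streaming path processes small chunks and keeps running totals, which loosely mirrors a micro-batch approach.

```python
from collections import Counter
from itertools import islice


def count_by_type(events):
    """Shared logic: count events per type. Used by both paths below."""
    return Counter(e["type"] for e in events)


def run_batch(events):
    """Batch: process the whole dataset at once."""
    return count_by_type(events)


def run_streaming(event_iter, batch_size):
    """Streaming, micro-batch style: apply the same logic to small chunks
    and fold each chunk's counts into running totals."""
    totals = Counter()
    it = iter(event_iter)
    while True:
        chunk = list(islice(it, batch_size))
        if not chunk:
            break
        totals.update(count_by_type(chunk))
        yield dict(totals)  # snapshot of the totals after this chunk


# Hypothetical sample events.
EVENTS = [
    {"type": "view"},
    {"type": "click"},
    {"type": "view"},
    {"type": "purchase"},
    {"type": "view"},
]

batch_result = dict(run_batch(EVENTS))
stream_snapshots = list(run_streaming(EVENTS, batch_size=2))
print(batch_result)
print(stream_snapshots[-1])
```

Because both paths call `count_by_type`, the final streaming snapshot matches the batch result, and there is only one definition of the business logic to learn and maintain.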

Analogy

First cellular phones

Unified device (smartphone)

Specialized devices

Better Games, Better GPS, Better Phone

Analogy

Batch processing

Unified system

Specialized systems

Real-time analytics

Instant fraud detection

Better Apps

Large On-line Service Company

Leverages
• Interactive query processing
• ML

and combines data from S3, Redshift, and HBase to provide
• data analytics for product management team
• advanced predictive analytics to deliver new services (e.g., customized inventory displays tailored to each user)

Demo

Demo Setting

[Diagram labels: Spark SQL, Spark Streaming, MLlib, Core, HDFS, Redshift]