How Spark is Enabling the New Wave of Converged Cloud Applications. Ankur Desai & Carol McDonald, December 2016. © 2016 MapR Technologies

How Spark is Enabling the New Wave of Converged Cloud Applications

Apr 16, 2017

Data & Analytics

Transcript
Page 1: How Spark is Enabling the New Wave of Converged Cloud Applications

How Spark is Enabling the New Wave of Converged Cloud Applications

Ankur Desai & Carol McDonald

December, 2016

Page 2: How Spark is Enabling the New Wave of Converged Cloud Applications

Today’s Presenters

Carol McDonald, Solutions Architect

Ankur Desai, Sr Mgr, Platform & Products

Page 3: How Spark is Enabling the New Wave of Converged Cloud Applications

Agenda

• Market Trends

• What’s Needed for Converged Streaming Applications

• Use Cases

• Demo of MapR Streams with Spark Streaming

Page 4: How Spark is Enabling the New Wave of Converged Cloud Applications

Flexible processing where change is the norm

Distributed processing across clusters, data centers, public & private cloud environments

Supports global apps that can scale arbitrarily

A Single Platform: On-Prem, In the Cloud, or InterCloud

Page 5: How Spark is Enabling the New Wave of Converged Cloud Applications

MapR on Microsoft Azure Marketplace

MapR and Microsoft enable enterprise grade big data applications in the Azure cloud

Simplified Deployment: Azure Marketplace’s automated deployment capabilities make big data easy

Unlimited Scale: Azure’s infrastructure can scale up to match any requirement and scale down for value

Seamless Interoperability: MapR integrates with other Azure services to enable customers to analyze any type of data to unlock the biggest insights

Product Alignment

Page 6: How Spark is Enabling the New Wave of Converged Cloud Applications

Digital transformation for better customer experience

Deliver self-service insights across the business

• MapR platform on the Azure cloud to modernize their infrastructure and sunset legacy systems.

• Faster exploration of data with Apache Drill mitigating need for schema development.

• Support for use cases such as customer 360, supply chain & image analysis

OBJECTIVES

CHALLENGES

SOLUTION

• Modernize analytics & improve speed of marketing campaigns

• Reduce cost of existing systems

• Existing technologies prohibiting effective & timely reporting and analysis

• Very long time to extract value from the data, leading to lots of Excel

Leading optical retail chain

Page 7: How Spark is Enabling the New Wave of Converged Cloud Applications

The Need For Streaming

Page 8: How Spark is Enabling the New Wave of Converged Cloud Applications

Decreasing Job Latencies

Hours Mins Secs Milli Secs

Data persistence on-disk

Data persistence in-memory

Page 9: How Spark is Enabling the New Wave of Converged Cloud Applications

Big Data is Continuously Generated One Event at a Time

{ "time": "6:01.103", "event": "RETWEET", "location": { "lat": 40.712784, "lon": -74.005941 } }

{ "time": "5:04.120", "severity": "CRITICAL", "msg": "Service down" }

{ "card_num": 1234, "merchant": "MERCH1", "amount": 50 }

Page 10: How Spark is Enabling the New Wave of Converged Cloud Applications

It was hot at 6:05 yesterday!

Why Stream Processing?

6:01 P.M.: 72°
6:02 P.M.: 75°
6:03 P.M.: 77°
6:04 P.M.: 85°
6:05 P.M.: 90°
6:06 P.M.: 85°
6:07 P.M.: 77°
6:08 P.M.: 75°

Analyze

90°

Batch processing may be too late for some events

Page 11: How Spark is Enabling the New Wave of Converged Cloud Applications

Why Stream Processing?

6:05 P.M.: 90°

Topic

Temperature

Turn on the air conditioning!

It’s becoming important to process events as they arrive

Stream
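
The contrast between the two "Why Stream Processing?" slides can be sketched in plain Java: each reading is handled the moment it arrives, so the action fires at 6:05 rather than in a batch report the next day. This is only an illustration under stated assumptions: the `TemperatureStream` class, the `Reading` type, and the 90-degree threshold are hypothetical names and values chosen for the example, and no messaging system is involved.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class TemperatureStream {
    // One event from the Temperature topic: a time label and a reading in degrees.
    static final class Reading {
        final String time;
        final int degrees;
        Reading(String time, int degrees) { this.time = time; this.degrees = degrees; }
    }

    // Assumed alert threshold; the slides only show 90 degrees as the hot reading.
    static final int THRESHOLD = 90;

    // Handle one event on arrival: return an action, or null when none is needed.
    static String onEvent(Reading r) {
        return r.degrees >= THRESHOLD ? r.time + ": turn on the air conditioning" : null;
    }

    // Feed events through the handler in arrival order and collect triggered actions.
    static List<String> process(List<Reading> events) {
        List<String> actions = new ArrayList<>();
        for (Reading r : events) {
            String action = onEvent(r);
            if (action != null) actions.add(action);
        }
        return actions;
    }

    // The readings shown on the slide, in arrival order.
    static List<Reading> slideReadings() {
        return Arrays.asList(
            new Reading("6:01 P.M.", 72), new Reading("6:02 P.M.", 75),
            new Reading("6:03 P.M.", 77), new Reading("6:04 P.M.", 85),
            new Reading("6:05 P.M.", 90), new Reading("6:06 P.M.", 85),
            new Reading("6:07 P.M.", 77), new Reading("6:08 P.M.", 75));
    }

    public static void main(String[] args) {
        for (String action : process(slideReadings())) System.out.println(action);
    }
}
```

With the slide's readings, only the 6:05 P.M. event crosses the assumed threshold.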

Page 12: How Spark is Enabling the New Wave of Converged Cloud Applications

Anatomy of Converged Streaming Applications

Page 13: How Spark is Enabling the New Wave of Converged Cloud Applications

The Trinity of Real-time

Real Time Producers

Global Messaging System (Topic 1, Topic 2)

Stream Processing

Persistence (Databases and Files)

Real Time Operational Analytics

Page 14: How Spark is Enabling the New Wave of Converged Cloud Applications

Creating the Streaming Pipeline

Data Sources → Stream Data (Topic) → Process Data → Store Data → Serve Data

Page 15: How Spark is Enabling the New Wave of Converged Cloud Applications

MapR Converged Data Platform

Applications: Open Source Engines & Tools; Commercial Engines & Applications; Custom Apps; Cloud and Managed Services; Search and Others

Data Processing; Unified Management and Monitoring

Enterprise-Grade Platform Services: Web-Scale Storage (MapR-FS); Database (MapR-DB); Event Streaming (MapR Streams)

High Availability; Real Time; Unified Security; Multi-tenancy; Disaster Recovery; Global Namespace

APIs: HDFS API; POSIX, NFS; HBase API; JSON API; Kafka API

Page 16: How Spark is Enabling the New Wave of Converged Cloud Applications

MapR Streams: Global Pub-sub Event Streaming System for Big Data

Producers publish billions of messages/sec to a topic in a stream.

Guaranteed, immediate delivery to all consumers.

Tie together geo-dispersed clusters. Worldwide.

Standard real-time API (Kafka). Integrates with Spark Streaming, Storm, Apex, and Flink

Direct data access (OJAI API) from analytics frameworks.

Diagram labels: Producers; Stream; Topic; Replication; Consumers; Remote sites and consumers; Batch analytics

Page 17: How Spark is Enabling the New Wave of Converged Cloud Applications

Scalable Event Streaming with MapR Streams

Topics are partitioned for throughput and scalability

Topic - Pressure: Partition 1, Partition 2, Partition 3

Topic - Temperature: Partition 1, Partition 2, Partition 3

Topic - Warning: Partition 1, Partition 2, Partition 3

Consumers
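
One common way to spread a topic's messages across partitions is to hash a message key, so that all messages for the same key land in the same partition and keep their order. The sketch below illustrates that idea only. It is an assumption for teaching purposes, not the partitioning scheme MapR Streams or Kafka actually uses; `TopicPartitioner` and `partitionFor` are hypothetical names.

```java
class TopicPartitioner {
    // Map a message key to one of numPartitions partitions (numbered from 0).
    // Math.floorMod keeps the result non-negative even when hashCode() is negative.
    static int partitionFor(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    public static void main(String[] args) {
        // Hypothetical pump ids used as message keys for a three-partition topic.
        String[] pumps = {"COHUTTA", "PUMP_B", "PUMP_C"};
        for (String pump : pumps) {
            System.out.println(pump + " -> partition " + partitionFor(pump, 3));
        }
    }
}
```

Because the mapping depends only on the key, adding consumers (one per partition) scales reads without reordering the events of any single pump.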

Page 18: How Spark is Enabling the New Wave of Converged Cloud Applications

MapR-DB is Designed to Scale

Fast Reads and Writes by Key

Data is automatically partitioned by Key Range

Diagram: three key ranges (xxxx to xxxx), each holding a table with columns Key, Col B, Col C
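
Key-range partitioning can be pictured as a sorted map from each range's start key to the partition that holds it; a lookup finds the greatest start key that is not larger than the row key. The following is a minimal sketch of that lookup, not MapR-DB's implementation. The class name, the range boundaries ("", "H", "Q"), and the partition names are all hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

class KeyRangeRouter {
    // Start key of each range -> name of the partition that stores that range.
    private final TreeMap<String, String> rangeStarts = new TreeMap<>();

    void addRange(String startKey, String partition) {
        rangeStarts.put(startKey, partition);
    }

    // floorEntry returns the greatest start key <= rowKey, i.e. the containing range.
    String partitionFor(String rowKey) {
        Map.Entry<String, String> entry = rangeStarts.floorEntry(rowKey);
        if (entry == null) {
            throw new IllegalArgumentException("no range covers key: " + rowKey);
        }
        return entry.getValue();
    }

    // Three illustrative ranges: ["", "H"), ["H", "Q"), ["Q", end).
    static KeyRangeRouter sample() {
        KeyRangeRouter router = new KeyRangeRouter();
        router.addRange("", "range-1");
        router.addRange("H", "range-2");
        router.addRange("Q", "range-3");
        return router;
    }

    public static void main(String[] args) {
        KeyRangeRouter router = sample();
        System.out.println(router.partitionFor("COHUTTA_3/10/14_1:01"));
    }
}
```

Since neighbouring keys fall in the same range, reads and writes by key touch one partition, and a scan over a key prefix stays within a small number of ranges.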

Page 19: How Spark is Enabling the New Wave of Converged Cloud Applications

Use Cases

Page 20: How Spark is Enabling the New Wave of Converged Cloud Applications

Customer 360 & Behavior Prediction

Website Click-Stream

Real Time/Offline ClickStream Analysis

Internal Data Sources

External Data Sources

• Prediction Modelling

• Attribution Modelling

• Cohort Analysis

• Customer Lifetime Value Analysis

• Attrition Modelling

• Response Modelling

• Churn Modelling

Eliminate latency due to data movement between clusters

Eliminate redundant storage with MapR Streams and lower the TCO

360 Degree Customer View

Customer Behavior Prediction

Better Conversion Rate and Lower Attrition $$$

Offline

Real Time

HA, DR, NFS, Snapshots, Data Protection

EDH/EDL

Topics

Support Tickets

DBMS

Email

CRM

Page 21: How Spark is Enabling the New Wave of Converged Cloud Applications

Prescriptive Analytics: IoT & Auto Manufacturing

GPS

Telematic Data

Telephone Truck Fleet

Data generated by cars is stored locally

Data Modelling/Secondary ETL: data is converted from a proprietary format to Parquet

• Identify emission patterns

• Route optimization

• Customer service requests

• How does throttling affect other factors such as fuel consumption, emissions, etc.?

• Image and video analysis

• Time series analysis for threshold breach

Topics

Page 22: How Spark is Enabling the New Wave of Converged Cloud Applications

Demo

Page 23: How Spark is Enabling the New Wave of Converged Cloud Applications

What if BP had detected problems before the oil hit the water?

1M samples/sec

High performance at scale is necessary!

Page 24: How Spark is Enabling the New Wave of Converged Cloud Applications

Use Case: Time Series Data

Sensor time-stamped data → Stream (Topic) → Spark Streaming (read) → Spark processing → Data for real-time monitoring

Page 25: How Spark is Enabling the New Wave of Converged Cloud Applications

Use Case: Time Series Data

Sensor time-stamped data

Stream

Topic

COHUTTA,3/10/14,1:01,10.27,1.73,881,1.56,85,1.94

COHUTTA,3/10/14,1:03,10.47,1.732,882,1.7,92,0.66

COHUTTA,3/10/14,1:02,9.67,1.731,882,0.52,87,1.79

Data: PumpId, Date, Time, pressure and flow measurements

Page 26: How Spark is Enabling the New Wave of Converged Cloud Applications

Schema

• All events stored; CF data could be set to expire data

• Filtered alerts put in CF alerts

• Daily summaries put in CF stats

Row key              | CF data: hz | … | CF data: psi | CF alerts: psi | … | CF stats: hz_avg | … | CF stats: psi_min
COHUTTA_3/10/14_1:01 | 10.37       |   | 84           | 0              |   |                  |   |
COHUTTA_3/10/14      |             |   |              |                |   | 10               |   | 0

The row key contains the oil pump name, the date, and a time stamp
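
The two row-key shapes in the schema (one per event, one per daily summary) can be built by joining the fields with underscores, as the sample keys suggest. The helper below is a sketch under that assumption; `RowKeys` and its method names are hypothetical and not taken from the demo code.

```java
class RowKeys {
    // Event row key: pump name, date, and time, e.g. COHUTTA_3/10/14_1:01
    static String eventKey(String pump, String date, String time) {
        return pump + "_" + date + "_" + time;
    }

    // Daily summary row key: pump name and date, e.g. COHUTTA_3/10/14
    static String dailyKey(String pump, String date) {
        return pump + "_" + date;
    }

    // Build the event key from a CSV line whose first three fields are pump, date, time.
    static String eventKeyFromCsv(String line) {
        String[] fields = line.split(",");
        return eventKey(fields[0], fields[1], fields[2]);
    }

    public static void main(String[] args) {
        String line = "COHUTTA,3/10/14,1:01,10.27,1.73,881,1.56,85,1.94";
        System.out.println(eventKeyFromCsv(line));
        System.out.println(dailyKey("COHUTTA", "3/10/14"));
    }
}
```

Because every event key for a pump and day starts with the daily key, a scan over that prefix returns the day's events together.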

Page 27: How Spark is Enabling the New Wave of Converged Cloud Applications


Page 28: How Spark is Enabling the New Wave of Converged Cloud Applications


Page 29: How Spark is Enabling the New Wave of Converged Cloud Applications

What Do We Need to Do?

Data Sources → Collect Data (Stream, Topic) → Process Data → Store Data → Serve Data

Page 30: How Spark is Enabling the New Wave of Converged Cloud Applications

Use Case Example Code

Sensor time-stamped data → Stream (Topic) → Spark Streaming (read) → Spark processing → Data for real-time monitoring

Page 31: How Spark is Enabling the New Wave of Converged Cloud Applications

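KafkaProducer

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

String topic = "/streams/pump:warning";
// 1. Configure KafkaProducer properties (the key serializer is added here; the Apache Kafka client requires both)
Properties properties = new Properties();
properties.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
properties.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
// 2. Create a KafkaProducer with the properties
KafkaProducer<String, String> kafkaProducer = new KafkaProducer<String, String>(properties);
String txt = "msg text";
// 3. Create a producer record with the topic and message
ProducerRecord<String, String> record = new ProducerRecord<String, String>(topic, txt);
// 4. Use the Kafka producer to send the record
kafkaProducer.send(record);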

Page 32: How Spark is Enabling the New Wave of Converged Cloud Applications


Page 33: How Spark is Enabling the New Wave of Converged Cloud Applications

Create a DStream

DStream: a sequence of RDDs representing a stream of data

val ssc = new StreamingContext(sparkConf, Seconds(5))
// create an input stream for a set of topics
val dStream = KafkaUtils.createDirectStream[String, String](ssc, kafkaParams, topicsSet)

dStream: batch time 0 to 1, batch time 1 to 2, batch time 2 to 3

Each batch is stored in memory as an RDD
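
The idea behind a DStream, a continuous stream cut into fixed-length batches, can be shown without Spark. The sketch below groups event timestamps into consecutive batch intervals; it illustrates the micro-batch concept only and is not how Spark Streaming is implemented. `MicroBatcher` is a hypothetical name, timestamps are assumed to be non-negative milliseconds, and the 5000 ms interval mirrors the `Seconds(5)` used on the slide.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class MicroBatcher {
    // Group event timestamps (milliseconds, non-negative) into consecutive batches.
    // Batch 0 covers [0, batchMillis), batch 1 covers [batchMillis, 2 * batchMillis), and so on.
    static Map<Long, List<Long>> batch(List<Long> timestamps, long batchMillis) {
        Map<Long, List<Long>> batches = new TreeMap<>();
        for (long ts : timestamps) {
            long index = ts / batchMillis;
            batches.computeIfAbsent(index, k -> new ArrayList<>()).add(ts);
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Long> arrivals = Arrays.asList(500L, 4900L, 5000L, 12000L);
        System.out.println(batch(arrivals, 5000L));
    }
}
```

Each map entry plays the role of one RDD in the sequence: a bounded collection that batch-style operations can process as a unit.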

Page 34: How Spark is Enabling the New Wave of Converged Cloud Applications

Message Data to Sensor Object

case class Sensor(resid: String, date: String, time: String, hz: Double, disp: Double,
  flo: Double, sedPPM: Double, psi: Double, chlPPM: Double)

// Parse CSV Strings into Sensor objects
def parseSensor(str: String): Sensor = {
  val p = str.split(",")
  Sensor(p(0), p(1), p(2), p(3).toDouble, p(4).toDouble, p(5).toDouble,
    p(6).toDouble, p(7).toDouble, p(8).toDouble)
}

Page 35: How Spark is Enabling the New Wave of Converged Cloud Applications

Process DStream

// Parse message values into Sensor objects
val sensorDStream = dStream.map(_._2).map(parseSensor)

dStream RDDs: batch time 0 to 1, batch time 1 to 2, batch time 2 to 3

map (applied to each batch)

sensorDStream RDDs

New RDDs created for every batch

Page 36: How Spark is Enabling the New Wave of Converged Cloud Applications

DataFrame and SQL Operations

// for each RDD
sensorDStream.foreachRDD { rdd =>
  val sqlContext = SQLContext.getOrCreate(rdd.sparkContext)
  import sqlContext.implicits._ // needed for rdd.toDF()
  // convert RDD to DataFrame
  rdd.toDF().registerTempTable("sensor")
  // get the avg, max, and min for pump values
  val res = sqlContext.sql(
    """SELECT resid, date,
         max(hz) as maxhz, min(hz) as minhz, avg(hz) as avghz,
         max(disp) as maxdisp, min(disp) as mindisp, avg(disp) as avgdisp,
         max(flo) as maxflo, min(flo) as minflo, avg(flo) as avgflo,
         max(psi) as maxpsi, min(psi) as minpsi, avg(psi) as avgpsi
       FROM sensor GROUP BY resid, date""")
  res.show()
}
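
To make the query's result concrete, the same GROUP BY resid, date aggregation can be reproduced for a single column with plain Java collections. This is a sketch for checking intuition, not part of the demo: `PumpStats` is a hypothetical class, it covers only the hz column (CSV index 3), and the input is the three sample rows shown earlier in the deck.

```java
import java.util.Arrays;
import java.util.DoubleSummaryStatistics;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class PumpStats {
    // Group CSV sensor lines by "resid,date" and summarize the hz column (index 3).
    static Map<String, DoubleSummaryStatistics> hzStats(List<String> lines) {
        return lines.stream()
            .map(line -> line.split(","))
            .collect(Collectors.groupingBy(
                p -> p[0] + "," + p[1],
                Collectors.summarizingDouble(p -> Double.parseDouble(p[3]))));
    }

    // The three sample rows from the sensor data slide.
    static List<String> sampleLines() {
        return Arrays.asList(
            "COHUTTA,3/10/14,1:01,10.27,1.73,881,1.56,85,1.94",
            "COHUTTA,3/10/14,1:03,10.47,1.732,882,1.7,92,0.66",
            "COHUTTA,3/10/14,1:02,9.67,1.731,882,0.52,87,1.79");
    }

    public static void main(String[] args) {
        DoubleSummaryStatistics s = hzStats(sampleLines()).get("COHUTTA,3/10/14");
        System.out.println("maxhz=" + s.getMax() + " minhz=" + s.getMin() + " avghz=" + s.getAverage());
    }
}
```

All three rows share one pump and one date, so they collapse into a single group, just as the SQL would return a single row for them.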

Page 37: How Spark is Enabling the New Wave of Converged Cloud Applications

Streaming Application Output

Page 38: How Spark is Enabling the New Wave of Converged Cloud Applications

Save to HBase

rdd.map(Sensor.convertToPut).saveAsHadoopDataset(jobConfig)

For each batch (batch time 0 to 1, 1 to 2, 2 to 3): linesRDD DStream → map → sensorRDD DStream → save → Put objects written to HBase

Output operation: persist data to external storage

Page 39: How Spark is Enabling the New Wave of Converged Cloud Applications

Start Receiving Data

sensorDStream.foreachRDD { rdd =>
  . . .
}
// Start the computation
ssc.start()
// Wait for the computation to terminate
ssc.awaitTermination()

Page 40: How Spark is Enabling the New Wave of Converged Cloud Applications

Building a Complete Data Architecture

Sources/Apps

Stream Processing

Bulk Processing

MapR Converged Data Platform: MapR File System (MapR-FS), MapR Database (MapR-DB), MapR Streams

Page 41: How Spark is Enabling the New Wave of Converged Cloud Applications

Page 42: How Spark is Enabling the New Wave of Converged Cloud Applications

Azure and MapR Resources – 3 steps to get started

• Azure Overview: https://www.mapr.com/partners/partner/microsoft-azure-microsofts-cloud-computing-platform-moving-faster-achieving-more

• 7 Steps to Deploy the MapR Sandbox on Azure: https://www.mapr.com/blog/7-steps-deploy-mapr-sandbox-microsoft-azure

• Azure Test Drive: http://mapr.testdrivelabs.com/ (subject to change)

Page 43: How Spark is Enabling the New Wave of Converged Cloud Applications

Q & A

Engage with us!

1. Read the explanation and download the code:
- https://www.mapr.com/blog/fast-scalable-streaming-applications-mapr-streams-spark-streaming-and-mapr-db
- https://www.mapr.com/blog/spark-streaming-hbase

2. Get Started: MapR Converged Data Platform https://www.mapr.com/get-started-with-mapr

3. Get Answers: MapR Converge Community https://community.mapr.com/community/answers

4. Get Trained: MapR On-Demand Training https://learn.mapr.com