Spark and MongoDB

Apr 16, 2017

Norberto Leite
Transcript
Page 1: Spark and MongoDB
Page 2: Spark and MongoDB

Spark in the Leaf

Page 3: Spark and MongoDB
Page 4: Spark and MongoDB
Page 5: Spark and MongoDB

"BigDataSpain (2014) is great, hope I can make it next year too"

A wish-making conference!

Page 6: Spark and MongoDB

"BigDataSpain (2015) is great, hope I can win la Loteria"

Page 7: Spark and MongoDB

Agenda
•  Spark + MongoDB
•  Connectors
•  Use Cases
•  Demo

Page 8: Spark and MongoDB

By now, you should have heard about MongoDB

Unless you've been living under a rock for the last few years!

Page 9: Spark and MongoDB

MongoDB

GENERAL PURPOSE
DOCUMENT DATABASE
OPEN-SOURCE

Page 10: Spark and MongoDB

MongoDB is Fully Featured

Page 11: Spark and MongoDB

Apache Spark

Page 12: Spark and MongoDB

Spark is the Taylor Swift of Big Data

Page 13: Spark and MongoDB

Agenda
•  Spark Taylor Swift + MongoDB
•  Connectors
•  Use Cases
•  Demo

Page 14: Spark and MongoDB

Apache Spark Taylor Swift

Page 15: Spark and MongoDB

Spark Stack

Spark SQL, Spark Streaming, MLlib and GraphX, all built on Apache Spark:

•  Spark SQL: seamless integration with SQL using the DataFrame API. Also supports Hive SQL.
•  Spark Streaming: fast feed-data processing API. Designed for fault tolerance; bridges streaming with batch processing.
•  MLlib: Spark's bag of tricks for machine learning algorithms.
•  GraphX: Spark's graph library.

Page 16: Spark and MongoDB

Page 17: Spark and MongoDB

Page 18: Spark and MongoDB

Page 19: Spark and MongoDB

Spark + MongoDB

Page 20: Spark and MongoDB

Data Management
•  Offline Processing: Analytics, Data Warehousing
•  OLTP: Applications, Fine-grained operations

Page 21: Spark and MongoDB

Delivering User Relevancy
•  Integrate data from many sources
•  Fast-cycle analytics
•  Real-time
•  Reliable

Page 22: Spark and MongoDB

Fraud Detection

I'm so in love!

Page 23: Spark and MongoDB

Fraud Detection

I'm so in love!

Me, too<3

Now send me your CC number

?

Ok, XXXX-123-zzz

$$$

Page 24: Spark and MongoDB

Fraud Detection

Page 25: Spark and MongoDB

Workloads
•  Chat App: Login, User Profile, Contacts, Messages, …
•  Spark: Fraud Detection, Segmentation, Recommendations
•  HDFS: Archiving, Data Crunching

Page 26: Spark and MongoDB

•  Wearable Devices
•  Embedded Systems
•  Internet of Things
•  Embedded medical devices

Page 27: Spark and MongoDB

•  Access complete patient history
•  Avoid conflicting prescriptions
•  Clinical trials

Page 28: Spark and MongoDB

High Speed Document Design

Page 29: Spark and MongoDB

Time Series

> db.ticks.find()
{
  _id: 'MSFT_12',
  type: 'Open',
  date: ISODate("2015-07-12 10:00"),
  volume: 1699342,
  minutes: { "0": 12.9, "1": 14.4, ... "59": 15.8 }
}

Callouts: Resource (_id), Type (type), When (date), Series (minutes)
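The time-series document packs a whole run of per-minute values into a single document instead of storing one document per tick. A minimal sketch of that bucketing idea follows, using plain Scala collections rather than a MongoDB driver. The `Tick` type, the `TickBuckets` object and the "SYMBOL_HOUR" key scheme are assumptions made for illustration and do not come from the slides.

```scala
// Hypothetical model of one raw price tick; not part of any MongoDB driver API.
final case class Tick(symbol: String, hour: Int, minute: Int, price: Double)

object TickBuckets {
  // Build one bucket per (symbol, hour). The "SYMBOL_HOUR" key scheme is an
  // assumption for illustration; the slide does not say how _id is derived.
  def bucket(ticks: Seq[Tick]): Map[String, Map[String, Double]] =
    ticks
      .groupBy(t => s"${t.symbol}_${t.hour}")
      .map { case (id, group) =>
        // minute -> price, mirroring the `minutes` sub-document
        id -> group.map(t => t.minute.toString -> t.price).toMap
      }
}
```

Each resulting map entry corresponds to one bucket document, so reading a full hour of data would touch one document rather than sixty.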

Page 30: Spark and MongoDB

http://cdn.theatlantic.com/static/infocus/ngt051713/n10_00203194.jpg

WiredTiger

Page 31: Spark and MongoDB

Very High Speed

Page 32: Spark and MongoDB

> mongod --storageEngine wiredTiger

Page 33: Spark and MongoDB

> mongod

On the upcoming 3.2, where WiredTiger is the default storage engine

Page 34: Spark and MongoDB

MongoDB Storage Engines

Content Repo, IoT Sensor Backend, Ad Service, Customer Analytics, Archive

MongoDB Query Language (MQL) + Native Drivers

MongoDB Document Data Model

MMAP V1, WT, In-Memory, ?, ?

Supported in MongoDB 3.0

Experimental

Future Possible Storage Engines

Management

Security

Page 35: Spark and MongoDB

Spark Streaming

Page 36: Spark and MongoDB

Spark Streaming

Spark Twitter Feed

Page 37: Spark and MongoDB

Spark Streaming

Twitter Feed

{
  "statuses": [
    {
      "coordinates": null,
      "favorited": false,
      "truncated": false,
      "created_at": "Mon Sep 24 03:35:21 +0000 2012",
      "id_str": "250075927172759552",
      "entities": {
        "urls": [],
        "hashtags": [
          { "text": "freebandnames", "indices": [ 20, 34 ] }
        ],
        "user_mentions": []
      }
    }
  ]
}

Page 38: Spark and MongoDB

Spark Streaming

Spark

{
  "statuses": [
    {
      "coordinates": null,
      "favorited": false,
      "truncated": false,
      "created_at": "Mon Sep 24 03:35:21 +0000 2012",
      "id_str": "250075927172759552",
      "entities": {
        "urls": [],
        "hashtags": [
          { "text": "freebandnames", "indices": [ 20, 34 ] }
        ],
        "user_mentions": []
      }
    }
  ]
}

{ "time": "Mon Sep 24 03:35", "freebandnames": 1}

(the same tweet, three more times)

{ "time": "Mon Sep 24 03:35", "freebandnames": 4}
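The slide shows tweets going in and per-minute hashtag counts coming out. A minimal sketch of that aggregation follows, with plain Scala collections standing in for a Spark Streaming micro-batch. The `Tweet` type and the `HashtagCounts` object are hypothetical, and the minute is cut out of `created_at` on the assumption that the timestamp always has the fixed-width layout seen in the sample.

```scala
// Hypothetical, simplified tweet: only the fields used on the slide.
final case class Tweet(createdAt: String, hashtags: Seq[String])

object HashtagCounts {
  // Truncate "Mon Sep 24 03:35:21 +0000 2012" to its minute, "Mon Sep 24 03:35".
  // Assumes the fixed-width timestamp layout of the sample tweet.
  def minuteOf(createdAt: String): String = createdAt.take(16)

  // For every minute, count how often each hashtag occurred.
  def countByMinute(tweets: Seq[Tweet]): Map[String, Map[String, Int]] =
    tweets
      .flatMap(t => t.hashtags.map(h => (minuteOf(t.createdAt), h)))
      .groupBy(_._1)
      .map { case (minute, pairs) =>
        minute -> pairs.groupBy(_._2).map { case (tag, ps) => tag -> ps.size }
      }
}
```

In an actual streaming job the same flatMap and count steps would presumably run on each batch of the stream, with the resulting count documents written back to MongoDB.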

Page 39: Spark and MongoDB

Capped Collection

Spark Streaming

{ "time": "Mon Sep 24 03:35", "freebandnames": 4}

{ "time": "Mon Sep 24 03:40", "bigdataspain": 400}

{ "time": "Mon Sep 24 03:50", "bigdataspain": 7556}

{ "time": "Mon Sep 24 03:50", "itshappending": 100}

Tailable Cursor
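A capped collection keeps documents in insertion order and discards the oldest ones once it is full, which is what makes it a natural landing place for a stream of counts. The toy model below illustrates only that behaviour. The `Capped` type is hypothetical, and it bounds the collection by document count for simplicity, whereas real capped collections are bounded by size in bytes with an optional document limit.

```scala
// Toy in-memory stand-in for a capped collection; not a MongoDB API.
final case class Capped[A](maxDocs: Int, docs: Vector[A] = Vector.empty[A]) {
  // Insert in arrival order; once full, the oldest documents fall off.
  def insert(doc: A): Capped[A] =
    copy(docs = (docs :+ doc).takeRight(maxDocs))
}
```

A tailable cursor would then play the role of a reader that keeps following the end of `docs` as new documents arrive.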

Page 40: Spark and MongoDB

Spark SQL

Page 41: Spark and MongoDB

MongoDB Hadoop Connector

Spark

HDFS HDFS HDFS

MongoDB Hadoop Connector

MongoDB Shard

Page 42: Spark and MongoDB

MongoDB Hadoop Connector

Spark

HDFS HDFS HDFS

MongoDB Hadoop Connector

MongoDB Shard

YARN

Page 43: Spark and MongoDB

MongoDB Hadoop Connector

Positive
•  Battle tested
•  Integrated with existing Hadoop components
•  Supports Hive and Pig

Not So Good
•  Not the fastest thing
•  Not dedicated to Spark
•  Dependent on HDFS

http://docs.mongodb.org/ecosystem/tutorial/getting-started-with-hadoop/

Page 44: Spark and MongoDB

Stratio Spark-MongoDB http://spark-packages.org/?q=mongodb

Page 45: Spark and MongoDB

Stratio Spark-MongoDB

https://github.com/Stratio/spark-mongodb

Spark

HDFS HDFS HDFS

MongoDB Shard

Stratio Spark-MongoDB

Page 46: Spark and MongoDB

Stratio Spark-MongoDB

val mcInputBuilder = MongodbConfigBuilder(Map(
  Host -> List("localhost:27017"),
  Database -> "marketdata",
  Collection -> "minbars",
  SamplingRatio -> 1.0,
  WriteConcern -> MongodbWriteConcern.Normal))

val readConfig = mcInputBuilder.build()

Callouts: Database, Collection, SamplingRatio, WriteConcern

Page 47: Spark and MongoDB

Stratio Spark-MongoDB

val sqlContext = new HiveContext(sc)
val dfOneMin = sqlContext.fromMongoDB(readConfig)

Page 48: Spark and MongoDB

Stratio Spark-MongoDB

val dfFiveMinForMonth = sqlContext.sql("""
  SELECT m.Symbol, m.OpenTime as Timestamp, m.Open, m.High, m.Low, m.Close
  FROM...
  FROM minbars)
  as m
  WHERE unix_timestamp(m.CloseTime, 'yyyy-MM-dd HH:mm') - unix_timestamp(m.OpenTime, 'yyyy-MM-dd HH:mm') = 60*4
""")
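The query keeps only rows whose open and close times are four minutes apart, that is, complete five-minute bars built from one-minute bars. Since the middle of the query is elided on the slide, the sketch below does not try to reproduce it. It shows one common way to roll one-minute bars up into five-minute bars with plain Scala collections. The `Bar` type, the integer `openMinute` field and the `FiveMinuteBars` object are assumptions made for illustration.

```scala
// Hypothetical one-minute bar; openMinute is a non-negative minute index.
final case class Bar(symbol: String, openMinute: Int,
                     open: Double, high: Double, low: Double, close: Double)

object FiveMinuteBars {
  // Roll one-minute bars up into five-minute bars, keeping complete windows only.
  // Assumes at most one bar per symbol and minute.
  def rollUp(bars: Seq[Bar]): Seq[Bar] =
    bars
      .groupBy(b => (b.symbol, b.openMinute / 5))
      .toSeq
      .collect { case ((symbol, window), group) if group.size == 5 =>
        val sorted = group.sortBy(_.openMinute)
        Bar(symbol, window * 5,
            sorted.head.open,       // open of the first minute
            sorted.map(_.high).max, // highest high in the window
            sorted.map(_.low).min,  // lowest low in the window
            sorted.last.close)      // close of the last minute
      }
      .sortBy(b => (b.symbol, b.openMinute))
}
```

The `group.size == 5` guard plays a role similar to the `WHERE` clause in the query: windows with missing minutes are dropped rather than reported as partial bars.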

Page 49: Spark and MongoDB

Page 50: Spark and MongoDB

DC West

DC West

DC West

Stratio Spark-MongoDB

https://github.com/Stratio/spark-mongodb

Spark

MongoDB Shard

Spark

Spark

Page 51: Spark and MongoDB

Demo

Page 52: Spark and MongoDB

Demo

Spark Stratio Spark-MongoDB

Page 53: Spark and MongoDB

Feeling powerful ???

Page 54: Spark and MongoDB

Future

Page 55: Spark and MongoDB

What to expect

•  We are working on a dedicated Spark Connector for MongoDB
•  Stratio Connector is great but:
    – Some operations are actually faster if performed using the Aggregation Framework
•  Better integration with the upcoming 3.2 Async Java Driver
    – Especially for Spark Streaming support

Page 56: Spark and MongoDB

MongoDB Days 2015: 5 November 2015, London

https://www.mongodb.com/events/mongodb-days-uk

Page 57: Spark and MongoDB

Join the Team

•  Engineering
•  Sales & Account Management
•  Finance & People Operations
•  Pre-Sales Engineering
•  Marketing

View all jobs and apply: http://grnh.se/pj10su

Page 58: Spark and MongoDB

Obrigado!

Norberto Leite, Technical Evangelist, @nleite

Page 59: Spark and MongoDB