Distributed, fault-tolerant, transactional Real-Time Integration: MongoDB and SQL Databases
Eugene Dvorkin, Architect, WebMD

Real-Time Integration Between MongoDB and SQL Databases

Jan 15, 2015

Transcript
Page 1: Real-Time Integration Between MongoDB and SQL Databases

1

Distributed, fault-tolerant, transactional

Real-Time Integration: MongoDB and SQL Databases

Eugene Dvorkin, Architect, WebMD

Page 2: Real-Time Integration Between MongoDB and SQL Databases

2

WebMD: A lot of data; a lot of traffic

~900 million page views a month

~100 million unique visitors a month

Page 3: Real-Time Integration Between MongoDB and SQL Databases

3

How We Use MongoDB

User Activity

Page 4: Real-Time Integration Between MongoDB and SQL Databases

4

Why Move Data to RDBMS?

Preserve existing investment in BI and data warehouse

To use an analytical database such as Vertica

To use SQL

Page 5: Real-Time Integration Between MongoDB and SQL Databases

5

Why Move Data In Real-time?

Batch process is slow

No ad-hoc queries

No real-time reports

Page 6: Real-Time Integration Between MongoDB and SQL Databases

6

Challenge in moving data

Transform documents to a relational structure

Insert into RDBMS at a high rate

Page 7: Real-Time Integration Between MongoDB and SQL Databases

7

Challenge in moving data

Scale easily as data volume and velocity increase

Page 8: Real-Time Integration Between MongoDB and SQL Databases

8

Our Solution to move data in Real-time: Storm

Storm: an open source, distributed, real-time computation system.

Developed by Nathan Marz at BackType, which was acquired by Twitter

Page 9: Real-Time Integration Between MongoDB and SQL Databases

9

Our Solution to move data in Real-time: Storm

Hadoop vs. Storm (comparison figure not preserved)

Page 10: Real-Time Integration Between MongoDB and SQL Databases

10

Why STORM?

JVM-based framework

Guaranteed data processing

Supports development in multiple languages

Scalable and transactional

Page 11: Real-Time Integration Between MongoDB and SQL Databases

11

Overview of Storm cluster

Master node (Nimbus)

Cluster coordination (ZooKeeper)

Supervisor nodes run worker processes

Page 12: Real-Time Integration Between MongoDB and SQL Databases

12

Storm Abstractions

Tuples, Streams, Spouts, Bolts and Topologies

Page 13: Real-Time Integration Between MongoDB and SQL Databases

13

Tuples

("ns:events", "email:[email protected]")

Ordered list of elements

Page 14: Real-Time Integration Between MongoDB and SQL Databases

14

Stream

Unbounded sequence of tuples

Example: Stream of messages from message queue

Page 15: Real-Time Integration Between MongoDB and SQL Databases

15

Spout

Source of streams

Reads from a stream of data: queues, web logs, API calls, MongoDB oplog

Emits documents as tuples

Page 16: Real-Time Integration Between MongoDB and SQL Databases

16

Bolts

Process tuples and create new streams

Page 17: Real-Time Integration Between MongoDB and SQL Databases

17

Bolts

Process tuples and create new streams

Apply functions / transforms
Calculate and aggregate data (word count!)
Access DB, API, etc.
Filter data
Map/Reduce
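The bolt roles listed above (filter, transform, emit new streams) can be illustrated without Storm itself. The following is a toy, single-threaded model in plain Java and not Storm's API: `ToyPipeline`, `ToyBolt`, and `run` are hypothetical names invented for this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Toy model of bolts: each bolt consumes one tuple and may emit zero or more tuples. */
class ToyPipeline {

    /** Hypothetical stand-in for a bolt; this is NOT Storm's bolt interface. */
    interface ToyBolt {
        void execute(List<Object> tuple, Consumer<List<Object>> emit);
    }

    /** Feeds every input tuple through the bolts in order and returns the final stream. */
    static List<List<Object>> run(List<List<Object>> input, List<ToyBolt> bolts) {
        List<List<Object>> current = input;
        for (ToyBolt bolt : bolts) {
            List<List<Object>> next = new ArrayList<>();
            for (List<Object> tuple : current) {
                bolt.execute(tuple, next::add);
            }
            current = next;
        }
        return current;
    }

    public static void main(String[] args) {
        // Filter bolt: pass through only insert operations ("i").
        ToyBolt insertsOnly = (t, emit) -> {
            if ("i".equals(t.get(1))) {
                emit.accept(t);
            }
        };
        // Transform bolt: upper-case the namespace field.
        ToyBolt upperNs = (t, emit) -> {
            List<Object> changed = new ArrayList<>(t);
            changed.set(0, t.get(0).toString().toUpperCase());
            emit.accept(changed);
        };

        List<List<Object>> input = new ArrayList<>();
        input.add(List.of("people", "i"));
        input.add(List.of("people", "d"));

        System.out.println(run(input, List.of(insertsOnly, upperNs)));
    }
}
```

In a real topology each bolt runs as many parallel tasks across the cluster; the sketch only shows the data flow between stages.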

Page 18: Real-Time Integration Between MongoDB and SQL Databases

18

Topology

Page 19: Real-Time Integration Between MongoDB and SQL Databases

19

Topology

Storm is transforming and moving data

Page 20: Real-Time Integration Between MongoDB and SQL Databases

20

MongoDB

How To Read All Incoming Data from MongoDB?

Page 21: Real-Time Integration Between MongoDB and SQL Databases

21

MongoDB

How To Read All Incoming Data from MongoDB?

Use MongoDB OpLog

Page 22: Real-Time Integration Between MongoDB and SQL Databases

22

What is OpLog?

Replication mechanism in MongoDB

It is a capped collection

Page 23: Real-Time Integration Between MongoDB and SQL Databases

23

Spout: reading from OpLog

Located in the local database, in the oplog.rs collection

Page 24: Real-Time Integration Between MongoDB and SQL Databases

24

Spout: reading from OpLog

Operations: Insert, Update, Delete

Page 25: Real-Time Integration Between MongoDB and SQL Databases

25

Spout: reading from OpLog

Namespace (ns): database and collection name; the collection maps to a table

Page 26: Real-Time Integration Between MongoDB and SQL Databases

26

Spout: reading from OpLog

Data object:
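The three oplog fields covered on these slides (operation `op`, namespace `ns`, data object `o`) can be modelled as a small value class. This is a minimal sketch in plain Java, with the document represented as a `Map` instead of a driver type; the class name `OplogEntry` and its members are hypothetical.

```java
import java.util.Map;

/** Minimal model of the oplog fields used in this talk: op, ns and o. */
class OplogEntry {
    enum Operation { INSERT, UPDATE, DELETE, OTHER }

    final Operation operation;
    final String database;
    final String collection;
    final Map<String, Object> document;

    OplogEntry(String op, String ns, Map<String, Object> document) {
        this.operation = toOperation(op);
        // ns has the form "<database>.<collection>"; only the first dot separates
        // the two, because collection names may themselves contain dots.
        int dot = ns.indexOf('.');
        this.database = dot >= 0 ? ns.substring(0, dot) : ns;
        this.collection = dot >= 0 ? ns.substring(dot + 1) : "";
        this.document = document;
    }

    /** Maps the one-letter op code to an operation; anything else is OTHER. */
    private static Operation toOperation(String op) {
        switch (op) {
            case "i": return Operation.INSERT;
            case "u": return Operation.UPDATE;
            case "d": return Operation.DELETE;
            default:  return Operation.OTHER;
        }
    }
}
```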

Page 27: Real-Time Integration Between MongoDB and SQL Databases

27

Sharded cluster

Page 28: Real-Time Integration Between MongoDB and SQL Databases

28

Automatic discovery of sharded cluster

Page 29: Real-Time Integration Between MongoDB and SQL Databases

29

Example: Shard vs Replica set discovery

Page 30: Real-Time Integration Between MongoDB and SQL Databases

30

Example: Shard discovery
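The code from this slide is not preserved in the transcript. One plausible approach, stated here as an assumption, is to read the `config.shards` collection through a mongos router: each entry's `host` value has the form `replicaSetName/host1:port,host2:port`, and one oplog reader can then be started per shard. The sketch below only parses that string; `ShardHosts` is a hypothetical helper name.

```java
import java.util.Arrays;
import java.util.List;

/** Parses the "host" value of a config.shards entry, e.g. "rs0/db1:27017,db2:27017". */
class ShardHosts {
    final String replicaSet;   // null when the value carries no replica set name
    final List<String> hosts;  // "host:port" entries

    ShardHosts(String replicaSet, List<String> hosts) {
        this.replicaSet = replicaSet;
        this.hosts = hosts;
    }

    static ShardHosts parse(String hostField) {
        int slash = hostField.indexOf('/');
        String replicaSet = slash >= 0 ? hostField.substring(0, slash) : null;
        String hostList = slash >= 0 ? hostField.substring(slash + 1) : hostField;
        return new ShardHosts(replicaSet, Arrays.asList(hostList.split(",")));
    }
}
```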

Page 31: Real-Time Integration Between MongoDB and SQL Databases

31

Spout: Reading data from OpLog

How to Read data continuously from OpLog?

Page 32: Real-Time Integration Between MongoDB and SQL Databases

32

Spout: Reading data from OpLog

How to Read data continuously from OpLog?

Use Tailable Cursor

Page 33: Real-Time Integration Between MongoDB and SQL Databases

33

Example: Tailable cursor - like tail -f

Page 34: Real-Time Integration Between MongoDB and SQL Databases

34

Manage timestamps

Use the ts field (the timestamp in each oplog entry) to track processed records

If the system restarts, resume from the last recorded ts
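The slides do not show how the timestamp is recorded. As a minimal sketch, assuming a local file is an acceptable checkpoint store, the two parts of a BSON timestamp (seconds and increment) can be saved after each processed entry and read back on startup. `OplogCheckpoint` and its methods are hypothetical names.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Persists the last processed oplog timestamp so a restarted reader can resume. */
class OplogCheckpoint {
    private final Path file;

    OplogCheckpoint(Path file) {
        this.file = file;
    }

    /** Stores seconds and increment, the two parts of a BSON timestamp. */
    void save(int seconds, int increment) throws IOException {
        // Write to a temporary file first, then swap it in, so a failure during
        // the write does not damage the previous checkpoint.
        Path tmp = file.resolveSibling(file.getFileName() + ".tmp");
        Files.write(tmp, (seconds + ":" + increment).getBytes(StandardCharsets.UTF_8));
        Files.move(tmp, file, StandardCopyOption.REPLACE_EXISTING);
    }

    /** Returns {seconds, increment}, or null when no checkpoint exists yet. */
    int[] load() throws IOException {
        if (!Files.exists(file)) {
            return null;
        }
        String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8).trim();
        String[] parts = text.split(":");
        return new int[] { Integer.parseInt(parts[0]), Integer.parseInt(parts[1]) };
    }
}
```

On restart, the reader would query the oplog for entries whose ts is greater than the loaded value. Saving after processing rather than before means an entry can be replayed after a crash, but not skipped.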

Page 35: Real-Time Integration Between MongoDB and SQL Databases

35

Spout: reading from OpLog

Page 36: Real-Time Integration Between MongoDB and SQL Databases

36

SPOUT – Code Example

Page 37: Real-Time Integration Between MongoDB and SQL Databases

37

TOPOLOGY

Page 38: Real-Time Integration Between MongoDB and SQL Databases

38

Working With Embedded Arrays

An embedded array represents a one-to-many relationship in an RDBMS

Page 39: Real-Time Integration Between MongoDB and SQL Databases

39

Example: Working with embedded arrays

Page 40: Real-Time Integration Between MongoDB and SQL Databases

40

Example: Working with embedded arrays

{_id: 1, ns: "person_awards", o: {award: 'National Medal of Science', year: 1975, by: 'National Science Foundation'}}

{_id: 1, ns: "person_awards", o: {award: 'Turing Award', year: 1977, by: 'ACM'}}
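The two records above are what an array-extraction step produces: one child record per array element, carrying the parent's `_id`. A minimal sketch of that step in plain Java follows, using `Map` and `List` in place of the driver's `BasicDBObject` and `BasicDBList`. The class and method names are hypothetical, and the child namespace convention `<parent>_<field>` is inferred from the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ArrayExtractor {
    /**
     * For each embedded array in doc, returns one child record per element,
     * tagged with the parent's _id and a child namespace "<ns>_<field>".
     */
    static List<Map<String, Object>> extractArrays(String ns, Map<String, Object> doc) {
        List<Map<String, Object>> children = new ArrayList<>();
        for (Map.Entry<String, Object> field : doc.entrySet()) {
            if (!(field.getValue() instanceof List)) {
                continue; // scalar fields and sub-documents stay with the parent
            }
            for (Object element : (List<?>) field.getValue()) {
                Map<String, Object> child = new LinkedHashMap<>();
                child.put("_id", doc.get("_id"));            // key back to the parent row
                child.put("ns", ns + "_" + field.getKey());  // e.g. person_awards
                child.put("o", element);
                children.add(child);
            }
        }
        return children;
    }
}
```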

Page 41: Real-Time Integration Between MongoDB and SQL Databases

41

Example: Working with embedded arrays

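public void execute(Tuple tuple) {
    .........
    // an embedded array is split out into its own child documents
    if (field instanceof BasicDBList) {
        BasicDBObject arrayElement = processArray(field);
        ......
        // emit on the "documents" stream, anchored to the input tuple;
        // emit() takes the output values as a list, hence the Values wrapper
        outputCollector.emit("documents", tuple, new Values(arrayElement));
    }
}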

Page 42: Real-Time Integration Between MongoDB and SQL Databases

42

Parse documents with Bolt

Page 43: Real-Time Integration Between MongoDB and SQL Databases

43

Parse documents with Bolt

{"ns": "people", "op": "i", o: {_id: 1, name: {first: 'John', last: 'Backus'}, birth: 'Dec 03, 1924'}}

["ns": "people", "op": "i", ["id": 1, "name_first": "John", "name_last": "Backus", "birth": "Dec 03, 1924"]]
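The transformation on this slide, from a nested document to flat column names such as `name_first`, can be sketched as a short recursive function. This is a minimal illustration in plain Java that uses `Map` in place of `BasicDBObject`; it is not the talk's `flattenDocument` implementation, and the names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class DocumentFlattener {
    /** Flattens nested sub-documents into one level, joining key paths with "_". */
    static Map<String, Object> flatten(Map<String, Object> doc) {
        Map<String, Object> out = new LinkedHashMap<>(); // keeps column order stable
        flattenInto("", doc, out);
        return out;
    }

    private static void flattenInto(String prefix, Map<?, ?> doc, Map<String, Object> out) {
        for (Map.Entry<?, ?> e : doc.entrySet()) {
            String column = prefix.isEmpty() ? e.getKey().toString() : prefix + "_" + e.getKey();
            if (e.getValue() instanceof Map) {
                // sub-document: recurse, using the current path as the new prefix
                flattenInto(column, (Map<?, ?>) e.getValue(), out);
            } else {
                out.put(column, e.getValue());
            }
        }
    }
}
```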

Page 44: Real-Time Integration Between MongoDB and SQL Databases

44

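Parse documents with Bolt

@Override
public void execute(Tuple tuple) {
    ......
    // the incoming tuple carries the whole oplog entry in its "document" field
    final BasicDBObject oplogObject =
        (BasicDBObject) tuple.getValueByField("document");
    // "o" holds the data object of the oplog entry
    final BasicDBObject document =
        (BasicDBObject) oplogObject.get("o");
    ......
    outputValues.add(flattenDocument(document));
    // anchor the output to the input tuple so Storm can track it
    outputCollector.emit(tuple, outputValues);
}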

Page 45: Real-Time Integration Between MongoDB and SQL Databases

45

Write to SQL with SQLWriter Bolt

Page 46: Real-Time Integration Between MongoDB and SQL Databases

46

Write to SQL with SQLWriter Bolt

["ns": "people", "op": "i", ["id": 1, "name_first": "John", "name_last": "Backus", "birth": "Dec 03, 1924"]]

insert into people (_id, name_first, name_last, birth) values
(1, 'John', 'Backus', 'Dec 03, 1924');

insert into people_awards (_id, awards_award, awards_year, awards_by) values
(1, 'Turing Award', 1977, 'ACM');

insert into people_awards (_id, awards_award, awards_year, awards_by) values
(1, 'National Medal of Science', 1975, 'National Science Foundation');
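The statements above embed values as SQL literals. A hedged alternative, which is a swap from the slide's approach and not what the talk shows, is to generate the statement with `?` placeholders and bind the values through a JDBC `PreparedStatement`, which avoids quoting problems in values such as `O'Brien`. The sketch below builds only the SQL text; `InsertBuilder` is a hypothetical name, and table and column names are assumed to come from trusted collection and field names.

```java
import java.util.Collections;
import java.util.Map;

class InsertBuilder {
    /** Builds "insert into <table> (c1, c2) values (?, ?)" from a flattened row. */
    static String buildInsert(String table, Map<String, Object> row) {
        String columns = String.join(", ", row.keySet());
        String placeholders = String.join(", ", Collections.nCopies(row.size(), "?"));
        return "insert into " + table + " (" + columns + ") values (" + placeholders + ")";
    }
}
```

The row's values would then be bound in iteration order with `PreparedStatement.setObject(index, value)`, where indexes start at 1, so the row should be an ordered map such as `LinkedHashMap`.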

Page 47: Real-Time Integration Between MongoDB and SQL Databases

47

Write to SQL with SQLWriter Bolt

@Override
public void prepare(.....) {
    ....
    Class.forName("com.vertica.jdbc.Driver");
    con = DriverManager.getConnection(dBUrl, username, password);

@Override
public void execute(Tuple tuple) {
    String insertStatement = createInsertStatement(tuple);
    try {
        Statement stmt = con.createStatement();
        stmt.execute(insertStatement);
        stmt.close();

Page 48: Real-Time Integration Between MongoDB and SQL Databases

48

Topology Definition

TopologyBuilder builder = new TopologyBuilder();
// define our spout
builder.setSpout(spoutId, new MongoOpLogSpout("mongodb://", opslog_progress));
// array extractor bolt, parallelism hint of 5
builder.setBolt(arrayExtractorId, new ArrayFieldExtractorBolt(), 5).shuffleGrouping(spoutId);
// parser bolt subscribes to the documents stream of the array extractor
builder.setBolt(mongoDocParserId, new MongoDocumentParserBolt()).shuffleGrouping(arrayExtractorId, documentsStreamId);
// SQL writer bolt receives the flattened documents
builder.setBolt(sqlWriterId, new SQLWriterBolt(rdbmsUrl, rdbmsUserName, rdbmsPassword)).shuffleGrouping(mongoDocParserId);

// run the topology in-process for testing
LocalCluster cluster = new LocalCluster();
cluster.submitTopology("test", conf, builder.createTopology());

Page 51: Real-Time Integration Between MongoDB and SQL Databases

51

Topology Definition

// builder is configured exactly as in the first Topology Definition slide;
// here the topology is submitted to a Storm cluster instead of a LocalCluster
StormSubmitter.submitTopology("OfflineEventProcess", conf, builder.createTopology());

Page 52: Real-Time Integration Between MongoDB and SQL Databases

52

Lesson learned

By leveraging the MongoDB oplog or another capped collection, a tailable cursor, and the Storm framework, you can build a fast, scalable, real-time data processing pipeline.

Page 55: Real-Time Integration Between MongoDB and SQL Databases

55

Questions

Eugene Dvorkin, Architect, WebMD
[email protected]
Twitter: @edvorkin
LinkedIn: eugenedvorkin

Page 58: Real-Time Integration Between MongoDB and SQL Databases

58

Next Sessions at 2:50

5th Floor:

West Side Ballroom 3&4: Data Modeling Examples from the Real World

West Side Ballroom 1&2: Growing Up MongoDB

Juilliard Complex: Business Track: MetLife Leapfrogs Insurance Industry with MongoDB-Powered Big Data Application

Lyceum Complex: Ask the Experts: MongoDB Monitoring and Backup Service Session

7th Floor:

Empire Complex: How We Fixed Our MongoDB Problems

SoHo Complex: High Performance, High Scale MongoDB on AWS: A Hands On Guide