Streaming Patterns, Revolutionary Architectures with the Kafka API
Post on 15-Apr-2017
© 2016 MapR Technologies L1-1 ® © 2016 MapR Technologies
®
Streaming Patterns, Revolutionary Architectures Carol McDonald
Agenda Streams Core Components
• Topics, Partitions • Fault Tolerance • High Availability
Patterns • Event Sourcing • Duality of Streams and Databases • Command Query Responsibility Separation • Polyglot Persistence, Multiple Materialized Views • Turning the Database Upside Down
Real World Examples • Fraud Detection • Healthcare Exchange
Which products are we discussing?
Streams Core Components
What's a Stream?
Producers Consumers Events_Stream
A stream is an unbounded sequence of events carried from a set of producers to a set of consumers.
Events
What is Streaming Data? Got Some Examples?
Data Collection Devices
Smart Machinery Phones and Tablets Home Automation
RFID Systems Digital Signage Security Systems Medical Devices
Why Streams?
Trigger Events: • Stock Prices • User Activity • Sensor Data
Topic
Many Big Data sources are Event Oriented
Stream Stream Stream
Event Data
Topic Topic
Real-Time Analytics
Analyze Data What if you need to analyze data as it arrives?
It was hot at 6:05
yesterday!
Batch Processing with HDFS
Analyze
6:01 P.M.: 72° 6:02 P.M.: 75° 6:03 P.M.: 77° 6:04 P.M.: 85° 6:05 P.M.: 90° 6:06 P.M.: 85° 6:07 P.M.: 77° 6:08 P.M.: 75°
90° 90° 6:01 P.M.: 72° 6:02 P.M.: 75° 6:03 P.M.: 77° 6:04 P.M.: 85° 6:05 P.M.: 90° 6:06 P.M.: 85° 6:07 P.M.: 77° 6:08 P.M.: 75°
Event Processing with Streams
6:05 P.M.: 90° Topic
Stream
Temperature
Turn on the air conditioning!
Organize Data What if you need to organize data as it arrives?
Integrating Many Data Sources and Applications
Sources (Producers)
Applications (Consumers)
Unorganized, Complicated, and Tightly Coupled.
Organize Data into Topics with MapR Streams Topics Organize Events into Categories and Decouple Producers from Consumers
Consumers
MapR Cluster
Topic: Pressure
Topic: Temperature
Topic: Warnings
Consumers
Consumers
Kafka API Kafka API
Process High Volume of Data What if you need to process a high volume of data as it arrives?
What if BP had detected problems before the oil hit the water?
• 1M samples/sec
• High performance at scale is necessary!
Legacy Messaging
Millions of Sources
Hundreds of Destinations insert
Legacy Message Queue:
Message rate <100K/s
Publish Acks
delete
Consume Acks
Mechanisms for Decoupling Traditional message queues? • Huge performance hit for persistence:
• Message acknowledgement per message per consumer • Lots of non-sequential disk I/O when messages are added/removed
Scalable Messaging with MapR Streams
Server 1
Partition1: Topic - Pressure Partition1: Topic - Temperature Partition1: Topic - Warning
Server 2
Partition2: Topic - Pressure Partition2: Topic - Temperature Partition2: Topic - Warning
Server 3
Partition3: Topic - Pressure Partition3: Topic - Temperature Partition3: Topic - Warning
Topics are partitioned for throughput and scalability
Scalable Messaging with MapR Streams
Partition1: Topic - Pressure Partition1: Topic - Temperature Partition1: Topic - Warning
Partition2: Topic - Pressure Partition2: Topic - Temperature Partition2: Topic - Warning
Partition3: Topic - Pressure Partition3: Topic - Temperature Partition3: Topic - Warning
Producers are load balanced between partitions
Kafka API
Scalable Messaging with MapR Streams
Partition1: Topic - Pressure Partition1: Topic - Temperature Partition1: Topic - Warning
Partition2: Topic - Pressure Partition2: Topic - Temperature Partition2: Topic - Warning
Partition3: Topic - Pressure Partition3: Topic - Temperature Partition3: Topic - Warning
Consumers
Consumers
Consumers
Consumer groups can read in parallel
Kafka API
Core Components: Partitions
Consumers
MapR Cluster
Topic: Admission / Server 1
Topic: Admission / Server 2
Topic: Admission / Server 3
Consumers
Consumers
Partition 1
Partitions: – Messages are appended in order
Offset: – Sequential id of a message in a partition
Partition 2
Partition 3
6 5 4 3 2 1
3 2 1
5 4 3 2 1
Producers
Producers
Producers
New Message
6 5 4 3 2 1 Old
Message
Read Cursors • Read cursor: offset ID of most recent read message • Producers Append New messages to tail • Consumers Read from head
MapR Cluster
6 5 4 3 2 1 Consumer
group Producers
Read cursors
Consumer group
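The partition, offset, and read-cursor ideas above can be sketched with a small in-memory model. This is only an illustration, not how MapR Streams or Kafka is implemented; the class and method names (`PartitionLog`, `append`, `poll`) are hypothetical, and offsets start at 1 to match the diagram.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of one partition: an append-only list of
// messages plus one read cursor per consumer group.
class PartitionLog {
    private final List<String> messages = new ArrayList<>();
    // Read cursor per group: offset of the most recently read message (0 = nothing read yet).
    private final Map<String, Integer> cursors = new HashMap<>();

    // Producers append to the tail; the returned offset is the message's sequential id.
    int append(String message) {
        messages.add(message);
        return messages.size();
    }

    // Returns every message the group has not read yet and advances its cursor.
    // Messages stay in the log, so other groups can still read them.
    List<String> poll(String group) {
        int cursor = cursors.getOrDefault(group, 0);
        List<String> unread = new ArrayList<>(messages.subList(cursor, messages.size()));
        cursors.put(group, messages.size());
        return unread;
    }

    public static void main(String[] args) {
        PartitionLog log = new PartitionLog();
        log.append("pressure high");
        log.append("pressure normal");
        System.out.println(log.poll("dashboard")); // both messages
        System.out.println(log.poll("dashboard")); // nothing new for this group
        System.out.println(log.poll("archiver"));  // a second group still sees both
    }
}
```

Because each group keeps its own cursor, reading never removes anything from the log; that is the property the later slides rely on when they contrast a stream with a queue.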
Consumers
MapR Cluster
Topic: Admission / Server 1
Topic: Admission / Server 2
Topic: Admission / Server 3
Consumers
Consumers
Partition
1
Partition
2
Partition
3
6 5 4 3 2 1
3 2 1
5 4 3 2 1
Producers
Producers
Producers
Events are delivered in the order they are received, like a queue. Partitioned, Sequential Access = High Performance
New Message
6 5 4 3 2 1 Old
Message
Unlike a queue, events are persisted even after they’re delivered
Messages remain on the partition, available to other consumers Minimizes Non-Sequential disk read-writes
MapR Cluster (1 Server)
Topic: Warning
Partition 1
3 2 1 Unread Events
Get Unread
3 2 1
Client Library Consumer Poll
Considering a Messaging Platform Kafka-esque Logs?
• Sequential writing/reading disk: • Messages are persisted sequentially as produced, and read sequentially when consumed • Performance plus persistence • performance of up to a billion messages per second at millisecond-level delivery times.
Kafka model is BLAZING fast
• Kafka 0.9 API with message sizes at 200 bytes • MapR Streams on a 5 node cluster sustained 18 million events / sec • Throughput of 3.5GB/s and over 1.5 trillion events / day
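As a quick sanity check, the quoted figures can be multiplied out. The sketch below uses only the numbers on the slide (18 million events per second, 200-byte messages); the class and method names are hypothetical.

```java
// Back-of-the-envelope check of the benchmark figures quoted on the slide.
class ThroughputCheck {
    // Raw payload bytes per second for a given event rate and message size.
    static double bytesPerSecond(long eventsPerSecond, int bytesPerEvent) {
        return (double) eventsPerSecond * bytesPerEvent;
    }

    // Events per day at a sustained per-second rate (86,400 seconds per day).
    static long eventsPerDay(long eventsPerSecond) {
        return eventsPerSecond * 86_400L;
    }

    public static void main(String[] args) {
        long eventsPerSecond = 18_000_000L;
        System.out.println(bytesPerSecond(eventsPerSecond, 200) / 1e9 + " GB/s");
        System.out.println(eventsPerDay(eventsPerSecond) + " events/day");
    }
}
```

The product comes to roughly 3.6 GB/s and about 1.56 trillion events per day, in line with the rounded figures on the slide.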
When Are Messages Deleted? • Messages can be persisted forever Or • Older messages can be deleted automatically based on time to live
MapR Cluster (1 Server)
6 5 4 3 2 1 Partition 1
Older message
Parallelism When Reading To read messages from the same Topic in parallel: • create consumer groups • consumers with same group.id • partitions assigned dynamically round-robin
Consumer group: Oil Wells
Consumer A
Consumer B
Consumer C
MapR Cluster
Partition 4: Warning
Partition 3: Warning
Partition 2: Warning
Partition 1: Warning
Partition 5: Warning
Fault Tolerance Consumption: Partitions Re-Assigned Dynamically If a consumer goes offline, its partitions are re-assigned to the remaining consumers in the group
Consumer group.id: Oil Wells
Consumer A
Consumer C
MapR Cluster
Partition4: Warning
Partition3: Warning
Partition2: Warning
Partition1: Warning
Partition5: Warning
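The round-robin assignment and the re-assignment after a consumer goes offline can be sketched as follows. In practice the cluster and client library handle assignment, and the exact distribution may differ; the `GroupAssignment` class and its `assign` method are hypothetical names used only for this illustration of the Oil Wells group with five Warning partitions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: deal partitions out to the live consumers of one group in turn.
class GroupAssignment {
    static Map<String, List<Integer>> assign(List<String> consumers, int partitionCount) {
        Map<String, List<Integer>> assignment = new LinkedHashMap<>();
        for (String consumer : consumers) {
            assignment.put(consumer, new ArrayList<>());
        }
        for (int partition = 1; partition <= partitionCount; partition++) {
            // Round-robin: partition 1 to the first consumer, 2 to the second, and so on.
            String owner = consumers.get((partition - 1) % consumers.size());
            assignment.get(owner).add(partition);
        }
        return assignment;
    }

    public static void main(String[] args) {
        // Three consumers share five partitions.
        System.out.println(assign(Arrays.asList("A", "B", "C"), 5)); // {A=[1, 4], B=[2, 5], C=[3]}
        // Consumer B goes offline: recompute over the remaining consumers.
        System.out.println(assign(Arrays.asList("A", "C"), 5));      // {A=[1, 3, 5], C=[2, 4]}
    }
}
```

Every partition always has exactly one owner within the group, so adding or removing a consumer changes who reads a partition but never leaves one unread.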
Processing Same Message for Different Views
Consumers
Consumers
Consumers
Producers
Producers
Producers
MapR-FS
Kafka API Kafka API
Pub Sub: Multiple Consumers, Multiple Destinations
Partition Fault Tolerance
Message Recovery What if you need to recover messages in case of server failure?
Partitions are Replicated for Fault Tolerance
Producer
Producer
Server 2 Partition2: Topic - Warning
Producer
Server 1 Partition1: Topic - Warning
Server 3 Partition3: Topic - Warning
Server 2
Server 3
Server 1
Server 3
Server 1
Server 2
Partition1: Warning Partition2: Warning Replica Partition3: Warning Replica
Partition1: Warning Replica
Partition3: Warning Replica
Partition1: Warning Replica Partition2: Warning Replica Partition3: Warning
Producer
Producer
Producer
Server 1
Server 2
Server 3
Security Investigation & Event Management
Operational Intelligence
Real-time Analytics
Partition2: Warning
Partitions are Replicated for Fault Tolerance
Partitions are Replicated for Fault Tolerance
Producer
Producer
Producer
Security Investigation & Event Management
Operational Intelligence
Real-time Analytics
Partition1: Warning Partition2: Warning Replica Partition3: Warning Replica
Partition1: Warning Replica
Partition3: Warning Replica
Partition1: Warning Replica Partition2: Warning Replica Partition3: Warning
Server 1
Server 2
Server 3
Partition2: Warning
Partitions are Replicated for Fault tolerance
Producer
Producer
Producer
Security Investigation & Event Management
Operational Intelligence
Real-time Analytics
Partition1: Warning Partition2: Warning Replica Partition3: Warning Replica
Partition1: Warning Replica
Partition3: Warning Replica
Partition1: Warning Replica Partition2: Warning Replica Partition3: Warning
Server 1
Server 2
Server 3
Partition2: Warning
Streams and High Availability
• Stream: – collection of topics managed together
• Manage stream: – replication – security – time-to-live – number of partitions
Core Components: Streams
Stream
Pressure Temperature Warning
Stream
Pressure Temperature Warning
Consumers
Consumers
Consumers
Consumers
Producers
Producers
Replication
Real-time Access What if you need real-time access to live data distributed across multiple clusters and multiple data centers?
Lack of Global Replication
Topic: C
Streams and Replication Streams:
• are a collection of topics • can be replicated worldwide
Topic: A
Topic: B
Topic: C
Topic: A
Topic: B
Topic: C
Replicating to another cluster
Streams and Replication
Topic: A
Topic: B
Topic: C
Fail Over
Streams: • high availability • disaster recovery
Replicating Streams: Master-Slave Replication
Venezuela_HA Cluster
Metrics Stream
Metrics Producers
Venezuela Cluster
Metrics Stream
Metrics
Consumers
High Availability Backup for Venezuela
Master Slave
Replicating Streams: Many-to-One Replication
Houston
Metrics Stream
Metrics
Producers Venezuela
Metrics Stream
Metrics Consumers
Consumers
Producers Mexico
Metrics Stream
Metrics Consumers Analyze all data from Houston
Many
One
Replicating Streams: Multi-Master Replication
Producers Seoul
Metrics Stream
Metrics Consumers
Producers San Francisco
Metrics Stream
Metrics Consumers
Both send and receive updates
Stream Replication
WAN
Stream
Pressure Temperature Warning
Stream
Pressure Temperature Warning
Stream
Pressure Temperature Warning
Ship picks up containers…
Singapore
Arrives at destination…
Tokyo
While en route to next destination…
Washington
Where does the data live…
Singapore Washington
Tokyo
What is important about this? Data is generated on the ship
• Must have an easy way (i.e. foolproof) to move the data off the ship
Each port stores the data from the ship
• Moving data between locations • Analytics could happen at any location
This is a multi-data center time series data use case
• Events from sensors = metrics • Same concepts as data center monitoring
Patterns
Event Sourcing
Updates
Imagine each event as a change to an entry in a database.
Account Id Balance WillO 80.00 BradA 20.00
1: WillO : Deposit : 100.00 2: BradA : Deposit : 50.00 3: BradA : Withdraw : 30.00 4: WillO : Withdraw: 20.00
https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying
Change log
4 3 2 1
credit, debit events
current account balances
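The relationship between the change log and the balance table can be made concrete by replaying the four events above. This is a minimal sketch, assuming amounts are held as integer cents; `AccountEvent` and `BalanceView` are hypothetical names, not part of any MapR or Kafka API.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// One entry in the change log: a deposit to or a withdrawal from an account.
class AccountEvent {
    final String accountId;
    final String type;   // "Deposit" or "Withdraw"
    final long cents;

    AccountEvent(String accountId, String type, long cents) {
        this.accountId = accountId;
        this.type = type;
        this.cents = cents;
    }
}

class BalanceView {
    // Rebuilds the current-balance table by applying every event in log order.
    static Map<String, Long> replay(List<AccountEvent> changeLog) {
        Map<String, Long> balances = new LinkedHashMap<>();
        for (AccountEvent event : changeLog) {
            long delta = event.type.equals("Deposit") ? event.cents : -event.cents;
            balances.merge(event.accountId, delta, Long::sum);
        }
        return balances;
    }

    public static void main(String[] args) {
        List<AccountEvent> changeLog = Arrays.asList(
                new AccountEvent("WillO", "Deposit", 10_000),
                new AccountEvent("BradA", "Deposit", 5_000),
                new AccountEvent("BradA", "Withdraw", 3_000),
                new AccountEvent("WillO", "Withdraw", 2_000));
        System.out.println(replay(changeLog)); // {WillO=8000, BradA=2000}
    }
}
```

The replay yields 80.00 for WillO and 20.00 for BradA, matching the table. The reverse is not possible: the table alone cannot recover the individual deposits and withdrawals.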
Replication
Change Log
https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying
3 2 1 3 2 1 3 2 1
Duality of Streams and Tables: Database: captures data at rest Stream: captures data change
Master: Append writes
Slave: Apply writes in order
Which Makes a Better System of Record?
Which of these can be used to reconstruct the other?
1: WillO : Deposit : 100.00 2: BradA : Deposit : 50.00 3: BradA : Withdraw : 30.00 4: WillO : Withdraw: 20.00
Account Id Balance WillO 80.00 BradA 20.00
Change Log 3 2 1
Rewind: Reprocessing Events
MapR Cluster
6 5 4 3 2 1 Producers
Reprocess from oldest message
Consumer
Create new view, Index, cache
Rewind Reprocessing Events
MapR Cluster
6 5 4 3 2 1 Producers
To Newest message
Consumer
new view
Read from new view
Event Sourcing, Command Query Responsibility Separation: Turning the Database Upside Down
Key-Val Document Graph
Wide Column
Time Series Relational
??? Events Updates
What Else Do I Use My Stream For?
Lineage - “how did BradA’s balance get so low?” Auditing - “who deposited/withdrew from BradA’s account?” History – to see the status of the accounts last year Integrity - “can I trust this data hasn’t been tampered with?”
• Yup - Streams are immutable
0: WillO : Deposit : 100.00 1: BradA : Deposit : 50.00 2: BradA : Withdraw : 30.00 3: WillO : Withdraw: 20.00
What Do I Need For This to Work?
Infinitely persisted events A way to query your persisted stream data An integrated security model across the stream and databases
Fraud Detection
Point of Sale -> Data Center: Is the Transaction Fraud? • Lots of requests • Need an answer within ~50-100 milliseconds
Data Center
Point of Sale
Location, time, card#
Fraud yes/no ?
Traditional Solution
POS 1..n
Fraud detector
Last card use
1. Look up last card use 2. Compute the card velocity:
• Subtract last location, time from current location, time
3. Update last card use
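Step 2 can be sketched as distance divided by elapsed time. The slide does not say how locations are represented or how distance is measured; the sketch below assumes latitude/longitude pairs and the haversine great-circle distance, and `CardVelocity` is a hypothetical name. Any threshold for "too fast" is likewise an assumption.

```java
// Hypothetical sketch of the card velocity check: how fast would the card have
// had to travel between its last use and the current one?
class CardVelocity {
    static final double EARTH_RADIUS_KM = 6371.0;

    // Great-circle distance between two lat/lon points (haversine formula).
    static double distanceKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    // Implied speed in km/h; times are epoch milliseconds.
    static double velocityKmh(double lastLat, double lastLon, long lastMillis,
                              double curLat, double curLon, long curMillis) {
        double km = distanceKm(lastLat, lastLon, curLat, curLon);
        double hours = (curMillis - lastMillis) / 3_600_000.0;
        if (hours <= 0) {
            // Same instant: any movement at all is impossible.
            return km == 0 ? 0 : Double.POSITIVE_INFINITY;
        }
        return km / hours;
    }

    public static void main(String[] args) {
        // Two uses one degree of longitude apart on the equator, ten minutes apart.
        double v = velocityKmh(0, 0, 0L, 0, 1, 600_000L);
        // 1000 km/h is an assumed cut-off for this example only.
        System.out.println(v > 1000.0 ? "suspicious" : "plausible");
    }
}
```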
What Happens Next?
POS 1..n
Fraud detector
Last card use
POS 1..n
Fraud detector
POS 1..n
Fraud detector
1. Look up last card use 2. Compute the card velocity 3. Update last card use
Bottleneck !
Service Isolation: Separate Read from Write
POS 1..n
Fraud detector
Last card use
Updater
card activity
Read
Read last card use
Separate Read Model from the Write Model: Command Query Responsibility Separation
POS 1..n
Fraud detector
Last card use
Updater
card activity
Read
Event last card use
Write last card use
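The split between the read path and the write path can be sketched in a few lines. Here an in-memory queue stands in for the card activity topic and a map stands in for the last-card-use table; `CardUse` and `FraudService` are hypothetical names, and a real deployment would run the updater as a separate consumer.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// One card swipe as reported by a point of sale.
class CardUse {
    final String card;
    final String location;
    final long timeMillis;

    CardUse(String card, String location, long timeMillis) {
        this.card = card;
        this.location = location;
        this.timeMillis = timeMillis;
    }
}

class FraudService {
    // Stand-in for the "card activity" topic.
    final Queue<CardUse> cardActivity = new ArrayDeque<>();
    // Read model: last use per card. Only the updater writes to it.
    final Map<String, CardUse> lastCardUse = new HashMap<>();

    // Fraud detector: reads the last use, then publishes an event
    // instead of updating the table itself.
    CardUse handle(CardUse current) {
        CardUse previous = lastCardUse.get(current.card);
        cardActivity.add(current);
        return previous;
    }

    // Updater: consumes events in order and writes the read model.
    void runUpdater() {
        CardUse event;
        while ((event = cardActivity.poll()) != null) {
            lastCardUse.put(event.card, event);
        }
    }

    public static void main(String[] args) {
        FraudService service = new FraudService();
        service.handle(new CardUse("4111", "Houston", 1_000L));
        service.runUpdater();
        CardUse previous = service.handle(new CardUse("4111", "Tokyo", 2_000L));
        System.out.println("last seen in " + previous.location);
    }
}
```

The detector never writes to the table, so several detectors can share the same read model without contending on updates; the cost is that the table lags the stream until the updater catches up.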
Event Sourcing: New Uses of Data Processing Same Message for Multiple Views
POS 1..n
Fraud detector
Last card use
Updater
Card location history
Other
card activity
Scaling Through Isolation allows Multiple Consumers
POS 1..n
Last card use
Updater
POS 1..n
Last card use
Updater
card activity
Fraud detector
Fraud detector
Multiple fraud detectors can use the same message queue
• De-coupling and isolation are key
• Propagate events, not table updates
Decoupled Architecture
Producer
Activity Handler
Producer
Producer Historical
Interesting Data Real-time
Analysis
Results Dashboard
Anomaly Detection
more than one component can make use of the same stream of messages for a variety of uses
Lessons De-coupling and isolation are key Propagate events, not table updates
Building Enterprise Software vs Internet Companies
Enterprise Software: Complexity of domain => Business logic, Business rules Banking, Healthcare, Telecom Compliance=> Security
Internet Companies: Volume of data => Complex data infrastructure Large Scale Availability, Recovery
Reference Martin Kleppmann
Building Enterprise Software vs Internet Companies
Enterprise Software: Event Sourcing
Internet Companies: Stream Processing
Reference Martin Kleppmann
Real World Solution
Credit Card Fraud Model Building
Serve NoSQL Storage Data Ingest
Fraud Stream Processing Architecture
Stream Processing Source
MapR-FS
MapR-DB
Topic: A
Topic: B
Topic: C
Topic: A
Topic: B
Topic: C
Streams Messaging
Fraud Processing
Stream Processing
Derive features
Model
raw
enriched
alerts
process
Batch Processing
MapR-FS
MapR-DB
MapR-DB
raw
enriched
alerts
Model
build model update model
Streams Messaging
Fraud Event Processing
Stream Processing
NoSQL Storage
MapR-FS
MapR-DB
Raw
Enriched
Fraud
1. Parse raw event
2. Read card holder profile from MapR-DB
3. Derive features
4. Get prediction from model with features
5. Publish not fraud to enriched topic
6. Publish fraud to fraud topic
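The six steps can be sketched as a single processing function. Everything concrete here is an assumption made for illustration: the raw event format ("cardId,amount"), the profile contents (an average amount per card), the two features, and the scoring threshold. Lists stand in for the enriched and fraud topics, a map stands in for MapR-DB, and the model is passed in as a plain function.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.ToDoubleFunction;

// Hypothetical sketch of the fraud event processing steps.
class FraudPipeline {
    final Map<String, Double> profileAvgAmount = new HashMap<>(); // stand-in for profiles in MapR-DB
    final List<String> enrichedTopic = new ArrayList<>();         // stand-in for the enriched topic
    final List<String> fraudTopic = new ArrayList<>();            // stand-in for the fraud topic
    final ToDoubleFunction<double[]> model;                       // returns a fraud score in [0, 1]
    final double threshold;

    FraudPipeline(ToDoubleFunction<double[]> model, double threshold) {
        this.model = model;
        this.threshold = threshold;
    }

    void process(String rawEvent) {
        // 1. Parse raw event (assumed format: "cardId,amount").
        String[] fields = rawEvent.split(",");
        String cardId = fields[0];
        double amount = Double.parseDouble(fields[1]);
        // 2. Read card holder profile.
        double average = profileAvgAmount.getOrDefault(cardId, amount);
        // 3. Derive features: the amount and its ratio to the holder's average.
        double[] features = { amount, average == 0 ? 0 : amount / average };
        // 4. Get prediction from model with features.
        double score = model.applyAsDouble(features);
        // 5 and 6. Publish to the enriched topic or the fraud topic.
        String enriched = rawEvent + "," + score;
        if (score >= threshold) {
            fraudTopic.add(enriched);
        } else {
            enrichedTopic.add(enriched);
        }
    }

    public static void main(String[] args) {
        // Toy model: flag anything more than five times the holder's average.
        FraudPipeline pipeline = new FraudPipeline(f -> f[1] > 5 ? 0.9 : 0.1, 0.5);
        pipeline.profileAvgAmount.put("c1", 10.0);
        pipeline.process("c1,12");
        pipeline.process("c1,100");
        System.out.println("enriched: " + pipeline.enrichedTopic);
        System.out.println("fraud: " + pipeline.fraudTopic);
    }
}
```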
Fraud Processing Same Message for Different Views
Partition1: Topic – Raw Trans Partition1: Topic – Enriched Partition1: Topic – Fraud Alert
Partition2: Topic – Raw Trans Partition2: Topic - Enriched Partition2: Topic – Fraud Alert
Partition3: Topic – Raw Trans Partition3: Topic - Enriched Partition3: Topic – Fraud Alert
Consumers MapR-FS
MapR-DB
Consumers
Consumers
Consumers MapR-FS
MapR-DB
Consumers
Consumers
Consumers MapR-FS
MapR-DB
Consumers
Consumers
Real World Solution
JSON DB (MapR-DB)
Graph DB (Titan on MapR-DB)
Search Engine (Elastic-Search)
Transforming the Health Care Ecosystem
Electronic Medical Records
“The Stream is the System of Record” –Brad Anderson VP Big Data Informatics
Liaison ALLOY™ Platform
Data Integration
ingest, syndicate, transform
Data Management
master, deduplicate, harmonize
relate, merge
tokenize
store / persist
analyze, summarize
report, distill
recommend
explore, query
sandbox, batch transform
learn, traverse
Use Case: Streaming System of Record for Healthcare
Objective: • Build a flexible, secure healthcare exchange
Records Analysis Applications
Challenges: • Many different data models • Security and privacy issues • HIPAA compliance
Records
ALLOY Health: Exchange State HIE
Clinical Data Viewer
Analytics queries like: What are the outcomes in the entire state on diabetes? Are there doctors that are doing this better than others?
Clinical Data
Financial Data Provider
Organizations
2000+ Practices, 200+ Labs, 30,000+ Clinicians
OrdersAnywhere PORTAL (no EHR)
EHR with HL7 ONLY
EHR with WORKFLOW INTEGRATION
RADIOLOGY
LAB
This is a PAIN !
COMPLIANCE
SECURITY CONTROLS
COMPLIANCE FEATURES
PRIVACY
PCI DSS 3.0
21 CFR Part 11
SSAE16 / SOC2
HIPAA/HITECH
WHY NOW?
http://bit.ly/29aBatK
WHY NOW?
2014 FQ4 profit: $ -440 M
Total Cost Estimate: $ -12 B
Why Now? The Relational database is not the only tool
1234
Attribute Value
patient_id 1234
Name Jon Smith
Age 50
999
Attribute Value
patient_id 999
Name Jonathan Smith
DOB Jun 1965
86
Attribute Value
provider_id 86
Name Dr. Nora Paige
Specialty Diabetes
9876
Attribute Value
rx_id 9876
Name Sitagliptin
Dosage 325mg
Visited
Prescribed
WasPrescribed
Patient
Patient
Prescription
Provider
Context and Relationships
WHY NOW? Mind the Gap
Streaming System of Record for Healthcare
Stream
Topic
Records
Applications
6 5 4 3 2 1
Search
Graph DB
JSON
HBase
Micro Service
Micro Service
Micro Service
Micro Service
Micro Service
Micro Service
API
Streaming System of Record Materialized Views
Immutable Log
Raw Data
workflow
Key/Value (MapR-DB)
materialized
view
workflow
Search Engine
materialized
view
CEP
k v v v v v
k v v v
k v v
k v v v v
k v v v
k v v v v v
Document Log (MapR-FS)
log
API
App
pre-processor
workflow
Graph (ArangoDB)
materialized
view
workflow
Time Series
(OpenTSDB)
materialized view
micro service
micro service
micro service
micro service
micro service
micro service
micro service
micro service
App App App
...
The Promised Land Compliance Auditor
The Promised Land
Auditor smiley faces • Data Lineage • Audit Logging • Wire-level encryption • At Rest encryption
Replication
• Disaster Recovery • EU – data can’t leave
Non-Stream / Non-"Big Data" • Software Development Lifecycle • System Hardening • Separation of Concerns - Dev vs Ops • Patch Management
Compliance Auditor
Solution Design/architecture solved some
• Streams • Data Lineage/System of Record • Kappa Architecture (Kreps/Kleppmann)
MapR solved others • Unified Security • Replication DC to DC • Converge Kafka/HBase/Hadoop to one cluster • Multi-tenancy (lots of topics, for lots of tenants)
API
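Sample Producer: All Together

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class SampleProducer {
    // MapR Streams topic name: <stream path>:<topic>. Static so main() can use it.
    static String topic = "/streams/pump:warning";
    public static KafkaProducer<String, String> producer;

    public static void main(String[] args) {
        producer = setUpProducer();
        for (int i = 0; i < 3; i++) {
            String txt = "msg " + i;
            ProducerRecord<String, String> rec = new ProducerRecord<String, String>(topic, txt);
            producer.send(rec); // asynchronous send
            System.out.println("Sent msg number " + i);
        }
        producer.close(); // flushes pending sends
    }

    // Not shown on this slide; a minimal setup with String serializers is assumed.
    // Plain Apache Kafka would also need bootstrap.servers.
    static KafkaProducer<String, String> setUpProducer() {
        Properties props = new Properties();
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        return new KafkaProducer<String, String>(props);
    }
}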
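Sample Consumer: All Together

import java.util.Arrays;
import java.util.Iterator;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class MyConsumer {
    // Same topic the sample producer writes to.
    public static String topic = "/streams/pump:warning";
    public static KafkaConsumer<String, String> consumer;
    static long pollTimeOut = 1000; // milliseconds; value assumed, not on the slide

    public static void main(String[] args) {
        configureConsumer(args);
        consumer.subscribe(Arrays.asList(topic)); // subscribe takes a list of topics
        try {
            while (true) {
                ConsumerRecords<String, String> msg = consumer.poll(pollTimeOut);
                Iterator<ConsumerRecord<String, String>> iter = msg.iterator();
                while (iter.hasNext()) {
                    ConsumerRecord<String, String> record = iter.next();
                    System.out.println("read " + record.toString());
                }
            }
        } finally {
            consumer.close(); // reached only if the loop exits with an exception
        }
    }

    // Not shown on this slide; a minimal setup is assumed. The group.id is hypothetical.
    static void configureConsumer(String[] args) {
        Properties props = new Properties();
        props.put("group.id", "sample-group");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        consumer = new KafkaConsumer<String, String>(props);
    }
}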
Summary
Can we get “Extreme” ?
1+ Trillion Events
• per day
Millions of Producers
• Billions of events per second
Multiple Consumers
• Potentially for every event
Multiple Data Centers
• Plan for success
• Plan for drastic failure
Think that is crazy? Consider having 100 servers and performing: Monitoring and Application logs…
• 100 metrics per server
• 60 samples per minute
• 50 metrics per request
• 1,000 log entries per request (abnormally small, depends on level)
• 1 million requests per day
~ 2 billion events per day, for one small (ish) use case
Extreme Average Reality
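The estimate of about 2 billion events per day can be reproduced from the figures on this slide. The sketch assumes that "60 samples per minute" applies to every metric on every server and that each request produces both its metrics and its log entries; `DailyEventEstimate` is a hypothetical name.

```java
// Reproduces the slide's daily event estimate under the stated assumptions.
class DailyEventEstimate {
    // Monitoring samples per day: every metric on every server, sampled each minute.
    static long monitoringEvents(long servers, long metricsPerServer, long samplesPerMinute) {
        return servers * metricsPerServer * samplesPerMinute * 60 * 24;
    }

    // Request-driven events per day: metrics plus log entries for each request.
    static long requestEvents(long requestsPerDay, long metricsPerRequest, long logEntriesPerRequest) {
        return requestsPerDay * (metricsPerRequest + logEntriesPerRequest);
    }

    public static void main(String[] args) {
        long monitoring = monitoringEvents(100, 100, 60);      // 864,000,000
        long requests = requestEvents(1_000_000L, 50, 1_000);  // 1,050,000,000
        System.out.println(monitoring + requests);             // 1914000000
    }
}
```

Monitoring contributes 864 million samples and requests contribute 1.05 billion events, for a total of about 1.9 billion per day.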
Stream Processing
Building a Complete Data Architecture
MapR File System (MapR-FS)
MapR Converged Data Platform
MapR Database (MapR-DB) MapR Streams
Sources/Apps Bulk Processing
Find my slides & other related materials to this talk here: bit.ly/jjug-aug2016
or search:
MapR Blog
• https://www.mapr.com/blog/
…helping you put data technology to work
● Find answers
● Ask technical questions
● Join on-demand training course discussions
● Follow release announcements
● Share and vote on product ideas
● Find Meetup and event listings
Connect with fellow Apache Hadoop and Spark professionals
community.mapr.com