Scaling Up WSO2 BAM for Billions of Requests and Terabytes of Data

Buddhika Chamith, Software Engineer – WSO2 BAM

Jun 21, 2015

Business Activity Monitoring

“The aggregation, analysis, and presentation of real-time information about activities inside organizations and involving customers and partners.” - Gartner

Aggregation

● Capturing data
● Data storage
● What data to capture?

Analysis

● Data operations
● Building KPIs
● Operating on large amounts of historical data or new data
● Building BI

Presentation

● Visualizing KPIs/BI
● Custom dashboards
● Visualization tools
● Not just dashboards!

Need for Scalability

BAM 2.x - Component Architecture

Data Agents

● Push data to BAM
● Collect:
  ● Service data
  ● Mediation data
  ● Logs, etc.
● Various interceptors are used:
  ● Axis2 Handlers
  ● Synapse Mediators
  ● Tomcat Valves
  ● Log4j Appenders

Performance Considerations

● Should be asynchronous
● Event batching
● SOAP?
● Apache Thrift (binary protocol)
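The asynchronous, batched publishing idea above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the BAM agent API: `BatchingPublisher`, `send_batch`, `batch_size`, and `flush_interval` are hypothetical names, and `send_batch` stands in for whatever transport (such as a Thrift client) actually ships the events.

```python
import queue
import threading
import time
from typing import Any, Callable, List


class BatchingPublisher:
    """Hypothetical agent-side publisher: callers enqueue events and return
    immediately, while a background thread sends them in batches."""

    def __init__(self, send_batch: Callable[[List[Any]], None],
                 batch_size: int = 100, flush_interval: float = 0.5) -> None:
        self._send_batch = send_batch  # assumed transport call (e.g. a Thrift client)
        self._batch_size = batch_size
        self._flush_interval = flush_interval
        self._queue: "queue.Queue[Any]" = queue.Queue()
        self._stop = threading.Event()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def publish(self, event: Any) -> None:
        # The monitored request path only pays for an in-memory enqueue.
        self._queue.put(event)

    def _run(self) -> None:
        batch: List[Any] = []
        deadline = time.monotonic() + self._flush_interval
        # Keep draining until close() is called and the queue is empty.
        while not self._stop.is_set() or not self._queue.empty():
            try:
                timeout = max(0.0, deadline - time.monotonic())
                batch.append(self._queue.get(timeout=timeout))
            except queue.Empty:
                pass
            # Flush when the batch is full or the flush interval has elapsed.
            if len(batch) >= self._batch_size or time.monotonic() >= deadline:
                if batch:
                    self._send_batch(batch)
                    batch = []
                deadline = time.monotonic() + self._flush_interval
        if batch:
            self._send_batch(batch)  # send whatever is left on shutdown

    def close(self) -> None:
        """Stop the worker after it has flushed every queued event."""
        self._stop.set()
        self._worker.join()
```

Batching amortizes the per-call network overhead across many events; the trade-off is a bounded delay (roughly `flush_interval`) before an event leaves the agent.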

Apache Thrift

● An RPC framework
● With a pluggable architecture for mixing different transports with different protocols
● Has multiple language bindings (Java, C++, Python, Perl, C#, etc.)
● We mainly use the Java binding

Not Just Performance...

● Load balancing
● Failover
● All available within a Java SDK library
● You can use it too
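The load balancing and failover behaviour listed above can be illustrated with a small client-side sketch. It is not the API of the BAM Java SDK: `LoadBalancingClient`, the `send` callback, and the endpoint names are hypothetical, and the policy shown is plain round-robin with failover whenever a send raises `ConnectionError`.

```python
from typing import Callable, List, Sequence


class LoadBalancingClient:
    """Hypothetical client-side balancer: round-robin over receiver endpoints,
    failing over to the next endpoint when a send raises ConnectionError."""

    def __init__(self, receivers: Sequence[str],
                 send: Callable[[str, List[object]], None]) -> None:
        if not receivers:
            raise ValueError("at least one receiver is required")
        self._receivers = list(receivers)
        self._send = send  # assumed transport function: send(endpoint, events)
        self._next = 0     # index of the receiver to try first next time

    def publish(self, events: List[object]) -> str:
        """Deliver one batch to a single receiver; return the endpoint used."""
        n = len(self._receivers)
        for attempt in range(n):
            index = (self._next + attempt) % n
            try:
                self._send(self._receivers[index], events)
            except ConnectionError:
                continue  # fail over: try the next receiver in the ring
            self._next = (index + 1) % n  # round-robin: start after this one
            return self._receivers[index]
        raise ConnectionError("all receivers are unavailable")
```

With three receivers and `rcv-2` down, successive batches go to `rcv-1`, `rcv-3`, `rcv-1`, `rcv-3`: the unhealthy node is skipped without the application noticing.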

Data Receiver

● Captures data and transfers it to subscribed sinks
● Not just the database
● Can be clustered
● Load balancing is handled from the client side
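The "subscribed sinks" idea is essentially a publish/subscribe fan-out. The sketch below is a hypothetical Python illustration, not the BAM data receiver's actual interface: `DataReceiver`, `subscribe`, and `receive` are made-up names, and isolating a failing sink from the others is an assumed design choice.

```python
from typing import Callable, Dict, List

Event = Dict[str, object]
Sink = Callable[[List[Event]], None]


class DataReceiver:
    """Hypothetical receiver that forwards every incoming batch to all
    subscribed sinks (a storage writer is just one possible sink)."""

    def __init__(self) -> None:
        self._sinks: List[Sink] = []

    def subscribe(self, sink: Sink) -> None:
        self._sinks.append(sink)

    def receive(self, events: List[Event]) -> List[Exception]:
        """Deliver a batch to each sink; a failing sink does not block the
        others. Returns the exceptions raised, if any."""
        failures: List[Exception] = []
        for sink in self._sinks:
            try:
                sink(events)
            except Exception as exc:  # isolate sink failures from each other
                failures.append(exc)
        return failures
```

Because the receiver holds no per-client state in this sketch, several receivers can run side by side behind client-side load balancing.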

Data Bridge

Data Storage

● Apache Cassandra
● NoSQL column family implementation
● Scalable, highly available, with no single point of failure (SPOF)
● Very high write throughput and good read throughput
● Tunable consistency with data replication
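To make "tunable consistency" concrete: with replication factor N, a write acknowledged by W replicas and a read answered by R replicas are guaranteed to share at least one replica when R + W > N. Cassandra's QUORUM level corresponds to a majority, N // 2 + 1 replicas. The helper functions below are a hypothetical sketch of that arithmetic only; they are not part of any Cassandra client API.

```python
def quorum(replication_factor: int) -> int:
    """Replicas a QUORUM read or write must reach: a strict majority."""
    return replication_factor // 2 + 1


def reads_see_latest_write(replication_factor: int, read_replicas: int,
                           write_replicas: int) -> bool:
    """True when every read replica set must overlap every write replica set,
    i.e. R + W > N, so a read contacts at least one replica with the write."""
    return read_replicas + write_replicas > replication_factor


def tolerated_replica_failures(replication_factor: int, required: int) -> int:
    """How many replicas can be down while an operation needing `required`
    acknowledgements still succeeds."""
    return replication_factor - required
```

For example, with N = 3, QUORUM reads and writes (2 + 2 > 3) return the latest acknowledged write while tolerating one unavailable replica, whereas ONE/ONE (1 + 1 ≤ 3) gives up that guarantee in exchange for lower latency.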

Deployment – Storage Cluster

Receiver Cluster

Results

Measured with a single receiver node with a 2 GB heap on a quad-core machine running RHEL.

Disk Growth

Analyzer Engine

● Idea: distribute processing to multiple nodes to run in parallel
● Obvious choice: Hadoop
  ● Uses the Map Reduce programming paradigm

Map Reduce

● Process multiple data chunks in parallel at Mappers.
● Aggregate map outputs that share the same key at Reducers and store the result.
● Let's think of a useful example...
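As a useful example, consider computing the request count and mean response time per service from raw request logs. The sketch below simulates the map, shuffle, and reduce phases in plain Python on a single machine; it does not use the Hadoop API, and the log line format `<service> <latency_ms>` and all function names are hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from itertools import chain
from typing import Dict, Iterable, List, Sequence, Tuple


def mapper(lines: Iterable[str]) -> List[Tuple[str, int]]:
    # Parse each raw log line "<service> <latency_ms>" into a (key, value) pair.
    pairs = []
    for line in lines:
        service, latency = line.split()
        pairs.append((service, int(latency)))
    return pairs


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Group every value that shares a key (Hadoop does this between phases).
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(key: str, values: List[int]) -> Tuple[str, Tuple[int, float]]:
    # Aggregate all values for one key: request count and mean latency.
    return key, (len(values), sum(values) / len(values))


def run_job(chunks: Sequence[List[str]]) -> Dict[str, Tuple[int, float]]:
    # Map phase: each chunk is processed independently, here on a thread pool.
    with ThreadPoolExecutor() as pool:
        mapped = list(pool.map(mapper, chunks))
    # Shuffle + reduce phase.
    groups = shuffle(chain.from_iterable(mapped))
    return dict(reducer(key, values) for key, values in groups.items())
```

On a Hadoop cluster, the framework distributes the map tasks across nodes and performs the shuffle itself; a job author would supply only the equivalents of `mapper` and `reducer`.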

Hadoop Components

● Job Tracker
● Name Node
● Secondary Name Node
● Task Trackers
● Data Nodes

It's Cool, But...

● Do we need a Hadoop cluster in order to try out BAM?
● Are we supposed to write Hadoop jobs to get BAM to summarize something?
● Answers:
  1) No.
  2) No. OK, maybe very rarely, at best.

Courtesy: http://goo.gl/QEnpN

Apache Hive

● You write SQL (almost).
● Hive converts it to Map Reduce jobs.
● So Hive does two things:
  ● Provides an abstraction over Hadoop Map Reduce
  ● Submits the analytic jobs to Hadoop
● Hive may spawn a Hadoop JVM locally or delegate to a Hadoop cluster

A Typical Hive Script

Results

Task Framework

● Runs Hive scripts periodically
● Schedules can be specified as cron expressions or predefined templates
● Handles task failover in case of node failure
● Uses ZooKeeper for coordination
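The failover idea can be sketched as a deterministic assignment of scheduled scripts to whichever analyzer nodes are currently alive; when a node fails, recomputing the assignment moves its tasks to the survivors. This is a hypothetical illustration, not BAM's task framework or the ZooKeeper API: in a real setup a coordination service such as ZooKeeper would supply the agreed-upon list of live nodes (for example, via ephemeral nodes), and `assign_tasks` and the task names are made up for the example.

```python
from typing import Dict, Sequence


def assign_tasks(tasks: Sequence[str], live_nodes: Sequence[str]) -> Dict[str, str]:
    """Hypothetical: give each scheduled task to exactly one live node.

    Tasks and nodes are sorted first, so every node that sees the same
    membership list computes the same assignment without talking to the others.
    """
    if not live_nodes:
        raise RuntimeError("no live analyzer nodes")
    nodes = sorted(live_nodes)
    return {task: nodes[i % len(nodes)] for i, task in enumerate(sorted(tasks))}
```

Plain round-robin keeps the sketch short, but any membership change can also move tasks between healthy nodes; a scheme such as consistent hashing would reduce that churn.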

ZooKeeper

● Can be run separately or embedded within BAM

Analyzer Cluster

Dashboard

● Making the dashboard scale

Deployment Patterns

Single Node

High Availability

Fully Distributed Setup

Summary

● BAM
● Need for scalability
● Scaling BAM components
● Results
● BAM deployment patterns