©2015 Slide 1
Prepared for: BDA Meetup
Turbocharging CDAP Applications With Ampool
Milind Bhandarkar (@techmilind), Founder & CEO, @AmpoolIO
©2015 Slide 2
Outline
1. Ampool Vision
2. Pipelines w/ CDAP
3. IMDG w/ Geode
4. Ampool w/ CDAP
5. Q & A
©2015 Slide 3
Data Processing & Storage layers have evolved for scale-out
[Figure: diagram with axes Persistence and Processing. Labels: Unstructured, Structured, Immutable, Mutable, Unmanaged, Managed, Log, Publish, QTx, ETL.]
In the beginning…
As app users & data grew…
Big Data / App Explosion!
©2015 Slide 4
Build a Processing & Storage-agnostic Memory Architecture
[Figure: diagram featuring ampool, with axes Persistence and Processing. Labels: Unstructured, Structured, Immutable, Mutable, Unmanaged, Managed, Log, Publish, ETL, QTx, Data Frame, Data Set.]
Unify data processing
Design for Scale-out
Best of breed data engines!
©2015 Slide 5
Ampool’s Mission: To help build real-time customer experiences through high-performance analytics built for modern, commodity hardware platforms.
For the community: To speed up big, real-time analytics in a democratic way through a memory-centric architecture (complementing existing architectures), driving better interoperability between compute and storage layers.
©2015 Slide 6
Big Data Processing Pipelines… use slow, persistent storage for data exchange today!
Pipeline stages: Ingest, ETL, Analytics, App Use
©2015 Slide 7
AMPOOL: Fast memory across distributed compute clusters... driving performance, simplicity and agility
Pipeline stages: Ingest, ETL, Analytics, App Use
[Diagram label: ampool …]
©2015 Slide 8
Energy Management
IoT Analytics
Pipeline stages: Ingest, ETL, Analytics, App Use
Data ingestion flows:
• Smart meter data (Kafka)
Hive processing:
• De-norm, Sessionize
• Aggregations
Spark processing:
• Linear Regression
• Export to HBase
Downstream Apps:
• Web app integration
[Diagram labels: … ampool, HDFS]
©2015 Slide 9
Pipeline implemented in CDAP
©2015 Slide 10
CDAP Application
©2015 Slide 11
In-memory Technology
What is Apache Geode?
©2015 Slide 12
How does it compare with the Big Data stack?
YCSB: Geode & HBase
©2015 Slide 13
Ampool with CDAP
CDAP with HBase (as-is Application)
Configuration Changes:
• Extension modules/directory
• Distributed Mode table/stream
CDAP with Ampool (powered by Geode)
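The idea on this slide, moving an unchanged application from one table backend to another through configuration alone, can be sketched in a few lines. This is a minimal illustration and not CDAP's or Ampool's actual API: the `FileStore` and `MemoryStore` classes, the `make_store` and `run_pipeline` functions, and the `table.backend` / `table.dir` configuration keys are all hypothetical names chosen for this example.

```python
import json
import os
import tempfile


class FileStore:
    """Hypothetical persistent backend: one JSON file per table."""

    def __init__(self, directory):
        self.directory = directory

    def put(self, table, rows):
        with open(os.path.join(self.directory, table + ".json"), "w") as f:
            json.dump(rows, f)

    def get(self, table):
        with open(os.path.join(self.directory, table + ".json")) as f:
            return json.load(f)


class MemoryStore:
    """Hypothetical in-memory backend: tables live in a dict."""

    def __init__(self):
        self.tables = {}

    def put(self, table, rows):
        self.tables[table] = list(rows)

    def get(self, table):
        return list(self.tables[table])


def make_store(config):
    """Choose the backend from configuration (keys are made up for this sketch)."""
    if config["table.backend"] == "memory":
        return MemoryStore()
    return FileStore(config["table.dir"])


def run_pipeline(store, readings):
    """Three stages that exchange data only through the store."""
    # Ingest: land raw (meter, kWh) readings.
    store.put("raw", readings)
    # ETL: aggregate usage per meter.
    totals = {}
    for meter, kwh in store.get("raw"):
        totals[meter] = totals.get(meter, 0) + kwh
    store.put("totals", sorted(totals.items()))
    # Analytics: name of the meter with the highest total usage.
    top = max(store.get("totals"), key=lambda row: row[1])
    return top[0]
```

Calling `run_pipeline(make_store({"table.backend": "memory"}), readings)` and the same call with `{"table.backend": "file", "table.dir": some_dir}` run identical pipeline code; only the configuration differs, which is the property the slide attributes to swapping HBase for Ampool.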
©2015 Slide 14
CDAP Demo Pipeline (Video)
©2015 Slide 15
Ampool with CDAP
Pipeline Baseline: Ampool & HBase
©2015 Slide 16
Key Takeaways
Ampool complements CDAP…
• CDAP simplifies the development of complex big data pipelines and offers extensibility at multiple layers
• In-memory technology such as Geode promises higher performance in certain use cases
• Ampool, powered by Geode, is able to show immediate performance gains without any pipeline re-engineering!
• Future…
©2015 Slide 17
Compatible with the Future
©2015 Slide 18
Customer Behavior
Predictive Modeling
Pipeline stages: Ingest, ETL, Analytics, App Use
Data ingestion flows:
• Click streams (Kafka)
• Dim. tables (Sqoop)
2-stage MR pipeline:
• Cleanse data
• Sessionize clickstream
HAWQ stages:
• Data import (PXF)
• Exp. features (MADlib)
Spark modeling stages:
• Feature analysis (MLlib)
• Scoring (R / HAWQ)
[Diagram labels: ampool, … HDFS]
©2015 Slide 19
Security Analytics
Big Data Insights
Pipeline stages: Ingest, ETL, Analytics, App Use
Data ingestion flows:
• Security Logs (Flume)
Pig data processing:
• Joins logs w/ catalog
• Stores denorm. logs
Kylin stages:
• Pre-aggregations
• Export to HBase
Downstream Apps:
• Drill-down API for logs
• Web app integration
[Diagram labels: … ampool, HDFS]