How to build an elastically scalable, multi-tenant, FREE big data service Webinar
Jul 16, 2015
Alan Ho (@karlunho)
Shailendra Baxi (@sbaxi)
Rajesh Bhargava (@rbhargava)
youtube.com/apigee
slideshare.com/apigee
Agenda
1. What & Why we built this service
2. Demo
3. Technical Architecture
4. Developer Experience
Apigee Developer
What we built
Free big data service for building
context aware apps
Context Aware Apps are “Behavior Driven”
Developer Alternatives for Machine Learning
Amazon Machine Learning
Insights approach for Apigee Developer
• Accelerated Development
• Descriptive & Predictive
• Behavior Based Algorithms
• E2E Experience
• Free
Architecture
DATA
INSIGHTS
1. Data upload: Structured or Unstructured
2. Scalable: Volume, Variety & Velocity
3. Core IP: Machine Learning, Graph Processing, Unstructured Data
4. Analytics Offerings: Predictive & Journey analytics, segmentation
User Interactions
Prediction Journey Segmentation
Computational Algorithms: Machine Learning Library
Data Pipelines Unstructured Data
Processors GRASP Processor
Distributed Processing Foundation: Distributed Data and Job Management
Apache usergrid
Query Language
Modeling Work Bench User Interface
Transactional Datastore
Modeling, Scoring, Data Transformation,
Aggregation/Reporting
Ephemeral Hadoop Cluster
Management Service
Software Libraries: GRASP, Unstructured Data, Machine Learning
Insights Master
Data Staging Area
Monitoring service
Ingestion Datastore
GRASP Query Service
Query Datastore
Query Server
Real Time Service (Edge)
Real Time Datastore (usergrid)
node
Applications
UI, Modeling Workbench
Application Data
HTTPS, AWS APIs
HTTP(S)
Persistent Datastore
= S3
= HDFS
API
System Components
Metadata Service
Runtime Metadata: Job Queue, Job Dependencies, Dataset partitions
Metadata - Store
Static Metadata: Datastore & Dataset, Application, Job
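
The runtime metadata above includes a job queue and job dependencies. As a rough illustration only (the job names are hypothetical and this is not the Insights scheduler), a dependency map can be turned into a valid run order with a topological sort from the Python standard library:

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical jobs: each job maps to the set of jobs it depends on.
job_dependencies = {
    "ingest": set(),
    "transform": {"ingest"},
    "train_model": {"transform"},
    "score": {"train_model"},
    "report": {"transform"},
}


def execution_order(dependencies):
    """Return one valid order in which every job runs after its dependencies."""
    return list(TopologicalSorter(dependencies).static_order())


order = execution_order(job_dependencies)
print(order)
```

Several orders can be valid for the same map; the only guarantee is that a job never appears before something it depends on.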
How does Insights work?
Ingest Customer Data
• Batch or browser based
• Event based or Customer profile
Aggregate behavior graphs
• Cross-channel, domain-agnostic customer journey graphs
• Enriched with Customer profile
Query capability and machine learning
• Customer journey visualization
• Models & Scores
Data scientist + developer support
• R interface for predictive modeling on Hadoop
• Integrated with API Edge (incl. BaaS, node.js)
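
To make "aggregate behavior graphs" concrete, here is a minimal sketch that counts transitions between consecutive events per customer across channels. The event tuple layout and the sample events are assumptions for illustration; this is not the GRASP implementation.

```python
from collections import Counter, defaultdict


def build_journey_graph(events):
    """Aggregate events into a weighted journey graph.

    events: iterable of (customer_id, timestamp, channel, action) tuples
            (a hypothetical layout chosen for this sketch).
    Returns a Counter mapping (from_step, to_step) to the number of times
    that transition was observed across all customers.
    """
    steps_by_customer = defaultdict(list)
    for customer_id, timestamp, channel, action in events:
        steps_by_customer[customer_id].append((timestamp, f"{channel}:{action}"))

    edges = Counter()
    for steps in steps_by_customer.values():
        steps.sort()  # order each customer's steps by timestamp
        for (_, src), (_, dst) in zip(steps, steps[1:]):
            edges[(src, dst)] += 1
    return edges


# Hypothetical sample data spanning two channels.
events = [
    ("c1", 1, "web", "view"),
    ("c1", 2, "mobile", "add_to_cart"),
    ("c1", 3, "web", "purchase"),
    ("c2", 1, "web", "view"),
    ("c2", 5, "mobile", "add_to_cart"),
]
graph = build_journey_graph(events)
```

Because each node is just a channel and action label, the same aggregation works for any domain, which is one way to read "domain-agnostic" here.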
Data Flow
Data stores:
• Customer Data store
• Persistent Data store
• HDFS on compute cluster
• Serving Data store (Customer, usergrid)
Steps:
1. Data Ingestion (Batch or Browser based)
2. Data Moved to Persistent storage
3. Data brought to the compute cluster for processing
4. Processed Data exported to appropriate location
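
The four steps of the data flow can be sketched as a single function. This is only an illustration under stated assumptions: plain dictionaries stand in for the persistent and serving data stores, a local list stands in for HDFS on the compute cluster, and the function and key names are hypothetical.

```python
def run_data_flow(raw_records, persistent_store, serving_store, process):
    """Walk one batch through the four stages of the data flow."""
    # 1. Data ingestion: accept a batch of records.
    batch = list(raw_records)

    # 2. Move the data to persistent storage (a dict stands in for it here).
    persistent_store["batch"] = batch

    # 3. Bring the data to the compute cluster for processing
    #    (a local copy stands in for HDFS on the cluster).
    working_copy = list(persistent_store["batch"])
    results = process(working_copy)

    # 4. Export the processed data to the serving data store.
    serving_store["results"] = results
    return results


persistent, serving = {}, {}
summary = run_data_flow(
    [{"user": "a"}, {"user": "b"}],
    persistent,
    serving,
    process=lambda rows: {"count": len(rows)},
)
```

Keeping the durable copy separate from the working copy is what lets the compute cluster be thrown away after a job without losing data.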
Data level Multi-tenancy
(Architecture diagram repeated from the System Components slide, annotated with:)
Datasets segregated/sharded by Account ID
Data keyed by account ID
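
A minimal sketch of data keyed by account ID follows. The class and method names are hypothetical and an in-memory dictionary stands in for the real datastores; the point is only that every read and write is scoped by the account ID, so one tenant cannot address another tenant's data.

```python
class TenantScopedStore:
    """In-memory stand-in for a datastore whose keys always include the account ID."""

    def __init__(self):
        self._data = {}

    def put(self, account_id, dataset, key, value):
        # The account ID is part of every storage key.
        self._data[(account_id, dataset, key)] = value

    def get(self, account_id, dataset, key):
        # Raises KeyError if this account has no such record.
        return self._data[(account_id, dataset, key)]

    def list_keys(self, account_id, dataset):
        # Only keys belonging to the requesting account are visible.
        return sorted(
            k for (a, d, k) in self._data if a == account_id and d == dataset
        )


store = TenantScopedStore()
store.put("acct-1", "events", "k1", "tenant one data")
store.put("acct-2", "events", "k1", "tenant two data")
store.put("acct-2", "events", "k2", "more tenant two data")
```

Both accounts use the key `k1` without colliding, because the account ID is part of the full key.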
Scalability
(Architecture diagram repeated from the System Components slide, annotated with:)
Horizontal Scaling
Elastic/Ephemeral scaling
Sharding
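
Sharding and elastic/ephemeral scaling can be illustrated with two small functions. Both are sketches under assumptions: the hash-based shard assignment and the sizing rule (including the parameter names) are hypothetical and are not taken from the Insights implementation.

```python
import hashlib
import math


def shard_for(account_id: str, num_shards: int) -> int:
    """Map an account ID to a shard using a stable hash.

    hashlib is used rather than the built-in hash(), whose string hashing
    varies between Python processes.
    """
    digest = hashlib.sha256(account_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def cluster_size(pending_jobs: int, jobs_per_node: int, max_nodes: int) -> int:
    """Size an ephemeral cluster from the queue depth.

    Returns 0 when nothing is queued, so the cluster can be shut down
    entirely between jobs.
    """
    if pending_jobs <= 0:
        return 0
    return min(max_nodes, math.ceil(pending_jobs / jobs_per_node))
```

Scaling to zero when the queue is empty is the property that keeps idle tenants from costing compute.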
Insights UI & APIs
• HTML5 Single page application
• Interacts with RESTful APIs
• Guide a novice user through the experience – Help them
understand important Predictive / Machine learning concepts
• Scalable REST API infrastructure
Insights R SDK
Developer Resources
• E2E Recommendation Tutorial: try it free!
• Sample Datasets
• Blog posts, Embedded Documentation
Try it out: Apigee Developer
https://accounts-beta.apigee.com
Summary
• Be practical when approaching multi-tenancy
• Cost can be drastically reduced with elastic scaling & multi-tenancy
• Developer Experience requires continual refinement
• Try out our free service for yourself!