Jun 28, 2015
Creating a platform for unlimited elastic computation power and storage
Building An Elastic Real Time NoSQL Platform
® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
Motivation
• Complete elastic solution stack
• Applications that need massive “strategic” storage (disk-based NoSQL) and a real time (“tactical”) component
• Horizontally and vertically scalable
• Highly available
• Self healing
• Fault tolerant: suitable for commodity h/w strategy
• Simplified management and monitoring, vs conventional, multi-product solutions
What Is Real-Time?
• It’s all relative
• In this context, it means “really fast”.
• How fast is really fast? Reads as low as 5 μs, and typically under 1 ms for a fully replicated write.
Source: http://blog.gigaspaces.com/2010/12/06/possible-impossibility-the-race-to-zero-latency/
Two Layer Approach
• Advantage: minimal “impedance mismatch” between layers.
  – Both are NoSQL cluster technologies, with similar advantages
• Grid layer serves as an in memory cache for interactive requests.
• Grid layer serves as a real time computation fabric for CEP, and a limited (to allocated memory) real time map/reduce capability.
[Figure labels: Raw Event Stream (x3), In Memory Compute Cluster, Raw And Derived Events, NoSQL Cluster, Reporting Engine, SCALE (x2)]
Two Layer Approach (continued)
• Grid layer doing CEP can act as a filter, as many raw events get converted to semantic/business events, reducing meaningless data verbosity
• Grid layer provides scalable messaging
• NoSQL layer provides unlimited cheap storage on commodity hardware
• NoSQL layer provides virtually unlimited scale processing power
Basics Of In Memory DataGrid Technology
• An In Memory Data Grid (IMDG) is a data store
• Grid just means “cluster”
• Data can be partitioned across cluster nodes
• Processing power near data storage
• Distributed hash table
• Application optimized data model denormalization
• Nodes are typically configured with one or more replicas (sound familiar yet?)
• Not a “cache”: a system of record, but can be used as a cache, or both
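To make the partitioning and replica ideas concrete, here is a minimal Python sketch of a hash-partitioned store with one backup per partition. It is a toy illustration, not the GigaSpaces API: the class `PartitionedGrid` and its methods are hypothetical names, and replication is a simple synchronous copy inside one process.

```python
import hashlib


class PartitionedGrid:
    """Toy data grid (hypothetical): hash-partitioned, one backup per partition."""

    def __init__(self, num_partitions: int) -> None:
        self.num_partitions = num_partitions
        self.primaries = [dict() for _ in range(num_partitions)]
        self.backups = [dict() for _ in range(num_partitions)]

    def partition_for(self, key: str) -> int:
        # Use a stable hash; Python's built-in hash() is salted per process for str.
        digest = hashlib.md5(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.num_partitions

    def put(self, key: str, value) -> None:
        p = self.partition_for(key)
        self.primaries[p][key] = value
        self.backups[p][key] = value  # synchronous replication to the backup

    def get(self, key: str):
        return self.primaries[self.partition_for(key)].get(key)

    def fail_over(self, partition: int) -> None:
        # Promote the backup to primary and create a fresh backup copy.
        self.primaries[partition] = self.backups[partition]
        self.backups[partition] = dict(self.primaries[partition])
```

Every key routes to exactly one partition, so capacity grows by adding partitions, while the backup copy lets a partition survive the loss of its primary.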
Advanced Capabilities
• Business logic (code) co-resident with data shards
• Scalable messaging
• Dynamic code execution across cluster
• Multi-language support
• Object-oriented
• Document-oriented/schema free
• Multi-level indexing
• SQL Queries
• Full ACID transaction support
• Elastic scaling (automatic and manual)
• Write-behind persistence
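As a rough illustration of code running next to data shards, the sketch below sends a task to every partition and reduces the partial results. The function `execute_across_partitions` is a hypothetical name, partitions are modeled as plain dicts, and the tasks run sequentially here; a real grid would run each task in parallel on the node that owns the partition.

```python
from functools import reduce
from typing import Any, Callable, Dict, List


def execute_across_partitions(
    partitions: List[Dict[str, Any]],
    task: Callable[[Dict[str, Any]], Any],
    reducer: Callable[[Any, Any], Any],
    initial: Any,
) -> Any:
    """Run `task` against each partition's local data, then combine the results."""
    # Map step: only the small partial result leaves each partition, not the data.
    partials = [task(partition) for partition in partitions]
    # Reduce step: combine partial results on the caller's side.
    return reduce(reducer, partials, initial)


# Usage: total order value across three partitions.
partitions = [{"a": 5, "b": 7}, {"c": 1}, {}]
total = execute_across_partitions(
    partitions,
    task=lambda data: sum(data.values()),
    reducer=lambda x, y: x + y,
    initial=0,
)
```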
Features: IMDG vs NoSQL
[Figure: features of Data Grid vs Disk Based NoSQL. Feature labels:]
Eventual/Tunable Consistency
Fault Tolerant
Highly Available
Horizontally Scalable
Low Latency
Parallel Execution
Code co-location
Unlimited scale
Service remoting
Transactional
Elastic
Messaging
Complex Event Processing
Platform Independent
Flexible Schema
Cloud enabled
Hadoop tools
Vive La Difference
• The IMDG complements a NoSQL store:
  – Can serve as a short term request cache (side cache or inline)
  – Can serve as a cache for MR results
  – Enables event driven architectures / CEP
  – In memory map/reduce
  – Very fast writes, regardless of NoSQL store
  – Transactional layer: can essentially turn “eventual” consistency into pure transactional persistence without a performance hit
  – Highly available and independently scalable
A Complete Scalable Application Platform
[Figure labels: Raw Event Stream (x3), In Memory Compute Cluster, Real Time Events (x2), Raw And Derived Events, NoSQL Cluster, Reporting Engine, SCALE (x2)]
Key Implementation Issues
• Grid must support reliable asynchronous persistence
  – If not reliable: in-flight data is at risk. Ideally tunable to accommodate differing risk tolerance.
  – If not asynchronous: too slow
  – If not persistent: obviously nothing gets sent to disk
• To do more than a distributed cache, grid must support code and data partitioning
  – Ideally, code is collocated in memory with data partition
  – Needed to support CEP, application, and service remoting capabilities
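The asynchronous persistence requirement can be sketched as a write-behind queue: writes are acknowledged from memory and a background worker drains them to the backing store. This is a single-process illustration under assumed names (`WriteBehindGrid`, `flush`); the `store` dict stands in for the NoSQL layer, and the reliability aspect (replicating the pending queue to a backup node) is only noted in a comment, not implemented.

```python
import queue
import threading


class WriteBehindGrid:
    """Toy write-behind store (hypothetical): memory first, disk layer later."""

    def __init__(self, store: dict) -> None:
        self.memory = {}
        self.store = store  # stand-in for the disk-based NoSQL layer
        # FIFO log of writes not yet persisted. A reliable grid would also
        # replicate this log to a backup so in-flight data survives a failure.
        self.pending = queue.Queue()
        self.worker = threading.Thread(target=self._drain, daemon=True)
        self.worker.start()

    def write(self, key, value) -> None:
        # The caller returns as soon as memory is updated (asynchronous).
        self.memory[key] = value
        self.pending.put((key, value))

    def _drain(self) -> None:
        # Background worker: persist queued writes in arrival order.
        while True:
            key, value = self.pending.get()
            self.store[key] = value
            self.pending.task_done()

    def flush(self) -> None:
        # Block until every queued write has reached the store.
        self.pending.join()
```

The caller's write latency depends only on the memory update; the slower store write happens off the request path.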
Key Implementation Issues (continued)
• Grid ideally supports FIFO entry ordering
  – Key to using grid as a queue
  – Key to scaling messaging without an additional tier
  – Combined with co-located business logic, operates at memory speeds
• Write speed on the NoSQL layer
  – Grid is, in effect, queuing entries to the NoSQL layer
  – If the NoSQL layer cannot keep up, the in memory grid backs up
  – This buffering behavior is an asset, unless an unanticipated, sustained flood occurs.
  – The faster the write speed the better
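The queuing behavior described above can be sketched with a bounded FIFO buffer: the grid absorbs bursts while the NoSQL layer catches up, and only a sustained flood fills the buffer. `BoundedWriteQueue`, `offer`, and `drain` are hypothetical names; the capacity stands in for the memory allocated to the grid, and the `store` list stands in for the NoSQL layer.

```python
import queue


class BoundedWriteQueue:
    """Toy FIFO buffer (hypothetical) between the grid and the NoSQL layer."""

    def __init__(self, capacity: int) -> None:
        self.entries = queue.Queue(maxsize=capacity)

    def offer(self, entry) -> bool:
        # Accept the entry if there is room; report back-pressure otherwise.
        try:
            self.entries.put_nowait(entry)
            return True
        except queue.Full:
            return False

    def drain(self, store: list, max_entries: int) -> int:
        # Write up to max_entries to the store, oldest first (FIFO order).
        written = 0
        while written < max_entries:
            try:
                store.append(self.entries.get_nowait())
            except queue.Empty:
                break
            written += 1
        return written


# Usage: a burst of 5 writes against a buffer of 3, then a slow store catches up.
buffer = BoundedWriteQueue(capacity=3)
accepted = [buffer.offer(i) for i in range(5)]
store = []
first_batch = buffer.drain(store, max_entries=2)
```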
Use Case 1 – Event Cloud
• Complex event processing
• Collect events in real time: interactions, orders, bills, payments, activations, …
• Transform into decision factors: good customer, pays 3-6 days early, decreasing usage, missed payment, unusual bill, app usage
• Original events, possibly scrubbed or annotated, are passed through
• Business-logic-derived “synthetic events” are constructed from the raw event stream. Possible rule engine integration (e.g. Drools).
• Derived events and analytics passed on to NoSQL layer
• Other events forwarded to external listeners, systems
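A minimal sketch of the raw-to-derived transformation, using the “pays 3-6 days early” decision factor from the list above. The event types (`Payment`, `DerivedEvent`) and the `process` function are hypothetical; a production system might delegate the rule to a rule engine instead of hard-coding it.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Iterator, Union


@dataclass(frozen=True)
class Payment:
    """Raw event (hypothetical shape)."""
    customer: str
    due: date
    paid: date


@dataclass(frozen=True)
class DerivedEvent:
    """Synthetic event produced by business logic (hypothetical shape)."""
    customer: str
    factor: str


def process(events: Iterable[Payment]) -> Iterator[Union[Payment, DerivedEvent]]:
    for event in events:
        # The original event is passed through unchanged.
        yield event
        # Rule: derive a decision factor when payment arrives 3-6 days early.
        days_early = (event.due - event.paid).days
        if 3 <= days_early <= 6:
            yield DerivedEvent(event.customer, "pays 3-6 days early")


# Usage: one early payer, one customer paying on the due date.
raw = [
    Payment("alice", due=date(2015, 6, 10), paid=date(2015, 6, 6)),
    Payment("bob", due=date(2015, 6, 10), paid=date(2015, 6, 10)),
]
output = list(process(raw))
```

Both raw and derived events come out of the same stream, so they can be forwarded together to the NoSQL layer or to external listeners.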
Use Case 2 – Time Bounded
• Suited to operations with a daily business cycle (e.g. trading)
• Current day (or other time period that will fit in memory) held in memory, along with related application state, caching, etc.
• Operations still stream to the underlying NoSQL platform, or are held for an end of day flush if the back end can’t write fast enough.
• Supports application hosting, messaging, and complex event processing.
• External clients are aware of the “current day” store, vs archival.
• Large scale reports/analytics run in background on the NoSQL archive.
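The “current day” vs archival split can be sketched as follows, assuming the end of day flush variant. `TimeBoundedGrid` and `roll_over` are hypothetical names, and the `archive` dict (keyed by day) stands in for the NoSQL platform.

```python
from datetime import date


class TimeBoundedGrid:
    """Toy store (hypothetical): current day in memory, older days archived."""

    def __init__(self, today: date, archive: dict) -> None:
        self.today = today
        self.current = {}       # in-memory entries for the current day
        self.archive = archive  # stand-in for NoSQL: {day: {key: value}}

    def write(self, key, value) -> None:
        self.current[key] = value

    def read(self, day: date, key):
        # Clients choose the store by day: memory for today, archive otherwise.
        if day == self.today:
            return self.current.get(key)
        return self.archive.get(day, {}).get(key)

    def roll_over(self, new_day: date) -> None:
        # End of day flush: move the day's entries to the archive, start fresh.
        self.archive.setdefault(self.today, {}).update(self.current)
        self.current = {}
        self.today = new_day
```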
Use Case 3 – LRU
• Grid holds a subset of NoSQL store, and supports an LRU caching model.
• In line or side-cache.
• Appropriate only in cases where, like any cache, usage pattern does not generate many cache misses.
• Still supports CEP, messaging, and computation scaling (provided grid product supports it).
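A small sketch of the inline (read-through) variant of the LRU model, built on Python's `collections.OrderedDict`. `LruGrid` is a hypothetical name, the `store` dict stands in for the NoSQL layer, and the miss counter is only there to make the cost of a poor usage pattern visible.

```python
from collections import OrderedDict


class LruGrid:
    """Toy read-through LRU cache (hypothetical) in front of a backing store."""

    def __init__(self, store: dict, capacity: int) -> None:
        self.store = store        # stand-in for the NoSQL layer
        self.capacity = capacity  # max entries held in memory
        self.cache = OrderedDict()
        self.misses = 0

    def get(self, key):
        if key in self.cache:
            # Hit: mark the entry as most recently used.
            self.cache.move_to_end(key)
            return self.cache[key]
        # Miss: load from the backing store (raises KeyError if absent there).
        self.misses += 1
        value = self.store[key]
        self.cache[key] = value
        if len(self.cache) > self.capacity:
            # Evict the least recently used entry.
            self.cache.popitem(last=False)
        return value
```

Every miss pays the latency of the disk-based layer, which is why this model only suits access patterns with a high hit rate.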
Wishlist
• This platform concept is still at an early stage
• For Gigaspaces, integrations already exist for Cassandra and MongoDB.
• Customers are currently implementing solutions
• Stuff I’d like to see:
  – Unified management and scaling. Shared infrastructure.
  – Grid/NoSQL aware Hive façade that can run MR jobs on both. Perhaps integration with other Hadoop tools.
  – Deeper integration, to further optimize write speed/capacity, and perhaps offload some in-memory aspects of the underlying NoSQL platform to minimize duplication and possibly optimize elasticity.
Conclusion
• Two shared nothing “NoSQL” architectures complementing each other
• Fully elastic/scalable
• Ultra high performance/low latency combined with unlimited scale.
• Full application stack
• Highly reliable and self-healing
• Scalable complex event handling
• Multi-language
• Simple. Two products.