Apache Apex + Apache Geode In-Memory Streaming, Storage & Analytics Ashish Tadose
Apache Apex + Apache GeodeIn-Memory Streaming, Storage & Analytics
Ashish Tadose
Streaming meets In Memory Data Grid
Apache Geode: Listeners• CacheWriter / CacheListener• AsyncEventListener (queue / batch)
• Parallel or Serial• Conflation
3
Apache Geode: Events & NotificationsRegister Interest•Individual Keys OR RegEx for Keys•Updates Local Copy•Examples:
• region.registerInterest(“key-1”);• region1.registerInterestRegex(“[a-z]+“);
Continuous Query•Receive Notification when Query condition met on server•Example:– SELECT * FROM /tradeOrder t WHERE t.price > 100.00
Can be DURABLE
4
Apex: Checkpointing Today
er
Operator
er
Operator
er
Operator
Filtered
Stream
Filtered
Stream
er
OperatorInput
Stream
Enriched
Stream
Enriched
Stream
Output
Stream
Checkpoint
State
Checkpoint
State
Checkpoint
State
Persistence
In-Memory
Apex: Checkpointing with Geode
er
Operator
er
Operator
er
Operator
Filtered
Stream
Filtered
Stream
er
OperatorInput
Stream
Enriched
Stream
Enriched
Stream
Output
Stream
Checkpoint
State
Checkpoint
State
Checkpoint
State
Persistence
In-Memory
Operator Checkpointing in Geode
Apex Operator check-pointing in an IMDG (Geode store)
•Checkpointing is an essential mechanism to ensure Fault Tolerance•Apex checkpoints operator state to HDFS•Slower HDFS checkpointing hurts application performance•Checkpointing in Geode ensures that application performance is not impacted •Geode has better latency for write operations than HDFS.
Implementation: GeodeStorageAgent
https://issues.apache.org/jira/browse/APEXCORE-283
Apex: Input/Output Operators
er
Operatorer
OutputOperator
Input
StreamOutput
Stream
Checkpoint
State
Checkpoint
State
Data Store
In-Memory
…
No SQL Database
er
Operatorer
Geode Output
Input
StreamOutput
Stream
Checkpoint
State
Checkpoint
State
Data Store
In-Memory
…
Geode Output Operator
• Built-in OQL support
• Visualization support
• Persistence options
• Transaction support
Data Streams to Geode Store
Apex + Geode: Future Integrations
• Geode output operator with transactional support
• Input Operator: Ingest data from Geode to Apex DAG
• Distributed Cache Operator
• Scan Operator: Parallel query execution & result retrieval
Geode Transaction Operator
Apex Output Operator to write to Geode store with Transactions
•Apex DAG uses TransactionableStore to provide guarantee that records are written are exactly once. E.g. JdbcTransactionalStore
•Geode provides transaction support for efficient and safe coordinated operations•Geode store using transactions guarantee that records are written exactly once•Put operator backed by GeodeTransactional store can help to achieve Exactly once semantics
Implementation: GeodeWindowStore as TransactionableStore
Proposed
Input Operator: Streaming Geode data
Apex Input Operator to read from Geode store
•Apex Input operators – Ingest data from external sources into Apex DAG
•Geode provides versatile and reliable event distribution to provide Real Time updates to data• Use case – Apex operator to stream async events from Geode in DAG• Call back events reduce polling cycles over network
Implementation: GeodeRegionStreamOperator receives a newly added tuples and emits in DAG
Proposed
Geode Cache Operator
Apex+Geode Cache Operator
•Geode provides efficient Events & Notifications • Register interest – update local copies • Continuous Query
• Receive notification when Query condition met on server• Eg.g SELECT * FROM /tradeOrder t WHERE t.price > 100.00
•Use Geode events notification framework to maintain & invalidate cache.
Implementation: GeodeCacheOperatormaintains consistent cache based on subscribed keyset/query
Proposed
Geode Scan Operator
Apex+Geode Scan Operator
•Function Execution provides Parallel Query Execution•MapReduce like execution - concurrent execution on members & results are collected from members & sent to caller. •Use case: Streaming application depending on large scan result from external store
Implementation: GeodeQueryOperator execute data dependent queries on distributed regionemit results in DAG
Proposed
Questions ???
Thank You …