Apache Geode (incubating), and Pivotal's leadership role in open sourcing GemFire
Nitin Lamba
Agenda
• Pivotal's Open Source strategy
• What is Apache Geode?
• History
• Differentiators
• Basic Concepts
• Resources
• Q & A
In 2015, Pivotal granted the components of its Big Data Suite to open source
• 6 million lines of code
• 4 new open source communities
From GEMFIRE to GEODE…
[Timeline figure; dates shown: May 2015, Sept 2015, Sept 2015, Oct 2015]
What is GEODE?
A distributed, memory-based data management platform for data-oriented apps that need:
• high performance, scalability, resiliency and continuous availability
• fast access to critical data sets
• location-aware distributed data processing
• event-driven data architecture
Incubating but ROCK solid…
• 1000+ systems in production (real customers)
• Cutting edge use cases
[Timeline figure; axis: <2000, 2004, 2008, 2012, 2016]
• Early drivers: Data Volumes; Margins/transactions; IT maintenance costs; Elasticity needs
• Real-time needs: Real-time response; Time to market needs; Flexible Data Models; Persistent+In-memory
• Global Data: Visibility across DC; Fast Ingest; Device to enterprise; Uptime (always on)
• Open Source!: Apache Incubation; Gemfire > Geode; Geode M1 release; 1st Geode Summit
Users shown on the timeline: Financial Services, US DoD, Trade Clearing, Travel Portal, Online Gambling, Telcos, Manufacturing, Auto, Insurance, Payroll processing, Rail systems
…with both SCALE and SPEED, …
• 40K transactions per second
• 3 TB of data in-memory
• 17B records in-memory
• 120K concurrent users
… and impacting a LOT of people!
China Railway Corporation and Indian Railways: 17% + 19% = 36% of the world population
High-level Architecture
A Peer-2-Peer in-memory Distributed System
[Diagram: a Locator and multiple Servers (four shown), with REST access]
Powerful app development kit
• APIs: Java & REST
• Adapters: Redis, Lucene*, Spark*, …
Multiple persistence options
• Filesystem, RDBMS or HDFS*
• Sync: read-through, write-through
• Async: write-behind
Durable <K,V> cache/store
• Data replicated or partitioned
• Redundant storage in-memory/disk
• Flexible data retention policies
* Experimental and awaiting community feedback
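The difference between the synchronous options (read-through, write-through) and the asynchronous one (write-behind) can be illustrated with a small sketch. This is plain Python, not the Geode API: the class `BackedCache`, its methods and the dict standing in for a filesystem or RDBMS are all hypothetical names chosen for illustration.

```python
from collections import deque


class BackedCache:
    """Toy cache in front of a slower backing store (a plain dict here)."""

    def __init__(self, store, write_behind=False):
        self.store = store              # stands in for a filesystem or RDBMS
        self.memory = {}                # in-memory copy of the data
        self.write_behind = write_behind
        self.pending = deque()          # queued writes (write-behind only)

    def get(self, key):
        # Read-through: on a miss, load from the store and keep the value.
        if key not in self.memory and key in self.store:
            self.memory[key] = self.store[key]
        return self.memory.get(key)

    def put(self, key, value):
        self.memory[key] = value
        if self.write_behind:
            self.pending.append((key, value))   # async: persist later
        else:
            self.store[key] = value             # sync: write-through

    def flush(self):
        # Drain queued writes in arrival order; returns how many were applied.
        applied = 0
        while self.pending:
            key, value = self.pending.popleft()
            self.store[key] = value
            applied += 1
        return applied


db = {"k1": "v1"}
sync_cache = BackedCache(db)
read_value = sync_cache.get("k1")        # miss in memory, loaded from db
sync_cache.put("k2", "v2")               # db updated before put returns

async_cache = BackedCache(db, write_behind=True)
async_cache.put("k3", "v3")              # db not updated yet
seen_before_flush = "k3" in db
flushed = async_cache.flush()            # db catches up here
```

In this sketch the write-behind variant trades durability of the latest writes for lower put latency, which is the usual reason to choose an asynchronous option.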
What makes it go FAST?
• Minimize copying
• Minimize contention points
• Run user code in-process
• Partitioning & parallelism
• Avoid disk seeks
• Automated benchmarks
Let’s talk about a few BASIC CONCEPTS…
• Cache
• Region
• Member
• Client Cache
• Persistence
• Functions
What is a CACHE?
• In-memory storage and management for your data
• Configurable through XML, Java API or CLI
• Collection of Regions
What is a REGION?
• Distributed java.util.Map on steroids (Key/Value)
• Consistent API regardless of where or how data is stored
• Observable (reactive)
• Highly available, redundant on cache Member(s)
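To make "observable key/value map" concrete, here is a minimal single-process sketch. It is not Geode code: `ObservableRegion`, its listener signature and the event names "create", "update" and "destroy" are made up for this illustration, and distribution and redundancy are left out.

```python
class ObservableRegion:
    """Toy key/value region that notifies listeners after each change."""

    def __init__(self, name):
        self.name = name
        self._data = {}
        self._listeners = []

    def add_listener(self, callback):
        # callback(event_type, key, old_value, new_value)
        self._listeners.append(callback)

    def put(self, key, value):
        old = self._data.get(key)
        event = "update" if key in self._data else "create"
        self._data[key] = value
        self._notify(event, key, old, value)

    def get(self, key):
        return self._data.get(key)

    def remove(self, key):
        if key in self._data:
            old = self._data.pop(key)
            self._notify("destroy", key, old, None)

    def _notify(self, event, key, old, new):
        for callback in self._listeners:
            callback(event, key, old, new)


events = []
region = ObservableRegion("customers")
region.add_listener(lambda *event: events.append(event))
region.put("k1", "v1")      # recorded as a create
region.put("k1", "v2")      # recorded as an update
region.remove("k1")         # recorded as a destroy
```

The caller only ever sees put, get and remove; where the data lives is hidden behind the same interface, which is the point of the "consistent API" bullet.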
Region: Types & Options
• Local, Replicated or Partitioned
• In-memory or persistent
• Redundant
• LRU
• Overflow
Region type names:
• LOCAL, LOCAL_HEAP_LRU, LOCAL_OVERFLOW, LOCAL_PERSISTENT, LOCAL_PERSISTENT_OVERFLOW
• PARTITION, PARTITION_HEAP_LRU, PARTITION_OVERFLOW, PARTITION_PERSISTENT, PARTITION_PERSISTENT_OVERFLOW, PARTITION_PROXY, PARTITION_PROXY_REDUNDANT, PARTITION_REDUNDANT, PARTITION_REDUNDANT_HEAP_LRU, PARTITION_REDUNDANT_OVERFLOW, PARTITION_REDUNDANT_PERSISTENT, PARTITION_REDUNDANT_PERSISTENT_OVERFLOW
• REPLICATE, REPLICATE_HEAP_LRU, REPLICATE_OVERFLOW, REPLICATE_PERSISTENT, REPLICATE_PERSISTENT_OVERFLOW, REPLICATE_PROXY
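The LRU and Overflow options combine naturally: when memory is full, the least recently used entry is moved out rather than lost. The sketch below shows that idea only. It assumes a fixed entry count as the limit (the real options are heap based), and `LruOverflowRegion` and its dict-based overflow area are hypothetical stand-ins, not Geode classes.

```python
from collections import OrderedDict


class LruOverflowRegion:
    """Keeps at most `capacity` entries in memory; the rest overflow."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.memory = OrderedDict()   # least recently used entry first
        self.overflow = {}            # stands in for an on-disk overflow area

    def put(self, key, value):
        self.overflow.pop(key, None)  # a fresh value supersedes a spilled one
        self.memory[key] = value
        self.memory.move_to_end(key)
        self._evict()

    def get(self, key):
        if key in self.memory:
            self.memory.move_to_end(key)      # mark as recently used
            return self.memory[key]
        if key in self.overflow:
            value = self.overflow.pop(key)    # fault the entry back in
            self.memory[key] = value
            self._evict()
            return value
        return None

    def _evict(self):
        while len(self.memory) > self.capacity:
            old_key, old_value = self.memory.popitem(last=False)
            self.overflow[old_key] = old_value


region = LruOverflowRegion(capacity=2)
region.put("k1", "v1")
region.put("k2", "v2")
region.get("k1")          # k1 is now more recently used than k2
region.put("k3", "v3")    # over capacity: k2 overflows
```

Reading an overflowed key brings it back into memory and pushes out whichever entry is then least recently used, so no data is dropped.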
Persistent Regions
• Durability
• WAL for efficient writing
• Consistent recovery
• Compaction
[Diagram: Server 1 … Server N]
What is a MEMBER?
• A process that has a connection to the system
• A process that has created a cache
• Embeddable within your application
[Diagram: Client, Locator, Server]
What is a CLIENT CACHE?
• A process connected to the Geode server(s)
• Can have a local copy of the data
• Run OQL queries on local data
• Can be notified about events on the servers
Persistence - Shared Nothing
[Diagram sequence: Server 1, Server 2 and Server 3 hold buckets B1, B2 and B3, each bucket with a Primary and a Secondary copy]
• Server 1 waits for others when it starts
• Fetches missed operations on restart
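A rough sketch of the restart behaviour follows. The slides do not say which server holds which bucket copy, so the round-robin placement here is an assumption, as are the names `Server`, `place_buckets`, `write` and `restart`. Each server keeps its own private "disk" (shared nothing), and the step where a server waits for its peers before starting is not modelled.

```python
class Server:
    def __init__(self, name):
        self.name = name
        self.online = True
        self.disk = {}   # bucket -> list of (key, value) ops, private to this server


def place_buckets(buckets, servers):
    """Round-robin placement: primary and secondary on different servers."""
    placement = {}
    for i, bucket in enumerate(buckets):
        primary = servers[i % len(servers)]
        secondary = servers[(i + 1) % len(servers)]
        placement[bucket] = (primary, secondary)
    return placement


def write(placement, bucket, key, value):
    """Append the operation on every online copy of the bucket."""
    for server in placement[bucket]:
        if server.online:
            server.disk.setdefault(bucket, []).append((key, value))


def restart(server, placement):
    """Bring a server back and fetch operations it missed from the peer copy."""
    server.online = True
    for bucket, copies in placement.items():
        if server not in copies:
            continue
        peer = copies[0] if copies[1] is server else copies[1]
        mine = server.disk.setdefault(bucket, [])
        theirs = peer.disk.get(bucket, [])
        mine.extend(theirs[len(mine):])   # only the tail that was missed


servers = [Server("Server 1"), Server("Server 2"), Server("Server 3")]
placement = place_buckets(["B1", "B2", "B3"], servers)
write(placement, "B1", "k1", "v1")       # both copies of B1 are online
servers[0].online = False                # Server 1 goes down
write(placement, "B1", "k1", "v2")       # only the surviving copy records it
restart(servers[0], placement)           # Server 1 catches up from its peer
```

The catch-up step assumes both copies share a common prefix of operations, which holds in this toy because operations are only ever appended.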
Persistence - Operational Logs
[Diagram: Member 1 receives Put k6->v6 and appends to the operation log, stored in Oplog1.crf and Oplog2.crf]
Log entries, in order: Create k1->v1, Create k2->v2, Modify k1->v3, Create k4->v4, Modify k1->v5, Create k6->v6
Persistence - Operational Logs: Compaction
[Diagram: Member 1 receives Put k6->v6 and appends to the operation log, stored in Oplog1.crf and Oplog2.crf; compaction copies live data forward]
Log entries, in order: Create k1->v1, Create k2->v2, Modify k1->v3, Create k4->v4, Modify k1->v5, Create k6->v6
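"Copy live data forward" can be sketched with two in-memory lists standing in for Oplog1.crf and Oplog2.crf. The split of the six entries between the two files is an assumption (the diagram does not survive in enough detail to tell), and the functions `recover` and `compact` are illustrative names rather than Geode internals. Deletes are not modelled.

```python
def recover(oplogs):
    """Replay logs oldest to newest; the last record for a key wins."""
    data = {}
    for oplog in oplogs:
        for _op, key, value in oplog:
            data[key] = value
    return data


def compact(old_oplog, current_oplog):
    """Copy still-live records from old_oplog forward, then empty it."""
    superseded = {key for _op, key, _value in current_oplog}
    last_index = {key: i for i, (_op, key, _value) in enumerate(old_oplog)}
    for i, (op, key, value) in enumerate(old_oplog):
        # Live means: newest record for this key in the old log, and no
        # newer record for the key in the current log.
        if last_index[key] == i and key not in superseded:
            current_oplog.append((op, key, value))
    old_oplog.clear()


# Assumed split: first four entries in Oplog1, last two in Oplog2.
oplog1 = [
    ("Create", "k1", "v1"),
    ("Create", "k2", "v2"),
    ("Modify", "k1", "v3"),
    ("Create", "k4", "v4"),
]
oplog2 = [
    ("Modify", "k1", "v5"),
    ("Create", "k6", "v6"),
]

before = recover([oplog1, oplog2])
compact(oplog1, oplog2)
after = recover([oplog1, oplog2])
```

Only k2 and k4 are copied forward, because every k1 record in the old log has been superseded by the later modify. The recovered state is unchanged while the old log can be discarded.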
Functions
• Used for distributed concurrent processing (Map/Reduce, stored procedure)
• Highly available
• Data oriented
• Member oriented
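The Map/Reduce style of function execution can be approximated in a few lines. In this sketch each "member" is just a dict holding its local share of the data, and threads stand in for separate processes. The helper `execute` and its `keys` filter are hypothetical names, and the high-availability aspect (re-running on failure) is not shown.

```python
from concurrent.futures import ThreadPoolExecutor


def execute(members, function, keys=None):
    """Run `function` next to the data and collect one result per member.

    Without `keys`, every member runs the function on all of its local
    data (member oriented). With `keys`, only members holding some of
    those keys take part, each seeing just the matching entries (data
    oriented).
    """
    tasks = []
    for local_data in members:
        if keys is None:
            tasks.append(local_data)
        else:
            subset = {k: v for k, v in local_data.items() if k in keys}
            if subset:
                tasks.append(subset)
    with ThreadPoolExecutor() as pool:
        return list(pool.map(function, tasks))   # results keep member order


def local_sum(data):
    return sum(data.values())


members = [{"k1": 10, "k2": 20}, {"k3": 30}, {"k4": 40, "k5": 50}]

partial_sums = execute(members, local_sum)            # "map" on each member
total = sum(partial_sums)                             # "reduce" at the caller
filtered = execute(members, local_sum, keys={"k1", "k5"})
```

Only the small per-member results travel back to the caller, which is why moving the function to the data is cheaper than pulling the data to the function.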
Join the Community!
• Check out: http://geode.incubator.apache.org
• Subscribe: [email protected]
• Download: http://geode.incubator.apache.org/releases/
Thank you!
Additional Slides
Built for PERFORMANCE…
[Chart: YCSB Workloads, Cassandra vs. Geode. X-axis: A Reads, A Updates, B Reads, B Updates, C Reads, D Inserts, D Reads, F Reads, F Updates. Y-axis: Operations per second, 0 to 1,000,000]
…and horizontal, consistent SCALABILITY!
Horizontal scaling for reads, consistent latency and CPU
[Chart: speedup, latency (ms) and CPU % versus number of Server Hosts (2 to 10)]
• Scaled from 256 clients and 2 servers to 1280 clients and 10 servers
• Partitioned region with redundancy and 1K data size
High Availability