ApexMeetup Geode - Talk1 2016-03-17

Apache Geode, and Pivotal's leadership role in open sourcing (Gemfire)

Nitin Lamba

(incubating)

• Pivotal’s Open Source strategy
• What is Apache Geode?
• History
• Differentiators
• Basic Concepts
• Resources
• Q & A

Agenda


In 2015, Pivotal granted the components of its Big Data Suite to open source

• 6 Million Lines of Code
• 4 new open source communities

[Timeline figure: May 2015, Sept 2015, Sept 2015, Oct 2015]

From GEMFIRE to GEODE…

A distributed, memory-based data management platform for data oriented apps that need:
• high performance, scalability, resiliency and continuous availability
• fast access to critical data sets
• location-aware distributed data processing
• event-driven data architecture

What is GEODE?


• 1000+ systems in production (real customers)

• Cutting edge use cases

Incubating but ROCK solid…

Timeline: <2000, 2004, 2008, 2012, 2016

Early drivers
• Data Volumes
• Margins/transactions
• IT maintenance costs
• Elasticity needs

Real-time needs
• Real-time response
• Time to market needs
• Flexible Data Models
• Persistent+In-memory

Global Data
• Visibility across DC
• Fast Ingest
• Device to enterprise
• Uptime (always on)

Open Source!
• Apache Incubation
• Gemfire > Geode
• Geode M1 release
• 1st Geode Summit

Industries: Financial Services, US DoD, Trade Clearing, Travel Portal, Online Gambling, Telcos, Manufacturing, Auto, Insurance, Payroll processing, Rail systems

…with both SCALE and SPEED, …

• 40K transactions per second
• 3TB data in-memory
• 17B records in-memory
• 120K concurrent users

… and impacting a LOT of people!

China Railway Corporation and Indian Railways: 17% + 19% = 36% of the world population

High-level Architecture

Powerful app development kit
• APIs: Java & REST
• Adapters: Redis, Lucene*, Spark*, …

Multiple persistence options
• Filesystem, RDBMS or HDFS*
• Sync: read-through, write-through
• Async: write-behind

Durable <K,V> cache/store
• Data replicated or partitioned
• Redundant storage in-memory/disk
• Flexible data retention policies

[Diagram: a Locator and four Servers, with REST access]

A Peer-2-Peer in-memory Distributed System

* Experimental and waiting community feedback
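The three persistence modes named on this slide (read-through, write-through, write-behind) can be illustrated with a toy cache in front of a backing store. This is a minimal sketch in plain Java, not Geode's API: the class `TieredCacheSketch`, its method names and the `Map` standing in for a filesystem or RDBMS are all assumptions made for illustration.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

/** Illustrative only (hypothetical class, not Geode's API): a toy cache in front of a store. */
class TieredCacheSketch {
    private final Map<String, String> memory = new HashMap<>();
    private final Map<String, String> store;                     // stands in for a filesystem or RDBMS
    private final Queue<String[]> pending = new ArrayDeque<>();  // queued write-behind updates {key, value}

    TieredCacheSketch(Map<String, String> store) {
        this.store = store;
    }

    /** Read-through: on a miss, load from the store and keep the value in memory. */
    String get(String key) {
        String value = memory.get(key);
        if (value == null) {
            value = store.get(key);
            if (value != null) {
                memory.put(key, value);
            }
        }
        return value;
    }

    /** Write-through (sync): the store is updated before the call returns. */
    void putSync(String key, String value) {
        store.put(key, value);
        memory.put(key, value);
    }

    /** Write-behind (async): memory is updated now, the store write is queued for later. */
    void putAsync(String key, String value) {
        memory.put(key, value);
        pending.add(new String[] {key, value});
    }

    /** Drains the queued writes to the store in order and returns how many were applied. */
    int flush() {
        int applied = 0;
        String[] update;
        while ((update = pending.poll()) != null) {
            store.put(update[0], update[1]);
            applied++;
        }
        return applied;
    }

    public static void main(String[] args) {
        Map<String, String> db = new HashMap<>();
        db.put("k1", "v1");
        TieredCacheSketch cache = new TieredCacheSketch(db);
        System.out.println(cache.get("k1"));        // value loaded through from the store
        cache.putAsync("k2", "v2");
        System.out.println(db.containsKey("k2"));   // store not updated until flush()
        cache.flush();
        System.out.println(db.get("k2"));
    }
}
```

In this sketch `flush()` is called by hand; a real write-behind setup would typically drain the queue on a background thread.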

• Minimize copying
• Minimize contention points
• Run user code in-process
• Partitioning & parallelism
• Avoid disk seeks
• Automated benchmarks

What makes it go FAST?

• Cache
• Region
• Member
• Client Cache
• Persistence
• Functions

Let’s talk about a few BASIC CONCEPTS…


• In-memory storage and management for your data

• Configurable through XML, Java API or CLI

• Collection of Regions

What is a CACHE?


• Distributed java.util.Map on steroids (Key/Value)

• Consistent API regardless of where or how data is stored

• Observable (reactive)
• Highly available, redundant on cache Member(s)

What is a REGION?
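The "java.util.Map on steroids" and "observable" points can be pictured with a map that calls back registered listeners on every put. This is a single-process sketch under stated assumptions: `ObservableRegionSketch` and its methods are hypothetical names, and distribution and redundancy are left out.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.BiConsumer;

/** Illustrative only (hypothetical class, not Geode's API): a key/value map that notifies listeners. */
class ObservableRegionSketch<K, V> {
    private final Map<K, V> entries = new ConcurrentHashMap<>();
    private final List<BiConsumer<K, V>> listeners = new CopyOnWriteArrayList<>();

    /** Registers a callback that is invoked after every put. */
    void addListener(BiConsumer<K, V> listener) {
        listeners.add(listener);
    }

    /** Stores the value, notifies listeners, and returns the previous value (or null). */
    V put(K key, V value) {
        V previous = entries.put(key, value);
        for (BiConsumer<K, V> listener : listeners) {
            listener.accept(key, value);
        }
        return previous;
    }

    V get(K key) {
        return entries.get(key);
    }

    public static void main(String[] args) {
        ObservableRegionSketch<String, String> region = new ObservableRegionSketch<>();
        region.addListener((k, v) -> System.out.println("updated " + k + " -> " + v));
        region.put("k1", "v1");
        System.out.println(region.get("k1"));
    }
}
```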

• Local, Replicated or Partitioned
• In-memory or persistent
• Redundant
• LRU
• Overflow

Region: Types & Options

LOCAL
LOCAL_HEAP_LRU
LOCAL_OVERFLOW
LOCAL_PERSISTENT
LOCAL_PERSISTENT_OVERFLOW
PARTITION
PARTITION_HEAP_LRU
PARTITION_OVERFLOW
PARTITION_PERSISTENT
PARTITION_PERSISTENT_OVERFLOW
PARTITION_PROXY
PARTITION_PROXY_REDUNDANT
PARTITION_REDUNDANT
PARTITION_REDUNDANT_HEAP_LRU
PARTITION_REDUNDANT_OVERFLOW
PARTITION_REDUNDANT_PERSISTENT
PARTITION_REDUNDANT_PERSISTENT_OVERFLOW
REPLICATE
REPLICATE_HEAP_LRU
REPLICATE_OVERFLOW
REPLICATE_PERSISTENT
REPLICATE_PERSISTENT_OVERFLOW
REPLICATE_PROXY
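The LRU option can be sketched with the JDK's `LinkedHashMap` in access order. This is an assumption-laden simplification: the hypothetical `LruRegionSketch` evicts by entry count, whereas the HEAP_LRU types above react to heap usage and the OVERFLOW types move values to disk instead of dropping them.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative only (hypothetical class): least-recently-used eviction by entry count. */
class LruRegionSketch<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LruRegionSketch(int maxEntries) {
        super(16, 0.75f, true);   // accessOrder=true: iteration order follows recency of access
        this.maxEntries = maxEntries;
    }

    /** Called by LinkedHashMap after each insert; returning true drops the eldest entry. */
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }

    public static void main(String[] args) {
        LruRegionSketch<String, String> region = new LruRegionSketch<>(2);
        region.put("a", "1");
        region.put("b", "2");
        region.get("a");          // touch "a" so that "b" becomes least recently used
        region.put("c", "3");     // exceeds the limit, so "b" is evicted
        System.out.println(region.keySet());
    }
}
```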

• Durability
• WAL for efficient writing
• Consistent recovery
• Compaction

Persistent Regions


Server 1 Server N

• A process that has a connection to the system

• A process that has created a cache

• Embeddable within your application

What is a MEMBER?

[Diagram: Client, Locator, Server]

• A process connected to the Geode server(s)

• Can have a local copy of the data
• Run OQL queries on local data
• Can be notified about events on the servers

What is a CLIENT CACHE?


Persistence - Shared Nothing

[Diagram sequence: buckets B1, B2 and B3 spread across Server 1, Server 2 and Server 3 as Primary and Secondary copies]

• Server 1 waits for others when it starts

• Fetches missed operations on restart
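The bucket diagrams can be read as two mappings: key to bucket, and bucket to a primary and a secondary server. The sketch below assumes a simple hash-modulo and round-robin placement purely for illustration; `BucketPlacementSketch` is a hypothetical class, and Geode's real placement and rebalancing logic is not shown on the slides.

```java
import java.util.Arrays;
import java.util.List;

/** Illustrative only (hypothetical class): hash keys to buckets, place each bucket on two servers. */
class BucketPlacementSketch {
    private final int bucketCount;
    private final List<String> servers;

    BucketPlacementSketch(int bucketCount, List<String> servers) {
        if (bucketCount < 1 || servers.size() < 2) {
            throw new IllegalArgumentException("need at least one bucket and two servers");
        }
        this.bucketCount = bucketCount;
        this.servers = servers;
    }

    /** Key to bucket: floorMod keeps the index non-negative for negative hash codes. */
    int bucketFor(Object key) {
        return Math.floorMod(key.hashCode(), bucketCount);
    }

    /** Assumed round-robin placement of the primary copy. */
    String primaryFor(int bucket) {
        return servers.get(bucket % servers.size());
    }

    /** The secondary copy goes to the next server, so it never shares a host with the primary. */
    String secondaryFor(int bucket) {
        return servers.get((bucket + 1) % servers.size());
    }

    public static void main(String[] args) {
        BucketPlacementSketch placement =
                new BucketPlacementSketch(3, Arrays.asList("Server 1", "Server 2", "Server 3"));
        for (int b = 0; b < 3; b++) {
            System.out.println("B" + (b + 1) + ": primary on " + placement.primaryFor(b)
                    + ", secondary on " + placement.secondaryFor(b));
        }
    }
}
```

Because each bucket lives on two different servers, losing one server leaves a copy of every bucket available, which is the point the slide sequence makes about restart and recovery.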

Persistence - Operational Logs

[Diagram: Member 1 performs Put k6->v6 and appends it to the operation log]

Oplog1.crf, Oplog2.crf: Create k1->v1, Create k2->v2, Modify k1->v3, Create k4->v4, Modify k1->v5, Create k6->v6

Persistence - Operational Logs: Compaction

[Diagram: Member 1 performs Put k6->v6 and appends it to the operation log; compaction copies live data forward]

Oplog1.crf, Oplog2.crf: Create k1->v1, Create k2->v2, Modify k1->v3, Create k4->v4, Modify k1->v5, Create k6->v6
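The append-then-compact idea can be sketched with an in-memory list standing in for the oplog files. This is a simplification under stated assumptions: the hypothetical `OplogSketch` keeps a single log rather than separate Oplog1.crf and Oplog2.crf files, and "live" means the latest record for each key.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only (hypothetical class): an append-only operation log with compaction. */
class OplogSketch {
    private final List<String[]> records = new ArrayList<>();   // each record is {key, value}

    /** Creates and modifies are both plain appends; nothing is rewritten in place. */
    void append(String key, String value) {
        records.add(new String[] {key, value});
    }

    int recordCount() {
        return records.size();
    }

    /** Recovery: replay the log in order so that the last record for a key wins. */
    Map<String, String> replay() {
        Map<String, String> state = new LinkedHashMap<>();
        for (String[] record : records) {
            state.put(record[0], record[1]);
        }
        return state;
    }

    /** Compaction: copy only the live (latest) record for each key into a new log. */
    OplogSketch compact() {
        OplogSketch compacted = new OplogSketch();
        for (Map.Entry<String, String> live : replay().entrySet()) {
            compacted.append(live.getKey(), live.getValue());
        }
        return compacted;
    }

    public static void main(String[] args) {
        OplogSketch log = new OplogSketch();
        log.append("k1", "v1");
        log.append("k2", "v2");
        log.append("k1", "v3");
        log.append("k4", "v4");
        log.append("k1", "v5");
        log.append("k6", "v6");
        OplogSketch compacted = log.compact();
        System.out.println(log.recordCount() + " records before, "
                + compacted.recordCount() + " after compaction");
        System.out.println(compacted.replay());
    }
}
```

With the slide's sequence, the two superseded records for k1 are dropped, while replaying either log yields the same state.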

• Used for distributed concurrent processing (Map/Reduce, stored procedure)

• Highly available
• Data oriented
• Member oriented

Functions
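The Map/Reduce flavour of data-oriented functions can be sketched by running one function against each partition's local data in parallel and then combining the partial results. Threads stand in for members here; `FunctionExecutionSketch` and `onPartitions` are hypothetical names, not Geode's function execution API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

/** Illustrative only (hypothetical class): run a function next to each partition's data. */
class FunctionExecutionSketch {

    /** Applies fn to every partition on its own thread and returns the partial results in order. */
    static <R> List<R> onPartitions(List<Map<String, Integer>> partitions,
                                    Function<Map<String, Integer>, R> fn) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, partitions.size()));
        try {
            List<Future<R>> futures = new ArrayList<>();
            for (Map<String, Integer> partition : partitions) {
                Callable<R> task = () -> fn.apply(partition);   // "map" step, local to one partition
                futures.add(pool.submit(task));
            }
            List<R> results = new ArrayList<>();
            for (Future<R> future : futures) {
                results.add(future.get());                       // gather partial results
            }
            return results;
        } catch (Exception e) {
            throw new IllegalStateException("function execution failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<Map<String, Integer>> partitions = new ArrayList<>();
        partitions.add(Map.of("a", 1, "b", 2));
        partitions.add(Map.of("c", 3));
        List<Integer> partialSums = onPartitions(partitions, data -> {
            int sum = 0;
            for (int value : data.values()) {
                sum += value;
            }
            return sum;
        });
        int total = 0;                                           // "reduce" step on the caller
        for (int partial : partialSums) {
            total += partial;
        }
        System.out.println("partial sums " + partialSums + ", total " + total);
    }
}
```

Only the small partial results travel back to the caller; the data itself stays where it is stored, which is the motivation for moving the function to the data.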


• Check out: http://geode.incubator.apache.org

• Subscribe: [email protected]

• Download: http://geode.incubator.apache.org/releases/

Join the Community!


Thank you!

Additional Slides


Built for PERFORMANCE…

[Chart: YCSB Workloads (A Reads, A Updates, B Reads, B Updates, C Reads, D Inserts, D Reads, F Reads, F Updates); Operations per second, 0 to 1,000,000; Cassandra vs. Geode]

…and horizontal, consistent SCALABILITY!


Horizontal scaling for reads, consistent latency and CPU

[Chart: speedup, latency (ms) and CPU % plotted against the number of Server Hosts (2 to 10)]

• Scaled from 256 clients and 2 servers to 1280 clients and 10 servers
• Partitioned region with redundancy and 1K data size

High Availability
