Transcript
Page 1: Hazelcast for Terracotta Users

From Terracotta To Hazelcast: Introduction to Hazelcast for Terracotta Users

AUTUMN 2014

Page 2: Hazelcast for Terracotta Users

About me

Rahul Gupta @wildnez

Senior Solutions Architect for Hazelcast

Worked with Terracotta and Coherence

Worked for Major Investment Banks

Java Programmer since 1996

Started programming in VHDL, later on the 8048 and 80286 CPUs

Page 3: Hazelcast for Terracotta Users

How this is going down

• Limitations of Terracotta

• What is Hazelcast

• Migration

• Important Features

Page 4: Hazelcast for Terracotta Users

Limitations of Terracotta

• Only an in-memory data store

• Complex APIs to use distributed collections

• No in-memory data-processing capabilities

• Data needs to be fetched by the application, resulting in network hops

• Inflexible - only a Client-Server architecture

• Dedicated environment for backup; extra server licenses (Passive/Mirror)

• Requires a dedicated environment

• Requires downtime to scale

Page 5: Hazelcast for Terracotta Users

Hazelcast overcomes the limitations

• True IMDG space
  – In-Memory Distributed Data Caching
    » Native Memory
  – In-Memory Parallel Data Processing
    » Distributed ExecutorService
    » EntryProcessors
  – In-Memory Map-Reduce

• Distributed Pub-Sub Messaging Model

• Simple Access to Distributed Collections

Page 6: Hazelcast for Terracotta Users

Hazelcast overcomes the limitations

• Highly Flexible Deployments
  – Client-Server
    » Servers run in a separate tier in a dedicated environment
    » Does not require dedicated infrastructure for running backups
  – Embedded
    » Hazelcast node runs within the application JVM
    » Application nodes are made distributed by Hazelcast running within their JVMs
    » Does not require a dedicated environment

Page 7: Hazelcast for Terracotta Users

Hazelcast overcomes the limitations

• Backups also serve as main nodes

• No extra licenses for backup

• Scales on the fly

• No downtime required to add/remove nodes

Page 8: Hazelcast for Terracotta Users

Configuring & forming a cluster

Page 9: Hazelcast for Terracotta Users

Forming a Cluster

• Hazelcast clusters run on the JVM

• Hazelcast discovers other instances via Multicast (default)

• Use TCP/IP member lists when Multicast is not possible

• Segregate clusters on the same network via configuration

• Hazelcast can form clusters on Amazon EC2

Page 10: Hazelcast for Terracotta Users

Hazelcast Configuration - Server

• Only one jar - look for hazelcast-all-x.x.x.jar

• Hazelcast searches for hazelcast.xml on the class path

• Falls back to hazelcast-default.xml for everything else

• Hazelcast can be configured via XML, API or Spring

• Configure networks, data structures, indexes, compute

Page 11: Hazelcast for Terracotta Users

Form a cluster

1. A sample hazelcast.xml looks like this:
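The XML itself isn't captured in this transcript. A minimal sketch of a 2014-era (Hazelcast 3.x) hazelcast.xml, assuming multicast discovery and an illustrative group name:

    <hazelcast xmlns="http://www.hazelcast.com/schema/config">
        <!-- group name/password isolate this cluster from others on the network -->
        <group>
            <name>dev-cluster</name>
            <password>dev-pass</password>
        </group>
        <network>
            <port auto-increment="true">5701</port>
            <join>
                <!-- multicast discovery is the default -->
                <multicast enabled="true"/>
                <tcp-ip enabled="false"/>
            </join>
        </network>
    </hazelcast>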

Page 12: Hazelcast for Terracotta Users

TOP TIP

• The <group> configuration element is your friend.

• It will help you isolate your cluster on the multicast network.

• Don't make the mistake of joining another developer's cluster, or worse still, a Production Cluster!

Page 13: Hazelcast for Terracotta Users

Configuration via API

1. Add a GroupConfig to the Config instance:
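The slide's code isn't captured; a minimal sketch of the 3.x API, reusing the illustrative group name from above:

    import com.hazelcast.config.Config;
    import com.hazelcast.config.GroupConfig;
    import com.hazelcast.core.Hazelcast;
    import com.hazelcast.core.HazelcastInstance;

    public class FormCluster {
        public static void main(String[] args) {
            Config config = new Config();
            // same effect as the <group> element in hazelcast.xml
            config.setGroupConfig(new GroupConfig("dev-cluster", "dev-pass"));
            HazelcastInstance hz = Hazelcast.newHazelcastInstance(config);
        }
    }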

Page 14: Hazelcast for Terracotta Users

TOP TIP

• You can run multiple Hazelcast instances in one JVM.

• Handy for unit testing.
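A sketch of what that looks like in a test (standard API calls only):

    import com.hazelcast.core.Hazelcast;
    import com.hazelcast.core.HazelcastInstance;

    public class TwoNodeTest {
        public static void main(String[] args) {
            // two members in the same JVM form a two-node cluster
            HazelcastInstance member1 = Hazelcast.newHazelcastInstance();
            HazelcastInstance member2 = Hazelcast.newHazelcastInstance();
            assert member1.getCluster().getMembers().size() == 2;
            Hazelcast.shutdownAll(); // clean up between tests
        }
    }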

Page 15: Hazelcast for Terracotta Users

Configure Cluster to use TCP/IP

1. Edit multicast enabled = false
2. Add a tcp-ip element with your IP addresses
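The slide's XML isn't captured; a sketch of the join section with illustrative member addresses:

    <join>
        <multicast enabled="false"/>
        <tcp-ip enabled="true">
            <member>192.168.1.10</member>
            <member>192.168.1.11</member>
        </tcp-ip>
    </join>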

Page 16: Hazelcast for Terracotta Users

Client Configuration

Page 17: Hazelcast for Terracotta Users

Hazelcast Configuration - Client

• Hazelcast searches for hazelcast-client.xml on the class path

• Full API stack - the Client API is the same as the Server API

• Clients in Java, C#, C++, Memcache, REST

Page 18: Hazelcast for Terracotta Users

Starting as a Client or Cluster JVM

Notice that Client and Cluster return the same HazelcastInstance reference.
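The code on the slide isn't captured; the two standard calls it contrasts are:

    import com.hazelcast.client.HazelcastClient;
    import com.hazelcast.core.Hazelcast;
    import com.hazelcast.core.HazelcastInstance;

    public class StartNode {
        public static void main(String[] args) {
            // full cluster member in this JVM
            HazelcastInstance member = Hazelcast.newHazelcastInstance();
            // lightweight client - same HazelcastInstance interface
            HazelcastInstance client = HazelcastClient.newHazelcastClient();
        }
    }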

Page 19: Hazelcast for Terracotta Users

Code Migration

Page 20: Hazelcast for Terracotta Users

Migration - Terracotta to Hazelcast

• Terracotta implementation of cache puts/gets:
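The slide's code isn't captured. Terracotta-clustered caching typically went through the Ehcache API, so a sketch (cache name and values illustrative) might look like:

    import net.sf.ehcache.Cache;
    import net.sf.ehcache.CacheManager;
    import net.sf.ehcache.Element;

    public class TerracottaCacheExample {
        public static void main(String[] args) {
            CacheManager cacheManager = CacheManager.newInstance(); // reads ehcache.xml
            Cache cache = cacheManager.getCache("customers");
            cache.put(new Element("id-42", "Jane Doe")); // values wrapped in Element
            Element element = cache.get("id-42");
            String value = (String) element.getObjectValue(); // manual unwrap and cast
        }
    }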

• Replace the Terracotta implementation with Hazelcast code:
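A sketch of the equivalent with a Hazelcast IMap (same illustrative names):

    import com.hazelcast.core.Hazelcast;
    import com.hazelcast.core.HazelcastInstance;
    import com.hazelcast.core.IMap;

    public class HazelcastCacheExample {
        public static void main(String[] args) {
            HazelcastInstance hz = Hazelcast.newHazelcastInstance();
            IMap<String, String> cache = hz.getMap("customers");
            cache.put("id-42", "Jane Doe"); // plain java.util.Map semantics
            String value = cache.get("id-42"); // no wrapper type, no cast
        }
    }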

Page 21: Hazelcast for Terracotta Users

Migration - Terracotta to Hazelcast

• Terracotta implementation of a Blocking Queue (notice the complex APIs):

• Replace the Terracotta queue with a Hazelcast Queue (notice the simplicity):
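The slide's code isn't captured. On the Hazelcast side, IQueue is just a distributed java.util.concurrent.BlockingQueue; a sketch with an illustrative queue name:

    import com.hazelcast.core.Hazelcast;
    import com.hazelcast.core.HazelcastInstance;
    import com.hazelcast.core.IQueue;

    public class QueueExample {
        public static void main(String[] args) throws InterruptedException {
            HazelcastInstance hz = Hazelcast.newHazelcastInstance();
            IQueue<String> queue = hz.getQueue("tasks"); // IQueue extends BlockingQueue
            queue.put("task-1"); // blocking put
            String task = queue.take(); // blocking take
        }
    }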

Page 22: Hazelcast for Terracotta Users

Topologies

Page 23: Hazelcast for Terracotta Users

Hazelcast Topologies

• Traditional Client -> Server (Client -> Cluster)

• Clients do not take part in standard cluster communications.

• Consider the Client -> Cluster topology to segregate service from storage.

• Smart Clients connect to all cluster nodes. Operations go directly to the node holding the data.

• Embedded model, for example in a J2EE container: service and storage within one JVM.

Page 24: Hazelcast for Terracotta Users

Terracotta Cluster

Page 25: Hazelcast for Terracotta Users

Embedded Hazelcast

Page 26: Hazelcast for Terracotta Users

Client Server -> (Client -> Cluster)

Page 27: Hazelcast for Terracotta Users

Distributed Collections

Page 28: Hazelcast for Terracotta Users

Maps

Page 29: Hazelcast for Terracotta Users

Distributed Maps - IMap

• Conforms to the java.util.Map interface

• Conforms to the java.util.concurrent.ConcurrentMap interface

• The Hazelcast IMap interface provides extra features:
  – EntryListeners
  – Aggregators
  – Predicate Queries
  – Locking
  – Eviction
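A short sketch of two of those extras, locking and predicate queries (map name and values illustrative):

    import com.hazelcast.core.Hazelcast;
    import com.hazelcast.core.HazelcastInstance;
    import com.hazelcast.core.IMap;
    import com.hazelcast.query.SqlPredicate;
    import java.util.Collection;

    public class IMapExtras {
        public static void main(String[] args) {
            HazelcastInstance hz = Hazelcast.newHazelcastInstance();
            IMap<String, Integer> ages = hz.getMap("ages");
            ages.put("alice", 35);

            ages.lock("alice"); // explicit per-key locking
            try {
                ages.put("alice", ages.get("alice") + 1);
            } finally {
                ages.unlock("alice");
            }

            // predicate query across the cluster; "this" targets the value itself
            Collection<Integer> over30 = ages.values(new SqlPredicate("this > 30"));
        }
    }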

Page 30: Hazelcast for Terracotta Users

Wildcard Configuration

• Hazelcast supports wildcards in configuration.

• Beware of ambiguous config though.

• Hazelcast doesn't pick the best match; what it picks is random, not the order in which it appears in the config.
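A sketch of wildcard configuration with illustrative map names; the last bullet is why the two entries below are dangerous together:

    <!-- applies to com.example.orders, com.example.customers, ... -->
    <map name="com.example.*">
        <backup-count>1</backup-count>
    </map>

    <!-- ambiguous with the entry above for a map named com.example.orders -->
    <map name="*.orders">
        <backup-count>2</backup-count>
    </map>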

Page 31: Hazelcast for Terracotta Users

Properties

• Hazelcast supports property replacement in XML config

• Uses System Properties by default

• A Properties Loader can be configured
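A sketch of property replacement, assuming the name is passed as -Dcluster.name=... on the command line:

    <group>
        <!-- resolved from System properties by default -->
        <name>${cluster.name}</name>
    </group>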

Page 32: Hazelcast for Terracotta Users

Near Cache

• Terracotta L1 Cache -> Hazelcast Near Cache

• Highly recommended for read-mostly maps
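A sketch of enabling a Near Cache on a map (name and timings illustrative):

    <map name="customers">
        <near-cache>
            <!-- locally cached entries live at most 60 seconds -->
            <time-to-live-seconds>60</time-to-live-seconds>
            <invalidate-on-change>true</invalidate-on-change>
        </near-cache>
    </map>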

Page 33: Hazelcast for Terracotta Users

Replicated Map

• Does not partition data.

• Copies each Map Entry to every cluster JVM.

• Consider for immutable, slow-moving data like configuration.

• The ReplicatedMap interface supports EntryListeners.
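Usage mirrors IMap (map name illustrative):

    import com.hazelcast.core.Hazelcast;
    import com.hazelcast.core.HazelcastInstance;
    import com.hazelcast.core.ReplicatedMap;

    public class ReplicatedMapExample {
        public static void main(String[] args) {
            HazelcastInstance hz = Hazelcast.newHazelcastInstance();
            ReplicatedMap<String, String> settings = hz.getReplicatedMap("settings");
            settings.put("feature.x.enabled", "true"); // copied to every member
        }
    }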

Page 34: Hazelcast for Terracotta Users

Data Distribution and Resource Management

Page 35: Hazelcast for Terracotta Users

Data Distribution

• Data (primary + backup) is distributed in the cluster using partitions

• 271 partitions by default

• Partitions are divided among the running cluster JVMs

• Discovery of the resident partition is performed by the client before sending update/get calls

• In a smart-client setup, requests go directly to the host node

• Hazelcast places a backup of the Map Entry on another partition as part of Map.put

Page 36: Hazelcast for Terracotta Users

Data Distribution

• The backup operation can be sync (default) or async with respect to Map.put

• Each node acts as both primary and backup, compared to Terracotta's Active-Passive setup on dedicated resources - a more efficient use of resources

• When a cluster JVM enters or leaves, partitions are rebalanced

• In the event of a node failure:
  – Primary data is retrieved from backup and distributed across the remaining nodes in the cluster
  – New backups are created on the healthy nodes

Page 37: Hazelcast for Terracotta Users

Data Distribution

• Fixed number of partitions (default 271)

• Each key falls into a partition: partitionId = hash(keyData) % PARTITION_COUNT

• Partition ownerships are reassigned upon membership change

(diagram: partitions distributed across nodes A, B and C)

Pages 38-45: Hazelcast for Terracotta Users

(diagram sequence: a new node D is added to the cluster of A, B and C; partitions migrate to D step by step until migration is complete)

Page 46: Hazelcast for Terracotta Users

Fault Tolerance & Recovery

Pages 47-57: Hazelcast for Terracotta Users

(diagram sequence: a node crashes; backups are restored, data is recovered from the backups, new backups are created for the recovered data, and the cluster is all safe with the remaining nodes)

Page 58: Hazelcast for Terracotta Users

In Memory Format & Serialization

Page 59: Hazelcast for Terracotta Users

In Memory Format

• Flexibility in the data store format, compared to Terracotta's binary-only

• By default, data in memory is in binary (serialised) format

• Local processing on a node has to keep deserialising

• Use OBJECT if processing locally (entry processors, executors)

• Use BINARY if getting data over the network
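A sketch of setting the format per map:

    <map name="orders">
        <!-- BINARY (default) or OBJECT -->
        <in-memory-format>OBJECT</in-memory-format>
    </map>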

Page 60: Hazelcast for Terracotta Users

Serialization

• Custom Serialization, as against the "Serializable"-only option in Terracotta

  – DataSerializable
    » Fine-grained control over serialization
    » Uses Reflection to create the class instance
    » "implements DataSerializable"
      · public void writeData(ObjectDataOutput out)
      · public void readData(ObjectDataInput in)

  – IdentifiedDataSerializable
    » Better version of DataSerializable
    » Avoids Reflection - faster serialization
    » extends DataSerializable
    » Two new methods:
      · int getId() - used instead of the classname
      · int getFactoryId() - used to load the class for a given id
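A sketch of an IdentifiedDataSerializable class (ids and fields illustrative):

    import com.hazelcast.nio.ObjectDataInput;
    import com.hazelcast.nio.ObjectDataOutput;
    import com.hazelcast.nio.serialization.IdentifiedDataSerializable;
    import java.io.IOException;

    public class Trade implements IdentifiedDataSerializable {
        private String symbol;
        private int quantity;

        public void writeData(ObjectDataOutput out) throws IOException {
            out.writeUTF(symbol);
            out.writeInt(quantity);
        }

        public void readData(ObjectDataInput in) throws IOException {
            symbol = in.readUTF();
            quantity = in.readInt();
        }

        public int getId() { return 1; } // class id within the factory
        public int getFactoryId() { return 100; } // id of the registered DataSerializableFactory
    }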

Page 61: Hazelcast for Terracotta Users

Distributed Compute

Page 62: Hazelcast for Terracotta Users

Distributed Executor Service

Page 63: Hazelcast for Terracotta Users

Distributed Executor

• IExecutorService extends java.util.concurrent.ExecutorService

• Send a Runnable or Callable into the Cluster

• Targetable execution: All Members, a single Member, a MemberSelector, the Key Owner

• Sync/async blocking based on Futures

• Or an ExecutionCallback notifies onResponse
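A sketch targeting the key owner with a callback (task and key illustrative; the Callable must be Serializable):

    import com.hazelcast.core.ExecutionCallback;
    import com.hazelcast.core.Hazelcast;
    import com.hazelcast.core.HazelcastInstance;
    import com.hazelcast.core.IExecutorService;
    import java.io.Serializable;
    import java.util.concurrent.Callable;

    public class ExecutorExample {
        static class CountTask implements Callable<Integer>, Serializable {
            public Integer call() {
                return 42; // runs on the member that owns the key
            }
        }

        public static void main(String[] args) {
            HazelcastInstance hz = Hazelcast.newHazelcastInstance();
            IExecutorService exec = hz.getExecutorService("default");
            exec.submitToKeyOwner(new CountTask(), "order-17", new ExecutionCallback<Integer>() {
                public void onResponse(Integer response) { System.out.println(response); }
                public void onFailure(Throwable t) { t.printStackTrace(); }
            });
        }
    }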

Page 64: Hazelcast for Terracotta Users

Distributed Executor

• If system resources permit, you can scale up the number of threads the ExecutorService uses.
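The config on the slide isn't captured; a sketch of the XML:

    <executor-service name="default">
        <!-- threads per member servicing this executor -->
        <pool-size>16</pool-size>
    </executor-service>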

Page 65: Hazelcast for Terracotta Users

Distributed Executor

• Each Member creates its own work queue.

• Tasks are not partitioned or load balanced.

• If a member dies while a task is enqueued on it, the task is lost.

• You need to lock any data you access - but beware of Deadlocks!

Page 66: Hazelcast for Terracotta Users

EntryProcessor

Page 67: Hazelcast for Terracotta Users

EntryProcessor

• A Distributed Map Entry Processor Function

• Provides locking guarantees

• Works directly on the Entry object in a node

• Executed on the Partition Thread rather than the Executor

• Submitted via the IMap

• The best way to apply delta updates without moving the object across the network

Page 68: Hazelcast for Terracotta Users

EntryProcessor

• The EntryProcessor also mutates the Backup copy

• Use AbstractEntryProcessor for the default backup behaviour

• Implement EntryProcessor directly to provide your own backup behaviour, for example sending only the delta

• The only alternative to Terracotta DSO

Page 69: Hazelcast for Terracotta Users

EntryProcessor

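The slide's code isn't captured; a sketch of a 3.x-style EntryProcessor that increments a counter in place (names illustrative):

    import com.hazelcast.core.Hazelcast;
    import com.hazelcast.core.IMap;
    import com.hazelcast.map.AbstractEntryProcessor;
    import java.util.Map;

    public class IncrementProcessor extends AbstractEntryProcessor<String, Integer> {
        @Override
        public Object process(Map.Entry<String, Integer> entry) {
            // runs on the partition thread of the member owning the key;
            // AbstractEntryProcessor applies the same change to the backup
            entry.setValue(entry.getValue() + 1);
            return entry.getValue();
        }

        public static void main(String[] args) {
            IMap<String, Integer> counters =
                    Hazelcast.newHazelcastInstance().getMap("counters");
            counters.put("hits", 0);
            counters.executeOnKey("hits", new IncrementProcessor());
        }
    }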

Page 70: Hazelcast for Terracotta Users

EntryProcessor

• Other tasks (puts, gets) also run on the Partition Thread

• It is important to yield the EntryProcessor

• hazelcast.entryprocessor.batch.max.size: defaults to 10,000

• Hazelcast will not interrupt a running operation; it only yields when the current key has been processed.

Page 71: Hazelcast for Terracotta Users

In-memory Map Reduce

Page 72: Hazelcast for Terracotta Users

Map Reduce

• In-memory Map/Reduce, compared to disk-bound M/R

• Similar paradigm to Hadoop Map/Reduce

• Familiar nomenclature for ease of understanding and use:
  – JobTracker
  – Job
  – Mapper
  – CombinerFactory
  – Reducer
  – Collator
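The slides only name the pieces; a word-count sketch against the 3.x map-reduce API (map name and classes illustrative):

    import com.hazelcast.core.Hazelcast;
    import com.hazelcast.core.HazelcastInstance;
    import com.hazelcast.core.ICompletableFuture;
    import com.hazelcast.core.IMap;
    import com.hazelcast.mapreduce.*;
    import java.util.Map;

    public class WordCount {
        public static class TokenizerMapper implements Mapper<String, String, String, Long> {
            public void map(String key, String document, Context<String, Long> context) {
                for (String word : document.toLowerCase().split("\\s+")) {
                    context.emit(word, 1L);
                }
            }
        }

        public static class SumReducerFactory implements ReducerFactory<String, Long, Long> {
            public Reducer<Long, Long> newReducer(String key) {
                return new Reducer<Long, Long>() {
                    private long sum;
                    public void reduce(Long value) { sum += value; }
                    public Long finalizeReduce() { return sum; }
                };
            }
        }

        public static void main(String[] args) throws Exception {
            HazelcastInstance hz = Hazelcast.newHazelcastInstance();
            IMap<String, String> articles = hz.getMap("articles");
            JobTracker tracker = hz.getJobTracker("default");
            Job<String, String> job = tracker.newJob(KeyValueSource.fromMap(articles));
            ICompletableFuture<Map<String, Long>> future =
                    job.mapper(new TokenizerMapper()).reducer(new SumReducerFactory()).submit();
            Map<String, Long> counts = future.get();
        }
    }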

Page 73: Hazelcast for Terracotta Users

Distributed Aggregation

Page 74: Hazelcast for Terracotta Users

Aggregators

• Ready-to-use in-memory data aggregation algorithms

• Implemented on top of the Hazelcast MapReduce framework

• More convenient than MR for a large set of standard operations

• Work on both IMap and MultiMap

• Types of aggregation: Average, Sum, Min, Max, DistinctValues, Count
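A sketch using the 3.x aggregations API (map name illustrative; the explicit generics follow the 3.x Supplier/Aggregations classes):

    import com.hazelcast.core.Hazelcast;
    import com.hazelcast.core.HazelcastInstance;
    import com.hazelcast.core.IMap;
    import com.hazelcast.mapreduce.aggregation.Aggregations;
    import com.hazelcast.mapreduce.aggregation.Supplier;

    public class AggregationExample {
        public static void main(String[] args) {
            HazelcastInstance hz = Hazelcast.newHazelcastInstance();
            IMap<String, Integer> salaries = hz.getMap("salaries");
            // Supplier.all() passes every entry's value into the aggregation
            int sum = salaries.aggregate(Supplier.<String, Integer, Integer>all(),
                                         Aggregations.integerSum());
        }
    }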

Page 75: Hazelcast for Terracotta Users

Querying

Page 76: Hazelcast for Terracotta Users

Querying with Predicates

• Rich Predicate API that can be run against an IMap, similar to criterion-based Terracotta Search:

  Collection<V> IMap.values(Predicate p)
  Set<K> IMap.keySet(Predicate p)
  Set<Map.Entry<K,V>> IMap.entrySet(Predicate p)
  Set<K> IMap.localKeySet(Predicate p)
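A sketch of a programmatic predicate; "this" targets the value itself (map name illustrative):

    import com.hazelcast.core.Hazelcast;
    import com.hazelcast.core.HazelcastInstance;
    import com.hazelcast.core.IMap;
    import com.hazelcast.query.Predicate;
    import com.hazelcast.query.Predicates;
    import java.util.Collection;

    public class PredicateExample {
        public static void main(String[] args) {
            HazelcastInstance hz = Hazelcast.newHazelcastInstance();
            IMap<String, Integer> ages = hz.getMap("ages");
            Predicate predicate = Predicates.between("this", 30, 40);
            Collection<Integer> thirties = ages.values(predicate);
        }
    }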

Page 77: Hazelcast for Terracotta Users

Querying with Predicates

notEqual, instanceOf, like (%, _), greaterThan, greaterEqual, lessThan, lessEqual, between, in, isNot, regex

Page 78: Hazelcast for Terracotta Users

Querying with Predicates

• Create your own Predicates
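A sketch: the 3.x Predicate interface has a single apply method over a Map.Entry (class name and logic illustrative):

    import com.hazelcast.query.Predicate;
    import java.util.Map;

    // matches values of even length - any logic you like
    public class EvenLengthPredicate implements Predicate<String, String> {
        public boolean apply(Map.Entry<String, String> entry) {
            return entry.getValue().length() % 2 == 0;
        }
    }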

Page 79: Hazelcast for Terracotta Users

SQL like queries

• The SqlPredicate class

• Runs only on Values

• Converts the String into a set of concrete Predicates
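A sketch, assuming an IMap named employees whose values have active and age attributes:

    import com.hazelcast.query.SqlPredicate;

    // the String is parsed into concrete Predicates and run against the values
    Collection<Employee> result =
            employees.values(new SqlPredicate("active AND age < 30"));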

Page 80: Hazelcast for Terracotta Users

Indexes

• Prevent full Map scans

• Indexes can be ordered or unordered

• Indexes can work along the Object Graph (x.y.z)

• When indexing non-primitives, they must implement Comparable

• Indexes can be created at runtime
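A sketch of the 3.x call, reusing the illustrative employees map and age attribute from above:

    // ordered = true supports range queries (greaterThan, between, ...)
    employees.addIndex("age", true);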

Page 81: Hazelcast for Terracotta Users

Conclusion

Page 82: Hazelcast for Terracotta Users

Conclusion

• Hazelcast is easy to use

• Easy to migrate from Terracotta

• Familiar naming conventions

• Many more features and use cases than just a data store

• Scales on the fly

• Zero downtime

• No single point of failure

Page 83: Hazelcast for Terracotta Users

Thank You!

@wildnez

[email protected]
