Reliable Multicasting with JGroups
Bela Ban, Jan 2004
belaban@yahoo.com
http://www.jgroups.org
EBIG, Oakland Jan 21 2004 2
Overview
  API, architecture
  Protocols
  Building Blocks
  Performance
  Future, Conclusion
What Is It?
  Toolkit for reliable multicasting
    Fragmentation
    Message retransmission
    Ordering
    Group membership, membership change notification
  LAN or WAN based
License
  JGroups is a toolkit (JAR), to be linked against an application
  Open Source under LGPL
    Commercial products can use JGroups without having to LGPL their code
    Modifications to JGroups itself need to be LGPL'ed (if distributed)
  Dual licensing in the future
API
  Channel: similar to java.net.MulticastSocket plus group membership, reliability
  Operations:
    Create a channel with a set of properties
    Connect to a group X. Everyone that connects to X will see each other
    Send a message to all members of X
    Send a message to a single member
API
  Receive a message
  Retrieve membership
  Be notified when members join, leave (including crashes)
  Disconnect from the group
  Close the channel
API
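import org.jgroups.JChannel;
import org.jgroups.Message;

// Create a channel from an XML protocol stack definition
JChannel channel=new JChannel("file:///home/bela/default.xml");
// Join the group; everyone connected to "demo-group" sees each other
channel.connect("demo-group");
System.out.println("members are: " + channel.getView().getMembers());
// A null destination sends the message to all members of the group
Message msg=new Message(null, null, "Hello world");
channel.send(msg);
// Block until something arrives (timeout 0 = wait forever); receive() can also
// return views and other events, so the cast assumes the next item is a Message
Message m=(Message)channel.receive(0);
System.out.println("received msg from " + m.getSrc() + ": " + m.getObject());
channel.disconnect();
channel.close();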
Group topology
Architecture of JGroups
  [Figure: three group members connected by a Network. In each member an Application sits on Building Blocks, which sit on a Channel with a protocol stack containing GMS, UNICAST, NAKACK, FD and UDP.]
Demo
  Draw
  ReplicatedTree: shared state
Stats
  JGroups has ~ 90KLOC
    30KLOC protocols
    45KLOC main + building blocks
    15KLOC unit tests
  ~ 90 protocols shipped with JGroups
  Set of well-tested stacks (in XML files)
Available protocols I
  Transport
    UDP, TCP, TCP_NIO, TUNNEL, JMS, LOOPBACK
  Discovery
    PING, TCPPING, TCPGOSSIP, UDPPING
  Group membership
  Reliable delivery & FIFO
    NAKACK, SMACK, UNICAST
Available protocols II
  Failure detection
    FD, FD_SOCK, FD_PID, FD_SIMPLE, FD_PROB, VERIFY_SUSPECT
  Security
    ENCRYPT, SSL ConnectionTable (n/a)
  Fragmentation (FRAG)
  State transfer (STATE_TRANSFER)
Available protocols III
  Ordering
    FIFO, CAUSAL, TOTAL, TOTAL_TOKEN
  Virtual Synchrony
    FLUSH, QUEUE, VIEW_ENFORCER
  Probabilistic Broadcast
    PBCAST
  Merging: MERGE(2), MERGEFAST
Available protocols IV
  Distributed message garbage collection
    STABLE
  Debugging
    PERF, TRACE, PRINTOBJS, SIZE, BSH
  Simulation
    SHUFFLE, DELAY, DISCARD, DEADLOCK, LOSS, PARTITIONER
Available protocols V
  Dynamic configuration
    AUTOCONF
  Flow control
    FLOW_CONTROL, FC
  Misc
    PIGGYBACK, COMPRESS
Transport
  Task
    Send messages from above to all members in the group, or to a single member
    Receive messages from the network, pass them up the stack
  UDP: multicast and multiple UDP unicasts
  TCP: mcast done by multiple TCP unicasts
  TUNNEL: send to external router, e.g. through firewall
Discovery
  Task
    Initial discovery of members
    Used by GMS to determine coordinator to send JOIN request to
  Each member returns its own addr, plus the addr of the coordinator
    Typical response ({A,A}, {B,A}, {C,A})
  Wait for n milliseconds or m responses
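As a rough illustration of how such responses could be evaluated, the sketch below tallies the coordinator address named in each (own addr, coord addr) pair and picks the one named most often. The names DiscoverySketch, PingRsp and pickCoordinator are invented for this example and are not JGroups API. The majority vote is also an assumption, since the slide only says the responses are used to determine the coordinator.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch, not JGroups code.
final class DiscoverySketch {
    /** One discovery response: the responder's own address and the coordinator it knows. */
    static final class PingRsp {
        final String ownAddr;
        final String coordAddr;
        PingRsp(String ownAddr, String coordAddr) {
            this.ownAddr = ownAddr;
            this.coordAddr = coordAddr;
        }
    }

    /**
     * Returns the coordinator named by most responses, or empty if nobody
     * answered (the caller would then be the first member and become coordinator).
     */
    static Optional<String> pickCoordinator(List<PingRsp> responses) {
        Map<String, Integer> votes = new LinkedHashMap<>();
        for (PingRsp rsp : responses) {
            votes.merge(rsp.coordAddr, 1, Integer::sum);
        }
        return votes.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey);
    }

    public static void main(String[] args) {
        // The typical response from the slide: ({A,A}, {B,A}, {C,A})
        List<PingRsp> rsps = Arrays.asList(
                new PingRsp("A", "A"), new PingRsp("B", "A"), new PingRsp("C", "A"));
        System.out.println(pickCoordinator(rsps).orElse("none, become coordinator")); // prints "A"
    }
}
```

With an empty response list the method returns an empty Optional, which corresponds to the "first member" case handled by group membership.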
Discovery - UDP
  Multicast discovery request
  Each member responds with a unicast UDP datagram (local-addr, coord-addr), back to the sender
Discovery - TCPGOSSIP
  Can be used by both UDP and TCP
  External GossipServer
    org.jgroups.stack.GossipServer
    Maintains table of <group, members>
    Each member registers (groupname, own addr)
    Lease based - members have to periodically renew registration
    Multiple GossipServers possible
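The lease idea can be sketched in a few lines. The class below is a toy in-memory table, not the real org.jgroups.stack.GossipServer. LeaseRegistrySketch, register and getMembers are hypothetical names. Time is passed in explicitly so that the expiry logic is easy to follow.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a lease-based <group, members> table.
final class LeaseRegistrySketch {
    private final long leaseMillis;
    // group name -> (member address -> time of last registration)
    private final Map<String, Map<String, Long>> table = new HashMap<>();

    LeaseRegistrySketch(long leaseMillis) {
        this.leaseMillis = leaseMillis;
    }

    /** Registers a member, or renews its lease if it is already registered. */
    synchronized void register(String group, String addr, long nowMillis) {
        table.computeIfAbsent(group, g -> new LinkedHashMap<>()).put(addr, nowMillis);
    }

    /** Returns the members whose lease is still valid and drops the expired ones. */
    synchronized List<String> getMembers(String group, long nowMillis) {
        List<String> live = new ArrayList<>();
        Map<String, Long> members = table.get(group);
        if (members == null) {
            return live;
        }
        Iterator<Map.Entry<String, Long>> it = members.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> e = it.next();
            if (nowMillis - e.getValue() > leaseMillis) {
                it.remove(); // lease expired: the member did not renew in time
            } else {
                live.add(e.getKey());
            }
        }
        return live;
    }
}
```

A member that crashes simply stops renewing and disappears from the table after one lease period, which is sufficient when the table is only used to find a coordinator.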
Discovery - TCPGOSSIP
  To obtain initial membership for a given group, TCPGOSSIP contacts the GossipServer
  Membership info does not need to be accurate - only goal is to determine coord to send JOIN request to
Discovery - TCPPING
  Give a set of well known members
  For discovery, those members are pinged
  If at least 1 responds, we can find the coordinator
  Does not require additional process
Group Membership
  Task
    Maintain a list of members
    Notify members when a new member joins, or an existing member leaves (or crashes)
  Each member has the same ordered list
    List can be retrieved by Channel.getView()
    First (= oldest) member is coordinator
    If coord crashes, 2nd oldest takes over
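The ordered-list rule is simple enough to sketch without JGroups. ViewSketch below is a hypothetical, immutable stand-in for a view (JGroups has its own View class, whose API is not reproduced here). It shows that the coordinator role follows purely from list order.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of an ordered membership list; not the JGroups View class.
final class ViewSketch {
    private final List<String> members; // ordered by join time, oldest first

    ViewSketch(List<String> members) {
        this.members = Collections.unmodifiableList(new ArrayList<>(members));
    }

    List<String> getMembers() {
        return members;
    }

    /** The first (oldest) member is the coordinator; null if the view is empty. */
    String coordinator() {
        return members.isEmpty() ? null : members.get(0);
    }

    /** New view after a join: the new member is appended, so it is the youngest. */
    ViewSketch join(String addr) {
        List<String> next = new ArrayList<>(members);
        next.add(addr);
        return new ViewSketch(next);
    }

    /** New view after a leave or crash: the remaining members keep their order. */
    ViewSketch remove(String addr) {
        List<String> next = new ArrayList<>(members);
        next.remove(addr);
        return new ViewSketch(next);
    }
}
```

Starting from the view [A, B, C], removing A yields [B, C] with B as coordinator, which is the "2nd oldest takes over" rule.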
Group Membership - JOIN
  New member uses discovery to find coord
    If first member -> become coord
    Else: sends JOIN to coord
  Coord adds new member to list, multicasts new view (member list) to all members
  If 2 initial members are started at the same time, MERGE protocol merges them into a single group
Group Membership - LEAVE
  Member sends LEAVE to coord
  Coord multicasts new view to all members
Group membership - CRASH
  Failure detection protocol sends up SUSPECT event
  VERIFY_SUSPECT double checks
  GMS multicasts new view (not containing crashed member)
  If member resurfaces, it will be shunned
    Has to leave and rejoin group
Failure detection
  Task
    Detect if a member has crashed and send SUSPECT event up the stack (to be handled by GMS)
  Logical ring over membership
  Each member pings its neighbor to the right
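The ring can be derived from the ordered member list alone. The helper below is a minimal sketch under that assumption. RingSketch and pingTarget are hypothetical names, and the real FD protocol additionally deals with timeouts and retries, which are left out here.

```java
import java.util.List;

// Hypothetical sketch: choosing the ping target in a logical ring over the view.
final class RingSketch {
    /**
     * Returns the neighbor to the right of 'self' in the view, wrapping around
     * at the end of the list. Returns null if 'self' is alone or not a member.
     */
    static String pingTarget(List<String> view, String self) {
        int idx = view.indexOf(self);
        if (idx < 0 || view.size() < 2) {
            return null;
        }
        return view.get((idx + 1) % view.size());
    }
}
```

For the view [A, B, C], A pings B, B pings C and C pings A, so every member is monitored by exactly one other member.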
Failure detection - FD
Reliable delivery & FIFO
  Lossless and FIFO delivery for multicast and unicast messages
    Multicast: NAK and ACK
    Unicast: ACK
  Missing messages (gaps) are retransmitted
    Sender resends, or receiver requests retransmission
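The receiver-driven variant (NAK) relies on sequence numbers: a gap in the numbers received from a sender identifies the messages to request again. The class below is a minimal sketch of such a receive window for a single sender. NakWindowSketch and its methods are hypothetical names, and the actual NAKACK protocol is considerably more involved.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical receive window for one sender; not the NAKACK implementation.
final class NakWindowSketch {
    private long nextToDeliver = 1; // lowest seqno not yet delivered
    private final TreeMap<Long, String> buffered = new TreeMap<>(); // out-of-order messages

    /** Accepts a message and returns everything that can now be delivered in FIFO order. */
    List<String> receive(long seqno, String payload) {
        List<String> deliverable = new ArrayList<>();
        if (seqno < nextToDeliver) {
            return deliverable; // duplicate of an already delivered message
        }
        buffered.putIfAbsent(seqno, payload);
        while (buffered.containsKey(nextToDeliver)) {
            deliverable.add(buffered.remove(nextToDeliver));
            nextToDeliver++;
        }
        return deliverable;
    }

    /** Seqnos missing below the highest one seen; these would be requested from the sender. */
    List<Long> missing() {
        List<Long> gaps = new ArrayList<>();
        if (buffered.isEmpty()) {
            return gaps;
        }
        for (long s = nextToDeliver; s < buffered.lastKey(); s++) {
            if (!buffered.containsKey(s)) {
                gaps.add(s);
            }
        }
        return gaps;
    }
}
```

If messages 1 and 3 arrive, only 1 is delivered and missing() reports 2. Once 2 is retransmitted, 2 and 3 are delivered together, preserving FIFO order.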
Encryption
  Uses public/private encryption to join new member and get shared group key
  Shared key is used to encrypt all messages
  Group key is recomputed on joins/leaves
  SSL ConnectionTable
    As alternative, to be used in TCP
    Uses SSLSocket rather than Socket
Properties configuration
  Plain string format
    "UDP(mcast_addr=228.8.8.8;mcast_port=45566;ip_ttl=32;" +
    "mcast_send_buf_size=64000;mcast_recv_buf_size=64000):" +
    "PING(timeout=2000;num_initial_members=3):" +
    "MERGE2(min_interval=5000;max_interval=10000):" +
    "FD_SOCK:" +
    "VERIFY_SUSPECT(timeout=1500):" +
    "pbcast.NAKACK(max_xmit_size=8096;gc_lag=50;retransmit_timeout=600,1200,2400):" +
    "UNICAST(timeout=600,1200,2400,4800):" +
    "pbcast.STABLE(desired_avg_gossip=20000):" +
    "FRAG(frag_size=8096;down_thread=false;up_thread=false):" +
    "pbcast.GMS(join_timeout=5000;join_retry_timeout=2000;" +
    "shun=false;print_local_addr=true)"
  URL / XML
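The plain string format reads as a colon-separated list of protocols, each with an optional parenthesized list of key=value pairs separated by semicolons. The parser below is a sketch written from that observation only. It is not the JGroups configurator, StackStringSketch and ProtocolSpec are hypothetical names, and it assumes a well-formed string with balanced parentheses.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of parsing "PROT(k=v;k=v):PROT:PROT(k=v)"; assumes well-formed input.
final class StackStringSketch {
    static final class ProtocolSpec {
        final String name;
        final Map<String, String> props = new LinkedHashMap<>();
        ProtocolSpec(String name) {
            this.name = name;
        }
    }

    /** Splits the stack string on ':' outside parentheses, listing protocols in string order. */
    static List<ProtocolSpec> parse(String stack) {
        List<ProtocolSpec> result = new ArrayList<>();
        int depth = 0;
        int start = 0;
        for (int i = 0; i <= stack.length(); i++) {
            char c = i < stack.length() ? stack.charAt(i) : ':'; // sentinel ends the last protocol
            if (c == '(') {
                depth++;
            } else if (c == ')') {
                depth--;
            } else if (c == ':' && depth == 0) {
                String part = stack.substring(start, i).trim();
                if (!part.isEmpty()) {
                    result.add(parseProtocol(part));
                }
                start = i + 1;
            }
        }
        return result;
    }

    private static ProtocolSpec parseProtocol(String part) {
        int open = part.indexOf('(');
        if (open < 0) {
            return new ProtocolSpec(part); // no properties, e.g. "FD_SOCK"
        }
        ProtocolSpec spec = new ProtocolSpec(part.substring(0, open));
        String body = part.substring(open + 1, part.lastIndexOf(')'));
        for (String kv : body.split(";")) {
            int eq = kv.indexOf('=');
            if (eq > 0) {
                spec.props.put(kv.substring(0, eq).trim(), kv.substring(eq + 1).trim());
            }
        }
        return spec;
    }
}
```

Comma-separated values such as timeout=600,1200,2400,4800 are kept as a single string, leaving their interpretation to the protocol that owns the property.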
Advantages of protocol stacks
  Each property is implemented by 1 protocol
    Fragmentation, retransmission, ordering
  Protocols are assembled into a stack
  Stack has exactly the properties needed by the application / required by the network
  Can't get this with java.net.Socket, always comes with full TCP/IP
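The layering idea can be illustrated with a toy stack in which each layer does one job on the way down (towards the network) and undoes it on the way up (towards the application). Everything in the sketch below is hypothetical, including the Layer interface and the header-tagging layers. It does not mirror the JGroups Protocol class, and it only shows how single-purpose layers compose.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of a composable protocol stack; not JGroups code.
final class StackSketch {
    /** A layer transforms messages travelling down and up the stack. */
    interface Layer {
        String down(String msg);
        String up(String msg);
    }

    /** Toy layer: adds its header on the way down and strips it on the way up. */
    static final class HeaderLayer implements Layer {
        private final String tag;
        HeaderLayer(String name) {
            this.tag = "[" + name + "]";
        }
        public String down(String msg) {
            return tag + msg;
        }
        public String up(String msg) {
            if (!msg.startsWith(tag)) {
                throw new IllegalArgumentException("missing header " + tag);
            }
            return msg.substring(tag.length());
        }
    }

    private final List<Layer> layers; // index 0 is the top of the stack

    StackSketch(Layer... topToBottom) {
        this.layers = new ArrayList<>(Arrays.asList(topToBottom));
    }

    /** Passes a message from the application through every layer, top to bottom. */
    String down(String msg) {
        for (Layer l : layers) {
            msg = l.down(msg);
        }
        return msg;
    }

    /** Passes a message from the network through every layer, bottom to top. */
    String up(String msg) {
        for (int i = layers.size() - 1; i >= 0; i--) {
            msg = layers.get(i).up(msg);
        }
        return msg;
    }
}
```

Dropping a layer from the constructor call removes that property from the stack without touching the others, which is the point of assembling a stack from single-purpose protocols.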
Advantages of protocol stacks
  Small scope: a protocol does just one job, but does it well
  Protocol stacks are fashionable:
    Servlet 2.3 filters
    Interceptors (Corba, JBoss)
    AOP: separation of concerns, e.g. fragmentation should not be an application concern
Benefits
  Same application code, different protocol stacks (deployment issue)
  Application requirements reflected in protocol stack specification
  App focuses on domain specific issues
Building Blocks
  Replicated Cache
  NotificationBus
  Group RPC
Replicated Cache
  Shared state across a group
  Any change is replicated to all members
  New members acquire initial state from coord
  Structures supported
    Tree
    Hashmap
    Queues
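The two rules above (every change goes to all members, and a new member starts from the coordinator's state) can be mimicked in a single process. The sketch below is only a simulation. ReplicatedMapSketch and Replica are hypothetical names, and a plain loop stands in for the multicast that the real building blocks would send over a channel.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical single-process simulation of a replicated hashmap; not a JGroups building block.
final class ReplicatedMapSketch {
    /** One member's local copy of the shared state. */
    static final class Replica {
        final Map<String, String> state = new HashMap<>();
    }

    private final List<Replica> members = new ArrayList<>(); // index 0 is the coordinator

    /** A joining member first copies the coordinator's state, then takes part in updates. */
    Replica join() {
        Replica r = new Replica();
        if (!members.isEmpty()) {
            r.state.putAll(members.get(0).state); // state transfer from the coordinator
        }
        members.add(r);
        return r;
    }

    /** Stands in for a multicast: the change is applied to every member's copy. */
    void put(String key, String value) {
        for (Replica r : members) {
            r.state.put(key, value);
        }
    }
}
```

A member that joins after some updates sees them immediately through the initial state copy, and from then on all copies change together.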
NotificationBus
  Thin layer on Channel
  Notifications sent to all members
  Callback when notification is received
  Hook for state sharing
Group RPC
  Invoke a method call in all members
  Get a list of responses
  Wait for all responses, a majority, the first, or no response (use optional timeout)
  Handles crashed members correctly (no blocking)
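The waiting modes can be sketched with standard java.util.concurrent classes. The code below is not the JGroups group RPC building block. GroupRpcSketch, Mode, required and collect are hypothetical names, and each member's pending reply is modelled as a CompletableFuture. One simplifying assumption is that a failed call counts as "done", so a crashed member never blocks the caller.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of collecting group RPC responses; not JGroups code.
final class GroupRpcSketch {
    enum Mode { ALL, MAJORITY, FIRST, NONE }

    /** Number of completed calls to wait for, given the mode and the group size. */
    static int required(Mode mode, int members) {
        switch (mode) {
            case ALL:      return members;
            case MAJORITY: return members / 2 + 1;
            case FIRST:    return Math.min(1, members);
            default:       return 0; // NONE: do not wait at all
        }
    }

    /**
     * Waits until enough calls have completed (successfully or not) or the
     * timeout expires, then returns the successful responses available so far.
     */
    static <T> List<T> collect(List<CompletableFuture<T>> calls, Mode mode, long timeoutMillis) {
        CountDownLatch latch = new CountDownLatch(required(mode, calls.size()));
        for (CompletableFuture<T> call : calls) {
            call.whenComplete((value, error) -> latch.countDown());
        }
        try {
            latch.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // stop waiting, return what is there
        }
        List<T> responses = new ArrayList<>();
        for (CompletableFuture<T> call : calls) {
            if (call.isDone() && !call.isCompletedExceptionally()) {
                responses.add(call.getNow(null));
            }
        }
        return responses;
    }
}
```

In this sketch the timeout is an upper bound on the wait in every mode, so a member that never answers delays an ALL call by at most timeoutMillis.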
Serverless JMS
  JMS based on JGroups
  Peer-to-peer architecture rather than C/S
  Client publishing to a topic
    Instead of sending msg to server, and server distributes to multiple clients: publisher multicasts message
  JMS Server just another member
    Handles persistent messages (DB)
Serverless JMS
  [Figure: Client/Server Model versus Serverless Model. Client/Server: the Publisher sends to the JMS Server, which forwards to each of three Subscribers (cost: 4 unicasts). Serverless: the Publisher multicasts once to the JMS Server and the Subscribers, each of which accepts or discards the message (cost: 1 multicast).]
Serverless JMS
  Clients are still able to publish even when server is down
  Caveat: works in scenario where client and server are in same multicast-reachable NW
  Status
    Topics/Queues available
    No TX/XA, no durable subscriptions, no persistent messages
    Download (standalone) beta at jboss.org
Where is JGroups used?
  JBoss
    Clustering
      Replication of entity beans, SLSBs and SFSBs
      HA-JNDI
      Cache invalidation
      Session repl (integrated Tomcat, Jetty)
    Serverless JMS
    Cache
      Replicated transactional clustered cache
Where is JGroups used?
  Jonas appserver (clustering)
  GroupPac (FT-CORBA impl)
  GCT: port to .NET
  Replicated Caching
    OpenSymphony OSCache
    Jakarta Turbine's JCS
    Swarmcache
Where is JGroups used?
  Session replication
    Jetty
    Tomcat 4.x
    Work in progress on plugin architecture for Tomcat 5.x
  Unofficial ones...
Performance
  4 nodes, 1 or 2 senders
  750MHz SunBlade 1000
    512MB, 100MB switched ethernet
  JGroups 2.1
  8000 10K msgs, in 200 bursts of 20 (2 senders), sleep after burst = 5ms
    451 msgs/s == 4.5MB/s throughput
    Resident heap size 35MB max (-Xmx128m)
Performance
  1.4 billion messages total
  4 nodes, 2 senders
  Message size = 10K
  Average msgs/s: 350
  Max resident mem: 35M (-Xmx128m)
  Tests available as part of JG distro
    Includes gnuplot scripts to generate graphs
Current and future projects
  JBossCache, Serverless JMS
  Port to J2ME (first version available on www.jgroups-me.org)
  hsqldb (HyperSonic) database replication
  JCache JSR 107 compliant impl (JBoss Cache)
  Potential work on GroupComm JSR
    jcluster project on dev.java.net
Links
  www.jgroups.org
    "Papers and Articles": link to IBM developerWorks
Questions?