Copyright © 2011, Oracle and/or its affiliates. All rights reserved.
Achieving the performance benefits of Infiniband in Java
Mark Falco, Oracle Coherence Development
The following is intended to outline general product use and direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.
Exalogic / Exabus
Exalogic - Hardware
• 24 cores
• 96GB RAM
• 30 compute nodes in a full rack
• QDR Infiniband
Infiniband
• High throughput (~32 Gb/s in QDR)
• Low latency (~1us)
• Super jumbo frames (MTU 64KB)
• Supports the standard IP stack (UDP/TCP)
• Verbs-based API
• Remote Direct Memory Access (RDMA)
– pre-registered memory accessible to remote machines
– transfers proceed without involving the remote host's CPU
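A back-of-the-envelope calculation with the nominal figures above shows why small messages are bound by latency rather than bandwidth. With S the message size in bytes and B the link data rate (~32 Gb/s), even a full 64KB frame spends only about 16us on the wire, and a 1KB message about a quarter of a microsecond, which is less than the ~1us latency. These are idealized numbers that ignore protocol and software overhead.

```latex
t_{\text{wire}} = \frac{8S}{B}, \qquad
t_{64\,\text{KB}} = \frac{8 \times 65536\ \text{bit}}{32 \times 10^{9}\ \text{bit/s}} \approx 16.4\ \mu\text{s}, \qquad
t_{1\,\text{KB}} = \frac{8 \times 1024\ \text{bit}}{32 \times 10^{9}\ \text{bit/s}} \approx 0.26\ \mu\text{s}
```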
Exabus - Exalogic I/O and Network Design
Eliminates cloud, cluster and network virtualization I/O bottlenecks
[Diagram: Exabus (InfiniBand I/O backplane) network. Components shown: Exalogic X2-2 compute nodes, spine switch, ZFS storage, management switch, Ethernet gateway switches, Exadata, Exalogic, SPARC SuperCluster, a standard Oracle Database, the Data Center Service Network (10GbE), the Management Network (GbE) and the Data Center Mgmt Network (GbE), linked by IB, 10GbE and GbE connections.]
Exabus - Optimizations
Direct Memory I/O for Java
• New Java APIs and Exalogic Elastic Cloud Software
– Low-latency Java support for Infiniband
– Optimized implementation for Exalogic Infiniband
• Surfacing low-level advanced networking capabilities
Infiniband - Socket Direct Protocol
• Streaming sockets API, i.e. SOCK_STREAM
• Easily integrated into TCP-based applications
• Zero-copy or kernel-bypass
• Java availability
– Proprietary in JDK6
– Standard in JDK7
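Because SDP keeps the stream-socket API, existing java.net code does not change; on JDK 7 (Solaris and Linux) SDP is switched on at launch time with the `com.sun.sdp.conf` system property pointing to a file of `bind`/`connect` rules. The sketch below is ordinary TCP code that runs anywhere over loopback; the launch command and the rule addresses in its comments are illustrative assumptions for an InfiniBand-equipped host, and `SdpEcho` is a hypothetical name.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Plain TCP code with nothing SDP-specific in it. On JDK 7 (Solaris/Linux) the
// same java.net sockets can be moved onto SDP purely at launch time, e.g.
//   java -Dcom.sun.sdp.conf=sdp.conf -Djava.net.preferIPv4Stack=true MyApp
// with an sdp.conf such as (addresses and ports are illustrative only):
//   bind    192.168.1.1      *
//   connect 192.168.1.0/24   1024-*
class SdpEcho {
    /** Sends one line to a local echo server and returns the echoed line. */
    static String roundTrip(String line) throws IOException, InterruptedException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (final ServerSocket server = new ServerSocket(0, 1, loopback)) {
            // Server side: accept one connection and echo one line back.
            Thread echo = new Thread() {
                @Override public void run() {
                    try (Socket s = server.accept();
                         BufferedReader in = new BufferedReader(
                                 new InputStreamReader(s.getInputStream(), StandardCharsets.UTF_8));
                         Writer out = new OutputStreamWriter(s.getOutputStream(), StandardCharsets.UTF_8)) {
                        out.write(in.readLine() + "\n");
                        out.flush();
                    } catch (IOException e) {
                        e.printStackTrace();
                    }
                }
            };
            echo.start();
            // Client side: an ordinary stream socket.
            try (Socket client = new Socket(loopback, server.getLocalPort());
                 BufferedReader in = new BufferedReader(
                         new InputStreamReader(client.getInputStream(), StandardCharsets.UTF_8));
                 Writer out = new OutputStreamWriter(client.getOutputStream(), StandardCharsets.UTF_8)) {
                out.write(line + "\n");
                out.flush();
                String reply = in.readLine();
                echo.join();
                return reply;
            }
        }
    }
}
```

The point of the example is what is absent: no SDP classes or calls appear in the code, so the choice between TCP and SDP stays a deployment decision.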
Infiniband - Coherence Integration
• Initially attempted over standard UDP
– Experimented with TCP/SDP
• Required many co-located nodes to utilize the bandwidth
– Dozens of nodes needed to max out the HCA
• Latencies
– Large objects: benefit from Infiniband without a protocol change
– Small objects: on par with standard Ethernet (300-600us)
MessageBus
• Binary low-level message transport
– Multi-point addressing
– Reliable ordered delivery
– Asynchronous event-based programming model
• Pluggable provider-based framework
– SocketBus (TCP/SDP)
– Native RDMA Exabus
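The "pluggable provider" idea can be sketched as a registry that maps a protocol name to a factory, so that callers choose a transport by address and never name the concrete class. This is a hypothetical illustration: `Transport`, `TransportProvider`, `TransportRegistry` and any protocol names registered with them are invented for the example and are not the MessageBus provider API.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of a pluggable, provider-based transport lookup.
// Transport, TransportProvider, TransportRegistry and the protocol names used
// with them are illustrative; they are not the MessageBus provider API.
interface Transport {
    String describe();
}

interface TransportProvider {
    /** Creates a transport bound to the given provider-specific address. */
    Transport create(String address);
}

final class TransportRegistry {
    private final Map<String, TransportProvider> providers = new ConcurrentHashMap<>();

    /** Plugs in a provider for one protocol name, e.g. a socket or an RDMA implementation. */
    void register(String protocol, TransportProvider provider) {
        if (providers.putIfAbsent(protocol, provider) != null) {
            throw new IllegalStateException("provider already registered for " + protocol);
        }
    }

    /** Resolves "protocol://address" to a transport created by the matching provider. */
    Transport open(String url) {
        int sep = url.indexOf("://");
        if (sep < 0) {
            throw new IllegalArgumentException("missing protocol in " + url);
        }
        TransportProvider provider = providers.get(url.substring(0, sep));
        if (provider == null) {
            throw new IllegalArgumentException("no provider registered for " + url);
        }
        return provider.create(url.substring(sep + 3));
    }
}
```

Usage: after `registry.register("socket", addr -> () -> "SocketBus@" + addr)`, the call `registry.open("socket://10.0.0.1:8000").describe()` returns `"SocketBus@10.0.0.1:8000"`. Application code depends only on the address string, which is what lets the same code run over TCP/SDP or native RDMA.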
Exabus - MessageBus
Next generation of Exalogic performance optimization
[Diagram: Exalogic I/O software stack. Components shown: Coherence, WebLogic, Tuxedo; any Linux or Solaris app; IB Transport APIs (MessageBus, SDP); native RDMA MessageBus; TCP/IP; IPoIB; EoIB; InfiniBand Core; Hardware and Firmware. Legend: New for Exalogic V1.1; Exalogic V1.0.]
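MessageBus - API
```java
// Collector, Event, EndPoint and BufferSequence are product types not shown on this slide.
public interface MessageBus {
    void setEventCollector(Collector<Event> collector); // all bus events are delivered here
    void open();                      // start the bus (OPEN event)
    void close();                     // stop the bus (CLOSE event)
    void connect(EndPoint peer);      // begin a connection (CONNECT event)
    void disconnect(EndPoint peer);   // end confirmed delivery to the peer (DISCONNECT event)
    void release(EndPoint peer);      // end the peer's event stream (RELEASE event)
    void flush();                     // push pending sends out to the transport
    void send(EndPoint peer, BufferSequence buf, Object receipt); // receipt comes back with a RECEIPT event
}
```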
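The call sequence this interface implies can be tried against a toy, in-memory stand-in. Everything below is hypothetical: `Collector`, `EndPoint`, `Event` and `EventType` are simplified substitutes for the product types named on the slide, `ByteBuffer` replaces `BufferSequence`, and `LoopbackBus` is not a real provider. It assumes that `send` only queues a message until `flush`, and that a non-null receipt is handed back in a `RECEIPT` event.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical, simplified stand-ins for the types named on the slide.
// ByteBuffer replaces BufferSequence; this is not the real Exabus code.
interface Collector<T> {
    void add(T item);
}

final class EndPoint {
    final String name;
    EndPoint(String name) { this.name = name; }
    @Override public String toString() { return name; }
}

enum EventType { OPEN, CLOSE, CONNECT, DISCONNECT, RELEASE, MESSAGE, RECEIPT }

final class Event {
    final EventType type;
    final EndPoint peer;    // null for bus-wide events (OPEN, CLOSE)
    final Object content;   // message buffer for MESSAGE, receipt for RECEIPT
    Event(EventType type, EndPoint peer, Object content) {
        this.type = type;
        this.peer = peer;
        this.content = content;
    }
}

// Toy bus that talks only to itself: send() queues, flush() "transmits" each
// queued message, delivering it locally (MESSAGE) and confirming it (RECEIPT).
final class LoopbackBus {
    private static final class Pending {
        final EndPoint peer; final ByteBuffer buf; final Object receipt;
        Pending(EndPoint peer, ByteBuffer buf, Object receipt) {
            this.peer = peer; this.buf = buf; this.receipt = receipt;
        }
    }

    private final Queue<Pending> pending = new ArrayDeque<>();
    private Collector<Event> collector;

    void setEventCollector(Collector<Event> collector) { this.collector = collector; }
    void open()                    { emit(EventType.OPEN, null, null); }
    void close()                   { emit(EventType.CLOSE, null, null); }
    void connect(EndPoint peer)    { emit(EventType.CONNECT, peer, null); }
    void disconnect(EndPoint peer) { emit(EventType.DISCONNECT, peer, null); }
    void release(EndPoint peer)    { emit(EventType.RELEASE, peer, null); }

    void send(EndPoint peer, ByteBuffer buf, Object receipt) {
        pending.add(new Pending(peer, buf, receipt));   // returns immediately
    }

    void flush() {
        for (Pending p; (p = pending.poll()) != null; ) {
            emit(EventType.MESSAGE, p.peer, p.buf);
            if (p.receipt != null) emit(EventType.RECEIPT, p.peer, p.receipt);
        }
    }

    private void emit(EventType type, EndPoint peer, Object content) {
        collector.add(new Event(type, peer, content));
    }
}
```

With a collector attached, the sequence `open()`, `connect(p)`, `send(p, buf, "r1")`, `flush()`, `close()` produces OPEN, CONNECT, MESSAGE, RECEIPT, CLOSE in that order, and nothing reaches the collector between `send` and `flush`.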
MessageBus - Events
Event               Indicates
OPEN                Start of bus event stream
CLOSE               End of bus event stream
CONNECT             Start of per-connection event stream
DISCONNECT          End of confirmed delivery per-connection event stream
RELEASE             End of per-connection event stream
MESSAGE             Local message delivery
RECEIPT             Message delivery confirmation
BACKLOG_EXCESSIVE   Start of backlog condition
BACKLOG_NORMAL      End of backlog condition
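A collector usually reacts to only some of these events. As a hypothetical example, a sender could treat BACKLOG_EXCESSIVE and BACKLOG_NORMAL as pause and resume signals, and count RECEIPT events to know how many sends are still unconfirmed. The enum mirrors the table; `BacklogAwareSender` and its `Consumer<String>` transport are invented stand-ins for a real bus.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.function.Consumer;

// Hypothetical sketch of an event collector that uses the backlog events for
// flow control and RECEIPT events to count unconfirmed sends. The enum mirrors
// the event table; the class and method names are illustrative only.
enum BusEventType {
    OPEN, CLOSE, CONNECT, DISCONNECT, RELEASE, MESSAGE, RECEIPT,
    BACKLOG_EXCESSIVE, BACKLOG_NORMAL
}

final class BacklogAwareSender {
    private final Consumer<String> transport;       // stands in for bus.send(...)
    private final Queue<String> held = new ArrayDeque<>();
    private boolean backlogged;
    private int unconfirmed;

    BacklogAwareSender(Consumer<String> transport) { this.transport = transport; }

    // synchronized: bus events may arrive on a different thread than offer() callers
    synchronized void offer(String msg) {
        if (backlogged) {
            held.add(msg);                           // hold locally while the bus is backed up
        } else {
            transmit(msg);
        }
    }

    synchronized void onEvent(BusEventType type) {
        switch (type) {
            case BACKLOG_EXCESSIVE:
                backlogged = true;
                break;
            case BACKLOG_NORMAL:
                backlogged = false;
                // drain held messages; stop if a transmit re-reports BACKLOG_EXCESSIVE
                while (!backlogged && !held.isEmpty()) transmit(held.poll());
                break;
            case RECEIPT:
                unconfirmed--;                       // one earlier send is now confirmed
                break;
            default:
                break;                               // other events are not relevant here
        }
    }

    private void transmit(String msg) {
        transport.accept(msg);
        unconfirmed++;
    }

    synchronized int unconfirmed() { return unconfirmed; }
    synchronized int held() { return held.size(); }
}
```

Holding messages locally keeps `offer()` non-blocking, in line with the asynchronous model, at the cost of unbounded local memory if a backlog persists.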
MessageBus - Native RDMA
• Zero-copy and kernel-bypass
• Optimized for sender latency
• Predictive notifications avoid costly interrupts
• Asynchronous task-based system manages the protocol
• Custom DirectByteBuffer
– allows for zero-copy
– reduces GC pressure
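The slide does not describe the custom DirectByteBuffer itself, but the standard-JDK idea it builds on can be sketched. In OpenJDK's NIO implementation, channel I/O on a heap `ByteBuffer` is staged through a temporary direct buffer, whereas a direct buffer (`ByteBuffer.allocateDirect`) lives outside the GC-managed heap and can be handed to the OS as-is. Direct buffers are comparatively costly to allocate, so a common pattern is to allocate once and recycle. `DirectBufferPool` below is a hypothetical, generic pool, not the Exabus implementation.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.ConcurrentLinkedQueue;

// Generic illustration, not the Exabus DirectByteBuffer: fixed-size direct
// (off-heap) buffers are allocated up front and recycled, so steady-state
// messaging reuses the same buffers instead of allocating new ones.
final class DirectBufferPool {
    private final int bufferSize;
    private final ConcurrentLinkedQueue<ByteBuffer> free = new ConcurrentLinkedQueue<>();

    DirectBufferPool(int bufferSize, int preallocate) {
        this.bufferSize = bufferSize;
        for (int i = 0; i < preallocate; i++) {
            free.add(ByteBuffer.allocateDirect(bufferSize));
        }
    }

    /** Returns a cleared direct buffer, allocating a new one only if the pool is empty. */
    ByteBuffer acquire() {
        ByteBuffer b = free.poll();
        if (b == null) {
            b = ByteBuffer.allocateDirect(bufferSize);
        }
        b.clear();
        return b;
    }

    /** Returns a buffer to the pool; rejects buffers of the wrong kind or size. */
    void release(ByteBuffer b) {
        if (!b.isDirect() || b.capacity() != bufferSize) {
            throw new IllegalArgumentException("not a direct buffer of this pool's size");
        }
        free.offer(b);
    }

    int available() { return free.size(); }
}
```

A pool like this trades memory held in reserve for fewer allocations; a real transport would also have to make sure a buffer is not released while the HCA or a reader is still using it.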
Message Transfer - Native RDMA
[Diagram: Sender and Receiver, each with a Collector and a Ring Buffer. Labeled elements: Message, RDMA Write Header, Allocation, RDMA Read Body, Delivery, RDMA Write Receipt.]
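The diagram's labels suggest a pull-based transfer. One plausible reading, sketched below as a single-process simulation: the sender stages the message body in registered memory and RDMA-writes a small header into the receiver's ring buffer; the receiver allocates a buffer, RDMA-reads the body, delivers it to its collector and RDMA-writes a receipt into the sender's ring buffer, which the sender then delivers to its own collector. The step order is inferred from the labels, the RDMA operations are simulated with ordinary memory copies, and all class names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Single-process simulation of one plausible reading of the diagram. The
// "RDMA write" and "RDMA read" steps are ordinary memory copies here; on real
// hardware the HCA performs them without involving the remote CPU.
// All class, field and method names are hypothetical.
final class Ring {
    private final long[] slots;
    private long head, tail;                 // head: next slot to read, tail: next slot to write

    Ring(int capacity) { slots = new long[capacity]; }

    /** Stores one 64-bit entry; stands in for the remote side's RDMA write. */
    boolean write(long value) {
        if (tail - head == slots.length) return false;   // ring is full
        slots[(int) (tail++ % slots.length)] = value;
        return true;
    }

    /** Local polling (no interrupt): returns the next entry, or null if empty. */
    Long poll() {
        return head == tail ? null : slots[(int) (head++ % slots.length)];
    }
}

final class RdmaTransferSim {
    final ByteBuffer senderMemory = ByteBuffer.allocateDirect(1024); // "registered" sender memory
    final Ring receiverRing = new Ring(16);       // headers, written by the sender
    final Ring senderRing = new Ring(16);         // receipts, written by the receiver
    final List<String> receiverDelivered = new ArrayList<>();
    final List<Integer> senderReceipts = new ArrayList<>();
    private int nextOffset;                       // bump allocator; this sketch never reclaims

    /** Sender: stage the body in registered memory, then RDMA-write a header. */
    void send(String msg) {
        byte[] body = msg.getBytes(StandardCharsets.UTF_8);
        ByteBuffer staging = senderMemory.duplicate();
        staging.position(nextOffset);
        staging.put(body);                        // BufferOverflowException once memory is exhausted
        long header = ((long) nextOffset << 32) | body.length;   // (offset, length)
        nextOffset += body.length;
        if (!receiverRing.write(header)) throw new IllegalStateException("receiver ring full");
    }

    /** Receiver: poll headers, allocate, RDMA-read the body, deliver, RDMA-write a receipt. */
    void receiverPoll() {
        for (Long h; (h = receiverRing.poll()) != null; ) {
            int offset = (int) (h >>> 32);
            int length = (int) h.longValue();
            ByteBuffer local = ByteBuffer.allocate(length);          // allocation
            ByteBuffer remote = senderMemory.duplicate();            // "RDMA read" of the body
            remote.position(offset);
            remote.limit(offset + length);
            local.put(remote);
            receiverDelivered.add(new String(local.array(), StandardCharsets.UTF_8)); // delivery
            if (!senderRing.write(offset)) throw new IllegalStateException("sender ring full");
        }
    }

    /** Sender: poll receipts and deliver them; here a receipt is the message's offset. */
    void senderPoll() {
        for (Long r; (r = senderRing.poll()) != null; ) senderReceipts.add((int) r.longValue());
    }
}
```

In this reading only the small header travels eagerly; the receiver pulls the body when it has allocated space, which is one way a design could keep sender-side latency low.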
MessageBus - Coherence Integration
• Pluggable message transport
• MessageBus per service
– Legacy system utilized a single transport for the entire JVM
• Increased parallel processing
– Network I/O
– Message deserialization
• Message delivery: 1 Java context switch vs. 3
– Potential for zero context switches
MessageBus - Coherence Integration
[Diagram: three members connected over Exabus RDMA; each service on each member has its own MessageBus]
Member 1
• PartitionedCacheService (Cache: D, E, F): MessageBus tmb://192.168.1.2:8000.2
• PartitionedCacheService (Cache: A, B, C): MessageBus tmb://192.168.1.1:8000.1
• InvocationService: MessageBus tmb://192.168.1.2:8000.3
Member 2
• PartitionedCacheService (Cache: D, E, F): MessageBus tmb://192.168.1.2:8001.2
• PartitionedCacheService (Cache: A, B, C): MessageBus tmb://192.168.1.1:8001.1
• InvocationService: MessageBus tmb://192.168.1.2:8001.3
Member 3
• PartitionedCacheService (Cache: D, E, F): MessageBus tmb://192.168.1.2:8002.2
• PartitionedCacheService (Cache: A, B, C): MessageBus tmb://192.168.1.1:8002.1
• InvocationService: MessageBus tmb://192.168.1.2:8002.3
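In the diagram every service on every member has its own bus address of the form tmb://host:port.n. Judging from the pattern alone (one port per member, .1 to .3 per service), a hypothetical parser might split such an address into protocol, host, port and sub-port. That interpretation and the `BusAddress` class are assumptions, not the documented address format.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for bus addresses as drawn in the diagram, e.g.
// "tmb://192.168.1.2:8000.2". Reading the trailing ".2" as a per-service
// sub-port is an assumption inferred from the diagram, not a documented format.
final class BusAddress {
    private static final Pattern FORMAT = Pattern.compile("([a-z]+)://([^:/]+):(\\d{1,5})\\.(\\d+)");

    final String protocol;
    final String host;
    final int port;      // appears to identify the member (8000, 8001, 8002)
    final int subport;   // appears to identify the service on that member (1, 2, 3)

    private BusAddress(String protocol, String host, int port, int subport) {
        this.protocol = protocol;
        this.host = host;
        this.port = port;
        this.subport = subport;
    }

    static BusAddress parse(String address) {
        Matcher m = FORMAT.matcher(address);
        if (!m.matches()) {
            throw new IllegalArgumentException("unrecognized bus address: " + address);
        }
        return new BusAddress(m.group(1), m.group(2),
                Integer.parseInt(m.group(3)), Integer.parseInt(m.group(4)));
    }

    @Override public String toString() {
        return protocol + "://" + host + ":" + port + "." + subport;
    }
}
```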
MessageBus - Coherence Integration
• The network is no longer the bottleneck
• Measured improvements
– A small number of nodes can max out the HCA
– Latencies reduced to ~100us (RDMA bus) and ~200us (SocketBus)
• Future direction
– More MessageBuses per service
– Prototyped solution drops latency to 70us
– Designs to drop latency to 40us