Pure Genius: How To Get Mainframe-Like Scalability & Availability For Midrange DB2
The Information Management Specialists
James Gill & Julian Stuhler
GSE November 2010
Jun 08, 2015
Agenda
• pureScale Overview
  ◦ Why should you care?
  ◦ Architectural overview
• Experiences
  ◦ Triton’s pureScale environment
  ◦ Installation
  ◦ Performance
  ◦ Resilience
Why pureScale?
• Availability
  ◦ Any system outage has direct impact on profitability and customer retention
  ◦ Serving multiple geographies makes planned downtime more difficult
• Agility / Scalability
  ◦ Almost every business has major workload spikes, with significant unused capacity at other times
  ◦ Need to be able to rapidly scale up/down in a cost effective way, with little or no change needed to the application
What is pureScale?
• Optional feature for DB2 for Linux, UNIX and Windows
  ◦ Current support is for AIX on System p and SUSE Linux on IBM System x servers only
• Implements a shared-disk clustering solution to support high scalability and availability
  ◦ Up to 128 members in initial release
• Based on proven “data sharing” technology used in DB2 for z/OS for past 15 years
• Capacity-based charging model allows cluster to be easily expanded/contracted
• Little or no application change required
pureScale Architecture
[Diagram: Primary CF and Secondary CF (GBP, GLM, SCA); Member A, Member B and Member C; IB Interconnect; Shared Database]
Architecture - Members
[Diagram: a member, showing agents and threads; bufferpools; log buffers; dbheap, sort heap, etc; logs; Shared Database]
Architecture - CFs
[Diagram: CF structures: SCA; GBP (directory, data pages, index pages); GLM (hash table, lock entries)]
Architecture - Infiniband
• Low latency (1 – 1.3 microseconds)
• High speed (300Gb/s – EDR 12x)
• RDMA – remote direct memory access
  ◦ NIC managed
    ► No processor interrupt
    ► 5 – 30 microsecond access time
Architecture – Infiniband 2
www.infinibandta.org/content/pages.php?pg=technology_overview
pureScale Scalability
[Chart: throughput relative to 1 member (y-axis, 0 to 14) against number of pureScale members (x-axis, 0 to 14); data labels recoverable from the extraction: 3.9, 7.6, 10.4 and 11.98 (the last may be a garbled 1.98)]
Source: Internal IBM Lab Tests
Practical Experiences
• Triton’s Commodity Cluster
• IBM’s nano-cluster
• Architecture
• Installation
• Performance
• Resilience
Triton’s Commodity Cluster
• Objectives
  ◦ Undertake basic validation of IBM’s performance & scalability claims
  ◦ Build technical experience in a pureScale environment and establish platform for ongoing R&D
  ◦ Assist IBM with early beta testing
• Constraints
  ◦ Budget < £1K
  ◦ Easily portable for customer demos etc.
Commodity Cluster - Architecture
[Diagram: CF, Node 1 and Node 2; 1GB Ethernet; iSCSI NAS; TE App Server]
Triton’s Commodity Cluster
• 2 member nodes and one CF
• Each node:
  ◦ Intel D510M0 (Dual core 1GHz Atom)
  ◦ 4GB RAM
  ◦ 40GB SSD
• Shared disk
  ◦ iSCSI 1TB (QNAP TS110)
• DB2 9.8 pureScale FP2 development image
• Technology Explorer used for workload and monitoring
  ◦ www.sourceforge.net/projects/db2mc
IBM nanoCluster - Architecture
[Diagram: node101 (GPFS disk, app servers); node102 (member, CF); node103 (member, CF); 1GB Ethernet]
IBM’s pureScale nanoCluster
• 2 combined member and CF nodes
• One shared disk and app server tier node
• Each node:
  ◦ Intel D510M0 (dual core 1GHz Atom)
  ◦ 4GB RAM
  ◦ Disk
    ► 40GB SSD (pureScale nodes)
    ► 100GB 7200rpm SATA (shared disk/app server node)
Triton pureScale Experiences
• Installation experiences
  ◦ SLES 10 requires
    ► compat-libstdc++-5.0.7-22.2.x86_64.rpm
  ◦ db2cluster resolves iSCSI mount issues
  ◦ Ensure FQDN names in /etc/hosts
  ◦ Very slick considering the component count
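The FQDN point above can be checked before installation. The following is a minimal sketch, not part of the pureScale tooling: it scans hosts-file text and reports entries with no dotted (fully qualified) name. The function name `hosts_missing_fqdn` and the IP addresses in the sample are hypothetical; the `purescale.demo` domain is borrowed from the server list shown later in the deck.

```python
def hosts_missing_fqdn(hosts_text):
    """Return (ip, names) pairs whose hosts entry has no dotted name."""
    missing = []
    for raw in hosts_text.splitlines():
        # Drop comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        ip, *names = line.split()
        # Loopback entries are not expected to carry an FQDN.
        if ip.startswith("127.") or ip == "::1":
            continue
        if not any("." in name for name in names):
            missing.append((ip, names))
    return missing


# Hypothetical sample; the addresses are illustrative only.
SAMPLE = """\
127.0.0.1      localhost
192.168.1.102  node102.purescale.demo node102
192.168.1.103  node103   # short name only
"""

for ip, names in hosts_missing_fqdn(SAMPLE):
    print(f"{ip}: no FQDN among {names}")
```

To check a real system, pass in the contents of /etc/hosts instead of `SAMPLE`.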
Triton pureScale Experiences
• Commodity Cluster Performance
  ◦ Technology Explorer
    ► WMD Java workload driver (WLB enabled)
    ► 2.5M row table
    ► Vanilla installation
    ► 32 threads, 25ms think time
  ◦ Delivered 1000tps @ 95% CPU load
Performance - nanoCluster
• 14 simulated client connections
• WLB ACR enabled
• 1ms think time
• 250,000 row table
• Delivering c. 5500tps @ 50% CPU load
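The quoted workload figures can be sanity-checked with the interactive response time law for a closed system, R = N / X - Z (N clients, throughput X, think time Z). This is a rough sketch under two assumptions: each thread or connection submits one transaction at a time, and the 14-connection, 1ms think time workload is the one behind the c. 5500tps nanoCluster figure. The function names are illustrative.

```python
def implied_response_time(clients, think_time_s, throughput_tps):
    """Interactive response time law: R = N / X - Z (seconds)."""
    return clients / throughput_tps - think_time_s


def max_throughput(clients, think_time_s):
    """Upper bound on throughput if response time were zero: N / Z."""
    return clients / think_time_s


# Commodity cluster: 32 threads, 25ms think time, 1000tps.
commodity_r = implied_response_time(32, 0.025, 1000)
# nanoCluster (assumed pairing): 14 connections, 1ms think time, c. 5500tps.
nano_r = implied_response_time(14, 0.001, 5500)

print(f"commodity: {commodity_r * 1000:.1f} ms per transaction, "
      f"ceiling {max_throughput(32, 0.025):.0f} tps")
print(f"nano:      {nano_r * 1000:.1f} ms per transaction, "
      f"ceiling {max_throughput(14, 0.001):.0f} tps")
```

Under these assumptions the think time caps the commodity workload at 1280tps, so 1000tps is already close to what 32 threads could drive regardless of CPU load.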
Workload Balancing (WLB)
• Members track and share available capacity
  ◦ db2pd -serverlist
• Slipstreamed periodically to clients
  ◦ DB2 9.7 FP1 or higher
• Transaction workload balance
  ◦ On UR boundaries
WLB
• db2pd -serverlist

Database Member 0 -- Active -- Up 0 days 00:20:43

Server List:
   Time: Tue Nov 2 07:26:54
   Database Name: DTW
   Count: 2

   Hostname                Non-SSL Port  SSL Port  Priority
   node102.purescale.demo  50001         0         52
   node103.purescale.demo  50001         0         47
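For monitoring scripts it can help to turn the server list rows into structured data. The sketch below parses the two data rows shown above and converts the priorities into relative shares. Treating the priority as a simple proportional weight is an assumption made for illustration, not a description of how the DB2 client actually routes work; the function names are hypothetical.

```python
import re

# Data rows copied from the db2pd -serverlist output above.
SERVER_LIST = """\
Hostname Non-SSL Port SSL Port Priority
node102.purescale.demo 50001 0 52
node103.purescale.demo 50001 0 47
"""

# A data row is a hostname followed by exactly three integers;
# the header line does not match this pattern.
ROW = re.compile(r"^\s*(\S+)\s+(\d+)\s+(\d+)\s+(\d+)\s*$")


def parse_server_list(text):
    """Return one dict per server row found in the text."""
    servers = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            host, port, ssl_port, priority = match.groups()
            servers.append({
                "host": host,
                "port": int(port),
                "ssl_port": int(ssl_port),
                "priority": int(priority),
            })
    return servers


def priority_shares(servers):
    """Assumed interpretation: each priority as a fraction of the total."""
    total = sum(s["priority"] for s in servers)
    return {s["host"]: s["priority"] / total for s in servers}


servers = parse_server_list(SERVER_LIST)
for host, share in priority_shares(servers).items():
    print(f"{host}: {share:.1%}")
```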
Resilience – CF Failure
• Simulate failure of the primary CF

db2lco@node102:~> db2instance -list
ID   TYPE    STATE    HOME_HOST  CURRENT_HOST  ALERT  PARTITION_NUMBER
--   ----    -----    ---------  ------------  -----  ----------------
0    MEMBER  STARTED  node102    node102       NO     0
1    MEMBER  STARTED  node103    node103       NO     0
128  CF      PRIMARY  node102    node102       NO     -
129  CF      PEER     node103    node103       NO     -

HOSTNAME  STATE   INSTANCE_STOPPED  ALERT
--------  -----   ----------------  -----
node103   ACTIVE  NO                NO
node102   ACTIVE  NO                NO

db2lco@node102:~> ps -ef | grep ca-server
db2lco  4158  4153  0 05:30 ?      00:00:31 ca-mgmnt-lwd -i128 -p56000 -k8521f27a -d/home/db2lco/sqllib/db2dump -e/home/db2lco/sqllib/cf/ca-server -f
db2lco  4164  4158 18 05:30 ?      00:12:41 /home/db2lco/sqllib/cf/ca-server -i 128 -p 56000 -k 1474562 -s 0 -f
db2lco 32120 21933  0 06:37 pts/0  00:00:00 grep ca-server
db2lco@node102:~> kill -9 4164
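When repeating this test, it is useful to capture the CF roles before and after the kill so the two snapshots can be compared. The following sketch parses the first table of the db2instance -list output shown above and reports which host holds each CF state. It only covers the "before" snapshot from the slide; the function names are hypothetical, and the parsing assumes the seven-column layout seen here.

```python
# First table of the db2instance -list output above (pre-failure).
INSTANCE_LIST = """\
ID TYPE STATE HOME_HOST CURRENT_HOST ALERT PARTITION_NUMBER
-- ---- ----- --------- ------------ ----- ----------------
0 MEMBER STARTED node102 node102 NO 0
1 MEMBER STARTED node103 node103 NO 0
128 CF PRIMARY node102 node102 NO -
129 CF PEER node103 node103 NO -
"""

KEYS = ("id", "type", "state", "home_host",
        "current_host", "alert", "partition_number")


def parse_instance_list(text):
    """Return one dict per member/CF row; header and rule lines are skipped."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows have seven columns and start with a numeric ID.
        if len(fields) == len(KEYS) and fields[0].isdigit():
            rows.append(dict(zip(KEYS, fields)))
    return rows


def cf_roles(rows):
    """Map each CF state (e.g. PRIMARY, PEER) to its current host."""
    return {r["state"]: r["current_host"] for r in rows if r["type"] == "CF"}


before = cf_roles(parse_instance_list(INSTANCE_LIST))
print(before)
```

Running the same parse on output captured after the kill and comparing the two dictionaries would show whether, and where, the primary role has moved.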
Resilience – CF Failure - Impact
pureScale Summary
• Robust clustering technology based on a proven architecture
  ◦ Scalability
  ◦ Resilience
• No code change to scale out
• Excellent price/performance characteristics
• Initial customer implementations are under way
• What are you waiting for?
Feedback / Questions
James Gill – [email protected]
Julian Stuhler – [email protected]
www.triton.co.uk
pureScale webcast series each Tuesday