Jun 10, 2015
Accelerating MySQL in Open Source Hyperscale Systems
Doubling MySQL performance with Native Flash Access
Torben Mathiasen
Percona Live 2013 – Santa Clara
2
The Hyperscale Market
• Characterized by distributed scale-out architecture
• Maximize compute efficiency with high volume servers
• Cost effective on both the hardware and software side
• Always looking for improvements. Even small changes can have a huge impact on overall efficiency
3
Hyperscale, continued
Scale-out for different purposes
• Increase DRAM
• Increase CPU core count
• Increase I/O performance with more storage devices
Scaling out does increase complexity
• Software architecture and efficiency
• Network latency and cost
• Often not using full hardware resources per box
4
Hyperscale software
▸ Distributed data processing
• Hadoop, HBase, MySQL, memcached, Cassandra, MongoDB, etc. (this is a long list)
• Open-source software
▸ Some key characteristics
• Often scales horizontally
• Many of them were built for spinning disks (minimizing random I/O)
• Now moving to optimize for seekless storage
• Often run as multiple instances on the same box
5
Application performance determines value
Linux Kernel (schedulers, I/O path, syscalls)
File System (XFS, Ext4, Btrfs, directFS)
Apps (MySQL, HBase, persistent memcached)
6
Application Performance research at Fusion-io
▸Remember, efficiency per box matters
• Reducing CPU usage
• Improving application I/O engines
• Reducing the writes the application requires
• Lowering application latency
• Shrinking the hardware footprint
7
App value starts with low latency
▸ Time to submit and complete a single I/O
▸ Also impacts asynchronous I/O at low queue depth
▸ Important for every application, not only block benchmarks
▸ MySQL usually sees fewer than 32 outstanding I/O requests
▸ Bandwidth is rarely the limiting factor
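Why latency dominates at low queue depth follows from Little's law: sustained throughput equals outstanding requests divided by per-request latency. The small sketch below applies it; the function names and the latency values used with it are hypothetical illustrations, not measurements from this talk. The 16 KiB I/O size is InnoDB's default page size.

```c
#include <stdio.h>

/* Little's law: throughput = concurrency / latency.
   With a fixed number of outstanding requests (queue depth), the only way
   to raise IOPS is to shorten the time each I/O takes to complete. */
double iops_at_queue_depth(int queue_depth, double latency_us)
{
    return queue_depth * (1e6 / latency_us);
}

/* Bandwidth implied by an IOPS figure for a given I/O size, in MiB/s. */
double mib_per_s(double iops, int io_size_bytes)
{
    return iops * io_size_bytes / (1024.0 * 1024.0);
}

/* Print IOPS and bandwidth for queue depths 1..16 (below MySQL's typical 32). */
void print_table(double latency_us, int io_size_bytes)
{
    for (int qd = 1; qd <= 16; qd *= 2) {
        double iops = iops_at_queue_depth(qd, latency_us);
        printf("QD %2d, %6.0f us: %8.0f IOPS, %7.1f MiB/s\n",
               qd, latency_us, iops, mib_per_s(iops, io_size_bytes));
    }
}
```

For example, iops_at_queue_depth(1, 100.0) returns 10,000, and halving the latency to 50 µs doubles the result at every queue depth, while nothing about the device's peak bandwidth enters the formula.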
8
directFS: Direct File System
▸ Appears as a Linux file system
• Provides performance to applications “as is”
• Focuses only on the file namespace
▸ Employs the existing flash translation layer for:
• Large virtualized address space
• Direct flash access
• Crash recovery mechanisms
▸ Exports primitives through the file namespace
• Application access through directFS or straight to the device
9
File System Efficiency
[Diagram: Application (user space) over the Linux VFS (virtual file system) abstraction layer and the kernel block layer (kernel space), comparing two file system stacks]
▸ directFS: file metadata management only; block allocation, mapping, recycling, ACID updates, logging/journaling, and crash recovery are handled by the Native Flash Translation Layer through primitive interfaces
▸ Ext3: file metadata management plus block allocation, mapping, recycling, ACID updates, logging/journaling, and crash recovery, all inside the file system
10
directFS: Speed Through Simplicity
[Bar chart: lines of code (0 to 70,000) for directFS, ReiserFS, Ext4, Btrfs, and XFS]
11
Achieving Raw Device Performance
▸ directFS with Atomic Writes benefits
• File system convenience
• Performance of simple writes to the raw device
▸ Parity with the raw block device means significantly more functionality without any performance impact
[Chart: bandwidth (MiB/s) vs. I/O size, block device vs. directFS, 8 threads]
12
Fusion-io Software Development Kit
Traditional Storage: Applications → Block I/O → Proprietary Storage OS → Storage Media
Software Defined Storage: Applications → Native Flash Translation Layer → Storage Media, accessed through
• Block I/O
• Enhanced I/O: Atomic Writes / directFS
• Key-Value Store API
• Memory Access: Extended Memory, Auto Commit Memory
13
Percona Server and MariaDB
▸Efficient XtraDB storage engine
▸Well optimized for seek-less storage like flash
▸Many config parameters to fine-tune performance
▸ What else can be done?
• Lock contention can still be reduced, as seen when running multiple instances against the same storage device
• Tap into the native performance of flash by exposing key FTL features to the application
14
Atomic writes
▸ A file system ioctl() tells directFS that all I/O to this file should be treated as atomic
▸ Works with both synchronous I/O and the asynchronous io_submit() interface
▸ Provides low-latency asynchronous I/O with atomic guarantees
▸ Minimal application changes required
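The two steps on this slide, switching a file into atomic mode with an ioctl() and then writing through the asynchronous io_submit() interface, can be sketched as follows. Assumptions: the ioctl number is the one shown on the "Minimal application change" slide; the helper names try_enable_atomic_writes, async_pwrite, and write_page are hypothetical; and Linux AIO is called through raw syscall() so the sketch needs no libaio. On a file system other than directFS the ioctl is expected to fail and the write proceeds as an ordinary, non-atomic write.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <linux/aio_abi.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* ioctl definition as shown on the slide; only directFS is expected to honor it. */
#define DFS_IOCTL_ATOMIC_WRITE_SET _IOW(0x95, 2, unsigned int)

/* Ask the file system to treat all writes to fd as atomic.
   Returns 1 if the ioctl succeeded, 0 otherwise. */
int try_enable_atomic_writes(int fd)
{
    unsigned int atomic_option = 1;
    return ioctl(fd, DFS_IOCTL_ATOMIC_WRITE_SET, &atomic_option) == 0;
}

/* Submit one asynchronous write with io_submit() and wait for its completion.
   Returns the number of bytes written, or a negative value on error. */
long async_pwrite(int fd, const void *buf, size_t len, off_t offset)
{
    aio_context_t ctx = 0;
    if (syscall(SYS_io_setup, 1, &ctx) < 0)
        return -1;

    struct iocb cb;
    memset(&cb, 0, sizeof cb);
    cb.aio_fildes = fd;
    cb.aio_lio_opcode = IOCB_CMD_PWRITE;
    cb.aio_buf = (uint64_t)(uintptr_t)buf;
    cb.aio_nbytes = len;
    cb.aio_offset = offset;

    struct iocb *cbs[1] = { &cb };
    long result = -1;
    if (syscall(SYS_io_submit, ctx, 1, cbs) == 1) {
        struct io_event ev;
        /* Block until the single request completes; ev.res is bytes or -errno. */
        if (syscall(SYS_io_getevents, ctx, 1, 1, &ev, NULL) == 1)
            result = (long)ev.res;
    }
    syscall(SYS_io_destroy, ctx);
    return result;
}

/* Open (or create) a data file, request atomic mode, and write one page at offset 0.
   *atomic reports whether the ioctl was accepted. Returns bytes written or -1. */
long write_page(const char *path, const void *page, size_t len, int *atomic)
{
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return -1;
    *atomic = try_enable_atomic_writes(fd);
    long n = async_pwrite(fd, page, len, 0);
    close(fd);
    return n;
}
```

The application-side change stays small: one ioctl() at open time, after which the existing synchronous or io_submit() write path is unchanged.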
15
MySQL Writes Comparison
Traditional MySQL writes (Database Server with DRAM buffer; SSD or HDD database):
1. Application initiates updates to pages A, B, and C.
2. MySQL copies the updated pages to the memory buffer.
3. MySQL writes them to the double-write buffer on the media.
4. Once step 3 is acknowledged, MySQL writes the updates to the actual tablespace.
MySQL with atomic writes (Database Server with DRAM buffer; ioMemory database):
1. Application initiates updates to pages A, B, and C.
2. MySQL copies the updated pages to the memory buffer.
3. MySQL writes them to the actual tablespace, bypassing the double-write buffer step because the device itself guarantees atomicity.
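The byte arithmetic behind the two flows above can be sketched in C. This is a simplified model, not InnoDB's actual code: the function names are hypothetical, plain files stand in for the double-write area and the tablespace, and the atomic path simply assumes the underlying device makes each page write all-or-nothing (ordinary files give no such guarantee).

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

#define PAGE_BYTES 16384  /* InnoDB default page size (16 KiB) */

/* Open (create/truncate) a scratch file standing in for a storage region. */
int open_scratch(const char *path)
{
    return open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
}

/* Write page i of `pages` to offset off and add its size to *written. */
static int put_page(int fd, const char *pages, size_t i, off_t off, long *written)
{
    if (pwrite(fd, pages + i * PAGE_BYTES, PAGE_BYTES, off) != PAGE_BYTES)
        return -1;
    *written += PAGE_BYTES;
    return 0;
}

/* Traditional flush (steps 3 and 4). Returns bytes sent to storage, or -1. */
long flush_with_doublewrite(int dblwr_fd, int data_fd, const char *pages,
                            const off_t slots[], size_t n)
{
    long written = 0;
    /* Step 3: copy every page into the contiguous double-write area... */
    for (size_t i = 0; i < n; i++)
        if (put_page(dblwr_fd, pages, i, (off_t)(i * PAGE_BYTES), &written) != 0)
            return -1;
    /* ...and make that copy durable before touching the tablespace. */
    if (fsync(dblwr_fd) != 0)
        return -1;
    /* Step 4: write each page to its real tablespace location. */
    for (size_t i = 0; i < n; i++)
        if (put_page(data_fd, pages, i, slots[i], &written) != 0)
            return -1;
    return fsync(data_fd) == 0 ? written : -1;
}

/* Atomic-write flush: ASSUMES each page write is all-or-nothing on the device,
   so a torn page cannot occur and the double-write copy is skipped. */
long flush_atomic(int data_fd, const char *pages, const off_t slots[], size_t n)
{
    long written = 0;
    for (size_t i = 0; i < n; i++)
        if (put_page(data_fd, pages, i, slots[i], &written) != 0)
            return -1;
    return fsync(data_fd) == 0 ? written : -1;
}
```

Flushing three pages through flush_with_doublewrite sends 6 × 16 KiB to storage with two fsync() calls, while flush_atomic sends 3 × 16 KiB with one, which is the 50% write reduction and the "fewer syscalls" point from the benchmark summary.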
16
Minimal application change
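#include <sys/ioctl.h>
#define DFS_IOCTL_ATOMIC_WRITE_SET _IOW(0x95, 2, unsigned int)

/* Ask directFS to treat all writes to this file as atomic (returns 0 on success). */
static int os_file_set_atomic_writes(int file, const char *name)
{
    unsigned int atomic_option = 1;
    return ioctl(file, DFS_IOCTL_ATOMIC_WRITE_SET, &atomic_option);
}

/* In the open path: a data file that cannot use atomic writes is rejected. */
if (srv_use_atomic_writes && type == OS_DATA_FILE
    && os_file_set_atomic_writes(file, name)) {
    close(file);
    *success = FALSE;
    file = -1;
}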
17
Atomic benchmarks
First, let’s sum up the MySQL benefits:
• Writing only 50% of the data otherwise required for ACID compliance
That’s pretty much it, but it gives us:
▸ Twice the flash endurance
▸ Much better latency because of fewer syscalls
▸ Much better application throughput due to less I/O
▸ Better concurrency due to fewer locks
18
Atomics TPC-C throughput
19
Atomics latency
20
Percona’s testing of atomics
21
Flash Evolution Details
FLASH AS DISK
▸ Application source code converts native data structures into block I/O
▸ Conventional I/O access: Block I/O through a proprietary storage OS
FLASH BEYOND DISK
▸ Application source code does I/O with native data structures
▸ Native: Enhanced I/O through an open interface layer
• Atomic I/O transactions
• Key-value transactions
• User-defined object transactions
FLASH AS MEMORY
▸ Application source code manipulates native data structures directly in persistent memory
▸ Native: Persistent Memory through an open interface layer
• High-speed logging
• Memory transactions
• Checkpointed memory
22
Fusion-io Advanced Development: Storage Class Memory
▸ DRAM: small capacity, volatile, $$/GB
▸ Flash: big capacity, $/GB
▸ Storage Class Memory (SCM): small capacity, persistent; memory-speed persistence, byte-addressable vs. block-addressable
[Diagram labels: Server, Server Virtual Memory]
23
SCM research
▸ Let’s look at keeping a database log using memory semantics
▸ The goal is to reduce latency and the cost of flushing data to a persistent state, and to further minimize writes
▸ SCM testing using a modified Innosim tool
24
SCM logger interface
▸ logger_open()
Open and initialize logging infrastructure within the FTL
▸ logger_close()
Clean-up
▸ logger_append()
Append to the head of the log at memory speed; this essentially translates to a memcpy()
▸ logger_sync()
Serialize data using the x86 mfence instruction
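The four calls can be sketched in C to show how little work an append involves. Only the function names come from the slide; the signatures, the struct logger layout, and the use of an ordinary heap buffer in place of Fusion-io persistent memory are assumptions for illustration. A C11 sequentially consistent fence stands in for the mfence instruction; whether a fence alone makes data durable depends on the memory hardware and is not modeled here.

```c
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical log state; `base` is DRAM standing in for persistent memory. */
struct logger {
    char  *base;      /* start of the log region */
    size_t capacity;  /* size of the region in bytes */
    size_t head;      /* next free byte (append position) */
    size_t synced;    /* bytes covered by the last logger_sync() */
};

/* Open and initialize the log region; returns NULL on allocation failure. */
struct logger *logger_open(size_t capacity)
{
    struct logger *lg = calloc(1, sizeof *lg);
    if (lg == NULL)
        return NULL;
    lg->base = calloc(capacity, 1);
    if (lg->base == NULL) {
        free(lg);
        return NULL;
    }
    lg->capacity = capacity;
    return lg;
}

/* Clean up. */
void logger_close(struct logger *lg)
{
    if (lg != NULL) {
        free(lg->base);
        free(lg);
    }
}

/* Append a record at the head of the log: just a bounds check and a memcpy().
   Returns the record's offset, or -1 if the region is full. */
long logger_append(struct logger *lg, const void *rec, size_t len)
{
    if (len > lg->capacity - lg->head)
        return -1;
    memcpy(lg->base + lg->head, rec, len);
    long off = (long)lg->head;
    lg->head += len;
    return off;
}

/* Order all prior appends before anything issued afterwards (mfence stand-in). */
void logger_sync(struct logger *lg)
{
    atomic_thread_fence(memory_order_seq_cst);
    lg->synced = lg->head;
}
```

In this shape a commit is an append followed by a sync: no system call and no block-sized write, which is where the latency advantage over block I/O logging comes from.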
25
Practical Database Use Case: MySQL
Innosim throughput (ops/s):
• Baseline, log transactions through block I/O: 8,000
• Scenario 1, no logging: 16,000
• Scenario 2, log to Fusion-io ACM: 15,750
Nearly as fast as disabling the transaction log completely.
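The chart's three values (8,000 Innosim ops/s with the log on block I/O, 16,000 with no logging, and 15,750 with the log on ACM) support the 2x and "nearly as fast" claims with two ratios:

```latex
\frac{15{,}750}{8{,}000} = 1.96875 \approx 1.97,
\qquad
\frac{15{,}750}{16{,}000} = 0.984375 \approx 98.4\,\%
```

Logging to ACM therefore delivers about 1.97 times the block-I/O baseline while keeping about 98% of the throughput of running with no log at all.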
26
High-Speed Transaction Logging
▸ 64KB of persistent memory used as the head of the log
▸ 2x throughput increase for update-intensive transaction workloads (MySQL Innosim)
▸ 30% reduction in writes to flash
▸ Performance very close to disabling logs entirely
[Chart: Innosim reads (update-intensive transaction workload), ops/s (0 to 20,000) vs. number of users (1, 4, 8, 16, 32, 64), for ACM txlog/binlog, regular txlog/binlog, and disabled txlog/binlog]
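How a persistent log head can cut flash writes can be illustrated with a deliberately simplified model. Its assumptions are not from the slide: block-I/O logging forces the log on every commit in 512-byte units and rewrites the last partially filled block each time, while with a 64 KB persistent head every log byte reaches flash once, drained in head-sized chunks. The function names are hypothetical, and the model is not the measurement behind the 30% figure.

```c
#include <stddef.h>

#define LOG_BLOCK 512u           /* ASSUMED log write granularity on flash */
#define PM_HEAD   (64u * 1024u)  /* persistent-memory head size from the slide */

/* Block-I/O logging: each commit forces the log, writing from the start of the
   current partially filled block up to the next block boundary. */
size_t flash_bytes_block_logging(const size_t commit_sizes[], size_t n)
{
    size_t lsn = 0, written = 0;
    for (size_t i = 0; i < n; i++) {
        size_t start = lsn - lsn % LOG_BLOCK;  /* partial block is rewritten */
        lsn += commit_sizes[i];
        size_t end = (lsn + LOG_BLOCK - 1) / LOG_BLOCK * LOG_BLOCK;
        written += end - start;
    }
    return written;
}

/* Persistent-head logging: a commit is durable once copied into the head,
   and each log byte is written to flash exactly once when its chunk drains. */
size_t flash_bytes_pm_head(const size_t commit_sizes[], size_t n)
{
    size_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += commit_sizes[i];
    return total;
}

/* Number of flash writes needed to drain total_bytes through the head. */
size_t pm_head_drains(size_t total_bytes)
{
    return (total_bytes + PM_HEAD - 1) / PM_HEAD;
}
```

The ratio between the two counts depends entirely on commit sizes and on group commit, so this toy model shows the mechanism rather than reproducing the slide's 30% number.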
27
The coming shift in software development
▸As an SSD, flash accelerates applications.
At full maturity, Non-Volatile Memory will transform software development.
28
Async Atomics Availability
▸ Percona Server dev: 5.5.31/32
▸ MariaDB mainline: 5.5.31
▸ directFS drop available next week
• Register at developer.fusionio.com
• Download at support.fusionio.com
▸ Oracle InnoDB atomics patch on launchpad next week
29
Hyperscale native flash benefits
• Avoid scaling out for DRAM
• More work per node, fewer nodes
• Less DRAM per node for the same workload
• Less DRAM per node may mean lower-cost servers
Continue to work with vendors to
• Improve application efficiency
• Move applications to be fully flash-aware
• Further allow application access to the FTL (ptrim, etc.)
MySQL Atomics and Storage Class Memory shown at Fusion-io booth!
fusionio.com | REDEFINE WHAT’S POSSIBLE
THANK YOU
MySQL Low Latency and High Throughput with directFS and Atomics