
ScyllaDB: Achieving No-Compromise Performance

Jun 23, 2020


dariahiddleston

ScyllaDB: Achieving No-Compromise Performance

Avi Kivity, CTO
@AviKivity (Hiring!)


Agenda

● Background
● Goals
● Methods
● Conclusion


Non-Agenda

● Docker
● Microservices
● Node.js
● Docker
● Orchestration
● JVM GC Tuning
● JSON over HTTP
● Docker


More Non-Agenda

● Cache lines, coherency protocols
● NUMA
● Algorithms are the only thing that matters, everything else is implementation detail
● Docker


Background - ScyllaDB

● Clustered NoSQL database compatible with Apache Cassandra
● ~10X performance on same hardware
● Low latency, esp. higher percentiles
● Self tuning
● C++14, fully asynchronous; Seastar!


YCSB Benchmark: 3-node Scylla cluster vs. 3, 9, 15, and 30 Cassandra machines

[Chart series labels: 3 Scylla, 30 Cassandra, 3 Cassandra]


Log-Structured Merge Tree

[Diagram: SSTable 1, SSTable 2, SSTable 3, SSTable 4, and SSTable 5 laid out along a Time axis, plus a merged SSTable 1+2+3; labels: Foreground Job, Background Job]
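
The merge in the diagram (SSTables 1, 2, and 3 becoming SSTable 1+2+3) can be illustrated with a k-way merge of sorted runs in which the newest value for a key wins. This is a minimal sketch, not Scylla's compaction code: the function name `compact` and the representation of an SSTable as a key-sorted list of `(key, value)` pairs are assumptions made for illustration.

```python
import heapq

def compact(sstables):
    """Merge sorted runs into one; `sstables` is ordered oldest to newest.

    Each run is a list of (key, value) pairs sorted by key, with unique keys.
    (Hypothetical in-memory stand-in for on-disk SSTables.)
    """
    # Tag every entry with the negated run index so that, for equal keys,
    # the entry from the newest run sorts first.
    tagged = (
        [(key, -age, value) for key, value in run]
        for age, run in enumerate(sstables)
    )
    merged = []
    for key, _, value in heapq.merge(*tagged):
        if merged and merged[-1][0] == key:
            continue  # a newer run already supplied this key
        merged.append((key, value))
    return merged

sstable_1 = [("a", 1), ("b", 2)]
sstable_2 = [("b", 20), ("c", 3)]
sstable_3 = [("a", 100)]
merged = compact([sstable_1, sstable_2, sstable_3])
print(merged)
```

Because every input is already sorted, the merge reads each run sequentially and writes one sequential output, which is why it suits a background job.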


High-level Goals

● Efficiency:
  ○ Make the most out of every cycle
● Utilization:
  ○ Squeeze every cycle from the machine
● Control:
  ○ Spend the cycles on what we want, when we want


Characterizing the problem

● Large numbers of small operations
  ○ Make coordination cheap
● Lots of communications
  ○ Within the machine
  ○ With disk
  ○ With other machines


Asynchrony, Everywhere


General Architecture

● Thread-per-core design
  ○ Never block
● Asynchronous networking
● Asynchronous file I/O
● Asynchronous multicore


Scylla has its own task scheduler

Traditional stack:
● Thread is a function pointer
● Stack is a byte array from 64k to megabytes
● Context switch cost is high. Large stacks pollute the caches

Scylla’s stack:
● Promise is a pointer to eventually computed value
● Task is a pointer to a lambda function
● No sharing, millions of parallel events

[Diagram: Thread/Stack pairs for the traditional stack; chains of Promise/Task pairs for Scylla’s stack; Scheduler and CPU boxes underneath]
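
The Promise and Task definitions above can be made concrete with a toy run queue. This is a sketch under stated assumptions, not Seastar's API: the `Scheduler` and `Promise` classes and their methods are hypothetical, and Seastar's split between a promise and its future is collapsed into one object for brevity.

```python
from collections import deque

class Scheduler:
    """One per core in this sketch: a plain FIFO of ready tasks, no locks."""
    def __init__(self):
        self.ready = deque()

    def schedule(self, task):
        self.ready.append(task)

    def run(self):
        while self.ready:
            self.ready.popleft()()  # a task is just a callable

class Promise:
    """Holds an eventually computed value plus the tasks waiting for it."""
    def __init__(self, scheduler):
        self.scheduler = scheduler
        self.done = False
        self.value = None
        self.continuations = []

    def then(self, fn):
        nxt = Promise(self.scheduler)

        def task():
            nxt.set_value(fn(self.value))

        if self.done:
            self.scheduler.schedule(task)
        else:
            self.continuations.append(task)
        return nxt

    def set_value(self, value):
        self.value, self.done = value, True
        for task in self.continuations:
            self.scheduler.schedule(task)
        self.continuations.clear()

sched = Scheduler()
results = []
p = Promise(sched)
p.then(lambda v: v + 1).then(results.append)
p.set_value(41)  # e.g. an I/O completion delivering its result
sched.run()
print(results)
```

A waiting operation here costs one small object and one closure rather than a thread with its own stack, which is the contrast the slide draws.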


The Concurrency Dilemma


Fundamental performance equation

Concurrency = Throughput * Latency


Fundamental performance equation

Throughput = Concurrency / Latency


Fundamental performance equation

Latency = Concurrency / Throughput
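
A short worked example of the equation in its three forms. The numbers are hypothetical and chosen only to make the arithmetic easy to follow; the function names are illustrative.

```python
def required_concurrency(throughput_per_s, latency_s):
    """Concurrency = Throughput * Latency."""
    return throughput_per_s * latency_s

def latency_at(concurrency, throughput_per_s):
    """Latency = Concurrency / Throughput."""
    return concurrency / throughput_per_s

# Sustaining 100,000 requests/s at 1 ms each needs about 100 requests in flight.
in_flight = required_concurrency(100_000, 0.001)

# If the system tops out at 100,000 requests/s and clients keep 200 requests
# in flight, the extra concurrency only shows up as latency (about 2 ms).
queued_latency = latency_at(200, 100_000)

print(in_flight, queued_latency)
```

The second case is the dilemma in miniature: once throughput is saturated, adding concurrency buys nothing except longer waits.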


Lower bounds for concurrency

● Disks want minimum iodepth for full throughput (heads/chips)

● Remote nodes need concurrency to hide network latency and their own min. concurrency

● Compute wants work for each core


Results of Mathematical Analysis

● Want high concurrency (for throughput)
● Want low concurrency (for latency)
● Resources require concurrency for full utilization


Sources of concurrency

● Users
  ○ Reduce concurrency / add nodes
● Internal processes
  ○ Generate as much concurrency as possible
  ○ Schedule


Resource Scheduling

[Diagram: a Scheduler placed in front of Storage, fed by User read, User write, Compaction (internal), and Streaming (internal); numeric labels in the figure: 8, 30, 12, 50, 50]
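
One way to picture a scheduler that divides storage bandwidth among request classes is proportional sharing. The sketch below is an assumption-laden illustration, not Scylla's I/O scheduler: the `ShareScheduler` class is hypothetical, and the share values are made up rather than taken from the figure.

```python
from collections import deque
from fractions import Fraction

class ShareScheduler:
    """Dispatch from whichever backlogged class is furthest behind its share."""
    def __init__(self, shares):
        self.shares = dict(shares)
        self.queues = {name: deque() for name in shares}
        # Work dispatched so far, normalized by shares (exact arithmetic).
        self.consumed = {name: Fraction(0) for name in shares}

    def submit(self, name, request):
        self.queues[name].append(request)

    def dispatch(self):
        backlogged = [n for n, q in self.queues.items() if q]
        if not backlogged:
            return None
        name = min(backlogged, key=lambda n: self.consumed[n])
        self.consumed[name] += Fraction(1, self.shares[name])
        return name, self.queues[name].popleft()

# Hypothetical shares: user reads get three times the weight of compaction.
sched = ShareScheduler({"user_read": 3, "compaction": 1})
for i in range(8):
    sched.submit("user_read", i)
    sched.submit("compaction", i)

counts = {"user_read": 0, "compaction": 0}
for _ in range(8):
    name, _request = sched.dispatch()
    counts[name] += 1
print(counts)
```

With both classes backlogged, dispatches follow the 3:1 share ratio. A real scheduler would also cap the number of requests in flight at the disk's useful concurrency, which this sketch leaves out.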


Why not the Linux I/O scheduler?

● Can only communicate priority by originating thread

● Will reorder/merge like crazy
● Disable


Figuring out optimal disk concurrency

Max useful disk concurrency


Cache design

Cache files or objects?


Using the kernel page cache

● 4k granularity
● Thread-safe
● Synchronous APIs
● General-purpose
● Lack of control (1)
● Lack of control (2)

● Exists
● Hundreds of hacker-years
● Handling lots of edge cases


Unified cache

Cassandra:
● Key cache
● Row cache
● On-heap / Off-heap
● Linux page cache
● SSTables

Scylla:
● Unified cache
● SSTables

Callouts: Tuning, Parasitic rows, Page faults

Page fault path (App thread, Kernel, SSD):
● Page fault
● Suspend thread
● Initiate I/O
● Context switch
● I/O completes
● Interrupt
● Context switch
● Map page
● Resume thread

Parasitic rows: SSTable page (4k) vs. your data (300b)
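
The cost of parasitic rows follows from the two sizes on the slide. The arithmetic below assumes "4k" means 4096 bytes and "300b" means 300 bytes; the variable names are illustrative.

```python
PAGE_BYTES = 4096  # assumption: "4k" is 4 KiB
ROW_BYTES = 300    # assumption: "300b" is 300 bytes

useful_fraction = ROW_BYTES / PAGE_BYTES
amplification = PAGE_BYTES / ROW_BYTES

print(f"{useful_fraction:.1%} of the cached page is the requested row")
print(f"{amplification:.1f}x more bytes held in cache than requested")
```

Under these assumptions roughly 7% of each cached page is the row that was asked for; the rest is neighbouring data that occupies cache memory whether or not anyone reads it.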


Workload Conditioning


Workload Conditioning

● Internal feedback loops to balance competing loads

[Diagram: Memtable, Compaction, Query, Repair, and Commitlog feed the Seastar Scheduler, which drives SSD, WAN, and CPU; a Compaction Backlog Monitor and a Memory Monitor each feed back with "Adjust priority"]
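
A feedback loop of the "monitor, then adjust priority" kind can be as simple as a proportional controller. The sketch below is illustrative only: the function name, the share range, and the target backlog are hypothetical and not taken from Scylla.

```python
def compaction_shares(backlog_bytes, target_backlog_bytes,
                      min_shares=50, max_shares=1000):
    """Map the compaction backlog to scheduler shares (proportional control).

    All constants are made-up defaults for illustration.
    """
    pressure = backlog_bytes / target_backlog_bytes
    shares = min_shares + pressure * (max_shares - min_shares)
    # Clamp so compaction never starves and never takes over completely.
    return max(min_shares, min(max_shares, shares))

# As the backlog grows toward the target, compaction gets more of the machine.
for backlog in (0, 50, 100, 200):
    print(backlog, compaction_shares(backlog, target_backlog_bytes=100))
```

The same shape works for the memory monitor: measure how far memtable memory is from its limit and raise or lower flush priority accordingly.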


Replacing the system memory allocator


System memory allocator problems

● Thread safe
● Allocation back pressure


Seastar memory allocator

● Non-Thread safe!
  ○ Each core gets a private memory pool
● Allocation back pressure
  ○ Allocator calls a callback when low on memory
  ○ Scylla evicts cache in response
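
The two bullets above (a private pool per core, and a low-memory callback that evicts cache) can be sketched as follows. This is a toy model with byte counters instead of real memory; `CorePool` and `Cache` are hypothetical names, not Seastar classes.

```python
from collections import OrderedDict

class CorePool:
    """A private pool for one core: single-threaded, so no locking."""
    def __init__(self, capacity, reclaim):
        self.capacity = capacity
        self.used = 0
        self.reclaim = reclaim  # callback(bytes_needed) -> bytes actually freed

    def allocate(self, size):
        if self.used + size > self.capacity:
            # Back pressure: ask the cache to give memory back before failing.
            self.used -= self.reclaim(self.used + size - self.capacity)
        if self.used + size > self.capacity:
            raise MemoryError("pool exhausted even after reclaim")
        self.used += size

class Cache:
    """Tracks entry sizes, oldest first, and evicts on request."""
    def __init__(self):
        self.entries = OrderedDict()

    def insert(self, pool, key, size):
        pool.allocate(size)
        self.entries[key] = size

    def evict(self, needed):
        freed = 0
        while self.entries and freed < needed:
            _, size = self.entries.popitem(last=False)
            freed += size
        return freed

cache = Cache()
pool = CorePool(capacity=100, reclaim=cache.evict)
cache.insert(pool, "a", 60)
cache.insert(pool, "b", 30)
pool.allocate(40)  # does not fit: the pool calls back and "a" is evicted
print(pool.used, list(cache.entries))
```

The cache can therefore grow to fill all free memory, because the allocator can always ask for some of it back.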


One allocator is not enough


Remaining problems with malloc/free

● Memory gets fragmented over time
  ○ If workload changes sizes of allocated objects
● Allocating a large contiguous block requires evicting most of cache


[Diagram labels: Memory; OOM :(]


Log-structured memory allocation

● The cache
  ○ Large majority of memory allocated
  ○ Small subset of allocation sites
● Teach allocator how to move allocated objects around
  ○ Updating references
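
Moving objects and updating references can be sketched with an append-only allocator whose callers hold handles instead of raw addresses. This is a simplified illustration, not Scylla's log-structured allocator: `Handle` and `LogAllocator` are hypothetical, and `compact` rewrites everything at once, whereas a real implementation would work segment by segment.

```python
class Handle:
    """The only reference callers hold; the allocator may repoint it."""
    def __init__(self, segment, offset, size):
        self.segment, self.offset, self.size = segment, offset, size

class LogAllocator:
    def __init__(self, segment_size):
        self.segment_size = segment_size
        self.segments = [bytearray(segment_size)]
        self.tail = 0    # append position in the last segment
        self.live = []   # handles of live objects

    def alloc(self, data):
        if len(data) > self.segment_size:
            raise ValueError("object larger than a segment")
        if self.tail + len(data) > self.segment_size:
            self.segments.append(bytearray(self.segment_size))
            self.tail = 0
        seg = self.segments[-1]
        seg[self.tail:self.tail + len(data)] = data
        handle = Handle(seg, self.tail, len(data))
        self.tail += len(data)
        self.live.append(handle)
        return handle

    def free(self, handle):
        self.live.remove(handle)  # leaves a hole until the next compaction

    def read(self, handle):
        return bytes(handle.segment[handle.offset:handle.offset + handle.size])

    def compact(self):
        """Copy live objects into fresh segments and repoint their handles."""
        survivors = [(h, self.read(h)) for h in self.live]
        self.segments, self.tail, self.live = [bytearray(self.segment_size)], 0, []
        for handle, data in survivors:
            moved = self.alloc(data)
            handle.segment, handle.offset = moved.segment, moved.offset
            self.live[-1] = handle  # keep the caller's handle as the live one

allocator = LogAllocator(segment_size=8)
a = allocator.alloc(b"aaaa")
b = allocator.alloc(b"bbbb")
c = allocator.alloc(b"cccc")  # spills into a second segment
allocator.free(b)             # first segment is now half empty
allocator.compact()
print(len(allocator.segments), allocator.read(a), allocator.read(c), c.offset)
```

After compaction the two survivors share one segment and the other segment is free as a whole block, which addresses both fragmentation and the need for large contiguous allocations.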


Log-structured memory allocation

Fancy Animation


Future Improvements


Userspace TCP/IP stack

● Thread-per-core design
● Use DPDK to drive hardware
● Present as experimental mode
  ○ Needs more testing and productization


Query Compilation to Native Code

● Use LLVM to JIT-compile CQL queries
● Embed database schema and internal object layouts into the query


Conclusions

● Full control of the software stack can generate big payoffs
● Careful system design can maximize throughput
● Without sacrificing latency
● Without requiring endless end-user tuning
● While having a lot of fun


How to interact

● Download: http://www.scylladb.com
● Twitter: @ScyllaDB
● Source: http://github.com/scylladb/scylla
● Mailing lists: scylladb-user @ groups.google.com
● Company site & blog: http://www.scylladb.com


THE SCYLLA IS THE LIMIT

Thank you.