SCYLLA: NoSQL at Ludicrous Speed. Speaker: He Jun (贺俊), Software Engineer, ScyllaDB

SCYLLA: NoSQL at Ludicrous Speedreport.idx365.com/TalkingData/【T112017-数据... · On-heap / Off-heap Linux page cache SSTables Unified cache SSTables Complex Tuning. Cassandra

Jun 23, 2020


dariahiddleston
Page 1:

SCYLLA: NoSQL at Ludicrous Speed

Speaker: He Jun (贺俊), Software Engineer, ScyllaDB

Page 2:

Today we will cover:

+ Intro: Who we are, what we do, who uses it

+ Why we started ScyllaDB

+ Why should you care

+ How we made design decisions to achieve no-compromise performance and availability

Page 3:

Introduction

+ Founded by KVM hypervisor creators
+ Q2 2014 - Pivot to the database world
+ Q3 2015 - Decloak during Cassandra Summit 2015, Beta
+ Q1 2016 - General Availability
+ Q3 2016 - First Scylla Summit: 100+ Attendees
+ Q1 2017 - Completed B round
+ $25MM in funding
+ HQs: Palo Alto, CA; Herzelia, Israel
+ 42+ employees, hiring!

Page 4:

Why?@#$%$$%^?

Page 5:

Scylla benchmark by Samsung

[Figure: benchmark chart, measured in op/s]

Page 6:

What we do: Scylla, towards the best NoSQL

+ > 1 million OPS per node

+ < 1 ms 99th percentile latency

+ Auto tuned

+ Scale up and out

+ Open source

+ Large community (piggyback on Cassandra)

+ Blends into the ecosystem: Spark, Presto, time series, search, ...

Page 7:

Where is Scylla deployed?

Page 8:

Today we will cover:

+ Intro: Who we are, what we do, who uses it

+ Why we started ScyllaDB

+ Why should you care

+ How we made design decisions to achieve no-compromise performance and availability

Page 9:

Why did we start Scylla?

+ Originally it was about performance/efficiency only
+ Over time, we understood we can deliver more:
  + SLA between background and foreground tasks
  + Work well on any given hardware (back pressure)
  + Deliver consistent, low 99th percentile latency
  + Reduction in admin effort
  + Low latency in the face of failures (hot cache load balancing)
  + High observability

Page 10:

Cassandra vs. Scylla

+ Throughput: Cassandra cannot utilize multi-core efficiently; Scylla scales linearly (shard-per-core)
+ Latency: Cassandra's is high due to Java and the JVM's GC; Scylla's is low and consistent (own cache)
+ Complexity: Cassandra needs intricate tuning and configuration; Scylla is auto tuned, with dynamic scheduling
+ Admin: Cassandra maintenance impacts performance; Scylla has an SLA guarantee for admin vs serving

Page 11:

Today we will cover:

+ Intro: Who we are, what we do, who uses it

+ Why we started ScyllaDB

+ Why should you care

+ How we made design decisions to achieve no-compromise performance and availability

Page 12:

Case study: Document column family

• Outbrain is the world’s largest content discovery platform.

• Over 557 million unique visitors from across the globe.

• 250 billion personalized content recommendations every month.

Page 13:

Outbrain: Cassandra plus Memcache

• First read from memcached, go to Cassandra on misses.

• Pain: 1) Stale data from cache 2) Complexity 3) Cold cache -> C* gets full volume

[Diagram: Read Microservice, Write Process]
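The read path described on this slide is the cache-aside pattern. A minimal sketch, assuming plain dictionaries stand in for memcached and Cassandra (the class and key names are hypothetical, not Outbrain's code), shows how two of the listed pains arise: a write that bypasses the cache leaves stale data behind, and a cold cache sends every read to the database.

```python
from typing import Dict, Optional


class CacheAsideReader:
    """Read path: try the cache first, fall back to the database on a miss."""

    def __init__(self, cache: Dict[str, str], database: Dict[str, str]) -> None:
        self.cache = cache          # stands in for memcached (assumption)
        self.database = database    # stands in for Cassandra (assumption)
        self.db_reads = 0           # how many reads reached the database

    def read(self, key: str) -> Optional[str]:
        value = self.cache.get(key)
        if value is not None:
            return value            # cache hit: possibly stale
        self.db_reads += 1          # cache miss: the database takes the read
        value = self.database.get(key)
        if value is not None:
            self.cache[key] = value  # populate the cache for later reads
        return value


database = {"doc:1": "v1"}
reader = CacheAsideReader(cache={}, database=database)
first = reader.read("doc:1")        # cold cache: goes to the database
database["doc:1"] = "v2"            # a write that does not touch the cache
second = reader.read("doc:1")       # served from the cache, still the old value
```

Here `second` is the old value because nothing invalidated the cache entry, which is the "stale data from cache" problem in miniature.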

Page 14:

Scylla/Cassandra side by side deployment

• Writes are written in parallel to C* and Scylla

• Reads are done in parallel:

1) Memcached + Cassandra 2) Scylla (no cache at all)

[Diagram: Read Microservice, Write Process]
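The side-by-side setup can be sketched as a dual write plus two read paths issued in parallel. This is only an illustration under stated assumptions: dictionaries stand in for the three stores, and the `SideBySide` class and its method names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Optional, Tuple


class SideBySide:
    """Dual-write to both databases; read through both paths in parallel."""

    def __init__(self) -> None:
        # Dictionaries stand in for the real stores (assumption).
        self.cassandra: Dict[str, str] = {}
        self.memcached: Dict[str, str] = {}
        self.scylla: Dict[str, str] = {}
        self.pool = ThreadPoolExecutor(max_workers=2)

    def write(self, key: str, value: str) -> None:
        # Writes go to Cassandra and Scylla in parallel.
        futures = [
            self.pool.submit(self.cassandra.__setitem__, key, value),
            self.pool.submit(self.scylla.__setitem__, key, value),
        ]
        for future in futures:
            future.result()

    def _read_cached_path(self, key: str) -> Optional[str]:
        # Path 1: memcached first, Cassandra on a miss.
        value = self.memcached.get(key)
        if value is None:
            value = self.cassandra.get(key)
            if value is not None:
                self.memcached[key] = value
        return value

    def _read_scylla_path(self, key: str) -> Optional[str]:
        # Path 2: Scylla directly, with no cache in front.
        return self.scylla.get(key)

    def read(self, key: str) -> Tuple[Optional[str], Optional[str]]:
        cached = self.pool.submit(self._read_cached_path, key)
        direct = self.pool.submit(self._read_scylla_path, key)
        return cached.result(), direct.result()


deployment = SideBySide()
deployment.write("doc:1", "v1")
results = deployment.read("doc:1")   # (cached path result, Scylla path result)
deployment.pool.shutdown()
```

Running both read paths on the same traffic is what makes a like-for-like comparison of latency and request volume possible.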

Page 15:

Scylla (w/o cache) vs Cassandra + Memcached

Metric            Scylla        Cassandra                                Diff
Requests/Minute   12,000,000    500,000 (memcache handles 11,500,000)    24X
AVG Latency       4 ms          8 ms                                     2X
Max Latency       8 ms          35 ms                                    3.5X
Hardware          9 machines    30+9 machines                            4.3X

[Figures: Cassandra's latency; Scylla's latency]

Page 16:

What does it mean for a non-Cassandra user?

+ Throughput, latency and scale benefits
+ Wide range of big data integration: {KairosDB, Spark, JanusGraph, Presto, Kafka, Elastic}
+ Best HA/DR in the industry
+ Stop using caches in front of the database
+ Consolidate HBase, Redis, MySQL, Mongo and others

Page 17:

Assorted Quotes

Page 18:

Today we will cover:

+ Intro: Who we are, what we do, who uses it

+ Why we started ScyllaDB

+ Why should you care

+ How we made design decisions to achieve no-compromise performance and availability

Page 19:

Design decisions: #1 The trivials

Page 20:

Design decisions: #2 Compatibility

• SSTable file format
• Configuration file format
• CQL language
• CQL native protocol
• JMX management protocol
• Management command line

Page 21:

Double cluster - Migration w/o downtime

[Diagram: App, CQL proxy, Cassandra, Scylla]

Page 22:

Design decisions: #3 All things async

Page 23:

Design decisions: #4 Shard per core

[Diagram: Threads vs. Shards]
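The idea behind shard per core is that each core owns a disjoint slice of the data, so a request for a given key is always handled by one shard and needs no locks shared with the others. A rough sketch, assuming a CRC32 hash modulo the shard count as the placement rule (a placeholder chosen for determinism, not Scylla's actual sharding algorithm; `ShardedStore` is a hypothetical name):

```python
import zlib
from typing import Dict, List, Optional


class ShardedStore:
    """Each shard owns a private dictionary; a key maps to exactly one shard."""

    def __init__(self, shard_count: int) -> None:
        # One private store per shard (one shard per core in the real design).
        self.shards: List[Dict[str, str]] = [{} for _ in range(shard_count)]

    def shard_of(self, key: str) -> int:
        # Placeholder placement rule: deterministic hash modulo shard count.
        return zlib.crc32(key.encode("utf-8")) % len(self.shards)

    def put(self, key: str, value: str) -> None:
        self.shards[self.shard_of(key)][key] = value

    def get(self, key: str) -> Optional[str]:
        return self.shards[self.shard_of(key)].get(key)


store = ShardedStore(shard_count=4)
for i in range(100):
    store.put(f"user:{i}", f"value-{i}")
sizes = [len(shard) for shard in store.shards]   # keys held by each shard
```

Because ownership is decided by the key alone, any shard can work out which shard to forward a request to without consulting shared state.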

Page 24:

SCYLLA DB: Network Comparison

[Diagram: Traditional stack vs. Scylla sharded stack. Traditional stack (Cassandra): application threads and memory on top of the kernel TCP/IP stack and scheduler with their queues, down to the NIC queues. Scylla sharded stack: one core database instance per core, each in userspace with its own application, TCP/IP, task scheduler, queues and smp queue, on DPDK with its own NIC queue; the kernel isn't involved.]

Page 25:

Scylla has its own task scheduler

[Diagram: Traditional stack vs. Scylla's stack. Elements shown: Thread, Stack, Scheduler, CPU, Promise, Task.]

Traditional stack:

+ Thread is a function pointer
+ Stack is a byte array from 64k to megabytes

Scylla's stack:

+ Promise is a pointer to an eventually computed value
+ Task is a pointer to a lambda function
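The promise and task definitions on this slide can be illustrated with a toy single-core scheduler. This is a sketch of the general technique only, not Seastar's API: the `Scheduler` and `Promise` classes and their methods are hypothetical names. A task is a callable that runs to completion, and a promise holds a value that arrives later and schedules its continuations as new tasks.

```python
from collections import deque
from typing import Any, Callable, Deque, List


class Scheduler:
    """Single-core run queue; each task is a callable run to completion."""

    def __init__(self) -> None:
        self.tasks: Deque[Callable[[], None]] = deque()

    def schedule(self, task: Callable[[], None]) -> None:
        self.tasks.append(task)

    def run(self) -> None:
        while self.tasks:
            self.tasks.popleft()()


class Promise:
    """Holds a value computed later, plus the continuations waiting for it."""

    def __init__(self, scheduler: Scheduler) -> None:
        self.scheduler = scheduler
        self.resolved = False
        self.value: Any = None
        self.continuations: List[Callable[[Any], None]] = []

    def then(self, fn: Callable[[Any], Any]) -> "Promise":
        result = Promise(self.scheduler)

        def continuation(value: Any) -> None:
            result.set_value(fn(value))

        if self.resolved:
            self.scheduler.schedule(lambda: continuation(self.value))
        else:
            self.continuations.append(continuation)
        return result

    def set_value(self, value: Any) -> None:
        self.resolved = True
        self.value = value
        # Continuations become tasks on the run queue; nothing blocks.
        for continuation in self.continuations:
            self.scheduler.schedule(lambda c=continuation: c(value))
        self.continuations.clear()


scheduler = Scheduler()
row = Promise(scheduler)                       # e.g. the result of a disk read
size = row.then(lambda data: len(data))        # runs once the row is available
log: List[int] = []
size.then(log.append)
scheduler.schedule(lambda: row.set_value(b"hello"))
scheduler.run()
```

No thread or per-request stack is allocated here; the only per-operation state is the promise object and the small closures queued behind it.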

Page 26:

SCYLLA IS DIFFERENT

+ DMA
+ Log structured merge tree
+ DB-aware cache
+ Userspace I/O scheduler

+ NUMA friendly
+ Log structured allocator
+ Zero copy

+ Thread per core
+ Lock-free
+ Task scheduler
+ Reactor programming
+ C++14

+ Multi queue
+ Poll mode
+ Userspace TCP/IP

Page 27:

Scylla vs C* latency by Kenshoo

Page 28:

Design Decision: #5 Unified cache

+ Cassandra: Key cache, Row cache, On-heap / Off-heap, Linux page cache, SSTables (complex tuning)
+ Scylla: Unified cache, SSTables

Page 29:

Design decisions: #6 I/O scheduler

[Figure: Cassandra streaming configuration]

Page 30:

Scylla I/O Scheduling

[Diagram: Query, Commitlog and Compaction each feed their own queue into the userspace I/O scheduler, which sits in front of the disk. Labels: "Max useful disk concurrency", "I/O queued in FS/device", "No queues".]
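One way to picture the scheduler on this slide: each I/O class has its own queue, and requests are released to the disk only while the number in flight stays below the maximum useful disk concurrency, so any excess waits in userspace rather than in the filesystem or device. The sketch below is an illustration under assumptions, not Scylla's implementation; the `IOScheduler` class, the share values and the share-based selection rule are all hypothetical.

```python
from collections import deque
from typing import Deque, Dict, List


class IOScheduler:
    """Per-class queues with a cap on requests in flight at the disk."""

    def __init__(self, max_in_flight: int, shares: Dict[str, int]) -> None:
        self.max_in_flight = max_in_flight      # max useful disk concurrency
        self.shares = shares                    # relative weight per I/O class
        self.queues: Dict[str, Deque[str]] = {name: deque() for name in shares}
        self.dispatched: Dict[str, int] = {name: 0 for name in shares}
        self.in_flight = 0

    def submit(self, io_class: str, request: str) -> None:
        self.queues[io_class].append(request)

    def dispatch(self) -> List[str]:
        """Send requests to the disk until the concurrency cap is reached."""
        sent: List[str] = []
        while self.in_flight < self.max_in_flight:
            ready = [name for name, queue in self.queues.items() if queue]
            if not ready:
                break
            # Pick the class that has used the least of its share so far.
            chosen = min(ready, key=lambda n: self.dispatched[n] / self.shares[n])
            sent.append(self.queues[chosen].popleft())
            self.dispatched[chosen] += 1
            self.in_flight += 1
        return sent

    def complete(self, count: int = 1) -> None:
        """Called when the disk finishes requests, freeing dispatch slots."""
        self.in_flight = max(0, self.in_flight - count)


scheduler = IOScheduler(
    max_in_flight=4,
    shares={"query": 3, "commitlog": 2, "compaction": 1},
)
for i in range(10):
    for io_class in ("query", "commitlog", "compaction"):
        scheduler.submit(io_class, f"{io_class}-{i}")
first_batch = scheduler.dispatch()   # at most 4 requests reach the disk
```

Until `complete` is called, further `dispatch` calls return nothing, which keeps the backlog in the per-class queues where it can still be reordered by priority.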

Page 31:

I/O scheduler result by Kenshoo

Page 32:

Design Decision: #7 Workload conditioning

[Diagram: Memtable, Compaction, Query, Repair and Commitlog go through the Seastar scheduler to the SSD, WAN and CPU. A Compaction Backlog Monitor and a Memory Monitor each feed back into the scheduler with "Adjust priority".]
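The "adjust priority" feedback can be thought of as a controller that turns a measured backlog into scheduler shares. A minimal sketch, assuming a simple proportional rule with clamping (the function name, the target backlog and the share range are hypothetical, not Scylla's actual controller):

```python
def compaction_shares(
    backlog_bytes: int,
    target_backlog_bytes: int,
    min_shares: int = 1,
    max_shares: int = 100,
) -> int:
    """Proportional control: the larger the backlog, the more shares compaction gets."""
    ratio = backlog_bytes / target_backlog_bytes
    return max(min_shares, min(max_shares, int(ratio * max_shares)))


idle = compaction_shares(backlog_bytes=0, target_backlog_bytes=1_000_000)
busy = compaction_shares(backlog_bytes=500_000, target_backlog_bytes=1_000_000)
behind = compaction_shares(backlog_bytes=5_000_000, target_backlog_bytes=1_000_000)
```

With little backlog, compaction keeps only a minimal share and foreground queries get most of the resources; as the backlog grows, its priority rises until it is capped.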

Page 33:

Workload Conditioning in practice

Disk can’t keep up: workload conditioning will figure out the right request rate
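A toy simulation shows how back pressure can settle on the request rate the disk is able to sustain. It assumes a fixed memory budget for unflushed writes and a fixed flush rate per tick; the function name and all numbers are made up for illustration and do not describe Scylla's real controllers.

```python
from typing import List, Tuple


def simulate_back_pressure(
    offered_per_tick: int,
    flush_per_tick: int,
    memory_limit: int,
    ticks: int,
) -> Tuple[List[int], int]:
    """Accept writes only while memory has room; the disk drains memory each tick."""
    memory_used = 0
    peak_memory = 0
    accepted_history: List[int] = []
    for _ in range(ticks):
        # Admit as many writes as fit in the remaining memory budget.
        accepted = min(offered_per_tick, memory_limit - memory_used)
        memory_used += accepted
        peak_memory = max(peak_memory, memory_used)
        # The disk flushes at most flush_per_tick units per tick.
        memory_used -= min(memory_used, flush_per_tick)
        accepted_history.append(accepted)
    return accepted_history, peak_memory


history, peak = simulate_back_pressure(
    offered_per_tick=1000, flush_per_tick=600, memory_limit=2000, ticks=10
)
```

Clients offer 1000 writes per tick while the disk flushes 600; once the memory budget fills, the accepted rate drops to 600 per tick and stays there, with memory use never exceeding the limit.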

Page 34:

Upcoming releases

+ Enterprise release, based on 1.6
+ 1.7 - May 2017
  ▪ Counters
  ▪ New intra-node sharding algorithm
  ▪ SStableloader from 2.2/3.x
  ▪ Debian
+ 2.0 - Sep 2017
  ▪ Materialized views
  ▪ Execution blocks (CPU cache optimization which boosts performance)
  ▪ Partial row cache (for wide row streaming)
  ▪ Heat Weighted Load Balancing

Page 35:

Scylla Beyond Cassandra

[Diagram: Vertical, Horizontal, Core database]

Page 36:

Q&A

Resources

slideshare.net/ScyllaDB

@DorLaor

@AviKivity

@scylladb

http://bit.ly/2oHAfok

youtube.com/c/scylladb

github.com/scylladb/scylla

scylladb.com/blog

Page 37:

THANKS

SCYLLA: NoSQL at Ludicrous Speed