CASSANDRA INTERNALS OVERVIEW DATASTAX BOOTCAMP 2015 Sam Tunnicliffe [email protected] / @beobal

Cassandra Internals Overview

Apr 12, 2017

Transcript
Page 1: Cassandra Internals Overview

CASSANDRA INTERNALS OVERVIEW

DATASTAX BOOTCAMP 2015

Sam Tunnicliffe

[email protected] / @beobal

Page 2: Cassandra Internals Overview

OVERVIEW

System startup

Messaging

Gossip

Schema Propagation

Request Coordination

Page 3: Cassandra Internals Overview

STARTUP

org.apache.cassandra.service.CassandraDaemon

protected void setup()

Load config

Run preflight checks

Load schema

Clean up local temporary state

Recover CommitLog

Schedule background compactions

Initialize storage service

Page 4: Cassandra Internals Overview

PREFLIGHT CHECKS

Sane clock

JNI

JVM & Instrumentation

Filesystem permissions

System keyspace status

Upgrades (#8049)

Incompatible SSTables (#8049)

Page 5: Cassandra Internals Overview

STARTUP

org.apache.cassandra.service.CassandraDaemon

protected void setup()

Load config

Run pre-flight checks

Load schema

Clean up local temporary state

Recover CommitLog

Schedule background compactions

Initialize storage service

Page 6: Cassandra Internals Overview

CLEAN UP LOCAL STATE

Truncate compactions_in_progress

Scrub data directories

Page 7: Cassandra Internals Overview

STARTUP

org.apache.cassandra.db.commitlog.CommitLog

public int recover() throws IOException

Load config

Run pre-flight checks

Load schema

Clean up local temporary state

Recover CommitLog

Schedule background compactions

Initialize storage service

Page 8: Cassandra Internals Overview

INITIALIZE STORAGE SERVICE

org.apache.cassandra.service.StorageService

public synchronized void initServer() throws ConfigurationException

Load ring state (unless don't)

Start gossip & get initial ring info

Set tokens

Page 9: Cassandra Internals Overview

BOOTSTRAP

Abort if other range movements happening

Fetch bootstrap data

Build secondary indexes

Page 10: Cassandra Internals Overview

INITIALIZE STORAGE SERVICE

Load ring state (unless don't)

Start gossip & get initial ring info

Set tokens

Setup auth resources

Ensure gossip stabilized

Page 11: Cassandra Internals Overview

STARTUP

Load config

Run preflight checks

Load schema

Clean up local temporary state

Recover CommitLog

Schedule background compactions

Initialize storage service

Page 12: Cassandra Internals Overview

STARTUP

-- it is done --

Page 13: Cassandra Internals Overview

MESSAGING SERVICE

org.apache.cassandra.net.MessagingService

Low level one-way messaging

public void sendOneWay(MessageOut message, InetAddress to)

Async Request/Response

public int sendRR(MessageOut message, InetAddress to, IAsyncCallback cb)

Page 14: Cassandra Internals Overview

MESSAGING SERVICE

org.apache.cassandra.net.MessagingService

Reads

public int sendRRWithFailure(MessageOut message,
                             InetAddress to,
                             IAsyncCallbackWithFailure cb)

Writes

public int sendRR(MessageOut<? extends IMutation> message,
                  InetAddress to,
                  AbstractWriteResponseHandler handler,
                  boolean allowHints)

Page 15: Cassandra Internals Overview

MESSAGING SERVICE

Pre-emptively drops messages when overwhelmed

Dropped if time at execution > send time + timeout

Timeout value dependent on message type

Most client-initiated requests can be dropped

(see MessagingService.DROPPABLE_VERBS)
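
The drop rule above can be sketched in a few lines. This is a minimal illustration, not MessagingService itself: the class name, the three verbs, the contents of the DROPPABLE set, and the timeout values are all assumptions chosen for the example.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical sketch; none of these names are Cassandra's own classes.
class DroppableMessageSketch {
    enum Verb { READ, MUTATION, GOSSIP_DIGEST_SYN }

    // Assumed subset of droppable verbs; the real list lives in MessagingService.DROPPABLE_VERBS.
    static final Set<Verb> DROPPABLE = EnumSet.of(Verb.READ, Verb.MUTATION);

    // Assumed per-verb timeouts in milliseconds (illustrative values only).
    static long timeoutMillis(Verb verb) {
        switch (verb) {
            case READ: return 5000;
            case MUTATION: return 2000;
            default: return 10000;
        }
    }

    // Drop only droppable verbs whose execution time is past send time + timeout.
    static boolean shouldDrop(Verb verb, long sentAtMillis, long nowMillis) {
        return DROPPABLE.contains(verb) && nowMillis > sentAtMillis + timeoutMillis(verb);
    }

    public static void main(String[] args) {
        System.out.println(shouldDrop(Verb.READ, 0, 6000));              // true
        System.out.println(shouldDrop(Verb.GOSSIP_DIGEST_SYN, 0, 60000)); // false
    }
}
```

Checking at execution time rather than at send time means a replica that is already behind skips work whose coordinator has stopped waiting.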

Page 16: Cassandra Internals Overview

GOSSIP

What it does do:

Disseminates members' state around the cluster

Versioned: generation (per JVM) & version (per value)

Heartbeats: incremented every gossip round

Application state:

Status
Tokens
Release & schema version
DC & Rack
Addresses
Data size
Health
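
The two-level versioning can be illustrated with a small comparison. This is a sketch under the assumption that a higher generation always wins and that versions are only compared within the same generation; the class and method names are hypothetical.

```java
// Hypothetical sketch of comparing gossip state by (generation, version).
class GossipVersionSketch {
    // Returns true when the remote state supersedes what we hold locally.
    static boolean isNewer(int remoteGeneration, int remoteVersion,
                           int localGeneration, int localVersion) {
        // A different generation means the remote JVM was (re)started at a different time.
        if (remoteGeneration != localGeneration)
            return remoteGeneration > localGeneration;
        // Same JVM lifetime: the higher per-value version is the more recent one.
        return remoteVersion > localVersion;
    }

    public static void main(String[] args) {
        System.out.println(isNewer(2, 1, 1, 500)); // true: restart beats any old version
        System.out.println(isNewer(1, 7, 1, 9));   // false: stale version
    }
}
```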

Page 17: Cassandra Internals Overview

GOSSIP

What doesn't it do:

Notify about up or down nodes

Propagate schema

Transmit data files

Distribute mutations

Page 18: Cassandra Internals Overview

GOSSIP

https://wiki.apache.org/cassandra/ArchitectureGossip

Page 19: Cassandra Internals Overview

GOSSIP

org.apache.cassandra.gms.Gossiper

private class GossipTask implements Runnable
{
    public void run()
    {...

Each round (1 second) gossip to:

1 live endpoint

maybe 1 unreachable endpoint

maybe 1 seed - if neither of the above
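
The target selection for one round might look roughly like the sketch below. It is not the Gossiper code: the class and method names are hypothetical, the "maybe" for an unreachable endpoint is modelled as a plain coin flip, and "if neither of the above" is read as "if neither endpoint already chosen is a seed". The real implementation weights these choices differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical sketch of picking gossip targets for a single round.
class GossipRoundSketch {
    static List<String> pickTargets(List<String> live, List<String> unreachable,
                                    List<String> seeds, Random rnd) {
        List<String> targets = new ArrayList<>();
        // Always one random live endpoint, when any exist.
        if (!live.isEmpty())
            targets.add(live.get(rnd.nextInt(live.size())));
        // Assumption: a coin flip decides whether to also try an unreachable endpoint.
        if (!unreachable.isEmpty() && rnd.nextBoolean())
            targets.add(unreachable.get(rnd.nextInt(unreachable.size())));
        // Assumption: contact a seed only if none of the targets so far is one.
        boolean gossipedToSeed = false;
        for (String target : targets)
            if (seeds.contains(target))
                gossipedToSeed = true;
        if (!gossipedToSeed && !seeds.isEmpty())
            targets.add(seeds.get(rnd.nextInt(seeds.size())));
        return targets;
    }

    public static void main(String[] args) {
        List<String> none = Collections.emptyList();
        System.out.println(pickTargets(Arrays.asList("a", "b"), none, Arrays.asList("s"), new Random()));
    }
}
```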

Page 20: Cassandra Internals Overview

SCHEMA MIGRATION

Another custom protocol

Also uses MessagingService

Target schema objects serialized as Mutations

diff/merge schema representations

Page 21: Cassandra Internals Overview

SCHEMA PUSH

org.apache.cassandra.service.MigrationManager

private static Future<?> announce(final Collection<Mutation> schema)

Page 22: Cassandra Internals Overview

SCHEMA PULL

org.apache.cassandra.service.MigrationManager

public void scheduleSchemaPull(InetAddress endpoint, EndpointState state)

Page 23: Cassandra Internals Overview

COORDINATION

Client request arrives at coordinator:

Transformed into actionable command(s):

IReadCommand
IMutation

Coordinator distributes execution around the cluster

Replicas perform commands and respond to coordinator

Gather responses and determine client response

Page 24: Cassandra Internals Overview

COORDINATION

org.apache.cassandra.service

StorageProxy
AbstractWriteResponseHandler
AbstractReadExecutor

org.apache.cassandra.locator

AbstractReplicationStrategy
IEndpointSnitch

Page 25: Cassandra Internals Overview

https://wiki.apache.org/cassandra/ArchitectureInternals

COORDINATING WRITES

org.apache.cassandra.service.StorageProxy

public static void mutate(Collection<? extends IMutation> mutations,
                          ConsistencyLevel consistency_level)

Get endpoints using replication strategy

Get pending endpoints from ring metadata

Deliver mutations to both sets of endpoints

Collate responses & determine client response

Maybe store local hints for unreachable replicas
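
Collating responses amounts to counting acknowledgements against a required number. The sketch below is a simplified stand-in for a write response handler, with hypothetical names. It assumes a QUORUM-style requirement of RF / 2 + 1 and assumes that each pending endpoint raises the required count by one.

```java
// Hypothetical sketch of counting write acknowledgements on the coordinator.
class WriteAckSketch {
    // Quorum for a given replication factor.
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    private final int required;
    private int received;

    // Assumption: pending endpoints (e.g. bootstrapping nodes) add to the acks we wait for.
    WriteAckSketch(int blockFor, int pendingEndpoints) {
        this.required = blockFor + pendingEndpoints;
    }

    // Called once per replica response; returns true once enough have arrived.
    synchronized boolean onResponse() {
        received++;
        return received >= required;
    }

    public static void main(String[] args) {
        WriteAckSketch handler = new WriteAckSketch(quorum(3), 0);
        System.out.println(handler.onResponse()); // false
        System.out.println(handler.onResponse()); // true
    }
}
```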

Page 26: Cassandra Internals Overview

DATA REPLICATION

org.apache.cassandra.locator.SimpleStrategy
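
SimpleStrategy places replicas by walking the ring clockwise from the token and taking the next distinct nodes, ignoring topology. A minimal sketch of that walk, with a hypothetical class name and with tokens modelled as plain longs and nodes as strings:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical sketch of a topology-unaware clockwise ring walk.
class SimpleStrategySketch {
    static List<String> replicasFor(long token, NavigableMap<Long, String> ring, int replicationFactor) {
        List<String> replicas = new ArrayList<>();
        if (ring.isEmpty())
            return replicas;
        // The owner is the first ring token >= the search token, wrapping to the start.
        Long start = ring.ceilingKey(token);
        if (start == null)
            start = ring.firstKey();
        // Ring order beginning at the owner, then wrapping around.
        List<String> clockwise = new ArrayList<>(ring.tailMap(start, true).values());
        clockwise.addAll(ring.headMap(start, false).values());
        for (String node : clockwise) {
            if (replicas.size() == replicationFactor)
                break;
            // A node owning several tokens still counts only once.
            if (!replicas.contains(node))
                replicas.add(node);
        }
        return replicas;
    }

    public static void main(String[] args) {
        NavigableMap<Long, String> ring = new TreeMap<>();
        ring.put(0L, "A");
        ring.put(100L, "B");
        ring.put(200L, "C");
        System.out.println(replicasFor(50L, ring, 2));  // [B, C]
        System.out.println(replicasFor(250L, ring, 2)); // [A, B]
    }
}
```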

Page 27: Cassandra Internals Overview

DATA REPLICATION

org.apache.cassandra.locator.NetworkTopologyStrategy

Page 28: Cassandra Internals Overview

https://wiki.apache.org/cassandra/ArchitectureInternals

COORDINATING WRITES

org.apache.cassandra.service.StorageProxy

public static void mutate(Collection<? extends IMutation> mutations,
                          ConsistencyLevel consistency_level)

Get endpoints using replication strategy

Get pending endpoints from ring metadata

Deliver mutations to both sets of endpoints

Collate responses & determine client response

Maybe store local hints for unreachable replicas

Page 29: Cassandra Internals Overview

DELIVERING MUTATIONS

org.apache.cassandra.service.StorageProxy

public static void sendToHintedEndpoints(final Mutation mutation,
                                         Iterable<InetAddress> targets,
                                         AbstractWriteResponseHandler responseHandler,
                                         String localDataCenter)

Mutations sent to replicas using MessagingService

ResponseHandler registered as callback

Callback registry triggers an event on expiry

Sent directly within local datacenter

Forwarded via single node in each remote DC
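
The last two points describe a routing plan: local replicas get the mutation directly, while each remote datacenter receives one copy that a single node forwards on. A rough sketch of building such a plan, with hypothetical names, and assuming the first remote replica seen in each datacenter acts as the forwarder:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: who the coordinator messages directly, and what each of them forwards.
class CrossDcForwardingSketch {
    // Input: replica -> datacenter (iteration order decides the forwarder).
    // Output: directly contacted replica -> replicas it should forward to.
    static Map<String, List<String>> plan(Map<String, String> targetToDc, String localDc) {
        Map<String, List<String>> plan = new LinkedHashMap<>();
        Map<String, String> forwarderByDc = new HashMap<>();
        for (Map.Entry<String, String> entry : targetToDc.entrySet()) {
            String node = entry.getKey();
            String dc = entry.getValue();
            if (dc.equals(localDc)) {
                // Local replicas are always contacted directly.
                plan.put(node, new ArrayList<>());
                continue;
            }
            String forwarder = forwarderByDc.get(dc);
            if (forwarder == null) {
                // Assumption: the first replica seen in a remote DC becomes its forwarder.
                forwarderByDc.put(dc, node);
                plan.put(node, new ArrayList<>());
            } else {
                plan.get(forwarder).add(node);
            }
        }
        return plan;
    }

    public static void main(String[] args) {
        Map<String, String> targets = new LinkedHashMap<>();
        targets.put("a", "dc1");
        targets.put("b", "dc1");
        targets.put("c", "dc2");
        targets.put("d", "dc2");
        System.out.println(plan(targets, "dc1")); // {a=[], b=[], c=[d]}
    }
}
```

Sending one message per remote datacenter keeps cross-datacenter traffic proportional to the number of datacenters rather than the number of replicas.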

Page 30: Cassandra Internals Overview

COORDINATING WRITES

org.apache.cassandra.service.StorageProxy

public static void mutate(Collection<? extends IMutation> mutations,
                          ConsistencyLevel consistency_level)

Get endpoints using replication strategy

Get pending endpoints from ring metadata

Deliver mutations to both sets of endpoints

Collate responses & determine client response

Maybe store local hints for unreachable replicas

Page 31: Cassandra Internals Overview

HINTS

Nodes can be down

Writes may time out

In which case we may hint

Enabled/disabled globally or enabled per-DC

Writing a hint counts towards ConsistencyLevel.ANY

Deliver hints when a node comes back up & periodically

Too many hints in progress for a replica means we bail early

Page 32: Cassandra Internals Overview

LOGGED BATCHES

org.apache.cassandra.service.StorageProxy

public static void mutateAtomically(Collection<Mutation> mutations,
                                    ConsistencyLevel consistency_level)

CommitLog for batches

Guarantee eventual success of batched statements

Strives to distribute across racks in local DC

On success, clean up log entries asynchronously

Failed batches replayed by the nodes holding the logs

Determine point of failure by WriteType

WriteType.BATCH_LOG
WriteType.BATCH

Page 33: Cassandra Internals Overview

COORDINATING READS

org.apache.cassandra.service.StorageProxy

public static List<Row> read(List<ReadCommand> commands,
                             ConsistencyLevel consistencyLevel,
                             ClientState state)

Partition based reads

Read Repair & Data vs Digest Requests

Rapid Read Protection & (non)speculating executors

Distribution is slightly more complex than for writes

Page 34: Cassandra Internals Overview

IDENTIFY TARGET ENDPOINTS

org.apache.cassandra.service.AbstractReadExecutor

public static AbstractReadExecutor getReadExecutor(ReadCommand command,
                                                   ConsistencyLevel consistencyLevel)

Use replication strategy to get live endpoints

Snitch sorts by proximity & health of replicas

Consult table metadata for Read Repair Decision

Page 35: Cassandra Internals Overview

READ REPAIR DECISION

Apply filter to sorted list of all live replicas

NONE: closest n replicas required by CL

GLOBAL: all live replicas

DC_LOCAL: all local replicas

Add closest n remotes needed to satisfy CL

Default Global Chance: 0.0

Default Local Chance: 0.1

Gives us a list of replicas to send read requests to
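
The decision and the resulting filter can be sketched as follows. This is an illustration with hypothetical names: it assumes a single random roll in [0, 1) is compared first against the global chance and then against the local chance, and that "n" is the number of replicas the consistency level requires.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of choosing a read repair decision and filtering replicas by it.
class ReadRepairDecisionSketch {
    enum Decision { NONE, GLOBAL, DC_LOCAL }

    // roll is assumed to be uniform in [0, 1).
    static Decision decide(double roll, double globalChance, double dcLocalChance) {
        if (globalChance > roll) return Decision.GLOBAL;
        if (dcLocalChance > roll) return Decision.DC_LOCAL;
        return Decision.NONE;
    }

    // sortedLive is already ordered closest-first (the snitch's job in the slides).
    static List<String> filter(Decision decision, List<String> sortedLive,
                               Set<String> localReplicas, int blockFor) {
        if (decision == Decision.GLOBAL)
            return new ArrayList<>(sortedLive);
        if (decision == Decision.NONE)
            return new ArrayList<>(sortedLive.subList(0, Math.min(blockFor, sortedLive.size())));
        // DC_LOCAL: every local replica, then the closest remotes still needed for the CL.
        List<String> chosen = new ArrayList<>();
        List<String> remotes = new ArrayList<>();
        for (String replica : sortedLive) {
            if (localReplicas.contains(replica)) chosen.add(replica);
            else remotes.add(replica);
        }
        for (String remote : remotes) {
            if (chosen.size() >= blockFor) break;
            chosen.add(remote);
        }
        return chosen;
    }

    public static void main(String[] args) {
        // With the defaults from the slide (0.0 global, 0.1 local):
        System.out.println(decide(0.05, 0.0, 0.1)); // DC_LOCAL
        System.out.println(decide(0.50, 0.0, 0.1)); // NONE
    }
}
```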

Page 36: Cassandra Internals Overview

RAPID READ PROTECTION

Never

Always

Fixed timeout

Table latency percentile

Page 37: Cassandra Internals Overview

LIGHTS, CAMERA, EXECUTION

Fire off each command using read executor

Requests are sent via MessagingService

Closest replica(s) sent full data requests

Others get digest requests

Page 38: Cassandra Internals Overview

RESOLUTION

Resolution can have two outcomes:

Page 39: Cassandra Internals Overview

RESOLUTION

DigestMismatchException

Trigger a foreground read repair

Of all targeted replicas
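
The mismatch check compares the digest of the full data response with the digests returned by the other replicas. The sketch below uses hypothetical names throughout: it models a row as a string, uses MD5 from the JDK as the digest function, and stands in for DigestMismatchException with its own unchecked exception.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of resolving one data response against digest responses.
class DigestResolutionSketch {
    // Stand-in for Cassandra's DigestMismatchException.
    static class DigestMismatch extends RuntimeException {
        DigestMismatch(String message) { super(message); }
    }

    // Assumption: rows are plain strings and MD5 is the digest algorithm.
    static byte[] digest(String rowData) {
        try {
            return MessageDigest.getInstance("MD5").digest(rowData.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns the data when every digest agrees; otherwise signals that a repair is needed.
    static String resolve(String dataResponse, List<byte[]> digestResponses) {
        byte[] expected = digest(dataResponse);
        for (byte[] other : digestResponses)
            if (!Arrays.equals(expected, other))
                throw new DigestMismatch("replicas disagree, foreground read repair needed");
        return dataResponse;
    }

    public static void main(String[] args) {
        System.out.println(resolve("v1", Collections.singletonList(digest("v1")))); // v1
    }
}
```

Digests keep the common case cheap: only one replica ships the full row, and the others ship a fixed-size hash.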

Page 40: Cassandra Internals Overview

FOREGROUND READ REPAIR

All data requests, no digests

Includes replicas contacted initially

Effectively ConsistencyLevel.ALL

Specialized resolver: RowDataResolver

Retry any short reads

May also perform background Read Repair

Page 41: Cassandra Internals Overview

OVERVIEW OVER