Top Banner
APACHECON NORTH AMERICA 2013 CASSANDRA INTERNALS Aaron Morton @aaronmorton www.thelastpickle.com Licensed under a Creative Commons Attribution-NonCommercial 3.0 New Zealand License
81

Apache Con NA 2013 - Cassandra Internals

Apr 22, 2015

Download

Technology

aaronmorton

Talk from ApacheCon North America 2013 on Cassandra Internals by Aaron Morton.
Welcome message from author
This document is posted to help you gain knowledge. Please leave a comment to let me know what you think about it! Share it to your friends and learn new things together.
Transcript
Page 1: Apache Con NA 2013 - Cassandra Internals

APACHECON NORTH AMERICA 2013

CASSANDRA INTERNALS

Aaron Morton@aaronmorton

www.thelastpickle.com

Licensed under a Creative Commons Attribution-NonCommercial 3.0 New Zealand License

Page 2: Apache Con NA 2013 - Cassandra Internals

About MeFreelance Cassandra Consultant

Based in Wellington, New ZealandApache Cassandra Committer

Data Stax MVP for Apache Cassandra

Page 3: Apache Con NA 2013 - Cassandra Internals

ArchitectureCode

Page 4: Apache Con NA 2013 - Cassandra Internals

Cassandra Architecture

API's

Cluster Aware

Cluster Unaware

Clients

Disk

Page 5: Apache Con NA 2013 - Cassandra Internals

Cassandra Cluster Architecture

API's

Cluster Aware

Cluster Unaware

Clients

Disk

API's

Cluster Aware

Cluster Unaware

Disk

Node 1 Node 2

Page 6: Apache Con NA 2013 - Cassandra Internals

Dynamo Cluster Architecture

API's

Dynamo

Database

Clients

Disk

API's

Dynamo

Database

Disk

Node 1 Node 2

Page 7: Apache Con NA 2013 - Cassandra Internals

ArchitectureAPI

DynamoDatabase

Page 8: Apache Con NA 2013 - Cassandra Internals

API Transports

ThriftNative Binary

Read LineRMI

Page 9: Apache Con NA 2013 - Cassandra Internals

Thrift Transport

//Custom TServer implementations

o.a.c.thrift.CustomTThreadPoolServero.a.c.thrift.CustomTNonBlockingServero.a.c.thrift.CustomTHsHaServer

Page 10: Apache Con NA 2013 - Cassandra Internals

API Transports

ThriftNative Binary

Read LineRMI

Page 11: Apache Con NA 2013 - Cassandra Internals

Native Binary Transport

Beta in Cassandra 1.2Uses Netty 3.5Enabled with

start_native_transport(Disabled by default)

Page 12: Apache Con NA 2013 - Cassandra Internals

o.a.c.transport.Server.run()

//Setup the Netty servernew ExecutionHandler()new NioServerSocketChannelFactory()ServerBootstrap.setPipelineFactory()

Page 13: Apache Con NA 2013 - Cassandra Internals

o.a.c.transport.Message.Dispatcher.messageReceived()

//Process message from clientServerConnection.validateNewMessage()Request.execute()ServerConnection.applyStateTransition()Channel.write()

Page 14: Apache Con NA 2013 - Cassandra Internals

o.a.c.transport.messages

CredentialsMessage()EventMessage()ExecuteMessage()PrepareMessage()QueryMessage()ResultMessage()

(And more...)

Page 15: Apache Con NA 2013 - Cassandra Internals

Messages

Defined in the Native Binary Protocol

$SRC/doc/native_protocol.spec

Page 16: Apache Con NA 2013 - Cassandra Internals

API Services

JMXCLI

ThriftCQL 3

Page 17: Apache Con NA 2013 - Cassandra Internals

JMX Management Beans

Spread around the code base.

Interfaces named *MBean

Page 18: Apache Con NA 2013 - Cassandra Internals

JMX Management Beans

Registered with the names such as

org.apache.cassandra.db:type=StorageProxy

Page 19: Apache Con NA 2013 - Cassandra Internals

API Services

JMXCLI

ThriftCQL 3

Page 20: Apache Con NA 2013 - Cassandra Internals

o.a.c.cli.CliMain.main()

// Connect to server to read inputthis.connect()this.evaluateFileStatements()this.processStatementInteractive()

Page 21: Apache Con NA 2013 - Cassandra Internals

CLI Grammar

ANTLR Grammar$SRC/src/java/o/a/c/cli/CLI.g

Page 22: Apache Con NA 2013 - Cassandra Internals

o.a.c.cli.CliClient.executeCLIStatement()

// Process statementCliCompiler.compileQuery() #ANTLRswitch (tree.getType()) case...

Page 23: Apache Con NA 2013 - Cassandra Internals

API Services

JMXCLI

ThriftCQL 3

Page 24: Apache Con NA 2013 - Cassandra Internals

o.a.c.thrift.CassandraServer

// Implements Thrift Interface// Access control// Input validation// Mapping to/from Thrift and internal types

Page 25: Apache Con NA 2013 - Cassandra Internals

Thrift Interface

Thrift IDL$SRC/interface/cassandra.thrift

Page 26: Apache Con NA 2013 - Cassandra Internals

o.a.c.thrift.CassandraServer.get_slice()

// get columns for one rowTracing.begin()ClientState cState = state()cState.hasColumnFamilyAccess()multigetSliceInternal()

Page 27: Apache Con NA 2013 - Cassandra Internals

CassandraServer.multigetSliceInternal()

// get columns for may rowsThriftValidation.validate*()// Create ReadCommandsgetSlice()

Page 28: Apache Con NA 2013 - Cassandra Internals

CassandraServer.getSlice()

// Process ReadCommands// return Thrift types

readColumnFamily()thriftifyColumnFamily()

Page 29: Apache Con NA 2013 - Cassandra Internals

CassandraServer.readColumnFamily()

// Process ReadCommands// Return ColumnFamilies

StorageProxy.read()

Page 30: Apache Con NA 2013 - Cassandra Internals

API Services

JMXCLI

ThriftCQL 3

Page 31: Apache Con NA 2013 - Cassandra Internals

o.a.c.cql3.QueryProcessor

// Prepares and executes CQL3 statements// Used by Thrift & Native transports// Access control// Input validation// Returns transport.ResultMessage

Page 32: Apache Con NA 2013 - Cassandra Internals

CQL3 Grammar

ANTLR Grammar$SRC/o.a.c.cql3/Cql.g

Page 33: Apache Con NA 2013 - Cassandra Internals

o.a.c.cql3.statements.ParsedStatement

// Subclasses generated by ANTLR// Tracks bound term count// Prepare CQLStatementprepare()

Page 34: Apache Con NA 2013 - Cassandra Internals

o.a.c.cql3.statements.CQLStatement

checkAccess(ClientState state)validate(ClientState state)execute(ConsistencyLevel cl, QueryState state, List<ByteBuffer> variables)

Page 35: Apache Con NA 2013 - Cassandra Internals

o.a.c.cql3.functions.Function

argsType()returnType()execute(List<ByteBuffer> parameters)

Page 36: Apache Con NA 2013 - Cassandra Internals

statements.SelectStatement.RawStatement

// Implements ParsedStatement// Input validationprepare()

Page 37: Apache Con NA 2013 - Cassandra Internals

statements.SelectStatement.execute()

// Create ReadCommandsStorageProxy.read()

Page 38: Apache Con NA 2013 - Cassandra Internals

ArchitectureAPI

DynamoDatabase

Page 39: Apache Con NA 2013 - Cassandra Internals

Dynamo Layero.a.c.service

o.a.c.neto.a.c.dht

o.a.c.locatoro.a.c.gms

o.a.c.stream

Page 40: Apache Con NA 2013 - Cassandra Internals

o.a.c.service.StorageProxy

// Cluster wide storage operations// Select endpoints & check CL available// Send messages to Stages// Wait for response// Store Hints

Page 41: Apache Con NA 2013 - Cassandra Internals

o.a.c.service.StorageService

// Ring operations// Track ring state// Start & stop ring membership// Node & token queries

Page 42: Apache Con NA 2013 - Cassandra Internals

o.a.c.service.IResponseResolver

preprocess(MessageIn<T> message)resolve() throws DigestMismatchException

RowDigestResolverRowDataResolverRangeSliceResponseResolver

Page 43: Apache Con NA 2013 - Cassandra Internals

Response Handlers / Callback

implements IAsyncCallback<T>

response(MessageIn<T> msg)

Page 44: Apache Con NA 2013 - Cassandra Internals

o.a.c.service.ReadCallback.get()

//Wait for blockfor & datacondition.await(timeout, TimeUnit.MILLISECONDS)

throw ReadTimeoutException()

resolver.resolve()

Page 45: Apache Con NA 2013 - Cassandra Internals

o.a.c.service.StorageProxy.fetchRows()

getLiveSortedEndpoints()new RowDigestResolver()new ReadCallback()MessagingService.sendRR()---------------------------------------ReadCallback.get() # blockingcatch (DigestMismatchException ex)catch (ReadTimeoutException ex)

Page 46: Apache Con NA 2013 - Cassandra Internals

Dynamo Layero.a.c.service

o.a.c.neto.a.c.dht

o.a.c.locatoro.a.c.gms

o.a.c.stream

Page 47: Apache Con NA 2013 - Cassandra Internals

o.a.c.net.MessagingService.verb<<enum>>

MUTATIONREADREQUEST_RESPONSETREE_REQUESTTREE_RESPONSE

(And more...)

Page 48: Apache Con NA 2013 - Cassandra Internals

o.a.c.net.MessagingService.verbHandlers

new EnumMap<Verb, IVerbHandler>(Verb.class)

Page 49: Apache Con NA 2013 - Cassandra Internals

o.a.c.net.IVerbHandler<T>

doVerb(MessageIn<T> message, String id);

Page 50: Apache Con NA 2013 - Cassandra Internals

o.a.c.net.MessagingService.verbStages

new EnumMap<MessagingService.Verb, Stage>(MessagingService.Verb.class)

Page 51: Apache Con NA 2013 - Cassandra Internals

o.a.c.net.MessagingService.receive()

runnable = new MessageDeliveryTask( message, id, timestamp);

StageManager.getStage( message.getMessageType());

stage.execute(runnable);

Page 52: Apache Con NA 2013 - Cassandra Internals

o.a.c.net.MessageDeliveryTask.run()

// If dropable and rpc_timeoutMessagingService.incrementDroppedMessag

es(verb);

MessagingService.getVerbHandler(verb)verbHandler.doVerb(message, id)

Page 53: Apache Con NA 2013 - Cassandra Internals

Dynamo Layero.a.c.service

o.a.c.neto.a.c.dht

o.a.c.locatoro.a.c.gms

o.a.c.stream

Page 54: Apache Con NA 2013 - Cassandra Internals

o.a.c.dht.IPartitioner<T extends Token>

getToken(ByteBuffer key)getRandomToken()

LocalPartitionerRandomPartitionerMurmur3Partitioner

Page 55: Apache Con NA 2013 - Cassandra Internals

o.a.c.dht.Token<T>

compareTo(Token<T> o)

BytesTokenBigIntegerTokenLongToken

Page 56: Apache Con NA 2013 - Cassandra Internals

Dynamo Layero.a.c.service

o.a.c.neto.a.c.dht

o.a.c.locatoro.a.c.gms

o.a.c.stream

Page 57: Apache Con NA 2013 - Cassandra Internals

o.a.c.locator.IEndpointSnitch

getRack(InetAddress endpoint)getDatacenter(InetAddress endpoint)sortByProximity(InetAddress address,

List<InetAddress> addresses)

SimpleSnitchPropertyFileSnitchEc2MultiRegionSnitch

Page 58: Apache Con NA 2013 - Cassandra Internals

o.a.c.locator.AbstractReplicationStrategy

getNaturalEndpoints( RingPosition searchPosition)calculateNaturalEndpoints(Token searchToken, TokenMetadata tokenMetadata)

SimpleStrategyNetworkTopologyStrategy

Page 59: Apache Con NA 2013 - Cassandra Internals

o.a.c.locator.TokenMetadata

BiMultiValMap<Token, InetAddress> tokenToEndpointMapBiMultiValMap<Token, InetAddress> bootstrapTokensSet<InetAddress> leavingEndpoints

Page 60: Apache Con NA 2013 - Cassandra Internals

Dynamo Layero.a.c.service

o.a.c.neto.a.c.dht

o.a.c.locatoro.a.c.gms

o.a.c.stream

Page 61: Apache Con NA 2013 - Cassandra Internals

o.a.c.gms.VersionedValue

// VersionGenerator.getNextVersion()

public final int version;public final String value;

Page 62: Apache Con NA 2013 - Cassandra Internals

o.a.c.gms.ApplicationState<<enum>>

STATUSLOADSCHEMADCRACK

(And more...)

Page 63: Apache Con NA 2013 - Cassandra Internals

o.a.c.gms.HeartBeatState

//VersionGenerator.getNextVersion();

private int generation;private int version;

Page 64: Apache Con NA 2013 - Cassandra Internals

o.a.c.gms.Gossiper.GossipTask.run()

// SYN -> ACK -> ACK2makeRandomGossipDigest()new GossipDigestSyn()

// Use MessagingService.sendOneWay()Gossiper.doGossipToLiveMember()Gossiper.doGossipToUnreachableMember()Gossiper.doGossipToSeed()

Page 65: Apache Con NA 2013 - Cassandra Internals

gms.GossipDigestSynVerbHandler.doVerb()

Gossiper.examineGossiper()new GossipDigestAck()MessagingService.sendOneWay()

Page 66: Apache Con NA 2013 - Cassandra Internals

gms.GossipDigestAck2VerbHandler.doVerb()

Gossiper.notifyFailureDetector()Gossiper.applyStateLocally()

Page 67: Apache Con NA 2013 - Cassandra Internals

ArchitectureAPI Layer

Dynamo LayerDatabase Layer

Page 68: Apache Con NA 2013 - Cassandra Internals

Database Layero.a.c.concurrent

o.a.c.db

o.a.c.cacheo.a.c.io

o.a.c.trace

Page 69: Apache Con NA 2013 - Cassandra Internals

o.a.c.concurrent.StageManager

stages = new EnumMap<Stage, ThreadPoolExecutor>(Stage.class);

getStage(Stage stage)

Page 70: Apache Con NA 2013 - Cassandra Internals

o.a.c.concurrent.Stage

READMUTATIONGOSSIPREQUEST_RESPONSEANTI_ENTROPY

(And more...)

Page 71: Apache Con NA 2013 - Cassandra Internals

Database Layero.a.c.concurrent

o.a.c.db

o.a.c.cacheo.a.c.io

o.a.c.trace

Page 72: Apache Con NA 2013 - Cassandra Internals

o.a.c.db.Table

// Keyspaceopen(String table)getColumnFamilyStore(String cfName)

getRow(QueryFilter filter)apply(RowMutation mutation, boolean writeCommitLog)

Page 73: Apache Con NA 2013 - Cassandra Internals

o.a.c.db.ColumnFamilyStore

// Column FamilygetColumnFamily(QueryFilter filter)getTopLevelColumns(...)

apply(DecoratedKey key, ColumnFamily columnFamily, SecondaryIndexManager.Updater indexer)

Page 74: Apache Con NA 2013 - Cassandra Internals

o.a.c.db.IColumnContainer

addColumn(IColumn column)remove(ByteBuffer columnName)

ColumnFamilySuperColumn

Page 75: Apache Con NA 2013 - Cassandra Internals

o.a.c.db.ISortedColumns

addColumn(IColumn column, Allocator allocator)removeColumn(ByteBuffer name)

ArrayBackedSortedColumnsAtomicSortedColumnsTreeMapBackedSortedColumns

Page 76: Apache Con NA 2013 - Cassandra Internals

o.a.c.db.Memtable

put(DecoratedKey key, ColumnFamily columnFamily, SecondaryIndexManager.Updater indexer)

flushAndSignal(CountDownLatch latch, Future<ReplayPosition> context)

Page 77: Apache Con NA 2013 - Cassandra Internals

Memtable.FlushRunnable.writeSortedContents()

// SSTableWritercreateFlushWriter()

// Iterate through rows & CF’s in orderwriter.append()

Page 78: Apache Con NA 2013 - Cassandra Internals

o.a.c.db.ReadCommand

getRow(Table table)

SliceByNamesReadCommandSliceFromReadCommand

Page 79: Apache Con NA 2013 - Cassandra Internals

o.a.c.db.IDiskAtomFilter

getMemtableColumnIterator(...)getSSTableColumnIterator(...)

IdentityQueryFilterNamesQueryFilterSliceQueryFilter

Page 80: Apache Con NA 2013 - Cassandra Internals

Thanks.

Page 81: Apache Con NA 2013 - Cassandra Internals

Aaron Morton@aaronmorton

www.thelastpickle.com

Licensed under a Creative Commons Attribution-NonCommercial 3.0 New Zealand License