Cassandra as an event sourced journal for big data analytics Cassandra Summit 2015

Martin Zapletal @zapletal_martin

Cake Solutions @cakesolutions

#CassandraSummit

Presented by Anirvan Chakraborty @anirvan_c


● Introduction
● Event sourcing and CQRS
● An emerging technology stack to handle data
● A reference application and its architecture
● A few use cases of the reference application
● Conclusion

● Increasing importance of data analytics
● Current state
  ○ Destructive updates
  ○ Analytics tools with poor scalability and integration
  ○ Manual processes
  ○ Slow iterations
  ○ Not suitable for large amounts of data
● Whole lifecycle of data

● Data processing
● Data stores
● Integration and messaging
● Distributed computing primitives
● Cluster managers and task schedulers
● Deployment, configuration management and DevOps
● Data analytics and machine learning
● Spark, Mesos, Akka, Cassandra, Kafka (SMACK, Infinity)

ACID Mutable State

● Create, Read, Update, Delete
● Exposes mutable internal state
● Many read methods on repositories
● Mapping of data model and objects (impedance mismatch)
● No auditing
● No separation of concerns (read / write, command / event)
● Strongly consistent
● Difficult optimizations of reads / writes
● Difficult to scale
● Intent, behaviour and history are lost

[Figure: Update Account overwrites the Account's Balance = 5 with Balance = 10 in place]

[1]

CQRS

[Figure: CQRS. Labels: Client, Command, Query, DB (write side), DB (read side), Denormalise/Precompute]

Kappa architecture

[Figure: Kappa architecture. Labels: All your data, Kafka, Stream processor, Views (NoSQL, SQL, Spark), Clients]

Batch pipeline

[Figure: Batch pipeline. Labels: Flume, Sqoop, Hive, Impala, Oozie, HDFS]

Lambda Architecture

[Figure: Lambda architecture. Labels: All your data, Batch Layer, Serving Layer, Stream layer (fast), Serving DB, Query]

[2, 3]

● Append only data store
● No updates or deletes (rewriting history)
● Immutable data model
● Decouples data model of the application and storage
● Current state is not persisted but derived from the sequence of updates that led to it
● History, state known at any point in time
● Replayable
● Source of truth
● Optimisations possible
● Works well in a distributed environment - easy partitioning, conflicts
● Helps avoid transactions
● Works well with DDD

Event journal (each row is a balanceChanged event)

userId  date        change
1       10/10/2015  +300
1       11/10/2015  -100
1       23/10/2015  -200
1       24/10/2015  +100
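The journal above can be turned into state by replaying it. The following is a minimal sketch in plain Scala, not the reference application's code: `BalanceChanged`, `balance` and `balanceAt` are hypothetical names, and dates are read as day/month/year.

```scala
import java.time.LocalDate

object EventJournalSketch {
  // Hypothetical event type mirroring the rows of the journal table.
  final case class BalanceChanged(userId: Int, date: LocalDate, change: Int)

  // Append-only journal: events are only ever added, never updated or deleted.
  val journal: Vector[BalanceChanged] = Vector(
    BalanceChanged(1, LocalDate.of(2015, 10, 10), +300),
    BalanceChanged(1, LocalDate.of(2015, 10, 11), -100),
    BalanceChanged(1, LocalDate.of(2015, 10, 23), -200),
    BalanceChanged(1, LocalDate.of(2015, 10, 24), +100)
  )

  // Current state is not stored; it is derived by replaying the events.
  def balance(events: Seq[BalanceChanged], userId: Int): Int =
    events.filter(_.userId == userId).foldLeft(0)(_ + _.change)

  // State at a past point in time is a replay of the history up to that date.
  def balanceAt(events: Seq[BalanceChanged], userId: Int, asOf: LocalDate): Int =
    balance(events.filter(e => !e.date.isAfter(asOf)), userId)
}
```

Replaying all four events gives a balance of 100, while replaying only up to 11/10/2015 gives 200, which is the "state known at any point in time" property.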

● Command Query Responsibility Segregation
● Read and write logically and physically separated
● Reasoning about the application
● Clear separation of concerns (business logic)
● Often different technology, scalability
● Often lower consistency - eventual, causal

Command

● Write side
● Messages, requests to mutate state
● Behaviour, serialized method call essentially
● Don’t expose state
● Validated and may be rejected or emit one or more events (e.g. submitting a form)

Event

● Write side
● Immutable
● Indicating something that has happened
● Atomic record of state change
● Audit log

Query

● Read side
● Precomputed

[Figure: CQRS with an event journal]

Write: the command updateBalance(+100) for userId = 1 goes to the command handler, which appends a balanceChanged event to the event journal.

userId  date        change
1       10/10/2015  +300
1       11/10/2015  -100
1       23/10/2015  -200
1       24/10/2015  +100

Read: a query by userId reads the precomputed balance and returns userId = 1, balance = 100.

userId  balance
1       100
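The write and read paths in the figure can be sketched in a few lines of plain Scala. This is an illustration under assumptions, not the talk's implementation: `UpdateBalance`, `handle`, `project` and the no-overdraft validation rule are all hypothetical.

```scala
object CqrsSketch {
  // Command: a request to mutate state; it may be rejected.
  final case class UpdateBalance(userId: Int, change: Int)
  // Event: an immutable record of something that has happened.
  final case class BalanceChanged(userId: Int, change: Int)

  // Read side: a precomputed view (userId -> balance), updated from events.
  type BalanceView = Map[Int, Int]

  def project(view: BalanceView, event: BalanceChanged): BalanceView =
    view.updated(event.userId, view.getOrElse(event.userId, 0) + event.change)

  // Write side: validate the command against state derived from the journal,
  // then either reject it or emit events. The no-overdraft rule is made up
  // for this sketch.
  def handle(
      journal: Vector[BalanceChanged],
      cmd: UpdateBalance
  ): Either[String, Vector[BalanceChanged]] = {
    val current = journal.filter(_.userId == cmd.userId).map(_.change).sum
    if (current + cmd.change < 0) Left(s"rejected: balance would be ${current + cmd.change}")
    else Right(Vector(BalanceChanged(cmd.userId, cmd.change)))
  }
}
```

The command handler never exposes or mutates state directly; the query side only ever sees the result of folding `project` over the emitted events.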

● Partial order of events for each entity
● Operation semantics, CRDTs

[Figure: UserNameUpdated(A) and UserNameUpdated(B) applied in different orders on two replicas]

● Localization
● Conflicting concurrent histories
  ○ Resubmission
  ○ Deduplication
  ○ Replication
● Identifier
● Version
● Timestamp
● Vector clock
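As an illustration of how a vector clock tells ordered histories apart from conflicting concurrent ones, here is a minimal sketch in plain Scala. The representation (a map from node name to counter) and the names `VectorClock`, `Relation` and `compare` are assumptions made for this example.

```scala
object VectorClockSketch {
  // A vector clock maps a node identifier to the number of events seen from it.
  type VectorClock = Map[String, Long]

  sealed trait Relation
  case object Before extends Relation
  case object After extends Relation
  case object Same extends Relation
  case object Concurrent extends Relation

  // a is Before b if every counter in a is <= the one in b and the clocks differ.
  // If neither clock dominates the other, the histories are concurrent.
  def compare(a: VectorClock, b: VectorClock): Relation = {
    val nodes = a.keySet ++ b.keySet
    val aLeB = nodes.forall(n => a.getOrElse(n, 0L) <= b.getOrElse(n, 0L))
    val bLeA = nodes.forall(n => b.getOrElse(n, 0L) <= a.getOrElse(n, 0L))
    (aLeB, bLeA) match {
      case (true, true)   => Same
      case (true, false)  => Before
      case (false, true)  => After
      case (false, false) => Concurrent
    }
  }
}
```

Two user name updates written on different nodes without knowledge of each other would compare as `Concurrent`, which is the case that needs a resolution strategy such as operation semantics or a CRDT.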

● Actor framework for truly concurrent and distributed systems
● Thread safe mutable state - consistency boundary
● Domain modelling, distributed state
● Simple programming model - asynchronously send messages, create new actors, change behaviour
● Supports CQRS/ES
● Fully distributed - asynchronous, delivery guarantees, failures, time and order, consistency, availability, communication patterns, data locality, persistence, durability, concurrent updates, conflicts, divergence, invariants, ...

[Figure: concurrent updates to shared state. Labels: ?, ? + 1, ? + 2]

[Figure: distributed entities. Labels: UserId = 1, Name = Bob; UserId = 1, Name = Alice; BankAccountId = 1, Balance = 1000; id = 1]

● Distributed domain modelling
● In memory
● Ordering, consistency

● Actor backed by data store
● Immutable event sourced journal
● Supports CQRS (write and read side)
● Persistence, replay on failure, rebalance, at least once delivery

[Figure: journal entries user1, event 1; user1, event 2; user1, event 3; user1, event 4]
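At-least-once delivery means a receiver may see the same message more than once, so it has to deduplicate. A minimal sketch in plain Scala, assuming messages for each entity arrive in sequence order; `Envelope`, `State` and `receive` are hypothetical names and not part of Akka.

```scala
object IdempotentReceiverSketch {
  // Hypothetical message carrying the sender's identity and sequence number.
  final case class Envelope(persistenceId: String, sequenceNr: Long, change: Int)

  // Receiver state: the highest sequence number applied per sender, plus a balance.
  final case class State(highestSeen: Map[String, Long] = Map.empty, balance: Int = 0)

  // Apply a message only if its sequence number has not been applied before.
  def receive(state: State, msg: Envelope): State = {
    val alreadySeen = state.highestSeen.get(msg.persistenceId).exists(_ >= msg.sequenceNr)
    if (alreadySeen) state
    else State(
      state.highestSeen.updated(msg.persistenceId, msg.sequenceNr),
      state.balance + msg.change
    )
  }

  def receiveAll(messages: Seq[Envelope]): State =
    messages.foldLeft(State())(receive)
}
```

Redelivering an already applied message leaves the state unchanged, so a replay after a failure does not double count.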

class UserActor extends PersistentActor {

  override def persistenceId: String = UserPersistenceId(self.path.name).persistenceId

  // The actor starts in the notRegistered behaviour
  override def receiveCommand: Receive = notRegistered(DistributedData(context.system).replicator)

  // Persist the account event, then switch to the registered behaviour
  def notRegistered(distributedData: ActorRef): Receive = {
    case cmd: AccountCommand =>
      persist(AccountEvent(cmd.account)) { acc =>
        context.become(registered(acc))
        sender() ! \/-()
      }
  }

  // Persist each complete exercise session and reply with its id
  def registered(account: Account): Receive = {
    case eres @ EntireResistanceExerciseSession(id, session, sets, examples, deviations) =>
      persist(eres)(data => sender() ! \/-(id))
  }

  override def receiveRecover: Receive = { ... }
}

● Akka Persistence Cassandra journal
  ○ Globally distributed journal
  ○ Scalable, resilient, highly available
  ○ Performant, operational database
● Community plugins

akka {
  persistence {
    journal.plugin = "cassandra-journal"
    snapshot-store.plugin = "cassandra-snapshot-store"
  }
}

● Partition-size
● Events in each cluster partition ordered (persistenceId - partition pair)

CREATE TABLE IF NOT EXISTS ${tableName} (
  processor_id text,
  partition_nr bigint,
  sequence_nr bigint,
  marker text,
  message blob,
  PRIMARY KEY ((processor_id, partition_nr), sequence_nr, marker))
  WITH COMPACT STORAGE
  AND gc_grace_seconds = ${config.gc_grace_seconds}

processor_id  partition_nr  sequence_nr  marker  message
user-1        0             0            H       0x0a6643b334...
user-1        0             1            A       0x0ab2020801...
user-1        0             2            A       0x0a98020801...

● Internal state, moment in time
● Read optimization

CREATE TABLE IF NOT EXISTS ${tableName} (
  processor_id text,
  sequence_nr bigint,
  timestamp bigint,
  snapshot blob,
  PRIMARY KEY (processor_id, sequence_nr))
  WITH CLUSTERING ORDER BY (sequence_nr DESC)

processor_id  sequence_nr  snapshot         timestamp
user-1        16           0x0400000001...  1441696908210
user-1        20           0x0400000001...  1441697587765
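Snapshots are a read optimization: recovery can start from the latest snapshot and replay only the events recorded after it. The sketch below shows that idea in plain Scala with a running balance as the state; `Event`, `Snapshot` and `recover` are hypothetical names and this is not how Akka Persistence is implemented internally.

```scala
object SnapshotRecoverySketch {
  // Hypothetical shapes: a journal entry and a snapshot of a running balance.
  final case class Event(sequenceNr: Long, change: Int)
  final case class Snapshot(sequenceNr: Long, balance: Int)

  // Recover state from the latest snapshot (if any) plus the events after it,
  // instead of replaying the whole journal from the beginning.
  def recover(snapshots: Seq[Snapshot], journal: Seq[Event]): Int = {
    val latest = snapshots.sortBy(_.sequenceNr).lastOption
    val start = latest.map(_.balance).getOrElse(0)
    val remaining = latest.fold(journal)(s => journal.filter(_.sequenceNr > s.sequenceNr))
    remaining.sortBy(_.sequenceNr).foldLeft(start)(_ + _.change)
  }
}
```

Recovering with or without the snapshot yields the same state; the snapshot only shortens the replay.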

● Uses Akka serialization

[Figure: the message blob 0x0a6643b334… is a PersistentRepr (Protobuf) wrapping a Payload: T serialized with Akka Serialization]

actor {
  serialization-bindings {
    "io.muvr.exercise.ExercisePlanDeviation" = kryo,
    "io.muvr.exercise.ResistanceExercise" = kryo,
  }
  serializers {
    java = "akka.serialization.JavaSerializer"
    kryo = "com.twitter.chill.akka.AkkaSerializer"
  }
}

class UserActorView(userId: String) extends PersistentView {

  override def persistenceId: String = UserPersistenceId(userId).persistenceId

  override def viewId: String = UserPersistenceId(userId).persistentViewId

  override def autoUpdateInterval: FiniteDuration = FiniteDuration(100, TimeUnit.MILLISECONDS)

  def receive: Receive = viewState(List.empty)

  def viewState(processedDeviations: List[ExercisePlanProcessedDeviation]): Receive = {
    case EntireResistanceExerciseSession(_, _, _, _, deviations) if isPersistent =>
      context.become(viewState(deviations.filter(condition).map(process) ::: processedDeviations))
    case GetProcessedDeviations => sender() ! processedDeviations
  }
}

● Akka 2.4
● Potentially infinite stream of data
● Ordered, replayable, resumable
● Aggregation, transformation, moving data
● EventsByPersistenceId
● AllPersistenceIds
● EventsByTag

val readJournal =
  PersistenceQuery(system).readJournalFor(CassandraJournal.Identifier)

val source = readJournal.query(
    EventsByPersistenceId(UserPersistenceId(name).persistenceId, 0, Long.MaxValue), NoRefresh)
  .map(_.event)
  .collect { case s: EntireResistanceExerciseSession => s }
  .mapConcat(_.deviations)
  .filter(condition)
  .map(process)

implicit val mat = ActorMaterializer()

val result = source.runFold(List.empty[ExercisePlanDeviation])((x, y) => y :: x)

● Potentially infinite stream of events

Source[Any].map(process).filter(condition)

[Figure: Publisher → process → condition → Subscriber, with backpressure flowing from the Subscriber back to the Publisher]
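The demand-driven idea behind backpressure can be illustrated without Akka Streams. The sketch below uses a plain Scala `Iterator` as a synchronous, pull-based analogy: nothing upstream runs until the consumer asks for an element. It is only an analogy, since Reactive Streams signal demand asynchronously; `process` and `condition` here are hypothetical stand-ins for the stages named on the slide.

```scala
object DemandDrivenSketch {
  // Hypothetical stand-ins for the `process` and `condition` stages.
  def process(n: Int): Int = n * n
  def condition(n: Int): Boolean = n % 2 == 0

  // An unbounded source: nothing is computed until a consumer asks for elements.
  def pipeline: Iterator[Int] = Iterator.from(1).map(process).filter(condition)

  // The consumer signals demand for exactly `n` elements, so the infinite
  // source only produces as much as is needed to satisfy it.
  def firstResults(n: Int): List[Int] = pipeline.take(n).toList
}
```

Calling `DemandDrivenSketch.firstResults(3)` terminates even though the source is infinite, because the consumer bounds how much is produced.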

● In Akka we have the read and write sides separated, in Cassandra we don’t

● Different data model
● Avoid using operational datastore
● Eventual consistency
● Streaming transformations to different format
● Unify journalled and other data

● Computations and analytics queries on the data
● Often iterative, complex, expensive computations
● Prepared and interactive queries
● Data from multiple sources, joins and transformations
● Often directly on a stream of data
● Whole history of events
● Historical behaviour
● Works retrospectively, can answer questions in the future that we don’t know exist yet
● Various data types from various sources
● Large amounts of fast data
● Automated analytics

● Cassandra 3.0 - user defined functions, functional indexes, aggregation functions, materialized views

● Server side denormalization
● Eventual consistency
● Copy of data with different partitioning

[Figure: the same data partitioned by userId and by performance]

● In memory dataflow distributed data processing framework, streaming and batch
● Distributes computation using a higher level API
● Load balancing
● Moves computation to data
● Fault tolerant

● Resilient Distributed Datasets
● Fault tolerance
● Caching
● Serialization
● Transformations
  ○ Lazy, form the DAG
  ○ map, filter, flatMap, union, group, reduce, sort, join, repartition, cartesian, glom, ...
● Actions
  ○ Execute DAG, retrieve result
  ○ reduce, collect, count, first, take, foreach, saveAs…, min, max, ...
● Accumulators
● Broadcast Variables
● Integration
● Streaming
● Machine Learning
● Graph Processing

[Figure: DAG textFile → map → map → reduceByKey → collect]

sc.textFile("counts")
  .map(line => line.split("\t"))
  .map(word => (word(0), word(1).toInt))
  .reduceByKey(_ + _)
  .collect()

[4]
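The split between lazy transformations and actions can be tried locally without Spark. The sketch below mirrors the pipeline above using a plain Scala collection view (Scala 2.13 or later); it is an analogy, not Spark code. The sample lines are made up, and `groupMapReduce` stands in for `reduceByKey(_ + _)`.

```scala
object LazyWordCountSketch {
  // Tab-separated "word<TAB>count" lines, standing in for the "counts" file.
  val lines: Seq[String] = Seq("spark\t2", "akka\t1", "spark\t3")

  // Transformations on a view are lazy: they only describe the computation.
  val pairs = lines.view
    .map(line => line.split("\t"))
    .map(fields => (fields(0), fields(1).toInt))

  // Forcing the result plays the role of an action; groupMapReduce is a
  // local analogue of reduceByKey(_ + _) followed by collect().
  def counts: Map[String, Int] = pairs.groupMapReduce(_._1)(_._2)(_ + _)
}
```

No line is split until `counts` is evaluated, just as Spark builds the DAG first and runs it only when an action such as `collect` is called.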

[Figure: Spark master, Spark worker, Cassandra]

● Cassandra can store
● Spark can process

● Gathering large amounts of heterogeneous data
● Queries
● Transformations
● Complex computations
● Machine learning, data mining, analytics
● Now possible
● Prepared and interactive queries

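import org.apache.spark.{SparkConf, SparkContext}
import com.datastax.spark.connector._ // adds cassandraTable and saveToCassandra

lazy val sparkConf: SparkConf =
  new SparkConf()
    .setAppName(...)
    .setMaster(...)
    .set("spark.cassandra.connection.host", "127.0.0.1")

val sc = new SparkContext(sparkConf)

// Read a Cassandra table as an RDD, transform it and write the result back
val data = sc.cassandraTable[T]("keyspace", "table").select("columns")
val processedData = data.flatMap(...)...
processedData.saveToCassandra("keyspace", "table")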

● Akka Analytics project
● Handles custom Akka serialization

case class JournalKey(persistenceId: String, partition: Long, sequenceNr: Long)

lazy val sparkConf: SparkConf =
  new SparkConf()
    .setAppName(...)
    .setMaster(...)
    .set("spark.cassandra.connection.host", "127.0.0.1")

val sc = new SparkContext(sparkConf)

val events: RDD[(JournalKey, Any)] = sc.eventTable()

events.sortByKey().map(...).filter(...).collect().foreach(println)

● Spark streaming
● Precomputing using Spark or replication, often aiming for a different data model

[Figure: Operational cluster → Analytics cluster via precomputation / replication; integration with other data sources]

val events: RDD[(JournalKey, Any)] = sc.eventTable().cache()
  .filterClass[EntireResistanceExerciseSession]
  .flatMap(_.deviations)

// Spark SQL
val deviationsFrequency = sqlContext.sql(
  """SELECT planned.exercise, hour(time), COUNT(1)
     FROM exerciseDeviations
     WHERE planned.exercise = 'bench press'
     GROUP BY planned.exercise, hour(time)""")

// DataFrame API
val deviationsFrequency2 = exerciseDeviationsDF
  .where(exerciseDeviationsDF("planned.exercise") === "bench press")
  .groupBy(
    exerciseDeviationsDF("planned.exercise"),
    exerciseDeviationsDF("time"))
  .count()

// RDD API
val deviationsFrequency3 = exerciseDeviations
  .filter(_.planned.exercise == "bench press")
  .groupBy(d => (d.planned.exercise, d.time.getHours))
  .map(d => (d._1, d._2.size))

// Feature vector for clustering users
def toVector(user: User): mllib.linalg.Vector =
  Vectors.dense(
    user.frequency, user.performanceIndex, user.improvementIndex)

val events: RDD[(JournalKey, Any)] = sc.eventTable().cache()

val users: RDD[User] = events.filterClass[User]

val kmeans = new KMeans()
  .setK(5)
  .set...

val clusters = kmeans.run(users.map(toVector))

val events: RDD[(JournalKey, Any)] = sc.eventTable().cache()

// One Rating per (session, exercise) pair, weighted by how often it occurs
val ratings = events
  .filterClass[EntireResistanceExerciseSession]
  .flatMap(session =>
    session.sets.flatMap(set =>
      set.sets.map(exercise => (session.id.id, exercise.exercise))))
  .groupBy(e => e)
  .map(g =>
    Rating(normalize(g._1._1), normalize(g._1._2),
      normalize(g._2.size)))

val model = new ALS().run(ratings)

val predictions = model.predict(recommend)

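[Table: ratings matrix of users against exercises (bench press, bicep curl, dead lift), with two known ratings per user: user 1: 5, 2; user 2: 4, 3; user 3: 5, 2; user 4: 3, 1]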

val events = sc.eventTable().cache().toDF()

val lr = new LinearRegression()

val pipeline = new Pipeline().setStages(Array(new UserFilter(), new ZScoreNormalizer(),
  new IntensityFeatureExtractor(), lr))

// build() turns the grid into the Array[ParamMap] that setEstimatorParamMaps expects
val paramGrid = new ParamGridBuilder()
  .addGrid(lr.regParam, Array(0.1, 0.01))
  .addGrid(lr.fitIntercept, Array(true, false))
  .build()

getEligibleUsers(events, sessionEndedBefore)
  .map { user =>
    val trainValidationSplit = new TrainValidationSplit()
      .setEstimator(pipeline)
      .setEvaluator(new RegressionEvaluator)
      .setEstimatorParamMaps(paramGrid)

    val model = trainValidationSplit.fit(
      events,
      ParamMap(ParamPair(userIdParam, user)))

    val testData = // Prepare test data.
    val predictions = model.transform(testData)

    submitResult(userId, predictions, config)
  }

val events: RDD[(JournalKey, Any)] = sc.eventTable().cache()

val connections = events.filterClass[Connections]

val vertices: RDD[(VertexId, Long)] =
  connections.map(c => (c.id, 1L))

val edges: RDD[Edge[Long]] = connections
  .flatMap(c => c.connections
    .map(Edge(c.id, _, 1L)))

val graph = Graph(vertices, edges)

val ranks = graph.pageRank(0.0001).vertices
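To show what a PageRank computation does to a graph of user connections, here is a minimal power-iteration sketch in plain Scala. It is not GraphX's implementation: it runs a fixed number of iterations rather than to a tolerance, normalizes ranks to sum to 1, assumes every node has at least one outgoing edge, and uses the conventional damping factor 0.85. All names are hypothetical.

```scala
object PageRankSketch {
  // edges are (from, to) pairs; every node is assumed to have an outgoing edge.
  def pageRank(
      edges: Seq[(Int, Int)],
      iterations: Int,
      damping: Double = 0.85
  ): Map[Int, Double] = {
    val nodes = edges.flatMap { case (a, b) => Seq(a, b) }.distinct
    val outDegree = edges.groupBy(_._1).map { case (k, v) => k -> v.size }
    val n = nodes.size
    var ranks: Map[Int, Double] = nodes.map(v => v -> (1.0 / n)).toMap
    for (_ <- 1 to iterations) {
      // Each node splits its current rank evenly across its outgoing edges.
      val contributions = edges
        .map { case (from, to) => to -> ranks(from) / outDegree(from) }
        .groupBy(_._1)
        .map { case (k, v) => k -> v.map(_._2).sum }
      ranks = nodes
        .map(v => v -> ((1 - damping) / n + damping * contributions.getOrElse(v, 0.0)))
        .toMap
    }
    ranks
  }
}
```

A node that many others connect to accumulates rank from all of them, which is why the most connected users end up at the top.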

[Figure: example result "7 * Dumbbell Alternating Curl". Training pipeline: Data → Preprocessing → Features → Training. Testing pipeline: Data → Preprocessing → Features → Testing → Error %]

● Exercise domain as an example
● Analytics of both batch (offline) and streaming (online) data
● Analytics important in other areas (banking, stock market, network, cluster monitoring, business intelligence, commerce, internet of things, ...)
● Enabling value of data

● Event sourcing
● CQRS
● Technologies to handle the data
  ○ Spark
  ○ Mesos
  ○ Akka
  ○ Cassandra
  ○ Kafka
● Handling data
● Insights and analytics enable value in data

● Jobs at www.cakesolutions.net/careers
● Code at https://github.com/muvr
● Martin Zapletal @zapletal_martin
● Anirvan Chakraborty @anirvan_c


[1] http://www.benstopford.com/2015/04/28/elements-of-scale-composing-and-scaling-data-platforms/

[2] http://malteschwarzkopf.de/research/assets/google-stack.pdf

[3] http://malteschwarzkopf.de/research/assets/facebook-stack.pdf

[4] http://www.slideshare.net/LisaHua/spark-overview-37479609
