
Cassandra Day NY 2014: Apache Cassandra & Python for The New York Times ⨍aбrik Platform

Jan 15, 2015



In this session, you’ll learn how Apache Cassandra is used with Python in the NY Times ⨍aбrik messaging platform. Michael will start with an overview of the NYT⨍aбrik global message bus platform and its “memory” features, and then discuss the team's use of the open-source Apache Cassandra Python driver from DataStax. A progressive series of benchmarks testing features and performance will be presented, from naive and synchronous to asynchronous with multiple I/O loops; the benchmarks are tailored to usage at the NY Times. Code snippets, followed by beer, for those who survive. All code is available on GitHub!
Transcript

Cassandra Python driver: benchmarking concurrency for NYT⨍aбrik

[email protected]


A Global Mesh with a Memory

Message-based: WebSocket, AMQP, SockJS

If in doubt:
• Resend
• Reconnect
• Reread

Idempotent:
• Replicating
• Racy
• Resolving

Classes of service:
• Gold: replicate/race
• Silver: prioritize
• Bronze: queueable

Millions of users
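The idempotent "replicate/race" idea can be sketched as follows. This is one possible reading of the slide, not the ⨍aбrik implementation: the same message is sent over several hypothetical routes at once, the first acknowledgement wins, and because every copy writes the same (hash_key, message_id) key the duplicates collapse into a single row. `ReplicaStore`, `send_via_route`, `race_write` and the route delays are invented for illustration, and threads stand in for real network paths.

```python
import concurrent.futures as cf
import threading
import time
import uuid


class ReplicaStore:
    """Hypothetical in-memory stand-in for one logical table keyed like
    source_data by (hash_key, message_id). Not a Cassandra API."""

    def __init__(self):
        self._rows = {}
        self._lock = threading.Lock()

    def upsert(self, hash_key, message_id, body):
        # Writing the same primary key again replaces the row, so duplicates are harmless.
        with self._lock:
            self._rows[(hash_key, message_id)] = body

    def row_count(self):
        with self._lock:
            return len(self._rows)


def send_via_route(store, route_delay, message):
    # Each hypothetical route (connection, data center) has its own latency.
    time.sleep(route_delay)
    store.upsert(message["hash_key"], message["message_id"], message["body"])
    return route_delay


def race_write(store, route_delays, message):
    """Send the same message over every route and report the fastest one."""
    with cf.ThreadPoolExecutor(max_workers=len(route_delays)) as pool:
        futures = [pool.submit(send_via_route, store, d, message) for d in route_delays]
        done, _ = cf.wait(futures, return_when=cf.FIRST_COMPLETED)
        fastest = min(f.result() for f in done)
    # Leaving the with-block waits for the slower duplicates, which overwrite the same row.
    return fastest


store = ReplicaStore()
message = {"hash_key": 7, "message_id": uuid.uuid1(), "body": b"hello"}
fastest = race_write(store, [0.2, 0.01, 0.1], message)
print(fastest, store.row_count())
```

Racing only stays safe because the write is idempotent; a non-idempotent operation (an append or a counter increment) would be applied once per route.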


Message: an event with data

CREATE TABLE source_data (
    hash_key int,          -- real ones are more complex
    message_id timeuuid,
    body blob,             -- whatever
    metadata text,         -- JSON
    PRIMARY KEY (hash_key, message_id)
);
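A rough in-memory model of how the `source_data` key behaves, assuming standard CQL semantics: `hash_key` picks the partition, `message_id` orders rows inside it, and writing an existing key overwrites the row. The `table`, `insert` and `read_partition` names are hypothetical, and sorting by `uuid.UUID.time` only approximates Cassandra's timeuuid ordering (ties are broken differently).

```python
import uuid
from collections import defaultdict

# Hypothetical in-memory model of source_data:
# partition key hash_key -> {clustering key message_id -> (body, metadata)}
table = defaultdict(dict)


def insert(hash_key, message_id, body, metadata):
    # CQL INSERT is an upsert: the same (hash_key, message_id) overwrites the row.
    table[hash_key][message_id] = (body, metadata)


def read_partition(hash_key):
    # A timeuuid clustering column sorts primarily by its embedded timestamp;
    # uuid.UUID.time exposes that 60-bit value for version-1 UUIDs.
    rows = table[hash_key]
    return [(mid, rows[mid][0]) for mid in sorted(rows, key=lambda u: u.time)]


ids = [uuid.uuid1() for _ in range(3)]
for i, mid in enumerate(reversed(ids)):   # insert newest first
    insert(42, mid, b"body-%d" % i, '{"app_id": "demo"}')
insert(42, ids[0], b"replaced", "{}")     # same key again: overwrite, not a new row

for mid, body in read_partition(42):
    print(mid.time, body)
```

Reads within one partition therefore come back in time order regardless of insertion order, which is what makes a single `hash_key` usable as a replayable message log.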


Push (diagram labels: 1-10kb, Ack)


Pull (diagram labels: 1kb, 10-150kb)

Synchronous: C* Thrift or CQL Native


Concurrent degree = 3

(using the libev event loop)

Asynchronous: CQL Native only
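To illustrate why the asynchronous path matters, here is a toy comparison of one request at a time versus a concurrency degree of 3. It assumes a fixed, made-up 20 ms round trip and uses a thread pool as a stand-in; the DataStax driver itself gets its concurrency from `execute_async` on an event loop such as libev, not from threads.

```python
import time
from concurrent.futures import ThreadPoolExecutor

LATENCY = 0.02  # hypothetical per-request round trip, in seconds


def fake_write(i):
    # Stand-in for one request; sleeping releases the GIL much like network I/O.
    time.sleep(LATENCY)
    return i


def run_synchronous(n):
    start = time.perf_counter()
    results = [fake_write(i) for i in range(n)]  # one request in flight at a time
    return results, time.perf_counter() - start


def run_concurrent(n, degree=3):
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=degree) as pool:  # at most `degree` in flight
        results = list(pool.map(fake_write, range(n)))
    return results, time.perf_counter() - start


sync_results, sync_elapsed = run_synchronous(12)
conc_results, conc_elapsed = run_concurrent(12)
print(f"synchronous: {sync_elapsed:.3f}s, degree 3: {conc_elapsed:.3f}s")
```

With 12 requests the synchronous loop pays for roughly 12 round trips, while the pooled version pays for about 4, because three requests overlap at a time.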


More Concurrency

Can also try:
• DC Aware
• Token Aware
• Subprocessing
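Token-aware and DC-aware routing both decide which node should coordinate a request. The sketch below models that choice on a hypothetical four-node, two-DC ring: hash the partition key to a token, find the replicas that own it, and prefer one in the local data center. In the DataStax Python driver this behavior comes from `TokenAwarePolicy` wrapping `DCAwareRoundRobinPolicy` (both in `cassandra.policies`); the code here is not that implementation, uses md5 instead of Murmur3, and ignores vnodes. Subprocessing is not covered.

```python
import bisect
import hashlib

# Hypothetical 4-node, 2-DC ring of (token, host, dc). Real clusters use
# Murmur3 tokens and vnodes; md5 is only a stand-in hash here.
RING = sorted([
    (-6 * 10**18, "10.0.0.1", "us-east"),
    (-2 * 10**18, "10.0.1.1", "us-west"),
    (2 * 10**18, "10.0.0.2", "us-east"),
    (6 * 10**18, "10.0.1.2", "us-west"),
])
TOKENS = [token for token, _, _ in RING]


def token_for(partition_key):
    digest = hashlib.md5(str(partition_key).encode()).digest()
    return int.from_bytes(digest[:8], "big", signed=True)  # signed 64-bit token


def replicas(partition_key, replication_factor=2):
    # The first node whose token is >= the key's token owns it (wrapping around);
    # further replicas are taken clockwise, as SimpleStrategy does.
    start = bisect.bisect_left(TOKENS, token_for(partition_key)) % len(RING)
    return [RING[(start + i) % len(RING)] for i in range(replication_factor)]


def choose_host(partition_key, local_dc):
    # Token aware: try replicas first. DC aware: only local-DC replicas qualify.
    for _, host, dc in replicas(partition_key):
        if dc == local_dc:
            return host
    # Simplified fallback: any local-DC node, which would forward the request.
    return next(host for _, host, dc in RING if dc == local_dc)


for key in (1, 2, 3):
    print(key, token_for(key), choose_host(key, "us-east"))
```

Sending straight to a local replica saves the extra hop a non-owning coordinator would need, and keeping traffic in the local DC avoids cross-datacenter latency.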


Build one

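import os
import uuid
from datetime import datetime
from random import randint

def build_message(self):
    message = {
        "message_id": str(uuid.uuid1()),
        "hash_key": randint(0, self._hash_key_range),  # int(e ** 8)
        "app_id": self._app_id,
        "timestamp": datetime.utcnow().isoformat() + 'Z',
        "content_type": "application/binary",
        "body": os.urandom(randint(1, self._body_range))  # int(e ** 9)
    }
    # push_message() uses the return value, so hand the message back
    return message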
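A standalone version of `build_message`, handy for trying the message shape outside the benchmark class. Passing `app_id` and the ranges as arguments is an assumption (the slide reads them from `self`), as is using the slide's `int(e ** 8)` and `int(e ** 9)` comments as defaults: that gives hash keys in 0..2980 and bodies of 1 to 8103 bytes, roughly in line with the 1-10kb push message sizes. The timestamp uses `datetime.now(timezone.utc)` because `datetime.utcnow()` is deprecated as of Python 3.12.

```python
import math
import os
import uuid
from datetime import datetime, timezone
from random import randint

# The slide's comments suggest these ranges; using them as defaults is an assumption.
HASH_KEY_RANGE = int(math.e ** 8)  # 2980
BODY_RANGE = int(math.e ** 9)      # 8103 bytes, just under 8 KiB


def build_message(app_id, hash_key_range=HASH_KEY_RANGE, body_range=BODY_RANGE):
    """Standalone variant of the slide's build_message method (hypothetical signature)."""
    return {
        # uuid1 is time-based, so it can be bound to the timeuuid message_id column.
        "message_id": str(uuid.uuid1()),
        # randint includes both endpoints.
        "hash_key": randint(0, hash_key_range),
        "app_id": app_id,
        # UTC timestamp with an explicit "Z"; microseconds are always present here.
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%fZ"),
        "content_type": "application/binary",
        # Between 1 and body_range random bytes of payload.
        "body": os.urandom(randint(1, body_range)),
    }


message = build_message("demo-app")
print(message["message_id"], message["hash_key"], len(message["body"]))
```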


Kick-off

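from time import time

def push_message(self):
    # Submit another message only while fewer than _message_count have been sent
    if next(self._submitted_count) < self._message_count:
        message = self.build_message()
        self.submit_query(message)

def push_initial_data(self):
    self._start_time = time()

    try:
        with self._lock:
            # Prime the pipeline: up to CONCURRENCY (a module-level constant) requests in flight
            for _ in range(min(CONCURRENCY, self._message_count)):
                self.push_message()
    except Exception as exc:
        # Handler not shown on the slide; reporting via note_error is an assumption
        self.note_error(exc)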


Put it in the pipeline

import json
import uuid

def submit_query(self, message):
    body = message.pop('body')

    # Values bound in the order the query's placeholders expect
    substitution_args = (
        json.dumps(message, **JSON_DUMPS_ARGS),
        body,
        message['hash_key'],
        uuid.UUID(message['message_id'])
    )

    # Non-blocking: returns a ResponseFuture immediately
    future = self._cql_session.execute_async(
        self._query, substitution_args
    )

    future.add_callback(self.push_or_finish)
    future.add_errback(self.note_error)
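The argument assembly in `submit_query` can be isolated into a pure function and checked without a cluster. `build_substitution_args` and the value of `JSON_DUMPS_ARGS` are assumptions (the slides define neither), and unlike the slide this version copies the dict instead of popping `body` from the caller's message. The query text in `self._query` is not shown, so the placeholder order is taken as given.

```python
import json
import uuid

# Assumed value; the slides do not show JSON_DUMPS_ARGS.
JSON_DUMPS_ARGS = {"separators": (",", ":"), "sort_keys": True}


def build_substitution_args(message):
    """Hypothetical helper mirroring how submit_query assembles its arguments."""
    message = dict(message)          # copy so the caller's dict is left intact
    body = message.pop("body")       # raw bytes are bound separately (blob column)
    return (
        json.dumps(message, **JSON_DUMPS_ARGS),  # remaining fields become metadata text
        body,
        message["hash_key"],
        uuid.UUID(message["message_id"]),        # string back to uuid.UUID for timeuuid
    )


msg = {"message_id": str(uuid.uuid1()), "hash_key": 3, "app_id": "demo", "body": b"\x00\x01"}
metadata, body, hash_key, message_id = build_substitution_args(msg)
print(metadata)
```

Note that the metadata JSON still carries `hash_key` and `message_id`, since only `body` is removed before serializing.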


Maintain concurrency or finish

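def push_or_finish(self, _):
    # Callback for each successful write: keep the pipeline full or wrap up
    try:
        if (
            self._unfinished and
            next(self._confirmed_count) < self._message_count
        ):
            with self._lock:
                self.push_message()
        else:
            self.finish()
    except Exception as exc:
        # Handler not shown on the slide; reporting via note_error is an assumption
        self.note_error(exc)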
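The kick-off and refill logic from the last three slides amounts to a bounded pipeline: prime it with a fixed number of requests, then let each completion submit one more until every message is acknowledged. Below is a self-contained simulation of that pattern, assuming `concurrent.futures` futures as a stand-in for the driver's `ResponseFuture` (which offers `add_callback`/`add_errback` rather than `add_done_callback`). `PushPipeline`, its counters and the simulated latency are hypothetical; error handling and the `_unfinished` flag are left out.

```python
import itertools
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class PushPipeline:
    """Hypothetical simulation of the prime-then-refill pattern."""

    def __init__(self, message_count, concurrency, latency=0.002):
        self._message_count = message_count
        self._concurrency = concurrency
        self._latency = latency                     # simulated round trip, seconds
        self._submitted_count = itertools.count()   # 0, 1, 2, ...
        self._confirmed_count = itertools.count(1)  # the Nth acknowledgement sees N
        self._lock = threading.RLock()              # reentrant, see note below
        self._finished = threading.Event()
        self._pool = ThreadPoolExecutor(max_workers=concurrency)
        self.in_flight = 0
        self.max_in_flight = 0
        self.confirmed = 0

    def _fake_write(self, n):
        time.sleep(self._latency)                   # stands in for the network call
        return n

    def push_message(self):
        with self._lock:
            n = next(self._submitted_count)
            if n >= self._message_count:
                return                              # everything has been submitted
            self.in_flight += 1
            self.max_in_flight = max(self.max_in_flight, self.in_flight)
            future = self._pool.submit(self._fake_write, n)
            future.add_done_callback(self.push_or_finish)

    def push_or_finish(self, _future):
        with self._lock:
            self.in_flight -= 1
            self.confirmed += 1
            if next(self._confirmed_count) < self._message_count:
                self.push_message()                 # refill the slot that just freed up
            else:
                self._finished.set()                # the last acknowledgement arrived

    def run(self, timeout=10.0):
        if self._message_count <= 0:
            self._pool.shutdown()
            return True
        with self._lock:                            # prime the pipeline
            for _ in range(min(self._concurrency, self._message_count)):
                self.push_message()
        finished = self._finished.wait(timeout)
        self._pool.shutdown(wait=True)
        return finished


pipeline = PushPipeline(message_count=50, concurrency=3)
ok = pipeline.run()
print(ok, pipeline.confirmed, pipeline.max_in_flight)
```

A reentrant lock is used because `add_done_callback` runs the callback immediately in the calling thread when the future has already finished; with a plain `Lock`, that would deadlock while the priming loop holds it.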


Push (diagram labels: 1-10kb, Ack)


Push some messages

usage: bm_push.py [-h] [-c [CQL_HOST [CQL_HOST ...]]] [-d LOCAL_DC]
                  [--remote-dc-hosts REMOTE_DC_HOSTS] [-p PREFETCH_COUNT]
                  [-w WORKER_COUNT] [-a] [-t]
                  [-n {ONE,TWO,THREE,QUORUM,ALL,LOCAL_QUORUM,EACH_QUORUM,SERIAL,LOCAL_SERIAL,LOCAL_ONE}]
                  [-r] [-j] [-l {CRITICAL,ERROR,WARNING,INFO,DEBUG,NOTSET}]

Push messages from a RabbitMQ queue into a Cassandra table.
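A minimal `argparse` parser that reproduces part of the `bm_push.py` usage line. Only the short flags, metavars and choices come from the slide; the `dest` names, the `int` types, and reading `-a`/`-t` as the DC-aware and token-aware switches are assumptions, and `-r`/`-j` are left out because their meaning is not given. The `nargs="*"` on `-c` is inferred from the `[CQL_HOST [CQL_HOST ...]]` form.

```python
import argparse

CONSISTENCY_LEVELS = [
    "ONE", "TWO", "THREE", "QUORUM", "ALL", "LOCAL_QUORUM",
    "EACH_QUORUM", "SERIAL", "LOCAL_SERIAL", "LOCAL_ONE",
]
LOG_LEVELS = ["CRITICAL", "ERROR", "WARNING", "INFO", "DEBUG", "NOTSET"]


def make_parser():
    # dest names and int types are assumptions; flags, metavars and choices
    # follow the bm_push.py usage line. -r and -j are omitted (meaning unknown).
    parser = argparse.ArgumentParser(
        prog="bm_push.py",
        description="Push messages from a RabbitMQ queue into a Cassandra table.",
    )
    parser.add_argument("-c", dest="cql_host", metavar="CQL_HOST", nargs="*")
    parser.add_argument("-d", dest="local_dc", metavar="LOCAL_DC")
    parser.add_argument("--remote-dc-hosts", metavar="REMOTE_DC_HOSTS")
    parser.add_argument("-p", dest="prefetch_count", metavar="PREFETCH_COUNT", type=int)
    parser.add_argument("-w", dest="worker_count", metavar="WORKER_COUNT", type=int)
    parser.add_argument("-a", dest="dc_aware", action="store_true")     # assumed meaning
    parser.add_argument("-t", dest="token_aware", action="store_true")  # assumed meaning
    parser.add_argument("-n", dest="consistency_level", choices=CONSISTENCY_LEVELS)
    parser.add_argument("-l", dest="log_level", choices=LOG_LEVELS)
    return parser


args = make_parser().parse_args(
    ["-c", "10.0.0.1", "10.0.0.2", "-n", "LOCAL_QUORUM", "-a", "-p", "100"]
)
print(args)
```

A chosen level name could then be turned into the driver's constant with `getattr(cassandra.ConsistencyLevel, args.consistency_level)`.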


Push messages many times

usage: run_push.py [-h] [-c [CQL_HOST [CQL_HOST ...]]] [-i ITERATIONS]
                   [-d LOCAL_DC] [-w [worker_count [worker_count ...]]]
                   [-p [prefetch_count [prefetch_count ...]]]
                   [-n [level [level ...]]] [-a] [-t] [-m MESSAGE_EXPONENT]
                   [-b BODY_EXPONENT]
                   [-l {CRITICAL,ERROR,WARNING,INFO,DEBUG,NOTSET}]

Run multiple test cases based upon the product of worker_counts, prefetch_counts, and consistency_levels. Each test case may be run with up to 4 variations reflecting the use or not of the dc_aware and token_aware policies. The results are output to stdout as a JSON object.
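The test matrix described above is a Cartesian product, which `itertools.product` expresses directly. This sketch assumes that `-a` and `-t` enable the with/without variations for the DC-aware and token-aware policies, giving up to 4 combinations per case; the function name and the dictionary layout are hypothetical.

```python
import itertools
import json


def build_test_cases(worker_counts, prefetch_counts, consistency_levels,
                     dc_aware=True, token_aware=True):
    """Hypothetical reconstruction of run_push.py's test matrix."""
    dc_options = [False, True] if dc_aware else [False]
    token_options = [False, True] if token_aware else [False]
    cases = []
    for workers, prefetch, level, use_dc, use_token in itertools.product(
            worker_counts, prefetch_counts, consistency_levels,
            dc_options, token_options):
        cases.append({
            "worker_count": workers,
            "prefetch_count": prefetch,
            "consistency_level": level,
            "dc_aware": use_dc,
            "token_aware": use_token,
        })
    return cases


cases = build_test_cases([1, 2, 4], [10, 100], ["ONE", "QUORUM"])
print(len(cases))  # 3 * 2 * 2 * 4 = 48
print(json.dumps(cases[0]))
```

The matrix grows multiplicatively, so adding a single worker count or consistency level adds a whole slice of runs.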


Pull (diagram labels: 1kb, 10-150kb)
