Cassandra Day Atlanta 2015: Software Development with Apache Cassandra: A Walkthrough

Post on 15-Jul-2015

CASSANDRA DAY ATLANTA 2015

SOFTWARE DEVELOPMENT WITH CASSANDRA: A WALKTHROUGH

Nate McCall @zznate

Co-Founder & Sr. Technical Consultant

#CassandraDays

Licensed under a Creative Commons Attribution-NonCommercial 3.0 New Zealand License

About The Last Pickle.

Work with clients to deliver and improve Apache Cassandra based solutions.

Based in New Zealand & USA.

OVERVIEW

Overview:

What makes a software development project successful?

Overview: Successful Software Development

- it ships
- maintainable
- good test coverage
- check out and build

Overview:

Impedance mismatch: distributed systems development on a laptop.

DATA MODELING

Data Modeling:

… a topic unto itself. But quickly:

Data Modeling - Quickly

• It’s Hard
• Do research
• #1 performance problem
• Tip: don’t “port” your schema

Data Modeling - Using CQL:

• tools support
• easy tracing (and trace discovery)
• documentation*

*Maintained in-tree:
https://github.com/apache/cassandra/blob/cassandra-1.2/doc/cql3/CQL.textile
https://github.com/apache/cassandra/blob/cassandra-2.0/doc/cql3/CQL.textile
https://github.com/apache/cassandra/blob/cassandra-2.1/doc/cql3/CQL.textile

Data Modeling - DevCenter :

Tools: DataStax DevCenter

http://www.datastax.com/what-we-offer/products-services/devcenter

WRITING CODE

Writing Code:

ORM? Maybe, but only if it’s very simple

more later…

http://www.datastax.com/documentation/developer/java-driver/2.1/java-driver/reference/crudOperations.html

Writing Code:

use CQL

Writing Code:

Use the Java Driver

Writing Code - Java Driver :

• Reference implementation
• Well written, extensive coverage
• open source
• dedicated resources

https://github.com/datastax/java-driver/

Writing Code - Java Driver :

Existing Spring Users: Spring Data Integration

http://projects.spring.io/spring-data-cassandra/

Writing Code - Java Driver :

Guice Users: “GuicyFig:”

Archaius + Guice

https://stash.safehaus.org/projects/GFIG/repos/main/browse

Writing Code - Java Driver :

Four rules for Writing Code
• one Cluster per physical cluster
• one Session per app per keyspace
• use PreparedStatements
• use Batches to reduce network IO

Writing Code - Java Driver :

Configuration is Similar to Other DB Drivers (with caveats**)

http://www.datastax.com/documentation/developer/java-driver/2.1/common/drivers/reference/clusterConfiguration_c.html

Writing Code - Java Driver - Configuration:

Major Difference: it’s a Cluster!

Writing Code - Java Driver - Configuration:

Two groups of configurations

• policies
• connections

Writing Code - Java Driver - Configuration:

Three Policy Types:
• load balancing
• reconnection
• retry

Writing Code - Java Driver - Configuration:

Connection Options:
• protocol*
• pooling
• socket

*https://github.com/apache/cassandra/blob/cassandra-2.1/doc/native_protocol_v3.spec

Writing Code - Java Driver :

Embrace Asynchronicity (but use RxJava)

https://github.com/ReactiveX/RxJava

Writing Code - Java Driver :

A note about User Defined Types (UDTs)

Writing Code - Java Driver - Using UDTs:

Wait.
- serialized as blobs !!?!
- new version already being discussed*
- will be a painful migration path

* https://issues.apache.org/jira/browse/CASSANDRA-7423

Writing Code:

Metrics API for your own code

https://github.com/apache/cassandra/blob/cassandra-2.1/src/java/org/apache/cassandra/metrics/ColumnFamilyMetrics.java
https://dropwizard.github.io/metrics/3.1.0/

Writing Code - Instrumentation via Metrics API:

Run Riemann locally

http://riemann.io/

Writing Code:

Using Trace (and doing so frequently)

Writing Code - Tracing:

Trace per query via DevCenter

http://www.datastax.com/documentation/cql/3.0/cql/cql_reference/tracing_r.html

Writing Code - Tracing:

Trace per query via cqlsh

http://www.datastax.com/documentation/cql/3.0/cql/cql_reference/tracing_r.html

cqlsh> tracing on;
Now tracing requests.
cqlsh> SELECT doc_version FROM data.documents_by_version
   ... WHERE application_id = myapp
   ... AND document_id = foo
   ... AND chunk_index = 0
   ... ORDER BY doc_version ASC
   ... LIMIT 1;

 doc_version
-------------
       65856

Tracing session: 46211ab0-2702-11e4-9bcf-8d157d448e6b

Preparing statement | 18:05:44,845 | 192.168.1.197 | 22337
Enqueuing data request to /192.168.1.204 | 18:05:44,845 | 192.168.1.197 | 22504
Sending message to /192.168.1.204 | 18:05:44,847 | 192.168.1.197 | 24498
Message received from /192.168.1.197 | 18:05:44,854 | 192.168.1.204 | 872
Executing single-partition query on documents_by_version | 18:05:44,888 | 192.168.1.204 | 35183
Acquiring sstable references | 18:05:44,888 | 192.168.1.204 | 35459
Merging memtable tombstones | 18:05:44,889 | 192.168.1.204 | 35675
Key cache hit for sstable 2867 | 18:05:44,889 | 192.168.1.204 | 35792
Seeking to partition beginning in data file | 18:05:44,889 | 192.168.1.204 | 35817
…

…
Merging data from memtables and 8 sstables | 18:05:44,892 | 192.168.1.204 | 38605
Read 1 live and 2667 tombstoned cells | 18:05:54,135 | 192.168.1.204 | 9282428
Enqueuing response to /192.168.1.197 | 18:05:54,136 | 192.168.1.204 | 9283423
Sending message to /192.168.1.197 | 18:05:54,138 | 192.168.1.204 | 9284753
Message received from /192.168.1.204 | 18:05:54,155 | 192.168.1.197 | 9332505
Processing response from /192.168.1.204 | 18:05:54,158 | 192.168.1.197 | 9335372
Request complete | 18:05:54,158 | 192.168.1.197 | 9335592

!!?!
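
The pause flagged in the trace above can also be found mechanically. The following is a minimal sketch, not part of the original talk: it assumes the last column of each trace row is the per-node elapsed time in microseconds, so gaps are only compared between rows from the same source node. The `Event` struct and both helper names are hypothetical, and only a subset of the rows shown above is embedded.

```ruby
# A subset of the trace rows shown above: activity | timestamp | source | source_elapsed (µs)
TRACE = <<~ROWS
  Sending message to /192.168.1.204 | 18:05:44,847 | 192.168.1.197 | 24498
  Message received from /192.168.1.197 | 18:05:44,854 | 192.168.1.204 | 872
  Seeking to partition beginning in data file | 18:05:44,889 | 192.168.1.204 | 35817
  Merging data from memtables and 8 sstables | 18:05:44,892 | 192.168.1.204 | 38605
  Read 1 live and 2667 tombstoned cells | 18:05:54,135 | 192.168.1.204 | 9282428
  Enqueuing response to /192.168.1.197 | 18:05:54,136 | 192.168.1.204 | 9283423
  Message received from /192.168.1.204 | 18:05:54,155 | 192.168.1.197 | 9332505
  Request complete | 18:05:54,158 | 192.168.1.197 | 9335592
ROWS

# Hypothetical record type for one trace row (the timestamp column is ignored).
Event = Struct.new(:activity, :source, :elapsed_us)

# Split each non-empty row on " | " and keep activity, source and elapsed time.
def parse_trace(text)
  text.each_line.map(&:strip).reject(&:empty?).map do |line|
    activity, _timestamp, source, elapsed = line.split(' | ')
    Event.new(activity, source, Integer(elapsed))
  end
end

# For each source node, return [activity, gap_in_us] for the row that was
# preceded by the longest pause on that same node.
def slowest_step_per_source(events)
  events.group_by(&:source).transform_values do |rows|
    rows.each_cons(2)
        .map { |prev, cur| [cur.activity, cur.elapsed_us - prev.elapsed_us] }
        .max_by { |_activity, gap| gap }
  end
end

slowest = slowest_step_per_source(parse_trace(TRACE))
slowest.each do |source, (activity, gap)|
  puts "#{source}: #{gap} us before '#{activity}'"
end
```

On the replica (192.168.1.204) the largest gap sits in front of "Read 1 live and 2667 tombstoned cells", roughly 9.2 seconds; the coordinator's largest gap is simply the time it spent waiting for that replica.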

Writing Code - Tracing:

Enable traces in the driver

http://www.datastax.com/documentation/developer/java-driver/2.0/java-driver/tracing_t.html

Writing Code - Tracing:

`nodetool settraceprobability`

Writing Code - Tracing:

…then make sure you try it again with a node down!

Writing Code - Tracing:

Final note on tracing: do it sparingly

Writing Code - Tracing:

Coming Soon: slow query log (client side)

https://github.com/datastax/java-driver/compare/java646
https://datastax-oss.atlassian.net/browse/JAVA-646

Writing Code:

Logging Verbosity can be changed dynamically**

** since 0.4rc1

http://www.datastax.com/documentation/cassandra/2.0/cassandra/configuration/configLoggingLevels_r.html

Writing Code:

nodetool for developers
• cfstats
• cfhistograms
• proxyhistograms

Writing Code - nodetool - cfstats:

cfstats: per-table statistics about size and performance (single most useful command)

Writing Code - nodetool - cfhistograms:

cfhistograms: column count and partition size vs. latency distribution

Writing Code - nodetool - proxyhistograms:

proxyhistograms: performance of inter-node requests within the cluster, as seen by the coordinator

MANAGING ENVIRONMENTS

Managing Environments:

Configuration Management is Essential

Managing Environments:

Laptop to Production with NO Manual Modifications!

Managing Environments:

Running Cassandra during development

Managing Environments - Running Cassandra:

Local Cassandra
• easy to setup
• you control it
• but then you control it!

Managing Environments - Running Cassandra:

CCM
• supports multiple versions
• clusters and datacenters
• up/down individual nodes

https://github.com/pcmanus/ccm

Managing Environments - Running Cassandra:

Vagrant
• isolated, controlled environment
• configuration mgmt integration
• same CM for production!

http://www.vagrantup.com/

server_count = 3
network = '192.168.2.'
first_ip = 10

servers = []
seeds = []
cassandra_tokens = []
(0..server_count-1).each do |i|
  name = 'node' + (i + 1).to_s
  ip = network + (first_ip + i).to_s
  seeds << ip
  servers << {'name' => name,
              'ip' => ip,
              'initial_token' => (2**64 / server_count * i) - 2**63}
end

# `server` is one entry of the `servers` array built earlier in the Vagrantfile
chef.json = {
  :cassandra => {
    'cluster_name' => 'VerifyCluster',
    'version' => '2.0.8',
    'setup_jna' => false,
    'max_heap_size' => '512M',
    'heap_new_size' => '100M',
    'initial_token' => server['initial_token'],
    'seeds' => "192.168.2.10",
    'listen_address' => server['ip'],
    'broadcast_address' => server['ip'],
    'rpc_address' => server['ip'],
    'concurrent_reads' => "2",
    'concurrent_writes' => "2",
    'memtable_flush_queue_size' => "2",
    'compaction_throughput_mb_per_sec' => "8",
    'key_cache_size_in_mb' => "4",
    'key_cache_save_period' => "0",
    'native_transport_min_threads' => "2",
    'native_transport_max_threads' => "4"
  },
}
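
To see what the `initial_token` expression in the Vagrantfile excerpt produces, here is a small worked example. It is a sketch that assumes the cluster uses the default Murmur3Partitioner, whose token range runs from -2**63 to 2**63 - 1; the helper name `initial_tokens` is hypothetical and only wraps the expression from the slide.

```ruby
# Evenly spaced initial tokens, using the same expression as the Vagrantfile
# excerpt: (2**64 / server_count * i) - 2**63 for node index i.
def initial_tokens(server_count)
  (0...server_count).map { |i| (2**64 / server_count * i) - 2**63 }
end

tokens = initial_tokens(3)
tokens.each_with_index { |token, i| puts "node#{i + 1}: #{token}" }
```

For three nodes this yields -9223372036854775808, -3074457345618258603 and 3074457345618258602, so each node owns about a third of the ring.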

TESTING

Testing:

Use a Naming Scheme

• *UnitTest.java: no external resources
• *ITest.java: uses external resources
• *PITest.java: safely parallel “ITest”
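
One detail of the suffixes above is that every `*PITest.java` name also ends in `ITest.java`, so any tooling that sorts tests by name has to check the longer suffix first. The sketch below illustrates that ordering; the helper `test_category`, the category symbols and the example file names are all hypothetical.

```ruby
# Hypothetical suffix table. Order matters: "PITest.java" must be tried before
# "ITest.java" because the former also matches the latter.
SUFFIX_CATEGORIES = [
  ['PITest.java',   :parallel_integration],
  ['ITest.java',    :integration],
  ['UnitTest.java', :unit]
].freeze

# Map a test file name to the category implied by its suffix.
def test_category(file_name)
  match = SUFFIX_CATEGORIES.find { |suffix, _category| file_name.end_with?(suffix) }
  match ? match.last : :unclassified
end

%w[SessionUnitTest.java SchemaITest.java BulkLoadPITest.java Helper.java].each do |name|
  puts "#{name} -> #{test_category(name)}"
end
```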

Testing:

Tip: wildcards on the CLI are not a naming scheme.

Testing:

Group tests into logical units (“suites”)

Testing - Suites:

Benefits of Suites:
• share test data
• share Cassandra instance(s)
• build profiles

<profile>
  <id>short</id>
  <properties>
    <env>default</env>
  </properties>
  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-surefire-plugin</artifactId>
        <version>2.16</version>
        <configuration>
          <groups>unit,short</groups>
          <useFile>false</useFile>
          <systemPropertyVariables>
            <cassandra.version>${cassandra.version}</cassandra.version>
            <ipprefix>${ipprefix}</ipprefix>
          </systemPropertyVariables>
        </configuration>
      </plugin>
    </plugins>
  </build>
</profile>

Testing - Suites:

Using annotations for suites in code

Testing - Suites:

Interesting test plumbing
• [Before|After]Suite
• [Before|After]Group
• Listeners

Testing:

Use Mocks where possible

Testing:

Unit Integration Testing

Testing:

Verify Assumptions: test failure scenarios explicitly

Testing - Integration:

Runtime Integrations:
• local
• in-process
• forked-process

Testing - Integration - Runtime:

EmbeddedCassandra

https://github.com/jsevellec/cassandra-unit/

Testing - Integration - Runtime:

ProcessBuilder to fork Cassandra(s)

Testing - Integration - Runtime:

CCMBridge: delegate to CCM

https://github.com/datastax/java-driver/blob/2.1/driver-core/src/test/java/com/datastax/driver/core/CCMBridge.java

Testing - Integration - Runtime:

Vagrant: delegate to vagrant cli

Testing - Integration:

Best Practice: Jenkins should be able to manage your cluster

Testing - Integration - Best Practices:

Vagrant vs. CCMBridge?

• choice of style, really
• developer integration with CM
• what else is in the architecture?

Testing:

Load Testing Goals
• reproducible metrics
• catch regressions
• test to breakage point
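
As one way to picture "catch regressions", the sketch below compares the 99th-percentile latency of a load-test run against a stored baseline. Nothing in it comes from the talk: the nearest-rank percentile, the 10% tolerance, the helper names and the sample numbers are all assumptions chosen for illustration.

```ruby
# Nearest-rank percentile of a list of latency samples (milliseconds).
# `pct` is an integer percentage such as 50 or 99.
def percentile(samples, pct)
  raise ArgumentError, 'no samples' if samples.empty?
  sorted = samples.sort
  rank = (pct * sorted.length / 100.0).ceil
  sorted[[rank, 1].max - 1]
end

# Hypothetical regression rule: flag the run when its p99 exceeds the
# baseline p99 by more than `tolerance` (10% by default).
def regression?(baseline_p99_ms, samples, tolerance: 0.10)
  percentile(samples, 99) > baseline_p99_ms * (1 + tolerance)
end

baseline_p99_ms = 20.0
# Made-up latencies for illustration only.
current_run = [4.1, 5.0, 4.8, 6.2, 5.5, 4.9, 30.5, 5.1, 4.7, 5.3]

puts "p99 = #{percentile(current_run, 99)} ms"
puts "regression: #{regression?(baseline_p99_ms, current_run)}"
```

A check like this only means something if the run itself is reproducible, which is why that goal comes first in the list.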

Testing - Load Testing:

Stress.java (lots of changes recently)

https://www.datastax.com/documentation/cassandra/2.1/cassandra/tools/toolsCStress_t.html
http://www.datastax.com/dev/blog/improved-cassandra-2-1-stress-tool-benchmark-any-schema

Testing - Load Testing:

Workload recording and playback coming soon

one day

https://issues.apache.org/jira/browse/CASSANDRA-8929

Testing:

Primary testing goal: Don’t let cluster behavior surprise you.

Summary:
• Go slowly with bite sized chunks
• Segment your tests and use build profiles
• Monitor and Instrument
• Use reference implementation drivers
• Control your environments
• Verify any assumptions about failures

Thanks.

Nate McCall @zznate

Co-Founder & Sr. Technical Consultant
www.thelastpickle.com

#CassandraDays