Time Series Data with Apache Cassandra (ApacheCon EU 2014)


ApacheCon Europe, November 18, 2014

Eric Evans
eevans@opennms.org

@jericevans

Network Management System

OpenNMS: What It Is

● Network Management System
  ○ Discovery and Provisioning
  ○ Service monitoring
  ○ Data collection
  ○ Event management, notifications

● Java, open source, GPLv3
● Since 1999

Time series: RRDTool

● Round Robin Database
● First released 1999
● Time series storage
● File-based, constant-size, self-maintaining
● Automatic, incremental aggregation

… and oh yeah, graphing

Consider

● 5+ IOPS per update (read-modify-write)!
● 100,000s of metrics: 1,000s of IOPS
● 1,000,000s of metrics: 10,000s of IOPS
● One 15,000 RPM SAS drive: ~175-200 IOPS
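A back-of-the-envelope check on these orders of magnitude. The 300-second collection interval is an assumption (the slide does not state one), the function names are hypothetical, and the drive count ignores caching and RAID layout; 5 IOPS per update is the slide's lower bound.

```python
import math

def required_iops(metrics, interval_s=300, iops_per_update=5):
    """Sustained IOPS if every metric is updated once per interval."""
    return metrics * iops_per_update / interval_s

def drives_needed(iops, iops_per_drive=175):
    """Drives needed at the low end of the 175-200 IOPS range."""
    return math.ceil(iops / iops_per_drive)

for metrics in (100_000, 1_000_000):
    iops = required_iops(metrics)
    print(f"{metrics:>9,} metrics: ~{iops:,.0f} IOPS, "
          f"~{drives_needed(iops)} drives")
```

Under these assumptions, 100,000 metrics already need roughly ten spindles' worth of random I/O just to keep up with updates.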

We collect and write a great deal; we read (graph) relatively little.

So why are we aggregating everything?

● Not everything is a graph
● Inflexible
● Incremental backups impractical
● Availability subject to filesystem access

Metrics typically appear in groups that are accessed together.

Optimizing storage for grouped access is a great idea!

What OpenNMS needs:

● High throughput
● High availability
● Late aggregation
● Grouped storage/retrieval

Cassandra

● Distributed database
● Highly available
● High throughput
● Tunable consistency

SSTables

Writes

[Diagram: a write is appended to the Commitlog and applied to the Memtable (in memory); the Memtable is later flushed to an SSTable]

Write Properties

● Optimized for write throughput
● Sorted on disk
● Perfect for time series!
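A toy model of the write path, only to illustrate why it favors write throughput and yields sorted data. This is a sketch, not Cassandra's implementation: `ToyStore`, its method names, and the tiny flush threshold are all hypothetical, and clearing the whole log on flush is a simplification.

```python
class ToyStore:
    """Toy log-structured store: commit log, memtable, sorted immutable tables."""

    def __init__(self, flush_threshold=3):
        self.commitlog = []   # append-only record of writes (durability)
        self.memtable = {}    # in-memory buffer of recent writes
        self.sstables = []    # flushed tables, each a list sorted by key
        self.flush_threshold = flush_threshold

    def write(self, key, value):
        # A write is an append plus an in-memory update: no read-modify-write.
        self.commitlog.append((key, value))
        self.memtable[key] = value
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # The memtable is written out once, sorted by key, and never modified.
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable = {}
        self.commitlog.clear()  # simplification: flushed data no longer needs the log

store = ToyStore()
for t, v in [(3, 0.7), (1, 0.5), (2, 0.6), (4, 0.8)]:
    store.write(("cpu", t), v)

print(store.sstables)  # first three samples, sorted by (name, time)
print(store.memtable)  # the fourth sample, still in memory
```

Samples arrive out of order here but land on "disk" sorted by time, which is what makes later range reads over a time window sequential.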

Partitioning

[Diagram: range A to Z]

Key: Apple

Placement

Key: Apple

Replication

Key: Apple
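The three steps (partitioning, placement, replication) can be sketched with a toy hash ring. This is a minimal illustration under assumptions, not Cassandra's partitioner or replication strategy: the `Ring` class, the node names, the MD5-based `token` function, and the "next nodes clockwise" replica rule are all chosen just for the sketch.

```python
import bisect
import hashlib

def token(value, ring_size=2**32):
    """Hash a string to a position on the ring (MD5 chosen only for the sketch)."""
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % ring_size

class Ring:
    def __init__(self, nodes):
        # Partitioning: each node owns the range that ends at its token.
        self.entries = sorted((token(n), n) for n in nodes)
        self.tokens = [t for t, _ in self.entries]

    def replicas(self, key, n=3):
        """Placement: the owner of the key's token.
        Replication: plus the next n-1 nodes clockwise."""
        start = bisect.bisect_left(self.tokens, token(key))
        count = min(n, len(self.entries))
        return [self.entries[(start + i) % len(self.entries)][1]
                for i in range(count)]

ring = Ring(["node1", "node2", "node3", "node4", "node5"])
print(ring.replicas("Apple", n=3))
```

Any node can compute this mapping from the key alone, which is part of what lets every node act as a coordinator.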

CAP Theorem

Consistency

Availability

Partition tolerance

Consistency

R+W > N?
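The R+W > N rule can be checked mechanically: with N replicas, a read that contacts R of them and a write acknowledged by W of them must share at least one replica when R + W > N. A small sketch; the function names are hypothetical, and ONE, QUORUM, and ALL refer to Cassandra's consistency levels.

```python
def overlaps(r, w, n):
    """True when every read replica set must intersect every write replica set."""
    return r + w > n

def quorum(n):
    """Smallest majority of n replicas."""
    return n // 2 + 1

n = 3
for name, r, w in [("ONE/ONE", 1, 1),
                   ("QUORUM/QUORUM", quorum(n), quorum(n)),
                   ("ONE/ALL", 1, n)]:
    print(f"N={n} {name}: R={r} W={w} -> overlap guaranteed: {overlaps(r, w, n)}")
```

Reading and writing at ONE trades that guarantee for lower latency and higher availability, which is the "tunable" part of tunable consistency.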

Distribution Properties

● Symmetrical
● Linearly scalable
● Redundant
● Highly available

Data Model

[Diagram: resource, with columns T1, T2, T3]

Data Model

CREATE TABLE samples (
    T timestamp,
    M text,
    V double,
    resource text,
    PRIMARY KEY(resource, T, M)
);

Data model

[Diagram: partition "resource" holding cells (T1, M1) = V1, (T1, M2) = V2, (T1, M3) = V3]

Data model

SELECT * FROM samples
WHERE resource = 'resource'
AND T = 'T1';

Data model

resource:
T1 M1 V1
T1 M2 V2
T1 M3 V3

Data model

SELECT * FROM samples
WHERE resource = 'resource'
AND T >= 'T1' AND T <= 'T3';
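What a range query over one resource returns, and how a client might regroup it, can be sketched in memory. This is an illustration only: integer timestamps stand in for real ones, the sample values are made up, and `select_range` and `group_by_time` are hypothetical helpers, not a Cassandra driver API.

```python
from itertools import groupby

# Rows as (resource, T, M, V), kept in the table's clustering order (T, then M).
rows = sorted([
    ("resource", 1, "M1", 10.0), ("resource", 1, "M2", 20.0),
    ("resource", 2, "M1", 11.0), ("resource", 2, "M2", 21.0),
    ("resource", 3, "M1", 12.0), ("resource", 3, "M2", 22.0),
    ("resource", 4, "M1", 13.0),
])

def select_range(rows, resource, start, end):
    """Emulates: WHERE resource = ? AND T >= start AND T <= end."""
    return [r for r in rows if r[0] == resource and start <= r[1] <= end]

def group_by_time(result):
    """Collapse consecutive rows sharing a timestamp into {T: {M: V}}."""
    return {t: {m: v for _, _, m, v in grp}
            for t, grp in groupby(result, key=lambda r: r[1])}

print(group_by_time(select_range(rows, "resource", 1, 3)))
```

Because all metrics of a resource share a partition and are ordered by time, one query returns every metric in the group for the requested window.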

Newts

● Standalone time series data-store
  ○ REST server
  ○ Java API
● Raw sample storage and retrieval
● Flexible aggregations (computed at read)
  ○ Rate (counter types)
  ○ Pluggable aggregation functions
  ○ Arbitrary calculations
● Search-enabled
● Fast; runs at Cassandra speed
● Apache licensed
● GitHub (http://github.com/OpenNMS/newts)
● http://newts.io
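To make "aggregation computed at read" concrete, here is a minimal sketch of deriving a rate from raw counter samples at query time. It is not the Newts API or its algorithm: the `rates` function, the sample values, and the choice to skip counter resets are assumptions made for illustration.

```python
def rates(samples):
    """Per-second rate between consecutive (timestamp_s, counter_value) samples."""
    out = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        if t1 <= t0 or v1 < v0:
            continue  # skip out-of-order samples and counter resets
        out.append((t1, (v1 - v0) / (t1 - t0)))
    return out

# Raw counter readings stored as-is; the last one follows a counter reset.
raw = [(0, 1000), (300, 4000), (600, 10000), (900, 200)]
print(rates(raw))  # [(300, 10.0), (600, 20.0)]
```

Since only raw samples are stored, a different calculation can be applied to the same data later without rewriting anything.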
