Chronix as Long-Term Storage for Prometheus

Post on 16-Apr-2017


Florian Lautenschlager, Moritz Kammerer

@flolaut, @phxql

[Figure: Prometheus monitoring several cloud native applications; time axis from NOW back 14 days]

Real-time monitoring and alerting for cloud native apps to detect anomalies close to their occurrence and to initiate measures.

Beyond real-time monitoring of cloud native apps?

Nothing more to do?

[Figure: Prometheus monitoring several cloud native applications, extended by Chronix; time axis from NOW back to THEN]

Real-time monitoring and alerting for cloud native apps to detect anomalies close to their occurrence and to initiate measures.

Lossless long term storage to store data forever, allowing analyses beyond real-time monitoring!

Agenda

■ Some words about Chronix, its Architecture, its Features, and its Performance.

■ How we built the integration with Prometheus.

■ Showcase: Prometheus, Chronix Ingester, Chronix, and Grafana

Chronix is more than just a simple time series database. It’s a time series processing tool stack for all purposes.

Time Series Database: What’s that?

■ Definition 1: “A sample s is a tuple of {timestamp, value}, where the value could be any kind of object.”

■ Definition 2: “A time series T is an arbitrary list of chronologically ordered samples of one value type.”

■ Definition 3: “A chunk C is a chronologically ordered part of a time series.”

■ Definition 4: “A time series database TSDB is a specialized database for storing and retrieving time series in an efficient and optimized way.”
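Definitions 1 to 3 can be made concrete with a small Java sketch. It is only an illustration for numeric values: the class and method names are made up for this example and are not part of the Chronix API, and splitting by a fixed number of points is an assumed chunking rule.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of Definitions 1-3 for numeric values; all names are illustrative. */
final class TimeSeriesModel {

    /** Definition 1: a sample is a {timestamp, value} tuple. */
    static final class Sample {
        final long timestamp; // assumed unit: milliseconds since the epoch
        final double value;

        Sample(long timestamp, double value) {
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    /**
     * Definition 3: split a chronologically ordered time series (Definition 2)
     * into chunks of at most maxPoints samples each.
     */
    static List<List<Sample>> chunk(List<Sample> series, int maxPoints) {
        if (maxPoints <= 0) {
            throw new IllegalArgumentException("maxPoints must be positive");
        }
        List<List<Sample>> chunks = new ArrayList<>();
        for (int from = 0; from < series.size(); from += maxPoints) {
            int to = Math.min(from + maxPoints, series.size());
            chunks.add(new ArrayList<>(series.subList(from, to)));
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<Sample> series = new ArrayList<>();
        for (int i = 0; i < 5; i++) {
            series.add(new Sample(1000L * i, i * 0.5));
        }
        // Five samples with at most two per chunk give chunks of 2, 2 and 1 samples.
        System.out.println(chunk(series, 2).size() + " chunks"); // prints "3 chunks"
    }
}
```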

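[Figure: a sample s = {t,v}; a time series T = {s1,s2}; a time series T1 split into the chunks C1,1 and C1,2; a TSDB holding chunks of several time series]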

Chronix’ architecture enables both efficient storage of time series and millisecond range queries.

(1) Semantic Transformation
(2) Attributes and Chunks
(3) Basic Compression
(4) Multi-Dimensional Storage

[Figure: pipeline from a record (data:<chunk>, attributes) to a record with a compressed chunk (data:compressed<chunk>, attributes) to the record storage; one element is marked "Optional"]

68 billion points = 1 million chunks * 68,000 points, ~96% compression

The key data type of Chronix is called a record. It stores a compressed time series chunk and its attributes.

record{
  data:compressed{<chunk>}

  //technical fields
  id: 3dce1de0−...−93fb2e806d19
  version: 1501692859622883300
  start: 1427457011238
  end: 1427471159292

  //optional attributes
  host: prodI5
  process: scheduler
  group: jmx
  metric: heapMemory.Usage.Used
  max: 896.571
}

Data:compressed{<chunk of time series data>}

■ Time Series: timestamp, numeric value

■ Traces: calls, exceptions, …

■ Logs: access, method runtimes

■ Complex data: models, test coverage, anything else…

Optional attributes

■ Arbitrary attributes for the time series

■ Attributes are indexed

■ Make the chunk searchable

■ Can contain pre-calculated values
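The record idea (a compressed chunk plus technical fields and optional attributes) can be sketched in Java as follows. This is a minimal illustration, not the Chronix implementation: the class name `RecordSketch` is hypothetical, and GZIP over a simple binary layout stands in for whatever serialization and compression Chronix actually uses, which the slides do not specify.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical record: compressed chunk, technical fields, optional attributes. */
final class RecordSketch {
    final byte[] data;                    // compressed chunk (GZIP is an assumption)
    final long start;                     // timestamp of the first sample
    final long end;                       // timestamp of the last sample
    final Map<String, Object> attributes; // optional attributes that make the chunk searchable

    private RecordSketch(byte[] data, long start, long end, Map<String, Object> attributes) {
        this.data = data;
        this.start = start;
        this.end = end;
        this.attributes = attributes;
    }

    /** Serializes the samples, compresses them, and derives start and end. */
    static RecordSketch of(long[] timestamps, double[] values, Map<String, Object> attributes) {
        if (timestamps.length == 0 || timestamps.length != values.length) {
            throw new IllegalArgumentException("need at least one sample and equally many values");
        }
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(new GZIPOutputStream(bytes))) {
            out.writeInt(timestamps.length);
            for (int i = 0; i < timestamps.length; i++) {
                out.writeLong(timestamps[i]);
                out.writeDouble(values[i]);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new RecordSketch(bytes.toByteArray(), timestamps[0],
                timestamps[timestamps.length - 1], new LinkedHashMap<>(attributes));
    }

    /** Decompresses the chunk and returns its values in their original order. */
    double[] values() {
        try (DataInputStream in = new DataInputStream(
                new GZIPInputStream(new ByteArrayInputStream(data)))) {
            double[] result = new double[in.readInt()];
            for (int i = 0; i < result.length; i++) {
                in.readLong(); // skip the timestamp
                result[i] = in.readDouble();
            }
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Map<String, Object> attributes = new LinkedHashMap<>();
        attributes.put("host", "prodI5");
        attributes.put("metric", "heapMemory.Usage.Used");
        attributes.put("max", 896.571); // a pre-calculated value stored as an attribute
        RecordSketch record = RecordSketch.of(
                new long[] {1427457011238L, 1427471159292L},
                new double[] {512.0, 896.571}, attributes);
        System.out.println(record.attributes.get("host") + " " + record.start + ".." + record.end);
    }
}
```

Storing a pre-calculated value such as `max` as an attribute means a query can read it without decompressing the chunk.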

Chronix provides specialized aggregations, transformations, and analyses for time series that are commonly used.

Aggregations

■ Min / Max / Average / Sum / Count

■ Percentile

■ Standard Deviation

■ First / Last

■ Range

Analyses

■ Trend Analysis

Using a linear regression model

■ Outlier Analysis

Using the IQR

■ Frequency Analysis

Check occurrence within a time range

■ Fast Dynamic Time Warping

Time series similarity search

■ Symbolic Aggregate Approximation

Similarity and pattern search

Transformations

■ Bottom/Top n-values

■ Moving average

■ Divide / Scale

■ Downsampling

Many more
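Two of the listed analyses, the trend analysis with a linear regression model and the outlier analysis with the IQR, can be sketched in a few lines of Java. This is an independent illustration, not the Chronix source: treating a positive slope as a trend, checking only the upper fence (Q3 + 1.5 * IQR), and the percentile interpolation are assumptions made for this example.

```java
import java.util.Arrays;

/** Illustrative re-implementations of two analyses; not the Chronix source code. */
final class AnalysisSketch {

    /** Least-squares slope of the values over their timestamps. */
    static double slope(long[] timestamps, double[] values) {
        int n = timestamps.length;
        if (n < 2 || n != values.length) {
            throw new IllegalArgumentException("need at least two samples");
        }
        double meanT = 0;
        double meanV = 0;
        for (int i = 0; i < n; i++) {
            meanT += timestamps[i];
            meanV += values[i];
        }
        meanT /= n;
        meanV /= n;
        double numerator = 0;
        double denominator = 0;
        for (int i = 0; i < n; i++) {
            double dt = timestamps[i] - meanT;
            numerator += dt * (values[i] - meanV);
            denominator += dt * dt;
        }
        // All timestamps identical: no slope can be fitted, report none.
        return denominator == 0 ? 0 : numerator / denominator;
    }

    /** Assumption: a trend is reported when the fitted slope is positive. */
    static boolean hasUpwardTrend(long[] timestamps, double[] values) {
        return slope(timestamps, values) > 0;
    }

    /** Percentile by linear interpolation between the two closest ranks. */
    static double percentile(double[] sorted, double p) {
        double position = p * (sorted.length - 1);
        int lower = (int) Math.floor(position);
        int upper = (int) Math.ceil(position);
        return sorted[lower] + (position - lower) * (sorted[upper] - sorted[lower]);
    }

    /** True if the largest value lies above Q3 + 1.5 * IQR. */
    static boolean hasOutlier(double[] values) {
        if (values.length == 0) {
            return false;
        }
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        double q1 = percentile(sorted, 0.25);
        double q3 = percentile(sorted, 0.75);
        double upperFence = q3 + 1.5 * (q3 - q1);
        return sorted[sorted.length - 1] > upperFence;
    }

    public static void main(String[] args) {
        long[] t = {0, 1, 2, 3};
        double[] v = {1, 3, 5, 7};
        System.out.println("slope = " + slope(t, v)); // prints "slope = 2.0"
        System.out.println("outlier = " + hasOutlier(new double[] {1, 2, 3, 4, 100})); // prints "outlier = true"
    }
}
```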

Only scalar values? One size fits all? No! What about logs, traces, and others? No problem – Just do it yourself!

■ Chronix Time Series
  ■ Time series framework that is used by Chronix.
  ■ Time series types:
    ■ Numeric: doubles (the default time series type)
    ■ More to come.

public interface TimeSeriesConverter<T> {

    /**
     * Shall create an object of type T from the given binary time series.
     */
    T from(BinaryTimeSeries binaryTimeSeriesChunk, long queryStart, long queryEnd);

    /**
     * Shall do the conversion of the custom time series T into the binary time series that is stored.
     */
    BinaryTimeSeries to(T timeSeriesChunk);
}

That's the easiest way to play with Chronix: a single instance of Chronix on a single node.

[Figure: a single Chronix 0.4 node on your computer, running on Java 8 (JRE) with Solr 6.2.1 and Lucene, reachable via HTTP on port 8983. Solr plugins: Chronix-Query-Handler, Chronix-Ingestion-Handler, Chronix-Compaction-Handler, Chronix-Retention. Also shown: the Chronix Client (Java, Go) and OpenTSDB, Prometheus, KairosDB, InfluxDB, Graphite]

Code-Slide: How to set up Chronix, ask for time series data, and call some server-side aggregations in Java.

■ Create a connection to Solr and set up Chronix

■ Define a range query and stream its results

■ Call some aggregations

// Create a connection to Solr and set up Chronix.
solr = new HttpSolrClient("http://localhost:8983/solr/chronix/");
// groupBy and reduce: group chunks on a combination of attributes
// and reduce them to a time series.
chronix = new ChronixClient(new MetricTimeSeriesConverter<>(),
        new ChronixSolrStorage(200, groupBy, reduce));

// Get all time series whose metric contains Load.
query = new SolrQuery("metric:*Load*");
chronix.stream(solr, query);

// Call some server-side aggregations.
query.addFilterQuery("function=max,min,count,sdiff");
stream = chronix.stream(solr, query);

Signed Difference (sdiff): First=20, Last=-100: -80

Compared to other time series databases, Chronix' results for our use case are outstanding.

■ We have evaluated Chronix against:
  ■ InfluxDB, OpenTSDB, and KairosDB
  ■ All databases are configured as single node.

■ Storage demand for 108 GB of raw CSV time series data:
  ■ Chronix (8.7 GB) saves 20% – 84% of the space the other time series databases need.

■ Query times on imported data:
  ■ 73% – 92% faster on data retrieval.
  ■ 80% – 97% faster on a mix of analyses.

■ Memory footprint (after start, max during import, max during query mix):
  ■ Chronix takes 1.6 times less memory than the best alternative.

The hard facts. For more details, we suggest reading our research paper about Chronix.

Florian Lautenschlager, Michael Philippsen, Andreas Kumlehn, Josef Adersberger

Chronix: Long Term Storage and Retrieval Technology for Anomaly Detection in Operational Data

FAST 2017 (submitted)

Let's dig into the Chronix Ingester's internals.

Image Credit: http://www.taringa.net/posts/ciencia-educacion/12656540/La-Filosofia-del-Dr-House-2.html

Big Picture. It’s a simple and scalable architecture.

Prometheus (standard Prometheus installation)
• Collects metrics from various services.
• Writes them to its default storage.
• Writes them using the standard remote write interface to the Chronix Ingester.

Chronix Ingester
• Collects samples in batches and writes them later to Chronix with an ideal batch size.
• Writes checkpoints to disk to avoid loss of data.
• Scales easily.

Chronix Server
• Lossless long term storage.
• Data distribution (Apache Solr).
• Rich set of analysis functions for data analytics beyond real-time monitoring.

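Everything runs on a single machine. Small. Simple. Beautiful.

[Figure: a single host running Prometheus, the Chronix Ingester (in-memory), and the Chronix Server. Samples (S) flow from Prometheus to the Ingester, batches (B) from the Ingester to the Chronix Server.]

S = Sample: {t,v}
B = Batch: [{t,v},{t,v},{t,v}]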

Once per Prometheus on a single host.

[Figure: a single host with Prometheus, a Chronix Ingester (in-memory), and the Chronix Server, plus a further Prometheus with its own Chronix Ingester (in-memory)]

Chronix Ingester Singleton ;-)

[Figure: a single host with Prometheus, one Chronix Ingester (in-memory), and the Chronix Server; a further Prometheus is shown with the same Ingester, and batches (B) are shown next to the Chronix Server]

Chronix Ingester Cloud behind a proxy to serve multiple Prometheus servers.

[Figure: several Prometheus servers on separate single hosts, an NGINX proxy, Chronix Ingesters (in-memory), and a Chronix Server]

Cloud Mode: Multiple Prometheus Servers, One Chronix Ingester per Host, A Chronix Server Cloud

[Figure: several Prometheus servers, NGINX, Chronix Ingesters (in-memory), and a Chronix Server Cloud with a master]

Architectural Key Factor: The Chronix Ingester

■ Small Go program
  ■ Binary size: 8.5 MB
  ■ Lines of code: ~720 LoC
  ■ Scales easily: copy, execute

■ Handles writes from Prometheus
  ■ Just a small configuration:
    remote_write:
      url: http://<host>:<port>/ingest

■ Batches samples in memory
  ■ Prometheus sends single samples.
  ■ Chronix needs large chunks (n single samples) to work well.
  ■ Max batch age: 5M, 12H, ..

■ Crash and restart resilience
  ■ In-memory is dangerous. The Ingester holds some amount of transient state.
  ■ Regularly writes checkpoints of the entire in-memory state to disk.
  ■ The latest checkpoint is loaded on restart.
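The batching idea (collect single samples per time series in memory and hand them over once a batch has reached the max batch age) can be sketched as follows. The real Ingester is a Go program; this Java sketch only illustrates the principle. The class `SampleBatcher`, its method names, and the explicit `nowMillis` parameter are assumptions for this example, and checkpointing to disk is left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory batcher: one open batch per time series. */
final class SampleBatcher {

    static final class Sample {
        final long timestamp;
        final double value;

        Sample(long timestamp, double value) {
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    private static final class Batch {
        final long createdAtMillis;
        final List<Sample> samples = new ArrayList<>();

        Batch(long createdAtMillis) {
            this.createdAtMillis = createdAtMillis;
        }
    }

    private final long maxBatchAgeMillis;
    private final Map<String, Batch> openBatches = new HashMap<>();

    SampleBatcher(long maxBatchAgeMillis) {
        this.maxBatchAgeMillis = maxBatchAgeMillis;
    }

    /** Appends a single sample to the open batch of its series. */
    void add(String seriesKey, Sample sample, long nowMillis) {
        Batch batch = openBatches.get(seriesKey);
        if (batch == null) {
            batch = new Batch(nowMillis); // the batch age starts with its first sample
            openBatches.put(seriesKey, batch);
        }
        batch.samples.add(sample);
    }

    /** Removes and returns all batches that have reached the max batch age. */
    Map<String, List<Sample>> flushExpired(long nowMillis) {
        Map<String, List<Sample>> ready = new HashMap<>();
        Iterator<Map.Entry<String, Batch>> it = openBatches.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Batch> entry = it.next();
            if (nowMillis - entry.getValue().createdAtMillis >= maxBatchAgeMillis) {
                ready.put(entry.getKey(), entry.getValue().samples);
                it.remove();
            }
        }
        return ready;
    }

    public static void main(String[] args) {
        SampleBatcher batcher = new SampleBatcher(5 * 60 * 1000L); // max batch age: 5 minutes
        batcher.add("heap{host=prodI5}", new Sample(0L, 1.0), 0L);
        batcher.add("heap{host=prodI5}", new Sample(1000L, 2.0), 1000L);
        System.out.println(batcher.flushExpired(60_000L).size());  // prints 0, batch is too young
        System.out.println(batcher.flushExpired(300_000L).size()); // prints 1, batch is written out
    }
}
```

Passing the current time in explicitly keeps the sketch deterministic; a long-running process would call `flushExpired` on a timer and write the returned batches to the Chronix Server.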

Chronix loves Chunks. Hence the Ingester batches samples.

The data models for Prometheus and Chronix are similar.

■ Prometheus
  ■ Uses so-called labels (key-value pairs) to store dimensional values.
  ■ Labels are added dynamically.
  ■ Stores samples (pairs of timestamp and scalar value).

■ Chronix
  ■ Uses attributes (key-value pairs) to store dimensional values.
  ■ Schema, schemaless, dynamic fields, etc.
  ■ Stores samples of timestamp and any value type: scalar, trace, string, etc.

An example Chronix schema to define the available fields.

<?xml version="1.0" encoding="UTF-8" ?>
<schema name="Chronix" version="1.5">
  <types>
    <fieldType name="int" class="solr.TrieIntField" precisionStep="0" positionIncrementGap="0"/>
    <fieldType name="long" class="solr.TrieLongField" precisionStep="0" positionIncrementGap="0"/>
    <fieldType name="double" class="solr.TrieDoubleField" precisionStep="0" positionIncrementGap="0"/>
    <fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
    <fieldType name="binary" class="solr.BinaryField"/>
  </types>
  <fields>
    <!-- The required fields -->
    <field name="id" type="string" indexed="true" stored="true" required="true"/>
    <field name="_version_" type="long" indexed="true" stored="true"/>
    <field name="start" type="long" indexed="true" stored="true" required="true"/>
    <field name="end" type="long" indexed="true" stored="true" required="true"/>
    <field name="data" type="binary" indexed="true" stored="true" required="false"/>
    <field name="metric" type="string" indexed="true" stored="true" required="true"/>
    <!-- Dynamic field for tags -->
    <dynamicField name="*_s" type="string" indexed="true" stored="true"/>
  </fields>
  <uniqueKey>id</uniqueKey>
  <solrQueryParser defaultOperator="OR"/>
</schema>

Definition of types

Available Fields

Prometheus labels are strings. The Chronix Ingester creates them in the Chronix Server dynamically using the dynamicField *_s.

Prometheus_Label -> Chronix_Label

host -> host_s
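The label mapping can be sketched in Java as follows. The `_s` suffix follows the dynamicField shown in the schema. Mapping the reserved Prometheus label `__name__` (the metric name) to the required `metric` field is an assumption made for this sketch, as is the class name `LabelMapper`; the actual Ingester is written in Go.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical mapping of Prometheus labels to Chronix (Solr) field names. */
final class LabelMapper {

    static Map<String, String> toChronixFields(Map<String, String> labels) {
        Map<String, String> fields = new TreeMap<>(); // sorted for stable output
        for (Map.Entry<String, String> label : labels.entrySet()) {
            if ("__name__".equals(label.getKey())) {
                // Assumption: the metric name fills the required "metric" field.
                fields.put("metric", label.getValue());
            } else {
                // Every other label becomes a dynamic string field: host -> host_s
                fields.put(label.getKey() + "_s", label.getValue());
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        Map<String, String> labels = new LinkedHashMap<>();
        labels.put("__name__", "heap_used");
        labels.put("host", "prodI5");
        System.out.println(toChronixFields(labels)); // prints {host_s=prodI5, metric=heap_used}
    }
}
```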

Showcase: Prometheus, Chronix Ingester, Chronix and Grafana

[Figure: Prometheus sends samples (S) to the Chronix Ingester (in-memory), which sends batches (B) to the Chronix Server; Grafana is part of the setup]

Disk usage: 11 Days of Data

112,815,835 Samples

Prometheus: ~ 786 MB (whole data directory)

Chronix: ~ 265 MB (without compaction)

A few words about performance in our showcase.

Compaction Effects.

Compaction  Points per Chunk  Amount of Records  Disk Usage in MB  Compaction Time in Seconds
no                        -1             610355               265                           0
yes                      100            1422369               357                         134
yes                      500             284815               187                          75
yes                     1000             142573               160                          93
yes                     5000              28850               131                          69
yes                    10000              14797               126                          61
yes                    25000               6408               123                          61
yes                   100000               2051               121                          60
yes                   500000                920               119                          63

Contains about 112 points per chunk without compaction!
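The effect of the "Points per Chunk" setting can be illustrated with a small Java sketch that merges the chunks of one time series and re-splits them at a target size. This is a simplified illustration, not Chronix's compaction code: the class `CompactionSketch` is hypothetical, only timestamps are handled, and the input chunks are assumed to be in chronological order already.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical compaction: merge the chunks of one series, then re-split them. */
final class CompactionSketch {

    static List<long[]> compact(List<long[]> chunks, int pointsPerChunk) {
        if (pointsPerChunk <= 0) {
            throw new IllegalArgumentException("pointsPerChunk must be positive");
        }
        // Concatenate all points of the series in their given order.
        int total = 0;
        for (long[] chunk : chunks) {
            total += chunk.length;
        }
        long[] all = new long[total];
        int offset = 0;
        for (long[] chunk : chunks) {
            System.arraycopy(chunk, 0, all, offset, chunk.length);
            offset += chunk.length;
        }
        // Cut the merged series into chunks of at most pointsPerChunk points.
        List<long[]> compacted = new ArrayList<>();
        for (int from = 0; from < total; from += pointsPerChunk) {
            int to = Math.min(from + pointsPerChunk, total);
            compacted.add(Arrays.copyOfRange(all, from, to));
        }
        return compacted;
    }

    public static void main(String[] args) {
        List<long[]> chunks = new ArrayList<>();
        for (int c = 0; c < 5; c++) {
            chunks.add(new long[] {3L * c, 3L * c + 1, 3L * c + 2}); // 5 chunks with 3 points each
        }
        System.out.println(compact(chunks, 10).size()); // prints 2: fewer, larger chunks
        System.out.println(compact(chunks, 2).size());  // prints 8: more, smaller chunks
    }
}
```

A target below the existing chunk size produces more records than before, which matches the direction of the 100-points row in the table; larger targets produce fewer records.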

CPU usage: 4 Cores available (= 400 % Max)

Memory consumption (max. 8 GB)

[Chart: memory consumption of the Ingester and of Prometheus]

Prometheus Configuration

Chronix Default Web-UI

Using the data source plugins for Chronix and Prometheus.

Ingester Health: Everything Green!

Short Term Data in Prometheus. Long Term Data in Chronix.

See the difference?

Everything is open source and free to everyone. The code is the truth.

Chronix Website: www.chronix.io
Chronix Github: https://github.com/ChronixDB
- Ingester: https://github.com/ChronixDB/chronix.ingester

Questions?
- Twitter: @ChronixDB, @flolaut, @phxql
- Slack: https://qaware.slack.com/messages/chronix/

Now it’s your turn.
