Chronix as long term storage for Prometheus, by Florian Lautenschlager and Moritz Kammerer (@flolaut, @phxql)

Chronix as Long-Term Storage for Prometheus

Apr 16, 2017


QAware GmbH
Transcript
Page 1: Chronix as Long-Term Storage for Prometheus

Chronix as long term storage for Prometheus

Florian Lautenschlager, Moritz Kammerer

@flolaut, @phxql

Page 2: Chronix as Long-Term Storage for Prometheus

[Diagram: Prometheus monitoring several cloud native applications; timeline from NOW back 14 days]

Real-time monitoring and alerting for cloud native apps to detect anomalies close to their occurrence and to initiate measures.

Page 3: Chronix as Long-Term Storage for Prometheus

Beyond real-time monitoring of cloud native apps?

Nothing more to do?

Page 4: Chronix as Long-Term Storage for Prometheus

[Diagram: Prometheus monitoring several cloud native applications, with Chronix next to it; timeline from NOW back to THEN]

Prometheus: Real-time monitoring and alerting for cloud native apps to detect anomalies close to their occurrence and to initiate measures.

Chronix: Lossless long-term storage to store data forever, allowing analyses beyond real-time monitoring!

Page 5: Chronix as Long-Term Storage for Prometheus
Page 6: Chronix as Long-Term Storage for Prometheus

Agenda

■ Some words about Chronix, its Architecture, its Features, and its Performance.

■ How we built the integration with Prometheus.

■ Showcase: Prometheus, Chronix Ingester, Chronix, and Grafana

Page 7: Chronix as Long-Term Storage for Prometheus

Chronix is more than just a simple time series database. It’s a time series processing tool stack for all purposes.

Page 8: Chronix as Long-Term Storage for Prometheus

Time Series Database: What’s that?

■ Definition 1: “A sample s is a tuple of {timestamp, value}, where the value could be any kind of object.”

■ Definition 2: “A time series T is an arbitrary list of chronologically ordered samples of one value type.”

■ Definition 3: “A chunk C is a chronologically ordered part of a time series.”

■ Definition 4: “A time series database TSDB is a specialized database for storing and retrieving time series in an efficient and optimized way.”

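[Diagram: a sample s = {t,v}; a time series T = {s1,s2}; a time series T1 split into chunks C1,1 and C1,2; a TSDB holding chunks of several time series, e.g. C2,1 and C2,2]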
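The definitions above can be pictured with a few lines of Java. This is only a minimal sketch, not code from Chronix: the class names `Sample` and `Chunker` and the fixed chunk size are assumptions made for illustration, and the value is narrowed to a double.

```java
import java.util.ArrayList;
import java.util.List;

/** A sample s = {timestamp, value}; the value is a double in this sketch. */
final class Sample {
    final long timestamp;
    final double value;

    Sample(long timestamp, double value) {
        this.timestamp = timestamp;
        this.value = value;
    }
}

/** Hypothetical helper, not part of Chronix. */
final class Chunker {
    /** Splits a chronologically ordered time series into chunks of at most chunkSize samples. */
    static List<List<Sample>> chunk(List<Sample> series, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<List<Sample>> chunks = new ArrayList<>();
        for (int from = 0; from < series.size(); from += chunkSize) {
            int to = Math.min(from + chunkSize, series.size());
            // Copy the sub-list so that each chunk is independent of the source list.
            chunks.add(new ArrayList<>(series.subList(from, to)));
        }
        return chunks;
    }
}
```

Because the input is ordered by time, every chunk is again a chronologically ordered part of the series, which is all Definition 3 asks for.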

Page 9: Chronix as Long-Term Storage for Prometheus

Chronix’ architecture enables both efficient storage of time series and millisecond range queries.

(1) Semantic Transformation
(2) Attributes and Chunks: Record { data:<chunk>, attributes }
(3) Basic Compression: Record { data:compressed<chunk>, attributes }
(4) Multi-Dimensional Storage: Record Storage

Example: 68 billion points = 1 million chunks * 68,000 points, ~96% compression

Label in the diagram: Optional

Page 10: Chronix as Long-Term Storage for Prometheus

The key data type of Chronix is called a record. It stores a compressed time series chunk and its attributes.

record{
  data: compressed{<chunk>}

  // technical fields
  id: 3dce1de0−...−93fb2e806d19
  version: 1501692859622883300
  start: 1427457011238
  end: 1427471159292

  // optional attributes
  host: prodI5
  process: scheduler
  group: jmx
  metric: heapMemory.Usage.Used
  max: 896.571
}

Data:compressed{<chunk of time series data>}

■ Time Series: timestamp, numeric value

■ Traces: calls, exceptions, …

■ Logs: access, method runtimes

■ Complex data: models, test coverage, anything else…

Optional attributes

■ Arbitrary attributes for the time series

■ Attributes are indexed

■ Make the chunk searchable

■ Can contain pre-calculated values
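To illustrate why indexed attributes make a chunk searchable, here is a small sketch of a record in Java. It is not the Chronix implementation: the class `ChunkRecord` and its methods are hypothetical, and the compressed data is treated as an opaque byte array. The point is that a time-range or attribute match can be decided from the technical fields and attributes alone, without decompressing the chunk.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a record: a compressed chunk plus searchable attributes. */
final class ChunkRecord {
    final byte[] data;                    // compressed chunk, opaque in this sketch
    final long start;                     // timestamp of the first sample in the chunk
    final long end;                       // timestamp of the last sample in the chunk
    final Map<String, Object> attributes; // e.g. host, process, pre-calculated max

    ChunkRecord(byte[] data, long start, long end, Map<String, Object> attributes) {
        this.data = data;
        this.start = start;
        this.end = end;
        this.attributes = Collections.unmodifiableMap(new HashMap<>(attributes));
    }

    /** True if the chunk's time span overlaps the query range [queryStart, queryEnd]. */
    boolean overlaps(long queryStart, long queryEnd) {
        return start <= queryEnd && end >= queryStart;
    }

    /** True if the record carries the given attribute value; the chunk stays compressed. */
    boolean hasAttribute(String key, Object value) {
        return value.equals(attributes.get(key));
    }
}
```

A pre-calculated value such as `max` is just another attribute here, so a query for the maximum could be answered from the attribute without touching the data field.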

Page 11: Chronix as Long-Term Storage for Prometheus

Chronix provides specialized aggregations, transformations, and analyses for time series that are commonly used.

Aggregations

■ Min / Max / Average / Sum / Count

■ Percentile

■ Standard Deviation

■ First / Last

■ Range

Analyses

■ Trend Analysis: using a linear regression model

■ Outlier Analysis: using the IQR

■ Frequency Analysis: check occurrence within a time range

■ Fast Dynamic Time Warping: time series similarity search

■ Symbolic Aggregate Approximation: similarity and pattern search

Transformations

■ Bottom/Top n-values

■ Moving average

■ Divide / Scale

■ Downsampling

Many more
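As an illustration of the trend analysis, the following sketch fits a least-squares line through the points and looks at the sign of its slope. It shows the general technique only; the class `Trend` and its methods are hypothetical and not the Chronix implementation, whose exact decision rule is not given on the slide.

```java
/** Hypothetical helper, not part of Chronix. */
final class Trend {
    /** Least-squares slope of values over timestamps; a positive slope means an upward trend. */
    static double slope(long[] timestamps, double[] values) {
        if (timestamps.length != values.length || timestamps.length < 2) {
            throw new IllegalArgumentException("need at least two points of equal length");
        }
        int n = timestamps.length;
        double meanT = 0;
        double meanV = 0;
        for (int i = 0; i < n; i++) {
            meanT += timestamps[i];
            meanV += values[i];
        }
        meanT /= n;
        meanV /= n;

        double covariance = 0;
        double variance = 0;
        for (int i = 0; i < n; i++) {
            double dt = timestamps[i] - meanT;
            covariance += dt * (values[i] - meanV);
            variance += dt * dt;
        }
        if (variance == 0) {
            throw new IllegalArgumentException("timestamps must not all be equal");
        }
        return covariance / variance;
    }

    /** True if the fitted line rises over time. */
    static boolean isIncreasing(long[] timestamps, double[] values) {
        return slope(timestamps, values) > 0;
    }
}
```

Running such a check on the server side means only the verdict has to travel to the client, not the raw points.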

Page 12: Chronix as Long-Term Storage for Prometheus

Only scalar values? One size fits all? No! What about logs, traces, and others? No problem – Just do it yourself!

■ Chronix Time Series

  ■ Time series framework that is used by Chronix.

  ■ Time series types:

    ■ Numeric: doubles (the default time series type)

    ■ More to come.

public interface TimeSeriesConverter<T> {

    /**
     * Shall create an object of type T from the given binary time series.
     */
    T from(BinaryTimeSeries binaryTimeSeriesChunk, long queryStart, long queryEnd);

    /**
     * Shall do the conversion of the custom time series T into the binary time series that is stored.
     */
    BinaryTimeSeries to(T timeSeriesChunk);
}
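To give a feeling for what a custom converter could look like, here is a self-contained sketch for a log-like type. All types in it are hypothetical stand-ins: `StoredChunk` takes the place of the binary time series, `ChunkConverter` only mirrors the two methods of the interface above, and `LogSeries` is an invented custom type. The real Chronix classes are not used.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical stand-in for the stored binary chunk; not the Chronix class. */
final class StoredChunk {
    final long start;
    final long end;
    final byte[] data;

    StoredChunk(long start, long end, byte[] data) {
        this.start = start;
        this.end = end;
        this.data = data;
    }
}

/** Hypothetical custom type: log text covering the time span [start, end]. */
final class LogSeries {
    final long start;
    final long end;
    final String text;

    LogSeries(long start, long end, String text) {
        this.start = start;
        this.end = end;
        this.text = text;
    }
}

/** Mirrors the two methods of the converter interface shown above. */
interface ChunkConverter<T> {
    T from(StoredChunk chunk, long queryStart, long queryEnd);

    StoredChunk to(T timeSeriesChunk);
}

final class LogSeriesConverter implements ChunkConverter<LogSeries> {
    @Override
    public LogSeries from(StoredChunk chunk, long queryStart, long queryEnd) {
        // This sketch ignores the query range and returns the whole chunk;
        // a real converter would drop entries outside [queryStart, queryEnd].
        return new LogSeries(chunk.start, chunk.end, new String(chunk.data, StandardCharsets.UTF_8));
    }

    @Override
    public StoredChunk to(LogSeries series) {
        return new StoredChunk(series.start, series.end, series.text.getBytes(StandardCharsets.UTF_8));
    }
}
```

The round trip `from(to(x))` should give back the original content, which is the minimum a converter has to guarantee for lossless storage.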

Page 13: Chronix as Long-Term Storage for Prometheus

That’s the easiest way to play with Chronix. A single instance of Chronix on a single node.

[Diagram: a single node ("Your Computer") running Java 8 (JRE), Solr 6.2.1 with Lucene, and Chronix 0.4 as Solr plugins (Chronix-Query-Handler, Chronix-Ingestion-Handler, Chronix-Compaction-Handler, Chronix-Retention), reachable over HTTP on port 8983. Further labels: Chronix Client (Java, Go); OpenTSDB, Prometheus, KairosDB, InfluxDB, Graphite.]

Page 14: Chronix as Long-Term Storage for Prometheus

Code-Slide: How to set up Chronix, ask for time series data, and call some server-side aggregations in Java.

■ Create a connection to Solr and set up Chronix

■ Define a range query and stream its results

■ Call some aggregations

// Create a connection to Solr and set up Chronix.
// groupBy and reduce: group chunks on a combination of attributes
// and reduce them to a time series.
solr = new HttpSolrClient("http://localhost:8983/solr/chronix/");
chronix = new ChronixClient(new MetricTimeSeriesConverter<>(),
                            new ChronixSolrStorage(200, groupBy, reduce));

// Get all time series whose metric contains Load.
query = new SolrQuery("metric:*Load*");
chronix.stream(solr, query);

// Call some server-side aggregations, including the signed difference (sdiff).
query.addFilterQuery("function=max,min,count,sdiff");
stream = chronix.stream(solr, query);

[Figure annotation: Signed Difference: First=20, Last=-100; -80]

Page 15: Chronix as Long-Term Storage for Prometheus

Compared to other time series databases, Chronix’ results for our use case are outstanding.

■ We have compared Chronix with:

  ■ InfluxDB, OpenTSDB, and KairosDB

  ■ All databases are configured as single node

■ Storage demand for 108 GB of raw CSV time series data:

  ■ Chronix (8.7 GB) saves 20% – 84% of the space the other time series databases need.

■ Query times on imported data:

  ■ 73% – 92% faster on data retrieval.

  ■ 80% – 97% faster on a mix of analyses.

■ Memory footprint (after start, max during import, max during query mix):

  ■ Chronix takes 1.6 times less memory than the best alternative.

Page 16: Chronix as Long-Term Storage for Prometheus

The hard facts. For more details, I suggest reading our research paper about Chronix.

Florian Lautenschlager, Michael Philippsen, Andreas Kumlehn, Josef Adersberger

Chronix: Long Term Storage and Retrieval Technology for Anomaly Detection in

Operational Data

FAST 2017 (submitted)

Page 17: Chronix as Long-Term Storage for Prometheus


Let’s dig into the Chronix Ingester’s internals.

Image Credit: http://www.taringa.net/posts/ciencia-educacion/12656540/La-Filosofia-del-Dr-House-2.html

Page 18: Chronix as Long-Term Storage for Prometheus

Big Picture. It’s a simple and scalable architecture.

Prometheus (standard Prometheus installation)

• Collects metrics from various services.

• Writes them to its default storage.

• Writes them using the standard remote write interface to the Chronix Ingester.

Chronix Ingester

• Collects samples in batches and writes them later to Chronix with an ideal batch size.

• Writes checkpoints to disk to avoid loss of data.

• Scales easily.

Chronix Server

• Lossless long-term storage.

• Data distribution (Apache Solr).

• Rich set of analysis functions for data analytics beyond real-time monitoring.

Page 19: Chronix as Long-Term Storage for Prometheus

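Everything runs on a single machine. Small. Simple. Beautiful.

[Diagram: a single host running Prometheus, the Chronix Ingester (in-memory), and the Chronix Server. Samples (S) flow from Prometheus to the Ingester; batches (B) flow from the Ingester to the Chronix Server.]

S = Sample: {t,v}

B = Batch: [{t,v},{t,v},{t,v}]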

Page 20: Chronix as Long-Term Storage for Prometheus

Once per Prometheus on a single host.

[Diagram: a single host with two Prometheus servers, each with its own Chronix Ingester (in-memory), and one Chronix Server.]

S = Sample: {t,v}

B = Batch: [{t,v},{t,v},{t,v}]

Page 21: Chronix as Long-Term Storage for Prometheus

Chronix Ingester Singleton ;-)

[Diagram: a single host with two Prometheus servers sharing one Chronix Ingester (in-memory), which sends batches (B) to the Chronix Server.]

S = Sample: {t,v}

B = Batch: [{t,v},{t,v},{t,v}]

Page 22: Chronix as Long-Term Storage for Prometheus

Chronix Ingester Cloud behind a proxy to serve multiple Prometheus servers.

[Diagram: several Prometheus servers write through an NGINX proxy to multiple Chronix Ingesters (in-memory), which write to the Chronix Server on a single host.]

S = Sample: {t,v}

B = Batch: [{t,v},{t,v},{t,v}]

Page 23: Chronix as Long-Term Storage for Prometheus

Cloud Mode: Multiple Prometheus Servers, One Chronix Ingester per Host, A Chronix Server Cloud

[Diagram: several single hosts; multiple Prometheus servers write through an NGINX proxy to the Chronix Ingesters (in-memory), which write to a Chronix Server Cloud. Further label: Master.]

Page 24: Chronix as Long-Term Storage for Prometheus

Architectural Key Factor: The Chronix Ingester

■ Small Go program

  ■ Binary size: 8.5 MB

  ■ Lines of code: ~ 720 LoC

  ■ Scales easily: copy, execute

■ Handles writes from Prometheus

  ■ Just a small configuration:

    remote_write:
      url: http://<host>:<port>/ingest

■ Batches samples in memory

  ■ Prometheus sends single samples.

  ■ Chronix needs large chunks (n single samples) to work well.

  ■ Max batch age

    ■ 5M, 12H, ..

■ Crash and restart resilience

  ■ In-memory is dangerous. The Ingester holds some amount of transient state.

  ■ Regularly writes checkpoints of the entire in-memory state to disk

  ■ Latest checkpoint is loaded on restart
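The batching idea can be sketched in a few lines. Note that the real Ingester is a Go program; the Java class below is only an illustration of the principle, with hypothetical names (`BatchingBuffer`, `Point`) and an additional size limit that the slide does not mention. Checkpointing to disk is left out.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical in-memory batcher; illustrates the idea, not the Ingester's code. */
final class BatchingBuffer {
    /** One sample {t,v}. */
    static final class Point {
        final long t;
        final double v;

        Point(long t, double v) {
            this.t = t;
            this.v = v;
        }
    }

    private final int maxBatchSize;       // assumption: a size limit in addition to the age limit
    private final long maxBatchAgeMillis; // corresponds to the "max batch age" on the slide
    private final List<Point> current = new ArrayList<>();
    private long firstArrivalMillis;

    BatchingBuffer(int maxBatchSize, long maxBatchAgeMillis) {
        this.maxBatchSize = maxBatchSize;
        this.maxBatchAgeMillis = maxBatchAgeMillis;
    }

    /**
     * Adds one sample. Returns the batch to write to the server once it is
     * large enough or too old; otherwise returns an empty list.
     */
    List<Point> add(long nowMillis, long timestamp, double value) {
        if (current.isEmpty()) {
            firstArrivalMillis = nowMillis; // the batch age starts with its first sample
        }
        current.add(new Point(timestamp, value));
        boolean full = current.size() >= maxBatchSize;
        boolean tooOld = nowMillis - firstArrivalMillis >= maxBatchAgeMillis;
        if (full || tooOld) {
            List<Point> batch = new ArrayList<>(current);
            current.clear();
            return batch;
        }
        return Collections.emptyList();
    }

    /** Number of samples that are held in memory only. */
    int pending() {
        return current.size();
    }
}
```

Everything counted by `pending()` would be lost in a crash, which is exactly the transient state the checkpoints are meant to protect.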

Page 25: Chronix as Long-Term Storage for Prometheus

Chronix loves Chunks. Hence the Ingester batches samples.

Page 26: Chronix as Long-Term Storage for Prometheus

The data models for Prometheus and Chronix are similar.

■ Prometheus

  ■ Uses so-called labels (key-value pairs) to store dimensional values

  ■ Labels are added dynamically

  ■ Stores samples (pairs of timestamp and scalar value)

■ Chronix

  ■ Uses attributes (key-value pairs) to store dimensional values

  ■ Schema, schemaless, dynamic fields, etc.

  ■ Stores samples of a timestamp and any value type: scalar, trace, string, etc.

Page 27: Chronix as Long-Term Storage for Prometheus

An example Chronix schema to define the available fields.

<?xml version="1.0" encoding="UTF-8" ?>
<schema name="Chronix" version="1.5">
  <types>
    <fieldType name="int" class="solr.TrieIntField" precisionStep="0" positionIncrementGap="0"/>
    <fieldType name="long" class="solr.TrieLongField" precisionStep="0" positionIncrementGap="0"/>
    <fieldType name="double" class="solr.TrieDoubleField" precisionStep="0" positionIncrementGap="0"/>
    <fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
    <fieldType name="binary" class="solr.BinaryField"/>
  </types>
  <fields>
    <!-- The required fields -->
    <field name="id" type="string" indexed="true" stored="true" required="true"/>
    <field name="_version_" type="long" indexed="true" stored="true"/>
    <field name="start" type="long" indexed="true" stored="true" required="true"/>
    <field name="end" type="long" indexed="true" stored="true" required="true"/>
    <field name="data" type="binary" indexed="true" stored="true" required="false"/>
    <field name="metric" type="string" indexed="true" stored="true" required="true"/>
    <!-- Dynamic field for tags -->
    <dynamicField name="*_s" type="string" indexed="true" stored="true"/>
  </fields>
  <uniqueKey>id</uniqueKey>
  <solrQueryParser defaultOperator="OR"/>
</schema>

Definition of types

Available Fields

Prometheus labels are strings. Chronix Ingester creates them in Chronix Server dynamically using the dynamicField *_s.

Prometheus_Label -> Chronix_Label

host -> host_s
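The mapping itself is simple. As a sketch (the class `LabelMapper` is hypothetical, and the real Ingester is written in Go), every label name gets the suffix `_s` so that it matches the dynamic field of the schema:

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper that maps Prometheus label names to Chronix dynamic field names. */
final class LabelMapper {
    /** Appends "_s" to every label name so that it matches the dynamicField "*_s". */
    static Map<String, String> toChronixAttributes(Map<String, String> labels) {
        Map<String, String> attributes = new LinkedHashMap<>();
        for (Map.Entry<String, String> label : labels.entrySet()) {
            attributes.put(label.getKey() + "_s", label.getValue());
        }
        return attributes;
    }
}
```

Since the field is dynamic, a label that was never seen before needs no schema change on the Chronix Server.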

Page 28: Chronix as Long-Term Storage for Prometheus

Showcase: Prometheus, Chronix Ingester, Chronix and Grafana

Prometheus Chronix ServerChronix Ingester

In-Memory

S S S

Grafana

B B B

Page 29: Chronix as Long-Term Storage for Prometheus

A few words about performance in our showcase.

Disk usage: 11 Days of Data

112,815,835 Samples

Prometheus: ~ 786 MB (whole data directory)

Chronix: ~ 265 MB (without compaction)
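To put these numbers in relation, one can divide the disk usage by the number of samples. The sketch below assumes 1 MB = 1,000,000 bytes, which the slide does not state; under that assumption the result is roughly 7 bytes per sample for Prometheus and roughly 2.3 bytes per sample for Chronix. Keep in mind that the Prometheus figure covers the whole data directory. The class name `DiskUsage` is made up for this example.

```java
/** Hypothetical helper for a back-of-the-envelope calculation. */
final class DiskUsage {
    /** Bytes per sample, assuming 1 MB = 1,000,000 bytes. */
    static double bytesPerSample(double megabytes, long samples) {
        return megabytes * 1_000_000.0 / samples;
    }

    public static void main(String[] args) {
        long samples = 112_815_835L; // number of samples from the slide
        System.out.printf("Prometheus: %.2f bytes/sample%n", bytesPerSample(786, samples));
        System.out.printf("Chronix:    %.2f bytes/sample%n", bytesPerSample(265, samples));
    }
}
```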

Page 30: Chronix as Long-Term Storage for Prometheus

Compaction Effects.

Compaction  Points per Chunk  Number of Records  Disk Usage (MB)  Compaction Time (s)
no                        -1             610355              265                    0
yes                      100            1422369              357                  134
yes                      500             284815              187                   75
yes                     1000             142573              160                   93
yes                     5000              28850              131                   69
yes                    10000              14797              126                   61
yes                    25000               6408              123                   61
yes                   100000               2051              121                   60
yes                   500000                920              119                   63

Contains about 112 points per chunk without compaction!

Page 31: Chronix as Long-Term Storage for Prometheus

A few words about performance in our showcase.

CPU usage: 4 Cores available (= 400 % Max)

Page 32: Chronix as Long-Term Storage for Prometheus

A few words about performance in our showcase.

Memory consumption (max. 8 G)

Ingester

Page 33: Chronix as Long-Term Storage for Prometheus

Prometheus

Page 34: Chronix as Long-Term Storage for Prometheus

Prometheus Configuration

Page 35: Chronix as Long-Term Storage for Prometheus

Chronix Default Web-UI

Page 36: Chronix as Long-Term Storage for Prometheus

Using the data source plugins for Chronix and Prometheus.

Page 37: Chronix as Long-Term Storage for Prometheus

Ingester Health: Everything Green!

Page 38: Chronix as Long-Term Storage for Prometheus

Short Term Data in Prometheus. Long Term Data in Chronix.

See the difference?

Page 39: Chronix as Long-Term Storage for Prometheus

Everything is open source and free to everyone. The code is the truth.

Chronix Website: www.chronix.io

Chronix Github: https://github.com/ChronixDB

- Ingester: https://github.com/ChronixDB/chronix.ingester

Questions?

- Twitter: @ChronixDB, @flolaut, @phxql

- Slack: https://qaware.slack.com/messages/chronix/

Page 40: Chronix as Long-Term Storage for Prometheus

Now it’s your turn.
