Page 1: Chronix Poster for the Poster Session FAST 2017

Chronix: Long Term Storage and Retrieval Technology for Anomaly Detection in Operational Data

Florian Lautenschlager,1 Michael Philippsen,2 Andreas Kumlehn,2 and Josef Adersberger1
1 QAware GmbH, Munich, Germany
2 University Erlangen-Nürnberg (FAU), Programming Systems Group, Erlangen

Abstract

Anomalies in the runtime behavior of software systems, especially in distributed systems, are inevitable, expensive, and hard to locate. To detect and correct such anomalies one has to automatically collect, store, and analyze the operational data of the runtime behavior, often represented as time series. There are efficient means both to collect and analyze the runtime behavior. But general-purpose time series databases do not focus on the specific needs of anomaly detection. Chronix is a domain specific time series database targeted at anomaly detection in operational data.

Detecting Anomalies in Running Software matters

• Resource consumption: anomalous memory consumption, high CPU usage, …
• Sporadic failure: blocking state, deadlock, dirty read, …
• Security: port scanning activity, short frequent login attempts, …

→ Economic or reputation loss.

Anomaly Detection Tool Chain for Operational Data

• Collection Framework: collects operational data from a running application.
• Time Series Database: stores the time series data.
• Analysis Framework: asks the database for data and analyzes the data.

General Purpose TSDB:

• Brake shoe
• Resource hog
• Productivity obstacle

Chronix: domain specific TSDB, with domain specific sensors and adaptors and domain specific analysis algorithms and tools.

Application's Operational Data

Types of operational data:

• Metrics: scalar values, e.g., rates, runtimes, total hits, counters, …
• Events: single occurrences, e.g., a user's login, product order, …
• Traces: sequences within a software system, e.g., the called methods, …

General-purpose TSDBs in Anomaly Detection

Requirements, rated for Graphite, InfluxDB, OpenTSDB, KairosDB, and Prometheus (in this order):

Generic data model: # G# # G# #
Analysis support: G# G# # G# G#
Lossless long term storage: # G#

No support for data types = Productivity obstacle
No support for analyses = Productivity obstacle + Brake shoe
High memory footprint = Performance hog
High storage demands = Performance hog
Loss of historical data = Brake shoe

What makes Chronix domain specific?

Option to pre-compute an extra representation of the data

Optional timestamp compression for almost-periodic time series

Records that meet the needs of the domain

Compression technique that suits the domain’s data

Underlying multi-dimensional storage

Domain specific query language with server-side evaluation

Domain specific commissioning of configuration parameters

How it works!

Example: Almost-periodic time series

Timestamp Value Metric Process Host

25.10.2016 00:00:01.546 218.34 ingester\time SmartHub QAMUC

25.10.2016 00:00:06.718 218.37 ingester\time SmartHub QAMUC

25.10.2016 00:00:11.891 218.49 ingester\time SmartHub QAMUC

25.10.2016 00:00:16.964 218.35 ingester\time SmartHub QAMUC

… … … … …

Optional Pre-compute Extras

Timestamp Value Metric Process Host SAX

25.10.2016 00:00:01.546 218.34 ingester\time SmartHub QAMUC A

25.10.2016 00:00:06.718 218.37 ingester\time SmartHub QAMUC B

25.10.2016 00:00:11.891 218.49 ingester\time SmartHub QAMUC C

25.10.2016 00:00:16.964 218.35 ingester\time SmartHub QAMUC B

… … … … … …

SAX symbols (figure): B A C B C D C A B

• Lossless storage that keeps all details as analyses may need them.
• Programming interface to add extra domain specific "columns".
• These "columns" speed up anomaly detection queries.
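A pre-computed SAX "column" could be derived roughly as follows. This is a minimal sketch of the generic SAX idea (z-normalize the values, then map each one to a letter using breakpoints of the standard normal distribution). The four-letter alphabet, the omission of piecewise aggregation, and the name `sax_symbols` are assumptions for illustration and are not taken from Chronix.

```python
import bisect
import statistics

# Breakpoints that split a standard normal distribution into four
# equally likely regions (letters A to D). The alphabet size is an assumption.
BREAKPOINTS = [-0.6745, 0.0, 0.6745]
ALPHABET = "ABCD"


def sax_symbols(values):
    """Map each value to a SAX letter after z-normalization."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        # A constant series has no shape; use the region that contains z = 0.
        return [ALPHABET[len(ALPHABET) // 2]] * len(values)
    symbols = []
    for value in values:
        z = (value - mean) / std
        # bisect_right counts how many breakpoints lie at or below z.
        symbols.append(ALPHABET[bisect.bisect_right(BREAKPOINTS, z)])
    return symbols
```

Storing the letters next to the raw values keeps the data lossless while letting a pattern query scan the short symbol sequence instead of the full series.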

Optional Timestamp Compaction

Timestamps: 25.10 … :01.546, 25.10 … :06.718, 25.10 … :11.891, 25.10 … :16.964
Calculate deltas: 25.10 … :01.546, 5.172, 5.173, 5.073
Compute diffs between them: 25.10 … :01.546, 5.172, 0.001, 0.1
Drop diffs below threshold: 25.10 … :01.546, 5.172, -, -
Stored timestamps: 25.10 … :01.546, 5.172 (space saved)

If accumulated drift > threshold, store delta.

• Date-Delta-Compaction for almost-periodic time series.
• Functionally lossless as all relevant details are kept.
• Degree of inaccuracy is a configuration parameter of Chronix.
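The compaction steps can be sketched in a few lines. The following is an illustrative Python approximation, not the Chronix implementation: it assumes integer timestamps in milliseconds, marks a dropped delta with `None`, and stores a new delta as soon as the reconstructed timestamp would drift from the real one by more than the threshold. The function names are hypothetical.

```python
def compact_timestamps(timestamps_ms, threshold_ms):
    """Return (first timestamp, deltas); None marks a delta that was dropped."""
    if not timestamps_ms:
        return None, []
    first = timestamps_ms[0]
    reconstructed = first  # the timestamp a reader of the compacted data would compute
    last_delta = None      # the most recently stored delta
    deltas = []
    for ts in timestamps_ms[1:]:
        predictable = (
            last_delta is not None
            and abs(ts - (reconstructed + last_delta)) <= threshold_ms
        )
        if predictable:
            # Accumulated drift stays within the threshold: drop the delta.
            deltas.append(None)
            reconstructed += last_delta
        else:
            # Drift would exceed the threshold: store a delta that resyncs exactly.
            last_delta = ts - reconstructed
            deltas.append(last_delta)
            reconstructed = ts
    return first, deltas


def expand_timestamps(first, deltas):
    """Rebuild (approximate) timestamps from the compacted form."""
    if first is None:
        return []
    timestamps = [first]
    last_delta = None
    for delta in deltas:
        if delta is not None:
            last_delta = delta
        timestamps.append(timestamps[-1] + last_delta)
    return timestamps
```

With the seconds-within-minute values of the example (1546, 6718, 11891, 16964 ms) and a threshold of 200 ms, only the first delta of 5172 ms is stored, and every rebuilt timestamp stays within the threshold of its original.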

Domain Specific Records

Lines before chunking:

Timestamp     Value    Metric          Process    Host    SAX
25….:01.546   218.34   ingester\time   SmartHub   QAMUC   A
5.172         218.37   ingester\time   SmartHub   QAMUC   B
-             218.49   ingester\time   SmartHub   QAMUC   C
-             218.35   ingester\time   SmartHub   QAMUC   B
…             …        …               …          …       …

chunk & convert:

1 Record
metric: ingester\time
process: SmartHub
host: QAMUC
start: 25.10.2016 00:00:01.546
end: …
type: metric
data (BLOB):

Timestamp                 Value    SAX
25.10.2016 00:00:01.546   218.34   A
5.172                     218.37   B
-                         218.49   C
-                         218.35   B

• Exploit repetitiveness and bundle "lines" into data chunks.
• Programming interface for a specific time series record encoding.
• Chunk size is a configuration parameter of Chronix.
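Bundling lines into records can be pictured with the following sketch. It is an illustration under assumptions, not the Chronix record format: the `Record` class, the `chunk_lines` function, and the use of a point count as chunk size (the poster measures chunk size in kBytes) are all hypothetical.

```python
from dataclasses import dataclass, field
from itertools import groupby


@dataclass
class Record:
    metric: str
    process: str
    host: str
    start: int
    end: int
    type: str = "metric"
    data: list = field(default_factory=list)  # (timestamp, value) pairs


def attributes(line):
    """The repeated attributes of a line: (metric, process, host)."""
    return line[2:5]


def chunk_lines(lines, max_points):
    """Bundle (timestamp, value, metric, process, host) lines into records."""
    ordered = sorted(lines, key=lambda line: (attributes(line), line[0]))
    records = []
    for (metric, process, host), group in groupby(ordered, key=attributes):
        points = [(ts, value) for ts, value, *_ in group]
        # Split the points of one time series into chunks of bounded size.
        for i in range(0, len(points), max_points):
            chunk = points[i:i + max_points]
            records.append(
                Record(metric, process, host, chunk[0][0], chunk[-1][0], data=chunk)
            )
    return records
```

Each record stores the repeated attributes once, so only the timestamp and value pairs remain per data point.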

Domain Specific Compression

Record (before compression)
metric: ingester\time
process: SmartHub
host: QAMUC
start: 25.10.2016 00:00:01.546
end: …
type: metric
data:

Timestamp                 Value    SAX
25.10.2016 00:00:01.546   218.34   A
5.172                     218.37   B
-                         218.49   C
-                         218.35   B

serialize & compress:

Record (after compression)
metric: ingester\time
process: SmartHub
host: QAMUC
start: 25.10.2016 00:00:01.546
end: …
type: metric
data (compressed BLOB): 00105e0 e6b0 343b 9 07bc 0804 e7d508040

• Lossless compression techniques minimize the record size.
• Domain data often has small increments, recurring patterns, etc.
• Choice of compression technique is a configuration parameter of Chronix.
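The serialize-and-compress step might look like the sketch below. It uses gzip from the Python standard library; the JSON serialization and the function names are assumptions chosen for brevity, and the poster does not state the serialization format Chronix uses.

```python
import gzip
import json


def compress_data(points):
    """Serialize (timestamp, value) pairs to JSON and gzip the bytes."""
    raw = json.dumps(points, separators=(",", ":")).encode("utf-8")
    return gzip.compress(raw)


def decompress_data(blob):
    """Inverse of compress_data: returns the original list of pairs."""
    decoded = json.loads(gzip.decompress(blob).decode("utf-8"))
    return [tuple(pair) for pair in decoded]
```

Because the compression is lossless, `decompress_data(compress_data(points))` returns the original pairs; series with small increments and recurring patterns tend to shrink considerably.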

Multi-Dimensional Storage

Timestamp Value Metric Process Host

25.10.2016 00:00:01.546 218.34 ingester\time SmartHub QAMUC

25.10.2016 00:00:06.718 218.37 ingester\time SmartHub QAMUC

25.10.2016 00:00:11.891 218.49 ingester\time SmartHub QAMUC

25.10.2016 00:00:16.964 218.35 ingester\time SmartHub QAMUC

… … … … …

q=host:QAMUC AND metric:ingester* AND type:[metric OR trace] AND end:NOW-7MONTH

• Explorative: Users can use the attributes to find a record.
• Correlating: Queries can use and combine all types.
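To make the attribute-based lookup concrete, here is a small in-memory sketch. It only mimics the wildcard and OR parts of the query; the date math (`end:NOW-7MONTH`) is left out, and `find_records` with its dictionary records is a hypothetical stand-in rather than the query engine Chronix uses.

```python
from fnmatch import fnmatchcase


def find_records(records, **patterns):
    """Return the records (dicts) whose attributes match every pattern.

    A pattern is either a string with optional * wildcards or a
    collection of allowed values (similar to type:[metric OR trace]).
    """
    def matches(value, pattern):
        if isinstance(pattern, str):
            return fnmatchcase(value, pattern)
        return value in pattern

    return [
        record for record in records
        if all(matches(record.get(key, ""), pattern)
               for key, pattern in patterns.items())
    ]
```

A call such as `find_records(records, host="QAMUC", metric="ingester*", type=("metric", "trace"))` corresponds to the first three clauses of the query.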

Query Language & Server-Side eval.

Function        Graphite   InfluxDB   OpenTSDB   KairosDB   Prometheus   Chronix

Basic
distinct        ×          ✓          ×          ×          ×            ✓
integral        ✓          ×          ×          ×          ×            ✓
min/max/sum     ✓          ✓          ✓          ✓          ✓            ✓
bottom/top      ×          ✓          ×          ×          ✓            ✓
first/last      ✓          ✓          ×          ✓          ×            ✓
…               …          …          …          …          …            …
nnderivative    ✓          ✓          ×          ×          ×            ✓
movavg          ✓          ✓          ×          ×          ×            ✓
divide/scale    ✓          ✓          ×          ✓          ✓            ✓

High-level
sax [33]        ×          ×          ×          ×          ×            ✓
fastdtw [38]    ×          ×          ×          ×          ×            ✓
outlier         ×          ×          ×          ×          ×            ✓
trend           ×          ×          ×          ×          ×            ✓
frequency       ×          ×          ×          ×          ×            ✓
grpsize         ×          ×          ×          ×          ×            ✓
split           ×          ×          ×          ×          ×            ✓

Server-side evaluation (figure): query → read → process → result

q=metric:ingester* & cf=outlier

• Basic functions & high-level built-in domain specific functions.
• Plug-in interface to add functions for server-side evaluation.
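The idea of named functions that are evaluated next to the data, plus a plug-in hook, can be sketched as follows. Everything here is hypothetical: the registry, the `register` and `evaluate` helpers, and in particular the outlier rule (the common 1.5 × IQR rule of thumb), since the poster does not say which rule the built-in `outlier` function applies.

```python
import statistics


def outliers(values):
    """Values more than 1.5 * IQR outside the quartiles (an assumed rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = 1.5 * (q3 - q1)
    return [v for v in values if v < q1 - spread or v > q3 + spread]


def moving_average(values, window=3):
    """Simple moving average over a fixed-size window."""
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


# Hypothetical registry: maps the name given in cf=... to a function.
FUNCTIONS = {"outlier": outliers, "movavg": moving_average, "min": min, "max": max}


def register(name, func):
    """Plug-in hook: make an additional function available by name."""
    FUNCTIONS[name] = func


def evaluate(cf, values):
    """Apply the named function to the values of a time series."""
    return FUNCTIONS[cf](values)
```

Evaluating on the server means that only the (usually small) result travels back to the client instead of the raw time series.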

Domain Specific Commissioning

Figure: inaccuracy rate, average deviation, and space reduction (rates in %, average in ms) over the DDC threshold in ms (0, 5, 10, 50, 100, 200, 1000).

Figure: total access time in sec over the chunk size in kBytes (32, 64, 128, 256, 512, 1024) for gzip, LZ4, and Snappy.

• DDC-Threshold: 200 ms.
• Compression & Chunk Size: gzip + 128 kByte.

Callouts: Easily detect pattern! Fast! Best values! Select the ideal compression! Remove jitter! Projects 1–3.

Evaluation

Benchmark Client: Ubuntu 16.04.1 x64, 12 core, 32 GB RAM, 512 GB SSD; Java benchmark.

Benchmark Server: Ubuntu 16.04.1 x64, 12 core, 32 GB RAM, 512 GB SSD; Docker with InfluxDB, KairosDB, Chronix, and OpenTSDB.

Client and server exchange queries and time series over HTTP.

Data of 5 Industry Projects.

Project               1       2       3       4      5         total
time series           1,080   8,567   4,538   500    24,055    38,740
pairs (mio) metric    2.4     331.4   162.6   3.9    3,762.3   4,262.6
pairs (mio) lsof      0.0     0.0     0.0     0.4    0.0       0.4
pairs (mio) strace    0.0     0.0     0.0     12.1   0.0       12.1

(a) Pairs and time series per project.

Project   r    0.5   1    7    14   21   28   56   91   180   total
1 – 3     q    15    30   30   10   5    3    1    2    0     96
4 & 5     q    2     11   15   8    12   5    1    2    2     58
4 & 5     b    1     6    5    7    2    4    4    1    2     32
4 & 5     h    2     6    10   8    6    6    3    2    0     43

(b) Time ranges r (days) and occurrences of queries (q) for raw data retrieval, and of queries with basic (b) and high-level (h) functions.

Memory footprint (in MBytes)

               InfluxDB   OpenTSDB   KairosDB   Chronix
Initially      33         2,726      8,763      446
Import (max)   10,336     10,111     18,905     7,002
Query (max)    8,269      9,712      11,230     4,792

Chronix has a 34%–69% smaller memory footprint.

Storage demands (in GBytes)

Project   Raw Data   InfluxDB   OpenTSDB   KairosDB   Chronix
4         1.2        0.2        0.2        0.3        0.1
5         107.0      10.7       16.9       26.5       8.6
total     108.2      10.9       17.1       26.8       8.7

Chronix saves 20%–68% of the storage space.
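The stated range appears consistent with comparing the total storage demands pairwise. The following sketch recomputes the relative savings from the totals in the table; the helper name `savings_percent` is illustrative, and the poster does not spell out its exact formula.

```python
def savings_percent(chronix, other):
    """Relative saving of Chronix versus another database, in percent."""
    return (1 - chronix / other) * 100


# Total storage demands in GBytes, taken from the storage table.
storage_total = {"InfluxDB": 10.9, "OpenTSDB": 17.1, "KairosDB": 26.8}
chronix_total = 8.7

for name, total in storage_total.items():
    print(f"{name}: {savings_percent(chronix_total, total):.0f}%")
```

This yields roughly 20% versus InfluxDB, 49% versus OpenTSDB, and 68% versus KairosDB, which matches the 20%–68% range.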

Data retrieval times (in s)

r       q    InfluxDB   OpenTSDB   KairosDB   Chronix
0.5     2    4.3        2.8        4.4        0.9
1       11   5.2        5.6        6.6        5.3
7       15   34.1       17.4       26.8       7.0
14      8    36.2       14.2       25.5       4.0
21      12   76.5       29.8       55.0       6.0
28      5    7.9        3.9        5.6        0.5
56      1    35.4       12.4       24.1       1.2
91      2    47.5       15.5       33.8       1.1
180     2    96.7       36.7       66.6       1.1
total        343.8      138.3      248.4      27.1

Chronix saves 80%–92% on data retrieval time.

Times for b- and h-queries (in s)

#    Function     InfluxDB   OpenTSDB   KairosDB   Chronix

Basic (b)
4    Avg.         0.9        6.1        9.8        4.4
5    Max.         1.3        8.4        9.1        6.0
3    Min.         0.7        2.7        5.3        2.8
3    Dev.         6.7        16.7       21.1       2.3
5    Sum          0.7        6.0        12.0       2.0
4    Count        0.8        5.5        10.5       1.0
8    Perc.        10.2       25.8       34.5       8.6

High-level (h)
12   Outlier      30.7       29.1       117.6      18.9
14   Trend        162.7      50.4       100.6      30.2
11   Freq.        47.3       23.9       45.7       16.3
3    GroupSize    218.9      2927.8     206.3      29.6
3    Split        123.1      2893.9     47.9       37.2

75   total        604.0      5996.3     620.4      159.3

Chronix saves 73%–97% on analysis times.

Callouts: Typical scenario. Important! Needed for exploration! Evaluation: Projects 4 & 5.

Conclusion

Chronix exploits the characteristics of the domain in many ways and thus achieves better storage and query results. Chronix is open source. www.chronix.io

Acknowledgements

This research was in part supported by the Bavarian Ministry of Economic Affairs and Media, Energy and Technology as an IuK-grant for the project DfD – Design for Diagnosability.
