Top Banner
InfluxDB The time series database Modern Factory #workshops Marcin Szepczyński, July 2016
39

Influxdb and time series data

Jan 09, 2017

Download

Engineering

Welcome message from author
This document is posted to help you gain knowledge. Please leave a comment to let me know what you think about it! Share it to your friends and learn new things together.
Transcript
Page 1: Influxdb and time series data

InfluxDB The time series database

Modern Factory #workshops Marcin Szepczyński, July 2016

Page 2: Influxdb and time series data

What is time series data?

Page 3: Influxdb and time series data

What is time series data?

• A time series data is a sequence of data points made from the same source over the time interval.

• If you have a time series data and plot it, one of your axes will be always a time.

Page 4: Influxdb and time series data

Examples of time series data

Page 5: Influxdb and time series data
Page 6: Influxdb and time series data
Page 7: Influxdb and time series data

What is not a time series data?

Page 8: Influxdb and time series data
Page 9: Influxdb and time series data
Page 10: Influxdb and time series data

Regular vs irregular time series

Page 11: Influxdb and time series data

Time series data is good for

• Internet of Things (e.g. sensors data)

• Alerting

• Monitoring

• Real Time Analytics

Page 12: Influxdb and time series data

InfluxDB is I in TICK stack

• Telegraf - time data collector

• InfluxDB - time series database

• Chronograf - time series data visualization

• Kapacitor - time series data processing and alerting

Page 13: Influxdb and time series data

InfluxDB features

• SQL-like query language

• Schemaless

• Case sensitive

• Data types: string, float64, int64, boolean

Page 14: Influxdb and time series data

Measurement

• Measurement (or Point) is a single record (row) in InfluxDB data store

• Each measurement has time (as primary key), tags (indexed columns) and fields (not indexed columns)

Page 15: Influxdb and time series data

Inserting

INSERT table_name,tag1=value1,tag2=value2 temp=30.5,value=1.5

measurement name („table”)

Page 16: Influxdb and time series data

Inserting

INSERT table_name,tag1=value1,tag2=value2 temp=30.5,value=1.5

measurement name („table”)

Comma is a separator between measurement and tags Comma is a separator between each tag and each field

Page 17: Influxdb and time series data

Inserting

INSERT table_name,tag1=value1,tag2=value2 temp=30.5,value=1.5

measurement name („table”)

Space is a separator between tags and fields

Page 18: Influxdb and time series data

Inserting

INSERT table_name,tag1=value1,tag2=value2 temp=30.5,value=1.5

measurement name („table”)

tags

Tagstag1 tag2

value1 value2

Page 19: Influxdb and time series data

Inserting

INSERT table_name,tag1=value1,tag2=value2 temp=30.5,value=1.5

measurement name („table”)

fields

Fieldstemp value30.5 1.5

Page 20: Influxdb and time series data

Inserting

INSERT table_name,tag1=value1,tag2=value2 temp=30.5,value=1.5

measurement name („table”)

Comma is a separator between measurement and tags Comma is a separator between each tag and each field

Space is a separator between tags and fields

tags

fields

Page 21: Influxdb and time series data

Querying• Show databases: > SHOW DATABASES

• Select database: > USE workshop

• Show measurements („tables”)> SHOW MEASUREMENTS

• Simple select all > SELECT * FROM measurement_name

Page 22: Influxdb and time series data

Querying (2)• Select with limit:> SELECT * FROM measure LIMIT 10

• Select with offset: > SELECT * FROM measure OFFSET 10

• Select where clause: > SELECT * FROM measure WHERE tag1 = ’value1’

• Select with order clause: > SELECT * FROM measure ORDER BY cpu DESC

Page 23: Influxdb and time series data

Querying (3)

• Operators:= equal to <>, != not equal to > greater than < less than =~ matches against (REGEX) !~ doesn’t matches against (REGEX)

Page 24: Influxdb and time series data

Aggregations - COUNT()

Returns the number of non-null values. > SELECT count(<field>) FROM measure > SELECT count(cpu) FROM cpu_temp WHERE time > '2016-07-04' AND time < '2016-07-05' GROUP BY time(1h)

Page 25: Influxdb and time series data

Aggregations - MEAN()

Returns the mean (average) value of a single field (calculates only for non-null values). > SELECT mean(<field>) FROM measure > SELECT mean(cpu) FROM cpu_temp WHERE time > '2016-07-04' AND time < '2016-07-05' GROUP BY time(1h)

Page 26: Influxdb and time series data

Aggregations - MEDIAN()

Returns the middle value from the sorted values in single field (Its similar to PERCENTILE(field, 50). > SELECT median(<field>) FROM measure > SELECT median(cpu) FROM cpu_temp WHERE time > '2016-07-04' AND time < '2016-07-05' GROUP BY time(1h)

Page 27: Influxdb and time series data

Aggregations - SPREAD()

Returns the difference between minimum and maximum value of the field. > SELECT spread(<field>) FROM measure > SELECT spread(cpu) FROM cpu_temp WHERE time > '2016-07-04' AND time < '2016-07-05' GROUP BY time(1h)

Page 28: Influxdb and time series data

Aggregations - SUM()

Returns the sum of all values in a single field. > SELECT sum(<field>) FROM measure > SELECT sum(cpu) FROM cpu_temp WHERE time > '2016-07-04' AND time < '2016-07-05' GROUP BY time(1h)

Page 29: Influxdb and time series data

Selectors - BOTTOM(N)

Returns the smaller N values in a single field. > SELECT bottom(<field>, <N>) FROM measure > SELECT bottom(cpu, 5) FROM cpu_temp WHERE time > '2016-07-04' AND time < '2016-07-05' GROUP BY time(1h)

Page 30: Influxdb and time series data

Selectors - FIRST()

Returns the oldest values of a single field. > SELECT first(<field>) FROM measure > SELECT first(cpu) FROM cpu_temp WHERE time > '2016-07-04' AND time < '2016-07-05' GROUP BY time(1h)

Page 31: Influxdb and time series data

Selectors - LAST()

Returns the newest values of a single field. > SELECT last(<field>) FROM measure > SELECT last(cpu) FROM cpu_temp WHERE time > '2016-07-04' AND time < '2016-07-05' GROUP BY time(1h)

Page 32: Influxdb and time series data

Selectors - MAX()

Returns the highest value in a single field. > SELECT max(<field>) FROM measure > SELECT max(cpu) FROM cpu_temp WHERE time > '2016-07-04' AND time < '2016-07-05' GROUP BY time(1h)

Page 33: Influxdb and time series data

Selectors - MIN()

Returns the lowest value in a single field. > SELECT min(<field>) FROM measure > SELECT min(cpu) FROM cpu_temp WHERE time > '2016-07-04' AND time < '2016-07-05' GROUP BY time(1h)

Page 34: Influxdb and time series data

Selectors - PERCENTILE(N)

Returns the N-percentile value for sorted values of a single field.> SELECT percentile(<field>, <N>) FROM measure> SELECT percentile(cpu, 95) FROM cpu_temp WHERE time > '2016-07-04' AND time < '2016-07-05' GROUP BY time(1h)

Page 35: Influxdb and time series data

Selectors - TOP(N)

Returns the largest N values in a single field. > SELECT top(<field>, <N>) FROM measure > SELECT top(cpu, 5) FROM cpu_temp WHERE time > '2016-07-04' AND time < '2016-07-05' GROUP BY time(1h)

Page 36: Influxdb and time series data

GROUP BY clause

InfluxDB supports GROUP BY clause with tag values, time intervals, tag values and time intervals and GROUP BY with fill().

Page 37: Influxdb and time series data

Downsampling

InfluxDB can handle hundreds of thousands of data points per second. Working with that much data over a long period of time can create storage concerns. A natural solution is to downsample the data; keep the high precision raw data for only a limited time, and store the lower precision, summarized data for much longer or forever.

Page 38: Influxdb and time series data

Data retention

A retention policy is the part of InfluxDB’s data structure that describes for how long InfluxDB keeps data and how many copies of those data are stored in the cluster. A database can have several RPs and RPs are unique per database.

Page 39: Influxdb and time series data

More

https://influxdata.com/videos/

https://docs.influxdata.com/influxdb