Page 1: Influxdb and time series data

InfluxDB The time series database

Modern Factory #workshops Marcin Szepczyński, July 2016

Page 2: Influxdb and time series data

What is time series data?

Page 3: Influxdb and time series data

What is time series data?

• Time series data is a sequence of data points collected from the same source over a time interval.

• If you plot time series data, one of the axes will always be time.

Page 4: Influxdb and time series data

Examples of time series data

Page 7: Influxdb and time series data

What is not time series data?

Page 10: Influxdb and time series data

Regular vs irregular time series

Page 11: Influxdb and time series data

Time series data is good for

• Internet of Things (e.g. sensors data)

• Alerting

• Monitoring

• Real Time Analytics

Page 12: Influxdb and time series data

InfluxDB is the I in the TICK stack

• Telegraf - time series data collector

• InfluxDB - time series database

• Chronograf - time series data visualization

• Kapacitor - time series data processing and alerting

Page 13: Influxdb and time series data

InfluxDB features

• SQL-like query language

• Schemaless

• Case sensitive

• Data types: string, float64, int64, boolean

Page 14: Influxdb and time series data

Measurement

• A measurement is InfluxDB's counterpart of a table; a point is a single record (row) in a measurement

• Each point has a time (as primary key), tags (indexed columns) and fields (not indexed columns)
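
As a rough illustration of that structure, the sketch below models one record in Python. The `Point` class, its attribute names and the sample values are hypothetical and are not part of any InfluxDB client library.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, Union

# The four data types listed on the features slide.
FieldValue = Union[str, float, int, bool]


@dataclass
class Point:
    """Hypothetical model of one record: time, indexed tags, non-indexed fields."""
    measurement: str                                             # the "table"
    time: datetime                                               # timestamp of the record
    tags: Dict[str, str] = field(default_factory=dict)           # indexed columns
    fields: Dict[str, FieldValue] = field(default_factory=dict)  # not indexed columns


point = Point(
    measurement="cpu_temp",
    time=datetime(2016, 7, 4, 12, 0, tzinfo=timezone.utc),
    tags={"host": "server01"},
    fields={"cpu": 61.5},
)
print(point.measurement, point.tags, point.fields)
```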

Page 15: Influxdb and time series data

Inserting

INSERT table_name,tag1=value1,tag2=value2 temp=30.5,value=1.5

measurement name ("table"): table_name

Page 18: Influxdb and time series data

Inserting

INSERT table_name,tag1=value1,tag2=value2 temp=30.5,value=1.5

measurement name ("table"): table_name

tags: tag1=value1,tag2=value2

Tags
tag1    tag2
value1  value2

Page 19: Influxdb and time series data

Inserting

INSERT table_name,tag1=value1,tag2=value2 temp=30.5,value=1.5

measurement name ("table"): table_name

fields: temp=30.5,value=1.5

Fields
temp  value
30.5  1.5

Page 20: Influxdb and time series data

Inserting

INSERT table_name,tag1=value1,tag2=value2 temp=30.5,value=1.5

measurement name ("table"): table_name

tags: tag1=value1,tag2=value2

fields: temp=30.5,value=1.5

A comma separates the measurement from the tags, each tag from the next tag, and each field from the next field.

A space separates the tags from the fields.
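
The separator rules above can be sketched as a small Python helper. `to_line` is a hypothetical function written for this illustration; it assumes float field values only and does no escaping of special characters, so it should not be read as a complete implementation of the format.

```python
from typing import Dict


def to_line(measurement: str, tags: Dict[str, str], fields: Dict[str, float]) -> str:
    """Join the parts as the slide describes (assumption: floats only, no escaping)."""
    tag_part = ",".join(f"{key}={value}" for key, value in tags.items())
    field_part = ",".join(f"{key}={value}" for key, value in fields.items())
    # A comma joins the measurement to the tags; a space precedes the fields.
    head = f"{measurement},{tag_part}" if tag_part else measurement
    return f"{head} {field_part}"


line = to_line(
    "table_name",
    {"tag1": "value1", "tag2": "value2"},
    {"temp": 30.5, "value": 1.5},
)
print("INSERT " + line)
```

With the sample values this rebuilds the statement shown on the slide.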

Page 21: Influxdb and time series data

Querying

• Show databases: > SHOW DATABASES

• Select database: > USE workshop

• Show measurements ("tables"): > SHOW MEASUREMENTS

• Simple select all: > SELECT * FROM measurement_name

Page 22: Influxdb and time series data

Querying (2)

• Select with limit: > SELECT * FROM measure LIMIT 10

• Select with limit and offset: > SELECT * FROM measure LIMIT 10 OFFSET 10

• Select with where clause: > SELECT * FROM measure WHERE tag1 = 'value1'

• Select with order clause (ordering is by time only): > SELECT * FROM measure ORDER BY time DESC
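
To make the effect of the where, limit and offset clauses concrete, here is a minimal Python sketch that applies the same ideas to an in-memory list. The `select` helper and the sample rows are hypothetical; this only mimics the semantics and does not talk to InfluxDB.

```python
from typing import Dict, List, Optional

Row = Dict[str, object]


def select(rows: List[Row], tag: Optional[str] = None, equals: Optional[str] = None,
           limit: Optional[int] = None, offset: int = 0) -> List[Row]:
    """Hypothetical helper: filter on one tag, skip `offset` rows, keep `limit` rows."""
    if tag is not None:
        rows = [row for row in rows if row.get(tag) == equals]
    end = None if limit is None else offset + limit
    return rows[offset:end]


# Six sample rows; even timestamps carry tag1=value1, odd ones tag1=value2.
rows = [{"time": i, "tag1": "value1" if i % 2 == 0 else "value2", "cpu": 60 + i}
        for i in range(6)]

matching = select(rows, tag="tag1", equals="value1")  # like WHERE tag1 = 'value1'
page = select(rows, limit=2, offset=2)                # like LIMIT 2 OFFSET 2
print([row["time"] for row in matching], [row["time"] for row in page])
```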

Page 23: Influxdb and time series data

Querying (3)

• Operators:
  =       equal to
  <>, !=  not equal to
  >       greater than
  <       less than
  =~      matches against (regex)
  !~      does not match against (regex)
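
The two regex operators behave like a "matches" test and its negation. The Python sketch below shows the idea with the standard `re` module; the host names and the pattern are made up for illustration, and Python's regex dialect is not identical to the one InfluxDB uses.

```python
import re

hosts = ["server01", "server02", "db01"]   # hypothetical tag values
pattern = re.compile(r"^server")

# Comparable to: WHERE host =~ /^server/
matching = [host for host in hosts if pattern.search(host)]

# Comparable to: WHERE host !~ /^server/
not_matching = [host for host in hosts if not pattern.search(host)]

print(matching, not_matching)
```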

Page 24: Influxdb and time series data

Aggregations - COUNT()

Returns the number of non-null values.

> SELECT count(<field>) FROM measure
> SELECT count(cpu) FROM cpu_temp WHERE time > '2016-07-04' AND time < '2016-07-05' GROUP BY time(1h)
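
What GROUP BY time(1h) does to a count can be sketched in Python: each point is assigned to the hour it falls in, and null values are skipped. The function name and sample points are hypothetical.

```python
from collections import Counter
from datetime import datetime
from typing import Dict, List, Optional, Tuple


def count_per_hour(points: List[Tuple[datetime, Optional[float]]]) -> Dict[datetime, int]:
    """Count non-null values per one-hour bucket."""
    counts: Counter = Counter()
    for timestamp, value in points:
        if value is None:          # nulls are not counted
            continue
        bucket = timestamp.replace(minute=0, second=0, microsecond=0)
        counts[bucket] += 1
    return dict(counts)


points = [
    (datetime(2016, 7, 4, 10, 5), 60.0),
    (datetime(2016, 7, 4, 10, 40), None),
    (datetime(2016, 7, 4, 10, 55), 62.0),
    (datetime(2016, 7, 4, 11, 10), 61.0),
]
hourly = count_per_hour(points)
print(hourly)
```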

Page 25: Influxdb and time series data

Aggregations - MEAN()

Returns the mean (average) value of a single field (calculated only for non-null values).

> SELECT mean(<field>) FROM measure
> SELECT mean(cpu) FROM cpu_temp WHERE time > '2016-07-04' AND time < '2016-07-05' GROUP BY time(1h)

Page 26: Influxdb and time series data

Aggregations - MEDIAN()

Returns the middle value from the sorted values in a single field (similar to PERCENTILE(field, 50)).

> SELECT median(<field>) FROM measure
> SELECT median(cpu) FROM cpu_temp WHERE time > '2016-07-04' AND time < '2016-07-05' GROUP BY time(1h)

Page 27: Influxdb and time series data

Aggregations - SPREAD()

Returns the difference between the minimum and maximum values of the field.

> SELECT spread(<field>) FROM measure
> SELECT spread(cpu) FROM cpu_temp WHERE time > '2016-07-04' AND time < '2016-07-05' GROUP BY time(1h)

Page 28: Influxdb and time series data

Aggregations - SUM()

Returns the sum of all values in a single field.

> SELECT sum(<field>) FROM measure
> SELECT sum(cpu) FROM cpu_temp WHERE time > '2016-07-04' AND time < '2016-07-05' GROUP BY time(1h)
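
For a quick feel of what the aggregations compute, the sketch below applies the same definitions to a small Python list. The sample values are invented, and the code only mirrors the descriptions on the slides.

```python
from statistics import mean, median

values = [60.0, None, 62.0, 61.0, 65.0]            # hypothetical field values
present = [v for v in values if v is not None]     # aggregations skip nulls

summary = {
    "mean": mean(present),
    "median": median(present),                     # average of the two middle values here
    "spread": max(present) - min(present),
    "sum": sum(present),
}
print(summary)
```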

Page 29: Influxdb and time series data

Selectors - BOTTOM(N)

Returns the smallest N values in a single field.

> SELECT bottom(<field>, <N>) FROM measure
> SELECT bottom(cpu, 5) FROM cpu_temp WHERE time > '2016-07-04' AND time < '2016-07-05' GROUP BY time(1h)

Page 30: Influxdb and time series data

Selectors - FIRST()

Returns the oldest value of a single field.

> SELECT first(<field>) FROM measure
> SELECT first(cpu) FROM cpu_temp WHERE time > '2016-07-04' AND time < '2016-07-05' GROUP BY time(1h)

Page 31: Influxdb and time series data

Selectors - LAST()

Returns the newest value of a single field.

> SELECT last(<field>) FROM measure
> SELECT last(cpu) FROM cpu_temp WHERE time > '2016-07-04' AND time < '2016-07-05' GROUP BY time(1h)
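
FIRST() and LAST() select by timestamp, not by value. A minimal Python sketch with made-up points, deliberately listed out of order:

```python
from datetime import datetime

points = [
    (datetime(2016, 7, 4, 10, 30), 62.0),
    (datetime(2016, 7, 4, 10, 5), 60.0),
    (datetime(2016, 7, 4, 10, 55), 61.0),
]

first = min(points, key=lambda point: point[0])   # oldest timestamp
last = max(points, key=lambda point: point[0])    # newest timestamp
print(first[1], last[1])
```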

Page 32: Influxdb and time series data

Selectors - MAX()

Returns the highest value in a single field.

> SELECT max(<field>) FROM measure
> SELECT max(cpu) FROM cpu_temp WHERE time > '2016-07-04' AND time < '2016-07-05' GROUP BY time(1h)

Page 33: Influxdb and time series data

Selectors - MIN()

Returns the lowest value in a single field.

> SELECT min(<field>) FROM measure
> SELECT min(cpu) FROM cpu_temp WHERE time > '2016-07-04' AND time < '2016-07-05' GROUP BY time(1h)

Page 34: Influxdb and time series data

Selectors - PERCENTILE(N)

Returns the N-percentile value for sorted values of a single field.

> SELECT percentile(<field>, <N>) FROM measure
> SELECT percentile(cpu, 95) FROM cpu_temp WHERE time > '2016-07-04' AND time < '2016-07-05' GROUP BY time(1h)
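
One common way to pick a percentile from sorted values is the nearest-rank rule, sketched below in Python. Using this rule is an assumption made for illustration; InfluxDB's own rounding may differ. The example also shows why the 50th percentile is only similar to the median: with an even number of values the median averages the two middle ones, while a percentile returns an actual data point.

```python
import math
from statistics import median
from typing import List


def percentile(values: List[float], n: float) -> float:
    """Nearest-rank percentile (assumed rule, not necessarily InfluxDB's)."""
    if not values:
        raise ValueError("percentile of an empty list is undefined")
    ordered = sorted(values)
    rank = max(1, math.ceil(n / 100 * len(ordered)))   # 1-based rank
    return ordered[rank - 1]


values = [65.0, 60.0, 62.0, 61.0]   # hypothetical field values
p50 = percentile(values, 50)
p95 = percentile(values, 95)
print(p50, median(values), p95)
```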

Page 35: Influxdb and time series data

Selectors - TOP(N)

Returns the largest N values in a single field.

> SELECT top(<field>, <N>) FROM measure
> SELECT top(cpu, 5) FROM cpu_temp WHERE time > '2016-07-04' AND time < '2016-07-05' GROUP BY time(1h)
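
TOP and BOTTOM correspond to taking the N largest or smallest values. In Python the same selection can be sketched with the standard `heapq` module; the values are invented for the example.

```python
import heapq

values = [61.0, 70.5, 58.2, 66.1, 59.9, 72.3]   # hypothetical field values

top_two = heapq.nlargest(2, values)       # comparable to top(cpu, 2)
bottom_two = heapq.nsmallest(2, values)   # comparable to bottom(cpu, 2)
print(top_two, bottom_two)
```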

Page 36: Influxdb and time series data

GROUP BY clause

InfluxDB supports GROUP BY on tag values, on time intervals, and on both together. GROUP BY time() can also be combined with fill() to set the value reported for intervals that have no data.
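
Grouping by a tag and a time interval, with a fill value for empty intervals, can be sketched in Python as follows. The function, the `host` tag and the sample points are hypothetical; the code mimics the idea only and is not InfluxDB's implementation.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean
from typing import Dict, List, Optional, Tuple

Sample = Tuple[datetime, str, float]   # (time, host tag, field value)


def mean_by_host_and_hour(points: List[Sample], start: datetime, end: datetime,
                          fill: Optional[float] = None
                          ) -> Dict[str, List[Tuple[datetime, Optional[float]]]]:
    """Mean per (host, hour); hours without data report `fill`."""
    buckets = defaultdict(list)
    for timestamp, host, value in points:
        hour = timestamp.replace(minute=0, second=0, microsecond=0)
        buckets[(host, hour)].append(value)

    result = {}
    for host in sorted({host for _, host, _ in points}):
        rows = []
        hour = start
        while hour < end:
            values = buckets.get((host, hour))
            rows.append((hour, mean(values) if values else fill))
            hour += timedelta(hours=1)
        result[host] = rows
    return result


points = [
    (datetime(2016, 7, 4, 10, 10), "a", 60.0),
    (datetime(2016, 7, 4, 10, 50), "a", 62.0),
    (datetime(2016, 7, 4, 12, 5), "a", 64.0),
    (datetime(2016, 7, 4, 11, 30), "b", 50.0),
]
grouped = mean_by_host_and_hour(
    points, datetime(2016, 7, 4, 10), datetime(2016, 7, 4, 13), fill=0.0)
print(grouped)
```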

Page 37: Influxdb and time series data

Downsampling

InfluxDB can handle hundreds of thousands of data points per second. Working with that much data over a long period of time can create storage concerns. A natural solution is to downsample the data: keep the high-precision raw data for only a limited time, and store the lower-precision, summarized data for much longer or forever.

Page 38: Influxdb and time series data

Data retention

A retention policy is the part of InfluxDB’s data structure that describes for how long InfluxDB keeps data and how many copies of those data are stored in the cluster. A database can have several RPs and RPs are unique per database.
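
The combination of downsampling and retention can be sketched in a few lines of Python: raw points are summarized into hourly means, and raw points older than a chosen duration are dropped. Both helper functions, the two-hour duration and the sample data are assumptions made for this illustration; InfluxDB handles this with its own mechanisms rather than with code like this.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean
from typing import List, Tuple

Sample = Tuple[datetime, float]


def downsample_hourly(points: List[Sample]) -> List[Sample]:
    """Summarize raw points into one mean value per hour."""
    buckets = defaultdict(list)
    for timestamp, value in points:
        buckets[timestamp.replace(minute=0, second=0, microsecond=0)].append(value)
    return sorted((hour, mean(values)) for hour, values in buckets.items())


def apply_retention(points: List[Sample], now: datetime, keep_for: timedelta) -> List[Sample]:
    """Keep only the points that are not older than `keep_for`."""
    cutoff = now - keep_for
    return [(timestamp, value) for timestamp, value in points if timestamp >= cutoff]


now = datetime(2016, 7, 5, 0, 0)
raw = [
    (datetime(2016, 7, 4, 10, 0), 60.0),
    (datetime(2016, 7, 4, 10, 30), 62.0),
    (datetime(2016, 7, 4, 23, 30), 65.0),
]
summary = downsample_hourly(raw)                                # kept long term
recent_raw = apply_retention(raw, now, timedelta(hours=2))      # kept short term
print(summary, recent_raw)
```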

Page 39: Influxdb and time series data

More

https://influxdata.com/videos/

https://docs.influxdata.com/influxdb

