8/9/2019 Time Series Data Concepts
RFCorsello
Research
Foundation
Time Series Data Concepts
Introduction
Sensors and other continual monitoring data collection efforts are used in many fields
These forms of data collection have a common underlying premise
A fixed set of data fields collected at regular time intervals over a longer-term period
This is the essence of a time series
How time series data is collected and used has a direct influence on the storage methodology that should be used
Time series data is a form of temporal data that is managed as a set
It is the uniformity of the collection that enables and favors specific treatment and management
Temporal Concepts
Time is an intrinsic concept familiar to us all
It marks the when of all events
All events may be marked by when they occur
All measurements are collected in time
A temperature (say 15 °C) is a value measured at a point in time (and space)
Measurements are taken at the time now for when the measurement occurs
At any point in time after the measurement was recorded, it can be referred to by the time of the measurement
This basic concept implies that all data is temporal in nature
The term temporal in this construct means with respect to, or pertaining to, time, where our key term is data
Temporal data is any data that has value measured with respect to time
Temporal data is about bounding the validity or relevance in time
If a river is measured to be 15 °C, that measurement is only valid for the time the measurement was taken
For any data measurement:
the value measured (15 °C) is non-temporal
15 °C is a value; it is the thing measured (a river) which is temporal
For certain very specific applications, a measurement may have no variance over time and is therefore temporally static
This does not imply that the data is not temporal; the temporal validity of the measurement is equivalent to the life of the item measured
These are two distinct concepts: temporality of the measurement and temporality of the item measured
Time Series
A time series is defined as a fixed structure of data collected repeatedly over time at fixed intervals
This definition is very broad and as such allows for variability in
areas
Time Domain
A single time series data set will have a time domain marking the start and end of the time series
For continual monitoring scenarios, the end may be thought of as both now and the end of time
Since the data is a time series, now represents the current last record
Time Interval
For any time series, there is a fixed interval between value points
For example, every five minutes is an interval for a time series of data collected at five minute intervals
It is this exact concept that permits a time series to only store the data and not the time value it is a measurement for
A time series only stores two actual times
Start date/time
End date/time
The time series stores a single interval value that is the return period or sampling interval separating discrete readings
Five minutes in our example
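As a minimal sketch of the idea above, a series can hold only a start time, an interval and the values, with the time of each reading implied by its position. The class and attribute names are illustrative assumptions, and here the end time is derived from the value count rather than stored, which is only one possible choice.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List


@dataclass
class SequentialSeries:
    """Hypothetical container: start, fixed interval, and values only."""
    start: datetime
    interval: timedelta
    values: List[float] = field(default_factory=list)

    def time_at(self, index: int) -> datetime:
        # The timestamp of a reading is implied by its position.
        return self.start + index * self.interval

    @property
    def end(self) -> datetime:
        # End of the series is the time of the last stored value.
        return self.time_at(len(self.values) - 1)


series = SequentialSeries(
    start=datetime(2019, 8, 9, 5, 0),
    interval=timedelta(minutes=5),
    values=[15.0, 15.1, 15.3, 15.2],
)
```

With four values at five minute spacing, `series.time_at(2)` is 5:10 and `series.end` is 5:15; no per-value timestamp is ever stored.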
Measurement Interval
An important related concept of time series data is the actual measurement interval
If a measurement is taken every five minutes, what is the collection method for the measurement?
If a temperature measurement is recorded every five minutes on the 0 and 5 (e.g. 5:00, 5:05)
Is the measure:
An instantaneous temperature
An average temperature from the previous time
An average of a split time (5:00 recorded, sampled from 4:57:30-5:02:30)
This information is not part of the time series itself, but is instead metadata about the time series
An important concept here is that for continual monitoring time series, changes over time may result in measurements taken using different approaches
In the case of different measure intervals, the time series should be split for consistency
Interval Examples
Relationship to Temporal Data
Time series data is a special case of temporal data
A time series is temporal in that each measurement within the time series may be treated as a single temporal measurement
The fixed interval of measures makes the treatment of the data special, whereas the data itself is not special in any way
A single time series may have thousands or millions of individual measurements, each spaced at fixed intervals
If a time series were to have only a single measurement (the degenerate case), it would be a temporal measure
Any collection of temporal measures that have the property of being evenly spaced in time may be treated as a time series
It is possible to construct a time series from non-evenly spaced data via an interpolation
It is common to abstract detailed measures (such as hourly temperatures at uneven intervals) into more abstract time series such as daily, weekly or monthly means
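One way to build an evenly spaced series from uneven measurements is linear interpolation onto a fixed grid. The sketch below assumes times are plain numbers (for example minutes) and that linear interpolation is acceptable for the data; the function name and parameters are illustrative.

```python
from bisect import bisect_left
from typing import List, Tuple


def to_regular(samples: List[Tuple[float, float]], start: float,
               interval: float, count: int) -> List[float]:
    """Linearly interpolate (time, value) samples onto a fixed grid.

    Samples must be sorted by time and the grid must lie within
    the sampled range.
    """
    times = [t for t, _ in samples]
    out = []
    for i in range(count):
        t = start + i * interval
        j = bisect_left(times, t)
        if j < len(times) and times[j] == t:
            out.append(samples[j][1])          # exact hit, no interpolation
            continue
        if j == 0 or j == len(times):
            raise ValueError("grid point outside sampled range")
        (t0, v0), (t1, v1) = samples[j - 1], samples[j]
        out.append(v0 + (v1 - v0) * (t - t0) / (t1 - t0))
    return out


uneven = [(0, 10.0), (4, 14.0), (11, 21.0), (15, 25.0)]
regular = to_regular(uneven, start=0, interval=5, count=4)
```

The result holds one value per grid point (0, 5, 10, 15), so it can be stored with only a start and an interval.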
Time Series and Temporal Representation
Collection
Time series data may be collected in any of a number of ways
A simulation or application may generate a time series directly
A single run of an application generates a full time series at once
An application may also append to a time series each time it runs
In the latter case, it is critical that the application is consistent in each run to maintain the integrity of time series offsets
It is often desirable to know which run produced which part of the time series
In the collection of time series data from sensors or manual entry:
Each subsequent round of collection is conceptually separate from the previous round of collection
In the case of a field deployed sensor (non-telemetry)
Each time the sensor is changed out or data is downloaded, there is a new time series created for that batch of data
This is critical in that each deployment of a sensor may overlap slightly, may have short gaps, or may be skewed (every five minutes, but on the 1s and 6s)
Collection Example
Virtual Time Series
The concept of multiple time series collections that align with each other establishes a need for a virtual time series
This virtual time series is the defined global time series for a collection definition (fields, interval and domain)
Composed of individual physical time series that each contains the data records for a collection effort
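A rough sketch of composing a virtual series from physical ones follows. It assumes each physical series is already aligned to the virtual interval and identified by an offset counted in intervals; gaps become NaN, and on overlap the later deployment wins, which is an arbitrary policy chosen for illustration.

```python
import math
from typing import List, Tuple

# Each physical series: (offset of its first value in the virtual series, values).
# Offsets are counted in intervals from the virtual start; names are illustrative.
Physical = Tuple[int, List[float]]


def build_virtual(parts: List[Physical], length: int) -> List[float]:
    """Merge physical series into one virtual series of `length` slots.

    Gaps are filled with NaN; where deployments overlap, the later
    part in the list wins.
    """
    virtual = [math.nan] * length
    for offset, values in parts:
        for i, v in enumerate(values):
            if 0 <= offset + i < length:
                virtual[offset + i] = v
    return virtual


deployments = [(0, [15.0, 15.1, 15.2]),   # first deployment
               (2, [15.4, 15.5]),         # overlaps slot 2
               (6, [16.0])]               # leaves a gap at slots 4-5
virtual = build_virtual(deployments, length=7)
```

A skewed deployment (on the 1s and 6s, say) would first need to be realigned or interpolated before it could be merged this way.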
Time Series Use
The long-term purpose of time series is no different than that of any data
How time series data is used will influence the approach used for storage to ensure adequate performance and storage volumes are available to handle the demand
It is the nature of how time series data is used that most influences its special treatment
In many cases a time series is used as a whole (the entire series) rather than as individual measures
Without such a directed form of use, the notion of a time series would be irrelevant as a separate entity from the more general temporal data
It is the cost of storage and transmission, which can greatly affect the performance of applications using time series data, that suggests the special treatment of time series to reduce size and increase access performance
Methodologies of
Random Extraction
The most basic form of use for a time series is that of random extraction
A user needs data from a time series based upon a set of criteria known by the user at extraction time (not planned or expected at data collection time)
This is one of the most common scenarios for any data use and has large implications in storage format
For random extraction, a user may request all records where temperature is over 32
This form of access results in a search over the time series to extract the individual elements matching the criteria provided
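For a sequentially stored series, random extraction amounts to a scan that tests every value. The sketch below illustrates this with the temperature-over-32 example; the function name and signature are assumptions made for illustration.

```python
from datetime import datetime, timedelta
from typing import Callable, Iterator, List, Tuple


def random_extract(start: datetime, interval: timedelta, values: List[float],
                   predicate: Callable[[float], bool]
                   ) -> Iterator[Tuple[datetime, float]]:
    """Scan every value and yield (time, value) for those that match."""
    for i, v in enumerate(values):
        if predicate(v):
            yield start + i * interval, v


temps = [30.5, 32.4, 31.9, 33.0]
hits = list(random_extract(datetime(2019, 8, 9), timedelta(minutes=5),
                           temps, lambda t: t > 32))
```

Note that the result is no longer evenly spaced, so each hit has to carry its own time.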
Temporal Extraction
The easiest form of extraction from a time series is temporal extraction
The user wants a portion of the time series between two dates
This results in a new time series being returned that is bounded by the most constrained limits between the user defined limits and the time series internal limits
Such as requesting an extraction starting prior to the start of the time series itself
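A possible sketch of temporal extraction: the requested limits are first clamped to the series' own limits, then converted to positions, so no search is needed. The function and parameter names are hypothetical, and the limits are treated as inclusive, which is an assumption.

```python
import math
from datetime import datetime, timedelta
from typing import List, Tuple


def temporal_extract(start: datetime, interval: timedelta, values: List[float],
                     lo: datetime, hi: datetime
                     ) -> Tuple[datetime, List[float]]:
    """Return (new_start, values) for readings with lo <= time <= hi."""
    end = start + (len(values) - 1) * interval
    lo, hi = max(lo, start), min(hi, end)        # most constrained limits
    if lo > hi:
        return lo, []                            # no overlap with the series
    first = math.ceil((lo - start) / interval)   # first reading at or after lo
    last = (hi - start) // interval              # last reading at or before hi
    return start + first * interval, values[first:last + 1]


new_start, subset = temporal_extract(
    datetime(2019, 8, 9, 5, 0), timedelta(minutes=5), [1.0, 2.0, 3.0, 4.0, 5.0],
    lo=datetime(2019, 8, 9, 4, 0), hi=datetime(2019, 8, 9, 5, 12))
```

Here the request begins an hour before the series does, so the returned series starts at the series' own start.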
Complete Delivery
The best case use scenario for a time series is complete delivery
Notice this is not an extraction, in that the entire data set is delivered whole
No processing is required beyond integrating physical time series into the virtual record
Enumeration
Once delivered, a user will generally walk through the data in some manner toward a goal
For example, to compute the average of a time series a full forward-scrolling read is performed to sum all values in the time series
This is a complete linear access from start to finish
Linear and Partial Access
Linear Access
Linear or sequential access is the direct reading of the time series in the order of the data
Linear access has no special requirements and is one common access scenario
Partial Access
Only a portion of the data may need to be reviewed
The access will only need to visit a portion of the data points within the time series
Random Access
The user may need to access any point within the time series at any time
The user must be able to move within the time series at will
Random access is the most complex form of access for any data structure, and is commonly required
One common example of random access is for sorting
If a user wanted to sort a time series by temperature rather than by time, the sort uses both linear access to enumerate and random access to read specific items
More significantly, random access allows for access by data field, such as temperature (e.g. get record for temperature = 26)
This form of random access is closely related to random extraction and has significant impacts for performance
Index or Ordinal
Index or ordinal access to a time series is access by time offset or offset into the time series by position (e.g. the 26th data point in the series)
Index access is closely related to random access
Is in fact a mechanism for random access without the performance issues of other forms of random access
In general, index access is the only form of random access without significant performance costs
Still has implications for large volume time series
Storage: Placing the Bytes on Disk
Storage
There are many well-defined storage formats for dealing with the storage and transport of time series data such as:
CDF (Common Data Format)
NetCDF (Network Common Data Format)
There are many databases and applications that have support for time series data such as
Aquarius
Historis
Temporal Analyst
GrADs
Timescape XDB
Hec-DSS
There is a common thread across all time series formats
A time series is a set of data delimited in time by a fixed interval with a fixed start date (our general definition)
In specific implementations, there may be constraints on the data stored in a single time series (the fields) or on the maximum size of the time series when stored (Aquarius for the underlying database)
When planning time series storage, considerations must be made for the collection and use of the data to be stored to ensure adequate capacity and performance
Each type of data to be stored in a time series (the field set) will require a dedicated time series store
For example, a water quality time series cannot store sediment data (there are different fields)
A water/sediment time series may be created that stores both together as a single entity
Storage Mechanisms
A time series may be stored:
In a relational database management system (RDBMS)
In flat files
As XML
The selection of storage location (e.g. flat file or RDBMS) will influence how the data within that location is structured
For example, in an RDBMS, each time series could be stored as:
A dedicated table
A set of rows in a shared table
A single row in a shared table
Field Storage
An important aspect of the time series is the fields within the series
If a time series stores only a single parameter (such as temperature), the time series storage is relatively trivial
If the time series stores a complex data structure, the storage of the time series will be equally complex
Storage Basics
For storage on a computer:
Data must be reduced into bytes that are written to and read from disk
Even in an RDBMS, the same is true
In any programming language or RDBMS, there are a set of specific data types that are well known and can be directly converted between bytes and the data type (such as a 32-bit integer or text string)
Each language and database understands a different way of converting between bytes and data types:
A 32-bit integer written by Java does not necessarily have the same byte pattern as a 32-bit integer written by Visual Basic (for example, the byte order may differ)
The conversion of a data type to bytes is called serialization and the reverse is called deserialization
This is an ongoing issue in computer science and affects all computing applications
As long as there is a single platform performing all operations across the lifecycle, there is no measurable issue
The most consistent format across all platforms is text, which is a powerful indicator of why XML has been so successful, as everything is represented as text in XML
The comparison of data (such as during search) requires the processing software to understand the data stored
Due to this fundamental concept, the storage format used should be aligned with the ultimate patterns of use and limitations of the platforms (for example maximum allowed field lengths in an RDBMS)
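Byte order is one common reason the same 32-bit integer serializes to different bytes on different platforms. The short sketch below uses Python's standard `struct` module to show the effect; the value 1024 is arbitrary, and byte order is only one of several possible differences between platforms.

```python
import struct

value = 1024

big = struct.pack(">i", value)      # big-endian 32-bit signed integer
little = struct.pack("<i", value)   # little-endian 32-bit signed integer

# Same value, different bytes on disk.
print(big.hex())     # 00000400
print(little.hex())  # 00040000

# Deserializing with the wrong byte order silently yields a different number.
wrong = struct.unpack("<i", big)[0]
right = struct.unpack(">i", big)[0]
```

Reading the big-endian bytes as little-endian produces 262144 instead of 1024 with no error raised, which is why the byte format has to be fixed and documented when more than one platform touches the data.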
Storage Considerations
It is critical that storage designers consider:
Volume (size)
Access speed (read and write)
General performance
If most access will enumerate a data set, the selected storage mechanism should favor that form of access
If random access is still needed, then no optimizations should be used for enumerations that make random access unusable
This is always a trade-off and must be evaluated on a case-by-case basis
Time Series Field Concepts
Each time series may have multiple fields of data collected
Each time series may have different fields collected than another time series
Given both of these premises, the design of the data fields within a time series may be of considerable importance
Time series data may be stored in any number of ways using various technologies
In each of these technologies, the time series and the data values are related and may be treated differently based upon the specific technology used
Single Field Time Series
This form of time series has a single value collected at each time interval
This form of time series may be thought of and treated as a basic value stream of discrete values for the single field at the fixed interval of the time series
The field and storage design for this type of time series only needs to deal with the most primitive anomaly:
Missing data values
Within any time series it must be expected that some individual value points may be corrupt and therefore are missing from the series
In any time series that uses IEEE 754 compliant single (32-bit) or double (64-bit) precision floating point numbers, there is a built-in not a number (NaN) value
In this case, no special handling is required for the time series except to expect that NaN values may be present anywhere within the value stream
If a single field time series is storing data in another format, such as integer or string values, accommodations must be made for the absence of value within the value stream
For the design of single field time series data, there are two basic approaches:
Time coupled
Sequential
A time coupled single value series will associate each record within the time series as the (T,V) pair of time (T) and value (V)
This set of pairs becomes the time series
A sequential single value series will provide all records within the time series as a stream of values with only a single time stored indicating the start of the series and a single interval indicating the temporal spacing of the values within the series
In this manner, the time series may be thought of simply as an array of values
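The two approaches can be contrasted with a small sketch. The data and variable names are illustrative; NaN stands in for a missing reading, as described for IEEE 754 floating point values.

```python
import math
from datetime import datetime, timedelta

start = datetime(2019, 8, 9, 5, 0)
interval = timedelta(minutes=5)

# Sequential: one start, one interval, then bare values (NaN marks a gap).
sequential = [15.0, math.nan, 15.3]

# Time coupled: every record carries its own time as a (T, V) pair.
time_coupled = [(start + i * interval, v) for i, v in enumerate(sequential)]

# Consumers of either form must expect NaN anywhere in the value stream.
valid = [v for v in sequential if not math.isnan(v)]
mean = sum(valid) / len(valid)
```

The sequential form stores three numbers plus two pieces of metadata; the time coupled form stores a timestamp for every value, which costs space but keeps each record self-describing.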
Multiple Value Time Series
Each temporal record within the time series has a set of multiple fields
Based on the definition of a time series, all records have exactly the same set of fields within a single time series
Each time series defines its own set of fields, and therefore may result in arbitrarily many time series field sets within an organization's corpus of time series data
The pattern for storing multiple field time series data can take several forms
The most basic form is to treat each field within the time series as a distinct single field time series
This approach isolates each data field as a distinct time series and provides the ability to distribute the storage of each time series to different storage locations
There is however the overhead of additional storage for the time series metadata
Hub and Spoke Model
A basic expansion of the single field time series pattern for multiple fields is to create a hub and spoke or star pattern for the time series
The core time series metadata is recorded as a single entity, with each field modeled as a discrete time series data value stream
Field Interleaved
A time series is stored as a series of value streams
Each value stream is complete for the time series, containing all values for a single field
This model most closely resembles the result of the hub and spoke model, where each parameter is isolated as a series
The total time series has one value stream per field that can be easily enumerated
If values of multiple fields must be accessed together, there is additional overhead for enumerating multiple streams
Field Interleaved Example
Interval Interleaved
The fields are stored in order within each temporal interval
This permits each temporal interval to be the primary unit of separation between each data record
Within a single temporal record, the fields are consecutive in a predefined order
Interval interleaved storage provides rapid enumeration of the time series when all fields are used in the enumeration
If it is most common to enumerate the time series to access only a single parameter, there is overhead in the transport and skipping of unused fields to access the required field
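The difference between interval interleaved and field interleaved layouts can be sketched as two byte layouts of the same records. The two-field record (temperature, pH), the little-endian 64-bit float encoding and all variable names are assumptions made for illustration.

```python
import struct

# Three intervals, two fields per record (illustrative data).
records = [(15.0, 7.1), (15.2, 7.0), (15.1, 7.2)]   # (temperature, pH)

# Interval interleaved: all fields of interval 0, then interval 1, ...
interval_interleaved = b"".join(struct.pack("<2d", t, ph) for t, ph in records)

# Field interleaved: the whole temperature stream, then the whole pH stream.
temps = [t for t, _ in records]
phs = [ph for _, ph in records]
field_interleaved = struct.pack("<3d", *temps) + struct.pack("<3d", *phs)

# Reading one field: contiguous in the field layout, strided in the interval layout.
temps_from_field = list(struct.unpack_from("<3d", field_interleaved, 0))
temps_from_interval = [struct.unpack_from("<d", interval_interleaved, i * 16)[0]
                       for i in range(len(records))]
```

Both layouts occupy the same number of bytes; what changes is whether a single-field read is one contiguous block or a skip over the unused field at every interval.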
Interval Interleaved Example
Coupled Interleaved
For any time series where general enumeration involves specific known groups of fields, a hybrid of field and interval interleaving may be used
Allows for groups of fields to be represented as field interleaved with the remainder of the dataset interval interleaved
Provides fast enumeration for the coupled fields while avoiding the cost of skipping unused fields
If the coupling of fields is not known at design time, this representation is difficult to plan for
Use of this pattern has the overhead of both interleaving methods if enumerating uncoupled fields (e.g. field 1 and field 5 in example)
Coupled Interleaved Example
RDBMS Storage Patterns
RDBMS Storage
Time series data can be stored in a number of ways within an RDBMS
Time series data may be stored as temporal records, one value per record
Likewise, time series data can be compacted into a single field and stored as a binary object (BLOB) or XML
Flat Temporal
In the flat temporal model of storing time series data, there is no notion of a time series
All data is simply stored as temporal records
This is the most simplistic method of storing temporal data overall
Provides good performance for random access
Suffers from poor insert performance (mainly when indexed)
Relatively slow overall sequential access performance due to the table scan nature of retrieval
Flat Time Series
Each time series is registered in a time series table that defines the time series reference information (metadata)
All the actual data for the time series is stored in a values table
Each record in the values table stores a single time series record (point in time)
In most cases, each time series will have a different set of fields and will therefore be best represented by a separate values table
Results in a single master time series table and multiple values tables
Flat Time Series Example
Provides similar performance characteristics to the flat temporal model
Time series table allows for retrieval based upon a specific time series instance
Allows for a long-term time series (such as continual monitoring) to be identified in a single values table
If random access to data is the most common, this model will yield the best overall performance characteristics and allow for query by values with no special software capabilities utilized
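A minimal sketch of the flat time series pattern, using Python's built-in `sqlite3` module as a stand-in RDBMS. The table and column names are hypothetical; a real deployment would choose its own schema, types and indexes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Master table: one row of reference information per time series.
    CREATE TABLE time_series (
        series_id        INTEGER PRIMARY KEY,
        name             TEXT NOT NULL,
        start_time       TEXT NOT NULL,
        interval_seconds INTEGER NOT NULL
    );
    -- Values table for one field set: one row per point in time.
    CREATE TABLE water_quality_values (
        series_id   INTEGER NOT NULL REFERENCES time_series(series_id),
        sample_time TEXT NOT NULL,
        temperature REAL,
        ph          REAL,
        PRIMARY KEY (series_id, sample_time)
    );
""")
conn.execute("INSERT INTO time_series VALUES (1, 'river gauge', '2019-08-09T05:00', 300)")
conn.executemany(
    "INSERT INTO water_quality_values VALUES (1, ?, ?, ?)",
    [("2019-08-09T05:00", 25.0, 7.1),
     ("2019-08-09T05:05", 26.0, 7.0),
     ("2019-08-09T05:10", 25.5, 7.2)],
)
# Query by value using plain SQL, with no special time series software.
rows = conn.execute(
    "SELECT sample_time FROM water_quality_values "
    "WHERE series_id = 1 AND temperature = 26"
).fetchall()
```

A time series with a different field set (sediment, for example) would get its own values table alongside the shared master table.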
Entity Time Series
An entire time series is treated as an entity
Individual data values treated simply as atoms within the entity
Time series is stored as a single record in a database table
Entity time series storage has multiple flavors that each have differences to improve some aspect of the time series storage or performance
Flat BLOB
Flat XML
External File
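The flat BLOB flavor might look like the following sketch, again using `sqlite3` as a stand-in RDBMS. The schema and names are assumptions; note that `array.tobytes()` uses the machine's native byte order, so a cross-platform store would need to fix the byte order explicitly.

```python
import sqlite3
from array import array

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE entity_series (
        series_id        INTEGER PRIMARY KEY,
        start_time       TEXT NOT NULL,
        interval_seconds INTEGER NOT NULL,
        data             BLOB NOT NULL      -- every value of the series
    )
""")

values = array("d", [15.0, 15.1, 15.3, 15.2])   # 64-bit floats
conn.execute("INSERT INTO entity_series VALUES (1, '2019-08-09T05:00', 300, ?)",
             (values.tobytes(),))

# The whole series comes back in one read and is decoded by the application.
(blob,) = conn.execute(
    "SELECT data FROM entity_series WHERE series_id = 1").fetchone()
restored = array("d")
restored.frombytes(blob)
```

The database can no longer query by value here; any search inside the series has to be done by software that understands the encoding.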
Entity Time Series Example
Dynamic Time Series
A further refinement for RDBMS storage of time series data is to dynamically structure the storage rather than use fixed elements as in the previous methodologies
Dynamic time series storage is a broad class of methodologies that attempt to gain advantages in performance and size for managing time series data within an RDBMS
In all dynamic time series storage strategies data within the value fields may be encoded as BLOB or XML data
In dynamic storage, the time series is simply broken into multiple individual records each of which contains multiple data values
Fixed Size Dynamic Storage
Each record has a field target size limit (e.g. 100 KB, 10 MB, etc.) for storing data values
The data value encoding software is responsible for breaking the time series into chunks of data values that do not exceed this size limit
The goal is to encode the most discrete values possible, in time order, that do not exceed this size limit. In this manner, each record will contain values between a min and max time
There are two basic sub-strategies for fixed size time series storage
Time Window
Entity Window
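A sketch of fixed size chunking follows, assuming values are packed as 64-bit floats so that the number of values per record follows directly from the size limit. The function name, record layout and tiny 32-byte limit are chosen purely for illustration.

```python
import struct
from datetime import datetime, timedelta
from typing import List, Tuple

VALUE_SIZE = 8   # bytes per value when packed as a 64-bit float


def chunk_series(start: datetime, interval: timedelta, values: List[float],
                 max_bytes: int) -> List[Tuple[int, datetime, datetime, bytes]]:
    """Split a series into records of (sequence_id, min_time, max_time, blob)."""
    per_record = max_bytes // VALUE_SIZE        # values that fit in one record
    if per_record < 1:
        raise ValueError("size limit too small for a single value")
    records = []
    for seq, first in enumerate(range(0, len(values), per_record)):
        chunk = values[first:first + per_record]
        blob = struct.pack(f"<{len(chunk)}d", *chunk)
        t_min = start + first * interval
        t_max = start + (first + len(chunk) - 1) * interval
        records.append((seq, t_min, t_max, blob))
    return records


records = chunk_series(datetime(2019, 8, 9, 5, 0), timedelta(minutes=5),
                       [float(i) for i in range(10)], max_bytes=32)
```

Each record carries its own min and max time, so a query for a time range can select the relevant records before decoding any values.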
Fixed Size Dynamic Example
In the time window strategy, the time series values table maintains a start date and end date for each time series record that indicate the bounds stored within that record
The entity window strategy is very similar, except that if the time series records are all of fixed size, it is possible to know a priori what the exact maximum number of data values may be stored within a record of the time series
Entity Window Computation
In this strategy, the time series itself indicates the number of values stored within each record, and the offset to any value is computed from its index in the series
Once the computation is completed
recordOffset indicates the sequenceId (zero-based) containing the value
elementOffset indicates which value within the record is to be returned
The need for this computation makes random access possible but slightly computationally costly
For enumeration of data, there is no such overhead cost
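One plausible reading of the computation, given the definitions of recordOffset and elementOffset above, is an integer division and remainder of the value's zero-based index by the number of values per record. The sketch below assumes that interpretation; `index` and `values_per_record` are illustrative names.

```python
from typing import Tuple


def locate(index: int, values_per_record: int) -> Tuple[int, int]:
    """Map a zero-based value index to (recordOffset, elementOffset)."""
    if index < 0 or values_per_record < 1:
        raise ValueError("index must be >= 0 and values_per_record >= 1")
    record_offset, element_offset = divmod(index, values_per_record)
    return record_offset, element_offset


# With 100 values per record, the value at index 250 is element 50 of record 2.
record_offset, element_offset = locate(250, 100)
```

Under this reading, only the one record identified by `record_offset` needs to be fetched and decoded to return a single value.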
Conclusion
Every organization must evaluate its information strategy and time series data needs to ensure adequate planning and effective implementations are used for an effective lifecycle
There are many considerations for each type of time series data that comprises the organization's information corpus
Data modeling and implementation planning is an activity which is critical to ensure the entities are captured in a repeatable, standardized and maintainable manner
Time series data can be reduced to a simple set of concepts and a small set of general patterns of implementation
Each actual time series data set within the organization can use these concepts and patterns to create an effective and efficient implementation for that time series that can be reused over the organization's lifetime
Each time series data type will need to be evaluated separately and the most effective storage patterns used
Questions
There are no silver bullets