Confidential 1
How to get Real-Time Value from your IoT Data
Vincent Poncet, Solution Engineer EMEA
IoT Application Characteristics
© DataStax, All Rights Reserved.
Real-Time • Distributed • Always-On • Contextual • Scalable
Platform for IoT Applications
DataStax is a registered trademark of DataStax, Inc. and its subsidiaries in the United States and/or other countries.
From validation to momentum.
400+ employees
$190M funding
500+ customers
Founded in April 2010
Santa Clara • San Francisco • Austin • London • Paris • Berlin • Tokyo • Sydney
(Series E – Sept. 2014) 30% +
2016 World’s Best 100 Cloud Companies
Ranked #1 in multiple operational database categories
© 2017 DataStax, All Rights Reserved. Company Confidential
GE
THE CHALLENGE: Collect sensor data from millions of
devices from around the world and
manage trillions of transactions per day
• GE offers the first Industrial Cloud Platform, called
GE Predix. DataStax is part of the data services
layer within the platform
• DSE will collect sensor data from millions of
devices from around the world to help GE provide
predictive maintenance to their customers and
increase operational efficiencies
• Predix manages trillions of transactions
per day. DSE was recognized as the only
solution that could support this scale and
data center replication
First Utility
• First Utility offers a disruptive, modern
application called My Energy that gives
customers total transparency to understand &
manage their energy consumption
• Each Smart Meter produces up to 17,000
readings per year
• DSE provides the distributed, responsive &
intelligent foundation to power My Energy at
scale
• As a result, customers use 5-6% less energy
and further reduce their energy bills
THE CHALLENGE: Drive better customer experiences by
giving customers the information they
need to control their energy usage
through Smart Meter technology
Traxens
THE CHALLENGE: Implement a solution for global,
real-time, end-to-end monitoring of
containers door to door and proactive
alerts for issues
• Traxens offers an IoT service, Trax-Hub, for
real-time, end-to-end global monitoring of
containers door to door
• Alerts for open boxes, temperature changes,
etc.
• Granular monitoring of individual
containers: Traxens can store information
on all containers (up to 20,000 in one ship),
and hundreds of attributes per container
• Scalable platform for future needs
What is Apache Cassandra?
Apache Cassandra
©2014 DataStax Confidential. Do not distribute without consent.
• Distributed NoSQL Database
  • Google Bigtable
  • Amazon Dynamo
• Continuous Availability
  • Disaster Avoidance
• Linear Scale Performance
  • Add nodes to scale
• Runs on Commodity Hardware
  • Cloud or on Premise
[Figure: San Francisco, New York, Munich]
Apache Cassandra Disaster Avoidance
[Figure: San Francisco, New York, Munich]
Example Data Model
Sensor collects data
Cassandra stores in sequence
Application reads in sequence
Car Sensor Use Case
Data Model to support queries
Use Case
• Store data per sensor
• Store time series in order: first to last
Needed Queries
• Get all data for one sensor
• Get data for a single date and time
• Get data for a range of dates and times
Data Model
Sensor Id and Time are unique
Store as many as needed
CREATE TABLE car_stats (
    sensor_id text,            -- partition key: one partition per sensor
    collect_time timestamp,    -- clustering column: rows are kept in time order
    temperature text,
    longitude text,
    latitude text,
    speed text,
    PRIMARY KEY (sensor_id, collect_time)
);
INSERT INTO car_stats (sensor_id, collect_time, temperature, longitude, latitude, speed)
VALUES ('1234ABCD', '2015-04-03 07:01:00', '19C', '134.231', '234.234', '60kmh');
INSERT INTO car_stats (sensor_id, collect_time, temperature, longitude, latitude, speed)
VALUES ('1234ABCD', '2015-04-03 07:02:00', '20C', '135.230', '237.239', '65kmh');
INSERT INTO car_stats (sensor_id, collect_time, temperature, longitude, latitude, speed)
VALUES ('1234ABCD', '2015-04-03 07:03:00', '20C', '137.431', '240.793', '68kmh');
INSERT INTO car_stats (sensor_id, collect_time, temperature, longitude, latitude, speed)
VALUES ('1234ABCD', '2015-04-03 07:04:00', '21C', '138.589', '234.234', '69kmh');
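The idea behind this primary key (one partition per sensor, rows ordered by collect time) can be sketched in a few lines of Python. This is only a toy in-memory model, not how Cassandra is implemented; the names table, insert and read_all are hypothetical and exist only for this illustration.

```python
from bisect import bisect_left
from collections import defaultdict

# Toy model of car_stats: one list of rows per partition key (sensor_id),
# kept sorted by the clustering column (collect_time).
# Assumption: timestamps are strings in one fixed format, so string order
# equals time order.
table = defaultdict(list)

def insert(sensor_id, collect_time, temperature, longitude, latitude, speed):
    """Store a row in its partition, in time order.

    A second write with the same (sensor_id, collect_time) overwrites the
    first, mirroring "Sensor Id and Time are unique".
    """
    rows = table[sensor_id]
    times = [row[0] for row in rows]
    position = bisect_left(times, collect_time)
    row = (collect_time, temperature, longitude, latitude, speed)
    if position < len(rows) and rows[position][0] == collect_time:
        rows[position] = row
    else:
        rows.insert(position, row)

def read_all(sensor_id):
    """Return every row of one sensor, first to last."""
    return list(table[sensor_id])

# Rows are written out of order on purpose.
insert("1234ABCD", "2015-04-03 07:03:00", "20C", "137.431", "240.793", "68kmh")
insert("1234ABCD", "2015-04-03 07:01:00", "19C", "134.231", "234.234", "60kmh")
insert("1234ABCD", "2015-04-03 07:04:00", "21C", "138.589", "234.234", "69kmh")
insert("1234ABCD", "2015-04-03 07:02:00", "20C", "135.230", "237.239", "65kmh")

rows = read_all("1234ABCD")
for row in rows:
    print(row)
```

Whatever order the rows are written in, the read returns them from first to last, which is the property the use case asks for.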
Storage Model – Logical View
SELECT sensor_id, collect_time, temperature, longitude, latitude, speed
FROM car_stats
WHERE sensor_id = '1234ABCD';

sensor_id  collect_time         temperature  longitude  latitude  speed
1234ABCD   2015-04-03 07:01:00  19C          134.231    234.234   60kmh
1234ABCD   2015-04-03 07:02:00  20C          135.230    237.239   65kmh
1234ABCD   2015-04-03 07:03:00  20C          137.431    240.793   68kmh
1234ABCD   2015-04-03 07:04:00  21C          138.589    234.234   69kmh
Storage Model – Disk Layout
SELECT sensor_id, collect_time, temperature, longitude, latitude, speed
FROM car_stats
WHERE sensor_id = '1234ABCD';
Merged, Sorted and Stored Sequentially

Partition 1234ABCD:
  2015-04-03 07:01:00   19C  134.231  234.234  60kmh
  2015-04-03 07:02:00   20C  135.230  237.239  65kmh
  2015-04-03 07:03:00   20C  137.431  240.793  68kmh
  2015-04-03 07:04:00   21C  138.589  234.234  69kmh
Query Patterns
Range queries
“Slice” operation on disk
SELECT sensor_id, collect_time, temperature, longitude, latitude, speed
FROM car_stats
WHERE sensor_id = '1234ABCD'
AND collect_time >= '2015-04-03 07:01:00'
AND collect_time <= '2015-04-03 07:04:00';
Single seek on disk

Partition 1234ABCD:
  2015-04-03 07:01:00   19C  134.231  234.234  60kmh
  2015-04-03 07:02:00   20C  135.230  237.239  65kmh
  2015-04-03 07:03:00   20C  137.431  240.793  68kmh
  2015-04-03 07:04:00   21C  138.589  234.234  69kmh
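Because the rows of a partition are already sorted by collect_time, a range query only needs to locate the two ends of the range and return everything in between. A minimal Python sketch of that “slice”, using a plain sorted list as a stand-in for the partition (the function name slice_range is hypothetical, and the rows are trimmed to three fields for brevity):

```python
from bisect import bisect_left, bisect_right

# One partition (sensor 1234ABCD), already sorted by collect_time.
partition = [
    ("2015-04-03 07:01:00", "19C", "60kmh"),
    ("2015-04-03 07:02:00", "20C", "65kmh"),
    ("2015-04-03 07:03:00", "20C", "68kmh"),
    ("2015-04-03 07:04:00", "21C", "69kmh"),
]

def slice_range(rows, start, end):
    """Return rows with start <= collect_time <= end as one contiguous slice."""
    times = [row[0] for row in rows]
    first = bisect_left(times, start)   # first row at or after start
    last = bisect_right(times, end)     # one past the last row at or before end
    return rows[first:last]

result = slice_range(partition, "2015-04-03 07:02:00", "2015-04-03 07:03:00")
for row in result:
    print(row)
```

The two binary searches find the boundaries; the rows between them are adjacent, so no filtering of unrelated rows is needed.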
Range query result, sorted by collect_time

sensor_id  collect_time         temperature  longitude  latitude  speed
1234ABCD   2015-04-03 07:01:00  19C          134.231    234.234   60kmh
1234ABCD   2015-04-03 07:02:00  20C          135.230    237.239   65kmh
1234ABCD   2015-04-03 07:03:00  20C          137.431    240.793   68kmh
1234ABCD   2015-04-03 07:04:00  21C          138.589    234.234   69kmh
Cassandra Data Modeling
• Requires a different mindset than RDBMS modeling
• Know your data and your queries up front
• Queries drive a lot of the modeling decisions (i.e. “table per query” pattern)
• Denormalize/duplicate data at write time to do as few queries as possible come read time
• Remember, storage is cheap and writes in Cassandra are FAST (about 1,000 inserts/second per physical CPU core)
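The “table per query” pattern can be illustrated with a small Python sketch in which dictionaries stand in for tables. The second query (all sensors at a given time) and every name here (readings_by_sensor, readings_by_time, write_reading, history, fleet_at) are hypothetical, added only to show one write feeding two query-specific tables.

```python
from collections import defaultdict

# One "table" per query; each is keyed the way its query needs.
readings_by_sensor = defaultdict(dict)  # sensor_id -> {collect_time: reading}
readings_by_time = defaultdict(dict)    # collect_time -> {sensor_id: reading}

def write_reading(sensor_id, collect_time, speed):
    """Denormalize at write time: store the same reading in both tables."""
    reading = {"sensor_id": sensor_id, "collect_time": collect_time, "speed": speed}
    readings_by_sensor[sensor_id][collect_time] = reading
    readings_by_time[collect_time][sensor_id] = reading

def history(sensor_id):
    """Query 1: all readings of one sensor, in time order."""
    partition = readings_by_sensor[sensor_id]
    return [partition[t] for t in sorted(partition)]

def fleet_at(collect_time):
    """Query 2 (hypothetical): all sensors at one point in time."""
    partition = readings_by_time[collect_time]
    return [partition[s] for s in sorted(partition)]

write_reading("1234ABCD", "2015-04-03 07:01:00", "60kmh")
write_reading("1234ABCD", "2015-04-03 07:02:00", "65kmh")
write_reading("5678EFGH", "2015-04-03 07:01:00", "42kmh")

print(history("1234ABCD"))
print(fleet_at("2015-04-03 07:01:00"))
```

Each read touches a single partition of a single table; the cost is paid once, as an extra write.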
DataStax Enterprise DSE
I’ve ingested my data, now what?
Platform for IoT Applications
DSE – Fully Integrated Technology Stack
[Figure: DSE stack diagram. Labels: Certified Apache Cassandra (No Single Point of Failure | Linear Scalability | Always-On); Real Time Analytics; Batch Analytics; Real Time Search; DSE Graph (Graph Database); In-Memory (Low Latency); DSEFS (File System); Advanced Security; Operational Resiliency; DataStax Studio (Ease of Use); OpsCenter Services (Monitoring, Operations); Analytics Transformations. External systems: Offline Application; External Spark or Hadoop Cluster; RDBMS]
DataStax Enterprise – Certified Cassandra
• Ready and certified for production environments.
• Rigorous certification process:
  • Extensive quality assurance testing.
  • Performance and scale tests with 1,000 node clusters.
  • 3rd party software validation.
• Certified for key supported platforms.
DSE Analytics
• Embedded Spark
• ETL workloads, Real-Time Streaming Analytics, SQL Operational Analytics on Cassandra.
• DSE benefits:
  • Spark Master HA
  • Integrated security
  • Support
DSE Multi-Workload Analytics Architecture
[Figure: HTTP Application, Message Queue, Streaming Analytics, Near Real Time Analytics, Real-time]
DSE Search
• DSE Search inherits the capabilities of Solr and builds on them to provide enterprise search functionality
• Built-in scale-out, continuous availability, and support for multiple data centers
• Automatic indexing when inserting and updating in Cassandra
• Search capabilities integrated into the Cassandra Query Language
  • Multi-criteria
  • Full text
  • Geospatial
  • Faceting
  • Auto-completion
DSE Multi-Workload Analytics Architecture
[Figure: HTTP Application, Message Queue, Streaming Analytics, Near Real Time Analytics, Real-time, Search]
DSE Advanced Replication
• Allows one-way replication from an “edge” cluster to a centralized hub cluster.
• Ideal for retail, energy, and other “edge of the internet of things” use cases.
• Hub and spoke
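The hub-and-spoke data flow can be pictured with a toy Python model. It says nothing about how DSE Advanced Replication is configured or implemented; the Hub and Edge classes, the pending queue, and the method names are all hypothetical, chosen only to show data moving in one direction from several edges to one hub.

```python
class Hub:
    """Central cluster: receives rows from every edge, never sends any back."""

    def __init__(self):
        self.rows = {}

    def receive(self, edge_name, key, value):
        self.rows[(edge_name, key)] = value


class Edge:
    """Edge cluster: serves local writes and forwards them to the hub."""

    def __init__(self, name):
        self.name = name
        self.rows = {}
        self.pending = []  # assumption: writes wait here until forwarded

    def write(self, key, value):
        self.rows[key] = value
        self.pending.append((key, value))

    def replicate_to(self, hub):
        """Forward all pending writes to the hub; return how many were sent."""
        sent = 0
        while self.pending:
            key, value = self.pending.pop(0)
            hub.receive(self.name, key, value)
            sent += 1
        return sent


hub = Hub()
store_a = Edge("store-a")
store_b = Edge("store-b")

store_a.write("sale-1", 19.99)
store_a.write("sale-2", 5.00)
store_b.write("sale-1", 42.00)

sent_a = store_a.replicate_to(hub)
sent_b = store_b.replicate_to(hub)
print(sent_a, sent_b, len(hub.rows))
```

The hub ends up with every edge's rows, while each edge still holds only its own data.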
DSE Tiered Storage
• Able to automatically move data to different storage media based on defined criteria.
• Helps reduce storage costs by relegating lesser-used or older data to less expensive storage devices.
• Works on a granular per-row basis.
DSEFS
• Distributed, masterless file system, API-compatible with HDFS
• Resilient metadata, stored in Cassandra tables
• Cost-effective cold storage of data
  • Staging
  • Archiving
  • Analytics (with Spark)
Storage Temperature Management
• In IoT use cases, the business value of a data record per byte is low
• Being able to optimize the cost of storage according to how the data is used is key
• A tiering / temperature management approach addresses this:
  • Hot: fast storage on SSD for fresh data
  • Warm: cost-effective storage on HDD for older data (in-DB online archive)
  • Cold: cheapest storage on a file system for long-term data (out-DB archive)
    • Can be used with Spark for analytical usages
[Figure: Hot Data: Tiered Storage (SSD) • Warm Data: Tiered Storage (HDD) • Cold Data: DSEFS (HDD)]
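The hot / warm / cold split amounts to a rule that maps the age of a record to a storage tier. A minimal Python sketch of such a rule follows; the 7-day and 90-day thresholds and the function name storage_tier are assumptions made for illustration, not DSE defaults or settings.

```python
from datetime import datetime, timedelta

# Hypothetical thresholds; real values depend on how the data is used.
HOT_MAX_AGE = timedelta(days=7)
WARM_MAX_AGE = timedelta(days=90)

def storage_tier(collect_time, now):
    """Pick a storage tier from the age of a record."""
    age = now - collect_time
    if age <= HOT_MAX_AGE:
        return "hot (SSD)"
    if age <= WARM_MAX_AGE:
        return "warm (HDD)"
    return "cold (DSEFS)"

now = datetime(2017, 6, 1)
for collected in (datetime(2017, 5, 30), datetime(2017, 4, 1), datetime(2016, 1, 1)):
    print(collected.date(), "->", storage_tier(collected, now))
```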
DSE Enterprise Security
• Transparent Data Encryption of ALL DSE data at rest
• Role-based access control
• Unified authentication: allows multiple security authentication protocols (e.g. Kerberos, LDAP, Active Directory, internal Cassandra) to be used on the same database cluster.
• Data Auditing
Build and Manage
The Cassandra Query Language (CQL)
Interact with DataStax Enterprise from your application
• Create objects via DDL (e.g. CREATE…)
• GRANT/REVOKE
• INSERT, UPDATE, DELETE
• Query data with SELECT
Certified DataStax drivers: Java, C#, Python, C++, Node.js, ODBC, PHP, Ruby
Community drivers: Clojure, Erlang, Haskell, Rust
Studio
Explore, query, and analyze DSE
• Visually Create and Navigate Database Objects via CQL
• Gremlin Query Language Support
• Auto-completion, result set visualization, execution management, and much more.
• Friendly Fluent API
OpsCenter
Visual management for DSE
• Automate what no one likes – backups, repairs
• REST API to work in your world
• Instantly manage your cluster, scaling up or down at a moment’s notice
• Monitor your cluster and follow best practices, ensuring a secure environment
DataStax Automatic Management Services
• Designed to automatically handle many maintenance and management tasks.
• Makes DSE easy to work with.
• Services included:
  • Repair service
  • Capacity service
  • Performance service
  • Best Practice service
  • Backup/Restore service
DataStax Expert Support
• 24x7x365
• Production and non-production environments.
• Health checks for assistance on architecture, design, and tuning.
• Certified service packs
• Hot-fix support and back porting of bug fixes
DataStax Managed Cloud
• DSE on AWS with Managed Provisioning
and Scaling by DataStax
• 24x7x365 Coverage,
Lights-Out Management
• System Configuration and Tuning to Meet
Customer Specific Requirements
• Architecture Advisory Services, Guidance
and Best Practices
A Fully Managed, Secure Architecture
© 2017 DataStax, All Rights Reserved. DataStax is a registered trademark of DataStax, Inc. and its subsidiaries in the United States and/or other countries. Apache Cassandra, Apache, Spark, and Cassandra are trademarks of the Apache Software Foundation or its subsidiaries in Canada, the United States and/or other countries.
Thank you
We are the power behind the moment.