How to Get Real-Time Value from Your IoT Data - DataStax

Posted on 22 January 2018


Confidential


Vincent Poncet, Solution Engineer, EMEA

IoT Application Characteristics

© DataStax, All Rights Reserved.

• Real-time
• Distributed
• Always-on
• Contextual
• Scalable

Platform for IoT Applications

DataStax is a registered trademark of DataStax, Inc. and its subsidiaries in the United States and/or other countries.

(Platform diagram; components shown include DSEFS.)

From validation to momentum.

• 400+ employees
• $190M funding (Series E, Sept. 2014)
• 30%+
• 500+ customers
• Founded in April 2010
• Santa Clara • San Francisco • Austin • London • Paris • Berlin • Tokyo • Sydney
• 2016 World’s Best 100 Cloud Companies
• Ranked #1 in multiple operational database categories

© 2017 DataStax, All Rights Reserved. Company Confidential

GE

THE CHALLENGE: Collect sensor data from millions of devices from around the world and manage trillions of transactions per day


• GE offers the first Industrial Cloud Platform, GE Predix. DataStax is part of the data services layer within the platform.

• DSE will collect sensor data from millions of devices around the world to help GE provide predictive maintenance to its customers and increase operational efficiency.

• Predix manages trillions of transactions per day. DSE was recognized as the only solution that could support this scale and data center replication.

First Utility

• First Utility offers a disruptive, modern application called My Energy that gives customers total transparency to understand and manage their energy consumption

• Each smart meter produces up to 17,000 readings per year

• DSE provides the distributed, responsive and intelligent foundation to power My Energy at scale

• As a result, customers use 5-6% less energy and reduce their energy bills

THE CHALLENGE: Drive better customer experiences by giving customers the information they need to control their energy usage through smart meter technology
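For a sense of scale, the "up to 17,000 readings per year" figure can be checked with a back-of-the-envelope calculation. It uses only the number on the slide; the fleet size in the last line is a hypothetical value chosen for illustration, not a First Utility figure.

```python
# Back-of-the-envelope check of the slide's figure: up to 17,000 readings per meter per year.
READINGS_PER_YEAR = 17_000
DAYS_PER_YEAR = 365                          # ignoring leap years
MINUTES_PER_YEAR = DAYS_PER_YEAR * 24 * 60   # 525,600

readings_per_day = READINGS_PER_YEAR / DAYS_PER_YEAR
minutes_between_readings = MINUTES_PER_YEAR / READINGS_PER_YEAR


def yearly_rows(meter_count: int) -> int:
    """Rows written per year by a fleet of meters, at the slide's maximum per-meter rate."""
    return meter_count * READINGS_PER_YEAR


print(f"{readings_per_day:.1f} readings/day, one every {minutes_between_readings:.1f} minutes")
# Hypothetical fleet size, for illustration only (not a First Utility figure):
print(f"{yearly_rows(1_000_000):,} rows/year for 1,000,000 meters")
```

This prints 46.6 readings per day, or one reading roughly every 31 minutes per meter. That rate is close to a half-hourly meter read.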


Traxens

THE CHALLENGE: Implement a solution for global, real-time, end-to-end monitoring of containers door to door and proactive alerts for issues


• Traxens offers an IoT service, Trax-Hub, for real-time, end-to-end global monitoring of containers door to door

• Alerts for open boxes, temperature changes, etc.

• Granular monitoring of individual containers: Traxens can store information on all containers (up to 20,000 in one ship) and hundreds of attributes per container

• Scalable platform for future needs

What is Apache Cassandra?


Apache Cassandra

©2014 DataStax Confidential. Do not distribute without consent.

• Distributed NoSQL database, drawing on:
  • Google Bigtable
  • Amazon Dynamo
• Continuous availability
• Disaster avoidance
• Linear scale performance
  • Add nodes to scale
• Runs on commodity hardware
• Cloud or on-premises
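The "Amazon Dynamo" and "Add nodes to scale" points rest on consistent hashing. Each node owns a range of a token ring, so a new node takes over part of one neighbour's range and every other key stays where it is. The sketch below is deliberately simplified:

- one token per node (real clusters typically use many virtual nodes per machine);
- no replication;
- MD5 instead of the Murmur3 partitioner that Cassandra uses by default.

`TokenRing` and the node names are hypothetical.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a key or node name onto the ring (simplified: first 8 bytes of MD5)."""
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")


class TokenRing:
    """Simplified consistent-hashing ring: one token per node, no replication."""

    def __init__(self, nodes):
        self._ring = []  # sorted list of (token, node name)
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        bisect.insort(self._ring, (token(node), node))

    def owner(self, partition_key: str) -> str:
        # The first node whose token is >= the key's token owns the key,
        # wrapping around to the first node past the end of the ring.
        tokens = [t for t, _ in self._ring]
        i = bisect.bisect_left(tokens, token(partition_key))
        return self._ring[i % len(self._ring)][1]


# Usage: add a fourth node and see which partition keys change owner.
keys = [f"sensor-{i}" for i in range(1000)]
ring = TokenRing(["node-a", "node-b", "node-c"])
before = {k: ring.owner(k) for k in keys}
ring.add_node("node-d")
after = {k: ring.owner(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys moved, all to node-d")
```

Only the keys in the range that node-d takes over change owner. This is why capacity can grow by adding nodes without reshuffling the whole dataset.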

(Diagram: a Cassandra cluster spanning data centers in San Francisco, New York and Munich.)

Apache Cassandra Disaster Avoidance



Example Data Model

Sensor collects data → Cassandra stores in sequence → Application reads in sequence

Car Sensor Use Case

Use case:
• Store data per sensor
• Store the time series in order: first to last

Needed queries:
• Get all data for one sensor
• Get data for a single date and time
• Get data for a range of dates and times

Data model to support the queries:
• Sensor ID and time are unique together
• Store as many readings as needed

```sql
-- sensor_id is the partition key; collect_time is the clustering column,
-- so each sensor's readings are stored together, ordered by time.
CREATE TABLE car_stats (
  sensor_id text,
  collect_time timestamp,
  temperature text,
  longitude text,
  latitude text,
  speed text,
  PRIMARY KEY (sensor_id, collect_time)
);

INSERT INTO car_stats (sensor_id, collect_time, temperature, longitude, latitude, speed)
VALUES ('1234ABCD', '2015-04-03 07:01:00', '19C', '134.231', '234.234', '60kmh');

INSERT INTO car_stats (sensor_id, collect_time, temperature, longitude, latitude, speed)
VALUES ('1234ABCD', '2015-04-03 07:02:00', '20C', '135.230', '237.239', '65kmh');

INSERT INTO car_stats (sensor_id, collect_time, temperature, longitude, latitude, speed)
VALUES ('1234ABCD', '2015-04-03 07:03:00', '20C', '137.431', '240.793', '68kmh');

INSERT INTO car_stats (sensor_id, collect_time, temperature, longitude, latitude, speed)
VALUES ('1234ABCD', '2015-04-03 07:04:00', '21C', '138.589', '234.234', '69kmh');
```

Data Model
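To make the `PRIMARY KEY (sensor_id, collect_time)` semantics concrete, here is a small in-memory model in plain Python. It mimics three behaviours:

- `sensor_id` acts as the partition key;
- `collect_time` acts as the clustering column that keeps rows sorted;
- writing the same (sensor_id, collect_time) pair again overwrites the earlier row, because CQL INSERTs are upserts.

The `CarStats` class is hypothetical and only imitates this behaviour. It is not the DataStax Python driver.

```python
from collections import defaultdict
from datetime import datetime


class CarStats:
    """Hypothetical in-memory stand-in for the car_stats table (not a driver API)."""

    def __init__(self):
        # partition key (sensor_id) -> {clustering key (collect_time) -> remaining columns}
        self._partitions = defaultdict(dict)

    def insert(self, sensor_id, collect_time, temperature, longitude, latitude, speed):
        # Writing an existing (sensor_id, collect_time) replaces the row: an upsert.
        self._partitions[sensor_id][collect_time] = {
            "temperature": temperature,
            "longitude": longitude,
            "latitude": latitude,
            "speed": speed,
        }

    def select_by_sensor(self, sensor_id):
        # Return rows in clustering order. This sketch sorts at read time;
        # Cassandra keeps each partition sorted by collect_time on disk.
        partition = self._partitions.get(sensor_id, {})
        return [(sensor_id, t, partition[t]) for t in sorted(partition)]


def ts(text: str) -> datetime:
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S")


table = CarStats()
# Insert out of order on purpose; reads still come back sorted by time.
table.insert("1234ABCD", ts("2015-04-03 07:02:00"), "20C", "135.230", "237.239", "65kmh")
table.insert("1234ABCD", ts("2015-04-03 07:01:00"), "19C", "134.231", "234.234", "60kmh")
# Same primary key again with a hypothetical corrected speed: overwritten, not duplicated.
table.insert("1234ABCD", ts("2015-04-03 07:01:00"), "19C", "134.231", "234.234", "61kmh")
rows = table.select_by_sensor("1234ABCD")
for row in rows:
    print(row)
```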

```sql
SELECT sensor_id, collect_time, temperature, longitude, latitude, speed
FROM car_stats
WHERE sensor_id = '1234ABCD';
```

| sensor_id | collect_time | temperature | longitude | latitude | speed |
|---|---|---|---|---|---|
| 1234ABCD | 2015-04-03 07:01:00 | 19C | 134.231 | 234.234 | 60kmh |
| 1234ABCD | 2015-04-03 07:02:00 | 20C | 135.230 | 237.239 | 65kmh |
| 1234ABCD | 2015-04-03 07:03:00 | 20C | 137.431 | 240.793 | 68kmh |
| 1234ABCD | 2015-04-03 07:04:00 | 21C | 138.589 | 234.234 | 69kmh |

Storage Model – Logical View


Merged, Sorted and Stored Sequentially

| sensor_id | 2015-04-03 07:01:00 | 2015-04-03 07:02:00 | 2015-04-03 07:03:00 | 2015-04-03 07:04:00 |
|---|---|---|---|---|
| 1234ABCD | 19C, 134.231, 234.234, 60kmh | 20C, 135.230, 237.239, 65kmh | 20C, 137.431, 240.793, 68kmh | 21C, 138.589, 234.234, 69kmh |

Storage Model – Disk Layout

Range queries: a “slice” operation on disk

```sql
SELECT sensor_id, collect_time, temperature, longitude, latitude, speed
FROM car_stats
WHERE sensor_id = '1234ABCD'
  AND collect_time >= '2015-04-03 07:01:00'
  AND collect_time <= '2015-04-03 07:04:00';
```

Single seek on disk

Query Patterns


Because rows within the partition are sorted by collect_time, the range query reads one contiguous slice of the partition.
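Because the partition is already sorted by collect_time, a range query only needs to locate where the range starts and then read a contiguous run of rows. That is the "slice" on the slide. The sketch below imitates this access pattern with binary search over a sorted Python list. It is an illustration only: Cassandra's storage engine uses its own on-disk indexes to find the partition and the start of the slice.

```python
import bisect
from datetime import datetime


def ts(text: str) -> datetime:
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S")


# One partition (sensor 1234ABCD), kept sorted by the clustering column collect_time.
partition = [
    (ts("2015-04-03 07:01:00"), ("19C", "134.231", "234.234", "60kmh")),
    (ts("2015-04-03 07:02:00"), ("20C", "135.230", "237.239", "65kmh")),
    (ts("2015-04-03 07:03:00"), ("20C", "137.431", "240.793", "68kmh")),
    (ts("2015-04-03 07:04:00"), ("21C", "138.589", "234.234", "69kmh")),
]
clustering_keys = [t for t, _ in partition]


def slice_range(start: datetime, end: datetime):
    """Rows with start <= collect_time <= end.

    A binary search finds each bound (the "seek"); the result is a contiguous
    run of the sorted partition (the "slice"), with no per-row filtering.
    """
    lo = bisect.bisect_left(clustering_keys, start)
    hi = bisect.bisect_right(clustering_keys, end)
    return partition[lo:hi]


for collect_time, values in slice_range(ts("2015-04-03 07:02:00"), ts("2015-04-03 07:03:00")):
    print(collect_time, *values)
```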


Cassandra Data Modeling

• Requires a different mindset than RDBMS modeling

• Know your data and your queries up front

• Queries drive many of the modeling decisions (e.g. the “table per query” pattern)

• Denormalize/duplicate data at write time so that reads need as few queries as possible

• Remember, storage is cheap and writes in Cassandra are FAST (about 1,000 inserts/second per physical CPU core)
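Here is a minimal sketch of the "table per query" idea. It assumes a second, hypothetical query: "all readings for a given day, across sensors." Each reading is written twice at write time, which avoids scanning every sensor partition at read time:

- once into a structure keyed like car_stats;
- once into a hypothetical car_stats_by_day keyed by date.

Plain Python dictionaries stand in for the two tables, and the second sensor ID is invented for the example.

```python
from collections import defaultdict
from datetime import date, datetime

# Two hypothetical tables, each modelled as partition key -> {clustering key -> row}.
car_stats = defaultdict(dict)         # partition: sensor_id, clustering: collect_time
car_stats_by_day = defaultdict(dict)  # partition: day, clustering: (collect_time, sensor_id)


def write_reading(sensor_id: str, collect_time: datetime, speed: str) -> None:
    """Denormalize at write time: one logical write goes to every table that serves a query."""
    car_stats[sensor_id][collect_time] = {"collect_time": collect_time, "speed": speed}
    car_stats_by_day[collect_time.date()][(collect_time, sensor_id)] = {
        "sensor_id": sensor_id,
        "collect_time": collect_time,
        "speed": speed,
    }


def readings_for_sensor(sensor_id: str):
    # Query 1 reads exactly one partition of car_stats.
    partition = car_stats.get(sensor_id, {})
    return [partition[k] for k in sorted(partition)]


def readings_for_day(day: date):
    # Query 2 reads exactly one partition of car_stats_by_day: no scan over all sensors.
    partition = car_stats_by_day.get(day, {})
    return [partition[k] for k in sorted(partition)]


write_reading("1234ABCD", datetime(2015, 4, 3, 7, 1), "60kmh")
write_reading("5678EFGH", datetime(2015, 4, 3, 7, 1), "48kmh")  # hypothetical second sensor
write_reading("1234ABCD", datetime(2015, 4, 4, 9, 0), "70kmh")
print([r["sensor_id"] for r in readings_for_day(date(2015, 4, 3))])
```

The data is stored twice, trading cheap storage and fast writes for single-partition reads.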


DataStax Enterprise (DSE)


I’ve ingested my data. Now what?


(Diagram: DataStax Enterprise providing real-time analytics, batch analytics and real-time search, alongside an offline application, an external Spark or Hadoop cluster, and an RDBMS.)

Certified Apache Cassandra: No Single Point of Failure | Linear Scalability | Always-On

DSE – Fully Integrated Technology Stack

(Diagram components: DataStax Studio and OpsCenter services for ease of use, monitoring and operations; in-memory data for low latency; DSE Graph graph database; operational resiliency; DSEFS file system; advanced security; analytics; transformations.)

• Ready and certified for production environments.

• Rigorous certification process:
  • Extensive quality assurance testing.
  • Performance and scale tests with 1,000-node clusters.
  • Third-party software validation.
  • Certified for key supported platforms.


DataStax Enterprise – Certified Cassandra


• Embedded Spark
  • ETL workloads, real-time streaming analytics and SQL operational analytics on Cassandra
• DSE benefits:
  • Spark Master HA
  • Integrated security
  • Support

DSE Analytics
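The sketch below gives a flavour of what "real-time streaming analytics" computes. It keeps an average speed per sensor over one-minute tumbling windows and updates it as each reading arrives. It is plain Python, not the Spark Streaming API. The window size and the already-parsed numeric speeds are assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime


class MinuteAverages:
    """Average speed per sensor over one-minute tumbling windows, updated per reading."""

    def __init__(self):
        self._sums = defaultdict(float)
        self._counts = defaultdict(int)

    def add(self, sensor_id: str, collect_time: datetime, speed_kmh: float) -> None:
        # Tumbling window: truncate the timestamp to the start of its minute.
        window_start = collect_time.replace(second=0, microsecond=0)
        key = (sensor_id, window_start)
        self._sums[key] += speed_kmh
        self._counts[key] += 1

    def averages(self):
        return {key: self._sums[key] / self._counts[key] for key in self._sums}


stream = [  # assumed readings, speed already parsed from "60kmh"-style text to a number
    ("1234ABCD", datetime(2015, 4, 3, 7, 1, 10), 60.0),
    ("1234ABCD", datetime(2015, 4, 3, 7, 1, 40), 64.0),
    ("1234ABCD", datetime(2015, 4, 3, 7, 2, 5), 65.0),
]
agg = MinuteAverages()
for sensor_id, collect_time, speed in stream:
    agg.add(sensor_id, collect_time, speed)
print(agg.averages())
```

Keeping running sums and counts, rather than storing raw readings, lets each new event update the result in constant time.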

(Diagram: HTTP, application and message-queue inputs; streaming analytics in real time; near-real-time analytics.)

DSE Multi-Workload Analytics Architecture


• DSE Search builds on Apache Solr, inheriting its capabilities and adding enterprise search functionality on top

• Built-in scale-out, continuous availability and multi-data-center support

• Automatic indexing as data is inserted or updated in Cassandra

• Search capabilities integrated into the Cassandra Query Language (CQL):
  • Multi-criteria
  • Full text
  • Geospatial
  • Faceting
  • Auto-completion

DSE Search
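Multi-criteria filtering and faceting are easiest to see on a tiny example. First, filter documents on several fields. Then count how many matches fall under each value of another field. The sketch below does both over in-memory dictionaries. The documents and field names are hypothetical, and the sketch shows only the concept, not DSE Search's CQL syntax or Solr's implementation.

```python
from collections import Counter

# Hypothetical indexed documents, one per container.
documents = [
    {"id": 1, "ship": "A", "door": "open", "temp_alert": True},
    {"id": 2, "ship": "A", "door": "closed", "temp_alert": False},
    {"id": 3, "ship": "B", "door": "open", "temp_alert": True},
    {"id": 4, "ship": "B", "door": "open", "temp_alert": False},
]


def search(docs, **criteria):
    """Multi-criteria filter: every field=value pair must match."""
    return [d for d in docs if all(d.get(field) == value for field, value in criteria.items())]


def facet(docs, field):
    """Facet counts: how many matching documents carry each value of `field`."""
    return Counter(d[field] for d in docs)


open_doors = search(documents, door="open")
print([d["id"] for d in search(documents, door="open", temp_alert=True)])  # [1, 3]
print(facet(open_doors, "ship"))
```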

(Diagram: HTTP, application and message-queue inputs; streaming analytics; near-real-time analytics; real-time search.)

DSE Multi-Workload Analytics Architecture

• Allows one-way replication from an “edge” cluster to a separate, centralized hub cluster.

• Ideal for retail, energy, and other “edge of the Internet of Things” use cases.

• Hub-and-spoke topology

DSE Advanced Replication
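The hub-and-spoke, one-way flow can be modelled in a few lines. Each edge cluster buffers its local writes and ships them to the hub, while nothing flows back from the hub to the edges. The `EdgeCluster` and `HubCluster` classes are hypothetical stand-ins that illustrate the topology. They say nothing about how DSE Advanced Replication is configured or implemented.

```python
from collections import deque


class HubCluster:
    def __init__(self):
        self.rows = {}  # (edge name, key) -> value, aggregated from all edges

    def receive(self, edge_name, key, value):
        self.rows[(edge_name, key)] = value


class EdgeCluster:
    def __init__(self, name, hub):
        self.name = name
        self.hub = hub
        self.rows = {}          # local data only
        self._outbox = deque()  # writes not yet replicated to the hub

    def write(self, key, value):
        # Local writes succeed even if the hub is unreachable; they wait in the outbox.
        self.rows[key] = value
        self._outbox.append((key, value))

    def replicate(self):
        """One-way push: drain pending writes to the hub; the hub never writes back."""
        sent = 0
        while self._outbox:
            key, value = self._outbox.popleft()
            self.hub.receive(self.name, key, value)
            sent += 1
        return sent


hub = HubCluster()
store_a = EdgeCluster("store-a", hub)  # hypothetical edge sites
store_b = EdgeCluster("store-b", hub)
store_a.write("meter-1", 42)
store_b.write("meter-7", 17)
store_a.replicate()
store_b.replicate()
print(sorted(hub.rows))
```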


• Able to automatically move data to different storage media based on defined criteria.

• Helps reduce storage costs by relegating lesser-used or older data to less expensive storage devices.

• Works on a granular, per-row basis.

DSE Tiered Storage


DSEFS

• Masterless distributed file system with an HDFS-compatible API

• Resilient metadata, stored in Cassandra tables

• Cost-effective cold storage of data for:
  • Staging
  • Archiving
  • Analytics (with Spark)


Storage Temperature Management

• In IoT use cases, the business value of each data record, per byte, is low

• Optimizing storage cost according to how the data is used is therefore key

• A tiering (temperature management) approach is a relevant response:
  • Hot: fast SSD storage for fresh data
  • Warm: cost-effective HDD storage for older data (in-database online archive)
  • Cold: cheapest storage, on a file system, for long-term data (out-of-database archive)

• Can be used with Spark for analytical purposes

• Hot data: Tiered Storage on SSD
• Warm data: Tiered Storage on HDD
• Cold data: DSEFS on HDD
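The hot/warm/cold policy can be sketched as a per-row decision, assuming age is the only criterion. The thresholds are hypothetical: up to 7 days for hot and up to 90 days for warm. In DSE, tiering is configured in the database itself, not in application code. The function below only illustrates the per-row decision.

```python
from datetime import datetime, timedelta

# Hypothetical age thresholds, for illustration only.
HOT_MAX_AGE = timedelta(days=7)
WARM_MAX_AGE = timedelta(days=90)


def storage_tier(collect_time: datetime, now: datetime) -> str:
    """Choose a tier for one row from its age alone."""
    age = now - collect_time
    if age <= HOT_MAX_AGE:
        return "hot"   # Tiered Storage on SSD
    if age <= WARM_MAX_AGE:
        return "warm"  # Tiered Storage on HDD
    return "cold"      # DSEFS on HDD (out-of-database archive)


now = datetime(2018, 1, 22)
for days in (1, 30, 365):
    print(days, storage_tier(now - timedelta(days=days), now))
```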


• Transparent Data Encryption of all DSE data at rest

• Role-based access control

• Unified authentication: allows multiple authentication protocols (e.g. Kerberos, LDAP, Active Directory, internal Cassandra authentication) to be used on the same database cluster

• Data auditing

DSE Enterprise Security
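Role-based access control means permissions are granted to roles, and roles are granted to users. An authorization check therefore walks from user to roles to permissions. The sketch below models a flat version of this with sets. The role names, users and the `iot` keyspace are hypothetical. It borrows the CQL permission names SELECT and MODIFY to mirror the idea behind GRANT/REVOKE, not DSE's actual implementation.

```python
from collections import defaultdict

role_permissions = defaultdict(set)  # role -> {(permission, resource)}
user_roles = defaultdict(set)        # user -> {role}


def grant(role: str, permission: str, resource: str) -> None:
    role_permissions[role].add((permission, resource))


def revoke(role: str, permission: str, resource: str) -> None:
    role_permissions[role].discard((permission, resource))


def is_authorized(user: str, permission: str, resource: str) -> bool:
    """A user may act if any of their roles holds the permission on the resource."""
    return any((permission, resource) in role_permissions[role] for role in user_roles[user])


# Hypothetical roles, users and keyspace.
grant("analyst", "SELECT", "iot.car_stats")
grant("ingest", "MODIFY", "iot.car_stats")
user_roles["alice"].add("analyst")
user_roles["gateway"].add("ingest")
print(is_authorized("alice", "SELECT", "iot.car_stats"))  # True
print(is_authorized("alice", "MODIFY", "iot.car_stats"))  # False
```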

Build and Manage

Interact with DataStax Enterprise from your application:
• Create objects via DDL (e.g. CREATE…)
• GRANT/REVOKE
• INSERT, UPDATE, DELETE
• Query data with SELECT

Certified DataStax drivers: Java, C#, Python, C++, Node.js, ODBC, PHP, Ruby

Community drivers: Clojure, Erlang, Haskell, Rust

The Cassandra Query Language (CQL)

Explore, query, and analyze DSE
• Visually create and navigate database objects via CQL
• Gremlin query language support
• Auto-completion, result set visualization, execution management, and much more
• Friendly fluent API

Studio

Visual management for DSE
• Automate what no one likes: backups and repairs
• REST API to integrate with your own tools
• Instantly manage your cluster, scaling up or down at a moment’s notice
• Monitor your cluster and follow best practices, ensuring a secure environment

OpsCenter

• Designed to automatically handle many maintenance and management tasks.
• Makes DSE easy to work with.
• Services included:
  • Repair service
  • Capacity service
  • Performance service
  • Best Practice service
  • Backup/Restore service


DataStax Automatic Management Services

• 24x7x365 support
• Production and non-production environments
• Health checks with assistance on architecture, design, and tuning
• Certified service packs
• Hot-fix support and backporting of bug fixes


DataStax Expert Support

DataStax Managed Cloud

• DSE on AWS with managed provisioning and scaling by DataStax
• 24x7x365 coverage, lights-out management
• System configuration and tuning to meet customer-specific requirements
• Architecture advisory services, guidance and best practices

A Fully Managed, Secure Architecture

Apache Cassandra, Apache, Spark, and Cassandra are trademarks of the Apache Software Foundation or its subsidiaries in Canada, the United States and/or other countries.

Thank you


We are the power behind the moment.
