Analyzing petabytes of smart meter data using Cloud Bigtable, Cloud Dataflow, and BigQuery

Edwin Poot & Erik van Wijk, Energyworx

Max Luebbe, Google


Jan 20, 2017


ENERGY TRANSITION IN PROGRESS


● rise of renewable energy sources

● regulation & market demands

● competition & increased costs

● intelligent devices in the home or along the utilities infrastructure (“Internet of Things”)

● two-way flow of information instead of one-way

● increase of consumption

Top 5 industry challenges

1. increasing density brings increasing data quality problems

2. strict regulations for safeguarding user privacy

3. redistribution of economic power and energy demand

4. rising competition between distributed and central generation

5. innovation outpaces regulation

www.energyworx.com

[Map: figures by country/region, in millions]

● China: 435 M

● USA: 132 M

● Japan: 58.7 M

● France: 35 M

● UK: 53 M

● NL: 8 M

● Italy: 32 M

● Ontario: 4.7 M

● British Columbia: 1.2 M

● Quebec: 3.8 M

● Germany: 50 M

conventional utility systems cannot cope with this diversity of data and its endless stream of all types, shapes and sizes:

● smart meters

● smart grid equipment

● sensors

● home automation

● multichannel customer interactions

● consumers’ usage behavior

● weather

● social

● spatial

the key to success is creating a single, centralized view of the data, accessible to many users and for many use cases

“We enable the energy evolution by uncovering and monetizing the hidden value of your data!”

ingest, process, analyze & learn

Enabling data-driven business models for the Energy & Utility industry since 2012

Offices in the Netherlands and the United States

Delivering a revolutionary data management & intelligence cloud service disrupting the global Energy & Utilities market

Pushing out established vendors using pure-play SaaS

Creating actionable information, sparking new business concepts and models

Crunching data without being limited by scale, speed and obsolete pricing models

energyworx and the energy value chain

Value chain: generation, transmission, trading, distribution, supply

Energyworx solutions along the chain: Meter Data Management, Renewable Energy Management, Social Energy, Consumer Engagement, imbalances, settlements, Energy insights for wholesale connections

MARKETS & SOLUTIONS

Markets: ENERGY PROSUMERS & RETAILERS, ENERGY SYSTEM OPERATORS

ENERGY INTELLIGENCE: Demand Response (price), Demand Response (load), Energy Insights, Grid Insights, Renewables, Engagement, Gamification, Benchmarking, Balancing, Congestion, Optimization, Anomalies

ENERGY DATA MANAGEMENT: Meter Data Management, Energy Data Hub

OUR ADVANTAGES

● Always supporting the latest IoT products and/or equipment

● Protocol-agnostic data ingestion and limitless computation capacity

● Cloud machine learning to support new business concepts and models

● Pay-as-you-grow SaaS model, so no large upfront investment


Our platform

PLATFORM EVOLUTION HIGHLIGHTS, 2012 to 2016

Phases: Data ingestion & management, Insights & analysis, Intelligence & IoT control

- batched data, temporal aggregations, VEE (validation, estimation & editing), utility connectivity, API

- multi-tenancy, permissions, custom querying, grouping, tag properties

- datalabs (EDA), machine learning, CloudML, (A)DR

- streaming data, pseudonymisation, tagging, analytics, dynamic profiling, PayPerUse model

- IoT devices, many new adapters, performance, web console, Sheets add-on
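Pseudonymisation appears among the platform highlights, and strict privacy regulation is named as an industry challenge. The slides do not say how Energyworx pseudonymises meter data. One common technique is a keyed hash (HMAC), sketched below in Python under stated assumptions. The function name `pseudonymise`, the 32-character truncation and the example key are all hypothetical.

```python
import hashlib
import hmac


def pseudonymise(meter_id: str, secret_key: bytes, length: int = 32) -> str:
    """Return a stable pseudonym for a meter ID (hypothetical helper).

    HMAC-SHA256 is keyed. Unlike a plain hash, the pseudonym cannot be
    recomputed by someone who can merely guess candidate meter IDs.
    """
    mac = hmac.new(secret_key, meter_id.encode("utf-8"), hashlib.sha256)
    # Truncating the hex digest is an illustrative choice, not a requirement.
    return mac.hexdigest()[:length]


if __name__ == "__main__":
    key = b"example-key-do-not-use"  # hypothetical; load a real key from secure storage
    print(pseudonymise("meter-001", key))
```

The pseudonym is deterministic for a given key, so readings from the same meter still group together in aggregations. Re-identification requires the key, which would be kept outside the analytics environment.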


DELIVER A DATA MANAGEMENT & ANALYTICS SERVICE FOR ENERGY & UTILITY COMPANIES

PUBLIC & PRIVATE CLOUD


Big Data Challenges at Google


Google's mission to "organize the world’s information" presents new challenges.

Big Data technologies invented at Google

[Timeline, 2002 to 2013: GFS, MapReduce, Bigtable, Colossus, Dremel, Flume, Millwheel]


How do we … ?

… build a 100TB+ filesystem?

Need: Google was building enormous data sets and needed an abstracted way to store and access them at scale.

Solution: GFS (replaced by higher-scale Colossus in 2010)

Google Cloud Storage

… build a petabyte database?

Need: Massive data index files took weeks to rebuild. We needed random read/write access.

Solution: Bigtable (internal service launched 2006)

Google Cloud Bigtable

… query a trillion rows in seconds?

Need: Ad hoc queries over massive quantities of data, in just seconds.

Solution: Dremel

Google BigQuery

… build data processing at Google scale?

Need: Process petabytes of static and streaming data, quickly.

Solution: MapReduce, Flume, and Millwheel

Google Cloud Dataflow

Imagine what one can build... when scale is a solved problem.

Google Cloud Platform runs on the same infrastructure Google uses internally:

Cloud Storage, BigQuery, Cloud Dataflow, Cloud Bigtable

Cloud Bigtable is the same service Google uses

[Diagram: Cloud Bigtable and Google's internal Bigtable service]


What is Cloud Bigtable?

NoSQL database for large datasets / large throughput

Supports sequential scans

Auto-adjusts to access patterns
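Sequential scans are efficient in Bigtable because rows are stored sorted by row key, so the row key layout determines which queries are cheap. The sketch below assumes a hypothetical `meter_id#timestamp` key for smart meter readings. The timestamp is zero-padded so that lexicographic order matches time order. An in-memory sorted list stands in for the table, so no Bigtable client API is involved. Putting the meter ID first also follows Bigtable's general advice against keys that start with a timestamp, which would send all new writes to the same part of the key space.

```python
import bisect
from typing import Dict, List, Tuple


def make_row_key(meter_id: str, epoch_seconds: int) -> str:
    """Hypothetical key layout: meter first, then a zero-padded timestamp."""
    # Zero-padding makes lexicographic order equal chronological order.
    return f"{meter_id}#{epoch_seconds:012d}"


class SortedRowStore:
    """In-memory stand-in for a table whose rows are kept sorted by key."""

    def __init__(self) -> None:
        self._keys: List[str] = []
        self._values: Dict[str, float] = {}

    def put(self, key: str, value: float) -> None:
        if key not in self._values:
            bisect.insort(self._keys, key)  # keep keys sorted on insert
        self._values[key] = value

    def scan(self, start: str, end: str) -> List[Tuple[str, float]]:
        """Return rows with start <= key < end, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        return [(k, self._values[k]) for k in self._keys[lo:hi]]


def readings_for_meter(store: SortedRowStore, meter_id: str,
                       start_ts: int, end_ts: int) -> List[Tuple[str, float]]:
    """One meter's readings in [start_ts, end_ts) form one contiguous key range."""
    return store.scan(make_row_key(meter_id, start_ts), make_row_key(meter_id, end_ts))


if __name__ == "__main__":
    store = SortedRowStore()
    for ts, kwh in [(0, 0.25), (900, 0.5), (1800, 0.75)]:
        store.put(make_row_key("meterA", ts), kwh)
    store.put(make_row_key("meterB", 900), 9.9)
    print(readings_for_meter(store, "meterA", 0, 1800))
```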

How does Cloud Bigtable work?

[Diagram: three layers, with clients at the top, Bigtable nodes in the processing layer, and the Colossus filesystem in the storage layer]

Cloud Bigtable learns access patterns...

… and rebalances data accordingly

[Diagram: clients, three processing nodes, and data tablets A to E in the filesystem; the tablets are reassigned among the nodes]
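The slides describe the behavior, not the algorithm. Tablet data stays in the shared Colossus filesystem, so moving a tablet to another node only changes which node serves it. The sketch below is a simplified illustration and not Bigtable's actual balancing logic. It reassigns tablets A to E (the labels from the diagram) to three nodes using a greedy heaviest-first heuristic. The per-tablet load numbers are made up.

```python
import heapq
from typing import Dict, List


def assign_tablets(tablet_load: Dict[str, float], nodes: List[str]) -> Dict[str, List[str]]:
    """Greedy heaviest-first assignment of tablets to the least-loaded node.

    Simplified illustration only; this is not Bigtable's balancing algorithm.
    """
    if not nodes:
        raise ValueError("need at least one node")
    # Min-heap of (current load, tie-breaker index, node name).
    heap = [(0.0, i, node) for i, node in enumerate(nodes)]
    heapq.heapify(heap)
    assignment: Dict[str, List[str]] = {node: [] for node in nodes}
    # Heaviest tablets first; ties broken by name for deterministic output.
    for tablet, load in sorted(tablet_load.items(), key=lambda kv: (-kv[1], kv[0])):
        node_load, idx, node = heapq.heappop(heap)
        assignment[node].append(tablet)
        heapq.heappush(heap, (node_load + load, idx, node))
    return assignment


if __name__ == "__main__":
    # Hypothetical observed load per tablet (requests/s); labels follow the slide.
    loads = {"A": 50, "B": 40, "C": 30, "D": 20, "E": 10}
    print(assign_tablets(loads, ["node1", "node2", "node3"]))
    # -> {'node1': ['A'], 'node2': ['B', 'E'], 'node3': ['C', 'D']}
```

Each node ends up serving a total load of 50 in this example. Bigtable also splits hot tablets so that a single busy key range can be spread over several nodes, which this sketch leaves out.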

Throughput can be controlled by node count

[Charts: QPS vs. number of Bigtable nodes at three scales, with axes running to 8 nodes / 80,000 QPS, 40 nodes / 400,000 QPS, and 400 nodes / 4,000,000 QPS]
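Read together, the three charts suggest that throughput grows roughly linearly with node count, at about 10,000 QPS per node (80,000 QPS at 8 nodes). Treat that ratio as an assumption, because real throughput depends on row size, access pattern and storage type. With that caveat, a first capacity estimate is simple arithmetic. The function name, the 25% headroom default and the 10-million-meter example are hypothetical.

```python
import math

# Assumption: ~10,000 QPS per node, the ratio implied by the chart axes.
QPS_PER_NODE = 10_000


def nodes_needed(target_qps: float, qps_per_node: float = QPS_PER_NODE,
                 headroom: float = 0.25) -> int:
    """Smallest node count whose linear capacity covers the target plus headroom."""
    if target_qps < 0 or qps_per_node <= 0 or headroom < 0:
        raise ValueError("target_qps and headroom must be >= 0, qps_per_node > 0")
    return max(1, math.ceil(target_qps * (1 + headroom) / qps_per_node))


if __name__ == "__main__":
    # Hypothetical fleet: 10 million meters, each sending one reading every 15 minutes.
    writes_per_second = 10_000_000 / (15 * 60)  # about 11,111 writes/s
    print(nodes_needed(writes_per_second))  # -> 2
```

Read traffic from dashboards and batch scans adds to this write load, so the result is a lower bound rather than a sizing recommendation.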

Cloud Bigtable: years of engineering to...

● Teach Bigtable to configure itself

● Isolate performance from “noisy neighbors”

● React automatically to new patterns, splitting and balancing

Google has had an internal cloud for over a decade

The same engineering that has made our internal services better makes our Cloud better:

● Simpler control planes

● Multi-tenancy

● Adapts to large, new patterns


Why we chose Google

Why did we choose Google?

● Fastest with consistent performance

● Competitive and transparent pricing

● Autoscale to millions of users (and back)

● Unlimited flexible storage and caching

● Big Data & Machine Learning capabilities

● Development SDK & tools

● 24/7 access to expert support resources

5 things we’ve learned along the way

1. SKILLS, KNOWLEDGE & TRAINING REQUIRED: understand all PaaS possibilities and components to prevent reinventing what already exists and to speed up implementation & migration

2. IMPLEMENTATION TIME: shorter release cycles require smaller feature sets per release, so adapt your software development & release management method

3. CODE ABSTRACTION USING APIS: to be cloud agnostic, you need a code abstraction layer for each PaaS service you use

4. PAAS SANDBOX: design and modify your software architecture to fit the PaaS sandbox

5. IMPACT ON BUSINESS MODEL: adapt your business model to the PaaS cost model
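Lesson 3 calls for a code abstraction layer per PaaS service. The Python below is a minimal sketch of that idea, built around a hypothetical `TimeSeriesStore` interface. Business logic depends only on the interface, and each backend lives behind it. An in-memory backend is shown for tests; a Bigtable-backed adapter would implement the same two methods. The class and method names are illustrative and are not Energyworx's code.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Tuple


class TimeSeriesStore(ABC):
    """Hypothetical vendor-neutral interface the application codes against."""

    @abstractmethod
    def write(self, series_id: str, timestamp: int, value: float) -> None:
        """Store one value at an epoch-seconds timestamp."""

    @abstractmethod
    def read_range(self, series_id: str, start: int, end: int) -> List[Tuple[int, float]]:
        """Return (timestamp, value) pairs with start <= timestamp < end, in time order."""


class InMemoryTimeSeriesStore(TimeSeriesStore):
    """Local/test backend; a cloud-backed adapter would implement the same methods."""

    def __init__(self) -> None:
        self._data: Dict[str, Dict[int, float]] = {}

    def write(self, series_id: str, timestamp: int, value: float) -> None:
        self._data.setdefault(series_id, {})[timestamp] = value

    def read_range(self, series_id: str, start: int, end: int) -> List[Tuple[int, float]]:
        points = self._data.get(series_id, {})
        return sorted((t, v) for t, v in points.items() if start <= t < end)


def daily_total(store: TimeSeriesStore, series_id: str, day_start: int) -> float:
    """Business logic that depends only on the interface, not on a vendor SDK."""
    return sum(v for _, v in store.read_range(series_id, day_start, day_start + 86_400))


if __name__ == "__main__":
    store = InMemoryTimeSeriesStore()
    for ts, kwh in [(0, 1.5), (900, 2.0), (86_400, 5.0)]:
        store.write("meter-001", ts, kwh)
    print(daily_total(store, "meter-001", 0))  # -> 3.5
```

Swapping backends then means writing one new adapter class rather than touching every caller. The cost is that the interface can expose only features that all intended backends share.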


Our service architecture

INGEST (App Engine, Cloud Pub/Sub): API, events, devices

PROCESS (App Engine, Dataflow, Dataproc): validate, aggregate, calculate

STORE (Cloud Storage, Datastore, Bigtable, BigQuery, Cloud SQL): time series, metadata, tags

ANALYZE (CloudML, Datalab, BigQuery): insights, predict, decide
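The PROCESS stage names three steps: validate, aggregate and calculate. The sketch below chains them for interval readings in plain Python. Several details are assumptions chosen for illustration: the validation rules (reject negative values and values above a hypothetical 10 kWh cap), the 15-minute input interval and the peak-hour calculation. Real VEE rule sets are utility-specific. On the platform, this logic would run inside the processing services shown rather than in a single script.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Reading = Tuple[int, float]  # (interval start in epoch seconds, kWh in that interval)


def validate(readings: List[Reading], max_kwh: float = 10.0) -> Tuple[List[Reading], List[Reading]]:
    """Validate: split readings into (valid, rejected) using two example rules."""
    valid: List[Reading] = []
    rejected: List[Reading] = []
    for ts, kwh in readings:
        # Hypothetical rules: interval energy can't be negative or exceed the cap.
        if kwh < 0 or kwh > max_kwh:
            rejected.append((ts, kwh))
        else:
            valid.append((ts, kwh))
    return valid, rejected


def aggregate_hourly(readings: List[Reading]) -> Dict[int, float]:
    """Aggregate: sum interval energy into buckets keyed by the hour's start time."""
    buckets: Dict[int, float] = defaultdict(float)
    for ts, kwh in readings:
        buckets[ts - ts % 3600] += kwh
    return dict(buckets)


def peak_hour(hourly: Dict[int, float]) -> Tuple[int, float]:
    """Calculate: one example derived value, the hour with the highest consumption."""
    return max(hourly.items(), key=lambda kv: kv[1])


if __name__ == "__main__":
    raw = [(0, 0.25), (900, 0.25), (1800, 0.5), (2700, -1.0), (3600, 0.5), (4500, 99.0)]
    valid, rejected = validate(raw)
    hourly = aggregate_hourly(valid)
    print(rejected)           # -> [(2700, -1.0), (4500, 99.0)]
    print(hourly)             # -> {0: 1.0, 3600: 0.5}
    print(peak_hour(hourly))  # -> (0, 1.0)
```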

Data Ingestion Process

IoT Equipment → Cloud Pub/Sub → Dataflow → Bigtable & BigQuery
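The slides show the flow but not the message format or the pipeline code. As a hedged sketch, the per-message transform such a pipeline needs might look like the Python below. It decodes a JSON message (hypothetical schema with `meter_id`, `timestamp`, `kwh`), derives a Bigtable-style row key and cell, and builds a flat BigQuery-style row. Malformed messages go to a dead-letter list instead of failing the pipeline. Plain dicts and lists stand in for Pub/Sub, Bigtable and BigQuery, so no Dataflow or client-library APIs are used. The `readings:kwh` column name (column family `readings`) is an assumption.

```python
import json
from typing import Any, Dict, List, Tuple


def to_rows(message: bytes) -> Tuple[str, Dict[str, bytes], Dict[str, Any]]:
    """Turn one JSON meter message (hypothetical schema) into a Bigtable-style
    row key plus cells, and a flat BigQuery-style row."""
    event = json.loads(message.decode("utf-8"))
    meter_id = str(event["meter_id"])
    ts = int(event["timestamp"])
    kwh = float(event["kwh"])
    row_key = f"{meter_id}#{ts:012d}"                   # meter ID first, zero-padded time second
    cells = {"readings:kwh": str(kwh).encode("utf-8")}  # family:qualifier -> bytes
    bq_row = {"meter_id": meter_id, "timestamp": ts, "kwh": kwh}
    return row_key, cells, bq_row


def fan_out(messages: List[bytes],
            bigtable_sink: Dict[str, Dict[str, bytes]],
            bigquery_sink: List[Dict[str, Any]],
            dead_letter: List[Tuple[bytes, str]]) -> None:
    """Write each valid message to both stores; park malformed ones for inspection."""
    for msg in messages:
        try:
            row_key, cells, bq_row = to_rows(msg)
        except (ValueError, KeyError, TypeError) as err:
            # json.JSONDecodeError and UnicodeDecodeError are ValueError subclasses.
            dead_letter.append((msg, repr(err)))
            continue
        bigtable_sink[row_key] = cells
        bigquery_sink.append(bq_row)


if __name__ == "__main__":
    bt: Dict[str, Dict[str, bytes]] = {}
    bq: List[Dict[str, Any]] = []
    dlq: List[Tuple[bytes, str]] = []
    fan_out([b'{"meter_id": "m1", "timestamp": 900, "kwh": 0.5}', b"not json"], bt, bq, dlq)
    print(bt)        # -> {'m1#000000000900': {'readings:kwh': b'0.5'}}
    print(len(dlq))  # -> 1
```

Writing each event to both stores matches the split in the diagram. Bigtable serves random reads and writes by meter and time range, while BigQuery serves ad hoc analytical queries over the full history.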


Use cases

“Creating actionable insights - sparking new business concepts and models. Crunching data without being limited by scale, speed and obsolete pricing models.”


Uncovering hidden value from data

Exploratory Data Analysis with Energyworx

Uncover hidden value from your data!

• Classification

• Clustering

• Regression

• Anomaly detection

• Prediction/forecasting

• Motif discovery

• Association rules

Features:

- part of Energyworx SaaS

- autoscaling with demand

- notebook development environment

- private & public models

- Energyworx shared models
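Anomaly detection is one of the listed techniques. The slides do not say which methods the platform uses. One simple, robust option is sketched below. It flags readings whose modified z-score exceeds 3.5, a commonly used cutoff. The modified z-score is 0.6745 times the distance from the median, divided by the median absolute deviation (MAD). The function name and the fallback used when the MAD is zero are illustrative choices.

```python
import statistics
from typing import List, Sequence


def flag_anomalies(values: Sequence[float], threshold: float = 3.5) -> List[int]:
    """Return indices whose modified z-score exceeds the threshold.

    Modified z-score = 0.6745 * |x - median| / MAD, where MAD is the median
    absolute deviation. Median-based statistics are not dragged around by
    the outliers they are trying to find, unlike mean and standard deviation.
    """
    if not values:
        return []
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        # At least half of the values equal the median; flag anything that differs.
        return [i for i, v in enumerate(values) if v != med]
    return [i for i, v in enumerate(values) if 0.6745 * abs(v - med) / mad > threshold]


if __name__ == "__main__":
    hourly_kwh = [1.0, 1.1, 0.9, 1.0, 1.2, 0.8, 9.0, 1.0]  # made-up readings
    print(flag_anomalies(hourly_kwh))  # -> [6]
```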


Demo: Clustering time series data from Smart Meters
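The demo itself is not reproduced in the slides. A toy version of the idea uses plain k-means on daily load profiles, which this sketch assumes are fixed-length lists of interval readings. Each profile is first normalised by its daily total, so clusters reflect the shape of consumption (morning vs. evening peaks) rather than its volume. The algorithm then alternates assignment and centroid updates. Initialising from the first k profiles keeps the sketch deterministic but is weaker than k-means++. The 4-interval profiles in the usage example are made up.

```python
from typing import List, Sequence

Profile = List[float]


def normalise(profile: Sequence[float]) -> Profile:
    """Scale a profile so it sums to 1, keeping its shape but not its volume."""
    total = sum(profile)
    return [v / total for v in profile] if total else [0.0 for _ in profile]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(profiles: List[Profile], k: int, iterations: int = 20) -> List[int]:
    """Plain k-means returning a cluster label per profile."""
    if not 0 < k <= len(profiles):
        raise ValueError("k must be between 1 and the number of profiles")
    centroids = [list(p) for p in profiles[:k]]  # deterministic init (not k-means++)
    labels = [0] * len(profiles)
    for _ in range(iterations):
        # Assignment step: each profile joins its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in profiles]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


if __name__ == "__main__":
    # Made-up 4-interval "days": two morning-peak and two evening-peak households.
    raw = [[4, 1, 1, 1], [1, 1, 1, 4], [5, 1, 1, 1], [1, 1, 2, 5]]
    print(kmeans([normalise(p) for p in raw], k=2))  # -> [0, 1, 0, 1]
```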


Q & A


Thank you!