IoT @ Google Scale
James Chittenden, Google Cloud Platform Solutions Engineer
[email protected]

IoT at Google Scale

Jan 15, 2017

Page 1: IoT at Google Scale

IoT @ Google Scale
James Chittenden, Google Cloud Platform Solutions Engineer
[email protected]

Page 2: IoT at Google Scale

+James Chittenden (Big Data Cloud Engineer)

[email protected]

Page 3: IoT at Google Scale

Big Data at Google, a.k.a. Data at Google

Page 4: IoT at Google Scale

Manage the Entire Lifecycle of Big Data

Capture: Cloud Logs, Google App Engine, Google Analytics Premium, Cloud Pub/Sub

Process: Cloud Dataflow (stream), Cloud Dataflow (batch)

Store: BigQuery Storage (tables), Cloud Bigtable (NoSQL), Cloud Storage (files), Cloud Datastore

Analyze: BigQuery Analytics (SQL), Cloud Monitoring (real-time analytics and alerts)

Page 5: IoT at Google Scale

End to End View of the GCP IoT Architecture

Page 6: IoT at Google Scale

Device to Device Protocols

● Device Discovery

● Device to Device authentication

● Device Configuration

● Protocol Routing

Page 7: IoT at Google Scale

Machine Learning: Pattern Detection and Prediction

● Subscribers scan real time streams and feed data into the Machine Learning Recognition algorithm

● Dataflow orchestrates streaming algorithms that compare data streams against the Experience Database

● Correlators detect known patterns and publish alerts using Cloud Pub/Sub
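
The correlator step above can be pictured with a small in-memory sketch. It is only an illustration under loud assumptions: the class name CorrelatorSketch, the event labels, and the idea that a "known pattern" is an exact sequence of event labels are all hypothetical, and a real deployment would publish the alert to Cloud Pub/Sub rather than return it.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: matches the most recent events against known sequences.
final class CorrelatorSketch {
    private final Map<String, List<String>> knownPatterns; // pattern name -> event sequence
    private final Deque<String> recent = new ArrayDeque<>(); // sliding buffer of latest events
    private final int maxLen;                                // longest pattern length

    CorrelatorSketch(Map<String, List<String>> knownPatterns) {
        this.knownPatterns = knownPatterns;
        this.maxLen = knownPatterns.values().stream().mapToInt(List::size).max().orElse(0);
    }

    /** Feeds one event and returns the names of patterns completed by it. */
    List<String> onEvent(String event) {
        recent.addLast(event);
        if (recent.size() > maxLen) {
            recent.removeFirst(); // keep only as many events as the longest pattern needs
        }
        List<String> window = new ArrayList<>(recent);
        List<String> alerts = new ArrayList<>();
        knownPatterns.forEach((name, pattern) -> {
            int n = pattern.size();
            if (n > 0 && n <= window.size()
                    && window.subList(window.size() - n, window.size()).equals(pattern)) {
                alerts.add(name); // stand-in for publishing an alert message
            }
        });
        return alerts;
    }

    public static void main(String[] args) {
        CorrelatorSketch c = new CorrelatorSketch(
                Map.of("overheat", List.of("temp_high", "temp_high", "fan_stop")));
        for (String e : List.of("temp_high", "temp_high", "fan_stop", "temp_ok")) {
            System.out.println(e + " -> " + c.onEvent(e));
        }
    }
}
```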

Page 8: IoT at Google Scale

Cloud Storage Archival and Retrieval

● Data is periodically unloaded from Bigtable and stored in Cloud Storage for archival

● Data in Cloud Storage can be quickly reloaded into Bigtable should it need to be reprocessed

Page 9: IoT at Google Scale

Cloud Pub/Sub

Real-time and reliable messaging with Pub/Sub

Page 10: IoT at Google Scale

Messaging is a shock-absorber

Images by Connie Zhou

Availability

• Buffer new requests during outages

• Prevent overloads that cause outages

• Redirect requests to recover from outages

Throughput

• Smooth out spikes in new request rate

• Balance load across multiple workers

• Balance arrival rate with service rate

Latency

• Accept requests closer to the network edge

• Optimize message flow across regions

• Leverage shared efforts to improve protocols
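
The buffering and spike-smoothing points above can be sketched with a bounded in-memory queue. This is a minimal illustration, not how Pub/Sub is implemented: the class ShockAbsorberSketch and its methods offer, drain, and backlog are hypothetical names chosen for this example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical sketch: a bounded buffer between request arrival and service.
final class ShockAbsorberSketch {
    private final Queue<String> buffer = new ArrayDeque<>();
    private final int capacity;

    ShockAbsorberSketch(int capacity) {
        this.capacity = capacity;
    }

    /** Accepts a request unless the buffer is full; false signals back-pressure. */
    boolean offer(String request) {
        if (buffer.size() >= capacity) {
            return false; // refuse instead of overloading the workers
        }
        buffer.add(request);
        return true;
    }

    /** A worker takes at most maxBatch requests, i.e. works at its own service rate. */
    List<String> drain(int maxBatch) {
        List<String> batch = new ArrayList<>();
        while (batch.size() < maxBatch && !buffer.isEmpty()) {
            batch.add(buffer.poll());
        }
        return batch;
    }

    int backlog() {
        return buffer.size();
    }

    public static void main(String[] args) {
        ShockAbsorberSketch q = new ShockAbsorberSketch(100);
        for (int i = 0; i < 10; i++) {
            q.offer("req-" + i); // a spike of ten requests arrives at once
        }
        while (q.backlog() > 0) {
            System.out.println("processing " + q.drain(3)); // served three at a time
        }
    }
}
```

The arrival spike is absorbed by the buffer, while the worker keeps a steady batch size.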

Page 11: IoT at Google Scale

Pub/Sub is a change-absorber

Images by Connie Zhou

Sources

• New data sources can plug into old data flows

• New data sources can use new schemas

• Common security policies for all sources

Sinks

• Data can be sent to new destinations

• Push and Pull delivery are both available

• Spans organizational boundaries

Transforms

• Select subsets of messages that matter

• Helps manage schema and version changes

• Can merge streams into new topics
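
A rough way to picture "new destinations" and "select subsets of messages" is a topic that fans out to independent subscriptions, each with its own filter. The sketch below is an in-memory illustration only; TopicFanOutSketch, subscribe, publish, and received are hypothetical names and do not correspond to the Cloud Pub/Sub client API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical sketch: one topic, many subscriptions, each selecting its own subset.
final class TopicFanOutSketch {
    private final Map<String, Predicate<String>> filters = new LinkedHashMap<>();
    private final Map<String, List<String>> inboxes = new LinkedHashMap<>();

    /** Adds a destination without touching publishers or existing subscriptions. */
    void subscribe(String name, Predicate<String> filter) {
        filters.put(name, filter);
        inboxes.put(name, new ArrayList<>());
    }

    /** Delivers the message to every subscription whose filter accepts it. */
    void publish(String message) {
        filters.forEach((name, filter) -> {
            if (filter.test(message)) {
                inboxes.get(name).add(message);
            }
        });
    }

    List<String> received(String name) {
        return inboxes.get(name);
    }

    public static void main(String[] args) {
        TopicFanOutSketch topic = new TopicFanOutSketch();
        topic.subscribe("archive", m -> true);                  // keeps everything
        topic.subscribe("alerts", m -> m.startsWith("ALERT"));  // keeps a subset
        topic.publish("temp=20");
        topic.publish("ALERT temp=95");
        System.out.println("archive: " + topic.received("archive"));
        System.out.println("alerts:  " + topic.received("alerts"));
    }
}
```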

Page 12: IoT at Google Scale

Pub/Sub at Google

Push Notifications: Every time your Gmail inbox pops up a new message, it’s because of a push notification to your browser or mobile device.

Ads & Budgets: One of the most important real-time information streams in the company is advertising revenue; we use Pub/Sub to broadcast budgets to our entire fleet of search engines.

Chat & Mobile: Google Cloud Messaging for Android delivers billions of messages a day, reliably and securely, for Google’s own mobile apps and the entire developer community.

Instant Search: Updating search results as you type is a feat of real-time indexing that depends on Pub/Sub to update caches with breaking news.

Page 13: IoT at Google Scale

Pub/Sub System (diagram): a Publisher sends to a Topic, which feeds several Subscriptions. Delivery paths shown: Webhook Delivery to an HTTP Server Subscriber, HTTP Push Delivery to Google App Engine, a Pull Subscriber, and Google RPC Delivery to Cloud Dataflow. On-Prem/Cloud, Any Environment.

Page 14: IoT at Google Scale

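Push Subscription: the Pub/Sub System sends a Msg to the Subscriber, and the Subscriber returns an Ack.

Pull Subscription: the Subscriber sends an RPC, the Pub/Sub System returns a Msg in the RPC return, and the Subscriber then sends an Ack.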
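
The pull-and-ack exchange can be illustrated with a small in-memory model. Everything here is an assumption made for illustration: PullSubscriptionSketch and its methods are hypothetical, and the expire method merely stands in for an acknowledgement deadline passing so that an unacked message becomes deliverable again.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of pull delivery with explicit acknowledgement.
final class PullSubscriptionSketch {
    private final Deque<String> deliverable = new ArrayDeque<>();   // ack IDs ready to hand out
    private final Map<String, String> outstanding = new HashMap<>(); // ack ID -> payload, until acked
    private int nextId = 0;

    void publish(String message) {
        String ackId = "ack-" + nextId++;
        outstanding.put(ackId, message);
        deliverable.addLast(ackId);
    }

    /** Pull request: returns the ack ID of the next message, or null if none is ready. */
    String pull() {
        return deliverable.pollFirst();
    }

    String payload(String ackId) {
        return outstanding.get(ackId);
    }

    /** Ack: the message is done and will never be redelivered. */
    void ack(String ackId) {
        outstanding.remove(ackId);
    }

    /** Simulated ack deadline: a message that was not acked is queued for redelivery. */
    void expire(String ackId) {
        if (outstanding.containsKey(ackId)) {
            deliverable.addLast(ackId);
        }
    }

    int unacked() {
        return outstanding.size();
    }

    public static void main(String[] args) {
        PullSubscriptionSketch sub = new PullSubscriptionSketch();
        sub.publish("temp=21");
        String id = sub.pull();
        System.out.println("pulled " + sub.payload(id));
        sub.expire(id);              // subscriber crashed before acking
        String again = sub.pull();   // same message is delivered again
        sub.ack(again);
        System.out.println("unacked after ack: " + sub.unacked());
    }
}
```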

Page 15: IoT at Google Scale

“We don’t really run MapReduce at Google anymore” - Urs Hoelzle

Google Dataflow

Page 16: IoT at Google Scale

Google Technologies (timeline, 2002 to 2014+): GFS, MapReduce, Bigtable, Dremel, Colossus, FlumeJava, Spanner, MillWheel, and more.

Page 17: IoT at Google Scale

Dataflow Goodies

● Autoscaling mid-job

● Fully managed - No-Ops

● Intuitive Data Processing Framework

● Batch and Stream Processing in one

● Liquid sharding mid-job

Page 18: IoT at Google Scale

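Pipeline p = Pipeline.create();

p.begin()
    .apply(TextIO.Read.from("gs://…"))        // read input lines from Cloud Storage
    .apply(ParDo.of(new ExtractTags()))       // ExtractTags: user-defined step, not shown on the slide
    .apply(Count.create())                    // count occurrences of each tag
    .apply(ParDo.of(new ExpandPrefixes()))    // ExpandPrefixes: user-defined step, not shown on the slide
    .apply(Top.largestPerKey(3))              // keep the three largest entries per key
    .apply(TextIO.Write.to("gs://…"));        // write results back to Cloud Storage

p.run();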
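
To make the shape of that pipeline concrete, here is a plain-JDK sketch of what such a chain might compute on a small in-memory input. It does not use the Dataflow SDK, and it rests on guesses: the slide does not show ExtractTags or ExpandPrefixes, so this sketch assumes they extract "#hashtag" words and emit every prefix of each tag. TopTagsSketch and topTagsPerPrefix are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: top three hashtags for every prefix, computed in memory.
final class TopTagsSketch {
    static Map<String, List<String>> topTagsPerPrefix(List<String> lines) {
        // Assumed ExtractTags + Count: count each "#tag" word, case-insensitively.
        Map<String, Integer> counts = new HashMap<>();
        for (String line : lines) {
            for (String word : line.split("\\s+")) {
                if (word.length() > 1 && word.startsWith("#")) {
                    counts.merge(word.substring(1).toLowerCase(), 1, Integer::sum);
                }
            }
        }
        // Assumed ExpandPrefixes: file each tag under every one of its prefixes.
        Map<String, List<String>> byPrefix = new TreeMap<>();
        for (String tag : counts.keySet()) {
            for (int i = 1; i <= tag.length(); i++) {
                byPrefix.computeIfAbsent(tag.substring(0, i), k -> new ArrayList<>()).add(tag);
            }
        }
        // Top three per key: highest count first, ties broken alphabetically.
        Comparator<String> byCountDesc = (a, b) -> {
            int c = Integer.compare(counts.get(b), counts.get(a));
            return c != 0 ? c : a.compareTo(b);
        };
        for (Map.Entry<String, List<String>> e : byPrefix.entrySet()) {
            List<String> tags = e.getValue();
            tags.sort(byCountDesc);
            e.setValue(new ArrayList<>(tags.subList(0, Math.min(3, tags.size()))));
        }
        return byPrefix;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("#iot rocks #cloud", "#iot #io #cloud", "#iot #ion #inbox");
        topTagsPerPrefix(lines).forEach((prefix, tags) -> System.out.println(prefix + " -> " + tags));
    }
}
```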

Dataflow Goodies

Page 19: IoT at Google Scale

Deploy

Schedule & Monitor

Dataflow Goodies

Page 20: IoT at Google Scale

Autoscaling figure labels: 800 RPS, 1200 RPS, 5000 RPS, 50 RPS

Dataflow Goodies

Page 21: IoT at Google Scale

Page 22: IoT at Google Scale

Batch:

Pipeline p = Pipeline.create();

p.begin()
    .apply(TextIO.Read.from("gs://…"))
    .apply(ParDo.of(new ExtractTags()))
    .apply(Count.create())
    .apply(ParDo.of(new ExpandPrefixes()))
    .apply(Top.largestPerKey(3))
    .apply(TextIO.Write.to("gs://…"));

p.run();

Streaming variant (Pub/Sub source and sink, 5-minute fixed windows):

    .apply(PubsubIO.Read.from("input_topic"))
    .apply(Window.<Integer>by(FixedWindows.of(5, MINUTES)))
    .apply(PubsubIO.Write.to("output_topic"));
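
Fixed windowing can be illustrated without the SDK: each element's timestamp is mapped to the start of the 5-minute interval that contains it, and aggregation then happens per interval. The sketch below is a plain-JDK approximation of that idea, not the Dataflow implementation; FixedWindowSketch, windowStart, and sumPerWindow are hypothetical names, and aligning windows to the Unix epoch is an assumption of this example.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: assign timestamps to fixed windows and sum values per window.
final class FixedWindowSketch {
    /** Start of the fixed window (aligned to the epoch) that contains the timestamp. */
    static Instant windowStart(Instant timestamp, Duration size) {
        long sizeMs = size.toMillis();
        long ts = timestamp.toEpochMilli();
        return Instant.ofEpochMilli(ts - Math.floorMod(ts, sizeMs));
    }

    /** Groups readings by window start and sums the values in each window. */
    static Map<Instant, Integer> sumPerWindow(Map<Instant, Integer> readings, Duration size) {
        Map<Instant, Integer> sums = new TreeMap<>();
        readings.forEach((ts, value) -> sums.merge(windowStart(ts, size), value, Integer::sum));
        return sums;
    }

    public static void main(String[] args) {
        Map<Instant, Integer> readings = new LinkedHashMap<>();
        readings.put(Instant.parse("2017-01-15T10:01:00Z"), 1);
        readings.put(Instant.parse("2017-01-15T10:04:59Z"), 2);
        readings.put(Instant.parse("2017-01-15T10:05:00Z"), 3);
        sumPerWindow(readings, Duration.ofMinutes(5))
                .forEach((start, sum) -> System.out.println(start + " -> " + sum));
    }
}
```

A timestamp exactly on a boundary, such as 10:05:00, belongs to the window that starts there, so windows are half-open intervals.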

Dataflow Goodies

Page 23: IoT at Google Scale

Unified Model

Page 24: IoT at Google Scale

Unified Model

Page 25: IoT at Google Scale

Pub/Sub + Dataflow + BigQuery Demo

Page 26: IoT at Google Scale

Life of a Pipeline

Page 27: IoT at Google Scale
Page 28: IoT at Google Scale

Enterprise Big Data Architecture on Google

Diagram elements:

● Inputs: Your Data, PubSub, Cloud Storage, BigTable

● Processing and analysis: Dataflow, BigQuery (Fast ETL, Regex, JSON, UDFs)

● Consumers: Spreadsheets, BI Tools, Coworkers, Applications + Reports

Page 29: IoT at Google Scale

Why Dataflow?

● All the benefits of Hadoop-on-Google

● Plus a Fully-Managed Service

● Plus New, Intuitive Framework

● Plus Autoscaling and per-minute billing

● Plus True Stream Processing

Page 30: IoT at Google Scale

Questions?