
Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Analytics by Aljoscha Krettek

Apr 16, 2017


Big Data Spain
Transcript
Page 1: Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Analytics by Aljoscha Krettek
Page 2: Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Analytics by Aljoscha Krettek

1

Aljoscha Krettek @aljoscha

Big Data Spain November 17, 2016

Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Analytics

Page 3: Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Analytics by Aljoscha Krettek

What I’d Like to Talk About

2

§  Streaming Architecture and Flink

§  IoT and Event-Time based stream processing

§  Use-Case Examples

Page 4: Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Analytics by Aljoscha Krettek

3

Original creators of Apache Flink®

Providers of the dA Platform, a supported Flink distribution

Page 5: Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Analytics by Aljoscha Krettek

Intro: The Streaming Architecture

4

Page 6: Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Analytics by Aljoscha Krettek

Rethinking Data Architecture

§  Better app isolation

§  Real-time reaction to events

§  Robust continuous applications

§  Process both real-time and historical data

5

Page 7: Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Analytics by Aljoscha Krettek

6

[Diagram: an event log, three "app state" boxes, and a query service]

Page 8: Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Analytics by Aljoscha Krettek

What is (Distributed) Streaming

§  Streaming: Computations on never-ending “streams” of data records (“events”)

§  Distributed: Computation spread across many machines

7

[Diagram: four parallel instances of "Your code"]

Page 9: Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Analytics by Aljoscha Krettek

What is Stateful Streaming

§  Computation and state
•  E.g., counters, windows of past events, state machines, trained ML models

§  Result depends on history of stream

§  A stateful stream processor should give you the tools to manage state
•  Recover, roll back, version, upgrade, etc.

8

Your code

state
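
To make "result depends on history of stream" and "recover, roll back" concrete, here is a minimal plain-Python sketch. It is not Flink's API: `KeyedCounter`, `snapshot`, and `restore` are hypothetical names chosen for illustration, and the snapshot is simply an in-memory copy.

```python
import copy
from collections import defaultdict


class KeyedCounter:
    """Counts events per key; the counts are the operator's state."""

    def __init__(self):
        self.counts = defaultdict(int)

    def process(self, key):
        # The result depends on every earlier event seen for this key.
        self.counts[key] += 1
        return self.counts[key]

    def snapshot(self):
        # Copy the state so later updates do not change the snapshot.
        return copy.deepcopy(dict(self.counts))

    def restore(self, snap):
        # Roll back to a previously taken snapshot.
        self.counts = defaultdict(int, snap)


counter = KeyedCounter()
for sensor in ["a", "b", "a"]:
    counter.process(sensor)

checkpoint = counter.snapshot()        # state at this point: a=2, b=1
counter.process("a")                   # state moves on to a=3
counter.restore(checkpoint)            # simulate recovery after a failure
after_recovery = counter.process("a")  # counts from the restored a=2
print(checkpoint, after_recovery)
```

After the restore, the next event for key "a" is counted from the checkpointed value, so the update made between the snapshot and the failure is rolled back.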

Page 10: Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Analytics by Aljoscha Krettek

What is Event-Time Streaming

§  Data records associated with timestamps (time series data)

§  Processing depends on timestamps

§  An event-time stream processor should give you the tools to reason about time
•  Handle streams that are out of order
•  Core feature is watermarks: a clock to measure event time

9

[Diagram: events arrive at "Your code" (with state) in the order t3, t1, t2, t4 and are grouped into the windows t1-t2 and t3-t4]
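
One common way to build such an event-time clock is to let it trail the largest timestamp seen so far by a fixed delay. The sketch below assumes that strategy; it is plain Python rather than Flink code, and `WatermarkTracker` and `max_delay` are hypothetical names.

```python
class WatermarkTracker:
    """Event-time clock: (largest timestamp seen) minus (allowed delay)."""

    def __init__(self, max_delay):
        self.max_delay = max_delay
        self.max_timestamp = None

    def on_event(self, timestamp):
        # Only a new maximum can advance the clock; out-of-order
        # events leave it where it is, so it never moves backwards.
        if self.max_timestamp is None or timestamp > self.max_timestamp:
            self.max_timestamp = timestamp
        return self.current()

    def current(self):
        if self.max_timestamp is None:
            return float("-inf")
        return self.max_timestamp - self.max_delay


tracker = WatermarkTracker(max_delay=2)
arrivals = [3, 1, 2, 4, 7, 5]  # event timestamps in arrival order
watermarks = [tracker.on_event(t) for t in arrivals]
print(watermarks)  # [1, 1, 1, 2, 5, 5]
```

A watermark of 5 is the clock's claim that no more events with a timestamp of 5 or lower are expected, which is what lets a processor decide that a window is complete.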

Page 11: Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Analytics by Aljoscha Krettek

Recap: What is Streaming?

§  Continuous processing on data that is continuously generated

§  I.e., pretty much all “big” data

§  It’s all about state and time

§  Flink does all of what we just saw

10

Page 12: Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Analytics by Aljoscha Krettek

IoT and Event-time Stream Processing

11

Page 13: Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Analytics by Aljoscha Krettek

12

The 'Internet Of Everything' Will Generate $14.4 Trillion Of Value Over The Next Decade. (1)

(1) read.bi/1yDOQQ3

Page 14: Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Analytics by Aljoscha Krettek

Example Event Sources

13

Page 15: Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Analytics by Aljoscha Krettek

A Simple Definition

14

IoT use cases from the system’s perspective: A large number of (distributed) things generating a large amount of data.

Page 16: Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Analytics by Aljoscha Krettek

Important Properties

15

§  Data is continuously produced → Stream Processing

§  Events have a timestamp that has to be considered → Event-time based processing

§  Data/Events can arrive with huge delays

§  Most analyses happen on time windows
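
The last two properties interact: a window can only be closed once the processor believes no more events for it will arrive. The following is a simplified plain-Python sketch of that idea, not Flink's windowing API. The function name, the firing rule (a window fires when the watermark reaches its end), and the choice to set late events aside are all assumptions made for illustration.

```python
from collections import defaultdict


def run_event_time_windows(timestamps, size, max_delay):
    """Fire tumbling event-time windows once the watermark reaches their end."""
    open_windows = defaultdict(list)
    fired, late = [], []
    watermark = float("-inf")
    for ts in timestamps:
        start = (ts // size) * size
        if start + size <= watermark:
            late.append(ts)  # the window for this event has already fired
        else:
            open_windows[start].append(ts)
        # The watermark trails the largest timestamp by max_delay.
        watermark = max(watermark, ts - max_delay)
        for window_start in sorted(open_windows):
            if window_start + size <= watermark:
                fired.append((window_start, sorted(open_windows.pop(window_start))))
    return fired, late


fired, late = run_event_time_windows([1, 3, 2, 6, 4, 12, 5], size=5, max_delay=2)
print(fired)  # [(0, [1, 2, 3, 4]), (5, [6])]
print(late)   # [5]
```

The event with timestamp 4 arrives after the event with timestamp 6 but still lands in the window starting at 0, because that window has not fired yet. The event with timestamp 5 arrives after the watermark has passed the end of its window, so it is reported as late. The window starting at 10 is still open when the input ends.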

Page 17: Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Analytics by Aljoscha Krettek

Remember: Streaming technology is enabling the obvious: continuous processing on data that is continuously produced

Hint: you already have streaming data

16

Page 18: Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Analytics by Aljoscha Krettek

What Is Event-Time Processing

17

[Figure: events in a message queue, labeled with their event timestamps, arrive out of order along a processing-time axis running from 1 to 14]

Page 19: Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Analytics by Aljoscha Krettek

What’s The Problem?

18

[Figure: the same events on a processing-time axis running from 1 to 14, grouped once into Processing-Time Windows and once into Event-Time Windows]

Mismatch between event time and processing time.
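
The effect of this mismatch on window contents can be sketched in a few lines of plain Python. The events, field names, and window size below are made up for illustration; nothing here is Flink API.

```python
from collections import defaultdict


def tumbling_windows(events, time_of, size):
    """Group event ids into fixed-size windows, keyed by window start."""
    windows = defaultdict(list)
    for event in events:
        start = (time_of(event) // size) * size
        windows[start].append(event["id"])
    return dict(windows)


# Each event carries the time it happened and the time it arrived.
events = [
    {"id": "a", "event_time": 1, "arrival_time": 2},
    {"id": "b", "event_time": 3, "arrival_time": 3},
    {"id": "c", "event_time": 4, "arrival_time": 9},  # delayed in transit
    {"id": "d", "event_time": 6, "arrival_time": 7},
]

by_processing_time = tumbling_windows(events, lambda e: e["arrival_time"], size=5)
by_event_time = tumbling_windows(events, lambda e: e["event_time"], size=5)
print(by_processing_time)  # {0: ['a', 'b'], 5: ['c', 'd']}
print(by_event_time)       # {0: ['a', 'b', 'c'], 5: ['d']}
```

Event "c" happened in the first window but arrived during the second, so a processing-time window attributes it to the wrong interval, while an event-time window does not.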

Page 20: Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Analytics by Aljoscha Krettek

Sources of Time Mismatch

§  Big Mismatch
•  Network disconnects
•  Slow network

§  Small Mismatch
•  The nature of distributed systems
•  Differing system clock time

19

Page 21: Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Analytics by Aljoscha Krettek

Big Event-Time Mismatch

20

Processing Time → Event Time

1977 → Episode IV
1980 → Episode V
1983 → Episode VI
1999 → Episode I
2002 → Episode II
2005 → Episode III
2015 → Episode VII

Page 22: Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Analytics by Aljoscha Krettek

Small Event-Time Mismatch

21

Robust Stream Processing with Apache Flink®: A Simple Walkthrough http://data-artisans.com/robust-stream-processing-flink-walkthrough/

Page 23: Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Analytics by Aljoscha Krettek

22

Page 24: Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Analytics by Aljoscha Krettek

23

Page 25: Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Analytics by Aljoscha Krettek

24

Page 26: Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Analytics by Aljoscha Krettek

Recap: Event-Time

§  IoT use cases need event-time processing

§  Even small mismatch of event time/processing time will lead to wrong results

25

Page 27: Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Analytics by Aljoscha Krettek

Use-Case Examples

26

Page 28: Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Analytics by Aljoscha Krettek

30 Flink applications in production for more than one year. 10 billion events (2TB) processed daily

Complex jobs of > 30 operators running 24/7, processing 30 billion events daily, maintaining state of 100s of GB with exactly-once guarantees

Largest job has > 20 operators, runs on > 5000 vCores in 1000-node cluster, processes millions of events per second

27

Page 29: Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Analytics by Aljoscha Krettek

King

§  Challenges:
•  Many games (Candy Crush, Farm Heroes, Pet Rescue, and Bubble Witch…)
•  300 million monthly unique users
•  30 billion events received every day

§  Need Event-Time Based statistics

28 https://techblog.king.com/rbea-scalable-real-time-analytics-king/

Page 30: Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Analytics by Aljoscha Krettek

Solution: RBEA

29 https://techblog.king.com/rbea-scalable-real-time-analytics-king/

Page 31: Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Analytics by Aljoscha Krettek

Solution: RBEA

§  Multiplexing of multiple data scientist requests into a single Flink job

§  Groovy as language for analysis scripts

§  Event-time windowing
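
The multiplexing idea can be sketched roughly as follows. This is not RBEA's actual design or API: `MultiplexedJob`, `register`, and the example scripts are hypothetical, and Python callables stand in for the Groovy analysis scripts mentioned above.

```python
class MultiplexedJob:
    """Runs many registered analysis scripts over one shared event stream."""

    def __init__(self):
        self.scripts = {}
        self.results = {}

    def register(self, name, script):
        # A script is a callable that returns an output for an event, or None.
        self.scripts[name] = script
        self.results[name] = []

    def process(self, event):
        # One pass over the stream serves every registered script.
        for name, script in self.scripts.items():
            output = script(event)
            if output is not None:
                self.results[name].append(output)


job = MultiplexedJob()
job.register("level_ups", lambda e: e["user"] if e["type"] == "level_up" else None)
job.register("purchases", lambda e: e["amount"] if e["type"] == "purchase" else None)

stream = [
    {"type": "level_up", "user": "u1"},
    {"type": "purchase", "user": "u2", "amount": 5},
    {"type": "level_up", "user": "u2"},
]
for event in stream:
    job.process(event)
print(job.results)  # {'level_ups': ['u1', 'u2'], 'purchases': [5]}
```

The point of the design is that adding another analysis means registering another script, not deploying another job that reads the same stream again.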

30 https://techblog.king.com/rbea-scalable-real-time-analytics-king/

Page 32: Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Analytics by Aljoscha Krettek

Bouygues Telecom

31 http://flink-forward.org/kb_sessions/a-brief-history-of-time-with-apache-flink-real-time-monitoring-and-analysis-with-flink-kafka-hb/

2015: ~120 users*, 5 Flink production apps, 750 TB storage, 4 billion events/day

2016: ~300 users*, 30 Flink production apps, 2 PB storage, 10 billion events/day

* Users of the information system

Page 33: Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Analytics by Aljoscha Krettek

Bouygues: Challenges

§  Low latency & streaming fashion counters

§  Massive amounts of data + bursty loads

§  Reliability

§  Multiple flow correlation

§  Time management:
•  Out of order & late events → our worst enemies
•  Flexible window management

32 http://flink-forward.org/kb_sessions/a-brief-history-of-time-with-apache-flink-real-time-monitoring-and-analysis-with-flink-kafka-hb/

Page 34: Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Analytics by Aljoscha Krettek

33 http://flink-forward.org/kb_sessions/a-brief-history-of-time-with-apache-flink-real-time-monitoring-and-analysis-with-flink-kafka-hb/

Page 35: Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Analytics by Aljoscha Krettek

In Summary

34

§  If you need to ask: you already have a streaming use case!

§  IoT requires Proper Time Management

§  Apache Flink has done that for a long time now*

* Since version 0.10

Page 36: Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Analytics by Aljoscha Krettek

35

Thank you! @aljoscha @ApacheFlink @dataArtisans

Page 37: Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Analytics by Aljoscha Krettek

36

One day of hands-on Flink training
One day of conference
Tickets are on sale
Call for Papers is already open
Please visit our website: http://sf.flink-forward.org
Follow us on Twitter: @FlinkForward

Page 38: Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Analytics by Aljoscha Krettek

We are hiring! data-artisans.com/careers

Page 39: Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Analytics by Aljoscha Krettek

Appendix

38