
Introduction to Streaming with Apache Flink

Apr 16, 2017

Tugdual Grall
Transcript
Page 1: Introduction to Streaming with Apache Flink

#DevoxxFR

Stream Processing with Apache Flink

Tugdual “Tug” Grall, Technical Evangelist @ MapR, @tgrall

1

Page 2: Introduction to Streaming with Apache Flink

#DevoxxFR

{“about” : “me”}

2

Tugdual “Tug” Grall

• MapR: Technical Evangelist

• MongoDB, Couchbase, eXo, Oracle

• NantesJUG co-founder

• @tgrall

• http://tgrall.github.io

Page 3: Introduction to Streaming with Apache Flink

#DevoxxFR 3

[Diagram: MapR Converged Data Platform]

Open Source Engines & Tools; Commercial Engines & Applications; Custom Apps; Cloud and Managed Services

APIs: HDFS API; POSIX, NFS; HBase API; JSON API; Kafka API

Data Processing: Web-Scale Storage (MapR-FS), Database (MapR-DB), Event Streaming (MapR Streams), Search and Others

Enterprise-Grade Platform Services: High Availability, Real Time, Unified Security, Multi-tenancy, Disaster Recovery, Global Namespace

Unified Management and Monitoring

Page 4: Introduction to Streaming with Apache Flink

#DevoxxFR 4

Streaming technology is enabling the obvious: continuous processing on data that is continuously produced

Hint: you already have streaming data

Page 5: Introduction to Streaming with Apache Flink

#DevoxxFR

Decoupling

5

App A, App B, App C: state managed centrally

App A, App B, App C: applications build their own state

Page 6: Introduction to Streaming with Apache Flink

#DevoxxFR 6

Event Stream=Data

Pipelines

Page 7: Introduction to Streaming with Apache Flink

#DevoxxFR

Streaming and Batch

7

[Figure: timeline of hourly events (2016-3-1 12:00 am, 1:00 am, 2:00 am, …, 2016-3-11 10:00 pm, 11:00 pm, 2016-3-12 12:00 am, 1:00 am, 2:00 am, 3:00 am, …) divided into partitions]

Page 8: Introduction to Streaming with Apache Flink

#DevoxxFR

Streaming and Batch

8

[Figure: timeline of hourly events (2016-3-1 12:00 am, 1:00 am, 2:00 am, …, 2016-3-11 10:00 pm, 11:00 pm, 2016-3-12 12:00 am, 1:00 am, 2:00 am, 3:00 am, …) divided into partitions]

Stream (low latency)

Stream (high latency)

Page 9: Introduction to Streaming with Apache Flink

#DevoxxFR

Streaming and Batch

9

[Figure: timeline of hourly events (2016-3-1 12:00 am, 1:00 am, 2:00 am, …, 2016-3-11 10:00 pm, 11:00 pm, 2016-3-12 12:00 am, 1:00 am, 2:00 am, 3:00 am, …) divided into partitions]

Stream (low latency)

Stream (high latency)

Batch (bounded stream)
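
The idea that a batch is just a bounded stream can be sketched in plain Scala, without Flink. This is only an illustration: the `Reading` type, the `runningTotals` function, and the sample values are hypothetical and do not come from the talk. The same processing function is applied once to a finite input and once to an unbounded one.

```scala
object BoundedStreamSketch {
  // Hypothetical event type, used only for this illustration.
  final case class Reading(label: String, value: Int)

  // One processing function for both cases. It consumes an Iterator lazily,
  // so it never needs the whole input to be available at once.
  def runningTotals(events: Iterator[Reading]): Iterator[Int] =
    events.scanLeft(0)((total, r) => total + r.value).drop(1)

  def main(args: Array[String]): Unit = {
    // "Batch": a bounded stream, here a finite list of readings.
    val batch = List(
      Reading("2016-3-1 12:00 am", 3),
      Reading("2016-3-1 1:00 am", 4),
      Reading("2016-3-1 2:00 am", 5))
    println(runningTotals(batch.iterator).toList)

    // "Stream": an unbounded source, so only a prefix of the results can be observed.
    val stream = Iterator.from(1).map(i => Reading(s"tick-$i", i))
    println(runningTotals(stream).take(4).toList)
  }
}
```

The only difference between the two calls is whether the input ever ends, which is the point the slide makes.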

Page 10: Introduction to Streaming with Apache Flink

#DevoxxFR

Processing

10

• Request / Response

Page 11: Introduction to Streaming with Apache Flink

#DevoxxFR

Processing

11

• Request / Response

• Batch

Page 12: Introduction to Streaming with Apache Flink

#DevoxxFR

Processing

12

• Request / Response

• Batch

• Stream Processing

Page 13: Introduction to Streaming with Apache Flink

#DevoxxFR

Processing

13

• Request / Response

• Batch

• Stream Processing

• Real-time reaction to events

• Continuous applications

• Process both real-time and historical data

Page 14: Introduction to Streaming with Apache Flink

#DevoxxFR 14

Page 15: Introduction to Streaming with Apache Flink

#DevoxxFR

Flink Architecture

15

Page 16: Introduction to Streaming with Apache Flink

#DevoxxFR

Flink Architecture

16

Deployment: Local (Single JVM); Cluster (Standalone, YARN, Mesos); Cloud (AWS, Google)

Page 17: Introduction to Streaming with Apache Flink

#DevoxxFR

Flink Architecture

17

Deployment: Local (Single JVM); Cluster (Standalone, YARN, Mesos); Cloud (AWS, Google)

Core: Runtime (Distributed Streaming Dataflow)

Page 18: Introduction to Streaming with Apache Flink

#DevoxxFR

Flink Architecture

18

Deployment: Local (Single JVM); Cluster (Standalone, YARN, Mesos); Cloud (AWS, Google)

Core: Runtime (Distributed Streaming Dataflow)

API & Libraries: DataSet API (Batch Processing)

Page 19: Introduction to Streaming with Apache Flink

#DevoxxFR

Flink Architecture

19

Deployment: Local (Single JVM); Cluster (Standalone, YARN, Mesos); Cloud (AWS, Google)

Core: Runtime (Distributed Streaming Dataflow)

API & Libraries: DataSet API (Batch Processing), with FlinkML (Machine Learning), Gelly (Graph Processing), Table (Relational)

Page 20: Introduction to Streaming with Apache Flink

#DevoxxFR

Flink Architecture

20

Deployment: Local (Single JVM); Cluster (Standalone, YARN, Mesos); Cloud (AWS, Google)

Core: Runtime (Distributed Streaming Dataflow)

API & Libraries: DataSet API (Batch Processing), with FlinkML (Machine Learning), Gelly (Graph Processing), Table (Relational)

API & Libraries: DataStream API (Stream Processing)

Page 21: Introduction to Streaming with Apache Flink

#DevoxxFR

Flink Architecture

21

Deployment: Local (Single JVM); Cluster (Standalone, YARN, Mesos); Cloud (AWS, Google)

Core: Runtime (Distributed Streaming Dataflow)

API & Libraries: DataSet API (Batch Processing), with FlinkML (Machine Learning), Gelly (Graph Processing), Table (Relational)

API & Libraries: DataStream API (Stream Processing), with CEP (Event Processing), Table (Relational)

Page 22: Introduction to Streaming with Apache Flink

#DevoxxFR 22

Demonstration

Flink Basics

Page 23: Introduction to Streaming with Apache Flink

#DevoxxFR

Batch & Stream

23

case class Word(word: String, frequency: Int)

// DataSet API - Batch
val lines: DataSet[String] = env.readTextFile(…)

lines.flatMap { line => line.split(" ").map(word => Word(word, 1)) }
  .groupBy("word").sum("frequency")
  .print()

// DataStream API - Streaming
val lines: DataStream[String] = env.fromSocketStream(...)

lines.flatMap { line => line.split(" ").map(word => Word(word, 1)) }
  .keyBy("word").window(Time.of(5, SECONDS))
  .every(Time.of(1, SECONDS)).sum("frequency")
  .print()
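
What a 5-second window evaluated every second computes can be sketched without Flink. The following is a plain-Scala approximation, not Flink's implementation: the names `windowStarts` and `count`, the integer-second timestamps, and the sample lines are assumptions made for illustration. Each word is assigned to every window `[start, start + size)` that contains its timestamp, then counted per window.

```scala
object SlidingWordCountSketch {
  // Starts of all sliding windows [start, start + size) containing timestamp ts.
  // Timestamps are whole seconds and assumed to be non-negative.
  def windowStarts(ts: Int, size: Int, slide: Int): Seq[Int] = {
    val lastStart = ts - (ts % slide)
    lastStart until (ts - size) by -slide
  }

  // Counts words per (windowStart, word) for timestamped lines of text.
  def count(lines: Seq[(Int, String)], size: Int, slide: Int): Map[(Int, String), Int] =
    (for {
      (ts, line) <- lines
      word <- line.split(" ").toSeq if word.nonEmpty
      start <- windowStarts(ts, size, slide)
    } yield (start, word)).groupBy(identity).map { case (key, hits) => key -> hits.size }

  def main(args: Array[String]): Unit = {
    val lines = Seq((0, "to be"), (3, "or not to be"))
    val counts = count(lines, size = 5, slide = 1)
    // Window [0, 5) sees both lines; window [1, 6) sees only the second one.
    println(counts((0, "to")))
    println(counts((1, "to")))
  }
}
```

Because the slide (1 s) is smaller than the size (5 s), each element lands in five overlapping windows, which is why sliding windows cost more state than tumbling ones.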

Page 24: Introduction to Streaming with Apache Flink

#DevoxxFR

Stream Processing

24

Source → Filter / Transform → Sink

Page 25: Introduction to Streaming with Apache Flink

#DevoxxFR

Flink Ecosystem

25

Source: Apache Kafka, MapR Streams, AWS Kinesis, RabbitMQ, Twitter

Sink: Apache Kafka, MapR Streams, AWS Kinesis, RabbitMQ, Elasticsearch, HDFS/MapR-FS

Apache Bahir

Page 26: Introduction to Streaming with Apache Flink

#DevoxxFR

Stateful Stream Processing

26

Source → Filter / Transform (State read/write) → Sink
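
The source, filter/transform with state, and sink pipeline can be sketched in plain Scala. This is a minimal illustration, not Flink's state API: the `CountPerKey` class and the `run` function are hypothetical names, and the state is an in-memory map rather than managed, checkpointed state.

```scala
import scala.collection.mutable

object StatefulOperatorSketch {
  // Hypothetical stateful operator: emits each key with the number of times it has been seen so far.
  final class CountPerKey {
    private val state = mutable.Map.empty[String, Int].withDefaultValue(0)

    def process(key: String): (String, Int) = {
      val updated = state(key) + 1 // read state
      state(key) = updated         // write state
      (key, updated)
    }
  }

  // Source -> filter -> stateful transform. The caller plays the role of the sink.
  def run(source: Iterator[String]): Iterator[(String, Int)] = {
    val operator = new CountPerKey
    source.filter(_.nonEmpty).map(operator.process)
  }

  def main(args: Array[String]): Unit =
    run(Iterator("a", "b", "", "a")).foreach(println)
}
```

Unlike a stateless filter, the output for a given event depends on what came before it, which is what makes fault-tolerant state handling the hard part of a stream processor.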

Page 27: Introduction to Streaming with Apache Flink

#DevoxxFR 27

Is Flink used?

Page 28: Introduction to Streaming with Apache Flink

#DevoxxFR

Powered by Flink

28

Page 29: Introduction to Streaming with Apache Flink

#DevoxxFR 29

10 billion events/day

2Tb of data/day

30 applications

2Pb of storage and growing

Source: Bouygues Telecom, http://berlin.flink-forward.org/wp-content/uploads/2016/07/Thomas-Lamirault_Mohamed-Amine-Abdessemed-A-brief-history-of-time-with-Apache-Flink.pdf

Page 30: Introduction to Streaming with Apache Flink

#DevoxxFR 30

Stream Processing

Windowing

Page 31: Introduction to Streaming with Apache Flink

#DevoxxFR

Stream Windows

31

Page 32: Introduction to Streaming with Apache Flink

#DevoxxFR

Stream Windows

32

Page 33: Introduction to Streaming with Apache Flink

#DevoxxFR

Stream Windows

33

Page 34: Introduction to Streaming with Apache Flink

#DevoxxFR

Stream Windows

34

Page 35: Introduction to Streaming with Apache Flink

#DevoxxFR

Stream Windows

35

Page 36: Introduction to Streaming with Apache Flink

#DevoxxFR 36

Demonstration

Flink Windowing

Page 37: Introduction to Streaming with Apache Flink

#DevoxxFR 37

Time

What about it?

Page 38: Introduction to Streaming with Apache Flink

#DevoxxFR

Demonstration

38

• Multiple notions of “Time” in Flink

• Event Time

• Ingestion Time

• Processing Time

Page 39: Introduction to Streaming with Apache Flink

#DevoxxFR

What Is Event-Time Processing

39

Processing Time: 1977, 1980, 1983, 1999, 2002, 2005, 2015

Event Time: Episode IV, Episode V, Episode VI, Episode I, Episode II, Episode III, Episode VII
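
The difference between the two orderings on this slide can be sketched in a few lines of plain Scala. The `Film` type and the function names are made up for illustration; the years and episode numbers are the ones shown on the slide, with the year standing for processing time and the episode number for event time.

```scala
object EventTimeSketch {
  // episode: when the event happened in the story (event time).
  // year: when the event reached the audience (processing time).
  final case class Film(episode: Int, year: Int)

  val films: List[Film] = List(
    Film(4, 1977), Film(5, 1980), Film(6, 1983),
    Film(1, 1999), Film(2, 2002), Film(3, 2005), Film(7, 2015))

  // Order in which a processing-time system would see the events.
  def byProcessingTime(fs: Seq[Film]): Seq[Int] = fs.sortBy(_.year).map(_.episode)

  // Order recovered by an event-time system from the timestamp carried in each event.
  def byEventTime(fs: Seq[Film]): Seq[Int] = fs.sortBy(_.episode).map(_.episode)

  def main(args: Array[String]): Unit = {
    println(byProcessingTime(films).mkString(", ")) // 4, 5, 6, 1, 2, 3, 7
    println(byEventTime(films).mkString(", "))      // 1, 2, 3, 4, 5, 6, 7
  }
}
```

Sorting a finite list is only possible because the input is bounded; on an unbounded stream an engine needs some notion of how long to wait for late events before it can emit event-time results.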

Page 40: Introduction to Streaming with Apache Flink

#DevoxxFR

Time in Flink

40

Page 41: Introduction to Streaming with Apache Flink

#DevoxxFR 41

Complex Event Processing

Page 42: Introduction to Streaming with Apache Flink

#DevoxxFR

Complex Event Processing

42

• Analyzing a stream of events and drawing conclusions

• “if A and then B → infer event C”

• Demanding requirements on stream processor

• Low latency!

• Exactly-once semantics & event-time support
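
The rule "if A and then B, infer event C" can be sketched as a small state machine in plain Scala. This is an illustrative simplification and not the Flink CEP library: the `infer` function and the string-encoded events are assumptions, other events are allowed between A and B, and each B consumes at most one pending A.

```scala
object FollowedBySketch {
  // Emits one "C" each time a "B" arrives after a not-yet-matched "A".
  def infer(events: Seq[String]): List[String] = {
    val (_, inferred) = events.foldLeft((false, List.empty[String])) {
      case ((_, out), "A")    => (true, out)         // remember that A was seen
      case ((true, out), "B") => (false, "C" :: out) // A then B: infer C
      case (acc, _)           => acc                 // anything else is ignored
    }
    inferred.reverse
  }

  def main(args: Array[String]): Unit =
    println(infer(Seq("A", "X", "B", "B", "A", "B")))
}
```

Even this toy version needs state (whether an A is pending) and depends on event order, which is why CEP puts the demands on latency, exactly-once state, and event time listed above.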

Page 43: Introduction to Streaming with Apache Flink

#DevoxxFR

Stream Windows

43

Page 44: Introduction to Streaming with Apache Flink

#DevoxxFR

Order Events

44

Process is reflected in a stream of order events

Order(orderId, tStamp, “received”)
Shipment(orderId, tStamp, “shipped”)
Delivery(orderId, tStamp, “delivered”)

orderId: Identifies the order
tStamp: Time at which the event happened

Page 45: Introduction to Streaming with Apache Flink

#DevoxxFR

Real-time Warnings

45

Page 46: Introduction to Streaming with Apache Flink

#DevoxxFR

CEP to the Rescue

46

Define processing and delivery intervals (SLAs)

ProcessSucc(orderId, tStamp, duration)
ProcessWarn(orderId, tStamp)
DeliverySucc(orderId, tStamp, duration)
DeliveryWarn(orderId, tStamp)

orderId: Identifies the order
tStamp: Time when the event happened
duration: Duration of the processing/delivery

Page 47: Introduction to Streaming with Apache Flink

#DevoxxFR

CEP Example

47

Page 48: Introduction to Streaming with Apache Flink

#DevoxxFR

Processing: Order → Shipment

48

Page 49: Introduction to Streaming with Apache Flink

#DevoxxFR 49

Processing: Order → Shipment

val processingPattern = Pattern
  .begin[Event]("received").subtype(classOf[Order])
  .followedBy("shipped").where(_.status == "shipped")
  .within(Time.hours(1))

Page 50: Introduction to Streaming with Apache Flink

#DevoxxFR 50

val processingPattern = Pattern
  .begin[Event]("received").subtype(classOf[Order])
  .followedBy("shipped").where(_.status == "shipped")
  .within(Time.hours(1))

val processingPatternStream = CEP.pattern(
  input.keyBy("orderId"),
  processingPattern)

Processing: Order → Shipment

Page 51: Introduction to Streaming with Apache Flink

#DevoxxFR 51

val processingPattern = Pattern
  .begin[Event]("received").subtype(classOf[Order])
  .followedBy("shipped").where(_.status == "shipped")
  .within(Time.hours(1))

val processingPatternStream = CEP.pattern(
  input.keyBy("orderId"),
  processingPattern)

val procResult: DataStream[Either[ProcessWarn, ProcessSucc]] =
  processingPatternStream.select {
    (pP, timestamp) => // Timeout handler
      ProcessWarn(pP("received").orderId, timestamp)
  } {
    fP => // Select function
      ProcessSucc(
        fP("received").orderId, fP("shipped").tStamp,
        fP("shipped").tStamp - fP("received").tStamp)
  }

Processing: Order → Shipment
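
The outcome of the pattern above can be approximated on a bounded list of events in plain Scala, which may help when reasoning about the `Either[ProcessWarn, ProcessSucc]` result. This is a sketch under assumptions and not the Flink CEP API: the event classes are reduced to `orderId` and `tStamp` in milliseconds, `check` is a hypothetical helper, and the warning timestamp is assumed to be the moment the one-hour limit expires.

```scala
object OrderSlaSketch {
  sealed trait Event { def orderId: Long; def tStamp: Long }
  final case class Order(orderId: Long, tStamp: Long) extends Event
  final case class Shipment(orderId: Long, tStamp: Long) extends Event

  final case class ProcessSucc(orderId: Long, tStamp: Long, duration: Long)
  final case class ProcessWarn(orderId: Long, tStamp: Long)

  val OneHourMs: Long = 60L * 60L * 1000L

  // For each order: Right(success) if it shipped within the limit, Left(warning) otherwise.
  def check(events: Seq[Event], limit: Long = OneHourMs): Seq[Either[ProcessWarn, ProcessSucc]] =
    events.groupBy(_.orderId).toSeq.sortBy(_._1).flatMap { case (id, evs) =>
      val received = evs.collectFirst { case o: Order => o }
      val shipped = evs.collectFirst { case s: Shipment => s }
      received.map[Either[ProcessWarn, ProcessSucc]] { o =>
        shipped match {
          case Some(s) if s.tStamp >= o.tStamp && s.tStamp - o.tStamp <= limit =>
            Right(ProcessSucc(id, s.tStamp, s.tStamp - o.tStamp))
          case _ =>
            // Assumption: the warning carries the time at which the limit expired.
            Left(ProcessWarn(id, o.tStamp + limit))
        }
      }
    }

  def main(args: Array[String]): Unit = {
    val events = Seq(
      Order(1, 0L), Shipment(1, 1800000L), // shipped after 30 minutes
      Order(2, 0L), Shipment(2, 7200000L), // shipped after 2 hours
      Order(3, 1000L))                     // never shipped
    check(events).foreach(println)
  }
}
```

The streaming version differs in one important way: it cannot wait for the end of the input, so the timeout handler fires as soon as time has advanced one hour past the `received` event.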

Page 52: Introduction to Streaming with Apache Flink

#DevoxxFR

Count Delayed Shipments

52

Page 53: Introduction to Streaming with Apache Flink

#DevoxxFR

Compute Avg Processing Time

53

Page 54: Introduction to Streaming with Apache Flink

#DevoxxFR

The End

54

• Process events in real time and/or batch

• Complex Event Processing (CEP)

• Many other things to discover

• Deployment

• High Availability

• Table/Relational API

• …

https://mapr.com/ebooks/

Page 55: Introduction to Streaming with Apache Flink

#DevoxxFR 55

Thanks to

Flink Community & Kostas Tzoumas, Stephan Ewen, Fabian Hueske, Till Rohrmann, Jamie Grier

Page 56: Introduction to Streaming with Apache Flink

#DevoxxFR

Stream Processing with Apache Flink

Tugdual “Tug” Grall, Technical Evangelist @ MapR, @tgrall

56