Apache Flink Berlin Meetup May 2016

Feb 17, 2017

Page 1

Stephan Ewen (@stephanewen)

What's coming up in Apache Flink?
Quick teaser of some of the upcoming features

Page 2

Disclaimer

This list of threads is incomplete

This is not an Apache Flink roadmap!

Page 3

What's coming up?

APIs

Integration

Operations

Stream SQL

Queryable State

Cassandra

Deployment and Management (YARN, Mesos, Docker, …)

Dynamically Scaling Streaming Programs

Metrics

File System Sources

Side Inputs: joining streams and static data

BigTop Integration

Kinesis

State Scalability

Page 4

Stream SQL

Page 5

Two definitions of Stream SQL

1. Run a continuous SQL query that reads an infinite stream and continuously produces results

2. Continuously ingest streams into a warehouse. Query the real-time data in the warehouse.

Page 6

Two definitions of Stream SQL

1. Run a continuous SQL query that reads an infinite stream and continuously produces results (that's Flink's Stream SQL)

2. Continuously ingest streams into a warehouse. Query the real-time data in the warehouse. (Good use case for Kafka + Flink + Druid)

Page 7

An Example

val execEnv = StreamExecutionEnvironment.getExecutionEnvironment
val tableEnv = TableEnvironment.getTableEnvironment(execEnv)

// define a JSON encoded Kafka topic as external table
val sensorSource = new KafkaJsonSource[(String, Long, Double)](
  "sensorTopic", kafkaProps, ("location", "time", "tempF"))

// register external table
tableEnv.registerTableSource("sensorData", sensorSource)

// define query in external table
val roomSensors: Table = tableEnv.sql("""
  SELECT STREAM time, location AS room, (tempF - 32) * 0.556 AS tempC
  FROM sensorData
  WHERE location LIKE 'room%'
  """)

// write the table back to Kafka as JSON
roomSensors.toSink(new KafkaJsonSink(...))
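
As a rough illustration of what the continuous query above computes for each incoming record, here is a plain Scala sketch that does not use Flink at all. The names SensorReading, RoomReading and toRoomReading are hypothetical and introduced only for this sketch. Only the filter (LIKE 'room%') and the conversion formula are taken from the query.

```scala
// Plain Scala sketch (no Flink): the per-record logic of the query above.
// SensorReading, RoomReading and toRoomReading are hypothetical names.
final case class SensorReading(location: String, time: Long, tempF: Double)
final case class RoomReading(time: Long, room: String, tempC: Double)

object StreamSqlSketch {
  // WHERE location LIKE 'room%'   -> keep only locations starting with "room"
  // (tempF - 32) * 0.556 AS tempC -> same approximate conversion as the query
  def toRoomReading(r: SensorReading): Option[RoomReading] =
    if (r.location.startsWith("room"))
      Some(RoomReading(r.time, r.location, (r.tempF - 32) * 0.556))
    else
      None

  def main(args: Array[String]): Unit = {
    // An Iterator stands in for the unbounded stream; results are emitted one by one.
    val readings = Iterator(
      SensorReading("room1", 1L, 68.0),
      SensorReading("hallway", 2L, 70.0),
      SensorReading("room2", 3L, 32.0)
    )
    readings.flatMap(r => toRoomReading(r).toList).foreach(println)
  }
}
```

The query is stateless: each result depends on a single input record, which is why it can run continuously over an infinite stream.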

Page 8

The Implementation

[Diagram: the implementation in Flink 1.0 compared with Flink 1.1+]

Page 9

Queryable State

Page 10

Sharing State with Applications

Access to the stream aggregates with a latency bound

Write them to a key/value store

Page 11

Sharing State with Applications

Access to the stream aggregates with a latency bound

Write them to a key/value store

Often the biggest bottleneck

Page 12

Queryable State

Optional, and only at the end of windows

Send queries to Flink's internal state

Page 13

What does it bring?
• Fewer moving parts in the infrastructure
• Performance!

From an extension of Yahoo!'s streaming benchmark:
• With key/value store: 280,000 events/s
• Queryable state: 15,000,000 events/s

What's the secret?
• No synchronous distributed communication
• Persistence via Flink's checkpoints (async snapshots)

Page 14

Dynamic Scaling

Page 15

Adjust parallelism of Streaming Programs

Initial configuration

Scale Out (for load)

Scale In (save resources)

Page 16

Adjust parallelism of Streaming Programs

Adjusting parallelism without (significantly) interrupting the program

Initial version:
• Savepoint -> stop -> restart-with-different-parallelism

Stateless operators: Trivial
Stateful operators: Repartition state
• State reorganized by key for key/value state and windows
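
To make the "repartition state" step more concrete, the following is a conceptual sketch in plain Scala. It is not Flink's implementation. The names KeyedState, partitionOf and repartition are hypothetical, and hashing the key directly into a partition is an assumption chosen to mirror the simplest possible assignment rule.

```scala
// Conceptual sketch of "repartition state by key"; not Flink's implementation.
object RepartitionSketch {
  // Hypothetical stand-in for one parallel partition's key/value state.
  type KeyedState = Map[String, Long]

  // Assumed assignment rule: hash the key directly into a parallel partition.
  def partitionOf(key: String, parallelism: Int): Int =
    Math.floorMod(key.hashCode, parallelism)

  // Take the state captured for every old partition and regroup all entries
  // by the partition their key maps to under the new parallelism.
  def repartition(snapshot: Seq[KeyedState], newParallelism: Int): Seq[KeyedState] = {
    val entries: Seq[(String, Long)] = snapshot.flatten
    val byPartition = entries.groupBy { case (key, _) => partitionOf(key, newParallelism) }
    (0 until newParallelism).map { p =>
      byPartition.getOrElse(p, Seq.empty[(String, Long)]).toMap
    }
  }

  def main(args: Array[String]): Unit = {
    val before = Seq(Map("a" -> 1L, "b" -> 2L), Map("c" -> 3L, "d" -> 4L))
    val after = repartition(before, newParallelism = 3)
    after.zipWithIndex.foreach { case (state, p) => println(s"partition $p: $state") }
  }
}
```

In this sketch every single entry has to be inspected and re-hashed when the parallelism changes, so the cost of rescaling grows with the number of keys.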

Page 17

Consistent Hashing

Page 18

Redistribution via Key Groups

Page 19

Redistribution via Key Groups

Flink 1.0: Hash keys into parallel partitions. Finest granularity is a partition.

Flink 1.1: Hash keys into KeyGroups. Assign KeyGroups to parallel partitions.
Change of parallelism means change of assignment of KeyGroups to parallel partitions.
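
The two-step scheme described for Flink 1.1 can be sketched in plain Scala as follows. This is an illustration under stated assumptions, not Flink's actual code: the object name, the fixed count of 12 key groups, the use of hashCode, and the contiguous range assignment are all choices made for this sketch.

```scala
// Sketch only: names and constants are illustrative, not Flink's implementation.
object KeyGroupSketch {
  // Assumed fixed number of key groups; it must not change when the job is rescaled.
  val numKeyGroups: Int = 12

  // Step 1: hash a key into a key group (independent of the current parallelism).
  def keyGroupOf(key: Any): Int =
    Math.floorMod(key.hashCode, numKeyGroups)

  // Step 2: assign key groups to parallel partitions in contiguous ranges.
  def partitionOf(keyGroup: Int, parallelism: Int): Int =
    keyGroup * parallelism / numKeyGroups

  // All key groups owned by one partition at a given parallelism.
  def keyGroupsOf(partition: Int, parallelism: Int): Seq[Int] =
    (0 until numKeyGroups).filter(g => partitionOf(g, parallelism) == partition)

  def main(args: Array[String]): Unit = {
    for (parallelism <- Seq(2, 3)) {
      println(s"parallelism = $parallelism")
      for (p <- 0 until parallelism)
        println(s"  partition $p owns key groups ${keyGroupsOf(p, parallelism).mkString(", ")}")
    }
  }
}
```

With these assumptions, parallelism 2 gives partition 0 the key groups 0 to 5 and partition 1 the key groups 6 to 11. At parallelism 3 the partitions own 0 to 3, 4 to 7 and 8 to 11. A key never changes its key group, so rescaling only changes which partition owns each group, and state can be moved in units of whole key groups instead of key by key.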

Page 20

Flink Forward 2016, Berlin

Submission deadline: June 30, 2016
Early bird deadline: July 15, 2016

www.flink-forward.org

Page 21

We are hiring!
data-artisans.com/careers