Bay Area Apache Flink Meetup #2 Distributed Stream and Graph Processing Community Update August 2015 Henry Saputra Committer and PMC Member [email protected] @Kingwulf
Bay Area Apache Flink Meetup Community Update August 2015

Feb 09, 2017

Henry Saputra
Transcript
Page 1: Bay Area Apache Flink Meetup Community Update August 2015

Bay Area Apache Flink Meetup #2 Distributed Stream and Graph Processing

Community Update August 2015

Henry Saputra
Committer and PMC Member

[email protected] @Kingwulf

Page 2: Bay Area Apache Flink Meetup Community Update August 2015

Apache Flink is an open source platform for scalable batch and stream data processing.

Apache Flink is …

• The core of Apache Flink is a distributed streaming dataflow engine.
• Executing dataflows in parallel on clusters
• Providing a reliable foundation for various workloads
• DataSet and DataStream programming abstractions are the foundation for user programs and higher layers

Page 3: Bay Area Apache Flink Meetup Community Update August 2015

One engine for many use cases

Real time streaming topologies

Machine Learning at scale

Graph Analysis

Long batch pipelines

Page 4: Bay Area Apache Flink Meetup Community Update August 2015

What happened? - 1

• New PMC: Maximilian Michels
• New Committer: Chesnay Schepler
• Discussions for a 0.9.1 release had started
• Apache Flink is becoming more popular:
– 1000+ Twitter followers
– 500+ GitHub stars
– Named as “open source Big Data project” to watch by ZDNet.
– Flink Forward schedule with great speakers announced

Page 5: Bay Area Apache Flink Meetup Community Update August 2015

What happened? - 2

• Apache Flink on Wikipedia: https://en.wikipedia.org/wiki/Apache_Flink
• New JobManager Dashboard
• Apache SAMOA 0.3.0-incubating with Flink integration
• New “Features” page
• Contributors list (can you spot your name?) https://cwiki.apache.org/confluence/display/FLINK/List+of+contributors

Page 6: Bay Area Apache Flink Meetup Community Update August 2015

New Job Manager Dashboard

Page 7: Bay Area Apache Flink Meetup Community Update August 2015

New Website Redesign and New Features page

Page 8: Bay Area Apache Flink Meetup Community Update August 2015

New Architecture diagram in 0.10 documentation

Page 9: Bay Area Apache Flink Meetup Community Update August 2015

More content in the Wiki for internal information

Page 10: Bay Area Apache Flink Meetup Community Update August 2015

In master (0.10-SNAPSHOT) - 1

• Gelly Scala API
• More improvements and fixes for YARN
• Flink dropped Java 6 support
• Streaming connector for Elasticsearch
• Sampling operation on DataSet API
• A lot of bug fixes:
– Streaming: APIs, general stability, Kafka connector

Page 11: Bay Area Apache Flink Meetup Community Update August 2015

In master (0.10-SNAPSHOT) - 2

• Low watermarks / Event time
• New JM Dashboard
• Akka messages are now aware of leader IDs (for HA)
• ZooKeeper integration (for HA)
• Live accumulators (runtime only)
• Stability improvements

Page 12: Bay Area Apache Flink Meetup Community Update August 2015

Articles and Mentions

• High-throughput, low-latency, and exactly-once stream processing with Apache Flink [1]

• Introducing Gelly: Graph Processing with Apache Flink [2]

• Apache Flink and the case for stream processing [3]

• Crunching Parquet Files with Apache Flink [4]

• The morning paper: Asynchronous Distributed Snapshots for Distributed Dataflows [5]

• Five open source Big Data projects to watch [6]

• Big Data Performance Engineering: Examples from Hadoop, Pig, HBase, Flink and Spark [7]

[1] http://data-artisans.com/high-throughput-low-latency-and-exactly-once-stream-processing-with-apache-flink/
[2] http://flink.apache.org/news/2015/08/24/introducing-flink-gelly.html
[3] http://www.kdnuggets.com/2015/08/apache-flink-stream-processing.html
[4] https://medium.com/@istanbul_techie/crunching-parquet-files-with-apache-flink-200bec90d8a7
[5] http://blog.acolyer.org/2015/08/19/asynchronous-distributed-snapshots-for-distributed-dataflows/
[6] http://www.zdnet.com/article/five-open-source-big-data-projects-to-watch/
[7] http://www.bigsynapse.com/addressing-big-data-performance

Page 13: Bay Area Apache Flink Meetup Community Update August 2015

New Meetups and Events

• Chicago: Flink Training @ Capital One

• Bay Area: Stream & Graph Processing @ MapR

Page 14: Bay Area Apache Flink Meetup Community Update August 2015

GitHub stats

Page 15: Bay Area Apache Flink Meetup Community Update August 2015

Upcoming

• Sept 15: Washington DC Area Apache Flink Meetup
• Sept 17: StreamProcessing.be meetup
• Sept 28-30: Flink Talks at ApacheCon Big Data Budapest

New Meetup groups:

• New York
• Boston

Page 16: Bay Area Apache Flink Meetup Community Update August 2015

Flink Forward schedule published

• http://flink-forward.org/?post_type=day
• Talks by Google, Data Artisans, Huawei, CapitalOne, Bouygues, Ericsson, Amadeus, ResearchGate, RedHat, and many more.

50% off for this meetup's guests

FlinkMeetupBayArea50