Scalable stream processing with Apache Kafka and Apache Samza

Jan 22, 2018

Rnjai Lamba
Transcript

{
  eventType: PageViewEvent,
  timestamp: 1413215518,
  viewerId: 1234,
  sessionId: fa1afe101234deadbeef,
  pageKey: profile-view,
  viewedProfileId: 4321,
  trackingKey: invitation-email,
  …metadata about displayed content…
}

key = urn:linkedin:profile:1234
value = {
  eventType: ProfileEditEvent,
  timestamp: 1413215518,
  profile: {
    location: “Cambridge, UK”,
    industry: “Software”,
    positions: [ {job_title: “Author”, company: “O’Reilly”}, … ]
  }
}
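The keyed message above pairs a profile URN with the full profile contents. As a rough illustration only, here is a minimal Python sketch of how a consumer might fold such keyed messages into a table holding the latest value per key. The transcript does not show how the slides process these messages, so the "latest value wins" rule, the function name `latest_by_key`, the second profile key, and the later edit are all assumptions made for the example; no Kafka or Samza API is used.

```python
from typing import Dict, Iterable, Tuple

# A keyed message is modelled here as a plain (key, value) pair.
Message = Tuple[str, dict]


def latest_by_key(messages: Iterable[Message]) -> Dict[str, dict]:
    """Fold a sequence of keyed messages into a table of the latest value per key.

    Assumption: a later message for a key fully replaces any earlier one.
    """
    table: Dict[str, dict] = {}
    for key, value in messages:
        table[key] = value  # overwrite any earlier value for this key
    return table


# The first message mirrors the example above; the other two are hypothetical.
messages = [
    ("urn:linkedin:profile:1234", {
        "eventType": "ProfileEditEvent",
        "timestamp": 1413215518,
        "profile": {"location": "Cambridge, UK", "industry": "Software"},
    }),
    ("urn:linkedin:profile:5678", {  # hypothetical second profile
        "eventType": "ProfileEditEvent",
        "timestamp": 1413215530,
        "profile": {"location": "London, UK", "industry": "Publishing"},
    }),
    ("urn:linkedin:profile:1234", {  # hypothetical later edit of the first profile
        "eventType": "ProfileEditEvent",
        "timestamp": 1413215600,
        "profile": {"location": "London, UK", "industry": "Software"},
    }),
]

profiles = latest_by_key(messages)
```

After the fold, `profiles` holds one entry per profile URN, and the entry for `urn:linkedin:profile:1234` is the later of its two edits.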

References (fun stuff to read)

1.  Martin Kleppmann: “Designing data-intensive applications.” O’Reilly Media, to appear in 2015. http://dataintensive.net

2.  Jay Kreps: “Why local state is a fundamental primitive in stream processing.” 31 July 2014. http://radar.oreilly.com/2014/07/why-local-state-is-a-fundamental-primitive-in-stream-processing.html

3.  Jay Kreps: “I ♥︎ Logs.” O'Reilly Media, September 2014. http://shop.oreilly.com/product/0636920034339.do

4.  Nathan Marz and James Warren: “Big Data: Principles and best practices of scalable realtime data systems.” Manning MEAP, to appear January 2015. http://manning.com/marz/

5.  Jakob Homan: “Real time insights into LinkedIn's performance using Apache Samza.” 18 Aug 2014. http://engineering.linkedin.com/samza/real-time-insights-linkedins-performance-using-apache-samza

6.  Martin Kleppmann: “Moving faster with data streams: The rise of Samza at LinkedIn.” 14 July 2014. http://engineering.linkedin.com/stream-processing/moving-faster-data-streams-rise-samza-linkedin

7.  Praveen Neppalli Naga: “Real-time Analytics at Massive Scale with Pinot.” 29 Sept 2014. http://engineering.linkedin.com/analytics/real-time-analytics-massive-scale-pinot

8.  Shirshanka Das, Chavdar Botev, Kapil Surlaker, et al.: “All Aboard the Databus!,” at ACM Symposium on Cloud Computing (SoCC), October 2012. http://www.socc2012.org/s18-das.pdf

9.  Apache Samza documentation. http://samza.incubator.apache.org

10. Alan Woodward and Martin Kleppmann: “Samza-Luwak Proof of Concept.” 10 November 2014. https://github.com/romseygeek/samza-luwak
