Scylla Summit 2017: Streaming ETL in Kafka for Everyone with KSQL

Jan 22, 2018

Transcript
Page 1: Scylla Summit 2017: Streaming ETL in Kafka for Everyone with KSQL

Streaming ETL in Kafka for Everyone with KSQL

Software Engineer, Confluent Inc.

Hojjat Jafarpour

Page 2: Scylla Summit 2017: Streaming ETL in Kafka for Everyone with KSQL

Hojjat Jafarpour

Software Engineer at Confluent
○ Started the KSQL project at Confluent

Previously at Tidemark, Quantcast, Informatica and NEC Labs

PhD in Computer Science from UC Irvine
○ Data management, pub/sub and streaming

[email protected]

@hojjat

Page 3: Scylla Summit 2017: Streaming ETL in Kafka for Everyone with KSQL

Streaming ETL, with Apache Kafka and Confluent Platform

Page 4: Scylla Summit 2017: Streaming ETL in Kafka for Everyone with KSQL

Page 5: Scylla Summit 2017: Streaming ETL in Kafka for Everyone with KSQL

Page 6: Scylla Summit 2017: Streaming ETL in Kafka for Everyone with KSQL

Page 7: Scylla Summit 2017: Streaming ETL in Kafka for Everyone with KSQL

Page 8: Scylla Summit 2017: Streaming ETL in Kafka for Everyone with KSQL

Kafka Connect: Stream data in and out of Kafka

[Diagram label: Amazon S3]

Page 9: Scylla Summit 2017: Streaming ETL in Kafka for Everyone with KSQL

Single Message Transform (SMT)

▪ Modify events before storing in Kafka:
o Mask/drop sensitive information
o Set partitioning key
o Store lineage

▪ Modify events going out of Kafka:
o Route high priority events to faster data stores
o Direct events to different Elasticsearch indexes
o Cast data types to match destination
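
As a rough illustration of the per-message changes listed above, here is a minimal Python sketch. It is not the Kafka Connect SMT API (real transforms are Java classes configured on a connector); the record shape, the field names `user_id` and `card_number`, and the function name `transform` are all hypothetical.

```python
from typing import Any, Dict, Optional, Tuple

# Hypothetical record shape: (key, value), where value is a flat dict.
Record = Tuple[Optional[str], Dict[str, Any]]


def transform(record: Record, source: str) -> Record:
    """Apply three single-message changes before the event is stored."""
    key, value = record
    out = dict(value)  # copy so the incoming event is not mutated

    # Mask sensitive information: keep only the last four digits.
    if "card_number" in out:
        out["card_number"] = "****" + str(out["card_number"])[-4:]

    # Store lineage: remember which system the event came from.
    out["lineage"] = source

    # Set the partitioning key from a field of the value, if present.
    new_key = str(out["user_id"]) if "user_id" in out else key
    return new_key, out


if __name__ == "__main__":
    event = (None, {"user_id": 42, "card_number": "4111111111111111"})
    print(transform(event, source="orders-db"))
```

Each change touches one message at a time and needs no state, which is why joins, aggregations, and filters across messages call for a different tool.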

Page 10: Scylla Summit 2017: Streaming ETL in Kafka for Everyone with KSQL

But I need to join…aggregate…filter

Page 11: Scylla Summit 2017: Streaming ETL in Kafka for Everyone with KSQL

KSQL from Confluent

A Developer Preview of KSQL

An Open Source Streaming SQL Engine for Apache Kafka™

Page 12: Scylla Summit 2017: Streaming ETL in Kafka for Everyone with KSQL

KSQL: a Streaming SQL Engine for Apache Kafka™ from Confluent

▪ Enables stream processing with zero coding required

▪ The simplest way to process streams of data in real-time

▪ Powered by Kafka: scalable, distributed, battle-tested

▪ All you need is Kafka: no complex deployments of bespoke systems for stream processing

Page 13: Scylla Summit 2017: Streaming ETL in Kafka for Everyone with KSQL

KSQL: the Simplest Way to Do Stream Processing

CREATE TABLE possible_fraud AS
  SELECT card_number, count(*)
  FROM authorization_attempts
  WINDOW TUMBLING (SIZE 5 SECONDS)
  GROUP BY card_number
  HAVING count(*) > 3;

Page 14: Scylla Summit 2017: Streaming ETL in Kafka for Everyone with KSQL

KSQL Concepts

▪ STREAM and TABLE as first-class citizens

o Interpretations of topic content

▪ STREAM - data in motion

▪ TABLE - collected state of a stream

o One record per key (per window)

o Current values (compacted topic) ← Not yet in KSQL

▪ STREAM – TABLE Joins
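
The relationship between a STREAM and a TABLE can be sketched in a few lines of Python. This only illustrates the idea and is not how KSQL is implemented; the `(key, value)` event shape and both function names are assumptions made for the example.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical event shape: (key, value).
Event = Tuple[str, str]


def latest_values(stream: Iterable[Event]) -> Dict[str, str]:
    """Fold a stream into a table holding the current value per key."""
    table: Dict[str, str] = {}
    for key, value in stream:
        table[key] = value  # a later event for the same key overwrites
    return table


def count_per_key(stream: Iterable[Event]) -> Dict[str, int]:
    """Fold a stream into a table holding one aggregate record per key."""
    table: Dict[str, int] = {}
    for key, _value in stream:
        table[key] = table.get(key, 0) + 1
    return table


if __name__ == "__main__":
    events = [("alice", "US"), ("bob", "DE"), ("alice", "FR")]
    print(latest_values(events))
    print(count_per_key(events))
```

The stream keeps every event, while each table keeps exactly one record per key, which is the "collected state" the slide refers to.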

Page 15: Scylla Summit 2017: Streaming ETL in Kafka for Everyone with KSQL

Window Aggregations

Three types supported (same as KStreams):

● TUMBLING: Fixed-size, non-overlapping, gap-less windows
• SELECT ip, count(*) AS hits FROM clickstream
  WINDOW TUMBLING (size 1 minute) GROUP BY ip;

● HOPPING: Fixed-size, overlapping windows
• SELECT ip, SUM(bytes) AS bytes_per_ip_and_bucket FROM clickstream
  WINDOW HOPPING (size 20 second, advance by 5 second) GROUP BY ip;

● SESSION: Dynamically-sized, non-overlapping, data-driven window
• SELECT ip, SUM(bytes) AS bytes_per_ip FROM clickstream
  WINDOW SESSION (20 second) GROUP BY ip;
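
To make the three window types concrete, the following Python sketch shows which window(s) an event timestamp falls into. It is a simplified model rather than KSQL or Kafka Streams code: timestamps are plain integers in an arbitrary unit, the function names are hypothetical, and treating a gap exactly equal to the session timeout as part of the same session is an assumption.

```python
from typing import List, Tuple

Window = Tuple[int, int]  # (start, end); end is exclusive for tumbling/hopping


def tumbling_window(ts: int, size: int) -> Window:
    """Each timestamp belongs to exactly one fixed-size window."""
    start = ts - ts % size
    return (start, start + size)


def hopping_windows(ts: int, size: int, advance: int) -> List[Window]:
    """A timestamp belongs to every window of `size` that covers it;
    windows start at non-negative multiples of `advance`."""
    windows = []
    start = ts - ts % advance  # latest window start that is <= ts
    while start > ts - size and start >= 0:
        windows.append((start, start + size))
        start -= advance
    return sorted(windows)


def session_windows(timestamps: List[int], gap: int) -> List[Window]:
    """Group one key's timestamps into sessions separated by more than `gap`.
    Here a window is (first event, last event) of the session."""
    sessions: List[Window] = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][1] <= gap:
            sessions[-1] = (sessions[-1][0], ts)  # extend the open session
        else:
            sessions.append((ts, ts))  # start a new session
    return sessions


if __name__ == "__main__":
    print(tumbling_window(75, size=60))
    print(hopping_windows(27, size=20, advance=5))
    print(session_windows([0, 5, 30, 40, 100], gap=20))
```

With a size of 20 and an advance of 5, as in the HOPPING query above, most events are counted in four overlapping windows, whereas a TUMBLING window counts each event once.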

Page 16: Scylla Summit 2017: Streaming ETL in Kafka for Everyone with KSQL

Streaming ETL, powered by Apache Kafka and Confluent Platform

KSQL

Page 17: Scylla Summit 2017: Streaming ETL in Kafka for Everyone with KSQL

Simple Web Analytics Pipeline

● Pageview stream
● User table
● Materialized views
o Region visitor count
o Region visitor demography

CREATE STREAM pageviews (viewtime BIGINT, userid VARCHAR, pageid VARCHAR)
  WITH (kafka_topic='pageviews', value_format='JSON');

CREATE TABLE users (registertime BIGINT, gender VARCHAR, regionid VARCHAR, userid VARCHAR)
  WITH (kafka_topic='users', value_format='JSON');

Page 18: Scylla Summit 2017: Streaming ETL in Kafka for Everyone with KSQL

Simple Web Analytics Pipeline

Region visitor count

CREATE STREAM joined_pageviews AS
  SELECT users.userid AS userid, pageid, regionid, gender
  FROM pageviews LEFT JOIN users ON pageviews.userid = users.userid;

CREATE TABLE region_visitor_count AS
  SELECT regionid, COUNT(*) AS visit_count
  FROM joined_pageviews
  WINDOW TUMBLING (size 30 second)
  GROUP BY regionid;
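
What the two statements above compute can be approximated in plain Python. This is a batch-style sketch under simplifying assumptions, not the KSQL execution model: the users table is a static dict rather than a continuously updated table, `viewtime` (assumed to be in milliseconds) stands in for the record timestamp used for windowing, and the function names are hypothetical.

```python
from collections import Counter
from typing import Any, Dict, List, Optional, Tuple

Row = Dict[str, Any]


def left_join(pageviews: List[Row], users: Dict[str, Row]) -> List[Row]:
    """Enrich each pageview with the user's row; unmatched users give None."""
    joined = []
    for view in pageviews:
        user = users.get(view["userid"], {})
        joined.append({
            "userid": user.get("userid"),
            "pageid": view["pageid"],
            "regionid": user.get("regionid"),
            "gender": user.get("gender"),
            "viewtime": view["viewtime"],  # kept only to assign windows below
        })
    return joined


def region_visitor_count(
    joined: List[Row], size_ms: int = 30_000
) -> Dict[Tuple[Optional[str], int], int]:
    """Count joined rows per (regionid, tumbling window start)."""
    counts: Counter = Counter()
    for row in joined:
        window_start = row["viewtime"] - row["viewtime"] % size_ms
        counts[(row["regionid"], window_start)] += 1
    return dict(counts)


if __name__ == "__main__":
    users = {"u1": {"userid": "u1", "regionid": "r1", "gender": "F"}}
    views = [
        {"viewtime": 1_000, "userid": "u1", "pageid": "p1"},
        {"viewtime": 31_000, "userid": "u1", "pageid": "p2"},
    ]
    print(region_visitor_count(left_join(views, users)))
```

Because the join is a LEFT JOIN, pageviews from unknown users are kept and end up counted under a null region instead of being dropped.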

Page 19: Scylla Summit 2017: Streaming ETL in Kafka for Everyone with KSQL

Simple Web Analytics Pipeline

Region visitor demography

CREATE TABLE region_visitor_demo_count AS
  SELECT regionid, gender, COUNT(*) AS visit_count
  FROM joined_pageviews
  WINDOW TUMBLING (size 30 second)
  GROUP BY gender, regionid;

Page 20: Scylla Summit 2017: Streaming ETL in Kafka for Everyone with KSQL

Streaming ETL, powered by Apache Kafka and Confluent Platform

KSQL

Page 21: Scylla Summit 2017: Streaming ETL in Kafka for Everyone with KSQL

Confluent Platform: Enterprise Streaming based on Apache Kafka™

Database Changes | Log Events | IoT Data | Web Events | …

CRM | Data Warehouse | Database | Hadoop

Data Integration | Monitoring | Analytics | Custom Apps | Transformations | Real-time Applications

Apache Open Source | Confluent Open Source | Confluent Enterprise

Confluent Platform

Apache Kafka™: Core | Connect API | Streams API

Data Compatibility: Schema Registry

Monitoring & Administration: Confluent Control Center | Security

Operations: Replicator | Auto Data Balancing

Development and Connectivity: Clients | Connectors | REST Proxy | KSQL | CLI

Page 22: Scylla Summit 2017: Streaming ETL in Kafka for Everyone with KSQL

Date to remember

• Kafka Summit 2018
• April 23-24 in London!
• More details: https://kafka-summit.org/

Page 23: Scylla Summit 2017: Streaming ETL in Kafka for Everyone with KSQL

THANK YOU

[email protected]

@hojjat

Please stay in touch

Any questions?

https://github.com/confluentinc/ksql/

https://www.confluent.io/download/