Routing billions of events a day: How we do routing in Schibsted Carlos Manuel Duclos-Vergara, Staff Engineer
How we do routing in Schibsted Routing billions of events ... · GDPR and data collection 9 Legal basis for data collection 1. Consent 2. Processing obligation 3. Legal obligation

Jul 25, 2020

Page 1

Routing billions of events a day: How we do routing in Schibsted

Carlos Manuel Duclos-Vergara, Staff Engineer

Page 2

About me

Page 3

Agenda
• Schibsted
• A short story
• GDPR
• Pulse (our tracking solution)
  • Overview
  • Internals

Page 4

Schibsted

Page 5

Event generation

Page 6

Event routing

Page 7

Event dispatching

Page 8

Event consumption

Page 9

GDPR and data collection

Legal basis for data collection

1. Consent
2. Processing obligation
3. Legal obligation
4. Vital interest
5. Public interest
6. Legitimate interest

User rights

1. Data portability
2. Right to be forgotten

Page 10

End to end event processing solution

Page 11

Pulse ecosystem

Page 12

Lifetime of an event

Page 13

Side track: How much is 1 billion events
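As a rough feel for the scale in this slide's title, here is a back-of-the-envelope sketch. It assumes exactly one billion events spread evenly over a 24-hour day, which is an assumption made for illustration and not a figure from the talk; real traffic has peaks, so this is only an average.

```python
# Back-of-the-envelope arithmetic; the even spread over the day is an assumption.
EVENTS_PER_DAY = 1_000_000_000
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def average_events_per_second(events_per_day: int) -> float:
    """Average rate if the events were spread evenly over one day."""
    return events_per_day / SECONDS_PER_DAY


rate = average_events_per_second(EVENTS_PER_DAY)
print(f"{rate:,.0f} events per second on average")  # prints "11,574 events per second on average"
```

Every additional billion events per day adds roughly the same average rate again.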

Page 14

Common pipeline

Page 15

Batch pipeline

Page 16

Streaming pipeline

Page 17

Processing and routing internals

Page 18

Routing lib

Page 19

Processing: routing language

SinkName:
  eventType: event schema
  filter: inline || stored || null
  transform: stored || null
  SinkType:
    SinkDetails:

ProbeEvent-1:
  eventType: ProbeEvent
  kafka:
    topic: probe-topic
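To make the shape of a routing rule concrete, here is a minimal sketch of how a rule such as ProbeEvent-1 might be evaluated. It is not the actual routing lib: the `route` function, the `SINK_TYPES` tuple, the `"type"` field used to read an event's type, and the name-based lookup of stored filters and transforms are all hypothetical, introduced only for illustration. Inline filters are not modeled.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Event = Dict[str, Any]
Delivery = Tuple[str, str, Dict[str, Any], Event]

# The slide's example rule, written as a plain dict instead of a parsed config.
ROUTES: Dict[str, Dict[str, Any]] = {
    "ProbeEvent-1": {
        "eventType": "ProbeEvent",
        "filter": None,     # null: accept every event of this type
        "transform": None,  # null: forward the event unchanged
        "kafka": {"topic": "probe-topic"},
    },
}

# Hypothetical: only the sink type shown on the slide is modeled.
SINK_TYPES = ("kafka",)


def route(
    event: Event,
    routes: Dict[str, Dict[str, Any]],
    filters: Optional[Dict[str, Callable[[Event], bool]]] = None,
    transforms: Optional[Dict[str, Callable[[Event], Event]]] = None,
) -> List[Delivery]:
    """Return (sink name, sink type, sink details, payload) per matching rule."""
    filters = filters or {}
    transforms = transforms or {}
    deliveries: List[Delivery] = []
    for sink_name, rule in routes.items():
        # Assumption: the event carries its type in a "type" field.
        if event.get("type") != rule["eventType"]:
            continue
        predicate = filters.get(rule.get("filter"))
        if predicate is not None and not predicate(event):
            continue  # the stored filter rejected this event
        transform = transforms.get(rule.get("transform"))
        payload = transform(event) if transform is not None else event
        for sink_type in SINK_TYPES:
            if sink_type in rule:
                deliveries.append((sink_name, sink_type, rule[sink_type], payload))
    return deliveries


probe = {"type": "ProbeEvent", "senderId": 1, "sequenceNumber": 42}
for delivery in route(probe, ROUTES):
    print(delivery)
```

The function only decides where an event should go and returns that decision as data. Actually writing to the sink is left out, which keeps the sketch independent of any particular Kafka client.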

Page 20

Event formats: probe event

{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "allOf": [
    {
      "$ref": "base-routable-event.json#"
    }
  ],
  "description": "Events sent by Data Platform Probe to measure latencies and missing events in the pipeline",
  "id": "http://schema.schibsted.com/events/backend-probe-event.json#",
  "properties": {
    "senderId": {
      "description": "Sender ID, in case several instances of Probe is running",
      "type": "integer"
    },
    "sequenceNumber": {
      "description": "Probe sequence number",
      "type": "integer"
    },
    "timeSent": {
      "$ref": "../common-definitions.json#/definitions/timestamp",
      "description": "UTC timestamp of when the event is generated by Probe"
    }
  },
  "title": "BackendProbeEvevnt",
  "type": "object"
}
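The schema's description says probe events exist to measure latencies and missing events. A minimal sketch of how a consumer could use the three fields for that is shown here. The helper names are hypothetical, and because the timestamp definition in common-definitions.json is not shown, the sketch assumes timeSent is an ISO 8601 string with an explicit UTC offset.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Iterable, List


def find_missing(sequence_numbers: Iterable[int]) -> List[int]:
    """Return sequence numbers absent between the smallest and largest seen."""
    seen = set(sequence_numbers)
    if not seen:
        return []
    return [n for n in range(min(seen), max(seen) + 1) if n not in seen]


def missing_by_sender(events: Iterable[Dict[str, Any]]) -> Dict[int, List[int]]:
    """Group probe events by senderId and report the gaps for each sender."""
    by_sender: Dict[int, List[int]] = {}
    for event in events:
        by_sender.setdefault(event["senderId"], []).append(event["sequenceNumber"])
    return {sender: find_missing(seqs) for sender, seqs in by_sender.items()}


def latency_seconds(time_sent: str, received_at: datetime) -> float:
    """Seconds between the probe's timeSent and the moment it was received.

    Assumption: time_sent is ISO 8601 with an explicit offset, e.g. "+00:00".
    """
    sent = datetime.fromisoformat(time_sent)
    return (received_at - sent).total_seconds()


received = [
    {"senderId": 1, "sequenceNumber": 1, "timeSent": "2020-07-25T12:00:00+00:00"},
    {"senderId": 1, "sequenceNumber": 2, "timeSent": "2020-07-25T12:00:01+00:00"},
    {"senderId": 1, "sequenceNumber": 4, "timeSent": "2020-07-25T12:00:03+00:00"},
    {"senderId": 2, "sequenceNumber": 10, "timeSent": "2020-07-25T12:00:00+00:00"},
]
print(missing_by_sender(received))  # prints {1: [3], 2: []}
now = datetime(2020, 7, 25, 12, 0, 5, tzinfo=timezone.utc)
print(latency_seconds(received[0]["timeSent"], now))  # prints 5.0
```

Tracking gaps per senderId matters because, as the schema notes, several Probe instances may run at once, each with its own sequence.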

Page 21

JSLT: The magic sauce of processing

JSON query and transformation language

GitHub repo: https://github.com/schibsted/jslt

License: Apache 2.0

{
  "time": round(parse-time(.published, "yyyy-MM-dd'T'HH:mm:ssX") * 1000),
  "device_manufacturer": .device.manufacturer,
  "device_model": .device.model,
  "language": .device.acceptLanguage,
  "os_name": .device.osType,
  "os_version": .device.osVersion,
  "platform": .device.platformType,
  "user_properties": {
    "is_logged_in" : boolean(.actor."spt:userId")
  }
}
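For readers who do not know JSLT, the transform above can be read as the following Python sketch. It is an approximation and not how JSLT itself is implemented. It assumes JSLT's documented behavior that parse-time returns seconds since the epoch, that a missing path evaluates to null, and that object keys whose value is null are omitted. The helper names are hypothetical, and the time zone is assumed to be written as "Z" or as an offset that Python's %z directive accepts.

```python
from datetime import datetime
from typing import Any, Dict, Optional


def get_path(obj: Any, *keys: str) -> Any:
    """Follow keys through nested dicts, returning None if any step is missing."""
    for key in keys:
        if not isinstance(obj, dict):
            return None
        obj = obj.get(key)
    return obj


def to_epoch_millis(published: Optional[str]) -> Optional[int]:
    """Parse a timestamp such as 2020-07-25T12:00:00Z into epoch milliseconds."""
    if published is None:
        return None  # assumption: a missing timestamp yields no "time" key
    parsed = datetime.strptime(published, "%Y-%m-%dT%H:%M:%S%z")
    return round(parsed.timestamp() * 1000)


def transform(event: Dict[str, Any]) -> Dict[str, Any]:
    """Flatten selected device fields and derive the logged-in flag."""
    out = {
        "time": to_epoch_millis(event.get("published")),
        "device_manufacturer": get_path(event, "device", "manufacturer"),
        "device_model": get_path(event, "device", "model"),
        "language": get_path(event, "device", "acceptLanguage"),
        "os_name": get_path(event, "device", "osType"),
        "os_version": get_path(event, "device", "osVersion"),
        "platform": get_path(event, "device", "platformType"),
        "user_properties": {
            # Roughly boolean(...): a missing or empty user ID counts as false.
            "is_logged_in": bool(get_path(event, "actor", "spt:userId")),
        },
    }
    # Drop keys whose value is None, mirroring the omission of null values.
    return {key: value for key, value in out.items() if value is not None}


example = {
    "published": "2020-07-25T12:00:00Z",
    "device": {"manufacturer": "Acme", "osType": "Android"},
    "actor": {},
}
print(transform(example))
```

The JSLT version expresses the same mapping declaratively in a dozen lines, without the explicit path-walking and null-handling helpers the Python sketch needs.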

Page 22

Routing: batch

Page 23

Routing: streaming

Page 24

Lessons learned (so far…)
• Schemas and versions
• Backfilling and recovery
• Logging and metrics
• Auditing

Page 25

And finally

Page 26

Page 27

Extra

Page 28

About Schibsted

Page 29

Marketplaces

Page 30

News Media

Page 31

Some of our Next companies
