Building a real-time, scalable and intelligent programmatic ad buying platform Martín Bonamico Juan Martín Pampliega
Jan 07, 2017

Page 1: Building a real-time, scalable and intelligent programmatic ad buying platform

Building a real-time, scalable and intelligent programmatic ad buying platform

Martín Bonamico

Juan Martín Pampliega

Page 2: Building a real-time, scalable and intelligent programmatic ad buying platform

Agenda

1. Jampp

2. Adtech, RTB, clicks, installs, events

3. Initial Architecture

4. Initial Architecture Characteristics

5. Evolution of Data Needs

6. New Data Infrastructure - Stream Processing

7. Key Takeaways

Page 3: Building a real-time, scalable and intelligent programmatic ad buying platform

Jampp and AdTech

Page 4: Building a real-time, scalable and intelligent programmatic ad buying platform

Jampp is a leading mobile app marketing and retargeting platform.

Founded in 2013, Jampp has offices in San Francisco, London, Berlin, Buenos Aires, São Paulo and Cape Town.

We help companies grow their business by seamlessly acquiring, engaging & retaining mobile app users.

Page 5: Building a real-time, scalable and intelligent programmatic ad buying platform

Jampp’s platform combines machine learning with big data for programmatic ad buying which optimizes towards in-app activity.

Our platform processes more than 200,000 RTB ad bid requests per second (17+ billion per day), which amounts to about 300 MB/s or 25 TB of data per day.
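The throughput figures above can be cross-checked with a little arithmetic. The sketch below assumes decimal units (1 MB = 10^6 bytes, 1 TB = 10^12 bytes) and derives an average request size that is not stated in the slides.

```python
# Back-of-the-envelope check of the quoted throughput figures.
REQUESTS_PER_SECOND = 200_000        # from the slide
BYTES_PER_SECOND = 300 * 10**6       # 300 MB/s, assuming decimal megabytes
SECONDS_PER_DAY = 24 * 60 * 60

requests_per_day = REQUESTS_PER_SECOND * SECONDS_PER_DAY
terabytes_per_day = BYTES_PER_SECOND * SECONDS_PER_DAY / 10**12
# Derived value, not a figure from the slides.
avg_request_bytes = BYTES_PER_SECOND / REQUESTS_PER_SECOND

print(f"{requests_per_day / 1e9:.2f} billion requests per day")
print(f"{terabytes_per_day:.2f} TB per day")
print(f"{avg_request_bytes:.0f} bytes per request on average")
```

Under these assumptions the numbers are mutually consistent: roughly 17 billion requests and about 25 TB per day.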

Page 6: Building a real-time, scalable and intelligent programmatic ad buying platform

How do programmatic ads work?

[Diagram: flow between the Source/Exchange, the Jampp Tracking Platform and the App Store / Google Play, with "Download App" and "App Install" steps and two postbacks.]

Page 7: Building a real-time, scalable and intelligent programmatic ad buying platform

RTB: Real Time Bidding

Page 8: Building a real-time, scalable and intelligent programmatic ad buying platform

Jampp Events

1. RTB:
   a. Auction: the exchange asks if we want to bid for the impression.
   b. Bid/Non-Bid: we answer with a bid price or a non-bid (in less than 80 ms).
   c. Impression: the ad is displayed to the user.

2. Non-RTB:
   a. Click: event that marks when the user clicks on the ad.
   b. Install: install of the app, registered on first app open.
   c. Event: in-app events such as purchase, view or favorite.
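As an illustration only, the event taxonomy above could be modeled as an enumeration. The names `EventType`, `RTB_EVENTS` and `is_rtb` are hypothetical and do not come from Jampp's code.

```python
from enum import Enum


class EventType(Enum):
    # RTB events
    AUCTION = "auction"
    BID = "bid"
    NON_BID = "non_bid"
    IMPRESSION = "impression"
    # Non-RTB events
    CLICK = "click"
    INSTALL = "install"
    IN_APP_EVENT = "event"


# The RTB group from the list above; everything else is non-RTB.
RTB_EVENTS = frozenset(
    {EventType.AUCTION, EventType.BID, EventType.NON_BID, EventType.IMPRESSION}
)


def is_rtb(event_type: EventType) -> bool:
    """Return True for events produced by the real-time bidding flow."""
    return event_type in RTB_EVENTS
```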

Page 9: Building a real-time, scalable and intelligent programmatic ad buying platform

Data @ Jampp

● Our platform started out using RDBMSs and a traditional Data Warehouse architecture on Amazon Web Services.

● Data grew exponentially and data needs became more complex.

● In the last year alone, in-app events grew by more than 2500% and RTB bids by more than 500%.

● This made us evolve our architecture to be able to effectively handle Big Data.

Page 10: Building a real-time, scalable and intelligent programmatic ad buying platform

Initial Data Architecture

Page 11: Building a real-time, scalable and intelligent programmatic ad buying platform

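[Diagram: Initial Jampp Infrastructure. Labels shown: Cupper nodes (C1, C2, ..., Cn) behind a Load Balancer; MySQL; PostgreSQL; bidder nodes (B1, B2, ..., Bn); Replicator; API (Pivots). Event labels: Click, Install, Event, ClickRedirect, Auctions, Bids, Impressions.]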

Page 12: Building a real-time, scalable and intelligent programmatic ad buying platform

Jampp Initial Systems: Bidder

● OpenRTB bidding system implementation that runs on 200+ virtual machines with 70 GB of RAM each.

● Strict latency requirements: less than 80 ms to answer a request.

● Written in Cython and uses ZMQ for communication.

● Heavy use of coherent caching to comply with latency requirements.

● Data is continually replicated and enriched from MySQL by the replicator process.
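A minimal sketch of how a bid decision might respect the 80 ms budget while relying only on locally cached data. The function name, the request and campaign fields, and the pricing rule are all hypothetical; the actual bidder is written in Cython and its logic is not described in the slides.

```python
import time

BID_DEADLINE_SECONDS = 0.080  # the 80 ms budget mentioned above


def decide_bid(request, campaign_cache, started_at, clock=time.monotonic):
    """Return a bid price, or None for a non-bid.

    campaign_cache is an in-memory dict, standing in for the replicated
    and enriched data, so no database round trip happens on the hot path.
    """
    campaign = campaign_cache.get(request["app_id"])
    if campaign is None or campaign["bid_price"] < request["floor_price"]:
        return None  # nothing to bid with, or our price is below the floor
    if clock() - started_at >= BID_DEADLINE_SECONDS:
        return None  # too late: the exchange would ignore the answer
    return campaign["bid_price"]
```

The clock is injectable so that the deadline branch can be exercised in tests without sleeping.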

Page 13: Building a real-time, scalable and intelligent programmatic ad buying platform

Jampp Initial Systems: Cupper

● Event tracking system written in Node.js.

● Tracks clicks, installs and in-app events (200+ million per day).

● Can be scaled horizontally (10 instances) and is located behind a load balancer (ELB).

● Uses a MySQL database to store attributed events and Kinesis to store organics.
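The split between attributed and organic events could look roughly like the routing function below. This is a sketch under assumptions: the field `attributed_click_id` and the destination labels are hypothetical, and Cupper itself is written in Node.js rather than Python.

```python
def destination_for(event):
    """Pick a store for a tracked event.

    Assumption: an event counts as attributed when it carries the id of
    the click it was matched to; everything else is treated as organic.
    """
    return "mysql" if event.get("attributed_click_id") else "kinesis"


def split_by_destination(events):
    """Group a batch of events by the store they should be written to."""
    batches = {"mysql": [], "kinesis": []}
    for event in events:
        batches[destination_for(event)].append(event)
    return batches
```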

Page 14: Building a real-time, scalable and intelligent programmatic ad buying platform

Jampp Initial Systems: API

● PostgreSQL is used as a Data Warehouse database, in addition to its use by the bidder.

● An API exposes the data for querying with a caching layer.

● Fact tables are maintained with hourly, daily and monthly granularity, and high-cardinality dimensions are removed from large fact tables for data older than 15 days.

● Data is continually aggregated through an aggregation process written in Python.
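The roll-up described above might be sketched as follows: hourly rows are summed into daily rows, and a high-cardinality dimension is dropped once the data is older than 15 days. The row layout and the choice of `device_model` as the high-cardinality dimension are assumptions made for illustration.

```python
from collections import Counter
from datetime import date, datetime, timedelta

RETENTION_DAYS = 15  # keep full detail only for the most recent 15 days


def roll_up_daily(hourly_rows, today):
    """Aggregate (hour, campaign, device_model, count) rows into daily totals.

    For days older than RETENTION_DAYS the device_model dimension is
    replaced by None, which collapses those rows into one per campaign.
    """
    cutoff = today - timedelta(days=RETENTION_DAYS)
    daily = Counter()
    for hour, campaign, device_model, count in hourly_rows:
        day = hour.date()
        dimension = device_model if day >= cutoff else None
        daily[(day, campaign, dimension)] += count
    return dict(daily)


rows = [
    (datetime(2017, 1, 6, 10), "c1", "modelA", 3),
    (datetime(2017, 1, 6, 11), "c1", "modelA", 2),
    (datetime(2016, 12, 1, 9), "c1", "modelA", 4),
    (datetime(2016, 12, 1, 10), "c1", "modelB", 1),
]
daily_totals = roll_up_daily(rows, today=date(2017, 1, 7))
```

Here the two recent rows keep their device model, while the two December rows merge into a single campaign-level row.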

Page 15: Building a real-time, scalable and intelligent programmatic ad buying platform

Evolution of the Data Architecture

Page 16: Building a real-time, scalable and intelligent programmatic ad buying platform

Emerging Needs

● Log forensics capabilities, as our systems and company scale and we integrate with more outside systems.

● More historical and granular data for advanced analytics and model training.

● The need arose to make the data readily available to systems outside the traditional RDBMS. Some of these systems are too demanding for an RDBMS to handle easily.

Page 17: Building a real-time, scalable and intelligent programmatic ad buying platform

[Diagram: Initial Evolution. Labels shown: Cupper nodes (C1, C2, ..., Cn) behind a Load Balancer; MySQL(Ruby); Click, Install, Event, ClickRedirect; ELB Logs; EMR - Hadoop Cluster (C1, C2, ..., Cn); AirPal.]

Page 18: Building a real-time, scalable and intelligent programmatic ad buying platform

New System Characteristics

● The new system was based on Amazon Elastic MapReduce (EMR).

● Data is imported hourly from RDBMSs with Sqoop.

● Logs are imported every 10 minutes from different sources to S3 tables.

● Facebook PrestoDB and Apache Spark are used for interactive log analysis and analytics.
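One common way to organize 10-minute log batches in S3 is a partitioned key prefix that Hive-style tables can read. The layout below is purely an assumption for illustration; the slides do not describe the actual key scheme.

```python
from datetime import datetime


def batch_prefix(source, ts):
    """Return a hypothetical S3 key prefix for the 10-minute batch containing ts."""
    bucket_minute = ts.minute - ts.minute % 10  # floor to the 10-minute mark
    return (
        f"logs/source={source}/dt={ts:%Y-%m-%d}/"
        f"hour={ts:%H}/minute={bucket_minute:02d}/"
    )


example_prefix = batch_prefix("elb", datetime(2017, 1, 7, 13, 47, 5))
```

With this layout a query engine can prune partitions by source, date and hour instead of scanning every log file.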

Page 19: Building a real-time, scalable and intelligent programmatic ad buying platform

New System Characteristics

● Scalable storage and processing capabilities using HDFS, YARN and Hive for ETLs and data storage.

● Connectors from different languages like Python, Julia and Java/Scala.

● Data archiving in S3 for long-term storage and to enable other data processing technologies.

Page 20: Building a real-time, scalable and intelligent programmatic ad buying platform

Aspects that needed improvement

● Data was still imported in batch mode. The delay for MySQL data was larger than with the Python replicator.

● EMR is not great for long-running clusters.

● The EMR cluster is not designed with strong multi-user capabilities. It is better to have multiple clusters with few users than a big one with many.

● Data was still being accumulated in RDBMSs (clicks, installs, events).

Page 21: Building a real-time, scalable and intelligent programmatic ad buying platform

Final stage of the evolution

● Real-time event processing architecture based on best practices for stream processing in AWS.

● Uses Amazon Kinesis for streaming data storage and AWS Lambda for data processing.

● DynamoDB and Redis are used for temporary data storage for enrichment and analytics.

● S3 gives us a source of truth for batch data applications, and Kinesis does the same for stream processing.
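A minimal sketch of a Lambda function consuming a Kinesis stream. It relies on the event shape Lambda delivers for Kinesis sources, where each record carries a base64-encoded payload under `record["kinesis"]["data"]`. The `CAMPAIGNS` dict is a stand-in for the DynamoDB/Redis lookup, and the field names are hypothetical.

```python
import base64
import json

# In-memory stand-in for the DynamoDB/Redis enrichment data (hypothetical).
CAMPAIGNS = {"c1": {"advertiser": "acme"}}


def enrich(event, lookup=CAMPAIGNS):
    """Merge campaign attributes into the event, if the campaign is known."""
    extra = lookup.get(event.get("campaign_id"), {})
    return {**event, **extra}


def handler(event, context):
    """Lambda entry point: decode each Kinesis record and enrich it."""
    enriched = []
    for record in event["Records"]:
        payload = base64.b64decode(record["kinesis"]["data"])
        enriched.append(enrich(json.loads(payload)))
    return enriched
```

A real function would write the enriched events to their destination instead of returning them; returning them keeps the sketch easy to test locally.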

Page 22: Building a real-time, scalable and intelligent programmatic ad buying platform

Our Real-Time Architecture

Page 23: Building a real-time, scalable and intelligent programmatic ad buying platform

Still, it isn’t perfect...

● There is no easy way to manage windows and out-of-order data with AWS Lambda.

● Consistency of DynamoDB and S3.

● Price of AWS managed services for large event volumes compared to self-maintained solutions.

● ACID guarantees of RDBMSs are not an easy thing to part with.

● SQL and indexes in RDBMSs make forensics easier.
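To illustrate what managing windows and out-of-order data involves, here is a small sketch of event-time tumbling windows with a watermark and a fixed allowed lateness. This is a generic technique in the spirit of the streaming articles listed in the references, not something the slides say Jampp implemented; the class name and constants are hypothetical.

```python
from collections import defaultdict

WINDOW_SECONDS = 60             # tumbling window size
ALLOWED_LATENESS_SECONDS = 120  # how far behind the newest event we still accept


class TumblingCounter:
    """Count events per event-time window, tolerating bounded disorder."""

    def __init__(self):
        self.open_windows = defaultdict(int)  # window start -> running count
        self.closed = {}                      # window start -> final count
        self.watermark = 0                    # newest event time minus lateness
        self.dropped = 0                      # events that arrived too late

    def add(self, event_time):
        window_start = event_time - event_time % WINDOW_SECONDS
        if window_start + WINDOW_SECONDS <= self.watermark:
            self.dropped += 1  # its window has already been finalized
            return
        self.open_windows[window_start] += 1
        self.watermark = max(self.watermark, event_time - ALLOWED_LATENESS_SECONDS)
        # Finalize every window that ends at or before the watermark.
        finished = [s for s in self.open_windows if s + WINDOW_SECONDS <= self.watermark]
        for start in finished:
            self.closed[start] = self.open_windows.pop(start)


counter = TumblingCounter()
for t in [10, 70, 5, 200, 250, 20]:  # 5 is late but accepted, 20 is too late
    counter.add(t)
```

Keeping this state across invocations is the hard part in a stateless function environment, which is the difficulty the first bullet points at.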

Page 24: Building a real-time, scalable and intelligent programmatic ad buying platform

Benefits of the Evolution

● Enables the use of stream processing frameworks to keep data as fresh as economically possible.

● Decouples data from processing to enable multiple Big Data engines running on different clusters/infrastructure.

● Easy on-demand scaling provided by AWS managed tools like AWS Lambda, Amazon DynamoDB and Amazon EMR.

● Monitoring, logs and alerts managed by Amazon CloudWatch.

Page 25: Building a real-time, scalable and intelligent programmatic ad buying platform

Big Data Technologies at Jampp

S3, HDFS, Hadoop/YARN, Lambda, DynamoDB

Page 26: Building a real-time, scalable and intelligent programmatic ad buying platform

Key Takeaways

● Ad tech is a technology-intensive market that exhibits the three Vs of Big Data (volume, velocity and variety).

● As the business's data needs grow in complexity, specialized data systems need to be put in place.

● Using technologies that are meant to scale easily and are managed by a third party can bring you peace of mind.

● Stream processing is fundamental in new Big Data projects.

● There is currently no one tool that clearly fulfills all the needs for scalable and correct stream processing.

Page 27: Building a real-time, scalable and intelligent programmatic ad buying platform

References

http://radar.oreilly.com/2015/08/the-world-beyond-batch-streaming-101.html

http://radar.oreilly.com/2015/08/the-world-beyond-batch-streaming-102.html

https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying

http://blog.confluent.io/2015/01/29/making-sense-of-stream-processing/

JAMPP - AGRANDA 2015

http://44jaiio.sadio.org.ar/sites/default/files/agranda14-30.pdf

Page 28: Building a real-time, scalable and intelligent programmatic ad buying platform

Questions?

geeks.jampp.com

We Are Hiring! - jampp.com/jobs.php
