Atom Data Pipeline: Processing 200B Events with Node.js and Docker on AWS
Shimon Tolts, General Manager, Data Solutions
Apr 08, 2017
About ironSource: Hypergrowth
● 800M People Reached Each Month
● 4200 Apps Installed Every Minute with the ironSource Platform
● 200B Registered & Analyzed Data Events Every Month
[Chart: registered and analyzed data events per month, Jun 2015 to May 2016; y-axis from 0 to 200B]
Our Business Challenge
We needed a way to manage this data:
● Collect
● Process
● Store
Collection
● Multi-region layer with latency-based routing
● Low latency from client to Atom servers
● High availability: AWS regions do fail!
● Raw data and headers are stored upon receipt
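A minimal sketch of the "store raw data + headers upon receipt" idea, assuming a plain Node.js HTTP handler. The record field names and the `store` function are hypothetical and are not taken from the Atom code base; the point is only that the untouched payload is persisted before any parsing happens.

```javascript
// Build the record that is persisted as soon as a request arrives, before any
// parsing or enrichment. Field names are illustrative assumptions.
function toRawRecord(headers, bodyBuffer, receivedAt) {
  return {
    receivedAt: receivedAt.toISOString(),
    headers: { ...headers },
    // base64 keeps the payload byte-for-byte intact, even if it is not valid UTF-8
    body: bodyBuffer.toString('base64'),
  };
}

// Node.js HTTP handler: buffer the body, store the raw record, then acknowledge.
// `store` is a hypothetical async function (for example a write to a queue or S3).
function makeHandler(store) {
  return (req, res) => {
    const chunks = [];
    req.on('data', (chunk) => chunks.push(chunk));
    req.on('end', () => {
      const record = toRawRecord(req.headers, Buffer.concat(chunks), new Date());
      store(record)
        .then(() => { res.statusCode = 200; res.end(); })
        .catch(() => { res.statusCode = 500; res.end(); });
    });
  };
}
```

Usage: `require('http').createServer(makeHandler(store)).listen(8080)`. Keeping the raw record separate from later processing means a failed enrichment step can be replayed from the stored data.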
Data Enrichment
● Enrich data before storing it in your Data Lake and/or Warehouse
  ○ IP to Country
  ○ Currency conversion
  ○ Decrypt data
  ○ User Agent parsing - OS, Browser, Device...
● Any custom logic you would like! - fully extendible
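One way to picture the enrichment step is as a chain of small functions, each taking an event and returning an enriched copy. The sketch below is an assumption about the shape of such a pipeline, not Atom's actual API: the function names, event fields, and the static exchange-rate table are all made up for illustration, and the User Agent check is a deliberately naive regex.

```javascript
// Made-up static rates, for illustration only; a real service would look these up.
const RATES_TO_USD = { USD: 1, EUR: 1.25 };

// Adds `amountUsd` when the event carries a known currency and a numeric amount.
function convertCurrency(event) {
  const rate = RATES_TO_USD[event.currency];
  if (rate === undefined || typeof event.amount !== 'number') return event;
  return { ...event, amountUsd: event.amount * rate };
}

// Adds a coarse `os` field derived from the User-Agent string.
function parseUserAgent(event) {
  const ua = event.userAgent || '';
  let os = 'unknown';
  if (/Android/.test(ua)) os = 'Android';
  else if (/iPhone|iPad/.test(ua)) os = 'iOS';
  else if (/Windows/.test(ua)) os = 'Windows';
  return { ...event, os };
}

// Runs the enrichers in order; custom logic is just another function in the list.
function enrich(event, enrichers) {
  return enrichers.reduce((acc, fn) => fn(acc), event);
}
```

Usage: `enrich(event, [convertCurrency, parseUserAgent])`. Custom logic is added by appending another function to the list.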
Data Targets
● Near real-time data insertion - 1 minute!
● Stream data to Google Storage and/or AWS S3
● Smart insertion of data into AWS Redshift
  ○ Set the number of parallel COPY operations
  ○ Configure priority on tables
● BigQuery - Streaming data using batch file imports (saves 20% cost)
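The two Redshift settings above (parallel COPY count and table priority) can be pictured as a simple scheduling rule. The sketch below is an assumption about how such a rule might look, not the scheduler Atom uses: `planCopyWaves`, the `priority` field, and the "higher number runs first" convention are all hypothetical.

```javascript
// Orders tables by priority (higher first) and groups them into waves of at
// most `maxParallel` tables, so no more than that many COPYs run at once.
function planCopyWaves(tables, maxParallel) {
  if (!Number.isInteger(maxParallel) || maxParallel < 1) {
    throw new RangeError('maxParallel must be a positive integer');
  }
  // Copy before sorting so the caller's array is not mutated.
  const ordered = [...tables].sort((a, b) => b.priority - a.priority);
  const waves = [];
  for (let i = 0; i < ordered.length; i += maxParallel) {
    waves.push(ordered.slice(i, i + maxParallel).map((t) => t.name));
  }
  return waves;
}
```

Usage: with tables `revenue` (priority 10), `installs` (5), and `clicks` (1) and `maxParallel = 2`, the first wave loads `revenue` and `installs`, and `clicks` waits for the second wave.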
Micro-Services Architecture
● Everything is a service
● Decoupling
● Distributed systems
● Separate lifecycle
● Communication using RESTful APIs / Queues / Streams
Docker
● Linux Container
● Save provisioning time
● Infrastructure as code
● Dev-Test-Production - identical container
● Ship easily
Cloud infrastructure
● Pay as you go - (grow)
● SaaS services
● Auto-scaling groups
● DynamoDB
● RDS *SQL
● Redshift data warehouse
Continuous Integration
● From commit to production
● Jenkins commit hook
● Git branching model
● AWS dynamic slaves
● Unit tests
● Docker builds
● Updating live environment
Diagram
Starting Point
Pre-baked images - AMIs
Supervisor
Nginx reverse proxy
Node.js * cpu-count
Provisioning time * instances
Bash provisioning scripts
Minimum Viable Product
Infrastructure as code
Nginx
Node.js * cpu-count
Supervisor
Docker Hub
No Bash scripts!
No provisioning time * instances
https://github.com/ironSource/docker-config/blob/bb6be85b97132cbdd10084305ee1ee2f414b0b50/Dockerfile
Interactive Cycle
Nginx
Supervisor
Infrastructure as code
Node.js * cpu-count
Docker Hub
No Bash scripts!
No provisioning time * instances
https://github.com/ironSource/docker-config/blob/c4bbad11a323fd6e36ff31505c43e7c8dc51b1eb/Dockerfile-iojs-cluster
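The "Node.js * cpu-count" layout can be sketched with Node's built-in `cluster` module: one primary process forks one worker per CPU core. This is a minimal sketch under the assumption that the cluster module is what the linked "cluster" Dockerfiles refer to; the `start` and `workerCount` names and the trivial request handler are hypothetical.

```javascript
const cluster = require('cluster');
const http = require('http');
const os = require('os');

// One worker per CPU core reported by the OS (at least one).
function workerCount() {
  return Math.max(1, os.cpus().length);
}

// Hypothetical entry point; call start(port) from the container's main script.
function start(port) {
  // `isPrimary` replaced `isMaster` in newer Node.js releases; support both.
  const isPrimary =
    cluster.isPrimary !== undefined ? cluster.isPrimary : cluster.isMaster;

  if (isPrimary) {
    for (let i = 0; i < workerCount(); i++) cluster.fork();
    // Replace a worker that exits so the process count stays at cpu-count.
    cluster.on('exit', () => cluster.fork());
  } else {
    // Workers share the listening port; connections are distributed among them.
    http.createServer((req, res) => {
      res.end('ok\n');
    }).listen(port);
  }
}
```

Usage: calling `start(8080)` in the entry script starts the primary, which then forks the workers.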
User Data
https://github.com/ironSource/docker-config/blob/2f4ccc7c277850de928cc432f47b2fc58fb8732a/Dockerfile-nodejs-cluster
Docker Compose Example #1 (Using ‘extends’):
docker-common.yml
docker-compose.yml
https://stash.ironsrc.com/projects/INFRA-IB/repos/ironbeastcompserter/browse/docker-compose.yml
Docker Compose Example #2 (Using ‘links’):
User Data