Processing Terabytes of Data in Real-Time
Anthony Watkins, Senior Director of Developer Relations
November 14, 2013
www.flurry.com
@flurrymobile @antwatkins

Flurry Analytic Backend - Processing Terabytes of Data in Real-time

Aug 11, 2014


Data & Analytics

Trieu Nguyen

 
Transcript
Page 1: Flurry Analytic Backend - Processing Terabytes of Data in Real-time

www.flurry.com

November 14, 2013

Anthony Watkins, Senior Director of Developer Relations

Processing Terabytes of Data in Real-Time

@flurrymobile

@antwatkins

Page 2: Flurry Analytic Backend - Processing Terabytes of Data in Real-time

www.flurry.com

Page 3: Flurry Analytic Backend - Processing Terabytes of Data in Real-time

Flurry is a leading mobile advertising and analytics provider

Publisher
Advertiser
Audience

AppCircle
• Applications: 10,000+
• Devices/month: 300M
• Conversions/month: 120M

AppSpot
• Applications: 2,500+
• Devices/month: 250M
• Impressions/month: 7.5B

Analytics
• Applications: 400,000
• Devices/month: 1.2B
• Data points/month: 1.9T

Page 4: Flurry Analytic Backend - Processing Terabytes of Data in Real-time

The Path to Real-Time Processing

Topics

• Why Flurry switched from a MapReduce framework to pipeline processing

• How Flurry uses Kafka in data processing

• Tuning of Kafka to work in Flurry’s environment

• Flurry monitoring and error handling of streams

Page 5: Flurry Analytic Backend - Processing Terabytes of Data in Real-time

The Why


Page 6: Flurry Analytic Backend - Processing Terabytes of Data in Real-time

Past Processing Model

[Diagram labels: Device Reports, NoSQL DataStore, Batch, Collectors, MapReduce (jobs), External Action]

Page 7: Flurry Analytic Backend - Processing Terabytes of Data in Real-time

Flurry Analytics MapReduce Architecture

[Architecture diagram. Labels: Agent Portal, Data Log Processor, Developer Portal, Metrics Computer, Web Layer, Metrics Processing, Jetty (×2), HTTP, Binary Encoded Data, Raw Data Log Archive, HDFS, HBase (×2), Hadoop/HBase, Hadoop Map/Reduce (×2), Metrics Table (Cube), Normalized Data Storage, User Profile Data, MySQL]

Page 8: Flurry Analytic Backend - Processing Terabytes of Data in Real-time

Data Collection and Processing in MR

Pros

[Diagram label: MapReduce (jobs)]

Page 9: Flurry Analytic Backend - Processing Terabytes of Data in Real-time

Data Collection and Processing in MR

Cons

[Diagram labels: Device Reports, MapReduce (jobs), Job Time, Startup Time]

Page 10: Flurry Analytic Backend - Processing Terabytes of Data in Real-time

Flurry Kafka

The Move to Kafka


Page 11: Flurry Analytic Backend - Processing Terabytes of Data in Real-time

About Kafka

Origin

[Timeline labels: November 2010, June 2011, November 2012]

Page 12: Flurry Analytic Backend - Processing Terabytes of Data in Real-time

About Kafka

[Diagram labels: Producer (×3), Kafka Broker, Consumer (×3)]

Page 13: Flurry Analytic Backend - Processing Terabytes of Data in Real-time

About Kafka

Kafka Broker

[Partition diagram]*

* Partition image courtesy of http://kafka.apache.org/images/log_anatomy.png
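The partition image credited above depicts a partition as an ordered, append-only sequence of messages, each addressed by an offset. A minimal Python sketch of that idea follows. The `Partition` class and its method names are hypothetical and are not Kafka's API, and the sequential message offsets are a simplification chosen for readability.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Partition:
    """Hypothetical in-memory stand-in for one partition: an append-only list."""
    messages: List[bytes] = field(default_factory=list)

    def append(self, message: bytes) -> int:
        """Append a message at the end and return the offset assigned to it."""
        self.messages.append(message)
        return len(self.messages) - 1

    def read(self, offset: int, max_messages: int = 10) -> Tuple[List[bytes], int]:
        """Return up to max_messages starting at offset, plus the next offset to read."""
        batch = self.messages[offset:offset + max_messages]
        return batch, offset + len(batch)


log = Partition()
offsets = [log.append(m) for m in (b"report-a", b"report-b", b"report-c")]

# A consumer that has already processed offset 0 resumes from offset 1.
batch, next_offset = log.read(offset=1)
```

Because writes only ever go to the end and readers only remember a position, a consumer can stop and later resume from its last offset without the log having to track per-message state.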

Page 14: Flurry Analytic Backend - Processing Terabytes of Data in Real-time

About Kafka

[Diagram labels: Producer 1, Producer 2, Producer N; Kafka Cluster with Broker 1 (P0, P2) and Broker 2 (P1, P3); Consumer Group with C1, C2, C3]
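Within a consumer group, each partition is read by one consumer at a time, so four partitions and three consumers cannot be split evenly. The sketch below shows one way to divide them, a range-style split over sorted names. The function name `assign_partitions` is hypothetical, and which consumer owns which partition in the slide's diagram is not recoverable from the transcript, so the result is illustrative only.

```python
from typing import Dict, List


def assign_partitions(partitions: List[str], consumers: List[str]) -> Dict[str, List[str]]:
    """Give each consumer a contiguous range of partitions.

    The first (len(partitions) % len(consumers)) consumers take one extra partition.
    """
    ordered_partitions = sorted(partitions)
    ordered_consumers = sorted(consumers)
    base, extra = divmod(len(ordered_partitions), len(ordered_consumers))

    assignment: Dict[str, List[str]] = {}
    start = 0
    for index, consumer in enumerate(ordered_consumers):
        count = base + (1 if index < extra else 0)
        assignment[consumer] = ordered_partitions[start:start + count]
        start += count
    return assignment


# Partition and consumer names taken from the slide's labels.
assignment = assign_partitions(["P0", "P1", "P2", "P3"], ["C1", "C2", "C3"])
```

Under this split, every partition has exactly one owner in the group, and adding a fourth consumer would bring the assignment to one partition each.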

Page 15: Flurry Analytic Backend - Processing Terabytes of Data in Real-time

Why Kafka for Flurry

[Diagram labels: Device Reports, MapReduce (jobs), Kafka, Startup Time]

Page 16: Flurry Analytic Backend - Processing Terabytes of Data in Real-time

Introducing the Data Log Consumer (DLC)

[Architecture diagram. Labels: Agent Portal, Data Log Consumer, Developer Portal, Metrics Computer, Web Layer, Metrics Processing, Jetty (×2), HTTP, Binary Encoded Data, Kafka, HDFS, HBase (×2), Hadoop/HBase, Hadoop Map/Reduce, Metrics Table (Cube), Normalized Data Storage, User Profile Data, MySQL]

Page 17: Flurry Analytic Backend - Processing Terabytes of Data in Real-time

Tuning Kafka for Flurry

Challenges

• ZooKeeper timeouts

• Completely async service

• Default fsync interval

• Commit threshold from local environments

Page 18: Flurry Analytic Backend - Processing Terabytes of Data in Real-time

How Flurry Uses Kafka

Infrastructure and Setup

[Diagram labels: Consumer Group (C1, C2, C…, C325); Kafka Cluster (Brokers B1, B2, B3); Topic (P1, P2, P…, P400)]
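The slide's numbers, 400 partitions and 325 consumers, imply an uneven load if every consumer is active and partitions are spread as evenly as possible. Both conditions are assumptions here, since the transcript does not say how Flurry assigns partitions. A short worked calculation in Python:

```python
partitions = 400  # P1 ... P400, from the slide
consumers = 325   # C1 ... C325, from the slide

# Assumption: partitions are spread as evenly as possible across all consumers.
base, extra = divmod(partitions, consumers)

consumers_with_extra = extra             # these own (base + 1) partitions each
consumers_with_base = consumers - extra  # these own base partitions each
total = consumers_with_extra * (base + 1) + consumers_with_base * base
```

Under those assumptions, 75 consumers would own two partitions each and the remaining 250 would own one, which accounts for all 400 partitions.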

Page 19: Flurry Analytic Backend - Processing Terabytes of Data in Real-time

Flurry Monitoring / Error Handling

Monitoring


• Alerts

• Consumer Failure

• Broker Failure

Error Handling

Page 20: Flurry Analytic Backend - Processing Terabytes of Data in Real-time

Next Steps: 0.8

[Diagram labels: Data Log Consumer (×2), Kafka (×2), HDFS; Kafka Cluster with Broker 1 (P0, P2) and Broker 2 (P1, P3), plus P1’, P3’, P0’, P2’]

Page 21: Flurry Analytic Backend - Processing Terabytes of Data in Real-time

Next Steps: Extended Pipeline

[Diagram labels: Input Data, NoSQL DataStore, Real-Time, Batch, Collectors, Consumer/Producer Systems, MapReduce (jobs), External Action (×2)]

Page 22: Flurry Analytic Backend - Processing Terabytes of Data in Real-time

Next Steps: Topics and Consumer Groups

Infrastructure and Setup

[Diagram labels: Topic 1, Topic 2; Consumer Group 1 (C1, C2, C…, CN); Consumer Group 2 (C1’, C2’, C…, CN’); Consumer Group N (C1’’, C2’’, C…, CN’’)]
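Several consumer groups can read the same topic because each group keeps its own read position. The sketch below illustrates that with a hypothetical single-partition `Topic` class. The class, its methods, and the group names are assumptions for illustration, not Kafka's API, and which groups read which topic on the slide is not recoverable from the transcript.

```python
from collections import defaultdict
from typing import Dict, List


class Topic:
    """Hypothetical single-partition topic that tracks one read offset per consumer group."""

    def __init__(self) -> None:
        self.messages: List[str] = []
        self.group_offsets: Dict[str, int] = defaultdict(int)

    def publish(self, message: str) -> None:
        """Append a message to the end of the topic."""
        self.messages.append(message)

    def poll(self, group: str) -> List[str]:
        """Return the messages this group has not yet seen and advance only its offset."""
        start = self.group_offsets[group]
        batch = self.messages[start:]
        self.group_offsets[group] = len(self.messages)
        return batch


topic = Topic()
topic.publish("event-1")
topic.publish("event-2")
first_for_group_1 = topic.poll("group-1")

topic.publish("event-3")
second_for_group_1 = topic.poll("group-1")

# A group that starts later still sees the whole stream.
all_for_group_2 = topic.poll("group-2")
```

Since one group's progress does not affect another's, a new downstream system can be added as its own group without changing the existing consumers.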

Page 23: Flurry Analytic Backend - Processing Terabytes of Data in Real-time

www.flurry.com

November 14, 2013

[email protected]

blog.flurry.com

@flurrymobile

@antwatkins

Thank you