Spark and Spark Streaming at Netflix (Kedar Sadekar and Monal Daxini, Netflix)

Spark and Spark Streaming @ Netflix
Kedar Sadekar & Monal Daxini

Mission
•  Enable a rapid pace of innovation for algorithm engineers
•  Business value: more A/B tests

Experiments
•  Users with plays
•  Feature selection
•  Large sample size
•  Turn back time
•  Multiple ideas

Use Cases
•  Feature selection
•  Feature generation
•  Model training
•  Metric evaluation

Use Cases
•  More users
•  Faster iteration
•  Interactive
•  Turn back time

Solution: Netflix BDAS

Netflix BDAS
Notebooks
•  Zeppelin / iPython
•  Inline graphs / REPL

Prana Sidecar
•  Netflix ecosystem

Berkeley BDAS
•  Faster compute
•  Scale users
•  Access to S3 / Hive

Netflix BDAS - Features
•  Simplicity
   –  Individual cluster
•  Prana: Netflix ecosystem
   –  Automatic configuration
   –  Classloader isolation
   –  Discovery & healthcheck

Netflix BDAS - Features
•  Ad-hoc experimentation
•  Time machine functionality
•  Access to Hive data and microservices from a single place
   –  Access to multiple AWS (S3) buckets
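"Time machine" functionality means computing features as they looked on an earlier date. A minimal sketch, assuming features are stored as dated snapshots: an "as of" lookup returns the most recent snapshot taken on or before the requested day. The `SnapshotStore` class, its data layout, and the sample values are hypothetical, not Netflix's implementation.

```python
import bisect
from datetime import date


class SnapshotStore:
    """Hypothetical store of dated feature snapshots for 'turn back time' queries."""

    def __init__(self):
        self._days = []       # snapshot dates, kept sorted
        self._snapshots = {}  # date -> {member_id: features}

    def add(self, day, features):
        if day not in self._snapshots:
            bisect.insort(self._days, day)
        self._snapshots[day] = features

    def as_of(self, day):
        """Return the latest snapshot taken on or before `day`, or None if none exists."""
        i = bisect.bisect_right(self._days, day)
        if i == 0:
            return None
        return self._snapshots[self._days[i - 1]]


store = SnapshotStore()
store.add(date(2015, 5, 1), {"member_1": {"plays": 10}})  # made-up sample data
store.add(date(2015, 6, 1), {"member_1": {"plays": 14}})
print(store.as_of(date(2015, 5, 15)))  # {'member_1': {'plays': 10}}
```

Keeping the dates sorted makes each lookup a binary search, so replaying an experiment "as of" many past days stays cheap.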

Netflix BDAS – Sample Deployment

Wins
•  8x the number of users
•  5x to 9x faster
•  Interactive
•  Turn back time

Learnings
•  Easy to bring down an online system
   –  Almost killed thousands of ETL jobs (Hive metastore update)
•  Too many systems and configurations
•  Playing catch-up with libraries and tools
   –  Hadoop, iPython, Zeppelin
•  Scala / Spark learning curve
•  Debugging
   –  Open files, no resources, etc.

Increased Adoption
Adoption is increasing amongst teams:
•  Multiple algorithmic engineering teams
•  Personalization Infrastructure
•  Marketing
•  Security
•  A/B Test Engineering

Looking Ahead
•  SparkR / DataFrames support
•  Multi-tenancy
   –  Job-specific configurations
•  Debuggability
•  Newer notebooks
•  Spark Streaming
   –  Lambda Architecture
   –  Real-time algorithms (trending now)

Netflix Streaming Event Data Pipeline

Monal Daxini

Netflix Event Data Pipeline
[Figure: event streams move through three stages, Publish → Collect → Process (stream processing).]

Events @ Cloud Scale
•  450 billion events per day
•  Peak: 8 million events (17 GB) per second
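A back-of-the-envelope check of these figures, assuming decimal units (1 GB = 10^9 bytes) and that the 17 GB/s is measured at the same 8 million events/s peak:

```latex
\begin{aligned}
\bar{r} &= \frac{450 \times 10^{9}\ \text{events/day}}{86\,400\ \text{s/day}} \approx 5.2 \times 10^{6}\ \text{events/s} \\
\frac{r_{\text{peak}}}{\bar{r}} &\approx \frac{8 \times 10^{6}}{5.2 \times 10^{6}} \approx 1.5 \\
\bar{s}_{\text{peak}} &\approx \frac{17 \times 10^{9}\ \text{B/s}}{8 \times 10^{6}\ \text{events/s}} \approx 2.1 \times 10^{3}\ \text{B/event}
\end{aligned}
```

So the peak is roughly 1.5x the daily average rate, and an average event at peak is on the order of 2 KB.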

[Figure: Event Producer → Fronting Kafka → Consumer Kafka → Stream Consumers and Mantis, with S3 / EMR as a sink; at-least-once processing.]
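At-least-once processing means no event is lost, but an event may be delivered more than once after a failure. A minimal, self-contained sketch of one common way to get that guarantee with Kafka-style offsets: commit an offset only after its batch is fully processed, and resume from the last committed offset after a crash. The in-memory message list, the `offsets` dict, and the `run_consumer` helper are hypothetical stand-ins, not Netflix's pipeline code.

```python
def run_consumer(messages, offsets, process, batch_size=2):
    """Read from offsets['committed'] and commit only after each batch succeeds.

    If `process` raises, the commit for the current batch never happens, so a
    restart re-reads that batch: events may repeat, but none are skipped.
    """
    while offsets["committed"] < len(messages):
        start = offsets["committed"]
        batch = messages[start:start + batch_size]
        for msg in batch:
            process(msg)                           # side effects first...
        offsets["committed"] = start + len(batch)  # ...then commit


seen = []
crashed = set()


def flaky_process(msg):
    """Records every delivery; crashes the first time it sees 'c'."""
    seen.append(msg)
    if msg == "c" and msg not in crashed:
        crashed.add(msg)
        raise RuntimeError("consumer crashed")


messages = ["a", "b", "c", "d"]
offsets = {"committed": 0}  # stand-in for a durable offset store
for attempt in range(3):    # restart after a crash, resuming from the commit
    try:
        run_consumer(messages, offsets, flaky_process)
        break
    except RuntimeError:
        continue

print(seen)  # ['a', 'b', 'c', 'c', 'd']
```

Event "c" is processed twice; that duplicate is the price of never losing data, which is why downstream consumers usually need idempotent writes or de-duplication.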


What’s Missing?
•  Backpressure
•  Direct API for Kafka
•  Improve cloud multi-tenancy

Backpressure
[Figure: the Spark Streaming driver (JobScheduler, JobGenerator, DStreamGraph) generates a batch at every tick from 00:00 to 00:10; batches that are not drained in time pile up in an unbounded ops queue.]
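Why the ops queue grows without bound: if each batch takes longer to process than the batch interval, every new batch waits a little longer than the one before it. A minimal simulation, assuming batches run strictly one at a time; the `simulate_batches` helper is hypothetical.

```python
def simulate_batches(batch_interval, processing_time, num_batches):
    """Return each batch's scheduling delay (wait before it starts running).

    A batch is generated every `batch_interval` seconds and batches run one at
    a time, so a batch starts at the later of its creation time and the end of
    the previous batch.
    """
    delays = []
    prev_end = 0.0
    for i in range(num_batches):
        created = i * batch_interval
        start = max(created, prev_end)
        delays.append(start - created)
        prev_end = start + processing_time
    return delays


print(simulate_batches(1.0, 0.8, 5))  # [0.0, 0.0, 0.0, 0.0, 0.0]
print(simulate_batches(1.0, 1.5, 5))  # [0.0, 0.5, 1.0, 1.5, 2.0]
```

With 0.8 s of work per 1 s interval the system is stable; with 1.5 s of work the delay grows by 0.5 s every batch and never recovers, which is the situation backpressure is meant to prevent.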

Backpressure

•  SPARK-7398: Add backpressure to Spark Streaming
•  SPARK-6691: Add dynamic rate limiter to Spark Streaming
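The point of a dynamic rate limiter is that the ingestion cap can change while the stream is running instead of being fixed at startup. A minimal token-bucket sketch of that idea; the `DynamicRateLimiter` class and its injectable clock are hypothetical and are not Spark's implementation.

```python
class DynamicRateLimiter:
    """Token-bucket limiter whose rate can be changed while the stream runs.

    `clock` returns the current time in seconds; injecting it keeps the
    example deterministic.
    """

    def __init__(self, rate_per_sec, clock):
        self.rate = float(rate_per_sec)
        self.clock = clock
        self.tokens = 0.0
        self.last = clock()

    def _refill(self):
        now = self.clock()
        # Accrue tokens for elapsed time, allowing at most one second of burst.
        self.tokens = min(self.rate, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def set_rate(self, rate_per_sec):
        self._refill()  # settle time elapsed under the old rate first
        self.rate = float(rate_per_sec)
        self.tokens = min(self.tokens, self.rate)

    def try_acquire(self):
        """Admit one record if a token is available."""
        self._refill()
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


now = [0.0]
limiter = DynamicRateLimiter(10, clock=lambda: now[0])
now[0] = 1.0
first = sum(limiter.try_acquire() for _ in range(20))   # 10 records admitted
limiter.set_rate(2)                                      # e.g. backpressure lowers the rate
now[0] = 2.0
second = sum(limiter.try_acquire() for _ in range(20))  # 2 records admitted
print(first, second)  # 10 2
```

Refilling before changing the rate ensures the time already elapsed is credited at the old rate, so a rate change never retroactively grants or removes tokens.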

Backpressure
Backpressure implementation slated for the Spark 1.5 release
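Backpressure needs a way to decide what the new rate should be. A sketch of a PID-style estimator in the spirit of the SPARK-7398 work, assuming it is fed each completed batch's element count, processing delay, and scheduling delay, and returns a new rate for the limiter. The class, the gains (1.0 / 0.2 / 0.0), and the 100 elements/s floor are illustrative choices for this sketch, not a copy of Spark's code. In released Spark, backpressure is switched on with `spark.streaming.backpressure.enabled`.

```python
class PIDRateEstimator:
    """PID-style controller proposing a new ingestion rate (elements/s) per batch."""

    def __init__(self, batch_interval_ms, kp=1.0, ki=0.2, kd=0.0, min_rate=100.0):
        self.batch_interval_ms = batch_interval_ms
        self.kp, self.ki, self.kd = kp, ki, kd
        self.min_rate = min_rate
        self.first_run = True
        self.latest_time = -1
        self.latest_rate = -1.0
        self.latest_error = -1.0

    def compute(self, time_ms, num_elements, processing_delay_ms, scheduling_delay_ms):
        """Return a new rate after a completed batch, or None if no update is made."""
        if time_ms <= self.latest_time or num_elements <= 0 or processing_delay_ms <= 0:
            return None
        elapsed_s = (time_ms - self.latest_time) / 1000.0
        # Rate the job actually sustained on the last batch.
        processing_rate = num_elements / processing_delay_ms * 1000.0
        # Proportional: how far the current rate overshoots what was processed.
        error = self.latest_rate - processing_rate
        # Integral-like: backlog (scheduling delay) expressed as a rate.
        historical_error = scheduling_delay_ms * processing_rate / self.batch_interval_ms
        # Derivative: how fast the error is changing.
        d_error = (error - self.latest_error) / elapsed_s
        new_rate = max(
            self.latest_rate - self.kp * error - self.ki * historical_error - self.kd * d_error,
            self.min_rate,
        )
        self.latest_time = time_ms
        if self.first_run:
            # The first batch only seeds the controller with the observed rate.
            self.latest_rate, self.latest_error, self.first_run = processing_rate, 0.0, False
            return None
        self.latest_rate, self.latest_error = new_rate, error
        return new_rate


est = PIDRateEstimator(batch_interval_ms=1000)
print(est.compute(1000, 10_000, 1000, 0))     # None (seeds the rate at 10,000/s)
print(est.compute(2000, 10_000, 2000, 1000))  # 4000.0
```

In the second batch the job needed 2 s for 10,000 elements (5,000/s) and 1 s of backlog had already built up, so the estimator cuts the rate from 10,000/s to 4,000/s: below the observed processing rate, which lets the backlog drain.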

Enhance: Kafka Integration
•  Spark 1.3
•  Spark 1.2 receiver-based Kafka integration
•  2x faster
•  Prefetch messages
•  Connection reuse (pooling)
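The Direct API for Kafka (added in Spark 1.3) drops the long-running receiver: for each batch the driver chooses an offset range per Kafka partition, and executors read exactly those ranges. The setting `spark.streaming.kafka.maxRatePerPartition` caps how many messages each partition contributes per batch. A simplified sketch of that range calculation; the function name and dict-based inputs are hypothetical, and unlike Spark this version always applies the cap.

```python
def next_offset_ranges(current, latest, max_rate_per_partition, batch_interval_s):
    """Compute per-partition [from_offset, until_offset) ranges for the next batch.

    current: partition -> next offset to read
    latest:  partition -> latest offset available on the broker
    Each partition contributes at most max_rate_per_partition * batch_interval_s
    messages, mirroring the effect of spark.streaming.kafka.maxRatePerPartition.
    """
    cap = int(max_rate_per_partition * batch_interval_s)
    ranges = {}
    for partition, start in current.items():
        until = min(latest[partition], start + cap)
        ranges[partition] = (start, until)
    return ranges


ranges = next_offset_ranges(
    current={0: 100, 1: 500},
    latest={0: 5000, 1: 520},
    max_rate_per_partition=1000,
    batch_interval_s=2,
)
print(ranges)  # {0: (100, 2100), 1: (500, 520)}
```

Because each batch is defined by explicit offsets, a failed batch can be recomputed by re-reading the same ranges, which fits naturally with at-least-once processing.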

Cloud Multi-tenancy
[Figure: a Cloud Scheduler runs as a Mesos framework; each Mesos slave hosts Docker containers, each holding a Spark executor that runs tasks; the Spark driver also runs in a Docker container.]
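The slides do not say how these jobs were configured. As one hedged illustration, Spark (1.4 and later) exposes `spark.mesos.executor.docker.image` for the executor's Docker image, `spark.mesos.coarse` for coarse-grained Mesos mode, and `spark.cores.max` to cap each application's share of the cluster, which is one lever for sharing a cluster between tenants. The helper below only builds a `spark-submit` argument list; the helper itself, the master URL, the image name, and the jar are hypothetical placeholders.

```python
def spark_submit_args(master, docker_image, max_cores, app_jar, coarse=True):
    """Build a spark-submit argument list for Docker-based executors on Mesos.

    All values passed in are placeholders; only the property names are Spark's.
    """
    conf = {
        "spark.mesos.executor.docker.image": docker_image,  # image each executor runs in
        "spark.mesos.coarse": str(coarse).lower(),          # coarse-grained Mesos mode
        "spark.cores.max": str(max_cores),                  # cap this app's share of cores
    }
    args = ["spark-submit", "--master", master]
    for key in sorted(conf):
        args += ["--conf", f"{key}={conf[key]}"]
    args.append(app_jar)
    return args


cmd = spark_submit_args(
    master="mesos://mesos-master.example.com:5050",  # hypothetical Mesos master
    docker_image="example/spark-executor:latest",   # hypothetical image
    max_cores=8,
    app_jar="streaming-job.jar",                     # hypothetical application jar
)
print(" ".join(cmd))
```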

Improve
•  Measuring Spark Streaming latencies
•  Spark Streaming cloud multi-tenancy
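A batch's end-to-end latency is commonly split into scheduling delay (time spent waiting behind earlier batches) and processing delay (time spent running its jobs); Spark's `StreamingListener` reports per-batch scheduling, processing, and total delays through `BatchInfo`. A minimal sketch of that bookkeeping, with hypothetical field names and millisecond timestamps (here scheduling delay is measured from batch creation), plus a summary that flags batches whose processing exceeded the batch interval:

```python
from dataclasses import dataclass


@dataclass
class BatchTimes:
    """Hypothetical per-batch timestamps, in milliseconds."""
    batch_time: int        # when the batch was created (end of its interval)
    processing_start: int  # when the batch's first job started
    processing_end: int    # when the batch's last job finished

    @property
    def scheduling_delay(self) -> int:
        # Time spent queued behind earlier batches.
        return self.processing_start - self.batch_time

    @property
    def processing_delay(self) -> int:
        # Time spent actually running the batch's jobs.
        return self.processing_end - self.processing_start

    @property
    def total_delay(self) -> int:
        # End-to-end: scheduling delay + processing delay.
        return self.processing_end - self.batch_time


def summarize(batches, batch_interval_ms):
    """Return (worst total delay, count of batches slower than the interval)."""
    worst = max(b.total_delay for b in batches)
    slow = sum(1 for b in batches if b.processing_delay > batch_interval_ms)
    return worst, slow


batches = [
    BatchTimes(0, 100, 900),
    BatchTimes(1000, 1000, 2600),  # processing took 1.6 s > 1 s interval
    BatchTimes(2000, 2600, 3400),  # so this batch waits 600 ms to start
]
print(summarize(batches, 1000))  # (1600, 1)
```

A single slow batch shows up twice: as its own long processing delay and as scheduling delay for the batch behind it, which is why tracking both separately helps locate the cause.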
