Netflix API Crash Course: Building & Running the API in 30 minutes. Ben Schmaus, Netflix. May 2013, Gluecon. [email protected] @schmaus
Gluecon 2013 netflix api crash course

Dec 05, 2014

Presentation from Gluecon 2013 on building and running the Netflix API.
Transcript
Page 1: Gluecon 2013   netflix api crash course

Netflix API Crash Course: Building & Running the API in 30 minutes

Ben Schmaus, Netflix
May 2013, Gluecon

[email protected] @schmaus

Page 2: Gluecon 2013   netflix api crash course

Streaming TV Shows & Movies Globally

Page 3: Gluecon 2013   netflix api crash course

> 1000 Devices

Page 4: Gluecon 2013   netflix api crash course

1/3 of Internet at peak

Page 5: Gluecon 2013   netflix api crash course

Programmer not Distributor

Page 6: Gluecon 2013   netflix api crash course

More than 36 million subscribers in over 40 countries

Page 7: Gluecon 2013   netflix api crash course

How does the API fit into the picture?

Page 8: Gluecon 2013   netflix api crash course

Personalization Engine
User Info
Movie Metadata
Ratings
Similar Movies
Instant Queue
A/B Test Engine

API

Page 9: Gluecon 2013   netflix api crash course

Personalization Engine
User Info
Movie Metadata
Ratings
Similar Movies
Instant Queue
A/B Test Engine

API

Enable UX Innovation

Insulate from Failure
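
The two roles named on this slide (aggregating many backends behind one API, and insulating clients from backend failure) can be illustrated with a small sketch. It is not the Netflix implementation, which the deck says is built on RxJava and Hystrix; it uses Python's `asyncio` purely for illustration. The three `fetch_*` functions, `call_with_fallback`, and `build_home_screen` are hypothetical names, and the simulated outage is invented.

```python
import asyncio

# Hypothetical stand-ins for backend services; not real Netflix clients.
async def fetch_user_info(user_id):
    return {"user_id": user_id, "name": "example"}

async def fetch_ratings(user_id):
    raise TimeoutError("ratings service unavailable")  # simulated outage

async def fetch_similar_movies(user_id):
    return ["movie-a", "movie-b"]

async def call_with_fallback(coro, fallback, timeout=0.5):
    """Await one backend call; on timeout or error, return the fallback value."""
    try:
        return await asyncio.wait_for(coro, timeout)
    except Exception:
        return fallback

async def build_home_screen(user_id):
    """Fan out to several backends concurrently and merge into one response."""
    user, ratings, similar = await asyncio.gather(
        call_with_fallback(fetch_user_info(user_id), {}),
        call_with_fallback(fetch_ratings(user_id), []),
        call_with_fallback(fetch_similar_movies(user_id), []),
    )
    return {"user": user, "ratings": ratings, "similar": similar}

response = asyncio.run(build_home_screen(42))
print(response)
```

In this sketch the failing ratings call degrades to an empty list, so the device still gets a usable response instead of an error.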

Page 10: Gluecon 2013   netflix api crash course

> 2 Billion Requests per Day

Page 11: Gluecon 2013   netflix api crash course

Growth Over Time

Page 23: Gluecon 2013   netflix api crash course

Automation

Visibility

Operational awareness

Balance speed & quality

Page 24: Gluecon 2013   netflix api crash course

How's the API put together?

Page 25: Gluecon 2013   netflix api crash course

ELB
Routing Cluster
Mid-tier Services
Backend App Cluster
Backend App Cluster
+ API Layer

Page 27: Gluecon 2013   netflix api crash course

Inside an API

App Server

RxJava

Hystrix

Service Client 1 Service Client 2 Service Client N
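
To make the role of the Hystrix layer concrete, here is a minimal circuit-breaker sketch in Python. It is a simplified illustration of the general pattern, not the Hystrix API. The class name, its parameters (`failure_threshold`, `reset_after_s`), and the `flaky_ratings` service client are all assumptions made for this example.

```python
import time

class CircuitBreaker:
    """Simplified circuit breaker; illustrative only, not the Hystrix API."""

    def __init__(self, failure_threshold=3, reset_after_s=5.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, command, fallback):
        # While open, skip the backend entirely until the reset window passes.
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after_s:
                return fallback()
            self.opened_at = None  # half-open: allow one trial call
        try:
            result = command()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            return fallback()
        self.failures = 0
        return result

calls = []

def flaky_ratings():
    """Hypothetical service client call that always fails."""
    calls.append(1)
    raise ConnectionError("backend down")

breaker = CircuitBreaker(failure_threshold=2, reset_after_s=60.0)
results = [breaker.call(flaky_ratings, lambda: "default ratings") for _ in range(5)]
print(results, "backend calls:", len(calls))
```

After two failures the breaker opens, so the remaining three requests return the fallback without touching the failing backend.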

Page 28: Gluecon 2013   netflix api crash course

Hystrix

Rx + Java Service Layer

Service Client (provided JAR)

Application Service

/device/endpoint (provided script)

Service

UI Teams

Mid-tier Service Teams

API Team

Page 29: Gluecon 2013   netflix api crash course

Continually changing UI scripts and mid-tier services

Functionality, resiliency and performance drift over time

Page 30: Gluecon 2013   netflix api crash course

Deployment & Ops

Page 31: Gluecon 2013   netflix api crash course

REMOVE MANUAL WORK pushing code to multiple AWS regions/clusters

ENABLE RAPID DEPLOYMENT of code despite limited visibility into how it's changed

KEEP TEAM INFORMED about what's happening in prod

MITIGATE RISK of systemic failure

Page 32: Gluecon 2013   netflix api crash course

Tools

Page 33: Gluecon 2013   netflix api crash course

End-to-end Traceability Using Python/Java Glue

Page 34: Gluecon 2013   netflix api crash course

Code Flow

Page 35: Gluecon 2013   netflix api crash course

Run 1% of your traffic on the new code and see how it does

Page 36: Gluecon 2013   netflix api crash course

API ami-123    API ami-456

2xx, 4xx, 5xx, latency, busy threads, load, ...

Page 37: Gluecon 2013   netflix api crash course

Manually looking at graphs, SSH-ing into servers and grep-ing logs doesn't scale (although we used to do that)

Page 38: Gluecon 2013   netflix api crash course

Confidence score for each AMI based on comparison of 1000+ metrics
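
The slide does not say how the confidence score is computed. As one possible reading, the sketch below scores a canary AMI by the percentage of metrics that stay within a relative tolerance of the baseline AMI. The scoring rule, the 10% tolerance, the metric names, and all sample values are assumptions for illustration.

```python
def canary_score(baseline, canary, tolerance=0.10):
    """Return (score, regressions): the percentage of shared metrics where the
    canary stays within `tolerance` (relative) of the baseline, plus the names
    of the metrics that did not. The scoring rule itself is an assumption."""
    shared = sorted(set(baseline) & set(canary))
    if not shared:
        raise ValueError("no metrics in common")
    passed, failed = [], []
    for name in shared:
        base, new = baseline[name], canary[name]
        # Relative deviation; fall back to absolute when the baseline is zero.
        deviation = abs(new - base) / abs(base) if base else abs(new)
        (passed if deviation <= tolerance else failed).append(name)
    return 100.0 * len(passed) / len(shared), failed

# Invented sample metrics for a baseline AMI and a canary AMI.
baseline = {"2xx_rate": 990.0, "5xx_rate": 2.0, "latency_ms": 90.0, "busy_threads": 40.0}
canary = {"2xx_rate": 985.0, "5xx_rate": 9.0, "latency_ms": 93.0, "busy_threads": 41.0}

score, regressions = canary_score(baseline, canary)
print(score, regressions)
```

Here three of four metrics pass, giving a score of 75.0 with `5xx_rate` flagged. A real system comparing 1000+ metrics would presumably also weight metrics by importance, as the next slide suggests.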

Page 39: Gluecon 2013   netflix api crash course

Scannable visualization of metric space

More important

Less important

Page 40: Gluecon 2013   netflix api crash course

Cross-reference Jira, Link to code diffs

Page 41: Gluecon 2013   netflix api crash course

Track lib changes

Page 42: Gluecon 2013   netflix api crash course

Easy to access report artifacts for each AMI

Page 43: Gluecon 2013   netflix api crash course

Your basic red/black push

Page 48: Gluecon 2013   netflix api crash course

Doing red/black by hand for multiple clusters across multiple regions is not fun

Page 49: Gluecon 2013   netflix api crash course

Automate multi-cluster/region pushes

Page 50: Gluecon 2013   netflix api crash course

Automate multi-cluster/region pushes

Don't forget to automate rollbacks, too!
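
A rough sketch of what automating red/black pushes and rollbacks across clusters and regions could look like, using an in-memory model instead of real AWS calls. The `Cluster` class, the function names, the cluster name `api-prod`, the region `eu-west-1`, and the health check are hypothetical; the deck does not describe the actual tooling at this level.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Cluster:
    """In-memory stand-in for one cluster in one region (hypothetical model)."""
    name: str
    region: str
    active_ami: str
    standby_ami: Optional[str] = None
    history: list = field(default_factory=list)

def red_black_push(cluster, new_ami, is_healthy):
    """Bring up new_ami next to the old group, switch traffic, and roll back
    automatically if the health check fails. Returns True on success."""
    old_ami = cluster.active_ami
    cluster.history.append(f"launch {new_ami}")
    cluster.active_ami, cluster.standby_ami = new_ami, old_ami  # shift traffic
    cluster.history.append(f"traffic -> {new_ami}")
    if is_healthy(cluster):
        return True
    # Rollback: the old group was kept around, so this is only a traffic switch.
    cluster.active_ami, cluster.standby_ami = old_ami, None
    cluster.history.append(f"rollback -> {old_ami}")
    return False

def push_everywhere(clusters, new_ami, is_healthy):
    """Push cluster by cluster; stop at the first failure and undo earlier pushes."""
    done = []
    for cluster in clusters:
        if red_black_push(cluster, new_ami, is_healthy):
            done.append(cluster)
            continue
        for pushed in done:
            pushed.active_ami, pushed.standby_ami = pushed.standby_ami, None
            pushed.history.append(f"rollback -> {pushed.active_ami}")
        return False
    return True

clusters = [
    Cluster("api-prod", "us-east-1", "ami-123"),
    Cluster("api-prod", "eu-west-1", "ami-123"),
]
# Simulate a health check that fails only in the second region.
ok = push_everywhere(clusters, "ami-456", lambda c: c.region != "eu-west-1")
print(ok, [c.active_ami for c in clusters])
```

Because the old group stays available until the new one proves healthy, rolling back is a traffic switch and not a redeploy, which is what makes it cheap to automate.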

Page 51: Gluecon 2013   netflix api crash course

$Who, $What, $Where, $When

e.g., "bschmaus, ami-123, Sandbox Canary, 2013-05-06 19:05"

Latest prod change in chat topic
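
The who/what/where/when message could be produced by a few lines of glue like the following. The function names and the `publish` hook are hypothetical; how the chat topic is actually set is not described in the deck, so the sketch just prints.

```python
from datetime import datetime

def format_change(who, what, where, when):
    """Render a change event as 'who, what, where, when', as in the slide's example."""
    return f"{who}, {what}, {where}, {when:%Y-%m-%d %H:%M}"

def announce(change, publish=print):
    """Send the change text to a chat topic; `publish` is a placeholder hook."""
    publish(f"Latest prod change: {change}")

topic = format_change("bschmaus", "ami-123", "Sandbox Canary", datetime(2013, 5, 6, 19, 5))
announce(topic)
```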

Page 52: Gluecon 2013   netflix api crash course

Quickly see status of all clusters in a region

Page 53: Gluecon 2013   netflix api crash course

What the #%*! just happened!?

Page 54: Gluecon 2013   netflix api crash course

Historical & realtime metrics, sort realtime by error/request rate

Page 55: Gluecon 2013   netflix api crash course

Distributed grep + tail

2013-05-09.20:38:54 MX 200 us-east-1c i-1824cb73 i-1c61b77f prod NFPS3-001-8G50FJCX... 288404769389848058 90ms api-global.netflix.com GET /tvui/release/470/plus/pathEvaluator -
amazon.ami-id: ami-502eb039
amazon.availability-zone: us-east-1c
amazon.instance-id: i-1824cb73
amazon.instance-type: m2.2xlarge
amazon.local-ipv4: 10.6.213.112
amazon.public-hostname: ec2-54-243-4-69.compute-1.amazonaws.com
amazon.public-ipv4: 54.243.4.69
cookie_esn: NFPS3-001-8G50FJCX...
country: MX
currentTime: 1368131934468
duration-millis: 90
esn: NFPS3-001-8G50FJCX...
geo.city: CIUDADOBREGON...

$ ./simple_stream.py -f -q 'e["country"]=="MX" && e["esn"]==~/NFPS3.*/' -r us
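
The internals of `simple_stream.py` are not shown in the deck. The sketch below only illustrates the idea behind the query above: filter a stream of request events by field values. It assumes `==~` means a whole-string regex match (as in Groovy), and the helper names and sample events are invented.

```python
import re

def matches(event, country, esn_pattern):
    """True if the event's country equals `country` and its ESN fully matches
    the regex. Whole-string matching is an assumption about the `==~` operator."""
    return (
        event.get("country") == country
        and re.fullmatch(esn_pattern, event.get("esn", "")) is not None
    )

def stream_filter(events, predicate):
    """Yield only matching events, like `tail -f` piped through `grep`."""
    for event in events:
        if predicate(event):
            yield event

# Invented sample events; field names follow the log excerpt above.
events = [
    {"country": "MX", "esn": "NFPS3-001-EXAMPLE", "status": 200},
    {"country": "US", "esn": "NFPS3-001-EXAMPLE", "status": 200},
    {"country": "MX", "esn": "OTHER-DEVICE", "status": 500},
]

hits = list(stream_filter(events, lambda e: matches(e, "MX", r"NFPS3.*")))
print(hits)
```

Because `stream_filter` is a generator, the same predicate could be applied to a live feed merged from many instances, which is the "distributed" part of the slide's title.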

Page 56: Gluecon 2013   netflix api crash course

Aim for the haystack handing you the needle

Page 57: Gluecon 2013   netflix api crash course

Or at least be able to make smaller haystacks

Page 58: Gluecon 2013   netflix api crash course

Continuously experiment to make hard things easier

Page 59: Gluecon 2013   netflix api crash course

Even with the best tools, building software is hard work.

Great engineers build great software.

Page 60: Gluecon 2013   netflix api crash course

Want to help us build the API?

[email protected] @schmaus