VELOX: MODELS IN ACTION Presented by Dan Crankshaw [email protected] Henry Milner, Joseph Gonzalez, Peter Bailis, Haoyuan Li, Tomer Kaftan, Zhao Zhang, Ali Ghodsi, Michael Franklin, Michael Jordan, and Ion Stoica https://amplab.cs.berkeley.edu/projects/velox/
Page 1: Velox: Models in Action

VELOX: MODELS IN ACTION

Presented by Dan Crankshaw [email protected]

Henry Milner, Joseph Gonzalez, Peter Bailis, Haoyuan Li, Tomer Kaftan, Zhao Zhang, Ali Ghodsi, Michael Franklin, Michael Jordan, and Ion Stoica

https://amplab.cs.berkeley.edu/projects/velox/

Page 2: Velox: Models in Action

Data

Model

Predictions

Predict

Train

Observe

Well Studied

MODELS AT REST

Page 3: Velox: Models in Action

Data

Model

Predictions

Serving

Training

Feedback

Open Challenges

Page 4: Velox: Models in Action

Data

Model

Predictions

Serving

Training

Feedback

Open Challenges

Velox Model Management System

Page 5: Velox: Models in Action

Catify: Music for Cats

Page 6: Velox: Models in Action

Node.js App Server

Apache Web Server

MongoDB

Catify: Music for Cats

Page 7: Velox: Models in Action

MODELING TASK

Rating

Songs

Page 8: Velox: Models in Action

MODELING TASK

Ratings

Songs

Prediction

Page 9: Velox: Models in Action

Data

Model

Predictions

Serving

Training

Feedback

Page 10: Velox: Models in Action

Catify: Music for Cats

Tachyon + HDFS

Pipeline

CatID  Song  Score
1      16    2.1
1      14    3.7
3      273   4.2
4      14    1.9

Page 13: Velox: Models in Action

Pipeline

Tachyon + HDFS

Node.js App Server

Apache Web Server

MongoDB

Catify: Music for Cats

Page 14: Velox: Models in Action

Data

Model

Predictions

Serving

Training

Feedback

Page 15: Velox: Models in Action

Pipeline

Tachyon + HDFS

Node.js App Server

Apache Web Server

MongoDB

Catify: Music for Cats

Page 16: Velox: Models in Action

Tachyon + HDFS

Node.js App Server

NGINX

MongoDB

Materialize all predictions

Pipeline

Catify: Music for Cats

Page 17: Velox: Models in Action

Catify: Music for Cats

Songs

O(users + songs)

Users

Page 18: Velox: Models in Action

Songs

Users

O(users * songs)

Catify: Music for Cats
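
The two slides above contrast keeping a compact model, O(users + songs), with materializing a score for every pair, O(users * songs). A rough sketch of that arithmetic follows. The latent dimension `d` and the catalog sizes are made-up numbers for illustration and do not come from the talk.

```python
def factor_model_entries(num_users: int, num_songs: int, d: int) -> int:
    """Values stored when keeping one d-dimensional vector per user and per song."""
    return (num_users + num_songs) * d


def materialized_entries(num_users: int, num_songs: int) -> int:
    """Values stored when precomputing one score for every (user, song) pair."""
    return num_users * num_songs


# Hypothetical sizes chosen only for illustration.
users, songs, d = 1_000_000, 100_000, 50
compact = factor_model_entries(users, songs, d)
full = materialized_entries(users, songs)
print(compact, full, full // compact)
```

Under these assumed sizes the materialized table is more than a thousand times larger than the model it was derived from, which is the space inefficiency the talk goes on to list.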

Page 19: Velox: Models in Action

Pipeline

Tachyon + HDFS

Node.js App Server

NGINX

MongoDB

Catify: Music for Cats

Page 20: Velox: Models in Action

Pipeline

Tachyon + HDFS

Node.js App Server

NGINX

MongoDB

Training Data

Catify: Music for Cats

Page 21: Velox: Models in Action

Pipeline

Tachyon + HDFS

Node.js App Server

NGINX

MongoDB

Training Data

New Model

Catify: Music for Cats

Page 22: Velox: Models in Action

What’s wrong?

Page 23: Velox: Models in Action

1. Built from scratch for each application

What’s wrong?

Page 24: Velox: Models in Action

1. Built from scratch for each application

2. Different systems

What’s wrong?

Page 25: Velox: Models in Action

1. Built from scratch for each application

2. Different systems

3. Space inefficient

What’s wrong?

Page 26: Velox: Models in Action

1. Built from scratch for each application

2. Different systems

3. Space inefficient

4. Stale predictions

What’s wrong?

Page 27: Velox: Models in Action

1. Built from scratch for each application

2. Different systems

3. Space inefficient

4. Stale predictions

5. The T-Swift effect (sample bias)

What’s wrong?

Page 28: Velox: Models in Action

Pipeline

Tachyon + HDFS

Node.js App Server

NGINX

MongoDB

Training Data

New Model

Catify: Music for Cats

Page 29: Velox: Models in Action

Pipeline

Tachyon + HDFS

Web Application

Velox

The Missing Piece

Page 30: Velox: Models in Action

Data

Model

Predictions

Serving

Training

Feedback

Page 31: Velox: Models in Action

Tachyon + HDFS

Velox

The Missing Piece

Prediction Service

Model Manager

Web Application

Pipeline

Page 32: Velox: Models in Action

BENEFITS

Page 33: Velox: Models in Action

BENEFITS

1. Low-latency and scalable predictions as a service

Page 34: Velox: Models in Action

BENEFITS

1. Low-latency and scalable predictions as a service

2. Integrated approach leads to fresher, better predictions

Page 35: Velox: Models in Action

BENEFITS

1. Low-latency and scalable predictions as a service

2. Integrated approach leads to fresher, better predictions

3. Easy translation to production predictions

Page 36: Velox: Models in Action

BENEFITS

1. Low-latency and scalable predictions as a service

2. Integrated approach leads to fresher, better predictions

3. Easy translation to production predictions

4. Eases operational pain

Page 37: Velox: Models in Action

PERSONALIZED MODELING

Page 39: Velox: Models in Action

Rating = w_u · f(x; θ)

PERSONALIZED MODELING

Page 40: Velox: Models in Action

Shared Basis Feature Models

Rating = w_u · f(x; θ)

PERSONALIZED MODELING

Page 41: Velox: Models in Action

Shared Basis Feature Models

Personalized User Model

Rating = w_u · f(x; θ)

PERSONALIZED MODELING

Page 42: Velox: Models in Action

Shared Basis Feature Models: change slowly

Personalized User Model

Rating = w_u · f(x; θ)

PERSONALIZED MODELING

Page 43: Velox: Models in Action

Shared Basis Feature Models: change slowly

Personalized User Model: highly dynamic

Rating = w_u · f(x; θ)

PERSONALIZED MODELING

Page 44: Velox: Models in Action

Data

Model

Predictions

Serving

Training

Feedback

Page 45: Velox: Models in Action

VELOX

Pipeline

Tachyon + HDFS

Velox

Prediction Service

Model Manager

Web Application

Predictions as a service

Page 47: Velox: Models in Action

PREDICTION API

GET  /velox/catify/predict_top_k?userid=22&k=100

GET  /velox/catify/predict?userid=22&song=27632
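
As a small illustration of the two endpoints above, the sketch below builds the request URLs with the Python standard library. The host and port in `BASE_URL` are placeholders, since the slide shows only the request paths, and the helper names are hypothetical.

```python
from urllib.parse import urlencode

# Hypothetical host; the talk only shows the request paths.
BASE_URL = "http://localhost:8080"


def predict_url(model: str, userid: int, song: int) -> str:
    """URL for scoring a single (user, song) pair."""
    query = urlencode({"userid": userid, "song": song})
    return f"{BASE_URL}/velox/{model}/predict?{query}"


def predict_top_k_url(model: str, userid: int, k: int) -> str:
    """URL for requesting the k highest-scoring songs for a user."""
    query = urlencode({"userid": userid, "k": k})
    return f"{BASE_URL}/velox/{model}/predict_top_k?{query}"


print(predict_url("catify", 22, 27632))
print(predict_top_k_url("catify", 22, 100))
```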

Page 50: Velox: Models in Action

PREDICTIONS

def predict(u: UUID, x: Context)

w_u · f(x; θ)

Page 51: Velox: Models in Action

Look up user weight

PREDICTIONS

def predict(u: UUID, x: Context)

w_u · f(x; θ)

Page 52: Velox: Models in Action

Compute Features

Look up user weight

PREDICTIONS

def predict(u: UUID, x: Context)

w_u · f(x; θ)
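
The two steps named on this slide, looking up the user weight and computing features, can be sketched roughly as follows. The in-memory dictionary, the string user ID, and the toy `song_features` function are assumptions made for illustration. The talk does not show how Velox stores weights or what its feature functions look like.

```python
from typing import Dict, List

# Hypothetical in-memory stand-in for the per-user weight store.
user_weights: Dict[str, List[float]] = {"cat-22": [0.5, -1.0, 2.0]}


def song_features(song_id: int) -> List[float]:
    """Toy feature function f(x; θ); a real one would be learned offline."""
    return [1.0, float(song_id % 2), 0.5]


def predict(u: str, x: int) -> float:
    """Score song x for user u as the dot product w_u · f(x; θ)."""
    w_u = user_weights[u]      # look up user weight
    f_x = song_features(x)     # compute features
    return sum(w * f for w, f in zip(w_u, f_x))


print(predict("cat-22", 27632))
```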

Page 53: Velox: Models in Action

LOW-LATENCY PREDICTIONS

Velox

Tachyon

Partition  0

Velox

Tachyon

Partition  1

Velox

Tachyon

Partition  2

Partition users
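
One common way to partition users is to hash the user ID to a partition number, so that every request for a given user lands on the node holding that user's weights. The sketch below assumes this scheme. The slide says only "Partition users" and does not state how Velox assigns users to partitions.

```python
import hashlib

NUM_PARTITIONS = 3  # matches the three partitions drawn on the slide


def partition_for(userid: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a user ID to a partition with a stable hash (assumed scheme)."""
    digest = hashlib.sha256(userid.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


for uid in ["22", "23", "24"]:
    print(uid, "->", partition_for(uid))
```

A cryptographic hash is used here only because it is stable across processes, unlike Python's built-in `hash` for strings.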

Page 54: Velox: Models in Action

Compute Features

Look up user weight

PREDICTIONS

def predict(u: UUID, x: Context)

w_u · f(x; θ)

Page 55: Velox: Models in Action

LOW-LATENCY PREDICTIONS

Velox

Tachyon

Feature Cache

Page 56: Velox: Models in Action

LOW-LATENCY PREDICTIONS

Velox

Tachyon

Feature Cache

Features shared between users

Page 57: Velox: Models in Action

Data

Model

Predictions

Serving

Training

Feedback

Page 59: Velox: Models in Action

Pipeline

Tachyon + HDFS

Node.js App Server

NGINX

MongoDB

Catify: Music for Cats

Page 60: Velox: Models in Action

Pipeline

Tachyon + HDFS

Node.js App Server

NGINX

MongoDB

Training Data

Catify: Music for Cats

Page 61: Velox: Models in Action

SIMPLE EXPLORATION

Rating

Songs

Prediction

Page 62: Velox: Models in Action

SIMPLE EXPLORATION

Rating

Songs

Prediction

Epsilon-greedy
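
Epsilon-greedy selection serves the highest-scoring song most of the time and a uniformly random song with probability epsilon, so that songs the model currently ranks low still receive some feedback. A minimal sketch follows. The function name, the toy scores, and the choice of epsilon are illustrative assumptions.

```python
import random
from typing import Dict, Optional


def epsilon_greedy(predicted: Dict[int, float], epsilon: float,
                   rng: Optional[random.Random] = None) -> int:
    """Pick a song: explore uniformly with probability epsilon, else exploit."""
    rng = rng or random.Random()
    if rng.random() < epsilon:
        return rng.choice(sorted(predicted))    # explore: any song
    return max(predicted, key=predicted.get)    # exploit: highest predicted score


scores = {16: 2.1, 14: 3.7, 273: 4.2}  # toy predictions for one user
print(epsilon_greedy(scores, epsilon=0.0))  # always exploits
```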

Page 64: Velox: Models in Action

ACTIVE LEARNING

Rating

Songs

Prediction

Page 65: Velox: Models in Action

ACTIVE LEARNING: LinUCB

Rating

Songs

Prediction

Uncertainty

Li, L., Chu, W., Langford, J., & Schapire, R. E. (2010). A contextual-bandit approach to personalized news article recommendation. In WWW '10: Proceedings of the 19th International Conference on World Wide Web. New York, NY, USA: ACM. doi:10.1145/1772690.1772758

Page 66: Velox: Models in Action

ACTIVE LEARNING: LinUCB

Rating

Songs

Prediction

Look at upper confidence bound

Uncertainty

Li, L., Chu, W., Langford, J., & Schapire, R. E. (2010). A contextual-bandit approach to personalized news article recommendation. In WWW '10: Proceedings of the 19th International Conference on World Wide Web. New York, NY, USA: ACM. doi:10.1145/1772690.1772758
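
The idea of ranking by an upper confidence bound can be sketched in the style of LinUCB from the paper cited above: keep a ridge-regression estimate per user and add a bonus that grows with the uncertainty of the prediction. The class below is a simplified illustration under stated assumptions, not the Velox implementation. The class name, `alpha`, and the two-dimensional toy features are hypothetical, and the inverse matrix is maintained with the Sherman-Morrison identity to avoid needing a linear algebra library.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


class LinUCBUser:
    """Per-user ridge regression state with an upper-confidence-bound score."""

    def __init__(self, d: int, alpha: float = 1.0) -> None:
        self.alpha = alpha
        # A starts as the identity, so its inverse is the identity too.
        self.A_inv = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
        self.b = [0.0] * d

    def _A_inv_times(self, v: Vector) -> Vector:
        return [dot(row, v) for row in self.A_inv]

    def ucb(self, f_x: Vector) -> float:
        w = self._A_inv_times(self.b)                  # current weight estimate
        uncertainty = math.sqrt(dot(f_x, self._A_inv_times(f_x)))
        return dot(w, f_x) + self.alpha * uncertainty  # prediction + bonus

    def update(self, f_x: Vector, rating: float) -> None:
        # Sherman-Morrison: (A + f f^T)^-1 = A^-1 - (A^-1 f)(A^-1 f)^T / (1 + f^T A^-1 f)
        Af = self._A_inv_times(f_x)
        denom = 1.0 + dot(f_x, Af)
        d = len(f_x)
        self.A_inv = [[self.A_inv[i][j] - Af[i] * Af[j] / denom for j in range(d)]
                      for i in range(d)]
        self.b = [bi + rating * fi for bi, fi in zip(self.b, f_x)]


user = LinUCBUser(d=2)
seen, unseen = [1.0, 0.0], [0.0, 1.0]
before = user.ucb(seen)
user.update(seen, rating=0.0)
print(before, user.ucb(seen), user.ucb(unseen))
```

After one observation along the `seen` direction, its bonus shrinks while the `unseen` direction keeps its full bonus, so the unexplored song is preferred.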

Page 68: Velox: Models in Action

Data

Model

Predictions

Serving

Training

Feedback

Page 69: Velox: Models in Action

Pipeline

Tachyon + HDFS

Node.js App Server

NGINX

MongoDB

Velox

Catify: Music for Cats

Prediction Service

Model Manager

Page 70: Velox: Models in Action

Data

Model

Predictions

Serving

Training

Feedback

Mgmt.

Page 71: Velox: Models in Action

Data

Model

Predictions

Serving

Training

Feedback

Mgmt.

Realtime Learning

Page 72: Velox: Models in Action

Pipeline

Tachyon + HDFS

Node.js App Server

NGINX

MongoDB

Training Data

New Model

Catify: Music for Cats

Page 73: Velox: Models in Action

GET  /velox/catify/predict?userid=22&song=27632

GET  /velox/catify/predict_top_k?userid=22&k=100

USER-FACING API

Page 74: Velox: Models in Action

GET  /velox/catify/predict?userid=22&song=27632

GET  /velox/catify/predict_top_k?userid=22&k=100

USER-FACING API

POST  /velox/catify/observe?userid=22&song=27632&score=3.7

Page 75: Velox: Models in Action

ONLINE UPDATES

def observe(u: UUID, x: Context, y: Score)

w_u · f(x; θ)

Page 76: Velox: Models in Action

Update wu with new training point

ONLINE UPDATES

def observe(u: UUID, x: Context, y: Score)

w_u · f(x; θ)

Page 77: Velox: Models in Action

Basis functions stay fixed

Update wu with new training point

ONLINE UPDATES

def observe(u: UUID, x: Context, y: Score)

w_u · f(x; θ)
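
One way to update w_u from a single observation while the basis functions stay fixed is a gradient step on the squared prediction error. The sketch below assumes that rule. The slide does not say which update Velox applies, and the weight store, learning rate, and toy feature function are hypothetical.

```python
from typing import Dict, List

user_weights: Dict[str, List[float]] = {}  # hypothetical per-user weight store


def song_features(song_id: int) -> List[float]:
    """Fixed toy basis f(x; θ); it is not modified by observe()."""
    return [1.0, float(song_id % 2)]


def observe(u: str, x: int, y: float, learning_rate: float = 0.1) -> None:
    """Fold one (song, score) observation into w_u with a gradient step."""
    f_x = song_features(x)
    w_u = user_weights.setdefault(u, [0.0] * len(f_x))
    error = y - sum(w * f for w, f in zip(w_u, f_x))
    user_weights[u] = [w + learning_rate * error * f for w, f in zip(w_u, f_x)]


observe("cat-22", 27632, 3.7)
print(user_weights["cat-22"])
```

Only the entry for the observing user changes, which is what makes the update cheap enough to run at serving time.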

Page 78: Velox: Models in Action

Data

Model

Predictions

Serving

Training

Feedback

Mgmt.

Realtime Learning

Page 79: Velox: Models in Action

Data

Model

Predictions

Serving

Training

Feedback

Mgmt.

Realtime Learning + Offline Retraining

Page 80: Velox: Models in Action

Pipeline

Tachyon + HDFS

Node.js App Server

NGINX

MongoDB

Velox

Catify: Music for Cats

Prediction Service

Model Manager

Page 81: Velox: Models in Action

Data

Model

Predictions

Serving

Feedback

Velox Model Management System

Spark

Page 82: Velox: Models in Action

The future of research in scalable learning systems will be in the integration of the learning lifecycle:

Data

Model

Predictions

Serving

Training

Feedback

Page 83: Velox: Models in Action

SUMMARY

Page 84: Velox: Models in Action

• Model training and predictions rely on ad hoc, manual processes spread across multiple systems

SUMMARY

Page 85: Velox: Models in Action

• Model training and predictions rely on ad hoc, manual processes spread across multiple systems

• The Velox system automatically maintains multiple models while providing low-latency, scalable, and personalized predictions

SUMMARY

Page 86: Velox: Models in Action

• Model training and predictions rely on ad hoc, manual processes spread across multiple systems

• The Velox system automatically maintains multiple models while providing low-latency, scalable, and personalized predictions

• Velox is part of BDAS and is coming soon…

SUMMARY

Page 87: Velox: Models in Action

• Model training and predictions rely on ad hoc, manual processes spread across multiple systems

• The Velox system automatically maintains multiple models while providing low-latency, scalable, and personalized predictions

• Velox is part of BDAS and is coming soon…

• https://amplab.cs.berkeley.edu/projects/velox/

SUMMARY

Page 88: Velox: Models in Action

BACKUP MATERIAL

Page 89: Velox: Models in Action

RETRAIN OFFLINE

def retrainOffline(sc: SparkContext, trainingData: RDD)

w_u · f(x; θ)

Page 90: Velox: Models in Action

Retrain feature functions

RETRAIN OFFLINE

def retrainOffline(sc: SparkContext, trainingData: RDD)

w_u · f(x; θ)

Use Spark for batch retrain