Efficient Behavior Targeting Using SVM Ensemble Indexing
Jun Li, Peng Zhang, Yanan Cao, Ping Liu, Li Guo
Chinese Academy of Sciences; State Grid Energy Institute, China

Jan 05, 2016

Page 1:

Jun Li, Peng Zhang, Yanan Cao, Ping Liu, Li Guo
Chinese Academy of Sciences
State Grid Energy Institute, China

Efficient Behavior Targeting Using SVM Ensemble Indexing

Page 2:

Behavior Targeting (BT) uses users’ historical behavior data to select the most relevant ads for display.

[Figure: example from Yahoo! Research. Labels: behavior targeting, ads, user behavior data, targeted users.]

Page 3:

Regression for BT

Poisson regression model (Ye Chen, eBay, 2009).
x: ad clicks and views, page views, search queries and clicks.
y: click-through rate (CTR).

Ye Chen et al., Large-scale behavior targeting (KDD’09 best paper award)

[Figure: view data and click data for each ad category. Each follows a Poisson distribution, with one Poisson regression on views and one on clicks.]

Page 4:

Limitations

Limitations of the regression approach:
– Parameter tuning is very difficult.
– The Poisson assumption does not always hold for real-world behavior data.
– Clicks are typically several orders of magnitude fewer than views.
– User interests are not fixed; they are often transient.

Page 5:

Classification for BT

SVM for classification

Example 1: 3 users on ad a from Nikon (www.nikon.com)

[Figure: view data and click data for each ad category are split into view-and-click data (+) and view-but-no-click data (-), which are used to train an SVM for classification.]
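The labeling rule on this slide can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name `label_users` and the per-user count dictionaries are hypothetical and are not taken from the authors' source code.

```python
def label_users(views, clicks):
    """Label each user who viewed one ad.

    views, clicks: hypothetical {user_id: count} dictionaries for a single ad.
    Returns {user_id: +1} for view-and-click users and {user_id: -1} for
    view-but-no-click users. Users with no views are left out.
    """
    labels = {}
    for user, n_views in views.items():
        if n_views <= 0:
            continue  # the user never saw the ad, so there is no training label
        labels[user] = 1 if clicks.get(user, 0) > 0 else -1
    return labels


# Three users on one ad, in the spirit of Example 1 (the counts are made up).
views = {"u1": 3, "u2": 5, "u3": 2}
clicks = {"u1": 1}
print(label_users(views, clicks))
```

The labeled users, together with their behavior features, would then form the positive and negative training sets for the SVM.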

Challenges 1,2,3

Page 6:

Classification for BT

Ensemble SVM on data streams

Merits
– No complicated parameters
– No statistical assumptions
– Dynamic model on data streams
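A minimal sketch of an ensemble kept over a data stream, assuming a fixed-size window of the W most recent classifiers and a simple majority vote. The class name `StreamEnsemble`, the window policy, and the voting rule are assumptions made for illustration; the slides do not specify how members are weighted or retired.

```python
from collections import deque


class StreamEnsemble:
    """Keep the `size` most recent classifiers and predict by majority vote."""

    def __init__(self, size):
        # A deque with maxlen drops the oldest classifier automatically.
        self.members = deque(maxlen=size)

    def add(self, classifier):
        """Add a classifier trained on the latest data chunk.

        `classifier` is any callable that maps a feature record to +1 or -1.
        """
        self.members.append(classifier)

    def predict(self, x):
        """Return +1 or -1 by majority vote; ties go to +1."""
        votes = sum(clf(x) for clf in self.members)
        return 1 if votes >= 0 else -1


ensemble = StreamEnsemble(size=3)
for clf in (lambda x: -1, lambda x: -1, lambda x: 1):
    ensemble.add(clf)
print(ensemble.predict({"camera": 1.0}))
```

Because old members fall out of the window as new ones arrive, the model follows changes in user interest without retraining from scratch.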

Challenge 4

Page 7:

Limitations

Online ensemble prediction is time-consuming.

Time cost: A (advertisers) × W (ensemble size) × N (support vectors) × T (features)

Example 2: With W = 10, 2 million behavior events collected in 1 minute take 53 minutes to predict.
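To see where the A × W × N × T product comes from, here is a rough sketch of prediction without any index, assuming linear-kernel SVMs stored with dense support vectors. The data layout and the name `naive_predict` are hypothetical; the operation counter exists only to make the cost visible.

```python
def naive_predict(x, ensembles):
    """Predict one label per advertiser without an index.

    x: dense list of T feature values.
    ensembles: {advertiser: [(bias, [(coef, support_vector), ...]), ...]}
               where coef stands for alpha_i * y_i and support_vector is a
               dense list of T values (an assumed layout).
    Returns (labels, ops), where ops counts the multiply-add steps.
    """
    ops = 0
    labels = {}
    for advertiser, classifiers in ensembles.items():      # A advertisers
        votes = 0
        for bias, support_vectors in classifiers:           # W classifiers
            score = bias
            for coef, sv in support_vectors:                # N support vectors
                for x_value, sv_value in zip(x, sv):        # T features
                    score += coef * x_value * sv_value
                    ops += 1
            votes += 1 if score >= 0 else -1
        labels[advertiser] = 1 if votes >= 0 else -1
    return labels, ops


# A = 2, W = 3, N = 4, T = 5, so the loop body runs 2 * 3 * 4 * 5 = 120 times.
x = [1.0] * 5
classifier = (0.0, [(1.0, [1.0] * 5)] * 4)
ensembles = {"a1": [classifier] * 3, "a2": [classifier] * 3}
labels, ops = naive_predict(x, ensembles)
print(ops)  # 120
```

Every incoming event pays this full product, which is why the cost grows quickly with the number of advertisers and the ensemble size.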

Page 8:

Solutions

Construct an index structure for the ensemble SVM.

Why does the index work?
– It trades space for time.
– Features are shared among multiple support vectors.
– Support vectors have a sparse structure.

[Figure: mapping between SVM and text retrieval. Features map to text terms, a support vector maps to a document, and an ensemble SVM maps to a document set.]

P. Zhang et al., knowledge index for online data streams (KDD 2011 & ICDM 2011)
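The document analogy can be made concrete with a short sketch: each sparse support vector is treated as a document, each nonzero feature as a term, and an inverted index maps a feature to the support vectors that contain it. The function name, the dictionary layout, and the feature names below are illustrative assumptions, not the authors' data structure.

```python
from collections import defaultdict


def build_inverted_index(support_vectors):
    """Map each feature to the support vectors in which it is nonzero.

    support_vectors: {sv_id: {feature: value}} in sparse form (assumed layout).
    Returns {feature: [(sv_id, value), ...]}.
    """
    index = defaultdict(list)
    for sv_id, features in support_vectors.items():
        for feature, value in features.items():
            if value != 0:
                index[feature].append((sv_id, value))
    return dict(index)


# Three sparse support vectors with made-up features. "camera" is shared by
# two of them, so it is stored once as a key with a two-entry postings list.
support_vectors = {
    "sv1": {"camera": 1.0, "lens": 2.0},
    "sv2": {"camera": 3.0},
    "sv3": {"travel": 1.0},
}
index = build_inverted_index(support_vectors)
print(index["camera"])  # [('sv1', 1.0), ('sv2', 3.0)]
```

Only features that appear in an incoming record need to be looked up, so zero entries and unrelated support vectors are never touched.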

Page 9:

The index structure

The SVM-index structure

Example 3: Based on Example 1, consider an SVM with 3 support vectors.

[Figure: the SVM-index structure, consisting of ensemble information, support vectors, and an inverted hashing table.]

Time complexity O(T)

Page 10:

The index structure

Operations
– Search: predict the label of each incoming user record x.
• Step 1: search for support vectors in the inverted indexes on the left.
• Step 2: calculate x’s class label.
– Insert: integrate new classifiers into the ensemble.
– Delete: drop outdated classifiers from the ensemble.
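The three operations can be sketched as a small Python class. This is a simplified illustration that assumes linear-kernel SVMs, sparse dictionary features, and an unweighted majority vote; the class name `EnsembleIndex` and its internal layout are hypothetical and may differ from the structure in the authors' source code.

```python
from collections import defaultdict


class EnsembleIndex:
    """Inverted index over the support vectors of several linear SVMs."""

    def __init__(self):
        # feature -> {(classifier_id, sv_position): coef * feature_value}
        self.postings = defaultdict(dict)
        self.bias = {}         # classifier_id -> bias term
        self.features_of = {}  # classifier_id -> features it touches

    def insert(self, clf_id, bias, support_vectors):
        """Add a classifier given as a list of (coef, {feature: value}),
        where coef stands for alpha_i * y_i."""
        self.bias[clf_id] = bias
        touched = set()
        for position, (coef, features) in enumerate(support_vectors):
            for feature, value in features.items():
                self.postings[feature][(clf_id, position)] = coef * value
                touched.add(feature)
        self.features_of[clf_id] = touched

    def delete(self, clf_id):
        """Remove an outdated classifier and all of its postings."""
        for feature in self.features_of.pop(clf_id):
            entries = self.postings[feature]
            for key in [k for k in entries if k[0] == clf_id]:
                del entries[key]
            if not entries:
                del self.postings[feature]
        del self.bias[clf_id]

    def search(self, x):
        """Predict +1 or -1 for a sparse record x = {feature: value}."""
        scores = dict(self.bias)
        # Step 1: look up only the features present in x.
        for feature, x_value in x.items():
            for (clf_id, _), weight in self.postings.get(feature, {}).items():
                scores[clf_id] += weight * x_value
        # Step 2: each classifier votes by the sign of its score; ties go to +1.
        votes = sum(1 if s >= 0 else -1 for s in scores.values())
        return 1 if votes >= 0 else -1


index = EnsembleIndex()
index.insert("c1", -0.5, [(1.0, {"camera": 1.0}), (-1.0, {"shoes": 1.0})])
index.insert("c2", -0.1, [(1.0, {"camera": 2.0, "lens": 1.0})])
print(index.search({"camera": 1.0}))  # 1
index.delete("c1")
```

Search cost depends on the number of nonzero features in x and the length of their postings lists, not on the total number of support vectors in the ensemble.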

Memory

See our source code.

Page 11:

Experiments

• Data sets
– Search engine data

• Comparisons
– Poisson
– E-SVM
– E-Index (our method)

Page 12:

Observations

Comparisons

E-Index has sub-linear prediction time

E-SVM consumes more memory

Page 13:

Comparisons

Ensemble models are more accurate than the Poisson regression model.

Page 14:

Comparisons

The index method can significantly improve efficiency, especially when the ensemble size is large.

Page 15:

Related Work

Behavior targeting
– Regression models vs. classification models

Stream indexing
– Boolean expression indexing in publish/subscribe systems

Ensemble models
– Concept drifting

Page 16:

Conclusions

Contributions
– Identify and address the prediction efficiency problem of ensemble models for behavior targeting.
– Convert the ensemble SVM model to a document set, and propose a new type of inverted text index structure to achieve sub-linear prediction time.

Future work
– Index more complicated SVM models with non-linear kernels.

Page 17:

For source code, visit our website: streamming.org/homepages/lijun.html

Questions?