Survival Analysis & TTL Optimization

Feb 25, 2016

morna

Survival Analysis & TTL Optimization. Rob Lancaster, Orbitz Worldwide.
Transcript

Slide 1

Survival Analysis & TTL Optimization
Rob Lancaster, Orbitz Worldwide

Outline
  The Problem
  Survival Analysis
    Intro
    Key Terms
    Techniques & Models:
      Kaplan-Meier Estimates
      Parametric Models
  Optimizing Cache TTL
    Methods
    Results

The Problem
The hotel rate cache and TTL optimization.

The Hotel Rate Cache
Key/Value Store
  Key: Search Criteria (hotel id, check-in, check-out, # people, # rooms, host)
  Value: Hotel Rate Information
Benefit = Reduced looks & latency
Cost = Increased re-price errors

The Hotel Rate Cache
Each cache entry is given a time-to-live (TTL).
TTLs were set based on intuition ages ago.
Goal: Optimize TTL to decrease looks and control re-price errors.
How? Ideally, find the greatest TTL value at which the probability of a rate change is below an acceptable threshold.
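The "How?" step can be sketched in a few lines of Python. This is only an illustration, not the method used in the talk: the function names, the candidate TTL values, and the exponential change-probability curve are all hypothetical, and the sketch assumes the probability of a rate change never decreases as the TTL grows.

```python
import math
from typing import Callable, Optional, Sequence


def max_safe_ttl(change_prob: Callable[[float], float],
                 candidate_ttls: Sequence[float],
                 threshold: float) -> Optional[float]:
    """Return the largest candidate TTL whose rate-change probability
    is at or below the threshold, or None if no candidate qualifies."""
    best = None
    for ttl in sorted(candidate_ttls):
        if change_prob(ttl) <= threshold:
            best = ttl
        else:
            # Assumption: change probability is non-decreasing in TTL,
            # so no larger candidate can qualify.
            break
    return best


def hypothetical_change_prob(ttl: float) -> float:
    """Made-up curve for illustration: P(rate changed within ttl)."""
    return 1.0 - math.exp(-0.01 * ttl)


# Hypothetical candidate TTLs (units are arbitrary here).
chosen = max_safe_ttl(hypothetical_change_prob, [1, 2, 5, 10, 30], threshold=0.05)
print(chosen)
```

In practice the change-probability curve would come from an estimated survival function S(t), as 1 - S(t).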

Survival Analysis
A brief? introduction.

What is Survival Analysis?
Statistical procedures for predicting time until an event occurs.
Event: death, relapse, recovery, failure.
Examples:
  Heart transplant patients: time until death.
  Leukemia patients in remission: time until relapse.
  Prison parolees: time until re-arrest.

Key Terms
  Survival Time, T vs. t
  Failure
  Censoring
  Survival Function

Censoring
Period of no information
  Left-censored.
  Right-censored.
Causes:
  Individual is lost to follow-up
  Death from cause unrelated to event of interest
  Study ends
Models assume either failure or censoring.

Survival Function
Survival Function: S(t)
Probability of survival greater than t, i.e. that T > t
Properties:
  Non-increasing
  S(t) = 1, for t = 0.
  S(t) = 0, for t = ∞.
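One way to picture right-censoring is as a flag attached to each observed duration. The sketch below is a hypothetical encoding, not something taken from the slides: the `Observation` type, its field names, and the `to_duration` helper are invented for illustration. A subject whose failure is not seen before the study ends contributes only the time it was known to survive.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Observation:
    start: float                 # time the subject entered the study
    event_time: Optional[float]  # time of failure, or None if never observed


def to_duration(obs: Observation, study_end: float) -> Tuple[float, bool]:
    """Return (duration, failed).

    failed is True when the failure was observed before the study ended,
    and False when the subject is right-censored at study_end.
    """
    if obs.event_time is not None and obs.event_time <= study_end:
        return obs.event_time - obs.start, True
    return study_end - obs.start, False


# A failure observed at time 4, and a subject still surviving at study end.
print(to_duration(Observation(start=0, event_time=4), study_end=10))
print(to_duration(Observation(start=3, event_time=None), study_end=10))
```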

Kaplan-Meier Estimates

  tj   mj   qj   nj
   0    0    0   14
   1    1    0   14
   2    1    1   13
   4    2    1   11
   6    0    2    8
   7    1    0    6
   9    1    0    5
  10    2    2    4

tj: observation time
mj: number of failures
qj: number of censored observations
nj: number at risk

Kaplan-Meier Estimates
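As a rough sketch, the Kaplan-Meier estimate can be computed from the tj, mj and nj columns of the table above using the standard product-limit formula, S(tj) = product over i <= j of (1 - mi/ni). The function name and data layout below are illustrative choices, not taken from the slides.

```python
from typing import Iterable, List, Tuple


def kaplan_meier(rows: Iterable[Tuple[float, int, int]]) -> List[Tuple[float, float]]:
    """Product-limit estimate of S(t).

    rows: (t_j, m_j, n_j) tuples sorted by t_j, where m_j is the number of
    failures at t_j and n_j is the number at risk just before t_j.
    Returns a list of (t_j, S(t_j)).
    """
    survival = 1.0
    estimates = []
    for t, failures, at_risk in rows:
        if at_risk > 0:
            survival *= 1.0 - failures / at_risk
        estimates.append((t, survival))
    return estimates


# (t_j, m_j, n_j) taken from the table; censored counts q_j only affect n_j.
table = [(0, 0, 14), (1, 1, 14), (2, 1, 13), (4, 2, 11),
         (6, 0, 8), (7, 1, 6), (9, 1, 5), (10, 2, 4)]

estimates = kaplan_meier(table)
for t, s in estimates:
    print(f"t={t:>2}  S(t)={s:.4f}")
```

Censored observations do not lower the estimate directly. They only reduce the number at risk at later times, which is why S(t) stays flat at t = 6.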

Parametric Models
Accelerated Failure Time
  Assume a distribution.
  Use regression to fit parameters.
  The distribution's parameter is expressed in terms of predictor variables and regression parameters.

Distribution    S(t)
Exponential
Weibull
Log-logistic
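The S(t) formulas for these three distributions did not survive in the transcript. For orientation only, the sketch below uses one common textbook parameterization, which may differ from the one on the original slide: exponential S(t) = exp(-lam*t), Weibull S(t) = exp(-lam*t^p), and log-logistic S(t) = 1 / (1 + lam*t^p). The parameter names `lam` and `p` are assumptions.

```python
import math


def exponential_survival(t: float, lam: float) -> float:
    """S(t) = exp(-lam * t), with rate lam > 0."""
    return math.exp(-lam * t)


def weibull_survival(t: float, lam: float, p: float) -> float:
    """S(t) = exp(-lam * t**p), with scale lam > 0 and shape p > 0."""
    return math.exp(-lam * t ** p)


def log_logistic_survival(t: float, lam: float, p: float) -> float:
    """S(t) = 1 / (1 + lam * t**p), with lam > 0 and shape p > 0."""
    return 1.0 / (1.0 + lam * t ** p)


# Illustrative parameter values only.
for t in (0.0, 1.0, 5.0):
    print(t,
          exponential_survival(t, lam=0.2),
          weibull_survival(t, lam=0.2, p=1.5),
          log_logistic_survival(t, lam=0.2, p=1.5))
```

All three satisfy S(0) = 1 and decrease toward 0 as t grows. With p = 1 the Weibull form reduces to the exponential.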

Optimizing Cache TTL
Methods and early results.

Data Collection
Data is collected from service hosts in our hotel stack.
Includes every live rate search (aka burst) performed by our hotel stack.
Raw data: ~200 GB, compressed, 10^8 records.
Extraction: