AB Testing: Revolution through constant evolution

AB Testing at Expedia

Apr 15, 2017

Paul Lucas
Transcript
Page 1: AB Testing at Expedia

AB Testing

Revolution through constant evolution

Page 2: AB Testing at Expedia
Page 3: AB Testing at Expedia

Expedia SF, 114 Sansome

www.expedia.com

@expediaeng

Work with us: [email protected]

Page 4: AB Testing at Expedia

Paul Lucas, Sr Director, Technology. Want to visit next? Greece

Jeff Madynski, Director, Technology. Want to visit next? Croatia

Anuj Gupta, Sr Software Dev Engineer. Want to visit next? Peru

Page 5: AB Testing at Expedia

Revolution through constant evolution

Page 6: AB Testing at Expedia
Page 7: AB Testing at Expedia

Technology Evolution

V0 - Batch processing from Abacus exposure logs, Omniture, and booking datamart; Tableau visualization

V1 - Storm, Kestrel, DynamoDB / PostgreSQL reading UIS messages and client log data (Nov 2014 - Dec 2015)

V2 - Introduce Kafka and Cassandra (May 2016)

Page 8: AB Testing at Expedia

TNL – original solution
• Batch processing
• Tableau visualization
• Merged data from OMS/Omniture
• Problems:
– 1-2 day feedback loop: what if we had made mistakes in the test implementation (bucketing not as anticipated)?
– To fix data import errors, we had to start over again

Page 9: AB Testing at Expedia

TNL Dashboard v0

Omniture click data

Booking datamart

Abacus exposures

Tableau

Hadoop ETL

Page 10: AB Testing at Expedia

TNL v0 -> v1

Page 11: AB Testing at Expedia

Page 12: AB Testing at Expedia
Page 13: AB Testing at Expedia

TNL v1 Problems
• Database size 420 GB; queries took 3-5 minutes
• Data drop (Kestrel)
• Increase in data (multi-brand, more customers)

Page 14: AB Testing at Expedia

TNL v1 -> v1.1, v2
• Fighting fires, borrowing more time
• POC next

Page 15: AB Testing at Expedia

Fighting fires – borrowing more time

Page 16: AB Testing at Expedia

User Interaction Service (UIS) Traffic

Page 17: AB Testing at Expedia

Scaling messaging system

Kafka
• Publish-subscribe based messaging system
• Distributed and reliable
• Longer retention and persistence
• Monitoring dashboard and alerts
• Buffer for system downtime

Kestrel limitation
• Message durability is not available
• Reaching potential scalability issues
• Inactive open source project

Page 18: AB Testing at Expedia

Scaling database performance

• Database views for caching
– Views created every 6 hours
– UI only loads data from views
– Read-only replicas for select queries
• Archive data
– Moved old and completed experiment data to separate tables
– DB cleanup using vacuum and re-indexing
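The caching idea above (precompute on a schedule, let the UI read only the precomputed rows) can be sketched as follows. This is a minimal illustration, not the TNL implementation: it uses an in-memory SQLite database instead of PostgreSQL, and the table names `exposure_events` and `experiment_summary` and both function names are hypothetical.

```python
import sqlite3


def refresh_summary(conn):
    """Rebuild the cached per-experiment summary from the raw event table.

    In the setup described on the slide this would run on a schedule
    (every 6 hours); here it is called by hand.
    """
    with conn:  # both statements commit together, or neither does
        conn.execute("DELETE FROM experiment_summary")
        conn.execute(
            """
            INSERT INTO experiment_summary (experiment_id, bucket, visitors, bookings)
            SELECT experiment_id, bucket, COUNT(*), SUM(booked)
            FROM exposure_events
            GROUP BY experiment_id, bucket
            """
        )


def load_dashboard_rows(conn, experiment_id):
    """What the UI reads: only the precomputed summary, never the raw events."""
    return conn.execute(
        "SELECT bucket, visitors, bookings FROM experiment_summary "
        "WHERE experiment_id = ? ORDER BY bucket",
        (experiment_id,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE exposure_events (experiment_id TEXT, bucket TEXT, booked INTEGER);
    CREATE TABLE experiment_summary (
        experiment_id TEXT, bucket TEXT, visitors INTEGER, bookings INTEGER
    );
    """
)
conn.executemany(
    "INSERT INTO exposure_events VALUES (?, ?, ?)",
    [
        ("exp1", "control", 0),
        ("exp1", "control", 1),
        ("exp1", "variant", 1),
        ("exp1", "variant", 1),
    ],
)
conn.commit()

refresh_summary(conn)
rows = load_dashboard_rows(conn, "exp1")
```

The trade-off is staleness: events that arrive after a refresh are invisible to the dashboard until the next one, in exchange for fast reads that never scan the raw table.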

Page 19: AB Testing at Expedia

TNL Dashboard v2

Page 20: AB Testing at Expedia

Product Demo

Page 21: AB Testing at Expedia

Streaming

Page 22: AB Testing at Expedia

• Column-oriented, time series schema
• Time-to-live (TTL) on data
• Only store most popular aggregates
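As a rough mental model of the three points above, the sketch below keeps one partition per (experiment, metric), orders entries by time bucket, and hides entries whose TTL has passed. It is a toy in-memory stand-in, not Cassandra and not the TNL schema; the class name `AggregateStore` and its fields are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class AggregateStore:
    """Toy time-series store: partition by (experiment, metric), expire by TTL."""

    ttl_seconds: int
    # (experiment, metric) -> {time_bucket: (value, expires_at)}
    _partitions: dict = field(default_factory=dict)

    def write(self, experiment, metric, time_bucket, value, now):
        """Store one pre-aggregated value; it expires ttl_seconds after `now`."""
        partition = self._partitions.setdefault((experiment, metric), {})
        partition[time_bucket] = (value, now + self.ttl_seconds)

    def read(self, experiment, metric, now):
        """Return unexpired (time_bucket, value) pairs in time order."""
        partition = self._partitions.get((experiment, metric), {})
        return [
            (bucket, value)
            for bucket, (value, expires_at) in sorted(partition.items())
            if expires_at > now
        ]


store = AggregateStore(ttl_seconds=100)
store.write("exp1", "bookings", time_bucket=1, value=10, now=0)
store.write("exp1", "bookings", time_bucket=2, value=20, now=50)

fresh = store.read("exp1", "bookings", now=60)
after_expiry = store.read("exp1", "bookings", now=120)
```

Only aggregates are written, so a dashboard read touches a single partition and returns a short, time-ordered series rather than raw events.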

Page 23: AB Testing at Expedia

v1 vs v2
• New Architecture
– More scalable
– More responsive
– Less prone to data loss
• Lessons learnt
– System is as fast as the slowest component
– Fault-tolerance and resilience
– Partition data
– Pre-production environment

Page 24: AB Testing at Expedia

Questions/discussion

Page 25: AB Testing at Expedia

APPENDIX

Page 26: AB Testing at Expedia

Apply statistical power to test results

Using a 90% confidence level, 1 out of 10 tests will be a false positive or negative

             Heads   Tails
Right hand   51      49
Left hand    49      51

Right hand is superior at getting heads!
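To see why the coin-flip "result" above should not be trusted, one can run a standard two-proportion z-test on the same counts. This is a sketch under the usual normal approximation; the function name is hypothetical and the slide does not say which test the team used.

```python
import math


def two_proportion_p_value(successes_a, n_a, successes_b, n_b):
    """Two-sided p-value for the difference between two proportions (pooled z-test)."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / standard_error
    # For a standard normal Z, P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


# Right hand: 51 heads in 100 flips. Left hand: 49 heads in 100 flips.
p_value = two_proportion_p_value(51, 100, 49, 100)
```

With these counts the p-value is roughly 0.78, far above the 0.10 threshold that a 90% confidence level implies, so the data give no grounds for calling the right hand superior.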

Page 27: AB Testing at Expedia

Do’s and Don’ts when concluding tests

Don’t call tests too early; this increases false positives or negatives

Don’t call tests as soon as you see positive results, because test results frequently go up and down

To claim a test Winner/Loser, the positive/negative effect has to stay for at least 5 consecutive days and the trend has to be stable
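The 5-consecutive-days rule can be sketched as a check on the trailing run of daily calls. This is an illustrative reading of the rule, not the team's tooling: the function name and the string labels are hypothetical, and the "trend is stable" condition is not modeled.

```python
def stable_call(daily_calls, min_days=5):
    """Return 'winner' or 'loser' only if the most recent `min_days` daily
    calls all agree; otherwise return 'inconclusive'.

    `daily_calls` is ordered oldest to newest, one entry per day, each being
    'winner', 'loser', or 'inconclusive'.
    """
    if not daily_calls:
        return "inconclusive"
    latest = daily_calls[-1]
    if latest not in ("winner", "loser"):
        return "inconclusive"
    run_length = 0
    for call in reversed(daily_calls):
        if call != latest:
            break
        run_length += 1
    return latest if run_length >= min_days else "inconclusive"


held_for_five_days = stable_call(
    ["inconclusive", "winner", "winner", "winner", "winner", "winner"]
)
flipped_recently = stable_call(
    ["winner", "winner", "loser", "winner", "winner", "winner"]
)
```

In the second series the positive call has held for only three days since the last flip, so the test is still reported as inconclusive.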

Please note this type of chart is not currently available in the Test and Learn dashboard or SiteSpect UI; the shape of the Confidence Interval lines varies test by test

Define one success metric and run tests for a pre-determined duration (for hotel/flight tests in the US, suggest running until the confidence interval of conversion change is within +/- 1%); tests should run at least 10 days

Don’t assume the midpoint (observed % change during the test period) will hold true after the feature is rolled out: a 4.0% +/- 4.0% test may have zero impact and may not be much better than a 1.0% +/- 1.0% test

Don’t call an inconclusive test “trending positive” or “trending negative”, as test results fluctuate

Contact ARM testing team for questions

[email protected]

Using a 90% confidence level:

Winner: Lower bound of % change >= 0 (or probability of test being positive >= 95%)

Loser: Upper bound of % change <= 0 (or probability of test being negative >= 95%)

Else: Inconclusive or Neutral
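The Winner/Loser/Inconclusive rule can be written as a small function over the confidence interval bounds. This is a sketch of the rule as stated on the slide; the function name and labels are hypothetical.

```python
def classify(lower_pct, upper_pct):
    """Classify a test from the bounds of its 90% confidence interval
    on % change in the success metric."""
    if lower_pct >= 0:
        return "winner"
    if upper_pct <= 0:
        return "loser"
    return "inconclusive"


# A 4.0% +/- 4.0% test and a 1.0% +/- 1.0% test, written as (lower, upper).
wide_interval = classify(0.0, 8.0)
narrow_interval = classify(0.0, 2.0)

clear_loser = classify(-3.0, -1.0)
straddles_zero = classify(-0.5, 1.5)
```

Both the 4.0% +/- 4.0% and the 1.0% +/- 1.0% test land on the Winner boundary with a lower bound of exactly 0, which is why the larger midpoint alone says little about the impact after rollout.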