
Data Science at Trainline for Smarter Journeys

Feb 13, 2017


Data & Analytics

Marco Rossetti
Page 1: Data Science at Trainline for Smarter Journeys

Data Science at Trainline for Smarter Journeys

London, 22/11/2016

@DataScienceFest

@TrainlineTalent

Page 2: Data Science at Trainline for Smarter Journeys

Outline
• A bit about Trainline.
• Cloud-based serverless architecture for Big Data.
• Case Study: BusyBot
• Other Case Studies


John Telford, Head of Data Architecture. Leading the adoption of Big Data technology at Trainline. Manages a team of Data Engineers and Database Administrators. Previously worked on Data Warehousing and Big Data at Channel 4. Computer Science degree from Brunel University. Twitter: @jtelford1

Marco Rossetti, Senior Data Scientist. Leading personalisation initiatives, like providing context-aware personalised services, journey recommendations, and tailored travel options. Previously worked on recommender systems for researchers at Mendeley. He has a PhD in Computer Science from University of Milan-Bicocca. Twitter: @ross85

Page 3: Data Science at Trainline for Smarter Journeys

Trainline - Smarter Journeys
Help our customers save:
• Time (no more queuing for tickets at the station)
• Money (book early, find cheap tickets)
• Energy (remove complexity)

Headlines...
• We process more than £2.3 billion in ticket sales annually.
• 100,000 smarter journeys every single day.
• 44 train companies, across 24 European countries.
• ~400 employees (London, Edinburgh, Paris).
• More than 30m visits per month.
• 1 ticket sold every three seconds.


Trainline takeover of Kings X, Oct 2016.

Page 4: Data Science at Trainline for Smarter Journeys


Page 5: Data Science at Trainline for Smarter Journeys


Page 6: Data Science at Trainline for Smarter Journeys


Page 7: Data Science at Trainline for Smarter Journeys

Bob's cloud laws
It's cloud if…
1. It offers self-provisioning.
2. It offers pay-as-you-go pricing.
3. It is, for all intents and purposes, infinitely scalable.

Thus, no need for support from the provider for set-up, no upfront payments for licences or minimum term agreements, and no constraints on what I can do!

• Hosting is not cloud.
• BYO licensing is not cloud.


Page 8: Data Science at Trainline for Smarter Journeys

From servers... to serverless


Servers = Pets

Virtual Machines = Cattle

Containers & Serverless = Herds

Trainline policy:
• Use PaaS wherever possible,
• Use Serverless wherever possible,
... so long as they are good enough.

Page 9: Data Science at Trainline for Smarter Journeys

Data Gateway


Page 10: Data Science at Trainline for Smarter Journeys

Data Platform


Page 11: Data Science at Trainline for Smarter Journeys

Lessons: Lambda
• Effortless scaling; we often have > 100 λs running at once.
• Warm-up time.
• Choose language / framework carefully.
• Consequences of 'freeze'.
• Monitoring – single thread.
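
The warm-up and 'freeze' bullets can be illustrated with a minimal sketch of a Python Lambda handler. It assumes the usual `handler(event, context)` entry point; `_EXPENSIVE_CLIENT` and the returned fields are hypothetical names for illustration, not Trainline's code.

```python
# Module-level code runs once per container (the cold start). Later
# invocations that land on the same warm container reuse this state.
_EXPENSIVE_CLIENT = {"connected": True}  # hypothetical stand-in for a real client
_INVOCATIONS = 0


def handler(event, context):
    """Entry point in the shape AWS Lambda uses for Python functions."""
    global _INVOCATIONS
    _INVOCATIONS += 1
    return {
        # True only for the first call served by this container.
        "cold_start": _INVOCATIONS == 1,
        "invocations_on_this_container": _INVOCATIONS,
        "client_ready": _EXPENSIVE_CLIENT["connected"],
        "echo": event,
    }
```

Keeping expensive set-up at module level is one common way to pay the warm-up cost once per container. The flip side, as far as the 'freeze' behaviour goes, is that work left running in the background after the handler returns should not be relied on to make progress between invocations.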

Google "Trainline Engineering Lambda"


[Figure: Service Time Distribution; axis: Execution (ms)]

Page 12: Data Science at Trainline for Smarter Journeys

Lessons: Kinesis Streams

• TCO is generally low.
• But... understand costs, related to capacity of stream (number & size of messages), time-to-live, etc.
• Monitoring / alerting... CloudWatch is (probably) not enough.
• Compress & encrypt?
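
As a rough illustration of how stream capacity follows from the number and size of messages, here is a minimal sketch that estimates a shard count and measures what gzip does to one record. The per-shard write limits (1 MiB/s and 1,000 records/s) are the figures AWS publishes as I understand them; treat them as assumptions and check the current Kinesis documentation. The function names are hypothetical.

```python
import gzip
import json
import math

# Assumed per-shard write limits; verify against current AWS documentation.
SHARD_BYTES_PER_SEC = 1024 * 1024
SHARD_RECORDS_PER_SEC = 1000


def shards_needed(records_per_sec: float, avg_record_bytes: float) -> int:
    """Shards required to absorb a steady write load, whichever limit binds."""
    by_count = records_per_sec / SHARD_RECORDS_PER_SEC
    by_bytes = records_per_sec * avg_record_bytes / SHARD_BYTES_PER_SEC
    return max(1, math.ceil(max(by_count, by_bytes)))


def compressed_size(record: dict) -> int:
    """Size in bytes of a record serialised as JSON and gzip-compressed."""
    return len(gzip.compress(json.dumps(record).encode("utf-8")))
```

For example, `shards_needed(2500, 200)` is bound by the record-count limit, while large messages at a low rate are bound by bytes; compression only helps in the second case.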

Google "AWS Overview of Security Processes"


Page 13: Data Science at Trainline for Smarter Journeys


BusyBot

Page 14: Data Science at Trainline for Smarter Journeys

[Bar chart: Unhappy customers, by cause; axis from 0% to 70%. Categories: Delays, Overcrowding, Value for money, Toilet Facilities, Luggage Space, Availability of staff, Car Parking]

Source : National Rail Passenger Survey (NRPS) 2015


Page 15: Data Science at Trainline for Smarter Journeys


Page 16: Data Science at Trainline for Smarter Journeys


Google "BusyBot overcrowding"

Page 17: Data Science at Trainline for Smarter Journeys

Busy Bot Discovery
Data from March to May: approx. 100k feedback responses from our Android users.


Page 18: Data Science at Trainline for Smarter Journeys

Infrastructure – Data Gateway

Feedback collection

Daily Enrichment

{
  "train_destination": "RDG",
  "retail_train_number": "GW2980",
  "train_origin": "NRC",
  "train_date": "2016-08-08T07:38:00.000Z",
  "customer_longitude": 0,
  "train_hashid": "NRC:RDG:08/08/2016 08:38:00:GW2980",
  "customer_location_on_train": "Back",
  "customer_hashid": "…",
  "customer_got_seat": 1,
  "customer_feedback": "Yes",
  "feedback_type": 1,
  "customer_latitude": 0,
  "feedbackid": "…",
  "device_id": "…",
  "timestamp": "2016-08-08T07:41:39.390Z",
  "customer_id": "…"
}
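
A minimal sketch of how a feedback record like the one above might be parsed into a typed structure. The `Feedback` class, its field subset, and the reading of `customer_got_seat == 1` as "got a seat" are assumptions for illustration; the sample omits the identifier fields that are elided on the slide.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timezone

ISO_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"

# Subset of the record shown on the slide (elided identifiers left out).
SAMPLE = """{
  "train_destination": "RDG", "retail_train_number": "GW2980",
  "train_origin": "NRC", "train_date": "2016-08-08T07:38:00.000Z",
  "customer_location_on_train": "Back", "customer_got_seat": 1,
  "timestamp": "2016-08-08T07:41:39.390Z"
}"""


@dataclass(frozen=True)
class Feedback:
    train_origin: str
    train_destination: str
    retail_train_number: str
    train_date: datetime
    timestamp: datetime
    location_on_train: str
    got_seat: bool


def parse_utc(value: str) -> datetime:
    """Parse the 'Z'-suffixed timestamps used in the record as UTC."""
    return datetime.strptime(value, ISO_FORMAT).replace(tzinfo=timezone.utc)


def parse_feedback(raw: str) -> Feedback:
    doc = json.loads(raw)
    return Feedback(
        train_origin=doc["train_origin"],
        train_destination=doc["train_destination"],
        retail_train_number=doc["retail_train_number"],
        train_date=parse_utc(doc["train_date"]),
        timestamp=parse_utc(doc["timestamp"]),
        location_on_train=doc["customer_location_on_train"],
        got_seat=doc["customer_got_seat"] == 1,  # assumed encoding: 1 means seated
    )
```

With typed timestamps, `feedback.timestamp - feedback.train_date` gives how long after the scheduled time the feedback was submitted.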

Page 19: Data Science at Trainline for Smarter Journeys


Page 20: Data Science at Trainline for Smarter Journeys


Page 21: Data Science at Trainline for Smarter Journeys

[Figure panels: all feedbacks; ≥ 100 feedbacks; ≥ 1000 feedbacks. Label: City Thameslink: 50%. Scale: 0% with a seat to 100% with a seat]


Page 22: Data Science at Trainline for Smarter Journeys

Infrastructure – Data Platform
Model Building and Validation Service

route-origin  route-destination  stop  customer-location-train  percentage-who-got-seat  feedback-count
EUS           MAN                EUS   middle                   0.738059701              4020
EUS           BHM                EUS   middle                   0.63788222               3532
KGX           LDS                KGX   middle                   0.704984154              3471
BHM           EUS                BHM   middle                   0.679082241              3356
KGX           EDB                KGX   middle                   0.5589236                3233
EUS           GLC                EUS   middle                   0.676663543              3201
MAN           EUS                MAN   middle                   0.769495772              3193
PAD           SWA                PAD   middle                   0.608086078              3067
EUS           BHM                EUS   front                    0.672365666              2866
EUS           MAN                EUS   front                    0.790479625              2773
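
A table of this shape can be produced by grouping feedback rows and counting. The sketch below is one plain-Python way to do it; the input field names (`route_origin`, `route_destination`, `stop`, `location`, `got_seat`) are hypothetical, and the talk does not say how the actual service computes it.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# (route origin, route destination, stop, location on train)
Key = Tuple[str, str, str, str]


def seat_rates(feedback: Iterable[dict]) -> Dict[Key, Tuple[float, int]]:
    """Return {key: (share of feedbacks reporting a seat, feedback count)}."""
    seated = defaultdict(int)
    total = defaultdict(int)
    for row in feedback:
        key = (row["route_origin"], row["route_destination"],
               row["stop"], row["location"])
        total[key] += 1
        if row["got_seat"]:
            seated[key] += 1
    return {key: (seated[key] / n, n) for key, n in total.items()}
```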

{
  "retailTrainIdentifier": "VT7280",
  "isBusy": false,
  "callingPoints": [
    {
      "stationCode": "EUS",
      "coaches": [
        {"position": "Back", "recommend": true},
        {"position": "Front", "recommend": false},
        {"position": "Middle", "recommend": false}
      ]
    },
    {
      "stationCode": "MKC",
      "coaches": [
        {"position": "Back", "recommend": false},
        …
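
A minimal sketch of how a response in this shape could be assembled from per-position seat rates. The rules used here are assumptions, not the service's documented logic: the position with the highest seat rate at each calling point is recommended, and the train is flagged busy when the best rate at any calling point falls below a hypothetical `busy_below` threshold.

```python
from typing import Dict


def build_response(train_id: str,
                   rates: Dict[str, Dict[str, float]],
                   busy_below: float = 0.5) -> dict:
    """rates maps station code -> {position: share of feedbacks with a seat}."""
    calling_points = []
    best_rates = []
    for station, by_position in rates.items():
        # Assumed rule: recommend the position where most people got a seat.
        best = max(by_position, key=by_position.get)
        best_rates.append(by_position[best])
        coaches = [{"position": pos, "recommend": pos == best}
                   for pos in sorted(by_position)]
        calling_points.append({"stationCode": station, "coaches": coaches})
    # Assumed rule: busy if even the best position is poor somewhere.
    is_busy = bool(best_rates) and min(best_rates) < busy_below
    return {
        "retailTrainIdentifier": train_id,
        "isBusy": is_busy,
        "callingPoints": calling_points,
    }
```

Called with made-up rates such as `{"EUS": {"Back": 0.8, "Front": 0.6, "Middle": 0.7}}`, it yields a first calling point with only "Back" recommended.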


Page 23: Data Science at Trainline for Smarter Journeys

Data Validation
• At least N feedbacks
• Feedback for at least D days
• CI on the percentage who got a seat <= p
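
The three checks can be sketched as follows. The slide does not say which confidence interval is used or exactly what is compared with p; this sketch assumes a Wilson score interval at roughly 95% (z = 1.96) and reads the third rule as "the interval's width is at most p". All parameter names are hypothetical.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion."""
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    denom = 1 + z * z / n
    centre = (p_hat + z * z / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


def is_valid(seated: int, feedbacks: int, days_with_feedback: int,
             min_feedbacks: int, min_days: int, max_ci_width: float) -> bool:
    """Apply the three validation rules to one route/stop/location group."""
    if feedbacks < min_feedbacks or days_with_feedback < min_days:
        return False
    low, high = wilson_interval(seated, feedbacks)
    return (high - low) <= max_ci_width
```

With illustrative thresholds (N = 100, D = 30, p = 0.05), a group with a few thousand feedbacks passes, while a group with exactly 100 feedbacks fails on interval width alone.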


Page 24: Data Science at Trainline for Smarter Journeys

Journey Results

Live Tracker

BusyBot V1, Sep 2016


Page 25: Data Science at Trainline for Smarter Journeys


Coming soon…

Page 26: Data Science at Trainline for Smarter Journeys

Hotels


Page 27: Data Science at Trainline for Smarter Journeys

Journey Recommendations


Page 28: Data Science at Trainline for Smarter Journeys

Search Prediction


Page 29: Data Science at Trainline for Smarter Journeys

Summary
• BusyBot
• Hotels
• Journey Recommendations
• Search Prediction
• Delays
• Prices
• Real Time Information
• Personalisation
• …


Page 30: Data Science at Trainline for Smarter Journeys

Any Questions?

(we are hiring!)

Data Scientist positions: [email protected]
Data Engineer positions: [email protected]
