Building Intelligent Data Products

Feb 19, 2017

Transcript
Page 1: Building Intelligent Data Products

building intelligent data products

Page 2: Building Intelligent Data Products

what actually is fraud

architecting flexible data ‘plumbing’

building solid data products on top of them

Page 3: Building Intelligent Data Products

stephen whitworth

2 years at Hailo as data scientist/jack of some trades out of university

product and marketplace analytics, agent based modelling, data engineering, ‘ML’ services

data science/engineering at ravelin, specifically focused on our detection capabilities

Page 4: Building Intelligent Data Products

what is ravelin?

online fraud detection and prevention platform

stream application/server data to our events API

we give fraud probability + beautiful data visualisation

backed by techstars/passion/playfair/amadeus/indeed.com founder/wonga founder amongst other great investors

Page 5: Building Intelligent Data Products

fraud?

Page 6: Building Intelligent Data Products

$14B: a dollar for every year the universe has existed

Page 7: Building Intelligent Data Products

Same day delivery

On-demand services

Page 8: Building Intelligent Data Products

‘victimless crime’

police ill-equipped to handle

low barrier to entry from dark net

3D secure - conversion killer

Page 9: Building Intelligent Data Products

traditional: human generated rules, born of deep expertise

order-centric view of the world

Page 10: Building Intelligent Data Products

hybrid: augment expertise by learning rules from data

cards don’t commit fraud, people do

Page 11: Building Intelligent Data Products

building good plumbing

Page 12: Building Intelligent Data Products

receive firehose through API

decode arbitrary data and store

extract hundreds of features

run through N models and rule engine to get probability

http/slack/whatever notification to customer

in 100-300ms (ish)
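
The steps on this slide can be sketched as one function. This is a minimal sketch and not Ravelin's implementation: the two features, the two toy "models", the single rule, and the way scores are combined (mean of the models, overridden by a rule) are all hypothetical choices made for illustration.

```python
import json
import time
from statistics import mean


def decode(raw: bytes) -> dict:
    """Decode an arbitrary JSON event payload."""
    return json.loads(raw)


def extract_features(event: dict) -> dict:
    """Hypothetical features; the talk mentions hundreds, this shows two."""
    return {
        "amount": float(event.get("amount", 0.0)),
        "cards_on_account": len(event.get("cards", [])),
    }


# Hypothetical stand-ins for N trained models: features in, probability out.
MODELS = [
    lambda f: min(1.0, f["amount"] / 1000.0),
    lambda f: min(1.0, f["cards_on_account"] / 10.0),
]


def rule_engine(features: dict):
    """Hypothetical hand-written rule: return an override score, or None."""
    if features["cards_on_account"] >= 8:
        return 1.0
    return None


def score(raw: bytes) -> dict:
    started = time.perf_counter()
    features = extract_features(decode(raw))
    # Assumption: model scores are averaged, and a matching rule overrides them.
    probability = mean(model(features) for model in MODELS)
    override = rule_engine(features)
    if override is not None:
        probability = override
    elapsed_ms = (time.perf_counter() - started) * 1000.0
    return {"fraud_probability": probability, "elapsed_ms": elapsed_ms}


result = score(b'{"amount": 250, "cards": ["a", "b"]}')
print(result["fraud_probability"], result["elapsed_ms"])
```

Measuring `elapsed_ms` inside the scoring path is one simple way to keep an eye on a latency budget like the 100-300ms mentioned above.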

Page 13: Building Intelligent Data Products

BUZZWORDS ABOUND

go

postgres

AWS

microservices

zookeeper

NSQ

python

event-driven

elasticsearch

bigquery

dynamodb

redis

Page 14: Building Intelligent Data Products
Page 15: Building Intelligent Data Products

instrumentation

Page 16: Building Intelligent Data Products

different databases for different needs

kudos if you get The Office reference

Page 17: Building Intelligent Data Products

postgres: solid, start here

dynamodb: very high throughput, low latency data

bigquery: to answer any question you could possibly have

elasticsearch: rich querying in a reasonable amount of time

graph db: haven’t decided, recommendations?

Page 18: Building Intelligent Data Products

asynchronous systems

firehoses

nice deployment patterns

‘lambda architecture’ - the append only log

services store their own interpretation of events

services are almost entirely decoupled
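
The append-only log idea, with each service keeping its own interpretation of the same events, can be sketched in memory. This is an illustration only: `EventLog`, `OrderCounter`, and `SpendTracker` are hypothetical names, the event fields are made up, and a real deployment would sit on a queue such as NSQ rather than a Python list.

```python
from collections import Counter


class EventLog:
    """In-memory append-only log: events are appended, never edited."""

    def __init__(self):
        self._events = []

    def append(self, event: dict) -> int:
        self._events.append(event)
        return len(self._events) - 1  # offset of the new event

    def read_from(self, offset: int) -> list:
        return self._events[offset:]


class OrderCounter:
    """Hypothetical service: its view of the log is orders per customer."""

    def __init__(self):
        self.offset = 0
        self.orders = Counter()

    def consume(self, log: EventLog) -> None:
        for event in log.read_from(self.offset):
            if event["type"] == "order":
                self.orders[event["customer"]] += 1
            self.offset += 1


class SpendTracker:
    """Hypothetical service: a different view of the same events."""

    def __init__(self):
        self.offset = 0
        self.spend = Counter()

    def consume(self, log: EventLog) -> None:
        for event in log.read_from(self.offset):
            if event["type"] == "order":
                self.spend[event["customer"]] += event["amount"]
            self.offset += 1


log = EventLog()
log.append({"type": "order", "customer": "c1", "amount": 20})
log.append({"type": "login", "customer": "c1"})
log.append({"type": "order", "customer": "c1", "amount": 5})

counter, tracker = OrderCounter(), SpendTracker()
counter.consume(log)
tracker.consume(log)
print(counter.orders["c1"], tracker.spend["c1"])  # prints: 2 25
```

Neither service knows the other exists, and a new service can be added later by replaying the log from offset 0.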

Page 19: Building Intelligent Data Products

asynchronous systems

firehoses

error propagation is challenging

no guarantees of SLA - at least as slow as your queue

hard to know who or what is consuming your data

Page 20: Building Intelligent Data Products

building data products

Page 21: Building Intelligent Data Products

‘a random forest is like a room full of experts who have seen different cases of fraud from different perspectives’
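
The "room full of experts" picture can be made concrete with a toy ensemble. The sketch below uses bagged decision stumps, a much-simplified stand-in for a random forest: each expert sees a random resample of the cases and looks at one randomly chosen feature. The data, feature meanings (order amount, cards added today), and function names are all made up for illustration.

```python
import random


def train_stump(rows, feature):
    """Pick the threshold on one feature that misclassifies the fewest rows.

    The stump predicts fraud (1) when the feature value is >= threshold.
    """
    best_threshold, best_errors = None, None
    for threshold in sorted({x[feature] for x, _ in rows}):
        errors = sum((x[feature] >= threshold) != bool(y) for x, y in rows)
        if best_errors is None or errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return feature, best_threshold


def train_forest(rows, n_experts, seed=0):
    rng = random.Random(seed)
    n_features = len(rows[0][0])
    experts = []
    for _ in range(n_experts):
        sample = [rng.choice(rows) for _ in rows]  # different cases
        feature = rng.randrange(n_features)        # different perspective
        experts.append(train_stump(sample, feature))
    return experts


def fraud_probability(experts, x):
    """Fraction of experts that vote 'fraud' for this order."""
    votes = sum(x[feature] >= threshold for feature, threshold in experts)
    return votes / len(experts)


# Made-up training data: ((order_amount, cards_added_today), is_fraud)
rows = [
    ((20, 1), 0), ((35, 2), 0), ((15, 1), 0), ((40, 2), 0),
    ((300, 4), 1), ((250, 5), 1), ((400, 3), 1), ((280, 6), 1),
]
experts = train_forest(rows, n_experts=25)
print(fraud_probability(experts, (500, 8)))  # far above every fraud case seen
print(fraud_probability(experts, (10, 0)))   # far below every legit case seen
print(fraud_probability(experts, (260, 2)))  # borderline: experts may disagree
```

Because the output is a share of votes rather than a yes/no, it can be read as a rough probability, and a borderline order may split the room.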

Page 22: Building Intelligent Data Products

‘a random forest is like a room full of experts who have seen different cases of fraud from different perspectives’

N

Page 23: Building Intelligent Data Products

precision: of all of my fraud predictions, what % were correct?

recall: out of all of the fraudsters, what % did I catch?

implicit tradeoff between conversion and fraud loss

‘accuracy’ a useless metric for fraud
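
A small worked example shows why accuracy misleads when fraud is rare. The data below is made up: 1000 orders, 2 of them fraudulent. A "model" that never flags anything scores the same accuracy as one that catches half the fraud, while precision and recall tell them apart.

```python
def confusion(y_true, y_pred):
    """Return (true positives, false positives, false negatives, true negatives)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    tn = sum(1 for t, p in pairs if not t and not p)
    return tp, fp, fn, tn


def metrics(y_true, y_pred):
    """Return (precision, recall, accuracy); undefined ratios are reported as 0.0."""
    tp, fp, fn, tn = confusion(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / len(y_true)
    return precision, recall, accuracy


# Made-up labels: the first two orders are fraud, the other 998 are legitimate.
y_true = [1, 1] + [0] * 998

approve_everything = [0] * 1000          # never flags anyone
catches_one = [1, 0, 1] + [0] * 997      # one catch, one miss, one false alarm

print(metrics(y_true, approve_everything))  # (0.0, 0.0, 0.998)
print(metrics(y_true, catches_one))         # (0.5, 0.5, 0.998)
```

Raising the flagging threshold tends to trade recall for precision, which is the conversion versus fraud loss tradeoff in miniature.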

Page 24: Building Intelligent Data Products

99.8% ACCURATE

Page 25: Building Intelligent Data Products
Page 26: Building Intelligent Data Products

keep model interfaces simple

hide arbitrarily complex transformations behind it

blend global and client specific models
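
One way to read these three points together: every model exposes the same single method, and a blend of a global and a client-specific model is itself just another model. This is a sketch under assumptions, not the author's design; the class names, the toy scoring logic, and the weighted-average blend are hypothetical.

```python
from typing import Protocol


class Model(Protocol):
    """The whole interface: features in, fraud probability out."""

    def predict(self, features: dict) -> float: ...


class GlobalModel:
    """Hypothetical model trained across all clients."""

    def predict(self, features: dict) -> float:
        return min(1.0, features.get("amount", 0.0) / 1000.0)


class ClientModel:
    """Hypothetical model tuned to one client's usual order size."""

    def __init__(self, usual_amount: float):
        self.usual_amount = usual_amount

    def predict(self, features: dict) -> float:
        # Any amount of internal complexity could hide behind this method.
        return 1.0 if features.get("amount", 0.0) > 3 * self.usual_amount else 0.0


class Blend:
    """Weighted average of two models; satisfies the same interface."""

    def __init__(self, global_model: Model, client_model: Model, client_weight: float):
        self.global_model = global_model
        self.client_model = client_model
        self.client_weight = client_weight

    def predict(self, features: dict) -> float:
        g = self.global_model.predict(features)
        c = self.client_model.predict(features)
        return (1.0 - self.client_weight) * g + self.client_weight * c


model: Model = Blend(GlobalModel(), ClientModel(usual_amount=100.0), client_weight=0.5)
print(model.predict({"amount": 500.0}))  # 0.75
```

A new client with little history could start with a low `client_weight` and lean on the global model until enough of its own data arrives.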

Page 27: Building Intelligent Data Products

building and training statistical models

currently batch

will combine with online

Page 28: Building Intelligent Data Products

RANDOM FORESTS

Page 29: Building Intelligent Data Products

‘a random forest is like a room full of experts who have seen different cases of fraud from different perspectives’

Page 30: Building Intelligent Data Products

RANDOM FORESTS

MONITORING

Page 31: Building Intelligent Data Products

probabilistic, not deterministic

dogfood - use live robot customers

run models in ‘dark mode’ to determine performance
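
Running a model in 'dark mode' can be sketched as a wrapper that always returns the live model's score while silently recording what a candidate model would have said. The class name, the 0.5 decision threshold, and both toy models are hypothetical; this is one possible shape, not a description of Ravelin's system.

```python
class DarkModeScorer:
    """Serve the live model's score; record the candidate's score silently."""

    def __init__(self, live_model, candidate_model):
        self.live_model = live_model
        self.candidate_model = candidate_model
        self.shadow_log = []  # (candidate_score, live_score) pairs

    def score(self, features: dict) -> float:
        live = self.live_model(features)
        try:
            candidate = self.candidate_model(features)
            self.shadow_log.append((candidate, live))
        except Exception:
            # A broken candidate must never affect live traffic.
            pass
        return live

    def disagreement_rate(self, threshold: float = 0.5) -> float:
        """Share of logged orders where the two models would decide differently."""
        if not self.shadow_log:
            return 0.0
        differing = sum(
            (c >= threshold) != (l >= threshold) for c, l in self.shadow_log
        )
        return differing / len(self.shadow_log)


# Hypothetical models: the candidate flags smaller orders than the live one.
live = lambda f: 0.9 if f["amount"] > 500 else 0.1
candidate = lambda f: 0.9 if f["amount"] > 200 else 0.1

scorer = DarkModeScorer(live, candidate)
served = [scorer.score({"amount": a}) for a in (100, 300, 600)]
print(served, scorer.disagreement_rate())
```

Once true outcomes are known, the logged candidate scores can be compared against them to judge the candidate before it is allowed to affect real decisions.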

Page 32: Building Intelligent Data Products

why not deep learning? ..yet

ability to debug random forests

had nice results with keras

Page 33: Building Intelligent Data Products

serialisation and deployment: an unsolved problem

Page 34: Building Intelligent Data Products

in beta and signing up clients

looking for on-demand services/marketplaces

talk to me afterwards

Page 35: Building Intelligent Data Products

obligatory: we are hiring!

senior machine learning engineers/data scientists

[email protected] or talk to me after

Page 36: Building Intelligent Data Products

@sjwhitworth

www.ravelin.com - @ravelinhq