PayPal's usage of HPC and InfiniBand as presented at ISC Big Data 2013

Transcript
Page 1: PayPal's usage of HPC and InfiniBand as presented at ISC Big Data 2013

LEVERAGING HPC AND ENTERPRISE ARCHITECTURES FOR LARGE SCALE INLINE TRANSACTIONAL ANALYTICS IN FRAUD DETECTION AT PAYPAL

Image from Boris Müller’s “Visual Poetry 6” (http://www.esono.com/boris/projects/poetry06/visualpoetry06/)

Arno Kolster, Sr. Database Architect (Advanced Technology Group – Site Operations Infrastructure)

September 26, 2013

Page 2

REAL TIME FRAUD DETECTION

Page 3

THE PROBLEM

Detecting fraud in 'real time' as millions of transactions are processed, at volume, between disparate systems.

Creating and deploying new fraud models into event flows quickly and with minimal effort.

Providing an environment for fraud modeling, analytics, visualization, MapReduce, dimensioning and further processing.

Finding suspicious patterns in related data sets that we don't even know exist.

Page 4

THE CHALLENGES

Five-nines availability, scalability and reliability in a 24x7x365 environment. "PayPal is always open."

Maintaining a graph of identities, transactions, IPs, etc. to support the models.

Keeping operations simple with a small team of SAs and DBAs.

Keeping fraud models current and ensuring the integrity of incoming events and data.

Educating peers and higher-ups on new technology and concepts so they 'get it'. "HPC what?"

Page 5

VOLUME FOR THIS HYBRID SYSTEM?

11 million+ PayPal logins / day.

13 million+ financial transactions / day.

500 variables calculated per event for some models.

~4 billion inserts / day.

~8 billion selects / day.
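As a back-of-the-envelope check (not part of the deck), the daily volumes above translate into the following sustained per-second rates, assuming load spread uniformly over 86,400 seconds:

```python
# Convert the quoted daily volumes into sustained per-second rates.
# Real traffic is bursty, so peak rates are well above these averages.

DAILY_VOLUMES = {
    "logins": 11e6,
    "financial transactions": 13e6,
    "DB inserts": 4e9,
    "DB selects": 8e9,
}

SECONDS_PER_DAY = 86_400

for name, per_day in DAILY_VOLUMES.items():
    per_second = per_day / SECONDS_PER_DAY
    print(f"{name}: ~{per_second:,.0f}/sec sustained")
```

Sustained select traffic alone works out to roughly 90,000 operations per second, which is why the later slides focus on scale-out and in-memory processing.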

Page 6

OUR SOLUTION - TRINITY

Highly distributed open-source databases for OLTP storage of nodes and edges. Architected for scale out, scale up and HA.

Intelligent gateways, message routing & delivery to heterogeneous systems. Event everything.

Inline stream analytics using CEP and ESP.

Leveraged HPC architecture and hardware where it made sense.

Downstream analytics environments for further processing.

Real-time linking platform for identities from various source systems. Built a giant graph.

Standardized operations – h/w, s/w deployment, monitoring, command & control processes, etc.
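The real-time identity-linking idea can be sketched with a union-find structure: as events arrive, identifiers that co-occur (an account and an IP, a transaction and a card, ...) are merged into connected components. This is a hypothetical illustration of the "giant graph" concept, not Trinity's actual implementation:

```python
# Sketch of real-time identity linking via union-find (disjoint sets).
# Identifier names ("account:42" etc.) are illustrative assumptions.

class IdentityLinker:
    def __init__(self):
        self.parent = {}  # identifier -> representative identifier

    def find(self, x):
        """Return the representative of x's component (with path halving)."""
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def link(self, a, b):
        """Record that identifiers a and b belong to the same identity."""
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra

    def same_identity(self, a, b):
        return self.find(a) == self.find(b)

linker = IdentityLinker()
linker.link("account:42", "ip:10.0.0.5")
linker.link("ip:10.0.0.5", "card:9876")
print(linker.same_identity("account:42", "card:9876"))  # True
```

Each link is a near-constant-time operation, which is what makes maintaining such a graph feasible inline at millions of events per day.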

Page 7

HIGH LEVEL SYSTEM OVERVIEW

[Architecture diagram: INCOMING EVENTS flow through SGW pools and a DAL into App pools backed by sharded master/slave database pairs (M0–M95 / S0–S95), with MySQL replication feeding the Trinity Identification Service (TIS), Extensible Financial Linking (EFL), fraud MODELS, and downstream ANALYTICS (OLAP/MapReduce).]

Page 8

HOW DID WE MANAGE TO SCALE?

Page 9

TRINITY DB PROLIFERATION

2007: 36 instances – 12 masters, 24 ROs (TIS, TAS). 3 DBAs / 4 SAs.

2013: 1,800 instances – 600 masters, 1,200 ROs (TIS, TAS, ARS, NEO, UVS, NA; EFL 12 shards). 8+ billion selects/day. 5 billion nodes, billions of edges. Scale up & scale out. Still 3 DBAs / 4 SAs.
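The 600-master / 1,200-read-only topology implies some deterministic key-to-shard routing. A minimal sketch of one common pattern (hash-mod placement with writes to the master and reads to a replica) is below; the naming and the placement scheme are assumptions, not Trinity's actual mechanism:

```python
# Illustrative shard routing for a 600-master, 2-replicas-per-master
# topology. Hash-mod placement is assumed; it does not handle resharding.

import hashlib

N_MASTERS = 600
REPLICAS_PER_MASTER = 2

def shard_for(key: str) -> int:
    """Stable shard id for a key: hash the key, take it mod N_MASTERS."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % N_MASTERS

def route(key: str, write: bool, replica_hint: int = 0) -> str:
    """Writes go to the shard's master; reads go to one of its ROs."""
    shard = shard_for(key)
    if write:
        return f"master-{shard}"
    return f"ro-{shard}-{replica_hint % REPLICAS_PER_MASTER}"

print(route("account:42", write=True))
print(route("account:42", write=False, replica_hint=3))
```

With two read-only replicas per master, read traffic (the 8+ billion selects/day) fans out across three machines per shard while writes stay serialized on one.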

Page 10

DERIVATION OF MODELS

Metrics / variables / summaries generated from inline processing of events using CEP (>500 metrics / event).

Scoring of events based on historical and current metrics.

Scores sent on to PayPal flows for further RISK modeling or transaction blocking.

Different fraud models generate different types of scores.

New fraud models created based on the success ratio of previous ones, or in reaction to changes in data and usage patterns. (R, SAS, M/R, vectoring)
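A toy version of inline (CEP-style) metric derivation and scoring might look like the sketch below. The window size, metric choice, and scoring formula are illustrative assumptions, not PayPal's models:

```python
# Minimal CEP-style sketch: maintain a per-account rolling event count
# and score each event from current metrics as it streams in.

from collections import defaultdict, deque

WINDOW_SECONDS = 3600  # 1-hour sliding window (assumed)

events_by_account = defaultdict(deque)  # account -> event timestamps in window

def score_event(account: str, now: float, amount: float) -> float:
    window = events_by_account[account]
    # Evict timestamps that have fallen out of the window.
    while window and now - window[0] > WINDOW_SECONDS:
        window.popleft()
    window.append(now)
    velocity = len(window)  # events in the last hour for this account
    # Toy score in [0, 1]: higher velocity and larger amounts raise risk.
    return min(1.0, 0.05 * velocity + amount / 10_000)

s1 = score_event("acct-1", now=0.0, amount=100.0)
s2 = score_event("acct-1", now=10.0, amount=9000.0)
print(s1 < s2)  # a burst of activity plus a large amount raises the score
```

A production system would compute hundreds of such metrics per event and hand the resulting scores downstream for risk modeling or blocking, as the slide describes.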

Page 11

STREAMING ANALYTICS FLOW (FRAUD)

[Flow diagram: events arrive via REST/SOAP and a MESSAGE BUS, passing through IDENTITY, INDIGO, CERULEAN and AZURE SGW pools into the INDIGO and AZURE app pools, CEP engines, the TIS pool (backed by the TIS DB), EFL, the BES/RES pool and the COBALT (SFS) pool, with the AZURE and INDIGO databases behind the app pools; numbered steps 1–7 trace an event through the flow.]

Page 12

WHERE ARE WE USING HPC?

InfiniBand on the entire internal Trinity network (Mellanox QDR 40Gb, dual plane).

3 SGI Altix ICE 8200/8400 clusters for all 120+ EFL memory-based apps – no disk I/O overhead.

MPI-"like" apps. MPP features with scale out and affinity processing.

SGI InfiniteStorage IS4600 for EFL databases.

Lustre on the Hadoop cluster.

Page 13

SGI ALTIX ICE 8200/8400 CLUSTERS

78 nodes, 156 sockets, 1,872 cores

Intel Xeon X5650 @ 2.67 GHz

7.5 TB RAM

Page 14

SGI ALTIX ICE 8200/8400

Supports multiple deployment strategies in the same cluster.

Page 15

TRINITY STATISTICS – HAPPY HOLIDAYS 2012

Metric           Per Second    Per Day
Messages Sent        40,091          –
Messages Rcvd        17,235          –
Events Sent             380          –
Events Rcvd           1,250          –
DB Op: SELECT         4,687    404,920
DB Op: INSERT        37,855     3.27 B
DB Op: UPDATE        49,719      4.3 B
Rows Read           102,934     8.89 B
Rows Inserted        40,156     3.47 B
Rows Updated         38,762     3.35 B
Bytes Sent           10.2 G    882.4 G
Bytes Rcvd           20.1 G    1,801 G

Totals per second: Messages 57,326; Events 1,630; DB Ops 92,261; Rows 181,851; Bytes 30.3 G

Page 16

WHAT'S NEXT?

Addition of new link types into the network graph. Will add 60+ DBs.

Co-sponsoring a BOF at SC13 with ORNL called "Big Data: Industry views on real-time data, analytics, and HPC technologies to bring them together."

Change the key/value data store from structured to semi-structured.

Keep educating peers and higher-ups so they 'get it'. "HPC? Yes. We want more!"

New project involving enterprise integration with HPC technology, called 'Systems Intelligence', for PayPal ecosystem management.
