Page 1

FEBRUARY 3, 2015

Arno Kolster Sr. Database Architect Advanced Technology Group

@2015 PayPal Inc. All rights reserved. Confidential and proprietary.

Evolution of HPC Usage at PayPal

Image from Boris Müller’s “Visual Poetry 6” (http://www.esono.com/boris/projects/poetry06/visualpoetry06/)

Page 2

About your speaker

25+ years in database architecture and operations.

Has been with eBay Inc. for 12 years, with a focus on database and operations architecture.

Has spoken at a number of domestic and international Big Data and HPC conferences.

Career interest in solving real time, high volume analytics problems using HPC and new technology architectures.

Along with his colleague Ryan Quick, won IDC HPC Innovation Awards at SC ’12 and SC ’14.

Page 3

Why did we start leveraging HPC?

We had a number of large scale compute problems we thought we could solve with non-traditional enterprise solutions.

Many HPC installations had already solved these problems (large data set analytics, heterogeneous compute architectures, etc.).

We saw an eventual ‘merge’ of HPC and enterprise technologies and wanted to get in front of that trend.

HPC price points had come down enough for enterprise capex budgets.

Page 4

PayPal HPC Timeline

[Timeline figure spanning 2008 to 2015+]

Page 5

Real Time Fraud Detection Problems

Detecting fraud in ‘real time’ as millions of transactions are processed between disparate systems is extremely difficult.

Ability to create and deploy new fraud models into event and transaction flows quickly and with minimal effort.

Provide environment for fraud modeling, analytics, visualization, M/R, dimensioning and further processing.

Finding suspicious patterns that we don’t even know exist in related data sets.

Page 6

The Challenges

Five-nines (99.999%) availability, plus scalability and reliability, in a 24x7x365 environment, with servers that require less power as part of a commitment to cleaner and greener commerce.

Maintain a graph of identities, transactions, bank accounts, credit cards, IP addresses, etc. to support the models.

Keep operations simple. Small team of SAs and DBAs.

How to keep fraud models current and ensure integrity of incoming events and data.

Educate peers and higher ups of new technology and concepts so they ‘get it’. “HPC what?”

Page 7

What kind of volume?

11 million+ PayPal logins / day.

500+ variables calculated per event for some models.

~4 Billion inserts / day.

14 million+ financial transactions / day.

~8 Billion selects / day.
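To put the daily volumes above in perspective, the following sketch converts them to average per-second rates. It is simple arithmetic on the slide's figures (with the "+" and "~" qualifiers dropped); the names `daily_volumes` and `average_per_second` are illustrative and not part of the original deck.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Daily volumes quoted on the slide (qualifiers such as "+" and "~" dropped).
daily_volumes = {
    "logins": 11_000_000,
    "financial transactions": 14_000_000,
    "inserts": 4_000_000_000,
    "selects": 8_000_000_000,
}


def average_per_second(per_day: int) -> float:
    """Average rate over a full day; peak rates will be higher than this."""
    return per_day / SECONDS_PER_DAY


for name, per_day in daily_volumes.items():
    print(f"{name}: {average_per_second(per_day):,.0f} / sec on average")
```

Averaged over a day, 4 billion inserts works out to roughly 46 thousand per second and 8 billion selects to roughly 93 thousand per second.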

Page 8

Our Solution - Trinity

Real time linking platform for identities from various source systems. Built a ‘financial’ social network.

Intelligent gateways, message routing & delivery to heterogeneous systems.

Inline stream analytics using CEP and ESP.

Highly distributed open source databases for OLTP storage of edges and nodes. Architected for scale up, out and HA.

Standardized operations – h/w and s/w deployment, monitoring, command & control processes, etc.

Downstream analytics environments for further processing.

Leveraged HPC architecture and hardware where it made sense.
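The identity-linking idea above (nodes and edges forming a ‘financial’ social network) can be illustrated with a small in-memory graph. This is only a sketch under stated assumptions: the class `IdentityGraph`, the node naming scheme and the sample data are hypothetical, and the deck does not describe how Trinity actually stores or traverses its graph.

```python
from collections import defaultdict


class IdentityGraph:
    """Undirected graph: nodes are identifiers, edges are observed links."""

    def __init__(self):
        self.edges = defaultdict(set)

    def link(self, a: str, b: str) -> None:
        """Record that two identifiers were seen together."""
        self.edges[a].add(b)
        self.edges[b].add(a)

    def connected(self, start: str) -> set:
        """All nodes reachable from start (iterative depth-first search)."""
        seen, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            stack.extend(self.edges[node] - seen)
        return seen


# Hypothetical sample data: two accounts share a card, so they end up linked.
graph = IdentityGraph()
graph.link("account:alice", "card:1111")
graph.link("account:bob", "card:1111")
graph.link("account:bob", "ip:203.0.113.7")
graph.link("account:carol", "ip:198.51.100.9")

linked = graph.connected("account:alice")
print(sorted(linked))
```

Starting from one account, the traversal surfaces every account, card and IP address connected to it through shared attributes, which is the kind of relationship a fraud model might consume.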

Page 9

Trinity – EFL Flow

[Diagram: Trinity EFL flow, numbered steps 1 to 7. Labeled components: PP EFL; message bus; CEP; REST and SOAP interfaces; Identity, Indigo and Cerulean SGW pools; Azure and Indigo app pools; TIS, Cerulean, BES/RES and Cobalt (SFS) pools; Azure, Indigo and TIS databases.]

Page 10

SGI ALTIX 8200/8400 ICE CLUSTERS - 2008

156 sockets

1872 cores

7.5 TB RAM

Intel Xeon X5650 2.67 GHz

78 nodes

Page 11

SGI ALTIX ICE 8200/8400 CLUSTER

Supports multiple deployment strategies in the same cluster

EFL Cluster Provisioning By Application

Page 12

Cyber Monday 2014 – Trinity only

Group      Metric            Per Day   Per Second   Total / Sec
Messages   Sent              -         160K
Messages   Rcvd              -         80K          240K
Events     Sent              -         1200
Events     Rcvd              -         5000         6200
Database   DB Op: SELECT     6B        69K
Database   DB Op: INSERT     10B       119K
Database   DB Op: UPDATE     4B        47K          235K
Database   Rows Read         39B       458K
Database   Rows Inserted     8B        95K
Database   Rows Updated      12B       141K         694K
Database   Bytes Sent        8TB       98MB
Database   Bytes Rcvd        5.3TB     65MB         163MB
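The per-second figures on this slide can be cross-checked against the "Totals / Sec" row. The sketch below assumes the twelve per-second values map, in order, to messages sent/received, events sent/received, the three DB operations, the three row counts, and bytes sent/received; that column mapping is inferred from the flattened transcript rather than stated in it.

```python
SECONDS_PER_DAY = 86_400

# Per-second figures from the slide, grouped under the assumed column mapping.
per_second = {
    "messages": {"sent": 160_000, "received": 80_000},
    "events": {"sent": 1_200, "received": 5_000},
    "db_ops": {"select": 69_000, "insert": 119_000, "update": 47_000},
    "rows": {"read": 458_000, "inserted": 95_000, "updated": 141_000},
    "megabytes": {"sent": 98, "received": 65},
}

# Sum each group to compare with the "Totals / Sec" row.
totals = {group: sum(values.values()) for group, values in per_second.items()}
print(totals)

# Scale one per-second figure to a day for comparison with the "Per Day" row.
selects_per_day = per_second["db_ops"]["select"] * SECONDS_PER_DAY
print(f"SELECTs per day: {selects_per_day / 1e9:.1f} billion")
```

Under that mapping the group sums match the slide's totals (240K, 6200, 235K, 694K and 163MB), and 69K SELECTs per second scales to about 6 billion per day, in line with the "Per Day" row.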


Page 14

“BABAR” – Hadoop Cluster

Hadoop / HDFS deployment architected differently than traditional Hadoop installations.

Separation of storage from compute to allow independent expansion model.

SSDs for shuffle/sort and spinning disk (Lustre) for ingress / egress.

Used for offline analytics by where.com for geo-marketing data, spatial vectoring, array modeling, etc.

Also houses OLTP, OLAP and vector databases, R, MATLAB, etc.

Page 15

“BABAR” SGI ALTIX 8400 ICE CLUSTER

1152 sockets

2304 cores

14.2 TB RAM

Intel Xeon X5690 3.47 GHz

128 nodes

IS5500 QDR IB arrays

Shared storage on 6 IS5500 arrays, DDN SFA10K & DDN SFA12K arrays

Page 16

The problem with Big Data analytics…

Everyone seems to think there is a single solution to solve the problem. There isn’t.

“The Three Legged Stool”: OLTP, Analytics DB, HDFS

Page 17

Systems Intelligence

“MAINTAINING THE WELLNESS OF THE PAYPAL ECOSYSTEM”

OPERATIONAL ANALYTICS

Page 18

The problems that kicked off Systems Intelligence:

No analytics being done on the terabytes of operational data being pushed from app servers.

No understanding of how this could even be analyzed by operations teams.

No interest from the business units, because it wasn’t business analytics.

No vision of possible future benefits or integrations to other operational areas.

In steps ATG, with a proven architecture and a vision to deploy HPC to solve the problem.

Page 19

A FINELY TUNED ECOSYSTEM

Ecosystem: a system involving the interactions between a community of living organisms in a particular area and its nonliving environment

Page 20

What are we trying to accomplish with operational Big Data?

Gather a holistic view of PayPal’s ecosystem (i.e., interactions between physical sites, infrastructure, applications and customers). Think “Internet of Things” inside the data center.

Create a self-healing environment through the use of predictive analytics, event correlation and behavior and remediation rule sets.

Model the entire ecosystem’s capacity and capabilities for growth, performance and efficiency.

Leverage real time streaming analytics with dynamic models built offline to recognize patterns and take appropriate actions.

Educate our peers and management about real time analytics augmented with ancillary datasets.
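The self-healing goal above pairs event correlation with remediation rule sets. A minimal sketch of what a rule set could look like follows; the `Rule` structure, the metric names, the thresholds and the action labels are all hypothetical and are not taken from PayPal's implementation.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Rule:
    name: str
    matches: Callable[[dict], bool]  # predicate evaluated against one event
    action: str                      # hypothetical remediation label


# Illustrative rules only; metrics and thresholds are made up.
RULES = [
    Rule("hot_rack",
         lambda e: e.get("metric") == "temp_c" and e["value"] > 35,
         "raise_cooling"),
    Rule("slow_pool",
         lambda e: e.get("metric") == "latency_ms" and e["value"] > 500,
         "drain_pool"),
]


def remediate(event: dict) -> List[str]:
    """Return the actions of every rule whose predicate matches the event."""
    return [rule.action for rule in RULES if rule.matches(event)]


print(remediate({"metric": "latency_ms", "value": 750, "source": "pool-a"}))
```

Keeping rules as data rather than code paths is one way to let new remediation behavior be deployed without touching the event pipeline itself.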

Page 21

What are we up against? Operations analytics in real time…

Real time anomaly detection in correlated event streams using predictive analytical models based on historical data sets. Streams include application logs, server machine data, data center metrics and social media.

3 million events / sec from 1000s of sources in our data centers. “IoT in the data center”

25Tb of data ingested every hour

20Mb / sec machine data

Increasing social media trends / customer interaction per day

50K metadata relationships
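As a rough illustration of real time anomaly detection on an event stream, the sketch below flags values that deviate strongly from a rolling window of recent observations. A rolling z-score is a deliberately simple stand-in for the predictive models the slide mentions; the class name, window size, threshold and latency values are assumptions chosen for illustration.

```python
from collections import deque
from statistics import mean, stdev


class RollingZScoreDetector:
    """Flag values far from the mean of the most recent observations."""

    def __init__(self, window: int = 30, threshold: float = 3.0):
        self.values = deque(maxlen=window)  # oldest values fall off automatically
        self.threshold = threshold

    def observe(self, value: float) -> bool:
        """Return True if value is anomalous relative to the recent window."""
        is_anomaly = False
        if len(self.values) >= 2:  # stdev needs at least two samples
            mu = mean(self.values)
            sigma = stdev(self.values)
            if sigma > 0 and abs(value - mu) / sigma > self.threshold:
                is_anomaly = True
        self.values.append(value)
        return is_anomaly


# Synthetic latency readings in milliseconds with one obvious spike.
detector = RollingZScoreDetector(window=10, threshold=3.0)
latencies_ms = [20, 21, 19, 20, 22, 21, 20, 19, 21, 20, 95, 20]
flags = [detector.observe(v) for v in latencies_ms]
print(flags)
```

Only the spike is flagged here. A model trained offline on historical data would replace the mean and standard deviation with a learned expectation, but the streaming shape of the check stays the same.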

Page 22

Our Solution – Systems Intelligence

Common ontology for concepts and relationships.

Purpose built systems driven by underlying technology.

RDMA, clustered file systems for reduced copy times.

Inline stream analytics using CEP, ESP and patented technologies.

Downstream analytics environments for model building and further processing.

Page 23

Systems Intelligence Flow

[Diagram with numbered stages:
1 Source Events (machine data, app logs, environmental data, social data)
2 Message Bus
3 In Line Processing (apps and CEP engines over a shared memory event window)
4 Graph DB and OLTP DB
5 Destination Data Stores (analytics DB)
6 New Models (visualization, machine learning, “Data Scientist”)
7 and 8 Linked data, monitoring, alerting, self-healing]

Page 24

UV2000 Installation – Jan 2014

Page 25

UV2000 Installation – Jan 2014

Page 26

Systems Intelligence Cluster

THE UV2000

“The big brain”

24 sockets

96 cores

Intel Xeon E5

6 TB RAM

Shared storage on six SGI IS5500 arrays, DDN SFA10K and DDN SFA12K arrays

IS5500 QDR IB arrays

BABAR and….

Page 27

Systems Intelligence Analytics

Keep events in memory so you have a ‘rolling window’ for CEP and ESP processing.

The event window is a function of memory size, events/sec and event size: roughly memory size / (events/sec * event size).

Shared memory data set can be acted upon by a number of different processes.

Data is streamed through predictive models generated from offline machine learning and deep analytics of historical data sets.
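The relationship between memory, event rate and window length can be made concrete with a short calculation and a minimal rolling window. The 6 TB of RAM and 3 million events/sec come from other slides in this deck; the 1 KB average event size, the assumption that all RAM holds events, and the names `window_seconds` and `RollingWindow` are illustrative only.

```python
from collections import deque


def window_seconds(memory_bytes: float, events_per_sec: float,
                   event_bytes: float) -> float:
    """Seconds of events that fit in memory at a steady ingest rate."""
    return memory_bytes / (events_per_sec * event_bytes)


# 6 TB of RAM and 3 million events/sec are from the slides;
# the 1 KB average event size is an assumption for illustration.
seconds = window_seconds(6e12, 3e6, 1_000)
print(f"window: {seconds:,.0f} s (~{seconds / 60:.0f} min)")


class RollingWindow:
    """Keep only events newer than max_age seconds."""

    def __init__(self, max_age: float):
        self.max_age = max_age
        self.events = deque()  # (timestamp, payload) pairs, oldest first

    def add(self, timestamp: float, payload: dict) -> None:
        """Append an event and evict everything older than the window."""
        self.events.append((timestamp, payload))
        cutoff = timestamp - self.max_age
        while self.events and self.events[0][0] < cutoff:
            self.events.popleft()
```

Under those assumptions the window holds about 2,000 seconds of events. Doubling the event size or the event rate halves the window, which is why memory capacity drives how far back in-line CEP and ESP processing can look.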

Page 28

What can we tell from Systems Intelligence and IoT streams?

Application flows slowing down (e.g., due to database performance degradation or a new code push).

Aberrant server or server pool behavior.

Environmental issue in the data center (e.g., temperature deviation, accidental operator error).

Bug in new code shows up as a change in social media sentiment or increased customer service activity.

Real time business metrics (e.g., total payment volume).

Page 29

Benefits derived from Systems Intelligence

Visibility into the health of the complete ecosystem all the time.

Self healing of potential issues through predictive models and remediation rule sets.

Modeling the entire PayPal system into a Linked Data paradigm for future use.

Less reliance on humans. Computers don’t need sleep.

Ecosystem has become too complex for humans to comprehend.

Cost benefit to business – running a more efficient system, up to its capabilities, not its capacity.

Page 30

Our latest development for Systems Intelligence…

“Complex Event Processing as Digital Signals”

Page 31

Real-Time Analogy

Everyone likes to go to concerts…

Page 32

The Concert Experience

You’re at the concert listening to your favorite piece of music or song.

You’re really enjoying yourself, you’ve had a glass of wine, you feel in tune with the musicians…

Suddenly you hear a bad note.

But it’s a concert, the show goes on, you ignore it and you have another sip of wine.

Page 33

The Concert Experience

But…what just happened?

You analyzed data in real time.

You used predictive models to look for anomalies in the event stream, in REAL TIME.

Not a second later, not a minute later, not a day later…

But at the instant the event occurred.

Page 34

Difficult stuff, right? Yes it is.

How do we create a solution that allows us to do this?

Cheaper

Faster

Greener

Page 35

…meanwhile, in a completely unrelated meeting…

Page 36

An idea evolved regarding the m800 cartridge...

Page 37

… HPC in a SoC …

Page 38

Difficult stuff, right? Yes, but not impossible.

Cheaper

Faster

Greener

Page 39

Familiar Systems Integration (ARM)

• Linux for general purpose work

• integrating with enterprise systems (databases, marshaling, command & control)

• short development learning curve (Python, Java, OpenCL, Open MPI)

Efficient, Real-Time Parallel Processing

Complex Event Processing as Digital Signals

• Implement signal analysis in hardware

• solve encoding, marshaling, atomicity

• apply both global shared memory and scale-out process best practices

• leverage cross-platform development to decrease ramp-up and testing time (openCL)

Page 40

Parallel, True Real-Time Analytics

• Multiple filters/atomic event stream

• Multiple streams/filter

• Multiple filters/multiple streams

• Pattern recognition (outliers, clusters, frequency matrices, etc)

• Rich library of functions (notch/high pass/low pass filters, DFT/FFT, z- and bilinear transforms, etc.)
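To show what treating an event stream as a digital signal can look like, the sketch below applies two of the listed functions, a low pass filter and a DFT, to a series of per-interval event counts. It is plain Python on synthetic data, not the hardware implementation the slides describe; the function names and the signal itself are made up for illustration, and the moving average is just one simple form of low pass filter.

```python
import cmath
import math


def moving_average(signal, width):
    """Simple low pass filter: mean of the last `width` samples."""
    out = []
    for i in range(len(signal)):
        chunk = signal[max(0, i - width + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def dft_magnitudes(signal):
    """Magnitude of each DFT bin (naive O(n^2) transform)."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(signal)))
        for k in range(n)
    ]


# Synthetic event counts: a baseline of 100 plus a cycle that repeats
# every 8 samples. Values are made up for illustration.
counts = [100 + 20 * math.sin(2 * math.pi * t / 8) for t in range(64)]

magnitudes = dft_magnitudes(counts)
# Skip bin 0 (the constant baseline) and search only the first half,
# because the spectrum of a real-valued signal is mirrored.
dominant_bin = max(range(1, len(counts) // 2), key=lambda k: magnitudes[k])
period = len(counts) / dominant_bin
print(f"dominant period: {period:.0f} samples")

# Averaging over one full cycle removes the periodic component.
smoothed = moving_average(counts, width=8)
```

The DFT recovers the 8-sample cycle hidden in the counts, and the moving average flattens it back to the baseline. In production an FFT would replace the naive transform.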

HPC and Enterprise Best Practices

Complex Event Processing as Digital Signals

• Multicore implementation

• Tiered shared memory and queuing

• High-speed, low-latency transports inter/intra SoC

• Support for common development libraries and standards (OpenCL, OpenMP/MPI)

• Efficient, low-power solutions (~55 W/cartridge, 4 SoCs/cartridge)

• Extreme performance (11.2 GF/watt)
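The power and efficiency figures above imply a rough throughput per cartridge and per SoC. The arithmetic below assumes the 11.2 GF/watt figure applies at the full ~55 W cartridge power, which the slide does not state explicitly, so the results are an estimate rather than a quoted specification.

```python
WATTS_PER_CARTRIDGE = 55   # "~55W/cartridge" on the slide
SOCS_PER_CARTRIDGE = 4     # "4 SoCs / cartridge"
GFLOPS_PER_WATT = 11.2     # "11.2 GF/watt"

# Assumption: the efficiency figure holds at full cartridge power.
gflops_per_cartridge = WATTS_PER_CARTRIDGE * GFLOPS_PER_WATT
gflops_per_soc = gflops_per_cartridge / SOCS_PER_CARTRIDGE
watts_per_soc = WATTS_PER_CARTRIDGE / SOCS_PER_CARTRIDGE

print(f"{gflops_per_cartridge:.0f} GFLOPS per cartridge")
print(f"{gflops_per_soc:.0f} GFLOPS and {watts_per_soc:.2f} W per SoC")
```

Under that assumption a cartridge delivers on the order of 616 GFLOPS, or about 154 GFLOPS per SoC at under 14 W each.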


Page 42

Challenges along the way (pt 1)…

Misunderstanding of what HPC is. “I just bought a cloud… why do I need HPC now?”

Industry wants to follow trends of what other ‘valley’ companies are doing.

People are comfortable with what they know and resistant to change.

Had to apply e-commerce high availability and high volume standards to HPC.

Page 43

Challenges along the way (pt 2)…

Not able to realize there is no ‘box’, let alone think outside one.

No ability to apply new technology to existing problems in a different way.

Very few people can take a vision all the way to production deployment.

In terms of analytics – shortage of analysts with technical skills.

Page 44

How we’re addressing the challenges…

Educating our peers, managers and executives on the benefits and ROI of HPC as it pertains to the specific use cases we’ve identified.

Evangelize, socialize and showcase the work we’ve been doing.

Keep presenting PayPal HPC technology and use cases at conferences.

Formalize a roadshow and brown bag sessions about technical computing.

Try to host BOFs about HPC and industry to gain larger industry momentum.

Continue collaborations with ORNL, IBM Labs, HP and TI.

Page 45

What we’re looking at in a post split world….

HPDA – High Performance Data Analytics

More ‘Atoms’ in our ‘Atoms to Galaxies’ architecture toolbox.

Continue with socially responsible computing – not only through business initiatives, but bringing lower energy computing into the data centers.

Page 46

Thank you.

Image from Boris Müller’s “Visual Poetry 6” (http://www.esono.com/boris/projects/poetry06/visualpoetry06/)
