Tailored for Spark, Hadoop Summit Dublin 2016, Petr Igrevski and John Scheibmeir, eBay

Apr 16, 2017

Transcript
Page 1: Tailored for Spark

Tailored for Spark

Hadoop Summit Dublin 2016
Petr Igrevski
John Scheibmeir
eBay

Page 2: Tailored for Spark

eBay - Tailored for Spark 2

How to tailor Spark for maximum impact

1. Optimal infrastructure

2. Customized user experience

Page 3: Tailored for Spark


Outline

1. eBay, Analytics, Hadoop, and Spark

2. Spark Opportunities at eBay

3. Q&A

Page 4: Tailored for Spark

BACKGROUND

Page 5: Tailored for Spark


eBay


Q4 2015

Page 6: Tailored for Spark


Analytics at eBay

Analytics

• BI: Kylin, MicroStrategy, Tableau, R / SAS

• ETL: Ab Initio

• Data Platform: Hadoop, Teradata

• Streaming, Spark

Page 7: Tailored for Spark


Hadoop at eBay

1. Search Index

2. Log Management

3. Operation Metric Management

4. Analytics


Page 8: Tailored for Spark


Hadoop Hardware

Multiple Generations

12-18 Cores

72-128GB RAM

24-72TB Storage

Provisioned by cabinet


Page 9: Tailored for Spark


Spark at eBay

• Versions in use
  – Spark 1.4 to Spark 1.6

• Methods
  – YARN

• Current utilization
  – 20% of analytic clusters

• Use cases
  – Purchase suggestions
  – Marketing optimization
  – Customer interests, consistency, and similarity
  – Kylin cube building


Page 10: Tailored for Spark


Spark Challenges

• Capacity management and efficiency
  – MapReduce => YARN
  – Job sizing

• Support
  – Missing vendor support
  – Missing expertise

• Deployment
  – Library conflicts
  – Configuration challenges
  – Distribution sprawl

• Integration
  – Configuration
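The job-sizing challenge largely comes down to fitting executors into the node profiles listed on the Hadoop Hardware slide (12-18 cores, 72-128GB RAM). The sketch below is a hypothetical heuristic, not eBay's method. It assumes one core and 8 GB per node are reserved for the OS and Hadoop daemons, about five cores per executor (a common rule of thumb), and the Spark 1.x default YARN memory overhead of 10% of executor memory with a 384 MB floor. It does not model YARN rounding container sizes up to `yarn.scheduler.minimum-allocation-mb`.

```python
from dataclasses import dataclass


@dataclass
class NodeProfile:
    cores: int
    ram_gb: int


@dataclass
class ExecutorPlan:
    executors_per_node: int
    cores_per_executor: int
    executor_memory_mb: int  # heap, passed as --executor-memory
    overhead_mb: int         # off-heap overhead YARN adds to each container


def plan_executors(node, cores_per_executor=5, reserved_cores=1,
                   reserved_ram_gb=8, overhead_fraction=0.10,
                   min_overhead_mb=384):
    """Split one node into equally sized executors (hypothetical heuristic)."""
    usable_cores = node.cores - reserved_cores        # assumed OS/daemon reserve
    usable_mb = (node.ram_gb - reserved_ram_gb) * 1024
    if usable_cores < 1 or usable_mb <= 0:
        raise ValueError("node too small after reservations")

    executors = max(1, usable_cores // cores_per_executor)
    container_mb = usable_mb // executors             # memory budget per container

    # Container = heap + overhead, where overhead = max(floor, fraction * heap).
    heap_mb = int(container_mb / (1 + overhead_fraction))
    overhead_mb = max(min_overhead_mb, int(heap_mb * overhead_fraction))
    if heap_mb + overhead_mb > container_mb:          # the 384 MB floor dominated
        heap_mb = container_mb - overhead_mb

    return ExecutorPlan(executors, min(cores_per_executor, usable_cores),
                        heap_mb, overhead_mb)


if __name__ == "__main__":
    for node in (NodeProfile(cores=12, ram_gb=72),
                 NodeProfile(cores=18, ram_gb=128)):
        print(node, "->", plan_executors(node))
```

Under these assumptions a 12-core, 72 GB node yields two 5-core executors with roughly 29 GB of heap each; changing the reservations or cores per executor shifts the split accordingly.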


Page 11: Tailored for Spark

TAILORING SPARK

"Simple things should be simple. Complex things should be possible."

Alan Kay


Page 12: Tailored for Spark


We can

• Copy
• Test
• Run


Page 13: Tailored for Spark


Opportunities for Spark

• Flexibility
• Usability
• Simplicity
• Speed
• Transparency


Page 14: Tailored for Spark


On YARN

• Security
• Multitenancy
• Reliability
• Experience
• Performance


Diagram: Spark on YARN, secured with Kerberos, with storage on HDFS, HDFS, Swift, and NFS.
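Running on a Kerberos-secured YARN cluster largely surfaces to users as `spark-submit` options. The helper below is a hypothetical sketch that assembles such a command; the principal, keytab path, queue, class, and jar names are placeholders, not eBay values. `--principal` and `--keytab` are YARN-only options that let a job log in to the KDC and renew its credentials, `--queue` selects the YARN queue used for multitenant capacity, and the Spark 1.x setting `spark.yarn.access.namenodes` lists additional secure HDFS clusters the job needs tokens for. Swift or NFS access would need further configuration not shown here.

```python
import shlex


def build_spark_submit(app_jar, main_class, principal, keytab, queue,
                       num_executors, executor_cores, executor_memory,
                       extra_namenodes=(), app_args=()):
    """Assemble a spark-submit argument list for a kerberized YARN cluster."""
    cmd = [
        "spark-submit",
        "--master", "yarn",
        "--deploy-mode", "cluster",
        "--class", main_class,
        "--queue", queue,                 # YARN queue for multitenant capacity
        "--principal", principal,         # Kerberos principal for the KDC login
        "--keytab", keytab,               # keytab for that principal (renewal)
        "--num-executors", str(num_executors),
        "--executor-cores", str(executor_cores),
        "--executor-memory", executor_memory,
    ]
    if extra_namenodes:
        # Additional secure HDFS namenodes to obtain delegation tokens for.
        cmd += ["--conf",
                "spark.yarn.access.namenodes=" + ",".join(extra_namenodes)]
    cmd.append(app_jar)                   # application jar comes after options
    cmd.extend(app_args)                  # then the application's own arguments
    return cmd


if __name__ == "__main__":
    # All names below are placeholders for illustration only.
    cmd = build_spark_submit(
        app_jar="my-job.jar",
        main_class="com.example.MyJob",
        principal="analyst@EXAMPLE.COM",
        keytab="/path/to/analyst.keytab",
        queue="analytics",
        num_executors=20,
        executor_cores=5,
        executor_memory="29g",
        extra_namenodes=["hdfs://nn-a.example.com:8020",
                         "hdfs://nn-b.example.com:8020"],
    )
    print(shlex.join(cmd))
```

Returning an argument list rather than a single string keeps quoting safe if the command is later handed to `subprocess.run`.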

Page 15: Tailored for Spark


Does it fit?

• Compute
• Storage
• Network
• Provisioning


Shared Compute resources

Independently scalable storage

Flat Network

Page 16: Tailored for Spark


Can we make it feel better?

• Standard ADLC
• Test to your level of comfort
• Single click deployment
• Watch every step
• Certify your job
• Let it run
• Did you say UI?


Diagram stages: Development, Test, Packaging, Certification, Runtime

Supporting components: Register, Repos, CI, Metadata DB, Provisioning, Runtime farm, Orchestrator
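The Development, Test, Packaging, Certification, and Runtime stages can be read as a gated pipeline. The following is a hypothetical sketch, not eBay's implementation: the `Stage` and `JobLifecycle` names are invented, and it assumes a job advances one stage at a time, a failed gate sends it back to Development, and every transition is recorded so each step can be watched.

```python
from enum import IntEnum


class Stage(IntEnum):
    DEVELOPMENT = 1
    TEST = 2
    PACKAGING = 3
    CERTIFICATION = 4
    RUNTIME = 5


class JobLifecycle:
    """Tracks one Spark job through the gated stages (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        self.stage = Stage.DEVELOPMENT
        self.history = [Stage.DEVELOPMENT]   # every transition, for auditing

    def advance(self, passed=True):
        """Move to the next stage if the current gate passed, else restart."""
        if self.stage is Stage.RUNTIME:
            raise ValueError(f"{self.name} is already running")
        self.stage = Stage(self.stage + 1) if passed else Stage.DEVELOPMENT
        self.history.append(self.stage)
        return self.stage


if __name__ == "__main__":
    job = JobLifecycle("purchase-suggestions")   # hypothetical job name
    job.advance()               # Development -> Test
    job.advance()               # Test -> Packaging
    job.advance()               # Packaging -> Certification
    job.advance(passed=False)   # certification fails: back to Development
    print([s.name for s in job.history])
```

Because only Certification can lead to Runtime in this model, an uncertified job can never reach the runtime farm.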

Page 17: Tailored for Spark


Q/A
