Ceph Day New York 2014: Distributed OLAP queries in seconds using CephFS

Posted on 25-May-2015

DESCRIPTION

Presented by Milosz Tanski, AdFin

Transcript

OLAP QUERIES IN SECONDS ON A PETABYTE DATASET

Distributing Petabucket data using CephFS

Milosz Tanski, CTO @ AdFin

milosz@adfin.com

October 2014

©AdFin. All Rights Reserved

Outline

Who/what is AdFin?

What is PetaBucket?

Petabucket on CephFS

Contributing FSCache support to CephFS

About AdFin

AdFin = Ad-Tech + Finance-Tech

Creating tools that bring buying intelligence to programmatic media.

Advertising is bought and sold in real time via RTB (since 2008)

Bringing transparency to the ad markets.

The Bloomberg, S&P, Markit… for Ad markets.

We Deliver… Pretty Analytics

What’s the problem?

The market is ~500 billion impressions a day, and growing.

Each impression is unique.

Each is worth a small fraction of a penny.

An order of magnitude more than the number of trades in the financial markets.

There are an order of magnitude more bids than impressions.

That’s a lot of data to process, store, and analyze.
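For a sense of scale, the daily figure converts to a per-second rate with simple arithmetic. A minimal sketch in Python; reading "an order of magnitude more bids" as exactly 10x is our assumption, not a number from the slides.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def per_second(events_per_day: float) -> float:
    """Convert a daily event count into an average per-second rate."""
    return events_per_day / SECONDS_PER_DAY


IMPRESSIONS_PER_DAY = 500e9  # "~500 billion impressions a day"
BID_MULTIPLIER = 10  # assumption: "an order of magnitude more bids" taken as 10x

impressions_per_s = per_second(IMPRESSIONS_PER_DAY)
bids_per_s = per_second(IMPRESSIONS_PER_DAY * BID_MULTIPLIER)

if __name__ == "__main__":
    print(f"impressions/s: {impressions_per_s:,.0f}")  # impressions/s: 5,787,037
    print(f"bids/s:        {bids_per_s:,.0f}")  # bids/s:        57,870,370
```

That is roughly 5.8 million impressions and, under the 10x assumption, about 58 million bids every second, averaged over the whole day.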

Petabucket

Distributed, time series, relational, OLAP database

Relational query language (but not SQL)

Each query is broken up into many smaller chunks.

Great single-node performance: tens of millions of rows a second.

Vectorized query processing, vectorized compressed bitmap indexes.

Real-time responses: the goal is low single-digit seconds (uncached).

Why? Because we’re a bit crazy.
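The slides do not show Petabucket's internals, but the two ideas above (splitting a query into time-based chunks and filtering rows with bitmap indexes) can be illustrated with a toy sketch. Everything here is hypothetical: the one-day chunk size, the `DayChunk` class, the `exchange`/`country`/`price` columns, and the use of plain Python integers as uncompressed bitsets. Petabucket's compressed, vectorized bitmap format is not described in the talk and is not reproduced here.

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import date, timedelta


class DayChunk:
    """Hypothetical one-day partition with a bitmap index per (column, value)."""

    INDEXED = ("exchange", "country")  # hypothetical dimension columns

    def __init__(self, rows):
        self.n = len(rows)
        self.prices = [r["price"] for r in rows]
        self.index = {}  # (column, value) -> int used as an uncompressed bitset
        for i, row in enumerate(rows):
            for col in self.INDEXED:
                key = (col, row[col])
                self.index[key] = self.index.get(key, 0) | (1 << i)

    def aggregate(self, **preds):
        """Return (row count, sum of price) for rows matching every equality predicate."""
        mask = (1 << self.n) - 1  # start with every row selected
        for col, val in preds.items():
            # AND the bitmaps; an unindexed column or unseen value matches nothing.
            mask &= self.index.get((col, val), 0)
        total = sum(p for i, p in enumerate(self.prices) if (mask >> i) & 1)
        return bin(mask).count("1"), total


def days(start, end):
    """Yield the day keys covering the half-open range [start, end)."""
    d = start
    while d < end:
        yield d
        d += timedelta(days=1)


def query(chunks, start, end, **preds):
    """Fan a query out over daily chunks, then merge the partial aggregates.

    Threads only illustrate the fan-out shape; in a cluster each chunk
    would be evaluated on whichever node picked it up.
    """
    todo = [chunks[d] for d in days(start, end) if d in chunks]
    with ThreadPoolExecutor() as pool:
        parts = list(pool.map(lambda c: c.aggregate(**preds), todo))
    return sum(n for n, _ in parts), sum(s for _, s in parts)


if __name__ == "__main__":
    rows = [
        {"exchange": "A", "country": "US", "price": 2},
        {"exchange": "B", "country": "US", "price": 3},
        {"exchange": "A", "country": "CA", "price": 5},
    ]
    chunks = {date(2014, 10, 1): DayChunk(rows), date(2014, 10, 2): DayChunk(rows)}
    print(query(chunks, date(2014, 10, 1), date(2014, 10, 3), exchange="A"))  # (4, 14)
```

Because each chunk returns only a small partial aggregate, merging is cheap no matter how many chunks a query spans; that is what makes splitting a query into many pieces attractive.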

Queries are easy for humans and machines

High-Level System Diagram

Time series bulk import

Petabucket and CephFS

CephFS as single-namespace storage for all nodes

Why?

Scalable storage (speed / size)

Separate storage from computation

No single point of failure (SPOF)

Distributed filesystem (DFS) performance

Client (kernel) performance

High-Level System Diagram, part 2

CephFS is not production ready?

Again, we’re a bit crazy?

Started in early 2013.

When we started, the client and MDS were not ready.

We found and reported a lot of bugs.

Yan Zheng fixed a lot of bugs. Thanks, Yan.

Today we’re happy and in production.

We have processed multiple PB of data since then.

FSCache for kclient

We decided to add local persistent caching support to the kclient.

Our access pattern:

Working set larger than node memory (page cache)

Append-only data (time series)

The most recent month / quarter of data is accessed 100x more often

Benefits:

Recovering latency / throughput lost by moving to a non-local filesystem

Reduce Ceph network traffic and OSD utilization

Cheap local SSDs deliver 500 MB/s reads

Not re-inventing the wheel
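Why a local cache pays off under this access pattern can be estimated with a back-of-the-envelope model. The sketch assumes reads are spread uniformly within a "hot" recent slice and a "cold" remainder, with each hot byte read 100x as often as a cold byte; the three-year retention used in the example is hypothetical, not a figure from the talk.

```python
def hot_read_fraction(hot_share: float, skew: float = 100.0) -> float:
    """Share of reads that land on the hot (recent) slice of the data.

    hot_share: fraction of the dataset that is recent, e.g. the last quarter.
    skew: how many times more often a hot byte is read than a cold byte.
    Assumes uniform access within each slice.
    """
    if not 0.0 <= hot_share <= 1.0:
        raise ValueError("hot_share must be between 0 and 1")
    hot_weight = skew * hot_share
    cold_weight = 1.0 - hot_share
    return hot_weight / (hot_weight + cold_weight)


if __name__ == "__main__":
    # Hypothetical: three years of retained data.
    quarter = hot_read_fraction(1 / 12)  # last quarter = 1/12 of three years
    month = hot_read_fraction(1 / 36)  # last month = 1/36 of three years
    print(f"last quarter: {quarter:.1%} of reads")  # last quarter: 90.1% of reads
    print(f"last month:   {month:.1%} of reads")  # last month:   74.1% of reads
```

Under these assumptions, keeping only the last quarter (about 8% of the data) on a local SSD would serve roughly 90% of reads, which is the kind of skew that makes a persistent local cache worthwhile when the working set no longer fits in the page cache.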

Kernel programming is hard

You have to understand Ceph, the kernel, and concurrency.

An error in the kernel hangs your machine or triggers an Oops.

Bugs in other parts of the kernel? (CacheFS).

Prototype working in two weeks

First submission 2 months later.

Merged into the kernel 5 months later.

Number one problem: concurrency.

Ceph with FSCache Status

In mainline since: Linux 3.13

… Works well since: 3.15

… All bugs fixed: 3.17

Speed… as fast as your caching disk

Tested single-client throughput: 1200 MB/s
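The measured throughput can be related back to the low single-digit-second goal stated earlier. A minimal sketch, assuming a query may spend 3 seconds scanning, that every node reads from its cache at the tested 1200 MB/s, and a hypothetical 1 TB of data touched by the query; it ignores CPU, decompression, and any data that indexes let a query skip.

```python
MB = 10**6  # decimal megabytes, as throughput figures are usually quoted
TB = 10**12


def bytes_scanned(mb_per_s: int, seconds: int) -> int:
    """Bytes one client can read in the given time at a steady rate."""
    return mb_per_s * MB * seconds


def nodes_needed(dataset_bytes: int, mb_per_s: int, seconds: int) -> int:
    """Smallest node count that scans dataset_bytes within the time budget."""
    per_node = bytes_scanned(mb_per_s, seconds)
    return -(-dataset_bytes // per_node)  # ceiling division on integers


if __name__ == "__main__":
    print(bytes_scanned(1200, 3))  # 3600000000 bytes per node in 3 s
    print(nodes_needed(TB, 1200, 3))  # 278 nodes at the tested cache throughput
    print(nodes_needed(TB, 500, 3))  # 667 nodes with one 500 MB/s SSD each
```

The model is linear: halving the bytes a query touches, or doubling per-node read speed, halves the node count needed to stay within the same time budget.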

Next steps…

Contributing to Ceph and the kernel is addictive:

Ceph performance work: improving latency / IOPS.

Kernel work: a preadv2() syscall for non-blocking reads in file-serving applications.

http://lwn.net/Articles/612483/
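The idea behind the linked proposal is that a file server tries a read that succeeds only if the data is already in the page cache, and hands the read to a worker thread only when it would block. To our understanding, the interface that later reached mainline Linux is preadv2() with the RWF_NOWAIT flag (buffered reads since Linux 4.14), which Python exposes as `os.preadv(fd, buffers, offset, os.RWF_NOWAIT)` on Python 3.7+. The sketch assumes that later interface, not the exact API of the 2014 patches; `try_cached_read` and `read` are hypothetical helper names, and the code falls back to a plain `pread` where the flag is unavailable.

```python
import asyncio
import os
import tempfile
from typing import Optional


def try_cached_read(fd: int, size: int, offset: int) -> Optional[bytes]:
    """Return the data only if the kernel can serve it without blocking."""
    nowait = getattr(os, "RWF_NOWAIT", None)
    if nowait is None or not hasattr(os, "preadv"):
        return None  # platform lacks preadv2(RWF_NOWAIT)
    buf = bytearray(size)
    try:
        n = os.preadv(fd, [buf], offset, nowait)
    except OSError:
        # BlockingIOError (EAGAIN) means "not cached"; other errors (e.g. the
        # filesystem rejects the flag) also send us to the slow path, where a
        # genuine error such as a bad fd will surface from pread.
        return None
    # A short read may mean only part of the range was cached (or EOF),
    # so only complete reads take the fast path.
    return bytes(buf) if n == size else None


async def read(fd: int, size: int, offset: int) -> bytes:
    """Serve from the page cache inline; otherwise read on a worker thread."""
    data = try_cached_read(fd, size, offset)
    if data is not None:
        return data
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, os.pread, fd, size, offset)


if __name__ == "__main__":
    with tempfile.TemporaryFile() as f:
        f.write(b"petabucket chunk")
        f.flush()
        print(asyncio.run(read(f.fileno(), 10, 0)))  # b'petabucket'
```

The event loop thread never waits on disk or network: cached reads complete inline, and only cache misses pay the cost of a thread hand-off.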

Thank You!

Let’s Get in Touch

milosz@adfin.com

16 E. 34th Street, 15th Floor

New York, New York 10016

Milosz Tanski, CTO

linkedin.com/company/AdFin

twitter.com/AdFin
