The Big Data Revolution is an Evolution

Dec 05, 2014

Dealing with data doesn't only require a data store, it requires an infrastructure. At SimpleReach, we have 5 data storage layers to service all of our data needs. These range from high volume, high velocity data ingestion with real-time analytics to ad-hoc style historical analysis with search capabilities. To communicate effectively between applications, data stores sit behind a service architecture for consistent data access patterns and failover/redundancy. This talk is a story of how we came to this architecture and some of the lessons we learned along the way.
Transcript
Page 1: The Big Data Revolution is an Evolution

Eric Lubow

@elubow

Page 2: The Big Data Revolution is an Evolution

Big Data Revolution is an Evolution

Eric Lubow @elubow #NYCassandra2013

Overview

• Evolution

• SimpleReach

• Data Stores / Languages

• Architecture Implementation

Page 3: The Big Data Revolution is an Evolution

We're in the midst of an evolution, not a revolution.

Page 4: The Big Data Revolution is an Evolution

The 2 Truths

Page 5: The Big Data Revolution is an Evolution

Even with the right tools, 80% of the work of building a big data system is acquiring and refining

The Real Truth

Page 6: The Big Data Revolution is an Evolution

30m plays/day + 4m user ratings + 75k movies metadata + 24.4m users metadata =

David Fincher + Kevin Spacey + British House of Cards

Mitch Hurwitz + Will Arnett + Jason Bateman + Arrested Development

Page 7: The Big Data Revolution is an Evolution

BRING IT TOGETHER

Page 8: The Big Data Revolution is an Evolution

evolution / revolution

Insufficient Capabilities

Scale/Need Changes

Development & Integration

New Products

Page 9: The Big Data Revolution is an Evolution

Page 10: The Big Data Revolution is an Evolution

Page 11: The Big Data Revolution is an Evolution

• Millions of URLs per day

• Over 1 billion pageviews per month

• 250m events per day (~3k events/second)

• Auto-scale 90-130 machines depending on traffic

SimpleReach
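
The per-second figure on this slide follows from the daily total. As a quick check of that arithmetic (the function name is just for illustration, and the result is a daily average, not a peak rate):

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def average_rate_per_second(events_per_day: int) -> float:
    """Average events per second, assuming traffic spread evenly over the day."""
    return events_per_day / SECONDS_PER_DAY


rate = average_rate_per_second(250_000_000)
print(f"{rate:,.0f} events/second on average")
```

Real traffic is not evenly spread, which is presumably why the machine count auto-scales with load.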

Page 12: The Big Data Revolution is an Evolution

HUMBLE BEGINNINGS

Page 13: The Big Data Revolution is an Evolution

Scale

Page 14: The Big Data Revolution is an Evolution

AND THEN...

C*

Page 15: The Big Data Revolution is an Evolution

• Large data volume ingestion at high velocity

• Really fast writes to many locations (eventual consistency)

• Query by column groups within rows (slicing)

• TTLs for small group aggregation

• Wrote Helenus, Node.js driver for Cassandra

Cassandra C*
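
To make the "slicing" bullet concrete, here is a toy, in-memory model of a wide row whose columns are kept sorted by name, so a contiguous range of columns can be read in one pass. This is only a sketch of the idea, not Cassandra or Helenus code; the `WideRow` class, the per-minute column names, and the counts are all hypothetical.

```python
from bisect import bisect_left, bisect_right


class WideRow:
    """Toy model of one wide row: column names are kept in sorted order."""

    def __init__(self):
        self._names = []   # sorted column names
        self._values = {}  # column name -> value

    def insert(self, name, value):
        # Keep the name list sorted so range reads stay cheap.
        if name not in self._values:
            self._names.insert(bisect_left(self._names, name), name)
        self._values[name] = value

    def slice(self, start, finish):
        """Return (name, value) pairs with start <= name <= finish."""
        lo = bisect_left(self._names, start)
        hi = bisect_right(self._names, finish)
        return [(n, self._values[n]) for n in self._names[lo:hi]]


# Hypothetical per-minute pageview counters for a single URL.
row = WideRow()
for minute, views in [("2013-05-01T10:02", 7), ("2013-05-01T10:00", 3),
                      ("2013-05-01T10:01", 5), ("2013-05-01T11:00", 9)]:
    row.insert(minute, views)

# Read only the columns that fall inside the 10 o'clock hour.
ten_oclock = row.slice("2013-05-01T10:00", "2013-05-01T10:59")
```

Because the names sort lexicographically, timestamp-style column names turn a time-window query into a single contiguous slice.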

Page 16: The Big Data Revolution is an Evolution

• Fast atomic increments (Node.js is native JSON)

• Sharding

• Solid ORM for Rails (Mongoid)

• B-Tree Indexes

• Document based via JSON

• TTLs for ephemeral data

MongoDB

Page 17: The Big Data Revolution is an Evolution

• Supports hundreds of thousands of transactions per second

• Great caching engine

• Supports useful variable types like sets, sorted sets, and lists

• Everything is guaranteed to be in memory

Redis

Page 18: The Big Data Revolution is an Evolution

• Works with standard MySQL driver

• Column Stores for ad-hoc analytics queries in SQL

• Heavy compression of data (avg 12:1)

Infobright

Page 19: The Big Data Revolution is an Evolution

• Polyglottany doesn’t only apply to data stores

• Each language has its own benefit to each stack layer

• Each language has its own individual benefits

• Each language has its own development benefits

The c0dez

Page 20: The Big Data Revolution is an Evolution

Page 21: The Big Data Revolution is an Evolution

Cons

• Redis - Can only utilize a single core. SerDe price.

• Infobright - DELETE/UPDATEs are VERY expensive

• Cassandra - No btree indexes or probabilistic counters

• Mongo - Indexes must fit in memory. Forced Replica ping times

• Python - Whitespace. Community

• Ruby - Not high performance enough for our standards

Page 22: The Big Data Revolution is an Evolution

Evolution Takes Work

• Service Oriented Architecture (Internal API)

• Data accuracy checks: visual and programmatic

• Built framework for testing out engines (Storage, Queueing, etc)

• Access to many toolsets (for all languages, DBs, Engines)
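
One way to picture the internal API bullet: applications call a single service, and the service decides which backend answers, falling over to a replica when one is down. The sketch below assumes that reading. `InternalAPI`, `InMemoryStore`, and `StoreUnavailable` are hypothetical names for illustration, not SimpleReach's actual code.

```python
class StoreUnavailable(Exception):
    """Raised by a backend that cannot serve the request."""


class InMemoryStore:
    """Stand-in for a real data store client (hypothetical)."""

    def __init__(self, name, data=None, healthy=True):
        self.name = name
        self.data = dict(data or {})
        self.healthy = healthy

    def get(self, key):
        if not self.healthy:
            raise StoreUnavailable(self.name)
        return self.data.get(key)


class InternalAPI:
    """Single access point: callers never talk to a store directly."""

    def __init__(self, replicas):
        self.replicas = list(replicas)

    def get(self, key):
        # Try each replica in order; move on when one is unavailable.
        for replica in self.replicas:
            try:
                return replica.get(key)
            except StoreUnavailable:
                continue
        raise StoreUnavailable("all replicas failed")


primary = InMemoryStore("primary", {"url:1": 42}, healthy=False)
secondary = InMemoryStore("secondary", {"url:1": 42})
api = InternalAPI([primary, secondary])
value = api.get("url:1")  # served by the secondary, since the primary is down
```

Keeping the failover logic in one place means every application gets the same access pattern regardless of which store sits behind it.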

Page 23: The Big Data Revolution is an Evolution

Service

Internal API

Solr

Real-time C*

C*

Page 24: The Big Data Revolution is an Evolution

Path of a Packet

(Diagram labels: Internet, EP, Internal API, Solr, C*, Mongo, Redis, IB, API, Fire Hose, SC, Consumers, Queue)

Page 25: The Big Data Revolution is an Evolution

Architecture Distribution

US-EAST-1a

MONGO-SHARD-0001-B

MONGO-SHARD-0000-A

CASSANDRA-0001

CASSANDRA-0010

REDIS-0001A

INFOBRIGHT-0001

iAPI-0001

US-EAST-1b

MONGO-SHARD-0002-B

MONGO-SHARD-0001-A

CASSANDRA-0002

CASSANDRA-0011

REDIS-0001B

iAPI-0002

US-EAST-1e

MONGO-SHARD-0002-A

MONGO-SHARD-0000-B

CASSANDRA-0003

CASSANDRA-0012

INFOBRIGHT-0002

iAPI-0003
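
The host list above can be checked mechanically. The sketch below copies the placement from the slide and verifies that no Mongo shard has both of its members in the same availability zone. It assumes the trailing `-A`/`-B` marks two members of the same shard, which the slide does not state outright; the helper name is hypothetical.

```python
from collections import defaultdict

# Host placement as listed on the slide, keyed by availability zone.
PLACEMENT = {
    "us-east-1a": ["MONGO-SHARD-0001-B", "MONGO-SHARD-0000-A", "CASSANDRA-0001",
                   "CASSANDRA-0010", "REDIS-0001A", "INFOBRIGHT-0001", "iAPI-0001"],
    "us-east-1b": ["MONGO-SHARD-0002-B", "MONGO-SHARD-0001-A", "CASSANDRA-0002",
                   "CASSANDRA-0011", "REDIS-0001B", "iAPI-0002"],
    "us-east-1e": ["MONGO-SHARD-0002-A", "MONGO-SHARD-0000-B", "CASSANDRA-0003",
                   "CASSANDRA-0012", "INFOBRIGHT-0002", "iAPI-0003"],
}


def shard_zones(placement):
    """Map each Mongo shard to the set of zones that hold one of its members."""
    zones = defaultdict(set)
    for zone, hosts in placement.items():
        for host in hosts:
            if host.startswith("MONGO-SHARD-"):
                # Assumption: the final "-A"/"-B" identifies a member of the shard.
                shard = host.rsplit("-", 1)[0]
                zones[shard].add(zone)
    return dict(zones)


zones_by_shard = shard_zones(PLACEMENT)
# True when every shard has members in at least two zones.
survives_zone_loss = all(len(z) >= 2 for z in zones_by_shard.values())
```

Under that assumption, losing any single zone still leaves one member of every shard reachable.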

Page 26: The Big Data Revolution is an Evolution

The Schrute of the Problem

Page 27: The Big Data Revolution is an Evolution

Evolving Amazon Tools

• Full Featured API

• Simple Queue Service (SQS)

• Data Pipeline

• OpsWorks

• CloudFormation

• Redshift Analytics

• CloudSearch

• Elastic Beanstalk

• Elastic MapReduce

• Simple Workflow Service (SWF)

• S3 / Glacier

Page 28: The Big Data Revolution is an Evolution

DevOps Wizardry

• Extensive use of AWS

• Monitor: Nagios, StatsD, and Graphite

• Manage: Chef, OpsWorks, csshX

• Deployments
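
For the monitoring bullet, a minimal sketch of emitting a counter in the StatsD plain-text line format (`bucket:value|c`, with an optional `|@rate` sample rate) over UDP. The bucket name `events.ingested` and both helper functions are hypothetical; 8125 is the conventional StatsD port, and the host is assumed to be local.

```python
import socket


def format_counter(bucket: str, value: int = 1, sample_rate: float = 1.0) -> bytes:
    """Build one StatsD counter line, e.g. b'events.ingested:3|c'."""
    line = f"{bucket}:{value}|c"
    if sample_rate < 1.0:
        # Tell the server that only a fraction of events is being reported.
        line += f"|@{sample_rate}"
    return line.encode("ascii")


def send_metric(payload: bytes, host: str = "127.0.0.1", port: int = 8125) -> None:
    """Fire-and-forget UDP send; assumes a StatsD daemon listens on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))


packet = format_counter("events.ingested", 3)
# send_metric(packet)  # uncomment when a StatsD daemon is running
```

UDP keeps instrumentation cheap: the application never blocks on the metrics pipeline, at the cost of occasionally dropped samples.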

Page 29: The Big Data Revolution is an Evolution

Summary

• Solutions Require Evolution

• Build, Use, and Integrate Tools

• Abstraction

• Distribution

• Monitoring & Automation

Page 30: The Big Data Revolution is an Evolution

A revolution only lasts fifteen years, a period which coincides with the

Evolution Takes Time

Page 31: The Big Data Revolution is an Evolution

We’re (Ask us about Food Coma Fridays)

Page 32: The Big Data Revolution is an Evolution

Questions are guaranteed in life. Answers aren’t.

Eric Lubow

@elubow

Thank you.