MapR's Hadoop Distribution
Who am I?
• Keys Botzum • [email protected] • Senior Principal Technologist, MapR Technologies • MapR Federal and Eastern Region
http://www.mapr.com/company/events/speaking/pdb-10-16-12
Agenda
• What’s a Hadoop? • What’s MapR? • Enterprise Grade Hadoop • Making Hadoop More Open
Hadoop in 15 minutes
How to Scale? Big Data has Big Problems • Petabytes of data • MTBF on 1000s of nodes is < 1 day • Something is always broken • There are limits to scaling Big Iron • Sequential and random access just don’t scale
Example: Update 1% of 1TB
• Data consists of 10^10 records, each 100 bytes • Task: Update 1% of these records
Approach 1: Just Do It
• Each update involves read, modify and write • t = 1 seek + 2 disk rotations = 20 ms • 1% × 10^10 × 20 ms = 2 megaseconds ≈ 23 days
• Total time dominated by seek and rotation times
Approach 2: The “Hard” Way
• Copy the entire database 1GB at a time • Update records on the fly
• t = 2 × 1 GB / 100 MB/s + 20 ms ≈ 20 s • 10^3 × 20 s = 20,000 s ≈ 5.6 hours
• 100x faster to do 100x more work! • Moral: Read data sequentially even if you only want 1% of it
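Putting the two approaches side by side under the same assumptions (10^10 records of 100 bytes, roughly 20 ms per random access, roughly 100 MB/s sequential throughput):

    t_{\mathrm{random}} \approx 0.01 \times 10^{10} \times 20\,\mathrm{ms} = 2 \times 10^{6}\,\mathrm{s} \approx 23\ \mathrm{days}
    t_{\mathrm{sequential}} \approx 10^{3} \times \left( \frac{2 \times 1\,\mathrm{GB}}{100\,\mathrm{MB/s}} + 20\,\mathrm{ms} \right) \approx 2 \times 10^{4}\,\mathrm{s} \approx 5.6\ \mathrm{hours}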
MapReduce: A Paradigm Shift
• Distributed computing platform • Large clusters • Commodity hardware
• Pioneered at Google • BigTable, MapReduce and Google File System
• Commercially available as Hadoop
Hadoop
• Commodity hardware – thousands of nodes
• Handles Big Data – petabytes and more
• Sequential file access – each spindle provides data as fast as possible
• Sharding
  • Data distributed evenly across cluster
  • More spindles and CPUs working on different parts of the data set
• Reliability – self-healing (mostly), self-balancing
• MapReduce
  • Parallel computing framework
  • Function shipping
    § Moves the computation to the data rather than the typical reverse
    § Takes into account sharding
  • Hides most of the complexity from developers
Inside Map-Reduce
[Dataflow: Input → Map → Shuffle and sort → Reduce → Output]
Input: "The time has come," the Walrus said, "To talk of many things: Of shoes—and ships—and sealing-wax"
Map output: (the, 1), (time, 1), (has, 1), (come, 1), …
After shuffle and sort: (come, [3,2,1]), (has, [1,5,2]), (the, [1,2,1]), (time, [10,1,3]), …
Reduce output: (come, 6), (has, 8), (the, 4), (time, 14), …
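The same word-count dataflow expressed against the standard Hadoop MapReduce API (a minimal sketch; the class name and the input/output paths passed on the command line are illustrative):

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {
      // Map: emit (word, 1) for every word in the input split
      public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();
        protected void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
          for (String token : value.toString().split("\\s+")) {
            if (!token.isEmpty()) {
              word.set(token);
              context.write(word, ONE);
            }
          }
        }
      }

      // Reduce: the framework has already grouped the 1s by word (shuffle and sort);
      // sum them to get the final count
      public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
          int sum = 0;
          for (IntWritable v : values) sum += v.get();
          context.write(key, new IntWritable(sum));
        }
      }

      public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(SumReducer.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // input directory
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // output directory (must not exist)
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }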
Agenda
• What’s a Hadoop? • What’s MapR? • Enterprise Grade Hadoop • Making Hadoop More Open
The MapR Distribution for Apache Hadoop
• Commercial Hadoop Distribution
• Open, enterprise-grade distribution
  • Primarily leveraging open source components
  • Carefully targeted enhancements to make Hadoop more open and enterprise-grade
• Growing fast and a recognized leader
MapR in the Cloud
• Available as a service with Amazon Elastic MapReduce (EMR)
  • http://aws.amazon.com/elasticmapreduce/mapr
• Available as a service with Google Compute Engine
MapR Partners
Agenda
• What’s a Hadoop? • What’s MapR? • Enterprise Grade Hadoop • Making Hadoop More Open
MapR’s Complete Distribution for Apache Hadoop
• Integrated, tested, hardened and supported
• Integrated with Accumulo
• Runs on commodity hardware
• Open source with standards-based extensions for:
  • Security
  • File-based access
  • SQL-based access
• Easiest integration
• High availability
• Best performance
[Component diagram: MapR Control System; MapR Heatmap™; LDAP/NIS integration; quotas, alerts, alarms; CLI and REST API; Hive, Pig, Oozie, Sqoop, HBase, Whirr, Mahout, Cascading, Flume, ZooKeeper; Nagios and Ganglia integration; Direct Access NFS™; real-time streaming; volumes, mirrors, snapshots, data placement; No NameNode architecture; high-performance direct shuffle; stateful failover and self-healing; MapR Storage Services™; Accumulo]
Easy Management at Scale
• Health Monitoring
• Cluster Administration
• Application Resource Provisioning
Same information and tasks available via command line and REST
MapR: Lights Out Data Center Ready
Reliable Compute
• Automated stateful failover
• Automated re-replication
• Self-healing from HW and SW failures
• Load balancing
• Rolling upgrades
• No lost jobs or data
• 99.999% uptime

Dependable Storage
§ Business continuity with snapshots and mirrors
§ Recover to a point in time
§ End-to-end checksumming
§ Strong consistency
§ Built-in compression
§ Mirror across sites to meet Recovery Time Objectives
Storage Architecture
§ How does MapR manage storage and how is this different from generic Hadoop?
What is a Volume?
§ Like a sub-directory
  § Groups related dirs/files together
§ Contains file metadata for this volume
§ Mounted to form a global namespace
§ Logical unit of policy
Volumes help you manage data
Typical Volume Layout
Create lots of volumes, 100K volumes OK!
[Diagram: example volume layout – volumes mounted at /, /binaries, /var/mapr (local volumes), /projects (/build, /test), /hbase, and /users (/mjones, /jsmith)]
Volumes Let You Manage Data
§ Replication factor
§ Quotas
§ Load balancing
§ Snapshots
§ Mirrors
§ Data placement
§ Made of containers
  § A container is the sharding unit
  § 16-32 GB
Storage Architecture
§ Nodes
§ Disks
§ Storage Pools
§ Containers
  – Distributed across cluster
  – 16-32 GB
§ Volumes
No NameNode Architecture: Other Hadoop Distributions vs. MapR

Other Hadoop distributions:
§ HA requires specialized hardware and/or software
§ File scalability hampered by the NameNode bottleneck
§ Metadata must fit in memory

MapR:
§ HA with automatic failover and re-replication
§ Up to 1T files (> 5000x advantage)
§ Higher performance
§ 100% commodity hardware
§ Metadata is persisted to disk

[Diagram: other distributions rely on NameNode servers (metadata on a shared NAS appliance) to track blocks A-F on DataNodes; MapR distributes both data and metadata as container replicas across all cluster nodes, with no NameNode]
MapR Snapshots
§ Snapshots without data duplication
§ Saves space by sharing blocks
§ Lightning fast
§ Zero performance loss on writing to original
§ Scheduled, or on-demand
§ Easy recovery by user

[Diagram: redirect-on-write for snapshots – snapshots 1, 2, and 3 share unchanged data blocks with the read/write original]
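Because recovery is user-driven, an "oops" can often be undone with plain file copying. A minimal sketch, assuming the cluster is NFS-mounted under /mapr and that snapshots are exposed under a volume's .snapshot directory (the cluster, volume, snapshot, and file names below are illustrative, not from these slides):

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.nio.file.StandardCopyOption;

    public class RestoreFromSnapshot {
      public static void main(String[] args) throws IOException {
        // Read-only copy of the file as it existed when the snapshot was taken
        Path snapshotCopy = Paths.get(
            "/mapr/my.cluster.com/projects/test/.snapshot/nightly-2012-10-15/report.csv");
        Path live = Paths.get("/mapr/my.cluster.com/projects/test/report.csv");
        // Copy the snapshotted version back over the damaged live file
        Files.copy(snapshotCopy, live, StandardCopyOption.REPLACE_EXISTING);
      }
    }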
MapR Mirroring/COOP Requirements
Business Continuity and Efficiency
Efficient design
§ Differential deltas are updated
§ Compressed and checksummed
Easy to manage
§ Scheduled or on-demand
§ WAN, remote seeding
§ Consistent point-in-time

[Diagram: volumes mirrored over the WAN from a production datacenter to a research datacenter and to the cloud (Google Compute Engine)]
Thought Questions • Consider a cluster with
• Petabytes of data • Hundreds or thousands of jobs running each day, creating new data • Many users and teams all using this cluster
• How do I back this up? • User “oops” protection
• How do I replicate data from one cluster to another in support of disaster recovery? • Protection from power outages, floods, fire, etc
Designed for Performance and Scale (MapR vs. Apache/CDH)

Terasort w/ 1x replication (no compression):
  Total:   MapR 24 min 34 sec | Apache/CDH 49 min 33 sec
  Map:     MapR 9 min 54 sec  | Apache/CDH 28 min 12 sec
  Shuffle: MapR 9 min 8 sec   | Apache/CDH 27 min 0 sec

Terasort w/ 3x replication (no compression):
  Total:   MapR 47 min 4 sec  | Apache/CDH 73 min 42 sec
  Map:     MapR 11 min 2 sec  | Apache/CDH 30 min 8 sec
  Shuffle: MapR 9 min 17 sec  | Apache/CDH 28 min 40 sec

DFSIO / local write:
  Throughput/node: MapR 870 MB/s | Apache/CDH 240 MB/s

YCSB (HBase benchmark, 50% read / 50% update):
  Throughput:    MapR 33,102 ops/sec | Apache/CDH 7,904 ops/sec
  Latency (r/u): MapR 2.9-4 ms / 0.4 ms | Apache/CDH 7-30 ms / 0-5 ms

YCSB (HBase benchmark, 95% read / 5% update):
  Throughput:    MapR 18K ops/sec | Apache/CDH 8,500 ops/sec
  Latency (r/u): MapR 5.5-5.7 ms / 0.6 ms | Apache/CDH 12-30 ms / 1 ms

HW: 10 servers, 2 x 4 cores (2.4 GHz), 11 x 2 TB disks, 32 GB RAM
Large Web 2.0 company
§ 1.4 PB user data
§ 900-1200 MapReduce jobs per day
§ 16 TB/day average IO through each server
§ 85-90% storage utilization (with snapshots)
§ Very low-end hardware (consumer drives)
§ 6B files on a single cluster (+ 3x replication)
§ 2000 servers targeted
§ No degradation during hardware failures
§ Heavy read/write/delete workload
§ 1.7K creates/sec/node

Response time (write/read/delete):
  Atomic workload: 7.8 / 4.5 / 8.7 ms
  Mixed workload:  6.6 / 4.9 / 9.1 ms
Customer Support
• 24x7x365 "Follow-The-Sun" coverage
• Critical customer issues are worked on around the clock
• Dedicated team of Hadoop engineering experts
• Contacting MapR support:
  • Email: [email protected] (automatically opens a case)
  • Phone: 1.855.669.6277
  • Self-service options:
    § http://answers.mapr.com/
    § Web portal: http://mapr.com/support
Two MapR Editions – M3 and M5
M5 Edition:
§ Control System
§ NFS Access
§ Performance
§ High Availability
§ Snapshots & Mirroring
§ 24 x 7 Support
§ Annual Subscription

M3 Edition:
§ Control System
§ NFS Access
§ Performance
§ Unlimited Nodes
§ Free
Also available through Amazon Elastic MapReduce and Google Compute Engine
Agenda
• What’s a Hadoop? • What’s MapR? • Enterprise Grade Hadoop • Making Hadoop More Open
Not All Applications Use the Hadoop APIs
• Applications and libraries that use files and/or SQL
  • These are not legacy applications, they are valuable applications
  • 30 years, 100,000s of applications, 10,000s of libraries, 10s of programming languages
• Applications and libraries that use the Hadoop APIs
Hadoop Needs Industry-Standard Interfaces
• Hadoop API: MapReduce and HBase applications, mostly custom-built
• NFS: file-based applications, supported by most operating systems
• ODBC: SQL-based tools, supported by most BI applications and query builders
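To make the "SQL-based tools" point concrete, here is a hedged sketch of a plain JDBC client querying Hive (which ships in the distribution); BI tools reach the same data over ODBC in the same spirit. The host, port, and clicks table are made-up examples, not something from these slides:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class SqlAccessDemo {
      public static void main(String[] args) throws Exception {
        // Illustrative only: host/port/database/table are assumptions
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection conn = DriverManager.getConnection("jdbc:hive2://clusternode:10000/default");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(
                 "SELECT page, COUNT(*) AS hits FROM clicks GROUP BY page")) {
          while (rs.next()) {
            // Each row comes back through standard JDBC, exactly as a BI tool would consume it
            System.out.println(rs.getString("page") + "\t" + rs.getLong("hits"));
          }
        }
      }
    }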
NFS
Your Data is Important
§ HDFS-based Hadoop distributions do not (cannot) properly support NFS
§ Your data is important, it drives your business – make sure you can access it
  – Why store your data in a system which cannot be accessed by 95% of the world's applications and libraries?
Direct Access NFS™
[Diagram: file browsers access the cluster directly ("drag & drop"); applications do random reads and random writes and log directly into the cluster; standard Linux commands and tools (grep, sed, sort, tar) work on cluster data]
The NFS Protocol
§ RFC 1813
§ Very simple protocol
§ Random reads/writes
  – Read count bytes from offset offset of file file
  – Write buffer data to offset offset of file file
§ HDFS does not support random writes, so it cannot support NFS

    WRITE3res NFSPROC3_WRITE(WRITE3args) = 7;
    struct WRITE3args {
        nfs_fh3     file;
        offset3     offset;
        count3      count;
        stable_how  stable;
        opaque      data<>;
    };

    READ3res NFSPROC3_READ(READ3args) = 6;
    struct READ3args {
        nfs_fh3  file;
        offset3  offset;
        count3   count;
    };
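Because MapR supports random writes, a file in the cluster can be updated in place with ordinary file I/O through the NFS mount. A minimal sketch (the /mapr/my.cluster.com mount point, file path, and record layout are assumptions for illustration):

    import java.io.RandomAccessFile;

    public class RandomUpdate {
      public static void main(String[] args) throws Exception {
        // Update the 4200th 100-byte record of a file that lives in the cluster,
        // accessed through the NFS mount like any local file (path is illustrative)
        try (RandomAccessFile f =
                 new RandomAccessFile("/mapr/my.cluster.com/projects/test/records.dat", "rw")) {
          long offset = 4200L * 100;
          byte[] record = new byte[100];
          f.seek(offset);
          f.readFully(record);   // random read
          record[0] = 1;         // modify the record in place
          f.seek(offset);
          f.write(record);       // random write -- the operation plain HDFS cannot support
        }
      }
    }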
Hadoop Was Designed to Support Multiple Storage Layers
MapReduce runs on top of the Hadoop FileSystem API (the o.a.h.fs.FileSystem interface), with pluggable implementations:
• HDFS – o.a.h.hdfs.DistributedFileSystem
• S3 – o.a.h.fs.s3native.NativeS3FileSystem
• Local file system – o.a.h.fs.LocalFileSystem
• FTP – o.a.h.fs.ftp.FTPFileSystem
• MapR storage layer – com.mapr.fs.MapRFileSystem (also exposed through an NFS interface)
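A hedged sketch of what that interface boundary looks like to application code (the maprfs:/// path below is illustrative; any scheme with a registered FileSystem implementation behaves the same way):

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class FsApiDemo {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // The URI scheme selects the implementation behind o.a.h.fs.FileSystem:
        // hdfs:// -> DistributedFileSystem, file:// -> LocalFileSystem,
        // maprfs:// -> com.mapr.fs.MapRFileSystem, and so on.
        Path path = new Path(args.length > 0 ? args[0] : "maprfs:///user/jsmith/data.txt");
        FileSystem fs = path.getFileSystem(conf);
        try (BufferedReader in = new BufferedReader(new InputStreamReader(fs.open(path)))) {
          String line;
          while ((line = in.readLine()) != null) {
            System.out.println(line);   // application code never sees which storage layer is underneath
          }
        }
      }
    }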
One NFS Gateway
What about scalability and high availability?
Multiple NFS Gateways
Multiple NFS Gateways with Load Balancing
Multiple NFS Gateways with NFS HA (VIPs)
Customer Examples: Import/Export Data • Network security vendor
• Network packet captures from switches are streamed into the cluster • New pattern definitions are loaded into online IPS via NFS
• Online measurement company • Clickstreams from application servers are streamed into the cluster
• SaaS company • Exporting a database to Hadoop over NFS
• Ad exchange • Bids and transactions are streamed into the cluster
Customer Examples: Productivity and Operations
• Retailer
  • Operational scripts are easier with NFS than HDFS + MapReduce
    § chmod/chown, file system searches/greps, perl, awk, tab-complete
  • Consolidate object store with analytics
• Credit card company • User and project home directories on Linux gateways
§ Local files, scripts, source code, … § Administrators manage quotas, snapshots/backups, …
• Large Internet company recommendation system
  • Web servers serve MapReduce results (item relationships) directly from the cluster
• Email marketing company • Object store with HBase and NFS
Apache Drill Interactive Analysis of Large-Scale Datasets
Latency Matters
• Ad-hoc analysis with interactive tools
• Real-time dashboards
• Event/trend detection and analysis • Network intrusion analysis on the fly • Fraud • Failure detection and analysis
Big Data Processing

                      Batch processing     Interactive analysis       Stream processing
Query runtime         Minutes to hours     Milliseconds to minutes    Never-ending
Data volume           TBs to PBs           GBs to PBs                 Continuous stream
Programming model     MapReduce            Queries                    DAG
Users                 Developers           Analysts and developers    Developers
Google project        MapReduce            Dremel                     –
Open source project   Hadoop MapReduce     –                          Storm and S4
Introducing Apache Drill…
Innovations
• MapReduce
  • Scalable IO and compute trumps efficiency with today's commodity hardware
  • With large datasets, schemas and indexes are too limiting
  • Flexibility is more important than efficiency
  • An easy-to-use, scalable, fault-tolerant execution framework is key for large clusters
• Dremel
  • Columnar storage provides significant performance benefits at scale
  • Columnar storage with nesting preserves structure and can be very efficient
  • Avoiding final record assembly as long as possible improves efficiency
  • Optimizing for the query use case avoids the full generality of MapReduce and thus significantly reduces latency: no need to start JVMs, just push compact queries to running agents
• Apache Drill
  • Open source project based upon Dremel's ideas
  • More flexibility and openness
More Reading on Apache Drill
• MapR and Apache Drill: http://www.mapr.com/drill
• Apache Drill project page: http://incubator.apache.org/projects/drill.html
• Google's Dremel: http://research.google.com/pubs/pub36632.html
• Google's BigQuery: https://developers.google.com/bigquery/docs/query-reference
• MIT's C-Store – a columnar database: http://db.csail.mit.edu/projects/cstore/
• Microsoft's Dryad – distributed execution engine: http://research.microsoft.com/en-us/projects/dryad/
• Google's Protobufs: https://developers.google.com/protocol-buffers/docs/proto