TriHUG 3/14: HBase in Production

Aug 23, 2014

Talk by Michael Webster, software engineer at Bronto
Transcript
Page 1: TriHUG 3/14: HBase in Production

HBase In Production
Hey we're hiring!

Page 2: TriHUG 3/14: HBase in Production

Contents
● Bronto Overview
● HBase Architecture
● Operations
● Table Design
● Questions?

Page 3: TriHUG 3/14: HBase in Production

Bronto Overview
Bronto Software provides a cloud-based marketing platform for organizations to drive revenue through their email, mobile and social campaigns.

Page 4: TriHUG 3/14: HBase in Production

Bronto Contd.
● ESP for E-Commerce retailers
● Our customers are marketers
● Charts, graphs, reports
● Market segmentation
● Automation
● We are also hiring

Page 5: TriHUG 3/14: HBase in Production

Where We Use HBase
● High volume scenarios
● Realtime data
● Batch processing
● HDFS staging area
● Sorting/Indexing not a priority
  ○ We are working on this

Page 6: TriHUG 3/14: HBase in Production

HBase Overview
● Implementation of Google's BigTable
● Sparse, sorted, versioned map
● Built on top of HDFS
● Row level ACID
● Get, Put, Scan
● Assorted RMW operations
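As a rough illustration of these primitives, here is a minimal client sketch (assuming a recent HBase Java client; the table, family, and qualifier names are hypothetical):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseBasics {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("events"))) {

            // Put: writes to a single row are atomic, even across column families
            Put put = new Put(Bytes.toBytes("row-1"));
            put.addColumn(Bytes.toBytes("f"), Bytes.toBytes("payload"), Bytes.toBytes("hello"));
            table.put(put);

            // Get: point read of one row
            Result result = table.get(new Get(Bytes.toBytes("row-1")));
            byte[] payload = result.getValue(Bytes.toBytes("f"), Bytes.toBytes("payload"));

            // Scan: iterate a contiguous, sorted slice of the keyspace
            Scan scan = new Scan().withStartRow(Bytes.toBytes("row-1"))
                                  .withStopRow(Bytes.toBytes("row-9"));
            try (ResultScanner scanner = table.getScanner(scan)) {
                for (Result row : scanner) {
                    System.out.println(Bytes.toString(row.getRow()));
                }
            }

            // RMW: server-side atomic increment of a counter column
            table.incrementColumnValue(Bytes.toBytes("row-1"),
                Bytes.toBytes("f"), Bytes.toBytes("count"), 1L);
        }
    }
}
```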

Page 7: TriHUG 3/14: HBase in Production

Tables Overview
Tables are sorted (lexicographically) key-value pairs of uninterpreted byte[]s. The keyspace is divided up into regions of keys. Each region is hosted by exactly one machine.

Page 8: TriHUG 3/14: HBase in Production

Table Overview (diagram)

Key  Value
a    byte[]
aa   byte[]
b    byte[]
bb   byte[]
c    byte[]
ca   byte[]

Region assignments: R1: [a, b), R2: [b, c), R3: [c, d), each hosted by a single region server.

Page 9: TriHUG 3/14: HBase in Production

Operations
● Layers of complexity
● Normal failure modes
  ○ Hardware dies (or combusts)
  ○ Human error
● JVM
● HDFS considerations
● Lots of knobs

Page 10: TriHUG 3/14: HBase in Production

Cascading Failure
1. High write volume fragments heap
2. GC promotion failure
3. Stop the world GC
4. ZK timeout
5. Receive YouAreDeadException, die
6. Failover
7. Goto 1

Page 11: TriHUG 3/14: HBase in Production

Useful Tunings
● MSLAB enabled
● hbase.regionserver.handler.count
  ○ Increasing puts more IO load on RS
  ○ 50 is our sweet spot
● JVM tuning
  ○ UseConcMarkSweepGC
  ○ UseParNewGC
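As a hedged sketch, these tunings typically land in hbase-site.xml; the values below mirror the slide and are illustrative, not recommendations:

```xml
<!-- hbase-site.xml (illustrative values) -->
<property>
  <name>hbase.hregion.memstore.mslab.enabled</name>
  <value>true</value>   <!-- MSLAB reduces heap fragmentation under heavy writes -->
</property>
<property>
  <name>hbase.regionserver.handler.count</name>
  <value>50</value>     <!-- the "sweet spot" mentioned above; higher means more IO load on the RS -->
</property>
```

The GC flags go on the region server JVM, for example via HBASE_REGIONSERVER_OPTS in hbase-env.sh: -XX:+UseConcMarkSweepGC -XX:+UseParNewGC.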

Page 12: TriHUG 3/14: HBase in Production

Monitoring Tools
● Nagios for hardware checks
● Cloudera Manager
  ○ Reporting and health checks
  ○ Apache Ambari and MapR provide similar tools
● Hannibal + custom scripts
  ○ Identify hot regions for splitting

Page 13: TriHUG 3/14: HBase in Production

Table Design
● Table design is deceptively simple
● Main considerations:
  ○ Row key structure
  ○ Number of column families
● Know your queries in advance

Page 14: TriHUG 3/14: HBase in Production

Additional Context
● SaaS environment
  ○ "Twitter clone" model won't work
● Thousands of users, millions of attributes
● Skewed customer base
  ○ Biggest clients have 10MM+ contacts
  ○ Smallest have thousands

Page 15: TriHUG 3/14: HBase in Production

Row Keys
● Most important decision
● The only (native) index in HBase
● Random reads and writes are fast
  ○ Sorted on disk and in memory
  ○ Bloom filters speed read performance (not in use)

Page 16: TriHUG 3/14: HBase in Production

Hotspotting
● Associated with monotonically increasing keys
  ○ MySQL AUTO_INCREMENT
● Writes lock onto one region at a time
● Consequences:
  ○ Flush and compaction storms
  ○ $500K cluster limited by $10K machine

Page 17: TriHUG 3/14: HBase in Production

Row Key Advice
● Read/write ratio should drive design
  ○ We pay a write-time penalty for faster reads
● Identify queries you need to support
● Consider composite keys instead of indexes
● Bucketed/salted keys are an option (see the sketch below)
  ○ Distribute writes across N buckets
  ○ Rebucketing is difficult
  ○ Requires N reads, slow workers
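The talk doesn't show code for this; a minimal sketch of the bucketing idea, assuming a hypothetical 16-bucket scheme with a one-byte salt prefix:

```java
import java.util.Arrays;
import org.apache.hadoop.hbase.util.Bytes;

public class SaltedKeys {
    static final int NUM_BUCKETS = 16; // hypothetical bucket count

    /** Prefix the natural key with a bucket derived from its hash, spreading writes. */
    static byte[] saltedKey(byte[] naturalKey) {
        int bucket = (Arrays.hashCode(naturalKey) & 0x7fffffff) % NUM_BUCKETS;
        return Bytes.add(new byte[] { (byte) bucket }, naturalKey);
    }

    /** The cost: a read must fan out to N prefixes, one per bucket. */
    static byte[][] scanPrefixes(byte[] naturalKeyPrefix) {
        byte[][] prefixes = new byte[NUM_BUCKETS][];
        for (int b = 0; b < NUM_BUCKETS; b++) {
            prefixes[b] = Bytes.add(new byte[] { (byte) b }, naturalKeyPrefix);
        }
        return prefixes;
    }
}
```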

Page 18: TriHUG 3/14: HBase in Production

Variable Width Keys
customer_hash::email
● Allows scans for a single customer
● Hashed id distributes customers
● Sorted by email address
  ○ Could also use reverse domain for gmail, yahoo, etc.
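A rough sketch of how a customer_hash::email key might be built and scanned; the MD5 hash and "::" delimiter are assumptions for illustration, not the exact scheme from the talk:

```java
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.hbase.util.MD5Hash;

public class VariableWidthKeys {
    /** Row key: customer_hash::email, e.g. "a94f...::alice@example.com". */
    static byte[] rowKey(String customerId, String email) {
        String hash = MD5Hash.getMD5AsHex(customerId.getBytes(StandardCharsets.UTF_8));
        return Bytes.toBytes(hash + "::" + email);
    }

    /** Scan every contact for one customer: the hash prefix bounds the scan. */
    static Scan customerScan(String customerId) {
        String hash = MD5Hash.getMD5AsHex(customerId.getBytes(StandardCharsets.UTF_8));
        return new Scan().setRowPrefixFilter(Bytes.toBytes(hash + "::"));
    }
}
```

Because rows for one customer sort together by email, such a scan returns them in email order.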

Page 19: TriHUG 3/14: HBase in Production

Fixed Width Keys
site::contact::create::email
● FuzzyRowFilter
  ○ Can fix site, contact, and reverse_create
  ○ Can search for any email address
  ○ Could use a fixed width encoding for domain
    ■ Search for just gmail, yahoo, etc.
● Distributes sites and users
● Contacts sorted by create date
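A minimal FuzzyRowFilter sketch, assuming (hypothetically) a 4-byte site id, an 8-byte contact id, and an 8-byte reversed create timestamp; it fixes site and create date while wildcarding the contact bytes:

```java
import java.util.Arrays;
import java.util.Collections;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.FuzzyRowFilter;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.hbase.util.Pair;

public class FuzzyKeySketch {
    /** Find rows for one site and create date, for any contact id. */
    static Scan siteAndDateScan(int site, long reverseCreate) {
        // Key template: [site (4)][contact (8)][reverseCreate (8)]; contact bytes are placeholders.
        byte[] key = Bytes.add(Bytes.toBytes(site), new byte[8], Bytes.toBytes(reverseCreate));

        // Mask: 0 = byte must match the template, 1 = byte may be anything.
        byte[] mask = new byte[key.length];
        Arrays.fill(mask, 4, 12, (byte) 1); // wildcard the 8 contact-id bytes

        FuzzyRowFilter filter = new FuzzyRowFilter(
            Collections.singletonList(new Pair<byte[], byte[]>(key, mask)));
        return new Scan().setFilter(filter);
    }
}
```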

Page 20: TriHUG 3/14: HBase in Production

Column Families
● Groupings of named columns
● Versioning, compression, TTL
● Different than BigTable
  ○ BigTable: 100s
  ○ HBase: 1 or 2

Page 21: TriHUG 3/14: HBase in Production

Column Family Example

[Table: two column families, "d" {VERSIONS => 2} with qualifiers a (address) and p (phone), and "s7" {TTL => 604800} with qualifiers o:3-27 (open) and c:3-20 (click). Example rows (dfajkdh, hnvdzu9, er9asyjk) show sparse cells, two stored versions of hnvdzu9's address, and cells marked XXXX where older values have aged out.]

● PROTIP: Keep CF and qualifier names short
  ○ They are repeated on disk for every cell
● "d" supports 2 versions of each column, maps to demographics
● "s7" has seven day TTL, maps to stats kept for 7 days
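A hedged sketch of declaring these two families with the Java admin API (using the older, pre-2.0 HTableDescriptor/HColumnDescriptor style; the table name is hypothetical):

```java
import java.io.IOException;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;

public class CreateContactTable {
    static void create(Admin admin) throws IOException {
        HTableDescriptor table = new HTableDescriptor(TableName.valueOf("contacts"));

        // "d": demographics, keep the two most recent versions of each cell
        HColumnDescriptor d = new HColumnDescriptor("d");
        d.setMaxVersions(2);

        // "s7": stats retained for seven days (TTL is in seconds)
        HColumnDescriptor s7 = new HColumnDescriptor("s7");
        s7.setTimeToLive(7 * 24 * 60 * 60); // 604800

        table.addFamily(d);
        table.addFamily(s7);
        admin.createTable(table);
    }
}
```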

Page 22: TriHUG 3/14: HBase in Production

Column Families In Depth

[Diagram: a region of my_table (my_table,,1328551097416.12921bbc0c91869f88ba6a044a6a1c50.) with two column families, f1 and f2; each family has its own MemStore and its own StoreFiles (s1, s2, s3) in HDFS]

● StoreFile(s) for each CF in region
● Sparse
● One memstore per CF
  ○ Must flush together
● Compactions happen at region level

Page 23: TriHUG 3/14: HBase in Production

Compactions
● Rewrites StoreFiles
  ○ Improves read performance
  ○ IO intensive
● Region scope
● Used to take > 50 hours
● Custom script took it down to 18 (see the sketch below)
  ○ Can (theoretically) run during the day
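The script itself isn't shown in the talk; as a rough illustration of the idea, here is a sketch that walks a table's regions and major-compacts them one at a time (assuming a 2.x Admin API; the pacing logic is an assumption):

```java
import java.io.IOException;
import java.util.List;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.RegionInfo;

public class RollingMajorCompaction {
    /** Major-compact one region at a time, pausing between regions to limit IO. */
    static void compactTable(Admin admin, TableName table, long pauseMs)
            throws IOException, InterruptedException {
        List<RegionInfo> regions = admin.getRegions(table);
        for (RegionInfo region : regions) {
            // Compaction requests are asynchronous; a real script would also
            // watch the compaction queue before moving on to the next region.
            admin.majorCompactRegion(region.getRegionName());
            Thread.sleep(pauseMs); // crude pacing between regions
        }
    }
}
```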

Page 24: TriHUG 3/14: HBase in Production

Compaction Before and After

[Diagram: before compaction, family f1 of a region of my_table has StoreFiles s1 through s6 in HDFS; a k-way merge rewrites them into a single StoreFile S1 after compaction]

Page 25: TriHUG 3/14: HBase in Production

The Table From Hell
● 19 column families
● 60% of our region count
● Skewed write pattern
  ○ KB-size store files
  ○ Frequent compaction storms
  ○ hbase.hstore.compaction.min.size (HBASE-5461)
● Moved to its own cluster

Page 26: TriHUG 3/14: HBase in Production

And yet...
● Cluster remained operational
  ○ Table is still in use today
● Met read and write demand
● Regions only briefly active
  ○ Row keys by date and customer

Page 27: TriHUG 3/14: HBase in Production

What saved us
● Keyed by customer and date
● Effectively write once
  ○ Kept "active" region count low
● Custom compaction script
  ○ Skipped old regions
● More hardware
● Were able to selectively migrate

Page 28: TriHUG 3/14: HBase in Production

Column Family Advice
● Bad choice for fine-grained partitioning
● Good for
  ○ Similarly typed data
  ○ Varying versioning/retention requirements
● Prefer intra-row scans (see the sketch below)
  ○ CF and qualifiers are sorted
  ○ ColumnRangeFilter
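A minimal sketch of an intra-row column scan with ColumnRangeFilter (assuming a 2.x client; the row key and qualifier bounds are hypothetical):

```java
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.ColumnRangeFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class IntraRowScan {
    /** Scan one wide row, returning only qualifiers in [start, stop). */
    static Scan columnsBetween(byte[] rowKey, String startQualifier, String stopQualifier) {
        return new Scan()
            .withStartRow(rowKey)
            .withStopRow(rowKey, true) // restrict the scan to a single row
            .setFilter(new ColumnRangeFilter(
                Bytes.toBytes(startQualifier), true,    // inclusive lower bound
                Bytes.toBytes(stopQualifier), false));  // exclusive upper bound
    }
}
```

Because qualifiers within a row are stored sorted, this kind of range read stays cheap even for very wide rows.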

Page 29: TriHUG 3/14: HBase in Production

Questions?