TriHUG 3/14: HBase in Production

Aug 23, 2014

Talk by Michael Webster, software engineer at Bronto
Transcript
Page 1: TriHUG 3/14: HBase in Production

HBase In Production
Hey we're hiring!

Page 2: TriHUG 3/14: HBase in Production

Contents
● Bronto Overview
● HBase Architecture
● Operations
● Table Design
● Questions?

Page 3: TriHUG 3/14: HBase in Production

Bronto Overview
Bronto Software provides a cloud-based marketing platform for organizations to drive revenue through their email, mobile and social campaigns.

Page 4: TriHUG 3/14: HBase in Production

Bronto Contd.
● ESP for E-Commerce retailers
● Our customers are marketers
● Charts, graphs, reports
● Market segmentation
● Automation
● We are also hiring

Page 5: TriHUG 3/14: HBase in Production

Where We Use HBase
● High volume scenarios
● Realtime data
● Batch processing
● HDFS staging area
● Sorting/Indexing not a priority
  ○ We are working on this

Page 6: TriHUG 3/14: HBase in Production

HBase Overview
● Implementation of Google's BigTable
● Sparse, sorted, versioned map
● Built on top of HDFS
● Row level ACID
● Get, Put, Scan
● Assorted RMW operations
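As a rough illustration of these primitives, here is a minimal client sketch (assuming a recent HBase Java client; the table, family, and qualifier names are hypothetical):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseBasics {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("events"))) {

            // Put: writes to a single row are atomic, even across column families
            Put put = new Put(Bytes.toBytes("row-1"));
            put.addColumn(Bytes.toBytes("f"), Bytes.toBytes("payload"), Bytes.toBytes("hello"));
            table.put(put);

            // Get: point read of one row
            Result result = table.get(new Get(Bytes.toBytes("row-1")));
            byte[] payload = result.getValue(Bytes.toBytes("f"), Bytes.toBytes("payload"));

            // Scan: iterate a contiguous, sorted slice of the keyspace
            Scan scan = new Scan().withStartRow(Bytes.toBytes("row-1"))
                                  .withStopRow(Bytes.toBytes("row-9"));
            try (ResultScanner scanner = table.getScanner(scan)) {
                for (Result row : scanner) {
                    System.out.println(Bytes.toString(row.getRow()));
                }
            }

            // RMW: server-side atomic increment of a counter column
            table.incrementColumnValue(Bytes.toBytes("row-1"),
                Bytes.toBytes("f"), Bytes.toBytes("count"), 1L);
        }
    }
}
```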

Page 7: TriHUG 3/14: HBase in Production

Tables Overview
Tables are sorted (lexicographically) key-value pairs of uninterpreted byte[]s. The keyspace is divided up into regions of keys. Each region is hosted by exactly one machine.

Page 8: TriHUG 3/14: HBase in Production

Table Overview (diagram)

Key  Value
a    byte[]
aa   byte[]
b    byte[]
bb   byte[]
c    byte[]
ca   byte[]

Region assignments: R1: [a, b), R2: [b, c), R3: [c, d), each hosted by a single region server.

Page 9: TriHUG 3/14: HBase in Production

Operations
● Layers of complexity
● Normal failure modes
  ○ Hardware dies (or combusts)
  ○ Human error
● JVM
● HDFS considerations
● Lots of knobs

Page 10: TriHUG 3/14: HBase in Production

Cascading Failure
1. High write volume fragments heap
2. GC promotion failure
3. Stop the world GC
4. ZK timeout
5. Receive YouAreDeadException, die
6. Failover
7. Goto 1

Page 11: TriHUG 3/14: HBase in Production

Useful Tunings
● MSLAB enabled
● hbase.regionserver.handler.count
  ○ Increasing puts more IO load on RS
  ○ 50 is our sweet spot
● JVM tuning
  ○ UseConcMarkSweepGC
  ○ UseParNewGC
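As a hedged sketch, these tunings typically land in hbase-site.xml; the values below mirror the slide and are illustrative, not recommendations:

```xml
<!-- hbase-site.xml (illustrative values) -->
<property>
  <name>hbase.hregion.memstore.mslab.enabled</name>
  <value>true</value>   <!-- MSLAB reduces heap fragmentation under heavy writes -->
</property>
<property>
  <name>hbase.regionserver.handler.count</name>
  <value>50</value>     <!-- the "sweet spot" mentioned above; higher means more IO load on the RS -->
</property>
```

The GC flags go on the region server JVM, for example via HBASE_REGIONSERVER_OPTS in hbase-env.sh: -XX:+UseConcMarkSweepGC -XX:+UseParNewGC.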

Page 12: TriHUG 3/14: HBase in Production

Monitoring Tools
● Nagios for hardware checks
● Cloudera Manager
  ○ Reporting and health checks
  ○ Apache Ambari and MapR provide similar tools
● Hannibal + custom scripts
  ○ Identify hot regions for splitting

Page 13: TriHUG 3/14: HBase in Production

Table Design
● Table design is deceptively simple
● Main considerations:
  ○ Row key structure
  ○ Number of column families
● Know your queries in advance

Page 14: TriHUG 3/14: HBase in Production

Additional Context
● SaaS environment
  ○ "Twitter clone" model won't work
● Thousands of users, millions of attributes
● Skewed customer base
  ○ Biggest clients have 10MM+ contacts
  ○ Smallest have thousands

Page 15: TriHUG 3/14: HBase in Production

Row Keys
● Most important decision
● The only (native) index in HBase
● Random reads and writes are fast
  ○ Sorted on disk and in memory
  ○ Bloom filters speed read performance (not in use)

Page 16: TriHUG 3/14: HBase in Production

Hotspotting
● Associated with monotonically increasing keys
  ○ MySQL AUTO_INCREMENT
● Writes lock onto one region at a time
● Consequences:
  ○ Flush and compaction storms
  ○ $500K cluster limited by $10K machine

Page 17: TriHUG 3/14: HBase in Production

Row Key Advice
● Read/write ratio should drive design
  ○ We pay a write-time penalty for faster reads
● Identify queries you need to support
● Consider composite keys instead of indexes
● Bucketed/salted keys are an option (see the sketch below)
  ○ Distribute writes across N buckets
  ○ Rebucketing is difficult
  ○ Requires N reads, slow workers
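The talk doesn't show code for this; a minimal sketch of the bucketing idea, assuming a hypothetical 16-bucket scheme with a one-byte salt prefix:

```java
import java.util.Arrays;
import org.apache.hadoop.hbase.util.Bytes;

public class SaltedKeys {
    static final int NUM_BUCKETS = 16; // hypothetical bucket count

    /** Prefix the natural key with a bucket derived from its hash, spreading writes. */
    static byte[] saltedKey(byte[] naturalKey) {
        int bucket = (Arrays.hashCode(naturalKey) & 0x7fffffff) % NUM_BUCKETS;
        return Bytes.add(new byte[] { (byte) bucket }, naturalKey);
    }

    /** The cost: a read must fan out to N prefixes, one per bucket. */
    static byte[][] scanPrefixes(byte[] naturalKeyPrefix) {
        byte[][] prefixes = new byte[NUM_BUCKETS][];
        for (int b = 0; b < NUM_BUCKETS; b++) {
            prefixes[b] = Bytes.add(new byte[] { (byte) b }, naturalKeyPrefix);
        }
        return prefixes;
    }
}
```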

Page 18: TriHUG 3/14: HBase in Production

Variable Width Keys
customer_hash::email
● Allows scans for a single customer
● Hashed id distributes customers
● Sorted by email address
  ○ Could also use reverse domain for gmail, yahoo, etc.
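A rough sketch of how a customer_hash::email key might be built and scanned; the MD5 hash and "::" delimiter are assumptions for illustration, not the exact scheme from the talk:

```java
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.hbase.util.MD5Hash;

public class VariableWidthKeys {
    /** Row key: customer_hash::email, e.g. "a94f...::alice@example.com". */
    static byte[] rowKey(String customerId, String email) {
        String hash = MD5Hash.getMD5AsHex(customerId.getBytes(StandardCharsets.UTF_8));
        return Bytes.toBytes(hash + "::" + email);
    }

    /** Scan every contact for one customer: the hash prefix bounds the scan. */
    static Scan customerScan(String customerId) {
        String hash = MD5Hash.getMD5AsHex(customerId.getBytes(StandardCharsets.UTF_8));
        return new Scan().setRowPrefixFilter(Bytes.toBytes(hash + "::"));
    }
}
```

Because rows for one customer sort together by email, such a scan returns them in email order.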

Page 19: TriHUG 3/14: HBase in Production

Fixed Width Keys
site::contact::create::email
● FuzzyRowFilter
  ○ Can fix site, contact, and reverse_create
  ○ Can search for any email address
  ○ Could use a fixed width encoding for domain
    ■ Search for just gmail, yahoo, etc.
● Distributes sites and users
● Contacts sorted by create date
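A minimal FuzzyRowFilter sketch, assuming (hypothetically) a 4-byte site id, an 8-byte contact id, and an 8-byte reversed create timestamp; it fixes site and create date while wildcarding the contact bytes:

```java
import java.util.Arrays;
import java.util.Collections;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.FuzzyRowFilter;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.hbase.util.Pair;

public class FuzzyKeySketch {
    /** Find rows for one site and create date, for any contact id. */
    static Scan siteAndDateScan(int site, long reverseCreate) {
        // Key template: [site (4)][contact (8)][reverseCreate (8)]; contact bytes are placeholders.
        byte[] key = Bytes.add(Bytes.toBytes(site), new byte[8], Bytes.toBytes(reverseCreate));

        // Mask: 0 = byte must match the template, 1 = byte may be anything.
        byte[] mask = new byte[key.length];
        Arrays.fill(mask, 4, 12, (byte) 1); // wildcard the 8 contact-id bytes

        FuzzyRowFilter filter = new FuzzyRowFilter(
            Collections.singletonList(new Pair<byte[], byte[]>(key, mask)));
        return new Scan().setFilter(filter);
    }
}
```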

Page 20: TriHUG 3/14: HBase in Production

Column Families
● Groupings of named columns
● Versioning, compression, TTL
● Different than BigTable
  ○ BigTable: 100s
  ○ HBase: 1 or 2

Page 21: TriHUG 3/14: HBase in Production

Column Family Example

[Table: two column families, "d" {VERSIONS => 2} with qualifiers a (address) and p (phone), and "s7" {TTL => 604800} with qualifiers o:3-27 (open) and c:3-20 (click). Example rows (dfajkdh, hnvdzu9, er9asyjk) show sparse cells, two stored versions of hnvdzu9's address, and cells marked XXXX where older values have aged out.]

● PROTIP: Keep CF and qualifier names short
  ○ They are repeated on disk for every cell
● "d" supports 2 versions of each column, maps to demographics
● "s7" has seven day TTL, maps to stats kept for 7 days
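A hedged sketch of declaring these two families with the Java admin API (using the older, pre-2.0 HTableDescriptor/HColumnDescriptor style; the table name is hypothetical):

```java
import java.io.IOException;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;

public class CreateContactTable {
    static void create(Admin admin) throws IOException {
        HTableDescriptor table = new HTableDescriptor(TableName.valueOf("contacts"));

        // "d": demographics, keep the two most recent versions of each cell
        HColumnDescriptor d = new HColumnDescriptor("d");
        d.setMaxVersions(2);

        // "s7": stats retained for seven days (TTL is in seconds)
        HColumnDescriptor s7 = new HColumnDescriptor("s7");
        s7.setTimeToLive(7 * 24 * 60 * 60); // 604800

        table.addFamily(d);
        table.addFamily(s7);
        admin.createTable(table);
    }
}
```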

Page 22: TriHUG 3/14: HBase in Production

Column Families In Depth

[Diagram: a region of my_table (my_table,,1328551097416.12921bbc0c91869f88ba6a044a6a1c50.) with two column families, f1 and f2; each family has its own MemStore and its own StoreFiles (s1, s2, s3) in HDFS]

● StoreFile(s) for each CF in region
● Sparse
● One memstore per CF
  ○ Must flush together
● Compactions happen at region level

Page 23: TriHUG 3/14: HBase in Production

Compactions
● Rewrites StoreFiles
  ○ Improves read performance
  ○ IO intensive
● Region scope
● Used to take > 50 hours
● Custom script took it down to 18 (see the sketch below)
  ○ Can (theoretically) run during the day
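The script itself isn't shown in the talk; as a rough illustration of the idea, here is a sketch that walks a table's regions and major-compacts them one at a time (assuming a 2.x Admin API; the pacing logic is an assumption):

```java
import java.io.IOException;
import java.util.List;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.RegionInfo;

public class RollingMajorCompaction {
    /** Major-compact one region at a time, pausing between regions to limit IO. */
    static void compactTable(Admin admin, TableName table, long pauseMs)
            throws IOException, InterruptedException {
        List<RegionInfo> regions = admin.getRegions(table);
        for (RegionInfo region : regions) {
            // Compaction requests are asynchronous; a real script would also
            // watch the compaction queue before moving on to the next region.
            admin.majorCompactRegion(region.getRegionName());
            Thread.sleep(pauseMs); // crude pacing between regions
        }
    }
}
```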

Page 24: TriHUG 3/14: HBase in Production

Compaction Before and After

[Diagram: before compaction, family f1 of a region of my_table has StoreFiles s1 through s6 in HDFS; a k-way merge rewrites them into a single StoreFile S1 after compaction]

Page 25: TriHUG 3/14: HBase in Production

The Table From Hell
● 19 column families
● 60% of our region count
● Skewed write pattern
  ○ KB-size store files
  ○ Frequent compaction storms
  ○ hbase.hstore.compaction.min.size (HBASE-5461)
● Moved to its own cluster

Page 26: TriHUG 3/14: HBase in Production

And yet...
● Cluster remained operational
  ○ Table is still in use today
● Met read and write demand
● Regions only briefly active
  ○ Row keys by date and customer

Page 27: TriHUG 3/14: HBase in Production

What saved us
● Keyed by customer and date
● Effectively write once
  ○ Kept "active" region count low
● Custom compaction script
  ○ Skipped old regions
● More hardware
● Were able to selectively migrate

Page 28: TriHUG 3/14: HBase in Production

Column Family Advice
● Bad choice for fine-grained partitioning
● Good for
  ○ Similarly typed data
  ○ Varying versioning/retention requirements
● Prefer intra-row scans (see the sketch below)
  ○ CF and qualifiers are sorted
  ○ ColumnRangeFilter
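A minimal sketch of an intra-row column scan with ColumnRangeFilter (assuming a 2.x client; the row key and qualifier bounds are hypothetical):

```java
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.ColumnRangeFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class IntraRowScan {
    /** Scan one wide row, returning only qualifiers in [start, stop). */
    static Scan columnsBetween(byte[] rowKey, String startQualifier, String stopQualifier) {
        return new Scan()
            .withStartRow(rowKey)
            .withStopRow(rowKey, true) // restrict the scan to a single row
            .setFilter(new ColumnRangeFilter(
                Bytes.toBytes(startQualifier), true,    // inclusive lower bound
                Bytes.toBytes(stopQualifier), false));  // exclusive upper bound
    }
}
```

Because qualifiers within a row are stored sorted, this kind of range read stays cheap even for very wide rows.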

Page 29: TriHUG 3/14: HBase in Production

Questions?