8/14/2019 HBase @ Meetup
HBase @ Meetup
Gary Helmling, Lead SW Engineer
The Solution
  Show activity from all your groups in one place
    real-time updates
    better discovery of what's going on
    find new ways to participate and get to know your groups
Challenges
  Normalized schema
    Each type of activity requires querying a separate table
    Already wasn't scaling at the group level
  Query efficiency
    Activity occurs at group level
    Members can be in hundreds of groups
    For member home page we need activity from all groups ordered by most recent
      N subqueries by group ID, merged back by descending timestamp
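The "N subqueries merged back by descending timestamp" step can be pictured as a k-way merge. The following is a minimal sketch, assuming each per-group result list is already sorted newest-first; the class and field names (GroupActivityMerge, Activity, Cursor) are hypothetical and do not come from the talk.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration: merges per-group activity lists, each already
// sorted newest-first, into a single newest-first list.
final class GroupActivityMerge {

    // Minimal stand-in for an activity row (not a type from the talk).
    static final class Activity {
        final long groupId;
        final long timestamp;

        Activity(long groupId, long timestamp) {
            this.groupId = groupId;
            this.timestamp = timestamp;
        }
    }

    // Cursor over one group's result list.
    private static final class Cursor {
        final List<Activity> rows;
        int pos;

        Cursor(List<Activity> rows) { this.rows = rows; }

        Activity current() { return rows.get(pos); }
    }

    static List<Activity> mergeNewestFirst(List<List<Activity>> perGroup, int limit) {
        // Max-heap ordered by the timestamp of each cursor's current row.
        PriorityQueue<Cursor> heap = new PriorityQueue<>(
            Comparator.comparingLong((Cursor c) -> c.current().timestamp).reversed());
        for (List<Activity> rows : perGroup) {
            if (!rows.isEmpty()) {
                heap.add(new Cursor(rows));
            }
        }
        List<Activity> merged = new ArrayList<>();
        while (!heap.isEmpty() && merged.size() < limit) {
            Cursor c = heap.poll();
            merged.add(c.current());
            c.pos++;
            if (c.pos < c.rows.size()) {
                heap.add(c); // re-insert with its next row as the sort key
            }
        }
        return merged;
    }
}
```

Each page view pays for one subquery per group plus the merge, which is the cost the slide describes for members in hundreds of groups.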
Why HBase?
  We own infrastructure, no usage limits
  Data model
    Semi-structured data in HBase (easily handles multiple types in same table)
    Time-series ordered
  Scaling is built in (just add more servers)
    But extra indexing is DIY
  Very active developer community
  Established, mature project (in relative terms!)
  Matches our own toolset (Java/Linux based)
What is HBase? Data Storage
  Table
    Regions, defined by row [start key, end key)
      Store, 1 per family
        1+ Store Files (HFile format on HDFS)
  (table, rowkey, family, column, timestamp) = value
  Everything is byte[]
  Rows are ordered sequentially by key
  Special tables: -ROOT-, .META.
    Tell clients where to find user data
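To make the [start key, end key) region ranges and byte-ordered row keys concrete, here is a small in-memory model of a region lookup. It is only a sketch under assumptions: RegionLookupSketch, addRegion and serverFor are hypothetical names, and this is not how the HBase client is implemented; it only shows the idea of finding the region whose start key is the greatest one not exceeding the row key.

```java
import java.nio.charset.StandardCharsets;
import java.util.Comparator;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of a region lookup; not HBase client code.
final class RegionLookupSketch {

    // Unsigned lexicographic comparison of byte arrays, matching the
    // byte-wise ordering of row keys.
    static final Comparator<byte[]> BYTE_ORDER = (a, b) -> {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    };

    // Region start key -> server name. The empty key starts the first region.
    private final TreeMap<byte[], String> regionsByStartKey = new TreeMap<>(BYTE_ORDER);

    void addRegion(String startKey, String server) {
        regionsByStartKey.put(bytes(startKey), server);
    }

    // The region holding a row is the one with the greatest start key <= row key.
    String serverFor(String rowKey) {
        Map.Entry<byte[], String> entry = regionsByStartKey.floorEntry(bytes(rowKey));
        return entry == null ? null : entry.getValue();
    }

    static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }
}
```

With regions starting at "" and "ch5", the row "ch1261585" lands in the first region because the byte '1' sorts before '5', even though 1261585 is numerically larger than 5.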
HBase Architecture
  Courtesy of Lars George, from http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html
What is HBase? Data Access
  Random access (Gets)
    by rowkey only
  Sequential reads (Scans)
    starting row key
    ending row key (optional)
    server-side filter (optional)
    where you stop is as important as where you start
  Writes (Puts)
    No insert vs. update distinction
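The three access paths above behave much like operations on a sorted map. The sketch below models them with a TreeMap rather than the real HBase client, so SortedTableSketch and its methods are hypothetical; it assumes ASCII row keys, for which Java's String ordering matches byte ordering.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory model of get/scan/put semantics; not the HBase API.
final class SortedTableSketch {

    // Rows kept in key order. For ASCII keys, String order equals byte order.
    private final TreeMap<String, String> rows = new TreeMap<>();

    // A put is an upsert: there is no insert vs. update distinction.
    void put(String rowKey, String value) {
        rows.put(rowKey, value);
    }

    // Random access is by row key only.
    String get(String rowKey) {
        return rows.get(rowKey);
    }

    // Scan [startRow, stopRow). A null stopRow reads to the end of the table,
    // which is why choosing the stop key matters as much as the start key.
    List<String> scan(String startRow, String stopRow) {
        Map<String, String> range = (stopRow == null)
            ? rows.tailMap(startRow, true)
            : rows.subMap(startRow, true, stopRow, false);
        return new ArrayList<>(range.keySet());
    }
}
```

For example, scanning from "ch1-" to "ch1." returns only keys with the "ch1-" prefix, because '.' is the byte immediately after '-'; omitting the stop key would keep reading into the next group's rows.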
How It Works: Storing activity data in HBase
  FeedItem: stores activity data for all types
    keyed by group and descending timestamp
      ch<group ID>-ts<descending timestamp>-<item type>-<item ID>
    each row only contains data for that type

    Row Key              info:                          content:
    ch1261585-ts9223...  item_type = chapter_greeting   greeting = Hi, Gary
                         target_greeting = 8104438
    ch1261585-ts9223...  item_type = new_discussion     title = Improvements
                         target_forum = 847743          body = When a discussion is created...
                         target_thread = 7369603

  MemberFeedIndex: index of FeedItem rows from all of a member's groups
    one row per member (keyed by member ID)
    columns store refs to FeedItem row keys for that member's groups
    TTL of 2 months expires old index values

    Row Key  item:
    4679998  ch176399-ts9223370788400750807-mem-10044424 = new_member
             ch1261585-ts9223370787431124807-ptag-8525047 = photo_tag
             ...
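A descending-timestamp key of this shape can be built by subtracting the event time from Long.MAX_VALUE, so newer items sort first within a group. The helper below is a sketch, not Meetup's code: FeedItemKeySketch and its methods are hypothetical, and the use of epoch milliseconds is an assumption, although it is consistent with the example keys on the slides.

```java
// Hypothetical helper for building FeedItem-style row keys.
final class FeedItemKeySketch {

    // Larger (newer) timestamps map to smaller values, so they sort first.
    static long reverse(long epochMillis) {
        return Long.MAX_VALUE - epochMillis;
    }

    // Key layout assumed from the slide examples:
    // ch<group ID>-ts<reversed timestamp>-<item type>-<item ID>
    static String rowKey(long groupId, long epochMillis, String itemType, long itemId) {
        return "ch" + groupId + "-ts" + reverse(epochMillis) + "-" + itemType + "-" + itemId;
    }
}
```

The reversed value stays 19 digits wide for any realistic millisecond timestamp, so plain byte comparison of the keys agrees with numeric comparison of the timestamps.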
How it Works: Secondary index tables
  Still need to find rows by column values
    tried tableindexed contrib (0.19 release), high CPU usage & contention on scans
    decided to update to 0.20 release for other performance improvements
    built secondary indexing into app layer
  Separate table per indexed column
    FeedItem info:actor_member indexed by FeedItem-by_actor_member
  Index table rows keyed by column value and descending timestamp
    <column value>-<Long.MAX_VALUE - timestamp>-<row key>
  Zero pad numeric values (or big-endian representation) for correct byte ordering
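The reason for zero padding is that keys are compared byte by byte, so "9" would sort after "10" without it. The sketch below builds an index key from a numeric column value, a reversed timestamp and the indexed row's key. IndexKeySketch is a hypothetical name, and the 10-digit pad width is an assumption taken from the example keys in this deck.

```java
import java.util.Locale;

// Hypothetical helper for building secondary index row keys.
final class IndexKeySketch {

    // Layout: <zero-padded value>-<Long.MAX_VALUE - timestamp>-<indexed row key>
    static String indexKey(long columnValue, long epochMillis, String indexedRowKey) {
        return String.format(Locale.ROOT, "%010d-%d-%s",
            columnValue, Long.MAX_VALUE - epochMillis, indexedRowKey);
    }
}
```

A scan over the prefix for one padded value then returns that value's rows newest-first, and each index key ends with the row key needed to fetch the full record.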
How it Works: Secondary index tables
  ex. FeedItem-by_actor_member

    Row Key                                  info:                        __idx__:
    0002851766-9223370783553935005-<rowkey>  actor_member = 2851766       row = ch1143475-ts9223370783553935005-rsvp-54704795
                                             item_type = new_rsvp
                                             pub_date =
    0004679998-9223370783650851832-<rowkey>  actor_member = 4679998       row = ch1261585-ts9223370783650851832-disc-7369603
                                             item_type = new_discussion
                                             pub_date =

  indexes FeedItem

    Row Key                                        info:                        content:
    ch1143475-ts9223370783553935005-rsvp-54704795  actor_member = 2851766       comment = See you there
                                                   item_type = new_rsvp
                                                   pub_date =
    ch1261585-ts9223370783650851832-disc-7369603   actor_member = 4679998       title = Next month
                                                   item_type = new_discussion   body = ...
                                                   pub_date =
Interacting with HBase: Meetup.Beeno

    package com.meetup.feeds.db;
    ...
    @HEntity(name="FeedItem")
    public class FeedItem implements Externalizable {
        ...
        @HRowKey
        public String getId() { return this.id; }
        public void setId(String id) { this.id = id; }

        @HProperty(family="info", name="actor_member",
                   indexes = { @HIndex(date_col="info:pub_date", date_invert=true,
                                       extra_cols={"info:item_type"}) })
        public Integer getMemberId() { return this.memberId; }
        public void setMemberId(Integer id) { this.memberId = id; }
        ...
    }

  Java Beans mapped to HBase tables
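As a rough picture of how annotated getters can drive a bean-to-column mapping, the sketch below reads a stand-in annotation through reflection. It is an assumption-heavy illustration and not Beeno's implementation: the @Column annotation, the Item bean and toColumns are all hypothetical, chosen so they cannot be mistaken for the library's own @HProperty handling.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of annotation-driven mapping; not Beeno code.
final class AnnotationMappingSketch {

    // Stand-in annotation naming the column family and qualifier for a getter.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface Column {
        String family();
        String name();
    }

    // Stand-in bean with one mapped property.
    public static final class Item {
        private final Integer memberId;

        Item(Integer memberId) { this.memberId = memberId; }

        @Column(family = "info", name = "actor_member")
        public Integer getMemberId() { return memberId; }
    }

    // Collects "family:name" -> value for every annotated public getter.
    static Map<String, Object> toColumns(Object bean) {
        Map<String, Object> columns = new TreeMap<>();
        for (Method m : bean.getClass().getMethods()) {
            Column c = m.getAnnotation(Column.class);
            if (c == null) {
                continue; // unannotated methods are not mapped
            }
            try {
                columns.put(c.family() + ":" + c.name(), m.invoke(bean));
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("Cannot read " + m.getName(), e);
            }
        }
        return columns;
    }
}
```

A real mapper would also convert each value to byte[] and handle the row key and index declarations, which this sketch leaves out.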
Interacting with HBase: Services
  Base service class provides round-tripping based on annotations

    public class EntityService<T> {
        public T get( String rowKey ) throws HBaseException {}
        public void save( T entity ) throws HBaseException {}
        public void saveAll( List<T> entities ) throws HBaseException {}
        public void delete( String rowKey ) throws HBaseException {}
        public Query query() throws MappingException {}
    }

  easily extended for specific needs
  Almost all HBase interaction through service instances.
Interacting with HBase: Queries
  Find all items related to a discussion

    FeedItemService service = new FeedItemService(DiscussionItem.class);
    Query query =
        service.query()
               .using( Criteria.eq("threadId", threadId) );
    List items = query.execute();

  Find all greetings from a given member

    FeedItemService service = new FeedItemService(GreetingItem.class);
    Query query =
        service.query()
               .using( Criteria.eq("memberId", memberId) )
               .where( Criteria.eq("type", FeedItem.ItemType.CHAPTER_GREETING) );
    List items = query.execute();

  Simple Query API uses mappings and secondary index tables
Interacting with HBase: Member Feed Retrieval

    // retrieve the member's index record
    HTable mfiTable = HUtil.getTable("MemberFeedIndex");
    Get get = new Get( Bytes.toBytes(String.valueOf(memberId)) );
    get.addFamily( Bytes.toBytes("item") );
    Result r = mfiTable.get(get);

    FeedItemService service = new FeedItemService();
    Set<IndexKey> sortedKeys = sortKeys(r);
    List<FeedItem> items = new ArrayList<FeedItem>();

    // for each index col get the entity record
    for (IndexKey key : sortedKeys) {
        FeedItem item = service.get(key.getKey());
        if (item != null)
            items.add(item);
    }

    // populate member and chapter info

  Get latest activity from all a member's groups using MemberFeedIndex
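The slide does not show sortKeys. Because the index columns come from many groups, ordering them by the group prefix alone would not give a newest-first feed, so one plausible approach is to sort on the reversed timestamp embedded in each key. The sketch below does that on plain strings; SortKeysSketch and its methods are hypothetical, and it assumes keys of the form ch<group ID>-ts<reversed timestamp>-... as in the earlier examples.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for the sortKeys step; not the code from the talk.
final class SortKeysSketch {

    // Extracts the reversed timestamp from ch<group ID>-ts<reversed timestamp>-...
    static long reversedTimestamp(String feedItemKey) {
        int marker = feedItemKey.indexOf("-ts");
        int end = marker < 0 ? -1 : feedItemKey.indexOf('-', marker + 3);
        if (marker < 0 || end < 0) {
            throw new IllegalArgumentException("Unexpected key format: " + feedItemKey);
        }
        return Long.parseLong(feedItemKey.substring(marker + 3, end));
    }

    // Smaller reversed timestamps are newer, so ascending order is newest-first.
    static List<String> newestFirst(List<String> feedItemKeys) {
        List<String> sorted = new ArrayList<>(feedItemKeys);
        sorted.sort(Comparator.comparingLong(SortKeysSketch::reversedTimestamp));
        return sorted;
    }
}
```

Applied to the two MemberFeedIndex columns shown earlier, the photo_tag entry comes first because its reversed timestamp is the smaller of the two.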
HBase @ Meetup: Issues along the way
  Performance testing
    Product targeting 3 of our highest traffic pages, simulating load is hard
    Started with load scripts
    Moved to testing with live traffic
      Use AJAX calls to simulate requests
      Selective enable for X% of traffic
    Launched data collection/write traffic first
      Allowed tweaking configuration before impacting user experience
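One simple way to enable a feature for X% of traffic is to bucket requests deterministically. The talk does not say how the selection was made, so the sketch below is purely an assumption: TrafficSampleSketch is a hypothetical name, and bucketing by member ID modulo 100 is just one option that keeps a given member consistently in or out of the sample.

```java
// Hypothetical percentage gate; the talk does not describe the actual mechanism.
final class TrafficSampleSketch {

    // Returns true for roughly `percent` out of every 100 member IDs.
    static boolean enabledFor(long memberId, int percent) {
        return Math.floorMod(memberId, 100L) < percent;
    }
}
```

Raising the percentage only adds members to the sample, so load on the cluster can be ramped up gradually while watching its behavior.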
HBase @ Meetup: Issues along the way
  High CPU / Concurrency issues
    Updated to 0.20 release for performance gains across the board
    Replaced tableindexed usage with application level secondary indexing
  Hot regions - profile page hits small table every pageload
    Force split table to distribute across multiple servers
    Newest region still handling high load
      changed index keying to -- for even distribution
  I/O Heavy load / MemberFeedIndex table growing
    Lowered MemberFeedIndex time-to-live to 2 months
    Enabled LZO compression
HBase @ Meetup: Current Status
  Live traffic growing
    Cluster handling ~2.5k-3k requests/sec
    50+% still write traffic
    ~17% of page views hit HBase (for reads)
    Expanding to 30% of page views in coming months
  Meetup.Beeno now open-source on GitHub: http://github.com/ghelmling/meetup.beeno
  Next up
    Continue tweaking
    Site analytics