WWW.NGDATA.COM Making Sense of Data Lily goes shopping – real-time recommendations with HBase HBaseCon, May 2012 Steven Noels – VP Product – @stevenn

HBaseCon 2012 | Getting Real about Interactive Big Data Management with Lily & HBase - ngdata

Jun 21, 2015

Technology

Cloudera, Inc.

HBase brings interactivity to Hadoop and lets users collect, manage and process data in real time. Lily wraps HBase and Solr in a comprehensive Big Data platform, with HBase-native secondary indexing complementing ad-hoc structured search. Through spare write-cycles during read operations, Lily transforms HBase into a scalable data management engine providing interactive analytics, profile harvesting and real-time recommendations. This talk highlights the architecture of Lily, shows how it completes HBase, and explains some of its implementation use cases.
Transcript
Page 1: HBaseCon 2012 | Getting Real about Interactive Big Data Management with Lily & HBase - ngdata

WWW.NGDATA.COM

Making Sense of Data

Lily goes shopping – real-time recommendations with HBase

HBaseCon, May 2012

Steven Noels – VP Product – @stevenn

Page 2: HBaseCon 2012 | Getting Real about Interactive Big Data Management with Lily & HBase - ngdata

•  HBase-backed data repository, with batteries included

•  Data model:

•  high-level data model on top of HBase’s byte[]’s

•  schema

•  versioning (schema and data)

•  links, variants

•  Java & REST APIs

•  Indexing:

•  through configuration, not implementation

•  incremental and batch index maintenance

•  RowLog: distributed, durable queue for secondary actions

•  Open Source: www.lilyproject.org (Apache License)

Lily Core 2’ recap

HBase

Lily

Solr et al.

RowLog

client app

Page 3: HBaseCon 2012 | Getting Real about Interactive Big Data Management with Lily & HBase - ngdata

•  BigTable model

•  sparseness

•  atomic row updates aka consistency

•  auto-partitioning

•  Apache license

•  A great community led by a Saint :-)

Why HBase?

Page 4: HBaseCon 2012 | Getting Real about Interactive Big Data Management with Lily & HBase - ngdata

Portfolio Overview

•  Schema and Data Management

•  Total Data Aggregation

•  Real-time Index and Retrieval

•  Security and Enterprise Connectors

•  Profile Development

•  Context and Activity Tracking

•  Social Stream Ingestion

•  Real-time AI Recommendations

•  Industry algorithms and rules

•  Trend Analytics

•  Pattern Detection

open source  

commercial availability  

Page 5: HBaseCon 2012 | Getting Real about Interactive Big Data Management with Lily & HBase - ngdata

Some of the larger Lily deployments

•  media

•  aggregation, database publishing and online archives

•  finance

•  real-time identity fraud detection

•  retail banking

•  contextualized (time+loc+person) mobile coupons

•  retail

•  e-commerce platform: product catalog, consumer data store, real-time indexing

Lily (=HBase) In Use

Page 6: HBaseCon 2012 | Getting Real about Interactive Big Data Management with Lily & HBase - ngdata

Collaborative Filtering?

Recommend items similar to a user’s highly-preferred items

Page 7: HBaseCon 2012 | Getting Real about Interactive Big Data Management with Lily & HBase - ngdata

Collaborative Filtering is … Matrixes

Sean likes “Scarface” a lot
Robin likes “Scarface” somewhat
Grant likes “The Notebook” not at all
…

(123,654,5.0)
(789,654,3.0)
(345,876,1.0)
…

(345,654,4.5)
…

(Magic)

Grant may like “Scarface” quite a bit …
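
The (user, item, preference) triples on this slide can be held as a sparse user-by-item matrix. The sketch below is only an illustration in Python, not Lily's implementation: the ID-to-name mapping is inferred from the order of the slide's examples, and `item_mean_estimate` is a hypothetical, deliberately naive stand-in for the "(Magic)" step, so it does not reproduce the slide's 4.5 estimate.

```python
from collections import defaultdict

# (user_id, item_id, preference) triples as shown on the slide.
# Assumed mapping, inferred from the slide's ordering:
# 123 = Sean, 789 = Robin, 345 = Grant; 654 = "Scarface", 876 = "The Notebook".
triples = [
    (123, 654, 5.0),
    (789, 654, 3.0),
    (345, 876, 1.0),
]


def build_matrix(triples):
    """Return a sparse user -> {item: preference} mapping."""
    matrix = defaultdict(dict)
    for user, item, pref in triples:
        matrix[user][item] = pref
    return matrix


def item_mean_estimate(matrix, item):
    """Naive baseline: the average of all known preferences for the item."""
    prefs = [row[item] for row in matrix.values() if item in row]
    return sum(prefs) / len(prefs) if prefs else None


matrix = build_matrix(triples)
# Estimate Grant's (345) unknown preference for "Scarface" (654).
estimate = item_mean_estimate(matrix, 654)
```

A real collaborative filter would weight other users by how similar their tastes are to Grant's rather than averaging everyone equally.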

Page 8: HBaseCon 2012 | Getting Real about Interactive Big Data Management with Lily & HBase - ngdata

Personalized offers

Contextualized recommendations

Item Activity Profile

credit card statements

shops & merchants product families offers/coupons

Page 9: HBaseCon 2012 | Getting Real about Interactive Big Data Management with Lily & HBase - ngdata

Lily Core Repository

Fitting Recommendations into the Lily Architecture

indexes

activity store

rowlog

LILY CRUD API

data, activity, profile scoring

co-occurrence lookup matrix

read/write demultiplexer

LILY recommender engine

Steven [email protected]

www.ngdata.com

telephone: +32 9 33 88 220

Gent (Belgium)

Makers of

ALS

k-means

propensity

custom ...

algorithm support

data store

profile store

Lily/HBase Secondary Indexes
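
The co-occurrence lookup matrix in this diagram can be illustrated with a small sketch. It assumes the matrix counts how often two items appear in the same user's activity; the slide does not spell out Lily's actual definition, and the basket data and the `cooccurrence` and `recommend` functions are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence(baskets):
    """Count how often each pair of items appears for the same user."""
    counts = defaultdict(lambda: defaultdict(int))
    for items in baskets.values():
        for a, b in combinations(sorted(set(items)), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def recommend(counts, seen, top_n=3):
    """Rank unseen items by their total co-occurrence with the seen items."""
    scores = defaultdict(int)
    for item in seen:
        for other, n in counts.get(item, {}).items():
            if other not in seen:
                scores[other] += n
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda i: (-scores[i], i))[:top_n]


# Hypothetical activity data: user -> items the user interacted with.
baskets = {
    "u1": ["coffee", "bagel"],
    "u2": ["coffee", "bagel", "juice"],
    "u3": ["coffee", "juice"],
    "u4": ["tea", "bagel"],
}
counts = cooccurrence(baskets)
```

Because each new activity only increments a few counters, a matrix of this kind can be updated as events arrive instead of being rebuilt in batch.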

Page 10: HBaseCon 2012 | Getting Real about Interactive Big Data Management with Lily & HBase - ngdata

•  Transaction-based preferencing

•  Pluggable preference strategies, using Lily-based data (HBase&Solr) for decision making

•  e.g. credit card statement = transactions between users and product families

•  Preference weighting

•  Ingest: REST API, bulk support

•  Real-time updating of the recommendation model

•  Profile Store

•  Profile activities can be preferenced

•  Support for Profile behavior analysis

Preferencing aka Feeding the Matrix
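
A minimal sketch of transaction-based preferencing with a pluggable weighting strategy. It assumes each transaction is a (user, product family, amount) tuple; the function names, both strategies and the sample data are hypothetical and are not part of Lily's API.

```python
import math
from collections import defaultdict


def count_strategy(amount):
    """Every transaction counts equally, whatever its amount."""
    return 1.0


def log_amount_strategy(amount):
    """Larger purchases weigh more, with diminishing returns."""
    return math.log1p(amount)


def preferences(transactions, strategy):
    """Aggregate transactions into (user, family) -> preference weight."""
    prefs = defaultdict(float)
    for user, family, amount in transactions:
        prefs[(user, family)] += strategy(amount)
    return dict(prefs)


# Hypothetical credit card statement lines: (user, product family, amount).
transactions = [
    ("u1", "groceries", 40.0),
    ("u1", "groceries", 60.0),
    ("u1", "travel", 300.0),
]
by_count = preferences(transactions, count_strategy)
by_amount = preferences(transactions, log_amount_strategy)
```

Swapping the strategy changes how the same statement feeds the matrix: by count, groceries outweigh travel; by log amount, the two families weigh roughly the same.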

Page 11: HBaseCon 2012 | Getting Real about Interactive Big Data Management with Lily & HBase - ngdata

•  Recommender

•  Pluggable recommender strategies, using Lily-based data (HBase&Solr) for decision making

•  Multi-model support: user-item & item-user recommendations

•  Estimation of both preferenced and non-preferenced items

•  Geolocation-based recommendations

•  Re-scoring

•  REST API

•  (Planned)

•  Support for Classifications (scenario - Recommend me all (possible) coffee drinkers)

•  Matrix / recommendation indexing

Making recommendations
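
Geolocation-based recommendations and re-scoring could be combined roughly as follows. This is a sketch under assumptions, not Lily's recommender: the `rescore` function, the linear distance damping, the 5 km cut-off and the coordinates are all hypothetical.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometres."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def rescore(candidates, user_pos, shop_pos, max_km=5.0):
    """Drop candidates farther than max_km and damp the rest linearly by distance.

    candidates: list of (item, score) pairs from an upstream recommender.
    shop_pos:   item -> (lat, lon) of the shop offering it.
    """
    rescored = []
    for item, score in candidates:
        d = haversine_km(*user_pos, *shop_pos[item])
        if d <= max_km:
            rescored.append((item, score * (1 - d / max_km)))
    return sorted(rescored, key=lambda pair: -pair[1])


# Hypothetical data: a user in Gent, one offer nearby and one in Brussels.
user_pos = (51.05, 3.72)
shop_pos = {"offer-a": (51.05, 3.72), "offer-b": (50.85, 4.35)}
result = rescore([("offer-a", 4.0), ("offer-b", 5.0)], user_pos, shop_pos)
```

The higher-scored Brussels offer is dropped because it lies well outside the cut-off, so only the nearby offer remains.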

Page 12: HBaseCon 2012 | Getting Real about Interactive Big Data Management with Lily & HBase - ngdata

•  Secondary indexes (= Lily Core!)

•  indexes are defined through configuration

•  single or multi-field indexes

•  range queries and prefix queries

•  asc or desc sorted results

•  can read huge, sorted lists

•  synchronously updated: index updates are applied by rowlog secondary actions

•  online building of new indexes (no table locks)

•  MapReduce integration

•  SolrCloud integration

•  Index shards and configuration managed through ZooKeeper

Other upcoming Lily Features
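
The range, prefix and sorted-result queries listed above follow naturally from keeping index entries in sorted key order, the same way HBase orders row keys. The sketch below is an in-memory illustration only; the `SortedIndex` class is hypothetical and bears no relation to Lily's actual index API or storage layout.

```python
import bisect


class SortedIndex:
    """Single-field secondary index kept as a sorted list of (value, row_id)."""

    def __init__(self):
        self._entries = []

    def put(self, value, row_id):
        # insort keeps the list ordered, so queries are slices of the list.
        bisect.insort(self._entries, (value, row_id))

    def range(self, low, high, descending=False):
        """Row ids whose value satisfies low <= value < high."""
        # A 1-tuple sorts before any 2-tuple with the same first element.
        lo = bisect.bisect_left(self._entries, (low,))
        hi = bisect.bisect_left(self._entries, (high,))
        rows = [row_id for _, row_id in self._entries[lo:hi]]
        return rows[::-1] if descending else rows

    def prefix(self, prefix, descending=False):
        """Row ids whose string value starts with prefix.

        Simplification: assumes values never contain U+FFFF right after the prefix.
        """
        return self.range(prefix, prefix + "\uffff", descending)


# Hypothetical rows indexed on a "city" field.
index = SortedIndex()
for city, row_id in [("gent", "r1"), ("geneva", "r2"), ("amsterdam", "r3"), ("paris", "r4")]:
    index.put(city, row_id)
```

Ascending and descending results come from reading the same sorted slice in either direction, which is why huge sorted lists can be streamed rather than sorted at query time.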

Page 13: HBaseCon 2012 | Getting Real about Interactive Big Data Management with Lily & HBase - ngdata

Making Sense of Data

Questions? Thank you!