Breaking with Relational DBMS and Dating with HBase

Posted on 27-Jan-2015

DESCRIPTION

Session on HBase at the IndicThreads Conference on Java, Dec 2010 http://j10.indicthreads.com/

Transcript

1

Gaurav Kohli, Xebia

Breaking with DBMS and Dating with HBase

2

About me

Gaurav Kohli, gaurav.in@gmail.com

Consultant, Xebia IT Architects

3

Why are we here?

Something about RDBMS

Limitations of RDBMS

Why HBase (or any NoSQL solution)?

Overview of HBase

Specific use cases

Paradigm shift in schema design

Architecture of HBase

HBase interfaces: Java API, Thrift

Conclusion

4

Databases

5

Relational Databases have a lot of

6

Data sets growing into petabytes

RDBMSs don't scale out inherently: scale up, or scale out with load balancing + replication

Hard to shard / partition

Hard to get both high read and high write throughput from one database: separate transactional / analytical databases

Specialized hardware is very expensive (e.g. Oracle clustering)

7

Master

Slave

Replication

8

MySQL master becomes a problem

All slaves must have the same write capacity as the master

Single point of failure, no easy failover

Master

Reads

Writes

Slave nodes

9

Master Master

Slave

Replication

10

11

12

2006.11 Google releases paper on Bigtable

2007.2 Initial HBase prototype created as Hadoop contrib

2007.10 First usable HBase

2008.1 Hadoop becomes an Apache top-level project and HBase becomes a subproject

2010.5 HBase becomes an Apache top-level project

2010.6 HBase 0.20.5 released

2010.10 HBase 0.89.2010092, third developer release

13

Distributed: uses HDFS for storage

Column-oriented

Multi-dimensional: versions

High-availability

High-performance

Storage system

14

HBase is not a SQL database

No joins, no query engine, no datatypes, no SQL

No fixed schema

Denormalized data

Wide and sparsely populated data structure (key-value)

No DBA needed

15

Bigness: big data, big numbers of users, big numbers of computers

Massive write performance: Facebook needs to store 135 billion messages a month; Twitter stores 7 TB of data per day

Fast key-value access

Write availability

No single point of failure

16

Specific use cases

Managing large streams of non-transactional data: Apache logs, application logs, MySQL logs, etc.

Real-time inserts, updates, and queries

Fraud detection by comparing transactions to known patterns in real time

Analytics: use MapReduce, Hive, or Pig to perform analytical queries

17

Column-oriented database

Tables are sorted by row key

The table schema defines only column families; a column family can have any number of columns

Each cell value has a timestamp

18

19

20

SortedMap(RowKey, List(SortedMap(Column, List(Value, Timestamp))))

21

A big sorted map: row key + column key + timestamp => value

Student table (column family: info; column qualifiers: name, age, sex)

Row Key  Column Key  Timestamp      Value
1        info:name   1273516197868  Gaurav
1        info:age    1273871824184  28
1        info:age    1273871823022  34
1        info:sex    1273746281432  Male
2        info:name   1273863723227  Harsh
3        info:name   1273822456433  Raman

Rows are sorted by row key and column key.

A column key is the column family plus the column qualifier (name), e.g. info:age.

The timestamp is a long value; info:age has 2 versions in row 1.
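The sorted-map view above can be mimicked with nested java.util.TreeMap instances. This is a hypothetical in-memory sketch (SortedMapModel and its methods are made-up names, not part of HBase) that stores each cell's versions newest-first, so reading info:age in row 1 returns 28, the value with the larger timestamp.

```java
import java.util.Comparator;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical illustration of the HBase logical data model, not the HBase API:
// row key -> column key ("family:qualifier") -> timestamp -> value.
class SortedMapModel {
    // Row keys and column keys are kept in ascending sorted order.
    private final NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>> rows =
            new TreeMap<>();

    void put(String row, String column, long timestamp, String value) {
        rows.computeIfAbsent(row, r -> new TreeMap<>())
            // Versions of one cell are ordered newest (largest timestamp) first.
            .computeIfAbsent(column, c -> new TreeMap<Long, String>(Comparator.<Long>reverseOrder()))
            .put(timestamp, value);
    }

    // Returns the newest version of a cell, or null if the cell does not exist.
    String getLatest(String row, String column) {
        NavigableMap<String, NavigableMap<Long, String>> columns = rows.get(row);
        if (columns == null) return null;
        NavigableMap<Long, String> versions = columns.get(column);
        return versions == null ? null : versions.firstEntry().getValue();
    }

    // Number of versions stored for a cell.
    int versionCount(String row, String column) {
        NavigableMap<String, NavigableMap<Long, String>> columns = rows.get(row);
        if (columns == null) return 0;
        NavigableMap<Long, String> versions = columns.get(column);
        return versions == null ? 0 : versions.size();
    }

    public static void main(String[] args) {
        SortedMapModel student = new SortedMapModel();
        // The cells from the Student table above.
        student.put("1", "info:name", 1273516197868L, "Gaurav");
        student.put("1", "info:age", 1273871824184L, "28");
        student.put("1", "info:age", 1273871823022L, "34");
        student.put("1", "info:sex", 1273746281432L, "Male");
        student.put("2", "info:name", 1273863723227L, "Harsh");
        student.put("3", "info:name", 1273822456433L, "Raman");

        System.out.println(student.getLatest("1", "info:age"));    // 28
        System.out.println(student.versionCount("1", "info:age")); // 2
    }
}
```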

22

Example of a Student and Subject

Student table: PK id; name, age, sex

Subject table: PK id; title, introduction, teacher_id

Student-Subject table: student_id, subject_id, type

Student and Subject have an m:n relationship

23

Example of a Student and Subject: RDBMS

Student table
id  name    age  sex
1   Gaurav  28   Male

Subject table
id  title  introduction   teacher_id
1   HBase  HBase is cool  10

Student-Subject table
student_id  subject_id  type
1           1           elective

24

Student-Subject schema: HBase

Student table
Row Key     Column Family  Column Keys
student_id  info           name, age, sex
student_id  subjects       subject ids as qualifiers (keys)

Subject table
Row Key     Column Family  Column Keys
subject_id  info           title, introduction, teacher_id
subject_id  students       student ids as qualifiers (keys)

25

Student-Subject schema: HBase

Student table
key  info                                          subjects
1    info:name=Gaurav, info:age=28, info:sex=Male  subjects:1="elective", subjects:2="main"

Subject table
key  info                                                                   students
1    info:title=HBase, info:introduction=HBase is cool, info:teacher_id=10  students:1, students:2
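The denormalized design replaces the join table by writing each enrolment into both rows, so "subjects of a student" and "students of a subject" are each a single-row read. A minimal in-memory sketch, assuming each table is a map from row key to family:qualifier -> value with timestamps omitted; StudentSubjectSchema, enroll and qualifiers are hypothetical names, and student 2's enrolment is made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of the denormalized Student-Subject schema (not HBase code).
// Each "table" maps row key -> "family:qualifier" -> value; versions are omitted.
class StudentSubjectSchema {
    final Map<String, TreeMap<String, String>> studentTable = new TreeMap<>();
    final Map<String, TreeMap<String, String>> subjectTable = new TreeMap<>();

    private static void put(Map<String, TreeMap<String, String>> table,
                            String row, String column, String value) {
        table.computeIfAbsent(row, r -> new TreeMap<>()).put(column, value);
    }

    // The relationship is written into both rows instead of a join table.
    void enroll(String studentId, String subjectId, String type) {
        put(studentTable, studentId, "subjects:" + subjectId, type);
        put(subjectTable, subjectId, "students:" + studentId, "");
    }

    // Lists the qualifiers of one column family in one row, in sorted order.
    static List<String> qualifiers(Map<String, TreeMap<String, String>> table,
                                   String row, String family) {
        List<String> result = new ArrayList<>();
        TreeMap<String, String> columns = table.get(row);
        if (columns == null) return result;
        String prefix = family + ":";
        for (String column : columns.keySet()) {
            if (column.startsWith(prefix)) {
                result.add(column.substring(prefix.length()));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        StudentSubjectSchema db = new StudentSubjectSchema();
        db.enroll("1", "1", "elective");
        db.enroll("1", "2", "main");
        db.enroll("2", "1", "elective"); // hypothetical enrolment for student 2
        System.out.println(qualifiers(db.studentTable, "1", "subjects")); // [1, 2]
        System.out.println(qualifiers(db.subjectTable, "1", "students")); // [1, 2]
        System.out.println(db.studentTable.get("1").get("subjects:1"));  // elective
    }
}
```

The cost of this design is that every enrolment is two writes that the application must keep consistent, since there are no cross-row transactions to rely on.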

26

Attribute    Possible values            Default
COMPRESSION  NONE, GZ, LZO              NONE
VERSIONS     1+                         3
TTL          1 to 2147483647 (seconds)  2147483647
BLOCKSIZE    1 byte to 2 GB             64 KB
IN_MEMORY    true, false                false
BLOCKCACHE   true, false                true
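To make VERSIONS and TTL concrete, here is a simplified model, assuming a cell's versions are held as timestamp (ms) -> value: versions older than "now minus TTL" are hidden, and at most VERSIONS of the newest remaining ones are kept. CellRetention and visibleVersions are hypothetical names; HBase enforces these attributes during reads and compactions, and this is not its code.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Simplified, hypothetical model of the VERSIONS and TTL column-family attributes.
class CellRetention {
    static NavigableMap<Long, String> visibleVersions(Map<Long, String> versions,
                                                      int maxVersions,
                                                      long ttlSeconds,
                                                      long nowMillis) {
        // Walk versions from newest to oldest timestamp.
        NavigableMap<Long, String> newestFirst = new TreeMap<>(Comparator.<Long>reverseOrder());
        newestFirst.putAll(versions);

        NavigableMap<Long, String> kept = new TreeMap<>(Comparator.<Long>reverseOrder());
        long oldestAllowed = nowMillis - ttlSeconds * 1000L;
        for (Map.Entry<Long, String> e : newestFirst.entrySet()) {
            if (kept.size() >= maxVersions) break;   // VERSIONS limit reached
            if (e.getKey() < oldestAllowed) break;   // expired; all older ones are too
            kept.put(e.getKey(), e.getValue());
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<Long, String> versions = new HashMap<>();
        versions.put(1000L, "v1");
        versions.put(2000L, "v2");
        versions.put(3000L, "v3");
        versions.put(4000L, "v4");
        // VERSIONS = 3 with the default TTL: the oldest version is dropped.
        System.out.println(visibleVersions(versions, 3, 2147483647L, 5000L).keySet()); // [4000, 3000, 2000]
        // VERSIONS = 3 with TTL = 2 s at time 5000 ms: only cells at or after 3000 ms remain.
        System.out.println(visibleVersions(versions, 3, 2L, 5000L).keySet()); // [4000, 3000]
    }
}
```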

27

Region: a contiguous set of lexicographically sorted rows

A region is split once it grows beyond hbase.hregion.max.filesize (default: 256 MB); regions are hosted by RegionServers

Each table is partitioned into regions
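Because each region holds a contiguous, sorted range of rows, finding the region for a row amounts to a floor lookup on region start keys. A minimal sketch, assuming string row keys (Java's String order matches HBase's byte order for ASCII keys), a first region with an empty start key, and made-up server names rs-a, rs-b, rs-c; RegionMapSketch is hypothetical, not HBase's client code.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: region start key -> hosting RegionServer.
class RegionMapSketch {
    private final TreeMap<String, String> regionStartToServer = new TreeMap<>();

    void addRegion(String startKey, String server) {
        regionStartToServer.put(startKey, server);
    }

    // A row belongs to the region with the greatest start key <= the row key.
    String serverFor(String row) {
        Map.Entry<String, String> region = regionStartToServer.floorEntry(row);
        if (region == null) {
            throw new IllegalStateException("no region covers row " + row);
        }
        return region.getValue();
    }

    public static void main(String[] args) {
        RegionMapSketch table = new RegionMapSketch();
        table.addRegion("", "rs-a");       // first region starts at the empty key
        table.addRegion("row201", "rs-b");
        table.addRegion("row501", "rs-c");
        System.out.println(table.serverFor("row150")); // rs-a
        System.out.println(table.serverFor("row3"));   // rs-b ("row3" sorts after "row201")
        System.out.println(table.serverFor("row999")); // rs-c
    }
}
```

The row3 case shows why row-key design matters: keys are compared as strings, not numbers, so zero-padding (row003) is a common way to keep numeric order.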

28

Regions and

[Diagram: regions bounded by row1 to row200 and row201 to row500; a new row is being inserted]

29

Regions and

[Diagram: regions bounded by row1 to row200, row201 to row350, and row 351 to row 501]
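The region split in these diagrams can be approximated as follows. In this simplified sketch a region is a sorted set of row keys, and a row-count limit stands in for hbase.hregion.max.filesize (which HBase measures in bytes); once the limit is exceeded, the region is cut at its middle row into two daughter regions. RegionSplitSketch is hypothetical, and HBase's own choice of split point differs in detail.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Hypothetical, simplified region split: row count stands in for file size.
class RegionSplitSketch {
    // Returns the region unchanged if small enough, otherwise two daughter regions.
    static List<TreeSet<String>> maybeSplit(TreeSet<String> region, int maxRows) {
        List<TreeSet<String>> daughters = new ArrayList<>();
        if (region.size() <= maxRows) {
            daughters.add(region);
            return daughters;
        }
        List<String> keys = new ArrayList<>(region);          // already sorted
        String splitKey = keys.get(keys.size() / 2);          // middle row
        daughters.add(new TreeSet<>(region.headSet(splitKey, false))); // rows < splitKey
        daughters.add(new TreeSet<>(region.tailSet(splitKey, true)));  // rows >= splitKey
        return daughters;
    }

    public static void main(String[] args) {
        TreeSet<String> region = new TreeSet<>(Arrays.asList(
                "row201", "row250", "row300", "row350", "row400", "row450"));
        region.add("row500"); // the new row pushes the region over the limit
        for (TreeSet<String> daughter : maybeSplit(region, 6)) {
            System.out.println(daughter.first() + " .. " + daughter.last());
        }
        // prints "row201 .. row300" then "row350 .. row500"
    }
}
```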

30

Master

ZooKeeper

RegionServers

HDFS

MapReduce

31

32

HBase interfaces: Java API, Thrift...

33

Java

Thrift (Ruby, PHP, Python, Perl, C++, ...)

REST

Groovy DSL

MapReduce

HBase shell

34

Java

Get, Put, Delete, Scan, incrementColumnValue
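The semantics of these five operations can be sketched with a hypothetical in-memory table. InMemoryTable is made up for illustration and is not the HBase client API (the real Java client passes Get, Put, Delete and Scan objects to a table handle); versions are omitted, and the counter is stored as a decimal string for readability, whereas HBase stores it as an 8-byte long.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical in-memory stand-in mirroring Get/Put/Delete/Scan/incrementColumnValue.
// All methods are synchronized so concurrent increments do not lose updates.
class InMemoryTable {
    // row key -> "family:qualifier" -> value, both levels sorted.
    private final NavigableMap<String, NavigableMap<String, String>> rows = new TreeMap<>();

    // Put: write one cell.
    synchronized void put(String row, String column, String value) {
        rows.computeIfAbsent(row, r -> new TreeMap<>()).put(column, value);
    }

    // Get: read one cell, or null if absent.
    synchronized String get(String row, String column) {
        NavigableMap<String, String> columns = rows.get(row);
        return columns == null ? null : columns.get(column);
    }

    // Delete: remove a whole row.
    synchronized void delete(String row) {
        rows.remove(row);
    }

    // Scan: row keys in [startRow, stopRow), in sorted order.
    synchronized List<String> scan(String startRow, String stopRow) {
        return new ArrayList<>(rows.subMap(startRow, true, stopRow, false).keySet());
    }

    // incrementColumnValue: add to a counter cell and return the new value.
    synchronized long incrementColumnValue(String row, String column, long amount) {
        String current = get(row, column);
        long next = (current == null ? 0L : Long.parseLong(current)) + amount;
        put(row, column, Long.toString(next));
        return next;
    }

    public static void main(String[] args) {
        InMemoryTable students = new InMemoryTable();
        students.put("1", "info:name", "Gaurav");
        students.put("2", "info:name", "Harsh");
        students.put("3", "info:name", "Raman");
        System.out.println(students.get("1", "info:name"));                        // Gaurav
        System.out.println(students.scan("1", "3"));                               // [1, 2]
        System.out.println(students.incrementColumnValue("1", "stats:logins", 1)); // 1
        students.delete("2");
        System.out.println(students.scan("1", "4"));                               // [1, 3]
    }
}
```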

35

36

HBase vs. RDBMS

Not a replacement

Solves only a small subset (~5%)

37

Where SQL makes life easy: joining, secondary indexing, referential integrity (updates), ACID

Where HBase makes life easy: dataset scale, read/write scale, replication, batch analysis

38

39

40

Apache HBase (http://hbase.apache.org/)

HBase Wiki (wiki.apache.org/hadoop/Hbase)

HBase blog (blog.hbase.org)

Images from Google Search

http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html

http://highscalability.com/blog/2010/12/6/what-the-heck-are-you-actually-using-nosql-for.html
