© Hortonworks Inc. 2011 Integration of Apache Hive and HBase Enis Soztutar enis [at] apache [dot] org @enissoz Page 1 Architecting the Future of Big Data

Apr 30, 2020

Transcript
Page 1: Integration of Apache Hive and HBase
Enis Soztutar
enis [at] apache [dot] org
@enissoz
© Hortonworks Inc. 2011
Architecting the Future of Big Data

Page 2: Agenda
• Overview of Hive and HBase
• Hive + HBase Features and Improvements
• Future of Hive and HBase
• Q&A

Page 3: Apache Hive Overview
• Apache Hive is a data warehouse system for Hadoop
• SQL-like query language called HiveQL
• Built for PB scale data
• Main purpose is analysis and ad hoc querying
• Database / table / partition / bucket – DDL Operations
• SQL Types + Complex Types (ARRAY, MAP, etc)
• Very extensible
• Not for: small data sets, low latency queries, OLTP

Page 4: Apache Hive Architecture
[Architecture diagram. Components shown: JDBC/ODBC, CLI, Hive Web Interface, Hive Thrift Server, Driver, Parser, Planner, Optimizer, Execution, MSClient, Metastore, RDBMS, MapReduce, HDFS]

Page 5: Overview of Apache HBase
• Apache HBase is the Hadoop database
• Modeled after Google's BigTable
• A sparse, distributed, persistent multi-dimensional sorted map
• The map is indexed by a row key, column key, and a timestamp
• Each value in the map is an un-interpreted array of bytes
• Low latency random data access
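The "sorted map" description above can be made concrete with a small in-memory model. This is only a sketch in plain Python, not HBase code: the class name `SortedCellMap` and its methods are hypothetical, and the choice to return the newest timestamp first is an assumption made for illustration.

```python
from bisect import insort


class SortedCellMap:
    """Toy in-memory model of a (row key, column key, timestamp) -> bytes map."""

    def __init__(self):
        self._keys = []   # kept sorted: (row, column, -timestamp)
        self._cells = {}  # same key -> uninterpreted value bytes

    def put(self, row, column, timestamp, value):
        key = (row, column, -timestamp)  # negated so newer versions sort first
        if key not in self._cells:
            insort(self._keys, key)
        self._cells[key] = value

    def get(self, row, column):
        """Return the newest value stored for (row, column), or None."""
        for key in self._keys:
            if key[:2] == (row, column):
                return self._cells[key]
        return None

    def scan(self, start_row=b""):
        """Yield (row, column, timestamp, value) in key order from start_row on."""
        for row, column, neg_ts in self._keys:
            if row >= start_row:
                yield row, column, -neg_ts, self._cells[(row, column, neg_ts)]


table = SortedCellMap()
table.put(b"bit.ly/abcd", b"s:hits", 1, b"123")
table.put(b"bit.ly/aaaa", b"u:url", 1, b"hbase.apache.org/")
table.put(b"bit.ly/aaaa", b"s:hits", 1, b"99")
table.put(b"bit.ly/aaaa", b"s:hits", 2, b"100")  # newer version of the same cell
```

Rows come back in key order regardless of insertion order, and every value stays a plain byte string; interpreting those bytes is left to the caller.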

Page 6: Overview of Apache HBase
• Logical view:
[Figure. From: Bigtable: A Distributed Storage System for Structured Data, Chang, et al.]

Page 7: Apache HBase Architecture
[Architecture diagram. Components shown: Client, Zookeeper, HMaster, Region servers (each hosting several Regions), HDFS]

Page 8: Hive + HBase Features and Improvements

Page 9: Hive + HBase Motivation
• Hive and HBase have different characteristics:
  – High latency vs. low latency
  – Structured vs. unstructured
  – Analysts vs. programmers
• Hive data warehouses on Hadoop are high latency
  – Long ETL times
  – Access to real time data
• Analyzing HBase data with MapReduce requires custom coding
• Hive and SQL are already known by many analysts

Page 10: Use Case 1: HBase as ETL Data Sink
[Diagram. From HUG - Hive/HBase Integration or, MaybeSQL? April 2010, John Sichi, Facebook. http://www.slideshare.net/hadoopusergroup/hive-h-basehadoopapr2010]

Page 11: Use Case 2: HBase as Data Source
[Diagram. From HUG - Hive/HBase Integration or, MaybeSQL? April 2010, John Sichi, Facebook. http://www.slideshare.net/hadoopusergroup/hive-h-basehadoopapr2010]

Page 12: Use Case 3: Low Latency Warehouse
[Diagram. From HUG - Hive/HBase Integration or, MaybeSQL? April 2010, John Sichi, Facebook. http://www.slideshare.net/hadoopusergroup/hive-h-basehadoopapr2010]

Page 13: Example: Hive + HBase (HBase table)

    hbase(main):001:0> create 'short_urls', {NAME => 'u'}, {NAME => 's'}
    hbase(main):014:0> scan 'short_urls'
    ROW            COLUMN+CELL
     bit.ly/aaaa   column=s:hits, value=100
     bit.ly/aaaa   column=u:url, value=hbase.apache.org/
     bit.ly/abcd   column=s:hits, value=123
     bit.ly/abcd   column=u:url, value=example.com/foo

Page 14: Example: Hive + HBase (Hive table)

    CREATE TABLE short_urls(
      short_url string,
      url string,
      hit_count int
    )
    STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,u:url,s:hits")
    TBLPROPERTIES ("hbase.table.name" = "short_urls");
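The `hbase.columns.mapping` value pairs each Hive column, in declaration order, with the row key (`:key`), a whole column family (`cf:`), or a single column (`cf:cq`). The sketch below models that pairing in plain Python. It is not Hive's parser: the function name `pair_columns` and the returned tuples are hypothetical, and the sketch expects a mapping string with no whitespace between entries.

```python
def pair_columns(hive_columns, mapping):
    """Pair Hive column names with entries of an hbase.columns.mapping string."""
    entries = mapping.split(",")
    if len(entries) != len(hive_columns):
        raise ValueError("mapping needs exactly one entry per Hive column")

    paired = {}
    for name, entry in zip(hive_columns, entries):
        if entry == ":key":
            paired[name] = ("row key", None, None)
            continue
        family, sep, qualifier = entry.partition(":")
        if not sep or not family:
            raise ValueError(f"malformed mapping entry: {entry!r}")
        if qualifier:
            paired[name] = ("column", family, qualifier)
        else:
            # A bare "cf:" maps the whole family, which the deck ties to MAP fields.
            paired[name] = ("column family", family, None)
    return paired


short_urls = pair_columns(
    ["short_url", "url", "hit_count", "props"],
    ":key,u:url,s:hits,p:",
)
```

Here `short_urls["hit_count"]` is `("column", "s", "hits")` and `short_urls["props"]` is `("column family", "p", None)`; leaving out an entry or adding an extra one raises `ValueError`.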

Page 15: Storage Handler
• Hive defines HiveStorageHandler class for different storage backends: HBase / Cassandra / MongoDB / etc
• Storage Handler has hooks for
  – Getting input / output formats
  – Metadata operations hook: CREATE TABLE, DROP TABLE, etc
• Storage Handler is a table level concept
  – Does not support Hive partitions, and buckets

Page 16: Apache Hive + HBase Architecture
[Architecture diagram. Components shown: CLI, Hive Web Interface, Hive Thrift Server, Driver, Parser, Planner, Optimizer, Execution, MSClient, Metastore, RDBMS, StorageHandler, MapReduce, HDFS, HBase]

Page 17: Hive + HBase Integration
• For Input/OutputFormat, getSplits(), etc underlying HBase classes are used
• Column selection and certain filters can be pushed down
• HBase tables can be used with other (Hadoop native) tables and SQL constructs
• Hive DDL operations are converted to HBase DDL operations via the client hook.
  – All operations are performed by the client
  – No two phase commit

Page 18: Schema / Type Mapping

Page 19: Schema Mapping
• Hive table + columns + column types <=> HBase table + column families (+ column qualifiers)
• Every field in Hive table is mapped in order to either
  – The table key (using :key as selector)
  – A column family (cf:) -> MAP fields in Hive
  – A column (cf:cq)
• Hive table does not need to include all columns in HBase

    CREATE TABLE short_urls(
      short_url string,
      url string,
      hit_count int,
      props map<string,string>
    )
    STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,u:url,s:hits,p:");

Page 20: Type Mapping
• Recently added to Hive (0.9.0)
• Previously all types were being converted to strings in HBase
• Hive has:
  – Primitive types: INT, STRING, BINARY, DATE, etc
  – ARRAY<Type>
  – MAP<PrimitiveType, Type>
  – STRUCT<a:INT, b:STRING, c:STRING>
• HBase does not have types
  – Bytes.toBytes()
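Because HBase stores only bytes, the encoding chosen for a Hive value decides both its size and its sort order. The sketch below contrasts a string encoding with a fixed-width binary one in plain Python. The helper names are hypothetical, and the binary form assumes a 4-byte big-endian layout, which is what HBase's `Bytes.toBytes(int)` produces to the best of my knowledge.

```python
import struct


def encode_int_as_string(value):
    """String storage: the decimal digits as UTF-8 text."""
    return str(value).encode("utf-8")


def encode_int_as_binary(value):
    """Binary storage: 4-byte big-endian two's complement (assumed layout)."""
    return struct.pack(">i", value)


values = [9, 10, 100]
by_string_bytes = sorted(values, key=encode_int_as_string)
by_binary_bytes = sorted(values, key=encode_int_as_binary)
```

Sorting by the string bytes gives `[10, 100, 9]`, while sorting by the binary bytes gives `[9, 10, 100]`. For non-negative integers, byte-ordered range scans therefore follow numeric order only under binary storage.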

Page 21: Type Mapping
• Table level property
  "hbase.table.default.storage.type" = "binary"
• Type mapping can be given per column after #
  – Any prefix of "binary", eg u:url#b
  – Any prefix of "string", eg u:url#s
  – The dash char "-", eg u:url#-

    CREATE TABLE short_urls(
      short_url string,
      url string,
      hit_count int,
      props map<string,string>
    )
    STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key#b,u:url#b,s:hits#b,p:#s");
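The `#` suffix rules above can be sketched as a small resolver. This is an illustrative model in plain Python, not Hive's implementation. The function name `resolve_storage_type` is hypothetical, and two behaviours are assumptions: that `-` (or no suffix at all) falls back to the table-level default, and that the default is `string` when the table property is not set.

```python
def resolve_storage_type(entry, table_default="string"):
    """Split a mapping entry such as 'u:url#b' into (column, storage type)."""
    column, sep, suffix = entry.partition("#")
    if not sep or suffix == "-":
        # No suffix, or the dash: assumed to mean "use the table-level default".
        return column, table_default
    for name in ("binary", "string"):
        if suffix and name.startswith(suffix):
            return column, name  # any prefix of the type name is accepted
    raise ValueError(f"unknown storage type suffix in {entry!r}")


mapping = ":key#b,u:url#b,s:hits#b,p:#s"
resolved = [resolve_storage_type(e) for e in mapping.split(",")]
```

For the mapping shown on the slide, `resolved` is `[(":key", "binary"), ("u:url", "binary"), ("s:hits", "binary"), ("p:", "string")]`.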

Page 22: Type Mapping
• If the type is not a primitive or Map, it is converted to a JSON string and serialized
• Still a few rough edges for schema and type mapping:
  – No Hive BINARY support in HBase mapping
  – No mapping of HBase timestamp (can only provide put timestamp)
  – No arbitrary mapping of Structs / Arrays into HBase schema

Page 23: Bulk Load
• Steps to bulk load:
  – Sample source data for range partitioning
  – Save sampling results to a file
  – Run CLUSTER BY query using HiveHFileOutputFormat and TotalOrderPartitioner
  – Import HFiles into HBase table
• Ideal setup should be

    SET hive.hbase.bulk=true
    INSERT OVERWRITE TABLE web_table SELECT …
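The first and third bulk-load steps rest on range partitioning: sample the keys, pick split points, then route every key to the partition whose range contains it, so each output file covers a disjoint, ordered key range. The sketch below shows that idea in plain Python. It is not the Hive or Hadoop code path; the function names, the sample size, and the fixed seed are all assumptions for illustration.

```python
import bisect
import random


def choose_split_points(keys, num_partitions, sample_size=1000, seed=0):
    """Sample the keys and pick num_partitions - 1 evenly spaced split points."""
    rng = random.Random(seed)
    sample = sorted(rng.sample(keys, min(sample_size, len(keys))))
    step = len(sample) / num_partitions
    return [sample[int(i * step)] for i in range(1, num_partitions)]


def partition_for(key, split_points):
    """Return the index of the range partition that should receive this key."""
    return bisect.bisect_right(split_points, key)


keys = [f"row{i:05d}" for i in range(10000)]
split_points = choose_split_points(keys, num_partitions=4)
assignments = [partition_for(k, split_points) for k in keys]
```

Since `keys` is already sorted here, `assignments` never decreases: every key in partition 0 sorts before every key in partition 1, and so on. That total order across partitions is what a total-order partitioner is meant to guarantee.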

Page 24: Filter Pushdown

Page 25: Filter Pushdown
• Idea is to pass down filter expressions to the storage layer to minimize scanned data
• To access indexes at HDFS or HBase
• Example:

    CREATE EXTERNAL TABLE users (userid LONG, email STRING, … )
    STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,…")

    SELECT ... FROM users WHERE userid > 1000000 and email LIKE '%@gmail.com';

    -> scan.setStartRow(Bytes.toBytes(1000000))
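The effect of turning `userid > 1000000` into a scan start row can be sketched over a sorted list. This is a plain-Python model, not HBase client code: the function name `scan_users` is hypothetical, keys are compared as integers rather than as encoded bytes, and the start row is treated as inclusive, so the strict `>` comparison is still rechecked on the rows that are read.

```python
import bisect


def scan_users(rows, userid_bound, email_suffix):
    """rows: list of (userid, email) sorted by userid, like rows sorted by key.

    Returns (matching rows, number of rows actually read).
    """
    keys = [userid for userid, _ in rows]
    start = bisect.bisect_left(keys, userid_bound)  # inclusive start row
    scanned = rows[start:]
    matched = [
        (userid, email)
        for userid, email in scanned
        if userid > userid_bound and email.endswith(email_suffix)
    ]
    return matched, len(scanned)


users = [
    (999999, "a@gmail.com"),
    (1000000, "b@gmail.com"),
    (1000001, "c@example.com"),
    (1000002, "d@gmail.com"),
]
matched, rows_read = scan_users(users, 1000000, "@gmail.com")
```

Only three of the four rows are read, and `matched` is `[(1000002, "d@gmail.com")]`. The email condition is evaluated after the scan because it cannot be expressed as a key range.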

Page 26: Filter Decomposition
• Optimizer pushes down the predicates to the query plan
• Storage handlers can negotiate with the Hive optimizer to decompose the filter
    x > 3 AND upper(y) = 'XYZ'
• Handle x > 3, send upper(y) = 'XYZ' as residual for Hive
• Works with:
    key = 3, key > 3, etc
    key > 3 AND key < 100
• Only works against constant expressions
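Decomposition splits a conjunction into the predicates the storage handler takes over and the residual that Hive still evaluates. The sketch below models that split in plain Python. The `Predicate` type, the `decompose` function, and the set of pushable operators are hypothetical; which operators Hive actually pushes down depends on the version.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Predicate:
    left: str                       # column name or expression text
    op: str
    right: object
    right_is_constant: bool = True  # False when comparing against another column


PUSHABLE_OPS = {"=", ">", ">=", "<", "<="}  # assumed set, for illustration only


def decompose(conjuncts, key_column):
    """Split AND-ed predicates into (pushed to storage, residual for Hive)."""
    pushed, residual = [], []
    for p in conjuncts:
        if p.left == key_column and p.op in PUSHABLE_OPS and p.right_is_constant:
            pushed.append(p)
        else:
            residual.append(p)
    return pushed, residual


pushed, residual = decompose(
    [Predicate("x", ">", 3), Predicate("upper(y)", "=", "XYZ")],
    key_column="x",
)
```

For the slide's filter, `pushed` holds `x > 3` and `residual` holds `upper(y) = 'XYZ'`. A comparison of the key against another column, rather than a constant, would also land in the residual.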

Page 27: Security Aspects
Towards fully secure deployments

Page 28: Security – Big Picture
• Security becomes more important to support enterprise level and multi tenant applications
• 5 Different Components to ensure / impose security
  – HDFS
  – MapReduce
  – HBase
  – Zookeeper
  – Hive
• Each component has:
  – Authentication
  – Authorization

Page 29: HBase Security – Closer look
• Released with HBase 0.92
• Fully optional module, disabled by default
• Needs an underlying secure Hadoop release
• SecureRPCEngine: optional engine enforcing SASL authentication
  – Kerberos
  – DIGEST-MD5 based tokens
  – TokenProvider coprocessor
• Access control is implemented as a Coprocessor: AccessController
• Stores and distributes ACL data via Zookeeper
  – Sensitive data is only accessible by HBase daemons
  – Client does not need to authenticate to zk

Page 30: Hive Security – Closer look
• Hive has different deployment options; security considerations should take the different deployments into account
• Authentication is only supported at Metastore, not on HiveServer, web interface, JDBC
• Authorization is enforced at the query layer (Driver)
• Pluggable authorization providers. Default one stores global/table/partition/column permissions in Metastore

    GRANT ALTER ON TABLE web_table TO USER bob;
    CREATE ROLE db_reader;
    GRANT SELECT, SHOW_DATABASE ON DATABASE mydb TO ROLE db_reader;

Page 31: Hive Deployment Option 1
[Deployment diagram. Components shown: Client, CLI, Driver, Parser, Planner, Optimizer, Execution, Authorization, Authentication, MSClient, Metastore, RDBMS, MapReduce, HDFS, HBase. Components are annotated with A12n/A11N and A/A markers.]

Page 32: Hive Deployment Option 2
[Deployment diagram. Components shown: Client, CLI, Driver, Parser, Planner, Optimizer, Execution, Authentication, Authorization, MSClient, Metastore, RDBMS, MapReduce, HDFS, HBase. Components are annotated with A12n/A11N and A/A markers.]

Page 33: Hive Deployment Option 3
[Deployment diagram. Components shown: Client, JDBC/ODBC, CLI, Hive Web Interface, Hive Thrift Server, Driver, Parser, Planner, Optimizer, Execution, Authentication, Authorization, MSClient, Metastore, RDBMS, MapReduce, HDFS, HBase. Components are annotated with A12n/A11N and A/A markers.]

Page 34: Hive + HBase + Hadoop Security
• Regardless of Hive's own security, for Hive to work on secure Hadoop and HBase, we should:
  – Obtain delegation tokens for Hadoop and HBase jobs
  – Ensure that the storage level (HDFS, HBase) permission checks are obeyed
  – In HiveServer deployments, authenticate and impersonate the user
• Delegation tokens for Hadoop are already working
• Support for obtaining HBase delegation tokens was released in Hive 0.9.0

Page 35: Future of Hive + HBase
• Improve on schema / type mapping
• Fully secure Hive deployment options
• HBase bulk import improvements
• Sortable signed numeric types in HBase
• Filter pushdown: non key column filters
• Hive random access support for HBase
  – https://cwiki.apache.org/HCATALOG/random-access-framework.html
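The "sortable signed numeric types" item in the list above addresses a concrete problem: plain two's-complement bytes put negative numbers after positive ones when compared byte by byte. The sketch below shows the problem and one common fix, flipping the sign bit before writing. That fix is named here only as an illustration of the idea; the deck does not say which encoding Hive or HBase would adopt, and the helper names are hypothetical.

```python
import struct


def encode_plain(value):
    """4-byte big-endian two's complement: negatives compare as large bytes."""
    return struct.pack(">i", value)


def encode_sortable(value):
    """Flip the sign bit so unsigned byte order matches signed numeric order."""
    return struct.pack(">I", (value & 0xFFFFFFFF) ^ 0x80000000)


values = [-5, 3, -1, 0, 2]
plain_order = sorted(values, key=encode_plain)
sortable_order = sorted(values, key=encode_sortable)
```

`plain_order` comes out as `[0, 2, 3, -5, -1]`, while `sortable_order` is the numeric order `[-5, -1, 0, 2, 3]`. A range scan over signed keys needs the second ordering.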

Page 36: References
• Security
  – https://issues.apache.org/jira/browse/HIVE-2764
  – https://issues.apache.org/jira/browse/HBASE-5371
  – https://issues.apache.org/jira/browse/HCATALOG-245
  – https://issues.apache.org/jira/browse/HCATALOG-260
  – https://issues.apache.org/jira/browse/HCATALOG-244
  – https://cwiki.apache.org/confluence/display/HCATALOG/Hcat+Security+Design
• Type mapping / Filter Pushdown
  – https://issues.apache.org/jira/browse/HIVE-1634
  – https://issues.apache.org/jira/browse/HIVE-1226
  – https://issues.apache.org/jira/browse/HIVE-1643
  – https://issues.apache.org/jira/browse/HIVE-2815

Page 37: Other Resources
• Hadoop Summit
  – June 13-14
  – San Jose, California
  – www.Hadoopsummit.org
• Hadoop Training and Certification
  – Developing Solutions Using Apache Hadoop
  – Administering Apache Hadoop
  – Online classes available US, India, EMEA
  – http://hortonworks.com/training/
© Hortonworks Inc. 2012

Page 38: Thanks
Questions?