Phoenix James Taylor @JamesPlusPlus http://phoenix-hbase.blogspot.com/ We put the SQL back in NoSQL https://github.com/forcedotcom/phoenix

@JamesPlusPlus What is HBase? Completed ! Developed as part of Apache Hadoop ! Runs on top of HDFS ! Key/value store Map Sorted Distributed Consistent Sparse What is HBase? Completed

Jan 22, 2020


dariahiddleston


Agenda

• What/why HBase?
• What/why Phoenix?
• How does Phoenix work?
• Demo
• Roadmap
• Q&A


What is HBase?

• Developed as part of Apache Hadoop
• Runs on top of HDFS
• Key/value store
  • Map
  • Sorted
  • Distributed
  • Consistent
  • Sparse
  • Multidimensional
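The "sorted, sparse, multidimensional map" description can be pictured as nested sorted maps: row key, then column (family:qualifier), then timestamp, then value. The Java sketch below is an in-memory illustration under that assumption, not HBase code; the class and method names are hypothetical.

```java
import java.util.Collections;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Hypothetical in-memory model of HBase's logical data model (not the HBase API). */
class SortedSparseMap {
    // row key -> "family:qualifier" -> timestamp (newest first) -> value.
    // TreeMaps keep rows and columns sorted; cells never written take no space (sparse).
    private final NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>> rows =
            new TreeMap<>();

    void put(String row, String column, long timestamp, String value) {
        rows.computeIfAbsent(row, r -> new TreeMap<>())
            .computeIfAbsent(column, c -> new TreeMap<Long, String>(Collections.reverseOrder()))
            .put(timestamp, value);
    }

    /** Newest version of a cell, or null if the cell was never written. */
    String getLatest(String row, String column) {
        NavigableMap<String, NavigableMap<Long, String>> columns = rows.get(row);
        if (columns == null || !columns.containsKey(column)) return null;
        return columns.get(column).firstEntry().getValue();
    }

    /** Newest version written at or before the given timestamp, or null. */
    String getAsOf(String row, String column, long timestamp) {
        NavigableMap<String, NavigableMap<Long, String>> columns = rows.get(row);
        if (columns == null || !columns.containsKey(column)) return null;
        // Versions are ordered newest first, so ceilingEntry finds the first timestamp <= the argument.
        Map.Entry<Long, String> e = columns.get(column).ceilingEntry(timestamp);
        return e == null ? null : e.getValue();
    }

    /** Row keys in sorted order, the order in which a scan visits them. */
    Iterable<String> rowKeys() {
        return rows.keySet();
    }

    public static void main(String[] args) {
        SortedSparseMap table = new SortedSparseMap();
        table.put("row3", "A:q1", 1L, "v1");
        table.put("row1", "A:q1", 1L, "old");
        table.put("row1", "A:q1", 2L, "new");
        table.put("row2", "B:q3", 1L, "x");
        System.out.println(table.rowKeys());                   // [row1, row2, row3]
        System.out.println(table.getLatest("row1", "A:q1"));   // new
        System.out.println(table.getAsOf("row1", "A:q1", 1L)); // old
        System.out.println(table.getLatest("row2", "A:q1"));   // null (no such cell)
    }
}
```

Rows come back in key order regardless of insertion order, and a missing cell is simply absent rather than stored as an empty value.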


Cluster Architecture


Sharding


Why Use HBase?

• If you have lots of data
  • Scales linearly
  • Shards automatically
• If you can live without transactions
• If your data changes
• If you need strict consistency


What is Phoenix?

• SQL skin for HBase
• Alternate client API
• Embedded JDBC driver
• Runs at HBase native speed
• Compiles SQL into native HBase calls
• So you don’t have to!


Cluster Architecture

[Diagram not preserved: the HBase cluster architecture, with two components labeled "Phoenix".]


Phoenix Performance


Why Use Phoenix?

• Give folks an API they already know
• Reduce the amount of code needed

    SELECT TRUNC(date,'DAY'), AVG(cpu)
    FROM web_stat
    WHERE domain LIKE 'Salesforce%'
    GROUP BY TRUNC(date,'DAY')

• Perform optimizations transparently
  • Aggregation
  • Skip Scan
  • Secondary indexing (soon!)

• Leverage existing tooling
  • SQL client/terminal
  • OLAP engine
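To give a sense of how much code the web_stat query replaces, here is a rough, hypothetical Java equivalent that works on rows already in memory. It assumes the rows have been fetched and decoded; a hand-written HBase client would additionally have to set up the scans, decode byte arrays, and parallelize the work. The class, field names, and sample values are illustrative only.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical hand-written equivalent of:
 *   SELECT TRUNC(date,'DAY'), AVG(cpu) FROM web_stat
 *   WHERE domain LIKE 'Salesforce%' GROUP BY TRUNC(date,'DAY')
 */
class WebStatAverage {
    static final class Row {
        final String domain;
        final LocalDateTime date;
        final int cpu;
        Row(String domain, LocalDateTime date, int cpu) {
            this.domain = domain; this.date = date; this.cpu = cpu;
        }
    }

    static Map<LocalDate, Double> avgCpuPerDay(List<Row> rows) {
        // day -> {sum, count}; the TreeMap keeps groups sorted by day
        Map<LocalDate, long[]> acc = new TreeMap<>();
        for (Row r : rows) {
            if (!r.domain.startsWith("Salesforce")) continue;   // WHERE domain LIKE 'Salesforce%'
            LocalDate day = r.date.toLocalDate();                // TRUNC(date,'DAY')
            long[] sumCount = acc.computeIfAbsent(day, d -> new long[2]);
            sumCount[0] += r.cpu;
            sumCount[1]++;
        }
        Map<LocalDate, Double> result = new TreeMap<>();
        for (Map.Entry<LocalDate, long[]> e : acc.entrySet()) { // AVG(cpu) per group
            result.put(e.getKey(), (double) e.getValue()[0] / e.getValue()[1]);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(
            new Row("Salesforce.com", LocalDateTime.of(2013, 6, 5, 10, 0), 40),
            new Row("Salesforce.com", LocalDateTime.of(2013, 6, 5, 18, 30), 60),
            new Row("Google.com", LocalDateTime.of(2013, 6, 5, 12, 0), 99),
            new Row("Salesforce.org", LocalDateTime.of(2013, 6, 6, 9, 15), 25));
        System.out.println(avgCpuPerDay(rows)); // {2013-06-05=50.0, 2013-06-06=25.0}
    }
}
```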

How Does Phoenix Work?

• Overlays on top of HBase Data Model
• Keeps Versioned Schema Repository
• Query Processor

Phoenix Data Model

Phoenix maps HBase data model to the relational world

[Diagram: an HBase table with Column Family A and Column Family B, qualifiers Qualifier 1 to Qualifier 3, and sparse values for Row Key 1, Row Key 2, and Row Key 3, with a callout for Multiple Versions of a cell. The same table is then shown as a Phoenix table, with its Row Key Columns and Key Value Columns labeled.]
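A minimal sketch of the mapping, assuming a table whose primary-key columns form the row key: every other non-null column becomes one cell, and null columns are simply not written, which is what keeps a sparse table cheap. The column family name "cf", the "|" key separator, and the string encoding are illustrative assumptions, not Phoenix's storage format.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;
import java.util.TreeMap;

/** Hypothetical illustration of overlaying a relational row onto HBase cells. */
class RowToCells {
    static final String FAMILY = "cf"; // assumed column family name, for illustration only

    /**
     * Splits a relational row into a row key plus cells. pkColumns lists the
     * primary-key columns in key order; all other non-null columns become cells.
     */
    static Map.Entry<String, Map<String, String>> toCells(List<String> pkColumns, Map<String, Object> row) {
        StringJoiner key = new StringJoiner("|"); // readable stand-in for a binary key encoding
        for (String pk : pkColumns) key.add(String.valueOf(row.get(pk)));

        Map<String, String> cells = new TreeMap<>();
        for (Map.Entry<String, Object> col : row.entrySet()) {
            if (pkColumns.contains(col.getKey())) continue; // already part of the row key
            if (col.getValue() == null) continue;           // nulls are not stored at all
            cells.put(FAMILY + ":" + col.getKey(), String.valueOf(col.getValue()));
        }
        return Map.entry(key.toString(), cells);
    }

    public static void main(String[] args) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("HOST", "sf1.s1");
        row.put("DATE", "Jun 5 10:10:10.234");
        row.put("RESPONSE_TIME", 1234);
        row.put("GC_TIME", null);
        System.out.println(toCells(List.of("HOST", "DATE"), row));
        // sf1.s1|Jun 5 10:10:10.234={cf:RESPONSE_TIME=1234}
    }
}
```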

Phoenix Metadata

• Stored in a Phoenix HBase table
  • SYSTEM.TABLE

• Updated through DDL commands
  • CREATE TABLE
  • ALTER TABLE
  • DROP TABLE
  • CREATE INDEX
  • DROP INDEX

• Keeps older versions as schema evolves
• Correlates timestamps between schema and data
  • Flashback queries use the schema that was in place at the time

• Accessible via JDBC metadata APIs
  • java.sql.DatabaseMetaData
  • Through Phoenix queries!
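The versioned schema and flashback behavior can be sketched with a sorted map keyed by the timestamp at which each schema version took effect: the greatest timestamp at or before the query time yields the schema that was in place then. This is a hypothetical in-memory model with made-up names; Phoenix keeps these versions in SYSTEM.TABLE, not in a Java TreeMap.

```java
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Hypothetical versioned schema repository: one column list per DDL timestamp. */
class VersionedSchema {
    private final NavigableMap<Long, List<String>> versions = new TreeMap<>();

    /** Records the full column list in effect from {@code timestamp} onward. */
    void recordDdl(long timestamp, List<String> columns) {
        versions.put(timestamp, List.copyOf(columns));
    }

    /** Schema in effect at {@code timestamp}, or null if the table did not exist yet. */
    List<String> schemaAsOf(long timestamp) {
        Map.Entry<Long, List<String>> e = versions.floorEntry(timestamp);
        return e == null ? null : e.getValue();
    }

    public static void main(String[] args) {
        VersionedSchema s = new VersionedSchema();
        s.recordDdl(100L, List.of("HOST", "DATE", "RESPONSE_TIME"));            // e.g. CREATE TABLE
        s.recordDdl(200L, List.of("HOST", "DATE", "RESPONSE_TIME", "GC_TIME")); // e.g. ALTER TABLE
        System.out.println(s.schemaAsOf(150L)); // [HOST, DATE, RESPONSE_TIME]
        System.out.println(s.schemaAsOf(250L)); // [HOST, DATE, RESPONSE_TIME, GC_TIME]
        System.out.println(s.schemaAsOf(50L));  // null
    }
}
```

A flashback query at time 150 would therefore see only three columns, even though GC_TIME exists today.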

Example

Consider metrics data for clusters of servers, with a schema like this:

SERVER_METRICS
Column          Type      Part of
HOST            VARCHAR   Row key
DATE            DATE      Row key
RESPONSE_TIME   INTEGER   Key values
GC_TIME         INTEGER   Key values
CPU_TIME        INTEGER   Key values
IO_TIME         INTEGER   Key values
…
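Because HOST and DATE form the row key, HBase stores the rows sorted by host and then by date. The sketch below shows one way to build such a composite key so that HBase's unsigned byte ordering matches (HOST, DATE) ordering: a zero byte terminates the variable-length HOST, and DATE is written as 8 big-endian bytes of epoch milliseconds with the sign bit flipped. Treat this encoding as an assumption for illustration; it is not taken from Phoenix's source.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical composite row key for (HOST VARCHAR, DATE DATE). */
class MetricsRowKey {
    /** host bytes + 0x00 separator + date millis (sign bit flipped, big-endian). */
    static byte[] encode(String host, long dateMillis) {
        byte[] hostBytes = host.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(hostBytes.length + 1 + Long.BYTES); // big-endian by default
        buf.put(hostBytes);
        buf.put((byte) 0);                        // terminates the variable-length column
        buf.putLong(dateMillis ^ Long.MIN_VALUE); // flipped sign bit: negatives sort before positives
        return buf.array();
    }

    /** Unsigned lexicographic comparison, the order in which HBase sorts row keys. */
    static int compare(byte[] a, byte[] b) {
        return Arrays.compareUnsigned(a, b);
    }

    public static void main(String[] args) {
        byte[] k1 = encode("sf1.s1", 1_000L);
        byte[] k2 = encode("sf1.s1", 2_000L);
        byte[] k3 = encode("sf1.s2", 0L);
        byte[] k4 = encode("sf1", 5_000L);
        System.out.println(compare(k1, k2) < 0); // true: same host, earlier date first
        System.out.println(compare(k2, k3) < 0); // true: host decides before date
        System.out.println(compare(k4, k1) < 0); // true: "sf1" + 0x00 sorts before "sf1.s1"
    }
}
```

The zero-byte separator assumes host names never contain a zero byte; that is what lets a shorter host sort before every longer host it is a prefix of.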

With 90 days of data that looks like this:

SERVER_METRICS
HOST     DATE                 RESPONSE_TIME   GC_TIME
sf1.s1   Jun 5 10:10:10.234   1234
sf1.s1   Jun 5 11:18:28.456   8012
…
sf3.s1   Jun 5 10:10:10.234   2345
sf3.s1   Jun 6 12:46:19.123   2340
sf7.s9   Jun 4 08:23:23.456   5002            1234
…


Walk through query processing for three scenarios:

1.  Chart Response Time Per Cluster
2.  Identify 5 Longest GC Times
3.  Identify 5 Longest GC Times again and again

Scenario 1: Chart Response Time Per Cluster

    SELECT substr(host,1,3), trunc(date,'DAY'), avg(response_time)
    FROM server_metrics
    WHERE date > CURRENT_DATE() - 7
      AND substr(host, 1, 3) IN ('sf1', 'sf3', 'sf7')
    GROUP BY substr(host, 1, 3), trunc(date,'DAY')



Step 1 (Client): Identify Row Key Ranges from Query

From the Scenario 1 query, the client derives these row key ranges:

HOST   DATE
sf1    t1 - *
sf3    t1 - *
sf7    t1 - *

Here t1 is the lower date bound from date > CURRENT_DATE() - 7, and * means unbounded.
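Step 1 can be sketched as turning each value of the IN list into a host-prefix range [p, p'), where p' is p with its last character incremented, paired with the open date range (t1, *). The classes below are hypothetical and work on strings and plain longs rather than encoded row keys.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical derivation of row key ranges from the Scenario 1 WHERE clause. */
class KeyRanges {
    /** One key range: host starts with hostPrefix and date > minDateExclusive. */
    static final class Range {
        final String hostPrefix;
        final long minDateExclusive;
        Range(String hostPrefix, long minDateExclusive) {
            this.hostPrefix = hostPrefix;
            this.minDateExclusive = minDateExclusive;
        }
        /** First host that no longer has the prefix (assumes the last char is not Character.MAX_VALUE). */
        String hostUpperBound() {
            char last = hostPrefix.charAt(hostPrefix.length() - 1);
            return hostPrefix.substring(0, hostPrefix.length() - 1) + (char) (last + 1);
        }
        @Override public String toString() {
            return "HOST [" + hostPrefix + ", " + hostUpperBound() + ") DATE (" + minDateExclusive + ", *)";
        }
    }

    /** substr(host,1,3) IN (prefixes) AND date > t1 becomes one range per distinct prefix, in key order. */
    static List<Range> fromWhereClause(List<String> hostPrefixes, long t1) {
        List<Range> ranges = new ArrayList<>();
        for (String p : new TreeSet<>(hostPrefixes)) { // dedupe and sort the IN list
            ranges.add(new Range(p, t1));
        }
        return ranges;
    }

    public static void main(String[] args) {
        long t1 = 7L; // stands in for CURRENT_DATE() - 7
        for (Range r : fromWhereClause(List.of("sf7", "sf1", "sf3"), t1)) {
            System.out.println(r);
        }
        // HOST [sf1, sf2) DATE (7, *)
        // HOST [sf3, sf4) DATE (7, *)
        // HOST [sf7, sf8) DATE (7, *)
    }
}
```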

Step 2 (Client): Overlay Row Key Ranges with Regions

[Diagram: regions R1, R2, R3, and R4 with key markers sf1, sf4, and sf6, overlaid with the row key ranges for sf1, sf3, and sf7.]
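Step 2 amounts to intersecting each key range with the regions' key intervals. The sketch assumes each region is identified by its start key and ends where the next region starts, with the first region starting at the empty key. The region boundaries used in main are illustrative assumptions, since the original diagram's exact layout is not recoverable. A range that crosses a boundary is split into one task per region.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical client-side matching of key ranges [lo, hi) against region boundaries. */
class RegionOverlay {
    /** One unit of work: a key range clipped to a single region. */
    static final class Task {
        final String region, lo, hi;
        Task(String region, String lo, String hi) { this.region = region; this.lo = lo; this.hi = hi; }
        @Override public String toString() { return region + "[" + lo + ", " + hi + ")"; }
    }

    /**
     * regionStarts maps each region's start key to its name. A region ends where the next
     * one starts; the region starting at "" holds the lowest keys, so floorKey never misses.
     */
    static List<Task> overlay(TreeMap<String, String> regionStarts, List<String[]> ranges) {
        List<Task> tasks = new ArrayList<>();
        for (String[] range : ranges) {
            String lo = range[0], hi = range[1];
            // Begin at the region containing lo, then walk right while regions start before hi.
            String start = regionStarts.floorKey(lo);
            while (start != null && start.compareTo(hi) < 0) {
                String next = regionStarts.higherKey(start);
                String clippedLo = lo.compareTo(start) > 0 ? lo : start;
                String clippedHi = (next == null || hi.compareTo(next) < 0) ? hi : next;
                tasks.add(new Task(regionStarts.get(start), clippedLo, clippedHi));
                start = next;
            }
        }
        return tasks;
    }

    public static void main(String[] args) {
        TreeMap<String, String> regions = new TreeMap<>(
            Map.of("", "R1", "sf1", "R2", "sf4", "R3", "sf6", "R4"));
        List<String[]> ranges = List.of(
            new String[] {"sf1", "sf2"}, new String[] {"sf3", "sf4"}, new String[] {"sf7", "sf8"});
        System.out.println(overlay(regions, ranges)); // [R2[sf1, sf2), R2[sf3, sf4), R4[sf7, sf8)]
    }
}
```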

Step 3 (Client): Execute Parallel Scans

[Diagram: the same regions R1 to R4 and key ranges, with three scans (scan1, scan2, scan3) executed in parallel.]
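For Step 3, the essential point is that the per-region scans are independent and can run concurrently. This hypothetical sketch uses a fixed thread pool from java.util.concurrent, with in-memory sorted maps standing in for regions; it is not Phoenix's scheduling code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical client that runs one scan per (region, key range) task on a thread pool. */
class ParallelScans {
    /** Scans [lo, hi) of one region's sorted rows; stands in for an HBase scan. */
    static List<String> scan(TreeMap<String, Integer> region, String lo, String hi) {
        return new ArrayList<>(region.subMap(lo, true, hi, false).keySet());
    }

    /** Runs all scans concurrently and returns their results in task order. */
    static List<List<String>> runAll(List<Callable<List<String>>> scans, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<List<String>> results = new ArrayList<>();
            for (Future<List<String>> f : pool.invokeAll(scans)) { // waits for all scans
                results.add(f.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while scanning", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a scan failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        TreeMap<String, Integer> r2 = new TreeMap<>(Map.of("sf1.s1", 1, "sf1.s2", 2, "sf3.s1", 3, "sf2.s9", 9));
        TreeMap<String, Integer> r4 = new TreeMap<>(Map.of("sf7.s9", 4, "sf6.s1", 5));
        List<Callable<List<String>>> scans = List.of(
            () -> scan(r2, "sf1", "sf2"),
            () -> scan(r2, "sf3", "sf4"),
            () -> scan(r4, "sf7", "sf8"));
        System.out.println(runAll(scans, 3)); // [[sf1.s1, sf1.s2], [sf3.s1], [sf7.s9]]
    }
}
```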

Step 4: Server Filters Using Skip Scan

sf1.s1 t0 SKIP
sf1.s1 t1 INCLUDE
sf1.s2 t0 SKIP
sf1.s2 t1 INCLUDE
sf1.s3 t0 SKIP
sf1.s3 t1 INCLUDE
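The SKIP/INCLUDE sequence above can be imitated with a small seek-based loop. This is a simplified sketch, not Phoenix's skip scan filter: rows are (host, time) pairs sorted in key order, only the DATE lower bound is enforced, and when a row is too old the loop seeks straight to (host, t1) with `bisect` instead of reading that host's older rows one by one. t0 and t1 are modeled as the integers 0 and 1.

```python
import bisect


def skip_scan(rows, min_time):
    """Yield (row, decision) pairs over rows sorted by (host, time).

    A row older than min_time is reported as SKIP, and the scan then seeks
    to the first key >= (host, min_time) rather than stepping row by row.
    """
    i = 0
    while i < len(rows):
        host, t = rows[i]
        if t < min_time:
            yield rows[i], "SKIP"
            # Seek hint: jump past every remaining older row of this host.
            i = bisect.bisect_left(rows, (host, min_time), lo=i + 1)
        else:
            yield rows[i], "INCLUDE"
            i += 1


T0, T1 = 0, 1
rows = sorted([("sf1.s1", T0), ("sf1.s1", T1), ("sf1.s2", T0),
               ("sf1.s2", T1), ("sf1.s3", T0), ("sf1.s3", T1)])
decisions = list(skip_scan(rows, min_time=T1))
for (host, t), decision in decisions:
    print(host, f"t{t}", decision)
```

The payoff shows up when a host has many old rows: after the first SKIP the seek jumps over all of them in one step.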

Step 5: Server Intercepts Scan in Coprocessor

SERVER METRICS (rows read by the scan)
HOST     DATE
sf1.s1   Jun 2 10:10:10.234
sf1.s2   Jun 3 23:05:44.975
sf1.s2   Jun 9 08:10:32.147
sf1.s3   Jun 1 11:18:28.456
sf1.s3   Jun 3 22:03:22.142
sf1.s4   Jun 1 10:29:58.950
sf1.s4   Jun 2 14:55:34.104
sf1.s4   Jun 3 12:46:19.123
sf1.s5   Jun 8 08:23:23.456
sf1.s6   Jun 1 10:31:10.234

SERVER METRICS (aggregated in the coprocessor)
HOST   DATE    AGG
sf1    Jun 1   …
sf1    Jun 2   …
sf1    Jun 3   …
sf1    Jun 8   …
sf1    Jun 9   …
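Step 5 pushes the GROUP BY into the region server. Below is a minimal sketch of the per-scan work, assuming the coprocessor returns partial sums and counts per group. This is one common way to make AVG combinable across scans, not a description of Phoenix's internal format. The response times and the year are made up, since the slide shows neither.

```python
from collections import defaultdict
from datetime import date, datetime


def aggregate_scan(rows):
    """Partial aggregation for
    GROUP BY substr(host, 1, 3), trunc(date, 'DAY') with avg(response_time).

    rows: (host, timestamp, response_time) tuples.
    Returns {(host_prefix, day): (sum, count)} sorted by group key, so the
    client can later combine groups that several scans contributed to.
    """
    partial = defaultdict(lambda: [0, 0])
    for host, ts, response_time in rows:
        group = (host[:3], ts.date())  # substr(host, 1, 3), trunc(date, 'DAY')
        partial[group][0] += response_time
        partial[group][1] += 1
    return {group: (s, c) for group, (s, c) in sorted(partial.items())}


# Hypothetical rows: hosts and timestamps follow the slide, values do not.
rows = [
    ("sf1.s1", datetime(2013, 6, 2, 10, 10, 10), 120),
    ("sf1.s2", datetime(2013, 6, 3, 23, 5, 44), 80),
    ("sf1.s4", datetime(2013, 6, 2, 14, 55, 34), 60),
    ("sf1.s4", datetime(2013, 6, 3, 12, 46, 19), 100),
]
result = aggregate_scan(rows)
print(result)
```

Returning (sum, count) instead of a finished average is the design choice that matters: two partial averages cannot be combined correctly without their counts.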

Step 6: Client Performs Final Merge Sort

[Figure: results of scan1, scan2 and scan3 from regions R1-R4 merged on the client]

SERVER METRICS (final result)
HOST   DATE    AGG
sf1    Jun 5   …
sf1    Jun 9   …
sf3    Jun 1   …
sf3    Jun 2   …
sf7    Jun 1   …
sf7    Jun 8   …
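On the client, Step 6 can be sketched as a k-way merge of the per-scan results, followed by combining groups that appear in more than one scan (a group such as sf1 / Jun 9 can straddle a region boundary). The sketch assumes each scan returns ((host_prefix, day), (sum, count)) pairs sorted by group key. The input values are hypothetical and days are plain integers.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def final_merge(per_scan_results):
    """Merge per-scan partial aggregates into final (group, avg) rows.

    per_scan_results: one list per scan of (group, (sum, count)) pairs,
    each sorted by group. heapq.merge keeps the combined stream sorted, so
    equal groups from different scans end up adjacent for groupby.
    """
    merged = heapq.merge(*per_scan_results, key=itemgetter(0))
    final = []
    for group, parts in groupby(merged, key=itemgetter(0)):
        total = count = 0
        for _, (s, c) in parts:
            total += s
            count += c
        final.append((group, total / count))
    return final


# Hypothetical partial results; the ("sf1", 9) group spans two scans.
scan1 = [(("sf1", 5), (300, 3)), (("sf1", 9), (100, 1))]
scan2 = [(("sf1", 9), (50, 1)), (("sf3", 1), (40, 2))]
scan3 = [(("sf7", 1), (90, 3))]
for group, avg in final_merge([scan1, scan2, scan3]):
    print(group, avg)
```

Because every input list is already sorted, the merge never needs to hold more than one pending row per scan in its heap.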

Scenario 2: Find 5 Longest GC Times

SELECT host, date, gc_time
FROM server_metrics
WHERE date > CURRENT_DATE() - 7
  AND substr(host, 1, 3) IN ('sf1', 'sf3', 'sf7')
ORDER BY gc_time DESC
LIMIT 5

•  Same client parallelization and server skip scan filtering
•  Server holds the 5 longest GC_TIME values for each scan
•  Client performs the final merge sort among the parallel scans

SERVER METRICS (top 5 held by the server for the scan over R1)
HOST     DATE                 GC_TIME
sf1.s1   Jun 2 10:10:10.234   22123
sf1.s1   Jun 3 23:05:44.975   19876
sf1.s1   Jun 9 08:10:32.147   11345
sf1.s2   Jun 1 11:18:28.456   10234
sf1.s2   Jun 3 22:03:22.142   10111

[Figure: Scan1, Scan2 and Scan3 merged on the client]
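The two halves of Scenario 2 map onto standard top-N tools. Each scan keeps only its 5 largest GC_TIME rows (`heapq.nlargest`), and the client merges the already-descending lists with `heapq.merge(..., reverse=True)` and keeps the first 5. Scan1's rows come from the table above; the Scan2 and Scan3 rows, plus one extra Scan1 row, are hypothetical.

```python
import heapq
from itertools import islice
from operator import itemgetter

N = 5
gc_time = itemgetter(2)  # rows are (host, date, gc_time)


def server_top_n(rows, n=N):
    """Per scan: keep only the n rows with the longest gc_time, descending."""
    return heapq.nlargest(n, rows, key=gc_time)


def client_top_n(per_scan_tops, n=N):
    """Merge the descending per-scan lists and keep the overall top n."""
    return list(islice(heapq.merge(*per_scan_tops, key=gc_time, reverse=True), n))


scan1 = [("sf1.s1", "Jun 2 10:10:10.234", 22123), ("sf1.s1", "Jun 3 23:05:44.975", 19876),
         ("sf1.s1", "Jun 9 08:10:32.147", 11345), ("sf1.s2", "Jun 1 11:18:28.456", 10234),
         ("sf1.s2", "Jun 3 22:03:22.142", 10111),
         ("sf1.s3", "Jun 4", 900)]  # hypothetical extra row, dropped by the server
scan2 = [("sf3.s1", "Jun 5", 15000), ("sf3.s2", "Jun 6", 500)]  # hypothetical
scan3 = [("sf7.s1", "Jun 8", 20500)]  # hypothetical

top = client_top_n([server_top_n(s) for s in (scan1, scan2, scan3)])
for host, when, gc in top:
    print(host, when, gc)
```

Each server ships at most 5 rows no matter how many it scanned, so the client-side merge stays small.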

Scenario 3: Find 5 Longest GC Times

CREATE INDEX gc_time_index ON server_metrics (gc_time DESC, date DESC)
INCLUDE (host, response_time)

GC_TIME_INDEX
GC_TIME         INTEGER   Row Key
DATE            DATE      Row Key
HOST            VARCHAR   Key Value
RESPONSE_TIME   INTEGER   Key Value

SELECT host, date, gc_time
FROM server_metrics
WHERE date > CURRENT_DATE() - 7
  AND substr(host, 1, 3) IN ('sf1', 'sf3', 'sf7')
ORDER BY gc_time DESC
LIMIT 5
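With the index in place, rows are already stored in gc_time DESC, date DESC order, so the query can walk the index from the start, apply the WHERE clause, and stop after 5 matches without any sort. The sketch emulates that with a list sorted on negated values. `build_index`, `top_gc`, the integer day numbers and the rows are all hypothetical, and the negation only emulates DESC ordering; it says nothing about how Phoenix encodes DESC keys.

```python
from itertools import islice


def build_index(rows):
    """Emulate gc_time_index: keyed on (gc_time DESC, date DESC) and carrying
    host as a covered column. DESC is emulated by sorting on negated values."""
    return sorted(rows, key=lambda r: (-r["gc_time"], -r["date"]))


def top_gc(index, min_date, prefixes, n=5):
    """Scan the index in key order, filter, and stop after n matches.

    No ORDER BY sort is needed: index order already is gc_time DESC.
    """
    matches = (r for r in index
               if r["date"] > min_date and r["host"][:3] in prefixes)
    return list(islice(matches, n))


# Hypothetical rows; "date" is a day number to keep the example short.
rows = [
    {"host": "sf1.s1", "date": 9, "gc_time": 22123},
    {"host": "sf2.s1", "date": 9, "gc_time": 30000},  # host not in the IN list
    {"host": "sf3.s2", "date": 1, "gc_time": 25000},  # older than min_date
    {"host": "sf7.s1", "date": 8, "gc_time": 20500},
    {"host": "sf1.s2", "date": 5, "gc_time": 19876},
    {"host": "sf3.s1", "date": 6, "gc_time": 15000},
    {"host": "sf1.s3", "date": 7, "gc_time": 11345},
    {"host": "sf7.s2", "date": 4, "gc_time": 10111},
]
top = top_gc(build_index(rows), min_date=2, prefixes={"sf1", "sf3", "sf7"})
for r in top:
    print(r["host"], r["date"], r["gc_time"])
```

In this sketch the generator stops consuming the index after the fifth match, so the 10111 row is never examined. That early exit is the difference from Scenario 2, where every scan reads all matching rows before the top 5 are chosen.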

Demo

•  Phoenix Stock Analyzer
•  Fortune 500 companies
•  10 years of historical stock prices
•  Demonstrates Skip Scan in action
•  Running locally on my single-node laptop cluster

Phoenix Roadmap

•  Secondary Indexing
•  Count distinct and percentile
•  Derived tables
•  Hash Joins
•  Apache Drill integration
•  Cost-based query optimizer
•  OLAP extensions
•  Transactions


Thank you! Questions/comments?