A Closer Look at Apache Kudu

By Andriy Zabavskyy

Mar 20, 2017

Page 1: A Closer Look at Apache Kudu

A Closer Look at Apache Kudu

By Andriy Zabavskyy Mar 2017

Page 2: A Closer Look at Apache Kudu

A species of antelope from BigData Zoo

Page 3: A Closer Look at Apache Kudu

Why Kudu


Page 4: A Closer Look at Apache Kudu

Analytics on Hadoop before Kudu

Fast Scans Fast Random Access

Page 5: A Closer Look at Apache Kudu

Weaknesses of combining Parquet and HBase

• Complex code to manage the flow and synchronization of data between the two systems.

• Manage consistent backups, security policies, and monitoring across multiple distinct systems.

Page 6: A Closer Look at Apache Kudu

Lambda Architecture Challenges

• In the real world, systems often need to accommodate:
  • Late-arriving data
  • Corrections on past records
  • Privacy-related deletions on data that has already been migrated to the immutable store.

Page 7: A Closer Look at Apache Kudu

Happy Medium

• High throughput. Goal: within 2x of Impala
• Low latency for random read/write. Goal: 1 ms on SSD
• SQL and NoSQL-style API

Fast Scans Fast Random Access

Page 8: A Closer Look at Apache Kudu

Why Kudu

Data Model

Page 9: A Closer Look at Apache Kudu

Tables, Schemas, Keys

• Kudu is a storage system for tables of structured data

• Schema consisting of a finite number of columns

• Each such column has a name and a type:
  • Boolean, Integers, Unixtime_Micros
  • Floating, String, Binary

Page 10: A Closer Look at Apache Kudu

Keys

• Some ordered subset of those columns is specified to be the table’s primary key

• The primary key:
  • enforces a uniqueness constraint
  • acts as the sole index by which rows may be efficiently updated or deleted

Page 11: A Closer Look at Apache Kudu

Write Operations

• User mutates the table using Insert, Update, and Delete APIs
  • Note: a primary key must be fully specified
  • Java, C++, Python API

• No multi-row transactional APIs:
  • each mutation conceptually executes as its own transaction,
  • despite being automatically batched with other mutations for better performance.

Page 12: A Closer Look at Apache Kudu

Read Operations

• Scan operation:
  • any number of predicates to filter the results
  • two types of predicates:
    • comparisons between a column and a constant value,
    • and composite primary key ranges.

• A user may specify a projection for a scan.
  • A projection consists of a subset of columns to be retrieved.
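
The data model and the write/read operations above can be illustrated with a small in-memory stand-in. This is a sketch only: `ToyTable` and its methods are hypothetical names, not the Kudu client API, and it only mirrors the rules from the slides (a fully specified primary key for every mutation, uniqueness, scans with column-vs-constant predicates and a projection).

```python
import operator
from typing import Any, Callable, Dict, List, Optional, Sequence, Tuple

Row = Dict[str, Any]
# (column name, comparison function, constant value)
Predicate = Tuple[str, Callable[[Any, Any], bool], Any]


class ToyTable:
    """Illustrative in-memory table; this is NOT the Kudu client API."""

    def __init__(self, columns: Sequence[str], primary_key: Sequence[str]) -> None:
        self.columns = list(columns)
        self.primary_key = list(primary_key)
        self._rows: Dict[Tuple[Any, ...], Row] = {}

    def _key(self, row: Row) -> Tuple[Any, ...]:
        # Every mutation must specify the full primary key.
        missing = [c for c in self.primary_key if c not in row]
        if missing:
            raise KeyError(f"primary key columns missing: {missing}")
        return tuple(row[c] for c in self.primary_key)

    def insert(self, row: Row) -> None:
        key = self._key(row)
        if key in self._rows:
            # The primary key enforces a uniqueness constraint.
            raise ValueError(f"duplicate primary key: {key}")
        self._rows[key] = {c: row.get(c) for c in self.columns}

    def update(self, row: Row) -> None:
        key = self._key(row)
        if key not in self._rows:
            raise KeyError(f"no such row: {key}")
        self._rows[key].update({c: v for c, v in row.items() if c in self.columns})

    def delete(self, row: Row) -> None:
        del self._rows[self._key(row)]

    def scan(
        self,
        predicates: Sequence[Predicate] = (),
        projection: Optional[Sequence[str]] = None,
    ) -> List[Row]:
        cols = list(projection) if projection is not None else self.columns
        result = []
        for key in sorted(self._rows):  # rows come back in primary key order
            row = self._rows[key]
            if all(op(row[col], const) for col, op, const in predicates):
                result.append({c: row[c] for c in cols})
        return result


table = ToyTable(["host", "ts", "value"], primary_key=["host", "ts"])
table.insert({"host": "a", "ts": 1, "value": 10})
table.insert({"host": "a", "ts": 2, "value": 20})
table.insert({"host": "b", "ts": 1, "value": 30})
table.update({"host": "a", "ts": 2, "value": 25})
table.delete({"host": "b", "ts": 1})
rows = table.scan(predicates=[("value", operator.gt, 15)], projection=["ts", "value"])
```

Here `rows` holds only the projected columns of the rows that pass the predicate.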

Page 13: A Closer Look at Apache Kudu

Read/Write Python API Sample

Page 14: A Closer Look at Apache Kudu

Why Kudu

Storage Layout

Page 15: A Closer Look at Apache Kudu

Storage Layout Goals

• Fast columnar scans
  • best-of-breed immutable data formats such as Parquet
  • efficiently encoded columnar data files.

• Low-latency random updates
  • O(lg n) lookup complexity for random access

• Consistency of performance
  • Majority of users are willing to trade peak performance for predictability

Page 16: A Closer Look at Apache Kudu

MemRowSet

• In-memory concurrent B-tree
  • No removal from the tree – MVCC records instead
  • No in-place updates – only modifications that do not change the value size

• Link together leaf nodes for sequential scans

• Row-wise layout
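
As a rough illustration of the "no removal, MVCC records instead" idea, the sketch below keeps keys in sorted order (a plain sorted list stands in for the B-tree leaves) and appends timestamped mutation records rather than deleting entries. `ToyMemRowSet` is a hypothetical name, and the sketch assumes mutations arrive in timestamp order and that a deleted key is never re-inserted.

```python
import bisect
from typing import Any, Dict, Iterator, List, Tuple


class ToyMemRowSet:
    """Sorted in-memory rows; deletes and updates are recorded, never applied in place."""

    def __init__(self) -> None:
        self._keys: List[Any] = []          # sorted keys, scanned sequentially
        self._entries: Dict[Any, dict] = {}

    def insert(self, key: Any, row: Dict[str, Any], ts: int) -> None:
        if key in self._entries:
            raise ValueError(f"duplicate key: {key!r}")
        bisect.insort(self._keys, key)
        self._entries[key] = {"insert_ts": ts, "row": dict(row), "mutations": []}

    def update(self, key: Any, changes: Dict[str, Any], ts: int) -> None:
        self._entries[key]["mutations"].append((ts, "update", dict(changes)))

    def delete(self, key: Any, ts: int) -> None:
        # The entry stays in the structure; only a delete record is appended.
        self._entries[key]["mutations"].append((ts, "delete", None))

    def scan(self, snapshot_ts: int) -> Iterator[Tuple[Any, Dict[str, Any]]]:
        """Yield rows in key order as they looked at snapshot_ts."""
        for key in self._keys:
            entry = self._entries[key]
            if entry["insert_ts"] > snapshot_ts:
                continue
            row = dict(entry["row"])
            deleted = False
            for ts, kind, changes in entry["mutations"]:
                if ts > snapshot_ts:
                    break
                if kind == "delete":
                    deleted = True
                else:
                    row.update(changes)
            if not deleted:
                yield key, row


mrs = ToyMemRowSet()
mrs.insert("b", {"v": 1}, ts=1)
mrs.insert("a", {"v": 2}, ts=2)
mrs.update("b", {"v": 5}, ts=3)
mrs.delete("a", ts=4)
```

Scanning at an older timestamp still sees the row that was later deleted, which is what keeping mutation records buys.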


Page 17: A Closer Look at Apache Kudu

DiskRowSet


• Column-organized
  • Each column is written to disk in a single contiguous block of data.

• The column itself is subdivided into small pages, which gives:
  • granular random reads, and
  • an embedded B-tree index

Page 18: A Closer Look at Apache Kudu

Deltas

• A DeltaMemStore is a concurrent B-tree which shares the implementation of MemRowSets

• A DeltaMemStore flushes into a DeltaFile

• A DeltaFile is a simple binary-typed column block

Page 19: A Closer Look at Apache Kudu

Insert Path

• Each DiskRowSet stores a Bloom filter of the set of keys present

• For each DiskRowSet, Kudu also stores the minimum and maximum primary key, which bounds the rowsets that can contain a given key
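
The two checks on the insert path can be sketched as follows. This is a toy model under stated assumptions: `ToyBloomFilter` and `ToyRowSet` are hypothetical names, the Bloom filter uses SHA-256 purely for convenience, and a Python set stands in for the on-disk key index that the real lookup would consult.

```python
import hashlib
from typing import Iterable, Iterator, List


class ToyBloomFilter:
    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for simplicity

    def _positions(self, key: str) -> Iterator[int]:
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(key))


class ToyRowSet:
    def __init__(self, keys: Iterable[str]) -> None:
        self.keys = set(keys)
        self.min_key = min(self.keys)
        self.max_key = max(self.keys)
        self.bloom = ToyBloomFilter()
        for key in self.keys:
            self.bloom.add(key)
        self.index_lookups = 0  # counts the expensive lookups actually performed

    def contains(self, key: str) -> bool:
        if not (self.min_key <= key <= self.max_key):
            return False                      # pruned by key bounds
        if not self.bloom.might_contain(key):
            return False                      # pruned by Bloom filter
        self.index_lookups += 1               # only now pay for a real lookup
        return key in self.keys


def key_exists(rowsets: List[ToyRowSet], key: str) -> bool:
    """An insert must be rejected if any rowset already holds the key."""
    return any(rs.contains(key) for rs in rowsets)


rs1 = ToyRowSet(["a", "c", "e"])
rs2 = ToyRowSet(["m", "p"])
```

A key outside every rowset's bounds, such as `"z"` here, is rejected without a single index lookup.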

Page 20: A Closer Look at Apache Kudu

Read Path

• Converts the key range predicate into a row offset range predicate

• Performs the scan one column at a time:
  • Seeks the target column to the correct row offset
  • Consults the delta stores to see if any later updates replaced cells with newer versions

Page 21: A Closer Look at Apache Kudu

Delta Compaction

• Background maintenance manager periodically

• scans DiskRowSets to find any cases where a large number of deltas have accumulated, and

• schedules a delta compaction operation which merges those deltas back into the base data columns.
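
A minimal sketch of how base columns, deltas, and delta compaction relate, assuming a simplified delta record of (row offset, column, new value) applied oldest first. `ToyColumnarRowSet`, `maybe_compact`, and the `max_deltas` threshold are hypothetical; the real maintenance manager uses its own scheduling heuristics.

```python
from typing import Any, Dict, List, Tuple

Delta = Tuple[int, str, Any]  # (row offset, column name, new value), oldest first


class ToyColumnarRowSet:
    def __init__(self, base: Dict[str, List[Any]]) -> None:
        self.base = {col: list(values) for col, values in base.items()}
        self.deltas: List[Delta] = []

    def update(self, row_offset: int, column: str, value: Any) -> None:
        # Base data is left untouched; the change is recorded as a delta.
        self.deltas.append((row_offset, column, value))

    def read_column(self, column: str) -> List[Any]:
        # A scan reads the base column, then consults the deltas.
        values = list(self.base[column])
        for row_offset, col, value in self.deltas:
            if col == column:
                values[row_offset] = value
        return values

    def compact_deltas(self) -> None:
        # Merge accumulated deltas back into the base columns.
        for row_offset, col, value in self.deltas:
            self.base[col][row_offset] = value
        self.deltas.clear()


def maybe_compact(rowsets: List[ToyColumnarRowSet], max_deltas: int) -> int:
    """Compact every rowset that has accumulated more than max_deltas deltas."""
    compacted = 0
    for rowset in rowsets:
        if len(rowset.deltas) > max_deltas:
            rowset.compact_deltas()
            compacted += 1
    return compacted


rowset = ToyColumnarRowSet({"price": [1, 2, 3]})
rowset.update(1, "price", 20)
rowset.update(1, "price", 25)
```

Before compaction every read pays for replaying the deltas; afterwards the base column already holds the latest values.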

Page 22: A Closer Look at Apache Kudu

RowSet Compaction

• A key-based merge of two or more DiskRowSets
• The output is written back to new DiskRowSets, rolling every 32 MB
• RowSet compaction has two goals:
  • We take this opportunity to remove deleted rows.
  • This process reduces the number of DiskRowSets that overlap in key range
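
The key-based merge can be sketched with `heapq.merge` over rowsets that are each sorted by key. Two simplifications are assumed here: a row value of `None` marks a deleted row, and the output rolls after a fixed number of rows (`max_rows_per_output`, a hypothetical parameter) rather than every 32 MB.

```python
import heapq
from typing import Any, List, Optional, Tuple

Entry = Tuple[Any, Optional[dict]]  # (primary key, row); row is None for a deleted row


def compact_rowsets(
    rowsets: List[List[Entry]], max_rows_per_output: int
) -> List[List[Entry]]:
    """Merge key-sorted rowsets into new, non-overlapping rowsets."""
    outputs: List[List[Entry]] = []
    current: List[Entry] = []
    for key, row in heapq.merge(*rowsets, key=lambda entry: entry[0]):
        if row is None:
            continue  # goal 1: drop deleted rows
        current.append((key, row))
        if len(current) >= max_rows_per_output:
            outputs.append(current)  # roll over to a new output rowset
            current = []
    if current:
        outputs.append(current)
    return outputs


# Two input rowsets whose key ranges overlap (1..7 and 2..5).
rowset_a = [(1, {"v": "a"}), (4, None), (7, {"v": "c"})]
rowset_b = [(2, {"v": "x"}), (5, {"v": "y"})]
merged = compact_rowsets([rowset_a, rowset_b], max_rows_per_output=2)
```

Because the merged stream is globally sorted, the output rowsets cover disjoint key ranges, which is the second goal listed above.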

Page 23: A Closer Look at Apache Kudu

Kudu Trade-Offs

• Random updates will be slower
  • Kudu requires a key lookup before update and a Bloom filter lookup before insert

• Single-row seek may be slower
  • Columnar design is optimized for scans
  • Especially slow at reading a row with many recent updates

Page 24: A Closer Look at Apache Kudu

Why Kudu

Cluster Architecture

Page 25: A Closer Look at Apache Kudu

Cluster Roles


Page 26: A Closer Look at Apache Kudu

The Kudu Master

Kudu’s central master process has several key responsibilities:

• A catalog manager
  • keeping track of which tables and tablets exist, as well as their schemas, desired replication levels, and other metadata

• A cluster coordinator
  • keeping track of which servers in the cluster are alive and coordinating redistribution of data

• A tablet directory
  • keeping track of which tablet servers are hosting replicas of each tablet

Page 27: A Closer Look at Apache Kudu

Why Kudu

Cluster Architecture

Partitioning

Page 28: A Closer Look at Apache Kudu

Partitioning

• Tables in Kudu are horizontally partitioned.

• Kudu, like BigTable, calls these partitions tablets

• Kudu supports a flexible array of partitioning schemes

Page 29: A Closer Look at Apache Kudu

Partitioning: Hash

Img source: https://github.com/cloudera/kudu/blob/master/docs/images/hash-partitioning-example.png

Page 30: A Closer Look at Apache Kudu

Partitioning: Range

Img source: https://github.com/cloudera/kudu/blob/master/docs/images/range-partitioning-example.png

Page 31: A Closer Look at Apache Kudu

Partitioning: Hash plus Range

Img source: https://github.com/cloudera/kudu/blob/master/docs/images/hash-range-partitioning-example.png
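
To make the combined scheme concrete, the sketch below maps a row to a (hash bucket, range partition) pair. It is an illustration only: the `host` and `timestamp` columns, the split points, and the use of CRC32 as the hash are assumptions, and Kudu's own hash function and partition encoding differ.

```python
import bisect
import zlib
from typing import List, Tuple


def hash_bucket(value: str, num_buckets: int) -> int:
    # CRC32 is a stand-in hash chosen for determinism, not what Kudu uses.
    return zlib.crc32(value.encode("utf-8")) % num_buckets


def range_index(value: int, split_points: List[int]) -> int:
    # Partition i covers [split_points[i-1], split_points[i]); the first and
    # last partitions are unbounded below and above respectively.
    return bisect.bisect_right(split_points, value)


def tablet_for(
    host: str, timestamp: int, num_buckets: int, split_points: List[int]
) -> Tuple[int, int]:
    """Hash on one column, range on another: one tablet per (bucket, range) pair."""
    return hash_bucket(host, num_buckets), range_index(timestamp, split_points)


def tablet_count(num_buckets: int, split_points: List[int]) -> int:
    return num_buckets * (len(split_points) + 1)


splits = [100, 200]          # three range partitions: <100, [100, 200), >=200
tablet = tablet_for("host-1", 150, num_buckets=4, split_points=splits)
```

With this layout, writes for a single time range spread over all hash buckets, while a scan restricted to a time range touches only the tablets of that range partition.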

Page 32: A Closer Look at Apache Kudu

Partitioning Recommendations

• Bigger tables, such as fact tables, should be partitioned so that one tablet contains about 1 GB of data

• Do not partition small tables such as dimensions
  • Note: Impala doesn’t allow skipping the partitioning clause, so you need to specify a single range partition explicitly:

Page 33: A Closer Look at Apache Kudu

Dimension Table with One Partition


Page 34: A Closer Look at Apache Kudu

Why Kudu

Cluster Architecture

Replication

Page 35: A Closer Look at Apache Kudu

Replication Approach

• Kudu uses leader/follower (master-slave) replication

• Kudu employs the Raft [25] consensus algorithm to replicate its tablets
  • If a majority of replicas accept the write and log it to their own local write-ahead logs,
  • the write is considered durably replicated and thus can be committed on all replicas
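
The majority rule can be sketched in a few lines. This is deliberately not Raft: there are no terms, elections, or log matching, and `ToyReplica` and `replicate_write` are hypothetical names that only model "a write commits once a majority has logged it".

```python
from typing import List


def majority(num_replicas: int) -> int:
    return num_replicas // 2 + 1


class ToyReplica:
    def __init__(self, name: str, alive: bool = True) -> None:
        self.name = name
        self.alive = alive
        self.wal: List[str] = []        # local write-ahead log
        self.committed: List[str] = []  # operations applied to the tablet

    def append_to_wal(self, op: str) -> bool:
        if not self.alive:
            return False
        self.wal.append(op)
        return True


def replicate_write(leader: ToyReplica, followers: List[ToyReplica], op: str) -> bool:
    """Return True if the write was logged by a majority and committed."""
    replicas = [leader] + followers
    acks = sum(1 for replica in replicas if replica.append_to_wal(op))
    if acks < majority(len(replicas)):
        return False  # not durably replicated; a real system would retry or abort
    for replica in replicas:
        if replica.alive:
            replica.committed.append(op)
    return True


replicas = [ToyReplica("a"), ToyReplica("b"), ToyReplica("c", alive=False)]
ok = replicate_write(replicas[0], replicas[1:], "op1")
```

With three replicas, one failed follower does not block the write; losing a second replica does.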

Page 36: A Closer Look at Apache Kudu

Raft: Replicated State Machine

• Replicated log ensures state machines execute same commands in same order
• Consensus module ensures proper log replication
• System makes progress as long as any majority of servers are up
• Visualization: https://raft.github.io/raftscope/index.html

Page 37: A Closer Look at Apache Kudu

Consistency Model

• Kudu provides clients the choice between two consistency modes for reads (scans):
  • READ_AT_SNAPSHOT
  • READ_LATEST

Page 38: A Closer Look at Apache Kudu

READ_LATEST consistency

• Monotonic reads are guaranteed (?); read-your-writes is not
• Corresponds to the "Read Committed" ACID isolation mode
• This is the default mode.

Page 39: A Closer Look at Apache Kudu

READ_LATEST consistency

• The server will always return committed writes at the time the request was received.

• This type of read is not repeatable.

Page 40: A Closer Look at Apache Kudu

READ_AT_SNAPSHOT Consistency

• Guarantees read-your-writes consistency from a single client

• Corresponds to the "Repeatable Read" ACID isolation mode.

Page 41: A Closer Look at Apache Kudu

READ_AT_SNAPSHOT Consistency

• The server attempts to perform a read at the provided timestamp

• In this mode reads are repeatable
  • at the expense of waiting for in-flight transactions whose timestamp is lower than the snapshot's timestamp to complete

Page 42: A Closer Look at Apache Kudu

Write Consistency

• Writes to a single tablet are always internally consistent
• By default, Kudu does not provide an external consistency guarantee.
• However, for users who require a stronger guarantee, Kudu offers the option to manually propagate timestamps between clients

Page 43: A Closer Look at Apache Kudu

Replication Factor Limitation

• Since Kudu 1.2.0:
  • The replication factor of tables is now limited to a maximum of 7
  • In addition, it is no longer allowed to create a table with an even replication factor
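
One way to see why even replication factors add little: with majority-based replication, a tablet with n replicas tolerates (n - 1) // 2 failures, so 4 replicas tolerate no more failures than 3. The check below is a sketch of the two rules above; `validate_replication_factor` is a hypothetical helper, not a Kudu API.

```python
def validate_replication_factor(n: int, max_replicas: int = 7) -> int:
    """Validate n against the rules above and return the tolerated replica failures."""
    if n < 1 or n > max_replicas:
        raise ValueError(f"replication factor must be between 1 and {max_replicas}")
    if n % 2 == 0:
        raise ValueError("replication factor must be odd")
    # A majority of n replicas must survive, so (n - 1) // 2 may fail.
    return (n - 1) // 2


tolerated = {n: validate_replication_factor(n) for n in (1, 3, 5, 7)}
```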

Page 44: A Closer Look at Apache Kudu

Kudu and CAP Theorem

• Kudu is a CP type of storage engine.

• Writing to a tablet will be delayed if the server that hosts that tablet’s leader replica fails

• Kudu gains the following properties by using Raft consensus:
  • Leader elections are fast
  • Follower replicas don’t allow writes, but they do allow reads

Page 45: A Closer Look at Apache Kudu

Why Kudu

Kudu Applicability

Page 46: A Closer Look at Apache Kudu

Applications for which Kudu is a viable solution

• Reporting applications where new data must be immediately available for end users

• Time-series applications with
  • queries across large amounts of historic data
  • granular queries about an individual entity

• Applications that use predictive models to make real-time decisions

Page 47: A Closer Look at Apache Kudu

Why Kudu

Streaming Analytics

Case Study

Page 48: A Closer Look at Apache Kudu

Business Case

• A leader in health care compliance consulting and technology-driven managed services

• Cloud-based multi-services platform

• It offers
  • enhanced data security and scalability,
  • operational managed services, and access to business information

Img source: http://ihealthone.com/wp-content/uploads/2016/12/Healthcare_Compliance_Consultants-495x400.jpg

Page 49: A Closer Look at Apache Kudu

ETL Approach

Key Points:

• Leverage Confluent platform with Schema Registry

• Apply a configuration-based approach:
  • Avro Schema in Schema Registry for Input Schema
  • Impala Kudu SQL scripts for Target Schema

• Stick to Python App as primary ETL code, but extend:
  • Develop new abstractions to work with mapping rules

• Streaming processing for both facts and dimensions

Cons:

• Scaling needs extra effort

Data Flow (diagram labels): Event Topics, ETL Code, Analytics DWH; Configuration: Input Schema, Mapping Rules, Target Schema, Other Configurations

Page 50: A Closer Look at Apache Kudu

Stream ETL using Pipeline Architecture

Diagram labels: Data Reader, Mapper/Flattener, Types Adjuster, Data Enricher, DB Sinker, Cache Manager, Configuration

Pipeline Modules:
• Data Reader: reads data from the source DB
• Mapper/Flattener: flattens the JSON tree-like structure into a flat one and maps the field names to target ones
• Types Adjuster: adjusts/converts data types properly
• Data Enricher: enriches the data structure with new data:
  • Generates a surrogate key
  • Looks up data from the target DB (using cache)
• DB Sinker: writes data into the target DB

Other Modules:
• Cache Manager: manages the cache with dimension data
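
The module chain can be sketched as a sequence of plain functions applied to each record. Everything here is hypothetical and simplified: the field names, the mapping rules, the in-memory list standing in for the source and target DBs, and the dictionary standing in for the Cache Manager are assumptions, not the case study's actual code.

```python
import itertools
from typing import Any, Callable, Dict, Iterable, List, Optional

Record = Dict[str, Any]
Stage = Callable[[Record], Record]


def flatten(record: Record, parent: str = "", sep: str = "_") -> Record:
    """Flatten a nested JSON-like dict into a single-level dict."""
    flat: Record = {}
    for key, value in record.items():
        name = f"{parent}{sep}{key}" if parent else key
        if isinstance(value, dict):
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat


def make_mapper(field_map: Dict[str, str]) -> Stage:
    # Mapper/Flattener: flatten, then rename fields to the target names.
    def mapper(record: Record) -> Record:
        return {field_map.get(k, k): v for k, v in flatten(record).items()}
    return mapper


def make_types_adjuster(type_map: Dict[str, Callable[[Any], Any]]) -> Stage:
    def adjust(record: Record) -> Record:
        return {k: type_map[k](v) if k in type_map else v for k, v in record.items()}
    return adjust


def make_enricher(product_cache: Dict[str, int]) -> Stage:
    surrogate_keys = itertools.count(1)

    def enrich(record: Record) -> Record:
        enriched = dict(record)
        enriched["event_sk"] = next(surrogate_keys)             # generated surrogate key
        code: Optional[str] = record.get("product_code")
        enriched["product_sk"] = product_cache.get(code)        # dimension lookup via cache
        return enriched
    return enrich


def run_pipeline(source: Iterable[Record], stages: List[Stage], sink: List[Record]) -> None:
    for record in source:          # Data Reader
        for stage in stages:
            record = stage(record)
        sink.append(record)        # DB Sinker


source = [{"id": "7", "product": {"code": "P1"}, "amount": "9.5"}]
sink: List[Record] = []
run_pipeline(
    source,
    [
        make_mapper({"id": "event_id"}),
        make_types_adjuster({"event_id": int, "amount": float}),
        make_enricher({"P1": 42}),
    ],
    sink,
)
```

Keeping each module as an independent stage means the mapping rules and type conversions can come from configuration while the pipeline code stays unchanged.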

Page 51: A Closer Look at Apache Kudu

Why Kudu

Key Types Benchmark

Page 52: A Closer Look at Apache Kudu

Kudu Numeric vs String Keys

• Reason:
  • Generating surrogate numeric keys adds an extra processing step and complexity to the overall ETL process

• Sample Schema:
  • Dimensions:
    • Promotion dimension with 1000 unique members, 30 categories
    • Products dimension with 50 000 unique members, 300 categories
  • Facts:
    • Fact table containing references to the 2 dimensions above, with 1 million rows
    • Fact table containing references to the 2 dimensions above, with 100 million rows

Page 53: A Closer Look at Apache Kudu

Benchmark Result

Page 54: A Closer Look at Apache Kudu

Why Kudu

Lessons Learnt

Page 55: A Closer Look at Apache Kudu

Pain Points

• Frequent releases with many changes
• Data type limitations (especially in the Python library and Impala)
• Lack of sequences/constraints
• Lack of multi-row transactions

Page 56: A Closer Look at Apache Kudu

Limitations

• More than 50 columns is not recommended
• Immutable primary keys
• Non-alterable primary key, partitioning, column types
• Partitions cannot be split after table creation

Page 57: A Closer Look at Apache Kudu

Modeling Recommendations: Star Schema

Dimensions:
• Replication factor equal to the number of nodes in the cluster
• 1 tablet per dimension

Facts:
• Aim for as many tablets as you have cores in the cluster

Page 58: A Closer Look at Apache Kudu

Why Kudu

What Kudu is Not

Page 59: A Closer Look at Apache Kudu

What Kudu is Not

• Not a SQL interface itself
  • It’s just the storage layer – you should use Impala or SparkSQL

• Not an application that runs on HDFS
  • It’s an alternative, native Hadoop storage engine

• Not a replacement for HDFS or HBase
  • Select the right storage for the right use case
  • Cloudera will support and invest in all three

Page 60: A Closer Look at Apache Kudu

Why Kudu

Kudu vs MPP Data Warehouse

Page 61: A Closer Look at Apache Kudu

Kudu vs MPP Data Warehouses

In Common:
• Fast analytics queries via SQL
• Ability to insert, update, delete data

Differences:

✓ Faster streaming inserts
✓ Improved Hadoop integration

○ Slower batch inserts
○ No transactional data loading, multi-row transactions, indexing

Page 62: A Closer Look at Apache Kudu

Useful resources

• Community, Downloads, VM:
  • https://kudu.apache.org

• Whitepaper:
  • http://kudu.apache.org/kudu.pdf

• Slack channel:
  • https://getkudu-slack.herokuapp.com

Page 63: A Closer Look at Apache Kudu

USA HQ: Toll Free: 866-687-3588, Tel: +1-512-516-8880
Ukraine HQ: Tel: +380-32-240-9090
Bulgaria: Tel: +359-2-902-3760
Germany: Tel: +49-69-2602-5857
Netherlands: Tel: +31-20-262-33-23
Poland: Tel: +48-71-382-2800
UK: Tel: +44-207-544-8414

WEBSITE: www.softserveinc.com

Questions ?