Efficient Big Data Exploration with SQL and Apache Drill Jonatan Kazmierczak Java User Group Switzerland, Zürich, 07.02.2017

Apr 21, 2018

Efficient Big Data Exploration with SQL and Apache Drill

Jonatan Kazmierczak

Java User Group Switzerland, Zürich, 07.02.2017

-- SQL and Apache Drill -- Jonatan Kazmierczak -- JUG CH 2017 --

About author

Jonatan.Kazmierczak (at) gmail (dot) com

senior consultant at Atos Consulting Switzerland

creator of Class Visualizer

top-rated participant in programming and data science contests: HackerRank, TopCoder, Google Code Jam

working with Java and SQL for 20 years

About author – cont.

first rank in Java

www.hackerrank.com/leaderboard/java/practice/level/1/filter/country=Switzerland/page/1

www.hackerrank.com/jonatan_k

Agenda

Introduction

Demo: starting with Drill

Technical details

Demo: deep dive into Drill

Summary, Q & A

Introduction

Computers – before

Image source: www.amibay.com/showthread.php?71410-Atari-65XE-BOX-XC12-BOX-2-Quickshots

Data – before

Data – now

Computers – now

32 GB RAM

3 TB RAM

What is Apache Drill ?

low latency distributed schema-free SQL query engine for large-scale datasets

designed to scale to several thousands of nodes and query petabytes of data at the speeds required by BI/Analytics environments

Demo: starting with Drill

Technical details

Basic info

Website: drill.apache.org

Current version: 1.9.0

Query language: SQL:2003

Interfaces: shell, web console, JDBC/ODBC, REST API, Java API, C++ API
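
Of the interfaces listed, the REST API can be tried without any client library. The following is a minimal sketch, not taken from the talk: it assumes a Drillbit running on localhost with the web port 8047 and uses the `/query.json` endpoint with a `queryType`/`query` JSON body. The function names `build_query_request` and `run_query` are hypothetical helpers introduced for illustration.

```python
import json
import urllib.request

# Assumption: a local Drillbit with its web console / REST port at 8047.
DRILL_URL = "http://localhost:8047"


def build_query_request(sql, base_url=DRILL_URL):
    """Build (but do not send) a POST request for Drill's /query.json endpoint."""
    payload = json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8")
    return urllib.request.Request(
        url=base_url + "/query.json",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(sql, base_url=DRILL_URL):
    """Send the query and return the decoded JSON response.

    This call needs a running Drillbit; it is not exercised below.
    """
    with urllib.request.urlopen(build_query_request(sql, base_url)) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    request = build_query_request("select * from dfs.demo.`countries.csv`")
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
```

Building the request separately from sending it keeps the sketch inspectable on a machine where Drill is not installed.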

Supported data sources and formats

RDBMS, FS, NoSQL

Features

Dynamic schema discovery

Flexible data model

In-memory data processing (whenever possible)

Extensible architecture

Distributed and embedded mode

Distributed setup

[Diagram: a Client, ZooKeeper, and three nodes (Node 1, Node 2, Node 3), each running a Drillbit]
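
In a setup like this, a JDBC client usually locates the Drillbits through ZooKeeper rather than addressing one node directly. The sketch below, which is not from the slides, builds the connection URL for both modes. The host names are hypothetical, and `drill` / `drillbits1` are assumed to be the ZooKeeper root and cluster ID of the installation (they are Drill's defaults, but a cluster may override them).

```python
def drill_jdbc_url(zk_hosts=None, zk_root="drill", cluster_id="drillbits1"):
    """Return a Drill JDBC URL.

    Without ZooKeeper hosts the URL targets embedded mode (zk=local).
    With hosts it targets a distributed cluster via the ZooKeeper quorum.
    """
    if not zk_hosts:
        return "jdbc:drill:zk=local"
    quorum = ",".join(zk_hosts)
    return "jdbc:drill:zk={}/{}/{}".format(quorum, zk_root, cluster_id)


if __name__ == "__main__":
    # Embedded mode: a single local Drillbit, no ZooKeeper needed.
    print(drill_jdbc_url())
    # Distributed mode: hypothetical ZooKeeper hosts on the standard port 2181.
    print(drill_jdbc_url(["node1:2181", "node2:2181", "node3:2181"]))
```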

Sample query

select * from dfs.demo.`countries.csv`

dfs: storage plugin

demo: workspace

`countries.csv`: table / view / file / document
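
The three-part name in the query can be taken apart mechanically. The sketch below is an illustration only, not Drill's own parser: `split_table_reference` is a hypothetical helper, and it assumes the full three-part form (plugin, workspace, name), although Drill also accepts shorter forms. It shows why the backticks matter: they keep the dot in `countries.csv` from being read as a separator.

```python
def split_table_reference(reference):
    """Split 'plugin.workspace.`name`' into (plugin, workspace, name).

    Dots inside backticks belong to the name and do not separate parts.
    """
    parts, current, quoted = [], [], False
    for ch in reference:
        if ch == "`":
            quoted = not quoted          # enter or leave a quoted identifier
        elif ch == "." and not quoted:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    if quoted or len(parts) != 3:
        raise ValueError("expected plugin.workspace.`name`: " + reference)
    return tuple(parts)


if __name__ == "__main__":
    plugin, workspace, name = split_table_reference("dfs.demo.`countries.csv`")
    print("storage plugin:", plugin)
    print("workspace:", workspace)
    print("table / view / file / document:", name)
```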

Config – storage plugins
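
The slide's screenshot is not preserved in this transcript. As a rough idea of what a file-system storage plugin configuration looks like, the sketch below builds one as a Python dictionary and prints it as JSON, the format Drill's web console expects. The workspace name `demo` matches the sample query; the location `/data/demo` and the helper name `build_dfs_plugin_config` are assumptions made for illustration, and the field set is a minimal subset rather than a complete configuration.

```python
import json


def build_dfs_plugin_config(workspace, location):
    """Return a minimal file-system storage plugin configuration."""
    return {
        "type": "file",
        "enabled": True,
        "connection": "file:///",        # local file system
        "workspaces": {
            workspace: {
                "location": location,    # hypothetical directory holding the data files
                "writable": False,
                "defaultInputFormat": None,
            }
        },
        "formats": {
            "csv": {"type": "text", "extensions": ["csv"], "delimiter": ","}
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_dfs_plugin_config("demo", "/data/demo"), indent=2))
```

With a configuration like this registered under the plugin name `dfs`, a file `countries.csv` in the workspace directory would be addressable as dfs.demo.`countries.csv`.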

Query execution

[Diagram components: Client, Drillbit, Foreman, SQL Parser, Optimizer, Parallelizer, Executor, Storage Plugin]

1. Client sends the SQL query to the Foreman

2. SQL Parser: parse SQL query, producing the logical plan

3. Optimizer: optimize logical plan, producing the physical plan

4. Parallelizer: parallelize physical plan, producing the execution tree (fragments)

5. Executor: execute fragments; the Storage Plugin fetches the data

6. Results are passed back to the Client
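
The flow on this slide can be written down as a chain in which each component turns one artifact into the next. The following toy model only restates the slide's labels; it does no real parsing or planning, and the names `Stage`, `PIPELINE`, and `trace` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    component: str   # who does the work
    action: str      # label from the slide
    output: str      # artifact handed to the next stage


# The Executor stage relies on the Storage Plugin to fetch the data.
PIPELINE = [
    Stage("SQL Parser", "parse SQL query", "logical plan"),
    Stage("Optimizer", "optimize logical plan", "physical plan"),
    Stage("Parallelizer", "parallelize physical plan", "execution tree (fragments)"),
    Stage("Executor", "execute fragments", "results"),
]


def trace(sql):
    """Return one line per hop, from the client's query to the returned results."""
    artifact = "SQL query"
    lines = ["Client -> Foreman: {} [{}]".format(artifact, sql)]
    for stage in PIPELINE:
        lines.append("{}: {} ({} -> {})".format(
            stage.component, stage.action, artifact, stage.output))
        artifact = stage.output
    lines.append("Foreman -> Client: {}".format(artifact))
    return lines


if __name__ == "__main__":
    for line in trace("select * from dfs.demo.`countries.csv`"):
        print(line)
```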

Drill inside

over 0x2000 classes

Demo: deep dive into Drill

Summary

Advantages

Easy to start working with

Concept of SQL-on-Anything

Using standard SQL

Disadvantages

Partially implemented or unfinished features

Gaps in documentation

Use cases

Data exploration

Data transformation

BI / Data analytics

Applicable Not applicable

Questions

Thank you

Jonatan.Kazmierczak (at) gmail (dot) com

Son-of-God.info