YOU ARE DOWNLOADING DOCUMENT

Please tick the box to continue:

Transcript
Page 1: Big Fast SQL on OpenShift

Big Fast SQL on OpenShift

Kamil Bajda-Pawlikowski Co-founder / CTO @prestosql @starburstdata

OpenShift Commons 2019 @ San Francisco

Kyle Bader Principal Solutions Architect

Page 2: Big Fast SQL on OpenShift

2

Presto: SQL-on-Anything

Deploy Anywhere, Query Anything

Page 3: Big Fast SQL on OpenShift

Community

See more at our Wiki

Page 4: Big Fast SQL on OpenShift

Why Presto?

Community-driven open source project

High performance ANSI SQL engine• New Cost-Based Query Optimizer• Proven scalability• High concurrency

Separation of compute and storage• Scale storage and compute

independently• No ETL or data integration

necessary to get to insights• SQL-on-anything

No vendor lock-in• No Hadoop distro vendor lock-in• No storage engine vendor lock-in• No cloud vendor lock-in

Page 5: Big Fast SQL on OpenShift

Enterprise edition

© 2019 5

Founded by Presto committers:● Over 4 years of contributions to Presto● Presto distro for on-prem and cloud env● Supporting large customers in production● Enterprise subscription add-ons (ODBC,

Ranger, Sentry, Oracle, Teradata, K8S)

Notable features contributed:● ANSI SQL syntax enhancements● Execution engine improvements● Security integrations● Spill to disk● Cost-Based Optimizer

https://www.starburstdata.com/presto-enterprise/

Page 6: Big Fast SQL on OpenShift

Built for PerformanceQuery Execution Engine:

● MPP-style pipelined in-memory execution● Columnar and vectorized data processing● Runtime query bytecode compilation● Memory efficient data structures● Multi-threaded multi-core execution● Optimized readers for columnar formats (ORC and Parquet)● Predicate and column projection pushdown● Now also Cost-Based Optimizer

Page 7: Big Fast SQL on OpenShift

Join reordering with filter

Page 8: Big Fast SQL on OpenShift

CBO off

CBO on

https://www.starburstdata.com/presto-benchmarks/

Benchmark results

Page 9: Big Fast SQL on OpenShift

© 2019

Page 10: Big Fast SQL on OpenShift

Administrative challenges

● Configuring and managing clusters● Autotuning properties based on the hardware provisioned● High Availability for Presto Coordinator● Scaling cluster elastically based on query load● Gracefully decommissioning Presto Workers to avoid killing queries● Monitoring of hardware and software layers

https://www.starburstdata.com/technical-blog/presto-on-kubernetes/

Page 11: Big Fast SQL on OpenShift

https://docs.starburstdata.com/latest/kubernetes.html

Presto on OpenShift

Presto WorkerPod

Presto WorkerPod

11

Presto CoordinatorPod

Presto WorkerPod

Horizontal Pod Autoscaler (HPA)

Presto OperatorK8s Operator

PrestoService

Hive Metastore ServicePod

Hadoop / Hive

RDBMS

Page 12: Big Fast SQL on OpenShift

Now available in OpenShift Catalog!

Page 13: Big Fast SQL on OpenShift

● Massively scalable○ 10’s-100’s of PBs○ Billions of objects○ 100’s of gigabits

● Erasure coding drives storage efficiency● High level of fidelity with S3 API● Open source software - LGPL 2.1 / 3

Page 14: Big Fast SQL on OpenShift

Red HatOpenShift Container Storage

● Rook-Ceph operator● Ceph for block (RWO), file (RWO/RWX), object (S3)● Noobaa for multi-cloud gateway

Page 15: Big Fast SQL on OpenShift

Hive connector configuration example (hive.properties)

connector.name=hive-hadoop2

hive.metastore.uri=thrift://metastore.example.com:9083

hive.s3.endpoint=s3.example.com

hive.s3.aws-access-key=ACCESS_KEY

hive.s3.aws-secret-key=SECRET_KEY

hive.s3.use-instance-credentials=false

hive.s3.staging-directory=/tmp

hive.s3.ssl.enabled=true

Page 16: Big Fast SQL on OpenShift

presto> CREATE SCHEMA hive.s3_export WITH (location =

's3://my_bucket/some/path');

presto> CREATE TABLE hive.s3_export.my_table

WITH (format = 'ORC')

AS <source query>;

Page 17: Big Fast SQL on OpenShift

Demo at ODSC West 12:30PM Thursday 10/31

Page 18: Big Fast SQL on OpenShift

Thank You!

18

Twitter: @starburstdata @prestosqlBlog: www.starburstdata.com/technical-blog/Newsletter: www.starburstdata.com/newsletter

© 2019


Related Documents