Page 1: Big Fast SQL on OpenShift

Big Fast SQL on OpenShift

Kamil Bajda-Pawlikowski, Co-founder / CTO, @prestosql @starburstdata
Kyle Bader, Principal Solutions Architect

OpenShift Commons 2019 @ San Francisco

Page 2: Big Fast SQL on OpenShift


Presto: SQL-on-Anything

Deploy Anywhere, Query Anything

Page 3: Big Fast SQL on OpenShift

Community

See more at our Wiki

Page 4: Big Fast SQL on OpenShift

Why Presto?

● Community-driven open source project

● High performance ANSI SQL engine
  ○ New Cost-Based Query Optimizer
  ○ Proven scalability
  ○ High concurrency

● Separation of compute and storage
  ○ Scale storage and compute independently
  ○ No ETL or data integration necessary to get to insights
  ○ SQL-on-anything

● No vendor lock-in
  ○ No Hadoop distro vendor lock-in
  ○ No storage engine vendor lock-in
  ○ No cloud vendor lock-in

Page 5: Big Fast SQL on OpenShift

Enterprise edition


Founded by Presto committers:
● Over 4 years of contributions to Presto
● Presto distro for on-prem and cloud environments
● Supporting large customers in production
● Enterprise subscription add-ons (ODBC, Ranger, Sentry, Oracle, Teradata, K8s)

Notable features contributed:
● ANSI SQL syntax enhancements
● Execution engine improvements
● Security integrations
● Spill to disk
● Cost-Based Optimizer

https://www.starburstdata.com/presto-enterprise/

Page 6: Big Fast SQL on OpenShift

Built for Performance

Query Execution Engine:

● MPP-style pipelined in-memory execution
● Columnar and vectorized data processing
● Runtime query bytecode compilation
● Memory efficient data structures
● Multi-threaded multi-core execution
● Optimized readers for columnar formats (ORC and Parquet)
● Predicate and column projection pushdown
● Now also Cost-Based Optimizer
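As a rough sketch of what predicate and column projection pushdown mean in practice (catalog, schema, and table names below are hypothetical): a query like the following only reads the three referenced columns from the ORC files, and the date predicate can be pushed into the reader so that non-matching stripes are skipped.

presto> SELECT orderkey, totalprice
        FROM hive.sales.orders_orc
        WHERE orderdate >= DATE '2019-01-01';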

Page 7: Big Fast SQL on OpenShift

Join reordering with filter
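A hedged illustration of the idea behind this slide's diagram, using Presto's built-in TPC-H connector (the session property names are real Presto settings; the query itself is illustrative): with table statistics available, the cost-based optimizer can reorder the joins so that the selective filter on orders is applied before joining the much larger lineitem table, rather than following the written join order.

presto> SET SESSION join_reordering_strategy = 'AUTOMATIC';
presto> SET SESSION join_distribution_type = 'AUTOMATIC';
presto> EXPLAIN
        SELECT c.name, sum(l.extendedprice) AS revenue
        FROM tpch.sf1.lineitem l
        JOIN tpch.sf1.orders o ON l.orderkey = o.orderkey
        JOIN tpch.sf1.customer c ON o.custkey = c.custkey
        WHERE o.orderdate >= DATE '1995-01-01'
        GROUP BY c.name;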

Page 8: Big Fast SQL on OpenShift

Benchmark results: CBO off vs. CBO on

https://www.starburstdata.com/presto-benchmarks/

Page 9: Big Fast SQL on OpenShift


Page 10: Big Fast SQL on OpenShift

Administrative challenges

● Configuring and managing clusters
● Autotuning properties based on the hardware provisioned
● High Availability for Presto Coordinator
● Scaling cluster elastically based on query load
● Gracefully decommissioning Presto Workers to avoid killing queries
● Monitoring of hardware and software layers

https://www.starburstdata.com/technical-blog/presto-on-kubernetes/

Page 11: Big Fast SQL on OpenShift

https://docs.starburstdata.com/latest/kubernetes.html

Presto on OpenShift

Diagram components:
● Presto Coordinator Pod
● Presto Worker Pods (scaled by the Horizontal Pod Autoscaler (HPA))
● Presto Operator (K8s Operator)
● Presto Service
● Hive Metastore Service Pod
● Hadoop / Hive
● RDBMS

Page 12: Big Fast SQL on OpenShift

Now available in OpenShift Catalog!

Page 13: Big Fast SQL on OpenShift

● Massively scalable
  ○ 10's-100's of PBs
  ○ Billions of objects
  ○ 100's of gigabits
● Erasure coding drives storage efficiency
● High level of fidelity with S3 API
● Open source software - LGPL 2.1 / 3

Page 14: Big Fast SQL on OpenShift

Red Hat OpenShift Container Storage

● Rook-Ceph operator
● Ceph for block (RWO), file (RWO/RWX), object (S3)
● Noobaa for multi-cloud gateway

Page 15: Big Fast SQL on OpenShift

Hive connector configuration example (hive.properties)

connector.name=hive-hadoop2
# Thrift endpoint of the Hive Metastore service
hive.metastore.uri=thrift://metastore.example.com:9083
# S3-compatible object store endpoint (e.g. Ceph RGW) and credentials
hive.s3.endpoint=s3.example.com
hive.s3.aws-access-key=ACCESS_KEY
hive.s3.aws-secret-key=SECRET_KEY
hive.s3.use-instance-credentials=false
hive.s3.staging-directory=/tmp
hive.s3.ssl.enabled=true
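Once this catalog file is in place on the coordinator and workers, a quick sanity check from the Presto CLI might look like the following (the catalog name hive comes from the hive.properties file name):

presto> SHOW SCHEMAS FROM hive;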

Page 16: Big Fast SQL on OpenShift

presto> CREATE SCHEMA hive.s3_export
        WITH (location = 's3://my_bucket/some/path');

presto> CREATE TABLE hive.s3_export.my_table
        WITH (format = 'ORC')
        AS <source query>;
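As a quick follow-up using the table created above, the exported ORC data is immediately queryable through the same Hive connector, with the files landing under the schema's S3 location:

presto> SELECT count(*) FROM hive.s3_export.my_table;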

Page 17: Big Fast SQL on OpenShift

Demo at ODSC West 12:30PM Thursday 10/31

Page 18: Big Fast SQL on OpenShift

Thank You!


Twitter: @starburstdata @prestosql
Blog: www.starburstdata.com/technical-blog/
Newsletter: www.starburstdata.com/newsletter

© 2019