The Challenges of Open Source Cloud Database

The Challenges of Open Source Cloud DatabaseSpeech5)OUSHU_Mr... · 2019-11-21 · SQL-on-Hadoop Separation of Storage & Compute (2000s) • Application:BigData ... faster than Hive

May 20, 2020


dariahiddleston

The Challenges of Open Source Cloud Database


Contents

01 About Oushu
02 Background
03 Cloud Database
04 Apache HAWQ


About Oushu

Founded by the Apache HAWQ core team. Apache HAWQ is the first Apache database top-level project (TLP) initiated by a team in China.

Focused on AI and big data. Products: cloud database OushuDB/HAWQ, Littleboy AI, Lava.

Most team members come from EMC, Oracle, IBM, Teradata, Google, Amazon, and others.

The first Chinese company to sell databases to top US companies. Hundreds of enterprise users worldwide.

Backed by Sequoia and Redpoint. Microsoft Accelerator, 2018.

Core members hold dozens of US patents, with work published at SIGMOD, a top database conference.


Oushu Customers & Partners

Energy
Telecom
Ministry of Public Security (systems in multiple provinces)
Finance


Database History: 57 years

• Database: 1962
  ⁃ Inverted File Database System
  ⁃ System Development Corporation
• Several phases of databases
  ⁃ 1960s: Navigational DBMS (network & hierarchical)
    ✓ Integrated Data Store (IDS)
    ✓ Information Management System (IMS)
  ⁃ 1970s-1990s: SQL/Relational DBMS
    ✓ OLTP, Data warehouse
  ⁃ 2000s-Present
    ✓ MPP, Hadoop, NoSQL (XML, KV, Graph, Tree), NewSQL, Cloud Database


Cloud Database

• Public cloud: Database as a service
  ⁃ Amazon Redshift (ParAccel MPP)
  ⁃ RDS (PostgreSQL, MySQL, et al.)
  ⁃ OushuDB
• Private cloud: Virtual machine / Docker container


Cloud Database vs Traditional Database

• Differences
  ⁃ How users use the database
  ⁃ Billing
  ⁃ Running environment
    ✓ Virtualization: resource management
  ⁃ Ecosystem
    ✓ Infrastructure services: S3 et al.
  ⁃ Elasticity
  ⁃ Security
• Similarities
  ⁃ Data model & query language
  ⁃ Query optimization & execution
  ⁃ Indexes & storage
  ⁃ Transaction processing
  ⁃ et al.


The Evolution Path of Analytical Databases

Traditional DW: Dedicated Hardware (1980s)
• Application: Reporting
• Scale: 10s
• SQL Compatibility: Good
• Performance: Medium
• Cloud Support: Weak
• Examples: Oracle, DB2, Teradata

X86 MPP: Shared Nothing (2000s)
• Application: Big Data
• Scale: 100s
• SQL Compatibility: Good
• Performance: Medium
• Cloud Support: Weak
• Examples: Greenplum, Vertica

SQL-on-Hadoop: Separation of Storage & Compute (2000s)
• Application: Big Data
• Scale: 1000s
• SQL Compatibility: Weak
• Performance: Medium
• Cloud Support: Weak
• Examples: Hive, SparkSQL

Cloud Native (about 2015)
• Application: AI, Cloud, IoT, Big Data
• Scale: 1000s
• SQL Compatibility: Good
• Performance: Good
• Cloud Support: Native
• Examples: OushuDB, HAWQ, Snowflake

[Architecture diagrams. Surviving labels: Network; Storage (SAN, NAS); Compute (RDBMS, EDW); Compute; Memory; Storage; CPU]


HAWQ & OushuDB

Timeline
• 2011: HAWQ started
• 2012: HAWQ 1.0 Alpha
• 2013: HAWQ 1.0 GA
• 2014: HAWQ 1.X
• 2015: HAWQ 2.0 Alpha
• 2016: OushuDB
• 2017: OushuDB 3.0
• 2018-2019: OushuDB 4.0

Milestone notes (in slide order)
• Oushu founder Lei Chang initiated the project at EMC
• Hundreds of times faster than Hive
• Separation of compute & storage
• HAWQ SIGMOD paper published
• Became an Apache Incubating project
• Oushu founded, focusing on HAWQ & OushuDB
• 10 times faster than HAWQ 2.0
• Support for Update/Delete/Index
• OushuDB/HAWQ used by hundreds of companies across the world
• HAWQ becomes an Apache Top-Level Project


HAWQ Main Features

● Discover New Relationships
● Enable Data Science
● Analyze External Sources
● Query All Data Types!

Multi-level Fault Tolerance
Granular Authorization
Resource queues
High multi-tenancy
ANSI SQL Standard
OLAP Extensions
JDBC/ODBC Connectivity
Elastic Runtime, Online Expansion
HDFS/Magma/Hadoop
Petabyte Scale
Cost-Based Optimizer
Dynamic Pipelining
ACID + Transactional
Multi-Language UDF Support
Built-in Data Science Library
Extensible (PXF): Query External Sources
Accessibility + Usability
HDFS Native File Formats
Compression + Partitioning

● Manage Multiple Workloads
● Petabyte-Scale Analytics
● Sub-second Performance
● Leverage Existing Skills & Tools
● Easily Integrate with Other Tools
● Well Integrated with Hadoop Ecosystem

[Diagram labels: core; compliance]


HAWQ vs Others

[Comparison chart. Surviving labels: "Hadoop Native & Open & Scalability" versus "Proprietary & limited scalability"; "Limited Performance & SQL Compliance" versus "High Performance & SQL Compliance". Surviving product labels: Big SQL, Vortex, SQL]


Contributing to HAWQ

• Documentation
• Wiki
• Bug reports
• Bug fixes
• Features

• Website: http://hawq.incubator.apache.org/
• Wiki: https://cwiki.apache.org/confluence/display/HAWQ
• Repo: https://github.com/apache/hawq
• JIRA: https://issues.apache.org/jira/browse/HAWQ
• Mailing lists: dev/[email protected]


Code contribution process

• Start a JIRA
• Fork the GitHub repo: https://github.com/apache/hawq.git
• Clone your fork to your local machine
• Add the GitHub repo as "upstream"
• Create a feature branch and commit your code
• Start a pull request for code review
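
The steps above can be sketched as a sequence of shell commands. The following is a minimal illustration, not the project's official procedure: the helper name fork_workflow_commands, the GitHub user name, the JIRA id "HAWQ-0000", the branch naming, the commit message, and the use of "master" as the upstream branch are all assumptions made for the example. Forking and opening the pull request happen in the GitHub web UI and are not shown.

```python
def fork_workflow_commands(github_user, jira_id,
                           upstream_url="https://github.com/apache/hawq.git"):
    """Return the shell commands for a fork-based contribution flow.

    github_user and jira_id are placeholders supplied by the caller;
    naming the branch after the JIRA issue is an assumed convention.
    """
    fork_url = f"https://github.com/{github_user}/hawq.git"
    branch = jira_id
    return [
        # Clone your fork to your local machine.
        f"git clone {fork_url}",
        "cd hawq",
        # Add the Apache repo as the "upstream" remote and fetch it.
        f"git remote add upstream {upstream_url}",
        "git fetch upstream",
        # Create a feature branch (assumes upstream's main branch is "master").
        f"git checkout -b {branch} upstream/master",
        # Commit your work; the message format here is only an example.
        f'git commit -a -m "{jira_id}: describe the change"',
        # Push the branch to your fork, then open a pull request on GitHub.
        f"git push origin {branch}",
    ]


if __name__ == "__main__":
    for command in fork_workflow_commands("your-github-id", "HAWQ-0000"):
        print(command)
```

Keeping the Apache repository as a separate "upstream" remote lets you fetch new upstream commits into your local clone while pushing your own branches only to your fork.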


Thank you for watching

Let people work purely out of interest