Apache Tajo on Swift Bringing SQL to the OpenStack World Jihoon Son Apache Tajo PMC member

Apache Tajo on Swift

Jul 14, 2015


Engineering

Jihoon Son
Page 1: Apache Tajo on Swift

Apache Tajo on Swift
Bringing SQL to the OpenStack World

Jihoon Son
Apache Tajo PMC member

Page 2: Apache Tajo on Swift

Who am I

● Jihoon Son
  ○ Ph.D candidate (Computer Science & Engineering, 2010.3 ~)
  ○ Apache Tajo PMC and Committer (2014.5.1 ~)
  ○ Mentor of Google Summer of Code (2013)
● Contacts
  ○ Email: jihoonson AT apache.org
  ○ LinkedIn: https://www.linkedin.com/in/jihoonson

Page 3: Apache Tajo on Swift

Outline

● OpenStack Swift
● Apache Tajo
● Tajo on Swift
● Demo
● Our Roadmap

Page 4: Apache Tajo on Swift

OpenStack Swift

● Popular object storage
  ○ Images, videos, logs, ...
● Enterprises store objects on Swift to provide their services
  ○ Usually private clusters

Page 5: Apache Tajo on Swift

SQL on Swift

● Data analysis is important for improving the quality of their services
  ○ SQL is one of the most powerful and popular query languages
● Many enterprise data analysis tools rely on SQL
  ○ OLAP, visualization, data mining, …
● Hence the need for SQL on Swift

Page 6: Apache Tajo on Swift

Apache Tajo

● Scalable, efficient, and fault-tolerant data warehouse system
  ○ SQL standard compliance
  ○ Efficient batch execution and interactive ad-hoc analysis
    ■ Low latency and high throughput
    ■ No use of MapReduce
  ○ No single point of failure

Page 7: Apache Tajo on Swift

Apache Tajo

● Active open source project
  ○ 18 committers and 16 contributors
  ○ Activity summary

Page 8: Apache Tajo on Swift

Apache Tajo

[Architecture diagram: TajoMasters and multiple TajoWorkers on top of a pluggable storage layer]

Page 9: Apache Tajo on Swift

Tajo on Swift

[Architecture diagram: TajoMasters and multiple TajoWorkers on top of a pluggable storage layer, with Swift as one of the storages]

Page 10: Apache Tajo on Swift

Tajo on Swift

● No need to modify the code of Tajo or Swift
  ○ Tajo can access Swift with the hadoop-openstack library
    ■ But there is no need to install or run Hadoop
  ○ Just use it

[Diagram: Tajo reaches Swift over the network]
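
To make "just use it" a little more concrete: with the hadoop-openstack library on the classpath, a Swift object is addressed through a `swift://<container>.<provider>/<path>` URI. The sketch below builds such a URI and wraps it in an external table statement. The container, provider, table, and column names are hypothetical, and the DDL shape (`CREATE EXTERNAL TABLE ... USING TEXT ... LOCATION`) is an assumption about the Tajo version in use, not something taken from the slides.

```python
def swift_uri(container: str, provider: str, path: str = "") -> str:
    """Build a hadoop-openstack style URI: swift://<container>.<provider>/<path>."""
    return f"swift://{container}.{provider}/{path.lstrip('/')}"


def external_table_ddl(table: str, columns: dict, location: str) -> str:
    """Render a Tajo-style external table statement that points at `location`."""
    cols = ", ".join(f"{name} {sql_type}" for name, sql_type in columns.items())
    return (
        f"CREATE EXTERNAL TABLE {table} ({cols}) "
        f"USING TEXT WITH ('text.delimiter'='|') "
        f"LOCATION '{location}';"
    )


# Hypothetical names, for illustration only.
location = swift_uri("logs", "myprovider", "/access/2015")
print(external_table_ddl("access_log", {"ts": "TEXT", "status": "INT4"}, location))
```

The resulting statement could then be submitted through whichever Tajo client is available; the table data itself stays in Swift.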

Page 11: Apache Tajo on Swift

Tajo on Swift

● Configuration highlights
  ○ Swift configuration
    ■ Keystone authentication is needed for the Hadoop client
    ■ No additional configuration
  ○ HDFS configuration
    ■ Support for different cloud providers
      ● Key name pattern: fs.swift.service.${provider}

Page 12: Apache Tajo on Swift

Tajo on Swift

● Configuration highlights
  ○ Swift configuration
    ■ Keystone authentication is needed for the HDFS client
    ■ No additional configuration
  ○ HDFS configuration
    ■ Support for different cloud providers
      ● Key name pattern: fs.swift.service.${provider}
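
The key name pattern can be illustrated with a small generator that expands per-provider settings into `fs.swift.service.${provider}.*` properties and renders them as a Hadoop-style configuration document. This is only a sketch: the provider name, Keystone URL, and credentials are placeholders, and the setting names used here (`auth.url`, `tenant`, `username`, `password`, `public`) are assumed to match what the hadoop-openstack library expects in the deployed version.

```python
import xml.etree.ElementTree as ET


def swift_service_properties(provider: str, settings: dict) -> dict:
    """Expand per-provider settings into fs.swift.service.<provider>.<name> keys."""
    prefix = f"fs.swift.service.{provider}."
    return {prefix + name: str(value) for name, value in settings.items()}


def to_hadoop_xml(properties: dict) -> str:
    """Render properties as a Hadoop-style <configuration> document."""
    root = ET.Element("configuration")
    for key, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = key
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


# Placeholder provider name and credentials (assumptions, not real values).
props = swift_service_properties(
    "myprovider",
    {
        "auth.url": "http://keystone.example.com:5000/v2.0/tokens",
        "tenant": "demo",
        "username": "tajo",
        "password": "secret",
        "public": "true",
    },
)
print(to_hadoop_xml(props))
```

Because every key carries the provider name, several Swift providers can be configured side by side in the same file.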

Page 13: Apache Tajo on Swift

Tajo on Swift

● Data locality problem

[Diagram: Node A and Node B each host a Worker and a StorageNode, connected by an interconnection network. Label: significant network overhead]

Page 14: Apache Tajo on Swift

Tajo on Swift

● Data locality problem

[Diagram: Node A and Node B each host a Worker and a StorageNode, connected by an interconnection network]

Page 15: Apache Tajo on Swift

Advanced Integration

● List endpoints middleware
  ○ Providing the location information of objects, accounts or containers
    ■ Tajo workers can directly access each object
  ○ Example
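
The example on the original slide was an image and is not recoverable from this transcript. As a stand-in, here is a minimal sketch of how a client might use the list endpoints middleware, assuming it is mounted at the `/endpoints/` path and answers with a JSON list of storage-node URLs. The account, container, object names, and IP addresses are made up, and the response is a hand-written sample rather than real output.

```python
import json
from urllib.parse import quote, urlparse


def endpoints_path(account, container=None, obj=None):
    """Build the request path for an account, container, or object lookup."""
    parts = [account] + [p for p in (container, obj) if p is not None]
    return "/endpoints/" + "/".join(quote(p) for p in parts)


def storage_hosts(response_body: str) -> list:
    """Extract storage-node hosts from a JSON list of endpoint URLs."""
    return [urlparse(url).hostname for url in json.loads(response_body)]


# Hand-written sample response (an assumption about the response shape).
sample = json.dumps([
    "http://10.1.1.1:6000/sda1/2/AUTH_demo/logs/part-0000",
    "http://10.1.1.2:6000/sdb1/2/AUTH_demo/logs/part-0000",
])

print(endpoints_path("AUTH_demo", "logs", "part-0000"))
print(storage_hosts(sample))
```

The host list is what a scheduler needs in order to place work near the data.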

Page 16: Apache Tajo on Swift

Advanced Integration

● List endpoints middleware
  ○ Swift configuration
  ○ Hadoop configuration

Page 17: Apache Tajo on Swift

Advanced Integration

● Location-aware computing
  ○ Moving the processing close to the data
    ■ Avoiding the performance degradation due to the data transfer over the network
  ○ Important issue when Tajo and Swift share the same cluster

Page 18: Apache Tajo on Swift

Location-aware Computing

[Diagram: a Swift cluster (proxy servers, storage nodes) next to a Tajo cluster (QueryMaster, TajoWorkers). Labels: data location, data]

Page 19: Apache Tajo on Swift

Location-aware Computing

1. Getting object locations from the ring

[Diagram: the QueryMaster sends "Get object locations" to the Swift proxy server, which sits in front of the storage nodes]

Page 20: Apache Tajo on Swift

Location-aware Computing

2. Assigning tasks based on object locations

[Diagram: the QueryMaster assigns tasks to workers close to the object; workers directly read object data from the storage nodes]
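
The two steps above can be sketched as a simple greedy scheduler: given the replica locations of each object, prefer a worker running on one of those hosts, and fall back to any worker (a remote read) when none is local. This is an illustrative sketch only, not Tajo's actual scheduling code; the function name, host names, and the least-loaded tie-break are assumptions.

```python
from collections import defaultdict


def assign_tasks(object_locations: dict, workers: list) -> dict:
    """Greedy locality-aware assignment.

    object_locations maps an object name to the hosts holding a replica.
    workers lists the hosts that run a worker.
    Returns a mapping from worker host to the objects it should read.
    """
    load = {w: 0 for w in workers}
    plan = defaultdict(list)
    for obj, hosts in object_locations.items():
        local = [w for w in workers if w in hosts]
        candidates = local or workers  # no local worker: accept a remote read
        chosen = min(candidates, key=lambda w: load[w])  # least-loaded candidate
        plan[chosen].append(obj)
        load[chosen] += 1
    return dict(plan)


# Hypothetical cluster: node-c stores data but runs no worker.
locations = {
    "part-0000": ["node-a", "node-b"],
    "part-0001": ["node-a"],
    "part-0002": ["node-c"],
}
print(assign_tasks(locations, ["node-a", "node-b"]))
```

A production scheduler would also have to weigh rack locality, task sizes, and stragglers, which this sketch ignores.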

Page 21: Apache Tajo on Swift

Demo

Page 22: Apache Tajo on Swift

Our Roadmap

● Storage layer specialized for Swift
● Block storage support
  ○ Cinder and Ceph
● Provisioning Tajo clusters
  ○ Sahara
  ○ Heat, TOSCA

Page 23: Apache Tajo on Swift

Thanks!
http://tajo.apache.org/