
ANL 2014 - Chicago

Elastic and Efficient Execution of Data-Intensive Applications on Hybrid Cloud

Tekin Bicer

Computer Science and Engineering

Ohio State University

ANL 2014 - Chicago

Introduction

• Scientific simulations and instruments
  – X-ray Photon Correlation Spectroscopy
    • CCD detector: 120MB/s now; 44GB/s by 2015
  – Global Cloud Resolving Model
    • 1PB for a 4km grid-cell
• Performed on local clusters
  – Not sufficient
• Problems
  – Data analysis, storage, I/O performance
• Cloud technologies
  – Elasticity
  – Pay-as-you-go model

2

ANL 2014 - Chicago

Hybrid Cloud Motivation

• Cloud technologies
  – Typically associated with computational resources
• Massive data generation
  – Exhausts local storage
• Hybrid cloud
  – Local resources: base
  – Cloud resources: additional
• Cloud
  – Compute and storage resources

3


ANL 2014 - Chicago

Usage of Hybrid Cloud

[Figure: a data source feeds local storage; local nodes and cloud compute nodes process data from local storage and cloud storage]

4

ANL 2014 - Chicago

Challenges

• Data-Intensive Processing
  – Transparent data access and analysis
  – Programmability of large-scale applications
• Meeting User Constraints
  – Enabling cloud bursting
• Minimizing Storage and I/O Cost
  – Domain-specific compression
  – In-situ and in-transit data analysis

5

MATE-HC: Map-Reduce with AlternaTE API over Hybrid Cloud

Dynamic Resource Allocation Framework for Hybrid Cloud

Compression Methodology and System for Large-Scale Applications

ANL 2014 - Chicago

Programmability of Large-Scale Applications on Hybrid Cloud

• Geographically distributed resources
• Ease of programmability
  – Reduction-based programming structures
• MATE-HC
  – A middleware for transparent data access and processing
  – Selective job assignment
  – Multi-threaded data retrieval

6

ANL 2014 - Chicago

Middleware for Hybrid Cloud

[Figure: a master in the local cluster and a master in the cloud environment each assign jobs to their slaves; slaves run local reductions over data, an index supports remote data analysis, and global reductions combine results across the two clusters]

7

ANL 2014 - Chicago

MATE vs. Map-Reduce Processing Structure

8

• Reduction Object represents the intermediate state of the execution

• Reduction function is commutative and associative
• Sorting and grouping overheads are eliminated by the reduction function/object

ANL 2014 - Chicago

Simple Example

Our large dataset: 3 5 8 4 1 | 3 5 2 6 7 | 9 4 2 4 8

Each compute node holds a reduction object Robj[...] and applies Local Reduction (+) to its partition: 21, 23, 27

Global Reduction (+): 21 + 23 + 27 = Result: 71 (see the code sketch below)

9
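To make the reduction-object idea concrete, here is a minimal Python sketch of the example above; the function names are illustrative, not the MATE API:

```python
from functools import reduce

# Illustrative sketch of reduction-object processing (not the MATE-HC API).
# Each node folds its partition into a reduction object with a commutative,
# associative function; the partial results are then globally reduced.

def local_reduction(partition, reduce_fn, init):
    """Fold one data partition into a reduction object."""
    robj = init
    for element in partition:
        robj = reduce_fn(robj, element)  # update the intermediate state
    return robj

def global_reduction(robjs, reduce_fn):
    """Combine the per-node reduction objects into the final result."""
    return reduce(reduce_fn, robjs)

# The example from the slide: three partitions summed with (+).
partitions = [[3, 5, 8, 4, 1], [3, 5, 2, 6, 7], [9, 4, 2, 4, 8]]
partials = [local_reduction(p, lambda a, b: a + b, 0) for p in partitions]
print(partials)                                        # [21, 23, 27]
print(global_reduction(partials, lambda a, b: a + b))  # 71
```

Because the reduction function is commutative and associative, partitions can be reduced in any order, which is what removes the sorting and grouping overheads.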

ANL 2014 - Chicago

Experiments

• 2 geographically distributed clusters
  – Cloud: EC2 instances running in Virginia (22 nodes x 8 cores)
  – Local: campus cluster in Columbus, OH (150 nodes x 8 cores)
• 3 applications with 120GB of data
  – KMeans: k=1000; KNN: k=1000
  – PageRank: 50x10 links w/ 9.2x10 edges
• Goals:
  – Evaluating the system overhead with different job distributions
  – Evaluating the scalability of the system

10

ANL 2014 - Chicago

System Overhead: K-Means

11

Env (local/EC2 %)   Global Reduction   Idle Time (local)   Idle Time (EC2)   Total Slowdown    Stolen # Jobs (of 960)
50/50               0.067              0                   93.871            20.430 (0.5%)     0
33/67               0.066              0                   31.232            142.403 (5.9%)    128
17/83               0.066              0                   25.101            243.31 (10.4%)    240

ANL 2014 - Chicago

Scalability: K-Means

12

ANL 2014 - Chicago

Summary

• MATE-HC is a data-intensive middleware developed for Hybrid Cloud

• Our results show that
  – Inter-cluster communication overhead is low
  – Job distribution is important
  – Overall slowdown is modest
  – The proposed system is scalable

13

ANL 2014 - Chicago

Outline

• Data-Intensive Processing
  – Programmability of large-scale applications
  – Transparent data access and analysis
• Meeting User Constraints
  – Enabling cloud bursting
• Minimizing Storage and I/O Cost
  – Domain-specific compression
  – In-situ and in-transit data analysis

14

MATE-HC: Map-Reduce with AlternaTE API over Hybrid Cloud

Dynamic Resource Allocation Framework for Cloud Bursting

Compression Methodology and System for Large-Scale Applications

ANL 2014 - Chicago

Dynamic Resource Allocation for Cloud Bursting

• Performance of cloud resources and the workload vary
  – Problems:
    • Extended execution times
    • Unable to meet user constraints
  – Cloud resources can dynamically scale
• Cloud Bursting
  – In-house resources: base workload
  – Cloud resources: adapt to performance requirements
• Dynamic Resource Allocation Framework
  – A model for capturing "Time" and "Cost" constraints with cloud bursting

15

ANL 2014 - Chicago

System Components

• Local cluster and cloud
• MATE-HC processing structure
• Pull-based job distribution
• Head node
  – Coarse-grained job assignment
  – Consideration of locality
• Master node
  – Fine-grained job assignment
• Job stealing
  – Remote data processing

16

ANL 2014 - Chicago

Resource Allocation Framework

[Figure: model equations]

• Estimate the required time for local cluster processing
• Estimate the required time for cloud cluster processing
• Estimate the remaining # of jobs after local jobs are consumed, using the ratio of local computational throughput in the system
• All variables can be profiled during execution, except the estimated # of stolen jobs (a sketch of the model follows below)

17
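The transcript loses the slide's equations, so the following is only a speculative sketch of how the annotated quantities could fit together; every name and the exact formula are mine, not the framework's:

```python
# Speculative sketch of the time-estimation model described on this slide.
# Only the relationships come from the slide text: local/cloud processing
# times are estimated from profiled throughputs, and the number of jobs
# stolen by the cloud must itself be estimated.

def estimate_times(jobs_local, jobs_cloud, tput_local, tput_cloud):
    """Estimate remaining processing time for each cluster (jobs / jobs-per-sec)."""
    # Ratio of local computational throughput in the system
    local_ratio = tput_local / (tput_local + tput_cloud)
    # Estimated # of jobs left over (stolen) after local jobs are consumed
    total = jobs_local + jobs_cloud
    est_stolen = max(0.0, jobs_local - local_ratio * total)
    t_local = (jobs_local - est_stolen) / tput_local
    t_cloud = (jobs_cloud + est_stolen) / tput_cloud
    return t_local, t_cloud, est_stolen

# Example: 960 jobs split 50/50, local cluster twice as fast as the cloud.
print(estimate_times(480, 480, tput_local=2.0, tput_cloud=1.0))
```

All inputs except the stolen-job estimate can be refreshed from profiling during execution, which is what makes the feedback-driven (de)allocation below possible.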

ANL 2014 - Chicago

Execution of Resource Allocation Framework

• Head node
  – Evaluates profiled information
  – Estimates the # of cloud instances before each job assignment
  – Informs master nodes
• Master node
  – One per cluster
  – Collects profile information at job request time
  – (De)allocates resources
• Slave nodes
  – Request and consume jobs

18

ANL 2014 - Chicago

Experimental Setup

• Two applications
  – KMeans (520GB): Local=104GB; Cloud=416GB
  – PageRank (520GB): Local=104GB; Cloud=416GB
• Local cluster: max. 16 nodes x 8 cores = 128 cores
• Cloud resources: max. 16 nodes x 8 cores = 128 cores
• Evaluation of the model
  – Local nodes are dropped during execution
  – Observed how the system adapts

19

ANL 2014 - Chicago

KMeans – Time Constraint

# Local instances: 16 (fixed); # Cloud instances: max 16 (varies); Local: 104GB, Cloud: 416GB

• The system fails to meet the time constraint only when the max. # of cloud instances is reached
• All other configurations meet the time constraint with a <1.5% error rate

20

ANL 2014 - Chicago

KMeans – Cloud Bursting

• 4 local nodes are dropped …
• After 25% and 50% of the time constraint has elapsed, the error rate is <1.9%
• After 75% of the time constraint has elapsed, the error rate is <3.6%
• Reason for the higher error rate: shorter time to profile the new environment

# Local instances: 16 (fixed); # Cloud instances: max 16 (varies); Local: 104GB, Cloud: 416GB

21

ANL 2014 - Chicago

Summary

• MATE-HC: MapReduce-style processing
  – Federated resources
• Developed a resource allocation model
  – Based on a feedback mechanism
  – Time and cost constraints
• Two data-intensive applications (KMeans, PageRank)
  – Error rate for time: <3.6%
  – Error rate for cost: <1.2%

22

ANL 2014 - Chicago

Outline

• Data-Intensive Processing
  – Programmability of large-scale applications
  – Transparent data access and analysis
• Meeting User Constraints
  – Enabling cloud bursting
• Minimizing Storage and I/O Cost
  – Domain-specific compression
  – In-situ and in-transit data analysis

23

MATE-HC: Map-Reduce with AlternaTE API over Hybrid Cloud

Dynamic Resource Allocation Framework for Cloud Bursting

Compression Methodology and System for Large-Scale Applications

ANL 2014 - Chicago

Data Management using Compression

• Generic compression algorithms
  – Good for low-entropy sequences of bytes
  – Scientific datasets are hard to compress
    • Floating-point numbers: exponent and mantissa
    • The mantissa can be highly entropic
• Using compression is challenging
  – Suitable compression algorithms
  – Utilization of available resources
  – Integration of compression algorithms

24

ANL 2014 - Chicago

Compression Methodology

• Common properties of scientific datasets
  – Multidimensional arrays
  – Consist of floating-point numbers
  – Relationship between neighboring values
• Domain-specific solutions can help
• Approach:
  – Prediction-based differential compression (see the sketch below)
    • Predict the values of neighboring cells
    • Store the difference

25
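A minimal sketch of the prediction-based idea, assuming a simple previous-neighbor predictor and XOR residuals over the raw float bits; the methodology's actual predictor and encoding may differ:

```python
import struct

# Sketch of prediction-based differential compression for a 1-D float array.
# Assumption: each value is predicted from its left neighbor; we store the XOR
# of the IEEE-754 bit patterns, which is mostly leading zeros when neighboring
# values are close, and therefore compresses well afterwards.

def float_bits(x):
    return struct.unpack('<Q', struct.pack('<d', x))[0]

def bits_float(b):
    return struct.unpack('<d', struct.pack('<Q', b))[0]

def encode(values):
    prev = 0  # bit pattern of the previous (predicting) value
    residuals = []
    for v in values:
        bits = float_bits(v)
        residuals.append(bits ^ prev)  # small residual when prediction is good
        prev = bits
    return residuals

def decode(residuals):
    prev = 0
    out = []
    for r in residuals:
        bits = r ^ prev
        out.append(bits_float(bits))
        prev = bits
    return out

temps = [288.15, 288.17, 288.16, 288.20]  # neighboring cells are highly related
assert decode(encode(temps)) == temps      # lossless round trip
```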

ANL 2014 - Chicago

Example: GCRM Temperature Variable Compression

• E.g., a temperature record: the values of neighboring cells are highly related
• X' table (after prediction)
• X'' compressed values
  – 5 bits for prediction + difference
• Lossless and lossy compression
• Fast, with good compression ratios

26

ANL 2014 - Chicago

Compression Framework

• Improve end-to-end application performance
• Minimize the application I/O time
  – Pipelining I/O and (de)compression operations
• Hide computational overhead
  – Overlapping application computation with the compression framework
• Easy implementation of compression algorithms
• Easy integration with applications
  – Similar API to POSIX I/O

27

ANL 2014 - Chicago

A Compression Framework for Data Intensive Applications

Chunk Resource Allocation (CRA) Layer
• Initialization of the system
• Generates chunk requests and enqueues them for processing
• Converts original offset and data-size requests into compressed ones

Parallel Compression Engine (PCE)
• Applies encode()/decode() functions to chunks
• Manages an in-memory cache with informed prefetching
• Creates I/O requests

Parallel I/O Layer (PIOL)
• Creates parallel chunk requests to the storage medium
• Each chunk request is handled by a group of threads
• Provides an abstraction for different data transfer protocols

(a usage sketch of the POSIX-like interface follows below)

28
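A hedged sketch of what the POSIX-like interface over chunked compression could look like, using zlib as a stand-in codec; none of these names are the framework's actual API:

```python
import zlib

# Hypothetical sketch: a POSIX-like read/seek interface over chunked
# compression. The CRA role: translate uncompressed offsets to chunks;
# the PCE role: (de)compress chunks. Parallel I/O (PIOL) is omitted.

CHUNK = 64 * 1024  # uncompressed chunk size (illustrative)

def compress_chunks(data):
    """Split data into fixed-size chunks and compress each one independently."""
    return [zlib.compress(data[i:i + CHUNK]) for i in range(0, len(data), CHUNK)]

class CompressedReader:
    def __init__(self, chunks):
        self.chunks, self.offset = chunks, 0

    def seek(self, offset):
        self.offset = offset  # offsets are in uncompressed bytes, as with POSIX

    def read(self, size):
        out = bytearray()
        while size > 0 and self.offset < len(self.chunks) * CHUNK:
            idx, within = divmod(self.offset, CHUNK)  # offset -> chunk mapping
            piece = zlib.decompress(self.chunks[idx])[within:within + size]
            out += piece
            self.offset += len(piece)
            size -= len(piece)
        return bytes(out)

data = bytes(range(256)) * 1024
r = CompressedReader(compress_chunks(data))
r.seek(70000)                                 # lands inside the second chunk
assert r.read(10) == data[70000:70010]        # reads across the chunk boundary
```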

ANL 2014 - Chicago

Integration with a Data-Intensive Computing System

• Remote data processing
  – Sensitive to I/O bandwidth
• Processes data in…
  – the local cluster
  – the cloud
  – or both (hybrid cloud)

29

ANL 2014 - Chicago

Experimental Setup

• Two datasets
  – GCRM: 375GB (L:270 + R:105)
  – NPB: 237GB (L:166 + R:71)
• 16 x 8 cores (Intel Xeon 2.53GHz)
• Storage of datasets
  – Lustre FS (14 storage nodes)
  – Amazon S3 (Northern Virginia)
• Compression algorithms
  – CC, FPC, LZO, bzip, gzip, lzma
• Applications: AT, MMAT, KMeans

30

ANL 2014 - Chicago

Performance of MMAT

31

[Figure: execution time (secs) on Local, Remote, and Hybrid for Original, LZO, and CC; and a Read/Decompression/Reduction breakdown for Original vs. CC on Local and Remote]

Speedups:
       Local   Remote   Hybrid
CC     1.63    1.90     1.85
LZO    1.04    1.24     1.14

Compression ratios:
CC     51.68% (186GB)
LZO    20.40% (299GB)

Breakdown of performance:
• Overhead (Local): 15.41%
• Read speedup: 1.96

I/O throughput (128np), GB/sec:
        Orig.   CC
Local   1.62    3.21
Remote  0.1     0.19

ANL 2014 - Chicago

Lossy Compression (MMAT)

32

[Figure: execution time (sec) on Local, Remote, and Hybrid for CC, Lossy 2e, and Lossy 4e]

Speedups:
              Local   Remote   Hybrid
2e vs. CC     1.07    1.18     1.09
4e vs. CC     1.13    1.43     1.18
4e vs. orig.  1.76    2.41     2.18

Compression ratios:
Lossless  51.68%
2e        56.88% (162GB)
4e        62.93% (139GB)

Lossy compression:
• #e: # of dropped bits
• Error bound: 5x10^-5
(see the sketch below)
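A sketch of the bit-dropping step, assuming IEEE-754 doubles and that "#e" counts zeroed low-order mantissa bits; the real scheme's exact bit layout is not given in the slides:

```python
import struct

# Sketch of lossy preprocessing by zeroing low-order mantissa bits of a
# float64. Assumption: dropping more bits improves compressibility (longer
# runs of zeros in the residuals) at the price of a bounded error.

def drop_bits(x, e):
    """Zero out the e least-significant mantissa bits of a float64."""
    bits = struct.unpack('<Q', struct.pack('<d', x))[0]
    mask = ~((1 << e) - 1) & 0xFFFFFFFFFFFFFFFF
    return struct.unpack('<d', struct.pack('<Q', bits & mask))[0]

x = 288.123456789
for e in (8, 16, 24):
    approx = drop_bits(x, e)
    print(e, approx, abs(approx - x))  # error grows as more bits are dropped
```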

ANL 2014 - Chicago

Summary

• Management and analysis of scientific datasets are challenging
  – Generic compression algorithms are inefficient for scientific datasets
• Proposed a compression framework and methodology
  – Domain-specific compression algorithms are fast and space efficient
    • 51.68% compression ratio
    • 53.27% improvement in execution time
  – Easy plug-and-play of compression algorithms
  – Integration of the proposed framework and methodology with a data analysis middleware

33

ANL 2014 - Chicago

Outline

• Data-Intensive Processing
  – Programmability of large-scale applications
  – Transparent data access and analysis
• Meeting User Constraints
  – Enabling cloud bursting
• Minimizing Storage and I/O Cost
  – Domain-specific compression
  – In-situ and in-transit data analysis

34

MATE-HC: Map-Reduce with AlternaTE API over Hybrid Cloud

Dynamic Resource Allocation Framework for Cloud Bursting

Compression Methodology and System for Large-Scale Applications

ANL 2014 - Chicago

In-Situ and In-Transit Analysis

• Compression can ease data management
  – But may not always be sufficient
• In-situ data analysis
  – Co-locate the data source and analysis code
  – Data analysis during data generation
• In-transit data analysis
  – Remote resources are utilized
  – Generated data is forwarded to "staging nodes"

35

ANL 2014 - Chicago

In-Situ and In-Transit Data Analysis

• Significant reduction in generated dataset size
  – Noise elimination, data filtering, stream mining…
  – Timely insights
• Parallel data analysis
  – MATE-Stream
• Dynamic resource allocation and load balancing
  – Hybrid data analysis
  – Both in-situ and in-transit

36

ANL 2014 - Chicago


Parallel In-Situ Data Analysis

37

[Figure: a data source feeds a dispatcher (Disp); local reduction (LR) processes update reduction objects Robj[...], which feed a local combination object LRobj[...]]

Data Generation
• Scientific instruments, simulations, etc.
• (Un)bounded data

Local Reduction
• Filtering, stream mining
• Data reduction
• Continuous local reduction

Local Combination
• Intermediate results
• Timely insights
• Continuous global reduction

(a sketch of this pipeline follows below)
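A minimal sketch of the pipeline above, with a random generator standing in for the (un)bounded data source and a threshold filter for local reduction; none of this is MATE-Stream code:

```python
import random

# Illustrative sketch of the in-situ pipeline: data generation -> local
# reduction (filtering + continuous reduction) -> local combination.
# The data source, filter, and reduction are stand-ins, not MATE-Stream.

def data_source(n=10_000):
    """(Un)bounded data generation: here, a bounded random stream."""
    for _ in range(n):
        yield random.gauss(0.0, 1.0)

def local_reduction(stream, threshold=3.0):
    """Filter noise and continuously reduce: emit (count, sum) snapshots."""
    count, total = 0, 0.0
    for x in stream:
        if abs(x) < threshold:           # noise elimination / data filtering
            count, total = count + 1, total + x
            if count % 1000 == 0:
                yield (count, total)     # timely intermediate results

def local_combination(snapshots):
    """Combine intermediate results into a running global view."""
    for count, total in snapshots:
        print(f"after {count} samples, running mean = {total / count:.4f}")

local_combination(local_reduction(data_source()))
```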

ANL 2014 - Chicago


Elastic In-Situ Data Analysis

38

[Figure: the dispatcher (Disp) dynamically spawns a second group of local reduction processes, each with its own Robj[...], feeding the local combination object LRobj[...]]

Insufficient resource utilization
• Dynamically extend resources
• New local reduction processes

ANL 2014 - Chicago


Elastic In-Situ and In-Transit Data Analysis

39

[Figure: two nodes (N0, N1), each with a dispatcher and a local combination object LRobj[...]; a staging node holds a global reduction object GRobj[...]]

A staging node is set up
• Generated data is forwarded to it
• Reduction proceeds in two steps: 1. local combination, 2. global combination

ANL 2014 - Chicago

Future Directions

• Scientific applications are difficult to modify
  – Integration with existing data sources
  – GridFTP, (P)NetCDF, HDF5, etc.
• Data transfer is expensive (especially for in-transit)
  – Utilization of advanced network technologies
  – Software-Defined Networking (SDN)
• Long-running nature of large-scale applications
  – Failures are inevitable
  – Exploit features of the processing structure

40

ANL 2014 - Chicago

Conclusions

• Data-intensive applications and instruments can easily exhaust local resources

• Hybrid cloud can provide additional resources
• Challenges: transparent data access and processing; meeting user constraints; minimizing I/O and storage cost
• MATE-HC: transparent and efficient data processing on hybrid cloud
• Developed a "dynamic resource allocation framework" and integrated it with MATE-HC
  – Time- and cost-sensitive data processing
• Proposed a "compression methodology and system" to minimize storage cost and the I/O bottleneck
• Design of "in-situ and in-transit data analysis" (ongoing work)

41

ANL 2014 - Chicago

Thanks for your attention!

42

ANL 2014 - Chicago

MATE-EC2 Design

• Data organization
  – Three levels: buckets, chunks, and units
  – Metadata information
• Chunk retrieval (see the sketch below)
  – Threaded data retrieval
  – Selective job assignment
• Load balancing and handling heterogeneity
  – Pooling mechanism

43
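As referenced above, a sketch of threaded chunk retrieval using ranged S3 GETs; the bucket, key, and chunk layout are hypothetical, not MATE-EC2's actual data organization:

```python
import concurrent.futures
import boto3

# Illustrative sketch of threaded chunk retrieval from S3 (not MATE-EC2 code).
# Assumption: a bucket holds objects (keys) split into fixed-size chunks that
# a pool of threads fetches with ranged GET requests.

s3 = boto3.client("s3", region_name="us-east-1")  # Northern Virginia
CHUNK = 8 * 1024 * 1024  # 8MB, the sweet spot in the chunk-size experiments

def fetch_chunk(bucket, key, index):
    start = index * CHUNK
    resp = s3.get_object(Bucket=bucket, Key=key,
                         Range=f"bytes={start}-{start + CHUNK - 1}")
    return index, resp["Body"].read()

def fetch_object(bucket, key, n_chunks, n_threads=16):
    """Retrieve all chunks of one object with a thread pool, in order."""
    with concurrent.futures.ThreadPoolExecutor(n_threads) as pool:
        futures = [pool.submit(fetch_chunk, bucket, key, i)
                   for i in range(n_chunks)]
        parts = dict(f.result() for f in futures)
    return b"".join(parts[i] for i in range(n_chunks))

# Hypothetical usage; bucket and key names are invented:
# data = fetch_object("my-dataset-bucket", "kmeans/points.bin", n_chunks=64)
```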

ANL 2014 - Chicago

MATE-EC2 vs. EMR

44

• KMeans speedups vs. combine: 3.54 – 4.58
• PageRank speedups vs. combine: 4.08 – 7.54

ANL 2014 - Chicago

Different Chunk Sizes

• KMeans, 1 retrieval thread
• Performance increase
  – 128KB vs. >8MB chunks
  – 2.07 to 2.49

45

ANL 2014 - Chicago

K-Means (Data Retrieval)

• Fig 1: 16 Retrieval Threads– 8M vs. others speedup: 1.13-1.30

• Fig. 2: 128M Chunk Size– 1 Thread vs. others speedup: 1.37-1.90

46

Fig. 1 Fig. 2

• Dataset: 8.2GB

ANL 2014 - Chicago

Job Assignment

47

• KMeans speedups: 1.01 (8MB) and 1.10-1.14 (others)
• PCA (2 iterations) speedups: 1.19-1.68

ANL 2014 - Chicago

Heterogeneous Conf.

48

Overheads
• KMeans: 1%
• PCA: 1.1%, 7.4%, 11.7%

ANL 2014 - Chicago

KMeans – Cost Constraint

• The system meets the cost constraints with a <1.1% error rate
• When the maximum # of cloud instances is allocated, the error rate is again <1.1%
• The system tries to minimize the execution time under the provided cost constraint

49

ANL 2014 - Chicago

Prefetching and In-Memory Cache

• Overlapping application layer computation with I/O

• Reusability of already-accessed data is small
• Prefetching and caching of prospective chunks
  – Default is LRU
  – The user can analyze access history and provide a prospective chunk list (see the sketch below)
• The cache uses a row-based locking scheme for efficient consecutive chunk requests

50

Informed Prefetching

prefetch(…)
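As referenced above, a sketch of an LRU chunk cache with a prefetch() hint; locking (the row-based scheme mentioned above) is omitted for brevity, and all names are illustrative:

```python
import collections
import concurrent.futures

# Illustrative sketch of an LRU chunk cache with informed prefetching.
# Default policy is LRU; a user-supplied "prospective chunk list" lets the
# cache fetch chunks ahead of the application's reads, overlapping I/O.

class PrefetchingCache:
    def __init__(self, fetch_fn, capacity=32):
        self.fetch_fn = fetch_fn                  # reads one chunk from storage
        self.capacity = capacity
        self.cache = collections.OrderedDict()    # chunk_id -> data, LRU order
        self.pending = {}                         # chunk_id -> in-flight future
        self.pool = concurrent.futures.ThreadPoolExecutor(4)

    def prefetch(self, chunk_ids):
        """Hint: start fetching prospective chunks in the background."""
        for cid in chunk_ids:
            if cid not in self.cache and cid not in self.pending:
                self.pending[cid] = self.pool.submit(self.fetch_fn, cid)

    def get(self, cid):
        if cid in self.cache:
            self.cache.move_to_end(cid)           # refresh LRU position
        else:
            fut = self.pending.pop(cid, None)
            self.cache[cid] = fut.result() if fut else self.fetch_fn(cid)
            while len(self.cache) > self.capacity:
                self.cache.popitem(last=False)    # evict least recently used
        return self.cache[cid]

cache = PrefetchingCache(lambda cid: b"chunk-%d" % cid)
cache.prefetch([1, 2, 3])   # informed hint, e.g. from access-history analysis
print(cache.get(2))         # likely already in flight when requested
```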

ANL 2014 - Chicago

Performance of KMeans

51

[Figure: execution time (sec) on Local, Remote, and Hybrid for FPC, 2P-4IO, and 4P-8IO; and for Original vs. FPC]

Speedups:
                    Local   Remote   Hybrid
FPC                 0.75    1.30     1.12

Speedups w/ multithreading:
                    Local   Remote   Hybrid
2P - 4IO            1.25    1.17     1.19
4P - 8IO            1.37    1.16     1.21
4P - 8IO vs. Orig.  1.03    1.51     1.36

• NPB dataset
• Compression ratio: 24.01% (180GB)
• More computation
  – More opportunity to overlap fetching and decompression
