Advanced service-based data analytics: concepts and designs Hong-Linh Truong Distributed Systems Group, Vienna University of Technology [email protected] dsg.tuwien.ac.at/staff/truong 1 ASE Summer 2014 Advanced Services Engineering, Summer 2014, Lecture 7
TUW-ASE-Summer 2014: Advanced service-based data analytics: concepts and designs

Jan 26, 2015

Page 1: TUW-ASE-Summer 2014: Advanced service-based data analytics: concepts and designs

Advanced service-based data analytics:

concepts and designs

Hong-Linh Truong

Distributed Systems Group,

Vienna University of Technology

[email protected]

dsg.tuwien.ac.at/staff/truong

ASE Summer 2014 1

Advanced Services Engineering,

Summer 2014, Lecture 7

Page 2: TUW-ASE-Summer 2014: Advanced service-based data analytics: concepts and designs

Outline

Principles of elasticity for advanced service-based data analytics

Data analytics within a single system

Data analytics across multiple systems

Composable cost evaluation

Page 3: TUW-ASE-Summer 2014: Advanced service-based data analytics: concepts and designs

PRINCIPLES OF ELASTICITY FOR DATA ANALYTICS

Page 4: TUW-ASE-Summer 2014: Advanced service-based data analytics: concepts and designs

Advanced service-based data analytics (1)

[Figure labels: Cities, e.g. including 10000+ buildings and 1000000+ sensors; Infrastructure/Internet of Things; Organization-specific boundary; Internet/public cloud boundary; Near realtime analytics; Predictive data analytics; Visual Analytics; Enterprise Resource Planning; Emergency Management; Tracking/Logistics; Infrastructure Monitoring]

Page 5: TUW-ASE-Summer 2014: Advanced service-based data analytics: concepts and designs

Advanced service-based data analytics (2)

Soil moisture analysis for Sentinel-1

A lot of input data (L0): ~2.7 TB per day

A lot of results (L1, L2): e.g., L1 has ~140 MB per day for a grid of 1 km x 1 km

Data-as-a-Service and Platform-as-a-Service in clouds

Michael Hornacek, Wolfgang Wagner, Daniel Sabel, Hong-Linh Truong, Paul Snoeij, Thomas Hahmann, Erhard Diedrich, Marcela Doubkova, Potential for High Resolution Systematic Global Surface Soil Moisture Retrieval Via Change Detection Using Sentinel-1, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, April, 2012
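To get a feel for these volumes, the daily figures can be converted into a sustained rate. The following is a minimal sketch, assuming decimal units (1 TB = 10^12 bytes) and data arriving evenly over 24 hours; the slide states neither.

```python
# Back-of-the-envelope rates for the Sentinel-1 soil moisture figures.
# Assumptions (not from the slide): decimal units and evenly spread arrival.
SECONDS_PER_DAY = 24 * 60 * 60

l0_bytes_per_day = 2.7e12   # ~2.7 TB of L0 input per day
l1_bytes_per_day = 140e6    # ~140 MB of L1 results per day (1 km x 1 km grid)


def sustained_rate_mb_per_s(bytes_per_day: float) -> float:
    """Average rate in MB/s needed to keep up with a daily volume."""
    return bytes_per_day / SECONDS_PER_DAY / 1e6


l0_rate = sustained_rate_mb_per_s(l0_bytes_per_day)
reduction = l0_bytes_per_day / l1_bytes_per_day

print(f"L0 ingest: {l0_rate:.2f} MB/s sustained")
print(f"L0 -> L1 volume reduction: about {reduction:,.0f}x")
```

Under these assumptions the input side needs roughly 31 MB/s around the clock, while the L1 product is several orders of magnitude smaller. This is one reason to run the analytics close to where the input data is stored.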

Page 6: TUW-ASE-Summer 2014: Advanced service-based data analytics: concepts and designs

Advanced service-based data analytics -- fundamental concepts

Applications: Part A, Part B, ..., Part N

System infrastructures: Cluster, Grid, Local Cloud, Public cloud/Sky

Domain 1, Domain 2, ..., Domain n

Page 7: TUW-ASE-Summer 2014: Advanced service-based data analytics: concepts and designs

Design questions

Which system infrastructures are used?

What are the functions of the units?

Which interfaces are suitable for units?

Which programming models are used within units?

Which fundamental units should be used?

How do different units interact?

Which non-functional parameters are important and how can they be measured?

Part = a (composite) service unit

Page 8: TUW-ASE-Summer 2014: Advanced service-based data analytics: concepts and designs

Fundamental concepts – system infrastructure unit

System infrastructures: Cloud (Software-based Cloud system, Human-based Cloud system), Cluster, Grid, High Performance Server

Page 9: TUW-ASE-Summer 2014: Advanced service-based data analytics: concepts and designs

Fundamental concepts – unit functions

Function: Front-end/Presentation, Data Analytics Service, Visualization Service, Middleware (Enterprise Service Bus, Publish/Subscription, Messaging/Queuing, Data Transfer)

Page 10: TUW-ASE-Summer 2014: Advanced service-based data analytics: concepts and designs

Fundamental concepts – programming model within units

Programming model: MapReduce, MPI, Parallel Database, Workflow, Other solutions

Page 11: TUW-ASE-Summer 2014: Advanced service-based data analytics: concepts and designs

Fundamental concepts – interfaces between units

Interface

Standard: REST, SOAP

APIs: Specific APIs, Standard APIs (e.g. OpenStack)

Interaction: Pull, Push

Page 12: TUW-ASE-Summer 2014: Advanced service-based data analytics: concepts and designs

Fundamental concepts – services and data concerns

Service and data concerns

Data concerns: Quality of data, Pricing, Data Right, ...

Service Concerns: QoS, Pricing, ...
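The dimensions introduced on these slides (system infrastructure, function, programming model, interface, interaction, concerns) can be captured as a small data structure describing one service unit. The following is a minimal sketch; the class names, field names and example values are illustrative assumptions, not something the lecture defines.

```python
from dataclasses import dataclass, field
from enum import Enum


class Infrastructure(Enum):
    CLOUD = "cloud"
    CLUSTER = "cluster"
    GRID = "grid"
    HIGH_PERFORMANCE_SERVER = "high performance server"


class Interaction(Enum):
    PULL = "pull"
    PUSH = "push"


@dataclass
class ServiceUnit:
    """One 'part' of an analytics application, described along the slide dimensions."""
    name: str
    function: str                  # e.g. "Data Analytics Service", "Middleware"
    infrastructure: Infrastructure
    programming_model: str         # e.g. "MapReduce", "MPI", "Workflow"
    interface: str                 # e.g. "REST", "SOAP", "Specific API"
    interaction: Interaction
    concerns: dict = field(default_factory=dict)  # e.g. QoS, pricing, data rights


# Hypothetical unit, for illustration only.
unit = ServiceUnit(
    name="soil-moisture-analytics",
    function="Data Analytics Service",
    infrastructure=Infrastructure.CLUSTER,
    programming_model="MPI",
    interface="REST",
    interaction=Interaction.PULL,
    concerns={"pricing": "per core-hour"},
)
print(unit.name, unit.infrastructure.value, unit.interaction.value)
```

Writing the description down this way makes the design questions concrete: each field has to be answered for every part of the application.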

Page 13: TUW-ASE-Summer 2014: Advanced service-based data analytics: concepts and designs

Complex dependencies in (big) data analytics

More data -> more computational resources (e.g. more VMs)

More types of data -> more computational models -> more analytics processes

Change quality of results:

Change quality of data

Change response time

Change cost

Change types of result (form of the data output, e.g. tree, visual, story, etc.)

[Figure labels: Data (Data x, Data y, Data z); Computational Model; Analytics Process; Analytics Result; Quality of Result]

Hong-Linh Truong, Schahram Dustdar, "Principles of Software-defined Elastic Systems for Big Data Analytics", (c) IEEE Computer Society, IEEE International Workshop on Software Defined Systems, 2014 IEEE International Conference on Cloud Engineering (IC2E 2014), Boston, Massachusetts, USA, 10-14 March 2014

Page 14: TUW-ASE-Summer 2014: Advanced service-based data analytics: concepts and designs

Complex dependencies in (big) data analytics

Elasticity principles can be used to support this!

Page 15: TUW-ASE-Summer 2014: Advanced service-based data analytics: concepts and designs

Elasticity Principles: Elasticity of data and computational models

Multiple types of objects from different sources with complex dependencies, relevancies, and quality

Different data and computational models for the same analytics subject

New analytics subjects can be defined and analytics goals can be changed

Decide/select/define/compose not only computational models for analytics subjects but also data models based on existing ones

Management and modeling of elasticity of data and computational models during the analytics

Page 16: TUW-ASE-Summer 2014: Advanced service-based data analytics: concepts and designs

Elasticity Principles: Elasticity of data resources

Data provided, managed and shared by different providers

Data associated with different concerns (cost, quality of data, privacy, contract, etc.)

Static data, open data, data-as-a-service, opportunistic data (from sensors and human sensing)

Not just centralized big data and total data ownership

Data resources can be taken into account in an elastic manner: similar to VMs, based on their quality, relevancy, pricing, etc.
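Treating data resources elastically, like VMs, means adding or dropping them according to quality, relevancy and price. The following is a minimal sketch of one way to do this: a greedy heuristic under a budget. The scoring rule, the resource names and all numbers are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataResource:
    name: str
    quality: float    # 0..1, assumed quality-of-data score
    relevancy: float  # 0..1, assumed relevancy to the analytics subject
    price: float      # assumed cost of using the resource, arbitrary units


def select_resources(candidates, budget, min_quality=0.0):
    """Greedily pick resources with the best (quality * relevancy) per unit price.

    A simple heuristic for illustration; it is not guaranteed to be optimal.
    """
    eligible = [r for r in candidates if r.quality >= min_quality]
    # Free resources (price 0) rank first; otherwise rank by value per price.
    ranked = sorted(
        eligible,
        key=lambda r: float("inf") if r.price == 0 else r.quality * r.relevancy / r.price,
        reverse=True,
    )
    chosen, spent = [], 0.0
    for r in ranked:
        if spent + r.price <= budget:
            chosen.append(r)
            spent += r.price
    return chosen, spent


candidates = [
    DataResource("open-data-portal", quality=0.6, relevancy=0.7, price=0.0),
    DataResource("commercial-daas", quality=0.9, relevancy=0.9, price=8.0),
    DataResource("sensor-feed", quality=0.5, relevancy=0.8, price=3.0),
    DataResource("crowd-reports", quality=0.3, relevancy=0.6, price=1.0),
]
chosen, spent = select_resources(candidates, budget=10.0, min_quality=0.4)
print([r.name for r in chosen], spent)
```

Re-running the selection when the budget, the quality threshold or the offered prices change is what makes the set of data resources elastic rather than fixed.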

Page 17: TUW-ASE-Summer 2014: Advanced service-based data analytics: concepts and designs

Elasticity Principles: Elasticity of humans and software as computing units

Human in the loop to solve analytics tasks that software cannot solve

Human-based compute units can be scaled up/down with different cost, availability, performance models

Human-based compute units + software-based compute units for executing computational models

Elasticity controls can also be performed by humans

Provisioning hybrid compute units in an elastic way for computational/data/network tasks as well as for monitoring/control tasks in the analytics process

Page 18: TUW-ASE-Summer 2014: Advanced service-based data analytics: concepts and designs

Elasticity Principles: Elasticity of quality of results

Definition of quality of results

Trade-offs of time, cost, quality of data, forms of output

Using quality of results to select suitable computational models, data resources, computing units

Multi-level control for the elasticity based on quality of results

Able to cope with changes in quality of data, performance, cost and types of results at runtime
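Using quality of results to select computational models, data resources and computing units can be read as a constrained choice among candidate configurations. The following is a minimal sketch, assuming each candidate comes with estimates of time, cost and result quality; the configuration names, the estimates and the "cheapest feasible" rule are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Configuration:
    name: str
    est_time_s: float    # estimated response time
    est_cost: float      # estimated cost, arbitrary units
    est_quality: float   # estimated quality of result, 0..1


@dataclass(frozen=True)
class QualityOfResult:
    max_time_s: float
    max_cost: float
    min_quality: float


def select_configuration(configs, qor: QualityOfResult) -> Optional[Configuration]:
    """Return the cheapest configuration satisfying the constraints, if any."""
    feasible = [
        c for c in configs
        if c.est_time_s <= qor.max_time_s
        and c.est_cost <= qor.max_cost
        and c.est_quality >= qor.min_quality
    ]
    return min(feasible, key=lambda c: c.est_cost, default=None)


configs = [
    Configuration("sampled-data/1-vm", est_time_s=60, est_cost=1.0, est_quality=0.70),
    Configuration("full-data/4-vms", est_time_s=300, est_cost=6.0, est_quality=0.90),
    Configuration("full-data/16-vms", est_time_s=90, est_cost=20.0, est_quality=0.90),
]

relaxed = QualityOfResult(max_time_s=600, max_cost=10.0, min_quality=0.85)
urgent = QualityOfResult(max_time_s=120, max_cost=25.0, min_quality=0.85)

print(select_configuration(configs, relaxed).name)
print(select_configuration(configs, urgent).name)
```

Tightening the time bound moves the choice from the 4-VM to the 16-VM configuration at a higher cost. Coping with such changes at runtime amounts to re-evaluating this selection whenever the requirements or the estimates change.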

Page 19: TUW-ASE-Summer 2014: Advanced service-based data analytics: concepts and designs

WE NEED TO START FROM DATA ANALYTICS WITHIN A SINGLE SYSTEM

Page 20: TUW-ASE-Summer 2014: Advanced service-based data analytics: concepts and designs

Data analytics within a single system

They are complex enough but do not meet all requirements

In a single domain

Tightly coupled computing infrastructures, e.g., in the same cloud

Computation and data are close

Several concerns can be by-passed

[Figure labels: Domain A; Data service unit; Data Analytics Unit]

Not always provisioned under the "Service Unit" model

Page 21: TUW-ASE-Summer 2014: Advanced service-based data analytics: concepts and designs

Data analytics within a single system

Some papers

1. Andrew Pavlo, Erik Paulson, Alexander Rasin, Daniel J. Abadi, David J. DeWitt, Samuel Madden, and Michael Stonebraker. 2009. A comparison of approaches to large-scale data analysis. In Proceedings of the 2009 ACM SIGMOD International Conference on Management of data (SIGMOD '09), Carsten Binnig and Benoit Dageville (Eds.). ACM, New York, NY, USA, 165-178. DOI=10.1145/1559845.1559865 http://doi.acm.org/10.1145/1559845.1559865

2. Leonardo Neumeyer, Bruce Robbins, Anish Nair, Anand Kesari: S4: Distributed Stream Computing Platform. ICDM Workshops 2010: 170-177

3. Jerry Chou, Mark Howison, Brian Austin, Kesheng Wu, Ji Qiang, E. Wes Bethel, Arie Shoshani, Oliver Rübel, Prabhat, and Rob D. Ryne. 2011. Parallel index and query for large scale data analysis. In Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC '11). ACM, New York, NY, USA, Article 30, 11 pages. DOI=10.1145/2063384.2063424 http://doi.acm.org/10.1145/2063384.2063424

4. Boduo Li, Edward Mazur, Yanlei Diao, Andrew McGregor, Prashant J. Shenoy: A platform for scalable one-pass analytics using MapReduce. SIGMOD Conference 2011: 985-996

5. Fabrizio Marozzo, Domenico Talia, Paolo Trunfio: A Cloud Framework for Parameter Sweeping Data Mining Applications. CloudCom 2011: 367-374

6. Yingyi Bu, Bill Howe, Magdalena Balazinska, Michael D. Ernst: HaLoop: Efficient Iterative Data Processing on Large Clusters. PVLDB 3(1): 285-296 (2010)

Page 22: TUW-ASE-Summer 2014: Advanced service-based data analytics: concepts and designs

Data analytics within a single system – some examples

Message Passing Interface (MPI) + Cluster-based File system

MapReduce + Google File System

Hadoop + HDFS

Dryad+LINQ

Parallel Database (SQL/NonSQL)

Yahoo S4

Workflow

A short, good overview in Chapter 6: Cloud Programming and Software Environments, Book: Distributed and Cloud Computing – from Parallel Processing to the Internet of Things, Kai Hwang, Geoffrey C. Fox and Jack J Dongarra, Morgan Kaufmann, 2012

Page 23: TUW-ASE-Summer 2014: Advanced service-based data analytics: concepts and designs

Discussion time

WHY SHOULD ANALYTICS UNITS BE "CLOSE" TO DATA UNITS?

WHICH CONCERNS COULD BE IGNORED IN SINGLE-SYSTEM DATA ANALYTICS?

WHICH ISSUES DO WE NEED TO CONSIDER WHEN OUR DATA UNITS ARE IN DIFFERENT SYSTEMS?

Page 24: TUW-ASE-Summer 2014: Advanced service-based data analytics: concepts and designs

Data analytics across multiple systems – design choice

Programming models for data analytics service

Data service units

Supporting middleware units

[Figure labels: Programming model; System Infrastructure; Interface]

Page 25: TUW-ASE-Summer 2014: Advanced service-based data analytics: concepts and designs

Data analytics across multiple systems – programming models (1)

Static data

[Figure labels: Local input data (Input data); Analytics on Servers/Cloud/Cluster: MapReduce/Hadoop, Workflow, MPI, Other solutions; Results (Output data)]

What are our design concerns?
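For static input data, MapReduce is one of the programming models listed on this slide. The following is a minimal single-process sketch of the map, shuffle and reduce phases in plain Python. It only illustrates the data flow and does not use Hadoop or any other framework; the records and function names are made up for illustration.

```python
from collections import defaultdict
from functools import reduce


def map_phase(records, mapper):
    """Apply the mapper to every input record, yielding (key, value) pairs."""
    for record in records:
        yield from mapper(record)


def shuffle(pairs):
    """Group values by key, as a framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    """Fold each key's values into one result."""
    return {key: reduce(reducer, values) for key, values in groups.items()}


# Hypothetical static input: (building, energy reading) records.
records = [("b1", 4), ("b2", 7), ("b1", 6), ("b3", 1), ("b2", 3)]


def mapper(record):
    building, reading = record
    yield building, reading


totals = reduce_phase(shuffle(map_phase(records, mapper)), lambda a, b: a + b)
print(totals)
```

In a real deployment the map and reduce calls run in parallel on many nodes and the shuffle moves data over the network, which is where concerns such as data locality and transfer cost enter the design.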

Page 26: TUW-ASE-Summer 2014: Advanced service-based data analytics: concepts and designs

Data analytics across multiple systems – programming models (2)

Near-realtime data

[Figure labels: Stockmarket, Social media, M2M (Input data); Analytics on Servers/Cloud/Cluster: Complex event processing, Stream data analysis, Other solutions; Results (Output data)]

What are our design concerns?
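Stream data analysis processes events as they arrive instead of waiting for a complete data set. The following is a minimal sketch of one common building block, a sliding-window average, written as a plain Python generator. The tick values are invented, and a production system would use a stream or complex event processing engine rather than this loop.

```python
from collections import deque
from typing import Iterable, Iterator


def sliding_average(stream: Iterable[float], window: int) -> Iterator[float]:
    """Yield the mean of the most recent `window` values after each new event."""
    if window <= 0:
        raise ValueError("window must be positive")
    recent = deque(maxlen=window)
    total = 0.0
    for value in stream:
        if len(recent) == window:
            total -= recent[0]   # this value is evicted by the append below
        recent.append(value)
        total += value
        yield total / len(recent)


# Hypothetical price ticks; in practice these would arrive from a feed.
ticks = [10.0, 12.0, 11.0, 15.0, 9.0]
averages = list(sliding_average(ticks, window=3))
print(averages)
```

Because the function keeps only the last `window` values, its memory use stays constant however long the stream runs. This is the property that distinguishes this model from the batch processing used for static data.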

Page 27: TUW-ASE-Summer 2014: Advanced service-based data analytics: concepts and designs

Data analytics across multiple systems – programming models (3)

Near-realtime data

[Figure labels: Big data (e.g., satellite images) (Input data); Analytics on Servers/Cloud/Cluster: MPI, Workflow, Other solutions; Results (Output data)]

What are our design concerns?

Page 28: TUW-ASE-Summer 2014: Advanced service-based data analytics: concepts and designs

Data analytics across multiple systems – data service units

Cluster file: Lustre, NFS, Hadoop File System, Google file system

Data Analytics Unit: read/write data

Interface

• Read/write data via direct, low-level I/O

System

• Cluster or cluster of clusters

• Can be very large

Programming model

• Usually parallel processing

Page 29: TUW-ASE-Summer 2014: Advanced service-based data analytics: concepts and designs

Data analytics across multiple systems – data service units

Storage-as-a-Service: Google Storage Service (REST API), Amazon S3 (SOAP/REST API)

Data Analytics Unit: exchanges commands and data with the storage service

Interface

• Direct data transfer via REST/SOAP APIs

System

• Analytics and storage are decoupled

Programming model

• May require middleware for data transfer

• Request via SOAP/REST

• Real data transfer done by external middleware

• A rich set of programming models can be used

Page 30: TUW-ASE-Summer 2014: Advanced service-based data analytics: concepts and designs

Data analytics across multiple systems – data service units

Database-as-a-Service

Technology: SkySQL, Amazon RDS, Microsoft SQL Azure, Clustrix DBaaS; MongoDB/MongoLab, Amazon DynamoDB, Amazon SimpleDB, Cloudant Data

Data Analytics Unit: sends queries to and receives data from the database service

Interface

• REST/SOAP APIs

• Mainly for commands and results

System

• Analytics unit and database are decoupled

• Database-as-a-service can be very large

Programming model

• Analytics can be done at both sides

• Analytics units can use any programming models

• Database-as-a-service can perform a lot of analytics

• Parallel database operations
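"Analytics can be done at both sides" is a concrete design choice: either the database computes the result and returns only that, or the analytics unit pulls the rows and computes locally. The following is a minimal sketch of both options. It uses Python's built-in `sqlite3` purely as a local stand-in for a database service; the table and values are invented for illustration.

```python
import sqlite3

# sqlite3 stands in for a remote database service; a real DBaaS would be
# reached over the network through its own driver or REST/SOAP API.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (sensor TEXT, value REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [("s1", 20.0), ("s1", 22.0), ("s2", 30.0), ("s2", 34.0), ("s2", 32.0)],
)

# Option 1: analytics on the database side. Only the query and the small
# result cross the interface.
db_side = dict(
    conn.execute("SELECT sensor, AVG(value) FROM readings GROUP BY sensor")
)

# Option 2: analytics on the analytics-unit side. All rows are transferred,
# then aggregated locally with whatever programming model the unit uses.
rows = conn.execute("SELECT sensor, value FROM readings").fetchall()
sums, counts = {}, {}
for sensor, value in rows:
    sums[sensor] = sums.get(sensor, 0.0) + value
    counts[sensor] = counts.get(sensor, 0) + 1
unit_side = {s: sums[s] / counts[s] for s in sums}

print(db_side, unit_side, len(rows))
conn.close()
```

Both options give the same averages, but the second moves every row across the interface. With large tables and a pay-per-transfer service, that difference shows up directly in response time and cost.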

Page 31: TUW-ASE-Summer 2014: Advanced service-based data analytics: concepts and designs

Data analytics across multiple systems – data service units

DaaS

Technology: Infochimps, Microsoft Azure, Xively, GNIP

Data Analytics Unit

Interface

• Data transfer can be uni- or bi-directional

• REST/SOAP APIs

System

• Both systems for DaaS and for analytics units can be very large

Programming model

• Can be any

Page 32: TUW-ASE-Summer 2014: Advanced service-based data analytics: concepts and designs

Middleware service unit for transferring large data -- GlobusOnline

Source: Bryce Allen, John Bresnahan, Lisa Childers, Ian Foster, Gopi Kandaswamy, Raj Kettimuthu, Jack Kordas, Mike Link, Stuart Martin, Karl Pickett, and Steven Tuecke. 2012. Software as a service for data scientists. Commun. ACM 55, 2 (February 2012), 81-88. DOI=10.1145/2076450.2076468 http://doi.acm.org/10.1145/2076450.2076468

Page 33: TUW-ASE-Summer 2014: Advanced service-based data analytics: concepts and designs

Middleware service unit for transferring large data -- ProxyWS

Spiros Koulouzis, Reginald Cushing, K. A. Karasavvas, Adam Belloum, Marian Bubak: Enabling Web Services to Consume and Produce Large Datasets. IEEE Internet Computing 16(1): 52-60 (2012)

Page 34: TUW-ASE-Summer 2014: Advanced service-based data analytics: concepts and designs

Middleware service units for messages/queuing

Advanced Message Queuing Protocol (AMQP)

Simple (or Streaming) Text Orientated Messaging Protocol (STOMP)

Specific protocols/APIs

Amazon SQS, StormMQ, RabbitMQ
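A messaging/queuing unit decouples the producer of data from its consumer: each side talks only to the queue. The following is a minimal sketch of that interaction. It uses Python's in-process `queue.Queue` as a stand-in for a broker; no AMQP or STOMP client is involved, and the messages are invented for illustration.

```python
import queue
import threading

# queue.Queue stands in for a message broker; a real deployment would talk
# to the broker through its client library (for example over AMQP or STOMP).
broker = queue.Queue()
STOP = object()  # sentinel telling the consumer to finish
results = []


def producer(readings):
    """Publish readings without knowing who consumes them or when."""
    for reading in readings:
        broker.put(reading)
    broker.put(STOP)


def consumer():
    """Consume messages at its own pace and keep a running maximum."""
    current_max = None
    while True:
        message = broker.get()
        if message is STOP:
            break
        current_max = message if current_max is None else max(current_max, message)
    results.append(current_max)


worker = threading.Thread(target=consumer)
worker.start()
producer([3, 9, 4, 7])
worker.join()
print(results)
```

The same shape carries over when producer and consumer live in different systems. The queue then also absorbs differences in speed and availability between the two sides.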

Page 35: TUW-ASE-Summer 2014: Advanced service-based data analytics: concepts and designs

SOME EXAMPLES OF COMPLEX DATA ANALYTICS SERVICE

Page 36: TUW-ASE-Summer 2014: Advanced service-based data analytics: concepts and designs

The SMAD distributed processing architecture

Page 37: TUW-ASE-Summer 2014: Advanced service-based data analytics: concepts and designs

Different possibilities: Grids and/or clouds

Raw images stored in archival: iRODS, HTTP server, or Amazon S3

Notification

Any queuing system: on-premise or cloud-based service

Reference images:

Local/pre-deployed or deployed on demand

Computation: set of workstations, cluster, EC2, etc.

Sentinel-1 images and SSM storage:

Local files, cloud storage, iRODS, etc.

Result notification and sharing: to whom? At which scale?

The choices are also strongly dependent on “collaboration needs” and money! But how easy is data sharing?

Page 38: TUW-ASE-Summer 2014: Advanced service-based data analytics: concepts and designs

Prototype

PBS on Vienna Scientific Cluster (vsc.ac.at), in total ~4000 cores

Page 39: TUW-ASE-Summer 2014: Advanced service-based data analytics: concepts and designs

Illustrative experiment (1)

Page 40: TUW-ASE-Summer 2014: Advanced service-based data analytics: concepts and designs

Sustainability governance analysis

[Figure labels: Cities, e.g. including 10000+ buildings and 1000000+ sensors; Infrastructure/Internet of Things; Organization-specific boundary; Internet/public cloud boundary; Near realtime analytics; Predictive data analytics; Visual Analytics; Enterprise Resource Planning; Emergency Management; Tracking/Logistics; Infrastructure Monitoring]

Page 41: TUW-ASE-Summer 2014: Advanced service-based data analytics: concepts and designs

DaaS for sustainability governance

Monitoring data DaaS

Domain-specific knowledge DaaS

Hong-Linh Truong, Schahram Dustdar, Sustainability Data and Analytics in Cloud-Based M2M Systems, Big Data and Internet of Things: A Roadmap for Smart Environments, Studies in Computational Intelligence Volume 546, 2014, pp 343-365

Page 42: TUW-ASE-Summer 2014: Advanced service-based data analytics: concepts and designs

Platform-as-a-Service for Sustainability Governance

• Different types of analytics application models, such as batch, workflow and stream applications and intelligent bots

• Different programming models and languages

• For analytics of large-scale data but also bot-as-a-service

Page 43: TUW-ASE-Summer 2014: Advanced service-based data analytics: concepts and designs

Cloud-based Sustainability governance analysis framework

Page 44: TUW-ASE-Summer 2014: Advanced service-based data analytics: concepts and designs

Cloud-based Sustainability governance analysis framework

Page 45: TUW-ASE-Summer 2014: Advanced service-based data analytics: concepts and designs

HOW TO DEAL WITH COST AND QUALITY OF COMPLEX SERVICES?

Discussion time

Page 46: TUW-ASE-Summer 2014: Advanced service-based data analytics: concepts and designs

Examples of our complex data analytics

Bio-mechanic applications

Simulate the stiffness of human bones

Data and computation intensive applications

Sequential and parallel programs (e.g., parfe and paraview)

Complex software installation: Parmetis, Trilinos, Parfe, Paraview, and HDF5

Run under batch and interactive modes

Page 47: TUW-ASE-Summer 2014: Advanced service-based data analytics: concepts and designs

Composable evaluation approach

We test with "cost"

Part A, Part B, ..., Part N

Page 48: TUW-ASE-Summer 2014: Advanced service-based data analytics: concepts and designs

Dealing with performance and cost of complex applications in clouds

Application complexity

Elastic high performance applications on multiple clouds: libraries, software services, virtual machines, etc.

Cost and performance are needed for determining which parts of the application should be executed in the clouds and when

Cost/performance model complexity

Coarse- and fine-grained cost models of clouds at different layers: too coarse-grained (networks, storages, machines) or too fine-grained (IO calls)

Software-, data-, human-specific cost/performance models

Cost models for individual parts (workflow, MPI, OpenMP, etc.)

Tran Vu Pham, Hong-Linh Truong, Schahram Dustdar "Elastic High Performance Applications - A Composition Framework", The 2011 Asia-Pacific Services Computing Conference (IEEE APSCC 2011), (c) IEEE Computer Society, December 12 - 15, 2011, Jeju, Korea

Hong Linh Truong, Schahram Dustdar: Composable cost estimation and monitoring for computational applications in cloud computing environments. Procedia CS 1(1): 2175-2184 (2010)

Page 49: TUW-ASE-Summer 2014: Advanced service-based data analytics: concepts and designs

Composable cost evaluation

[Figure labels: Part A, Part B, Part C; Cost/performance model i, Cost/performance model j, Cost/performance model k]

Runtime: Elastic processes

Elastic high performance applications on multiple clouds: libraries, software services, virtual machines, etc.

Utilize different performance and dependencies models for sequential, parallel, workflows, etc.
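The idea of composing an application's cost from per-part cost models can be sketched in a few lines. The following is a minimal illustration, assuming each part reports simple usage metrics and that part costs add up. The cost formulas, metric names and prices are invented here and are not the models from the cited papers.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Part:
    name: str
    cost_model: Callable[[Dict[str, float]], float]  # usage metrics -> cost
    usage: Dict[str, float]

    def cost(self) -> float:
        return self.cost_model(self.usage)


# Hypothetical cost models with made-up prices, one per kind of part.
def vm_hours_model(usage):
    return usage["vm_hours"] * usage["price_per_vm_hour"]


def data_transfer_model(usage):
    return usage["gb_out"] * usage["price_per_gb"]


def mpi_job_model(usage):
    # Parallel part: every process occupies a core for the whole runtime.
    return usage["processes"] * usage["runtime_hours"] * usage["price_per_core_hour"]


def total_cost(parts: List[Part]) -> float:
    """Compose the application cost from the costs of its parts."""
    return sum(p.cost() for p in parts)


application = [
    Part("Part A: pre-processing", vm_hours_model,
         {"vm_hours": 2, "price_per_vm_hour": 0.5}),
    Part("Part B: simulation", mpi_job_model,
         {"processes": 8, "runtime_hours": 1.5, "price_per_core_hour": 0.25}),
    Part("Part C: result download", data_transfer_model,
         {"gb_out": 10, "price_per_gb": 0.125}),
]
for part in application:
    print(f"{part.name}: {part.cost():.2f}")
print(f"total: {total_cost(application):.2f}")
```

Keeping the model separate from the part means a part can be re-costed for another cloud by swapping its model or prices without touching the rest of the application.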

Page 50: TUW-ASE-Summer 2014: Advanced service-based data analytics: concepts and designs

Composable cost evaluation: Estimation and Monitoring

Leverage our previous knowledge on event representations, application monitoring, performance analysis, dependability analysis

Employ service-oriented approach

RESTful service, JSON and XML event data

Page 51: TUW-ASE-Summer 2014: Advanced service-based data analytics: concepts and designs

Event Representations and Instrumentation

Captured monitoring events based on a well-defined specification

– Well-known instrumentation techniques can be reused

Consider different application execution models (e.g., MPI, workflows, etc.)
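A monitoring event that follows a fixed specification and travels as JSON might look like the sketch below. The field names and values are hypothetical: the slides mention JSON and XML event data but do not show the actual specification.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class MonitoringEvent:
    """Hypothetical event layout; the lecture does not give the real specification."""
    application: str
    part: str             # which part of the application emitted the event
    execution_model: str  # e.g. "MPI", "workflow"
    event_type: str       # e.g. "start", "end", "data_transfer"
    timestamp: float      # seconds since the epoch
    metrics: dict         # e.g. {"cpu_hours": ..., "bytes_out": ...}


def to_json(event: MonitoringEvent) -> str:
    """Serialize an event as JSON, e.g. for the body of a REST request."""
    return json.dumps(asdict(event), sort_keys=True)


def from_json(payload: str) -> MonitoringEvent:
    """Rebuild an event; unknown or missing fields raise TypeError."""
    return MonitoringEvent(**json.loads(payload))


event = MonitoringEvent(
    application="bones",
    part="solver",
    execution_model="MPI",
    event_type="end",
    timestamp=1400000000.0,
    metrics={"cpu_hours": 12.5},
)
payload = to_json(event)
print(payload)
```

Because every execution model emits the same event shape, the cost service can treat MPI jobs and workflow activities uniformly and only needs the `metrics` content to differ.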

Page 52: TUW-ASE-Summer 2014: Advanced service-based data analytics: concepts and designs

Composable cost evaluation -- Fine-grained composition cost models

Page 53: TUW-ASE-Summer 2014: Advanced service-based data analytics: concepts and designs

Composable cost evaluation -- Illustrative experiments

Examples with the Bones application

Page 54: TUW-ASE-Summer 2014: Advanced service-based data analytics: concepts and designs

Simple Cost Estimation - Examples

Page 55: TUW-ASE-Summer 2014: Advanced service-based data analytics: concepts and designs

Online Cost Monitoring - Examples

One experiment of a bioinformatics workflow in EC2

Support runtime cost-based composition and execution

Page 56: TUW-ASE-Summer 2014: Advanced service-based data analytics: concepts and designs

Exercises

Read mentioned papers

Analyze the relationships between programming models and system infrastructures for data analytics across multiple domains

Examine http://cloudcomputingpatterns.org and see how it supports data analytics patterns

Develop some patterns for data analytics across multiple systems

Work on composable cost evaluation for complex data analytics

Page 57: TUW-ASE-Summer 2014: Advanced service-based data analytics: concepts and designs

Thanks for your attention

Hong-Linh Truong

Distributed Systems Group

Vienna University of Technology

[email protected]

dsg.tuwien.ac.at/staff/truong
