DIET Overview and some recent work: A middleware for the large scale deployment of applications over the Grid

Frédéric Desprez LIP ENS Lyon / INRIA

GRAAL Research Team

Joint work with N. Bard, R. Bolze, Y. Caniou, E. Caron, B. Depardon, D. Loureiro, G. Le Mahec, A. Muresan, V. Pichon, …

Distributed Interactive Engineering Toolbox

Introduction
• Transparency and simplicity represent the holy grail for Grids (maybe even before performance)!
• Scheduling tunability to take into account the characteristics of specific application classes
• Several applications ready (and not only number-crunching ones!)
• Many incarnations of the Grid (metacomputing, cluster computing, global computing, peer-to-peer systems, Web Services, …)
– Many research projects around the world
– Significant technology base
• Do not forget good ol’ time research on scheduling and distributed systems!
– Most scheduling problems are very difficult to solve even in their simplistic form …
– … but simple solutions often lead to better performance results in real life

Introduction, cont.

One long-term idea for the Grid: offering (or renting) computational power and/or storage through the Internet

Very high potential
• Need of Problem Solving and Application Service Provider Environments
• More performance, storage capacity
• Installation difficulty for some libraries and applications
• Some libraries or codes need to stay where they have been developed
• Some data need to stay in place for security reasons

Using computational servers through a simple interface

RPC and Grid-Computing: GridRPC
• One simple idea
– Implementing the RPC programming model over the grid
– Using resources accessible through the network
– Mixed parallelism model (data-parallel model at the server level and task parallelism between the servers)
• Features needed
– Load-balancing (resource localization and performance evaluation, scheduling),
– IDL,
– Data and replica management,
– Security,
– Fault-tolerance,
– Interoperability with other systems,
– …
• Design of a standard interface
– within the OGF (GridRPC and SAGA WG)
– Both computation requests and data management
– Existing implementations: NetSolve, Ninf, DIET, OmniRPC

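RPC and Grid Computing: GridRPC

[Figure: the client sends a request to the AGENT(s), which answer with a server chosen among S1, S2, S3, S4 (here "S2 !"). The client then calls Op(C, A, B) on S2, sending A, B, C, and receives the answer (C).]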
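The GridRPC request flow (request to the agent, choice of a server, call on that server) can be sketched in a few lines of Python. This is only an illustration of the idea: the names `Server`, `Agent`, `select`, and the load-based choice are assumptions made for the example, not the DIET or GridRPC API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Server:
    name: str
    load: float                                   # hypothetical load estimate
    services: Dict[str, Callable[..., object]]    # service name -> implementation

    def solve(self, op: str, *args):
        # The computation runs on the server side, as in an RPC.
        return self.services[op](*args)


class Agent:
    """Hypothetical agent: knows the servers and picks one per request."""

    def __init__(self, servers: List[Server]):
        self.servers = servers

    def select(self, op: str) -> Server:
        candidates = [s for s in self.servers if op in s.services]
        if not candidates:
            raise LookupError(f"no server offers {op!r}")
        # Assumption: the least loaded candidate is chosen.
        return min(candidates, key=lambda s: s.load)


def matmul(a, b):
    # Plain-Python matrix product, standing in for the remote service.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


servers = [
    Server("S1", 0.9, {"matmul": matmul}),
    Server("S2", 0.1, {"matmul": matmul}),
    Server("S3", 0.5, {}),
]
agent = Agent(servers)

chosen = agent.select("matmul")        # step 1: request to the agent, which names a server
A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]
C = chosen.solve("matmul", A, B)       # step 2: data go to that server, C comes back
```

The client never names a server itself; it only names the operation, which is what makes the scheduling policy replaceable on the agent side.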

DIET’s Goals
• Our goals
– To develop a toolbox for the deployment of environments using the Application Service Provider (ASP) paradigm with different applications
– Use as much as possible public domain and standard software
– To obtain a high performance and scalable environment
– Implement and validate our more theoretical results: scheduling for heterogeneous platforms, data (re)distribution and replication, performance evaluation, algorithmic for heterogeneous and distributed platforms, …
• Based on CORBA, NWS, LDAP, and our own software developments
– FAST for performance evaluation,
– LogService for monitoring,
– VizDIET for the visualization,
– GoDIET for the deployment
• Several applications in different fields (simulation, bioinformatics, …)
• Release 2.2 available on the web
• ACI Grid ASP, TLSE, ACI MD GDS, RNTL GASP, ANR LEGO, Gwendia, COOP, Grid’5000

http://graal.ens-lyon.fr/DIET/

DIET Dashboard

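DIET Architecture

[Figure: hierarchical DIET architecture. Legend: Client, Master Agent (MA), Local Agent (LA), server front end. Each MA sits on top of a tree of LAs; several MAs are interconnected through JXTA.]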

Client and server interface
• Client side
– So easy …
– Multi-interface (C, C++, Fortran, Java, Scilab, Web, etc.)
– Grid-RPC compliant
• Server side
– Install and submit new server to agent (LA)
– Problem and parameter description
– Client IDL transfer from server
– Dynamic services: new service, new version, security update, outdated service, etc.
– Grid-RPC compliant

Data/replica management
• Two needs
– Keep the data in place to reduce the overhead of communications between clients and servers
– Replicate data whenever possible
• Three approaches for DIET
– DTM (LIFC, Besançon): hierarchy similar to DIET’s one, distributed data manager, redistribution between servers
– JuxMem (Paris, Rennes): P2P data cache
– DAGDA (IN2P3, Clermont-Ferrand): joining task scheduling and data management
• Work done within the GridRPC Working Group (OGF)
– Relations with workflow management

[Figure: clients calling services F and G on Server 1 and Server 2, with data items A, B, X, and Y; B appears at several places in the figure.]

DAGDA

Data Arrangement for Grid and Distributed Applications
• A new data manager for the DIET middleware providing
– Explicit data replication: using the API.
– Implicit data replication: the data are replicated on the selected SeDs.
– Direct data get/put through the API.
– Automatic data management: using a selected data replacement algorithm when necessary (LRU: the Least Recently Used data is deleted; LFU: the Least Frequently Used data is deleted; FIFO: the "oldest" data is deleted).
– Transfer optimization by selecting the most convenient source, using statistics on previous transfers.
– Storage resource usage management: the space reserved for the data is configured by the user.
– Data status backup/restoration: allows stopping and restarting DIET, saving the data status on each node.
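As an illustration of a replacement algorithm working under a user-configured space limit, here is a minimal LRU sketch in Python. The class `LRUDataStore` and its methods are hypothetical and are not the DAGDA API; sizes are simply the payload lengths in bytes.

```python
from collections import OrderedDict


class LRUDataStore:
    """Hypothetical size-bounded store evicting the least recently used item."""

    def __init__(self, capacity_bytes: int):
        self.capacity = capacity_bytes
        self.used = 0
        self.items = OrderedDict()   # data id -> payload, oldest access first

    def get(self, data_id: str) -> bytes:
        payload = self.items[data_id]
        self.items.move_to_end(data_id)   # mark as most recently used
        return payload

    def put(self, data_id: str, payload: bytes) -> list:
        """Store a payload and return the ids evicted to make room."""
        if len(payload) > self.capacity:
            raise ValueError("payload larger than the reserved space")
        if data_id in self.items:
            self.used -= len(self.items.pop(data_id))
        evicted = []
        while self.used + len(payload) > self.capacity:
            old_id, old_payload = self.items.popitem(last=False)
            self.used -= len(old_payload)
            evicted.append(old_id)
        self.items[data_id] = payload
        self.used += len(payload)
        return evicted


store = LRUDataStore(capacity_bytes=10)
store.put("A", b"aaaa")
store.put("B", b"bbbb")
store.get("A")                      # A is now more recent than B
evicted = store.put("C", b"cccc")   # needs room: B is the least recently used
```

Switching to FIFO would only require dropping the `move_to_end` call in `get`; LFU would need an access counter per item instead of the access order.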

DAGDA
• Transfer model
– Uses the pull model.
– The data are sent independently of the service call.
– The data can be sent in several parts.

1: The client sends a request for a service.
2: DIET selects some SeDs according to the chosen scheduler.
3: The client sends its request to the SeD.
4: The SeD downloads the data from the client and/or from other nodes of DIET.
5: The SeD performs the call.
6: The persistent data are updated.

DAGDA
• DAGDA architecture
– Each data item is associated with one unique identifier.
– DAGDA controls the disk and memory space limits. If necessary, it uses a data replacement algorithm.
– The CORBA interface is used to communicate between the DAGDA nodes.
– The users can access the data and perform replications using the API.

DIET Scheduling
• Collector of Resource Information (CoRI)
– Interface to gather performance information
– Functional requirements: set of basic metrics, one single access interface
– Non-functional requirements: extensibility, accuracy and latency, non-intrusiveness
• Currently 2 modules available
– CoRI Easy
– FAST
– Extension possibilities: Ganglia, Nagios, R-GMA, Hawkeye, INCA, MDS, …

[Figure: the CoRI Manager in front of the CoRI-Easy collector, the FAST collector, and other collectors such as Ganglia.]

FAST Software
• Performance evaluation of the platform makes it possible to find an efficient server (redistribution and computation costs) without testing every configuration: a performance database for the scheduler
• Based on NWS (Network Weather Service)

FAST: Fast Agent’s System Timer

[Figure: FAST architecture. The client application calls the FAST API, which relies on static data acquisition and dynamic data acquisition, themselves built on low-level software (LDAP, BDB, NWS, ...).]

Static data acquisition
• Computer: memory amount, CPU speed, batch system
• Network: bandwidths, latencies, topology, protocols
• Computation: feasibility, execution time on a given architecture

Dynamic data acquisition
• Computer: status (up or down), load, memory, batch queue status
• Network: bandwidths, latencies

Plugin Schedulers
• “First” version of DIET performance management
– Each SeD answers with a profile (COMP_TIME, COMM_TIME, TOTAL_TIME, AVAILABLE_MEMORY) for each request
– Profile is filled by FAST
– Local Agents sort the results by execution time and send them back up to the Master Agent
• Limitations
– Limited availability of FAST/NWS
– Hard to install and configure
– Priority of FAST-enabled servers
– Extension hard to handle
– Non-standard application- and platform-specific performance measures
– Firewall problems with some performance evaluation tools
– No use of integrated performance estimator (e.g. Ganglia)

DIET Plug-in Schedulers
• SeD level
– Performance estimation function
– Estimation Metric Vector (estVector_t): dynamic collection of performance estimation values
– Performance measures available through DIET: FAST-NWS performance metrics, time elapsed since the last execution, CoRI (Collector of Resource Information)
– Developer-defined values
– Standard estimation tags for accessing the fields of an estVector_t: EST_FREEMEM, EST_TCOMP, EST_TIMESINCELASTSOLVE, EST_FREECPU
• Aggregation methods
– Define how SeD responses are sorted: associated with the service and defined at SeD level
– Tunable comparison/aggregation routines for scheduling
– Priority scheduler: performs pairwise server estimation comparisons, returning a sorted list of server responses; it can minimize or maximize based on SeD estimations, taking into consideration the order in which the requests for those performance estimations were specified at SeD level.
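The priority aggregation described above can be pictured as a pairwise comparison driven by an ordered list of (tag, minimize/maximize) pairs. The sketch below is written in Python for brevity and does not use the DIET API: the dictionaries stand in for estimation vectors, the tag names are reused from the slide, and the values are invented.

```python
from functools import cmp_to_key

# Hypothetical estimation vectors returned by three SeDs; the keys reuse the
# tag names from the slide, the values are made up for the example.
responses = [
    {"sed": "sed1", "EST_TCOMP": 12.0, "EST_FREEMEM": 2048},
    {"sed": "sed2", "EST_TCOMP": 8.0, "EST_FREEMEM": 1024},
    {"sed": "sed3", "EST_TCOMP": 8.0, "EST_FREEMEM": 4096},
]

# Priority list, most important first: (tag, "min" or "max").
priorities = [("EST_TCOMP", "min"), ("EST_FREEMEM", "max")]


def compare(a, b):
    """Pairwise comparison: the first tag that differs decides the order."""
    for tag, goal in priorities:
        if a[tag] == b[tag]:
            continue
        a_is_better = a[tag] < b[tag] if goal == "min" else a[tag] > b[tag]
        return -1 if a_is_better else 1
    return 0


ranked = [r["sed"] for r in sorted(responses, key=cmp_to_key(compare))]
```

Here sed2 and sed3 tie on computation time, so the second priority (free memory, maximized) places sed3 first.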

Workflow Management (ANR Gwendia)
• Workflow representation
– Directed Acyclic Graph (DAG)
– Each vertex is a task
– Each directed edge represents communication between tasks
• Goals
– Build and execute workflows
– Use different heuristics to solve scheduling problems
– Extensibility to address multi-workflow submission and large grid platforms
– Manage heterogeneity and variability of the environment
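A workflow represented as a DAG can be ordered so that every task runs after the tasks it depends on. The following Python sketch uses the standard `graphlib` module (Python 3.9+); the task names are hypothetical, and the result is only one valid ordering, not the output of a DIET scheduling heuristic.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each key is a task, each value is the set of tasks
# whose output it needs (the directed edges of the DAG).
workflow = {
    "preprocess": set(),
    "simulate_a": {"preprocess"},
    "simulate_b": {"preprocess"},
    "merge": {"simulate_a", "simulate_b"},
}

# One execution order that respects every dependency.
order = list(TopologicalSorter(workflow).static_order())
position = {task: i for i, task in enumerate(order)}
```

The two `simulate_*` tasks have no edge between them, so a scheduler is free to map them onto different servers and run them in parallel.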

Architecture with MA DAG
• Specific agent for workflow management (MA DAG)
• Two modes:
– MA DAG defines a complete scheduling of the workflow (ordering and mapping)
– MA DAG defines only an ordering for the workflow execution; the mapping is done in the next step by the client, which goes through the Master Agent to find the servers on which to execute the workflow services.

Workflow Designer
• Applications viewed as services within DIET
• Compose services to get a complete application workflow in a drag-and-drop fashion

DIET: Batch System interface
• A parallel world
– Grid resources are parallel (parallel machines or clusters of compute nodes)
– Applications/services can be parallel
• Problem
– Many types of Batch Systems exist, each having its own behavior and user interface
• Solution
– Use a layer of intermediary meta variables
– Use an abstract BatchSystem factory

[Figure: class diagram. The abstract class BatchSystem is specialized by Loadleveler_BatchSystem, SGE_BatchSystem, OAR1_6BatchSystem, and PBS_BatchSystem.]
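The combination of meta variables and an abstract BatchSystem factory can be sketched as follows. This is a Python illustration of the pattern only, not the DIET C++ classes: the `META_*` names, the `script` and `header` methods, and the factory function are hypothetical, and the SGE parallel environment name is site-specific.

```python
from abc import ABC, abstractmethod


class BatchSystem(ABC):
    """Abstract batch system: turns meta variables into a submission script."""

    @abstractmethod
    def header(self, nb_nodes: int, walltime: str) -> list:
        ...

    def script(self, meta: dict, command: str) -> str:
        # META_* names are hypothetical placeholders for the meta variables.
        lines = ["#!/bin/sh"]
        lines += self.header(meta["META_NB_NODES"], meta["META_WALLTIME"])
        lines.append(command)
        return "\n".join(lines)


class PBSBatchSystem(BatchSystem):
    def header(self, nb_nodes, walltime):
        return [f"#PBS -l nodes={nb_nodes}", f"#PBS -l walltime={walltime}"]


class SGEBatchSystem(BatchSystem):
    def header(self, nb_nodes, walltime):
        # The parallel environment name "mpi" is site-specific (assumption).
        return [f"#$ -pe mpi {nb_nodes}", f"#$ -l h_rt={walltime}"]


def make_batch_system(kind: str) -> BatchSystem:
    """Factory: the service code never names a concrete batch system."""
    registry = {"pbs": PBSBatchSystem, "sge": SGEBatchSystem}
    return registry[kind]()


meta = {"META_NB_NODES": 4, "META_WALLTIME": "00:10:00"}
pbs_script = make_batch_system("pbs").script(meta, "./solve input.dat")
sge_script = make_batch_system("sge").script(meta, "./solve input.dat")
```

The service author writes the command once in terms of the meta variables; supporting another batch system means adding one subclass and one registry entry.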

Grid’5000

1) Building a nationwide experimental platform for Grid & P2P research (like a particle accelerator for computer scientists)
• 9 geographically distributed sites hosting clusters (with 256 CPUs to 1K CPUs)
• All sites are connected by RENATER (French Res. and Edu. Net.)
• RENATER hosts probes to trace network load conditions
• Design and develop a system/middleware environment to safely test and repeat experiments
2) Use the platform for Grid experiments in real-life conditions
• Address critical issues of Grid system/middleware: programming, scalability, fault tolerance, scheduling
• Address critical issues of Grid networking: high performance transport protocols, QoS
• Port and test applications
• Investigate original mechanisms: P2P resource discovery, desktop Grids

4 main features:
• A high security for Grid’5000 and the Internet, despite the deep reconfiguration feature
• A software infrastructure allowing users to access Grid’5000 from any Grid’5000 site and have a home directory in every site
• Reservation/scheduling tools allowing users to select node sets and schedule experiments
• A user toolkit to reconfigure the nodes and monitor experiments

Goals and Protocol of the Experiment
• Validation of the DIET architecture at large scale over different administrative domains
• Protocol
– DIET deployment over a maximum of processors
– Large number of clients
– Comparison of the DIET execution times with average local execution times
– 1 MA, 8 LA, 540 SeDs
– 2 requests/SeD
– 1120 clients on 140 machines
– DGEMM requests (2000x2000 matrices)
– Simple round-robin scheduling using time_since_last_solve
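Choosing the server that has waited the longest since its last solve yields a round-robin behavior. The Python sketch below shows this under simplifying assumptions: `SeDState` and `pick` are hypothetical names, requests arrive one per time unit, and each solve is treated as instantaneous.

```python
class SeDState:
    """Hypothetical record of when a SeD last finished solving a request."""

    def __init__(self, name: str):
        self.name = name
        self.last_solve = float("-inf")   # never solved anything yet

    def time_since_last_solve(self, now: float) -> float:
        return now - self.last_solve


def pick(seds, now):
    # The SeD that has been idle the longest wins; ties keep list order.
    return max(seds, key=lambda s: s.time_since_last_solve(now))


seds = [SeDState("sed1"), SeDState("sed2"), SeDState("sed3")]
assignments = []
for now in range(6):                 # six requests, one per time unit
    chosen = pick(seds, float(now))
    chosen.last_solve = float(now)   # simplification: the solve is instantaneous
    assignments.append(chosen.name)
```

With three servers and six requests, each server is selected twice, in rotation.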

Grid’5000

[Figure: map of Grid’5000 with a time shown for each site or cluster.]
• Paravent: 9 s
• Paraci: 11 s
• Parasol: 33 s
• Bordeaux: 33 s
• Toulouse: 33 s
• Lille: 34 s
• Lyon: 38 s
• Sophia: 40 s
• Orsay: 40 s

Results Grid’5000

Deployment example: Décrypthon platform

[Figure: Décrypthon deployment. Labels: project users, web interface, DIET Décrypthon Master Agent, data manager interface, SeD LoadLeveler (four instances), sites ORSAY (Decrypthon1, Decrypthon2), BORDEAUX, LILLE, JUSSIEU, Lyon, CRIHAN, DB2, BD AFM Cliniques, IBM WII.]

• SeD = Server Daemon, installed on any server running LoadLeveler. Note that rescue SeDs can be defined.
• MA = Master Agent, coordinates jobs. Rescue or multiple Master Agents can be defined.
• WN = worker node

Eucalyptus – the Open Source Cloud
• Eucalyptus is:
– A research project of a team from the University of California, Santa Barbara
– An Open Source Project
– An IaaS Cloud Platform
• Base principles
– A collection of Web Services on each node
– Virtualization to host user images (Xen technology)
– Virtual networks to provide security
– Implement the Amazon EC2 interface: systems/tools built for EC2 are usable (“Turing test” for Eucalyptus)
– Uses commonly-known and available Linux technologies
• http://open.eucalyptus.com/

Eucalyptus platform

DIETCloud architecture

[Figure: DIET + Eucalyptus. The DIET hierarchy (MA, LAs, SeDs) is shown next to the Eucalyptus hierarchy (CLC, CCs, NCs).]

DIETCloud Architecture
• Several solutions that differ by how much the architectures of both systems overlap or are included one in the other
– DIET is completely included in Eucalyptus
– DIET is completely outside of Eucalyptus
– … and all the possibilities in between

DIET completely included in Eucalyptus

• The DIET platform is virtualized inside Eucalyptus

• Very flexible and scalable as DIET nodes can be launched when needed

• Scheduling is more complex

[Figure: Eucalyptus (CLC, two CCs, NCs) hosting DIET, with the MA, the LA, and the SeD each running on an NC.]

DIET completely outside of Eucalyptus

• SeD requests resources from Eucalyptus

• SeD works directly with the Virtual Machines

• Useful when Eucalyptus is a third-party resource

[Figure: the DIET hierarchy (MA, LA, SeDs) next to Eucalyptus (CLC, CC, NCs), with the SeDs outside of Eucalyptus.]

Implemented Architecture

• We have considered the architecture where DIET is completely outside of Eucalyptus, which takes advantage of the DIET design
• Eucalyptus is treated as a new Batch System
– Easy and natural way of use in DIET
– DIET is designed to easily add a new batch scheduler
– Provide a new implementation for the BatchSystem abstract class
• Handling of a service call is done in three steps:
1. Obtain the requested virtual machines by a SOAP call to Eucalyptus
2. Execute the service on the instantiated virtual machines, bypassing the Eucalyptus controllers
3. Terminate the virtual machines by a SOAP call to Eucalyptus
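The three steps of a service call map naturally onto an acquire / use / release pattern. The Python sketch below illustrates that pattern with an in-memory stand-in: `FakeCloud`, `run_instances`, `terminate_instances`, and `handle_service_call` are hypothetical names, and no real SOAP call is issued.

```python
class FakeCloud:
    """Stand-in for the cloud front end; a real one would issue SOAP calls."""

    def __init__(self):
        self.running = set()
        self.counter = 0

    def run_instances(self, count: int) -> list:
        ids = []
        for _ in range(count):
            self.counter += 1
            ids.append(f"vm-{self.counter}")
        self.running.update(ids)
        return ids

    def terminate_instances(self, ids) -> None:
        self.running.difference_update(ids)


def handle_service_call(cloud, nb_vms, service):
    vms = cloud.run_instances(nb_vms)          # step 1: obtain the virtual machines
    try:
        return service(vms)                    # step 2: run directly on the VMs
    finally:
        cloud.terminate_instances(vms)         # step 3: always release the VMs


cloud = FakeCloud()
result = handle_service_call(cloud, 2, lambda vms: f"solved on {len(vms)} VMs")
```

Putting the termination in a `finally` clause is a design choice of this sketch: the virtual machines are released even when the service raises an error.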

DIETCloud: a new DIET architecture

[Figure labels: Eucalyptus, Amazon EC2, Eucalyptus Batch System.]

Some thoughts about DIET and Clouds
• The door to using Cloud platforms through DIET has been opened
• The first DIET Cloud architecture was designed
• The current work serves as a proof of concept of using the DIET Grid-RPC middleware on top of the Eucalyptus Cloud system to demonstrate general purpose computing using Cloud platforms
• Possible ways of connecting the two architectures have been studied
• Several issues still remain to be solved
– Instance startup time needs to be taken into account
– A new scheduling strategy is needed for more complex architectures
– The performance of such a system needs to be measured

Conclusions and future work
• GridRPC
– Interesting approach for several applications
– Simple, flexible, and efficient
– Many interesting research issues (scheduling, data management, resource discovery and reservation, deployment, fault-tolerance, …)
• DIET
– Scalable, open-source, and multi-application platform
– Concentration on several issues like resource discovery, scheduling (distributed scheduling and plugin schedulers), deployment (GoDIET and GRUDU), performance evaluation (CoRI), monitoring (LogService and VizDIET), data management and replication (DTM, JuxMem and DAGDA)
– Large scale validation on the Grid’5000 platform
– A middleware designed and tunable for a given application
• And now …
– Client/server DIET for Décrypthon applications
– Deployment and validation on execution
– Duplicate and check requests from UD
– Validation using SeD_batch (Loadleveler version)
– Data management optimization
– Scheduling optimization
– More information and statistics for users
– Fault tolerance mechanisms

http://graal.ens-lyon.fr/DIET
http://www.grid5000.org/

Research Topics
• Scheduling
– Distributed scheduling
– Software platform deployment with or without dynamic connections between components
– Plug-in schedulers
– Multiple (parallel) workflows scheduling
– Links with batch schedulers
– Many tasks scheduling
• Data-management
– Scheduling of computation requests and links with data-management
– Replication, data prefetching
– Workflow scheduling
• Performance evaluation
– Application modeling
– Dynamic information about the platform (network, clusters)

Questions ?

http://graal.ens-lyon.fr/DIET
