On-line, Non-Clairvoyant Optimization of Workflow Activity Granularity on Grids
Rafael FERREIRA DA SILVA, Tristan GLATARD
University of Lyon, CNRS, INSERM, CREATIS, Villeurbanne, France
Frédéric DESPREZ
INRIA, University of Lyon, LIP, ENS Lyon, Lyon, France
Euro-Par 2013, August 26-30, 2013
On-line, non-clairvoyant optimization of workflow activity granularity on grids
Presentation held at Euro-Par 2013, Aachen, Germany
Abstract. Controlling the granularity of workflow activities executed on widely distributed computing platforms such as grids is required to reduce the impact of task queuing and data transfer time. Most existing granularity control approaches assume extensive knowledge about the applications and resources (e.g. task duration on each resource), and that both the workload and available resources do not change over time. We propose a granularity control algorithm for platforms where such clairvoyant and offline conditions are not realistic. Our method groups tasks when the fineness degree of the application, which takes into account the ratio of shared data and the queuing/round-trip time ratio, becomes higher than a threshold determined from execution traces. The algorithm also de-groups task groups when new resources arrive. The application's behavior is constantly monitored so that the characteristics useful for the optimization are progressively discovered. Experimental results, obtained with 3 workflow activities deployed on the European Grid Infrastructure, show that (i) the grouping process yields speed-ups of about 2.5 when the amount of available resources is constant and that (ii) the use of de-grouping yields speed-ups of 2 when resources progressively appear.
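The grouping criterion described in the abstract can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names, the way the two ratios are combined (here, their maximum) and the default threshold of 0.5 are assumptions made for the example. The abstract only states that the fineness degree depends on the ratio of shared data and on the queuing/round-trip time ratio, and that the threshold is determined from execution traces.

```python
def fineness_degree(shared_data_ratio: float,
                    queue_time: float,
                    round_trip_time: float) -> float:
    """Hypothetical fineness degree in [0, 1].

    shared_data_ratio: fraction of input data shared among tasks (0..1).
    queue_time / round_trip_time: queuing/round-trip time ratio.
    Combining the two ratios with max() is an assumption of this sketch.
    """
    if round_trip_time <= 0:
        raise ValueError("round_trip_time must be positive")
    if not 0.0 <= shared_data_ratio <= 1.0:
        raise ValueError("shared_data_ratio must be in [0, 1]")
    queue_ratio = min(max(queue_time / round_trip_time, 0.0), 1.0)
    return max(shared_data_ratio, queue_ratio)


def should_group(fineness: float, threshold: float = 0.5) -> bool:
    """Group tasks when the fineness degree exceeds the threshold.

    The default threshold is a placeholder; in the presented method it
    is derived from execution traces.
    """
    return fineness > threshold
```

For instance, tasks that spend 80 s queuing out of a 100 s round trip would be considered too fine under these assumed settings, whereas tasks queuing for 10 s would not.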
Roulette wheel selection based on association rules
[Figure: incident levels (level 1, level 2, level 3) and the resulting set of actions]
$$= \frac{\eta_i}{\sum_{j=1}^{n} \eta_j}$$
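Roulette wheel selection with the normalisation of η_i by the sum of all η_j can be sketched as below. This is an illustrative sketch only: treating the values η_1..η_n as selection weights, as well as the function names, are assumptions, and the association-rule step mentioned on the slide is not modelled.

```python
import random
from typing import List, Optional, Sequence


def selection_probabilities(degrees: Sequence[float]) -> List[float]:
    """Normalise degrees so that p_i = eta_i / sum_j eta_j."""
    total = sum(degrees)
    if total <= 0:
        raise ValueError("at least one degree must be positive")
    return [d / total for d in degrees]


def roulette_wheel_select(degrees: Sequence[float],
                          rng: Optional[random.Random] = None) -> int:
    """Return an index i drawn with probability proportional to degrees[i]."""
    rng = rng or random.Random()
    draw = rng.random()  # uniform in [0, 1)
    cumulative = 0.0
    probabilities = selection_probabilities(degrees)
    for index, probability in enumerate(probabilities):
        cumulative += probability
        if draw < cumulative:
            return index
    # Guard against floating-point rounding in the cumulative sum.
    return len(probabilities) - 1
```

Candidates with a larger η are selected more often, while any candidate with a non-zero η keeps some chance of being picked.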
[Figure: MAPE-K loop with Monitoring, Analysis, Planning, Execution and Knowledge components. Labels: event (job completion and failures) or timeout; monitoring data]
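One iteration of such a loop, triggered either by an event or by a timeout, could look like the sketch below. All names (`Knowledge`, `mape_k_step`, the `analyze`, `plan` and `execute` callbacks) are hypothetical and chosen for illustration; the slides do not describe the actual interfaces.

```python
import queue
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Knowledge:
    """Shared state accumulated across iterations (the K of MAPE-K)."""
    events: List[str] = field(default_factory=list)
    actions: List[str] = field(default_factory=list)


def mape_k_step(events: "queue.Queue[str]",
                knowledge: Knowledge,
                timeout_s: float,
                analyze: Callable[[Knowledge], bool],
                plan: Callable[[bool, Knowledge], Optional[str]],
                execute: Callable[[str], None]) -> None:
    """Run one Monitoring-Analysis-Planning-Execution iteration."""
    # Monitoring: wait for an event (e.g. job completion or failure),
    # or wake up when the timeout expires.
    try:
        event = events.get(timeout=timeout_s)
    except queue.Empty:
        event = "timeout"
    knowledge.events.append(event)

    incident_detected = analyze(knowledge)       # Analysis
    action = plan(incident_detected, knowledge)  # Planning
    if action is not None:
        execute(action)                          # Execution
        knowledge.actions.append(action)
```

Passing the analysis, planning and execution steps as callbacks keeps the loop itself generic, which matches the idea of a loop that is reused for different kinds of incidents.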
R. Ferreira da Silva, T. Glatard, F. Desprez, Self-healing of workflow activity incidents on distributed computing infrastructures, Future Generation Computer Systems (FGCS), in press, 2013.
Incident degrees are quantified in discrete incident levels
Thresholds are determined from visual mode clustering or K-means
R. Ferreira da Silva, T. Glatard, A science-gateway workload archive to study pilot jobs, user activity, bag of tasks, task sub-steps, and workflow executions, CoreGRID/ERCIM Workshop on Grids, Clouds and P2P Computing (CGWS), Rhodes Island, Greece, 2012.
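Deriving level thresholds with K-means and mapping a degree to a discrete level can be sketched as follows. This is a minimal one-dimensional K-means (Lloyd's algorithm) written for illustration; the quantile-based initialisation, the use of midpoints between adjacent centroids as thresholds, and all function names are assumptions, not details given in the slides.

```python
from bisect import bisect_right
from typing import List, Sequence


def kmeans_1d(values: Sequence[float], k: int, iterations: int = 100) -> List[float]:
    """Return k sorted centroids of one-dimensional data."""
    data = sorted(values)
    if not 1 <= k <= len(data):
        raise ValueError("k must be between 1 and the number of values")
    # Deterministic initialisation at evenly spaced quantiles.
    centroids = [data[(2 * i + 1) * len(data) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        clusters: List[List[float]] = [[] for _ in range(k)]
        for v in data:
            nearest = min(range(k), key=lambda c: abs(v - centroids[c]))
            clusters[nearest].append(v)
        # Empty clusters keep their previous centroid.
        updated = [sum(c) / len(c) if c else centroids[i]
                   for i, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return sorted(centroids)


def thresholds_from_centroids(centroids: Sequence[float]) -> List[float]:
    """Place one threshold midway between each pair of adjacent centroids."""
    return [(a + b) / 2 for a, b in zip(centroids, centroids[1:])]


def incident_level(degree: float, thresholds: Sequence[float]) -> int:
    """Map a continuous degree to a discrete level, starting at level 1."""
    return bisect_right(list(thresholds), degree) + 1
```

With three clusters this yields two thresholds, so each observed degree falls into level 1, level 2 or level 3.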
Outline
Context
The Virtual Imaging Platform
Problem definition
Task granularity
Self-healing of workflow executions on grids

Context
  Autonomous handling of unfairness among workflow executions
  No strong assumptions on resource characteristics and workload
Summary of the proposed method
  Implements a generic MAPE-K loop
  Determines task fineness based on queue waiting time and estimated data transfer time of shared input data
  Tasks are grouped pairwise as long as Q > R and tasks are too fine
  Tasks are ungrouped when the number of available resources increases
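The grouping and ungrouping rules summarised above can be sketched as a single adjustment step that an on-line loop would call repeatedly. This is a hedged illustration: the function names are hypothetical, `q` and `r` stand for the quantities Q and R on the slide (whose exact definitions are not given in this excerpt), and splitting groups back into single tasks on ungrouping is an assumption.

```python
from typing import List

Group = List[str]  # a group is a list of task identifiers


def group_pairwise(groups: List[Group]) -> List[Group]:
    """Merge adjacent groups two by two; an odd group out is kept as is."""
    merged = [groups[i] + groups[i + 1] for i in range(0, len(groups) - 1, 2)]
    if len(groups) % 2 == 1:
        merged.append(groups[-1])
    return merged


def ungroup(groups: List[Group]) -> List[Group]:
    """Split every group back into single-task groups (assumed policy)."""
    return [[task] for group in groups for task in group]


def adjust_granularity(groups: List[Group],
                       too_fine: bool,
                       q: float,
                       r: float,
                       previous_resources: int,
                       current_resources: int) -> List[Group]:
    """Apply one adjustment step based on the current observations."""
    if current_resources > previous_resources:
        # More resources became available: restore parallelism.
        return ungroup(groups)
    if too_fine and q > r:
        # Condition from the slide: Q > R and tasks are too fine.
        return group_pairwise(groups)
    return groups
```

Calling `adjust_granularity` at each monitoring event reproduces the "as long as" behaviour: groups keep doubling in size while the condition holds, and are split again once new resources appear.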
Optimizing task granularity
  Properly detects and handles lightweight tasks
  Stationary load: fineness control significantly reduces the makespan of all applications
  Non-stationary load: de-grouping algorithm compensates lack of