Workflow Fairness Control on Online and Non-Clairvoyant Distributed Computing Platforms
Rafael FERREIRA DA SILVA, Tristan GLATARD (University of Lyon, CNRS, INSERM, CREATIS, Villeurbanne, France)
Frédéric DESPREZ (INRIA, University of Lyon, LIP, ENS Lyon, Lyon, France)
Euro-Par 2013, August 26-30, 2013
Workflow fairness control on online and non-clairvoyant distributed computing platforms
Presentation held at Euro-Par 2013 - Aachen - Germany
Abstract. Fairly allocating distributed computing resources among workflow executions is critical to multi-user platforms. However, this problem has mostly been studied in clairvoyant and offline conditions, where task durations on resources are known, or where the workload and available resources do not vary over time. We consider a non-clairvoyant, online fairness problem where the platform workload, task costs and resource characteristics are unknown and non-stationary. We propose a fairness control loop that assigns task priorities based on the fraction of pending work in the workflows. Workflow characteristics and performance on the target resources are estimated progressively, as information becomes available during execution. Our method is implemented and evaluated on 4 different applications executed in production conditions on the European Grid Infrastructure. Results show that our technique reduces slowdown variability by a factor of 3 to 7 compared to first-come-first-served.
Roulette wheel selection based on association rules
[Figure: set of actions (x2) and incident levels (level 1, level 2, level 3)]
$$= \frac{\eta_i}{\sum_{j=1}^{n} \eta_j}$$
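Roulette wheel selection draws item i with probability proportional to its weight, here eta_i divided by the sum of all eta_j. The following is a minimal Python sketch of that selection step, not the authors' implementation: the function name `roulette_wheel_select`, the use of a plain list of degrees, and the notation p_i for the selection probability are assumptions made for illustration.

```python
import random
from typing import Optional, Sequence


def roulette_wheel_select(degrees: Sequence[float],
                          rng: Optional[random.Random] = None) -> int:
    """Return index i with probability degrees[i] / sum(degrees).

    `degrees` plays the role of the eta values; the name and signature
    are hypothetical and only serve this illustration.
    """
    if len(degrees) == 0 or any(d < 0 for d in degrees):
        raise ValueError("degrees must be a non-empty sequence of non-negative numbers")
    total = sum(degrees)
    if total <= 0:
        raise ValueError("at least one degree must be positive")

    rng = rng or random.Random()
    # Spin the wheel: pick a point in [0, total) and find the slice it falls in.
    threshold = rng.random() * total
    cumulative = 0.0
    last_positive = 0
    for i, d in enumerate(degrees):
        if d > 0:
            last_positive = i
        cumulative += d
        if threshold < cumulative:
            return i
    # Only reachable through floating-point rounding; never pick a zero-weight item.
    return last_positive
```

With degrees `[1.0, 3.0]`, index 1 should be drawn about three times out of four over many calls, while an item with degree 0 is never selected.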
[Figure: control loop with Monitoring, Analysis, Planning, Execution and Knowledge components, triggered by an event (job completion and failures) or a timeout; input: monitoring data]
R. Ferreira da Silva, T. Glatard, F. Desprez, Self-healing of workflow activity incidents on distributed computing infrastructures, Future Generation Computer Systems (FGCS), in press, 2013.
Incident degrees are quantified in discrete incident levels
Thresholds are determined from visual mode clustering or K-means
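As a rough illustration of the K-means option, the sketch below clusters historical incident degrees in one dimension and then maps a new degree to a discrete level. It rests on several assumptions not stated in the slides: thresholds are placed at the midpoints between adjacent centroids, the function names are hypothetical, and any sample values used with it are made up.

```python
from bisect import bisect_right
from typing import List, Sequence


def kmeans_1d(values: Sequence[float], k: int, iterations: int = 100) -> List[float]:
    """Plain Lloyd's algorithm in one dimension; returns sorted centroids."""
    distinct = sorted(set(values))
    if k < 1 or len(distinct) < k:
        raise ValueError("need at least k distinct values")
    # Deterministic initialisation: spread the centroids over the sorted values.
    centroids = [distinct[(2 * i + 1) * len(distinct) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        clusters: List[List[float]] = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda c: abs(v - centroids[c]))
            clusters[nearest].append(v)
        # An empty cluster keeps its previous centroid.
        updated = [sum(c) / len(c) if c else centroids[i]
                   for i, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return sorted(centroids)


def thresholds_from_centroids(centroids: Sequence[float]) -> List[float]:
    """Assumed rule: one threshold halfway between each pair of adjacent centroids."""
    ordered = sorted(centroids)
    return [(a + b) / 2 for a, b in zip(ordered, ordered[1:])]


def incident_level(degree: float, thresholds: Sequence[float]) -> int:
    """Map a continuous incident degree to a discrete level, starting at 1."""
    return bisect_right(thresholds, degree) + 1
```

A typical use would be to compute the thresholds once from past executions, for example `thresholds_from_centroids(kmeans_1d(history, 3))`, and then call `incident_level` on each newly measured degree.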
R. Ferreira da Silva, T. Glatard, A science-gateway workload archive to study pilot jobs, user activity, bag of tasks, task sub-steps, and workflow executions, CoreGRID/ERCIM Workshop on Grids, Clouds and P2P Computing (CGWS), Rhodes Island, Greece, 2012.
Outline
Context
The Virtual Imaging Platform
Problem definition
Fairness among workflow executions
Self-healing of workflow executions on grids