VIP: design and implementation of the portal and execution service
VIP Launching Workshop Lyon, December 14th 2012
Rafael Ferreira da Silva – rafael.silva@creatis.insa-lyon.fr
Rafael FERREIRA DA SILVA CNRS, CREATIS, INSA-Lyon, Université Lyon 1, INSERM
For the VIP Project Consortium:
Outline
Introduction
VIP Architecture: Web Portal, Data Transfers, Workflow Execution
Workflow Self-Healing
Conclusions
2 Rafael Ferreira da Silva – rafael.silva@creatis.insa-lyon.fr http://vip.creatis.insa-lyon.fr
Platform goals
Multi-modality medical image simulators: MRI, US, CT and PET
Objectives
  Workflow execution on EGI
  Access to storage resources
  High-level interface for non-experts
No IT required: Software as a Service (SaaS)
  No client software installation
  New features automatically available
  Consolidated support and troubleshooting
VIP – Architecture
[Figure: VIP architecture. Labeled components: GASW, Object Model Repository, Simulated Data Repository, Workflow Engine, Job Generation, Job Scheduler, Data Management]
VIP – Web Portal
User Front-End
  Openly accessible web portal
  Access point to models and simulators
  User-friendly interface that assists users in using image simulators
  Modular code design (GWT + SmartGWT)
Users/Apps Management
[Figure: managed entities are Users, Groups, Application Classes and Applications]
VIP – GRIDA
Grid Data Management Agent (GRIDA)
  Handles file catalog and transfer operations by pooling
  Performs data replication
Data Transfers Management
[Figure: transfers between User Machine, VIP Server and Grid Storage]
User uploads file to VIP Server
GRIDA uploads file to the grid (replication)
GRIDA downloads file to VIP Server
User downloads the file
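The two-hop transfers listed above can be sketched as a small pool of operations that is processed in order. This is a minimal illustration under stated assumptions, not GRIDA's actual code or API: the names `Kind`, `Operation` and `TransferPool`, the status strings, and the example file name are all hypothetical.

```python
from collections import deque
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    UPLOAD = "upload"      # VIP server -> grid storage (second hop of an upload)
    DOWNLOAD = "download"  # grid storage -> VIP server (first hop of a download)


@dataclass
class Operation:
    kind: Kind
    path: str
    status: str = "queued"


class TransferPool:
    """Hypothetical stand-in for the pool of operations handled by GRIDA."""

    def __init__(self):
        self._queue = deque()
        self.log = []

    def submit(self, kind, path):
        """Queue an operation; the user-facing hop is assumed to be done elsewhere."""
        op = Operation(kind, path)
        self._queue.append(op)
        return op

    def process_all(self):
        """Process queued operations in FIFO order and return how many were handled."""
        handled = 0
        while self._queue:
            op = self._queue.popleft()
            if op.kind is Kind.UPLOAD:
                self.log.append(f"upload {op.path} to grid storage (replicated)")
            else:
                self.log.append(f"download {op.path} to VIP server")
            op.status = "done"
            handled += 1
        return handled


pool = TransferPool()
up = pool.submit(Kind.UPLOAD, "image.mhd")      # after the user uploaded to the VIP server
down = pool.submit(Kind.DOWNLOAD, "image.mhd")  # before the user downloads from the VIP server
pool.process_all()
```

Keeping the grid-side hop in a pool decouples the user's browser session from the slower grid transfer, which is one plausible reason for the design shown on the slide.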
VIP – Data Repositories
Easy integration of third-party libraries
  NeuSemStore-Provenance for simulated data
  NeuSemStore-Simulated-Objects for the model catalog
Encapsulation of objects as GWT serialized beans
More details in the presentation of B. Gibaud
[Figure: data flow between Databases, NeuSemStore, GWT Server and GWT Client; labels: RPC call, GWT Bean]
VIP – Workflow Engine
MOTEUR workflow engine
  Applications described in a formal language
  http://modalis.i3s.unice.fr/softwares/moteur
Generic Application Service Wrapper (GASW)
  Bash scripts wrapped in grid jobs
Self-healing of workflow execution
VIP – Architecture
Workload Management System with Pilot Jobs
  Distributed Infrastructure with Remote Agent Control (DIRAC) [CPPM-LHCb]
  http://diracgrid.org
  French National Instance, hosted by CC-IN2P3
Data Storage and Computing Back-End
  EGI infrastructure, Biomed VO
  http://www.egi.eu
Workflow Execution
1. Input data upload
2. User launches a simulation
3. MOTEUR generates invocations
4. GASW generates grid jobs
5. Jobs are submitted to DIRAC
6. Pilot jobs are submitted to EGI
7. Pilot jobs fetch grid jobs
8. Inputs download
9. Execution
10. Results upload
11. Download results
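Steps 5 to 7 describe a pull model: jobs wait in a central queue and are bound to a resource only when a running pilot asks for work. The following is a minimal simulation of that idea under simplifying assumptions (one job at a time per pilot, known job durations, no failures). It does not use or imitate the DIRAC API; `simulate_pilots` and its parameters are hypothetical names.

```python
import heapq


def simulate_pilots(job_durations, pilot_start_times):
    """Simulate pilots pulling jobs from a central FIFO queue.

    job_durations: duration of each queued job, in submission order.
    pilot_start_times: time at which each pilot starts running on a resource.
    Returns a list of (pilot, job_index, start, end) tuples.
    """
    # Heap of (time at which the pilot is free, pilot id).
    free_at = [(start, pilot) for pilot, start in enumerate(pilot_start_times)]
    heapq.heapify(free_at)

    schedule = []
    for job_index, duration in enumerate(job_durations):
        # The pilot that becomes free first fetches the next queued job.
        start, pilot = heapq.heappop(free_at)
        end = start + duration  # covers inputs download, execution, results upload
        schedule.append((pilot, job_index, start, end))
        heapq.heappush(free_at, (end, pilot))
    return schedule


# Three jobs of 10 time units; pilot 0 starts at t=0, pilot 1 at t=5.
schedule = simulate_pilots([10, 10, 10], [0, 5])
```

In this run the third job goes to pilot 0, which frees up at t=10, rather than to a resource chosen at submission time. This late binding is what the pilot-job approach buys.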
Workflow Self-Healing
Problem: costly manual operations
  Rescheduling tasks, restarting services, killing misbehaving experiments or replicating data files
Objective: automated platform administration
  Autonomous detection of operational incidents
  Perform an appropriate set of actions
Assumptions: online and non-clairvoyant
  Only partial information available
  Decisions must be fast
  Production conditions: no prediction of user activity or workloads
General MAPE-K loop
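[Figure: MAPE-K loop with Monitoring, Analysis, Planning, Execution and Knowledge stages. The loop is triggered by an event (job completion or failure) or a timeout; monitoring data feeds the analysis.]
Analysis: each incident has a degree η, mapped to a level (level 1, level 2, level 3)
  Incident 1: degree η = 0.8
  Incident 2: degree η = 0.4
  Incident 3: degree η = 0.1
Planning, first selection: roulette wheel selection, where incident i among n incidents is selected with probability
  $p_i = \frac{\eta_i}{\sum_{j=1}^{n} \eta_j}$
  In the example: Incident 1 selected
Planning, second selection: roulette wheel selection based on association rules
  Association rules for incident 1:
  Rule     Confidence (ρ)   ρ × η
  2 → 1    0.8              0.32
  3 → 1    0.2              0.02
  1 → 1    1.0              0.80
  In the example: Incident 2 selected
Output: set of actions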
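The two roulette wheel selections can be sketched in Python with the numbers from the slide. This is a minimal sketch, not the VIP implementation: the function names are hypothetical. It also assumes that η in the ρ × η column is the degree of the rule's first incident, which is what the slide's numbers suggest (0.8 × 0.4 = 0.32 for rule 2 → 1).

```python
import random


def selection_probabilities(weights):
    """Normalize non-negative weights into roulette wheel probabilities."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    return {key: weight / total for key, weight in weights.items()}


def roulette_select(weights, rng=random):
    """Pick one key with probability proportional to its weight."""
    keys = list(weights)
    return rng.choices(keys, weights=[weights[k] for k in keys], k=1)[0]


# First wheel: incident degrees from the slide.
degrees = {1: 0.8, 2: 0.4, 3: 0.1}
incident_probs = selection_probabilities(degrees)

# Second wheel: association rules for incident 1, as (source incident, confidence rho).
rules_for_incident_1 = {"2 -> 1": (2, 0.8), "3 -> 1": (3, 0.2), "1 -> 1": (1, 1.0)}
rule_weights = {
    rule: rho * degrees[source]
    for rule, (source, rho) in rules_for_incident_1.items()
}

selected_incident = roulette_select(degrees)
selected_rule = roulette_select(rule_weights)
```

With these degrees, incident 1 is drawn with probability 0.8 / 1.3, about 0.62, so lower-degree incidents still get selected from time to time instead of being starved.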
Incident: Activity Blocked
  An invocation is late compared to the others
Possible causes
  Longer waiting times
  Lost tasks (e.g. killed by site due to quota violation)
  Resources with poor performance
[Figures: invocations completion rate for a simulation; job flow for a simulation]
Activity blocked: degree
Degree computed from all completed jobs of the activity
  Job phases: setup, inputs download, execution, outputs upload
  Assumption: bag-of-tasks (all jobs have equal durations)
Median-based estimation of job durations
Incident degree: job performance w.r.t. the median
$d = \frac{E_i}{M_i + E_i} \in [0, 1]$
  M_i: sum of the median durations of the job phases; E_i: estimated duration of job i
Example (setup and inputs download completed, execution is the current phase):
Phase             Median   Real   Estimated
setup             50s      42s    42s
inputs download   250s     300s   300s
execution         400s     20s    400s*
outputs upload    15s      ?      15s
Total             M_i = 715s      E_i = 757s
*: max(400s, 20s) = 400s
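The median-based estimation and the degree can be reproduced with the numbers of the example. This is a minimal sketch with hypothetical function names. It assumes the estimation rule that the example suggests: completed phases use their real duration, the current phase uses the larger of the median and the time already spent, and phases not yet started use the median.

```python
def estimate_duration(medians, observed, current_phase):
    """Estimated total duration E_i of a job.

    medians: median duration of each phase over the completed jobs.
    observed: real durations so far, for completed phases and the current one.
    current_phase: index of the phase in progress.
    """
    total = 0.0
    for phase, median in enumerate(medians):
        if phase < current_phase:
            total += observed[phase]               # completed: real duration
        elif phase == current_phase:
            total += max(median, observed[phase])  # current: at least the median
        else:
            total += median                        # not started: median
    return total


def incident_degree(medians, observed, current_phase):
    """Degree d = E_i / (M_i + E_i), which lies in [0, 1]."""
    m_i = sum(medians)
    e_i = estimate_duration(medians, observed, current_phase)
    return e_i / (m_i + e_i)


# Phases: setup, inputs download, execution, outputs upload (seconds).
medians = [50, 250, 400, 15]
observed = [42, 300, 20]  # execution is the current phase, 20s so far

e_i = estimate_duration(medians, observed, current_phase=2)  # 757.0
d = incident_degree(medians, observed, current_phase=2)      # 757 / 1472, about 0.514
```

A job whose estimate equals the median total gets d = 0.5, and d grows toward 1 as the job falls behind the others.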
Activity blocked: levels and actions
Levels: identified from the platform logs
Actions: job replication
  Cancel replicas with bad performance
  Replicate only if all active replicas are running
[Figure: replication process for one task]
[Figure: levels along the degree d, separated by threshold τ1. Level 1: no actions. Level 2: action is to replicate jobs.]
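The replication rule can be written as a small predicate. This is a sketch under stated assumptions, not the platform's code: it assumes that level 2 corresponds to d ≥ τ1, that "active" means queued or running, and the threshold value used in the usage lines is purely illustrative. The names `Replica` and `should_replicate` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    status: str  # "queued", "running" or "cancelled" in this sketch


def should_replicate(degree, replicas, tau_1):
    """Return True if a new replica of the task should be submitted.

    degree: incident degree d of the task, in [0, 1].
    replicas: existing replicas of the task.
    tau_1: threshold between level 1 (no actions) and level 2 (replicate).
    """
    if degree < tau_1:
        return False  # level 1: no actions
    active = [r for r in replicas if r.status in ("queued", "running")]
    # Replicate only if all active replicas are running: a replica that is
    # still queued may start soon, so adding another one would be wasteful.
    return all(r.status == "running" for r in active)


TAU_1 = 0.6  # illustrative value only, not taken from the slides

late_and_running = should_replicate(0.7, [Replica("running")], TAU_1)
late_but_queued = should_replicate(0.7, [Replica("running"), Replica("queued")], TAU_1)
on_time = should_replicate(0.5, [Replica("running")], TAU_1)
```

Note that a task with no active replica at all passes the check in this sketch, since `all` over an empty list is true.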
Experimental results
Self-Healing speeds up FIELD-II execution by up to a factor of 4
Repetition   w
1            -0.10
2            -0.15
3            -0.09
4             0.05
5            -0.26
Goal: Self-Healing vs No-Healing
  Cope with recoverable errors
Metrics
  Makespan of the activity execution
  Resource waste:
  $w = \frac{(\mathrm{CPU} + \mathrm{data})_{\text{self-healing}}}{(\mathrm{CPU} + \mathrm{data})_{\text{no-healing}}} - 1$
  For w < 0: self-healing consumed fewer resources
  For w > 0: self-healing wasted resources
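The waste metric is a relative difference, which a few lines of Python make concrete. This is a minimal sketch: `resource_waste` is a hypothetical helper, the w values are the ones reported per repetition, and the inputs in the last line are made-up numbers used only to show the arithmetic.

```python
def resource_waste(self_healing, no_healing):
    """w = (CPU + data)_self-healing / (CPU + data)_no-healing - 1."""
    if no_healing <= 0:
        raise ValueError("no-healing consumption must be positive")
    return self_healing / no_healing - 1


# w values reported for the five repetitions.
reported_w = {1: -0.10, 2: -0.15, 3: -0.09, 4: 0.05, 5: -0.26}

# Repetitions where self-healing consumed fewer resources (w < 0).
saved = [rep for rep, w in reported_w.items() if w < 0]  # [1, 2, 3, 5]
best = min(reported_w.values())                          # -0.26

# Illustrative inputs only: 74 units consumed with self-healing vs 100 without.
example_w = resource_waste(74.0, 100.0)  # about -0.26
```

The best reported value, w = -0.26, is the source of the 26% reduction figure.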
Self-Healing process reduced resource consumption by up to 26% compared to the No-Healing execution
R. Ferreira da Silva, T. Glatard, F. Desprez, Self-healing of operational workflow incidents on distributed computing infrastructures, IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), Ottawa, Canada, 2012.
VIP – Facts
321 registered users from 38 countries
Most used portal certificate in EGI (August 2012)
  https://wiki.egi.eu/wiki/EGI_robot_certificate_users
Consumed 379 CPU years from January 2011 to August 2012
  http://accounting.egi.eu
  1/10 of the total activity of the biomed international VO
  One of the most active users
[Figure: distribution of application executions in VIP, by application (Nov 2011 to Oct 2012)]
1155 executed simulations during the last year (~3/day)
[Figure: distribution of portal users on EGI (August 2012). Source: https://wiki.egi.eu/wiki/EGI_robot_certificate_users]
Concluding remarks
VIP is an openly accessible web portal for multi-modality medical image simulators
  MRI, US, CT and PET, and other tools
  Workflow execution on EGI
  Access to storage resources
  High-level interface for non-experts
  No IT required (Software as a Service)
Facts
  321 registered users from 38 countries
  Consumed about 400 CPU years / year
Limits and perspectives
  Fair resource allocation among workflows
  User support
  Heavy data transfers
Thank you for your attention. Questions?
http://vip.creatis.insa-lyon.fr