Euro-Par 2007, Rennes, 29th August
The Characteristics and Performance of Groups of Jobs in Grids
Alexandru Iosup, Mathieu Jan*, Ozan Sonmez and Dick Epema
PDS Group, Delft University of Technology, The Netherlands
*: now postdoc LRI/INRIA Futurs, Orsay (Paris South), France
Outline
• Why look at groups of jobs?
• Grid traces and environment summary
• Definitions of groups of jobs
• The characteristics of job groupings
  • Workload-level analysis
  • Group-level analysis
  • Job-level analysis
• Conclusion and future work
Why look at groups of jobs?
• Current grids run almost exclusively single-node jobs [Grid2006]
  • Trace analysis: LCG, Grid3, TeraGrid, DAS-2
• How are the jobs related, then? What is their structure?
  • Batches of identical jobs?
  • Something else?
• No such analysis exists using long-term data from production and research grid environments
• No analysis exists of the impact of groups of jobs on the performance of grids
Our research questions
• What are the dependencies among the jobs submitted by a single user?
• What is the physical structure of such groupings?
• What is the impact of the job groupings on the performance of grids?
Grid traces: Grid’5000 (1/3)
• Experimental platform
  • Grid’5000: 9 sites, 15 clusters
  • All clusters managed by OAR
• 75% of batches are of size 15-20 (Grid’5000 and NorduGrid) or <10 (GLOW)
  • Average: 31+/-110 (Grid’5000), 15+/-33 (NorduGrid) and 15+/-38 (GLOW)
  • Heavy-tail distribution
Group-level analysis: inter-arrival time (seconds)
• Expected high inter-arrival time for batches
  • 50% of the values are between 400 and 700 seconds
  • Reminder: = 120 seconds
Group-level analysis: duration (seconds)
• The duration of batches is higher than that of single jobs
  • For NorduGrid, the average duration of batches is 1.5 days vs. 1 day for single jobs
Group-level analysis: consumed CPU time (KCPUs)
• Consumed CPU time is much higher for batches than for single jobs!
Job-level analysis: run time (seconds)
• Average run time for batches
  • Grid’5000: 0.66+/-6.65 days
  • GLOW: 1.04+/-3.18 days
  • NorduGrid: 2.27+/-5.59 days
Job-level analysis: wait time (seconds)
• NorduGrid: no wait time information in the trace
• Average wait times of batches are higher than
  • The run time of batches
  • The wait time of single jobs
Job-level analysis: consumed CPU time (KCPUs)
• No clear distinction between batches and single jobs
Other analyses
• Do parallel jobs exist inside batches?
  • Average parallelism: 1+/-1 (Grid’5000), 2+/-7 (NorduGrid) and 1 (GLOW)
  • Grid’5000: 37% of batches are of size 2, 9% of size >2, max. = 325
• To what extent are batches PSAs?
  • In Grid’5000, 75% of batches are PSAs
  • PSAs compared to batches:
    • Group size increased by 9 on average
    • Average duration divided by 5.7
Performance impact of grouped submissions
• Batches display a high AIT value
  • Over 4000% of the ART!
• Research direction for designing scheduling policies for batches: minimization of the AIT of batches
• Performance metrics
  • Group runtime (RT)
  • Group duration (DT)
  • Group idle time: IT = DT - RT
            Batches              Single jobs
            ART (s)   AIT (s)    ART (s)   AIT (s)
Grid’5000   14 181    568 483    4 127     4 233
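The idle-time metric and the "over 4000%" figure can be checked with a few lines of arithmetic. The sketch below is only illustrative: it assumes that ART and AIT denote the averages of RT and IT over groups (the slide does not spell this out), and the names GroupTimes, idle_time and ait_over_art_percent are hypothetical helpers, not part of the authors' tooling. The four numeric values are the Grid’5000 figures from the table.

```python
from dataclasses import dataclass


@dataclass
class GroupTimes:
    """Timing of one group of jobs, in seconds (hypothetical container)."""
    runtime_s: float   # RT, as named on the slide
    duration_s: float  # DT, as named on the slide


def idle_time(group: GroupTimes) -> float:
    """Group idle time as defined on the slide: IT = DT - RT."""
    return group.duration_s - group.runtime_s


def ait_over_art_percent(art_s: float, ait_s: float) -> float:
    """Express an average idle time as a percentage of an average runtime."""
    return 100.0 * ait_s / art_s


# Grid'5000 values from the table (seconds).
batches_pct = ait_over_art_percent(art_s=14181, ait_s=568483)
single_pct = ait_over_art_percent(art_s=4127, ait_s=4233)

print(f"Batches:     AIT is {batches_pct:.0f}% of ART")
print(f"Single jobs: AIT is {single_pct:.0f}% of ART")
```

With these inputs the batch ratio comes out at roughly 4009%, consistent with the "over 4000%" statement, while single jobs stay close to 100%.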
Conclusion & future work
• Formally defined 3 types of groups of jobs
  • Batch (and PSAs), continued and bursty
• Analysis of 3 long-term traces from large and different platforms
  • Up to 96% of CPU time consumed by batch submissions
• Performance analysis of batches compared to single jobs
• Future work
  • Deeper analysis (Grid Workloads Archives)
  • Research direction: minimization of idle time in groups
  • Trace-driven simulations
  • Dynamic resource availability [Grid2007]
Thank you! Questions? Remarks? Observations?
Help build our community’s Grid Workloads Archive: