The Characteristics and Performance of Groups of Jobs in Grids. Alexandru Iosup, Mathieu Jan*, Ozan Sonmez and Dick Epema, PDS Group, Delft University of Technology, The Netherlands (*: now postdoc at LRI/INRIA Futurs, Orsay (Paris South), France). Euro-Par 2007, Rennes, 29th August.

Dec 25, 2015


Euro-Par 2007, Rennes, 29th August

The Characteristics and Performance of Groups of Jobs in Grids

Alexandru Iosup, Mathieu Jan*, Ozan Sonmez and Dick Epema

PDS Group, Delft University of Technology, The Netherlands

*: now postdoc at LRI/INRIA Futurs, Orsay (Paris South), France


Outline

• Why look at groups of jobs?

• Grid traces and environment summary

• Definitions of groups of jobs

• The characteristics of job groupings
  • Workload-level analysis
  • Group-level analysis
  • Job-level analysis

• Conclusion and future work


Why look at groups of jobs?

• Current grids run almost exclusively single-node jobs [Grid2006]
  • Trace analysis: LCG, Grid3, TeraGrid, DAS-2

• How are the jobs related, then? What is their structure?
  • Batches of identical jobs?
  • Something else?

• No such analysis exists using long-term data from production and research grid environments

• No analysis exists of the impact of groups of jobs on the performance of grids


Our research questions

• What are the dependencies among the jobs submitted by a single user?

• What is the physical structure of such groupings?

• What is the impact of the job groupings on the performance of grids?


Grid traces: Grid’5000 (1/3)

• Experimental platform
  • Grid’5000: 9 sites, 15 clusters
  • All clusters managed by OAR

• Trace period: 05/2004 - 11/2006
  • CPUs: ~2500
  • Jobs: 951 K
  • Users: 473
  • Groups: 10
  • Consumed CPU time: 651 years


Grid traces: NorduGrid (2/3)

• Large-scale production grid
  • NorduGrid: ~75 sites
  • Handled via ARC middleware
    • Advanced Resource Connector

• Trace period: 05/2004 - 02/2006
  • CPUs: ~2000
  • Jobs: 781 K
  • Users: 387
  • Groups: 106
  • Consumed CPU time: 2443 years


Grid traces: GLOW (3/3)

• Grid Laboratory Of Wisconsin
  • Campus-wide distributed computing environment
  • Condor based

• Trace period: 09/2006 - 01/2007
  • CPUs: ~1400
  • Jobs: 216 K
  • Users: 18
  • Groups: 1
  • Consumed CPU time: 55 years


Grid traces summary

                   Grid’5000          NorduGrid          GLOW
Period             05/2004 - 11/2006  05/2004 - 02/2006  09/2006 - 01/2007
Sites              15                 ~75                1
CPUs               ~2500              ~2000              ~1400
Jobs               951 K              781 K              216 K
Groups             10                 106                1
Users              473                387                18
Consumed CPU time  651 years          2443 years         55 years


Groups of jobs: definitions (1/2)

• Batch submission
  • Maximal contiguous subsequence G of a user's jobs (ordered by submission time) such that for any two successive jobs J, J’ in G, J’ is submitted at most Δ seconds after J

• Parameter Sweep Application (PSA)
  • Batch submission + jobs execute the same application
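The batch definition can be illustrated with a short sketch. It assumes that the condition on successive jobs is a bound Δ on the gap between their submission times (Δ = 120 seconds in the analysis), and that groups of size one correspond to single jobs. The `Job` record, its field names, and the helper names `batches` and `is_psa` are hypothetical and do not come from the traces or the authors' tools.

```python
from collections import defaultdict
from dataclasses import dataclass

DELTA = 120  # seconds; the threshold value quoted in the talk


@dataclass(frozen=True)
class Job:
    user: str            # submitting user (hypothetical field name)
    submit_time: float   # seconds since the start of the trace
    application: str     # executable name, used only for the PSA check


def batches(jobs, delta=DELTA):
    """Split each user's jobs into maximal runs in which successive
    submission times are at most `delta` seconds apart."""
    by_user = defaultdict(list)
    for job in jobs:
        by_user[job.user].append(job)

    groups = []
    for user_jobs in by_user.values():
        user_jobs.sort(key=lambda j: j.submit_time)
        current = [user_jobs[0]]
        for job in user_jobs[1:]:
            if job.submit_time - current[-1].submit_time <= delta:
                current.append(job)      # still within the same group
            else:
                groups.append(current)   # gap too large: close the group
                current = [job]
        groups.append(current)
    return groups


def is_psa(group):
    """A batch whose jobs all execute the same application."""
    return len(group) > 1 and len({j.application for j in group}) == 1


if __name__ == "__main__":
    trace = [
        Job("alice", 0, "sim"), Job("alice", 100, "sim"),
        Job("alice", 200, "sim"), Job("alice", 1000, "post"),
        Job("bob", 50, "a"), Job("bob", 60, "b"),
    ]
    for g in batches(trace):
        print(g[0].user, len(g), "PSA" if is_psa(g) else "not PSA")
```

Because the comparison is between successive jobs, a batch can span far more than Δ seconds in total as long as no single gap exceeds Δ.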


Groups of jobs: definitions (2/2)

• In this talk, we focus on batch submissions


Characteristics of job groupings

• In our analysis, Δ = 120 seconds


Workload-level analysis

• Batches (trace totals in parentheses)

             Grid’5000     NorduGrid      GLOW
Submissions  26k           50k            13k
Jobs         808k (951k)   738k (781k)    205k (216k)
CPU time     193y (651y)   2192y (2443y)  53y (55y)

• Continued
  • NorduGrid & GLOW: identical to batches
  • Grid’5000: 14k submissions, 910k jobs, 462y

• Bursty: fewer submissions, more jobs
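As a worked example, the share of each trace that falls inside batch submissions can be derived from the figures on this slide, reading each pair as (in batches, trace total). The sketch below only restates that arithmetic; the dictionary layout and the `batch_share` helper are illustrative names, not part of the original analysis.

```python
# (in batches, trace total) pairs copied from the slide:
# jobs in thousands, CPU time in years.
workload = {
    "Grid'5000": {"jobs_k": (808, 951), "cpu_years": (193, 651)},
    "NorduGrid": {"jobs_k": (738, 781), "cpu_years": (2192, 2443)},
    "GLOW":      {"jobs_k": (205, 216), "cpu_years": (53, 55)},
}


def batch_share(in_batches, total):
    """Percentage of the trace total accounted for by batches."""
    return 100.0 * in_batches / total


if __name__ == "__main__":
    for grid, metrics in workload.items():
        jobs = batch_share(*metrics["jobs_k"])
        cpu = batch_share(*metrics["cpu_years"])
        print(f"{grid}: {jobs:.0f}% of jobs, {cpu:.0f}% of CPU time in batches")
```

With these inputs, batches cover roughly 85% of jobs but only about 30% of CPU time in Grid’5000, about 94% and 90% in NorduGrid, and about 95% and 96% in GLOW.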


Group-level analysis: size of batches

• 75% of batches are of size 15-20 (Grid’5000 and NorduGrid) or <10 (GLOW)
• Average: 31+/-110 (Grid’5000), 15+/-33 (NorduGrid) and 15+/-38 (GLOW)
• Heavy-tail distribution


Group-level analysis: inter-arrival time (seconds)

• High inter-arrival time for batches, as expected
• 50% of the values are between 400 and 700 seconds
• Reminder: Δ = 120 seconds


Group-level analysis: duration (seconds)

• The duration of batches is higher than that of single jobs
• For NorduGrid, the average duration of batches is 1.5 days vs. 1 day for single jobs


Group-level analysis: consumed CPU time (KCPUs)

• Consumed CPU time is much higher for batches than for single jobs!


Job-level analysis: run time (seconds)

• Average run time for batches
  • Grid’5000: 0.66+/-6.65 days
  • GLOW: 1.04+/-3.18 days
  • NorduGrid: 2.27+/-5.59 days


Job-level analysis: wait time (seconds)

• NorduGrid: no wait time information in the trace
• Average wait times of batches are higher than
  • The runtime of batches
  • The wait time of single jobs


Job-level analysis: consumed CPU time (KCPUs)

• No clear distinction between batches and single jobs


Other analyses

• Do parallel jobs exist inside batches?
  • Average parallelism: 1+/-1 (Grid’5000), 2+/-7 (NorduGrid) and 1 (GLOW)
  • Grid’5000: 37% of batches are of size 2, 9% of size >2, max. = 325

• To what extent are batches PSAs?
  • In Grid’5000, 75% of batches are PSAs
  • PSAs compared to batches:
    • Group size increased by 9 on average
    • Average duration divided by 5.7


Performance impact of grouped submissions

• Batches display a high AIT value
  • Over 4000% of the ART!

• Research direction for designing scheduling policies for batches: minimization of the AIT of batches

• Performance metrics
  • Group runtime (RT)
  • Group duration (DT)
  • Group idle time: IT = DT - RT

            Batches              Single jobs
            ART (s)   AIT (s)    ART (s)   AIT (s)
Grid’5000   14 181    568 483    4 127     4 233
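The "over 4000%" figure can be checked with a few lines of arithmetic. The sketch assumes that ART and AIT denote the averages of the group runtime RT and group idle time IT, and reads the Grid’5000 row as ART = 14 181 s and AIT = 568 483 s for batches, and ART = 4 127 s and AIT = 4 233 s for single jobs. The function names are illustrative only.

```python
def idle_time(duration, runtime):
    """Group idle time as defined on the slide: IT = DT - RT."""
    return duration - runtime


def ait_as_percent_of_art(art, ait):
    """Average idle time expressed as a percentage of average runtime."""
    return 100.0 * ait / art


# Grid'5000 values in seconds, as read from the slide.
batches_pct = ait_as_percent_of_art(art=14_181, ait=568_483)
single_pct = ait_as_percent_of_art(art=4_127, ait=4_233)

if __name__ == "__main__":
    print(f"batches:     AIT is {batches_pct:.0f}% of ART")
    print(f"single jobs: AIT is {single_pct:.0f}% of ART")
```

Under this reading, the idle time of batches is about 40 times their runtime, while for single jobs the two are nearly equal, which is why minimizing idle time stands out as the scheduling target.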


Conclusion & future work

• Formally defined 3 types of groups of jobs
  • Batch (and PSAs), continued and bursty

• Analysis of 3 long-term traces from large and different platforms
  • Up to 96% of CPU time consumed by batch submissions

• Performance analysis of batches compared to single jobs

• Future work
  • Deeper analysis (Grid Workloads Archive)
  • Research direction: minimization of idle time in groups
  • Trace-driven simulations
  • Dynamic resource availability [Grid2007]


Thank you! Questions? Remarks? Observations?

Help build our community’s Grid Workloads Archive:

http://gwa.ewi.tudelft.nl/