Lessons Learned in the Purdue Teragrid Condor Pools With An Adventure in Light Weight Adaptation. P. A. Cheeseman (aai@purdue.edu), Preston Smith (psmith@purdue.edu). Rosen Center for Advanced Computing, Purdue University

Dec 28, 2015

Page 1: Lessons Learned in the Purdue Teragrid Condor Pools With An Adventure in Light Weight Adaptation

P. A. Cheeseman (aai@purdue.edu), Preston Smith (psmith@purdue.edu)

Rosen Center for Advanced Computing, Purdue University

Page 2: Purdue Condor Pools

• Rosen Center Clusters
– Condor backfills among idle nodes in PBS clusters
• Provided 5.5 million CPU-hours in 2006, all from idle nodes in clusters
• Nature of Purdue pools makes for non-trivial chance of job eviction. More on this later.
• Campus
– Idle labs
– Departments around campus

Page 3: Purdue TeraGrid

All in all, 6400 CPUs available!
• Use on TeraGrid
– 2.4 million hours in 2006 spent building a database of hypothetical zeolite structures
– Solving the Football Pool Problem
• Already in 2007: 5.5 million hours allocated
– 4th largest single award in March allocations meeting
• Condor provides TeraGrid unparalleled price/cycle
– Similar throughput in terms of hours serviced with Cray XT3, DataStar, etc., for much less cost

Page 4: Purdue TeraGrid - Challenges

• Usage reporting
– TeraGrid uploads per-job usage nightly. This proved challenging to collect with Condor
– Perl scripting and data massaging to process history files and inject data into a database.
– Usage reporting infrastructure (AMIE) unable to keep up with the deluge of job records.
• … But that’s TeraGrid’s issue, not Condor’s.
• TG implemented temporary solution - usage reporting up to date

Page 5: Purdue Teragrid - Usage Reporting

• Detective work learning how to determine accurate job time
– RemoteWallClockTime - CumulativeSuspensionTime
• Not so useful for “charged time” (such as on TeraGrid)
• Requires manually computing the difference between completion time and last start time.
• Occasional bugs
– Negative walltime numbers (or really large ones)
• Usually in a job that has been condor_rm’d
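The "charged time" rule on this slide can be sketched in a few lines of Python. This is an illustration only, not the Perl scripts the authors used. It assumes each history record has already been parsed into a dict, and that the completion time and last start time are held in the job attributes `CompletionDate` and `JobCurrentStartDate` (an assumption about which attributes were read). The function name `charged_walltime` and the 30-day sanity cap are hypothetical.

```python
from typing import Optional

# Hypothetical sanity cap: anything longer than 30 days is treated as a bad record.
MAX_REASONABLE_SECONDS = 30 * 24 * 3600


def charged_walltime(job: dict, max_reasonable: int = MAX_REASONABLE_SECONDS) -> Optional[int]:
    """Return charged seconds as completion time minus last start time.

    Returns None when either timestamp is missing or zero, or when the
    difference is negative or implausibly large (the kind of record the
    slide reports for jobs that were removed with condor_rm).
    """
    completed = job.get("CompletionDate")      # assumed attribute name
    started = job.get("JobCurrentStartDate")   # assumed attribute name
    if not completed or not started:
        return None
    elapsed = completed - started
    if elapsed < 0 or elapsed > max_reasonable:
        return None
    return elapsed
```

Records that come back as `None` would be set aside for manual review rather than uploaded as usage.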

Page 6: More on Usage Reporting

• RCAC tracks job-level history. Similar history processing scripts used in campus grid as for TeraGrid
• Difficult to locate every schedd and grab history from it
– Even more complicated when some schedds whose usage we want to account for are under different administration.
– Skate around with ssh-key to collect history files
• We would love a centralized method to gather or record job history
– Or condor_history outputting XML or GGF usage records.

Page 7: TeraGrid Projects

Prof. Keith Cherkauer (Purdue University)
Hydrologic simulations, continuing enterprise, reasonably predictable impact.
I/O to CPU on the order of 50-200 MB/hour.
File system saturation a strong possibility.

Prof. M. W. Deem (Rice University)
Prof. D. J. Earl (University of Pittsburgh)
Hypothetical Zeolite Structures
Monte Carlo computation.
Average time per set of ~1 hour with broad variance.
I/O to Time on the order of 1-2 MB/hour on average.

Page 8: Lesson Learned Early - Cherkauer

Reality of leverage.

500+ jobs at 50-200 MB/Hour can keep a single file system very busy. The Cherkauer application was identifiably a problem for a particular parallel filesystem that shall not be named, at more than ~200 jobs in simultaneous execution (10 GB/Hour minimum).

Problems were resolved by conversion to standard universe to enable longer duration and fewer jobs.

• Eliminated system() calls.

• Added code to locate data files per search path.

Resulting code was usable under both vanilla and standard universes. Production runs presently being done in standard universe with file transfer.
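The load figures quoted on this slide follow from simple multiplication, which the short Python sketch below reproduces. The function name `aggregate_io_gb_per_hour` is hypothetical, and the conversion assumes 1 GB = 1000 MB, which matches the "10 GB/Hour minimum" figure.

```python
def aggregate_io_gb_per_hour(jobs: int, mb_per_job_hour: float) -> float:
    """Total file system traffic in GB/hour for `jobs` simultaneous jobs."""
    return jobs * mb_per_job_hour / 1000.0  # assumes 1 GB = 1000 MB


# Threshold at which the file system became a problem: ~200 jobs at the low I/O rate.
threshold_minimum = aggregate_io_gb_per_hour(200, 50)

# Range for 500 simultaneous jobs at 50-200 MB/hour each.
load_500_low = aggregate_io_gb_per_hour(500, 50)
load_500_high = aggregate_io_gb_per_hour(500, 200)
```

Here `threshold_minimum` is 10 GB/hour, while 500 jobs would generate between 25 and 100 GB/hour.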

Page 9: Outcome - Cherkauer

• Procedures developed to set up submissions allow for jobs to be queued in digestible batches.

• Procedures in ‘full automatic’ mode could be used to complete an entire problem while handling remote archive of results to avoid file system issues.

• Computation known to require a month or more of time now completes in less than a day.

Page 10: Database of Hypothetical Zeolite Structures

Prototyping:
• Trial group of ~100 parameter sets was used to prototype.

• Initial live data group was 6707 parameter sets.

• Set processed by executing program ~100 times (cycles).

• Execution of application performed by script in vanilla universe. Script allowed self checkpoint capability and duration control.

Prototyping Observations:
• Early delivery rates of ~7200 hours/day easily achieved.

• Ultimate number of sets to process was not well known. First estimate of ~500,000 grew to ~2,900,000 by 2007/02.

• Eviction rates were unacceptably high (see Figure 1).

Page 11: Figure 1. Unadapted Eviction Rate Breakdown (data for 326,041 jobs)

Eviction rate: share of jobs
20% or less: 12%
20-40%: 28%
40-60%: 25%
60% or more: 35%

Page 12: Database of Hypothetical Zeolite Structures

Prototyping Observations:
• Compute times per set varied from minutes to several hours (see Figures 2 and 3).

• Execution speed strongly related to compiler. Intel compilers were known in advance to produce significantly faster code.

Page 13: Database of Hypothetical Zeolite Structures

Adaptation Issues:
• Limiting job duration to eliminate runaways and limit eviction.

• Increasing small job duration to lower overhead of handling.

• Preemption tolerance (self checkpoint).

• Fault tolerance.

• Many issues were initially addressed via execution script. Adaptation to standard universe was thought to be a must.

Page 14: Database of Hypothetical Zeolite Structures

Workflow:
• Groups delivered via HTTP from Prof. Earl’s web site.

• Sets per group ranged from ~7000 to 30,000.

• Results returned to Prof. Earl via ‘drop zone’ in archival storage for post-analysis until approximately 10/2006. Post-analysis was subsequently handled at Purdue in Condor.

Processing at Purdue:
• Steward procedures developed to feed jobs to Condor, monitor progress, validate results, resubmit unanticipated failure cases, and archive results for group.

• Stewards were designed to process group in batches of ~2000 sets to allow processing within 6-8 GB of volatile storage.
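The batching step can be illustrated with a small Python sketch. It is not the stewards' actual code (the slides do not show it). The function name `batches` and the parameter-set identifiers are hypothetical, and only the batch size of ~2000 comes from the slide.

```python
from typing import Iterator, List, Sequence


def batches(sets: Sequence[str], batch_size: int = 2000) -> Iterator[List[str]]:
    """Yield consecutive slices of a group, each at most `batch_size` sets long."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(sets), batch_size):
        yield list(sets[start:start + batch_size])


# Example: a group of 7000 parameter sets (hypothetical identifiers).
group = [f"set{i:05d}" for i in range(7000)]
batch_sizes = [len(batch) for batch in batches(group)]
```

A steward would submit one batch, wait for validation and archival to finish, then move on to the next, so that only one batch occupies volatile storage at a time.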

Page 15: Database of Hypothetical Zeolite Structures

Adapting for the Purdue Condor Pools
• Eliminate need for execution side script to pave the way for standard universe execution.

• Incorporate repetitive execution within core application. Address overhead of execution side script, multiple loads of core application, enable transition to standard universe.

• Introduce self imposed timing controls. Address inability to identify runaways among 1000s of jobs.

• Embed reasonable self checkpoint capability. Address both preemption and fault tolerance.

• Introduce ability to tune average job duration to Condor pool conditions. Address eviction rate problem.

• Any other code work required to achieve the points above. Some memory management work was expected.
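The combination of repetitive execution, self-imposed timing control, and self checkpoint listed above can be sketched as a single loop. The Python below is a minimal illustration under stated assumptions, not the adapted application itself (whose source is not shown in the slides). The function name `run_cycles`, the JSON checkpoint format, and the `budget_seconds` parameter are all hypothetical.

```python
import json
import os
import time
from typing import Callable


def run_cycles(total_cycles: int,
               checkpoint_path: str,
               budget_seconds: float,
               do_cycle: Callable[[int], None],
               clock: Callable[[], float] = time.monotonic) -> bool:
    """Run cycles until finished or until the time budget is spent.

    Progress is saved after every cycle, so a job that is evicted or that
    stops itself on the time budget resumes from the last completed cycle.
    Returns True when all cycles are done, False when the budget ran out.
    """
    start = clock()
    done = 0
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as fh:
            done = json.load(fh)["cycles_done"]

    while done < total_cycles:
        if clock() - start >= budget_seconds:
            return False  # self-imposed limit reached; a later job continues
        do_cycle(done)
        done += 1
        # Write to a temporary file, then rename, so an eviction in the
        # middle of a write cannot leave a truncated checkpoint behind.
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w") as fh:
            json.dump({"cycles_done": done}, fh)
        os.replace(tmp_path, checkpoint_path)
    return True
```

Tuning `budget_seconds` is the knob that corresponds to adjusting average job duration to pool conditions: a shorter budget means more, shorter jobs, each less exposed to eviction.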

Page 16: Database of Hypothetical Zeolite Structures

Notes on Condor Adaptation
• Written adaptation plan reviewed by all concerned parties.

• Adaptation work undertaken while production continued.

• Modifications to existing code plus new code ~30 routines.

• Several hundred lines of non-commentary written.

• Code revisions validated periodically by textual comparison of result files for 100 parameter sets from control case.

• Adaptation period spanned compiler version changes.

• Adapted code became production version 09/2006.

• Approximately 325,000 sets completed before production.

• Code adaptation mandated changes to steward procedures
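The validation step mentioned above, textual comparison of result files against a control case, might look like the following Python sketch. The directory layout (one result file per parameter set, with the same file name in both directories) and the function name `mismatched_results` are assumptions made for illustration. The slides do not describe how the comparison was actually scripted.

```python
import filecmp
from pathlib import Path
from typing import List


def mismatched_results(control_dir: str, candidate_dir: str) -> List[str]:
    """Return names of control result files that the candidate run fails to reproduce.

    A file counts as mismatched when it is missing from the candidate
    directory or when its contents differ byte for byte.
    """
    control = Path(control_dir)
    candidate = Path(candidate_dir)
    bad = []
    for reference in sorted(control.iterdir()):
        if not reference.is_file():
            continue
        produced = candidate / reference.name
        # shallow=False forces a content comparison rather than a stat comparison.
        if not produced.is_file() or not filecmp.cmp(reference, produced, shallow=False):
            bad.append(reference.name)
    return bad
```

An empty list would mean the revised code reproduces the control case exactly. Any names returned point to the parameter sets worth inspecting by hand.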

Page 17: Database of Hypothetical Zeolite Structures

After adapting to Condor?
• Execution times became manageable (see Figures 4 and 5).

• Eviction rates fell to more controllable ratios (Figure 6).

• Workflow became more automatic with ability to limit job duration (and exposure to various system hiccups). Recovering loss of a few hundred ‘short’ jobs was easier than recovering loss of the same number of ‘long’ jobs.

• Application could run in either of standard or vanilla universe equally well due to duration control.

• Ultimate choice was to remain in vanilla universe to continue using Intel V9 compiler suite.

• Front end load due to steward procedures was reduced due to less handling of intermediate semaphore and lock files.

Page 18: Figures 2-5

Figure 2. Unadapted Execution Times (histogram: jobs vs. time in hours, 0 to 16)

Figure 3. Unadapted Execution Time Breakdown
20 min. or less: 3%
20-40 min.: 6%
40 min. to 1 hr.: 7%
1-2 hr.: 24%
2-3 hr.: 21%
3 hr. or more: 39%

Figure 4. Adapted Execution Times, Five Jobs per Set (histogram: jobs vs. time in hours, 0 to 3)

Figure 5. Adapted Execution Time Breakdown, Five Jobs per Set
1 hr. or less: 93%
1-2 hr.: 6%
2-3 hr.: 1%
(unlabeled slice): 0%

Page 19: Database of Hypothetical Zeolite Structures

Figure 6. Comparison of Eviction Rates, excluding Sg1 and Sg2 (percentage of jobs vs. eviction rate in %, 10 to 170). Series: before adaptation (326,041 jobs) and after (575,803 jobs).

Page 20: Database of Hypothetical Zeolite Structures

Project to-date:
• 1.5 million sets processed since 2006/02 including dry spells due to delays in workflow, exhaustion of allocation, and processing of renewal.

• 2.4 million hours ‘officially’ delivered to the project since 2006/02 or ~250 hours per hour excluding dry spells.

• Most recent throughput delivered 96,000 hours in 226 hour time span or ~400 processor hours per hour.

• Entire collaboration continues exclusively via e-mail.

• Approximately 1.4 million sets remain.
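The throughput figures on this slide can be checked with a few lines of arithmetic. The Python sketch below uses only numbers from the slide. The final estimate of remaining work is an extrapolation that assumes the historical average cost per set continues to hold, which the slides do not claim. All variable names are hypothetical.

```python
def hours_per_hour(cpu_hours: float, wall_hours: float) -> float:
    """Average number of processor hours delivered per wall-clock hour."""
    return cpu_hours / wall_hours


# Most recent run: 96,000 hours delivered in a 226 hour span.
recent_rate = hours_per_hour(96_000, 226)

# 2.4 million hours at ~250 hours per hour implies this many active wall-clock hours.
active_wall_hours = 2_400_000 / 250

# Rough extrapolation (assumption): average cost per set stays at its historical value.
sets_per_cpu_hour = 1_500_000 / 2_400_000
remaining_cpu_hours = 1_400_000 / sets_per_cpu_hour
```

The recent rate works out to roughly 425 processor hours per hour, which the slide rounds to ~400. The ~250 hours per hour figure implies about 9,600 active wall-clock hours, or roughly 400 days. Under the stated assumption, the remaining 1.4 million sets would need on the order of 2.2 million further hours.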

Page 21: Database of Hypothetical Zeolite Structures

Miscellany:
• Throughout the project, emphasis was given to designing stewards to return results to Prof. Earl in the same arrangement as they were delivered to ease post-analysis. Revision of the data structure was never seriously undertaken, nor seen to be necessary.

• While getting jobs into execution was initial primary concern, bulk of work in stewards ultimately centered on automating handling of results.

• Various working file systems were tried during production. The present procedures operate using volatile storage for active computation, high capacity storage for staging, and long term (tape robot) storage for archival.

Page 22: Database of Hypothetical Zeolite Structures

More Miscellany:
• Core of steward procedures composed of less than 10 scripts.

• Many additional scripts written to gather post mortem data w.r.t. job cost, fault statistics, exhaustive result validation, and summaries.

• DAGs were explored as a job metering tool but deferred due to problems not well understood and demands of production. Since the workflow didn’t implicitly require DAG features, the production methods were retained until a solid reason for using DAGs could be discerned. Additionally, ‘pre’ and ‘post’ procedures were known to be an undertaking as demanding as developing the stewards.

• Adaptation of the stewards to other batch systems was done more easily than expected. Batch systems for which prototypes were done included PBS, LSF, and LoadLeveler.

Page 23: Required Plug

TeraGrid ‘07
• Right here at UW!
– June 4-8, 2007
• Full analysis of Zeolite application, plus other Condor work from Purdue in proceedings
• Condor tutorial and demonstrations
• Come join us or even help!