Dynamic DAGMan with ClassAds
Himani Apte
Condor Team Member
Computer Sciences Department
University of Wisconsin-Madison
[email protected]
http://www.cs.wisc.edu/condor
Jan 20, 2016
Outline
› DAGMan workflow management
› Motivation for dynamic DAGMan
› ClassAds
› Putting together: DAGMan + ClassAds
› Looking ahead
DAGMan
› Directed Acyclic Graph Manager
› Meta-scheduler for Condor
› DAG: set of jobs with dependencies
› Manages submission of DAG jobs
› Enforces execution order
› DAGMan itself is a Condor job!
Example DAG
Job A A.condor
Job B B.condor
Job C C.condor
Job D D.condor
Parent A Child B C
Parent B C Child D
Script PRE A input.sh
Script POST D output.sh
[Figure: diamond-shaped DAG. A is the parent of B and C; B and C are the parents of D.]
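To make the dependency rules concrete, here is a minimal Python sketch that reads the Job and Parent/Child lines of the example and groups nodes into waves that could run together. It is an illustration only, not how DAGMan is implemented; the names parse_dag and execution_waves are hypothetical, and Script lines are ignored.

```python
from collections import defaultdict

DAG_TEXT = """\
Job A A.condor
Job B B.condor
Job C C.condor
Job D D.condor
Parent A Child B C
Parent B C Child D
"""

def parse_dag(text):
    """Return (jobs, children) parsed from Job and Parent/Child lines."""
    jobs = {}                     # node name -> submit file
    children = defaultdict(set)   # parent name -> set of child names
    for line in text.splitlines():
        words = line.split()
        if not words:
            continue
        keyword = words[0].lower()
        if keyword == "job":
            jobs[words[1]] = words[2]
        elif keyword == "parent":
            split = [w.lower() for w in words].index("child")
            for parent in words[1:split]:
                children[parent].update(words[split + 1:])
    return jobs, children

def execution_waves(jobs, children):
    """Group nodes into waves; a node becomes ready once all parents ran."""
    pending = {name: 0 for name in jobs}   # unfinished parents per node
    for kids in children.values():
        for kid in kids:
            pending[kid] += 1
    ready = sorted(n for n, count in pending.items() if count == 0)
    waves = []
    while ready:
        waves.append(ready)
        next_ready = []
        for node in ready:
            for kid in children.get(node, ()):
                pending[kid] -= 1
                if pending[kid] == 0:
                    next_ready.append(kid)
        ready = sorted(next_ready)
    return waves

jobs, children = parse_dag(DAG_TEXT)
waves = execution_waves(jobs, children)
print(waves)
```

For the example DAG, A forms the first wave, B and C the second, and D the third.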
Simplified state diagram of a DAG node
[Figure: node states Waiting, Pre-running, Submitted, Post-running, Done, Failed]
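The node states can be modelled as a small state machine. The sketch below is hedged: only the state names come from the slide, while the transition table is an assumption (a node may skip Pre-running or Post-running when it has no script, and any running stage may fail). NodeState, ALLOWED and advance are hypothetical names.

```python
from enum import Enum

class NodeState(Enum):
    WAITING = "Waiting"
    PRE_RUNNING = "Pre-running"
    SUBMITTED = "Submitted"
    POST_RUNNING = "Post-running"
    DONE = "Done"
    FAILED = "Failed"

# ASSUMPTION: these transitions are inferred, not taken from the slide.
ALLOWED = {
    NodeState.WAITING: {NodeState.PRE_RUNNING, NodeState.SUBMITTED},
    NodeState.PRE_RUNNING: {NodeState.SUBMITTED, NodeState.FAILED},
    NodeState.SUBMITTED: {NodeState.POST_RUNNING, NodeState.DONE,
                          NodeState.FAILED},
    NodeState.POST_RUNNING: {NodeState.DONE, NodeState.FAILED},
    NodeState.DONE: set(),
    NodeState.FAILED: set(),
}

def advance(current, target):
    """Return target if the move is allowed, otherwise raise ValueError."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target

# A node with a PRE script but no POST script (like node A in the example).
state = NodeState.WAITING
for step in (NodeState.PRE_RUNNING, NodeState.SUBMITTED, NodeState.DONE):
    state = advance(state, step)
print(state.value)
```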
DAGMan: important properties
› Monitors job state using Condor logs
› Simple and clean recovery model
• Rescue DAG: saves state at failure
• Restart: reconstruct internal state
› Scripts allow “lazy” planning
› Throttling parameters
Motivation for dynamic DAGMan
› DAG: complete execution order
› Flexibility to make run-time decisions
• Which subset of DAG nodes should execute?
• When should node X execute?
› Conditional DAGs
• Associate a condition with DAG edges
• Simplest condition: successful completion of parent nodes
Conditional DAG: examples
[Figure, Example 1: node A with children B and C. Condition: A.x == true. Yes leads to B, No leads to C.]
[Figure, Example 2: nodes P1 and P2 are parents of C. Condition: P1.x OR P2.x.]
Motivation for dynamic DAGMan
› Scripts can be leveraged for lazy planning
• For simple conditions
  • E.g. exit value of job
• Modify DAG structure
  • E.g. convert branch-not-taken to no-op/empty
› We want a generic solution
› Supported by “Dynamic DAGMan”
ClassAds
› Classified advertisements
› Used extensively in Condor
• Define jobs, machines, resources
• Define conditions, triggers, requirements
• Maintain internal state
ClassAds
› List of attribute-value pairs
• Simple value types: integers, strings
• Complex types: lists, expressions, ClassAds
› Matchmaking framework
• Tests match between two ClassAds
• Using “Requirements” expression
› Great fit for Dynamic DAGMan
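The matchmaking idea can be sketched in a few lines of Python. This is not the Condor ClassAd library: ads are plain dictionaries, each Requirements expression is a Python function of (my, other), and every attribute name other than Requirements (Owner, ImageSize, Memory) is a hypothetical example.

```python
def matches(ad_a, ad_b):
    """Two ads match when each ad's Requirements accepts the other ad."""
    return bool(ad_a["Requirements"](ad_a, ad_b)
                and ad_b["Requirements"](ad_b, ad_a))

# Hypothetical job ad: needs a machine with enough memory.
job_ad = {
    "Owner": "alice",
    "ImageSize": 512,
    "Requirements": lambda my, other: other.get("Memory", 0) >= my["ImageSize"],
}

# Hypothetical machine ads: refuse jobs from one particular owner.
big_machine = {
    "Memory": 1024,
    "Requirements": lambda my, other: other.get("Owner") != "mallory",
}
small_machine = {
    "Memory": 256,
    "Requirements": lambda my, other: other.get("Owner") != "mallory",
}

print(matches(job_ad, big_machine))
print(matches(job_ad, small_machine))
```

The match is symmetric: both Requirements expressions must hold, so the job is accepted by the large machine and rejected by the small one.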
Putting together: DAGMan + ClassAds
› Dynamic DAGMan research project
• Work in progress
• Not yet available in Condor
› DAG nodes have associated ClassAds
› Basic node attributes
• Job identifier, name, type
• Status (Waiting, Submitted, Done, etc.)
Dynamic DAGMan: attributes
› Execution characteristics of job
• Exit value
• Wall-clock time
• CPU utilization (local and remote)
• Network statistics (bytes sent / received)
• Information about files transferred (for vanilla universe)
› Attributes maintained by Condor for a job
Dynamic DAGMan: conditions
› Requirements expression
• Defines trigger condition for the node
• Arbitrarily complex expression
• Defined on the attributes of parent nodes
› Use matchmaking to determine if a node can be submitted
Dynamic DAG: example
[Figure: node A with condition x == true. Yes leads to B, No leads to C.]
Job A A.condor
Job B B.condor
Job C C.condor
Parent A Child B \
COND [ ( other.job == A &&
other.x == true ) ]
Parent A Child C \
COND [ ( other.job == A &&
other.x == false ) ]
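A rough Python sketch of how such edge conditions could select a branch once A finishes. It assumes the parent's ClassAd is a dictionary carrying job and x attributes and that each COND is tested against the parent's ad as "other"; cond_b, cond_c, EDGES and children_to_submit are hypothetical names, not part of Dynamic DAGMan.

```python
def cond_b(other):
    """COND on edge A -> B: other.job == A && other.x == true."""
    return other.get("job") == "A" and other.get("x") is True

def cond_c(other):
    """COND on edge A -> C: other.job == A && other.x == false."""
    return other.get("job") == "A" and other.get("x") is False

# Each conditional edge: (parent, child, condition on the parent's ad).
EDGES = [("A", "B", cond_b), ("A", "C", cond_c)]

def children_to_submit(parent_ad):
    """Return the children whose edge condition matches the finished parent."""
    return [child for parent, child, cond in EDGES
            if parent == parent_ad.get("job") and cond(parent_ad)]

print(children_to_submit({"job": "A", "x": True}))
print(children_to_submit({"job": "A", "x": False}))
```

If A's ad never defines x, neither condition holds and neither branch is submitted in this sketch.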
Dynamic DAGMan: example
Job P1 P1.condor
Job P2 P2.condor
Job C C.condor
Parent P1 P2 Child C \
COND [ (other.job == P1 &&
other.x == true) ||
(other.job == P2 &&
other.x == true) ]
[Figure: P1 and P2 are parents of C. Condition: P1.x OR P2.x.]
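The OR condition can be read as "C may run if the ad of at least one parent matches". The sketch below assumes exactly that reading and leaves open whether C must wait for both parents to finish; cond_c and can_submit_c are hypothetical names, and parent ads are plain dictionaries.

```python
def cond_c(other):
    """COND on the edges into C, tested against one parent's ad."""
    return ((other.get("job") == "P1" and other.get("x") is True) or
            (other.get("job") == "P2" and other.get("x") is True))

def can_submit_c(parent_ads):
    """C is eligible when the condition matches any available parent ad."""
    return any(cond_c(ad) for ad in parent_ads)

print(can_submit_c([{"job": "P1", "x": False}, {"job": "P2", "x": True}]))
print(can_submit_c([{"job": "P1", "x": False}, {"job": "P2", "x": False}]))
```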
Dynamic DAGMan
› Recovery model is still the same
• Rescue DAG: saves node state at failure
• ClassAd attribute values can be regenerated from Condor logs
› Flexibility to make run-time decisions
• Which subset of nodes in the DAG should be executed?
• When should node X be executed?
Looking ahead
› DAG with only implicit edges
• Parent-child relations embedded in ClassAds
• Nodes specify
  • Trigger condition
  • Preference for child nodes to run
• On-the-fly dependency formation based on previous node execution
› DAGMan collaborates with Quill
• Getting attributes from persistent storage
Looking ahead
› Allow job to modify/add its attributes
• Determine what happens after job exits
› Global state control
• Throttling expression/parameters
› Global DAG-ClassAd
• Statistics on running, successful and failed jobs
• E.g. if (#failed jobs > N) run cleanup node
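One way to picture the global DAG-ClassAd is as a summary computed from per-node states. The Python sketch below is purely illustrative, since the slide describes future work: the attribute names Running, Successful and Failed and the functions dag_ad and should_run_cleanup are hypothetical.

```python
from collections import Counter

def dag_ad(node_states):
    """Summarize per-node states into a global, ClassAd-like dictionary."""
    counts = Counter(node_states.values())
    return {
        "Running": counts["Submitted"],
        "Successful": counts["Done"],
        "Failed": counts["Failed"],
    }

def should_run_cleanup(ad, max_failed):
    """Trigger the cleanup node when #failed jobs > N."""
    return ad["Failed"] > max_failed

node_states = {"A": "Done", "B": "Failed", "C": "Failed", "D": "Submitted"}
summary = dag_ad(node_states)
print(summary)
print(should_run_cleanup(summary, max_failed=1))
```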
Thank you
We are interested in knowing your suggestions!