Grid Simulator with Production Scheduling Algorithms
Miroslav Ruda, Hana Rudová
Motivation
Simulator with Virtual Machines
Experimental Testbed
Preemption
Conclusion
Grid Simulator with Production Scheduling Algorithms
Miroslav Ruda (1), Hana Rudová (2)
(1) Institute of Computer Science, Masaryk University
(2) Faculty of Informatics, Masaryk University
Cracow, 2007
Scheduling Algorithms
Algorithms in Grid simulators
  - SimGrid, GridSim, GSSIM, Alea
  - development and testing of new algorithms
  - comparison of algorithms
Algorithms in production systems
  - PBSPro, SGE, Maui, Moab
  - in simulators: approximated with FIFO (with backfilling)
  - hard to reimplement
    - many rules, features, bugs
    - closed source, algorithms not published
Example www.excludus.com
  Information about Excludus "Grid Optimizer"
  - uses innovative real-time scheduling algorithm
  - dynamic adaptive scheduling beyond traditional workload managers
Simulator with production resource management system
Experiments with PBSPro
  - different setup of PBS scheduler
    - queues, priorities, backfilling, ...
  - different setup of worker nodes
    - number of nodes per queue, ...
  - modifications of PBS scheduler
  - inclusion of virtual machines into PBS
Future: new scheduler
Simulator with Virtual Machines
Worker nodes represented by virtual machines
Standard PBS Server and Scheduler
  - running on dedicated server
Standard PBS Mom
  - running within each virtual machine
Sleep jobs
  - no CPU/memory consumption
Workloads
Real workloads
  - Czech Grid METACentrum
  - extracted from PBS accounting
  - 2005-2007
Jobs submitted with the same requirements
  - on worker nodes
  - to the same queues
  - with original owners, ...
Time reduction
  - configurable reduce factor (600)
  - expected and real wall-clock time
  - job arrival time
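The effect of the reduce factor can be sketched with a small helper. This is an illustration only: the function name scale_seconds is hypothetical and not part of the simulator, and it assumes that all durations are given in seconds. It uses the same integer division as shell arithmetic, so results are truncated.

```shell
#!/bin/bash
# scale_seconds SECONDS [FACTOR]
# Hypothetical helper (not from the slides): divide a duration in seconds
# by the reduce factor (default 600). Bash arithmetic is integer-only,
# so the result is truncated toward zero.
scale_seconds() {
    local seconds=$1
    local reduce=${2:-600}
    echo $(( seconds / reduce ))
}

scale_seconds 36000   # a 10-hour job becomes a 60-second sleep
scale_seconds 300     # a 5-minute job truncates to a 0-second sleep
```

One consequence of integer division is that any job shorter than the reduce factor (here 600 seconds) is simulated with a zero-length sleep.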
Vserver based virtual machines
One kernel space - very lightweight
  - similar to Linux chroot or BSD jail, with better protection
Access limits
  - standard: filesystem
  - added: processes, network devices ...
No hardware emulation, no paravirtualisation
  - no performance penalty
Copy On Write filesystem
  - one RO root filesystem, with RW overlay filesystem
System daemons
  - running only once in hosting environment
Experimental Testbed
Current workloads (year 2007)
  - January 4,700 jobs, March 14,000 jobs, Jan-March 70,000 jobs
150 Vserver domains
  - 16-core AMD machine (can use more physical machines)
  - represents 300 nodes (can be extended)
COW filesystem
  - 300 MB for one system installation
  - 12 GB used to represent 150 virtual machines
Virtual machine: only PBS Mom + sshd
Submission of all jobs
  - without any sleep takes less than 10 minutes
Reduce factor 600
  - 1 month -> 1.5 hours, 1 year -> ≤ 1 day
  - reasonably small simulation overhead
Evaluation Criteria
Standard monitoring during simulation run
  - number of running/waiting/done jobs
  - number of used nodes
Analysis of accounting data
  - Weighted Response Time (WRT)
  - Weighted SlowDown Time (WSD)
  - Weighted Wait Time (WWT)
  - metrics per user, queue
  - also structured by number of nodes used by job
Experimental Results for March Workload
[Figure: number of running jobs and number of used worker nodes. First simulation with "starvation support", second without.]
[Figure: number of finished jobs.]
Experimental Results for Parallel Jobs I.
[Figures: number of running jobs; number of finished jobs; number of finished parallel jobs.]
Experimental Results for Parallel Jobs II.
Wait time based on number of nodes used by job

Nodes      A      B      C      D      E
    1    166    326    261     99    272
    2     40     44     25    244     28
    4    308    337    229    522    352
    8    339    513    252    425    584
   12    434    580    393    615    853
   16    617    343    524    694    355
   28    817    361    758    884    430
   32    820    676    761    857    747
   40   1052    937   1020    808   1048

A: one queue
B: one queue, starvation support
C: two queues
D: two queues, strict FIFO
E: two queues, starvation
Jobs with Preemption
Motivation: better support of parallel jobs or priorities
Two virtual machines running on physical machine
  - first machine: standard jobs
  - second machine: privileged/parallel jobs
Magrathea allows
  - several VMs running on a single computer
  - jobs submitted directly to VMs
When job is started in privileged domain, Magrathea
  - suspends job in standard domain (if needed)
  - almost all CPU/memory resources are given to privileged domain (but standard is still running)
Support of simulator
  - Magrathea installed on simulated machines too
  - sleep jobs must respect preemption
Conclusion & Future Work
New Grid simulator
  - inclusion of production resource management system
    - new experiments: PBSPro (and other algorithms)
  - novel proposal with virtual machines
    - new experiments: scheduling with Magrathea
Future work
  - study: limits of the simulator
    - efficient scheduling algorithms needed
    - cannot use actual load on machines
    - monitoring issues
  - new scheduler
Standard submit script in simulator
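#!/bin/bash
reduce=600    # reduce factor

# gap in workload, scaled by the reduce factor
sleep $((SIMSLEEP / reduce))

# submit as the original owner, to the original queue,
# with the same node requirements and a scaled walltime request
sudo -u "$SIMUSER" qsub -q "$SIMQUEUE" \
    -l nodes="$SIMNODESL" \
    -l walltime=$((SIMREQL / reduce)) <<EOF
# sleep instead of real job
sleep $((SIMWALL / reduce))
EOF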
Preemption in simulator
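reduce=600
sleep $((SIMSLEEP / reduce))
sudo -u "$SIMUSER" qsub -q "sim$SIMQUEUE" \
    -l nodes="$SIMNODESL" \
    -l walltime=$((SIMREQL / reduce)) <<EOF
sleeptime=$((SIMWALL / reduce))
# \$ is escaped so that the loop is evaluated inside the job,
# not when the here-document is expanded at submission time
while [ "\$sleeptime" -gt 0 ]; do
    sleep "\$sleeptime"
    # check how long the job has been preempted
    sleeptime=\$(magrathea-preempted-time)
done
EOF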
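Weighted Response Time, SlowDown, and Wait Time

\[ SA_j = \mathit{reqResources}_j \times (\mathit{endTime}_j - \mathit{startTime}_j) \]

\[ \mathit{TotalSA} = \sum_{j \in \mathit{Jobs}} SA_j \]

\[ SD_j = \frac{\mathit{endTime}_j - \mathit{submitTime}_j}{\mathit{runtime}_j} \]

\[ \mathit{WRT} = \frac{\sum_{j \in \mathit{Jobs}} SA_j \times (\mathit{endTime}_j - \mathit{submitTime}_j)}{\mathit{TotalSA}} \]

\[ \mathit{WSD} = \frac{\sum_{j \in \mathit{Jobs}} SA_j \times SD_j}{\mathit{TotalSA}} \]

\[ \mathit{WWT} = \frac{\sum_{j \in \mathit{Jobs}} SA_j \times (\mathit{startTime}_j - \mathit{submitTime}_j)}{\mathit{TotalSA}} \]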
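The three weighted metrics can be computed from per-job records in a few lines of awk. The following is a minimal sketch under stated assumptions, not the authors' analysis tool: the function name weighted_metrics and the four-column input format (reqResources, submitTime, startTime, endTime, one job per line) are hypothetical, and runtime is assumed to equal endTime - startTime.

```shell
#!/bin/bash
# weighted_metrics: read jobs from stdin and print WRT, WSD and WWT.
# Hypothetical input format (not from the slides), one job per line:
#   reqResources submitTime startTime endTime
# Assumption: runtime_j = endTime_j - startTime_j.
weighted_metrics() {
    LC_ALL=C awk '
    NF == 4 {
        res = $1; t_submit = $2; t_start = $3; t_end = $4
        runtime = t_end - t_start
        if (runtime <= 0) next              # slowdown is undefined for such jobs
        sa = res * runtime                  # SA_j
        total += sa                         # TotalSA
        wrt += sa * (t_end - t_submit)      # numerator of WRT
        wsd += sa * (t_end - t_submit) / runtime   # SA_j * SD_j
        wwt += sa * (t_start - t_submit)    # numerator of WWT
    }
    END {
        if (total > 0)
            printf "WRT=%.2f WSD=%.2f WWT=%.2f\n", wrt / total, wsd / total, wwt / total
    }'
}
```

For example, `printf '1 0 10 110\n2 0 50 100\n' | weighted_metrics` describes two jobs with SA = 100 each and prints `WRT=105.00 WSD=1.55 WWT=30.00`.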