ClusterSim: A Java-Based Parallel Discrete-Event Simulation Tool for Cluster Computing

Luís F. W. Góes, Luiz E. S. Ramos, Carlos A. P. S. Martins
Graduation Program in Electrical Engineering, Pontifical Catholic University of Minas Gerais
Av. Dom José Gaspar 500, Belo Horizonte, MG, Brazil
{[email protected]} {[email protected]} {[email protected]}

Abstract – In this paper, we present the proposal and implementation of a Java-based parallel discrete-event simulation tool for cluster computing called ClusterSim (Cluster Simulation Tool). ClusterSim supports visual modeling and simulation of clusters and their workloads for performance analysis. A cluster is composed of single- or multi-processor nodes, parallel job schedulers, and network topologies and technologies. A workload is represented by users that submit jobs composed of tasks, described by probability distributions and by their internal structure (CPU, I/O and MPI instructions). Our main objectives in this paper are: to present the proposal and implementation of the software architecture and simulation model of ClusterSim; to verify and validate ClusterSim; and to analyze ClusterSim by means of a case study. Our main contributions are the proposal and implementation of ClusterSim with a hybrid workload model, a graphical environment, the modeling of heterogeneous clusters, and a statistical and performance module.

Keywords: Cluster Computing, Discrete-Event Simulation, Parallel Job Scheduling, Network Topologies and Technologies, Performance Analysis.
Over time, new jobs are inserted into the Ousterhout matrix, while other jobs are removed as soon as they finish their execution. This causes fragmentation of jobs across the time slices, that is, the time slices do not utilize all the available processors. By means of packing schemes, jobs are allocated to processors (inserted into the Ousterhout matrix) so as to minimize job fragmentation. After fragmentation occurs, re-packing schemes such as alternative scheduling and slot unification can be applied [5] [19].

Table 6 shows the variations of the gang scheduling algorithm used in each cluster. When limited, the multiprogramming level was set to 3. When the multiprogramming level is unlimited, there is no point in using a wait queue, because an arriving job always fits into the matrix. The description of these algorithms, schemes and policies can be found in [5].

Table 6 – Variations of the gang scheduling algorithm for each cluster.

Cluster  Packing Scheme  Re-packing Scheme       Multiprogramming Level  Queue Policy
C01      First Fit       Alternative Scheduling  Limited                 FCFS
C02      First Fit       Alternative Scheduling  Limited                 SJF
C03      First Fit       Slot Unification        Limited                 FCFS
C04      First Fit       Slot Unification        Limited                 SJF
C05      First Fit       Alternative Scheduling  Unlimited               X
C06      First Fit       Slot Unification        Unlimited               X
C07      Best Fit        Alternative Scheduling  Limited                 FCFS
C08      Best Fit        Alternative Scheduling  Limited                 SJF
C09      Best Fit        Slot Unification        Limited                 FCFS
C10      Best Fit        Slot Unification        Limited                 SJF
C11      Best Fit        Alternative Scheduling  Unlimited               X
C12      Best Fit        Slot Unification        Unlimited               X
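The Ousterhout matrix and first-fit packing described above can be sketched as follows. This is a minimal illustration under our own naming, not ClusterSim's actual classes: each row is a time slice over the processors, and a cell holds a job id or -1 when idle.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of an Ousterhout matrix (not ClusterSim's code):
// each row (time slice) is an int[] over the processors; a cell holds
// a job id, or -1 when the processor is idle in that slice.
class OusterhoutMatrix {
    private final int processors;
    private final List<int[]> slices = new ArrayList<>();

    OusterhoutMatrix(int processors) { this.processors = processors; }

    // First-fit packing: place the job in the first time slice with
    // enough idle processors; if none fits and the multiprogramming
    // level (maxSlices) allows, open a new slice. Returns the slice
    // index, or -1 when the job must wait in the queue.
    int firstFit(int jobId, int degree, int maxSlices) {
        for (int s = 0; s < slices.size(); s++) {
            if (place(slices.get(s), jobId, degree)) return s;
        }
        if (slices.size() < maxSlices && degree <= processors) {
            int[] slice = new int[processors];
            Arrays.fill(slice, -1);
            slices.add(slice);
            place(slice, jobId, degree);
            return slices.size() - 1;
        }
        return -1;
    }

    // Assigns the job to `degree` idle cells of the slice, if possible.
    private boolean place(int[] slice, int jobId, int degree) {
        int idle = 0;
        for (int cell : slice) if (cell == -1) idle++;
        if (idle < degree) return false;
        for (int c = 0; c < slice.length && degree > 0; c++) {
            if (slice[c] == -1) { slice[c] = jobId; degree--; }
        }
        return true;
    }
}
```

A job that does not fit into any existing slice forces a new slice (raising the multiprogramming level) or, when the level is limited, goes to the wait queue.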
5.1.2 Workloads
In ClusterSim, a workload is composed of a set of jobs represented by their types, internal structures, submission probabilities and inter-arrival time distributions. Due to the lack of information about the internal structure of real jobs, we decided to create a synthetic set of jobs [5] [6] [8]. In the workload jobs, at each iteration the master task sends a different message to each slave task. In turn, the slaves process a certain number of instructions, according to the previously defined granularity, and then return a message to the master task. The total number of instructions to be processed by the job and the size of the messages are divided equally among the slave tasks.
With regard to the parallelism degree, which is represented by a probability distribution, we considered jobs with 1 to 4 tasks as having a low parallelism degree and jobs with 5 to 16 tasks as having a high parallelism degree. As usual, we used a uniform distribution to represent the parallelism degree. Combining the parallelism degree, number of instructions and granularity characteristics, we obtained 8 different basic job types.
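The synthetic job description above can be sketched as follows. The class and field names are hypothetical, not ClusterSim's job model: the parallelism degree is drawn uniformly from the low (1..4) or high (5..16) range, and the job's total instructions and message size are split equally among the slave tasks.

```java
import java.util.Random;

// Hypothetical sketch (not ClusterSim's classes) of a synthetic job:
// uniform parallelism degree, work split equally among the slaves.
class SyntheticJob {
    final int degree;            // total number of tasks
    final long perSlaveInstr;    // instructions each slave processes
    final long perSlaveMsgBytes; // bytes the master sends each slave

    SyntheticJob(boolean highParallelism, long totalInstr,
                 long totalMsgBytes, Random rng) {
        int lo = highParallelism ? 5 : 1;
        int hi = highParallelism ? 16 : 4;
        this.degree = lo + rng.nextInt(hi - lo + 1); // uniform in [lo, hi]
        int slaves = Math.max(1, degree - 1);        // a 1-task job has no slaves
        this.perSlaveInstr = totalInstr / slaves;
        this.perSlaveMsgBytes = totalMsgBytes / slaves;
    }
}
```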
There are two main aspects through which a job can influence gang scheduling: space and time [5]. In our case, space is related to the parallelism degree, and time to the number of instructions, granularity and other factors. We combined these orthogonal aspects to form 4 workload types. In the first type, jobs with a high parallelism degree and a structure that leads to a high execution time predominate. In the second type, jobs with a high parallelism degree and a low execution time predominate. The third has a majority of jobs with a low parallelism degree and a high execution time. In the last workload, jobs with a low parallelism degree and a low execution time prevail. For each workload we varied the predominance level among 60%, 80% and 100% (homogeneous). For example, a workload named HH60 is composed of 60% of jobs with a high execution time and a high parallelism degree, while the other 40% is composed of the opposite jobs (low execution time and low parallelism degree). Thus, we created 12 workloads to test the 12 clusters: HH60, 80 and 100; HL60, 80 and 100; LH60, 80 and 100; LL60, 80 and 100.
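The predominance mechanism can be sketched with a hypothetical helper (not part of ClusterSim): a workload such as HH60 submits the dominant job type with probability 0.6 and the opposite type otherwise.

```java
import java.util.Random;

// Hypothetical sketch of the predominance level: each submission draws
// the dominant job type with probability `predominance`, otherwise the
// opposite type (e.g. "HH" vs. "LL" for the HH60 workload).
class WorkloadMix {
    static String drawType(String dominant, String opposite,
                           double predominance, Random rng) {
        return rng.nextDouble() < predominance ? dominant : opposite;
    }
}
```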
In all workloads we used a total of 100 jobs, with inter-arrival times represented by an Erlang hyper-exponential distribution. To simulate a heavy load, we divided the inter-arrival time by a load factor of 100; this value was obtained through experimental tests.
Each of the 12 clusters was tested with each workload, using 10 different simulation seeds: 51, 173, 19, 531, 211, 739, 413, 967, 733 and 13. In total, we ran 1440 simulations (12 clusters × 12 workloads × 10 seeds).
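The experiment grid can be summarized in a small sketch (hypothetical helper names, not ClusterSim's code): every cluster is simulated with every workload under each seed, and each sampled inter-arrival time is divided by the load factor to create a heavy load.

```java
// Hypothetical sketch of the case-study experiment grid.
class ExperimentGrid {
    static final int[] SEEDS = {51, 173, 19, 531, 211, 739, 413, 967, 733, 13};

    // One simulation run per (cluster, workload, seed) combination.
    static int totalRuns(int clusters, int workloads) {
        return clusters * workloads * SEEDS.length;
    }

    // Heavy load: shrink sampled inter-arrival times by the load factor
    // (100 in the case study).
    static double scaledInterArrival(double sampledInterArrival, double loadFactor) {
        return sampledInterArrival / loadFactor;
    }
}
```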
5.2 Results Presentation and Analysis
In this section, we present and analyze the performance of the clusters and their gang scheduling algorithms. To analyze them, we compare clusters in which one gang scheduling component or parameter is varied while the others are fixed.
In Fig. 9, we present the mean node utilization for all workloads and clusters. Considering the packing schemes (Fig. 10(a)), when the multiprogramming level is unlimited, first fit performs better for HL and LH workloads.

Figure 9 – Mean node utilization for all clusters and workloads.
At first, the best fit scheme finds the best slot for a job, but in the long term this decision may prevent new jobs from entering more appropriate slots. For HL and LH workloads this risk increases, because the long jobs (with a low parallelism degree) that remain after the short jobs (with a high parallelism degree) finish will probably occupy columns in common, making it difficult to defragment the matrix. First fit, on the other hand, initially leaves the matrix more fragmented and increases the multiprogramming level, but in the long term it makes the matrix easier to defragment, because the jobs have fewer columns in common. In the other cases, the best fit scheme presents a slightly better performance. In general, both packing schemes have equivalent performance, and the same holds for the re-packing schemes (Fig. 10(b)).
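The difference between the two packing schemes can be illustrated with a hypothetical best-fit slot search (not ClusterSim's implementation): among the time slices that can hold the job, best fit picks the one with the fewest idle processors, leaving the smallest leftover hole, while first fit simply takes the first slice that fits.

```java
// Hypothetical sketch of best-fit slot selection over the time slices.
class BestFit {
    // idle[s] = number of idle processors in time slice s.
    // Returns the index of the tightest slice that fits the job,
    // or -1 when no existing slice can hold it.
    static int bestFit(int[] idle, int degree) {
        int best = -1;
        for (int s = 0; s < idle.length; s++) {
            if (idle[s] >= degree && (best == -1 || idle[s] < idle[best])) {
                best = s;
            }
        }
        return best;
    }
}
```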
Regarding the multiprogramming level, we reached two conclusions: unlimited is better for HH and LL workloads (Fig. 10(c)), but much worse for HL and LH workloads (Fig. 10(d)). With an unlimited multiprogramming level, a new slot is created for each new job that does not fit into the matrix. Since the load is high, a large number of time slices has been created by the end of the simulation. For HH and LL workloads, the big jobs are also the long ones, so when the small jobs terminate, the idle space is significantly smaller than the space occupied by the big jobs, that is, fragmentation is low. With LH and HL workloads, as time goes by the short jobs end, leaving idle spaces in the matrix. In this case, the big jobs are not the long ones, so a large space can become idle, and the fragmentation becomes high even if re-packing schemes are used.
Figure 10 – Mean utilization considering: (a) packing schemes; (b) re-packing schemes; multiprogramming level for (c) HH and LL workloads; and (d) HL and LH workloads.
With reference to the queue policies, the SJF (Short Job First) policy presented higher utilization in all cases. When the short jobs are removed first, there is a higher probability that short idle slots exist where they can fit. Under the FCFS policy, if the first job in the queue is a big one, it may not fit into the matrix, preventing the shorter jobs behind it from being executed.
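The SJF policy can be sketched with a priority queue ordered by estimated execution time (hypothetical classes, not ClusterSim's queue code); FCFS would simply dequeue jobs in arrival order.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

// Hypothetical sketch of the SJF wait-queue policy: jobs are dequeued
// by smallest estimated execution time, so short jobs can slip into
// small idle slots of the matrix ahead of a big blocked job.
class WaitQueue {
    static class Job {
        final int id;
        final double estTime; // estimated execution time
        Job(int id, double estTime) { this.id = id; this.estTime = estTime; }
    }

    // Returns the dequeue order (job ids) under SJF for the given
    // estimated execution times; ids are the arrival positions.
    static int[] sjfOrder(double[] estTimes) {
        PriorityQueue<Job> q =
            new PriorityQueue<>(Comparator.comparingDouble((Job j) -> j.estTime));
        for (int i = 0; i < estTimes.length; i++) q.add(new Job(i, estTimes[i]));
        int[] order = new int[estTimes.length];
        for (int i = 0; i < order.length; i++) order[i] = q.poll().id;
        return order;
    }
}
```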
On average, Cluster 08 presented the best (highest) mean utilization across all 12 workloads, but had we analyzed other metrics, another cluster might have been the best choice. Moreover, we could change the number of nodes, network parameters, processor parameters, number of submitted jobs, and so on. This case study is an example of using ClusterSim for cluster performance simulation.
6 Conclusions
In this paper, we proposed, implemented, verified, validated and analyzed the simulation tool ClusterSim. It has a graphical environment that facilitates the modeling and creation of clusters and workloads (parallel jobs and users) for performance analysis by means of simulation. Its hybrid workload model (probabilistic model plus structural description) allows the representation of real parallel jobs (instructions, loops etc.) and makes the simulation more deterministic than a purely probabilistic model. The verification and validation of ClusterSim by means of manual execution and experimental tests showed that ClusterSim provides mechanisms to repeat and modify parameters of real experiments in a controllable and trustworthy environment. As shown in our case study, we can create synthetic workloads and evaluate the performance of different cluster configurations.
Built in Java and with its source code available, ClusterSim's classes can be extended, allowing the creation of new network topologies, parallel job scheduling algorithms, etc.
The main contributions of this paper are the definition, proposal, implementation, verification, validation and analysis of ClusterSim. Its main features are: a hybrid workload model, a graphical environment, the modeling of heterogeneous clusters, and a statistical and performance module. As future work we highlight: the implementation of a network topology editor, support for distributed simulation, simulation of grid architectures, and the generation of statistical and performance graphics. More information, source code and documentation of ClusterSim will be available at: http://o_cabra.sites.uol.com.br/clustersim/clustersim.html.
References
[1] Bodhanwala, H. et al., "A General Purpose Discrete Event Simulator", Symposium on Performance Analysis of
Computer and Telecommunication Systems, Orlando, USA, 2001.
[2] Breslau, L. et al., “Advances in Network Simulation”, IEEE Computer, Vol. 33 (5), pp. 59-67, May 2000.
[3] Buyya, R. and Murshed, M., “GridSim: A Toolkit for the Modeling and Simulation of Distributed Resource
Management and Scheduling for Grid Computing”, The Journal of Concurrency and Computation: Practice and
Experience, Vol. 14, Issue 13-15, pp. 1175-1220, Wiley Press, USA, November-December 2002.
[4] Casanova, H., “Simgrid: a Toolkit for the Simulation of Application Scheduling”, 3rd IEEE/ACM International
Symposium on Cluster Computing and the Grid, Los Angeles, 2001.
[5] Feitelson, D. G. “Packing Schemes for Gang Scheduling”, Workshop on Job Scheduling Strategies for Parallel
Processing, pp. 89-110, 1996.
[6] Feitelson, D., “Workload Modeling for Performance Analysis”, Performance Analysis of Complex Systems:
Techniques and Tools, pp. 114-141, 2002.
[7] Góes, L. F. W., Martins, C. A. P. S., “RJSSim: A Reconfigurable Job Scheduling Simulator for Parallel
Processing Learning”, 33rd ASEE/IEEE Frontiers in Education Conference, Colorado, pp. F3C3-8, 2003.
[8] Góes, L. F. W., Martins, C. A. P. S., “Proposal and Development of a Reconfigurable Parallel Job Scheduling
Algorithm”, Master's Thesis, Belo Horizonte, Brazil, 2004 (in Portuguese).
[9] Law, A.M., Kelton, W.D.,“Simulation Modeling and Analysis”, McGraw-Hill, 1991.
[10] Low, Y.H. et al., “Survey of Languages and Runtime Libraries for Parallel Discrete-Event Simulation”, IEEE
Computer Simulation, pp. 170-186, 1999.
[11] MacNab, R. and Howell, F.W., “Using Java for Discrete Event Simulation”, 12th UK Performance Engineering
Workshop, Edinburgh, pp. 219-228, 1996.
[12] Prakash, S. and Bagrodia, R.L., “MPI-SIM: Using Parallel Simulation to Evaluate MPI Programs”, Winter
Simulation Conference (WSC98), pp. 467-474, 1998.
[13] Ramos, L. E. S., Góes, L. F. W., Martins, C. A. P. S., “Teaching and Learning Parallel Processing Through
Performance Analysis Using Prober”, 32nd ASEE/IEEE Frontiers in Education Conference, Boston, pp. S2F13-18,
2002.
[14] Rosenblum, M., Bugnion, E., Devine, S., Herrod, S. A. “Using the SimOS Machine Simulator to Study
Complex Computer Systems”, ACM TOMACS Special Issue on Computer Simulation, pp. 78-103, 1997.
[15] “A Collection of Modeling and Simulation Resources on the Internet”,
URL: www.idsia.ch/%7Eandrea/sim/simindex.html
[16] Sulistio, A., Yeo, C.S. and Buyya, R., “Visual Modeler for Grid Modeling and Simulation Toolkit”, Technical
Report, Grid Computing and Distributed Systems (GRIDS) Lab, Dept. of Computer Science and Software
Engineering, The University of Melbourne, Australia, 2002.
[17] Sulistio, A., Yeo, C.S. and Buyya, R., “A Taxonomy of Computer-based Simulations and its Mapping to
Parallel and Distributed Systems Simulation Tools”, International Journal of Software: Practice and Experience,
Wiley Press, 2004.
[18] Zhang, Y., H. Franke, Moreira, E.J., Sivasubramaniam, A. “Improving Parallel Job Scheduling by Combining
Gang Scheduling and Backfilling Techniques”, IEEE International Parallel and Distributed Processing Symposium,
pp. 133, 2000.
[19] Zhou, B. B., Mackerras, P., Johnson C. W., Walsh, D. “An Efficient Resource Allocation Scheme for Gang
Scheduling”, 1st IEEE Computer Society International Workshop on Cluster Computing, pp. 187-194, 1999.