A Low-Cost Parallel Queuing System for Computationally
Intensive Problems
Sean Martin, Bei Yuan, Judy Fredrickson, Fred Harris, Jr.*
University of Nevada, Reno
Background - Crossing Number Problem
My involvement started in Graduate School
  I was in Computer Science, my fiancée was in Mathematics at Clemson University
  She was under Rich Ringeisen
  Her MS work led to a 1988 Congressus Paper "Crossing Numbers of Permutation Graphs"
Helping her with the code got me "hooked" on the problem and I ended up taking Graph Theory from Ringeisen a couple of years later.
My work
  A GA for the Rectilinear MCN Problem
    1993 Cumberland Conference
    1996 Ars Combinatoria Paper
  Found drawings of K12 and K13 better than the formulas by Richard Guy
  Richard Guy said if the rectilinear formula was not a tight bound, the normal one would not be either.
Could you develop an algorithm for calculating the MCN for non-rectilinear graphs?
My wife and I worked on and finally developed a computational algorithm for solving the non-rectilinear Minimum Crossing Number Problem.
This was presented at the 1996 Kalamazoo Conference
This algorithm was then implemented by one of my students
  Umid Tadjiev
  Developed a static parallel partitioning of it
  Presented our work at the 1997 SIAM Conference on Parallel Processing for Scientific Computing.
Motivation
We still have not found out if the formula by Richard Guy is exact or not.
The difficulty is that this problem, and others like it, are computationally expensive
My goal has been to build a tool that would allow us to expand our knowledge of the MCN problem (and others as well).
Parallel Cluster Computation
Computer clusters are affordable
Parallel processing is now feasible for computationally intensive problems
  Exhaustive Searches
  Graph Algorithms
Can we build a tool that will harness this power and allow researchers to use it with little (or no) knowledge of the parallel programming details?
Development/Testing Cluster
Our idea for a computational engine
  A group of networked workstations
  Use many machines as one "Supercomputer"
  Work is distributed across all machines
  Low cost makes it an affordable resource
College of Engineering Computing Center Lab
  44 Pentium 4 machines running Windows XP and
  44 Pentium 4 workstations running Linux (RH 9.0)
Run-Time Cluster
Cortex, a much larger and faster cluster
  Processors (128 total)
    • 30 dual processor Pentium III
    • 34 dual processor Pentium IV Xeon
  Interconnect
    • Ethernet for NFS
    • Myrinet 2 for communication
      – 2 Gigabit bi-directional low-latency network
  Misc.
    • 2 GB RAM per CPU
    • More than ½ Terabyte of storage
The Problem to avoid – Load (un)Balancing
Unbalanced search tree
  Processes 2 & 3 sit idle while process 1 works toward a solution
A work queue system helps balance the workload
The Solution – A Generic Work Queue System
Almost all of the problems we have been looking at can be broken up into jobs (or sub-jobs).
We decided to build a queue of jobs (work) that can be distributed across a cluster to harness the parallel computation power available.
One of the goals:
  Little knowledge of parallel programming or message passing needed by user.
Queuing System Design Goals
Master/Slave architecture
  Master creates initial jobs for slaves
  Master then monitors messages and keeps the workload balanced
Central and distributed work queues
  Queue sizes can be altered (while running) for optimization
Master signals termination when the master queue is empty and all slaves are idle
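The termination rule above can be sketched as follows. This is a minimal single-process illustration, not the system's actual code: `MasterState`, `handle_work_request`, `handle_shared_work`, and `should_terminate` are hypothetical names, a job is assumed to be an array of integers, and real message passing (including jobs still in flight) is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <utility>
#include <vector>

using Job = std::vector<int>;  // assumption: a job is an array of integers

// Hypothetical bookkeeping a master might keep; names are illustrative only.
struct MasterState {
    std::deque<Job> central_queue;  // jobs waiting at the master
    std::vector<bool> slave_idle;   // slave_idle[i] is true when slave i has no work

    explicit MasterState(std::size_t num_slaves) : slave_idle(num_slaves, false) {}

    // A slave asks for work: hand out a job if one is available,
    // otherwise record that the slave is idle.
    bool handle_work_request(std::size_t slave, Job& out) {
        assert(slave < slave_idle.size());
        if (central_queue.empty()) {
            slave_idle[slave] = true;
            return false;
        }
        out = std::move(central_queue.front());
        central_queue.pop_front();
        slave_idle[slave] = false;
        return true;
    }

    // A slave shares surplus work with the master.
    void handle_shared_work(Job job) { central_queue.push_back(std::move(job)); }

    // Termination rule from the slide: central queue empty and every slave idle.
    bool should_terminate() const {
        if (!central_queue.empty()) return false;
        for (bool idle : slave_idle) {
            if (!idle) return false;
        }
        return true;
    }
};
```

In this sketch a slave only becomes idle by asking for work when the central queue is empty, so `should_terminate()` cannot fire while any slave still holds jobs.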
Queuing System
[Diagram labels: Master, Central Queue, Slave 1, Slave 2, ..., Slave n, Distributed Queue, Share Work Msg, Work Request]
User Requirements
Define a job
Define the master and slave functions
Then optionally
  Determine queue max and min sizes
    Can be ascertained empirically during development
  Adjust granularity as needed based upon performance (message passing behavior)
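One plausible reading of the queue max and min sizes is sketched below. The slides do not say how the thresholds are applied, so the policy here is an assumption: a slave requests work when its local queue drops below the minimum and shares work when it grows past the maximum. `QueueLimits`, `Action`, and `balance` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical tuning knobs; the slides only say that max and min sizes exist.
struct QueueLimits {
    std::size_t min_size;
    std::size_t max_size;
};

enum class Action { None, RequestWork, ShareWork };

// Assumed policy: too little local work -> ask the master for more;
// too much local work -> send some to the master for redistribution.
Action balance(std::size_t local_queue_size, const QueueLimits& limits) {
    assert(limits.min_size <= limits.max_size);
    if (local_queue_size < limits.min_size) return Action::RequestWork;
    if (local_queue_size > limits.max_size) return Action::ShareWork;
    return Action::None;
}
```

Under this reading, widening the gap between the two limits reduces message traffic at the cost of coarser balancing, which is the kind of trade-off that would be tuned empirically.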
Define a Job
A job is just a C/C++ data structure
  We used an array of integers
If the job is not made of built-in data types then the user must define types and overload operators
Our system is designed to work with almost any job
Example Job
MCN
  Region lists
  Adjacency matrix
  Several integers to keep track of best and current solutions
Job is enqueued as an array of integer values
Integer Array: Job Size | Current MCN | # Vertices | # Regions | Region List | Adjacency Matrix
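A sketch of packing a job into that integer-array layout and unpacking it again. The field order follows the slide; everything else is an assumption made for illustration: the names `McnJob`, `pack`, and `unpack`, the choice to store each region as its length followed by its vertex indices, and the row-major adjacency matrix.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical in-memory form of an MCN job.
struct McnJob {
    int current_mcn;
    int num_vertices;
    std::vector<std::vector<int>> regions;  // assumed: each region is a list of vertex indices
    std::vector<int> adjacency;             // assumed: row-major num_vertices x num_vertices
};

// Layout: job size, current MCN, # vertices, # regions, region list, adjacency matrix.
std::vector<int> pack(const McnJob& job) {
    std::vector<int> out;
    out.push_back(0);  // placeholder for the job size, filled in below
    out.push_back(job.current_mcn);
    out.push_back(job.num_vertices);
    out.push_back(static_cast<int>(job.regions.size()));
    for (const auto& region : job.regions) {
        out.push_back(static_cast<int>(region.size()));  // assumed length prefix
        out.insert(out.end(), region.begin(), region.end());
    }
    out.insert(out.end(), job.adjacency.begin(), job.adjacency.end());
    out[0] = static_cast<int>(out.size());
    return out;
}

McnJob unpack(const std::vector<int>& data) {
    assert(data.size() >= 4 && data[0] == static_cast<int>(data.size()));
    McnJob job;
    std::size_t pos = 1;
    job.current_mcn = data[pos++];
    job.num_vertices = data[pos++];
    const int num_regions = data[pos++];
    for (int i = 0; i < num_regions; ++i) {
        const int len = data[pos++];
        job.regions.emplace_back(data.begin() + pos, data.begin() + pos + len);
        pos += static_cast<std::size_t>(len);
    }
    job.adjacency.assign(data.begin() + pos, data.end());
    assert(job.adjacency.size() ==
           static_cast<std::size_t>(job.num_vertices) * static_cast<std::size_t>(job.num_vertices));
    return job;
}
```

Leading with the job size lets a receiver know how many integers to read before it interprets any other field.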
User Defined Functions
Master Function
  We called it master_create_jobs( )
  Creates initial jobs (from user data)
  Number of jobs created can be application dependent or based on number of processes
  May return a meaningful value such as a lower bound
    We return the initial MCN (using Guy's formula)
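For reference, Guy's conjectured formula for the crossing number of the complete graph Kn is Z(n) = (1/4) * floor(n/2) * floor((n-1)/2) * floor((n-2)/2) * floor((n-3)/2). A small sketch of computing it is below; the function name `guy_formula` is an illustrative choice, not a name from the system.

```cpp
#include <cassert>

// Guy's conjectured crossing number of the complete graph K_n.
// Integer division on non-negative operands acts as floor, hence the n >= 3 check.
long long guy_formula(int n) {
    assert(n >= 3);
    const long long a = n / 2;
    const long long b = (n - 1) / 2;
    const long long c = (n - 2) / 2;
    const long long d = (n - 3) / 2;
    return a * b * c * d / 4;  // the product is always divisible by 4
}
```

For n = 5, 6, 7, 8 this gives 1, 3, 9, 18, which agrees with the MCN values reported in the results for K5 through K8.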
User Defined Functions
Slave Function
  We called it work( )
  Unpacks job into local data structure to process
  The code for this function determines the granularity of the work being done
  This function adds jobs it creates onto the local queue
  It may return a meaningful value such as a current best solution (updating the MCN)
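To make the role of the slave function concrete, here is a minimal sketch using a small TSP search instead of the MCN code. Only the name `work` comes from the slides; its parameter list, the job layout (cost so far followed by the cities visited), the `Matrix` alias, and `run_single_slave` are assumptions, and the loop is a sequential stand-in for one slave with no message passing.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <limits>
#include <vector>

using Job = std::vector<int>;  // assumed layout: job[0] = cost so far, job[1..] = cities in visit order
using Matrix = std::vector<std::vector<int>>;

// Hypothetical slave function: unpacks a partial tour, extends it by one city,
// pushes the resulting sub-jobs onto the local queue, and returns the best
// complete-tour cost known so far. Costs are assumed to be non-negative.
int work(const Job& job, const Matrix& cost, int best, std::deque<Job>& local_queue) {
    assert(job.size() >= 2);
    const int n = static_cast<int>(cost.size());
    const int cost_so_far = job[0];
    const int last = job.back();
    const int visited = static_cast<int>(job.size()) - 1;

    if (visited == n) {  // complete tour: close the cycle back to the start
        const int total = cost_so_far + cost[last][job[1]];
        return total < best ? total : best;
    }
    for (int city = 0; city < n; ++city) {
        bool seen = false;
        for (std::size_t i = 1; i < job.size(); ++i) seen = seen || (job[i] == city);
        if (seen) continue;
        const int next_cost = cost_so_far + cost[last][city];
        if (next_cost >= best) continue;  // prune: cannot beat the current best
        Job child = job;
        child[0] = next_cost;
        child.push_back(city);
        local_queue.push_back(child);
    }
    return best;
}

// Sequential stand-in for a single slave draining its local queue.
int run_single_slave(const Matrix& cost) {
    std::deque<Job> local_queue;
    local_queue.push_back(Job{0, 0});  // cost 0, starting at city 0
    int best = std::numeric_limits<int>::max();
    while (!local_queue.empty()) {
        Job job = local_queue.front();
        local_queue.pop_front();
        best = work(job, cost, best, local_queue);
    }
    return best;
}
```

Granularity here is one city per job. Extending a tour by several cities inside a single call would create fewer, larger jobs and so less queue traffic.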
Results
The system is able to create and manage a large number of jobs and messages
  Test runs generated more than 128 million jobs
The system works for different problems
  Solved Minimum Crossing Number Problem for K6, K7, and K8
  Solved TSP for several graphs
MCN Results
Graph Size    MCN    # of Jobs Created/Processed
K5            1      3
K6            3      71
K7            9      25,844
K8            18     128,737,926
Future Work
Find the MCN of larger vertex sets
  Currently being used to solve MCN problem for growing N
Develop a job to find MCN of bipartite and other graphs
Add ability to save queues to disk
Develop a GUI (for ease of use)
Future Work
Is a Stack of jobs better than a Queue?
  The number of jobs generated is different because one does a depth first search and the other a breadth first search.
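The effect of the container choice can be seen on a toy branch-and-bound search. This is only an illustration under stated assumptions, not the MCN code: at each level one of several non-negative weights is chosen and the sum is minimized, and `Job`, `Result`, `Order`, and `search` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <limits>
#include <vector>

struct Job { std::size_t level; int cost; };
struct Result { int best; int jobs_processed; };
enum class Order { Queue, Stack };

// Toy branch-and-bound: pick one weight per level, minimize the total.
// Weights are assumed non-negative so a partial cost is a valid lower bound.
Result search(const std::vector<std::vector<int>>& weights, Order order) {
    std::deque<Job> jobs;
    jobs.push_back(Job{0, 0});
    Result r{std::numeric_limits<int>::max(), 0};
    while (!jobs.empty()) {
        Job job;
        if (order == Order::Queue) {
            job = jobs.front();  // FIFO: breadth first
            jobs.pop_front();
        } else {
            job = jobs.back();   // LIFO: depth first
            jobs.pop_back();
        }
        ++r.jobs_processed;
        if (job.level == weights.size()) {  // leaf: a complete solution
            if (job.cost < r.best) r.best = job.cost;
            continue;
        }
        for (int w : weights[job.level]) {
            const int cost = job.cost + w;
            // Only children that can still beat the best known solution become jobs.
            if (cost < r.best) jobs.push_back(Job{job.level + 1, cost});
        }
    }
    return r;
}
```

Both orders find the same optimum. The stack reaches complete solutions early, so the bound tightens sooner and fewer children are turned into jobs. The queue expands every level before any bound exists. Whether that advantage carries over to the MCN search is the open question raised on this slide.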
A Time Saving Region Restriction for Calculating the MCN of Kn
Judy Fredrickson
Talk 114 - Wednesday 4:00pm
Minimum Crossing Number
Classic graph theory problem
  Given a number of vertices n, what is the minimum number of crossings (Kn) if every vertex has an edge to every other vertex
  Proven for n ≤ 10
  Involves a very large search space
Traveling Salesman Problem
"…given a finite number of 'cities' along with the cost of travel between each pair of them, find the cheapest way of visiting all the cities and returning to your starting point." -- Traveling Salesman Problem Home Page
Problem size grows exponentially
  Difficult to solve problems of any significant size with brute force
Traveling Salesman Problem
     A  B  C  D  E
A    0  5  1  1  5
B    5  0  5  1  1
C    1  5  0  5  1
D    1  1  5  0  5
E    5  1  1  5  0
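As a cross-check on the cost matrix above, a brute-force search over all tours is sketched below. This is not the queue-based solver from the talk: it simply fixes city A as the start and tries every ordering of the remaining cities. `slide_matrix` and `brute_force_tsp` are illustrative names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <numeric>
#include <vector>

using Matrix = std::vector<std::vector<int>>;

// Cost matrix from the slide, with cities A..E mapped to indices 0..4.
Matrix slide_matrix() {
    return {
        {0, 5, 1, 1, 5},
        {5, 0, 5, 1, 1},
        {1, 5, 0, 5, 1},
        {1, 1, 5, 0, 5},
        {5, 1, 1, 5, 0},
    };
}

// Tries every tour that starts and ends at city 0 and returns the cheapest cost.
int brute_force_tsp(const Matrix& cost) {
    const std::size_t n = cost.size();
    assert(n >= 2);
    std::vector<std::size_t> order(n - 1);
    std::iota(order.begin(), order.end(), std::size_t{1});  // cities 1..n-1 in sorted order
    int best = std::numeric_limits<int>::max();
    do {
        int total = cost[0][order.front()];
        for (std::size_t i = 0; i + 1 < order.size(); ++i) {
            total += cost[order[i]][order[i + 1]];
        }
        total += cost[order.back()][0];
        best = std::min(best, total);
    } while (std::next_permutation(order.begin(), order.end()));
    return best;
}
```

On this matrix the cheapest tour costs 5, for example A, C, E, B, D, A, which uses only cost-1 edges. The number of orderings grows as (n-1)!, which is why brute force stops being practical quickly.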
Results TSP
TSP
  Created over a million jobs with a fairly small problem size
  ~30,000 jobs sent to master by slaves
  Relatively few requests for work
  Granularity could be less fine, more work done per job