- At least 10x more primitive tasks than processors in target computer
- Minimize redundant computations and redundant data storage
- Primitive tasks roughly the same size
- Number of tasks an increasing function of problem size
- Grouping tasks into larger tasks
- Goals:
  - Improve performance
  - Maintain scalability of program
  - Simplify programming
- In MPI programming, goal often to create one agglomerated task per processor
- Locality of parallel algorithm has increased
- Replicated computations take less time than communications they replace
- Data replication doesn't affect scalability
- Agglomerated tasks have similar computational and communications costs
- Number of tasks increases with problem size
- Number of tasks suitable for likely target systems
- Tradeoff between agglomeration and code modification costs is reasonable
- Process of assigning tasks to processors
- Centralized multiprocessor: mapping done by operating system
- Distributed memory system: mapping done by user
- Conflicting goals of mapping:
  - Maximize processor utilization
  - Minimize interprocessor communication
- Static number of tasks
  - Structured communication
    - Constant computation time per task:
      - Agglomerate tasks to minimize comm
      - Create one task per processor
    - Variable computation time per task:
      - Cyclically map tasks to processors
  - Unstructured communication:
    - Use a static load balancing algorithm
- Considered designs based on one task per processor and multiple tasks per processor
- Evaluated static and dynamic task allocation
- If dynamic task allocation chosen, task allocator is not a bottleneck to performance
- If static task allocation chosen, ratio of tasks to processors is at least 10:1
- Boundary value problem
- Finding the maximum
- The n-body problem
- Adding data input