Concurrency and Parallel Programming: “An Introduction”
Carlos Jaime Barrios Hernandez, PhD. EISI UIS @carlosjaimebh
Concurrent and Parallel
La répétition sur la scène, 1874, Edgar Degas, Paris, Musée d'Orsay.
Plan
• The Traditional Way
• Design Spaces of Parallel Programming Recall
• Concurrent Programming
• Distributed Memory vs. Shared Memory
• Design Models for Concurrent Algorithms
  • Task Decomposition
  • Data Decomposition
• Concurrent Algorithm Design Features and Forces
• Not Parallelizable Jobs, Tasks and Algorithms
• Algorithm Structures
• Final Notes
Traditional Way
Designing and Building Parallel Programs, by Ian Foster, http://www.mcs.anl.gov/~itf/dbpp/
Design Spaces of Parallel Programming*
* Patterns for Parallel Programming, Timothy Mattson, Beverly A. Sanders and Berna L. Massingill, Software Patterns Series, Addison-Wesley, 2004

FC • Finding Concurrency (structuring the problem to expose exploitable concurrency)
AS • Algorithm Structure (structuring the algorithm to take advantage of concurrency)
SS • Supporting Structures (interfaces between algorithms and environments)
IM • Implementation Mechanisms (defining programming environments)
(Remember) Concurrency and Parallelism
• A system is “concurrent” if it can support two or more actions in progress at the same time.
• A system is “parallel” if it can support two or more actions executing simultaneously.
Concurrent programming is all about independent computations that the machine can execute in any order.
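That definition can be made concrete with a small sketch (a hypothetical example, not from the slides): two computations that share no data can be interleaved by the machine in any order without changing the result.

```python
import threading

# Two independent computations: each writes only its own key, so the
# machine may execute them in any order (or truly in parallel).
results = {}

def square(n):
    results[n] = n * n

def cube(n):
    results[-n] = n * n * n

t1 = threading.Thread(target=square, args=(4,))
t2 = threading.Thread(target=cube, args=(4,))
t1.start(); t2.start()
t1.join(); t2.join()
print(results[4], results[-4])   # 16 64, whatever the interleaving
```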
Concurrent vs. Parallel
Distributed vs. Parallel
Concurrent Programming: General Steps
1. Analysis
   • Identify possible concurrency.
   • Hotspot: any portion of the code that has a significant amount of activity
     (time spent, independence of the code…)
2. Design and Implementation
   • Threading the algorithm
3. Tests of Correctness
   • Detecting and fixing threading errors
4. Performance Tuning
   • Removing performance bottlenecks: logical errors, contention, synchronization errors, imbalance, excessive overhead
   • Fixing performance problems in the code (tuning cycles)
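A minimal sketch of this cycle (the function and data are illustrative): a per-element computation is identified as the hotspot, threaded with a pool, and then checked against the serial answer as the correctness test.

```python
from concurrent.futures import ThreadPoolExecutor

def hotspot(x):
    # Step 1 (analysis): this per-element computation is independent,
    # so it is a candidate for concurrency.
    return x * x + 1

data = list(range(1000))
serial = [hotspot(x) for x in data]              # the original serial code

# Step 2 (design/implementation): thread the hotspot with a pool.
with ThreadPoolExecutor(max_workers=4) as pool:
    threaded = list(pool.map(hotspot, data))

# Step 3 (tests of correctness): the threaded version must match.
assert threaded == serial
```

Step 4 (tuning) would then measure and adjust the pool size and task granularity.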
Distributed vs. Shared Memory Programming
Common Features
• Redundant work
• Dividing work
• Sharing data (different methods)
• Dynamic/static allocation of work
  • Depending on the nature of the serial algorithm, the resulting concurrent version, and the number of threads/processors
Only in Shared Memory
• Local declarations and thread-local storage
• Memory effects: false sharing
• Communication in memory
  • Mutual exclusion
  • Producer/consumer model
  • Reader/writer locks (in distributed memory: boss/worker)
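Two of these shared-memory mechanisms can be sketched with Python's standard library (the workload is illustrative): a queue for the producer/consumer model and a lock for mutual exclusion on a shared accumulator.

```python
import queue
import threading

q = queue.Queue()
total = 0
lock = threading.Lock()

def producer():
    for i in range(100):
        q.put(i)                 # hand items to the consumer
    q.put(None)                  # sentinel: no more items

def consumer():
    global total
    while True:
        item = q.get()
        if item is None:
            break
        with lock:               # mutual exclusion around the shared total
            total += item

t1 = threading.Thread(target=producer)
t2 = threading.Thread(target=consumer)
t1.start(); t2.start()
t1.join(); t2.join()
print(total)                     # 0 + 1 + ... + 99 = 4950
```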
Task and Data Decomposition
• Task decomposition: task parallelism
• Data decomposition: data parallelism (geometric parallelism)
Concurrent Computation from Serial Code
• Sequential consistency property: the concurrent solution gets the same answer as the serial code on the same input data set, for any execution sequence of the concurrent algorithm.
[Figure: a sequential version (in → P → out) next to a parallel/concurrent version in which several P tasks process the input at the same time.]
Tasks must be assigned to threads for execution.
Task Decomposition Considerations
• What are the tasks and how are they defined?
• What are the dependencies between tasks and how can they be satisfied?
• How are the tasks assigned to threads?
What are the tasks and how are they defined?
• There should be at least as many tasks as there will be threads (or cores).
• It is almost always better to have (many) more tasks than threads.
• Granularity must be large enough to offset the overhead that will be needed to manage the tasks and threads.
  • More computation: higher granularity (coarse-grained)
  • Less computation: lower granularity (fine-grained)
Granularity is the amount of computation done before synchronization is needed.
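The granularity trade-off can be sketched as follows (the sizes and chunk count are illustrative): the same work submitted one element per task (fine-grained) and one chunk per task (coarse-grained). Both give the same answer, but the coarse version pays the per-task management overhead only four times.

```python
from concurrent.futures import ThreadPoolExecutor

def work(x):
    return x * x

def work_chunk(chunk):                 # coarse-grained task: a whole chunk
    return [work(x) for x in chunk]

data = list(range(10_000))
n_chunks = 4
size = len(data) // n_chunks
chunks = [data[i * size:(i + 1) * size] for i in range(n_chunks)]

with ThreadPoolExecutor(max_workers=n_chunks) as pool:
    fine = list(pool.map(work, data))          # 10 000 tiny tasks
    coarse = [y for part in pool.map(work_chunk, chunks) for y in part]

assert fine == coarse                          # same result, less overhead
```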
Task Granularity
[Figure: fine-grained decomposition, with many small tasks spread over Cores 0–3 and the per-task overhead paid again and again, versus coarse-grained decomposition, with one large task per core and the overhead paid only once per task.]
Task Dependencies
• Order dependency
• Data dependency
Enchantingly parallel code (also known as “embarrassingly parallel”): code without dependencies.
[Figure: left, a chain of dependent processes (in → Process 1 → Process 2 → Process 3 → Out); right, independent processes with separate inputs and outputs (In1, In2 → Process 1, Process 2, Process 3 → Out1, Out2) and no dependencies between them.]
Data Decomposition Considerations (Geometric Decomposition)
Data structures must (commonly) be divided into arrays or logical structures.
• How should you divide the data into chunks?
• How should you ensure that the tasks for each chunk have access to all data required for update?
• How are the data chunks assigned to threads?
How should you divide data into chunks?
• By individual elements
• By rows
• By groups of columns
• By blocks
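The four chunking schemes can be sketched on a small matrix (a plain list of lists; the 4×4 size and the split points are arbitrary):

```python
rows, cols = 4, 4
m = [[r * cols + c for c in range(cols)] for r in range(rows)]

by_elements = [m[r][c] for r in range(rows) for c in range(cols)]
by_rows = [m[r] for r in range(rows)]                 # one chunk per row
by_col_groups = [[row[:2] for row in m],              # columns 0-1
                 [row[2:] for row in m]]              # columns 2-3
by_blocks = [[row[:2] for row in m[:2]],              # four 2x2 blocks
             [row[2:] for row in m[:2]],
             [row[:2] for row in m[2:]],
             [row[2:] for row in m[2:]]]

print(by_blocks[0])   # [[0, 1], [4, 5]]
```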
The Shape of the Chunk
• Data decomposition has an additional dimension: the shape of the chunk.
• The shape determines what the neighboring chunks are and how any exchange of data will be handled during the course of the chunk computations.
• Regular shapes: common for regular data organizations.
• Irregular shapes: may be necessary due to the irregular organizations of the data.
[Figure: a compact rectangular chunk with 2 shared borders versus an irregular chunk with 5 shared borders; more shared borders mean more data exchange.]
How should you ensure that the tasks for each chunk have access to all data required for update?
• Using ghost cells: ghost cells hold data copied from a neighboring chunk.
[Figure: the original split with (empty) ghost cells, then the neighbors’ border data copied into the ghost cells.]
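A minimal 1-D sketch of the ghost-cell technique (the data and the 3-point stencil are illustrative): each chunk keeps a copy of its neighbor's border element, so its update never reads outside the chunk.

```python
data = list(range(8))
left, right = data[:4], data[4:]

# Copy border data from the neighboring chunk into ghost cells.
left_g = left + [right[0]]           # ghost cell on the right edge
right_g = [left[-1]] + right         # ghost cell on the left edge

def update(chunk):
    # 3-point average over the interior; the ghost cells supply the
    # neighbor values, so no access leaves the chunk.
    return [(chunk[i - 1] + chunk[i] + chunk[i + 1]) / 3
            for i in range(1, len(chunk) - 1)]

# Each task can now update its own chunk independently.
new_left, new_right = update(left_g), update(right_g)
print(new_left, new_right)   # [1.0, 2.0, 3.0] [4.0, 5.0, 6.0]
```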
How are the data chunks (and tasks) assigned to threads?
• Data chunks are associated with tasks and assigned to threads statically or dynamically, via scheduling:
  • Static: when the amount of computation within tasks is uniform and predictable.
  • Dynamic: to achieve good balance despite variability in the computation needed per chunk; requires many (more) tasks than threads.
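Both assignment strategies can be sketched as follows (chunk sizes and thread count are illustrative): static assignment fixes chunk ownership before execution, while dynamic assignment lets idle threads pull the next chunk from a shared queue.

```python
import queue
import threading

chunks = [list(range(i, i + 5)) for i in range(0, 20, 5)]
n_threads = 2

# Static scheduling: chunk k is always owned by thread k % n_threads.
static = {t: [c for k, c in enumerate(chunks) if k % n_threads == t]
          for t in range(n_threads)}

# Dynamic scheduling: threads pull chunks as they become free.
work = queue.Queue()
for c in chunks:
    work.put(c)
sums, lock = [], threading.Lock()

def worker():
    while True:
        try:
            c = work.get_nowait()
        except queue.Empty:
            return                   # no chunks left
        s = sum(c)
        with lock:
            sums.append(s)

threads = [threading.Thread(target=worker) for _ in range(n_threads)]
for t in threads: t.start()
for t in threads: t.join()
print(sorted(sums))   # [10, 35, 60, 85]
```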
Concurrent Design Model Features
• Efficiency: concurrent applications must run quickly and make good use of processing resources.
• Simplicity: easier to understand, develop, debug, verify and maintain.
• Portability: in terms of threading portability.
• Scalability: the design should be effective on a wide range of numbers of threads and cores, and sizes of data sets.
Task and Domain Decomposition Patterns
• Task Decomposition Pattern
  • Understand the computationally intensive parts of the problem.
  • Find tasks (as many as possible):
    • Actions that are carried out to solve the problem
    • Actions that are distinct and relatively independent
• Data Decomposition Pattern
  • Data decomposition implied by the tasks.
  • Finding domains:
    • The most computationally intensive part of the problem is organized around the manipulation of a large data structure.
    • Similar operations are being applied to different parts of the data structure.
  • In shared-memory programming environments, data decomposition will be implied by task decomposition.
Group and Order Tasks Patterns
• Group Tasks Pattern: simplify the problem’s dependency analysis
  • If a group of tasks must work together on a shared data structure
  • If a group of tasks are dependent
• Order Tasks Pattern: find, and correctly account for, dependencies resulting from constraints on the order of execution of a collection of tasks
  • Temporal dependencies
  • Specific requirements of the tasks
Data Sharing Pattern
• Data decomposition might define some data that must be shared among the tasks.
• Data dependencies can also occur when one task needs access to some portion of another task’s local data.
• Sharing categories:
  • Read-only
  • Effectively local (accessed by only one of the tasks)
  • Read-write
  • Accumulative
  • Multiple-read/single-write
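A brief sketch of the accumulative sharing case (everything here is illustrative): each task works read-only on its private chunk, and the only shared datum, a running total, is updated under mutual exclusion.

```python
import threading

total = 0
lock = threading.Lock()

def accumulate(chunk):
    global total
    partial = sum(chunk)     # read-only work on the task's own data
    with lock:               # accumulative update on the shared value
        total += partial

chunks = [range(0, 500), range(500, 1000)]
threads = [threading.Thread(target=accumulate, args=(c,)) for c in chunks]
for t in threads: t.start()
for t in threads: t.join()
print(total)   # 0 + 1 + ... + 999 = 499500
```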
Design Evaluation Pattern
• Production of the analysis and decomposition:
  • Task decomposition that identifies concurrency
  • Data decomposition that identifies data local to each task
  • Groups of tasks, and an order among the groups, that satisfy temporal constraints
  • Dependencies among tasks
• Design evaluation:
  • Suitability for the target platform
  • Design quality
  • Preparation for the next phase of the design
Not Parallelizable Jobs, Tasks and Algorithms
• Algorithms with state
• Recurrences
• Induction variables
• Reductions
• Loop-carried dependencies
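A recurrence makes the problem clear (this tiny example is illustrative): every iteration reads the value the previous iteration wrote, so the iterations cannot simply be distributed across threads in any order.

```python
def recurrence(n, a=2, b=1):
    x = 0
    for _ in range(n):
        x = a * x + b      # loop-carried dependency: needs the previous x
    return x

print(recurrence(4))   # 0 -> 1 -> 3 -> 7 -> 15
```

Reductions are the partial exception on the list: although they look like recurrences, associativity lets them be reorganized into independent partial results that are combined at the end.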
The Mythical Man-Month: Essays on Software Engineering, by Fred Brooks. Addison-Wesley Professional, 1995.
Algorithm Structures
• Organizing by tasks
  • Task Parallelism
  • Divide and Conquer
• Organizing by data decomposition
  • Geometric Decomposition
  • Recursive Data
• Organizing by flow of data
  • Pipeline
  • Event-Based Coordination
Algorithm Structure Decision Tree (Major Organizing Principle)
Start:
• Organize by tasks
  • Linear → Task Parallelism
  • Recursive → Divide and Conquer
• Organize by data decomposition
  • Linear → Geometric Decomposition
  • Recursive → Recursive Data
• Organize by flow of data
  • Linear → Pipeline
  • Recursive → Event-Based Coordination
Divide and Conquer Strategy
[Figure: the problem is recursively split into subproblems (split, split), each subproblem is solved (solve), and the subsolutions are merged back (merge, merge) into the final solution.]
Divide and Conquer Parallel Strategy
[Figure: the same split / base-case solve / merge tree, where each dashed-line box (a split, a base-case solve, or a merge) represents a task.]
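The task layout in the figure can be sketched with a sum (the two-level split and pool size are illustrative): split twice, solve the four base cases as separate tasks, then merge pairwise.

```python
from concurrent.futures import ThreadPoolExecutor

def split(data):
    mid = len(data) // 2
    return data[:mid], data[mid:]

data = list(range(100))
a, b = split(data)                   # first split
parts = [*split(a), *split(b)]       # second split: four base cases

with ThreadPoolExecutor(max_workers=4) as pool:
    subs = list(pool.map(sum, parts))        # base-case solves, one task each

merged = [subs[0] + subs[1], subs[2] + subs[3]]   # first merge level
total = merged[0] + merged[1]                     # final merge
print(total)   # 4950
```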
Recursive Data Strategy
• Involves an operation on a recursive data structure that appears to require sequential processing:
  • Lists
  • Trees
  • Graphs
• The recursive data structure is completely decomposed into individual elements.
• The top-level structure takes the form of a loop.
• All elements of the data structure are updated simultaneously (with synchronization).
• Example: partial sums of a linked list.
• Uses:
  • Widely used on SIMD platforms (HPF77)
  • Combinatorial optimization problems
  • Partial sums
  • List ranking
  • Euler tours and ear decomposition
  • Finding roots of trees in a forest of rooted directed trees
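The partial-sums example is typically solved by recursive doubling. In this sketch (the array representation stands in for the linked list, for brevity), every update within a round is independent of the others, so on a SIMD machine the whole round could execute simultaneously.

```python
def prefix_sums(xs):
    out = list(xs)
    step = 1
    while step < len(xs):
        # Every update in this round is independent of the others:
        # element i adds the value `step` positions to its left.
        out = [out[i] + (out[i - step] if i >= step else 0)
               for i in range(len(xs))]
        step *= 2          # log2(n) rounds in total
    return out

print(prefix_sums([1, 2, 3, 4]))   # [1, 3, 6, 10]
```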
Pipeline Strategy
• Involves performing a calculation on many sets of data, where the calculation can be viewed in terms of data flowing through a sequence of stages:
  • Instruction pipelines in modern CPUs
  • Vector processing (loop-level pipelining)
  • Algorithm-level pipelining
    • Signal processing
    • Graphics
    • Shell programs in Unix
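An algorithm-level pipeline can be sketched with the standard library (the stage contents are illustrative): each stage runs on its own thread and streams items downstream through a queue, so different data sets occupy different stages at the same time.

```python
import queue
import threading

q12, q23 = queue.Queue(), queue.Queue()
results = []

def stage1():                        # produce the data sets
    for x in range(5):
        q12.put(x)
    q12.put(None)                    # sentinel: end of stream

def stage2():                        # transform each item as it arrives
    while (x := q12.get()) is not None:
        q23.put(x * x)
    q23.put(None)

def stage3():                        # collect the final results
    while (x := q23.get()) is not None:
        results.append(x)

threads = [threading.Thread(target=s) for s in (stage1, stage2, stage3)]
for t in threads: t.start()
for t in threads: t.join()
print(results)   # [0, 1, 4, 9, 16]
```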
Event-Based Coordination Strategy
• The application is decomposed into groups of semi-independent tasks interacting in an irregular fashion.
• The interaction is determined by a flow of data between the groups, which implies ordering constraints between the tasks.
Final Notes
• Every parallel algorithm involves a collection of tasks that can execute concurrently; the key is finding those tasks (and collecting them).
• Data-based decomposition is a good fit if:
  • The most computationally intensive part of the problem is organized around the manipulation of a large data structure.
  • Similar operations are being applied independently to different parts of the data structure.
• However, the desired features of a concurrent/parallel program (efficiency, simplicity, portability and scalability) pull against each other:
  • Efficiency conflicts with portability.
  • Efficiency conflicts with simplicity.
• Thus a good algorithm design must strike a balance between abstraction and portability on the one hand, and suitability for a particular target architecture on the other.
Recommended Readings
• The Art of Concurrency: “A Thread Monkey’s Guide to Writing Parallel Applications”, by Clay Breshears (O’Reilly, 2009)
• Writing Concurrent Systems, Part 1, by David Chisnall (InformIT Author’s Blog: http://www.informit.com/articles/article.aspx?p=1626979)
• Patterns for Parallel Programming, by T. Mattson, B. Sanders and B. Massingill (Addison-Wesley, 2004). Web site: http://www.cise.ufl.edu/research/ParallelPatterns/
• Designing and Building Parallel Programs, by Ian Foster, http://www.mcs.anl.gov/~itf/dbpp/
Class: Delayed Work
• Revise Chapter 2 of Designing and Building Parallel Programs, by Ian Foster, http://www.mcs.anl.gov/~itf/dbpp/
• Solve exercises 1 and 2 in the Exercises section.
• Conceptually work out a solution for a real-world, highly complex problem to solve on campus.
• Read http://www.cs.wisc.edu/multifacet/papers/ieeecomputer08_amdahl_multicore.pdf