HPC & Big Data Convergence The Cutting Edge & Outlook Rashid Mehmood King Abdulaziz University @ IXPUG 2018, KAUST 13 March 2018
Data analytics and computing ecosystem compared
Rashid Mehmood HPC Big Data Convergence 2
HPC and Big Data
• HPC technologies are needed by Big Data to deal with the ever-increasing Vs of data in order to forecast and extract insights from existing and new domains, faster, and with greater accuracy.
• Increasingly more data is being produced by scientific experiments from areas such as bioscience, physics, and climate, and therefore HPC needs to adopt data-driven paradigms.
• Moreover, there are synergies between them with unimaginable potential for developing new computing paradigms, solving long-standing grand challenges, and making new explorations and discoveries.
Big Data Analytics Workloads
Technical Challenges for a Converged System
• Higher Energy Efficiency
– Is the most crucial challenge
– Sunway TaihuLight: 15.37 MW power consumption for 93.014 PFLOPS (Rmax), 125.435 PFLOPS (Rpeak)
– Tianhe-2: 17.81 MW for 33.86 PFLOPS (54.9 PFLOPS peak)
– Power and cooling technologies
– Circuit technologies
– Software
– We need exascale performance while keeping the power consumption at the existing levels
– Power-aware algorithms and software would be another trend
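As a worked illustration of the energy-efficiency gap, the figures quoted above can be converted into GFLOPS per watt. The sketch below uses only the Rmax and power numbers from this slide. The function name and the target of 1 EFLOPS within TaihuLight's power envelope are assumptions made for illustration.

```python
def gflops_per_watt(pflops: float, megawatts: float) -> float:
    """Energy efficiency in GFLOPS/W from sustained PFLOPS and power in MW."""
    flops = pflops * 1e15
    watts = megawatts * 1e6
    return flops / watts / 1e9


# Figures quoted on the slide (Rmax, power draw).
taihulight = gflops_per_watt(93.014, 15.37)
tianhe2 = gflops_per_watt(33.86, 17.81)

# Assumption for illustration: 1 EFLOPS (1000 PFLOPS) at TaihuLight's power draw.
exascale_needed = gflops_per_watt(1000.0, 15.37)

print(f"Sunway TaihuLight: {taihulight:.2f} GFLOPS/W")
print(f"Tianhe-2:          {tianhe2:.2f} GFLOPS/W")
print(f"Exascale at 15.37 MW needs {exascale_needed:.1f} GFLOPS/W "
      f"({exascale_needed / taihulight:.1f}x TaihuLight)")
```

Under these assumptions, an exascale machine at today's power level would need roughly an order of magnitude better efficiency than Sunway TaihuLight.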
Communications and networking
• Thousands of processes and billions of threads
• Fine-grained interaction between the processes is needed
• Advances in communication technologies have been slower compared to advances in processors
• High-bandwidth and low-latency technologies are needed
• Algorithms and software are required that localise and minimise communication
– Use additional computing rather than communication
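One way to see why minimising communication matters is the simple latency-bandwidth (alpha-beta) cost model, in which every message pays a fixed latency plus its size divided by the bandwidth. The sketch below is illustrative only: the link latency and bandwidth are hypothetical values, not measurements from any system named in this talk.

```python
def transfer_time(n_messages: int, bytes_per_message: int,
                  latency_s: float, bandwidth_bytes_per_s: float) -> float:
    """Alpha-beta cost model: each message pays latency + size / bandwidth."""
    return n_messages * (latency_s + bytes_per_message / bandwidth_bytes_per_s)


# Hypothetical link parameters, chosen only for illustration.
LATENCY = 1e-6      # 1 microsecond per message
BANDWIDTH = 10e9    # 10 GB/s

total_bytes = 8 * 1_000_000  # one million 8-byte values

# Same payload, sent as a million tiny messages or as one aggregated message.
fine_grained = transfer_time(1_000_000, 8, LATENCY, BANDWIDTH)
aggregated = transfer_time(1, total_bytes, LATENCY, BANDWIDTH)

print(f"fine-grained: {fine_grained:.4f} s")
print(f"aggregated:   {aggregated:.6f} s")
print(f"speed-up from aggregation: {fine_grained / aggregated:.0f}x")
```

With these assumed parameters the fine-grained exchange is dominated by latency, which is why restructuring an algorithm to send fewer, larger messages (or to recompute values locally) can pay off.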
Challenges (memory and data locality)
• Cost of moving data is higher than a floating point operation
– Energy cost of an add operation is around 0.9 nJ, L1 to register is 1 nJ, while moving data off-chip to register is around 100 nJ
– 28 to 40% of the total energy consumption is spent moving data
– 19 to 36% of it is wasted in stalled cycles
• Bandwidth
– The Xeon bandwidth is only up to 60 GB/s
– IBM's Power8 bandwidth is up to 192 GB/s
– NVIDIA's latest GPUs have over 280 GB/s
• Memory capacity, latency and bandwidth are critical for exascale computing
• New technologies such as stacked memory may help
• But memory per core is decreasing
• Design of algorithms requiring lower memory would be the trend
• Data locality would be among the most important goals
• Needs to be intrinsic in the programming models
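The energy figures quoted on this slide can be combined into a rough estimate of what poor data locality costs. The sketch below assumes, purely for illustration, that every add needs exactly one operand fetch and that there is no reuse. The function name and the operation count are hypothetical.

```python
# Energy figures quoted on the slide, in nanojoules.
ADD_NJ = 0.9         # one add operation
L1_TO_REG_NJ = 1.0   # move an operand from L1 cache to a register
OFF_CHIP_NJ = 100.0  # move an operand from off-chip memory to a register


def energy_nj(n_ops: int, fetch_nj: float) -> float:
    """Total energy if every add needs one operand fetched at the given cost.

    Simplifying assumption: one fetch per operation, no operand reuse.
    """
    return n_ops * (ADD_NJ + fetch_nj)


n = 1_000_000
from_l1 = energy_nj(n, L1_TO_REG_NJ)
from_off_chip = energy_nj(n, OFF_CHIP_NJ)

# How many adds cost as much energy as a single off-chip fetch.
adds_per_off_chip_fetch = OFF_CHIP_NJ / ADD_NJ

print(f"operands in L1:   {from_l1 / 1e6:.1f} mJ")
print(f"operands off-chip: {from_off_chip / 1e6:.1f} mJ")
print(f"ratio: {from_off_chip / from_l1:.1f}x")
print(f"one off-chip fetch ~ {adds_per_off_chip_fetch:.0f} adds")
```

Under this simplified model, recomputing a value is cheaper than fetching it from off-chip memory whenever the recomputation takes fewer than about a hundred adds.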
Challenges (Fault Tolerance)
• HPC software is developed with the assumption of a low fault occurrence probability
• Fault probability is typically proportional to the number of processes and interactions
• Billions of fine-grained processes in a computation will increase the likelihood of fault occurrence
• Big data systems are typically more fault-tolerant, partly due to machine virtualisation and failover technologies
– But not to the desired level
• This will push towards requiring more loosely-coupled, intelligent, somewhat failure-aware algorithms and software design
• Resilience needs to be intrinsic in programming paradigms
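To illustrate how the likelihood of a fault grows with the number of processes, the sketch below evaluates P(at least one fault) = 1 - (1 - p)^n. It assumes that faults are independent, and the per-process fault probability is a hypothetical value chosen only to show the trend.

```python
import math


def prob_any_fault(n_processes: int, p_fault: float) -> float:
    """P(at least one fault) = 1 - (1 - p)^n, assuming independent faults.

    Uses log1p/expm1 to stay numerically accurate when p is tiny.
    """
    return -math.expm1(n_processes * math.log1p(-p_fault))


P_FAULT = 1e-9  # hypothetical per-process fault probability for one run

for n in (10**3, 10**6, 10**9):
    p_any = prob_any_fault(n, P_FAULT)
    print(f"{n:>13,} processes -> P(any fault) = {p_any:.6f}")
```

With this assumed probability a fault is negligible for a thousand processes but more likely than not for a billion, which is the motivation for making resilience part of the programming paradigm.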
Challenges (4 Vs of Big Data)
• Volume, Velocity, Variety and Veracity
• Large volumes of data typically mean more memory, more computing, and more interaction
• Fast data requires faster I/O, memories, computational elements, workflows, applications and strategies for data management
• Big data may have many more files than the current file systems are able to deal with
• Restructuring and management of current scientific workflows is required to meet the current and future developments in HPC
• In situ data analysis would also need to be integrated where efficient
• Variety and veracity require intelligent self-aware methods
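A minimal sketch of what in situ analysis can look like: statistics are updated as each value is produced, so the raw data never has to be written out and read back. The example uses Welford's online algorithm for mean and variance. The class name and the sample values are hypothetical and stand in for the output of a running simulation.

```python
class RunningStats:
    """Online mean and population variance (Welford's algorithm).

    Each value is folded into the summary as it arrives and is not stored.
    """

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        return self._m2 / self.n if self.n else 0.0


stats = RunningStats()
# Stand-in for values emitted step by step by a simulation.
for value in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    stats.update(value)

print(f"n={stats.n} mean={stats.mean:.2f} variance={stats.variance:.2f}")
```

The memory footprint is constant regardless of how many values stream through, which is the property that makes this style of analysis attractive for high-velocity data.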
Challenges (programming models)
• MPI (Message Passing Interface) is relatively tightly coupled and a typical choice among the HPC community
• BSP (Bulk Synchronous Parallel) and MapReduce are typically used by Big Data users
• Big data environments are more expressive and productive but offer lower performance compared to HPC
• New programming models are needed to express the fine-grained parallelism among millions of cores and billions of threads
– Plus heterogeneity of systems
– Resilience
– Data locality
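To make the MapReduce model mentioned above concrete, here is a single-process sketch of its three phases (map, shuffle, reduce) applied to a word count. It is plain Python, not the Hadoop or Spark API, and all function names and the input lines are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input record."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine the values of one key into a single result."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    pairs = (pair for line in lines for pair in map_phase(line))
    groups = shuffle_phase(pairs)
    return dict(reduce_phase(k, v) for k, v in groups.items())


counts = word_count(["big data needs hpc", "hpc needs data locality"])
print(counts)
```

The programmer writes only the map and reduce functions. In a real framework the runtime distributes the records, performs the shuffle across nodes, and reruns failed tasks, which accounts for much of the productivity advantage and also for much of the performance overhead relative to hand-written MPI.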
Software and Hardware Gap
• The hardware has already changed at a much faster rate than the software
• Exascale-level parallelism would create even more heterogeneity and change
• The effort in developing software for a given architecture is huge
• Maintaining and adapting existing software for new architectures is daunting
• Even more challenging is adapting existing codes to fine-grained (cores) heterogeneous system environments
• Reformulation of scientific problems, algorithms and workflows is needed to move to exascale computing
– E.g. compute rather than fetch data if possible
Additional Technical Challenges
• Correctness of algorithms and software
• Efficiency of the scientific process to set up experiments and use HPC/BD resources
• Usability and impact are more important than benchmarks
• Application scientists are understandably more focussed on results and accuracy
– and less on energy, system and workflow/process efficiency
– Coordination to consider the bigger picture would improve efficiency, productivity and time to innovation
Current Convergence Efforts
Specific Solutions: Convergence Approach
• Mellanox UDA, RDMA-Hadoop, DataMPI, Hadoop-IPoIB, HMOR: HPC-oriented MapReduce solution
• myHadoop: Hadoop on-demand on traditional HPC resources
• LibHDFS: HPC application's interface to HDFS
• MPI, ad-hoc Hadoop, CloudBlast, Spark, HTCondor: Parallelization of many-task applications with different workflow systems
• DataMPI: Overlapping of map, shuffle and merge phases
• Virtualized Analytics Shipping (VAS), Spark on demand: MapReduce-like framework for in situ data analysis
• iRODS, MapReduce-MPI, Pilot-MapReduce, SRM, etc.: Solutions to deal with massive amounts of data in data-intensive applications
• Triple-H: Hybrid design to reduce the I/O bottleneck in HDFS
Design Patterns Based Converged Future
[Figure: layered architecture of a converged HPC and Big Data system, from users at the top to hardware at the bottom]
Users: Businesses, Customers, Scientists, Administrators
Visualization & Management Layer
• Distributed and parallel visualizations
• Live analysis, advanced searches, recommendations
• Interactive data exploration, rendering data visualizations, customized & user-friendly experience, real-time monitoring
Analytics/Processing Layer
• Analytics pattern for unstructured & structured data, processing abstraction pattern, high-velocity real-time processing, etc.
• Advanced algorithms, computations in parallel, structural patterns, computational patterns, parallel algorithmic strategy patterns, etc.
• Real-time analysis & batch analysis, resilience design patterns, energy efficiency, trade-off design patterns
Storage/Access Layer
• Distributed & parallel file systems, e.g. Lustre, HDFS, etc.; cognitive storage
• Unstructured data, e.g. HDFS, GFS, NoSQL (MongoDB); structured data, e.g. BigTable, HBase
• Data size reduction; high-volume hierarchical, linked, tabular & binary storage; real-time access
Data Types: Structured, Un-Structured, Semi-Structured
Data Sources: IoT, social media, scientific simulations, geographical data
Hardware: Commodity + State-of-the-Art
Acknowledgement
• Sardar Usman
• Furqan Alam
Be Part of our Journey towards Realizing Saudi Vision 2030…
Thank you…