“Clusters in Molecular Sciences Applications”, 2nd Annual iHPC Cluster Workshop, Ottawa, Jan 11, 2002. p. 1

Clusters in Molecular Sciences Applications

Serguei Patchkovskii@#, Rochus Schmid@, Tom Ziegler@, Siu Pang Chan#, Andrew McCormack#, Roger Rousseau#, Ian Skanes#

@ Department of Chemistry, University of Calgary, 2500 University Dr. NW, Calgary, Alberta, T2N 1N4 Canada
# Theory and Computation Group, SIMS, NRC, 100 Sussex Dr., Ottawa, Ontario, K1A 0R6

• Are clusters a lasting, efficient investment?
• Odysseus: an internal cluster at the SIMS theory group
• Clusters in molecular science applications: software availability and performance
• Three war stories, and a cautionary message
• Summary and conclusions

Shared, Academic Clusters in Canada

Location                CPUs        URL or other info
Carleton U.             8xPII-400   www.scs.carleton.ca/~gis/
U of Calgary            179xAlpha   www.maci-cluster.ucalgary.ca
U of Western Ontario    144xAlpha   GreatWhite.sharcnet.ca
U of Western Ontario    48xAlpha    DeepPurple.sharcnet.ca
McMaster U              106xAlpha   Idra.physics.mcmaster.ca
U of Guelph             120xAlpha   Hammerhead.uoguelph.ca
U of Windsor            8xAlpha
Wilfrid Laurier U       8xAlpha

Canadian top-500 facilities

[Figure not recoverable; surviving label: Cluster]

Internal, “workhorse” clusters

Location                              CPUs             URL or other
U of Alberta                          98xPIII-450      www.phys.ualberta.ca/THOR
U of Calgary                          94x21164-500     www.cobalt.chem.ucalgary.ca
U of Calgary                          120xPIII-1000    www.ucalgary.ca/~tieleman/elk.html
U of Calgary                          32xPIII
Memorial U                            32xPII-300       weland.esd.mun.ca
Samuel Lunenfeld Research Institute   224xPIII-450     Bioinfo.mshri.on.ca/yac/
Sherbrooke U                          64xPII-400
U of Saskatchewan                     12xAthlon-800    Sasquatch.usask.ca
Simon Fraser U                        16xPIII-500      www.sfu.ca/acs/cluster/
U of Victoria                         39xPIII-450      Pingu.phys.uvic.ca/muse/ (?)
McMaster U                            32xPIII-700      www.cim.mcgill.ca/~cvr/beowulf/
CERCA, Montreal                       16xAthlon-1200   www.cerca.umontreal.ca/~fourmano/
U of Western Ontario                  various          www.baldric.uwo.ca

Clusters are everywhere

Lemma 1: A computationally-intensive research group in Canada can be in one of three states:
a) It owns a cluster, or
b) It is building a cluster, or
c) It plans to build a cluster RSN

Clusters have become a mainstream research tool: useful, but not automatically worthy of a separate mention.

Cobalt: Nodes and Network

Digital/Compaq Personal Workstation 500au

CPU          Alpha 21164A, 500 MHz
Cache        96Kb on-chip (L1 and L2)
Peak flops   10^9 Flop/second
SpecInt 95   15.7 (estimate)
SpecFP 95    19.5 (estimate)
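
The peak figure in the table is consistent with the usual back-of-the-envelope estimate: clock rate times floating-point operations issued per cycle. The sketch below assumes two floating-point operations per cycle (one add plus one multiply) for the 21164A; that per-cycle figure is an assumption and is not stated on the slide.

```python
def peak_flops(clock_hz: float, flops_per_cycle: int) -> float:
    """Theoretical peak: every floating-point pipeline busy on every cycle."""
    return clock_hz * flops_per_cycle


CLOCK_HZ = 500e6       # 500 MHz, from the slide
FLOPS_PER_CYCLE = 2    # assumption: one add + one multiply per cycle

if __name__ == "__main__":
    print(f"Peak: {peak_flops(CLOCK_HZ, FLOPS_PER_CYCLE):.1e} Flop/second")
```

Real codes reach only a fraction of this number; the SpecFP estimate on the slide is the more practical guide.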

Cobalt: Software

OS, communications, and cluster management:
Base OS: Tru64, using DMS, NIS, and NFS

Cobalt: Return on the Investment

Investment: Dollars
Total cost              390,800
… including:
Initial purchase        346,000
Operating (’98-’01):
  power (6¢/kWh)         15,800
  admin (20% PDF)        24,000
  spare parts             5,000

Payback: Research Articles
Total publications      92
… including:
Organometallics         21
J. Am. Chem. Soc.       12
J. Phys. Chem.          11
J. Chem. Phys.          10
Inorg. Chem.             6
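
The two columns can be combined into a single figure of merit, cost per research article. The sketch below uses only the numbers on the slide; the function and variable names are illustrative.

```python
# Cost items in dollars, as listed on the slide.
COSTS = {
    "initial purchase": 346_000,
    "power": 15_800,
    "admin": 24_000,
    "spare parts": 5_000,
}
PUBLICATIONS = 92


def total_cost(costs: dict) -> int:
    """Sum of all listed cost items."""
    return sum(costs.values())


def cost_per_publication(costs: dict, publications: int) -> float:
    """Total cost divided by the number of research articles."""
    return total_cost(costs) / publications


if __name__ == "__main__":
    print(f"Total cost: {total_cost(COSTS):,} dollars")
    print(f"Per article: {cost_per_publication(COSTS, PUBLICATIONS):,.0f} dollars")
```

The listed items add up to the stated total of 390,800, which works out to roughly 4,250 dollars per article.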

Odysseus: Low-tech solution for high-tech problems

… plus, on the front end:
Intel PRO/1000
Adaptec AHA-2940UW
60Gb 7200rpm IDE

Backup unit (tape+robot)                         5,860
Spare parts in stock                             5,024
Ethernet (switch, cables, and head node link)    4,190
Compiler (PGI)                                   3,780
UPS                                              2,265
Backup tapes (16+1)                              1,911
Total:                                          90,441

Clusters in molecular science – software availability

Software: ADF

ADF – Amsterdam Density Functional (www.scm.com)

Example: Cr(N)Porph
Full geometry optimization
38 atoms
580 basis functions
C4v symmetry
45 Mbytes of memory
Serial time: 683 minutes

[Figure: speedup vs. number of Cobalt nodes; ideal and observed curves]
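
The speedup plotted against node count is the serial wall-clock time divided by the parallel wall-clock time; dividing again by the node count gives the parallel efficiency. The sketch below uses the 683-minute serial time from the slide. The parallel timing in the usage example is a made-up placeholder, since the measured points of the plot did not survive.

```python
SERIAL_MINUTES = 683.0  # from the slide


def speedup(serial_time: float, parallel_time: float) -> float:
    """Observed speedup: how many times faster the parallel run finishes."""
    return serial_time / parallel_time


def efficiency(serial_time: float, parallel_time: float, nodes: int) -> float:
    """Fraction of the ideal (linear) speedup that is actually achieved."""
    return speedup(serial_time, parallel_time) / nodes


if __name__ == "__main__":
    # Hypothetical timing for illustration only, NOT a measured value.
    nodes, parallel_minutes = 16, 60.0
    print(f"speedup    = {speedup(SERIAL_MINUTES, parallel_minutes):.2f}")
    print(f"efficiency = {efficiency(SERIAL_MINUTES, parallel_minutes, nodes):.2f}")
```

The "ideal" curve corresponds to an efficiency of 1.0 at every node count.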

Serial time per step: 83 seconds
Memory: 231 Mbytes

Software: AMBER

AMBER – “Assisted Model Building with Energy Refinement” (www.amber.ucsf.edu/amber/)

Software: PWSCF

PWSCF and PHONON – Plane wave pseudopotential codes, optimized for phonon spectra calculations (www.pwscf.org/)

Example: MgB2 solid
Geometry opt.
40 Ryd cut-off
60 K-points

[Figure not recoverable; surviving label: odysseus]

War Story #1

Odysseus hardware maintenance log, Oct 19, 2001:
Overnight, node 6 had a kernel OOPS … it responds to network pings and keyboard, but no new processes can be started …

Reason: Heat sink on CPU#1 became loose, resulting in overheating under heavy load.
Resolution: Reinstall the heat sink
Detected by: Elevated temperature readings for the CPU#1 (lm_sensors)
Downtime: 20 minutes (the affected node)
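
A check of the kind that caught this fault can be automated. The sketch below is not the procedure used on Odysseus. It assumes a current Linux kernel that exposes sensor readings as `temp*_input` files (in millidegrees Celsius) under `/sys/class/hwmon`, and the 70 °C limit is a hypothetical value that would have to be tuned per CPU.

```python
from pathlib import Path


def read_temperatures(hwmon_root: str = "/sys/class/hwmon") -> dict:
    """Return {sensor file: degrees Celsius} for every readable hwmon sensor."""
    readings = {}
    for path in sorted(Path(hwmon_root).glob("hwmon*/temp*_input")):
        try:
            # The kernel reports millidegrees Celsius as a plain integer.
            readings[str(path)] = int(path.read_text().strip()) / 1000.0
        except (OSError, ValueError):
            continue  # sensor vanished or returned garbage; skip it
    return readings


def overheated(readings: dict, limit_c: float) -> dict:
    """Keep only the sensors whose reading exceeds the limit."""
    return {name: t for name, t in readings.items() if t > limit_c}


if __name__ == "__main__":
    LIMIT_C = 70.0  # hypothetical threshold
    for name, temp in overheated(read_temperatures(), LIMIT_C).items():
        print(f"WARNING: {name} reads {temp:.1f} C")
```

Run periodically on every node, a script like this turns a loose heat sink into a warning rather than an overnight kernel OOPS.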

War Story #2

Odysseus hardware maintenance log, Nov 12, 2001:
A large, 16-CPU VASP job fails with “LAPACK: Routine ZPOTRF failed”, or random total energy

Reason: DIMM in bank #0 on node 17 developed a single-bit failure at the address 0xfd9f0c
Resolution: Replace memory module in bank #0
Detected by: Rerunning failing job with different sets of nodes, followed by the memory diagnostic on the affected node (memtest32)
Downtime: 1 day (the whole cluster) + 2 days (the affected node)
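
Rerunning a failing job on different node sets amounts to a search over the node list. The sketch below shows one way to organize it as a bisection. It assumes exactly one faulty node and a job that fails whenever that node is included. The `job_passes` callback and the 32-node list are hypothetical stand-ins; the slides do not say how the node sets were actually chosen.

```python
from typing import Callable, Sequence


def find_bad_node(nodes: Sequence[int],
                  job_passes: Callable[[Sequence[int]], bool]) -> int:
    """Bisect the node list until a single suspect remains.

    Assumes exactly one faulty node, and that the job fails if and only if
    that node is part of the set it runs on.
    """
    candidates = list(nodes)
    while len(candidates) > 1:
        half = candidates[: len(candidates) // 2]
        if job_passes(half):
            # First half is clean, so the fault is in the remainder.
            candidates = candidates[len(candidates) // 2:]
        else:
            candidates = half
    return candidates[0]


if __name__ == "__main__":
    # Stand-in for resubmitting the real job: node 17 is the faulty one.
    def simulated_job(node_set: Sequence[int]) -> bool:
        return 17 not in node_set

    print("suspect node:", find_bad_node(range(1, 33), simulated_job))
```

With a single-bit memory fault the failure may be intermittent, so each test set would in practice need several reruns before it is declared clean.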

War Story #3

Odysseus hardware maintenance log, Dec 10, 2001:
Apparently random application failures are observed

Reason: Multiple single-bit memory failures, on the nodes (bank #): 6 (#2), 7 (#2,#3), 8 (#0), 10 (#0), 11 (#0)
Resolution: Replace memory modules
Detected by: Cluster-wide memory diagnostic (memtest32)
Downtime: 3 days (the whole cluster)

Cautionary Note

• Using inexpensive, consumer-grade hardware potentially exposes you to low-quality components
• Never use components which have no built-in hardware monitoring and error detection capability
• Always configure your clusters to report corrected errors and out-of-range hardware sensor readings
• Act on the early warnings
• Otherwise, you run the risk of producing garbage science, and never knowing it
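
Reporting corrected errors can be as simple as polling a counter. The sketch below assumes ECC memory and a current Linux kernel with an EDAC driver loaded, which exposes per-controller corrected-error counts as `ce_count` files under `/sys/devices/system/edac/mc`. This is an illustration, not the tooling described in the talk.

```python
from pathlib import Path


def corrected_error_counts(edac_root: str = "/sys/devices/system/edac/mc") -> dict:
    """Return {memory controller name: corrected-error count}."""
    counts = {}
    for path in sorted(Path(edac_root).glob("mc*/ce_count")):
        try:
            counts[path.parent.name] = int(path.read_text().strip())
        except (OSError, ValueError):
            continue  # unreadable counter; skip it
    return counts


def controllers_with_errors(counts: dict) -> list:
    """Names of controllers that have logged at least one corrected error."""
    return sorted(name for name, n in counts.items() if n > 0)


if __name__ == "__main__":
    counts = corrected_error_counts()
    for name in controllers_with_errors(counts):
        print(f"WARNING: {name} logged {counts[name]} corrected memory errors")
```

A non-zero count means the hardware is still masking the fault, which is exactly the early warning worth acting on.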

Hardware Monitoring with Linux

Category      Parameter                                        Package
Motherboard   Temperature; Power supply voltage; Fan status    lm_sensors#

Summary and Conclusions

• Clusters are no longer a techno-geek’s toy, and will remain the primary workhorse of many research groups, at least for a while
• Clusters give an impressive return on the investment, and may remain useful longer than expected
• Many (most?) useful research codes in molecular sciences are readily available on clusters
• Configuring and operating PC clusters can be tricky. Consider a reputable system integrator with Beowulf hardware and software experience