
Compute Canada — Calcul Canada

A proposal to the Canada Foundation for Innovation – National Platforms Fund

Hugh Couchman (McMaster University, SHARCNET)
Robert Deupree (Saint Mary’s University, ACEnet)
Ken Edgecombe (Queen’s University, HPCVL)
Wagdi Habashi (McGill University, CLUMEQ)
Richard Peltier (University of Toronto, SciNet)
Jonathan Schaeffer (University of Alberta, WestGrid)
David Sénéchal (Université de Sherbrooke, RQCHP)

Executive Summary

The Compute/Calcul Canada (CC) initiative unites the academic high-performance computing (HPC) organizations in Canada. The seven regional HPC consortia in Canada — ACEnet, CLUMEQ, RQCHP, HPCVL, SciNet, SHARCNET and WestGrid — represent over 50 institutions and over one thousand university faculty members doing computationally-based research. The Compute Canada initiative is a coherent and comprehensive proposal to build a shared distributed HPC infrastructure across Canada to best meet the needs of the research community and enable leading-edge, world-competitive research. This proposal is requesting an investment of 60 M$ from CFI (150 M$ with matching money) to put the necessary infrastructure in place for four of the consortia for the 2007-2010 period. It is also requesting operating funds from Canada’s research councils for all seven consortia. Compute Canada has developed a consensus on national governance, resource planning, and resource sharing models, allowing for effective usage and management of the proposed facilities. Compute Canada represents a major step forward in moving from a regional to a national HPC collaboration. Our vision is the result of extensive consultations with the Canadian research community.


1 Introduction

High Performance Computing (HPC) is transforming research in Canadian universities and industry. Computer simulations and models now supplement or even supplant traditional field or laboratory experiments in many disciplines. Massive data-sets from large-scale field experiments are being manipulated, stored and shared. Numerical laboratories open up otherwise inaccessible realms and enable insights that were inconceivable a few years ago.

Research worldwide has seen a dramatic increase in the demand for HPC in the traditional areas of science and engineering, as well as in medicine and the social sciences and humanities. In 1999, Canada occupied an inconsequential position in HPC-based research, but that year saw the first funding of HPC by the Canada Foundation for Innovation. The subsequent combination of federal, provincial and industrial funding is enabling Canada to develop a strong foundation in HPC research, train highly qualified personnel and attract international experts.

1.1 Background

In 1995, thirty Canadian researchers met in Ottawa to discuss the inadequate computing facilities available in Canada for academic researchers. The action plan arising from this meeting eventually led to the creation of C3.ca (www.c3.ca) in 1997, a national organization for advocating research in high-performance computing. C3.ca represents the imagination, good will and shared vision of more than 50 institutions and thousands of researchers, post-doctoral fellows, graduate students, and support personnel. C3.ca’s vision is to create “a Canadian fabric of interwoven technologies, applications and skills based on advanced computation and communication systems applied to national needs and opportunities for research innovation in the sciences, engineering and the arts.” This vision is still relevant today.

The creation of C3.ca fortuitously aligned with a new government of Canada initiative: the Canada Foundation for Innovation (CFI). Since the first CFI competition in 1998 (with results announced in 1999) and subsequent announcements in 2000, 2002 and 2004, Canadian researchers have actively pursued building competitive HPC infrastructure across the country. This has been facilitated by the creation of seven regional consortia of research universities with the mandate to apply for, acquire and operate HPC facilities that would be shared among researchers in their respective consortia. The consortia details are given in Table 1.1, while Table 1.2 lists the major research organizations that have partnered with the consortia. The number of faculty members using HPC has increased from a few hundred in 2000 to over one thousand in 2006 (and that number is growing).

Maintaining the HPC infrastructure required multiple CFI applications to be approved each funding cycle. This led to the desire for a more stable and coordinated source of funding. In response, a two-year-long effort culminated in the October 2005 publication of the C3.ca Long-Range Plan (LRP) for high performance computing in Canada (Engines of Discovery: The 21st Century Revolution¹), jointly funded by C3.ca, the National Research Council, CFI, NSERC, CIHR, SSHRC and CANARIE. CFI created the National Platforms Fund (NPF) program, in part as a response to the Long Range Plan. It recognized the large funds invested by CFI in shared consortia-based HPC infrastructure, as well as the significant investments made in research-group-specific, non-shared facilities.

1.2 Vision

Our vision is for a national collaboration to acquire and support world-class, fully-shared HPC infrastructure across the country, creating an environment that fosters and enables new research insights and advances.

1 The plan is available at www.c3.ca/LRP


ACEnet (www.ace-net.ca)
  Provinces: Newfoundland, Nova Scotia, New Brunswick, Prince Edward Island
  Members: Dalhousie U., Memorial U., Mount Allison U., St. Francis Xavier U., St. Mary’s U., U. of New Brunswick, U. of Prince Edward Island. Soon to join: Acadia U., Cape Breton U.

CLUMEQ (www.clumeq.mcgill.ca)
  Provinces: Quebec
  Members: McGill U., U. Laval, UQAM, and all other branches and institutes of l’Université du Québec: UQAC, UQTR, UQAR, UQO, UQAT, ETS, ENAP and INRS

RQCHP (www.rqchp.qc.ca)
  Provinces: Quebec
  Members: Bishop’s U., Concordia U., École Polytechnique, U. de Montréal, U. de Sherbrooke

HPCVL (www.hpcvl.org)
  Provinces: Ontario
  Members: Carleton U., Loyalist College, Queen’s U., Royal Military College, Ryerson U., Seneca College, U. of Ottawa

SciNet (www.scinet.utoronto.ca)
  Provinces: Ontario
  Members: U. of Toronto

SHARCNET (www.sharcnet.ca)
  Provinces: Ontario
  Members: Brock U., Fanshawe College, U. of Guelph, Lakehead U., Laurentian U., Sir Wilfrid Laurier U., McMaster U., Ontario College of Art and Design, U. of Ontario Institute of Technology, Sheridan College, Trent U., U. of Waterloo, U. of Western Ontario, U. of Windsor, York U.

WestGrid (www.westgrid.ca)
  Provinces: Alberta, British Columbia, Manitoba, Saskatchewan
  Members: Athabasca U., Brandon U., Simon Fraser U., U. of Alberta, U. of British Columbia, U. of Calgary, U. of Lethbridge, U. of Manitoba, U. of Northern British Columbia, U. of Regina, U. of Saskatchewan, U. of Victoria, U. of Winnipeg

Table 1.1. Academic consortia membership (research hospitals affiliated with many institutions above are full partners in their respective consortia).

National Institute for Nanotechnology; Sudbury Neutrino Observatory; Canada Light Source; Perimeter Institute for Theoretical Physics; TRIUMF; Fields Institute for Mathematical Sciences; NRC Herzberg Institute for Astrophysics; Robarts Research Institute; Ouranos; Centre de recherches mathématiques; Natural Resources Canada; Banff Centre; Institut de recherche d’Hydro-Québec

Table 1.2. Major research partners.

This proposal responds fully to CFI’s integrated strategy for HPC investments. Our proposal calls for a strengthening of the Canadian HPC collaboration and a metamorphosis of C3.ca into Compute Canada (Calcul Canada in French), reflecting the priority investment recommended in the LRP. Compute Canada will serve the HPC needs of Canadian university researchers, regardless of their affiliation. It represents a major (and evolutionary) leap forward in two significant ways. First, instead of thinking regionally we are thinking nationally; all seven consortia are full partners in this proposal to create a national initiative.


Second, CFI funds have historically been targeted towards meeting the computational needs of individual consortia (with a CFI requirement for 20% sharing with the rest of Canada). All the consortia are now working together to build a world-class, fully shared national infrastructure for computationally-based research in Canada.

We anticipate that the HPC community and this proposal will continue to evolve to meet the needs of Canadian research in the future. Thus, the partners will continue working with the research community to refine and improve the vision.

1.3 Application Process

This proposal is based on an extensive consultation with the Canadian academic research community. The formal process began in October 2005 with a CFI-sponsored workshop. However, the consultations began much earlier, in preparation for the next CFI application cycle. The proposal has been co-authored by the National Initiatives Committee (NIC), consisting of one representative from each consortium. High-level decisions were approved by the National Steering Committee (NSC), consisting of a Vice President (Research) from each consortium. Consultation with the research community included surveys, submissions, and interviews. Every attempt was made to engage as much of the research community as possible.

Although this is a national proposal, the requested infrastructure is targeted to giving four of the consortia a much-needed technology refresh. CLUMEQ was last funded by CFI in 2000, while RQCHP, SciNet and WestGrid were last funded in 2002. These four consortia represent 61% of the faculty members and 69% of the research funding in Canada.2 CLUMEQ, SciNet, and WestGrid have exhausted their funding; RQCHP will complete its acquisitions in 2006. SHARCNET, HPCVL and ACEnet were funded in 2004 and are not yet finished deploying all of their infrastructure.

1.4 Outline

This document describes this vision and, with CFI’s help, plans for its realization. This proposal includes the following:
• an outline of the history of HPC efforts and investments in Canada (Section 2),
• a discussion of past successes — the impact of HPC on Canadian research and development (Section 3), and the potential for HPC to drive innovation (Case Studies),
• a national HPC vision and strategy for HPC acquisition, coordination, management, and sustainability of the infrastructure (Section 4),
• plans for the efficient and effective operation and support of the infrastructure, so as to maximize the benefits to researchers (Section 5), and
• a detailed budget and justification for the proposed HPC acquisitions (Budget Justification).

A glossary is provided as an appendix to assist with the numerous acronyms used in this document. Finally, the list of CFI evaluation criteria for this program is provided on the last page. References to these criteria appear between brackets in the margins where appropriate, as a guide to the evaluation of this proposal.

2 Impact of Past Investments in HPC

The foundations of today’s HPC resources have been built from strategic past investments in both computing infrastructure and support personnel. The result, seven regional consortia with an agreement to share, is extremely efficient and is strongly endorsed by the recent Long Range Plan for HPC in Canada. The LRP’s executive summary cites Professor Martyn Guest of the UK’s Central Laboratory of the Research Councils (CLRC) at the Daresbury Centre:

“Canada has invested wisely in mid-range computing over the last five years and has created the best developed, most broadly accessible mid-range High Performance Computing facilities in the world.”

2 Based on 2003-2004 data, the most recent that were available at the time of this writing.


2.1 Facilities and User Base [ 1a ]

Canada’s HPC facilities support thousands of researchers, graduate students and research associates (see discussion below), attract bright young academics and students to their universities, and they create a skill development environment critical to the research capability of the HPC user base. They enable universities to offer new, often interdisciplinary and/or multidisciplinary, programs in computational science, engineering, medicine, and the arts. These include computational health and tele-robotics, computational biology and drug design, computational nanotechnology, computational fluid dynamics, aerodynamics and combustion, computational forecasts for weather and the environment, computational oil and gas reservoir modeling, and disaster remediation.

Figure 2.1A shows the CFI investments made in the consortia from 1999 to 2006 – 108 M$. Since CFI funds are at most 40% of the cost, the investment has been leveraged to acquire at least 270 M$ of infrastructure. CFI has also invested in large non-shared computer facilities for specialized research that are not part of any consortium. Calculating this investment is difficult, but our best lower-bound estimate is 40 M$ (100 M$ including leverage). Figure 2.1B shows that this investment is benefitting a large and growing user community.
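
These leverage figures follow directly from the CFI funding rule (a simple consistency check, not additional data): with CFI covering at most 40% of a project’s cost, the total infrastructure value is at least the CFI amount divided by 0.4, i.e. 108 M$ / 0.40 = 270 M$ for the consortia and 40 M$ / 0.40 = 100 M$ for the non-shared facilities.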

Figure 2.1: (A) Left: CFI investments in the consortia, in millions of CDN$, by award date (June 1999, July 2000, Jan. 2002, March 2004), broken down by consortium. (B) Right: growth of user accounts at each consortium by year of operation, 2001-2005.

Table 2.1 shows the CFI infrastructure of the four major consortia being funded in this proposal. The table reflects the fact that there has been no new funding grant to CLUMEQ since 2000, and to RQCHP, SciNet, and WestGrid since 2002. All consortia have stretched their dollars out over many years to minimize the gap between the time the funding is exhausted and the next opportunity for a new CFI application. CLUMEQ, SciNet, and WestGrid are out of funds; RQCHP will finish spending in 2006.

The success of this consolidation of university-based resources into well-managed HPC consortia is reflected in the growth of the user base. Figure 2.1B shows the total number of user accounts at the consortia. Counting accounts is easy, but identifying “real” users is a difficult task. The existence of a user account may not translate into a “real” HPC user. Even the definition of a user is not obvious, since some researchers are actively involved in HPC-related work but do not ever need to log in to an HPC facility (e.g., people who design parallel algorithms, versus those that implement them). Further, the philosophy of shared HPC access complicates the issue, since a user may have multiple accounts – one for each consortium. One way of identifying real users is by their usage. In 2005, across Canada there were 1,854 user accounts that utilized at least 10 CPU hours of consortia resources. Of these, 455 used more than 10,000 CPU hours and 96 used over 100,000 CPU hours in the past year (despite, for example, there being few facilities at ACEnet and old facilities at CLUMEQ and SciNet). These numbers are conservative lower bounds of usage, since they only account for one dimension of HPC, ignoring, for example, memory and disk needs.


Consortium   Affiliation         Install   Vendor and Architecture      CPUs    Peak (Gflops)   RAM (GB)
CLUMEQ       McGill              2002      AMD (capability)              256          819          384
SciNet       CITA                2002      Self-assembled (capacity)     538        2,500          264
SciNet       High Energy         2002      IBM (capacity)                448        2,100          448
SciNet       Planetary Physics   2002      NEC (vector)                   16          128          128
WestGrid     U. Victoria         2002      IBM (capability)              364          910          728
RQCHP        U. de Montréal      2003      SGI (SMP)                     128          768          512
WestGrid     U. of Alberta       2003      SGI (SMP)                     256          358          256
WestGrid     U. of Calgary       2003      HP (capability)               128          256          128
WestGrid     U. of BC            2003      IBM (capacity)              1,008        6,100        1,008
RQCHP        Sherbrooke          2004      Dell (capacity)               872        5,580        1,744
SciNet       Aerospace           2004      HP (capability)               140          840          360
RQCHP        Sherbrooke          2005      Dell (capability)           1,152        8,294        4,608
WestGrid     U. of Alberta       2005      IBM (SMP)                     132          800          520
WestGrid     U. of Calgary       2005      HP (capability)               260        1,040          520
WestGrid     U. of BC            2005      IBM (capacity)                672        4,184          832

Table 2.1. Recent acquisitions by CLUMEQ, RQCHP, SciNet, and WestGrid (minimum of 128 CPUs, except for the vector architecture). For an explanation of architecture types, see p. 26.

The community size is much larger than these numbers portray; for example, an active graduate student account may reflect the work of a team including a professor, postdoctoral fellow, and other students.

All seven consortia have experienced enormous growth in their HPC user communities over the past five years (Figure 2.1B), and this growth is expected to continue. All facilities across the country are at full capacity, often with long waiting lines for access. This state of affairs is an underestimation of the future reality. CFI’s new policy is to not fund requests for computing infrastructure outside the NPF program if the computing needs can be met by NPF funds. Applicants requesting non-NPF resources from CFI will need to make a compelling case in their application. In the past, CFI has funded dozens of clusters dedicated to specific research projects. Most of these users will not be able to get their equipment refreshed by CFI, and they will turn to the shared consortia resources to meet their future needs. At this point in time, it is hard to estimate the impact of this policy. Conservatively, we expect to see a growth of more than 100 additional new users (for a community growth of 400) in the next couple of years as a consequence of this new CFI policy. In addition, universities are increasing the number of computationally-based researchers that they are hiring, further increasing the expected size of the user community. The investments in HPC need to increase to accommodate our growing user community.

2.2 Competitive Advantage Nationally and Globally [ 1b ]

The nature of research today has demonstrated a compelling need for comprehensive, collaborative, and advanced computing resources. These tools provide the capability of solving large-scale scientific problems that were not even imaginable 10 years ago. Today, Canadian consortia are armed with computing resources that enable their member institutions and researchers to be competitive both nationally and internationally. As a result, this has attracted and retained excellent researchers, helped reverse the so-called “brain drain” (e.g., the world-class SciNet climate modelling group attracted two researchers from outside of Canada), strengthened partnerships among institutions (this proposal being a prime example), and contributed to the country in all sectors of society (see Section 3 and the Case Studies).


Figure 2.2: Canadian HPC systems on the Top 500 list (ranking versus list edition, Nov. 1993 to Nov. 2005).

The CFI investment in HPC has also resulted in several academic sites appearing in the Top 500 list (www.top500.org). The Top 500 list has been used since 1993 to provide a gauge of the performance capabilities of systems from around the world. The list is issued twice a year (June and November) and in recent years has seen huge changes, with the performance of the minimum entry position (#500 on the list) increasing at a rate of roughly 40% every six months. Canada has never placed a large number of systems on the Top 500 list. The current list has six Canadian systems, three of which are academic systems, one in a government research organization, and two in the industry sector. For comparison, the United States has placed 33 in the academic sector, 82 in research organizations, and 156 in the industry sector. Over the past five years, systems from the different consortia have (often briefly) entered the list. Currently, two systems from RQCHP and one from WestGrid are on the list. Figure 2.2 shows the history of Canadian academic systems on the list. Of note is how quickly they disappear from the list, illustrating how important stable and continued funding is to being competitive in HPC-supported disciplines and sectors.
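
To put the 40% semi-annual growth of the entry threshold in perspective (a rough back-of-the-envelope calculation, not data taken from the list itself): the performance needed to stay on the list grows by a factor of about 1.4 × 1.4 ≈ 2 per year, or 1.4^10 ≈ 29 over five years. A system acquired once and never refreshed therefore falls off the list within a few years, which is the pattern visible in Figure 2.2 and the reason stable, recurring funding matters.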

Computationally-based research has become increasingly competitive. Access to more parallel processing capacity, faster machines, larger memories, and bigger data repositories allows one to address leading-edge, so-called “grand challenge” problems. Internationally, it is a race, and those with access to the best resources often win. There are no prizes or patents awarded for being second to solve a research problem. Pre-CFI, most Canadian research was not competitive in these areas because of a lack of access to competitive resources. The advent of CFI has changed this, but Canada is still lagging behind the international efforts. For example, numerous countries have facilities beyond anything in Canada, including Japan (Top 500 list entries #7, #12, #21, #38, #39), Spain (#8), The Netherlands (#9), Switzerland (#13, #31), South Korea (#16), China (#26, #42), United Kingdom (#33, #34, #46), Australia (#36), and Germany (#37). The United States has the most powerful HPC resource in the world – roughly 45 times more powerful than Canada’s highest entry in the Top 500 list. Spain’s top entry is roughly five times more powerful than the best such facility in Canada — a massive computational edge for Canada to overcome if it wants its researchers to be internationally competitive.

2.3 Attracting and Retaining Excellent Researchers [ 1c ]

One of the reasons for Canada’s HPC research success is our ability to attract and retain world-class researchers. The combination of federal, provincial and industrial funding has enabled Canada not only to encourage Canadians to return home and international experts to relocate here, but also to keep the country’s best minds living and researching here.


Since 1999, the investment in computational infrastructure by CFI and its provincial, industrial and university partners has rejuvenated computational research across the country and stopped the drain of faculty, graduate students and skilled technology personnel in HPC-related areas of study. The number of researchers working on HPC-related research projects has increased from a few hundred in 2000 to roughly 6,000 today. All institutions in Canada use the accessibility of HPC facilities as an important recruiting tool.

There are 163 Canada Research Chairs (CRC) who are currently benefitting from the consortia HPC resources. User surveys showed an additional 13 CRCs who planned to use the facilities in the near future.

Some consortia have access to funds to accelerate the recruitment process and to strengthen their retention efforts. For example, SHARCNET has fostered and grown its research community through its Chairs and Fellowships program, specifically in the key research areas of computational materials, computational finance, bio-computing, and HPC tools. Its Chairs program has secured 13 world-class researchers into tenure-track faculty positions at its partner institutions. Of this number, six were recruited from outside Canada, one was from industry, one was from another province, and five were used for retention. SHARCNET has also awarded Fellowships to 123 provincial researchers, including 24 PDFs, 10 international visiting fellows, 43 graduate fellowships, and 40 undergraduate fellowships. HPCVL has a similar program and has awarded 52 HPCVL-Sun scholarships. Both of these initiatives were partly funded through a Province of Ontario program.

HPC infrastructure has acted as a magnet for attracting HQP (highly qualified people), including postdoctoral fellows, graduate students, and programmer/analysts. As one data point, in the WestGrid survey of its user community (November 2005), 72 postdoctoral fellows and 101 graduate students indicated that HPC access was a major factor in deciding which university to attend. Of these, one third came from outside Canada. The survey also provided some illustrative quotes on the importance of HPC as an attraction and retention tool: “Our involvement with WestGrid has without a doubt been instrumental in attracting programmer-analysts wanting to acquire high-performance computing skills as well as senior research assistants.” (Gordon Broderick, University of Alberta, Project CyberCell). “Without (WestGrid) facilities, I would not be able to conduct the research we are currently doing and I would not be able to attract the people I have.” (Kenneth Vos, University of Lethbridge). “While WestGrid wasn’t the main reason for their choice to come here, it did play an important role in their decisions to come, and more so in their decisions to stay.” (David Wishart, University of Alberta).

2.4 Enhanced Training of Highly Qualified Personnel [ 1d ]

Maintaining a leadership position in the global research community means enhancing not only the infrastructure in place, but also the skills and training of the personnel who use, operate, and enhance the HPC infrastructure. The potential for training highly qualified personnel is immense. There are over 1,000 investigators and many more graduate students and post-doctoral fellows with access to Canada’s HPC facilities at any one time. The shared facilities create an environment for skill development that is critical to Canada’s ongoing research capability.

Support teams made up of these personnel are essential in helping to minimize the challenging startup period for researchers learning to work with HPC resources. Training sessions, “How-To” series, and skill development workshops connect researchers with highly trained technical support staff whose knowledge, experience, and guidance result in more effective use of the HPC infrastructure (there were over 2,000 registrants in consortia-sponsored HPC courses offered in 2005). This skill set is also imparted to students and postdoctoral fellows, giving them both the scientific knowledge and the programming experience necessary to create new computational methods and applications in their various fields, eventually leading to dramatic new insights.


Interactions of HPC support staff with graduate students and PDFs will provide a fertile training ground to develop the next generation of researchers.

The highly collaborative environment that has emerged from HPC research in Canada has produced a web of HPC facilities and technical analysts that serves as an effective and pivotal support network. For example, the NSERC-funded Technical Analysts Support Program (TASP – see Sect. 5.6 below) for HPC assists in the training of highly qualified personnel through research, including programmer analysts, visualization experts, and network engineers. These individuals have the opportunity to be trained on a variety of systems and to develop computational and visualization skills on a wide spectrum of HPC resources.

The TASP program is ideal for joint industrial/academic research projects and allows students to develop contacts across the country. By sharing solutions across scientific fields, they have direct and frequent access to professionals in other fields. Also, the development of highly qualified personnel and the subsequent movement of these people among organizations and sectors constitute the most effective form of technology transfer.

In addition, through the use of Access Grid,3 technical and research presentations are being broadcast across the country (to roughly a dozen sites). The current proposal will expand the usage of Access Grid by ensuring that all participating institutions have an appropriate Access Grid meeting room (see p. 30 below).

2.5 Strengthened Partnerships Among Institutions [ 1e ]

Another benefit of past HPC investments has been the pan-Canadian collaborations they have sparked. The model of geographical cooperation that has evolved over the last four years has proven to be an extremely cost-effective and efficient way of kick-starting Canadian expertise in HPC. Each of the seven consortia leverages the value of regional cooperation. Getting institutions to work together towards common goals has been a major success of past HPC initiatives. Over time, most of the consortia have grown, as institutions see the benefit of working together. For example, MACI was the province of Alberta’s HPC initiative in 1998. With British Columbia universities joining MACI in 2001, WestGrid was created. Following the addition in 2005 of the University of Victoria and of the provinces of Manitoba and Saskatchewan, WestGrid now encompasses all academic research institutions in four provinces.

The seven consortia represent over 50 institutions, plus numerous industrial and research institute partnerships. The HPC facilities are already shared across the country, with many consortia reporting external usage in excess of the 20% CFI target. This sharing has fostered national cooperation and good will. Further, the national pool of applications analysts (see Sect. 5.6) has created a distributed but shared resource of HPC expertise, allowing, for example, a researcher in Vancouver to get HPC assistance from an analyst in St. John’s. TASP is run by C3.ca, which represents the HPC interests of Canadian researchers. C3.ca is the best example of strengthening partnerships. This researcher-driven initiative led to the Long Range Plan, the TASP program, and a national strategy for HPC advocacy, and underlies the focus and scope of this proposal.

Last summer (2005), two of Canada’s largest distributed computing environments were connected over a dedicated high-speed optical link. The new bridge between SHARCNET and WestGrid represents the first step towards a pan-Canadian network of HPC facilities. “CA*net 4 was built with applications like this in mind,” said CANARIE President and CEO Andrew K. Bjerring. While the move does not yet fully integrate WestGrid and SHARCNET’s facilities into a unified computing “grid”, the dedicated high-bandwidth connection means researchers working at member institutions can share and transmit massive amounts of data, with virtually no constraints on bandwidth.

3 Access Grid is an open source suite that supports large-scale distributed meetings, collaborative work sessions, seminars, lectures, tutorials and training. See www.accessgrid.org.


Discussions are under way with CANARIE to extend this arrangement across the country to include all seven consortia.

Efforts to maximize the use and power of existing HPC resources in Canada are leading to innovative collaborative approaches. One interesting example of creating a world-class competitive facility in Canada is the CISS project (Canadian Inter-networked Scientific Supercomputer), led by Paul Lu (University of Alberta). Dr. Lu and his team have developed the Trellis software package, which allows for the sharing of computational resources using a minimal software infrastructure on a host system. To date, four CISS experiments have taken place. The most recent one, in 2005, had all the consortia contributing resources to the project. Over two days, over 4,000 processors were used, allowing two researchers to get a total of 20 years of computing done. These kinds of initiatives illustrate how the consortia can work together and share to support research excellence.
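
The scale of that CISS run can be checked with simple arithmetic (an illustrative calculation only): 4,000 processors running for two days deliver roughly 4,000 × 48 = 192,000 CPU-hours, and 192,000 / (24 × 365) ≈ 22 years of single-processor computing, consistent with the approximately 20 years of computing quoted above.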

2.6 Resources Used To Their Full Potential [ 1f ]

Canada’s HPC resources are currently over-subscribed and constantly being pushed, by the very research they support, to be faster, more powerful and more accessible. The CISS experiments illustrate this point, and demonstrate the need for access to computing resources well in excess of what any single consortium can obtain. In effect, all the computational resources across the country are fully used, sometimes with long queues. For example, WestGrid’s shared-memory computers have had periods of eight-day waits to get access to the machine. Similarly, the WestGrid capacity cluster (1680 processors) is 100% busy, with an average of 2000 jobs waiting in the queue to run.

The advent of regional consortia saw not only a giant leap forward in the power of available HPC resources, but at the same time an increase in inter-institutional migration of users in order to use the architecture most appropriate to their needs. For instance, users with needs for a capacity cluster could use such a facility at a different institution, instead of running serial jobs on a relatively more expensive SMP located at their own institution.

Critical to the success of the HPC investment is the work of application analysts (see Sect. 5.6 below). These are trained scientists, many holding a PhD, whose role is to provide specialized technical assistance to researchers. They may spend a few days to a few weeks working on a particular code, looking for ways to optimize the performance, often by instrumenting the code, and parallelizing the code if needed. They also provide general training on HPC, bring new users up to speed, etc. For instance, an RQCHP analyst at U. de Montréal parallelized a user’s code with OpenMP, obtaining 90% of the peak performance on 128 CPUs. Another RQCHP analyst at Sherbrooke parallelized and ran a quantum dynamics code on up to 800 CPUs. A CLUMEQ analyst at McGill has parallelized several computational fluid dynamics and computational aero-acoustics codes to nearly 90% efficiency on a 256-processor machine. A WestGrid analyst reorganized an application’s data structures to increase the overall parallel efficiency by a factor of 10. An analyst at ACEnet was able to increase the use of a 128-node cluster at Dalhousie; his proactive initiatives resulted in the system moving from six users and roughly 60% CPU utilization to 23 active users and virtually 100% utilization. A group of analysts at RQCHP designed a set of high-level scripts, called bqTools, that allows users to submit literally hundreds or thousands of jobs on a cluster with a single command. bqTools is an interface between the user and the PBS queueing system that submits multiple copies of the same code with different input files, generated from a common template and the user’s specification of the range of parameters to explore. This set of tools is crucial in making efficient use of the RQCHP clusters.
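
To make the bqTools description concrete, the sketch below shows the general pattern of a template-driven parameter sweep submitted to a PBS queueing system. It is a minimal illustration only: the parameter names, file names and PBS options are hypothetical, and this is not the actual bqTools code.

#!/usr/bin/env python
# Minimal sketch of a bqTools-style parameter sweep (illustrative, not the RQCHP tool).
# Each parameter combination is substituted into a PBS job-script template and the
# generated script is piped to qsub, one submission per combination.

import itertools
import subprocess
from string import Template

# Hypothetical job-script template; $tag, $temperature and $field are placeholders.
TEMPLATE = Template("""#!/bin/bash
#PBS -N sweep_$tag
#PBS -l nodes=1:ppn=1,walltime=24:00:00
cd $$PBS_O_WORKDIR
./my_code --temperature $temperature --field $field > out_$tag.dat
""")

temperatures = [0.1, 0.2, 0.5, 1.0]   # example parameter ranges to explore
fields = [0.0, 0.5, 1.0]

for i, (t, h) in enumerate(itertools.product(temperatures, fields)):
    script = TEMPLATE.substitute(tag=i, temperature=t, field=h)
    # Hand the generated job script to the PBS queueing system.
    subprocess.run(["qsub"], input=script, text=True, check=True)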

2.7 Bringing Benefits to the Country [ 1g ]

The investments of CFI have resulted in world-competitive research that would not be possible without the requisite computing infrastructure. Section 3 and the Case Studies go into more detail about the scientific outcomes past, present, and future.


In the June 1993 Top 500 list, academic systems comprised 29% of the entries (research systems 31% and industrial systems 31%). The current list (Nov. 2005) has academic systems comprising just 14% of the entries (research 24% and industry now comprising 53%). More than anything, this reflects the increased role that HPC is playing in the commercial segment of the world’s economy and the increased importance that training in the use of HPC will play in preparing workers to contribute to that economy.

Canada is far behind on the industrial HPC side. Whereas one can argue that the number of Canadian academic facilities on the Top 500 list, at 10% of the United States’ count, is an appropriate number (3 versus 33), the disparity on the industry side is astounding (1 versus 156 facilities). This shows that the benefits of HPC to Canadian industry are still in their early stages.4 There is huge potential for increased competitiveness in industry as companies realize the benefits of HPC—something that can really only happen if the needed expertise, toolsets, experience, and HQP are available. The CFI investments in HPC have not yet had sufficient time to mature to the point where graduating students use their skills to create new companies or enhance existing ones. The United States Top 500 numbers clearly show the significance of HPC to industry and, thus, the critical need for Canada to foster the development of HPC-related expertise.

This being said, there clearly is a significant HPC presence in the Canadian economy. Oil companies in Alberta use computing clusters to perform seismic analysis and determine the best drilling locations for oil. The quality of Environment Canada daily forecasts has risen sharply, by roughly one day of useful forecast range per decade of research and advancements in HPC hardware and software (Environment Canada has a perennial Top 500 entry). Accurate forecasts translate into billions of dollars saved annually in agriculture and natural disaster costs. In Quebec, aerospace companies such as Bombardier and Pratt & Whitney are heavy users of HPC for all design. The rapid dissemination of the SARS genomic sequence through the Internet allowed groups all over the world to participate in the computational analysis of the genome and the 3D modeling of the protein sequences. Visualization systems allow medical doctors to see patient organ structures in 3D, foresters and geologists to visualize landscapes as if they were standing in them, and engineers to view and refine their designs before having to manufacture them. Many such examples are given in Section 3.

3 Success and potential: HPC and Canadian Innovation

This section provides a sample of Canadian innovation using HPC, in the context of past accomplishments, current needs and the benefits accrued to Canada, and across a variety of scientific fields. The discussion is supported by more specific case studies appearing later in the proposal. The following is not in any way an exhaustive description of HPC-based research in Canada, but rather a sampling. It is understood that the researchers named in this section will not have, on that basis, greater access to the infrastructure than the many more researchers who are not cited.

3.1 Elementary Particle Physics

This field seeks to unveil the fundamental constituents of matter and their interactions. The international ATLAS collaboration is undertaking one of the largest scientific experiments ever, by designing, building, and operating the giant ATLAS particle detector. ATLAS is located at the 27 km circumference Large Hadron Collider (LHC) at the European Centre for Particle Physics (CERN). This detector is expected to start acquiring data in 2007, after which a growing volume of data will fuel an unprecedented worldwide data analysis effort. At present, 33 Canadian particle physicists (PIs) at universities from BC to Quebec are part of the ATLAS collaboration; R. Orr (SciNet) is the spokesman of the Canadian group, and M. Vetterli (WestGrid) is the co-ordinator of ATLAS computing in Canada.

4 Undoubtedly, the number of Canadian HPC sites is under-reported on the list. But then that is true for all countries.


HPC needs for data storage and analysis will be satisfied by a worldwide infrastructure: the LHC Computing Grid (LCG), as explained in more detail in Case Study 7.1, p. 37. Canada will contribute a Tier-1 facility for large-scale, collaboration-wide computing tasks. This is made possible by a separate CFI grant. Tier-2 needs, to continue the analysis for the extraction of physics results by various analysis groups or individual researchers, will be met by the shared use of the National Platform. [ 2a,2b ] This makes the most efficient overall use of computing resources. The requirements for disk storage are substantial, and the analysis can be carried out on capacity clusters with standard interconnect. Particle physicists have been running ATLAS simulation software, as well as developing grid computing tools, for a few years now, notably on equipment at WestGrid, SciNet, and RQCHP. [ 1a ]

Many of the tools for distributed analysis have been tested and used in production for simulation and data analysis by the D0 and CDF experiments at Fermilab (Chicago), and by BaBar (Stanford). This work has used facilities in the existing consortia. Canada is also known for the Sudbury Neutrino Observatory (SNO), a large underground neutrino detector. The SNO collaboration is headed by A. McDonald (HPCVL). It uses facilities at HPCVL (data storage and detector simulation) and at WestGrid (simulation).

3.2 Astrophysics

Understanding the origin of the Universe, the formation of structures such as planets, stars and galaxies, and catastrophic events such as the formation of a black hole or the supernova explosion of a massive star are some of the key questions facing contemporary astrophysics. Canadian researchers are among the international leaders in all of these areas. Astronomy and astrophysics in Canada ranks third in the world in terms of impact, behind only the US and the UK. HPC plays a pivotal role in all of these investigations. Broadly speaking, computational needs divide into data analysis, and modelling and simulation. The last decade has seen dramatic advances in the volume and quality of data from ground- and space-based observatories and a huge rise in the effort to extract the key scientific results from these data. A prime example is the pioneering work of R. Bond (SciNet) to analyse data from satellites measuring the character of the Cosmic Microwave Background. [ 1a,2a ] Observations at other wavelengths are also increasingly demanding access to large-scale HPC for reduction and analysis; examples include the large-scale optical survey work of R. Carlberg (SciNet), D. Schade (WestGrid) and M. Hudson (SHARCNET). The theoretical counterpart of this observational work is the numerical simulation of complex systems. The range of scales involved, several hundred billion in mass, drives a relentless requirement for the largest parallel systems available. This is particularly true of simulations of large-scale cosmic structure, galaxies and star formation, as typified by the work of J. Navarro (WestGrid), U.-L. Pen (SciNet), H. Couchman (SHARCNET), J. Wadsley (SHARCNET) and H. Martel (CLUMEQ). [ 2a ] These researchers and other Canadian astrophysicists have also authored or co-authored several of the leading simulation codes now used worldwide, including Gasoline, HYDRA and Zeus. [ 1a,b ] M. Choptuik (WestGrid) is one of the world’s leading authorities on the numerical solution of the general relativistic problem of coalescing black holes. R. Deupree (ACEnet) has developed numerical methods for stellar hydrodynamics that can treat, for instance, close binary stars, also the object of L. Nelson’s (RQCHP) research. P. Charbonneau (RQCHP) models solar activity through detailed magneto-hydrodynamic computations. Forthcoming experiments and observations drive an analysis requirement for serial farms with several thousand processors, whilst on the simulation side a great deal of effort has been invested over the last decade to develop and optimize parallel codes, and the necessity is for large capability clusters with high-performance (low latency and high bandwidth) interconnects. [ 2a,2b ] Case Study 7.2 (p. 38) provides more detail on specific needs and past accomplishments.


3.3 Chemistry and biochemistry

Chemistry is one of the fields in which HPC is used by a substantial fraction of the community. This is due in part to the wide availability of high-performance electronic structure software, which allows one to predict the conformation and chemical function of molecules (so-called ab initio computations). Canada counts many established groups whose expertise in this field is recognized worldwide (Boyd at ACEnet, Ziegler and Salahub at WestGrid). [ 2a ] Ziegler’s work includes many practical applications to industrially important processes, in particular in the field of catalysis; his group is involved in collaborations with several industrial partners. [ 4a ] Becke (HPCVL/ACEnet), Ernzerhof (RQCHP) and Ayers (SHARCNET) are internationally known for their development of new methods/theories in Density Functional Theory. Many other groups in Canada are well known for the application of these methods to molecules or solids (Côté (RQCHP); Stott, Zaremba, St-Amant, Woo (HPCVL); Tse, Wang (WestGrid)). J. Polanyi (Nobel Laureate in Chemistry, SciNet) applies ab initio calculations to the reactions of organic halides with surfaces that result in nano-patterning. Further understanding of this nano-patterning by geometrically-controlled chemical reactions will require large-scale parallel computing, since 50 to 100 atoms should be used in the simulation. [ 2a,2b ] Computational chemists using ab initio methods are, as a group, among the largest users of HPC resources in Canada. Over the last few years, many new faculty members across Canada have been hired in this field (e.g. Bonev (ACEnet), Iftimie (RQCHP), Schreckenbach (WestGrid)), which reflects the strong activity in this area of research. [ 1c ] Ab initio calculations, depending on the size of the problem, may be handled on a single processor – but within parametric studies requiring many instances, i.e., capacity clusters – or on distributed or shared memory machines. [ 2b ] Case Study 7.3 (p. 40) reports on the specific needs associated with calculations of the electronic structures of complex molecules or solids. Increasingly complex structures are being investigated (e.g. metal-organic frameworks) and HPC needs are evolving from capacity to capability computing. [ 2b ]

A particularly HPC-intensive branch of physical chemistry is coherent control, which seeks ways to guide the quantum mechanical motion of electrons in molecules with the help of extremely short laser pulses (10^-15 seconds) to control chemical reactions at the molecular level. Canadian chemists such as A. Bandrauk (RQCHP), P. Brumer (SciNet) and M. Shapiro (WestGrid) are internationally recognized leaders in this field and have been major HPC users on the Canadian scene. [ 1a ] As suggested by HPC simulations, extremely short and intense laser pulses may also be used to accelerate electrons to relativistic speeds, thus making possible the advent of table-top particle accelerators that could replace bulkier technologies in the context of nuclear medicine. [ 4b ] Coherent control is an excellent example of interdisciplinary research (chemistry, physics, photonics). [ 3f ] The specific HPC needs of this field are cutting-edge: for instance, very large distributed memory with low-latency interconnect is used in order to solve the Schrödinger equation in the presence of a laser pulse in real time. [ 2a,2b ] Even more powerful resources will be necessary to treat the Schrödinger and Maxwell equations on the same footing, i.e., to incorporate the quantum coherent nature of molecules and the classical coherent aspects of light. This increases the dimensionality of the system of partial differential equations to be solved, dramatically increasing the computational effort.

The problem of coherent control is a particular application of quantum dynamics. Another example is the study of quantum effects in the motion of nuclei and their implications for reaction rate constants, photo-dissociation cross sections and spectra, as studied by T. Carrington (RQCHP). These quantum effects must be treated in order to correctly understand many biological and combustion processes, and such calculations require solving huge systems of linear equations and computing eigenvalues of matrices whose size exceeds a million. Large memory systems, such as SMP computers, are needed, and the required memory increases exponentially with the number of particles involved. [ 2b ] For this reason, larger systems of molecules must be treated within the simpler, approximate framework of classical dynamics, sometimes with some quantum mechanical input (semi-classical dynamics).


R. Kapral (SciNet) has developed such methods to simulate proton/electron transfer reactions, and to develop an understanding of how such reactions take place in complex chemical and biological environments. G. Peslherbe (RQCHP) develops and employs similar methods to investigate clusters of atoms and molecules, either as novel materials with tailored properties or as a tool to unveil the fundamental role of solvation in chemistry and biochemistry. The work of G. Patey (WestGrid) focuses on the theory and simulation of dense fluids and interfaces in the framework of Statistical Mechanics. The fundamental research conducted in Patey’s group encompasses phenomena of immense practical importance for chemical as well as biological systems. All of these problems involve the classical dynamics of thousands, or even millions, of particles (much like many problems in astrophysics) and, depending on scale, may be treated on capacity clusters with large memory or on capability clusters. [ 2b ]
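
As a rough illustration of the memory demands of the quantum-dynamics eigenvalue problems mentioned earlier in this subsection (a back-of-the-envelope sketch assuming a direct-product basis, not a description of any particular group's method): with n basis functions per degree of freedom and d coupled degrees of freedom, the basis contains n^d states, so n = 10 and d = 6 already gives 10^6 states. A single double-precision vector of that length occupies 8 MB, while a dense 10^6 × 10^6 matrix would require roughly 8 TB, and every additional degree of freedom multiplies the basis size by another factor of n. This exponential growth is why such calculations favour large shared-memory systems and methods that avoid storing the full matrix.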

3.4 Nanoscience and Nanotechnology

Nanoscience deals with structures that can be fabricated or have emerging properties at the nanometer to micrometer scale. This involves contributions from physics, chemistry, and electrical engineering, and some efforts are models of multidisciplinary research. [ 3f ] Much activity in nanoscience and nanotechnology is motivated by the continuing drive towards miniaturization in the microelectronics industry, which needs new paradigms for nano-devices in order to continue: transistors based on small clusters of atoms or even single molecules (molecular electronics). [ 4a ] At this level, an institution like the National Institute for Nanotechnology (NINT), based in Edmonton (WestGrid), plays an important leadership role. For instance, T. Chakraborty (U. Manitoba, WestGrid) is studying the interaction of electron spins in quantum dots measuring only a few nanometers in diameter, by using monster matrices to calculate the probabilities of electron spin jumping from one level to another. R. Wolkow (WestGrid) and collaborators have shown through experiment and simulations that the electrostatic field emanating from a fixed point charge regulates the conductivity of nearby molecules, thus showing the feasibility in principle of a single-molecule transistor, a discovery with huge potential impact in nano-electronics. [ 1a,1b ] Understanding how the atomic-scale structure affects the electronic properties of materials on a larger scale is the general objective of L. Lewis (RQCHP), who leads a long-established group that resorts to a variety of computational schemes: ab initio calculations for small systems, and molecular dynamics for very large systems of several million atoms, simulated for very long times (billions of time-steps). HPC needs are therefore varied, from capacity to capability clusters. [ 2b ]

Research on quantum materials deals with more fundamental aspects that may have an important impact on nanotechnology. This field focuses on properties of materials that are essentially quantum mechanical (i.e. not accessible to classical approximations) but that go beyond the single molecule: for instance, understanding the mechanism for high-temperature superconductivity or the effect of impurities in metals (Case Study 7.5, p. 43). This class of problems (often referred to as strongly correlated electrons) has been the object of intense numerical effort in the last 15 years, in particular in the development of new algorithms. It has become a major HPC consumer worldwide, both for capacity and capability architectures. Exotic superconductivity is the focus of the work of A.-M. Tremblay and D. Sénéchal (RQCHP) with quantum cluster methods, of A. Paramekanti (SciNet) using variational wave-functions, and of G. Sawatzky (WestGrid) and Th. Devereaux (SHARCNET). E. Sorensen (SHARCNET) and I. Affleck (WestGrid) made key contributions to the study of magnetic impurities, as have M. Gingras (SHARCNET) and B. Southern (WestGrid) on frustrated magnetism. Canadian researchers in this field have already performed world-class calculations (exact diagonalization of matrices occupying hundreds of GB of memory, capacity calculations with hundreds of CPUs over many days) that make them leaders in their field. [ 1a,1b ] Access to large capacity and capability is essential in order to maintain their competitive edge. [ 2a,2b ] Closely related to quantum materials are efforts at designing and simulating physical realizations of quantum computers, in which quantum interference effects are controllable (A. Blais (RQCHP), F. Wilhelm (SHARCNET)).


3.5 Environmental Science

Climate modelling. Future climate change as a result of human activities is an issue of great social, economic and political importance. Global climate modelling is key to both public policy and business strategy responses to global warming, polar ice melt and long-term pollution trends. The same modelling methods are also being applied to better understand such high-profile environmental issues as water scarcity, water basin management, forest fires, ocean current and temperature changes that influence local climate and fisheries, and long-term trends in ozone levels. The policy debates around some of these issues are heated, driving strong demand for better scientific models and simulations.

Canada is a world leader in weather forecasting, climate modelling and prediction. This is driven both by the Meteorological Service of Canada (MSC) and by strong research programs in many universities. The work of R. Peltier (SciNet) on climate change is particularly relevant to the current debate on the impact of human activity on the global climate; Case Study 7.6 (p. 44) deals precisely with this question. Modelling of climate change at the regional (continental) scale is the focus of the Canadian Regional Climate Modelling and Diagnostics Network (CRCMD), led by CLUMEQ scientists such as R. Laprise and C. Jones. High northern latitudes are the region of Earth expected to be most strongly affected by greenhouse-gas-induced global warming. Since the Canadian land-mass and adjacent shelves of the Arctic Ocean constitute a major portion of this region, the stability of northern ecosystems is an important national concern. The Polar Climate Stability Network (PCSN) brings together many Canadian researchers involved in both observation and computation, with the goal of assessing and predicting the effects of global warming on Canada’s northern climate.

Oceans play a vital role in climate change because of their ability to store and transport heat and carbon. The Labrador Current is responsible for the presence of large numbers of icebergs off the Atlantic Canadian coast, which pose a hazard to shipping and to the offshore oil and gas industry. The oceans also provide a significant fraction of the global food supply, an important sector of the Canadian economy. The Canadian effort in ocean modelling is spread over a large number of Canadian universities at WestGrid, SHARCNET, SciNet, CLUMEQ and ACEnet, as well as MSC and the Department of Fisheries and Oceans (DFO). Expertise exists over the whole range of space and time scales, from the role of the oceans in climate, through the role of mesoscale eddies in driving the ocean circulation and tracer transport, to small-scale mixing processes. In particular, eddies, with length scales on the order of tens of kilometers, play a vital role in the ocean circulation, transporting heat and carbon, yet ocean circulation models with sufficient resolution to adequately resolve eddies and important bathymetric features are computationally very demanding, and are currently beyond the reach of Canadian researchers. For instance, an eddy-resolving model of the North Atlantic with eight tracers, including biogeochemistry, and 45 vertical levels would require 84 days on a single SX-8 (vector computer) node for one 20-year integration. On a different level, Project Neptune (based at U. Victoria, WestGrid) involves sea-floor based interactive laboratories and remotely operated vehicles spread over a vast area. It will enable researchers to study processes previously beyond the capabilities of traditional oceanography. This project requires state-of-the-art communication and data storage infrastructure. Canadian researchers (WestGrid, HPCVL, CLUMEQ, ACEnet) also participate in the international ARGO project, which collects data from thousands of floats distributed across the world’s oceans and disseminates it through a data grid centered in Toulouse.

The value of the models used in environmental science depends, of course, on their quality and accuracy, and hence on increasing HPC performance. One of the great difficulties is the existence of many characteristic time scales in the same problem. HPC needs in this field are met by fine-grained parallelism, which traditionally means parallel vector computers, although large capability clusters are also increasingly used (by MSC, for instance).


Forestry. Canada has over 418 million hectares of forests, representing 10% of the forested land in the world. Canada is accordingly a global leader in several areas of forestry research, such as remote sensing for forestry, forest monitoring and carbon accounting. This research plays a pivotal role in the timely analysis of results for forest management, forest inventory, the forest industry, and public information. HPC helps manage and preserve Canadian forests in many ways, for instance through genomics research on forest pathologies: Laval University (CLUMEQ) researchers aim to develop methods and strategies for the biological control of diseases of trees and of microbial invaders of wood. The latter includes the use of genomic approaches for large-scale profiling of microbial populations that may harbour potential biocontrol agents. HPC needs in this sector are those of genomics in general: large capacity clusters. HPC resources are also crucial for monitoring the state of our forests. Forest researchers can now access an innovative, Canada-wide data-storage and management system for large digital images, a project initiated by the Canadian Forest Service and the University of Victoria (WestGrid). SAFORAH, or System of Agents for Forest Observation Research with Advanced Hierarchies, is a virtual networking infrastructure that catalogues, stores, distributes, manages, tracks, and protects the earth-observation images that the Canadian Forest Service retrieves from remote-sensing sources to measure such things as forest cover, forest health, and forest-carbon budgets. Beyond the essential storage and network resources, capacity clusters are also needed to analyse these digital images.

3.6 Energy

The energy sector is arguably the most important component of the global economy from the strategic point of view, and Canada is a major energy producer and consumer. Offshore oil resources offer many benefits to the economy of Atlantic Canada, and there too HPC is playing an important role. For instance, T. Johansen (ACEnet) aims to develop new simulation methodologies and tools to assist with marginal fields and deep-water applications. Enhancing the recovery rate of a single field like Hibernia by 2% could yield an additional 500 M$ in oil revenues. Beyond this specific project, ACEnet institutions have formed the Pan-Atlantic Petroleum Systems Consortium (PPSC), to “develop world-class technology and skills, which not only meet the needs of the petroleum sector in Atlantic Canada but also position the region as a leading exporter of innovative products and services to marine and offshore markets worldwide”.

Hydroelectricity also benefits from university-based HPC research. A. Soulaimani’s (CLUMEQ) finite element code, tailored for free-surface flows, is used by Hydro-Quebec for river engineering simulations. A. Skorek (CLUMEQ) is simulating electro-thermal constraints in power equipment (transformers, circuit breakers, cables, etc.). The goal here is to minimize the thermal losses in such equipment, thus generating power savings. A. Gole (WestGrid) is developing parallel electromagnetic simulations for the design and operation of reliable power networks in co-operation with Manitoba Hydro. Canada also relies heavily on nuclear energy. S. Tavoularis (HPCVL) is conducting high-resolution simulations of the thermal-hydraulic performance of the CANDU nuclear reactor core and pioneering simulations aimed at predicting the performance of Generation IV supercritical-water nuclear reactors.

Canada has made important contributions towards the use of hydrogen in the transportation sector, which may help reduce urban pollution and CO2 emissions. However, production and storage issues, as well as the absence of specific standards for hydrogen as a vehicle fuel, are widely regarded as obstacles to the introduction of hydrogen on the transportation energy market. Canadian researchers within the Auto21 project (CLUMEQ, SciNet, WestGrid and RQCHP) are developing a scientific and engineering basis for determining standards and industry practices specific to hydrogen, focusing on refuelling stations and hydrogen-fuelled vehicles and taking into account the hydrogen storage technology (compression, sorption and liquefaction). Specifically, they develop CFD models based on validated HPC simulations to estimate quantitative clearance-distance criteria, emergency response measures and the placement of leak detectors inside vehicles. Fuel cells are also a promising technology involving hydrogen; A. Soldera (RQCHP) collaborates with General Motors, via an NSERC Collaborative Research and Development grant, on applying computational chemistry to the study of proton exchange membranes. In most of these examples, capability clusters are required.

3.7 Aerodynamics and Aerospace

The global aerospace industry is extremely competitive, with a constant push for leading-edge aircraft, jet engines, avionics, landing gear and flight simulators. The aerospace industry’s challenge is not only to design the most efficient and safest product, but also to be on time, on budget, and within regulatory requirements. The Canadian aerospace industry, ranking fourth in the world, is a 23 B$/year industry of which two-thirds are exports. It employs 80,000 people, and involves all aspects of aircraft/rotorcraft/engine design, manufacture, simulation and operation.

In Canada, the industry’s strategy in accelerating its 1 B$/year R&D has quite visibly included a significant expansion of its computational resources. Through HPC, large-scale multi-disciplinary and multi-scale approaches are permitting the analysis and optimization of several limiting factors simultaneously, while new optimal design approaches are generating configurations that not only have significantly improved performance, but are indeed “optimal”.

The Canadian aerospace industry has a tradition of strong university-industry interaction through large research grants and several NSERC Industrial and Canada Research Chairs. Many aerospace scientists and innovators in Canada continue to have a national and international scientific profile and impact, and to succeed in training a large number of highly qualified Master’s and PhD graduates grounded in HPC who are quickly absorbed by Canadian industry. For example, W. Habashi, A. Fortin, S. Nadarajah, A. Soulaimani, J. Laforte (CLUMEQ), D. Zingg, J. Martins, C. Groth (SciNet), A. Pollard (HPCVL), J-Y. Trepanier, and M. Paraschivoiu (RQCHP) collectively develop models and mathematical algorithms to analyze and optimize large-scale multidisciplinary CFD problems that couple aerodynamics, conjugate heat transfer, ground and in-flight icing, acoustics, structures and combustion. A measure of their success is that many of their codes and by-products have been adopted and are in daily use by the major aerospace companies in Canada and, in the case of W. Habashi’s codes, around the world.

In the last funding round, large-scale parallelism enabled Canadian aerospace researchers to tackle three-dimensional problems involving more than one discipline. In the current round, flow unsteadiness will become a major focus, given the significant increase in computing power. See Case Study 7.8 (p. 47) for more details.

3.8 Industrial Processes

Canada is the world’s third most important aluminium producer (after China and Russia), with about 10% of global output. Much of the aluminium-industry-related research in Canada is conducted at the Aluminium Research Centre – REGAL (CLUMEQ and RQCHP). HPC plays an important role, for instance, in simulating electrolysis cells: a virtual cell is operating at REGAL for research and training. REGAL studies all limiting aspects of aluminium electrolysis reduction cells. This requires the solution – either sequentially or in a coupled manner – of problems involving magnetohydrodynamics, structure and conjugate heat transfer that necessitate large HPC resources. ALCAN and ALCOA are two important industrial partners of REGAL. K. Neale (RQCHP), from REGAL, uses HPC for in-depth modelling of metal alloy microstructure, with application to key industries involved in metal fabrication and forming processes. His program is to establish a rigorous framework, consisting of original numerical tools and innovative experimental techniques, that will become the platform to engineer new material systems for specific ranges of practical applications, particularly in the areas of civil engineering infrastructure and metal forming. The finite element code developed in his group requires a large distributed-memory facility, particularly because of the move from two-dimensional to three-dimensional modelling.


Canada is also one of the world’s leading pulp and paper producers. This industry has long-established roots in the country but needs a growing research sector (much of it based on HPC) to face international competition. F. Bertrand (RQCHP) uses HPC to simulate the consolidation of paper coating structures. More generally, his work on numerical models for the mixing of granular materials has applications in many sectors of the chemical industry. Again, capability clusters are needed. The Centre de recherche en pâtes et papiers of UQTR (CLUMEQ) deals, as part of its mission, with HPC simulations of pulp and paper processes.

3.9 Genomics and Proteomics

Sequencing the genome of living organisms (humans, plants, bacteria and viruses) has become a major industry that pervades all the life sciences. It helps scientists understand mechanisms for, or resistance to, disease at the DNA level, thereby opening the way to new vaccines, antibiotics, genetic tests, etc. The Canadian genomics community is large, influential, and well supported by Genome Canada and corresponding provincial/regional agencies. The HapMap project (200+ researchers worldwide, Nature cover story, 27 Oct. 2005) is a major Canadian initiative put forward by T. Hudson (CLUMEQ). Its objective is to chart the patterns of genetic variation that are common in the world’s population. The results obtained so far provide convincing evidence that variation in the human genome is organized into local neighbourhoods, called haplotypes. Genomics research relies heavily on HPC, as it involves matching together a very large number of overlapping puzzle pieces (parts of the genome) into a coherent whole. This requires computerized search and pattern matching, as well as access to large databases. Such databases must be explored with machine learning algorithms, such as the ones developed by T. Hughes and collaborators (SciNet). In general, research in genomics requires large data storage and access to capacity computing. See Case Study 7.9 (p. 49) for more detail.
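To make the “puzzle piece” analogy concrete, the fragment below is a minimal, purely illustrative C sketch of the kind of overlap test that underlies sequence assembly; production assemblers use far more sophisticated, error-tolerant indexing and run millions of such comparisons in parallel on capacity clusters. The sequences and function name are invented for the example.

    #include <stdio.h>
    #include <string.h>

    /* Length of the longest suffix of read a that matches a prefix of read b;
     * a large overlap suggests the two reads are adjacent in the genome. */
    static size_t overlap(const char *a, const char *b)
    {
        size_t la = strlen(a), lb = strlen(b);
        size_t max = la < lb ? la : lb;
        for (size_t k = max; k > 0; k--)
            if (strncmp(a + la - k, b, k) == 0)
                return k;
        return 0;
    }

    int main(void)
    {
        const char *read1 = "ACGTTGCA";
        const char *read2 = "TGCAGGTT";
        printf("overlap = %zu bases\n", overlap(read1, read2)); /* prints 4 (TGCA) */
        return 0;
    }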

Genomics is an area where technology transfer is accelerated by the use of HPC, as is the case with the Montreal-based company Genizon. Previous attempts at disease gene identification using family linkage analysis, while very successful at identifying genes for simple monogenic human diseases, have generally failed when applied to complex diseases characterized by the interaction of multiple genes and the environment. However, a more powerful gene mapping approach, called Whole Genome Association Studies (WGAS), has recently become possible. The systematic examination of hundreds of thousands of markers for statistical association with disease requires massive computational resources, and the analysis workload and HPC requirements increase as researchers zero in on specific genes and attempt replication of results in different populations. Genizon, in association with CLUMEQ, has recently analysed data from five WGAS, and will complete an additional three studies in the near future. Already, Genizon has pinpointed up to 12 of the genes that cause Crohn’s disease, an affliction of the bowel – compared with the two genes that were previously known. Many such studies are in the works within this effort.

As another example, the Canadian potato genome project (led by ACEnet) aims at relating genomics information to the potato’s practical traits, especially those related to disease resistance and suitability for value-added processing. The potato is one of the world’s four most important crops, and the tuber is eaten by over a billion people daily. In Atlantic Canada alone, more than 1,000 potato farms employ more than 10,000 people.

Proteomics is a logical follow-up of genomics and studies how amino acid sequences, generated from a piece of genetic code, fold into a three-dimensional structure by the simple action of inter-atomic forces and, occasionally, the help of other proteins. This is just one step away from predicting the chemical and biological activity of the protein, which is one step from understanding the results of a genetic modification. There are about 100,000 proteins in the human genome, and the structural information about these proteins is critical to understanding how they work, for drug development, cell differentiation, protein production and regulation. HPC plays a dual role in protein science: (1) it is essential in analysing the data from X-ray diffraction experiments that allow scientists to observe the 3D structure of proteins, such as those conducted at the Canadian Light Source facility in Saskatchewan, and (2) it is used to predict the shape of proteins from the amino acid sequence, in what is already a major field of research in HPC. Many Canadian researchers study the structure and interaction of proteins using molecular dynamics simulations. For instance, J.Z.Y. Chen (SHARCNET) investigates the folding properties of prions – such as those causing BSE (Mad Cow Disease) – at the molecular level and attempts to reconstruct the process by which prions mysteriously convert themselves from a normal form into a disease-causing structure. P. Lague (CLUMEQ) carries out simulations of cell membrane fusion and fission. The HPC needs in proteomics are enormous: folding a 60-residue protein using the ART algorithm developed by N. Mousseau (RQCHP) takes 4 weeks on a single-processor machine, but tens of folding trajectories must be generated to describe the statistics of the folding mechanism. Again, Case Study 7.9 refers to these needs.
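To give a rough sense of scale (the figure of 30 trajectories below is an assumption for illustration; the text above specifies only “tens”): at 4 weeks per trajectory, 30 trajectories represent about 30 × 4 = 120 CPU-weeks, or roughly 2.3 CPU-years, for a single 60-residue protein. Turnaround becomes practical only when the trajectories are run concurrently on a capacity cluster, and the cost grows rapidly with protein size.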

3.10 Medical Research

Cardiac Research. Heart disease is the most important cause of death in the western world. With major improvements in the treatment of acute cardiac infarction in place, further reduction of mortality depends more and more on our understanding of the amazingly complex cellular processes that govern the electrical activation of the heart, and their relation to clinically measurable signals such as the electrocardiogram. HPC is crucial because macroscopic tissue modelling requires simulation at several million points.

Canada has a strong position, with several researchers who are recognized worldwide for their work in large-scale simulations of the heart, both in model development and in clinical applications. Interactions between these groups and with clinical and experimental investigators are intensive, guaranteeing a high practical impact of their work. The group of S. Nattel (RQCHP) works on atrial fibrillation, a condition that affects about 10% of people over 70 years of age and can have life-threatening complications, such as stroke and heart failure. To learn more about the cellular and molecular mechanisms behind atrial fibrillation, this group has used computer models of the electrical activity in the heart’s upper chambers and conducts simulations to see if new or existing drugs might be beneficial. J. Leon (ACEnet) and E. Vigmond (WestGrid) use simulations to explore ways of improving implantable defibrillators, small devices implanted under the skin that contain a sophisticated microprocessor and a battery. The device constantly monitors the heart’s rhythm, so that when a cardiac patient has an episode of sudden cardiac infarction, the defibrillator delivers a strong electrical shock to start the heart pumping again. Implantable defibrillators have become indispensable in patient care and monitoring. M. Potse and A. Vinet (RQCHP) use one of the world’s largest and most realistic heart models to make better sense of surface electrocardiograms, i.e., to better understand the relation between clinical data and heart disease at the cellular level. The problem requires solving systems of tens of millions of coupled differential equations, which is best accomplished on large-memory SMP computers.

R. Ethier (SciNet) and R. Mongrain (CLUMEQ) study blood flow through the heart and arteries using hydrodynamics. Current work focuses on blood flow through heart valves, where transitional and turbulent flow is present. Modelling of such flows to improve heart valve and stent design is clinically important. Understanding blood flow is also a crucial component in our interpretation of magnetic resonance scans.

Imaging. Magnetic resonance (MR), CAT and even PET scans are now commonplace in the lives of Canadians. These minimally invasive techniques have dramatically improved diagnosis quality over the last two decades and correspondingly increased the quality of life. All of these techniques have a strong computational component, and research towards improving them (better resolution, better models, etc.) relies on HPC. In neuroscience, as an example, every step of an MR brain scan relies on fast and effective computing, which produces three-dimensional images that may then be stored. Just as demanding are the data mining aspects of imaging. Thousands of brain scans may be needed to determine the appropriate treatment. Archived brain scans contain a wealth of information that doctors and researchers may mine to analyse and understand brain function and disease. Brain research in Canada (e.g. A. Evans (CLUMEQ), M. Henkelman (SciNet), T. Peters (Robarts, SHARCNET)) is increasingly making use of large brain-image databases, often constructed from research sites around the world, to understand brain disease and treatment. HPC needs are centered on storage and data transfer (tens of terabytes are needed), as well as capacity computing. See Case Study 7.10 (p. 50) for more detail.

Radiotherapy. Knowing the effect of ionizing radiation on human tissues in general and genetic material in particular is highly relevant to cancer treatment by radiotherapy and to radio-protection. J.-P. Jay-Gerin (RQCHP) simulates the penetration of fast electrons in aqueous solutions or cells. One of his objectives is to test a hypothesis according to which biochemical events in the cellular membrane or mitochondria trigger the cell’s response to low doses of radiation. These simulations, conducted on a capacity cluster, are being moved towards a capability cluster where larger problems can be addressed. On the diagnosis side, S. Pistorius (WestGrid) uses multidimensional image processing to register, segment, and quantify the types and location of tumours for patient treatment at CancerCare Manitoba.

Pharmacology. The design of new molecules from ab initio calculations – the expression in silico is used – is now a standard procedure in the pharmaceutical industry. The pharmaceutical sector is an important component of the economies of developed countries, and of Canada in particular. Quantum chemistry and molecular dynamics simulations using HPC allow testing of the functionalities of new molecules before engaging in the long process of organic synthesis. Computer-assisted molecular design (CAMD) is used not only within drug companies, but also at many research universities, where specific drug research is conducted and future drug designers learn their trade. For instance, D. Weaver’s group (ACEnet) is focusing on the design and synthesis of novel drugs for the treatment of chronic neurological disorders, such as epilepsy and Alzheimer’s dementia. Attempts are made to correlate basic science with clinical science, thereby enabling a “bench-top to bedside” philosophy in drug design. An example of a Canadian success story is Neurochem, a pharmaceutical company located in Quebec that has long been heavily involved in HPC developments. Originally spun out of Queen’s University, it has now grown into a mid-sized drug company with more than 200 employees. With promising drug candidates currently in clinical trials and expanding drug development programs for other research projects, Neurochem is a leader in the development of innovative therapeutics for neurological diseases. In another project, J. Wright (HPCVL) develops and tests synthetic antioxidants that may someday help slow down the aging process and promote greater health in our senior years.

3.11 Social Sciences and Humanities

Canadian HPC is also reaching out to non-traditional areas. For instance, in psychology, D. Mewhort (HPCVL) works on the concept of human performance and how to model it. Current work centres on computational models for perception and memory, with an emphasis on decision and choice in speeded-performance tasks. Projects include studies of semantic acquisition and representation, recognition memory, reading and lexical access, pattern recognition, attention, etc. The computational aspects of this research are carried out on capability clusters. HPC research in the humanities is by and large based on text gathering and analysis. The TaPoR group (WestGrid, SHARCNET, SciNet, RQCHP, ACEnet) develops digital collections and innovative techniques for analyzing literary and linguistic texts (see Case Study 7.11, p. 52).

HPC plays an important role in econometric model estimation in areas such as labour economics, transportation, telecommunication and finance. In most cases, model estimation involves dealing with huge micro-level databases that capture information at the level of individuals, households or firms (this is known as micro-econometrics). Although a wide range of topics is covered, the micro-econometric models that are formulated all share commonalities. They are usually formulated at a level of sophistication that requires one to depict or mimic decisions at the micro level. Such a high level of refinement translates, most of the time, into a prohibitive level of computational complexity, attributable to the computation of multidimensional integrals. Theoretical work on simulation has led to accurate simulators that can reliably replace those integrals with easy-to-compute functions. Such simulations usually run on capability clusters. B. Fortin (CLUMEQ) leads a group that uses huge micro databases and sophisticated econometric models to analyse the impact of social policies. D. Bolduc and L. Khalaf (CLUMEQ) work on a new generation of discrete choice models that involves the presence of latent psychometric variables in the econometric models.
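The following C fragment is a schematic illustration (not code from any of the groups cited above) of the simulation idea: a multidimensional integral is replaced by an average over random draws, here for a toy integrand over the unit square whose exact value is 0.25. Real micro-econometric simulators are far more elaborate and are run in parallel for many parameter values, which is what drives the need for capability clusters.

    #include <stdio.h>
    #include <stdlib.h>

    /* Toy stand-in for a choice-probability integrand. */
    static double integrand(double x, double y) { return x * y; }

    int main(void)
    {
        const int N = 1000000;      /* number of simulated draws */
        double sum = 0.0;
        srand(12345);               /* fixed seed for reproducibility */
        for (int i = 0; i < N; i++) {
            double x = (double)rand() / RAND_MAX;
            double y = (double)rand() / RAND_MAX;
            sum += integrand(x, y);
        }
        /* The sample average approximates the integral (exact value 0.25). */
        printf("Monte Carlo estimate: %f\n", sum / N);
        return 0;
    }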

3.12 Information Technology

HPC can itself be the object of research with important HPC needs. Within this area, the field with arguably the most impact on the way HPC resources are used is grid computing. Today, grids enable a number of large-scale collaborative research projects that would not have been possible a decade ago. The major issues addressed by grid research include security mechanisms to enable controlled co-operation across institutional boundaries, the creation of open standards for interoperability, and the scalability issues faced by automation tools. Grid researchers do not have large computational needs beyond those of the application scientists they work with, but they do need priority, sometimes exclusive, access to resources to test new solutions, and assistance from site administrators when deploying new software systems. It is also important to have access to a range of systems of different architectures so that interoperability issues can be fully explored. Past and present successes of grid research in Canada include: (1) The Trellis Project (P. Lu, WestGrid) has created a software system that allows a set of jobs to be load-balanced across multiple HPC systems, while also allowing the jobs to access their data via a distributed file system. Along with many partners, it performed a series of on-line, Canada-wide, production-oriented experiments from 2002 to 2004: the Canadian Internetworked Scientific Supercomputer (CISS). For instance, from September 15 to 17, 2004, in a 48-hour production run, CISS-3 completed over 15 CPU-years of computation, using over 4,100 processors from 19 universities and institutions, representing 22 administrative domains. At its peak, over 4,000 jobs were running concurrently. (2) High-energy physicists at WestGrid and computer scientists from NRC have established a computational grid (GridX1) using clusters across Canada, as a testbed for ATLAS. (3) One part of the Canadian DataGrid project involves managing sensor data collected on the Confederation Bridge that links Prince Edward Island to New Brunswick. Data collected at the bridge are registered and replicated at WestGrid and HPCVL and are used to predict critical structural problems that could be caused by ice floes. This predictive information is used to trigger preventative ice engineering measures and to suggest times when the bridge should be closed. (4) The Grid Research Centre (GRC), based at the University of Calgary, collaborates on projects with researchers in academia and industry in Canada, the US and Europe. In a joint project with HP Labs, automatically deployable, model-based, interoperable monitoring tools are being created. These tools are already deployed and used on WestGrid systems.

Data mining is about extracting the full value from massive datasets, a task that could not, in many cases, be attempted without HPC and sophisticated tools. The best-known examples are the Internet search engines, notably Google, which use data-mining techniques (and huge HPC resources) to rank web pages, images, etc. in order of relevance. Research on data mining algorithms also requires large HPC resources (large capacity clusters, and now capability clusters as well). For instance, J. Bengio (RQCHP) works on Statistical Learning Algorithms (SLA). These allow a computer to learn from examples, to extract relevant information from a large pool of data and, for instance, to estimate the probability of a future event from a new context. Recent progress in this field has led to many scientific and commercial applications. Bengio’s laboratory receives contributions and contracts from business partners that benefit from his research: insurance companies, pharmaceuticals, telecommunication services, Internet radio, financial risk managers, and so on.


Data mining is one aspect of the more general field of Artificial Intelligence (AI). One of the early objectives of AI research was to demonstrate the technology by building programs capable of defeating the best human players at games of skill. The first such demonstration was J. Schaeffer’s (WestGrid) work that resulted in a computer winning the World Man-Machine Checkers Championship in 1994 (the first program in any game to win a human title). This result depended heavily on extensive use of HPC resources (hundreds of processors, primarily located in the United States due to a lack of available Canadian resources at that time). More recently, Schaeffer’s group has used HPC resources (large shared-memory systems) to build a strong poker-playing program, developing new techniques for automated reasoning in domains with incomplete and imperfect information. Navigating through the vicissitudes of the real world requires the ability to reason when only partial information is available, and a healthy skepticism about the veracity of the information that is available. Thus poker is a much better test-bed for exploring these ideas than games like chess. The research has implications in many areas, especially negotiations and auctions. The poker research has been commercialized by a Canadian company (www.poker-academy.com).

There are a number of leading research groups in visualization across Canada, at WestGrid (the IMAGER, SCIRF, IRMACS, AMMI, Graphics Jungle and IMG groups), SciNet (the DGP group) and RQCHP (the LIGUM group). These groups play a significant role in the international graphics and visualization community. This leadership is exemplified by Tamara Munzner (WestGrid) being one of the co-authors of the US NIH/NSF Visualization Research Challenges report (2005). Case Study 7.12 (p. 54) provides more detailed information about specific project needs.

Numerical methods using HPC also arise in the fields of transportation and network planning. Using operations research algorithms, T.G. Crainic, M. Gendreau and B. Gendron (RQCHP), for instance, are solving, often in real time, large-scale problems related to traffic control, be it traffic of people, vehicles or data. B. Jaumard (RQCHP) studies the problem of real-time design and maintenance of a communication path going through several satellites in motion. These are examples of large-scale combinatorial optimization problems that must often be solved in real time, usually with the help of capability clusters.

3.13 Mathematics

While advanced mathematics is involved in almost all the research described above, mathematicians are increasingly using computation as a central tool of pure mathematical discovery. J. Borwein (ACEnet) uses advanced combinatorial optimization to hunt for patterns in very high precision floating-point data. This floating-point computation is very intensive, and a single run typically involves more than 5,000 CPU-hours. His two 2004 books on Experimental Mathematics (with D. Bailey, Lawrence Berkeley Labs) have already been cited in the National Academy of Sciences report Getting Up to Speed: The Future of Supercomputing (June 2004). P. Borwein and M. Monagan (WestGrid) have worked with MapleSoft and MITACS to successfully integrate such tools into Maple, the premier (Canadian) computer algebra package, along with F. Bergeron (CLUMEQ) and R. Corless (SHARCNET), who are also deeply involved in such co-development, with more emphasis on dynamical systems and special function evaluation. N. Bergeron (SHARCNET) applies parallel computer algebra methods in combinatorics, while Kotsireas (SHARCNET) has had great success implementing Groebner basis techniques to solve non-linear equations.

Financial mathematics has assumed critical importance in the Canadian financial industry. At the heart of modern-day financial mathematics is the modelling of the pricing and hedging of option contracts. The central importance of option and derivatives pricing stems from the pivotal role of options in the daily risk management practices and trading strategies employed by leading financial institutions. P. Boyle (SHARCNET) is internationally known for his broad contributions to computational finance. At CLUMEQ, K. Jacobs and P. Christoffersen are well known for their work on option valuation and on fixed-income products such as bonds. It is worth mentioning that financial mathematics has attracted many young theoretical physicists and mathematicians who have been trained to think quantitatively and to critically use or design approximate models. This is one example of the impact of highly qualified personnel (HQP) trained in a fundamental science and contributing to the Canadian economy in a priori unsuspected ways. T. Hurd (SHARCNET) is an example of the above in the research/academic sector.

4 A National Strategy for HPC

4.1 The National Vision

The current Canadian HPC model, described in Sect. 2, consists of seven regional consortia providing shared resources within their borders, with limited sharing with the rest of the country. This proposal is for a new model in which the regional consortia first select together, and then manage locally, a distributed infrastructure that is shared globally across Canada, within a collaborative national entity called Compute Canada. Compute Canada, in which researchers, consortia and university administrators will have representation, will ensure that the proposed platform effectively functions as a national infrastructure, and will establish procedures such that the current NPF round is only the first of a series that will provide Canada with a sustained world-class HPC infrastructure, together with the personnel needed to operate it efficiently.

Thus, Canadian researchers needing HPC will have access to a variety of architectures and will be able to select, for a particular scientific task, the one most appropriate to their needs. The proposed model for national sharing will serve the needs of researchers more effectively and more comprehensively than the regional model brought about in the past through the individual consortia. The installed infrastructure will be used even more efficiently than before because of the larger pool of potential users and load balancing between different facilities at the national level.

4.2 National Sharing of Resources: Principles and Implementation

The heart of the Compute Canada proposal must therefore be the principles intended to govern national resource sharing and the mechanisms proposed to manage the collectively acquired HPC infrastructure. In designing a strategy, there is clearly a series of balances to be struck: between the mix of architectures, between local and national resource allocation, between provincial and individual institutional jurisdictions, etc.

In previous CFI competitions, each consortium was requested to commit a minimum of 20% of its resources for use by other Canadian scholars not affiliated with that consortium. The current vision goes well beyond this in making all Compute Canada resources, past and future (i.e. not only those acquired with support from the current NPF but also those previously funded by CFI at the consortia), accessible to all Canadian researchers at publicly funded academic and research institutions. National sharing of Compute Canada resources across consortia is the sine qua non of efficient operation. Research needs that cannot be fulfilled by local facilities, in terms of machine type or of cycle availability, will be addressed by application to the facilities at other locations. In order to maximize the use of the resources, it is proposed to operate under the following set of guidelines.

1. Each consortium will be responsible for providing access to, and user support for, the local computational resources.

2. All users will obtain a local default allocation of resources, irrespective of their affiliation within Canada. Each consortium shall provide a single point of access for requests for its resources from any researcher in Canada. Resource allocations beyond this default level, i.e., special or extraordinary requests, will be set by local and national resource allocation committees (see below).


3. A Local Resource Allocation Committee (LRAC), attached to each consortium, will treat special requests from researchers belonging to that consortium.

4. Special requests that cannot be accommodated within the consortium, either because of oversubscription of resources or for lack of the appropriate architecture, shall be promptly channelled to a National Resource Allocation Committee (NRAC).

5. Non-local users (from other consortia) who have been granted access through the procedures outlined above will have the same privileges and responsibilities as local users.

6. The priority attached to the use of resources by individual researchers will be set by fair-share principles. The idea behind fair-share is to increase the scheduling priority of jobs from user groups that are below their fair-share target, and to decrease the priority of jobs from groups that are above it. In general, this means that when a group has recently used a large number of processors, its waiting jobs will have a lower priority. All users get a default fair-share allocation. Projects requiring more resources can apply to the local or national resource allocation committee for an increased fair-share allocation. (A brief illustrative sketch of this priority adjustment follows the list.)

7. Software acquired with Compute Canada funding and installed on specific systems shall also be made available to non-local users, subject to licensing constraints.

8. Information concerning Compute Canada resources and application procedures will be made available on a national website in both official languages and will be linked to consortia websites.

9. The current intent is that there will be no differential fees to limit access to or sharing of these resources. This will be possible only if sufficient funding is provided to cover operating costs.
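The sketch below, referred to in guideline 6, is a deliberately simplified illustration in C of how a fair-share priority adjustment can work; actual schedulers (and the parameters eventually adopted by the consortia) will differ, and the numbers and names here are invented for the example.

    #include <stdio.h>

    /* Priority rises when a group's recent usage is below its fair-share target
     * and falls when it is above; 'weight' controls how strongly it reacts. */
    static double fairshare_priority(double base_priority,
                                     double target_share,  /* e.g. 0.10 = 10% of the system */
                                     double recent_usage,  /* fraction of cycles used recently */
                                     double weight)
    {
        return base_priority + weight * (target_share - recent_usage);
    }

    int main(void)
    {
        /* A group below its target is boosted; a heavy recent user is deprioritized. */
        printf("under-served group: %.2f\n", fairshare_priority(1.0, 0.10, 0.04, 5.0));
        printf("heavy recent user:  %.2f\n", fairshare_priority(1.0, 0.10, 0.22, 5.0));
        return 0;
    }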

Figure 4.1: The Compute Canada Council. [Organizational chart: the Board of Directors (up to 26 members) comprises an Executive of 8 regional representatives (1 from Atlantic Canada, 2 from Québec, 3 from Ontario, 2 from Western Canada), 8 VP(R) representatives, and up to 10 others (users, industry, CANARIE, etc.). Reporting or advisory to the Board are the National Resource Allocation Committee (NRAC), a 4-member Scientific Advisory Committee (SAC), and staff.]

4.3 Governance Model

The challenge of devising a structure capable of effectively managing the shared National Platform resource is to balance the operational effectiveness of local control with the need to ensure and promote national accessibility. The national structure proposed here will be a construct of the Canadian university community, governed by a national Compute Canada Council that is composed of a number of distinct elements, as follows.

1. A Compute Canada Council (CCC) will be formed to oversee the access and sharing of the NPF-funded resources in order to ensure fairness and efficiency. This Council will be comprised of up to 26 members, as follows: an Executive Committee (EC) comprised of 8 regional representatives (1 appointed by the consortium in Atlantic Canada (ACEnet), 2 appointed by the consortia from Quebec (RQCHP and CLUMEQ), 3 appointed by the consortia from Ontario (SHARCNET, HPCVL and SciNet), and 2 appointed by the consortium from Western Canada (WestGrid)); this group would play the role of the present National Initiatives Committee in future rounds of the NPF process. A National Steering Committee (NSC) comprised of representatives of the Vice-Presidents Research of the lead institutions of each of the existing HPC consortia, with the regional distribution described above; it is envisioned that this group would play the same overseeing and dispute-resolution role in future rounds of the NPF process as it has played very well in the current round. A Users’ group comprised of as many as 10 others representing academic, industrial, government and other users of HPC resources; a Nominating Committee within Compute Canada will select these representatives, who will probably be invited to serve on a rotating basis.

An organizational chart for the proposed Compute Canada Council is provided in Fig. 4.1, which displays not only the three primary elements described above that will make up the Board of Directors (BoD), but also the additional elements of the proposed structure that will report to, or be advisory to, the BoD. These additional elements are:

2. The Executive Committee, whose role will be to oversee and co-ordinate, between Board meetings, the national aspects of Compute Canada’s operations. This includes: (1) making sure that the above sharing principles are applied effectively, which can be done by periodically reviewing statistics concerning access requests for, and usage of, the installed facilities; (2) co-ordinating any action by consortia and local staff to implement a national vision: construction and co-ordination of web sites, a knowledge base for user support across Canada, national workshops and conferences, etc.; (3) reporting to the Canadian HPC community on the activities of Compute Canada.

3. A National Resource Allocation Committee (NRAC), whose role is to decide on large resource allocations referred to it by the LRACs, in order to rapidly and efficiently enable research projects requiring large HPC resources not available locally. The NRAC will consist of at least one representative from each consortium and will strive for balance across disciplines.

4. A Scientific Advisory Committee, comprised of distinguished advisors on HPC from outside the Canadian HPC community, will annually review the performance of Compute Canada and serve as a standing sounding board for the BoD.

4.4 Application Context

In assessing this proposal, it is important to recognize several constraints that strongly influence the implementation of a national vision.

1. The CFI NPF was formally discussed in October 2005, with a call for proposals only being finalized in February 2006.

2. CFI provides at most 40% of the funds. An equivalent share comes from the provinces of Canada and at least 20% from industry (mostly in the form of vendor discounts, beyond the academic discount). Most Canadian provinces insist that their investments in CFI-funded infrastructure must be spent in that province. Consequently, the HPC infrastructure requested in this proposal is distributed across the country.

3. Within a consortium, major HPC infrastructure may have to be distributed across several institutions, to strengthen the integration of those institutions into the global collaboration and to benefit from the human and physical infrastructure already in place. Ideally, given the national (CANARIE) and regional high-speed networks, the location of the resources should not matter. In practice it does. We view this as a strength of the proposal, since it enforces a distributed solution, giving it robustness, limiting single points of failure, maximizing provincial and institutional leverage, and bringing support staff closer to more users.

4. We have carried out an extensive consultation of the Canadian academic research community. The four consortia expecting major funding have conducted extensive surveys of their HPC communities. For example, more than 200 PIs responded to WestGrid’s internal web poll; RQCHP has established a list of about 150 PIs following its survey; CLUMEQ has produced a 108-page assessment of its user base and needs; SciNet has produced a detailed document outlining the research excellence that HPC infrastructure is essential in supporting. We have attempted to be as thorough as possible in reaching out to the entire Canadian research community. We strongly feel that the infrastructure proposed in this document has sufficient variety and flexibility to be beneficial to virtually all Canadian researchers.

5. This proposal should be viewed as a major evolutionary step in creating a truly national shared HPC infrastructure. The research community will continue to work to refine and improve the vision.

4.5 The Proposed Infrastructure

Use of funds in the first NPF round. The LRP called for public investments of 76 M$/year in HPC infrastructure and personnel, including a high-end facility near the top of the TOP500 list. The current level of NPF funding, and its timing, do not allow for such a high-end facility, nor for funding a national organization as proposed in the LRP. The NPF funding is more commensurate with the consortia-level funding proposed in the LRP. Moreover, the transition from consortium-based funding to the NPF model cannot be abrupt: the structures in place (the consortia) already operate with a high degree of efficiency and they must be maintained at least for a few years. In addition, Canadian provinces provide matching funds for CFI’s contribution, and therefore a regional, if not provincial, distribution of the infrastructure is still a necessity.

Thus, the NPF will fund infrastructure still geographically located at, and managed by, the current consortia, but based on an agreement on architectures and distributions that facilitates full and effective sharing of resources between consortia, as described above. Three of the seven consortia – SHARCNET, HPCVL and ACEnet – were awarded major funding in 2004 and are currently in a deployment phase (years 2006 to 2007). They agreed that the bulk of the funding of this first NPF round will be channelled through the other four (WestGrid, SciNet, RQCHP and CLUMEQ), at levels that roughly reflect both the size of the institutions involved and the depreciation of their current infrastructure. In particular, RQCHP agreed to a smaller allocation in this application than its historical funding would dictate, given that it is in the process of acquiring the balance of its infrastructure. An analysis of the depreciated installed equipment at all consortia (not shown here) has indicated that the distribution of funds the consortia agreed upon (Table 4.1) is equitable in terms of consortium size, historical funding and provincial population. Future NPF rounds will strive to maintain balance across consortia or regions.

WestGrid   20.0 M$        SHARCNET   1.2 M$
SciNet     15.0 M$        HPCVL      0.8 M$
CLUMEQ     15.0 M$        ACEnet     0.5 M$
RQCHP       7.5 M$        Total     60.0 M$

Table 4.1. Proposed CFI funding of the regional consortia in the current NPF round.

This being said, the current proposal is not about dividing spoils between regional consortia. While the choice of architectures located at each consortium reflects to an extent its specific needs, the ensemble has been carefully debated and selected to satisfy the needs of all. This, of course, includes large projects like ATLAS Tier-2 computing (see Case Study 7.1, p. 37), whose national and local needs have also been covered.


Categories of Architecture. In recent months, the needs of researchers located at the four major consortia being funded by the NPF have been carefully surveyed. Extensive discussions took place on how to satisfy the generic and specific needs expressed in the surveys. It was realized that these needs could be accommodated by roughly four types of architecture:

1. Clusters with standard interconnects, also called capacity clusters in this proposal. These are suitable for codes that fit entirely within the physical memory of a single node, and are tailored for treating a large number of otherwise identical problems, as in parametric studies. They are, in fact, the most cost-effective way of solving such problems.

2. Clusters with low-latency/high-bandwidth interconnects, suitable for distributed-memory applications written, for instance, with the Message Passing Interface (MPI) library. Problems requiring very large memory can only be tackled with such machines, hence the name capability clusters used in this proposal.

3. Shared-memory machines, also called SMP (symmetric multi-processor) computers. The interconnect between their processors outperforms that of capability clusters in a way that allows for a shared-memory programming model, e.g., using the OpenMP library or explicit threads. Such machines are suited for strongly coupled problems that scale poorly on capability clusters. (A brief sketch contrasting the distributed-memory and shared-memory models follows this list.)

4. Vector computers. These are frequently also capability clusters, but their processors can perform a large number of pipelined floating-point operations, typically operating on a whole array of numbers in the same time a standard processor takes for a single number. They are traditionally used in climate and weather modelling.
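As a concrete (and purely illustrative) contrast between categories 2 and 3, the hybrid C sketch below combines the two programming models mentioned above: MPI passes messages between distributed-memory nodes, while OpenMP threads share the memory within a node. It is a minimal example, not tied to any specific system proposed here.

    #include <mpi.h>
    #include <omp.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, nprocs;
        MPI_Init(&argc, &argv);                 /* one MPI process per node (distributed memory) */
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        double local = 0.0;
        #pragma omp parallel reduction(+:local) /* threads share the node's memory */
        local += 1.0;                           /* each thread contributes 1 */

        double total = 0.0;
        /* explicit message passing combines the per-node results */
        MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

        if (rank == 0)
            printf("threads counted across %d processes: %.0f\n", nprocs, total);
        MPI_Finalize();
        return 0;
    }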

These categories are not exclusive. For instance, capability clusters can be composed of so-called thin nodes (e.g. 4 cores or fewer per node) or of so-called fat nodes (e.g. 16 cores, with up to 64 GB of RAM). Fat-node clusters can thus also be considered clusters of mid-sized SMP machines. In addition, the apparent slowing of the growth in CPU speed favours the introduction of coprocessor technologies, permitting the introduction of a vector component into each node of a capability cluster. Finally, the decreasing cost of high-performance interconnects and the advent of multi-core processors tend to attenuate the difference in cost between capacity and capability solutions, at fixed memory per core.

The proposed platform will be formulated in terms of this generic infrastructure without reference to specific vendors (no quotes are enclosed with this proposal). Each of the major HPC infrastructures proposed here can be satisfied by several vendors. If this grant application is funded, the vendor decisions will be determined by an RFP process.

In the same spirit, storage needs are formulated in terms of a generic SATA-disk-based solution, with optional tape backups. A unified and generic cost template has been used for the global budget, irrespective of the choice of final vendor. The storage capacity necessitates an initial investment in hardware, following which the capacity can be increased periodically, depending on actual demand.

HPC equipment is very much coupled to the room in which it is installed. These rooms contain components specific to HPC: raised floors, large cooling and electrical capacities, uninterruptible power supplies (UPS), security features, etc. These form an integral and necessary part of the HPC infrastructure and tend to increase the part of the budget devoted to construction/renovation beyond the cost of a general-purpose space. The lifetime of these components is also longer than that of the computing equipment, and they therefore constitute a long-term investment in HPC.

Distribution of the proposed platform. Table 4.2 summarizes the architecture and geographic distribution of the platforms chosen by the four consortia that would receive major funding. It shows a good balance between capacity and capability clusters, and a moderate request for vector and SMP machines. Table 4.3 summarizes the characteristics and relevance of the different architectures.


Architecture                      WestGrid   SciNet   RQCHP   CLUMEQ    Total
Capacity clusters                    8,400    6,200     256        0   14,856
Thin-node capability clusters        2,800    3,200   3,700    3,900   13,600
Coprocessor clusters                     0        0   1,000        0    1,000
Fat-node clusters                        0        0       0    2,800    2,800
Large SMP                              700        0     128        0      828
Vector cores                             0      160       0        0      160
Disk storage (TB)                    2,750    1,200     216    1,200    5,366

Table 4.2. Platform infrastructure (approximate number of cores) vs. architecture proposed at the four consortia receiving major funding.

Remarks:

1. A more detailed description of each facility is provided in the budget justification section. In particular, memory configurations are not indicated here and some capacity clusters will be heterogeneous.

2. The Montreal site of CLUMEQ is planning a large fat-node capability cluster, with about 175 nodes, each with 16 cores and 64 GB of RAM. Even though this machine is in the capability category, it can also be considered an SMP cluster and can satisfy moderate SMP needs (memory < 64 GB).

3. The WestGrid capability cores will be distributed across three sites, with 800, 900 and 1,100 cores respectively. The other capability machines (fat nodes excluded) will have between 3,000 and 4,000 cores and will be able to host very large-scale calculations if needed. Thus, fractionating the capability clusters across six sites will not be an impediment to large-scale computing.

4. Fractionating the capacity clusters is not a problem either since they will be grid enabled.

5. SciNet will install a vector computer optimized for, but not dedicated to, climate and weather studies.

6. The Montreal site of RQCHP is planning a capability cluster with special coprocessors, or equivalent technology, to be installed in year 3. This will be optimized for codes based on standard linear algebra libraries, but could also add a vector component to an otherwise common capability cluster.

7. Users in need of capacity computing coupled with large data storage, in particular the ATLAS collaboration, will find such facilities at WestGrid, SciNet and CLUMEQ.

The proposed configurations are based on today's technologies and prices, and are obviously subject to change, especially for those configurations planned for years 2 and 3. The precise configuration for each architecture can only be determined near the time of purchase, through a formal Request for Proposal (RFP) process designed to provide users with the most advantageous solution, in particular through the usual mechanisms of competition between vendors. The committee has used today's educational prices, to which a reasonable additional discount has been applied (based on past experience with consortia RFPs). The same prices have been applied across Canada. This led to the estimate of the scope of the infrastructure given in Table 4.2. [ 2d ]


Architecture: Capacity cluster
Characteristics: Standard interconnect. Calculation must fit in each node's memory (between 2 and 16 GB). Used typically for naturally parallel problems. Most economic solution per peak Gflop.
Fields of application (partial list): Data analysis and simulation in astrophysics and particle physics; genomics and proteomics; quantum chemistry, quantum materials; nanotechnology; data mining, artificial intelligence; grid research.

Architecture: Thin-node capability cluster
Characteristics: High-performance interconnect (low latency, high bandwidth). Problems must be coded within a distributed memory model (MPI); they are parallelized because of very large memory requirements and/or to gain speed of execution.
Fields of application (partial list): Large-scale simulations of particles in astrophysics, molecular dynamics; coherent control (photonics) and other large-memory quantum problems; quantum materials; computational fluid dynamics; other finite-element problems in engineering; economics.

Architecture: Coprocessor cluster
Characteristics: Nodes with coprocessor technology (e.g. Clearspeed, Cell, etc.). Advantageous for codes that use standard libraries (e.g. linear algebra routines) that are already optimized for that architecture. Low power consumption per peak Gflop.
Fields of application (partial list): Quantum chemistry and ab initio calculations; any code relying heavily on standard linear algebra routines.

Architecture: Fat-node capability cluster
Characteristics: E.g., nodes with 16 cores and 64 GB each, with high-performance interconnect between nodes. For problems that require large memory (< 64 GB) but are not parallelized in a distributed memory model: can be used as an "SMP farm". Also for distributed-memory problems that tolerate a lower ratio of inter-node bandwidth to node speed.
Fields of application (partial list): Computational fluid dynamics; climate modelling; meteorology; imagery, e.g. brain research; quantum chemistry.

Architecture: Large SMP
Characteristics: Shared memory model. For applications that cannot be parallelized within a distributed memory model or require very large memory.
Fields of application (partial list): Artificial intelligence; quantum chemistry; medical (e.g. cardiac) research; network science, operational research.

Architecture: Parallel vector computer
Characteristics: For problems that scale poorly in a distributed memory model and for which speed of execution is the most important factor. Least economic solution per peak Gflop.
Fields of application (partial list): Climate modelling; meteorology; oceanography; hydrology; computational fluid dynamics.

Table 4.3. Proposed architectures, characteristics and relevant applications. [ 2b ]


4.6 Relation between infrastructure and needs [ 2b ]

As mentioned previously, there has been an extensive survey within each consortium of the needs of the current major users, as well as of emerging users of HPC. There is no doubt that the task of selecting appropriate platforms for solving problems as varied as engineering, chemistry, physics, medicine and finance is facilitated by the growing flexibility of cluster platforms. One has only to observe the trend at supercomputing conferences to notice that all these disciplines, so exclusive in their requirements only a few years ago, today share a common platform through clusters. [ 2c ]

In Section 3, a wide sampling of HPC-based research programs was described, and the associated HPC needs were briefly mentioned with reference to the generic architectures above. This is complemented, later in this proposal, by a number of case studies that describe in more detail – albeit not comprehensively – some areas of research that have achieved international prominence and where the lead scientists are heavy users of HPC. We have attempted to illustrate their current use of HPC and how they would benefit from enhanced infrastructure. Table 4.3 summarizes the characteristics of the different architectures and their relevance to the research described in Section 3 and the case studies.

4.7 Networking

The proposed infrastructure described above (and in more detail in the budget section) consists of a distributed resource of individual HPC platforms linked through a high-bandwidth national backbone developed by CANARIE, a network referred to as CA*net4. This network interconnects the individual regional networks (ORANs), and through them the universities, research centres and government laboratories, both with each other and with international peer networks. Through a series of five optical wavelength connections, provisioned at OC-192 (10 Gbps), CA*net4 yields a total network capacity of 50 Gbps. The most recently commissioned version of this network was made possible by an investment of 110 M$ from the federal government.

Going forward, Compute Canada will work with CANARIE and the ORANs to plan and implement a truly national HPC backbone network, building on the work already done with selected consortia. It is anticipated that this network will include multiple dedicated HPC light-paths with a combined bandwidth of at least 10 Gigabits per second. [ 3f ]

4.8 Compute Canada mechanisms to enhance collaboration [ 3f ]

Because the National HPC Platform is to be a purposefully shared resource, it will prove to be an extremely effective structure through which to enhance research collaborations nation-wide. Examples of such existing or envisioned collaborations are described below. These collaborations are expected to form as a consequence of shared research goals focused upon the use of specific HPC platforms.

Examples of HPC-based Research Collaborations. An extremely instructive example of the way in which HPC drives research collaborations is provided by the ATLAS project (see Case Study 7.1, p. 37). [ 2a,b,3a–c,f ] The project will have a lifetime of more than a decade, by which time ATLAS-Canada will comprise about 50% of the Canadian experimental particle physics community, i.e. roughly 40 PIs. [ 3d ] The HPC needs of the ATLAS community are well matched to the proposed shared resource environment of Compute Canada. The ATLAS application is best served by capacity clusters with a standard interconnect, as the event analyses that must be performed constitute a highly parallel application, but one that also requires a significant amount of storage (several petabytes in total).

A second example of fostering research collaborations is the national effort in the development and application of global climate models to predict changes that are occurring as a consequence of increasing greenhouse gas concentrations in the atmosphere (see Case Study 7.6, p. 44). [ 2a,b,3a,c,f ] Research in the area of climate predictability involves the solution of a complex initial value problem for a model of the entire "Earth System" of interacting ocean, atmosphere, sea ice and land surface processes. Each of these components of the system is described by a coupled set of non-linear partial differential equations. Their solution constitutes a tightly coupled problem which requires a high-bandwidth, low-latency interconnect between a relatively modest number of extremely powerful processors, or processing nodes, each of which has significant shared memory. The Canadian community has expressed a strong interest in having access to a significant system in this class, and our proposed platform will meet this need by installing a single system of this kind in the SciNet consortium.

Access Grid: Infrastructure to support collaborative interaction. This proposal includes the outfitting of approximately 50 Access Grid rooms across Canada. These facilities, which have become increasingly common in recent years, are intended to facilitate group-to-group interactions. Access Grid nodes typically involve 3-20 people per site and are "designed spaces" that support the high-end audio/video technology needed to provide a compelling and productive user experience. An Access Grid node consists of large-format multimedia display, presentation, and interaction software environments; interfaces to grid middleware; and interfaces to remote visualization environments. With these resources, Access Grid supports large-scale distributed meetings, collaborative teamwork sessions, seminars, lectures, tutorials and training. It is perhaps in the training area that our proposed network of Access Grid facilities will prove most effective. These facilities will be employed extensively by the technical support groups at each of the major installations as an efficient means of delivering the training tutorials required for the efficient use of the national HPC infrastructure. [ 3e,3f ]

5 Operations and the Human Infrastructure

In this section we describe the operational requirements and the investments in highly-qualified personnel (HQP) that will enable Compute Canada to effectively implement the proposed strategy. The CFI Infrastructure Operating Funds (IOF) are automatic with an award and are specifically geared to cover operational costs. This proposal to CFI is also intended to fully justify the need for additional funding from the Canadian research agencies, specifically NSERC, SSHRC and CIHR.

5.1 The Need for Highly Qualified Personnel

Although it is the researchers who assume the high-profile, visible role in the research process, in reality HPC-related activities require a large support team working behind the scenes to make this possible.5 An effective HPC facility is much more than hardware. Highly trained individuals are needed to manage, operate and maintain the facility to ensure it runs smoothly and efficiently. They also play a key role in the selection and installation of equipment. For instance, the RQCHP clusters (installed in 2004 and 2005) provide a cost-effective solution to HPC, in large part because of the experience and diligence of the local team of analysts. Further, to maximize the use of expensive HPC facilities, it is equally important to have highly trained technical support staff who train and assist researchers in making the best use of HPC resources.

The investment in people is a critical component of this proposal. In many respects, computing infrastructure can be more readily acquired than human infrastructure. Given adequate funding, upgrading the capital equipment is straightforward; one can simply buy whatever is needed. However, obtaining human infrastructure is much more challenging. It can take years to train people with the necessary skill sets. Many such highly skilled people are currently working at various consortia across the country. These people constitute a precious human resource that must be retained, but that can be easily lost, e.g., by being enticed away from Canada by the lure of better opportunities coupled with higher salaries. In addition, over the years, many talented analysts working at consortia have left for high-paying jobs in Canadian industry, sometimes founding their own companies. While this is beneficial to the Canadian economy, it adds to the challenges faced by the consortia. If Canada is to invest in people, then it must also invest in creating the environment to attract and retain them.

5 Note that there is no reference to funding graduate students and PDFs; they are expected to be supported through individual researchers' grants.

5.2 Expense Categories

Effectively operating and using an HPC facility is an expensive proposition. The costs fall into roughly four categories:

1. Operating infrastructure costs. This includes space, power, and day-to-day expenses (e.g. backup tapes).

2. Systems personnel. They are primarily concerned with the day-to-day care of the hardware and software infrastructure, i.e., the proper functioning of the HPC facilities, system management, operator support, etc. This involves a wide range of disparate activities, all of which are crucial to ensuring that the system is fully functional and available to the community. These include installing and maintaining the associated operating system(s) and updates/patches, the management of file systems and backups, minor equipment repairs, and ensuring the integrity and security of the user data.

3. Application analysts. Their role is to provide specialized technical assistance to researchers, conduct workshops and training, and both evaluate and implement software tools to make effective use of the available resources. This work can have a major impact on the scientific productivity of the community. Typical HPC applications operate at a sustained rate well below the theoretical peak performance; the applications analyst's role is to improve this. An application analyst might, for example, double the performance through code optimization, algorithm re-design, enhancing cache utilization and/or improving data locality. This would correspond to twice the delivered science for the same cost. The added value from such activities can be huge.

4. Management personnel. A management structure needs to be put into place, both at the consortium and national levels, to take care of, for example, human resources, financial issues, public relations, secretarial tasks, and coordination of activities. As well, we include in this category the costs associated with hosting an annual international scientific advisory board meeting.

Systems personnel are usually located at each major installation site. Each major installation needs a core set of people to address the day-to-day operations of the facility(ies). Application analysts do not need to be in close proximity to the hardware. Ideally, they should be close to the users, since their work involves collaboration with them. As this is not always possible, it is expected that much of their interaction with users will proceed through email and/or tele-/video-conferencing (as currently happens). Management personnel can be in support of a site, a consortium, or the national initiative.

5.3 Funding Sources

The Canadian system makes it awkward to fund the above; no single funding source covers all the needs. The consequence is that all consortia have to apply to multiple agencies to assist in providing operating funds. These sources include:

• CFI Infrastructure Operating Fund (IOF). CFI provides funds for the operation of the proposed facilities (30% of the CFI dollars). These funds can be spent on operating infrastructure costs and systems personnel.


• NSERC MFA grant. The NSERC Major Facilities Access (MFA) fund has been used to fund application analysts to support research application development (the so-called TASP program). This reflects the science- and engineering-dominated HPC usage to date. However, medical/biotech-related research is emerging as an important user of HPC resources in Canada (as well as the arts and humanities, though currently to a smaller extent), reflecting the need for coordinated granting council funding.

• Provincial funding agencies. In the past, several provinces offered programs that could be used for HPC operations support. Some of these programs no longer exist.

• Institutions. All institutions hosting HPC facilities make a contribution to the cost of the facilities. The most common form of support is one or more of space, power, technical support personnel, management personnel, supplies, and cash.

The CFI IOF initiative has helped address the operational side and the NSERC MFA has helped on the research applications side, but to date neither has reflected the true cost of operating these facilities. As well, there are no funding sources for management expenses, other than the institutions.

5.4 People Resources in Perspective

In comparison to the major HPC centres in Europe or the United States (or even Environment Canada, the national weather service), the current Canadian level of HPC support is low. Table 5.1 shows the user-support commitment at several major international and Canadian HPC sites (taken from the LRP using June 2004 data). There are significant differences in the total number of support personnel. The ratio of resources (# of CPUs) to the number of support people ranges from 38 for Environment Canada (a production facility), to 80-160 for international research facilities, to 334 for a typical Canadian HPC site. The table does not show the number of management personnel (it was difficult to identify all of them). At least for the three international sites, these numbers are significant, further widening the disparity in support personnel abroad compared with CFI-funded infrastructure.

Category                                      NERSC (USA)   PSC (USA)   HPCx (UK)   Env. Can.   WestGrid
Rank in Top 500                                        14          25          18          54         38
# of CPUs                                           6,656       3,016       1,600         960      1,008
Support personnel availability (hours/days)          24/7        24/7        24/7        24/7       10/5
Operations personnel                                    9           6           4          12          2
Systems support personnel                              11          10           2           7          1
Networking & security personnel                         6           3           1           5          0
User support personnel                                 15          11          13           1          0
Total                                                  41          30          20          25          3

Table 5.1. Operations and user support personnel at selected HPC facilities (Source: LRP).


5.5 Operating Expense Plan

We now look in detail at each of the expense categories, projecting what their needs will be for the next five years.

Operating Infrastructure Costs. The major item here is power. At this point, it is difficult to evaluate the cost of power over the next five years. However, a detailed analysis based on the cost of power in each province, current power consumption per CPU, and the projected number of CPUs concluded that a conservative estimate of this cost is 2.1 M$ per year for the infrastructure requested in this proposal. For the five years of operations that the IOF provides, this amounts to 10.5 M$ of the 18 M$ available (30% of the 60 M$ requested). This also assumes that the cost of power does not increase. In the past, most institutions hosting CFI-funded HPC installations paid the power bill. With the rising price of power and the demands of increasingly large HPC installations, many became aware that these costs were substantial, and some now require the consortia to pay their own power bills. This single expense can dramatically affect the ability of a site to fund sufficient people to professionally operate the installed systems.
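To make the arithmetic behind these figures explicit (using only the numbers quoted above):

\[
5 \times 2.1~\mathrm{M\$} = 10.5~\mathrm{M\$},
\qquad
0.30 \times 60~\mathrm{M\$} = 18~\mathrm{M\$},
\]

so power alone would consume well over half of the IOF envelope before any personnel are hired.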

Systems personnel. These people are essential for the smooth day-to-day operations of these expensive systems. Until the 2001 competition and the introduction of the IOF fund, CFI-funded facilities were run with a shoestring staff. IOF funds (net of the power costs) are used to hire the support staff needed to ensure the professional operation of the infrastructure. This proposal has the majority of the requested infrastructure going to 12 sites. If one assumes that a systems person earns 70 K$ plus 22% benefits per year, then putting just one person per site costs 1 M$ per year, or 5 M$ over the 5-year CFI time-frame. Given the size and scope of the infrastructure at the sites, multiple systems personnel are needed (as many as four at some sites). In addition, each consortium will need personnel to provide technical leadership and coordination across all member sites. This includes a chief technology officer, director of operations, Access Grid coordinator, etc. Coupled with the power costs above, there are few funds left for operating the facilities. This situation can only be rectified by having the institutions make contributions to the operating costs.
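The per-site figure quoted above follows directly from the stated salary assumptions:

\[
12~\text{sites} \times 70~\mathrm{K\$} \times 1.22 \approx 1.02~\mathrm{M\$/year}
\approx 5.1~\mathrm{M\$}~\text{over five years},
\]

and this is for a single systems person per site, i.e. before the multiple staff per site and the consortium-level coordination roles noted above are counted.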

Applications analysts. User support by application analysts makes the difference between a merely operational HPC facility and an effective one. Some consortia are well supported (by their institutions or through provincial funding opportunities) and employ a few applications analysts. The majority of these people across the country are, however, funded through NSERC MFA grants. The roughly 1 M$ available annually supports a distributed team of applications analysts, but is quite inadequate in view of the current and projected needs, in particular given the new challenges arising from the national character of the facilities. A substantial increase of funding is required in this category, and is justified in more detail in Section 5.6 below. This reflects the need for greater attention to application development and performance enhancements, the larger user community, and the national nature of the proposal. A detailed operating budget (not included in this proposal for lack of space) has been prepared that allocates the proposed funds to best meet the needs of our large and distributed user community.

Management personnel. Unfortunately, there are currently no opportunities to apply for funding to cover the costs of managing the multi-million-dollar HPC facilities installed in Canada. CFI IOF cannot be used for any management expenses (including administrative, secretarial, financial, and public relations). NSERC MFA funds must be spent in support of research programs only. All participating institutions must contribute cash to ensure proper local and national management. All the partners have committed the resources necessary to ensure this project is properly managed.

The contribution of granting councils. This proposal is requesting 32.4 M$ over five years from the granting councils. The majority of the funds will be used to support applications analysts, distributed across the country at all major sites. Unlike the capital and IOF funding requests, which are targeted towards the needs of four consortia, the TASP proposal seeks funding for all seven consortia, representing the entire HPC community (as past NSERC MFA grants have done). The funds would be used as follows:

• ∼ 65 applications analysts, with an approximate distribution (reflective of geographical distribution) of 12.5% located in Atlantic Canada, 25% in Quebec, 37.5% in Ontario, and 25% in Western Canada. The average salary would initially be 70 K$/year with 22% benefits (both numbers represent current averages across the country). A 4.5% per year increase in salaries is part of the planning (an illustrative first-year cost estimate follows this list).

• An additional cost of 5 K$/year per applications analyst, to cover a personal computer, travel to HPCS (the annual Canadian HPC conference), and one other conference (such as SuperComputing, or the Gaussian users group).

• One communications/webmaster position per consortium. This person's responsibility would be to document user support, facilitate comprehensive access to the resources, document the successes of the HPC research, and ensure those successes are properly publicized to our communities.

• Provision for an annual meeting of the international scientific advisory board (estimated at 30 K$: four international members, one representative from each of the consortia, local arrangements, etc.).
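As an illustrative first-year figure, based only on the per-analyst numbers above (and before the 4.5% annual salary escalation, the consortium webmasters and the advisory-board meeting are added):

\[
65 \times \left( 70~\mathrm{K\$} \times 1.22 + 5~\mathrm{K\$} \right)
\approx 65 \times 90.4~\mathrm{K\$}
\approx 5.9~\mathrm{M\$}~\text{in the first year}.
\]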

5.6 The TASP Program

High-performance computing cannot flourish without appropriately trained support personnel. This support is currently provided by the Technical Analyst Support Program (TASP) model, a unique but unfortunately underfunded initiative. Many international installations have budgets to hire sufficient people to serve their strategic user community. The Canadian solution is to build a national support network, TASP, that serves a large user community with diverse needs. The TASP program is currently funded by an NSERC MFA grant with a value of $1,038,000 per year.6

The program is being used to hire 22 programmer/analysts (many of these analysts have half their salary paid by their host institution). Current TASP analysts form a highly trained, highly experienced team of computer scientists, mathematicians, computational scientists and system analysts who execute a wide range of activities:
• Assistance in porting code from smaller systems to HPC resources;
• Assistance in parallelization and optimization of codes;
• Assistance with storage and data retrieval issues;
• Consultation on numerical/computational methods;
• Provision of widely disseminated training courses on parallelization and optimization;
• Technology watch on HPC methods and libraries;
• Assistance in advanced scientific visualization;
• Distributed system administration; and,
• Provision of training courses on the use of the AccessGrid.

TASP analysts work as a cohesive national team. They meet regularly via teleconferences, as well as face-to-face annually at the Canadian HPCS conference. Although a particular provider site may not have a consultant in a particular area, there are both formal and informal mechanisms within TASP to access the needed support. This has proven to be an extremely efficient way to deliver these support services, to cross-fertilize and to avoid duplication. It has led, for example, to a biweekly Trans-Canada Computational Science Seminar being held over the AccessGrid between WestGrid and ACEnet universities, with expansion to other sites planned soon. In the past two years, TASP has partially or fully funded 66 workshops across the country, with a total of over 2,000 person-days of attendance. Topics regularly covered in workshops and online presentations include introductory, intermediate and advanced MPI programming; programming for shared-memory architectures; performance-tuning tools; AccessGrid use; advanced scientific visualization; and grid middleware. As a result, the TASP team and the consortia have built a large resource base of presentation materials for both introductory and advanced tutorials on HPC, parallel programming, and various computational methods.

6 Note that the NSERC Major Facilities Access (MFA) program is being replaced by the new Major Resources Support (MRS) program. At the time of this writing, we do not know much about the MRS, and are assuming it will be similar in scope and funding model to the MFA.

In summary, the overall objective of the TASP analysts is to ensure that Canadian scientists and their graduate students receive the computational support that they need to carry out their research.

However, a large number of research groups need extensive help to get their computational research up to speed. For instance, the parallelization of a code cannot always be accomplished by a member of the group; that task, and/or the optimization of the algorithm, or the introduction of high-performance libraries within a code, may require the investment of several weeks of an analyst's time. Such investments sometimes lead to radical changes, not only in the performance of the code but also in the scope of problems that can be solved, thus opening new research opportunities. Most researchers do not have the background, or the funds to hire someone with the expertise, to accomplish these tasks. A single TASP analyst attached to a consortium can therefore enhance several research applications to make effective use of HPC resources, but the needs are enormous. The current level of funding for this program is insufficient, leaving the analyst needs of many research groups unaddressed. The consequence is that many groups still run inefficient codes, or codes that lead them to lower their expectations and that are not competitive in the international context. Worse, many groups steer clear of HPC and its benefits for lack of adequate support. Expansion of the TASP program is required to meet the needs of the current and expanding community.

The current funding level for applications analysts is small given the size of our research community. For example, the facilities mentioned in Table 5.1 (NERSC, PSC, HPCx, Environment Canada) have an application analyst for every 10 to 42 users. In contrast, the two largest consortia, WestGrid and SHARCNET, have roughly 125 to 225 users per analyst. A consortium like RQCHP employs a more adequate number of analysts, but just maintaining that number (despite the recent expansion of its membership) requires precisely the level of funding asked for in this proposal, given the increase in other expenses (e.g. power), the disappearance of the provincial program contributing to operations, and the need for a fair distribution of analysts across the country. Increasing the number of analysts to 65 will bring the ratio of analysts to users to roughly 1:50 (averaged across Canada), more in line with the ratios at the international sites.

The additional TASP analysts (at most consortia) would be proactive in a number of initiatives, including: (1) giving greater attention to the application needs of more users; (2) supporting and encouraging use of the grid infrastructure; (3) developing portals to simplify job submission; (4) adopting/developing meta-scheduling tools; and (5) adopting/developing tools for monitoring, accounting, reporting and analyzing HPC resource usage.

6 Conclusions

This proposal represents a groundbreaking approach to HPC infrastructure in Canada. Getting seven consortia, over 50 institutions, and the research community to agree on a united HPC vision is a major accomplishment and a reflection of the essential role that HPC plays in research today. The Compute Canada initiative is a comprehensive proposal to build a shared distributed HPC infrastructure across Canada. The national consensus on governance, resource planning, and resource sharing models is a major outcome of developing this proposal. C3.ca has begun the process of becoming Compute Canada. With CFI's initiative and the committed support of the provinces, industry, and the partner institutions, Compute Canada will meet the needs of the research community and enable leading-edge, world-competitive research.


7 Case Studies

Title                                    Authors

7.1  Particle Physics                    R.S. Orr (U. of Toronto, SciNet); M. Vetterli (SFU/TRIUMF, WestGrid); Brigitte Vachon (McGill U., CLUMEQ)
7.2  Astrophysics                        R. Bond (U. of Toronto, SciNet); H. Couchman (McMaster U., SHARCNET)
7.3  Quantum Chemistry                   M. Cote (U. de Montreal, RQCHP); M. Ernzerhof (U. de Montreal, RQCHP); R. Boyd (Dalhousie, ACEnet)
7.4  Nanoscience and Nanotechnology      A. Kovalenko (NINT, U. of Alberta, WestGrid); G. DiLabio (NINT, WestGrid)
7.5  Quantum Materials                   D. Senechal (U. de Sherbrooke, RQCHP); E. Sorensen (McMaster U., SHARCNET)
7.6  Global and regional climate change  W.R. Peltier (U. of Toronto, SciNet); A.B.G. Bush (U. of Alberta, WestGrid); J.P.R. Laprise (UQAM, CLUMEQ)
7.7  Hydrology                           E.A. Sudicky (U. of Waterloo, SHARCNET); L. Smith (UBC, WestGrid); Rene Therrien (U. Laval, CLUMEQ)
7.8  Aerospace                           W.G. Habashi (McGill U., CLUMEQ); D.W. Zingg (U. of Toronto, SciNet)
7.9  Computational Biology               C.M. Yip (U. of Toronto, SciNet); P. Tieleman (U. of Calgary, WestGrid); Hue Sun Chan (U. of Toronto, SciNet); Th.J. Hudson (McGill U., CLUMEQ); J. Corbeil (U. Laval, CLUMEQ)
7.10 Brain Imaging                       A. Evans (McGill U., CLUMEQ); M. Henkelman (U. of Toronto, SciNet)
7.11 Large-Scale Text Analysis           G. Rockwell (McMaster U., SHARCNET); I. Lancashire (U. of Toronto, SciNet); R. Siemens (U. Victoria, WestGrid)
7.12 Collaborative Visualization         Brian Corrie (Simon Fraser U., WestGrid); Pierre Boulanger (U. of Alberta, WestGrid); Denis Laurendeau (U. Laval, CLUMEQ)


7.1 Particle Physics

The Challenge. Experimental high-energy particle physics is about to enter an extremely exciting period in the study of the fundamental constituents of matter. Since the discovery of the carriers of the weak force (W and Z bosons) at the CERN laboratory in Geneva, and the completion of the precision study of parameters, many physicists are now convinced of the validity of the "Standard Model". The Standard Model provides an accurate picture of the lowest-level constituents of matter and their interactions in the accessible energy regime. For many reasons, the Standard Model is believed to be a low-energy approximation to a unified, and possibly supersymmetric, theory that spans all energy domains. If experimentally confirmed, such a supersymmetric theory could simultaneously solve the problems of the basic structure of matter and the riddle of the composition of dark matter in the universe. The Standard Model requires the existence of at least one new particle, the Higgs boson. Its confrontation with experimental data, combined with its known theoretical shortcomings, is fueling expectations of new and exciting physics close to the TeV scale.7 This was the central motivation for the construction of the Large Hadron Collider (LHC), which will produce the highest-energy proton-proton collisions ever achieved under laboratory conditions. How to extend the Standard Model to overcome its known deficiencies is one of the most exciting experimental challenges in modern science.

The ATLAS collaboration has almost completed the construction of a general-purpose detector designed to record the results of high-energy proton-proton collisions and to fully exploit the discovery potential of the LHC. This detector is designed to meet the diverse and exacting requirements of the LHC physics programme while operating in a very high-luminosity environment (luminosity is a measure of the interaction rate). This high-performance system must be capable of reconstructing the properties of electrons, muons, photons and jets of particles emerging from the collisions, as well as determining the missing energy in the event. Its radiation resistance must allow operation for more than ten years of data taking at high luminosity.

ATLAS-Canada. The ATLAS-Canada collaboration comprises 33 grant-eligible scientists who all take an active part in the ongoing projects. [ 2a,3f ] Including engineers, research associates, technicians and students, ATLAS-Canada is a group of 88 highly trained people. The ATLAS detector project has an NSERC capital expenditure of 15.5 M$ for the completed detector construction, and an integrated operating expenditure of 15.9 M$ to date. In addition, Canada has contributed 37.3 M$ to the construction of the LHC. ATLAS was identified as the highest-priority project by the Canadian particle physics community in the last two long-range plans. The ATLAS-Canada collaboration includes an excellent mix of leaders in Canadian high-energy physics, with a proven track record, and some of the best young scientists in the country, including several Canada Research Chair recipients.

HPC Requirements. The ATLAS experiment will produce 2-3 petabytes of data per year, with significant additional storage required for secondary data sets. Analysis of particle physics data is carried out in stages, starting with calibration and alignment, then event reconstruction and event filtering, and finally the extraction of physics results. Secondary data sets are produced at each stage of the analysis chain. Each successive data set is smaller than the previous one due to event selection and synthesis of the information from each collision. Eventually, the data sets reach a size where it becomes practical for smaller research groups to access the data easily. The staged nature of the analysis lends itself well to a tiered system of analysis centres. CERN is coordinating an international network of high-performance computing centres to provide the resources necessary for the analysis of data from the four LHC experiments. This network will use grid computing tools to manage the data and to make efficient use of the computing resources. Over 100 sites of varying sizes in 31 countries currently participate in the LHC Computing Grid (LCG). ATLAS will have 10 Tier-1 centres and 40-50 Tier-2 centres around the world. Canada will provide one Tier-1 and the equivalent of two Tier-2 centres. Technical details on the LCG can be found on the web at lcg.web.cern.ch/LCG/.

7 One electron-volt is the energy gained by a unit charge that is accelerated through a voltage difference of 1 Volt. 1 TeV is 10^12 eV; the mass of the proton is a little less than 1 GeV (10^9 eV).

The ATLAS-Canada computing and data analysis model is illustrated here through a set of typical use cases: [ 2b,3c ]

1. Raw data handling: The raw data from the experiment will be processed, when possible, on the Tier-0 system at CERN. Raw data, and the results of preliminary first- and second-pass analysis, will be sent to the Tier-1 centres. As better detector calibrations and reconstruction code become available, the Tier-1 centres will reprocess the raw data to produce higher-quality secondary data sets, which will be distributed to all ATLAS Tier-1 centres, and also to the Tier-2 centres.

2. Simulation data handling: One of the primary uses of the Tier-2 systems will be to produce the large amounts of simulated data needed for analyses. The simulation is strongly limited by processing capacity, and the Tier-2 centres will have large CPU requirements for this purpose. As the simulated data at the Canadian Tier-2 centres are produced, they will be copied to the Canadian Tier-1 centre.

3. Physics analysis: Individual Canadian physicists will typically use desktop systems for preparing and finalizing data analysis, but every analysis will require significant access to Tier-2, and in some cases Tier-1, capabilities. Most analyses will be based on the secondary data sets stored at the Tier-2 centres, once the analysis chain is stable.

The demands on the Canadian Tier-1 centre are exceptional. In addition to the processing and storage capabilities, the Tier-1 centre must continuously receive raw data from CERN, as well as reprocessed data from other Tier-1 centres. It must also distribute the results of its own data reprocessing to other Tier-1 centres and to the Canadian Tier-2 sites. The Tier-1 centre must therefore be dedicated to ATLAS. Funds for the Tier-1 are being provided through the CFI Exceptional Opportunities Fund. On the other hand, access to the Tier-2 facilities would be more erratic, with periods of low usage as physicists consider the results of their analyses. These facilities can then be used by researchers in other fields, and it is therefore appropriate that the Tier-2 centres be part of shared facilities in the HPC consortia. This model makes efficient overall use of the computing and storage resources in Canada. [ 2c ]

7.2 Astrophysics

The Challenge. Astrophysicists seek to answer fundamental questions about the universe that span the vast extremes of time, space, energy and density from the Big Bang 14 billion years ago to the present and into the future. They are working on questions such as: how do complex structures form, ranging from the nuclear and chemical elements to the vast interconnected cosmic web of galaxies? How do stars, solar systems, planets, and indeed life, develop?

Canadian research institutions, in collaboration with other renowned international centres, are making massive investments in observational hardware to solve these cosmic mysteries of origin and evolution. Extracting the appropriate implications from this massive data set, which is growing exponentially in both volume and quality, requires significant increases in computational statistical data-mining power. It will also require the development of large-scale numerical simulations of non-linear astrophysical processes. A sense of the wide range of Canadian astrophysical HPC needs can be gained by considering the following research topics:

1. Gasdynamical and gravitational simulations of nonlinear phenomena are critical in all areas of astrophysics. For example, 3D numerical plasma simulations are an essential tool for understanding the turbulent processes involved in the formation, growth and behaviour of black holes, the explosion of supernovae, and the origin and dynamics of magnetic fields in stars, the galaxy and the universe. Simulations that track both gas and dark matter are required to investigate the formation of large-scale structure in the Universe.

2. Cosmology has progressed from an order-of-magnitude science to a precision discipline. Increasingly ambitious cosmic microwave background (CMB) experiments, large-scale galaxy surveys, and intergalactic and interstellar medium observations allow the determination of basic cosmic parameters to better than 1% accuracy. They also address issues as far-reaching as the nature of the dark energy and dark matter that determine the dynamics and ultimate fate of the Universe. Analysis and interpretation of the new generation of space, ground-based and balloon-borne experiments, fueled by major technological advances, will require a hundred-fold increase in processing power.

Canadian Successes. The previous generation of Canadian HPC resources played a vital role in national and international collaborations and enabled key advances on some of the most fundamental challenges in astrophysics. [ 1a,1b ] SHARCNET researchers developed one of the world's premier numerical codes (Hydra) for simulating cosmological structure and collaborated on the "Millennium Simulation", the largest cosmological simulation ever performed (more than 14 billion particles). SciNet researchers and infrastructure have played a major role in analyzing recent CMB data; successes include the best measurements of the high-l polarization spectrum to date (2004 Science cover story), the first detection of peaks in the EE spectrum and the best intensity measurement of the third total-power peak. WestGrid researchers are leaders in the numerical study of galaxies and galaxy clusters, as well as the "Grand Challenge Problem" of computing black-hole collisions. Members of these consortia are at the leading edge in all aspects of simulations of the "cosmic web" that include gas, dark matter and dark energy. ACEnet astrophysicists are leaders in developing new algorithms and models to probe the interiors of stars, required to match the increasingly detailed observational data. These efforts will allow ACEnet to develop a common simulation code for magneto-hydrodynamic astrophysical applications. RQCHP researchers are making fundamental discoveries concerning the physical processes occurring within stars (such as diffusion), and are using large-scale simulations to develop a unified picture of the formation of compact objects (such as black holes) in close binaries. CLUMEQ researchers are using high-performance hydrodynamical simulations to study the formation and evolution of galaxies, the evolution and metal enrichment of the intergalactic medium, the origin of Pop III stars, and the formation of star clusters and its feedback effect on the interstellar medium.

Community Strength. Canadian scientists are internationally recognized for excellence in HPC-related research in astrophysics, and are involved in many observational projects. In the projects listed below, Canadian researchers require HPC resources to carry out large-scale data analysis and numerical simulations. Canada has made significant investments in the following: [ 2a ]

1. Canada-France-Hawaii Telescope Legacy Survey (CFHTLS), which is mapping the dark matter in the universe to unprecedented accuracy;

2. International Galactic Plane Survey (IGPS), the successor to the highly successful Canadian Galactic Plane Survey, which is providing an unprecedented view of our Galaxy and the complex interactions mediated by star formation and star death;

3. Sudbury Neutrino Observatory (SNO), which was instrumental in solving the solar neutrino problem and continues to probe the inner workings of the Sun, the nature of dark matter and basic neutrino properties;

4. Upcoming ground-based ACT, balloon-borne SPIDER and space-borne PLANCK missions, which will determine cosmic parameters to unprecedented accuracy and possibly detect the unique signature of primordial gravity waves;

5. Thirty-Meter Telescope (TMT), which will be the largest optical (and infrared) telescope ever built.


HPC Requirements. The projects described in this case study require Monte Carlo simulations on thousands of CPUs. [ 2b ] For example, the analysis of the Boomerang CMB data used 25% of all available cycles on a 528-CPU SciNet cluster for three years. Since computing needs increase dramatically with the number of detectors, ACT and SPIDER analysis will require roughly 300 times as much computing power.

Highly parallelized cosmological simulations that follow both gas and dark matter require a capability cluster with a low-latency interconnect, thousands of CPUs and many TB of RAM. [ 2a ] For example, a simulation with four billion particles and 40,000 time-steps would require 5 TB of RAM and 60 wall-clock days on a 5,000-CPU cluster. With such a system it will also be possible, for the first time, to perform dynamical simulations of galaxies that have the same resolution as in nature (with at least 10 billion particles) and to make the most realistic 3D turbulence and convective simulations ever attempted (with at least 5,000^3 cells).
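As a rough consistency check of the quoted figures (the actual memory footprint depends on the code and on the data stored per particle), they imply

\[
\frac{5~\mathrm{TB}}{4 \times 10^{9}~\text{particles}} \approx 1.25~\mathrm{kB}~\text{per particle},
\qquad
5{,}000^{3} \approx 1.25 \times 10^{11}~\text{cells}.
\]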

7.3 Quantum chemistry

The Challenge. The properties of atoms, molecules and even solids can in principle be calculated exactly from the laws of quantum mechanics. But a direct and complete quantum mechanical solution is only possible for systems involving a very small number of particles, even on the best HPC facilities. Fortunately, starting in the 1960s, approximate methods have been derived from Density Functional Theory (DFT), which revolutionized the field and allowed the treatment of systems containing thousands of atoms. These methods are known as ab initio, since they start from the microscopic components of the system under study: nuclei and electrons. The importance of these methods has been recognized by the 1998 Nobel Prize in Chemistry given to Walter Kohn (B.Sc. and M.A. from the University of Toronto), for initiating DFT, and John Pople (formerly at the National Research Council of Canada), who designed the GAUSSIAN code. Ab initio methods are reliable enough that molecules and materials can be studied in virtual experiments in an HPC environment, whenever needed, in solid state physics, chemistry, and increasingly in biology and pharmacology.

Community strength and needs. The research groups named on p. 11 in connection with this field are only a few among the many active in Canada. Many more groups using these methods have a need for large computation facilities. The methods are now implemented in well-written codes (Abinit, deMon, Gamess, Gaussian, Pwscf, Siesta, Vasp, Wien2k), each of which has a large community of users in Canada. Some of these codes have small user fees for academics while others are open-source codes, making them accessible to the whole community. Because these codes provide easy access to this technology, many groups in different fields are now using them, and their usage will continue to grow in the coming years. The impact of this technology is felt in pharmacology (drug design), in biology (processes of the living cell), in the study of catalysis and in nanotechnology. [ 1g,4b ] Many Canadian researchers are involved in international collaborations for the development of DFT codes with a world-wide distribution: the ADF code is developed in part at WestGrid, the deMon group was initiated at RQCHP/WestGrid, and the Abinit code has contributors from RQCHP. Because of the wide range of applications possible with these methods, implementation will depend on the problem being addressed. Hence, different strategies have been employed to make efficient use of the available computational resources. [ 2b ] Some codes (e.g. Gaussian, Gamess, Siesta) use a localized basis set to represent the electrons and do not scale well beyond a few nodes; they are used most efficiently on very fast processors with large memory. Other codes, based on a plane-wave expansion (e.g. Abinit, Pwscf, Vasp), are more suitable for periodic systems and scale well over hundreds of nodes on a capability cluster with a fast interconnect. For these reasons, both shared-memory and distributed-memory architectures are needed. However, all these codes have in common an extensive use of standard linear algebra routines that would benefit from new coprocessor technologies (Clearspeed or Cell) for which these routines could be optimized. More specific research successes are given in the examples below.


Example A. The development of new functionals in DFT often requires testing on a large set of predefined systems. The code then needs to be run on many independent instances, a task for which a capability cluster (or "capability farm") is best suited. Typically, each test lasts a few hours, requiring a few processors, and hundreds of such tests need to be executed. [ 2b ]

Example B. Ab initio methods are perfectly suited for the design of novel materials without actual synthesis, optimizing the composition and structure and saving considerable amounts of experimental effort. Studying a complex solid with several hundred atoms per unit cell requires a vast amount of computational resources. With a plane-wave basis, this problem is equivalent, in some cases, to solving a linear system with up to hundreds of thousands of variables. Fortunately, only a few hundred eigenvectors are needed, which makes iterative methods ideal for this task. Moreover, these problems involve Fast Fourier Transforms (FFTs), of which highly optimized implementations are available. Typically, such a study would require several hundred CPUs and run for as long as several weeks. Using this approach, RQCHP researchers were able to design a new material that combines C60 fullerenes and a metal-organic framework with enhanced electronic properties, aimed at improving the superconducting transition temperature. [ 1a ] HPCVL chemists have demonstrated, using theory and molecular dynamics, that a film of oil containing additives is compressed at the molecular level between two hot, hard surfaces. This study explains why certain additives fail to protect internal combustion engines. This finding is extremely important to the auto and lubrication industrial sectors. [ 1a,1g ] This type of study strains the capacity of existing resources. Expansion is required to provide large capability clusters or, when the technology is mature, special accelerator technology adapted to linear algebra packages. [ 2b ]
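To make the iterative-eigensolver idea concrete, the toy C sketch below applies simple power iteration to a small dense matrix. It only illustrates the principle exploited above (repeated matrix-vector products isolate a dominant eigenvector without a full diagonalization); the production plane-wave codes named earlier use far more sophisticated block and Krylov-subspace variants together with FFTs.

    /*
     * Toy power iteration (illustrative only): repeated matrix-vector
     * products converge to the eigenvector with the largest |eigenvalue|.
     * Build (C99): cc power_iter.c -lm
     */
    #include <stdio.h>
    #include <math.h>

    #define N 3

    int main(void)
    {
        double A[N][N] = {{4, 1, 0}, {1, 3, 1}, {0, 1, 2}};  /* small symmetric test matrix */
        double v[N] = {1, 1, 1}, w[N];
        double lambda = 0.0;

        for (int it = 0; it < 100; ++it) {
            for (int i = 0; i < N; ++i) {          /* w = A v */
                w[i] = 0.0;
                for (int j = 0; j < N; ++j)
                    w[i] += A[i][j] * v[j];
            }
            double num = 0.0, den = 0.0;           /* Rayleigh quotient: eigenvalue estimate */
            for (int i = 0; i < N; ++i) { num += v[i] * w[i]; den += v[i] * v[i]; }
            lambda = num / den;
            double norm = 0.0;                     /* normalize and iterate */
            for (int i = 0; i < N; ++i) norm += w[i] * w[i];
            norm = sqrt(norm);
            for (int i = 0; i < N; ++i) v[i] = w[i] / norm;
        }
        printf("estimated dominant eigenvalue: %f\n", lambda);
        return 0;
    }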

7.4 Nanoscience and Nanotechnology

The Challenge. Research in nanoscience and nanotechnology is inherently multidisciplinary, requiring expertise in physics, chemistry, biology, engineering and materials science in order to elucidate the principles underlying the behaviour of nano-objects, and to exploit these principles in order to develop nanodevices and integrate them into 'real world' applications. [ 3f ]

One of the most promising targets in nanoscience and nanotechnology involves integrating 'soft' biologically-inspired or synthetic organic nanostructures with inorganic 'hard' nanomaterials. It is anticipated that this focus will lead to new and extremely powerful tools and technology platforms with broad application in the life sciences, medicine, materials science, electronics and computation. High-performance computing (HPC) is an important component of nanoscience research, enabling the application of theory and modeling to such nanotechnology areas as molecular electronic devices; nanoengineered thin films and photonic devices; bioinformatics, microfluidic devices, and nanodevices for health applications; and environmentally friendly chemical processes and energy production. The timely development of advanced HPC infrastructure is thus a crucial factor in the successful growth of these nanotechnology sectors in Canada.

Community Strength. The Canadian research community encompasses a wide range of disciplines in which theoretical modeling and simulation are employed to help solve crucial problems of nanoscience and nanotechnology. For example, the Nanoelectronics program of the Canadian Institute for Advanced Research (CIAR) involves a number of highly accomplished researchers from across the country, including researchers from the National Institute for Nanotechnology (NINT), other National Research Council (NRC) institutes and universities. A large portion of the research conducted by these researchers involves simulation and modeling. CIAR members have developed novel techniques for simulating and studying charge and spin currents in nanoscale systems, are developing novel platforms for nanoscale devices, and are studying systems displaying high-temperature superconductivity. Development and application of computational methods to research areas that impact the Canadian economy include: nanocatalysis in chemistry and petrochemistry; biomembranes and protein simulations to understand the processes in biological nanostructures; fundamentals of solution chemistry and physico-chemical processes in soft matter nanosystems; molecular theory of solvation and a platform for multiscale modeling of nanosystems; integrated modeling, simulation and visualization of biological systems (project CyberCell); and nanoengineered devices for biomedical diagnostics and fuel cell development. The following are two examples of leading-edge developments in the theoretical modeling and simulation of nanosystems. The work described relies critically on the availability of HPC facilities.

Example 1: Simulations of nanoscale devices. To progress beyond traditional silicon-based technologies for computing and sensing, there is a need to develop devices that operate at the nanoscale. New molecular-scale devices are envisioned: hybrid organic-silicon devices that have as their central functional elements a small number of molecules (fewer than ten), with surface features at which a charge may be localized. Control over these molecules and charges will open the door to new devices that require little power to operate, dissipate little heat and are capable of extremely fast operation. A working model that demonstrates the principles of a single-molecule transistor has been built. The present WestGrid facilities, augmented by a 20-node PC cluster, were used to [ 1a,1b ] perform the quantum mechanical simulations that helped to elucidate the operational details of the device. Further developments towards an operational computing element will require computing facilities well beyond those presently available to Canadian researchers. The difficulty lies in the fact that the systems under study are only slightly heterogeneous: traditional techniques for studying large systems (e.g. periodic methods) cannot be applied, because a large amount of silicon bulk (10,000 atoms) might contain, for example, only one dopant atom, which is responsible for the conductive ability of the bulk. Therefore the ability to perform [ 2a ] quantum mechanical modeling (ab initio calculations) on systems containing tens of thousands of atoms is required, and access to large HPC facilities is essential to advance the development of molecular-scale devices.

Example 2: Theory and modeling on multiple scales. The platform of theory, modeling, and simulation on multiple length and time scales integrates a set of theoretical areas that treat phenomena and properties on a hierarchy of length and time scales: from ab initio methods for the electronic structure at the atomic scale, to statistical-physical integral-equation theory of solvation and disordered solids, chemical reactions in solution and at interfaces, self-assembly of supramolecular nanostructures, and heuristic models of the functioning of biological and engineered nanosystems. This state-of-the-art theoretical methodology constitutes a modeling platform for such major applications as nanomaterials for catalysis, energy production and conversion; nanomembranes for water purification and treatment; ultrafast gas sensors for industrial safety and control systems; photonic nanodevices for integrated optics; and supramolecular nanoarchitectures for chemistry and medicine. We have demonstrated the capabilities of our multiscale modeling methodology by predicting the [ 1a,1b ] formation and tunable stability of synthetic organic rosette nanotubes, which constitute a new class of synthetic organic architectures and a novel platform for organic synthesis. The calculations were done using the WestGrid facilities and local HPC resources at U. Alberta. With appropriate HPC resources, this theoretical methodology makes feasible the predictive modeling of very large systems and slow processes of high practical interest, such as nanomaterials for energy and ICT, and protein-protein interactions for enzymatic catalysis and biomedical applications.

The HPC requirements corresponding to the above research goals can be summarized as follows: [ 2b ]

• At least 50 TFlops (10,000 processors equivalent);
• 20 TB of distributed RAM;
• 2 TB of shared RAM.

It is highly desirable to have at one's disposal different HPC systems that span the range of requirements of particular applications, e.g. distributed-memory clusters, shared-memory systems, floating-point co-processors for massively parallel computations, and FPGAs/co-processors for compute-intensive key algorithms. A per-processor reading of the aggregate figures listed above is sketched below.
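
Read per processor, and assuming the 10,000 processors in the list above are counted individually, the aggregate figures imply the following rough targets (a back-of-the-envelope check only, not part of the formal requirements):

```python
# Back-of-the-envelope check of the aggregate requirements above (illustrative only).
total_flops = 50e12       # 50 TFlops aggregate
n_proc = 10_000           # "10,000 processors equivalent"
dist_ram = 20e12          # 20 TB of distributed RAM

print(f"per-processor peak : {total_flops / n_proc / 1e9:.1f} GFlops")  # 5.0 GFlops
print(f"per-processor RAM  : {dist_ram / n_proc / 1e9:.1f} GB")         # 2.0 GB
```

These per-processor figures (roughly 5 GFlops and 2 GB each) are consistent with commodity cluster nodes of the period, which is what makes the aggregate targets realistic.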


7.5 Quantum materials

The Challenge. The behavior of electrons in solids is most often described by band theory, which treats electrons as if they moved independently of each other. The parameters of band theory (the electronic structure) are obtained from ab initio calculations (see case study 7.3). The independent-electron approximation, central to band theory, fails in a large class of “strongly correlated” materials, such as high-temperature superconductors. Problems involving magnetic impurities within metals, relevant to nanotechnology, are another important example. In general, many advanced “quantum materials” are in some exotic phase of matter that cannot be understood within the independent-electron approximation and require new numerical methods. The simulation of these problems at very low temperatures typically requires finding the quantum mechanical ground state of a system made of a small number (N) of electrons. This is an eigenvalue problem whose computational size increases exponentially with N, like the dimension of the corresponding quantum mechanical Hilbert space. In HPC, exponential problems are often considered impractical; in practice, we respond to this challenge by developing schemes that approximate the solution of a large system through the embedding of a much smaller system that captures its essential characteristics.
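
To make the exponential growth concrete, consider a Hubbard-type lattice model in which each of N sites can be empty, occupied by a spin-up or spin-down electron, or doubly occupied, so that the Hilbert space dimension is 4^N; the short calculation below (an illustration, not a figure from this proposal) shows how quickly the memory needed for a single state vector explodes:

```python
# Illustration of exponential Hilbert-space growth for a Hubbard-type model:
# 4 local states per site -> dimension 4**N, 16 bytes per complex amplitude.
for n_sites in (8, 12, 16, 20):
    dim = 4 ** n_sites
    bytes_per_vector = dim * 16          # one double-precision complex state vector
    print(f"N = {n_sites:2d}  dim = {dim:.3e}  "
          f"one state vector = {bytes_per_vector / 1e9:.3g} GB")
```

Already at N = 16 a single state vector occupies close to 70 GB, and at N = 20 it reaches the tens of terabytes, which is why distributed memory and embedding schemes are unavoidable.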

Community Strength. Canada has a very strong international reputation in the field of strongly correlated electrons. This is in part due to the action of the Canadian Institute for Advanced Research (CIAR) through its program on quantum materials (and formerly through its superconductivity program). There are strong groups at UBC, Waterloo, Toronto, McMaster and Sherbrooke. Two notable examples of scientific achievements are cited below.

HPC Requirements. This field, with its many different methodologies, has a variety of requirements. [ 2b ] Pushing exponential problems to the limit of feasibility requires access to very large amounts of memory, which is only possible with distributed-memory architectures (capability clusters). Quantum cluster methods, and methods based on the iterative aggregation of sites within an effective Hilbert space (e.g. the Density-Matrix Renormalization Group, or DMRG), require the solution of a mid-sized eigenvalue problem; in this way, memory requirements are kept at a reasonable level in order to gain speed of execution. For these applications, capability clusters are the instruments of choice, although capacity clusters are an entry-level solution for problems requiring parametric studies. Monte Carlo methods (based on a stochastic evaluation of quantum-mechanical averages) are also widely used, and are most economically implemented on capacity clusters.

Example A: The screening of a magnetic impurity by conduction electrons. When a magnetic impurity is introduced into a metal, the resulting characteristics of the material, such as the resistivity, cannot be described using the independent-electron picture. This problem is of paramount importance for the understanding of nanoscale electronic circuits as well as quantum computing and quantum information theory. From the work of Kondo, it is also known that perturbative methods fail and that a full description, incorporating all correlation effects between the electrons, is necessary. Numerical modeling has therefore been extremely important for the advancement of this active research field. Initial work by K. Wilson, for which he was in part awarded the 1982 Nobel Prize, has developed into the Numerical Renormalization Group (NRG). The DMRG method has been extensively used at SHARCNET as an alternative to the NRG method, since it allows a convenient way of calculating real-space correlations around the impurity. Both the NRG and the DMRG methods are iterative methods that, at each step, require finding the lowest eigenstate of a sparse matrix whose dimension ranges from thousands to millions. Sparse eigenvalue problems are very well adapted to distributed parallel architectures, but require a very high-performance interconnect. On current SHARCNET computers, researchers have reached matrix dimensions of 5.7 billion for exact diagonalization (ED) studies of the Kondo problem, using 72 hours on 64 CPUs. These calculations provided the first direct calculation of electrical transport in [ 1a,b ] nanoscale circuits with Kondo impurities and would have been impossible to perform prior to the arrival of the SHARCNET computational facilities. For a single parameter, a typical DMRG calculation will take 100 hours on 16 CPUs, the only limitation being the per-node memory capacity; the full study required roughly 50,000 CPU-hours. An even more important factor for DMRG and ED, however, is the per-node memory bandwidth, much more than the CPU clock frequency.
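
The computational kernel common to ED and DMRG steps can be sketched as follows: the Hamiltonian is applied to a vector on the fly rather than stored, so the calculation is limited by per-node memory capacity and bandwidth. The toy operator and dimension below are placeholders, and production runs distribute the vectors over many nodes with MPI:

```python
# Minimal matrix-free sketch of the lowest-eigenstate problem at the heart of
# ED/DMRG steps.  The Hamiltonian is never stored; only its action on a vector
# is defined, so memory capacity and bandwidth, not flops, set the limits.
import numpy as np
from scipy.sparse.linalg import LinearOperator, eigsh

dim = 100_000     # toy dimension; production runs reach billions across MPI ranks

diag = np.linspace(-1.0, 1.0, dim)          # placeholder "site energies"

def apply_h(v):
    """Apply a toy nearest-neighbour Hamiltonian H to a state vector v."""
    w = diag * v
    w[1:] += 0.5 * v[:-1]                   # hopping terms
    w[:-1] += 0.5 * v[1:]
    return w

H = LinearOperator((dim, dim), matvec=apply_h, dtype=np.float64)
E0 = eigsh(H, k=1, which="SA", return_eigenvectors=False)[0]
print("ground-state energy of the toy model:", E0)
```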

Example B: High-temperature superconductivity with quantum cluster methods. High-temperature superconductors were discovered in 1986. A simple physical model for these materials, the Hubbard model, was proposed early on, but it proved exceedingly difficult to study, and it could not be shown whether it explained high-temperature superconductivity. Of interest is the phase diagram of the model, i.e., its properties as a function of parameters such as the density of electrons or the electron-electron interaction strength. Progress has been made in recent years thanks to a class of numerical methods based on the exact quantum-mechanical solution of the model on a small cluster of atoms, together with a clever embedding of the cluster within an infinite crystal lattice. One of these methods, called Variational Cluster Perturbation Theory (V-CPT), has been used recently by physicists at RQCHP to show that the simple Hubbard model contains the right elements to explain the basic properties of high-temperature superconductors. The calculation was in fact the first to run on the RQCHP capacity cluster [ 1a,b ] (Summer 2004), required roughly 100,000 CPU-hours, and would not have been feasible on a cluster of only a few tens of nodes, let alone on a single computer. The method involves an optimization problem: finding the stationary points of a function Ω(h), where h denotes one or more fictitious fields favoring the establishment of broken-symmetry states like antiferromagnetism or superconductivity. Evaluating Ω(h) requires solving a mid-size eigenvalue problem, as discussed above, and performing a three-dimensional integral. The limiting factor is speed; the memory requirements are relatively modest. More realistic calculations will best be done by working sequentially in parameter space, running each case in MPI. This will require a “capability farm”: a large capability cluster with many MPI calculations running concurrently.
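
The structure of such a V-CPT-style calculation can be sketched as a parameter sweep with a stationary-point search at each point. The Ω used below is a cheap placeholder standing in for the expensive eigenvalue-plus-integration evaluation, which in a production run is itself an MPI job, with many such parameter points processed concurrently on a capability farm:

```python
# Schematic of the V-CPT-style workflow: sweep a model parameter and, at each
# point, locate the stationary point of Omega(h).  The Omega below is a cheap
# placeholder for the real evaluation (sparse diagonalization plus a 3D
# integral), which in practice is an MPI calculation in its own right.
import numpy as np
from scipy.optimize import brentq

def omega(h, u):
    """Placeholder grand potential with a stationary point near h = 0.1*u."""
    return (h - 0.1 * u) ** 2 * np.exp(-h)

def d_omega(h, u, eps=1e-6):
    """Central-difference derivative of Omega with respect to the field h."""
    return (omega(h + eps, u) - omega(h - eps, u)) / (2.0 * eps)

for u in np.linspace(4.0, 12.0, 5):                  # sweep the interaction strength U
    a = 0.1 * u
    h_star = brentq(d_omega, a - 0.5, a + 1.0, args=(u,))   # solve dOmega/dh = 0
    print(f"U = {u:4.1f}  h* = {h_star:.4f}  Omega(h*) = {omega(h_star, u):.4e}")
```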

7.6 Global and Regional Climate Change

The Challenge. The ongoing global climate change caused by the inexorable rise in atmospheric greenhouse gas concentrations constitutes a daunting challenge to the interconnected but heterogeneous global community. Its impact will be most severely felt in high-latitude regions of the northern hemisphere, in particular Canada. A 2005 report to the Prime Minister by the National Round Table on the Environment and the Economy refers to this danger as “perhaps unmatched in times of peace” and suggests that “all Canadians will be touched by climate change impacts” that “pose new risks to human health, critical infrastructure, social stability and security”. The attribution of the ongoing changes to increasing concentrations of CO₂, CH₄, N₂O, and halocarbons has been clearly established by new multidisciplinary numerical approaches that model the coupled evolution of atmosphere, ocean, cryosphere and land-surface processes as an Earth System. The continuing development of such models constitutes one of the Grand Challenge Problems of modern computational science. HPC-based climate change models will have [ 4b ] profound implications for national energy and environmental policy.

Canadian Successes. The Canadian scientific community continues to play a leading international [ 1a,b ] role in the development and application of large-scale, HPC-based numerical Earth System models. For example, significant early-stage development of the semi-spectral methodology that serves as the basis for the dynamical cores of the atmospheric components of global coupled models took place at the RPN Laboratory of Environment Canada at Dorval (Quebec). The semi-Lagrangian, semi-implicit methodology employed to time-step such models efficiently was also developed there. The Canadian Earth System Model at the Downsview (Toronto) laboratory of the Meteorological Service of Canada is based upon this RPN substructure; it is now being further developed at the Canadian Climate Centre at the University of Victoria and is widely used in Canadian universities. A further extension of the RPN effort to include fully elastic field equations, led by a UQAM scientist, has led to the development of the Canadian Community Mesoscale MC2 model. The Canadian Middle Atmospheric Model (CMAM), developed by a team led by a SciNet scientist, is playing a significant role in the international effort to better understand the stability of the stratospheric ozone layer that shields the surface biosphere from the harmful effects of solar UV-B radiation. All of these Canadian-built models are playing an important role in the ongoing work of the Intergovernmental Panel on Climate Change (IPCC), which is co-sponsored by the UN and the World Meteorological Organization and is responsible for assessing the evolving science.

Community Strength. A measure of the Canadian research community's strength and influence is that the two highest-ranked journals in the field, the Journal of the Atmospheric Sciences and the Journal of Climate, both published by the American Meteorological Society, have selected Canadians as their chief editors. A second measure is provided by the number of Canadians currently serving as Lead Authors of the 4th Scientific Assessment Report of the IPCC, which will appear in 2007: ten Canadians are involved, two of them as Coordinating Lead Authors on two of the eleven chapters. Our national effort is spread over a large number of universities that host significant groups involved in global climate change modeling, including Alberta, Dalhousie, McGill, Toronto, York, UQAM, Victoria and Waterloo. Major national research networks funded by the Canadian Foundation for Climate and Atmospheric Sciences are currently in operation. These include the Polar Climate Stability Network and the Modeling of Global Chemistry for Climate Network (both led by SciNet PIs), the Climate Variability Network led by a McGill-based PI, and the Regional Climate Modeling Network led by a UQAM-based PI.

HPC Requirements. The requirements of this field are currently being met by parallel vector [ 2b ] systems, reflecting their dominance in the leading European laboratories (UK, France, and Germany). Canadian groups have also opted for smaller systems of this type (U Toronto, UQAM, Victoria). We have determined that alternative architectures do not adequately serve our needs in this area. This was indicated in a letter from seven University of Alberta scientists, which stated: “Although the XXX shared memory architecture is, for climate simulation, distinctly superior to PC clusters, it does not allow particularly good scalability”. To appreciate the magnitude of the computational problems that must be addressed in climate change research, it is useful to consider a few specific examples. In describing these examples, NEC parallel vector systems are used in order to provide a single basis for comparison; use of the vendor name does not imply a predetermined purchasing commitment.

Project A: Statistical equilibrium state computation under modified climate forcing. [ 2a ]
• Model employed: NCAR CCSM 3.0, very low resolution (T31 atmosphere and 3×3 deg. ocean)
• Calendar years of integration required: approximately 2000 years
• Machine employed: single-node SX-6 with all 8 CPUs dedicated
• Wall-clock time required: 7 months
• A single-node SX-8 system operating at an aggregate peak speed of 128 Gflops will complete this job in ∼ 3.5 months, and a 20-node system in approximately 5 days. Such work may be done competitively on this system (the simple scaling assumptions behind these projections are sketched after Project C).

Project B: Ensemble of 10 transient simulations under changing climate. [ 2a ]
• Model employed: extended version of CMAMv8 with chemistry, T47, L95
• Calendar years of integration required: 100 years
• Machine employed: single-node SX-6 using all 8 CPUs, but not in dedicated mode
• Wall-clock time required: 1 year
• A single-node SX-8 system operating in dedicated mode will complete this job in ∼ 5 months; a 20-node system will complete this job allocation in about 2 weeks. Such work can be done competitively on such a system.


Project C: Regional-scale climate change projections. [ 2a ]
• Model employed: GEM-based CRCM version 5, continental domain, 15-km resolution
• Mesh comprising a 500 × 500 horizontal grid with 60 levels in the vertical
• Simulation: double 30-year runs required (one for the reference climate, one for the projected climate), with tens of runs of each to create the ensemble needed to determine significance
• Machine: SX-6 or SX-8
• Wall-clock time for 1 run using 8 nodes of SX-6: ∼ 2 months
• A 20-node SX-8 system will complete this job in approximately 2 weeks.
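
The wall-clock projections quoted in Projects A to C follow from simple scaling assumptions. The sketch below reproduces the Project A figures assuming that an SX-8 node delivers roughly twice the throughput of an SX-6 node and that the job scales nearly linearly across 20 nodes; both factors are illustrative assumptions, and where the figures quoted above are more conservative (e.g. Project B), they allow for less-than-ideal scaling:

```python
# Rough reproduction of the Project A wall-clock projections, assuming an SX-8
# node gives ~2x the throughput of an SX-6 node and near-linear scaling across
# nodes.  These factors are illustrative assumptions, not vendor benchmarks.
sx6_single_node_months = 7.0          # measured baseline quoted above
sx8_speedup_per_node = 2.0            # assumed SX-6 -> SX-8 per-node gain

single_node_sx8 = sx6_single_node_months / sx8_speedup_per_node
twenty_node_sx8_days = single_node_sx8 * 30.0 / 20.0   # ideal 20-node scaling

print(f"single-node SX-8 : ~{single_node_sx8:.1f} months")    # ~3.5 months
print(f"20-node SX-8     : ~{twenty_node_sx8_days:.0f} days") # ~5 days
```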

7.7 Hydrology

The Challenge. Issues such as surface water and groundwater quality deterioration, waste disposal, [ 2a, 4b ] the presence of infectious organisms in drinking water supplies, maintenance of aquatic habitat, and the impact of climate change on water supply require a fully integrated approach. An ad hoc approach to water resources planning can lead to unpredictable and undesirable environmental consequences that require costly remediation or cause irreparable damage to the resource. This challenge is internationally recognized, as are the limitations of our current computing capacity to address the pertinent issues. Numerical models that consider both groundwater and surface water quantity and quality in a fully coupled, holistic fashion are conceptually and numerically challenging, especially where complex reactions occur. Areal watershed models generally represent the surface water components adequately but overly simplify or entirely ignore the dynamics of groundwater; standard groundwater models, on the other hand, ignore the dynamics of surface water. An integrated 3D surface/subsurface modelling framework will require bridging the very different spatial and temporal scales of these two regimes. The development of such models is recognized by the US NSF as one of the “Grand Challenge Problems” of modern computational hydrology, and the predictions made with them have extremely important implications for water resources management and environmental policy, especially in the context of sustainable growth and the impact of ongoing climate change.

Canadian HPC Successes. Members of the Canadian research community play a leading [ 1a,b ] role in the development and application of integrated numerical surface and subsurface hydrology models. The recently developed 3D control-volume finite element watershed model HydroGeoSphere is an advanced, fully coupled surface/subsurface flow and contaminant transport model. It incorporates the physical processes of 3D variably-saturated subsurface flow, 2D overland/surface water flow, and multi-component, reactive, advective-dispersive contaminant transport. Its development is led by research groups at Waterloo and Laval, and researchers at a number of universities throughout Canada and the world are using it and its precursor, FRAC3DVS. The further enhancement of the model has recently been accelerated through an evolving partnership with the US Bureau of Reclamation (USBOR) and the California Water Department to study a variety of water-related issues within the Central Valley of California. The HydroGeoSphere model has also recently led to significant networking opportunities with European researchers: it is at the core of a large EU Framework VI project which links researchers from 18 EU universities and research institutions to study the impact of non-point-source contaminant inputs, including climate and land-use change, on surface water and groundwater quality and on soil functioning within several key European watersheds, such as the Rhine and the Danube.

Community Strength. One measure of the community's strength is the fact that Canadian hydrogeologists have in the recent past served as editors of the two highest-ranked journals in the water resources field: Water Resources Research, published by the American Geophysical Union, and the Journal of Contaminant Hydrology, published by Elsevier. Canadian hydrogeologists have also served as leaders of major international scientific societies, including past Presidents of the Hydrology Section of the American Geophysical Union, past Presidents of the Hydrogeology Division of the Geological Society of America, and President of the International Commission on Groundwater of the International Association of Hydrological Sciences, among other high-ranking posts. The Institute for Scientific Information (ISI) recently assembled a list of the 250 most highly cited researchers in the entire field of engineering throughout the world: of the 11 Canadians listed, 4 are hydrogeologists.

HPC Requirements. The computational requirements of integrated surface/subsurface hydrological modeling are best met by tightly coupled parallel computer systems, as is the norm in the leading European laboratories (UK, France, Germany). Alternative architectures, such as loosely coupled clusters, do not serve our current needs, let alone the demands of 3D simulations at the basin or continental scale over time frames of hundreds to thousands of years. To appreciate the magnitude of the computational problems that must be addressed in coupled surface/subsurface hydrological research, it is useful to consider a few specific examples.

Leading Projects with HPC Requirements.

Project A - Canadian landmass-scale computation of groundwater flow system evolution under [ 2a ] glacial cycling. Over the last two years, Canadian scientists have been conducting an application of HydroGeoSphere that entails the 3D simulation of the coupled surface and subsurface flow regimes for all of Canada in a fully integrated manner, driven by paleoclimate, in particular climate-induced glaciation and deglaciation of the North American continent. This work entails asynchronous linkage of HydroGeoSphere with a comprehensive, climate-driven dynamical model of the advance and retreat of the Laurentide ice sheet. The wall-clock time for each 120 K-year simulation on a 16-processor IBM RISC-based, shared-memory machine is on the order of two weeks, even after linearization of the governing flow equations and a decoupling of the surface water flow regime. Through this effort, we are now poised to address the impact of future climate change on Canada's water resources at a national scale, but we require substantially expanded computing resources. We estimate that a single one-hundred-year simulation driven by future climate-change scenarios, with full coupling of the surface and subsurface flow systems, would require several months of CPU time on existing SHARCNET facilities.
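
The asynchronous linkage described above can be pictured as a driver loop of the following shape. This is only a schematic: the function names and exchange interval are hypothetical stand-ins, and neither HydroGeoSphere nor the ice-sheet model exposes this Python interface:

```python
# Schematic of an asynchronous model linkage: a slow ice-sheet/climate model
# advances in long steps and hands boundary forcings to the hydrology model,
# which is re-run between exchanges.  All names and intervals are hypothetical.
GLACIAL_STEP_YEARS = 1_000        # exchange interval (illustrative)
TOTAL_YEARS = 120_000             # one 120 K-year glacial-cycle simulation

def advance_ice_sheet(state, years):        # hypothetical wrapper
    """Advance the climate-driven ice-sheet model; return updated loads/meltwater."""
    ...
    return state

def run_hydrology(forcing, years):          # hypothetical wrapper
    """Run the coupled surface/subsurface flow model under fixed glacial forcing."""
    ...

ice_state = {"time": 0}
for t in range(0, TOTAL_YEARS, GLACIAL_STEP_YEARS):
    ice_state = advance_ice_sheet(ice_state, GLACIAL_STEP_YEARS)
    run_hydrology(ice_state, GLACIAL_STEP_YEARS)   # forcings held fixed over the interval
```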

Project B - Ensemble of 100 Monte Carlo transient simulations of high-level radioactive waste [ 2a ] repository performance under changing climate. The FRAC3DVS code has been adopted by the Canadian spent nuclear-fuel geoscience program to investigate the safety case for the disposal of these wastes in a Canadian Shield setting. Inclusion of complex 3D fracture-zone networks embedded in the host rock, along with the influence of dense groundwater brines at depth and an accounting of the effects of glacial loading/unloading over 100 K-year time frames, requires on the order of 3 days of CPU time on a 3.5 GHz PC for each Monte Carlo realization. About 100 realizations are needed to capture the prediction uncertainty with any realism.
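
Because the realizations are independent (roughly 100 runs of 3 CPU-days each, or about 300 CPU-days in total), this workload maps naturally onto a capacity cluster or a simple job farm. The dispatcher below is a generic sketch; the executable name and command-line convention are invented for illustration and do not correspond to the actual FRAC3DVS interface:

```python
# Generic dispatcher for independent Monte Carlo realizations (a capacity-style
# workload): each realization is a separate long-running simulation.  The
# executable name and command-line options here are hypothetical.
import subprocess
from concurrent.futures import ThreadPoolExecutor

N_REALIZATIONS = 100          # ~3 CPU-days each -> ~300 CPU-days in total
MAX_CONCURRENT = 25           # how many single-core runs this allocation can host

def run_realization(seed: int) -> int:
    cmd = ["./frac3dvs_run", f"--seed={seed}", f"--out=real_{seed:03d}"]  # hypothetical CLI
    return subprocess.run(cmd).returncode

with ThreadPoolExecutor(max_workers=MAX_CONCURRENT) as pool:
    codes = list(pool.map(run_realization, range(N_REALIZATIONS)))

print("failed realizations:", [i for i, c in enumerate(codes) if c != 0])
```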

Project C - Groundwater discharge to coastal environments. Performed in 3D, a single simulation [ 2a ] of density-dependent flow in basins on the scale of 100 km² can have wall-clock times of 3 to 4 months (WestGrid statistics). This forces us to simplify the problem and consider only 2D elements of the flow, when in fact significant elements of the fluid motion are inherently three-dimensional.

7.8 Aerospace

The Challenge. Over the past thirty years, the aerospace design process has gradually been revolutionized, moving from a primary reliance on testing to one heavily supported by computational analysis. This change has been most notable in the use of computational fluid dynamics to predict complex turbulent flows, which has greatly reduced the dependence on expensive wind tunnel and flight tests. The development of reliable, efficient and accurate computational tools has greatly reduced the design cycle time and led to more efficient and safer aircraft and engines. Nevertheless, numerous challenges remain, many of which will be progressively addressed through the use of future capabilities in HPC. The design of low-emission engines depends on a better understanding of the interaction of chemistry and fluid dynamics in turbulent combusting flows. Similarly, further reductions in aerodynamic drag are essential in order to reduce fuel consumption and emissions, and the development of advanced drag-reduction concepts, such as flying-wing configurations, adaptive wings, and active flow control devices, is highly dependent on high-performance computing. Recently, high-fidelity predictive capabilities have been used together with optimization algorithms to produce optimal designs. While much current [ 3f ] research in high-fidelity optimization is concentrated on a single discipline, such as aerodynamics, in the not-too-distant future one can envisage multidisciplinary optimization of complete aircraft and rotorcraft. This would be an enormous computational challenge, highly dependent on HPC.

Canadian Successes. Canada's aerospace industry, ranking fourth worldwide, is a key contributor to the national economy. It is Canada's leading advanced-technology exporter and employs 80,000 workers. Canadian companies are world market leaders in regional aircraft, business jets, commercial helicopters, small gas turbine engines, and flight simulators. The larger companies, such as Bombardier Aerospace, Pratt & Whitney Canada, Bell Helicopter and CAE, invest heavily in R&D partnerships with HPC-based Canadian universities, and lately have done so in unison through consortia like CRIAQ or through multi-sponsored NSERC Industrial Research Chairs. A growing number of [ 1g ] multidisciplinary applications of computational fluid dynamics have been developed, and industry has made extensive use of such tools in the design process. For the Canadian aerospace sector [ 4a ] to remain competitive, it will have to continually reduce design time, which will be greatly facilitated by advancements in computational techniques and HPC.

The Community Strength. Several Canadian researchers are world leaders in the development and application of computational algorithms for aerospace applications. Members of these research groups serve as editors of international journals, have edited and authored books in this field, and are often invited as keynote speakers at conferences. These researchers have a long history of developing [ 1a,b ] computational fluid dynamics (CFD) software: four major codes, StarCD, CFX, VSAERO and FENSAP-ICE, are spin-offs from the Canadian researchers who developed them at UBC, Waterloo, RMC and McGill. As a measure of that unique success, it would be highly improbable that any major aerospace corporation in the world today is not using at least one of these four software packages. Canadian academic-industry projects have led to hundreds of joint scientific publications and are a testimony to the originality and applicability of the research. Overall, the Canadian research community has an outstanding track record in computational methods for aerospace applications and will continue contributing significantly to this sector.

HPC Requirements. Computational analysis to optimize the design of aerospace vehicles and [ 2a ] engines requires very large computing resources. The escalating international scientific challenges will continue to strain and exceed our HPC capabilities and can only be met with continuous upgrades of our facilities.

• For example, a solution of the steady flow field about an aircraft requires well over ten million mesh nodes for sufficient resolution. There are at least six degrees of freedom per node, leading to linear equation systems with over sixty million degrees of freedom and solution times ranging from ten to several hundred hours on a single processor (a rough estimate of the memory and processor-hour budgets implied by such systems is sketched after this list).

• Optimization of an aircraft based on such analysis will require several hundred flow solutions, at ten to twenty different operating points. Such an optimization will require 10⁴ to 10⁵ processor-hours.

• If one now adds the complexity of the fourth dimension, time, in order to realistically account for unsteadiness, be it from rotor-stator interaction in a multi-stage jet engine or from the effect of propellers and rotors on aircraft and rotorcraft, solution times can easily be 5 to 10 times higher than for the steady state.

• The next step in complexity would be enriching the physical models by migrating to more complete turbulence models, such as large eddy simulation and, eventually (certainly not in the near future), direct numerical simulation. That step is sure to add another factor of 5 to 10 to the calculations.

• Following this, multidisciplinarity must be addressed in order to streamline the current lengthy [ 3f ] and inefficient sequential nature of aerospace design (aerodynamics, then structural analysis, then stability and control, then acoustics, then icing). Putting two disciplines together is not additive, as most interactions are highly nonlinear and can easily quintuple the overall solution time, but it yields significant savings in terms of design time and manpower, added performance, and enhanced safety. One example is fluid-structure interaction, which is becoming the norm rather than the exception in industry. The simulation of in-flight ice accretion on an aircraft requires four successive complete solutions, for impingement, accretion, anti-icing and performance degradation, thereby quintupling the resources needed for a flow analysis. Of similar complexity is the prediction of the propagation of noise generated by an aircraft, rotorcraft or engine: this requires spectral resolution of pressure signals that are 6 to 7 orders of magnitude smaller than the flow itself, giving solution times on the order of days if not weeks.

With a high-performance, high-memory parallel capability computer consisting of one to [ 2b ] a few thousand processors, the above tasks translate to wall-clock times of tens to hundreds of hours, assuming good parallel scalability. It is thus critical that the Canadian aerospace research community have the computing resources needed to develop the algorithms required to tackle current problems.
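
As flagged in the first bullet above, the size of such systems can be made concrete with a short order-of-magnitude estimate; the mesh size and degrees of freedom come from the bullets, while the stencil width, storage format and per-solution cost are typical-order assumptions rather than figures from any particular code:

```python
# Order-of-magnitude estimate for a steady flow solve and an optimization
# campaign.  Mesh size and DOF count come from the bullets above; the stencil
# width, storage format and per-solution cost are typical-order assumptions.
mesh_nodes = 10e6
dof_per_node = 6
unknowns = mesh_nodes * dof_per_node                      # ~6e7 unknowns

neighbours = 15                                           # assumed stencil: node + ~14 neighbours
nnz = unknowns * neighbours * dof_per_node                # block-sparse Jacobian entries
jacobian_gb = nnz * 12 / 1e9                              # ~8 B value + ~4 B index per entry

flow_solutions = 300                                      # "several hundred" per optimization
hours_per_solution = 100                                  # mid-range of 10 to several hundred
processor_hours = flow_solutions * hours_per_solution

print(f"unknowns            : {unknowns:.1e}")                           # 6.0e+07
print(f"sparse Jacobian     : ~{jacobian_gb:.0f} GB")                    # ~65 GB
print(f"optimization budget : ~{processor_hours:.0e} processor-hours")   # within 10^4-10^5
```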

7.9 Computational Biology

The role of HPC in Computational Biology. HPC has become an essential tool for biological andbiomedical research. We briefly describe some examples: [ 2a,2b ]

1. Genome sequencing projects are revealing the massive catalogue of genes in organisms rangingfrom yeast to man. We face major challenges in assigning functions to the many thousandsof uncharacterized genes and then using this information to identify the proteins that worktogether to control cellular processes.

2. Genetic studies of complex human diseases now test over 500,000 genetic markers per subject, such that the dataset for a current colon cancer study in Ontario/Quebec/Newfoundland involves over 1.5 billion genotypes in 5,000 subjects. The search for gene-gene and gene-environment interactions requires huge computational power in order to develop sets of genetic and environmental risk factors that can be used in cancer screening programs. Genetic studies will embrace new sequencing technologies that promise full genome sequencing of thousands of individuals in disease studies, including complete studies of tumors and evaluation of epigenetic changes across tissues. These projects will dwarf the previous HPC needs of the Human Genome Sequencing Project.

3. High-throughput microarray technologies are enabling studies of all the genes of over 200 species. Only with HPC can we search for optimal structures and parameter values for networks of 25,000 genes, and rapidly identify specific functional interactions between genes.

4. New advances in high-resolution, high-throughput tools such as mass spectrometry are makingit possible to rapidly identify and characterize proteins. Spatial and temporal mapping ofprotein-protein interactions in live cells has emerged as a powerful platform technology requiringthe acquisition, storage, and analysis of large (100’s of TB) image datasets.

5. Proteins are dynamic, undergoing rapid changes between various conformational states. Un-derstanding a protein’s biological function requires a first-principles understanding of how theseinterconversions occur since the biochemistry of diseases cannot be deduced from static foldedstructures alone. New efficient parallelization algorithms, innovative software, and rapid ad-vances in high-performance computing hardware are enabling technologies for atomistic- andcoarse-grained molecular dynamics (MD) simulations of large, complex systems and processes,including protein folding and membrane protein dynamics.


Canadian HPC Successes. Canadian HPC successes in computational biology are many, including [ 1a,1e,1f ] new efforts in distributed computing such as the Trellis Project (see p. 20), which enables the creation of metacomputers from geographically distant HPC infrastructure. In 2004, the Trellis project engaged over 4000 computers nationwide to study two problems of significant biological importance: (1) protein folding (Chan: Toronto; Tieleman: Calgary); (2) proton transport in biological membranes (Pomes: Toronto).

Community Strength. Canadian researchers are world-class in the design and implementation of high-performance computing in computational biology. These include membrane biophysics simulations and methods development for membrane simulations (Tieleman: Calgary), theoretical and computational approaches to protein folding and conformational properties of biomolecules (Chan: Toronto), theoretical methods applied to the structure, function and dynamics of biological macromolecules (Pomes: Toronto), complex trait genetics (Hudson: McGill) and functional genomics (Hughes, Frey: Toronto; Rigault, Corbeil: Laval). Indeed, the application of high-performance computing in biology has spawned numerous initiatives, including the Canadian Bioinformatics Workshops.

HPC Requirements. The HPC needs of the computational biology community, and indeed of the biological sciences community in general, are diverse and demanding. They range from immense storage (hundreds of TB), required to archive information derived from massively parallel high-throughput microarray experiments and all-atom molecular dynamics simulations of protein folding and membrane dynamics, to large (thousands of nodes) cluster computing systems for proteomic and genomic data analysis, complex molecular dynamics simulations of biomolecular systems comprising tens of thousands of atoms, and machine-learning algorithm development.

Example. In view of the long-standing paradigm that structure begets function, it is critically important that the physico-chemical forces that underlie the three-dimensional structures of folded proteins be resolved. Recent experimental advances point to an even greater challenge: the biological functions of proteins often rely on large-scale conformational fluctuations. Hence, knowledge of the folded structure of a protein is sometimes insufficient for deciphering, let alone understanding, its function. Moreover, many proteins are found to function in intrinsically unfolded forms. Thus, gaining insights into the dynamic and ensemble properties of proteins is crucial. Computational simulations are a powerful means of addressing such issues and their implications for maladies such as Alzheimer's and prion diseases, which are among an increasing number of diseases found to arise from protein misfolding and aggregation. Simulations of protein folding require extensive exploration of the available conformations using appropriate inter-atomic and inter-molecular interaction potentials. Herein lie the key challenges:

• How should the effectiveness of these empirical potentials be improved?

• How should they be systematically derived and evaluated?

• How should these potentials be applied, in a computationally tractable manner, to provideinsights about the relative importance of a protein’s different conformations?

Answering these questions at either the atomistic or the coarse-grained level requires significant computational resources. One needs to consider not only interactions within the protein but also those with its surroundings (i.e. solvents such as water), while simultaneously formulating detailed analytical approaches to modelling the interaction potentials between specific functional groups. The potential payoff is tremendous: efficient modelling software coupled with realistic interaction potentials will allow computational biologists to rapidly predict interactions between proteins, analyse mass spectrometry data, and elucidate the structure of gene products simply from sequence.
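
As a deliberately minimal illustration of what an empirical interaction potential looks like in practice, the snippet below evaluates a Lennard-Jones pair energy over a handful of atoms; real biomolecular force fields add bonded terms, electrostatics and solvent models, and the parameter values here are generic placeholders:

```python
# Deliberately minimal illustration of an empirical interaction potential:
# a Lennard-Jones pair energy over a handful of atoms.  Real biomolecular
# force fields add bonded terms, electrostatics and solvent; the parameters
# here are generic placeholders.
import numpy as np

epsilon, sigma = 0.25, 3.4      # placeholder well depth and radius

def lj_energy(coords):
    """Total Lennard-Jones energy of a set of 3D coordinates."""
    energy = 0.0
    n = len(coords)
    for i in range(n):
        for j in range(i + 1, n):
            r = np.linalg.norm(coords[i] - coords[j])
            sr6 = (sigma / r) ** 6
            energy += 4.0 * epsilon * (sr6 ** 2 - sr6)
    return energy

rng = np.random.default_rng(0)
atoms = rng.uniform(0.0, 10.0, size=(20, 3))     # 20 random atoms in a small box
print(f"E_LJ = {lj_energy(atoms):.3f} (placeholder units)")
```

The cost of even this toy evaluation grows quadratically with the number of atoms, which is why efficient parallel algorithms and HPC hardware are enabling technologies for realistic molecular dynamics.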

7.10 Brain Imaging


The Challenge. Brain imaging brings together physical scientists (physicists, chemists, computer [ 3f ] scientists, engineers) and neuroscientists (psychologists, molecular biologists, clinical researchers) in the study of the living brain, in both health and disease. A range of brain scanning techniques collects 3D and 4D data about the brain's structure and function, investigating the mechanisms which underlie:
1. Dynamic changes in the brain from birth through to old age;
2. Normal mechanisms of perception, language, memory, motor skills and cognitive skills;
3. Brain disorders: Alzheimer's disease, stroke, multiple sclerosis, schizophrenia;
4. Societal ills: drug addiction, stress, alcoholism, depression.

In the last 20 years, the field has exploded as computational advances allow complex analysisof raw imaging data in human brain research. Recently, there have been rapid increases in animalbrain imaging where, for example, we can measure the impact of genotype manipulations (geneknock-outs, transgenics) on brain development (the phenotype) in rodent models. This “genotype-phenotype” research is the next significant challenge beyond the Human Genome Project and willrequire massive computational resources.

Magnetic resonance imaging (MRI) assesses brain anatomy in exquisite detail, allowing us to measure changes in normal tissue or disease over time. Positron emission tomography (PET) measures brain chemistry, such as the activities of dopamine, serotonin and other neurotransmitters. Functional MRI (fMRI) detects physiological changes like blood flow in different parts of the brain and allows researchers to identify areas engaged in specific processing such as pain perception or memory. Magneto-encephalography (MEG) detects electrical activation in the brain over time, giving information on the temporal ordering of events and on how different brain regions communicate with each other. MEG tells us “when” an event occurs but not “where”; PET and MRI tell us “where” and “how much” but not “when”. The great benefit of advanced computational brain imaging, and indeed the next challenge, is to localize focal physiological changes and their interactions. These multi-modal tools will be vital for the assessment and development of therapies relating to neurodegenerative diseases, and in particular diseases associated with the aging Canadian population.

History. Since the beginning of brain imaging in the early 1980s, Canadian researchers have been major leaders, with world-renowned centres in Montreal, Toronto, Vancouver and London. At the McConnell Brain Imaging Centre (BIC) in Montreal, a community of 150 physical, computational and neurobiological scientists and trainees investigates the normal brain and a variety of disorders in humans and in animal models. The BIC hosted the leading international meeting in the field in 1998 (2000 attendees). The BIC's innovative use of computational analysis was recognized by a Computerworld Smithsonian Award and a permanent archive in the Smithsonian Institution (www.cwheroes.org, 1999 Science category). The BIC is a hub for large-scale data analysis for many international projects (e.g. a 30 M$ NIH multi-centre project to study pediatric brain development). The BIC has received numerous CIHR Group Grant awards, major NIH consortia awards and a 35 M$ CFI award in 2000 to re-equip the brain scanner infrastructure.

HPC Requirements. Functional images (fMRI, PET) measure brain physiology, including blood [ 2a ] flow, neurotransmitter/receptor density and the brain networks that are “activated” during task performance (vision, language, memory, pain, etc.). Serial 3D imaging creates 4D datasets as the brain responds to pharmacological or cognitive stimuli. Small changes in physiology identify the regions involved, and the interactions between “activated” regions change over time through learning, development and disease. Characterizing the correlated networks underlying a particular behaviour is a massive computational undertaking: for each of N voxels⁸, N² correlation analyses are required, where N ∼ 10⁶.

⁸ A voxel is to a 3D data set what a pixel is to a 2D image.
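
The scale of the N² problem is easy to quantify, and so is the block-wise pattern in which such correlations are typically computed; in the sketch below the raw numbers follow directly from N ∼ 10⁶, while the block size and time-series length of the toy example are illustrative choices rather than BIC parameters:

```python
# Scale of the voxelwise correlation problem, plus the block-wise pattern in
# which it is typically attacked.  Block size and time-series length are
# illustrative choices only.
import numpy as np

n_voxels = 1_000_000
n_pairs = n_voxels ** 2                       # N^2 correlation analyses
full_matrix_tb = n_pairs * 4 / 1e12           # float32 correlation matrix
print(f"correlations : {n_pairs:.1e}")                        # 1.0e+12
print(f"full matrix  : ~{full_matrix_tb:.0f} TB in float32")  # ~4 TB

# Block-wise computation on a tiny synthetic dataset: a time series of length
# T per voxel, correlated block against the full set via a matrix product.
T, n_toy, block = 120, 4_000, 1_000
data = np.random.default_rng(1).standard_normal((n_toy, T))
data -= data.mean(axis=1, keepdims=True)
data /= np.linalg.norm(data, axis=1, keepdims=True)

for start in range(0, n_toy, block):
    corr_block = data[start:start + block] @ data.T    # (block, n_toy) correlations
print("last block shape:", corr_block.shape)
```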


Similar strategies are employed for structural imaging. Much of the analysis involves iterative deformation of 3D volumes (∼ 10⁶ voxels) or folded 2D cortical manifolds (∼ 300K vertices). Tensor maps of fiber tracts in the brain are extracted using 3D path analysis. Subtle anatomical signals are detected by applying these analyses to large numbers (thousands) of brain datasets.

Data analysis requires compute-intensive image processing techniques, all in 3 or 4D:
1. Tensor analysis of fiber pathways throughout the brain;
2. Non-linear 3D image deformation maps each brain to a common coordinate space;
3. Segmentation of the 3D images to assign an anatomical ‘label’ to each voxel;
4. Finite-element modeling of the cortical surface's folding patterns;
5. Statistical detection of voxels with significant changes in structural or functional signal;
6. MRI/PET analysis using Bayesian formalism and differential equations at each voxel;
7. Simulation experiments which model data acquisition using Monte Carlo techniques.

These steps are fully automated and are performed within a “pipeline” environment that pro-cesses large image databases for local researchers and academic collaborative networks. Thesepipelines also apply in the commercial arena. Clinical trials of new pharmaceuticals traditionallyemploy subjective clinical assessment of patient status. Imaging allows objective, in-vivo, quanti-tative assessment of disease progression and treatment impact. In 1997, the BIC completed thefirst fully-automated analysis of a phase III clinical trial image database (6000+ 3D datasets from14 international centres).

The major limiting factor in brain imaging research is computational capacity. The scanners [ 2b ] generate very large amounts of data (present BIC capacity is 25 TB), and we cannot keep up with the computational demands of complex, iterative 3D voxelwise analysis on each of perhaps thousands of brains. Many important questions have to be addressed at reduced spatial resolution or not explored at all. Canadian brain imaging researchers need continuous access to high-end HPC to maintain their position at the forefront of the international brain imaging research field.

7.11 Large text studies

The role of HPC in Textual Studies. We live in an age of excess information. In a study titled “How Much Information?”, researchers estimated that 5 exabytes of new information are generated each year⁹. Google, before it stopped advertising how many pages it had indexed, could search 8,168,684,336 Web pages, and the Wayback Machine (Internet Archive) claims to have 55 billion web pages archived for browsing. In short, researchers face an excess of information, much of which is in textual form, and we need to develop new tools for analyzing large-scale collections that are too large to read in traditional ways. Researchers who work with textual evidence, from literary studies to law, need HPC to help create custom aggregations for focused research that can be combined with analytical tools capable of large-scale text data mining.

One way to think of the problem is to look at the scale of the text collections that research tools were designed to handle. Early concording tools developed in the 1960s and 1970s, like OCP (Oxford Concordance Program) and Micro-OCP, were designed to create a concordance for a single coherent text; these were batch programs that produced a concordance that could be formatted for printing. Tools like TACT and TACTweb, both developed in Canada, were designed to handle single texts or collections of works, like a corpus of work by the same author. Today, thanks to the breadth and energy of digitization projects, there is on the Web an exploding amount of information of interest to humanities scholars. Whether one is interested in 18th-century literature or in an aspect of popular culture and discourse, like the discussion of “cool” among teens, there is an excess of information now in digital form that can be gathered and studied. Some large-scale text databases have emerged, like CanLII, ARTFL and the Women Writers Project, built with custom retrieval tools optimized for their size. What we do not have are generalized tools for gathering and studying large-scale collections, or ways of using the enormous collections, like Google's, that have emerged.

⁹ See www.sims.berkeley.edu:8000/research/projects/how-much-info-2003/

Canadian HPC Successes. Humanities researchers and others concerned with textual evidence [ 1a,1b,1e ] in law, library science and informatics are at the forefront of the field of computer-assisted text analysis. The TAPoR (Text Analysis Portal for Research) project, which involves researchers at 6 Canadian universities, is leading the development of digital collections, of innovative techniques for analyzing literary and linguistic texts, and of a community portal that provides open access to web-service text tools (see www.tapor.ca). The Dictionary of Old English project and the LEME (Lexicons of Early Modern English) project are unique historical dictionary projects providing foundational tools for the study of historical texts (see www.doe.utoronto.ca/ and leme.library.utoronto.ca). LexUM, the Universite de Montreal's justice system technologies laboratory, is a leader in the field of legal data processing. LexUM is a partner in the development of CanLII, a large-scale database of legislative and judicial texts, as well as legal commentaries, from federal, provincial and territorial jurisdictions on a single Web site (see www.lexum.umontreal.ca and www.canlii.org).

Community Strength. The social sciences and humanities currently include over 18,000 full-time faculty, who represent 54% of the full-time professors in Canadian universities. This community depends for its research on scholarly texts, from primary sources to journals. While most members use electronic resources in some form, few use computer-assisted text analysis and even fewer use HPC. Focused research groups like TAPoR are the link with HPC consortia like SHARCNET. TAPoR is a national project involving 6 universities and proposing to expand to 9; it involves the major players in textual computing and has the expertise to bridge text research and HPC.

HPC Requirements. Data empires or large-scale collections, from the perspective of computinghumanists, have the following features:

• They are too large to handle with traditional research practices.

• They are heterogeneous in format.

• They are often multimedia rich in that they are not just texts in the sense of alphabetic datameant to be read but include page images, digital audio, digital video and other media objects.

• They are not coherent in the sense that they cannot be treated as corpora with an internallogic the way one can treat all the writings of a particular author.

The challenge is how to think about such large collections and use them in disciplined research.The HPC challenges include developing:

• Ways of gathering and aggregating large-scale collections for the purpose of study especiallywhen distributed. We have to develop virtual aggregation models.

• Ways of understanding the scope of these custom aggregations so that one can orient research questions to them. We need to be able to run mining tools on distributed aggregations.

• Ways of asking deep questions of these aggregations. We need to be able to run analytical tools on subsets of these aggregations (a minimal example of such a tool is sketched after this list).
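
As referenced in the last item above, the following is a minimal keyword-in-context (KWIC) pass of the kind that would be run, in parallel, over subsets of a large aggregation; the tiny in-memory corpus stands in for distributed collections and is purely illustrative:

```python
# Minimal keyword-in-context (KWIC) pass of the kind applied, in parallel, to
# subsets of a large aggregation.  The corpus here is a tiny in-memory stand-in
# for distributed collections of documents.
import re

corpus = {
    "doc1": "The cool of the evening settled over the lake.",
    "doc2": "Teens described the new phone as cool, though the term keeps shifting.",
}

def kwic(documents, keyword, width=4):
    """Yield (doc_id, left context, keyword, right context) for each hit."""
    for doc_id, text in documents.items():
        tokens = re.findall(r"\w+", text.lower())
        for i, tok in enumerate(tokens):
            if tok == keyword:
                yield doc_id, tokens[max(0, i - width):i], tok, tokens[i + 1:i + 1 + width]

for hit in kwic(corpus, "cool"):
    print(hit)
```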

These types of problems are not new to HPC. Indexing, retrieval and data mining on large collections of data have been the subject of research at groups like the Automated Learning Group at the NCSA. What is new is adapting these techniques, often developed for business or biomedical data, to textual research in history and human culture. It is not the case that there are “grand problems” in the arts and humanities simply waiting for text mining techniques: the very questions change when you ask about heterogeneous and incoherent collections. The most dramatic challenge these text empires pose is to our understanding of what research in the humanities is. We are at a threshold where we can remain the disciplines of the book (and archive) or become the disciplines of human art and expression. If we are about expression, we have to adapt our research practices and questions to a different scale of information. We will have to learn to [ 2b ] use mining techniques, visualization techniques, and statistical techniques suited to the scale of information in the digital age. The textual disciplines may end up being the major users of HPC.

7.12 Collaborative Visualization

The Problem and its Importance. Visualization has been widely recognized as a critical component of modern computational science. The US NIH/NSF report on Visualization Research Challenges (2005) states that “Visualization is poised to break through from an important niche area to a pervasive commodity...” This breakthrough will be driven by the need for computational scientists to understand increasingly complex data. During the past 20 years the world has experienced an “information big bang”: new information produced in the years since 2003 exceeds the information contained in all previously created documents, and of the information produced since 2003, more than 90% takes digital form, vastly exceeding information produced in paper and film forms. Among the greatest scientific challenges of the 21st century is to effectively understand and make use of this vast amount of information. If we are to use information to make discoveries in science, engineering, medicine, art, and the humanities, we must create new theories, techniques, and methods for its management and analysis. The goal of visualization is to bridge the gap between the data, the computation that produces the data, and the human consumers of the data, giving researchers the ability to share complex visualizations, control remote computations, and interact with both their visualization and computational environments in an intuitive and natural manner.

HPC Requirements. Visualizing large, complex data sets requires the tight coupling of data [ 2b ] storage, computation, and graphics hardware. Computational simulations that produce millions of data points per time step, running over many time steps, produce vast amounts of data, and visualizing such time-varying data can require memory in the tens of gigabytes (GB). In addition, the extraction of graphics primitives from raw data and the processing of those primitives into an effective visualization require the use of both parallel computation and parallel graphics, in the form of CPU and GPU (graphics processing unit) clusters. As data sets scale, so must the computation and graphics.
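
A quick estimate shows how rapidly time-varying simulation output outgrows a single workstation; every number below is an illustrative assumption rather than a measured workload:

```python
# Quick estimate of time-varying simulation output sizes; every number below
# is an illustrative assumption rather than a measured workload.
points_per_step = 10e6        # "millions of data points per time step"
variables = 5                 # e.g. pressure, temperature, three velocity components
bytes_per_value = 8           # double precision
time_steps = 1_000

step_gb = points_per_step * variables * bytes_per_value / 1e9
total_gb = step_gb * time_steps
print(f"per time step : {step_gb:.1f} GB")     # 0.4 GB
print(f"full run      : {total_gb:.0f} GB")    # 400 GB
```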

Canadian Successes. In addition to pure and applied visualization research, there is a wide range of research projects in Canada that make extensive use of visualization as a critical tool in support of their research. Example projects are described below:

Collaborative Computational Steering. The AMMI group at the U. of Alberta, in collaboration with the ACE group at SFU, has developed an advanced computational steering, visualization, and collaboration environment for computational science. Driven by the visualization needs of research on the hydrodynamics of the Earth's core by Dr. Moritz Heimpel at the U. of Alberta, this project has expanded to include a wide range of scientific domains, including CFD and the simulation of biological cellular processes (Project CyberCell). These projects, with increasing data set sizes, require both increasingly capable visualization algorithms and visualization hardware to enable researchers to interactively explore these complex simulations. This research has recently expanded to include participation in the Global Lambda Visualization Facility (GLVF) project, an international testbed for advanced visualization, collaboration, and computational steering (www.evl.uic.edu/cavern/glvf/). This participation is facilitated by the excellent collaboration/visualization infrastructure provided by WestGrid (www.westgrid.ca/collabvis). [1a,b]

Parallel Visualization. Research at the SCIRF lab (scirf.cs.sfu.ca) at SFU targets the development of new interactive visualization algorithms for large data through data compression, hierarchical and adaptive visualization methods, parallelization, and leveraging the power of GPUs. In particular, developing new algorithms for data pre-processing and visualization directly on GPUs can provide a drastic increase in algorithm performance. Such computations are currently limited to lab systems that have relatively small numbers of processors and GPUs, and as a result we are restricted to data set sizes of a few GB. To cope with increasing data set sizes, we are faced with the need to scale these computations to large parallel GPU clusters. Such systems do not currently exist, but their availability, through the NPF program, would enable us to implement interactive visualization algorithms for data set sizes that are currently out of reach. [2a,b]
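
As one concrete illustration of the hierarchical methods mentioned above, the sketch below builds a multi-resolution pyramid of a volume in NumPy so that interaction can begin on a coarse level and refine on demand. The array sizes are assumptions for illustration; the approach described in this case study would run such pre-processing in parallel and on GPUs.

```python
# A minimal sketch (illustrative assumptions only) of hierarchical data
# reduction for interactive visualization: build a multi-resolution pyramid
# of a volume by repeated 2x2x2 block averaging.
import numpy as np

def downsample(volume: np.ndarray) -> np.ndarray:
    """Halve each dimension by averaging 2x2x2 blocks."""
    x, y, z = (s // 2 * 2 for s in volume.shape)  # trim odd edges if any
    v = volume[:x, :y, :z]
    return v.reshape(x // 2, 2, y // 2, 2, z // 2, 2).mean(axis=(1, 3, 5))

def build_pyramid(volume: np.ndarray, levels: int = 4) -> list[np.ndarray]:
    """Return [full-resolution, 1/2, 1/4, ...] representations of the volume."""
    pyramid = [volume]
    for _ in range(levels - 1):
        pyramid.append(downsample(pyramid[-1]))
    return pyramid

if __name__ == "__main__":
    vol = np.random.rand(256, 256, 256).astype(np.float32)  # stand-in data set
    for level, v in enumerate(build_pyramid(vol)):
        print(f"level {level}: shape {v.shape}, {v.nbytes / 2**20:.1f} MiB")
```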

Canadian Centre for Behavioural Neuroscience (CCBN). The development of new imaging technology applicable to biological systems has spurred a revolution in neuroscience (see Case Study 7.10). The field stands on the brink of being able to see unprecedented anatomical detail, levels of key signaling molecules, and dynamic changes related to information processing, progression of brain pathology, and the evolution of therapeutic effects. The most effective mechanism to deal with this increase in imaging data is through image analysis and visualization. The application of visualization technologies improves the ability to observe molecular signals in spectroscopic imaging approaches at optical-UV to infrared wavelengths, reduces the time required to capture and analyze images, and improves the ability to observe specific cell types. For researchers at the CCBN (University of Lethbridge), these technologies are critical in the areas of functional brain imaging, cell-specific imaging, and the detection and evolution of neuropathology.

Biochemistry and Medical Genetics. To understand the three-dimensional (3D) organization of the mammalian nucleus in normal, immortalized and tumor cells, visualization of the nuclei is key. Previous approaches using two-dimensional (2D) imaging have obvious limitations: nuclear structures found in different focal planes cannot be properly displayed and therefore cannot be properly visualized. The result is that the interpretation of the data is incomplete; in the worst-case scenario, it is wrong. This is a limitation for basic and clinical research. Three-dimensional imaging, coupled with visualization of objects within the 3D space of the nucleus, has enabled researchers to view the organization of such objects in a qualitative manner. For the first time, objects are viewed in the 3D space of the nucleus with high accuracy, and the interpretation of the data is unbiased and objective. Researchers are able to assess parameters such as the spatial relationships of objects to each other, the overall distribution of multiple objects, and their relative sizes. None of this is possible without visualization. Dr. Sabine Mai at the University of Manitoba represents the 3D technology node for Canada in international collaborations with the Netherlands, Italy and the US. [2a,b]

X-Ray Crystallography. Researchers at McMaster University's Analytical X-Ray Diffraction Facility use 2D CCD detectors for collecting diffraction data on single crystals (Chemical Crystallography) for molecular structure analyses and on polycrystalline solids (alloys, thin films, polymers, etc.) for crystallite orientation distribution (texture) analyses. Until recently, researchers have not been able to view the data set as one complete diffraction volume. With emerging visualization software and hardware technologies, researchers can now examine the full 3D diffraction pattern and all pole figures in a single visualization. For example, for single-crystal data, a regular series of Bragg diffraction spots (points) is used for solving the structure; interesting packing disorders result in diffraction intensity between the spots. These new visualization techniques give us critical insights into these unusual crystal packings. [1a,b,2b]

Physics-Based Simulation of Complex Phenomena. Researchers at the Computer Vision and Systems Laboratory (U. Laval) and non-academic partners are developing a platform for physics-based simulation coupled with distributed immersive visualization environments (a 4-wall CAVE and other systems). The research aims to combine complex geometric and photometric models of urban areas with GIS information and real-time physics-based simulation to train search and rescue teams. [2b]


APPENDIX : Glossary

ACEnet      Atlantic Computational Excellence Network
CFI         Canada Foundation for Innovation
CIHR        Canadian Institutes of Health Research
CITA        Canadian Institute for Theoretical Astrophysics
CLUMEQ      Consortium Laval, Université du Québec, McGill and Eastern Quebec for HPC
DFO         Department of Fisheries and Oceans
HPCVL       High Performance Computing Virtual Laboratory
HQP         Highly qualified personnel (PDFs, analysts, graduate students, etc.)
IOF         Infrastructure Operating Fund (CFI)
LRP         Long Range Plan for HPC in Canada, published through C3.ca
MFA         Major Facilities Access program (NSERC)
MITACS      Mathematics of Information Technology and Complex Systems
MRP         Major Resource Provider (now synonymous with a consortium)
MSC         Meteorological Service of Canada
NCE         Network of Centres of Excellence (NSERC)
NIC         National Initiatives Committee, authoring this NPF proposal
NINT        National Institute for Nanotechnology
NPF         National Platforms Fund (CFI)
NSC         National Steering Committee, for the NPF
NSERC       Natural Sciences and Engineering Research Council of Canada
ORAN        Optical Regional Advanced Network
RFP         Request For Proposal
RQCHP       Réseau Québécois de Calcul de Haute Performance
SciNet      Science Network
SHARCNET    Shared Hierarchical Academic Research Computing Network
SSHRC       Social Sciences and Humanities Research Council of Canada
TASP        Technical Analyst Support Program
TRIUMF      Canada's national laboratory for particle physics research
WestGrid    The Western Canada Research Computing Grid


APPENDIX : Evaluation criteria from the CFI guidelines

1. Results and outcomes of past HPC investments. Past investments:
   a. enabled leading-edge research on computationally-challenging questions that would not have been possible to undertake without the HPC resources;
   b. enabled institutions and their researchers to gain a competitive advantage nationally and internationally;
   c. attracted and retained excellent researchers;
   d. enhanced the training of highly qualified personnel through research;
   e. strengthened partnerships among institutions and enhanced the efficiency and effectiveness of HPC resources;
   f. provided resources that are used to their full potential;
   g. contributed to bringing benefits to the country in terms of improvements to society, the quality of life, health and the environment, or contributed to job creation and economic growth.

2. Quality of proposed research or technology development and appropriateness of HPC resources needed. The investments will:
   a. enable computationally challenging research with the potential of being internationally competitive, innovative, and transformative, and that could not be pursued otherwise;
   b. meet the needs of institutions and their researchers effectively and efficiently;
   c. provide a high degree of suitability and usability;
   d. be potentially scalable, extendable or otherwise upgradable in the future;
   e. incorporate reliable, robust system software essential to optimal sustained performance;
   f. provide a suitable and sustainable physical environment to accommodate the proposed systems, including adequate floor space, power, cooling, etc.

3. Effectiveness of the proposed integrated strategy of investments in HPC in contributing to strengthening the national capacity for innovation. The investments will:
   a. build regional, provincial, and national capacity for innovation and for international competitiveness;
   b. ensure complementarities and synergies among regional facilities;
   c. combine the expertise of regional facilities to ensure researchers have access to unprecedented depth and support in the application of HPC to the most computationally challenging research;
   d. attract and retain the best researchers or those with the highest potential;
   e. create a stimulating and enriched environment for training highly qualified personnel;
   f. strengthen multidisciplinary and interdisciplinary approaches, collaborations among researchers, and partnerships among institutions, sectors, or regions;
   g. ensure effective governance, including the management, accessibility, operation and maintenance of HPC resources on an ongoing basis;
   h. address all aspects and costs as well as long-term sustainability issues.

4. The potential benefits to Canada of the research or technology development enabled by HPC. The activities enabled by the investments will:
   a. contribute to job creation and economic growth in Canada;
   b. support improvements to society, quality of life, health, and the environment, including the creation of new policies in these areas.