Working towards Sustainable Software for Science (an NSF and community view). Daniel S. Katz, Program Director, Division of Advanced Cyberinfrastructure. ESIP Summer Meeting 2014, Copper Mountain - July 9

Working towards Sustainable Software for Science (an NSF and community view)

Sep 08, 2014


Technology

Daniel S. Katz

This talk looks at the goal of sustainable scientific software from the point-of-view of an NSF program officer who funds software as infrastructure, meaning software that enables a community beyond the developers to perform research, and from the point-of-view of the attendees of the First Workshop on Sustainable Software for Science: Practice and Experiences (WSSSPE1, http://wssspe.researchcomputing.org.uk/wssspe1/). Issues to be discussed include what sustainability means, funding, incentives, career paths, and communities.
Transcript
Page 1: Working towards Sustainable Software for Science (an NSF and community view)

Working towards Sustainable Software for Science

(an NSF and community view)

Daniel S. Katz

Program Director, Division of Advanced Cyberinfrastructure

ESIP Summer Meeting 2014, Copper Mountain - July 9

Page 2: Working towards Sustainable Software for Science (an NSF and community view)

Big Science and Infrastructure

• Hurricanes affect humans
• Multi-physics: atmosphere, ocean, coast, vegetation, soil
– Sensors and data as inputs
• Humans: what have they built, where are they, what will they do
– Data and models as inputs
• Infrastructure:
– Urgent/scheduled processing, workflows
– Software applications, workflows
– Networks
– Decision-support systems, visualization
– Data storage, interoperability

Page 3: Working towards Sustainable Software for Science (an NSF and community view)

Long-tail Science and Infrastructure

• Exploding data volumes & powerful simulation methods mean that more researchers need advanced infrastructure

• Such “long-tail” researchers cannot afford expensive expertise and unique infrastructure

• Challenge: Outsource and/or automate time-consuming common processes
– Tools, e.g., Globus Online for data management
– Gateways, e.g., nanoHUB, CIPRES, access to scientific simulation software

NSF grant size, 2007. (“Dark data in the long tail of science”, B. Heidorn)

Page 4: Working towards Sustainable Software for Science (an NSF and community view)

Science Infrastructure Challenges
• Science
– Larger teams, more disciplines, more countries
• Data
– Size, complexity, rates all increasing rapidly
– Need for interoperability (systems and policies)
• Systems
– More cores, more architectures (GPUs), more memory hierarchy
– Changing balances (latency vs bandwidth)
– Changing limits (power, funds)
– System architecture and business models changing (clouds)
– Network capacity growing; increased networks -> increased security
• Software
– Multiphysics algorithms, frameworks
– Programming models and abstractions for science, data, and hardware
– V&V, reproducibility, fault tolerance
• People
– Education and training
– Career paths
– Credit and attribution

Page 5: Working towards Sustainable Software for Science (an NSF and community view)

Cyberinfrastructure

“Cyberinfrastructure consists of computing systems, data storage systems, advanced instruments and data repositories, visualization environments, and people, all linked together by software and high performance networks, to improve research productivity and enable breakthroughs not otherwise possible.”

-- Craig Stewart

Page 6: Working towards Sustainable Software for Science (an NSF and community view)

Cyberinfrastructure Framework for 21st Century Science and Engineering (CIF21)

• Cross-NSF portfolio of activities to provide integrated cyber resources that will enable new multidisciplinary research opportunities in all science and engineering fields by leveraging ongoing investments and using common approaches and components (http://www.nsf.gov/cif21)

• ACCI task force reports (http://www.nsf.gov/od/oci/taskforces/index.jsp)

– Campus Bridging, Cyberlearning & Workforce Development, Data & Visualization, Grand Challenges, HPC, Software for Science & Engineering

• Vision and Strategy Reports
– ACI - http://www.nsf.gov/publications/pub_summ.jsp?ods_key=nsf12051
– Software - http://www.nsf.gov/publications/pub_summ.jsp?ods_key=nsf12113
– Data - http://www.nsf.gov/od/oci/cif21/DataVision2012.pdf
• Implementation
– Implementation of Software Vision - http://www.nsf.gov/funding/pgm_summ.jsp?pims_id=504817

Page 7: Working towards Sustainable Software for Science (an NSF and community view)

Software as Infrastructure

(Diagram layers: Science, Software, Computing Infrastructure)

• Software (including services) essential for the bulk of science
– About half the papers in recent issues of Science were software-intensive projects
– Research becoming dependent upon advances in software
– Significant software development being conducted across NSF: NEON, OOI, NEES, NCN, iPlant, etc.
• Wide range of software types: system, applications, modeling, gateways, analysis, algorithms, middleware, libraries
• Software is not a one-time effort, it must be sustained
• Development, production, and maintenance are people intensive
• Software life-times are long vs hardware
• Software has under-appreciated value

For software to be sustainable, it must become infrastructure

Page 8: Working towards Sustainable Software for Science (an NSF and community view)

Software Vision

NSF will take a leadership role in providing software as enabling infrastructure for science and engineering research and education, and in promoting software as a principal component of its comprehensive CIF21 vision

• ...
• Reducing the complexity of software will be a unifying theme across the CIF21 vision, advancing both the use and development of new software and promoting the ubiquitous integration of scientific software across all disciplines, in education, and in industry

– A Vision and Strategy for Software for Science, Engineering, and Education – NSF 12-113

Page 9: Working towards Sustainable Software for Science (an NSF and community view)

Create and maintain a software ecosystem providing new capabilities that advance and accelerate scientific inquiry at unprecedented complexity and scale

Support the foundational research necessary to continue to efficiently advance scientific software

Enable transformative, interdisciplinary, collaborative, science and engineering research and education through the use of advanced software and services

Develop a next generation diverse workforce of scientists and engineers equipped with essential skills to use and develop software, with software and services used in both the research and education process

Infrastructure Role & Lifecycle

Page 10: Working towards Sustainable Software for Science (an NSF and community view)

ACI Software Cluster Programs
• ACI co-funding (research -> infrastructure)
– Exploiting Parallelism and Scalability (XPS)
• CISE (including ACI) program for foundational research
– Computational and Data-Enabled Science & Engineering (CDS&E)
• Virtual program for science-specific proofing of algorithms and tools
– ENG, MPS, ACI now; BIO, GEO, IIS in FY15?
• ACI lead-funding (infrastructure)
– Software Infrastructure for Sustained Innovation (SI2)
• Transform innovations in research and education into sustained software resources that are an integral part of the cyberinfrastructure
– Includes all NSF directorates

Page 11: Working towards Sustainable Software for Science (an NSF and community view)

Software Infrastructure Projects
• 5 rounds of funding, 65 SSEs
• 4 rounds of funding, 35 SSIs
• 2 rounds of funding, 14 S2I2 conceptualizations

See http://bit.ly/sw-ci for current projects

Page 12: Working towards Sustainable Software for Science (an NSF and community view)

SI2 Solicitation and Decision Process

• Cross-NSF software working group with members from all directorates

• Determined how SI2 fits with other NSF programs that support software
– See: Implementation of NSF Software Vision - http://www.nsf.gov/funding/pgm_summ.jsp?pims_id=504817
• Discusses solicitations, determines who will participate in each
• Discusses and participates in review process
• Works together to fund worthy proposals

Page 13: Working towards Sustainable Software for Science (an NSF and community view)

SI2 Solicitation and Decision Process

• Proposal reviews well -> my role becomes matchmaking
– I want to find program officers with funds, and convince them that they should spend their funds on the proposal
• Unidisciplinary project (e.g., bioinformatics app)
– Work with single program officer, either likes the proposal or not
• Multidisciplinary project (e.g., molecular dynamics)
– Work with multiple program officers, ...
• Omnidisciplinary project (e.g., http, math library)
– Try to work with all program officers, often am told “it’s your responsibility”

To judge software, need to understand/forecast impact

Page 14: Working towards Sustainable Software for Science (an NSF and community view)

ACI Software Cluster Programs
• In these programs, ACI works with other NSF units to support projects that lead to software as an element of infrastructure
• Issue: amount of software that is infrastructure grows over time, and grows faster than NSF funding

Q: How can NSF ensure that software as infrastructure continues to appear, without funding all of it?
A: Incentives
• The devil is in the details
• We are exploring this now...

Page 15: Working towards Sustainable Software for Science (an NSF and community view)

Create and maintain a software ecosystem providing new capabilities that advance and accelerate scientific inquiry at unprecedented complexity and scale

Support the foundational research necessary to continue to efficiently advance scientific software

Enable transformative, interdisciplinary, collaborative, science and engineering research and education through the use of advanced software and services

Transform practice through new policies for software, addressing challenges of academic culture, open dissemination and use, reproducibility and trust, curation, sustainability, governance, citation, stewardship, and attribution of software authorship

Develop a next generation diverse workforce of scientists and engineers equipped with essential skills to use and develop software, with software and services used in both the research and education process

Infrastructure Role & Lifecycle

Page 16: Working towards Sustainable Software for Science (an NSF and community view)

Working Towards Sustainable Software for Science: Practice and Experiences (WSSSPE)

• http://wssspe.researchcomputing.org.uk
• Mailing list: http://lists.researchcomputing.org.uk/listinfo.cgi/wssspe-researchcomputing.org.uk
• First Workshop on Sustainable Software for Science: Practice and Experiences (WSSSPE1), @ SC13, 17 November 2013, Denver
– 2 keynotes, 54 accepted papers
– Discussion sessions: Developing software; Policy; Communities
– Cross-cutting (emergent) topics: Defining sustainability; Career paths
– Post-workshop paper: http://arxiv.org/abs/1404.7414
• Upcoming events:
– WSSSPE1.1, @ SciPy2014, tomorrow!, Austin
– WSSSPE2, @ SC14, 16 November 2014, New Orleans

Page 17: Working towards Sustainable Software for Science (an NSF and community view)

WSSSPE1 Context

• Science is becoming more complex
• Software is becoming more important in science
• Software is becoming part of the scientific infrastructure
– Should be preserved and reused (for reproducibility and cost effectiveness)
• But funding, culture, practice, etc. haven’t caught up
– What changes are needed?
• Bundled under the name “software sustainability”

Page 18: Working towards Sustainable Software for Science (an NSF and community view)

WSSSPE1 Motivation

• Arfon Smith (GitHub) keynote: Scientific Software and the Open Collaborative Web

• Example from data reduction in astronomy, where he needed to remove interfering effects from the device; work needed was persistent, but there was no practice of sharing this, so many researchers repeated the same calculations; ~13 person-years were wasted

• Why don’t we do better?
– Because we are taught to focus on immediate research outcomes and not on continuously improving and building on tools for research
• When we do know better, why do we not act any differently?
– Due to incentives and their lack: only the immediate products of research, not the software, are valued
• Open source community has excellent cultures of code reuse, where there is effectively low-friction collaboration through the use of repositories
– This has generally not happened in highly numerical, compiled language scientific software

Page 19: Working towards Sustainable Software for Science (an NSF and community view)

WSSSPE1: Developing Software

• Widespread agreement that developing and maintaining software is hard, but best practices can help

• Difficulty: Making sustainable software means paying attention to many facets of software design, like APIs, security, user experience, testing ...
– A single project that requires one full time software engineer may actually require fractions of different kinds of engineers
– But long-tail projects can’t even fund one FTE, let alone one that can address all these facets
• Team science (the science of teams) is important
• Lack of career paths for developers is an issue

Many of the issues in developing sustainable software are social, not technical

Page 20: Working towards Sustainable Software for Science (an NSF and community view)

WSSSPE1: Best Practices
• Communities have been built around projects, with:
– Public source code hosting, mailing lists, documentation, wikis, bug trackers, software downloads, continuous integration, software quality dashboards, and a general web presence to tie all of these things together
• Dominant common themes
– Open development and community support
– Tighter interactions among domain scientists and code developers, and developers and their users (which are often domain scientists)
– Users prefer robust and simple to use codes and platforms
– Continuous integration is good
– Sustainability is challenging for many reasons, including: changing science, changing platforms and technology, and funding
• General recommendations
– Use existing code/tools where possible
– Code well, be simple and transparent
– Use your code, nurture your community, promote, find support
– Be satisfied with less than perfection
– Keep the scientific goal in focus

Page 21: Working towards Sustainable Software for Science (an NSF and community view)

WSSSPE1: Policy

• Credit, citation, impact

Software work is inadequately visible in ways that “count” within the reputation system underlying science

– Recognition of work on scientific software, linked to questions of reward and thus motivation for particular kinds of work on scientific software

– How to fix: Software papers? DOIs for software? Altmetrics?

• Implementing Policy
– Need strategies to implement the many facets of sustainable software for scientific software, in a spectrum, between 2 extremes:

• “Co-funding” - large, multi-year collaborations, with equal emphasis on both the science and the software development

• “Software carpentry” - the scientists themselves will write and maintain their code, but need good tools to do this well

Page 22: Working towards Sustainable Software for Science (an NSF and community view)

WSSSPE1: Communities
• Communities can form around science areas (developers work together to maximize science) or technologies (technical catalysts take developments in one area and apply them to another)
• Companies or organizations can use well-tested practices to help communities to form
– E.g., hackathons, communication channels, conferences, orientations, and documentation
• Need to educate developers to produce complex, agile, and sustainable domain-specific software, through:
– Focused community workshops, summer schools, boot camps; general software development training (and science training for computer scientists)
– Widely available prototype codes
– Increased credit for software developments
– Career paths in science

Page 23: Working towards Sustainable Software for Science (an NSF and community view)

WSSSPE1: Defining Sustainability
• Sustainability for software does not just mean more money from the government
– Though it does need a committed effort over time
• Software Sustainability Maturity Model (OSS Watch, similar to discussion in Katz & Proctor)
– “When choosing software for procurement or development reuse ... you need to consider the future. While a software product may satisfy today’s needs, will it satisfy tomorrow’s needs?... In other words, is the software sustainable?”
• Sustainability not integrated into software engineering due to lack of accepted definition (Venters et al.)
• What is the goal of sustainability?
– reproducible science, or persistence, or quality, or ?
• How is success in sustainability measured?
– How does a group of developers know when they have actually achieved sustainable software?

Sustainability: the effort that happens to make the essential things continue (@pebourne)

Page 24: Working towards Sustainable Software for Science (an NSF and community view)

WSSSPE1: Career Paths
• People are essential elements of research infrastructure - they need:
– Education and training to be productive
– Career paths to remain motivated
– Incentives to move along their career paths
• It's difficult to motivate researchers to create sustainable software - why?
– Few research career paths available for supporting software
– No incentives for researchers to develop broad skill sets outside of domains
– Substantial competition from private companies
• Is there a role (career path) for non-tenure-track researchers who produce software, data, etc. in universities?
– Assuming yes, do universities recognize and support this?
• If no, how to get them to?

Page 25: Working towards Sustainable Software for Science (an NSF and community view)

WSSSPE1: Career Paths

• Educate software developers and researchers to produce sustainable software by teaching them to collaborate more effectively; diminish distinctions
– Caveat: academic communities aren’t taught how to evaluate cross-disciplinary work
– Lots of skills needed: domain science, software development, applied mathematics, computer science, etc.
– Teach general software development to researchers and general domain knowledge to software developers
– Provide focused community workshops, summer schools, and boot camps to both, and give them a chance to interact and work together on problems

• Recognize and reward software development and the creation of other digital products

• Experiment, e.g. Moore & Sloan data science centers

Page 26: Working towards Sustainable Software for Science (an NSF and community view)

Moving Forward
• WSSSPE1: Multiple linked social issues – Sustainability, Incentives, Career Paths, Communities
• Computational scientists “have a responsibility to convince their institutions, reviewers, and communities that software is scholarship, frequently more valuable than a research article” (Bourne)
• Hypothesis: better measurement of contributions can lead to rewards (incentives), leading to career paths, willingness to join communities, leading to more sustainable software
• Recent CISE/ACI & SBE/SES Dear Colleague Letter: Supporting Scientific Discovery through Norms and Practices for Software and Data Citation and Attribution (NSF 14-059, http://www.nsf.gov/pubs/2014/nsf14059/nsf14059.jsp)
– There is a lack of well-developed metrics with which to assess the impact and quality of scientific software and data
– NSF seeks to explore new norms and practices for software and data citation and attribution, so that data producers, software and tool developers, and data curators are credited
• 6 EAGERs and 3 collaborative workshops to be funded
• Other ideas welcome – in WSSSPE2? Or discussion? Or email?

Page 27: Working towards Sustainable Software for Science (an NSF and community view)

Resources

• These slides, on: http://slideshare.net/danielskatz
• NSF Software as Infrastructure Vision: http://www.nsf.gov/publications/pub_summ.jsp?ods_key=nsf12113
• Implementation of Software Vision: http://www.nsf.gov/funding/pgm_summ.jsp?pims_id=504817
• Software Infrastructure for Sustained Innovation (SI2) Program
– Scientific Software Elements (SSE) & Scientific Software Integration (SSI) solicitation: http://www.nsf.gov/publications/pub_summ.jsp?ods_key=nsf14520
– 2013 PI meeting: https://sites.google.com/site/si2pimeeting/
– 2014 PI meeting: https://sites.google.com/site/si2pimeeting2014/
– Awards: http://bit.ly/sw-ci
• Working towards Sustainable Software for Science: Practice and Experiences (WSSSPE)
– Home: http://wssspe.researchcomputing.org.uk (includes links to all slides & papers)
– 1st workshop paper: http://arxiv.org/abs/1404.7414
– 2nd workshop site: http://wssspe.researchcomputing.org.uk/wssspe2/
• NSF 14-059: “Dear Colleague Letter - Supporting Scientific Discovery through Norms and Practices for Software and Data Citation and Attribution”
– http://www.nsf.gov/pubs/2014/nsf14059/nsf14059.jsp

Page 28: Working towards Sustainable Software for Science (an NSF and community view)

Credits:
• SI2 Program:

– Current program officers: Daniel S. Katz, Rudolf Eigenmann, Sumanta Acharya, William Y. B. Chang, John C. Cherniavsky, Almadena Y. Chtchelkanova, Cheryl L. Eavey, Evelyn Goldfield, Sol Greenspan, Daryl W. Hess, Peter H. McCartney, Bogdan Mihaila, Dimitrios V. Papavassiliou, Andrew D. Pollington, Barbara Ransom, Thomas Russell, Nigel A. Sharp, Thomas Siegmund, Paul Werbos, Eva Zanzerkia

– Formerly-involved program officers: Manish Parashar, Gabrielle Allen, Eduardo Misawa, Jean Cottam-Allen

• WSSSPE:
– Organizers: Daniel S. Katz, Gabrielle Allen, Neil Chue Hong, Karen Cranston, Manish Parashar, David Proctor, Matthew Turk, Colin C. Venters, Nancy Wilkins-Diehr

– Summary paper authors: Daniel S. Katz, Sou-Cheng T. Choi, Hilmar Lapp, Ketan Maheshwari, Frank Löffler, Matthew Turk, Marcus D. Hanwell, Nancy Wilkins-Diehr, James Hetherington, James Howison, Shel Swenson, Gabrielle D. Allen, Anne C. Elster, Bruce Berriman, Colin Venters

– Keynote speakers: Phil Bourne, Arfon Smith