Practicing Open Science
William J Schroeder, Kitware, Inc.; Brian Wylie, Sandia National Labs; Marcus Hanwell, Kitware, Inc.

Oscon 2011 Practicing Open Science

May 10, 2015



Marcus Hanwell
Transcript
Page 1: Oscon 2011 Practicing Open Science


Page 2

Speakers & Topics

§  William Schroeder, President & CEO, Kitware, Inc. -  The whys and hows of Open Science

§  Dr. Marcus Hanwell, R&D Engineer, Kitware, Inc. -  Building an open-source research program (in Chemistry)

§  Brian Wylie, Sandia National Labs -  Research collaborations from a government perspective

Page 3

The Scientific Method

•  Document    •  Share  

•  Data  •  Methodology  

•  Archive  

Galileo Galilei 1613

Page 4

Open Science

§  Open Documents -  Hypothesis -  Descriptions -  Results

§  Open Data

§  Open Methodology -  Experimental apparatus -  Software -  Workflow -  Parameter Sets

Ensuring reproducibility

If it isn’t reproducible, it isn’t science

REPRODUCIBILITY
-  Positive evidence: accumulate support
-  Negative evidence: disprove hypothesis

Page 5

Example: OSA Interactive Science Publishing (ISP)

§  Augmented PDF
§  Contains links to executable viewer
§  Downloads data and viewer as necessary to reproduce paper images (results)

Page 6

Example: Insight Journal §  Timely publication of papers, data, and software §  Evaluated automatically; further reviewed by the community

[Diagram components: Author, Code, Input Data, Results Data, PDF doc, Journal Git Repository, Build Machines, Web Site]
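The slide says Insight Journal submissions are evaluated automatically on build machines but does not spell out the checks. As a hedged illustration only, one such check could compare the results data an author submitted against results regenerated from the submitted code and input data. The function names `checksum` and `results_match`, and the tolerance values, are hypothetical assumptions, not the journal's actual implementation.

```python
import hashlib
import math
from typing import Sequence


def checksum(data: bytes) -> str:
    """Exact-match fingerprint for outputs expected to be bit-identical (hypothetical helper)."""
    return hashlib.sha256(data).hexdigest()


def results_match(published: Sequence[float], regenerated: Sequence[float],
                  rel_tol: float = 1e-6, abs_tol: float = 1e-9) -> bool:
    """Compare numeric results element-wise within a tolerance (hypothetical helper).

    Floating-point output rarely matches bit-for-bit across compilers and
    platforms, so a tolerance-based comparison is used instead of equality.
    """
    if len(published) != len(regenerated):
        return False
    return all(math.isclose(p, r, rel_tol=rel_tol, abs_tol=abs_tol)
               for p, r in zip(published, regenerated))


# Hypothetical example: values an author submitted vs. values a build machine regenerated.
published = [0.5, 1.25, 3.0]
regenerated = [0.5, 1.2500000001, 3.0]
print(results_match(published, regenerated))  # True
print(checksum(b"abc") == checksum(b"abc"))   # True
```

Using a tolerance rather than exact equality is a design choice: it accepts tiny platform-dependent floating-point differences while still flagging results that genuinely fail to reproduce.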

Page 7

Benefits of Open Science

§  Collaboration -  Leveraging international communities and expertise

§  Agile Innovation -  Facilitate technology mashups -  Move science to application faster -  More focus on technology; less on protection

§  Business Models -  Growing the pie, creating new opportunities -  Customization, software integration

“…much of our intelligence and creativity results from interactions with tools and artifacts and from collaborating with other individuals.”

-- Shneiderman

Page 8

Example: Collaboration §  NA-MIC, an NIH National Center for Biomedical Computing §  Developing the open-source NA-MIC Kit; 3D Slicer application

Page 9

Example: Agile Innovation (Open Source for Medical Imaging)

Creating VTK (Visualization Toolkit)

Led to the creation of:
-  ITK
-  VolView
-  BioImageXD
-  Osirix
-  MedINRIA
-  VisTrails
-  NIH / NCI caBIG – XIP
-  VR-Renderer
-  IGSTK
-  ParaView
-  Etc.

and finally…

Page 10

Example: Business Models

§  Kitware: Building open source collaboration platforms -  The usual support and training -  Consulting -  Engaging in collaborative R&D -  Providing technology integration services, aka creating custom solutions

CMake

CDash

Page 11

The Open Technology Highway

§  Provide an open infrastructure -  Support research, teaching, non-profit and commercial activities -  Any (legal) activity can hang off of the highway

-  Spur innovation, create opportunities -  Get from idea to product faster

-  Do not have to replicate technology -  Too many toll gates (i.e., closed systems, unreasonable IP) slow everything down -  Prefer non-reciprocal licenses

Page 12

Next Up

§  Marcus: Building a research program for chemistry

§  Brian: open science and research collaboration from a government perspective

Page 13

Open Chemistry Growing a Research Program Through Open Source Dr. Marcus Hanwell, Kitware, Inc.

Page 14

Grassroots Effort §  Bootstrapped several efforts without funding -  Spare time -  Parts of other projects when possible

§  Formed an “unorganization” – Blue Obelisk -  Published first article in 2005 -  Open data, open standards and open source -  Meet at ACS and other conferences when possible -  Follow-up article currently in press

§  Quixote collaboration more recently -  Provide meaningful data storage and exchange -  Principally targeting computational chemistry

Page 15

The Early Years §  Avogadro project started in 2006 §  First funded work in 2007 by Marcus Hanwell

-  Google Summer of Code student -  Spent the summer of the final Ph.D. year coding -  Funded as part of the KDE project – Kalzium editor

§  Built on several other open source projects -  Qt, Eigen, Open Babel, Blue Obelisk Data Repository

§  Also uses open standards, such as OpenGL for rendering §  Cross platform, open source stack

Page 16

Community Tools, Standards and Resources §  Make extensive use of Qt for standard GUI elements

-  Much more than just GUI – multithreading, web resources -  Avogadro chosen as an outstanding example of “Qt in Use” -  Marcus Hanwell recently chosen as a “Qt Ambassador”

§  OpenGL for cross platform 3D rendering -  Accelerated rendering of 3D molecular geometry -  Facilitates interacting with the scene -  Use of GLSL for impressive, fast rendering

§  Open Babel for chemical input/output and more -  There are a lot of chemical file formats… -  Has a lot of chemical knowledge, e.g. bond perception

§  Git for distributed version control -  We work across multiple sites, time zones and institutions -  Gerrit for code review more recently – improving code quality
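Bond perception, cited above as an example of Open Babel's chemical knowledge, infers which atoms are bonded when a file supplies only atomic coordinates. The following is a simplified sketch of a common distance-based heuristic, not Open Babel's actual implementation: two atoms are treated as bonded if their separation is at most the sum of their covalent radii plus a tolerance. The radii table (approximate values in ångströms, covering only H, C, N and O) and the 0.45 Å tolerance are assumptions chosen for illustration.

```python
import math
from itertools import combinations

# Approximate single-bond covalent radii in angstroms (illustrative subset only).
COVALENT_RADII = {"H": 0.31, "C": 0.76, "N": 0.71, "O": 0.66}


def perceive_bonds(atoms, tolerance=0.45):
    """Return index pairs (i, j) of atoms closer than their radii sum plus a tolerance.

    atoms is a list of (element, (x, y, z)) tuples with coordinates in angstroms.
    """
    bonds = []
    for (i, (el_i, pos_i)), (j, (el_j, pos_j)) in combinations(enumerate(atoms), 2):
        cutoff = COVALENT_RADII[el_i] + COVALENT_RADII[el_j] + tolerance
        if math.dist(pos_i, pos_j) <= cutoff:
            bonds.append((i, j))
    return bonds


# Water: each O-H distance is about 0.96 A; the H-H distance (about 1.51 A) exceeds its cutoff.
water = [
    ("O", (0.000, 0.000, 0.000)),
    ("H", (0.757, 0.586, 0.000)),
    ("H", (-0.757, 0.586, 0.000)),
]
print(perceive_bonds(water))  # [(0, 1), (0, 2)]
```

This only finds connectivity; real chemistry toolkits go further, for example by assigning bond orders and checking valences, which is part of why reusing a library such as Open Babel is attractive.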

Page 17

Evangelizing: Getting the Message Out §  Traditional social media used to communicate

-  Blogs, Planets, Twitter, Identi.ca, Friendfeed, Google+

§  Talks and posters at conferences -  Open source conferences talking about chemistry -  Chemistry conferences talking about open source chemistry

§  Several meetings and workshops about open chemistry -  Daresbury Laboratory: Chemical Visualization and Quixote -  NIH National Cancer Institute – Databases and Open Chemistry

§  Publications in the traditional journals §  Screencasts showing off what the software can do §  In person workshops and training sessions

Page 18

Bringing About Real Change §  2011 is the “International Year of Chemistry” §  Chemistry has been quite closed traditionally §  We are working hard to change this §  Recently led a Phase I SBIR to develop “open chemistry tools”

-  GUI acting as the center of the chemical workflow -  Database application using MongoDB, chemically aware -  Cluster integration on the desktop – submit, monitor and retrieve

§  Chemical simulation/calculation now biggest HPC user in military §  Open tools can use both open and closed computational codes

-  Largely written in Fortran to run on clusters -  NWChem recently open sourced – PNNL quantum code -  Already work with GAMESS, GAMESS-UK, Q-Chem, Gaussian…

§  The time is right for change in chemistry -  Opportunity to accelerate the rate of research
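The slide lists cluster integration on the desktop (submit, monitor and retrieve) without describing the mechanism. As a hedged sketch, the desktop side of that workflow can be modeled as a small state machine that records status updates reported by a cluster and rejects impossible transitions. The state names, the `Job` class and the job name are all hypothetical, and no real scheduler is queried here.

```python
from enum import Enum, auto


class JobState(Enum):
    # Hypothetical lifecycle states for a job launched from a desktop GUI.
    SUBMITTED = auto()
    QUEUED = auto()
    RUNNING = auto()
    FINISHED = auto()
    FAILED = auto()
    RETRIEVED = auto()


# Allowed transitions: results can only be retrieved after the job finishes.
TRANSITIONS = {
    JobState.SUBMITTED: {JobState.QUEUED, JobState.FAILED},
    JobState.QUEUED: {JobState.RUNNING, JobState.FAILED},
    JobState.RUNNING: {JobState.FINISHED, JobState.FAILED},
    JobState.FINISHED: {JobState.RETRIEVED},
    JobState.FAILED: set(),
    JobState.RETRIEVED: set(),
}


class Job:
    def __init__(self, name: str):
        self.name = name
        self.state = JobState.SUBMITTED
        self.history = [self.state]

    def advance(self, new_state: JobState) -> None:
        """Record a status update, raising ValueError on an impossible jump."""
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(
                f"{self.name}: cannot go from {self.state.name} to {new_state.name}")
        self.state = new_state
        self.history.append(new_state)


# Simulated sequence of updates for a hypothetical quantum chemistry run.
job = Job("geometry-optimization")
for update in (JobState.QUEUED, JobState.RUNNING, JobState.FINISHED, JobState.RETRIEVED):
    job.advance(update)
print([s.name for s in job.history])
```

Keeping the allowed transitions in one table makes the monitoring logic easy to audit and lets the GUI refuse, for example, to retrieve results from a job that never finished.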

Page 19

Funding Open Chemistry Tools §  Kitware’s core business is based on “open collaboration platforms” §  Led a Phase I Small Business Innovation Research project (US Army)

-  Invited to apply for Phase II funding, currently pending §  Make use of Apache and BSD licenses

-  Allow for participation of a wider cross-section of the community -  Reduced licensing complications -  Important for industry and government collaboration

§  Successfully taken part in Google Summer of Code – funded students -  Student in 2007 working on Avogadro and Kalzium -  Mentor for KDE in 2008-2010 -  VTK organization administrator and mentor in 2011

§  Looking to other funding agencies and collaborations in future

Page 20

Developing in Niche Areas §  The population of active researchers in chemistry is relatively small

-  The number of those researchers who code is even smaller -  Of those, the number that wish to contribute to open source is tiny

§  Developing and nurturing these communities can be challenging

§  Some students develop a feature in a summer and disappear

§  Other professors might develop code over the summers

§  Have to lower the barrier to entry as much as possible

§  Often need to help with tools, build systems, etc.

Page 21

Enabling Technologies in Chemistry §  Large number of computational chemistry codes

-  Many do not have dedicated user interfaces -  Forming a new area enabling chemical workflows -  Some of the open source codes that can benefit

-  NWChem – quantum chemistry code -  Quantum Espresso – plane wave code

-  Free-to-use codes such as GAMESS -  Commercial codes such as Molpro, Q-Chem, others -  These codes are executed in a separate process

§  Libraries that can be used in the GUI: -  The Visualization Toolkit (VTK) provides advanced rendering -  ParaView library provides client-server technology for large data

Page 22

Working With Academia, Industry and Government §  In the past licensing has not been ideal

-  Some form of GPL or non-commercial only license fine for most academics -  Industry and government need more liberal licenses in general, e.g. BSD, Apache 2

§  Can be challenging to ensure everyone gets something out of the deal §  Avoiding the trap of dual-licensing – often kills community and shared ownership §  Funders can find it harder to understand commercialization §  We normally employ a services/consulting role

Page 23

Government Open Source Collaborations

Brian Wylie Sandia National Laboratories

Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000.

Page 24

Government Open Source Resources

•  GOSCON Government Open Source Conference (goscon.org)

•  Open Source Center: Foreign open source intelligence data (opensource.gov)

•  Open Source Software Institute: Non-profit corp/govt/acad (oss-institute.org)

•  Government Open Source Software Resource Centre (gossrc.org)

•  Center for Strategic and International Studies (tracks open source legislation; csis.org)

Page 25

Government Open Source Around the World

[Chart: Open Source Initiatives by Region (2000-2009). Counts (axis 0 to 180) of failed, proposed, and approved initiatives for Europe, Asia, Latin America, North America, Africa, and Middle East. Data courtesy of the Center for Strategic and International Studies.]

Page 26

Government Open Source Example Projects

Open source data analysis and visualization platform

Sandia, Los Alamos, Kitware, University of Utah

Page 27

Government Open Source Example Projects

Sandia, Kitware, Indiana University, Stanford

Page 28

Government Open Source Collaboration Benefits

Government
-  No specific vendor “lock-in/out”
-  Allows a diversified development team
-  Known code base (strengths and weaknesses)
-  Typically easier to integrate with other OS tools
-  Improvement of the OS project

Commercial
-  Money
-  Leveraging project for other/future work
-  Improvement of the OS project

Academic
-  Student/Professor support
-  Publishing/Sharing
-  Improvement of the OS project

Page 29

Government Open Source Collaboration Issues

-  Need to relax into existing OS license*
-  New projects should pick a liberal OS license
-  Funding source may hesitate on open source
-  Proprietary projects / intellectual property
-  Government bureaucracy
-  Mixed software skill set
-  Deliverables can get distorted
-  Work may not be publication material
-  If you do publish, it may be a joint publication

*  No gov’t sell-back clause

Government  

Commercial  

Academic  

Page 30

Government Open Source Questions Section

Page 31

Contact Information

§  Will Schroeder [email protected]

§  Brian Wylie [email protected]

§  Marcus Hanwell [email protected]

Page 32

(view included video)