Collaborating with computer scientists, informaticists*, and software developers for Integrated Ecosystem Assessments

Peter Fox 1; Heidi Sosik, Stace Beaulieu, and Joseph Futrelle 2; David Mark Welch 3; Jon Hare and Michael Fogarty 4
1 Tetherless World Constellation (TWC), Rensselaer Polytechnic Institute; 2 Woods Hole Oceanographic Institution; 3 Marine Biological Laboratory; 4 Northeast Fisheries Science Center

ECO-OP Use Case: Linked Data Provenance Outcomes
• Extended the PROV Ontology for capturing provenance in the IPython Notebook, a software platform that enables transparent workflows: https://github.com/tetherless-world/ecoop/tree/master/prov
• Applied the prov-ecoop ontology to case studies that included the Climate Forcing Chapter, a regional map of primary production, and a fisheries indicator in the Ecosystem Status Report
• Book chapter in press for Oceanographic and Marine Cross-Domain Data Management for Sustainable Development: Documenting provenance for reproducible marine ecosystem assessment in open science

Marine Biodiversity Virtual Laboratory (MBVL) Work in Progress
• Project website: https://tw.rpi.edu//web/project/MBVL/
• Focusing on Objectives 1, 2, and 4 in this first year: 1) developing data access and computational infrastructure for the MBVL; 2) generating derived data products; 4) producing traceable product workflows.
• We will work this summer with Matthew Ball, an undergraduate student in computer science from Bowie State University, in the PEP program.

More team members: X. Ma, L. Fu 1
More team members: B. Lee, S. Zednik 1; A. Shipunova, A. Voorhis 3
More team members: M. Di Stefano, P. West 1; A. Maffei 2; G. DePiper, K. Friedland, S. Gaichas, K. Hyde, R. Gamble, M. Jones, S. Lucey 4

The ECO-OP and MBVL projects were funded by the U.S. National Science Foundation, grant numbers 0955649 and 1539256, respectively.

ECO-OP Use Case: Ecosystem Status Report Outcomes
• A pilot toward end-to-end transparency from scientists’ desks to a report provided to policy makers and the public, which is important for science-based decision making.
• The prototype enabled an executable workflow for the production of a collaborative, multidisciplinary report with very heterogeneous data types: https://github.com/tetherless-world/ecoop/tree/master/pyecoop
• A small team of computer scientists and IT specialists working directly with fisheries scientists led to rapid results. The limiting factor was sufficient training for the larger group of domain scientists to adopt the technologies.
• Manuscript under review in Earth Science Informatics: Toward cyberinfrastructure to facilitate collaboration and reproducibility for marine Integrated Ecosystem Assessments

Our solution for sharing workflows and delivering reproducible documents:

ECO-OP: An abbreviation of ECOsystem and interOPerability
Goal: to develop and deploy a software environment to generate a portion of the Ecosystem Status Report for the Northeast U.S. Continental Shelf Large Marine Ecosystem, retaining traceability of derived datasets including indicators of physical pressures and ecosystem states.

* What is an informaticist? One can think of informatics as the steps and skills involved in making sense of data. Some of this is domain-specific (scientists are informaticists in their domains), and some is general to information processing or to the engineering of information systems.

How can we generalize our workflows for biodiversity?
1) Take sample from environment
2) Extract subset of organisms from sample
3) Measure attributes for the sampled organisms
4) Classify the organisms into categories
5) Determine the number of classified organisms in each category

VAMPS: Visualization and Analysis of Microbial Population Structures https://vamps.mbl.edu/
IFCB: Imaging FlowCytobot http://ifcb-data.whoi.edu/

Marine Biodiversity Virtual Laboratory (MBVL)
Goal: This research effort brings together computational and information scientists, oceanographers, and microbiologists to develop a Marine Biodiversity Virtual Laboratory (MBVL). The MBVL addresses multi-scale, heterogeneous data challenges with informatics solutions that enable the cyber-generation and documentation of biodiversity indicators, providing the traceability between data and information to be used as a basis for sustainable ecosystem-based management and needed policy decisions.
Current emphasis on lower trophic levels

Linked Data Provenance
Goal: to provide standardized provenance as metadata for data products, so that a human (and, ultimately, a machine) could trace back to the source observational data and models used to compile an indicator. For the community standard, we chose the PROV Ontology for representing and exchanging provenance information as Linked Data in the Semantic Web.

Diagram for the three top classes in PROV-O and the properties that relate them.

Proposed implementation of a workflow using IPython Notebook to generate a fisheries indicator. Entities: 1 IPython Notebook; 2 Cell; 3 Datasets; 4 script written in another programming language (R) that was split into five Cells; 5 other software environments. Four different agents are identified as contributing source datasets. Activities: 1 CellRun; 2 other activities performed in other software environments.

Diagram for TWC Methodology. The use case defines the interactions between people, hardware, software, and desired products and can be adjusted or refined after each iteration of the cycle.
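
The provenance goal stated above (tracing an indicator back to its source data) can be sketched in a few lines of Python. This is a minimal illustration, not the prov-ecoop implementation: the property names are real PROV-O terms, but every `ex:` identifier, the triple list, and the helper functions are hypothetical, and the provenance is held in plain Python structures rather than an RDF store.

```python
# PROV-O property names (terms defined by the W3C PROV Ontology).
WAS_GENERATED_BY = "prov:wasGeneratedBy"    # Entity -> Activity
USED = "prov:used"                          # Activity -> Entity
WAS_ATTRIBUTED_TO = "prov:wasAttributedTo"  # Entity -> Agent

# Hypothetical identifiers, for illustration only (not taken from prov-ecoop).
# Each "cellRun" stands for one execution of a notebook cell.
TRIPLES = [
    ("ex:fisheriesIndicator", WAS_GENERATED_BY, "ex:cellRun2"),
    ("ex:cellRun2", USED, "ex:mergedTable"),
    ("ex:mergedTable", WAS_GENERATED_BY, "ex:cellRun1"),
    ("ex:cellRun1", USED, "ex:surveyData"),
    ("ex:cellRun1", USED, "ex:landingsData"),
    ("ex:surveyData", WAS_ATTRIBUTED_TO, "ex:agentA"),
    ("ex:landingsData", WAS_ATTRIBUTED_TO, "ex:agentB"),
]


def build_graph(triples):
    """Index triples as {subject: {predicate: [objects]}}."""
    graph = {}
    for subject, predicate, obj in triples:
        graph.setdefault(subject, {}).setdefault(predicate, []).append(obj)
    return graph


def trace_sources(entity, graph):
    """Return the entities with no generating activity that `entity` depends on."""
    sources, seen, stack = set(), set(), [entity]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        activities = graph.get(current, {}).get(WAS_GENERATED_BY, [])
        if not activities:
            # Nothing generated this entity inside the workflow: a source dataset.
            sources.add(current)
            continue
        for activity in activities:
            stack.extend(graph.get(activity, {}).get(USED, []))
    return sources


def agents_for(entities, graph):
    """Return the sorted agents to whom the given entities are attributed."""
    return sorted(
        {
            agent
            for entity in entities
            for agent in graph.get(entity, {}).get(WAS_ATTRIBUTED_TO, [])
        }
    )


graph = build_graph(TRIPLES)
sources = trace_sources("ex:fisheriesIndicator", graph)
print(sorted(sources))             # ['ex:landingsData', 'ex:surveyData']
print(agents_for(sources, graph))  # ['ex:agentA', 'ex:agentB']
```

In a Linked Data setting the same statements would be expressed as RDF and queried with SPARQL; the dictionary-based walk here only shows the shape of the trace.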
PDF of Climate Forcing Chapter
IPython (now Jupyter) Notebook
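
The five generic biodiversity workflow steps listed earlier (sample, extract, measure, classify, count) could look roughly like the following in a notebook cell. This is a sketch under stated assumptions: the sample records, the `is_organism` flag, and the size-based classification rule are hypothetical stand-ins, and the Shannon index is used only as one familiar example of a derived biodiversity indicator, not as the indicator the MBVL produces.

```python
import math
from collections import Counter

# Step 1: a sample from the environment (hypothetical records).
SAMPLE = [
    {"id": 1, "is_organism": True, "size_um": 0.8},
    {"id": 2, "is_organism": True, "size_um": 1.5},
    {"id": 3, "is_organism": False, "size_um": 40.0},  # e.g. detritus
    {"id": 4, "is_organism": True, "size_um": 5.0},
    {"id": 5, "is_organism": True, "size_um": 12.0},
    {"id": 6, "is_organism": True, "size_um": 18.0},
    {"id": 7, "is_organism": True, "size_um": 35.0},
]


def extract_organisms(sample):
    """Step 2: keep only the records flagged as organisms."""
    return [record for record in sample if record["is_organism"]]


def classify(record):
    """Steps 3 and 4: read the measured size and assign a plankton size class.

    The size classes are a stand-in for a real classifier (image- or
    sequence-based); the thresholds are in micrometers.
    """
    size = record["size_um"]
    if size < 2:
        return "pico"
    if size < 20:
        return "nano"
    return "micro"


def count_by_category(records):
    """Step 5: number of classified organisms in each category."""
    return Counter(classify(record) for record in records)


def shannon_index(counts):
    """Shannon diversity H = -sum(p_i * ln p_i) over the non-empty categories."""
    total = sum(counts.values())
    return -sum(
        (n / total) * math.log(n / total) for n in counts.values() if n > 0
    )


counts = count_by_category(extract_organisms(SAMPLE))
diversity = shannon_index(counts)
print(dict(counts))
print(round(diversity, 3))
```

Keeping each step as its own function mirrors the idea of one notebook cell per step, so that each intermediate product can be recorded and traced separately.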