
Submitted as a companion document to the document titled ‘CGIAR Consortium Progress Report: October 2015 – March 2016’ for Fund Council 15

CGIAR Open Access and Open Data

Phase I (2015) Progress Report:

31 January 2016


CGIAR OA-OD Phase 1 project report – Jan 2016

ANNUAL PROJECT REPORT: CGIAR Open Access and Open Data Phase I (2015): Assessment, Prioritization, and Coordination of CGIAR’s Current Open Environment

CGIAR – like the agricultural science community in general – still requires substantial work before seamless and consistent information discoverability, integration, and interoperability between related outputs is achieved. The 12-month Phase I Open Access and Open Data (OA/OD) project funded by the Bill and Melinda Gates Foundation tackles some foundational needs towards these goals through activities organized around five key objectives. These objectives are part of a strategy to enhance CGIAR’s impact by enabling the discovery of, unrestricted access to, and effective reuse of publications, data, and allied research products, particularly those deemed to be of high quality and value, with the potential for accelerating change and catalyzing innovation. This annual report provides an overview of the progress made since the project was approved in mid-January 2015.

Objective 1. Conduct a broad inventory and assessment of CGIAR capacity in OA/OD.

Effective implementation of OA/OD across CGIAR requires a clear understanding of each Center’s needs and capacity with respect to managing research outputs for openness. A broad, multi-faceted assessment was undertaken to determine how information from different data streams was being managed (including but not limited to: genetic/genomic; genebank; agronomy; breeding; natural resource management, including soils, hydrology, climate and more; socioeconomic, including surveys, food security, poverty, livelihoods, nutrition and allied areas; geospatial; and other sectors). Fourteen Centers and two CRPs completed the assessment, a 58-question survey designed to determine: the OA publications landscape across CGIAR (Output 1.1); data management and quality practices (Output 1.2); how other research products are handled (Output 1.3); and gaps and needs in human resources and enabling environments for OA/OD (Output 1.4).

Summary of Findings

Centers are approaching their Open Access/Open Data (OA/OD) operations in different ways, with a wide range of approaches, priorities, and workflows among them. The open-ended questions at the end of the survey elicited widely varying responses. Respondents were assured anonymity, so Center identities have been masked in this summary, although future support and recommendations will be based on individual responses.

Policies, Plans, and Workflows

Centers are split between having their own OA/OD policies and using the CGIAR Open Access/Data Management Policy. Eight Centers reported having a separate Open Access/Publications Policy, while ten Centers reported having a separate Open Data/Data Management Policy.

Of those Centers with separate publications and/or data policies, only two explicitly state timelines in line with the CGIAR Policy. One Center with a 2012 (pre-CGIAR) policy is currently revisiting the document, and expects to implement timelines consistent with the CGIAR Policy. The policies of three Centers reside on their intranets and are therefore not publicly accessible.

Anecdotal evidence from the regional workshops indicates that those with a separate OA/OD policy pursued this path to try to get stronger buy-in from Center leadership and/or researchers, while other Centers indicated the CGIAR OA/DM policy was sufficient. Potential discrepancies between Center and CGIAR policies are being addressed via individual Center dialogue and advocacy.

All responding Centers and CRPs indicated that their CRP personnel and partners are expected to comply with whichever OA/OD policy the Center uses, but individual agreements may vary.


Nearly all of the respondents (13) indicated they were using or planned to use the implementation plan template provided by the Consortium Office as the starting point for preparing their OA/OD Implementation Plan. Three other respondents indicated they were unsure. These responses are consistent with the plans that have been shared with the Consortium Office up to this point, and heartening as a move towards consistent OA/OD operationalization and appropriate needs identification and resolution.

Workflows, capacity, and day-to-day responsibility for OA/OD operations vary widely among CGIAR Centers and reporting CRPs. For publications, libraries, knowledge centers, and research data management/research support units are involved in such workflows. On the data side, some Centers have a dedicated data management or similar unit, while others rely on their library or KM team for support. A few Centers referred to a GIS specialist or unit for support with geospatial data. Communications departments were mentioned by some Centers/CRPs, and a few Centers highlighted support from their legal team or IP focal point.

Nearly all respondents indicated that they have data management workflows in place or under development. Of the two respondents indicating otherwise, one has since begun to develop guidelines.

Open Access to Publications & Publication Repositories

The percentage of peer-reviewed publications in repositories that are fully downloadable without restriction is growing. Two Centers reported that over three quarters of the peer-reviewed publications in their repository are fully downloadable without restriction. Many of the responding Centers/CRPs indicated that around half (40-60%) of the peer-reviewed publications in their repository are fully downloadable without restriction.

Most Centers are moving towards DSpace (9) or other standards-compliant, interoperable OA repository platforms such as EPrints, Invenio, CONTENTdm, or Koha.

A few Centers are still relying on websites or other platforms that are not typically considered to be OA standards-compliant repositories (e.g. SharePoint, home-grown solutions). Individual follow-up is ongoing to address these issues.

Publication dissemination mechanisms for CRPs need to be further clarified, and workflows and/or guidelines developed to help with these.

Reasons for selecting particular repository systems vary, although being open source, good interoperability, and adoption by other Centers were cited as the top reasons.

OAI-PMH is the primary interoperability protocol used by publications repositories. This is reassuring, as OAI-PMH is an established OA harvesting protocol that works with DSpace and Dataverse, the platforms most widely used across Centers for publications and data, respectively.
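To illustrate what harvesting over OAI-PMH involves, the following is a minimal sketch using only the Python standard library. It builds a ListRecords request for unqualified Dublin Core (oai_dc) and parses one response page, including the resumptionToken used to page through large result sets. It is not any Center's actual harvester; network access and OAI-PMH error handling are deliberately left out, and the function names are illustrative.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespaces defined by the OAI-PMH 2.0 and Dublin Core specifications.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, resumption_token=None):
    """Build a ListRecords request URL.

    OAI-PMH treats resumptionToken as an exclusive argument, so
    metadataPrefix is omitted when requesting a follow-up page.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (records, resumption_token) for one ListRecords response page."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        identifier = rec.findtext("oai:header/oai:identifier", default="", namespaces=NS)
        dc = rec.find("oai:metadata/oai_dc:dc", NS)
        fields = {}
        if dc is not None:  # deleted records carry a header but no metadata
            for child in dc:
                # "{http://purl.org/dc/elements/1.1/}title" -> "title"
                name = child.tag.split("}", 1)[-1]
                fields.setdefault(name, []).append((child.text or "").strip())
        records.append({"identifier": identifier.strip(), "dc": fields})
    # An empty or missing resumptionToken means this is the last page.
    token = root.findtext(".//oai:resumptionToken", default="", namespaces=NS)
    return records, (token.strip() or None)
```

A harvester would fetch `list_records_url(base_url)`, pass the response body to `parse_list_records`, and repeat with the returned token until it is `None`; `base_url` stands for a repository's OAI-PMH endpoint and is a placeholder here.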

Nearly all Centers/CRPs report using Dublin Core and/or plan to use CG Core. Testing/implementation of CG Core is ongoing, with feedback expected in Jan/Feb 2016.

The adoption of consistent repositories, metadata applications, and OAI-PMH will make it possible to begin to harvest and aggregate metadata and associated files from CGIAR OA repositories.
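Once metadata is harvested from several repositories, the same output may appear in more than one of them. The sketch below shows one simple way to aggregate harvested records while collapsing duplicates. The record shape (a title plus a list of identifiers) and the deduplication rule (a DOI if one is present, otherwise a normalized title) are assumptions for illustration only; they are not part of CG Core or of any CGIAR specification.

```python
import re

# Matches a DOI anywhere in an identifier string (e.g. a doi.org URL or a "doi:" prefix).
DOI_PATTERN = re.compile(r"10\.\d{4,9}/\S+", re.IGNORECASE)


def dedup_key(record):
    """Hypothetical rule: prefer a DOI from any identifier, else a normalized title."""
    for ident in record.get("identifiers", []):
        match = DOI_PATTERN.search(ident)
        if match:
            return "doi:" + match.group(0).lower()
    title = re.sub(r"\W+", " ", record.get("title", "")).strip().lower()
    return "title:" + title


def aggregate(harvests):
    """Merge {repository: [records]} into {key: {"title", "sources"}}, collapsing duplicates."""
    merged = {}
    for repository, records in harvests.items():
        for record in records:
            entry = merged.setdefault(
                dedup_key(record), {"title": record.get("title", ""), "sources": []}
            )
            # Keep track of every repository that holds a copy of this output.
            if repository not in entry["sources"]:
                entry["sources"].append(repository)
    return merged
```

Title matching is a weak fallback. In practice, persistent identifiers such as DOIs or handles recorded consistently in the metadata make aggregation far more reliable.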

Open Data and Data Repositories

Nearly all of the responding Centers/CRPs indicated that they have a data repository in place or in development; some reported having more than one.

Many respondents reported that all of the datasets in their repository are publicly accessible. However, many of the repositories contain only a small number of datasets and/or the reported percentages may be misleading. Further follow-up is under way to resolve these issues.


Most Centers are adopting Dataverse (8 Centers and 2 CRPs) as the platform for at least some of their repositories. Two others are using CKAN. “Good interoperability” was the most-often cited reason for selecting a particular system, with “open source” as the next most often selected response.

In terms of data streams, agronomy, socioeconomic, and plant breeding data have been reported as having more than 1,000 openly accessible datasets across responding CGIAR Centers/CRPs. However, a majority of the agronomy datasets are in the AgTrials repository, and the quality and reusability of the data and associated annotation are highly variable. A vast majority of the plant breeding data reported also comes from one Center's repository and is still being assessed.

Uploads/deposits of data sets appear more or less evenly split between data managers and scientists. Workflows to streamline and ease uploading and license choice etc. are being gathered and/or developed based on feedback from the Communities of Practice.

Practices for data quality and/or data cleaning are not consistent among Centers, and responsibilities for these also vary. Five Centers indicated that no one is specifically responsible for assessing data quality or cleaning data. Five other Centers indicated that researchers, project leaders, or science theme leaders are responsible, while six Centers indicated that a research methods or similar unit is involved.

OAI-PMH was referenced by seven of the responding Centers/CRPs as being in use within the data repository. Likewise, most indicated that Dublin Core and/or the CG Core metadata schema released in September by the Consortium Office team will be adopted.

Several respondents mentioned GitHub as the repository of choice for apps/software.

Current Practices and Culture

Only one Center indicated having a centralized Open Access fund to help pay publication fees. Another noted that OA publication costs are split between the Center’s KM Unit and research divisions. Two Centers indicated in the survey – as others have indicated in their OA/OD Implementation Plans – that researchers are being encouraged to incorporate budgets for OA in new project proposals (but anecdotal evidence suggests this is meeting with some resistance, pointing to a need for significant advocacy to effect culture change).

While researchers are being encouraged to budget for OA in the project management cycle/new project proposals, only a few Centers indicated that the OA/OD focal point(s) were being consulted during project planning to ensure consistent implementation.

Three Centers indicated they have or are planning a subscription to Altmetric.com to better assess usage of open products. One Center uses the free Altmetric.com API. Several of the Centers involved in CGSpace are also exploring the possibility of embedding Altmetric.com.

Responses were quite varied to questions around researcher awareness of and compliance with OA/OD requirements. Many Centers indicated that they hope to incorporate OA/OD into researcher performance evaluation and workplans in the coming years – although few appear to have concrete steps towards this goal. Several Centers also suggested that support, education/advocacy, and incentives would help the organizations move forward.

Opportunities and Concerns – Technology, Culture, and Budgets

Common concerns cited were:

o Lack of funding, capacity, and process, encompassing staff time, staff resources, and infrastructure (e.g. “financial and human resources constraints”; “funding for curation and management [and implementation]”; “dedicated resources [needed for OA/OD]”; “[need] communication and training”; “library staff [not] part of approval process and email notifications for new grants”).

o Lack of culture/attitude/awareness among researchers (e.g. “main challenge is to convince scientists to open their data…”; “researchers…used to doing just the opposite [of sharing] – holding on to [data] and guarding it jealously”; “enabling a data culture is not an easy task”; “attitude of staff towards OA/OD”).

o Lack of researcher incentives, with OA/OD not being part of researchers’ performance evaluation criteria (e.g. “OA/OD [not] in workplan and evaluation procedures of scientists”; “no metrics on data sharing”).

Several Centers/CRPs offered suggestions and ideas for raising awareness about and incorporating OA/OD into their organizational cultures:

o Improve Center capacity to support OA/OD; financial support is needed so that this is no longer an unfunded mandate

o Work with top-level leadership to ensure buy-in

o Trainings/workshops and webinars with Center and partner researchers (“data clubs”, “data talks”) to raise awareness, advocate for OA/OD, and provide the “how to”

o Easy-to-use guidelines, templates, workflows, and informational materials (e.g. via regular Center news outlets) to support adoption of OA/OD practices

o Survey researchers to understand their needs/gaps relating to OA/OD, to best support them

o Work with HR teams to hold Research Data Management sessions during induction

o Highlight/reward scientists who publish in OA journals and share data (e.g. via social media); feature OA/OD champions as speakers (Science Week, OA/OD week, other events)

o Include the OADM Policy in CRP Branding Guidelines

o Include OA/OD (e.g. data collection and management) in work plans and proposals, and ensure accountability via scientist evaluations; consider penalties for non-compliance

o Include published data (and not just publications) in performance reviews

o Compile and share OA/OD success stories as accompaniments or follow-ups to trainings and communications, to motivate researchers

The final deliverable (Output 1.5) is the design of an indexing tool that will help assess each CGIAR Center repository’s compliance with interoperability standards and “openness”, and provide seamless and easy access to CGIAR research outputs through a central, user-friendly portal supporting querying and retrieval of information from all CGIAR publications and data repositories. Substantial progress has been made on this via iterative work with Cascadeo Inc., which has developed functional requirements for different user types, a conceptual architecture and design for the portal, and a roadmap towards its realization.
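The portal's architecture is only summarized above, so the following is a toy illustration of the "querying and retrieval" function it is meant to support, not a reflection of the Cascadeo design. It is an in-memory inverted index over harvested metadata fields that answers keyword queries requiring every term to match. The class name, record identifiers, and AND-only query semantics are hypothetical. A production portal would more likely sit on a dedicated search engine.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase and split on anything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


class MetadataIndex:
    """Toy inverted index mapping each term to the set of record ids containing it."""

    def __init__(self):
        self.postings = defaultdict(set)
        self.records = {}

    def add(self, record_id, fields):
        """Index every string-valued metadata field of one record."""
        self.records[record_id] = fields
        for value in fields.values():
            for term in tokenize(value):
                self.postings[term].add(record_id)

    def search(self, query):
        """Return sorted ids of records containing every query term (AND semantics)."""
        terms = tokenize(query)
        if not terms:
            return []
        # .get avoids creating empty postings for unknown terms.
        hits = set.intersection(*(self.postings.get(t, set()) for t in terms))
        return sorted(hits)
```

For example, after indexing a dataset titled "Maize yield trials in Kenya" and a survey titled "Household survey, Kenya", a query for `kenya` returns both records, while `maize kenya` returns only the first.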

Objective 2. Develop a legacy data prioritization framework.

It would be virtually impossible, and very expensive, to identify all datasets and publications from the past several decades that are not currently openly accessible, but there are high-value outputs among these that could be identified and made discoverable as global goods. A data prioritization framework (Output 2.1) is being drafted with input from Center staff and AgMIP leaders who have dealt with similar issues. Members of the Data Management Task Force (DMTF) will soon be involved to ensure that the framework holds true for varied data streams (e.g. genetic/genomic; genebank; agronomy; breeding; natural resource management, including soils, hydrology, climate and more; socioeconomic, including surveys, food security, poverty, livelihoods, nutrition and allied areas; geospatial; and other sectors).

An early draft workflow has been developed using the US Federal CIO Council’s Open Data Prioritization Toolkit, which applies to data collected by US government agencies. These materials have yet to be shared for feedback and are likely to evolve. The guidance in this toolkit results in a data prioritization matrix (example in Figure 1 below) based on the value, cost, and risk of making a dataset public, following the completion of a workflow involving:

o Data attributes (Table 1)

o Data Evaluation Factors (Table 2)

o Data Evaluation Workbook (with a scoring guide included; Table 3)


Table 1. Data prioritization framework: Data attributes.

Dataset Attributes

These Dataset Attributes can be customized based on a given entity's needs. The questions below are additional to the metadata that may already accompany the dataset, and need not be completed unless the metadata associated with the dataset is inadequate. Each question is paired with a Response field.

Source

1 What is the source of the dataset?
o Lead organization
o Project or program
o Country

2 Does the data source span multiple organizations?

3 To which datatype does the data in the dataset primarily belong (pick from the drop-down choices)?

4 What automated or high-value tools have been used to generate the dataset? E.g., UAVs, satellites, sequencers, data loggers, etc.

Format

5 In what format or schema does the data currently exist? E.g., XML, HTML, JSON, TXT, CSV

6 Is the data in a machine-readable format? E.g., RDF, XML, JSON

7 What are the characteristics of the metadata schema associated with the dataset?
o Are the metadata or descriptive attributes associated with the dataset adequate?
o Is the schema mappable to the repository to be used for storing and sharing the dataset?
o Does the metadata include the CG Core metadata elements?

8 Does a data dictionary exist for the data elements? I.e., are the data elements/column titles explained?

9 Does the dataset include some indication of methodologies, if appropriate?

10 Does the data include personally identifiable information (PII) or other protected information that should be risk-evaluated prior to release?

Users

11 How many people/organizations have requested access to the data?

12 Has the data been requested by external users or other government agencies in the past or on a recurring basis?

13 What is the frequency of user requests/consumption for the data?

14 Is the data being shared with targeted consumers or openly available?

15 Is there an understanding of how users consume and/or utilize the data?

Frequency and Distribution

16 How often was/is the data collected and processed?

17 Over what period (in years) was the data collected?

18 What is the estimated lifespan of data usability in years? E.g., 1-3; 3-6; 6-9; 10+

19 How often (in years) did/does the data need to be refreshed?

20 If data collection was/is recurrent, how often was/is the data released?

21 What mechanism is used to distribute data?

Operation and Maintenance

22 Who is responsible for managing and maintaining the data? E.g., data steward/curator; librarian; scientist, etc.

23 Who is the primary data owner?

24 Are the data steward/curator and the data owner the same person?

25 In brief, what workflow processes are in place to manage the data? I.e., who is responsible for what, and when?

26 What, if any, processes are in place to maintain the dataset? E.g., QA/QC; format change/preservation, etc.

27 Is the data lifecycle cost budgeted for?

28 Are there specific dataset security requirements?

Data Integrity

29 Were/are the data collectors trained in gathering this data?

30 Are there likely to be potential issues with data credibility?

31 Is the uncertainty about data accuracy and validity, if any, documented in the dataset?

32 Briefly, what quality control processes were/are established and followed?

33 How often (in years) was/is the data verified and validated?

34 Could a mosaic effect resulting from the release of the data reveal private or sensitive information?


Table 2. Data prioritization framework: Data evaluation factors.

Data Evaluation Factors

Who are the current or future internal and/or external users?

What external agencies might have interest in the dataset?

What is the estimated number of users?

Are the data of short- or long-term value to stakeholders?

How frequently might the dataset be consumed?

Is the value derived from each data interaction likely to be high, medium, or low?

Are there any limitations on data analysis and use?

Is the data currently used to drive institutional research and development efforts?

Is the data leveraged in decision making within or outside the institution?

Does the data increase project, program, or institutional efficiency and/or effectiveness?

What is the potential of the data to fuel innovation (e.g., enable the development of new tools)?

If used by secondary users, what is the potential of the data to lower costs?

What is the potential of the data to create economic value or growth?

What is the potential of the data to open up new business opportunities?

What is the potential of the data to catalyze new collaboration efforts?

Will the format of the data need to be converted in order to share or use the dataset?

Are there metadata and definitions within the dataset that ensure the data can be understood?

Is the estimated overall cost for data preparation likely to be high, medium, or low?

What is the estimated time to prepare the data for release?

How will changes be identified after initial publication?

How frequently will the data require a refresh?

What are the estimated overall costs for data maintenance?

Are there required processes in order to share the data?

How significant is the involvement of other programs and/or units (e.g., IP/legal) in sharing the data?

Are there regulatory (e.g. privacy, security, accessibility) concerns associated with sharing the data?

What entities will commit human and financial resources to sharing the data?

What are the additional lifecycle costs for data sharing?

What additional technology resources will be needed?

What system changes need to be implemented in order to share the data? What is the estimated cost?

Will sharing the data require additional hosting capacity or a different hosting technology?

Would de-identifying the data eliminate its utility?

Is a process in place for collecting public feedback on the data and what is the associated cost of maintaining that process?

Will the release of the data have any unintended consequences (e.g. discrimination against an individual / group, release of protected health information[1], or the mosaic effect)?

Does the data pose a security risk when combined with currently available information?

Does the data disclose information regarding the security of institutional information or communications systems?

Does the data disclose information regarding physical security of facilities (owned or leased)?

Does the data disclose detailed critical infrastructure information?

Is the data accurate and credible?

What are the potential consequences from the data being misinterpreted?

Are there international, foreign, or other restrictions limiting the release of data?

* Note: If you anticipate that releasing data to external organizations will expose national security information, then you will most likely be unable to release the dataset.

Topics and sub-topics: Value (Stakeholders; Value Drivers); Cost (Format; Frequency; Review; Operation and Maintenance); Risk (Privacy and Unintended Consequences; Security*; Other Considerations).

Description: Organizations can begin to evaluate their datasets based on value, cost, and risk to define the impact that releasing the data will have on the institution, data consumers, and society. Each dataset may be assessed and scored using these factors in line with guidance (to be developed).


Table 3. Data prioritization framework: Data evaluation workbook.

Data Evaluation Workbook

Description: In order to prioritize datasets to be made open, the datasets must be assessed for their value to potential users; the cost of preparing, releasing, and maintaining them; and the risk of releasing them. This Data Evaluation Workbook provides a means for rating each dataset by value, cost, and risk.

Instructions: For each sub-topic (in row 4), a weight should be applied (in row 5) to determine the general importance or impact of the sub-topic from an organization's perspective (i.e., "risks" associated with releasing data at one institution may be of less importance or concern than "costs" of opening data at another institution due to the nature of the data and mission focus). A weight of "0" signifies that releasing the data will have "No Impact" to the organization. A weight of "4" signifies that the data will have a "High Impact" to the organization. For each dataset listed under "Dataset Title" (in column A), provide a rating based on each evaluation factor sub-topic (in row 4). Rate each from 0 to 4: "0" represents a negative effect on opening the data, and "4" represents a positive effect on opening the data. Refer to the Dictionary tab for more details about the weight and rating scales.

Column key. Value: STK = Stakeholders, VD = Value Drivers. Cost: FMT = Format, FRQ = Frequency, REV = Review, O&M = Operation and Maintenance. Risk: PUC = Privacy and Unintended Consequences, SEC = Security, OTH = Other Considerations. The last three columns are the Value, Cost, and Risk scores. The first row gives the assigned weights [0-4]; the remaining rows give the ratings [0-4] for each dataset.

Dataset Title     STK  VD  FMT  FRQ  REV  O&M  PUC  SEC  OTH   Value  Cost  Risk
Assigned Weight    4    4   2    4    3    2    4    3    2
A                  2    3   2    0    1    1    3    2    1     20      9    20
B                  0    1   2    4    1    2    1    3    1      4     27    15
C                  3    1   2    0    2    1    1    4    0     16     12    16
D                  0    1   4    2    2    1    0    0    1      4     24     2
E                  1    0   4    4    4    4    3    4    2      4     44    28
F                  4    3   2    0    3    2    0    1    2     28     17     7
G                  2    4   4    3    4    3    4    3    2     24     38    29
H                  4    4   2    4    3    4    4    2    3     32     37    28
I                  2    1   0    1    0    2    4    2    1     12      8    24
J                  4    3   3    1    0    2    0    0    1     28     14     2
K                  4    3   3    2    4    4    4    4    4     28     34    36
L                  2    3   4    4    3    4    2    3    3     20     41    23
M                  1    2   1    0    2    3    1    1    1     12     14     9
N                  4    1   0    1    0    1    1    2    1     20      6    12

Weights (0-4): 0 = No Impact; 1 = Moderate/Low Impact; 2 = Moderate Impact; 3 = Moderate/High Impact; 4 = High Impact.

Ratings (0-4), impact on publishing data: 0 = Negative; 1 = Somewhat Negative; 2 = Neutral; 3 = Somewhat Positive; 4 = Positive.
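The score columns of the Data Evaluation Workbook (Table 3) are consistent with a simple weighted sum. Each topic score is the sum, over that topic's sub-topics, of the sub-topic weight multiplied by the dataset's 0-4 rating. For dataset A, Value = 4×2 + 4×3 = 20. The sketch below reproduces that calculation, assuming the Value/Cost/Risk grouping of sub-topics listed with the data evaluation factors (Table 2). The function and key names are illustrative and do not come from the Toolkit's spreadsheet.

```python
# Sub-topics in Table 3 column order, grouped by evaluation topic.
TOPICS = {
    "value": ["stakeholders", "value_drivers"],
    "cost": ["format", "frequency", "review", "operation_maintenance"],
    "risk": ["privacy", "security", "other"],
}

# Example weights taken from the "Assigned Weight [0-4]" row of Table 3.
WEIGHTS = {
    "stakeholders": 4, "value_drivers": 4,
    "format": 2, "frequency": 4, "review": 3, "operation_maintenance": 2,
    "privacy": 4, "security": 3, "other": 2,
}


def ratings_from_row(row):
    """Map nine 0-4 ratings, given in Table 3 column order, to sub-topic names."""
    names = [sub for subs in TOPICS.values() for sub in subs]
    if len(row) != len(names):
        raise ValueError(f"expected {len(names)} ratings, got {len(row)}")
    return dict(zip(names, row))


def topic_scores(ratings, weights=WEIGHTS):
    """Weighted sum of ratings per topic: sum(weight * rating) over each topic's sub-topics."""
    for name, rating in ratings.items():
        if not 0 <= rating <= 4:
            raise ValueError(f"rating for {name} must be in 0-4, got {rating}")
    return {
        topic: sum(weights[sub] * ratings[sub] for sub in subs)
        for topic, subs in TOPICS.items()
    }
```

For example, `topic_scores(ratings_from_row([2, 3, 2, 0, 1, 1, 3, 2, 1]))` gives value 20, cost 9, and risk 20, matching row A. Changing an entry in `WEIGHTS` is how an institution would express that, for instance, risk matters more to it than cost.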


Datasets in the top left quadrant should be given the highest priority for public release because they have high value and low risk. As each organization has different resource constraints, the cost will need to be weighed in relation to the risk and value.

Green circles are datasets with the highest priority for release as Value is high, Risk is low and Cost is low. Yellow datasets should be given moderate consideration even though the Risk, Value, and/or Cost may be somewhat undesirable. Red datasets should be given the lowest priority as Risk is high, Value is low and Cost is high.

Figure 1. Data prioritization framework: Data prioritization matrix, built using the data evaluation factors and workbook.

The X and Y axes represent "Risk" and "Value" respectively; circle size represents "Cost."
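The green/yellow/red coding of the prioritization matrix can be written as a small decision rule. In this sketch, value, risk, and cost are assumed to be already normalized to the range 0-1, with higher numbers meaning more value, more risk, or more cost. The 0.5 cut-off is an arbitrary illustrative choice. The report does not specify how workbook scores are converted to these axes or where the thresholds lie.

```python
def priority_colour(value, risk, cost, threshold=0.5):
    """Classify a dataset as green, yellow, or red for release priority.

    Inputs are hypothetical normalized levels in [0, 1]; `threshold` is an
    illustrative cut-off separating "high" from "low" on each dimension.
    """
    for name, level in (("value", value), ("risk", risk), ("cost", cost)):
        if not 0.0 <= level <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {level}")
    high_value = value >= threshold
    low_risk = risk < threshold
    low_cost = cost < threshold
    # Green: high value, low risk, and low cost together.
    if high_value and low_risk and low_cost:
        return "green"
    # Red: low value, high risk, and high cost together.
    if not high_value and not low_risk and not low_cost:
        return "red"
    # Everything else warrants moderate consideration.
    return "yellow"
```

Keeping cost as a separate input, rather than folding it into value or risk, mirrors the matrix, where cost is shown as circle size and each organization weighs it against its own resource constraints.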

Objective 3. Provide coordinated support to Centers and CRPs in their ongoing efforts towards OA/OD, and leadership for external efforts.

It is clear from the OA/OD assessments conducted in 2014 and 2015 that almost every Center and CRP has taken steps towards making publications and data openly accessible, and towards better documenting and organizing other research outputs. This work is continuing, but is likely to be of limited utility without a consistent approach to human and technical infrastructure, policies, standards, and interoperability, among other issues. To ensure that the OA/OD “scaffolding” being erected across CGIAR can interlink and pay off, it is critical that the OA/OD initiative continue to provide coordinated support to each Center and CRP.

In recognition of Centers’ expressed need for a set of tools, best practices, and examples to help with OA/OD implementation, an OA/OD Support Pack has been developed (Output 3.1) as a living resource that brings together exemplar policies, templates, workflows, guidance on licensing and on publishing while retaining copyright, data management plans, advocacy materials and talking points in support of OA/OD, ToRs for OA/OD-related positions, and more. This resource currently exists as an open cgxchange platform that has only been publicized to the knowledge and data manager communities at CGIAR Centers, but it will continue to grow as more materials are added in response to the Center needs identified above. Simple usage analytics indicate that almost 9,000 sessions were initiated on the site by about 4,000 users in the one-year period up to January 2016. A more attractive and user-friendly redesign and deployment of the Support Pack is being considered (perhaps along the lines of the University of Oxford’s Research Data “tools, services, and training” webpage).

To increase momentum towards OA/OD, each Center (as the enduring entity and ultimate steward of resources) is expected to develop, for itself and the CRPs it leads, a clear, practical, and actionable Open Access and Data Management Implementation Plan that complies with CGIAR’s Open Access and Data Management Policy (Output 3.2). A template implementation plan was developed to encourage consistency among Centers, and regional workshops focused on using the template to draft implementation plans were organized. Two of these were held in April: one in Kathmandu, Nepal (organized by CIMMYT-Nepal) for Centers in Asia, and one at ILRI in Addis Ababa, Ethiopia for those in Africa, the Middle East, and Europe. A last workshop for the Americas Centers is scheduled for August 18-21 at CIAT in Cali, Colombia. Three to four data, information, communications, and knowledge specialists and/or legal/IP personnel attended for each Center, and were generally able to leave with a draft implementation plan in hand. About seven Centers have submitted drafts to the Consortium Office; others are still being finalized through discussion with Center leadership.

Infrastructural work to improve and link CGIAR tools and platforms (Output 3.3) began before this project started, through substantial in-kind contributions from staff at several Centers (CIAT, Bioversity, CIMMYT, ILRI, IFPRI). This work is continuing under the auspices of the OA/OD project and includes the following activities:

Development of a reference ontology for agronomic trial management, which will enable linkages between breeding and agronomy data, be available as Linked Open Data, and be of value to a variety of tools and users, including the IBP-BMS and AgTrials. This activity is being led, through a sub-grant, by a Bioversity-based team specializing in ontologies. It involves active engagement with the AgMIP team and the ICASA data dictionary it developed, as well as input from CIRAD and INRA scientists; CGIAR researchers will provide input on a future draft.

A total of 631 variables, mainly from the ICASA Management section, and 324 variables developed by Medha Devare have been compared, and 300 of them have been selected for the Agronomy Ontology. Under a newly defined convention, each parameter is assigned a six-letter code, and each variable's abbreviation is composed of the parameter, method class, and unit abbreviations. Variables are grouped into three categories: experiment information, agricultural practices, and monitored data. First versions of the experiment information and agricultural practices categories have been completed; the monitored data category is still being developed. Where relevant, Agronomy Ontology variables are cross-referenced with AGROVOC¹ concepts, and useful external ontologies to leverage are being identified in parallel. A first consultation with about 10 scientists from INRA and CIRAD was held at the CGIAR Consortium Office in early December 2015 to (i) introduce the project, (ii) initiate a discussion about harmonizing data annotation, and (iii) identify scientists willing to contribute. These scientists are agronomists, socioeconomists, or data experts working with information systems for agronomic trial networks such as AGROSYST and ADONIS. There was strong interest in the Agronomy Ontology: eight scientists agreed to join an interest group that will provide guidance and content for the ontology, as well as feedback on the database schema for the fieldbook.

The Agronomy Ontology will underlie the agronomy fieldbook, which is envisioned to allow agronomists across CGIAR and its partners to create data sheets and collect data (on laptops, tablets, or mobile devices) with consistent metadata, terminology, scales, and methodologies, and to store the data temporarily in the IBP database with view/edit permissions. A one-click upload to AgTrials and APIs for easily depositing final fieldbook data in key repositories are also envisioned; together, these improvements will go a long way towards addressing data quality issues. Preliminary design of the fieldbook is underway, using ODK as a simple proof-of-concept platform, with full-scale testing envisioned for February–March.

¹ AGROVOC is a controlled vocabulary covering all areas of interest of the Food and Agriculture Organization (FAO) of the United Nations, including food, nutrition, agriculture, fisheries, forestry, and the environment. AGROVOC consists of over 32,000 concepts available in 23 languages.

Improvements to the AgTrials interface, navigation, metadata schema, and data submission forms are underway, following agreement on these points with the CCAFS team at CIAT in August, and are supported by a sub-grant to the AgTrials team at CCAFS.

Work with the CSI community to address geospatial data needs and accessibility is continuing, partly with project funds and partly by leveraging other funding sources (e.g., the recently approved African Agriculture Technology Platform's Virtual Information Platform, AATP-VIP), while prioritizing linkages with additional platforms where useful.
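The variable-naming convention described for the Agronomy Ontology (a six-letter code per parameter, with each variable's abbreviation composed of the parameter, method class, and unit abbreviations) can be illustrated with a minimal Python sketch. The report does not specify a separator, field names, or any concrete codes, so the underscore separator, the `AgronomyVariable` class, and the example code `PLTDEN` are hypothetical assumptions for illustration only, not the ontology's actual identifiers.

```python
from dataclasses import dataclass

# The three variable categories named in the report.
CATEGORIES = frozenset(
    {"experiment information", "agricultural practices", "monitored data"}
)

# ASSUMPTION: the report does not say how the abbreviation parts are joined.
SEPARATOR = "_"


@dataclass(frozen=True)
class AgronomyVariable:
    """Hypothetical record for one Agronomy Ontology variable."""

    parameter_code: str  # six-letter code assigned to the parameter
    method_class: str    # abbreviation of the method class
    unit: str            # abbreviation of the unit
    category: str        # one of CATEGORIES

    def __post_init__(self) -> None:
        # Enforce the six-letter parameter code convention.
        if len(self.parameter_code) != 6 or not self.parameter_code.isalpha():
            raise ValueError(
                f"parameter code must be six letters, got {self.parameter_code!r}"
            )
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown category {self.category!r}")

    @property
    def abbreviation(self) -> str:
        # Variable abbreviation = parameter + method class + unit abbreviations.
        return SEPARATOR.join((self.parameter_code, self.method_class, self.unit))


if __name__ == "__main__":
    # Hypothetical example: plant density, counted, in plants per square metre.
    density = AgronomyVariable("PLTDEN", "Cnt", "plm2", "monitored data")
    print(density.abbreviation)  # PLTDEN_Cnt_plm2
```

Validating the code length and category when a variable is constructed keeps malformed entries out early; an actual implementation would follow whatever separator and casing rules the ontology team adopted.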

Through the OA/OD initiative, CGIAR is collaborating with, and providing leadership on, OA/OD efforts in the agricultural domain (Output 3.4). It is working closely with FAO on a coherent road map for agrisemantics efforts, with AgMIP on data harmonization and interoperability, and with the Global Open Data for Agriculture and Nutrition (GODAN) initiative on impact assessment and advocacy. The OA/OD project is also enabling work with INRA and WUR on maturing the Open Data Journal for Agricultural Research (ODjAR), which aims to incentivize researchers to make their data open.

Other envisioned collaborations include close engagement in the conceptualization, writing, and implementation of two potential big data platforms:

• "Big Data & ICT" cross-cutting platform of the CGIAR Research Programs, to be led by CIAT and IFPRI

• "Biodiversity Informatics Platform", led by DivSeek, to leverage outputs of the Gates-funded GOBII project involving Cornell University and CGIAR

Objective 4. Plan for impact assessment.

Successful implementation of this project (Phase I and Phase II) raises the question: what impact, if any, has the move to open access had? This area is new enough that the appropriate metrics for assessing true impact (not just citation counts) are still being discussed, developed, and reformulated. This project will enable the development of a draft framework for assessing the impact of implementing OA/OD across CGIAR (Output 4.1). The framework will be developed in close collaboration with GODAN, FAO, and CTA, which have also allocated funding to this work and are grappling with many of the same issues as CGIAR. It will be finalized through feedback from Center-based staff, vendors (e.g., Altmetric), and ideally key personnel involved in the open-source Lagotto project. The goal is to include, but go beyond, traditional metrics (such as the open fraction of all publications and data sets produced by Centers and CRPs, and the estimated usage of these open products via citation and download counts) in order to assess whether OA/OD has increased data-driven innovation and services, stakeholder empowerment, and transparency.
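The traditional metrics named above (the open fraction of publications and data sets, and usage of open products measured through citation and download counts) reduce to simple ratios and totals. A minimal Python sketch of them follows, assuming a hypothetical per-output inventory; the `OutputRecord` fields and the function names are illustrative only and are not part of the planned framework, which is meant to go beyond these measures.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class OutputRecord:
    """Hypothetical inventory entry for one Center or CRP research output."""

    center: str
    kind: str           # e.g. "publication" or "dataset"
    is_open: bool       # openly accessible or not
    citations: int = 0
    downloads: int = 0


def open_fraction(records: Iterable[OutputRecord], kind: str) -> Optional[float]:
    """Share of outputs of one kind that are open; None if there are none."""
    selected = [r for r in records if r.kind == kind]
    if not selected:
        return None
    # True counts as 1 and False as 0 when summed.
    return sum(r.is_open for r in selected) / len(selected)


def open_usage(records: Iterable[OutputRecord]) -> Tuple[int, int]:
    """Total (citations, downloads) across open outputs only."""
    open_records = [r for r in records if r.is_open]
    return (
        sum(r.citations for r in open_records),
        sum(r.downloads for r in open_records),
    )
```

For example, if three of a Center's four publications are open, `open_fraction` returns 0.75. Returning `None` rather than 0 for an empty category avoids reporting a Center with no outputs of that kind as entirely closed.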

Objective 5. Plan for Phase 2: Implementation.

This 12-month Phase I project provides data and information indicating where additional funding support should be focused to move CGIAR further along the road to findable, accessible, interoperable, and reusable (FAIR) research outputs. Discussions with a number of funding and implementing entities in the agricultural domain (BMGF, USAID, USDA, NCBI, CIRAD/INRA) will, where possible, help build common understanding, policies, support, expectations, and vision through a Phase II proposal. That proposal will aim to render CGIAR's publications and data discoverable, accessible, and reusable, and to deliver an integrated and contextualized view of research outputs across data streams, types, disciplines, Centers, and CRPs (Output 5.1). It will include sub-grants to Centers, based on the needs articulated via assessments, workshops, and implementation plans, to bridge capacity gaps and to address culture change and advocacy. These sub-grants will be awarded through RFPs and will allow Phase I momentum to continue until OA/OD becomes an integral part of CGIAR project proposals. In addition to fostering an enabling culture for OA/OD, the Phase II proposal will emphasize building infrastructure for data discoverability, interoperability, and harmonization, which will in turn allow intelligent integration and analysis. A mature indexing system and an overarching portal enabling discovery of research outputs across CGIAR are seen as the centerpiece of this proposal.