GMOD in action: the Legume Federation project Ethy Cannon Iowa State University GMOD 2016

Oct 11, 2020

Page 1:

GMOD in action: the Legume Federation project

Ethy Cannon Iowa State University

GMOD 2016

Page 2:

Goals for this talk

●  Describe the Legume Federation

●  Show how we are using GMOD components to achieve our aim

●  Open a discussion:

○  Why form federations?

○  What are we missing that GMOD can provide?

○  What is missing in GMOD to support a federation?

Page 3:

The Legume Federation - http://legumefederation.org/

The Legume Federation is an NSF project to build a federation of legume databases through data standards, distributed development and comparative analysis, to support research across the legume family, and to support robust agriculture for a world that is significantly "legume-fed".

Page 4:

The Legume Federation - http://legumefederation.org/

Investigator institutions:

Iowa State University

National Center for Genome Resources (NCGR)

USDA-ARS

J. Craig Venter Institute (JCVI)

CyVerse

Page 5:

The Legume Federation - http://legumefederation.org/

Members and collaborators:

Alfalfa Genome

Cool Season Food Legume Database

Feed the Future Climate Resilient Chickpea

KnowPulse

Legume Information System (NCGR & USDA-ARS)

Medicago truncatula HapMap

Medicago genome (JCVI)

PeanutBase (ISU & USDA-ARS)

SoyBase

Page 6:

But what is a federation, really?

•  Communication (human and computer) and cooperation.

•  Sharing data and software components.

•  Agreement on data exchange formats, terms, web service APIs, requirements for data deposit (e.g. use of standard repositories, metadata, integrity).

•  Caring for full lifecycle of a web resource, which may include porting to a more permanent resource at the end of funding.

•  Respect a level of autonomy and recognize need for domain experts.

Page 7:

Why federate?

•  Data management grows ever more expensive.

•  Extend limited personnel and resources.

•  Proliferation of specialized but useful web resources.

•  Help for smaller members.

Page 8:

Why federate legume web resources?

•  Legumes are extremely important:

○  high-protein food

○  forage and feed

○  improve soil

•  Many research communities: Medicago truncatula, Lotus japonicus, adzuki bean, alfalfa, apios, bambara groundnut, birdsfoot trefoil, black gram, carob, chickpea, clovers, common bean, cowpea, faba bean, fenugreek, grass pea, guar, horse gram, indigo, lablab, lentil, licorice, lima bean, lupin, moth bean, mesquite, mung bean, pea, peanut, pigeon pea, rice bean, scarlet runner bean, soybean, tamarind, tepary bean, yellow pea, vetch, winged bean

•  Taxonomic relatedness enables comparative research

Page 9:

The Legume Federation

•  Communication, coordination and collaboration

•  Data and metadata standardization and exchange

•  Data repository

•  Linking data across legume species

•  Development

•  Training

Page 10:

The Legume Federation

•  Communication, coordination and collaboration*

•  Data and metadata standardization and exchange*

•  Data repository

•  Linking data across species

•  Development*

Page 11:

The Legume Federation

•  Communication, coordination and collaboration

•  Data and metadata standardization and exchange

•  Data repository

•  Linking data across species*

•  Development

Page 12:

The Legume Federation

Also...

•  Provides web resources for small research communities

•  Provides web resources for long-term projects generating significant quantities of data

•  Develops sharable data curation practices

•  Supports full lifecycle of web resource

Page 13:

The Legume Federation

Also...

•  Provide web resources for small research communities*

•  Provide web resources for long-term projects generating significant quantities of data*

•  Develop sharable data curation practices

•  Support full lifecycle of web resource*

Page 14:

Communication, coordination and collaboration

•  Coordinate and communicate across legume web resources

•  Share development efforts

•  Engage major data generators

•  Communicate with research communities

Page 15:

Communication, coordination and collaboration

•  Coordinate and communicate across legume data centers: GMOD community

•  Share development efforts: Tripal/Chado, InterMine

•  Engage major data generators: provide VMs of website/database with data loaders

•  Communicate to research communities

Page 16:

Data and metadata standardization and exchange

•  Standardization of metadata

•  Standardization of data exchange

•  Use of established ontologies

•  Use of common data collection templates

Page 17:

Data and metadata standardization and exchange

•  Standardization of metadata: Tripal, ?

•  Standardization of data exchange: GBrowse, JBrowse, Tripal web services, Chado, ?

•  Use of established ontologies: Tripal, ?

•  Use of common data collection templates: collaborations with other dbs, Tripal

Page 18:

Data repository

•  A central location where researchers can find and download datasets

•  Support PURLs (persistent URLs); currently planning to use ARKs (Archival Resource Keys) for major datasets

•  Internal IDs for derived data and for attaching metadata directly to files

•  Requires good metadata, at least semi-standardized

•  Partnering with CyVerse

Page 19:

Data repository – internal IDs

Concept: file name includes an opaque ID which links to its metadata.

Example: Vigra_cDNA_4xGe.gff

This is a file modified to meet genome browser requirements. The ID “4xGe” links to metadata with the original file, a description of the genome project, and an explanation of how it was changed from the original.

Example: Vigra_genome_jU8x.fas

A file containing the Vigna radiata pseudomolecule sequence. The ID “jU8x” is linked to metadata about this file, including its original filename (required) and information about the project which produced it.
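
The internal-ID concept can be sketched in a few lines of Python. This is only an illustration: the naming pattern (species, data type, four-character ID, extension) is inferred from the two example file names, and the `METADATA` registry, its field names, and the helper functions are hypothetical rather than part of the Legume Federation repository.

```python
import re
from pathlib import PurePosixPath

# Hypothetical metadata registry keyed by opaque ID. The field name is
# illustrative; the descriptions paraphrase the two examples on the slide.
METADATA = {
    "4xGe": {"description": "GFF modified to meet genome browser requirements"},
    "jU8x": {"description": "Vigna radiata pseudomolecule sequence"},
}

# Assumed naming pattern: <species>_<datatype>_<4-character ID>.<extension>
NAME_PATTERN = re.compile(
    r"^(?P<species>[^_]+)_(?P<kind>[^_]+)_(?P<id>[A-Za-z0-9]{4})\.(?P<ext>\w+)$"
)


def parse_name(filename):
    """Split a repository file name into its parts, or raise ValueError."""
    match = NAME_PATTERN.match(PurePosixPath(filename).name)
    if match is None:
        raise ValueError(f"not a recognized repository file name: {filename}")
    return match.groupdict()


def lookup_metadata(filename, registry=METADATA):
    """Return the metadata record linked to the opaque ID in the file name."""
    return registry[parse_name(filename)["id"]]
```

Because the ID is opaque, the file can be renamed or moved without losing the link to its metadata, as long as the ID stays in the name.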

Page 20:

Data repository – internal IDs

Concept: file name includes an opaque ID which links to its metadata.

A file containing a Vigna angularis pseudomolecule sequence. The ID “jyYC” is linked to metadata about this file, including its original filename (required) and information about the project which produced it.

Page 21:

Data repository – internal IDs

Concept: file name includes an opaque ID which links to its metadata.

This is a file generated by LIS to complement this V. angularis genome dataset. The ID “3Nz5” links to metadata describing the genome dataset, and an explanation of how this file was created.

NEED A NEW SCREEN SHOT

Page 22:

Development

Enable sharing of development efforts, encourage good development practices, increase use of existing software.

Page 23:

Development

CMapII: JavaScript | In design

InterMine instances (http://mines.legumeinfo.org/): working development instances: BeanMine, SoyMine, PeanutMine, LegumeMine; established instances: MedicMine, ThaleMine

Tripal modules: BLAST, Ontology Search, QTL, Phylotree, Domain Search. All in active use; the QTL module will be rewritten by the Main lab.

Context viewer: JavaScript+Django | In active use

CViTjs (whole genome viewer): JavaScript | Beta expected this month

Page 24:

Development

CMapII

USDA-ARS & NCGR - Steven Cannon, Andrew Farmer, Sudhansu Dash, Ethy Cannon, Alex Rice, Alan Cleary, Andrew Wilkey, David Grant

•  JavaScript

•  Will read GFF files

•  Support all CMap features + SoyBase CMap extensions

•  Handle large numbers of features

•  Would like comparative views

Dorrie Main’s lab is developing a Tripal map viewer with all features from CMap, which will pull map data from Chado. Contact us if you would like to be involved with the design.

Page 25:

Development

InterMine instances (http://mines.legumeinfo.org/)

NCGR - Sam Hokin & Andrew Farmer | JCVI - Vivek Krishnakumar

BeanMine, MedicMine, PeanutMine, SoyMine, ThaleMine

Page 26:

Development - Tripal

Tripal modules: BLAST, Ontology Search, QTL, Phylotree, Domain Search

Lead developer: Lacey Sanderson (USask)

Available and in use.

Page 27:

Development - Tripal

Tripal modules: BLAST, Ontology Search, QTL, Phylotree, Domain Search

Page 28:

Development - Tripal

Tripal modules: BLAST, Ontology Search, QTL, Phylotree, Domain Search

REST Web services: Prateek Gupta

●  GET list of target databases

●  GET available BLAST options

●  POST job

●  GET status

●  GET results
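
The order of the five BLAST web service calls can be sketched as below. This is an assumption-heavy illustration, not the Tripal BLAST API: the route names (`/databases`, `/options`, `/jobs`), the JSON-like payloads, and the database name used in the usage note are all hypothetical, and an in-memory stand-in replaces a real server so the example runs without a network.

```python
import itertools


class FakeBlastService:
    """In-memory stand-in for a BLAST REST service (hypothetical routes)."""

    def __init__(self, databases, options):
        self.databases = databases
        self.options = options
        self.jobs = {}
        self._ids = itertools.count(1)

    def handle(self, method, path, body=None):
        if (method, path) == ("GET", "/databases"):
            return self.databases
        if (method, path) == ("GET", "/options"):
            return self.options
        if (method, path) == ("POST", "/jobs"):
            job_id = str(next(self._ids))
            # A real service would queue the search; here it finishes at once.
            self.jobs[job_id] = {
                "status": "done",
                "results": f"hits for {body['query']}",
            }
            return {"job_id": job_id}
        if method == "GET" and path.startswith("/jobs/"):
            parts = path.split("/")  # ['', 'jobs', <id>, 'status' or 'results']
            job = self.jobs[parts[2]]
            if parts[3] == "status":
                return {"status": job["status"]}
            return {"results": job["results"]}
        raise ValueError(f"unsupported request: {method} {path}")


def run_blast(send, query, database):
    """Make the five calls in order using a send(method, path, body) callable."""
    if database not in send("GET", "/databases"):
        raise ValueError(f"unknown target database: {database}")
    options = send("GET", "/options")
    job = send("POST", "/jobs", {"query": query, "database": database, "options": options})
    job_id = job["job_id"]
    if send("GET", f"/jobs/{job_id}/status")["status"] != "done":
        raise RuntimeError("job not finished; a real client would poll")
    return send("GET", f"/jobs/{job_id}/results")["results"]
```

For example, `run_blast(FakeBlastService(["example_db"], {}).handle, "ACGT", "example_db")` walks all five calls. Against a real service, `send` would wrap an HTTP client and the status check would poll until the job completes.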

Page 29:

Development - Tripal

Tripal modules: BLAST, Ontology Search, QTL, Phylotree, Domain Search

REST Web services: Prateek Gupta

Status: testing and documenting.

Release: end of summer?

Next: consume CoGe BLAST Web services. (LegFed customization

Page 30:

Development - Tripal

Tripal modules: BLAST, Ontology Search, QTL, Phylotree, Domain Search

ISU & Washington State - Ethy Cannon & Sook Jung

●  Preliminary QTL modules exist at CoolSeasonFoodLegumes and PeanutBase/LegumeInfo (adapted from the CSFL module).

●  New QTL data dictionary developed jointly by Ethy and Sook Jung with input from other groups.

●  Tripal module for the new data dictionary will be developed by Dorrie Main’s group after Tripal 3 is released.

Page 31:

Development - Tripal

Tripal modules: BLAST, Ontology Search, QTL, Phylotree, Domain Search

ISU & Washington State - Ethy Cannon & Sook Jung

Page 32:

Development - Tripal

Tripal modules: BLAST, Ontology Search, QTL, Phylotree, Domain Search

ISU & Washington State - Ethy Cannon & Sook Jung

Page 33:

Development - Tripal

Tripal modules: BLAST, Ontology Search, QTL, Phylotree, Domain Search

ISU & Washington State - Ethy Cannon & Sook Jung

Page 34:

Development - Tripal

Tripal modules: BLAST, Ontology Search, QTL, Phylotree, Domain Search

ISU - Shivan Gunda and Ethy Cannon; idea by David Grant

Takes advantage of the structure of ontology trees to improve searching of data objects with attached ontology terms.

1. Find all terms in selected ontologies that contain the search text.

2. Find all children of those terms.

3. Retrieve data objects annotated with those terms.

Bonus: sibling terms provide user with related terms that might more closely match what is being sought.

Intended to be used as a library. Core functionality can be used outside Tripal.

Page 35:

Development - Tripal

Tripal modules: BLAST, Ontology Search, QTL, Phylotree, Domain Search

ISU - Shivan Gunda and Ethy Cannon; idea by David Grant

Basic functionality:

•  SetOntologies(ontology-list)

•  SearchTerms(search-text)

•  GetChildren(term)

•  GetSiblings(term)

•  GetParents(term)

Hope to release at the end of this summer
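
A toy version of the five operations, combined into the three-step search described on the previous slide, might look like the sketch below. It is not the module's code: the class, the snake_case method names, the dictionary-based ontology format, and the sample terms and QTL identifiers are all assumptions made for illustration. Step 2 is read here as "all descendants", which is one possible interpretation of "children".

```python
class OntologySearch:
    """Toy in-memory version of the five listed operations."""

    def __init__(self, ontologies):
        # ontologies: {ontology name: {term: [parent terms]}}
        self.ontologies = ontologies
        self.parents = {}

    def set_ontologies(self, names):
        """Restrict later calls to the named ontologies."""
        self.parents = {}
        for name in names:
            self.parents.update(self.ontologies[name])

    def search_terms(self, text):
        return sorted(t for t in self.parents if text.lower() in t.lower())

    def get_parents(self, term):
        return sorted(self.parents[term])

    def get_children(self, term):
        return sorted(t for t, ps in self.parents.items() if term in ps)

    def get_siblings(self, term):
        siblings = set()
        for parent in self.parents[term]:
            siblings.update(self.get_children(parent))
        siblings.discard(term)
        return sorted(siblings)

    def descendants(self, term):
        """Children, grandchildren, and so on, of a term."""
        found, stack = set(), [term]
        while stack:
            for child in self.get_children(stack.pop()):
                if child not in found:
                    found.add(child)
                    stack.append(child)
        return found


def find_objects(search, annotations, text):
    """Steps 1-3: match terms, expand to descendants, collect annotated objects."""
    terms = set(search.search_terms(text))
    for term in list(terms):
        terms |= search.descendants(term)
    return sorted(obj for obj, term in annotations.items() if term in terms)


# Hypothetical trait ontology and QTL annotations, for illustration only.
TOY_ONTOLOGIES = {
    "traits": {
        "seed quality": [],
        "oil content": ["seed quality"],
        "oleic acid content": ["oil content"],
        "linoleic acid content": ["oil content"],
        "protein content": ["seed quality"],
    }
}
TOY_ANNOTATIONS = {
    "qtl_1": "oil content",
    "qtl_2": "oleic acid content",
    "qtl_3": "protein content",
    "qtl_4": "linoleic acid content",
}
```

With the toy data, searching for "oil" after `set_ontologies(["traits"])` returns the objects annotated with the oleic and linoleic terms as well as the oil term itself, and `get_siblings("oil content")` suggests "protein content" as a related term.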

Page 36:

Development - Tripal

Tripal modules: BLAST, Ontology Search, QTL, Phylotree, Domain Search

ISU - Shivan Gunda and Ethy Cannon

Implemented in QTL search at PeanutBase

Old way: “oil” → only traits containing the word “oil”

New way: “oil” → traits containing the words “oil”, “linoleic” and “oleic”

Page 37:

Development - Tripal

Tripal modules: BLAST, Ontology Search, QTL, Phylotree, Domain Search

ISU - Shivan Gunda and Ethy Cannon

Sibling terms can give additional hints:

Page 38:

Development - Tripal

Tripal modules: BLAST, Ontology Search, QTL, Phylotree, Domain Search

NCGR - Pooja Umale & Andrew Farmer

Available and in use at LegumeInfo.org and PeanutBase.org.

Page 39:

Development - Tripal

Tripal modules: BLAST, Ontology Search, QTL, Phylotree, Domain Search

NCGR - Alex Rice & Andrew Farmer

Available at LegumeInfo.org. Ready to become a full-fledged Tripal module but first needs a volunteer to implement it at a new website.

Page 40:

Development

Genomic context viewer

NCGR - Alan Cleary and Andrew Farmer

Django + JavaScript

Displays gene synteny among the species hosted at LegumeInfo.org.

Page 41:

Development

CViTjs (whole genome viewer)

ISU - Andrew Wilkey, Ethy Cannon & Steven Cannon

An interactive JavaScript version of CViT.

Software stack: RequireJS, Paper.js, jQuery, Bootstrap

Page 42:

Development

CViTjs (whole genome viewer)

ISU - Andrew Wilkey, Ethy Cannon & Steven Cannon

An interactive JavaScript version of CViT.

Status: Approaching beta.

https://github.com/awilkey/cvitjs

Page 43:

Iowa State University: Jacqueline Campbell, Ethy Cannon, David Fernandez-Baca*, Shivan Gunda, Prateek Gupta, Wei Huang, Andrew Wilkey, Akshay Yadov

National Center for Genome Resources: Joel Berendzen, Alan Cleary, Sudhansu Dash, Andrew Farmer, Sam Hokin, Alex Rice, Pooja Umale

USDA-ARS: Steven Cannon, Scott Kalberer, Nathan Weeks

J. Craig Venter Institute: Agnes Chan, Vivek Krishnakumar, Chris Town

CyVerse: Eric Lyons

Tripal: Stephen Ficklin, Lacey Sanderson

Main Lab: Dorrie Main, Sook Jung, Taein Lee, Chun-Huai Cheng

SoyBase: David Grant, Rex Nelson, Kevin Feely

Page 44:

Discussion

●  What are we missing that GMOD can provide?

●  What is missing in GMOD to support a federation?

●  What purpose would a centralized data repository serve and how should researchers interact with it?