Organizing Scientific Competitions on the Semantic Web Sayoko Shimoyama, Robert Sidney Cox III, David Gifford and Tetsuro Toyoda Integrated Database Unit, Advanced Center of Computing and Communication (ACCC), RIKEN, Japan DEXA2013, August 27
Jul 24, 2015
» Data repositories and directories for open data help users register their data resources and locate related data; one example is CKAN (Comprehensive Knowledge Archive Network), a web-based system for the storage and distribution of data.
» However, separating data from applications on the web makes collaboration between data and applications invisible, so contributions to opening and maintaining data are not evaluated as appropriately as contributions to Apps.
This situation does not motivate people to contribute by donating their own datasets.
» To overcome this situation, we developed LinkData as a data publishing platform and LinkDataApp as an application publishing platform.
» We combined them by automatically recording dependency graphs that relate Data to the Apps that use it.
This cycle, in which Apps are created for Data and Data is created for use with Apps, enhances a wide range of synergistic collaborations.
1. Support functions for creating table data to upload:
» Users can create a template by inputting metadata using LinkData’s GUI and downloading it.
» The schema of any published Data can be reused for publishing new datasets.
» Users enter their data into this template to create their own table data for uploading.
[Figure: a template is either created new or reused from published Data]
                 phone number   number of books
Central library  045-111-1111   154,265
West library     045-222-2222   65,489
South library    045-333-3333   98,548

Central library → phone number → 045-111-1111
Central library → number of books → 154,265
2. Conversion to RDF and publishing:
» Template data tables can be uploaded, converted to RDF, and published online at LinkData.org.
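The conversion step can be sketched as follows, using the library table from the example. This is a minimal illustration, not LinkData's actual converter: the base URI, the `name` column header (the slide leaves the first column unlabeled), and the helper names are all assumptions.

```python
# Hypothetical base URI; the real LinkData URI scheme is not given in the slides.
BASE = "http://example.org/library/"

# Rows of the example table. The "name" key is an assumed header for the
# first column, which is unlabeled in the original table.
rows = [
    {"name": "Central library", "phone number": "045-111-1111", "number of books": "154,265"},
    {"name": "West library", "phone number": "045-222-2222", "number of books": "65,489"},
    {"name": "South library", "phone number": "045-333-3333", "number of books": "98,548"},
]


def slug(text):
    """Turn a label into a URI-friendly fragment."""
    return text.strip().replace(" ", "_")


def row_to_triples(row, subject_column="name"):
    """Emit one (subject, predicate, object) triple per non-subject column."""
    subject = f"<{BASE}{slug(row[subject_column])}>"
    triples = []
    for column, value in row.items():
        if column == subject_column:
            continue
        predicate = f"<{BASE}{slug(column)}>"
        triples.append((subject, predicate, f'"{value}"'))
    return triples


def to_ntriples(table):
    """Serialize all rows as N-Triples-style lines."""
    lines = []
    for row in table:
        for s, p, o in row_to_triples(row):
            lines.append(f"{s} {p} {o} .")
    return "\n".join(lines)


print(to_ntriples(rows))
```

Each row becomes a subject and each remaining column becomes a predicate, which mirrors the "Central library → phone number → 045-111-1111" pattern in the example.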
3. Application development support function:
» Application developers can access Data content using APIs provided by the LinkData platform.
» Eight formats are provided:
• TSV
• RDF/Turtle
• RDF/JSON
• RDF/XML
• RSS
• KML
• R (for statistical analysis)
• Simple Data Format
Due to these functions, LinkData supports not only publishing data but also using data.
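As a rough illustration of using data in one of these formats, the sketch below parses a TSV payload into records. The payload is made up from the library example; the actual column layout of LinkData's TSV output and the API endpoints are not shown in the slides, so no network call is attempted here.

```python
import csv
import io

# Made-up TSV payload for illustration only; the real LinkData TSV layout
# is an assumption, as is the "name" header for the first column.
tsv_text = (
    "name\tphone number\tnumber of books\n"
    "Central library\t045-111-1111\t154,265\n"
    "West library\t045-222-2222\t65,489\n"
    "South library\t045-333-3333\t98,548\n"
)


def parse_tsv(text):
    """Read tab-separated text into a list of dicts keyed by header."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return list(reader)


def total_books(records):
    """Sum the 'number of books' column, ignoring thousands separators."""
    return sum(int(r["number of books"].replace(",", "")) for r in records)


records = parse_tsv(tsv_text)
print(len(records), "libraries,", total_books(records), "books in total")
```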
1. Create a new App by editing a sample program:
» Users choose Data as an input and edit sample JavaScript programs on a web browser to develop their own original App.
2. Fork an App to publish as a new one:
» Published Apps on LinkDataApp can be forked.
» Users can fork and modify the program to publish it as a new App.
3. Change input Data to create a new App:
» Even a non-programmer can add new functionality to an App by changing the Input Data.
[Figure: three ways to create an App in the JavaScript editor: choose input Data and create, change input Data, or fork]
Entity             Definition
Data               A single dataset that has been published by a User in LinkData
Application (App)  A single application that has been published by a User in LinkData
User               A user who has registered for a LinkData account

Graph                  Term         Label  Definition
Data(new) → Data(old)  Reuse        Ldd    Create new Data by reusing existing Data
Data → User            Contributed  Ldu    The relationship between existing Data and the user who created it
App(new) → App(old)    Fork         Laa    Create a new App by reusing an existing App's program code
App → Data             Load         Lad    Create an App by specifying files from particular Data as input
App → User             Contributed  Lau    The relationship between an existing App and the user who created it
User(A) → User(B)      Follow       Luu    User A follows User B to receive updates and information on Data and Apps evaluated by User B
User → Data            Vote         Lud    A user rates the Data as Useful or Un-useful
User → App             Vote         Lua    A user rates an App as Useful or Un-useful
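One way to picture the typed dependency graph is as a list of labeled edges whose endpoint kinds are checked against the definitions above. The sketch below is an assumption about representation only; the `Node`, `Edge`, and `make_edge` names and the node identifiers are hypothetical, and the slides do not describe how LinkData stores the graph internally.

```python
from collections import namedtuple

# Edge labels with their (source kind, target kind, term),
# taken from the relationship definitions above.
EDGE_TYPES = {
    "Ldd": ("Data", "Data", "Reuse"),
    "Ldu": ("Data", "User", "Contributed"),
    "Laa": ("App", "App", "Fork"),
    "Lad": ("App", "Data", "Load"),
    "Lau": ("App", "User", "Contributed"),
    "Luu": ("User", "User", "Follow"),
    "Lud": ("User", "Data", "Vote"),
    "Lua": ("User", "App", "Vote"),
}

Node = namedtuple("Node", ["kind", "ident"])
Edge = namedtuple("Edge", ["label", "source", "target"])


def make_edge(label, source, target):
    """Create an edge, rejecting endpoint kinds that do not match the label."""
    src_kind, dst_kind, _term = EDGE_TYPES[label]
    if (source.kind, target.kind) != (src_kind, dst_kind):
        raise ValueError(
            f"{label} expects {src_kind} -> {dst_kind}, "
            f"got {source.kind} -> {target.kind}"
        )
    return Edge(label, source, target)


# Hypothetical nodes for illustration.
dataset = Node("Data", "data-1")
app = Node("App", "app-1")
author = Node("User", "user-1")

graph = [
    make_edge("Lad", app, dataset),     # the App loads the Data
    make_edge("Lau", app, author),      # the App was contributed by the user
    make_edge("Ldu", dataset, author),  # the Data was contributed by the user
]
for edge in graph:
    print(EDGE_TYPES[edge.label][2], edge.source.ident, "->", edge.target.ident)
```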
» Count of hosted Data and Apps in LinkData (as of August 2013)
[Chart: values 655 and 316]
» Count of relationships among Data, Apps and Users in LinkData

Kind of relationship   Count
Load (App to Data)     1508
Fork (App to App)      153
Reuse (Data to Data)   41
Follow (User to User)  54
Vote (User to Data)    279
Vote (User to App)     108
There is a stronger synergy cycle between data resources and applications than “in data” (between data and data) or “in app” (between app and app).
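The comparison can be made concrete with the reported counts. The short sketch below only divides the Load count by the Fork and Reuse counts; the variable names are illustrative.

```python
# Relationship counts as reported for LinkData (August 2013).
counts = {
    "Load (App to Data)": 1508,
    "Fork (App to App)": 153,
    "Reuse (Data to Data)": 41,
}

load = counts["Load (App to Data)"]
load_vs_fork = load / counts["Fork (App to App)"]
load_vs_reuse = load / counts["Reuse (Data to Data)"]

print(f"Load vs Fork:  {load_vs_fork:.1f}x")   # about 9.9x
print(f"Load vs Reuse: {load_vs_reuse:.1f}x")  # about 36.8x
```

By these counts, App-to-Data loading is roughly ten times as frequent as App-to-App forking and roughly 37 times as frequent as Data-to-Data reuse.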
Example dependency graph among Data, Apps and Users.
» Dark Green edges indicate Data to Data reuse
» Red edges indicate Data to App loading
» Blue edges indicate App to App forking
» Bright Green edges indicate User ownership “contribution”
» Grey edges indicate votes to rate applications by users, and following of other users
The dependency graph allows users to dynamically contribute to and benefit from an automated rating of both data and applications.
Interactive Gene Association Matrix application created on LinkDataApp http://app.linkdata.org/app/app1s64i
Organizing Scientific Competitions on the LinkData platform
» For the synthetic biology competition GenoCon2 (http://genocon.org), we challenged participants to design novel regulatory DNA for controlling gene expression in the thale cress plant Arabidopsis thaliana.
» In addition to DNA sequences, we offered programs for DNA design.
PromoterCAD: Data Driven Design of Plant Regulatory DNA
» To give non-experts an opportunity to design DNA, we built a computer-aided design tool on the LinkData platform, called PromoterCAD.
» Using PromoterCAD function modules, genes with the desired properties can be found and mined for regulatory motifs. These are introduced into the synthetic promoter by user choice of regulatory position. Repeating this process can create complex regulation at the promoter.
» Finally, the DNA design is exported for error and safety checking, DNA synthesis, and experimental characterization.
http://app.linkdata.org/app/app1s335i
PromoterCAD LinkData system architecture for DNA design incorporates database information with user knowledge
» PromoterCAD uses several data sources for Tissue / Time specific promoter design.
Users can add their own data suited to promoter design.
Users can also create a new App or fork a pre-existing App for design.
Here we show how this cycle enhances collaborative synergy in the web-based scientific competition for synthetic biology promoter design.
This graph shows interaction between Data (Green box), Apps (Blue box), and Users (Grey box).
» Dark Green edges indicate Data to Data reuse
» Red edges indicate Data to App loading
» Blue edges indicate App to App forking
» Bright Green edges indicate User ownership “contribution”
» Grey edges indicate votes to rate applications by users, and following of other users
The App “GenoCon PromoterCAD” at http://app.linkdata.org/app/app1s94i is shown in the graph. The Dataset “Speedup Lists of Developmental Coexpression” at http://linkdata.org/work/rdf1s339i is a source for this graph.
• For example, the highly voted application ID:137 “A Promoter Design to Maintain the Fertility of Transgenic Plant by new Plugin MotifRanking” is a fork of ID:94 PromoterCAD.
• This example graph shows that ID:94 was forked by 6 Apps and voted for by 1 user.
• It shows that ID:137 was forked 0 times and voted for by 5 users, for a score of 5.
• In this fashion, each App can be compared in turn for total activity and usefulness.
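The per-App tallies in this example can be reproduced by counting Fork and Vote edges. The sketch below is illustrative only: IDs 94 and 137 and their totals come from the example, while the other App identifiers and all user names are placeholders, and the actual score formula used by LinkData is not given here.

```python
# Fork edges as (new_app, old_app) and vote edges as (user, app).
# Only app IDs 94 and 137 come from the example; every other identifier
# below is a placeholder invented for illustration.
forks = [(137, 94)] + [(f"other_fork_{i}", 94) for i in range(1, 6)]
votes = [("user_a", 94)] + [(f"user_{i}", 137) for i in range(1, 6)]


def activity(app_id, fork_edges, vote_edges):
    """Count how often an App was forked and how many users voted for it."""
    forked_by = sum(1 for _new, old in fork_edges if old == app_id)
    voted_by = sum(1 for _user, app in vote_edges if app == app_id)
    return {"forks": forked_by, "votes": voted_by}


for app_id in (94, 137):
    print(app_id, activity(app_id, forks, votes))
```

Counting incoming Fork edges and incoming Vote edges separately keeps the two signals distinct: ID:94 is heavily reused, while ID:137 is rated more often.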
LinkData Application app1s137i showing usability ranking and user voting buttons on top right. http://app.linkdata.org/app/app1s137i
GenoCon2 Contest Activity:
» There are over 40 submissions from several countries, including the USA, Egypt and Japan.
» Users cooperated to create original designs that were modified and possibly improved by other users.
» Team collaboration was aided by the open nature of the design platform; 13 promoter designs are being considered for final construction in transgenic plants.
The semantic dependency-graph-based system with evaluation by experiment will foster a rapid biological knowledge cycle where programmers, researchers, and amateurs can all contribute.
» A scientific competition was successfully organized on the LinkData platform that records dependency graphs among datasets and applications.
» It was found that participants in the competition generated many dependency graphs by forking pre-existing applications or reusing schema of pre-existing datasets.
» These creative activities cannot be observed explicitly unless they are recorded, for example as dependency graphs among datasets and applications on the platform.
» Hence, we suggest that a worldwide system be established to record and harvest such dependency graphs from distributed data platforms and application-development platforms around the world, so that intellectual and creative activities that use open datasets for application development are recorded properly.
Dr. Takaho Endo for creating a biological visualization tool on LinkDataApp.
Ms. Yuko Yoshida for development of the converter and valuable discussion.
Dr. Shuji Kawaguchi for giving advice on the score calculation.
Dr. Koro Nishikata for testing LinkData functions.
Dr. Masahiro Mochizuki for testing and adding the MotifRanking tool.
Mr. Chanaka Perera, Mr. Uditha Punchihewa, Mr. Gayan Hewathanthri, Mr. Hiroaki Osada, Mr. Kazuro Fukuhara and Mr. Kiyoshi Mizumoto (Axiohelix Co., Ltd.) for web application and LinkData development.
The committee of Linked Open Data Challenge Japan for continuing interest and encouragement.
This work was supported by the National Bioscience Database Center (NBDC) of the Japan Science and Technology Agency (JST).