
Taiwan J For Sci 26(4): 417-24, 2011

Research note

Linked Open Data of Ecology (LODE): A New Approach for Ecological Data Sharing

Guan-Shuo Mai,1) Yu-Hwang Wang,2) Yue-Joe Hsia,1) Sheng-Shan Lu,2) Chau-Chin Lin 2,3)

【Summary】

The purpose of this paper is to report and discuss the application of a linked data approach to existing related databases on forest fires, plant specimens, insect collections, forest dynamics plot censuses, and Taiwanese species checklists. We adopted the linked data approach to connect intrinsically related data held in distributed databases. We developed a 4-step workflow to integrate and publish human- and machine-readable ecological data as linked open data on the web. Results from our work can be found at the website http://ecowlim.tfri.gov.tw. We conclude that the linked data approach is a new way to improve and advance ecological data sharing.

Key words: LTER, raw data, metadata, linked data cloud, network.

Mai GS, Wang YH, Hsia YJ, Lu SS, Lin CC. 2011. Linked Open Data of Ecology (LODE): a new approach for ecological data sharing. Taiwan J For Sci 26(4):417-24.

1) Department of Natural Resources and Environment Studies, National Dong Hwa Univ., 1 Daxue Rd., Sec. 2, Zhixue Village, Shoufeng Township, Hualien 97401, Taiwan.

2) Forest Protection Division, Taiwan Forestry Research Institute, 53 Nanhai Rd., Taipei 10066, Taiwan.

3) Corresponding author, e-mail: chin@tfri.gov.tw

Received March 2011, Accepted September 2011.


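Research note

Linked Open Ecology: A New Method for Data Sharing

麥舘碩1) 王豫煌2) 夏禹九1) 陸聲山2) 林朝欽2,3)

Abstract

This paper describes and discusses how linked data technology benefits ecological data sharing. We applied linked data technology to existing relational databases, including forest fire records, plant and insect specimens, the Nanshi Forest Dynamics Plot, and the Taiwan species checklist, linking them within a single data model and establishing explicit links between the data. Through four steps of integration and publishing, the data are presented on the Internet in human- and machine-readable formats, making ecological data easier to share. The results of this study are available on a website at http://ecowlim.tfri.gov.tw. We conclude that linked data technology is a new avenue for facilitating ecological data sharing.

Key words: long-term ecology, raw data, metadata, linked data cloud, network.

麥舘碩, 王豫煌, 夏禹九, 陸聲山, 林朝欽. 2011. Linked open ecology: a new method for data sharing. Taiwan J For Sci 26(4):417-24.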

Ecology is a discipline that emphasizes an integrative, collaborative approach. Over the past century the field has rapidly matured from small-scale, short-term observations and experiments conducted by individuals to include large-scale, long-term, multidisciplinary projects that integrate diverse datasets using sophisticated analytical approaches (Reichman et al. 2011). As ecological research has become increasingly multidisciplinary, researchers have begun to use data-intensive, multifaceted approaches. The need to share data is therefore manifest, since no individual scientist, or even a small group of scientists, can collect all the data needed to address major ecological research questions (Porter 2010). Sharing the data that support publications advances the scientific ideals of replication and of building on previous work, and enabling synthesis is an obvious benefit (Parr and Cummings 2005). Although the need for scientists of all disciplines to share data has been highlighted (Anonymous 2009), why is data sharing not yet common practice? One known reason is that logistical barriers to data sharing exist (Parr and Cummings 2005). Fortunately, recent national and multinational investments in networking and continued gains in information technology capabilities have given rise to a complex cyberinfrastructure that is rapidly increasing our ability to produce, manage, and use ecological data (Arzberger et al. 2004). For example, the linked-data method (Berners-Lee 2006) is a style of publishing and interlinking structured data on the web that offers a new approach to disseminating scientific data for sharing and reuse (Bizer et al. 2009).

In this paper, we illustrate how the linked-data method can support data sharing beyond what current metadata allow, using databases on forest fires, plant specimens, insect collections, forest dynamics plot censuses, and Taiwanese species checklists. We call the result the Linked Open Data of Ecology (LODE).

Traditionally, data published on the web were made available as raw dumps in formats such as comma-separated values (CSV) or hypertext markup language (HTML) tables. These formats sacrifice much of the structure and semantics of datasets (Bizer et al. 2009). In ecology, there is a trend of shifting from the theoretical (models) and computational (simulations) paradigms toward data exploration (Gray 2007). This new paradigm requires that raw data be put on the Internet under a unified access standard, and that knowledge be provided objectively, with “meaningful” links to related resources such as literature and raw data. Linked data (Berners-Lee 2006), a new concept that promotes access to diverse data sources on the Internet (collectively called the Web of Data), provides a data-exploration paradigm that is a compelling approach to disseminating, sharing, and reusing data in ecology.

Linked data refers to a set of best practices for publishing and interlinking structured data on the web in a machine-readable way. It lowers the barriers to using data from different sources by creating meaningful links (relations) and by following four linked-data principles (Berners-Lee 2006, Heath and Bizer 2011) that make data easily distinguishable and accessible. These principles are: 1. use uniform resource identifiers (URIs) as names for things; 2. use HTTP URIs so that people can look up those names; 3. when someone looks up a URI, provide useful information using the standards (e.g., RDF and SPARQL); and 4. include links to other URIs so that people can discover more things. This method addresses a weakness of current ecological data and metadata specifications, which are heterogeneous and not amenable to automated interpretation by computers (Reichman et al. 2011).

The method uses the resource description framework (RDF) as its data-format standard. RDF provides a generic, graph-based data model that encodes data as typed statements called triples, each consisting of a subject, a predicate, and an object (Bizer et al. 2009). Figure 1 shows an example of an RDF link that specifies the relationship between the species concept of a maple (Acer amplum) and the taxonomic class Magnoliopsida to which it belongs. It states that a resource (representing Acer amplum) identified by the URI <http://ecowlim.tfri.gov.tw/lode/resource/taif/Species/Acer_amplum> (angle brackets are conventionally used to quote URIs) is linked to another resource identified by the URI <http://dbpedia.org/resource/Magnoliopsida> (representing the class Magnoliopsida) through the predicate <http://ecowlim.tfri.gov.tw/lode/resource/eco/class>. This predicate means that any resource in the subject position (i.e., domain) of a triple using it is a member of the resource in the object position (i.e., range), and that the object must be an instance of the taxonomic concept “class” (in this example, Magnoliopsida).
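To make the triple structure concrete, the sketch below writes the Fig. 1 link in N-Triples, the W3C line-based RDF syntax in which each statement is a subject, a predicate, and an object followed by a period. It uses only the Python standard library rather than an RDF toolkit; the names `Triple`, `to_ntriples`, and `objects` are our own illustrations and not part of LODE, while the three URIs are the ones quoted above.

```python
from typing import List, NamedTuple


class Triple(NamedTuple):
    """One RDF statement whose subject, predicate, and object are all URIs."""
    subject: str
    predicate: str
    obj: str


def to_ntriples(t: Triple) -> str:
    """Serialize a URI-only triple as one N-Triples line (URIs in angle brackets)."""
    return f"<{t.subject}> <{t.predicate}> <{t.obj}> ."


def objects(graph: List[Triple], subject: str, predicate: str) -> List[str]:
    """Return every object reachable from `subject` through `predicate`."""
    return [t.obj for t in graph if t.subject == subject and t.predicate == predicate]


# The RDF link of Fig. 1, built from the URIs quoted in the text.
ACER = "http://ecowlim.tfri.gov.tw/lode/resource/taif/Species/Acer_amplum"
ECO_CLASS = "http://ecowlim.tfri.gov.tw/lode/resource/eco/class"
MAGNOLIOPSIDA = "http://dbpedia.org/resource/Magnoliopsida"

graph = [Triple(ACER, ECO_CLASS, MAGNOLIOPSIDA)]

for triple in graph:
    print(to_ntriples(triple))

# Following the link: which class does Acer amplum belong to?
print(objects(graph, ACER, ECO_CLASS))
```

Because the object is itself a URI in another dataset (DBpedia), a client can look it up in turn, which is what principle 4 asks for.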

Fig. 1. Example of a resource description framework (RDF) link showing the model encoding data in the form of a subject, predicate, object triple.

We adopted the linked-data approach to connect 4 databases originally distributed across the fire ecology lab of the Taiwan Forestry Research Institute (TFRI) (Firedb), the TFRI herbarium (Taif), the TFRI insect collection (Flyhorse), and the Catalogue of Life in Taiwan (TaiBNET, http://taibnet.sinica.edu.tw) (Table 1). In addition, we used the metadata document from the TFRI Research Data Catalogue, which uses Ecological Metadata Language (EML) as its standard, describing the Forest Dynamics Plot census of Nanshi Survey (FDP-NS) by Providence Univ. (Taichung, Taiwan). Figure 2 summarizes the 4-stage workflow for integrating and publishing scientific data as linked open data on the web. All of these datasets and metadata generally describe events (e.g., projects, forest fires, specimen collections, and measuring events): where events occurred (e.g., countries and other classes of locations), when they occurred, who was involved in them, what was produced as they progressed, and how concepts were processed (e.g., the taxonomic concepts of specimens categorized into different classes).
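The paper does not list the mapping rules used in the workflow, so the following is only a sketch of how a single relational record might become linked data, under assumptions chosen for illustration: the row's primary key forms the subject URI under the LODE namespace seen in Fig. 1, every other column becomes a predicate under the `eco/` namespace, and columns that name other resources (here, the species) become URIs rather than plain text. The `Specimen` path, the column names, the record, and the function `row_to_ntriples` are all hypothetical.

```python
from typing import Dict, List

BASE = "http://ecowlim.tfri.gov.tw/lode/resource/"  # namespace of the Fig. 1 URIs
ECO = BASE + "eco/"                                 # predicate namespace of Fig. 1


def literal(value: str) -> str:
    """Quote a plain value as an N-Triples string literal."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def row_to_ntriples(dataset: str, table: str, key: str, row: Dict[str, str],
                    links: Dict[str, str]) -> List[str]:
    """Convert one relational row into N-Triples lines (hypothetical mapping).

    `links` maps a column name to the URI prefix of the resources it refers to;
    those values become URIs (so they can be followed), everything else a literal.
    """
    subject = f"<{BASE}{dataset}/{table}/{row[key]}>"
    lines = []
    for column, value in row.items():
        if column == key:
            continue  # the key is already encoded in the subject URI
        if column in links:
            obj = f"<{links[column]}{value.replace(' ', '_')}>"
        else:
            obj = literal(value)
        lines.append(f"{subject} <{ECO}{column}> {obj} .")
    return lines


# A made-up herbarium record; the species column links to the Fig. 1 species URI.
record = {"id": "S0001", "species": "Acer amplum", "collected_on": "2010-05-01"}
for line in row_to_ntriples("taif", "Specimen", "id", record,
                            links={"species": BASE + "taif/Species/"}):
    print(line)
```

Emitting a URI for the species, rather than the string "Acer amplum", is what lets this specimen join the species resource of Fig. 1 and, through it, the other datasets.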

Results from our work can be found at the website (http://ecowlim.tfri.gov.tw). The website provides users with 4 functions: browsing the published data by following links, searching for data by text query, searching for data by drawing a convex hull on a map, and finding relations between terms on the linked-data cloud. Figure 3 shows a webpage containing transformed, human-readable triples that describe a resource within the datasets we published. In this example, the user has found a plant species, Acer amplum, which is archived in the TFRI herbarium. Figure 3 also illustrates the 4 linked open data principles mentioned above. In the upper-left corner, a URI names the subject (Acer amplum) and can be looked up. The central part shows the related information provided when someone looks up this URI. Furthermore, this resource is linked to resources in other datasets, letting users discover more things by following links. Users who want documents containing the machine-readable raw data of a resource as RDF triples can click on the RDF icon in the upper-right corner.
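Principles 2 and 3 mean that one HTTP URI can serve both the human-readable page and the machine-readable RDF behind the RDF icon. The paper does not say how the LODE site chooses between them; a common linked-data practice is HTTP content negotiation on the `Accept` header. The sketch below imitates that server-side decision with an in-memory graph; `dereference` and its crude substring check of the header are our own simplifications, not the site's code.

```python
import html
from typing import List, Tuple

Triple = Tuple[str, str, str]

ACER = "http://ecowlim.tfri.gov.tw/lode/resource/taif/Species/Acer_amplum"
GRAPH: List[Triple] = [
    (ACER, "http://ecowlim.tfri.gov.tw/lode/resource/eco/class",
     "http://dbpedia.org/resource/Magnoliopsida"),
]


def dereference(graph: List[Triple], uri: str, accept: str) -> Tuple[str, str]:
    """Return (content type, body) describing `uri`, chosen from the Accept header.

    Only triples with `uri` as subject are described. A real server would parse
    Accept with quality values; here we simply look for the N-Triples media type.
    """
    about = [(p, o) for s, p, o in graph if s == uri]
    if "application/n-triples" in accept:
        body = "\n".join(f"<{uri}> <{p}> <{o}> ." for p, o in about)
        return "application/n-triples", body
    # Human-readable view: every object is rendered as a hyperlink (principle 4).
    rows = "".join(
        f'<tr><td>{html.escape(p)}</td>'
        f'<td><a href="{html.escape(o)}">{html.escape(o)}</a></td></tr>'
        for p, o in about
    )
    page = f"<h1>{html.escape(uri)}</h1><table>{rows}</table>"
    return "text/html", page


print(dereference(GRAPH, ACER, "application/n-triples"))
print(dereference(GRAPH, ACER, "text/html")[0])
```

Serving both views from the same URI keeps a single name for the thing, so links made by people browsing and by programs harvesting RDF point at the same resource.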

Table 1. Descriptions of 5 relational databases used in linked data

| Database name | Subject | No. of records | Source |
|---|---|---|---|
| Firedb | Forest fire records of Taiwan | 2,623 | TFRI fire lab |
| Taif | Herbarium of TFRI | 105,130 | TFRI |
| Flyhorse | Insect collection of TFRI | 114,002 | TFRI |
| FDP-NS | Forest Dynamics Plot census of Nanshi National Forest | 663,224 | Department of Ecology, Providence Univ. |
| TaiBNET | Checklist of Taiwanese Species | 84,557 | Biodiversity Research Center, Academia Sinica, Taiwan |

In addition to browsing, we developed simple web applications to link data published on the Web of Data. First, users can either run a full-text search by species name or directly input the URI of a resource to find synonyms, related resources, and distributions of species. Figure 4 shows the results of searching for a lady beetle, Coccinella septempunctata. The first part lists 15 names that the general public or taxonomists would use. These names come from 6 resources about the species found on the Web of Data, and the occurrence of this species is recorded as North America and Taiwan based on the statements interlinked on the Linked Open Data cloud. Second, to display different facets of the integrated data, we provide another search method based on the geospatial query functions of SPARQL. Figure 5 shows the results of such a search. The user simply encloses an area in a convex hull made up of 4 points on the map, and the system returns the different kinds of event data whose coordinates fall within the boundary, shown as different-colored circles. The user can then click on a circle to browse the detailed information.
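The site performs this map search with geospatial query functions of SPARQL, which we do not reproduce here. The geometry behind it can still be sketched in plain Python: build the convex hull of the points the user clicks (Andrew's monotone chain, so the click order does not matter) and keep the events whose coordinates fall inside it. Treating longitude and latitude as flat x, y coordinates is an assumption that is reasonable only for small areas, and the event records are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]        # (longitude, latitude), treated as planar x, y
Event = Tuple[str, float, float]   # (label, longitude, latitude)


def cross(o: Point, a: Point, b: Point) -> float:
    """z-component of (a - o) x (b - o); positive means a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: List[Point]) -> List[Point]:
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]  # drop each chain's last point (it repeats)


def inside(hull: List[Point], p: Point) -> bool:
    """True if p is inside or on a counter-clockwise convex hull of >= 3 vertices."""
    n = len(hull)
    return all(cross(hull[i], hull[(i + 1) % n], p) >= 0 for i in range(n))


def events_in_area(clicks: List[Point], events: List[Event]) -> List[Event]:
    """Keep the events whose coordinates fall within the hull of the clicks."""
    hull = convex_hull(clicks)
    return [e for e in events if inside(hull, (e[1], e[2]))]


# Four clicks given out of order, and two made-up events.
clicks = [(120.0, 23.0), (121.0, 24.0), (121.0, 23.0), (120.0, 24.0)]
events = [("fire A", 120.5, 23.5), ("specimen B", 122.0, 23.5)]
print(events_in_area(clicks, events))
```

Computing the hull first means a user who clicks the 4 corners in a crossing order still gets a proper, non-self-intersecting search boundary.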

Fig. 2. Workflow of the Linked Open Data of Ecology (LODE) showing the step-by-step process of data preparation, data repository, and data publishing for use on the web.

Fig. 3. Webpage displaying a user following links to browse the species Acer amplum. The upper-left corner of the page shows the URI of the species resource. The central part of the page gives detailed information on this species, including a digitized specimen. The upper-right corner holds the icon linking to the resource description framework (RDF) triples.

Fig. 4. Webpage displaying how a user can query a species by its scientific name. The result returns 15 names that have been used by taxonomists. Six resources for this species were found, and the occurrence of this species is recorded as North America and Taiwan.

The last function of our website is relation finding using RelFinder (http://code.google.com/p/relfinder/). Users choose 2 or more terms, and the application draws a graph showing the relationships between them. Figure 6 shows a user choosing an insect and a plant to find their relationships; the result is an overlap of their distributions based on the published LODE datasets.

In terms of data sharing, it is critical to provide sufficient context for the data (metadata) that collaborators can comprehend and apply effectively. However, differing metadata specifications make data exchange difficult. Our test of the linked-data approach showed that we need not worry about questions such as “which metadata specification is more suitable for our data?” or “is it too simple to be useful, or too complex for ecologists to use?” We therefore concluded that the linked-data approach is a new way to improve and advance ecological data sharing and integration.
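RelFinder itself explores relationships by querying SPARQL endpoints and drawing them interactively. The core idea of the relation finding described above, finding a chain of triples that connects two chosen resources, can be sketched as a breadth-first search over the triple graph. The `ex:` names and the three triples below are a toy example echoing Figs. 4 and 6, not LODE data, and `find_relation` is our own function, not part of RelFinder.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]


def find_relation(triples: List[Triple], start: str, goal: str) -> Optional[List[str]]:
    """Breadth-first search for a shortest chain of triples linking start to goal.

    Links are followed in both directions, since two resources are related
    whether a shared triple points toward or away from them. Returns an
    alternating list [resource, predicate, resource, ...] or None.
    """
    neighbours: Dict[str, List[Tuple[str, str]]] = {}
    for s, p, o in triples:
        neighbours.setdefault(s, []).append((p, o))
        neighbours.setdefault(o, []).append((p, s))

    previous: Dict[str, Tuple[str, str]] = {}  # node -> (predicate, node reached from)
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = [goal]
            while path[-1] != start:  # walk back to the start, then reverse
                pred, prev = previous[path[-1]]
                path += [pred, prev]
            return list(reversed(path))
        for pred, nxt in neighbours.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                previous[nxt] = (pred, node)
                queue.append(nxt)
    return None


# Toy triples: an insect and a plant whose recorded distributions overlap.
TRIPLES = [
    ("ex:Coccinella_septempunctata", "ex:occursIn", "ex:Taiwan"),
    ("ex:Coccinella_septempunctata", "ex:occursIn", "ex:North_America"),
    ("ex:Acer_amplum", "ex:occursIn", "ex:Taiwan"),
]
print(find_relation(TRIPLES, "ex:Coccinella_septempunctata", "ex:Acer_amplum"))
```

The shared `ex:Taiwan` node in the returned chain is the kind of distribution overlap that Fig. 6 visualizes.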

Fig. 5. Webpage showing how the user encircled a boundary on the map to search for data within the area (left), and results of the search in different-colored circles (right).

Fig. 6. Webpage illustrating the results of a user choosing the name of an insect and the name of a plant to find any relationships between these 2 terms. The lines indicate the distribution (places found) of these 2 species. The overlay of the distributions shows the connections.

However, the linked-data method has its own drawbacks. The lack of concept descriptions for datasets, of dataset-level interlinking meta-information, of semantics to detect inconsistencies, and of schema-level integration hinders progress in utilizing the Web of Data (Jain et al. 2009). Resource identity and data provenance are also critical issues that communities need to improve (McCusker and McGuinness 2010). In addition, data on the scale of billions of triples are hard to process as a whole on machines that are not very powerful. The problems to be solved in linked-data communities are very similar to those in ecological data integration.
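The identity issue shows up in the name search of Fig. 4, where names for one species were gathered from 6 resources: resources in different datasets can be merged only where links state that their URIs denote the same thing. Assuming such links are given as URI pairs (in practice often `owl:sameAs`; the paper does not say which link type LODE used), the sketch below groups URIs into identity sets with union-find and collects every name attached to a set. The `a:`, `b:`, `c:` URIs are hypothetical, and `identity_sets` and `names_for` are our own functions.

```python
from typing import Dict, Iterable, List, Set, Tuple


def identity_sets(same_as: Iterable[Tuple[str, str]]) -> List[Set[str]]:
    """Group URIs that are linked, directly or transitively, as the same thing."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in same_as:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    groups: Dict[str, Set[str]] = {}
    for uri in list(parent):
        groups.setdefault(find(uri), set()).add(uri)
    return list(groups.values())


def names_for(uri: str, same_as: List[Tuple[str, str]],
              labels: Dict[str, List[str]]) -> Set[str]:
    """Collect every name attached to any URI in the identity set of `uri`."""
    group = next((g for g in identity_sets(same_as) if uri in g), {uri})
    return {name for u in group for name in labels.get(u, [])}


# Hypothetical identity links between three datasets' URIs for one lady beetle.
LINKS = [("a:Coccinella_septempunctata", "b:sp_1234"),
         ("b:sp_1234", "c:seven_spot")]
LABELS = {
    "a:Coccinella_septempunctata": ["Coccinella septempunctata"],
    "b:sp_1234": ["seven-spotted lady beetle"],
    "c:seven_spot": ["seven-spot ladybird"],
}
print(sorted(names_for("a:Coccinella_septempunctata", LINKS, LABELS)))
```

Because the grouping is transitive, a single wrong identity link would merge two unrelated sets and mix their names, which illustrates why resource identity needs careful handling.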

ACKNOWLEDGEMENTS

The authors thank the forest fire lab, entomology lab, and herbarium of TFRI, and the Department of Ecology, Providence Univ. for allowing the use of their datasets in this test. We also acknowledge the Checklist of Taiwanese Species, maintained by the Biodiversity Research Center, Academia Sinica, Taiwan, for providing downloads of the name-list file, and the National Science Council (NSC99-2621-B-054-001) for the research funding supporting this study.

LITERATURE CITED

Anonymous. 2009. Data’s shameful neglect. Nature 461:145.

Arzberger P, Schroeder P, Beaulieu A, Bowker G, Case K, Laaksonen L et al. 2004. An international framework to promote access to data. Science 303:1777-8.

Berners-Lee T. 2006. Linked data - design issues. Available at http://www.w3.org/DesignIssues/LinkedData.html. Accessed 17 March 2011.

Bizer C, Heath T, Berners-Lee T. 2009. Linked data - the story so far. Int J Semant Web Inf Syst 5(3):1-22.

Gray J. 2007. Jim Gray on eScience: a transformed scientific method. In: Hey T, Tansley S, Tolle K, editors. The fourth paradigm: data-intensive scientific discovery. Redmond, WA: Microsoft Research. p 1-16.

Heath T, Bizer C. 2011. Linked Data: evolving the web into a global data space. Princeton, WI: MC Publishers. 136 p.

Jain P, Hitzler P, Yeh PZ, Verma K, Sheth AP. 2009. Linked data is merely more data. In: Linked data meets artificial intelligence. London, UK: AAAI Press. p 82-6.

McCusker JP, McGuinness DL. 2010. Towards identity in linked data. In: Proceedings of OWL Experiences and Directions Seventh Annual Workshop. Karlsruhe, Germany. Available at http://tw.rpi.edu/wiki.tw/images/8/8e/Owled2010-sameas.pdf. Accessed 17 March 2011.

Parr CS, Cummings MP. 2005. Data sharing in ecology and evolution. TREE 20(7):362-3.

Porter JH. 2010. A brief history of data sharing in the US long-term ecological research network. Bull Ecol Soc Am 91(1):14-20.

Reichman OJ, Jones MB, Schildhauer MP. 2011. Challenges and opportunities of open data in ecology. Science 331:703-5.