LOD Application Exemplar - A case study: LODAC Museum
Hideaki Takeda, Fumi Kato
National Institute of Informatics (takeda@nii.ac.jp)
2012 International Asian Summer School in Linked Data (IASLOD 2012), August 13-17, 2012, KAIST, Daejeon, Korea
Aim of this talk
• How to plan, design, and implement LOD?
• Learn from the case
LODAC Project
• Open Social Semantic Web Platform for Academic Resources
  – Providing platforms for Linked Open Data
  – Practicing data accumulation and publishing
• Areas of interest
  – Museum information
  – Geographical information, especially geographical names
  – Local information
  – Taxonomic information on species
  – …
Museum data as LOD
• The current state of museum information in Japan (nearly 6,000 museums)
  – Distributed
    • Self-maintained
    • Isolated
  – Opaque
    • Self-designed
    • Messy
• Aggregating and associating museum information
  – LODAC Museum
LODAC Museum – Main work
• Gathering of data
  – Thesaurus, museum collections, etc.
• Standardization of data
  – Representing data from different sources in a single common form
• Integration of data
  – Identifying data
  – Associating records that describe the same entity
• Consuming of data
LODAC Museum Architecture
[Figure: LODAC Museum architecture with four stages: gathering of data, standardization of data, integration of data, and consuming of data]
Gathering data
• No museums publish data as LOD!
• We use data published as Web pages
  – Scrape and translate the data
  – The license is not clear
    • This is a serious problem
    • In principle, we need permission from every site
    • We obtained permission from some data publishers, but not all
Dataset type | No. | Data source
Art work (lodac:Work) | ca. 80,000 | Catalog of the collections of 3 National Art Museums (25,180), National Museum of Western Art (4,373), Tokushima Pref. Art Museum (18,482) … over 100 museums; Database of Nationally Designated National Treasures & Important Cultural Properties (915); The Japanese Art Thesaurus (266)
Specimen (lodac:Speciment) | ca. 1,690,000 | Science Net (National Science Museum), 100+ museum collections
Person (foaf:Person) | ca. 8,800 | The Japanese Art Thesaurus
Facilities (incl. museums) | ca. 200,000 | The Japanese Art Thesaurus; Cultural Heritage Online; GIS data (National and Regional Planning Bureau)
Extracting collection data from museum websites
[Figure: a collection page on a museum website, from which data is extracted as property/value pairs]
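The extraction step can be sketched with Python's standard-library `html.parser`. This is a minimal sketch, assuming (hypothetically) that a museum page lays out each record as table rows of the form `<tr><th>property</th><td>value</td></tr>`; real sites differ, each needs its own rules, and the talk does not describe the scraper LODAC actually used. The class name, function name, and sample page are illustrative.

```python
from html.parser import HTMLParser


class PropertyValueParser(HTMLParser):
    """Collect property/value pairs from <tr><th>..</th><td>..</td></tr> rows.

    Hypothetical page layout; each museum site needs its own extraction rules.
    """

    def __init__(self):
        super().__init__()
        self.pairs = {}     # property label -> value text
        self._cell = None   # "th" or "td" while inside a cell, else None
        self._key = None    # last property label seen, waiting for its value
        self._buf = []      # text fragments of the current cell

    def handle_starttag(self, tag, attrs):
        if tag in ("th", "td"):
            self._cell = tag
            self._buf = []

    def handle_endtag(self, tag):
        if tag != self._cell:
            return  # ignore closing tags of elements nested inside a cell
        text = "".join(self._buf).strip()
        if tag == "th":
            self._key = text
        elif self._key is not None:
            self.pairs[self._key] = text
            self._key = None
        self._cell = None

    def handle_data(self, data):
        if self._cell is not None:
            self._buf.append(data)


def extract_pairs(html):
    """Return {property: value} for one collection page."""
    parser = PropertyValueParser()
    parser.feed(html)
    parser.close()
    return parser.pairs


# Illustrative page fragment (not taken from any real museum site).
SAMPLE_PAGE = """
<table>
  <tr><th>Title</th><td>Fine Wind, Clear Morning</td></tr>
  <tr><th>Artist</th><td><a href="/artist/1">Katsushika Hokusai</a></td></tr>
</table>
"""
```

`extract_pairs(SAMPLE_PAGE)` yields `{"Title": "Fine Wind, Clear Morning", "Artist": "Katsushika Hokusai"}`; text inside nested tags such as the link is kept, and the site-specific labels are left for the standardization step to map onto common metadata.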
Standardization of data
Re-organizing raw data into common metadata
[Figure: raw data from each source re-organized into common metadata, using properties such as dc:title, crm:P45_consists_of, skos:prefLabel, and lodac:era]
Current organizing policies
• Use existing metadata
• Define our own metadata (e.g., lodac:era)
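The two policies can be illustrated with a per-source mapping table: each source's raw field labels are mapped onto shared properties, and labels with no mapping are set aside for review. This is a hypothetical sketch; the source names, raw labels, and mappings below are invented for illustration and are not LODAC's actual tables.

```python
# Hypothetical per-source mappings from raw field labels to common metadata
# properties (prefixed names as used in this talk). Not LODAC's actual tables.
SOURCE_MAPPINGS = {
    "museum_a": {"Title": "dc:title", "Material": "crm:P45_consists_of", "Era": "lodac:era"},
    "museum_b": {"Name of work": "dc:title", "Medium": "dc:medium", "Period": "lodac:era"},
}


def standardize(source, record_id, raw):
    """Turn one raw record into (subject, property, value) triples.

    Labels without a mapping are returned separately so they can be reviewed,
    e.g. to decide whether an existing vocabulary covers them or a new
    lodac: property has to be defined.
    """
    mapping = SOURCE_MAPPINGS[source]
    subject = f"{source}/{record_id}"  # hypothetical local identifier
    triples, unmapped = [], {}
    for label, value in raw.items():
        prop = mapping.get(label)
        if prop is None:
            unmapped[label] = value
        else:
            triples.append((subject, prop, value))
    return triples, unmapped
```

A record from `museum_a` labelled "Title" and one from `museum_b` labelled "Name of work" both come out as `dc:title`, which is what lets later integration compare them.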
Namespaces
Prefix | Metadata name
crm | CIDOC-CRM
dc11 | Dublin Core 1.1
dc | DCMI Terms
skos | Simple Knowledge Organization System
rdfs | Resource Description Framework Schema
foaf | Friend of a Friend
rda2 | Resource Description and Access
lodac | LODAC Project
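Prefixed names such as `dc:title` only become full IRIs once each prefix is bound to a namespace. The sketch below shows that expansion and the matching Turtle `@prefix` lines. The dc11, dc, skos, rdfs and foaf IRIs are the standard ones; the crm IRI is one commonly used CIDOC-CRM namespace (deployments vary); the lodac IRI is a hypothetical placeholder, and rda2 is omitted because the talk does not give its namespace.

```python
# Namespace IRIs for the prefixes above. dc11, dc, skos, rdfs and foaf are the
# standard IRIs; crm is one common CIDOC-CRM IRI (deployments vary); lodac is a
# HYPOTHETICAL placeholder, not the project's real namespace; rda2 is omitted.
NAMESPACES = {
    "crm": "http://www.cidoc-crm.org/cidoc-crm/",
    "dc11": "http://purl.org/dc/elements/1.1/",
    "dc": "http://purl.org/dc/terms/",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "lodac": "http://example.org/lodac#",  # placeholder
}


def expand(prefixed_name):
    """Expand a prefixed name such as 'dc:title' into a full IRI."""
    prefix, sep, local = prefixed_name.partition(":")
    if not sep or prefix not in NAMESPACES:
        raise ValueError(f"unknown prefix in {prefixed_name!r}")
    return NAMESPACES[prefix] + local


def turtle_prefixes(names=NAMESPACES):
    """Render @prefix declarations for the top of a Turtle document."""
    return "\n".join(f"@prefix {p}: <{iri}> ." for p, iri in names.items())
```

Keeping `dc11` and `dc` as separate prefixes matters: `dc11:creator` and `dc:creator` expand to different IRIs even though the schema uses them side by side.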
Metadata schema for works
Item (lodac:Work) | Property
Genre | lodac:genre
Type of cultural assets | lodac:culturalAssets
Creator | dc:creator / dc11:creator
Nationality | crm:P7_took_place_at
Title | dc:title / skos:prefLabel
Title pronunciation (yomi) | dc:title @ja-hrkt / skos:altLabel
Title in English | dc:title @en / skos:altLabel
Inscription | crm:P62I_is_depicted_by
Seal | crm:P65_shows_visual_item
No. of parts | crm:P57_has_number_of_parts
Collection | dc:isPartOf
Created year | dc:created
Estimated starting year | lodac:estimatedStartYear
Material | dc:medium / crm:P45_consists_of
Integrating Data
• How to integrate data from different sources
  – Sharing of responsibility
• Each source is responsible for its own data
  – Defining IDs for its data and managing the data by those IDs
• LODAC is responsible only for integration
  – Assigning its own IDs and associating other sources' IDs with them
[Figure: an ID-resource holding a creator's information, connected by dc:references to Ref-resources holding each source's reference to that creator]
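The division of responsibility can be sketched as a small registry on the LODAC side: it mints its own IDs and records which source IDs refer to each one, publishing them as `dc:references` links, while the sources keep managing their own IDs. This is a hypothetical sketch; the class name, the placeholder base IRI, and the direction of the link (from LODAC ID to source ID) are assumptions.

```python
import itertools


class IdRegistry:
    """LODAC-side ID registry (hypothetical sketch).

    Sources keep and manage their own IDs; this registry only mints
    integration IDs and associates source IDs with them.
    """

    def __init__(self, base="http://example.org/lodac/id/"):  # placeholder base IRI
        self._base = base
        self._counter = itertools.count(1)
        self._owner = {}   # source ID -> LODAC ID
        self._refs = {}    # LODAC ID -> source IDs, in insertion order

    def mint(self):
        """Create a new LODAC ID with no associated source IDs yet."""
        lodac_id = f"{self._base}{next(self._counter)}"
        self._refs[lodac_id] = []
        return lodac_id

    def link(self, lodac_id, source_id):
        """Associate a source ID with a LODAC ID; each source ID maps to one LODAC ID."""
        if lodac_id not in self._refs:
            raise KeyError(f"unknown LODAC ID {lodac_id!r}")
        owner = self._owner.get(source_id)
        if owner is not None and owner != lodac_id:
            raise ValueError(f"{source_id!r} is already linked to {owner!r}")
        if owner is None:  # re-linking the same pair is a no-op
            self._owner[source_id] = lodac_id
            self._refs[lodac_id].append(source_id)

    def lookup(self, source_id):
        """Return the LODAC ID for a source ID, or None if it is not linked."""
        return self._owner.get(source_id)

    def triples(self):
        """(LODAC ID, 'dc:references', source ID) links for publishing."""
        return [(lid, "dc:references", sid)
                for lid, sids in self._refs.items() for sid in sids]
```

Rejecting a second owner for the same source ID keeps the mapping unambiguous; merging two LODAC IDs that turn out to describe the same entity would be a separate, deliberate operation.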
Integrating Data
[Figure: data from Source A and Source B holds the raw data for Work, Museum, and Creator entities; the integrated data holds the minimum data needed to identify each entity and links to the source data via dc:references; Work is linked to Creator by dc:creator and to Museum by crm:P55_has_current_location]
Integration of Person Data
• Matching of creators
  – Base: list of artists from the Japanese Art Thesaurus
  – Target: creators of museum collections + DBpedia
  – Method: string matching of names
  – Results: links from artist nodes to work nodes are added
[Figure: LODAC data (basic information for creators, with links to works) linked to DBpedia]
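The name-matching step can be sketched as exact matching after normalization. The talk says only "string match of names"; the normalization here (Unicode NFKC, then removal of all whitespace, so that "葛飾 北斎" and "葛飾　北斎" with a full-width space compare equal) is an assumption, as are the function names and the IDs in the example.

```python
import unicodedata


def normalize_name(name):
    """NFKC folds full-width/half-width variants; then all whitespace is dropped."""
    folded = unicodedata.normalize("NFKC", name)
    return "".join(folded.split())


def match_creators(base_artists, target_creators):
    """Link base artists to works whose creator string matches after normalization.

    base_artists:    {artist_id: name} from the base list (hypothetical IDs)
    target_creators: {work_id: creator name string} from museum collections
    Returns (artist_id, work_id) pairs, one per match.
    """
    index = {}
    for artist_id, name in base_artists.items():
        index.setdefault(normalize_name(name), []).append(artist_id)
    links = []
    for work_id, creator in target_creators.items():
        for artist_id in index.get(normalize_name(creator), []):
            links.append((artist_id, work_id))
    return links
```

Pure string matching links every artist who shares a normalized name with the creator string, so homonyms produce ambiguous links that would need review; it also misses works whose creator is written differently (for example, under another art name).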
Integrating Data
Integrated item | Source A (amount of data) | Source B (amount of data) | Integrated data
Facilities | Japanese Art Thesaurus (648) | Cultural Heritage Online (915) | 77
Title of important cultural properties | Japanese Art Thesaurus, art works (3,800) | DB for National Treasures, art works (10,115) | 74
Creator information and work title | Japanese Art Thesaurus, creators (1,332) | All art works, work title strings (61,861) | 15,020
Creator name | Japanese Art Thesaurus, creators (1,332) | All art work titles, using creator names (61,861) | 615