An Introduction to Linked Open Data for Museums. Presented by David Henry and Jarred Moore, MW2014

MW2014 Workshop - Intro to Linked Open Data

Nov 17, 2014


David Henry

 
Transcript
1. An Introduction to Linked Open Data for Museums. Presented by David Henry and Jarred Moore, MW2014

2. An Introduction to Linked Open Data for Museums

3. Limitations of Keyword Searching
Polysemy: one word with multiple meanings, e.g. man, crane, bank.
Synonymy: multiple words with the same meaning, e.g. buy OR purchase; create OR make; eliminate OR remove OR abolish.
Signal-to-noise ratio: e.g. try searching for the term "Mississippi".

4. What is Linked Open Data?
On the web, open license; machine-readable data; non-proprietary format; RDF format; linked RDF.

5. Copyright and Licensing
If your content files are still under copyright and your institution is the copyright owner, encourage your institution to license the content as openly as possible: CC0, CC-BY, CC-BY-SA, CC-BY-NC.

6. What is RDF?
"Resource Description Framework (RDF) is a standard model for data interchange on the Web. RDF has features that facilitate data merging even if the underlying schemas differ, and it specifically supports the evolution of schemas over time without requiring all the data consumers to be changed." (W3C)
RDF is used for making statements about resources (in particular web resources) in the form of subject-predicate-object expressions. (Wikipedia)

7. What are Triples?
Triples are statements of fact (or assertions) composed of a subject, a predicate, and an object. For example: David Henry (subject) lives in (predicate) St. Louis (object).

8. What Kinds of Questions Are Answered by RDF?
Fact-based, interpretive, theoretical, subjective, analytical.

9. Fact-Based Questions
Fact-based questions ask who, what, when, and where (not so much why). Examples: Who directed Citizen Kane? What's a daguerreotype? Where did Van Gogh paint Starry Night?

10. Fact-Based Question: Are there any daguerreotypes of the Mississippian mounds in St. Louis, Missouri?
Title: Group of people standing on a partially destroyed Big Mound.
Description: Group of people standing on a partially destroyed Big Mound.
Place: St. Louis, Missouri
Dates: 1869
Type(s): photo, Daguerreotype
Maker/Creator: Thomas M. Easterly
Subjects: Mississippian Culture, mounds
Identifier: PHO:17665
Permalink: http://collections.mohistory.org/resource/9952

11. Triples to Complex Graphs
[Graph diagram. Nodes: Thomas M. Easterly; 1869; Subject; Mississippian Culture; Daguerreotype. Edges: hasSubject, hasLabel, hasType.]

12. Thomas M. Easterly
Name: Thomas M. Easterly
Birth Date: October 3, 1809
Death Date: March 12, 1882
Places of Residence: Guilford, Vermont; Liberty, Missouri; St. Louis, Missouri
Bio: Thomas M. Easterly was one of the leading American daguerreotypists. During the 1860s, improvements in photographic development caused daguerreotypes to go out of fashion. Easterly refused to acknowledge these changes, believing that highly detailed daguerreotypes were far superior in beauty and permanence, and urged the public to "save your old daguerreotypes for you will never see their like again".

13. Exercise 1
Time: 10-15 minutes. Activity: Break into groups of 2-3. Write out one or more research questions. For each question, draw an entity-relationship graph that could provide an answer to the question.

14. What's Wrong with the Good Ole Web?

15. What is a Uniform Resource Identifier?
Uniform Resource Locator (URL). Purpose: to locate a web resource (document).
Uniform Resource Name (URN). Purpose: to identify any resource.
In Linked Open Data, URIs act as both URLs and URNs.

16. Principles of Linked Data
Use URIs to denote things.
Use HTTP URIs so that these things can be referred to and looked up ("dereferenced") by people and user agents.
Provide useful information about the thing when its URI is dereferenced, leveraging standards such as RDF and SPARQL.
Include links to other related things (using their URIs) when publishing data on the Web.
To make this happen, subjects and predicates MUST be defined by URIs. Objects may be URIs or literals.

17. Triples to Complex Graphs
[Graph diagram. Nodes: Resource:9952; Resource:92142; Thomas M. Easterly; 1869; ns1:Subject_91011; Mississippian Culture; ns1:type_80345; Daguerreotype. Edges: ns0:hasSubject, ns0:hasLabel, ns0:hasType.]

18. Triples to Complex Graphs
[Graph diagram. Nodes: http://collections.mohistory.org/resource/9952; ns1:Person_5678; Thomas M. Easterly; 1869; ns1:Subject_91011; Mississippian Culture; ns1:type_80345; Daguerreotype. Edges: ns0:hasSubject, ns0:hasLabel, ns0:hasType.]

19. What two words are most commonly found in a browser window?
Web links have a half-life of about ten years; in other words, 50% of links that are 10 years old are broken.

20. [Diagram: a network of linked documents.] Link rot is a serious problem on the document-based web.

21. [Diagram: Person, Person, Object, Place, and Object nodes connected by createdBy, createdBy, knows, and livesAt links.] Link rot is even more serious on the web of data.

22. Rules for Persistent URIs
Cool URIs: no date context; no ownership context; no technology context; re-use existing identifiers; link multiple representations; implement 303 redirects for real-world objects.
Not cool URIs: avoid stating ownership; avoid version numbers; avoid query strings; avoid file extensions.

23. Example URIs
http://education.data.gov.uk/ministryofeducation/id/school/123456 (states ownership)
http://education.data.gov.uk/doc/school/v01/123456 (version number)
http://www.example.com/id/alice_brown (good)
http://data.nytimes.com/88843902954064461461 (mostly good)

24. Writing RDF: RDF/XML, Turtle, N-Triples
The statement "David Henry lives in St. Louis" in each syntax:
RDF/XML: St. Louis, MO
Turtle:
@prefix ns0: .
@prefix ns1: .
ns0:David_Henry ns1:livesIn St. Louis, MO .
N-Triples: St. Louis, MO .

25. Triples to Complex Graphs
[Graph diagram. Nodes: http://collections.mohistory.org/resource/9952; Resource:92142; Thomas M. Easterly; 1869; ns1:Subject_91011; Mississippian Culture; ns1:type_80345; Daguerreotype. Edges: ns0:hasSubject, ns0:hasLabel, ns0:hasType.]

26. Graph to RDF as Turtle
@prefix resource: .
@prefix ns0: .
@prefix xsd: .
resource:9952 ns0:dateCreated "1869-01-01"^^xsd:date .
resource:9952 ns0:hasType .
resource:9952 ns0:createdBy resource:92142 .
resource:92142 ns0:hasLabel "Thomas M. Easterly" .
resource:9952 ns0:hasSubject resource:5215 .
resource:5215 ns0:hasLabel "Mississippian Culture" .

27. Exercise 2
Time: 15 minutes. Activity: Break into groups of 2-3. Using the graph defined in Exercise 1, define a set of triples from the graph (use your own URIs). Use the RDF validator at http://www.rdfabout.com/demo/validator/

28. What is Linked Open Data? (Repeats slide 4.)

29. Principles of Linked Data (Repeats the four principles from slide 16.)

30. Core Vocabularies
RDF & RDFS. Useful terms: rdf:type, rdfs:label
SKOS (Simple Knowledge Organization System). Useful terms: skos:broader, skos:narrower
OWL (Web Ontology Language). Useful terms: owl:sameAs, owl:differentFrom
Dublin Core. Useful terms: dc:creator, dc:date, dc:subject
FOAF. Useful terms: foaf:name, foaf:knows, foaf:img

31. Vocabulary Types
Controlled vocabulary: a simple list of terms, e.g. the DCMI Types list.
Thesaurus: a hierarchical list of terms, e.g. Library of Congress Subject Headings.
Ontology: a hierarchical list of terms with relationship constraints, e.g. CIDOC CRM.
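
Slides 24 and 26 show the same data as Turtle and mention N-Triples, the syntax with one complete triple per line and no prefixes. The Python sketch below (standard library only) expands the prefixed names from slide 26 into full IRIs and prints N-Triples. It rests on assumptions: the prefix IRIs did not survive in this transcript, so the `resource:` namespace is inferred from the permalink on slide 10, `ns0:` is a hypothetical placeholder namespace, and only `xsd:` uses its real IRI. The `ns0:hasType` triple is left out because its object is missing above.

```python
from typing import List, Optional, Tuple

# Prefix table. ASSUMPTIONS: the transcript lost the real prefix IRIs, so
# "resource" is inferred from the slide 10 permalink and "ns0" is a
# hypothetical placeholder; only the xsd IRI is the standard one.
PREFIXES = {
    "resource": "http://collections.mohistory.org/resource/",
    "ns0": "http://example.org/ns0#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}


def expand(qname: str) -> str:
    """Turn a prefixed name such as 'resource:9952' into '<full IRI>'."""
    prefix, local = qname.split(":", 1)
    return f"<{PREFIXES[prefix]}{local}>"


def literal(value: str, datatype: Optional[str] = None) -> str:
    """Quote a literal, optionally typed; only backslashes and quotes are escaped."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    suffix = f"^^{expand(datatype)}" if datatype else ""
    return f'"{escaped}"{suffix}'


# The slide 26 triples (minus ns0:hasType, whose object is missing).
# Objects are pre-rendered as IRIs or literals.
TRIPLES: List[Tuple[str, str, str]] = [
    ("resource:9952", "ns0:dateCreated", literal("1869-01-01", "xsd:date")),
    ("resource:9952", "ns0:createdBy", expand("resource:92142")),
    ("resource:92142", "ns0:hasLabel", literal("Thomas M. Easterly")),
    ("resource:9952", "ns0:hasSubject", expand("resource:5215")),
    ("resource:5215", "ns0:hasLabel", literal("Mississippian Culture")),
]


def to_ntriples(triples: List[Tuple[str, str, str]]) -> str:
    """N-Triples: one '<s> <p> o .' statement per line, no prefixes."""
    return "\n".join(f"{expand(s)} {expand(p)} {o} ." for s, p, o in triples)


print(to_ntriples(TRIPLES))
```

The output is more verbose than Turtle, but every line stands alone, which is why N-Triples is convenient for bulk loading and line-by-line processing.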

32. Example using CRM Core
[Diagram: a CIDOC CRM graph for Rodin's Monument to Balzac. Entities include E21 Person (Rodin Auguste; Honoré de Balzac), E67 Birth (Rodin's birth) and E69 Death (Rodin's death) with E52 Time-Spans 1840 and 1917, E12 Production (Rodin making Monument to Balzac in 1898; Bronze casting Monument to Balzac in 1925), E84 Information Carrier (The Monument to Balzac (plaster); The Monument to Balzac (S1296)), E40 Legal Body (Rudier (Vve Alexis) et Fils), E53 Place (France (nation)), E52 Time-Spans 1898 and 1925, and E55 Types (sculptors, companies, plaster, bronze). Properties include P2 has type, P4 has time-span, P7 took place at, P14 carried out by, P16B was used for, P62 depicts, P98B was born, P100B died in, P108B was produced by, P120B occurs after, and P134 continued.]

33. Implementing Linked Open Data
Link existing data: low barrier to entry; controlled lists and thesauri; not very descriptive.
Manage data to fit an ontology: high barrier to entry; ontologies; very descriptive.
RDF facilitates the evolution of schemas over time.

34. What is RDF? (Repeats slide 6.)

35. Triples to Complex Graphs (Repeats the graph from slide 25.)
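
Slides 6 and 34 say RDF facilitates merging data even when the underlying schemas differ. One reason is that an RDF graph is simply a set of triples, so merging two graphs is set union. The Python sketch below merges a catalog graph with a second, hypothetical authority graph and then answers the slide 10 question with a toy pattern matcher (not SPARQL). Short strings such as `resource:9952` and `Daguerreotype` are illustrative stand-ins for full URIs.

```python
from typing import Iterator, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]

# Illustrative catalog triples; short names stand in for full URIs.
catalog: Set[Triple] = {
    ("resource:9952", "rdf:type", "Daguerreotype"),
    ("resource:9952", "dc:subject", "resource:5215"),
    ("resource:9952", "dc:creator", "resource:92142"),
}

# Hypothetical second source that mostly supplies labels.
authorities: Set[Triple] = {
    ("resource:5215", "rdfs:label", "Mississippian Culture"),
    ("resource:92142", "rdfs:label", "Thomas M. Easterly"),
    ("resource:9952", "dc:creator", "resource:92142"),  # also in the catalog
}

# Merging RDF graphs is set union: the shared triple is stored once.
merged: Set[Triple] = catalog | authorities


def match(graph: Set[Triple], s: Optional[str] = None,
          p: Optional[str] = None, o: Optional[str] = None) -> Iterator[Triple]:
    """Yield triples matching the pattern; None acts as a wildcard."""
    for triple in graph:
        if all(want is None or want == got for want, got in zip((s, p, o), triple)):
            yield triple


def daguerreotypes_about(graph: Set[Triple], subject_label: str) -> List[str]:
    """Items typed Daguerreotype whose dc:subject carries the given label."""
    subjects = {s for s, _, _ in match(graph, p="rdfs:label", o=subject_label)}
    items = {s for s, _, _ in match(graph, p="rdf:type", o="Daguerreotype")}
    return sorted(
        item for item in items
        if any(o in subjects for _, _, o in match(graph, s=item, p="dc:subject"))
    )


print(len(merged))                                            # 5
print(daguerreotypes_about(catalog, "Mississippian Culture"))  # []
print(daguerreotypes_about(merged, "Mississippian Culture"))   # ['resource:9952']
```

Neither graph alone answers the question: the catalog lacks the subject label and the authority file lacks the item type. After the union, the answer falls out without either source changing its data.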

36. Finding Links
Linked Open Vocabularies is a good starting point. Other well-used sources include:
DBpedia, for a wide range of types (people, places, subjects, concepts)
id.loc.gov, for name authorities and subjects
viaf.org, for name authorities
geonames.org, for geographic locations
Problem: there are no universal vocabularies.

37. A Note of Caution
When re-using existing URIs, be sure to use the URI that represents the entity (thing/concept/person) and not the web resource. For example:
http://id.loc.gov/authorities/subjects/sh85126887.html
is NOT the same as:
http://id.loc.gov/authorities/subjects/sh85126887

38. A Note of Caution (Repeats the first point of slide 37.)

39. Finding Links
Matching predicates:
hasType => rdf:type, dcterms:type, crm:P2_has_type
createdBy => dc:creator, crm:P94i_was_created_by
dateCreated => dc:created, ?
Matching value vocabularies:
Daguerreotype => http://dbpedia.org/resource/Daguerreotype
Mississippian Culture => http://id.loc.gov/authorities/subjects/sh85086218
Thomas Easterly => http://viaf.org/viaf/13114715/
Problem: there are no universal vocabularies.

40. Triples to Complex Graphs
[Graph diagram. Nodes: http://collections.mohistory.org/resource/9952; Resource:92142; Thomas M. Easterly; 1869; ns1:Subject_91011; Mississippian Culture; ns1:type_80345; Daguerreotype. Edges: dc:subject, rdfs:label, rdf:type.]

41. Graph to RDF as Turtle
@prefix resource: .
@prefix ns0: .
@prefix dc: . # dc:creator; dc:created; dc:subject
@prefix rdf: . # rdf:type
@prefix owl: . # sameAs; differentFrom
@prefix xsd: . # date; gYear; integer
resource:9952 ns0:dateCreated "1869"^^xsd:gYear .
resource:9952 dc:date "1869-01-01"^^xsd:date .
resource:9952 ns0:hasType .
resource:9952 rdf:type .
resource:9952 ns0:createdBy resource:92142 .
resource:9952 dc:creator .
resource:92142 ns0:hasLabel "Thomas M. Easterly" .
resource:9952 ns0:hasSubject resource:5215 .
resource:9952 dc:subject .
#resource:5215 ns0:hasLabel "Mississippian Culture" .
resource:92142 owl:sameAs .

42. Exercise 3
Time: 15-20 minutes. Activity: Break into groups of 2-3. Using the triples you defined in Exercise 2, find existing URIs to link with your local URIs. Be prepared to explain why you chose the URIs you chose.

43. How-Tos
Embed schema.org data in a web page
Publish static RDF files
Manage local vocabularies and align them with existing vocabularies
Contribute to a collection aggregator, e.g. Europeana or DPLA
Publish existing database records as RDF
Manage RDF data in a triple (or quad) store
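
Slides 39 and 41 keep the local ns0 triples and add standard-vocabulary equivalents plus an owl:sameAs link. A minimal Python sketch of that alignment step follows. The predicate and value mappings are copied from slide 39; the Dublin Core elements, RDF, and OWL namespace IRIs are the standard ones (the transcript's prefix lines are blank, so the deck may have bound `dc:` differently); for brevity the local values are labels rather than local URIs. The `looks_like_document` check is a rough heuristic for the caution on slide 37, not a reliable test.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]

DC = "http://purl.org/dc/elements/1.1/"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OWL = "http://www.w3.org/2002/07/owl#"

# Local predicate -> standard predicate (from slide 39).
PREDICATE_MAP = {
    "hasType": RDF + "type",
    "createdBy": DC + "creator",
    "hasSubject": DC + "subject",
}

# Local value -> external entity URI (from slide 39).
VALUE_MAP = {
    "Daguerreotype": "http://dbpedia.org/resource/Daguerreotype",
    "Mississippian Culture": "http://id.loc.gov/authorities/subjects/sh85086218",
}
SAME_AS = {"resource:92142": "http://viaf.org/viaf/13114715/"}  # Thomas Easterly

DOCUMENT_SUFFIXES = (".html", ".htm", ".rdf", ".ttl", ".xml", ".json")


def looks_like_document(uri: str) -> bool:
    """Heuristic: a file extension usually marks a page about a thing, not the thing."""
    return uri.lower().rstrip("/").endswith(DOCUMENT_SUFFIXES)


def align(local: List[Triple]) -> List[Triple]:
    """Keep the local triples and add standard-vocabulary equivalents."""
    aligned = list(local)
    for s, p, o in local:
        if p in PREDICATE_MAP:
            target = VALUE_MAP.get(o, o)
            if looks_like_document(target):
                raise ValueError(f"{target} looks like a web page, not an entity URI")
            aligned.append((s, PREDICATE_MAP[p], target))
    for local_id, external in SAME_AS.items():
        aligned.append((local_id, OWL + "sameAs", external))
    return aligned


LOCAL: List[Triple] = [
    ("resource:9952", "hasType", "Daguerreotype"),
    ("resource:9952", "createdBy", "resource:92142"),
    ("resource:9952", "hasSubject", "Mississippian Culture"),
]

for triple in align(LOCAL):
    print(triple)
```

Keeping the local triples alongside the aligned ones, as slide 41 does, means existing consumers of the local vocabulary keep working while new consumers can rely on the shared terms.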

44. Embedding schema.org
Title:
Item: Daguerreotype
Dates: 1855 to 1865
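
Slide 44 shows only the rendered result of embedding schema.org data in a page (item type Daguerreotype, dates 1855 to 1865); the markup itself is not preserved here, so the slide may have used microdata or RDFa. The Python sketch below swaps in JSON-LD, another syntax schema.org supports, and generates a `<script type="application/ld+json">` block. Several choices are assumptions: the `VisualArtwork` type with its `artform` property, leaving out the title because it is blank on the slide, and writing the range as the ISO 8601 interval `1855/1865` in `dateCreated`, which schema.org documents as a single date, so some consumers may not interpret it.

```python
import json
from typing import Optional


def schema_org_jsonld(artform: str, start_year: int, end_year: int,
                      name: Optional[str] = None) -> str:
    """Return an HTML <script> element carrying schema.org data as JSON-LD."""
    data = {
        "@context": "https://schema.org",
        "@type": "VisualArtwork",  # assumption: Photograph or CreativeWork could also fit
        "artform": artform,
        # Assumption: the date range is written as an ISO 8601 interval.
        "dateCreated": f"{start_year}/{end_year}",
    }
    if name:  # the slide's Title field is blank, so the name is optional here
        data["name"] = name
    payload = json.dumps(data, indent=2)
    return f'<script type="application/ld+json">\n{payload}\n</script>'


print(schema_org_jsonld("Daguerreotype", 1855, 1865))
```

The script block can be pasted into the item's HTML page without changing what visitors see; search engines and other crawlers read the structured data from it.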

45. Publish static RDF files
RDF files can be hand-written (what fun!) or rendered using templates. Paths to RDF files can be submitted to RDF search engines such as Sindice (http://sindice.com). Caution: some content negotiation would be required. Remember: http://mydomain.org/resource/1234.rdf is NOT the same as http://mydomain.org/resource/1234

46. Manage local vocabularies and align them with existing vocabularies
Tools include: PoolParty, TemaTres, Karma.

47. Contributing to a collection aggregator, e.g. Europeana or DPLA
[Diagram: Service Hub with Datasets A, B, C; Service Hub with Datasets 1, 2, 3; Content Hub with Datasets X, Y, Z.]

48. Publish existing database records as RDF

49. Managing RDF data in a triple (or quad) store
Quad = triple + context. Most stores feature a SPARQL interface to query across all triples (quads) in a repository. Tools: Sesame (OpenRDF), Virtuoso, Mulgara.

50. Questions?
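
Slides 22 and 45 call for 303 redirects and content negotiation, so that a URI such as http://mydomain.org/resource/1234 names the thing itself while separate documents describe it. The Python sketch below shows only the decision logic, without an HTTP server: it reads the client's Accept header, honours q-values, and answers 303 See Other with a Location pointing at a document URI. The suffix-per-format scheme (`.html`, `.rdf`, `.ttl`) is a hypothetical choice, and the Accept parsing ignores wildcards other than `*/*`.

```python
from typing import Dict, List, Tuple

# Hypothetical URI scheme: one document per format, named by suffix.
REPRESENTATIONS = {
    "text/html": ".html",
    "application/rdf+xml": ".rdf",
    "text/turtle": ".ttl",
}
DEFAULT_MEDIA = "text/html"


def parse_accept(header: str) -> List[str]:
    """Media types from an Accept header, best q-value first (ties keep header order)."""
    ranked = []
    for position, part in enumerate(header.split(",")):
        fields = [f.strip() for f in part.split(";")]
        media, q = fields[0].lower(), 1.0
        for param in fields[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0
        if media and q > 0:  # q=0 means "not acceptable"
            ranked.append((-q, position, media))
    return [media for _, _, media in sorted(ranked)]


def negotiate(thing_uri: str, accept: str) -> Tuple[int, Dict[str, str]]:
    """303-redirect a URI for a real-world thing to a document describing it."""
    for media in parse_accept(accept or "*/*"):  # a missing Accept means anything
        if media == "*/*":
            media = DEFAULT_MEDIA
        if media in REPRESENTATIONS:
            location = thing_uri + REPRESENTATIONS[media]
            return 303, {"Location": location, "Vary": "Accept"}
    return 406, {}  # Not Acceptable: no format the client wants


print(negotiate("http://mydomain.org/resource/1234", "text/html;q=0.5, text/turtle"))
```

Returning 303 rather than 200 is what tells clients that the thing and the document about it are different resources, the distinction slides 37 and 45 warn about. In a real deployment this logic would live in the web server configuration or the application's routing.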