Applications of RDFS-Plus
Look at projects setting up infrastructure for particular web communities
RDFS-Plus is used in the models that describe data in these communities
It is not part of everyday use in these communities
Projects originally based on RDF because of the inherently distributed nature of their requirements
As the projects evolved, the need arose to describe the relationships between resources formally
Hence RDFS, and then RDFS-Plus
SKOS
SKOS (Simple Knowledge Organization System) was developed by the Institute for Learning & Research Technology (Univ. of Bristol)
Provides a way to represent semi-formal knowledge organization systems (KOSs) in a distributed and linkable way
Knowledge organization systems: thesauri, taxonomies, folksonomies
Taxonomy
A taxonomy, or taxonomic scheme, is a particular classification ("the taxonomy of ..."), arranged in a hierarchical structure
Anthropologists observe that taxonomies are generally embedded in local cultural and social systems, and serve various social functions
Folksonomy
Folksonomy: a portmanteau of folk and taxonomy
A system of classification derived from the practice and method of collaboratively creating and managing tags to annotate and categorize content
Folksonomies became popular on the Web around 2004
Part of social software applications such as social bookmarking and photograph annotation
Tagging is one of the defining characteristics of Web 2.0 services
Allows users to collectively classify and find information
Some websites include tag clouds as a way to visualize tags in a folksonomy
An empirical analysis has shown that consensus around stable distributions and shared vocabularies does emerge
Even in the absence of a central controlled vocabulary
Thesaurus
First modern thesaurus: Roget's Thesaurus (1852), entries listed conceptually rather than alphabetically
A thesaurus doesn’t
define words
give a complete list of all the synonyms for a particular word
Entries are also for drawing distinctions between similar words and helping choose exactly the right word
Information retrieval thesauri are formally organized, making relationships between concepts explicit
Terms are sometimes placed in context to help the user draw distinctions
Terms generally arranged hierarchically by themes, topics or facets
Typically focus on one discipline, subject or field of study.
Follow international standards
The key difference between SKOS and thesaurus standards is SKOS's basis in the Semantic Web
Designed to allow modelers to create modular knowledge organizations that can be reused and referenced across the web
Not meant to replace thesaurus standards but to augment them by bringing in the distributed nature of the Semantic Web
A design goal is to allow any thesaurus standard to be mapped straightforwardly to SKOS
SKOS provides a low-cost migration path for porting existing organization systems to the Semantic Web
Also provides a lightweight, intuitive conceptual modeling language for developing and sharing new KOSs
SKOS can also be seen as a bridging technology between
the rigorous logical formalism of ontology languages such as OWL and
the chaotic, informal and weakly-structured world of Web-based collaboration tools—social tagging applications
SKOS Layers
SKOS Core: The most mature, maps directly to the thesaurus standards
SKOS Mapping: Provides a vocabulary to express matching (exact or fuzzy) of concepts from one concept scheme to another
SKOS Extensions: Provide ways to declare relationships between concepts with more specific semantics than the simple "broader-narrower"—e.g., class-instance or partitive relationships
SKOS Mapping and SKOS Extensions are in standby mode until SKOS Core is completed as a W3C Recommendation
Example
Fig. 1: A fragment of the UK Archival System rendered in SKOS
7 concepts related by SKOS-Core properties
Data properties shown in boxes
Each property is defined in relation to other properties Allows useful
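A minimal Turtle sketch of how a fragment of this kind might be written with SKOS Core properties. The concept names, labels, and the example.org namespace are invented for illustration; they are not read from Fig. 1.

```turtle
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix :     <http://example.org/thesaurus#> .   # hypothetical namespace

:Scheme a skos:ConceptScheme .

# A concept with its labels (data properties) and links to other concepts
:EconomicCooperation a skos:Concept ;
    skos:inScheme  :Scheme ;
    skos:prefLabel "Economic cooperation"@en ;
    skos:altLabel  "Economic co-operation"@en ;
    skos:broader   :EconomicPolicy ;
    skos:narrower  :EconomicIntegration ;
    skos:related   :Interdependence .

# skos:broader and skos:narrower are inverses, so the reverse links
# below could be inferred rather than asserted
:EconomicPolicy a skos:Concept ;
    skos:prefLabel "Economic policy"@en ;
    skos:narrower  :EconomicCooperation .

:EconomicIntegration a skos:Concept ;
    skos:prefLabel "Economic integration"@en ;
    skos:broader   :EconomicCooperation .

:Interdependence a skos:Concept ;
    skos:prefLabel "Interdependence"@en .
```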
Background and Further Developments
AGROVOC is being converted
from a term-based knowledge organization system with traditional thesaurus relationships
to a concept-based, OWL-based system, the AGROVOC Concept Server (CS)
Allows the representation of more semantics
Exports the ontology from OWL to RDF, XML, TBX, SKOS, and SQL formats
An import functionality is envisaged
TBX: TermBase eXchange, a LISA standard republished as ISO 30042
Allows for the interchange of terminology data including detailed lexical information
Based on TM standards
A translation-memory (TM) system stores already translated words, phrases and paragraphs to aid human translators
The Localization Industry Standards Association (LISA)
International forum for organizations doing business globally
Helps governments, NGOs, and multinational corporations implement best practice and language technology standards
Provides info about managing multiple language content efficiently to communicate effectively across cultures
Closed in 2011; the European Telecommunications Standards Institute started an Industry Specification Group (ISG) for localization
The (AGROVOC) Concept Server Workbench (ACWB)
http://naist.cpe.ku.ac.th/agrovoc/ [Dead]
A web-based environment for managing the AGROVOC CS
Accessible to everyone, facilitates collaborative editing
Serves as a pool of agricultural concepts
Starting point for developing specific domain ontologies, where multilingualism and localized representation of info are important
Updated as VocBench, a web-based, multilingual, editing and workflow tool that manages thesauri, authority lists and glossaries using SKOS-XL (see below)
http://vocbench.uniroma2.it/
Validation
People have different background knowledge and their own ways to construct ontologies
Every action (add/edit/delete of a concept/term/relationship) must be approved by 2 types of users: ‘validators’ and ‘publishers’
Consistency Check
The system automatically checks the data with an OWL reasoner
Returns inconsistencies, which are manually fixed
SKOS-XL
SKOS eXtension for Labels
See W3C, SKOS Simple Knowledge Organization System eXtension for Labels (SKOS-XL) Namespace Document - HTML Variant, Recommendation, 2009
http://www.w3.org/TR/skos-reference/skos-xl.html
We want to attach metadata to individual terms (not concepts)
But the basic unit of what SKOS manages isn’t a term (what taxonomy management software always managed before) but a concept
Makes internationalized vocabularies much easier to manage
E.g., I can have a single concept with
a German preferred label of "Spirituosen",
a British English preferred label of "spirits",
an American English preferred label of "liquor", and
an American alternative label of "booze"
They all refer to the same concept.
SKOS's extensibility means that you can attach all the metadata you want to a particular concept
But not to a term defined as a label for that concept—labels are strings (“lexical entities”)
SKOS is built on RDF
In RDF triples, strings (literals) can’t be subjects
How can we assign metadata about the labels themselves—e.g.,
the name of the person who added a particular label, or
the date it was last updated?
SKOS-XL defines variations on the SKOS skos:prefLabel and skos:altLabel properties
skosxl:prefLabel and skosxl:altLabel
These extension properties have as range the skosxl:Label class
Members of this class have a skosxl:literalForm property to identify a string that serves as a label for the concept
A label resource can also have all the additional properties you want
Next slide: some Turtle syntax for a SKOS-XL representation of the concept described above
Also has :lastEdited and :myCustomProperty properties adding metadata to some of the labels
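The slide itself is not reproduced in this transcript, so the following is only a sketch of what such a representation could look like. The example.org namespace, the resource names (:concept1, :label1, ...), and the property values are placeholders; :lastEdited and :myCustomProperty are the custom properties named above.

```turtle
@prefix skos:   <http://www.w3.org/2004/02/skos/core#> .
@prefix skosxl: <http://www.w3.org/2008/05/skos-xl#> .
@prefix xsd:    <http://www.w3.org/2001/XMLSchema#> .
@prefix :       <http://example.org/vocab#> .   # hypothetical namespace

# The concept points at label resources instead of plain literals
:concept1 a skos:Concept ;
    skosxl:prefLabel :label1, :label2, :label3 ;
    skosxl:altLabel  :label4 .

:label1 a skosxl:Label ;
    skosxl:literalForm "Spirituosen"@de .

:label2 a skosxl:Label ;
    skosxl:literalForm "spirits"@en-GB ;
    :lastEdited "2011-03-14"^^xsd:date .      # placeholder date

:label3 a skosxl:Label ;
    skosxl:literalForm "liquor"@en-US .

:label4 a skosxl:Label ;
    skosxl:literalForm "booze"@en-US ;
    :myCustomProperty "some value" .          # placeholder value
```

Because each label is now a resource with its own URI, it can be the subject of further triples, which a plain literal cannot.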
It’s useful to consider the members of a group as instances of a class
foaf:membershipClass links a group to a class—e.g.,
:British_Monarchy foaf:membershipClass :Monarch .
FOAF specifies that, e.g., any individual of type :Monarch should appear as a member of group :British_Monarchy and
any member of group :British_Monarchy should have type :Monarch
So the following should be inferred from the above
:Anne a :Monarch .
:George_I a :Monarch .
:George_II a :Monarch .
:George_III a :Monarch .
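The membership triples that these inferences start from are not shown in this transcript. A sketch of what they might look like, assuming the group lists its members with foaf:member and using a hypothetical example.org namespace:

```turtle
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix :     <http://example.org/royals#> .   # hypothetical namespace

# The group is an individual; its members are listed explicitly
:British_Monarchy a foaf:Group ;
    foaf:membershipClass :Monarch ;
    foaf:member :Anne, :George_I, :George_II, :George_III .

# From foaf:member plus foaf:membershipClass, each listed member
# is expected to be typed as :Monarch
```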
The distinction between individual :British_Monarchy and class :Monarch is subtle
RDFS class :Monarch relates to schematic things about monarchs: property domains, subclasses, etc.
:British_Monarchy relates to the institution of the monarchy itself, referring to things like books about it, its webpages, …
In our examples, we’ve kept the world of classes separate from the world of instances
The only relationship has been rdf:type
Expressing a relationship where we view something
sometimes as an instance (e.g., instance :British_Monarchy of foaf:Group) and
sometimes as a class (e.g., class :Monarch of all instances that are foaf:member of this group)
is an example of meta-modeling
Discussing meta-modeling requires OWL constructs we’ll come to later
Then formalize the relationship between foaf:Group and foaf:membershipClass
And show how the above triples are inferred
Things People Make and Do
People create things: books, webpages, works of art, companies, organizations, …
Two FOAF properties relate people to their creations: foaf:made and foaf:maker
foaf:made rdfs:domain foaf:Agent .
foaf:made rdfs:range owl:Thing .
foaf:maker rdfs:domain owl:Thing .
foaf:maker rdfs:range foaf:Agent .
foaf:made owl:inverseOf foaf:maker .
Property foaf:publications relates a foaf:Person to any foaf:Document that person has published
But FOAF doesn’t specify that a person foaf:made their foaf:publications
Still, in the spirit of the AAA principle, we can assert
foaf:publications rdfs:subPropertyOf foaf:made .
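A small sketch of what this assertion buys us, combined with the inverse declaration. The individuals :alice and :paper1 and the example.org namespace are hypothetical.

```turtle
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix :     <http://example.org/people#> .   # hypothetical namespace

# Schema
foaf:made         owl:inverseOf      foaf:maker .
foaf:publications rdfs:subPropertyOf foaf:made .   # our own assertion, not part of FOAF

# Asserted data
:alice foaf:publications :paper1 .

# Inferred by an RDFS-Plus reasoner:
#   :alice  foaf:made  :paper1 .    (via rdfs:subPropertyOf)
#   :paper1 foaf:maker :alice .     (via owl:inverseOf)
```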
Identity in FOAF
If someone else wants to say something about me, how will he refer to me?
RDF uses URIs to uniquely denote things it describes
This is a simple, elegant, and standard solution to this problem
But it’s inadequate for FOAF
It isn’t common on the Web for people to have their own personal URIs for describing themselves
To lower the barriers to adopting FOAF, we need a way for people to refer to one another that uses some part of the Internet that’s ubiquitous and familiar
The clearest answer is the e-mail address
It isn’t a problem if someone has 2 or more e-mail addresses or if one e-mail address is valid for only a limited period
All FOAF requires is that another person doesn’t share the address (simultaneously or later)
Express the role foaf:mbox plays in identifying individuals with
foaf:mbox a owl:InverseFunctionalProperty .
Similarly, the FOAF chat ID properties, foaf:homepage, and foaf:weblog are also inverse functional properties
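A sketch of how an inverse functional property identifies individuals across sources. The two site namespaces, the person resources, and the mailbox are all made up for illustration.

```turtle
@prefix foaf:  <http://xmlns.com/foaf/0.1/> .
@prefix owl:   <http://www.w3.org/2002/07/owl#> .
@prefix siteA: <http://example.org/siteA#> .   # hypothetical source 1
@prefix siteB: <http://example.org/siteB#> .   # hypothetical source 2

foaf:mbox a owl:InverseFunctionalProperty .

# Two independently published descriptions that share a mailbox
siteA:person42 a foaf:Person ;
    foaf:name "Alice Example" ;
    foaf:mbox <mailto:alice@example.org> .

siteB:alice a foaf:Person ;
    foaf:weblog <http://example.org/alice/blog> ;
    foaf:mbox <mailto:alice@example.org> .

# Inferred: the two resources denote the same individual
#   siteA:person42 owl:sameAs siteB:alice .
```

Neither source needs to know the other's URI for the person; the shared mailbox is enough to merge the two descriptions.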
But publishing someone’s e-mail address violates privacy
FOAF also offers an obfuscated version of foaf:mbox, called foaf:mbox_sha1sum
The result of applying the SHA-1 hash function to the mailbox URI (the e-mail address with its mailto: prefix)
But FOAF doesn’t offer a standard way to obfuscate the other identifying properties
Knows
FOAF provides a single, high-level property for linking one person to another, and hence the basis for a social networking system: foaf:knows
The only triples defined for foaf:knows declare its domain and range to be foaf:Person
foaf:knows is designed to be vague
The relation could be derived from other info
E.g., coauthors are generally assumed to know each other
We usually assume that, if A knows B, then B knows A
But symmetry was intentionally left out for foaf:knows
SKOS and FOAF
Both exploit the distributed nature of RDF to allow extension to a network of info to be distributed across the web
Both rely on the inferencing structure of RDFS-Plus for completeness of their info structure
Both use owl:InverseFunctionalProperty to determine identity of key elements
But FOAF (unlike SKOS) has a somewhat evolutionary approach to info extension
Many concepts (e.g., Name) are covered by a wide range of terms
It can be extended as new features are needed—cf. foaf:weblog
SKOS, in contrast, has a much more orderly approach to extension
It has 3 parts
SKOS Core (described here), imported by the other 2
SKOS Mapping includes vocabulary for mapping vocabularies from different sources
SKOS Extensions is for particular vertical applications of SKOS
SKOS Core is an interlingua for thesauri
Designed by a small committee to consolidate the fundamentals of other thesaurus systems into a single Semantic Web model
Consider how they will be extended
FOAF takes the AAA slogan very seriously
The actual preferred parts of the representation will be determined largely by use
SKOS has a stable core designed by an informed committee who performed a detailed commonality/variability analysis of extant vocabulary systems
Its architecture has been published and is a roadmap for development
The technical structure of RDF supports both modes
FOAF’s free extension style and SKOS’s orderly layering are accomplished with the same graph overlay mechanism
The difference is in how the overlay is organized and governed
The SKOS and FOAF efforts are like standards efforts:
they’re maintained by committees who publish policy decisions
But they’re unlike standards efforts in that neither is intended as a complete work providing prescriptive advice to someone designing
a vocabulary control system (like SKOS) or
a social networking system (like FOAF)
Their role is to provide an exchange mechanism on the Web for sharing this kind of info
This is the power of a model on the Semantic Web:
It doesn’t prescribe how to represent things
Rather, it provides a means of transfer from one representation to another
Dublin Core
Diane Hillmann, Using Dublin Core, 2005-11-07
http://dublincore.org/documents/usageguide/
Metadata
A metadata record consists of a set of attributes, or elements, needed to describe a resource
E.g., a metadata system common in libraries—the library catalog— contains a set of metadata records with elements that describe a book or other library item:
author, title, date of creation or publication, subject coverage, and the call number specifying location on the shelf
Linkage between a metadata record and the resource it describes may take 1 of 2 forms:
1. elements may be contained in a record separate from the item (cf. a library's catalog record) or
2. the metadata may be embedded in the resource itself
E.g., the Cataloging In Publication (CIP) data printed on the verso of a book's title page, or the TEI header in an electronic text
Many metadata standards, including the DC standard, don’t prescribe either type of linkage
Introduction to Dublin Core
The DC metadata standard is a simple yet effective element set for describing a wide range of networked resources
Two levels:
Simple DC comprises 15 elements
Qualified DC includes 3 additional elements (Audience, Provenance and RightsHolder) and a group of element refinements (or qualifiers) that refine the semantics of the elements in ways useful in resource discovery
The semantics of DC has been established by an international, cross-disciplinary group of professionals from librarianship, computer science, text encoding, the museum community, and other related fields of scholarship and practice
Another way to look at DC is as a "small language for making a particular class of statements about resources"
This language has 2 classes of terms—elements (nouns) and qualifiers (adjectives)—that can be arranged into a simple pattern of statements
The resources themselves (think URIref) are the implied subjects in this language
In the diverse world of the Internet, DC can be seen as a "metadata pidgin for digital tourists":
easily grasped, but not necessarily up to the task of expressing complex relationships or concepts
Each element is optional and may be repeated
Most elements have a limited set of qualifiers or refinements, attributes that may be used to further refine (not extend) its meaning
Three Dublin Core Principles
1. The One-to-One Principle
DC metadata describes 1 manifestation or version of a resource
Manifestations don’t stand in for one another
E.g., the relationship between the metadata for the original Mona Lisa and that for a reproduction is part of the metadata description
Helps the user determine whether he must go to the Louvre or his need can be met by a reproduction
2. The Dumb-down Principle
A client can ignore any qualifier and use the value as if it were unqualified
May result in some loss of specificity, but the remaining element value must continue to be generally correct and useful for discovery
Qualification only refines, and doesn’t extend, the semantic scope of a property
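In RDF terms, dumbing down can be read as following rdfs:subPropertyOf from a refinement to the element it refines. A sketch using dcterms:created, which DCMI Metadata Terms declares as a sub-property of dc:date; the :report resource and the example.org namespace are hypothetical.

```turtle
@prefix dc:      <http://purl.org/dc/elements/1.1/> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .
@prefix :        <http://example.org/docs#> .   # hypothetical namespace

# The refinement narrows the meaning of the element
dcterms:created rdfs:subPropertyOf dc:date .

# Qualified statement
:report dcterms:created "2005-11-07" .

# Dumbed-down statement a simple client can still use:
#   :report dc:date "2005-11-07" .
# Less specific (we no longer know it is the creation date),
# but still correct and useful for discovery
```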
3. Appropriate values
An implementer can’t predict that the interpreter of the metadata will always be a machine
The requirement of usefulness for discovery should be kept in mind
DC was originally developed for describing document-like objects
But DC metadata can be applied to other resources
Its suitability for particular non-document resources depends on how closely their metadata resembles typical document metadata and what purpose the metadata is intended to serve
Dublin Core Goals
Simplicity of creation and maintenance
The DC element set has been kept as small and simple as possible
Lets a non-specialist create simple descriptive records for info resources easily and inexpensively
Yet provides for effective retrieval of those resources in the networked environment
Commonly understood semantics
Discovery of info across the Internet is hindered by differences in terminology and descriptive practices from one field of knowledge to the next
DC can help the "digital tourist" (a non-specialist searcher) find his way by supporting a common set of elements whose semantics are universally understood and supported
E.g., scientists locating articles by a particular author and art scholars interested in works by a particular artist can agree on the importance of a "creator" element
Such convergence on a common, if slightly more generic, element set increases the visibility of all resources, both within a given discipline and beyond
International scope
The DC Element Set was originally developed in English
But versions are being created in many other languages
The DCMI (DC Metadata Initiative) Localization and Internationalization Special Interest Group is coordinating efforts to link these versions in a distributed registry
The development of the standard considers the multilingual and multicultural nature of the electronic information universe
Extensibility
DC developers recognize the importance of providing a mechanism for extending the DC element set for additional resource discovery needs
Expect that other communities of metadata experts will create and administer additional metadata sets, specialized to their communities
Metadata elements from these sets could be used in conjunction with DC metadata for interoperability
The DCMI Usage Board is working on a model for accomplishing this, called "application profiles":
Schemas that consist of data elements drawn from 1 or more namespaces, combined by implementers, and optimized for a particular local application
Allows different communities to use the DC elements for core descriptive information
DC Syntax Issues
Syntax choices depend on a number of variables
One-size-fits-all prescriptions rarely apply
DC concepts and semantics are designed to be syntax independent
Equally applicable in a variety of contexts, as long as
the metadata is in a form suitable for interpretation both by search engines and by human beings
(X)HTML can be used to express either simple or qualified DC
But there are limitations inherent in representing refinements in HTML
Use the meta and link elements
But typically we use RDF
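A minimal sketch of a simple DC record in RDF (Turtle syntax), describing the usage guide cited earlier. Only three of the 15 elements are used, with values taken from that citation.

```turtle
@prefix dc: <http://purl.org/dc/elements/1.1/> .

# The described resource is the subject; each DC element is a property
<http://dublincore.org/documents/usageguide/>
    dc:title   "Using Dublin Core" ;
    dc:creator "Diane Hillmann" ;
    dc:date    "2005-11-07" .
```

Every element is optional and repeatable, so a record like this can omit elements or state, e.g., several dc:creator values.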
Metadata Storage and Maintenance Issues
Some implementations using DC embed their metadata within the resource itself
Most often with documents encoded using HTML
But also sometimes possible with other kinds of documents
Simple tools make provision of DC metadata within HTML encoded pages fairly easy
Alternatively, metadata can be stored in any kind of database
Such records provide a link to the described resource rather than being embedded within it
Element Content and Controlled Vocabularies
Each DC element is optional and repeatable
No defined order of elements
The ordering of multiple occurrences of the same element (e.g., creator) may have a significance intended by the provider
But ordering isn’t guaranteed to be preserved in every user environment
Controlled Vocabularies (Vocabulary Encoding Schemes)
Content data for some elements may be selected from a controlled vocabulary (or vocabulary encoding scheme)
A limited set of consistently used and carefully defined terms
Can dramatically improve search results
Without basic terminology control, inconsistent or incorrect metadata can profoundly degrade the quality of search results
One cost of a controlled vocabulary is the need for an administrative body to review, update and disseminate the vocabulary
E.g., the US Library of Congress Subject Headings (LCSH) and the US National Library of Medicine Medical Subject Headings (MeSH) are formal vocabularies
But both require significant support organizations
Another cost is having to train searchers and creators of metadata so that they know, e.g., when using MeSH, to enter "myocardial infarction" instead of "heart attack"
More sophisticated implementations can make such tasks easier
Encoding Schemes
Using controlled vocabularies can be done most effectively using encoding schemes
Without an encoding scheme specifically designated, a subject carefully selected from a particular controlled vocabulary can’t be distinguished from a simple keyword
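A sketch of the difference in RDF, assuming the DCMI abstract-model pattern of a value node tied to its scheme with dcam:memberOf. dcterms:MESH is the DCMI term for the MeSH vocabulary encoding scheme; the :article resource and the example.org namespace are hypothetical.

```turtle
@prefix dc:      <http://purl.org/dc/elements/1.1/> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix dcam:    <http://purl.org/dc/dcam/> .
@prefix rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix :        <http://example.org/docs#> .   # hypothetical namespace

:article
    # A bare keyword: nothing says which vocabulary, if any, it comes from
    dc:subject "heart attack" ;

    # A subject explicitly designated as a MeSH heading
    dcterms:subject [
        dcam:memberOf dcterms:MESH ;
        rdf:value     "Myocardial Infarction"
    ] .
```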
Agent Roles in DC
MARC Relator terms are properties describing the various roles people and organizations play in developing and using a resource
E.g., "Illustrator" is an agent which provided illustrations for the resource
Roles are expressed as properties (i.e., elements or element refinements)
Most are refinements of dc:contributor
Library of Congress helped evaluate all 150 MARC Relator Terms
Asked whether they represented "an entity responsible for making contributions to the content of the resource"
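A sketch of a relator term used as a refinement of dc:contributor. The marcrel: namespace URI and the term name ill (Illustrator) are assumptions about how the Library of Congress publishes the relators; the :book resource and the name are made up.

```turtle
@prefix dc:      <http://purl.org/dc/elements/1.1/> .
@prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .
@prefix marcrel: <http://id.loc.gov/vocabulary/relators/> .   # assumed namespace
@prefix :        <http://example.org/books#> .                # hypothetical namespace

# The role is a property that refines dc:contributor
marcrel:ill rdfs:subPropertyOf dc:contributor .   # "Illustrator"

:book marcrel:ill "Jane Example" .

# A client that does not know the relator can fall back to:
#   :book dc:contributor "Jane Example" .
```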
15 elements defined in Dublin Core Metadata Element Set (DCMES) Version 1.1, “simple Dublin Core”
Namespace http://purl.org/dc/elements/1.1/ (prefix dc:)
contributor
coverage
creator
date
description
format
identifier
language
publisher
relation
rights
source
subject
title
type
Larger set of “terms” defined in the more comprehensive document "DCMI Metadata Terms"
Terms refine elements (they’re “element refinements”, i.e., sub-properties)