RDFa, Etc. (Resource Description Framework–in–attributes)
Post on 29-Dec-2015
W3C: RDFa 1.1 Primer
http://www.w3.org/TR/xhtml-rdfa-primer/
RDFa
RDFa allows RDF statements to be included in ordinary HTML/XHTML files using formally defined attributes
A W3C recommendation, http://www.w3.org/TR/rdfa-core
In RDFa 1.0 the vocabularies are declared with XML namespaces (xmlns:), so it targets XHTML, not HTML, document types; RDFa 1.1 adds the prefix attribute and also works with HTML5
Do not generate RDF/XML files separately
RDF/XML is complex
Requires separate creation and storage mechanisms
Add extra structured content to the (X)HTML pages
Let processors extract that content and turn it into RDF
RDFa provides attributes to carry metadata in an XML language
Note the ‘a’ (attributes) in RDFa
These attributes include:
about gives a URI specifying the resource the metadata is about
rel and rev specify a relationship and an inverse relationship, respectively, with another resource
src, href and resource specify the partner resource
property specifies a property relating the subject (the resource that the metadata is about) to the content of an element or to the partner resource
content (optional) overrides the content of the element when using the property attribute
datatype (optional) specifies the datatype of text specified for use with the property attribute
typeof (optional) specifies the RDF type(s) of the subject or the partner resource
Five "principles of interoperable metadata" met by RDFa
Publisher Independence: Each site can use its own standards
Data Reuse: Data are not duplicated—separate XML and HTML sections aren’t required for the same content.
Self Containment: The HTML and the RDF are separated
Schema Modularity: The attributes are reusable
Evolvability: Additional fields can be added and XML transforms can extract the semantics of the data from an XHTML file
Attributes map to RDF components
Subject: about, src—e.g., about="rdfa-course"
Predicate: property, rel, rev, typeof—e.g., property="dc:title"
Object: content, href, resource, datatype, or just plain content or a resource—e.g., RDFa Course as the content of an HTML element
Example
<div about="rdfa-course">
<h3 property="dc:title">RDFa Course</h3>
</div>
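To make the attribute-to-triple mapping concrete, here is a minimal Python sketch (an illustration, not the RDFa processing algorithm) that handles only the flat case in this example: about sets the subject for an element and its descendants, and an element with property yields a triple whose object is its content attribute or, failing that, its text. CURIE expansion, base-URI resolution, rel/rev, typeof and void elements are left out, and the names FlatRDFaParser and extract_flat_triples are hypothetical.

```python
from html.parser import HTMLParser


class FlatRDFaParser(HTMLParser):
    """Hypothetical, heavily simplified reader for about + property only."""

    def __init__(self):
        super().__init__()
        self.open_subjects = []   # one entry per open element: its about value or None
        self.pending = None       # (subject, predicate, content attr, depth) awaiting text
        self.text = []
        self.triples = []

    def current_subject(self):
        # The nearest enclosing about (including the current element) is the subject.
        for subject in reversed(self.open_subjects):
            if subject is not None:
                return subject
        return None

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        self.open_subjects.append(a.get("about"))
        if "property" in a and self.pending is None:
            depth = len(self.open_subjects)
            self.pending = (self.current_subject(), a["property"], a.get("content"), depth)
            self.text = []

    def handle_data(self, data):
        if self.pending is not None:
            self.text.append(data)

    def handle_endtag(self, tag):
        # Closing the element that carried property completes the triple.
        if self.pending is not None and len(self.open_subjects) == self.pending[3]:
            subject, predicate, content, _ = self.pending
            obj = content if content is not None else "".join(self.text)
            self.triples.append((subject, predicate, obj))
            self.pending = None
        if self.open_subjects:
            self.open_subjects.pop()


def extract_flat_triples(html):
    parser = FlatRDFaParser()
    parser.feed(html)
    parser.close()
    return parser.triples


example = """<div about="rdfa-course">
<h3 property="dc:title">RDFa Course</h3>
</div>"""
print(extract_flat_triples(example))
# [('rdfa-course', 'dc:title', 'RDFa Course')]
```

The subject stays the relative reference rdfa-course and the predicate stays the CURIE dc:title; a real processor resolves the first against the document's base URI and expands the second through its prefix mappings (e.g., dc to http://purl.org/dc/terms/).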
RDFa Example
<div xmlns:v="http://rdf.data-vocabulary.org/#"
typeof="v:Person">
<span rel="v:address">
<span typeof="v:Address">
<span property="v:locality">Albuquerque</span>
<span property="v:region">NM</span>
</span>
</span>
</div>
The namespace used here identifies the Data-Vocabulary.org vocabulary, a forerunner of the Schema.org vocabulary (see below)
Publishing RDFa
RDFa provides an easy way of publishing RDF data on the Web
Often the same RDF data is available in different formats, including RDFa
The client chooses which one(s) to support
Consuming RDFa
Various search engines have begun to consume RDFa
Google, Yahoo, …
They may specify which vocabularies they “understand”
Facebook’s Open Graph protocol, which feeds its “social graph”, is based on RDFa
RDFa Distiller
A W3C service to identify and list the RDF in a web page
http://www.w3.org/2012/pyRdfa/
Extract RDF from HTML + RDFa
Using a web address, local file or direct text inputs, it provides a clean view of the implied data hierarchy
Example
Select the tab Distill by Direct Text Input, and copy the following into the window
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Books by Marco Pierre White</title>
</head>
<body>
I think White's book
'<span about="urn:ISBN:0091808189"
typeof="http://purl.org/ontology/bibo/Book"
property="http://purl.org/dc/terms/title"
>Canteen Cuisine</span>'
is well worth getting since although it's quite advanced stuff, he
makes it pretty easy to follow. You might also like
<span about="urn:ISBN:1596913614"
typeof="http://purl.org/ontology/bibo/Book"
property="http://purl.org/dc/terms/description"
>White's autobiography</span>.
</body>
</html>
Choose the following selections in the dropdowns below the text window
Host Language: HTML5 + RDFa
Output Format: Turtle
Returned content: Only core triples
Expand vocabularies: No
Generate warnings for non RDFa 1.1 Lite usage: No
Click the Go button (below these dropdowns)
The output is presented in a downloaded file; open it in a text editor, e.g., Notepad++
For our example, the output is
@prefix dc: <http://purl.org/dc/terms/> .
<urn:ISBN:0091808189> a <http://purl.org/ontology/bibo/Book>;
dc:title "Canteen Cuisine" .
<urn:ISBN:1596913614> a <http://purl.org/ontology/bibo/Book>;
dc:description "White's autobiography" .
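The Distiller's Turtle output groups triples by subject, writes a for rdf:type, and separates a subject's predicate-object pairs with a semicolon. A minimal Python sketch of that serialization step, assuming the triples are already extracted and that a Literal wrapper (hypothetical) marks string objects; it does not validate prefixed local names or handle datatypes and language tags, so a real serializer (such as the one in RDFLib) is preferable in practice.

```python
from dataclasses import dataclass

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


@dataclass(frozen=True)
class Literal:
    """Hypothetical marker: this object is a plain string literal, not an IRI."""
    value: str


def quote(text):
    # Escape backslashes, double quotes and newlines for a Turtle string.
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def to_turtle(triples, prefixes):
    def term(t):
        if isinstance(t, Literal):
            return quote(t.value)
        for prefix, ns in prefixes.items():
            if t.startswith(ns):
                return f"{prefix}:{t[len(ns):]}"   # compact IRI (local name not validated)
        return f"<{t}>"

    lines = [f"@prefix {p}: <{ns}> ." for p, ns in prefixes.items()]
    by_subject = {}                                # dicts keep insertion order
    for s, p, o in triples:
        by_subject.setdefault(s, []).append((p, o))
    for s, pairs in by_subject.items():
        # rdf:type is written as "a"; a subject's pairs are joined by ";"
        parts = [("a" if p == RDF_TYPE else term(p)) + " " + term(o) for p, o in pairs]
        lines.append(term(s) + " " + ";\n    ".join(parts) + " .")
    return "\n".join(lines)


BIBO_BOOK = "http://purl.org/ontology/bibo/Book"
DC = "http://purl.org/dc/terms/"
triples = [
    ("urn:ISBN:0091808189", RDF_TYPE, BIBO_BOOK),
    ("urn:ISBN:0091808189", DC + "title", Literal("Canteen Cuisine")),
    ("urn:ISBN:1596913614", RDF_TYPE, BIBO_BOOK),
    ("urn:ISBN:1596913614", DC + "description", Literal("White's autobiography")),
]
print(to_turtle(triples, {"dc": DC}))
```

With only the dc prefix declared, the bibo:Book type stays a full IRI, which matches the structure of the Distiller output above.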
RDFa Developer
https://addons.mozilla.org/en-US/firefox/addon/rdfa-developer/?src=ss
Firefox add-on that lets us visualize all the RDFa triples in a web page
Shows a list of errors and warnings found while parsing the document
Lets us execute SPARQL queries on the RDFa content
To install, follow the link above, click the Add to Firefox button, and restart Firefox (if needed, first look under Tools > Add-ons for the restart prompt in the Developer listing)
The Developer windows occupy the bottom part of the screen
To add an icon to the lower right corner of the browser (the add-on bar), check Add-on Bar under Toolbars in the View menu at the top
Click the icon to toggle the Developer display off and on
By default, the Developer windows appear when you start up Firefox
To prevent this, in the Tools menu, select Add-ons
In the resulting display, click the Disable button for RDFa Developer
To use the Developer again, go back and click the Enable button
If the Developer icon doesn't appear in the add-on bar, choose View > Toolbars > Customize and drag the Developer icon from the palette to the add-on bar
Example
Save the following code (the previous example, with the bibo: and dc: prefixes declared on the body element) in an HTML file
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Books by Marco Pierre White</title>
</head>
<body xmlns:bibo="http://purl.org/ontology/bibo/" xmlns:dc="http://purl.org/dc/terms/">
I think White's book
'<span about="urn:ISBN:0091808189"
typeof="bibo:Book"
property="dc:title"
>Canteen Cuisine</span>'
is well worth getting since although it's quite advanced stuff, he
makes it pretty easy to follow. You might also like
<span about="urn:ISBN:1596913614"
typeof="bibo:Book"
property="dc:description"
>White's autobiography</span>.
</body>
</html>
Open the saved HTML file in Firefox
The output should show 4 triples in the Data tab (expand by clicking the triangles) and 3 warnings in the Notices tab
If the tabs do not show any triples or warnings, try to disable & re-enable the RDFa Developer add-on
To see the Notices tab (errors and warnings) at work, suppose we remove the bibo namespace declaration from the body element
Change
<body xmlns:bibo="http://purl.org/ontology/bibo/"
xmlns:dc="http://purl.org/dc/terms/">
to
<body xmlns:dc="http://purl.org/dc/terms/">
Open the saved HTML file in Firefox
The output shows the errors and warnings in the Notices tab
The errors specify that the prefix used for the bibo namespace is not defined (and the attribute with this prefix is unused)
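How RDFa Developer detects this is not described here, but the check itself can be sketched in Python: track which prefixes are declared in scope (via xmlns:* or the RDFa 1.1 prefix attribute) and report any CURIE in typeof, property, rel or rev whose prefix is missing. This is an assumption-laden sketch: it ignores the RDFa 1.1 initial context (predefined prefixes such as dc), so it may flag prefixes a real RDFa 1.1 processor would accept, and PrefixChecker is a hypothetical name.

```python
from html.parser import HTMLParser

CURIE_ATTRS = ("typeof", "property", "rel", "rev")
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class PrefixChecker(HTMLParser):
    """Report CURIEs whose prefix is not declared in scope (simplified)."""

    def __init__(self):
        super().__init__()
        self.scopes = [set()]   # stack of declared-prefix sets, one per open element
        self.errors = []

    def _declared(self, attrs):
        declared = set(self.scopes[-1])          # inherit the parent's prefixes
        for name, value in attrs:
            if name.startswith("xmlns:"):        # RDFa 1.0 style: xmlns:dc="..."
                declared.add(name[len("xmlns:"):])
            elif name == "prefix" and value:     # RDFa 1.1 style: prefix="dc: http://..."
                declared.update(p.rstrip(":") for p in value.split()[0::2])
        return declared

    def _check(self, attrs, declared):
        for name, value in attrs:
            if name not in CURIE_ATTRS or not value:
                continue
            for token in value.split():
                # Skip plain terms (no colon) and absolute IRIs such as http://...
                if ":" not in token or "://" in token:
                    continue
                prefix = token.split(":", 1)[0]
                if prefix not in declared:
                    self.errors.append(f'undefined prefix {prefix!r} in {name}="{token}"')

    def handle_starttag(self, tag, attrs):
        declared = self._declared(attrs)
        self._check(attrs, declared)
        if tag not in VOID:                      # void elements never get an end tag
            self.scopes.append(declared)

    def handle_startendtag(self, tag, attrs):   # <x ... /> opens and closes at once
        self._check(attrs, self._declared(attrs))

    def handle_endtag(self, tag):
        if tag not in VOID and len(self.scopes) > 1:
            self.scopes.pop()


def check(html):
    checker = PrefixChecker()
    checker.feed(html)
    checker.close()
    return checker.errors


broken = """<html xmlns="http://www.w3.org/1999/xhtml">
<body xmlns:dc="http://purl.org/dc/terms/">
<span about="urn:ISBN:0091808189" typeof="bibo:Book"
      property="dc:title">Canteen Cuisine</span>
</body></html>"""
for error in check(broken):
    print(error)   # undefined prefix 'bibo' in typeof="bibo:Book"
```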
I couldn't get the Query tab to submit queries
E.g., on the BBC News World website, http://www.bbc.co.uk/news/world/, one part of the HTML is as follows
To see the source HTML in Firefox, right-click in the window and, in the resulting menu, click View Page Source
<meta property="og:title" content="BBC News: World">
<meta property="og:description" content="World news from the BBC">
<meta property="og:url" content="http://www.bbc.co.uk/news/world/">
<meta property="og:type" content="website">
<meta property="og:image" content= "http://news.bbcimg.co.uk/media/images/56400000/jpg/_56400259_bbcnews.jpg">
<meta property="og:site_name" content="BBC News">
<meta property="fb:app_id" content="218019758281651">
The next slide shows part of the RDFa Developer Data tab
The page contains RDFa in several places, so the triples from the rest of its RDFa are not shown here
The og: prefix is for the Open Graph protocol
The Open Graph protocol
http://ogp.me/
Enables any web page to become a rich object in a social graph
Used on Facebook to allow any web page (by adding metadata) to have the same functionality as any other object on Facebook
Since Open Graph is an open protocol of sorts, it's not Facebook specific
Google Plus gives schema.org markup the highest weight; if it doesn’t exist, Google Plus falls back on Open Graph tags, and if those don’t exist, on page content such as the "title"
Even without a good internal search engine, Facebook already drives more traffic for some searches (social searches) than Google
No single technology provides enough info to richly represent any web page within the social graph
The Open Graph protocol builds on these existing technologies
Developer simplicity is a key goal that has informed many of the technical design decisions
See: The Open Graph Protocol Design Decisions (D. Recordon, presented at the W3C’s Linked Data CAMP at WWW 2010)
http://www.scribd.com/doc/30715288/The-Open-Graph-Protocol-Design-Decisions
Within 7 days of implementation, the following had appeared:
og:it—a simple metadata extractor to HTML
OpenGraph.in—a simple metadata extractor to HTML and JSON
Multiple RDF parsers that understand the Open Graph protocol
An Open Graph protocol to JSON converter for testing
Open source libraries for Java, Perl, PHP, and Ruby
A WordPress plugin for easy publishing
Initial version is based on RDFa
Place additional <meta> tags in the <head> of your web page
The 4 required properties
og:title—the title of your object as it’s to appear in the graph
og:type—the type of your object, e.g., "video.movie"; depending on the type you specify, other properties may also be required
og:image—an image URL to represent your object in the graph
og:url—the canonical URL of your object, used as its permanent ID in the graph
Example: the Open Graph protocol markup for The Rock on IMDB
<html prefix="og: http://ogp.me/ns#">
<head>
<title>The Rock (1996)</title>
<meta property="og:title" content="The Rock" />
<meta property="og:type" content="video.movie" />
<meta property="og:url"
content="http://www.imdb.com/title/tt0117500/" />
<meta property="og:image"
content="http://ia.media-imdb.com/images/rock.jpg" />
...
</head>
...
</html>
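A consumer (or a publisher testing its own pages) can check for the 4 required properties by collecting the property/content pairs from the meta tags. A minimal Python sketch, assuming the markup is reachable as a string; OpenGraphParser and missing_required are hypothetical names, and tools such as the Facebook Object Debugger do far more validation than this.

```python
from html.parser import HTMLParser

REQUIRED_OG = ("og:title", "og:type", "og:image", "og:url")


class OpenGraphParser(HTMLParser):
    """Collect <meta property="..." content="..."> pairs in document order."""

    def __init__(self):
        super().__init__()
        self.properties = {}   # property -> list of content values

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        a = dict(attrs)
        prop, content = a.get("property"), a.get("content")
        if prop and content is not None:
            self.properties.setdefault(prop, []).append(content)


def missing_required(html):
    parser = OpenGraphParser()
    parser.feed(html)
    parser.close()
    return [p for p in REQUIRED_OG if p not in parser.properties]


imdb = """<html prefix="og: http://ogp.me/ns#">
<head>
<title>The Rock (1996)</title>
<meta property="og:title" content="The Rock" />
<meta property="og:type" content="video.movie" />
<meta property="og:url" content="http://www.imdb.com/title/tt0117500/" />
<meta property="og:image" content="http://ia.media-imdb.com/images/rock.jpg" />
</head>
</html>"""
print(missing_required(imdb))                                      # []
print(missing_required('<meta property="og:title" content="x">'))  # ['og:type', 'og:image', 'og:url']
```

HTMLParser routes self-closing tags such as <meta ... /> through handle_starttag by default, so both the XHTML-style and the HTML5-style meta tags are picked up.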
Also 7 optional properties
Some properties can have extra metadata attached to them
E.g., the og:image property has some optional structured properties
og:image:url—identical to og:image
og:image:secure_url—an alternate url to use if the webpage requires HTTPS
og:image:type—a MIME type for this image
og:image:width—the number of pixels wide
og:image:height—the number of pixels high
The og:video property has the same structured properties
The og:audio property has only the first 3 (url, secure_url and type)
If a tag can have multiple values, put multiple versions of the same <meta> tag on your page
The 1st tag (from top to bottom) is given preference during conflicts
This is effectively an array of values
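The array behaviour can be sketched in Python, assuming the (property, content) pairs have already been read from the page's meta tags in order. Attaching each structured property (such as og:image:width) to the most recently declared root object follows the convention described on ogp.me rather than anything stated in these notes; group_og, preferred and the example.com URLs are hypothetical.

```python
def group_og(pairs):
    """Group (property, content) pairs taken from <meta> tags in page order."""
    objects = {}
    for prop, content in pairs:
        parts = prop.split(":")
        if len(parts) == 2:                       # root property, e.g. og:image
            objects.setdefault(prop, []).append({"value": content})
        elif len(parts) == 3:                     # structured property, e.g. og:image:width
            root = ":".join(parts[:2])
            if objects.get(root):                 # attach to the most recent root object
                key = "value" if parts[2] == "url" else parts[2]   # og:image:url == og:image
                objects[root][-1][key] = content
    return objects


def preferred(objects, prop):
    """When a single value is needed, the first tag (top to bottom) wins."""
    values = objects.get(prop)
    return values[0]["value"] if values else None


pairs = [
    ("og:image", "http://example.com/a.jpg"),
    ("og:image:width", "300"),
    ("og:image", "http://example.com/b.jpg"),
    ("og:image:height", "200"),
]
g = group_og(pairs)
print(g["og:image"])
# [{'value': 'http://example.com/a.jpg', 'width': '300'}, {'value': 'http://example.com/b.jpg', 'height': '200'}]
print(preferred(g, "og:image"))   # http://example.com/a.jpg
```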
When the community agrees on the schema for a type, it’s added to the list of global types
All other objects in the type system are CURIEs (see below) of the form
<head prefix="my_namespace: http://example.com/ns#">
<meta property="og:type" content="my_namespace:my_type" />
The global types are grouped into verticals, each with its own namespace
The og:type values for a namespace are prefixed with the namespace and then a period. This reduces confusion with user-defined namespace types (which have colons)
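The period/colon convention makes og:type values easy to tell apart mechanically. A small Python sketch, assuming the prefixes declared in the head are available as a dict; classify_og_type is a hypothetical helper, and "website" is treated as a global type that belongs to no vertical.

```python
def classify_og_type(value, prefixes):
    """Classify an og:type value (hypothetical helper).

    Global types use a period ("video.movie" is in the video vertical);
    user-defined types are CURIEs with a colon ("my_namespace:my_type").
    """
    if ":" in value:
        prefix, local = value.split(":", 1)
        if prefix not in prefixes:
            raise ValueError(f"prefix {prefix!r} is not declared in the head")
        return ("custom", prefixes[prefix] + local)
    vertical = value.split(".", 1)[0] if "." in value else None
    return ("global", vertical)


head_prefixes = {"my_namespace": "http://example.com/ns#"}
print(classify_og_type("video.movie", head_prefixes))           # ('global', 'video')
print(classify_og_type("website", head_prefixes))               # ('global', None)
print(classify_og_type("my_namespace:my_type", head_prefixes))  # ('custom', 'http://example.com/ns#my_type')
```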
Example (more of a candidate vertical): profile—namespace URI: http://ogp.me/ns/profile#
profile:first_name—string—a given name
profile:last_name—string—a name inherited from a family or marriage
profile:username—string—a short unique string to identify them
profile:gender—enum(male, female)—their gender
The types used when defining attributes
Boolean—values: true, false, 1, 0
DateTime—composed of a date (year, month, day) and an optional time component (hours, minutes) as per the ISO 8601 standard
Enum—a type consisting of bounded set of constant string values
Float—a 64-bit signed floating point number
Integer—a 32-bit signed integer.
String—a sequence of Unicode characters
URL—all valid URLs that utilize the http:// or https:// protocols
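Checks for these value types might look like the following Python sketch. It is only an approximation of the definitions above: is_datetime accepts just two ISO 8601 forms (a date, or a date plus hours:minutes), Python's float() also accepts forms such as "nan" and "1e3", and all function names are hypothetical.

```python
import re
from datetime import datetime
from urllib.parse import urlparse


def is_boolean(v):
    return v in {"true", "false", "1", "0"}


def is_integer(v):
    # Digits with an optional sign, within the 32-bit signed range.
    return re.fullmatch(r"[+-]?\d+", v) is not None and -2**31 <= int(v) <= 2**31 - 1


def is_float(v):
    try:
        float(v)          # Python floats are 64-bit IEEE 754 doubles
    except ValueError:
        return False
    return True


def is_enum(v, allowed):
    return v in allowed


def is_datetime(v):
    # Only two ISO 8601 forms are accepted here: a date, or a date plus hours:minutes.
    for fmt in ("%Y-%m-%d", "%Y-%m-%dT%H:%M"):
        try:
            datetime.strptime(v, fmt)
            return True
        except ValueError:
            pass
    return False


def is_url(v):
    parsed = urlparse(v)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


print(is_integer("2147483647"), is_integer("2147483648"))   # True False
print(is_enum("female", {"male", "female"}))                 # True
```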
Discuss the Open Graph Protocol
in the Facebook group (https://www.facebook.com/groups/opengraph/) or
on the developer mailing list (http://groups.google.com/group/open-graph-protocol)
The open source community has developed several parsers and publishing tools
Facebook Object Debugger—Facebook's official parser & debugger
Google Rich Snippets Testing Tool—Open Graph protocol support in specific verticals and Search Engines.
OpenGraph.in—a service that parses Open Graph protocol markup and outputs HTML and JSON
PHP Validator and Markup Generator—OGP 2011 input validator and markup generator in PHP5 objects
PHP Consumer—a small library for accessing Open Graph protocol data in PHP
OpenGraphNode in PHP—a simple parser for PHP
PyOpenGraph—a library written in Python for parsing Open Graph protocol information from web sites
OpenGraph Ruby—a Ruby Gem that parses web pages and extracts Open Graph protocol markup
OpenGraph for Java—a small Java class used to represent the Open Graph protocol
RDF::RDFa::Parser—a Perl RDFa parser that understands the Open Graph protocol
WordPress plugin—Facebook's official WordPress plugin
WordPress
http://wordpress.org/
A free and open source blogging tool and a content management system (CMS) based on PHP and MySQL
Runs on a web hosting service
Used by more than 18.9% of the top 10 million websites (August 2013)
The most popular blogging system (>60 M websites)
A CURIE (short for Compact URI) defines a generic, abbreviated syntax for expressing URIs, e.g., [isbn:0393315703]
May be considered a datatype
The square brackets may be used to prevent ambiguities between CURIEs and regular URIs, yielding so-called safe CURIEs
QNames may be considered a type of CURIE
CURIEs can be better defined and may include checking
Unlike QNames, the part of a CURIE after the colon needn’t conform to the rules for XML element names
CURIE Syntax 1.0 was published as a W3C Working Group Note in 2010
Example (using a QName syntax within XHTML)
<html xmlns:wiki="http://en.wikipedia.org/wiki/">
<head>...</head>
<body>
<p>
Find out more about <a href="[wiki:Biome]">biomes</a>.
</p>
</body>
</html>
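Expanding a CURIE is a prefix lookup plus concatenation, and the square brackets of a safe CURIE simply announce that the value must be read as a CURIE. A minimal Python sketch; the isbn mapping is a hypothetical one chosen to match the [isbn:0393315703] example, and raising an error for an unknown prefix inside brackets is this sketch's own choice, not a rule quoted from the specification.

```python
def expand_curie(value, mappings):
    """Expand a CURIE or safe CURIE using prefix -> namespace mappings (sketch)."""
    safe = value.startswith("[") and value.endswith("]")
    curie = value[1:-1] if safe else value
    prefix, colon, reference = curie.partition(":")
    if colon and prefix in mappings:
        return mappings[prefix] + reference
    if safe:
        # Brackets promise a CURIE, so an unknown prefix is an error in this sketch.
        raise ValueError(f"undeclared prefix in safe CURIE {value!r}")
    return value   # not a known CURIE: leave it as an ordinary URI


mappings = {
    "wiki": "http://en.wikipedia.org/wiki/",   # from the xmlns:wiki declaration above
    "isbn": "urn:ISBN:",                       # hypothetical mapping for the isbn example
}
print(expand_curie("[wiki:Biome]", mappings))         # http://en.wikipedia.org/wiki/Biome
print(expand_curie("[isbn:0393315703]", mappings))    # urn:ISBN:0393315703
print(expand_curie("http://example.com/", mappings))  # http://example.com/
```

The reference 0393315703 starts with a digit, so it would not be a valid local part of a QName, but it is fine in a CURIE.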
RDFa Play
http://rdfa.info/play/
Beta version (still bugs) yet very useful
HTML fragment with RDFa in left panel, rendering in right
Choose to see (below the panels) either N3 serialization of contained RDF or its graphical visualization
Examples of type Person, Social Network, Event, Place, Product, SVG
Edit these or make your own HTML fragments from scratch
See the Tools tab at the RDFa.info web page (http://rdfa.info/tools/)
The W3C’s Nu Markup Validation Service
http://validator.w3.org/nu/
Handles RDFa in XML and (X)HTML (various versions) as well as SVG and MathML
Can automatically detect content type
java-rdfa
https://github.com/shellac/java-rdfa
An offshoot of the Stars Project, Univ. of Bristol, Institute for Learning and Research Technology (Web Futures team)
STARS (roughly Semantic Tools for Screen Arts Research) project (http://www.dshed.net/dshed/stars, http://stars.ilrt.bris.ac.uk/blog/) is now finished
Funded by JISC, a charity that champions the use of digital technologies in UK education and research
The Semantic Web technologies used in it broadly seek to capture and make machine readable data resources of video content
Lets people browsing the content discover thematic links and describe them in new ways
For HTML sources, add the --format HTML argument; this needs the Validator.nu parser (see below)
$ java -cp '*' rdfa.simpleparse --format HTML http://www.slideshare.net/intdiabetesfed/world-diabetes-day-2009
<http://www.slideshare.net/intdiabetesfed/world-diabetes-day-2009>
<http://www.w3.org/1999/xhtml/vocab#stylesheet>
<http://public.slidesharecdn.com/v3/styles/combined.css?1265372095> .
...
The output of simpleparse is N-Triples (hard to read)
Add Jena to the classpath and use rdfa.parse instead
$ java -cp '*:/path/to/jena/lib/*' rdfa.parse --format HTML http://www.slideshare.net/intdiabetesfed/world-diabetes-day-2009
@prefix dc: <http://purl.org/dc/terms/> .
@prefix hx: <http://purl.org/NET/hinclude> .
... nice turtle output ...
java-rdfa can be used from Jena—invoke
Class.forName("net.rootdev.javardfa.RDFaReader");
This hooks the 2 readers into Jena, then we can do either of the following
model.read(url, "XHTML"); // xml parsing
model.read(other, "HTML"); // html parsing
The Validator.nu HTML Parser
http://about.validator.nu/htmlparser/
An implementation of the HTML5 parsing algorithm in Java
Works as a drop-in replacement for the XML parser in applications that
already support XHTML 1.x content with an XML parser and
use SAX, DOM or XOM to interface with the parser
The parser core compiles on Google Web Toolkit
The following are mentioned in RDFa.info, Developers link, http://rdfa.info/dev/
Green Turtle
http://code.google.com/p/green-turtle/
An implementation of RDFa 1.1 for browsers
Including a bit of JavaScript extends the DOM to include the RDFa API
An RDFa 1.1 processor to process any ancillary documents to harvest triples
EasyRdf
http://www.easyrdf.org/
A PHP library to make it easy to consume and produce RDF—e.g.,
$foaf = new EasyRdf_Graph("http://njh.me/foaf.rdf");
$foaf->load();
$me = $foaf->primaryTopic();
echo "My name is: ".$me->get('foaf:name')."\n";
There’s a class to map between RDF Types and PHP Classes
Support for visualization of graphs using GraphViz
EasyRdf 0.8 does support RDFa, but it's still in beta; use the converter at easyrdf-converter.aelius.com to test it out
pyrdfa3
https://github.com/RDFLib/pyrdfa3
This is what provides the W3C’s RDFa Distiller and Parser
Part of Python RDFLib, https://github.com/RDFLib
The RDFa gem
http://rubygems.org/gems/rdf-rdfa
The Ruby RDF Project collects numerous gems supporting Linked Data and Semantic Web programming in Ruby
See http://ruby-rdf.github.io/
librdfa, “The Fastest RDFa Processor on the Internet”
https://github.com/rdfa/librdfa/
A SAX-based RDFa processor written in C for XML and HTML family languages
Supports
XML+RDFa, XHTML+RDFa, SVG+RDFa, HTML4+RDFa and HTML5+RDFa
for both RDFa 1.0 and RDFa 1.1
clj-rdfa
https://github.com/niklasl/clj-rdfa
An RDFa extractor implemented in Clojure running on a Java Virtual Machine.
Clojure (pronounced “closure”) is a dialect of the Lisp programming language
A functional general-purpose language
Runs on the Java Virtual Machine, Common Language Runtime, and JavaScript engines
Focus is on programming with immutable values and explicit progression-of-time constructs
Facilitates the development of more robust programs, particularly multithreaded ones
Semargl
http://semarglproject.org/
Download from https://github.com/levkhomich/semargl
A modular framework for crawling linked data from structured documents
Provides lightweight and performant tools without excess dependencies
High-performance streaming parsers for RDFa, JSON-LD (see below), RDF/XML, N-Triples
Streaming serializer for Turtle, NTriples, NQuads
Integration with Jena, Sesame (see below) and Clerezza (see below)
Small memory footprint and CPU requirements allow this framework to be used by any application
Runs seamlessly on Android and GAE (Google App Engine)
Sesame
http://www.openrdf.org/about.jsp
An open-source framework for querying and analyzing RDF data
Implements an in-memory triple store and an on-disk triple store
And 2 Servlet packages to manage and provide access to these triple stores on a permanent server
The Sesame Rio (RDF Input/Output) package contains a simple API for Java-based RDF parsers and writers
Supports 2 query languages: SPARQL and SeRQL (in the SWI-Prolog Semantic Web Library, http://www.swi-prolog.org/pldoc/package/semweb.html, see also http://www.swi-prolog.org/web/)
Its Alibaba component is an API that lets us
map Java classes onto ontologies and
generate Java source files from ontologies
Can thus use specific ontologies like RSS, FOAF and Dublin Core directly from Java
Clerezza
http://clerezza.apache.org/
A service platform based on OSGi (Open Services Gateway initiative, open specifications that enable the modular assembly of software built with Java technology, http://www.osgi.org/)
Functionality for managing semantically linked data accessible through RESTful Web Services and in a secured way
Tools to manipulate RDF data, create RESTful Web Services and Renderlets using Scala Server Pages
A renderlet is a special container that can receive every object in Pimcore
Pimcore is an open source web content management platform for creating and managing web applications and digital presences implemented in PHP and MySQL
Scala Server Pages are like JSPs but for Scala instead of Java
Scala is an object-functional programming and scripting language for general software applications
RDF triples are stored via Clerezza’s Smart Content Binding (SCB)
A Java implementation of the graph data model and functionalities to operate on it
A service interface to access multiple named graphs
Can use various providers to manage RDF graphs in a technology specific manner (using e.g., Jena or Sesame)
Provides for adaptors that allow an application to use various APIs (including the Jena API) to process RDF graphs
A serialization and a parsing service to convert a graph into a certain representation and vice versa
JSON-LD (JSON for Linked Data, http://json-ld.org/) is a method of transporting Linked Data using JSON
Standardized by the W3C RDF Working Group (Proposed Recommendation http://www.w3.org/TR/2013/PR-json-ld-20131105/, Nov. 2013; JSON-LD 1.0 became a W3C Recommendation in January 2014)
Linked Data is a way of publishing structured data so that it can be interlinked and more useful
Builds upon standard Web technologies (HTTP, RDF, URIs, …)
Extends them to share info in a computer-readable way so that data from different sources can be connected and queried
JSON-LD aims to require as little effort as possible from developers to transform their existing JSON to JSON-LD
Designed around the concept of a “context” to provide additional mappings from JSON to an RDF-like model
See the playground at http://json-ld.org/playground/
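The role of the context can be illustrated with a deliberately simplified Python sketch: each key is looked up in @context to find its IRI, "@type": "@id" marks values that are IRIs rather than strings, and keys the context does not map are dropped. It handles only one flat object and is not the JSON-LD expansion algorithm (no nesting, arrays, @vocab, compact IRIs or language tags); to_triples is a hypothetical name, and the sample document is the familiar name/homepage example from json-ld.org.

```python
import json


def to_triples(doc):
    """Turn one flat JSON-LD object into (subject, predicate, object) strings."""
    context = doc.get("@context", {})
    subject = f"<{doc['@id']}>" if "@id" in doc else "_:b0"   # blank node if no @id
    triples = []
    for key, value in doc.items():
        if key.startswith("@"):
            continue                          # keywords such as @context, @id
        definition = context.get(key)
        if definition is None:
            continue                          # keys the context doesn't map are dropped
        if isinstance(definition, str):
            definition = {"@id": definition}
        if definition.get("@type") == "@id":
            obj = f"<{value}>"                # value is an IRI, not a string
        else:
            obj = json.dumps(value)           # plain literal, e.g. "Manu Sporny"
        triples.append((subject, f"<{definition['@id']}>", obj))
    return triples


example = json.loads("""{
  "@context": {
    "name": "http://xmlns.com/foaf/0.1/name",
    "homepage": {"@id": "http://xmlns.com/foaf/0.1/homepage", "@type": "@id"}
  },
  "name": "Manu Sporny",
  "homepage": "http://manu.sporny.org/"
}""")
for s, p, o in to_triples(example):
    print(s, p, o, ".")
# _:b0 <http://xmlns.com/foaf/0.1/name> "Manu Sporny" .
# _:b0 <http://xmlns.com/foaf/0.1/homepage> <http://manu.sporny.org/> .
```

The existing JSON ("name", "homepage") is untouched; only the @context is added, which is the low-effort migration path described above.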
checkrdfa
http://check.rdfa.info/
Checks a web page for RDFa and displays the found data
Validates our data against the published recommendations from major consumers/users of RDFa data
I don’t think this works anymore
Microformats
See microformats.org at http://microformats.org/
Primer: http://www.digital-web.com/articles/microformats_primer/
A microformat (abbreviated μF) is a web-based approach to semantic markup
Re-uses existing HTML/XHTML tags to convey metadata and other attributes in web pages and other contexts that support (X)HTML (e.g., RSS)
Lets software process info intended for end-users (e.g., contact info, geographic coords, calendar events) automatically
Established microformats (e.g., hCard) are published on the web at least as often as alternatives (e.g., Schema.org and RDFa)
hCard is a microformat version of vCard
Mozilla Operator add-on
https://addons.mozilla.org/en-US/firefox/addon/operator/
Leverages microformats and other semantic data available on many web pages to provide new ways to interact with web services
After adding it, choose View > Toolbars > Customize and drag the Operator icon from the palette to the add-on bar
Then, at the top of the Firefox window, choose View > Sidebar and click Operator
(Screenshot callouts: the Operator toolbar; the Operator icon and its drop-down menu)
Add various items of info to various services
Here add an event to my Google Calendar
Get the same options in the toolbar just above the page
XOXO (eXtensible Open XHTML Outlines) is an XML microformat for outlines built on top of XHTML
http://microformats.org/wiki/xoxo
The spec defines an outline as a hierarchical, ordered list of arbitrary elements
It's fairly open, suitable for many types of list data
The XML elements in an XOXO document
<ol class="xoxo">
<ul class="xoxo">
These, with class attribute with value xoxo, are the root elements of XOXO, used as containers for outline items
May have attribute compact="compact" to indicate whether child items are visible
<li> is an item in the outline
May contain an ol or ul element to contain child items, which themselves may do so as well
<a> is a hyperlink for an item in the outline (and may contain much info—see below)
<dl> may contain any number of arbitrary properties using dt (definition term) and dd (definition description) elements
Example
<ol class='xoxo'>
<li>item 1
<dl>
<dt>description</dt>
<dd>This item represents the main point we're trying to make.</dd>
</dl>
<ol>
<li>subpoint a</li>
<li>subpoint b</li>
</ol>
</li>
</ol>
Special properties: text, url, title, type, and rel (short for relationship)
Example
<ol class='xoxo'>
<li><a href="http://example.com/more.xoxo"
title="title of item 1"
type="text/xml"
rel="help">item 1</a>
<!-- note how the "text" property is just the contents of the <a> -->
<dl>
<dt>description</dt>
<dd>This item represents the main point we're trying to make.</dd>
</dl>
</li>
</ol>
Some Open Microformat Formats
A microformat (singular) is one collection of properties, each with an intended kind of value
Only hCard and hCalendar have been ratified so far
hCard is for publishing people, companies, organizations on the web, using a 1:1 representation of vCard properties and values in HTML—e.g.,
<div class="vcard">
<a class="url fn org" href="http://microformats.org/">
microformats.org
</a>
</div>
hCalendar is a format for publishing events on the web, using a 1:1 representation of iCalendar (RFC2445) VEVENT properties and values in HTML—e.g.,
<span class="vevent">
<span class="summary">The microformats.org site was launched</span>
on <span class="dtstart">2005-06-20</span>
at the Supernova Conference
in <span class="location">San Francisco, CA, USA</span>.
</span>
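Because microformats reuse class names, a consumer only needs to find elements with the right classes inside a root such as vevent or vcard. A simplified Python sketch, assuming one root object per input; it also handles the abbr design pattern (the machine-readable value sits in the title, as with dtreviewed in hReview), while nested objects, void elements and repeated properties are not handled properly. MicroformatParser and extract are hypothetical names.

```python
from html.parser import HTMLParser


class MicroformatParser(HTMLParser):
    """Collect property values (by class name) inside a root class such as 'vevent'."""

    def __init__(self, root_class, property_classes):
        super().__init__()
        self.root_class = root_class
        self.property_classes = set(property_classes)
        self.stack = []    # per open element: (its classes, properties collecting its text)
        self.values = {}   # property class -> text

    def _inside_root(self):
        return any(self.root_class in classes for classes, _ in self.stack)

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        classes = set((a.get("class") or "").split())
        self.stack.append((classes, set()))
        if not self._inside_root():
            return
        props = classes & self.property_classes
        if tag == "abbr" and "title" in a:
            # abbr design pattern: the machine-readable value is in the title
            for p in props:
                self.values[p] = a["title"]
        else:
            self.stack[-1] = (classes, props)
            for p in props:
                self.values.setdefault(p, "")

    def handle_data(self, data):
        for _, props in self.stack:
            for p in props:
                self.values[p] += data

    def handle_endtag(self, tag):
        if self.stack:
            self.stack.pop()


def extract(html, root_class, property_classes):
    parser = MicroformatParser(root_class, property_classes)
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace left over from the markup's line breaks.
    return {k: " ".join(v.split()) for k, v in parser.values.items()}


vevent = """<span class="vevent">
<span class="summary">The microformats.org site was launched</span>
on <span class="dtstart">2005-06-20</span>
at the Supernova Conference
in <span class="location">San Francisco, CA, USA</span>.
</span>"""
print(extract(vevent, "vevent", ["summary", "dtstart", "location"]))
# {'summary': 'The microformats.org site was launched', 'dtstart': '2005-06-20', 'location': 'San Francisco, CA, USA'}
```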
XFN is a lightweight method of annotating links to indicate a personal relationship with the person responsible for the linked resource
Strengthens existing links in a way that’s both machine-readable and human-comprehensible
It and FOAF serve different purposes
Relationship category: XFN values
friendship (at most one): friend acquaintance contact
physical: met
professional: co-worker colleague
geographical (at most one): co-resident neighbor
family (at most one): child parent sibling spouse kin
romantic: muse crush date sweetheart
identity: me
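XFN values live in the rel attribute of a link, e.g., rel="friend met", alongside any non-XFN values. A Python sketch that groups the XFN values of one rel attribute by category and flags the "at most one" categories listed above; xfn_categories and xfn_problems are hypothetical names, and other XFN profile rules are not checked.

```python
# XFN values by category; True marks the categories limited to at most one value.
XFN = {
    "friendship": ({"friend", "acquaintance", "contact"}, True),
    "physical": ({"met"}, False),
    "professional": ({"co-worker", "colleague"}, False),
    "geographical": ({"co-resident", "neighbor"}, True),
    "family": ({"child", "parent", "sibling", "spouse", "kin"}, True),
    "romantic": ({"muse", "crush", "date", "sweetheart"}, False),
    "identity": ({"me"}, False),
}


def xfn_categories(rel):
    """Map each XFN category used in a rel value to its values, ignoring non-XFN tokens."""
    tokens = rel.split()
    found = {}
    for category, (values, _) in XFN.items():
        used = [t for t in tokens if t in values]
        if used:
            found[category] = used
    return found


def xfn_problems(rel):
    """List violations of the 'at most one' rule (empty list if the rel value is fine)."""
    return [
        f"{category}: at most one of these values allowed, got {used}"
        for category, used in xfn_categories(rel).items()
        if XFN[category][1] and len(used) > 1
    ]


print(xfn_categories("friend met colleague nofollow"))
# {'friendship': ['friend'], 'physical': ['met'], 'professional': ['colleague']}
print(xfn_problems("friend acquaintance"))
# ["friendship: at most one of these values allowed, got ['friend', 'acquaintance']"]
```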
hReview is a format suitable for embedding reviews (of products, services, businesses, events, etc.) in HTML, XHTML, Atom, RSS, and arbitrary XML
Example (See the rendering on the next slide)
<div class="hreview">
<span><span class="rating">5</span> out of 5 stars</span>
<h4 class="summary">Crepes on Cole is awesome</h4>
<span class="reviewer vcard">Reviewer:
<span class="fn">Tantek</span> -
<abbr class="dtreviewed" title="2005-04-18">April 18, 2005</abbr>
</span>
<div class="description item vcard"><p>
<span class="fn org">Crepes on Cole</span>
is one of the best little creperies
in <span class="adr">
<span class="locality">San Francisco</span>
</span>.
Excellent food and service. Plenty of tables in a variety of sizes
for parties large and small. Window seating makes for excellent
people watching to/from the N-Judah which stops right outside.
I've had many fun social gatherings here, as well as gotten
plenty of work done thanks to neighborhood WiFi.
</p></div>
<p>Visit date: <span>April 2005</span></p>
<p>Food eaten: <span>Florentine crepe</span></p>
</div>
Apache Any23 (Anything to Triples)
https://any23.apache.org/
A library, a web service, and a command line tool that extracts structured data in RDF format from a variety of Web documents
Supported input formats:
RDF/XML, Turtle, Notation 3
RDFa with RDFa1.1 prefix mechanism
Microformats: Adr, Geo, hCalendar, hCard, hListing, hResume, hReview, License, XFN and Species
HTML5 Microdata (such as Schema.org; see below)
CSV with separator autodetection
For a detailed description of available extractors, see https://any23.apache.org/extractors.html
Apache Any23 is written in Java and used in major Web of Data applications such as
sindice.com (Semantic Web index, http://sindice.com/, collects Web data in many ways and offers search and querying across this data), and
sig.ma (semantic info mashup, http://sig.ma/)
Used in various ways, including
As a library in Java applications that consume structured data from the Web
As a command-line tool for extracting and converting between the supported formats
Online service: http://any23.org/
Microdata
Microdata is a WHATWG HTML specification used to nest semantics within existing content on web pages
Web Hypertext Application Technology Working Group (WHATWG): a community interested in evolving HTML and related technologies
Microdata aims to offer a simpler way of annotating HTML elements with machine-readable tags than similar approaches
E.g., those using RDFa and Microformats
A web developer can design a custom vocabulary or use vocabularies available on the web (see data-vocabulary.org below)
Microdata Global Attributes (used as HTML attributes)
itemscope creates the Item and indicates that descendants of this element contain info about it
itemtype is a valid URL of a vocabulary that describes the item and its properties
itemid indicates a unique identifier of the item
itemprop indicates that its containing tag holds the value of the specified item property
The property’s name and value context are described by the item’s vocabulary
Property values usually consist of string values but can also use URLs (e.g., using the a element and its href attribute)
itemref: properties that aren’t descendants of the element with the itemscope attribute can be associated with the item using this attribute
Provides a space-separated list of the ids (id attributes, not itemids) of elements elsewhere in the document that carry additional properties
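A consumer turns these attributes into nested items: itemscope opens an item, itemtype names its vocabulary, and each itemprop inside adds a value (a nested item, a meta content, a link's href or an image's src, or otherwise the element's text). A simplified Python sketch under those assumptions; itemid, itemref, multiple names in one itemprop and the special value rules for elements such as time are not handled, and MicrodataParser plus the "Jane Doe" markup are hypothetical.

```python
from html.parser import HTMLParser

VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}
URL_ATTR = {"a": "href", "area": "href", "link": "href", "img": "src"}


class MicrodataParser(HTMLParser):
    """Simplified Microdata reader: itemscope, itemtype and itemprop only."""

    def __init__(self):
        super().__init__()
        self.items = []   # top-level items
        self.stack = []   # per open non-void element: item it creates, text property

    def _current_item(self):
        for entry in reversed(self.stack):
            if entry["item"] is not None:
                return entry["item"]
        return None

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        owner = self._current_item()      # item that an itemprop here belongs to
        prop = a.get("itemprop")
        item = None
        if "itemscope" in a:
            item = {"type": a.get("itemtype"), "properties": {}}
            if prop and owner is not None:
                owner["properties"].setdefault(prop, []).append(item)   # nested item
            else:
                self.items.append(item)
        text_prop = None
        if prop and owner is not None and item is None:
            if tag == "meta":
                owner["properties"].setdefault(prop, []).append(a.get("content", ""))
            elif tag in URL_ATTR:
                owner["properties"].setdefault(prop, []).append(a.get(URL_ATTR[tag], ""))
            else:
                text_prop = prop              # value is the element's text content
        if tag not in VOID:
            self.stack.append({"item": item, "prop": text_prop, "owner": owner, "text": []})

    def handle_data(self, data):
        for entry in self.stack:
            if entry["prop"]:
                entry["text"].append(data)

    def handle_endtag(self, tag):
        if tag in VOID or not self.stack:
            return
        entry = self.stack.pop()
        if entry["prop"]:
            value = " ".join("".join(entry["text"]).split())
            entry["owner"]["properties"].setdefault(entry["prop"], []).append(value)


html = """<div itemscope itemtype="http://schema.org/Person">
  <span itemprop="name">Jane Doe</span>
  <a itemprop="url" href="http://example.com/jane">home page</a>
  <div itemprop="address" itemscope itemtype="http://schema.org/PostalAddress">
    <span itemprop="addressLocality">Albuquerque</span>
    <span itemprop="addressRegion">NM</span>
  </div>
</div>"""
parser = MicrodataParser()
parser.feed(html)
parser.close()
person = parser.items[0]
print(person["properties"]["name"])   # ['Jane Doe']
```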
Schema.org
See http://getschema.org/index.php/Main_Page
An initiative launched in June 2011 by Bing, Google and Yahoo! to provide a vocabulary for web masters to markup web content in ways recognized by major search providers
Data-Vocabulary.org: http://www.data-vocabulary.org/
Has links to the documentation on Schema.org’s vocabulary
The Schema.org vocabulary can be used with either Microdata or RDFa 1.1 Lite syntax
Has types for Event, Organization, Person, Product, Review, AggregateRating, Offer and hundreds of others
For the RDFS file (as an XML document) that defines this vocabulary, see http://rdf.data-vocabulary.org/rdf.xml
Other markup vocabularies are provided by Schema.org schemas
Typically, applications need to extract semantic annotations from the web pages and use them to perform reasoning
RDFa Extractor (RDFa2RDF Service)
http://getschema.org/rdfaliteextractor/about
A REST Web Service to extract RDF data from RDFa annotations
Provides the semantic information as N-Triples, N3 Notation, JSON
Powered by node.js (http://nodejs.org/) and uses the jsdom library (https://github.com/tmpvar/jsdom)
Microdata Extractor (Microdata2RDF Service)
http://getschema.org/microdataextractor/about
Like the RDFa Extractor but has Microdata, not RDFa, as input
Conforms with the Microdata2RDF specification at W3C
But may use a different generation algorithm