Annotations are coming to the web

Annotations are coming to the web, and ... well, everything else.

Annotating Everything

Published on December 14, 2016

Kurt Cagle, Founder and Chief Ontologist at Semantical, LLC

Life is lived in the margins. Three thousand years ago, priests and scholars were adding

comments about the validity of poems about the gods in copies of clay tablets along the edges,

along with comments such as "Hath-ur wrote this part." Most extant copies of the Iliad from the

fifth century BCE have more marginal content than actual text, and the creators of the various

manuscripts of the Bible (leading up to the masterpiece of the Book of Kells) deliberately kept

wide margins not only to assist in binding the large, wide pages, but also to provide room for

scholars, transcriptionists, and illuminators to add their own last words (or sketches) about a

given paragraph or poem.

With the advent of printing (and consequently the significant reduction of the cost of producing

a book), annotations exploded into multiple different forms - commentaries, citations,

corrections, glossary definitions and similar notations. These were often collected after the

publication of one edition of a work and appended to the next, either within the body of the text

(typically as footnotes on the page) or in chapter or book appendices.

The Internet was, ultimately, built around the concept of an annotation. A typical web page link

has all the key pieces one needs for a (very simple) annotation:

    <a href="/link/to/annotation" title="My comments about this annotation">some inline text (the body)</a>
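To make the comparison concrete, here is a minimal Python sketch that pulls the annotation-like pieces out of such a link using only the standard library. The class name LinkAnnotationParser and the dictionary keys (target, title, body) are illustrative assumptions, not part of any annotation standard.

```python
from html.parser import HTMLParser


class LinkAnnotationParser(HTMLParser):
    """Collects the target, title and body of each <a> element it sees."""

    def __init__(self):
        super().__init__()
        self.annotations = []   # finished link "annotations"
        self._current = None    # link currently being read, if any

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            attributes = dict(attrs)
            self._current = {
                "target": attributes.get("href"),   # what the link points to
                "title": attributes.get("title"),   # summary metadata
                "body": "",                         # inline text, filled in below
            }

    def handle_data(self, data):
        # Text may arrive in several chunks, so accumulate it.
        if self._current is not None:
            self._current["body"] += data

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            self.annotations.append(self._current)
            self._current = None


parser = LinkAnnotationParser()
parser.feed(
    '<a href="/link/to/annotation" title="My comments about this annotation">'
    "some inline text (the body)</a>"
)
link = parser.annotations[0]
print(link)
```

Even this simple case yields a target, a piece of descriptive metadata and a body, which is roughly the shape that a fuller annotation takes.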

As the web shifted from being a read-only medium to being read-write oriented, one of the first

things to happen was the emergence of comments to content that was published, whether

articles or images. Most such systems provided a body (the comment), the author (the person

who posted the comment) and when the comment was made. Additionally, summary and title

metadata could be added. As mail moved away from being a stand-alone application to being a

mostly web-based app, it also became more annotation oriented.

Facebook and Twitter are, in the main, nothing but annotations annotating other annotations. A

tweet or Facebook post is submitted, typically containing a body that might include a link

(which gets transformed from an inline piece of text to a web URL), and subsequent comments

made to that tweet or post are annotations to that post, or to comments on that post. Facebook

content itself has a specific address, but comments generally do not.

Twitter's retweet facility does much the same thing, but no distinction is made between a

primary tweet and any retweets - they are each distinct objects. Indeed, at their core, most social

networks follow one of those two models - Pinterest, Tumblr, Instagram, Snapchat, WordPress.

All of these are effectively annotations of annotations.

Given this realization, the W3C is moving towards final release of the Open Annotation

standard (OA). The core idea behind OA is that any resource that has a representation on the

web can be the target of an annotation - a web site, a web page, a blog post, an image, a PDF

document or anything else that is accessible on the web.

The structure of an annotation is straightforward, though its implementation is perhaps a bit

more complex (Figure 1):

Figure 1. Core structure of an annotation.

The annotation has two distinct parts: its target (what the annotation is referring to) and its body

(the content of the annotation itself). The target is represented by a URL (or web address)

though it could also be a conceptual IRI (see Annotations and Semantics, below).

The body could be text (plain text or html) but could also be a link to an external document

such as a blog post, data definition, picture, video, fragment from another web source, in short,

anything that has an address on the web.

Annotations usually have a motivation or purpose. An annotation may be a general comment,

may be a formal or informal description or definition, may illustrate, indicate a change in status,

or similar action. An annotation may also have multiple bodies or targets. For instance, an

annotation may serve to identify the resumes of people in contention for a job:

    <http://www.annotationserver.com/anno1234>
        rdfs:label "Data Scientist Contenders";
        a oa:Annotation;
        oa:hasBody "Joe, these are the candidates I think we should be looking at for the data scientist position.";
        dc:subject <http://www.example.com/posting/jobDataScientist1>;
        oa:hasTarget <http://www.example.com/resume/janeMarple>,
            <http://www.example.com/resume/deweyCheatem>,
            <http://www.example.com/resume/sherlockHolmes>,
            <http://www.example.com/resume/herculesPoroit>;
        oa:motivation oa:identifying;
        dc:creator foaf:tomSwift .

In this case, the body is the content of the annotation, the subject provides a reference to an

external position, the motivation for the annotation is to identify individuals, and the targets are

the four individuals listed. Here, oa: is a namespace prefix identifying the various terms as being

in the Open Annotation namespace, while dc: is Dublin Core, a common standard for

identifying publishing metadata.
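For readers who think more readily in code than in RDF, the same record can be sketched as a plain Python data structure. The Annotation class and its field names below are hypothetical and simply mirror the properties used in the example; they are not an official OA binding.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Annotation:
    """Hypothetical in-memory record mirroring the fields in the example."""
    id: str                      # address of the annotation itself
    body: str                    # the content of the annotation
    targets: List[str]           # one or more resources being annotated
    motivation: str              # why the annotation was made
    creator: str                 # who made it
    subjects: List[str] = field(default_factory=list)
    label: Optional[str] = None


contenders = Annotation(
    id="http://www.annotationserver.com/anno1234",
    label="Data Scientist Contenders",
    body=("Joe, these are the candidates I think we should be looking at "
          "for the data scientist position."),
    subjects=["http://www.example.com/posting/jobDataScientist1"],
    targets=[
        "http://www.example.com/resume/janeMarple",
        "http://www.example.com/resume/deweyCheatem",
        "http://www.example.com/resume/sherlockHolmes",
        "http://www.example.com/resume/herculesPoroit",
    ],
    motivation="identifying",
    creator="tomSwift",
)

# A single annotation ties all four resumes to one job posting.
print(len(contenders.targets), "targets linked to", contenders.subjects[0])
```

The point of the sketch is only that targets is a list: one annotation can relate several resources at once, which a single HTML link cannot do.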

An annotation can be thought of as a link on steroids. The above annotation creates associations

between four different individuals through an external agency, something that can't easily be

done on the web. Similarly, an annotation can also link to another annotation:

    <http://www.annotationserver.com/anno1235>
        a oa:Annotation;
        oa:hasBody "I like Jane Marple. She seems well qualified.";
        dc:subject <http://www.example.com/resume/janeMarple>;
        dc:subject <http://www.example.com/posting/jobDataScientist1>;
        oa:hasTarget <http://www.annotationserver.com/anno1234>;
        oa:motivation oa:assessing;
        dc:creator foaf:janeDoe .

    <http://www.annotationserver.com/anno1236>
        a oa:Annotation;
        oa:hasBody "I have problems with Dewey Cheatem. His resume looks dubious.";
        dc:subject <http://www.example.com/resume/deweyCheatem>;
        dc:subject <http://www.example.com/posting/jobDataScientist1>;
        oa:hasTarget <http://www.annotationserver.com/anno1234>;
        oa:motivation oa:assessing;
        dc:creator foaf:mattSmith .

Here, there are two annotations, the first endorsing Jane Marple, the second raising questions

about Dewey Cheatem. Both point back to the original annotation, and the subjects, in turn, are

what were targets in the previous annotation.
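Because an annotation's target can itself be an annotation, a set of annotations forms a reply tree. The following Python sketch shows one way such a tree might be reconstructed from the three annotations above. The dictionary layout and the helper names build_reply_index and thread are assumptions made for illustration, and the sketch assumes no annotation targets itself or forms a cycle.

```python
from collections import defaultdict

ROOT = "http://www.annotationserver.com/anno1234"
REPLY_1 = "http://www.annotationserver.com/anno1235"
REPLY_2 = "http://www.annotationserver.com/anno1236"

annotations = [
    {"id": ROOT, "targets": ["http://www.example.com/resume/janeMarple",
                             "http://www.example.com/resume/deweyCheatem"]},
    {"id": REPLY_1, "targets": [ROOT]},
    {"id": REPLY_2, "targets": [ROOT]},
]


def build_reply_index(annotations):
    """Map each annotation id to the ids of annotations that target it."""
    known_ids = {a["id"] for a in annotations}
    replies = defaultdict(list)
    for a in annotations:
        for target in a["targets"]:
            if target in known_ids:      # the target is itself an annotation
                replies[target].append(a["id"])
    return replies


def thread(root_id, replies, depth=0):
    """Return the conversation under root_id as indented lines."""
    lines = ["  " * depth + root_id]
    for child in replies.get(root_id, []):
        lines.extend(thread(child, replies, depth + 1))
    return lines


lines = thread(ROOT, build_reply_index(annotations))
print("\n".join(lines))
```

Targets that are not annotations (the resumes, in this case) are simply ignored by the index, so only the conversation structure is returned.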

How Web Annotations Work

One key difference between a link in an HTML document and an annotation is that a link is

embedded within a document, and is auto-selected by what's within the associated <a> tag.

With annotations, on the other hand, the target - the thing being annotated - is indicated by a

document located on a server, called an annotation server. Corresponding to this is

an annotation client, which can be thought of as being like a news reader - it retrieves all of the

annotations that are either associated with a given target or that correspond to a given keyword or

query. In the former case, what comes back are all of the available "comments" for a given

document, while in the latter case, what comes back are suggestions about targets that satisfy a

search query ("give me all annotations about cats and show what they are annotating"). You can

also get conversations back - annotations that point to other annotations, much like the layered

comment feeds at the end of most articles (or Facebook responses). The client app would

then be responsible for presenting this information in a user-friendly manner.
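The two kinds of lookup described here (by target and by keyword) can be illustrated with a small in-memory Python sketch. The class InMemoryAnnotationServer, its method names and the sample data are all hypothetical; this is not the W3C transport protocol, only a stand-in showing what each query might return.

```python
from typing import Dict, List


class InMemoryAnnotationServer:
    """Hypothetical stand-in for an annotation server, held in memory."""

    def __init__(self) -> None:
        self._annotations: List[Dict] = []

    def add(self, annotation: Dict) -> None:
        self._annotations.append(annotation)

    def by_target(self, target: str) -> List[Dict]:
        """All annotations ("comments") attached to one target."""
        return [a for a in self._annotations if target in a["targets"]]

    def search(self, keyword: str) -> Dict[str, List[str]]:
        """Map each target to the ids of annotations whose body mentions keyword."""
        keyword = keyword.lower()
        hits: Dict[str, List[str]] = {}
        for a in self._annotations:
            if keyword in a["body"].lower():
                for target in a["targets"]:
                    hits.setdefault(target, []).append(a["id"])
        return hits


server = InMemoryAnnotationServer()
server.add({"id": "anno1", "body": "Great photo of two cats.",
            "targets": ["http://www.example.com/photos/42"]})
server.add({"id": "anno2", "body": "The lighting is off here.",
            "targets": ["http://www.example.com/photos/42"]})
server.add({"id": "anno3", "body": "Cats feature heavily in this post.",
            "targets": ["http://www.example.com/blog/pets"]})

# Former case: every comment on one document.
comments = server.by_target("http://www.example.com/photos/42")
# Latter case: annotations about cats, grouped by what they annotate.
cat_targets = server.search("cats")
print([a["id"] for a in comments])
print(cat_targets)
```

A real client would sit on top of calls like these and decide how to present the results, for example as a comment feed or as a list of suggested documents.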

Conceivably there could be thousands or even millions of such servers. Public annotation

servers (perhaps hosted by Google or Facebook or Twitter - or the next incarnations of these

kinds of companies) would be able to aggregate commentary about the web in general, while

private annotation servers might be run by investment banks or financial analysts, health care or

life sciences organizations, political activists and so forth, that would be subscription only.

Since concepts - things - can be referenced as targets, it's not hard to imagine an investment

company annotating stocks, bonds or financial instruments (as well as regulatory structures),

news organizations tagging specific watch topics, or movie fans (and studios) tagging specific

movies, TV shows or games. IMDB does some of this last one now (as does Rotten Tomatoes),

but you could log into either the (hypothetical) IMDB or Rotten Tomatoes annotation server and

plug directly into the information space that the service provides.

Since such annotation servers can also add tagging into established taxonomies, with enough

such servers in place, you can create a semantic profile of a person, organization, thing, work of

art or topic that makes it possible to query across all such servers universally, and can go a long

way toward providing a provenance trail that both establishes the legitimacy of specific information and

shows how that information has changed over time.

Now, none of this is here yet. The Web Annotation Working Group has pushed

the annotation model itself, its vocabulary, and the transport protocol to

Candidate Recommendation status, meaning that you should start to see early prototypes of

annotation servers and clients begin to appear by mid-2017, likely with universities, research

organizations and banks taking the lead here for internal applications. However, such services

will likely go commercial within the next two or three years, and by 2020 we could see a new

annotation explosion as more and more things (and even people) begin to develop their own

annotation spaces.

So, get out the sticky notes.

Kurt Cagle is a writer, blogger, and owner of Semantical LLC, a smart data company. He's covered with sticky notes.
