1
Folksonomies: Indexing and Retrieval in Web 2.0 and in Libraries
Dr. phil. Isabella Peters
Heinrich-Heine-Universität Düsseldorf
Department of Information Science
Uni Graz – 17 December 2009
2
Folksonomies: Indexing without Rules
“Anything goes”
“Against method”, 1975 (Paul K. Feyerabend, Austro-American
philosopher)
Tagging
• no rules
• no methods – or even against methods
• indexing a single document
– synonyms – why not? (New York – NY – Big Apple – … )
– homonyms – never heard! (not: Java [Programming Language] – Java
[Island], but Java)
– translations – why not? (Singapore – Singapur – …)
– typing errors – nobody is perfect (Syngapur)
– hierarchical relations (hyponymy) – why not? (Düsseldorf – North Rhine-Westphalia – Germany)
– hierarchical relations (meronymy) – why not? (tree – branch – leaf)
3
Indexing – in general
4
Tri-partite System of Folksonomies
Folksonomies always consist of three parts
1) document (resource)
2) prosumer (user)
3) tag
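The tri-partite structure can be sketched as a list of (user, tag, document) triples. This is only an illustration: the `Tagging` type, the `index_by` helper and the sample data are hypothetical and not taken from the slides.

```python
from collections import defaultdict
from typing import NamedTuple


class Tagging(NamedTuple):
    """One tagging action: a user assigns a tag to a document."""
    user: str
    tag: str
    document: str


# Hypothetical sample data, invented for illustration only.
TAGGINGS = [
    Tagging("anna", "london", "doc1"),
    Tagging("anna", "travel", "doc1"),
    Tagging("ben", "london", "doc1"),
    Tagging("ben", "java", "doc2"),
]


def index_by(taggings, key, value):
    """Group one component of the triples by another, e.g. tags per user."""
    index = defaultdict(set)
    for tagging in taggings:
        index[getattr(tagging, key)].add(getattr(tagging, value))
    return dict(index)


tags_per_user = index_by(TAGGINGS, "user", "tag")
users_per_document = index_by(TAGGINGS, "document", "user")
```

Any of the three parts can serve as the entry point, which is what allows users or documents to be linked through the parts they share.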
5
Users – Tags – Documents
[Figure labels: thematically linked (shared users); thematically linked (shared documents)]
6
Shared Documents & Thematically Linked Users
more like this ...
→ similar documents
detection of documents
more like me ...
→ similar users
detection of communities
[Figure labels: thematically linked; shared documents]
7
More like me! Or: More like This User!
• starting point: single user (ego)
• processing
– (1) tag-specific similarity
  • number of all tags of ego: a(t)
  • number of all tags of another user B: b(t)
  • number of common tags of ego and another user B: g(t)
– (2) document-specific similarity
  • number of all tagged documents of ego: a(d)
  • number of all tagged documents of another user B: b(d)
  • number of common tagged documents of ego and another user B: g(d)
– calculation of similarity
  • tag-specific: Jaccard-Sneath: Sim(tag; Ego,B) = g(t) / [a(t) + b(t) – g(t)]
  • document-specific: Jaccard-Sneath: Sim(doc; Ego,B) = g(d) / [a(d) + b(d) – g(d)]
  • ranking of Bi by similarity to ego (say, top 10 tag-specific and top 10 document-specific users)
  • merging of both lists (exclusion of duplicates)
  • cluster analysis (k-nearest neighbours, single linkage, complete linkage, group average linkage)
– result presentation: social network of ego in the centre
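A minimal sketch of the similarity, ranking and merging steps, assuming each user profile is a set of tags or a set of documents. The function names and the sample profiles are hypothetical, and the cluster analysis step is left out.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard-Sneath coefficient: g / (a + b - g), with g = common items."""
    common = len(a & b)
    union = len(a) + len(b) - common
    return common / union if union else 0.0


def rank_similar_users(ego, profiles, top_n=10):
    """Rank all other users by similarity to ego and keep the top_n."""
    ranked = [
        (user, jaccard(profiles[ego], items))
        for user, items in profiles.items()
        if user != ego
    ]
    # Highest similarity first; ties are broken alphabetically.
    ranked.sort(key=lambda pair: (-pair[1], pair[0]))
    return ranked[:top_n]


def merge_rankings(by_tag, by_doc):
    """Merge both lists, excluding duplicates and keeping first occurrences."""
    merged, seen = [], set()
    for user, _ in by_tag + by_doc:
        if user not in seen:
            seen.add(user)
            merged.append(user)
    return merged


# Hypothetical profiles, invented for illustration only.
tag_profiles = {
    "ego": {"london", "travel", "uk"},
    "b1": {"london", "travel"},
    "b2": {"java"},
    "b3": {"uk", "england", "london", "tourism"},
}
doc_profiles = {
    "ego": {"d1", "d2"},
    "b1": {"d3"},
    "b2": {"d1", "d2"},
    "b3": {"d2", "d9"},
}

by_tag = rank_similar_users("ego", tag_profiles, top_n=2)
by_doc = rank_similar_users("ego", doc_profiles, top_n=2)
candidates = merge_rankings(by_tag, by_doc)
```

The merged candidate list would then be the input for whichever cluster analysis is chosen.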
8
More like me! Or: More like This User!
single linkage clustering (fictitious example)
[Figure: network of users; edge labels as Sim(tag) / Sim(doc): 0.21 / 0.25, 0.65 / 0.55, 0.33 / 0.29, 0.17 / 0.23, 0.08 / 0.11, 0.15 / 0.17, 0.45 / 0.36]
9
Narrow Folksonomies
• only one tagger (the content creator)
• no multiple tagging
• example: YouTube
[Screenshot label: Tags]
10
Extended Narrow Folksonomies
• more than one tagger
• no multiple tagging
• example: Flickr
Source: Vander Wal (2005)
[Screenshot labels: Tags; Add Tags Option]
11
Broad Folksonomies
• more than one tagger
• multiple tagging
• example: Delicious
Source: Vander Wal (2005)
[Screenshot label: Tags]
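The difference between the folksonomy types can be sketched in terms of what the system records per document. The sketch assumes that "multiple tagging" means the same tag may be assigned to one document by several users, so that only broad folksonomies yield tag frequencies. The function name and the sample data are hypothetical.

```python
from collections import Counter


def tag_counts(assignments, multiple_tagging):
    """Count tags for one document from (user, tag) pairs.

    With multiple tagging (broad folksonomy) every distinct user counts.
    Without it (narrow and extended narrow folksonomy) each tag is
    recorded only once, no matter how many users would have chosen it.
    """
    if multiple_tagging:
        return Counter(tag for _, tag in set(assignments))
    return Counter({tag: 1 for _, tag in assignments})


# Hypothetical assignments for a single document.
assignments = [("anna", "london"), ("ben", "london"), ("ben", "travel")]

broad = tag_counts(assignments, multiple_tagging=True)
narrow = tag_counts(assignments, multiple_tagging=False)
```

Only the broad variant produces a frequency distribution of tags per document.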
12
Folksonomies make use of Collective Intelligence
Collective Intelligence
• “Wisdom of the Crowds” (Surowiecki)
• “Hive Minds” (Kroski) – “Vox populi” (Galton) – “Crowdsourcing”
• no discussions, diversity of opinions, decentralisation
• users tag a document independently from each other
• statistical aggregation of data
Collaborative Intelligence
• discussions and consensus
• prototype service: Wikipedia (but: 90-9-1 rule)
“Madness of the Crowds”
• e.g., soccer fans – hooligans
• no diversity of opinion – no independence – no decentralisation – no (statistical) aggregation
13
Power Tags
• Power Law Distribution
• Inverse-logistic Distribution
[Figures: both distributions, each with its Power Tags marked]
14
Power Law Tag Distribution
Tags for www.visitlondon.com
Source: http://del.icio.us
[Figure: tag distribution chart. x-axis: Tags (London, Travel, UK, England, Tourism, Guide, Culture, Information, Entertainment, Holiday, Londres, Londra); y-axis: Users (0 to 70). Annotations: 80/20-Rule, Power Tags, Long Tail]
$f(x) = C / x^{a}$
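A short sketch of the power law f(x) = C / x^a, where x is the rank of a tag. The parameter values C = 60 and a = 1 are assumptions chosen for illustration; they are not fitted to the Delicious data shown on the slide.

```python
def power_law(rank: int, c: float, a: float) -> float:
    """Expected frequency of the tag at a given rank: C / rank^a."""
    return c / rank ** a


# Assumed parameters, for illustration only.
frequencies = [power_law(rank, c=60.0, a=1.0) for rank in range(1, 11)]

# Share of all tag assignments covered by the top 20 % of tags (2 of 10).
top_share = sum(frequencies[:2]) / sum(frequencies)
```

With these assumed parameters the two highest-ranked tags cover about half of all assignments. The exact share depends on C, a and the number of tags, so it need not match the 80/20 rule.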
15
Inverse-logistic Tag Distribution
Tags for www.asis.org
Source: http://del.icio.us
[Figure: tag distribution chart. x-axis: Tags (Associations, Library, Information, Informationscience, IA, Technology, Professional, Research, Usability, Science, Libraries, Web, Informationarchitecture, IT, Organizations, Architecture, Organzation, Computers, Conference, Information_architecture, Information_science, Society); y-axis: Users (0 to 35). Annotations: Long Trunk, Long Tail, Power Tags]
$f(x) = e^{-C'(x-1)^{b}}$
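A short sketch of the inverse-logistic function f(x) = e^(-C'(x-1)^b), where x is the rank of a tag. The parameter values C' = 0.1 and b = 3 are assumptions for illustration; they are not fitted to the data on the slide.

```python
import math


def inverse_logistic(rank: int, c_prime: float, b: float) -> float:
    """Relative frequency at a given rank: exp(-C' * (rank - 1)^b)."""
    return math.exp(-c_prime * (rank - 1) ** b)


# Assumed parameters, for illustration only.
values = [inverse_logistic(rank, c_prime=0.1, b=3.0) for rank in range(1, 7)]
```

The function equals 1 at rank 1, so the values are relative to the most frequent tag. With these parameters the value at rank 2 is still above 0.9 and then falls quickly, which gives the "long trunk" followed by the long tail.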
16
Use of Power Tags
• Power Tags as factor in relevance ranking → documents tagged with Power Tags appear higher in ranking
• Power Tags as candidate tags for Tag Gardening → which (semantic) relation do they have with co-occurring tags?
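The Tag Gardening question can be approached by counting which tags co-occur with a Power Tag. The sketch below assumes that Power Tags are simply the n most frequent tags, since the slides do not state a cut-off. The function names and the sample data are hypothetical.

```python
from collections import Counter


def power_tags(tag_frequency, n):
    """Assumed definition: the n most frequent tags (ties alphabetical)."""
    ordered = sorted(tag_frequency.items(), key=lambda kv: (-kv[1], kv[0]))
    return [tag for tag, _ in ordered[:n]]


def cooccurrences(tag_sets, focus):
    """Count how often each other tag appears together with the focus tag."""
    counts = Counter()
    for tags in tag_sets:
        if focus in tags:
            counts.update(tag for tag in tags if tag != focus)
    return counts


# Hypothetical tag sets, one per tagged document.
tag_sets = [
    {"london", "travel", "uk"},
    {"london", "uk"},
    {"london", "londres"},
    {"java"},
]

tag_frequency = Counter(tag for tags in tag_sets for tag in tags)
top = power_tags(tag_frequency, n=1)
neighbours = cooccurrences(tag_sets, top[0])
```

The resulting pairs (here, for example, london with uk or londres) are candidates for inspection. Deciding whether a pair is a translation, a hierarchical relation or something else remains an intellectual step.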
17
Benefits of Indexing with Folksonomies
• authentic user language – solution of the “vocabulary problem”
• up-to-dateness
• multiple interpretations – many perspectives – bridging the semantic gap
• increased access to information resources
• follow “desire lines” of users
• cheap indexing method – shared indexing
• the more taggers, the better the system becomes – network effects
• capable of indexing mass information on the Web
• resources for development of knowledge organization systems
• mass quality “control”
• searching – browsing – serendipity
• neologisms
• identify communities and “small worlds”
• collaborative recommender system
• make people sensitive to information indexing
18
Disadvantages of Indexing with Folksonomies
• absence of controlled vocabulary
• different basic levels (in the sense of Eleanor Rosch)
• different interests – loss of context information
• language merging
• hidden paradigmatic relations
• merging of formal (bibliographical) and aboutness tags
• no specific fields
• evaluative tags (“stupid”)
• spam-tags
• syncategoremata (user-specific tags, “me”)
• performative tags (“to do”, “to read”)
• other misleading keywords
→ solution: Tag Gardening with methods of Information Linguistics, user collaboration in giving meaning to tags and combination with existing knowledge organization systems
19
Goal of Tag Gardening: Emergent Semantics
Source: Peters, I., & Weller, K. (2008). Tag Gardening for Folksonomy Enrichment and Maintenance. Webology, 5(3), Article 58, from http://www.webology.ir/2008/v5n3/a58.html.
20
Maintenance of KOS and Folksonomy
Folksonomy KOS
Tag Gardening
new terms – new relations
Source: Christiaens, S. (2006). Metadata Mechanism: From Ontology to Folksonomy…and Back. Lecture Notes in Computer Science, 4277, 199–207.
21
Feedback Loop in Practice:
Tagging of OPACs
2 possibilities:
• 1) tagging of resources within the library’s website
• 2) tagging of resources outside the library’s firewall
22
Tagging of OPACs: Within Library’s Website: PennTags
http://tags.library.upenn.edu/
23
Tagging of OPACs: Within Library’s Website: Ann Arbor District Library
http://www.aadl.org/catalog
24
Tagging of OPACs: Within Library’s Website: University Library Hildesheim
http://www.uni-hildesheim.de/mybib/all_tags
25
Tagging of OPACs: Within Library’s Website
• advantages:
– user behaviour can be directly observed and exploited for the library’s own applications
– the knowledge organization system (KOS) in use can profit from user behaviour and user language
– users will be “attracted” to the library
– library will appear “trendy”
26
Tagging of OPACs: Within Library’s Website
• disadvantages:
– development and implementation (costs and manpower) of the tagging service have to be borne by the library
– if only users may tag: librarians may lose their work motivation or may have a feeling of uselessness
– “lock-in” effect for users → no “fresh” ideas
27
Tagging of Resources Outside the Library’s Firewall: LibraryThing
http://www.librarything.com/search
28
Tagging of Resources Outside the Library’s Firewall: BibSonomy
http://www.bibsonomy.org/
29
Tagging of Resources Outside the Library’s Firewall
• advantages:
– development and implementation (costs and manpower) of the tagging service do not have to be borne by the library
– the library may profit from the know-how of the provider of the tagging system
– users may profit from tagging activities of hundreds of other users → no lock-in
– library appears “trendy”
30
Tagging of Resources Outside the Library’s Firewall
• disadvantages:
– user behaviour cannot be observed or exploited
– your users support another tagging service
– the KOS in use cannot profit from user behaviour
31
Excursus: Sentiment Tags
• negative tags: “awful” – “foolish”, …
• positive tags: “amazing” – “useful”, …
• applicable for sentiment analysis of documents
Source: Yanbe, Y., Jatowt, A., Nakamura, S., & Tanaka, K. (2007). Can Social Bookmarking Enhance Search in the Web? In Proceedings of the 7th ACM/IEEE Joint Conference on Digital Libraries, Vancouver, Canada (pp. 107–116).
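A very simple sketch of how sentiment tags could feed a document-level sentiment score. The two word lists contain only the example tags from this slide; the scoring rule (positive minus negative tag count) is an assumption and is not the method of the cited paper.

```python
# Example tags from the slide; a real lexicon would be much larger.
POSITIVE_TAGS = {"amazing", "useful"}
NEGATIVE_TAGS = {"awful", "foolish"}


def sentiment_score(tags):
    """Assumed rule: number of positive tags minus number of negative tags."""
    normalised = [tag.lower() for tag in tags]
    positive = sum(tag in POSITIVE_TAGS for tag in normalised)
    negative = sum(tag in NEGATIVE_TAGS for tag in normalised)
    return positive - negative


# Hypothetical tags assigned to one document by several users.
score = sentiment_score(["Amazing", "useful", "awful", "python"])
```

Tags outside both lists, such as topical tags, do not affect the score.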
32
Summary
• knowing how folksonomies work is important for their
adequate application in both
– knowledge representation and
– information retrieval
• knowing why folksonomies work is a secret ☺
33
Knowledge Representation and Information Retrieval
• two sides of the same coin
• Immanuel Kant: Thoughts without content are empty, intuitions without concepts are blind...
Knowledge Representation without Information Retrieval is empty.
Information Retrieval without Knowledge Representation is blind.
Feedback Loop
34
Folksonomies and Knowledge Organization Systems
• two sides of the same coin
• no rivals – work best in combination!
Folksonomies: flexible, up-to-date, user-centric
Knowledge Organization Systems: precise, rigid, complete
Feedback Loop
35
Best regards from Düsseldorf.
Contact: isabella.peters@uni-duesseldorf.de
Published in 2009 by Saur, de Gruyter