Social Web 2.0 Class Week 8: Social Metadata, Ratings, Social Tagging

Social Web 2.0: Implications of Social Technologies for Digital Media

Shelly Farnham, Ph.D.

Com 597 Winter 2007

Week 8: Social Metadata, from Rater Systems to Folksonomies

Social Navigation, Social Tagging

Class exercise

You as a community of practice: students and professionals in digital media

Tag yourself:
  The five most significant people you interact with, related to your areas of interest in digital media
  The five most significant organizations or events (e.g., where you work, events you go to, professional orgs you are a part of, volunteer orgs, projects you are a part of)
  The five tag words that best express your areas of interest
Note: these will be read aloud

Problem Statement

Attention economy
  A wealth of information creates a poverty of attention
  Need to allocate attention efficiently
Copious amounts of online content
  Much of it is user-generated content (UGC)
Social metadata layer
  Authorship (who said what)
  Activity/history/transaction data (who did what)
  Relationships, network/groups data (who knows whom)
  Semantic tagging (who said what it is about)
How can we use social metadata around content to help users spend their limited attention wisely?

[Diagram labels]
  Content: blogs, home pages, digital libraries, news, photos, music
  Social metadata: authorship, user activity, tags, groups/networks, ratings, co-presence
  Filters: collaborative filtering, prominence, network/group affinity (filtering by preference similarity, filtering by social proximity, filtering by highest rank)
  UI: search, hyperlinked browsing, readers, meme maps, tag clouds
  User

Challenges

Keeping up with the proliferation of content
  Very dynamic: constantly updating, changing
Integrating data across sites
Extracting semantic meaning for digital objects
Holy grail: the Semantic Web
  Common formats for integration and combination of data drawn from diverse sources
  Language for how data relates to real-world objects
  Actionable information, interoperability based on meaning
  RDF (Resource Description Framework), XML

Social Navigation Assumption

Where people spend their time is a good approximation of value
User behavior guides us to interesting/relevant information
  Google: what do people link to?
  BlueDot: what do people bookmark?

Automated Recommendation Systems

Collaborative filtering
  Provide some information about your preferences
  Get recommendations based on what else people who shared your preferences also liked
  Amazon.com
Proximity/clustering analyses
  Objects that frequently occur near each other are assumed to be related
  Kartoo.com
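A minimal sketch of the collaborative filtering idea, assuming preferences are stored as a set of liked items per user. The data layout, the `recommend` name, and the overlap-count scoring are illustrative assumptions, not how Amazon.com or any other real system works.

```python
from collections import defaultdict

def recommend(user, likes):
    """Rank items the user has not liked yet by votes from similar users.

    likes: dict mapping user -> set of liked items (hypothetical layout).
    Each other user votes for their items with a weight equal to the
    number of preferences they share with `user`.
    """
    mine = likes[user]
    scores = defaultdict(int)
    for other, theirs in likes.items():
        if other == user:
            continue
        overlap = len(mine & theirs)  # shared preferences
        if overlap == 0:
            continue  # no shared taste, so no vote
        for item in theirs - mine:
            scores[item] += overlap
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda item: (-scores[item], item))
```

Real systems normalize for popularity and user activity; raw overlap counts are only the simplest starting point.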

Social Tagging

Add tags to objects to bookmark them
  Individual motivation: organizes your list of favorites
  Aggregated collective knowledge: importance emerges
    Can see the most popular tags and the most popular items with the same tags
A.k.a.
  User-generated metadata
  Collaborative tagging
  Ethnoclassification
  Folksonomies
    No hierarchy, just automatically generated related tags
    Categorization vs. classification (inclusive, not exclusive)

Delicious: Where Social Tagging Started

Flickr: Tag cloud

Alphabetic order
Font size: prominence
Related
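The tag cloud layout described on this slide (alphabetical order, font size tracking prominence) could be sketched as below. The linear scaling and the point sizes are assumptions for illustration; Flickr's actual scaling is not given in the slides.

```python
def tag_cloud(tag_counts, min_pt=10, max_pt=32):
    """Return (tag, font size) pairs in alphabetical order.

    Font size grows linearly with the tag's count, from min_pt for the
    least used tag to max_pt for the most used one (assumed scaling).
    """
    if not tag_counts:
        return []
    lo, hi = min(tag_counts.values()), max(tag_counts.values())
    span = (hi - lo) or 1  # avoid dividing by zero when all counts match
    cloud = []
    for tag in sorted(tag_counts):
        size = min_pt + (tag_counts[tag] - lo) * (max_pt - min_pt) / span
        cloud.append((tag, round(size)))
    return cloud
```

Tag counts tend to be heavily skewed, so a log scale is a common alternative to the linear one used here.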

Flickr: Geo tagging

Meme Maps

BlueDot: Tagging + Social network

The Power of Social Tagging

Enables people to organize and access info that is high in relevance
Can enable social connectivity
Low cost, shared workload, increases scalability
Low barrier to entry, no expertise required to participate
User-defined diction and terminology in any knowledge domain; enables non-professionals to participate in the system
Responsive to dynamic changes in terminology, new terminology, and changes and innovation in resources; emergent taxonomy through “desire lines”

The Power of Social Tagging (cont’d)

Accommodates relational info structures (as opposed to hierarchical)
Can use frequency of terms across people to approximate the prominence of an idea or of a tagged item (if lots of people used a tag, it must be an important tag; if lots of people tagged an object, it must be an important/interesting object)
Enables discovery of new terminology/resources through serendipitous browsing

http://www.shirky.com/writings/ontology_overrated.html

Problems with Social Tagging

Polysemy: a single word has multiple meanings
Synonymy: different words have the same meaning
“Basic level”: continuum of specificity
Individual differences
Idiosyncratic tags, meta-noise, low quality of tags
Inexpert taggers
Keywords vs. keyword phrases: dealing with spaces and multiple words

Primary Objects of Social Tagging

Resource
Semantic tags
People

Social Uses of Tags

Conversation/communication
Group formation
Event collections
New terms, e.g. “sometaithurts”, “flicktion”
Some self-referential: “me”, “to read”

Kinds of Tags

Identifying what it is about (dogs)
Identifying what it is (book, article)
Identifying who owns it
Refining (2001)
Qualities (stupid)
Self-reference
Task organizing

Design Considerations

Tagging rights
  Self vs. free-for-all
Tagging support
  Blind (new terminology), viewable, suggestive (convergence)
Aggregation
  Bag vs. set (no repetition)
Tagger: author, user, expert
  With users, get collective wisdom/evaluation
Immediate feedback: see associated items as soon as you tag
Type of object/resource
Resource connectivity (linked, grouped)
Social connectivity (linked, grouped)

User incentives

Organizational or social
  Future retrieval
  Contribution and sharing
  Attract attention
  Self-presentation
  Opinion expression

Developing algorithms

Start simple, become complicated
  Expect a lot of tweaking to get to what seems to fit
  Different data, different models more appropriate
  Use a “known” information space to evaluate
Keep data in its rawest form (for now)
  User, tag, tagged resource, timestamp, tagging context? Metatag? Tagging type?
As far as reasonably possible, develop algorithms that map onto an understanding of the information space
  Social networks, associative models of memory, etc.

Associative data structure

Authors, resources, and tags are all objects associated by particular instances of tagging/bookmarking; could include time of content creation as well

Weighting associations/similarity measures

Expect to weight associations differentially
  Author, resource: high weight
  Tagger, resource: lower weight
  Bookmarker, resource: medium weight
Expect to transform data to tweak the shape of distributions
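One possible shape for the raw tagging record and the differential weighting, as a sketch only. The `TagEvent` fields follow the list on the slide; the `role` field and the numeric weights in `ROLE_WEIGHT` are hypothetical, since the slides give only the ordering (author high, bookmarker medium, tagger lower).

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class TagEvent:
    """One raw tagging instance, kept unaggregated."""
    user: str
    tag: str
    resource: str
    timestamp: int  # e.g. Unix seconds
    role: str       # "author", "bookmarker", or "tagger" (assumed field)

# Illustrative numbers only: the slides specify the ordering, not values.
ROLE_WEIGHT = {"author": 1.0, "bookmarker": 0.6, "tagger": 0.3}

def association_weights(events):
    """Sum role weights per (user, resource) pair over the raw events."""
    weights = defaultdict(float)
    for event in events:
        weights[(event.user, event.resource)] += ROLE_WEIGHT[event.role]
    return dict(weights)
```

Keeping the events raw means the weights can be retuned later without losing information.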

Inferring Prominence

Frequency of occurrence (see Dubinko et al., as used on Flickr: “interestingness”)
Spectral analysis: social network analysis style, looking for hubs
  Applied to tags (Wu et al.)
SNA: betweenness centrality (p. 188)
  Items all connect to this central item
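The simplest of these signals, frequency of occurrence, can be sketched as follows. The `(user, tag, resource)` tuple layout and the choice to count distinct uses are assumptions; the spectral and betweenness approaches cited above are not shown.

```python
from collections import defaultdict

def prominence(events):
    """Frequency-based prominence for tags and resources.

    events: iterable of (user, tag, resource) tuples (assumed layout).
    A tag's score is the number of distinct (user, resource) uses;
    a resource's score is the number of distinct users who tagged it.
    Duplicate events are therefore counted once.
    """
    tag_uses = defaultdict(set)
    resource_users = defaultdict(set)
    for user, tag, resource in events:
        tag_uses[tag].add((user, resource))
        resource_users[resource].add(user)
    tag_rank = {tag: len(uses) for tag, uses in tag_uses.items()}
    resource_rank = {res: len(users) for res, users in resource_users.items()}
    return tag_rank, resource_rank
```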

Inferring Tag Relatedness

Frequency of co-occurrence
  Between tags:
    Co-occurrence in resources
    Co-occurrence in authors
  Between resources:
    Co-occurrence of tags
    Co-occurrence of authors of tags, bookmarks
  Between users:
    Relatedness of tags, frequency of co-occurrence in tags
Indirect, through analysis of emergent semantics (Wu, 2006), dimensions
Neural net semantic “spread”, via link weights
THESUS: distance in ontological tree (Halkidi et al.), p. 323; semantic proximity, using WordNet or a known corpus
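Co-occurrence between tags across resources, the first case listed above, might be counted like this. The `resource_tags` mapping is a hypothetical input format; the same function works for co-occurrence within authors if the keys are authors instead of resources.

```python
from collections import Counter
from itertools import combinations

def tag_cooccurrence(resource_tags):
    """Count, for each unordered tag pair, the resources carrying both tags.

    resource_tags: dict mapping resource id -> iterable of tags.
    Pairs are stored as alphabetically sorted tuples, so (a, b) and
    (b, a) share one counter entry.
    """
    pairs = Counter()
    for tags in resource_tags.values():
        for a, b in combinations(sorted(set(tags)), 2):
            pairs[(a, b)] += 1
    return pairs
```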

Similarity Measures

Sim(a, b) = Nab / sqrt(Na * Nb)
  (Nab = number of items where a and b co-occur; Na, Nb = number of items with a and with b)
Or asymmetric/bidirectional: Sim(a, b) = Nab / Na
Weight the count by the number of tags in the doc? Assumption: two tags are more highly related if they are the only two describing a document than if 8 tags describe it:
  Sim(dog, cat) in {dog, cat} = 1
  Sim(dog, cat) in {dog, cat, vet, mouse} = .5
(Weighted sums)/(sum of weights) for combining multiple similarity measures:
  (WeightA*SimA + WeightB*SimB + WeightC*SimC)/(WeightA + WeightB + WeightC)
Weight some keywords as more important (Halkidi et al.), e.g. depending on position (Golder shows that the 1st is usually more important)
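The formulas on this slide translate fairly directly into code. In the sketch below, `doc_weight` uses 2 / (number of tags), which is an assumption chosen because it reproduces the slide's dog/cat values (1 for two tags, .5 for four); the slides do not state the weighting function.

```python
from math import sqrt

def sim_symmetric(n_ab, n_a, n_b):
    """Sim(a, b) = Nab / sqrt(Na * Nb)."""
    return n_ab / sqrt(n_a * n_b)

def sim_asymmetric(n_ab, n_a):
    """Sim(a, b) = Nab / Na; generally differs from Sim(b, a)."""
    return n_ab / n_a

def doc_weight(num_tags):
    """Assumed per-document weight: 2 / (number of tags on the document)."""
    return 2 / num_tags

def combine(sims, weights):
    """(Weighted sum of similarity measures) / (sum of weights)."""
    return sum(w * s for s, w in zip(sims, weights)) / sum(weights)
```

For example, `combine([sim_by_resource, sim_by_author], [3, 1])` would let resource co-occurrence count three times as much as author co-occurrence; both argument names here are placeholders.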

Connectionist Models

Based on neural net models
  Set/net of nodes with activation levels and weights
  Activation on any node is the weighted sum of its inputs, including external input
Activation of nodes occurs upon co-occurrence, or query?
  Instance of Female and Cancer added together as tags
Spread activation across paths
  Female and Cancer to Breast
Update path weights and activation values
Iterate until stable (decay at each iteration)
Weights decay with time and lack of activation across links
In search, filter for most similar items using link weights multiplied out 2 degrees

McClelland and Rumelhart in PDP vol. 2
Read and Miller, p. 27, on the Dynamic Construction of Meaning
Gelgi, spreading activation
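A minimal spreading-activation sketch under simplifying assumptions: link weights are fixed (the weight-update and time-decay steps above are left out), each node's activation is its external input plus a decayed weighted sum of its neighbors' activations, and iteration stops once values stabilize. The function name, the `decay` factor, and the update rule are illustrative, not taken from the cited sources.

```python
def spread_activation(weights, seeds, decay=0.5, iterations=20, tol=1e-6):
    """Spread activation from seed nodes along weighted links.

    weights: dict node -> dict of neighbor -> link weight (directed).
    seeds:   dict node -> external input, e.g. the tags in a query.
    """
    nodes = set(weights) | set(seeds)
    for nbrs in weights.values():
        nodes |= set(nbrs)
    if not nodes:
        return {}
    act = {n: seeds.get(n, 0.0) for n in nodes}
    for _ in range(iterations):
        new = {}
        for n in nodes:
            # Weighted sum of activation arriving over incoming links.
            incoming = sum(act[src] * nbrs.get(n, 0.0)
                           for src, nbrs in weights.items())
            new[n] = seeds.get(n, 0.0) + decay * incoming
        delta = max(abs(new[n] - act[n]) for n in nodes)
        act = new
        if delta < tol:  # activations have stabilized
            break
    return act
```

With links female→breast (0.6), cancer→breast (0.8), and cancer→lung (0.4), seeding Female and Cancer leaves Breast more active than Lung, mirroring the slide's example.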

Automated tagging?

(Brooks & Montanez) Using standard keyword extraction methods (TF-IDF score, extracting the three words most frequent relative to their standard frequency in a corpus)
Argue that this is more effective than social tagging for developing similarity measures across documents
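A rough sketch of TF-IDF keyword extraction in the spirit described above, picking the three highest-scoring words as tags. The tokenizer, the smoothed IDF formula, and the function names are assumptions; this is not Brooks and Montanez's implementation.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())

def auto_tags(doc, corpus, k=3):
    """Return the k words in doc with the highest TF-IDF score.

    TF is the raw count in doc; IDF is log((1 + N) / (1 + df)), a
    smoothed variant chosen here so unseen words do not divide by zero.
    """
    n_docs = len(corpus)
    doc_freq = Counter()
    for other in corpus:
        doc_freq.update(set(tokenize(other)))  # count each doc once per word
    term_freq = Counter(tokenize(doc))
    scores = {
        term: count * math.log((1 + n_docs) / (1 + doc_freq[term]))
        for term, count in term_freq.items()
    }
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]
```

Words that appear in every document score zero, so common words like "the" drop out without a stop list.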

Inferring Quality of Tags

Frequency of occurrence
  Filter out “one-offs”
Author of tag
  Time in system
  Has contributed content
  Is part of community of interest (e.g., group)
“Seed” tags?
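Filtering out one-offs could look like the sketch below, assuming a one-off means a tag used by only a single person. That definition, the `min_users` threshold, and the `(user, tag, resource)` tuple layout are all assumptions.

```python
from collections import defaultdict

def filter_one_offs(events, min_users=2):
    """Return the set of tags applied by at least min_users distinct users.

    events: iterable of (user, tag, resource) tuples (assumed layout).
    """
    users_per_tag = defaultdict(set)
    for user, tag, _resource in events:
        users_per_tag[tag].add(user)
    return {tag for tag, users in users_per_tag.items()
            if len(users) >= min_users}
```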

Inferring Structures

Cluster analysis Agglomerative, average linkage Wu et al keywords at each hiearchy…

Wu 2006: Separable Mixture Model (some form of MDS?) emergent semantics Based on co-occurrence data (users, resources, and

tags) Say works reasonably up to 40 dimensions

Mapping to existing e.g. WordNet (Haldiki)
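Agglomerative clustering with average linkage can be sketched in a few lines of plain Python. The stopping `threshold` and the `sim` callback are assumptions for illustration; a library implementation would scale far better than this quadratic search.

```python
from itertools import combinations

def average_linkage(items, sim, threshold):
    """Agglomerative clustering with average linkage.

    items: distinct hashable items, e.g. tags.
    sim:   function (a, b) -> similarity, assumed symmetric.
    Repeatedly merges the two clusters with the highest average pairwise
    similarity, stopping once the best pair falls below threshold.
    """
    clusters = [frozenset([item]) for item in items]

    def avg_sim(c1, c2):
        total = sum(sim(a, b) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        best = max(combinations(clusters, 2), key=lambda pair: avg_sim(*pair))
        if avg_sim(*best) < threshold:
            break
        # Replace the two merged clusters with their union.
        clusters = [c for c in clusters if c not in best] + [best[0] | best[1]]
    return clusters
```

Recording each merge instead of stopping at a threshold would give the full hierarchy.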

Inferring Structures

Machine tags, a.k.a. triple tags
  Flickr, mostly at API level
  Metatag for a tag
    Namespace: class or facet
    Predicate: name of the property
    Value
  Flora:tree=coniferous
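Splitting a machine tag into its three parts might look like the following. The regular expression is an assumption made for illustration and is not Flickr's exact validation rule.

```python
import re
from typing import NamedTuple, Optional

class MachineTag(NamedTuple):
    namespace: str  # class or facet
    predicate: str  # name of the property
    value: str

# Assumed syntax: namespace:predicate=value, where namespace and
# predicate start with a letter and contain letters, digits, or "_".
_PATTERN = re.compile(r"^([A-Za-z][A-Za-z0-9_]*):([A-Za-z][A-Za-z0-9_]*)=(.+)$")

def parse_machine_tag(raw: str) -> Optional[MachineTag]:
    """Return the parsed triple, or None if raw is an ordinary tag."""
    match = _PATTERN.match(raw)
    return MachineTag(*match.groups()) if match else None
```

Under this pattern, `parse_machine_tag("Flora:tree=coniferous")` yields namespace `Flora`, predicate `tree`, and value `coniferous`, while a plain tag such as `dog` returns `None`.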

Design Implications

Use tags as another way to link people
Social presence indicators increase tagging (Lee): profile, subscribing to others’ bookmarks, awareness of who else tagged like you, etc.
Use as a way to associate informal lay language and formal terminology
Importance of showing authors of tags

UI considerations

Add tag widget
Add tags at time of bookmarking (delicious)
Add tags at time of content creation (profile, stories, journal entries)
View my tagged items OR all tagged items
  Chronological order
  By tag

References

Aldenderfer, M., and Blashfield, R. (1984). Cluster Analysis. Sage Publications, Newbury Park.
Bechtel, W., Abrahamsen, A. (1991). Connectionism and the Mind: An introduction to parallel processing in networks. Blackwell, Oxford.
Brooks, C., Montanez, N. (2006). Improved annotation of the blogosphere via autotagging and hierarchical clustering. WWW 2006, Edinburgh, Scotland.
Dubinko, M., Kumar, R., Magnani, J. (2006). Visualizing tags over time. WWW 2006, Edinburgh, Scotland.
Gelgi, F., Vadrevu, S., Davulcu, H. (2005). Improving web data annotations with spreading activation. WISE 2005, New York, NY.
Golder, S. A., Huberman, B. A. (2006?). The structure of collaborative tagging systems. http://arxiv.org/ftp/cs/papers/0508/0508082.pdf
Halkidi, M., Nguyen, B., Varlamis, I., Vazirgiannis, M. (2003). THESUS: Organizing web document collections based on link semantics. VLDB Journal (2003) 12: 320-332.
Kim, J., Candan, K. (2006). CP/CV: Concept similarity mining without frequency information from domain describing taxonomies. CIKM 2006.
Lee, Kathy. (2006). What goes around comes around: an analysis of del.icio.us as social space. CSCW 2006.
Marlow, C., Naaman, M., boyd, D., Davis, M. (2006). HT06, tagging paper, taxonomy, flickr, academic article, to read. HT 2006.
Mathes, A. Folksonomies – Cooperative classification and communication through metadata. http://www.adammathes.com/academic/computer-mediated-communication/folksonomies.html
McClelland and Rumelhart. Parallel Distributed Processing.
Read, S. J., and Miller, L. C., eds. (1998). Connectionist Models of Social Reasoning and Social Behavior. Lawrence Erlbaum, New Jersey.
Shirky, Clay. (??) Ontology is Overrated: Categories, Links, and Tags. http://www.shirky.com/writings/ontology_overrated.html
Wasserman, S., and Faust, K. (1994). Social Network Analysis: Methods and Applications. Cambridge University Press, Cambridge, UK.
Wu, X., Zhang, L., Yu, Y. (2006). Exploring social annotations for the semantic web. WWW 2006, Edinburgh, Scotland.
Wu, H., Zubair, M., Maly, K. (2006). Harvesting social knowledge from folksonomies. HT 2006, Odense, Denmark.
Xue, G., Zeng, H., Chen, Z., Ma, W., and Yu, Y. (2004). Similarity spreading: A unified framework for similarity calculation of interrelated objects. WWW 2004, New York, New York.
