Social Web 2.0: Implications of Social Technologies for Digital Media
Shelly Farnham, Ph.D.
Com 597 Winter 2007
Jan 27, 2015
Week 8
Social Metadata: from Rater Systems to Folksonomies
Social Navigation, Social Tagging
Class exercise
You as a community of practice: students/professionals in digital media
Tag yourself:
  Five most significant people you interact with, related to your areas of interest in digital media
  Five most significant organizations/events (e.g., where you work, events you go to, professional orgs you are a part of, volunteer orgs, projects you are a part of)
  Five tag words that best express your areas of interest
Note: will read aloud
Problem Statement
Attention economy
  Wealth of information creates poverty of attention
  Need to allocate attention efficiently
Copious amounts of online content
  Much of it user-generated content (UGC)
Social metadata layer
  Authorship (who said what)
  Activity/history/transaction data (who did what)
  Relationships, network/groups data (who knows whom)
  Semantic tagging (who said what it is about)
How can we use social metadata around content to help users spend their limited attention wisely?
[Diagram: Content, Filters, UI, and the social metadata layer]
Content: blogs, home pages, digital libraries, news, photos, music
Social metadata: authorship, user activity, tags, groups/networks, ratings, co-presence
Filters: collaborative filtering, prominence, network/group affinity (filtering by preference similarity, by social proximity, by highest rank)
UI: search, hyperlinked browsing, readers, meme maps, tag clouds
User
Challenges
Keeping up with proliferation of content
  Very dynamic: constantly updating, changing
Integrating data across sites
Extracting semantic meaning for digital objects
Holy grail: Semantic Web
  Common formats for integration and combination of data drawn from diverse sources
  Language for how data relates to real-world objects
  Actionable information, interoperability based on meaning
  RDF (Resource Description Framework), XML
Social Navigation
Assumption: where people spend their time is a good approximation of value
User behavior guides us to interesting/relevant information
  Google: what do people link to?
  BlueDot: what do people bookmark?
Automated Recommendation Systems
Collaborative filtering
  Provide some information about your preferences
  Get recommendations based on what else people who shared your preferences also liked
  Example: Amazon.com
Proximity/clustering analyses
  Objects that frequently occur near each other are assumed to be related
  Example: Kartoo.com
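A minimal sketch of the collaborative filtering idea, assuming preferences are stored as a set of liked items per user. The data and the function name `recommend` are hypothetical, and real systems such as Amazon's use far more elaborate models.

```python
from collections import Counter

# Hypothetical data: the items each user liked.
likes = {
    "ann": {"book_a", "book_b", "book_c"},
    "bob": {"book_a", "book_b"},
    "cat": {"book_b", "book_d"},
}

def recommend(user, likes, top_n=3):
    """Suggest items the user has not seen, scored by overlap with other users."""
    mine = likes[user]
    scores = Counter()
    for other, theirs in likes.items():
        if other == user:
            continue
        overlap = len(mine & theirs)  # number of shared preferences
        if overlap == 0:
            continue
        for item in theirs - mine:    # items the other user liked that are new to us
            scores[item] += overlap
    return [item for item, _ in scores.most_common(top_n)]

print(recommend("bob", likes))
```

Items liked by users who share more preferences with the target user rank higher; users with no overlap contribute nothing.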
Social Tagging
Add tags to objects to bookmark them
  Individual motivation: organizes your list of favorites
  Aggregated collective knowledge: importance emerges
    Can see most popular tags, most popular items with the same tags
A.k.a.
  User-generated metadata
  Collaborative tagging
  Ethnoclassification
  Folksonomies
No hierarchy, just automatically generated related tags
Categorization vs. classification (inclusive, not exclusive)
Delicious: Where Social Tagging Started
Flickr
Tag cloud
  Alphabetic order
  Font size: prominence
  Related
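A tag cloud of this kind can be sketched as follows. This is only an illustration: the log scaling, the point sizes, and the function name `tag_cloud` are assumptions, not Flickr's actual method, and counts are assumed to be positive.

```python
import math

def tag_cloud(tag_counts, min_pt=10, max_pt=32):
    """Return (tag, font size) pairs in alphabetical order.

    Font size grows with the log of the tag count, so a few very
    popular tags do not dwarf everything else.
    """
    if not tag_counts:
        return []
    logs = {tag: math.log(count) for tag, count in tag_counts.items()}
    lo, hi = min(logs.values()), max(logs.values())
    span = hi - lo
    cloud = []
    for tag in sorted(tag_counts):  # alphabetic order
        frac = (logs[tag] - lo) / span if span else 1.0
        cloud.append((tag, round(min_pt + frac * (max_pt - min_pt))))
    return cloud

for tag, size in tag_cloud({"zoo": 1, "art": 100, "cat": 10}):
    print(f"{tag}: {size}pt")
```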
Flickr
Geo tagging
Meme Maps
BlueDot: Tagging + Social network
The Power of Social Tagging
Enables people to organize and access info that is high in relevance
Can enable social connectivity
Low cost, shared workload, increases scalability
Low barrier to entry, no expertise required to participate
User-defined diction and terminology, in any knowledge domain, enables non-professionals to participate in the system
Responsive to dynamic changes in terminology, new terminology, changes and innovation in resources
Emergent taxonomy through “desire lines”
The Power of Social Tagging (cont’d)
Accommodates relational info structures (as opposed to hierarchical)
Can use frequency of terms across people to approximate prominence of an idea, or prominence of a tagged item (if lots of people tagged it, it must be an important tag or an important/interesting object)
Enables discovery of new terminology/resources through serendipitous browsing
http://www.shirky.com/writings/ontology_overrated.html
Problems with Social Tagging
Polysemy: a single word has multiple meanings
Synonymy: different words have the same meaning
“Basic level”: continuum of specificity
Individual differences
Idiosyncratic tags, meta-noise, low quality of tags
Inexpert taggers
Keywords vs. keyword phrases, dealing with spaces and multiple words
Primary Objects of Social Tagging
Resource
Semantic tags
People
Social Uses of Tags
Conversation/communication
Group formation
Event collections
New terms, e.g. “sometaithurts”, “flicktion”
Some self-referential: “me”, “to read”
Kinds of Tags
Identifying what it is about (dogs)
Identifying what it is (book, article)
Who owns it
Refining (2001)
Qualities (stupid)
Self-reference
Task organizing
Design Considerations
Tagging rights
  Self vs. free-for-all
Tagging support
  Blind (new terminology), viewable, suggestive (convergence)
Aggregation
  Bag vs. set (no repetition)
Tagger: author, user, expert
  With users, get collective wisdom/evaluation
Immediate feedback: see associated items as soon as you tag
Type of object/resource
Resource connectivity (linked, grouped)
Social connectivity (linked, grouped)
User incentives
Organizational or social
Future retrieval
Contribution and sharing
Attract attention
Self-presentation
Opinion expression
Developing algorithms
Start simple, become complicated
  Expect a lot of tweaking to get to what seems to fit
  Different data, different models more appropriate
  Use “known” information space to evaluate
Keep data in its rawest form (for now)
  User, tag, tag resource, timestamp, tagging context? Metatag? Tagging type?
As far as reasonably possible, develop algorithms that map onto an understanding of the information space
  Social networks, associative models of memory, etc.
Associative data structure
  Authors, resources, and tags are all objects that are associated by particular instances of tagging/bookmarking; could include time of content creation as well
Weighting associations/similarity measures
  Expect to differentially weight associations
    Author, resource: high weight
    Tagger, resource: lower weight
    Bookmarker, resource: medium weight
  Expect to transform data to tweak the shape of distributions
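One possible shape for the raw tagging record and the weighted associations is sketched below. The field names, the `role` field, and the numeric weights are assumptions for illustration; the slides only give the ordering (author high, bookmarker medium, tagger lower).

```python
from collections import defaultdict
from typing import NamedTuple

class TagEvent(NamedTuple):
    """One raw tagging instance, kept as close to the original data as possible."""
    user: str
    tag: str
    resource: str
    timestamp: int  # e.g. Unix seconds
    role: str       # "author", "bookmarker", or "tagger" (assumed field)

# Illustrative numbers only: author > bookmarker > tagger.
ROLE_WEIGHT = {"author": 1.0, "bookmarker": 0.6, "tagger": 0.3}

def build_associations(events):
    """Accumulate weighted user-resource and tag-resource links from raw events."""
    links = defaultdict(float)
    for e in events:
        w = ROLE_WEIGHT[e.role]
        links[("user", e.user, e.resource)] += w
        links[("tag", e.tag, e.resource)] += w
    return dict(links)

events = [
    TagEvent("ann", "dogs", "post1", 1170000000, "author"),
    TagEvent("bob", "dogs", "post1", 1170000500, "tagger"),
    TagEvent("cat", "pets", "post1", 1170000900, "bookmarker"),
]
links = build_associations(events)
print(links[("tag", "dogs", "post1")])
```

Keeping the raw events and deriving the weighted links from them means the weights can be re-tuned later without losing information.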
Inferring Prominence
Frequency of occurrence (see Dubinko et al., as used on Flickr, “interestingness”)
Spectral analysis: social network analysis style, looking for hubs
  Applied to tags (Wu et al.)
SNA: betweenness centrality, p. 188
  Items all connect to this central item
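The simplest of these, frequency of occurrence, can be sketched as counting the distinct users who applied each tag. This is an assumed baseline, not the “interestingness” measure of Dubinko et al.; taggings are assumed to be `(user, tag, resource)` tuples.

```python
from collections import defaultdict

def tag_prominence(taggings):
    """Rank tags by the number of distinct users who applied them."""
    users_by_tag = defaultdict(set)
    for user, tag, _resource in taggings:
        users_by_tag[tag].add(user)
    ranked = ((tag, len(users)) for tag, users in users_by_tag.items())
    # Most users first; ties broken alphabetically for a stable order.
    return sorted(ranked, key=lambda pair: (-pair[1], pair[0]))

taggings = [
    ("ann", "dogs", "r1"),
    ("bob", "dogs", "r2"),
    ("bob", "dogs", "r3"),  # same user again: counted once
    ("cat", "vet", "r1"),
]
print(tag_prominence(taggings))
```

Counting distinct users rather than raw occurrences keeps one prolific tagger from inflating a tag's prominence.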
Inferring Tag Relatedness
Frequency of co-occurrence
  Between tags:
    Co-occurrence in resources
    Co-occurrence in authors
  Between resources:
    Co-occurrence of tags
    Co-occurrence of authors of tags, bookmarks
  Between users:
    Relatedness of tags, frequency of co-occurrence in tags
Indirect, through analysis of emergent semantics (Wu, 2006), dimensions
Neural net semantic “spread”, via link weights
THESUS: distance in ontological tree (Halkidi), p. 323; semantic proximity, using WordNet or a known corpus
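Co-occurrence between tags in resources can be counted with a sketch like the one below, assuming taggings arrive as `(user, tag, resource)` tuples. The function name is hypothetical; the same pattern applies to co-occurrence in authors by grouping on the user instead of the resource.

```python
from collections import Counter, defaultdict
from itertools import combinations

def tag_cooccurrence(taggings):
    """Count, for each pair of tags, the resources on which both appear."""
    tags_by_resource = defaultdict(set)
    for _user, tag, resource in taggings:
        tags_by_resource[resource].add(tag)
    pair_counts = Counter()
    for tags in tags_by_resource.values():
        # Sort so each pair has one canonical (a, b) ordering.
        for a, b in combinations(sorted(tags), 2):
            pair_counts[(a, b)] += 1
    return pair_counts

taggings = [
    ("ann", "dog", "r1"), ("ann", "cat", "r1"),
    ("bob", "dog", "r2"), ("bob", "cat", "r2"), ("bob", "vet", "r2"),
]
print(tag_cooccurrence(taggings))
```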
Similarity Measures
Sim(a, b) = Nab / sqrt(Na * Nb)
Or asymmetric/bidirectional: Sim(a, b) = Nab / Na
Weight count by # of tags in doc? Assumption: if only two tags describe a document, they are more highly related than if 8 tags describe it:
  Sim(dog, cat) in {dog, cat} = 1
  Sim(dog, cat) in {dog, cat, vet, mouse} = .5
(Weighted sums)/(sum of weights) for adding multiple similarity measures
  (WeightA*SimA + WeightB*SimB + WeightC*SimC)/(WeightA + WeightB + WeightC)
Weight some keywords as more important (Halkidi), e.g. depending on position (Golder shows that the 1st is usually more important)
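The measures above can be written out as a small sketch. Here `n_ab` is the number of co-occurrences of tags a and b, and `n_a`, `n_b` are their individual counts. The function names are hypothetical, and `pair_weight` (2 divided by the number of tags) is only one reading that reproduces the two dog/cat examples.

```python
import math

def symmetric_sim(n_ab, n_a, n_b):
    """Sim(a, b) = Nab / sqrt(Na * Nb)."""
    return n_ab / math.sqrt(n_a * n_b) if n_a and n_b else 0.0

def asymmetric_sim(n_ab, n_a):
    """Sim(a, b) = Nab / Na: the share of a's occurrences that include b."""
    return n_ab / n_a if n_a else 0.0

def pair_weight(doc_tags):
    """Assumed weighting: a co-occurrence counts for less in a heavily tagged doc."""
    return 2 / len(doc_tags)

def combine(sims, weights):
    """(Weighted sum of similarities) / (sum of weights)."""
    return sum(w * s for s, w in zip(sims, weights)) / sum(weights)

print(symmetric_sim(2, 4, 4))                        # tags co-occur in 2 of 4 docs each
print(pair_weight({"dog", "cat"}))                   # two tags on the document
print(pair_weight({"dog", "cat", "vet", "mouse"}))   # four tags on the document
print(combine([1.0, 0.5], [3, 1]))
```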
Connectionist Models
Based on neural net models
  Set/net of nodes with activation levels and weights
  Activation on any node is the weighted sum of its inputs, including external input
Activation of nodes occurs upon co-occurrence, or query?
  Instance of Female and Cancer added together as tags
  Spread activation across paths
    Female and Cancer to Breast
  Update path weights and activation values
  Iterate until stable (decay at each iteration)
Weights decay with time and lack of activation across links
In search, filter for most similar items using link weights multiplied out 2 degrees
McClelland and Rumelhart in PDP vol. 2
Read and Miller p. 27 on the Dynamic Construction of Meaning
Gelgi, spreading activation
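A simplified sketch of spreading activation over a tag network follows. It is an illustration under stated assumptions, not the model from any of the cited sources: link weights are fixed rather than updated, activation spreads for a fixed number of steps rather than until it stabilizes, and the graph, weights, and decay value are made up.

```python
def spread_activation(weights, seeds, decay=0.5, steps=2):
    """Spread activation outward from seed nodes along weighted links.

    weights: {node: {neighbor: link weight}}
    seeds:   {node: initial activation}
    Each hop multiplies activation by the link weight and by `decay`.
    """
    activation = dict(seeds)
    frontier = dict(seeds)
    for _ in range(steps):
        next_frontier = {}
        for node, act in frontier.items():
            for neighbor, w in weights.get(node, {}).items():
                next_frontier[neighbor] = next_frontier.get(neighbor, 0.0) + act * w * decay
        for node, act in next_frontier.items():
            activation[node] = activation.get(node, 0.0) + act
        frontier = next_frontier
    return activation

# Hypothetical link weights learned from tag co-occurrence.
weights = {
    "female": {"breast": 0.8, "gender": 0.5},
    "cancer": {"breast": 0.9, "oncology": 0.7},
}
result = spread_activation(weights, {"female": 1.0, "cancer": 1.0}, steps=1)
print(sorted(result.items(), key=lambda kv: -kv[1]))
```

Because "breast" receives activation from both seed tags, it ends up more active than nodes linked to only one of them. Setting `steps=2` follows links out two degrees.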
Automated tagging?
(Brooks & Montanez) Using standard keyword extraction methods (TFIDF score, extracting the three words most frequent relative to standard frequency in a corpus)
They argue this is more effective than social tagging for developing similarity measures across documents
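A minimal TF-IDF sketch of this kind of keyword extraction, assuming documents are already tokenized into lowercase words. The function name and the plain log(N/df) weighting are assumptions; Brooks and Montanez's exact formula and preprocessing may differ.

```python
import math
from collections import Counter

def top_keywords(doc_tokens, corpus, k=3):
    """Return the k tokens in doc_tokens with the highest TF-IDF score."""
    n_docs = len(corpus)
    doc_freq = Counter()
    for tokens in corpus:
        doc_freq.update(set(tokens))  # count each term once per document
    term_freq = Counter(doc_tokens)
    scores = {}
    for term, count in term_freq.items():
        tf = count / len(doc_tokens)
        idf = math.log(n_docs / max(1, doc_freq[term]))
        scores[term] = tf * idf
    # Highest score first; ties broken alphabetically.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]

corpus = [
    "tagging photos on flickr".split(),
    "tagging bookmarks on delicious".split(),
    "folksonomy folksonomy tagging emerges on delicious".split(),
]
print(top_keywords(corpus[2], corpus))
```

Words that appear in every document, such as "tagging" and "on" here, score zero and drop out, leaving the terms that distinguish the document.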
Inferring Quality of Tags
Frequency of occurrence
  Filter out “one-offs”
Author of tag
  Time in system
  Has contributed content
  Is part of community of interest (e.g., group)
“Seed” tags?
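Filtering out one-offs can be sketched as below, assuming `(user, tag, resource)` tuples and treating a one-off as a tag applied by fewer than `min_users` distinct users. Both the threshold and that definition are assumptions.

```python
from collections import defaultdict

def filter_one_offs(taggings, min_users=2):
    """Drop taggings whose tag was applied by fewer than min_users distinct users."""
    users_by_tag = defaultdict(set)
    for user, tag, _resource in taggings:
        users_by_tag[tag].add(user)
    keep = {tag for tag, users in users_by_tag.items() if len(users) >= min_users}
    return [t for t in taggings if t[1] in keep]

taggings = [
    ("ann", "dogs", "r1"),
    ("bob", "dogs", "r2"),
    ("bob", "xyzzy123", "r2"),  # idiosyncratic tag used by one person
]
print(filter_one_offs(taggings))
```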
Inferring Structures
Cluster analysis
  Agglomerative, average linkage
  Wu et al.: keywords at each hierarchy…
Wu 2006: Separable Mixture Model (some form of MDS?), emergent semantics
  Based on co-occurrence data (users, resources, and tags)
  Say it works reasonably up to 40 dimensions
Mapping to existing structures, e.g. WordNet (Halkidi)
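Agglomerative clustering with average linkage can be sketched in plain Python as follows. The similarity values, the stopping threshold, and the function names are hypothetical; this is the generic textbook procedure, not the specific method of Wu et al.

```python
def average_linkage(items, sim, threshold):
    """Repeatedly merge the two most similar clusters.

    Cluster similarity is the average pairwise similarity of their members.
    Stops when the best available merge falls below `threshold`.
    """
    clusters = [[x] for x in items]

    def cluster_sim(c1, c2):
        return sum(sim(a, b) for a in c1 for b in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        best, best_pair = None, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                s = cluster_sim(clusters[i], clusters[j])
                if best is None or s > best:
                    best, best_pair = s, (i, j)
        if best < threshold:
            break
        i, j = best_pair
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Hypothetical tag similarities, e.g. from co-occurrence counts.
pair_sim = {
    frozenset(("dog", "cat")): 0.8,
    frozenset(("dog", "vet")): 0.6,
    frozenset(("cat", "vet")): 0.5,
    frozenset(("java", "python")): 0.9,
}

def sim(a, b):
    return pair_sim.get(frozenset((a, b)), 0.0)

clusters = average_linkage(["dog", "cat", "vet", "java", "python"], sim, threshold=0.4)
print(clusters)
```

Recording the order of merges instead of stopping at a threshold would yield the full hierarchy.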
Inferring Structures (cont’d)
Machine tags, a.k.a. triple tags
  Flickr, mostly at API level
  Metatag for a tag
    Namespace: class, or facet
    Predicate: name of the property
    Value
  flora:tree=coniferous
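A machine tag of the form namespace:predicate=value can be split into its three parts with a sketch like this one. The regular expression is an assumed, simplified grammar for illustration, not Flickr's official specification.

```python
import re
from typing import NamedTuple, Optional

class MachineTag(NamedTuple):
    namespace: str  # class, or facet
    predicate: str  # name of the property
    value: str

# Assumed grammar: identifier ":" identifier "=" anything non-empty.
_PATTERN = re.compile(r"^([A-Za-z][A-Za-z0-9_]*):([A-Za-z][A-Za-z0-9_]*)=(.+)$")

def parse_machine_tag(text: str) -> Optional[MachineTag]:
    """Return the parsed triple, or None if the tag is an ordinary tag."""
    match = _PATTERN.match(text)
    return MachineTag(*match.groups()) if match else None

print(parse_machine_tag("flora:tree=coniferous"))
print(parse_machine_tag("dogs"))
```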
Design Implications
Use tags as another way to link people
Social presence indicators increase tagging (Lee): profile, subscribing to others’ bookmarks, awareness of who else tagged like you, etc.
Use as a way to associate informal lay language and formal terminology
Importance of showing authors of tags
UI considerations
Add tag widget
Add tags at time of bookmarking (delicious)
Add tags at time of content creation (profile, stories, journal entries)
View my tagged items OR all tagged items
  Chronological order
  By tag
References
Aldenderfer, M., and Blashfield, R. (1984). Cluster Analysis. Sage Publications, Newbury Park.
Bechtel, W., Abrahamsen, A. (1991). Connectionism and the Mind: An introduction to parallel processing in networks. Blackwell, Oxford.
Brooks, C., Montanez, N. (2006). Improved annotation of the blogosphere via autotagging and hierarchical clustering. WWW 2006, Edinburgh, Scotland.
Dubinko, M., Kumar, R., Magnani, J. (2006). Visualizing tags over time. WWW 2006, Edinburgh, Scotland.
Gelgi, F., Vadrevu, S., Davulcu, H. () Improving web data annotations with spreading activation. WISE 2005, New York, NY.
Golder, S. A., Huberman, B. A. (2006?). The structure of collaborative tagging systems. http://arxiv.org/ftp/cs/papers/0508/0508082.pdf
Halkidi, M., Nguyen, B., Varlamis, I., Vazirgiannis, M. (2003). THESUS: Organizing web document collections based on link semantics. VLDB Journal (2003) 12: 320-332.
Kim, J., Candan, K. (2006). CP/CV Concept Similarity Mining without frequency information from domain describing taxonomies. CIKM 2006.
Lee, Kathy. (2006). What goes around comes around: an analysis of del.icio.us as social space. CSCW 2006.
Marlow, C., Naaman, M., boyd, D., Davis, M. (2006). HT06, tagging paper, taxonomy, flickr, academic article, to read. HT 2006.
Mathes, A. Folksonomies: Cooperative classification and communication through metadata. http://www.adammathes.com/academic/computer-mediated-communication/folksonomies.html
McClelland and Rumelhart. Parallel Distributed Processing.
Read, S. J., and Miller, L. C., eds. (1998). Connectionist Models of Social Reasoning and Social Behavior. Lawrence Erlbaum, New Jersey.
Shirky, Clay. (??) Ontology is Overrated: Categories, Links, and Tags. http://www.shirky.com/writings/ontology_overrated.html
Wasserman, S., and Faust, K. (1994). Social Network Analysis: Methods and Applications. Cambridge University Press, Cambridge, UK.
Wu, X., Zhang, L., Yu, Y. (2006). Exploring social annotations for the semantic web. WWW 2006, Edinburgh, Scotland.
Wu, H., Zubair, M., Maly, K. (2006). Harvesting social knowledge from folksonomies. HT 2006, Odense, Denmark.
Xue, G., Zeng, H., Chen, Z., Ma, W., and Yu, Y. (2004). Similarity spreading: A unified framework for similarity calculation of interrelated objects. WWW 2004, New York, New York.