Nisha Ranga DYNAMICS OF CONVERSATION
OBJECTIVE
• INTRODUCTION
• PRELIMINARIES
• PROPERTIES OF CONVERSATIONS
• Size and Depth
• Degree
• Authorship
• MODELS
• BP-Model
• T-Model
• Mixture Model
INTRODUCTION
Analyze the structure of conversations :
• Usenet Groups
• Yahoo! Groups
Each dataset record consists of:
• ID of message
• ID of parent message
• Author of the message
• Timestamp
INTRODUCTION
• How do online conversations build?
• What similarities and differences can be observed between different groups?
• Is there any model that human communication follows?
PRELIMINARIES
• Propose a simple mathematical model for the structure of conversations
• Account for factors such as recency and author identity that may affect conversations.
• Compare the predictions of these models back to the empirical data for three datasets: Usenet groups, Yahoo! Groups, and Twitter
NOTATIONS
• Denote messages by letter u, v, w…
• Messages are assumed to have a thread structure
• A message with no children is a leaf message
• A message with no parent is the root message
• t(u) is a timestamp of message u
• The messages in a thread are created chronologically
• If ‘u’ is a parent of ‘v’, then t(u) <= t(v)
THREAD
Def: A root message, together with all of its descendants, forms a connected component called a thread
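The definition above can be sketched in code. This is a minimal illustration, assuming each record carries the four dataset fields (message ID, parent ID, author, timestamp); the `Message` class and `build_threads` function are hypothetical names, not part of the original study.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Message:
    # Hypothetical record layout mirroring the four dataset fields.
    msg_id: str
    parent_id: Optional[str]  # None marks a root message
    author: str
    timestamp: float


def build_threads(messages):
    """Group messages into threads, keyed by the ID of each root message."""
    by_id = {m.msg_id: m for m in messages}

    def root_of(m):
        # Follow parent links upward until a message with no parent is reached.
        while m.parent_id is not None:
            m = by_id[m.parent_id]
        return m.msg_id

    threads = defaultdict(list)
    # Messages within a thread are kept in chronological order.
    for m in sorted(messages, key=lambda m: m.timestamp):
        threads[root_of(m)].append(m)
    return dict(threads)


msgs = [
    Message("a", None, "alice", 1.0),
    Message("b", "a", "bob", 2.0),
    Message("c", "b", "alice", 3.0),
    Message("d", None, "carol", 1.5),
]
threads = build_threads(msgs)
```

Here messages a, b, and c form one thread rooted at a, and d is a single-message thread of its own.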
PROPERTIES OF CONVERSATION
• SIZE AND DEPTH OF THREAD
• Depth: length of the longest path from the root to a leaf in a thread
• Size is roughly quadratic in depth
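Size and depth can be computed directly from the parent links. The sketch below assumes a thread is given as a mapping from message ID to parent ID (`None` for the root) and measures depth in edges; both choices are illustrative assumptions, and `size_and_depth` is a hypothetical helper.

```python
from collections import defaultdict


def size_and_depth(parent):
    """Return (size, depth) of one thread.

    `parent` maps each message ID to its parent ID (None for the root).
    Depth is counted in edges on the longest root-to-leaf path (an assumption).
    """
    children = defaultdict(list)
    root = None
    for node, par in parent.items():
        if par is None:
            root = node
        else:
            children[par].append(node)

    depth = 0
    stack = [(root, 0)]  # iterative traversal avoids recursion limits
    while stack:
        node, level = stack.pop()
        depth = max(depth, level)
        stack.extend((child, level + 1) for child in children[node])
    return len(parent), depth


example = {"a": None, "b": "a", "c": "a", "d": "b", "e": "d"}
size, depth = size_and_depth(example)
```

For the example thread, the longest path is a, b, d, e, so the thread has 5 messages and depth 3.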
PROPERTIES OF CONVERSATION
• DEGREE OF A THREAD
• Degree distribution is close to a power law, i.e., $p(k) \propto k^{-\gamma}$ for some $\gamma > 2$
• Degree distribution is not independent of the level of a thread
• If root is at level 1, then degree distribution becomes ‘steeper’ with the level as having more children becomes less likely at higher levels
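To check how the degree distribution changes with level, one can tabulate degrees separately per level. The sketch below assumes "degree" means the number of direct replies (children) and that the root sits at level 1; `degree_counts_by_level` is a hypothetical helper.

```python
from collections import Counter, defaultdict


def degree_counts_by_level(parent):
    """Return {level: {degree: count}} for one thread.

    `parent` maps message ID -> parent ID (None for the root).
    Degree is taken to be the number of children; the root is at level 1.
    """
    children = defaultdict(list)
    root = None
    for node, par in parent.items():
        if par is None:
            root = node
        else:
            children[par].append(node)

    counts = defaultdict(Counter)
    stack = [(root, 1)]
    while stack:
        node, level = stack.pop()
        counts[level][len(children[node])] += 1
        stack.extend((child, level + 1) for child in children[node])
    return {level: dict(c) for level, c in counts.items()}


example = {"a": None, "b": "a", "c": "a", "d": "b", "e": "d"}
result = degree_counts_by_level(example)
```

Aggregating these per-level counts over many threads gives one empirical degree distribution per level, which can then be compared across levels.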
PROPERTIES OF CONVERSATION
• AUTHORSHIP
• There is a polynomial relationship between the size of a thread and the number of authors participating in the thread
• The author A(u) of a message u is the person who wrote it
• A single person can author multiple messages in a thread
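The two quantities being related can be extracted per thread as follows. This is a small sketch assuming a thread is available as a list of (message ID, author) pairs; the function name is hypothetical.

```python
def size_and_author_count(thread):
    """Return (number of messages, number of distinct authors) for one thread.

    `thread` is a list of (message_id, author) pairs.
    """
    authors = {author for _, author in thread}
    return len(thread), len(authors)


thread = [("a", "alice"), ("b", "bob"), ("c", "alice")]
size, n_authors = size_and_author_count(thread)
```

Collecting these pairs over all threads and plotting them on log-log axes is one way to inspect the relationship.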
BRANCHING PROCESS MODEL(BP-MODEL)
• Each thread starts with a root node
• At the $i$th level of the thread, each node independently generates a number of children drawn from the distribution $p$
• $p(k)$ is the probability that a node $u$ has $k$ children
• The process terminates when there are no more children
• Let $Z_i$ be the random variable denoting the number of nodes at the $i$th level; then
$Z = \sum_i Z_i$
where $Z$ denotes the size of the thread
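The branching process can be simulated level by level. The sketch below is illustrative only: the offspring distribution `p` is a made-up example (mean below 1, so threads die out), and `max_size` is a safety cap added here, not part of the model.

```python
import random


def simulate_bp_thread(p, rng, max_size=10_000):
    """Simulate one thread; return the level sizes [Z_1, Z_2, ...].

    p[k] is the probability that a node has k children.
    max_size is a safety cap on the thread size (an addition for this sketch).
    """
    ks = list(range(len(p)))
    level_sizes = [1]  # level 1 holds only the root
    total = 1
    while level_sizes[-1] > 0 and total < max_size:
        # Every node at the current level draws its number of children from p.
        next_level = sum(rng.choices(ks, weights=p, k=level_sizes[-1]))
        level_sizes.append(next_level)
        total += next_level
    if level_sizes[-1] == 0:
        level_sizes.pop()  # drop the empty level that ended the process
    return level_sizes


rng = random.Random(0)
p = [0.6, 0.25, 0.1, 0.05]  # hypothetical offspring distribution, mean 0.6
levels = simulate_bp_thread(p, rng)
size = sum(levels)  # thread size: the sum of the level sizes
```

Repeating the simulation many times yields empirical size and depth distributions that can be compared with the observed threads.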
DRAWBACK OF BP-MODEL
• The model is not generative, i.e., the degree distribution is stipulated
• This model cannot capture the depth distributions of threads that are observed in reality
• In the branching process model, the number of children at each node is determined by a single distribution
• The branching process model does not capture the order in which the messages are created, i.e., the timestamps associated with the messages are left out
• It does not capture the author of messages
T-MODEL
• A thread grows in discrete time steps
• At each step, either the thread stops, i.e., no more messages are added
• Or a new message is posted in response to an existing message $v$
• Current degree of $v$: $\deg_v$
• Recency of $v$: $r_v$
• The message $v$ is chosen with weight $h(\deg_v, r_v) = \alpha \deg_v + \tau^{r_v}$ for constants $\alpha \ge 0$ and $\tau \in (0,1)$
• Thus, both degree and recency play a role in generating different types of threads
• If degree plays the major role, the thread is bushy
• If recency plays the major role, the thread is skinny
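The growth process can be sketched as a short simulation. Several details here are assumptions made for illustration: the weight is taken to be alpha * deg_v + tau ** r_v, the parent is chosen with probability proportional to that weight, recency r_v is the number of steps since v was posted, and the thread stops at each step with a fixed hypothetical probability `stop_prob`.

```python
import random


def simulate_t_model(alpha, tau, stop_prob, rng, max_size=1000):
    """Simulate one thread; return a list where parent[i] is the parent of node i.

    Node 0 is the root (parent None). stop_prob and max_size are
    assumptions of this sketch, not parameters given in the slides.
    """
    parent = [None]
    degree = [0]
    while len(parent) < max_size:
        if rng.random() < stop_prob:
            break  # the thread stops: no more messages are added
        now = len(parent)
        # Weight of each existing message: alpha * degree + tau ** recency,
        # with recency measured as steps elapsed since the message was posted.
        weights = [alpha * degree[v] + tau ** (now - v) for v in range(now)]
        v = rng.choices(range(now), weights=weights, k=1)[0]
        parent.append(v)
        degree[v] += 1
        degree.append(0)
    return parent


def depth_of(parent):
    """Depth in edges of the longest root-to-leaf path."""
    depth = [0] * len(parent)
    for i in range(1, len(parent)):
        depth[i] = depth[parent[i]] + 1  # parents always precede children
    return max(depth)


# Degree-dominated versus recency-dominated settings (illustrative values).
bushy = simulate_t_model(10.0, 0.1, 0.0, random.Random(1), max_size=50)
skinny = simulate_t_model(0.0, 0.1, 0.0, random.Random(1), max_size=50)
```

Comparing `depth_of(bushy)` with `depth_of(skinny)` over many runs shows how the two parameters shape the thread.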
TI-MODEL
• This model extends the T-model to account for author identity
• Authors tend to respond to replies to their own earlier messages
• “Identity copying” effect
• New message v arrives with u=parent(v)
USENET
Preferential behavior: groups with the highest degree of preferential attachment ($\alpha$)
GROUP                                    $\alpha$
It.discussioni.leggende.metropolitane    10
It.politica.polo                         10
Rec.games.chess.politics                 3
Bln.politik.rassismus                    2
Sk.politics                              1.5
Recency behavior: groups with the highest recency effect ($\tau$)
GROUP                     $\tau$
fa.linux.kernel           0.98
uk.politics.electoral     0.98
rec.arts.drwho            0.97
uk.politics.crime         0.97
chile.soc.politica        0.96
USENET
• IDENTITY “COPYING”
• A high parameter value (low copying rate) indicates that new authors tend to join often
• A low parameter value (high copying rate) indicates that the authors of new posts tend to have already authored a post
High parameter value (low copying rate):
or.politics
alt.fan.cecil-adams
alt.marketplace.online.ebay
pl.misc.kolej
rec.arts.sf.written
Low parameter value (high copying rate):
linux.debian.bugs.dist
microsoft.public.excel.misc
microsoft.public.excel.programming
nctu.talk
tw.bbs.campus.nctu
YAHOO! GROUPS
• Groups with a high degree of preferential attachment ($\alpha$) and a high recency effect ($\tau$)
Group
indianmedical $\alpha = 10$
IllinoisSpeakers
DetectiveRichardHead
Bodybuildersaverageguys
villageDesign
NorthCarolinaSpeakers $\tau = 0.99$
stbaseliosorthodoxchurch
LostnFoundEvents
PatriceVinci
molecular-biology-notebook