Conversation Disentanglement in Sports Discourse. Anthony Wong, 6/01/11

Conversation Disentanglement in Sports Discourse

Jan 14, 2016


Amelie Leblanc

Conversation Disentanglement in Sports Discourse. Anthony Wong, 6/01/11. Importance of Topic. What is conversation disentanglement? A clustering task: dividing a transcript into a number of smaller, separate conversations. Conversation disentanglement has a couple of practical applications, such as summary generation and automatic threading.
Transcript
Page 1: Conversation Disentanglement in Sports Discourse

Conversation Disentanglement in Sports Discourse

Anthony Wong
6/01/11

Page 2: Conversation Disentanglement in Sports Discourse

Importance of Topic

What is conversation disentanglement?
◦ A clustering task: dividing a transcript into a number of smaller, separate conversations

Conversation disentanglement has a couple of practical applications:
◦ Summary generation
◦ User-interface systems like automatic threading

Page 3: Conversation Disentanglement in Sports Discourse

Basis of my Approach

Michael Elsner and Eugene Charniak (2008)
◦ Uses lexical and non-lexical features to cluster different threads
◦ Features: time between utterances, same speaker, number of shared words, “content” words
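
The four feature types listed above can be illustrated with a minimal sketch that computes them for one pair of utterances. This is not the Elsner/Charniak implementation: the `Utterance` class, the `pair_features` function, the tokenizer, and the tiny `STOP_WORDS` list (used here to approximate "content" words) are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Hypothetical stop-word list; anything not in it is treated as a "content" word.
STOP_WORDS = {"the", "a", "an", "is", "it", "to", "of", "and", "i",
              "this", "that", "in", "but", "not"}


@dataclass
class Utterance:
    time: int      # timestamp (unit assumed, e.g. seconds)
    speaker: str
    text: str


def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    tokens = (tok.strip(".,!?:;-@'\"()") for tok in text.lower().split())
    return {tok for tok in tokens if tok}


def pair_features(a, b):
    """Return simple lexical and non-lexical features for two utterances."""
    shared = tokenize(a.text) & tokenize(b.text)
    return {
        "time_gap": abs(b.time - a.time),                    # non-lexical
        "same_speaker": a.speaker == b.speaker,              # non-lexical
        "shared_words": len(shared),                         # lexical
        "shared_content_words": len(shared - STOP_WORDS),    # lexical
    }
```

A classifier trained on such pairwise features would then decide whether two utterances belong to the same thread; the feature set in the actual model is richer than this sketch.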

Page 4: Conversation Disentanglement in Sports Discourse

Proposed Project Overview

Follow the methodology in Elsner and Charniak’s paper
◦ Create and annotate a dataset of sports discourse

Use the existing Elsner/Charniak model to provide baseline classification results and see how well their model adapts to a different chat domain

Test out different feature combinations to hopefully raise performance

? – Compare results with the Elsner/Charniak paper in some meaningful way

Page 5: Conversation Disentanglement in Sports Discourse

Progress so far

Page 6: Conversation Disentanglement in Sports Discourse

Retrieving and preparing data

Page 7: Conversation Disentanglement in Sports Discourse

Retrieving and preparing data

Page 8: Conversation Disentanglement in Sports Discourse

Annotating the data

Page 9: Conversation Disentanglement in Sports Discourse

Annotating the data

T1 715 KateC : Sam - this is going to be painful, isn't it?
T1 715 SamHolako : I hope not Kate, but Howard, Nelson and Carter have killed the Raptors in the past
T2 715 JaredWade : Classic Frisco. The Minnesota bathroom smells worse, I hear.
T3 715 Anthony(RapsFan) : @Batman: His WP48 is the worst on the team. Andrea is terrible. He scores. That's about it.
T3 715 Arnold : Holy impossibilities , Batman - that won't happen.
T4 715 BretLaGree : Raja Bell and Mike Bibby just held a flop-off in the lane. Bell won.
T5 715 Bobbo : Zach, Go hit up Cinnabun!!! worth the $$...write it off to ESPN anyway
T5 715 ZachHarper : I don't think it works that way
T6 715 Aras : Jared!
T6 715 JaredWade : Aras.
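
Annotated lines like the ones above could be read into records with a short parser. The sketch below assumes each line has the layout `T<thread id> <timestamp> <speaker> : <text>`; the reading of the second field as a timestamp, and the names `LINE_RE` and `parse_line`, are assumptions for illustration rather than part of the Elsner code.

```python
import re

# Assumed layout: T<thread id> <timestamp> <speaker> : <text>
# The speaker is matched lazily up to the first " :" separator.
LINE_RE = re.compile(r"^T(\d+)\s+(\d+)\s+(.+?)\s+:\s*(.*)$")


def parse_line(line):
    """Parse one annotated chat line into a dict, or raise ValueError."""
    match = LINE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized line: {line!r}")
    thread, time, speaker, text = match.groups()
    return {
        "thread": int(thread),   # annotated thread label
        "time": int(time),       # assumed to be a timestamp
        "speaker": speaker,
        "text": text,
    }
```

For example, `parse_line("T6 715 JaredWade : Aras.")` yields a record with thread 6, speaker `JaredWade`, and text `Aras.`.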

Page 10: Conversation Disentanglement in Sports Discourse

Annotating the data

The annotated part of this transcript has 399 lines.
177 unique threads.
The average conversation length is 2.25423728814.
The median conversation length is 2.
The entropy is 7.0155726118 bits.
The median chat has 0.0 interruptions per line.
The average block of 10 contains 6.25706940874 threads.
The line-averaged conversation density is 2.77944862155.
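
Some of these statistics can be recomputed from the list of per-line thread labels. The sketch below is not the script that produced the numbers above. It assumes the average conversation length is lines divided by threads (399 / 177 ≈ 2.254, which matches the slide) and that the entropy is the Shannon entropy of the distribution of lines over threads, which is an assumed definition. The function name `thread_stats` is hypothetical.

```python
import math
from collections import Counter
from statistics import median


def thread_stats(thread_labels):
    """Summarize an annotation given one thread label per transcript line."""
    if not thread_labels:
        raise ValueError("need at least one labeled line")
    sizes = Counter(thread_labels)          # thread label -> number of lines
    total = len(thread_labels)
    lengths = list(sizes.values())
    # Assumed definition: Shannon entropy (bits) of lines over threads.
    entropy = -sum((n / total) * math.log2(n / total) for n in lengths)
    return {
        "lines": total,
        "threads": len(sizes),
        "mean_length": total / len(sizes),
        "median_length": median(lengths),
        "entropy_bits": entropy,
    }
```

The interruption, block-of-10, and density figures are not covered by this sketch.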

Page 11: Conversation Disentanglement in Sports Discourse

Running Elsner model as is

T1 715 KateC : Sam - this is going to be painful, isn't it?
T2 715 SamHolako : I hope not Kate, but Howard, Nelson and Carter have killed the Raptors in the past
T3 715 JaredWade : Classic Frisco. The Minnesota bathroom smells worse, I hear.
T4 715 Anthony(RapsFan) : @Batman: His WP48 is the worst on the team. Andrea is terrible. He scores. That's about it.
T5 715 Arnold : Holy impossibilities , Batman - that won't happen.
T6 715 BretLaGree : Raja Bell and Mike Bibby just held a flop-off in the lane. Bell won.
T7 715 Bobbo : Zach, Go hit up Cinnabun!!! worth the $$...write it off to ESPN anyway
T8 715 ZachHarper : I don't think it works that way
T9 715 Aras : Jared!
T9 715 JaredWade : Aras.

Page 12: Conversation Disentanglement in Sports Discourse

Running Elsner model as is

368 unique threads.
The average conversation length is 1.08423913043.
The median conversation length is 1.
The entropy is 8.48485646504 bits.
The median chat has 0.0 interruptions per line.
The average block of 10 contains 9.52699228792 threads.
The line-averaged conversation density is 1.42355889724.

Page 13: Conversation Disentanglement in Sports Discourse

Editing the model and evaluation

Still in progress
◦ A lot of room for improvement
◦ Many different feature combinations to try

Need to get evaluation code running

Page 14: Conversation Disentanglement in Sports Discourse

Issues

Documentation for Elsner code is good, but my Python is not

Integration issues between my data and Elsner code

MEGA Model Optimization Package (megam)