Textual Report Generation from Email utilizing Temporal Topic Analysis

Colin M. Potts
NC State University
[email protected]

Sean Lynch & Tracy Standafer
Laboratory for Analytic Science
[email protected] | [email protected]

Input
● Two email datasets: Enron & Avocado
● Enron contains ~500K emails from 150 employees
● Avocado Research Email Collection contains ~1M emails from 282 accounts

Communication groups
● Group people into clusters based on communication frequency
● Draw a graph of communications, weighting edges by email count
● Resulting components represent communication groups
● Extract topics for each cluster
● Use clusters to determine communication patterns & anomalies

Topic Analysis
● Use doc2vec for topic calculus
● Use a model trained on Wikipedia articles for topics
● Extract topic labels by comparing email vectors & cluster keyword sets to topic vectors
● Choose a set of topics that together best describe an email

Topic Ranking
● We use doc2vec to compute similarity via cosine distance
● For topic labeling, we rank topics using additional criteria:
○ PageRank
○ Coverage
○ Redundancy

Temporal Chains
● Chain links: Reply / Forward / Related
● Organize emails into topic chains by looking at replies, forwards, and by comparing topics
● Identify topic flow/change over time

Report Generation
● Use the hierarchical structure from the analysis (communication groups, email grouping, topic chains, anomalies, etc.)
● Select relevant details to help the user understand the context of the report, based on the chosen template (summary vs. anomalies)
● Reason over content to select a good organization and display style
● Supports multiple report templates, including summary- and anomaly-focused output, with modular extensibility for other styles

Collaboration
We are proud of a successful collaboration between NC State and the LAS, including monthly meetings with excellent feedback and ideas.
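
The communication-group step (a graph of communications with edges weighted by email count, whose components become groups) can be sketched roughly as follows. This is a minimal illustration, not the poster's implementation: the function name `communication_groups`, the `min_count` pruning threshold, and the sample names are all assumptions introduced here.

```python
from collections import Counter, defaultdict

def communication_groups(emails, min_count=2):
    """emails: iterable of (sender, recipient) pairs. Returns a list of sets."""
    # Edge weight = number of emails exchanged between two people (undirected).
    weights = Counter(frozenset(pair) for pair in emails if pair[0] != pair[1])

    # Keep only edges at or above the frequency threshold (an assumed cutoff).
    adjacency = defaultdict(set)
    for edge, count in weights.items():
        if count >= min_count:
            a, b = tuple(edge)
            adjacency[a].add(b)
            adjacency[b].add(a)

    # Connected components via iterative depth-first search.
    seen, groups = set(), []
    for start in adjacency:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(adjacency[node] - seen)
        groups.append(component)
    return groups

# Hypothetical sample traffic: one weak link (alice -> dana) is pruned.
sample = [
    ("alice", "bob"), ("bob", "alice"),
    ("bob", "carol"), ("carol", "bob"),
    ("dana", "erin"), ("dana", "erin"),
    ("alice", "dana"),
]
groups = communication_groups(sample)
print(sorted(sorted(g) for g in groups))
# [['alice', 'bob', 'carol'], ['dana', 'erin']]
```

Pruning low-weight edges before taking components is one simple way to keep occasional contacts from merging otherwise separate groups; the poster does not say how its clusters are actually cut.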
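
To make the topic-labeling idea concrete (compare an email vector to topic vectors by cosine similarity, then choose a set of topics that covers the email without repeating itself), here is a small sketch. It uses a greedy relevance-minus-redundancy rule in the style of maximal marginal relevance, which is a stand-in chosen for illustration and not necessarily the ranking used on the poster; the PageRank criterion is omitted. The toy three-dimensional vectors, the `penalty` weight, and the function names are assumptions; in practice the vectors would come from a trained doc2vec model.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def select_topics(email_vec, topic_vecs, k=2, penalty=0.5):
    """Greedily pick k topic labels: reward similarity to the email,
    penalize similarity to topics that were already chosen."""
    chosen = []
    remaining = set(topic_vecs)
    while remaining and len(chosen) < k:
        def score(label):
            relevance = cosine(email_vec, topic_vecs[label])
            redundancy = max(
                (cosine(topic_vecs[label], topic_vecs[c]) for c in chosen),
                default=0.0,
            )
            return relevance - penalty * redundancy
        # Sort candidates first so ties are broken deterministically.
        best = max(sorted(remaining), key=score)
        chosen.append(best)
        remaining.remove(best)
    return chosen

# Hypothetical toy vectors: "power" is nearly a duplicate of "energy".
topics = {
    "energy": [1.0, 0.5, 0.0],
    "power": [1.0, 0.4, 0.0],
    "legal": [0.0, 1.0, 0.0],
}
email = [1.0, 1.0, 0.0]
print(select_topics(email, topics))               # ['energy', 'legal']
print(select_topics(email, topics, penalty=0.0))  # ['energy', 'power']
```

With the redundancy penalty switched off, the two near-identical topics are both selected; with it on, the second pick shifts to a topic that describes a different aspect of the email.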
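
The temporal-chain step (linking emails through replies, forwards, and topic comparison) might look something like the sketch below. It is only one plausible reading of the poster: the `Email` fields, the rule that an explicit reply/forward link takes priority over a topic match, and the use of exact topic-label equality in place of vector similarity are all assumptions made for brevity.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Email:
    id: str
    sent: int                     # timestamp, e.g. seconds since epoch
    topic: str                    # topic label assigned by an earlier step
    parent: Optional[str] = None  # id of the email this replies to or forwards

def build_chains(emails):
    """Return chains as lists of email ids in chronological order."""
    chain_of = {}         # email id -> index of its chain
    chains = []           # each chain is a list of email ids
    latest_by_topic = {}  # topic -> chain that most recently used it
    for email in sorted(emails, key=lambda e: e.sent):
        if email.parent in chain_of:          # reply / forward link
            idx = chain_of[email.parent]
        elif email.topic in latest_by_topic:  # related by topic
            idx = latest_by_topic[email.topic]
        else:                                 # nothing to attach to: new chain
            idx = len(chains)
            chains.append([])
        chains[idx].append(email.id)
        chain_of[email.id] = idx
        latest_by_topic[email.topic] = idx
    return chains

# Hypothetical mailbox: e3 replies to e1 but has drifted to a new topic.
mailbox = [
    Email("e1", 1, "budget"),
    Email("e2", 2, "lunch"),
    Email("e3", 3, "schedule", parent="e1"),
    Email("e4", 4, "lunch"),
    Email("e5", 5, "hiring"),
]
print(build_chains(mailbox))
# [['e1', 'e3'], ['e2', 'e4'], ['e5']]
```

Reading the topic labels along a chain in order (here "budget" then "schedule" in the first chain) gives a simple view of topic flow over time.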