CS 679: Advanced NLP Lecture #1: Introduction to Text Mining This work is licensed under a Creative Commons Attribution-Share Alike 3.0 Unported License .

Dec 27, 2015

Page 1

CS 679: Advanced NLP

Lecture #1: Introduction to Text Mining

This work is licensed under a Creative Commons Attribution-Share Alike 3.0 Unported License.

Page 2

Objectives for Today

1. Quick course info
2. Overview of Text Mining
3. Discuss your applications of Text Mining
4. Elements of Text Mining
5. Introduce course objectives

Page 3

Course Info

Office Hours: Tue & Thu. 3-4pm (without appointment) OR by appointment
TA: TBD
Web page: https://facwiki.cs.byu.edu/cs679
Syllabus
Regularly updated schedule: Due dates, Reading assignments, Projects guidelines, Lecture Notes
Google Group: “BYU CS 679”
Email: ringger AT cs DOT byu DOT edu
Grades: http://gradebook.byu.edu

Page 4

Assignments

Readings – with max. one page reports
  Mostly research papers (see course web page for all hyperlinks)
  Usually one reading report per week
Intro. Projects
  Presentation
  Report
Semester Project
  Proposal
  Presentation
  Report

Page 5

Course Policies

Early
Late
Grades
Other

See Syllabus for details

Page 6

Text Mining

The process of discovering previously unknown information in large text collections

Paraphrased from M. Hearst

Page 7

Other Definitions

Looking for patterns in unstructured text (Nahm)

Text mining applies the same analytical functions of data mining to the domain of textual information (Doore)

Page 8

“Search” versus “Discover”

                          Search (goal-oriented)    Discover (opportunistic)
Structured Data           Data Retrieval            Data Mining
Unstructured Data (Text)  Information Retrieval     Text Mining

Credit: adapted from slide by Nathan Treloar, AvaQuest
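
The difference between searching and discovering can be made concrete with a toy example. The sketch below is only an illustration: the corpus and the function names `search` and `discover` are invented here and do not come from the lecture. `search` answers a query the user already has in mind, while `discover` takes no query and simply reports word pairs that co-occur in several documents.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy corpus, invented for illustration only.
DOCS = [
    "stock prices fall as oil prices rise",
    "oil prices rise after supply cut",
    "new phone launch boosts stock",
    "supply cut lifts oil prices",
]

def search(docs, query):
    """Goal-oriented: return indices of documents containing every query term."""
    terms = set(query.lower().split())
    return [i for i, doc in enumerate(docs) if terms <= set(doc.lower().split())]

def discover(docs, min_count=2):
    """Opportunistic: count word pairs that co-occur within a document.

    No query is supplied; pairs seen in at least `min_count` documents are kept.
    """
    pairs = Counter()
    for doc in docs:
        words = sorted(set(doc.lower().split()))  # each pair counted once per document
        pairs.update(combinations(words, 2))
    return {pair: n for pair, n in pairs.items() if n >= min_count}

hits = search(DOCS, "oil prices")   # the user already knows what to look for
patterns = discover(DOCS)           # the corpus suggests what might be interesting
```

A real system would at least remove stop words and use a statistical association measure rather than raw counts; both are left out to keep the contrast visible.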

Page 9

Your Exciting Applications

Page 10

F2011: Your Exciting Applications

Page 11

W2011: Exciting Applications

Page 12

2010: Exciting Applications

Page 13

2009: Exciting Applications

Page 14

Additional Applications

News Mining
Sentiment Detection
Summarization
Trend Analysis
Association Detection

Page 15

Course Objectives

Acquire experience conducting exploratory data analysis on large collections of text

Gain in-depth experience with and understanding of approaches to
  document classification
  sentiment classification
  feature engineering
  feature selection
  document clustering
  unsupervised topic identification
  visualization, including document summarization

Build a foundation of techniques for approximate Bayesian reasoning for unsupervised text analysis

Page 16

Course Objectives (2)

Obtain experience with techniques for evaluating and visualizing the results of unsupervised learning processes

Independent investigation of methods of your choice!

Application of your methods to learn something important from a significant text corpus of your choice

Page 17

Simplistic Text Mining Process

Credit: NCSA

Page 18

Methods

Feature Engineering
Feature Selection
Information Extraction
Categorization (Supervised)
Clustering (Unsupervised)
Topic Identification / Topic Modeling
Visualization
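
To make the first two methods on this list less abstract, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the four-document corpus, the helper names, and the choice of document frequency as the selection criterion are not taken from the lecture. It turns raw text into bag-of-words counts (feature engineering), keeps only terms that occur in at least two documents (one simple form of feature selection), and compares documents by cosine similarity, a measure that categorization and clustering methods often build on.

```python
import math
import re
from collections import Counter

# Hypothetical toy corpus, invented for illustration only.
DOCS = [
    "The team won the game",
    "The team lost the game",
    "Parliament passed the budget",
    "The budget vote passed",
]

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def bag_of_words(text):
    """Feature engineering: represent a document as term -> count."""
    return Counter(tokenize(text))

def select_features(bags, min_df=2):
    """Feature selection: keep terms found in at least `min_df` documents."""
    doc_freq = Counter()
    for bag in bags:
        doc_freq.update(bag.keys())  # each term counted once per document
    return {term for term, n in doc_freq.items() if n >= min_df}

def cosine(a, b, vocab):
    """Cosine similarity of two bags restricted to the selected vocabulary."""
    dot = sum(a[t] * b[t] for t in vocab)  # Counter returns 0 for missing terms
    norm_a = math.sqrt(sum(a[t] ** 2 for t in vocab))
    norm_b = math.sqrt(sum(b[t] ** 2 for t in vocab))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

bags = [bag_of_words(doc) for doc in DOCS]
vocab = select_features(bags)
same_topic = cosine(bags[0], bags[1], vocab)   # two sports sentences
cross_topic = cosine(bags[0], bags[2], vocab)  # sports vs. politics
```

Note that "the" survives the document-frequency filter because it appears everywhere; this is why practical pipelines usually add stop-word removal or a weighting scheme such as TF-IDF.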

Page 19

Some Available Data Sets

20 Newsgroups -- Usenet
Reuters (1990s) newswire
Del.icio.us bookmarked web pages
Enron Email
Movie Reviews
Gamespot game reviews
General Conference
State of the Union
Campaign Speeches
… Yours!

Page 20

Assignment

Reading for next time:
  Course Syllabus
  "Tapping the Power of Text Mining" by Fan et al. (CACM 9/2006)
  "Text-Mining the Voice of the People" by Evangelopoulos et al. (CACM 2/2012)
  Skim: Alta Plana Text Analytics Report

Reading Report #1
  % Completed
  Questions