CS 679: Advanced NLP

Lecture #1: Introduction to Text Mining

This work is licensed under a Creative Commons Attribution-Share Alike 3.0 Unported License.

Post on 27-Dec-2015

Objectives for Today

1. Quick course info
2. Overview of Text Mining
3. Discuss your applications of Text Mining
4. Elements of Text Mining
5. Introduce course objectives

Course Info

Office Hours: Tue & Thu. 3-4pm (without appointment) OR by appointment
TA: TBD
Web page: https://facwiki.cs.byu.edu/cs679
    Syllabus
    Regularly updated schedule: Due dates, Reading assignments, Project guidelines, Lecture Notes
Google Group: “BYU CS 679”
Email: ringger AT cs DOT byu DOT edu
Grades: http://gradebook.byu.edu

Assignments

Readings: with max. one-page reports
    Mostly research papers (see course web page for all hyperlinks)
    Usually one reading report per week
Intro. Projects
    Presentation
    Report
Semester Project
    Proposal
    Presentation
    Report

Course Policies

Early
Late
Grades
Other

See Syllabus for details

Text Mining

The process of discovering previously unknown information in large text collections

Paraphrased from M. Hearst

Other Definitions

Looking for patterns in unstructured text (Nahm)

Text mining applies the same analytical functions of data mining to the domain of textual information (Doore)

“Search” versus “Discover”

                            Search (goal-oriented)    Discover (opportunistic)
Structured Data             Data Retrieval            Data Mining
Unstructured Data (Text)    Information Retrieval     Text Mining

Credit: adapted from slide by Nathan Treloar, AvaQuest
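The contrast between searching and discovering can be sketched in a few lines of Python. This is only a toy illustration, not a method from the course: the corpus, the stopword set, and the function names `search` and `discover_pairs` are all invented here. `search` answers a question the user already has (which documents mention a given term), while `discover_pairs` surfaces term pairs that co-occur across documents without being asked about any term in particular.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy corpus, invented for illustration only.
DOCS = [
    "stock prices fell after the earnings report",
    "the earnings report surprised investors",
    "investors sold stock after prices fell",
    "the team won the championship game",
]


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def search(docs, query_term):
    """Goal-oriented: indices of documents that contain a known query term."""
    return [i for i, doc in enumerate(docs) if query_term in tokenize(doc)]


def discover_pairs(docs, min_docs=2, stopwords=frozenset({"the", "after"})):
    """Opportunistic: term pairs that co-occur in at least min_docs documents."""
    pair_counts = Counter()
    for doc in docs:
        # Each pair is counted at most once per document.
        terms = sorted(set(tokenize(doc)) - stopwords)
        pair_counts.update(combinations(terms, 2))
    return {pair: n for pair, n in pair_counts.items() if n >= min_docs}


hits = search(DOCS, "earnings")
pairs = discover_pairs(DOCS)
print(hits)
print(pairs)
```

On this toy corpus the search returns the two documents that mention "earnings", while the discovery step reports pairs such as ("earnings", "report") that nobody queried for. Real text mining replaces the raw co-occurrence count with stronger association measures.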

Your Exciting Applications

F2011: Your Exciting Applications

W2011: Exciting Applications

2010: Exciting Applications

2009: Exciting Applications

Additional Applications

News Mining
Sentiment Detection
Summarization
Trend Analysis
Association Detection

Course Objectives

Acquire experience conducting exploratory data analysis on large collections of text

Gain in-depth experience with and understanding of approaches to:
    document classification
    sentiment classification
    feature engineering
    feature selection
    document clustering
    unsupervised topic identification
    visualization, including document summarization

Build a foundation of techniques for approximate Bayesian reasoning for unsupervised text analysis

Course Objectives (2)

Obtain experience with techniques for evaluating and visualizing the results of unsupervised learning processes

Independent investigation of methods of your choice!

Application of your methods to learn something important from a significant text corpus of your choice

Simplistic Text Mining Process

Credit: NCSA

Methods

Feature Engineering
Feature Selection
Information Extraction
Categorization (Supervised)
Clustering (Unsupervised)
Topic Identification / Topic Modeling
Visualization
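To make the first two methods concrete, here is a minimal sketch using only the Python standard library. It assumes a bag-of-words representation for feature engineering and a document-frequency threshold for feature selection. Both are common choices but are assumptions here, not something the slides prescribe. The documents and the names `bag_of_words`, `select_features`, `vectorize`, and `cosine` are hypothetical. The cosine similarity at the end is the kind of document-to-document measure that categorization and clustering methods typically build on.

```python
import math
from collections import Counter

# Hypothetical toy documents, invented for illustration only.
DOCS = [
    "great movie with a great cast",
    "terrible movie with a weak plot",
    "great game with a strong plot",
]


def bag_of_words(text):
    """Feature engineering: represent a document as term counts."""
    return Counter(text.lower().split())


def select_features(bags, min_df=2):
    """Feature selection: keep terms found in at least min_df documents."""
    # Iterating over a Counter yields each distinct term once per document.
    doc_freq = Counter(term for bag in bags for term in bag)
    return sorted(term for term, n in doc_freq.items() if n >= min_df)


def vectorize(bag, vocab):
    """Project a bag of words onto the selected vocabulary."""
    return [bag[term] for term in vocab]


def cosine(u, v):
    """Cosine similarity between two equal-length count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


bags = [bag_of_words(doc) for doc in DOCS]
vocab = select_features(bags)
vectors = [vectorize(bag, vocab) for bag in bags]
print(vocab)
print(vectors)
```

Note that the threshold keeps function words such as "a" and "with" because they appear in every document. A practical pipeline would usually add a stopword list or a weighting scheme such as TF-IDF to keep them from dominating the similarity.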

Some Available Data Sets

20 Newsgroups (Usenet)
Reuters (1990s) newswire
Del.icio.us bookmarked web pages
Enron Email
Movie Reviews
Gamespot game reviews
General Conference
State of the Union
Campaign Speeches
… Yours!

Assignment

Reading for next time:
    Course Syllabus
    "Tapping the Power of Text Mining" by Fan et al. (CACM 9/2006)
    "Text-Mining the Voice of the People" by Evangelopoulos et al. (CACM 2/2012)
    Skim: Alta Plana Text Analytics Report

Reading Report #1
    % Completed
    Questions
