
Visual Analytics for Linguistics - Day 3 ESSLLI

Jan 23, 2018


Olga Scrivner
Page 1: Visual Analytics for Linguistics - Day 3 ESSLLI

Visual Analytics for Linguistics - Day 3

Olga Scrivner

Course Info

Charts

Text Visualization

ITMS

Preprocessing Data

Data Visualization

Cluster Analysis

Topic Modeling

Google Book API


What You Will Learn

DAY 1 Introduction to Visual Analytics

DAY 2 Visualization Methods, Design, and Tools

DAY 3 Working with Unstructured Data

DAY 4 Working with Structured Data

DAY 5 Advanced Analytics

Our Materials - Web Site

http://obscrivn.wixsite.com/visualization

What We Need

- Interactive Text Mining Suite (ITMS)
- Voyant
- R and RStudio
- R libraries: ggplot2, plotly, reshape2

Quiz: Which Chart Are You?

https://www.sisense.com/blog/quiz-chart/

Creating a Bar Chart

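In ggplot2, the height of each bar can represent either:

- The value of a column in the data set. This is done with stat = "identity", which leaves the y values unchanged.
- The count of cases in each group, where each x value represents one group. This is geom_bar()'s default, stat = "count".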
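A minimal ggplot2 sketch of the first case, assuming a small hypothetical data frame of word-class frequencies (`freqs`, `pos` and `count` are illustrative names, not taken from the course files). Because the frequencies are already computed, `stat = "identity"` draws each bar at the height stored in `count`:

```r
library(ggplot2)

# Hypothetical data: one row per word class, frequencies already computed
freqs <- data.frame(
  pos   = c("noun", "verb", "adjective"),
  count = c(120, 85, 40)
)

# stat = "identity": bar heights are the values of `count`, unchanged
p_values <- ggplot(freqs, aes(x = pos, y = count)) +
  geom_bar(stat = "identity")

print(p_values)
```

`geom_col()` is ggplot2's shorthand for `geom_bar(stat = "identity")`.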

Creating a Bar Chart - Sample

Creating a Bar Chart - Values

Creating a Bar Chart - Counts

To get a bar graph of counts, do not map a variable to y, and use stat = "count" (the default for geom_bar()).
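A matching sketch for counts, again with hypothetical data: each row is one token tagged with its word class, and ggplot2 tallies the rows for each `pos` value itself. Writing `stat = "count"` is optional here, since it is `geom_bar()`'s default:

```r
library(ggplot2)

# Hypothetical token-level data: one row per token, tagged with its word class
tokens <- data.frame(
  pos = c("noun", "noun", "verb", "noun", "adjective", "verb")
)

# No y aesthetic: geom_bar() counts the rows for each x value
p_counts <- ggplot(tokens, aes(x = pos)) +
  geom_bar(stat = "count")

print(p_counts)
```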

Title

Creating Line Chart

Creating Area Chart

http://www.r-graph-gallery.com/136-stacked-area-chart/

Creating Scatter Plot

http://www.r-graph-gallery.com/272-basic-scatterplot-with-ggplot2/

Creating Bubble Plot

https://plot.ly/r/bubble-charts/

Creating Heatmap

http://www.r-graph-gallery.com/215-interactive-heatmap-with-plotly/

Creating Word Cloud

Word Cloud - Contest - 10 min

- Create your own word cloud
- Look at the function: type ?wordcloud2 and run it
- Can you change the shape of your cloud?
- Save it (or take a screenshot) and post it on Twitter, Facebook, etc.
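One possible starting point, assuming a short hypothetical text in place of your own corpus: the frequency table is built with base R, and `wordcloud2()` is called only if the wordcloud2 package is installed. `shape = "star"` is one of the package's built-in shapes; `?wordcloud2` lists the others.

```r
# Hypothetical input text; replace it with your own corpus
sample_text <- "the cat sat on the mat and the dog sat on the log"

# Lower-case, split on non-letters, and count word frequencies
words <- unlist(strsplit(tolower(sample_text), "[^a-z]+"))
words <- words[words != ""]
freq  <- as.data.frame(table(words), stringsAsFactors = FALSE)
names(freq) <- c("word", "freq")
freq <- freq[order(-freq$freq), ]

# wordcloud2() takes a data frame whose first two columns are word and frequency
if (requireNamespace("wordcloud2", quietly = TRUE)) {
  cloud <- wordcloud2::wordcloud2(freq, shape = "star")
  if (interactive()) print(cloud)  # shows the cloud in the RStudio Viewer
}
```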

Why Analyze Text?

The “epic transformation of archives” - shifting from print to digital archival form (Folsom, 2007)

“As our collective knowledge continues to be digitized and stored (...) it becomes more difficult to find and discover what we are looking for.” (Blei, 2012)

Text Mining Challenges

Sources - 1) Dan Jurafsky; 2) Ryan Wesslen, Text Mining with R for Social Science Research

Basic Terminology

What is Bag of Words?

- The simplest way to quantify text
- Word order is ignored
- Terms are counted per document
- N-grams (unigrams, bigrams)

Source - Chris Manning
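A base-R sketch of the idea, assuming two hypothetical one-line documents and plain whitespace tokenization: each document becomes a row of term counts (a document-term matrix), and word order survives only inside the bigrams.

```r
# Two hypothetical toy documents
docs <- c(d1 = "the cat sat on the mat",
          d2 = "the dog sat on the log")

# Tokenize one document: lower-case, then split on whitespace
tokenize <- function(x) strsplit(tolower(x), "[[:space:]]+")[[1]]

# Bigrams: paste each token to its right-hand neighbour
bigrams <- function(tok) paste(head(tok, -1), tail(tok, -1))

# Document-term matrix: one row per document, one column per term.
# Word order is discarded; only the counts remain (bag of words).
dtm <- function(docs, ngram = 1) {
  terms <- lapply(docs, function(d) {
    tok <- tokenize(d)
    if (ngram == 1) tok else bigrams(tok)
  })
  vocab <- sort(unique(unlist(terms)))
  # A factor with the shared vocabulary makes every count vector line up
  t(sapply(terms, function(x) table(factor(x, levels = vocab))))
}

uni <- dtm(docs, ngram = 1)  # unigram counts
bi  <- dtm(docs, ngram = 2)  # bigram counts
print(uni)
print(bi)
```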

Preprocessing

- Tokenization (splitting text into words)
- Cleaning (lower case, punctuation)
- Stemming
  - works, worked → work
- Filtering (stopwords)
  - and, the, a

Source - Wesslen
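A base-R sketch of cleaning, tokenization and stopword filtering, assuming a tiny hand-written stopword list (`stopwords_en` is an illustrative assumption; real English stopword lists are much longer). Stemming is left out here.

```r
# Tiny illustrative stopword list (an assumption, not a standard list)
stopwords_en <- c("and", "the", "a", "of", "to", "in", "is")

preprocess <- function(text, stopwords = stopwords_en) {
  # Cleaning: lower-case and replace punctuation with spaces
  text <- tolower(text)
  text <- gsub("[[:punct:]]+", " ", text)
  # Tokenization: split on runs of whitespace and drop empty strings
  tokens <- unlist(strsplit(text, "[[:space:]]+"))
  tokens <- tokens[tokens != ""]
  # Filtering: drop stopwords
  tokens[!tokens %in% stopwords]
}

preprocess("The cat and the dog: a story of friendship.")
# returns c("cat", "dog", "story", "friendship")
```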

Macro-analysis

Concept: macro-analysis (Jockers, 2013), “the construction of abstract models” (Jasinski, 2001)

Methods: tag clouds, heat maps, clusters, topics, network graphs

Tools: GUI: Voyant, Papermachine, ITMS; TUI: Mallet, Meta, R and Python packages

Visual Analytics

Visual Analytics - “The science of analytical reasoning facilitated by interactive visual interfaces” (Thomas et al., 2005)

- Graphs, maps and trees for literature analysis (Moretti, 2005)

Visualization Methods

- Word clouds to analyze a novel (Vuillemot et al., 2009)

- Social network graphs of characters in Greek tragedies (Rydberg-Cox, 2011)

- Literary fingerprint and summaries (Oelke et al., 2012)

- Tracking emotion and sentiment in fairy tales (Mohammad, 2012)

Topic Modeling

Discovering the underlying themes of a collection of articles from Science magazine, 1990-2000 (Blei, 2012)

Topics - Word Term

Wikipedia Topics

http://www.princeton.edu/~achaney/tmve/wiki100k/browse/topic-presence.html

Wikipedia Topics - Assignment - 10 min

1. Language-related topic
2. Words: Dialect
3. Related document: Macedonian language
4. Related document: Egyptian hieroglyphs
5. Go to Full article:
6. Find meaning:

Voyant

http://voyant-tools.org/

Voyant - 10 min

http://voyant-tools.org/

- Examine visualization charts (identify types and properties)
- Apply various filters and queries

Voyant Tools - Bubblelines - 7 min

http://docs.voyant-tools.org/tools/

- Delete top terms
- Search for man and woman
- Make sure “separate lines for terms” is checked
- Change search terms

Voyant Tools - Pair Work - 10 min

http://docs.voyant-tools.org/tools/

- Examine visualization methods
- Select 5 methods
- Read their documentation to learn how to use them

Interactive Text Mining Suite

- A user-friendly tool for quantitative analysis and visualization of unstructured data
- Platform-independent
- Interactive

ITMS Structure

1. File Uploads
   - Upload files (txt, pdf, rdf and Google Books API)
2. Data Preparation
   - Data preprocessing (stopwords, stemming, metadata)
3. Data Visualization
   - Word frequencies, cluster analysis and topic modeling

Workshop Files

- Download 3 text files: https://iu.box.com/s/knua9af3bip7g63s3zdax9ti4z243ldz
  - NY Times articles (3 documents in plain text format)
- ITMS web site: http://www.interactivetextminingsuite.com

Upload File

Preprocessing Data

Before performing data analysis, we should preprocess the data.

Preprocessing Options

Select the preprocessing options and click Apply.

Stopwords

Stopwords (e.g. the, and): select Default for English

Manual Removal of Stopwords

Depending on your needs, remove any additional stopwords that you consider noise, e.g. paper, shows, etc.

Select Apply.

Stemming

To improve analytics, you can stem all your tokens: for example, instead of worked, works and working, you will have only one relevant stem, work.
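The idea can be illustrated with a deliberately naive suffix stripper (`naive_stem()` is a hypothetical helper). It is not the Porter or Snowball algorithm and mangles many words (e.g. "was" becomes "wa"); for real analyses, a proper stemmer such as `wordStem()` from the SnowballC package is the usual choice.

```r
# Deliberately naive suffix stripping, for illustration only:
# this is NOT the Porter/Snowball algorithm and it mangles many words.
naive_stem <- function(words) {
  sub("(ing|ed|s)$", "", words)
}

stems <- naive_stem(c("worked", "works", "working"))
stems         # returns c("work", "work", "work")
table(stems)  # one term, "work", counted three times
```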

Metadata Extraction

You can extract or upload metadata. You will need datestamp (year) information for chronological topic modeling.

Visualization

Word Cloud Representation

Customization

Cluster Analysis

You need at least three documents. Documents will be grouped based on their term similarity measures.
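A base-R sketch of one common way to do this, assuming a small hypothetical document-term matrix, cosine similarity, and average-linkage hierarchical clustering with `hclust()`. The slides do not say which similarity measure or linkage ITMS uses.

```r
# Hypothetical document-term matrix: 4 documents x 5 terms (raw counts)
dtm <- matrix(
  c(3, 2, 0, 0, 1,
    2, 3, 0, 1, 0,
    0, 0, 4, 3, 0,
    0, 1, 3, 4, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("doc1", "doc2", "doc3", "doc4"),
                  c("syntax", "grammar", "election", "vote", "verb"))
)

# Cosine similarity between documents (rows), turned into a distance
norms   <- sqrt(rowSums(dtm^2))
cos_sim <- (dtm %*% t(dtm)) / outer(norms, norms)
d       <- as.dist(1 - cos_sim)

# Agglomerative clustering: average linkage merges the closest groups first
hc     <- hclust(d, method = "average")
groups <- cutree(hc, k = 2)   # cut the tree into two clusters
print(groups)
plot(hc, main = "Documents clustered by term similarity")
```

With this toy matrix, the two "grammar" documents and the two "election" documents end up in separate clusters.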

Topic Modeling

- LDA (Latent Dirichlet Allocation)
- STM (Structural Topic Model)
- Chronological topic visualization (LDA): requires metadata

Topic Modeling Tuning

- Selection of topics (how many different themes)
- Selection of words per theme (how many words per topic)
- Selection of the number of iterations
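The three tuning choices map directly onto parameters of an LDA fit. As an illustration only, here is a minimal collapsed Gibbs sampler for LDA in base R; `lda_gibbs()` and `top_words()` are hypothetical helpers, not ITMS or package functions, and the hyperparameters `alpha`, `beta` and the toy documents are assumptions. `k` is the number of topics, `n` in `top_words()` the number of words shown per topic, and `n_iter` the number of sampling iterations. For real work, packages such as topicmodels (`LDA()`) and stm (`stm()`) are the usual route.

```r
# Minimal collapsed Gibbs sampler for LDA (illustration only).
# docs: list of character vectors of already preprocessed tokens
lda_gibbs <- function(docs, k = 2, n_iter = 200,
                      alpha = 0.1, beta = 0.01, seed = 1) {
  set.seed(seed)
  vocab <- sort(unique(unlist(docs)))
  V <- length(vocab)
  w <- lapply(docs, match, table = vocab)   # word ids per token
  z <- lapply(w, function(x) sample.int(k, length(x), replace = TRUE))
  ndk <- matrix(0, length(docs), k)         # topic counts per document
  nkw <- matrix(0, k, V)                    # word counts per topic
  nk  <- numeric(k)                         # total tokens per topic
  for (d in seq_along(w)) for (i in seq_along(w[[d]])) {
    tp <- z[[d]][i]; v <- w[[d]][i]
    ndk[d, tp] <- ndk[d, tp] + 1
    nkw[tp, v] <- nkw[tp, v] + 1
    nk[tp] <- nk[tp] + 1
  }
  for (iter in seq_len(n_iter)) {
    for (d in seq_along(w)) for (i in seq_along(w[[d]])) {
      tp <- z[[d]][i]; v <- w[[d]][i]
      # Take the token out of the counts ...
      ndk[d, tp] <- ndk[d, tp] - 1
      nkw[tp, v] <- nkw[tp, v] - 1
      nk[tp] <- nk[tp] - 1
      # ... resample its topic from the collapsed full conditional ...
      p <- (ndk[d, ] + alpha) * (nkw[, v] + beta) / (nk + V * beta)
      tp <- sample.int(k, 1, prob = p)
      z[[d]][i] <- tp
      # ... and add it back under the new topic
      ndk[d, tp] <- ndk[d, tp] + 1
      nkw[tp, v] <- nkw[tp, v] + 1
      nk[tp] <- nk[tp] + 1
    }
  }
  phi <- (nkw + beta) / rowSums(nkw + beta)       # topic-word probabilities
  colnames(phi) <- vocab
  theta <- (ndk + alpha) / rowSums(ndk + alpha)   # document-topic probabilities
  list(phi = phi, theta = theta)
}

# The n most probable words of each topic (one column per topic)
top_words <- function(phi, n = 3) {
  apply(phi, 1, function(p) names(sort(p, decreasing = TRUE))[seq_len(n)])
}

# Hypothetical toy corpus with two obvious themes
docs <- list(
  c("syntax", "grammar", "verb", "syntax", "grammar"),
  c("grammar", "verb", "syntax", "noun", "verb"),
  c("election", "vote", "party", "vote", "election"),
  c("party", "vote", "election", "campaign", "vote")
)

fit <- lda_gibbs(docs, k = 2, n_iter = 200)  # k topics, n_iter iterations
top_words(fit$phi, n = 3)                    # 3 words per topic
round(fit$theta, 2)                          # topic shares per document
```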

Topic Model Selection

LDA Topic Model

STM Topic Model

Other Formats - Google Books

Before switching to other data formats, refresh your local browser.

Start with File Uploads and select Structured Data.

Select your search terms and submit.

The current limit is 40 books.
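A sketch of the kind of request involved, assuming the public Google Books API v1 `volumes` endpoint; how ITMS builds its query is not shown in the slides. That API returns at most 40 results per request (`maxResults` accepts 0 to 40), which plausibly explains the 40-book limit. `books_search_url()` is a hypothetical helper:

```r
# Build a Google Books API (v1) volume-search URL.
# The API caps maxResults at 40 per request.
books_search_url <- function(terms, max_results = 40) {
  stopifnot(max_results >= 0, max_results <= 40)
  q <- URLencode(paste(terms, collapse = " "), reserved = TRUE)
  paste0("https://www.googleapis.com/books/v1/volumes?q=", q,
         "&maxResults=", max_results)
}

url <- books_search_url(c("historical", "linguistics"))
url
# "https://www.googleapis.com/books/v1/volumes?q=historical%20linguistics&maxResults=40"

# Fetching needs network access and the jsonlite package, e.g.:
# res <- jsonlite::fromJSON(url)
# res$items$volumeInfo$title
```

Retrieving more than 40 books would mean paging through results with the API's `startIndex` parameter.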