Visualization Taxonomies and Techniques Text: Words, phrases, sentences, … University of Texas – Pan American CSCI 6361, Spring 2014
Apr 02, 2015
Visualization Taxonomies and Techniques
Text: Words, phrases, sentences, …
University of Texas – Pan American
CSCI 6361, Spring 2014
Introduction
• Text is ubiquitous
– Documents, and more generally text, are a primary information source
• (Verbal has its place!)
– Access to documents and text has grown exponentially in recent years due to networking infrastructure
• WWW
• Digital libraries
• Social media
• Visualization to aid users in understanding and gathering information from text and document collections
Introduction
• Visualization can aid in performing tasks
• For example:
– Which documents contain text on topic XYZ?
– Which documents are of interest to me?
– Are there other documents that are similar to this one (so they are worthwhile)?
– How are different words used in a document or a document collection?
– What are the main themes and ideas in a document or a collection?
– Which documents have an angry tone?
– How are certain words or themes distributed through a document?
– Identify “hidden” messages or stories in this document collection.
– How does one set of documents differ from another set?
– Quickly gain an understanding of a document or collection in order to subsequently do XYZ.
– Understand the history of changes in a document.
– Find connections between documents.
From Stasko, 2013
Introduction: Challenges of Text Visualization
• Text is unlike other data types seen so far, for example:
• Context and Semantics
– Context is relevant to understanding and meaning
– Indeed, natural language understanding is a challenge of the nth + 1 century
• Dimensionality
– Inherently “not dimensional”, so must create a “visually realizable” visual encoding
– Often, first step is n-D, then 2- or 3-D
• Modeling Abstraction
– Consider level of “understanding” required for task
– Match analysis task with appropriate tools and models
Introduction: Related topics
• Information Retrieval
– Active search process that brings back particular/specific items (will discuss that some today, but not always the focus)
– InfoVis and HCI can help some…
• Visualization may be most useful when not sure precisely what you’re looking for when retrieving information
– More of a browsing paradigm than a search one
– But, this is part of the information retrieval task
• Define information need, formulate “query”, examine/evaluate results, … repeat
• Sensemaking
– Gaining better understanding of facts at hand in order to take some next steps
• A principal focus in visual analytics
– Visualization can help make a large document collection more understandable more rapidly
• Which is good: “Overview, zoom and filter, details on demand”
Recall, Visualization Pipeline: Visualization Stages
• Data transformations:
– Map raw data (idiosyncratic form) into data tables (relational descriptions including metatags)
• Text is nominal data
– A word, or any text unit, does not map easily to any quantitative representation!
– The “Raw data --> Data Table” mapping is a principal element of creating any visual representation
• How do you get numbers from words, sentences, …??
– Will see several solutions
[Figure: visualization pipeline. Stages: Raw Information, Dataset, Visual Form, Views, User - Task. Transformations: Data Transformations, Visual Mappings, View Transformations (F, F⁻¹). Also labeled: Interaction, Visual Perception.]
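One common answer to "how do you get numbers from words" is a bag-of-words data table, where each document becomes a row of term counts. The following is a minimal sketch, not the method of any particular system in these slides; the naive tokenizer, the function names, and the two sample texts are illustrative assumptions.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and return its alphabetic tokens (a naive assumption)."""
    return re.findall(r"[a-z']+", text.lower())

def term_document_table(docs):
    """Return (vocabulary, rows), where rows[i][j] counts vocabulary[j] in docs[i]."""
    counts = [Counter(tokenize(d)) for d in docs]
    vocabulary = sorted(set().union(*counts))
    rows = [[c[term] for term in vocabulary] for c in counts]
    return vocabulary, rows

# Hypothetical sample documents, only for illustration.
docs = ["Text is nominal data", "Data tables hold data"]
vocabulary, rows = term_document_table(docs)
for doc, row in zip(docs, rows):
    print(row, "<-", doc)
```

Each row is now an n-dimensional vector (n = vocabulary size), which matches the "first step is n-D, then 2- or 3-D" remark earlier in the deck.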
Recall, Visualization Pipeline: Visualization Stages
• Visual Mappings:
– Transform data tables into visual structures that combine spatial substrates, marks, and graphical properties
• And … visual mappings, as well, require at least “the usual level” of creativity
[Figure: visualization pipeline. Stages: Raw Information, Dataset, Visual Form, Views, User - Task. Transformations: Data Transformations, Visual Mappings, View Transformations (F, F⁻¹). Also labeled: Interaction, Visual Perception.]
Understanding Text Content
Understanding Text Content
• Visual representations of words, phrases, and sentences
– Main goal of understanding, versus search
• Visual presentation always part of text presentation
– Standard typography uses layout, font, style, color …
– Electronic media, especially – pick a web page
– “Single text content”
Single Text Content: Word Counts
• 2012 National Conventions
• NY Times: http://www.nytimes.com/interactive/2012/08/28/us/politics/convention-word-counts.html
Tag / Word Clouds
Tag / Word Clouds
• Lots of popular interest
– E.g., on web
• Idea is to show word/concept importance through visual means
– Tags: User-specified metadata (descriptors) about something
– Sometimes generalized to just reflect word frequencies
• Not a new technique
– Milgram’s ‘76 experiment to have people label landmarks in Paris
– Flanagan’s ‘97 “Search referral Zeitgeist”
– Fortune’s ‘01 Money Makes the World Go Round
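The core of a tag cloud is a mapping from frequency to font size. A minimal sketch follows, assuming a logarithmic scale and a 10 to 48 point range; both are illustrative choices rather than anything prescribed by the tools shown in these slides, and `font_sizes` is a hypothetical helper name.

```python
import math

def font_sizes(frequencies, min_pt=10, max_pt=48, log_scale=True):
    """Map each word's (positive) frequency to a font size in [min_pt, max_pt]."""
    scale = math.log if log_scale else float
    values = {word: scale(freq) for word, freq in frequencies.items()}
    lo, hi = min(values.values()), max(values.values())
    span = hi - lo
    sizes = {}
    for word, value in values.items():
        # If every word has the same frequency, fall back to the smallest size.
        t = (value - lo) / span if span else 0.0
        sizes[word] = round(min_pt + t * (max_pt - min_pt))
    return sizes

# Hypothetical counts, only for illustration.
sizes = font_sizes({"economy": 40, "jobs": 20, "energy": 10})
print(sizes)
```

A log scale keeps a few very frequent words from dwarfing everything else; passing `log_scale=False` gives the plain linear mapping.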
Tag / Word Clouds, Example: US State of the Union Speeches
• Guardian
• http://www.guardian.co.uk/news/datablog/2011/jan/25/state-of-the-union-text-obama#
• http://image.guardian.co.uk/sys-files/Guardian/documents/2011/01/26/State_of_the_union_2011.pdf?guni=Graphic:in body link
Flickr Tag Cloud
delicious Tag Cloud
Alternate Order
Many Eyes Tag Cloud
• Word pairs
Wordle
Wordle: “Beautiful Word Clouds”, http://www.wordle.net/
• Tightly packed words
– Horizontal, vertical or diagonal
• Size correlated with frequency
• Multiple color palettes
• User gets some control
• Layout Algorithm
– Details not published
– Sort words by weight, in decreasing order
– For each word:
• Initial position randomly chosen according to distribution for target shape
• Update position moves out radially
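The layout steps listed above can be sketched in a few lines. Since Wordle's details are not published, this is only a simplified illustration of the same idea and not Wordle's actual algorithm: words are approximated by bounding boxes, the start distribution is assumed to be uniform in angle and radius, and a word moves outward along its start direction until it no longer overlaps anything already placed. The names `overlaps` and `layout`, and the sample weights, are hypothetical.

```python
import math
import random

def overlaps(a, b):
    """True if two axis-aligned boxes (center x, center y, width, height) intersect."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return abs(ax - bx) * 2 < aw + bw and abs(ay - by) * 2 < ah + bh

def layout(weights, seed=0, step=2.0, spread=20.0):
    """Place heavier words first; push each word outward until it fits."""
    rng = random.Random(seed)
    placed = {}
    for word, weight in sorted(weights.items(), key=lambda kv: kv[1], reverse=True):
        # Crude bounding box: height is the weight, width grows with word length.
        width, height = 0.6 * weight * len(word), weight
        # Assumed start distribution: uniform angle and radius around the center.
        angle = rng.uniform(0.0, 2.0 * math.pi)
        radius = rng.uniform(0.0, spread)
        while True:
            box = (radius * math.cos(angle), radius * math.sin(angle), width, height)
            if not any(overlaps(box, other) for other in placed.values()):
                break
            radius += step  # move out radially along the same direction
        placed[word] = box
    return placed

# Hypothetical weights (e.g., word frequencies), only for illustration.
boxes = layout({"visualization": 30, "text": 24, "words": 18, "cloud": 12})
for word, (x, y, w, h) in boxes.items():
    print(f"{word:>14}: center=({x:7.1f}, {y:7.1f}) size=({w:.0f} x {h:.0f})")
```

Placing the heaviest words first gives them the positions closest to their random starting points, while lighter words absorb most of the outward displacement.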
Wordle: “Beautiful Word Clouds”, http://www.wordle.net/
• Course schedule, table of topics, and assignments
Can be many variations …
• A bit more order
• Order the words more by frequency
Mani-Wordle: User control
• Mani-Wordle
– Start with nice default algorithm
– Give user more control over design
• Alter color (within a palette)
• Pin words, redo the rest
• Move and rotate words
– http://www.cg.tuwien.ac.at/courses/InfoVis/HallOfFame/2012/Gruppe03/Homepage/index.html
– Koh et al TVCG (InfoVis) ‘10
Tag / Word Clouds: Conclusions
• Weaknesses
– Sub-optimal visual encoding (size vs. position)
– Inaccurate size encoding (long words are bigger)
– Font sizes are hard to compare
– May not facilitate comparison (unstable layout)
– Word frequency may not be meaningful
• Most use words vs. stems
– Does not show structure of the text
– Studies have even shown they underperform (Gruen et al CHI ’06)
• Why so popular?
– OK for “quick look”
– Serve as social signifiers that provide a friendly atmosphere and a point of entry into a complex site
– Act as individual and group mirrors
– Fun, not business-like
BTW - Text Analysis Tools: voyeur, http://voyeurtools.org/
• Book
• + tools for text analysis and visualization
Visualization and Information Retrieval
• Examples so far have focused on representing a single document
– …, or, really, a set of words, as there is no consideration of even word order, let alone sentence structure, etc.
• Principal question is how visual representations might aid text, or document, search
– I.e., how to find the proverbial needle in a haystack, where the haystack is all the documents on the WWW or in a digital library
– The term information retrieval refers to this search, and its history antedates computers
• IR entails:
– Determine information need
– Query formulation
– Retrieval
– Assessment of results
– Reformulation of query or even information need
– Repeat (until information need met)
Visualization and Information Retrieval
• Provide visual representations that help during this process
– Show the document collection visually, support browsing, …
• Even for determining information need!
– Show query results visually
– Show how query terms relate to results
– … any aspect
From Stasko, 2013
Evaluating Query Results: TileBars, Hearst, 1996
• Hearst points out that query responses do not include:
– How strong the match is
– How frequent each term is
– How each term is distributed in the document
– Overlap between terms
– Length of document
• Document ranking is opaque
• Inability to compare between results
• Input limits term relationships
TileBars: Overview
• Goal : Minimize time and effort for deciding which documents to view in detail
• Show the role of the query terms in the retrieved documents, making use of document structure
• Graphical representation of term distribution and overlap
• Simultaneously indicate:
– Relative document length
– Frequency of term sets in document
– Distribution of term sets with respect to the document and each other
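The data behind such a display can be sketched as a small grid: one row per query term set, one column per document segment, and each cell holding the number of hits. This is a simplified illustration only. It assumes fixed-size token segments as a stand-in for the document structure the slide mentions, and the name `tilebar` and the sample text are hypothetical.

```python
import re

def tilebar(text, term_sets, segment_size=50):
    """Return one row of hit counts per term set, one column per text segment."""
    tokens = re.findall(r"[a-z]+", text.lower())
    # Assumption: fixed-size segments stand in for real document structure.
    segments = [tokens[i:i + segment_size] for i in range(0, len(tokens), segment_size)]
    return [
        [sum(token in terms for token in segment) for segment in segments]
        for terms in term_sets
    ]

# Hypothetical document and query term sets, only for illustration.
doc = "law court ruling law network research network law"
bars = tilebar(doc, [{"law", "court"}, {"network"}], segment_size=2)
for terms, row in zip(["law/court", "network"], bars):
    print(f"{terms:>10}: {row}")
```

The number of columns reflects relative document length, darker (larger) cells reflect term frequency, and columns where both rows are non-zero show where the term sets overlap.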
From Stasko, 2013
TileBars: Screen
• TileBars screen:
From Stasko, 2013
TileBars: Document representation
• Visual representation of retrieved documents
• Video: TileBars-80mb-chi96_05_m1.mpeg
From Stasko, 2013
TileBars
• Video
TileBars: Conclusions
• Clearly visually provides the information intended about each document
• Ease/effort/time of comparison?
– Surely would improve with use
• … ?
Evaluating Query Results: Sparkler
• Abstract result documents more
– Havre et al InfoVis ‘01
• Show “distance” from query in order to give user better feel for quality of match(es)
• Also shows documents in responses to multiple queries
• Visualizing One Query
– Triangle – query
– Square – document
• Distance between query and documents represents their relevance
From Stasko, 2013
Sparkler
• Visualizing Multiple Queries
• Six queries here
• Bullseye allows viewer to select quality results
From Stasko, 2013
Sparkler
• Test Example
• Text Retrieval Conference (TREC-3) test document collection
• AP news stories from June 24–30, 1990
• TREC topic: Japan Protectionist Measures
• Sparkler found 16 of 17 relevant documents
From Stasko, 2013
Evaluating Query Results: RankSpiral
• Compare search results from different search engines
– Spoerri InfoVis ’04 poster
From Stasko, 2013
RankSpiral
• Color represents different search engines
From Stasko, 2013
Evaluating Query Results: ResultMaps
• Treemap-style vis for showing query results in a digital library
– Clarkson, Desai & Foley TVCG (InfoVis) ‘09
From Stasko, 2013
Representing Multiple Documents
Representing Multiple Documents
• Previously, have seen various techniques for comparing multiple documents that are results of query, i.e., a subset of all documents
• Also, may want to just show everything, and then let user do “manual search”, or user-directed search
• Such displays of all documents also support the type of search common in visual analytics
– Query, browse, connect, drill-down
• Will see:
– Parallel word clouds
– Tree layout of synonyms
– …
Multiple Documents: Parallel Tag Clouds
• Tag clouds increase size of word as f(frequency)
• Showing multiple documents as tag clouds allows visual inspection
– Automated and user directed, visual analytics
• Parallel Tag Clouds - name says it all
– Video - Collins et al VAST ‘09 – different circuit courts
– http://www.youtube.com/watch?v=rL3Ga6xBgLw
Multiple Documents: Do different district courts differ in cases they handle?
Multiple Documents: Counting Words: Overview & Timeline
• Ex., across speeches can count words
• State of the Union Addresses
• http://www.nytimes.com/ref/washington/20070123_STATEOFUNION.html?initialWord=iraq
• NY Times demo
Multiple Document Word Use: DocuBurst
• Sets of synonyms grouped together
– Uses WordNet – show words from a document in terms of their hypernym (ISA) links
– Size – # of leaves in subtree
– Hue – diff synsets of word
– Shade – frequency of use
• Demo, etc.
– http://vialab.science.uoit.ca/portfolio/docuburst-visualizing-document-content-using-language-structure
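The "size = number of leaves in subtree" encoding can be illustrated on a tiny IS-A hierarchy. The sketch below does not query WordNet; the `hypernyms` dictionary is a hand-made, hypothetical fragment, and `leaf_count` is an illustrative helper rather than part of DocuBurst.

```python
def leaf_count(tree, node):
    """Number of leaves in the subtree rooted at node (a leaf counts as 1)."""
    children = tree.get(node, [])
    if not children:
        return 1
    return sum(leaf_count(tree, child) for child in children)

# Hypothetical IS-A fragment (parent -> more specific terms), not taken from WordNet.
hypernyms = {
    "animal": ["dog", "cat", "bird"],
    "bird": ["sparrow", "eagle"],
}

for term in ["animal", "bird", "dog"]:
    print(term, leaf_count(hypernyms, term))
```

In a radial layout, a value like this would set the angular width of each node's wedge, so broader concepts get more room.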
FeatureLens
• Show patterns of words or n-grams – Don et al. CIKM ‘07
• Video
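Finding repeated word patterns starts with counting n-grams, i.e., contiguous runs of n tokens. A minimal sketch, assuming whitespace tokenization; `ngram_counts` and the sample sentence are illustrative and are not FeatureLens code.

```python
from collections import Counter

def ngram_counts(tokens, n):
    """Count every contiguous run of n tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

# Hypothetical sample text, only for illustration.
tokens = "the state of the union is the state of our nation".split()
bigrams = ngram_counts(tokens, 2)
trigrams = ngram_counts(tokens, 3)
print(bigrams.most_common(2))
print(trigrams.most_common(1))
```

Repeated n-grams such as ("the", "state", "of") are the kind of pattern a tool could then highlight by position across a document collection.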
Combinations of words, phrases, and sentences
Multiple Sentences: SeeSoft Display
• Originally for software visualization
• One line of text on each horizontal line
• Color highlight for attributes
– E.g., for software, how often modified, days since modification
– E.g., for text, where a particular word appears in a sentence
• Conversations might be revealed
• Detail view in pop up window
Multiple Sentences: TextArc - Simple Single Document Visualization
• Visualize an entire book
– Word appearances
– Sentences
– …
– http://textarc.org
• Sentences laid out on circumference in order of appearance in spiral
• Frequently occurring words inside spiral
• Selecting a word draws lines to the sentences containing that word
– A kind of “visual concordance”
• Significant interaction
TextArc
Concordances and Word Frequencies
• From field of literary analysis
• Concordance
– An alphabetical index of the principal words in a book or the works of an author with their immediate context
• Word of interest in center, with the text in which it appears to left and right
• As in KWIC
– Key word in context
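A key-word-in-context listing is easy to sketch: for each occurrence of the keyword, keep a few tokens to the left and right. The function name `kwic`, the `width` parameter, and whitespace tokenization are assumptions made for illustration.

```python
def kwic(tokens, keyword, width=3):
    """Return (left context, keyword, right context) for each occurrence, in text order."""
    rows = []
    for i, token in enumerate(tokens):
        if token == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            rows.append((left, token, right))
    return rows

tokens = "in the beginning god created the heaven and the earth".split()
rows = kwic(tokens, "the", width=2)
for left, word, right in rows:
    # Right-align the left context so the keyword lines up in a center column.
    print(f"{left:>12} | {word} | {right}")
```

Sorting `rows` by the right (or left) context would give the alphabetical ordering a printed concordance typically uses.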
Word Tree
• Shows context of a word or words
– Follow word with all the phrases that follow it
• Wattenberg & Viégas TVCG (InfoVis) ‘08
• Font size shows frequency of appearance
• Continue branch until hitting unique phrase
• Clicking on phrase makes it the focus
• Ordered alphabetically, by frequency, or by first appearance
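The structure behind a word tree is a prefix tree (trie) of the phrases that follow the chosen word, with a count at every node. The sketch below is an illustration under simple assumptions (lowercased, whitespace-split sentences) and is not the Many Eyes implementation; `word_tree` and the sample sentences are hypothetical, and it does not collapse unique branches into a single phrase as the real display does.

```python
def word_tree(sentences, root):
    """Build a trie of the words following `root`; each node has a count and children."""
    tree = {"count": 0, "children": {}}
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, token in enumerate(tokens):
            if token != root:
                continue
            node = tree
            node["count"] += 1
            for following in tokens[i + 1:]:
                node = node["children"].setdefault(
                    following, {"count": 0, "children": {}}
                )
                node["count"] += 1
    return tree

def show(node, label, depth=0):
    """Print the tree, most frequent branches first."""
    print("  " * depth + f"{label} ({node['count']})")
    for word, child in sorted(node["children"].items(), key=lambda kv: -kv[1]["count"]):
        show(child, word, depth + 1)

# Sample sentences, only for illustration.
sentences = [
    "if love be rough with you",
    "if love be blind",
    "if music be the food of love",
]
tree = word_tree(sentences, "if")
show(tree, "if")
```

The per-node count is what a renderer would map to font size, and re-rooting the tree at a clicked phrase amounts to calling `word_tree` again with a new root.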
Word Tree: Interaction
Word Tree: From King James Bible
Word Tree: Many Eyes
Finding Structure: Phrase Nets
Find Structure: Phrase Nets
• Concordances show local, repeated structure of word context
• Phrase Nets in Many Eyes, van Ham et al.
• Other types of patterns
– Lexical: <A> at <B>, <A> and <B>, <A> (is|are|was|were) <B>
– Syntactic: <Noun> <Verb> <Object>
• Visualize extracted patterns in a node-link view
– Occurrences -> Node size
– Pattern position -> Edge direction
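A lexical pattern such as <A> and <B> can be extracted with a regular expression, producing the directed edges and node weights a node-link view needs. This is a minimal sketch under stated assumptions, not the Many Eyes implementation: words are matched with `\w+`, text is lowercased, and `phrase_net` and the sample text are illustrative.

```python
import re
from collections import Counter

def phrase_net(text, connector="and"):
    """Return (edges, nodes): counts of A -> B pairs and of each word's occurrences."""
    # The lookahead captures B without consuming it, so "a and b and c"
    # yields both a -> b and b -> c.
    pattern = re.compile(
        r"\b(\w+)\s+" + re.escape(connector) + r"\s+(?=(\w+)\b)"
    )
    edges = Counter()
    nodes = Counter()
    for match in pattern.finditer(text.lower()):
        a, b = match.group(1), match.group(2)
        edges[(a, b)] += 1  # edge direction follows pattern position: A -> B
        nodes[a] += 1       # node size follows number of occurrences
        nodes[b] += 1
    return edges, nodes

# Hypothetical sample text, only for illustration.
text = "Pride and prejudice and sense and sensibility. War and peace. Pride and joy."
edges, nodes = phrase_net(text)
for (a, b), count in edges.items():
    print(f"{a} -> {b} ({count})")
```

Changing `connector` to another word (for example "at" or "begat") reuses the same extraction for the other lexical patterns listed above.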
Phrase Net (larger next slide)
Portrait of the Artist as a Young Man: <A> and <B>
Phrase Net
Portrait of the Artist as a Young Man: <A> and <B>
Phrase Nets
The Bible: <A> begat <B>
Phrase Nets
Old and New Testaments: <A> of <B>
Phrase Nets
(<A> and <B>) and (<A> at <B>)
End
References
• F. Viegas, M. Wattenberg, "Tag Clouds and the Case for Vernacular Visualization", interactions, Vol. 15, No. 4, Jul-Aug 2008, pp. 49-52.