From keyword searching to discourse mining Pim Huijnen, Juliette Lonij Encounters between the Humanities and Computing, Utrecht University, 18 February 2016


Apr 12, 2017



Page 1: From Keyword Searching to Discourse Mining

From keyword searching to discourse mining

Pim Huijnen, Juliette Lonij

Encounters between the Humanities and Computing, Utrecht University, 18 February 2016

Page 2

The Keyword Problem

Page 3

From: The Barre Daily Times, 22 January 1913, p. 1

Page 4

Dictionary searching

Using extensive, context-specific word lists (‘dictionaries’) to overcome the contingency of single keywords
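A minimal Python sketch of dictionary matching, assuming that a trailing `*` acts as a prefix wildcard (as in the queries on the following slides, e.g. `eigenschap*`) and that a text matches a term when any of its word tokens does. The function names and the sample sentence are hypothetical illustrations, not the project's own code.

```python
import re


def tokenize(text: str) -> list[str]:
    """Split a text into lowercase word tokens."""
    return re.findall(r"\w+", text.lower())


def term_matches(term: str, token: str) -> bool:
    """A trailing '*' is treated as a prefix wildcard; otherwise the match is exact."""
    if term.endswith("*"):
        return token.startswith(term[:-1])
    return token == term


def matched_terms(dictionary: list[str], text: str) -> set[str]:
    """Return the dictionary terms that occur at least once in the text."""
    tokens = set(tokenize(text))
    return {term for term in dictionary
            if any(term_matches(term, tok) for tok in tokens)}


if __name__ == "__main__":
    dictionary = ["maatregel", "nageslacht", "eigenschap*", "aanleg",
                  "theorie", "bloed", "invloed"]
    # Hypothetical sentence: "A measure to protect the offspring and its hereditary traits."
    text = "Een maatregel ter bescherming van het nageslacht en zijn erfelijke eigenschappen."
    print(sorted(matched_terms(dictionary, text)))
    # prints ['eigenschap*', 'maatregel', 'nageslacht']
```

The size of the returned set can then serve as a simple score, so that a text combining several dictionary words ranks above a text that contains just one keyword.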

Page 5

Why dictionary searching?

…to trace discursive shifts that are expressed in combinations of words rather than in individual words

…to trace the persistence of discourses

Page 6

Eugenics in Dutch newspapers (?)

Query:

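maatregel nageslacht eigenschap* aanleg theorie bloed invloed
(measure, offspring, trait*, predisposition, theory, blood, influence)

NOT eugenetica eugenetiek eugeniek eugenese ras*
(Dutch variants of ‘eugenics’; ras* = race)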
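The query looks for texts that use eugenic vocabulary while avoiding the explicit labels. A minimal local sketch of that logic, assuming a text must contain at least `min_hits` distinct include terms (the slide states no threshold) and none of the NOT terms. It illustrates the idea only and does not reproduce how Delpher itself parses the query.

```python
import re

# Terms copied from the query above; '*' is read as a prefix wildcard (assumption).
INCLUDE = ["maatregel", "nageslacht", "eigenschap*", "aanleg",
           "theorie", "bloed", "invloed"]
EXCLUDE = ["eugenetica", "eugenetiek", "eugeniek", "eugenese", "ras*"]


def _matches(term: str, token: str) -> bool:
    return token.startswith(term[:-1]) if term.endswith("*") else token == term


def hits(terms: list[str], text: str) -> set[str]:
    """Distinct terms from `terms` that occur in `text`."""
    tokens = set(re.findall(r"\w+", text.lower()))
    return {t for t in terms if any(_matches(t, tok) for tok in tokens)}


def implicit_discourse(text: str, min_hits: int = 3) -> bool:
    """True if the text uses at least `min_hits` include terms and no exclude term.

    `min_hits` is a hypothetical threshold, not taken from the slide.
    Note that 'ras*' also excludes unrelated words that start with 'ras'.
    """
    return len(hits(INCLUDE, text)) >= min_hits and not hits(EXCLUDE, text)


if __name__ == "__main__":
    # "Predisposition and blood determine the offspring."
    print(implicit_discourse("Aanleg en bloed bepalen het nageslacht."))       # True
    # Same sentence, but with the excluded word 'ras' (race).
    print(implicit_discourse("Aanleg, bloed en ras bepalen het nageslacht."))  # False
```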

Page 7

Eugenics after eugenics

(Gereformeerd Gezinsblad, 1965)

(De Tijd, 1952)

Page 8

Efficiency before efficiency

Query: "product* machine* verspilling bedrijf goedkoop kwaliteit" (product*, machine*, waste, company, cheap, quality), 01-01-1890 through 31-12-1940

(1901) (1906)
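A sketch of the same idea with the date restriction applied locally. It assumes hypothetical records of (publication date, text), reads `t/m` (‘tot en met’) as an inclusive end date, and uses an assumed `min_hits` threshold; none of these names come from Delpher.

```python
import re
from datetime import date, datetime

# Terms from the efficiency query; '*' is read as a prefix wildcard (assumption).
TERMS = ["product*", "machine*", "verspilling", "bedrijf", "goedkoop", "kwaliteit"]


def parse_nl_date(s: str) -> date:
    """Parse a dd-mm-yyyy date as written in the query, e.g. '01-01-1890'."""
    return datetime.strptime(s, "%d-%m-%Y").date()


def _matches(term: str, token: str) -> bool:
    return token.startswith(term[:-1]) if term.endswith("*") else token == term


def count_hits(text: str) -> int:
    """Number of distinct TERMS occurring in the text."""
    tokens = set(re.findall(r"\w+", text.lower()))
    return sum(1 for t in TERMS if any(_matches(t, tok) for tok in tokens))


def select(records, start="01-01-1890", end="31-12-1940", min_hits=2):
    """Keep (date, text) records inside the inclusive range with enough term hits."""
    lo, hi = parse_nl_date(start), parse_nl_date(end)
    return [(d, text) for d, text in records
            if lo <= d <= hi and count_hits(text) >= min_hits]


if __name__ == "__main__":
    records = [  # hypothetical records
        (date(1901, 5, 1), "Goedkoop productie zonder verspilling"),  # kept
        (date(1950, 1, 1), "Goedkoop en productief"),                 # outside range
        (date(1906, 3, 3), "Kwaliteit"),                              # only one hit
    ]
    print(select(records))
```

Exact matching will miss historical spellings (the Delpher query further on includes the pre-reform form ‘werkverdeeling’), which is one reason to add wildcards or spelling variants to a dictionary.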

Page 9

Developing a script to extract dictionaries from literature

Experimenting with tools to visualise results of dictionary searching in kranten.delpher.nl

KB (National Library of the Netherlands) researcher-in-residence project

Page 10

Script to extract dictionaries

(Diagram: topic modeling and TF-IDF, with parts labelled A and B)
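The slide names topic modeling and TF-IDF as components of the dictionary-extraction script. The sketch below covers only the TF-IDF part, under an assumption about how it might be used: candidate dictionary words are the terms with the highest TF-IDF summed over a set of topic-relevant ‘target’ texts, with document frequencies counted over the target texts plus a background corpus. The function name and inputs are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    return re.findall(r"\w+", text.lower())


def tfidf_terms(target_docs: list[str], background_docs: list[str],
                top_n: int = 10) -> list[str]:
    """Rank candidate dictionary words by TF-IDF summed over the target documents.

    Document frequency is counted over target + background documents, so words
    that occur everywhere (e.g. 'de', 'en') get an IDF of zero or close to it.
    """
    doc_sets = [set(tokenize(d)) for d in target_docs + background_docs]
    n_docs = len(doc_sets)
    df = Counter()
    for tokens in doc_sets:
        df.update(tokens)

    scores = Counter()
    for doc in target_docs:
        tokens = tokenize(doc)
        if not tokens:
            continue
        for term, count in Counter(tokens).items():
            tf = count / len(tokens)             # relative term frequency
            idf = math.log(n_docs / df[term])    # inverse document frequency
            scores[term] += tf * idf
    return [term for term, _ in scores.most_common(top_n)]


if __name__ == "__main__":
    target = ["de efficiency van de fabriek", "efficiency en rendement in de fabriek"]
    background = ["de kat en de hond", "de trein van zeven uur"]
    print(tfidf_terms(target, background, top_n=4))
```

In practice the top-ranked terms would still need manual review before going into a dictionary; TF-IDF only proposes candidates.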

Page 11

Script to extract dictionaries

(Diagram with parts labelled B and C)

Page 12

Visualising results of dictionary searches in Delpher

Use an OR-query to search Delpher

Visualise the results on the basis of Solr’s relevance score (minimum number of matching words)

(arbeid* OR bedrij* OR beheer OR controle* OR factor* OR functie* OR kost* OR leiding* OR loon* OR maatregel* OR management OR methode* OR model* OR norm* OR organisatie* OR plannen OR prijs OR productie OR rationeel OR rendement OR reorganisatie OR statistiek OR taylor OR tijd OR werkbesparing OR werkverdeeling)
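A sketch of this step run locally: it builds an OR-query string in the format shown above and, instead of relying on Solr’s relevance score, counts per year the documents that contain at least `min_words` distinct dictionary terms. The threshold, the `(year, text)` record format and the function names are assumptions. Solr’s DisMax query parsers offer an `mm` (‘minimum should match’) parameter for a similar purpose; whether Delpher exposes it is not stated here.

```python
import re
from collections import Counter


def build_or_query(terms: list[str]) -> str:
    """Join dictionary terms into one OR-query string, as on the slide."""
    return "(" + " OR ".join(terms) + ")"


def _match(term: str, token: str) -> bool:
    # Trailing '*' read as a prefix wildcard (assumption).
    return token.startswith(term[:-1]) if term.endswith("*") else token == term


def n_matched(terms: list[str], text: str) -> int:
    """Number of distinct dictionary terms found in the text."""
    tokens = set(re.findall(r"\w+", text.lower()))
    return sum(1 for t in terms if any(_match(t, tok) for tok in tokens))


def hits_per_year(terms, records, min_words=3) -> Counter:
    """Count (year, text) records that match at least `min_words` distinct terms."""
    counts = Counter()
    for year, text in records:
        if n_matched(terms, text) >= min_words:
            counts[year] += 1
    return counts


if __name__ == "__main__":
    terms = ["arbeid*", "beheer", "organisatie*", "productie", "rationeel", "taylor"]
    print(build_or_query(terms))
    records = [  # hypothetical records
        (1920, "Taylor en de organisatie van arbeid"),
        (1920, "Het weer"),
        (1930, "Rationeel beheer van de productie"),
    ]
    print(sorted(hits_per_year(terms, records).items()))
```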

Page 13

kbresearch.nl/dictionary

Page 14

kbresearch.nl/dictionary

Page 15

Challenges

Running an OR-query of 25 or (preferably) more words on a dataset of more than 100,000,000 documents

Accounting for particularities of the corpus:
* number of newspaper titles per year
* changes in newspaper titles over the years
* changes in article length over the years

Getting an idea of which exact combination of words occurs in the visualised results
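Several of these challenges can be sketched as simple normalisations and tallies, shown here as one common approach rather than the project’s solution: dividing yearly hit counts by the number of documents searched that year (so a growing corpus is not mistaken for a growing discourse), computing matches per 1,000 words to offset changing article length, and counting which exact set of dictionary terms each result matched. All function names and inputs are hypothetical.

```python
from collections import Counter


def relative_frequency(hits_per_year: dict[int, int],
                       docs_per_year: dict[int, int]) -> dict[int, float]:
    """Share of searched documents per year that matched the dictionary."""
    return {year: hits_per_year.get(year, 0) / n
            for year, n in docs_per_year.items() if n > 0}


def per_thousand_words(n_matches: int, n_words: int) -> float:
    """Dictionary matches per 1,000 words, offsetting changes in article length."""
    return 1000 * n_matches / n_words if n_words else 0.0


def combination_counts(matched_sets) -> Counter:
    """Tally which exact combination of dictionary terms each result contained."""
    return Counter(frozenset(terms) for terms in matched_sets)


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    print(relative_frequency({1901: 5, 1906: 2}, {1901: 100, 1906: 50}))
    print(per_thousand_words(3, 600))
    results = [{"taylor", "rendement"}, {"rendement", "taylor"}, {"loon*"}]
    for combo, n in combination_counts(results).most_common():
        print(sorted(combo), n)
```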