Linked Books Giovanni Colavizza EPFL
Page 1: Linked Books - DH Venice Fall School 2014

Linked Books

Giovanni Colavizza EPFL

Page 2: Linked Books - DH Venice Fall School 2014

Motivation: a question

How do you find sources for humanities research?

How do you find literature for research in the “hard” sciences?

Page 3: Linked Books - DH Venice Fall School 2014

Motivation: the differences between humanities and “hard” sciences

• Primary and secondary sources
• Citation history (e.g. Google Scholar)
• Citation semantics

Page 4: Linked Books - DH Venice Fall School 2014

Motivation: primary and secondary sources

Approximately half of the citations in the humanities are to primary sources [Wiberley (2009)].

Their use has hardly ever been studied with citation analytic methods.

“For scholarship in the humanities there are three kinds of literature: primary literature that contains the evidence on which humanists base their scholarship, secondary literature in which humanists write up their scholarship, and access services that describe and index the publications written by humanists.” (Wiberley, 2009)

Page 5: Linked Books - DH Venice Fall School 2014

Motivation: citation history

Lack of data [Sula and Miller (2014)]. Why?
• Sparse and local sub-fields
• Nationality (language and schools)
• Proliferation of editorial practices

Page 6: Linked Books - DH Venice Fall School 2014

Motivation: citation semantics

• Humanists are less prone to credit each other than scientists [Heinzkill, 1980; Swales, 1990; Hellqvist, 2010].

• They are less prone to work together: a study by Linmans (2010) found an average of 1.06 authors per publication.

• They use citations in a great variety of ways and with many meanings: agreement, disagreement, full association, minor reference, etc. [Harwood (2008), Cano (1989)].

Examples:
• Strongly negative: “Professor Epstein’s comment presents no new findings and ignores the theoretical issues I raise.”, citing Epstein (2008). From Ogilvie (2008).
• Association: “it is not enough to scale down the structural aspects of the economic decline, which for Venice was in any case only “relative”, ..” (translated from Italian), citing Rapp (1979). From Trivellato (2000).

Page 7: Linked Books - DH Venice Fall School 2014

Motivation: our answer

Citation analysis for the humanities is an almost non-existent field, yet the results could be very rich.

We cannot simply use traditional citation analysis methods on humanities data. We need new questions and methods.

Page 8: Linked Books - DH Venice Fall School 2014

The project: goals

• Digitise as much of the historiography on Venice as we can (for now, limited to history).

• Extract all citations and populate a database.

• Analyse the history of the history of Venice and develop a framework for citation analysis for the humanities.

• Publish an open-access search engine for scholars and the general public.

Page 9: Linked Books - DH Venice Fall School 2014

The project: goals

“Side effects”: we will have the full text of most publications on Venice, and we are also digitising documents at the Archive. This makes possible:
• Indexes of keywords (e.g. named entities)
• Direct links between publications and sources
• Topic modelling and fine-grained classification of publications (currently at most Dewey subjects)
• Enhanced library catalogue

Page 10: Linked Books - DH Venice Fall School 2014

The project: partners and materials

Partnership with Ca’ Foscari Library System (humanities library) and discussion with major Venetian libraries.

Digitisation goal: digitise all secondary literature on Venice from the last 200 years (monographs, journals, editions, etc.). The current estimate is circa 5000 items (there are many more). Digitisation is ongoing (1513 items done as of last Friday).

Page 11: Linked Books - DH Venice Fall School 2014

Methods: overview

Page 12: Linked Books - DH Venice Fall School 2014

Methods I: data extraction

Page 13: Linked Books - DH Venice Fall School 2014

Methods I: data extraction

Page 14: Linked Books - DH Venice Fall School 2014

Methods I: data extraction

The steps:
• OCR
• Citation detection
• Citation parsing
• Model and populate the db (ontologies for citations)
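As an illustration of the last step, one way to model an extracted citation is as a subject-predicate-object triple. The sketch below is a minimal example, not the project's actual data model: the `example.org` URIs are hypothetical placeholders, and the choice of the CiTO `cites` property as predicate is an assumption. It serialises citing/cited pairs as N-Triples lines, a plain-text RDF format that triple stores can typically load.

```python
# Assumption: citations are modelled with the CiTO "cites" property.
CITES = "http://purl.org/spar/cito/cites"

def to_ntriples(citations):
    """Serialise (citing URI, cited URI) pairs as N-Triples lines."""
    lines = []
    for citing, cited in citations:
        lines.append(f"<{citing}> <{CITES}> <{cited}> .")
    return "\n".join(lines)

# Hypothetical identifiers: one publication citing another publication
# (secondary source) and an archival document (primary source).
citations = [
    ("http://example.org/pub/A", "http://example.org/pub/B"),
    ("http://example.org/pub/A", "http://example.org/source/doc1"),
]

ntriples = to_ntriples(citations)
print(ntriples)
```

Keeping primary and secondary sources under distinguishable URIs, as in this toy data, would let later queries separate the two kinds of citation.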

Basic tools:
• Active annotation for supervised learning (minimise training data to annotate)
• Conditional Random Fields for parsing
• RDF and triple stores as database
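To make the parsing step more concrete: a Conditional Random Field labels each token of a reference (author, title, place, year, and so on) from per-token features. The sketch below shows only the feature-extraction side, under assumptions of our own: the feature set, the tokeniser, and the function names `token_features` and `featurise` are illustrative, and the reference string is invented. Training the CRF itself would need an external library and annotated data, which are not shown.

```python
import re

def token_features(tokens, i):
    """Features for token i, including its immediate neighbours as context."""
    tok = tokens[i]
    return {
        "lower": tok.lower(),
        "is_capitalised": tok[:1].isupper(),
        "is_year": bool(re.fullmatch(r"(1[5-9]|20)\d{2}", tok)),
        "has_digit": any(c.isdigit() for c in tok),
        "is_punct": bool(re.fullmatch(r"\W+", tok)),
        "prev": tokens[i - 1].lower() if i > 0 else "<START>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "<END>",
    }

def featurise(reference):
    """Split a reference string into word and punctuation tokens, then featurise."""
    tokens = re.findall(r"\w+|[^\w\s]", reference)
    return tokens, [token_features(tokens, i) for i in range(len(tokens))]

# Invented reference string, for illustration only.
tokens, feats = featurise("A. Author, Some Title, Venezia, 1979, p. 12.")
for tok, f in zip(tokens, feats):
    print(tok, f["is_year"], f["is_punct"])
```

A sequence of such feature dictionaries, paired with hand-annotated labels, is the usual input shape for CRF toolkits; active annotation would then choose which references to label next.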

Page 15: Linked Books - DH Venice Fall School 2014

Methods II: citation analysis, networks

Network-based models. Recall the distinction between primary and secondary sources: how many graphs can we build?

Bibliographic coupling and co-citation
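Both measures can be derived from a plain list of citation edges. In the minimal sketch below, two citing works are bibliographically coupled when they share references, and two cited items are co-cited when the same work cites both. The identifiers (`P*` for publications, `S*` for primary sources, `B*` for cited books) and the data are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations

def coupling_and_cocitation(edges):
    """Return (bibliographic coupling, co-citation) pair counts from (citing, cited) edges."""
    refs = defaultdict(set)      # citing work -> items it cites
    cited_by = defaultdict(set)  # cited item -> works citing it
    for citing, cited in edges:
        refs[citing].add(cited)
        cited_by[cited].add(citing)

    # Coupling strength: number of references two citing works share.
    coupling = Counter()
    for a, b in combinations(sorted(refs), 2):
        shared = len(refs[a] & refs[b])
        if shared:
            coupling[(a, b)] = shared

    # Co-citation strength: number of works citing both items.
    cocitation = Counter()
    for a, b in combinations(sorted(cited_by), 2):
        shared = len(cited_by[a] & cited_by[b])
        if shared:
            cocitation[(a, b)] = shared
    return coupling, cocitation

# Toy data mixing primary sources (S*) and secondary literature (B*).
edges = [
    ("P1", "S1"), ("P1", "S2"), ("P1", "B1"),
    ("P2", "S1"), ("P2", "B1"),
    ("P3", "B1"), ("P3", "B2"),
]
coupling, cocitation = coupling_and_cocitation(edges)
print(coupling)
print(cocitation)
```

Restricting `edges` to primary sources only, or to secondary literature only, would yield separate graphs from the same data, which is one way to approach the question of how many graphs can be built.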

Page 16: Linked Books - DH Venice Fall School 2014

Methods II: citation analysis, networks

Page 17: Linked Books - DH Venice Fall School 2014

Methods II: citation analysis

Network-based models:
• Global analysis
• Local analysis (communities and nodes)
• Temporal analysis
• Publication classification and analysis

Big questions:
• Key works, authors, sources
• Disciplinary segmentations
• Measure intellectual influence and schools of thought
• Map scholarly debates
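A very rough first proxy for "key works", combined with a temporal view, is to count citations received, overall and per decade of the citing publication. The sketch below assumes hypothetical identifiers and publication years; raw counts are only a starting point and are not presented here as the project's measure of influence.

```python
from collections import Counter, defaultdict

def citation_counts(edges):
    """Total citations received by each cited item."""
    return Counter(cited for _, cited in edges)

def counts_by_decade(edges, year_of):
    """Citations received by each item, grouped by decade of the citing work."""
    per_decade = defaultdict(Counter)
    for citing, cited in edges:
        decade = year_of[citing] // 10 * 10
        per_decade[decade][cited] += 1
    return per_decade

# Hypothetical citation edges and publication years of the citing works.
edges = [("P1", "B1"), ("P2", "B1"), ("P3", "B1"), ("P3", "B2")]
year_of = {"P1": 1975, "P2": 1978, "P3": 2001}

totals = citation_counts(edges)
per_decade = counts_by_decade(edges, year_of)
print(totals.most_common(1))
for decade in sorted(per_decade):
    print(decade, dict(per_decade[decade]))
```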

Page 18: Linked Books - DH Venice Fall School 2014

Linked Books Thank you

Giovanni Colavizza EPFL