Dutch Book Trade 1660-1750: using the STCN to gain insight in publishers’ strategies

e-Humanities Group Research Meeting: STCN

2013/10/10 Wouter Beek

Albert Meroño Peñuela Rinke Hoekstra

Fernie Maas Inger Leemans

‘OPENING’ THE STCN

Open data

LINKING THE STCN

Linked Open Data

• Connect to existing datasets
• Connect to services
• Queries/inferences run across datasets

– The Picarta topic hierarchy allows us to infer that certain publications cover related topics.

– GeoNames gives the geographic coordinates (latitude and longitude) of publishing houses, allowing publishing decisions to be correlated with historical events.

– Lexvo / ISO standards allow translations to be traced via related languages (e.g. language families).

• Easy to create mashups / new applications.
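The Picarta bullet describes an inference that runs across two datasets: publication-to-topic facts from the STCN and a topic hierarchy from elsewhere. The following is a minimal sketch of that idea under stated assumptions. The identifiers, topic labels and the `ancestors`/`related` helpers are hypothetical and do not come from the real STCN or Picarta data. Two publications count as "related" here when their topics share a broader topic in the hierarchy.

```python
from __future__ import annotations

# Hypothetical toy data; not taken from the real STCN or Picarta datasets.
# STCN-side facts: publication -> topic it is indexed under.
STCN_TOPICS = {
    "pub:1": "topic:predestination",
    "pub:2": "topic:church-order",
    "pub:3": "topic:navigation",
}

# Hierarchy-side facts: topic -> its broader topic (a skos:broader-style tree).
BROADER = {
    "topic:predestination": "topic:theology",
    "topic:church-order": "topic:theology",
    "topic:theology": "topic:religion",
    "topic:navigation": "topic:geography",
}


def ancestors(topic: str) -> set[str]:
    """Return the topic plus every broader topic reachable from it (transitive closure)."""
    seen = {topic}
    while topic in BROADER:
        topic = BROADER[topic]
        if topic in seen:  # guard against cycles in messy source data
            break
        seen.add(topic)
    return seen


def related(pub_a: str, pub_b: str) -> bool:
    """Infer that two publications cover related topics if their topics share an ancestor."""
    return bool(ancestors(STCN_TOPICS[pub_a]) & ancestors(STCN_TOPICS[pub_b]))
```

With this toy data, `related("pub:1", "pub:2")` holds because both topics fall under `topic:theology`. Neither dataset states that fact on its own.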

[Figure: linked-data example showing “same as” links from the STCN to the Biografisch Portaal and a “died in” relation]
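“Same as” links of this kind are usually stored as a linkset of `owl:sameAs` triples. The sketch below generates such a linkset by naive name matching. It assumes that a normalized personal name (lower-cased, accents stripped, whitespace collapsed) is a usable match key. It also assumes that ambiguous matches should be left for someone with domain knowledge. The IRIs are hypothetical `example.org` placeholders, not real STCN or Biografisch Portaal identifiers.

```python
from __future__ import annotations

import unicodedata

OWL_SAME_AS = "<http://www.w3.org/2002/07/owl#sameAs>"


def normalize(name: str) -> str:
    """Lower-case, strip accents and collapse whitespace so spelling variants compare equal."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(no_accents.lower().split())


def linkset(local: dict[str, str], external: dict[str, str]) -> list[str]:
    """Emit owl:sameAs triples (N-Triples) for entities whose normalized names match exactly once."""
    index: dict[str, list[str]] = {}
    for iri, name in external.items():
        index.setdefault(normalize(name), []).append(iri)
    links = []
    for iri, name in local.items():
        candidates = index.get(normalize(name), [])
        if len(candidates) == 1:  # ambiguous matches need domain knowledge; skip them
            links.append(f"<{iri}> {OWL_SAME_AS} <{candidates[0]}> .")
    return links
```

A real linkset would need more evidence than names alone, such as life dates or places of activity. String matching only supplies candidates.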

Taking the STCN to the Semantic Web

• 139,817 publications (4M facts)
• 23,543 authors (120K facts)
• 9,959 printers (55K facts)
• 37K enriched concepts (DBpedia, Yago, Heidelberg Diglit, …)
• 105 topics (1K facts)
• Relate to international standards (GGC/OCLC/ISO/RFC/IANA)
• Make the schema explicit (vocabulary)

[Figure: conversion pipeline from source files to Linked Open Data]

Input sources:
• Relational DB: domain knowledge needed
• RDF files
• Text files: ambiguous
• XML files: depends on structure; domain knowledge

Conversion steps:
• Domain-independent data conversions: fully automated, yielding simple RDF
• Domain-dependent data conversions: domain knowledge needed
• Link to external sources (linksets): domain knowledge needed
• Fixing bad data (origin: inconsistencies & inaccuracies)
• Connect to services (e.g. query interface, maps): high level of reuse
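The "domain-independent, fully automated" conversion step in this pipeline can be illustrated with a generic row-to-RDF mapping. It is loosely modelled on the W3C Direct Mapping: one subject IRI per row and one predicate IRI per column. The mapping knows nothing about books or printers, which is why it can run unattended and why its output is only "simple RDF". The namespace `http://example.org/stcn/` and the helper names are hypothetical. Output is N-Triples written by hand, so no RDF library is assumed.

```python
from __future__ import annotations

from urllib.parse import quote

BASE = "http://example.org/stcn/"  # hypothetical namespace, not the real STCN IRIs


def literal(value: str) -> str:
    """Serialize a plain string as an N-Triples literal, escaping backslashes, quotes and line breaks."""
    escaped = (value.replace("\\", "\\\\")
                    .replace('"', '\\"')
                    .replace("\n", "\\n")
                    .replace("\r", "\\r"))
    return f'"{escaped}"'


def row_to_ntriples(table: str, key: str, row: dict[str, str]) -> list[str]:
    """Domain-independent conversion: one subject per row, one predicate per column."""
    subject = f"<{BASE}{quote(table)}/{quote(row[key])}>"
    triples = []
    for column, value in row.items():
        if column == key or not value:
            continue  # the key is already in the subject IRI; empty cells yield no triple
        predicate = f"<{BASE}{quote(table)}#{quote(column)}>"
        triples.append(f"{subject} {predicate} {literal(value)} .")
    return triples
```

Everything beyond this step needs domain knowledge. Examples are typing the year as a date, recognizing that a name column refers to a printer, and linking that printer elsewhere.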

FROM THE LIBRARY TO THE LAB

“How many publications by Arminius?”

“How many publications by Gomarus?”

What happens to the average publication format after 1619?

Measured in terms of the number of folds (before 1619 → after 1619):
• Works by Arminius: 5.6 → 5.7
• Works by Gomarus: 6.8 → 4.9
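The two library-style questions (counts per author) and the lab-style question (average format before and after 1619) differ mainly in the aggregation applied. The sketch below shows both over a handful of hypothetical records; these are not real STCN figures. It makes two assumptions. First, the format is recorded as the number in the bibliographic format designation (4 for 4°, 8 for 8°). Second, works from 1619 itself count as "before".

```python
from collections import Counter
from statistics import mean

# Hypothetical toy records: (author, year, format number). NOT real STCN data.
RECORDS = [
    ("Arminius", 1609, 4),
    ("Arminius", 1625, 8),
    ("Gomarus", 1610, 8),
    ("Gomarus", 1630, 4),
    ("Gomarus", 1640, 4),
]

CUTOFF = 1619  # years up to and including 1619 count as "before" (an assumption)


def publications_per_author(records):
    """Library-style question: how many publications per author?"""
    return Counter(author for author, _, _ in records)


def mean_format(records, author, cutoff=CUTOFF):
    """Lab-style question: average format number before and after the cut-off year."""
    before = [fmt for a, year, fmt in records if a == author and year <= cutoff]
    after = [fmt for a, year, fmt in records if a == author and year > cutoff]
    return (mean(before) if before else None, mean(after) if after else None)
```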

Distant reading!

Methodological implications

From searching for resources (librarian) to validating/refuting hypotheses (scientist)

Humanities + R (statistical software)

A WEB SERVICE FOR RESEARCH INVOLVING DISTANT READING

Open issues 0: institutional hurdles

• The products of publicly funded research (papers & datasets) should be publicly available.
– Not everybody makes their data publicly available.

• Distant reading research is often restricted by the user interface.

Open issues 1: meaning

A large percentage of the data has no/unknown meaning:
• “before 1808”
• “This book was published between the Big Bang and 1808.”

Context-dependent:
• “The first dinosaur walked the earth before 300M years BC.”
• “Einstein came up with the idea of general relativity before 1937.”

Fuzziness:
• “James Joyce’s Ulysses was published before 1925.”
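One way to make the meaning problem concrete is to read a date like “before 1808” as an interval of candidate years. The sketch below assumes that “before YYYY” excludes YYYY itself. It leaves the lower bound open unless a context-dependent bound is supplied, for example a printer's earliest known activity (a hypothetical input). An open lower bound is exactly the “between the Big Bang and 1808” reading. `parse_before` and `width` are illustrative helpers, not part of any STCN tooling.

```python
import re
from typing import Optional, Tuple

# (earliest, latest) candidate year; None means the bound is unknown.
Interval = Tuple[Optional[int], Optional[int]]


def parse_before(expr: str, lower_bound: Optional[int] = None) -> Interval:
    """Turn 'before YYYY' into an interval; the lower bound must come from context."""
    match = re.fullmatch(r"\s*before\s+(\d{1,4})\s*", expr, flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"not a 'before YYYY' expression: {expr!r}")
    upper = int(match.group(1)) - 1  # assumption: 'before 1808' excludes 1808 itself
    if lower_bound is not None and lower_bound > upper:
        raise ValueError("context lower bound lies after the upper bound")
    return (lower_bound, upper)


def width(interval: Interval) -> Optional[int]:
    """Number of candidate years, or None when the interval is unbounded."""
    lo, hi = interval
    return None if lo is None or hi is None else hi - lo + 1
```

Without context, `width(parse_before("before 1808"))` is `None`, so no meaningful average or histogram can use that record. With a lower bound of 1800 it becomes an eight-year window. Fuzziness, as in the Ulysses example, is a separate problem: the bound is formally correct but far from the actual date.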

Open issues 2: statistics

• Which query results are statistically relevant?
• How can we detect whether a statistically significant difference reflects reality rather than the way in which the dataset was constructed?
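For the first question, a permutation test is one standard, assumption-light option. The sketch below applies it to a difference in mean format between two groups of publications, such as before and after 1619. It is written in Python rather than R only to keep the examples in one language. The function name and defaults are illustrative. The test does not address the second question: a bias introduced by how the dataset was built would still show up as a "significant" difference.

```python
import random
from statistics import mean


def permutation_test(a, b, n_resamples=10_000, seed=0):
    """Two-sided permutation test for a difference in means; returns an estimated p-value.

    Group labels are shuffled repeatedly; the p-value is the share of shuffles whose
    absolute mean difference is at least as large as the observed one.
    """
    rng = random.Random(seed)  # fixed seed for reproducibility
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    # +1 correction so the estimate is never exactly zero
    return (hits + 1) / (n_resamples + 1)
```

For example, two identical samples give a p-value of 1.0. Ten quartos against ten octavos give a p-value well below 0.01.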
