Reusing Legacy data: Irish Historic Vital Registration Data, 1864-1913. Dolores Grant, Dr Ciara Breathnach, Dr Sandra Collins, Rebecca Grant. Irish Record Linkage 1864-1913

Reusing Legacy data: Irish Historic Vital Registration Data, 1864-1913

Jul 08, 2015


Presentation given by Dolores Grant at the European Society of Historical Demography conference, Alghero, Sardinia, 26 September 2014
Transcript
Page 1: Reusing Legacy data: Irish Historic Vital Registration Data, 1864-1913

Reusing Legacy data: Irish Historic Vital Registration Data, 1864-1913

Dolores Grant

Dr Ciara Breathnach, Dr Sandra Collins, Rebecca Grant

Irish Record Linkage 1864-1913

Page 2

Irish Record Linkage project 1864-1913

Irish Record Linkage is an Irish Research Council funded project running until December 2015

To construct a Knowledge Platform by applying semantic technologies to vital-registration data generously shared by the Office of the Registrar General

To address research queries around infant and maternal mortality rates and patterns in Dublin

Page 3

Irish Record Linkage project 1864-1913

Collaboration between the Digital Repository of Ireland at the Royal Irish Academy, the University of Limerick and Insight@NUI Galway

Principal Investigators: Dr Ciara Breathnach (UL), Dr Sandra Collins (DRI), Dr Stefan Decker (Insight)

Project Team: Dr Brian Gurrin (UL), Dr Christophe Debruyne (Insight/DRI), Dr Oya Beyan (Insight), Rebecca Grant (DRI), Dolores Grant (DRI)

Page 4

University of Limerick

Its mission is to promote and advance learning and knowledge through teaching, research and scholarship in an environment which encourages innovation and upholds the principles of free enquiry and expression.

The Faculty of Arts, Humanities and Social Sciences prides itself on the quality of its teaching and its commitment to research and places a strong emphasis on the role of debate and discussion in the development of knowledge and analytical skills.

Page 5

The Digital Repository of Ireland

Based in the Royal Irish Academy (Ireland's Academy for the Sciences and Humanities)

DRI is a trusted digital repository for Humanities and Social Sciences data

Linking and preserving the rich data held by Irish institutions, providing a central internet access point and multimedia tools

Focal point for the development of national guidelines and policy for digital preservation and access

Page 6

INSIGHT@NUI Galway

Insight brings together leading Irish academics from 5 of Ireland's leading research centres (DERI, CLARITY, CLIQUE, 4C, TRIL), in key areas of priority research including:

The Semantic Web, Sensors and the Sensor Web, Social network analysis, Decision Support and Optimization, and Connected Health.

Page 7

1845: Registration of marriages act was introduced to gather official statistics of marriages of the established Church of Ireland

1864: the first year Births, Deaths and Marriages (including Catholic Marriages) were registered following the establishment of a complete Irish civil registration system in 1863

Ireland 1864-1912: 2.9 million birth records, 4.9 million death records, 3.18 million marriages

Dublin 1864-1912: 609,720 birth records, 537,635 death records, 330,605 marriage records (1845-1913)

Irish Historic Vital Registration Data

Page 8

The Linked Data Concept

A method of publishing structured data on the Web, allowing it to be connected and enriched, and facilitating linking between related resources.

A key principle of Linked Data is that HTTP URIs are used to name the semantic elements of the dataset

Linked Data standards such as RDF allow semantic definitions to be applied to information, using statements called ‘triples’ in the form subject, predicate, object.

Page 9

The Linked Data Concept

This example describes the subject (James Joyce) and his relationship (predicate) to an object (Dublin). By semantically separating the elements of the information (that James Joyce was born in Dublin) datasets stored in this way can be easily queried.
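
As a rough sketch of the idea, the triple can be held as a (subject, predicate, object) tuple and queried by pattern. The URIs below are placeholders invented for illustration, not identifiers used by the project, and the `match` helper is a hypothetical stand-in for a real query language such as SPARQL.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]

# Placeholder URIs for illustration only; not the project's identifiers.
JOYCE = "http://example.org/person/James_Joyce"
DUBLIN = "http://example.org/place/Dublin"
IRELAND = "http://example.org/place/Ireland"
BORN_IN = "http://example.org/vocab/bornIn"
LOCATED_IN = "http://example.org/vocab/locatedIn"

# Each statement is one triple: subject, predicate, object.
triples: List[Triple] = [
    (JOYCE, BORN_IN, DUBLIN),
    (DUBLIN, LOCATED_IN, IRELAND),
]


def match(store, subject=None, predicate=None, obj=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for (s, p, o) in store
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    ]


# "Who was born in Dublin?" is a pattern with an unknown subject.
born_in_dublin = [s for (s, _, _) in match(triples, predicate=BORN_IN, obj=DUBLIN)]
print(born_in_dublin)
```

Because subject, predicate and object are stored separately, the same data answers other questions (for example, everything known about Dublin) without restructuring.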

Page 10

Vital registration data: birth, death, marriage records for Dublin

TIFF images of pre-digitised indexes and registers of birth, death and marriage

General Register Office database for these records

General Register Office Data

Page 11
Page 12

Marriage Records

Register TIFF Index TIFF System 1845-1901 System 1902-c.1912

Registrar’s District Registration District District District

Marriage solemnised at

Parish

Union

County County County

Province Province

Number in register Entry number

When married Year of event Year of event, Date of marriage

When registered Returns year Returns year

Returns quarter Returns quarter

Name and surname Name Forename, Surname Forename, Surname

Partner’s surname

Age

Sex

Condition

Rank or profession

Residence at the time of marriage

Father’s name and surname

Rank or profession of father

Celebrant

Witnesses

Signature of Registrar

Signature of Superintendent Registrar and date

Stamp Number Stamp number Stamp number

Volume number Returns volume number Returns volume number

Page number Page number Returns page number Returns Page number

Stamped number Page ID Page ID

2nd Stamped number

Index entry number Index entry number

Index page number

Page 13

Birth Records

Register TIFF Index TIFF System Pre 1900 System Post 1900

Superintendent Registrar’s District

Registrar’s District Registration district District District

Union

County County County

Province Province

Number in register Entry number

Date & place of birth Year of event Date of birth, year of event

Name (if any) Name Forename, Surname Forename, Surname

Sex Sex

Name, surname & dwelling place of father

Name & surname & maiden surname of mother

Mother’s maiden name

Rank or profession of father

Signature, qualification, and residence of informant

When Registered Returns year Returns year

Returns quarter Returns quarter

Signature of Registrar

Signature of Superintendent Registrar and date

Baptismal name if added after registration of birth and date

Stamp Number Stamp number Stamp number

Volume number Returns volume number Returns volume number

Page number Page number Returns page number Returns page number

Stamped number Page ID

2nd Stamped number

Index entry number

Index page number

Page 14

Death Records

Register TIFF Index TIFF System

Superintendent Registrar’s District

Registrar’s District Registration District District

District

Union

County County

Province

Number in register

Date and place of death Year of event

Name and surname Name Forename, Surname

Sex

Condition

Age last birthday Age Age at death

Rank, profession or occupation

Certified cause of death and duration of illness

Signature, qualification and residence of informant

When registered Returns year

Returns quarter

Signature of Registrar

Signature of Superintendent Registrar and date

Stamp number Stamp number

Volume number Returns volume number

Page number Page number Returns page number

Stamped number Page ID

2nd Stamped number

Index entry number

Index page number

Page 15

Research Questions

Identifying the record fields that are necessary to maintain the archival authenticity of the records and answer the research questions:

• How many women died within 42 days following childbirth due to complications related to labour and how does that figure correspond with the official reports?

• Which women died of causes that can be attributed to maternal death, but for which no corresponding birth certificate exists?

• How did various socio-economic conditions affect maternal and infant mortality rates?

Page 16

Competency questions to construct the Ontology

ID Competency Question

C01 Women died within 42 days after giving birth (the date of birth counted as day 1 and day 42 is included)

C02 Women died within 42 days after giving birth AND in their death certificate ‘complication 1’ is mentioned.

C03 Women died within 42 days after giving birth AND in their death certificate ‘complication 2’ is mentioned.

C04 Women having official maternal death reports including “XXXX”

C05 Women having official maternal death reports including “cause 1”

C06 Women having official maternal death reports including “cause 2 and cause 3 together”

C07 For each record in C04 find the ones with corresponding birth record (the date of death counted as day 1 and day 42 is included)
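
The counting rule in C01 (the date of birth is day 1 and day 42 is included) can be pinned down in a few lines. This is a minimal sketch of that rule only; the function names are hypothetical and the project's actual queries are not shown in the slides.

```python
from datetime import date, timedelta


def day_number(start: date, event: date) -> int:
    """Day count in which the start date itself is day 1."""
    return (event - start).days + 1


def died_within_42_days(birth: date, death: date) -> bool:
    """True when the death falls on day 1 to day 42 after giving birth."""
    return 1 <= day_number(birth, death) <= 42


birth = date(1900, 8, 8)
print(died_within_42_days(birth, birth))                       # day 1
print(died_within_42_days(birth, birth + timedelta(days=41)))  # day 42
print(died_within_42_days(birth, birth + timedelta(days=42)))  # day 43
```

Counting the birth date as day 1 means the window covers 41 elapsed days, not 42, which is the kind of off-by-one detail a competency question needs to fix before the ontology is built.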

Page 17

[Diagram] Components: GRO Database, GRO Triplestore, GRO Ontology, Digital Archivist

Pipeline steps: extract, Transform (creation of RDF triples), load

Relationship labels: described by, consulted by, amends/curates

Storage Model: Metadata that can be queried declaratively with a W3C standard
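
The transform step, in which a row extracted from the GRO database becomes RDF triples, might look roughly like the sketch below. The predicate names (`irl:on`, `irl:name`, `irl:mother`, `irl:place`) are the ones shown later in the slides; the column names, the mapping table and the `row_to_triples` helper are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical database column names mapped to the predicates shown in the slides.
COLUMN_TO_PREDICATE: Dict[str, str] = {
    "date_of_birth": "irl:on",
    "forename": "irl:name",
    "mother": "irl:mother",
    "place": "irl:place",
}


def row_to_triples(row: Dict[str, str]) -> List[Triple]:
    """Turn one birth-record row into triples; empty columns are skipped."""
    subject = f"<#{row['record_id']}>"
    triples: List[Triple] = [(subject, "a", "irl:BirthRecord")]
    for column, predicate in COLUMN_TO_PREDICATE.items():
        value = row.get(column)
        if value:
            triples.append((subject, predicate, f'"{value}"'))
    return triples


row = {
    "record_id": "B000-001",
    "date_of_birth": "1900-08-08",
    "forename": "James",
    "mother": "Mary Murphy",
    "place": "Castle Road",
}
for s, p, o in row_to_triples(row):
    print(s, p, o)
```

Keeping the column-to-predicate mapping in one table means the archivist can amend it without touching the transform logic.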

Page 18

GRO Triplestore

Triplestore 2 Data Analysis

Transformation from one model to another

• SPIN – SPARQL Inference

• SWRL / RuleML

• SPARQL Construct

• …

SEPARATION OF CONCERNS

GRO Records annotation vs. Data Analysis

Page 19

<#B000-001> a irl:BirthRecord;
    irl:on "1900-08-08";
    irl:name "James";
    irl:mother "Mary Murphy";
    irl:place "Castle Road"; ...

<#B010-022> a irl:BirthRecord;
    irl:on "1902-04-19";
    irl:name "Patrick";
    irl:mother "Mary Murphy";
    irl:place "Castle Road"; ...

<#B022-051> a irl:BirthRecord;
    irl:on "1904-09-20";
    irl:name "Agnes";
    irl:mother "Mary Murphy";
    irl:place "Convent Hill"; ...

<#B050-003> a irl:BirthRecord;
    irl:on "1905-02-18";
    irl:name "Michael"; ...

#1 Mary Murphy, #2 Mary Murphy, #3 Mary Murphy, #4 Mary Murphy

owl:sameAs, owl:sameAs, owl:sameAs

TRANSFORMATION

ONTOLOGY MATCHING

All generated are stored separately for data analytics ...
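
One naive way to propose owl:sameAs links between the mothers named on different birth records is sketched below, using only the three records shown in full on this slide. The matching rule (identical mother name and identical place) and the `<#mother-...>` identifiers are assumptions for illustration; the slides do not state the project's actual matching criteria.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

# The three birth records shown in full on the slide.
records: List[Dict[str, str]] = [
    {"id": "B000-001", "mother": "Mary Murphy", "place": "Castle Road"},
    {"id": "B010-022", "mother": "Mary Murphy", "place": "Castle Road"},
    {"id": "B022-051", "mother": "Mary Murphy", "place": "Convent Hill"},
]


def same_as_links(recs: List[Dict[str, str]]) -> List[Triple]:
    """Link mother entities whose name and place agree exactly (assumed rule)."""
    links: List[Triple] = []
    for a, b in combinations(recs, 2):
        if a["mother"] == b["mother"] and a["place"] == b["place"]:
            # Hypothetical identifiers for the mother entity of each record.
            links.append((f"<#mother-{a['id']}>", "owl:sameAs", f"<#mother-{b['id']}>"))
    return links


for link in same_as_links(records):
    print(*link)
```

Under this rule the two Castle Road records are linked and the Convent Hill record is left unlinked, since a shared name alone is weak evidence of identity. Storing the generated links apart from the source records keeps the archival data unchanged if a match later proves wrong.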

Page 20

#1 Mary Murphy: James (1900-08-08), Patrick (1902-04-19), Michael (1905-02-18)

Intervals between births: 619 days, 1036 days

Average sibship interval = 827.5 days

Data analysis on the generated triples
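
The interval arithmetic on this slide can be reproduced directly from the three birth dates. This is a minimal sketch; the function names are hypothetical and stand in for whatever analysis is run over the generated triples.

```python
from datetime import date
from typing import List


def sibship_intervals(birth_dates: List[date]) -> List[int]:
    """Days between consecutive births, after sorting the dates."""
    ordered = sorted(birth_dates)
    return [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]


def average(values: List[int]) -> float:
    """Arithmetic mean of the intervals."""
    return sum(values) / len(values)


# Birth dates of James, Patrick and Michael from the slide.
births = [date(1900, 8, 8), date(1902, 4, 19), date(1905, 2, 18)]
intervals = sibship_intervals(births)
print(intervals)           # [619, 1036]
print(average(intervals))  # 827.5
```

The calculation only becomes possible once the three records are attributed to the same mother, which is why the linkage step precedes the analysis.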

Page 21

Data Challenges

• Data security: transfer, storage and use by authorised parties

• Data protection best practice

• Quantity of data

• Varying levels of detail, e.g. causes of death

• Establishing maternal death: fever

• Archaic medical terms

• Variances in record subject names and places

• Place names change over time

Page 22

DRI Presentation

The Irish Record Linkage Knowledge Platform

• State of the art linked data & ontology based analysis platform for historical 'big data'

• Platform within a secure, closed system

• Prepared to allow formulation of the specific research queries

• Query interface to allow for the historical analysis of the data.

• Potential expansion to include additional contextualising datasets

@IRL_Project http://irishrecordlinkage.wordpress.com/