(Semi)automatic Extraction of Genealogical Information from Scanned & OCRed Historical Documents
Elder David W. Embley

Jan 17, 2016

Page 1: (Semi)automatic Extraction of Genealogical Information from Scanned & OCRed Historical Documents Elder David W. Embley.

(Semi)automatic Extraction of Genealogical Information from Scanned & OCRed Historical Documents

Elder David W. Embley

Page 2

Overview

• Big Picture
• Diagram
• Details & Demo
• Current Status and Expectations

Page 3

Fe6: 1. Prepare 2. Extract 3. Merge & Split 4. Check & Correct 5. Generate 6. Convert

FROntIER

ListReader

OntoSoar

GreenFIE
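
As a rough illustration of the six-step Fe6 ordering on this slide, the sketch below models the steps as an ordered Python enum. Only the step names come from the slide; the `Stage` class, the `run_pipeline` helper, and the pass-through handlers are hypothetical, and the sketch makes no claim about how FROntIER, ListReader, OntoSoar, or GreenFIE actually plug into the steps.

```python
from enum import IntEnum
from typing import Callable, Dict, List, Tuple


class Stage(IntEnum):
    """The six Fe6 steps, numbered as on the slide."""
    PREPARE = 1
    EXTRACT = 2
    MERGE_AND_SPLIT = 3
    CHECK_AND_CORRECT = 4
    GENERATE = 5
    CONVERT = 6


def run_pipeline(
    document: str,
    handlers: Dict[Stage, Callable[[str], str]],
) -> Tuple[str, List[str]]:
    """Apply each stage in slide order; return the result and a stage log.

    `handlers` is a hypothetical mapping from a stage to a function that
    transforms the working document. Stages without a handler pass the
    document through unchanged.
    """
    log: List[str] = []
    for stage in sorted(Stage):
        handler = handlers.get(stage, lambda doc: doc)
        document = handler(document)
        log.append(f"{stage.value}. {stage.name}")
    return document, log


# Example: only the (hypothetical) Prepare handler does anything here.
result, stage_log = run_pipeline(
    "  scanned page text  ",
    {Stage.PREPARE: str.strip},
)
```

Keeping the steps in an `IntEnum` makes the ordering explicit, so a handler can be added or replaced for one step without touching the others.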

Page 4

1. Prepare

Page 5

2. Extract

Page 6

3. Merge & Split

Person

Couple

ParentsWithChildren

Page 7

4. Check & Correct

Page 8

5. Generate

Page 9

6. Convert

Page 10

Highlighted Results

Page 11

Fe6: 1. Prepare 2. Extract 3. Merge & Split 4. Check & Correct 5. Generate 6. Convert

FROntIER

ListReader

OntoSoar

GreenFIE

COMET

Page 12

Precision, Recall, F-Measure Results

                       Precision   Recall   F-Measure
FROntIER
  Person               0.86        0.66     0.75
  Couple               1.00        0.40     0.57
  ParentsWithChildren  0.89        0.89     0.89
GreenFIE
  Person               0.94        0.83     0.88
  Couple               1.00        0.90     0.95
  ParentsWithChildren  1.00        0.78     0.86
OntoSoar
  Person               0.67        0.67     0.67
  Couple               0.75        0.30     0.43
  ParentsWithChildren  1.00        0.44     0.62
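
The F-Measure column appears to be the usual harmonic mean of precision and recall, F = 2PR / (P + R). The slide does not state the formula, so that definition is an assumption here, and the function name `f_measure` is illustrative. The sketch recomputes a few rows from the rounded precision and recall values in the results above; because those inputs are already rounded to two decimals, a recomputed value can differ from the slide in the last digit for other rows.

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (the balanced F1 score)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs copied from the results above.
reported = {
    ("FROntIER", "Person"): (0.86, 0.66),
    ("FROntIER", "Couple"): (1.00, 0.40),
    ("GreenFIE", "Couple"): (1.00, 0.90),
    ("OntoSoar", "Couple"): (0.75, 0.30),
}

# Recompute F from the rounded inputs and round to two decimals.
recomputed = {
    key: round(f_measure(p, r), 2) for key, (p, r) in reported.items()
}

for (tool, form), f in recomputed.items():
    print(f"{tool:9s} {form:7s} F = {f:.2f}")
```

For these four rows the recomputed values (0.75, 0.57, 0.95, 0.43) agree with the F-Measure column on the slide.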

Page 13

Fe6: 1. Prepare 2. Extract 3. Merge & Split 4. Check & Correct 5. Generate 6. Convert

FROntIER

ListReader

OntoSoar

GreenFIE

Feedback Loop

Automated Check (Fix & Warn)

“Sanity” Check

Name, Date, Place Standardization

Administrative and Batch-Processing Management System

COMET

Page 14

Fe6: 1. Prepare 2. Extract 3. Merge & Split 4. Check & Correct 5. Generate 6. Convert

FROntIER

ListReader

OntoSoar

GreenFIE

Feedback Loop

Automated Check (Fix & Warn)

“Sanity” Check

Name, Date, Place Standardization

Administrative and Batch-Processing Management System

Bootstrapping, Ever-learning, Feedback Loop

Extraction Tools:
• Layout
• Machine Learning

Non-English Languages

COMET

Page 15

Summary

• (Semi)automatic Extraction

• Green, Ever-Learning System (improves with use)

• Status:
  • Extraction Tools (tech-transfer of academic prototypes)
  • Thin-Line Ensemble Prototype (being thickened)

Page 16

Summary

• (Semi)automatic Extraction

• Green, Ever-Learning System (improves with use)

• Status:
  • Extraction Tools (tech-transfer of academic prototypes)
  • Thin-Line Ensemble Prototype (being thickened)

BYU Data Extraction Research Group
www.deg.byu.edu