Transcript
Page 1

NLP and Data Mining: From ChartEx to Traces Through Time and beyond

Dr Roger Evans
Natural Language Technology Group & Cultural Informatics Research Group
University of Brighton

Page 2

One man, two guvnors

[Diagram: ChartEx and TTT, both built around 'deep' processing]

Page 3

Two men, two guvnors

[Diagram: ChartEx and TTT, combining natural language processing and data mining]

Page 4

Two men, two guvnors

[Diagram: ChartEx and TTT, combining natural language processing (Brighton) and data mining (Leiden)]

Page 5

ChartEx Architecture

[Architecture diagram. Development path: expert elicitation and a markup scheme; manual markup of 5-10 and then 100-200 charters; the marked-up charters drive NLP development, data mining (DM) development, virtual workbench (VWB) requirements and development, and repository development. Runtime components: 1000's of charters, natural language processing, data mining, the ChartEx repository and the virtual workbench.]

Page 6

ChartEx Architecture

[Same diagram, with the runtime architecture highlighted: 1000's of charters are processed by natural language processing and data mining into the ChartEx repository, which is accessed through the virtual workbench]
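To make the runtime architecture above concrete, here is a minimal sketch of how such a pipeline could be wired together. All names, data structures and the toy matching logic are invented for illustration; this is not the actual ChartEx software.

import re
from dataclasses import dataclass, field


@dataclass
class Charter:
    doc_id: str
    text: str
    entities: list[str] = field(default_factory=list)     # filled by the NLP stage
    relations: list[tuple] = field(default_factory=list)  # filled by the data-mining stage


def nlp_process(charter: Charter) -> Charter:
    """Toy 'natural language processing': treat capitalised words as entity mentions."""
    charter.entities = re.findall(r"\b[A-Z][a-z]+\b", charter.text)
    return charter


def data_mine(charters: list[Charter]) -> list[Charter]:
    """Toy 'data mining': relate entities that co-occur within the same charter."""
    for c in charters:
        c.relations = [(a, "co-occurs-with", b)
                       for i, a in enumerate(c.entities)
                       for b in c.entities[i + 1:]]
    return charters


class Repository:
    """Stands in for the ChartEx repository that the virtual workbench queries."""

    def __init__(self) -> None:
        self._store: dict[str, Charter] = {}

    def add(self, charter: Charter) -> None:
        self._store[charter.doc_id] = charter

    def find_mentions(self, name: str) -> list[str]:
        return [doc_id for doc_id, c in self._store.items() if name in c.entities]


# Runtime flow: charters -> NLP -> data mining -> repository (-> virtual workbench)
repo = Repository()
docs = [Charter("charter-001", "Willelmus grants land in Fossgate to Alicia.")]
for charter in data_mine([nlp_process(d) for d in docs]):
    repo.add(charter)
print(repo.find_mentions("Fossgate"))   # ['charter-001']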

Page 7

TTT architecture

[Architecture diagram: documents pass through shallow language processing, deep language processing and content extraction into record linkage, supported by optimisation/statistics, with the results presented through visualisation]
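Record linkage is the heart of this pipeline: deciding whether person records extracted from different documents refer to the same individual. The sketch below illustrates the idea with an invented similarity score and threshold; it is not the TTT implementation.

from difflib import SequenceMatcher


def name_similarity(a: str, b: str) -> float:
    """Rough string similarity between two name spellings (0..1)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def link_score(rec_a: dict, rec_b: dict) -> float:
    """Combine name similarity with a crude date-compatibility penalty."""
    score = name_similarity(rec_a["name"], rec_b["name"])
    if rec_a.get("year") and rec_b.get("year"):
        score *= max(0.0, 1.0 - abs(rec_a["year"] - rec_b["year"]) / 100.0)
    return score


def link_records(records: list[dict], threshold: float = 0.8) -> list[tuple]:
    """Return candidate links whose score reaches a (tunable) threshold."""
    links = []
    for i, a in enumerate(records):
        for b in records[i + 1:]:
            s = link_score(a, b)
            if s >= threshold:
                links.append((a["id"], b["id"], round(s, 2)))
    return links


records = [
    {"id": "r1", "name": "John Smyth", "year": 1542},
    {"id": "r2", "name": "John Smith", "year": 1545},
    {"id": "r3", "name": "Jon Smythe", "year": 1740},
]
print(link_records(records))   # r1 and r2 link; r3 is a century too late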

Page 8

Comparison

[The TTT architecture (documents, shallow and deep language processing, extract content, record linkage, optimisation/statistics, visualisation) shown alongside the ChartEx runtime architecture (1000's of charters, natural language processing, data mining, ChartEx repository, virtual workbench)]

Page 9

Comparison

[Both architecture diagrams repeated; first comparison row added]

Range of data: ChartEx – medieval charters, English and Latin, free text; TTT – early and modern records, text and data

Page 10

Comparison

[Both architecture diagrams and the first comparison row repeated; second row added]

Analytic complexity: a focus on people versus a focus on places; a detailed view versus a broad relational view

Page 11

Comparison

[Both architecture diagrams and the earlier comparison rows repeated; third row added]

Target users: 'researchers' in a controlled environment versus web users with less control

Page 12

Comparison

[Both architecture diagrams and the earlier comparison rows repeated; the target-users row is extended]

Target users: 'researchers' in a controlled environment versus web users with less control; plus '(heritage) enterprise' and 'bespoke'

Page 13

What can Computer Science do?

• State of the art is broadly based on statistics

• Answers are always only approximate

• Different kinds of approximation:
  • Precision – focus on making sure answers are right (but may miss some)
  • Recall – focus on getting as many right answers as possible (but may give some wrong answers too)
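A small worked example of the two measures just described, using an invented gold standard and invented system output:

# Invented gold standard (what a human expert says is correct) and system output.
gold   = {"Alice", "Bob", "Carol", "Dave"}
system = {"Alice", "Bob", "Eve"}

true_positives = gold & system                  # {"Alice", "Bob"}

precision = len(true_positives) / len(system)   # 2/3 ~= 0.67: of what was returned, how much is right
recall    = len(true_positives) / len(gold)     # 2/4 = 0.50: of what is right, how much was returned

print(f"precision = {precision:.2f}, recall = {recall:.2f}")
# A precision-oriented system returns fewer, safer answers (and misses some);
# a recall-oriented system returns more answers (and some of them are wrong).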

Page 14

Precision and recall

Page 15

What does Digital Humanities want?

• Perfect results?
  • How do you respond if we say we can't do that?
• Control over the tradeoff?
  • How easy is it to understand what control you have?
  • Does this help you interpret the results you get?
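One concrete form that 'control over the tradeoff' can take is a single confidence threshold that the user moves: raising it favours precision, lowering it favours recall. The scores and names below are invented for illustration.

def evaluate(candidates, gold, threshold):
    """Precision and recall of the answers scored at or above the threshold."""
    returned = {name for name, score in candidates if score >= threshold}
    true_positives = returned & gold
    precision = len(true_positives) / len(returned) if returned else 1.0
    recall = len(true_positives) / len(gold)
    return precision, recall


# Invented system answers with confidence scores, and an invented gold standard.
candidates = [("Alice", 0.95), ("Bob", 0.80), ("Eve", 0.60), ("Carol", 0.40)]
gold = {"Alice", "Bob", "Carol"}

for threshold in (0.9, 0.7, 0.3):
    p, r = evaluate(candidates, gold, threshold)
    print(f"threshold {threshold}: precision {p:.2f}, recall {r:.2f}")
# Raising the threshold gives fewer but safer answers (higher precision, lower recall);
# lowering it gives more answers but more noise (higher recall, lower precision).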

Page 16

Where are we now, and where are we going?

• Human in the loop
  • Tools always require human interpretation of results
  • Is this really just a cop-out by computer scientists?
  • Or just a pragmatic expression of the state of the art?
• Deskilling
  • Do we really mean an expert in the loop?
• Conversations
  • Are we really only just at the point of negotiating what is possible and what is required?