18 July, 2014 An Introduction to Relational Databases Dr James A J Wilson Dr Meriel Patrick

DHOxSS 2014 - Introduction to Relational Databases

May 25, 2015


This introduction to relational databases was presented at the Digital Humanities at Oxford Summer School 2014.
Transcript
Page 1: DHOxSS 2014 - Introduction to Relational Databases

18 July, 2014

An Introduction to Relational Databases

Dr James A J Wilson
Dr Meriel Patrick

Page 2: DHOxSS 2014 - Introduction to Relational Databases


Relational Databases

- Defined in 1970
- First commercially available relational database management system released by Oracle in 1979
- Widespread adoption by both business and research communities
- Underpin many websites
- Well understood and widely supported

Digital Humanities Summer School - An Introduction to Relational Databases

Page 3: DHOxSS 2014 - Introduction to Relational Databases

Options when structuring data

- Spreadsheets
  - Recording the common properties of a single thing
  - Numerical analysis
  - Generating charts and graphs
- Relational databases
  - Recording the common properties of multiple related things
  - Flexible querying
- Document-orientated databases / ‘semi-structured’ databases
  - Recording items which share some common properties
  - Avoids need to define rigid structure in advance
- XML / XML databases
  - Categorizing elements of text
- RDF (Resource Description Framework) triplestores
  - Records relationships between things (basis of Semantic Web)

Page 4: DHOxSS 2014 - Introduction to Relational Databases

When to use a relational database

- You are collecting information about things which share common properties
- You want to be able to list particular records that meet certain conditions
- You wish to encourage consistency
- You want to be efficient, and avoid duplication of information
- You value flexibility when querying
- Good for collaborative working – one person sets up the database, many can edit the data, many more can view or query the data

Page 5: DHOxSS 2014 - Introduction to Relational Databases

Structure of a relational database - tables

Example scenario: study of 18th century book trade

- What things are we interested in?
  - Publications
  - Publishers
  - People
  - Our sources for the information we’re collecting
- And what information might we want to know about each of these things?
  - Names
  - Dates
  - Places
  - References

Page 6: DHOxSS 2014 - Introduction to Relational Databases

Person
  Surname
  First name
  Middle initial(s)
  Date of birth
  Notes

Publication
  Title
  Author(s)
  Publisher
  Date of publication
  Place of publication
  Edition
  Format
  Type of publication
  Price
  Sales
  Notes

Publisher
  Name
  Staff
  Founded
  Ceased
  Address
  Notes

Reference
  Author(s)
  Title
  Date of publication
  Edition
  Volume
  Page(s)
  URL
  Notes

Page 7: DHOxSS 2014 - Introduction to Relational Databases

Structure of a relational database – data types

- Most relational database management systems require that each field has a defined data type
  - Text (e.g. varchar, memo)
  - Numeric (e.g. integer, decimal)
  - Date
  - Boolean (true / false; on / off)
  - Blob (for otherwise undefined data, such as image files)
- Each table needs at least one field that only contains unique values, which can be used as a ‘primary key’
  - Commonly an auto-incrementing whole (integer) number

Page 8: DHOxSS 2014 - Introduction to Relational Databases

Person
  ID                    Int
  Surname               Text
  First name            Text
  Middle initial(s)     Text
  Date of birth         Date
  Notes                 Text

Publication
  ID                    Int
  Title                 Text
  Author(s)             Text
  Publisher             Text
  Date of publication   Int?
  Place of publication  Text
  Edition               Int
  Format                Text
  Type of publication   Text
  Price                 Dec?
  Sales                 Int?
  Notes                 Text

Publisher
  ID                    Int
  Name                  Text
  Staff                 Text
  Founded               Int?
  Ceased                Int?
  Address               Text
  Notes                 Text

Reference
  ID                    Int
  Author(s)             Text
  Title                 Text
  Date of publication   Int?
  Edition               Int?
  Volume                Int?
  Page(s)               Text?
  URL                   Text
  Notes                 Text
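As a rough illustration of how typed fields and a primary key are declared, here is a minimal sketch using Python's built-in sqlite3 module. The field names follow the Person table, with underscores in place of spaces. The choice of SQLite and the two sample rows are assumptions made for this example, not part of the original slides.

```python
import sqlite3

# In-memory database, so the sketch leaves no files behind.
conn = sqlite3.connect(":memory:")

conn.execute("""
    CREATE TABLE person (
        id              INTEGER PRIMARY KEY,  -- unique whole number, assigned automatically
        surname         TEXT NOT NULL,
        first_name      TEXT,
        middle_initials TEXT,
        date_of_birth   TEXT,                 -- SQLite has no dedicated date storage class; ISO text is a common workaround
        notes           TEXT
    )
""")

# Illustrative rows only (not taken from the slides).
conn.execute("INSERT INTO person (surname, first_name) VALUES (?, ?)", ("Smith", "Ann"))
conn.execute("INSERT INTO person (surname, first_name) VALUES (?, ?)", ("Jones", "Tom"))

# The id column was never supplied, yet each row received its own value.
rows = conn.execute("SELECT id, surname FROM person ORDER BY id").fetchall()
print(rows)
```

Other database systems spell the auto-incrementing key differently, so the exact syntax should be checked against whichever system is used.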

Page 9: DHOxSS 2014 - Introduction to Relational Databases

Structure of a relational database - relationships

- Our different things are related to one another
  - A person may be the author of a publication, or a reference work, or they may be a publisher
  - Each edition of a publication has a publisher, or maybe more than one?
  - The information you record about a particular publication, or publisher, may come from one or more sources
- Relationships between things can be of various sorts:
  - One-to-many (e.g. a publisher may have many publications)
  - Many-to-many (e.g. a publication may have many authors, and an author may have many publications)
  - One-to-one (rarely used – can improve performance, overcome system limitations, or enable more granular access permissions)

Page 10: DHOxSS 2014 - Introduction to Relational Databases

Person
  ID                    Int
  Surname               Text
  First name            Text
  Middle initial(s)     Text
  Date of birth         Date
  Reference             Int
  Page                  Text
  Notes                 Text

Publication
  ID                    Int
  Title                 Text
  Author(s)             INT
  Publisher             INT
  Date of publication   Int?
  Place of publication  Text
  Edition               Int
  Format                Text
  Type of publication   Text
  Price                 Dec?
  Sales                 Int?
  Reference             Int
  Page                  Text
  Notes                 Text

Publisher
  ID                    Int
  Name                  Text
  Staff                 Text
  Founded               Int?
  Ceased                Int?
  Address               Text
  Reference             Int
  Page                  Text
  Notes                 Text

Reference
  ID                    Int
  Author(s)             Text
  Title                 Text
  Date of publication   Int?
  Edition               Int?
  Volume                Int?
  URL                   Text
  Notes                 Text

[Diagram labels: 1, ?1]

Page 11: DHOxSS 2014 - Introduction to Relational Databases

Person
  ID                    Int
  Surname               Text
  First name            Text
  Middle initial(s)     Text
  Date of birth         Date
  Reference             Int
  Page                  Text
  Notes                 Text

Publication
  ID                    Int
  Title                 Text
  Author(s)             INT
  Publisher             INT
  Date of publication   Int?
  Place of publication  Text
  Edition               Int
  Format                Text
  Type of publication   Text
  Price                 Dec?
  Sales                 Int?
  Reference             Int
  Page                  Text
  Notes                 Text

Publisher
  ID                    Int
  Name                  Text
  Staff                 Text
  Founded               Int?
  Ceased                Int?
  Address               Text
  Reference             Int
  Page                  Text
  Notes                 Text

Reference
  ID                    Int
  Author(s)             Text
  Title                 Text
  Date of publication   Int?
  Edition               Int?
  Volume                Int?
  URL                   Text
  Notes                 Text

[Diagram labels: 1, ?1, Many to many]

Page 12: DHOxSS 2014 - Introduction to Relational Databases

Person
  ID                    Int
  Surname               Text
  First name            Text
  Middle initial(s)     Text
  Date of birth         Date
  Reference             Int
  Page                  Text
  Notes                 Text

Publication
  ID                    Int
  Title                 Text
  Publisher             INT
  Date of publication   Int?
  Place of publication  Text
  Edition               Int
  Format                Text
  Type of publication   Text
  Price                 Dec?
  Sales                 Int?
  Reference             Int
  Page                  Text
  Notes                 Text

Publisher
  ID                    Int
  Name                  Text
  Staff                 Text
  Founded               Int?
  Ceased                Int?
  Address               Text
  Reference             Int
  Page                  Text
  Notes                 Text

Reference
  ID                    Int
  Author(s)             Text
  Title                 Text
  Date of publication   Int?
  Edition               Int?
  Volume                Int?
  URL                   Text
  Notes                 Text

[Diagram labels: 1, ?1, Many to many]

Authorship
  ID                    Int
  Author                Int
  Publication           Int
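The Authorship table can be sketched in code as a link table that turns one many-to-many relationship into two one-to-many relationships. The example below uses Python's sqlite3 module. The lower-case table names, the cut-down field lists, and the sample rows are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person      (id INTEGER PRIMARY KEY, surname TEXT);
    CREATE TABLE publication (id INTEGER PRIMARY KEY, title TEXT);

    -- Link table: each row pairs one author with one publication.
    CREATE TABLE authorship (
        id          INTEGER PRIMARY KEY,
        author      INTEGER REFERENCES person(id),
        publication INTEGER REFERENCES publication(id)
    );

    -- Invented sample data: 'Essays' has two authors, and Jones wrote two titles.
    INSERT INTO person (id, surname) VALUES (1, 'Smith'), (2, 'Jones');
    INSERT INTO publication (id, title) VALUES (1, 'Essays'), (2, 'Letters');
    INSERT INTO authorship (author, publication) VALUES (1, 1), (2, 1), (2, 2);
""")

# Follow the link table in both directions to list every title with its authors.
query = """
    SELECT publication.title, person.surname
    FROM authorship
    INNER JOIN person      ON authorship.author = person.id
    INNER JOIN publication ON authorship.publication = publication.id
    ORDER BY publication.title, person.surname
"""
pairs = conn.execute(query).fetchall()
print(pairs)
```

Adding a third or fourth author to a publication then needs only another row in the link table, with no change to the table structure.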

Page 13: DHOxSS 2014 - Introduction to Relational Databases

Alternative structures

If you are certain that no publication is going to have more than three authors, you might want to have fields in the ‘publication’ table for author1, author2, author3 – each with a one-to-many relationship with the ‘person’ table

You could create another table just consisting of IDs and different types of publication. This could then be linked to the ‘publication’ table and act as a controlled vocabulary

Have a separate table for edition information. In most cases authors will not change, format might, sales and price almost certainly will. This will avoid data duplication

But maybe authors will be credited differently (anon revealed?), or titles vary between editions?

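The controlled-vocabulary idea can be sketched as a small look-up table that the ‘publication’ table points to. This is a minimal example using Python's sqlite3 module. The table name publication_type, the two vocabulary terms, and the sample titles are hypothetical, and the PRAGMA line is specific to SQLite, which only enforces such links when asked.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: switch on foreign key checks

conn.executescript("""
    -- Look-up table: IDs plus the permitted types of publication.
    CREATE TABLE publication_type (
        id   INTEGER PRIMARY KEY,
        name TEXT UNIQUE NOT NULL
    );
    CREATE TABLE publication (
        id    INTEGER PRIMARY KEY,
        title TEXT,
        type  INTEGER NOT NULL REFERENCES publication_type(id)
    );
    INSERT INTO publication_type (id, name) VALUES (1, 'Pamphlet'), (2, 'Novel');
""")

# A type that exists in the vocabulary is accepted.
conn.execute("INSERT INTO publication (title, type) VALUES (?, ?)", ("A Sample Title", 2))

# A type that is not in the vocabulary is refused by the database itself.
try:
    conn.execute("INSERT INTO publication (title, type) VALUES (?, ?)", ("Another Title", 99))
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

stored = conn.execute("SELECT COUNT(*) FROM publication").fetchone()[0]
print(rejected, stored)
```

Because the vocabulary lives in its own table, a term can be renamed in one place without editing every publication record.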

Page 14: DHOxSS 2014 - Introduction to Relational Databases

Database design – good practice

- Database normalization:
  - Shouldn’t have to enter the same data twice
  - Separate tables for separate things
  - Don’t define duplicate fields in the same table (e.g. author1, author2, etc.)
  - Fields should be ‘atomic’ – containing information at the most granular level (usually)
- Enforce data integrity
- Keep ‘blobs’ of data (images, audio, etc.) outside of your database or at the very least in separate tables; include links to the files within the database
- Table / field naming conventions:
  - Be consistent
  - Avoid spaces, punctuation marks, and other non-alphanumeric characters (although it’s fine to use underscores instead of spaces)
- Document your database!
  - You will thank yourself later

Page 15: DHOxSS 2014 - Introduction to Relational Databases

Database design workflow


Page 16: DHOxSS 2014 - Introduction to Relational Databases

Querying a relational database

- Queries are usually constructed using SQL statements
  - SQL stands for ‘Structured Query Language’
- Some Relational Database Management Systems hide the raw SQL from the user by providing query-builder tools
- SELECT statements indicate which fields should be returned
- FROM clauses indicate the table(s) in which those fields are to be found
- JOIN clauses are used when you wish to query multiple tables
- WHERE clauses provide the conditions that a record must meet in order to be listed in results
- ORDER BY clauses control the order in which results are returned

Page 17: DHOxSS 2014 - Introduction to Relational Databases

Querying a relational database - examples

Imagine we have a single table in a database, called ‘countries’

Countries
  ID            Int.
  name          Text
  area          Int.
  population    Int.
  continent     Text
  If_visited    Bool.
  observations  Text

SELECT * FROM Countries
  would return all information about all countries

SELECT name, area, population FROM Countries
  would return only the information in the named fields

SELECT * FROM Countries WHERE If_visited = TRUE
  would return all information about countries that have been visited

SELECT * FROM Countries WHERE If_visited = TRUE AND population > 1000000
  would return all information about countries that have been visited and have a population of greater than a million
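To try queries of this kind without installing a database server, the sketch below builds the Countries table in memory with Python's sqlite3 module. The column names follow the Countries table on this page. The three countries and their figures are invented, and the visited flag is stored as 1 or 0, which is how SQLite represents true and false.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Countries (
        ID INTEGER PRIMARY KEY, name TEXT, area INTEGER, population INTEGER,
        continent TEXT, If_visited BOOLEAN, observations TEXT
    )
""")

# Invented sample rows: (name, area, population, continent, If_visited).
conn.executemany(
    "INSERT INTO Countries (name, area, population, continent, If_visited) "
    "VALUES (?, ?, ?, ?, ?)",
    [
        ("Alphaland", 1000, 5000000, "Europe", 1),
        ("Betaland", 2000, 500000, "Europe", 1),
        ("Gammaland", 3000, 9000000, "Asia", 0),
    ],
)

# Every field of every record.
all_rows = conn.execute("SELECT * FROM Countries").fetchall()

# Only the named fields, largest population first.
by_population = conn.execute(
    "SELECT name, population FROM Countries ORDER BY population DESC"
).fetchall()

# Two conditions combined with AND.
visited_large = conn.execute(
    "SELECT name FROM Countries WHERE If_visited = 1 AND population > 1000000"
).fetchall()

print(len(all_rows), by_population, visited_large)
```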

Page 18: DHOxSS 2014 - Introduction to Relational Databases

Querying a relational database - examples

JOINS are used to search across multiple tables

SELECT c.name, c.area, c.population, d.name
FROM Countries c INNER JOIN Continents d ON c.continent = d.ID
WHERE c.If_visited = TRUE AND d.name = 'Europe'
  would return selected information about each European country visited

Countries
  ID            Int.
  name          Text
  area          Int.
  population    Int.
  continent     Int.
  If_visited    Bool.
  observations  Text

Continents
  ID            Int.
  name          Text
  area          Int.
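The join can be run end to end in the same way. This sketch, again using Python's sqlite3 module, sets up both tables as drawn on this page, with Countries.continent holding the ID of a row in Continents. The sample rows are invented, and an ORDER BY clause is added here only to make the order of results predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Continents (ID INTEGER PRIMARY KEY, name TEXT, area INTEGER);
    CREATE TABLE Countries (
        ID INTEGER PRIMARY KEY, name TEXT, area INTEGER, population INTEGER,
        continent INTEGER REFERENCES Continents(ID),
        If_visited BOOLEAN, observations TEXT
    );

    -- Invented sample data; continent areas are left empty.
    INSERT INTO Continents (ID, name) VALUES (1, 'Europe'), (2, 'Asia');
    INSERT INTO Countries (name, area, population, continent, If_visited) VALUES
        ('Alphaland', 1000, 5000000, 1, 1),
        ('Betaland',  2000,  500000, 1, 1),
        ('Gammaland', 3000, 9000000, 2, 0);
""")

# c and d are short aliases for the two tables, as in the query above.
query = """
    SELECT c.name, c.area, c.population, d.name
    FROM Countries c INNER JOIN Continents d ON c.continent = d.ID
    WHERE c.If_visited = 1 AND d.name = 'Europe'
    ORDER BY c.name
"""
european_visits = conn.execute(query).fetchall()
print(european_visits)
```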

Page 19: DHOxSS 2014 - Introduction to Relational Databases

What query results look like

- A single table / spreadsheet
- Although software / websites may format results into a report.

Page 20: DHOxSS 2014 - Introduction to Relational Databases

What can you do with your results?

- Count, sort, and sometimes filter further
- Export and analyse
  - .csv file format is standard, and compatible with almost all statistical analysis / data visualisation software
- Save and make available to others
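Exporting results as .csv can be sketched with Python's standard csv and sqlite3 modules. The table and rows are invented for the example, and an in-memory text buffer stands in for a real file so that the sketch writes nothing to disk.

```python
import csv
import io
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Countries (name TEXT, population INTEGER)")
conn.executemany(
    "INSERT INTO Countries VALUES (?, ?)",
    [("Alphaland", 5000000), ("Betaland", 500000)],  # invented rows
)

cursor = conn.execute("SELECT name, population FROM Countries ORDER BY name")

# The buffer stands in for a file opened with open("results.csv", "w", newline="").
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow([column[0] for column in cursor.description])  # header row from the field names
writer.writerows(cursor)                                       # one line per result record

csv_text = buffer.getvalue()
print(csv_text)
```

The resulting text opens directly in spreadsheet and statistics software.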

Page 21: DHOxSS 2014 - Introduction to Relational Databases

Common database challenges in the humanities

- Patchy or incomplete data
  - Beware of the difference between 0 and null
- Varying degrees of accuracy
  - Often an issue with historical dates
  - Splitting the separate elements of a date into separate fields may help
- Interpreted and uncertain information
  - Include a field indicating the degree of certainty of a particular ‘fact’ – e.g. ‘Definite, Probable, Possible’
- Inconsistent or changing terminology
  - Alternative spellings, different forms of address, name changes
  - Can be an idea to have a table of controlled vocabulary
- ‘Fuzziness’ vs. ‘queryableness’
  - e.g. if you store a date as ‘c. 310 BCE’, you can’t use it in a conditional query such as ‘list all the inscriptions from the fourth century BCE’
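One possible way to keep uncertain dates queryable is to store an earliest and a latest year as numbers, alongside a certainty field. The sketch below uses Python's sqlite3 module. It rests on several assumptions made for illustration: the field names, negative integers for years BCE, reading ‘c. 310 BCE’ as the range 315 to 305 BCE, and a hypothetical copies_known field used to show the difference between 0 and null.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE inscription (
        id            INTEGER PRIMARY KEY,
        label         TEXT,
        earliest_year INTEGER,  -- negative numbers stand for years BCE (assumption of this sketch)
        latest_year   INTEGER,
        certainty     TEXT,     -- 'Definite', 'Probable' or 'Possible'
        copies_known  INTEGER   -- NULL = not recorded; 0 = recorded as none
    )
""")
conn.executemany(
    "INSERT INTO inscription (label, earliest_year, latest_year, certainty, copies_known) "
    "VALUES (?, ?, ?, ?, ?)",
    [
        ("Inscription A", -315, -305, "Probable", 0),     # 'c. 310 BCE' stored as a range
        ("Inscription B", -250, -250, "Definite", None),  # number of copies unknown
        ("Inscription C", None, None, "Possible", 3),     # undated
    ],
)

# Fourth century BCE = 400 to 301 BCE: keep records whose range overlaps it.
fourth_century = conn.execute(
    "SELECT label FROM inscription WHERE earliest_year <= -301 AND latest_year >= -400"
).fetchall()

# 0 and NULL are different answers and need different tests.
none_known = conn.execute("SELECT label FROM inscription WHERE copies_known = 0").fetchall()
not_recorded = conn.execute("SELECT label FROM inscription WHERE copies_known IS NULL").fetchall()

print(fourth_century, none_known, not_recorded)
```

The undated record drops out of the century query because comparisons against NULL never succeed, which is a practical reason to decide early how missing values will be handled.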

Page 22: DHOxSS 2014 - Introduction to Relational Databases

Your exercise today…

- Draft a structure for a relational database recording information about membership of gentlemen’s clubs in Victorian London
  - Think about the tables, fields, and relationships you’d need
- Your evidence collection (membership records, letters, diaries, etc.) tells you which clubs people belonged to, and when
- However, the information is patchy
  - Names may not be given in full – identity is sometimes uncertain
  - Dates may be uncertain or missing
- All clubs have multiple members; some people were members of multiple clubs at varying periods
- Over the years, some clubs changed locations

Page 23: DHOxSS 2014 - Introduction to Relational Databases

Our example solution


Page 24: DHOxSS 2014 - Introduction to Relational Databases

Possible enhancements

- If dates are uncertain, integer may not be the best data type.
- Make the relationship between club_memberships and evidence many-to-many rather than one-to-many
  - Done by adding a link table
- Split author entries into a separate table
  - Allows multiple authors for each piece of evidence
- Impose a controlled vocabulary on the occupation field by adding a look-up table
- Add longitude and latitude to the addresses table.

Page 25: DHOxSS 2014 - Introduction to Relational Databases

Relational Database Management Systems

- Databases can seem rather complicated, but there is software that can help
  - MS Access
  - FileMaker Pro
  - Coming soon to Oxford – the Online Research Database Service (ORDS)
- For web-hosted relational database manipulation:
  - MySQL
  - PostgreSQL

Page 26: DHOxSS 2014 - Introduction to Relational Databases

Questions?
