LIS510 lecture 12 Thomas Krichel 2006-12-13

LIS510 lecture 12 Thomas Krichel 2006-12-13. today Leftovers from last time. I discuss some elements of Bill Arms’ book on Digital Libraries. –It’s introductory.

Dec 26, 2015


Dora Collins

LIS510 lecture 12

Thomas Krichel

2006-12-13


today

• Leftovers from last time.

• I discuss some elements of Bill Arms’ book on Digital Libraries.
– It’s an introductory book that is general, but smartly written.
– It is not a book to teach someone to become a digital librarian.
– LIS650 and LIS651 are for that. They really deal with the introduction to digital information.

• I also talk generally about understanding some digital contents.


definition

• An informal definition of a digital library is a “managed collection of information, with associated services, where the information is stored in digital formats and accessible over a network.”

• “Managed” is the key word here.


benefits of digital libraries

• The digital library brings the library to the user.

• Computer power is used for searching and browsing.

• Information can be shared.

• Information is easier to keep current.

• The information is always available.

• New forms of information become possible.


costs

• Non-digital libraries are very expensive.

• Digital libraries are also expensive. Many publishers charge more for online editions than for traditional print.

• However, the cost of the infrastructure is dropping.

• And there is potential for change in the way information is supplied in digital libraries.


technical change

• Electronic storage is becoming cheaper than paper.

• Personal computer displays are becoming more pleasant to use.

• High-speed networks are becoming widespread.

• Computers have become portable.


libraries adapt

• Libraries get wired.

• They offer electronic access, even to the home user.

• Other actions depend on the library type
– Some shift from information access to community center.
– Some adopt digital reference with 24/7 asynchronous help.
– Some get involved in digital archiving of institutional assets.


digital library cost

• The digital library material will cost more initially because publishers want to see a return on the extra functionality they have developed.

• In the longer run, digital library costs may be lower than in print
– lower storage cost
– less risk to the items
– fewer staff required (but differently trained)


classic roles for the library with digital material

• Investigation of what to buy

• Negotiation of the purchase

• Acquisition of access to a service

• Installation of access devices

• Training of users

• Maintenance: update, migrate, replace


beyond the library

• The classic roles will at best be a stagnating, if not declining, source of work for information professionals.

• The rise of open access means that fewer assets than before will have to be purchased. Today’s example: http://dme.mozarteum.at

• Training needs of users decline as digital media get easier to use.


new roles for information professionals

• The information age does not happen without information professionals.

• There is a huge demand for tech-savvy information professionals out there. Examples include
– web site maintenance
– digital archiving


impact of technology on staff

• Information professionals who are technologically savvy will fare better than those who are not.

• Fortunately the Palmer School offers LIS508, LIS650, LIS651.

• It still does not have a system administration class, but that may come as well.


impact of technology on staff

• Constant computer use can cause serious health problems.

• Problem areas are
– bad posture at the desk
– eye strain

• The use of the mouse is particularly bad. Learn how to avoid using it.

• Injuries take a long time to heal.


digital libraries are hard

• In digital libraries, terminology is a serious problem. Basic concepts are hard to define.

• These definition problems also hurt efforts to build sophisticated information systems by semi-automated means.

• We live in the age of brute-force calculation, not the age of artificial intelligence.


data and metadata

• Metadata is data about data. The distinction between data and metadata often depends on the context.

• Metadata is often divided into
– descriptive metadata
– structural metadata
– administrative metadata


what’s in the digital library?

• Items?

• Material?

• Documents?

• Objects?

• Digital Items?

• Digital Material?

• Digital Documents?

• Digital Objects?


storage and dissemination

• Items are stored in a digital format, which we call the stored form of the item.

• When the item is shown to the user, it is shown as a “presentation” or “dissemination”. This is the way the object leaves the server.

• When it arrives at the users’ machines, they have to “render” the presentation.


users and clients

• A user is someone who uses a digital library. Often the user is anonymous and cannot be identified.

• A client is the software that the user runs to use the digital library. Sometimes this is called a user agent. In everyday language it is usually called a browser.


work and contents

• These are difficult things to discuss. Look at the example of the song “Der Lindenbaum”. It could mean
– the song as sound and words
– the score
– a performance
– a recording
– an mp3 file containing the recording


repositories

• This is a general term for a computer system whose primary function is storing contents.

• When long-term storage is involved, a repository becomes an archive.

• A server is a computer that is switched on constantly to provide services to the public.


an example of terminology

• “A data model is an abstraction (or an extra level of indirection) for digital objects such that each digital object can be seen as an instance of the class defined by the data model.”

• “A surrogate is a transmittable serialization or representation of a digital object that can be passed back and forth so we can do things with it. Possible serialization techniques include XML and RDF/XML.”


a digital library from scratch

• Much of the data that is stored in digital libraries is text.

• Most other material that is not textual in nature, such as
– sound files
– graphics
needs textual metadata in order to be found. Current technology is not able to find it otherwise.


Information

• Information is best understood as “what it takes to answer a question”.

• The simplest question has a “yes” or “no” answer. Therefore a bit is the natural measure of information.

• Term first used by John Tukey in 1946.

• Contraction of “binary digit”.
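To see why a yes/no answer is the natural unit, here is a small sketch in Python (our choice; the lecture itself uses no programming language). It plays "guess the number": every yes/no answer halves the remaining candidates, so singling out one of k equally likely answers takes log2 k answers, rounded up. The function names `questions_to_find` and `bits_needed` are made up for this illustration.

```python
def questions_to_find(secret, k):
    """Find `secret` among 1..k using only yes/no questions of the form
    'is it <= m?'. Return the number of questions asked."""
    low, high, asked = 1, k, 0
    while low < high:
        mid = (low + high) // 2
        asked += 1
        if secret <= mid:      # answer "yes": keep the lower half
            high = mid
        else:                  # answer "no": keep the upper half
            low = mid + 1
    return asked


def bits_needed(k):
    """Worst-case number of yes/no answers (bits) for k possibilities,
    i.e. log2(k) rounded up, computed exactly with integer arithmetic."""
    return (k - 1).bit_length()


print([questions_to_find(s, 8) for s in range(1, 9)])  # [3, 3, 3, 3, 3, 3, 3, 3]
print(bits_needed(2), bits_needed(8), bits_needed(1000))  # 1 3 10
```

Every one of the 8 secrets takes exactly 3 answers, which is why 3 bits are enough to number 8 things.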


Usage of bits

• Computers are sometimes classified by the number of bits they can process at one time, e.g. a "32-bit processor".

• Graphics are also often described by the number of bits used to represent each dot.


bits and bytes

• A bit can take the values 0 or 1, thus it can describe 2 possibilities.

• Two bits can take the values 00, 01, 10, 11, thus they can describe 2×2 = 4 possibilities.

• n bits can encode 2 power n possibilities.

• Early chips processed 8 bits at a time. It became customary to refer to a group of 8 bits as a byte. A byte can encode 2 power 8 = 256 possibilities.

• We can use binary numbers just as we use decimal numbers.
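The counting rule and the reading of binary numbers can be checked with a short Python sketch. `bit_patterns` and `binary_to_decimal` are illustrative helpers, not standard functions; Python's built-in `int(text, 2)` does the same conversion.

```python
from itertools import product


def bit_patterns(n):
    """List every string of n bits, e.g. n = 2 gives 00, 01, 10, 11."""
    return ["".join(bits) for bits in product("01", repeat=n)]


def binary_to_decimal(bits):
    """Read a string of 0s and 1s as a base-2 number: each new digit doubles
    the value read so far, just as each new decimal digit multiplies it by ten."""
    value = 0
    for b in bits:
        value = value * 2 + int(b)
    return value


print(bit_patterns(2))                # ['00', '01', '10', '11']
print(len(bit_patterns(8)))           # 256, i.e. 2 power 8
print(binary_to_decimal("101"))       # 5
print(binary_to_decimal("11111111"))  # 255, the largest value of one byte
```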


application of bytes

• IP (Internet Protocol) addresses identify computers on the Internet.

• In IP version 4 (the one that is most commonly used), each IP address is 4 bytes long.

• It is written as x.x.x.x, where each x is a number between 0 and 255 (why?)

• How many computers can there be on the Internet at any one time?
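A minimal Python sketch of the answer to "why 0 to 255?": each part of the address is one byte, and the four bytes together form one 32-bit number. `ipv4_to_int` is a hypothetical helper (Python's standard `ipaddress` module offers the real thing). Note that 2 power 32 only counts distinct addresses; reserved ranges and address sharing mean the number of computers actually reachable differs.

```python
def ipv4_to_int(address):
    """Turn a dotted IPv4 address such as '192.168.0.1' into one 32-bit number."""
    parts = address.split(".")
    if len(parts) != 4:
        raise ValueError("an IPv4 address has exactly 4 bytes")
    value = 0
    for part in parts:
        byte = int(part)
        if not 0 <= byte <= 255:  # one byte has 2**8 = 256 values: 0 to 255
            raise ValueError(f"{part} does not fit in one byte")
        value = value * 256 + byte  # shift the earlier bytes left by 8 bits
    return value


print(ipv4_to_int("192.168.0.1"))  # 3232235521
print(2 ** 32)                     # 4294967296 distinct 4-byte addresses
```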


Many bytes

• Larger units are
– a kilobyte is 2 power 10 bytes (= 1024 bytes)
– a megabyte is 2 power 20 bytes
– a gigabyte is 2 power 30 bytes
– a terabyte is 2 power 40 bytes

• The prefixes come from ancient Greek words for "thousand", "large", "giant", and "monster", respectively. The naming scheme goes back to the metric system of the French revolution, which introduced "kilo"; the larger prefixes were added later.
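The powers of 2 above, and the everyday job of turning a raw byte count into one of these units, can be sketched in Python. `human_size` is a hypothetical helper that follows the customary 1024-based convention of this slide; note that in the SI system kilo means exactly 1000, and the IEC names the 1024-based units kibibyte (KiB), mebibyte (MiB), and so on.

```python
# Customary binary units: each step up multiplies by 2 power 10 = 1024.
UNITS = ["bytes", "KB", "MB", "GB", "TB"]


def human_size(n_bytes):
    """Express a byte count in the largest unit that still leaves a value of
    at least 1 (TB is the largest unit used here)."""
    size, i = float(n_bytes), 0
    while size >= 1024 and i < len(UNITS) - 1:
        size /= 1024
        i += 1
    return f"{size:.1f} {UNITS[i]}"


for power, name in enumerate(UNITS):
    print(name, 2 ** (10 * power))  # bytes 1, KB 1024, MB 1048576, ...

print(human_size(5 * 2 ** 20))      # 5.0 MB
```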


Hex numbers

• A byte is often represented by two hex digits.

• Each hex digit can encode 16 values.

• They are written 0 to 9, then A B C D E F. F is 15.

• Hex numbers are conventionally prefixed with 0x.

• Use the Microsoft Windows calculator in scientific mode to convert.
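Instead of the calculator, Python's built-ins can do these conversions. A sketch; `byte_to_hex` and `hex_to_int` are illustrative wrappers around the standard `format` and `int` functions.

```python
def byte_to_hex(value):
    """Write one byte (0 to 255) as exactly two hex digits."""
    if not 0 <= value <= 255:
        raise ValueError("a byte holds values 0 to 255")
    return format(value, "02X")  # '02X': upper-case hex, padded to 2 digits


def hex_to_int(text):
    """Read hex digits; int() with base 16 also accepts the 0x prefix."""
    return int(text, 16)


print(byte_to_hex(255), byte_to_hex(10))     # FF 0A
print(hex_to_int("F5"), hex_to_int("0xF5"))  # 245 245
```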


applications of hex numbers

• Media Access Control (MAC) addresses of hardware that allows access to computer networks. They are 6-byte numbers, each byte written as 2 hex digits, e.g. 00:60:08:F5:20:A9

• Character numbers that you see when you insert a special symbol in Microsoft software, e.g. PowerPoint.

• Color codes on web pages use 6 hex digits.
– 000000 is black
– FFFFFF is white
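Both applications boil down to reading pairs of hex digits as bytes. In a web color code, the six digits are three bytes giving the amount of red, green and blue. A Python sketch with hypothetical helper names:

```python
def parse_mac(mac):
    """Split a MAC address such as '00:60:08:F5:20:A9' into its 6 byte values."""
    values = [int(part, 16) for part in mac.split(":")]
    if len(values) != 6 or not all(0 <= v <= 255 for v in values):
        raise ValueError("a MAC address has 6 bytes")
    return values


def parse_color(code):
    """Split a 6-hex-digit web color such as 'FFFFFF' (optionally written with
    a leading '#') into its red, green and blue bytes."""
    code = code.lstrip("#")
    if len(code) != 6:
        raise ValueError("expected 6 hex digits")
    return tuple(int(code[i:i + 2], 16) for i in (0, 2, 4))


print(parse_mac("00:60:08:F5:20:A9"))  # [0, 96, 8, 245, 32, 169]
print(parse_color("000000"))           # (0, 0, 0), black
print(parse_color("FFFFFF"))           # (255, 255, 255), white
```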


Information in a computer file

• A file is a piece of data stored on a computer.

• Any file contains a sequence of 0s and 1s, like 1010100101010011110101010101…

• For a computer to make sense of a file, it has to know what type of file it is.
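A small Python sketch of why the file type matters: the same three bytes read as text or as a number give entirely different results. The `looks_like` helper is hypothetical and checks only two well-known file signatures ("magic numbers"), which programs often use alongside the file name extension to guess a file's type.

```python
data = bytes([72, 105, 33])  # three bytes, i.e. 24 bits

# The raw bits, one byte at a time
print(" ".join(format(b, "08b") for b in data))  # 01001000 01101001 00100001

# The same bits read as ASCII text ...
print(data.decode("ascii"))                      # Hi!

# ... or as one unsigned number, most significant byte first
print(int.from_bytes(data, "big"))               # 4745505


def looks_like(first_bytes):
    """Guess a file type from its opening bytes.
    Only two well-known signatures are checked here."""
    if first_bytes.startswith(b"\x89PNG\r\n\x1a\n"):
        return "PNG image"
    if first_bytes.startswith(b"%PDF"):
        return "PDF document"
    return "unknown"
```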


executable files

• Executable files are files that make the computer do something, for example start a program, say PowerPoint. An executable on one computer may not run on another one.

• Non-executable files hold data that is used by an executable file. We will call them data files. Example: a PowerPoint slides file.


Characters

• Much of the information processed by computers is in the form of characters.

• From Wikipedia:
– A character is a unit of information that roughly corresponds to a grapheme, or written symbol, of a natural language, such as a letter, numeral, or punctuation mark.

• A character is not exactly a grapheme, for example because there are ligatures.


control characters

• The concept also includes control characters, which do not correspond to natural language symbols but to other bits of information used to process texts of the language, such as instructions to printers or other devices that display such texts.

• An example of such a control character is the newline character.


text files

• Many data files contain textual data.

• Textual data is a sequence of characters.

• A character is an elementary symbol that has some meaning
– alphabet letter
– hieroglyph

• Example: email file

• Text files can be read by many computer programs.


non-text files

• Examples of non-text files are
– graphics files
– movie files
– sound files

• Non-text files are of minor significance in library settings
– There is no way to organize information retrieval for non-text files. They have to be retrieved using a textual surrogate.
– Traditional library material is textual.

• I will talk about this later.


Representing characters

• Computers don't understand text; they only understand numbers. For computers to be able to treat text, there must be a correspondence between numbers and text characters. Such a correspondence is called a character set.

• Examples of characters are
– a
– c
– ë
– €
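Python uses Unicode, the universal character set discussed later in this lecture, as its correspondence between characters and numbers. `ord` gives the number for a character and `chr` goes back, so the four example characters can be looked up directly (a sketch; output shown as comments):

```python
for ch in ["a", "c", "ë", "€"]:
    code = ord(ch)          # the character's number (its Unicode code point)
    assert chr(code) == ch  # chr() turns the number back into the character
    print(ch, code, f"U+{code:04X}")

# a 97 U+0061
# c 99 U+0063
# ë 235 U+00EB
# € 8364 U+20AC
```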


Legacy character sets

• In the early days, computers were a lot less powerful than they are today.

• They could only deal with the most commonly used characters.

• Such sets are
– ASCII
– ISO-8859-1
– cp1252


ASCII

• American Standard Code for Information Interchange

• 7-bit character set. There is no such thing as 8-bit ASCII

• 95 printable symbols

• 33 control characters (0-31, 127)

• http://www.ccmr.cornell.edu/helpful_data/ascii2.html has a list up to 127
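The ASCII numbers can be checked in a few lines of Python (a sketch; the split into control and printable positions follows the slide):

```python
codes = range(128)  # 7 bits give 2**7 = 128 code positions

control = [c for c in codes if c < 32 or c == 127]
printable = [c for c in codes if 32 <= c <= 126]
print(len(control), len(printable))                # 33 95

print("".join(chr(c) for c in printable[33:59]))   # ABCDEFGHIJKLMNOPQRSTUVWXYZ

# A character outside the 128 positions cannot be written in ASCII at all
try:
    "ë".encode("ascii")
except UnicodeEncodeError:
    print("ë is not an ASCII character")
```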


some ASCII control characters

• CR (13, ^M) is the carriage return

• LF (10, ^J) is the linefeed

• FF (12, ^L) is the form feed (new page)

• BS (8, ^H) is the backspace

• DEL (127, ALT-127) is delete

• ESC (27, ^[) is the escape
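The ^ notation is not arbitrary: it writes a control code 0 to 31 as ^ followed by the character whose code is 64 higher, so ^M stands for 13 because M is 77. A Python sketch of that conversion (the `caret` helper is hypothetical), alongside Python's own escape sequences for these characters:

```python
CONTROLS = {"CR": 13, "LF": 10, "FF": 12, "BS": 8, "ESC": 27}


def caret(code):
    """Caret notation for an ASCII control code 0..31: flip the 64 (0x40) bit,
    so 13 becomes 77, which is 'M', giving ^M."""
    if not 0 <= code <= 31:
        raise ValueError("this helper covers codes 0 to 31 only")
    return "^" + chr(code ^ 0x40)


for name, code in CONTROLS.items():
    print(name, code, caret(code))
# prints CR 13 ^M, then LF 10 ^J, FF 12 ^L, BS 8 ^H, ESC 27 ^[

# Python's escape sequences for the same characters
print([ord(c) for c in "\r\n\f\b\x1b"])  # [13, 10, 12, 8, 27]
```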


ISO-8859-1

• ISO-8859-1, also known as ISO Latin-1, extends ASCII with characters that are commonly used in western European languages.

• It is the default character set of HTML.

• Positions 128 to 159 are not used.

• cp1252 fills most of these with printable characters. It is a Microsoft character set.
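The difference between the two sets shows up in Python's codecs, where ISO-8859-1 is available as "latin-1" and Microsoft's set as "cp1252". A sketch:

```python
# ë is in ISO-8859-1 (and in cp1252) at position 235 = 0xEB
print("ë".encode("latin-1"))     # b'\xeb'

# The euro sign is not in ISO-8859-1 at all ...
try:
    "€".encode("latin-1")
except UnicodeEncodeError:
    print("no € in ISO-8859-1")

# ... but cp1252 puts it in the 128-159 range that ISO-8859-1 leaves unused
print("€".encode("cp1252"))      # b'\x80'

# The same byte means different things depending on the character set assumed
print(b"\x93".decode("cp1252"))  # “ (a curly opening quote)
```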


This is not enough

• There are around 6800 different languages in the world.

• Some of these languages use character sets that are not finite, i.e. folks can make up new characters out of existing ones!

• Setting up a character set for all languages is almost impossible.


ISO 10646-1

• Defines the Universal Character Set (UCS).

• UCS contains the characters needed to write many known languages, even the likes of Oriya, Telugu, Bopomofo, Runic.

• Formally, ISO 10646 defines a 31-bit character set. Characters are represented as 32 bits, i.e. 4 bytes, or 8 hex digits.

• Not finished.


Unicode

• ISO is a worldwide federation of national standards bodies. It is slow and bureaucratic.

• Industry has come together to work on Unicode, originally designed as a 2-byte character set.

• With some minor exceptions, the Unicode characters are the same as the first 65536 characters in UCS.

• It is a much better documented standard.
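The 2-byte design was Unicode's original one. Two bytes give 65536 positions, but since Unicode 2.0 (1996) code points run up to U+10FFFF, and the UTF-16 encoding writes the characters beyond 65536 as two 2-byte units (a "surrogate pair"); UTF-32 uses 4 bytes for every character, like the 4-byte form of ISO 10646. A Python sketch comparing encoded sizes; the choice of sample characters is ours:

```python
print(2 ** 16)  # 65536 positions fit in two bytes: U+0000 to U+FFFF

samples = ["a", "\u20ac", "\U0001D11E"]  # a, the euro sign, MUSICAL SYMBOL G CLEF

for ch in samples:
    code = ord(ch)
    where = "within" if code < 2 ** 16 else "beyond"
    print(f"U+{code:04X}", where, "the first 65536,",
          len(ch.encode("utf-16-be")), "bytes in UTF-16,",
          len(ch.encode("utf-32-be")), "bytes in UTF-32")
# U+0061 within the first 65536, 2 bytes in UTF-16, 4 bytes in UTF-32
# U+20AC within the first 65536, 2 bytes in UTF-16, 4 bytes in UTF-32
# U+1D11E beyond the first 65536, 4 bytes in UTF-16, 4 bytes in UTF-32
```

The "-be" (big-endian) codec names are used so that Python does not prepend a byte order mark, which would distort the byte counts.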


Unicode and legacy sets

• The first 128 characters are identical to those in ASCII.

• The next 128 characters are identical to those in ISO-8859-1 (Latin-1).

• Unicode is well documented and the Unicode book can be downloaded from the Internet. A must-have for the serious digital librarian.
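The compatibility claim can be checked exhaustively in Python: decoding each single legacy byte gives exactly the Unicode character with the same number (a sketch):

```python
# Decode every single byte with the legacy set and compare it with chr(),
# which gives the Unicode character with that number.
same_as_ascii = all(bytes([i]).decode("ascii") == chr(i) for i in range(128))
same_as_latin1 = all(bytes([i]).decode("latin-1") == chr(i) for i in range(256))
print(same_as_ascii, same_as_latin1)        # True True

# Hence a character such as ë has the same number, 235, in both sets
print(ord("ë"), "ë".encode("latin-1")[0])   # 235 235
```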


Beyond characters

• There is more to text than a string of characters.

• There is layout
– titles
– abstracts
– mathematical formula spacing


Layout

• Layout can be conveyed by additional text that has special meaning. Examples
– LaTeX
– HTML
– PostScript

• Another way is to do non-textual layout by adding some other digital signals. Examples
– DVI
– MS Word
– MS PowerPoint

These cannot be shown in these slides!
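To make layout conveyed by additional text concrete, here is a Python sketch that writes the same class-schedule row in two markup languages. The session data are copied from the class-structure example that follows; the helper names are hypothetical, and the HTML is simplified (no `align` attributes or links).

```python
import html

sessions = [(0, "2006-09-12", "introduction to the course"),
            (1, "2006-09-19", "libraries and food")]


def latex_row(num, date, topic):
    """One row of a LaTeX tabular: '&' separates cells, a double backslash
    ends the row, and '--' asks LaTeX for an en dash."""
    return f"{num}&{date.replace('-', '--')}&{topic} &\\\\"


def html_row(num, date, topic):
    """The same row as an HTML table row: tags mark the cells, and &#8211;
    is the numeric character reference for the en dash (U+2013)."""
    return (f"<tr><td>{num}</td><td>{date.replace('-', '&#8211;')}</td>"
            f"<td>{html.escape(topic)}</td></tr>")


for row in sessions:
    print(latex_row(*row))
    print(html_row(*row))
```

The text content is identical in both outputs; only the extra marking-up text differs.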


Example: LaTeX

\bigskip\textbf{Class structure}

Classes will be held in the computer lab in the Palmer School between 18:15 and 20:45. An optional practice session will last until 21:15.

\begin{tabular}{@{}llll@{}}

0&2006--09--12&introduction to the course &\\

1&2006--09--19&libraries and food &\\

2&2006--09--26&introduction to shushing &\\


Example: HTML

<p><strong>Class structure</strong><p>Classes will be held in the computer lab in the Palmer School between 18:15 and 20:45. An optional practice session will last until 21:15.<p>Class details:

<p><center><table width=100% border=1>

<tr><td align=left> 0 </td><td align=left> 2006&#8211;09&#8211;12 </td><td align=left><a href="lis510w06a-00.ppt">introduction to the course</a> </td></tr><tr><td align=left> 1 </td><td align=left> 2006&#8211;09&#8211;19 </td><td align=left><a href="lis510w06a-01.ppt">libraries and food</a> </td>


Example: PostScript

Fc(Class)g(structur)o(e)-104 3956 y Fd(Classes)26b(will)g(be)e(held)g(in)h(the)f(computer)f(lab)i(in)f(the)h(P)o(almer)f(School)g(between)f(18:15)h(and)g(20:45.)36 b(An)25 b(optional)e(practice)h(session)-104 4055 y(will)d(last)g(until)f(21:15.)-104 4155 y(Class)i(details:)-104 4307 y(0)141 b(2003\22609\22623)94b(introduction)18 b(to)i(the)h(course)-104 4407 y(1)141 b(2002\22609\22630)94 b(bits)21 b(bytes)f(and)g(characters)-104 4507 y(2)141 b(2003\22610\22607)94 b(databases)20 b(and)g(markup)e(languages)-


DVI (rendition, "class structure")

1659: fntnum27 current font is ptmb8t
1660: setchar67 h:=-820459+473168=-347291, hh:=-22
1661: setchar108 h:=-347291+182183=-165108, hh:=-10
1662: setchar97 h:=-165108+327680=162572, hh:=11
1663: setchar115 h:=162572+254928=417500, hh:=27
1664: setchar115 h:=417500+254928=672428, hh:=43
1665: right3 163840 h:=672428+163840=836268, hh:=53
1669: setchar115 h:=836268+254928=1091196, hh:=69
1670: setchar116 h:=1091196+218232=1309428, hh:=83
1671: setchar114 h:=1309428+290976=1600404, hh:=101
1672: setchar117 h:=1600404+364376=1964780, hh:=124
1673: setchar99 h:=1964780+290976=2255756, hh:=142
1674: setchar116 h:=2255756+218232=2473988, hh:=156
1675: setchar117 h:=2473988+364376=2838364, hh:=179
1676: setchar114 h:=2838364+290976=3129340, hh:=197
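The listing above is dvitype's readable rendering of binary DVI instructions. A Python sketch that recovers the text from the `setchar` codes shows that DVI stores character codes plus exact positions, measured in TeX's scaled points (65536 sp = 1 pt). Treating every horizontal move as a word space is our simplification, and the excerpt stops before the final "e".

```python
# (opcode, argument) pairs copied from the dvitype listing
ops = [("setchar", 67), ("setchar", 108), ("setchar", 97), ("setchar", 115),
       ("setchar", 115), ("right3", 163840), ("setchar", 115), ("setchar", 116),
       ("setchar", 114), ("setchar", 117), ("setchar", 99), ("setchar", 116),
       ("setchar", 117), ("setchar", 114)]

SP_PER_PT = 65536  # TeX measures lengths in scaled points: 65536 sp = 1 pt


def recover_text(ops):
    """Rebuild readable text: setchar carries a character code, and a
    horizontal move is treated as a word space (a simplification)."""
    out = []
    for op, arg in ops:
        if op == "setchar":
            out.append(chr(arg))
        elif op.startswith("right"):
            out.append(" ")
    return "".join(out)


print(recover_text(ops))                  # Class structur
print(163840 / SP_PER_PT, "pt of space")  # 2.5 pt of space
```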


XML

• XML is the Extensible Markup Language. It has become the lingua franca for structured textual data.

• It is also increasingly used on the web.
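A minimal Python sketch of XML as structured text: the class schedule marked up with element and attribute names invented for this example, then read back with the standard library's `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET

# Hypothetical element and attribute names; the content is the class schedule
# used in the LaTeX and HTML examples.
doc = """<classes>
  <class number="0" date="2006-09-12">introduction to the course</class>
  <class number="1" date="2006-09-19">libraries and food</class>
</classes>"""

root = ET.fromstring(doc)
for c in root.findall("class"):
    print(c.get("number"), c.get("date"), c.text)
# 0 2006-09-12 introduction to the course
# 1 2006-09-19 libraries and food
```

Unlike the LaTeX and HTML versions, the markup here says what each piece of text is (a number, a date, a topic) rather than how it should look.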


Databases

• Databases are collections of data with some organization to them.

• The classic example is the relational database.

• But not all databases need to be relational databases.


Relational databases

• A relational database is a set of tables. There may be relations between the tables.

• Each table has a number of records. Each record has a number of fields.

• When the database is being set up, we fix
– the size of each field
– relationships between tables


Example: Movie database

ID | title | director | date
M1 | Gone with the wind | F. Ford Coppola | 1963
M2 | Room with a view | Coppola, F Ford | 1985
M3 | High Noon | Woody Allan | 1974
M4 | Star Wars | Steve Spielberg | 1993
M5 | Alien | Allen, Woody | 1987
M6 | Blowing in the Wind | Spielberg, Steven | 1962

• Single table

• No relations between tables, of course


Problem with this database

• All the data are wrong, but this is just for illustration.

• Names are recorded inconsistently. There is no way to find films by Woody Allen without having to go through all spelling variations.

• Mistakes are difficult to correct. We have to wade through all records, a masochist’s pleasure.
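The spelling problem is easy to reproduce. A Python sketch that searches the single-table data above by exact match, as a naive system would, finds only one of the two Woody Allen films per spelling (`films_by` is a hypothetical helper):

```python
# The single-table movie data from the example, exactly as entered
movies = [
    ("M1", "Gone with the wind", "F. Ford Coppola", 1963),
    ("M2", "Room with a view", "Coppola, F Ford", 1985),
    ("M3", "High Noon", "Woody Allan", 1974),
    ("M4", "Star Wars", "Steve Spielberg", 1993),
    ("M5", "Alien", "Allen, Woody", 1987),
    ("M6", "Blowing in the Wind", "Spielberg, Steven", 1962),
]


def films_by(director):
    """Exact-match search on the director field."""
    return [title for _, title, name, _ in movies if name == director]


print(films_by("Woody Allan"))   # ['High Noon']  (Alien is missed)
print(films_by("Allen, Woody"))  # ['Alien']      (High Noon is missed)
```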


Better movie database

ID | title | director | year
M1 | Gone with the wind | D1 | 1963
M2 | Room with a view | D1 | 1985
M3 | High Noon | D2 | 1974
M4 | Star Wars | D3 | 1993
M5 | Alien | D2 | 1987
M6 | Blowing in the Wind | D3 | 1962

ID | director name | birth year
D1 | Ford Coppola, Francis | 1942
D2 | Allan, Woody | 1957
D3 | Spielberg, Steven | 1942


Relational database

• We have a one-to-many relationship between directors and films
– Each film has one director
– Each director has directed many films

• Here it becomes possible for the computer
– To know which films have been directed by Woody Allen
– To find which films have been directed by a director born in 1942
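The same questions can be put to an actual relational database. A sketch using SQLite through Python's standard `sqlite3` module; the table and column names are our own choices, and the data are those of the "better movie database" slide. The final lines show the payoff for corrections: a misspelled name is fixed in one row.

```python
import sqlite3

con = sqlite3.connect(":memory:")  # throwaway in-memory database
con.executescript("""
CREATE TABLE director (id TEXT PRIMARY KEY, name TEXT, birth_year INTEGER);
CREATE TABLE film (id TEXT PRIMARY KEY, title TEXT,
                   director_id TEXT REFERENCES director(id), year INTEGER);
""")
con.executemany("INSERT INTO director VALUES (?, ?, ?)", [
    ("D1", "Ford Coppola, Francis", 1942),
    ("D2", "Allan, Woody", 1957),
    ("D3", "Spielberg, Steven", 1942),
])
con.executemany("INSERT INTO film VALUES (?, ?, ?, ?)", [
    ("M1", "Gone with the wind", "D1", 1963),
    ("M2", "Room with a view", "D1", 1985),
    ("M3", "High Noon", "D2", 1974),
    ("M4", "Star Wars", "D3", 1993),
    ("M5", "Alien", "D2", 1987),
    ("M6", "Blowing in the Wind", "D3", 1962),
])

# Each film row points at exactly one director row (the "one" side);
# a director row can be pointed at by many film rows (the "many" side).
FILM_WITH_DIRECTOR = "FROM film JOIN director ON film.director_id = director.id"


def films_by_director(name):
    rows = con.execute(f"SELECT film.title {FILM_WITH_DIRECTOR} "
                       "WHERE director.name = ? ORDER BY film.id", (name,))
    return [title for (title,) in rows]


def films_by_director_birth_year(year):
    rows = con.execute(f"SELECT film.title {FILM_WITH_DIRECTOR} "
                       "WHERE director.birth_year = ? ORDER BY film.id", (year,))
    return [title for (title,) in rows]


print(films_by_director("Allan, Woody"))   # ['High Noon', 'Alien']
print(films_by_director_birth_year(1942))
# ['Gone with the wind', 'Room with a view', 'Star Wars', 'Blowing in the Wind']

# Correcting a misspelled name is now a single change in a single row
con.execute("UPDATE director SET name = 'Allen, Woody' WHERE id = 'D2'")
print(films_by_director("Allen, Woody"))   # ['High Noon', 'Alien']
```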


Many-to-many relationships

• Each film has one director, but many actors star in it. The relationship between actors and films is a many-to-many relationship.

• Here are a few actors:

ID | sex | actor name | birth year
A1 | f | Brigitte Bardot | 1972
A2 | m | George Clooney | 1927
A3 | f | Marilyn Monroe | 1934


Actor/Movie table

actor id | movie id
A1 | M4
A2 | M3
A3 | M2
A1 | M5
A1 | M3
A2 | M6
A3 | M4
… as many lines as required


SQL

• Once we have the relational database, we can ask sophisticated questions:
– Which director has had the most female actors working for him?
– In which years were films shot that starred actors born between 1926 and 1935?

• Such questions can be encoded in a language known as “structured query language” or SQL. All relational database vendors implement a dialect of SQL.
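The two questions above, written in the SQLite dialect of SQL and run through Python's `sqlite3` module. A sketch: the table and column names are our choice, the data come from the previous slides, and "most female actors" is read as the most distinct female actors.

```python
import sqlite3

con = sqlite3.connect(":memory:")  # throwaway in-memory database
con.executescript("""
CREATE TABLE director (id TEXT PRIMARY KEY, name TEXT, birth_year INTEGER);
CREATE TABLE film (id TEXT PRIMARY KEY, title TEXT,
                   director_id TEXT REFERENCES director(id), year INTEGER);
CREATE TABLE actor (id TEXT PRIMARY KEY, sex TEXT, name TEXT, birth_year INTEGER);
-- link table: one row per (actor, film) pair, for the many-to-many relationship
CREATE TABLE actor_movie (actor_id TEXT REFERENCES actor(id),
                          movie_id TEXT REFERENCES film(id));

INSERT INTO director VALUES ('D1', 'Ford Coppola, Francis', 1942),
    ('D2', 'Allan, Woody', 1957), ('D3', 'Spielberg, Steven', 1942);
INSERT INTO film VALUES ('M1', 'Gone with the wind', 'D1', 1963),
    ('M2', 'Room with a view', 'D1', 1985), ('M3', 'High Noon', 'D2', 1974),
    ('M4', 'Star Wars', 'D3', 1993), ('M5', 'Alien', 'D2', 1987),
    ('M6', 'Blowing in the Wind', 'D3', 1962);
INSERT INTO actor VALUES ('A1', 'f', 'Brigitte Bardot', 1972),
    ('A2', 'm', 'George Clooney', 1927), ('A3', 'f', 'Marilyn Monroe', 1934);
INSERT INTO actor_movie VALUES ('A1', 'M4'), ('A2', 'M3'), ('A3', 'M2'),
    ('A1', 'M5'), ('A1', 'M3'), ('A2', 'M6'), ('A3', 'M4');
""")

# Which director has had the most (distinct) female actors working for him?
most_female = con.execute("""
    SELECT director.name, COUNT(DISTINCT actor.id) AS n
    FROM director
    JOIN film        ON film.director_id = director.id
    JOIN actor_movie ON actor_movie.movie_id = film.id
    JOIN actor       ON actor.id = actor_movie.actor_id
    WHERE actor.sex = 'f'
    GROUP BY director.id, director.name
    ORDER BY n DESC
    LIMIT 1""").fetchone()
print(most_female)  # ('Spielberg, Steven', 2)

# In which years were films shot that starred actors born between 1926 and 1935?
years = [year for (year,) in con.execute("""
    SELECT DISTINCT film.year
    FROM film
    JOIN actor_movie ON actor_movie.movie_id = film.id
    JOIN actor       ON actor.id = actor_movie.actor_id
    WHERE actor.birth_year BETWEEN 1926 AND 1935
    ORDER BY film.year""")]
print(years)  # [1962, 1974, 1985, 1993]
```

Both answers come from following the link table in both directions, which a single flat table could not support.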


databases in libraries

• Relational databases dominate the world of structured data.

• But they are not so popular in libraries
– They are slow on very large databases (such as catalogs).
– Library data has nasty ad-hoc relationships, e.g.
  • translation of the first edition of a book
  • CD supplement that comes with the print version

These are difficult to deal with in a system where all relations and fields have to be set up at the start and cannot easily be changed later.


http://openlib.org/home/krichel

Thank you for your attention!