LIS654 lecture 3 whaffle Thomas Krichel 2011-09-27

readings

• This slide set follows Reese and Banerjee very closely.

• We want to get through the gist of what they have in chapters one and two. I skip the most trivial things as well as the material that will be covered in the copyright and imaging lectures.

• I have not been involved in repositories, but I don’t buy a lot of what they write.

planning

• “The ultimate success or failure of a digital repository is usually determined in the planning stage. A repository must be structured and organized that users can readily find and use diverse types of resources. It must be easy to maintain and capable of accommodating needs and resources that may not exist at the time the repository is designed.”

• Happy talk!

planning importance

• “The ultimate success or failure of a digital repository is usually determined in the planning stage.”

• It would be useful to have an example of a repository that failed because it was badly planned.

• Does the weak content of many academic repositories suggest that they are all badly planned?

search

• “A repository must be structured and organized that users can readily find and use diverse types of resources.”

• Users don’t search local repositories. They come in through search engines or aggregators (which are also found through search engines). Optimizing repositories for local findability is plain wrong.

capability

• “capable of accommodating needs and resources that may not exist”

• It is impossible to do that. Making this sort of idea a precondition for building a repository slows down progress on the real tasks.

parallel to physical

• “Creating and managing a digital repository is similar to starting a new physical collection … new materials must be added while those that no longer support the mission of the repository should be removed”.

• The first idea holds people hostage to the past and the second is inimical to digital preservation.

preservation

• “one of the primary functions of digital repositories is to preserve electronic resources, though they must also provide a system for cataloging, indexing and retrieving digital materials”.

• We are still on page one, but we already have a contradiction with the statement on the previous slide.

• Inconsistent terms: “electronic resources” vs “digital materials”.

missing here

• An analysis of the functionalities of the repository needs to be done.

• Some of the aims of the repository may be contradictory.

• Then the different functionalities can be prioritized.

• This will make it possible to select appropriate software.

“decision to build a digital repository”

• “Although many people treat repositories as short-term projects that can be funded with grants and other non-recurring monies, the reality is …”

• Building the repository will cost a lot.

• Maintaining it is OK if you have somebody on staff with minimal system administration skills and you can pay for external hosting and local backup.

• Comparing the repository to a new physical collection is not helpful.

role of the repository

• “The importance of physically processing resources is diminishing, and more value is placed on the ability to locate and download remotely stored resources. In this sense, digital repositories are a logical outgrowth of traditional library services in response to challenges brought by network technology.”

• discuss ;-)

example 1 (born digital) offered by R&B

• An example they point out is http://lcweb2.loc.gov/diglib/lcwa/html/elec2000/elec2000-browse.html

• This is a well-presented collection.

• It seems to carry over coding mistakes from the collection.

• There does not appear to be a harvesting interface.
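A harvesting interface would let aggregators collect the collection's metadata in bulk; in the library world this usually means OAI-PMH. Below is a minimal sketch, assuming a hypothetical base URL (the collection above has none), of how a harvester builds ListRecords requests. It only constructs URLs and does not contact any server.

```python
from urllib.parse import urlencode

# Hypothetical OAI-PMH base URL, for illustration only.
BASE_URL = "https://repository.example.org/oai"


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None,
                     resumption_token=None):
    """Build an OAI-PMH ListRecords request URL.

    In OAI-PMH, resumptionToken is an exclusive argument: when it is
    present, only the verb may accompany it.
    """
    if resumption_token is not None:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if from_date is not None:
            params["from"] = from_date  # selective harvesting by datestamp
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    print(list_records_url(BASE_URL, from_date="2011-01-01"))
```

A harvester would fetch the first URL, then keep requesting with the resumptionToken returned in each response until the token comes back empty.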

example 2

• Closer to home, they look at http://oregondigital.org/digcol/corflood64/

• This is a CONTENTdm-based digital image collection.

• This really is an archival collection.

opportunity for libraries

• Provide desktop access.

• Present the library as au fait with technology.

• It is an occasion to build up skills.

• Expand the remit of the library to the publication of locally produced materials. This latter point mainly applies to academic institutions but may apply to others as well.

problems with repositories

• Tools are not stable.

• Migrations will be required.

• User expectations are high (erh…).

• Electronic resources are more difficult to work with.

• Staff adaptability, or having enough competent staff, is the biggest challenge.

repository purpose questions

• What type of resources will it contain?

• How big is it supposed to grow?

• Who is going to use it and how?

• How can resources be protected against modification?

• How will access and IP rights be managed?

• What systems will it need to interact with?

• What resources will be available to create and maintain it?

expected use of the repository

• R&B say that you have to form expectations about the use of the repository.

• What you, in principle, need to think about is how you organize searching and browsing.

• However, in practice it turns out that you will only be able to do what the repository software can do, unless you can change the software. Changing software can be a tall order.

searching

• You usually have resources and their descriptions.

• The descriptions can be stored as BLOBs in a database.

• You need to extract the searchable fields from the descriptions to make them searchable in the database.

• Example: find pictures shot between 2011-04 and 2011-05.
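A minimal sketch of the idea with Python's built-in sqlite3 module. The record layout and the field name `shot` are hypothetical, JSON stands in for whatever description format is really used, and "between 2011-04 and 2011-05" is read here as April through May 2011.

```python
import json
import sqlite3

# Hypothetical picture descriptions; a real repository might hold XML instead.
records = [
    ("img-001", {"title": "Flood on Main Street", "shot": "2011-04-17"}),
    ("img-002", {"title": "Spring fair", "shot": "2011-05-02"}),
    ("img-003", {"title": "Winter storm", "shot": "2011-01-09"}),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE resource ("
    " id TEXT PRIMARY KEY,"
    " description BLOB,"  # the full description, opaque to SQL
    " shot_date TEXT)"    # extracted field, so it can be searched
)
conn.execute("CREATE INDEX idx_shot_date ON resource (shot_date)")

for rid, desc in records:
    blob = json.dumps(desc).encode("utf-8")
    # Extract the searchable field at the moment the record is stored.
    conn.execute(
        "INSERT INTO resource (id, description, shot_date) VALUES (?, ?, ?)",
        (rid, blob, desc["shot"]),
    )


def shot_between(conn, start, end_exclusive):
    """Return ids with start <= shot_date < end_exclusive (ISO date strings)."""
    rows = conn.execute(
        "SELECT id FROM resource WHERE shot_date >= ? AND shot_date < ?"
        " ORDER BY id",
        (start, end_exclusive),
    )
    return [row[0] for row in rows]


print(shot_between(conn, "2011-04-01", "2011-06-01"))  # → ['img-001', 'img-002']
```

ISO dates compare correctly as strings, which is why a plain TEXT column is enough here; the BLOB itself is never searched.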

browsing

• This is tougher.

• Here the data has to be discrete.

• Many times the same entity is referred to by different values, e.g. “Thomas Krichel” vs “Томас Крихель”, “The Magic Flute” vs “Die Zauberflöte”.

• If you want browsing by author, composer, work etc., you have to bring the variant forms together, most likely manually.
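A sketch of bringing variant forms together for browsing. The authority table is hypothetical and, as the slide says, would most likely be maintained by hand; the code only applies it.

```python
from collections import defaultdict

# Hypothetical, manually maintained authority table: variant -> preferred form.
AUTHORITY = {
    "Томас Крихель": "Thomas Krichel",
    "Krichel, Thomas": "Thomas Krichel",
    "Die Zauberflöte": "The Magic Flute",
}


def preferred(value):
    """Map a variant to its preferred form; unknown values pass through."""
    return AUTHORITY.get(value, value)


def browse_index(records, field):
    """Group record ids under the preferred form of one field's value."""
    index = defaultdict(list)
    for rid, rec in records:
        index[preferred(rec[field])].append(rid)
    return dict(sorted(index.items()))


records = [
    ("r1", {"author": "Thomas Krichel"}),
    ("r2", {"author": "Томас Крихель"}),
    ("r3", {"author": "Krichel, Thomas"}),
]
print(browse_index(records, "author"))  # → {'Thomas Krichel': ['r1', 'r2', 'r3']}
```

Without the table, the same person would appear as three separate browse entries.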

backup

• This is more of a technical issue.

• You will need backup. My general prescription would be to run the repository itself with a 3rd party provider.

• Locally, keep a staging (rather than production) server and a backup. They can both be on the same machine.
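For the local backup, here is a minimal sketch using only the Python standard library: it writes a timestamped compressed tar archive of one directory. The assumption that all repository data sits in one directory is mine; a database behind the repository software would need its own dump first.

```python
import shutil
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def backup_directory(data_dir, backup_dir):
    """Write a timestamped .tar.gz of data_dir into backup_dir; return its path."""
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    base_name = backup_dir / f"repository-{stamp}"
    # make_archive appends the extension (.tar.gz for "gztar") itself.
    return Path(shutil.make_archive(str(base_name), "gztar", root_dir=data_dir))


if __name__ == "__main__":
    # Demo on throwaway directories; the real (hypothetical) paths would be
    # the staging server's data directory and a local backup disk.
    with tempfile.TemporaryDirectory() as tmp:
        data = Path(tmp) / "data"
        data.mkdir()
        (data / "record.xml").write_text("<record/>")
        print(backup_directory(data, Path(tmp) / "backups").name)
```

Run from cron or a similar scheduler, this keeps one archive per run; pruning old archives is left out.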

common-sensical sysadmin tips

• You need physical security for any server.

• You need to keep the software up to date. I do it, roughly, weekly.

• You need to join the mailing list for the repository software, and the security list for the operating system.

• Use encrypted access to the server whenever authentication is required.

• Run a minimal amount of software.

acquisitions

• Since paper publishing is expensive, publishers have to exert some quality control.

• For physical collections, libraries have elaborate procedures. They have been evolving slowly for about 500 years.

• Libraries have catalogs, approval plans etc.

• These are of little help with digital materials.

• Most of the challenges of acquiring physical materials continue for digital assets, as R&B noted earlier.

advice

• Another cerebral fart of R&B: “The value of a digital collection is measured by how well it helps people find what they need rather than by the number of items it contains.”

• They continue immediately: “This means that to be useful, digital files must be selected and processed before they are stored.”

developing the collection

• Put resources into the repository because they are there?

• Rely on content providers to provide them?

• Rely on the serendipity of library staff?

R&B questions to answer

• What resources are desired and where are they?

• How will different versions of a document be handled?

• Who should be involved in the selection process?

• What tools exist to help automatically detect resources?

fragmented resources

• “Acquiring resources for a digital repository is an inherently complex endeavor because it is often unclear what needs to be archived in the first place. Electronic resources frequently lack obvious boundaries.”
  – web pages
  – dynamically generated resources

dealing with them

• R&B suggest:
  – not including them?
  – reformatting them?
  – postponing dealing with them?
  – contracting out?

• The Internet Archive’s Heritrix is software that can archive web pages.

• Reformatting links in proprietary file formats may be more difficult.

identification planning

• This is an important part of building an archive.

• Anything that is considered a resource has to be given an identifier.

• Identifiers can be dumb or intelligent.

• Identification may be hierarchical, and it can then be delegated.

• [I am leaving R&B here.]

dumb identifiers

• Dumb identifiers contain no information about the item that they identify.

• For example, a number can be used.

• Advantages
  – easy to create
  – no temptation to change

• Problem
  – not easy to relate to the resource
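A minimal sketch of a dumb identifier minter. The prefix `res` is hypothetical, and the counter lives only in memory; a real minter would have to store the last number used.

```python
import itertools


class DumbIdMinter:
    """Mint opaque identifiers: a prefix plus a zero-padded sequence number.

    The identifier says nothing about the resource, so no later change to
    the resource gives a reason to change the identifier.
    """

    def __init__(self, prefix="res", start=1):
        self.prefix = prefix
        self._counter = itertools.count(start)

    def mint(self):
        return f"{self.prefix}-{next(self._counter):06d}"


minter = DumbIdMinter()
print(minter.mint(), minter.mint())  # → res-000001 res-000002
```

Relating `res-000002` back to its resource needs a separate lookup, which is the problem listed above.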

intelligent identifiers

• They say something about the resource.

• Usually, any hierarchical identification structure has some intelligence built into it.

• But there is a temptation to change the handle when the information that the handle is built on changes.

Example from RePEc

• The identification strategy was set by yours truly.

• It combines a centrally assigned archive code, a series code assigned by the archive, and a code for the paper within the series.

• This is problematic when series move between archives.

• I later tried to have the series code assigned centrally.
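A sketch of this three-part scheme and of what goes wrong when a series moves. RePEc handles are colon-separated strings of the form `RePEc:archive:series:item`; the archive codes `abc` and `xyz` and the series `wpaper` below are made up for illustration.

```python
from typing import NamedTuple


class Handle(NamedTuple):
    archive: str  # centrally assigned archive code
    series: str   # series code assigned by the archive
    item: str     # code of the paper within the series

    def __str__(self):
        return f"RePEc:{self.archive}:{self.series}:{self.item}"


def parse_handle(text):
    """Split a handle string into its three codes."""
    scheme, archive, series, item = text.split(":", 3)
    if scheme != "RePEc":
        raise ValueError(f"not a RePEc handle: {text!r}")
    return Handle(archive, series, item)


def move_series(handle, new_archive):
    """The same paper after its series moves to another archive."""
    return handle._replace(archive=new_archive)


old = parse_handle("RePEc:abc:wpaper:0042")  # hypothetical codes
new = move_series(old, "xyz")
print(old, "->", new)  # → RePEc:abc:wpaper:0042 -> RePEc:xyz:wpaper:0042
```

Nothing about the paper changed, yet its handle did, because the archive code is part of it.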

problem of handle instability

• If handles change, there are problems with all services based on them.

• For example, if you have an announcement service, the paper appears to be new.

• If you have an author claiming service, the author appears to lose a paper and has to select the paper again.
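Continuing with made-up handles, a naive announcement service that compares yesterday's and today's sets of handles shows both symptoms at once.

```python
def diff_snapshots(previous, current):
    """Compare two sets of handles, as a naive announcement service might."""
    return {
        "new": sorted(current - previous),
        "gone": sorted(previous - current),
    }


yesterday = {"RePEc:abc:wpaper:0041", "RePEc:abc:wpaper:0042"}
# Hypothetical: the series moved from archive abc to archive xyz overnight.
today = {"RePEc:xyz:wpaper:0041", "RePEc:xyz:wpaper:0042"}

changes = diff_snapshots(yesterday, today)
print(changes["new"])   # both papers look new and would be announced again
print(changes["gone"])  # the old handles vanish, so claims keyed on them break
```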

http://openlib.org/home/krichel

Please shut down the computers when you are done.

Thank you for your attention!