Problems and Issues in Selecting, Harvesting, and Cataloging Web Resources Joanne Archer and John Schalow University of Maryland Libraries
Transcript
Page 1: Lsr vpresntation

Problems and Issues in Selecting, Harvesting, and Cataloging Web Resources

Joanne Archer and John Schalow

University of Maryland Libraries

Page 2: Lsr vpresntation

Jargon

Crawler

Web Harvesting

Seed

Harvest

Crawl

Page 3: Lsr vpresntation

Wayback Machine

Page 4: Lsr vpresntation

Options for Web Harvesting

In-House Program

e.g. Pandora, Web Curator Tool

Pro: flexibility

Con: $$$

Off-the-Shelf Software

e.g. HTTrack, Adobe Web Capture

Pro: inexpensive

Con: not scalable

Third-Party Subscription

e.g. Web Archiving Service, Archive-It

Pro: ease of use

Con: $

Page 5: Lsr vpresntation

Key Questions for Harvesting Projects

uniqueness

ephemerality

research value

harvest frequency

scope

Page 6: Lsr vpresntation

Maryland’s Pilot Harvests (2008-2010)

• Historic Preservation

• Maryland State Documents

Page 7: Lsr vpresntation

Why harvest these areas?

• Collections are unique

• Builds on existing strengths in print collections

• Large amount of material migrating to the web

Page 8: Lsr vpresntation

Key Questions for Harvesting Projects

uniqueness

ephemerality

research value

harvest frequency

scope

Page 9: Lsr vpresntation

Harvesting

Page 10: Lsr vpresntation

Harvesting Challenges:

• JavaScript

• Streaming media

• Form- and database-driven content

• Password-protected sites

• robots.txt files

• Multiple hosts/subdomains
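To illustrate the robots.txt challenge, here is a minimal sketch of how a crawler might decide whether a page is off limits. It uses Python's standard `urllib.robotparser`. The sample rules, the `example-crawler` user agent, and the `www.example.org` URLs are assumptions made up for illustration; they are not taken from the pilot harvests.

```python
from urllib.robotparser import RobotFileParser


def allowed_by_robots(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt: every crawler is asked to stay out of /private/.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""

if __name__ == "__main__":
    for url in (
        "http://www.example.org/index.html",
        "http://www.example.org/private/report.pdf",
    ):
        print(url, allowed_by_robots(SAMPLE_ROBOTS, "example-crawler", url))
```

A harvest that honors these rules would skip everything under /private/, which is one way content ends up missing from a capture.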

Page 11: Lsr vpresntation

Single host = www.preservemd.org

Multiple hosts = www.umd.edu, www.lib.umd.edu
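The single-host versus multiple-host distinction can be sketched as a scoping check. This is a minimal illustration, not how any particular harvesting service implements scope. The function names are hypothetical, and treating `umd.edu` as the shared parent domain is an assumption drawn from the two host names on the slide.

```python
from urllib.parse import urlsplit


def host_of(url: str) -> str:
    """Return the lowercased host name of a URL ('' if there is none)."""
    return (urlsplit(url).hostname or "").lower()


def in_host_scope(url: str, seed_host: str) -> bool:
    """Single-host scope: only URLs on exactly the seed's host qualify."""
    return host_of(url) == seed_host.lower()


def in_domain_scope(url: str, domain: str) -> bool:
    """Multi-host scope: the domain itself or any host beneath it qualifies."""
    host = host_of(url)
    domain = domain.lower()
    return host == domain or host.endswith("." + domain)


if __name__ == "__main__":
    # A crawl seeded on one host stays on that host.
    print(in_host_scope("http://www.preservemd.org/", "www.preservemd.org"))
    # www.lib.umd.edu is a different host than www.umd.edu ...
    print(in_host_scope("http://www.lib.umd.edu/", "www.umd.edu"))
    # ... so capturing both needs a wider, domain-level scope.
    print(in_domain_scope("http://www.lib.umd.edu/", "umd.edu"))
```

Host-level scope keeps a crawl small but drops linked pages on sibling hosts. Domain-level scope captures them at the cost of a larger harvest.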

Page 12: Lsr vpresntation

End-User Access

Page 13: Lsr vpresntation

End-User Access

collection note

subject heading

general material designation

URLs

uniform title
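The record elements labeled on this slide can be pictured as a simple data structure. The sketch below is an illustration only: the class name, the field values, and the completeness check are hypothetical, and the MARC 21 tags in the comments are the conventional ones for these elements, assumed here rather than taken from the slide.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class HarvestedSiteRecord:
    """Catalog record for a harvested website (illustrative field set)."""
    uniform_title: str = ""                 # conventionally MARC 130/240
    general_material_designation: str = ""  # conventionally MARC 245 $h
    subject_headings: List[str] = field(default_factory=list)  # MARC 650
    collection_note: str = ""               # a MARC 5XX note
    urls: List[str] = field(default_factory=list)              # MARC 856 $u

    def missing_fields(self) -> List[str]:
        """Return the names of elements that are still empty."""
        return [name for name, value in vars(self).items() if not value]


if __name__ == "__main__":
    # All values below are placeholders, not the actual catalog record.
    record = HarvestedSiteRecord(
        uniform_title="Preservation Maryland (Web site)",
        general_material_designation="electronic resource",
        subject_headings=["Historic preservation--Maryland"],
        collection_note="Part of a pilot web harvesting collection.",
    )
    print(record.missing_fields())
```

A check like `missing_fields()` is one way to flag records that lack a link to either the live site or the archived copy before they reach end users.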

Page 14: Lsr vpresntation

Conclusions

Challenges

• Start-up costs

• What to collect

• Metadata creation

But we are well prepared to meet the challenges.

Page 15: Lsr vpresntation

Questions?

• Joanne Archer: [email protected]

• John Schalow: [email protected]