WISCONSIN LIBRARY ASSOCIATION CONFERENCE
OCTOBER 25, 2013
DESIGNING A SUCCESSFUL DIGITAL PROJECT
Supported by WHRAB
Sarah Grimm, Electronic Records Archivist, Wisconsin Historical Society
Emily Pfotenhauer, Recollection Wisconsin Program Manager, WiLS
Slides from the "Planning a Successful Digital Project" start-to-finish session presented at the Wisconsin Library Association annual conference, Green Bay, October 25, 2013. Presenters: Sarah Grimm, Electronic Records Archivist, Wisconsin Historical Society and Emily Pfotenhauer, Recollection Wisconsin Program Manager, WiLS.
(descriptive information)
• Making available online
• Storing and maintaining digital files and data (digital preservation)
Wisconsin Historical Society
DIGITAL PRESERVATION
The Library of Congress started the Digital Preservation Outreach and Education (DPOE) program in order to foster national outreach and education to encourage individuals and organizations to actively preserve their digital content.
http://www.digitalpreservation.gov/education/
Waterford Public Library/University of Wisconsin Digital Collections
DIGITAL PRESERVATION
Digital preservation combines policies, strategies and actions to ensure access to reformatted and born digital content regardless of the challenges of media failure and technological change. The goal of digital preservation is the accurate rendering of authenticated content over time.
Working group on Defining Digital Preservation, ALA Annual Conference, 6/24/2007
WHAT IS DIGITAL CONTENT?
• Digital content is any content that is published or distributed in a digital form, including text, data, sound recordings, photographs and images, motion pictures, and software.
• Digital materials created from analogue sources
• Born-digital content
• Digital materials you currently have or create – or expect to have – that you want to preserve.
DEFINING A DIGITAL COLLECTION
• A good digital collection…
  • Is publicly accessible
  • Is searchable - Includes keywords and other descriptive information (metadata) so users can find what they’re looking for
  • Uses software that is sustainable (will be around for a long time) and interoperable (can be migrated or shared)
  • Remains true to the original materials
  • Respects intellectual property rights
• A digital collection is not…
  • An inventory
  • An online exhibit/gallery/slideshow
WELL-MANAGED COLLECTIONS
• Characteristics of well-managed digital content:
  • Basic information about each collection
  • Minimal metadata for objects
  • Common file formats
  • Controlled and known storage of content
  • Multiple copies in at least 2 locations
BEFORE YOU EVEN START…..
• Don’t scan a mess! Take the time to assess and organize your originals first.
• A digital project can be an ideal time to evaluate collection conditions and rehouse materials as needed.
• Resources for collections care and organization:
  • Wisconsin Historical Society Field Services staff
  • Wisconsin Archives Mentoring Service
  • National Park Service Conserve-O-Grams
Richland County History Room
PROJECT PLANNING WORKSHEET
http://recollectionwisconsin.org/wla2013
Philharmonic Chorus Members
Image ID: WHi-92113
PLANNING
Postal workers sorting mail, 1955
Wisconsin Historical Society WHi-36392
• Connect to your community
• Reach new audiences
• Improve access to “invisible” materials
• Protect fragile or heavily used materials
• Learn more about your collections
• Contribute to our collective knowledge
South Wood County Historical Museum
DEFINING GOALS
POTENTIAL AUDIENCES
• Local residents
• Students and teachers
• Genealogists
• Specialists (e.g. Civil War
When developing a selection policy, consider…
• Your organization’s mission statement and collecting policies
• Appeal and interest (is this of value to researchers? To other audiences?)
• Uniqueness of materials (is this the only source or does it also exist elsewhere? Avoid duplication)
• Focusing on a specific subject, theme or creator
• Manageability – tackle a project of appropriate size and scope
SETTING PRIORITIES
Ask yourself which materials are…
• most significant to your organization?
• most extensive?
• most requested/used?
• easiest?
• oldest?
• newest?
• at risk?
Neville Public Museum of Brown County
SELECTION – YES OR NO?
• This item is rare or unique to our collection.
• This item is frequently requested by our patrons/visitors.
• This item or very similar items are not found anywhere else on the Internet.
• There is enough accurate information available about the item to add useful context for our audience (for example, we know or can find out names of people, locations, dates).
• We have the appropriate equipment to create an accurate, high-quality digital copy of this item (for example, item is not too large to fit on scanner), or funding to outsource if needed.
• This item is in stable condition and will not be damaged by scanning or other handling.
• This item is in the public domain or we have secured permission from the rights holder to make it available online.
DOCUMENT YOUR DECISIONS….
Sinclair Lewis Typing
Image ID: WHi-51874
CONSIDERING COPYRIGHT
• Disclaimer: We are not lawyers.
• Owning a physical item does not necessarily mean you hold the copyright to that item.
• Public domain = no longer under copyright. In the US in 2013 that means the item was:
  • Published before 1923 –OR–
  • Unpublished; creator died before 1943 –OR–
  • Unpublished; unknown creator; made before 1893
UW-Milwaukee Libraries
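The three public-domain rules of thumb above can be written as a small check. This is only a sketch of the slide's 2013 cutoffs, not legal advice; the function name and parameters are illustrative assumptions, and anything the rules do not cover is treated as "not presumed public domain".

```python
from typing import Optional


def presumed_public_domain_2013(
    published: bool,
    year_published: Optional[int] = None,
    creator_death_year: Optional[int] = None,
    year_made: Optional[int] = None,
) -> bool:
    """Apply the slide's three US rules of thumb as of 2013.

    Hypothetical helper: unknown values are passed as None and
    never count in favor of public-domain status.
    """
    if published:
        # Rule 1: published before 1923
        return year_published is not None and year_published < 1923
    if creator_death_year is not None:
        # Rule 2: unpublished, known creator who died before 1943
        return creator_death_year < 1943
    # Rule 3: unpublished, unknown creator, made before 1893
    return year_made is not None and year_made < 1893


# A postcard published in 1908 passes; an unpublished 1902 snapshot
# by an unknown photographer does not (1902 is not before 1893).
print(presumed_public_domain_2013(True, year_published=1908))
print(presumed_public_domain_2013(False, year_made=1902))
```

Recording the inputs you used alongside the result is one way to document the decision for each item.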
CONSIDERING COPYRIGHT
• Works under copyright, copyright holder is known:
  • Contact copyright holder IN WRITING to request permission to make available online.
• Works presumed to be under copyright; copyright holder is unknown or cannot be located:
  • Exercise due diligence to identify and locate copyright holder.
• Be prepared to remove item from digital collection if challenged.
Three Lakes Historical Society
SAMPLE COPYRIGHT STATEMENTS
• For an item presumed to be in the public domain: This item is in the public domain. There are no known restrictions on the use of this digital resource. Contact [your institution] to purchase a high-resolution version of this image.
• For an item under copyright; copyright holder has granted permission to put online: This image has been made available with permission of the copyright holder and has been provided here for educational purposes only. Commercial use is prohibited without permission. Contact [your institution] for information regarding permissions and reproductions.
• For an item in which copyright status is undetermined: This material may be protected by copyright law. The user is responsible for all issues of copyright. Contact [your institution] for information regarding permissions and reproductions.
COPYRIGHT TOOLS
• Public Domain Sherpa: Public Domain Calculator
  • http://www.publicdomainsherpa.com/calculator.html
--for scanning slides and negatives
• Size of scanning bed
• Image editing software
--many new scanners come with Photoshop Elements
• Compatible with your computer’s operating system
• Is your computer fast enough to process large image files?
SCANNING PHOTOGRAPHS
• Scan all photographs in 24-bit color, even if image is black and white
• Scanning resolution (ppi) depends on size of original item
  • Longest side of item longer than 7” = 300ppi
  • Shorter than 7” = 600ppi
  • 35mm slides or other small items = 1200ppi
• Save two copies of each scan:
  • Master file: TIFF (20-40MB) for archiving and printing
  • Access copy: JPEG (1-5MB) for editing, online viewing, email, social media
UW-La Crosse
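As a rough illustration of the resolution guidelines above, the sketch below picks a ppi setting and estimates the uncompressed size of a 24-bit scan. The function names are hypothetical, the slide does not say what to do at exactly 7 inches (600ppi is assumed here), and "small item" is left as a flag because the slide gives no size threshold for it.

```python
def recommended_ppi(longest_side_in: float, small_item: bool = False) -> int:
    """Return a scanning resolution following the slide's guidelines."""
    if small_item:
        # 35mm slides or other small items
        return 1200
    # Assumption: an item exactly 7 inches long is scanned at 600ppi.
    return 300 if longest_side_in > 7 else 600


def uncompressed_size_mb(width_in: float, height_in: float, ppi: int,
                         bits_per_pixel: int = 24) -> float:
    """Estimate uncompressed image size in MB (1 MB = 1,000,000 bytes)."""
    pixels = (width_in * ppi) * (height_in * ppi)
    return pixels * bits_per_pixel / 8 / 1_000_000


# An 8 x 10 inch print: 300ppi, 2400 x 3000 pixels, 3 bytes per pixel.
ppi = recommended_ppi(10)
print(ppi, round(uncompressed_size_mb(8, 10, ppi), 1))

# A 4 x 6 inch snapshot: 600ppi, 2400 x 3600 pixels.
ppi = recommended_ppi(6)
print(ppi, round(uncompressed_size_mb(4, 6, ppi), 1))
```

Both estimates (about 21.6MB and 25.9MB) fall inside the 20-40MB range quoted for TIFF master files, which is a useful sanity check when planning storage.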
SCANNING DOCUMENTS
• Handwritten texts
  • Scan in 24-bit color to retain character of original
  • 300-400ppi is generally sufficient
  • If feasible, create a transcription
  • Use care when unfolding papers or handling tightly bound volumes
Wisconsin Historical Society
SCANNING DOCUMENTS
• Printed texts
  • Scan in 8-bit grayscale or 1-bit black and white
  • 300ppi is generally sufficient
  • Use OCR (Optical Character Recognition) software to make the text computer-searchable
    • May be provided with your scanner software
    • ABBYY FineReader
    • Adobe Acrobat
  • OCR is never 100% accurate, but that’s ok
L. E. Phillips Memorial Library, Eau Claire
WORKING WITH PRINTED TEXT? OCR!
• OCR = Optical Character Recognition
• Software that makes printed text computer-readable and fully searchable
• Very valuable when scanning books, yearbooks, city directories, newspaper clippings, etc.
• A couple of options…
  • ABBYY FineReader ($100-$170)
  • Adobe Acrobat ($45 through techsoup.org)
WHEN NOT TO SCAN IT YOURSELF
• Look to a vendor for scanning…
  • Oversized materials
    --maps, blueprints, etc.
  • Fragile books or scrapbooks
    --bindings can be damaged by laying flat to scan
  • Anything with flaking, cracked or otherwise fragile surface
  • Microfilm
    --newspapers

• Potential vendors
  • Northern Micrographics, La Crosse
  • A/E Graphics, Milwaukee
  • Wisconsin Historical Society (for microfilm)
CREATING METADATA
Syl carving his name in tree, 1902
Wisconsin Historical Society WHi-69022
METADATA: WHAT IS IT?
• Information about stuff
• Technical metadata = information about the digital file (size, type, etc.)
• Descriptive metadata = information about the content of the item (what are we looking at?)
• Helps users find what they’re looking for
• Organized, standardized, consistent, searchable
Grant County Historical Society
SAMPLE METADATA
Field Name: Sample Data
Title: DiVall barber shop, Middleton, 1925
Subjects: Barbers; Barbershops
Type: Still image
Format: image/tiff
Rights statement: This material may be protected by copyright law. The user is responsible for all issues of copyright.
File name: 2006_01_12.tif
Submitter: Middleton Area Historical Society
Date digitized: 2013-04-05
Middleton Area Historical Society
SAMPLE METADATA
Field Name: Sample Data
Creator: Bartle, F. C.
Date Created: 1925-09-12 OR 1920-1930
Materials: Photographs
Description: Ralph DiVall (left) and Edwin T. Baltes (right) shave two men seated in barber chairs. According to a family history on file at the Society, DiVall operated this barber shop from the 1920s until his retirement on July 1, 1966.
Location: Middleton, Dane County, Wisconsin
Collection: DiVall Family Collection
Identifier: 2006.01.12
Middleton Area Historical Society
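One way to keep records like the sample above organized and consistent is to store each one as structured data and run a few simple checks. The sketch below is illustrative only: the choice of required fields, the accepted date patterns (a year, a full date, or a year range) and the helper names are assumptions, not a published standard. It also shows how the sample file name could be derived from the sample identifier.

```python
import re

# Assumption: these three fields are treated as the minimum for every record.
REQUIRED_FIELDS = ("Title", "Identifier", "Rights statement")

# Accepts 1925, 1925-09-12, or a range such as 1920-1930.
DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}|\d{4}-\d{4}|\d{4}")

record = {
    "Title": "DiVall barber shop, Middleton, 1925",
    "Subjects": "Barbers; Barbershops",
    "Type": "Still image",
    "Format": "image/tiff",
    "Creator": "Bartle, F. C.",
    "Date Created": "1925-09-12",
    "Identifier": "2006.01.12",
    "Rights statement": "This material may be protected by copyright law.",
}


def file_name_from_identifier(identifier: str, extension: str = "tif") -> str:
    """Turn an identifier like 2006.01.12 into a file name like 2006_01_12.tif."""
    return identifier.replace(".", "_") + "." + extension


def find_problems(rec: dict) -> list:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS
                if not rec.get(name)]
    date = rec.get("Date Created")
    if date and not DATE_PATTERN.fullmatch(date):
        problems.append(f"unrecognized date format: {date}")
    return problems


print(file_name_from_identifier(record["Identifier"]))
print(find_problems(record))
```

Running the same checks over every record before upload catches missing rights statements and inconsistent dates early.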
TITLES FOR HISTORIC PHOTOGRAPHS
The photograph may already have a title.
EXISTING TITLES
If the photograph contains a title or caption, transcribe it exactly.
Birds-eye-view, No. 4, 1908, Barneveld, Wis.
WHAT MAKES A GOOD TITLE?
If the photo does not already have a title, you’ll need to create one.
A useful title is…
• Descriptive and specific
• Brief
• Follows specific formatting rules
  • Capitalize first word and proper names (people, places, institutions)
  • Don’t start with “A” or “The”
  • Period not needed at the end
Women and children with babies in carriages, Manitowoc County, 1890-1899
(SUBJECT, LOCATION, DATE)
BUILDINGS AND CITYSCAPES
• Identify the name of the street or view
• Identify the location (City OR Township OR County)
• Identify the date (Year? Date range?)
100 block of South Main Street, Fort Atkinson, 1940-1949
(SUBJECT, LOCATION, DATE)
EXPANDED FORMULA FOR CREATING TITLES
SUBJECT, ACTIVITY, LOCATION, DATE
• Subject: person, object, building, etc.
• Activity: action or event
• Location: city OR township OR county
• Date: year or date range
Only include an element IF KNOWN
ACTIVITIES AND EVENTS
Identify…Who? What are they doing? Where and when?
• Circus elephant
• Trainer
• Woman on swing
• Evansville
• 1940-1949
Trainer with circus elephant holding woman on swing, Evansville, 1940-1949
(SUBJECT, ACTIVITY, LOCATION, DATE)
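The title formulas can be sketched as a small helper that joins whichever elements are known and applies the formatting rules (no leading “A” or “The”, capitalized first word, no final period). The function name is hypothetical, and joining subject and activity into one phrase with a space is an assumption based on the circus example; the cataloger still has to word the phrase sensibly.

```python
from typing import Optional


def build_title(subject: str, activity: Optional[str] = None,
                location: Optional[str] = None,
                date: Optional[str] = None) -> str:
    """Build a title as SUBJECT [ACTIVITY], LOCATION, DATE from known elements."""
    lead = subject.strip()
    if not lead:
        raise ValueError("a subject is required")
    if activity:
        lead = f"{lead} {activity.strip()}"
    # Formatting rule: don't start with "A" or "The".
    for article in ("A ", "The "):
        if lead.startswith(article):
            lead = lead[len(article):]
            break
    # Formatting rule: capitalize the first word.
    lead = lead[0].upper() + lead[1:]
    # Only include location and date if known.
    parts = [lead] + [p.strip() for p in (location, date) if p]
    # Formatting rule: no period at the end.
    return ", ".join(parts).rstrip(".")


print(build_title("Trainer with circus elephant", "holding woman on swing",
                  "Evansville", "1940-1949"))
print(build_title("The 100 block of South Main Street",
                  location="Fort Atkinson", date="1940-1949"))
```

This applies only to titles you create; an existing title or caption on the photograph is still transcribed exactly.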
ASSIGNING SUBJECT HEADINGS
• Subject headings are terms or phrases assigned to an item to facilitate searching and browsing a collection.
• Consistent use of subject headings helps link related content in your collection and across disparate collections.
CONTROLLED VOCABULARIES
• A controlled vocabulary is a standardized, pre-determined list of subject headings.
• Some examples of controlled vocabularies:
  • Library of Congress Thesaurus for Graphic Materials
  • Library of Congress Subject Headings
  • Getty Art and Architecture Thesaurus
  • Nomenclature 3.0
New Berlin Historical Society
TIPS FOR ASSIGNING SUBJECT HEADINGS
• Consider the following elements to help select terms:
  • WHO? People - age, gender, occupation, ethnicity
  • WHERE? Building or other setting
  • WHAT? Activities or events
• Always copy terms exactly from the controlled vocabulary.
• Think of your own “tags,” then search the controlled vocabulary list for correct terms.
• How did others do it? Look at similar photos for examples/ideas.
• Aim for 1-5 terms.
• There is no one right answer!
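Two of the tips above (copy terms exactly, aim for 1-5 terms) are easy to check automatically. The sketch below uses a tiny stand-in vocabulary made up of the sample headings shown in these slides; in practice the set would be loaded from whichever controlled vocabulary you adopt. The function name and messages are illustrative.

```python
# Stand-in vocabulary: sample headings from the slides, not a full thesaurus.
VOCABULARY = {
    "Barbers", "Barbershops", "Railroads", "Railroad stations",
    "Carts & wagons", "Students", "Music education", "Youth orchestras",
}


def check_subjects(terms: list, vocabulary: set = VOCABULARY,
                   max_terms: int = 5) -> list:
    """Return a list of problems with the chosen subject headings."""
    problems = []
    if not 1 <= len(terms) <= max_terms:
        problems.append(f"expected 1-{max_terms} terms, got {len(terms)}")
    for term in terms:
        # Exact match only: spelling, capitalization and punctuation all count.
        if term not in vocabulary:
            problems.append(f"not in vocabulary: {term}")
    return problems


print(check_subjects(["Railroads", "Railroad stations", "Carts & wagons"]))
print(check_subjects(["railroads", "Train depots"]))
```

The second call flags both terms: one differs only in capitalization, the other is a free-text tag that still needs to be looked up in the vocabulary.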
SAMPLE SUBJECT HEADINGS
Railroads; Railroad stations; Carts & wagons
SAMPLE SUBJECT HEADINGS
Students; Music education; Youth orchestras
EXERCISE - ASSIGNING TITLES AND SUBJECTS
Work in small groups to assign a title and subjects to a historic photograph.
Remember the basic title formulas:• SUBJECT, LOCATION, DATE• SUBJECT, ACTIVITY, LOCATION, DATE
Select terms from the short list extracted from the Library of Congress Thesaurus for Graphic Materials. The full version of this controlled vocabulary is available online: http://www.loc.gov/rr/print/tgm1/
• Choose a maximum of 5 terms
• Tool with many different capabilities for image manipulation/editing
• For photos, we can easily view an entire folder’s worth of images at one time
CHECKSUMS
• Checksums (AKA “hash sums”) are created by programs running an algorithm against the contents of a file. (There are many free utilities that will perform this function for you.)

• The resulting checksum is a short sequence of letters and/or numbers that, for practical purposes, uniquely identifies that file. (Think “electronic fingerprint.”)
Unix cksum utility
WHY IS THIS A GOOD THING?
• Checksums help maintain the INTEGRITY of your collections because they will tell you when things change over time.

• If two files are exactly the same, the checksums of those files will also be exactly the same (generally speaking)

• If a file becomes corrupted, degraded or is changed in some way, the next time you run the utility on it, the checksum will change

Things that will NOT affect checksums
• Moving items from one place to another
• Changing the file name

Run on the master files when a collection is completed

Set up a schedule to run “verify checks” periodically
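The workflow above (create checksums when a collection is completed, then verify on a schedule) can be sketched with Python's standard hashlib module. Note that this uses SHA-256 rather than the Unix cksum utility pictured in the slides; the function names, the "*.tif" pattern and the in-memory manifest are assumptions for illustration.

```python
import hashlib
from pathlib import Path


def file_checksum(path, algorithm: str = "sha256",
                  chunk_size: int = 1 << 20) -> str:
    """Compute a checksum from a file's contents, reading it in chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder) -> dict:
    """Record a checksum for every TIFF master file in a folder."""
    return {p.name: file_checksum(p) for p in sorted(Path(folder).glob("*.tif"))}


def verify_manifest(folder, manifest: dict) -> list:
    """Return names of files that are missing or whose contents have changed."""
    changed = []
    for name, expected in manifest.items():
        path = Path(folder) / name
        if not path.exists() or file_checksum(path) != expected:
            changed.append(name)
    return changed
```

A typical use would be to call build_manifest once on the finished master files, save the result somewhere safe, and rerun verify_manifest periodically. The checksum depends only on file contents, so moving or renaming a file does not change it; this sketch looks files up by name, though, so a renamed file would be reported as missing until the manifest is updated.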
St. Mary of the Lake Parish School First Day
Image ID: WHi-98433
STORAGE
KEY DECISION POINTS
• How are you going to organize it?
• What are you going to store it on?
• Where are you going to store it?
• How many copies do you need?
Post Office
Image ID: WHi-9135
FACTORS TO CONSIDER
• Immediate Costs
• Quantity (size and number of files)
• Number of copies
• Media (life span, availability, $$)
• Other resources
• Expertise (skills required to manage)
• Services (local vs. hosted)
• Partners (achieving geographic distribution)
• Institutional constraints
HOW MANY AND WHERE?
• Multiple
  • Minimum: two (2) copies in two locations
  • Optimum: six (6) copies
• Geographically distributed
  • If possible, don’t keep all your copies onsite
LOCAL STORAGE OPTIONS
• Local network
• RAID device
• External hard drive
• Archival quality (gold) CDs or DVDs
Take into account potential future storage needs.
Villa Terrace Decorative Arts Museum
CLOUD STORAGE OPTIONS
Commercial options:
• Google Drive
  • Up to 5GB free (approx. 140 high-resolution TIFF files)
  • 25GB = $2.50/month
• Amazon Simple Storage Service (S3)
  • $.095 per GB/month
Institutional options:
• DuraCloud
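A little arithmetic helps turn these prices into a budget. The sketch below estimates how much space a collection needs and what it would cost per month at a per-GB rate. The $.095 default is the 2013 S3 figure quoted above and will be out of date; the 36MB average file size is an assumption chosen to match the "approx. 140 files in 5GB" figure, and 1GB is treated as 1,000MB.

```python
def storage_gb(num_files: int, avg_file_mb: float, copies: int = 1) -> float:
    """Total space needed, in GB, for all copies of a collection."""
    return num_files * avg_file_mb * copies / 1000


def monthly_cost(total_gb: float, price_per_gb_month: float = 0.095) -> float:
    """Monthly cost at a flat per-GB rate (default: 2013 price from the slide)."""
    return total_gb * price_per_gb_month


def files_that_fit(quota_gb: float, avg_file_mb: float) -> int:
    """How many files of a given average size fit within a storage quota."""
    return int(quota_gb * 1000 / avg_file_mb)


# 1,000 TIFF masters at about 36MB each, one cloud copy:
needed = storage_gb(1000, 36)
print(needed, "GB,", round(monthly_cost(needed), 2), "dollars per month")

# How far does a 5GB free tier go at that file size?
print(files_that_fit(5, 36), "files")
```

Remember to multiply by the number of copies you plan to keep, and to leave room for growth when comparing options.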
THE (MOSTLY) GOOD…..
Responsibilities and costs are transferred to the cloud provider
• Installation / replacement / upgrades of hardware and software
• Backup and recovery of data are part of the package
• No local physical presence (valuable space)
• No local environmental requirements (power or cooling costs)
THE (POTENTIALLY) BAD
There are potential disadvantages, however…..
• Can records be managed correctly throughout their entire lifecycle?
• Can it support Open Records requests?
• Security concerns
• Do you know where your data is?
• Accessibility – more “points of failure” when the data is remote
• Costs for accessing data can be high
RESOURCES
The State of Wisconsin Public Records Board has created two documents, which can be found at: http://publicrecordsboard.wi.gov/docs_all.asp?locid=165

• Public Records Board Guidance on the Use of Contractors for Records Management Services
• Use of Contractors for Records Management Services
(Both docs are in the Reference Materials section)
DOCUMENT YOUR DECISIONS….
Sinclair Lewis Typing
Image ID: WHi-51874
ACCESS CONSIDERATIONS
Historical Society library stacks, 1896
Wisconsin Historical Society WHi-23281
WHY ARE YOU PROVIDING ACCESS TO CONTENT?
• User demand
• Institutional visibility
• Legal mandates or grant requirements
• Generate revenue
• Contribute to our collective knowledge
South Wood County Historical Museum
WHAT MAKES A GOOD ONLINE COLLECTION?
• Publicly accessible.
• Searchable - Includes keywords and other descriptive information (metadata) so users can find what they’re looking for.
• Organized and consistent.
• Based on existing international/national/statewide standards and best practices.
• Uses software that is sustainable (will be around for a long time) and interoperable (can be migrated or shared).
• Respects intellectual property rights.
• OAI-PMH compliant (to share content on statewide level)
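For readers curious what "OAI-PMH compliant" means in practice: a harvester requests metadata from a repository with ordinary web requests that carry a verb and a metadata format. The sketch below only builds such a request URL. The ListRecords verb and the oai_dc (Dublin Core) metadata prefix come from the OAI-PMH protocol; the base URL and set name are placeholders, since each platform publishes its own endpoint.

```python
from typing import Optional
from urllib.parse import urlencode


def list_records_url(base_url: str, metadata_prefix: str = "oai_dc",
                     set_spec: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords request URL for a repository endpoint."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        # Optional: limit the harvest to one collection ("set").
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


# Placeholder endpoint and set name, for illustration only.
print(list_records_url("https://example.org/oai"))
print(list_records_url("https://example.org/oai", set_spec="photos"))
```

Opening a URL like this in a browser against a real endpoint returns the collection's records as XML, which is what a statewide aggregator harvests.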
SOME OAI-COMPLIANT ACCESS PLATFORMS
• CONTENTdm
  • Your own instance
  • Hosted by Milwaukee Public Library through Recollection Wisconsin
• ResCarta Web
  • Free and open source
  • Host it yourself or through vendor
• Omeka
  • Free and open source
  • Host it yourself or through Omeka.net
• Other?
Beloit College
CONTENTDM
• Hosted by Milwaukee Public Library through Recollection Wisconsin
• Produced and distributed by OCLC
• Costs (through Recollection Wisconsin):
  • $200 one-time setup fee
  • Annual hosting fees starting at $75
http://content.mpl.org/ashland
RESCARTA WEB
• Free and open source
• Host it yourself; or hosting available through Northern Micrographics (fee-based)
• ResCarta Foundation – based in La Crosse
• Send someone with a laptop to popular local spots/events to demonstrate digital collections:
  • Ask, “Where do people go first to look for this kind of information?” and then market there
• Upload a few digitized images to Flickr with descriptions that point back to your related digital and physical collections.
• Contribute to relevant pages on Wikipedia and include references pointing to specific digital materials.
• Request that the Chamber of Commerce and other relevant local organizations link to the new digital collections from their websites.
• Send a press release to local media
EVALUATING IMPACT
Understanding current users…
• Online survey instrument
• Web analytics
• Email subscriber lists
• Visitor forms
Understanding future users…
• Special interest groups (AASLH, SAA, etc.)
• Listservs
• Workshops and conference sessions
of Digital Imaging Projects,” RLG DigiNews v. 3, no. 5 (1999) WHi-4352
TIPS FROM OTHER DIGITIZERS
• If I could do it all over again, I would:
  • Tackle a smaller group of materials at first
  • Make sure two people started the project at the same time so we could help each other
  • Start with a clearer plan
  • Take the time to sort and research the physical collection before digitizing
  • Have firm deadlines to help me stay on track
Langlade County Historical Society
NEXT STEPS/TO DO LIST
• Review collections and set priorities for digitization.
• Consider developing a written selection policy.
• Determine the copyright status of any materials you plan to share online and secure permissions from copyright holders if materials are not in public domain.
• Acquire scanning equipment or make other plans for conversion.
• Familiarize yourself with good, useful metadata by looking at other online collections.
NEXT STEPS/TO DO LIST
• Develop a file naming convention document.
• Develop a storage management policy
  • E.g., number of copies, locations
• Monitor copies of content for errors/changes
• Evaluate technology to determine your preferred access platform
• Develop a marketing plan
• Determine how you will evaluate the success of