Congressional Papers Roundtable
Donation of Digital Records Survey
Conducted by the Electronic Records Committee
June-July 2015
Introduction: This survey follows up on our 2014 survey of Roundtable member policies and practices with electronic records. In this survey, we focused on the donation process, hoping to reveal details missed in the general survey that would aid institutions in developing their own donation procedures. Between June and July 2015, we solicited responses using the CPR and ACSC listservs and received 12 responses. Although the sample is small, the respondents offered information that should be of use to the congressional papers community. The following report presents the results for each question of the survey. Further analysis of the results will be available on the CPR ERC website this fall.
Danielle Emerling
Adriane Hanson
Laura Litwer
August 5, 2015
Overview
1. Please choose one of the following that best describes your institution.
Total respondents: 12
Institution Type Respondents
Public Library 1
State-funded university or college 9
Private archives/historical society 1
Federal repository 1
2. Do you have digital records in your congressional collections?
Total respondents: 12
Digital records in congressional collections Respondents
Yes 12
No 0
Collection Development and Donor Communication
3. How are digital records addressed in your collection development policy?
Total respondents: 12
Collection Development Policy Respondents
Digital records are referenced throughout our policy, integrated with references to paper records. 7
We have separate collection development policies for paper and digital records. 1
The policy for collecting paper records includes a separate section about digital records. 1
We don't have a collection development policy that addresses digital records. 1
Collection development policy does not specify format, so collecting digital records is implied but not specifically addressed. 1
Other: We really have a separate policy integrated into our formal general collection development policy, but with our Congressional collection development, we reference them throughout our specific goals and objectives, as well as all communications relating to donations. The goal is to update our formal general policy to go along with our Congressional procedures. 1
4. What do you discuss with potential donors about digital records?
Total respondents: 11
Donor Discussions Respondents
Access methods 6
Appraisal and what files the archives would or would not like to have donated 9
Constituent correspondence management systems (CMS/CSS/CRM) 6
Description of files that will be donated 8
Hardware and software used by the office 10
Office policies about creating, using, and saving digital records 9
Preferred file formats the archives would like to receive 6
Preservation capabilities and limitations of the archives 7
Restricted materials that might be present in the donation 7
Storage capabilities of the archives 5
Total expected size of the donation 7
Use of social media and websites and how to capture those for the archives 2
We do not discuss digital records with donors 0
Other: Preference for paper records when possible 1
5. Do you have preferred file formats?
Total respondents: 12
Preferred file formats Respondents
Yes 6
No 6
Preferred File Formats
6. What are your preferred file formats?
Total respondents: 6
File Formats Respondents
PDF 2
PDF/A 1
MS Office 1
TIFF 3
JPEG 2
PST 1
Extensive file format lists 2
Additional comments:
● I'm looking for "widely installed base" or standards-based agnostic. PDF, MS Office, TIFF, JPEG. We have acquired almost no software-dependent files outside of these.
● We make a concerted effort to follow the National Archives Format Policy Guidance for Transfer of Permanent Electronic Records: http://www.archives.gov/records-mgmt/policy/transfer-guidance-tables.html.
● Chart of preferred formats (below)
Audio: BWF (PMD), WAV, FLAC, AIFF (AFF), MP3
Image (born digital or scanned): TIF (TIFF), PNG, U3D, X3D
Video: AVI*, H.264*, MOV*, MP4*, DPX (its own animal)
*With video, encapsulation discussions are necessary when dealing with format/coding, as well as audio codecs.
Objects/Data/Other: XML, ASCII (Text), ODS, CSV, PDF, PDF/E, PDF/A, ODP, PPT (PPTX; PPTX preferred of the two), DOC (DOCX; DOCX preferred of the two), EML, PST, MBOX, MSG
Website related: ARC, WARC, GIF, JPG, HTML (text), XHTML (text)
Coding scripts: Database structures preferred in SQL with version designated; program retention in C#, C+, C++, Python, Ruby, JavaScript, PHP
7. What has been your success in receiving preferred file formats?
Total respondents: 6
Success Respondents
Mixed success 4
Little success 1
No discussion of preferred file formats before transfer 1
Transfers and Summary of Experiences
8. When do you receive digital records from offices?
Total respondents: 12
When Records Have Been Received Respondents
Received everything when the office closed 10
Periodic transfers with no regular schedule 3
Regular transfers 1
Other: Most digital records came as removable electronic files (floppy disks, CDs, etc.) when the collection was donated to the repository. For other collections, we are expecting future transfers. 1
9. What is your preference for transfer schedules and why?
Total respondents: 11
Transfer Schedule Preference Respondents
Periodic/Regular 6
No preference 3
Other 2
Comments regarding preferences:
Why prefer periodic:
● We currently only have digital records with one congressional collection. These came to us in one donation. For other digital collections, we prefer regular scheduled transfers to help with our planning.
● In an ideal world, we would prefer to receive regular transfers while the office is still open and functioning. This type of schedule allows for more conversation with donors and [response cut off]
● I would prefer periodic transfers. In most of our fourteen collections digital materials came as the office was closing or slightly ahead of that.
● In the future, we would prefer to receive digital records through periodic transfers so that we could better prepare the infrastructure for storage and ensure that the formats we are receiving will comply with our standards.
● It would be preferable to receive records, both paper and digital, on an ongoing basis from offices at the end of a congressional session, but so far we have not been able to convince any member offices to comply with this request.
● Would be better to get a regular transfer. The logistics would be easier to manage and the staff involved in creating the records would still be in the office to talk to about what the files are. That would also let us establish a regular relationship with the staff so they might be more likely to ask us questions and would know to ask us before deleting anything important.
Current situation:
● We have not established preferred transfer schedules.
● We have not established policies for transferring digital records yet.
Preferred frequency:
● Yearly would be the most convenient.
“It Depends”:
● Frankly, they all have their advantages and disadvantages. Schedules are probably the easiest to plan for, but if we refused non-scheduled records, we'd miss out on valuable materials. We find it difficult to schedule everything.
● Depends upon the office. The way they manage their files has a bearing on the potential to miss files or context.
10. How have you accepted social media accounts or websites?
Total respondents: 12
Methods of accepting social media accounts or websites Respondents
We have not received any social media accounts or websites. 8
We harvested the files. 3
The office gave us files that were posted on the web. 1
The office exported files from the social media accounts themselves. 1
The office paid a vendor to harvest the files. 1
Institution has harvested websites, but not from congressional offices. 1
11. If the office or your archives has harvested social media or websites, what tools were used and how successful were they?
Total respondents: 3
Tools Respondents
Archive-It 1
Heritrix and Wayback 1
HTTrack 2
Comments:
● We are starting to use Archive-It to harvest social media and websites. We do not have any public collections yet.
● Heritrix + Wayback: very thorough, with well-documented problems. HTTrack requires a lot of refinement, but provides good captures after great effort.
● Used HTTrack. Fine for websites. Did not do social media as well: it could get the most recent posts from some sites, but it also got foreign-language versions of some of the subpages and could not capture the earlier posts that only appear when you scroll down the window. It could not capture Facebook at all.
12. What is the most common way you currently receive digital records?
Total respondents: 12
Method Respondents
External hard drive 8
Magnetic tape 1
Removable media 3
13. What is your preferred method to receive digital records and why?
Total respondents: 12
Note: some respondents had different preferred methods for different situations.
Method Respondents
External hard drive 8
FTP 2
Original computer 3
Web transfer 1
No preference 2
Comments for when a method is preferred:
External hard drive:
● Preferred for older records to see the organization as a whole.
● Most common and convenient method.
● Donors understand this method the best.
Original computer:
● Preferred for older records to see the organization as a whole.
● Preferred when possible.
● Preferred so can take disk image, recover corrupted data and/or metadata that an export would not have.
FTP:
● Preferred for more recent transfers.
● Preferred for smaller transfers.
14. Are there methods of receiving digital records that you do not allow? If so, why not?
Total respondents: 9
Method Respondents
Direct online transfer 1
Email 1
None 7
Comments:
● Direct online transfers are not permitted because we wish the records to be in their most raw format, with minimal compression/transfer/etc. We also require control over the actual data during transfer to establish verification procedures and look for any anomalies or malicious script.
● Email attachments are not permitted because emailing changes all the dates of the files (and potentially other metadata) and has a greater chance of corrupting the files.
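The concerns in these comments (verifying data during transfer, and file dates changing in transit) can be illustrated with a simple fixity check. The following is a minimal sketch only, not a procedure reported by any respondent. The function names `file_digest`, `build_manifest`, and `compare_manifests` are hypothetical, and the choice of SHA-256 and whole-second modification times is an assumption made for illustration.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    sha = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            sha.update(chunk)
    return sha.hexdigest()


def build_manifest(root):
    """Map each file's relative path to (digest, modification time in whole seconds)."""
    root = Path(root)
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            key = path.relative_to(root).as_posix()
            manifest[key] = (file_digest(path), int(path.stat().st_mtime))
    return manifest


def compare_manifests(before, after):
    """Return sorted lists of (missing files, changed content, changed dates)."""
    missing = sorted(p for p in before if p not in after)
    shared = [p for p in before if p in after]
    changed_content = sorted(p for p in shared if before[p][0] != after[p][0])
    changed_dates = sorted(p for p in shared if before[p][1] != after[p][1])
    return missing, changed_content, changed_dates
```

Under these assumptions, a manifest would be built on the office's copy before transfer and again on the archives' copy after transfer. Any path reported by `compare_manifests` signals a file that was lost, altered, or re-dated along the way.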
15. What aspects of the donation process for digital records have been successful for you and why?
Total respondents: 8
Area Respondents
Disk images 1
Donor conversations 2
IT staff conversations 1
Office staff conversations 3
Photographs 2
Text PDF 1
Transfer of the files 1
Treat different than paper 1
Video 1
Comments:
● Digital records have not yet become a focus of the donation process.
Comments about communication:
● It always works best for us if digital records are discussed in the initial donor conversations and are treated as "different" from paper.
● Communicating directly with the IT staff person in the member office, while coordinating with the chief of staff, has proved effective in obtaining digital files.
● Explaining issues and challenges of preservation and security of donated materials in a way that shows that the archival staff has direct interaction with and access control of the digital records at all times has helped with the acquisition of some digital materials, as well as some analog materials that needed digitization for access and future preservation of content (e.g. Beta format videos). This has not yet been used to any great degree with Congressional records, only other donors.
● Meeting with district office staff in-person to learn about the records, and calling D.C. office staff on the phone (email was not very effective). Especially important for learning about the files on the shared server.
Comments about technology:
● Disk imaging has generally been most successful.
● We've mostly been receiving still photos and video in digital form.
● We did receive all images in JPEG format and many of the text files were in PDF format but we still have many different formats to migrate.
● The transfer process itself went well: getting the files copied onto an external hard drive safely and shipped to us.
16. What aspects of the donation process for digital records would you like to improve for future donations and why?
Total respondents: 9
Area Respondents
Donor communication 5
Appraisal 3
Access 1
Accessioning workflow 1
Databases 1
Onsite transfers 1
Policies and standards 1
Preservation 1
Web harvesting 1
Comments about policy:
● We are building our digital archives program and would like to improve all aspects of the donation process, particularly the development of policies and standards.
● I would like to have procedures created to appropriately accession, preserve and make accessible the electronic files. We do not have these procedures in place at the moment; however, we are in the process of establishing them.
Comments about donor conversations:
● I think the big question is advocacy for preservation of email, social media and constituent mail systems data. We're not being offered these kinds of information.
● Having more direct and regular conversations with donors.
● We would like to have more communication with our donors prior to the transfer to ensure that information is preserved in a proper format before it is stripped from its proprietary software (such as the CSS system).
● Better handouts for the offices. We sent a survey that would have given us good information but it was longer than they would read. We want a one-page document for next time that will orient the staff to the process and will explain the rest via conversation.
● Email records have proved problematic, due to the reluctance of staff to submit to requests for their records. The problem stems from email records containing a mix of both work and personal messages. More staff education is required to alleviate this problem, though given how staff use their email accounts, it will always be a problem.
Comments about appraisal:
● Better understanding of the importance placed on different social media accounts so we know which ones to focus on. Better web harvesting method.
● Have something in writing that would allow us to conduct appraisal during accessioning and delete unwanted files to save our storage space.
● Capture onsite at time of transfer.
● Better preservation of database architecture, and the ability to pre-appraise logs and other metadata to determine if it is worth the resources to preserve.
Wrap Up
17. What digital records topics would you like to learn more about, and how can CPR help?
Total respondents: 8
Topic Respondents
Access 4
Appraisal 1
Deaccessioning 1
Hybrid collections 1
Long-term preservation 1
Migration 1
Workflows 2
Working with offices 2
Comments:
● We would like to learn more about planning and implementing workflows for records that may be accessioned decades before they can be made publicly available.
● Long-term preservation, migration, access
● I would love to learn more about how we can supply access to digital records. We have a good system in place for ingesting materials, however we are still struggling to create a plan for online and nearline access.
● How do people convince the staff to let go of email, social media and CMS data?
● Appraisal, deaccessioning unreadable removable electronic records (e.g., floppy disks, etc.), best practice standards for incorporating arrangement and description of digital records with paper records.
● How to make the archived files accessible to researchers for small institutions with small budgets.
● I would like to see the CPR issue more directives to congressional offices to improve the migration process so that archives are not left to pay out massive expenses for migration and data reconstruction.