PANDORA: An Overview. Future-proofing Institutional Websites, 19-20 January 2006, London. Matthew Walker, Deputy Director, Collection Infrastructure, IT Division, National Library of Australia.
Transcript
Page 1

PANDORA: An Overview

Future-proofing Institutional Websites

19-20 January 2006

London

Matthew Walker

Deputy Director, Collection Infrastructure

IT Division

National Library of Australia

Page 2

04/13/23 Future-proofing Institutional Websites, 19-20 January 2006, London

Introduction
• Origin: Proof-of-concept
• Selection work started in 1996
• Archiving began late 1996/early 1997
  – Few automated processes
  – Progressed to more automated approach
• Now: Important NLA archiving activity

Page 3

How?
• Dynamic approach
  – Low structure, high flexibility
  – Processes developed “on the fly”
• Result
  – Outcomes achieved
  – Best use of available resources

Page 4

Who?
• NLA
  – Digital Archiving Section
    • Business responsibility (~7 staff)
  – Librarians (support as needed)
    • Cataloguing
  – Information Technology
    • Support (~1 staff)
    • Enhancement/Redevelopment (~4 staff)

Page 5

Who?
• Partner Institutions
  – Libraries:
    • Northern Territory Library, State Library of New South Wales, State Library of Queensland, State Library of South Australia, State Library of Victoria, State Library of Western Australia
  – Other:
    • Australian Institute of Aboriginal and Torres Strait Islander Studies, Australian War Memorial, National Film and Sound Archive

Page 6

What?
• NLA responsibilities
  – National Library Act, 1960
    • No legal deposit legislation for electronic resources!
  – Maintain and develop a national collection of ‘library material’
  – Comprehensive collection relating to Australia and the Australian people
  – Leadership role

Page 7

Characteristics
• Selective approach
• Scalable to available resources
• Negotiate permission to archive
• Manual quality assurance processes
• Access to the archived resources

Page 8

Issues
• Missing resources for future researchers
• Labour intensive
• Full linking structure of the Internet not retained
• Deep web content not archived

Page 9

Workflow
1. Nominating/Identifying
  • Publisher self-nomination
    • Nomination form (http://pandora.nla.gov.au/registration_form.html)
  • Indexing/abstracting agency nominations
    • Nomination form (http://pandora.nla.gov.au/indexerform.html)
  • NLA’s Digital Archiving Section (DAS)
  • Partner institutions

Page 10

Workflow
2. Selecting
  • DAS
    • NLA selection guidelines (http://pandora.nla.gov.au/selectionguidelines.html)
  • Partner institutions
    • Own selection guidelines
  • Type of content
    • Documents (e.g. PDF)
    • Whole and partial websites

Page 11

Workflow
3. Gathering
  • Mechanisms
    • HTTrack crawling (http://www.httrack.com)
    • FTP from publisher
    • Email from publisher
  • Preservation copy
  • Post-crawl processing
  • Working area
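
As a rough illustration of the HTTrack mechanism listed above, the sketch below assembles an HTTrack command line for one title and gather date. The `build_httrack_command` helper, the `<root>/<title_id>/<gather_date>` folder layout and the example title ID are assumptions made for illustration, not a description of how PANDAS drives HTTrack; only the basic `httrack <url> -O <path>` form comes from HTTrack itself.

```python
from pathlib import PurePosixPath


def build_httrack_command(url: str, working_root: str, title_id: int, gather_date: str) -> list[str]:
    """Return the argument list for mirroring `url` into a per-instance folder.

    The <root>/<title_id>/<gather_date> layout is hypothetical.
    """
    target = PurePosixPath(working_root) / str(title_id) / gather_date
    # `-O` tells HTTrack where to write the mirror and its log files.
    return ["httrack", url, "-O", str(target)]


cmd = build_httrack_command("http://www.example.org/", "/working", 12345, "20060119")
print(" ".join(cmd))
```

Passing `cmd` to `subprocess.run(cmd, check=True)` would start the crawl, assuming HTTrack is installed on the machine.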

Page 12

Workflow
4. Processing
  • Quality assurance
    • Manual check for viewing/linking errors
    • Completeness and functionality
    • New content (compare with previous instance)
    • No unexpected content
  • Modifications
    • Write access to the working area
    • Add missing files, fix broken links, etc.
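
The slides describe quality assurance as a manual check. As one possible aid to the "compare with previous instance" step, the sketch below diffs two gathers by content hash. The in-memory file dictionaries and the `manifest` and `compare_instances` helpers are hypothetical and are not part of PANDAS.

```python
import hashlib


def manifest(files: dict[str, bytes]) -> dict[str, str]:
    """Map each relative path to the SHA-256 digest of its content."""
    return {path: hashlib.sha256(data).hexdigest() for path, data in files.items()}


def compare_instances(previous: dict[str, str], current: dict[str, str]) -> dict[str, list[str]]:
    """Classify paths as added, removed or changed between two gathers."""
    shared = set(previous) & set(current)
    return {
        "added": sorted(set(current) - set(previous)),
        "removed": sorted(set(previous) - set(current)),
        "changed": sorted(p for p in shared if previous[p] != current[p]),
    }


# Two made-up instances of the same title.
old = manifest({"index.html": b"<h1>2005</h1>", "about.html": b"About", "old.pdf": b"%PDF"})
new = manifest({"index.html": b"<h1>2006</h1>", "about.html": b"About", "news.html": b"News"})
report = compare_instances(old, new)
print(report)
```

A reviewer could then start with the "added" and "changed" paths rather than browsing the whole instance.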

Page 13

Workflow
5. Archiving
  • Transfer master display copy from working area to Digital Object Storage System (DOSS)
  • Transfer preservation copy to preservation area on the DOSS
  • Create display copy on web server
    • Still not published!

Page 14

Workflow
6. Publishing
  • Title Entry Page (TEP)
    • Created from metadata
    • Additional links to notes, links to serial issues, copyright statement, etc.
    • Creation makes the archived copy publicly accessible
  • Persistent Identifiers (PIs)
    • e.g. nla.arc-25849-20051113-www.bullyingnoway.com.au/default.html
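
To make the shape of the example PI above concrete, the sketch below splits it into three parts. Reading those parts as a title number, a gather date (YYYYMMDD) and the original path is an assumption drawn from this single example; `parse_pi` and `ArchivePI` are hypothetical names, not an NLA API.

```python
import re
from datetime import date, datetime
from typing import NamedTuple

# Assumed form: nla.arc-<digits>-<8-digit date>-<original host and path>
PI_PATTERN = re.compile(r"^nla\.arc-(?P<title>\d+)-(?P<gathered>\d{8})-(?P<path>.+)$")


class ArchivePI(NamedTuple):
    title_id: int
    gathered: date
    original_path: str


def parse_pi(pi: str) -> ArchivePI:
    """Split a PI of the assumed form into its three components."""
    match = PI_PATTERN.match(pi)
    if match is None:
        raise ValueError(f"not in the assumed PI form: {pi!r}")
    gathered = datetime.strptime(match["gathered"], "%Y%m%d").date()
    return ArchivePI(int(match["title"]), gathered, match["path"])


pi = parse_pi("nla.arc-25849-20051113-www.bullyingnoway.com.au/default.html")
print(pi)
```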

Page 15

Workflow
7. Cataloguing
  • Bibliographic details
    • NLA catalogue
    • National Bibliographic Database (NBD)
  • Metadata imported into PANDORA TEPs

Page 16

Workflow
• Permissions
  • No legal deposit
    • Explicit permission of the publisher is sought prior to archiving
  • Copyright, etc.
    • Publisher’s permission to make publicly available
      – Restrictions

Page 17

Workflow
• Restrictions
  • Publisher restrictions on access
    • Period
      – e.g. accessible from restricted location/s for 5 years
      – Location is specified by IP address and subnet mask
    • Date
      – e.g. accessible from restricted location/s between 3/12/2005 and 31/1/2007
      – Location is specified by IP address and subnet mask
    • Authenticated group
      – e.g. accessible by username/password credentials
  • Can be enabled/disabled in PANDAS
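
The date-based restriction above can be sketched as a small check that combines a date window with an IP address and subnet mask. The `DateRestriction` record, the documentation-range addresses and the rule that access is open outside the window are all assumptions for illustration; the slides do not say how PANDAS evaluates restrictions.

```python
from dataclasses import dataclass
from datetime import date
from ipaddress import ip_address, ip_network


@dataclass
class DateRestriction:
    """Hypothetical record: access limited to one location between two dates."""

    address: str
    subnet_mask: str
    start: date
    end: date

    def allows(self, client_ip: str, today: date) -> bool:
        if not (self.start <= today <= self.end):
            # Assumption: outside the restricted window the copy is open to all.
            return True
        # strict=False tolerates host bits being set in `address`.
        network = ip_network(f"{self.address}/{self.subnet_mask}", strict=False)
        return ip_address(client_ip) in network


# Window taken from the slide's example (day/month/year): 3/12/2005 to 31/1/2007.
rule = DateRestriction("192.0.2.0", "255.255.255.0", date(2005, 12, 3), date(2007, 1, 31))
inside = rule.allows("192.0.2.44", date(2006, 1, 19))
outside = rule.allows("198.51.100.7", date(2006, 1, 19))
after_window = rule.allows("198.51.100.7", date(2007, 2, 1))
print(inside, outside, after_window)
```

A period-based restriction (for example, 5 years) could reuse the same check by computing `end` from the start date.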

Page 18

NLA Tools
• PANDAS
  – http://pandora.nla.gov.au/pandas.html
  – Web archive management system.
• XINQ
  – http://www.nla.gov.au/xinq/
  – Making deep web database archives accessible by browse/search.

Page 19

Other Tools
• PageVault
  – http://www.projectcomputing.com/products/pageVault/
  – Archives your website by keeping a copy of every accessed version of a page as it passes through your web server.
• HTTrack
  – http://www.httrack.com
  – Desktop/command-line tool for crawling websites.
• Heritrix
  – http://crawler.archive.org/
  – Tool from Internet Archive for crawling the web.
  – Designed for large-scale crawls, rather than individual websites.

Page 20

PANDORA Resources
• Selection guidelines
  – http://pandora.nla.gov.au/selectionguidelinesallpartners.html
• Papers & presentations
  – http://pandora.nla.gov.au/papers.html

Page 21

Other Resources
• PANDORA Archiving Issues FAQ
  http://pandora.nla.gov.au/manual/pandas/faq.html
• NLA Digital Archiving Section - General Procedures (Procedures for handling Internet resources)
  http://pandora.nla.gov.au/manual/general_procedures.html
• NLA Digital Archiving Section Manual - Check List for Scheduled Gatherings
  http://pandora.nla.gov.au/manual/checklist.html
• NLA Digital Archiving Section Manual - Gathering Schedule Guidelines
  http://pandora.nla.gov.au/manual/schedule_guidelines.html

Page 22

Future Directions/Issues
• Deep web – database archiving
• Historical repository of tools for viewing archive content
• New & future ways of authoring & publishing to the web
  – XML publishing, blogs, DB driven, wikis…
  – What’s coming in 2, 5 or 10 years’ time?

Page 23

Recommendations for starting out
• Do something small & do it now.
• Build on what you already have.
• Think about what you have done and revise/expand as necessary.

Page 24

Summary
• The PANDORA story
• Tools and resources
• Futures/ideas