Marine Protected Areas Lab 11
Dec 22, 2015
www.sims.monash.edu.au
IMS9300IS/IM Fundamentals
Lecture 7
Information Management Issues
Outline
• What Information Management is
• Documentation
• Managing less-structured information
• Metadata
• Key Issues
– Volume
– Management
– Standards
– Longevity
References
National Archives of Australia Recordkeeping Metadata Standard for Commonwealth Agencies http://www.naa.gov.au/recordkeeping/control/rkms/contents.html
Public Records Office Victoria. Management of Electronic Records PROS 99/007 (Version 2) The Victorian Electronic Records Strategy (VERS) http://www.prov.vic.gov.au/vers/standards/pros9907vers2/default.htm
Dublin Core Metadata Element Set http://dublincore.org/documents/dces/
Reading
• Buckland, Michael K. [1997]. What Is a “Document”? Journal of the American Society for Information Science. 48 (9) 804-809. [available as full text on-line via Monash Library catalog]
• Baca, Martha Introduction to metadata http://www.getty.edu/research/conducting_research/standards/intrometadata/index.html
• DigiCULT Thematic Issues http://www.digicult.info
Some sample sites
• William Blake Archive http://www.blakearchive.org/main.html
• Este Art Archive http://www.eca.ferrara.it
• Picture Australia http://www.pictureaustralia.org/
What Information Management is
• The term is used differently
– In a computer science environment, where it is seen as the same as “data management”, and
– In the information sciences, where it denotes a variety of sub-disciplines, such as librarianship, knowledge management, recordkeeping and archives, and associated sub-fields, such as information retrieval and the technological management of ill-structured information.
Some sub-disciplines of IM
• Recordkeeping
• Librarianship
• Archives
• Knowledge Management
Some sub-activities
• Information Retrieval
– Cataloguing
– Classification
– Indexing
• Technical operations
– [Specialised] database management
– Preservation
– Delivery of information
Key issues for IM
• Continuing massive increases in volume of material
• Continuing massive increase in demand for information
• Connecting these two
• Storage
• Retrieval
• Movement of material across time and place
Growth of Information
• 7 of 8 scientists who have ever lived are alive now
• Scientific publication rate doubles every 7 years
• Has the WWW slowed in its growth?
Growth of IM tools’ power
• Moore’s law – transistor counts double about every two years, popularly quoted as cpu power doubling every 18 months
• Storage capacity increases
• Sophistication of tools
• Wider acceptance of standards?
• Integration of tools [portals]
• Koenig, Michael E. D. (1982). The information controllability explosion. Library Journal Nov 1, 1982 v107 p2052(3)
– How valid are Koenig’s ideas now?
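The doubling rates quoted for publication and for cpu power can be compared with a little arithmetic. This is a minimal sketch, assuming both figures (7 years and 18 months) hold as constant doubling times, which is a simplification; the function name `growth_factor` is illustrative only.

```python
def growth_factor(years: float, doubling_time_years: float) -> float:
    """Factor by which a quantity grows over `years` at a constant doubling time."""
    return 2 ** (years / doubling_time_years)


# Assumed doubling times, taken from the slides as rough rules of thumb.
PUBLICATION_DOUBLING_YEARS = 7.0
CPU_DOUBLING_YEARS = 1.5

for span in (7, 14, 21):
    info = growth_factor(span, PUBLICATION_DOUBLING_YEARS)
    tools = growth_factor(span, CPU_DOUBLING_YEARS)
    print(f"{span:>2} years: publications x{info:,.0f}, cpu power x{tools:,.0f}")
```

If the assumptions hold, tool power outpaces publication volume by a widening margin, which is one way to frame the question about Koenig's argument.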
Demand for Information
• Population growth
• Literacy growth
• Communications growth
• Globalisation
• Social complexity
• Information-based economies
• Information media used for “entertainment”
Storage of information
• How much paper is 1GB?
• About 30m of reams of A4 printed both sides [at that rate your 100GB disk is about 3km of paper: think several Eiffel Towers!]
• So if we can store it, it would be good to worry about format, retrieval and longevity.
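The paper estimate can be reproduced as a back-of-envelope calculation. The sketch below assumes plain text at one byte per character, about 2,500 characters per printed side, and sheets 0.1mm thick; all three figures are assumptions chosen for illustration, not values from the lecture.

```python
def stack_height_m(n_bytes: float,
                   bytes_per_side: int = 2500,
                   sheet_thickness_mm: float = 0.1) -> float:
    """Height in metres of a stack of double-sided sheets holding n_bytes of plain text."""
    sides = n_bytes / bytes_per_side   # printed sides needed
    sheets = sides / 2                 # two sides per sheet
    return sheets * sheet_thickness_mm / 1000


GB = 10 ** 9
for size_gb in (1, 100):
    print(f"{size_gb:>3} GB ~ {stack_height_m(size_gb * GB):,.0f} m of paper")
```

With these assumptions 1GB comes to about 20m, the same order of magnitude as the figure quoted; sparser pages or thicker paper push the answer up.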
Retrieval
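• Precision – the proportion of retrieved items that are relevant
• Recall – the proportion of relevant items that are retrieved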
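Both measures can be computed from two sets: what the system returned and what was actually relevant. A minimal sketch, with made-up document identifiers; the function name `precision_recall` is illustrative.

```python
def precision_recall(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    """Return (precision, recall) for one query."""
    hits = len(retrieved & relevant)                      # relevant items actually found
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical result set and relevance judgements for a single query.
retrieved = {"doc1", "doc2", "doc3", "doc4"}
relevant = {"doc2", "doc4", "doc5"}
p, r = precision_recall(retrieved, relevant)
print(f"precision={p:.2f} recall={r:.2f}")
```

Retrieving more items tends to raise recall at the cost of precision, which is why the two are usually reported together.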
Full-text
• Words
• Word order
• Vocabulary issues
– What does the word “lead” mean?
– What’s the difference between “I commence my vacation” and “I start my holiday”?
• Stop words
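The vocabulary and stop-word issues show up as soon as text is indexed word by word. The sketch below builds a tiny inverted index; the stop-word list and the sample documents are assumptions made up for illustration, and the index ignores word order entirely.

```python
import re
from collections import defaultdict

# Illustrative stop-word list; real systems use much longer ones.
STOP_WORDS = {"i", "my", "a", "the", "of", "and", "to"}


def tokenize(text: str) -> list[str]:
    """Lower-case the text, split it into words, and drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def build_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Map each remaining word to the set of documents containing it."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


docs = {
    1: "I commence my vacation",
    2: "I start my holiday",
    3: "The lead pipe",
    4: "Lead the way",
}
index = build_index(docs)
print(sorted(index["lead"]))     # both senses of "lead" match the same query
print(sorted(index["holiday"]))  # the "vacation" document is missed
```

A search for "lead" cannot tell the metal from the verb, and a search for "holiday" never finds "vacation"; plain word matching has no notion of meaning.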
Metadata
• Suppose you had a formal scheme to describe things, in terms of
– Identification
– Description
– Content
– Rights
– Administration
• The beauty of standards is that there are so many to choose from.
Metadata Standards – Alphabet Soup
• DC [Dublin Core]
• XML [build your own scheme?]
• RDF [Resource Description Framework]
• TEI [Text Encoding Initiative]
• OAIS [Open Archival Information System]
• Etc. etc.
• Oh yes, and MARC [Machine-Readable Cataloging]
Dublin Core
• A standard set of 15 “tags” for metadata for Web materials, named after Dublin, Ohio, where the first workshop was held in 1995.
• Subsequently modified at further meetings in Canberra and Warwick.
• Is it too simple or too complex?
DC Tags
• Title
• Creator
• Subject & keywords
• Description
• Publisher
• Contributor
• Date
• Resource Type
• Format
• Resource Identifier
• Source
• Language
• Relation
• Coverage
• Rights Management
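One common way to attach these tags to a Web page is as HTML `meta` elements named `DC.<Element>`. A minimal sketch of generating them follows; the function name `dc_meta_tags` and the record values are hypothetical, and only three of the fifteen elements are filled in.

```python
from html import escape


def dc_meta_tags(record: dict[str, str]) -> list[str]:
    """Render a Dublin Core record as HTML meta elements, one per field."""
    return [
        f'<meta name="DC.{element}" content="{escape(value, quote=True)}">'
        for element, value in record.items()
    ]


# Illustrative values only; a real record would come from a cataloguing process.
record = {
    "Title": "Information Management Issues",
    "Language": "en",
    "Format": "text/html",
}
for tag in dc_meta_tags(record):
    print(tag)
```

Escaping the content matters because titles and descriptions often contain quotation marks or ampersands.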
DC – an example
• The Commonwealth government in Australia has endorsed AGLS [Australian Government Locator Service], which uses a superset of DC
• e.g. http://www.health.gov.au/pubs/annrep/ar2003/index.htm
Metadata from Commonwealth department Web page
<link rel="schema.AGLS" href="http://www.naa.gov.au/recordkeeping/gov_online/agls/1.2">
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<meta name="DC.Creator" content="Australian Government Department of Health and Ageing">
<meta name="DC.Publisher" content="Australian Government Department of Health and Ageing">
<meta name="DC.Rights" content="Copyright of Commonwealth Government">
<meta name="DC.Title" content="2002-03 Annual Report Health and Ageing Portfolio">
<meta name="DC.Subject" content="aged care services; australia; department of health and ageing; health care systems; health services" scheme="Health Thesaurus">
<meta name="DC.Description" content="This page gives access to the 2002-03 Annual Report for the Department of Health and Ageing for the financial year 2002-2003 ">
<meta name="DC.Language" content="en" scheme="RFC3066">
<meta name="DC.Date.Created" content="2003-10-29" scheme="ISO8601">
<meta name="DC.Date.Issued" content="2003-10-29" scheme="ISO8601">
<meta name="DC.Date.Modified" content="2003-10-29" scheme="ISO8601">
<meta name="DC.Type" content="document" scheme="HI type">
<meta name="DC.Type" content="resource" scheme="HI category">
<meta name="DC.Format" content="application/pdf" scheme="IMT">
<meta name="DC.Identifier" content="http://www.health.gov.au/pubs/annrep/ar2003/index.htm" scheme="URI">
<meta name="AGLS.Availability" content="Available as a set of PDF files">
<meta name="AGLS.Audience" content="adult" scheme="HI age">
<meta name="HI.Complexity" content="medium">
<meta name="HI.Status" content="registered">
<meta name="Keywords" content="aged care services, australia, annual report, australian government, department of health and ageing, health care systems, health services">
<meta name="Description" content="This page gives access to the 2002-03 Annual Report for the Department of Health and Ageing for the financial year 2002-2003">
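Metadata embedded this way can be harvested by any program that reads the page. The sketch below uses Python's standard `html.parser` module to collect the `DC.` and `AGLS.` fields; the class and function names are illustrative, and the sample is a short excerpt of the tags above rather than the whole page.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect the content of meta elements whose name starts with DC. or AGLS."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name = attr.get("name")
        if name and name.startswith(("DC.", "AGLS.")):
            # A field such as DC.Type may repeat, so keep a list per name.
            self.fields.setdefault(name, []).append(attr.get("content", ""))


def extract_metadata(html_text: str) -> dict:
    parser = MetaCollector()
    parser.feed(html_text)
    return parser.fields


sample = """
<meta name="DC.Title" content="2002-03 Annual Report Health and Ageing Portfolio">
<meta name="DC.Language" content="en" scheme="RFC3066">
<meta name="DC.Type" content="document" scheme="HI type">
<meta name="DC.Type" content="resource" scheme="HI category">
<meta name="HI.Status" content="registered">
"""
for name, values in extract_metadata(sample).items():
    print(name, values)
```

The `scheme` attribute is ignored here; a fuller harvester would keep it, since it says which vocabulary or encoding each value follows.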
Retrieval of non-text
• Images
– “I want a picture of a boy, with a horse and a dog”
• Sound
– “What’s this tune? [hums tunelessly]”
What are humans good at? Oral tradition.
• Before information could be stored, content was handed down by oral tradition.
• We are good at listening to stories
• Legends, myths, folk stories, traditional song and verse are all we have from before the invention of storage codes.
Transfer of information over time and space
• Writing, in its various forms, has been used to move information over time and place.
• Typically the more material moves, and the faster it moves, the more transient it is.
Longevity of information
• The useful life of information is difficult to determine.
– Consider information on a cricket scoreboard
– This lecture is meant to be transient – the relevant content will have changed in mere months.
– Warnings about radioactive contamination may need to last a long time
Stored electronic material
• The problems are associated with
– The content [may be transient]
– The medium [there may be no hardware]
– The format [there may be no operating system]
– The structure [there may be no software application]
• Failure at the latter 3 levels may mean not just that the material cannot be read, but that the attempt damages the source.
Images
• Material stored as images can be even more under threat, as display media may also be unobtainable.
• Conversion of text image to text is subject to the difficulties of OCR.
• Image formats may be proprietary.
Sound
• Along with written codes, this provides us with some models of moving material across media.
• These are wax cylinder recordings from 1897 and 1904.
• Has sound been lost in the process of digitising the sound? Can you hear the difference between vinyl and CD, between audiotape [what type] and FM radio?
Moving digital material through time
• This can be achieved by
– Wrapping the source material in a succession of layers of decoders
– Transferring the source to new media
– Preserving the source operating conditions
– Ensuring “backward compatibility”
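The first option, wrapping the source in layers, can be pictured as a package that records which decoders are needed and in what order. This is a toy sketch only: the layer names, the `wrap` and `unwrap` functions, and the choice of zlib and base64 are all assumptions for illustration, not a preservation standard.

```python
import base64
import zlib

# Each layer name maps to an (encode, decode) pair. The names are illustrative.
LAYERS = {
    "zlib": (zlib.compress, zlib.decompress),
    "base64": (base64.b64encode, base64.b64decode),
}


def wrap(payload: bytes, layer_names: list[str]) -> dict:
    """Apply each layer in order and record the order alongside the data."""
    data = payload
    for name in layer_names:
        encode, _ = LAYERS[name]
        data = encode(data)
    return {"layers": list(layer_names), "data": data}


def unwrap(package: dict) -> bytes:
    """Peel the layers off in reverse order to recover the source."""
    data = package["data"]
    for name in reversed(package["layers"]):
        _, decode = LAYERS[name]
        data = decode(data)
    return data


original = "Warning: radioactive contamination".encode("utf-8")
package = wrap(original, ["zlib", "base64"])
print(package["layers"], len(package["data"]), "bytes")
print(unwrap(package).decode("utf-8"))
```

The weak point is the same as for any stored format: a future reader still needs working decoders for every layer named in the package.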
Digital Curation Centre
Scientists and researchers across the UK generate increasingly vast amounts of digital data, with further investment in digitisation and purchase of digital content and information. The scientific record and the documentary heritage created in digital form are at risk, from technology obsolescence and from the fragility of digital media. The JISC and the academic community have already begun to identify a strategic approach and have invested in a number of scoping studies. Building on that work and the expertise already existing in particular disciplines, the task is now to support UK institutions in storing, managing and preserving these data to ensure their enhancement and continuing long-term use.
http://www.dcc.ac.uk/
Summary
• The key issues for IM pivot on
– Accessibility
– Storage
– Retrieval
– Preservation
• The key issue is recognition of the need to do something about it, and exercising the will to do that.