Transcript
1. Libraries in a data-centered environment Jakob Voß (GBV)
Ticer Summer School, August 22nd, 2012
2. Prolegomena; The importance of Data; The importance of
Libraries; Summary; Appendices
3. Section 1: Prolegomena
4. So what about the Cloud? It's a hype. It's a buzzword (cloud =
bullshit [1]). Better know exactly what is referred to by "cloud".
Which notion of "cloud" do libraries refer to? [1] Meant to impress
and persuade, unconcerned with falsehoods (Frankfurt, 2005)
5. Three notions of the Cloud
6. Figure: Infrastructure as a Service (IaaS)
7. Figure: Platform as a Service (PaaS)
8. Figure: Software as a Service (SaaS)
9. Software as a Service (aka web application) Software that
you don't have to install or update. Software that hides some of its
complexity. Any software is inherently more complex than the task
it automates. Don't expect software to simplify anything!
10. Section 2: The importance of Data
11. Data vs. Applications "Data matures like wine, applications
like fish" (James Governor)
12. Data vs. Applications Applications: for immediate consumption.
Requirements and business logic change. Technical developments and
trends change. People's requirements change.
13. Data vs. Applications Data: can be used in different contexts
and times, if it is done well: respect special properties of data;
respect different notions of data
14. Special properties of data Bits can freely be rearranged.
Eventually, data can be copied and can be modified very efficiently,
without any traces or differences between original and copy.
15. Special properties of data Digital Collections and
descriptions of data are data again.
16. Data challenges This is where libraries are needed!
Preservation, Authenticity, Provenance, Identity
17. Data challenges: Preservation All data needs a carrier
Unsolved problem in general, but established discipline
18. Data challenges: Authenticity Data modification leaves no
traces. Related to preservation but more about trust
19. Data challenges: Provenance Data copy leaves no traces
Digital signatures and trust (again)
20. Data challenges: Identity A single bit changes the whole
dataset. Which modifications matter?
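The identity challenge can be illustrated with checksums, the kind of
fixity check commonly used in digital preservation. The following is
a minimal Python sketch and not part of the original slides; the
helper names fingerprint and flip_bit and the sample bytes are
assumptions chosen for illustration.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return the SHA-256 checksum of a byte string as hex (hypothetical helper)."""
    return hashlib.sha256(data).hexdigest()


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with exactly one bit inverted (hypothetical helper)."""
    changed = bytearray(data)
    changed[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(changed)


# Sample content, chosen only for illustration.
original = b"Data that is loved tends to survive"
copy = bytes(original)              # a copy is indistinguishable from the original
modified = flip_bit(original, 0)    # a single bit differs

print(fingerprint(original) == fingerprint(copy))      # True
print(fingerprint(original) == fingerprint(modified))  # False
```

A checksum only tells whether two byte sequences are identical. It
cannot say which modifications matter; that remains a matter of
judgment and trust.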
21. Three notions of data [2] Data is also becoming a hype, so
better know exactly what is referred to by "data": data as facts,
data as subjective observations, data as communications. [2] As
identified by Ballsun-Stanton (2012)
22. Data as facts Hard numbers, product of reproducible
measurements, scientific facts. Used to reveal (the real) world
23. Data as facts in libraries Created by libraries: holding
counts, patron information, formal metadata. Collected by libraries:
research data
24. Data as subjective observations Product of recorded
observations, sense-impressions that must be filtered. Used to
construct (our) reality
25. Data as subjective observations in libraries Created by
libraries: subject indexing, user studies, analysis of publication
trends. Collected by libraries: research data
26. Data as communications Transferred or stored sign, a
container of meaning in the form of a sequence of bits. Used to
describe (any) reality
27. Data as communications in libraries: digital objects,
electronic resources, informational objects, electronic
publications, digital documents. Created by libraries: publications
you publish. Collected by libraries: publications you collect
28. Data as communications/documents in libraries A document is
not information but recorded evidence in support of a fact
(Briet, 1951) [3], which can be any possible statement. This notion
somehow got lost in the history of library and information science
/ documentation science (Ørom, 2007). Advice: Don't mess with data
as facts or as observations but treat them as documents, like other
(digital) publications! [3] See Buckland (1997, 1998) for an
introduction.
29. Section 3: The importance of Libraries
30. What does a library do? A library collects, arranges, and
makes available (published) documents (among other services) to
meet user needs. This should also apply to digital documents:
collect data, arrange data, make data available
31. Collect data Figure: Data needs care
32. Figure: How many libraries store digital objects
33. The eResource fallacy Libraries that license eResources to
be accessed from publisher sites limit their role to temporary,
intermediary retailers. Advice: Data that cannot be copied and
modified is lost. Libraries must actually collect and process
digital documents (or they won't be in the document business
anymore)
34. Make data available Digital collections can be made
available in different forms and places at the same time. The more
libraries share digital documents, the more they are perceived as
trustworthy sources of original versions.
35. Arrange data Availability implies methods to link and reuse
content Reuse and connections are already done in documents Support
linking, aggregation, processing (for instance as Linked Open Data)
Track changes when reusing (revision control)
36. Example: Annotations Figure: Incunable
37. Figure: Neatline.org screenshot by David McClure, map tiles
by Stamen Design (CC BY 3.0), data by OpenStreetMap (CC BY SA), maps
from LoC Hotchkiss Map Collection
38. Section 4: Summary
39. The situation In the end all content will be digital: get
used to it! Software is inherently complex and becomes obsolete.
Data is more important in the long term, if it can be used in
different contexts and times. Simple access will not be the primary
role of libraries. What's the typical reaction to data in your
institution? If data activity is outsourced to "tech people", would
you also consider outsourcing book activity to "book people"?
40. Care for data! Do what you do to physical documents: collect
digital documents, make digital documents available, arrange digital
documents. Libraries can respond to the data challenges because of:
trust, neutrality, persistence. Focus on the notion of data as
communications instead of digging into details of research data.
Ensure that documents can be used as data: copying must be possible
and easy; modification must be possible and easy
41. Where to start Collect digital publications! Start
archiving public websites, blogs, mailing lists etc. Create and
manage data/document repositories (see yesterday's talks). Invest in
preservation. Exchange digital documents with other libraries and
initiatives (see following talk by Herbert van de Sompel, LOCKSS
...). Make data as accessible as possible (Open Data). Publish your
own digital publications. Allow annotating and connecting with your
digital documents
42. "Data that is loved tends to survive" (Kurt Bollacker)
43. Section 5: Appendices
44. References Ballsun-Stanton, Brian (2012): Asking About
Data: Exploring Different Realities of Data via the Social Data Flow
Network Methodology. PhD thesis. Briet, Suzanne (1951): Qu'est-ce
que la documentation? Éditions documentaires, industrielles et
techniques. Buckland, Michael (1997): What is a document? In:
Journal of the American Society of Information Science (JASIST)
48.9, pp. 804-809. Buckland, Michael (1998): What is a digital
document? In: Document Numérique 2.2, pp. 221-230. Frankfurt, Harry
G. (2005): On Bullshit. Princeton University Press. Ørom, Anders
(2007): The concept of information versus the concept of document.
In: Skare et al. (eds.): Document (re)turn. Contributions from a
research field in transition. pp. 53-72. Peter Lang.
45. Image credits and licenses All images from Wikimedia
Commons: construction.jpg CC-0 by Schweinepeterle (Rolf H.);
cubicals.jpg CC-BY-SA by David R. Tribble; mcdonalds.jpg CC-0 by
Raysonho@Grid Engine; wine.jpg CC-BY-SA by Rafael Garcia-Suarez;
fish.jpg CC-BY-SA by mahalie stackpole; periodictable.png CC-0 by
Cepheus; droste.jpg CC-BY Zzubnik; winecellar.jpg CC-BY-SA by Che
(Petr Novák); cloud.jpg CC-0 by Sidik iz PTU; regal.jpg CC-0 by
Czarna Trucizna; telescope.jpg CC-0 by C. Zorzi; tree.jpg CC-BY-SA
by Ji-Elle; twins.jpg CC-0 by William Morris Agency; earthquake.jpg
CC-0 by Bert Cohen; vermeer.jpg CC-BY-SA Kunstkenner2305; clown.jpg
CC-BY by Hamdan Zakaria
46. Got questions? Just Ask! http://libraries.stackexchange.com
Q&A about libraries and information science
47. This presentation as digital document Source code and
images of this presentation are available at
https://github.com/jakobib/ticer2012 to be copied and modified under
CC-BY-SA license.
48. What about original data from libraries? Data not used as
documents: facts: library data (holdings, patrons, loans, formal
metadata); observations: subject metadata (descriptions);
communication: your publications. Can be used in other contexts and
times. Example: authority files, connected via VIAF. Good use of
data refers to you as source and authority. Advice: Provide as much
of your original data and documents as possible. Just publish and
care for these documents like any other acquisitions.
49. Additional made-up quotes "If libraries still care for
documents, they have to care for data." "There is no complete
resource management system: the library is the resource management
system." "Librarians don't have to read all books, but know all
books. The same applies to data: don't understand all data as facts
and as observations, but understand data as publications."