LIS654 lecture 2 history

LIS654 lecture 2

history

Thomas Krichel

2011-09-14

contents

• some known old thinkers
– Vannevar Bush
– Joseph Carl Robnett Licklider, aka “Lick”

• the birth of the repository

background

• Vannevar Bush (1890–1974) directed the US Office of Scientific Research and Development during WW2.

• As the war ended he saw two problems
– how to make the wartime scientific reports available
– how to find a new challenge for the scientists

• He proposed a solution in “As we may think”.

As we may think

• Vannevar Bush (1890–1974) wrote his famous essay in 1945.

• It remains to date one of the most frequently cited papers in Library and Information Science.

• I think this fame is somewhat undeserved.

the scientific record

• As scientists do more work, the “record” extends. This is good.

• Recent advances in microfilm also made it possible to store more of the record on microfilm.

• But with much research and increased specialization, “significant attainments become lost in the mass of the inconsequential”.

the memex

• The memex was a proposed desktop machine that would store millions of books on microfilm.

• It would have a mechanism that would allow any known item from the collection to be retrieved rapidly.

• But the problem is: which items to look at?

organizing the collection

• Still today collections are organized hierarchically from class to subclass. Think of standard classification schemes.

• Or alphabetically, as in a list of words, in a mechanical way from letter to letter.

• Bush rejected this, claiming that the brain does not work in that way.

the brain, by Bush

• Bush thought that the brain works by association.

• “With one item in its grasp, [the brain] snaps instantly to the next that is suggested by the association of thoughts”.

• This is done “in accordance with some intricate web of trails carried by the cells of the brain.”

memex as a brain

• Every time a document is added to the memex it is given an identifier.

• Every time an item is consulted the user can associate with it other items. These associations are recorded.

• Trails of associations can be annotated and copied.

• Selection by association replaces indexing.
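
The mechanism on this slide can be sketched as a small data structure. This is only an illustration, not anything Bush specified: the class name `Memex`, its methods, and the sample items are all hypothetical, and the trail simply follows the first association recorded for each item.

```python
from dataclasses import dataclass, field


@dataclass
class Memex:
    """Toy model of the memex: items plus user-recorded associations."""

    items: dict = field(default_factory=dict)  # identifier -> document text
    links: dict = field(default_factory=dict)  # identifier -> [(target, note)]

    def add(self, text):
        """Store a document and return the identifier it is given."""
        ident = len(self.items) + 1
        self.items[ident] = text
        self.links[ident] = []
        return ident

    def associate(self, source, target, note=""):
        """Record that the user tied `target` to `source`, with an annotation."""
        self.links[source].append((target, note))

    def trail(self, start):
        """Follow the first recorded association from each item until the trail ends."""
        path, seen = [start], {start}
        while self.links[path[-1]]:
            nxt = self.links[path[-1]][0][0]
            if nxt in seen:  # stop if the trail loops back on itself
                break
            path.append(nxt)
            seen.add(nxt)
        return path


# Hypothetical sample items, chosen only for illustration.
memex = Memex()
bow = memex.add("article on the Turkish bow")
elastic = memex.add("textbook chapter on elasticity")
notes = memex.add("user's own note on bow materials")
memex.associate(bow, elastic, "why the short bow outperformed the long bow")
memex.associate(elastic, notes)
print(memex.trail(bow))  # prints [1, 2, 3]
```

Note that nothing here is indexed by subject or by alphabet: an item is reached only through its identifier or by following a trail from another item.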

sharing

• An annotated trail between items can form a new item. That item can be shared.

• Bush envisioned that there would be a way for each memex to learn from all other memexes.

• Memex users would improve their thinking ability by its use.

• This would greatly increase the speed of scientific discoveries.

implementation

• There is no evidence that anything like the memex was ever built.

• Microfilm was replaced by digitization.

• But the idea of associative trails, or associative indexing, is related to hypermedia.

• The latter goes back to Ted Nelson.

Licklider

• Joseph Carl Robnett Licklider (1915–1990) trained as a mathematician and psychologist and worked mainly at MIT.

• The Council on Library Resources got funding from the Ford Foundation to examine how technology could help libraries.

• Work was undertaken by Bolt, Beranek and Newman (BBN), of later ARPANET fame.

the basic idea

• The idea was that one could store all knowledge in a single or distributed machine.

• How should this be done?

• Well, first Lick estimated the corpus would be 10^14 bytes by the year 2000.

• That’s 100 TB, or about five 20 TB systems.

• It could be a central system with thin clients.

the system

• The system was called “procognitive”, meaning that it is for the advancement of knowledge.

• It would not be based on documents, metadata and retrieval.

• It would process information into knowledge and questions into answers.

• Users transmit their knowledge to the system.

information to knowledge

• To see how information can be processed into knowledge, Lick looked at the human brain. He had studied cat brains in his PhD work.

• If it is possible to process the body of information into knowledge structures, then questions can be answered by knowledge rather than by documents.

Lick on the brain

• The brain receives stimuli and stores representations of them.

• The brain finds answers to questions by processing stored memory on the basis of “schemata”, which are ways in which stored stimuli representations can be processed.

human processing

• Lick understood that current and foreseeable technology would not allow processing of documents into knowledge.

• This would be the job of a set of librarians called “procognitive system specialists”.

• They would encode the contents of documents in a knowledge language.

• They would watch for ambiguity warnings.

• Users would also provide feedback.

encoding

• Surprisingly, Lick still imagined that the procognitive system would be based on natural language.

• The hope was that artificial intelligence (AI) methods would be developed to extract information from documents.

• That hope seemed justified in the 60s when AI was in its infancy.

steps to implementation

• The first attempts, in the 60s, tried to find the citation string in a database of citations.

• Thus this was more information retrieval on a small set of metadata than actual digital library work.

• Librarians preparing bibliographies for researchers were the prime users.

into the 80s

• In the 80s the personal computer “came back”.

• Searching could be done on the full text of documents.

• Browsing became available.

90s

• In the 90s the Internet and the search engine came along.

• Initially search engines followed standard information retrieval principles.

• My first work, about 1993, was based on gopher access and WAIS indexing.

the semantic web

• The semantic web is the actual successor to Lick’s vision.

• It’s still not done.

• I speculate it will not be done for a long time.

• The reason is that while Lick thought about psychology and computers, he did not think through the economics of running systems like the ones he proposed.

• He also had too optimistic a vision of AI.

back in the trenches

• As we have seen, early digital library visions were inspired by concern about access to scientific documents.

• The academic digital library was synonymous with the digital library.

• So all the progress was there, and pretty much still is.

academic documents

• There are basically two types of academic documents.

• There are academic books and academic articles or papers.

• The two have been treated differently in the past and still are, though maybe not for long.

academic books (monographs)

• Books are (were) purchased by libraries.

• They were cataloged into the local integrated library system
– locally or
– through shared cataloging

academic papers

• Most of them are published in serials.

• Libraries never cataloged them locally.

• They relied on third parties to provide abstracting and indexing services for them.

publishing academic papers

• Published academic papers go through a process of peer review.

• Papers are written for free by some academics.

• They are reviewed for free by other academics.

• The profits from publishing go to the publisher. Academic publishing is very profitable.

non-formal publication

• Some academic disciplines have a tradition of informal publication of papers that have not been peer-reviewed.

• For example
– mathematicians and physicists have preprints.
– computer scientists and economists have working papers.

preprints vs working papers

• Preprints were sent by academics to colleagues.

• Working papers are issued by departments and sent to other departments under exchange agreements.

• Whatever the mode of working, non-formal publication channels enabled librarians to build real digital libraries.

• Actually, they were built more by their users.

xxx.lanl.gov

• This was/is a preprint server started by Paul Ginsparg at Los Alamos National Laboratory.

• It has been popular with physicists and mathematicians.

• Its coverage varies across sub-disciplines.

• It became arXiv.org.

• It moved to Cornell University in 2001.

• It is now run by the Cornell University Library.

NCSTRL

• was the Networked Computer Science Technical Reference Library, a DARPA/NSF-funded project that built an infrastructure for publishing computer science working papers.

• Starting in 1993, it was built on a formal protocol called Dienst. This enabled local and remote services.

• Implementation software was deployed at participating institutions.

• It collapsed completely when the funding was gone.

RePEc

• is a federated system based on metadata (ReDIF) and a transport protocol (Guildford Protocol), both written by yours truly.

• It can be run on standard ftp or http server software.

• RePEc archives don’t offer local services to end users.

UPSPROTO

• In 2000, Herbert Van de Sompel started work to build a prototype system to provide services across the existing discipline-based digital libraries.

• The experience led to the formation of a working group that created an interoperability protocol called the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH).

• I was part of that group.

repositories

• OAI-PMH has been so widely implemented in repositories that we can say that a repository is a collection of documents on a server that implements this protocol.

• There is no official list, but counts of institutional repositories now exceed 2000.
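
To make "a server that implements this protocol" concrete, here is a minimal sketch of what a harvester does. It builds a `ListRecords` request and reads identifiers and titles out of a response. The base URL `http://repository.example.org/oai`, the helper names, and the sample record are made up for illustration. No network access is involved, and a real harvester would also handle resumption tokens and error responses.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# XML namespaces used by OAI-PMH 2.0 responses and by Dublin Core metadata.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build the URL asking a repository for all its records in one metadata format."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return base_url + "?" + query


def harvest(xml_text):
    """Return (identifier, title) pairs found in a ListRecords response."""
    root = ET.fromstring(xml_text)
    found = []
    for record in root.iterfind(".//oai:record", NS):
        ident = record.findtext("oai:header/oai:identifier", namespaces=NS)
        title = record.findtext(".//dc:title", namespaces=NS)
        found.append((ident, title))
    return found


# A hand-written, heavily shortened response, for illustration only.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A sample working paper</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

print(list_records_url("http://repository.example.org/oai"))
print(harvest(SAMPLE))
```

The protocol moves only metadata. The documents themselves stay on the repository's server, and the harvester builds its services on top of the collected records.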

institutional repositories

• The initial purpose of institutional repositories has been to make institutional research papers available.

• This would create open access to research papers.

• But success in getting real scientific work deposited has been limited.

• In the meantime, there are other types of content in IRs.

http://openlib.org/home/krichel

Please shut down the computers when you are done.

Thank you for your attention!
