Greenstone tutorial 2006
New Zealand Digital Library Project

Greenstone: Open source software for building digital library collections
Ian H. Witten and Kathy Don
Computer Science Department
Waikato University
New Zealand
http://greenstone.org
http://nzdl.org
Agenda

Overview
  What does Greenstone do?
  Greenstone facts; standards
  Reader’s Interface: examples of collections

Librarian interface
  Build a collection in 30 sec (Hobbits)
  Build a multimedia collection (Beatles)
  Adding and using metadata
  Browsing classifiers, search indexes
  Building a collection manually (for masochists only)

Advanced stuff
  Under the hood: collection configuration file
  Customizing with macros
  Personalizing your home page
  Different interface languages
  Examples of what others have done

Reaching out
  Serving and acquiring OAI
  DSpace and METS
  Greenstone3
What we wanted
Greenstone turns a ragtag menagerie of documents in various formats into an easy-to-use collection that can run on a standalone laptop in a Ugandan village’s information center
ALA 2002
What we wanted
  “Collections” of digital material
  Individualized, depending on metadata etc
  Up to several Gb of text …
  … + associated images, movies, whatever
  Fully searchable
  Served on WWW, or published on removable media
  Run anywhere, on any computer
  Fully internationalized
  Non-exclusive: documents and metadata in any format
  Non-prescriptive: standard and non-standard metadata
What we got: Greenstone

Access
  Accessible via any Web browser
  Server runs on anything (all Windows + Unix + Mac)
  Collections can be published on CD-ROM/DVD
  Trivial to install
  GUI interface for building and publishing collections

Searching/browsing
  Collection-specific
  Full-text and fielded search
  Flexible browsing facilities
  Metadata-based (Dublin Core recommended)
  Creates all access structures automatically

Multi-*
  Multilingual: Documents and interfaces
  Multimedia: image, video, audio collections exist
  Multiformat: Documents and metadata

Extensible
  Plugins: new document, metadata formats
  Classifiers: new metadata browsers
UNESCO: Distributing Greenstone DL software
  GNU licensed
  Fully documented … in English/French/Spanish/Russian
  Language interfaces … Arabic Chinese Czech … Thai Turkish
  Unix/Windows/Mac OS-X
  Trivial to install
  GUI interface for gathering, enriching, building …
  Serve collections on Web or write them to CD-ROM
  Document formats: HTML, Word, PDF, PS, plain text, e-mail
  Metadata formats: XML, DC, OAI, MARC, …

“Give a man a fish, feed him for a day
Teach a man to fish, feed him for life”
Sustainable development
Greenstone software on CD-ROM
download from http://greenstone.org
Greenstone facts

Distribution
  Open source: GNU GPL
  Distributed via SourceForge since: Nov 2000
  Average downloads: 5000/month since then
  Humanitarian CD-ROMs produced: 30-35
  Distribution for each one: 5000/year

International
  Languages for interface: 38
  Languages for full software + manuals: 4
  Countries represented on email lists: 60
  UNESCO training courses in: Bangalore, Almaty, Dakar, Suva, …

UN Agencies
  UNESCO, Paris (“Information for All” programme)
  FAO, Rome (Info Management Resource Kit)
  UNU, Japan (CD-ROM collections of UNU material)

Technical centers
  University of Waikato, New Zealand
  Indian Institute of Science, Bangalore
  University College, London
  University of Cape Town, South Africa
  University of Lethbridge, Canada
Sample collections at greenstone.org

U.S.
  Auburn University, Alabama
  Detroit Public Library
  Hawaiian Electronic Library
  ibiblio project, University of North Carolina
  Illinois Wesleyan University
  Lehigh University, Pennsylvania
  New York Botanical Garden
  University of California at Riverside
  University of Chicago Library
  University of Illinois
  Texas A&M University
  Washington Research Library Consortium

International
  Institution                                                    Country
  Argentina Human Rights Commission                              Argentina
  Tasmania State Library                                         Australia
  Peking University Digital Library                              China
  Gresham College, London                                        England
  University of Applied Sciences, Stuttgart                      Germany
  Association of Indian Labour Historians, Delhi                 India
  Indian Institute of Management, Kozhikode                      India
  Indian Institute of Science, Bangalore                         India
  Vimercate Public Library, Milan, Italy                         Italy
  Netherlands Institute for Scientific Information Services      Netherlands
  Philippine Government Information Network                      Philippines
  Mari El Republic, Russia                                       Russia
  Slavonski Brod Public Library, Slovenia                        Slovenia
  Vietnam National University                                    Vietnam
  Welsh Books Council                                            Wales
Standards

Metadata
  Can use any metadata set, Dublin Core supplied
  Plugins for: XML, Refer, MARC, OAI, CDS/ISIS, METS (subset), ProCite, DSpace, BibTeX
  METS can be used as Greenstone’s internal representation

Documents
  Plugins for:
    PDF, PostScript, Word, RTF, HTML, Plain text, LaTeX
    Images (any format: GIF, JPEG, TIFF …)
    MP3, Ogg Vorbis
    UnknownPlug (e.g. for audio, MPEG, Midi)
    ZIP, Excel, PPT, Email, Source code

Serving
  Web
  Can publish Greenstone collections on CD-ROM
  Can publish Greenstone collections on OAI
  Export collections to METS
  Export collections to DSpace (ready for DSpace’s batch import program)
What is open-source software?
“The basic idea behind open source is very simple: When programmers can read, redistribute, and modify the source code for a piece of software, the software evolves. People improve it, people adapt it, people fix bugs. And this can happen at a speed that, if one is used to the slow pace of conventional software development, seems astonishing.”
- from www.opensource.org

Anyone can redistribute the software, even for a fee
Source code must always be available
The power of open source: Greenstone uses …

  MG            Creates compressed full-text indexes and performs searches
  GDBM          Database used for metadata etc
  wget          Downloading pages from the Web when creating collections
  YAZ           Client and server implementation of Z39.50
  Stemmer       English language stemmer
  GCC           C/C++ compiler
  CVS           Version control system
  Perl          Used for plugins etc
  Apache        Web server used by many Greenstone installations

and …

  Ghostscript   Interpreter for Adobe Postscript documents (Postscript plugin)
  Kea           Keyphrase extraction program (to generate metadata)
  pdftohtml     Converter for PDF documents (PDF plugin)
  rtftohtml     Converter for RTF documents (RTF plugin)
  TextCat       Detects languages and document encodings
  wvWare        Converter for Word documents (Word plugin)
  Xlhtml        Converter for Excel/Powerpoint documents (plugins)
  XML::Parser   Parses XML documents, used to read and write Greenstone’s internal XML document format
Humanity Development Library
for sustainable development and basic human needs
  CD-ROM
  US$1
  Win3.1x upward
  Stand-alone and intranet server
  Web browser user interface
Global Help Project, Antwerp (+ UN agencies)
New York Botanical Garden
o Rare 19th century works on American trees
o Gorgeous full-color plates
University of Chicago Library
Chinese documents (pictures of text)
+ Chinese interface
Peking University Library
Chinese (Chinese & English interfaces)
Classic Chinese literature
UNESCO, Paris
French
PAHO, WHO
Spanish
Russian
Mari El Republic
http://gov.mari.ru/gsdl
The Greenstone Librarian Interface (GLI)
  Building collections
  Interactive Java program
  Runs on anything
  Build a collection on the computer you are on
  … plus new applet version
  Includes metadata editor
  Caveat: GLI cannot handle collections as large as Greenstone itself can (particularly large amounts of metadata)

(Tutorial exercise #5: small collection of HTML files)
Invoke GLI: build a small collection of HTML files
  Gather
  Create
  Look at extracted metadata
  Set up shortcut in the Librarian interface
Create a new collection
Gather: Gather the files together
Create: Build the collection
Preview: admire the result
An example: Beatles collection
  Audio: MP3 files
  Midi files zipped up in a single .zip file
  Discography: HTML files (including many images)
  Images: JPEGs of album covers
  Lyrics: HTML files
  MARC: records containing relevant bibliographic records
  Supplementary: material in PDF and Word files
  Tablature: guitar tablature in .txt files

Add/modify classifiers
  modify to display dc.title or ex.title
  add one for “browse” button
  remove the one for filename
  add one for phrase index
  add regular expressions to clean up titles

Modify format statements
  show title only for cover images
  suppress text document icon for MP3/MIDI items
  make bookshelves show how many documents they contain

General
  assign collection icons
  assign icons for non-standard media types: lyrics, discography, etc
Full-text search
Form-based search
Browsing titles
Browsing document types
Hierarchical phrase browser
News flash: Applet version of GLI
  Collection on remote Greenstone server
  Edit it locally: gather, enrich, design, create
  Authentication: uses admin facility
  Locking
Set up environment variables

Makecol
  Create a directory for the collection (with subdirectories), put collect.cfg file in “etc” subdirectory

Import
  Convert to archive format
  Extract metadata
  Uses: collect.cfg (plugins)
  Result: docs in Greenstone Archive format

Build
  create indexing & browsing structures, compress …
  Uses: collect.cfg
  Result: Greenstone collection

Search / Results
  Uses: collect.cfg + macros (main.cfg)

collect.cfg contains
  name, icon, etc
  description
  email of creator
  search indexes
  plugins
  classifiers
  how to format: documents, query results, classifiers
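The manual route above can be sketched as an ordered list of commands. This is only a minimal sketch under stated assumptions: import.pl and buildcol.pl are named elsewhere in this tutorial, whereas mkcol.pl and its -creator option are an assumption about the “Makecol” step, and the collection name and email address are placeholders. The sketch prints the commands rather than running them, and it assumes the environment variables have already been set up.

```python
from typing import List


def manual_build_commands(collection: str, creator_email: str) -> List[List[str]]:
    """Return the commands for a manual build, in the order they would be run."""
    return [
        # Assumed form of the "Makecol" step: creates the collection
        # directory tree, including etc/collect.cfg.
        ["mkcol.pl", "-creator", creator_email, collection],
        # Import: convert source documents to archive format, extract metadata.
        ["import.pl", collection],
        # Build: create indexing and browsing structures, compress.
        ["buildcol.pl", collection],
    ]


if __name__ == "__main__":
    # "mycol" and the email address are hypothetical placeholders.
    for command in manual_build_commands("mycol", "me@example.org"):
        print(" ".join(command))
```

Keeping each command as a list of arguments means it could later be handed to subprocess.run without any shell quoting, should one want to execute the steps instead of printing them.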
Alter configuration

  To do this                                  How                          Configuration line
  Add full-text index of titles               additional indexes line      indexes document:Title
  ... or authors                              … need author metadata       indexes document:Creator
  Add alphabetic author browser               add classifier line          classify AZList -metadata Creator
  Include Word documents                      add plugin line              plugin WordPlug
  Include PDF documents                       (same)                       plugin PDFPlug
  Separate index for each language            add languages line           languages en fr es
  Extract acronyms and add list               plugin option                plugin PDFPlug -extract_acronyms
  Import OAI metadata                         add plugin line              plugin OAIPlug
  Extract phrase hierarchy and add browser    add classifier line          classify Phind
  Alter the format of any of the above        add format string            format …
  Restrict collection’s interface langs       add format string            format PreferenceLangs en|fr|es
  Change default interface language           edit site config file        cgiarg shortname=l argdefault=fr
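The configuration lines above follow a simple “keyword value” pattern, which can be illustrated with a small generator. This is a hedged sketch, not a Greenstone tool: the function name and its parameters are hypothetical, and it only emits the kinds of lines shown on this slide (indexes, languages, plugin, classify), not a complete collect.cfg.

```python
from typing import List, Sequence


def collect_cfg_lines(
    indexes: Sequence[str],
    plugins: Sequence[str],
    classifiers: Sequence[str],
    languages: Sequence[str] = (),
) -> List[str]:
    """Assemble collect.cfg-style lines from the chosen options."""
    lines: List[str] = []
    if indexes:
        # All search indexes go on a single "indexes" line.
        lines.append("indexes " + " ".join(indexes))
    if languages:
        # One index per language.
        lines.append("languages " + " ".join(languages))
    # One "plugin" line per document or metadata format.
    lines.extend("plugin " + plugin for plugin in plugins)
    # One "classify" line per browsing structure.
    lines.extend("classify " + classifier for classifier in classifiers)
    return lines


if __name__ == "__main__":
    cfg = collect_cfg_lines(
        indexes=["document:Title", "document:Creator"],
        plugins=["WordPlug", "PDFPlug -extract_acronyms", "OAIPlug"],
        classifiers=["AZList -metadata Creator", "Phind"],
        languages=["en", "fr", "es"],
    )
    print("\n".join(cfg))
```

The values passed in the usage example are exactly the ones listed on the slide; anything else a real collect.cfg needs (name, description, creator email, format strings) is left out here.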
Generating web pages

library generates the bare bones of web pages
format statements, macros wrap them with flesh

request (with arguments)
library
  Analyse the request
  Decide which action
  call the action
action
  process the arguments
  access content (static, dynamic)
  generate web page (using format, macros)
send response

Example: http://…/library?c=demo&a=p&p=about
  a=p       Page action
  c=demo    Demo collection
  p=about   “about” page
  Uses: about.dm, collection info db, format statements
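The “analyse the request, decide which action” step can be illustrated by parsing the example URL's arguments. This is a minimal sketch in Python, not how the Greenstone library program is implemented: the function names are hypothetical, example.org stands in for the elided host, and only the a=p (page) action from the slide is handled.

```python
from typing import Dict
from urllib.parse import parse_qs, urlsplit


def parse_library_request(url: str) -> Dict[str, str]:
    """Extract the short CGI arguments (c, a, p, ...) from a library URL."""
    query = urlsplit(url).query
    # parse_qs returns a list per key; keep the first value of each.
    return {key: values[0] for key, values in parse_qs(query).items()}


def describe_request(url: str) -> str:
    """Decide which action the request asks for and describe it."""
    args = parse_library_request(url)
    if args.get("a") == "p":
        # Page action: p names the page, c names the collection.
        return f"page '{args.get('p')}' of collection '{args.get('c')}'"
    return "unknown action"


if __name__ == "__main__":
    # example.org is a placeholder for the host elided in the slide.
    print(describe_request("http://example.org/library?c=demo&a=p&p=about"))
```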
Customizing with macros
  – let you customize presentation
  – present pages in different languages
  – print variables into the page text (e.g. number of search hits)

Macro files
  – stored in gsdl/macros folder
  – each file defines one or more “packages” (a “package” is a group of macros)
  – loaded on startup (note difference between Local and Web Library)
  – listed in etc/main.cfg

Collection-specific macros
  – stored in gsdl/collect/mycol/macros/extra.dm
  – or include argument [c=collectionname] for each macro

Personalizing your home page
  In C:\Program Files\gsdl\etc\main.cfg, change home.dm to yourhome.dm
yourhome.dm

    package home

    _content_ {
    <h2>Your own Greenstone home page</h2>
    <ul>
    <table>
    <tr valign=top><td>Search page for the demo collection<br></td>
        <td><a href="_httpquery_&c=demo">Click here</a></td></tr>
    <tr><td>"About" page for the demo collection</td>
        <td><a href="_httppageabout_&c=demo">Click here</a></td></tr>
    <tr><td>Preferences page for the demo collection</td>
        <td><a href="_httppagepref_&c=demo">Click here</a></td></tr>
    <tr><td>Home page</td>
        <td><a href="_httppagehome_">Click here</a></td></tr>
    <tr><td>Help page</td>
        <td><a href="_httppagehelp_">Click here</a></td></tr>
    <tr><td>Administration page</td>
        <td><a href="_httppagestatus_">Click here</a></td></tr>
    <tr><td>The Collector</td>
        <td><a href="_httppagecollector_">Click here</a></td></tr>
    </table>
    </ul>
    }
Macros used in home.dm
  _httppagehome_            name of the home page
  _httppagehelp_            … the help page
  _httppagestatus_          … the administration page
  _httppagecollector_       … the Collector page
  _httpquery_&c=demo        search page for the demo collection
  _httppageabout_&c=demo    about page for the demo collection
  _httppagepref_&c=demo     preferences page for the demo collection
  _content_ { … }           defines a macro called _content_; contains HTML, but ‘{’, ‘}’, ‘\’, and ‘_’ must be escaped with a backslash
  _header_ { … }            HTML page header (contains squirly bar)
  _footer_ { … }            HTML page footer

main.cfg contains the list of macro files: replace home.dm by yourhome.dm there, and put yourhome.dm in the macros directory
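The escaping rule for macro bodies (the characters {, }, \ and _ must each be preceded by a backslash) can be applied mechanically to literal text before pasting it into a .dm file. The helper below is a hypothetical sketch, not part of Greenstone; it should be applied only to literal text, since a macro reference such as _httppagehome_ needs its underscores left unescaped.

```python
def escape_macro_text(text: str) -> str:
    """Backslash-escape the characters that are special inside a macro body."""
    special = "{}\\_"  # the four characters named in the slide
    escaped = []
    for char in text:
        if char in special:
            escaped.append("\\")
        escaped.append(char)
    return "".join(escaped)


if __name__ == "__main__":
    # A file name containing an underscore, used as literal page text.
    print(escape_macro_text("See my_notes.txt {draft}"))
```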
_textimagesearch_ {Search for specific terms}
_textimageCreator_ {Browse alphabetical list of authors}
_textimageTitle_ {Browse alphabetical list of titles}
_textmatches_ {Matches }
_textbeginsearch_ {Begin Search}

_texticonsmalltext_ {View this section of the text}
_texticondetach_ {Open this page in a new window}
_texticonexpandtoc_ {Expand table of contents}
_texticonexpandtext_ {Display all text}
_texticonhighlight_ {Highlight search terms}
Requires a full webserver (not “local library” version)
Configuration file: etc/oai.cfg in the Greenstone filespace
  – repository name and version (OAI 1.1 or 2.0)
  – collections to be made accessible to OAI clients
  – metadata mapping file into DC (server only supports DC)

OAI 1.1
OAI 2.0
Acquiring OAI metadata + docs

Using OAI-PMH, build a Greenstone collection based on metadata exported from an OAI server

1. Use Greenstone’s importfrom.pl command to acquire data from the JCDL01 collection at rocky.dlib.vt.edu
2. Use Greenstone’s import.pl and buildcol.pl commands to build a service provider based on the acquired metadata (and documents).
OAI Collection: acquisition

Use importfrom.pl to acquire metadata from the external data provider:

    gsdl% importfrom.pl oai-e
    OAI Acquire: from rocky.dlib.vt.edu/~jcdlpix/cgi-bin/OAI1.1/jcdlpix.pl
    Requesting list of identifiers ...... Done.
    Downloading metadata record for oai:JCDLPICS:200101dla1.oai
    Getting document http://rocky.dlib.vt.edu/~jcdlpix/pictures/200104dla/01dla1.jpg
    Downloading metadata record for oai:JCDLPICS:200101dla2.oai
    Getting document http://rocky.dlib.vt.edu/~jcdlpix/pictures/200104d1a/01dla2.jpg
    Downloading metadata record for oai:JCDLPICS:200101dla3.oai
    Getting document http://rocky.dlib.vt.edu/~jcdlpix/pictures/200104dla/01dla3.jpg
    …
    Number of documents processed: 81
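The two phases visible in the log (request a list of identifiers, then download each metadata record) correspond to OAI-PMH requests that are ordinary HTTP query strings. The sketch below builds such URLs using OAI-PMH 2.0 parameter names; it is not what importfrom.pl does internally, the helper name is hypothetical, and example.org is a placeholder base URL. The server in the log speaks OAI 1.1, whose requests may differ in detail.

```python
from urllib.parse import urlencode


def oai_request_url(base_url: str, verb: str, **params: str) -> str:
    """Build an OAI-PMH request URL for the given verb and arguments."""
    # The verb comes first, followed by the other arguments in the order given.
    query = urlencode({"verb": verb, **params})
    return f"{base_url}?{query}"


if __name__ == "__main__":
    base = "http://example.org/oai"  # placeholder, not the server in the log
    # Phase 1: ask for the identifiers of all records.
    print(oai_request_url(base, "ListIdentifiers", metadataPrefix="oai_dc"))
    # Phase 2: fetch the Dublin Core record for one identifier from the log.
    print(
        oai_request_url(
            base,
            "GetRecord",
            identifier="oai:JCDLPICS:200101dla1.oai",
            metadataPrefix="oai_dc",
        )
    )
```

Note that urlencode percent-encodes the colons in the identifier, which is why the identifier looks different in the generated URL than in the log.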
OAI Collection: acquisition
Excerpts from Greenstone collection configuration file.