Greenstone Digital Library Software: An Overview
Imran Mansuri
Project Assistant (Library Science)
INFLIBNET Centre
7 March 2011
Prepared by Imran Mansuri
Agenda
• Introduction: Digital Library Software (DL)
• Greenstone Digital Library Software (GSDL)
o Introduction
o History
o Versions
o Features
o Unique Features
o Technology used
o Example Sites
o Example Collections
Digital Library Software
• The term “Digital Library” refers to a library in which collections are stored in digital formats (as opposed to print, microform, or other media) and are accessible by computers
• The digital content may be stored locally or accessed remotely via computer networks
• Books and images are accessed in digital format
• Network access makes the information available from anywhere
Digital Libraries : Features
Dynamic Electronic Information Systems
Increased Portability
Efficiency of Access
Flexibility
Availability
Digital Library Software
DSpace
Fedora
EPrints
ResourceSpace
Greenstone
Greenstone Digital Library Software
• The Greenstone Digital Library Software (GSDL) provides a way of building and distributing digital library collections, opening up new possibilities for organizing information and making it available over the Internet or on CD-ROM
• Developed by the New Zealand Digital Library Project (www.nzdl.org) at the University of Waikato
• Distributed in co-operation with UNESCO andHumanities Library Project, Romania
GSDL : Some Facts
• Current versions: 2.82 and 3.03, available from http://www.greenstone.org
• Software suite for building, maintaining, and distributing digital library collections
• Comprehensive, open-source
• Distribution and promotion partners:
o UNESCO
o Human Info NGO, Belgium
GSDL : History
1995 - Digital library of Computer Science Technical Reports established by the New Zealand Digital Library
1997 - Decision to use the GPL (General Public License); name “Greenstone” adopted; work with Human Info NGO to produce humanitarian CD-ROMs
1998 Apr - First CD-ROM collection released: Humanity Development Library
1998 Aug - Greenstone.org website established
1999 - BBC collection established
2000 Apr - Greenstone mailing list started
2000 Aug - Formally established cooperative effort with UNESCO and Human Info NGO
2000 Nov - Software distributed on SourceForge
2002 Mar - Official opening of the Niupepa collection, development of the Greenstone Librarian Interface
2002 Apr - Development of Greenstone3
2002 Jun - First UNESCO Greenstone CD-ROM
2003 - Java-based development that became known as the Greenstone Librarian Interface
2005 Nov - Initial release of Greenstone3
2006 Apr - Greenstone Support Group for South Asia launched
GSDL : Versions
2000 Feb - gsdl 2.12
2000 Apr - gsdl 2.21
2000 Dec - gsdl 2.30
2001 Feb - gsdl 2.31
2002 Jan - gsdl 2.38
2003 Jun - gsdl 2.40
2004 Feb - gsdl 2.50
2005 Apr - gsdl 2.60
2005 Nov - gsdl 3.00
2006 Mar - gsdl 2.70
2007 Apr - gsdl 2.80
2008 - gsdl 3.03
Current release - gsdl 2.82
GSDL : Features
Multi S/W Platform
Multi-Lingual Support
Structured Metadata in XML using Dublin Core (DC)
Metadata Extraction
Plug-ins for Documents
Full-text mirroring
Text Level Penetration
Concurrent & Dynamic Content Development
Uniform Presentation
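To make the "Structured Metadata in XML using DC" item concrete, here is a minimal sketch in Python that writes a generic Dublin Core description with the standard library. The element names (title, language, format) come from the Dublin Core element set. The wrapper element `record`, the function name `build_dc_record`, and the sample values are assumptions made for illustration; this is not Greenstone's own metadata file layout.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core element set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def build_dc_record(fields):
    """Return an XML string holding one Dublin Core element per (name, value) pair."""
    # "record" is a hypothetical wrapper element, not a Greenstone format.
    record = ET.Element("record")
    for name, value in fields:
        element = ET.SubElement(record, f"{{{DC_NS}}}{name}")
        element.text = value
    return ET.tostring(record, encoding="unicode")


# Sample values are illustrative only.
xml_text = build_dc_record([
    ("title", "Humanity Development Library"),
    ("language", "en"),
    ("format", "text/html"),
])
print(xml_text)
```

Keeping the metadata in namespaced XML like this is what allows the same fields to be reused for both browsing and field-based searching.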
Collection Building
• Web and command line mode
• Input collections:
o GSDL server (files)
o Remote (FTP - files, HTTP - website pages)
• Collection input: batch mode, NOT interactive
• Document formats: HTML, PDF, Text, Word (Doc, RTF), PS, e-mail, bibliographic
• Support for full text tagging for hierarchical document browsing
• Automatic text extraction and indexing
o ‘Plugins’ for different document formats (HTMLPlug, PDFPlug, etc.). May fail for some documents!
o XML representation, converted to HTML for display
o Native document format: storage and display (via browser plugins, helper applications)
• Data compression support
• Metadata
o Automatic extraction of simple metadata (e.g. Title, date)
o Explicit metadata via ‘Classifiers’
– Hierarchical (e.g. Subject)
– List (e.g. Organization, Author)
o Used for browsing and field-based searching
o Multi-language support via Unicode
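The difference between a list classifier and a hierarchical classifier can be sketched in a few lines of Python. This is an illustration of the idea only, not Greenstone's classifier code: the sample documents, the function names, the `/` separator for subject paths, and the `_titles` key are all assumptions.

```python
from collections import defaultdict

# Hypothetical documents with manually assigned metadata.
documents = [
    {"title": "Soil Conservation", "author": "Rao", "subject": "Agriculture/Soil"},
    {"title": "Rice Farming", "author": "Khan", "subject": "Agriculture/Crops"},
    {"title": "Clean Water", "author": "Rao", "subject": "Health/Water"},
]


def list_classifier(docs, field):
    """Group titles under each distinct value of one metadata field (flat list)."""
    groups = defaultdict(list)
    for doc in docs:
        groups[doc[field]].append(doc["title"])
    return {key: sorted(titles) for key, titles in sorted(groups.items())}


def hierarchical_classifier(docs, field, separator="/"):
    """Build a nested tree from path-like values such as 'Agriculture/Soil'."""
    tree = {}
    for doc in docs:
        node = tree
        for part in doc[field].split(separator):
            node = node.setdefault(part, {})
        # Titles are stored under a reserved key at the leaf level.
        node.setdefault("_titles", []).append(doc["title"])
    return tree


by_author = list_classifier(documents, "author")
by_subject = hierarchical_classifier(documents, "subject")
print(by_author)
print(by_subject)
```

A flat grouping suits fields such as Author or Organization, while the nested tree suits fields such as Subject, where a reader expects to drill down from a broad heading to a narrower one.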
Collection Browse and Search
• Full text search
• Metadata (field) search and browse
• Boolean
• Ranked
• Multi-language support for browse/search interface
• Search history, search term highlighting…
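The contrast between Boolean and ranked searching can be shown with a toy inverted index. The sketch below is a teaching example under simple assumptions (whitespace tokenization, raw term frequency as the score, three made-up documents); it is not the indexer that Greenstone itself uses.

```python
from collections import Counter, defaultdict

# Hypothetical mini-collection: document id -> full text.
texts = {
    "d1": "digital library software for building digital collections",
    "d2": "library catalogue software",
    "d3": "building a collection of digital images",
}

# Inverted index: term -> {document id: term frequency}.
index = defaultdict(dict)
for doc_id, text in texts.items():
    for term, count in Counter(text.lower().split()).items():
        index[term][doc_id] = count


def boolean_and(query):
    """Return ids of documents that contain every query term."""
    terms = query.lower().split()
    if not terms:
        return set()
    postings = [set(index.get(term, {})) for term in terms]
    return set.intersection(*postings)


def ranked(query):
    """Return ids of documents matching any term, highest total term frequency first."""
    scores = Counter()
    for term in query.lower().split():
        for doc_id, count in index.get(term, {}).items():
            scores[doc_id] += count
    # Sort by descending score, then by id so ties are deterministic.
    return [doc_id for doc_id, _ in sorted(scores.items(), key=lambda item: (-item[1], item[0]))]


print(boolean_and("digital library"))
print(ranked("digital library"))
```

A Boolean query returns only the documents that satisfy the whole expression, whereas a ranked query returns every partial match and orders the results by score.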
Collection Presentation
• Search results formatting
o Format strings in the configuration file
• Home page customization
o Using macros
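The idea of a format string for search results can be sketched as simple placeholder substitution. In this Python sketch, the square-bracket placeholder syntax (for example `[Title]`), the function name `apply_format`, and the sample record are assumptions for illustration; the real format-string syntax and its options should be taken from the Greenstone documentation.

```python
import re

# Matches placeholders such as [Title]; the bracket syntax is an assumption here.
PLACEHOLDER = re.compile(r"\[([A-Za-z][A-Za-z0-9_.]*)\]")


def apply_format(format_string, metadata):
    """Replace each [Name] placeholder with the document's metadata value."""
    def substitute(match):
        # Unknown names become an empty string rather than raising an error.
        return str(metadata.get(match.group(1), ""))
    return PLACEHOLDER.sub(substitute, format_string)


# Hypothetical format string and search result.
result_format = "<td><b>[Title]</b></td><td>[Author] ([Date])</td>"
hit = {"Title": "Rice Farming", "Author": "Khan", "Date": "1998"}
print(apply_format(result_format, hit))
```

Because the layout lives in a string rather than in program code, the appearance of a result list can be changed by editing configuration alone. This sketch does not escape HTML in metadata values, which a real display layer would need to do.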
GSDL : Features
Easy Installation
Easy Maintenance
Hierarchy Structure
Interface Customization
– Front Page Design, Header for the Digital Library, Collection Icon, Cover Images
Collection Configuration (collect.cfg) File
Scalability, Flexibility
Collection Distribution
• Web
• CD-ROM
Publish created collections to CD-ROM
Windows only
Two possibilities:
o Install GSDL software to HDD and access content on CD
o Run GSDL search engine directly from the CD!
GSDL : Unique Features
Incremental Collection Building
Content Development in 3 different ways
Good Documentation and Active Mailing List
Variety of Plug-ins for different document types
Publishing on CD-ROMs
Data Compression
GSDL : Technology Used
• Technology used in the current version
– Java 1.6 (or higher)
– ImageMagick
– Web server: Apache 2.2
– GSDL 2.82 for Linux and Windows
GSDL : Example Sites
India: Archives of Indian Labour
United States: New York Botanical Garden
International: Global Library Services Network
Some Observations
Strengths:
Configurability: content extraction for indexing, presentation layout, metadata for browsing and field-based searching (a little difficult, though!)
Extensibility: plugins for content extraction, Unicode for multi-language support, source code availability
Full-text search on a variety of document formats
XML, Unicode, Dublin Core support
Data compression
CD-ROM publishing
Limitations:
Interactive content updating and management not possible
No duplicate identification
Metadata handling appears to be a little complex
Linux version seems to be more robust than the Windows version
Hangs while processing some documents during collection building, with no way to handle this gracefully
Current Status
Strong development work by the CS department at the University of Waikato, NZ
Z39.50 experimental interface now available
Promoted by UNESCO
Beginning to be used worldwide
Can be expected to reach CDS/ISIS-like popularity (particularly in developing countries)
Documentation and Help
• Available at: http://www.greenstone.org
– Software
– Demo collections
– FAQ
– Tutorial materials
• Documentation: Installer’s Guide, User’s Guide, Developer’s Guide, and other reading materials
• Mailing lists:
– Greenstone Users List
– Greenstone Developers List
• Greenstone Documentation Wiki: http://wiki.greenstone.org/wiki/index.php/GreenstoneWiki