Interoperability of Retrieval
HARVEST as Search Engine
Kerstin Zimmermann
Oldenburg University
Hamburg August 2000
Kerstin Zimmermann Oldenburg University
Retrieval
[Figure: Server / Archive, Workstation, PC; private vs. public]
Server Structure in Science
University page
Department / Faculty
working groups
members
Publications
Kinds of Archives / Sources
For documents:
a) lists (name, title, date, link)
b) lists with additional abstracts
c) full text only
d) metadata and full text
Formats indexed
• sgml x
• xml x
• html X
• ps X (as text; attention: do not use graphics mode, ASCII required)
• pdf X (as text; Distiller options: asciipdf=on, compressed text=off; for exchange, do not use optimize)
• doc X
• rtf X
• tex X
• dvi X
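Since only some formats can be indexed, and some only under conditions, a gatherer must first recognise a document's type before extracting its text. Below is a minimal sketch of recognition by file extension. The extension table and the name `detect_format` are hypothetical illustrations, not Harvest's own type-recognition configuration.

```python
from pathlib import PurePosixPath
from typing import Optional
from urllib.parse import urlparse

# Hypothetical extension table covering the formats listed above.
FORMATS = {
    ".sgml": "SGML",
    ".xml": "XML",
    ".html": "HTML",
    ".htm": "HTML",
    ".ps": "PostScript",
    ".pdf": "PDF",
    ".doc": "Word",
    ".rtf": "RTF",
    ".tex": "TeX",
    ".dvi": "DVI",
}


def detect_format(url: str) -> Optional[str]:
    """Guess a document's format from the file extension in its URL.

    Returns None when the extension is missing or not in the table.
    """
    path = urlparse(url).path
    suffix = PurePosixPath(path).suffix.lower()
    return FORMATS.get(suffix)
```

Recognition by extension is the simplest possible choice. A production gatherer may also inspect file contents, which this sketch does not attempt.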
Harvest
[Figure: a document (e.g. a dissertation) on a WWW server (http://www.physik...) is collected by the Harvest GATHERER and indexed by the BROKER (internal area); the user sends a request from a WWW browser and receives the result.]
Dublin Core Metadata
Title
Creator
Subject
Description
Date
Publisher
Contributor
Type
Format
Identifier
Relation
Source
Language
Coverage
Rights
http://purl.org/DC/
15 Elements
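Qualified names such as `DC.Creator.Person.Name` in the source code example each refine one of these 15 elements. The following sketch reduces such a name to its base element. It assumes the dotted `DC.<Element>.<qualifier>` convention used in that example, and `dc_element` is a hypothetical helper.

```python
from typing import Optional

# The 15 Dublin Core elements listed above, lowercased for comparison.
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "date",
    "publisher", "contributor", "type", "format", "identifier",
    "relation", "source", "language", "coverage", "rights",
)


def dc_element(meta_name: str) -> Optional[str]:
    """Return the base DC element of a name such as 'DC.Creator.Person.Name'.

    Qualifiers after the element are ignored; names that are not DC
    elements yield None.
    """
    parts = meta_name.split(".")
    if len(parts) < 2 or parts[0].lower() != "dc":
        return None
    element = parts[1].lower()
    return element if element in DC_ELEMENTS else None
```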
Source Code
<META NAME="DC.Subject" CONTENT="(SCHEME=MathNet) People.Faculty/Staff">
<META NAME="DC.Format" CONTENT="text/html">
<META NAME="DC.Creator.Person.Name" CONTENT="Zimmermann, Kerstin">
<META NAME="DC.Creator.Person.Address" CONTENT="(Email) [email protected]">
<META NAME="DC.Creator.Person.Address" CONTENT="(Phone) + 49 (0) 441 798 3465">
<META NAME="DC.Creator.Person.Address" CONTENT="(Fax) + 49 (0) 441 798 5649">
<META NAME="DC.Creator.Person.Address" CONTENT="(Postal) Fachbereich Physik Carl von Ossietzky Universitaet D-26111 Oldenburg">
<META NAME="DC.Creator.Person.IsMemberOf" CONTENT="DPG">
<META NAME="DC.Creator.Person" CONTENT="(Keywords) Dipl.Phys., Wiss.Mit.">
<META NAME="DC.Subject" CONTENT="">
<META NAME="DC.Relation" CONTENT="(SCHEME=url) ">
<META NAME="DC.Relation.References" CONTENT="(SCHEME=url) ">
<META NAME="DC.Date" CONTENT="1999-03-8">
<META NAME="DC.Type" CONTENT="Text.Homepage">
<META NAME="DC.Rights" CONTENT="These personal data may not be used for any commercial purpose or incorporated in mailing lists without written permission by the author. They are free for information and communication services of learned societies.">
<LINK REL="SCHEMA.dc" HREF="http://purl.org/DC/">
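Python's standard `html.parser` can collect DC meta tags like the ones above. Here is a minimal sketch. The names `DCMetaParser` and `extract_dc` are hypothetical, and the sketch reads only NAME/CONTENT pairs whose name starts with `DC.`, ignoring the LINK element.

```python
from html.parser import HTMLParser


class DCMetaParser(HTMLParser):
    """Collect (name, content) pairs from <META NAME="DC.*" CONTENT="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.pairs: list[tuple[str, str]] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <META NAME=...>
        # arrives as tag "meta" with attribute "name"; values keep their case.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name") or ""
        if name.lower().startswith("dc."):
            self.pairs.append((name, attributes.get("content") or ""))


def extract_dc(html: str) -> list[tuple[str, str]]:
    """Return the DC meta tags of an HTML document in document order."""
    parser = DCMetaParser()
    parser.feed(html)
    parser.close()
    return parser.pairs
```

Empty values such as `DC.Subject` above are kept as empty strings, so a later step can flag incomplete metadata.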
Existing Gatherers and Brokers
www.mathnet.de
www.physik.uni-oldenburg.de/EPS/PhysNet/
www.iuk-initiative.org/iwi/TheO/
Search results
What they look like:
- list of results
- link to the index-file
- link to the fulltext
(- link to the word in text)
[Screenshot of search results: inhomogeneous]
[Screenshot of search results with metadata]
How to create metadata?
Using an online tool
Global Harvest Server Structure
[Figure: hierarchy of Harvest brokers on global, national, learned field and Europe-wide levels; node labels:]
DDB OPAC
NDLTD
Chemistry: SUB
Educational Science: UL
Deutscher Bildungs-Server
Informatics: Computer centre
Departments / Institutes: UL
Mathematics: IMPRESS
Departments / Institutes: UL
Physics: PhysDis
Diss Broker: TheO
Harvest Configuration
[Figure: several Providers feed each Gatherer; Gatherers pass their summaries to Brokers, and Brokers can collect from other Brokers. Labels: gmb, SOIF, HTTP.]
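In this configuration a broker can collect from gatherers and from other brokers, which is how the hierarchical server structure is built. The following in-memory sketch illustrates that fan-in. The classes `Gatherer` and `Broker`, their methods, and the plain substring search are hypothetical stand-ins: no network transfer or real index is modelled, and the source graph is assumed to be free of cycles.

```python
from dataclasses import dataclass, field


@dataclass
class Gatherer:
    """Hypothetical gatherer: holds summaries (URL -> attributes) of one provider."""
    name: str
    records: dict[str, dict[str, str]] = field(default_factory=dict)

    def export(self) -> dict[str, dict[str, str]]:
        # Hand a copy of the summaries to whoever collects from this gatherer.
        return dict(self.records)


@dataclass
class Broker:
    """Hypothetical broker: collects from gatherers and/or other brokers."""
    name: str
    sources: list = field(default_factory=list)
    index: dict[str, dict[str, str]] = field(default_factory=dict)

    def collect(self) -> None:
        for source in self.sources:
            if isinstance(source, Broker):
                source.collect()  # refresh lower-level brokers first
            # A later source overrides an earlier one for the same URL.
            self.index.update(source.export())

    def export(self) -> dict[str, dict[str, str]]:
        return dict(self.index)

    def search(self, word: str) -> list[str]:
        """Return URLs whose attribute values contain `word` (case-insensitive)."""
        word = word.lower()
        return sorted(
            url
            for url, attributes in self.index.items()
            if any(word in value.lower() for value in attributes.values())
        )
```

For example, a subject broker such as PhysDis can collect from a physics gatherer, and a higher-level broker can then collect from PhysDis alongside gatherers from other fields.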
SOIF: Example
@FILE { http://www.physik.uni-oldenburg.de/Docs/THEO3/publications/metadocs/ebs.shell.structure.html
update-time{9}: 938935362
url-references{208}: http://www.physik.uni-oldenburg.de/Docs/THEO3/publications/ebs.shell.structure.pdf
mailto:[email protected]
http://www.physik.uni-oldenburg.de/Docs/THEO3/publications/ebs.shell.structure.pdf
title{59}: Shell Structure and Stability of Very Neutron-Rich Isotopes
keywords{97}: and author date eberhard ebs files hilf isotopes neutron pdf rich shell stability structure very
head{16}: -Version 1.0 -->
dc.type{59}: InProceedings(SCHEME=Freetext)publication-status=published
dc.title{59}: Shell Structure and Stability of Very Neutron-Rich Isotopes
dc.publisher{18}: IKDA, TH Darmstadt
dc.language{18}: (SCHEME=Z39.53)ENG
dc.format{15}: application/pdf
dc.date{75}: (SCHEME=ANSI.X3.30-1985)1975(SCHEME=ANSI.X3.30-1985)(TYPE=current)19990408
dc.creator{126}: Eberhard R. Hilf(TYPE=email)[email protected](TYPE=phone)+49-(0)441-798-2543(TYPE=fax)+49-(0)441-798-3201
body{190}: =+4>Shell Structure and Stability of Very Neutron-Rich Isotopes Author:Eberhard R. Hilf Phone: +49-(0)441-798-2543 Fax:+49-(0)441-798-3201 Files: ebs.shell.structure.pdf Date: 1975
md5{32}: bc1f2750a042a8175cce710030c60d76
file-size{4}: 2440
type{4}: HTML
gatherer-version{6}: 1.5.19
gatherer-host{31}: egoiste.physik.uni-oldenburg.de
gatherer-name{17}: Physics Oldenburg
refresh-rate{5}: 86400
time-to-live{7}: 3888000
last-modification-time{9}: 928224570
description{186}: =+4>Shell Structure and Stability of Very Neutron-Rich Isotopes Author:Eberhard R. Hilf Phone: +49-(0)441-798-2543 Fax:+49-(0)441-798-3201Files: ebs.shell.structure.pdf Date: 1975
}
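Every SOIF value carries its length in bytes (for example `title{59}` for the 59-character title above). A reader can therefore take values verbatim, newlines included, without any escaping. Below is a minimal parser sketch for a single object. It assumes one tab or space between the colon and the value, which matches the listing above but is an assumption about the format. `parse_soif` is a hypothetical name, not part of Harvest.

```python
import re

# "@TEMPLATE { URL" header line.
HEADER = re.compile(rb"@(?P<template>[\w-]+)\s*\{\s*(?P<url>\S+)\n")
# "attribute{size}:" followed by one separator character (tab or space assumed).
ATTRIBUTE = re.compile(rb"(?P<name>[^{}\s]+)\{(?P<size>\d+)\}:[\t ]")


def parse_soif(data: bytes) -> tuple[str, str, dict[str, bytes]]:
    """Parse one SOIF object, reading each value by its declared byte count."""
    header = HEADER.match(data)
    if header is None:
        raise ValueError("missing SOIF header")
    pos = header.end()
    attributes: dict[str, bytes] = {}
    while True:
        # Skip the newline(s) separating one value from the next attribute.
        while data[pos:pos + 1] == b"\n":
            pos += 1
        if pos >= len(data):
            raise ValueError("unterminated SOIF object")
        if data[pos:pos + 1] == b"}":
            break
        match = ATTRIBUTE.match(data, pos)
        if match is None:
            raise ValueError(f"malformed attribute at byte {pos}")
        start = match.end()
        end = start + int(match["size"])
        if end > len(data):
            raise ValueError("value runs past end of data")
        # The size, not a newline, decides where the value stops,
        # so values may themselves contain newlines.
        attributes[match["name"].decode("ascii")] = data[start:end]
        pos = end
    return header["template"].decode("ascii"), header["url"].decode("utf-8"), attributes
```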
<tags> and Metadata
HTML element        SOIF element
<A HREF>            url-references{}
<ADDRESS>           address{}
<H1 ... H6>         headings{}
<TITLE>             title{}
...
Metadata            SOIF element
DC.title            dc.title{}
DC.creator          dc.creator{}
...
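The table above amounts to a renaming step followed by serialization with byte counts, as in the SOIF example. Here is a sketch of both steps. It assumes that values of repeated elements (e.g. several headings) are joined with newlines and that a tab follows the colon; both are assumptions. `HTML_TO_SOIF`, `map_elements` and `soif_record` are hypothetical names.

```python
# Hypothetical mapping following the table above; H1 ... H6 all feed "headings".
HTML_TO_SOIF = {
    "a": "url-references",
    "address": "address",
    "title": "title",
    **{f"h{level}": "headings" for level in range(1, 7)},
}


def map_elements(elements: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Map (HTML tag or DC meta name, text) pairs to SOIF attribute names.

    For <A HREF> the text should be the link target. DC names are
    lowercased (DC.title -> dc.title), unknown tags are dropped, and
    repeated attributes are joined with newlines.
    """
    merged: dict[str, list[str]] = {}
    for key, text in elements:
        key = key.lower()
        attribute = key if key.startswith("dc.") else HTML_TO_SOIF.get(key)
        if attribute is not None:
            merged.setdefault(attribute, []).append(text)
    return [(name, "\n".join(values)) for name, values in merged.items()]


def soif_record(url: str, attributes: list[tuple[str, str]], template: str = "FILE") -> bytes:
    """Serialize one SOIF object; each {size} is the value's length in UTF-8 bytes."""
    parts = [f"@{template} {{ {url}\n".encode("utf-8")]
    for name, value in attributes:
        data = value.encode("utf-8")
        parts.append(f"{name}{{{len(data)}}}:\t".encode("utf-8") + data + b"\n")
    parts.append(b"}\n")
    return b"".join(parts)
```

Counting bytes rather than characters matters for non-ASCII text: "Universität" has 11 characters but a size of 12 in UTF-8.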
Harvest links
Harvest sources:
ftp://ftp.tardis.ed.ac.uk/pub/harvest/develop/snapshots/
More information:
http://www.dissonline.org/harvest.html
Port Numbers
• Harvest: 8500
• Web server (HTTP): 80
• FTP: 21 (TCP)
• Telnet: 23
• SMTP (email): 25
• POP3: 110
• NTP (time server): 123
Why Harvest?
• set up portals for specific needs
• heterogeneous archives
• runs on different platforms
• public-domain software (lower costs)
• open source code
• worldwide community
Runtime and Data Volume
DFN network: 3 documents per minute
Connection time: see browser
Index: [ms]
Memory: 9 MB
PhysDis (Jan. 2000): 306 'real' links
1475 documents
112 servers
Gatherer: 2 h 4 min
Legal Aspects
• Searching a database: look for robots.txt
• Discussion in DC.Rights
- rights of the resource: (un)restricted access / use
- rights of the metadata
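Python's standard `urllib.robotparser` can check a server's robots.txt before gathering from it. The sketch below parses an already downloaded file so that it runs offline. Fetching the file (`set_url()`/`read()`) is left out, and the user-agent name, URLs and the helper name `may_gather` are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def may_gather(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the robots.txt rules permit `user_agent` to fetch `url`."""
    parser = RobotFileParser()
    # parse() takes the file's lines and marks the rules as loaded.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

robots.txt only expresses the site operator's wishes for automated access. It does not settle the rights questions about the resource or its metadata listed above.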
Topics of Discussion
• Search depth
• Full text vs. metadata + abstract
• Integration of old archives
• Access
Questions, Comments