Interoperability of Retrieval HARVEST as Search Engine Kerstin Zimmermann Oldenburg University Hamburg August 2000
Page 1: Interoperability of Retrieval

Interoperability of Retrieval

HARVEST as Search Engine

Kerstin Zimmermann

Oldenburg University

Hamburg August 2000

Page 2: Interoperability of Retrieval

Retrieval

Server / Archive

Workstation

PC

private

public

Page 3: Interoperability of Retrieval

Server Structure in Science

University page

Department / Faculty

working groups

members

Publications

Page 4: Interoperability of Retrieval

Kinds of Archives / Sources

For Documents

a) lists (name, title, date, link)

b) lists with additional abstracts

c) only fulltext

d) metadata and fulltext

Page 5: Interoperability of Retrieval

Formats indexed

• sgml x

• xml x

• html X

• ps X Text; attention: do not use graphic mode, ASCII required

• pdf X Text; Distiller options: asciipdf=on, compressed text=off; Exchange: do not use optimize

• doc X

• rtf X

• tex X

• dvi X

Page 6: Interoperability of Retrieval

Harvest

[Diagram labels: WWW-SERVER (http://www.physik...), Dissertation, Internal Area, GATHERER, BROKER, HARVEST, User, WWW Browser, Request, Result]

Page 7: Interoperability of Retrieval

Dublin Core MetaData

Title

Creator

Subject

Description

Date

Publisher

Contributor

Type

Format

Identifier

Relation

Source

Language

Coverage

Rights

http://purl.org/DC/

15 Elements

Page 8: Interoperability of Retrieval

Source Code

<META NAME="DC.Subject" CONTENT="(SCHEME=MathNet) People.Faculty/Staff">
<META NAME="DC.Format" CONTENT="text/html">
<META NAME="DC.Creator.Person.Name" CONTENT="Zimmermann, Kerstin">
<META NAME="DC.Creator.Person.Address" CONTENT="(Email) [email protected]">
<META NAME="DC.Creator.Person.Address" CONTENT="(Phone) + 49 (0) 441 798 3465">
<META NAME="DC.Creator.Person.Address" CONTENT="(Fax) + 49 (0) 441 798 5649">
<META NAME="DC.Creator.Person.Address" CONTENT="(Postal) Fachbereich Physik Carl von Ossietzky Universitaet D-26111 Oldenburg">
<META NAME="DC.Creator.Person.IsMemberOf" CONTENT="DPG">
<META NAME="DC.Creator.Person" CONTENT="(Keywords) Dipl.Phys., Wiss.Mit.">
<META NAME="DC.Subject" CONTENT="">
<META NAME="DC.Relation" CONTENT="(SCHEME=url) ">
<META NAME="DC.Relation.References" CONTENT="(SCHEME=url) ">
<META NAME="DC.Date" CONTENT="1999-03-8">
<META NAME="DC.Type" CONTENT="Text.Homepage">
<META NAME="DC.Rights" CONTENT="These personal data may not be used for any commercial purpose or incorporated in mailing lists without written permission by the author. They are free for information and communication services of learned societies.">
<LINK REL="SCHEMA.dc" HREF="http://purl.org/DC/">
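To illustrate how an indexer might read such a page header, here is a minimal sketch in Python that collects the DC.* META tags from an HTML document. It is not the Harvest summarizer itself; the class and function names (DublinCoreMetaParser, extract_dublin_core) are hypothetical, and skipping empty CONTENT values is an assumption made for this example.

```python
from html.parser import HTMLParser


class DublinCoreMetaParser(HTMLParser):
    """Collect <META NAME="DC.*" CONTENT="..."> pairs from an HTML page."""

    def __init__(self):
        super().__init__()
        self.elements = {}  # e.g. {"DC.Format": ["text/html"]}

    def handle_starttag(self, tag, attrs):
        # html.parser lower-cases tag and attribute names for us.
        if tag != "meta":
            return
        attr = dict(attrs)
        name = attr.get("name") or ""
        content = attr.get("content") or ""
        # Keep only Dublin Core elements; skip empty placeholders (assumption).
        if name.upper().startswith("DC.") and content.strip():
            self.elements.setdefault(name, []).append(content)


def extract_dublin_core(html_text):
    """Return a dict mapping each DC element name to its list of values."""
    parser = DublinCoreMetaParser()
    parser.feed(html_text)
    return parser.elements


sample = """
<META NAME="DC.Format" CONTENT="text/html">
<META NAME="DC.Creator.Person.Name" CONTENT="Zimmermann, Kerstin">
<META NAME="DC.Subject" CONTENT="">
<META NAME="DC.Type" CONTENT="Text.Homepage">
"""

for name, values in extract_dublin_core(sample).items():
    print(name, "=", values)
```

Values are kept as lists because an element such as DC.Creator.Person.Address can appear several times in one header.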

Page 9: Interoperability of Retrieval

Existent Gatherers and Brokers

www.mathnet.de

www.physik.uni-oldenburg.de/EPS/PhysNet/

www.iuk-initiative.org/iwi/TheO/

Page 10: Interoperability of Retrieval

Search results

What they look like:

- list of results

- link to the index-file

- link to the fulltext

(- link to the word in text)

Page 11: Interoperability of Retrieval

Page 12: Interoperability of Retrieval

inhomogeneous

Page 13: Interoperability of Retrieval

Page 14: Interoperability of Retrieval

with Metadata

Page 15: Interoperability of Retrieval

How to create MetaData?

Using an online tool

Page 16: Interoperability of Retrieval

Global Harvest Server Structure

global

national

Learned

field

Europewide

DDB

OPAC

NDLTD

Chemistry

SUB

Educational Science

UL

Deutscher Bildungs-Server

Informatics

Computer centre

Departments / Institutes

UL

Mathematics

IMPRESS

Departments / Institutes

UL

Physics

PhysDis

Diss Broker

TheO

Page 17: Interoperability of Retrieval

Harvest - Configuration

[Diagram labels: Provider (five times), Gatherer (twice), Broker (twice), gmb, SOIF, HTTP]

Page 18: Interoperability of Retrieval

SOIF: Example

@FILE { http://www.physik.uni-oldenburg.de/Docs/THEO3/publications/metadocs/ebs.shell.structure.html
update-time{9}: 938935362
url-references{208}: http://www.physik.uni-oldenburg.de/Docs/THEO3/publications/ebs.shell.structure.pdf
mailto:[email protected]://www.physik.uni-oldenburg.de/Docs/THEO3/publications/ebs.shell.structure.pdf
title{59}: Shell Structure and Stability of Very Neutron-Rich Isotopes
keywords{97}: and author date eberhard ebs files hilf isotopes neutron pdf rich shell stability structure very
head{16}: -Version 1.0 -->
dc.type{59}: InProceedings(SCHEME=Freetext)publication-status=published
dc.title{59}: Shell Structure and Stability of Very Neutron-Rich Isotopes
dc.publisher{18}: IKDA, TH Darmstadt
dc.language{18}: (SCHEME=Z39.53)ENG
dc.format{15}: application/pdf
dc.date{75}: (SCHEME=ANSI.X3.30-1985)1975(SCHEME=ANSI.X3.30-1985)(TYPE=current)19990408
dc.creator{126}: Eberhard R. Hilf(TYPE=email)[email protected](TYPE=phone)+49-(0)441-798-2543(TYPE=fax)+49-(0)441-798-3201
body{190}: =+4>Shell Structure and Stability of Very Neutron-Rich Isotopes Author:Eberhard R. Hilf Phone: +49-(0)441-798-2543 Fax:+49-(0)441-798-3201 Files: ebs.shell.structure.pdf Date: 1975
md5{32}: bc1f2750a042a8175cce710030c60d76
file-size{4}: 2440
type{4}: HTML
gatherer-version{6}: 1.5.19
gatherer-host{31}: egoiste.physik.uni-oldenburg.de
gatherer-name{17}: Physics Oldenburg
refresh-rate{5}: 86400
time-to-live{7}: 3888000
last-modification-time{9}: 928224570
description{186}: =+4>Shell Structure and Stability of Very Neutron-Rich Isotopes Author:Eberhard R. Hilf Phone: +49-(0)441-798-2543 Fax:+49-(0)441-798-3201 Files: ebs.shell.structure.pdf Date: 1975
}
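The number in braces after each attribute name is the size of the value in bytes, which lets a reader consume values that span several lines. The following Python sketch shows one way to read a single record on that basis. It is not Harvest's own parser: the function name parse_soif is hypothetical, the sample record is a shortened version of the one above, and accepting either a tab or a space after the colon is an assumption made for this example.

```python
import re

# name{size}: followed by one tab or space (the separator is an assumption).
ATTR_HEADER = re.compile(rb"([A-Za-z0-9_.\-]+)\{(\d+)\}:[ \t]")


def parse_soif(data: bytes):
    """Parse one SOIF object into (template_type, url, attributes)."""
    first_line, _, rest = data.partition(b"\n")
    header = re.match(rb"@(\S+)\s*\{\s*(\S+)\s*$", first_line)
    if header is None:
        raise ValueError("not a SOIF object header")
    template_type = header.group(1).decode()
    url = header.group(2).decode()

    attributes = {}
    pos = 0
    while pos < len(rest):
        if rest[pos:pos + 1].isspace():  # newline between attributes
            pos += 1
            continue
        if rest[pos:pos + 1] == b"}":    # end of the object
            break
        match = ATTR_HEADER.match(rest, pos)
        if match is None:
            raise ValueError(f"bad attribute header at byte {pos}")
        size = int(match.group(2))
        # Read exactly `size` bytes, so embedded newlines are preserved.
        value = rest[match.end():match.end() + size]
        attributes[match.group(1).decode()] = value.decode("utf-8", "replace")
        pos = match.end() + size
    return template_type, url, attributes


record = (
    b"@FILE { http://www.physik.uni-oldenburg.de/Docs/THEO3/publications/"
    b"metadocs/ebs.shell.structure.html\n"
    b"title{59}:\tShell Structure and Stability of Very Neutron-Rich Isotopes\n"
    b"dc.format{15}:\tapplication/pdf\n"
    b"file-size{4}:\t2440\n"
    b"}\n"
)

template_type, url, attributes = parse_soif(record)
print(template_type, url)
for name, value in attributes.items():
    print(f"{name}: {value}")
```

Reading by byte count rather than splitting on line breaks is what allows values such as url-references to hold several entries.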

Page 19: Interoperability of Retrieval

<tags> and Metadata

HTML Element     SOIF-Element
<A HREF>         url-references{}
<ADDRESS>        address{}
<H1 ... H6>      headings{}
<TITLE>          title{}
...

Metadata         SOIF-Element
DC.title         dc.title{}
DC.author        dc.author{}
...
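The mapping on this page can be sketched as a small lookup plus a serializer that writes name{size} lines. This is an illustration only, not Harvest code: the names HTML_TO_SOIF, soif_attribute_name and to_soif are hypothetical, the table covers only the rows listed on this page, and the attribute name url-references follows the SOIF example on the previous page.

```python
# Rows taken from the table above; anything else is left unmapped.
HTML_TO_SOIF = {"a": "url-references", "address": "address", "title": "title"}
HTML_TO_SOIF.update({f"h{n}": "headings" for n in range(1, 7)})


def soif_attribute_name(source_name):
    """Map an HTML tag name or a DC meta name to a SOIF attribute name."""
    key = source_name.lower()
    if key.startswith("dc."):
        return key                    # DC.title -> dc.title
    return HTML_TO_SOIF.get(key)      # None if the tag is not in the table


def to_soif(template_type, url, attributes):
    """Serialise attributes as one SOIF object; sizes are byte counts."""
    lines = [f"@{template_type} {{ {url}"]
    for name, value in attributes.items():
        size = len(value.encode("utf-8"))
        lines.append(f"{name}{{{size}}}:\t{value}")
    lines.append("}")
    return "\n".join(lines) + "\n"


dc_meta = {
    "DC.title": "Shell Structure and Stability of Very Neutron-Rich Isotopes",
    "DC.format": "application/pdf",
}
attrs = {soif_attribute_name(name): value for name, value in dc_meta.items()}
print(to_soif("FILE", "http://www.example.org/doc.html", attrs))
```

The URL http://www.example.org/doc.html is a placeholder, not one of the servers named in the talk.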

Page 20: Interoperability of Retrieval

Harvest links

Harvest-Sources:

ftp://ftp.tardis.ed.ac.uk/pub/harvest/develop/snapshots/

More information:

http://www.dissonline.org/harvest.html

Page 21: Interoperability of Retrieval

Port-Numbers

• Harvest 8500

• Webserver http 80

• ftp 21 tcp

• telnet 23

• smtp (email) 25

• pop3 110

• time-server 123

Page 22: Interoperability of Retrieval

Why Harvest?

• set up portals for specific needs

• heterogeneous archives

• runs on different platforms

• software in the public domain (lower costs)

• open source code

• worldwide community

Page 23: Interoperability of Retrieval

Runtime and amount of data

DFN-Net: 3 docs per minute

connecting time: see browser

index [ms]

memory: 9 MB

PhysDis (Jan. '00):

306 'real' links

1475 documents

112 servers

Gatherer: 2 h 4 min

Page 24: Interoperability of Retrieval

Legal Aspects

• searching a database

- look for robots.txt

• Discussion in DC.Rights

- rights of the resource: (un-)restricted access / use

- rights of Metadata
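As an illustration of the robots.txt check, the sketch below uses Python's standard urllib.robotparser module to decide whether a URL may be fetched. It is independent of Harvest; the helper name allowed_to_gather, the user agent "example-gatherer", the sample rules and the example.org URLs are all assumptions made for this example.

```python
from urllib.robotparser import RobotFileParser


def allowed_to_gather(robots_txt, user_agent, url):
    """Return True if the given robots.txt text permits fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical rules: everything is open except the /internal/ area.
robots_txt = """\
User-agent: *
Disallow: /internal/
"""

for url in (
    "http://www.example.org/publications/paper.pdf",
    "http://www.example.org/internal/draft.pdf",
):
    print(url, allowed_to_gather(robots_txt, "example-gatherer", url))
```

In practice the rules would be downloaded from the server's /robots.txt before gathering starts; the text is passed in directly here so the example runs without network access.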

Page 25: Interoperability of Retrieval

Topics of Discussion

• Search depth

• fulltext vs metadata + abstract

• integration of old archives

• access

Questions, Comments

-> [email protected]