Page 1: Martin Grötschel

Trier, 12 March 2001

http://www.zib.de/groetschel

Konrad-Zuse-Zentrum für Informationstechnik Berlin (ZIB)

On the Road to Scientific Information Portals: Cooperative Digital Libraries

Remarks, Visions, Proposals

Martin Grötschel

IuK 2001, Universität Trier

Page 2: Martin Grötschel

Konrad-Zuse-Zentrum für Informationstechnik Berlin Martin Grötschel

Contents

Introduction
I. All Information is Part of the Web
   Can we make this true?
II. The Visible Web and the Deep Web
III. There could be an Interconnected Network of Science
IV. Integrating All Types of Resources
V. We should Organize the Cyber Space
VI. To the Benefit of our Society

Page 4: Martin Grötschel

Personal Motivation
• I have broad interests.
• I (have to) search a lot.
• I do find the things I look for.
• However, this process costs too much time and money.
• The "scientific information system" could be much better.
• It seems that some scientists have to get involved.
• The situation is similar with respect to communication.

Page 5: Martin Grötschel

Acting Forces
• Science drives Technology
• Technology drives Change
• Change induces Pressure

Some Consequences:
• Higher Speed and Efficiency
• Lower Costs
• Universal Connectivity
• More and Global Competition

What does this imply for Science?

Page 6: Martin Grötschel

The World of Information
• Tons of Printed Material
• Zillions
  - of Scientific Web Sites
  - of E-Journals, E-Prints
  - of Databases and CD-ROMs
  - of Multimedia Documents
  - of E-Mail
  - of Digital Photos and Videos
  - etc.

Page 7: Martin Grötschel

The Players

• The Author
• The Publisher
• The Librarian
• The Software Developer
• The Service Provider
• The Scientific Information Center
• The Scientific Society
• etc.

the user

Page 8: Martin Grötschel

Some Unsolved Issues

• Accessibility
• Searchability
• Stability
• Compatibility
• Pricing
• Heterogeneity
• Diversity and Complexity of Structures
• Quality
• Authenticity
• etc.

Page 9: Martin Grötschel

Solution

• Scientists have to get involved
• Solution must be user-driven
• Cooperation of players
• Consensus about structures

Some Suggestions in this Talk

Page 10: Martin Grötschel

Contents

I. All Information is Part of the Web
   Can we make this true?

Page 11: Martin Grötschel

Current Mathematical Resources

• Papers and Preprints
• Journals and Books
• Reviews and Abstracts
• Software and Data Collections
• Projects and Persons
• Voice, Images, and Video Information
• Links, Mail, and Virtual Libraries

Page 12: Martin Grötschel

Math Papers and Preprints

• Preprints of the Math-Net
• MPRESS (including arXiv math, ...)
• EULER
• Digital Library @ ACM

Page 13: Martin Grötschel

Math Journals and Books

• SUB Göttingen ("Sondersammelgebiet", special subject collection)
• TIB Hannover (Tech Information Library)
• ELib @ Uni Osnabrück
• EMIS
• Springer LINK
• DOCUMENTA MATHEMATICA
• Lehmanns.de

Page 14: Martin Grötschel

Math Reviews and Abstracts

• MATH @ Zentralblatt
• MathSci @ AMS
• MATHDI @ FIZ-Karlsruhe
• Jahrbuch der Mathematik

Page 15: Martin Grötschel

Math Software and Data Collections

• Netlib @ ANL
• eLib @ ZIB
• MuPAD @ Uni Paderborn
• Algebraic Groups
• Cinderella
• OpenMath

Page 16: Martin Grötschel

Projects and Persons

• Web Sites of Math Research Institutes
• Web Sites of Math Departments
• BerNAM
• Directory of Mathematicians @ ACM
• Combined Membership List AMS, SIAM, MAA
• PERSONA MATHEMATICA @ math-net.de
• SIGMA @ math-net.de

Page 17: Martin Grötschel

Voice, Images, and Video

• Computer Museum
• MSRI Video Server
• Electronic Geometric Models

Application Servers and Software
• MATHEMATICA
• Cinderella
• Inverse Calculator

Page 18: Martin Grötschel

Links, Mail, and Virtual Libraries

• mathematik.de
• Math-Net.de
• Mathematical Archives
• Opt-Net @ ZIB
• MathML

Page 19: Martin Grötschel

There are zillions of Math Resources in the Net.

Page 20: Martin Grötschel

The Situation is Similar in all other Sciences

How do you know that all this material exists and where it is?

Old Approach: Link Lists = WWW Virtual Libraries

But much more has emerged in recent years!

Page 21: Martin Grötschel

Is Everything in the Web?

• Printed Books
• Printed Journals
• CD-ROMs
• Some Databases
• Historic Archives
• Catalog Cards
• ...

are not electronically available

Page 22: Martin Grötschel

Is Everything from the Web in the Web?

Page 23: Martin Grötschel

Contents

I. All Information is Part of the Web
   Can we make this true?

II. The Visible Web and the Deep Web

Page 24: Martin Grötschel

The Invisible / Deep Web

A Fundamental Problem with Search Engines: A Vast Amount of Information is Invisible

• Surface Web
  - Web Robots Start at some "Hubs"
  - Interlinked Web Pages

• Deep Web
  - Isolated Web Sites
  - There are huge Isolated Islands in the Web
  - Information within Databases, behind CGI Interfaces
  - Information without Links (e.g. within OPACs of Libraries)
  - Protected Material, Excluded Explicitly
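The invisibility of the Deep Web follows from how a robot works. It only reaches pages that are linked, directly or indirectly, from its starting hubs. The sketch below illustrates that reachability argument under one assumption: it uses a hypothetical in-memory link graph instead of real HTTP fetching, and all page names are invented for illustration.

```python
from collections import deque

# Hypothetical link graph: page -> pages it links to (nothing is fetched).
LINKS = {
    "hub": ["a", "b"],
    "a": ["c"],
    "b": ["a"],
    "c": [],
    # Exists on some server, but no page links to it, e.g. a record that is
    # only produced as the answer to a query form behind a CGI interface.
    "opac_record_17": [],
}


def crawl(seeds, links):
    """Breadth-first crawl: return every page reachable from the seed hubs."""
    seen = set(seeds)
    queue = deque(seeds)
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


visible = crawl(["hub"], LINKS)
deep = set(LINKS) - visible
print(sorted(visible))  # ['a', 'b', 'c', 'hub']
print(sorted(deep))     # ['opac_record_17']
```

Crawling longer from the same hubs never reaches the isolated record. Only a tool that queries the database directly can find it.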

Page 25: Martin Grötschel

A Web Search Engine Collecting Visible Information

From "The Deep Web: Surfacing Hidden Value", BrightPlanet.com, Jan. 2000

Page 26: Martin Grötschel

A Direct Meta Search Engine Fishing for Invisible Information

From "The Deep Web: Surfacing Hidden Value", BrightPlanet.com, Jan. 2000

Page 27: Martin Grötschel

Characteristics of the Deep Web

(in comparison to the Visible Web)

• Public information on the Deep Web is currently 400 to 500 times larger than the commonly defined World Wide Web

• The Deep Web holds 7,500 terabytes of information (550 billion individual documents), compared to 19 terabytes (1 billion documents) on the Visible Web

From: "The Deep Web: Surfacing Hidden Value", BrightPlanet.com, Jan. 2000
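The size comparison can be checked against the two figures quoted on this slide. The short calculation below uses only those numbers. It shows that the ratio depends on whether storage volume or document count is compared.

```python
# Figures as quoted on the slide (BrightPlanet, Jan 2000).
deep_tb, deep_docs = 7_500, 550e9   # Deep Web: terabytes, documents
surface_tb, surface_docs = 19, 1e9  # Visible Web: terabytes, documents

volume_ratio = deep_tb / surface_tb          # compares storage volume
document_ratio = deep_docs / surface_docs    # compares document counts

print(f"by volume:    {volume_ratio:.0f}x")    # by volume:    395x
print(f"by documents: {document_ratio:.0f}x")  # by documents: 550x
```

By volume the ratio is roughly 395; by document count it is 550.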

Page 28: Martin Grötschel

Characteristics of the Deep Web

(in comparison to the Visible Web)

• More than 100,000 Deep Web sites currently exist
• 60 of the largest Deep Web sites collectively contain about 750 terabytes of information (... narrower, with deeper content)
• More than half of the Deep Web content resides in topic-specific databases (BrightPlanet concentrates on about 20,000 sites)

• A full 95% of the Deep Web is publicly accessible information, not subject to fees or subscriptions

• The Deep Web is the largest growing category of new information on the Internet. But the Deep Web is widely unknown.

From: "The Deep Web: Surfacing Hidden Value", BrightPlanet.com, Jan. 2000

Page 29: Martin Grötschel

Making the Deep Web Visible

Technology:
• Meta Search Engines
• Bibliographic Meta Search Engines
• Virtual Catalogs and Link Lists

Organisational Issues:
• Building Networks of Digital Libraries
• Forming Library and other Cooperatives
• Working on Standards and Formats (Common, Open, Metadata, ...)

Page 30: Martin Grötschel

Categories of Information Systems

• Web Sites: Collection, Query Interface
• Publications: E-Journals, Preprints, ...
• Regional/National Collections: Harvesting Systems
• Topical Databases: Subject-Specific Aggregation
• OPACs: Library Holdings
• Journal Archives: Archives of Publishers
• Software/Data Collections: Commercial / Public Archives
• Compute Servers: Mathematical Calculations / Demos
• Mailing Lists/Archives: Topical Communication Forum
• Topical Portals: Wide-Spectrum Information System

Page 31: Martin Grötschel

Problems: Wide Variety of Servers

Problems with Search Engines (Web Robots)
• Impose High Load on Servers and Networks
• Misuse of Metadata
• Robots can't see behind CGI Interfaces
• Access Rights, Range of Licenses

Problems with Cascading Search Engines
• Diversity of data formats (MAB, MARC formats, DC, ...)
• Multitude of protocols (Z39.50, HTTP, proprietary)

Specialized Repositories and Archives
• Scientific Journals provided by Commercial Publishers
• Document Delivery Systems and Specialized Historic Archives
• Maps, Music, Photos, Videos, Multimedia
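Cascading search engines must cope with a diversity of data formats. One common approach is a crosswalk that maps each source schema onto a common one, such as Dublin Core (DC), before results are merged.

The sketch below rests on two assumptions. First, it uses a simplified, hypothetical dict representation of MARC 21 fields ("tag$subfield"). Second, it includes only a few mappings along the lines of the Library of Congress MARC-to-Dublin Core crosswalk. MAB records would need a table of their own, and parsing real MARC exchange files is out of scope.

```python
# A small subset of a MARC 21 -> Dublin Core crosswalk.
MARC_TO_DC = {
    "245$a": "title",       # Title Statement
    "100$a": "creator",     # Main Entry, Personal Name
    "260$b": "publisher",   # Publication: name of publisher
    "260$c": "date",        # Publication: date
    "650$a": "subject",     # Subject Added Entry, Topical Term
    "020$a": "identifier",  # ISBN
}


def marc_to_dc(record):
    """Map a simplified MARC record (field -> list of values) to DC elements."""
    dc = {}
    for field, values in record.items():
        element = MARC_TO_DC.get(field)
        if element is None:
            continue  # fields without a mapping are dropped in this sketch
        dc.setdefault(element, []).extend(values)
    return dc


# Hypothetical sample record; all values are invented.
sample = {
    "245$a": ["An Example Monograph"],
    "100$a": ["Doe, Jane"],
    "260$b": ["Example Press"],
    "260$c": ["2000"],
    "650$a": ["Digital libraries", "Metadata"],
    "500$a": ["A general note with no mapping here."],
}
dc = marc_to_dc(sample)
print(dc["subject"])  # ['Digital libraries', 'Metadata']
```

Fields that have no mapping are dropped in this sketch. This illustrates a general trade-off: conversion to a common, simpler schema can lose information.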

Page 32: Martin Grötschel

Contents

I. All Information is Part of the Web
   Can we make this true?

II. The Visible Web and the Deep Web

III. There could be an Interconnected Network of Science

Page 33: Martin Grötschel

Virtual/Digital Library

• Virtual
  - Search index
  - Links
  - Metadata
  - OPAC catalog entries

• Digital
  - Structured digital contents
  - Full texts
  - Databases

Page 34: Martin Grötschel

Towards a Scientific Portal to Interconnect the Digital World

[Figure: Virtual Library; Digital Library; Information Portal: Cooperative Virtual Digital Scientific Library]

The Scientific Portal (Information Portal for the Sciences) is an Entry Point to all Types of Information Products from the Sciences.

Behind the Scientific Portal is a Structured Network to be coordinated and organized by the Sciences in a cooperative way.

A Task for the IuK Initiative?

Page 35: Martin Grötschel

Lots of Examples already exist

Page 36: Martin Grötschel

An Example in the Making

Virtuelle Fachbibliothek Technik (Virtual Subject Library for Engineering) of the TIB Hannover

Page 37: Martin Grötschel

Example: The DOE Information Bridge

• Started in 1997 with 60,000 searchable full-text reports online @ DOE Office of Scientific and Technical Information (OSTI)

• Direct Search based on the Distributed Explorer developed by a small Internet company: Innovative Web Application Ltd. (IWA)

• A public version in partnership with the Government Printing Office (GPO) of the USA

• Many other Federal Deep Web collections added to the DOE Virtual Library:
  - PubScience
  - PubMed
  - NTIS Electronic Catalog (450,000 Titles)
  - NASA Technical Report Server

• Energy Portal Search
• Digitization efforts for Gray Literature (@ OSTI)

Page 38: Martin Grötschel

OSTI Virtual Library

Page 39: Martin Grötschel

PubScience

Page 40: Martin Grötschel

The GrayLit Information Network

Graphic from "Searching the Deep Web", W. L. Warnick et al., D-Lib Magazine, Vol. 7, No. 1, January 2001; www.dlib.org

Page 41: Martin Grötschel

Preprint Network

Page 42: Martin Grötschel

DOE OSTI

Page 43: Martin Grötschel

Energy Portal Search

Page 44: Martin Grötschel

PubMed

Page 45: Martin Grötschel

NASA Image Exchange

Page 46: Martin Grötschel

Federal R & D Architecture

Graphic from "Searching the Deep Web", W. L. Warnick et al., D-Lib Magazine, Vol. 7, No. 1, January 2001; www.dlib.org

Page 47: Martin Grötschel

An Observation

The voluntary work contributed so far has been and will remain important.

There will, however, be no satisfactory solution without substantial personnel and financial investment.

We need to become more professional, e.g., Google versus Math-Net.

Page 48: Martin Grötschel

Contents

I. All Information is Part of the Web
   Can we make this true?
II. The Visible Web and the Deep Web
III. There could be an Interconnected Network of Science
IV. Integrating All Types of Resources

Page 49: Martin Grötschel

Distributed Meta Search Engines Exist

What they do:
• Query Search Engines, OPACs, Databases
• Perform Distributed Searches in Parallel
• Cascade Searches to reach Large/Vast Amounts of Targets
• Deliver Links, Metadata, and/or Full Texts
• Handle a Diversity of Data Structures
• Use a Multitude of Internet/Web Protocols
• Structure Heterogeneous/Large Result Sets

They Rely on a Series of Small Configuration Files
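The core loop of such an engine has three parts. It fans the query out in parallel to several targets, each described by a small configuration entry. It then merges the heterogeneous result sets.

The sketch below is built entirely on assumptions. The target names, the configuration layout, and the stub data are hypothetical. The sketch queries in-memory lists instead of real servers and does not implement cascading.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-target configuration, standing in for small config files.
# In a real engine "protocol" would select an HTTP or Z39.50 connector;
# the stubs below ignore it.
TARGETS = [
    {"name": "preprints", "protocol": "http"},
    {"name": "opac", "protocol": "z39.50"},
    {"name": "reviews", "protocol": "http"},
]

# Hypothetical stub backends: target name -> records it holds.
FAKE_DATA = {
    "preprints": [{"id": "p1", "title": "Deep Web search"}],
    "opac": [{"id": "b7", "title": "Digital libraries"},
             {"id": "p1", "title": "Deep Web search"}],
    "reviews": [],
}


def query_target(target, query):
    """Return (target name, hits) for one configured target."""
    records = FAKE_DATA[target["name"]]
    hits = [r for r in records if query.lower() in r["title"].lower()]
    return target["name"], hits


def meta_search(query, targets):
    # Fan the query out to all targets in parallel (map keeps target order).
    with ThreadPoolExecutor(max_workers=max(1, len(targets))) as pool:
        results = list(pool.map(lambda t: query_target(t, query), targets))
    # Merge: keep one entry per record id and note every source that had it.
    merged = {}
    for name, hits in results:
        for hit in hits:
            entry = merged.setdefault(hit["id"], {**hit, "sources": []})
            entry["sources"].append(name)
    return list(merged.values())


print(meta_search("deep", TARGETS))
# [{'id': 'p1', 'title': 'Deep Web search', 'sources': ['preprints', 'opac']}]
```

Deduplicating by identifier is the simplest merge strategy. Real result sets from heterogeneous sources rarely share identifiers, so matching would normally need fuzzier comparison of metadata.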

Page 50: Martin Grötschel

Combination of Search Engines

Integration of Information Offers

[Diagram: Shared Index (SI) and Distributed Search (DS) configurations, accessed via Browser/HTTP, Windows GUI, or Z-Client/Z39.50. Examples shown: AltaVista (HTML); Math-Net (Harvest + DC); Aleph (MAB, USMARC); DigiBib ("Dublin Core"); DigiBib + WebPack, EULER, {Aleph} (DC, MAB, USMARC); "Web Ferret" (HTML?) over AltaVista, HotBot, InfoSeek; Scout (UNIMARC, MAB2) over KOBV, GBV, DDB]

As studied by J. Lügger in "Über Suchmaschinen, Verbünde und die Integration von Informationsangeboten" ("On Search Engines, Library Networks, and the Integration of Information Offers"), ABI-Technik, June 2000

• Math-Net: Harvest + DC
• KOBV Search Engine
• EULER and Dublin Core
• DigiBib NRW

Approaches: Shared Index, Distributed Search
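The two approaches differ in when the sources are contacted:

• A Shared Index harvests metadata from all sites in advance and answers every query from one central index.
• A Distributed Search forwards each query to the sources at query time.

The sketch below shows the shared-index side. It assumes hypothetical, already-harvested records with a single title field. Real harvesting would first have to fetch and parse each site's metadata.

```python
import re
from collections import defaultdict

# Hypothetical harvested metadata (record id -> Dublin Core-like fields).
HARVESTED = {
    "site-a/paper1": {"title": "Integer programming and polyhedra"},
    "site-b/paper2": {"title": "Search engines for digital libraries"},
    "site-c/paper3": {"title": "Polyhedra in combinatorial optimization"},
}


def build_shared_index(records):
    """Build one inverted index (word -> set of record ids) over all sites."""
    index = defaultdict(set)
    for rec_id, fields in records.items():
        for word in re.findall(r"\w+", fields["title"].lower()):
            index[word].add(rec_id)
    return index


def search(index, query):
    """Return ids of records containing every query word (AND semantics)."""
    words = re.findall(r"\w+", query.lower())
    if not words:
        return set()
    result = set(index.get(words[0], set()))  # copy, so the index is untouched
    for word in words[1:]:
        result &= index.get(word, set())
    return result


index = build_shared_index(HARVESTED)
print(sorted(search(index, "polyhedra")))  # ['site-a/paper1', 'site-c/paper3']
```

A shared index answers quickly and ranks uniformly, but it is only as fresh as the last harvest. A distributed search is always current but depends on every source being reachable at query time.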

Page 51: Martin Grötschel

A Potential Math Information Portal

[Diagram: Browser access via HTTP; Shared Index (SI) and Distributed Search (DS) components; DigiBib with KOBV, DigiBib with WebPack, DigiBib with Math-Net; Z39.50 with UNIMARC and with MAB2/USMARC]

• Math-Net @ ZIB and @ Uni Köln: Sigma, NetLib Software, Persona Mathematica
• EMS @ Zentralblatt für Mathematik: MATH, MATHDI, Jahrbuch für Mathematik
• Universität Osnabrück: ELib, MPRESS
• Special Interest Groups of DMV: OPT-NET, IM-Net, IuK, ...
• Publishers and Software Houses: E-Journals, Software
• SUB Göttingen: OPAC SSG Mathematik
• TIB Hannover: TIB CAT
• CWI Amsterdam: OPAC Mathematics
• Mathematische Fachbereiche & Institute (mathematics departments and institutes): Specialized OPACs
• Library Cooperatives: BVB, GBV, HBZ, KOBV, ...
• Die Deutsche Bibliothek: Authority Data
• Publishers and Math Societies: Math Journals and Documents

Open, Distributed, Efficient, Scalable, Stable

Page 52: Martin Grötschel

Contents

I. All Information is Part of the Web
   Can we make this true?
II. The Visible Web and the Deep Web
III. There could be an Interconnected Network of Science
IV. Integrating All Types of Resources
V. We should Organize the Cyber Space

Scientists should Organize the Scientific Cyberspace Cooperatively (Summary and Proposals)

Page 53: Martin Grötschel

Organizing the Cyberspace: Suggestions

• Partners for the information portal?
• Who should form the information portals?
• Organizational framework?

Cooperative Digital Libraries

Main Issues: Sustainability and Finance

Page 54: Martin Grötschel

Partners of the Information Portal

• Scientific Libraries, Scientific Archives
• Scientific Departments, Research Institutes
• Database / Content Providers
• Document Delivery Services
• Digitization Centers
• Scientific Societies
• Publishers
• Software Houses
• Data (Collecting) Centers

Page 55: Martin Grötschel

Suggestions for an Information Portal

• Open Digital Archives of Specialized Collections
• Scientific Suppliers Obtain Free Access
• High-Quality Information and Services
• Robust/Commercial Software/Databases
• Distributed/Heterogeneous Architecture
• Some Centralization is Necessary Too
• Emphasis on Reliable/Long-Term Availability
• Activities in Long-Term Archiving
• Supported by a Specialized Information Center/Library
• Cooperation with Scientific Societies

Not-for-Profit and For-Profit do not exclude each other.

Page 56: Martin Grötschel

Suggestions for an Organizational Framework

• University Level (local)
  - University Library
  - University Computing Center
  - Cooperation
  - University Media Center

• Scientific Level (topical/national)
  - Specialized Library / Information Center
  - Consulted by a Scientific Society
  - Editorial
  - Topical Competence Center

• National Level
  - National Competence Center for New Technologies
  - Research and Development for Production
  - Consultation
  - Standardization / Coordination Activities

A Topical Competence Center may be hosted @ a Research Institute.

Page 57: Martin Grötschel

Key Problems

• No progress without substantial investment
• Long-term sustainability
• No progress without further research and development
• Institutionalization (the IuK Initiative can literally initiate, but can't run the show)

But the show must go on!

Page 58: Martin Grötschel

Contents

I. All Information is Part of the Web
   Can we make this true?
II. The Visible Web and the Deep Web
III. There could be an Interconnected Network of Science
IV. Integrating All Types of Resources
V. We should Organize the Cyber Space
VI. To the Benefit of our Society

Page 59: Martin Grötschel

Who Will Benefit

• Student: Access to Vast Amounts of Material
• Employee: Further Training, Lifelong Learning
• Teacher: Reuse of High-Quality Materials
• Author: Publishing Cheaply, Quickly, and Widely
• Publisher: Open Sources Generate New Opportunities
• Business: More Profit from Applying Science
• Citizen: Contacting Research More Directly
• Science: Communicating with the Public
• Society: Free Flow of Information

Page 60: Martin Grötschel

The End