A.Frank 1 Digital Libraries (DL): Awareness and Discovery Ariel Frank Dept. of Computer Science Bar-Ilan University Joint research with Nir Yom Tov, Alon Kadury & Elina Masevich

Dec 22, 2015

Page 1

Digital Libraries (DL): Awareness and Discovery

Ariel Frank

Dept. of Computer Science

Bar-Ilan University

Joint research with Nir Yom Tov, Alon Kadury & Elina Masevich

Page 2

Presentation motivation

Ad hoc and unsound use of Search Engines (SEs) does not help retrieve quality information on the Web.

Digital Libraries (DLs), on the other hand, provide high quality information retrieval of authoritative results, especially when doing exploratory search.

However, the awareness and discovery of DLs on the Web are still lacking.

So what can be done about it?

Page 3

Contents

• SEs vs. DLs?!

• DL Definition/Types

• How to tilt the balance of SE/DL use?

• SELFDL Model/Architecture

• RIDDLE Model/Architecture

• Future directions

Page 4

Google/SE Awareness

Page 5

So how to overcome Googlism?!

Page 6

Often heard sayings

• “What – is there something to search with besides search engines?”
• “Sure I know all about search engines – I always use google.”
• “Sure I know all about directories – I always use yahoo!”
• “Sorry, never heard about digital libraries.”
• “Listen, I’m used to classical libraries.”
• “I can find only E-books in a digital library, no?”

Page 7

Digital Library Vision?!

Page 8

Sample list of Digital Libraries

• LOC - Library of Congress American Memory (http://memory.loc.gov/ammem/)
• NSDL - National Science DL (http://nsdl.org)
• IPL - Internet Public Library (http://www.ipl.org)
• CDL - California DL (http://www.cdlib.org)
• ADL – Alexandria DL (http://www.alexandria.ucsb.edu)
• BL - British Library (http://www.bl.uk/)
• NZDL – New Zealand DL (http://www.nzdl.org/)
• Einstein Archives Online (http://www.alberteinstein.info/)

Page 9

Which kind to use? The right one

[Figure: Search Engines on the Web. Labels: Index, Directory, Search Engine, Meta-Search Engine; the labels General and Specialty each appear twice]

Page 10

When not to use SEs?

• You know it all.

• You prefer asking friends (or paid experts).

• You know the Web site for it (and didn’t forget the exact URL or have auto-completion or bookmark or can access through another known site).

• You already found a specific/relevant digital library or database (maybe in Invisible Web).

• Tired of paid inclusions, SE spamming, and sponsored commercial results.

• Tired of chasing down useless URLs.

Page 11

When to use an Index?

• Need to search for a narrow piece of information.

• Have a specific objective/site in mind.

• Want to find/rank many related Web sites.

• Want to factor quantity in (index has crawler based results).

• Need to check/fix spelling (based on Web statistics).

Page 12

When to use a Directory?

• Clear about the exact topic of your query.
• Need general information on a rather broad topic/category.
• Want to amass knowledge on a fairly wide subject.
• Would like to browse (and then search) a certain area.
• Want to factor quality in (directory has human-powered results), not quantity.
• Need information that is usually carefully evaluated and even annotated.

Page 13

When to use a Meta-SE?

• When single Basic-SE fails to provide good results.
• One-stop shopping - prefer to search multiple SEs/sites at once to get blended ranked results (so as to save effort/time).
• When the query is simple (complex fields/options don't usually work).
• Searching for multi-faceted topics.
• Want to get clustered results to focus search on the relevant keywords.
• Looking for current events/news.

Page 14

When to use a Specialty-SE?

• When general-SE fails to provide good results.

• When your target is very topic/technology specific.

• Want to find more than just Web pages/sites.

• Need more results from the Invisible Web.

• Want your search terms to more likely have the meanings you intended them to have.

Page 15

SE Quantity vs. DL Quality?

SE

DL

Page 16

SE vs. DL Potential Coverage

[Figure: coverage diagram with regions labeled Resources, Relevant, SE, and DL]

Page 17

Contents

• SEs vs. DLs?!

• DL Definition/Types

• How to tilt the balance of SE/DL use?

• SELFDL Model/Architecture

• RIDDLE Model/Architecture

• Future directions

Page 18

Classical (Analogical) Library

Page 19

So What is a Digital Library?

• There are scores of definitions.

• Most are very general and verbose.

“A managed collection of information, with associated services, where the information is stored in digital formats and accessible over a network.”

Arms, William Y., Digital Libraries, MIT Press, Cambridge, 2000.

Page 20

Definition - A Digital Library is:

1. Collection of digital objects

2. Collection of knowledge structures

3. Collection of library services

4. Library Categories: Domain, Focus & Topic

5. Quality Control

6. Preservation/Persistence

Page 21

1. Collection of Digital Objects

• Documents (e.g., texts, HTML pages)

• Books

• Journals

• Multimedia (images, audio, video, etc…)

• Charts/Maps

Data objects available directly or indirectly

Page 22

2. Collection of Knowledge Structures

• Metadata: Standards, Markup
• Indices, Catalogs, Guides
• Taxonomies, Ontologies, Thesauri
• Dictionaries, Glossaries, Concordances
• Gazetteers
• Abstracts/Summaries

Page 23

3. Collection of Library Services

• Management (computerization, communication)
• Collections development
• Search (query formulation) and Browse interfaces
• Multi-access/use for varied users
• Online Help, Reference, Consultation
• Logging, statistics and Performance Measurement Evaluation (PME)
• SDI: Selective Dissemination of Information (Push mode)

Page 24

4. Library Categories: Domain, Focus & Topic

• Domain: belongs to an area (DNS TLDs).
– edu, com, org, gov, us, il, ac.il, co.il, …
• Focus: created to serve a certain community of users/patrons.
– Academic, Public, National, School, …
• Topic: the subject of the collection; can be relatively finely-grained.
– Law, Medicine, Music, Web, …

Page 25

5. Quality Control

• Selection criteria.

• All material is assessed and authorized (“certified”).

• Adhere to licensing and copyrights.

• Use of Digital Rights Management (DRM).

• Integrity enforced (proven quality).

• Use of filtering.

• Support for profiling/stereotyping.

Page 26

6. Preservation/Persistence

• Access and usage are long term

• Serves as an archive

• Scanning and digitization

• Quality reproduction of material

• Material persistency

– paper vs. digital media

– digital formats (software tools)

Page 27

Need for a delicate balance

Page 28

Web Repositories Hierarchy

• Directory (Catalog, Guide, Subject Gateway)
• Search Engine (SE): Basic SE (BSE), Meta SE (MSE), Popularity SE (PSE)
• Digital Library (DL): Stand-alone DL (SDL), Harvested DL (HDL), Federated DL (FDL)

Page 29

Types of DLs

• Stand-alone Digital Library (SDL) – also self-contained, several collections

• Federated Digital Library (FDL) – also confederated, networked

• Harvested Digital Library (HDL) – also distributed

Page 30

Stand-alone Digital Library (SDL)

• The regular (classical) DL.

• Implemented locally in a fully computerized fashion, with networked access.

• Self-contained material:
– edited/generated
– scanned/digitized
– purchased

• Single or Several digital collections.

Page 31

Federated Digital Library (FDL)

• Contains several autonomous libraries.
• Based on common focus and topic.
• Usually heterogeneous repositories.
• Connected via a network.
• Forms a flat unified library.
• Transparent user interface.

The major problem is interoperability

Page 32

Harvested Digital Library (HDL)

• Virtual library providing metadata-based access to relevant items distributed over the network.

• Objects harvested into metadata (protocol was Harvest/SOIF, nowadays OAI-PMH can be used).

• Harvests digital objects, not full DLs.

• But has regular DL characteristics.

Page 33

SDL vs. HDL

Criterion | Stand-alone Digital Library (SDL) | Harvested Digital Library (HDL)
Items Origin | Purchased/Digitized | Gathered
Items Location | Local/Networked | Scattered
Material | Items+Catalog | Catalog
Repository Size | Large | Small
Update | Medium | Fast/Dynamic
Composition Method | Interoperability | Inherent

Page 34

Parallel Evolution of SEs and DLs

Search Engines Generations
• 1st Generation – Basic SE (BSE): includes Robots, Indices, Directories, basic/advanced user interfaces.
• 2nd Generation – Meta SE (MSE): uses several basic-SEs simultaneously (federated search), ranks gathered pages by relevancy.
• 3rd Generation – Popularity SE (PSE): uses link analysis and use frequency measures to filter and rank the Web pages.

Digital Libraries Generations
• 1st Generation – Stand-alone (SDL): local, classical, focused material, digitized or scanned.
• 2nd Generation – Federated (FDL): comprised of autonomous SDLs representing related, possibly heterogeneous, network repositories.
• 3rd Generation – Harvested (HDL): contains only summaries and metadata structures; domain focused, of fine granularity.

Page 35

Contents

• SEs vs. DLs?!

• DL Definition/Types

• How to tilt the balance of SE/DL use?

• SELFDL Model/Architecture

• RIDDLE Model/Architecture

• Future directions

Page 36

Why are SEs overused?

• I always use Google/Yahoo!

• It’s just a quick search!

• The truth? – not sure what I’m looking for.

• I’m too used to using SEs.

• SEs are more general, no?

• SEs always give me enough answers.

• SEs don’t care what my topic/domain is!

Page 37

SE vs. DL - Server Side

Criterion | Search Engine | (Harvested) DL
Effort | Complex Undertaking | Medium
Emphasis | Quantitative | Qualitative
Content | Global/Shallow | Focused/Annotated
Repository | Huge | Small
Maintenance | Continuously Updated (Robots) | Dynamically Updated

Page 38

SE vs. DL - Client Side

Criterion | Search Engine | (Harvested) DL
Interest | Sudden | Lasting
Query | Ad-hoc | Sounder
Use | Short Term | Medium Term
Information Returned | A lot | Modest
Quality | Noisy | Clean
Sift/Filter | Manual | Not much needed
Distribution | Pull Mode | Both Pull/Push Mode

Page 39

So what was the message?

Page 40

Qualitative IR from Digital Library!?

Fact: Quantity orientation in SE.

Fact: Quality orientation in DL.

? Assumption: Accessible DLs in sought after domain.

? Assumption: Usable information retrieval interfaces for DLs.

Result: High quality information retrieval from digital libraries!

Page 41

Why are DLs underused (social)?

• Too used to classical libraries (fond memories).

• No public awareness (an unknown entity).

• No public relations (unlike for Portals/SEs).

• No money in it (marketing, banners, services).

• If it’s a library, you have to pay to use it, no?

• Are DLs up-to-date at all (as much as SEs)?

• No DLs in my language (localization).

Page 42

Why are DLs underused (general)?

• Portals don’t offer DLs (services).

• Aren’t DLs part of the Invisible/Deep Web?

• DLs are just for experts!

• Many interests – will need to know many DLs.

• How to find them at all (need a jump-start)?

• How to find relevant ones (sounds like search).

• How to find the right one (too many around).

• Lack of domain coverage (no DL in my area).

Page 43

Why are DLs underused (technical)?

• SEs crawl/index DLs, no?

• Aren’t directories enough?

• Aren’t SSEs (Specialized SEs) enough?

• Too focused/limited (too fine granularity).

• Need know-how to use DLs (unlike for SEs).

• Non-usable interfaces (not user-friendly).

• Mostly textual, not multimedia (like SEs are).

Page 44

DL Awareness & Discovery Problems

• Lack of use and familiarity with DLs.

• Hard to locate and identify DLs scattered around the Web.

• Not enough metadata kept for and on the DLs.

• The topic, focus, and user interfaces of DLs are not always clear and usable.

Page 45

So how to tilt the balance of SE/DL use?

Page 46

Sample (Digital) Library Directories

• Berkeley LibWeb (Library Servers via Web) – http://sunsite.berkeley.edu/Libweb/

• Academic Info: Digital Libraries – http://www.academicinfo.net/digital.html

• Google Directory: Digital Libraries – http://directory.google.com/Top/Reference/Libraries/Digital/

• Librarians’ Index to the Internet – http://lii.org/

Page 47

Use General SEs and DL Directories?

• Why not just use large general SEs?
– noisy results, metadata not sufficient, too many (re)tries to get relevant results.

• Why not just use existing DL Directories?
– messy categorization, non-friendly UI, not all libraries are DLs, not really DL Directories.

Page 48

Some possible directions/solutions

• Get SEs to better index, reference, and advertise DLs.

• Provide specialized SEs for locating DLs.

• Construct and enhance DL directories.

• DL coverage of more topics/domains.

• Employ SE-like interfaces in DLs:
– user-friendly interface (Google-like)
– easy-to-use site (usability like in SE)

Page 49

If more time ... we could SEEk more

Page 50

Theory vs. Practice?

Page 51

Contents

• SEs vs. DLs?!

• DL Definition/Types

• How to tilt the balance of SE/DL use?

• SELFDL Model/Architecture

• RIDDLE Model/Architecture

• Future directions

Page 52

SELFDL Goals

Search Engine Locator For Digital Libraries

• Discover/identify/classify/generate DL resources/sites in the (in)visible Web.

• Supply search tools for users to find relevant DLs for their needs.

• Provide better, usable (thin) interfaces for locating DLs.

• Raise awareness, knowledge, discovery and use of DLs.

Page 53

Naming

Page 54

SELFDL Model/Architecture

Index Directory Meta

Page 55

SELFDL – gateway to world of DLs

Page 56

SELFDL techniques

• Harness SE technologies to locate DLs on the Web using:
– Extractors: extract DLs from DL directories.
– Crawlers: focused crawl in search of DLs.
– Scripts: interface with Google/Yahoo APIs.

• Use site analysis (search for DL terms).

• Support Extended DC (Dublin Core) metadata for each DL.

• Provide SELFDL database indexing.
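
The site-analysis step (searching a site for DL terms) could be sketched roughly as follows. This is only an illustration, not the SELFDL implementation: the term list is a small hypothetical subset, and the function names and the `min_hits` threshold are assumptions introduced here.

```python
import re

# Hypothetical subset of DL indicator terms (assumption; the full list is not given here).
DL_TERMS = ["digital library", "digital collection", "catalog",
            "ask a librarian", "preservation"]

def dl_term_hits(page_text, terms=DL_TERMS):
    """Return the indicator terms that occur in the page text (case-insensitive)."""
    text = page_text.lower()
    return [t for t in terms if re.search(r"\b" + re.escape(t), text)]

def looks_like_dl(page_text, min_hits=2):
    """Flag a site as a DL candidate when enough indicator terms are present.

    min_hits is an arbitrary illustrative threshold, not a value from the study.
    """
    return len(dl_term_hits(page_text)) >= min_hits

sample = "Welcome to our Digital Library. Browse the catalog or ask a librarian."
print(dl_term_hits(sample))   # terms found in the sample text
print(looks_like_dl(sample))  # whether the sample passes the threshold
```

A real classifier would presumably weight terms by their measured significance rather than count them equally.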

Page 57

DLs Identification test

• Manual collection of a list of 65 terms that could be indicative that a Web site is a DL.

• Check if there is statistically significant connection between each of the terms and the fact that a Web site is a DL.

• Initial statistical test included 100 manually identified DLs and 100 random Web sites.

• The statistical measure used (in SPSS) was Cross tabulation, tested with Chi-square, phi coefficient and Cramer’s V.
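
For a single term, the cross tabulation is a 2x2 table (DL or random site, term present or absent). The sketch below computes the Pearson chi-square statistic and the phi coefficient for such a table in plain Python. The counts are invented purely for illustration and are not results from the study, which used SPSS.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column total is zero")
    return n * (a * d - b * c) ** 2 / denom

def phi_coefficient(a, b, c, d):
    """Phi = sqrt(chi2 / n); for a 2x2 table Cramer's V takes the same value."""
    n = a + b + c + d
    return math.sqrt(chi_square_2x2(a, b, c, d) / n)

# Invented counts: rows = DL / random site, columns = term present / term absent.
a, b, c, d = 70, 30, 20, 80
chi2 = chi_square_2x2(a, b, c, d)
phi = phi_coefficient(a, b, c, d)
print(round(chi2, 2), round(phi, 3))
```

With one degree of freedom, a chi-square value above 3.841 is significant at the 0.05 level, so a term with counts like these would count as indicative.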

Page 58

Results of DLs Identification test

• Terms that have been found to be statistically significant:

1. documents, book(s), journal(s), electronic/internet/web resource(s)
2. catalog(s)/catalogue(s)
3. ask a librarian, patron(s)
4. digital library, library, digital collection(s)
5. copyright(s)
6. preservation/preserve, digitization/digitize

Page 59

SELFDL Directory UI

Page 60

SELFDL Directory classifications

Digital Library, classified by:

• Topic (DDC): Life Science: DDC 570; Earth Science: DDC 550; Biology: DDC 574
• Focus (Breeding): Children, Academic, Professional
• Domain (IANA): Countries - .IL; Commercial - .COM; Educational - .EDU

Page 61

Example DDC topic tree

Page 62

SELFDL Directory results example

Page 63

Advantages of SELFDL Directory

• Contains just DLs.

• Better classification/perspective based on domain/focus/topic.

• Provides user-friendly interface, like Google Directory.

• Additional metadata (based on DC).

Page 64

SELFDL Index UI

Page 65

SELFDL Index

• Results from Web focused crawling.
• Can be searched for specific DL criteria:
– keywords
– DL type (SDL, FDL, HDL)
– DL media/content (audio, E-books, E-serials, theses, movies, etc.)
– Protocol support (OAI-PMH)

Page 66

SELFDL Index example queries

topic:biology domain:com

algebra domain:com source:crawler

focus:children type:SDL

protocol:OAI

topic:math media:ebooks
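
The example queries above mix free keywords with `field:value` filters. A minimal sketch of how such a query might be split is given below; the function name and the returned structure are assumptions made for illustration, since the slides do not describe the actual SELFDL parser.

```python
def parse_query(query):
    """Split a query into field filters (field:value) and free keywords."""
    filters, keywords = {}, []
    for token in query.split():
        field, sep, value = token.partition(":")
        if sep and field and value:
            # Token has the form field:value, e.g. domain:com
            filters[field.lower()] = value
        else:
            # Anything else is treated as a plain keyword
            keywords.append(token)
    return filters, keywords

print(parse_query("algebra domain:com source:crawler"))
print(parse_query("focus:children type:SDL"))
```

Keeping filters and keywords separate would let the index apply exact matches on the metadata fields and ordinary keyword search on the rest.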

Page 67

SELFDL Index results example

Page 68

Advantages of SELFDL Index

• Built according to insights/techniques of various studies in the field.

• Supports directory and crawler results.

• Provides specialized SE for DLs.

• Easy to use query interface.

• Supports advanced keywords search.

Page 69

SELFDL Meta

Page 70

SELFDL Meta Engine

• Can be searched for DL keywords like in an ordinary search engine.

• Intersects SE (i.e., Google/Yahoo API) results with SELFDL database to extract the current DLs to be returned as query response.

• Performs like a regular SE – convenient for public use.
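
The intersection step can be pictured as filtering SE hits down to hosts that the SELFDL database already knows to be DLs. The sketch below assumes matching by host name and uses a hard-coded result list in place of a real Google/Yahoo API call; both are assumptions for illustration only.

```python
from urllib.parse import urlparse

def host_of(url):
    """Lower-cased host name with a leading 'www.' removed."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host

def intersect_with_dls(se_results, known_dl_hosts):
    """Keep SE hits whose host is a known DL, preserving SE order, one hit per DL."""
    seen, kept = set(), []
    for url in se_results:
        host = host_of(url)
        if host in known_dl_hosts and host not in seen:
            seen.add(host)
            kept.append(url)
    return kept

# Stand-in for the SELFDL database and for results returned by an SE API.
known = {"nsdl.org", "ipl.org"}
results = ["http://www.example.com/a", "http://nsdl.org/search",
           "http://www.ipl.org/", "http://nsdl.org/about"]
print(intersect_with_dls(results, known))
```

Preserving the SE order keeps the SE's own relevance ranking for the DLs that survive the filter.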

Page 71

SELFDL intersects with Google & Yahoo! results

[Figure: overlapping sets labeled SELFDL, Google, and Yahoo!, with a region labeled Relevant DLs]

Page 72

SELFDL Meta results example

Page 73

Google “Sponsored” DL Interface

Page 74

Advantages of SELFDL Meta

• Provides all the advantages of the SELFDL model (UI, metadata).

• Supports query interface for terms, like existing SEs.

• Supports intersection between SEs results and relevant DLs.

• Supports different orders of results.

Page 75

SELFDL prototype testing methods

• Efficiency measures were computed for Directory and Meta.

• Satisfaction surveys were given to users before and after SELFDL use.

• A check was carried out to find the best GUI for SELFDL (regular or Google-like).

Page 76

Efficiency testing methods

• A series of queries was evaluated for the relevancy of the results.

• The F-measure was used as the efficiency measure.

Where:

P – Precision of results

R – Relative recall of results

F – Weighted harmonic average of P & R = 2PR/(P+R)

• The two components tested were SELFDL Meta and SELFDL Directory.
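
As a small worked example of the formula F = 2PR/(P+R), the sketch below computes F for one pair of values. The precision and recall numbers are arbitrary and chosen only to show the arithmetic; they are not measurements from the SELFDL tests.

```python
def f_measure(precision, recall):
    """Balanced F-measure: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both are 0
    return 2 * precision * recall / (precision + recall)

# Arbitrary illustrative values: P = 0.8, R = 0.5
print(round(f_measure(0.8, 0.5), 3))  # 2 * 0.8 * 0.5 / 1.3
```

Because it is a harmonic mean, F stays close to the lower of the two values, so a system cannot score well by maximizing only precision or only recall.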

Page 77

SELFDL Directory vs. DL Directories

Comparison of efficiency measures

[Chart: precision (P), recall (R), and F for SELFDL Directory, Academic Info, Google Directory, and Yahoo Directory; value axis from 0 to 1.2]

Page 78

SELFDL Meta vs. Google & Yahoo

Comparison of efficiency measures

[Chart: precision (P), recall (R), and F for SELFDL Meta, Google, and Yahoo; value axis from 0 to 0.6]

Page 79

Users’ satisfaction surveys

1. Usability of Web utilities.
2. Ease of locating DLs.
3. Ease of identifying if site is DL.
4. DL results relevance.
5. DL metadata readability.

Page 80

Google DL Interface

Page 81



RIDDLE Goals

Resource Inquiry and Discovery in a DL Environment

• Enable creation of HDLs by harvesting (filtering) relevant SDLs using OAI-PMH.

• Enable construction of HDLs based on composition of lower-level HDLs, so as to increase the coverage of DLs’ topics.

• Enable information exchange with SELFDL.

• Raise awareness, knowledge, discovery and use of DLs.


Example of topics’ composition

[Tree diagram of topic composition, by level from the top: University; Life Sciences, Exact Sciences, Social Sciences; Chemistry, Computer Science; Hardware, Software.]


OAI-PMH Protocol

• OAI-PMH – Open Archives Initiative (OAI) Protocol for Metadata Harvesting.

• Tackles the lack of uniformity and interoperability between data repositories, which makes information sharing between repositories difficult.

• Addresses these problems by defining the way queries are sent to repositories and the way answers are received.

• Mandates at least one metadata format that every repository must support: Dublin Core (DC).
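To make "the way queries are sent" concrete: an OAI-PMH request is an ordinary HTTP request whose query string carries a verb and its arguments. The sketch below only builds such URLs. The endpoint address and the helper name oai_request are hypothetical; the verbs Identify and ListRecords and the metadata prefix oai_dc (the mandatory Dublin Core format) come from the protocol itself.

```python
from urllib.parse import urlencode

# Hypothetical repository endpoint, used only for illustration.
BASE_URL = "https://repository.example.org/oai"


def oai_request(verb: str, **arguments: str) -> str:
    """Build an OAI-PMH request URL: the verb plus its arguments as a query string."""
    return f"{BASE_URL}?{urlencode({'verb': verb, **arguments})}"


# Ask the repository to describe itself.
print(oai_request("Identify"))
# Harvest all records in the mandatory Dublin Core format.
print(oai_request("ListRecords", metadataPrefix="oai_dc"))
```

The repository answers each request with an XML document, so a harvester needs nothing beyond an HTTP client and an XML parser.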


RIDDLE Model/Architecture

[Layered architecture diagram, from top to bottom:]

Layer 5 – Presentation: Web interfaces

Layer 4 – Aggregated Service Providers: Aggregated HDLs

Layer 3 – Service Providers: HDLs

Layer 2 – Data Providers: SDLs

Layer 1 – Internet: Web

[Connector labels in the diagram: OAI-PMH, Enhanced OAI-PMH.]


Use of OAI-PMH for FDLs/HDLs

• OAI-PMH was planned to support harvesting, as manifested in its name, and also in its design (i.e., selective harvesting using “Sets”).

• However, the number of FDLs that use the protocol is relatively large, while there are very few HDLs that employ it.

• Since HDLs, unlike FDLs, filter the information, and not just federate it, we investigate ways by which HDLs can filter information using the OAI-PMH protocol.
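As a rough illustration of what filtering (rather than just federating) could look like, the sketch below keeps only those harvested records whose dc:subject matches the HDL's topic. It is not the RIDDLE implementation: the sample response is hand-written and abbreviated, the identifiers are invented, and exact matching on dc:subject is an assumed filtering rule. The XML namespaces are the standard OAI-PMH and Dublin Core ones.

```python
import xml.etree.ElementTree as ET

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hand-written, abbreviated ListRecords response (illustrative data only).
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Compiler design notes</dc:title>
          <dc:subject>Computer Science</dc:subject>
        </oai_dc:dc>
      </metadata>
    </record>
    <record>
      <header><identifier>oai:example:2</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Organic synthesis handbook</dc:title>
          <dc:subject>Chemistry</dc:subject>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def filter_records(xml_text: str, topic: str) -> list[str]:
    """Return identifiers of records having a dc:subject equal to topic (case-insensitive)."""
    root = ET.fromstring(xml_text)
    kept = []
    for record in root.iterfind(".//oai:record", NS):
        subjects = [
            s.text.strip().lower()
            for s in record.iterfind(".//dc:subject", NS)
            if s.text
        ]
        if topic.lower() in subjects:
            kept.append(record.findtext("oai:header/oai:identifier", namespaces=NS))
    return kept


print(filter_records(SAMPLE, "computer science"))  # ['oai:example:1']
```

Exact matching works only if every SDL fills in dc:subject consistently, and nothing in plain DC guarantees that.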


Levels of information filtering

• There are 3 levels at which information filtering can be done, though each level has its own problems, mostly caused by lack of uniformity between SDLs:

1. Item-level metadata – the well-known problems with the use of DC entries.

2. Group-level metadata – the use of OAI-PMH Sets for selective harvesting is not well defined, so Sets cannot easily be used to refer to groups of items.

3. Library-level metadata – the description of metadata at this level is not well defined.

As a result, creation of HDLs using OAI-PMH is not fully supported.


Suggested extensions to OAI-PMH

• Lack of uniformity in SDLs using OAI-PMH prevents effective creation of HDLs.

• The suggestion is to provide better harvesting/filtering capabilities from SDLs by (re-)use of standards, as follows:

1. Item-level metadata – use of extended DC for metadata description, instead of just DC.

2. Group-level metadata – use of a DDC topic as a defined Set identifier.

3. Library-level metadata – use of extended DC for the library description field in the response to the OAI-PMH Identify verb.
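The group-level suggestion can be sketched as follows: if every SDL names its Sets after DDC classes, an HDL can pick the Sets to harvest by a simple prefix test on the class number. The setSpec form "ddc:NNN", the function names and the advertised Sets are assumptions made for this illustration; the slides only state that a DDC topic serves as the Set identifier. The class numbers themselves (500 natural sciences and mathematics, 540 chemistry, 547 organic chemistry) are standard DDC.

```python
def ddc_prefix(topic: str) -> str:
    """Digits shared by every DDC class below a three-digit topic.

    "500" (main class) -> "5", "540" (division) -> "54", "547" (section) -> "547".
    """
    if len(topic) != 3 or not topic.isdigit():
        raise ValueError(f"expected a three-digit DDC class, got {topic!r}")
    if topic.endswith("00"):
        return topic[0]
    if topic.endswith("0"):
        return topic[:2]
    return topic


def sets_for_topic(set_specs: list[str], topic: str) -> list[str]:
    """Select setSpecs of the assumed form "ddc:NNN" that fall under the HDL's topic."""
    prefix = "ddc:" + ddc_prefix(topic)
    return [spec for spec in set_specs if spec.startswith(prefix)]


# Hypothetical setSpecs advertised by an SDL through the ListSets verb.
advertised = ["ddc:004", "ddc:510", "ddc:540", "ddc:547", "ddc:610"]
print(sets_for_topic(advertised, "500"))  # ['ddc:510', 'ddc:540', 'ddc:547']
print(sets_for_topic(advertised, "540"))  # ['ddc:540', 'ddc:547']
```

Each selected setSpec would then be passed as the set argument of a ListRecords request, which is the protocol's own mechanism for selective harvesting.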


The RIDDLE Prototype

• Provides for regular creation of FDLs.

• Enables creation of HDLs by harvesting/filtering the relevant SDLs.

• Supports HDL aggregation based on DDC hierarchy.

• The user search results return not only items matching the query but also HDLs and SDLs related to the indicated topic.

• The user can search the HDLs hierarchy (by textual or directory search) for a specific HDL and further down the aggregated HDLs tree.


RIDDLE entry page


Sample results page, first entry an HDL


HDL aggregation

• The HDL aggregation capability is based on:

– use of the DDC topics hierarchy.

– assigning each HDL a suitable DDC topic identifier.

– providing each HDL with an OAI-PMH interface, similar to what data providers have, thus enabling and supporting an HDL hierarchy.

– supporting both offline and online construction and corresponding search.
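A minimal data-structure sketch of such a hierarchy is given below: each HDL holds the SDLs it harvests directly and the lower-level HDLs it aggregates, so a search can start at any node and continue down the tree. The class layout, the method names, the SDL names and the placement of Chemistry and Computer Science under Exact Sciences are all illustrative assumptions, not the prototype's design.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class HDL:
    """A harvested DL node: harvests SDLs directly and aggregates child HDLs."""

    topic: str
    sdls: list[str] = field(default_factory=list)
    children: list[HDL] = field(default_factory=list)

    def all_sdls(self) -> list[str]:
        """SDLs reachable from this node, walking down the aggregated HDLs tree."""
        found = list(self.sdls)
        for child in self.children:
            found.extend(child.all_sdls())
        return found

    def find(self, topic: str) -> HDL | None:
        """Depth-first search for the HDL that covers the given topic."""
        if self.topic == topic:
            return self
        for child in self.children:
            match = child.find(topic)
            if match is not None:
                return match
        return None


# Hypothetical hierarchy and SDL names.
chem = HDL("Chemistry", sdls=["sdl-chem-1"])
cs = HDL("Computer Science", sdls=["sdl-cs-1", "sdl-cs-2"])
exact = HDL("Exact Sciences", children=[chem, cs])
university = HDL("University", children=[exact])

print(university.all_sdls())                    # ['sdl-chem-1', 'sdl-cs-1', 'sdl-cs-2']
print(university.find("Chemistry").all_sdls())  # ['sdl-chem-1']
```

Because every HDL would expose the same OAI-PMH interface as a data provider, a parent could harvest its children in the same way the children harvest their SDLs.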


Directory search with topics


RIDDLE Experimentation

• Several tests were carried out, as follows:

1. The quality of information retrieval when using a specific HDL vs. use of several FDLs.

2. Ease of discovering and using the aggregated HDLs.

3. User preferences in searching several FDLs vs. use of aggregated HDLs.

• Initial testing indicates that use of HDLs and aggregated HDLs is more efficient than use of separate FDLs.


Efficiency measures for RIDDLE

[Bar chart: precision, recall and F-measure, on an axis from 0 to 0.9, for HDLs vs. FDLs.]




Future directions

• Improve the locating, identification and ranking of DLs and their categories/types.

• Conduct wider, more significant tests using SELFDL and RIDDLE.

• Publish a beta Web version of SELFDL and RIDDLE for public use/feedback.

• Better integration between SELFDL and RIDDLE.

• Investigate awareness and discovery of DLs on the Web.






Still around? :-)