Prototyping Digital Libraries Handling Heterogeneous Data Sources – An ETANA-DL Case Study. Unni Ravindranathan, Rao Shen, Marcos André Gonçalves, Weiguo Fan, Edward A. Fox, James W. Flanagan. [email protected] http://fox.cs.vt.edu Virginia Tech, Blacksburg, VA, USA (and CWRU). ECDL 2004, Bath, England, September 2004.
Prototyping Digital Libraries Handling Heterogeneous Data Sources – An ETANA-DL Case Study
Unni Ravindranathan, Rao Shen, Marcos André Gonçalves, Weiguo Fan, Edward A. Fox, James W. Flanagan
[email protected] http://fox.cs.vt.edu
Virginia Tech, Blacksburg, VA, USA (and CWRU)
ECDL 2004, Bath, England, September 2004
Acknowledgements (Selected)
Sponsors: NSF grant ITR-0325579; AOL, ASOR, CWRU, ETANA, Vanderbilt U., Virginia Tech
Faculty/Staff: Lillian Cassel, Debra Dudley, Roger Ehrich, Manuel Perez, Naren Ramakrishnan
VT (Former) Students: Aaron Krowne, Ming Luo, Fernando Das Neves, Ricardo Torres, Hussein Suleman
Acknowledgements (contd.)
• Karen Borstad, MPP
• Douglas Clark, Walla Walla College
• Joanne Eustis, CWRU
• Nick Fischio, CWRU
• Paul Gherman, Vanderbilt U.
• Andrew Graham, U. Toronto
• Tim Harrison, U. Toronto
• Larry Herr, Canadian University College
• Christopher Holland, LRP
• Paul Jacobs, Mississippi State U.
• Douglas Knight, Vanderbilt U.
• Stan LaBianca, Andrews U.
• David McCreery, Willamette U.
• Eric Meyers, Duke U.
• Adam Porter, Illinois College
• Jack Sasson, Vanderbilt U.
• Tom Schaub, Indiana U. of Penn.
• Randall Younker, Andrews U.
Outline
• Problems
• Background
• Approach
• ETANA-DL
• ETANA-DL Prototype System
• Modeling ETANA-DL
• ETANA-DL Services
• Analysis
• Conclusions
• Future Work
Problems
Interoperability among heterogeneous archaeological systems
Delay in publication of primary archaeological data
Lack of sustainable solutions to long-term preservation of valuable information
Lack of services useful to the archaeology community, including “traditional DL services”
Difficulty in understanding complex archaeological information systems
Difficulty in requirements elicitation for archaeological systems
Open Archives Initiative
Promotes interoperability among DLs
Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH)
Data Provider
• possess metadata and share it (internally / externally)
• via well-defined OAI protocols (e.g., database servers)
Service Provider
• harvest data from Data Providers
• provide higher-level services to users
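To make the Data Provider / Service Provider split concrete, here is a minimal sketch of the Service Provider side, assuming a Data Provider that answers the standard OAI-PMH `ListRecords` verb with Dublin Core (`oai_dc`) metadata. The base URL, identifier, and title in the sample are made up for illustration, and the sketch parses a canned response instead of contacting a server. It is not the ETANA-DL harvester.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespaces defined by OAI-PMH 2.0 and Dublin Core.
OAI_NS = "http://www.openarchives.org/OAI/2.0/"
DC_NS = "http://purl.org/dc/elements/1.1/"


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build the ListRecords request a Service Provider sends to a Data Provider."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


def parse_records(xml_text):
    """Extract (identifier, title) pairs from a ListRecords response."""
    root = ET.fromstring(xml_text)
    pairs = []
    for record in root.iter(f"{{{OAI_NS}}}record"):
        identifier = record.findtext(f"{{{OAI_NS}}}header/{{{OAI_NS}}}identifier")
        title = record.findtext(f".//{{{DC_NS}}}title")
        pairs.append((identifier, title))
    return pairs


# Hypothetical response body; a real harvester would fetch this over HTTP.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:bone-1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Bone field record 1</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    print(list_records_url("http://example.org/oai"))
    print(parse_records(SAMPLE_RESPONSE))
```

A full harvester would also follow `resumptionToken` elements to page through large result sets; that is omitted here to keep the sketch short.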
Traditional Digital Libraries
[Figure: Users reach Digital Objects (programs, documents, images, video) through a Digital Library implemented as a monolithic and/or custom-built web-based application.]
Introduction to ODL (Open Digital Libraries)
Open Digital Libraries
• Framework for componentized Digital Libraries
• Design principles for components
• Protocols for inter-component communications
• Built upon OAI
Open Digital Libraries Approach
[Figure: Users reach the digital objects of the ETANA-DL sites (Bone, Seed, Figurine, Pottery) through a User Interface built on separate components: Search, Filter, Union, Recent, Browse.]
Basic ODL Model: An application for Archaeology
[Figure: architecture diagram. Labels: OAI Data Provider (Nimrin); ETANA-DL Union Catalog; ETANA-DL Search Engine (ODL Service Provider Component); WWW Interface (User Interface); links labeled OAI-PMH and ODL Protocol.]
Componentized services example
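[Figure: the User sends a Query through the User Interface to SearchHandlerServlet, which passes a query in the IRDB query language to IRDBSearchEngine (backed by IndexDB); the engine returns results in XML, and the parsed XML is returned to the User as Results.]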
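The flow named on this slide (a handler that turns a user query into the search component's query language, then parses the XML that comes back) can be sketched roughly as below. Every class, method, and the toy query language are hypothetical stand-ins; the actual SearchHandlerServlet and IRDBSearchEngine interfaces are not given in the slides.

```python
import xml.etree.ElementTree as ET


class ToySearchEngine:
    """Hypothetical stand-in for a search component backed by an index."""

    def __init__(self, index):
        self.index = index  # maps record id -> description text

    def search(self, engine_query):
        # Toy query language: space-separated lowercase terms, all required.
        terms = engine_query.split()
        root = ET.Element("results")
        for record_id, text in sorted(self.index.items()):
            if all(term in text.lower() for term in terms):
                hit = ET.SubElement(root, "hit", id=record_id)
                hit.text = text
        # The component answers in XML, as in the slide's "Results in XML".
        return ET.tostring(root, encoding="unicode")


class SearchHandler:
    """Hypothetical handler sitting between the user interface and the engine."""

    def __init__(self, engine):
        self.engine = engine

    def handle(self, user_query):
        # 1. Translate the user's query into the engine's query language.
        engine_query = " ".join(user_query.lower().split())
        # 2. Ask the search component; it replies with XML.
        xml_results = self.engine.search(engine_query)
        # 3. Parse the XML into plain (id, text) pairs for the interface.
        return [(hit.get("id"), hit.text) for hit in ET.fromstring(xml_results)]


if __name__ == "__main__":
    engine = ToySearchEngine({"n1": "Bone field record", "n2": "Seed field record"})
    print(SearchHandler(engine).handle("Bone record"))
```

Because the handler only depends on the engine's `search` method and its XML reply, the engine could be swapped for another component without touching the interface code, which is the point of componentization.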
5S Model – Informally
Digital libraries are complex information systems that:
• help satisfy info needs of users (societies)
• provide info services (scenarios)
• organize info in usable ways (structures)
• present info in usable ways (spaces)
• communicate info with users (streams)
Solution – our approach
Applying and extending Digital Library (DL) techniques to solve the following problems: interoperability, making primary data available, data preservation
Modeling archaeological information systems using 5S theory to better understand the domain and design the system and the supported services
Rapidly prototyping DLs that handle heterogeneous archaeological data using componentized frameworks, to support requirements elicitation and provide useful services.
ETANA-DL
Archaeological Digital Library
Applies and extends the OAI-PMH
• Open Archives Initiative Protocol for Metadata Harvesting
Value-added services
• Annotation
• Items of interest (Binding service)
• Recent searches/discussions
• User management
Searching: Search Interface
Searching: Search Results
Searching: Advanced Search
Searching: Advanced Search Results
Multi Dimensional Browsing
Site structure
Temporal
Object-specific
User context
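One way to picture multi-dimensional browsing is as a browsing context that fixes a value along some dimensions and filters the records accordingly. The sketch below assumes each record is a flat dictionary; the field names (`site`, `period`, `object_type`) and the sample values are invented for illustration and are not taken from ETANA-DL.

```python
def browse(records, context):
    """Return the records that match every dimension fixed in the context."""
    return [
        record
        for record in records
        if all(record.get(dim) == value for dim, value in context.items())
    ]


# Hypothetical records; real ETANA-DL metadata is richer than this.
RECORDS = [
    {"id": "r1", "site": "Nimrin", "period": "Iron Age", "object_type": "bone"},
    {"id": "r2", "site": "Nimrin", "period": "Iron Age", "object_type": "seed"},
    {"id": "r3", "site": "Lahav", "period": "Bronze Age", "object_type": "figurine"},
]

if __name__ == "__main__":
    # Narrow by site first, then add an object-specific dimension.
    context = {"site": "Nimrin"}
    print([r["id"] for r in browse(RECORDS, context)])
    context["object_type"] = "seed"
    print([r["id"] for r in browse(RECORDS, context)])
```

Since the context is an ordinary dictionary, it can be stored and reapplied later, which is one simple way a browsing context could be restored.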
Searching within a Context
Searching within a Context: Search Results
Restoring Browsing Contexts
Object Comparison: Selecting Objects for Comparison
Object Comparison: Editing Attributes
Object Comparison: Comparing Objects
Object Comparison: Comparison Results
Marking items
Viewing marked items
Remarking items
Discussion Board (Annotation): View Messages
Discussion Board (Annotation): Post Messages/Replies
Collections Description
Other services
• Items of Interest (Binding service)
• Recent searches/discussions
• Recommendation
• User management
  • Account creation
  • Login
Items of Interest: Binding Service
Recent Searches/Discussions
Recommendation
User Management: New User Account
User Management: Login
User Management: Navigations
Heterogeneous data handling
Site   | Artifact Type     | Original data source    | Attributes in original record | Attributes in harvested record | Records harvested
Lahav  | Figurine          | Tab-delimited text file | 15                            | 18                             | 564
Nimrin | Bone field record | Table in Oracle DB      | 21                            | 24                             | 7420
Nimrin | Seed field record | Table in Oracle DB      | 12                            | 15                             | 430
Umayri | Bone field record | 2 tables in Access DB   | 8                             | 24                             | 2123
Total  |                   |                         |                               |                                | 10537
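The data-mapping step behind these numbers can be illustrated with a small sketch: read a source record in its native format, rename its fields to a common schema, and attach a few provenance attributes. The column names, the mapping, and the three added attributes are assumptions made for illustration. The slides report that harvested records carry more attributes than the originals (for example 15 versus 18), but they do not say which attributes were added.

```python
import csv
import io

# Hypothetical mapping from source column names to a common schema.
FIGURINE_MAPPING = {"ID": "identifier", "DESC": "description", "LOCUS": "locus"}


def map_record(source_row, mapping, site, artifact_type, source_format):
    """Rename source fields and attach provenance attributes (assumed here)."""
    record = {target: source_row[src] for src, target in mapping.items()}
    record.update(
        {"site": site, "artifact_type": artifact_type, "source_format": source_format}
    )
    return record


def harvest_tab_delimited(text, mapping, site, artifact_type):
    """Map every row of a tab-delimited source file into the common schema."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [
        map_record(row, mapping, site, artifact_type, "tab-delimited text")
        for row in reader
    ]


# Made-up sample data standing in for a tab-delimited figurine file.
SAMPLE = "ID\tDESC\tLOCUS\nF1\tfigurine head\t12\n"

if __name__ == "__main__":
    for record in harvest_tab_delimited(SAMPLE, FIGURINE_MAPPING, "Lahav", "Figurine"):
        print(record)
```

A source kept in a relational database would need a different reader but could reuse `map_record` with its own mapping, so the per-source work is concentrated in defining the mapping.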
Heterogeneous data handling
Site   | Data Analysis (hours) | Data Mapping (hours) | Data Provider Implementation (hours) | Service Provider Implementation (hours)
Lahav  | 48                    | 144                  | 4                                    | 1
Nimrin | 48                    | 48                   | 4                                    | 1
Umayri | 24                    | 48                   | 4                                    | 1
Total  | 120                   | 240                  | 12                                   | 3
Heterogeneous data handling
[Pie chart: share of total effort by task]
• Data Analysis: 32%
• Data Mapping: 64%
• Data Provider Implementation: 3%
• Service Provider Implementation: 1%
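The chart percentages follow directly from the per-site hours reported on the preceding slide. The short check below recomputes them; the only assumption is that each share is rounded to the nearest whole percent.

```python
# Hours per task and site, as reported on the effort slide.
HOURS = {
    "Data Analysis": {"Lahav": 48, "Nimrin": 48, "Umayri": 24},
    "Data Mapping": {"Lahav": 144, "Nimrin": 48, "Umayri": 48},
    "Data Provider Implementation": {"Lahav": 4, "Nimrin": 4, "Umayri": 4},
    "Service Provider Implementation": {"Lahav": 1, "Nimrin": 1, "Umayri": 1},
}


def effort_shares(hours):
    """Return each task's share of total hours, rounded to whole percent."""
    totals = {task: sum(per_site.values()) for task, per_site in hours.items()}
    grand_total = sum(totals.values())  # 375 hours for the data above
    return {task: round(100 * total / grand_total) for task, total in totals.items()}


if __name__ == "__main__":
    for task, share in effort_shares(HOURS).items():
        print(f"{task}: {share}%")
```

Analysis and mapping together account for 360 of the 375 hours, which is why (semi-)automatic data mapping is the natural place to look for savings.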
Rapid prototyping: Lines of Code
Type of Service   | LOC for implementing service | LOC reused from components | Total LOC | Reuse Percentage
Componentized     | 350                          | 3630                       | 3980      | 91
Non-componentized | 7950                         | -                          | 7950      | -
Total             | 8300                         | 3630                       | 11930     | 30.4
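The reuse percentages in this table are consistent with dividing reused lines by total lines (newly written plus reused). The snippet below reproduces both figures under that assumption; the function name is ours.

```python
def reuse_percentage(new_loc, reused_loc):
    """Percentage of total lines of code that came from reused components."""
    total = new_loc + reused_loc
    return 100 * reused_loc / total


if __name__ == "__main__":
    # Componentized services: 350 new lines, 3630 reused.
    print(round(reuse_percentage(350, 3630)))
    # All services: 8300 new lines, 3630 reused.
    print(round(reuse_percentage(8300, 3630), 1))
```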
Rapid prototyping: Service development times
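[Figure: two pie charts showing how development time splits across Requirements Analysis and Design, Implementation, and Testing, one for Componentized Services and one for Non-componentized Services. Values shown: 28%, 14%, 58%; 35%, 27%, 38%.]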
User Analysis
Initial comments from all 3 projects, plus others interested in ETANA-DL
Positive feedback – users liked:
• Data integration
• Prototype cross-collection information access services
• Information structuring
• Utility of supported services
Negative feedback – user concerns:
• Need for service enhancements
• Usability
Conclusions
• Applied 5S to the archaeological domain
• Identified requirements for future versions of the system
• Extensible and componentized approach for handling heterogeneous archaeological data from disparate sources
• Rapidly generated prototype archaeological DL
• Making primary archaeological data available without significant delay
Future Work
• Componentizing current DL services
• Creating next-generation DL services from expanding set of requirements
• Integrating richer content
• (Semi-)automatic data mapping
• Automating the ingest of DL content
• Enhancing interface capabilities
• Formal usability studies
Visual Browsing
Visual Browse: By sites
Visual Browsing: Topographical Drawings
Full site North west quadrant
Square:N40/W20
Visual Browsing: Square information
Loci layout
Square:N40/W20
Locus: 86
Visual Browsing: locus sheet
Publications
1. U. Ravindranathan, R. Shen, M. A. Goncalves, W. Fan, E. A. Fox, J. W. Flanagan. ETANA-DL: A Digital Library for Integrated Handling of Heterogeneous Archaeological Data. To be presented at the ACM-IEEE Joint Conference on Digital Libraries (JCDL 2004), Tucson, AZ, June 7-11, 2004.
2. U. Ravindranathan, R. Shen, M. A. Goncalves, W. Fan, E. A. Fox, J. W. Flanagan. ETANA-DL: Managing Complex Information Applications – An Archaeology Digital Library. Demo to be presented at the ACM-IEEE Joint Conference on Digital Libraries (JCDL 2004), Tucson, AZ, June 7-11, 2004.
3. U. Ravindranathan, R. Shen, M. A. Goncalves, W. Fan, E. A. Fox, J. W. Flanagan. Prototyping Digital Libraries Handling Heterogeneous Data Sources – The ETANA-DL Case Study. European Conference on Digital Libraries (ECDL 2004), Bath, U.K., September 12-17, 2004 (submitted).