A Comprehensive Information Retrieval Portal for Canadian Scientific Researchers

Research Proposal for CISTI
Andre Vellino

August 2006
Overview

Context: CISTI Strategic Plan
Proposal Statement
System Architecture
Proposal Components
Partnerships
Outcomes and Draft Work Plan
Andre’s Relevant Experience

Holy Grail

“It’s easy to say what would be the ideal online resource for scholars and scientists: all papers in all fields, systematically interconnected, effortlessly accessible and rationally navigable from any researcher’s desk, worldwide, for free.”

Stevan Harnad, 1999
Professor of Cognitive Science

University of Southampton

Excerpts from CISTI Strategic Plan

“Goal 1: Provide universal, seamless, and permanent access to information for Canadian research and innovation.”

“Canadians look to CISTI to deliver distilled, aggregated, and validated information that is relevant to their research and innovation activities.”

“Available at the client’s desktop, these services are provided through a technologically sophisticated infrastructure.”

“[All users] will have electronic access at their desktop to a wealth of national and international STM information resources, supported by intelligent search and analysis tools and expert advice.”

Proposal Vision

To develop a web-based information portal that offers universal, seamless access to highly relevant, distilled and aggregated STM information, using intelligent search and analysis tools that support scientific innovation.

High Level Functional Architecture

[Figure: high-level functional architecture. Components: Personalized Scientific Literature Research Portal; Web Application Server; User Agents; Content Aggregator / OpenURL Resolver; Collaborative Filtering (Taste, open source); Personalization Engine; LitMiner Content Analysis. Content sources: Commercial Science Publishers; CISTI & University Libraries.]
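The architecture includes a Content Aggregator / OpenURL Resolver, which links a citation to the copies a user can reach. The sketch below shows one way the portal might encode a journal-article citation as an OpenURL 1.0 (Z39.88-2004) key/encoded-value query. It rests on several assumptions. The resolver address `RESOLVER_BASE` is hypothetical. The citation field names are illustrative. Only a subset of the journal-format keys is used.

```python
from urllib.parse import urlencode

# Hypothetical resolver endpoint; the proposal does not name an actual resolver address.
RESOLVER_BASE = "https://resolver.example.org/openurl"

# Illustrative citation fields mapped onto OpenURL 1.0 KEV journal-format keys.
FIELD_TO_KEY = {
    "article_title": "rft.atitle",
    "journal_title": "rft.jtitle",
    "issn": "rft.issn",
    "volume": "rft.volume",
    "start_page": "rft.spage",
    "year": "rft.date",
    "author_last": "rft.aulast",
}


def build_openurl(citation, base=RESOLVER_BASE):
    """Encode a journal-article citation as an OpenURL 1.0 KEV query string."""
    params = {
        "url_ver": "Z39.88-2004",
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",
        "rft.genre": "article",
    }
    for field_name, key in FIELD_TO_KEY.items():
        value = citation.get(field_name)
        if value:  # omit fields the citation does not have
            params[key] = str(value)
    if citation.get("doi"):
        params["rft_id"] = "info:doi/" + citation["doi"]
    return base + "?" + urlencode(params)
```

A resolver receiving such a link could check the ISSN or DOI against CISTI and university holdings and commercial publisher sites, then offer the user whichever copies are available.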

Proposal Components

User Needs
Content Aggregation
Collaborative Filtering
Content Mining
Results Visualization
Partnerships

User Needs

Customers of CISTI services and content are elite – highly educated and exacting in their requirements;

Compared to mass-market or intranet commercial search-portals, the number of CISTI end-users is small (30,000 – 100,000);

User needs are (likely) varied but focused: e.g. bibliographic literature searches / peer reviews / competitive analysis / historical research;

Contribution to “innovation” can be measured (in the short term) by asking the user directly.

User Profiling

Enables customized services:
Alerts / notifications
Higher-precision search results
Greater user satisfaction
Item-based and user-based recommender systems

Broadens the scope of search to semantically cognate but otherwise disparate domains.
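Alerts and notifications depend on a user profile. The following is a minimal content-based sketch. It assumes each item carries subject terms and that browsing events carry weights. The event names and the values in `EVENT_WEIGHTS` are hypothetical, as is the alert `threshold`. The sketch builds a weighted term profile from a user's activity, then flags new items whose terms cover enough of that profile.

```python
from collections import Counter

# Hypothetical weights: explicit feedback counts more than passive browsing.
EVENT_WEIGHTS = {"view": 1.0, "download": 2.0, "rated_useful": 3.0}


def build_profile(events, item_terms):
    """events: list of (item_id, event_type); item_terms: item_id -> list of subject terms."""
    profile = Counter()
    for item_id, event in events:
        weight = EVENT_WEIGHTS.get(event, 0.0)
        for term in item_terms.get(item_id, []):
            profile[term] += weight
    return profile


def alert_score(profile, new_item_terms):
    """Fraction of the profile's total weight covered by a new item's terms."""
    total = sum(profile.values())
    if total == 0:
        return 0.0
    return sum(profile[term] for term in set(new_item_terms)) / total


def alerts(profile, new_items, threshold=0.3):
    """Ids of new items scoring at least `threshold`, best first (ties broken by id)."""
    scored = [(alert_score(profile, terms), item_id) for item_id, terms in new_items.items()]
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [item_id for score, item_id in ranked if score >= threshold]
```

The same profile could also feed item-based or user-based recommenders, which compare behaviour across users instead of matching terms.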

Content Aggregation

Most end users will (likely) not care where the information they seek resides;

Results for a search should show that many sources are available and provide links to these sources (Open Access / Commercial / Academic / Government);

Requires partnerships with content providers and search engines.
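One way to present "many sources" for a search is to merge the per-source result lists so that each work appears once, with one link per source. The sketch below is a minimal version of that idea. It makes two simplifying assumptions: records are matched by DOI when one is present, and by a normalized title otherwise. The source labels and URLs in `Hit` are illustrative only.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Hit:
    source: str                 # illustrative label, e.g. "Open Access" or "Commercial"
    title: str
    url: str
    doi: Optional[str] = None


@dataclass
class AggregatedResult:
    title: str
    links: Dict[str, str] = field(default_factory=dict)  # source -> URL


def work_key(hit: Hit) -> str:
    """Identify the same work across sources: DOI when present, else a normalized title."""
    if hit.doi:
        return "doi:" + hit.doi.lower()
    return "title:" + re.sub(r"[^a-z0-9]+", " ", hit.title.lower()).strip()


def aggregate(result_lists: List[List[Hit]]) -> List[AggregatedResult]:
    """Merge per-source result lists so each work appears once, with a link per source."""
    merged: Dict[str, AggregatedResult] = {}
    for hits in result_lists:
        for hit in hits:
            entry = merged.setdefault(work_key(hit), AggregatedResult(title=hit.title))
            entry.links.setdefault(hit.source, hit.url)  # keep the first link per source
    return list(merged.values())
```

The sketch has a known limitation. A record that has a DOI in one source but not in another will not be merged. A production aggregator would need fuzzier matching.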

Collaborative Filtering

Monitors each user’s browsing behaviour (and/or explicit feedback) to build a profile of the user’s choices;

Users with “similar” profiles can anonymously share their opinions (e.g. on the value or usefulness of an article or book) with one another, as in “People who ordered article X also ordered article Y”;

Enables serendipitous recommendations (options that the “active user” might not otherwise have considered):
May stimulate “innovation”;
May complement citation indexing as a relevance criterion;

Untested technology in the scientific information retrieval community.
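The two collaborative-filtering ideas on this slide can be sketched in plain Python. The first is user-based: find users with similar rating profiles and recommend what they valued. The second is item co-occurrence: "people who ordered X also ordered Y". This is not the API of Taste, the open-source engine named in the architecture. It is only a minimal illustration of the technique. Cosine similarity, the neighbourhood size `k`, and the list length `n` are assumptions.

```python
from collections import defaultdict
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse rating vectors (item -> rating)."""
    dot = sum(a[i] * b[i] for i in set(a) & set(b))
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend_user_based(ratings, active_user, k=2, n=3):
    """User-based CF: rank items the active user has not seen by the sum of
    similarity-weighted ratings from the k most similar users."""
    seen = ratings[active_user]
    neighbours = sorted(
        ((cosine(seen, other), user) for user, other in ratings.items() if user != active_user),
        reverse=True,
    )[:k]
    scores = defaultdict(float)
    for sim, user in neighbours:
        if sim <= 0:
            continue  # ignore users whose ratings do not overlap with the active user's
        for item, rating in ratings[user].items():
            if item not in seen:
                scores[item] += sim * rating
    return sorted(scores, key=lambda item: (-scores[item], item))[:n]


def also_ordered(orders, item):
    """Item co-occurrence: items most often ordered together with `item`."""
    counts = defaultdict(int)
    for basket in orders:
        if item in basket:
            for other in basket:
                if other != item:
                    counts[other] += 1
    return sorted(counts, key=lambda other: (-counts[other], other))
```

Because recommendations come from other users' behaviour rather than from the query text, they can surface articles outside the active user's usual vocabulary. This is the "serendipity" the slide refers to.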

Content Mining

Concept discovery using:
Automatic classification (categorization)
Named entity tagging
Document meta-tagging with concepts

Value:
Improved precision in search results
May add dimensions to metadata about content
“Related Articles” feature, as in Google Scholar
Enables novel visualization of results
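As a toy illustration of concept meta-tagging and automatic categorization, the sketch below matches phrases from a small controlled vocabulary. It then assigns the category that collects the most concept mentions. The vocabulary is hypothetical. The sketch does not reflect how the Entrust toolkit or LitMiner actually work. Real systems use much larger thesauri and statistical classifiers.

```python
import re
from collections import Counter

# Hypothetical toy vocabulary: surface phrase -> (preferred concept, category).
VOCABULARY = {
    "myocardial infarction": ("Heart attack", "Cardiology"),
    "heart attack": ("Heart attack", "Cardiology"),
    "insulin": ("Insulin", "Endocrinology"),
    "type 2 diabetes": ("Type 2 diabetes", "Endocrinology"),
}
CONCEPT_CATEGORY = {concept: category for concept, category in VOCABULARY.values()}


def tag_concepts(text):
    """Concept tagging: count whole-phrase, case-insensitive matches of vocabulary phrases."""
    lowered = text.lower()
    found = Counter()
    for phrase, (concept, _category) in VOCABULARY.items():
        hits = len(re.findall(r"\b" + re.escape(phrase) + r"\b", lowered))
        if hits:
            found[concept] += hits
    return found


def categorize(text):
    """Assign the category with the most concept mentions, or None if nothing matched."""
    votes = Counter()
    for concept, count in tag_concepts(text).items():
        votes[CONCEPT_CATEGORY[concept]] += count
    return votes.most_common(1)[0][0] if votes else None
```

Mapping several surface phrases to one preferred concept is what lets "heart attack" and "myocardial infarction" retrieve each other. That mapping is the source of the precision and "related articles" benefits listed above.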

Entrust Toolkit

[Figure: Entrust Content Analysis Toolkit. Components: Document Conversion, Categorizer, Concepts, Summarizer, Search. Outputs: categories; concepts and meta-data; summaries and ranked phrases; hits and locations. Storage: file system, DB.]

Example: Healthcare Concept Tree

Results Visualization

Content analysis and personalization may allow different display paradigms for “more documents like this” or “similar articles”;

Feedback on relevance of the query terms to the selected item.

Interactive Visualization of Multiple Query Results – Battelle

Using Visualisation to Interpret Search Engine Results – Wolverhampton
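Feedback on how the query terms relate to a selected item can be as simple as showing where each term occurs in the document. The sketch below is a hypothetical illustration, not the method of either project cited above. It splits a document into equal segments and counts each query term per segment. It then renders the counts as text bars; a graphical interface could draw the same grid.

```python
import re


def query_term_feedback(query, document, segments=4):
    """Count each query term's occurrences in `segments` equal slices of the document."""
    terms = dict.fromkeys(re.findall(r"\w+", query.lower()))  # de-duplicated, query order kept
    words = re.findall(r"\w+", document.lower())
    size = max(1, -(-len(words) // segments))  # words per segment (ceiling division)
    grid = {}
    for term in terms:
        row = [0] * segments
        for pos, word in enumerate(words):
            if word == term:
                row[min(pos // size, segments - 1)] += 1
        grid[term] = row
    return grid


def render(grid):
    """Plain-text bars: one row per term, '#' repeated per hit in each segment, '.' if none."""
    width = max((len(term) for term in grid), default=0)
    lines = []
    for term, row in grid.items():
        cells = "|".join("#" * count or "." for count in row)
        lines.append(f"{term:<{width}} |{cells}|")
    return "\n".join(lines)
```

Showing term positions rather than a single relevance score lets the user see whether query terms co-occur in the same part of the document or only appear in separate sections.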

Partners

Google (Books / Scholar) http://scholar.google.com/

Online Computer Library Center - WorldCat http://www.worldcat.org/

Public Library of Science http://www.plos.org

Science.gov http://www.science.gov/

International Association of STM Publishers http://www.stm-assoc.org/

Annual Reviews http://www.annualreviews.org/

BioMed Central (UK) http://www.biomedcentral.com/

Related Areas of Research

Digital Archiving: mechanisms for preserving digital (multimedia) objects

Valuation and Payment Models for Digital Objects: deciding what to preserve, for how long, and how much to charge

Application of Metadata Standards: Dublin Core / Semantic Web ontologies (OWL)

Digital Rights Management & Security: access control / intellectual property protection

Project Phases & Outcomes

Project Phases:
Requirements / Research Phase
Analysis / Design Phase
Development / Test Phase

Outcomes:
Develop a prototype content-aggregation search portal with a collaborative filtering and content analysis engine
Establish partnerships with content providers and search engine organizations
Test user satisfaction and “return use” improvements on a sample population
Publish results

Requirements /Research Phase

User Requirements: find out what classes of users there are and what features users want in an information portal that would help them innovate;

Technology Literature Review: content aggregation, visualization, categorization, personalization / collaborative filtering.

Analysis / Design Phase

Use Cases: for each category of user, enumerate the use cases (behavioural scenarios);

User Interface Design: design the interface for query, query refinement, results visualization and recommendations;

Software Evaluation: portal web-application components, collaborative filtering packages, categorization / LitMiner interfaces.

Development / Test Phase

Prototype Information Portal: develop the content aggregator and personalization / recommendation agents;

Integrate Content Analysis: LitMiner or categorization / concept-tagging toolkits;

Test and Evaluate in a Pilot Program: experiments with a test group to measure user acceptance and rates of return usage.
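"Rates of return usage" needs an operational definition before a pilot can measure it. One possible definition, which is an assumption and not part of the plan, is the fraction of pilot users who use the portal on at least two distinct days. The sketch computes that rate from a visit log and compares a pilot group with a baseline.

```python
from collections import defaultdict


def return_usage_rate(visits, min_days=2):
    """visits: iterable of (user_id, day) pairs, where day is e.g. a datetime.date.
    Returns the fraction of users who used the portal on at least `min_days` distinct days."""
    days_by_user = defaultdict(set)
    for user, day in visits:
        days_by_user[user].add(day)
    if not days_by_user:
        return 0.0
    returning = sum(1 for days in days_by_user.values() if len(days) >= min_days)
    return returning / len(days_by_user)


def relative_change(pilot_rate, baseline_rate):
    """Relative improvement of the pilot group over a baseline (e.g. 0.25 means +25%)."""
    if baseline_rate == 0:
        raise ValueError("baseline rate must be non-zero")
    return (pilot_rate - baseline_rate) / baseline_rate
```

Repeat visits on the same day are counted once, so a single long session does not inflate the return rate.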

Draft Work Plan

Andre Vellino – Relevant Experience

Entrust: Content Analysis Policy Architect – concept extraction and automatic categorization.

imGenie (startup): Systems Architect for a wireless, bi-modal (voice / text), personalized information retrieval and groupware application.

National Research Council: Research Scientist, IIT – information retrieval on small-format displays.

Nortel Networks: Senior Systems Architect, Disruptive Network Solutions – personal identity management for intelligent mediation of content delivery in the network.

Carleton University: Adjunct Research Professor, Cognitive Science Ph.D. program.

NCF Internet: server-side web architect for the new NCF web portal – registration, payment, single sign-on to integrated applications.

University of Georgia / Environmental Protection Agency: Research Associate, Advanced Computational Methods Center – development of an expert system for predicting chemical reactivity from chemical structure.