Daniela Barbosa, Synaptica Business Development Manager, Dow Jones Client Solutions, Dow Jones & Company
Paula R McCoy, Manager, Taxonomy Development, ProQuest
Now that you have built your taxonomies, you need to manage and maintain them in a centralized environment that can be leveraged by all of your enterprise applications including search tools, portals, and CMS/DMS systems. This session will review some best practices in centralized taxonomy management and go through the implementation of a thesaurus management tool at ProQuest, which enabled them to create a common language to connect disparate information assets using large and varied vocabularies and authority files linked to new and existing editorial systems.
- Portal navigation and browsable website menus
- Conceptual access to large databases
- Records management and cataloging
- e-Commerce online product catalogues
- Inventory control and de-duplication
- Auto-classification of internal documents and email
- Multilingual search and browse
- Metasearch of enterprise-wide resources
As a centralized repository for multi-lingual semantic management that is:
- Independent from systems like web-portal search and categorization systems
- Scalable; capable of evolving with emerging corporate semantic standards
Metadata can transcend information islands and data silos but only if the enterprise is committed to common standards
A centralized system that supports both collaboration and compartmentalization allows common metadata to be shared while also allowing user communities the independence to manage specialized metadata files
Enterprises are increasingly making use of multiple proprietary and open source software tools for categorization, search and portal tasks
While many of these tools support some level of metadata management, the diversity of standards, data formats and business rules they support can actually exacerbate the data silo problem by creating metadata silos
A snapshot or dashboard is often more desirable than a list of document titles or snippets, especially when looking for information on a customer, supplier or competitor
Also, information will most likely reside in a number of internal repositories, each with its own level of metadata structure
Taxonomy allows the combination of news, internal CI reports, price plans, coverage data, market share data, share price etc. in one consolidated view by providing mappings or cross-walks
This is essentially applying business intelligence discipline to the world of unstructured information
The previous three scenarios assume the user knows what they are looking for
But what about serendipitous discovery?
By being able to see across an aggregation of content and extract facts and relationships from deep within the information stores, true (and sometimes fortunate) discovery can take place
Centralized Taxonomy Management for Enterprise Information Systems
Topics of Discussion
Description of ProQuest Controlled Vocabulary & Authority Files
Taxonomy Management -- Overview
Managing Terms Manually
Synaptica Thesaurus Management System
Access to over 125 billion digital pages of content from magazine, trade, & scholarly publications, current & historical newspapers, original materials such as annual reports & civil war pamphlets, and daily wire feeds
Subscription-based ProQuest® online information service available in academic and public libraries
ProQuest Controlled Vocabulary used to index subjects; Authority Files used to index company, geographic, personal, product names
CV applied to non-periodical & third-party content via mapping, to allow cross-searching of multiple DBs with one vocabulary
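The mapping idea can be sketched in a few lines of Python. This is only an illustration, not ProQuest's implementation: the `CROSSWALK` table, the sample third-party labels, and the `map_terms` helper are all hypothetical. It shows third-party subject labels being translated into preferred terms so that several databases can be searched with one vocabulary, with unmapped labels set aside for editorial review.

```python
# Hypothetical crosswalk: third-party subject labels -> preferred terms
# in the controlled vocabulary. Entries are illustrative only.
CROSSWALK = {
    "copd": "Chronic obstructive pulmonary disease",
    "chronic obstructive lung disease": "Chronic obstructive pulmonary disease",
    "lung": "Lungs",
}


def map_terms(source_terms, crosswalk=CROSSWALK):
    """Return (preferred terms found, source labels with no mapping)."""
    mapped, unmapped = [], []
    for label in source_terms:
        preferred = crosswalk.get(label.strip().lower())
        if preferred is None:
            unmapped.append(label)      # candidate for editorial review
        elif preferred not in mapped:
            mapped.append(preferred)    # avoid duplicate index terms
    return mapped, unmapped


mapped, unmapped = map_terms(["COPD", "Lung", "Spirometry"])
print(mapped)
print(unmapped)
```

Keeping the unmapped labels rather than discarding them gives the taxonomy manager a list of candidate terms to add or map.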
Created in 1970s for ABI/INFORM business database
Based on Library of Congress Subject Headings
Natural language, hierarchical vocabulary complying with ANSI/NISO Standard Z39.19 (Guidelines for the Construction, Format, and Management of Monolingual Controlled Vocabularies)
Corporate/Organization Names: 438,098 (added in 2008: 5,489)
Personal Names: 416,239 (added in 2008: 1,526)
Geographic (Location) Names: 34,331 (added in 2008: 144)
Product Names: 38,210 (added in 2008: 54)
The Taxonomy Manager’s Job
Add subject terms as dictated by new concepts and new content to index
Maintain hierarchies & Scope Notes
Load updated Thesaurus to ProQuest interface
Manage authority files to maintain standards & control file size
The Taxonomy Manager’s Job
OBJECTIVE: To ensure that indexers and searchers alike have access to a complete and accurate Thesaurus that they can use to maximize the discoverability of documents in ProQuest
Sample Subject Term
Chronic obstructive pulmonary disease
SN: Any lung disease, such as chronic bronchitis or emphysema, causing obstruction of bronchial airflow
UF COPD
BT Disease
BT Respiratory diseases
NT Asthma
NT Bronchitis
NT Emphysema
RT Airway management
RT Lungs
Preferred, or main term
Scope note defining term and how it is used
Non-preferred term: points to term used to index
Terms broader in nature to main term: COPD is a disease, and specifically, a respiratory disease
Terms narrower in nature to main term: these are chronic lung diseases
Terms related to main term that might be used to narrow the search
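As a rough illustration of how such a term record might be held in software, the sketch below models the sample term in Python. The `Term` class, its field names, and the `build_index` helper are assumptions made for this example and do not describe Synaptica's or ProQuest's data model. The lookup shows the role of a non-preferred term: a search on "COPD" resolves to the preferred term used for indexing.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    """One thesaurus record (hypothetical structure for illustration)."""
    preferred: str
    scope_note: str = ""                           # SN
    used_for: list = field(default_factory=list)   # UF (non-preferred terms)
    broader: list = field(default_factory=list)    # BT
    narrower: list = field(default_factory=list)   # NT
    related: list = field(default_factory=list)    # RT


copd = Term(
    preferred="Chronic obstructive pulmonary disease",
    scope_note=("Any lung disease, such as chronic bronchitis or emphysema, "
                "causing obstruction of bronchial airflow"),
    used_for=["COPD"],
    broader=["Disease", "Respiratory diseases"],
    narrower=["Asthma", "Bronchitis", "Emphysema"],
    related=["Airway management", "Lungs"],
)


def build_index(terms):
    """Map every preferred and non-preferred label to its term record."""
    index = {}
    for term in terms:
        index[term.preferred.lower()] = term
        for npt in term.used_for:
            index[npt.lower()] = term
    return index


index = build_index([copd])
resolved = index["copd"].preferred
print(resolved)
```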
New scientific content requiring a huge enhancement to vocabulary
Seven MS Word vocabulary documents, in English and foreign languages (French, German, Spanish), printed for internal use only
Six authority files & 3 vocabulary files in Oracle databases, requiring duplicate entry of subject terms in Word and Oracle
Legacy editorial system in process of being replaced
Synaptica version 7.0 is being implemented now:
- Enhanced user interface
- Semantic Web standardization (RDF, OWL, SKOS) and Web Services integration
- Expanded Reporting functionality
- Enhanced adding and editing of term relationships including “rapid-fire” simple drag-and-drop editing
- Improved global term editing
- Online help and user guides
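To give a sense of what SKOS standardization means for a thesaurus record, the following Python sketch writes one term as SKOS in Turtle syntax. It is not Synaptica's export format: the `BASE` namespace, the record layout, and the helper functions are hypothetical, and the mapping of UF to `skos:altLabel`, SN to `skos:scopeNote`, and BT/NT/RT to `skos:broader`/`skos:narrower`/`skos:related` is the conventional one, assumed here.

```python
import re

BASE = "http://example.org/thesaurus/"  # hypothetical namespace


def slug(label):
    """Turn a term label into a URI-safe identifier."""
    return re.sub(r"[^a-z0-9]+", "-", label.lower()).strip("-")


def literal(text):
    """Quote a label as an English-language Turtle literal."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@en'


def to_skos_turtle(term):
    """Serialize one term record (a plain dict) as a skos:Concept."""
    pref = term["preferred"]
    pairs = [("skos:prefLabel", literal(pref))]
    pairs += [("skos:altLabel", literal(x)) for x in term.get("used_for", [])]
    if term.get("scope_note"):
        pairs.append(("skos:scopeNote", literal(term["scope_note"])))
    for key, prop in (("broader", "skos:broader"),
                      ("narrower", "skos:narrower"),
                      ("related", "skos:related")):
        pairs += [(prop, f"<{BASE}{slug(x)}>") for x in term.get(key, [])]
    body = " ;\n".join(f"    {p} {o}" for p, o in pairs)
    return ("@prefix skos: <http://www.w3.org/2004/02/skos/core#> .\n\n"
            f"<{BASE}{slug(pref)}> a skos:Concept ;\n{body} .\n")


turtle = to_skos_turtle({
    "preferred": "Chronic obstructive pulmonary disease",
    "used_for": ["COPD"],
    "broader": ["Respiratory diseases"],
    "narrower": ["Emphysema"],
    "related": ["Lungs"],
})
print(turtle)
```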
Benefits of Synaptica
Greater awareness of thesaurus standards and terminology, e.g.: “preferred” and “non-preferred” instead of Use and Used For
Long-needed updating and improvement in term hierarchies; ability to provide thesaurus statistics
Increase in Company name NPTs — from 1935 to 8952 today
Immediate responsiveness to indexer needs — real-time term additions, esp. NPTs and SNs
Easier loading of updated Thesaurus on PQ interface