Taxonomy Strategies LLC. Sept. 10, 2008. Copyright 2008 Taxonomy Strategies LLC. All rights reserved. Data Governance Maturity: When the business depends on clear description of fuzzy objects. Presented to San Francisco DAMA, Sept. 10, 2008. Ron Daniel, Jr.
Taxonomy Strategies LLC
Sept. 10, 2008. Copyright 2008 Taxonomy Strategies LLC. All rights reserved.
Data Governance Maturity:
When the business depends on clear description of fuzzy objects
Presented to San Francisco DAMA
Sept. 10, 2008
Ron Daniel, Jr.
2Taxonomy Strategies LLC The business of organized information
Bio: Ron Daniel, Jr.
Over 15 years in the business of metadata & automatic classification
Principal, Taxonomy Strategies
Standards Architect, Interwoven
Senior Information Scientist, Metacode Technologies (acquired by Interwoven, November 2000)
Technical Staff Member, Los Alamos National Laboratory
Metadata and taxonomies community leadership.
Chair, PRISM (Publishers Requirements for Industry Standard Metadata) working group
Acting chair, XML Linking working group
Member, RDF working groups
Co-editor, PRISM, XPointer, 3 IETF RFCs, and Dublin Core 1 & 2 reports.
Recent & current projects: http://www.taxonomystrategies.com/html/clients.htm
Government Commercial
Not-for-Profit
Goals for this talk
Provide you with background on maturity models.
Provide the results of our surveys of Search, Metadata, & Taxonomy practices and discuss interesting findings.
Review the practices in use at stock photo houses, and compare them to methods that may be used in typical information management projects.
Give you the tools to do a simple self-assessment of your organization’s metadata maturity
Agenda
9:15 Metadata Definitions
9:30 Maturity Models
9:45 Metadata Maturity Model (ca. 2006)
10:15 Break
10:30 Stock Photo Business
10:40 Data Governance Practices in Stock Photo Agencies
11:40 Summary
11:45 Questions
12:00 Adjourn
Metadata Definitions
Taxonomy and metadata definitions
Metadata: “Data about data”.
Different communities have very different assumptions about the types of data being described.
I’m from the Information Science community, not the database, statistics, or massive storage communities.
Taxonomy:
1. The classification of organisms in an ordered system that indicates natural relationships.
2. The science, laws, or principles of classification; systematics.
3. Division into ordered groups, categories, or hierarchies.
Examples of taxonomy used to populate metadata fields
Metadata
Title
Author
Department
Audience
Topic
Topics
Employee Services
Compensation
Retirement
Insurance
Further Education
Finance and Budget
Products and Services
Support Services
Infrastructure
Supplies
Metadata Values (Facets within the overall Taxonomy)
Audience
  Internal
    Executives
    Managers
  External
    Suppliers
    Customers
    Partners
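To make the relationship concrete: each metadata field (Audience, Topic, and so on) takes its values from one facet of the taxonomy. The Python sketch below is illustrative only. The two-level reading of the Audience facet, the sample record, and the function names are assumptions made for the example, not part of the original slides.

```python
# Audience facet from the slide, read as a two-level hierarchy (an assumption).
AUDIENCE = {
    "Internal": ["Executives", "Managers"],
    "External": ["Suppliers", "Customers", "Partners"],
}

def allowed_values(facet):
    """Return every term in a two-level facet, parents and children alike."""
    values = set(facet)
    for children in facet.values():
        values.update(children)
    return values

def invalid_values(record, field, facet):
    """Return the values of record[field] that are not terms in the facet."""
    allowed = allowed_values(facet)
    return [v for v in record.get(field, []) if v not in allowed]

# Hypothetical document record; "Vendors" is deliberately not in the facet.
record = {"Title": "Retirement plan overview", "Audience": ["Managers", "Vendors"]}
print(invalid_values(record, "Audience", AUDIENCE))  # ['Vendors']
```

A check like this is the simplest form of controlled-vocabulary enforcement: values outside the facet are reported rather than silently stored.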
Example faceted taxonomy
ABC Computers.com
Audience: All; Business; ABC Employee; Education; Gaming Enthusiast; Home; Investor; Job Seeker; Media; Partner; Shopper (First Time, Experienced, Advanced); Supplier
Line of Business: All; Home & Home Office; Gaming; Government, Education & Healthcare; Medium & Large Business; Small Business
All; Asia-Pacific; Canada; ABC EMEA; Japan; Latin America & Caribbean
Ontology: Resembles a faceted taxonomy, but uses richer semantic relationships among terms and attributes, and strict specification rules. A model of reality, allowing inferences to be made.
Pop Quiz
On a blank piece of paper:
• What question(s) did you want to have answered by coming to today’s talks?
Flag one question to be discussed later.
You do NOT have to provide your name.
Please DO provide your job title, division, and either company name or company type.
What do other people ask about?
How to build a taxonomy? (development)
Definitions of terms. (definitions)
How to govern its use and maintenance? (governance)
What’s the ROI? (ROI)
What are they for? (basic taxo purpose)
How do we put them to use? (usage)
How do we link them to content? (tagging)
How do they help search? (search)
How do I sell management on a taxonomy project? (selling)
How do we maintain them? (maint)
and many more…
Motivation behind the Metadata Maturity Model
Organizational benchmarking
A common goal of organizations is to ‘benchmark’ themselves against other organizations.
Different organizations have:
Different levels of sophistication in their planning, execution, and follow-up for CMS, Search, Portal, Metadata, and Taxonomy projects.
Different reasons for pursuing Search, Metadata, and Taxonomy efforts.
Different cultures.
Benchmarks should be against similar organizations.
Is unnecessary capability harmful?
Tool vendors continue to provide ever-more capable tools with ever-more sophisticated features.
But we live in a world where a significant fraction of public, commercial web pages don’t have a <title> tag.
Organizations that can’t manage <title> tags stand a very poor chance of putting an entity extractor to use, which requires some ongoing management of the lists of entities to be extracted.
Organizations that can’t create and maintain clean metadata can’t put a faceted search UI to good use.
Unused capability is poor value-for-money.
Organizations over-spend on tools and under-spend on staff & processes.
Towards better benchmarking…
We wanted a method to:
Generally identify good and bad practices.
Help clients identify the things they can do, and the things that stand an excellent chance of failing.
Predict likely sources of problems in engagements.
We have started to develop a Metadata Maturity Model, inspired by Maturity Models from the software industry.
To keep the model tied to reality, we are conducting surveys to determine the actual state of practice around search, metadata, taxonomy, and supporting business functions such as staffing and project management.
A Tale of Two Software Maturity Models
CMMI (Capability Maturity Model Integration)
vs.
The Joel Test
CMMI structure
Source: http://chrguibert.free.fr/cmmi
Maturity Models are collections of Practices.
Main differences in Maturity Models concern:
• Descriptivist or Prescriptivist Purpose
• Degree of Categorization of Practices
• Number of Practices (~400 in CMMI)
22 Process Areas, keyed to 5 Maturity Levels…
Process Areas contain Specific and Generic Practices, organized by Goals and Features, and arranged into Levels.
Process Areas cover a broad range of practices beyond simple software development.
CMMI Axioms:
Individual processes at higher levels are AT RISK from supporting processes at lower levels.
A Maturity Level is not achieved until ALL the Practices in that level are in operation.
CMMI Positives
Independent audits of an organization’s level of maturity are a common service.
Level 3 certification is frequently required in bids.
“…compared with an average Level 2 program, Level 3 programs have 3.6 times fewer latent defects, Level 4 programs have 14.5 times fewer latent defects, and Level 5 programs have 16.8 times fewer latent defects”.
Michael Diaz and Jeff King, “How CMM Impacts Quality, Productivity, Rework, and the Bottom Line”
‘If you find yourself involved in product liability litigation you're going to hear terms like "prevailing standard of care" and "what a reasonable member of your profession would have done". Considering the fact that well over a thousand companies world-wide have achieved level 3 or above, and the body of knowledge about the CMM is readily available, you might have some explaining to do if you claim ignorance’.
Linda Zarate, in a review of A Guide to the CMM: Understanding the Capability Maturity Model for Software by Kenneth M. Dymond
CMMI Negatives
Complexity and Expense:
Reading and understanding the materials.
Putting it into action: identifying processes, mapping processes to model, gathering required data, …
Audits are expensive.
CMMI does not scale down well to small shops.
Has been accused of restraint of trade.
At the other extreme, The Joel Test
Developed by Joel Spolsky as a reaction to CMMI complexity.
Positives: Quick, easy, and inexpensive to use.
Negatives: Doesn’t scale up well:
Not a good way to assure the quality of nuclear reactor software.
Not suitable for scaring away liability lawyers.
Not a longer-term improvement plan.
The Joel Test
1. Do you use source control?
2. Can you make a build in one step?
3. Do you make daily builds?
4. Do you have a bug database?
5. Do you fix bugs before writing new code?
6. Do you have an up-to-date schedule?
7. Do you have a spec?
8. Do programmers have quiet working conditions?
9. Do you use the best tools money can buy?
10. Do you have testers?
11. Do new candidates write code during their interview?
12. Do you do hallway usability testing?
Scoring: 1 point for each ‘yes’. Scores of 10 or lower indicate serious trouble.
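The scoring rule is simple enough to sketch in a few lines of Python. This is only an illustration of "1 point for each yes"; the function name and the sample answers are hypothetical, and the sketch reports the score and the missed practices without passing any verdict.

```python
JOEL_TEST = [
    "Do you use source control?",
    "Can you make a build in one step?",
    "Do you make daily builds?",
    "Do you have a bug database?",
    "Do you fix bugs before writing new code?",
    "Do you have an up-to-date schedule?",
    "Do you have a spec?",
    "Do programmers have quiet working conditions?",
    "Do you use the best tools money can buy?",
    "Do you have testers?",
    "Do new candidates write code during their interview?",
    "Do you do hallway usability testing?",
]

def joel_score(answers):
    """answers: 12 booleans in question order. Returns (score, missed questions)."""
    if len(answers) != len(JOEL_TEST):
        raise ValueError("expected one answer per question")
    score = sum(bool(a) for a in answers)
    missed = [q for q, a in zip(JOEL_TEST, answers) if not a]
    return score, missed

# Hypothetical team: everything in place except daily builds and testers.
answers = [True] * 12
answers[2] = False
answers[9] = False
score, missed = joel_score(answers)
print(score)   # 10
print(missed)  # the two questions answered 'no'
```

The list of missed questions is arguably more useful than the number, since it doubles as a to-do list.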
What does software development “Maturity” really mean?
A low score on a maturity audit DOES NOT mean that an organization can’t develop good software.
It DOES mean that whether the organization will do a good job depends on the specific mix of people assigned to the project.
In other words, it sets a floor for how badly an organization is likely to do, not a ceiling on how well it can do.
Probability of failure is a good thing to know before spending a lot of time and money.
Towards a Metadata Maturity Model
Caveats:
Maturity is not a goal, it is a characterization of an organization’s methods for achieving its core goals.
Mature processes impose expenses which must be justified by consequent cost savings, revenue gains, or service improvements.
Nevertheless, Maturity Models are useful as collections of best practices and stages in which to try to adopt them.
Basis for initial maturity model
CEN study on commercial adoption of Dublin Core
Small-scale phone survey:
Organizations which have world-class search and metadata externally.
Not necessarily the most mature overall processes or the best internal search and metadata.
Literature review
Client experiences
Structure from software maturity models
37 Practices, Categorized by Area, Level, and Importance
Bakeoff Datasets; Budget for Bakeoffs; Unneeded Capabils.; Tools, then Reqs.
Staff training and hiring: Search Analyst Role; Librarian Expertise; Pre-hire Testing; SME Catalogers
Data creation and QA: CM Introduced; ROT-Elimination; Hybrid Creation Model; Adaptive Qualification; Quality Measures
Project management: Project Plan; Std. Proj. Methodol.; X-Functional Teams; Communication Plan; Multi-Year Plan; Early Termination
Executive support and ROI: External Search ROI; Intranet ROI Model; CEO knows Search ROI; Use it or Lose It Budgets
Shortcomings of the initial model
No idea of how it corresponds to actual practice across multiple organizations.
Some indications that it over-emphasized the sophisticated practices and under-emphasized beginning practices.
The initial metadata maturity model can be regarded as a hypothesis about how an organization progresses through various practices as it matures.
How to test it? Let’s ask!
Two surveys to date.
Surveys are being run in stages because of the large number of practices.
Ask about future, current, and former practices to gather information on progression.
Survey 1: Search, Metadata, & Taxonomy Practices
The data in this section comes from a survey conducted in the autumn of 2005.
Participants by Organization Size
Participants by Job Role
Participants by Industry
Search Practices
Practice | Not current practice | Being developed | In practice | Former practice | NA or Unknown
Search Box in standard place on all web pages. | 20% (12) | 11% (7) | 62% (38) | 2% (1) | 5% (3)
Search engine indexes multiple repositories in addition to web sites. | 25% (15) | 21% (13) | 44% (27) | 2% (1) | 8% (5)
Search results grouped by date, location, or other factors in addition to simple relevance score. | 37% (22) | 20% (12) | 37% (22) | 0% (0) | 7% (4)
Queries are logged and the logs are regularly examined. | 31% (19) | 25% (15) | 31% (19) | 5% (3) | 8% (5)
Common queries identified, 'best' pages for those queries are found, and search engine configured to return them at the top. | 46% (28) | 25% (15) | 21% (13) | 0% (0) | 8% (5)
Advanced computation of relevance based on data in addition to the text of the document. | 43% (26) | 16% (10) | 25% (15) | 0% (0) | 16% (10)
A faceted search tool, such as Endeca, has been implemented for the organization's external site or product catalog search. | 68% (41) | 7% (4) | 10% (6) | 0% (0) | 15% (9)
A faceted search tool, such as Endeca, has been implemented for the organization's internal website(s) or portal. | 57% (34) | 15% (9) | 17% (10) | 0% (0) | 12% (7)
Metadata Practices
Practice | Not current practice | Being developed | In practice | Former practice | NA or Unknown
Metadata standards are developed for the needs of each system with no overall attempt to unify them. | 22% (13) | 12% (7) | 37% (22) | 20% (12) | 10% (6)
An Organization-wide metadata standard exists and new systems consider it during development. | 37% (22) | 37% (22) | 20% (12) | 0% (0) | 7% (4)
The Organization-wide metadata standard is based on the Dublin Core. | 52% (30) | 16% (9) | 21% (12) | 0% (0) | 12% (7)
A Cataloging Policy document exists to teach people how to tag data in compliance with organizational metadata standard. | 48% (29) | 20% (12) | 20% (12) | 0% (0) | 12% (7)
The Cataloging Policy document is revised periodically. | 48% (29) | 15% (9) | 17% (10) | 0% (0) | 20% (12)
A centralized metadata repository exists to aggregate and unify metadata from disparate sources. | 57% (34) | 17% (10) | 17% (10) | 0% (0) | 10% (6)
Metadata is manually entered into web forms. | 15% (9) | 12% (7) | 61% (36) | 3% (2) | 8% (5)
Metadata is generated automatically by software. | 38% (23) | 18% (11) | 27% (16) | 2% (1) | 15% (9)
Metadata is generated automatically, then reviewed manually for correction. | 48% (29) | 18% (11) | 17% (10) | 2% (1) | 15% (9)
These two questions were the only ones with much correlation to organization size.
Taxonomy Practices
Practice | Not current practice | Being developed | In practice | Former practice | NA or Unknown
'Org Chart' Taxonomy - One based primarily on the structure of the organization. | 36% (21) | 10% (6) | 34% (20) | 5% (3) | 15% (9)
'Products' Taxonomy - One based primarily on the products and/or services offered by the organization. | 37% (22) | 10% (6) | 32% (19) | 5% (3) | 15% (9)
'Content Types' Taxonomy - One based primarily on the different types of documents. | 28% (16) | 21% (12) | 40% (23) | 5% (3) | 7% (4)
'Topical' Taxonomy - One based primarily on topics of interest to the site users. | 20% (12) | 36% (21) | 34% (20) | 3% (2) | 7% (4)
'Faceted' Taxonomy - One which uses several of the approaches above. | 32% (19) | 29% (17) | 34% (20) | 0% (0) | 5% (3)
The Taxonomy, or a portion of it, was licensed from an outside taxonomy vendor. | 75% (44) | 3% (2) | 14% (8) | 0% (0) | 8% (5)
The Taxonomy follows a written 'style guide' to ensure its consistency over time. | 47% (28) | 22% (13) | 20% (12) | 0% (0) | 10% (6)
The Taxonomy is maintained using a taxonomy editing tool other than MS Excel. | 35% (21) | 17% (10) | 40% (24) | 2% (1) | 7% (4)
The Taxonomy was validated on a representative sample of content during its development. | 28% (17) | 22% (13) | 33% (20) | 3% (2) | 13% (8)
A Roadmap for the future evolution of the Taxonomy has been developed. | 38% (23) | 40% (24) | 13% (8) | 0% (0) | 8% (5)
Survey 2: Business Drivers, Processes, and Staffing
The data in this section comes from a survey conducted in the spring of 2006.
Participants by Job Role
Participants by Tenure
Participants by Industry
Participants by Organization Size
Business Drivers: Search, Metadata, and Taxonomy (SMT) Applications
Business Drivers: Desired Benefits
Other desired benefits:
1. Innovation
2. Core to our business product
3. Clients do all the above [From a consultant]
4. Better navigation to diverse State web sites
5. Increased knowledge sharing across the corporation
6. Interoperability
7. Dynamic web applications
8. Improved user search experience
9. Improve R&D
10. Higher value to members [From a non-profit membership org.]
11. For organization to have better understanding of their content
ROI: Cost Estimation
Processes
Use of search logs is improving
Surprisingly sophisticated
Basic data quality and communications need improvement
Many solo operators
Team Structures & Staffing
Salary Survey
Experience 0.6 Nice to see it really counts.
Geography 0.5 California and the Northeast have highest salaries.
Co. Size 0.5 Not very reliable, big changes from one datapoint
Education 0.4 Many taxonomists have MLS or above.
Industry 0.4 Surprisingly, retail has high salaries for taxonomists.
Role 0.04 Taxonomists paid about like Information Architects
Time at current job -0.07
Notes from Participants
There is the constant struggle with individual [magazine] titles to hire trained librarians or data specialists instead of trying to save money by hiring an editor who can build articles AND create and assign metadata. This is a governance issue we have been struggling with since we have no monetary stake in the individual publications. We make recommendations, but have no higher level authority to require titles to hire trained staff for metadata.
Reporting metrics have become a new area of confusion as we move to portalized pages consisting of objects in portlets, each with their own metadata.
Key organizational issue is that the "problems" that stem from lack of systematic metadata/taxonomy creation are not "owned" by anyone, and consequently have no budget for their solution.
Interim Conclusions
Observations (1)
Practices which a single person or a small group can carry out are more commonly used.
Not surprising.
Very different from ERP/BPR; indicates that information management is not being sold to the “C-level” staff.
People need to question how inclusive their “Organizational Metadata Standards” and “Taxonomy Roadmaps” actually are.
We have found Taxonomy Roadmaps to be an advanced practice, due to a dependence on knowing the upcoming IT development schedule.
Observations (2)
Many of the basics are being skipped.
More organizations doing “Spell Checking” than “Query Log Analysis”.
69% have a taxonomy change plan, but only 41% have a plan for revisiting data if the taxonomy changes.
64% have a communications plan, but only 56% have a website.
This seems to be linked to the previous observation: things that are easy for an individual get done before things that need an organizational effort, despite their level of ‘sophistication’.
Interim Metadata Maturity Model (ca. May, 2006)
Columns: Practice Area, Basic, Intermediate, Advanced, Limiting
Data creation and QA: CM Introduced; ROT-Elimination; Semi-auto tagging; Quality Measures
Project management: Project Plan; X-Functional Teams; Std. Proj. Methodol.; Multi-Year Plan; Communication Plan; SMT Business Manager, instead of IT Manager; Early Termination
Executive support and ROI: External Search ROI; SMT in separate silos; Intranet ROI Model; CEO knows Search ROI; Use it or Lose It Budgets
Search and Metadata Maturity Quick Quiz
Basic
1) Is there a process in place to examine query logs?
2) Is there a process for adding directories and content to the repository, or do people just do what they want?
3) Is there an organization-wide metadata standard, such as an extension of the Dublin Core, for use by search tools, multiple repositories, etc.?
Intermediate
4) Does the search engine index more than 4 repositories around the organization?
5) Does the search engine integrate with the taxonomy to improve searches and organize results?
6) Are there hiring and training practices especially for metadata and taxonomy positions?
7) Is there an ongoing data cleansing procedure to look for ROT (Redundant, Obsolete, Trivial content)?
8) Are tools only acquired after requirements have been analyzed, or are major purchases sometimes made to use up year-end money?
Advanced
9) Are there established qualitative and quantitative measures of metadata quality?
10) Can the CEO explain the ROI for search and metadata?
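The slide does not say how to turn quiz answers into a level, so the Python sketch below borrows the CMMI axiom quoted earlier as an assumption: a level counts as reached only when every question at that level, and at all lower levels, is answered favorably. For the two "or" questions (2 and 8), "favorable" means the managed practice is in place. The names `QUIZ_LEVELS`, `maturity_level`, and `gaps` are hypothetical.

```python
# Question numbers grouped by level, as on the slide.
QUIZ_LEVELS = {
    "Basic": [1, 2, 3],
    "Intermediate": [4, 5, 6, 7, 8],
    "Advanced": [9, 10],
}

def maturity_level(favorable):
    """Return the highest level whose questions, and all lower-level
    questions, appear in the set of favorably answered question numbers."""
    reached = "None"
    for level, questions in QUIZ_LEVELS.items():  # dicts keep insertion order
        if all(q in favorable for q in questions):
            reached = level
        else:
            break
    return reached

def gaps(favorable):
    """Return the question numbers not yet answered favorably, in order."""
    return [q for qs in QUIZ_LEVELS.values() for q in qs if q not in favorable]

# Hypothetical organization: strong on basics, patchy above that.
answers = {1, 2, 3, 4, 6, 9}
print(maturity_level(answers))  # Basic
print(gaps(answers))            # [5, 7, 8, 10]
```

Note that question 9 being satisfied does not raise the level here, which mirrors the point that advanced practices are at risk when the ones beneath them are missing.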
Stock Photo Business
Advertising, Editorial Content, Corporate Communications, and many other types of content rely on images to convey information and moods.
When time and/or budget does not allow a commissioned shoot, stock photo houses can supply images.
Fundamental problem for users: How to search for an image that conveys what you want?
Fundamental problem for houses: How to describe images so that users can find them?
How would you search for this image?
Tagging by emotions
“silence”
Conceptual refinement
Objective criteria
Conceptual refinement
Image Rights Criteria
Clarification: Finger on Lips
Scrolling through results…
This is more of the mood I’m looking for…
More like this
Facets at gettyimages.com
Key Questions
Getty Images (and Corbis) have put a lot of effort into their websites for image purchase*.
Internal staff at such organizations tell me that their intranets are nowhere near as easy to use.
ROI is the reason why.
Recall that retail had high salaries for taxonomists, because the ROI for a better shopping site is so clear.
The front-ends are dependent on data.
How is that data governed?
How does that differ from how their intranets are governed?
*Licensing, not purchasing, to be pedantic.
Pop Quiz
What is the #1 underused source of quantitative information on how to improve your metadata and taxonomy?
Query Logs & Click Trails
Who are the users & what are they looking for?
Only 30-40% of organizations regularly examine their logs.
Sophisticated software available, but don’t wait. 80% of value comes from basic reports
Query log & click trail examination: Query log
UltraSeek Reporting:
Top queries
Queries with no results
Queries with no click-through
Most requested documents
Query trend analysis
Complete server usage summary
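The first three reports in that list need nothing more than counting. The Python sketch below shows one way to produce them. It assumes a hypothetical log format (one dict per search with `query`, `results`, and `clicked` keys); it does not reflect UltraSeek's actual log format or reporting interface.

```python
from collections import Counter

def basic_reports(log, top_n=10):
    """Build three basic reports from a list of search-log entries."""
    # Normalize case and whitespace so "Benefits" and "benefits " count together.
    entries = [(e["query"].strip().lower(), e) for e in log]
    all_queries = Counter(q for q, _ in entries)
    no_results = Counter(q for q, e in entries if e["results"] == 0)
    no_click = Counter(
        q for q, e in entries if e["results"] > 0 and not e["clicked"]
    )
    return {
        "top_queries": all_queries.most_common(top_n),
        "no_results": no_results.most_common(top_n),
        "no_click_through": no_click.most_common(top_n),
    }

# Hypothetical log entries for illustration.
log = [
    {"query": "Benefits", "results": 12, "clicked": True},
    {"query": "benefits ", "results": 12, "clicked": False},
    {"query": "travel policy", "results": 0, "clicked": False},
    {"query": "org chart", "results": 3, "clicked": False},
]
reports = basic_reports(log)
print(reports["top_queries"][0])  # ('benefits', 2)
print(reports["no_results"])      # [('travel policy', 1)]
```

Queries with no results point to missing content or missing synonyms in the taxonomy; queries with results but no click-through point to poor titles, descriptions, or ranking.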
Examining the Stock Photo Agencies in Light of the Metadata Maturity Model
Maturity Model Recap
Practice Area Basic Intermediate Advanced Limiting
• System MD Stds: Both have moved beyond that level.
• Organization MD Standard: Both define core metadata standards with extensions for specific collections.
• Multiple repositories comply w/ MD standard: Collections are tagged to a common core at both vendors, plus extension elements in different collections.
• Reuse ERP taxonomies: N/A
• Taxonomy Maint. Doc:
• Taxonomy Roadmap: Corbis had plan for facets to be added, but not keyed to other systems.
• Highly abstract vocabularies: Getty shows emotion tagging in action with their moodstream offering.
• Metadata maint. doc: TBD
Image Collections
Editorial rules standard
Abbreviations
Ampersands
Capitalization
General…, More…, Other…
Languages & character sets
Length limits
Multiple parents
Plural vs. singular form
Scope notes
Serial comma
Sources of terms
Spaces
Synonyms & acronyms
Term order (Alphabetic or …)
Term label order (Direct vs. inverted)
…
Rule Name | Editorial Rule
Abbreviations | Abbreviations, other than colloquial terms and acronyms, shall not be used in term labels. Example: Public Information. NOT: Public Info.
Ampersands | The ampersand [&] character shall be used instead of the word ‘and’. Example: Licensing & Compliance. NOT: Licensing and Compliance.
Capitalization | Title case capitalization shall be used. Example: Customer Service. NOT: CUSTOMER SERVICE. NOT: Customer service. NOT: customer service.
General…, More…, Other… | The term labels “General…”, “More…”, and “Other…” shall be used for categories which contain content items that are not further classifiable. Example: “Other Property”
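Editorial rules like these can be checked mechanically before a term is added to the taxonomy. The Python sketch below covers the first three rules only, and the heuristics are assumptions: a word ending in a period is treated as an abbreviation, and title case is taken to mean each word is an initial capital followed by lowercase letters. A real checker would need an allowlist for the acronyms and colloquial terms the rules permit. The function name `check_label` is hypothetical.

```python
import re

def check_label(label):
    """Return the names of the editorial rules a term label appears to break."""
    problems = []
    # Abbreviations: heuristic, flags any word that ends in a period.
    if re.search(r"\b\w+\.(\s|$)", label):
        problems.append("Abbreviations")
    # Ampersands: the word 'and' should be written as '&'.
    if re.search(r"\band\b", label, flags=re.IGNORECASE):
        problems.append("Ampersands")
    # Capitalization: each word is one capital followed by lowercase letters.
    # The word 'and' is skipped because the Ampersands rule already reports it.
    words = [w for w in re.findall(r"[A-Za-z]+", label) if w.lower() != "and"]
    if any(not (w[0].isupper() and (len(w) == 1 or w[1:].islower())) for w in words):
        problems.append("Capitalization")
    return problems

for label in ["Public Info.", "Licensing and Compliance", "CUSTOMER SERVICE",
              "Customer Service"]:
    print(label, "->", check_label(label))
```

Running the check on the slide's own examples flags each "NOT" form under the matching rule and passes the approved forms.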
• Librarian or IA expertise: Both seek this in their cataloging and taxonomy hires, but seek additional things as well.
• Search Analyst: Was goal for Getty at time of interview. Interviewee thought that would take Getty from a “7” to an “8” in terms of search sophistication.
• Cross-functional taxonomy creation: Not at time of interviews.
• Cross-Functional taxonomy maint: Not at time of interviews.
• SME Catalogers: Yes, esp. Getty Images. Corbis had an art history emphasis, Getty looked for people with variety of backgrounds, esp. science, and photographers.
• Pre-hire testing: Getty did some of this with interns.
Data creation and QA
Practice Area Basic Intermediate Advanced Limiting
Data creation and QA: CM Introduced; ROT-Elimination; Semi-auto tagging; Quality Measures
• CM Introduced: Both use strong database systems for cataloging.
• ROT-Elimination: Image collections rarely removed unless licensing problems occur. Both have error detection and error correction processes.
• Semi-auto tagging: Both evaluate this technology periodically but neither has found it usable on images.
• Cross-Functional taxonomy maint: Not at time of interviews.
• Quality measures: Both have quality control processes but neither mentioned analytic models.
User interface survey — Results (1)
Which interface would you rather use for these tasks? | Google-like Baseline | Faceted Category
Find images of roses | 15 | 16
Find all works from a certain period | 2 | 30
Find pictures by 2 artists in the same media | 1 | 29
…
Overall assessment | Google-like Baseline | Faceted Category
More useful for your usual tasks | 4 | 28
Easiest to use | 8 | 23
Most flexible | 6 | 24
More likely to result in dead-ends | 28 | 3
Helped you learn more | 1 | 31
Overall preference | 2 | 29
…
Source: Yee, Swearingen, Li, & Hearst
User interface survey — Results (2)
[Chart: Faceted Category vs. Google-like Baseline ratings, on a scale of 0 to 9]
Source: Yee, Swearingen, Li, & Hearst
Document distribution—How evenly does it divide the content?
Documents do not distribute uniformly across categories
Zipf (1/x) distribution is expected behavior
80/20 rule in action (actually 70/20 rule)
[Chart: Measured v Expected Distribution of Top 10 Content Types in Library of Congress Database. X axis: Top 10 Content Types (labels include Congresses, Biography, Periodicals, Maps, Fiction, Exhibitions, Juvenile literature, Bibliography, Statistics). Y axis: Number of Records, 0 to 350,000. Callouts: Leading candidate for splitting; Leading candidates for merging.]
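The comparison of measured against expected counts can be sketched in Python as follows. Under a Zipf (1/x) distribution the category at rank r is expected to hold total / (r × H), where H is the sum of 1/k over all ranks. The tolerance, the split/merge verdicts, and the sample counts are assumptions for illustration; the slide flags candidates visually and does not state a threshold.

```python
def zipf_expected(total, n_categories):
    """Expected count per rank under a 1/rank distribution summing to total."""
    harmonic = sum(1.0 / r for r in range(1, n_categories + 1))
    return [total / (r * harmonic) for r in range(1, n_categories + 1)]

def compare_to_zipf(counts, tolerance=0.5):
    """Rank categories by measured count and compare each to its Zipf expectation.

    tolerance is a hypothetical threshold: categories more than 50% above
    expectation are marked as split candidates, more than 50% below as
    merge candidates.
    """
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    expected = zipf_expected(sum(counts.values()), len(ranked))
    report = []
    for (name, measured), exp in zip(ranked, expected):
        ratio = measured / exp
        if ratio > 1 + tolerance:
            verdict = "split?"
        elif ratio < 1 - tolerance:
            verdict = "merge?"
        else:
            verdict = "ok"
        report.append((name, measured, round(exp), verdict))
    return report

# Hypothetical category counts, not taken from the Library of Congress data.
counts = {"Reports": 600, "Memos": 100, "Forms": 100, "Maps": 20}
for row in compare_to_zipf(counts):
    print(row)
```

With these sample counts the largest category is marked as a split candidate and the smallest as a merge candidate, which matches the reading of the chart callouts.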
Document distribution— How evenly does it divide the content?
Methodology: 115 randomly selected URLs from corporate intranet search index were manually categorized. Inaccessible files and ‘junk’ were removed.
Results: Slightly more uniform than Zipf distribution. Above the curve is better than expected.
[Figure: Measured v Expected Intranet Content Type Distribution. X-axis: Content Type (People, Groups & Places; News & Events; Manuals & Learning Materials; Operations & Internal Communications; Marketing & Sales; Regulations, Policies, Procedures & Templates; Papers & Presentations; Other & Unclassified; Programs, Proposals, Plans & Schedules). Y-axis: # Documents (0 to 25).]
Document distribution— How does taxonomy “shape” match that of content?
Background: Hierarchical taxonomies allow comparison of “fit” between content and taxonomy areas.
Methodology: 25,380 resources tagged with a taxonomy of 179 terms (avg. of 2 terms per resource). Counts of terms and documents summed within taxonomy hierarchy.
Results: Roughly Zipf distributed (top 20 terms: 79%; top 30 terms: 87%). Mismatches between term % and document % flagged.

Term Group                                % Terms   % Docs
Administrators                            7.8       15.8
Community Groups                          2.8       1.8
Counselors                                3.4       1.4
Federal Funds Recipients and Applicants   9.5       34.4
Librarians                                2.8       1.1
News Media                                0.6       3.1
Other                                     7.3       2.0
Parents and Families                      2.8       6.0
Policymakers                              4.5       11.5
Researchers                               2.2       3.6
School Support Staff                      2.2       0.2
Student Financial Aid Providers           1.7       0.7
Students                                  27.4      7.0
Teachers                                  25.1      11.4
Source: Courtesy Keith Stubbs, U.S. Dept. of Ed.
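The mismatch flagging described in the results can be sketched as follows, using the term and document percentages from the table above. The rule used here (flag a group when its document share and term share differ by a factor of 2 or more) is an assumed threshold for illustration; the source does not say what criterion was actually applied, and `flag_mismatches` is a hypothetical helper name.

```python
# (term group, % of taxonomy terms, % of tagged documents), copied from the table.
ROWS = [
    ("Administrators", 7.8, 15.8),
    ("Community Groups", 2.8, 1.8),
    ("Counselors", 3.4, 1.4),
    ("Federal Funds Recipients and Applicants", 9.5, 34.4),
    ("Librarians", 2.8, 1.1),
    ("News Media", 0.6, 3.1),
    ("Other", 7.3, 2.0),
    ("Parents and Families", 2.8, 6.0),
    ("Policymakers", 4.5, 11.5),
    ("Researchers", 2.2, 3.6),
    ("School Support Staff", 2.2, 0.2),
    ("Student Financial Aid Providers", 1.7, 0.7),
    ("Students", 27.4, 7.0),
    ("Teachers", 25.1, 11.4),
]


def flag_mismatches(rows, factor=2.0):
    """Return groups whose document share and term share differ by >= factor.

    `factor` is an assumed threshold, not one given in the source.
    """
    flagged = []
    for group, term_pct, doc_pct in rows:
        ratio = doc_pct / term_pct
        if ratio >= factor:
            flagged.append((group, round(ratio, 2), "more content than terms"))
        elif ratio <= 1.0 / factor:
            flagged.append((group, round(ratio, 2), "more terms than content"))
    return flagged


for group, ratio, note in flag_mismatches(ROWS):
    print(f"{group}: docs/terms = {ratio} ({note})")
```

Under this assumed threshold, groups such as Students and Teachers show far more terms than content, while Federal Funds Recipients and Applicants shows the reverse.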
Project Management
Practice Area Basic Intermediate Advanced Limiting
Project management: Project Plan; X-Functional Teams; Std. Proj. Methodol.; Multi-Year Plan; Communication Plan; SMT Business Manager, instead of IT Manager; Early Termination
• Project Plan: Both companies are in a mode where maintaining the cataloging, terminology, and search tools is a matter of ongoing enhancement. Neither company discussed project management.
• X-Functional Teams: Very little cross-functional involvement was discussed. Some input from sales and cataloging for taxonomy revisions.
• Std. Project Methodology: Not at time of interviews.
• Multi-year plan: Not at time of interviews.
• Communication Plan: Not discussed.
• SMT Business Manager: Not discussed.
• Early Termination: Not discussed.
Key Governance Aspects
Roles and Responsibilities
– Managers
– Reviewers
Policies
– For naming
– Required fields
Procedures
– For reviewing and approving metadata placement
– For acting on poor metadata application
Recommended Measure and Improve Mindset
Measure – Determine the current situation and what is wrong.
• Too many documents in a category? Too many categories? People complaining about not finding material that is on the site? People asking for materials not on the site? Common searches without results?
Decide – Decide how to change things to fix the problem.
• Change navigation list? Add new categories? Add synonyms to search? Create new content?
Confirm – Before rolling out changes, test them to make sure they will actually fix the problem.
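Two of the Measure questions (common searches without results, and too many documents in a category) lend themselves to simple counts. The following is a minimal sketch under stated assumptions: the search log is taken to be a list of (query, result count) pairs, the category tallies a plain dictionary, and the `min_count` and `max_share` thresholds are arbitrary. All names and data here are hypothetical.

```python
from collections import Counter


def zero_result_queries(search_log, min_count=2):
    """Queries that returned no results at least min_count times, most frequent first."""
    misses = Counter(q.strip().lower() for q, hits in search_log if hits == 0)
    return [(q, n) for q, n in misses.most_common() if n >= min_count]


def overloaded_categories(doc_counts, max_share=0.4):
    """Categories holding more than max_share of all documents."""
    total = sum(doc_counts.values())
    return sorted(c for c, n in doc_counts.items() if n / total > max_share)


# Hypothetical data, for illustration only.
log = [
    ("Travel Policy", 0),
    ("travel policy", 0),
    ("expense form", 12),
    ("org chart", 0),
]
counts = {"News": 50, "Policies": 30, "Forms": 20}

print(zero_result_queries(log))
print(overloaded_categories(counts))
```

Running the same counts again after a change is one way to carry out the Confirm step: the repeated zero-result queries or overloaded categories should shrink if the change worked.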
Fun Questions
The animals are divided into: (a) belonging to the emperor, (b) embalmed, (c) tame, (d) sucking pigs, (e) sirens, (f) fabulous, (g) stray dogs, (h) included in the present classification, (i) frenzied, (j) innumerable, (k) drawn with a very fine camelhair brush, (l) et cetera, (m) having just broken the water pitcher, (n) that from a long way off look like flies.
Jorge Luis Borges, "THE ANALYTICAL LANGUAGE OF JOHN WILKINS". Works in 3 volumes (in Russian). St. Petersburg, "Polaris", 1994. V. 2: 87.
This was created to be as bad a classification as possible.
What makes it so bad?