Taxonomy Strategies LLC
May 22, 2006 Copyright 2006 Taxonomy Strategies LLC. All rights reserved.
2006 Enterprise Search Summit
Taxonomy Fundamentals:
What you need to know about taxonomies (but were afraid to ask)
Taxonomy Strategies LLC: The business of organized information
Pop Quiz
On a blank piece of paper:
- What questions did you want to have answered by coming to today’s talks?
- What new questions do you have, based on what you’ve learned from the previous presentations?
- Flag one question to be answered later.
You do NOT have to provide your name. Please DO provide your job title, division, and either company or company type.
What this session will cover
What's involved in creating a taxonomy.
The bottom line benefits of an enterprise taxonomy.
How to calculate the ROI on taxonomy development.
How to convince managers and staff to take taxonomy seriously, in the face of Google.
How to best implement, support, and maintain a taxonomy from beginning to end.
How can taxonomies improve my search system?
What are the fundamental principles that dictate when to use metadata and taxonomy to improve the overall search experience?
Taxonomy issues, problems, and concerns
Enormous volumes of information within organizations
Diversity of assets: content and technology
Complex and IT-oriented standards: .NET, SOAP, WSDL, etc.
Limited (if any) integration with applications:
- Search engines
- Information management applications
- Back office transaction-based systems
- Analytical systems
What's involved in creating a taxonomy?
A taxonomy includes:
- A metadata scheme: data fields for describing content so that it can be found and used
- Vocabularies: collections of terms used to fill in some of the metadata fields
- Relationships between content, fields, or terms (hierarchical, equivalence, and associative)
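The three parts listed above can be pictured as a small data model. This is a minimal sketch under stated assumptions, not a prescribed implementation: the class names, field names, and sample terms are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Term:
    label: str
    broader: Optional[str] = None                      # hierarchical relationship
    synonyms: list[str] = field(default_factory=list)  # equivalence relationship
    related: list[str] = field(default_factory=list)   # associative relationship


@dataclass
class Taxonomy:
    # Metadata scheme: field name -> vocabulary that fills it (None = free text).
    scheme: dict[str, Optional[str]]
    # Vocabularies: vocabulary name -> {term label -> Term}.
    vocabularies: dict[str, dict[str, Term]]

    def is_valid(self, field_name: str, value: str) -> bool:
        """True if `value` is allowed in `field_name` under this taxonomy."""
        vocab_name = self.scheme[field_name]
        if vocab_name is None:
            return True
        return value in self.vocabularies[vocab_name]


# Hypothetical sample data.
topics = {
    "Vehicles": Term("Vehicles"),
    "Cars": Term("Cars", broader="Vehicles", synonyms=["Automobiles"],
                 related=["Car Insurance"]),
}
taxonomy = Taxonomy(
    scheme={"title": None, "topic": "Topics"},
    vocabularies={"Topics": topics},
)
```

Here only the "topic" field is controlled, which mirrors the point that vocabularies fill in some, not all, of the metadata fields.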
What’s a taxonomy?
A taxonomy is not just a folder structure.
- A folder structure is a view of a content collection that can be constructed by using the taxonomy.
A taxonomy is not just website navigation.
- Site navigation is a view of a collection of content that can be constructed using the taxonomy.
How do taxonomies actually improve search?
Input (Query) Side
- “Search” using a small set of pre-defined values instead of trying to guess what word or words might have been used in the content.
- Have synonyms mapped together so searches for “car” and “automobile” return the same things.
Output (Results) Side
- Organize search results into groups of related items.
- Sorting and filtering
- Refining search results
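Both sides can be illustrated with a few lines of code. The following is a rough sketch, assuming content has already been tagged; the synonym map, the sample items, and the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical synonym map: every variant points to one preferred term.
PREFERRED = {"car": "automobile", "auto": "automobile", "automobile": "automobile"}

# Hypothetical tagged content: (title, topic tag, content type tag).
ITEMS = [
    ("Owner's manual", "automobile", "Manual"),
    ("Recall notice", "automobile", "Notice"),
    ("Bike safety tips", "bicycle", "Guide"),
    ("Service manual", "automobile", "Manual"),
]


def search(query):
    """Input side: normalize the query to its preferred term, then match on tags."""
    term = PREFERRED.get(query.lower(), query.lower())
    return [item for item in ITEMS if item[1] == term]


def group_by_type(results):
    """Output side: organize results into groups of related items."""
    groups = defaultdict(list)
    for title, _topic, content_type in results:
        groups[content_type].append(title)
    return dict(groups)
```

With this data, `search("car")` and `search("automobile")` return the same three items, and `group_by_type` splits them into a "Manual" group and a "Notice" group.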
Fundamentals of taxonomy ROI
Tagging content using a taxonomy is a cost, not a benefit.
There is no benefit without exposing the tagged content to users in some way that cuts costs or improves revenues.
Putting taxonomy into operation requires UI changes and/or backend system changes, as well as data changes.
You need to determine those changes, and their costs, as part of the ROI.
Usability research: Taxonomy compared to search results lists
“We found that users preferred a browsing oriented interface for a browsing task, and a direct search interface when they knew precisely what they wanted.”
Marti Hearst (and others)
“The category interface is superior to the list interface in both subjective and objective measures.”
Hao Chen & Susan Dumais
Taxonomy compared to search result lists
[Bar chart: median search time in seconds (axis 0 to 140) for the Category and List interfaces, shown separately for targets in the top 20 results and not in the top 20 results. Callouts: “Category is 36% faster” and “Category is 48% faster”.]
Source: Chen & Dumais
Time saved—Taxonomy compared to search result lists
1 hour per day searching x 36% faster = 22 minutes each day
22 minutes x 250 working days per year = 5500 minutes or 92 hours per year
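The arithmetic on this slide can be reproduced as follows. This is only a sketch of the calculation; the variable names are assumptions, and the rounding follows the slide, which rounds the daily figure to 22 minutes before multiplying (without that rounding the result is 5,400 minutes, or 90 hours).

```python
MINUTES_SEARCHING_PER_DAY = 60   # 1 hour per day, from the slide
SPEEDUP = 0.36                   # category interface is 36% faster
WORKING_DAYS_PER_YEAR = 250

# Round to whole minutes first, as the slide does (21.6 rounds to 22).
minutes_saved_per_day = round(MINUTES_SEARCHING_PER_DAY * SPEEDUP)
minutes_saved_per_year = minutes_saved_per_day * WORKING_DAYS_PER_YEAR
hours_saved_per_year = round(minutes_saved_per_year / 60)
```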
Time saved—Taxonomy compared to search result lists
Benefit: Service efficiency increase
Number of FOIA requests & information calls per month: 50,000
Average cost per call: $6
Total FOIA & call costs per year: $3,600,000
Increase in productivity by browsing information: 36%
Service cost savings per year: $1,296,000
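The same figures can be checked with a short calculation. This is a minimal sketch; the function and variable names are hypothetical, and integer percentages are used so the result stays in whole dollars.

```python
def annual_savings(annual_cost: int, gain_percent: int) -> int:
    """Savings when `gain_percent` of an annual cost is avoided (whole dollars)."""
    return annual_cost * gain_percent // 100


requests_per_month = 50_000
cost_per_request = 6  # dollars per call

annual_request_cost = requests_per_month * 12 * cost_per_request
service_savings = annual_savings(annual_request_cost, 36)
```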
Trusted advisers—Taxonomy avoids costs
“The amount of time wasted in futile searching for vital information is enormous, leading to staggering costs …”
Sue Feldman,
Poor classification costs a 10,000 user organization $10M each year—about $1,000 per employee.
Jakob Nielsen, useit.com
[Pie chart segments: Searching, Creating, Communicating]
Knowledge workers spend up to 2.5 hours each day looking for information …
… But find what they are looking for only 40% of the time.
Source: Kit Sims Taylor
[Pie chart segments: Creating new content, Recreating existing content, Searching, Communicating. Value labels: 25%, 8%.]
Knowledge workers spend more time re-creating existing content than creating new content
Source: Kit Sims Taylor (cited by Sue Feldman in her original article)
Cost saved by not recreating content
Benefit: Increase in productivity
Number of employees: 100
Average employee salary: $50,000
Employee costs per year: $5,000,000
Increase in productivity from not re-creating content: 25%
Employee cost savings per year: $1,250,000
Key Factors in ROI
Breadth: “How many people will metadata affect?”
Repeatability: “How many times a day will they use it?”
Cost/Benefit: “Is this a costly effort with little or no benefits?”
Source: Todd Stephens, Dublin Core Global Corporate Circle
Some common taxonomy ROI scenarios
Customer support
- Cutting FOIA & information costs
- Increased web statistics (page hits)
- Higher ACSI (American Customer Satisfaction Index) score
Knowledge worker productivity
- Less time searching, more time working
- Avoiding re-creating information that already exists
Publication catalog
- Increased self-service & use
- Increased productivity
Compliance
- Improved regulatory compliance
- Improved enforcement
Research & regulatory accountability
- Higher OMB PARS (Performance & Accountability Reports)
How to estimate costs—Tagging
Taxonomy Facet | Hier? | Typical CV Size | Time/Value (min) | Avg # Values/Item | $/Min | Cost/Element
Audience | N | 10 | 0.25 | 2 | $0.42 | $0.21
Content Type | N | 20 | 0.25 | 1 | $0.42 | $0.11
Organizational Unit | Y | 50 | 0.5 | 2 | $0.42 | $0.42
Products & Services | Y | 500 | 1.5 | 4 | $0.42 | $2.52
Geographic Region | Y | 100 | 0.5 | 2 | $0.42 | $0.42
Broad Topics | Y | 400 | 2 | 4 | $0.42 | $3.36
TOTALS | | 1080 | 5 | 15 | | $7.04
Inspired by: Ray Luoma, BAU Solutions
Callouts:
- Consider complexity of facet and ambiguity of content to estimate time per value.
- Estimated cost of tagging one item. This can be reduced with automation, but cannot be eliminated.
- Is this field worth the cost?
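Each Cost/Element figure in the table is the product of time per value, average values per item, and cost per minute. The sketch below recomputes the table under that reading; the dictionary layout and function name are assumptions made for illustration.

```python
COST_PER_MINUTE = 0.42  # dollars per minute, from the table

# Facet -> (minutes per value, average number of values per item), from the table.
FACETS = {
    "Audience": (0.25, 2),
    "Content Type": (0.25, 1),
    "Organizational Unit": (0.5, 2),
    "Products & Services": (1.5, 4),
    "Geographic Region": (0.5, 2),
    "Broad Topics": (2, 4),
}


def cost_per_element(minutes_per_value, values_per_item, rate=COST_PER_MINUTE):
    """Estimated cost of filling in one metadata element for one item."""
    return minutes_per_value * values_per_item * rate


costs = {name: cost_per_element(*spec) for name, spec in FACETS.items()}
cost_per_item = sum(costs.values())  # about $7.04 once rounded to cents
```

Sorting `costs` by value makes the "Is this field worth the cost?" question concrete: Broad Topics and Products & Services account for most of the per-item cost.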
Sample ROI Calculations
Description | Year 1 | Year 2 | Year 3 | Year 4 | Year 5
Costs
Software Licenses/Maintenance | $100,000 | $15,000 | $15,000 | $15,000 | $15,000
Implementation/Support | $200,000 | $30,000 | $30,000 | $30,000 | $30,000
Taxonomy Creation/Maintenance | $100,000 | $15,000 | $15,000 | $15,000 | $15,000
Legacy/Ongoing Tagging | $703,500 | $105,525 | $105,525 | $105,525 | $105,525
Benefits
Productivity increases | $- | $125,000 | $1,250,000 | $1,250,000 | $1,250,000
Service efficiency gains | $- | $129,600 | $1,296,000 | $1,296,000 | $1,296,000
Yearly Net Benefits | $(1,103,500) | $89,075 | $2,380,475 | $2,380,475 | $2,380,475
Payback period: 1.4 years until Benefits = Costs
Inspired by: Todd Stephens, Dublin Core Global Corporate Circle
Callout: Ongoing cost of tagging due to 15% content growth.
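The Yearly Net Benefits row is total benefits minus total costs for each year. The sketch below recomputes that row and a running cumulative total from the table's figures. The slide does not state how its payback period is derived, so this sketch only reports the first year in which the cumulative net turns positive; the variable names are assumptions.

```python
from itertools import accumulate

# Figures copied from the table, Year 1 through Year 5.
costs = {
    "Software Licenses/Maintenance": [100_000, 15_000, 15_000, 15_000, 15_000],
    "Implementation/Support": [200_000, 30_000, 30_000, 30_000, 30_000],
    "Taxonomy Creation/Maintenance": [100_000, 15_000, 15_000, 15_000, 15_000],
    "Legacy/Ongoing Tagging": [703_500, 105_525, 105_525, 105_525, 105_525],
}
benefits = {
    "Productivity increases": [0, 125_000, 1_250_000, 1_250_000, 1_250_000],
    "Service efficiency gains": [0, 129_600, 1_296_000, 1_296_000, 1_296_000],
}

yearly_net = [
    sum(row[year] for row in benefits.values())
    - sum(row[year] for row in costs.values())
    for year in range(5)
]
cumulative_net = list(accumulate(yearly_net))

# First year (1-based) in which cumulative benefits cover cumulative costs.
first_positive_year = next(
    year + 1 for year, total in enumerate(cumulative_net) if total >= 0
)
```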
ROI summary
Taxonomy Value Propositions
- Find information faster
- Avoid recreating information that already exists
- Improve service
- Improve regulatory compliance
- Improve performance & accountability
Don’t sell “taxonomy”, sell the vision of what you want to be able to do.
Do the calculus (costs and benefits)
- Quantify the tangible & intangible benefits
- Quantify the total cost of ownership including maintenance & tagging
Support your calculations with research
Three problems of taxonomy governance
The Taxonomy Problem: How to build and maintain the lists of pre-defined values that go into some of the metadata elements.
The Tagging Problem: How to populate metadata elements with complete and consistent values. What can be expected from automatic classifiers? What kind of error detection and error correction procedures are needed?
The ROI (Return On Investment) Problem: How to use content, metadata, and vocabularies in applications to obtain business benefits.
Business Goals and Cultural Factors are major influences on tagging and taxonomy. These must be acknowledged at the start to avoid re-work.
Who should build the taxonomy?
The taxonomy (and metadata specification) should be produced by a cross-functional team which includes business, technical, information management, and content creation stakeholders.
The team should plan on maintaining the taxonomy as well as building it. Maintenance will not (usually) be anyone’s full-time job. Exact mix of people on team will change.
It should be built in an iterative fashion, with more content and broader review for each iteration.
Controlled items the Taxonomy team will need to manage
Metadata Standard
Controlled Vocabularies
Editorial Rules
Tagger Training Materials (manual and automatic)
Charter, Goals, Performance Measures
Team Processes
Outreach & ROI
- Website
- Communication plan
- Presentations
- Announcements
Taxonomy Roadmap
- Long range plan for:
  - Development of controlled vocabularies, and
  - Integration with enterprise applications
Controlled item: Editorial rules
Akin to “Chicago Manual of Style”
Issues commonly addressed in the rules:
- Abbreviations
- Ampersands
- Capitalization
- Continuations (More… or Other…)
- Duplicate Terms
- Fidelity to External Source
- Hierarchy and Polyhierarchy
- Languages and Character Sets
- Length Limits
- “Other”: Allowed or Forbidden?
- Plural vs. Singular Forms
- Relation Types and Limits
- Scope Notes
- Serial Comma
- Sources of Terms
- Spaces
- Spelling (British vs. American English)
- Synonyms and Acronyms
- Translations
- Term Order (Alphabetic or …)
- Term Label Order (Direct vs. Inverted)
What to do when rules conflict – how do people decide which rule is more important?
Rule Name | Editorial Rule
Sources of Terms | Other things being equal, reusing an existing vocabulary is preferred to creating a new one.
Ampersands | The character ‘&’ is preferred to the word ‘and’ in Term Labels. Example: Use Type: “Manuals & Forms”, not “Manuals and Forms”.
Special Characters | Retain accented characters in Term Labels. Example: Use “España”, not “Espana”.
Serial comma | If a category name includes more than two items, separate the items by commas. The last item is separated by the character ‘&’, which IS NOT preceded by a comma. Example: “Education, Learning & Employment”, not “Education, Learning, & Employment”.
Capitalization | Use title case (where all words except articles are capitalized). Example: “Education, Learning & Employment”, NOT “Education, learning & employment”, NOT “EDUCATION, LEARNING & EMPLOYMENT”, NOT “education, learning & employment”.
… | …
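Some of these editorial rules are mechanical enough to check automatically. The following is a rough sketch covering only the ampersand, serial comma, and capitalization rules as worded in the table; the function name and rule identifiers are hypothetical, and acronyms and other edge cases are deliberately not handled.

```python
import re

ARTICLES = {"a", "an", "the"}


def check_label(label: str) -> list[str]:
    """Return the names of the editorial rules that a term label violates."""
    violations = []

    # Ampersands: '&' is preferred to the word 'and'.
    if re.search(r"\band\b", label, flags=re.IGNORECASE):
        violations.append("ampersands")

    # Serial comma: the final '&' must not be preceded by a comma.
    if re.search(r",\s*&", label):
        violations.append("serial comma")

    # Capitalization: title case, every word except articles capitalized,
    # and multi-word labels must not be written entirely in upper case.
    words = re.findall(r"[^\W\d_]+", label)
    lowercase_start = any(
        w.lower() not in ARTICLES and not w[0].isupper() for w in words
    )
    if lowercase_start or (label.isupper() and len(words) > 1):
        violations.append("capitalization")

    return violations
```

A label such as “Education, Learning & Employment” passes all three checks, while each of the counterexamples in the table is flagged by at least one rule.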
Controlled item: Training materials
Staff will require training on:
- The UI they use to tag the content
- Rules to follow when deciding what codes to apply
- End-effect of the codes they apply
- Structure of the taxonomy

Indexing rules
Rule | Description
Specificity rule | Apply the most specific terms when tagging assets. Specific terms can always be generalized, but generic terms cannot be specialized.
Repeatable rule | All attributes should be repeatable. Use as many terms as necessary to describe What the asset is about and Why it is important. Storage is cheap. Re-creating content is expensive.
Appropriateness rule | Not all attributes apply to all assets. Only supply values for attributes that make sense.
Usability rule | Anticipate how the asset will be searched for in the future, and how to make it easy to find it. Remember that search engines can only operate on explicit information.

[Screenshot: Indexing UI]
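The specificity rule follows from how a hierarchy works: a specific tag implies all of its broader terms, but a broad tag says nothing about which narrower term applies. A minimal sketch of that asymmetry, using a hypothetical three-term hierarchy and hypothetical function names:

```python
# Hypothetical hierarchy: term -> broader term.
BROADER = {
    "Sedans": "Cars",
    "Cars": "Vehicles",
    "Trucks": "Vehicles",
}


def generalize(term, broader=BROADER):
    """Return the term followed by all of its broader terms, most specific first."""
    chain = [term]
    while chain[-1] in broader:
        chain.append(broader[chain[-1]])
    return chain


def matches(tag, query_term):
    """True if content tagged with `tag` should be found under `query_term`."""
    return query_term in generalize(tag)
```

An asset tagged “Sedans” can be found by a search on “Vehicles”, but an asset tagged only “Vehicles” cannot be found by a search on “Sedans”.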
Controlled item: Communications Plan
Stakeholders: Who are they and what do they need to know?
Channels: Methods available to send messages to stakeholders. Need a mix of narrow vs. broad, formal vs. informal, interactive vs. archival, …
Messages: Communications to be sent at various stages of project. Bulk of the plan is here.

Channel | Description
Demo | Live, or screen capture for download
Presentation | Tailored message for specific audience
Website | Overview info for all, link to files
Memo | Formal notification
… | …

Stakeholders | Info. Needed
Project Sponsors | Progress, Issues, Policies
Dept. Reps | Progress, Priorities,
… | …
Users | Progress, How-Tos
Vendors | RFPs & SOWs

Trigger | Msg. Descrip | From | To | Chan.
Initiation | Project overview | Dept. head | All | Memo
… | … | … | … | …
Controlled item: Team charter
Taxonomy Team is responsible for maintaining:
- The Taxonomy, a multi-faceted classification scheme
- Associated materials, including a website providing:
  - Corporate Metadata Standard
  - Editorial Style Guide
  - Taxonomy Training Materials
  - Team rules and procedures (subject to CIO review)
Team evaluates costs and benefits of suggested changes.
Taxonomy Team will:
- Manage relationship between providers of source vocabularies and consumers of the Taxonomy
- Identify new opportunities for use of the Taxonomy across the Enterprise to improve information management practices
- Promote awareness and use of the Taxonomy
Remaining controlled items
Performance Measures to go along with Charter?
Team Processes (see later in this presentation)
Automatic Classifier Training Materials
Tagging Cost and ROI Spreadsheets
Website
Presentations and Announcements
Change Request List (see later in this presentation)
Taxonomy Roadmap
Taxonomy governance environment
[Diagram labels: Custodians; source vocabularies (ISO 3166-1, Other External, ERP, Other Internal); Vocabulary Management System (CVs, Other Controlled Items); Published Facets; Consuming Applications (Intranet Search, Intranet Nav., Web CMS, DAM, Archives, ERMS, …); Notifications; Change Requests & Responses]
1: External vocabularies change on their own schedule, with some advance notice.
2: Team decides when to update facets within Taxonomy.
3: Team adds value via mappings, translations, synonyms, training materials, etc.
4: Updated versions of facets published to consuming applications.
CV (Controlled Vocabulary): The list of values for one facet in the Taxonomy.
Taxonomy governance can be viewed as a standards process
- Closely linked to organizational metadata standard
- Taxonomy must evolve, but in predictable way
- Team structure, with an appeals process
- Taxonomy stewardship is part-time role at most organizations
- Team needs to make decisions based on costs and benefits
- Documentation and educational material on Taxonomy and Metadata
- Announcements
- Comment-handling responsibilities (part of error-correction process)
- Issue Logs
- Release Schedule
Where taxonomy changes come from
[Diagram labels: End User (experience), Firewall, Application UI, Application Logic, Content, Taxonomy, Tagging Logic, Tagging UI, Tagging Staff, Taxonomy Editor, Taxonomy Team]
Inputs shown in the diagram:
- End user experience
- Staff notes ‘missing’ concepts
- Query log analysis
- Requests from other parts of the organization
Recommendations by Editor
1. Small taxonomy changes (labels, synonyms)
2. Large taxonomy changes (retagging, application changes)
3. New “best bets” content
Team considerations
1. Business goals
2. Changes in user experience
3. Retagging cost
Taxonomy maintenance processes
Different organizations will need to consider their own change processes.
- Organization 1: A custodian is responsible for the content, but checks facts with department heads before making changes.
- Organization 2: Analysts suggest changes, editors approve, copyeditors verify consistency.
- Organization 3: Marketing reps ask for a change, taxonomy editor makes demo, web representative approves it.
Change process MUST also consider cost of implementing the change:
- Retagging data
- Reconfiguring auto-classifier
- Retraining staff
- Changes in user expectations
Other change processes
Change Request Process
- Anyone can ask a team member for a change.
- Team members are responsible for figuring out details and bringing them to the team for decision.
- Pending changes list for low priority/high cost items.
Change Process
- Includes preview of change on site and data mockup
Fast-Track Change Process
- Anyone can ask the editor, who gets team leader or deputy approval
Processes may be diagrammed or written
Provide an ‘emergency’ change process because it will be needed.
- How can emergency changes be requested?
- Who makes the change and who approves it?
- Who are backups for the people when they are out?
- What is the escalation path for denied requests?
Change Request Process should call out decision criteria, e.g.
- Cost of retagging
- Benefit of change
- Conflict with editorial rules
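One way to make the decision criteria explicit is to record them on each change request and route the request accordingly. This is an illustrative sketch only: the field names, the routing outcomes, and the comparison of benefit against retagging cost are assumptions, and a real team would set its own criteria.

```python
from dataclasses import dataclass


@dataclass
class ChangeRequest:
    description: str
    retagging_cost: float       # estimated cost of retagging affected content
    benefit: float              # estimated benefit of the change
    conflicts_with_rules: bool  # does it conflict with the editorial rules?


def triage(request: ChangeRequest) -> str:
    """Route a change request using hypothetical decision criteria."""
    if request.conflicts_with_rules:
        return "revise request"
    if request.retagging_cost == 0:
        return "fast-track"      # e.g. a label or synonym change
    if request.benefit > request.retagging_cost:
        return "team decision"
    return "pending list"        # low priority / high cost items


# Hypothetical requests.
synonym_fix = ChangeRequest("Add synonym 'auto' for 'automobile'", 0, 500, False)
restructure = ChangeRequest("Split Broad Topics facet", 20_000, 5_000, False)
```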
Taxonomy maintenance workflow
[Flowchart. Roles: Analyst, Editor, Copywriter, Sys Admin. Steps: Suggest new name/category; Review new name (Problem? Yes/No); Copy edit new name (Problem? Yes/No); Add to enterprise Taxonomy. Systems: Taxonomy Tool, Taxonomy.]
Basic Change Request form and process
Need a way to collect and evaluate change requests.
Need a way to track deferred change requests.
[Flowchart steps: Submit Change Request; Simple? (Yes: change as requested; No: research/complete Change Request form); Change? (No: inform Originator; Yes: continue); Immediate? (Yes/No); Assign Priority; Done. Each step is marked with the responsible role.]
Legend: O = Originator, E = Editor, C = Committee
Process Document
Team structure and roles
Taxonomy change triggers
Items to be controlled by the Team
Prioritization criteria (cost/benefit considerations for different types of changes)
Basic change process
Fast-track change process
Situation-specific considerations
Finding information should not be about “Feeling Lucky”
How do taxonomies actually improve search?
Input (Query) Side
- “Search” using a small set of pre-defined values instead of trying to guess what word or words might have been used in the content.
- Have synonyms mapped together so searches for “car” and “automobile” return the same things.
Output (Results) Side
- Organize search results into groups of related items.
- Sorting and filtering
- Refinement
Taxonomy in action on the results side
[Screenshot: results-side facets Position Category, Company, City, State, and Salary]
Questions?
Joseph A. Busch
+ 415-377-7912
[email protected]
http://www.taxonomystrategies.com
Resources mentioned
The American Customer Satisfaction Index: The voice of the Nation’s consumer. http://www.theacsi.org/overview.htm
S. Feldman. "The high cost of not finding information." 13:3 KM World (March 2004) http://www.kmworld.com/publications/magazine/index.cfm?action=readarticle&Article_ID=1725&Publication_ID=108
M. Hearst, A. Elliott, J. English, R. Sinha, K. Swearingen & K. Yee. “Finding the Flow in Website Search.” 45 Communications of the ACM (Sept 2002) http://bailando.sims.berkeley.edu/papers/cacm02.pdf
Memorandum M-04-20: Performance and Accountability Reports and Reporting Requirements (July 22, 2004) http://www.whitehouse.gov/omb/memoranda/fy04/m04-20.pdf
K.S. Taylor. "The brief reign of the knowledge worker," 1998. http://online.bcc.ctc.edu/econ/kst/BriefReign/BRwebversion.htm