Improving Findability Behind the Firewall, Bob Boeri. Guident - 198 Van Buren Street, Suite 120, Herndon, VA 20170 - Tel: 703.326.0888. Copyright © 2010 Guident - All rights reserved.

Improving Findability Inside the Firewall

Jan 28, 2015



This is the breakout session Boeri presented at the 2010 Enterprise Search Summit in NYC. This presentation includes speaker notes.
1. Improving Findability Behind the Firewall
  - Bob Boeri, bboeri@guident.com
  - Guident - 198 Van Buren Street, Suite 120, Herndon, VA 20170 - Tel: 703.326.0888
  - Copyright 2010 Guident - All rights reserved

2. Agenda
  - Findability: What is it? Why is it so hard?
  - Approach to improving findability
  - Findability Project Stages
  - Summary
  - Findability Checklist

3. Findability: What is it?
  - The art and science of locating information in or about an electronic document.
  - Entails organizing and searching content, semantics, and interface design.
  - Optimizes both recall and precision: getting everything that matches your query versus only the one or two items you're looking for.
  - We spend up to 20% of each workday trying to find document information.
  - We want to FIND, not SEARCH.
  - "I know what 'it' means well enough, when I find a thing," said the Duck. "The question is, what did the archbishop find?"

4. Some Elements of Findability
  - Clustering
  - Controlled Vocabularies
  - Data Dictionaries
  - Entity Extraction
  - Semantic Search
  - Tagging
  - Taxonomies
  - Text Analytics
  - Thesaurus
  - Notice the right-brain, verbal, language aspects of Findability: The Art and Science of Making Content Easy to Find.
  - Ultimately people want to find, not search.

5. Involves Content, Processes, and People
  - Documents are becoming inherently social, so finding and leveraging document information requires a broad strategy, not just selecting and installing the best search engine.
  - Enhancing findability requires considering all three gears to drive a unified information access strategy, with a comprehensive approach to the findability lifecycle.
  - (Figure: Document Spectrum)

6. What is a Document?
  - What is a document? A file you can perceive with one or more senses.
  - ISO 15489: "Recorded information or object which can be treated as a unit."
  - Record: Information created, received, and maintained as evidence of legal obligations or transactions of business.
  - Documents constitute 80% of our business knowledge assets; databases, etc., the other 20%.
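The recall and precision trade-off named in the definition of findability (slide 3) can be made concrete with a small calculation. The sketch below is illustrative only: the function name, the document IDs, and the relevance judgments are hypothetical and do not come from the presentation.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a single query.

    precision: share of the returned documents that are relevant.
    recall:    share of all relevant documents that were returned.
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical relevance judgments: four documents truly answer the query.
relevant_docs = ["d1", "d2", "d8", "d9"]

# A broad query returns five documents, two of them relevant.
broad = precision_recall(["d1", "d2", "d3", "d4", "d5"], relevant_docs)

# A narrow query returns one document, and it is relevant.
narrow = precision_recall(["d1"], relevant_docs)

print("broad :", broad)
print("narrow:", narrow)
```

In this made-up case the broad query finds more of what exists but buries it in noise, while the narrow query returns only a good hit but misses most of the relevant documents. That tension is one reason to manage user expectations about both measures.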
7. Findability: Why Is It So Hard?
  - FORMATS: Hundreds of formats, versions, fonts, character sets across the structure spectrum.
  - PLACES: Dozens to thousands of file shares, ECM repositories, desktops, email systems, databases, intranets.
  - QUANTITY: Information and file counts doubling at least yearly. Google indexed 1 trillion web pages in 2008. Quantities of multi-terabyte and even multi-petabyte are increasingly common inside the firewall.
  - LANGUAGE: Inherently subtle; inconsistent names, dates...
  - RIGHTS: Managing security is difficult since systems define rights differently and repository administrators tend to over-protect information. If you don't have rights, you can't find content.
  - PROCESSES and PEOPLE: So many tools, so little oversight. Governance.
  - Kilobytes > Megabytes > Gigabytes > Petabytes > Exabytes

8. Findability Project Success: Keys and Shortcuts
  - Keys: Approach findability projects holistically.
    - Business process and culture analysis (right-brain) PLUS full project lifecycle best practices (left-brain)
  - Are there shortcuts? When Ptolemy, Alexander the Great's powerful Greek general, asked Euclid for the shortcut to learning geometry, Euclid replied, "There is no royal road to geometry." There is no shortcut to findability either.

9. Findability Enhancements Lifecycle
  - (Lifecycle diagram with five stages: Initiate, Analyze, Design, Build, Deliver. Labels recoverable from the diagram: objectives, scope, sponsor, stakeholders and allies, strategy and tactics; pain points, current state, future state, 80-20 (who searches and why), technology survey, to-be model, taxonomies; functional and technical requirements, taxonomy and metadata, enterprise rights management, change management, performance and speed, governance plan, HW / SW requirements; test the system, test the taxonomy; monitor and govern, continuous improvement, train, evangelize.)

10. Initiate
  - Who is the sponsor?
  - Who are the stakeholders?
  - Who will be helped by the project?
  - Who might object?
  - Scope:
    - Fixing a current problem in one repository?
    - Integrating islands of information?
    - Anticipate trends such as Web 2.0 (blogs, wikis, social tagging).
    - What will be searched and where does it reside?
    - Will you augment or upgrade what you have today, or will you replace your current search facility?
  - Is the findability problem a training issue? Training and follow-up are always required.
  - Is there a tactical quick win consistent with strategic goals?
  - "80% of organizational information is unstructured and 90% of this remains unmanaged. Unmanaged information is growing at roughly 36% annually." AIIM, The New ECM Trifecta, September 17, 2009.

11. Initiate
  - Are there allies whom the sponsor might not know? Librarians, taxonomists, records managers, ECM users, technical writers, attorneys (eDiscovery issues), business analysts.
  - What are the goals and objectives?
    - Business or technical?
    - Lower costs?
    - Reacting to a lawsuit?
    - Identifying critical business continuity documents?
    - Green issues can include cost savings. Gartner recently said that environmental and social responsibility will exceed compliance as a corporate priority.
  - How will you know you've succeeded?
  - "The average office worker uses 10,000 sheets of copy paper each year and wastes about 1,410 of these pages. With the average cost of each wasted page being about six cents, a company with 500 employees could be spending $42,000 per year on wasted prints." AIIM, Eight Reasons You Need a Strategy for Managing Information, October 2009.

12. Analyze
  - Business requirements?
    - What do stakeholders say? How about squeaky wheels?
    - 80/20 rule: What must be done?
  - What is the vision for the future state? If none, develop it.
  - Who may rely on the same information and should be part of the team, or at least consulted?
  - Think big, but act small initially.
  - If you can't consolidate search systems, target them as future parts of the federation.
  - Manage expectations: performance, precision, recall.
  - Rita Knox, Gartner analyst: "Search and taxonomy technology is pretty good now. In fact, we're seeing taxonomy and search come together where companies can even slant it toward certain results (to fit their needs and industries)."

13. Analyze
  - Align with architecture standards if available. If you can't include all, at least have a bridge or cooperative strategy.
  - How does new content become available? Are the processes managed?
  - If searching in a content management system, can users put content in the wrong folder?
  - Search every version of every document? Major versions only?
  - Critical success factors? What are the pain points?
  - Green connection? Demands on storage, data centers, backup and recovery.

14. Analyze
  - Content - Perform Information Audit and Assessment:
    - Where is the content to be searched: managed content repositories, email, shared drives, desktops?
    - What kind of content is to be found? See formats earlier. XML content and DTDs/Schemas?
    - How much content is there, and how fast does it grow?
    - Which content is most important to find? 80/20 rule.
    - Bundled objects and Zip files.
    - When: How often and when is it searched?
    - The perils of paper and OCR.
  - Tools in place:
    - What search engines are already in place? (There always are some, often many.)
    - Taxonomy management tools other than Excel and MindManager or FreeMind?

15. Analyze
  - Are there allies whom the sponsor might not know? Librarians, taxonomists, records managers, ECM users.
  - Performance: How quickly to index and find new content?
  - What taxonomies or metadata currently exist? They exist, maybe implicitly or by other names: site maps, for example.
    - Folder structures in ECMs
    - Metadata
    - Managed vocabularies, such as thesauruses and value lists
    - Tools other than Excel to manage them?
  - Who, if anybody, is in charge of information governance?
  - Only after thorough analysis, perform a thorough vendor search.
    - Vendor maturity and Quadrants
    - Hype Cycles

16. Analyze
  - Search isn't homogeneous, and all vendors are not alike. Usually no single best vendor choice.
    - Market share
    - Support
    - Maturity
    - Ability to Execute
    - Completeness of Vision
    - Related products (document management always comes with search, usually an OEM edition).
  - Vendors buy competitive products: Verity, Autonomy, Convera, Fast, Microsoft.

17. Design
  - Taxonomy design approaches:
    - Avoid a business-organizational approach (it changes, and it is hard to work with for cross-organizational content).
    - Consider a process approach: What business processes produce documents?
  - Metadata design approaches:
    - Balancing act: How much is enough?
    - Discover what's wanted, then urge pruning.
    - Normalize the various sources you'll be searching.
  - "[The] ideal person to be responsible for ERM implementation is someone who oversees both security technology and information access policies; or, failing that, an organization where the executives in charge of each of those areas work closely together." Enterprise Rights Management, Gilbane Group, August 2008.

18. Design
  - Search Federation / Integration
    - One über-search system?
    - Simple and Advanced user interfaces? Simplicity is key.
    - One-stop searching to display results from other search engines?
    - Prioritize repositories for indexing?
  - Delivery devices:
    - PC and laptop screens
    - Phones and PDAs?
    - Designing style sheets for each type of content (see earlier document spectrum) to each kind of device
  - "Our research found that multiple search engines are the norm in most organizations: separate search solutions for e-mail, Web content, wikis, blogs, ERP systems, CRM systems, intranets, file shares, (leading) to user frustration with enterprise search." AIIM, MarketIQ Intelligence Quarterly Q2 2008, Findability - The Art and Science of Making Content Easy to Find.

19. Design
  - Index design
    - Full index versus incremental index
    - When: on the fly for everything? End of day or end of week?
  - Balancing privacy and security
    - Allow me to see at least names or metadata of files whose content I cannot view? Allows me to contact the author to learn more.
    - Hide all results I shouldn't see; no option for me to learn more.
  - "Try saying 'IT owns search' at your next company meeting, and watch the phone lines to HR light up; they'll raise holy hell at the concept of IT indexing their email or web activity." IT versus Organizational Paranoia, Information Week, November 9, 2009.

20. Build
  - Test the search system, but also test its supporting components.
  - Testing the taxonomy: Balance your resources and your scope.
    - Scope: Who, what, when, how?
    - Expect to revise the taxonomy.
  - Taxonomy Testing Tradeoffs:
    - Scope
      - Whole taxonomy, every node? Costly and time consuming.
      - The hardest branches? Says who?
      - Sampling techniques: how many and which documents to test, and which branches?
    - Participants
      - Those who are familiar with the taxonomy: May not learn as much. They've already drunk the Kool-Aid.
      - Those unfamiliar with the taxonomy: Learn more, need more upfront training and time.
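One way to approach the sampling question on the Build slide (how many documents, and from which branches) is to draw a fixed number of test documents from every branch being tested. The sketch below is a hypothetical illustration, not a method from the presentation: the function name, the branch names, and the file names are all assumptions.

```python
import random


def sample_test_documents(docs_by_branch, per_branch, seed=0):
    """Pick up to `per_branch` documents from each taxonomy branch.

    docs_by_branch: dict mapping a branch name to a list of document names.
    A fixed seed makes the draw repeatable, so a later test round can
    reuse the same documents after the taxonomy has been revised.
    """
    rng = random.Random(seed)
    sample = {}
    for branch, docs in sorted(docs_by_branch.items()):
        k = min(per_branch, len(docs))  # small branches contribute what they have
        sample[branch] = rng.sample(docs, k)
    return sample


# Hypothetical folder taxonomy with candidate test documents per branch.
candidates = {
    "Contracts/Vendors": ["msa.doc", "sow-2009.doc", "nda.pdf", "renewal.doc"],
    "HR/Policies": ["leave-policy.pdf", "travel-policy.pdf"],
    "Projects/Search": ["charter.doc", "requirements.xls", "test-plan.doc"],
}

for branch, docs in sample_test_documents(candidates, per_branch=2).items():
    print(branch, "->", docs)
```

Sampling every branch equally keeps the test cheaper than covering every node while still touching the whole taxonomy. Testing only the hardest branches would mean passing in just those branches.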
21. Taxonomy Testing Practices
  - Testing is critical to assuring that the taxonomy meets design objectives and supports general taxonomy metrics (such as breadth and coverage).
  - The primary objective of folder taxonomies: provide an intuitive structure into which documents will be stored consistently and through which users can navigate to find needed content.
  - Who manages the taxonomy definitions?
  - Iterations of the testing are normal; like clinical trials, testing evolves as more is learned in different phases. This includes testing after deployment (like Phase IV).
  - Unlike clinical trials, most people have very limited time to test.

22. Why Test Taxonomies?
  - Because no structure is perfect, and initial taxonomies are just that: Version 1.
  - You want the best practicable solution to build on.
  - You want to be sure that there is a place, ideally only one place, for every document to be stored.
  - You want the taxonomy to be as intuitive and easy to understand as practicable.

23. Testing Tradeoffs
  - Taxonomy scope, options include:
    - Test all bottom branches: Tests everything, takes longer.
    - Test only the challenging branches: Doesn't test everything, may take less time.
    - Hybrid: A good sample test with some pre-selected documents and some volunteered by the testers.
  - Types of testers:
    - Involve current project participants: They understand the taxonomy and training is expedited, but participant biases may reduce what we learn.
    - New project participants: Training and testing take significantly longer, but may provide more, and more useful, results.
    - Hybrid: Use a mix of current project team and new testers.

24. Testing Tradeoffs
  - Testing group sizes, options include:
    - Large group tests are easier to schedule but provide low quality test results.
    - One-on-one testing provides highest quality test results but takes the most time to complete.
    - Small groups
  - Test documents and sources, options:
    - Using documents named in taxonomy discovery meetings is easier but self-fulfilling; not a fair test.
    - Preselecting documents from records schedules gets the process started and uses existing definitions but may not be representative of the final mix.

25. Deliver: Install and Walk Away?
  - Ongoing outreach to users
  - Ongoing auditing and governance
  - Information Systems Governance: "a subset discipline of Corporate Governance focused on Information Technology (IT) systems and their performance and risk management."
  - "IT governance implies a system in which all stakeholders, including the board, internal customers, and in particular departments such as finance, have the necessary input into the decision making process." Wikipedia, Information Technology Governance.

26. Deliver: Install and Walk Away?
  - Thinking about governance should start as soon as the findability project begins.
  - Keep the governance simple.
  - Involve all high-level stakeholders.
  - Plan for change in the governance model as findability itself evolves.

27. In Summary
  - Andy Grove was right: "Only the Paranoid Survive" and get to deliver findability results successfully.
  - Use both the left (analytical) and right (creative) sides of your brain, and make sure your team has both sufficient technical and political skills, throughout the full lifecycle of your findability projects.
  - And don't forget that findability projects never end; they just change their phases.
28. About Guident
  - Professional Services and Consulting Firm: Business Intelligence, Management Consulting, Systems Engineering, ECM and Search
  - Founded in 1996, headquartered in the Washington, DC Metro area
  - Over 260 professionals with broad expertise and backgrounds
  - Named to Inc. Magazine's Inc. 5000 list in 2007, 2008, and 2009
  - Washington Technology Fast 50 member in 2006, 2007, 2008, and 2009
  - Washington Business Journal Fastest Growing Company in 2008
  - Email Bob Boeri (bboeri@guident.com) for the Findability Checklist and Presentation Quotes tool