Do not reinvent Findability and Knowledge Management
Håkan Tylén, Western Europe Business Development, +46703091665, [email protected]
Nov 01, 2014
Agenda outline
Customer/Employee Service, in the Self-service channel
How can I help YOU?
Metadata basics: What is it? Where is it stored?
Metadata is the set of properties that characterize a document.
Inconsistent, incorrect or missing metadata is commonplace within most organizations today
This impairs findability in the context of enterprise search
Hard to scan or navigate results
Documents returned may be incomplete or not current
No confidence in authority and correctness of information
Difficult to locate relevant experts
Poor metadata impairs the search experience
Degraded findability leads to the erosion of users’ trust in search
I’m not confident I will find what I need here…
This is a waste of time!
Unchanged template metadata makes results look like duplicates
Meaningless metadata confuses users as they scan the search results
Missing metadata raises questions about result set completeness
Few options to navigate or refine a large result list other than trying to reformulate the query
Even with refinement tools, users do not rely on them
Multiple variations or spellings
Hit counts do not add up
6 | SharePoint Server 2010 for Internet Sites Microsoft confidential.
ROI - Scenarios
1. Time Wasted Searching
2. Cost of Reworking Information
3. Opportunity Costs to the Enterprise
Scenario 1: Time wasted
• Employee cost: €3.000/month + social charges ≈ €50.000/year
• 10 minutes/day spent searching × 220 working days ≈ €1.000/employee/year
• 1000 employees ≈ €1.000.000/year of “released time”
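The Scenario 1 arithmetic can be reproduced as a small calculation. The 8-hour working day (1,760 working hours per year) is an assumption not stated on the slide; with it, 10 minutes per day comes to roughly €1.000 per employee per year, which matches the slide's figure after rounding.

```python
def wasted_search_cost(annual_cost_per_employee: float = 50_000,
                       minutes_wasted_per_day: float = 10,
                       working_days_per_year: int = 220,
                       hours_per_day: float = 8,  # assumption: not stated on the slide
                       employees: int = 1000):
    """Return (cost per employee per year, cost for all employees per year) in euros."""
    # Fully loaded cost of one working hour.
    hourly_cost = annual_cost_per_employee / (working_days_per_year * hours_per_day)
    # Hours lost to searching over a working year.
    hours_wasted = minutes_wasted_per_day * working_days_per_year / 60
    per_employee = hourly_cost * hours_wasted
    return per_employee, per_employee * employees


per_emp, total = wasted_search_cost()
print(f"€{per_emp:,.0f} per employee/year, €{total:,.0f} in total")
# prints: €1,042 per employee/year, €1,041,667 in total
```

Python's format specifier uses commas as thousands separators, whereas the slide uses the European dot convention; the values are the same.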
Creating quality metadata is a real challenge: few organizations have good quality metadata on internal content
• Ineffective information governance across the enterprise
• Multiple content silos and search interfaces
• Manually entered metadata is inconsistent, incorrect or missing
• No automated tools for content classification
• Impossible to keep up with ever growing content volumes
Challenge
Assist users in tagging content with automated metadata suggestions or enrichment tools
• FAST Search for SharePoint (FS4SP) delivers business value out-of-the-box
• Sophisticated content processing optimizes findability across multiple silos of unstructured and structured content
• In addition, property extraction overcomes poor metadata by generating and normalizing it on-the-fly
Solution
The pipeline is a sequentially arranged set of discrete processing stages that break down and enrich content for indexing
Convert documents to plain text (support for 400+ file formats)
Detect document languages and encoding (support for 80+ languages)
Apply linguistic normalization to optimize content for search
Identify and leverage existing metadata where applicable
Parse content to extract or generate additional metadata
Map content and associated metadata (crawled properties) to the index schema (managed properties) for searching
Custom stages can be created and added to the pipeline
Content Processing Pipeline – what is it?
Enhance your content for an optimal search experience and findability
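A minimal Python sketch of the sequential-stage idea, assuming each stage is a function that takes a document (a dict of crawled properties) and returns an enriched copy. The stage functions, the toy `LEMMAS` table, the custom `tag_department` stage and the `PROPERTY_MAP` schema are all hypothetical illustrations; this is not FS4SP's configuration format or extensibility API.

```python
import re
from datetime import datetime
from functools import reduce
from typing import Callable, Dict, List

Document = Dict[str, object]            # hypothetical: crawled properties by name
Stage = Callable[[Document], Document]

# Hypothetical toy lemma table; a real system ships per-language dictionaries.
LEMMAS = {"sales": "sale", "figures": "figure", "forecasts": "forecast"}


def tokenize(doc: Document) -> Document:
    # Simplified: lowercase runs of word characters (no language-specific rules).
    return {**doc, "tokens": re.findall(r"\w+", str(doc.get("body", "")).lower())}


def lemmatize(doc: Document) -> Document:
    # Reduce each token to its canonical form when the table knows it.
    return {**doc, "lemmas": [LEMMAS.get(t, t) for t in doc["tokens"]]}


def normalize_date(doc: Document) -> Document:
    # Convert e.g. "14-Mar-10" to ISO 8601 "2010-03-14"; leave other values unchanged.
    try:
        iso = datetime.strptime(str(doc.get("date")), "%d-%b-%y").date().isoformat()
    except ValueError:
        return doc
    return {**doc, "date": iso}


def tag_department(doc: Document) -> Document:
    # Hypothetical custom stage deriving a business-specific property.
    return {**doc, "department": "Sales" if "sale" in doc["lemmas"] else "Unknown"}


# Hypothetical crawled-property -> managed-property mapping (the index schema).
PROPERTY_MAP = {"title": "Title", "date": "DocDate",
                "lemmas": "BodyTerms", "department": "Department"}


def map_properties(doc: Document) -> Document:
    # Keep only properties that the index schema knows about, under their managed names.
    return {managed: doc[crawled] for crawled, managed in PROPERTY_MAP.items() if crawled in doc}


PIPELINE: List[Stage] = [tokenize, lemmatize, normalize_date, tag_department, map_properties]


def run_pipeline(doc: Document, stages: List[Stage] = PIPELINE) -> Document:
    # Each stage receives the output of the previous one, strictly in order.
    return reduce(lambda d, stage: stage(d), stages, doc)
```

For example, `run_pipeline({"title": "Sales Forecast", "body": "Sales figures for Q1", "date": "14-Mar-10"})` yields a document whose `DocDate` is `"2010-03-14"` and whose `Department` is `"Sales"`. Adding a custom stage is just a matter of inserting another function into the list.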
Properties Mapper: Maps the relevant pieces of content and metadata discovered in the pipeline to the index schema for search
Custom Processing Stage: Enables you to extend the content processing pipeline with custom stages (home-grown solutions or 3rd-party software) to address your own business needs
Date and Time Normalization: Converts dates and times to a standard representation to handle locale-specific formats; for example, the date 14-Mar-10 is equivalent to March 14, 2010
Vectorization: Creates document vectors (phrase/weight pairs reflecting important terms and their frequency of occurrence) to enable “find similar” functionality
Property Extraction: Recognizes predefined entities mentioned in the content; out-of-the-box support for Companies, Locations and People, which can be extended to other categories
Lemmatization: Applies language-specific normalization to content so that users’ queries match words and phrases in canonical or inflected forms (singular/plural, masculine/feminine, etc.)
Tokenization: Breaks text into tokens using language-specific rules for punctuation, diacritics, accents, compound words, phrases and numbers (currency, telephone numbers, part numbers, etc.)
Language and Encoding Detection: Identifies the encoding and languages used in the text content so that the appropriate linguistic normalization rules and dictionaries can be applied downstream
Format Conversion: Extracts plain text and metadata from multiple content formats (e.g. Microsoft Office, PDF, HTML)
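The Vectorization stage and “find similar” can be illustrated with a toy version: each document becomes a sparse term/weight vector and other documents are ranked by cosine similarity. This is a sketch under stated assumptions: it uses single words and raw term frequency as weights, whereas the slide mentions phrase/weight pairs, and FS4SP's actual weighting is not shown here.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def vectorize(text: str) -> Counter:
    # Term -> weight; here the weight is simply the term's frequency in the text.
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine of the angle between two sparse term vectors (0.0 = no terms in common).
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


def find_similar(doc_id: str, corpus: Dict[str, str], top_n: int = 3) -> List[str]:
    # Rank every other document by similarity to doc_id's vector; drop zero scores.
    vectors = {d: vectorize(text) for d, text in corpus.items()}
    target = vectors[doc_id]
    scored = sorted(
        ((cosine(target, v), d) for d, v in vectors.items() if d != doc_id),
        reverse=True,
    )
    return [d for score, d in scored[:top_n] if score > 0]


corpus = {
    "a": "sales pipeline forecast for London",
    "b": "sales forecast London office",
    "c": "cafeteria menu for Friday",
}
print(find_similar("a", corpus, top_n=1))  # prints: ['b']
```

In practice the vectors would be computed once at indexing time and stored with the document, rather than recomputed per query as in this sketch.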
Type Doc ID Title Author Date Size Keywords Companies Locations People ... Body Text
xxx Sales For… John Doe 2010-04-15 386 KB sales; pipe… Microsoft; … London; … Bill Gates; … … The mark…
yyy … … … … … … … … … …
zzz … … … … … … … … … …
Property Extraction: Create metadata on-the-fly, adding structure to unstructured content
Locations
London
San Francisco
Moscow
…
People
Bill Gates
Barack Obama
José Caires
...
In a nutshell, property extraction is the ability to
Process unstructured content (e.g. a document’s body)
Recognize entities mentioned in the text (e.g. people, companies, locations, concepts, etc.)
Optionally, normalize variations to a single, canonical form
Expose these extracted entities as crawled properties in the pipeline
Map them to managed properties for filtering and searching
Crawled Properties
Managed Properties Index Schema:
Companies
Microsoft
Contoso
Woodgrove
…
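The property extraction steps listed above (recognize entities in the body text, normalize variations to a single canonical form, expose them as crawled properties) can be sketched with dictionary matching. The `DICTIONARIES` contents and the function names are hypothetical; FS4SP's built-in extractors are more sophisticated than a single regular expression per entity type.

```python
import re
from typing import Dict, List

# Hypothetical dictionaries: entity type -> canonical form -> known variations.
DICTIONARIES: Dict[str, Dict[str, List[str]]] = {
    "companies": {
        "Microsoft": ["Microsoft", "Microsoft Corp.", "MSFT"],
        "Contoso": ["Contoso", "Contoso Ltd"],
    },
    "locations": {
        "London": ["London"],
        "San Francisco": ["San Francisco", "SF"],
    },
}


def build_matchers(dictionaries):
    matchers = {}
    for entity_type, entries in dictionaries.items():
        # Map every lowercased variation to its canonical entry.
        to_canonical = {v.lower(): canonical
                        for canonical, variants in entries.items() for v in variants}
        # Try longer variations first so "Microsoft Corp." wins over "Microsoft".
        alternatives = sorted(to_canonical, key=len, reverse=True)
        # Lookarounds stop matches inside longer words (e.g. "SF" inside "MSFT").
        pattern = re.compile(
            r"(?<!\w)(" + "|".join(map(re.escape, alternatives)) + r")(?!\w)",
            re.IGNORECASE,
        )
        matchers[entity_type] = (pattern, to_canonical)
    return matchers


def extract_properties(body: str, matchers) -> Dict[str, List[str]]:
    """Return crawled-property-style values: entity type -> sorted canonical names."""
    found = {}
    for entity_type, (pattern, to_canonical) in matchers.items():
        names = {to_canonical[m.group(1).lower()] for m in pattern.finditer(body)}
        found[entity_type] = sorted(names)
    return found


body = "MSFT and Contoso Ltd met in SF; Microsoft Corp. confirmed."
print(extract_properties(body, build_matchers(DICTIONARIES)))
# prints: {'companies': ['Contoso', 'Microsoft'], 'locations': ['San Francisco']}
```

Three different spellings of Microsoft collapse into one refiner value, which is what keeps hit counts consistent.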
Metadata quality is critical to the search experience
FS4SP leverages metadata, i.e. managed properties, to present deep refiners
Offer at-a-glance overview
Organize free-text search results into multiple facets
Make search conversational
Guide users toward possible refinement choices
Prevent users from drilling down into a “0 results” dead end
Additional uses for managed properties in FS4SP
Relevancy tuning & ranking
Multi-level sorting
Advanced (or fielded) search
Good metadata greatly improves findability: property extraction enables consistent metadata across all content
This is really great! Now I can navigate through this large information universe without feeling lost…
Precise hit counts in deep refiners are computed across the whole result set.
And many more…
Concepts
Products
Companies
File Formats
Metadata is also used for relevancy tuning, multi-level sorting and advanced search
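The point that deep refiner counts are computed over the whole result set (not just the first page) can be shown with a toy in-memory version, along with multi-level sorting. The document fields and function names are hypothetical, and the sketch ignores how FS4SP actually computes refiners inside the index.

```python
from collections import Counter
from typing import Dict, List, Tuple

Doc = Dict[str, object]


def deep_refiners(results: List[Doc], properties: List[str]) -> Dict[str, List[Tuple[str, int]]]:
    """Count every refiner value over the whole result set, most frequent first."""
    refiners = {}
    for prop in properties:
        counts = Counter(value for doc in results for value in doc.get(prop, []))
        refiners[prop] = counts.most_common()
    return refiners


def refine(results: List[Doc], prop: str, value: str) -> List[Doc]:
    """Drill down: keep only results whose property contains the chosen value."""
    return [doc for doc in results if value in doc.get(prop, [])]


def multi_level_sort(results: List[Doc]) -> List[Doc]:
    """Newest date first, then title A-Z (two stable sorts, least significant key first)."""
    by_title = sorted(results, key=lambda d: d["title"])
    # ISO 8601 date strings sort correctly as plain strings.
    return sorted(by_title, key=lambda d: d["date"], reverse=True)


results = [
    {"title": "B", "date": "2010-04-15", "companies": ["Microsoft", "Contoso"]},
    {"title": "A", "date": "2010-04-15", "companies": ["Microsoft"]},
    {"title": "C", "date": "2010-03-01", "companies": ["Woodgrove"]},
    {"title": "D", "date": "2010-05-20", "companies": ["Microsoft"]},
]
print(deep_refiners(results, ["companies"]))
# prints: {'companies': [('Microsoft', 3), ('Contoso', 1), ('Woodgrove', 1)]}
```

Because every offered value was counted over the full set, selecting any refiner returns exactly as many documents as its count, so a drill-down can never lead to zero results.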
The Microsoft IT Intranet Environment (as of September 2010)
• Americas, Seattle: 19.4 TB, 127,986 sites, 345,935 sub-sites
• Europe, Middle East & Africa, Dublin: 4.1 TB, 45,878 sites, 82,128 sub-sites
• Asia Pacific, Singapore: 6.4 TB, 49,731 sites, 117,324 sub-sites
• Total: 29.89 TB (31,346,042 MB), 223,595 sites, 545,387 sub-sites; grows by 1.5 TB per quarter
• Regional shares shown on the slide: 65%, 22%, 13%
Knowledge Transfer: MSW
FS4SP automatically detects 80+ languages in content
Property extraction dictionaries are included for 11 languages* and 3 types of entities
Locations
Companies
Persons
The metadata is exposed to users as refiners and drives relevancy and other features that improve findability
This delivers real business value to organizations struggling with issues such as
Poor document metadata
Large content volumes
Lack of result refinement options
Low user adoption of search
Property extraction and refiners in FS4SP: What’s available out-of-the-box?
* Arabic, Dutch, English, French, German, Italian, Japanese, Norwegian, Portuguese, Russian, Spanish
Property extraction in FS4SP is customizable using a dictionary, i.e. a list of keywords and phrases
Matching variations can be normalized to a single entry
Several dictionaries may co-exist to address needs of the business
Projects
Products
Customers
Competitors
Employees
Business-specific concepts
The necessary data may be readily available within the organization or from external sources
Extending property extraction in FS4SP (1/2): Make search speak the language of your business using dictionaries
SharePoint lists & Term Store
LOB applications, Databases & XML
Create custom search refiners to fit your own business needs
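When the dictionary data already lives in a SharePoint list, a line-of-business database or an XML/CSV export, building the dictionary is mostly a parsing and validation job. The sketch below assumes a hypothetical CSV layout of `canonical,variant1;variant2;...`; it is not FS4SP's own dictionary import format. It also checks the one rule that keeps normalization unambiguous: each variation may map to only one canonical entry.

```python
import csv
import io
from typing import Dict, List


def load_dictionary(csv_text: str) -> Dict[str, List[str]]:
    """Parse rows of `canonical,variant1;variant2;...` into canonical -> variations.

    The canonical form is always included as one of its own variations.
    Raises ValueError if one variation is claimed by two canonical entries.
    """
    entries: Dict[str, List[str]] = {}
    owner: Dict[str, str] = {}  # lowercased variation -> canonical entry
    for row in csv.reader(io.StringIO(csv_text)):
        if not row or not row[0].strip():
            continue  # skip blank lines
        canonical = row[0].strip()
        extra = row[1].split(";") if len(row) > 1 else []
        variants = [canonical] + [v.strip() for v in extra if v.strip()]
        for v in variants:
            key = v.lower()
            if key in owner and owner[key] != canonical:
                raise ValueError(f"variation {v!r} maps to both {owner[key]!r} and {canonical!r}")
            owner[key] = canonical
        bucket = entries.setdefault(canonical, [])
        bucket.extend(v for v in variants if v not in bucket)
    return entries


sample = "Contoso,Contoso Ltd;Contoso Inc.\nWoodgrove Bank,Woodgrove\n"
print(load_dictionary(sample))
# prints: {'Contoso': ['Contoso', 'Contoso Ltd', 'Contoso Inc.'], 'Woodgrove Bank': ['Woodgrove Bank', 'Woodgrove']}
```

Rejecting conflicting variations at load time is a design choice: it is cheaper to fix the source list than to debug refiners that silently merge two customers.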
External text mining/classification tool
Another approach is to invoke external tools during content processing in FS4SP
This leverages the standard pipeline extensibility mechanism
Such tools typically address problems like
Text mining for entity, fact or relationship extraction
Taxonomy classification
Moreover, these tools may be already deployed for other purposes in the enterprise
Home-grown solutions
3rd party, specialized vendors
Industry sectors or verticals
Scientific or technology domains
Extending property extraction in FS4SP (2/2): Use existing text mining or classification tools to go even further
Flow: original document from repository → content pipeline → external tool (web service or local software) analyzes the text content and returns metadata tags → enriched document is sent for indexing
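Invoking an external tool can be sketched as a pipeline stage that wraps any callable returning tags, whether that callable calls a web service or local software. The `Classifier` signature, the `taxonomy` property name and the keyword rules standing in for a real classifier are hypothetical. Letting a failing tool fall back to no tags (rather than failing the document) is an assumption about the desired behavior.

```python
from typing import Callable, Dict, List

Classifier = Callable[[str], List[str]]

# Hypothetical keyword rules standing in for a real classification service.
TAXONOMY_RULES = {
    "Finance": ("invoice", "budget"),
    "Legal": ("contract", "liability"),
}


def toy_taxonomy_classifier(text: str) -> List[str]:
    # Return every taxonomy category whose keywords appear in the text.
    lowered = text.lower()
    return sorted(cat for cat, words in TAXONOMY_RULES.items()
                  if any(w in lowered for w in words))


def make_enrichment_stage(classify: Classifier, property_name: str = "taxonomy"):
    """Wrap an external tool as a pipeline stage that adds its tags to the document."""
    def stage(doc: Dict[str, object]) -> Dict[str, object]:
        try:
            tags = classify(str(doc.get("body", "")))
        except Exception:
            # Assumption: a failing external tool should not block indexing.
            tags = []
        return {**doc, property_name: tags}
    return stage


enrich = make_enrichment_stage(toy_taxonomy_classifier)
print(enrich({"id": 1, "body": "Contract liability and budget review"}))
# prints: {'id': 1, 'body': 'Contract liability and budget review', 'taxonomy': ['Finance', 'Legal']}
```

Swapping in a real tool only means passing a different callable; the rest of the pipeline is unaffected.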
Best practice #1: Deepen your understanding of your audiences and your content
Enterprise content
Before you start deploying enterprise search, understand your content, your users and what they need to get their jobs done effectively.
Marketing Sales Procurement Consulting Research HR / Legal IT Support Production
Best practice #2: Use existing language resources inside and outside your enterprise
Internal assets
• Thesauri, controlled vocabularies
• Taxonomies, ontologies
• Master databases
• Enterprise systems
• Line-of-business applications
• Subject matter experts
• Examples*: SharePoint (Lists, Term Store); Employees (AD, HR); Customers (CRM); Suppliers (ERP); Products (PLM); Processes (BPM); Projects (EPM)
Internet resources
• Government agencies
• Industry bodies
• Research institutions
• Academia
• Virtual communities
• Examples: Wikipedia.org; DBpedia.org; WordNet, from Princeton University; Medical Subject Headings (MeSH)
Content providers
Specialized vendors
* AD – Active Directory; CRM – Customer Relationship Mgmt.; ERP – Enterprise Resource Planning; PLM – Product Lifecycle Mgmt.; BPM – Business Process Modeling; EPM – Enterprise Project Mgmt.
The language of the business will change over time
External environment
Enterprise content
Users’ needs
Ensure that property extraction dictionaries and search index are systematically updated to respond to these changes
Where possible, automate dictionary upkeep as part of standard business workflows
Taxonomies and thesauri
Enterprise project management
Product lifecycle management
Schedule regular analysis and review checkpoints to handle exceptional cases
Best practice #3: Keep the index synchronized with content sources and dictionaries
Property Extraction Dictionaries
Search Index
Dictionary Data Sources
Enterprise Content Sources
Search synchronized with changes over time
As the language of your business and users’ needs evolve, so should your search solution
If not, the search experience and findability inevitably degrade over time – users’ trust will plunge too
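Automating dictionary upkeep can start with a simple comparison between the deployed dictionary and the latest export from its data source. The sketch below is a heuristic under stated assumptions: it treats any document that mentions an added or removed term as stale, because its extracted properties were computed with the old dictionary. The function names are hypothetical, and how a given FS4SP deployment actually re-processes content (for example, a re-crawl) is outside this sketch.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Set


@dataclass(frozen=True)
class DictionaryDiff:
    added: FrozenSet[str]
    removed: FrozenSet[str]

    @property
    def changed(self) -> bool:
        return bool(self.added or self.removed)


def diff_dictionary(current: Set[str], source: Set[str]) -> DictionaryDiff:
    # Compare the deployed dictionary with the latest export from its data source.
    return DictionaryDiff(frozenset(source - current), frozenset(current - source))


def docs_to_reprocess(docs: Dict[str, str], diff: DictionaryDiff) -> List[str]:
    # Heuristic: a document is stale if it mentions any added or removed term.
    terms = [t.lower() for t in diff.added | diff.removed]
    return sorted(doc_id for doc_id, body in docs.items()
                  if any(t in body.lower() for t in terms))


diff = diff_dictionary({"Contoso", "Fabrikam"}, {"Contoso", "Woodgrove"})
docs = {"d1": "Woodgrove bid", "d2": "Contoso only", "d3": "Old Fabrikam memo"}
print(sorted(diff.added), sorted(diff.removed), docs_to_reprocess(docs, diff))
# prints: ['Woodgrove'] ['Fabrikam'] ['d1', 'd3']
```

Run on a schedule, the `changed` flag gives the search manager a checkpoint for reviewing exceptional cases before content is re-processed.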
Search management is not an IT responsibility; it belongs to the business
Best practice #4: Distinguish search management from systems management
Original implementation of the search solution
Actual search experience, if left unattended...
• Skillset of a SharePoint administrator (not a programmer or systems engineer)
• Business perspective and focus
• Good ability with languages
• Attention to detail
Job profile
• Monitor search reports (daily/weekly)
• Run user polls and/or focus groups (quarterly)
• Process users’ feedback and questions
• Update dictionaries and manage keywords (as required)
• Support search-related projects
Sample tasks
• One person part-time, or
• A geographically distributed team
Staffing – depends on scale
• Researchers forced to search each internal and external content source separately
• Low relevancy in existing search applications
• High effort in information discovery tasks
• Growing difficulty in establishing connections with experts as the company grew worldwide
Business Problem
• FAST Search for SharePoint indexes all internal sources and federates external industry services
• Property extraction dictionaries extended to recognize product names cited in documents
• Deep refiners are used on extracted properties to drill down by products, companies and people
Approach & Solution
• Improved employee productivity with more relevant search results in a unified interface
• Greater information sharing and reuse across product areas & geographies
• Integrated people search eases social networking
• Proof point for wider search roll-out in enterprise
Benefits & Value
Case study #1: General Mills (Research & Development)
By using FAST Search Server 2010 for SharePoint, our researchers can refine their searches and find exactly what they are looking for. They spend more time innovating than looking for information.
– Michelle Check, R&D Systems Leader, General Mills
• Poor access to a large, active collection of paper-based contracts and project documents
• Metadata managed in a separate DMS (database)
• Information silos stifle data sharing and collaboration
• Requirements to provide internal and public access
Business Problem
• FAST Search for SharePoint indexes images with iFilter-based OCR technology
• Pipeline extended with custom .NET code to merge metadata from the database with indexed documents
• Custom refiners reflect the language used in the business for navigating search results
Approach & Solution
• Unified self-service interface to locate information
• Ability to slice & dice results according to specific needs (dates, project, folder, route, district, etc.)
• Information search times cut from several hours or days to mere seconds or minutes
• Users have more time to focus on higher value tasks
Benefits & Value
Case study #2: Mississippi Department of Transportation (MDOT)
We are literally reducing decision cycles from days to minutes for hundreds of overlapping decisions a day. With SharePoint Server 2010, we can make better spending decisions and enhance program performance without a very large investment.
– John Michael Simpson, CTO, MDOT
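MDOT extended the pipeline with custom .NET code to merge metadata held in a separate database into the indexed documents. The sketch below shows the same lookup-and-merge idea in Python (kept in one language with the other examples), using an in-memory SQLite table as a hypothetical stand-in for the DMS database. The table layout, column names and sample values are all made up for illustration.

```python
import sqlite3
from typing import Dict

METADATA_COLUMNS = ("project", "district", "route")


def build_demo_db() -> sqlite3.Connection:
    # Hypothetical stand-in for the separate DMS database; all values are made up.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE doc_metadata "
                 "(doc_id TEXT PRIMARY KEY, project TEXT, district TEXT, route TEXT)")
    conn.execute("INSERT INTO doc_metadata VALUES (?, ?, ?, ?)",
                 ("C-1001", "Bridge repair", "District 5", "Route 1"))
    conn.commit()
    return conn


def make_metadata_merge_stage(conn: sqlite3.Connection):
    """Pipeline stage that merges database metadata into a document by its ID."""
    def stage(doc: Dict[str, object]) -> Dict[str, object]:
        row = conn.execute(
            "SELECT project, district, route FROM doc_metadata WHERE doc_id = ?",
            (doc["doc_id"],),
        ).fetchone()
        if row is None:
            return doc  # no database record: index the document as crawled
        return {**doc, **dict(zip(METADATA_COLUMNS, row))}
    return stage


merge = make_metadata_merge_stage(build_demo_db())
print(merge({"doc_id": "C-1001", "body": "scanned contract text"}))
# prints: {'doc_id': 'C-1001', 'body': 'scanned contract text', 'project': 'Bridge repair', 'district': 'District 5', 'route': 'Route 1'}
```

Once merged, fields such as `district` and `route` can be mapped to managed properties and surfaced as the custom refiners the case study describes.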
The challenges
• Explosive content growth puts information management and governance under pressure
• Multiple content silos with different search interfaces
• Poor metadata – missing, inconsistent, incorrect
The solution
• Content processing optimizes findability across disparate sources
• Property extraction generates metadata while indexing content
• Deep refiners expose metadata in search results, helping users quickly zoom in on the right information
The benefits
• Reduced costs through enterprise search consolidation and automated metadata enrichment
• Enhanced findability helps employees to get their job done faster
• Increased user adoption across the enterprise drives ROI
Ingredients for great enterprise search: The business value of FAST Search Server 2010 for SharePoint
© 2009 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market
conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.
microsoft.com / Enterprise Search