Architecture: Fast Search Server 2010 for SharePoint SharePoint Saturday Carlos Valcarcel Fast Technology Specialist, Fast, A Microsoft Subsidiary
Dec 02, 2014
Agenda
• Demo: Fast Search Server 2010
• FAST: A Brief Time of History
• SharePoint 2010: Search features
• Fast Search Server 2010: Features, Architecture
• Why Fast Search Server instead of SharePoint search?
MSW – Microsoft Internal Web Site
demo
Fast: A Brief Time of History
Where did Fast come from?
You’ve probably heard it all before.
• Fast was founded in 1997; it was 11 when the acquisition completed (2008).
• AllTheWeb.com – still an active site!
  Sold by Fast to Overture, then Overture bought by Yahoo!
• Fast invested in enterprise search
• Our flagship product, ESP, powers some of the largest sites on the web
  Dell, Best Buy, Scirus (Reed Elsevier), Financial Times, Oodle, Rakuten
• When we OEM’ed our product:
  Documentum, Dell Message One (Email/eDiscovery), CommVault, EMC Centera, MatterSpace®
SharePoint 2010 Search
A Brief Look: Great New Features! Less Filling! Secret Ingredients from Norway!
• Linear scalability
• Support for more languages
• Better relevancy
• Support for 100 million documents per farm
• Federated results on one page (OpenSearch compliant)
• Navigators (navigator counts not displayed)
• Users can tag documents
• SharePoint follows clicks to boost relevancy
• Auto detect languages in documents
• User can increase boosting based on language
• Query completion
• Did you mean…?
• Sub-second response time
• Synonym support (called Aliases)
• Phonetic matching ("Sharten Mickleson" matches "Kjartan Mikkelsen")
• Native 64-bit deployment
• Scaling along all dimensions
• Query processing across multiple servers
• Search dashboard
• Adding content
• Crawl rules
• PowerShell has 128 cmdlets for search, so everything you want to do for search can now be scripted.
• Merges results from multiple nodes
Fast Search Server for 2010
The Future of SharePoint Search: More and Better (did I mention with Secret Ingredients from Norway?)
• Almost everything available in SharePoint 2010
• Lemmatization/Stemming
• Document Thumbnail and Preview
• Visual Best Bets
• People Search with phonetic search
• Federated Search (OpenSearch)
• Single search (federated) across all content
• Relevancy per audience
• Custom GUI per audience is possible
• Location, Language, Role, and Search aware
• Document boosting and blocking (click-through relevancy)
• Document processing pipeline
• Synonyms
• Secure Search
• Dynamic navigators (OOTB and custom)
• Taxonomy
• Breadcrumb navigation
The GUI: Enhancing the Search Experience
You’ve Got Your Search in My Collaboration Platform!
FS4SP
User Interface is visual and actionable
Visual and conversational interaction with precise control
Built on SharePoint Search Center
• Leverages all of the innovations in SharePoint
• Open Web Parts, Federation, query suggestions, related queries, Did you mean?
Visual results connect users with content
• Thumbnails for Word and PowerPoint
• Visual Best Bets highlight premium content
• Preview in browser without leaving the results
Deep Refinement
Thumbnails
Previews
Sort on any field
Similar Results
Map metadata to Managed Properties
Automatic association of metadata to content
Crawled Properties: Standard document metadata discovered by the crawler or extracted from the full text by the FAST Content Processing Pipeline.
• Location: Redmond, WA; Oslo, Norway
• Company: Microsoft; FAST
• Date: January 8, 2008; January 4, 2008
• Concept: Cash tender; Share price
Managed Properties: Map one or more Crawled Properties to a single field. Enables sorting, refinement, relevance tuning and fielded searching.
Crawled Properties
Any data can be found!!
Maps automatically or through Central Administration or PowerShell
Index Profile Managed Properties
Type DocId Title Author Date Size Location Company Concept Body
123 Press Release … 01/08/2008 26K Redmond Microsoft Cash Tender …
345 … … … … … … … …
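The crawled-to-managed mapping described above can be sketched in a few lines of Python. This is only an illustration of the idea that several crawled properties feed one managed property. The crawled property names (`entity.location`, `meta.city` and so on), the `MAPPINGS` table and the `to_managed` function are assumptions made up for the example, not the FAST schema or API. In the product the mapping is done through Central Administration or PowerShell.

```python
# Hypothetical mapping: managed property -> crawled properties that feed it.
# All names here are invented for illustration.
MAPPINGS = {
    "Location": ["entity.location", "meta.city"],
    "Company": ["entity.company"],
    "Date": ["meta.date"],
}


def to_managed(crawled, mappings=MAPPINGS):
    """Collect crawled property values under the managed property they map to."""
    managed = {}
    for name, sources in mappings.items():
        values = []
        for source in sources:
            for value in crawled.get(source, []):
                if value not in values:  # keep first-seen order, drop duplicates
                    values.append(value)
        if values:
            managed[name] = values
    return managed


crawled_doc = {
    "entity.location": ["Redmond, WA", "Oslo, Norway"],
    "meta.city": ["Redmond, WA"],
    "entity.company": ["Microsoft", "FAST"],
    "meta.date": ["January 8, 2008"],
    "meta.unmapped": ["ignored"],  # no mapping, so it never becomes searchable as a field
}
managed_doc = to_managed(crawled_doc)
```

Only the managed properties end up as fields that can be sorted, refined or searched on; a crawled property with no mapping is simply left out.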
How does it work?
Example: Create a custom entity extractor
Customized Extraction Dictionary
• Put your terms in the out-of-the-box extraction dictionaries by modifying an XML file
• Map the crawled property to a managed property
• Index your content
• Modify the refinement panel web part
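To make the dictionary idea concrete, here is a minimal Python sketch of dictionary-based extraction: every term from a list that occurs in the text is reported as a value for a crawled property. The term list, the `products` property name and the `extract_entities` function are hypothetical. The real product reads its dictionary from an XML file whose format is not shown in this deck.

```python
import re

# Hypothetical dictionary of terms to extract.
PRODUCT_TERMS = ["FAST ESP", "SharePoint", "AllTheWeb"]


def extract_entities(text, terms=PRODUCT_TERMS):
    """Return the dictionary terms found in text.

    Matching is case-insensitive and respects word boundaries; results
    come back in dictionary order using the dictionary's spelling.
    """
    found = []
    for term in terms:
        pattern = r"\b" + re.escape(term) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.append(term)
    return found


doc = {"body": "Fast ESP powers large sites; sharepoint hosts the search center."}
# The extracted values would be reported as a crawled property,
# which is then mapped to a managed property and used as a refiner.
doc["products"] = extract_entities(doc["body"])
```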
How does it work?
Add refiners to user interface
Custom Collections
• Built on a SharePoint List or custom extractor
• Edit the Search Center Results Page
• Modify the shared web part by adding tags to the refinement panel XML
• Create your own labels
• Save and Publish
Quickly build a contextual experience
User-based tools for creating results that are relevant to your users
Pick the right ingredients: Match the proper terms and contexts to boost relevancy for targeted users to ensure your users are always finding the right content
• One-way synonyms: Keywords map to other terms
• Two-way synonyms: Keywords become equivalent to other terms
• Best Bets: Highlights key resources that are always relevant to a keyword
• Visual Best Bets: Extend Best Bets with pictures, video, Silverlight controls
• Document Promotion / Demotion: Tailor specific document relevancy
Create new user contexts: Site administrators create contexts based on user profiles to deliver relevant results to the right audiences
Create new keywords: Site administrators have powerful and simple tools to configure the search experience for groups of users
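The difference between one-way and two-way synonyms is easiest to see as query expansion. The Python sketch below is an assumption about how such rules behave in general, not the product's keyword management implementation. The function name and the rule tables are invented, and the "real estate" and "REIT" pair is borrowed from a later slide.

```python
def expand_query(term, one_way, two_way):
    """Return the set of terms to search for, given synonym rules."""
    key = term.lower()
    terms = {key}
    # One-way: the keyword also searches its synonyms, but not the reverse.
    terms.update(one_way.get(key, []))
    # Two-way: every member of a group is equivalent to every other member.
    for group in two_way:
        if key in group:
            terms.update(group)
    return terms


# Hypothetical rules for illustration.
ONE_WAY = {"real estate": ["reit"]}  # "real estate" also finds "reit"
TWO_WAY = [{"erp", "enterprise resource planning"}]

real_estate_terms = expand_query("real estate", ONE_WAY, TWO_WAY)
reit_terms = expand_query("REIT", ONE_WAY, TWO_WAY)
erp_terms = expand_query("ERP", ONE_WAY, TWO_WAY)
```

A search for "real estate" is widened to include "reit", while a search for "REIT" stays as it is. Either spelling of the ERP pair expands to both.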
Deliver results that are contextually relevant
with search that can understand your business and role
“What should I know about selling ERP?” - Alan Brewer, Sales Lead
“What should I know about implementing ERP?” - Renee Lo, Consultant
• Role-specific relevance
• Business-driven refinement
• Targeted Best Bets / Visual Best Bets
Rank Profiles
Tune relevancy without impacting the default algorithm
• Quality: Also known as static rank, consists of multiple managed properties including site, URL depth (preference for shorter URLs), and relative importance of links to this document.
• Authority: Applies when the query word falls in the link or anchor text.
• Query Authority: Maps the popularity of a document, or the click-through rate when documents are clicked as a result of a query.
• Freshness: Increases the relevancy if a document was recently created or modified, based on the last modified property.
• Proximity: Applies to where query terms fall and how close they are to each other within a document.
• Context: Increases the rank of a document if the query term is a managed property associated with that document.
• Managed Property: Affects relevancy when a managed property contains a specific value, such as Woodgrove Bank or Financial Services.
Out of the box relevancy: Tuned for a great general productivity experience; relevancy improves with click-throughs and link text analysis.
Extend the default algorithms: Create new default relevancy models. Blend static and dynamic ranking parameters to instantly improve search results.
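As a rough illustration of "blending" ranking components, the Python sketch below treats a rank profile as a set of weights over the components named above and scores a document as a weighted sum. This is an assumption for illustration only: the deck does not give the actual ranking formula, and the weights, scores and function names here are invented.

```python
# Hypothetical default profile: one weight per ranking component.
DEFAULT_WEIGHTS = {
    "quality": 1.0,
    "authority": 1.0,
    "query_authority": 1.0,
    "freshness": 1.0,
    "proximity": 1.0,
    "context": 1.0,
}


def make_profile(base, **overrides):
    """Create a new profile from an existing one without modifying the original."""
    profile = dict(base)
    profile.update(overrides)
    return profile


def rank(doc_scores, profile):
    """Weighted sum of per-component scores; missing components count as 0."""
    return sum(weight * doc_scores.get(name, 0.0) for name, weight in profile.items())


# A profile for news-like content that favours recently modified documents.
news_profile = make_profile(DEFAULT_WEIGHTS, freshness=3.0)

old_doc = {"quality": 0.9, "proximity": 0.8, "freshness": 0.1}
new_doc = {"quality": 0.5, "proximity": 0.5, "freshness": 0.6}
```

With the default weights the older, higher-quality document scores higher; with `news_profile` the fresher document wins. The default profile itself is left untouched, which is the point of tuning through a separate profile.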
How to create a Rank Profile
IT Pros are empowered to create new profiles quickly
Rank Profiles are created in PowerShell by extending the default relevancy algorithm…
… and are exposed in the user interface by modifying the sorting web part.
Leveraging the platform to build applications
Putting together all of the pieces to build search-driven applications
Back End Processing Tasks:
• Load content from many different places
  • Out of the box connectors for SharePoint, Exchange public folders, and shared files
  • SharePoint Designer to configure connection to customer portfolio/holdings database
• Create custom metadata with content processing pipeline
  • Names of holdings, offerings, key concepts, companies, people
  • Synonyms for key concepts (real estate ~ REIT)
  • Roll-ups configured with optional results collapsing stage
• Create custom relevance profile
Designers can stylize the User Interface
• Apply styles to web parts
  • Federation, People Search, Search actions
• Build custom web parts for visual navigation
• Use SharePoint workflows to perform business specific actions
Simplified, powerful administration
A high-end enterprise search solution that’s easy to deploy and manage
Deploy easily using wizard-driven installation, a topology designer, and native support for 64-bit virtualization
Manage efficiently with full support for Microsoft System Center and PowerShell scripting to automate tasks
Streamline administration with a simplified admin console that helps you manage search services across your enterprise
Architecture
FS4SP
Microsoft’s 2010 Dog-Food Farm
Description: Team Collaboration Portal & Social Networking
Day to day work and internal experiments
Data Set:
• Farm’s Total Data Size: 1.8 TB
• Largest Content Database: 800 GB
• Largest Site Collection Size: 280 GB
• Logging DB Size (14 days): 300 GB
• Number of Web Applications: 4
• Number of Content Databases: 13
• Total number of Site Collections: 7,700 (7,200 my sites)
• # of User Profiles in Profile DB: 193,000
• Total number of Documents: 4 Million
Workload:
• Total number of users per week: 15,200
• Concurrent users (Distinct Users per Minute): ~200
• Total Requests per day: ~7,000,000
• Hourly Average RPS [Requests per Sec]: ~150
• Hourly Max RPS [Requests per Sec]: 270
Search Full Crawl generating ~75%
FAST Search for SharePoint Scaleout
Back-end with extreme and flexible scale out options
Scale-out multiple “dimensions”
• Query Volume
• Content Volume
• Indexing freshness
Redundancy options
• Search
• Indexing
Performance targets*
• 30M Docs/node
• 50 QPS/node
• 35 docs/sec
*Depends on content and hardware specifics
[Diagram labels: Content Volume; Query Volume; Search and Indexing; Crawling and Content Processing; Query and Result Processing]
No theoretical upper bounds!
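The per-node targets above lend themselves to a back-of-the-envelope sizing calculation. The Python sketch below assumes a grid layout in which content is partitioned across columns and each row holds a full copy of the index to absorb query load. That layout, the function name and the `extra_rows` redundancy parameter are assumptions made for the example. The two constants are the slide's targets, which depend on content and hardware specifics.

```python
import math

DOCS_PER_NODE = 30_000_000  # slide target: 30M docs/node
QPS_PER_NODE = 50           # slide target: 50 QPS/node


def plan_search_nodes(total_docs, peak_qps, extra_rows=0):
    """Estimate a columns x rows grid of search nodes.

    columns: partitions needed to hold the content volume
    rows:    index replicas needed for the query volume, plus optional
             extra rows for redundancy (an assumption of this sketch)
    """
    columns = max(1, math.ceil(total_docs / DOCS_PER_NODE))
    rows = max(1, math.ceil(peak_qps / QPS_PER_NODE)) + extra_rows
    return {"columns": columns, "rows": rows, "nodes": columns * rows}


# 100 million documents at a peak of 120 queries per second.
plan = plan_search_nodes(100_000_000, 120)
```

For this input the sketch arrives at 4 columns and 3 rows, or 12 search nodes, before any redundancy is added.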
FAST Search Server 2010
Summary of architectural components
[Architecture diagram; component labels follow]
Custom Front-End
OpenSearch or Other Sources
SharePoint Front-end
People Search
Query Object Model
Query and Result Processing
Security Access Module
Search Core
Indexing
Federation Object Model
Query Web Service
Advanced Content Processing
Linguistics
Web Link Analysis
Connectors
• Web Crawler
• JDBC
Connectors
• SharePoint
• File Traverser
• Web
• BDC
• Exchange
• Notes
• Documentum
Microsoft System Center Operations Manager
Monitoring Services
Administration and Schema Object Model
Site Collection Level Admin UI
• Keyword Management
• User Context Management
• Site Promotion/Demotion
PowerShell
• Schema configuration
• Admin configuration
• Deployment configuration
Central Administration UI
• Property mapping
• Property extraction
• Spell-checking
FAST Server(s)
SharePoint Server(s)
Other Server(s)
Content
Search LOB Systems via BDC/BCS
Enhance SharePoint platform capabilities with out-of-box features, services, and tools that streamline development of solutions with deep integration of External Data and Services.
[Diagram labels: Dynamics, SAP, Siebel, LOB; Web 2.0; Dev Platform; Business Intelligence; Enterprise Content Management; Collaboration; Social; Enterprise Search; Model Store; BDC Runtime; LOB/Doc Binding; Security; Out of box Parts; Office Apps; Cache; Offline Operations; Design Tools; SPD; VSTO; SharePoint; BDC Client Runtime]
Document Processing Pipeline Stages
Default
• Format Conversion: iFilters, OutSideIn
• Language detection and encoding
• Lemmatizer: Linguistics normalization
• Tokenizer: Word breaking
• Entity Extraction: Persons, companies, locations, email, date/time, URL, prices, file names
• DateTimeNormalizer: Date normalization
• Vectorizer: Create document vector for similarity searching
• WebAnalyzer: Anchor text and link cardinality analysis
• PropertiesMapper: Map to crawled properties
• PropertiesReporter: Report detected properties
Optional
• XML Properties mapper
• Offensive Content Filter
• Verbatim extractor: Loads dictionary for custom extraction, e.g. product names
• Field Collapsing
• …
[Diagram labels: Format Conversion; Language Detection; Entity Extraction; Configurable Stages; Mapper]
The different plug-ins can be configured either from the UI or from config files
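The staged design can be pictured as a chain of small functions, each of which reads a document, adds or normalizes some attributes, and hands it to the next stage. The Python sketch below shows that shape only. The stage functions are simplified stand-ins named after a few of the stages listed above, and none of this is the product's pipeline code or configuration format.

```python
import re


def detect_language(doc):
    # Placeholder: a real stage would inspect the text; this one only sets a default.
    doc.setdefault("language", "en")
    return doc


def tokenize(doc):
    # Word breaking: lower-case the body and split it into word tokens.
    doc["tokens"] = re.findall(r"\w+", doc["body"].lower())
    return doc


def extract_emails(doc):
    # A tiny entity extractor: collect anything that looks like an email address.
    doc["emails"] = re.findall(r"[\w.+-]+@[\w-]+\.[\w.-]+", doc["body"])
    return doc


def run_pipeline(doc, stages):
    """Pass the document through each stage in order."""
    for stage in stages:
        doc = stage(doc)
    return doc


DEFAULT_STAGES = [detect_language, tokenize]
stages = DEFAULT_STAGES + [extract_emails]  # an optional stage switched on by configuration
result = run_pipeline({"body": "Contact carlos@example.com about FAST."}, stages)
```

Turning an optional stage on or off is then just a change to the list of stages, which is roughly what configuring plug-ins from the UI or config files amounts to.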
Content Processing and Schema
• Extracted document attributes reported as Crawled Properties
• Crawled Properties mapped to Managed Properties
• Characteristics are defined for Managed Properties, e.g.
  • Refiners
  • Sorting
  • Queryable
  • Type
• Definition and mapping done via UI or PowerShell
[Diagram labels: Admin UI; Schema CmdLets; Custom Client; Schema Object Model; Schema Service (hosted in IIS); Property backend bliss psctrl; configserver; Update Tools; Persistence; Document Processing Pipeline; PropertiesMapper; PropertiesReporter; Update configuration; Alert pipeline of updated schema; Report discovered crawled properties]
Pipeline Extensibility API
Motivation
• Straightforward way to add text analysis functionality
• Flexibility and supportability
Example uses
• Sentiment analysis
• Translation
• Auto-Classification
Mechanism
• Just before Mapper
• “any” binary
• Runs in sandbox with timeout
[Diagram labels: …; Extensibility; Mapper; Standard processing]
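The "any binary, with a timeout" mechanism can be approximated in Python with `subprocess`. This sketch writes the document to a temporary input file, runs an external program with the input and output paths as its last two arguments, and gives up if the program takes too long. The file-passing convention, the function name and the timeout value are assumptions for illustration. The sketch does not implement a sandbox, and it is not the product's extensibility contract.

```python
import os
import subprocess
import sys
import tempfile


def run_external_stage(command, input_text, timeout_s=10.0):
    """Run an external program on a document; return its output, or None on failure.

    The program is called as: command... <input file> <output file>
    """
    with tempfile.TemporaryDirectory() as tmp:
        in_path = os.path.join(tmp, "input.txt")
        out_path = os.path.join(tmp, "output.txt")
        with open(in_path, "w", encoding="utf-8") as f:
            f.write(input_text)
        try:
            # check=True turns a non-zero exit code into an exception;
            # timeout kills the child if it runs for too long.
            subprocess.run(command + [in_path, out_path], timeout=timeout_s, check=True)
        except (subprocess.TimeoutExpired, subprocess.CalledProcessError, OSError):
            return None
        if not os.path.exists(out_path):
            return None
        with open(out_path, encoding="utf-8") as f:
            return f.read()


# Hypothetical "binary": a one-line Python program that upper-cases its input file.
UPPERCASE_CMD = [
    sys.executable,
    "-c",
    "import sys; open(sys.argv[2], 'w').write(open(sys.argv[1]).read().upper())",
]
output = run_external_stage(UPPERCASE_CMD, "sentiment: positive")
```

Returning `None` on a timeout or a crash lets the caller skip the custom step and carry on with standard processing, so one misbehaving binary does not stall indexing.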
Yeah, So What?
Tell Me Something Awesome
SharePoint 2010
• 100 million documents per farm
• Refiners: only uses the first 1000 results
• Search is restricted to one farm
Fast Search Server 2010
• 40 Million Documents per server
• Refiners: exact count from the entire result set
• Content can be indexed and searched across farms
• 3.6 TB of disk space per server (so far!) and support for NAS and SANs.
• Full support for VMs (Hyper-V and VMware)
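The refiner difference above is easy to underestimate, so here is a small Python sketch of what counting over only the first 1000 results does to the numbers. The result set is invented, and the function is an illustration of the counting idea, not either product's refiner implementation.

```python
from collections import Counter


def refiner_counts(results, field, limit=None):
    """Count values of `field`; if limit is set, only the first `limit` results are examined."""
    examined = results if limit is None else results[:limit]
    return Counter(r[field] for r in examined)


# Hypothetical ranked result set: 1,500 hits, the first 1,000 all from one company.
results = [{"company": "Microsoft"}] * 1000 + [{"company": "FAST"}] * 500

shallow = refiner_counts(results, "company", limit=1000)  # first 1000 results only
deep = refiner_counts(results, "company")                 # entire result set
```

With the 1000-result cutoff the "FAST" refiner never appears at all, even though 500 matching documents exist. Counting over the whole result set shows both values with exact counts.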
Is Something Wrong With SharePoint?
There is nothing wrong with SharePoint!
• SharePoint brings together a number of collaborative technologies that would otherwise not play well together
• As SharePoint adoption spreads the need for enterprise search only increases
• Search today is where RDBMSs were over 20 years ago
Let me say that again: there is nothing wrong with SharePoint!
Why Fast Search Instead of SharePoint Search?
The Present
• SharePoint 2010 search addresses a host of previous issues
• No migration path from SP 2010 to Fast Search 2010
The Future
• Where do you think Fast Search Server will be in 3 years (the next release of SharePoint)?
Q and A
You’ve Got Questions
I’ve probably got answers…
Thanks
• The organizers of SharePoint Saturday
• To all of you for attending!
References
Capacity Planning White Paper
• http://www.microsoft.com/downloads/details.aspx?FamilyID=65b799e3-825c-4398-8cd7-3311d3297997&displaylang=en
RSS: FAST Search Server 2010 for SharePoint Newly Published Content
• If you bookmark only one RSS feed for Fast Search Server 2010 this is the one: http://services.social.microsoft.com/feeds/feed/FASTSearchServer2010NewContent
Documentation
• TechNet: http://technet.microsoft.com/en-us/library/ee781286.aspx
MSDN Blogs
• Enterprise Search: http://blogs.msdn.com/b/enterprisesearch/
• Steve Nicolaou, Fast Architect: http://blogs.msdn.com/b/stevennicolaou/
• Jørgen's FAST Search Blog: http://blogs.msdn.com/b/jorgeni/
• Dark Corners: http://blogs.msdn.com/b/dark_corners/
Enterprise Search User Group
• Second Wednesday of every month! You missed July! Don’t miss August!
Other
• Case Study: Search and the FBI Sentinel Program
• Author: Marti Hearst, Search User Interfaces (http://www.searchuserinterfaces.com/)
• Next Generation Tools: Content Transformation Service/Interaction Management Service
© 2009 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.
The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.
microsoft.com / Enterprise Search
Thank You.