The Inside Scoop: How Microsoft Built a Scale Lab with 120 Million Items across two 15 TB Content Databases
Barry Waldbaum, FAST Architect, Microsoft Corporation
Paul J. Learning, Sr. Consultant, Microsoft Corporation
Paul Andrew, Sr. Technical Product Manager, Microsoft Corporation
SPC399
Session Objectives and Takeaways
Describe the very large scale test lab we built for SharePoint
  The test lab was shown in the keynote by Jeff Teper and Richard Riley
Present test results for 6 series of testing on the populated farm
Review the architecture for SharePoint and FAST
Identify lessons learned from building a large-scale environment
Discuss the tools used to create and load content and to run performance tests
Project Overview and Results
Paul Andrew, Sr. Technical Product Manager
Scale Lab Test Goals
Demonstrate a very large SharePoint farm
  Example of the new SharePoint boundaries and limits
Enterprise Content Management (ECM) document archive scenario
  Use average Office document types
  Largest scale limits are document archive focused
Scale out across multiple content databases
  Adds scale out and scale up
Test SharePoint without limits on hardware or storage resources
Index content with FAST Search
Load test with 15,000 concurrent users
Test upgrading on a very large farm
Multiple SharePoint Content Databases
Content database: 60 million items
Scale out permits multiple content databases
New docs saved to drop box
Content routing rules
Separate content databases
Index all content with FAST
[Diagram labels: Documents; Drop Box Document Library; Content Routing; Archive Content Database(s); FAST Search Index]
Software Boundaries and Limits Impacted
New boundaries and limits for SharePoint released in July 2011
  SharePoint can scale to any customer requirement
  Partly thanks to this test lab
Up to 200 GB supported as before
Up to 4 TB supported for ALL scenarios with requirements guidance
Unlimited size supported for Document Archive scenarios with requirements guidance
New limit of 60 million items in a content database
5 TB SQL Server database instance limit is removed
Remote Blob Storage (RBS) does not alter these limits
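The size and item limits listed on this slide can be expressed as a small check. This is only an illustrative sketch: the function name, the return strings, and the choice of 1 TB = 1024 GB are assumptions, and the actual support statement depends on the requirements guidance in the published boundaries and limits article.

```python
# Limits as stated on the slide; 1 TB is assumed to be 1024 GB here.
ITEM_LIMIT = 60_000_000
SIZE_AS_BEFORE_GB = 200
SIZE_ALL_SCENARIOS_GB = 4 * 1024


def content_db_support(size_gb: float, items: int, document_archive: bool) -> str:
    """Classify a content database against the July 2011 limits (illustrative)."""
    if items > ITEM_LIMIT:
        return "over the 60 million item limit"
    if size_gb <= SIZE_AS_BEFORE_GB:
        return "supported as before"
    if size_gb <= SIZE_ALL_SCENARIOS_GB:
        return "supported with requirements guidance"
    if document_archive:
        return "supported for document archive with requirements guidance"
    return "over 4 TB and not a document archive"


if __name__ == "__main__":
    # One of the lab's 15 TB archive databases holding 60 million items.
    print(content_db_support(15 * 1024, 60_000_000, document_archive=True))
```

The item check comes first because the 60 million item limit applies regardless of database size.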
Value of Remote Blob Storage (RBS)
RBS allows Binary Large Objects (BLOBs) to be stored outside SQL Server
  Reduces the size of the SQL Server database to metadata only
  This may be just 5% of the total SharePoint content database
RBS does not alter SharePoint content size limits
  BLOBs and metadata must be synchronized during backup/restore
  Storage must return time to first byte (TTFB) under 20 ms
  RBS extensions must use supported SharePoint APIs and not do direct SQL database access
RBS benefits
  Allows use of NAS (with iSCSI)
  ISVs adding tiered storage
  ISVs adding custom backup and restore and other management features
  Performance improvements have been seen with files larger than 1 MB
  Useful in write-once archive scenarios
We didn't use RBS in this test lab
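To make the 5% figure and the 20 ms requirement concrete, here is a rough arithmetic sketch. The lab did not use RBS, so applying the split to a 15 TB database is purely hypothetical, and the function names are invented for illustration.

```python
from typing import Tuple


def rbs_split_tb(total_tb: float, metadata_fraction: float = 0.05) -> Tuple[float, float]:
    """Split a content database into (metadata kept in SQL Server, BLOBs moved out).

    The 5% default is the "may be just 5%" figure from the slide, not a guarantee.
    """
    if not 0.0 <= metadata_fraction <= 1.0:
        raise ValueError("metadata_fraction must be between 0 and 1")
    metadata_tb = total_tb * metadata_fraction
    return metadata_tb, total_tb - metadata_tb


def ttfb_ok(ttfb_ms: float, limit_ms: float = 20.0) -> bool:
    """True when the measured time to first byte is under the stated limit."""
    return ttfb_ms < limit_ms


if __name__ == "__main__":
    metadata_tb, blob_tb = rbs_split_tb(15.0)
    print(f"metadata in SQL Server: {metadata_tb:.2f} TB, BLOB store: {blob_tb:.2f} TB")
    print("12 ms storage acceptable:", ttfb_ok(12.0))
```

Note that the content size limits above still count the full 15 TB, since RBS does not alter them.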
The report with all this detail was published on Monday
BulkLoader Utility
  Up to 10 million unique Word, Excel, PowerPoint and HTML documents
  Variable size (250 KB used in lab effort)
  .NET Framework 4.0, OpenXML 2.0 SDK and Wikipedia dump file
LoadBulk2SP Utility
  4 processes containing 16 threads each, targeting a unique DL (document library)
  Mimics folder/file hierarchy from file system
  Loads using the SPFileCollection.Add() method
  Top load achieved was 233 documents/second
  Average load achieved was 127 documents/second
  http://code.msdn.microsoft.com/Load-Bulk-Content-to-3f379974
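The loading pattern described for LoadBulk2SP (walk a folder tree, keep the same hierarchy, push files from 16 concurrent threads, report documents per second) can be sketched as below. This is not the utility's code: the real tool is .NET and calls SPFileCollection.Add(), while here `upload` is a hypothetical callback supplied by the caller and all function names are assumptions.

```python
import os
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple


def plan_uploads(root: str) -> List[Tuple[str, str]]:
    """Return (local_path, library_relative_path) pairs mirroring the folder tree."""
    plan = []
    for folder, _dirs, files in os.walk(root):
        for name in sorted(files):
            local = os.path.join(folder, name)
            relative = os.path.relpath(local, root).replace(os.sep, "/")
            plan.append((local, relative))
    return plan


def bulk_load(root: str, upload: Callable[[str, str], None], threads: int = 16) -> Tuple[int, float]:
    """Upload every file under root using a thread pool; return (count, docs/second)."""
    plan = plan_uploads(root)
    started = time.perf_counter()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # list() waits for all uploads and re-raises the first upload error, if any.
        list(pool.map(lambda item: upload(*item), plan))
    elapsed = time.perf_counter() - started
    rate = len(plan) / elapsed if elapsed > 0 else float("inf")
    return len(plan), rate


if __name__ == "__main__":
    # Stand-in upload that only prints; a real loader would call the server here.
    count, rate = bulk_load(".", lambda local, relative: print("upload", relative))
    print(f"{count} documents at {rate:.0f} documents/second")
```

Running four such loaders as separate processes, each pointed at its own document library, would correspond to the 4 x 16 layout on the slide.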
Full Farm Failover
Reconstructed and connected to original SAN and Virtual Network
Lessons Learned for SharePoint
Purely out-of-box installation for large-scale lab
  No caching enabled
  No thresholds
  No site quotas
The recommendation of 2 IOPS per GB proved adequate
SPFileCollection.Add() vs. SPFolder.CopyTo()
  Add achieved a maximum of 233 documents/second with 16 concurrent threads
  CopyTo achieved a maximum of 31 documents/second
Loopback check registry key
  Create the registry value and set it to disable the loopback check
  HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Lsa\
  DisableLoopbackCheck=1
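Two of the lessons above lend themselves to a short sketch: sizing storage at 2 IOPS per GB, and writing the loopback check setting as a .reg file that can be reviewed before import. The function names are assumptions, 1 TB is taken as 1024 GB, and the value is written as a DWORD, which is the usual type for this setting.

```python
def required_iops(size_gb: float, iops_per_gb: float = 2.0) -> float:
    """Storage IOPS target for a database of the given size (2 IOPS per GB on the slide)."""
    return size_gb * iops_per_gb


def render_reg_dword(key_path: str, value_name: str, value: int) -> str:
    """Render the text of a .reg file that sets a single DWORD value."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("a DWORD must fit in 32 bits")
    return (
        "Windows Registry Editor Version 5.00\n"
        "\n"
        f"[{key_path}]\n"
        f'"{value_name}"=dword:{value:08x}\n'
    )


if __name__ == "__main__":
    # One 15 TB content database at 2 IOPS per GB.
    print(required_iops(15 * 1024))
    print(render_reg_dword(
        r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Lsa",
        "DisableLoopbackCheck",
        1,
    ))
```

Generating the file rather than writing the registry directly keeps the change easy to review and to apply to each server in the farm.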
Lessons Learned for SQL Server
SQL Server MAXDOP=1; default installation value is 0
Multiple LUNs on SAN and one virtual CPU to each LUN
Database segregation to unique LUNs, spindles and CPUs
Reduced SQL Server RAM to 600 GB
Table index fragmentation (bulk load only)
  SharePoint timer job (Health Analyzer) did not function correctly
    Microsoft.SharePoint.Administration.Health.DatabasesAreFragmented
  Table indexes closely monitored during content loading
  Determined the indexes most impacted by load and created a SQL stored procedure to execute ALTER INDEX for dynamic rebuilds
  Stored procedure also executes a job to update statistics
  Procedure can be dynamically run at load start (application configuration)
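The index maintenance idea (rebuild the indexes that fragment under bulk load, then update statistics) might look roughly like this when the statements are generated outside the database. This is a sketch, not the lab's stored procedure: the 5% and 30% thresholds are commonly quoted guidance assumed here rather than values from the talk, and the index names in the example are hypothetical.

```python
from typing import Iterable, List, NamedTuple


class IndexStat(NamedTuple):
    schema: str
    table: str
    index: str
    fragmentation_pct: float


def quote_name(name: str) -> str:
    """Bracket-quote a SQL Server identifier, escaping any closing bracket."""
    return "[" + name.replace("]", "]]") + "]"


def maintenance_statements(
    stats: Iterable[IndexStat],
    reorganize_at: float = 5.0,   # assumed threshold
    rebuild_at: float = 30.0,     # assumed threshold
) -> List[str]:
    """Build ALTER INDEX statements, then UPDATE STATISTICS for each touched table."""
    statements: List[str] = []
    touched: List[str] = []
    for stat in stats:
        target = f"{quote_name(stat.schema)}.{quote_name(stat.table)}"
        if stat.fragmentation_pct > rebuild_at:
            action = "REBUILD"
        elif stat.fragmentation_pct >= reorganize_at:
            action = "REORGANIZE"
        else:
            continue  # fragmentation too low to be worth the I/O
        statements.append(f"ALTER INDEX {quote_name(stat.index)} ON {target} {action};")
        if target not in touched:
            touched.append(target)
    statements.extend(f"UPDATE STATISTICS {target};" for target in touched)
    return statements


if __name__ == "__main__":
    sample = [
        IndexStat("dbo", "AllDocs", "IX_Hypothetical_A", 45.0),
        IndexStat("dbo", "AllDocs", "IX_Hypothetical_B", 12.0),
    ]
    for line in maintenance_statements(sample):
        print(line)
```

The fragmentation figures would come from monitoring during the load, which the slide notes had to be done manually because the Health Analyzer rule did not function correctly.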
FAST Search for SharePoint
Barry Waldbaum, MCS Architect
FAST Search Server 2010 for SharePoint
Built on SharePoint Search Center
  Leverages all of the innovations in SharePoint
  Open Web Parts, federation, query suggestions, related queries, "Did you mean?"
Visual results connect users with content
  Thumbnails for Word and PowerPoint
  Visual Best Bets highlight premium content
  Preview in browser without leaving the results
Deep Refinement
Thumbnails
Previews
Sort on any field
Similar Results
Why was this interesting to me?
Big goals
Access to big iron!
Virtualized hardware and storage
SharePoint topology
Crawling SharePoint vs. file share content
Monitoring at this scale
FAST Topology
2 physical nodes for document processing
4 VMs (16 GB + 4 vCPUs)
  Index, Search, Web analyzer
  Disks:
    C: 128 GB VHD (not expanded, < 40 GB used)
    E: 3 TB LUN
  IO observed:
    100 MB/s reads, 100 MB/s writes, 1K IOPS
SharePoint topology for FAST Search
2 crawl components + 2 query components
VM specs:
SharePoint Crawler Configuration
Registry settings on crawler nodes
  HKLM\SOFTWARE\Microsoft\Office Server\14.0\Search\Global\Gathering Manager
  FilterProcessMemoryQuota
    Default 100 MB, changed to 200 MB
  DedicatedFilterProcessMemoryQuota
    Default 100 MB, changed to 200 MB
Monitoring the crawler via perfmon
  OSS FAST plugin: Batches Open, Ready, Submitted, Failed
Incremental crawl
  Can take an hour to kick off, high database load
120M items: crawl database stays under 600 GB
Overall crawl rate around 70 documents per second (DPS)
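The quoted crawl rate gives a feel for how long a full crawl of the lab content takes. The sketch below is plain arithmetic under the assumption that the overall rate of about 70 DPS is sustained for the whole crawl; the function name is made up for illustration.

```python
SECONDS_PER_DAY = 86_400


def crawl_days(total_items: int, docs_per_second: float, already_crawled: int = 0) -> float:
    """Days needed to crawl the remaining items at a steady rate."""
    if docs_per_second <= 0:
        raise ValueError("docs_per_second must be positive")
    remaining = max(total_items - already_crawled, 0)
    return remaining / docs_per_second / SECONDS_PER_DAY


if __name__ == "__main__":
    # 120 million items at roughly 70 documents per second: a bit under 20 days.
    print(f"{crawl_days(120_000_000, 70):.1f} days")
```

A crawl of that length is one reason the later slides recommend checking on it several times a day and pausing it for system maintenance.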
FAST Search Lessons Learned
We can run on big iron
FAST can run on VMs, but physical nodes do have advantages
The SAN performed very well
Monitor the crawl at least 3 times a day
  SCOM
  SharePoint
  Perfmon
  FAST command line tools
Backup of the index is not recommended at scale
Monitoring inside FAST
FAST has lots of tools for monitoring what's going on!
rc -r | select-string "# doc"
  How busy are the document processors
Monitoring crawl queue size
  Use reporting or SQL Server Management Studio to see MSCrawlURL
indexerinfo -a doccount
  Make sure all indexers are reporting; see how many are indexed in 1000 seconds
indexerinfo -a status
  Monitor the health of the indexers and partition layout
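The doccount check (are all indexers reporting, and how many documents were indexed in 1000 seconds) comes down to comparing two samples. The sketch below assumes the per-indexer counts have already been collected into dictionaries by hand or by a script; it does not parse real indexerinfo output, and the indexer names in the example are hypothetical.

```python
from typing import Dict, List, Tuple


def indexing_progress(
    before: Dict[str, int],
    after: Dict[str, int],
    interval_seconds: float = 1000.0,
) -> Tuple[float, List[str]]:
    """Return (documents indexed per second, indexers missing from the later sample).

    An indexer that appears only in the later sample is counted from zero.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    silent = sorted(name for name in before if name not in after)
    indexed = sum(count - before.get(name, 0) for name, count in after.items())
    return indexed / interval_seconds, silent


if __name__ == "__main__":
    first = {"indexer-a": 1_000, "indexer-b": 2_000}
    second = {"indexer-a": 36_000}  # indexer-b did not report this time
    rate, silent = indexing_progress(first, second)
    print(f"{rate:.1f} documents/second, not reporting: {silent}")
```

Comparing the result with the overall crawl rate is a quick way to see whether indexing is keeping up with the crawler.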
FAST Search Tips and tricks
The limit of document processors per node is 20
  Can be increased if procserver_21 is stopped
  50 ran successfully on the physical nodes
System maintenance during a crawl: pause the crawl
Do not ignore the capacity planning guide
  Make sure your hardware is spec'd to the minimums
Admin node makes a great VM!
Test Report (http://go.microsoft.com/fwlink/?LinkId=229493)
SharePoint Server 2010 capacity management: Software boundaries and limits (http://technet.microsoft.com/en-us/library/cc262787.aspx)
Estimate performance and capacity requirements for large scale document repositories in SharePoint Server 2010 (http://technet.microsoft.com/en-us/library/hh395916.aspx)
Storage and SQL Server capacity planning and configuration (SharePoint Server 2010) (http://technet.microsoft.com/en-us/library/cc298801.aspx)
SharePoint Performance and Capacity Planning Resource Center on TechNet (http://technet.microsoft.com/en-us/office/sharepointserver/bb736741)
Best practices for virtualization (SharePoint Server 2010) (http://technet.microsoft.com/en-us/library/hh295699.aspx)
Best practices for SQL Server 2008 in a SharePoint Server 2010 farm (http://technet.microsoft.com/en-us/library/hh292622.aspx)
Best practices for capacity management for SharePoint Server 2010 (http://technet.microsoft.com/en-us/library/hh403882.aspx)
Performance and Capacity Recommendations for FAST Search Server 2010 for SharePoint (http://technet.microsoft.com/en-us/library/gg702613.aspx)