{ Integration Services Best Practices} Itay Braun BI and SQL Server Consultant Email: [email protected] Blog: http://blogs.microsoft.co.il/blogs/itaybraun/
Page 1: SSIS Best Practices, Israel BI User Group, Itay Braun

{ Integration Services Best Practices}
Itay Braun

BI and SQL Server Consultant

Email: [email protected]
Blog: http://blogs.microsoft.co.il/blogs/itaybraun/

BI User Group Messages

New website for SQL Server in Hebrew: www.sqlserver.co.il
Twingo is looking for experienced BI / SQL Server developers with at least two years of experience. Please contact [email protected] for more details.
If you are looking for employees or looking for a job, please contact Yossi Elkayam [email protected]

Agenda

If it moves – Log it!
Establishing performance baseline
Package Configuration
Lookup Optimization
Data Profiling
Other tips and tricks

If It moves – Log it!

SSIS Log Providers
Event handlers
Analyzing the data
Don’t forget the jobs

SSIS Log Providers

Used to capture run-time information about a package
Helps to audit and troubleshoot a package every time it is run
Integration Services includes the following log providers:

The Text File log provider (CSV)
The SQL Server Profiler log provider
The SQL Server log provider (sysssislog table)
The Windows Event log provider
The XML File log provider

SSIS Log Providers

All tasks share the same basic events
Each task also has unique events

Custom Logging Using Event Handlers

Build the logging table and the event handlers manually
Allows better control over the collected data
For example:

Row count
An important step has finished
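The idea of a hand-built logging table can be sketched in a few lines. This is only an illustration, assuming a hypothetical table `etl_custom_log` and hypothetical package and step names; it uses Python with an in-memory SQLite database as a stand-in, whereas in a real package the same INSERT would typically be issued from an Execute SQL Task inside an event handler.

```python
import sqlite3
from datetime import datetime, timezone


def create_log_table(conn):
    # Hypothetical table layout: one row per custom event we care about.
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS etl_custom_log (
            id           INTEGER PRIMARY KEY AUTOINCREMENT,
            package_name TEXT NOT NULL,
            step_name    TEXT NOT NULL,
            event_name   TEXT NOT NULL,
            row_count    INTEGER,
            logged_at    TEXT NOT NULL
        )
        """
    )


def log_event(conn, package_name, step_name, event_name, row_count=None):
    # Record a single custom event; row_count stays NULL when not relevant.
    conn.execute(
        "INSERT INTO etl_custom_log "
        "(package_name, step_name, event_name, row_count, logged_at) "
        "VALUES (?, ?, ?, ?, ?)",
        (package_name, step_name, event_name, row_count,
         datetime.now(timezone.utc).isoformat()),
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
create_log_table(conn)
log_event(conn, "LoadSales", "Load fact table", "RowCount", row_count=125000)
log_event(conn, "LoadSales", "Load fact table", "StepFinished")

rows = conn.execute(
    "SELECT step_name, event_name, row_count FROM etl_custom_log ORDER BY id"
).fetchall()
print(rows)
```

Keeping the row count in its own nullable column makes it easy to chart load volumes over time without parsing message text.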

Event Handlers

A simple SSIS package within the package
Mostly used to respond to OnError events

Logging and sending email

Analyzing the Data

SQL 2008 – sysssislog table
http://technet.microsoft.com/en-us/library/ms186984.aspx

SQL 2005 – sysdtslog90
http://msdn.microsoft.com/en-us/library/ms186984(SQL.90).aspx

Analyze:

Total execution time
SSAS partition processing time
Errors and Warnings
Time elapsed between PackageStart and PackageEnd
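The analysis listed above can be sketched as follows. This is a minimal illustration, assuming the log rows have already been fetched into dictionaries keyed by the sysssislog column names `executionid`, `event`, `starttime` and `endtime`; the sample rows and execution ids are invented for the example.

```python
from collections import Counter
from datetime import datetime


def package_durations(log_rows):
    """Seconds elapsed between PackageStart and PackageEnd per execution."""
    starts, ends = {}, {}
    for row in log_rows:
        if row["event"] == "PackageStart":
            starts[row["executionid"]] = row["starttime"]
        elif row["event"] == "PackageEnd":
            ends[row["executionid"]] = row["endtime"]
    # Executions without a PackageEnd row (still running or crashed) are skipped.
    return {
        eid: (ends[eid] - starts[eid]).total_seconds()
        for eid in starts
        if eid in ends
    }


def count_problems(log_rows):
    """Count OnError and OnWarning rows per execution."""
    return Counter(
        (row["executionid"], row["event"])
        for row in log_rows
        if row["event"] in ("OnError", "OnWarning")
    )


def make_row(executionid, event, when):
    # Helper for the invented sample data below.
    return {"executionid": executionid, "event": event,
            "starttime": when, "endtime": when}


log_rows = [
    make_row("run-1", "PackageStart", datetime(2008, 12, 1, 2, 0, 0)),
    make_row("run-1", "OnWarning", datetime(2008, 12, 1, 2, 0, 40)),
    make_row("run-1", "PackageEnd", datetime(2008, 12, 1, 2, 1, 30)),
    make_row("run-2", "PackageStart", datetime(2008, 12, 2, 2, 0, 0)),
    make_row("run-2", "OnError", datetime(2008, 12, 2, 2, 0, 5)),
]

durations = package_durations(log_rows)
problems = count_problems(log_rows)
print(durations, dict(problems))
```

An execution that has a PackageStart but no PackageEnd is itself worth reporting, since it usually means the package is still running or ended abnormally.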

Don’t forget the jobs

Don’t forget to monitor the execution of the ETL jobs. Use Reporting Services to write simple reports about the ETL execution process.

Performance Counters

Understanding resource utilization
CPU Bound
Memory Bound
I/O Bound
Network Bound

Performance Counters - CPU

Processor time
Process / % Processor Time (Total)

sqlservr.exe and dtexec.exe
Do the tasks run in parallel?

Performance Counters – Memory

Process / Private Bytes (DTEXEC.exe) – The amount of memory currently in use by Integration Services.
Process / Working Set (DTEXEC.exe) – The total amount of memory allocated by Integration Services.
SQL Server: Memory Manager / Total Server Memory – The total amount of memory allocated by SQL Server.
Memory / Page Reads / sec – Represents the total memory pressure on the system.

If this consistently goes above 500, the system is under memory pressure.
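One way to turn the "consistently above 500" rule into a baseline check is sketched below. The threshold of 500 comes from the slide; the interpretation of "consistently" as "at least 90% of the collected samples" is an assumption made only for this example, as is the function name.

```python
def is_under_memory_pressure(page_reads_per_sec, threshold=500, min_share=0.9):
    """Return True when most samples of Page Reads/sec exceed the threshold.

    min_share is an assumed definition of "consistently": the share of
    samples that must be above the threshold.
    """
    if not page_reads_per_sec:
        return False
    above = sum(1 for value in page_reads_per_sec if value > threshold)
    return above / len(page_reads_per_sec) >= min_share


steady_load = [620, 710, 655, 800]   # every sample above 500
single_spike = [100, 900, 120, 80]   # one spike only

print(is_under_memory_pressure(steady_load))
print(is_under_memory_pressure(single_spike))
```

Looking at the share of samples rather than the maximum avoids flagging a single short spike as sustained pressure.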

Performance Counters - Memory

SSIS Pipeline / Buffers in use – The number of pipeline buffers in use throughout the pipeline.
SSIS Pipeline / Buffers spooled – The number of buffers spooled to disk. Buffers spooled has an initial value of 0. When it goes above 0, it indicates that the engine has started memory swapping.
SSIS Pipeline / Rows read – The number of rows read from all data sources in total.
SSIS Pipeline / Rows written – The number of rows written to all data destinations in total.

Performance Counters – I/O

Integration Services should write to disk as little as possible: SSIS should only hit the disk when it reads from the source and writes to the target.
For SAN / NAS, use the vendor's monitoring applications.

Performance Counters - Network

SSIS moves data as fast as the network is able to handle it.
Network Interface / Current Bandwidth: This counter provides an estimate of current bandwidth.
Network Interface / Bytes Total / sec: The rate at which bytes are sent and received over each network adapter.
Network Interface / Transfers/sec: Tells how many network transfers per second are occurring.

If it is approaching 40,000 IOPs, then get another NIC and use teaming between the NICs.

Package Configuration

The package needs to know where it is moving data from and where it is moving data to.
Typically, Integration Services packages are built in a different environment from the one where they are executed in production.

Package Configuration

Objects which can be configured:
Tasks
Containers
Variables
Connection Managers
Data Flow Components

Configuration Types

XML Configuration File
Most popular configuration type
Easy deployment
Disadvantage - Path to the .dtsconfig file must be hard coded within the package

Environment Variable Configuration
Takes the value for a property from whatever is stored in a named environment variable
Stores the property path inside the package and the value outside the package
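For the XML configuration file described on this slide, the sketch below generates a minimal file with Python's standard library. The element and attribute names (`DTSConfiguration`, `Configuration`, `ConfiguredType`, `Path`, `ValueType`, `ConfiguredValue`) follow the layout of designer-generated files as far as I recall and should be checked against a file produced by the Package Configuration wizard; the connection manager name `SourceDb` and the server name are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_dtsconfig(entries):
    """Build a minimal XML configuration document from {property path: value}."""
    root = ET.Element("DTSConfiguration")
    for property_path, value in entries.items():
        configuration = ET.SubElement(
            root,
            "Configuration",
            {"ConfiguredType": "Property", "Path": property_path, "ValueType": "String"},
        )
        ET.SubElement(configuration, "ConfiguredValue").text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical connection manager and server names, for illustration only.
xml_text = build_dtsconfig({
    r"\Package.Connections[SourceDb].Properties[ConnectionString]":
        "Data Source=DEVSQL01;Initial Catalog=Staging;Integrated Security=SSPI;",
})
print(xml_text)
```

The property path lives in the file next to the value, which is why one file per environment is enough to retarget a package.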

Configuration Types

Parent Package Configuration
Fetches a value from a variable in a calling package
Stores the property path inside the package and the value outside the package

Registry Configuration
The value to be applied to a package property is stored in a registry entry
Stores the property path inside the package and the value outside the package

Configuration Types

SQL Server Configuration
Stored in a SQL Server table. The table can have any name you like and can be in any database on any server you like.
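The shape of such a configuration table can be sketched as below. The table name `SSIS Configurations` and the four columns are the defaults the configuration wizard offers, as far as I recall; the filter name, property path and values are hypothetical, and an in-memory SQLite database stands in for SQL Server so that the example runs anywhere.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE "SSIS Configurations" (
        ConfigurationFilter TEXT NOT NULL,  -- groups the rows that belong together
        ConfiguredValue     TEXT,           -- value applied to the property
        PackagePath         TEXT NOT NULL,  -- property path inside the package
        ConfiguredValueType TEXT NOT NULL   -- data type of the value
    )
    """
)
conn.execute(
    'INSERT INTO "SSIS Configurations" VALUES (?, ?, ?, ?)',
    (
        "LoadSales",  # hypothetical filter name
        "Data Source=PRODSQL01;Initial Catalog=DW;Integrated Security=SSPI;",
        r"\Package.Connections[TargetDb].Properties[ConnectionString]",
        "String",
    ),
)


def load_configuration(conn, configuration_filter):
    """Return {property path: value} for one configuration filter."""
    cursor = conn.execute(
        'SELECT PackagePath, ConfiguredValue FROM "SSIS Configurations" '
        "WHERE ConfigurationFilter = ?",
        (configuration_filter,),
    )
    return dict(cursor.fetchall())


settings = load_configuration(conn, "LoadSales")
print(settings)
```

Unlike the other configuration types, both the property path and the value are stored outside the package here, so several packages can share the same rows through a common filter.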

Configuration Best Practices

Consider command-line options as an alternative to configurations

The /SET option is used to apply a value to a property in the package that is being run
The /CONFIGFILE option is used to tell the package to use an XML configuration file, even if one has not been defined in the package

Configure Only the ConnectionString Property for Connection Managers

Instead of ServerName, InitialCatalog, UserName, Password

Don’t save the password in XML files
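The command-line alternative mentioned on this slide can be sketched as a small helper that assembles a dtexec argument list. The `/FILE`, `/SET` and `/CONFIGFILE` options belong to dtexec; the helper function, the package file, the variable name and the configuration file name are hypothetical. Values that contain semicolons or spaces (connection strings, for example) may need additional quoting that this sketch does not handle.

```python
def build_dtexec_args(package_path, property_overrides=None, config_file=None):
    """Assemble a dtexec argument list; nothing is executed here."""
    args = ["dtexec", "/FILE", package_path]
    if config_file:
        # Apply an XML configuration file even if the package defines none.
        args += ["/CONFIGFILE", config_file]
    for property_path, value in (property_overrides or {}).items():
        # /SET takes "propertyPath;value" as a single argument.
        args += ["/SET", f"{property_path};{value}"]
    return args


args = build_dtexec_args(
    "LoadSales.dtsx",
    property_overrides={
        r"\Package.Variables[User::BatchDate].Properties[Value]": "2008-12-31",
    },
    config_file="Production.dtsConfig",
)
print(" ".join(args))
```

The resulting list could be handed to a scheduler or to `subprocess.run` on a machine where dtexec is installed.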

Lookup Optimization

Use the NOLOCK or TABLOCK hints to remove locking overhead
To optimize memory usage, SELECT only the columns you actually need
If possible, perform datetime conversions at the source or target databases, as it is more expensive to perform within Integration Services.
In SQL Server 2008 Integration Services, there is a new feature of the shared lookup cache.

Lookup Optimization

Commit size 0 is fastest on heap bulk targets

because only one transaction is committed

If commit size = 0 is not possible, use the highest possible value of commit size

to reduce the overhead of multiple-batch writing

Commit size = 0 is a bad idea if inserting into a B-tree

all incoming rows must be sorted at once into the target B-tree
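The arithmetic behind the commit size advice can be made explicit with a short sketch. It assumes, as the slide states, that a commit size of 0 means the whole load is committed as a single transaction; the function name and the row counts are only examples.

```python
import math


def commit_count(total_rows, commit_size):
    """Number of commits needed to load total_rows with a given commit size."""
    if total_rows == 0:
        return 0
    if commit_size == 0:
        # Commit size 0: everything goes in as one transaction.
        return 1
    return math.ceil(total_rows / commit_size)


total_rows = 10_000_000
for size in (0, 1_000_000, 100_000, 4_000):
    print(f"commit size {size:>9,}: {commit_count(total_rows, size):>5,} commits")
```

Fewer commits mean less per-batch overhead, which is why larger values are preferred when 0 is not possible; the trade-off is that each batch holds more rows, and more locks, in a single transaction.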

Lookup Optimization

Batchsize = 0 is ideal for inserting into a heap.

For an indexed destination, I recommend testing between 100,000 and 1,000,000 as batch size.

Use a commit size of <5000 to avoid lock escalation when inserting
Use partitions and partition SWITCH command
More info here: Getting Optimal Performance with Integration Services Lookups.

Data Profiling

New Feature in SSIS 2008
Used to profile the data

Null values
Values distribution
Column length
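To make the three profile types concrete, the sketch below computes them for a single column in plain Python. It is not the Data Profiling task itself, only an illustration of the kind of numbers such a profile reports; the function name and the sample values are invented.

```python
from collections import Counter


def profile_column(values):
    """Compute null count, value distribution and length range for one column."""
    non_null = [value for value in values if value is not None]
    lengths = [len(str(value)) for value in non_null]
    return {
        "null_count": len(values) - len(non_null),
        "null_ratio": (len(values) - len(non_null)) / len(values) if values else 0.0,
        "value_distribution": Counter(non_null),
        "min_length": min(lengths) if lengths else None,
        "max_length": max(lengths) if lengths else None,
    }


country_codes = ["IL", "US", None, "IL", "GBR"]
profile = profile_column(country_codes)
print(profile)
```

In this sample the maximum length of 3 in a column that is expected to hold two-letter codes is exactly the kind of anomaly that profiling is meant to surface before the load is designed.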

Other tips

Make data types as narrow as possible so you will allocate less memory for your transformation
Watch precision issues when using the money, float, and decimal types.

money is faster than decimal, and money has fewer precision considerations than float
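The precision concern with float can be demonstrated in a few lines. In this sketch Python's binary `float` stands in for the float type and `decimal.Decimal` stands in for exact fixed-point types such as money and decimal; this is an analogy for illustration, not a statement about how SSIS stores these types internally.

```python
from decimal import Decimal

# Ten payments of 0.10 each, summed as binary floating point.
float_total = sum([0.1] * 10)

# The same payments summed as exact decimals with four decimal places.
decimal_total = sum([Decimal("0.1000")] * 10, Decimal("0"))

print(float_total == 1.0)             # False: rounding error has accumulated
print(decimal_total == Decimal("1"))  # True: the decimal sum is exact
```

The error in a single float value is tiny, but it accumulates over millions of rows and can make totals fail to reconcile with the source system.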

Other Tips

Do not sort within Integration Services unless it is absolutely necessary.

In order to perform a sort, Integration Services allocates the memory space of the entire data set that needs to be transformed

There are times where using Transact-SQL will be faster than processing the data in SSIS.

As a general rule, any and all set-based operations will perform faster in Transact-SQL.

Other Tips

To perform delta detection, you can use a change detection mechanism such as the new SQL Server 2008 Change Data Capture (CDC) functionality
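The consuming side of delta detection can be sketched as follows. The sketch assumes change rows shaped like those in a CDC change table, where the `__$operation` column uses 1 for delete, 2 for insert, 3 for the row image before an update and 4 for the row image after an update; the `apply_changes` helper, the key column and the sample rows are hypothetical.

```python
def apply_changes(target, changes, key="id"):
    """Apply CDC-style change rows to a target keyed by the given column."""
    for change in changes:
        operation = change["__$operation"]
        # Strip the CDC metadata columns, keep only the business columns.
        row = {name: value for name, value in change.items()
               if not name.startswith("__$")}
        if operation == 1:            # delete
            target.pop(row[key], None)
        elif operation in (2, 4):     # insert, or row image after an update
            target[row[key]] = row
        # operation 3 (row image before an update) needs no action here
    return target


target = {
    1: {"id": 1, "name": "Widget"},
    2: {"id": 2, "name": "Gadget"},
}
changes = [
    {"__$operation": 3, "id": 1, "name": "Widget"},
    {"__$operation": 4, "id": 1, "name": "Widget v2"},
    {"__$operation": 1, "id": 2, "name": "Gadget"},
    {"__$operation": 2, "id": 3, "name": "Gizmo"},
]
apply_changes(target, changes)
print(target)
```

Only the rows that actually changed are processed, which is the point of delta detection: the load cost follows the volume of changes rather than the size of the source table.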

© 2008 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.

The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.