Microsoft MCSA upgrade to SQL Server 2012 Certification
Courseware Version 1.0
www.firebrandtraining.com
©2007 – Body Temple, 5/9/2013
SQL Server 2012 for Data Warehousing Appendix
MCSA SQL Server 2012 Upgrade Course
Deployment Models
Package Deployment Model
Project Deployment Model
Package Deployment Model
Storage
• MSDB
• File System
Package Configurations
• Property values to be set dynamically at run time
Package Deployment Utility
• Generate all required files for easier deployment
Project Deployment Model
The SSIS catalog
• Storage and management for SSIS projects on a SQL Server 2012 instance
Folders
• A hierarchical structure for organizing and securing SSIS projects
Deployment Model Comparison

Feature | Package Deployment | Project Deployment
Unit of deployment | Package | Project
Storage | File system or MSDB | SSIS catalog
Dynamic configuration | Package configurations | Environment variables mapped to project-level parameters and connection managers
Compiled format | Multiple .dtsx files | Single .ispac file
Troubleshooting | Configure logging for each package | SSIS catalog includes built-in reports and views
Creating an SSIS Catalog
Prerequisites
• SQL Server 2012
• SQL CLR enabled
Creating a catalog
• Use SQL Server Management Studio
• One SSIS catalog per SQL Server instance
Catalog Security
• Folder Security
• Object Security
• Catalog Encryption
• Sensitive Parameters
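Because the SSIS catalog requires SQL CLR, that prerequisite must be satisfied before the catalog is created in Management Studio (where the Integration Services Catalogs node prompts for an encryption password). A minimal T-SQL sketch:

```sql
-- Enable SQL CLR, a prerequisite for the SSIS catalog
EXEC sp_configure 'clr enabled', 1;
RECONFIGURE;
```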
Environments and Variables
• Environments
Execution contexts for projects
• Variables
Environment-specific values that can be mapped to project parameters and connection manager properties at run time
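Environments and their variables can also be scripted with the SSISDB catalog stored procedures. A hedged sketch; the folder, environment, and variable names are illustrative only:

```sql
-- Create an environment and add a variable to it (names are hypothetical)
EXEC SSISDB.catalog.create_environment
     @folder_name = N'ETL', @environment_name = N'Production';
EXEC SSISDB.catalog.create_environment_variable
     @folder_name = N'ETL', @environment_name = N'Production',
     @variable_name = N'ServerName', @data_type = N'String',
     @sensitive = 0, @value = N'PRODSQL01';
```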
Deploying an SSIS Project
Integration Services Deployment Wizard, launched from:
• SQL Server Data Tools
• SQL Server Management Studio
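As an alternative to the wizard, a project can be deployed with the catalog.deploy_project stored procedure. A sketch assuming a folder named ETL and a compiled project file at C:\Deploy\MyProject.ispac (both hypothetical):

```sql
-- Read the .ispac file and deploy it to the SSIS catalog
DECLARE @project varbinary(max) =
  (SELECT BulkColumn
   FROM OPENROWSET(BULK N'C:\Deploy\MyProject.ispac', SINGLE_BLOB) AS p);
EXEC SSISDB.catalog.create_folder @folder_name = N'ETL';
EXEC SSISDB.catalog.deploy_project
     @folder_name = N'ETL', @project_name = N'MyProject',
     @project_stream = @project;
```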
Viewing Project Execution Information
Integration Services Dashboard provides built-in reports
Additional sources of information:
• Event Handlers
• Error Outputs
• Logging
• Debug Dump Files
Data Quality Services
Cleansing Data
Matching Data
What Is a Knowledge Base?
Repository of knowledge about data:
• Domains define values and rules for each field
• Matching policies define rules for identifying duplicate records
What Is a Domain?
• Domains are specific to a data field
• Domains contain the rules for the data
• Domains can be individual or composite
(Example: a KB containing City and State Name domains, plus a composite Name domain made up of First Name and Family Name)
What Is a Reference Data Service?
• The Azure Marketplace hosts specialist data cleansing providers
To use a reference data service:
1. Set up an account
2. Subscribe to a reference service
3. Map your domain to the reference service
(Example: mapping an Address domain in the KB to a reference data service)
Creating a Data Cleansing Project
1. Select a knowledge base
2. Map columns to domains
3. Review suggestions and corrections
4. Export results
Viewing Cleansed Data
• Output – The values for all fields after data cleansing
• Source – The original value for fields that were mapped to domains and cleansed
• Reason – The reason the output value was selected by the cleansing operation
• Confidence – An indication of the confidence Data Quality Services estimates for corrected values
• Status – The status of the output column (correct or corrected)
Using the Data Cleansing Data Flow Transformation
Input columns to be cleansed
Select a knowledge base and map columns to domains
Output cleansed columns
Creating a Matching Policy
• Define matching rules for business entities
• Rules match entities based on domains:
Similarity: Similar or exact match
Weight: Percentage to apply if match succeeds
Prerequisite: Mandatory domain match for rule to succeed
• If the combined weight of all matches meets or exceeds the rule’s minimum matching score, the entities are duplicates
Creating a Data Matching Project
1. Select a knowledge base
2. Map columns to domains
3. Review match clusters
4. Export matches and survivors
• Select survivorship rule:
• Pivot record
• Most complete and longest record
• Most complete record
• Longest record
Viewing Data Matching Results
Cluster ID – Identifier for a cluster of matched records
Record ID – Identifier for a matched record
Matching Rule – The rule that produced the match
Score – Combined weighting of match criteria
Pivot Mark – A matched record chosen arbitrarily by Data Quality Services as the pivot record for a cluster
Master Data Services
Master Data Services vs. Data Quality Services
Master Data Services
Master Data Services and Data Quality Services
Data Quality Services
• Domain-focused
• Applicable to any dataset that contains domains in the KB
• Is “Seattle” a valid value for City?
Master Data Services
• Entity-focused
• Applicable to specific instances of business entities
• What is the correct name, address, and phone number for customer 1235?
Use Data Quality Services to:
• Cleanse data before importing into the master data hub
• Apply matching policies to find duplicate master data entities
Components of Master Data Services
Master Data Services database
• Contains the objects that support Master Data Services
Master Data Manager Web application
• Enables administrators to create and manage Master Data Services objects, such as entities and attributes
• Enables data stewards to manage master data
Master Data Services Configuration Manager
• Enables administrators to create the Master Data Services database and Web application
Master Data Services Add-In for Microsoft Excel
• Enables administrators and data stewards to manage master data in Excel
What Is a Master Data Services Model?
A versioned data model for specific business item or area of the business
Contains definitions for entities required in the business area
• Often an entity with the same name as the model, as well as related entities
Each entity has a defined set of attributes
• All entities have Code and Name attributes
• Attributes can be categorized in attribute groups
Each instance of an entity is known as a member
(Example: a Customers model with Versions 1–3 containing a Customer entity and an Account Type entity. Account Type has Code and Name attributes, with members {Code 1, Standard} and {Code 2, Premier}. Customer has attributes Code, Name, Address, Phone (free-form text), AccountType (domain-based), and CreditLimit (free-form number); Address and Phone form a Contact Details attribute group. A sample member: Code 1235, Name Ben Smith, Address 1 High St, Seattle, Phone 555-12345, AccountType 1, CreditLimit 1000.)
Editing a Model in Microsoft Excel
Use the Master Data Services Add-In for Excel to connect to a model
Create entities
Add columns to create attributes
Edit entity member data in worksheets
Publish changes to Master Data Services
Hierarchies and Collections
Derived Hierarchies
• Natural hierarchical groupings based on domain-based relationships
• (Example: Account Type member {Code 1, Standard} with Customer members Ben Smith (1235) and Amy Strande (1267) grouped beneath it)
Explicit Hierarchies
• Hierarchical groupings of specific members
• Can be mandatory or non-mandatory
• (Example: consolidated members US Customers (CustUS) and European Customers (CustEU), with leaf members Ben Smith (1235), Amy Strande (1267), and Sabina Schütz (2214) placed under them)
Collections
• Non-hierarchical groupings of specific members
• Can include leaf and consolidated members from any hierarchy
• (Example: a “Special Delivery Customers” collection containing the European Customers (Cust-EU) consolidated member plus leaf members Andrew Sinclair (2600) and Doris Krieger (1785))
Finding Duplicate Members
1. Create a Data Quality Services knowledge base with domains for entity attributes
2. Create a matching policy to identify possible duplicates
3. Match data in the Master Data Services Add-In for Excel
Validating Members with Business Rules
• Define business rules with the business rule expression editor
• Publish business rules
• Use business rules to validate entity member data in:
Explorer
Excel
Master Data Hub Architecture
Users insert and update data in application data stores
Application data is loaded into the master data hub for consolidation and management by data stewards
Master data flows back to application data stores and other consumers across the enterprise
(Diagram: CRM, marketing, and order processing systems exchange data with the master data hub via SSIS; a data steward manages the hub; other consumers, such as data warehouse ETL, also receive master data)
Master Data Services Staging Tables
MDS generates a staging table for each:
• Leaf member
• Consolidated member
• Relationship

stg.EntityName_Leaf: ID, ImportType, ImportStatus_ID, Batch_ID, BatchTag, ErrorCode, Code, Name, NewCode, <AttributeName>
stg.EntityName_Consolidated: ID, ImportType, ImportStatus_ID, Batch_ID, BatchTag, HierarchyName, ErrorCode, Code, Name, NewCode, <AttributeName>
stg.EntityName_Relationship: ID, RelationshipType, ImportStatus_ID, Batch_ID, BatchTag, HierarchyName, ParentCode, ChildCode, SortOrder, ErrorCode
Staging and Importing Data
1. Load data into staging tables
• Using SSIS, Import and Export Wizard, Transact-SQL, etc.
2. Run staging stored procedures
• stg.udp_EntityName_Leaf
• stg.udp_EntityName_Consolidated
• stg.udp_EntityName_Relationship
3. View import status in Master Data Manager
4. Match data to identify any duplicates
5. Apply business rules to validate data
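The first two steps can be sketched in T-SQL. This assumes a hypothetical entity named Customer and a model version named VERSION_1; ImportType 0 creates new members or updates existing ones with the staged values:

```sql
-- 1. Load data into the leaf staging table
INSERT INTO stg.Customer_Leaf (ImportType, BatchTag, Code, Name)
VALUES (0, N'Batch1', N'1235', N'Ben Smith');

-- 2. Run the generated staging procedure to process the batch
EXEC stg.udp_Customer_Leaf
     @VersionName = N'VERSION_1', @LogFlag = 1, @BatchTag = N'Batch1';
```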
Consuming Master Data with Subscription Views
• Create subscription views in Master Data Manager
• Formats:
Leaf Attributes
Consolidated Attributes
Collection Attributes
Collections
Explicit Parent Child
Explicit Levels
Derived Parent Child
Derived Levels
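A subscription view is exposed as an ordinary view in the Master Data Services database, so downstream consumers can query it directly. A sketch assuming a hypothetical Leaf Attributes view named CustomerLeaf in the mdm schema:

```sql
SELECT Code, Name
FROM mdm.CustomerLeaf
WHERE VersionName = N'VERSION_1';
```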
Incremental ETL
Overview of Data Warehouse Load Cycles
Considerations for Incremental ETL
Overview of Data Warehouse Load Cycles
Extract changes from data sources
Refresh the data warehouse based on changes
(Diagram: users modify data in business applications; the ETL process extracts new and modified data to a staging database; the ETL process then inserts or modifies data in the data warehouse based on those changes)
Considerations for Incremental ETL
Data modifications to be tracked
Load order
Dimension keys
Updating dimension members
Updating fact records
Slowly Changing Dimensions
Types of change to a dimension member:
• Type 1: Changing attributes are updated in the dimension record
• Type 2: Historical attribute changes result in a new record
• Type 3: The original and current values of historical attributes are stored in the dimension record

Type 1 example (phone number updated in place):
Before: Key 101 | AltKey C123 | Mary | 5551234 | New York
After: Key 101 | AltKey C123 | Mary | 5554321 | New York

Type 2 example (city change creates a new record):
Before: Key 101 | AltKey C123 | Mary | 5551234 | New York | Current = True
After: Key 101 | AltKey C123 | Mary | 5551234 | New York | Current = False
       Key 102 | AltKey C123 | Mary | 5551234 | Seattle | Current = True

Type 3 example (original and current city stored in one record):
Before: Key 101 | AltKey C123 | Mary | 5551234 | OriginalCity New York | CurrentCity New York | EffectiveDate 1/1/00
After: Key 101 | AltKey C123 | Mary | 5551234 | OriginalCity New York | CurrentCity Seattle | EffectiveDate 6/7/11
Options for Extracting Modified Data
• Extract all records
• Store a primary key and checksum
• Use a datetime column as a “high water mark”
• Use Change Data Capture
• Use Change Tracking
Extracting Rows Based on a Datetime Column
1. Note the current time
2. Retrieve the last extraction time from an extraction log
3. Extract and transfer records that were modified between the last extraction and the current time
4. Replace the stored last extraction value with the current time
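The four steps can be sketched in T-SQL; the ExtractionLog table, Orders source table, and ModifiedDate column are hypothetical names:

```sql
-- 1. Note the current time
DECLARE @CurrentTime datetime2 = SYSDATETIME();

-- 2. Retrieve the last extraction time from the extraction log
DECLARE @LastExtract datetime2 =
  (SELECT LastExtractTime FROM dbo.ExtractionLog WHERE TableName = N'Orders');

-- 3. Extract records modified in the window
SELECT *
FROM dbo.Orders
WHERE ModifiedDate > @LastExtract AND ModifiedDate <= @CurrentTime;

-- 4. Replace the stored last extraction value
UPDATE dbo.ExtractionLog
SET LastExtractTime = @CurrentTime
WHERE TableName = N'Orders';
```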
Change Data Capture
1. Enable Change Data Capture
2. Map start and end times to log sequence numbers
3. Handle null log sequence numbers
4. Extract changes between log sequence numbers
EXEC sys.sp_cdc_enable_db

EXEC sys.sp_cdc_enable_table
  @source_schema = N'dbo', @source_name = N'Customers',
  @role_name = NULL, @supports_net_changes = 1

DECLARE @from_lsn binary(10), @to_lsn binary(10);
SET @from_lsn = sys.fn_cdc_map_time_to_lsn('smallest greater than', @StartDate)
SET @to_lsn = sys.fn_cdc_map_time_to_lsn('largest less than or equal', @EndDate)

-- A NULL LSN means there may have been no transactions in the timeframe
IF (@from_lsn IS NOT NULL) AND (@to_lsn IS NOT NULL)
  SELECT * FROM cdc.fn_cdc_get_net_changes_dbo_Customers(@from_lsn, @to_lsn, 'all')
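The net-changes function also returns CDC metadata columns that downstream logic can use to decide what to do with each row. A sketch interpreting the __$operation column (for net changes, 1 = delete, 2 = insert, 4 = update):

```sql
SELECT CASE [__$operation]
         WHEN 1 THEN 'DELETE'
         WHEN 2 THEN 'INSERT'
         WHEN 4 THEN 'UPDATE'
       END AS ChangeType,
       *
FROM cdc.fn_cdc_get_net_changes_dbo_Customers(@from_lsn, @to_lsn, 'all');
```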
Extracting Data with Change Data Capture
1. Identify the end-point for the extraction (LSN or DateTime)
2. Retrieve the last extraction end-point from an extraction log
3. Extract and transfer records that were modified during the LSN range defined by the previous extraction end-point and the current end-point
4. Replace the logged end-point value with the current end-point
The CDC Control Task and Data Flow Components

Initial extraction:
1. A CDC Control Task (Mark Initial Load Start) records the starting LSN
2. A data flow extracts all records from the source to staged inserts
3. A CDC Control Task (Mark Initial Load End) records the ending LSN

Incremental extraction:
1. A CDC Control Task (Get Processing Range) establishes the range of LSNs to be extracted
2. A CDC Source extracts records and CDC metadata
3. Optionally, a CDC Splitter splits the data flow into staged inserts, updates, and deletes
4. A CDC Control Task (Mark Processed Range) records the ending LSN

In both patterns, CDC state is tracked in a CDC state variable and persisted in a CDC state table.
Change Tracking
1. Enable Change Tracking
2. Record the current version and extract the initial data
3. Extract changes since the last extracted version, and then update the last extracted version

ALTER DATABASE Sales
SET CHANGE_TRACKING = ON (CHANGE_RETENTION = 7 DAYS, AUTO_CLEANUP = ON)

ALTER TABLE Salespeople
ENABLE CHANGE_TRACKING WITH (TRACK_COLUMNS_UPDATED = OFF)

-- Initial extraction
SET @CurrentVersion = CHANGE_TRACKING_CURRENT_VERSION();
SELECT * FROM Salespeople
SET @LastExtractedVersion = @CurrentVersion

-- Incremental extraction
SET @CurrentVersion = CHANGE_TRACKING_CURRENT_VERSION();
SELECT * FROM CHANGETABLE(CHANGES Salespeople, @LastExtractedVersion) CT
INNER JOIN Salespeople s ON CT.SalespersonID = s.SalespersonID
SET @LastExtractedVersion = @CurrentVersion
Tip: Use snapshot isolation to ensure consistency
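Snapshot isolation keeps the version check and the data read consistent even if rows change mid-extraction. A sketch, assuming snapshot isolation has been allowed on the Sales database and the version variables are declared as in the slide code:

```sql
ALTER DATABASE Sales SET ALLOW_SNAPSHOT_ISOLATION ON;

SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
BEGIN TRAN;
  -- Read the version and the changes inside the same snapshot
  SET @CurrentVersion = CHANGE_TRACKING_CURRENT_VERSION();
  SELECT * FROM CHANGETABLE(CHANGES Salespeople, @LastExtractedVersion) CT
  INNER JOIN Salespeople s ON CT.SalespersonID = s.SalespersonID;
COMMIT;
```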
Extracting Data with Change Tracking
1. Retrieve the last version number that was extracted from an extraction log
2. Extract and transfer records that were modified since the last version, retrieving the current version number
3. Replace the logged version number with the current version number
Options for Incrementally Loading Data
Insert, Update, or Delete from CDC Output Tables
Use a Lookup transformation
Use the Slowly Changing Dimension transformation
Use the MERGE statement
Use a checksum
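The checksum option compares a checksum of the comparable columns in source and target to detect changed rows. A hedged sketch with hypothetical table and column names; BINARY_CHECKSUM is fast but can collide, so HASHBYTES is a safer alternative when accuracy matters:

```sql
-- Apply a type 1 update only where the column values actually differ
UPDATE d
SET d.Name = s.Name, d.Phone = s.Phone
FROM dbo.DimCustomer d
JOIN Staging.dbo.Customers s ON d.CustomerBizKey = s.CustomerID
WHERE BINARY_CHECKSUM(s.Name, s.Phone) <> BINARY_CHECKSUM(d.Name, d.Phone);
```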
Using CDC Output Tables

Staging and data warehouse co-located (Execute SQL Tasks):
• Staged inserts: INSERT… FROM
• Staged updates: UPDATE… FROM JOIN ON BizKey
• Staged deletes: DELETE WHERE BizKey IN, or UPDATE… FROM JOIN ON BizKey

Remote data warehouse (data flows):
• Source: staged inserts → Destination: dimension table
• Source: staged updates → OLE DB Command: UPDATE…
• Source: staged deletes → OLE DB Command: UPDATE… or DELETE…
The Lookup Transformation
• Look up extracted data in a dimension or fact table based on a business key or unique combination of keys
• Redirect non-matched rows to the no match output
• If no match is found, insert a new record
• Optionally, if a match is found, update non-key columns to apply a type 1 change
The Slowly Changing Dimension Transformation
The SCD transformation directs rows to different outputs:
• New records: a Derived Column sets the current row indicator to True, and an OLE DB Destination inserts the new records
• Changing Attributes (Type 1): an OLE DB Command updates the existing record
• Historical Attributes (Type 2): an OLE DB Command sets the existing record’s current row indicator to False; a Derived Column adds a current row indicator column, and a Union All merges the new row version with the new records for insertion
• Inferred members: an OLE DB Command inserts a minimal record
The MERGE Statement
Matches source and target rows
Performs insert, update, or delete operations based on row matching results

MERGE INTO DW.dbo.FactSales AS tgt            -- the target table being loaded
USING
  (SELECT c.CustomerKey, s.OrderNo, s.Quantity, s.Amount
   FROM Staging.dbo.Sales s
   INNER JOIN DW.dbo.DimCustomer c ON s.CustomerID = c.CustomerBizKey)
  AS src (CustomerKey, OrderNo, Quantity, Amount)  -- source data
ON (src.CustomerKey = tgt.CustomerKey
    AND src.OrderNo = tgt.OrderNo)            -- matching criteria to identify existing records
WHEN MATCHED THEN                             -- action when existing records are found
  UPDATE SET Quantity = src.Quantity, Amount = src.Amount
WHEN NOT MATCHED THEN                         -- action when no existing records are found
  INSERT (CustomerKey, OrderNo, Quantity, Amount)
  VALUES (src.CustomerKey, src.OrderNo, src.Quantity, src.Amount);
SQL Server 2012 for Data Warehousing Appendix
MCSA SQL Server 2012 Upgrade Course
End of Presentation