Populating a Data Warehouse

Populating a Data Warehouse. Overview Process Overview Methods of Populating a Data Warehouse Tools for Populating a Data Warehouse Populating a Data.

Jan 02, 2016


Aldous Lawrence
Transcript
Page 1

Populating a Data Warehouse

Page 2

Overview

Process Overview

Methods of Populating a Data Warehouse

Tools for Populating a Data Warehouse

Populating a Data Warehouse by Using DTS

Page 3

Process Overview

[Figure: data flows from source OLTP systems (Oracle, SQL Server, Other) through a temporary data staging area into the data warehouse (Sales Data, Hardware Data) and on to data marts (Sales, Service, Other). Step labels in the figure: Validate, Gather, Make Data Consistent; Transform; Populate Data Warehouse; Distribute Data.]

Page 4

Validating Data

Validate and Correct Data at the Source Before You Import It

Determine and Correct Processes That Invalidate Data

Save Invalid Data to a Log for Review
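
The third point above, saving invalid data to a log for review, might look roughly like the following. This is a minimal Python sketch, not part of the original slides; the validation rules and the `order_date` column are hypothetical, chosen only to illustrate splitting rows into a valid set and a review log.

```python
from datetime import datetime

def is_valid(row):
    """Return (ok, reason) for one source row. The rules are illustrative assumptions."""
    if not row.get("buyer_name"):
        return False, "missing buyer_name"
    try:
        # Hypothetical rule: dates must arrive as YYYY-MM-DD.
        datetime.strptime(str(row.get("order_date") or ""), "%Y-%m-%d")
    except ValueError:
        return False, "bad order_date"
    if row.get("total_sales") is None or row["total_sales"] < 0:
        return False, "negative or missing total_sales"
    return True, ""

def split_valid_invalid(rows):
    """Keep valid rows for import; record invalid rows with the reason they failed."""
    valid, invalid_log = [], []
    for row in rows:
        ok, reason = is_valid(row)
        if ok:
            valid.append(row)
        else:
            invalid_log.append({"row": row, "reason": reason})
    return valid, invalid_log

sample_rows = [
    {"buyer_name": "Barr, Adam", "order_date": "1999-03-01", "total_sales": 17.60},
    {"buyer_name": "", "order_date": "1999-03-02", "total_sales": 52.80},
    {"buyer_name": "O'Melia, Erin", "order_date": "03/04/1999", "total_sales": 8.82},
]
valid_rows, invalid_log = split_valid_invalid(sample_rows)
```

In practice the log would be written to a table or file that someone reviews, so that the process producing the bad rows can be corrected at the source.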

Page 5

Making Data Consistent

Data Can Be Inconsistent in Several Ways:

Data in each source is consistent, but you want to represent it differently in the data warehouse

Data is represented differently in different sources

You Can Make Data Consistent by:

Translating codes or values to readable strings

Converting multiple versions of the same information into a single representation
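
Both techniques can be sketched in a few lines of Python. The status codes and the yes/no spellings below are hypothetical examples and do not come from the slides; the first function translates a code to a readable string, the second collapses several spellings of the same fact into one representation.

```python
# Hypothetical code table: source systems store a one-letter status code.
STATUS_LABELS = {"A": "Active", "I": "Inactive"}

# Hypothetical variants: different sources spell the same yes/no fact differently.
TRUE_FORMS = {"y", "yes", "true", "1"}
FALSE_FORMS = {"n", "no", "false", "0"}

def translate_status(code):
    """Translate a status code to a readable string; unknown codes are flagged."""
    return STATUS_LABELS.get(code.strip().upper(), "Unknown")

def unify_flag(value):
    """Convert the many spellings of yes/no into a single boolean representation."""
    text = str(value).strip().lower()
    if text in TRUE_FORMS:
        return True
    if text in FALSE_FORMS:
        return False
    raise ValueError(f"unrecognized flag value: {value!r}")
```

Raising on an unrecognized value, rather than guessing, keeps inconsistent data from slipping into the warehouse unnoticed.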

Page 6

Transforming Data

Change (reg_id values rewritten as Roman numerals)

Before:
buyer_name      reg_id  total_sales
Barr, Adam      2       17.60
Chai, Sean      4       52.80
O’Melia, Erin   6       8.82
...

After:
buyer_name      reg_id  total_sales
Barr, Adam      II      17.60
Chai, Sean      IV      52.80
O’Melia, Erin   VI      8.82
...

Combine (buyer_first and buyer_last merged into buyer_name)

Before:
buyer_first  buyer_last  reg_id  total_sales
Adam         Barr        2       17.60
Sean         Chai        4       52.80
Erin         O’Melia     6       8.82
...

After:
buyer_name      reg_id  total_sales
Barr, Adam      2       17.60
Chai, Sean      4       52.80
O’Melia, Erin   6       8.82
...

Calculate (total_sales computed as price_id × qty_id)

Before:
buyer_name      price_id  qty_id
Barr, Adam      .55       32
Chai, Sean      1.10      48
O’Melia, Erin   .98       9
...

After:
buyer_name      price_id  qty_id  total_sales
Barr, Adam      .55       32      17.60
Chai, Sean      1.10      48      52.80
O’Melia, Erin   .98       9       8.82
...
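
The three kinds of transformation on this slide (change, combine, calculate) can be written as small functions. This Python sketch uses the slide's sample values; the function names are made up for illustration, and the Roman-numeral mapping covers only the three reg_id values shown.

```python
from decimal import Decimal

# Change: rewrite reg_id values; only the values shown on the slide are mapped.
ROMAN = {2: "II", 4: "IV", 6: "VI"}

def change_region(reg_id):
    """Replace a numeric reg_id with its Roman-numeral form."""
    return ROMAN[reg_id]

def combine_name(buyer_first, buyer_last):
    """Combine two source columns into the single buyer_name column."""
    return f"{buyer_last}, {buyer_first}"

def calculate_total(price, qty):
    """Calculate total_sales from price and quantity, kept to two decimal places."""
    return (Decimal(price) * qty).quantize(Decimal("0.01"))

buyer_name = combine_name("Adam", "Barr")
region = change_region(2)
total_sales = calculate_total("0.55", 32)
```

`Decimal` is used instead of floating point so that currency arithmetic stays exact.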

Page 7

Methods of Populating a Data Warehouse

Select the Method of Populating a Data Warehouse That Suits Your Business Needs

Method 1: Validate, combine, and transform data in a temporary data staging area

Method 2: Validate, combine, and transform data during the loading process

Migrate Data During Periods of Relatively Low System Use
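
Method 1 can be illustrated with a small staging table. The sketch below uses Python's built-in `sqlite3` module as a stand-in for SQL Server; the table names `staging_sales` and `sales_fact` and their columns are assumptions for illustration only. Raw rows land in the staging table, are validated and transformed there, and only then move to the warehouse table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for SQL Server here
conn.executescript("""
    CREATE TABLE staging_sales (buyer_first TEXT, buyer_last TEXT, price REAL, qty INTEGER);
    CREATE TABLE sales_fact (buyer_name TEXT, total_sales REAL);
""")

# Step 1: copy raw source rows into the staging area without changing them.
raw_rows = [("Adam", "Barr", 0.55, 32), ("Sean", "Chai", 1.10, 48), ("Erin", None, 0.98, 9)]
conn.executemany("INSERT INTO staging_sales VALUES (?, ?, ?, ?)", raw_rows)

# Step 2: validate (skip rows with missing names), combine, and calculate
# while moving data from staging into the warehouse table.
conn.execute("""
    INSERT INTO sales_fact (buyer_name, total_sales)
    SELECT buyer_last || ', ' || buyer_first, ROUND(price * qty, 2)
    FROM staging_sales
    WHERE buyer_first IS NOT NULL AND buyer_last IS NOT NULL
""")

# Step 3: clear the staging area for the next load.
conn.execute("DELETE FROM staging_sales")
conn.commit()

loaded = conn.execute(
    "SELECT buyer_name, total_sales FROM sales_fact ORDER BY buyer_name"
).fetchall()
```

Method 2 would apply the same validation and transformation while rows are read from the source, with no intermediate table.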

Page 8

Tools for Populating a Data Warehouse

What Is the Appropriate Tool to Use

Transact-SQL Query

Distributed Query

bcp Utility and the BULK INSERT Statement

DTS

Page 9

What Is the Appropriate Tool to Use

Format of Source and Destination Data

Location of Source and Destination Data

Import or Export of Database Objects

Frequency of Data Transfer

Interface Preference

Tool Performance

Page 10

Transact-SQL Query

Customer (source)
FirstName  LastName
Steve      Johnson
Douglas    Smith
Les        Wilson
Paul       Salinger

CustomerSummary (result)
FullName
Johnson, Steve
Smith, Douglas
Wilson, Les
Salinger, Paul

USE northwind_mart
SELECT Lastname + ', ' + Firstname AS Fullname
INTO CustomerSummary
FROM Northwind.dbo.Customer

Page 11

Distributed Query

USE northwind_mart
SELECT productname, companyname
INTO item_dim
FROM StockServer.sales.dbo.products p
JOIN AccountingServer.sales.dbo.suppliers s
ON p.supplierid = s.supplierid

[Figure: the Products table in the Sales database on StockServer and the Suppliers table in the Sales database on AccountingServer are joined into the Item_Dim table on the local SQL Server.]

Page 12

bcp Utility and the BULK INSERT Statement

bcp Utility

bcp accounting.dbo.orders in Orderstbl.dat -c -t, -r \n -Smysqlserver -Usa -Pmypassword

BULK INSERT Statement

BULK INSERT Accounting.dbo.orders
FROM 'C:\ordersdir\orderstble.dat'
WITH (
    DATAFILETYPE = 'char',
    FIELDTERMINATOR = '|',
    ROWTERMINATOR = '|\n'
)
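
To show what the field and row terminators mean in practice, here is a rough Python equivalent of a bulk load from a pipe-delimited text file. It is a sketch only: SQLite stands in for SQL Server, the data is held in an in-memory string instead of a file on disk, and the three-column `orders` layout is an assumption, since the slide does not show the table definition.

```python
import csv
import io
import sqlite3

# Stand-in for the data file. Each row ends with a trailing '|' before the newline,
# following the ROWTERMINATOR = '|\n' convention in the BULK INSERT statement.
data_file = io.StringIO("1|Barr, Adam|17.60|\n2|Chai, Sean|52.80|\n")

# '|' is the field terminator; the trailing '|' yields one empty field, which is dropped.
reader = csv.reader(data_file, delimiter="|")
rows = [record[:-1] for record in reader if record]

conn = sqlite3.connect(":memory:")
# Hypothetical column layout for the orders table.
conn.execute("CREATE TABLE orders (order_id INTEGER, buyer_name TEXT, total_sales REAL)")
with conn:  # load the whole batch in one transaction
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)

loaded_orders = conn.execute(
    "SELECT order_id, buyer_name, total_sales FROM orders ORDER BY order_id"
).fetchall()
```

The character data is converted to the column types on insert, which is roughly what the 'char' data file type implies for a text file.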

Page 13

DTS

When to Use DTS

DTS Data Source and Destination Types

OLE DB

ODBC

ASCII text file

Custom

HTML

Spreadsheet

DTS Tools

DTS Import and Export wizards

DTS Designer

dtsrun utility

Page 14

Populating a Data Warehouse by Using DTS

Building a DTS Package

Transforming Data by Using an ActiveX Script

Transforming Data by Using a Lookup Query

Defining Transactions

Tracking Data Lineage

Creating a DTS Package Programmatically

Page 15

Building a DTS Package

Mapping Source and Destination Data

Defining Data Transformation Tasks

Creating and Saving a DTS Package

Executing a DTS Package

Scheduling and Securing a DTS Package

Page 16

Mapping Source and Destination Data

Mapping Columns

Decide which columns to copy

Choose the columns in the target database that map to the source columns

Mapping Data Types

Specify transformation rules

Specify levels of data conversion

Page 17

Defining Data Transformation Tasks

DTS Packages Contain Tasks

A Task Can:

Execute a Transact-SQL statement

Execute a script

Launch an external application

Transfer SQL Server 7.0 objects

Execute or retrieve results from a DTS package

Page 18

Creating and Saving a DTS Package

Creating a DTS Package

By using DTS wizards

By using DTS Designer

By using a COM interface exposed by DTS

Saving a DTS Package

COM-structured storage file

Microsoft Repository

SQL Server msdb database

Page 19

Executing a DTS Package

You Can Execute a DTS Package by Using SQL Server Enterprise Manager or the dtsrun Command Prompt Utility

File Storage Location Determines the dtsrun Syntax

dtsrun /sAccounts /uJose /nOrdersImport

Page 20

Scheduling and Securing a DTS Package

Scheduling a DTS Package

Use DTS Import or DTS Export wizards when you save the DTS package to the msdb database

Use SQL Server Enterprise Manager when you use the dtsrun command prompt utility

Implementing DTS Package Security

Login permissions

Owner and user passwords

Page 21

Demonstration: Transferring Data by Using DTS

Page 22

Transforming Data by Using an ActiveX Script

Why Use an ActiveX Script

How to Use an ActiveX Script

Define a function to contain the transformation script

Specify the destination column

Specify the source columns

Use operators and VBScript or JScript functions and control-of-flow statements

Set the return code value for the function

How to Handle Errors with Return Codes

Page 23

Examples of ActiveX Scripts

Customer (source)
FirstName  LastName
Steve      Johnson
Douglas    Smith
Les        Wilson
Paul       Salinger

CustomerSummary (destination)
FullName
Johnson, Steve
Smith, Douglas
Wilson, Les
Salinger, Paul

Function Main()
    DTSDestination("FullName") = DTSSource("Lastname") + ", " + DTSSource("Firstname")
    Main = DTSTransformStat_OK
End Function

Page 24

Demonstration: Transforming Data by Using an ActiveX Script

Page 25

Transforming Data by Using a Lookup Query

Source Data: Customer_source
Name         State
D. Smith     FL
L. Wilson    WY
P. Salinger  AR

Lookup Table: State_lookup
Abbreviation  State
FL            Florida
WY            Wyoming
AR            Arkansas

Destination Data: Customer_dim (after the transform)
Name         State
D. Smith     Florida
L. Wilson    Wyoming
P. Salinger  Arkansas
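
The lookup transformation on this slide can be sketched outside DTS as a parameterized query called once per source row. The example below uses Python and SQLite as a stand-in; the table and column names follow the slide, while the `lookup_state` helper is a hypothetical name introduced for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for the three DTS connections
conn.executescript("""
    CREATE TABLE State_lookup (Abbreviation TEXT PRIMARY KEY, State TEXT);
    CREATE TABLE Customer_source (Name TEXT, State TEXT);
    CREATE TABLE Customer_dim (Name TEXT, State TEXT);
    INSERT INTO State_lookup VALUES ('FL', 'Florida'), ('WY', 'Wyoming'), ('AR', 'Arkansas');
    INSERT INTO Customer_source VALUES ('D. Smith', 'FL'), ('L. Wilson', 'WY'), ('P. Salinger', 'AR');
""")

def lookup_state(abbreviation):
    """Parameterized lookup query: return the full state name, or None if not found."""
    row = conn.execute(
        "SELECT State FROM State_lookup WHERE Abbreviation = ?", (abbreviation,)
    ).fetchone()
    return row[0] if row else None

# Transform each source row by replacing the abbreviation with the looked-up name.
source_rows = conn.execute("SELECT Name, State FROM Customer_source").fetchall()
for name, abbreviation in source_rows:
    conn.execute("INSERT INTO Customer_dim VALUES (?, ?)", (name, lookup_state(abbreviation)))
conn.commit()

customer_dim = conn.execute("SELECT Name, State FROM Customer_dim ORDER BY Name").fetchall()
```

A set-based join would do the same work in one statement; the per-row lookup is shown because it mirrors how a lookup query is called during a row-by-row transformation.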

Page 26

Implementing a Lookup Query

Set Up Connections to Source, Destination, and Lookup Tables

Create a Task, and Specify the Source and Destination

Add a Lookup Query Definition

Map the Source and Destination Columns, and Call the Lookup Query from the ActiveX Script

Page 27

Defining Transactions

You Must Explicitly Add a Step or Task to the Transaction

You Can Specify When a Transaction Commits

DTS Supports Only One Transaction Per Package

MS DTC Must Be Running

The Data Provider for the Data Destination Must Support Transactions
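
The all-or-nothing behavior that a transaction gives a load can be shown in a few lines. This is a single-connection Python and SQLite sketch, so it does not involve MS DTC or distributed transactions; the `sales_fact` table and the `load_batch` helper are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_fact (buyer_name TEXT NOT NULL, total_sales REAL NOT NULL)")
conn.commit()

def load_batch(rows):
    """Insert all rows or none of them; return True if the batch committed."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes the block
            conn.executemany("INSERT INTO sales_fact VALUES (?, ?)", rows)
        return True
    except sqlite3.IntegrityError:
        return False

ok_first = load_batch([("Barr, Adam", 17.60), ("Chai, Sean", 52.80)])
# The second row violates NOT NULL, so the whole batch is rolled back,
# including the first row that had already been inserted.
ok_second = load_batch([("O'Melia, Erin", 8.82), (None, 1.00)])

row_count = conn.execute("SELECT COUNT(*) FROM sales_fact").fetchone()[0]
```

Only the first batch is present afterwards, which is the guarantee a package transaction is meant to provide when one step fails.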

Page 28

Tracking Data Lineage

Using Data Lineage

Tracks history of data at package and table row levels

Provides audit trail of data transformation and DTS package execution

Implementing Data Lineage

Create the table columns in the data warehouse

Add data lineage variables to the DTS package

Map data lineage source and destination columns

Viewing Data Lineage
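
The idea behind row-level lineage can be sketched independently of DTS: every loaded row carries an identifier that points back to the package run that produced it. The Python and SQLite example below is an illustration under assumptions; the `lineage_id` column, the `load_audit` table, and the `run_load` helper are hypothetical names, not DTS objects.

```python
import sqlite3
import uuid
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
# Warehouse table with an extra lineage column, plus an audit table with one row per run.
conn.execute("CREATE TABLE sales_fact (buyer_name TEXT, total_sales REAL, lineage_id TEXT)")
conn.execute("CREATE TABLE load_audit (lineage_id TEXT PRIMARY KEY, package_name TEXT, run_at TEXT)")

def run_load(package_name, rows):
    """Load rows and stamp each one with the identifier of this package run."""
    lineage_id = str(uuid.uuid4())
    run_at = datetime.now(timezone.utc).isoformat()
    with conn:
        conn.execute("INSERT INTO load_audit VALUES (?, ?, ?)", (lineage_id, package_name, run_at))
        conn.executemany(
            "INSERT INTO sales_fact VALUES (?, ?, ?)",
            [(name, total, lineage_id) for name, total in rows],
        )
    return lineage_id

run_id = run_load("OrdersImport", [("Barr, Adam", 17.60), ("Chai, Sean", 52.80)])

# Viewing lineage: trace each warehouse row back to the run that loaded it.
traced = conn.execute("""
    SELECT f.buyer_name, a.package_name
    FROM sales_fact f JOIN load_audit a ON a.lineage_id = f.lineage_id
    ORDER BY f.buyer_name
""").fetchall()
```

The join at the end is the audit trail: given any row in the warehouse, it answers which package run produced it and when.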

Page 29

Demonstration: Defining Transactions and Tracking Data Lineage

Page 30

Creating a DTS Package Programmatically

[Figure: DTS package object model. Labels in the figure: DTS Package; Connections; Tasks; Steps; Precedence Constraints; Global Variables. Task types shown: Create Process, Send Mail, Bulk Insert, Transfer Objects, Execute SQL, Data-driven Query, Custom, ActiveX, Data Pump. Additional labels: Source, Columns, Destination.]

Page 31

Recommended Practices

Correct and Validate Data at the Source

Use an ActiveX Script or a Transact-SQL Script to Transfer and Transform Data

Use a Temporary Data Storage Area

Save and Store DTS Packages in the Microsoft Repository to Maintain Data Lineage

Page 32

Lab A: Populating a Data Warehouse

Page 33

Review

Process Overview

Methods of Populating a Data Warehouse

Tools for Populating a Data Warehouse

Populating a Data Warehouse by Using DTS