Building the Warehouse Chapter 10

Building the Warehouse Chapter 10. Overview Defining DW Concepts & Terminology Planning For a Successful Warehouse Project Management (Methodology, Maintaining.

Dec 14, 2015


Gloria Lowry
Page 1: Building the Warehouse, Chapter 10

Page 2: Overview

* Defining DW Concepts & Terminology
* Planning for a Successful Warehouse
* Project Management (Methodology, Maintaining Metadata)
* Meeting a Business Need
* Choosing a Computing Architecture
* Modeling the Data Warehouse
* Analyzing User Query Needs
* Planning Warehouse Storage
* ETT (Building the Warehouse)
* Supporting End User Access
* Managing the Data Warehouse

Page 3: Extraction/Transformation/Transportation Process (ETT)

* Extract source data
* Transform/clean data
* Index and summarize
* Load data into WH
* Detect change
* Refresh data

[Diagram: Operational systems -> ETT (programs, gateways, tools) -> Warehouse]
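
The ETT activities listed on this slide can be sketched as a small pipeline. This is only an illustration under stated assumptions, not the course's own method: the record layout, the function names (extract, transform, load, summarize), and the in-memory dictionary standing in for the warehouse are all hypothetical, and indexing is left out because it depends on the target database.

```python
from collections import defaultdict

# Hypothetical operational rows; the layout is an assumption for illustration.
OPERATIONAL_ROWS = [
    {"order_id": 1, "region": " east ", "amount": "100.50"},
    {"order_id": 2, "region": "WEST", "amount": "20.00"},
    {"order_id": 3, "region": "East", "amount": "not-a-number"},
]


def extract(rows):
    """Extract: select only the fields the warehouse needs."""
    return [{k: r[k] for k in ("order_id", "region", "amount")} for r in rows]


def transform(rows):
    """Transform/clean: normalize text, convert types, set aside bad rows."""
    clean, rejected = [], []
    for r in rows:
        try:
            clean.append({
                "order_id": r["order_id"],
                "region": r["region"].strip().upper(),
                "amount": float(r["amount"]),
            })
        except ValueError:
            rejected.append(r)  # kept for error correction, not silently dropped
    return clean, rejected


def load(warehouse, rows):
    """Load/refresh: write only rows that are new or changed (detect change)."""
    changed = 0
    for r in rows:
        if warehouse.get(r["order_id"]) != r:
            warehouse[r["order_id"]] = r
            changed += 1
    return changed


def summarize(warehouse):
    """Summarize: total amount per region."""
    totals = defaultdict(float)
    for r in warehouse.values():
        totals[r["region"]] += r["amount"]
    return dict(totals)


warehouse = {}
clean_rows, rejected_rows = transform(extract(OPERATIONAL_ROWS))
first_run = load(warehouse, clean_rows)
second_run = load(warehouse, clean_rows)  # nothing changed, so nothing is rewritten
summary = summarize(warehouse)
```

Running the load step twice shows the change-detection idea: the second run finds no differences and writes nothing.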

Page 4: ETT Processes

* Must result in data that is relevant, useful, high-quality, accurate, and accessible
* Require a large proportion of warehouse development time and resources

[Diagram: Operational systems -> ETT (clean up, consolidate, restructure) -> Warehouse (relevant, useful, quality, accurate, accessible)]

Page 5: Data Staging Area

* The construction site for the warehouse
* Required by most implementations
* Composed of ODS, flat files, or relational server tables
* Frequently configured as multitier staging

[Diagram: Operational system -> extract -> Data staging area -> transport (load) -> Warehouse]

Page 6: Remote Staging Model

* Data staging area within the warehouse environment
* Data staging area in its own environment, avoiding negative impact on the warehouse environment

[Diagram 1: Operational system (oper. envt.) -> Data staging area -> Warehouse, with the staging area and warehouse inside the warehouse environment]

[Diagram 2: Operational system (oper. envt.) -> Data staging area (staging envt.) -> Warehouse (warehouse envt.)]

[Arrow labels: extract, transform, transport; transport (local)]

Page 7: Onsite Staging Model

* Data staging area within the operational environment, possibly affecting the operational system

[Diagram: Operational system and data staging area (operational environment) -> Warehouse (WH envt.); arrow labels: extract, transform]

Page 8: Extracting Data

* Routines developed to select fields from source
* Various data formats
* Rules, audit trails, error correction facilities

[Diagram: Operational databases -> Data staging area (data mapping, transform) -> Warehouse database]

Page 9: Source Systems

* Production
* Archive
* Internal
* External

Page 10: Production Data

* Operating system platforms
* Hardware platforms
* File systems
* Database systems and vertical applications
 - IMS, DB2, VSAM, NonStop SQL, Oracle, Sybase, Rdb
 - SAP, Shared Medical Systems, Dun and Bradstreet Financials, Hogan Financials, Oracle Financials

Page 11: Archive Data

* Historical data
* Useful for analysis over long periods of time
* Useful for first-time load
* May require unique transformations

[Diagram: Operational database -> Warehouse database]

Page 12: Internal Data

* Planning, sales, and marketing organization data
* Maintained by:
 - Spreadsheets (structured)
 - Documents (unstructured)
* Treated like any other source data

[Diagram: Planning, Marketing, Accounting -> Warehouse database]

Page 13: External Data

* Information from outside the organization
* Issues of frequency, format, and predictability
* Described and tracked using metadata

[Diagram labels: A.C. Nielsen, IRI, IMS, Walsh America; competitive information; economic forecasts; Wall Street Journal; Barron’s; Dun and Bradstreet; purchased databases; warehousing databases]

Page 14: Mapping

* Defines which operational attributes to use
* Defines how to transform the attributes for the warehouse
* Defines where the attributes exist in the warehouse
* Mapping tools are available

Metadata
File A    Staging File One
F1        Number
F2        Name
F3        DOB

File A
F1        123
F2        Bloggs
F3        10/12/56

Staging File One
Number    USA123
Name      Mr.Bloggs
DOB       10-Dec-56
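
The File A to Staging File One example on this slide can be reproduced with a small mapping table. This is a sketch under assumptions: the slide gives only the before and after values, so the transformation rules here (a "USA" prefix on the number, a "Mr." prefix on the name, and day/month/year as the input date order) are inferred from that single record, and the names FIELD_MAP, TRANSFORMS, and map_record are hypothetical.

```python
from datetime import datetime

# Mapping metadata from the slide: source field in File A -> target field
# in Staging File One.
FIELD_MAP = {"F1": "Number", "F2": "Name", "F3": "DOB"}

# Transformation rules. These are assumptions inferred from the one example
# record on the slide, not rules stated by the source.
TRANSFORMS = {
    "Number": lambda v: "USA" + v,
    "Name": lambda v: "Mr." + v,
    # Input read as day/month/year; only the two-digit year is carried over.
    "DOB": lambda v: datetime.strptime(v, "%d/%m/%y").strftime("%d-%b-%y"),
}


def map_record(source):
    """Rename each source field and apply its transformation rule."""
    staged = {}
    for src_field, target_field in FIELD_MAP.items():
        staged[target_field] = TRANSFORMS[target_field](source[src_field])
    return staged


file_a = {"F1": "123", "F2": "Bloggs", "F3": "10/12/56"}
staging_file_one = map_record(file_a)
```

Keeping the field map and the rules as data rather than hard-coding them in the routine mirrors the slide's point that mapping is metadata: it can be reviewed and changed without touching the extraction code.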

Page 15: Extraction Techniques

* Programs: C, COBOL, PL/SQL
* Gateways: transparent database access
* In-house development is popular
* Tools
 - High initial cost
 - Ongoing automation
 - Data cleanup

Page 16: Sources and Targets

* Data marts
* Data analysis
* Data mining
* OLAP

Page 17: Designing Extraction Processes

* Analysis:
 - Source, technologies
 - Data types, quality, owners
* Design options:
 - Manual, custom, gateway, third-party
 - Replication, full, or delta refresh
* Design issues:
 - Batch window, volumes, data currency
 - Automation, skills needed, resources
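
The difference between the full and delta refresh options named on this slide can be shown in a few lines. This is a minimal sketch: the row layout, the "modified" timestamp column, and the sample dates are hypothetical, and a timestamp comparison is only one of several ways to detect changed rows.

```python
from datetime import datetime

# Hypothetical source rows carrying a last-modified timestamp.
SOURCE_ROWS = [
    {"id": 1, "name": "Alpha", "modified": datetime(2015, 12, 1, 9, 0)},
    {"id": 2, "name": "Beta", "modified": datetime(2015, 12, 10, 14, 30)},
    {"id": 3, "name": "Gamma", "modified": datetime(2015, 12, 13, 8, 15)},
]


def full_refresh(rows):
    """Full refresh: re-extract every row on every run."""
    return list(rows)


def delta_refresh(rows, last_extract_time):
    """Delta refresh: extract only rows modified since the previous run."""
    return [r for r in rows if r["modified"] > last_extract_time]


last_run = datetime(2015, 12, 5)
full = full_refresh(SOURCE_ROWS)
delta = delta_refresh(SOURCE_ROWS, last_run)
delta_ids = [r["id"] for r in delta]
```

A delta refresh moves less data and fits a shorter batch window, but a timestamp filter like this one does not see rows deleted from the source, so deletions need separate handling.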

Page 18: Maintaining Extraction Metadata

* Source location, type, structure
* Access method
* Privilege information
* Temporary storage
* Failure procedures
* Validity checks
* Handlers for missing data
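
One way to hold the metadata items listed on this slide is a single record per source. The sketch below is illustrative only: the class name ExtractionMetadata, its field names, and every value in the example (including the file path) are hypothetical, and a real project would more likely keep this in a metadata repository than in code.

```python
from dataclasses import dataclass, field


@dataclass
class ExtractionMetadata:
    """One record per source; all names and values here are hypothetical."""
    source_location: str
    source_type: str
    structure: dict              # field name -> data type
    access_method: str
    privileges: str
    temporary_storage: str
    failure_procedure: str
    validity_checks: list = field(default_factory=list)
    missing_data_defaults: dict = field(default_factory=dict)

    def fill_missing(self, record):
        """Handler for missing data: apply the documented default per field."""
        filled = dict(record)
        for name in self.structure:
            if filled.get(name) in (None, ""):
                filled[name] = self.missing_data_defaults.get(name)
        return filled


orders_meta = ExtractionMetadata(
    source_location="/data/exports/orders.csv",   # hypothetical path
    source_type="flat file",
    structure={"order_id": "integer", "region": "text", "amount": "decimal"},
    access_method="nightly file transfer",
    privileges="read-only extract account",
    temporary_storage="staging area",
    failure_procedure="rerun after notifying the source owner",
    validity_checks=["order_id is unique", "amount is not negative"],
    missing_data_defaults={"region": "UNKNOWN", "amount": "0"},
)

filled = orders_meta.fill_missing({"order_id": "7", "region": ""})
```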

Page 19: Possible ETT Failure

* A missing source file
* A system failure
* Poor mapping information
* Inadequate storage planning
* A source structural change
* No contingency plan
* Inadequate data validation
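
Two of the failures listed on this slide, a missing source file and a source structural change, can be caught by a check that runs before extraction starts. This is a sketch under assumptions: the CSV layout, the expected column names, and the function name check_source are hypothetical, and the other failure causes on the slide need planning rather than code.

```python
import csv
import os
import tempfile

EXPECTED_COLUMNS = ["order_id", "region", "amount"]  # hypothetical layout


def check_source(path, expected_columns):
    """Return a list of problems found before extraction starts."""
    if not os.path.isfile(path):
        return ["missing source file: " + path]
    with open(path, newline="") as handle:
        header = next(csv.reader(handle), [])
    if header != expected_columns:
        return ["source structure changed: found %s" % header]
    return []


# Demonstration with temporary files so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    good = os.path.join(tmp, "orders.csv")
    with open(good, "w", newline="") as handle:
        csv.writer(handle).writerows([EXPECTED_COLUMNS, ["1", "EAST", "10.0"]])

    changed = os.path.join(tmp, "orders_changed.csv")
    with open(changed, "w", newline="") as handle:
        csv.writer(handle).writerows([["order_id", "amount"], ["1", "10.0"]])

    ok_result = check_source(good, EXPECTED_COLUMNS)
    changed_result = check_source(changed, EXPECTED_COLUMNS)
    missing_result = check_source(os.path.join(tmp, "absent.csv"), EXPECTED_COLUMNS)
```

Returning a list of problems instead of raising on the first one lets the calling job log every issue and then follow its contingency plan.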

Page 20: Maintaining ETT Quality

* ETT must be:
 - Tested
 - Documented
 - Monitored and reviewed
* Display metadata must be coordinated

Page 21: Selection Criteria

* Base functionality
* Interface features
* Metadata repository
* Open API
* Metadata access
* Repository utilities
* Input and output processing
* Cleansing, reformatting, and auditing
* Reference
* Training requirements

Page 22: WTI Partner ETT Tools

* Carleton
* Constellar
* Evolutionary Technologies
* Informatica
* Information Builders
* Oracle EDMS, Toolkits, OADW
* Prism Solutions
* Sagent
* Vality Technology

Page 23: Summary

This lesson discussed the following topics:

* ETT processes are essential and consume a large proportion of warehouse resources and time
* The extraction process acquires source data
* You may encounter many data sources
* There are many data extraction issues
* ETT tools should be considered