
Introduction to Data Warehousing - Dipartimento di …semeraro/GCI/Seminario_datawarehouse.pdf · Introduction to Data Warehousing Pasquale LOPS ... Design Considerations 9Scheduling

Feb 19, 2018


Introduction to Data Warehousing

Pasquale LOPS
Gestione della Conoscenza d’Impresa

A.A. 2003-2004

Introduction

Data warehousing and decision support have given rise to a new class of databases.

Design strategies for OLAP databases differ significantly from OLTP systems.

Today’s decision support systems must deliver multidimensional analysis capabilities.

Terminology – What is a Data Warehouse?

A database
- Typically read-only
- Data stored in relational or multidimensional format
- Multidimensional db often populated from relational db

Populated from existing source systems
- Secondary sources of data
- Populated from existing internal or external data sources
- It is possible to build DSS on top of operational systems

Used for reporting purposes
- Not transaction-based
- Used primarily for reporting purposes
- Must be designed for analysis purposes
- OLAP, not OLTP

Terminology – Decision Support and Multidimensional Analysis

Decision Support Systems (DSS)
- Facilitate business analysis
- Support business decision makers by providing various types of analysis: trend, comparison and ad hoc reporting

Multidimensional Analysis
- Allows analysis by business dimension
- Equates to ‘Flexible Reporting’
- Allows for drill down, drill up and iterative data analysis
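Drill up and drill down can be illustrated with a minimal sketch. The sales records, the two dimensions (region and store) and the `summarize` helper are assumptions made for illustration; they do not come from the slides.

```python
from collections import defaultdict

# Hypothetical sales records: (region, store, sales_dollars).
SALES = [
    ("East", "Boston", 120.0),
    ("East", "Philly", 80.0),
    ("West", "San Fran", 150.0),
    ("West", "Las Vegas", 50.0),
]

def summarize(records, level):
    """Total sales at the requested level: 'region' (drill up) or 'store' (drill down)."""
    totals = defaultdict(float)
    for region, store, dollars in records:
        key = region if level == "region" else (region, store)
        totals[key] += dollars
    return dict(totals)

by_region = summarize(SALES, "region")  # coarse view of the data
by_store = summarize(SALES, "store")    # detailed view of the same data
```

The same records answer both questions; only the dimension level used for grouping changes.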

Terminology – OLTP and OLAP

OLTP: On-Line Transaction Processing
- Support specific application
- Maintain integrity of data

OLAP: On-Line Analytical Processing
- Support business analysis

Points of Difference
- Orientation or alignment of data
- Integration
- History: time horizon of data
- Data access and manipulation
- Usage patterns

OLTP vs. OLAP – Orientation or Alignment of Data

OLTP: Organized around Applications
- Different systems hold different types of data.
- Data is inherently organized by application.
- Different information in a different system.

OLAP: Organized for Business Dimension
- All types of data are integrated into one system.
- Data is organized by defined dimensions of the business.
- Information from different systems stored in a single database.

OLTP vs. OLAP – Integration

OLTP: Typically Not Integrated
- Different key structures
- Different naming conventions
- Different file formats
- Different hardware platforms

OLAP: Must Be Integrated
- Standard key structures
- Standard naming conventions
- Standard file format
- One warehouse server (logical server)

OLTP vs. OLAP – History

OLTP: Recent or Current Data
- 60-90 days
- Current values only
- No time key
- No time series analysis
- Primary source

OLAP: Historical Data
- 2 or more years
- Historical snapshots of OLTP data
- Time key
- Time series analysis
- Secondary source
  - Until data is purged or lost from OLTP (after 60-90 days)

OLTP vs. OLAP – Data Access and Manipulation

OLTP: Transactions
- Inserts, Updates, Deletes, Selects
- Small amount of data involved in each transaction
- Highly ‘indexable’
- RDBMS focus
  - Locking
  - Concurrency
  - Logical Unit of Work

OLAP: Bulk Processes
- Selects only
- Large amount of data involved in each process
- Not always ‘indexable’
- RDBMS focus
  - Parallel Loader, Query
  - Star Join
  - Bit-mapped Indexes

OLTP vs. OLAP – Usage Patterns

OLTP: Fairly Consistent
- Maintain a constant system utilization pattern

OLAP: Spiked or Uneven
- Large period of light use and spiked usage pattern

[Figure: System Resource Utilization Graphs]

OLTP vs. OLAP – Summary

- Alignment: Aligned by Application (OLTP); Aligned by Dimension (OLAP)
- Integration: Typically Not Integrated (OLTP); Must Be Integrated (OLAP)
- History: Recent or Current Data (OLTP); Historical Data (OLAP)
- Data Access: Transactions (OLTP); Bulk Processes (OLAP)
- Usage: Fairly Consistent (OLTP); Spiked or Uneven (OLAP)

Warehouse Headaches
- Batch
- Maintenance
- Tuning

Intro to ERM and ERD

Terms and Concepts

ENTITIES

ERM Terminology

ERM - Entity Relationship Model (design)

ERD - Entity Relationship Diagram (graphical)

Entity - things of interest to the business, represented by boxes and implemented as tables

Attributes - things to know about an entity, implemented as columns in tables

Relationships - how entities relate, represented by lines on ERD and implemented as foreign keys

Entity Paradigms

- Rounded corners - ERD
- Square corners - Relational

Naming
- Should be singular in nature
- Consistency, communication, compatibility

RELATIONSHIPS

Relationships and Business Rules

[Diagram: RX Transaction and RX entities joined by a relationship line]

Relationship - Line and “crow’s foot” represent a foreign key relationship between RX Transaction and RX

Cardinality - crow’s foot means “one or more”, absence means “one”

Relationships and Business Rules (cont’d)

[Diagram: RX “allows” RX Transaction; RX Transaction “is allowed by” RX]

Optionality
- Solid bar means that the relationship MUST exist
- Circle means that the relationship MAY exist
- Use words near entities with optionality symbols to complete sentences for definition
  - RX may allow RX Transactions
  - RX Transactions must be allowed by an RX
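The two sentences above can be expressed as table constraints. This is a minimal sketch using SQLite through Python's `sqlite3` module; the table and column names (`rx`, `rx_transaction`, `rx_id`) are assumptions, not taken from the slides. The mandatory side ("must be allowed by an RX") becomes a NOT NULL foreign key, while the optional side ("may allow") needs no constraint at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE rx (
    rx_id INTEGER PRIMARY KEY
);
CREATE TABLE rx_transaction (
    rx_transaction_id INTEGER PRIMARY KEY,
    -- NOT NULL + REFERENCES: an RX Transaction MUST be allowed by an RX
    rx_id INTEGER NOT NULL REFERENCES rx(rx_id)
);
""")

conn.execute("INSERT INTO rx (rx_id) VALUES (1)")          # an RX MAY exist without transactions
conn.execute("INSERT INTO rx_transaction VALUES (10, 1)")  # transaction allowed by RX 1

def insert_orphan():
    """Try to store a transaction whose RX does not exist; return True if it is rejected."""
    try:
        conn.execute("INSERT INTO rx_transaction VALUES (11, 99)")
    except sqlite3.IntegrityError:
        return True
    return False

orphan_rejected = insert_orphan()
```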

Relationships

[Diagram: Vice President “manage” Department; Department “be managed by” Vice President]

One-to-One relationship:
- Each department must be managed by one VP.
- Each VP may manage one department.

Relationships (cont’d)

[Diagram: Management Team “contains” Vice President; Vice President “is contained in” Management Team]

One-to-Many relationship:
- Each management team must contain many VPs.
- Each VP may be contained in one management team.

Relationships (cont’d)

[Diagram: Employee “hold” Degree; Degree “is held by” Employee]

Many-to-Many relationship:
- Each Employee must hold one or more degrees.
- Each Degree may be held by one or more employees.

ATTRIBUTES

Attributes - Terminology

Attributes are the information we wish to keep about a particular entity

Example: Inventory

[Diagram: Inventory entity with attributes Store_id, Item_id, Amt, Units]

Attributes - Implementation

Entities and their attributes are shown as:
ENTITY NAME (Attribute1, Attribute2, Attribute3)

Inventory (Store_id, Item_id, Amount, Units)

To specify a primary key for an entity/table, underline the appropriate Attribute(s):
DRUG (Store_id, Item_id, Amount, Units)

For the purposes of normalization, repeating groups of attributes may be shown in brackets:
SALES (ITEM_ID, DATE, SALES_AMT, {ITEM_NAME, CLASS})
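As a rough illustration of how an entity and its primary key become a table, the sketch below creates the Inventory entity in SQLite. Treating (Store_id, Item_id) as the primary key is an assumption made for this example, as are the column types.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE inventory (
        store_id INTEGER NOT NULL,
        item_id  INTEGER NOT NULL,
        amount   REAL,
        units    INTEGER,
        PRIMARY KEY (store_id, item_id)  -- assumed key: one row per store and item
    )
""")
conn.execute("INSERT INTO inventory VALUES (1, 100, 25.0, 5)")
conn.execute("INSERT INTO inventory VALUES (1, 101, 12.5, 2)")  # same store, other item: allowed

try:
    conn.execute("INSERT INTO inventory VALUES (1, 100, 99.0, 9)")  # repeats an existing key
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```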

Warehouse Architecture Overview

Warehouse Overview

Basic components

[Diagram labels: Warehouse Server, Warehouse Access Tool, Source Systems, Design Strategies]

Warehouse Overview (cont’d)

Designers must consider and understand unique characteristics and requirements of all three previous components

Ideally, a project team should pick the best-of-class tools for storing and accessing data. In reality, all three pieces should be selected with regard to the others, to ensure that each component will complement the others.

Source Systems

One or more operational systems will be the source(s) of the data stored in the data warehouse.

[Diagram: Source Systems A, B and C; User Groups A, B and C]

Source Systems (cont’d)

Source systems are typically not integrated.
- Have unique key structures and unique naming conventions
- Possess overlapping data

Source systems hold current value data.

Source systems will indirectly define the scope of a warehouse.

Only data found in source systems can be included in data warehouse; no “new” data can be created.

Each operational system will have unique characteristics (levels of detail or granularity of data, types of data or metrics available)

Warehouse Server

[Diagrams: a DWH RDBMS on a single HW platform (typically UNIX-based); Distributed Architecture A, with multiple DWH RDBMS instances; Distributed Architecture B, with DWH RDBMS instances and a Gateway]

Warehouse Access Tool / Architecture

[Diagram: MOLAP and ROLAP (2-3 tier) access architectures. Components: DWH RDBMS, MDDB, App Server, Client and their HW platforms. Connections are labeled SQL, Messaging and MDDB Calls.]

Design Overview

The Warehouse Trade-Off Triangle

[Diagram labels: Query Performance, User Requirements, Data Warehouse Maintenance, Schema]

The ETL Process

ETL = Extraction, Transformation and Loading

Batch Process – Overview

[Diagram: on the Source System Server, an Extract Program reads the Source System and produces an Extract File; a File Transfer moves the file to the Landing Space on the Warehouse Server, where the Load File is loaded into the DWH RDBMS]

Batch Process – Extracts

Extracts are programs that generate data files.

Perform data transformations, data cleaning.

Perform key conversions.

Reformat data to the standards of the warehouse.

Must produce data in a file format suitable for loading into the data warehouse (delimiters, capitalization, etc.).

May build aggregation tables.
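The extract responsibilities listed above can be sketched as a small program. Everything specific here is an assumption for illustration: the `KEY_MAP` key conversion table, the pipe delimiter, the upper-casing rule and the sample rows are not taken from the slides.

```python
import csv
import io

# Hypothetical mapping from source-system keys to warehouse keys.
KEY_MAP = {"ST-A": 13, "ST-B": 21}

def extract(source_rows, out):
    """Write cleaned, re-keyed rows to `out` as a pipe-delimited load file."""
    writer = csv.writer(out, delimiter="|", lineterminator="\n")
    written = 0
    for row in source_rows:
        store_key = KEY_MAP.get(row["store"].strip())  # key conversion
        if store_key is None:
            continue  # data cleaning: skip rows whose key cannot be converted
        # Reformat to assumed warehouse standards: upper-case text, two decimals.
        writer.writerow([store_key,
                         row["item"].strip().upper(),
                         f"{float(row['amount']):.2f}"])
        written += 1
    return written

source = [
    {"store": "ST-A ", "item": "aspirin", "amount": "3.5"},
    {"store": "ST-X", "item": "unknown", "amount": "1"},
    {"store": "ST-B", "item": " Ibuprofen", "amount": "10"},
]
buffer = io.StringIO()  # stands in for the extract file
rows_written = extract(source, buffer)
```

Writing to any file-like object keeps the sketch testable; a real extract would write to a file on the source system server.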

Batch Process – Extracts (cont’d)

Basic Types of Extracts

1) Fact tables
Must provide load files for the following tables:
- Base tables
- Historical tables
- Aggregate tables

2) Lookup tables
Must provide data to populate the following tables:
- Lookup tables
- Relationship tables

Batch Process – Extracts (cont’d)

Static Extraction for the first loading of the DWH

Incremental Extraction for the update of the DWH

Batch Process – File Transfers

[Diagram: the Extract File produced on the Source System Server is moved by File Transfer to the Landing Space on the Warehouse Server, where it becomes the Load File]

Batch Process – File Transfers (cont’d)

File Transfer: Process of moving data files to data warehouse server.

After Extracts, generated files must be moved from source systems to data warehouse server.

Design Considerations

Transfer method and network impact
- Usually transferred via FTP
- Data volumes are usually large; therefore, the impact on the network and the transfer rate should be tested and understood

Landing space
- Landing space must have enough disk space on the data warehouse server to temporarily store extract files before they are loaded into the warehouse

Batch Process – File Transfers (cont’d)

Design Considerations

Scheduling routines

If there is not enough landing space, scheduling routines must be designed.

Routines must coordinate file transfers and database loads, transferring a new data file only after an existing file has been loaded and is no longer needed.
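A minimal sketch of such a scheduling routine follows. It only simulates the ordering of transfers and loads; the `run_schedule` function, the file names and the capacity expressed as a number of files are hypothetical simplifications.

```python
from collections import deque

def run_schedule(extract_files, landing_capacity):
    """Order transfers and loads so the landing space never holds more than
    `landing_capacity` files; a file's space is freed once it has been loaded."""
    if landing_capacity < 1:
        raise ValueError("landing space must hold at least one file")
    pending = deque(extract_files)  # files still on the source system server
    landing = deque()               # files currently in the landing space
    log = []
    while pending or landing:
        if pending and len(landing) < landing_capacity:
            name = pending.popleft()
            landing.append(name)
            log.append(("transfer", name))
        else:
            # Landing space is full, or nothing is left to transfer:
            # load the oldest file and release its space.
            name = landing.popleft()
            log.append(("load", name))
    return log

schedule = run_schedule(["sales.dat", "stores.dat", "items.dat"], landing_capacity=1)
```

With room for a single file, every transfer waits for the previous load to finish, which is the coordination the slide describes.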

Batch Process – Data Loads

[Diagram labels: Source System, Extract Program, Extract File, Source System Server, File Transfer, Landing Space, Load File, DWH RDBMS, Warehouse Server]

Batch Process – Data Loads (cont’d)

Data Load: data loaded from extract file into database.

Post-load processes

Aggregation routines

Basic Types of Load Procedures

Append new records to existing table.

Drop table and reload updated data file.

Update existing records.
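The three load procedures can be sketched against SQLite as follows. The `lookup_store` table and its rows are assumptions for illustration, and the table is emptied with DELETE rather than dropped, which is a simplification of "drop and reload".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lookup_store (store_id INTEGER PRIMARY KEY, store_desc TEXT)")

def append(rows):
    """Append new records to the existing table."""
    conn.executemany("INSERT INTO lookup_store VALUES (?, ?)", rows)

def drop_and_reload(rows):
    """Empty the table, then load the complete updated data file."""
    conn.execute("DELETE FROM lookup_store")
    conn.executemany("INSERT INTO lookup_store VALUES (?, ?)", rows)

def update(rows):
    """Update existing records in place, matching on the key."""
    conn.executemany("UPDATE lookup_store SET store_desc = ? WHERE store_id = ?",
                     [(desc, key) for key, desc in rows])

def contents():
    """Return all rows ordered by key."""
    return conn.execute(
        "SELECT store_id, store_desc FROM lookup_store ORDER BY store_id").fetchall()

append([(13, "San Fran"), (21, "Boston")])
append([(24, "Dallas")])
after_append = contents()
update([(13, "San Francisco")])
after_update = contents()
drop_and_reload([(27, "Philly")])
after_reload = contents()
```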

Batch Process – Data Loads (cont’d)

Post Load Processes

Must update table indexes after data loads.

For statistics-based optimizers, must update table and index statistics after loads.

Aggregation Routines

Must run aggregation routines if aggregate data preparation is performed in the data warehouse database.
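The post-load steps can be sketched with SQLite standing in for the warehouse RDBMS. Dropping the index before the load and recreating it afterwards is one possible reading of "update table indexes", assumed here for illustration. `ANALYZE` is SQLite's command for refreshing optimizer statistics; other RDBMSs use their own commands. Table and index names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact_sales (store_id INTEGER, item_id INTEGER, sales_dollars REAL)")

def load_then_post_process(rows):
    """Bulk-load rows, then rebuild the index and refresh optimizer statistics."""
    conn.execute("DROP INDEX IF EXISTS idx_fact_sales_store")  # load without index upkeep
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)", rows)
    # Post-load process: rebuild the index on the loaded table.
    conn.execute("CREATE INDEX idx_fact_sales_store ON fact_sales (store_id)")
    # Post-load process: refresh table and index statistics.
    conn.execute("ANALYZE")
    conn.commit()

load_then_post_process([(13, 1, 10.0), (13, 2, 5.0), (21, 1, 7.5)])

# SQLite keeps the gathered statistics in the sqlite_stat1 table.
stats = conn.execute(
    "SELECT tbl, idx FROM sqlite_stat1 WHERE idx = 'idx_fact_sales_store'"
).fetchall()
```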

Batch Jobs – Overview

Basic Refresh Jobs: necessary to update tables in warehouse with current information
- Lookup Table
- Fact Table
- Aggregate Table

Maintenance Jobs: necessary to maintain tables in warehouse
- Updating fact data
- Re-organizing data

Basic Refresh Jobs – Lookup Tables

Purpose

Apply changes in existing “organizational” systems to lookup data in data warehouse.

Changes include addition of new items or changes to descriptive information.

No changes to attribute keys or attribute relationships.

Basic Refresh Jobs – Lookup Tables (cont’d)

Basic Methods

No refresh
- Extract is run once to populate DWH
- Often used in pilot or prototype systems

Drop and reload
- Existing table is dropped or emptied
- Extract is re-run to capture current information
- Table is loaded with new extract

Append to existing table
- Extract is re-run to capture current information
- New extract and “old” or “master” lookup file are compared.
- New “Delta” file is generated.
- Delta is applied to master lookup file and lookup table in warehouse.
  - Delta file may be loaded into warehouse directly, OR
  - Delta may be applied to master lookup file, followed by the Drop and Reload method, loading the master lookup file.
- Sophisticated batch routines, normally used in production
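The compare step of the append method can be sketched as a dictionary comparison. The `compare` and `apply_delta` helpers and the sample lookup data are hypothetical; real master and extract files would be read from disk.

```python
def compare(master, extract):
    """Return the delta (new and changed entries) that brings `master` up to `extract`.

    Both arguments map a lookup key to its description. Keys missing from the
    new extract are left alone, since only additions and changes are handled here.
    """
    delta = {}
    for key, desc in extract.items():
        if master.get(key) != desc:
            delta[key] = desc
    return delta

def apply_delta(master, delta):
    """Apply the delta to a copy of the master lookup and return the result."""
    updated = dict(master)
    updated.update(delta)
    return updated

master_lookup = {13: "San Fran", 21: "Boston"}
new_extract = {13: "San Francisco", 21: "Boston", 24: "Dallas"}

delta_file = compare(master_lookup, new_extract)        # what changed since the last run
master_lookup = apply_delta(master_lookup, delta_file)  # master is now current
```

Only the delta needs to be loaded into the warehouse, which is why this method suits production systems with large lookup tables.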

Basic Refresh Jobs – Lookup Tables (cont’d)

[Diagram labels: Org Source System, Extract Program, Extract File 1/96, Extract File 2/96, Lookup Table in the DWH RDBMS]

Basic Refresh Jobs – Lookup Tables (cont’d)

[Diagram labels: Org Source System, Extract Program, Extract File 2/96, Master Lookup, Compare Program, Delta File, Lookup Table in the DWH RDBMS]

Basic Refresh Jobs – Fact Tables

Purpose

Refresh or update fact data in DWH with the new data from source systems.

Basic methods

Bulk or historical insert
- Extract is run to capture all data existing in source systems
- Data is bulk-loaded into data warehouse fact tables
- Simple batch routine
- Used to “start” warehouse or provide initial data sets
- Often doesn’t perform any cleansing or integration

Drop and reload

Append to existing table

Basic Refresh Jobs – Fact Tables (cont’d)

[Diagram labels: Fact Source System, Extract Program, Extract File for 9/95 thru 1/96, Fact Table in the DWH RDBMS]

Basic Refresh Jobs – Fact Tables (cont’d)

Basic methods

Drop and reload
- Historical or Bulk extract is re-run to capture all available data
- Existing warehouse table is emptied or truncated
- File is inserted into empty fact table
- Simple batch routine
- Used in prototypes or pilot
- Not feasible for large data sets

Append to existing table
- Extract is re-run to capture current information
- New extract is added to “end” of existing fact table
- Most common method used in production systems

Basic Refresh Jobs – Fact Tables (cont’d)

[Diagram labels: Fact Source System, Extract Program, Extract File 2/96, Append 2/96, Fact Table 9/95 - 1/96 in the DWH RDBMS]

Basic Refresh Jobs – Aggregate Tables

Purpose

Refresh or update aggregate tables.

Basic methods

Aggregate in warehouse RDBMS
- Produce atomic extract
- Transfer and load atomic extract into atomic fact table
- Produce aggregate values using SQL accessing atomic fact table
- Insert aggregate values into aggregate fact table

Aggregate in batch (on source systems or warehouse server)
- Produce atomic extract
- Transfer and load atomic extract into atomic fact table
- Produce aggregate extract from atomic extract
- Transfer and load aggregate extract into aggregate fact table
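The first method, aggregating inside the warehouse RDBMS, can be sketched with SQLite. The daily and weekly fact tables, their columns and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fact_sales_daily (store_id INTEGER, item_id INTEGER, week_id INTEGER,
                               day TEXT, sales_dollars REAL, sales_units INTEGER);
CREATE TABLE fact_sales_weekly (store_id INTEGER, item_id INTEGER, week_id INTEGER,
                                sales_dollars REAL, sales_units INTEGER);
""")

# Load the atomic extract into the atomic fact table.
conn.executemany("INSERT INTO fact_sales_daily VALUES (?, ?, ?, ?, ?, ?)", [
    (13, 1, 5, "1996-01-29", 10.0, 2),
    (13, 1, 5, "1996-01-30", 15.0, 3),
    (21, 1, 5, "1996-01-29", 7.0, 1),
])

# Produce aggregate values with SQL and insert them into the aggregate fact table.
conn.execute("""
    INSERT INTO fact_sales_weekly
    SELECT store_id, item_id, week_id, SUM(sales_dollars), SUM(sales_units)
    FROM fact_sales_daily
    GROUP BY store_id, item_id, week_id
""")

weekly = conn.execute("SELECT * FROM fact_sales_weekly ORDER BY store_id").fetchall()
```

Aggregating in batch would instead compute the same sums from the extract file before loading, trading database work for extra processing outside the RDBMS.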

Basic Refresh Jobs – Aggregate Tables (cont’d)

[Diagram: on the Source System Server, the Extract Program produces an Atomic Extract and an Aggregate Program produces an Aggregate Extract; in the Warehouse RDBMS, Aggregate SQL Routines connect the Atomic Fact Table and the Aggregate Fact Table]

Maintenance Jobs – Updating Fact Data

Purpose

As changes are made to source system data, they should be reflected in the data warehouse.

Basic Methods

Ignore changes.

Wait until audited data is available.

Drop and reload day’s extract.

Capture and apply changes.

Transfer changes.

Maintenance Jobs – Updating Fact Data (cont’d)

[Diagram: weekly calendar (Sun to Sat) showing Sun Pre-Audit Data, Wed Pre-Audit Data and Sun Post-Audit Data]

Scenario

Audit Process produces clean data set 3 days after initial set is posted.

Maintenance Jobs – Updating Fact Data (cont’d)

[Diagram labels: Fact Source System, Extract Program (Sunday), Sun Pre Data, Wed Pre Data, Sun Post Data, Compare Program, Delta File, Fact Table in the DWH RDBMS]

Maintenance Jobs – Re-Organizing Data

[Diagram: Region and Store entities]

Lookup Store (Store_id, Store_desc, Region_id)

Lookup Region (Region_id, Region_desc)

Relationship between Region and Store changes
- Must update foreign key in Store Lookup

Region_id  Store_id  Store_desc
2          13        San Fran
1          21        Boston
2 -> 1     24        Dallas
1          27        Philly
1          35        DC
2          57        Las Vegas

Dallas is moved to the East Region

Maintenance Jobs – Re-Organizing Data (cont’d)

Lookup Store (Region_id, Store_id, Store_desc)

Lookup Region (Region_id, Region_desc)

Must update key for Store in all tables

Best Case

Region_id  Store_id  Store_desc
1          04        Boston
1          07        Philly
1          11        DC
2          03        San Fran
2 -> 1     08        Dallas
2          11        Las Vegas

Worst Case

Region_id  Store_id  Store_desc
1          04        Boston
1          07 X      Philly
1          11        DC
2          03        San Fran
2 -> 1     07 X      Dallas
2          11        Las Vegas

Fact Sales (Region_id, Store_id, Item_id, Week_id, Sales_dollars, Sales_units)
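The best and worst cases can be sketched as a key-collision check. This assumes the store key is the composite (Region_id, Store_id) and reads the store numbers from the slide's table as 04, 07, 11, 03, 08/07 and 11, which is an interpretation of the extracted layout. The `move_store` helper is hypothetical.

```python
def move_store(lookup, old_key, new_region):
    """Move a store to another region under a composite (region_id, store_id) key.

    Returns the new key, or raises ValueError when the new key already exists
    (the worst case, where the store would also need a new store_id).
    """
    _, store_id = old_key
    new_key = (new_region, store_id)
    if new_key in lookup:
        raise ValueError(f"key {new_key} already used by {lookup[new_key]}")
    lookup[new_key] = lookup.pop(old_key)
    return new_key

# Best case: Dallas is store 08 in region 2; (1, 8) is free in region 1.
best = {(1, 4): "Boston", (1, 7): "Philly", (1, 11): "DC",
        (2, 3): "San Fran", (2, 8): "Dallas", (2, 11): "Las Vegas"}
best_key = move_store(best, (2, 8), 1)

# Worst case: Dallas is store 07 in region 2, but (1, 7) already belongs to Philly.
worst = {(1, 4): "Boston", (1, 7): "Philly", (1, 11): "DC",
         (2, 3): "San Fran", (2, 7): "Dallas", (2, 11): "Las Vegas"}
try:
    move_store(worst, (2, 7), 1)
    collision = False
except ValueError:
    collision = True
```

Either way, every fact row that carries the old store key must be rewritten with the new one, which is why the slide says the key must be updated in all tables.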

Maintenance Jobs – Re-Organizing Data (cont’d)

Store Sales (Store_id, Item_id, Date, Sales_Dollars, Sales_Units)

Region Sales (Region_id, Item_id, Date, Sales_Dollars, Sales_Units)

Lookup Store (Store_id, Store_desc, Region_id)

Lookup Region (Region_id, Region_desc)

Region_id  Store_id  Store_desc
2          13        San Fran
1          21        Boston
2 -> 1     24        Dallas
1          27        Philly
1          35        DC
2          57        Las Vegas

It is necessary to re-aggregate table values
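Re-aggregation after a re-organization can be sketched with SQLite. The table layouts follow the slide; the sample rows, the choice of East as region 1 and the rebuild-everything strategy are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE lookup_store (store_id INTEGER PRIMARY KEY, store_desc TEXT, region_id INTEGER);
CREATE TABLE store_sales (store_id INTEGER, item_id INTEGER, date TEXT,
                          sales_dollars REAL, sales_units INTEGER);
CREATE TABLE region_sales (region_id INTEGER, item_id INTEGER, date TEXT,
                           sales_dollars REAL, sales_units INTEGER);
""")
conn.executemany("INSERT INTO lookup_store VALUES (?, ?, ?)",
                 [(21, "Boston", 1), (24, "Dallas", 2), (57, "Las Vegas", 2)])
conn.executemany("INSERT INTO store_sales VALUES (?, ?, ?, ?, ?)",
                 [(21, 1, "1996-02-01", 10.0, 1),
                  (24, 1, "1996-02-01", 20.0, 2),
                  (57, 1, "1996-02-01", 30.0, 3)])

def reaggregate():
    """Rebuild region_sales from store_sales using the current store-to-region mapping."""
    conn.execute("DELETE FROM region_sales")
    conn.execute("""
        INSERT INTO region_sales
        SELECT l.region_id, s.item_id, s.date, SUM(s.sales_dollars), SUM(s.sales_units)
        FROM store_sales AS s JOIN lookup_store AS l ON l.store_id = s.store_id
        GROUP BY l.region_id, s.item_id, s.date
    """)
    return conn.execute(
        "SELECT region_id, sales_dollars FROM region_sales ORDER BY region_id").fetchall()

before = reaggregate()
# Dallas moves to the East region (assumed to be region 1).
conn.execute("UPDATE lookup_store SET region_id = 1 WHERE store_desc = 'Dallas'")
after = reaggregate()
```

Because Store Sales carries no Region_id, only the lookup row changes; the Region Sales aggregate, however, is stale until it is rebuilt.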

Batch Process – Frequency

Batch job frequencies differ with data sources and level of detail

Typically there will be a set of batch routines dedicated to each level of time detail (daily, weekly and monthly batch job)

Frequency is often but not necessarily tied to the level of time detail included in the data files to be loaded during that batch routine

Batch Process – Frequency (cont’d)

[Chart: Frequency vs. Detail. Axes: Frequency (Daily, Weekly) and Level of Detail (Daily, Weekly). Cells are labeled A, A, B and C??. Annotations: Weekly Tables, Daily Tables, Current Week.]

Current week problem

References

Golfarelli, M., Rizzi, S., Data Warehouse: Teoria e pratica della progettazione, McGraw-Hill, 2002.