Top Banner

of 20

Slowly Changning Dimension

May 29, 2018

Download

Documents

Deepak Rawat
Welcome message from author
This document is posted to help you gain knowledge. Please leave a comment to let me know what you think about it! Share it to your friends and learn new things together.
Transcript
  • 8/8/2019 Slowly Changning Dimension

    1/20

    Slowly Changing Dimension

  • 8/8/2019 Slowly Changning Dimension

    2/20

    In Data warehouse, there are two types of tables:

    --Dimension table

    --Fact table.

    Fact tables change regularly as new information collects, whereasdimension table changes very often.

    Although dimension tables are typically static lists, most dimensiontables do change over time. For example, we might need to updatean inventory dimension once a month to reflect new or changed partnumbers. Since these changes are smaller in magnitude comparedto changes in fact tables, these dimensions are known as slowlygrowing or slowly changing dimensions.

    Slowly changing dimensions are dimension tables that have slowlyincreasing dimension data and updates to existing dimensions.When updating existing dimensions, we decide whether to keep allhistorical dimension data, no historical data, or just the current andprevious versions of dimension data.

  • 8/8/2019 Slowly Changning Dimension

    3/20

    Using the Slowly Changing

    Dimensions Wizard The Slowly Changing Dimensions Wizard creates mappings to load slowly changing dimension tables: Type 1 Dimension mapping. Loads a slowly changing dimension table by inserting new dimensions and

    overwriting existing dimensions. Use this mapping when you do not want a history of previous dimension data.

    Type 2 Dimension/Version Data mapping. Loads a slowly changing dimension table by inserting new andchanged dimensions using a version number and incremented primary key to track changes. Use this mappingwhen you want to keep a full history of dimension data and to track the progression of changes.

    Type 2 Dimension/Flag Current mapping. Loads a slowly changing dimension table by inserting new andchanged dimensions using a flag to mark current dimension data and an incremented primary key to trackchanges.Use this mapping when you want to keep a full history of dimension data, tracking the progression ofchanges while flagging only the current dimension.

    Type 2 Dimension/Effective Date Range mapping. Loads a slowly changing dimension table by inserting newand changed dimensions using a date range to define current dimension data. Use this mapping when you want tokeep a full history of dimension data, tracking changes with an effective date range.

    Type 3 Dimension mapping.

    Loads a slowly changing dimension table by inserting new dimensions and updatingvalues in existing dimensions. Use this mapping when you want to keep the current and previous dimension

    values in the dimension table.

  • 8/8/2019 Slowly Changning Dimension

    4/20

    SCD Overview:

    Slowly Changing

    Dimensions Mapping

    Target Table History Data Handling

    Type 1 Dimension Slowly ChangingDimension

    None Inserts new dimensions. Overwrites existingdimensions with changed dimensions.

    Type 2

    Dimension/Version

    Data

    Slowly Changing

    Dimension

    Full Inserts new and changed dimensions. Creates a

    version number and increments the primary keyto track the changes.

    Type 2 Dimension/Flag

    Current

    Slowly ChangingDimension

    Full Inserts new and changed dimensions. Flags thecurrent version and increments the primary key

    to track the changes.

    Type 2

    Dimension/Effective

    Date Range

    Slowly ChangingDimension

    Full Inserts new and changed dimensions. Createsan effective date range to track changes.

    Type 3 Dimension Slowly ChangingDimension

    Partial Inserts new dimensions. Updates changedvalues in existing dimensions. Optionally uses

    the load date to track changes.

  • 8/8/2019 Slowly Changning Dimension

    5/20

    Creating a Type 1 Dimension Mapping

    The Type 1 Dimension mapping filters source rows based on user-

    defined comparisons and inserts only those found to be new

    dimensions to the target.

    Rows containing changes to existing dimensions are updated in the

    target by overwriting the existing dimension. In the Type 1Dimension mapping, all rows contain current dimension data.

    Handling Keys :

    When we use the Type 1 Dimension option, the Designer creates an

    additional column in the mapping target, PM_PRIMARYKEY. In thiscolumn, the Integration Service generates a primary key for each

    row written to the target, incrementing new key values by 1.

  • 8/8/2019 Slowly Changning Dimension

    6/20

    Understanding the SCD 1 Mapping

    The Type 1 Dimension mapping performs the following tasks:

    Selects all rows.

    Caches the existing target as a lookup table.

    Compares logical key columns in the source against corresponding columns in the target lookup table.

    Compares source columns against corresponding target columns if key columns match.

    Flags new rows and changed rows.

    Creates two data flows: one for new rows, one for changed rows.

    Generates a primary key for new rows.

    Inserts new rows to the target.

    Updates changed rows in the target, overwriting existing rows.

  • 8/8/2019 Slowly Changning Dimension

    7/20

    Data Example:

    PM_PRIMARYKEY

    ITEM STYLES

    65000 Sandal 5

    Data after first run:

    Data after second run:

    PM_PRIMARYK

    EY

    ITEM STYLES

    65000 Sandal 12

  • 8/8/2019 Slowly Changning Dimension

    8/20

    Creating a Type 2 Dimension/Version

    Data Mapping The Type 2 Dimension/Version Data mapping filters source rows based on user-definedcomparisons and inserts both new and changed dimensions into the target. Changes are trackedin the target table by versioning the primary key and creating a version number for eachdimension in the table.

    In the Type 2 Dimension/Version Data target, the current version of a dimension has the highestversion number and the highest incremented primary key of the dimension.

    When we use this option, the Designer creates two additional fields in the target:--PM_PRIMARYKEY. The Integration Service generates a primary key for each row

    written to the target.

    --PM_VERSION_NUMBER. The Integration Service generates a version number foreach row written to the target.

    Handling Keys : In a Type 2 Dimension/Version Data mapping, the Integration Service generates a new primary

    key value for each new dimension it inserts into the target. An Expression transformationincrements key values by 1,000 for new dimensions.

    When updating a dimension, the Integration Service increments the existing primary key by 1.

  • 8/8/2019 Slowly Changning Dimension

    9/20

    Understanding the SCD 2 Mapping

    using versioning

    The Type 2 Dimension/Version Data mapping performs the following tasks:

    Selects all rows.

    Caches the existing target as a lookup table.

    Compares logical key columns in the source against corresponding columns in the target lookup table.

    Compares source columns against corresponding target columns if key columns match.

    Flags new rows and changed rows.

    Creates two data flows: one for new rows, one for changed rows.

    Generates a primary key and version number for new rows.

    Inserts new rows to the target.

    Increments the primary key and version number for changed rows.

    Inserts changed rows in the target.

  • 8/8/2019 Slowly Changning Dimension

    10/20

    Data Example:

    PM_PRIMARYKEY ITEM STYLES PM_VERSION_NUMBER

    65000 Sandal 5 0

    Data after first run:

    Data after second run:

    PM_PRIMARYKEY ITEM STYLES PM_VERSION_NUMBER

    65000 Sandal 5 0

    65001 Sandal 12 1

    65002 Sandal 17 2

  • 8/8/2019 Slowly Changning Dimension

    11/20

    Creating a Type 2 Dimension/Flag

    Current Mapping The Type 2 Dimension/Flag Current mapping filters source rows based on user-defined

    comparisons and inserts both new and changed dimensions into the target. Changes are trackedin the target table by flagging the current version of each dimension and versioning the primarykey.

    In the Type 2 Dimension/Flag Current target, the current version of a dimension has a current flagset to 1 and the highest incremented primary key.

    When we use this option, the Designer creates two additional fields in the target:-- PM_CURRENT_FLAG. The Integration Service flags the current row 1 and all

    previous versions 0.

    --PM_VERSION_NUMBER. The Integration Service generates a version number foreach row written to the target.

    Handling Keys : The Integration Service generates a current flag of 1 for each row written to the target. This flag

    indicates the dimension is new or newly updated. If the row is an update to an existing dimension,the Integration Service resets the existing dimension current flag to 0.

    As a result, all current versions of a dimension appear in the target with a current flag of 1. Allprevious versions have a current flag of 0.

  • 8/8/2019 Slowly Changning Dimension

    12/20

    Understanding the SCD 2 Mapping

    using current flagThe Type 2 Dimension/Flag Current mapping performs the following tasks:

    Selects all rows.

    Caches the existing target as a lookup table.

    Compares logical key columns in the source against corresponding columns in the target lookup table.

    Compares source columns against corresponding target columns if key columns match.

    Flags new rows and changed rows.

    Creates two data flows: one for new rows, one for changed rows.

    Generates a primary key and current flag for new rows.

    Inserts new rows to the target.

    Increments the existing primary key and sets the current flag for changed rows.

    Inserts changed rows in the target.

    Updates existing versions of the changed rows in the target, resetting the current flag to indicate the row is nolonger current.

  • 8/8/2019 Slowly Changning Dimension

    13/20

    Data Example:

    PM_PRIMARYKEY ITEM STYLES PM_CURRENT

    _FLAG

    65000 Sandal 5 0

    Data after first run:

    Data after three runs:

    PM_PRIMARYKEY ITEM STYLES PM_CURRENT_FLAG

    65000 Sandal 5 0

    65001 Sandal 12 0

    65002 Sandal 17 1

  • 8/8/2019 Slowly Changning Dimension

    14/20

    Creating a Type 2 Dimension/Effective

    Date Range Mapping The Type 2 Dimension/Effective Date Range mapping filters source rows based on user-defined comparisons and

    inserts both new and changed dimensions into the target. Changes are tracked in the target table by maintainingan effective date range for each version of each dimension in the target.

    In the Type 2 Dimension/Flag Current target, the current version of a dimension has a current flag set to 1 and thehighest incremented primary key.

    When we use this option, the Designer creates two additional fields in the target:

    -- PM_BEGIN_DATE For each new and changed dimension written to the target, the Integration

    Service uses the system date to indicate the start of the effective date range for the dimension.--PM_END_DATE. For each dimension being updated, the Integration Service uses the system date

    to indicate the end of the effective date range for the dimension.

    --PM_VERSION_NUMBER. The Integration Service generates a version number for each rowwritten to the target.

    Handling Keys : The Integration Service generates a begin date for each new and changed dimension it inserts into the target,

    using the current system date. The end date for these dimensions is NULL.

    Each time the Integration Service inserts a changed dimension, it updates the previous version of the dimension inthe target, using the current system date to fill the previously null end date column.

    As a result, all current dimension data in the Type 2 Dimension/Effective Date Range target have null values in thePM_END_DATE column. All previous versions of dimension data have a system date in PM_END_DATE toindicate the end of the effective date range for each version.

  • 8/8/2019 Slowly Changning Dimension

    15/20

    Understanding the SCD 2 Mapping

    using Effective Date RangeThe Type 2 Dimension/Effective Date Range mapping performs the following tasks:

    Selects all rows.

    Caches the existing target as a lookup table.

    Compares logical key columns in the source against corresponding columns in the target lookup table.

    Compares source columns against corresponding target columns if key columns match.

    Flags new rows and changed rows.

    Creates three data flows: one for new rows, one for changed rows, one for updating existing rows.

    Generates a primary key and beginning of the effective date range for new rows.

    Inserts new rows to the target.

    Generates a primary key and beginning of the effective date range for changed rows.

    Inserts changed rows in the target.

    Updates existing versions of the changed rows in the target, generating the end of the effective date range toindicate the row is no longer current.

  • 8/8/2019 Slowly Changning Dimension

    16/20

    Data Example:

    PM_PRIMARYKEY ITEM STYLES PM_BEGIN_DATE

    PM_END_DATE

    65000 Sandal 5 9/1/98

    Data after first run:

    Data after second run:PM_PRIMARYKEY ITEM STYLES PM_BEGIN

    _DATEPM_END_DATE

    65000 Sandal 5 9/1/98 9/1/99

    65283 Sandal 12 9/1/99

  • 8/8/2019 Slowly Changning Dimension

    17/20

    Creating a Type 3 Dimension Mapping

    The Type 3 Dimension mapping filters source rows based on user-defined comparisons andinserts only those found to be new dimensions to the target. Rows containing changes to existingdimensions are updated in the target. When updating an existing dimension, the IntegrationService saves existing data in different columns of the same row and replaces the existing datawith the updates. The Integration Service optionally enters the system date as a timestamp foreach row it inserts or updates.

    In the Type 3 Dimension target, each dimension contains current dimension data.

    When we use this option, the Designer creates two additional fields in the target:

    --PM_PRIMARYKEY. The Integration Service generates a primary key for each rowwritten to the target.

    --PM_PREV_COLNAME. The Designer generates a previous column corresponding toeach column for which you want historical data. The Integration Service keeps the previousversion of dimension data in these columns.

    Handling Keys : In the Type 3 Dimension mapping, the Integration Service generates a primary key value for eachnew row written to the target, incrementing key values by one. Updated rows retain their originalkey values.

  • 8/8/2019 Slowly Changning Dimension

    18/20

    Understanding the SCD 3 Mapping

    The Type 3 Dimension mapping performs the following tasks:

    Selects all rows.

    Caches the existing target as a lookup table.

    Compares logical key columns in the source against corresponding columns in the target lookup table.

    Compares source columns against corresponding target columns if key columns match.

    Flags new rows and changed rows.

    Creates two data flows: one for new rows, one for updating changed rows.

    Generates a primary key and optionally notes the effective date for new rows.

    Inserts new rows to the target.

    Writes previous values for each changed row into previous columns and replaces previous values with updatedvalues.

    Optionally uses the system date to note the effective date for inserted and updated values.

    Updates changed rows in the target.

  • 8/8/2019 Slowly Changning Dimension

    19/20

    Data Example:

    PM_PRIMARYKEY ITEM STYLES PM_PREV_ST

    YLES

    65000 Sandal 5

    Data after first run:

    Data after second run:

    PM_PRIMARYKEY ITEM STYLES PM_VERSION

    _NUMBER

    65000 Sandal 12 5

  • 8/8/2019 Slowly Changning Dimension

    20/20

    Thank You.