SIEBEL SYSTEMS, INC.

Siebel Analytics 7/7.5

MetaData Construction Guidelines

Version 1.0

DRAFT

5/11/2008 Siebel Systems, Inc. Confidential and subject to change Page i

Revision History:

Date                 Author         Description
October 28, 2002     Jeff McQuigg   Original creation, incomplete draft
November 11, 2002    Kurt Wolff     Comments
November 12          Jeff McQuigg   Revision 1, draft
November 15          Jeff McQuigg   Revision 2, draft
November 25          Jeff McQuigg   Revision 3, draft, input from Paul Benedict and Kurt Wolff
February 17, 2003    Jeff McQuigg   Version 1

Table of Contents

REVISION HISTORY

TABLE OF CONTENTS

EXECUTIVE SUMMARY

ANALYTICS METADATA BEST PRACTICES
    Siebel Analytics 7.x MetaData Modeling Best Practices
    Data Warehousing Best Practices for Siebel Analytics 7.x

SIEBEL ANALYTICS META DATA MODELING SCENARIOS
    Circular Joins
    Multi-Fact table Metrics and Reports
    Dimensional Extension Tables
    Fact Extension Tables
    Combo Tables
    No direct physical link between a base Dimension and a Fact table
    Fragmentation
    Review Presentation Layer Aliases
    Many-to-Many Solutions

DATA WAREHOUSE ARCHITECTURAL RECOMMENDATIONS
    Simple Dimensional Attribute Denormalization
    Use of ROW_WIDs for LOVs
    Ensure Full Referential Integrity in the Physical Data Model
    Convert Non-Indexable Common Filters to Indexable Filters via ETL

Executive Summary

This document provides Siebel-recommended best practices and detailed solutions to several specific scenarios regarding Physical Data Modeling and MetaData construction in Siebel Analytics 7.x. It is intended as a reference guide for any Analytics project that requires development of the Analytics Repository, both for Siebel OOB Applications and Stand Alone. The recommendations are limited to the three layers in the MetaData (Physical, Business and Presentation) and do not address other Analytics areas such as report creation, server tuning, security, or installation. Additionally, several key Physical Data Modeling recommendations are reviewed which, if implemented, may reduce the complexity of the Analytics MetaData.

The document contains the following sections:

Analytics MetaData Best Practices, which gives a brief overview of best practices regarding overall Data Warehouse design and Siebel Analytics Meta Data construction.

Siebel Analytics Meta Data Modeling Scenarios, which presents a series of problems and issues along with detailed solutions.

Data Warehouse Architectural Recommendations, which reviews some design considerations when modifying or building the base data model.

Analytics MetaData Best Practices

This section contains several best practices for Data Warehouses as they relate to the Siebel Analytics 7.x MetaData. Many of the specifics behind the implementation of some of these Best Practices are discussed in greater detail in the following section.

Siebel Analytics 7.x MetaData Modeling Best Practices

Create the Business Model with 1:M complex joins between Logical Dimension Tables and the Facts. The Business Model should ideally resemble a simple star schema: facts surrounded by several dimensions that link directly into them. Modeling a snowflake allows more flexibility, but may create more columns in the Presentation Layer.

For Analytics version 7.0.x, map all Physical Fact sources to one Logical Fact table. Version 7.5.2+ can support multiple logical Fact tables, but it is recommended to do so only when entirely new Fact sources are added to an existing model. By doing so, identification of major additions to the MetaData will become easier. Modifications to existing Fact tables or aggregates should be added into the same Facts Logical Table.

Aggregate sources should be created as a separate Source within the single Logical Fact Table. Their Aggregation Content in the Content tab should describe which dimensions and at which levels they correspond.

Combine all like dimensional attributes into one logical dimension table. Where needed, include data from other dimensions into the main dimension source via the use of aliases in the Physical Layer. Ideally this should occur during the ETL for optimal performance.

Every logical dimension table should have a dimensional hierarchy associated with it. Ensure that all appropriate Fact sources link to the proper level in the hierarchy via Aggregation Content in the Content tab.

Eliminate all physical joins that cross dimensions with the use of aliases (Inter-Dimensional circular joins).

Eliminate all Circular Joins within a dimension in the Physical Model via the creation of physical table aliases (Intra-Dimensional Circular Joins).

To aid in reducing lurking physical joins, Import the Physical Data Model without FKs, and create them as needed. As an added management practice, use aliases for all tables that are used in the Logical layer – doing so will allow easy identification of which physical tables are used and which are not used.

Physically model Fact Extension tables to their base tables via 1:M FK joins, and include them in the existing source for the Logical Table. In certain cases the Extension table may be joined directly to the fact table to eliminate an additional join, improving performance.

Physically model Dimension Extension tables to their base tables via 1:M FK joins, and include them in the existing source for the logical table. Additionally, create a source for just the Dimension _DX table, and create a 1:M physical join between it and the Fact tables it applies to. Note that the PKs of the Dimension Base table and the Dimension Extension table are identical, and the relationship is required to be 1:1. Thus, although a circular join will occur in certain instances, it does not alter the record set or negatively impact performance.

Save backups of the online repository before and after every completed unit of work. If needed, use File | Copy As to make an offline copy with the changes you have made.

Ensure the MetaData is generating the correct record set first, then focus on performance tuning activities.

Ensure that aliases for presentation layer columns and tables are not used unless necessary. Verify that reports do not use the aliases. Be mindful that renaming an element at either the Presentation or Logical layer will cause this to occur.

When creating table aliases in the Physical Layer, keep the original table name, followed by the alias name, for example W_DAY_D Hire Date. This will keep all like tables together in the Physical layer window when tables are displayed alphabetically. Note this is the opposite of how the MetaData was developed.

Opaque Views (A Physical Layer table that consists of a Select statement) should be used only as a last resort option. Ideally a physical table should be created, or alternatively, a database view.

In general, push as much processing to the database as possible. This includes tasks such as filtering, string manipulation and additive measures.

Ensure that all levels of a hierarchy contain an appropriate value for the Number of elements at this level field. Fact sources are selected on a combination of the fields selected as well as the levels in the dimensions that they link into. By adjusting these values, you can alter the fact source that Analytics will select.

Data Warehousing Best Practices for Siebel Analytics 7.x

Denormalize data into _DX tables via the ETL process to reduce runtime joins to other tables.

Join to _WID values instead of codes or names.

Create new fact tables to support requirements when existing fact tables do not adequately meet the dimensional needs.

Create new fact tables or use the _FX to physically store links to other dimensions when not in the existing data model.

Move as much of the query logic to the ETL as possible to improve system response time. Pre-calculation of additive metrics and attributes will reduce query complexity and therefore response time.

The Physical Data Model should more closely resemble the Analytics Meta Data (Star/ Snowflake) instead of an OLTP system (approximately 3NF). When the Physical model becomes more like the underlying transactional model, performance problems will most likely arise.

Avoid Coded Records, where the meaning of a record or field changes depending upon the value of a field. An example of this would be if joins to the W_LOV_D table were done on Code and type, not with the ROW_WID as is currently done in the Siebel Applications.
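
The difference between a ROW_WID join and a code-based join can be sketched with an in-memory SQLite database. This is an illustration only: the W_LOV_D column layout shown here is simplified, and the DEMO_ACCT_D table and its columns are hypothetical names invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Simplified, assumed layout: the same CODE means different things per TYPE.
    CREATE TABLE W_LOV_D (ROW_WID INTEGER PRIMARY KEY, TYPE TEXT, CODE TEXT, NAME TEXT);
    -- Hypothetical dimension carrying both a surrogate key and a raw code.
    CREATE TABLE DEMO_ACCT_D (ROW_WID INTEGER PRIMARY KEY, NAME TEXT,
                              STATUS_WID INTEGER, STATUS_CODE TEXT);
    INSERT INTO W_LOV_D VALUES (1, 'ACCT_STATUS', 'A', 'Active');
    INSERT INTO W_LOV_D VALUES (2, 'ORDER_STATUS', 'A', 'Approved');
    INSERT INTO DEMO_ACCT_D VALUES (10, 'Acme', 1, 'A');
""")

# Join on the surrogate key: one unambiguous row per account.
by_wid = conn.execute("""
    SELECT a.NAME, l.NAME
    FROM DEMO_ACCT_D a
    JOIN W_LOV_D l ON l.ROW_WID = a.STATUS_WID
""").fetchall()

# Join on the code alone: the meaning of 'A' depends on TYPE, so rows multiply
# unless every query also remembers to filter on TYPE.
by_code_only = conn.execute("""
    SELECT a.NAME, l.NAME
    FROM DEMO_ACCT_D a
    JOIN W_LOV_D l ON l.CODE = a.STATUS_CODE
    ORDER BY l.ROW_WID
""").fetchall()

print(by_wid)
print(by_code_only)
```

The ROW_WID join needs a single indexable key, whereas the coded join is only correct when the extra TYPE predicate is added to every query that touches the table.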

Siebel Analytics Meta Data Modeling Scenarios

The following scenarios represent common situations that may arise in an Analytics project. Along with a description of each, solutions are provided that detail the techniques used to handle the issue.

Circular Joins

Issue Description

Ensure that all circular joins are removed from the physical model. There are two types of circular joins, each having a different effect on queries:

Intra-Dimensional

Description: Two join paths exist between two tables in a single source for a Logical Dimension table.

Effects: The wrong join path is chosen in the SQL, resulting in the wrong record set.

Example: Frequently seen when small decode tables are linked into two or more main tables in a dimensional source for the purposes of denormalization. For example, a Country Lookup Table may be linked to the Account table and to the Customer table, which is the parent of Account:

Solution: Alias each of the tables, and have one alias link to one base table, and the other alias to another. For example, alias the Dim_Country table, call it Dim_Customer_Country, and link it to the Dim_Customer table only. Then break the join between the original Dim_Country table and Dim_Customer, leaving it linked to Account. Thus one Dim_Country table links to one base table, and the other country table links to the other.

Determine whether the new aliased Dim_Country table should be mapped to the same logical columns as the original Country table (pure denormalization, where the Customer Country will always equal the Account Country) or mapped to new columns (different context, where Customer Country may not always equal Account Country).

If there is a pure denormalization occurring (Account_Country = Customer_Country), then an additional source for the logical table will be needed. As a source can have only one mapping per logical column, two sources will be needed, with one source having the original Dim_Country and another one having the new, aliased Dim_Country. In this case, map the Dim_Country and Dim_Customer_Country columns to the same logical columns.

The exception to the rule of no Intra-Dimensional joins is when dealing with _DX or _DH tables. As long as these tables are 1:1 with the base _D table, the Circular join will not cause a problem of reduced record set.
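
The effect of the aliasing solution can be sketched in plain SQL, run here through an in-memory SQLite database. The table names follow the example above, but the column names and the sample rows are assumptions made for illustration, and the SQL is hand-written rather than generated by Siebel Analytics.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Dim_Country  (ROW_WID INTEGER PRIMARY KEY, NAME TEXT);
    CREATE TABLE Dim_Customer (ROW_WID INTEGER PRIMARY KEY, NAME TEXT, COUNTRY_WID INTEGER);
    CREATE TABLE Dim_Account  (ROW_WID INTEGER PRIMARY KEY, NAME TEXT,
                               CUSTOMER_WID INTEGER, COUNTRY_WID INTEGER);
    INSERT INTO Dim_Country  VALUES (1, 'US'), (2, 'Germany');
    INSERT INTO Dim_Customer VALUES (100, 'Globex', 1);
    INSERT INTO Dim_Account  VALUES (500, 'Globex DE', 100, 2);
""")

# With a single Dim_Country reachable by two paths, the account's country may
# be resolved through the customer, which returns the wrong value.
wrong_path = conn.execute("""
    SELECT a.NAME, c.NAME
    FROM Dim_Account a
    JOIN Dim_Customer cu ON cu.ROW_WID = a.CUSTOMER_WID
    JOIN Dim_Country c   ON c.ROW_WID = cu.COUNTRY_WID
""").fetchall()

# With an alias, each copy of the country table has exactly one join path.
aliased = conn.execute("""
    SELECT a.NAME, Dim_Country.NAME, Dim_Customer_Country.NAME
    FROM Dim_Account a
    JOIN Dim_Country ON Dim_Country.ROW_WID = a.COUNTRY_WID
    JOIN Dim_Customer cu ON cu.ROW_WID = a.CUSTOMER_WID
    JOIN Dim_Country AS Dim_Customer_Country
         ON Dim_Customer_Country.ROW_WID = cu.COUNTRY_WID
""").fetchall()

print(wrong_path)
print(aliased)
```

In this sample the account is in Germany while its parent customer is in the US, which is the "different context" case where the aliased table should be mapped to new logical columns.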

Inter-Dimensional

Description: Two Logical Dimension Tables are physically connected by a join "behind the scenes".

Effects: An extra join is issued between the two dimensions when used in conjunction with facts. In some cases, there may be no impact, but more frequently the resulting record set may be incorrect.

Example: A common example of this is shown in the diagram below. Here, there are two Logical Dimension Tables, Time and Customer, and one Fact Logical Table. The Time Logical Table includes Dim_Day, and the Customer Logical Table includes both Dim_Customer and Dim_Day. The relationship between Dim_Customer and Dim_Day indicates when the customer was acquired.

In this example, the Dim_Day physical table is used in two different logical dimension tables, which will cause an invalid result. When Dim_Day, Dim_Customer, and Fact_Sales are used together, Analytics will join Dim_Day (indicating the day the revenue was booked) with both the Fact_Sales table and with the Customer_Acquire_Date in Dim_Customer. This restricts the result to revenue booked on the day the customer was acquired, which is almost never the intended record set.

Page 9: Analytics Metadata

DRAFT

5/11/2008 Siebel Systems, Inc. Confidential and subject to change Page 6

Solution: Alias each of the physical dimension tables and eliminate the join between them. Join the aliased table into one dimension, and the original, non-aliased one into the other dimension. In the example, Dim_Day is aliased, renamed, and joined into the Customer Dimension via a different join:

Comments

Out of the box, Siebel Analytics does not have any such circular join issues. An indication of this can be seen in the numerous aliases for common tables such as W_ORG_D, W_GEO_D, and W_PERSON_D. For these tables, several aliases have been made, each indicating its specific type (as defined by the dimensions in the Business Model). For example, W_ORG_D has aliases for Created by Org, Competitor, Owner Org, Shipped Account Org, etc. Each of these aliases is used in one and only one logical table/dimension.

Siebel Analytics 7.x functions best when its underlying data model is of the Star/Snowflake schema variety. In such a modeling schema, there are no cross-dimensional links: all instances of a physical table are replicated to align with their context (i.e. Dimension). There is no concept of simple Geography, for example; there are concepts of Sales Geography, Customer Geography, Originating Geography, Billing Geography, etc. The context of how the specific table is used is critical; by first identifying the multiple contexts in which it will be used, a determination of which Logical Tables/Dimensions (and therefore aliases) will be needed can be more readily made. Continuing with the example, there would most likely be logical tables in the Business Model for each of the Geographies: Sales Geography, Customer Geography, Originating Geography, Billing Geography. In each of these Logical Tables, there would be a corresponding aliased version of the W_GEO_D table.

If it is determined that one of the joins is not needed then the join can simply be deleted and the circular join will be solved.
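
The Inter-Dimensional case can be reproduced with a small in-memory SQLite sketch. The table names come from the example above; the column names, the sample rows, and the alias name acquire_day are assumptions for illustration, and the SQL is hand-written to mimic the join pattern rather than copied from Analytics output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Dim_Day      (ROW_WID INTEGER PRIMARY KEY, DAY_DT TEXT);
    CREATE TABLE Dim_Customer (ROW_WID INTEGER PRIMARY KEY, NAME TEXT,
                               ACQUIRE_DT_WID INTEGER);
    CREATE TABLE Fact_Sales   (DAY_WID INTEGER, CUSTOMER_WID INTEGER, REVENUE INTEGER);
    INSERT INTO Dim_Day VALUES (1, '2002-01-01'), (2, '2002-01-02'), (3, '2002-01-03');
    INSERT INTO Dim_Customer VALUES (10, 'Acme', 1);
    INSERT INTO Fact_Sales VALUES (1, 10, 100), (2, 10, 200), (3, 10, 50);
""")

# Circular join: the same Dim_Day instance is tied to both the booking day and
# the acquisition day, so only sales booked on the acquisition day survive.
circular_total = conn.execute("""
    SELECT SUM(f.REVENUE)
    FROM Fact_Sales f, Dim_Customer c, Dim_Day d
    WHERE f.CUSTOMER_WID = c.ROW_WID
      AND f.DAY_WID = d.ROW_WID
      AND c.ACQUIRE_DT_WID = d.ROW_WID
""").fetchone()[0]

# Aliased: the acquisition date uses its own copy of Dim_Day, so the booking
# day is no longer constrained by it.
aliased_total = conn.execute("""
    SELECT SUM(f.REVENUE)
    FROM Fact_Sales f, Dim_Customer c, Dim_Day d, Dim_Day acquire_day
    WHERE f.CUSTOMER_WID = c.ROW_WID
      AND f.DAY_WID = d.ROW_WID
      AND c.ACQUIRE_DT_WID = acquire_day.ROW_WID
""").fetchone()[0]

print(circular_total, aliased_total)
```

With these sample rows the circular version reports only the revenue from the acquisition day, while the aliased version reports all three sales.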

Multi-Fact table Metrics and Reports

Issue Description

It is frequently desired to have a report with metrics from two fact tables or a single metric derived from two fact tables.

Solution

Ensure that there are no Fact-to-Fact joins in the Physical Layer. Siebel Analytics will determine that two fact sources (and therefore two fact tables) are required to retrieve the desired results, and issue parallel SQL to the database, and finally join the results on the Analytics server.

Comments

A fact to fact join is frequently a very poor performing join, given the size of the record sets involved. A query which requires a metric from two fact sources (2 fact tables) will be handled via the parallel execution of SQL. A data set is retrieved from the 1st fact table, and another data set is retrieved from the 2nd fact table. The Analytics server then merges the two data sets on the Server, ideally on reduced record sets due to the prior application of filters in the database.

Fact Extensions do not fit into this category; as they are not standalone fact tables, they require a FK to the base fact table to derive Dimensional keys.

In most cases, this scenario would not occur, as metrics that are used together would be modeled into the same fact table.
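
The "query each fact separately, then merge" behaviour described above can be approximated in a few lines. This is a sketch of the idea only, not the Analytics server's implementation: the tables Fact_Orders and Fact_Calls, their columns, and the helper metric_by_account are hypothetical, and the two queries run one after the other here rather than in parallel.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Dim_Acct    (ROW_WID INTEGER PRIMARY KEY, NAME TEXT);
    CREATE TABLE Fact_Orders (ACCT_WID INTEGER, AMOUNT INTEGER);
    CREATE TABLE Fact_Calls  (ACCT_WID INTEGER, MINUTES INTEGER);
    INSERT INTO Dim_Acct VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO Fact_Orders VALUES (1, 100), (1, 200), (2, 50);
    INSERT INTO Fact_Calls  VALUES (1, 10), (1, 20), (1, 30);
""")


def metric_by_account(fact_table, column):
    """Run one aggregated query against a single fact source."""
    sql = (
        f"SELECT a.NAME, SUM(f.{column}) FROM {fact_table} f "
        "JOIN Dim_Acct a ON a.ROW_WID = f.ACCT_WID GROUP BY a.NAME"
    )
    return dict(conn.execute(sql).fetchall())


orders = metric_by_account("Fact_Orders", "AMOUNT")
calls = metric_by_account("Fact_Calls", "MINUTES")

# Merge the two reduced result sets on the shared dimension value, keeping
# accounts that appear in only one of them (a full outer join).
merged = {
    name: (orders.get(name), calls.get(name))
    for name in sorted(set(orders) | set(calls))
}

# For contrast: a direct fact-to-fact join repeats every order once per call.
fact_to_fact = conn.execute("""
    SELECT SUM(o.AMOUNT)
    FROM Fact_Orders o
    JOIN Fact_Calls c ON c.ACCT_WID = o.ACCT_WID
    WHERE o.ACCT_WID = 1
""").fetchone()[0]

print(merged)
print(fact_to_fact)
```

The merge operates on one row per account from each source, whereas the fact-to-fact join both inflates the metric and forces the database to combine the two large tables row by row.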

Dimensional Extension Tables

Issue Description

How to properly include Dimension Extension tables into Siebel Analytics.

Solution

Physical Layer:

When modeling a Dimension Extension (_DX), join it to the base Dimension table (_D) as a 1:M FK join on the ROW_WID. By mimicking a parent child between _DX and _D, the _DX will not be included in queries which do not require any of its fields, reducing overjoins. Additionally, join it to the facts in the same manner as the _D table, using a 1:M between the _DX and the Facts. By doing so, queries that need values from the _DX and the facts but not the _D will bypass the _D table, improving run time performance. Queries that use the Facts and values from both the _DX and the _D will produce an extra join; however this join is redundant and may be ignored. This assumes that there is a 1:1 between the _DX and the _D; if this is not the case then such a join path may not return valid results.

_DH tables should be modeled in an identical manner: 1:M _DH to _D and 1:M _DH to Facts as shown below. Note both join to the base dimension on the ROW_WID.

Business Layer:

Add the new extension table to the existing source for the Logical Dimension Table. A separate source for just the base dimension table should also be created for performance reasons. To support a new source with just the _DX or _DH, the physical joins between it and the facts must exist as shown above. Thus, a fully described Logical Table for this sample dimension is as follows:

Source W_ORG_D: includes tables W_ORG_D, W_ORG_DX and W_ORG_DH (required)

Source W_ORG_DX: includes table W_ORG_DX (optional, for performance)

Source W_ORG_DH: includes table W_ORG_DH (optional, for performance)

Comments

By modeling the _DX as a parent of the _D, Analytics simply thinks of it as another parent table. As Analytics is primarily designed to support Star/Snowflake schemas, it will not include parent tables when not necessary.

Note that the Extension table should be modeled in this manner even if it contains FKs to other dimensional tables.
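
A small SQLite sketch can show why the direct _DX-to-fact join is safe when the _DX is 1:1 with the _D. W_ORG_D and W_ORG_DX are the tables named above; the fact table DEMO_REVN_F, the column X_REGION, and all sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE W_ORG_D  (ROW_WID INTEGER PRIMARY KEY, NAME TEXT);
    -- Same primary key as the base table, one row per base row (1:1).
    CREATE TABLE W_ORG_DX (ROW_WID INTEGER PRIMARY KEY, X_REGION TEXT);
    CREATE TABLE DEMO_REVN_F (ORG_WID INTEGER, REVN INTEGER);
    INSERT INTO W_ORG_D  VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO W_ORG_DX VALUES (1, 'East'), (2, 'West');
    INSERT INTO DEMO_REVN_F VALUES (1, 100), (1, 200), (2, 50);
""")

# Path used by the _DX-only source: fact joined straight to the extension.
direct = conn.execute("""
    SELECT dx.X_REGION, SUM(f.REVN)
    FROM DEMO_REVN_F f
    JOIN W_ORG_DX dx ON dx.ROW_WID = f.ORG_WID
    GROUP BY dx.X_REGION
    ORDER BY dx.X_REGION
""").fetchall()

# Path through the base dimension: one extra join, same record set.
via_base = conn.execute("""
    SELECT dx.X_REGION, SUM(f.REVN)
    FROM DEMO_REVN_F f
    JOIN W_ORG_D d   ON d.ROW_WID = f.ORG_WID
    JOIN W_ORG_DX dx ON dx.ROW_WID = d.ROW_WID
    GROUP BY dx.X_REGION
    ORDER BY dx.X_REGION
""").fetchall()

print(direct)
print(via_base)
```

Both paths return the same rows only because every ROW_WID appears exactly once in each dimension table; if the 1:1 assumption is broken, the two paths can diverge.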

Fact Extension Tables

Issue Description

How to properly include Fact Extension tables into Siebel Analytics.

Solution

Physical Layer:

Model the Extension table (_FX) to the base Fact table (_F) as a parent-child 1:M. By mimicking a parent child between the _FX and the _F, the _FX will not be included in queries which do not require any of its fields.

Business Layer:

Add the new extension table to the existing source for the Facts. A separate source for just the base fact table will not be necessary. If the _FX table contains additional FKs to new dimensions, then the following will be necessary:

A physical join to the Dimension table

Adjust the Aggregation Content Filter to include the new dimension

Comments

If there is a case where a query will be generated that only contains metrics from the _FX and dimensions linked directly to the _FX, then create a new source for this _FX. However this scenario is unlikely, and therefore an additional source just for the _FX table is not needed. If this scenario is identified, a new fact table should be considered.

Combo Tables

Issue Description

In some cases, a single table is needed to support both Attributes and Metrics, and must therefore serve as both a Fact table and a Dimension table. Refer to the W_ACTIVITY_F table in the Siebel 7.x Core Meta Data model as a real-life example: it serves as both the dimension and the fact.

Solution

Create the physical model as would normally be done; no aliases are necessary for most cases. Create a logical Dimension table with a source containing all of the necessary tables to support the dimension. If a table has both facts and attributes, this may include a Fact table.

Do the same for the facts by creating a new source with all of the necessary tables to support the metrics. Note that this may include several dimensions, as there may be counts off of these dimension tables.

By not aliasing the fact table, an additional join will be eliminated when the two are used together. If one of the tables uses an alias, then a self-join will be used.

Comments

Care must be taken if the Dimensional version of the physical table is to be used in queries where it joins to another fact table.

Analytics will mix the direct joins that are desired for the table as a fact and as a dimension, and will over-join when the table is a dimension.

For example, assume the physical table Activities is used (without an alias) in both a Fact Source and a Dimension Source. When Activities serves as the fact table in a query, it joins to other dimensions such as the Dim_Acct table in a normal manner as shown above.

This scenario supports Activities as a dimension as well, shown below joining to a fact table:

A problem will occur when a query uses the Activities table as a dimension, and includes other dimensions that join to both the Fact table and the Combo table. The diagram below shows a query that wishes to see facts from W_FACTS_F by Activity and Account. The query that is generated will include an Inter-Dimensional join (see section above) between Dim_Acct and Activities, which will most likely alter the record set:

In this case, alias the Activities table, and have one of the Activities physical tables be the source for the dimension and the other be the source for the fact. Thus, Activities Dimension and Activities Fact will have different physical tables, and the Inter-Dimensional join will not occur:

No direct physical link between a base Dimension and a Fact table

Issue Description

There may be a need to link a particular dimension table with a particular fact table, but no FK to the dimension exists on the fact table. When another table exists which contains this relationship, it can be modeled in Siebel Analytics to create the link.

For example, a link is needed from W_PROGRAM_D to a fact table W_INVITEM_F, but there is no direct join possible between the two tables:

However, there are other tables in the Physical model that can be used to create this join, as shown below:

Through the use of the tables W_PERSON_D and W_CAMP_HIST_F, a link between W_PROGRAM_D and W_INVITEM_F can be established via the following steps. Note that this involves a Many-to-Many relationship between W_PROGRAM_D and W_INVITEM_F, and as such additional techniques discussed in a later section may be applicable.

Solutions

There are two alternative solutions to this problem. Solution (A) involves modifying the Facts, and Solution (B) involves modifying the Dimension. Solution A is simpler than Solution B, and is therefore recommended.

Solution A

This solution involves the creation of a new fact source for the Logical Fact table, with the added linkage tables included. By doing this, you are effectively adding a new FK to the logical fact source. Then, add the new dimension to the Aggregation Content filter for the source. This is in effect identical to Fact Extension (_FX) tables that have additional FKs to other dimensions. Thus, the Facts Logical Table will have the following two sources:

Source1: W_INVITEM_F, Aggregation Content at current levels

Source2: W_INVITEM_F, W_PERSON_D, W_CAMP_HIST_F, Aggregation content at same levels as Source1, plus the Program Dimension

Solution B

Solution B involves putting the two tables used to join (W_PERSON_D and W_CAMP_HIST_F) into a new, lower level in the existing Program dimension. This requires three main steps: first the creation of a new logical table source for the dimension with all three tables (W_PROGRAM_D, W_CAMP_HIST_F and W_PERSON_D). Second, a new lower level in the Dimensional Hierarchy needs to be created. Finally, the existing fact source needs to be adjusted to include the new Dimension by adding to its Aggregation Content filters.

Comments

This is a common need, and does represent a Many-to-Many scenario. As such, metrics will be over counted.
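
The over-counting can be made concrete with a minimal SQLite sketch of the linkage path. The four table names are those used in this section; the join columns (PERSON_WID, PROGRAM_WID), the AMOUNT metric, and the sample rows are assumptions for illustration and may not match the real warehouse columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE W_PROGRAM_D   (ROW_WID INTEGER PRIMARY KEY, NAME TEXT);
    CREATE TABLE W_PERSON_D    (ROW_WID INTEGER PRIMARY KEY, NAME TEXT);
    CREATE TABLE W_CAMP_HIST_F (PERSON_WID INTEGER, PROGRAM_WID INTEGER);
    CREATE TABLE W_INVITEM_F   (PERSON_WID INTEGER, AMOUNT INTEGER);
    INSERT INTO W_PROGRAM_D VALUES (1, 'Spring Promo'), (2, 'Loyalty');
    INSERT INTO W_PERSON_D VALUES (10, 'Pat');
    -- One person was touched by two programs: the many-to-many link.
    INSERT INTO W_CAMP_HIST_F VALUES (10, 1), (10, 2);
    INSERT INTO W_INVITEM_F VALUES (10, 100);
""")

# Fact joined to the Program dimension through the two linkage tables.
by_program = conn.execute("""
    SELECT p.NAME, SUM(f.AMOUNT)
    FROM W_INVITEM_F f
    JOIN W_PERSON_D per  ON per.ROW_WID = f.PERSON_WID
    JOIN W_CAMP_HIST_F h ON h.PERSON_WID = per.ROW_WID
    JOIN W_PROGRAM_D p   ON p.ROW_WID = h.PROGRAM_WID
    GROUP BY p.NAME
    ORDER BY p.NAME
""").fetchall()

true_total = conn.execute("SELECT SUM(AMOUNT) FROM W_INVITEM_F").fetchone()[0]
summed_over_programs = sum(amount for _, amount in by_program)

print(by_program, true_total, summed_over_programs)
```

Each program row is individually defensible, but the single invoice item is attributed in full to both programs, so any total across programs is double the real amount.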

Fragmentation

Issue Description

How to properly implement Fragmentation on a fact source.

Solution

In this example, the filter will be applied to the W_REVN_F_CURR table, which holds data for 2002 and beyond, and the W_REVN_F_HIST table, which holds data prior to 2002.

Map in both fact tables to separate sources in the Facts. Everything about them should be identical except for the table name.

For each fragmented fact table source, enter its filter in the Fragmentation content section as follows, using the Dimension, filter:

TheSystem.Time."Day" < DATE '2002-01-01' (for Historical fact source)
TheSystem.Time."Day" >= DATE '2002-01-01' (for current fact source)

All other variations and derivatives from the same hierarchy must be addressed as well, for example Month, Week, Quarter, Year, etc.

Be sure to check the check box labeled “This source should be combined with other sources at this level” if the fact source is a sub-set of the entire data set. If the source has data that overlaps another table, leave this check box unchecked.

Differing queries will display the following types of behavior:

Query by Day, with a range covering one of the fragments: Only one fragment will be used, with Analytics performing the filter.

SELECT W_REVN_F.REVN, W_DAY_D."Dim Date"
FROM TheSystem
WHERE W_DAY_D."Dim Date" BETWEEN date '2001-12-25' AND date '2001-12-31'

select T1030."REVN" as c4, T294."DAY_DT" as c5
from "W_DAY_D" T294, "W_REVN_F" T1030
where T294."ROW_WID" = T1030."CLOSE_DT_WID"

Query by Day, with a range covering both fragments: Unfiltered parallel SQL will be issued, one for each fragment. Analytics will then perform the filter:

SELECT W_REVN_F.REVN, W_DAY_D."Dim Date"
FROM TheSystem
WHERE W_DAY_D."Dim Date" BETWEEN date '2001-12-25' AND date '2002-01-05'

-------------------- Sending query to database named OLAP (id: <<8680457>>):
select T1030."REVN" as c1, T294."DAY_DT" as c2
from "W_DAY_D" T294, "W_REVN_F" T1030
where T294."ROW_WID" = T1030."CLOSE_DT_WID"

-------------------- Sending query to database named OLAP (id: <<8680532>>):
select T1030."REVN" as c1, T294."DAY_DT" as c2
from "W_DAY_D" T294, "W_REVN_F" T1030
where T294."ROW_WID" = T1030."CLOSE_DT_WID"
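
The choice between one fragment and both can be pictured as a range-overlap test against each source's Fragmentation content. The sketch below is only a mental model under that assumption, not the Analytics server's actual algorithm; the FRAGMENTS list and the function fragments_for_range are hypothetical names.

```python
from datetime import date

# (source name, inclusive lower bound, exclusive upper bound); None = unbounded.
# These mirror the two Day filters entered in the Fragmentation content.
FRAGMENTS = [
    ("W_REVN_F_HIST", None, date(2002, 1, 1)),
    ("W_REVN_F_CURR", date(2002, 1, 1), None),
]


def fragments_for_range(start, end):
    """Return the fragments whose content overlaps the inclusive range [start, end]."""
    selected = []
    for name, lower, upper in FRAGMENTS:
        starts_before_upper = upper is None or start < upper
        ends_at_or_after_lower = lower is None or end >= lower
        if starts_before_upper and ends_at_or_after_lower:
            selected.append(name)
    return selected


# Range inside the historical fragment: a single source is enough.
one_fragment = fragments_for_range(date(2001, 12, 25), date(2001, 12, 31))

# Range straddling 2002-01-01: both sources are needed.
both_fragments = fragments_for_range(date(2001, 12, 25), date(2002, 1, 5))

print(one_fragment)
print(both_fragments)
```

The two calls correspond to the two Day queries shown above: the first range needs only the historical source, the second needs a query per fragment.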

For each differing criteria that may be needed, a similar process must be undertaken. Continuing with the sample above, the tables will be described to Analytics so that if a user runs a report by Year, Analytics will know how to break up the query.

Add CAL_YEAR into the Logical Table for Facts, map it to each of the Fragments, and add it to the Logical Fact Table key.

Add in additional Fragment filters as follows:

TheSystem.Time."CAL_YEAR" < 2002 (for Historical fact source)
TheSystem.Time."CAL_YEAR" >= 2002 (for current fact source)

Note that the query results are different this time:

For a query on a single year, one fragment is used with the filter in the query:

SELECT W_REVN_F.REVN, W_DAY_D.CAL_YEAR
FROM TheSystem
WHERE W_DAY_D.CAL_YEAR = 2001

select sum(T1030."REVN") as c1, T294."CAL_YEAR" as c2
from "W_DAY_D" T294, "W_REVN_F" T1030
where T294."ROW_WID" = T1030."CLOSE_DT_WID" and T294."CAL_YEAR" = 2001
group by T294."CAL_YEAR"

For a query that hits multiple years, a single select containing two sub selects and a union all is used. Each sub select has the combined filter on it (e.g. For the fact table with 2002+, its select has a where clause of CAL_YEAR = 2002 or CAL_YEAR=2001):

SELECT W_REVN_F.REVN, W_DAY_D.CAL_YEAR
FROM TheSystem
WHERE W_DAY_D.CAL_YEAR IN (2001, 2002)

select sum(D3.c3) as c1, D3.c2 as c2
from (
  select T294."CAL_YEAR" as c2, T1030."REVN" as c3
  from "W_DAY_D" T294, "W_REVN_F" T1030
  where T294."ROW_WID" = T1030."CLOSE_DT_WID"
    and (T294."CAL_YEAR" = 2001 or T294."CAL_YEAR" = 2002)
  union all
  select T294."CAL_YEAR" as c2, T1030."REVN" as c3
  from "W_DAY_D" T294, "W_REVN_F" T1030
  where T294."ROW_WID" = T1030."CLOSE_DT_WID"
    and (T294."CAL_YEAR" = 2001 or T294."CAL_YEAR" = 2002)
) D3
group by D3.c2
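
The shape of the multi-year query can be tried against an in-memory SQLite database. This sketch assumes the two physical fragments are the W_REVN_F_HIST and W_REVN_F_CURR tables from the start of this section; the sample rows are invented, and the SQL is hand-written to mirror the pattern above rather than captured from the server log.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE W_DAY_D (ROW_WID INTEGER PRIMARY KEY, CAL_YEAR INTEGER);
    CREATE TABLE W_REVN_F_HIST (CLOSE_DT_WID INTEGER, REVN INTEGER);
    CREATE TABLE W_REVN_F_CURR (CLOSE_DT_WID INTEGER, REVN INTEGER);
    INSERT INTO W_DAY_D VALUES (1, 2001), (2, 2002), (3, 2003);
    INSERT INTO W_REVN_F_HIST VALUES (1, 100), (1, 50);
    INSERT INTO W_REVN_F_CURR VALUES (2, 200), (3, 999);
""")

# One sub-select per fragment, each carrying the full combined year filter,
# stitched together with UNION ALL and aggregated once on the outside.
rows = conn.execute("""
    SELECT SUM(D3.c3) AS c1, D3.c2 AS c2
    FROM (
        SELECT d.CAL_YEAR AS c2, f.REVN AS c3
        FROM W_DAY_D d, W_REVN_F_HIST f
        WHERE d.ROW_WID = f.CLOSE_DT_WID
          AND (d.CAL_YEAR = 2001 OR d.CAL_YEAR = 2002)
        UNION ALL
        SELECT d.CAL_YEAR AS c2, f.REVN AS c3
        FROM W_DAY_D d, W_REVN_F_CURR f
        WHERE d.ROW_WID = f.CLOSE_DT_WID
          AND (d.CAL_YEAR = 2001 OR d.CAL_YEAR = 2002)
    ) D3
    GROUP BY D3.c2
    ORDER BY D3.c2
""").fetchall()

print(rows)
```

Because the fragments do not overlap, UNION ALL is sufficient and no row is counted twice; the 2003 row in the current fragment is excluded by the filter inside its sub-select.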

Comments

Be aware of large tables that use database partitioning. In order for the database to properly use its partitions, the query must be structured in such a way that filtering occurs on the Fact table, and not the dimension table. This can be accomplished by denormalizing some of the time elements into the Time Logical table as new sources.

Review Presentation Layer Aliases

The use of aliases on the Presentation Layer for both table and columns can have undesired and difficult to diagnose effects. An alias is automatically created when the Presentation Layer object is renamed. Aliasing allows front end reports to continue to use the names on which they were developed for backward compatibility among Presentation layer versions.

Verify that all aliases are removed from the Presentation Layer unless required. Note that the removal of aliases will possibly invalidate several pre-existing reports if these reports were constructed before the new name of the column or table. These reports should be re-developed with the new Presentation Table and Column names, replacing the older ones.

Many-to-Many Solutions

It is common to want to model a Many-to-Many relationship between Dimensions and Facts. For example, it may be necessary to see all employees associated with an opportunity, not just the primary. This section presents a series of tools and techniques that may be applied to solve a particular case.

Technique #1: Select a Primary

Although not a technical solution, the best way to solve the M:M problem is to eliminate it. By selecting one of the many dimensional records that are associated with a fact, the entire problem can be avoided. The Siebel OLTP uses Primaries throughout its model, and these are carried over and used in the Analytics model. If it is at all possible to identify a primary, and the use of the primary is acceptable to the user community, then this technique is recommended.

Technique #2: Direct Modeling into the Dimension

This is a straightforward technique in which the table that serves as the intersection table is modeled into a lower level in the Dimension. The specifics of this technique are similar to those outlined in Solution B of the "No direct physical link between a base Dimension and a Fact table" section above.

Note that over-counting will occur when performing the many-to-many join.

Technique #3a: Use of a Bridge Table

Instead of modeling the relationship table into a new lower level in the dimension as in Technique #2, the relationship table can become a separate logical table that serves as the Bridge between the dimension and the facts. Create a new Logical table with the M:M relationship table as the source, mark the logical table as a Bridge table, and adjust the Business model to show the relationship of Facts:Bridge as 1:M and Bridge:Dimension as M:1. Marking the Logical Table as a Bridge table merely tells Analytics that the table is not a Fact table; otherwise, Analytics assumes that any lowest-level table in the data model is a Fact table.

Note that over-counting will occur when performing the many-to-many join.
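
The over-counting effect can be reproduced outside of Analytics. The following is a minimal sketch using Python's built-in sqlite3 module; the table names (opty_fact, employee_dim, opty_emp_bridge) and the data are hypothetical and are not part of the Siebel data model. It shows only what happens in plain SQL when a fact is joined through an unweighted bridge and then aggregated.

```python
import sqlite3

# Hypothetical tables and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE opty_fact (opty_id INTEGER PRIMARY KEY, opty_type TEXT, revenue REAL);
    CREATE TABLE employee_dim (emp_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE opty_emp_bridge (opty_id INTEGER, emp_id INTEGER);
    INSERT INTO opty_fact VALUES (1, 'Software Sales', 1000000);
    INSERT INTO employee_dim VALUES (1, 'Employee A'), (2, 'Employee B');
    INSERT INTO opty_emp_bridge VALUES (1, 1), (1, 2);
""")

# Joining through the bridge repeats the single fact row once per employee,
# so the aggregate counts the same revenue twice.
overcounted = conn.execute("""
    SELECT f.opty_type, SUM(f.revenue)
    FROM opty_fact f
    JOIN opty_emp_bridge b ON b.opty_id = f.opty_id
    JOIN employee_dim e ON e.emp_id = b.emp_id
    GROUP BY f.opty_type
""").fetchall()

# The revenue actually stored in the fact table.
true_total = conn.execute("SELECT SUM(revenue) FROM opty_fact").fetchone()[0]

print(overcounted, true_total)
```

The join result is twice the stored revenue because the bridge contributes two rows for the one opportunity.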

Technique #3b: Use a Weighted Bridge Table

Similar to Technique #3a, this technique is the classic Kimball approach, where the Bridge table employs weighting factors to prorate a total value over multiple records. For example, if there is one Opportunity worth $1,000,000 and there are two Employees associated with it, the bridge table might contain a record for each with a weighting factor of 0.5. In this way, each employee will be associated with 0.5 of the whole amount of $1,000,000, or $500,000. If it is determined that Employee A should receive 75% of the credit, then the weighting factors would be stored as 0.75 and 0.25, which would give Employee A 75% of the total, or $750,000.

It is important to note that the weighting factors must all add up to 1 (One), as they are effectively percentages of a whole. Additional ETL effort will be required to complete this solution.

This technique eliminates over-counting, but may be difficult to implement if users are not comfortable prorating a value over several records.
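
The arithmetic of the 75/25 example can be sketched as follows, using Python's built-in sqlite3 module. The table and column names (opty_fact, opty_emp_bridge, weight) are hypothetical; in a real implementation the weights would be produced by the ETL. The sketch includes a check that the weights for each opportunity sum to 1, which is the kind of validation the additional ETL effort would need to cover.

```python
import sqlite3

# Hypothetical tables and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE opty_fact (opty_id INTEGER PRIMARY KEY, revenue REAL);
    CREATE TABLE opty_emp_bridge (opty_id INTEGER, emp_name TEXT, weight REAL);
    INSERT INTO opty_fact VALUES (1, 1000000);
    INSERT INTO opty_emp_bridge VALUES (1, 'Employee A', 0.75), (1, 'Employee B', 0.25);
""")

# Validation: list any opportunity whose weights do not sum to 1.
bad_weights = conn.execute("""
    SELECT opty_id FROM opty_emp_bridge
    GROUP BY opty_id
    HAVING ABS(SUM(weight) - 1.0) > 1e-9
""").fetchall()

# Prorated revenue per employee: fact value multiplied by the bridge weight.
by_employee = dict(conn.execute("""
    SELECT b.emp_name, SUM(f.revenue * b.weight)
    FROM opty_fact f
    JOIN opty_emp_bridge b ON b.opty_id = f.opty_id
    GROUP BY b.emp_name
""").fetchall())

# The grand total across the bridge matches the fact table, so nothing is over-counted.
total = conn.execute("""
    SELECT SUM(f.revenue * b.weight)
    FROM opty_fact f
    JOIN opty_emp_bridge b ON b.opty_id = f.opty_id
""").fetchone()[0]

print(bad_weights, by_employee, total)
```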

Technique #4: Use Level Based Measures

As an enhancement to Techniques 2 and 3, the use of level-based measures can help prevent the over-counting problem associated with each. Explicitly binding a metric or measure to a specific level in a dimension indicates that the metric will be viewed at that level. If the metrics in a fact table are to be viewed by a Dimension with which the fact table has an M:M relationship, those metrics can be set to a level in the dimension, thereby forcing the records to be broken out across that dimension. By forcing a breakout of rows (one fact row for each dimensional row), aggregation is prevented, and therefore over-counting will not occur.

As an example, suppose there is an M:M between Employee and Fact_Opty_Revenue. The data in the tables indicate that Tom, Larry and Bill are all linked to an Opportunity worth $9 million. The user makes a report that asks for the Opportunity Type and the total Potential Opportunity Revenue. Without level setting the metrics on the fact table, a report that does not include the employee dimension will over-count, as each of the three dimension records will be brought into the query and aggregated into one:

Opportunity Type    Potential Opportunity Revenue
Software Sales      $27,000,000

By level setting the Revenue metrics to the Employee level in the Employee Dimension, this same report will return the following:

Opportunity Type    Potential Opportunity Revenue
Software Sales      $9,000,000
Software Sales      $9,000,000
Software Sales      $9,000,000

Although the cause of the breakout is not intuitively obvious to the end user, the over-counting scenario is prevented. When the user adds the Employee to the report, the breakout becomes clearer:

Opportunity Type    Employee    Potential Opportunity Revenue
Software Sales      Larry       $9,000,000
Software Sales      Tom         $9,000,000
Software Sales      Bill        $9,000,000

Technique #5: Lower the Fact Table

The most complicated and involved solution is to lower the level of the fact table, and create a 1:M between the Dimensions and the Facts. This involves a business rule to split up the metrics and spread them over all possible dimensional records. In the example above, the simplest spread would be to assign Larry, Tom and Bill each 1/3 of the total amount of $9,000,000, or $3,000,000. Thus, a report that does not break out by Employee will still total to the correct $9,000,000. Note that this would require three records in the fact table instead of one, hence the concept of lowering the level of detail in the fact.
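
The simplest spread rule described above (an even split) can be sketched as follows. The function name spread_evenly and the cent-rounding rule are assumptions made for illustration; the actual business rule for allocating a metric is up to the implementation. The sketch works in whole cents so that the lowered fact rows always add back to the original amount.

```python
from decimal import Decimal


def spread_evenly(amount, keys):
    """Split amount across keys in whole cents.

    Hypothetical helper: any cents left over after the even split are
    given to the first rows, so the parts always sum to the original amount.
    """
    cents = int(Decimal(str(amount)) * 100)
    base, extra = divmod(cents, len(keys))
    return {
        key: Decimal(base + (1 if i < extra else 0)) / 100
        for i, key in enumerate(keys)
    }


# One fact row of $9,000,000 becomes three fact rows, one per employee.
rows = spread_evenly(9_000_000, ["Larry", "Tom", "Bill"])
print(rows, sum(rows.values()))
```

A report that does not break out by Employee sums the three rows and still returns $9,000,000.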

Data Warehouse Architectural Recommendations

This section is intended to aid in the design process of the Physical Data Model upon which Siebel Analytics sits. Many difficult Analytics MetaData Modeling scenarios can be avoided through a properly designed Physical Data Model. As Siebel Analytics MetaData Modeling issues are addressed in the above section, this section is aimed at the Physical Data Model and items that may be added to an ETL process.

Simple Dimensional Attribute Denormalization

In many cases, logical dimension tables in the MetaData are overly complex and require many tables to provide the necessary attributes. When a dimension requires the inclusion of large tables, including fact tables, to link to this data, run time performance will suffer greatly. These values should be determined during the ETL process and stored in the _DX tables.

As an example, assume that the Employee Dimension requires the NAME field from W_ORG_D, and two ATTRIB columns from W_TERR_DX. To accomplish this, Analytics must join in the W_PERSON_F, W_PERSON_FX, W_TERR_DX and W_ORG_D tables at run time. Applying this logic at load time and storing these 3 values in the W_PERSON_DX table will not only speed up the queries, but will also simplify the Analytics MetaData by removing 4 aliases and their required joins.

This type of change can be performed independently on each dimension in a serial fashion. For example, Employee can be addressed first, followed by Account, then Opportunity, etc. For each dimension, review the Column Mapping for each source, and determine if these columns can be added to the base level _DX. Next, modify the Data Model (if need be), then the ETL code, re-import the affected tables, delete the aliases and joins that are no longer needed, and remap the column in the Logical Table source.

A second example demonstrates how this is done in the Siebel Analytics Horizontal Application:

Source Table    Source Column    OLAP Table     OLAP Column
S_ORG_EXT       LOC              W_PERSON_D     EMP_ACCNT_LOC
S_ORG_EXT       LOC              W_OPTY_D       ACCNT_LOC
S_ORG_EXT       LOC              W_ORG_D        ACCNT_LOC
S_ORG_EXT       LOC              W_PRODUCT_D    VENDOR_LOC

This table clearly shows how the physical LOC column is used in four different dimensions. With such denormalizations, it will not be necessary to perform any additional joins to retrieve the data from the LOC column from other tables.

Use of ROW_WIDs for LOVs

The use of ROW_WIDs when joining to the W_LOV_D table will simplify some of the joins and speed up query execution. LOV lookups can be done by joining on the NAME column and filtering on the TYPE column. Although this can be modeled in Analytics, it is far from ideal, as it complicates the model and forces additional and unnecessary joins. Modify the ETL process to do these lookups, and join based on the ROW_WIDs. Note that the aliasing of W_LOV_D will still be required.
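
The following sketch, using Python's built-in sqlite3 module, illustrates the idea of resolving the LOV lookup once in the ETL and joining on ROW_WID afterwards. W_LOV_D and its ROW_WID, TYPE and NAME columns come from the text above; the staging table STG_OPTY, the target table OPTY_LOADED, the STATUS_WID column and the LOV types are hypothetical names chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE W_LOV_D (ROW_WID INTEGER PRIMARY KEY, TYPE TEXT, NAME TEXT);
    -- Hypothetical staging and target tables
    CREATE TABLE STG_OPTY (OPTY_ID INTEGER, STATUS_NAME TEXT);
    CREATE TABLE OPTY_LOADED (OPTY_ID INTEGER, STATUS_WID INTEGER);
    INSERT INTO W_LOV_D VALUES
        (10, 'OPTY_STATUS', 'Open'),
        (11, 'OPTY_STATUS', 'Won'),
        (20, 'ACCNT_STATUS', 'Open');
    INSERT INTO STG_OPTY VALUES (1, 'Open'), (2, 'Won');
""")

# ETL step: resolve NAME plus TYPE to a ROW_WID once, at load time.
conn.execute("""
    INSERT INTO OPTY_LOADED
    SELECT s.OPTY_ID, l.ROW_WID
    FROM STG_OPTY s
    JOIN W_LOV_D l ON l.NAME = s.STATUS_NAME AND l.TYPE = 'OPTY_STATUS'
""")

loaded = conn.execute(
    "SELECT OPTY_ID, STATUS_WID FROM OPTY_LOADED ORDER BY OPTY_ID"
).fetchall()

# Query time: a single-column key join, with no TYPE filter needed.
report = conn.execute("""
    SELECT o.OPTY_ID, l.NAME
    FROM OPTY_LOADED o
    JOIN W_LOV_D l ON l.ROW_WID = o.STATUS_WID
    ORDER BY o.OPTY_ID
""").fetchall()

print(loaded, report)
```

Note that the name 'Open' exists under two LOV types in the sample data; the TYPE filter is applied once during the load, so the run-time join cannot pick up the wrong row.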

Ensure Full Referential Integrity in the Physical Data Model

The use of Outer Joins should be severely limited or eliminated in a properly designed Data Warehouse. It is a best practice to ensure RI by having all children assigned to a parent, ensuring that they are not omitted in queries. In cases where a parent record does not exist, assign the child to the ‘Unspecified’, ‘Unknown’, ‘N/A’ or similar type of record. This will allow a proper 1:M FK join to be built in Analytics, which provides the following benefits:

- The parent table will not be joined unless it is referenced in the query (no overjoin).

- An index will be used on the parent when it is joined into queries that use the child.

- No child records will be lost in queries that group by the parent. This is critical, as a Data Warehouse should account for all numerical values, even if the proper dimensionality is not known.

- It allows for simpler Analytics MetaData.

Note that the ETL should ensure RI before loading, as it is good warehousing practice to remove FK constraints in the database to speed load times.
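
As an illustration of enforcing RI in the ETL, the sketch below uses Python's built-in sqlite3 module. All table and column names (account_dim, revenue_fact, stg_revenue) and the choice of ROW_WID 0 for the 'Unspecified' record are assumptions made for this example only. Orphaned or missing parent keys are mapped to the 'Unspecified' record during the load, so the run-time query can use a plain inner join without losing any revenue.

```python
import sqlite3

# Hypothetical tables and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE account_dim (row_wid INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE revenue_fact (accnt_wid INTEGER, revn REAL);
    CREATE TABLE stg_revenue (accnt_key INTEGER, revn REAL);
    INSERT INTO account_dim VALUES (0, 'Unspecified'), (1, 'Acme');
    -- One valid parent key, one missing key, one key with no parent record
    INSERT INTO stg_revenue VALUES (1, 100.0), (NULL, 40.0), (99, 10.0);
""")

# ETL step: the outer join happens once, at load time. Any child without
# a matching parent is assigned to the 'Unspecified' record (row_wid 0).
conn.execute("""
    INSERT INTO revenue_fact
    SELECT COALESCE(d.row_wid, 0), s.revn
    FROM stg_revenue s
    LEFT JOIN account_dim d ON d.row_wid = s.accnt_key
""")

# Run-time query: an inner join is now safe, since every child has a parent.
by_account = dict(conn.execute("""
    SELECT d.name, SUM(f.revn)
    FROM revenue_fact f
    JOIN account_dim d ON d.row_wid = f.accnt_wid
    GROUP BY d.name
""").fetchall())

print(by_account)
```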

Convert Non-Indexable Common Filters to Indexable Filters via ETL

In some cases, it may be common to perform a complex filter on a column. If this filter is either used in multiple reports or used very heavily, then its corresponding logic should be moved into the ETL. For example, the following filter used in a report should be converted into an Indicator or flag via the ETL: W_ORG_D.NAME not like ‘%SI%’. By performing this calculation at load time and storing the result as an indicator column on the table, an index may be properly used, improving response time.
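
A minimal sketch of this conversion follows, using Python's built-in sqlite3 module. W_ORG_D.NAME and the filter come from the text above; the indicator column NOT_SI_FLG, the index name and the sample rows are hypothetical. The pattern match is evaluated once during the load, and reports then filter on the indexed flag with a simple equality test.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- NOT_SI_FLG is a hypothetical indicator column added for this example
    CREATE TABLE W_ORG_D (ROW_WID INTEGER PRIMARY KEY, NAME TEXT, NOT_SI_FLG TEXT);
    INSERT INTO W_ORG_D (ROW_WID, NAME) VALUES (1, 'SIEBEL'), (2, 'ACME'), (3, 'OASIS');

    -- ETL step: evaluate the non-indexable pattern match once, at load time
    UPDATE W_ORG_D
    SET NOT_SI_FLG = CASE WHEN NAME NOT LIKE '%SI%' THEN 'Y' ELSE 'N' END;

    CREATE INDEX W_ORG_D_NOT_SI ON W_ORG_D (NOT_SI_FLG);
""")

# Original report filter: a leading-wildcard match that cannot use an index on NAME.
via_like = conn.execute(
    "SELECT ROW_WID FROM W_ORG_D WHERE NAME NOT LIKE '%SI%' ORDER BY ROW_WID"
).fetchall()

# Converted filter: a plain equality test on the indicator column.
via_flag = conn.execute(
    "SELECT ROW_WID FROM W_ORG_D WHERE NOT_SI_FLG = 'Y' ORDER BY ROW_WID"
).fetchall()

print(via_like, via_flag)
```

Both filters return the same rows; only the second can be satisfied through the index on the flag.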