Dimensional Modeling Concept
Dimensional modeling is a logical design technique that seeks to
present data in a standard, intuitive framework that allows for
high-performance access. It is
inherently dimensional, and it adheres to a discipline that uses
the relational model with some important restrictions. Every
dimensional model is composed of one table with a multi-part key,
called the fact table, and a set of smaller tables called dimension
tables. Each dimension table has a single-part primary key that
corresponds exactly to one of the components of the multi-part key
in the fact table. (See Figure) This characteristic 'star-like'
structure is often called a star join. A fact table, because it has
a multi-part primary key made up of two or more foreign keys,
always expresses a many-to-many relationship. The most useful fact
tables also contain one or more numerical measures, or 'facts,'
that occur for the combination of keys that define each record. In
the figure, the facts are Units_Sold, Dollars_Sold, and Avg_sales.
The most useful facts in a fact table are numeric and additive
(note that an average such as Avg_sales is not itself additive).
Additivity is crucial because data warehouse applications almost
never retrieve a single fact table record; rather, they fetch back
hundreds, thousands, or even millions of these records at a time,
and the only useful thing to do with so many records is to add them
up. Dimension tables, by contrast, most often contain descriptive
textual information in the form of attributes (also called
classification attributes) that are used for analysis. Dimension
attributes are used as the source of most of the interesting
constraints in data warehouse queries, and they are virtually
always the source of the row headers in the SQL answer set.
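As a minimal sketch of the structure described above, the following uses Python's built-in sqlite3 module to build a tiny star join and aggregate its additive facts. The table names, column names, and sample rows are assumptions made for illustration; only Units_Sold and Dollars_Sold echo the facts named in the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: single-part primary key plus descriptive attributes.
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, product_name TEXT, category TEXT);
CREATE TABLE store_dim   (store_key   INTEGER PRIMARY KEY, store_name TEXT, city TEXT);
CREATE TABLE time_dim    (time_key    INTEGER PRIMARY KEY, month TEXT, year INTEGER);

-- Fact table: multi-part primary key made of foreign keys, plus additive facts.
CREATE TABLE sales_fact (
    product_key  INTEGER REFERENCES product_dim(product_key),
    store_key    INTEGER REFERENCES store_dim(store_key),
    time_key     INTEGER REFERENCES time_dim(time_key),
    units_sold   INTEGER,
    dollars_sold REAL,
    PRIMARY KEY (product_key, store_key, time_key)
);
""")

# Hypothetical sample data.
conn.executemany("INSERT INTO product_dim VALUES (?, ?, ?)",
                 [(1, "Desk", "Furniture"), (2, "Chair", "Furniture"), (3, "Lamp", "Lighting")])
conn.executemany("INSERT INTO store_dim VALUES (?, ?, ?)",
                 [(1, "Store A", "NY"), (2, "Store B", "Boston")])
conn.executemany("INSERT INTO time_dim VALUES (?, ?, ?)",
                 [(1, "September", 2005), (2, "October", 2005)])
conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?, ?)", [
    (1, 1, 1, 10, 1500.0),
    (2, 1, 1, 20, 1000.0),
    (3, 1, 1, 5, 100.0),
    (1, 2, 2, 4, 600.0),
    (3, 2, 2, 6, 120.0),
])

# Dimension attributes supply the constraint (city) and the row header
# (category); the facts are simply summed.
rows = conn.execute("""
    SELECT p.category, SUM(f.units_sold), SUM(f.dollars_sold)
    FROM sales_fact f
    JOIN product_dim p ON p.product_key = f.product_key
    JOIN store_dim   s ON s.store_key   = f.store_key
    WHERE s.city = 'NY'
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()

for category, units, dollars in rows:
    print(category, units, dollars)
```

The query never reads a single fact row in isolation: it filters on one dimension attribute, groups by another, and adds up the facts, which is the access pattern the paragraph describes.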
Fact Table and Dimension Tables in a Dimensional Model Schema
Let's consider a data warehouse cube. This cube has four dimensions
and two measures. This means that for every combination of values
of these four dimensions there will be two coordinate values. For
example:
Co-ordinate [City (X), Product (Y), Channel (Z), Month] = [Sales
(Quantity), Sales (Value)]
or
[NY, Standard Desk-top, Mail, September 2005] = [2000 units, $15000]
In the dimensional modeling schema, the fact table contains the
values of the coordinates at the lowest granularity of all the
possible combinations of dimensions. The dimension tables contain
the details of the dimensions, which include the attributes of the
dimensions including all the higher-level hierarchies. The link
between the fact table and all the associated dimension tables is
through a dimension key, which is the primary key of the dimension
table at its lowest level of granularity.
Fact Table- The central linkage in Dimensional Modeling
A fact table contains the values of all the measures linked to the
set of dimensions linked to the fact table. It contains the measure
values for each combination of the dimensions at their lowest level
of granularity. The measures are typically numeric, so they can
undergo mathematical aggregation and analysis.
Families of fact tables: chains and circles; heterogeneous
products; transactions and snapshots; aggregates.
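The cube coordinates described above can be sketched as a plain Python mapping from a dimension tuple to a measure tuple. Only the first cell comes from the text; the other cells and the `roll_up` helper are hypothetical and exist purely to show aggregation from the lowest grain.

```python
from collections import defaultdict

# Key: (city, product, channel, month); value: (sales quantity, sales value).
cube = {
    ("NY", "Standard Desk-top", "Mail", "September 2005"): (2000, 15000),
    ("NY", "Standard Desk-top", "Web", "September 2005"): (500, 4000),       # hypothetical cell
    ("Boston", "Standard Desk-top", "Mail", "September 2005"): (300, 2400),  # hypothetical cell
}


def roll_up(cube, keep):
    """Sum both measures over every dimension whose index is not in `keep`."""
    totals = defaultdict(lambda: [0, 0])
    for coords, (quantity, value) in cube.items():
        key = tuple(coords[i] for i in keep)
        totals[key][0] += quantity
        totals[key][1] += value
    return {key: tuple(sums) for key, sums in totals.items()}


# Aggregate away product, channel and month, keeping only city (index 0).
by_city = roll_up(cube, keep=(0,))
print(by_city)
```

Because the fact rows are stored at the lowest granularity, any coarser view (by city, by channel, by city and month) can be derived by summing; the reverse is not possible.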
Dimension Table- What does and should it contain
The dimension table contains all the information on the dimension.
This includes:
a. The primary key (the equivalent foreign key is in the fact
table).
b. All attributes of the dimension. These include:
The hierarchy attributes- Consider a business hierarchy: pin-code
to city to district to state to country for the location dimension.
Each hierarchy element will be an attribute.
Textual as well as code attributes- Location code as well as the
name of the location. This is required because the two could be
used for different reasons by different users. A power user could
be looking for the location code (NY01), whereas an end user could
be looking for a more explicit header (New York).
Parallel hierarchies- Include all parallel hierarchies. A product
could have different hierarchies, depending upon whether the CFO or
the Head of Sales is looking at it. This enables analysis to be
done on all hierarchies as well as across hierarchies.
Surrogate primary key (the link to the fact table)- Surrogate keys
are used instead of the production primary key because production
keys could change or could be reused. For example, a bill number
could be reused after 5 years, or a part number (especially in
FMCG) could be reused after a few years.
Production or source system key- This is required for auditability
and as a link to the extraction data and source systems.
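The reason for keeping both a surrogate key and the production key can be sketched as follows. This is an illustrative example using sqlite3; the table layout, the part numbers, and the `add_part` helper are assumptions, not something taken from a particular warehouse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE part_dim (
    part_key    INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key, referenced by facts
    part_number TEXT NOT NULL,                      -- production (source system) key
    part_name   TEXT NOT NULL
);
CREATE TABLE sales_fact (
    part_key   INTEGER NOT NULL REFERENCES part_dim(part_key),
    units_sold INTEGER NOT NULL
);
""")


def add_part(part_number, part_name):
    """Insert a dimension row and return its newly assigned surrogate key."""
    cur = conn.execute(
        "INSERT INTO part_dim (part_number, part_name) VALUES (?, ?)",
        (part_number, part_name),
    )
    return cur.lastrowid


old_key = add_part("P-100", "Soap bar 100g")
conn.execute("INSERT INTO sales_fact VALUES (?, ?)", (old_key, 40))

# Years later the source system reuses part number P-100 for a different product.
new_key = add_part("P-100", "Shampoo 200ml")
conn.execute("INSERT INTO sales_fact VALUES (?, ?)", (new_key, 15))

# Facts stay attached to the right product because they join on the surrogate key.
totals = conn.execute("""
    SELECT d.part_name, SUM(f.units_sold)
    FROM sales_fact f
    JOIN part_dim d ON d.part_key = f.part_key
    GROUP BY d.part_key
    ORDER BY d.part_key
""").fetchall()
print(totals)
```

Had the fact table joined on `part_number`, the old and new products would have been merged into one row; the production key is still stored so that each dimension row can be traced back to the source system.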
Dimensional Model Schemas- Star, Snow-Flake and Constellation
A dimensional model can be organized as a star schema or a
snowflake schema.
Dimensional Model Star Schema using Star Query
The star schema is perhaps the simplest data warehouse schema. It
is called a star schema because the entity-relationship diagram of
this schema resembles a star, with points radiating from a central
table. The center of the star consists of a large fact table and
the points of the star are the dimension tables.
A star schema is characterized by one or more very large fact
tables that contain the primary information in the data warehouse,
and a number of much smaller dimension tables (or lookup tables),
each of which contains information about the entries for a
particular attribute in the fact table.
A star query is a join between a fact table and a number of
dimension tables. Each dimension table is joined to the fact table
using a primary key to foreign key join, but the dimension tables
are not joined to each other. A cost-based query optimizer can
recognize star queries and generate efficient execution plans for
them.
A typical fact table contains keys and measures. For example, in
the sample schema, the fact table, sales, contains the measures
quantity_sold, amount, and average, and the keys time_key,
item_key, branch_key, and location_key. The dimension tables are
time, branch, item and location.
A star join is a primary key to foreign key join of the dimension
tables to a fact table.
The main advantages of star schemas are that they:
Provide a direct and intuitive mapping between the business
entities being analyzed by end users and the schema design.
Provide highly optimized performance for typical star queries.
Are widely supported by a large number of business intelligence
tools, which may anticipate or even require that the data warehouse
schema contains dimension tables.
Snow-Flake Schema in Dimensional Modeling
The snowflake schema is a more complex data warehouse model than a
star schema, and is a type of star schema. It is called a snowflake
schema because the diagram of the schema resembles a snowflake.
Snowflake schemas normalize dimensions to eliminate redundancy.
That is, the dimension data has been grouped into multiple tables
instead of one large table. For example, a location dimension table
in a star schema might be normalized into a location table and a
city table in a snowflake schema. While this saves space, it
increases the number of dimension tables and requires more foreign
key joins. The result is more complex queries and reduced query
performance. The figure above presents a graphical representation
of a snowflake schema.
Fact Constellation Schema
This schema is used mainly for aggregate fact tables, or where we
want to split a fact table for better comprehension. The split of a
fact table is done only when we want to focus on aggregation over a
few facts and dimensions.
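The trade-off between the star and snowflake layouts described above can be sketched with the location example from the text. The table names, columns, and rows below are assumptions for illustration; the point is only that the snowflaked version needs one more join to produce the same answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Star: one denormalized location dimension (city and state repeat per row).
CREATE TABLE location_star (location_key INTEGER PRIMARY KEY, street TEXT, city TEXT, state TEXT);

-- Snowflake: city attributes are moved into their own table.
CREATE TABLE city (city_key INTEGER PRIMARY KEY, city TEXT, state TEXT);
CREATE TABLE location_snow (
    location_key INTEGER PRIMARY KEY,
    street       TEXT,
    city_key     INTEGER REFERENCES city(city_key)
);

CREATE TABLE sales (location_key INTEGER, quantity_sold INTEGER, amount REAL);

-- Hypothetical sample data.
INSERT INTO location_star VALUES
    (1, '1 Main St', 'Albany', 'NY'),
    (2, '9 Oak Ave', 'Albany', 'NY'),
    (3, '5 Elm Rd',  'Newark', 'NJ');
INSERT INTO city VALUES (10, 'Albany', 'NY'), (20, 'Newark', 'NJ');
INSERT INTO location_snow VALUES (1, '1 Main St', 10), (2, '9 Oak Ave', 10), (3, '5 Elm Rd', 20);
INSERT INTO sales VALUES (1, 3, 30.0), (2, 2, 20.0), (3, 7, 70.0);
""")

# Star query: the fact table joins directly to one dimension table.
star_rows = conn.execute("""
    SELECT l.city, SUM(s.amount)
    FROM sales s
    JOIN location_star l ON l.location_key = s.location_key
    GROUP BY l.city
    ORDER BY l.city
""").fetchall()

# Snowflake query: same result, but an extra join is needed to reach the city name.
snow_rows = conn.execute("""
    SELECT c.city, SUM(s.amount)
    FROM sales s
    JOIN location_snow l ON l.location_key = s.location_key
    JOIN city c          ON c.city_key     = l.city_key
    GROUP BY c.city
    ORDER BY c.city
""").fetchall()

print(star_rows)
print(snow_rows)
```

In the star layout the strings 'Albany' and 'NY' are stored twice; the snowflake layout stores them once at the cost of the additional join.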
Dimensional Modeling vs. Relational Modeling
Dimensional modeling differs from OLTP normalized modeling in that
it is designed to enable analysis and querying through massive and
unpredictable queries, something that a normalized relational model
is ill-equipped to handle.
How is a dimensional model different from an E-R diagram?
An E-R diagram (used in OLTP or transactional systems) has a highly
normalized model (even at a logical level), whereas a dimensional
model aggregates most of the attributes and hierarchies of a
dimension into a single entity.
An E-R diagram is a complex maze of hundreds of entities linked
with each other, whereas the dimensional model has logically
grouped sets of star schemas.
The E-R diagram is split as per the entities. A dimensional model
is split as per the dimensions and facts.
In an E-R diagram all attributes for an entity, textual as well as
numeric, belong to the entity table, whereas a 'dimension' entity
in a dimensional model has mostly textual attributes, and the
'fact' entity has mostly numeric attributes.
Dimensional modeling is a better approach for a data warehouse than
a standard data model.
The dimensional model has a number of important data warehouse
advantages that the ER model lacks. The first advantage of the
dimensional model
is that it offers a standard framework with standard types of
joins. All dimensions can be thought of as symmetrically equal
entry points into the fact table. The logical design can be done
independently of expected query patterns. The user interfaces are
symmetrical, the query strategies are symmetrical, and the SQL
generated against the dimensional model is symmetrical. In other
words, you will never find attributes in fact tables or facts in
dimension tables. If you see a non-fact field in the fact table,
you can assume that it is a key to a dimension table.
The second advantage of the dimensional model is that it is
smoothly extensible to accommodate unexpected new data elements and
new design decisions. All existing tables (both fact and dimension)
can be changed in place by simply adding new data rows to the
table. Data should not have to be reloaded. Typically, no query
tool or reporting tool needs to be reprogrammed to accommodate the
change. All old applications continue to run without yielding
different results. You can make the following graceful changes to
the design after the data warehouse is up and running:
1. Adding new unanticipated facts (that is, new additive numeric
fields in the fact table), as long as they are consistent with the
fundamental grain of the existing fact table.
2. Adding completely new dimensions, as long as there is a single
value of that dimension defined for each existing fact record.
3. Adding new, unanticipated dimensional attributes.
4. Breaking existing dimension records down to a lower level of
granularity from a certain point in time forward.
The third advantage of the dimensional model is that there is a
body of standard approaches for handling common modeling situations
in the business world. Each of these situations has a
well-understood set of alternatives that can be specifically
programmed in report writers, query tools, and other user
interfaces. These modeling situations include:
Slowly changing dimensions, where a 'constant' dimension such as
Product or Customer actually evolves slowly and asynchronously.
Dimensional modeling provides specific techniques
for handling slowly
changing dimensions,
depending on the business environment.
Heterogeneous products, where a business such as a bank needs to
track a number of different lines of business together within a
single common set of attributes and facts, but at the same time
needs to describe and measure the individual lines of business in
highly idiosyncratic ways using incompatible measures.
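The extensibility claim made earlier in this section (new facts and new dimension attributes can be added without breaking existing queries) can be illustrated with a small sqlite3 sketch. Using `ALTER TABLE ... ADD COLUMN` is an assumption about how the change would be applied; the table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE sales_fact  (product_key INTEGER, dollars_sold REAL);
INSERT INTO product_dim VALUES (1, 'Desk'), (2, 'Chair');
INSERT INTO sales_fact  VALUES (1, 100.0), (1, 50.0), (2, 30.0);
""")

# A report query written before the design was extended.
OLD_QUERY = """
    SELECT p.product_name, SUM(f.dollars_sold)
    FROM sales_fact f
    JOIN product_dim p ON p.product_key = f.product_key
    GROUP BY p.product_name
    ORDER BY p.product_name
"""

before = conn.execute(OLD_QUERY).fetchall()

# Extend the design in place: a new additive fact at the same grain,
# and a new dimensional attribute. No data is reloaded.
conn.execute("ALTER TABLE sales_fact ADD COLUMN discount_dollars REAL DEFAULT 0")
conn.execute("ALTER TABLE product_dim ADD COLUMN color TEXT DEFAULT 'unknown'")

after = conn.execute(OLD_QUERY).fetchall()
print(before == after)
```

The old query names only the columns it needs, so it is unaffected by the added columns and returns the same rows before and after the change.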
Data Warehousing - Dimensions & Measures and Related Concepts
Each data warehouse consists of dimensions and measures.
Dimensions allow data analysis from various perspectives. For
example, time dimension could show you the breakdown of sales by
year, quarter, month, day and hour. Product dimension could help
you see which products bring in the most revenue. Supplier
dimension could help you choose those business partners who always
deliver their goods on time. Customer dimension could help you pick
the strategic set of consumers to whom you'd like to extend your
very special offers.
Measures are numeric representations of a set of facts that have
occurred. Examples of measures include dollars of sales, number of
credit hours, store profit percentage, dollars of operating
expenses, number of past-due accounts and so forth.
Additivity of Measures - Facts
Applying additivity and the correct aggregation methods is
fundamental to the success of Business Intelligence. The most
common mistakes that modelers and designers make are in setting the
right hierarchies and in establishing the right additivity and
aggregation rules. You need to go through the chapter on business
dimensional hierarchies before you go through this chapter.
A measure is additive when you are able to apply the sum operator
across all the dimensions. Other aggregations on measures (facts)
use operators such as Average, Maximum and Minimum.
Non-Additive Measures - Facts
Non-additivity is when you cannot use a sum operator to generate
the needed aggregation.
Semi-Additive Measures - Facts
Semi-additivity is when a measure can be aggregated along certain
dimensions, but not all dimensions. Semi-additivity can also be
described as summarization that carries an index of inaccuracy.
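The definitions above can be made concrete with a small sketch of a semi-additive measure. The account balances, customer name, and helper functions are all hypothetical; the example only shows that the same measure sums sensibly along one dimension (account) and not along another (time).

```python
# (account, customer, day) -> end-of-day balance; all values are hypothetical.
balances = {
    ("A1", "alice", "2024-01-01"): 100,
    ("A2", "alice", "2024-01-01"): 50,
    ("A1", "alice", "2024-01-02"): 120,
    ("A2", "alice", "2024-01-02"): 50,
}


def total_on_day(customer, day):
    """Valid: balances add up across the account dimension for one day."""
    return sum(b for (_, cust, d), b in balances.items() if cust == customer and d == day)


def naive_sum_over_time(customer):
    """Misleading: adding snapshots across days counts the same money repeatedly."""
    return sum(b for (_, cust, _), b in balances.items() if cust == customer)


def average_daily_total(customer):
    """One common alternative over time: average the daily totals instead of summing."""
    days = sorted({d for (_, cust, d) in balances if cust == customer})
    return sum(total_on_day(customer, d) for d in days) / len(days)


print(total_on_day("alice", "2024-01-01"))
print(naive_sum_over_time("alice"))
print(average_daily_total("alice"))
```

The naive sum over time is larger than any balance the customer ever held, which is why a different operator (average, last value, maximum) is normally chosen for the time dimension.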
Additive measures are measures that can be added across all
dimensions. For example, dollars of sales can be added across all
dimensions within a retail store warehouse.
Semi-additive measures are measures that can be added across some,
but not all, dimensions. For example, a bank account balance is
simply a snapshot in time and cannot be summed over time. However,
you could add multiple accounts of the same customer to get the
total balance for that customer.
Non-additive measures are measures that cannot be added across any
dimension. For example, a ratio such as store profit percentage
cannot be summed along any dimension. Inventory counts behave
similarly in this example: inventory is simply a snapshot in time
and cannot be summed over time, nor can unit counts for different
products be meaningfully combined.
Hierarchy defines parent-child relationships among various levels
within a single dimension. For instance, in a time dimension, the
year level is the parent of four quarters, each of which is a
parent of three months, which are parents of 28 to 31 days, which
are parents of 24 hours. Similarly, in a geography dimension a
continent is a parent of countries, a country could be a parent of
states, and a state could be a parent of cities.
Level is a column within a dimension table that could be used for
aggregating data. For example, a product dimension could have
levels of product type (beverage), product category (alcoholic
beverage), product class (beer), and product name (Miller Lite, Bud
Light, Corona, etc.).
Member is a value within a dimension level that can be used for
aggregating and reporting data. For example, each product category
such as beverage, non-consumable, food, clothing, etc. is a member.
Each product class such as beer, wine, coke, bottled water would
represent a member.
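As a rough sketch of how hierarchy, level and member fit together, the dimension below stores every level of the product hierarchy as a column on each row, and the same sales are rolled up at different levels. The wine row, the sales amounts, and the `roll_up` helper are hypothetical.

```python
from collections import defaultdict

# Each row is one leaf member together with its ancestors at every level.
product_dim = {
    1: {"type": "beverage", "category": "alcoholic beverage", "class": "beer", "name": "Miller Lite"},
    2: {"type": "beverage", "category": "alcoholic beverage", "class": "beer", "name": "Corona"},
    3: {"type": "beverage", "category": "alcoholic beverage", "class": "wine", "name": "House Red"},  # hypothetical
}

# (product_key, sales amount) fact rows; values are made up.
sales = [(1, 10.0), (2, 5.0), (3, 8.0), (1, 2.0)]


def roll_up(level):
    """Aggregate sales to the members of the requested hierarchy level."""
    totals = defaultdict(float)
    for product_key, amount in sales:
        member = product_dim[product_key][level]
        totals[member] += amount
    return dict(totals)


print(roll_up("name"))
print(roll_up("class"))
print(roll_up("category"))
```

Each distinct value returned as a key (for example "beer" at the class level) is a member; choosing a different level column changes how far the facts are rolled up.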
Data Mart is a subset of the data warehouse typically serving a
functional area such as marketing or finance, or a particular
location of the business (for instance the Midwestern division).
Data Warehousing - Fact and Dimension Tables
Data warehouses are built using dimensional data models which
consist of fact and dimension tables. Dimension tables are used to
describe dimensions; they contain dimension keys, values and
attributes. For example, the time dimension would contain every
hour, day, week, month, quarter and year that has occurred since
you started your business operations. Product dimension could
contain a name and description of products you sell, their unit
price, color, weight and other attributes as applicable.
Dimension tables are typically small, ranging from a few to
several thousand rows. Occasionally dimensions can grow fairly
large, however. For example, a large credit card company could have
a customer dimension with millions of rows. Dimension table
structure is typically very lean; for example, a customer dimension
could look like the following:
Customer_key
Customer_full_name
Customer_city
Customer_state
Customer_country
Although there might be other attributes that you store in the
relational database, data warehouses might not need all of those
attributes.
For example, customer telephone numbers, email addresses and other
contact information would not be necessary for the warehouse. Keep
in mind that data warehouses are used to make strategic decisions
by analyzing trends; they are not meant to be a tool for daily
business operations. On the other hand, you might have some reports
that do include data elements that aren't necessary for data
analysis.
Most data warehouses will have one or multiple time dimensions.
Since the warehouse will be used for finding and examining trends,
data analysts will need to know when each fact has occurred. The
most common time dimension is calendar time. However, your business
might also need a fiscal time dimension if your fiscal year does
not start on January 1st as the calendar year does. Most data
warehouses will also contain product or service dimensions since
each business typically operates by offering either products or
services to others. Geographically dispersed businesses are likely
to have a location dimension.
Fact tables contain keys to dimension tables as well as
measurable facts that data analysts would want to examine. For
example, a store selling automotive parts might have a fact table
recording a sale of each item. The fact table of an educational
entity could track credit hours awarded to students. A bakery could
have a fact table that records manufacturing of various baked
goods.
Fact tables can grow very large, with millions or even billions
of rows. It is important to identify the lowest level of facts that
makes sense to analyze for your business; this is often referred to
as the fact table "grain". For instance, for a healthcare billing
company it might be sufficient to track revenues by month; daily
and hourly data might not exist or might not be relevant. On the
other hand, assembly line warehouse analysts might be very
concerned with the number of defective goods that were manufactured
each hour. Similarly, a marketing data warehouse might be concerned
with the activity of a consumer group with a specific income level
rather than purchases made by each individual.
Data Warehousing - Star and Snowflake Schemas
The foundation of each data warehouse is a relational database
built using a dimensional model. A dimensional model consists of
dimension and fact tables and is typically described as star or
snowflake schema.
Star schema resembles a star; one or more fact tables are
surrounded by the dimension tables. Dimension tables aren't
normalized - that means even if you have repeating fields such as
name or category no extra table is added to remove the redundancy.
For example, in a car dealership scenario you might have a product
dimension that might look like this:
Product_key
Product_category
Product_subcategory
Product_brand
Product_make
Product_model
Product_year
In a relational system such design would be clearly unacceptable
because product category (car, van, truck) can be repeated for
multiple vehicles and so could product brand (Toyota, Ford,
Nissan), product make (Camry, Corolla, Maxima) and model (LE, XLE,
SE and so forth). So a vehicle table in a relational system is
likely to have foreign keys relating to vehicle category, vehicle
brand, vehicle make and vehicle model. However in the dimensional
star schema model you simply list out the names of each vehicle
attribute.
Star schema also contains the entire dimension hierarchy within
a single table. Dimension hierarchy provides a way of aggregating
data from the lowest to highest levels within a dimension. For
example, Camry LE and Camry XLE sales roll up to Camry make, Toyota
brand and cars category. Here is what a star schema diagram could
look like:
[Figure: star schema diagram (File:ASDW3 138.gif)]
Notice that each dimension table has a primary key. The fact
table has foreign keys to each dimension table. Although a data
warehouse does not require creating primary and foreign keys, it is
highly recommended to do so for two reasons:
1. Dimensional models that have primary and foreign keys provide
superior performance, especially for processing Analysis Services
cubes.
2. Analysis Services requires creating either physical or logical
relationships between fact and dimension tables. Physical
relationships are implemented through primary and foreign keys.
Therefore if the keys exist you save a step when building cubes.
Snowflake schema resembles a snowflake because dimension tables
are further normalized or have parent tables. For example, we could
extend the product dimension in the dealership warehouse to have
product_category and product_subcategory tables. Product categories
could include trucks, vans, sport utility vehicles, etc. Product
subcategory tables could contain subcategories such as leisure
vehicles, recreational vehicles, luxury vehicles, industrial trucks
and so forth. Here is what the snowflake schema would look like
with extended product dimension:
[Figure: snowflake schema diagram with extended product dimension (File:ASDW3 139.gif)]
A snowflake schema generates more joins than a star schema during
cube processing, which translates into longer-running queries.
Therefore it is normally recommended to choose the star schema
design over the snowflake schema for optimal performance. The
snowflake schema does have the advantage of providing more
flexibility, however. For example, if you were working for an auto
parts store chain you might wish to report on car parts (car doors,
hoods, engines) as well as subparts (door knobs, hood covers,
timing belts and so forth). In such cases you could have both part
and subpart dimensions; however, some attributes of subparts might
not apply to parts and vice versa. For example, a tread size
attribute would apply to a tire but not to the nuts and bolts that
go on the tire. If you wish to aggregate your sales by part you
will need to know which subparts should roll up to each part, as in
the following:
Dim_subpart: subpart_key, subpart_name, subpart_SKU, subpart_size,
subpart_weight, subpart_color, part_key
Dim_part: part_key, part_name, part_SKU
With such a design you could create reports that show you a
breakdown of your sales by each type of engine, as well as each
part that makes up the engine.
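The part and subpart design above can be sketched in sqlite3 as follows. The two dimension tables use the columns listed in the text; the `fact_sales` table, the column types, and all sample rows are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_part (part_key INTEGER PRIMARY KEY, part_name TEXT, part_sku TEXT);
CREATE TABLE dim_subpart (
    subpart_key    INTEGER PRIMARY KEY,
    subpart_name   TEXT,
    subpart_sku    TEXT,
    subpart_size   TEXT,
    subpart_weight REAL,
    subpart_color  TEXT,
    part_key       INTEGER REFERENCES dim_part(part_key)  -- rolls the subpart up to its part
);
-- Hypothetical fact table recorded at the subpart grain.
CREATE TABLE fact_sales (subpart_key INTEGER REFERENCES dim_subpart(subpart_key), amount REAL);

INSERT INTO dim_part VALUES (1, 'Engine', 'ENG-1'), (2, 'Door', 'DR-1');
INSERT INTO dim_subpart VALUES
    (10, 'Timing belt', 'TB-1', 'M', 0.4, 'black',  1),
    (11, 'Spark plug',  'SP-1', 'S', 0.1, 'silver', 1),
    (20, 'Door knob',   'DK-1', 'S', 0.2, 'chrome', 2);
INSERT INTO fact_sales VALUES (10, 40.0), (11, 10.0), (11, 10.0), (20, 25.0);
""")

# Sales rolled up to the part level (two joins: fact -> subpart -> part).
by_part = conn.execute("""
    SELECT p.part_name, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_subpart sp ON sp.subpart_key = f.subpart_key
    JOIN dim_part p     ON p.part_key     = sp.part_key
    GROUP BY p.part_name
    ORDER BY p.part_name
""").fetchall()

# Breakdown of one part by the subparts that make it up.
engine_subparts = conn.execute("""
    SELECT sp.subpart_name, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_subpart sp ON sp.subpart_key = f.subpart_key
    JOIN dim_part p     ON p.part_key     = sp.part_key
    WHERE p.part_name = 'Engine'
    GROUP BY sp.subpart_name
    ORDER BY sp.subpart_name
""").fetchall()

print(by_part)
print(engine_subparts)
```

Subpart-only attributes such as size, weight and color live on `dim_subpart`, so `dim_part` does not need columns that would be empty for whole parts.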
Data Warehousing - Extraction, Transformation and Loading
A data warehouse does not generate any data; instead it is
populated from various transactional or operational data stores.
The process of importing and manipulating transactional data into
the warehouse is referred to as Extraction, Transformation and
Loading (ETL). SQL Server supplies an excellent ETL tool known as
Data Transformation Services (DTS) in version 2000 and SQL Server
Integration Services (SSIS) in version 2005.
ETL resolves the inconsistencies in entity and attribute naming
across multiple data sources. For example the same entity could be
called customers, clients, prospects or consumers in various data
stores. Furthermore attributes such as address might be stored as
three or more different columns (address line1, address line2,
city, state, county, postal code, country and so forth). Each
column can also be abbreviated or spelled out completely, depending
on data source. Similarly there might be differences in data types,
such as storing date and time as a string, number or date. During
the ETL process data is imported from various sources and is given
a common shape.
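As a rough sketch of giving records a common shape, the Python below renames source-specific columns and converts dates stored as strings or numbers into real date values. The alias table, the accepted date formats, and the order_date column are all assumptions made for this example.

```python
from datetime import date, datetime

# Hypothetical mapping from source-specific column names to warehouse names.
COLUMN_ALIASES = {
    "client_name": "customer_name",
    "consumer_name": "customer_name",
    "zip": "postal_code",
    "postcode": "postal_code",
}

# Assumed formats; a real source may need others.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%Y%m%d")

def parse_date(value):
    """Return a date from a date, a number like 20240131, or a string."""
    if isinstance(value, datetime):
        return value.date()
    if isinstance(value, date):
        return value
    text = str(value).strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date value: {value!r}")

def conform(record, date_columns=("order_date",)):
    """Rename columns to the common names and normalize date columns."""
    out = {}
    for name, value in record.items():
        key = COLUMN_ALIASES.get(name.lower(), name.lower())
        out[key] = parse_date(value) if key in date_columns else value
    return out

print(conform({"Client_Name": "Acme", "ZIP": "60601", "order_date": 20240131}))
```

Values that match none of the assumed formats raise an error rather than being guessed at, so they can be routed to manual review.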
In addition to the changes that you can manage in ETL relatively
easily, there are some data inconsistencies that you might have to
fix manually. For example, examine the following data values:
Dr. Jimmy Smith
James L. Smith, Jr.
Jim L Smith, M.D.
James Smith MD
Jim Smith, JR - M.D.
A human reader can easily suspect that all of these values could
represent the same person. However, unless you work with James Smith
or his accounts, you cannot be certain. Should you show each of
these values as a separate person on your reports? Writing a program
that can fix such data inconsistencies could be a challenge,
whereas the data entry clerk who created these values might be able
to change them to a single, correct value with minimal effort.
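A program can at least flag such values for a person to review, even if it cannot safely merge them. The sketch below is one possible approach, not a complete solution: it drops a small, assumed list of titles and suffixes and groups names by the last remaining word. It does not decide whether Jimmy, James and Jim are the same person.

```python
import re
from collections import defaultdict

# Illustrative and incomplete; real data will contain variants not listed here.
TITLES_AND_SUFFIXES = {"dr", "md", "jr", "sr"}

def review_key(raw_name):
    """Return the last word left after dropping titles and suffixes."""
    words = re.findall(r"[a-z]+", raw_name.lower().replace(".", ""))
    core = [w for w in words if w not in TITLES_AND_SUFFIXES]
    return core[-1] if core else ""

def group_for_review(names):
    """Group names sharing a key; keep only groups with several members."""
    groups = defaultdict(list)
    for name in names:
        groups[review_key(name)].append(name)
    return {key: members for key, members in groups.items() if len(members) > 1}

names = [
    "Dr. Jimmy Smith",
    "James L. Smith, Jr.",
    "Jim L Smith, M.D.",
    "James Smith MD",
    "Jim Smith, JR - M.D.",
    "Ann Lee",  # hypothetical extra value that should not be flagged
]
candidates = group_for_review(names)
print(candidates)
```

Here all five Smith values land in one review group and "Ann Lee" is left alone. Two unrelated people named Smith would land in the same group too, which is why the final decision stays with a person.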
Data inconsistencies are commonplace in operational data sources
that allow free form data entry. A data warehouse cannot fix
problems with poorly designed operational systems, but it is likely
to make such issues known to data analysts and business managers.
Even if you design smart ETL logic to correct the existing issues,
predicting all future variations of "Doctor Jim L Smith Junior" is
a daunting task. Instead, you should attempt to fix the data entry
applications to limit human error.
In addition to importing data from various sources, ETL is also
responsible for transforming data into a dimensional model.
Depending on your data sources the import process can be relatively
simple or very complicated. For example, some organizations keep
all of their data in a single relational engine, such as SQL
Server. Others could have numerous systems that might not be easily
accessible. In some cases you might have to rely on scanned
documents or scrape report screens to get the data for your
warehouse. In such situations you should bring all data into a
common staging area first and then transform it into a dimensional
model.
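A minimal sketch of the staging-first approach, using Python's built-in sqlite3 module: rows are copied unchanged into a staging table, and the copy is checked by comparing row counts before any transformation happens. The customers and stg_customers tables and their columns are hypothetical, and two in-memory databases stand in for the real source and staging systems.

```python
import sqlite3

def row_count(conn, table):
    # Table names cannot be bound as SQL parameters; pass trusted names only.
    return conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]

def stage_customers(source, staging):
    """Copy all source rows into staging unchanged, then compare row counts."""
    rows = source.execute("SELECT id, name FROM customers").fetchall()
    staging.execute("DELETE FROM stg_customers")  # start from an empty table
    staging.executemany(
        "INSERT INTO stg_customers (id, name) VALUES (?, ?)", rows
    )
    staging.commit()
    src = row_count(source, "customers")
    stg = row_count(staging, "stg_customers")
    if src != stg:
        raise RuntimeError(f"row count mismatch: source={src}, staging={stg}")
    return stg

# Hypothetical source system with two customer rows.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
source.executemany(
    "INSERT INTO customers VALUES (?, ?)", [(1, "Acme"), (2, "Globex")]
)

staging = sqlite3.connect(":memory:")
staging.execute("CREATE TABLE stg_customers (id INTEGER, name TEXT)")

staged = stage_customers(source, staging)
print(staged)  # 2
```

Matching counts do not prove the data is correct, but a mismatch is a cheap early signal that the import lost or duplicated rows.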
The need for a staging database isn't limited to those
warehouses that have inaccessible data sources. A staging area also
provides a good place for assuring that your ETL is working
correctly before data is loaded into dimension and fact tables. So
your ETL could be made up of multiple stages:
1. Import data from various data sources into the staging area.
2. Cleanse data from inconsistencies (could be either automated or
manual effort).
3. Ensure that row counts of imported data in the staging area match
the counts in the original data source.
4. Load data from the staging area into the dimensional model.
Retrieved from "http://sqlserverpedia.com/wiki/Data_Warehousing_-_Extraction,_Transformation_and_Loading"
Definitions: Fact table, Dimension table
Fact table
A fact table consists of the measurements, metrics or facts of a
business process. It is often located at the centre of a star schema,
surrounded by dimension tables.
Fact tables provide the (usually) additive values that act as
independent variables by which dimensional attributes are analyzed.
Fact tables are often defined by their grain. The grain of a fact
table represents the most atomic level by which the facts may be
defined.
Additive - Measures that can be added across all dimensions.
Non Additive - Measures that cannot be added across any dimension.
Semi Additive - Measures that can be added across some dimensions
but not across others.
A fact table might contain either detail level facts or facts that
have been aggregated (fact tables that contain aggregated facts are
often instead called summary tables).
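To illustrate the semi-additive case, consider a hypothetical table of daily account balance snapshots (this example is an assumption, not taken from the text above). Balances can be summed across accounts for a single day, but summing one account's balance across days gives a meaningless number. For the time dimension, a value such as the closing balance is used instead.

```python
# Hypothetical daily balance snapshots: (day, account, balance)
rows = [
    ("2024-01-01", "A", 100),
    ("2024-01-01", "B", 50),
    ("2024-01-02", "A", 120),
    ("2024-01-02", "B", 50),
]

def total_by_day(rows):
    """Additive across accounts: sum the balances within each day."""
    totals = {}
    for day, _account, balance in rows:
        totals[day] = totals.get(day, 0) + balance
    return totals

def closing_balance_by_account(rows):
    """Not additive across time: keep the latest balance per account."""
    latest = {}
    for _day, account, balance in sorted(rows):  # ISO dates sort by time
        latest[account] = balance  # later days overwrite earlier ones
    return latest

print(total_by_day(rows))                # {'2024-01-01': 150, '2024-01-02': 170}
print(closing_balance_by_account(rows))  # {'A': 120, 'B': 50}
```

Summing account A over both days would give 220, an amount the account never held.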
In the real world, it is possible to have a fact table that
contains no measures or facts. These tables are called "Factless
Fact tables".
Dimension table
A dimension table is one of the set of companion tables to a fact
table. The fact table contains business facts or measures and
foreign keys which refer to candidate keys (normally primary keys)
in the dimension tables. The dimension tables contain attributes (or
fields) used to constrain and group data when performing data
warehousing queries.
Over time, the attributes of a given row in a dimension table
may change. For example, the shipping address for a company may
change. Kimball refers to this phenomenon as Slowly Changing
Dimensions. Strategies for dealing with this kind of change are
divided into three categories:
Type One - Simply overwrite the old value(s).
Type Two - Add a new row containing the new value(s), and
distinguish between the rows using tuple-versioning techniques.
Type Three - Add a new attribute to the existing row.
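A Type Two change can be sketched as follows, using Python's built-in sqlite3 module and the shipping address example above. The Dim_company table and its valid_from, valid_to and is_current columns are one common way to version rows, assumed here for illustration. Other schemes, such as a version number column, work as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Dim_company (
        company_key      INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
        company_id       TEXT,     -- business key from the source system
        shipping_address TEXT,
        valid_from       TEXT,
        valid_to         TEXT,     -- NULL while the row is current
        is_current       INTEGER
    )
""")

def apply_type2(conn, company_id, new_address, change_date):
    """Close the current row (if the address changed) and add a new one."""
    row = conn.execute(
        "SELECT company_key, shipping_address FROM Dim_company "
        "WHERE company_id = ? AND is_current = 1",
        (company_id,),
    ).fetchone()
    if row and row[1] == new_address:
        return  # nothing changed, keep the current row
    if row:
        conn.execute(
            "UPDATE Dim_company SET valid_to = ?, is_current = 0 "
            "WHERE company_key = ?",
            (change_date, row[0]),
        )
    conn.execute(
        "INSERT INTO Dim_company "
        "(company_id, shipping_address, valid_from, valid_to, is_current) "
        "VALUES (?, ?, ?, NULL, 1)",
        (company_id, new_address, change_date),
    )

apply_type2(conn, "C1", "1 Main St", "2024-01-01")
apply_type2(conn, "C1", "9 Oak Ave", "2024-06-01")
apply_type2(conn, "C1", "9 Oak Ave", "2024-07-01")  # same address, no new row

history = conn.execute(
    "SELECT shipping_address, valid_from, valid_to, is_current "
    "FROM Dim_company ORDER BY company_key"
).fetchall()
print(history)
```

After these calls the table holds two rows for company C1: the old address closed on 2024-06-01 and the new address marked current. Fact rows loaded before the change keep pointing at the old surrogate key, so history is preserved.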