Fact Tables and Dimension Tables
By Ralph Kimball (http://www.kimballgroup.com/author/ralph/),
January 1, 2003
Kimball Group. All rights reserved.
Dimensional modeling is a design discipline that straddles the
formal relational model and the engineering realities of text and
number data. Compared to entity/relation modeling, it's less
rigorous (allowing the designer more discretion in organizing the
tables) but more practical because it accommodates database
complexity and improves performance. Contrasted with other modeling
disciplines, dimensional modeling has developed an extensive
portfolio of techniques for handling real-world situations.
Measurements and Context
Dimensional modeling begins by dividing the world into measurements
and context. Measurements are usually numeric and taken repeatedly.
Numeric measurements are facts. Facts are always surrounded by
mostly textual context that's true at the moment the fact is
recorded. Facts are very specific, well-defined numeric attributes.
By contrast, the context surrounding the facts is open-ended and
verbose. It's not uncommon for the designer to add context to a set
of facts partway through the implementation.
Although you could lump all context into a wide, logical record
associated with each measured fact, you'll usually find it
convenient and intuitive to divide the context into independent
logical clumps. When you record facts (dollar sales of a grocery
store purchase of an individual product, for example), you
naturally divide the context into clumps named Product, Store,
Time, Customer, Clerk, and several others. We call these logical
clumps dimensions and assume informally that these dimensions are
independent.
In truth, dimensions rarely are completely independent in a
strong statistical sense. In the grocery store example, Customer
and Store clearly will show a statistical correlation. But it's
usually the right decision to model Customer and Store as separate
dimensions. A single, combined dimension would likely be unwieldy
with tens of millions of rows. And the record of when a given
customer shopped in a given store would be expressed more naturally
in a fact table that also showed the Time dimension.
The assumption of dimension independence would mean that all the
dimensions, such as Product, Store, and Customer, are independent
of Time. But you have to account for the slow, episodic change of
these dimensions in the way you handle them. In effect, as keepers
of the data warehouse, we have taken a pledge to faithfully
represent these changes. This predicament gives rise to the
technique of slowly changing dimensions, the subject of the next
column in this series.
Dimensional Keys
If the facts are truly measures taken repeatedly, you find that
fact tables always create a characteristic many-to-many
relationship among the dimensions. Many customers buy many products
in many stores at many times.
Therefore, you logically model measurements as fact tables with
multiple foreign keys referring to the contextual entities. And the
contextual entities are each dimensions with a single primary key.
Although you can separate the logical design from the physical
design, in a relational database fact tables and dimension tables
are most often explicit tables.
Actually, a real relational database has two levels of physical
design. At the higher level, tables are explicitly declared
together with their fields and keys. The lower level of physical
design describes the way the bits are organized on the disk and in
memory. Not only is this design highly dependent on the particular
database, but some implementations may even invert the database
beneath the level of table declarations and store the bits in ways
that are not directly related to the higher-level physical records.
What follows is a discussion of the higher level only.
A fact table in a pure star schema consists of multiple foreign
keys, each paired with a primary key in a dimension, together with
the facts containing the measurements. In Figure 1, the foreign
keys in the fact table are labeled FK, and the primary keys in the
dimension tables are labeled PK. (The field labeled DD, special
degenerate dimension key, is discussed later in this column.)
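As a rough illustration of the FK/PK/DD layout described above, here is a minimal sketch using Python's built-in sqlite3 module. Figure 1 is not reproduced here, so every table name, column name, and data value below is an assumption made up for the example, not the article's actual schema.

```python
import sqlite3

# All table and column names are hypothetical, chosen to echo the grocery example.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, product_name TEXT);  -- PK
CREATE TABLE store_dim   (store_key   INTEGER PRIMARY KEY, store_name   TEXT);  -- PK
CREATE TABLE date_dim    (date_key    INTEGER PRIMARY KEY, full_date    TEXT);  -- PK
CREATE TABLE sales_fact (
    product_key   INTEGER NOT NULL REFERENCES product_dim(product_key),  -- FK
    store_key     INTEGER NOT NULL REFERENCES store_dim(store_key),      -- FK
    date_key      INTEGER NOT NULL REFERENCES date_dim(date_key),        -- FK
    ticket_number TEXT,                                                  -- DD
    dollar_sales  REAL,                                                  -- fact
    unit_sales    INTEGER                                                -- fact
);
""")

conn.executemany("INSERT INTO product_dim VALUES (?, ?)", [(1, "Milk"), (2, "Bread")])
conn.executemany("INSERT INTO store_dim VALUES (?, ?)", [(1, "Downtown"), (2, "Airport")])
conn.executemany("INSERT INTO date_dim VALUES (?, ?)", [(1, "2003-01-01")])
conn.executemany(
    "INSERT INTO sales_fact VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 1, 1, "T100", 3.00, 2),
        (2, 1, 1, "T100", 2.50, 1),
        (1, 2, 1, "T200", 1.50, 1),
    ],
)

# A typical star-schema query: constrain or group by dimension attributes,
# then add up the facts.
sales_by_store = conn.execute("""
    SELECT s.store_name, SUM(f.dollar_sales)
    FROM sales_fact AS f
    JOIN store_dim AS s ON s.store_key = f.store_key
    GROUP BY s.store_name
    ORDER BY s.store_name
""").fetchall()
print(sales_by_store)
```

The fact table carries nothing but keys and measurements; all descriptive text lives in the dimension tables.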
I insist that the foreign keys in the fact table obey referential
integrity with respect to the primary keys in their respective
dimensions. In other words, every foreign key in the fact table has
a match to a unique primary key in the respective dimension. Note
that this design allows the dimension table to possess primary keys
that aren't found in the fact table. Therefore, a product dimension
table might be paired with a sales fact table in which some of the
products are never sold. This situation is perfectly consistent
with referential integrity and proper dimensional modeling.
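The two halves of this rule can be sketched with sqlite3: a fact row whose foreign key has no matching dimension row is rejected, while a dimension row with no fact rows is perfectly acceptable. The table names and data are hypothetical, and the PRAGMA is specific to SQLite, which only enforces foreign keys when asked; other databases differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: FK enforcement is off by default
conn.executescript("""
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE sales_fact (
    product_key  INTEGER NOT NULL REFERENCES product_dim(product_key),
    dollar_sales REAL
);
""")
conn.executemany("INSERT INTO product_dim VALUES (?, ?)", [(1, "Milk"), (2, "Caviar")])
conn.execute("INSERT INTO sales_fact VALUES (1, 3.00)")

# A fact row pointing at a product key that is not in the dimension must fail.
try:
    conn.execute("INSERT INTO sales_fact VALUES (99, 1.00)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# A product that was never sold is fine; it simply has no matching fact rows.
unsold = conn.execute("""
    SELECT p.product_name
    FROM product_dim AS p
    LEFT JOIN sales_fact AS f ON f.product_key = p.product_key
    WHERE f.product_key IS NULL
""").fetchall()
print(orphan_rejected, unsold)
```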
In the real world, there are many compelling reasons to build the
FK-PK pairs as surrogate keys that are just sequentially assigned
integers. It's a major mistake to build data warehouse keys out of
the natural keys that come from the underlying data sources. I
discuss this fascinating and intricate topic in detail in a pair of
columns, Surrogate Keys
(http://www.kimballgroup.com/1998/05/02/surrogate-keys/) and
Pipelining Your Surrogates
(http://www.kimballgroup.com/1998/06/02/pipelining-your-surrogates/).
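To make "sequentially assigned integers" concrete, here is a deliberately simplified sketch of a lookup that hands out the next integer the first time a natural key is seen. The class name and the SKU strings are invented for illustration, and this is not the procedure from the two columns cited above, which cover the topic in far more depth.

```python
from itertools import count


class SurrogateKeyMap:
    """Hypothetical helper: maps each natural key to a sequential integer."""

    def __init__(self):
        self._next_key = count(1)   # 1, 2, 3, ...
        self._assigned = {}         # natural key -> surrogate key

    def key_for(self, natural_key):
        # Assign a new integer only the first time a natural key appears.
        if natural_key not in self._assigned:
            self._assigned[natural_key] = next(self._next_key)
        return self._assigned[natural_key]


product_keys = SurrogateKeyMap()
incoming = ["SKU-A17", "SKU-B02", "SKU-A17"]  # made-up source-system keys
surrogates = [product_keys.key_for(sku) for sku in incoming]
print(surrogates)
```

The warehouse tables then carry only the integers; the source system's key survives, if at all, as an ordinary attribute of the dimension.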
Occasionally a perfectly legitimate measurement will involve a
missing dimension. Perhaps in some situations a product can be sold
to a customer in a transaction without a store defined. In this
case, rather than attempting to store a null value in the Store FK,
you build a special record in the Store dimension representing "No
Store." Now the "No Store" condition has a perfectly normal FK-PK
representation in the fact table.
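A small sketch of the "No Store" idea, again with sqlite3. The reserved key value 0, the table names, and the data are assumptions for the example; the point is only that the storeless sale joins like any other row instead of dropping out of the result.

```python
import sqlite3

NO_STORE_KEY = 0  # hypothetical reserved surrogate key for the special row

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE store_dim (store_key INTEGER PRIMARY KEY, store_name TEXT);
CREATE TABLE sales_fact (
    store_key    INTEGER NOT NULL REFERENCES store_dim(store_key),
    dollar_sales REAL
);
""")
conn.executemany(
    "INSERT INTO store_dim VALUES (?, ?)",
    [(NO_STORE_KEY, "No Store"), (1, "Downtown")],
)
conn.executemany(
    "INSERT INTO sales_fact VALUES (?, ?)",
    [(1, 5.00), (NO_STORE_KEY, 2.00)],  # second sale had no store defined
)

# An ordinary inner join keeps the storeless sale, labeled "No Store".
sales_by_store_name = conn.execute("""
    SELECT s.store_name, SUM(f.dollar_sales)
    FROM sales_fact AS f
    JOIN store_dim AS s ON s.store_key = f.store_key
    GROUP BY s.store_name
    ORDER BY s.store_name
""").fetchall()
print(sales_by_store_name)
```

Had the fact row held a null store key instead, the same inner join would have silently omitted the 2.00 sale.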
Logically, a fact table doesn't need a primary key because,
depending on the information available, two different legitimate
observations could be represented identically. Practically
speaking, leaving out a primary key is a terrible idea because
normal SQL makes it very hard to select one of the records without
selecting the other. It would also be hard to check data quality if
multiple records were indistinguishable from each other.
Relating the Two Modeling Worlds
Dimensional models are full-fledged relational models, where the
fact table is in third normal form and the dimension tables are in
second normal form, confusingly referred to as denormalized.
Remember that the chief difference between second and third normal
forms is that repeated entries are removed from a second normal
form table and placed in their own snowflake. Thus the act of
removing the context from a fact record and creating dimension
tables places the fact table in third normal form.
I resist the urge to further snowflake the dimension tables and am
content to leave them in flat second normal form because the flat
tables are much more efficient to query. In particular, dimension
attributes with many repeated values are perfect targets for bitmap
indexes. Snowflaking a dimension into third normal form, while not
incorrect, destroys the ability to use bitmap indexes and increases
the user-perceived complexity of the design. Remember that in the
presentation system in the data warehouse, you don't have to worry
about enforcing many-to-one data rules in the physical table design
by demanding snowflaked dimensions. The staging system has already
enforced those rules.
Declaring the Grain
Although theoretically any mixture of measured facts could be
shoehorned into a single table, a proper dimensional design allows
only facts of a uniform grain (the same dimensionality) to coexist
in a single fact table. Uniform grain guarantees that all the
dimensions are used with all the fact records (keeping in mind the
"No Store" example), and it greatly reduces the possibility of
application errors due to combining data at different grains. For
example, it's usually meaningless to blithely add daily data to
yearly data. When you have facts at two different grains, you place
the facts in separate tables.
Additive Facts
At the heart of every fact table is the list of facts that
represent the measurements. Because most fact tables are huge, with
millions or even billions of rows, you almost never fetch a single
record into your answer set. Rather, you fetch a very large number
of records, which you compress into digestible form by adding,
counting, averaging, or taking the min or max. But for practical
purposes, the most common choice, by far, is adding. Applications
are simpler if they store facts in an additive format as often as
possible. Thus, in the grocery example, you don't need to store the
unit price. You merely compute the unit price by dividing the
dollar sales by the unit sales whenever necessary.
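The unit price calculation can be sketched in a few lines. The rows are invented sample data; the point is that the two additive facts are summed first and divided afterward, so the result stays correct at whatever level you aggregate.

```python
# Hypothetical fact rows: (product, dollar_sales, unit_sales)
fact_rows = [
    ("Milk", 3.00, 2),
    ("Milk", 1.50, 1),
    ("Bread", 5.00, 2),
]

# Add up the additive facts per product.
totals = {}
for product, dollars, units in fact_rows:
    dollar_total, unit_total = totals.get(product, (0.0, 0))
    totals[product] = (dollar_total + dollars, unit_total + units)

# Derive the non-additive unit price only after aggregating.
unit_price = {
    product: dollar_total / unit_total
    for product, (dollar_total, unit_total) in totals.items()
}
print(unit_price)
```

A stored unit price, by contrast, could not simply be added across rows; summing prices gives a number with no business meaning.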
Some facts, like bank balances and inventory levels, represent
intensities that are awkward to express in an additive format. You
can treat these semiadditive facts as if they were additive, but
just before presenting the results to the end user, divide the
answer by the number of time periods to get the right result. This
technique is called averaging over time.
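Here is a minimal sketch of averaging over time for a semiadditive fact, using made-up daily account balances. Balances are added freely across accounts and days, and the total is divided by the number of distinct days only at the very end.

```python
# Hypothetical daily snapshot rows: (date, account, balance)
balance_rows = [
    ("2003-01-01", "A", 100.0),
    ("2003-01-01", "B", 50.0),
    ("2003-01-02", "A", 120.0),
    ("2003-01-02", "B", 50.0),
    ("2003-01-03", "A", 80.0),
    ("2003-01-03", "B", 50.0),
]

# Step 1: treat the balance as if it were additive.
summed_balance = sum(balance for _, _, balance in balance_rows)

# Step 2: divide by the number of time periods covered by the query.
num_periods = len({date for date, _, _ in balance_rows})
average_daily_balance = summed_balance / num_periods
print(average_daily_balance)
```

Note that the divisor is the count of time periods, not the count of rows; a plain row average would divide by six here and understate the combined balance.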
Some perfectly good fact tables represent measurements that have
no facts! This kind of measurement is often called an event. The
classic example of such a factless fact table is a record
representing a student attending a class on a specific day. The
dimensions are Day, Student, Professor, Course, and Location, but
there are no obvious numeric facts. The tuition paid and grade
received are good facts but not at the grain of the daily
attendance.
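Even without numeric facts, a factless fact table answers questions by counting rows. The sketch below uses invented attendance records with the five dimensions named above; in a real design each field would be a surrogate key into a dimension table rather than a readable string.

```python
from collections import Counter

# Hypothetical attendance events: (day, student, professor, course, location)
attendance = [
    ("2003-01-06", "Ann", "Prof. Lee", "DB101", "Room 1"),
    ("2003-01-06", "Bob", "Prof. Lee", "DB101", "Room 1"),
    ("2003-01-07", "Ann", "Prof. Lee", "DB101", "Room 1"),
    ("2003-01-07", "Ann", "Prof. Ortiz", "ST200", "Room 4"),
]

# The "measurement" is the existence of the row, so analysis means counting.
attendance_by_course = Counter(course for _, _, _, course, _ in attendance)
days_attended_by_student = Counter(student for _, student, _, _, _ in attendance)
print(attendance_by_course, days_attended_by_student)
```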
Degenerate Dimensions
In many modeling situations where the grain is a child, the natural
key of the parent winds up as an orphan in the design. In the
grocery example, the grain is the line item on a sales ticket, but
the ticket number is the natural key of the parent ticket. Because
you have systematically stripped off all the ticket context as
dimensions, the ticket number is left exposed without any
attributes of its own. You model this reality by placing the ticket
number by itself, right in the fact table. We call this key a
degenerate dimension. The ticket number is useful because it's the
glue that holds the child records together.
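The "glue" role of the ticket number can be shown with a short sketch. The line items are invented; the ticket number sits directly in each fact row with no dimension table behind it, and grouping on it reassembles the parent ticket.

```python
from collections import defaultdict

# Hypothetical line-item fact rows: (ticket_number, product, dollar_sales)
line_items = [
    ("T100", "Milk", 3.00),
    ("T100", "Bread", 2.50),
    ("T200", "Milk", 1.50),
]

# Group the child rows by the degenerate dimension to rebuild each ticket.
ticket_totals = defaultdict(float)
items_per_ticket = defaultdict(int)
for ticket_number, _, dollars in line_items:
    ticket_totals[ticket_number] += dollars
    items_per_ticket[ticket_number] += 1

print(dict(ticket_totals), dict(items_per_ticket))
```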
In the next issue, the sixth column in this Fundamentals series
will detail the latest thinking on how to handle slowly changing
dimensions.
About the Author: Ralph Kimball
(http://www.kimballgroup.com/author/ralph/)
Ralph Kimball is the founder of the Kimball Group and Kimball
University, where he has taught data warehouse design to more than
10,000 students. He is known for the best-selling series of Toolkit
books. He started with a Ph.D. in man-machine systems from Stanford
in 1973 and has spent nearly four decades designing systems for
users that are simple and fast.