OLAP
The best starting point for approaching the multidimensional model is the kind of query for which this model is best suited (Jarke et al., 2000):
- "What is the total amount of receipts recorded last year per state and per product category?"
- "What is the relationship between the trend of PC manufacturers' shares and quarter gains over the last five years?"
- "Which orders maximize receipts?"
- "Which one of two new treatments will result in a decrease in the average period of admission?"
- "What is the relationship between profit gained by the shipments consisting of less than 10 items and the profit gained by the shipments of more than 10 items?"
It is clear that using traditional languages, such as SQL, to express these types of queries can be a very difficult task for inexperienced users. It is also clear that running these types of queries against operational databases would result in an unacceptably long response time.
The multidimensional model begins with the observation that the
factors affecting decision-making processes are enterprise-specific
facts, such as sales, shipments, hospital admissions, surgeries,
and so on. Instances of a fact correspond to events that occurred.
For example, every single sale or shipment carried out is an event.
Each fact is described by the values of a set of relevant measures
that provide a quantitative description of events. For example,
sales receipts, amounts shipped, hospital admission costs, and
surgery time are measures.
Obviously, a huge number of events occur in typical enterprises, too many to analyze one by one. Imagine
placing them all into an n-dimensional space to help us quickly
select and sort them out. The n-dimensional space axes are called
analysis dimensions, and they define different perspectives to
single out events.
Examples:
- The sales in a store chain can be represented in a three-dimensional space whose dimensions are products, stores, and dates.
- As far as shipments are concerned, products, shipment dates, orders, destinations, and terms & conditions can be used as dimensions.
- Hospital admissions can be defined by the department-date-patient combination, and you would need to add the type of operation to classify surgery operations.
The concept of dimension gave life to the broadly used metaphor
of cubes to represent multidimensional data. According to this
metaphor, events are associated with cube cells and cube edges
stand for analysis dimensions. If more than three dimensions exist,
the cube is called a hypercube. Each cube cell is given a value for each measure. Consider, for example, a sales cube whose analysis dimensions are store, product, and date. An event stands for a specific item sold in a specific store on a specific date, and it is described by two measures: the quantity sold and the receipts. Such a cube is sparse: many events did not actually take place. Of course, you cannot sell every item every day in every store.
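The sales cube described above can be sketched as a sparse mapping from dimension coordinates to measures. This is only an illustration: the store name "SmartMart", the products, and all values are made up, and a real OLAP engine would use a far more compact storage format.

```python
from datetime import date

# Hypothetical sample events: (store, product, date) -> measures.
# Only events that actually occurred are stored, so the cube is sparse.
cube = {
    ("EverMore", "milk", date(2008, 4, 5)): {"quantity": 10, "receipts": 25.0},
    ("EverMore", "bread", date(2008, 4, 5)): {"quantity": 4, "receipts": 8.0},
    ("SmartMart", "milk", date(2008, 4, 6)): {"quantity": 7, "receipts": 17.5},
}

# The members of each analysis dimension (the cube edges).
stores = {"EverMore", "SmartMart"}
products = {"milk", "bread"}
dates = {date(2008, 4, 5), date(2008, 4, 6)}


def cell(store, product, day):
    """Return the measures of one cell, or None if no event took place."""
    return cube.get((store, product, day))


# Number of cells the cube could hold versus the share actually filled.
potential_cells = len(stores) * len(products) * len(dates)
density = len(cube) / potential_cells
```

Here only 3 of the 8 potential cells hold an event, which is the sparsity the text refers to.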
History
The first commercial multidimensional (OLAP) products appeared approximately 30 years ago (Express). When Edgar Codd introduced the OLAP definition in his 1993 white paper, there were already dozens of OLAP products.
After Codd's research appeared, the software
industry began appreciating OLAP functionality and many companies
have integrated OLAP features into their products (RDBMS,
integrated business intelligence suites, reporting tools, portals,
etc.). In addition, for the last decade, pure OLAP tools have
considerably improved and become cheaper and more user-friendly.
These developments brought OLAP functionality to a much broader
range of users and organizations. Now OLAP is used not only for
strategic decision-making in large corporations, but also to make
daily tactical decisions about how to better streamline business
operations in organizations of all sizes and shapes. However, the
acceptance of OLAP is far from maximized. For example, one year
ago, The OLAP Survey 2 found that only thirty percent of its
participants actually used OLAP.
Definitions
An OLAP cube is an array of data understood in terms of its zero or more dimensions. OLAP is an acronym for online
analytical processing. OLAP is a computer-based technique for
analyzing business data in the search for business intelligence.
Source: Wikipedia.org
An online analytical processing cube (OLAP cube) is a
multidimensional array of data that serves as a database optimized
for OLAP applications and data warehousing. It is a way of storing
relevant data in a multidimensional form to make it appear more
logical when used to generate reports and facilitate more efficient
analytics.
Source:
http://www.techopedia.com/definition/21142/online-analytical-processing-cube-olap-cube
An OLAP cube is a multidimensional database that is optimized
for data warehouse and online analytical processing (OLAP)
applications.
An OLAP cube is a method of storing data in a
multidimensional form, generally for reporting purposes. In OLAP
cubes, data (measures) are categorized by dimensions. OLAP cubes
are often pre-summarized across dimensions to drastically improve
query time over relational databases. The query language used to
interact and perform tasks with OLAP cubes is multidimensional
expressions (MDX). The MDX language was originally developed by
Microsoft in the late 1990s, and has been adopted by many other
vendors of multidimensional databases.
Although it stores data like
a traditional database does, an OLAP cube is structured very
differently. Databases, historically, are designed according to the
requirements of the IT systems that use them. OLAP cubes, however,
are used by business users for advanced analytics. Thus, OLAP cubes
are designed using business logic and understanding. They are
optimized for analytical purposes, so that they can report on
millions of records at a time. Source:
http://searchoracle.techtarget.com/tip/Why-OLAP-deserves-more-attention
OLAP stands for "Online Analytical Processing." OLAP allows users to analyze database information from multiple database systems at one time. While relational databases are considered to be two-dimensional, OLAP data is multidimensional, meaning the information can be compared in many different ways. For example, a company might compare its computer sales in June with sales in July, then compare those results with the sales from another location, which might be stored in a different database.
Source:
http://www.techterms.com/definition/olap
Basically, OLAP is an awful name. Nigel Pendse, author of The OLAP Report, calls the same thing FASMI, which I think is a far better term:
- Fast: 90% of queries come back in under 10 seconds, and no query takes longer than 30 seconds.
- Analysis: drill-down, multiple aggregation techniques, sophisticated graphics, and trends all form part of this.
- Shareable: good security at the back end and availability to a wide community of users, plus multi-currency and multilingual support to cope with the global economy.
- Multi-Dimensional: Excel pivot tables, but more so. The ability to have multiple dimensions of information on each axis of a cross-tab, with other dimensions being used to further filter the results returned.
- Information: real-world KPIs (Key Performance Indicators) rather than raw numbers.
Source: Andrew Fryer, OLAP, Cubes and Multidimensional Analysis, available at:
http://blogs.technet.com/b/andrew/archive/2007/08/22/olap-cubes-and-multidimensional-analysis.aspx
In OLAP, the cube is the database structure that is queried. To get a handle on how this works, below is a simple three-dimensional cube.
Example 1
The coordinate system in a cube not only refers to a point in multidimensional space, it also has an understanding of hierarchies. So the cube 'knows' that January 2007 has a parent called 2007 in the example above. This forms a key part of the OLAP concept: the results of calculations can be stored at the parent level rather than aggregating all the children on the fly. For example, the sales total for 2007 is stored in the cube for bikes, components, etc., as is the cost of sales. The profit margin % has to be worked out on the fly for bikes for 2007, but this is quick because the cost of sales and the sales that contribute to this calculation are pre-calculated. This gives OLAP its speed while allowing for rich calculations to be stored. As always in IT there is a catch, and in this case it is the complexity of the language used to query a cube: MDX, or multidimensional expressions.
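The idea of storing totals at the parent level and deriving ratios on the fly can be sketched as follows. The monthly figures are invented for illustration, and the function names are hypothetical rather than part of any OLAP product.

```python
# Hypothetical monthly leaf values for one product (bikes) in 2007.
leaf = {
    ("2007-01", "bike"): {"sales": 1000.0, "cost": 600.0},
    ("2007-02", "bike"): {"sales": 1500.0, "cost": 900.0},
}


def parent_of(month):
    """Time hierarchy: a month such as '2007-01' has the parent year '2007'."""
    return month.split("-")[0]


def build_aggregates(leaf_cells):
    """Sum the additive measures up to the parent (year) level once, at build time."""
    totals = {}
    for (month, product), measures in leaf_cells.items():
        key = (parent_of(month), product)
        agg = totals.setdefault(key, {"sales": 0.0, "cost": 0.0})
        agg["sales"] += measures["sales"]
        agg["cost"] += measures["cost"]
    return totals


# Pre-calculated parent-level cells, stored alongside the leaf cells.
stored = build_aggregates(leaf)


def profit_margin_pct(year, product):
    """Derived on the fly from stored totals; a ratio cannot simply be summed."""
    cell = stored[(year, product)]
    return 100.0 * (cell["sales"] - cell["cost"]) / cell["sales"]
```

Because the yearly sales and cost are already stored, the margin for 2007 needs one lookup and one division instead of a scan over every month.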
Example 2
Years later, the technology has been sufficiently perfected to make OLAP against large data warehouses feasible, truly bringing the "intelligence" to business intelligence. A huge departure from traditional relational design, OLAP allows the data to be stored and accessed in the most efficient manner, allowing end users to traverse the edges of a hypothetical "cube" of many dimensions. (See below for an example of such a data cube.)
The cube's dimensions are associated with facts (also called "measures"). In relational terms, the facts have a many-to-one relationship with the dimensions. For example, Acme Computer Supplies may have a database for sales. Dimensions are usually Customers, Products, and a Time element (month, quarter, etc.). The sales figure for a specific product (Cat5e cables) to a specific customer (Oracle Corp.) during a specific time period (Aug 2008) is one measure. The dimensions are stored in individual tables and so are the facts, i.e. the sales figures. So the fact table, in relational terminology, is a child table of the dimension tables.
But that's where the analogy ends. The access to the measures in relational design would have been through indexes created on the customer, product, or time columns of the fact table. In the OLAP approach, specific cells (the measures) are accessed by traversing the cube: in this example, by going to the slice containing the time (Aug 08), then the product (Cat5e), and finally the customer (Oracle).
The OLAP engine knows how to go to these slices by calculating the destination as in an array, not a table. For instance, suppose the dimensions are organized as shown below:
Dimension Time := {'May','Jun','Jul','Aug'}
Dimension Customer := {'Microsoft','IBM','Oracle','HP'}
Dimension Product := {'Fiber','Cat6e','Cat5e','Serial'}
To find the measure for Oracle + Aug + Cat5e, the OLAP engine performs the navigation like this:
1. Aug 08 is the fourth element of the array called Time, so travel to the fourth cell along the time dimension of the cube.
2. Cat5e is the third element of the Product array, so travel to the third element.
3. Oracle is the third element of the Customer array, so travel to the third element.
That's it; now you've arrived at the measure you want. This is done without indexes since the dimension values serve as array pointers. Similarly, if you want to calculate the total sales to all customers in Aug 08, you do the same thing as above, except that in Step 3 you total the measures of the elements of the array without going to a specific cell.
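The navigation steps above can be sketched with plain arrays. This is a simplified illustration, not how any particular engine lays out its storage: the flat row-major layout, the helper names, and the cell values (each cell simply holds its own offset so the arithmetic is easy to check) are all assumptions.

```python
# Dimension members, in the order given in the text.
time = ["May", "Jun", "Jul", "Aug"]
customer = ["Microsoft", "IBM", "Oracle", "HP"]
product = ["Fiber", "Cat6e", "Cat5e", "Serial"]

# Position of each member within its dimension (the "array pointer").
time_pos = {m: i for i, m in enumerate(time)}
customer_pos = {m: i for i, m in enumerate(customer)}
product_pos = {m: i for i, m in enumerate(product)}

# Hypothetical flat storage: one value per (time, customer, product) cell.
cells = [float(i) for i in range(len(time) * len(customer) * len(product))]


def offset(t, c, p):
    """Compute the cell address arithmetically, with no index lookup on the data."""
    return (time_pos[t] * len(customer) + customer_pos[c]) * len(product) + product_pos[p]


def measure(t, c, p):
    """Jump straight to one cell."""
    return cells[offset(t, c, p)]


def total_over_customers(t, p):
    """Same navigation, but total along the Customer dimension instead of picking one cell."""
    return sum(measure(t, c, p) for c in customer)
```

With these definitions, `measure("Aug", "Oracle", "Cat5e")` reads the cell at offset 58, and `total_over_customers("Aug", "Cat5e")` adds up the four customer cells for that month and product.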
OLAP versus OLTP
Source: Hari Mailvaganam, Slice, Dice and Drill!, available at
http://www.dwreview.com/OLAP/Introduction_OLAP.html
OLAP allows business users to slice and dice data at will. Normally, data in an organization is distributed across multiple data sources that are incompatible with each other. A retail example: point-of-sale data and sales made via the call center or the Web are stored in different locations and formats. It would be a time-consuming process for an executive to obtain OLAP reports that answer questions such as: What are the most popular products purchased by customers between the ages of 15 and 30?
Part of the OLAP implementation process involves extracting data from the various data repositories and making them compatible. Making data compatible involves ensuring that the meaning of the data in one repository matches all other repositories. An example of incompatible data: customer ages can be stored as a birth date for purchases made over the Web and as age categories (e.g. between 15 and 30) for in-store sales.
It is not always necessary to create a data warehouse for OLAP analysis. Data stored by operational systems, such as point-of-sale systems, are in types of databases called OLTPs. OLTP (Online Transaction Processing) databases do not differ structurally from any other databases. The main, and only, difference is the way in which data is stored.
Examples of OLTPs include ERP, CRM, SCM, point-of-sale, and call center applications.
OLTPs are designed for optimal transaction speed. When a consumer makes a purchase online, they expect the transaction to occur instantaneously. With a database design (called data modeling) optimized for transactions, the record 'Consumer name, Address, Telephone, Order Number, Order Name, Price, Payment Method' is created quickly in the database, and the results can be recalled by managers equally quickly if needed.
Figure 1. Data Model for OLTP
Data are not typically stored for an extended period on OLTPs for storage cost and transaction speed reasons.
OLAPs have a different mandate from OLTPs. OLAPs are designed to give an overview analysis of what happened. Hence the data storage (i.e. data modeling) has to be set up differently. The most common method is called the star design.
Figure 2. Star Data Model for OLAP
The central table in an OLAP star data model is called the fact table. The surrounding tables are called the dimensions. Using the above data model, it is possible to build reports that answer questions such as:
- Which supervisor gave the most discounts?
- What quantity was shipped on a particular date, month, quarter, or year?
- In which zip code did product A sell the most?
To obtain answers such as the ones above from a data model, OLAP cubes are created. OLAP cubes are not strictly cuboids; "cube" is the name given to the process of linking data from the different dimensions. The cubes can be developed along business units such as sales or marketing, or a giant cube can be formed with all the dimensions.
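A star design and one of the questions above ("In which zip code did product A sell the most?") can be sketched with Python's built-in sqlite3 module. The table and column names (fact_sales, dim_product, dim_customer) and the sample rows are hypothetical, since the original figure is not reproduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables surround the central fact table.
CREATE TABLE dim_product  (product_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, zip_code TEXT);
CREATE TABLE fact_sales (
    product_id  INTEGER REFERENCES dim_product(product_id),
    customer_id INTEGER REFERENCES dim_customer(customer_id),
    quantity    INTEGER
);
INSERT INTO dim_product  VALUES (1, 'Product A'), (2, 'Product B');
INSERT INTO dim_customer VALUES (10, '33101'), (11, '33102');
INSERT INTO fact_sales   VALUES (1, 10, 5), (1, 11, 8), (1, 11, 2), (2, 10, 20);
""")


def top_zip_for(product_name):
    """Join the fact table to its dimensions and return (zip_code, total quantity)."""
    return conn.execute(
        """
        SELECT c.zip_code, SUM(f.quantity) AS total
        FROM fact_sales f
        JOIN dim_product  p ON p.product_id  = f.product_id
        JOIN dim_customer c ON c.customer_id = f.customer_id
        WHERE p.name = ?
        GROUP BY c.zip_code
        ORDER BY total DESC
        LIMIT 1
        """,
        (product_name,),
    ).fetchone()
```

Each report question becomes a join from the fact table out to whichever dimensions the question mentions, followed by an aggregation.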
Figure 3. OLAP Cube with Time, Customer and Product Dimensions
OLAP can be a valuable and rewarding business tool. Aside from producing reports, OLAP analysis can help an organization evaluate balanced scorecard targets.
Figure 4. Steps in the OLAP Creation Process
OLAP Storage
OLAP storage is one of the critical choices to be made when designing the solution. OLAP storage comes in three forms:
- MOLAP (Multidimensional OLAP). In MOLAP, both the source data and the aggregations are stored in a multidimensional format. MOLAP is the fastest option for data retrieval, but requires the most disk space. Disk space is less of a concern these days, given falling storage and processing costs.
- ROLAP (Relational OLAP). All data, including the aggregations, are stored within the source relational database. This will be a concern for larger data warehousing implementations that have higher usage needs. ROLAP is the slowest for data retrieval. Whether an aggregation exists or not, a ROLAP database must access the data warehouse itself. ROLAP is best suited for smaller data warehousing implementations.
- HOLAP (Hybrid OLAP). HOLAP is a combination of both of the above storage methodologies. HOLAP databases store the aggregations within a multidimensional structure, leaving the cell-level data itself in relational form. Where the data is pre-aggregated, HOLAP offers the performance of MOLAP; where the data must be fetched from the tables, HOLAP is as slow as ROLAP.
Due to shrinking hardware and processing costs, MOLAP is generally used most often. HOLAP is a better solution if the solution is accessing a stand-alone database. ROLAP is more convenient to set up when the query demands are relatively low and also on a stand-alone database.
http://businessintelligence.ittoolbox.com/documents/advantagesdisadvantages-of-molap-rolap-and-holap-15897
MOLAP
This is the more traditional way of OLAP analysis. In MOLAP, data is stored in a multidimensional cube. The storage is not in the relational database, but in proprietary formats.
Advantages:
- Excellent performance. MOLAP cubes are built for fast data retrieval and are optimal for slicing and dicing operations.
- They can perform complex calculations. All calculations have been pre-generated when the cube is created. Hence, complex calculations are not only doable, but they return quickly.
Disadvantages:
- It is limited in the amount of data it can handle. Because all calculations are performed when the cube is built, it is not possible to include a large amount of data in the cube itself. This is not to say that the data in the cube cannot be derived from a large amount of data. Indeed, this is possible. But in this case, only summary-level information will be included in the cube itself.
- It requires an additional investment. Cube technology is often proprietary and may not already exist in the organization. Therefore, to adopt MOLAP technology, chances are that additional investments in human and capital resources are needed.
ROLAP
This methodology relies on manipulating the data stored in the relational database to give the appearance of traditional OLAP's slicing and dicing functionality. In essence, each action of slicing and dicing is equivalent to adding a "WHERE" clause in the SQL statement.
Advantages:
- It can handle large amounts of data. The data size limitation of ROLAP technology is the data size limitation of the underlying relational database. In other words, ROLAP itself places no limitation on the amount of data.
- It can leverage functionalities inherent in the relational database. Often, the relational database already comes with a host of functionalities. ROLAP technologies, since they sit on top of the relational database, can therefore leverage these functionalities.
Disadvantages:
- Performance can be slow. Because each ROLAP report is essentially a SQL query (or multiple SQL queries) in the relational database, the query time can be long if the underlying data size is large.
- It is limited by SQL functionalities. Because ROLAP technology mainly relies on generating SQL statements to query the relational database, and SQL statements do not fit all needs (for example, it is difficult to perform complex calculations using SQL), ROLAP technologies are traditionally limited by what SQL can do. ROLAP vendors have mitigated this risk by building out-of-the-box complex functions into the tool, as well as the ability for users to define their own functions.
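The statement that each slicing action amounts to adding a "WHERE" clause can be illustrated with a small SQL generator. This is a sketch under assumptions: the `sales` table, its columns, and the `rolap_sql` helper are hypothetical and do not reflect how any specific ROLAP product generates its queries.

```python
import sqlite3

# Column names are whitelisted because identifiers cannot be bound as parameters.
ALLOWED_COLUMNS = {"store", "product", "sale_date"}


def rolap_sql(group_by, slicers):
    """Translate a slice request into SQL: each fixed dimension becomes a WHERE condition."""
    for col in [group_by, *slicers]:
        if col not in ALLOWED_COLUMNS:
            raise ValueError(f"unknown dimension: {col}")
    sql = f"SELECT {group_by}, SUM(receipts) FROM sales"
    where = " AND ".join(f"{col} = ?" for col in slicers)
    if where:
        sql += f" WHERE {where}"
    sql += f" GROUP BY {group_by} ORDER BY {group_by}"
    return sql, list(slicers.values())


# Hypothetical relational data that the generated SQL runs against.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (store TEXT, product TEXT, sale_date TEXT, receipts REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", [
    ("EverMore", "milk", "2008-04-05", 25.0),
    ("EverMore", "bread", "2008-04-05", 8.0),
    ("EverMore", "milk", "2008-04-06", 10.0),
    ("SmartMart", "milk", "2008-04-05", 17.5),
])

# Slice on one dimension, then on two.
sql, params = rolap_sql("product", {"store": "EverMore"})
rows = conn.execute(sql, params).fetchall()
sql2, params2 = rolap_sql("product", {"store": "EverMore", "sale_date": "2008-04-05"})
rows2 = conn.execute(sql2, params2).fetchall()
```

Every additional fixed dimension only lengthens the WHERE clause, which is why ROLAP performance follows the performance of the underlying relational query.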
HOLAP
HOLAP technologies attempt to combine the advantages of
MOLAP and ROLAP. For summary-type information, HOLAP leverages cube
technology for faster performance. When detail information is
needed, HOLAP can "drill through" from the cube into the underlying
relational data.
Operations
The information in a multidimensional cube is very difficult for users to manage because of its quantity, even if it is a concise version of the information stored in operational databases. If, for example, a store chain includes 50 stores selling 1000 items, and a specific data warehouse covers three years of transactions (approximately 1000 days), the number of potential events totals $50 \times 1000 \times 1000 = 5 \times 10^7$. Assuming that each store can sell only 10 percent of all the available items per day, the number of events totals $5 \times 10^6$. This is still too much data to be analyzed by users without relying on automatic tools.
You have essentially two ways to reduce the quantity of data and obtain useful information: restriction and aggregation. The cube metaphor offers an easy-to-use and intuitive way to understand both of these methods, as we will discuss in the following paragraphs.
Restriction
Restricting data means separating part of the data from a cube to mark out an analysis field. In relational algebra terminology, this is called making selections and/or projections. Selection has two forms: slicing and dicing.
Common operations include Slice and Dice, Drill-Down, Roll-Up, and Pivot.
Source: Multidimensional OLAP Cubes, available at:
http://www.practicaldb.com/blog/cubes/
When you slice data, you decrease cube dimensionality by setting one or more dimensions to a specific value. For example, if you set one of the sales cube dimensions to a value, such as store='EverMore', this results in the set of events associated with the items sold in the EverMore store. According to the cube metaphor, this is simply a plane of cells, that is, a data slice that can be easily displayed in spreadsheets. In the store chain example given earlier, approximately $10^5$ events still appear in your result. If you set two dimensions to a value, such as store='EverMore' and date='4/5/2008', this will result in all the different items sold in the EverMore store on April 5 (approximately 100 events). Graphically speaking, this information is stored at the intersection of two perpendicular planes resulting in a line. If you set all the dimensions to a particular value, you will define just one event that corresponds to a point in the three-dimensional space of sales.
Dicing is a generalization of slicing. It poses some constraints on dimensional attributes to scale down the size of a cube. For example, you can select only the daily sales of the food items in April 2008 in Florida. In this way, if five stores are located in Florida and 50 food products are sold, the number of events to examine changes to $5 \times 50 \times 30 = 7500$.
Finally, a projection can be referred to as a choice to keep just one subgroup of measures for every event and reject other measures.
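The three restriction operations can be sketched on a small list of events. The sample rows, the store name "SmartMart", and the function names are hypothetical; the point is only to show how slicing, dicing, and projection differ.

```python
# Hypothetical events of a sales cube; the column order is given by FIELDS.
FIELDS = ("store", "state", "product", "category", "date", "quantity", "receipts")
ROWS = [
    ("EverMore", "Florida", "milk", "food", "2008-04-05", 10, 25.0),
    ("EverMore", "Florida", "soap", "household", "2008-04-05", 3, 6.0),
    ("EverMore", "Florida", "milk", "food", "2008-05-01", 5, 12.5),
    ("SmartMart", "Texas", "milk", "food", "2008-04-05", 7, 17.5),
]
events = [dict(zip(FIELDS, row)) for row in ROWS]


def slice_cube(events, **fixed):
    """Slicing: set one or more dimensions to a single value."""
    return [e for e in events if all(e[dim] == value for dim, value in fixed.items())]


def dice(events, **predicates):
    """Dicing: keep events whose dimensional attributes satisfy the given constraints."""
    return [e for e in events if all(test(e[attr]) for attr, test in predicates.items())]


def project(events, measures):
    """Projection: keep the coordinates but only a subgroup of the measures."""
    keep = ("store", "product", "date", *measures)
    return [{k: e[k] for k in keep} for e in events]


one_store = slice_cube(events, store="EverMore")
one_store_day = slice_cube(events, store="EverMore", date="2008-04-05")
florida_food_april = dice(
    events,
    state=lambda s: s == "Florida",
    category=lambda c: c == "food",
    date=lambda d: d.startswith("2008-04"),
)
receipts_only = project(events, ["receipts"])
```

Slicing fixes exact coordinates, dicing accepts any constraint on an attribute (a state, a category, a month), and projection leaves the set of events unchanged while dropping measures.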
Slice: A slice is a subset of a multi-dimensional array
corresponding to a single value for one or more members of the
dimensions not in the subset.
Slice is any two-dimensional slice of the data cube. You slice a
data cube to filter information. For example, the figure below
shows a data cube with the following dimensions: Retailer, Date, and Product. If you are interested only in the data for a specific
retailer you can slice off a single (two dimensional) layer. In our
example the slice contains information on date and product for
department stores.
Dice: The dice operation is a slice on more than two dimensions
of a data cube (or more than two consecutive slices).
Dice is also described as the "rotation" of the cube to reveal another, different slice of data, although rotating the cube is more commonly called pivoting. For exploring data from various perspectives, you can dice a data cube by exchanging the dimension for other
dimensions. For example, after exploring the data by date and
product for a specific retailer (orange slice on the left cube),
you want to get deeper information on date and retailer for a
specific product.
Drill Down/Up: Drilling down or up is a specific analytical
technique whereby the user navigates among levels of data ranging
from the most summarized (up) to the most detailed (down).
Drill Down is the exploration of data to subsequent levels of
more detail along a dimension. For example, the dimension
"Retailer" can be drilled-down to specific retailers, the dimension
"Date" can be drilled-down to months, and the dimension "Product"
finally, can be explored in more detail by single products.
Roll-up: (Aggregate, Consolidate) A roll-up involves computing
all of the data relationships for one or more dimensions. To do
this, a computational relationship or formula might be defined.
Roll-up is the aggregation of data to subsequent levels of
summary, along a dimension. This implies that dimensions are
typically hierarchical in nature based on parent/child
relationships between dimension values.
Pivot: This operation is also called the rotate operation. It rotates the data in order to provide an alternative presentation of the data: the report or page display takes a different dimensional orientation.
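Roll-up and pivot can be sketched on a tiny cube keyed by (retailer, month, product). The member names and values are made up, and the helper functions are illustrative rather than taken from any OLAP tool; drill-down is simply the reverse step, going back from the rolled-up cells to the more detailed ones.

```python
from collections import defaultdict

# Hypothetical cells keyed by (retailer, month, product) with a single measure.
DIMS = ("retailer", "month", "product")
cells = {
    ("Department store", "2008-01", "bike"): 100,
    ("Department store", "2008-02", "bike"): 150,
    ("Department store", "2008-01", "helmet"): 20,
    ("Discounter", "2008-01", "bike"): 80,
}


def roll_up(cells, dim, parent):
    """Aggregate along one dimension by replacing each member with its parent."""
    i = DIMS.index(dim)
    out = defaultdict(int)
    for key, value in cells.items():
        new_key = key[:i] + (parent(key[i]),) + key[i + 1:]
        out[new_key] += value
    return dict(out)


def pivot(cells, row_dim, col_dim):
    """Lay the same data out as a cross-tab; dimensions not shown are summed out."""
    r, c = DIMS.index(row_dim), DIMS.index(col_dim)
    table = defaultdict(lambda: defaultdict(int))
    for key, value in cells.items():
        table[key[r]][key[c]] += value
    return {row: dict(cols) for row, cols in table.items()}


# Roll months up to years, then view the cube from two orientations.
by_year = roll_up(cells, "month", lambda month: month[:4])
product_by_retailer = pivot(cells, "product", "retailer")
retailer_by_product = pivot(cells, "retailer", "product")
```

The two pivot results contain the same totals; only the choice of which dimension appears on the rows and which on the columns changes.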
Conclusions
In summary, a multidimensional cube hinges on a fact
relevant to decision-making. It shows a set of events for which
numeric measures provide a quantitative description. Each cube axis
shows a possible analysis dimension. Each dimension can be analyzed
at different detail levels specified by hierarchically structured
attributes.
OLAP Benefits
Successful OLAP applications increase the
productivity of business managers, developers, and whole
organizations. The inherent flexibility of OLAP systems means
business users of OLAP applications can become more
self-sufficient. Managers are no longer dependent on IT to make
schema changes, to create joins, or worse. Perhaps more
importantly, OLAP enables managers to model problems that would be
impossible using less flexible systems with lengthy and
inconsistent response times. More control and timely access to
strategic information equal more effective decision-making. IT
developers also benefit from using the right OLAP software.
Although it is possible to build an OLAP system using software
designed for transaction processing or data collection, it is
certainly not a very efficient use of developer time. By using
software specifically designed for OLAP, developers can deliver
applications to business users faster, providing better service.
Faster delivery of applications also reduces the applications
backlog. OLAP reduces the applications backlog still further by
making business users self-sufficient enough to build their own
models. However, unlike standalone departmental applications
running on PC networks, OLAP applications are dependent on Data
Warehouses and transaction processing systems to refresh their
source level data. As a result, IT gains more self-sufficient users
without relinquishing control over the integrity of the data. IT
also realizes more efficient operations through OLAP. By using
software designed for OLAP, IT reduces the query drag and network
traffic on transaction systems or the Data Warehouse. Lastly, by
providing the ability to model real business problems and a more
efficient use of people resources, OLAP enables the organization as
a whole to respond more quickly to market demands. Market
responsiveness, in turn, often yields improved revenue and
profitability.
OLAP functionality is:
- Multidimensional: OLAP services provide a wide variety of possible views or a multidimensional conceptual view of the data by supporting a dimensional aggregation path or hierarchies and/or multiple hierarchies.
- Easy to understand: The data mart designed for OLAP analysis should handle any business logic and statistical analysis that is relevant to the application and the developer, while at the same time keeping it easy enough for the target user.
- Interactive: OLAP helps the user synthesize business information through comparative, personalized viewing, as well as thorough analysis of historical and projected data in various "what-if" data model scenarios. The users are allowed to define new ad hoc calculations as part of the analysis and can report on the data in any desired way.
- Fast: OLAP services are usually implemented in a multi-user client/server mode and offer consistently rapid responses to queries, regardless of database size and complexity. The consolidated business data can be pre-aggregated along with the hierarchies in all dimensions to reduce the runtime calculation for building the OLAP reports.
Overview of the Dimensional Data Model
Available at:
http://docs.oracle.com/cd/B28359_01/olap.111/b28124/overview.htm#OLAUG9115
Dimensional objects are an integral part of OLAP. Because OLAP is on-line, it must provide answers quickly; analysts pose iterative queries during interactive sessions, not in batch jobs that run overnight. And because OLAP is also analytic, the queries are complex. The dimensional objects and the OLAP engine are designed to solve complex queries in real time.
The dimensional objects include cubes, measures, dimensions, attributes, levels, and hierarchies. The simplicity of the model is inherent because it defines objects that represent real-world business entities. Analysts know:
- which business measures they are interested in examining
- which dimensions and attributes make the data meaningful
- how the dimensions of their business are organized into levels and hierarchies.
Figure 1. Diagram of the OLAP Dimensional Model
The dimensional data model is highly structured. Structure implies
rules that govern the relationships among the data and control how
the data can be queried. Cubes are the physical implementation of
the dimensional model, and thus are highly optimized for
dimensional queries. The OLAP engine leverages this innate
dimensionality in performing highly efficient cross-cube joins for
inter-row calculations, outer joins for time series analysis, and
indexing. Dimensions are pre-joined to the measures. The technology
that underlies cubes is based on an indexed multidimensional array
model, which provides direct cell access.
The OLAP engine
manipulates dimensional objects in the same way that the SQL engine
manipulates relational objects. However, because the OLAP engine is
optimized to calculate analytic functions, and dimensional objects
are optimized for analysis, analytic and row functions can be
calculated much faster in OLAP than in SQL.
The dimensional model
enables Oracle OLAP to support high-end business intelligence tools
and applications such as OracleBI Discoverer Plus OLAP, OracleBI
Spreadsheet Add-In, OracleBI Suite Enterprise Edition,
BusinessObjects Enterprise, and Cognos ReportNet.
Cubes
Cubes provide a means of organizing measures that have the
same shape, that is, they have the exact same dimensions. Measures
in the same cube can easily be analyzed and displayed together. A
cube usually corresponds to a single fact table or view.
Measures
Measures populate the cells of a cube with the facts
collected about business operations. Measures are organized by
dimensions, which typically include a Time dimension.
An analytic
database contains snapshots of historical data, derived from data
in a transactional database, legacy system, syndicated sources, or
other data sources. Three years of historical data is generally
considered to be appropriate for analytic applications.
Measures are
static and consistent while analysts are using them to inform their
decisions. They are updated in a batch window at regular intervals:
weekly, daily, or periodically throughout the day. Some
administrators refresh their data by adding periods to the time
dimension of a measure, and may also roll off an equal number of
the oldest time periods. Each update provides a fixed historical
record of a particular business activity for that interval. Other
administrators do a full rebuild of their data rather than
performing incremental updates.
A critical decision in defining a
measure is the lowest level of detail. Users may never view this
detail data, but it determines the types of analysis that can be
performed. For example, market analysts (unlike order entry
personnel) do not need to know that Beth Miller in Ann Arbor,
Michigan, placed an order for a size 10 blue polka-dot dress on
July 6, 2006, at 2:34 p.m. But they might want to find out which
color of dress was most popular in the summer of 2006 in the
Midwestern United States.
The base level determines whether analysts
can get an answer to this question. For this particular question,
Time could be rolled up into months, Customer could be rolled up
into regions, and Product could be rolled up into items (such as
dresses) with an attribute of color. However, this level of
aggregate data could not answer the question: At what time of day
are women most likely to place an order? An important decision is
the extent to which the data has been aggregated before being
loaded into a data warehouse.
Dimensions
Dimensions contain a set of unique values that
identify and categorize data. They form the edges of a cube, and
thus of the measures within the cube. Because measures are
typically multidimensional, a single value in a measure must be
qualified by a member of each dimension to be meaningful. For
example, the Sales measure has four dimensions: Time, Customer,
Product, and Channel. A particular Sales value (43,613.50) only has
meaning when it is qualified by a specific time period (Feb-06), a
customer (Warren Systems), a product (Portable PCs), and a channel
(Catalog).
Base-level dimension values correspond to the unique keys
of a fact table.
Hierarchies and Levels
A hierarchy is a way to organize data at
different levels of aggregation. In viewing data, analysts use
dimension hierarchies to recognize trends at one level, drill down
to lower levels to identify reasons for these trends, and roll up
to higher levels to see what effect these trends have on a larger sector of the business.
The elements of a dimension can be organized as a hierarchy, a set of parent-child relationships, typically where a parent member summarizes its children. Parent elements can further be aggregated as the children of another parent. For example, May 2005's parent is Second Quarter 2005, which is in turn the child of Year 2005. Similarly, cities are the children of regions; products roll into product groups and individual expense items into types of expenditure.
Level-Based Hierarchies
Each level represents a position in the
hierarchy. Each level above the base (or most detailed) level
contains aggregate values for the levels below it. The members at
different levels have a one-to-many parent-child relation. For
example, Q1-05 and Q2-05 are the children of 2005, thus 2005 is the
parent of Q1-05 and Q2-05.
Suppose a data warehouse contains
snapshots of data taken three times a day, that is, every 8 hours.
Analysts might normally prefer to view the data that has been
aggregated into days, weeks, quarters, or years. Thus, the Time
dimension needs a hierarchy with at least five levels.
Hierarchies
and levels have a many-to-many relationship. A hierarchy typically
contains several levels, and a single level can be included in more
than one hierarchy.
Each level typically corresponds to a column in
a dimension table or view. The base level is the primary key.
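A level-based hierarchy, where each level corresponds to a column of the dimension table and the base level acts as the key, might be sketched like this. The table contents, the sales figures, and the helper names are assumptions made for illustration.

```python
# Hypothetical Time dimension table: one row per base-level member (month)
# and one column per higher level, as in a level-based hierarchy.
time_dim = {
    "Jan-05": {"quarter": "Q1-05", "year": "2005"},
    "Feb-05": {"quarter": "Q1-05", "year": "2005"},
    "Apr-05": {"quarter": "Q2-05", "year": "2005"},
}

# Hypothetical base-level measure values keyed by month.
sales_by_month = {"Jan-05": 10.0, "Feb-05": 20.0, "Apr-05": 5.0}


def roll_up_to(level, base_values):
    """Aggregate base-level values to the named level ('quarter' or 'year')."""
    totals = {}
    for month, value in base_values.items():
        ancestor = time_dim[month][level]
        totals[ancestor] = totals.get(ancestor, 0.0) + value
    return totals


def children(parent_level, parent, child_level):
    """Drill down: list the members of child_level that sit under the given parent."""
    if child_level == "month":
        return sorted(m for m, row in time_dim.items() if row[parent_level] == parent)
    return sorted({row[child_level] for row in time_dim.values()
                   if row[parent_level] == parent})
```

Because every month row carries its quarter and year, rolling up is a lookup in one column, and drilling down is a filter on that same column.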
Value-Based Hierarchies
Although hierarchies are typically
composed of named levels, they do not have to be. The parent-child
relations among dimension members may not define meaningful levels.
For example, in an employee dimension, each manager has one or more
reports, which forms a parent-child relation. Creating levels based
on these relations (such as individual contributors, first-level
managers, second-level managers, and so forth) may not be
meaningful for analysis. Likewise, the line item dimension of
financial data does not have levels. This type of hierarchy is
called a value-based hierarchy.
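A value-based hierarchy can be sketched as a plain parent-child mapping with no named levels. The employee names and expense values below are invented, and the recursive roll-up is one simple way to aggregate such a structure, not a description of a particular engine.

```python
# Hypothetical employee dimension: member -> parent (manager); None marks the root.
manager_of = {"Ann": None, "Bob": "Ann", "Cara": "Ann", "Dev": "Bob"}

# Hypothetical measure values recorded against each member directly.
own_expenses = {"Ann": 100.0, "Bob": 50.0, "Cara": 30.0, "Dev": 20.0}


def direct_reports(manager):
    """Children of a member, found purely from the parent-child relation."""
    return [emp for emp, mgr in manager_of.items() if mgr == manager]


def rolled_up(member):
    """A member's own value plus the rolled-up values of everyone below it."""
    return own_expenses[member] + sum(rolled_up(r) for r in direct_reports(member))
```

The depth of the tree differs per branch (Dev sits two steps below Ann, Cara only one), which is why naming levels would add little here.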
Attributes
An attribute provides additional information about the
data. Some attributes are used for display. You might have
attributes like colors, flavors, or sizes. This type of attribute
can be used for data selection and answering questions such as:
Which colors were the most popular in women's dresses in the summer
of 2005? How does this compare with the previous summer?
Time
attributes can provide information about the Time dimension that
may be useful in some types of analysis, such as identifying the
last day or the number of days in each time period.
Each attribute typically corresponds to a column in a dimension table or view.