Retail Sales (Kimball & Ross, Chapter 2). Presented by: Jessica Raquel Zaqueu (14165, BIS), Chayanit Nadam (14191, BIS), Wong Aun Chyi (14214, BIS).

Chapter 2 - Retail Sales

Nov 30, 2014

This is a group assignment by my students on Chapter 2, "Retail Sales," of the book The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling by Ralph Kimball and Margy Ross.
Transcript
Page 1: Chapter 2 - Retail Sales

Retail Sales
Kimball & Ross, Chapter 2

Presented by:

Name                     Student ID   Course
Jessica Raquel Zaqueu    14165        BIS
Chayanit Nadam           14191        BIS
Wong Aun Chyi            14214        BIS

Page 2: Chapter 2 - Retail Sales

Overview
• Four-step dimensional design process
• Transaction-level fact tables
• Additive and non-additive facts
• Sample dimension table attributes
• Causal dimensions
• Degenerate dimensions
• Extending an existing dimension model
• Snowflaking dimension attributes
• Avoiding the “too many dimensions” trap
• Surrogate keys

Page 3: Chapter 2 - Retail Sales

Four-Step Dimensional Design Process

1. Select the business process to model.
   ▫ Not a business department or function
   ▫ E.g., purchasing, ordering, shipping, invoicing, inventorying

2. Declare the grain of the business process.
   ▫ Specifies what an individual fact table row represents
   ▫ E.g., an individual line item on a sales ticket, or a daily snapshot of the inventory levels for a product

Page 4: Chapter 2 - Retail Sales

Four-Step Dimensional Design Process

3. Choose the dimensions that apply to each fact table row.
   ▫ Q: How do business people describe the data that results from the business process?
   ▫ E.g., date, product, store, customer, transaction type

4. Identify the numeric (measured) facts that will populate each fact table row.
   ▫ Q: What are we measuring?
   ▫ Typical facts are numeric, additive figures
   ▫ E.g., quantity ordered, dollar cost amount

• In making decisions about the 4 steps, consider both the user requirements and the realities of the source data.

Page 5: Chapter 2 - Retail Sales

Retail Case Study
• Large grocery chain: 100 grocery stores over 5 regions
• Each store:
  ▫ Departments: grocery, frozen foods, dairy, meat, produce, bakery, floral, health/beauty aids, etc.
  ▫ 60,000 products (SKUs = stock keeping units) on shelves
  ▫ 55,000 SKUs with UPCs
  ▫ 5,000 SKUs without UPCs but with assigned SKU numbers
• Data is collected:
  ▫ from cash registers into a point-of-sale (POS) system
  ▫ at the back door, where vendors make deliveries

Page 6: Chapter 2 - Retail Sales

Retail Case Study – Cont’d
• Management concerns
  ▫ Logistics of ordering, stocking, and selling products
  ▫ Maximizing profit
  ▫ Product pricing
  ▫ Lowering the cost of acquisition and overhead
  ▫ Use of promotions to increase sales
    - temporary price reductions
    - newspaper ads
    - grocery store displays
    - coupons

Page 7: Chapter 2 - Retail Sales

Step 1. Select the Business Process

• Decide what business process to model, by combining an understanding of the business requirements with an understanding of data realities.

• The first dimensional model built should be the one
  ▫ with the most impact,
  ▫ that answers the most pressing business questions,
  ▫ whose data is readily accessible for extraction.

• In the retail case study: POS retail sales
• Business question: What products are selling in which stores on what days and under what promotional conditions?

Page 8: Chapter 2 - Retail Sales

Step 2. Declare the Grain
• What level of data detail should be made available in the dimensional model?
• Choose the most atomic information captured by the business process.
  ▫ Atomic data
    - Most detailed; cannot be subdivided
    - Facilitates ad hoc, unexpected usage and the ability to drill down to details

• Case study grain: individual line item on a POS transaction

Page 9: Chapter 2 - Retail Sales

Step 3. Choose the Dimensions

• A careful grain statement determines the primary dimensions.

• It is then usually possible to add additional dimensions.

• If an additional desired dimension violates the grain by causing additional fact rows to be generated, then the grain statement must be revised to accommodate this dimension.

• Case study dimensions: date, product, store, promotion

Page 10: Chapter 2 - Retail Sales

Preliminary Retail Sales Schema
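
The slide's diagram is not reproduced in this transcript. As a rough sketch only, the preliminary schema implied by steps 1 to 3 (four dimensions around a fact table whose facts are not yet chosen) might look like the following. All table and column names are hypothetical, and SQLite is used purely for convenience.

```python
import sqlite3

# Hypothetical table and column names, chosen for illustration only.
SCHEMA = """
CREATE TABLE date_dim      (date_key      INTEGER PRIMARY KEY, full_date TEXT);
CREATE TABLE product_dim   (product_key   INTEGER PRIMARY KEY, product_description TEXT);
CREATE TABLE store_dim     (store_key     INTEGER PRIMARY KEY, store_name TEXT);
CREATE TABLE promotion_dim (promotion_key INTEGER PRIMARY KEY, promotion_name TEXT);

-- Grain: one row per line item on a POS transaction.
-- The measured facts are still to be chosen in step 4.
CREATE TABLE pos_sales_fact (
    date_key      INTEGER NOT NULL REFERENCES date_dim(date_key),
    product_key   INTEGER NOT NULL REFERENCES product_dim(product_key),
    store_key     INTEGER NOT NULL REFERENCES store_dim(store_key),
    promotion_key INTEGER NOT NULL REFERENCES promotion_dim(promotion_key)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# List the fact table's columns: at this stage, only the four dimension keys.
fact_columns = [row[1] for row in conn.execute("PRAGMA table_info(pos_sales_fact)")]
print(fact_columns)
```

Each fact row carries one key per dimension; the descriptive attributes live in the dimension tables.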

Page 11: Chapter 2 - Retail Sales

Step 4. Identify the Facts
• The business measurements picked for the fact table must be true to the grain.
• Case study: facts collected by the POS system
  ▫ Sales quantity, sales price per unit, sales dollar amount, standard cost dollar amount
  ▫ Gross profit = sales dollar amount - standard cost dollar amount
    - Recommendation: include it in the fact table even though it can be calculated. This eliminates the possibility of user error.
• For non-additive measurements such as percentages and ratios (e.g., gross margin), store the numerator (gross profit) and the denominator (dollar revenue) in the fact table. The ratio can then be calculated in a data access tool for any slice of the fact table.
  ▫ Caution: calculate the ratio of the sums, not the sum of the ratios.
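
The caution about ratios can be illustrated with a small sketch. The two fact rows and their dollar amounts below are made up for the example; gross margin is taken as gross profit divided by dollar revenue, as on the slide.

```python
# Two hypothetical line items (amounts invented for illustration).
rows = [
    {"sales_dollars": 10.0, "cost_dollars": 8.0},    # row-level margin 20%
    {"sales_dollars": 100.0, "cost_dollars": 50.0},  # row-level margin 50%
]


def gross_margin(fact_rows):
    """Ratio of the sums: total gross profit over total dollar revenue."""
    total_sales = sum(r["sales_dollars"] for r in fact_rows)
    total_profit = sum(r["sales_dollars"] - r["cost_dollars"] for r in fact_rows)
    return total_profit / total_sales


# Correct: sum the numerator and denominator first, then divide.
correct_margin = gross_margin(rows)

# Misleading: combine the per-row ratios (here, by averaging them).
row_ratios = [(r["sales_dollars"] - r["cost_dollars"]) / r["sales_dollars"] for r in rows]
naive_margin = sum(row_ratios) / len(row_ratios)

print(round(correct_margin, 4), round(naive_margin, 4))
```

The two figures differ because the per-row ratios ignore how much revenue each row contributes, which is why the numerator and denominator are stored as separate additive facts.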

Page 12: Chapter 2 - Retail Sales

Measured Facts in the retail sales schema

Page 13: Chapter 2 - Retail Sales

Key Input to the four-step dimensional design process

Page 14: Chapter 2 - Retail Sales

Date Dimension
• Found in every data mart
• Use verbose, self-explanatory values rather than coded values
  ▫ E.g., for the holiday indicator, use "Holiday" and "Nonholiday" instead of "Y" and "N"
• The date key should be an integer rather than a date type
• If the transaction time of day is of interest, model it in a separate time dimension table
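
A minimal sketch of how such a date dimension could be generated, assuming an integer key assigned in sequence and a small hand-maintained holiday calendar. The attribute names and the `HOLIDAYS` set are illustrative, not taken from the book.

```python
from datetime import date, timedelta

# Hypothetical holiday calendar; a real one would come from the business.
HOLIDAYS = {date(2014, 12, 25)}
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def build_date_dimension(start, end):
    """Return one verbose row per calendar day from start to end inclusive."""
    rows = []
    for offset in range((end - start).days + 1):
        current = start + timedelta(days=offset)
        rows.append({
            "date_key": offset + 1,                      # integer key, not a date type
            "full_date": current.isoformat(),
            "day_of_week": DAY_NAMES[current.weekday()],
            "calendar_year": current.year,
            # Verbose values instead of Y/N codes.
            "holiday_indicator": "Holiday" if current in HOLIDAYS else "Nonholiday",
            "weekday_indicator": "Weekend" if current.weekday() in (5, 6) else "Weekday",
        })
    return rows


date_dim = build_date_dimension(date(2014, 12, 24), date(2014, 12, 26))
for row in date_dim:
    print(row)
```

Because every attribute is precomputed and stored as readable text, reports can filter and label by these values directly.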

Page 15: Chapter 2 - Retail Sales

Date Dimension

Page 16: Chapter 2 - Retail Sales

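Date Dimension

• Why is an explicit date dimension table needed, instead of relying on SQL date functions against a date-type key in the fact table?

• Answer: avoiding the join is not a good reason. If the relational database cannot handle an efficient join to the date dimension table, we are already in deep trouble.

• Answer: most databases do not index SQL date calculations, so queries that constrain on a calculated date field cannot take advantage of an index.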

Page 17: Chapter 2 - Retail Sales

Product Dimension

• Describes every SKU in the store
• Sourced from the operational product master file
• Holds the descriptive attributes of each SKU
• Hierarchies = groups of attributes
• Merchandise hierarchy: each level is a many-to-one relationship
• Attribute values are stored redundantly; there is no need to normalize them, because the space savings would be minimal

Page 18: Chapter 2 - Retail Sales

Product Dimension

Page 19: Chapter 2 - Retail Sales

Store Dimension
• Describes every store in the grocery chain
• Unlike the product dimension, it is not sourced from the product master file
• It is possible to represent multiple hierarchies in a single dimension table

Page 20: Chapter 2 - Retail Sales

Store Dimension

Page 21: Chapter 2 - Retail Sales

Promotion Dimension

• Describes the promotion conditions under which a product was sold

• Causal dimension: describes factors thought to cause a change in product sales

• 4 causal mechanisms: price reductions, ads, displays, coupons

Page 22: Chapter 2 - Retail Sales

Promotion Dimension: 4 Causal Mechanisms

• Keeping all 4 causal mechanisms together in one dimension
  - The mechanisms are correlated, so there is not much difference in space requirements
  - The combined dimension can be browsed efficiently to see how the various promotions are used together

• Separating the 4 causal mechanisms into distinct dimension tables
  - More understandable to the business community
  - More straightforward to administer than a combined dimension

• "No Promotion in Effect" row: used for line items that are not being promoted
  - Avoids a null promotion key in the fact table

Page 23: Chapter 2 - Retail Sales

Promotion Dimension
• Q: Which products were under promotion but did not sell?
  - Cannot be answered from the POS sales fact table alone, because it contains only products that were sold

• Factless fact table: has no measurement metrics. A promotion coverage table of this kind records which products were on promotion, whether or not they sold.

• 2-step process to answer the question:
  - Query the promotion coverage table to determine which products were on promotion
  - Determine which products sold from the POS sales fact table

• The answer is the set difference between these 2 lists of products.
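
The set difference can be sketched as follows. The key tuples are made-up sample data, with both tables reduced to (date key, store key, product key) for simplicity.

```python
# Hypothetical promotion coverage (factless fact) rows: products on promotion.
promotion_coverage = {
    (1, 10, 100),
    (1, 10, 101),
    (1, 10, 102),
}

# Hypothetical POS sales fact rows, reduced to the same three keys.
pos_sales = {
    (1, 10, 100),
    (1, 10, 102),
    (1, 10, 200),  # sold, but not on promotion
}

# Step 1 minus step 2: promoted products that did not sell.
promoted_but_unsold = promotion_coverage - pos_sales
print(promoted_but_unsold)
```

Product 200 does not appear in the result: it sold without being promoted, so it is outside the coverage set to begin with.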

Page 24: Chapter 2 - Retail Sales

Promotion Dimension

Page 25: Chapter 2 - Retail Sales

Degenerate Dimension
• Degenerate dimensions often play an integral role in the fact table’s primary key.

• Degenerate dimensions are very common when the grain of a fact table represents a single transaction or transaction line item, because the degenerate dimension represents the unique identifier of the parent.

• Operational control numbers such as order numbers, invoice numbers, and bill-of-lading numbers usually give rise to empty dimensions and are represented as degenerate dimensions (that is, dimension keys without corresponding dimension tables) in fact tables where the grain of the table is the document itself or a line item in the document.
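
As an illustration, a POS transaction number can sit in the fact table as a degenerate dimension and still be useful for grouping line items back into tickets. The rows and field names below are hypothetical.

```python
from collections import defaultdict

# Hypothetical fact rows at line-item grain. pos_transaction_number is the
# degenerate dimension: a key stored in the fact table with no dimension table.
fact_rows = [
    {"pos_transaction_number": "T1001", "product_key": 11, "sales_quantity": 2},
    {"pos_transaction_number": "T1001", "product_key": 27, "sales_quantity": 1},
    {"pos_transaction_number": "T1002", "product_key": 11, "sales_quantity": 4},
]

# Group line items by their parent ticket using the degenerate dimension.
items_per_ticket = defaultdict(int)
for row in fact_rows:
    items_per_ticket[row["pos_transaction_number"]] += row["sales_quantity"]

# In this sketch, transaction number plus product key identifies each fact row.
primary_keys = {(r["pos_transaction_number"], r["product_key"]) for r in fact_rows}

print(dict(items_per_ticket))
```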

Page 26: Chapter 2 - Retail Sales

Retail Schema Extensibility

•Original schema extends gracefully because POS transaction data was modeled at its most granular level.

•Premature aggregation limits the ability to extend the schema, because new dimensions may not apply at the higher (aggregated) grain.
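
A small sketch of why atomic data extends gracefully: the same line-item rows can be rolled up by any attribute, whereas a table already aggregated by store could not be re-summarized by product. The rows and the `roll_up` helper are invented for illustration.

```python
from collections import defaultdict

# Hypothetical atomic rows (one per POS line item).
atomic_rows = [
    {"store": "Store A", "product": "Cola", "sales_dollars": 3.0},
    {"store": "Store A", "product": "Bread", "sales_dollars": 2.0},
    {"store": "Store B", "product": "Cola", "sales_dollars": 4.5},
]


def roll_up(rows, attribute):
    """Sum sales dollars by the given attribute."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[attribute]] += row["sales_dollars"]
    return dict(totals)


by_store = roll_up(atomic_rows, "store")
by_product = roll_up(atomic_rows, "product")
print(by_store, by_product)
```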

Page 27: Chapter 2 - Retail Sales

Surrogate Keys

•We strongly encourage the use of surrogate keys in dimensional models rather than relying on operational production codes.

•Surrogate keys are integers that are assigned sequentially as needed to populate a dimension.

•The surrogate keys merely serve to join the dimension tables to the fact table.

Page 28: Chapter 2 - Retail Sales

Surrogate Keys

•One of the primary benefits of surrogate keys is that they buffer the data warehouse environment from operational changes.

•Surrogate keys allow the warehouse team to maintain control of the environment rather than being whipsawed by operational rules for generating, updating, deleting, recycling, and reusing production codes.

Page 29: Chapter 2 - Retail Sales

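• When an operational system reuses a production code (for example, reassigning an account number after a period of inactivity), surrogate keys give the warehouse a mechanism to differentiate the two separate instances of the same operational account number.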

• Surrogate keys allow the data warehouse team to integrate data from multiple operational source systems, even if they lack consistent source keys.

• The surrogate key should be as small an integer as possible while still comfortably accommodating the future cardinality (maximum number of rows) of the dimension.

• The smaller surrogate key translates into smaller fact tables, smaller fact table indices, and more fact table rows per block input-output operation.

• Finally, surrogate keys are needed to support one of the primary techniques for handling changes to dimension table attributes. This is actually one of the most important reasons to use surrogate keys.
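
The following is a minimal sketch of sequential surrogate key assignment, assuming dimension rows are identified in the source by a (source system, production code) pair. The class, method names, and sample codes are hypothetical.

```python
class SurrogateKeyGenerator:
    """Hands out sequential integer keys for dimension rows (illustrative only)."""

    def __init__(self):
        self._next_key = 1
        self._current = {}  # (source_system, production_code) -> current surrogate key

    def _assign(self, source_system, production_code):
        key = self._next_key
        self._next_key += 1
        self._current[(source_system, production_code)] = key
        return key

    def lookup(self, source_system, production_code):
        """Return the current surrogate key, assigning one on first sight."""
        existing = self._current.get((source_system, production_code))
        if existing is not None:
            return existing
        return self._assign(source_system, production_code)

    def new_instance(self, source_system, production_code):
        """Assign a fresh key, e.g. when a production code is reused or an attribute changes."""
        return self._assign(source_system, production_code)


keys = SurrogateKeyGenerator()
first_owner = keys.lookup("billing", "ACCT-9")         # first sighting
other_system = keys.lookup("crm", "ACCT-9")            # same code, different source
same_again = keys.lookup("billing", "ACCT-9")          # stable on repeat lookups
second_owner = keys.new_instance("billing", "ACCT-9")  # code reused by the source
print(first_owner, other_system, same_again, second_owner)
```

The same production code maps to different surrogate keys across source systems and across reuse, while fact rows only ever carry the small integer.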

Page 30: Chapter 2 - Retail Sales

Market Basket Analysis
•This notion of analyzing the combination of products that sell together is known by data miners as affinity grouping but more popularly is called market basket analysis.

•Market basket analysis gives the retailer insights about how to merchandise various combinations of items. If frozen pasta dinners sell well with cola products, then these two products perhaps should be located near each other or marketed with complementary pricing.

Page 31: Chapter 2 - Retail Sales

Market Basket Analysis (cont’d)

•Data mining tools and some OLAP products can assist with market basket analysis

•The key to realistic market basket analysis is to remember that the primary goal is to understand the meaningful combinations of products sold together.
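
One simple way to look for product combinations is to count how often each pair of products appears in the same basket. This is only a sketch with made-up baskets, not the approach of any particular data mining or OLAP tool.

```python
from collections import Counter
from itertools import combinations

# Hypothetical baskets keyed by POS transaction number.
baskets = {
    "T1": ["frozen pasta dinner", "cola", "bread"],
    "T2": ["frozen pasta dinner", "cola"],
    "T3": ["bread", "milk"],
}

# Count each unordered product pair once per basket.
pair_counts = Counter()
for items in baskets.values():
    for pair in combinations(sorted(set(items)), 2):
        pair_counts[pair] += 1

top_pair, top_count = pair_counts.most_common(1)[0]
print(top_pair, top_count)
```

Sorting the items before pairing keeps ("cola", "frozen pasta dinner") and its reverse from being counted as different pairs.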

Page 32: Chapter 2 - Retail Sales

Conclusion
• In this chapter we got our first exposure to designing a dimensional model.

• Regardless of industry, we strongly encourage the four-step process for tackling dimensional model designs. Remember that it is especially important that we clearly state the grain associated with our dimensional schema. Loading the fact table with atomic data provides the greatest flexibility because we can summarize that data “every which way.”

• As soon as the fact table is restricted to more aggregated information, we’ll run into walls when the summarization assumptions prove to be invalid.

• Remember that it is vitally important to populate our dimension tables with verbose, robust descriptive attributes.