DATAWAREHOUSING WITH MySQL

Jan 19, 2015

Harshit Parekh

Dimensional Data Warehousing with MySQL
Page 1: Datawarehousing with MySQL

DATAWAREHOUSING WITH MySQL

Page 2: Datawarehousing with MySQL

CHAPTER 1-BASIC CONCEPTS

Page 3: Datawarehousing with MySQL

CHAPTER 1-BASIC CONCEPTS

Chapter 1 of the book Dimensional Data Warehousing with MySQL covers four main tasks:

• Creating a database user

• Creating the data warehouse database and the source database

• Creating data warehouse tables

• Generating surrogate keys

For practice, the first task (creating a database user) is skipped and the root account is used instead. Each of the other tasks is shown with a screenshot of its output.

Page 4: Datawarehousing with MySQL

CREATING A DATABASE

In this task two databases, dw and source, are created using the following commands:

CREATE DATABASE dw;

CREATE DATABASE source;

Page 5: Datawarehousing with MySQL

CREATING DATAWAREHOUSING TABLES

Creating customer_dim table:
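The screenshot for this slide is not part of the transcript. A minimal sketch of what such a statement could look like is given here. The columns customer_sk, customer_number and customer_name appear elsewhere in this transcript; the address and city columns, the validity dates and all data types are assumptions made for illustration.

```sql
USE dw;

-- Sketch of the customer dimension (types and address columns are assumed).
CREATE TABLE customer_dim (
  customer_sk             INT NOT NULL AUTO_INCREMENT PRIMARY KEY, -- surrogate key
  customer_number         INT,       -- natural key from the source system
  customer_name           CHAR(50),
  customer_street_address CHAR(50),  -- assumed column
  customer_city           CHAR(30),  -- assumed column name
  effective_date          DATE,      -- assumed, mirrors product_dim
  expiry_date             DATE       -- assumed, mirrors product_dim
);
```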

Page 6: Datawarehousing with MySQL

CREATING DATAWAREHOUSING TABLES

Creating product_dim table:
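The screenshot is missing from the transcript, so the following is only a sketch. The column names are the ones used later in this transcript (product_sk, product_code, product_name, product_category, effective_date, expiry_date); the data types are assumptions.

```sql
USE dw;

-- Sketch of the product dimension (data types are assumed).
CREATE TABLE product_dim (
  product_sk       INT NOT NULL AUTO_INCREMENT PRIMARY KEY, -- surrogate key
  product_code     INT,        -- natural key from the source system
  product_name     CHAR(30),
  product_category CHAR(30),
  effective_date   DATE,       -- first day this row version is valid
  expiry_date      DATE        -- last day this row version is valid
);
```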

Page 7: Datawarehousing with MySQL

CREATING DATAWAREHOUSING TABLES

Creating order_dim table:
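As a sketch only (the original screenshot is not in the transcript): order_sk and order_number are taken from the stored procedure shown in Chapter 5, while the validity dates and the data types are assumptions.

```sql
USE dw;

-- Sketch of the order dimension (validity dates and types are assumed).
CREATE TABLE order_dim (
  order_sk       INT NOT NULL AUTO_INCREMENT PRIMARY KEY, -- surrogate key
  order_number   INT,   -- natural key from the source system
  effective_date DATE,
  expiry_date    DATE
);
```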

Page 8: Datawarehousing with MySQL

CREATING DATAWAREHOUSING TABLES

Creating date_dim table:
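The screenshot is not available, so this is a sketch under assumptions. Only date_sk and date are confirmed by the stored procedure in Chapter 5; the month_name, month, quarter and year columns are assumed here because the later queries aggregate by month, quarter and year.

```sql
USE dw;

-- Sketch of the date dimension (calendar attribute columns are assumed).
CREATE TABLE date_dim (
  date_sk    INT NOT NULL AUTO_INCREMENT PRIMARY KEY, -- surrogate key
  `date`     DATE,
  month_name CHAR(9),   -- assumed, e.g. 'February'
  `month`    INT,       -- assumed, 1 to 12
  `quarter`  INT,       -- assumed, 1 to 4
  `year`     INT        -- assumed, four-digit year
);
```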

Page 9: Datawarehousing with MySQL

CREATING DATAWAREHOUSING TABLES

Creating sales_order_fact table:
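A possible shape of the fact table is sketched here, since the screenshot is missing. The column order follows the SELECT list of the push_sales_order procedure in Chapter 5, which inserts without a column list. The name order_date_sk and all data types are assumptions.

```sql
USE dw;

-- Sketch of the fact table: one measure plus one surrogate key per dimension.
CREATE TABLE sales_order_fact (
  order_amount  DECIMAL(10,2), -- the measure
  order_sk      INT,           -- refers to order_dim.order_sk
  customer_sk   INT,           -- refers to customer_dim.customer_sk
  product_sk    INT,           -- refers to product_dim.product_sk
  order_date_sk INT            -- assumed name; refers to date_dim.date_sk
);
```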

Page 10: Datawarehousing with MySQL

GENERATING SURROGATE KEYS

In the customer_dim table, three entries are inserted with a NULL value for the customer_sk field. The surrogate keys are then generated automatically.
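A minimal sketch of this step, assuming customer_sk is declared NOT NULL AUTO_INCREMENT and that the table has the customer_number and customer_name columns. The three customers and their values are hypothetical.

```sql
USE dw;

-- Passing NULL for an AUTO_INCREMENT column makes MySQL assign the next key.
INSERT INTO customer_dim (customer_sk, customer_number, customer_name)
VALUES
  (NULL, 1, 'Customer One'),    -- hypothetical sample rows
  (NULL, 2, 'Customer Two'),
  (NULL, 3, 'Customer Three');

-- Inspect the generated surrogate keys.
SELECT customer_sk, customer_number, customer_name
FROM customer_dim
ORDER BY customer_sk;
```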

Page 11: Datawarehousing with MySQL

CHAPTER 2-DIMENSIONAL HISTORY

Page 12: Datawarehousing with MySQL

• Slowly Changing Dimension (SCD) is the technique for implementing dimension history in a dimensional data warehouse

• For practice, the SCD1 and SCD2 techniques are used

• Slowly Changing Dimension Type 1 (SCD1) involves updating and inserting into the customer_dim table

• Before updating and inserting data, a customer_stg staging table has to be created

Page 13: Datawarehousing with MySQL

• The Update statement copies the value of the customer_name column in the staging table to the customer_name column in the customer_dim table

• The Insert statement inserts the records from the staging table that are not yet present in the customer_dim table

• Running the script updates the name of the first customer and inserts the seventh customer from the staging table into the customer_dim table.

Page 14: Datawarehousing with MySQL

Creating and Loading Customer Staging Table
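The screenshot is not part of the transcript. A sketch of a staging table and a manual load might look like this; the column list, the types and the sample rows are assumptions, and the original may well have loaded the table from a file instead.

```sql
USE dw;

-- Staging table holding the latest customer data from the source (assumed layout).
CREATE TABLE customer_stg (
  customer_number INT,
  customer_name   CHAR(50)
);

-- Hypothetical rows: customer 1 with a changed name, and a new customer 7.
INSERT INTO customer_stg (customer_number, customer_name)
VALUES
  (1, 'Customer One Renamed'),
  (7, 'Customer Seven');
```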

Page 15: Datawarehousing with MySQL

Applying SCD1: Updating Existing Customers and inserting into customer_dim
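The update and insert described above can be sketched as follows, assuming customer_stg and customer_dim share the customer_number and customer_name columns and that customer_sk is an AUTO_INCREMENT key. This is an illustration, not the script from the original slide.

```sql
USE dw;

-- SCD1 update: overwrite the name in place, so no history is kept.
UPDATE customer_dim a
JOIN customer_stg b ON a.customer_number = b.customer_number
SET a.customer_name = b.customer_name
WHERE a.customer_name <> b.customer_name;

-- SCD1 insert: add staged customers that the dimension does not contain yet.
INSERT INTO customer_dim (customer_sk, customer_number, customer_name)
SELECT NULL, b.customer_number, b.customer_name
FROM customer_stg b
LEFT JOIN customer_dim a ON a.customer_number = b.customer_number
WHERE a.customer_sk IS NULL;   -- anti-join: no matching dimension row
```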

Page 16: Datawarehousing with MySQL

Slowly Changing Dimension Type 2 (SCD2)

• SCD2 has been applied to the product_dim table

• Whenever the product_name or product_category column changes, SCD2 expires the existing row and adds a new row that describes the same product.

Page 17: Datawarehousing with MySQL

Creating a product_stg file

Page 18: Datawarehousing with MySQL

Applying SCD2 to the product_name and product_category in the product_dim table
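A sketch of the two SCD2 steps, assuming a product_stg table with product_code, product_name and product_category columns, and assuming that current rows are marked with the expiry date 9999-12-31 as shown on the next slide. It is an illustration, not the original script.

```sql
USE dw;

-- Step 1: expire the current row when the name or category has changed.
UPDATE product_dim a
JOIN product_stg b ON a.product_code = b.product_code
SET a.expiry_date = SUBDATE(CURRENT_DATE, 1)   -- yesterday
WHERE a.expiry_date = '9999-12-31'
  AND (a.product_name <> b.product_name
       OR a.product_category <> b.product_category);

-- Step 2: add a current row for every staged product that has none,
-- which covers both changed products and brand-new products.
INSERT INTO product_dim
  (product_sk, product_code, product_name, product_category,
   effective_date, expiry_date)
SELECT NULL, b.product_code, b.product_name, b.product_category,
       CURRENT_DATE, '9999-12-31'
FROM product_stg b
LEFT JOIN product_dim a
       ON a.product_code = b.product_code
      AND a.expiry_date = '9999-12-31'
WHERE a.product_sk IS NULL;
```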

Page 19: Datawarehousing with MySQL

• The next output would show that SCD2 has been applied successfully

• Product 1 has two rows

• One of the rows, with product_sk 1, has expired with expiry date 4th February 2007

• This is one day before the effective date of the new row

• Another row has been created with product_sk 3, and it carries the new name

• Its effective date is 5th February 2007 and expiry date 9999-12-31

• This means that it has not yet expired

Page 20: Datawarehousing with MySQL
Page 21: Datawarehousing with MySQL

Chapter 3: Measure Additivity

Page 22: Datawarehousing with MySQL

Testing Full Additivity: Inserting data into the order_dim table

Page 23: Datawarehousing with MySQL

Testing Full Additivity: Inserting data into table date_dim

Page 24: Datawarehousing with MySQL

Testing Full Additivity: Inserting data into sales_order_fact:

Page 25: Datawarehousing with MySQL

Testing Full Additivity: Generating the sum of the total order amounts by querying across all dimensions:

Page 26: Datawarehousing with MySQL

Testing Full Additivity: Generating the sum by querying across date, customer and order:

Page 27: Datawarehousing with MySQL

Testing Full Additivity: Generating the sum of total orders by querying across date and order:
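The queries behind these additivity tests are not in the transcript. The sketch below shows the idea: a fully additive measure such as order_amount gives the same total no matter which dimensions are joined in, provided every fact row matches exactly one row in each dimension. The fact column order_date_sk is an assumed name.

```sql
USE dw;

-- Sum across all four dimensions.
SELECT SUM(f.order_amount) AS sum_of_order_amount
FROM sales_order_fact f
JOIN customer_dim c ON f.customer_sk   = c.customer_sk
JOIN product_dim  p ON f.product_sk    = p.product_sk
JOIN order_dim    o ON f.order_sk      = o.order_sk
JOIN date_dim     d ON f.order_date_sk = d.date_sk;

-- Sum across date, customer and order only.
SELECT SUM(f.order_amount) AS sum_of_order_amount
FROM sales_order_fact f
JOIN customer_dim c ON f.customer_sk   = c.customer_sk
JOIN order_dim    o ON f.order_sk      = o.order_sk
JOIN date_dim     d ON f.order_date_sk = d.date_sk;

-- Sum across date and order only.
SELECT SUM(f.order_amount) AS sum_of_order_amount
FROM sales_order_fact f
JOIN order_dim o ON f.order_sk      = o.order_sk
JOIN date_dim  d ON f.order_date_sk = d.date_sk;
```

If the three totals agree, the measure behaves as fully additive over the tested dimensions.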

Page 28: Datawarehousing with MySQL

Chapter 4: Dimensional Queries

Page 29: Datawarehousing with MySQL

Aggregate Queries

• Aggregate queries aggregate individual facts

• The values are either summed or counted

• Two examples of aggregate queries are run: aggregation of daily sales and of annual sales

• In all cases, joins between tables are done using surrogate keys.

Page 30: Datawarehousing with MySQL

Daily Sales Aggregation: The aggregation of the order amounts and number of orders is done by date
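A sketch of such a daily aggregation, assuming the fact table refers to date_dim through a column named order_date_sk (an assumed name):

```sql
USE dw;

-- Total order amount and number of orders per calendar date.
SELECT d.date,
       SUM(f.order_amount) AS total_amount,
       COUNT(*)            AS number_of_orders
FROM sales_order_fact f
JOIN date_dim d ON f.order_date_sk = d.date_sk   -- join on the surrogate key
GROUP BY d.date
ORDER BY d.date;
```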

Page 31: Datawarehousing with MySQL

Annual Sales Aggregation: The order amounts and the number of orders are aggregated not only by date (here, by year) but also by product and customer city
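As a sketch, assuming date_dim has a year column, customer_dim has a customer_city column and the fact table uses order_date_sk as its date key (all three names are assumptions):

```sql
USE dw;

-- Yearly totals broken down by product and customer city.
SELECT d.year,
       p.product_name,
       c.customer_city,
       SUM(f.order_amount) AS total_amount,
       COUNT(*)            AS number_of_orders
FROM sales_order_fact f
JOIN date_dim     d ON f.order_date_sk = d.date_sk
JOIN product_dim  p ON f.product_sk    = p.product_sk
JOIN customer_dim c ON f.customer_sk   = c.customer_sk
GROUP BY d.year, p.product_name, c.customer_city
ORDER BY d.year, p.product_name, c.customer_city;
```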

Page 32: Datawarehousing with MySQL
Page 33: Datawarehousing with MySQL

Specific Queries: Monthly Storage Product Sales:

The following query aggregates sales amount and the number of orders per month.

Page 34: Datawarehousing with MySQL

Specific Queries: Quarterly Sales in Mechanicsburg:

The following query produces the quarterly aggregation of the order amounts in Mechanicsburg
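The query itself is missing from the transcript. A sketch of a quarterly aggregation restricted to one city, assuming the column names customer_city, quarter, year and order_date_sk:

```sql
USE dw;

-- Quarterly order totals for customers located in Mechanicsburg.
SELECT d.year,
       d.quarter,
       c.customer_city,
       SUM(f.order_amount) AS total_amount,
       COUNT(*)            AS number_of_orders
FROM sales_order_fact f
JOIN customer_dim c ON f.customer_sk   = c.customer_sk
JOIN date_dim     d ON f.order_date_sk = d.date_sk
WHERE c.customer_city = 'Mechanicsburg'
GROUP BY d.year, d.quarter, c.customer_city
ORDER BY d.year, d.quarter;
```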

Page 35: Datawarehousing with MySQL

Inside-Out Queries: Product Performer:

The following query gives you the sales orders of products that have a monthly sales amount of 7,500 or more.
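A sketch of such an inside-out query: the facts are aggregated first and the HAVING clause then filters on the aggregated measure. The year, month and order_date_sk column names are assumptions.

```sql
USE dw;

-- Products whose sales in a given month reach 7,500 or more.
SELECT p.product_name,
       d.year,
       d.month,
       SUM(f.order_amount) AS monthly_amount,
       COUNT(*)            AS number_of_orders
FROM sales_order_fact f
JOIN product_dim p ON f.product_sk    = p.product_sk
JOIN date_dim    d ON f.order_date_sk = d.date_sk
GROUP BY p.product_name, d.year, d.month
HAVING SUM(f.order_amount) >= 7500      -- filter on the aggregate
ORDER BY d.year, d.month, p.product_name;
```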

Page 36: Datawarehousing with MySQL

Inside-Out Queries: Loyal Customer

The following query shows customers who have placed more than five orders annually in the past 18 months
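One possible reading of this requirement is sketched below: only orders from the last 18 months are considered, and a customer qualifies for a year when more than five orders fall into it. The column names year and order_date_sk, and this interpretation of "annually", are assumptions.

```sql
USE dw;

-- Customers with more than five orders in a year, looking back 18 months.
SELECT c.customer_number,
       d.year,
       COUNT(*) AS number_of_orders
FROM sales_order_fact f
JOIN customer_dim c ON f.customer_sk   = c.customer_sk
JOIN date_dim     d ON f.order_date_sk = d.date_sk
WHERE d.date >= DATE_SUB(CURRENT_DATE, INTERVAL 18 MONTH)
GROUP BY c.customer_number, d.year
HAVING COUNT(*) > 5
ORDER BY c.customer_number, d.year;
```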

Page 37: Datawarehousing with MySQL

Chapter 5: Source Extraction

Page 38: Datawarehousing with MySQL

• Push-by-source CDC (change data capture): the source system extracts only the changes made since the last extraction

• Push-by-source CDC has been demonstrated on the sales order source data

• It has been done using a stored procedure that extracts sales order data from the sales_order table in the source database

Page 39: Datawarehousing with MySQL

Creating a sales_order table in a second database called source and inserting values into it
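The statement is not in the transcript. The sketch below uses the column names referenced by the push_sales_order procedure (order_number, customer_number, product_code, order_date, entry_date, order_amount); the data types and the sample row are assumptions.

```sql
USE source;

-- Source table of sales orders (data types are assumed).
CREATE TABLE sales_order (
  order_number    INT,
  customer_number INT,
  product_code    INT,
  order_date      DATE,          -- when the order was placed
  entry_date      DATE,          -- when the row was entered; drives the CDC filter
  order_amount    DECIMAL(10,2)
);

-- Hypothetical sample row entered today.
INSERT INTO sales_order
  (order_number, customer_number, product_code, order_date, entry_date, order_amount)
VALUES
  (1, 1, 1, CURRENT_DATE, CURRENT_DATE, 1000.00);
```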

Order_dim:

Page 40: Datawarehousing with MySQL

Date_dim

Page 41: Datawarehousing with MySQL

Inserting values in the order_dim and date_dim tables in the dw database

Page 42: Datawarehousing with MySQL

Running the following stored procedure:

USE source;

DELIMITER //

DROP PROCEDURE IF EXISTS push_sales_order //

-- Pushes the sales orders entered today into the fact table in dw.
CREATE PROCEDURE push_sales_order()
BEGIN
  INSERT INTO dw.sales_order_fact
  SELECT a.order_amount, b.order_sk, c.customer_sk, d.product_sk, e.date_sk
  FROM sales_order a, dw.order_dim b, dw.customer_dim c, dw.product_dim d, dw.date_dim e
  WHERE a.entry_date = CURRENT_DATE              -- only rows entered today
    AND a.order_number = b.order_number
    AND a.customer_number = c.customer_number
    AND a.product_code = d.product_code
    AND a.order_date >= d.effective_date         -- product version valid on the order date
    AND a.order_date <= d.expiry_date
    AND a.order_date = e.date;
END //

DELIMITER ;

Page 43: Datawarehousing with MySQL

When it is run, the above stored procedure inserts the extracted sales orders into the sales_order_fact table in the dw database.
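A short sketch of how the procedure could be invoked and its effect checked, assuming it has been created in the source database as shown:

```sql
-- Run the push for today's orders.
CALL source.push_sales_order();

-- Check what arrived in the fact table.
SELECT * FROM dw.sales_order_fact;
```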

Page 44: Datawarehousing with MySQL

Chapter 6 : Populating the Date Dimension

Page 45: Datawarehousing with MySQL

Pre-population: This is the simplest of the three techniques; the dates for a whole period of time are inserted in advance.

For example, dates could be inserted for the five years between 2009 and 2014.

Truncating the date_dim table
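A sketch of pre-population with a stored procedure that loops over a date range. The procedure name, the calendar columns (month_name, month, quarter, year) and the exact start and end dates are assumptions; date_sk is assumed to be an AUTO_INCREMENT key.

```sql
USE dw;

-- Start from an empty date dimension.
TRUNCATE TABLE date_dim;

DELIMITER //

DROP PROCEDURE IF EXISTS pre_populate_date //

-- Inserts one row per day from start_dt to end_dt, inclusive.
CREATE PROCEDURE pre_populate_date(IN start_dt DATE, IN end_dt DATE)
BEGIN
  DECLARE d DATE;
  SET d = start_dt;
  WHILE d <= end_dt DO
    INSERT INTO date_dim (`date`, month_name, `month`, `quarter`, `year`)
    VALUES (d, MONTHNAME(d), MONTH(d), QUARTER(d), YEAR(d));
    SET d = ADDDATE(d, 1);   -- advance by one day
  END WHILE;
END //

DELIMITER ;

-- Illustrative range: five calendar years starting in 2009.
CALL pre_populate_date('2009-01-01', '2013-12-31');
```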

Page 46: Datawarehousing with MySQL

One Date Every Day: This technique is similar to pre-population, but only one date is added to the table each day.

Daily date population:
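A sketch of the statement that would be run once per day, for example by a scheduled job. The calendar columns are assumptions and date_sk is assumed to be an AUTO_INCREMENT key.

```sql
USE dw;

-- Add today's date to the date dimension.
INSERT INTO date_dim (`date`, month_name, `month`, `quarter`, `year`)
VALUES (CURRENT_DATE,
        MONTHNAME(CURRENT_DATE),
        MONTH(CURRENT_DATE),
        QUARTER(CURRENT_DATE),
        YEAR(CURRENT_DATE));
```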

Page 47: Datawarehousing with MySQL

Loading dates from the source: The query loads the sales order dates from the sales_order table of the source database into the date_dim table of the DW database.
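The query is not included in the transcript. A sketch, assuming the calendar columns month_name, month, quarter and year exist in date_dim, could load only the order dates that are not yet present:

```sql
-- Load distinct order dates from the source that date_dim does not have yet.
INSERT INTO dw.date_dim (`date`, month_name, `month`, `quarter`, `year`)
SELECT DISTINCT s.order_date,
       MONTHNAME(s.order_date),
       MONTH(s.order_date),
       QUARTER(s.order_date),
       YEAR(s.order_date)
FROM source.sales_order s
LEFT JOIN dw.date_dim d ON d.`date` = s.order_date
WHERE d.date_sk IS NULL;   -- anti-join: date not loaded before
```

Because of the anti-join, the same statement can be rerun when additional sales orders arrive.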

Page 48: Datawarehousing with MySQL

Adding more dates from the additional sales order

Page 49: Datawarehousing with MySQL

Chapter 7: Initial Population

Page 50: Datawarehousing with MySQL

Initial Population: After identifying the source data, a script is written for the initial population.

Order_dim table:

Page 51: Datawarehousing with MySQL

Sales_order_fact table
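The initial population script is not in the transcript. The sketch below shows one way the order dimension and the fact table could be loaded from source.sales_order. The validity columns of order_dim, the fact column order_date_sk and the assumption that each customer has a single row at this stage are not confirmed by the source.

```sql
USE dw;

-- Load one order_dim row per source order.
INSERT INTO order_dim (order_sk, order_number, effective_date, expiry_date)
SELECT NULL, s.order_number, s.order_date, '9999-12-31'
FROM source.sales_order s;

-- Load the facts, resolving every natural key to its surrogate key.
INSERT INTO sales_order_fact
  (order_sk, customer_sk, product_sk, order_date_sk, order_amount)
SELECT o.order_sk, c.customer_sk, p.product_sk, d.date_sk, s.order_amount
FROM source.sales_order s
JOIN order_dim    o ON s.order_number    = o.order_number
JOIN customer_dim c ON s.customer_number = c.customer_number
JOIN product_dim  p ON s.product_code    = p.product_code
                   AND s.order_date BETWEEN p.effective_date AND p.expiry_date
JOIN date_dim     d ON s.order_date      = d.`date`;
```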

Page 52: Datawarehousing with MySQL

Running the Initial Population Scheme

Truncating the sales_order table:

Page 53: Datawarehousing with MySQL

Running the Initial Population Scheme

Preparing the sales order table

Page 54: Datawarehousing with MySQL

Query to confirm that the sales orders have been loaded correctly

Page 55: Datawarehousing with MySQL

Chapter 8: Regular Population

Page 56: Datawarehousing with MySQL

Regular Population Script: In this script, customer_dim and product_dim have been reloaded with data. SCD2 is applied to customer addresses, product names, and product groups. SCD1 is applied to customer names.

Order_dim table:

Page 57: Datawarehousing with MySQL

Product_dim:

Page 58: Datawarehousing with MySQL

Product_stg:

Page 59: Datawarehousing with MySQL

Sales_order_fact:

Page 60: Datawarehousing with MySQL

Customer_dim:

Page 61: Datawarehousing with MySQL

Testing Data: The data is tested by running the select query on the sales_order table and the sales_order_fact table

Page 62: Datawarehousing with MySQL

Chapter 10: Adding Columns

Page 63: Datawarehousing with MySQL

Adding New Columns to the customer dimension:

When two new columns are added, NULL values are displayed in those columns for the existing rows.

Customer_dim table:
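The transcript does not name the two new columns. In the sketch below, shipping_address and shipping_city are hypothetical names chosen only to illustrate the effect.

```sql
USE dw;

-- Add two columns (hypothetical names); existing rows get NULL in both.
ALTER TABLE customer_dim
  ADD shipping_address CHAR(50),
  ADD shipping_city    CHAR(30);

-- Existing customers now show NULL in the new columns.
SELECT customer_sk, customer_name, shipping_address, shipping_city
FROM customer_dim;
```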

Page 64: Datawarehousing with MySQL

Customer_stg table:

Page 65: Datawarehousing with MySQL

• Adding the order_quantity column in the sales_order
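A sketch of this change, assuming it refers to the sales_order table in the source database and that the column is an integer. If the fact table in dw was meant instead, the same statement form would apply to it.

```sql
USE source;

-- Add the new measure column next to the existing amount.
ALTER TABLE sales_order
  ADD order_quantity INT AFTER order_amount;
```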

Page 66: Datawarehousing with MySQL