ETL and OLAP Cube Reporting Using the NetFlix OLTP Database By: Rona Charlene Lao
2. Introduction
This project builds a Data Warehouse database from the NetFlix
database used in the first week's assignment.
Objectives:
To provide an end-to-end solution for uploading transactional data
into the Data Warehouse.
To provide dynamic reports for NetFlix showing various
representations of their aggregated data based on Rental, Shipment,
Payment and DVD Inventory.
To demonstrate how OLAP is used to provide dynamic multidimensional
reports.
3. Scope
To create mock-up data to be uploaded into the Data Warehouse.
To build a complete end-to-end ETL solution.
To use SQL*Loader, stored procedures and triggers to implement
business transformation rules from the Staging Area to the Target Area.
To create canned reports and demonstrate how Data Warehouses can
provide dynamic multidimensional reports.
4. Out of Scope
To build the OLTP database from scratch.
To code all business and functional rules related to NetFlix data
storage and operational requirements.
5. Tools and Environment
6. Process Flow
7. Process Flow - Extract
SQL Queries
SQL queries were run against the NetFlix OLTP Database to extract
the data for the dimension tables.
The extracts were saved as CSV files.
SQL*Loader
This tool was used to upload the CSV files into the Staging Area of
the DW database.
Stored Procedures
Used to extract data for the Member and DVD dimension tables and for
the fact tables.
The fact table stored procedures take two parameters, startdt and
enddt.
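The date-window extract described above can be sketched as follows. This is only an illustration, not the project's Oracle stored procedure: it uses Python with an in-memory SQLite database as a stand-in, and the `rental` table, its columns, the sample dates, and the inclusive treatment of startdt and enddt are all assumptions made for the example.

```python
import csv
import io
import sqlite3


def extract_rentals(conn, startdt, enddt):
    """Return rental rows dated between startdt and enddt (inclusive, assumed)."""
    cur = conn.execute(
        "SELECT rental_id, member_id, dvd_id, rental_date "
        "FROM rental WHERE rental_date BETWEEN ? AND ? "
        "ORDER BY rental_id",
        (startdt, enddt),
    )
    return cur.fetchall()


def rows_to_csv(rows):
    """Serialize extracted rows as CSV text, the format handed to the loader."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()


# Hypothetical OLTP table with mock-up rows (ISO dates compare correctly as text).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE rental ("
    "rental_id INTEGER PRIMARY KEY, member_id INTEGER, "
    "dvd_id INTEGER, rental_date TEXT)"
)
conn.executemany(
    "INSERT INTO rental VALUES (?, ?, ?, ?)",
    [
        (1, 10, 100, "2009-01-05"),
        (2, 11, 101, "2009-02-10"),
        (3, 10, 102, "2009-03-15"),
    ],
)

# Only rows inside the requested window are extracted.
window = extract_rentals(conn, "2009-02-01", "2009-03-31")
csv_text = rows_to_csv(window)
print(csv_text)
```

Passing the window as parameters lets the same extract serve both the initial load and later incremental loads, simply by changing the two dates.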
8. Process Flow - Extract
Control File
SQL*Loader
9. Process Flow - Transform
After the stored procedure for the DVD extract executes, the V_DVD
materialized view is refreshed (FORCE).
T_STAR_DIM is also updated automatically through a trigger once
the STG_MOVIEPERSONROLE_DIM table is populated.
The T_STAR_DIM table is a denormalized version of the
MOVIEPERSONROLE table.
T_MEMBER_DIM is also a denormalized version of a source table.
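A trigger-driven denormalization of this kind might look like the sketch below. It uses Python with in-memory SQLite rather than Oracle, and everything beyond the two table names taken from the slide is hypothetical: the `person` and `role` lookup tables, all column names, the trigger name, and the choice of a row-level AFTER INSERT trigger.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical lookup tables that the staging rows point to.
CREATE TABLE person (person_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE role (role_id INTEGER PRIMARY KEY, role_name TEXT);

-- Staging table holds only keys (normalized form).
CREATE TABLE stg_moviepersonrole_dim (
    movie_id INTEGER, person_id INTEGER, role_id INTEGER
);

-- Target dimension stores the looked-up names directly (denormalized form).
CREATE TABLE t_star_dim (
    movie_id INTEGER, person_name TEXT, role_name TEXT
);

-- Each row inserted into staging is resolved and copied to the target.
CREATE TRIGGER trg_stg_moviepersonrole_ai
AFTER INSERT ON stg_moviepersonrole_dim
BEGIN
    INSERT INTO t_star_dim (movie_id, person_name, role_name)
    SELECT NEW.movie_id, p.name, r.role_name
    FROM person p
    JOIN role r ON r.role_id = NEW.role_id
    WHERE p.person_id = NEW.person_id;
END;
""")

conn.executemany("INSERT INTO person VALUES (?, ?)",
                 [(1, "Person A"), (2, "Person B")])
conn.executemany("INSERT INTO role VALUES (?, ?)",
                 [(1, "Actor"), (2, "Director")])

# Populating the staging table fires the trigger once per row.
conn.executemany("INSERT INTO stg_moviepersonrole_dim VALUES (?, ?, ?)",
                 [(500, 1, 1), (500, 2, 2)])

star_rows = conn.execute(
    "SELECT movie_id, person_name, role_name FROM t_star_dim "
    "ORDER BY person_name"
).fetchall()
print(star_rows)
```

Storing the resolved names in the dimension trades some redundancy for simpler, join-free queries at reporting time.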
10. Process Flow - Load
The Stored Procedure, POP_TARGET_SP, moves the data from the
Staging Area (STG_) to its corresponding table in the Target Area
(T_) within the DW Database.
Takes only the records that are not already in the Target Area.
Ensures that each run processes only a subset of the data while
guaranteeing the preservation of historical data in the Target
Fact Tables (T_*_F).
Uses NOT IN statements to ensure that there is no duplication.
The load statements are listed in sequence to abide by the integrity
constraints set up in the Target Area.
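The load pattern above (insert only what is missing, dimensions before facts) can be sketched as follows. This is a minimal illustration in Python with in-memory SQLite, not the actual POP_TARGET_SP procedure: the `pop_target` function, the `stg_member_dim`, `stg_rental_f` and `t_rental_f` tables, and all column names are assumptions. Only T_MEMBER_DIM and the STG_ / T_ / T_*_F naming pattern come from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforce the integrity constraint
conn.executescript("""
CREATE TABLE stg_member_dim (member_id INTEGER, name TEXT);
CREATE TABLE stg_rental_f (rental_id INTEGER, member_id INTEGER,
                           rental_date TEXT);
CREATE TABLE t_member_dim (member_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE t_rental_f (
    rental_id INTEGER PRIMARY KEY,
    member_id INTEGER NOT NULL REFERENCES t_member_dim (member_id),
    rental_date TEXT
);
""")


def pop_target(conn):
    """Copy staging rows that are not yet in the target; dimension first."""
    with conn:  # commit both inserts together
        conn.execute(
            "INSERT INTO t_member_dim (member_id, name) "
            "SELECT member_id, name FROM stg_member_dim "
            "WHERE member_id NOT IN (SELECT member_id FROM t_member_dim)"
        )
        # The fact rows reference the dimension, so they are loaded second.
        conn.execute(
            "INSERT INTO t_rental_f (rental_id, member_id, rental_date) "
            "SELECT rental_id, member_id, rental_date FROM stg_rental_f "
            "WHERE rental_id NOT IN (SELECT rental_id FROM t_rental_f)"
        )


def counts(conn):
    """Return (dimension row count, fact row count) in the target area."""
    members = conn.execute("SELECT COUNT(*) FROM t_member_dim").fetchone()[0]
    rentals = conn.execute("SELECT COUNT(*) FROM t_rental_f").fetchone()[0]
    return members, rentals


conn.executemany("INSERT INTO stg_member_dim VALUES (?, ?)",
                 [(10, "Member A"), (11, "Member B")])
conn.executemany("INSERT INTO stg_rental_f VALUES (?, ?, ?)",
                 [(1, 10, "2009-01-05"), (2, 11, "2009-02-10")])

pop_target(conn)
after_first = counts(conn)

pop_target(conn)  # re-running adds nothing: every row is already present
after_second = counts(conn)

# Incremental load: new staging rows arrive, earlier target rows are kept.
conn.execute("INSERT INTO stg_member_dim VALUES (12, 'Member C')")
conn.execute("INSERT INTO stg_rental_f VALUES (3, 12, '2009-03-15')")
pop_target(conn)
after_incremental = counts(conn)
print(after_first, after_second, after_incremental)
```

Reversing the two inserts in this sketch would violate the foreign key on `t_rental_f`, which is why the statement order matters.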
11. Database Diagram - NetFlix
12. Database Diagram - DW
13. OLAP Cubes and Reporting
3 Cubes
Rental Cube
DVD Cube
Payment Cube
Reports
Dashboard
Microsoft Excel Pivot Tables using Offline Cubes
14. Rental-DVD Cube
This cube is a virtual cube, a combination of the Rental cube and
the DVD cube.
Rental Cube
DVD Cube
15. Rental-DVD Cube
Dimensions and Measures
16. Rental-DVD Dashboard
17. Payment Cube
Starflake schema
Outer join on T_MEMBER_DIM
Calculated Measure
Example of a Data Warehouse constraint
18. Payment Cube
Dimensions and Measures
19. Payment Cube Dashboard and Report
20. Incremental Load
Created mock-up data
Performed CSV extracts
Ran SQL*Loader
Ran Stored Procedures for the population of the Staging Area
Ran Stored Procedure for the population of the Target Area
Refreshed Online Cubes
Recreated Offline Cubes
21. Demo
Please see the demo.avi file in the ronalao_term.zip file
22. Sources/References
CS779 NetFlix_Oracle_Inserts.sql
CS779 Netflix_Oracle_Create_Indexes.sql
CS779 NetFlix_Oracle_Create_Tables.sql
OLAP Cube 3.0 : http://www.adersoft.com
http://msdn.microsoft.com/en-us/library/aa216377(SQL80).aspx
http://e-articles.info/e/a/title/Dashboard-Report/
http://camstudio.org
23. Thank you
Good luck in the final exams!