ETL and OLAP Cube Reporting Using the NetFlix OLTP Database By: Rona Charlene Lao
1. ETL and OLAP Cube Reporting Using the NetFlix OLTP Database
By: Rona Charlene Lao

2. Introduction
This project builds a Data Warehouse database from the NetFlix database used in the first week's assignment.
Objectives:
To provide an end-to-end solution to upload transactional data into the Data Warehouse.
To provide dynamic reports for NetFlix showing various representations of their aggregated data based on Rental, Shipment, Payment and DVD Inventory.
To demonstrate how OLAP is used to provide dynamic multidimensional reports.
3. Scope
To create mock-up data to be uploaded into the Data Warehouse.
To build a complete end-to-end ETL solution.
To use SQL*Loader, stored procedures and triggers to implement business transformation rules from the Staging Area to the Target Area.
To create canned reports and demonstrate how Data Warehouses can provide dynamic multidimensional reports.
4. Out of Scope
To build the OLTP database from scratch
To code all business and functional rules related to NetFlix data storage and operational requirements
5. Tools and Environment
6. Process Flow
7. Process Flow - Extract
SQL Queries
SQL queries were run against the NetFlix OLTP Database to extract the data for the dimension tables.
The extracts were saved as CSV Files.
SQL*Loader
This tool was used to upload the CSV files into the Staging Area of the DW database.
Stored Procedures
Used to extract data for the Member and DVD dimension tables and for the fact tables.
The fact table stored procedures take two parameters, startdt and enddt.
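The date-bounded fact extract can be sketched as follows. This is only an illustration, not the project's stored procedure: Python's built-in sqlite3 module stands in for the Oracle database, and the rental and stg_rental_f tables, their columns, and the assumption that startdt and enddt bound the rental date inclusively are all hypothetical.

```python
import sqlite3

def extract_rental_facts(conn, startdt, enddt):
    """Copy rentals dated within [startdt, enddt] into the staging table.

    Hypothetical stand-in for a fact-table extract procedure; the real
    procedures were Oracle stored procedures with startdt/enddt parameters.
    """
    cur = conn.execute(
        "INSERT INTO stg_rental_f (rental_id, member_id, dvd_id, rental_date) "
        "SELECT rental_id, member_id, dvd_id, rental_date FROM rental "
        "WHERE rental_date BETWEEN ? AND ?",
        (startdt, enddt),
    )
    return cur.rowcount  # number of rows staged by this call

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE rental (rental_id INTEGER PRIMARY KEY, member_id INTEGER,
                     dvd_id INTEGER, rental_date TEXT);
CREATE TABLE stg_rental_f (rental_id INTEGER, member_id INTEGER,
                           dvd_id INTEGER, rental_date TEXT);
INSERT INTO rental VALUES (1, 10, 100, '2010-01-05'),
                          (2, 11, 101, '2010-02-10'),
                          (3, 10, 102, '2010-03-15');
""")

# Only the February and March rentals fall inside the requested window.
loaded = extract_rental_facts(conn, "2010-02-01", "2010-03-31")
staged_ids = [row[0] for row in conn.execute(
    "SELECT rental_id FROM stg_rental_f ORDER BY rental_id")]
```

Passing the window as parameters keeps each run limited to one slice of the transactional data, which is what makes repeated incremental extracts practical.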
8. Process Flow - Extract
Control File
SQL*Loader
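The CSV upload step can be approximated with a short sketch. It does not reproduce SQL*Loader or its control file syntax; it only illustrates the idea of mapping CSV columns onto a staging table, using Python's csv and sqlite3 modules. The stg_genre_dim table, its columns and the sample rows are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical CSV extract; the real files came from queries on the OLTP database.
CSV_EXTRACT = "genre_id,genre_name\n1,Drama\n2,Comedy\n3,Action\n"

def load_csv_into_staging(conn, csv_text, table, columns):
    """Insert every CSV record into the staging table, column by column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [tuple(record[col] for col in columns) for record in reader]
    placeholders = ", ".join("?" for _ in columns)
    sql = f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"
    conn.executemany(sql, rows)
    return len(rows)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stg_genre_dim (genre_id INTEGER, genre_name TEXT)")

row_count = load_csv_into_staging(
    conn, CSV_EXTRACT, "stg_genre_dim", ["genre_id", "genre_name"])
staged = conn.execute(
    "SELECT genre_id, genre_name FROM stg_genre_dim ORDER BY genre_id").fetchall()
```

The column list plays roughly the role that the field list plays in a loader control file: it states which CSV field feeds which staging column.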
9. Process Flow - Transform
After the Stored Procedure for the DVD extract executes, the V_DVD materialized view is refreshed (force refresh).
T_STAR_DIM is also updated automatically through a trigger once the STG_MOVIEPERSONROLE_DIM table is populated.
The T_STAR_DIM table is a denormalized version of the MOVIEPERSONROLE table.
T_MEMBER_DIM is also a denormalized version of a source table.
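A trigger-driven denormalization of this kind might look like the sketch below. It is an assumption-laden illustration: SQLite (through Python's sqlite3 module) replaces Oracle, the trigger fires per inserted row, and the person and role lookup tables, all column names and the trigger name trg_fill_star_dim are hypothetical. Only the table names STG_MOVIEPERSONROLE_DIM and T_STAR_DIM come from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical lookup tables holding the names behind the ids.
CREATE TABLE person (person_id INTEGER PRIMARY KEY, person_name TEXT);
CREATE TABLE role (role_id INTEGER PRIMARY KEY, role_name TEXT);

-- Staging table (ids only) and denormalized target table (names resolved).
CREATE TABLE stg_moviepersonrole_dim (movie_id INTEGER, person_id INTEGER,
                                      role_id INTEGER);
CREATE TABLE t_star_dim (movie_id INTEGER, person_name TEXT, role_name TEXT);

-- Each row arriving in staging is written to the target with names resolved.
CREATE TRIGGER trg_fill_star_dim
AFTER INSERT ON stg_moviepersonrole_dim
BEGIN
    INSERT INTO t_star_dim (movie_id, person_name, role_name)
    SELECT NEW.movie_id, p.person_name, r.role_name
    FROM person p JOIN role r ON r.role_id = NEW.role_id
    WHERE p.person_id = NEW.person_id;
END;

INSERT INTO person VALUES (1, 'Jane Doe'), (2, 'John Roe');
INSERT INTO role VALUES (1, 'Actor'), (2, 'Director');
""")

# Populating the staging table is enough; the trigger fills T_STAR_DIM.
conn.executemany("INSERT INTO stg_moviepersonrole_dim VALUES (?, ?, ?)",
                 [(500, 1, 1), (500, 2, 2)])
star_rows = conn.execute(
    "SELECT movie_id, person_name, role_name FROM t_star_dim "
    "ORDER BY person_name").fetchall()
```

Because the names are copied into the dimension row, reports can read T_STAR_DIM without joining back to the lookup tables.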
10. Process Flow - Load
The Stored Procedure, POP_TARGET_SP, moves the data from the Staging Area (STG_) to its corresponding table in the Target Area (T_) within the DW Database.
Only takes the records that are not already in the Target Area.
Ensures that each run processes only a subset of the data while preserving the historical data in the Target Fact Tables (T_*_F).
Uses NOT IN statements to ensure that there is no duplication.
The table loads are listed in sequence to abide by the integrity constraints set up in the Target Area.
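The load logic described above can be sketched as follows. This is not the actual POP_TARGET_SP; it is a minimal illustration in Python with SQLite, and the member and rental tables, their columns and the LOAD_ORDER list are hypothetical. It shows the two ideas from the slide: a NOT IN filter that skips rows already in the Target Area, and a dimension-before-fact load order that respects a foreign key.

```python
import sqlite3

# Hypothetical load order: dimensions first, then the facts that reference them.
LOAD_ORDER = [
    ("stg_member_dim", "t_member_dim", "member_id"),
    ("stg_rental_f", "t_rental_f", "rental_id"),
]

def pop_target(conn):
    """Move staged rows into the target tables, skipping keys already present."""
    inserted = {}
    for stg, tgt, key in LOAD_ORDER:
        cur = conn.execute(
            f"INSERT INTO {tgt} SELECT * FROM {stg} "
            f"WHERE {key} NOT IN (SELECT {key} FROM {tgt})"
        )
        inserted[tgt] = cur.rowcount
    return inserted

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforce the fact-to-dimension reference
conn.executescript("""
CREATE TABLE stg_member_dim (member_id INTEGER, member_name TEXT);
CREATE TABLE stg_rental_f (rental_id INTEGER, member_id INTEGER);
CREATE TABLE t_member_dim (member_id INTEGER PRIMARY KEY, member_name TEXT);
CREATE TABLE t_rental_f (rental_id INTEGER PRIMARY KEY,
                         member_id INTEGER REFERENCES t_member_dim (member_id));
-- Rows already in the Target Area from an earlier load.
INSERT INTO t_member_dim VALUES (1, 'Ann');
INSERT INTO t_rental_f VALUES (100, 1);
-- Staging holds both old and new rows.
INSERT INTO stg_member_dim VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO stg_rental_f VALUES (100, 1), (101, 2);
""")

first_run = pop_target(conn)   # only member 2 and rental 101 are new
second_run = pop_target(conn)  # nothing new, so nothing is inserted
```

Loading t_rental_f before t_member_dim would fail here, because rental 101 refers to a member that does not yet exist in the target. One caveat with NOT IN: if the subquery returns a NULL key, the filter matches no rows at all, so the key columns should be non-null.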
11. Database Diagram - NetFlix
12. Database Diagram - DW
13. OLAP Cubes and Reporting
3 Cubes
Rental Cube
DVD Cube
Payment Cube
Reports
Dashboard
Microsoft Excel Pivot Tables using Offline Cubes
14. Rental-DVD Cube
This cube is a virtual cube, a combination of the Rental cube and the DVD cube.
Rental Cube
DVD Cube
15. Rental-DVD Cube
Dimensions and Measures
16. Rental-DVD Dashboard
17. Payment Cube
Starflake schema
Outer join on T_MEMBER_DIM
Calculated Measure
Example of a Data Warehouse constraint
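The slide does not show the join or the measure, so the following is only a guess at the general shape, written in Python with SQLite. It assumes the outer join keeps payment facts that have no matching row in T_MEMBER_DIM, and it uses a hypothetical calculated measure (average payment amount). The t_payment_f table, all columns and the sample data are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t_member_dim (member_id INTEGER PRIMARY KEY, state TEXT);
CREATE TABLE t_payment_f (payment_id INTEGER PRIMARY KEY, member_id INTEGER,
                          amount REAL);
INSERT INTO t_member_dim VALUES (1, 'MA'), (2, 'NY');
-- Payment 4 refers to a member that is missing from the dimension.
INSERT INTO t_payment_f VALUES (1, 1, 10.0), (2, 1, 20.0),
                               (3, 2, 15.0), (4, 99, 5.0);
""")

# The LEFT OUTER JOIN keeps payment 4; an inner join would silently drop it.
# avg_payment is derived from two base measures rather than stored.
report = conn.execute("""
SELECT COALESCE(m.state, 'Unknown') AS member_state,
       SUM(p.amount)                AS total_amount,
       COUNT(*)                     AS payment_count,
       SUM(p.amount) / COUNT(*)     AS avg_payment
FROM t_payment_f p
LEFT OUTER JOIN t_member_dim m ON m.member_id = p.member_id
GROUP BY COALESCE(m.state, 'Unknown')
ORDER BY member_state
""").fetchall()
```

With an outer join, the totals in the report still reconcile with the fact table even when a dimension row is missing.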
18. Payment Cube
Dimensions and Measures
19. Payment Cube Dashboard and Report
20. Incremental Load
Created mock-up data
Performed CSV extracts
Ran SQL*Loader
Ran Stored Procedures for the population of the Staging Area
Ran Stored Procedure for the population of the Target Area
Refreshed Online Cubes
Recreated Offline Cubes
21. Demo
Please see the demo.avi file in the ronalao_term.zip file
22. Sources/References
CS779 NetFlix_Oracle_Inserts.sql
CS779 Netflix_Oracle_Create_Indexes.sql
CS779 NetFlix_Oracle_Create_Tables.sql
OLAP Cube 3.0 : http://www.adersoft.com
http://msdn.microsoft.com/en-us/library/aa216377(SQL80).aspx
http://e-articles.info/e/a/title/Dashboard-Report/
http://camstudio.org
23. Thank you
Good luck in the final exams!