Introduction To Data Warehouse Using Cognos 8 BI
Created By : Gourav Atalkar
Reviewed By: Amit Sharma
Contact Point : [email protected]
Course Roadmap
• Data Warehousing - An Overview
• Data Warehouse Architecture
• Data Modeling for Data Warehousing
• Overview (OLAP)
• Multidimensional Analysis
  – Multidimensional Analysis Introduction
  – Operations in Multidimensional Analysis
  – Multidimensional Data Model
  – Multi-Dimensional vs. Relational
Objectives
• At the end of this lesson, you will know:
  – What is the need for Data Warehousing (scenarios)
  – What is Data Warehousing
  – The evolution of Data Warehousing
  – Need for Data Warehousing
  – OLTP vs. Warehouse Applications
  – Data Marts vs. Data Warehouses
  – Data Warehouse Schemas
  – Reporting fundamentals
Business Scenario –I
You are a database administrator for a company called TBC: The FMCG Company. The company manufactures daily-needs products for sale to other businesses. The financial department wants to track, analyze, and forecast sales revenue across geographic regions on a periodic basis for all products sold.
• What is the most effective distribution channel?
• What product promotions have the biggest impact on revenue?
• Who are my customers and what products are they buying?
• Which are our lowest/highest margin customers?
• What impact will new products/services have on revenue and margins?
• Which customers are most likely to go to the competition?
[Figure: Business Scenario I. Branch databases in Delhi, Mumbai, Kolkata and Bhopal provide data input to an OLAP server. The Sales Manager asks for sales per product type, and for sales per product type per branch for the first quarter.]
Solution: I
• Extract sales information from each database.
• Store the information in a common repository at a single site.
[Figure: data input from Delhi, Mumbai, Kolkata and Bhopal flows into the data warehouse. The Sales Manager receives data output via query and analysis tools, reports, and a Business Intelligence tool (e.g. Cognos, MSBI, Hyperion).]
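Solution I can be sketched in a few lines of Python. This is only an illustration, not the Cognos workflow: the in-memory SQLite databases stand in for the four branch systems, and the table names, column names and sample figures (`sales`, `warehouse_sales`, `product_type`, `amount`) are assumptions made for the example.

```python
import sqlite3

# Hypothetical branch data: (product_type, amount) rows per branch.
BRANCH_ROWS = {
    "Delhi": [("Soap", 120.0), ("Tea", 80.0)],
    "Mumbai": [("Soap", 200.0)],
    "Kolkata": [("Tea", 50.0)],
    "Bhopal": [("Soap", 30.0), ("Tea", 20.0)],
}

def make_branch_db(rows):
    """Create an in-memory database standing in for one branch system."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE sales (product_type TEXT, amount REAL)")
    db.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    return db

def consolidate(branch_dbs):
    """Extract sales from every branch and store them in one repository."""
    warehouse = sqlite3.connect(":memory:")
    warehouse.execute(
        "CREATE TABLE warehouse_sales (branch TEXT, product_type TEXT, amount REAL)"
    )
    for branch, db in branch_dbs.items():
        rows = db.execute("SELECT product_type, amount FROM sales").fetchall()
        warehouse.executemany(
            "INSERT INTO warehouse_sales VALUES (?, ?, ?)",
            [(branch, ptype, amount) for ptype, amount in rows],
        )
    return warehouse

branch_dbs = {name: make_branch_db(rows) for name, rows in BRANCH_ROWS.items()}
warehouse = consolidate(branch_dbs)

# Sales per product type across all branches, answered from a single site.
sales_per_type = dict(
    warehouse.execute(
        "SELECT product_type, SUM(amount) FROM warehouse_sales GROUP BY product_type"
    ).fetchall()
)
```

Keeping the branch name as a column in the common repository is what later allows the "per branch" breakdown the Sales Manager asks for.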
Business Scenario –II
One Stop Shopping Super Market has a huge operational database. Whenever executives want a report, the OLTP system becomes slow and the data entry operators have to wait for some time.
[Figure: data entry operators wait on the operational database while management runs a report against it.]
Solution: II
Extract the data needed for analysis from the operational database and store it in a warehouse. Refresh the warehouse at regular intervals so that it contains up-to-date information for analysis. The warehouse will contain data with a historical perspective.
[Figure: data entry operators run transactions against the operational database. Data is extracted into the data warehouse, and management gets its reports from the warehouse.]
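A periodic refresh of this kind can be sketched as an incremental load. The example below is a simplified illustration under stated assumptions: rows are plain dictionaries, `sold_on` is a hypothetical date column, and the "high-water mark" (the date of the newest row already loaded) is one common way to decide what to extract. The deck itself does not prescribe a refresh mechanism.

```python
from datetime import date

def refresh(warehouse, operational_rows, last_loaded):
    """Append rows newer than last_loaded; return the new high-water mark."""
    new_rows = [row for row in operational_rows if row["sold_on"] > last_loaded]
    warehouse.extend(new_rows)  # append only, so history is preserved
    return max((row["sold_on"] for row in new_rows), default=last_loaded)

# Hypothetical operational data.
operational = [
    {"item": "Rice", "amount": 40, "sold_on": date(2024, 1, 1)},
    {"item": "Milk", "amount": 25, "sold_on": date(2024, 1, 2)},
]
warehouse = []
mark = refresh(warehouse, operational, date.min)

# A later transaction arrives in the operational database.
operational.append({"item": "Salt", "amount": 10, "sold_on": date(2024, 1, 3)})
mark = refresh(warehouse, operational, mark)
```

Because each refresh reads only the rows added since the previous one, the reporting load stays off the operational database between refreshes.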
Business Scenario –III
Cakes & Cookies is a small, new company. The President of the company wants the company to grow. He needs information so that he can make correct decisions.
Solution: III
• Improve the quality of data before loading it into the warehouse.
• Perform data cleaning and transformation before loading the data.
• Use query analysis tools to support ad-hoc queries.
[Figure: improved data is loaded into the data warehouse. The President receives data output via query and analysis tools and a Business Intelligence tool (e.g. Cognos, MSBI, Hyperion).]
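The cleaning step in Solution III might look like the sketch below. The field names (`city`, `amount`) and the specific rules (trim and capitalize the city, strip thousands separators, reject rows with a missing city or a non-numeric or negative amount) are assumptions chosen for illustration; real cleaning rules depend on the source data.

```python
def clean_record(raw):
    """Return a cleaned (city, amount) tuple, or None if the row is unusable."""
    city = raw.get("city", "").strip().title()
    try:
        amount = float(str(raw.get("amount", "")).replace(",", ""))
    except ValueError:
        return None  # amount is missing or not numeric
    if not city or amount < 0:
        return None
    return (city, amount)

# Hypothetical raw rows as they might arrive from a source system.
raw_rows = [
    {"city": " mumbai ", "amount": "1,200.50"},
    {"city": "DELHI", "amount": "300"},
    {"city": "", "amount": "75"},        # missing city: rejected
    {"city": "Pune", "amount": "n/a"},   # non-numeric amount: rejected
]

# Only rows that pass cleaning are passed on to the warehouse load.
cleaned = [rec for rec in map(clean_record, raw_rows) if rec is not None]
```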
What is a Data Warehouse?
A single, complete and consistent store of data obtained from a variety of different sources and made available to end users in a way they can understand and use in a business context.
A process of transforming data into information and making it available to users in a timely enough manner to make a difference.
Characteristics of Data Warehouse
• A data warehouse is a
  – Subject oriented
  – Integrated
  – Time varying
  – Non-volatile
  collection of data that is used primarily in organizational decision making.
Subject-oriented Characteristics of a Data Warehouse
[Figure: operational systems compared with the data warehouse. Labels shown: Quotes, Leads, Orders, Inventory, Customers, Products, Regions, Time.]
Integrated Characteristics of a Data Warehouse
• A data warehouse is constructed by integrating multiple heterogeneous sources.
• Data preprocessing is applied to ensure consistency.
[Figure: RDBMS, flat file and legacy system sources pass through data processing and data transformation into the data warehouse.]
Time Variant Characteristics of a Data Warehouse
Operational Data: current value data
• time horizon: 60-90 days
• key may not have an element of time
Data Warehouse: snapshot data
• time horizon: 5-10 years
• key has an element of time
• data warehouse stores historical data
Non Volatile Characteristics of a Data Warehouse
[Figure: operational data is subject to insert, change, replace and delete operations. The data warehouse is loaded and then only read (select).]
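Data Warehouse Architecture
[Figure: data warehouse architecture. Sources (relational databases, ERP systems, legacy data, purchased data) pass through extraction, cleansing and an optimized loader into the data warehouse engine, which works with a metadata repository and serves analyze and query tools.]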
OLTP vs Data Warehouse
• OLTP
  – Application oriented
  – Used to run business
  – Detailed data
  – Current, up to date
  – Isolated data
  – Repetitive access
  – Clerical user
• Warehouse (DSS)
  – Subject oriented
  – Used to analyze business
  – Summarized and refined
  – Snapshot data
  – Integrated data
  – Ad-hoc access
  – Knowledge user (manager)
Online Analytical Processing [OLAP]
OLAP is a category of software tools that provides analysis of data stored in a database. With OLAP, analysts, managers, and executives can gain insight into data through fast, consistent, interactive access to a wide variety of possible views.
• What is an OLAP cube? As you saw in the definition of OLAP, the key requirement is multidimensionality. OLAP achieves this multidimensional functionality by using a structure called a cube. The OLAP cube provides a multidimensional way to look at the data. The cube is comparable to a table in a relational database.
Features of Cube Representation
Slicing: A slice is a subset of a multidimensional array corresponding to a single value for one or more members of the dimensions not in the subset.
Dicing: A related operation to slicing is dicing. In the case of dicing, you define a sub-cube of the original space. Narrowed far enough, the data you see is that of a single cell from the cube, which is the smallest available slice.
Rotating: Rotating changes the dimensional orientation of the report from the cube data. For example, rotating may consist of swapping the rows and columns, or moving one of the row dimensions into the column dimension.
Dimension: A dimension represents descriptive categories of data such as time or location. In other words, dimensions are broad groupings of descriptive data about a major aspect of a business, such as dates, markets, or products.
Measure: The measures are the actual data values that occupy the cells as defined by the dimensions selected. Measures include facts or variables, typically stored as numerical fields, which provide the focal point of investigation using OLAP. For instance, suppose you are a manufacturer of cellular phones. The question you want answered is how many xyz model cell phones (product dimension) a particular plant (location dimension) produced during the month of January 2003 (time dimension).
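The cube terms above can be made concrete with a small sketch. It models a cube as a Python dictionary keyed by (product, plant, month), following the cellular phone example. The figures, plant names and the helper names `slice_`, `dice` and `rotate` are hypothetical. An OLAP server stores and indexes cubes very differently; the sketch only shows what each operation returns.

```python
# Cells of a hypothetical cube: (product, plant, month) -> units produced.
cube = {
    ("xyz", "Plant A", "2003-01"): 500,
    ("xyz", "Plant B", "2003-01"): 300,
    ("xyz", "Plant A", "2003-02"): 450,
    ("abc", "Plant A", "2003-01"): 200,
}
DIMS = ("product", "plant", "month")

def slice_(cells, dim, member):
    """Fix one dimension to a single member and drop it from the keys."""
    i = DIMS.index(dim)
    return {key[:i] + key[i + 1:]: v for key, v in cells.items() if key[i] == member}

def dice(cells, **allowed):
    """Keep cells whose members fall in the allowed set for each named dimension."""
    return {
        key: value
        for key, value in cells.items()
        if all(key[DIMS.index(dim)] in members for dim, members in allowed.items())
    }

def rotate(table):
    """Swap the row and column dimensions of a two-dimensional slice."""
    return {(col, row): v for (row, col), v in table.items()}

january = slice_(cube, "month", "2003-01")                  # keys: (product, plant)
sub_cube = dice(cube, product={"xyz"}, plant={"Plant A"})   # a smaller cube
by_plant = rotate(january)                                  # keys: (plant, product)
one_cell = cube[("xyz", "Plant A", "2003-01")]              # a single measure value
```

Here `one_cell` answers the question posed in the Measure definition: units of model xyz produced by one plant in January 2003.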
Data Warehouse Schema
• Star Schema
• Fact Constellation Schema
• Snowflake Schema
Fact:
Definition: Facts are numeric measurements (values) that represent a specific business activity.
Facts are stored in a fact table, i.e. the center of the star schema. Facts commonly used in business data analysis are units, cost, prices and revenues.
Example: sales figures are numeric measurements that represent product and/or service sales.
The fact table holds the measures, or facts. The measures are numeric and additive across some or all of the dimensions. For example, sales are numeric and users can look at total sales for a product, or category, or subcategory, and by any time period. The sales figures are valid no matter how the data is sliced.
The centralized table in a star schema is called the fact table. It contains facts and is connected to dimensions.
A fact table typically has two types of columns:
• Columns that contain facts
• Foreign keys to dimension tables
The primary key of a fact table is usually a composite key that is made up of all of its foreign keys.
A fact table might contain either detail-level facts or facts that have been aggregated (fact tables that contain aggregated facts are often called summary tables instead). A fact table usually contains facts with the same level of aggregation.
Dimension
Definition: Qualifying characteristics that provide additional perspective to a given fact.
Example: sales might be compared by product from region to region and from one time period to the next. Here sales have product, location and time dimensions. Such dimensions are stored in a dimension table.
Dimension Table
Definition: The dimensions of the fact table are further described with dimension tables.
Fact table:
Sales (Market_Id, Product_Id, Time_Id, Sales_Amt)
Dimension tables:
Market (Market_Id, City, State, Region)
Product (Product_Id, Name, Category, Price)
Time (Time_Id, Week, Month, Quarter)
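The Sales fact table and its three dimension tables can be tried out with Python's built-in `sqlite3` module. The table and column names follow the listing above. The column types, the composite primary key on the fact table, and all sample rows are assumptions added for illustration.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE Market  (Market_Id INTEGER PRIMARY KEY, City TEXT, State TEXT, Region TEXT);
CREATE TABLE Product (Product_Id INTEGER PRIMARY KEY, Name TEXT, Category TEXT, Price REAL);
CREATE TABLE Time    (Time_Id INTEGER PRIMARY KEY, Week INTEGER, Month INTEGER, Quarter INTEGER);
-- Fact table: foreign keys to each dimension plus the numeric measure.
CREATE TABLE Sales (
    Market_Id  INTEGER REFERENCES Market(Market_Id),
    Product_Id INTEGER REFERENCES Product(Product_Id),
    Time_Id    INTEGER REFERENCES Time(Time_Id),
    Sales_Amt  REAL,
    PRIMARY KEY (Market_Id, Product_Id, Time_Id)
);
""")

# Hypothetical sample rows.
db.executemany("INSERT INTO Market VALUES (?, ?, ?, ?)", [
    (1, "Mumbai", "Maharashtra", "West"),
    (2, "Delhi", "Delhi", "North"),
])
db.executemany("INSERT INTO Product VALUES (?, ?, ?, ?)", [
    (1, "Soap", "Personal Care", 20.0),
    (2, "Tea", "Beverages", 50.0),
])
db.executemany("INSERT INTO Time VALUES (?, ?, ?, ?)", [(1, 1, 1, 1), (2, 14, 4, 2)])
db.executemany("INSERT INTO Sales VALUES (?, ?, ?, ?)", [
    (1, 1, 1, 100.0), (1, 2, 1, 250.0), (2, 1, 1, 80.0), (2, 2, 2, 60.0),
])

# Join the fact table to two dimensions and total the measure.
sales_by_region_quarter = db.execute("""
    SELECT m.Region, t.Quarter, SUM(s.Sales_Amt)
    FROM Sales s
    JOIN Market m ON m.Market_Id = s.Market_Id
    JOIN Time t ON t.Time_Id = s.Time_Id
    GROUP BY m.Region, t.Quarter
    ORDER BY m.Region, t.Quarter
""").fetchall()
```

Each dimension is reached with a single join from the fact table, which is the property the star layout is designed around.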
What is Star Schema?
• Definition: A star schema is a relational database schema for representing multidimensional data. It is the simplest form of data warehouse schema and contains one or more dimension tables and fact tables.
• It is called a star schema because the entity-relationship diagram between dimensions and fact tables resembles a star, where one fact table is connected to multiple dimensions.
• The center of the star schema consists of a large fact table, which points towards the dimension tables.
• The advantages of the star schema are easy slicing of data, better query performance and easy understanding of the data.
Steps in designing Star Schema
• Identify a business process for analysis (like sales).
• Identify measures or facts (sales dollar).
• Identify dimensions for facts (product dimension, location dimension, time dimension, organization dimension).
• List the columns that describe each dimension (region name, branch name).
• Determine the lowest level of summary in a fact table (sales dollar).
• In a star schema, every dimension will have a primary key.
• In a star schema, a dimension table will not have any parent table, whereas in a snowflake schema a dimension table will have one or more parent tables.
• Hierarchies for the dimensions are stored in the dimension table itself in a star schema, whereas hierarchies are broken into separate tables in a snowflake schema. These hierarchies help to drill down the data from the topmost level of the hierarchy to the lowermost.
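Drilling down a hierarchy that is stored inside a single star-schema dimension table can be illustrated as follows. The region > branch hierarchy, the `location_dim` and `facts` structures, and the sample figures are hypothetical. The sketch only shows that the same facts can be totalled at different levels of one denormalized dimension.

```python
from collections import defaultdict

# Denormalized location dimension: the region > branch hierarchy lives in one table.
location_dim = {
    1: {"region": "West", "branch": "Mumbai"},
    2: {"region": "West", "branch": "Pune"},
    3: {"region": "North", "branch": "Delhi"},
}
# Fact rows: (location_id, sales_dollars).
facts = [(1, 100), (2, 40), (3, 70), (1, 60)]

def totals_by(level):
    """Total sales dollars at one level of the location hierarchy."""
    totals = defaultdict(int)
    for location_id, dollars in facts:
        totals[location_dim[location_id][level]] += dollars
    return dict(totals)

by_region = totals_by("region")   # top of the hierarchy
by_branch = totals_by("branch")   # drilled down one level
```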
Star Schema Examples
• The fact table provides sales statistics broken down by product, period and store dimensions.
• Dimension tables contain descriptions about subjects of the business.
• There is a 1:N relationship between fact and dimension tables.
• Benefits: easy to understand, easy to define hierarchies, reduces the number of physical joins.
Snowflake Schema
• Represents the dimensional hierarchy directly by normalizing the dimension tables.
• Easy to maintain.
• Saves storage, but it is alleged to reduce the effectiveness of browsing.
• Has a single, large, central fact table and one or more tables for each dimension.
• Dimension tables are normalized, i.e. dimension table data is split into additional tables.
Snowflake Schema Example
Sales Fact (Store_id, Product_id, Time_id, measure)
Store Dim. (Store_id, Store Name, Store Add., Region id)
Region Dim. (Region_id, City, State, Country)
Product Dim. (Product_id, Product Desc, Product Name, Product Line, Product Type)
Time Dim. (Time_id, Year, Quarter, Month)
Drawbacks: time-consuming joins, slow report generation.
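The extra joins behind that drawback are visible in a small `sqlite3` sketch. It loosely follows the Store and Region dimensions of the example, but the lower-case table names, the reduced column lists and the sample rows are assumptions made to keep the illustration short.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE region_dim (region_id INTEGER PRIMARY KEY, city TEXT, state TEXT, country TEXT);
-- The store dimension is normalized: it points at its parent table, region_dim.
CREATE TABLE store_dim (
    store_id INTEGER PRIMARY KEY,
    store_name TEXT,
    region_id INTEGER REFERENCES region_dim(region_id)
);
CREATE TABLE sales_fact (store_id INTEGER REFERENCES store_dim(store_id), measure REAL);

INSERT INTO region_dim VALUES (1, 'Mumbai', 'Maharashtra', 'India'), (2, 'Pune', 'Maharashtra', 'India');
INSERT INTO store_dim VALUES (10, 'Store A', 1), (11, 'Store B', 2);
INSERT INTO sales_fact VALUES (10, 500.0), (11, 200.0), (10, 100.0);
""")

# Reaching the city takes two joins: fact -> store -> region.
sales_by_city = db.execute("""
    SELECT r.city, SUM(f.measure)
    FROM sales_fact f
    JOIN store_dim s ON s.store_id = f.store_id
    JOIN region_dim r ON r.region_id = s.region_id
    GROUP BY r.city
    ORDER BY r.city
""").fetchall()
```

In a star schema the city would sit directly in the store dimension, so the same report would need only one join.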
Fact Constellation
• Multiple fact tables share many dimension tables.
• For example, Booking and Checkout may share many dimension tables in the hotel industry.
• This schema is viewed as a collection of stars, hence it is called a galaxy schema or fact constellation.
• Sophisticated applications require such a schema.
Fact Constellation Example
Shipping Fact (Shipper Key, Store Key, Product Key, Period Key, Price)
Sales Fact (Store Key, Product Key, Period Key, measure)
Product Dim. (Product Key, Product Desc, Product Name, Product Line, Product Type)
Store Dim. (Store Key, Store Name, Store Add., City)
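A fact constellation can be sketched with two fact tables that reference one shared dimension. The reduced tables below, and the `units_sold` and `units_shipped` measures, are hypothetical simplifications of the Sales and Shipping facts, chosen so the shared-dimension idea fits in a few lines.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, product_name TEXT);
-- Two fact tables reference the same dimension table.
CREATE TABLE sales_fact (product_key INTEGER REFERENCES product_dim(product_key), units_sold INTEGER);
CREATE TABLE shipping_fact (product_key INTEGER REFERENCES product_dim(product_key), units_shipped INTEGER);

INSERT INTO product_dim VALUES (1, 'Soap'), (2, 'Tea');
INSERT INTO sales_fact VALUES (1, 30), (1, 20), (2, 15);
INSERT INTO shipping_fact VALUES (1, 60), (2, 10), (2, 10);
""")

# Aggregate each fact table separately, then line the totals up by product.
sold_vs_shipped = db.execute("""
    SELECT p.product_name,
           (SELECT SUM(units_sold) FROM sales_fact s WHERE s.product_key = p.product_key),
           (SELECT SUM(units_shipped) FROM shipping_fact h WHERE h.product_key = p.product_key)
    FROM product_dim p
    ORDER BY p.product_name
""").fetchall()
```

Each fact table is aggregated on its own before the results are combined. Joining the two fact tables to each other directly would pair every sales row with every shipping row for a product and inflate the totals.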
From the Data Warehouse to Data Marts
[Figure: three levels, from the organizationally structured data warehouse through departmentally structured to individually structured. Labels shown: Data, Information, More, Less, History, Normalized, Detailed.]
Reporting Fundamentals Case Study
• DSS Books & Music is a new company that sells books, music and video items.
• Their products are sold in different regions of the world.
• They have sales units at Mumbai, Pune, Ahmedabad, Delhi and Baroda.
• The President of the company wants sales information.
Sales Measures & Dimensions
• Measures – Units sold, Amount.
• Dimensions – Product, Time, Region.
Sales Data Warehouse Tables
• Store Dimensions Table
• Region Dimensions Table
• Product Dimensions Table
• Time Dimensions Table
• Sales Fact Table
Sales Data Warehouse Model
Sales Information
• Details of products with a minimum sales amount below 50,000 rupees.
• Details of the top N stores by sales amount.
• Sales by store type, to determine which stores are generating the most revenue and the highest sales volume.
• The contribution that each country makes to revenue.
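Three of these reports can be approximated in plain Python to show what each one computes. The fact rows are invented for illustration (including the London store and the amounts), and the low-sales report is interpreted here as "total sales amount per product below 50,000 rupees", which is an assumption about the intended meaning. In the course itself these reports would be built with the BI tool.

```python
from collections import defaultdict

# Hypothetical fact rows: (store, country, product, amount in rupees).
sales = [
    ("Mumbai", "India", "Books", 90000),
    ("Pune", "India", "Music", 30000),
    ("Delhi", "India", "Videos", 60000),
    ("Mumbai", "India", "Music", 15000),
    ("London", "UK", "Books", 45000),
]

def total_by(index):
    """Sum the amount column grouped by the column at the given index."""
    totals = defaultdict(int)
    for row in sales:
        totals[row[index]] += row[3]
    return dict(totals)

def top_n_stores(n):
    """Stores ranked by total sales amount, highest first."""
    return sorted(total_by(0).items(), key=lambda kv: kv[1], reverse=True)[:n]

# Products whose total sales amount is below 50,000 rupees.
low_products = sorted(p for p, amt in total_by(2).items() if amt < 50000)

# Share of total revenue contributed by each country, as a percentage.
grand_total = sum(row[3] for row in sales)
contribution = {c: 100 * amt / grand_total for c, amt in total_by(1).items()}
```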
Questions
Thank You
Contact Us:
bisp.consulting@gmail.com
bispsolutions.wordpress.com
learnhyperion.wordpress.com