8/6/2019 DWH Spring 2011 Lecture Slides Week6&7
http://slidepdf.com/reader/full/dwh-spring-2011-lecture-slides-week67 1/18
Dr. Abdul Basit Siddiqui
FUIEMS
(Lecture Slides Weeks # 6&7)
The need for ER modeling?
• Problems with early COBOL-era data processing systems.
• Data redundancies.
• From flat file to table: each entity ultimately becomes a table in the physical schema.
• Simple O(n^2) join to work with tables.
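The "simple O(n^2) join" can be read as a nested-loop join, in which every row of one table is compared with every row of the other. A minimal sketch follows; the `customers` and `orders` tables and their columns are invented for illustration and do not come from the slides.

```python
# Nested-loop join: compare every row of one table with every row of the other.
# Table contents and column names are hypothetical.
customers = [
    {"cust_id": 1, "name": "Ali"},
    {"cust_id": 2, "name": "Sara"},
]
orders = [
    {"order_id": 10, "cust_id": 1, "amount": 500},
    {"order_id": 11, "cust_id": 2, "amount": 300},
    {"order_id": 12, "cust_id": 1, "amount": 200},
]


def nested_loop_join(left, right, key):
    """Return merged rows where left[key] == right[key], plus the comparison count."""
    joined, comparisons = [], 0
    for l_row in left:
        for r_row in right:
            comparisons += 1
            if l_row[key] == r_row[key]:
                joined.append({**l_row, **r_row})
    return joined, comparisons


rows, comparisons = nested_loop_join(customers, orders, "cust_id")
print(len(rows), comparisons)  # 3 joined rows after 2 * 3 = 6 comparisons
```

With n rows in each table the inner comparison runs n * n times, which is where the quadratic cost comes from.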
Data Warehousing - Spring2011FUIEMS
Why has ER modeling been so successful?
• Coupled with normalization, it drives all the redundancy out of the database.
• Data is changed (or added or deleted) at just one point.
• Can be used with indexing for very fast access.
• Resulted in the success of OLTP systems.
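The "change at just one point" property can be illustrated with a small normalized schema. This is a sketch using Python's built-in `sqlite3` module; the `customer` and `orders` tables, the index name, and the data are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    CREATE TABLE orders   (order_id INTEGER PRIMARY KEY,
                           cust_id  INTEGER REFERENCES customer(cust_id),
                           amount   INTEGER);
    -- Index on the join column to support fast lookups.
    CREATE INDEX idx_orders_cust ON orders(cust_id);
    INSERT INTO customer VALUES (1, 'Ali', 'Lahore');
    INSERT INTO orders VALUES (10, 1, 500), (11, 1, 200);
""")

# The city is stored once, so a single-row update is visible from every order.
conn.execute("UPDATE customer SET city = 'Islamabad' WHERE cust_id = 1")

cities = conn.execute("""
    SELECT o.order_id, c.city
    FROM orders o JOIN customer c ON c.cust_id = o.cust_id
    ORDER BY o.order_id
""").fetchall()
print(cities)  # [(10, 'Islamabad'), (11, 'Islamabad')]
```

Had the city been repeated in every order row, the same change would have required updating each of those rows.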
Need for DM: Unanswered Questions
• Let's have a look at a typical ER data model first.
• Some observations:
• All tables look alike; as a consequence it is difficult to identify:
  • Which table is more important?
  • Which is the largest?
  • Which tables contain numerical measurements of the business?
  • Which tables contain nearly static descriptive attributes?
Need for DM: Complexity of Representation
• Many topologies for the same ER diagram, all appearing different.
• Very hard to visualize and remember.
• A large number of possible connections between any two (or more) tables.
[Figure: two different layouts of the same ER diagram, each with tables numbered 1 to 12]
Need for DM: The Paradox
• The paradox: trying to make information accessible using tables resulted in an inability to query them!
• ER and normalization result in a large number of tables which are:
  • Hard to understand by the users (DB programmers)
  • Hard to navigate optimally by DBMS software
• The real value of ER is in using tables individually or in pairs.
• Too complex for queries that span multiple tables with a large number of records.
ER vs. DM
ER
• Constituted to optimize OLTP performance.
• Models the micro relationships among data elements.
• A wild variability of the structure of ER models.
• Very vulnerable to changes in the user's querying habits, because such schemas are asymmetrical.
DM
• Constituted to optimize DSS query performance.
• Models the macro relationships among data elements with an overall deterministic strategy.
• All dimensions serve as equal entry points to the fact table.
• Changes in users' querying habits can be accommodated by automatic SQL generators.
How to simplify an ER data model?
• Two general methods:
  • De-normalization
  • Dimensional Modeling (DM)
What is DM?
• A simpler logical model optimized for decision support.
• Inherently dimensional in nature, with a single central fact table and a set of smaller dimensional tables.
• Multi-part key for the fact table.
• Dimensional tables with a single-part PK.
• Keys are usually system generated.
What is DM?
• Results in a star-like structure, called a star schema or star join.
• All relationships are mandatory M-1.
• Single path between any two levels.
• Supports ROLAP operations.
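The key structure described above (single-part, system-generated keys on the dimension tables; a multi-part key and mandatory many-to-one links on the fact table) might look like the following in SQLite. This is only a sketch: the table and column names are hypothetical, and a real fact table would normally reference more dimensions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript("""
    -- Dimension tables: one single-part, system-generated (surrogate) key each.
    CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, item TEXT, category TEXT);
    CREATE TABLE store_dim   (store_key   INTEGER PRIMARY KEY, store TEXT, city TEXT);

    -- Fact table: multi-part key built from the dimension keys. Each link to a
    -- dimension is mandatory (NOT NULL) and many-to-one.
    CREATE TABLE sales_fact (
        product_key INTEGER NOT NULL REFERENCES product_dim(product_key),
        store_key   INTEGER NOT NULL REFERENCES store_dim(store_key),
        sale_rs     INTEGER NOT NULL,
        PRIMARY KEY (product_key, store_key)
    );
""")

# INTEGER PRIMARY KEY columns are assigned automatically when omitted.
conn.execute("INSERT INTO product_dim (item, category) VALUES ('Novel', 'Books')")
conn.execute("INSERT INTO store_dim (store, city) VALUES ('S1', 'Lahore')")
conn.execute("INSERT INTO sales_fact VALUES (1, 1, 450)")

# A fact row that points at a non-existent dimension row is rejected.
try:
    conn.execute("INSERT INTO sales_fact VALUES (99, 1, 100)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
print(orphan_rejected)  # True
```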
Dimensions have Hierarchies
Items
  Books
    Fiction
    Text
      Medical
      Engg
  Clothes
    Men
    Women

Analysts tend to look at the data through a dimension at a particular "level" in the hierarchy.
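Looking at data "at a particular level" amounts to rolling measurements up the hierarchy. The sketch below uses part of the Items hierarchy from this slide; the sales figures and the `roll_up` helper are invented for illustration.

```python
from collections import defaultdict

# child -> parent links for part of the Items hierarchy (sales figures are made up)
parent = {
    "Fiction": "Books", "Text": "Books",
    "Men": "Clothes", "Women": "Clothes",
    "Books": "Items", "Clothes": "Items",
}
leaf_sales = {"Fiction": 120, "Text": 80, "Men": 200, "Women": 250}


def roll_up(sales, parent):
    """Aggregate sales one level up the hierarchy."""
    totals = defaultdict(int)
    for member, amount in sales.items():
        totals[parent[member]] += amount
    return dict(totals)


by_category = roll_up(leaf_sales, parent)    # {'Books': 200, 'Clothes': 450}
by_all_items = roll_up(by_category, parent)  # {'Items': 650}
print(by_category, by_all_items)
```

Each call moves the analysis one level higher, from individual categories to the whole Items dimension.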
The two Schemas
• Star
• Snow-flake
Simplified 3NF (Retail)
[Figure: ER diagram of a simplified 3NF retail schema; the tables are connected by many-to-one (M) relationships]
Table labels in the diagram: year, quarter, month, week, sale_header, sale_detail, store, item_x_cat, item_x_splir, cat_x_dept, zone, district, division
Attribute groups in the diagram:
(DATE, WEEK)
(WEEK, MONTH)
(MONTH, QTR)
(YEAR, QTR)
(RECEIPT, STORE, DATE)
(ITEM, RECEIPT)
(ITEM, CATEGORY)
(ITEM, SUPPLIER)
(DEPT, CATEGORY)
(STORE, STREET, ZONE)
(ZONE, CITY)
(CITY, DISTRICT)
(DISTRICT, DIVISION)
(DIVISION, PROVINCE)
Vastly Simplified Star Schema
[Figure: star schema with one fact table linked to three dimension tables; each dimension row (1) relates to many (M) fact rows]
Fact Table: RECEIPT#, STORE#, DATE, ITEM#, Sale Rs., ... (facts)
Product Dim: ITEM#, CATEGORY, DEPT, SUPPLIER
Geography Dim: STORE#, ZONE, CITY, DISTRICT, DIVISION, PROVINCE
Time Dim: DATE, WEEK, MONTH, QUARTER, YEAR
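A typical query against such a star schema joins the fact table to each dimension and groups by dimension attributes. The following sketch uses SQLite with simplified versions of the tables in the figure; the column names (for example `sale_date`, `item_no`) and all data values are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product_dim   (item_no INTEGER PRIMARY KEY, category TEXT, dept TEXT, supplier TEXT);
    CREATE TABLE geography_dim (store_no INTEGER PRIMARY KEY, zone TEXT, city TEXT, province TEXT);
    CREATE TABLE time_dim      (sale_date TEXT PRIMARY KEY, month TEXT, quarter TEXT, year INTEGER);
    CREATE TABLE sales_fact    (receipt_no INTEGER, store_no INTEGER, sale_date TEXT,
                                item_no INTEGER, sale_rs INTEGER);

    INSERT INTO product_dim VALUES (1, 'Fiction', 'Books', 'SupplierA'),
                                   (2, 'Men', 'Clothes', 'SupplierB');
    INSERT INTO geography_dim VALUES (100, 'Z1', 'Lahore', 'Punjab'),
                                     (200, 'Z2', 'Karachi', 'Sindh');
    INSERT INTO time_dim VALUES ('2011-01-15', 'Jan', 'Q1', 2011),
                                ('2011-04-02', 'Apr', 'Q2', 2011);
    INSERT INTO sales_fact VALUES (1, 100, '2011-01-15', 1, 300),
                                  (1, 100, '2011-01-15', 2, 900),
                                  (2, 200, '2011-01-15', 1, 250),
                                  (3, 200, '2011-04-02', 2, 700);
""")

# Star join: the fact table is joined once to each dimension, and every
# dimension can be used to filter or group the result.
result = conn.execute("""
    SELECT g.city, p.dept, SUM(f.sale_rs)
    FROM sales_fact f
    JOIN product_dim p   ON p.item_no   = f.item_no
    JOIN geography_dim g ON g.store_no  = f.store_no
    JOIN time_dim t      ON t.sale_date = f.sale_date
    WHERE t.quarter = 'Q1' AND t.year = 2011
    GROUP BY g.city, p.dept
    ORDER BY g.city, p.dept
""").fetchall()
print(result)  # [('Karachi', 'Books', 250), ('Lahore', 'Books', 300), ('Lahore', 'Clothes', 900)]
```

No join between dimension tables is needed, because each hierarchy (for example store, city, province) is held in a single dimension table.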
The Benefit of Simplicity
Beauty lies in close correspondence with the business, evident even to business users.
Features of Star Schema
Dimensional hierarchies are collapsed into a single table for each dimension. Loss of information?
A single fact table is created with a single header from the detail records, resulting in:
• A vastly simplified physical data model!
• Fewer tables (thousands of tables in some ERP systems).
• Fewer joins, resulting in high performance.
• Some requirement of additional space.
Quantifying space requirement
Quantifying use of additional space using star schema
There are about 10 million mobile phone users in Pakistan.
Say the top company has half of them = 5,000,000
Number of days in 1 year = 365
Number of calls recorded each day = 250,000 (assumed)
Maximum number of records in fact table = 365 x 250,000 = 91.25 million rows
Assuming a relatively small header size = 128 bytes
Fact table storage used = 91.25 million x 128 bytes = 11.68 GB
Average length of city name = 8 characters ≈ 8 bytes
Total number of cities with telephone access = 170 (1 byte)
Space used for city name in fact table using star = 8 x 0.09125 = 0.73 GB
Space used for city code using snow-flake = 1 x 0.09125 = 0.091 GB
Additional space used ≈ 0.64 GB, i.e. about 5.5%
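As a check on the arithmetic, the calculation can be recomputed from the stated inputs. This is a minimal sketch assuming decimal units (1 GB = 10^9 bytes); the variable names are ours. Note that the relative overhead does not depend on the number of rows, since it reduces to (8 - 1) / 128 per row.

```python
# Inputs as stated on the slide; names are chosen for this example.
DAYS_PER_YEAR = 365
CALLS_PER_DAY = 250_000   # assumed on the slide
ROW_BYTES = 128           # fact-table row ("header") size
CITY_NAME_BYTES = 8       # average city name stored directly (star)
CITY_CODE_BYTES = 1       # 170 cities fit in one byte (170 < 256)

rows = DAYS_PER_YEAR * CALLS_PER_DAY
fact_bytes = rows * ROW_BYTES
extra_bytes = rows * (CITY_NAME_BYTES - CITY_CODE_BYTES)
overhead = extra_bytes / fact_bytes  # the row count cancels: 7 / 128

print(f"{rows:,} rows, {fact_bytes / 1e9:.2f} GB, overhead {overhead:.1%}")
# 91,250,000 rows, 11.68 GB, overhead 5.5%
```

Changing `CALLS_PER_DAY` scales the absolute storage figures but leaves the percentage unchanged.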