Multidimensional Databases Prof. Navneet Goyal Computer Science Department BITS, Pilani
Dec 18, 2015
April 18, 2023 Dr. Navneet Goyal, BITS, Pilani 2
Database Evolution
• Flat files
• Hierarchical and Network
• Relational
• Distributed Relational
• Multidimensional
Why Multi-Dimensional Databases?
• No single "best" data structure for all applications within an enterprise
• Organizations have abandoned the search for the holy grail of a single, globally accepted database
• Select the most appropriate data structure on a case-by-case basis from a palette of standard database structures
• Multidimensional Databases for OLAP?
Why Multi-Dimensional Databases?
• From econometric research conducted at MIT in the 1960s, the multidimensional database has matured into the database engine of choice for data analysis applications
• Inherent ability to integrate and analyze large volumes of enterprise data
• Offers a good conceptual fit with the way end-users visualize business data
– Most business people already think about their businesses in multidimensional terms
– Managers tend to ask questions about product sales in different markets over specific time periods
Multidimensional Database
• Spreadsheets – a 2D database?
• Functionalities
• What about a stack of similar spreadsheets for different times?
• Limitations? We cannot easily relate data in different sheets
Multidimensional Database
An MDDB is a computer software system designed to allow for the efficient and convenient storage and retrieval of large volumes of data that are
1. Intimately related &
2. Stored, viewed and analyzed from different perspectives
These perspectives are called Dimensions
A Motivating Example
An automobile manufacturer wants to increase sales volumes by examining sales data collected throughout the organization. The evaluation would require viewing historical sales volume figures from multiple dimensions such as
• Sales volume by model
• Sales volume by color
• Sales volume by dealer
• Sales volume over time
Relational Structure
Multidimensional Array Structure
Sales Volumes (MODEL x COLOR)

MODEL      Blue  Red  White
Mini Van   6     5    4
Coupe      3     5    5
Sedan      4     3    2
RDBMS vs. MDD
• Multidimensional array structure represents a higher level of organization than the relational table
• Perspectives are embedded directly into the structure in the multidimensional model
• All possible combinations of perspectives containing a specific attribute (the color BLUE, for example) line up along the dimension position for that attribute.
• In the relational model, perspectives are placed in fields; the table structure itself tells us nothing about the field contents.
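The contrast between the two structures can be sketched in a few lines of Python. This is only an illustration, not how any particular MDD product stores data; the volumes are the Gleason figures used elsewhere in these slides, and the variable names are assumptions.

```python
# Relational form: each record carries its own MODEL and COLOR field values.
rows = [
    ("MINI VAN", "BLUE", 6), ("MINI VAN", "RED", 5), ("MINI VAN", "WHITE", 4),
    ("COUPE", "BLUE", 3), ("COUPE", "RED", 5), ("COUPE", "WHITE", 5),
    ("SEDAN", "BLUE", 4), ("SEDAN", "RED", 3), ("SEDAN", "WHITE", 2),
]

# Multidimensional form: the perspectives become the axes of the array.
models = ["MINI VAN", "COUPE", "SEDAN"]
colors = ["BLUE", "RED", "WHITE"]
cube = [[0] * len(colors) for _ in models]
for model, color, volume in rows:
    cube[models.index(model)][colors.index(color)] = volume

# Everything involving BLUE lines up along one position of the COLOR axis.
blue = colors.index("BLUE")
blue_sales = [cube[m][blue] for m in range(len(models))]
print(blue_sales)  # [6, 3, 4]
```

In the relational form, finding the BLUE figures means inspecting the COLOR field of every record; in the array form they sit at a single known position on the COLOR axis.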
RDBMS vs. MDD
• MDD makes data browsing and manipulation intuitive to the end-user
• Any data manipulation action possible with an MDD is also possible using relational technology
• Substantial cognitive advantages in query formulation
• Substantial computational performance advantages in query processing when using MDD
Multidimensional Representation
[Figure: Sales Volumes cube with dimensions MODEL (Mini Van, Coupe, Sedan), COLOR (Blue, Red, White), and DEALERSHIP (Clyde, Gleason, Carr)]
Viewing Data - An Example
[Figure: Sales Volumes cube with dimensions MODEL, COLOR, and DEALERSHIP]
Assume that each dimension has 10 positions, as shown in the cube above
Viewing Data - An Example
• How many records would there be in a relational table?
• Implications for viewing data from an end-user standpoint?

SALES VOLUMES FOR ALL DEALERSHIPS
MODEL     COLOR  DEALERSHIP  VOLUME
MINI VAN  BLUE   CLYDE       2
MINI VAN  BLUE   GLEASON     2
MINI VAN  BLUE   CARR        2
MINI VAN  RED    CLYDE       1
MINI VAN  WHITE  GLEASON     3
...
RECORD NUMBER.... 998
RECORD NUMBER.... 999
RECORD NUMBER.... 1000
Performance Advantages
• Volume figure when car type = SEDAN, color = BLUE, and dealer = GLEASON?
• RDBMS – all 1000 records might need to be searched to find the right record
• MDB has more ‘knowledge’ about where the data lies
• Max. of 30 position searches (10 along each of the 3 dimensions)
• Average case: 15 vs. 500
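The 30 vs. 1000 and 15 vs. 500 figures can be reproduced with a small counting sketch. It assumes plain linear scans on both sides (no indexes or hashing), which is the simplification the slide makes; the axis labels such as MODEL0 .. MODEL9 are hypothetical.

```python
POSITIONS = 10   # positions per dimension, as assumed on the slide
AXES = ["MODEL", "COLOR", "DEALERSHIP"]

def axis_search(labels, wanted):
    """Linear scan along one axis; returns (position, comparisons made)."""
    for i, label in enumerate(labels):
        if label == wanted:
            return i, i + 1
    raise KeyError(wanted)

# Hypothetical labels such as MODEL0 .. MODEL9 on each axis.
axes = {name: [f"{name}{i}" for i in range(POSITIONS)] for name in AXES}

def mdb_cost(cell):
    """Comparisons needed to locate one cell, one axis at a time."""
    return sum(axis_search(axes[name], label)[1] for name, label in zip(AXES, cell))

records = POSITIONS ** len(AXES)                    # 1000 relational records
rdb_worst, rdb_average = records, records // 2      # 1000 and 500
mdb_worst = mdb_cost(("MODEL9", "COLOR9", "DEALERSHIP9"))   # 10 + 10 + 10
mdb_average = mdb_worst // 2
print(rdb_worst, mdb_worst, rdb_average, mdb_average)  # 1000 30 500 15
```

The array cost grows with the sum of the axis sizes, while the unindexed table scan grows with their product.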
Performance Advantages
• Total Sales across all colors and dealers when model = SEDAN?
• RDBMS – all 1000 records must be searched to get the answer
• MDB – Sum the contents of one 10x10 ‘slice’
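A minimal sketch of the slice sum, assuming a 10x10x10 cube filled with made-up volumes and SEDAN sitting at a hypothetical position 2 on the MODEL axis. Both approaches give the same total; they differ in how many values must be visited.

```python
N = 10
# Hypothetical volumes; any numbers would do for the comparison.
cube = [[[(m + c + d) % 5 for d in range(N)] for c in range(N)] for m in range(N)]
rows = [(m, c, d, cube[m][c][d])
        for m in range(N) for c in range(N) for d in range(N)]

sedan = 2  # assumed position of SEDAN on the MODEL axis

# Relational style: visit all 1000 records and keep the matching ones.
rdb_total = sum(v for m, c, d, v in rows if m == sedan)

# Multidimensional style: sum the single 10x10 slice at the SEDAN position.
mdb_total = sum(sum(line) for line in cube[sedan])
cells_read = sum(len(line) for line in cube[sedan])

print(len(rows), cells_read, rdb_total == mdb_total)  # 1000 100 True
```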
Performance Advantages
• Data manipulation that requires a minute in an RDBMS may require only a few seconds in an MDB
• For such queries, MDBs can be an order of magnitude faster than RDBMSs
• Performance benefits are greatest for queries that generate cross-tab views of data
• The performance advantages offered by multidimensional technology facilitate the development of interactive decision support applications like OLAP that can be impractical in a relational environment.
RDBMS vs. MDB
• Any data manipulation action possible with a multidimensional database is also possible using relational technology
• MDBs however offer several advantages like:
– Ease of data presentation and navigation
– Ease of maintenance
– Performance
Ease of Data Presentation & Navigation
• Intuitive spreadsheet-like data views are the natural output of MDBs
• Obtaining the same views in a relational environment requires either a complex SQL query or a SQL generator running against the RDB to convert the table outputs into a more intuitive format
• Top-N queries were awkward to express in early SQL; newer SQL standards and most vendors now support them (e.g. FETCH FIRST n ROWS ONLY, LIMIT, TOP)
Ease of Maintenance
• Ease of maintenance because data is stored as it is viewed
• No additional overhead is required to translate user queries into requests for data
• To provide the same intuitiveness, RDBs use indexes and sophisticated joins, which require significant maintenance and storage
Performance
• Performance of MDBs can be matched by RDBs through database tuning
• It is not possible to tune the database for all possible ad hoc queries
• Tuning requires the time of an expensive DB specialist
• Aggregate navigators are helping RDBs to catch up with MDBs as far as aggregation queries are concerned
Adding Dimension - An Example
[Figure: three Sales Volumes cubes with dimensions MODEL (Mini Van, Coupe, Sedan), COLOR (Blue, Red, White), and DEALERSHIP (Clyde, Gleason, Carr), one cube each for JANUARY, FEBRUARY, and MARCH, showing TIME added as a fourth dimension]
When is MDD (In)appropriate?
First, consider situation 1

PERSONNEL
LAST NAME  EMPLOYEE#  EMPLOYEE AGE
SMITH      01         21
REGAN      12         19
FOX        31         63
WELD       14         31
KELLY      54         27
LINK       03         56
KRANZ      41         45
LUCAS      33         41
WEISS      23         19
When is MDD (In)appropriate?
Now consider situation 2

SALES VOLUMES FOR GLEASON DEALERSHIP
MODEL         COLOR  VOLUME
MINI VAN      BLUE   6
MINI VAN      RED    5
MINI VAN      WHITE  4
SPORTS COUPE  BLUE   3
SPORTS COUPE  RED    5
SPORTS COUPE  WHITE  5
SEDAN         BLUE   4
SEDAN         RED    3
SEDAN         WHITE  2

1. Set up an MDD structure for situation 1, with LAST NAME and EMPLOYEE# as dimensions, and AGE as the measurement.
2. Set up an MDD structure for situation 2, with MODEL and COLOR as dimensions, and SALES VOLUME as the measurement.
When is MDD (In)appropriate?
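MDD Structures for the Situations

Sales Volumes (MODEL x COLOR)
MODEL      Blue  Red  White
Mini Van   6     5    4
Coupe      3     5    5
Sedan      4     3    2

Employee Age (LAST NAME x EMPLOYEE #)
LAST NAME  01  12  31  14  54  03  41  33  23
Smith      21  -   -   -   -   -   -   -   -
Regan      -   19  -   -   -   -   -   -   -
Fox        -   -   63  -   -   -   -   -   -
Weld       -   -   -   31  -   -   -   -   -
Kelly      -   -   -   -   27  -   -   -   -
Link       -   -   -   -   -   56  -   -   -
Kranz      -   -   -   -   -   -   45  -   -
Lucas      -   -   -   -   -   -   -   41  -
Weiss      -   -   -   -   -   -   -   -   19

Note the difference in sparsity between the two MDD representations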
When is MDD (In)appropriate?
Our sales volume dataset has a great number of meaningful interrelationships
These interrelationships are more meaningful than the individual data elements themselves.
The greater the number of inherent interrelationships between the elements of a dataset, the more likely it is that a study of those interrelationships will yield business information of value to the company.
Highly interrelated dataset types should be placed in a multidimensional data structure for greatest ease of access and analysis
When is MDD (In)appropriate?
• No last name matches more than one employee number, and no employee number matches more than one last name
• In contrast, there is a sales figure associated with every combination of model and color, resulting in a completely filled 3x3 matrix
• For the employee data, performance suffers in the MDD: at most 9 record searches in the RDB vs. up to 18 position searches (9 + 9) in the sparse 9x9 array
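The difference between the two situations can be put in numbers with a short sketch. It assumes the same simple linear-search cost model used on the performance slides; the function name density is an illustrative choice.

```python
def density(filled_cells, axis_sizes):
    """Fraction of array cells that actually hold a value."""
    total = 1
    for size in axis_sizes:
        total *= size
    return filled_cells / total

sales_density = density(9, [3, 3])       # every model/color pair has a value
personnel_density = density(9, [9, 9])   # one age per (name, employee #) pair

rdb_worst = 9        # scan the 9 personnel records
mdb_worst = 9 + 9    # scan both 9-position axes of the sparse array

print(sales_density, round(personnel_density, 3), rdb_worst, mdb_worst)
# 1.0 0.111 9 18
```

Only about 11% of the personnel array is populated, so the array adds search cost and storage without adding any relationships worth analyzing.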
When is MDD (In)appropriate?
• The relative performance advantages of storing multidimensional data in a multidimensional array increase as the size of the dataset increases
• The relative performance disadvantages of storing non-multidimensional data in a multidimensional array increase as the size of the dataset increases.
• There is no inherent value in storing non-multidimensional data (such as the employee data) in multidimensional arrays
When is MDD Appropriate?
The greater the number of inherent interrelationships between the elements of a dataset, the more likely it is that a study of those interrelationships will yield business information of value to the company.
• Most companies have limited time and resources to devote to analyzing data
• It therefore becomes critical that these highly interrelated dataset types be placed in a multidimensional data structure for greatest ease of access and analysis.
When is MDD Appropriate?
Examples of applications that are suited for multidimensional technology:
1. Financial Analysis and Reporting
2. Budgeting
3. Promotion Tracking
4. Quality Assurance and Quality Control
5. Product Profitability
MDD Features - Rotation
Sales Volumes

View #1: Model x Color
MODEL      Blue  Red  White
Mini Van   6     5    4
Coupe      3     5    5
Sedan      4     3    2

(ROTATE 90°)

View #2: Color x Model
COLOR   Mini Van  Coupe  Sedan
Blue    6         3      4
Red     5         5      3
White   4         5      2

• Also referred to as “data slicing.”
• Each rotation yields a different slice or two-dimensional table of data.
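For a two-dimensional array, the rotation amounts to swapping the two axes. A minimal sketch using the Model x Color figures from this slide (the variable names are assumptions):

```python
# View #1: rows are MODEL (Mini Van, Coupe, Sedan), columns are COLOR.
view1 = [
    [6, 5, 4],
    [3, 5, 5],
    [4, 3, 2],
]

# View #2: swap the axes so rows are COLOR (Blue, Red, White).
view2 = [list(column) for column in zip(*view1)]

print(view2)  # [[6, 3, 4], [5, 5, 3], [4, 5, 2]]
```

No values are recomputed; only the order in which the axes are read changes.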
MDD Features - Rotation
[Figure: Sales Volumes cube shown in six views (View #1 to View #6), each obtained from the previous one by a 90° rotation, covering every ordering of the MODEL, COLOR, and DEALERSHIP axes]
MDD Features - Rotation
• All the six views can be obtained by simple rotation
• In MDBs rotations are simple as no rearrangement of data is required
• Rotation is also referred to as “data slicing”
• No. of views
• 2D – 2
• 3D – 6
• 4D – 24 (in general, n! views for n dimensions)
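The view counts follow from counting the orderings of the axes. A short sketch that enumerates them, with TIME added as the assumed fourth dimension:

```python
from itertools import permutations
from math import factorial

def views(dimensions):
    """Every ordering of the axes corresponds to one rotated view."""
    return list(permutations(dimensions))

for dims in (["MODEL", "COLOR"],
             ["MODEL", "COLOR", "DEALERSHIP"],
             ["MODEL", "COLOR", "DEALERSHIP", "TIME"]):
    assert len(views(dims)) == factorial(len(dims))
    print(len(dims), "dimensions:", len(views(dims)), "views")
# 2 dimensions: 2 views
# 3 dimensions: 6 views
# 4 dimensions: 24 views
```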
MDD Features - Ranging
• How does the sales volume of models painted with the new metallic blue compare with the sales of normal blue models?
• The user knows that only Sports Coupe and Mini Van models have received the new paint treatment
• The user also knows that only 2 dealers, namely Carr and Clyde, have an unconstrained supply of these models
MDD Features - Ranging
• The end user selects the desired positions along each dimension.
• Also referred to as "data dicing."
• The data is scoped down to a subset grouping
[Figure: Sales Volumes cube reduced to MODEL (Mini Van, Coupe), COLOR (Normal Blue, Metal Blue), and DEALERSHIP (Clyde, Carr)]
MDD Features - Ranging
• The reduced array can now be rotated and used in computations in the same way as the parent array
• Referred to as “Data Dicing” as data is scoped down to a subset grouping
• A complex SQL query is required in an RDB
• Performance is better in an MDB because fewer resource-consuming searches are required
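Ranging can be sketched as picking the wanted positions along each axis and copying out the cells where they meet. The volumes below are made up for illustration, and dice is a hypothetical helper name, not a function of any MDD product.

```python
models = ["MINI VAN", "COUPE", "SEDAN"]
colors = ["NORMAL BLUE", "METAL BLUE", "RED"]
dealers = ["CLYDE", "GLEASON", "CARR"]

# Hypothetical volumes, indexed as cube[model][color][dealer].
cube = [[[m + c + d for d in range(3)] for c in range(3)] for m in range(3)]

def dice(cube, axes, keep):
    """Return the sub-array at the kept labels along each of the three axes."""
    idx = [[labels.index(k) for k in wanted] for labels, wanted in zip(axes, keep)]
    return [[[cube[m][c][d] for d in idx[2]] for c in idx[1]] for m in idx[0]]

sub = dice(cube, [models, colors, dealers],
           [["MINI VAN", "COUPE"], ["NORMAL BLUE", "METAL BLUE"], ["CLYDE", "CARR"]])

print(len(sub), len(sub[0]), len(sub[0][0]))  # 2 2 2
```

The result is itself a (smaller) three-dimensional array, so it can be rotated or summed exactly like the parent.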
MDD Features – Roll-Up & Drill-Down
• Users want different views of the same data
• E.g., sales volume by model vs. sales volume by dealership
• Many times views are similar
– Sales volume by dealership vs. volume by district
• Natural relationship between Sales Volumes at the DEALERSHIP level and Sales Volumes at the DISTRICT level
• Sales Volumes for all the dealerships in a district sum to the Sales Volumes for that district
MDD Features – Roll-Up & Drill-Down
• Multidimensional database technology is specially designed to facilitate the handling of these natural relationships
• Define two related aggregates on the same dimension
• One aggregation is dealership and the other district
• District is at a higher level of aggregation than dealership
MDD Features - Roll-Ups & Drill Downs
[Figure: ORGANIZATION DIMENSION hierarchy with three levels: REGION (Midwest), DISTRICT (Chicago, St. Louis, Gary), and DEALERSHIP (Clyde, Gleason, Carr, Levi, Lucas, Bolton)]
• The figure presents a definition of a hierarchy within the organization dimension.
• Aggregations are perceived as being part of the same dimension.
• Moving up and moving down levels in a hierarchy is referred to as “roll-up” and “drill-down.”
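A roll-up along the organization dimension can be sketched as summing child values into their parent. The assignment of dealerships to districts below is an assumption (the slides do not give it), and the volumes for Levi, Lucas, and Bolton are made up; the Clyde, Gleason, and Carr totals come from the query example later in the deck.

```python
# Assumed hierarchy: DEALERSHIP -> DISTRICT -> REGION.
district_of = {"CLYDE": "CHICAGO", "GLEASON": "CHICAGO",
               "CARR": "ST. LOUIS", "LEVI": "ST. LOUIS",
               "LUCAS": "GARY", "BOLTON": "GARY"}
region_of = {"CHICAGO": "MIDWEST", "ST. LOUIS": "MIDWEST", "GARY": "MIDWEST"}

dealer_sales = {"CLYDE": 14, "GLEASON": 19, "CARR": 26,
                "LEVI": 10, "LUCAS": 7, "BOLTON": 12}

def roll_up(values, parent_of):
    """Sum child values into their parent one level up the hierarchy."""
    totals = {}
    for child, value in values.items():
        parent = parent_of[child]
        totals[parent] = totals.get(parent, 0) + value
    return totals

def drill_down(values, parent_of, parent):
    """Show the child values that make up one parent."""
    return {c: v for c, v in values.items() if parent_of[c] == parent}

district_sales = roll_up(dealer_sales, district_of)
region_sales = roll_up(district_sales, region_of)
print(district_sales)  # {'CHICAGO': 33, 'ST. LOUIS': 36, 'GARY': 19}
print(region_sales)    # {'MIDWEST': 88}
print(drill_down(dealer_sales, district_of, "CHICAGO"))  # {'CLYDE': 14, 'GLEASON': 19}
```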
MDD Features: Drill-Down Through a Dimension
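[Figure: Sales Volumes cube with MODEL and COLOR dimensions and an organization dimension drilled down through REGION (Midwest), DISTRICT (Chicago, St. Louis, Gary), and DEALERSHIP (Clyde, Gleason, Carr, Levi, Lucas, Bolton)]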
Queries
• High degree of structure in MDB makes the query language very simple and efficient
• Query language is intuitive
• Output is immediately useful to end user
Queries: Example
• Display sales volume by model for each dealership

PRINT TOTAL.(SALES_VOLUME KEEP MODEL DEALERSHIP)

              DEALERSHIP
MODEL         CLYDE  GLEASON  CARR
MINI VAN      7      5        6
SPORTS COUPE  4      6        8
SEDAN         3      8        12

Trends emerge and comparisons are easily made
Queries: Example
• Corresponding SQL

SELECT MODEL, DEALERSHIP, SUM(SALES_VOLUME)
FROM SALES_VOLUME
GROUP BY MODEL, DEALERSHIP
ORDER BY MODEL, DEALERSHIP

MODEL | DEALERSHIP | SUM(SALES_VOLUME)
MINI VAN | CLYDE | 7
MINI VAN | GLEASON | 5
MINI VAN | CARR | 6
SPORTS COUPE | CLYDE | 4
SPORTS COUPE | GLEASON | 6
SPORTS COUPE | CARR | 8
SEDAN | CLYDE | 3
SEDAN | GLEASON | 8
SEDAN | CARR | 12
Queries: Example
Use a report writer in addition to SQL and we get

MINI VAN
  CLYDE 7
  GLEASON 5
  CARR 6
SPORTS COUPE
  CLYDE 4
  GLEASON 6
  CARR 8
SEDAN
  CLYDE 3
  GLEASON 8
  CARR 12
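The relational route to a cross-tab can be sketched with Python's built-in sqlite3 module: run the GROUP BY query, then reshape the long result in a separate "report writer" step. The table is loaded with one pre-aggregated row per model and dealership, which is an assumption; the slides do not show the underlying color-level rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE SALES_VOLUME (MODEL TEXT, DEALERSHIP TEXT, SALES_VOLUME INTEGER)")
rows = [("MINI VAN", "CLYDE", 7), ("MINI VAN", "GLEASON", 5), ("MINI VAN", "CARR", 6),
        ("SPORTS COUPE", "CLYDE", 4), ("SPORTS COUPE", "GLEASON", 6),
        ("SPORTS COUPE", "CARR", 8),
        ("SEDAN", "CLYDE", 3), ("SEDAN", "GLEASON", 8), ("SEDAN", "CARR", 12)]
conn.executemany("INSERT INTO SALES_VOLUME VALUES (?, ?, ?)", rows)

# Step 1: the SQL query from the slide returns one row per model/dealership pair.
result = conn.execute(
    "SELECT MODEL, DEALERSHIP, SUM(SALES_VOLUME) FROM SALES_VOLUME "
    "GROUP BY MODEL, DEALERSHIP ORDER BY MODEL, DEALERSHIP").fetchall()

# Step 2: the "report writer" reshapes the long result into a cross-tab.
crosstab = {}
for model, dealer, total in result:
    crosstab.setdefault(model, {})[dealer] = total

dealers = ["CLYDE", "GLEASON", "CARR"]
for model in ["MINI VAN", "SPORTS COUPE", "SEDAN"]:
    print(model, [crosstab[model][d] for d in dealers])
# MINI VAN [7, 5, 6]
# SPORTS COUPE [4, 6, 8]
# SEDAN [3, 8, 12]
```

The second step is exactly the extra work the MDB query avoids, since its output is already in cross-tab form.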
MDD Features: Multidimensional Computations
• Well equipped to handle demanding mathematical functions.
• Can treat arrays like cells in spreadsheets. For example, in a budget analysis situation, one can divide the ACTUAL array by the BUDGET array (and subtract 1) to compute the VARIANCE array.
• Applications based on multidimensional database technology typically have one dimension defined as a "business measurements" dimension.
• Integrates computational tools very tightly with the database structure.
MDD Features: Multidimensional Computations
[Figure: Sales Volumes array with a BUSINESS MEASUREMENTS dimension (Actual, Budget, Variance) against MODEL (Mini Van, Coupe, Sedan)]

Actual  Budget  Variance
16      12      0.33
11      10      0.1
8       10      -0.2
16      16      0.0
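The variance figures in the array are consistent with actual / budget - 1 applied cell by cell. A minimal sketch of that array-level computation, using the Actual and Budget values shown on the slide:

```python
actual = [16, 11, 8, 16]
budget = [12, 10, 10, 16]

# Whole-array arithmetic: one expression applied to every pair of cells.
variance = [round(a / b - 1, 2) for a, b in zip(actual, budget)]

print(variance)  # [0.33, 0.1, -0.2, 0.0]
```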
The Time Dimension
• TIME as a predefined hierarchy for rolling-up and drilling-down across days, weeks, months, years and special periods, such as fiscal years.
– Eliminates the effort required to build sophisticated hierarchies every time a database is set up.
– Extra performance advantages
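A time hierarchy can be sketched as a set of functions that map a day to its month, calendar year, or fiscal year. The daily volumes are made up, and the April start of the fiscal year is an assumption chosen only to show a "special period".

```python
from collections import defaultdict
from datetime import date

# Hypothetical daily sales volumes.
daily = {date(2015, 1, 30): 3, date(2015, 1, 31): 2, date(2015, 2, 1): 4,
         date(2015, 12, 31): 1, date(2016, 1, 1): 5}

def roll_up(values, level):
    """Sum daily values into the period that `level` assigns to each day."""
    totals = defaultdict(int)
    for day, value in values.items():
        totals[level(day)] += value
    return dict(totals)

by_month = roll_up(daily, lambda d: (d.year, d.month))
by_year = roll_up(daily, lambda d: d.year)
# Assumed fiscal year starting 1 April, labelled by its starting year.
by_fiscal_year = roll_up(daily, lambda d: d.year if d.month >= 4 else d.year - 1)

print(by_month)        # {(2015, 1): 5, (2015, 2): 4, (2015, 12): 1, (2016, 1): 5}
print(by_year)         # {2015: 10, 2016: 5}
print(by_fiscal_year)  # {2014: 9, 2015: 6}
```

A predefined TIME dimension ships these mappings with the database, so they do not have to be rebuilt for each application.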
Contrasting Relational Model and MD Model
Criteria | Relational Model | Multidimensional Databases
Focus | Data integrity of each piece of data | Facilitate exploration of interrelationships between dimensions
Organization structure | One-dimensional array | Multi-dimensional arrays
Perspectives | Embedded in fields | Embedded directly in MDDB structure
Computational power for query processing | Joining tables often required; computationally expensive | Structure designed for OLAP; computationally cheap
Cognitive issues in querying data | Cumbersome | Intuitive
Query languages | SQL or SQL front-ends, such as QBE | Point-and-click emphasis; no standardized language
Management of time dimension | Not well suited | Well suited
RDBMS vs. MDDB
• Do I still use an RDBMS for my DW?
• MDDBs store data in a hypercube, i.e., a multidimensional array
• RDBMSs store data as tables with rows and columns that do not map directly to the multidimensional view that users have of data
• EDW – RDBMS
• Data Marts – MDDB
RDBMS vs. MDDB: Trade-Offs
• SIZE
– MDDBs limited by size
• Mid-1990s: 10GB caused problems
• Today: 100GB is OK
– Large DWs are still better served by relational front-ends running against high performance and scalable RDBMS
• VOLATILITY
– Highly volatile data are better handled by RDBMS
– MDDBs take long to load and update
RDBMS vs. MDDB: Trade-Offs
• AGGREGATE STRATEGY
– MDDBs support aggregates better
– RDBMSs are catching up with the help of Aggregate Navigators
• INVESTMENT PROTECTION
– Most organizations already have made significant investments in relational technology and skill sets
– Continued use for another purpose (DW) provides additional ROI and lowers technical risk of failure
– MDDBs – need to acquire new software and train staff to use it
RDBMS vs. MDDB: Trade-Offs
• TYPE OF USERS
– Power users prefer the range of functionalities available in MOLAP tools
– Users that require broad views of enterprise data require access to the DW and are therefore better served by a ROLAP tool
INTEGRATED ARCHITECTURE
• DB vendors have integrated their multidimensional and relational database products
• Multidimensional front-end tools
• If queries require data that are not available in the MDDB, the tools retrieve the data from the larger RDB
• Known as “DRILL-THROUGH”
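Drill-through can be sketched as a lookup that tries the cube first and falls back to the relational store. Everything here is hypothetical: the dictionaries stand in for an MDDB and an RDB, and query is an illustrative name, not a vendor API.

```python
# Hypothetical aggregates held in the MDDB (model, dealership) -> volume.
mddb = {("SEDAN", "CLYDE"): 3, ("SEDAN", "GLEASON"): 8}

# Hypothetical detail rows held in the larger RDB.
rdb = [("SEDAN", "CARR", 7), ("SEDAN", "CARR", 5), ("SEDAN", "CLYDE", 3)]

def query(model, dealer):
    """Answer from the cube when possible, otherwise drill through to the RDB."""
    key = (model, dealer)
    if key in mddb:
        return mddb[key], "MDDB"
    total = sum(v for m, d, v in rdb if m == model and d == dealer)
    return total, "RDB"

print(query("SEDAN", "CLYDE"))  # (3, 'MDDB')
print(query("SEDAN", "CARR"))   # (12, 'RDB')
```

The end user issues the same request either way; only the tool knows which store answered it.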