Data Cube and OLAP Server
Madhavi Gundavarapu
Jan 02, 2016
Outline
• What is Data Analysis?
• Steps in Data Analysis
• SQL-92 Aggregate Functions
• Limitations of GROUP BY
• OLAP Server
• CUBE Operator
• ROLLUP Operator
What is Data Analysis?
• User issues a query, receives a response and formulates the next query based on the response
• This process repeats until the user gets the required result
• Fundamentally an iterative process
[Figure: data analysis as a loop of query and response]
Why Data Analysis?
• Search for unusual patterns of data
• Summarize data values
• Extract statistical information
• Contrast one category with another
• Provide a consolidated view of enterprise data buried in OLTP databases
– Help decision makers understand business trends
• Derive intelligible results from ad hoc, voluminous and scattered data
Steps in Data Analysis
• Formulate query
• Extract aggregated data
• Visualize results
• Analyze
[Figure: the analysis cycle (Formulate, Extract, Visualize, Analyze), with a sample chart of Red and Blue values by year (1990, 1991, 1992, ALL) on a scale of 0 to 200]
Overview of SQL-92
• SQL has several aggregate operators:
– sum(), count(), avg(), min(), max()
• The basic idea is:
– Combine all values in a column into a single scalar value
• Syntax:
SELECT sum(units)
FROM inventory;
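As a minimal sketch of the aggregate functions on this slide, the snippet below runs them through Python's built-in sqlite3 module. Only the table name inventory and the column units come from the slide; the other columns and all row values are made up for illustration.

```python
import sqlite3

def aggregate_demo():
    """Run the five SQL-92 aggregates over a small, made-up inventory table."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE inventory (location TEXT, nation TEXT, units INTEGER)")
    con.executemany(
        "INSERT INTO inventory VALUES (?, ?, ?)",
        [("Austin", "USA", 10), ("Boston", "USA", 30), ("Austin", "USA", 20)],
    )
    # Each aggregate collapses the whole units column into one scalar value.
    row = con.execute(
        "SELECT sum(units), count(units), avg(units), min(units), max(units) "
        "FROM inventory"
    ).fetchone()
    con.close()
    return row

print(aggregate_demo())
```

The query returns a single row however many rows the table holds, which is the "single scalar value" idea applied once per aggregate.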
Overview of SQL-92 (contd.): Distinct Clause
• DISTINCT
– Allows aggregation over distinct values
– Example:
SELECT COUNT(DISTINCT locations) FROM inventory;
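To illustrate the effect of DISTINCT inside an aggregate, this sketch compares COUNT(locations) with COUNT(DISTINCT locations) using sqlite3. The column name locations follows the slide; the rows are hypothetical.

```python
import sqlite3

def count_locations():
    """Compare COUNT(col) with COUNT(DISTINCT col) on made-up inventory rows."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE inventory (locations TEXT, units INTEGER)")
    con.executemany(
        "INSERT INTO inventory VALUES (?, ?)",
        [("Austin", 10), ("Boston", 30), ("Austin", 20), ("Denver", 5)],
    )
    # COUNT counts every non-NULL value; COUNT(DISTINCT ...) counts each value once.
    total, distinct = con.execute(
        "SELECT COUNT(locations), COUNT(DISTINCT locations) FROM inventory"
    ).fetchone()
    con.close()
    return total, distinct

print(count_locations())
```

"Austin" appears twice in the sample rows, so the two counts differ by one.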
Overview of SQL-92 (contd.): GROUP BY Clause
• Group By allows aggregates over table sub-groups
• Result is a new table
• Syntax:
SELECT location, sum(units)
FROM inventory
WHERE nation = 'USA'
GROUP BY location;
[Figure: rows of a table partitioned into groups A, B, C and D by an attribute, with SUM() applied to each group]
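A small sketch of grouping with sqlite3, under assumed sample data. The filter on nation is written as a WHERE clause because nation is neither grouped nor aggregated. HAVING is used only for a condition on the aggregate; its threshold of 20 is an arbitrary choice for the example.

```python
import sqlite3

def units_by_location():
    """Sum units per US location and keep only groups above a threshold."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE inventory (location TEXT, nation TEXT, units INTEGER)")
    con.executemany(
        "INSERT INTO inventory VALUES (?, ?, ?)",
        [("Austin", "USA", 10), ("Austin", "USA", 20),
         ("Boston", "USA", 15), ("Toronto", "Canada", 50)],
    )
    rows = con.execute(
        "SELECT location, sum(units) FROM inventory "
        "WHERE nation = 'USA' "       # filters rows before grouping
        "GROUP BY location "          # one result row per location
        "HAVING sum(units) > 20 "     # filters groups after aggregation
        "ORDER BY location"
    ).fetchall()
    con.close()
    return rows

print(units_by_location())
```

The result is itself a table with one row per surviving group, as the slide states.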
Limitations of GROUP BY
• Users want CrossTabs
– GROUP BY is limited to 0-D and 1-D aggregates
• Users want sub-totals and totals
– drill-down & roll-up reports
[Figure: cross tab of expense categories (AIR, HOTEL, FOOD, MISC) by day of week (M T W T F S S), with sum totals]
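The limitation can be seen in a short sketch: GROUP BY returns a flat list of groups, so the cross tab and its row and column totals have to be assembled outside the query. The expenses table, its columns and its values are hypothetical, loosely modeled on the expense figure.

```python
import sqlite3
from collections import defaultdict

def expense_crosstab(rows):
    """Pivot (category, day, amount) rows into a cross tab with ALL totals."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE expenses (category TEXT, day TEXT, amount INTEGER)")
    con.executemany("INSERT INTO expenses VALUES (?, ?, ?)", rows)
    # GROUP BY yields one flat row per (category, day) pair: no 2-D layout,
    # no row totals, no column totals.
    flat = con.execute(
        "SELECT category, day, sum(amount) FROM expenses GROUP BY category, day"
    ).fetchall()
    con.close()
    # The cross tab and its sub-totals are built by hand from the flat rows.
    table = defaultdict(lambda: defaultdict(int))
    for category, day, amount in flat:
        table[category][day] += amount
        table[category]["ALL"] += amount
        table["ALL"][day] += amount
        table["ALL"]["ALL"] += amount
    return {category: dict(cols) for category, cols in table.items()}

rows = [("AIR", "Mon", 300), ("HOTEL", "Mon", 120), ("HOTEL", "Tue", 120),
        ("FOOD", "Mon", 40), ("FOOD", "Tue", 35), ("FOOD", "Tue", 10)]
crosstab = expense_crosstab(rows)
print(crosstab["FOOD"], crosstab["ALL"])
```

The hand-written pivot loop is exactly the work that the CUBE and ROLLUP operators move back into the query.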
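Multidimensional Data
• Measure Attributes
– Values that are aggregated, e.g. Number, Sales
• Dimension Attributes
– Attributes along which measures are grouped and viewed, e.g. Item-name, Color, Size
• Example
Item-name  Color   Size   Number
Skirt      Dark    Large  10
Skirt      Pastel  Large  20
Skirt      White   Large  15
…          …       …      …

Model  Year  Color  Sales
Chevy  1990  Red    5
Chevy  1990  White  87
Chevy  1990  Blue   62
…      …     …      …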
OLAP System
• On-Line Analytical Processing System
• Interactive system
• Permits analysts to view summaries of multidimensional data
• On-Line indicates
– No long waits to see result of a query
– Response times within a few seconds for new summaries
• View data at different levels of granularity
SQL:1999 OLAP Extensions
• SQL-92 offers limited support for multidimensional aggregation
• SQL:1999 standard defines, as generalizations of the GROUP BY clause:
– CUBE
– ROLLUP
CUBE : Relational Aggregate Operator
• N-dimensional generalization of simple aggregate functions
[Figure: The Data Cube and The Sub-Space Aggregates. Four panels of increasing dimensionality: Aggregate (a single Sum); Group By with total (By Color: RED, WHITE, BLUE, plus Sum); Cross Tab (By Make: Chevy, Ford; By Color; Sum); and the data cube over make (CHEVY, FORD), year (1990 to 1993) and color (RED, WHITE, BLUE), with sub-space aggregates By Make & Color, By Make & Year, By Color & Year, By Make, By Year, By Color and Sum]
CUBE : The Idea
• 0-dimensional Aggregate (sum(), max(), ...)
– a1, a2, ..., aN, f()
• Super-aggregate over 1-Dimensional sub-cubes
– ALL, a2, ..., aN, f()
– a1, ALL, a3, ..., aN, f()
– ...
– a1, a2, ..., ALL, f()
• Super-aggregate over 2-Dimensional sub-cubes
– ALL, ALL, a3, ..., aN, f()
– ...
– a1, a2, ..., ALL, ALL, f()
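The row patterns listed above can be enumerated mechanically: each of the N attributes is either kept or replaced by ALL, giving 2^N patterns. The sketch below is illustrative only; the function name all_patterns and the attribute names are hypothetical.

```python
from itertools import product

def all_patterns(attrs):
    """Yield every CUBE row pattern: each attribute is kept or replaced by ALL."""
    for mask in product((False, True), repeat=len(attrs)):
        yield tuple("ALL" if collapsed else a for a, collapsed in zip(attrs, mask))

patterns = list(all_patterns(["a1", "a2", "a3"]))

# Bucket the patterns by how many attributes were collapsed to ALL.
by_level = {}
for pattern in patterns:
    by_level.setdefault(pattern.count("ALL"), []).append(pattern)

for level in sorted(by_level):
    print(level, by_level[level])
```

For three attributes the levels hold 1, 3, 3 and 1 patterns; the last level is the single grand-total pattern in which every attribute is ALL.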
An Example
Chevy Sales Cross Tab
Chevy        1990  1991  1992  Total (ALL)
black          50    85   154          289
white          40   115   199          354
Total (ALL)    90   200   353          643
SELECT model, year, color, sum(sales) as sales
FROM sales
WHERE model in ('Chevy')
AND year BETWEEN 1990 AND 1992
GROUP BY CUBE (model, year, color);
CUBE Contd.
SELECT model, year, color, sum(sales) as sales
FROM sales
WHERE model in ('Chevy')
AND year BETWEEN 1990 AND 1992
GROUP BY CUBE (model, year, color);
• Computes union of 8 different groupings:
– {(model, year, color), (model, year), (model, color), (year, color), (model), (year), (color), ()}
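The "union of 8 groupings" reading can be tried on an engine without CUBE by generating one GROUP BY query per subset of the columns and gluing them with UNION ALL. This is a sketch of that emulation, not how a database implements CUBE. It assumes the SQLite engine bundled with Python does not accept GROUP BY CUBE, and it uses NULL for ALL. The helper names and the choice of sample rows (a subset of the deck's SALES rows) are assumptions.

```python
import sqlite3
from itertools import combinations

# Chevy rows plus a few Ford rows, so that the WHERE filter has an effect.
SALES = [
    ("Chevy", 1990, "red", 5), ("Chevy", 1990, "white", 87), ("Chevy", 1990, "blue", 62),
    ("Chevy", 1991, "red", 54), ("Chevy", 1991, "white", 95), ("Chevy", 1991, "blue", 49),
    ("Chevy", 1992, "red", 31), ("Chevy", 1992, "white", 54), ("Chevy", 1992, "blue", 71),
    ("Ford", 1990, "red", 64), ("Ford", 1990, "white", 62), ("Ford", 1990, "blue", 63),
]

def cube_query(cols, where):
    """Build one GROUP BY query per subset of cols, largest grouping first."""
    parts = []
    for k in range(len(cols), -1, -1):
        for kept in combinations(cols, k):
            # Columns left out of the grouping are reported as NULL, i.e. ALL.
            select = ", ".join(c if c in kept else f"NULL AS {c}" for c in cols)
            part = f"SELECT {select}, sum(sales) AS sales FROM sales WHERE {where}"
            if kept:
                part += " GROUP BY " + ", ".join(kept)
            parts.append(part)
    return parts

def run_cube():
    """Return the generated sub-queries and the rows of their UNION ALL."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE sales (model TEXT, year INTEGER, color TEXT, sales INTEGER)")
    con.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", SALES)
    parts = cube_query(("model", "year", "color"),
                       "model IN ('Chevy') AND year BETWEEN 1990 AND 1992")
    rows = con.execute(" UNION ALL ".join(parts)).fetchall()
    con.close()
    return parts, rows

parts, rows = run_cube()
print(len(parts), "groupings,", len(rows), "rows")
```

Writing these sub-queries by hand grows as 2^N in the number of grouping columns, which is the motivation for a single CUBE clause.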
Example Contd.
SALES
Model  Year  Color  Sales
Chevy  1990  red    5
Chevy  1990  white  87
Chevy  1990  blue   62
Chevy  1991  red    54
Chevy  1991  white  95
Chevy  1991  blue   49
Chevy  1992  red    31
Chevy  1992  white  54
Chevy  1992  blue   71
Ford   1990  red    64
Ford   1990  white  62
Ford   1990  blue   63
Ford   1991  red    52
Ford   1991  white  9
Ford   1991  blue   55
Ford   1992  red    27
Ford   1992  white  62
Ford   1992  blue   39

DATA CUBE (rows added by CUBE to the SALES rows above)
Model  Year  Color  Sales
ALL    ALL   ALL    941
Chevy  ALL   ALL    508
Ford   ALL   ALL    433
ALL    1990  ALL    343
ALL    1991  ALL    314
ALL    1992  ALL    284
ALL    ALL   red    233
ALL    ALL   white  369
ALL    ALL   blue   339
Chevy  1990  ALL    154
Chevy  1991  ALL    198
Chevy  1992  ALL    156
Ford   1990  ALL    189
Ford   1991  ALL    116
Ford   1992  ALL    128
Chevy  ALL   red    90
Chevy  ALL   white  236
Chevy  ALL   blue   182
Ford   ALL   red    143
Ford   ALL   white  133
Ford   ALL   blue   157
ALL    1990  red    69
ALL    1990  white  149
ALL    1990  blue   125
ALL    1991  red    106
ALL    1991  white  104
ALL    1991  blue   104
ALL    1992  red    58
ALL    1992  white  116
ALL    1992  blue   110
GROUPING Function
• SQL:1999 uses NULL to represent both ALL and regular null values
• GROUPING function
– Can be applied to an attribute
– Returns 1 if NULL value represents ALL
– Returns 0 in all other cases
GROUPING Example
SELECT model, year, color, sum(sales) as sales,
GROUPING(model) as model_flag,
GROUPING(year) as year_flag,
GROUPING(color) as color_flag
FROM sales
WHERE model in ('Chevy')
AND year BETWEEN 1990 AND 1992
GROUP BY CUBE (model, year, color);
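Why the flags matter can be shown with a pure-Python emulation: when a dimension value is itself NULL, a result row with NULL in that column is ambiguous unless a GROUPING-style flag accompanies it. The function cube_with_flags is a hypothetical helper, and the row with a missing color is invented for the example.

```python
from collections import defaultdict
from itertools import product

def cube_with_flags(rows, n_dims):
    """Sum the last field of each row over every grouping, with GROUPING flags.

    Returns {(dims..., flags...): total}. A flag of 1 means the None in that
    position stands for ALL; a flag of 0 means it is a regular value.
    """
    totals = defaultdict(int)
    for *dims, measure in rows:
        for flags in product((0, 1), repeat=n_dims):
            key = tuple(None if flag else dim for dim, flag in zip(dims, flags))
            totals[key + flags] += measure
    return dict(totals)

rows = [
    ("Chevy", 1990, "red", 5),
    ("Chevy", 1990, None, 7),    # hypothetical sale whose color was not recorded
    ("Chevy", 1991, "red", 54),
]
result = cube_with_flags(rows, 3)

# Same dimension values (Chevy, 1990, NULL), told apart only by the color flag.
print(result[("Chevy", 1990, None, 0, 0, 0)])  # color really is NULL
print(result[("Chevy", 1990, None, 0, 0, 1)])  # NULL stands for ALL colors
```

Without the trailing flags the two keys would collide, which is the ambiguity the GROUPING function resolves.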
Rollup and Drill down
• Allow analysts to view data at any desired level of granularity
• Rollup
– Operation of moving from finer-granularity data to a coarser granularity
• Drill Down
– Operation of moving from coarser-granularity data to a finer granularity
– Cannot be generated from coarse-granularity data
– Has to be computed from original data
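The asymmetry between the two operations can be sketched for sum(): a coarser summary is obtained by adding up a finer one, while the finer values cannot be recovered from the coarser totals. The helper roll_up is hypothetical and the approach assumes an additive aggregate such as sum; the figures are the Chevy 1990 and 1991 rows of the SALES example.

```python
from collections import defaultdict

def roll_up(summary, keep):
    """Aggregate {(dims...): total} to a coarser level keeping positions in keep."""
    coarser = defaultdict(int)
    for dims, total in summary.items():
        coarser[tuple(dims[i] for i in keep)] += total
    return dict(coarser)

by_model_year_color = {
    ("Chevy", 1990, "red"): 5, ("Chevy", 1990, "white"): 87, ("Chevy", 1990, "blue"): 62,
    ("Chevy", 1991, "red"): 54, ("Chevy", 1991, "white"): 95, ("Chevy", 1991, "blue"): 49,
}

# Roll up: drop color, then drop year. Each step needs only the previous summary.
by_model_year = roll_up(by_model_year_color, keep=(0, 1))
by_model = roll_up(by_model_year, keep=(0,))
print(by_model_year, by_model)

# Drill down has no such shortcut: the yearly totals in by_model_year do not
# say how they split across colors, so the finer level must come from the data.
```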
ROLLUP Operator
• Rollup example
SELECT model, year, color, sum(sales) as sales
FROM sales
WHERE model in ('Chevy')
AND year BETWEEN 1990 AND 1992
GROUP BY ROLLUP (model, year, color);
• Only 4 groupings are generated
– {(model, year, color), (model, year), (model), ()}
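The difference in the number of groupings follows from how each operator picks column sets: ROLLUP takes only the prefixes of the column list (N + 1 groupings), while CUBE takes every subset (2^N). A short sketch with hypothetical helper names:

```python
from itertools import combinations

def rollup_groupings(cols):
    """Groupings produced by ROLLUP(cols): every prefix, longest first."""
    return [tuple(cols[:k]) for k in range(len(cols), -1, -1)]

def cube_groupings(cols):
    """Groupings produced by CUBE(cols): every subset, largest first."""
    return [g for k in range(len(cols), -1, -1) for g in combinations(cols, k)]

cols = ("model", "year", "color")
print(rollup_groupings(cols))
print(len(rollup_groupings(cols)), "vs", len(cube_groupings(cols)))
```

Because only prefixes are used, the column order in ROLLUP matters: ROLLUP (color, year, model) would produce a different set of sub-totals.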
Summary
• SQL-92 has limited functionality to support OLAP operations
• SQL:1999 has introduced extensions to address these limitations
– Provides the CUBE and ROLLUP operators and the GROUPING function