Top Banner
Introduction to OLAP and Dimensional Modelling Tuesday
29

Introduction to OLAP and Dimensional Modelling Tuesday.

Dec 11, 2015

Download

Documents

Devyn Volante
Welcome message from author
This document is posted to help you gain knowledge. Please leave a comment to let me know what you think about it! Share it to your friends and learn new things together.
Transcript
Page 1: Introduction to OLAP and Dimensional Modelling Tuesday.

Introduction to OLAP and Dimensional Modelling

Tuesday

Page 2: Introduction to OLAP and Dimensional Modelling Tuesday.

Overview: Tuesday

Format Time DescriptionLecture 10:00 - 10:45 Introduction to OLAP and

Analysis ServicesDemo 10:45 - 11:30 Dimensional modellingLab 12:15 - 13:00 Practical session: Defining a

data source and defining and deploying a cube

Lab 13:00 - 13:45 Practical session: Modifying measures, attributes and hierarchies

Lecture 14:30 - 15:15 Observations about design for OLAP and Reporting

Discussion 15:15 - 16:00 Wrap-up: questions and feedback

Page 3: Introduction to OLAP and Dimensional Modelling Tuesday.

Definition of OLAP

• Fast Analysis of Shared Multidimensional Information (‘FASMI’, Nigel Pendse)– Fast– Analysis (statistical and business logic)– Shared– Multidimensional– Information (all of the data and derived

information needed)

Page 4: Introduction to OLAP and Dimensional Modelling Tuesday.

Multidimensional

• “The system must provide a multidimensional conceptual view of the data, including full support for hierarchies and multiple hierarchies, as this is certainly the most logical way to analyze businesses and organizations.”

Page 5: Introduction to OLAP and Dimensional Modelling Tuesday.

Alternative definition of OLAP (from SAS)

• OLAP is "fast access to large amounts of summarized data".

• This implies the concept of dimensionality. For without dimensions, there would be nothing to summarize the data by.

• Alternative definition is that OLAP provides: "the ability of users to conveniently interrogate

large amounts of data, at varying levels of detail, across a variety of combinations of business dimensions"

Page 6: Introduction to OLAP and Dimensional Modelling Tuesday.

Kimball’s Four-Step Design Process

1. Select a business process

2. Declare the grain

3. Choose dimensions

4. Identify facts

Page 7: Introduction to OLAP and Dimensional Modelling Tuesday.

STEP 1: Select a business process

• For our exercise, we will be looking at Internet sales

Page 8: Introduction to OLAP and Dimensional Modelling Tuesday.

A Quick Look at the Data (1)

USE AdventureWorksDW;

SELECT TOP 5

CustomerKey,

ProductKey, OrderDateKey,

OrderQuantity

FROM FactInternetSales

Page 9: Introduction to OLAP and Dimensional Modelling Tuesday.

A Quick Look at the Data (2)

Customer Product OrderDate SalesAmount

Key Key Key

----------- ---------- ------------ ---------------------

11003 346 1 3399.99

14501 336 1 699.0982

21768 310 1 3578.27

25863 346 1 3399.99

28389 346 1 3399.99

Page 10: Introduction to OLAP and Dimensional Modelling Tuesday.

STEP 2: Declare the grain

• (What does a row in the fact table mean?)

• In our example, a row is an individual order.

• Design “rule”: recognise the trade-off.– A finer grain facilitates more detailed analysis,

but results in a larger quantity of data.

Page 11: Introduction to OLAP and Dimensional Modelling Tuesday.

STEP 3: Choose dimensions

• DimCustomer

• DimGeography

• DimProduct

• DimTime

Page 12: Introduction to OLAP and Dimensional Modelling Tuesday.

STEP 4: Identify facts

• The numeric facts that we will measure

• FactInternetSales

Page 13: Introduction to OLAP and Dimensional Modelling Tuesday.

DimCustomerCustomerKey

GeographyKey

CustomerAlternateKey

Title

FirstName

MiddleName

LastName

DimGeographyGeographyKey

City

StateProvinceCode

StateProvinceName

CountryRegionCode

EnglishCountryRegionName

SpanishCountryRegionNa...

FrenchCountryRegionName

DimProductProductKey

ProductAlternateKey

ProductSubcategoryKey

WeightUnitMeasureCode

SizeUnitMeasureCode

EnglishProductName

SpanishProductName

FrenchProductName

StandardCost

FinishedGoodsFlag

Color

SafetyStockLevel

ReorderPoint

ListPrice

DimTimeTimeKey

FullDateAlternateKey

DayNumberOfWeek

EnglishDayNameOfWeek

SpanishDayNameOfWeek

FrenchDayNameOfWeek

DayNumberOfMonth

FactInternetSalesProductKey

OrderDateKey

DueDateKey

ShipDateKey

CustomerKey

PromotionKey

CurrencyKey

SalesTerritoryKey

Page 14: Introduction to OLAP and Dimensional Modelling Tuesday.

Demo

• Dimensional modelling

Page 15: Introduction to OLAP and Dimensional Modelling Tuesday.

Lab

• Defining a data source view

• Defining and deploying a cube

Page 16: Introduction to OLAP and Dimensional Modelling Tuesday.

Lab

• Practical session: Modifying measures, attributes and hierarchies

Page 17: Introduction to OLAP and Dimensional Modelling Tuesday.

Lecture: Observations about design for OLAP and Reporting

Page 18: Introduction to OLAP and Dimensional Modelling Tuesday.

The BI Bottleneck (1)

• Report consumers– The report may be electronic, e.g. Excel

• Power users– Capable of some self-service

• Report authors– The know the data and the business.

• Reporting administrator– They know the database and the data, but not necessarily how it

relates to the business.

• Challenge: make reporting more interactive so that changes can be accommodated without passing along the chain

Page 19: Introduction to OLAP and Dimensional Modelling Tuesday.

The BI bottleneck (2)

• Typically, analyst’s time is the scarce resource.– The number of iterations is the killer.

• Sometimes, testing is the bottleneck.

• Possible solution: the analyst spends a bit more time in the first iteration providing the business user with a more generic/interactive report.

Page 20: Introduction to OLAP and Dimensional Modelling Tuesday.

The BI Bottleneck (3)

• Long lead times

• High development costs

• Apparently small changes to a requirement for a report take a long time to implement.

• For each link along the chain that a request for a change needs to go, delay goes up by a big factor.

Page 21: Introduction to OLAP and Dimensional Modelling Tuesday.

The Relational Model of Data

• Conceptually, homogeneous tabular structure:– Logic: for declarative query language– Algebra: for query optimization– Application interface (e.g. simple reporting tools).– Applications designers and even some end-users can

(just about) understand tables.

• Relational model provided a mutually intelligible language for implementers, administrators, developers, researchers and even users.

• Flexible: join anything with anything (c.f. OLAP).

Page 22: Introduction to OLAP and Dimensional Modelling Tuesday.

Inadequacy of the Relational Model for Reporting applications

• Heterogeneous data sources:– Database, OLAP, XML Web services, etc.

• Relational model does not fit well with the area between storage and presentation.– Aggregation hierarchies– Matrix structures

Page 23: Introduction to OLAP and Dimensional Modelling Tuesday.

The Microsoft approach: UDM

• Server Analysis Services 2005 implements UDM.

• Acts as a bridge between users and their data. • Encapsulates semantics, language and time. • UDM ‘perspectives’ allow the user to view

subsets. • Integrated with Data Mining. • Accessed via SOAP and XML for Analysis.

Page 24: Introduction to OLAP and Dimensional Modelling Tuesday.

UDM• A UDM provides a single dimensional model for all OLAP analysis and

relational reporting needs. So you can use either MDX or SQL • Perspectives are the new data marts • Cubes are largely transparent concepts downgraded to the status of caches • Commonly you’ll only have 1 cube with multiple measure groups and

multiple perspectives. • Its better to think of measure groups instead of cubes; partitions now apply

to measure groups. • Whilst a UDM can gather data from numerous data sources, the need to

cleanse data still requires a data warehouse. • A cube is structured around dimensional attributes (previously known as

member properties) rather than dimensional hierarchies. Hence the virtual dimension, as a term, is now gone and concept converted to a real, first class, dimension.

• UDM has five new dimension types, Role Playing, Fact, Reference, Data Mining and Many to many.

Page 25: Introduction to OLAP and Dimensional Modelling Tuesday.

The pre-UDM and UDM stacks

• Pre-UDM stack– Dimension model

(pivot table)– Calculations (Excel)– End-user model (if you

are lucky)– Data source view– Management settings

• UDM stack– Management settings– End-user model– Calculations– Dimensional model– Data source view

Page 26: Introduction to OLAP and Dimensional Modelling Tuesday.

Enterprise BI with UDM

DW

Datamart

Datamart

BI Applications

XML Web

Service

MOLAP

Reporting Tool (1)

OLAP Browser (2)

OLAP Browser (1)

Reporting Tool (1)UDM

Page 27: Introduction to OLAP and Dimensional Modelling Tuesday.

Desirable features of a BI data model

• The model must facilitate – re-use of report spare-parts by the power

users (rather than just the report authors).– more flexibility for report consumers– easier maintenance of the set of all reports

used by an enterprise. (E.g. Avoiding the reporting chain.)

– Interaction.

Page 28: Introduction to OLAP and Dimensional Modelling Tuesday.

Current design principles

• All about how to make reports look good.– See, for example, Microsoft SQL Server 2005

Report Design: Best Practices and Guidelines

• Some focus on maintenance.

• No focus on re-use.

Page 29: Introduction to OLAP and Dimensional Modelling Tuesday.

Wrap-Up

• Questions

• Feedback