
Lecture 4

Feb 12, 2016


lisle

Page 1: Lecture 4

Lecture 4

Themes in this session

• How OLAP really works
• Enterprise data models for data warehousing
• Metadata

Page 2: Lecture 4

How OLAP really works

Page 3: Lecture 4

see demonstration….

Page 4: Lecture 4

Enterprise data models for data warehousing

Page 5: Lecture 4

From a 2-layer to a 3-layer information architecture

• Before (2-layer architecture):
  • Layer 1 - real-time data - run the business
  • Layer 2 - derived data - manage the business

• Suggested (3-layer architecture):
  • Layer 1 - real-time data
  • Layer 2 - reconciled data
  • Layer 3 - derived data
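
The three layers can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the two source systems, their field names and the figures are hypothetical and do not come from the lecture. It only shows the idea that reconciled data sits between real-time data and derived data.

```python
from collections import defaultdict

# Layer 1: real-time (operational) data. Two hypothetical source systems
# describe the same customers with different keys, units and field names.
billing_rows = [
    {"cust_no": "C-001", "amount": "120.50", "date": "2016-02-01"},
    {"cust_no": "C-002", "amount": "80.00", "date": "2016-02-03"},
]
web_shop_rows = [
    {"customer": "c001", "total_cents": 4550, "day": "2016-02-02"},
]


def reconcile(billing, web_shop):
    """Layer 2: bring both sources to one key format, one unit, one schema."""
    reconciled = []
    for row in billing:
        reconciled.append({
            "customer_id": row["cust_no"].replace("-", "").upper(),
            "amount": float(row["amount"]),
            "date": row["date"],
            "source": "billing",
        })
    for row in web_shop:
        reconciled.append({
            "customer_id": row["customer"].upper(),
            "amount": row["total_cents"] / 100,
            "date": row["day"],
            "source": "web_shop",
        })
    return reconciled


def derive_revenue_per_customer(reconciled):
    """Layer 3: derived data, here a simple total per customer."""
    totals = defaultdict(float)
    for row in reconciled:
        totals[row["customer_id"]] += row["amount"]
    return dict(totals)


reconciled_layer = reconcile(billing_rows, web_shop_rows)
derived_layer = derive_revenue_per_customer(reconciled_layer)
```

In a 2-layer architecture the derivation step would have to deal with the differing keys and units itself, once per derived data set; the reconciled layer does that work once.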

Page 6: Lecture 4

The failings of traditional data modelling in enterprises

Used only in business applications with well-defined boundaries and roles.

This means that:
– entities are generalised only to the extent needed within the boundaries
– it provides no support for integrating applications
– it provides no support for combining data from different sources

Page 7: Lecture 4

Enterprise data modelling

Modelling at the enterprise level, and not the operational level, aids the understanding needed for the reconciliation that operational data demands by:
– showing how different data sets interrelate
– showing the role of the different data sets in the business

This is achieved by:
– treating and modelling data entities at their most general level
– making all commonalities in business data visible and usable

Page 8: Lecture 4

Aims for enterprise data modelling

• Providing a single systems development base and promoting the integration of existing applications

• Supporting the sharing of data between different areas of the business

• Enabling effective management of data resources by providing a single set of consistent data divisions

• Supporting the establishment and maintenance of a company-wide comprehensive management information system

• Providing a structured methodology which allows business users to be involved in the implementation of business information strategies

Page 9: Lecture 4

The structure of the enterprise data model

The enterprise data model has 5 distinct layers:
– scope and architecture layer
– business data classifications
– generic entity relationship model
– logical applications view
– physical data design

Page 10: Lecture 4

The challenges of enterprise data modelling

• Has a very wide scope and is always very complex
• Demands input from all areas of the business
• Is very time consuming and constitutes a moving target
• Requires good management
• Must be planned so as to deliver value and gain momentum
• Requires access to skilled and knowledgeable business users; this is always in direct competition with operational priorities
• Is difficult to apply in application development situations

Page 11: Lecture 4

A strategy for enterprise data modelling

• Tackle problems by breaking them down and dealing with them piece by piece
• Use a layered structure with vertical subdivisions into business subject areas
• Employ a staged definition approach
• Employ a staged implementation approach

Page 12: Lecture 4

General steps in the creation of the enterprise data model

• Obtain a unified view of the data needed to run the company
  • should be sufficiently generic for all the sections of the business to accept
  • should be sufficiently detailed to allow reasonably independent subsets to be identified as the basis for further work
• All key entities that are commonly used across the enterprise must be identified with certainty
• Local key attributes must be given initial definitions
• Important relationships between key entities must be identified

Note: for many industries, pre-constructed industry models can be bought and customised

Page 13: Lecture 4

Modelling the Business Data Warehouse (reconciliation layer)

• Determine the vertical segment (depth and breadth) of the subject area which the BDW data model is to cover
  • this should include a set of strongly interrelated entities
• Choose which section of the Generic ERM (GERM) to model on the basis of the business units’ needs
• Develop a logical application view in which the entities in the GERM are customised in order to fit the purposes of the application
• Generate an optimal physical data design for the logical application view
• Note: the physical data design will nearly always differ from the actual design of the BDW because the legacy systems compromise the physical design of the BDW

Page 14: Lecture 4

Modelling the Business Information Warehouse (derivation layer)

• Identify end-user groups and ascertain their intentions and requirements for the use of information
• Select the relevant subset of the GERM from within the bounds of the BDW segment
• Identify any isolated data needs that fall outside the bounds of the BDW
• Create the Logical Application View for the BIW
• Create the Physical Data Design for the BIW
• Map the transformations between the reconciled and derived data models, stipulating how the data will be moved from the physical sources in the BDW to the BIW (Metadata)

Note: Sometimes it will be necessary to bypass the BDW and collect data directly from the real-time systems. This must be seen as a short-term solution and eliminated ASAP.
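
The mapping step can be pictured as metadata that drives the move from the BDW to the BIW. The following is a minimal sketch under stated assumptions: the column names, the transformation names and the `apply_mappings` helper are all hypothetical, chosen only to show the mapping kept as data rather than buried in program logic.

```python
# Hypothetical mapping metadata: one entry per BIW (derived) column, naming
# the BDW (reconciled) column it comes from and the rule applied on the way.
mappings = [
    {"target": "customer_id", "source": "cust_id", "transform": "copy"},
    {"target": "revenue_keur", "source": "revenue_eur", "transform": "divide_by_1000"},
]

# The rules the mapping metadata may refer to (illustrative only).
TRANSFORMS = {
    "copy": lambda value: value,
    "divide_by_1000": lambda value: value / 1000,
}


def apply_mappings(bdw_row, mappings):
    """Build one BIW row from one BDW row by following the mapping metadata."""
    return {
        m["target"]: TRANSFORMS[m["transform"]](bdw_row[m["source"]])
        for m in mappings
    }


bdw_row = {"cust_id": "C001", "revenue_eur": 12500.0}
biw_row = apply_mappings(bdw_row, mappings)
```

Because the mapping is data, the same records can also be published to users as documentation of where each derived figure comes from.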

Page 15: Lecture 4

Retrofitting the model

• Employ an approach of modifying operational applications at the same time as the BDW is evolved.

• Try to steer both the operational applications and the BDW towards the optimal form dictated by the model.

• The model serves as a goal towards which both the operational applications and BDW are fitted

Page 16: Lecture 4

The staged implementation of a data warehouse

• Stage 1 - define the high level enterprise model – (1-3 months)

• Stage 2 - model the subset intended for the BDW – (6-9 months)

• Stage 3 - model the first BIW – (1-2 months)

• Stage 4 - release initial versions and continue with BDW/BIW evolution

Note: the whole process of modelling, in parallel with an implementation program, may take a year to get off the ground!
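
As a quick check of the note, the stage estimates can be added up. This sketch assumes that the three modelling stages run strictly one after another, which the slide does not state.

```python
# (minimum, maximum) duration in months for the three modelling stages,
# taken from the stage list; running them back to back is an assumption.
stage_months = {
    "Stage 1 - high level enterprise model": (1, 3),
    "Stage 2 - BDW subset": (6, 9),
    "Stage 3 - first BIW": (1, 2),
}

min_total = sum(low for low, _ in stage_months.values())
max_total = sum(high for _, high in stage_months.values())

print(f"Modelling takes between {min_total} and {max_total} months")
```

Under that assumption the range is 8 to 14 months, which is consistent with "a year to get off the ground".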

Page 17: Lecture 4

Using pilot applications for Data Warehousing

In order to achieve a quicker ROI, a scaled-down version of the BDW can be released. The size of the BDW can coincide exactly with that of the first BIW.

• In order to prevent a fragmented evolution of the BDW, several guidelines must be followed:
– Use a pilot only once in any given business area
– The structure of the BDW/BIW should not be too highly optimised for performance
– A plan for the migration to a full-blown three-layer architecture should be delivered and approved before the delivery of the pilot

Page 18: Lecture 4

Approaches to building data warehouses and data marts with enterprise models

• The top-down approach
  • first develop an enterprise data warehouse from the enterprise data model, then follow this up with data marts until a multi-tier architecture is obtained
• The bottom-up approach
  • data marts grow in an unplanned way, in the hope that an enterprise data warehouse evolves on top of them after a while
• The hybrid approach
  • develop the enterprise model first; when the model is in place, begin building the enterprise warehouse and data marts in parallel

Page 19: Lecture 4

IFW - an example of an enterprise data model

[Figure: the IFW framework. Views: Organization View, Business View, Technical View. Column labels: Structure, Strategy, Skills, Data, Function, Workflow, Solution, Application, Network, System.]

Page 20: Lecture 4

IFW - the business view

Level   Data                         Function                 Workflow
A       Data Concepts                Function Concepts        Workflow Concepts
B       Classification Hierarchies   Hierarchies              Workflow Structures
C       Business Object ER Models    Business Object Models   State Transmission
D

Page 21: Lecture 4

The nine key data concepts of the FSDM - A-level

• Involved Party, e.g. customer, supplier
• Location, e.g. address
• Arrangement, e.g. contract
• Business Direction Item, e.g. goal, method
• Classification, e.g. account
• Condition, e.g. price, interest
• Product, e.g. article, service
• Event, e.g. payment
• Resource Item, e.g. price list, document

Page 22: Lecture 4

Classification hierarchies on the B-level of the FSDM

[Figure: classification hierarchy for Location. Legend: Value, SCHEME (values and schemes alternate down the hierarchy).]

Location (LOCATION TYPE)
  Geographic Area (GEOGRAPHIC AREA TYPE)
    Time Zone
    Country
    Postcode Area
    :
  Address (ADDRESS TYPE)
    Legal Address
    Internal Address
    Postal Address
    :

Page 23: Lecture 4

Generic ERD on the C-Level

[Figure: generic entity relationship diagram for Geographic Area.]

Entities: Location, Geographic Area, Time Zone, Geographic Area Type, Geographic Area Time Zone Difference, Geographic Area Name, Geographic Area Name Type, Geographic Area ID, Geographic Area ID Type

Relationship pairs: includes / is_included_in, classifies / is_classified_by, is_supertype_of / is_subtype_of
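
The relationship names in the diagram can be sketched as a small data structure. The way the relationships are wired here is an assumption, since the diagram itself is not recoverable from the transcript: a Geographic Area is taken to be classified by a Geographic Area Type and to include other Geographic Areas. The class layout and the sample values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class GeographicAreaType:
    """Classifies geographic areas (the 'classifies' side of the pair)."""
    name: str


@dataclass
class GeographicArea:
    area_id: str
    name: str
    is_classified_by: GeographicAreaType
    # The parent link is excluded from repr and comparison so that the
    # parent/child cycle cannot cause infinite recursion.
    is_included_in: Optional["GeographicArea"] = field(
        default=None, repr=False, compare=False
    )
    includes: List["GeographicArea"] = field(default_factory=list)

    def add(self, child: "GeographicArea") -> "GeographicArea":
        """Record both directions of the includes / is_included_in pair."""
        child.is_included_in = self
        self.includes.append(child)
        return child

    def path(self) -> List[str]:
        """Names from the outermost including area down to this one."""
        names = []
        node: Optional["GeographicArea"] = self
        while node is not None:
            names.append(node.name)
            node = node.is_included_in
        return list(reversed(names))


# Hypothetical sample data using two values from the B-level hierarchy.
country = GeographicArea("NO", "Norway", GeographicAreaType("Country"))
postcode_area = country.add(
    GeographicArea("0150", "Oslo 0150", GeographicAreaType("Postcode Area"))
)
```

Modelling "includes" once at the level of Geographic Area, rather than separately for countries, postcode areas and so on, is the kind of generalisation the enterprise model aims for.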

Page 24: Lecture 4

Metadata

Page 25: Lecture 4

What is metadata?

Data about data

Main functions are to give...
• data definitions
• the origin of data
• the structure of data
• rules for the selection and transfer of data
• qualitative and quantitative data about data
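
The functions listed above can be read as the fields of one metadata record. The sketch below is only an illustration: the `ElementMetadata` class, its field names and the sample values are hypothetical and are not taken from any particular metadata tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ElementMetadata:
    name: str
    definition: str      # data definition
    origin: str          # the origin of the data
    structure: str       # the structure of the data
    transfer_rule: str   # rule for selection and transfer
    row_count: int       # quantitative data about the data
    null_count: int      # a simple quality indicator


def profile(values):
    """Count rows and missing values for one data element."""
    return len(values), sum(1 for value in values if value is None)


amounts = [120.5, None, 80.0, 45.5]
row_count, null_count = profile(amounts)

amount_meta = ElementMetadata(
    name="amount",
    definition="Invoiced amount per sale",
    origin="billing system (hypothetical source)",
    structure="decimal number",
    transfer_rule="loaded nightly; cancelled sales excluded",
    row_count=row_count,
    null_count=null_count,
)
```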

Page 26: Lecture 4

Why is metadata needed?

• Increasing functionality of data warehouses
• Increasing size and complexity of data warehouses
• Increasing number of varied user groups
• Evolution of data warehouses and historical data analysis requirements

Users and developers need a better, more standardised way to document and communicate their knowledge of the warehouse, its rules and data sources

Page 27: Lecture 4

The metadata repository

A specialised database designed to maintain metadata together with tools and interfaces which allow the company to collect and distribute the data

• Is a combination of shared and local data about data

• Is the vital component in a distributed metadata architecture
– supports the distribution of sharable components
– supports the autonomy and control of unshared local components

Page 28: Lecture 4

The life cycle of metadata

• Collection
  • identify and capture metadata in a central repository
• Maintenance
  • establish processes to synchronise metadata automatically with the changing data structure
• Deployment
  • provide metadata to users in the right form and with the right tools
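
The maintenance step can be sketched as a check that compares what the repository records with the structure the database actually has. This is a minimal illustration using an in-memory SQLite database; the `sales` table, the `repository` dictionary and the `find_drift` helper are hypothetical.

```python
import sqlite3

# A stand-in for the warehouse: the table has gained a column ("region")
# that the metadata repository does not know about yet.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sale_id INTEGER, amount REAL, region TEXT)")

# Hypothetical repository content: table name -> {column name: declared type}
repository = {"sales": {"sale_id": "INTEGER", "amount": "REAL"}}


def actual_columns(conn, table):
    """Read column names and declared types from the database itself."""
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return {row[1]: row[2] for row in rows}  # row = (cid, name, type, ...)


def find_drift(conn, repository):
    """Report where the recorded metadata no longer matches the structure."""
    drift = {}
    for table, recorded in repository.items():
        actual = actual_columns(conn, table)
        undocumented = sorted(set(actual) - set(recorded))
        stale = sorted(set(recorded) - set(actual))
        type_changed = sorted(
            col for col in set(actual) & set(recorded)
            if actual[col] != recorded[col]
        )
        if undocumented or stale or type_changed:
            drift[table] = {
                "undocumented": undocumented,
                "stale": stale,
                "type_changed": type_changed,
            }
    return drift


drift = find_drift(conn, repository)
```

Run on a schedule, a check of this kind is one way to keep metadata synchronised automatically instead of relying on manual updates.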

Page 29: Lecture 4

Focus areas for the collection of metadata

• Warehouse data sources
  • physical data structures
  • business definitions of all data elements
  • platforms, data formats, update frequencies
• Data models
  • the logical and physical enterprise data model
• Warehouse data models
  • the logical and physical schemas for the data warehouse
• Warehouse mappings
  • between warehouse and operational data structures
• Warehouse usage information
  • who’s using the warehouse and how they’re using it
  • try and relate business problems and specific queries

Page 30: Lecture 4

Target groups for metadata deployment

• Warehouse developers
  • physical structure models for data sources
  • target physical data structures as they evolve
  • evolving mapping schemas
• Warehouse maintenance staff
  • monitor changes in the provision and utilisation environment and manage the effects of these changes on the DW
  • responsible for updating the metadata when the DW architecture is affected
  • ensure the capability of tracing changes
• End-users
  • aid exploration and understandability of information
  • validate information on the basis of source and quality
  • standard queries for specific business problems

Page 31: Lecture 4

Integration with data access tools

4 levels of possible integration are suggested:
• Side-by-side access
• Use a query tool to provide context-sensitive help texts
• Query tools specifically suited for accessing metadata itself
• Full interconnectivity between metadata tool and query tool
  • access to business query tools through metadata
  • transparent move from business query tools to metadata

Page 32: Lecture 4

Versioning of metadata and metadata maintenance

• DW always contains a long history of data in order to support analysis

• The time-specific context of the information has to be saved in order to explain the content

• Changes in the DW demand a new version of the metadata

• Parallel version management of the DW and metadata
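
Time-specific context can be sketched as a lookup of the metadata version that was valid on a given day. The version contents and dates below are hypothetical; the only assumption about the technique is that each version is valid from its start date until the next version begins.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical metadata versions, sorted by the date they became valid.
versions = [
    (date(2014, 1, 1), {"region_codes": "2-letter", "currency": "NOK"}),
    (date(2015, 7, 1), {"region_codes": "ISO 3166-2", "currency": "NOK"}),
]


def metadata_as_of(versions, day):
    """Return the metadata version that was valid on the given day."""
    starts = [start for start, _ in versions]
    index = bisect_right(starts, day) - 1
    if index < 0:
        raise LookupError(f"no metadata version is valid on {day}")
    return versions[index][1]


old_context = metadata_as_of(versions, date(2015, 3, 10))
new_context = metadata_as_of(versions, date(2015, 7, 1))
```

A query over historical data from March 2015 would then be explained with the region coding in force at that time, not with the current one.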