Introduction to KDD for Tony's MI Course
Post on 13-May-2015
CogNovaTechnologies
COMP 3503
Deductive Modeling with OLAP
with Daniel L. Silver
Copyright (c) 2007, All Rights Reserved

Agenda
• What is OLAP?
• OLAP, MOLAP and ROLAP
• OLAP Functionality
• Overview of Cognos PowerPlay
• OLAP Pros and Cons

What is OLAP?

On-Line Analytical Processing (OLAP)
• Term coined by E.F. Codd in a document published in 1993, sponsored by Arbor Software Corp (ESSBASE)
• Redefined requirements for tools to implement decision support and business intelligence systems
• Has had a significant impact on the database and business software market

OLAP Definition
• Online Analytical Processing (OLAP) refers to technology that allows users of multidimensional databases to generate on-line descriptive or comparative summaries ("views") of data and other analytic queries.
• OLAP facilities can (and should) be integrated into enterprise-wide database systems. They allow analysts and managers to monitor the performance of the business (e.g., various aspects of the manufacturing process, or numbers and types of completed transactions at different locations) or the market.
Courtesy Anders Stjarne

Multidimensional Requirements
Example: Sales volume as a function of product, time, and geography.
[Figure: data cube with axes Product, Geography, Time]
Dimensions: Product, Geography, Time
Measure: ‘Sales Volume’
Courtesy Anders Stjarne
A data cube with more than three dimensions is referred to as a hypercube.
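As a minimal sketch of the idea above, a cube can be modelled as a mapping from one coordinate per dimension to a measure value. The product names, regions, quarters and volumes below are made up for illustration; they are not from the course material.

```python
from collections import defaultdict

# Hypothetical fact records: (product, geography, time, sales_volume).
FACTS = [
    ("Juice", "East", "Q1", 120),
    ("Juice", "West", "Q1", 80),
    ("Soda", "East", "Q1", 200),
    ("Soda", "East", "Q2", 150),
    ("Juice", "West", "Q2", 95),
]

def build_cube(facts):
    """Aggregate records into cells keyed by (product, geography, time)."""
    cube = defaultdict(int)
    for product, geography, time, volume in facts:
        cube[(product, geography, time)] += volume
    return dict(cube)

cube = build_cube(FACTS)
# Each cell holds the measure 'Sales Volume' for one combination of dimensions.
print(cube[("Soda", "East", "Q1")])
```

Adding a fourth coordinate to the key tuple (for example a customer channel) would turn the same structure into a hypercube.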

Deductive Modelling and Analysis
Comprehensive Sales Analysis
Question   Dimension (example)     Levels
When?      Time (1997)             Quarter, Month
Who?       Customers (Channels)    Type, Customer
What?      Product (Type)          Line, Brand, Number
Where?     Location (Region)       Country, Branch, Sales Rep
Result?    Indicator (Revenue)     Quantity, Cost, Margin
[Figure: two example selections of levels across these dimensions, labelled Combination 1 and Combination 2]
Courtesy Anders Stjarne

On-Line Analytical Processing
• Strong connection to the multi-dimensional database model - MOLAP
• Data-cubes are typically constructed off-line due to the time required to build indices
• Dimensions, values, and aggregations are limited to those within the data-cube
• On-line cube development has allowed RDBMS vendors to survive as major players in the OLAP market - ROLAP

On-Line Analytical Processing
12 Rules of an OLAP Environment by E.F. Codd
• Multi-dimensional - data-cubes or hypercubes
• Transparent access
• Navigation aids
• Consistent reporting
• Client-server based
• Generic dimensionality
• Efficient data storage
• Multi-user support
• Unrestricted cross-dimensional operations
• Intuitive data manipulation
• Flexible reporting
• Unlimited levels of aggregation

OLAP Functionality

On-Line Analytical Processing
Deductive Modeling with OLAP
• Model is developed within the user's mind as data is explored
• Verification or rejection is facilitated by multi-dimensional functions which display data numerically and graphically
• Best practices:
o Determine suspected variable interaction
o Verify/reject model through exploration
o Drill-down to refine model
o Maintain record of exploratory findings

On-Line Analytical Processing
Basic OLAP Functionality
• Dimension selection - slice & dice
• Rotation - allows change in perspective
• Filtration - value range selection
• Hierarchies of aggregation levels
o drill-downs to lower levels
o roll-ups to higher levels
Tremendous tool for decision support and executive information delivery and analysis

OLAP - Sample Operations
• Roll up: summarize data
o total sales volume last year by product category by region
• Roll down, drill down, drill through: go from higher level summary to lower level summary or detailed data
o For a particular product category, find the detailed sales data for each salesperson by date
• Slice and dice: select and project
o Sales of beverages in the West over the last 6 months
• Pivot or rotate: change visual dimensions
Courtesy Anders Stjarne
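The sample operations above can be sketched over a small in-memory table. This is only an illustration: the rows, field names and helper functions (`roll_up`, `slice_rows`, `pivot`) are hypothetical and are not taken from any OLAP product.

```python
from collections import defaultdict

# Hypothetical detail rows: (category, region, month, salesperson, sales).
ROWS = [
    ("Beverages", "West", "2024-01", "Ann", 100),
    ("Beverages", "West", "2024-02", "Bob", 150),
    ("Beverages", "East", "2024-01", "Cy", 90),
    ("Snacks", "West", "2024-01", "Ann", 60),
    ("Snacks", "East", "2024-02", "Cy", 40),
]
FIELDS = ("category", "region", "month", "salesperson")

def roll_up(rows, *dims):
    """Summarize sales by the named dimensions (fewer dims = higher level)."""
    idx = [FIELDS.index(d) for d in dims]
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[i] for i in idx)] += row[-1]
    return dict(totals)

def slice_rows(rows, **fixed):
    """Select the rows matching fixed dimension values, e.g. region='West'."""
    return [r for r in rows
            if all(r[FIELDS.index(k)] == v for k, v in fixed.items())]

def pivot(table):
    """Swap the two key positions of a two-dimensional summary."""
    return {(b, a): v for (a, b), v in table.items()}

by_cat_region = roll_up(ROWS, "category", "region")   # roll up
by_cat = roll_up(ROWS, "category")                    # roll up one level further
west_bev = slice_rows(ROWS, category="Beverages", region="West")  # slice and dice
detail = roll_up(west_bev, "salesperson", "month")    # drill down into the slice
by_region_cat = pivot(by_cat_region)                  # pivot / rotate
```

Drill-down is expressed here as re-aggregating the selected rows at a finer level, which is one simple way to read "go from higher level summary to lower level summary or detailed data".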

OLAP and Data Mining
• The final results from OLAP exploration can lead to inductive data mining
• Data Mining techniques can be applied to the data views and summaries generated by OLAP to provide more in-depth and often more multidimensional knowledge
• Data Mining techniques can be considered an analytic extension of OLAP

OLAP, MOLAP and ROLAP

OLAP Distributed Framework
OLAP functions are independent of:
• Front-end user interface
• Back-end data storage
[Figure labels: Front-end client tool (Web browser, Spread Sheet); OLAP Tool (server); Staged Multi-Dim Data “CUBE”; Populate Multi-Dim Data Structure in realtime (on the fly); Data Source: Data Mart]
Courtesy Anders Stjarne

MOLAP vs. ROLAP
Multidimensional
• difficulty handling sparsity efficiently
• direct representation of the data “cube”
• rapid drill down on summary data
• proprietary solutions
• better performance response
• does not scale well to handle large amounts of detail
• thin client, analytical processing done on server
Relational
• multidimensional view built on a Relational DBMS
• hampered by the limitations of SQL
• handles sparsity automatically
• stores summary and detail data equally easily
• easy to share common dimensions across DWs
• scales well using well-developed relational technology
• depends on efficient processing of STAR joins and indexes
• analytical processing done on the client (or middle server)
REF: White, “MOLAP vs ROLAP,” (B&A-15)
Courtesy Anders Stjarne
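To make the sparsity point concrete, here is a rough sketch contrasting a dense array (one slot per possible cell, in the spirit of MOLAP) with a fact table that stores one row per non-empty cell (in the spirit of ROLAP). The data is invented, and real products use far more sophisticated storage than either structure shown here.

```python
# Hypothetical dimension members; only 3 of the 3 * 3 * 4 = 36 cells hold data.
PRODUCTS = ["Juice", "Soda", "Water"]
REGIONS = ["East", "West", "North"]
QUARTERS = ["Q1", "Q2", "Q3", "Q4"]
SALES = {
    ("Juice", "East", "Q1"): 120,
    ("Soda", "West", "Q3"): 75,
    ("Water", "North", "Q4"): 30,
}

# MOLAP-style sketch: a dense 3-D array, empty cells still occupy a slot.
dense = [[[0] * len(QUARTERS) for _ in REGIONS] for _ in PRODUCTS]
for (p, r, q), v in SALES.items():
    dense[PRODUCTS.index(p)][REGIONS.index(r)][QUARTERS.index(q)] = v

# ROLAP-style sketch: a fact table with one row per non-empty cell.
fact_rows = [(p, r, q, v) for (p, r, q), v in SALES.items()]

def dense_lookup(p, r, q):
    """Direct addressing: compute the cell position from the coordinates."""
    return dense[PRODUCTS.index(p)][REGIONS.index(r)][QUARTERS.index(q)]

def relational_lookup(p, r, q):
    """Scan the fact rows, as an unindexed SQL query would."""
    return sum(v for (pp, rr, qq, v) in fact_rows if (pp, rr, qq) == (p, r, q))

dense_slots = len(PRODUCTS) * len(REGIONS) * len(QUARTERS)
stored_rows = len(fact_rows)
print(dense_slots, stored_rows)
```

The dense form answers a cell lookup by position, while the row form stays small when most combinations never occur but depends on joins and indexes for speed.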

Overview of Cognos PowerPlay OLAP

PowerPlay for Windows Components
PowerPlay includes the following components:
• Transformer
o Used to define the contents of a cube and create the cube
• PowerPlay
o Accesses cubes for data exploration and reporting.
Courtesy Anders Stjarne

PowerPlay Cubes
• A cube is a structure that stores data multi-dimensionally and provides:
o secure data access
o fast retrieval of data.
• Cubes can be distributed across a network or to individual computers.
[Figure: cube with dimensions Customers, Channels, Products, Locations, Sales Reps, Time]
Courtesy Anders Stjarne

Measures
• The numeric (continuous) data that is collected and stored by your organization.
• The performance measures used to evaluate your business.
• Examples:
o Revenue
o Cost
o Quantity sold
o Units on-hand
o Hours per Job
o Number of calls
o Defective units.
Basic and Derived measures: Revenue - Cost = Profit Margin
Courtesy Anders Stjarne
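A derived measure such as the profit margin above is computed from basic measures rather than stored. The sketch below assumes two made-up cube cells; the percentage form is an added illustration and does not appear on the slide.

```python
# Hypothetical basic measures for two cells of a cube.
cells = {
    ("Juice", "East"): {"revenue": 1200.0, "cost": 900.0},
    ("Soda", "West"): {"revenue": 800.0, "cost": 500.0},
}

def add_derived_measures(cell):
    """Return a copy of the cell with derived measures added."""
    out = dict(cell)
    # Derived measure from the slide: Revenue - Cost = Profit Margin.
    out["profit_margin"] = cell["revenue"] - cell["cost"]
    # Assumed extra: the same margin as a percentage of revenue.
    out["margin_pct"] = (
        100 * out["profit_margin"] / cell["revenue"] if cell["revenue"] else None
    )
    return out

derived = {key: add_derived_measures(cell) for key, cell in cells.items()}
```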

Dimensions and Levels
• Dimensions are a broad group of descriptive data about the major aspects of your business.
• Levels represent an established hierarchy within dimensions.
Question   Dimension    Levels
When?      Date         Years, Months, Days
What?      Products     Line, Type, Product
Where?     Locations    Region, Country, Branch
Courtesy Anders Stjarne

Levels and Categories
• A category is a data item that populates a level in a dimension.
Dimension: Locations
Level      Categories
Region     Europe
Country    United Kingdom
Branch     London, U.K.; Manchester, U.K.
Courtesy Anders Stjarne
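One way to picture levels and categories is as a parent map over the Locations example, with roll-up implemented by walking from each branch to its ancestor at the requested level. The revenue figures and the function names are hypothetical.

```python
# Levels of the Locations dimension, highest first.
LEVELS = ["Region", "Country", "Branch"]

# Each category maps to its parent category one level up (None at the top).
PARENT = {
    "Europe": None,
    "United Kingdom": "Europe",
    "London, U.K.": "United Kingdom",
    "Manchester, U.K.": "United Kingdom",
}

# Hypothetical branch-level revenue figures.
BRANCH_REVENUE = {"London, U.K.": 500, "Manchester, U.K.": 300}

def path(category):
    """Return the categories from the top level down to `category`."""
    chain = []
    while category is not None:
        chain.append(category)
        category = PARENT[category]
    return chain[::-1]

def roll_up_to(level):
    """Sum branch-level revenue into the categories at `level`."""
    depth = LEVELS.index(level)
    totals = {}
    for branch, value in BRANCH_REVENUE.items():
        key = path(branch)[depth]
        totals[key] = totals.get(key, 0) + value
    return totals

print(roll_up_to("Country"))
```

Drilling down is the reverse walk: from a category such as United Kingdom to the branches whose parent it is.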

Application Development Process
• Plan measures and dimensions
• Obtain the required data
• Develop the PowerPlay model
• Create the cube
• Explore the cube data using PowerPlay
Sales Management Example
Measures: Revenue, Units, Discounts, Quota
All Years: Years 2, Quarters 8, Months 24
National Sales Force: State 4, City 16, Store 72
All Products: Business Units 3, Product Lines 6, Brands 18, Products 125
Technician 158
Courtesy Anders Stjarne
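When planning measures and dimensions, a quick sizing estimate can be useful. The sketch below multiplies the lowest-level category counts from the Sales Management Example. It assumes the three hierarchies are Time, Sales Force and Products, and it leaves out the "Technician 158" figure because its place in the hierarchy is not clear from the transcript.

```python
from math import prod

# Category counts per level, as listed in the Sales Management Example.
DIMENSIONS = {
    "Time": {"Years": 2, "Quarters": 8, "Months": 24},
    "Sales Force": {"State": 4, "City": 16, "Store": 72},
    "Products": {"Business Units": 3, "Product Lines": 6, "Brands": 18, "Products": 125},
}
MEASURES = ["Revenue", "Units", "Discounts", "Quota"]

def lowest_level_cells(dimensions):
    """Number of possible cells at the most detailed level of every dimension."""
    return prod(list(levels.values())[-1] for levels in dimensions.values())

cells = lowest_level_cells(DIMENSIONS)   # 24 months * 72 stores * 125 products
values = cells * len(MEASURES)           # one value per measure per cell
print(cells, values)
```

This is an upper bound on detail cells; in practice many combinations are empty, which is where the sparsity handling discussed for MOLAP and ROLAP matters.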

Explorer and Reporter
PowerPlay offers two report modes:
• Explorer: investigate; replace categories
• Reporter: build custom reports; add categories
Courtesy Anders Stjarne

Explorer Crosstab Report
The default Explorer crosstab report contains:
• the first two dimensions in the rows and columns
• values for the first measure
• a summary row and column.
[Figure: crosstab report with labels Rows, Columns, Summary column, Summary row, Measures]
Courtesy Anders Stjarne
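The layout described above (two dimensions on rows and columns, one measure in the cells, plus a summary row and column) can be mimicked in a few lines. This is an illustrative sketch with invented records, not how PowerPlay itself builds the report.

```python
from collections import defaultdict

# Hypothetical records: (year, product_line, revenue).
RECORDS = [
    (2006, "Tents", 100), (2006, "Packs", 50),
    (2007, "Tents", 120), (2007, "Packs", 70), (2007, "Tents", 30),
]

def crosstab(records):
    """Return {row: {col: total}} with a 'Total' summary row and column."""
    table = defaultdict(lambda: defaultdict(int))
    for row, col, value in records:
        table[row][col] += value            # cell for the first measure
        table[row]["Total"] += value        # summary column
        table["Total"][col] += value        # summary row
        table["Total"]["Total"] += value    # grand total
    return {r: dict(cols) for r, cols in table.items()}

report = crosstab(RECORDS)
for row_label, cols in report.items():
    print(row_label, cols)
```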

PowerPlay Toolbar and Menus
• You can access commonly used features on the PowerPlay toolbar.
• PowerPlay menus offer extended features.
• Right-click a report to view and use the available options from a shortcut menu.
Courtesy Anders Stjarne

The Dimension Line
Use the dimension line to:
• filter data
• navigate dimensions and change measures
• view the current level.
Courtesy Anders Stjarne

Dimension Viewer
The dimension viewer is used to view the content and navigation paths of a selected cube, and the cube path.
The toolbox buttons provide access to commonly used features.
[Figure labels: Toolbox; Cube path; Dimension = Locations; Level 1 = States, Category = CA; Level 2 = Cities, Category = San Diego; Measures]
Courtesy Anders Stjarne

PowerPlay File Extensions
• .ppr, .ppx, .pdf for reports
• .mdc for cubes
Courtesy Anders Stjarne

Basic OLAP Operations
• Selection (Filter) – within the range of a dimension
• Scope – the range on a dimension
• Slice – a two dimensional ‘page’ from the cube
• Dice – chopping up along the dimensions
• Drill down analysis – to the detail beneath summary data
• Rollup / Consolidate
• Rotate (Pivot) – change dimension orientation
o Swap rows and columns
o Swap on or off
o Change nesting order
• Reach Through – to the source data detail
• Calculations / Derivation formulas on the measured facts
o Ratios, Rankings, etc.
o E.g., NetSales = GrossSales – Cost; NetSales = GrossSales*(1 - Margin)
REFS: INMON, Building, Ch. 7, p. 243; White, “MOLAP vs ROLAP,” (B&A-15)
Courtesy Anders Stjarne
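The calculation items in the list above (derivation formulas, ratios, rankings) might look like this when applied to a one-dimensional summary. The product lines and amounts are hypothetical; the NetSales formula is the first one given on the slide.

```python
# Hypothetical gross sales and cost by product line.
FACTS = {
    "Tents": {"gross_sales": 500.0, "cost": 300.0},
    "Packs": {"gross_sales": 300.0, "cost": 150.0},
    "Stoves": {"gross_sales": 200.0, "cost": 160.0},
}

def derive(facts):
    """Add NetSales, a share-of-total ratio and a rank to each category."""
    total = sum(f["gross_sales"] for f in facts.values())
    ranked = sorted(facts, key=lambda k: facts[k]["gross_sales"], reverse=True)
    out = {}
    for name, f in facts.items():
        out[name] = {
            "net_sales": f["gross_sales"] - f["cost"],  # NetSales = GrossSales - Cost
            "share": f["gross_sales"] / total,          # ratio to the overall total
            "rank": ranked.index(name) + 1,             # 1 = largest gross sales
        }
    return out

summary = derive(FACTS)
```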

Advanced OLAP Operations
• Trend analysis - over broad vistas of time
o handling time series data, time calculations
• Key ratio indicator measurement and tracking
• Comparisons - present to: past, plan, and others
o competitive market analysis
• Problem monitoring - of variables within control limits
• Alerts and Event-Driven Agent Processing
Courtesy Anders Stjarne
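As a small illustration of problem monitoring and alerts, the sketch below flags new values that fall outside control limits. The slide does not say how limits are set; mean plus or minus three sample standard deviations is an assumption made here, and the defect counts are invented.

```python
from statistics import mean, stdev

def control_limits(history, k=3.0):
    """Return (lower, upper) limits as mean +/- k sample standard deviations."""
    m, s = mean(history), stdev(history)
    return m - k * s, m + k * s

def alerts(history, new_values, k=3.0):
    """List (index, value) pairs from new_values that fall outside the limits."""
    lower, upper = control_limits(history, k)
    return [(i, v) for i, v in enumerate(new_values) if not lower <= v <= upper]

# Hypothetical monthly defect counts.
HISTORY = [10, 12, 11, 9, 10, 12, 11, 10]
NEW = [11, 25, 10]

flagged = alerts(HISTORY, NEW)
print(flagged)
```

An event-driven agent would run a check like this whenever the cube is refreshed and notify someone for each flagged value.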

OLAP Pros and Cons

On-Line Analytical Processing
Strengths of OLAP
• Powerful visualization ability via GUI
• Fast, interactive response times
• Analysis of time series
• Deductive discovery of clusters/exceptions
• Many OLAP products available and integrated to DB products

On-Line Analytical Processing
Weaknesses of OLAP
• Does not handle continuous variables
• Does not automatically discover patterns and models
• Generation of a hypercube requires some training and experience
• Hypercube generation and update - MOLAP vs. ROLAP

On-Line Analytical Processing
Products and Suppliers
• PC OLAP
o PowerPlay (Cognos)
• High-end ROLAP
o DSS Agent (Microstrategy)
o InfoBeacon (Platinum Technology)
• High-end MOLAP
o Accumate (Kenan)
o Oracle Express (Oracle)
o Wired/ESSBASE (AppSource/Arbor Software)

Tutorial
• Cognos Transformer and PowerPlay
• Star Schema – http://www.ciobriefings.com/whitepapers/StarSchema.asp

THE END
danny.silver@acadiau.ca

Codd’s 18 Rules for OLAP
BASIC FEATURES:
• Multidimensional Conceptual View (#1)
• Intuitive data manipulation (#10)
• Accessibility (#3) – OLAP server engine as middleware
• Batch Extraction & Interpretive (on the fly) – implies hybrid
• OLAP Analysis Models – categorical, exegetical, contemplative, goal seeking
• Client-Server Architecture (#5)
• Transparency (#2)
• Multi-User Support (#8) – concurrent access, and update, with security
SPECIAL FEATURES:
• Treatment of Non-Normalized Data
• Storing OLAP Results separate from Source Data
• Extraction of Missing Values – missing (NULL) distinct from zero
• Treatment of Missing Values – excluded from statistical calculations
REPORTING FEATURES:
• Flexible Reporting (#11) – laying out dimensions in any way
• Uniform Reporting Performance (#4) – not vary by #dimensions, or size
• Automatic Adjustment of Physical Level (#7) – adjust for sparsity, size
DIMENSION CONTROL:
• Generic Dimensionality (#6) – all dimensions treated uniformly
• Unlimited Dimensions & Aggregation Levels (#12)
• Unrestricted Cross-Dimensional Operations (#9)
Courtesy Anders Stjarne
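The two missing-value rules above can be shown with a short example: a missing (NULL) value is kept distinct from zero and is left out of a statistical calculation. The unit figures are made up, and `None` stands in for NULL.

```python
from statistics import mean

# Hypothetical monthly unit sales; None marks a missing (NULL) value.
UNITS = [10, 0, None, 20, None, 30]

def mean_excluding_missing(values):
    """Average only the known values; zeros count, missing values do not."""
    known = [v for v in values if v is not None]
    return mean(known) if known else None

def mean_missing_as_zero(values):
    """For contrast: treating missing values as zero pulls the average down."""
    return mean(0 if v is None else v for v in values)

print(mean_excluding_missing(UNITS), mean_missing_as_zero(UNITS))
```

The first function follows the rule; the second shows the distortion the rule is meant to prevent.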