Decision Support, Data Warehousing, and OLAP
Anindya Datta
Director, iXL Center for E-Commerce
Georgia Institute of Technology
Terminology: OLAP vs. OLTP
Data Warehousing Architecture
Technologies
Products
Research Issues
References
Decision Support and OLAP
Information technology to help the knowledge worker (executive, manager, analyst) make faster and better decisions.
• What were the sales volumes by region and product category for the last year?
• How did the share price of computer manufacturers correlate with quarterly profits over the past 10 years?
• Which orders should we fill to maximize revenues?
• Will a 10% discount increase sales volume sufficiently?
• Which of two new medications will result in the best outcome: higher recovery rate & shorter hospital stay?
On-Line Analytical Processing (OLAP) is an element of decision support systems (DSS).
Evolution
60’s: Batch reports
• hard to find and analyze information
• inflexible and expensive, reprogram every new request
70’s: Terminal-based DSS and EIS (executive information systems)
• still inflexible, not integrated with desktop tools
80’s: Desktop data access and analysis tools
• query tools, spreadsheets, GUIs
• easier to use, but only access operational databases
90’s: Data warehousing with integrated OLAP engines and tools
OLTP vs. OLAP
                     OLTP                      OLAP
User                 Clerk, IT Professional
Function             Day to day operations
DB Design
Data                                           Historical, Consolidated
View                                           Summarized, Multidimensional
Usage                                          Ad hoc
Unit of work                                   Complex query
Access                                         Read Mostly
Operations                                     Lots of Scans
# Records accessed                             Millions
# Users                                        Hundreds
DB size                                        100GB-TB
Metric                                         Query throughput, response
Data Warehouse
A decision support database that is maintained separately from the organization’s operational databases.
A data warehouse is a
• subject-oriented,
• integrated,
• time-varying,
• non-volatile
collection of data that is used primarily in organizational decision making.
Why Separate Data Warehouse?
Performance
• Op dbs designed & tuned for known txs & workloads.
• Complex OLAP queries would degrade perf. for op txs.
• Special data organization, access & implementation methods needed for multidimensional views & queries.
Function
• Missing data: Decision support requires historical data, which op dbs do not typically maintain.
• Data consolidation: Decision support requires consolidation (aggregation, summarization) of data from many heterogeneous sources: op dbs, external sources.
• Data quality: Different sources typically use inconsistent data representations, codes, and formats which have to be reconciled.
Data Warehousing Market
Hardware: servers, storage, clients
Warehouse DBMS
Tools
Market growing from
• $2B in 1995 to $8B in 1998 (Meta Group)
• $1.5B today to $6.9B in 1999 (Gartner Group)
Systems integration & Consulting
Already deployed in many industries: manufacturing,
Warehouse database server
• Almost always a relational DBMS; rarely flat files
OLAP servers
• Relational OLAP (ROLAP): extended relational DBMS that maps operations on multidimensional data to standard relational operations.
• Multidimensional OLAP (MOLAP): special purpose server that directly implements multidimensional data and operations.
Clients
• Query and reporting tools.
• Analysis tools
• Data mining tools (e.g., trend analysis, prediction)
Data Warehouse vs. Data Marts
Enterprise warehouse: collects all information about subjects (customers, products, sales, assets, personnel) that span the entire organization.
• Requires extensive business modeling
• May take years to design and build
Data Marts: Departmental subsets that focus on selected subjects: Marketing data mart: customer, products, sales.
• Faster roll out, but complex integration in the long run.
Virtual warehouse: views over operational dbs
• Materialize some summary views for efficient query processing
• Easier to build
• Requires excess capacity on operational db servers
Define architecture. Do capacity planning.
Integrate db and OLAP servers, storage and client tools.
Design warehouse schema, views.
Design physical warehouse organization: data placement, partitioning, access methods.
Connect sources: gateways, ODBC drivers, wrappers.
Design & implement scripts for data extract, load, refresh.
Define metadata and populate repository.
Design & implement end-user applications.
Roll out warehouse and applications.
Monitor the warehouse.
OLAP for Decision Support
Goal of OLAP is to support ad-hoc querying for the business analyst
Business analysts are familiar with spreadsheets
Extend spreadsheet analysis model to work with warehouse data
• Large data set
• Semantically enriched to understand business terms (e.g., time, geography)
• Combined with reporting features
Multidimensional view of data is the foundation of OLAP
Multidimensional Data Model
Database is a set of facts (points) in a multidimensional space
A fact has a measure dimension
• quantity that is analyzed, e.g., sale, budget
A set of dimensions on which data is analyzed
• e.g., store, product, date associated with a sale amount
Dimensions form a sparsely populated coordinate system
Each dimension has a set of attributes
• e.g., owner city and county of store
Attributes of a dimension may be related by partial order
• Hierarchy: e.g., street > county > city
[Figure: Sales volume as a function of time, city, and product. Date axis: 3/1, 3/2, 3/3, 3/4.]
Operations in Multidimensional Data Model
Aggregation (roll-up)
• dimension reduction: e.g., total sales by city
• summarization over aggregate hierarchy: e.g., total sales by city and year -> total sales by region and by year
Selection (slice) defines a subcube
• e.g., sales where city = Palo Alto and date = 1/15/96
Navigation to detailed data (drill-down)
• e.g., (sales - expense) by city, top 3% of cities by average
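The three operations can be sketched over a small in-memory fact list. This is only an illustration in plain Python: the fact rows, the column names, and the helper functions `roll_up` and `slice_cube` are hypothetical and do not come from any OLAP product.

```python
from collections import defaultdict

# Hypothetical fact rows: (city, region, product, date, sales).
# All names and numbers are made up for illustration.
FACTS = [
    ("Palo Alto", "West", "Cola", "1996-01-15", 10),
    ("Palo Alto", "West", "Milk", "1996-01-15", 5),
    ("Palo Alto", "West", "Cola", "1996-01-16", 7),
    ("San Jose", "West", "Cola", "1996-01-15", 12),
    ("New York", "East", "Milk", "1996-01-15", 30),
]
COLUMNS = ("city", "region", "product", "date", "sales")


def roll_up(facts, group_by):
    """Sum the sales measure over every dimension not listed in group_by."""
    idx = [COLUMNS.index(name) for name in group_by]
    totals = defaultdict(int)
    for row in facts:
        totals[tuple(row[i] for i in idx)] += row[-1]
    return dict(totals)


def slice_cube(facts, **conditions):
    """Keep only the facts whose dimension values match all conditions."""
    return [
        row for row in facts
        if all(row[COLUMNS.index(k)] == v for k, v in conditions.items())
    ]


sales_by_city = roll_up(FACTS, ["city"])      # dimension reduction
sales_by_region = roll_up(FACTS, ["region"])  # one level up the hierarchy
palo_alto_jan15 = slice_cube(FACTS, city="Palo Alto", date="1996-01-15")
# Drill-down: from the city total back to city-by-product detail.
palo_alto_detail = roll_up(slice_cube(FACTS, city="Palo Alto"), ["city", "product"])
```

Here drill-down is simply a roll-up at a finer grouping, which is why a server needs the detailed facts (or finer aggregates) to remain available.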
A Visual Operation: Pivot (Rotate)
[Figure: pivoting a sales cube. Axes: Month (Date: 3/1, 3/2, 3/3, 3/4), Region (NY, LA, SF), Product (Juice, Cola, Milk, Cream). Sample cell values: 10, 47, 30, 12.]
Approaches to OLAP Servers
Relational OLAP (ROLAP)
• Relational and Specialized Relational DBMS to store and manage warehouse data
• OLAP middleware to support missing pieces
– Optimize for each DBMS backend
Relational DBMS as Warehouse Server
Schema design
Specialized scan, indexing and join techniques
Handling of aggregate views (querying and materialization)
Supporting query language extensions beyond SQL
Complex query processing and optimization
Data partitioning and parallelism
A single fact table and a single table for each dimension
Every fact points to one tuple in each of the dimensions and has additional attributes
Does not capture hierarchies directly
Generated keys are used for performance and maintenance reasons
Fact constellation: Multiple fact tables that share many dimension tables
• Example: Projected expense and the actual expense may share dimensional tables
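The single-fact-table layout described above can be tried out with Python's built-in `sqlite3` module. This is a minimal sketch: the table names (`store`, `product`, `sales`), the columns, and the data are all assumptions chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables with generated (surrogate) integer keys.
    CREATE TABLE store   (store_key INTEGER PRIMARY KEY, city TEXT, region TEXT);
    CREATE TABLE product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
    -- Fact table: one foreign key per dimension plus the measure.
    CREATE TABLE sales (
        store_key   INTEGER REFERENCES store(store_key),
        product_key INTEGER REFERENCES product(product_key),
        amount      REAL
    );
    INSERT INTO store   VALUES (1, 'Palo Alto', 'West'), (2, 'New York', 'East');
    INSERT INTO product VALUES (1, 'Cola', 'Drinks'), (2, 'Milk', 'Dairy');
    INSERT INTO sales   VALUES (1, 1, 10.0), (1, 2, 5.0), (2, 1, 30.0), (2, 2, 12.0);
""")

# A typical star join: join the fact table to its dimensions, then aggregate.
rows = conn.execute("""
    SELECT s.region, p.category, SUM(f.amount)
    FROM sales f
    JOIN store s   ON s.store_key = f.store_key
    JOIN product p ON p.product_key = f.product_key
    GROUP BY s.region, p.category
    ORDER BY s.region, p.category
""").fetchall()
```

Note that `region` sits directly in the `store` table next to `city`: the city-to-region hierarchy is implied by the data rather than captured by the schema itself.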
Example of a Snowflake Schema
Fact Table: OrderNO, SalespersonID, CustomerNO, ProdNo, DateKey, CityName, Quantity, Total Price
Order: Order No, Order Date
Customer: Customer No, Customer Name, Customer Address, City
Salesperson: SalespersonID, SalespersonName, City, Quota
Product: ProductNO, ProdName, ProdDescr, Category, UnitPrice
Category: CategoryName, CategoryDescr
Date: DateKey, Date, Month
Month: Month, Year
Year: Year
City: CityName, State, Country
State: StateName, Country
Snowflake Schema
Represent dimensional hierarchy directly by normalizing the dimension tables
Easy to maintain
Saves storage, but it is alleged that it reduces effectiveness of browsing (Kimball)
Indexing Techniques
Exploiting indexes to reduce scanning of data is of crucial importance
Bitmap Indexes
Join Indexes
Other Issues
• Text indexing
• Parallelizing and sequencing of index builds and incremental updates
BitMap Indexes
An alternative representation of RID-list
Especially advantageous for low-cardinality domains
Represent each row of a table by a bit and the table as a bit vector
There is a distinct bit vector Bv for each value v of the domain
Example: the attribute sex has values M and F. A table of 100 million people needs 2 lists of 100 million bits
Bit Map Index
Base Table
Cust  Region  Rating
C1    N       H
C2    S       M
C3    W       L
C4    W       H
C5    S       L
C6    W       L
C7    N       H
Region Index
Row ID  N  S  E  W
1       1  0  0  0
2       0  1  0  0
3       0  0  0  1
4       0  0  0  1
5       0  1  0  0
6       0  0  0  1
7       1  0  0  0
Rating Index
Row ID  H  M  L
1       1  0  0
2       0  1  0
3       0  0  1
4       1  0  0
5       0  0  1
6       0  0  1
7       1  0  0
Customers where Region = W And Rating = 1
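A conjunctive query over two bitmap indexes reduces to a bitwise AND. The sketch below rebuilds both indexes from the slide's base table, storing each bit vector as a Python integer. The helper names are hypothetical, and the predicate Region = W AND Rating = L is chosen here purely for illustration.

```python
# Base table from the slide: (customer, region, rating).
ROWS = [
    ("C1", "N", "H"), ("C2", "S", "M"), ("C3", "W", "L"), ("C4", "W", "H"),
    ("C5", "S", "L"), ("C6", "W", "L"), ("C7", "N", "H"),
]


def build_bitmap_index(rows, column):
    """Return {value: int}, where bit i of the int is set if row i has that value."""
    index = {}
    for i, row in enumerate(rows):
        index[row[column]] = index.get(row[column], 0) | (1 << i)
    return index


def rows_from_bitmap(rows, bitmap):
    """Decode a bit vector back into the customer IDs it selects."""
    return [row[0] for i, row in enumerate(rows) if (bitmap >> i) & 1]


region_index = build_bitmap_index(ROWS, 1)
rating_index = build_bitmap_index(ROWS, 2)

# Region = W AND Rating = L becomes a single bitwise AND of two vectors.
matches = rows_from_bitmap(ROWS, region_index["W"] & rating_index["L"])
```

With this data the AND selects C3 and C6, the only customers that are both in region W and rated L.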
BitMap Indexes
Comparison, join and aggregation operations are reduced to bit arithmetic with dramatic improvement in processing time
Significant reduction in space and I/O (30:1)
Adapted for higher cardinality domains as well.
Compression (e.g., run-length encoding) exploited
Products that support bitmaps: Model 204, TargetIndex (Redbrick), IQ (Sybase), Oracle 7.3
Issues in Handling of Aggregate Views
Important component for ROLAP Servers
Representation in the context of star schema
Logic for Aggregation Navigation
• make optimum use of materialized aggregates to answer a query
Choice of aggregate views to materialize
HP Intelligent Warehouse pioneered some of the techniques
SQL Extensions for Front End Tools
Extended Family of Aggregate functions
• rank (top 10) and N-Tile (“top 30%” of all products)
Results of multiple group by:
• total sales by month and total sales by product
SQL gets in the way of sequential processing and columnar aggregations
• changes in total sale from 1994 to 1996, aggregated by brand
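To make the rank and "top 30%" aggregates concrete, here is a rough Python sketch of what a front-end tool has to compute when the query language lacks them. The function names and the sales figures are hypothetical, and `top_fraction` is a simplification of a true N-tile, which would assign every row to one of N buckets.

```python
def rank_desc(totals):
    """Rank items by total, highest first; ties share a rank."""
    ordered = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    ranks, prev_value, prev_rank = {}, None, 0
    for position, (name, value) in enumerate(ordered, start=1):
        rank = prev_rank if value == prev_value else position
        ranks[name] = rank
        prev_value, prev_rank = value, rank
    return ranks


def top_fraction(totals, fraction):
    """Return the names in the top `fraction` of items by total (0.3 for 'top 30%')."""
    ordered = sorted(totals, key=totals.get, reverse=True)
    keep = max(1, int(len(ordered) * fraction))
    return ordered[:keep]


# Hypothetical total sales per product.
SALES = {"Cola": 47, "Juice": 10, "Milk": 30, "Cream": 12, "Tea": 30}
ranks = rank_desc(SALES)
top_30_percent = top_fraction(SALES, 0.3)
```

Both functions need the rows in sorted order, which is the kind of sequential processing that is awkward to express in set-oriented SQL.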
Query Processing in MOLAP Servers
The storage model is an n-dimensional array
Front end multidimensional queries map to server capabilities in a straightforward way
Direct Addressing abilities
A straightforward array representation has good indexing properties but very poor storage utilization when the data is sparse
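The trade-off between direct addressing and storage utilization can be seen in a toy dense array. This sketch assumes a three-dimensional cube stored row-major in a flat Python list; the dimension members, the placement of the three facts, and the `offset` helper are all made up for illustration.

```python
# Hypothetical dimension members; a real cube would have far more.
CITIES = ["NY", "LA", "SF"]
PRODUCTS = ["Juice", "Cola", "Milk", "Cream"]
DATES = ["3/1", "3/2", "3/3", "3/4"]


def offset(city, product, date):
    """Map dimension coordinates to a position in a flat, row-major array."""
    c, p, d = CITIES.index(city), PRODUCTS.index(product), DATES.index(date)
    return (c * len(PRODUCTS) + p) * len(DATES) + d


# Dense storage: one cell per coordinate, whether or not a fact exists there.
cube = [None] * (len(CITIES) * len(PRODUCTS) * len(DATES))
cube[offset("NY", "Juice", "3/1")] = 10
cube[offset("LA", "Cola", "3/2")] = 47
cube[offset("SF", "Milk", "3/3")] = 30

# Direct addressing: a lookup is one arithmetic step, with no index search.
la_cola = cube[offset("LA", "Cola", "3/2")]

# Storage utilization: fraction of allocated cells that actually hold a fact.
utilization = sum(cell is not None for cell in cube) / len(cube)
```

With 3 facts in 48 cells, only 6.25% of the allocated storage is used, which is the sparsity problem the slide describes.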
Query Processing in MOLAP Servers
[Figure: 2-dimensional dense arrays indexed by B-Trees (traditional indexing structure).]
Refresh
• Propagate updates from sources to the warehouse
Data Cleaning
Why?
• Data warehouse contains data that is analyzed for business decisions
• More data and multiple sources could mean more errors in the data and harder to trace such errors
• Results in incorrect analysis
Detecting data anomalies and rectifying them early has huge payoffs
Important to identify tools that work together well
Long Term Solution
• Change business practices and data entry tools
• Repository for meta-data
Data Cleaning Techniques
Transformation Rules
• Example: translate “gender” to “sex”
Discover facts that flag unusual patterns (auditing)
• Some dealer has never received a single complaint
Issues:
• huge volumes of data to be loaded
• small time window (usually at night) when the warehouse can be taken off-line
• when to build indexes and summary tables
• allow system administrator to monitor status, cancel, suspend, resume load, or change load rate
• restart after failure with no loss of data integrity
Techniques:
• batch load utility: sort input records on clustering key and use sequential I/O; build indexes and derived tables
• sequential loads still too long (~100 days for TB)
• use parallelism and incremental techniques
Refresh
Issues:
• when to refresh
– on every update: too expensive, only necessary if OLAP queries need current data (e.g., up-to-the-minute stock quotes)
– periodically (e.g., every 24 hours, every week) or after “significant” events
– refresh policy set by administrator based on user needs and traffic
– possibly different policies for different sources
• how to refresh
Refresh Techniques
Full extract from base tables
• read entire source table or database: expensive
• may be the only choice for legacy databases or files.
Incremental techniques (related to work on active dbs)
• detect & propagate changes on base tables: replication servers (e.g., Sybase, Oracle, IBM Data Propagator)
– snapshots & triggers (Oracle)
– transaction shipping (Sybase)
• Logical correctness
– computing changes to star tables
– computing changes to derived and summary tables
– optimization: only significant changes
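Computing changes to a summary table incrementally, rather than by full extract, can be sketched as follows. This is an illustration only: the delta-log format, the `apply_deltas` helper, and the data are assumptions and do not describe how any of the replication products named above work.

```python
from collections import defaultdict

# Summary table kept in the warehouse: total sales per city (hypothetical data).
summary = defaultdict(float, {"Palo Alto": 22.0, "New York": 30.0})

# Changes captured at the source since the last refresh.
# Each entry is (operation, city, amount); the format is an assumption.
delta_log = [
    ("insert", "Palo Alto", 8.0),
    ("delete", "New York", 5.0),
    ("insert", "San Jose", 12.0),
]


def apply_deltas(summary, deltas):
    """Fold source changes into the summary without rescanning base tables."""
    for op, city, amount in deltas:
        if op == "insert":
            summary[city] += amount
        elif op == "delete":
            summary[city] -= amount
        else:
            raise ValueError(f"unknown operation: {op}")
    return summary


apply_deltas(summary, delta_log)
```

This works because SUM can be adjusted from the changed rows alone. Aggregates such as MIN or MAX cannot always be repaired after a delete without going back to the base data.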
Administrative metadata
• source databases and their contents
• gateway descriptions
• warehouse schema, view & derived data definitions
• dimensions, hierarchies
• pre-defined queries and reports
• data mart locations and contents
• data partitions
• data extraction, cleansing, transformation rules, defaults
• data refresh and purging rules
• user profiles, user groups
• security: user authorization, access control
Metadata Repository .. 2
Business data
• business terms and definitions
• ownership of data
• charging policies
Operational metadata
• data lineage: history of migrated data and sequence of
• Shipping metadata to and from RDBMS catalogue (e.g., Prism Warehouse Manager).
Planning & analysis tools
• impact of schema changes
• capacity planning
• refresh performance: changing refresh rates or time windows
Monitoring and reporting tools (e.g., HP Intelligent Warehouse Advisor)
• which partitions, summary tables, columns are used
• query execution times
• for summary tables, types & frequencies of roll downs
• warehouse usage over time (detect peak periods)
Systems and network management tools (e.g., HP OpenView, IBM NetView, Tivoli): traffic, utilization
Analysis/Visualization tools: OLAP on metadata
State of Commercial Practice
Products and Vendors [Datamation, May 15, 1996; R.C. Barquin, H.A. Edelstein: Planning and Designing the Data Warehouse. Prentice Hall. 1997]
• Connectivity to sources
– Apertus
– CA-Ingres Gateway
– Information Builders EDA/SQL
– IBM Data Joiner
– Informix Enterprise Gateway
– Microsoft ODBC
– Oracle Open Connect
– Platinum Infohub
– SAS Connect
– Software AG Entire
– Sybase Enterprise Connect
– Trinzic InfoHub
• Data extract, clean, transform, refresh
– CA-Ingres Replicator
– Carleton Passport
– Evolutionary Tech Inc. ETI-Extract
– Harte-Hanks Trillium
– IBM Data Joiner, Data Propagator
– Oracle 7
– Platinum InfoRefiner, InfoPump
– Praxis OmniReplicator
– Prism Warehouse Manager
– Redbrick TMU
– SAS Access
– Software AG Sourcepoint
– Sybase Replication Server
– Trinzic InfoPump
State of Commercial Practice ..2
• IBM DataHub, NetView
• Information Builder Site Analyzer
• Prism Warehouse Manager
• SAS CPE
• Tivoli
• Software AG Source Point
• Redbrick Enterprise Control and Coordination
Process Management
• AT&T TOPEND
• HP Intelligent Warehouse
• IBM FlowMark
• Platinum Repository
• Prism Warehouse Manager
• Software AG Source Point
Systems integration and consulting
Research Issues
Data cleaning
• focus on data inconsistencies, not schema differences
• data mining techniques
Physical Design
• design of summary tables, partitions, indexes
• tradeoffs in use of different indexes
Query processing
• dynamic optimization with feedback
• acid test for query optimization: cost estimation, use of transformations, search strategies
• partitioning query processing between OLAP server and backend server.