
Integrating Data Mining with SQL Databases: OLE DB for Data ...

Jun 20, 2015


Tommy96
Transcript
Page 1: Integrating Data Mining with SQL Databases: OLE DB for Data ...

Integrating Data Mining with SQL Databases: OLE DB for Data Mining

Presenter: Lei Chen

Page 2: Integrating Data Mining with SQL Databases: OLE DB for Data ...

Overview

Background knowledge
Overview and design philosophy
Basic components
Operations on data model
Concluding remarks

Page 3: Integrating Data Mining with SQL Databases: OLE DB for Data ...

What is Data Mining?

Definition: Discovery of useful summaries of data.

Some Examples of Applications:
1. Decision trees constructed from bank-loan histories to produce algorithms that decide whether to grant a loan.
2. Patterns of traveler behavior mined to manage the sale of discounted seats on planes, rooms in hotels, etc.
3. "Diapers and Beer" in supermarkets for increasing sales.
4. Comparison of genotypes with and without a condition, allowing the discovery of genes that account for diseases.

Page 4: Integrating Data Mining with SQL Databases: OLE DB for Data ...

Why Do We Need OLE DB?

To address the following needs, Microsoft created OLE DB.

A vast amount of critical information is found outside traditional corporate databases: in file systems, in personal systems such as Microsoft Access, in spreadsheets, in e-mail, and even on the World Wide Web.

To take advantage of the benefits of database technology, such as queries, transactions, and security, businesses have traditionally had to move the data from its original containing system into a database management system (DBMS). This process is expensive and redundant.

Furthermore, businesses need to be able to exploit the advantages of database technology not only when accessing data within a DBMS but also when accessing data from any other type of information container.

Page 5: Integrating Data Mining with SQL Databases: OLE DB for Data ...

What is OLE DB?

Object Linking and Embedding for Databases (OLE DB) is a means Microsoft uses for accessing different types of data stores in a uniform manner. OLE DB is now an open technology, available royalty free on many operating systems.

OLE DB is a set of Component Object Model (COM) interfaces that provide applications with uniform access to data stored in diverse information sources and that also provide the ability to implement additional database services.

OLE DB is the way to access data in a Microsoft COM environment.

References: http://msdn.microsoft.com/library/default.asp?url=/library/en-us/oledb/htm/oledboverview_of_ole_db.asp

Page 6: Integrating Data Mining with SQL Databases: OLE DB for Data ...

Overview

Background knowledge
Overview and design philosophy
Basic components
Operations on data model
Concluding remarks

Page 7: Integrating Data Mining with SQL Databases: OLE DB for Data ...

Why OLE DB for Data Mining?

An industry standard is critical for data mining development, usage, interoperability, and exchange.

The objects, data types, properties, and programming constructs in OLE DB naturally cater to these needs.

Building mining applications over relational databases is nontrivial.
  Need different customized data mining algorithms and methods.
  Significant work on the part of application builders.

Data providers: all structured data sources, i.e., data that support OLE DB

Data consumers: all development tools or languages requiring access to a broad range of data

Page 8: Integrating Data Mining with SQL Databases: OLE DB for Data ...

Motivation of OLE DB for DM

Facilitate deployment of data mining models
  Generate data mining models
  Store, maintain and refresh models as data are updated
  Programmatically use the model on other data sets
  Browse models

Enable enterprise application developers to participate in building data mining solutions

Page 9: Integrating Data Mining with SQL Databases: OLE DB for Data ...

Features of OLE DB for DM

Independent of provider or software
Not specialized to any specific mining model
Structured to cater to all well-known mining models
Part of the upcoming release of Microsoft SQL Server 2000
Does not propose a new data mining algorithm, but suggests an infrastructure to "plug in" any algorithm.

Page 10: Integrating Data Mining with SQL Databases: OLE DB for Data ...

OLE DB for DM: Overview

Core relational engine exposes OLE DB in a language-based API
Analysis server exposes OLE DB OLAP and OLE DB DM
Maintain SQL metaphor
Reuse existing notions

[Figure: architecture diagram with the components RDB engine, OLE DB, Analysis Server, OLE DB DM, and data mining applications.]

Page 11: Integrating Data Mining with SQL Databases: OLE DB for Data ...

Key Operations to Support Data Mining Models

Define a mining model
  Attributes to be predicted
  Attributes to be used for prediction
  Algorithm used to build the model
Populate a mining model from training data
Predict attributes for new data
Browse a mining model for reporting and visualization

Page 12: Integrating Data Mining with SQL Databases: OLE DB for Data ...

Data Mining Model is Analogous to a Table in SQL

Create a data mining model object
  CREATE MINING MODEL [model_name]
Insert training data into the model and train it
  INSERT INTO [model_name]
Use the data mining model
  SELECT relation_name.[id], [model_name].[predict_attr]
  Consult DMM content in order to make predictions and browse statistics obtained by the model
Use DELETE to empty/reset
Predictions on datasets: prediction join between a model and a data set (tables)
Deploy DMM by just writing SQL queries!

Page 13: Integrating Data Mining with SQL Databases: OLE DB for Data ...

Overview

Background knowledge
Overview and design philosophy
Basic components
Operations on data model
Concluding remarks

Page 14: Integrating Data Mining with SQL Databases: OLE DB for Data ...

Two Basic Components Beyond Traditional OLE DB

Cases/caseset: input data
  A table or nested tables (for hierarchical data)
Data mining model (DMM): a special type of table
  A caseset is associated with a DMM and meta-info while creating a DMM
  Saves the mining algorithm and the resulting abstraction instead of the data itself
  Fundamental operations: CREATE, INSERT INTO, PREDICTION JOIN, SELECT, DELETE FROM, and DROP

Page 15: Integrating Data Mining with SQL Databases: OLE DB for Data ...

Flattened Representation of Caseset

Source tables:
  Customers: Customer ID, Gender, Hair Color, Age, Age Prob
  Car Ownership: Customer ID, Car, Car Prob
  Product Purchases: Customer ID, Product Name, Quantity, Product Type

Flattened caseset:

CID  Gend  Hair   Age  Age prob  Prod  Quan  Type  Car  Car prob
1    Male  Black  35   100%      TV    1     Elec  Car  100%
1    Male  Black  35   100%      VCR   1     Elec  Car  100%
1    Male  Black  35   100%      Ham   6     Food  Car  100%
1    Male  Black  35   100%      TV    1     Elec  Van  50%
1    Male  Black  35   100%      VCR   1     Elec  Van  50%
1    Male  Black  35   100%      Ham   6     Food  Van  50%

Problem: Lots of replication!
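The replication can be reproduced with a small sketch. The Python below is only an illustration, not part of OLE DB for DM: the in-memory tables and the `flatten` helper are hypothetical stand-ins for the three source tables and the flat join that produces the table above.

```python
from itertools import product

# Hypothetical in-memory copies of the rows for customer 1.
customer = {"cid": 1, "gender": "Male", "hair": "Black", "age": 35, "age_prob": 1.0}
purchases = [("TV", 1, "Elec"), ("VCR", 1, "Elec"), ("Ham", 6, "Food")]
cars = [("Car", 1.0), ("Van", 0.5)]


def flatten(customer, purchases, cars):
    """Pair the customer with every (purchase, car) combination, as a flat join does."""
    rows = []
    for (prod, quan, ptype), (car, car_prob) in product(purchases, cars):
        row = dict(customer)  # the customer columns are copied into every row
        row.update(prod=prod, quan=quan, type=ptype, car=car, car_prob=car_prob)
        rows.append(row)
    return rows


flat_rows = flatten(customer, purchases, cars)
print(len(flat_rows))  # 3 purchases x 2 cars gives 6 rows for a single customer
```

Every customer column is repeated in all six rows, and each purchase is repeated once per car, which is the redundancy the slide points out.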

Page 16: Integrating Data Mining with SQL Databases: OLE DB for Data ...

Logical Nested Table Representation of Caseset

Caseset: a set of cases.

Use Data Shaping Service to generate a hierarchical rowset.

                                   Product Purchases   Car Ownership
CID  Gend  Hair   Age  Age prob    Prod  Quan  Type    Car  Car prob
1    Male  Black  35   100%        TV    1     Elec    Car  100%
                                   VCR   1     Elec    Van  50%
                                   Ham   6     Food
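As a rough sketch of what a hierarchical rowset looks like, the Python below nests child rows under their parent row by relating a parent key to a child key. The `shape` helper, the sample rows, and the second customer are hypothetical; the code only mimics the idea of relating two rowsets and is not the Data Shaping Service itself.

```python
from collections import defaultdict

# Hypothetical sample rows; customer 2 is invented for illustration.
customers = [
    {"Customer ID": 1, "Gender": "Male", "Age": 35},
    {"Customer ID": 2, "Gender": "Female", "Age": 28},
]
sales = [
    {"CustID": 1, "Product Name": "TV", "Quantity": 1, "Product Type": "Elec"},
    {"CustID": 1, "Product Name": "VCR", "Quantity": 1, "Product Type": "Elec"},
    {"CustID": 1, "Product Name": "Ham", "Quantity": 6, "Product Type": "Food"},
    {"CustID": 2, "Product Name": "TV", "Quantity": 2, "Product Type": "Elec"},
]


def shape(parents, children, parent_key, child_key, nested_name):
    """Attach to each parent row the list of child rows whose key matches."""
    by_parent = defaultdict(list)
    for child in children:
        nested = {k: v for k, v in child.items() if k != child_key}
        by_parent[child[child_key]].append(nested)
    return [{**p, nested_name: by_parent.get(p[parent_key], [])} for p in parents]


caseset = shape(customers, sales, "Customer ID", "CustID", "Product Purchases")
for case in caseset:
    print(case["Customer ID"], len(case["Product Purchases"]))
```

Each case is stored once, with its purchases held in a nested table, so the customer columns are not replicated per purchase.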

Page 17: Integrating Data Mining with SQL Databases: OLE DB for Data ...

Defining a Data Mining Model

The name of the model
The algorithm and parameters
The algorithm for prediction using this model
The columns of the caseset and the relationships among columns
"Source columns" and "prediction columns"

Page 18: Integrating Data Mining with SQL Databases: OLE DB for Data ...

Example

CREATE MINING MODEL [Age Prediction]        %Name of Model
(
  [Customer ID] LONG KEY,                   %source column
  [Gender] TEXT DISCRETE,                   %source column
  [Age] DOUBLE DISCRETIZED() PREDICT,       %prediction column
  [Product Purchases] TABLE                 %source column
  (
    [Product Name] TEXT KEY,                %source column
    [Quantity] DOUBLE NORMAL CONTINUOUS,    %source column
    [Product Type] TEXT DISCRETE RELATED TO [Product Name]    %source column
  )
)
USING [Decision_Trees_101]                  %Mining algorithm used

Page 19: Integrating Data Mining with SQL Databases: OLE DB for Data ...

Content Type of Columns: Column Specifiers

KEY
ATTRIBUTE
RELATION (RELATED TO clause)
QUALIFIER (OF clause)
  PROBABILITY: [0, 1]
  VARIANCE
  SUPPORT
  PROBABILITY-VARIANCE
  ORDER
TABLE

Page 20: Integrating Data Mining with SQL Databases: OLE DB for Data ...

Content Type of Columns: Attribute Types

DISCRETE: "Area Code"
ORDERED: "a ranking of skill level" (say one to five)
CYCLICAL: "day of week"
CONTINUOUS: "salary"
DISCRETIZED: "Age"
SEQUENCE_TIME: a measurement for "time"
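To make the DISCRETIZED type concrete, here is a minimal Python sketch that maps a continuous age onto buckets. The cut points and helper names are assumptions made up for illustration; how a provider actually chooses buckets is up to the provider and is not described in these slides.

```python
from bisect import bisect_right

# Hypothetical cut points; a real provider picks its own discretization.
AGE_BOUNDARIES = [20, 40, 60]


def discretize(value, boundaries):
    """Return the index of the bucket that contains value."""
    return bisect_right(boundaries, value)


def bucket_label(index, boundaries):
    """Describe a bucket as a half-open interval."""
    if index == 0:
        return f"< {boundaries[0]}"
    if index == len(boundaries):
        return f">= {boundaries[-1]}"
    return f"[{boundaries[index - 1]}, {boundaries[index]})"


age_bucket = discretize(35, AGE_BOUNDARIES)
print(age_bucket, bucket_label(age_bucket, AGE_BOUNDARIES))
```

Once discretized, a model can treat "Age" like a discrete attribute with a small number of ordered states.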

Page 21: Integrating Data Mining with SQL Databases: OLE DB for Data ...

Overview

Background knowledge
Overview and design philosophy
Basic components
Operations on data model
Concluding remarks

Page 22: Integrating Data Mining with SQL Databases: OLE DB for Data ...

Populating a DMM

Once a mining model is defined, the next step is to populate it by consuming a caseset that satisfies the specification.

In other words, pull the information into a single rowset and use the INSERT INTO statement.

Consume a case using the data mining model.

Use the SHAPE statement to create the nested table from the input data.

Train the model using the data and the algorithm specified in the CREATE syntax.

Page 23: Integrating Data Mining with SQL Databases: OLE DB for Data ...

Example: Populating a DMM

INSERT INTO [Age Prediction]
(
  [Customer ID], [Gender], [Age],
  [Product Purchases]([Product Name], [Quantity], [Product Type])
)
SHAPE
  {SELECT [Customer ID], [Gender], [Age] FROM Customers ORDER BY [Customer ID]}
APPEND
  ({SELECT [CustID], [Product Name], [Quantity], [Product Type] FROM Sales ORDER BY [CustID]}
   RELATE [Customer ID] TO [CustID])
AS [Product Purchases]

Page 24: Integrating Data Mining with SQL Databases: OLE DB for Data ...

Using Data Model to Predict

Prediction join
  Prediction on dataset D using DMM M
  Different from the join statement in SQL
DMM: a "truth table"
The SELECT statement associated with PREDICTION JOIN specifies the values extracted from the DMM
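The "truth table" metaphor can be sketched as a lookup. In the Python below, the toy `model` dictionary, the sample cases, and the `prediction_join` helper are all hypothetical; a real DMM computes predictions with its mining algorithm instead of storing one row per input combination, so this only illustrates the idea of matching new cases against the model on its source columns.

```python
# Toy stand-in for a trained model: source value -> predicted value (hypothetical).
model = {"Male": "[20, 40)", "Female": "[40, 60)"}

# New cases to predict on; the IDs and values are invented for illustration.
new_cases = [
    {"Customer ID": 7, "Gender": "Male"},
    {"Customer ID": 8, "Gender": "Female"},
    {"Customer ID": 9, "Gender": "Unknown"},
]


def prediction_join(model, cases, source_column, predicted_column):
    """Look up each case in the model and return its key plus the predicted value."""
    results = []
    for case in cases:
        predicted = model.get(case[source_column])  # None if the model has no entry
        results.append({"Customer ID": case["Customer ID"], predicted_column: predicted})
    return results


predictions = prediction_join(model, new_cases, "Gender", "Age")
for row in predictions:
    print(row)
```

Unlike an ordinary SQL join, the result column comes from the model's prediction for the matched inputs, not from stored rows.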

Page 25: Integrating Data Mining with SQL Databases: OLE DB for Data ...

Example: Using a DMM in Prediction

SELECT t.[Customer ID], [Age Prediction].[Age]
FROM [Age Prediction]
PREDICTION JOIN
  (SHAPE
     {SELECT [Customer ID], [Gender] FROM Customers ORDER BY [Customer ID]}
   APPEND
     ({SELECT [CustID], [Product Name], [Quantity] FROM Sales ORDER BY [CustID]}
      RELATE [Customer ID] TO [CustID])
   AS [Product Purchases])
AS t
ON [Age Prediction].[Gender] = t.[Gender] AND
   [Age Prediction].[Product Purchases].[Product Name] = t.[Product Purchases].[Product Name] AND
   [Age Prediction].[Product Purchases].[Quantity] = t.[Product Purchases].[Quantity]

Page 26: Integrating Data Mining with SQL Databases: OLE DB for Data ...

Browsing DMM

What is in a DMM?
  Rules, formulas, trees, etc.
Browsing DMM
  Visualization

Page 27: Integrating Data Mining with SQL Databases: OLE DB for Data ...

Concluding Remarks

This paper focuses on the problem of integrating data mining with relational databases, rather than on KDD algorithms.

OLE DB for DM integrates data mining and database systems
  A good standard for mining application builders
Possible improvement
  Extend the functionalities of OLE DB DM?
  Design more concrete language primitives?

Page 28: Integrating Data Mining with SQL Databases: OLE DB for Data ...

Finally, See Why OLE DB for Data Mining?

Disadvantages of other data mining languages: they cannot deal with either arbitrary mining models or the integration of the relational database API with mining applications. So Microsoft proposes OLE DB for DM.

DMQL: A Data Mining Query Language for Relational Databases
MSQL (Imielinski & Virmani '99)
MineRule (Meo, Psaila and Ceri '96)
Query flocks based on Datalog syntax (Tsur et al. '98)
CRISP-DM (CRoss-Industry Standard Process for Data Mining)

Page 29: Integrating Data Mining with SQL Databases: OLE DB for Data ...

Questions and comments?

Thank You!

Page 30: Integrating Data Mining with SQL Databases: OLE DB for Data ...

OLE DB (sometimes written as OLEDB or OLE-DB)

OLE DB (sometimes written as OLEDB or OLE-DB), Object Linking and Embedding for Databases, is a means Microsoft uses for accessing different types of data stores in a uniform manner. Microsoft has separated the data store from the application that needs access to it through the use of this technology. This was done because different applications need access to different types and sources of data and do not necessarily want to know how to access functionality with technology-specific methods. OLE DB is conceptually divided into consumers and providers. The consumers are the applications that need access to the data, and the provider is the software component that exposes an OLE DB interface through the use of the Component Object Model (or COM).

OLE DB is part of the MDAC (Microsoft Data Access Components) stack and is the database access interface technology. MDAC is a group of Microsoft technologies that interact together as a framework that allows programmers a uniform and comprehensive way of developing applications for accessing almost any data store. OLE DB providers can be created to access data stores as simple as a text file or spreadsheet, through to databases as complex as Oracle and SQL Server. However, because different data store technologies can have different capabilities, OLE DB providers may not implement every possible interface available to OLE DB. The capabilities that are available are implemented through the use of COM objects: an OLE DB provider maps the data store technology's functionality to a particular COM interface. Microsoft describes the availability of an interface as "provider-specific", as it may not be applicable depending on the database technology involved. Providers may also augment the capabilities of a data store; these capabilities are known as services in Microsoft parlance.

Page 31: Integrating Data Mining with SQL Databases: OLE DB for Data ...

ADO

Microsoft ADO (ActiveX Data Objects) is a Component Object Model (COM) object for accessing data sources. It provides a layer between programming languages and OLE DB (a means of accessing data stores, whether they be databases or otherwise, in a uniform manner), which allows a developer to write programs that access data without knowing how the database is implemented. You need to be aware of your database only for the connection. No knowledge of SQL is required to access a database when using ADO, although one can use ADO to execute arbitrary SQL commands. The disadvantage of executing SQL commands directly is that it introduces a dependency on the database.

Page 32: Integrating Data Mining with SQL Databases: OLE DB for Data ...

Component Object Model

Component Object Model, or COM, is a Microsoft platform for software componentry. It is used to enable cross-application communication and dynamic object creation in any programming language that supports the technology. COM is often used in the software development world as an umbrella term that encompasses the OLE, ActiveX, COM+ and DCOM technologies. COM has been around since 1993; however, Microsoft only really started emphasizing the name around 1997.

Although it has been implemented on several platforms, it is primarily used with Microsoft Windows. COM is expected to be replaced to at least some extent by the Microsoft .NET framework.

Page 33: Integrating Data Mining with SQL Databases: OLE DB for Data ...

Component Object Model

Component Object Model, or COM, is a Microsoft platform for software componentry. It is used to enable cross-application communication and dynamic object creation in any programming language that supports the technology. COM is often used in the software development world as an umbrella term that encompasses the OLE, ActiveX, COM+ and DCOM technologies.

Page 34: Integrating Data Mining with SQL Databases: OLE DB for Data ...

Component Object Model

The object model for all Windows software
Proven, de facto industry standard
Becoming available on non-Windows platforms
  Unix via Bristol and Mainsoft products
  Software AG has 18 ports in progress for Unix and mainframe platforms

Page 35: Integrating Data Mining with SQL Databases: OLE DB for Data ...

Data warehouse

A data warehouse is, primarily, a record of an enterprise's past transactional and operational information, stored in a database designed to favour efficient data analysis and reporting (especially OLAP). Data warehousing is not meant for current "live" data.

Data warehouses often hold large amounts of information, which are sometimes subdivided into smaller logical units called dependent data marts.

Usually, two basic ideas guide the creation of a data warehouse:
  Integration of data from distributed and differently structured databases, which facilitates a global overview and comprehensive analysis in the data warehouse.
  Separation of data used in daily operations from data used in the data warehouse for purposes of reporting, decision support, analysis and controlling.

Periodically, one imports data from enterprise resource planning (ERP) systems and other related business software systems into the data warehouse for further processing. It is common practice to "stage" data prior to merging it into a data warehouse. In this sense, to "stage data" means to queue it for preprocessing, usually with an ETL tool. The preprocessing program reads the staged data (often a business's primary OLTP databases), performs qualitative preprocessing or filtering (including denormalization, if deemed necessary), and writes it into the warehouse.

Page 36: Integrating Data Mining with SQL Databases: OLE DB for Data ...

ODBC

(Pronounced as separate letters.) Short for Open DataBase Connectivity, a standard database access method developed by the SQL Access group in 1992. The goal of ODBC is to make it possible to access any data from any application, regardless of which database management system (DBMS) is handling the data. ODBC manages this by inserting a middle layer, called a database driver, between an application and the DBMS. The purpose of this layer is to translate the application's data queries into commands that the DBMS understands. For this to work, both the application and the DBMS must be ODBC-compliant; that is, the application must be capable of issuing ODBC commands and the DBMS must be capable of responding to them. Since version 2.0, the standard supports SAG SQL.

Page 37: Integrating Data Mining with SQL Databases: OLE DB for Data ...

OLE DB

Object Linking and Embedding for Databases is a means Microsoft uses for accessing different types of data stores in a uniform manner. Microsoft has separated the data store from the application that needs access to it through the use of this technology. This was done because different applications need access to different types and sources of data and do not necessarily want to know how to access functionality with technology-specific methods. OLE DB is conceptually divided into consumers and providers. The consumers are the applications that need access to the data, and the provider is the software component that exposes an OLE DB interface through the use of the Component Object Model (or COM).