June 10, 2022 Data Mining: Concepts and Techniques
Data Mining: Concepts and Techniques
Appendix A: An Introduction to Microsoft's OLE DB for Data Mining
- Introduction
- Overview and design philosophy
- Basic components
- Data set components
- Data mining models
- Operations on data model
- Concluding remarks
Why OLE DB for Data Mining?
- An industry standard is critical for data mining development, usage, interoperability, and exchange
- OLE DB for DM is a natural evolution from OLE DB and OLE DB for OLAP
- Building mining applications over relational databases is nontrivial
  - Need different customized data mining algorithms and methods
  - Significant work on the part of application builders
- Goal: ease the burden of developing mining applications in large relational databases
Motivation of OLE DB for DM
- Facilitate deployment of data mining models
  - Generate data mining models
  - Store, maintain, and refresh models as data is updated
  - Programmatically use the model on other data sets
  - Browse models
- Enable enterprise application developers to participate in building data mining solutions
Features of OLE DB for DM
- Independent of provider or software
- Not specialized to any specific mining model
- Structured to cater to all well-known mining models
- Part of upcoming release of Microsoft SQL Server 2000
Overview
- Core relational engine exposes OLE DB in a language-based API
- Analysis server exposes OLE DB OLAP and OLE DB DM
- Maintain SQL metaphor
- Reuse existing notions

[Figure: diagram with boxes labeled RDB engine, OLE DB, Analysis Server, OLE DB OLAP/DM, and Data mining applications]
Key Operations to Support Data Mining Models
- Define a mining model
  - Attributes to be predicted
  - Attributes to be used for prediction
  - Algorithm used to build the model
- Populate a mining model from training data
- Predict attributes for new data
- Browse a mining model for reporting and visualization
DMM As Analogous to A Table in SQL
- Create a data mining model object
  CREATE MINING MODEL [model_name]
- Insert training data into the model and train it
  INSERT INTO [model_name]
- Use the data mining model
  SELECT relation_name.[id], [model_name].[predict_attr]
  - Consult DMM content in order to make predictions and browse statistics obtained by the model
- Use DELETE to empty/reset the model
- Predictions on datasets: prediction join between a model and a data set (tables)
- Deploy DMM by just writing SQL queries!
Two Basic Components
- Cases/caseset: input data
  - A table or nested tables (for hierarchical data)
- Data mining model (DMM): a special type of table
  - A caseset is associated with a DMM and meta-info while creating a DMM
  - Save mining algorithm and resulting abstraction instead of data itself
  - Fundamental operations: CREATE, INSERT INTO, PREDICTION JOIN, SELECT, DELETE FROM, and DROP
Flattened Representation of Caseset
Problem: Lots of replication!
Source tables:
Customers: Customer ID, Gender, Hair Color, Age, Age Prob
Car Ownership: Customer ID, Car, Car Prob
Product Purchases: Customer ID, Product Name, Quantity, Product Type

Flattened caseset:
CID  Gend  Hair   Age  Age prob  Prod  Quan  Type  Car  Car prob
1    Male  Black  35   100%      TV    1     Elec  Car  100%
1    Male  Black  35   100%      VCR   1     Elec  Car  100%
1    Male  Black  35   100%      Ham   6     Food  Car  100%
1    Male  Black  35   100%      TV    1     Elec  Van  50%
1    Male  Black  35   100%      VCR   1     Elec  Van  50%
1    Male  Black  35   100%      Ham   6     Food  Van  50%
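The replication comes from joining every purchase row with every car row of the same customer. The following is a minimal Python sketch of that effect, assuming the values from the example above; the variable names (`customer`, `purchases`, `cars`, `flat_rows`) are illustrative and not part of OLE DB for DM.

```python
from itertools import product

# Source data for customer 1, copied from the slide's example.
customer = {"CID": 1, "Gend": "Male", "Hair": "Black", "Age": 35, "Age prob": "100%"}
purchases = [
    {"Prod": "TV", "Quan": 1, "Type": "Elec"},
    {"Prod": "VCR", "Quan": 1, "Type": "Elec"},
    {"Prod": "Ham", "Quan": 6, "Type": "Food"},
]
cars = [
    {"Car": "Car", "Car prob": "100%"},
    {"Car": "Van", "Car prob": "50%"},
]

# Flattening pairs every car row with every purchase row,
# and repeats the customer's own attributes on each output row.
flat_rows = [{**customer, **p, **c} for c, p in product(cars, purchases)]

for row in flat_rows:
    print(row)

# 3 purchases x 2 cars = 6 flattened rows for a single customer.
print(len(flat_rows))
```

The row count grows with the product of the child-table sizes, which is the replication the slide warns about.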
Logical Nested Table Representation of Caseset
- Use Data Shaping Service to generate a hierarchical rowset
  - Part of Microsoft Data Access Components (MDAC) products

                                   Product Purchases   Car Ownership
CID  Gend  Hair   Age  Age prob    Prod  Quan  Type    Car  Car prob
1    Male  Black  35   100%        TV    1     Elec    Car  100%
                                   VCR   1     Elec    Van  50%
                                   Ham   6     Food
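As a rough illustration of the nested form, the Python sketch below groups child rows under their parent customer so that each case is one record holding two nested tables. It only mimics the shape of a hierarchical rowset; the function `build_cases` and the tuple layouts are assumptions made for this example, not the Data Shaping Service API.

```python
# Relational source rows, using the values from the slide's example.
customers = [(1, "Male", "Black", 35, 1.0)]           # id, gender, hair, age, age prob
sales = [(1, "TV", 1, "Elec"), (1, "VCR", 1, "Elec"), (1, "Ham", 6, "Food")]
car_ownership = [(1, "Car", 1.0), (1, "Van", 0.5)]    # id, car, car prob


def build_cases(customers, sales, car_ownership):
    """Return one nested case per customer (hypothetical helper)."""
    cases = []
    for cid, gender, hair, age, age_prob in customers:
        cases.append({
            "Customer ID": cid,
            "Gender": gender,
            "Hair Color": hair,
            "Age": age,
            "Age Prob": age_prob,
            # Nested table: only the purchase rows related to this customer.
            "Product Purchases": [
                {"Product Name": name, "Quantity": qty, "Product Type": ptype}
                for (c, name, qty, ptype) in sales if c == cid
            ],
            # Nested table: only the car rows related to this customer.
            "Car Ownership": [
                {"Car": car, "Car Prob": prob}
                for (c, car, prob) in car_ownership if c == cid
            ],
        })
    return cases


cases = build_cases(customers, sales, car_ownership)
print(len(cases))                          # one case for the one customer
print(len(cases[0]["Product Purchases"]))  # three purchases, stored once each
print(len(cases[0]["Car Ownership"]))      # two cars, stored once each
```

Each customer attribute and each child row appears once, so the case holds 3 + 2 child rows instead of the 3 x 2 rows of a flattened join.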
More About Nested Table
- Not necessary for the storage subsystem to support nested records
- Cases are only instantiated as nested rowsets prior to training/predicting data mining models
- Same physical data may be used to generate different casesets
Defining A Data Mining Model
- The name of the model
- The algorithm and parameters
- The columns of caseset and the relationships among columns
- Source columns and prediction columns
Example
CREATE MINING MODEL [Age Prediction]              -- name of model
(
    [Customer ID]        LONG    KEY,                     -- source column
    [Gender]             TEXT    DISCRETE,                -- source column
    [Age]                DOUBLE  DISCRETIZED() PREDICT,   -- prediction column
    [Product Purchases]  TABLE                            -- source column
    (
        [Product Name]   TEXT    KEY,                     -- source column
        [Quantity]       DOUBLE  NORMAL CONTINUOUS,       -- source column
        [Product Type]   TEXT    DISCRETE RELATED TO [Product Name]   -- source column
    )
)
USING [Decision_Trees_101]                        -- mining algorithm used
Column Specifiers
- KEY
- ATTRIBUTE
- RELATION (RELATED TO clause)
- QUALIFIER (OF clause)
  - PROBABILITY: [0, 1]
  - VARIANCE
  - SUPPORT
  - PROBABILITY-VARIANCE
  - ORDER
- TABLE
Attribute Types
- DISCRETE
- ORDERED
- CYCLICAL
- CONTINUOUS
- DISCRETIZED
- SEQUENCE_TIME
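To make the DISCRETIZED type concrete: a continuous attribute such as Age is mapped to a small number of buckets before it is modeled. The Python sketch below uses equal-width buckets, which is an assumption chosen for illustration; the slides do not say which bucketing method a provider applies, and `discretize` is a hypothetical helper.

```python
def discretize(values, n_buckets):
    """Map each continuous value to an equal-width bucket index in [0, n_buckets - 1]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_buckets
    if width == 0:
        # All values are identical, so everything falls into bucket 0.
        return [0 for _ in values]
    # Clamp so that the maximum value lands in the last bucket.
    return [min(int((v - lo) / width), n_buckets - 1) for v in values]


ages = [18, 25, 35, 47, 62, 78]
buckets = discretize(ages, 3)
print(buckets)  # bucket boundaries here are 18-38, 38-58, 58-78
```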
Populating A DMM
- Use INSERT INTO statement
- Consuming a case using the data mining model
- Use SHAPE statement to create the nested table from the input data
Example: Populating a DMM
INSERT INTO [Age Prediction]
(
    [Customer ID], [Gender], [Age],
    [Product Purchases](SKIP, [Product Name], [Quantity], [Product Type])
)
SHAPE
    {SELECT [Customer ID], [Gender], [Age] FROM Customers ORDER BY [Customer ID]}
APPEND
(
    {SELECT [CustID], [Product Name], [Quantity], [Product Type] FROM Sales
     ORDER BY [CustID]}
    RELATE [Customer ID] TO [CustID]
)
AS [Product Purchases]
Using Data Model to Predict
- Prediction join
  - Prediction on dataset D using DMM M
  - Different from equi-join
- DMM: a truth table
- SELECT statement associated with PREDICTION JOIN specifies values extracted from DMM
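One way to read the "truth table" view is that the model behaves like a table holding a predicted value for every combination of source attributes, and the prediction join looks each input case up in it. The Python sketch below is a toy illustration under that reading only; the model contents, the `prediction_join` helper, and the `default` value are hypothetical, and a real provider would evaluate the trained model rather than store a literal lookup table.

```python
# Toy stand-in for a trained DMM: source attribute value -> predicted age band.
# The contents are made up for illustration.
model = {"Male": "30-40", "Female": "20-30"}

new_cases = [
    {"Customer ID": 101, "Gender": "Male"},
    {"Customer ID": 102, "Gender": "Female"},
    {"Customer ID": 103, "Gender": None},   # source attribute missing
]


def prediction_join(model, cases, default="unknown"):
    """Pair every input case with the model's prediction for its source attributes."""
    return [
        {"Customer ID": c["Customer ID"], "Age": model.get(c["Gender"], default)}
        for c in cases
    ]


predictions = prediction_join(model, new_cases)
for row in predictions:
    print(row)
```

Unlike an equi-join, which would drop customer 103 for lack of a matching row, this sketch returns one output row per input case.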
Example: Using a DMM in Prediction
SELECT t.[Customer ID], [Age Prediction].[Age]
FROM [Age Prediction]
PREDICTION JOIN
(
    SHAPE
        {SELECT [Customer ID], [Gender] FROM Customers ORDER BY [Customer ID]}
    APPEND
    (
        {SELECT [CustID], [Product Name], [Quantity] FROM Sales ORDER BY [CustID]}
        RELATE [Customer ID] TO [CustID]
    )
    AS [Product Purchases]
)
AS t
ON  [Age Prediction].[Gender] = t.[Gender] AND
    [Age Prediction].[Product Purchases].[Product Name] = t.[Product Purchases].[Product Name] AND
    [Age Prediction].[Product Purchases].[Quantity] = t.[Product Purchases].[Quantity]
Browsing DMM
- What is in a DMM?
  - Rules, formulas, trees, etc.
- Browsing DMM
  - Visualization
Concluding Remarks
- OLE DB for DM integrates data mining and database systems
  - A good standard for mining application builders
- How can we be involved?
  - Provide association/sequential pattern mining modules for OLE DB for DM?
  - Design more concrete language primitives?
- References
  - http://www.microsoft.com/data.oledb/dm.html