Francesco Bonchi, Fosca Giannotti, Claudio Lucchese, Salvatore Orlando, Raffaele Perego, Roberto Trasarti KDD Laboratory HPC Laboratory ISTI – C.N.R. Italy “On Interactive Pattern Mining from Relational Databases”

ConQueSt

May 27, 2015

Page 1: ConQueSt

Francesco Bonchi, Fosca Giannotti, Claudio Lucchese,

Salvatore Orlando, Raffaele Perego, Roberto Trasarti

KDD Laboratory, HPC Laboratory

ISTI – C.N.R., Italy

“On Interactive Pattern Mining from Relational Databases”

Page 2: ConQueSt

Demo @ ICDE’06 and Black Forest Workshop

New features:
- Discretization tools
- On-the-fly strengthening/relaxing of constraints
- Soft constraints (see this afternoon's talk)

Plan of the talk:
- ConQueSt in a nutshell
- Constraint-based Frequent Pattern Discovery
- Language, architecture, mining engine
- Demo
- Future developments

Page 3: ConQueSt

ConQueSt in a nutshell

A constraint-based querying system aimed at supporting frequent pattern discovery.

Follows the Inductive Database vision:
- mining as a querying process
- closure principle: patterns are first-class citizens
- mining engine amalgamated with a commercial DBMS

Focus on constraint-based frequent patterns:
- large variety of constraints handled
- very efficient and robust mining engine

SPQL: “simple pattern query language”
- superset of SQL
- uses SQL to define the input data sources
- plus some syntactic sugar to specify data pre-processing
- plus some syntactic sugar to specify mining parameters

Page 4: ConQueSt

ConQueSt in a nutshell

The Knowledge Discovery Process

Page 5: ConQueSt

ConQueSt in a nutshell

Knowledge Discovery is an intrinsically exploratory process:

- human-guided
- interactive
- iterative

… efficiency is an issue!

Constraints can be used to drive the discovery process toward potentially interesting patterns.

Constraints can also be used to reduce the cost of pattern mining computation.

Page 6: ConQueSt

Frequent Pattern Discovery

Frequent Pattern Discovery, i.e. mining patterns which satisfy a user-defined constraint of minimum frequency.

Basic step of “Association Rules” mining

Market Basket Analysis

Customer   Basket
Customer1  Milk, eggs, sugar, bread
Customer2  Milk, eggs, cereal, bread
Customer3  Eggs, sugar
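As a minimal sketch of what "minimum frequency" means on the three baskets above, the following Python enumerates every itemset by brute force and keeps those occurring in at least `min_support` baskets. The function name `frequent_itemsets` and the threshold of 2 are illustrative assumptions, not part of ConQueSt.

```python
from collections import Counter
from itertools import combinations

# The three example baskets from the slide (item names lower-cased).
baskets = {
    "Customer1": {"milk", "eggs", "sugar", "bread"},
    "Customer2": {"milk", "eggs", "cereal", "bread"},
    "Customer3": {"eggs", "sugar"},
}


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for itemsets found in >= min_support transactions.

    Brute-force enumeration of all subsets: only suitable for tiny examples.
    """
    counts = Counter()
    for items in transactions:
        for size in range(1, len(items) + 1):
            for combo in combinations(sorted(items), size):
                counts[frozenset(combo)] += 1
    return {s: c for s, c in counts.items() if c >= min_support}


# Hypothetical threshold: an itemset must appear in at least 2 baskets.
frequent = frequent_itemsets(baskets.values(), min_support=2)
for itemset, sup in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), sup)
```

With this threshold, {eggs} has support 3 and {milk, eggs, bread} has support 2, while anything containing cereal is discarded.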

Page 7: ConQueSt

Constraint-based Frequent Patterns

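Items: $I = \{x_1, \ldots, x_n\}$

Constraint: $C : 2^I \rightarrow \{\text{True}, \text{False}\}$

Frequency constraint: given a bag $D$ of transactions $t \subseteq I$,

$\mathrm{sup}_D(X) = |\{t \in D \mid X \subseteq t\}|$

Minimum support: $\mathrm{sup}_D(X) \geq \sigma$, for a user-defined threshold $\sigma$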

Other constraints: defined on the items forming an itemset

defined on some attributes of the items

Page 8: ConQueSt

Constraint-based Frequent Patterns

Q: $\mathrm{sup}_D(X) \geq 2 \wedge \mathrm{sum}(X.\mathrm{price}) \geq 20$

Solution set: {meat}, {fruit, meat}

Item        Price
beer        4
milk        2
meat        20
fruit       3
vegetables  15
cereals     6

Transaction ID  Items bought
1               beer, milk
2               meat, fruit, vegetables
3               beer, fruit
4               fruit, cereals, meat
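The query on this slide can be checked by hand, or with a short brute-force sketch such as the one below. It assumes the query reads "support at least 2 and total price at least 20", which is the reading consistent with the stated solution set. Item names are spelled as in the price table, and the helper names `support` and `answer_query` are hypothetical.

```python
from itertools import combinations

# Price table and transactions from the slide.
price = {"beer": 4, "milk": 2, "meat": 20, "fruit": 3, "vegetables": 15, "cereals": 6}
transactions = [
    {"beer", "milk"},
    {"meat", "fruit", "vegetables"},
    {"beer", "fruit"},
    {"fruit", "cereals", "meat"},
]


def support(itemset, db):
    """Number of transactions in db that contain every item of itemset."""
    return sum(1 for t in db if itemset <= t)


def answer_query(db, price, min_support, min_total_price):
    """Enumerate all itemsets and keep those satisfying both constraints."""
    items = sorted(set().union(*db))
    solutions = []
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            x = frozenset(combo)
            total = sum(price[i] for i in x)
            if support(x, db) >= min_support and total >= min_total_price:
                solutions.append(x)
    return solutions


solutions = answer_query(transactions, price, min_support=2, min_total_price=20)
print([sorted(s) for s in solutions])  # [['meat'], ['fruit', 'meat']]
```

Both {meat} and {fruit, meat} occur in transactions 2 and 4, and their prices sum to 20 and 23 respectively, so they are the only itemsets satisfying both constraints.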

Page 9: ConQueSt

Constraint-based Frequent Patterns

This is an ideal situation…

... when you come to real data:
- No transactions, but relations
- The functional dependency item → attribute hardly ever holds (e.g. prices change over time)

Item        Price
beer        4
milk        2
meat        20
fruit       3
vegetables  15
cereals     6

Transaction ID  Items bought
1               beer, milk
2               meat, fruit, vegetables
3               beer, fruit
4               fruit, cereals, meat

Page 10: ConQueSt

ConQueSt provides:

- an easy way to define the “mining view”: just indicate
  - which features are items
  - which features are transactions
  - which features are item attributes
- it handles both inter-attribute and intra-attribute frequent pattern mining
- an easy way to solve item-attribute conflicts, e.g. different prices for item “beer”
  - possible solutions: take-first, take-avg, take-min, etc.
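The conflict-resolution policies named above can be illustrated with a small sketch. The data, the `RESOLVERS` table and the `resolve_attribute` helper are hypothetical and only show the idea of collapsing several observed values of an attribute into one value per item. They are not ConQueSt's actual implementation.

```python
from statistics import mean

# Hypothetical (item, price) observations: "beer" was sold at three different prices.
observations = [("beer", 4.0), ("beer", 6.0), ("milk", 2.0), ("beer", 3.5)]

# One function per policy name used on the slide.
RESOLVERS = {
    "take-first": lambda values: values[0],
    "take-avg": mean,
    "take-min": min,
}


def resolve_attribute(rows, policy):
    """Collapse conflicting attribute values into a single value per item."""
    grouped = {}
    for item, value in rows:
        grouped.setdefault(item, []).append(value)  # keeps arrival order
    resolver = RESOLVERS[policy]
    return {item: resolver(values) for item, values in grouped.items()}


for policy in RESOLVERS:
    print(policy, resolve_attribute(observations, policy))
```

After resolution each item has exactly one value, so the item → attribute dependency holds again and constraints such as sum or average over prices are well defined.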

Page 11: ConQueSt

SPQL (Simple Pattern Query Language)

    MINE PATTERNS WITH SUPP >= 5 IN
    SELECT product.product_name, product.gross_weight,
           sales.time_id, sales.customer_id, sales.store_id
    FROM [product], [sales_fact_1998]
    WHERE sales_fact_1998.product_id = product.product_id
    TRANSACTION sales.time_id, sales.customer_id, sales.store_id
    ITEM product.product_name
    ATTRIBUTE product.gross_weight
    CONSTRAINED BY Average(product.gross_weight) <= 15
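One way to read the TRANSACTION, ITEM and ATTRIBUTE clauses is as a grouping step over the rows returned by the SELECT. The Python below is a rough sketch of that reading under stated assumptions: the rows are invented, the helper names are hypothetical, and the attribute value is resolved with a take-first policy. It does not describe how the ConQueSt engine actually evaluates the query.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows shaped like the SELECT list, reordered for readability:
# (time_id, customer_id, store_id, product_name, gross_weight)
rows = [
    (1, 10, 7, "beer", 12.0),
    (1, 10, 7, "chips", 4.0),
    (1, 11, 7, "beer", 12.0),
    (2, 10, 7, "washing powder", 30.0),
    (2, 10, 7, "beer", 12.0),
]


def build_mining_view(rows):
    """Group rows into transactions and record one attribute value per item."""
    transactions = defaultdict(set)
    weight = {}
    for time_id, customer_id, store_id, name, gross_weight in rows:
        # TRANSACTION clause: the triple identifies one transaction.
        transactions[(time_id, customer_id, store_id)].add(name)  # ITEM clause
        weight.setdefault(name, gross_weight)  # ATTRIBUTE clause, take-first
    return dict(transactions), weight


def satisfies_avg_weight(itemset, weight, limit=15):
    """CONSTRAINED BY clause: average gross weight of the itemset <= limit."""
    return mean(weight[i] for i in itemset) <= limit


transactions, weight = build_mining_view(rows)
print(len(transactions), "transactions")
print(satisfies_avg_weight({"beer", "chips"}, weight))
print(satisfies_avg_weight({"beer", "washing powder"}, weight))
```

The support threshold (SUPP >= 5 in the query) would then be applied to itemsets over these transactions, together with the average-weight constraint.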

Page 12: ConQueSt
Page 13: ConQueSt

ConQueSt’s mining engine

Level-wise Apriori-like algorithm: DCI + ExAMiner + ExAMiner^lam + …

Able to push a large variety of constraints: subset, superset, length, min, max, sum, range, avg, var, med, md, std, etc.

- Efficient and robust
- Modular
- Data aware
- Resource aware
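To make "level-wise Apriori-like" concrete, here is a deliberately simplified sketch of level-wise mining with a user-supplied constraint. It is neither DCI nor ExAMiner and reproduces none of their optimizations. The function `levelwise_mine` is hypothetical, and the data is the price table and transactions from the earlier example slide.

```python
from itertools import combinations


def levelwise_mine(db, min_support, constraint=lambda itemset: True):
    """Return {itemset: support} for frequent itemsets that satisfy constraint.

    Frequency is anti-monotone, so level k+1 candidates are built only from
    frequent level k itemsets (classic Apriori pruning). The extra constraint
    is simply checked on each frequent itemset before it is reported.
    """
    db = [frozenset(t) for t in db]

    def support(itemset):
        return sum(1 for t in db if itemset <= t)

    # Level 1: frequent single items.
    level = {frozenset([i]) for t in db for i in t}
    level = {x for x in level if support(x) >= min_support}
    result, k = {}, 1
    while level:
        for x in level:
            if constraint(x):
                result[x] = support(x)
        k += 1
        # Join step: unions of two frequent itemsets that have exactly k items.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
        level = {c for c in candidates if support(c) >= min_support}
    return result


price = {"beer": 4, "milk": 2, "meat": 20, "fruit": 3, "vegetables": 15, "cereals": 6}
transactions = [
    {"beer", "milk"},
    {"meat", "fruit", "vegetables"},
    {"beer", "fruit"},
    {"fruit", "cereals", "meat"},
]
patterns = levelwise_mine(
    transactions,
    min_support=2,
    constraint=lambda x: sum(price[i] for i in x) >= 20,
)
print({tuple(sorted(p)): s for p, s in patterns.items()})
```

In this sketch the constraint only filters the output. Pushing constraints into the computation, as the engine is said to do, would also use them to shrink the search space or the input data.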

Page 14: ConQueSt

Demo

Page 15: ConQueSt

ConQueSt: future developments

Strengthen the pattern browser:
- interactive querying
- incremental mining
- visualization tools

Strengthen post-processing of patterns:
- not only rules…
- build global models from the extracted patterns

More complex patterns:
- sequences, graphs, etc.

Page 16: ConQueSt

ConQueSt’s contacts

Webpage (work in progress):

http://www-kdd.isti.cnr.it/ConQueSt/

Contact:

[email protected]

[email protected]

Page 17: ConQueSt

References

F. Bonchi, F. Giannotti, C. Lucchese, S. Orlando, R. Perego, R. Trasarti. ConQueSt: a Constraint-based Querying System for Exploratory Pattern Discovery. In Proceedings of the 22nd International Conference on Data Engineering (ICDE'06), ©IEEE. April 3-7, 2006, Atlanta, GA, USA.

S. Bistarelli, F. Bonchi. Interestingness is not a Dichotomy: Introducing Softness in Constrained Pattern Mining. In Proceedings of the Ninth European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD'05). Lecture Notes in Computer Science, Volume 3721, ©Springer. October 3-7, 2005, Porto, Portugal.

F. Bonchi, C. Lucchese. Pushing Tougher Constraints in Frequent Pattern Mining. In Proceedings of the Ninth Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD'05). Lecture Notes in Computer Science, Volume 3518, ©Springer. May 18-20, 2005, Hanoi, Vietnam.

F. Bonchi, C. Lucchese. On Closed Constrained Frequent Pattern Mining. In Proceedings of the Fourth IEEE International Conference on Data Mining (ICDM'04), ©IEEE. November 1-4, 2004, Brighton, UK.

F. Bonchi, B. Goethals. FP-Bonsai: the Art of Growing and Pruning Small FP-trees. In Proceedings of the Eighth Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD'04). Lecture Notes in Computer Science, Volume 3056, ©Springer. May 26-28, 2004, Sydney, Australia.

Page 18: ConQueSt

References

F. Bonchi, F. Giannotti, A. Mazzanti, D. Pedreschi. ExAMiner: Optimized Level-wise Frequent Pattern Mining with Monotone Constraints. In Proceedings of the Third IEEE International Conference on Data Mining (ICDM'03), ©IEEE. November 19-22, 2003, Melbourne, Florida, USA.

F. Bonchi, F. Giannotti, A. Mazzanti, D. Pedreschi. ExAnte: Anticipated Data Reduction in Constrained Pattern Mining. In Proceedings of the 7th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD'03). Lecture Notes in Computer Science, Volume 2838, ©Springer. September 22-26, 2003, Cavtat-Dubrovnik, Croatia.

F. Bonchi, F. Giannotti, A. Mazzanti, D. Pedreschi. Adaptive Constraint Pushing in Frequent Pattern Mining. In Proceedings of the 7th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD'03). Lecture Notes in Computer Science, Volume 2838, ©Springer. September 22-26, 2003, Cavtat-Dubrovnik, Croatia.

F. Bonchi, C. Lucchese. Extending the State-of-the-Art of Constraint-based Pattern Discovery. Data and Knowledge Engineering (DKE), ©Elsevier. Accepted for publication, 2006.

F. Bonchi, C. Lucchese. On Condensed Representations of Constrained Frequent Patterns. Knowledge and Information Systems - An International Journal (KAIS), ©Springer, 9(2), February 2006.

Page 19: ConQueSt

References

F. Bonchi, F. Giannotti, A. Mazzanti, D. Pedreschi. ExAnte: A Preprocessing Method for Frequent Pattern Mining. IEEE Intelligent Systems, ©IEEE, 20(3):2-8, May/June 2005.

F. Bonchi, F. Giannotti, A. Mazzanti, D. Pedreschi. Efficient Breadth-first Mining of Frequent Pattern with Monotone Constraints. Knowledge and Information Systems - An International Journal (KAIS), ©Springer, 8(2), August 2005.

F. Bonchi, F. Giannotti, D. Pedreschi. A Relational Query Primitive For Constraint-based Pattern Mining. In "Constraint-based Mining and Inductive Databases", Jean-Francois Boulicaut, Luc De Raedt and Heikki Mannila Ed., Lecture Notes in Computer Science, Volume 3848, ©Springer, 2005.

F. Bonchi, F. Giannotti. Pushing Constraints To Detect Local Patterns. In "Detecting Local Patterns", Katharina Morik, Jean-Francois Boulicaut and Arno Siebes Ed., Lecture Notes in Computer Science, Volume 3539, ©Springer, 2005.

F. Bonchi, F. Giannotti, D. Pedreschi. Frequent Pattern Queries for Flexible Knowledge Discovery. In Proceedings of the Twelfth Italian Symposium on Advanced Database Systems (SEBD'04), 2004.

F. Bonchi. Frequent Pattern Queries: Language and Optimizations. Ph.D. Thesis, TD10-03, Dipartimento di Informatica, Università di Pisa, 2003.