Francesco Bonchi, Fosca Giannotti, Claudio Lucchese,
Salvatore Orlando, Raffaele Perego, Roberto Trasarti
KDD Laboratory, HPC Laboratory
ISTI – C.N.R., Italy
“On Interactive Pattern Mining from Relational Databases”
Demo @ ICDE’06 and Black Forest Workshop
New features:
  Discretization tools
  On-the-fly strengthening/relaxing of constraints
  Soft constraints (see this afternoon's talk)
Plan of the talk:
  ConQueSt in a nutshell
  Constraint-based Frequent Pattern Discovery
  Language, architecture, mining engine
  Demo
  Future developments
ConQueSt in a nutshell
A constraint-based querying system aimed at supporting frequent pattern discovery.
Follows the Inductive Database vision:
  mining as a querying process
  closure principle: patterns are first-class citizens
  mining engine amalgamated with a commercial DBMS
Focus on constraint-based frequent patterns:
  large variety of constraints handled
  very efficient and robust mining engine
SPQL: “Simple Pattern Query Language”
  superset of SQL
  uses SQL to define the input data sources
  plus some syntactic sugar to specify data pre-processing
  plus some syntactic sugar to specify mining parameters
The Knowledge Discovery Process
Knowledge Discovery is an intrinsically exploratory process:
  human-guided
  interactive
  iterative
… efficiency is an issue!
Constraints can be used to drive the discovery process toward potentially interesting patterns.
Constraints can also be used to reduce the cost of pattern mining computation.
Frequent Pattern Discovery
Frequent Pattern Discovery, i.e. mining patterns which satisfy a user-defined constraint of minimum frequency.
Basic step of “Association Rules” mining
Market Basket Analysis
Customer1: Milk, eggs, sugar, bread
Customer2: Milk, eggs, cereal, bread
Customer3: Eggs, sugar
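To make the frequency constraint concrete, here is a minimal Python sketch that counts supports on the three baskets above by brute force. The threshold `min_supp=2` and the function names are assumptions chosen for illustration; this is not the algorithm used by the system presented in the talk.

```python
from itertools import combinations

# Baskets from the market basket example (item names lower-cased).
baskets = {
    "Customer1": {"milk", "eggs", "sugar", "bread"},
    "Customer2": {"milk", "eggs", "cereal", "bread"},
    "Customer3": {"eggs", "sugar"},
}

def support(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

def frequent_itemsets(transactions, min_supp):
    """Brute-force enumeration: fine for a toy example, exponential in general."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            s = support(set(combo), transactions)
            if s >= min_supp:
                result[frozenset(combo)] = s
    return result

# min_supp=2 is an assumed threshold, not a value taken from the slides.
frequent = frequent_itemsets(list(baskets.values()), min_supp=2)
for itemset, s in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

With this threshold, {cereal} is dropped because only Customer2 bought it, while {bread, eggs, milk} survives because it appears in two baskets.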
Constraint-based Frequent Patterns
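Items: $I = \{x_1, \ldots, x_n\}$; an itemset is any $X \subseteq I$
Constraint: $C : 2^I \to \{\mathrm{True}, \mathrm{False}\}$
Frequency constraint:
  $D$ is a bag of transactions $t \subseteq I$
  $\mathrm{sup}_D(X) = |\{t \in D \mid X \subseteq t\}|$
  minimum support $\sigma$: $\mathrm{sup}_D(X) \geq \sigma$
Other constraints:
  defined on the items forming an itemset
  defined on some attributes of the items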
Q: $\mathrm{sup}_D(X) \geq 2 \wedge \mathrm{sum}(X.\mathrm{price}) \geq 20$
Solution set: {meat}, {fruit, meat}

Item         Price
beer         4
milk         2
meat         20
fruit        3
vegetables   15
cereals      6

Transaction ID   Items Bought
1                beer, milk
2                meat, fruit, vegetables
3                beer, fruit
4                fruit, cereals, meat
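The query Q can be checked mechanically against the two tables above. The sketch below reads Q as "support at least 2 and total price at least 20", which is the reading that reproduces the listed solution set; the helper names are hypothetical and the enumeration is brute force, for illustration only.

```python
from itertools import combinations

# Data from the example tables ("vegetable" normalized to "vegetables").
price = {"beer": 4, "milk": 2, "meat": 20, "fruit": 3, "vegetables": 15, "cereals": 6}
transactions = [
    {"beer", "milk"},
    {"meat", "fruit", "vegetables"},
    {"beer", "fruit"},
    {"fruit", "cereals", "meat"},
]

def support(itemset):
    """Number of transactions containing the whole itemset."""
    return sum(1 for t in transactions if itemset <= t)

def satisfies_q(itemset):
    """Q: frequency constraint AND a constraint on an item attribute (price)."""
    return support(itemset) >= 2 and sum(price[i] for i in itemset) >= 20

items = sorted(price)
solutions = [
    set(combo)
    for size in range(1, len(items) + 1)
    for combo in combinations(items, size)
    if satisfies_q(set(combo))
]
print(solutions)
```

Only {meat} and {fruit, meat} pass both tests: {beer} and {fruit} are frequent but too cheap, and every itemset containing vegetables is expensive enough but occurs in a single transaction.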
This is an ideal situation…
... when you come to real data:
  no transactions, but relations
  the functional dependency item → attribute hardly ever holds
  (e.g. prices change over time)
ConQueSt provides:
  an easy way to define the “mining view”: just indicate
    which features are items
    which features are transactions
    which features are item attributes
  handling of both inter-attribute and intra-attribute frequent pattern mining
  an easy way to solve item-attribute conflicts
    e.g. different prices for item “beer”
    possible solutions: take-first, take-avg, take-min, etc.
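The conflict-resolution policies named above can be sketched in a few lines of Python. The observations, the `RESOLVERS` table and the `resolve` function are hypothetical; only the policy names (take-first, take-avg, take-min) come from the slide, and how the real system implements them is not stated here.

```python
from statistics import mean

# Hypothetical observations: the same item recorded with different prices over time.
observations = [
    ("beer", 6.0),
    ("milk", 2.0),
    ("beer", 4.0),
    ("beer", 5.0),
    ("milk", 3.0),
]

# One function per policy; each maps the list of observed values to a single value.
RESOLVERS = {
    "take-first": lambda values: values[0],
    "take-avg": mean,
    "take-min": min,
}

def resolve(observations, policy):
    """Collapse conflicting attribute values so each item gets exactly one value."""
    grouped = {}
    for item, value in observations:
        grouped.setdefault(item, []).append(value)
    pick = RESOLVERS[policy]
    return {item: pick(values) for item, values in grouped.items()}

for policy in RESOLVERS:
    print(policy, resolve(observations, policy))
```

Whichever policy is chosen, the result restores the functional dependency item → attribute that attribute-based constraints such as sum(price) rely on.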
SPQL (Simple Pattern Query Language)
MINE PATTERNS WITH SUPP>= 5 IN
SELECT product.product_name, product.gross_weight,
       sales.time_id, sales.customer_id, sales.store_id
FROM [product], [sales_fact_1998]
WHERE sales_fact_1998.product_id=product.product_id
TRANSACTION sales.time_id, sales.customer_id, sales.store_id
ITEM product.product_name
ATTRIBUTE product.gross_weight
CONSTRAINED BY Average(product.gross_weight)<=15
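One way to read the TRANSACTION, ITEM and ATTRIBUTE clauses is as instructions for turning relational rows into a transactional mining view. The Python sketch below illustrates that reading on a few made-up rows; the row values, `build_mining_view` and `average` are all hypothetical, and take-first is assumed for attribute conflicts. It is not a description of how SPQL is actually evaluated.

```python
from collections import defaultdict

# Hypothetical rows, shaped like the result of the SELECT in the query above.
rows = [
    {"product_name": "beer", "gross_weight": 12.0, "time_id": 1, "customer_id": 7, "store_id": 3},
    {"product_name": "milk", "gross_weight": 9.5, "time_id": 1, "customer_id": 7, "store_id": 3},
    {"product_name": "beer", "gross_weight": 12.0, "time_id": 2, "customer_id": 8, "store_id": 3},
    {"product_name": "meat", "gross_weight": 20.0, "time_id": 2, "customer_id": 8, "store_id": 3},
]

def build_mining_view(rows, transaction_cols, item_col, attribute_col):
    """Group rows into transactions and collect one attribute value per item."""
    transactions = defaultdict(set)
    attributes = {}
    for row in rows:
        tid = tuple(row[c] for c in transaction_cols)  # TRANSACTION clause
        transactions[tid].add(row[item_col])           # ITEM clause
        # ATTRIBUTE clause; keeps the first value seen (assumed take-first policy).
        attributes.setdefault(row[item_col], row[attribute_col])
    return dict(transactions), attributes

def average(itemset, attributes):
    """Value tested by a constraint such as Average(gross_weight) <= 15."""
    return sum(attributes[i] for i in itemset) / len(itemset)

transactions, weight = build_mining_view(
    rows, ("time_id", "customer_id", "store_id"), "product_name", "gross_weight"
)
print(transactions)
print(average({"beer", "milk"}, weight) <= 15)
print(average({"beer", "meat"}, weight) <= 15)
```

On these rows, {beer, milk} has an average weight of 10.75 and satisfies the constraint, while {beer, meat} averages 16.0 and does not.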
ConQueSt’s mining engine
Level-wise Apriori-like algorithm:
  DCI + ExAMiner + ExAMiner^lam + …
Able to push a large variety of constraints:
  subset, superset, length, min, max, sum, range, avg, var, med, md, std, etc.
Efficient and robust
Modular
Data aware
Resource aware
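As a rough illustration of what "level-wise" and "pushing a constraint" mean, here is a textbook Apriori sketch in Python with an anti-monotone constraint checked before support counting. It is not DCI or ExAMiner: the `apriori` function, the budget of 23 and the choice of an upper bound on sum(price) are assumptions made for the example, reusing the item prices from the earlier slide.

```python
from itertools import combinations

def apriori(transactions, min_supp, anti_monotone=lambda itemset: True):
    """Level-wise search: level k+1 candidates come only from surviving level-k itemsets."""
    def support(itemset):
        return sum(1 for t in transactions if itemset <= t)

    items = sorted(set().union(*transactions))
    level = [frozenset([i]) for i in items]
    result = {}
    while level:
        k = len(level[0])
        survivors = {}
        for cand in level:
            if not anti_monotone(cand):
                continue  # pushed constraint: pruned without scanning the data
            s = support(cand)
            if s >= min_supp:
                survivors[cand] = s
        result.update(survivors)
        # Join step: keep a (k+1)-itemset only if all of its k-subsets survived.
        candidates = set()
        for a, b in combinations(survivors, 2):
            union = a | b
            if len(union) == k + 1 and all(
                frozenset(sub) in survivors for sub in combinations(union, k)
            ):
                candidates.add(union)
        level = sorted(candidates, key=sorted)
    return result

price = {"beer": 4, "milk": 2, "meat": 20, "fruit": 3, "vegetables": 15, "cereals": 6}
transactions = [
    {"beer", "milk"},
    {"meat", "fruit", "vegetables"},
    {"beer", "fruit"},
    {"fruit", "cereals", "meat"},
]

# Assumed constraint: sum(price) <= 23. With non-negative prices it is anti-monotone,
# so once an itemset violates it, no superset needs to be generated.
result = apriori(transactions, min_supp=2,
                 anti_monotone=lambda x: sum(price[i] for i in x) <= 23)
print(result)
```

Here {beer, meat} costs 24 and is discarded before its support is counted. A lower bound such as sum(price) ≥ 20 behaves differently (it is monotone), so it cannot be used for this kind of pruning and needs other techniques.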
Demo
ConQueSt: future developments
Strengthen the pattern browser:
  interactive querying
  incremental mining
  visualization tools
Strengthen post-processing of patterns:
  not only rules… build global models from the extracted patterns
More complex patterns:
  sequences, graphs, etc.
ConQueSt’s contacts
Webpage (work in progress):
http://www-kdd.isti.cnr.it/ConQueSt/
Contact:
References
F. Bonchi, F. Giannotti, C. Lucchese, S. Orlando, R. Perego, R. Trasarti. ConQueSt: a Constraint-based Querying System for Exploratory Pattern Discovery. In Proceedings of the 22nd International Conference on Data Engineering (ICDE'06), ©IEEE. April 3-7, 2006, Atlanta, GA, USA.
S. Bistarelli, F. Bonchi. Interestingness is not a Dichotomy: Introducing Softness in Constrained Pattern Mining. In Proceedings of the Ninth European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD'05). Lecture Notes in Computer Science, Volume 3721, ©Springer. October 3-7, 2005, Porto, Portugal.
F. Bonchi, C. Lucchese. Pushing Tougher Constraints in Frequent Pattern Mining. In Proceedings of the Ninth Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD'05). Lecture Notes in Computer Science, Volume 3518, ©Springer. May 18-20, 2005, Hanoi, Vietnam.
F. Bonchi, C. Lucchese. On Closed Constrained Frequent Pattern Mining. In Proceedings of the Fourth IEEE International Conference on Data Mining (ICDM'04), ©IEEE. November 1-4, 2004, Brighton, UK.
F. Bonchi, B. Goethals. FP-Bonsai: the Art of Growing and Pruning Small FP-trees. In Proceedings of the Eighth Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD'04). Lecture Notes in Computer Science, Volume 3056, ©Springer. May 26-28, 2004, Sydney, Australia.
F. Bonchi, F. Giannotti, A. Mazzanti, D. Pedreschi. ExAMiner: Optimized Level-wise Frequent Pattern Mining with Monotone Constraints. In Proceedings of the Third IEEE International Conference on Data Mining (ICDM'03), ©IEEE. November 19-22, 2003, Melbourne, Florida, USA.
F. Bonchi, F. Giannotti, A. Mazzanti, D. Pedreschi. ExAnte: Anticipated Data Reduction in Constrained Pattern Mining. In Proceedings of the 7th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD'03). Lecture Notes in Computer Science, Volume 2838, ©Springer. September 22-26, 2003, Cavtat-Dubrovnik, Croatia.
F. Bonchi, F. Giannotti, A. Mazzanti, D. Pedreschi. Adaptive Constraint Pushing in Frequent Pattern Mining. In Proceedings of the 7th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD'03). Lecture Notes in Computer Science, Volume 2838, ©Springer. September 22-26, 2003, Cavtat-Dubrovnik, Croatia.
F. Bonchi, C. Lucchese. Extending the State-of-the-Art of Constraint-based Pattern Discovery. Data and Knowledge Engineering (DKE), ©Elsevier. Accepted for publication, 2006.
F. Bonchi, C. Lucchese. On Condensed Representations of Constrained Frequent Patterns. Knowledge and Information Systems - An International Journal (KAIS), ©Springer, 9(2), February 2006.
F. Bonchi, F. Giannotti, A. Mazzanti, D. Pedreschi. ExAnte: A Preprocessing Method for Frequent Pattern Mining. IEEE Intelligent Systems, ©IEEE, 20(3):2-8, May/June 2005.
F. Bonchi, F. Giannotti, A. Mazzanti, D. Pedreschi. Efficient Breadth-first Mining of Frequent Pattern with Monotone Constraints. Knowledge and Information Systems - An International Journal (KAIS), ©Springer, 8(2), August 2005.
F. Bonchi, F. Giannotti, D. Pedreschi. A Relational Query Primitive For Constraint-based Pattern Mining. In "Constraint-based Mining and Inductive Databases", Jean-Francois Boulicaut, Luc De Raedt and Heikki Mannila Ed., Lecture Notes in Computer Science, Volume 3848, ©Springer, 2005.
F. Bonchi, F. Giannotti. Pushing Constraints To Detect Local Patterns. In "Detecting Local Patterns", Katharina Morik, Jean-Francois Boulicaut and Arno Siebes Ed., Lecture Notes in Computer Science, Volume 3539, ©Springer, 2005.
F. Bonchi, F. Giannotti, D. Pedreschi. Frequent Pattern Queries for Flexible Knowledge Discovery. In Proceedings of the Twelfth Italian Symposium on Advanced Database Systems (SEBD'04), 2004.
F. Bonchi. Frequent Pattern Queries: Language and Optimizations. Ph.D. Thesis, TD10-03, Dipartimento di Informatica, Università di Pisa, 2003.