1
A distributed method for mining association rules
Pham Nguyen Anh Huy*
Department of Information Technology
Vietnam National University of Ho Chi Minh City
presented by Ho Tu Bao
School of Knowledge Science
Japan Advanced Institute of Science and Technology
(*work done during the author's 3-month JSPS fellowship at JAIST)
2
Outline
Introduction
Background
A distributed Apriori algorithm using mobile agents
Experimental evaluation
Conclusion
3
Introduction
Association analysis is a new and attractive research area in data mining.
The Apriori algorithm (R. Agrawal, IBM 1993) is a key technique for association analysis.
Although the Apriori principle considerably reduces the search space, the technique still requires a large amount of computation, particularly for large databases.
This research proposes a distributed version of the Apriori algorithm using mobile agents. The experiments show that computation time can be reduced by spreading the work over computers in a distributed computing environment.
4
Outline
Introduction
Background
  Association rules and Apriori algorithm
  Mobile agents and Aglets
A distributed Apriori algorithm using mobile agents
Experimental evaluation
Conclusion
5
Association rules: Market basket analysis
Market basket analysis examines customer buying habits by finding associations between the different items that customers place in their “shopping baskets”. Associations take the form X ⇒ Y, where X and Y are sets of items.
The Apriori algorithm: Finding frequent itemsets using candidate generation
1. Find the frequent itemsets: the sets of items whose support is at least the minimum support.
Every subset of a frequent itemset must also be a frequent itemset, i.e., if {AB} is a frequent itemset, both {A} and {B} must be frequent itemsets.
Iteratively find the frequent itemsets Lk with cardinality from 1 to k (k-itemsets) from the candidate itemsets Ck (Lk ⊆ Ck).
2. Use the frequent itemsets to generate association rules.
Generate candidates C2 from L1 using the Apriori principle
Generate candidates C3 from L2 using the Apriori principle
C3
Itemset
{I1, I2, I3}
{I1, I2, I5}
Scan D for the count of each candidate
C3
Itemset | Support count
{I1, I2, I3} | 2
{I1, I2, I5} | 2
Compare candidate support count with minimum support count
L3
Itemset | Support count
{I1, I2, I3} | 2
{I1, I2, I5} | 2
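The level-wise search described above can be sketched in a few lines of Python. This is a minimal single-machine illustration, not the authors' implementation. The slides do not list the transactions, so the nine-transaction database `DB` and the minimum support count of 2 are assumptions, chosen so that the run ends with {I1, I2, I3} and {I1, I2, I5} at support count 2. The names `apriori` and `DB` are hypothetical.

```python
from itertools import combinations

def apriori(transactions, min_count):
    """Return {frequent itemset (frozenset): support count}."""
    transactions = [frozenset(t) for t in transactions]
    # C1: every single item is a candidate 1-itemset.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Scan the database once to count each candidate k-itemset.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        # Lk: candidates that reach the minimum support count.
        level = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(level)
        # Join step: unite pairs of frequent k-itemsets into (k+1)-itemsets.
        joined = {a | b for a in level for b in level if len(a | b) == k + 1}
        # Prune step (Apriori principle): every k-subset must be frequent.
        candidates = {c for c in joined
                      if all(frozenset(s) in level for s in combinations(c, k))}
        k += 1
    return frequent

# Assumed example database (not given in the slides).
DB = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
    {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
]

result = apriori(DB, min_count=2)
three_itemsets = {tuple(sorted(s)): n for s, n in result.items() if len(s) == 3}
print(three_itemsets)
```

The join here compares every pair of frequent itemsets, which is simple but quadratic; practical implementations usually join only itemsets that share a common prefix.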
12
Agents and Mobile agents
An agent is a computational entity that:
acts on behalf of other entities in an autonomous fashion
performs its actions with some level of proactivity and reactivity
exhibits some level of the key attribute of cooperation
Mobile network agents are programs that:
can migrate from system to system within a network environment
perform some processing at each host
decide for themselves when and where to move next
How does an agent move?
Save its state
Transport the saved state to the next system
Resume execution from the saved state
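The save, transport and resume cycle can be illustrated with a small Python sketch. Everything in it is hypothetical: the `CountingAgent` class, the `tour` helper and the host names are invented for illustration, and the "transport" is only a byte string handed from one function call to the next rather than a network transfer. Unlike a real mobile agent system, the sketch moves only the agent's data, not its program code.

```python
import json

class CountingAgent:
    """Hypothetical agent: visits hosts in turn and sums the numbers stored there."""

    def __init__(self, itinerary, total=0):
        self.itinerary = list(itinerary)  # hosts still to visit
        self.total = total                # result carried from host to host

    def process(self, host_data):
        # Perform some processing at the current host.
        self.total += sum(host_data)

    def next_host(self):
        # The agent itself decides where to move next (None means stop).
        return self.itinerary.pop(0) if self.itinerary else None

    def save(self):
        # Save state as bytes that could be shipped to another system.
        state = {"itinerary": self.itinerary, "total": self.total}
        return json.dumps(state).encode()

    @classmethod
    def resume(cls, saved):
        # Rebuild the agent from the saved state so it can continue.
        return cls(**json.loads(saved.decode()))

def tour(agent, hosts):
    """Run the agent over its itinerary, migrating after each host."""
    host = agent.next_host()
    while host is not None:
        agent.process(hosts[host])
        wire = agent.save()                 # 1. save state
        agent = CountingAgent.resume(wire)  # 2-3. "transport", then resume
        host = agent.next_host()
    return agent.total

hosts = {"hostA": [1, 2], "hostB": [3], "hostC": [4, 5]}
print(tour(CountingAgent(["hostA", "hostB", "hostC"]), hosts))
```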
13
Distributed Computing using Mobile Programs
14
Mobile agent tools
No. | Name | Description | Developer | Language | Application
1 | Concordia | Framework for agent development | Mitsubishi E.I.T. | Java | Mobile computing, database
2 | Aglet | Java class libraries | IBM, Tokyo | Java | Internet
3 | Agent Tcl | Transportable agent system | R. Gray, U Dart. | Tcl Tk | Information management
4 | Odyssey | Set of Java class libraries | General Magic | Telescript | Electronic commerce
5 | OAA | Open Agent Architecture | SRI International, AI | C, C-Lisp, Java, VB | General purpose
6 | Ara | Agent for Remote Action | U Kaiserslautern | C/C++, Tcl, Java | Partially connected c. D.D.B.
7 | Tacoma | Tromso and Cornell Moving Agent | Norway & Cornell | C, UNIX-based | Client/Server model issues / OS support
8 | Voyager | Platform for distributed applic. | ObjectSpace | Java | Support for agent systems
9 | AgentSpace | Agent building platform | Ichiro Sato, O. U. | Java | General purpose
10 | Mole | First Java-based MA system | Stuttgart U., Germany | Java, UNIX-based | General purpose
11 | MOA | Mobile Object and Agents | OpenGroup, UK | Java | General purpose
12 | Kali Scheme | Distributed impl. of Scheme | NEC Research I. | Scheme | Distributed data mining, load balancing
13 | The Tube | Mobile code system | David Halls, UK | Scheme | Remote execution of Scheme
14 | Ajanta | Network mobile object | Minnesota U. | Java | General purpose
15 | Knowbots | Research infrastructure of MA | CNRI | Python | Distributed systems / Internet
16 | AgentSpace | Mobile agent framework | Alberto Sylva | Java | Support for dynamic and dist. appl.
17 | Plangent | Intelligent agent system | Toshiba Corporation | Java | Intelligent tasks
18 | JATlite | Java agent framework dev / KQML | Stanford U. | Java | Information retrieval, interface agent
19 | Kafka | Multiagent libraries for Java | Fujitsu Lab., Japan | Java, UNIX-based | General purpose
20 | Messengers | Autonomous messages | UCI | C (Messenger-C) | General purpose
15
What are Aglets?
Aglets (Agile Applets) are Java objects that can move from one host on the Internet to another and perform arbitrary operations within the security limits.
When an Aglet moves, it takes along its program code as well as its data.
The Aglets framework is implemented by the Aglets Software Development Kit (ASDK) from IBM. It is an environment for programming mobile Internet agents in Java.
16
Aglets at Runtime
Currently, aglets use the Agent Transfer Protocol (ATP) as the default implementation of the communication layer (ATP is modeled after HTTP).
Used on the Tahiti aglet server
The Aglets Server Interface is used to write applications capable of hosting, receiving and dispatching aglets.
17
Outline
Introduction
Background
A distributed Apriori algorithm using mobile agents
Experimental evaluation
Conclusion
18
A distributed Apriori algorithm
Setup: (1) spawn n slave processes; (2) divide the database into partitions; (3) distribute the partitions to the slave processes
Master process
1. send the global candidate (k-1)-itemsets Ck-1 to each slave process
4. wait for and receive the local supports, then count the global supports for the global candidate (k-1)-itemsets Ck-1
5. compute the frequent (k-1)-itemsets Lk-1 and send clusters of the frequent (k-1)-itemsets Lk-1 to the slave processes
8. wait for and receive the local candidate k-itemsets from the slave processes
9. take the union of the local candidate k-itemsets and prune it to form the global candidate k-itemsets
Slave processes
2. receive the global candidate (k-1)-itemsets Ck-1
3. count the local supports for the global candidate (k-1)-itemsets Ck-1 and send the local supports to the master process
6. receive the frequent (k-1)-itemsets Lk-1 from the master process
7. generate the local candidate k-itemsets and send them to the master process
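One round trip between master and slaves can be simulated in a single Python process, as sketched below. This is an illustration under stated assumptions rather than the authors' code: slaves are plain function calls instead of Aglets, and the "cluster" sent to each slave is assumed to be an index range (p, q) into the sorted list of frequent itemsets, with each slave joining its range against the itemsets that follow it. The function names and the three small partitions are hypothetical.

```python
from itertools import combinations

def count_local(partition, candidates):
    """Slave, steps 2-3: local support of every global candidate."""
    return {c: sum(1 for t in partition if c <= t) for c in candidates}

def generate_local(level, p, q):
    """Slave, steps 6-7: join itemsets level[p:q] with the ones after them."""
    out = set()
    for i in range(p, min(q, len(level))):
        for j in range(i + 1, len(level)):
            union = level[i] | level[j]
            if len(union) == len(level[i]) + 1:
                out.add(union)
    return out

def distributed_apriori(partitions, min_count):
    """Master loop; returns {frequent itemset: global support count}."""
    n = len(partitions)
    # Initial candidates: all single items (collected centrally for brevity).
    candidates = {frozenset([i]) for part in partitions for t in part for i in t}
    frequent = {}
    while candidates:
        # Steps 1-3: send the candidates; each slave returns local supports.
        local = [count_local(part, candidates) for part in partitions]
        # Step 4: global support is the sum of the local supports.
        global_supp = {c: sum(l[c] for l in local) for c in candidates}
        # Step 5: frequent itemsets, split into one ID segment per slave.
        level = sorted((c for c in candidates if global_supp[c] >= min_count),
                       key=sorted)
        frequent.update({c: global_supp[c] for c in level})
        size = -(-len(level) // n)  # ceiling division
        segments = [(i * size, (i + 1) * size) for i in range(n)]
        # Steps 6-8: each slave joins its segment, master collects results.
        local_cands = [generate_local(level, p, q) for p, q in segments]
        # Step 9: union, then prune candidates with an infrequent subset.
        level_set = set(level)
        candidates = {c for c in set().union(*local_cands)
                      if all(frozenset(s) in level_set
                             for s in combinations(c, len(c) - 1))}
    return frequent

# Hypothetical database split into three partitions (one per slave).
partitions = [
    [frozenset("ABC"), frozenset("AB")],
    [frozenset("AC"), frozenset("BC")],
    [frozenset("ABC"), frozenset("B")],
]
found = distributed_apriori(partitions, min_count=3)
print(sorted("".join(sorted(s)) for s in found))
```

Only support counts and itemsets cross the master/slave boundary in this scheme; the transactions themselves stay in their partitions.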
19
A distributed Apriori algorithm
master: SEND global candidate (k-1)-itemsets Ck-1
slaves (DB1, DB2, ..., DBn): COUNT and SEND local supports for global candidate (k-1)-itemsets (counting support Aglets)
master: COUNT global supports for global candidate (k-1)-itemsets Ck-1; FIND and SEND frequent (k-1)-itemsets Lk-1 (e.g., {AB})
slaves (DB1, DB2, ..., DBn): JOIN and SEND local candidate k-itemsets (Aprio_gen Aglet)
master: UNIONIZE local candidate k-itemsets and PRUNE to form global candidate k-itemsets Ck
20
Global support count & Global candidate itemsets
X is a candidate itemset; the global support count of X is
$X_k.\text{G-Supp} = \sum_{i=1}^{n} X_k.\text{L-Supp}_i$
The set of global candidate k-itemsets GCk is formed from the local candidate k-itemsets:
$GC_k = \bigcup_{i=1}^{n} LC_k^i$
GLk is formed by Apriori-gen with the ID segment (p, q) of GLk-1:
$GL_k^i = \text{Apriori-gen}(GL_{k-1}(p, q))$
$GL_k = \{\, GC_k \mid GC_k.\text{G-Supp} \ge \text{G-Min-Supp} \,\}$
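A small worked illustration of the global support count may help. All numbers are hypothetical and are not taken from the experiments: assume n = 3 partitions, a candidate itemset X = {A, B} with local support counts 2, 0 and 1, and a global minimum support count G-Min-Supp = 3.

$$X.\text{G-Supp} = \sum_{i=1}^{3} X.\text{L-Supp}_i = 2 + 0 + 1 = 3$$

$$X.\text{G-Supp} = 3 \ge \text{G-Min-Supp} = 3 \;\Rightarrow\; X \text{ is globally frequent}$$

Note that X is not frequent in the second partition on its own, which is why the master has to sum the local counts before deciding.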
21
Outline
Introduction
Background
A distributed Apriori algorithm using mobile agents
Experimental evaluation
Conclusion
22
Experiments: Synthetic datasets
Using synthetic datasets of varying sizes:
Name | |D| | |T| | Size (MB)
D100k.T30 | 100K | 30 | 3
D100k.T100 | 100K | 100 | 10
D320k.T150 | 320K | 150 | 48
|D|: number of transactions
|T|: average number of items per transaction
23
Experiment environment
Software
Database: Oracle server
Language: Java (Sun JDK 1.3)
Mobile agents: Aglets (IBM)
Protocol traffic: ATP (Agent Transfer Protocol)
Platform: Windows
Hardware
PC: Pentium 3, 300 MHz, 128 MB RAM
15 machines (at Knowledge Science Center, JAIST)
24
Execution time (sec.) with different minimum support thresholds