Mining Frequent Closed Cubes in 3D Datasets
Post on 06-Feb-2016
Liping Ji, Kian-Lee Tan, Anthony K. H. Tung
Computer Science Department, National University of Singapore
Motivation
Frequent Closed Pattern (FCP) Mining: great importance, wide application
Previous works all limited to 2D FCP mining
  biological data: gene-time, gene-sample
  market basket data: transaction-itemset
Extend the 2D FCP mining to the 3D context
  biological data: gene-sample-time
  marketing data: region-time-items
Background
Frequent Pattern (FP) and Frequent Closed Pattern (FCP)
minimum support threshold: minsup=2
Transactions and their itemsets:
  t1: a1 a2 a3 a5
  t2: a1 a2 a3
  t3: a1 a2 a3 a4
  t4: a3 a5
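The distinction between FP and FCP on these four transactions can be worked through by brute force. The sketch below is only an illustration, assuming the usual definitions (a pattern is frequent if at least minsup transactions contain it, and closed if no proper superset has the same support). The function names are hypothetical and not taken from the slides.

```python
from itertools import combinations

# Transactions from the slide; minsup = 2 as stated there.
TRANSACTIONS = {
    "t1": {"a1", "a2", "a3", "a5"},
    "t2": {"a1", "a2", "a3"},
    "t3": {"a1", "a2", "a3", "a4"},
    "t4": {"a3", "a5"},
}
MINSUP = 2

def support(itemset):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for items in TRANSACTIONS.values() if itemset <= items)

def frequent_patterns(minsup):
    """Brute force: test every non-empty itemset (fine for 5 items)."""
    items = sorted(set().union(*TRANSACTIONS.values()))
    result = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            count = support(set(combo))
            if count >= minsup:
                result[frozenset(combo)] = count
    return result

def closed_patterns(fps):
    """Keep frequent patterns with no proper superset of equal support."""
    return {p: s for p, s in fps.items()
            if not any(p < q and s == t for q, t in fps.items())}

fps = frequent_patterns(MINSUP)
fcps = closed_patterns(fps)
for pattern, count in sorted(fcps.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(pattern), count)
```

Under these assumptions the data yields 9 frequent patterns, of which 3 are closed: {a3} with support 4, {a1, a2, a3} with support 3, and {a3, a5} with support 2.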
[Figure labels: FCP, FP]
Background
Binary Mapping
T\I  a1  a2  a3  a4  a5
t1   1   1   1   0   1
t2   1   1   1   0   0
t3   1   1   1   1   0
t4   0   0   1   0   1
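The binary mapping can be reproduced in a few lines. This is a minimal sketch with hypothetical helper names: each transaction becomes one row, and each item one column.

```python
# Transactions and item order as shown on the slide.
TRANSACTIONS = {
    "t1": {"a1", "a2", "a3", "a5"},
    "t2": {"a1", "a2", "a3"},
    "t3": {"a1", "a2", "a3", "a4"},
    "t4": {"a3", "a5"},
}
ITEMS = ["a1", "a2", "a3", "a4", "a5"]

def binary_mapping(transactions, items):
    """Row per transaction, column per item; 1 if the item occurs."""
    return [[1 if item in bought else 0 for item in items]
            for bought in transactions.values()]

def support(matrix, columns):
    """Rows in which every listed column holds a 1."""
    return sum(all(row[c] for c in columns) for row in matrix)

matrix = binary_mapping(TRANSACTIONS, ITEMS)
for name, row in zip(TRANSACTIONS, matrix):
    print(name, *row)
```

With this layout the support of an itemset is the number of rows that hold a 1 in all of its columns, for example 3 for {a1, a2, a3}.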
Frequent Closed Cube
3D Dataset
[Figure labels: Row, Column, Height, Slice]
Frequent Closed Cube
Slices by Height Dimension
[Figure: slices h1, h2, h3]
Frequent Closed Cube
Closed Cube: Maximal
[Figure: slices h1, h2, h3]
Frequent Closed Cube
Definition: Frequent Closed Cube (FCC)
  Maximal: cannot be extended in any dimension
  Frequent: satisfy minH, minR, minC thresholds
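The definition can be turned into a direct test. The sketch below is one reading of it, not a definitive one. It assumes that a cube is a triple of height, row, and column index sets whose cells are all 1, and that "frequent" means each set has at least minH, minR, or minC members. The 3x3x3 dataset is hypothetical and not taken from the slides, and the function names are made up for illustration.

```python
# Hypothetical binary dataset, indexed DATA[h][r][c] (not from the slides).
DATA = [
    [[1, 1, 0], [1, 1, 1], [0, 1, 1]],  # slice h1 (index 0)
    [[1, 1, 0], [1, 1, 1], [1, 1, 1]],  # slice h2 (index 1)
    [[1, 1, 1], [1, 1, 0], [0, 0, 1]],  # slice h3 (index 2)
]

def all_ones(data, H, R, C):
    """True if every cell of the cube H x R x C is 1."""
    return all(data[h][r][c] for h in H for r in R for c in C)

def is_fcc(data, H, R, C, min_h, min_r, min_c):
    """Assumed reading of the definition: frequent, all ones, and maximal."""
    if len(H) < min_h or len(R) < min_r or len(C) < min_c:
        return False  # not frequent
    if not all_ones(data, H, R, C):
        return False  # not a cube of 1s
    n_h, n_r, n_c = len(data), len(data[0]), len(data[0][0])
    # Maximal: adding any single height, row, or column must break the cube.
    extend_h = any(all_ones(data, {h}, R, C) for h in range(n_h) if h not in H)
    extend_r = any(all_ones(data, H, {r}, C) for r in range(n_r) if r not in R)
    extend_c = any(all_ones(data, H, R, {c}) for c in range(n_c) if c not in C)
    return not (extend_h or extend_r or extend_c)

print(is_fcc(DATA, {0, 1, 2}, {0, 1}, {0, 1}, 2, 2, 2))  # closed and frequent
print(is_fcc(DATA, {0, 1}, {0, 1}, {0, 1}, 2, 2, 2))     # extendable by h3
```

The second call returns False because the same rows and columns are also all 1 in the third slice, so the cube can still be extended in the height dimension.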
RSM vs. CubeMiner
Representative Slice Mining (RSM): extend existing 2D FCP mining algorithms for FCC mining
CubeMiner: operate on the 3D space directly
RSM
Representative Slice (RS) Generation: enumerate all possible combinations of slices
2D FCP Mining from each RS
Post-pruning to Remove Unclosed Cubes: if a 2D FCP is contained in other slices besides its contributing slices, it is unclosed and hence removed; otherwise, it is retained.
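The three RSM steps above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation. The representative slice of a combination is taken to be the cell-wise AND of its slices, the 2D miner is a brute-force stand-in for an existing 2D FCP algorithm, and the dataset and all names are hypothetical.

```python
from itertools import combinations

# Hypothetical binary dataset, indexed DATA[h][r][c] (not from the slides).
DATA = [
    [[1, 1, 0], [1, 1, 1], [0, 1, 1]],
    [[1, 1, 0], [1, 1, 1], [1, 1, 1]],
    [[1, 1, 1], [1, 1, 0], [0, 0, 1]],
]

def closed_2d(slice_, min_r, min_c):
    """Brute-force stand-in for a 2D FCP miner on one binary slice."""
    n_r, n_c = len(slice_), len(slice_[0])
    found = set()
    for size in range(min_r, n_r + 1):
        for rows in combinations(range(n_r), size):
            # Columns shared by all chosen rows.
            cols = frozenset(c for c in range(n_c)
                             if all(slice_[r][c] for r in rows))
            if len(cols) < min_c:
                continue
            # Close the row set: every row that contains all those columns.
            closed_rows = frozenset(r for r in range(n_r)
                                    if all(slice_[r][c] for c in cols))
            found.add((closed_rows, cols))
    return found

def rsm(data, min_h, min_r, min_c):
    n_h, n_r, n_c = len(data), len(data[0]), len(data[0][0])
    cubes = set()
    for size in range(min_h, n_h + 1):
        for heights in combinations(range(n_h), size):
            # Step 1: representative slice = AND of the combined slices (assumed).
            rs = [[int(all(data[h][r][c] for h in heights))
                   for c in range(n_c)] for r in range(n_r)]
            # Step 2: 2D FCP mining on the representative slice.
            for rows, cols in closed_2d(rs, min_r, min_c):
                # Step 3: post-pruning; drop patterns also found in another slice.
                others = [h for h in range(n_h) if h not in heights]
                if any(all(data[h][r][c] for r in rows for c in cols)
                       for h in others):
                    continue
                cubes.add((frozenset(heights), rows, cols))
    return cubes

result = rsm(DATA, 2, 2, 2)
```

On this toy data with all three thresholds set to 2, the pattern on rows {0, 1} and columns {0, 1} is found in every slice combination but survives only for the combination of all three slices; the other copies are removed by post-pruning.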
RSM
Slices by Height Dimension
[Figure: slices h1, h2, h3]
CubeMiner Principle
[Figure labels: α, β, γ]
CubeMiner: Cutters
[Figure: Slice h1 and the Cutters from h1]
Mining FCC: CubeMiner Splitting Tree
Root: (h1h2h3, r1r2r3r4, c1c2c3c4c5)
Cutter: (h1, r1, c4)
Cutter Checking: check if the Cutter is applicable (A.)
  Subset of the node: A.
  Otherwise: N.A.
Cutter (h1, r1, c4) against Root: A.
Left Tree: remove Cutter's left atom h1 from parent node: (h2h3, r1~r4, c1~c5)
Middle Tree: remove Cutter's middle atom r1 from parent node: (h1~h3, r2~r4, c1~c5)
Right Tree: remove Cutter's right atom c4 from parent node: (h1~h3, r1~r4, c1c2c3c5)
Next Cutter: (h1, r2, c4c5)
Cutter Checking against the left, middle, and right sons: N.A., A., A.
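A single splitting step can be replayed in code. The sketch below uses hypothetical function names and makes one assumption about applicability: a cutter is treated as applicable when the node contains its height, its row, and at least one of its columns. That reading is chosen because it reproduces the N.A., A., A. result shown for the second cutter.

```python
# Root node and cutters exactly as named on the slides.
ROOT = (
    frozenset({"h1", "h2", "h3"}),
    frozenset({"r1", "r2", "r3", "r4"}),
    frozenset({"c1", "c2", "c3", "c4", "c5"}),
)
CUTTER_1 = ("h1", "r1", frozenset({"c4"}))
CUTTER_2 = ("h1", "r2", frozenset({"c4", "c5"}))

def applicable(node, cutter):
    """Assumed check: node holds the cutter's height, row, and some column."""
    heights, rows, cols = node
    h, r, cut_cols = cutter
    return h in heights and r in rows and bool(cut_cols & cols)

def split(node, cutter):
    """Left, middle, right sons: drop the cutter's height, row, or columns."""
    heights, rows, cols = node
    h, r, cut_cols = cutter
    left = (heights - {h}, rows, cols)
    middle = (heights, rows - {r}, cols)
    right = (heights, rows, cols - cut_cols)
    return left, middle, right

left, middle, right = split(ROOT, CUTTER_1)
checks = [applicable(node, CUTTER_2) for node in (left, middle, right)]
print(checks)  # [False, True, True], i.e. N.A., A., A.
```

The left son no longer contains h1, so the second cutter cannot apply to it, while the right son still contains c5 and is therefore split again.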
Mining FCC: CubeMiner Splitting Tree (continued)
Sons of the middle node (h1~h3, r2~r4, c1~c5) under Cutter (h1, r2, c4c5):
  (h2h3, r2~r4, c1~c5)
  (h1~h3, r3r4, c1~c5)
  (h1~h3, r2~r4, c1~c3)
Subset Cube
Left Track Checking
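The whole splitting tree can be sketched end to end on a small example. This is a simplified sketch, not CubeMiner as the authors define it. It assumes one cutter per slice row that contains a 0, consisting of the height, the row, and that row's zero columns. It also replaces the track checking named on the slides with a plain closure test on the leaves, which is slower but easier to verify. The dataset and all names are hypothetical, and the thresholds are assumed to be at least 1.

```python
# Hypothetical binary dataset, indexed DATA[h][r][c] (not from the slides).
DATA = [
    [[1, 1, 0], [1, 1, 1], [0, 1, 1]],
    [[1, 1, 0], [1, 1, 1], [1, 1, 1]],
    [[1, 1, 1], [1, 1, 0], [0, 0, 1]],
]

def make_cutters(data):
    """Assumed cutter form: (height, row, columns holding 0 in that row)."""
    cutters = []
    for h, slice_ in enumerate(data):
        for r, row in enumerate(slice_):
            zeros = frozenset(c for c, cell in enumerate(row) if cell == 0)
            if zeros:
                cutters.append((h, r, zeros))
    return cutters

def is_closed(data, H, R, C):
    """Naive stand-in for track checking: no single extension keeps all 1s."""
    def ones(hs, rs, cs):
        return all(data[h][r][c] for h in hs for r in rs for c in cs)
    n_h, n_r, n_c = len(data), len(data[0]), len(data[0][0])
    return not (any(ones([h], R, C) for h in range(n_h) if h not in H)
                or any(ones(H, [r], C) for r in range(n_r) if r not in R)
                or any(ones(H, R, [c]) for c in range(n_c) if c not in C))

def cube_miner(data, min_h, min_r, min_c):
    cutters = make_cutters(data)
    leaves = set()

    def visit(H, R, C, i):
        # Prune infrequent nodes early: sons can only get smaller.
        if len(H) < min_h or len(R) < min_r or len(C) < min_c:
            return
        if i == len(cutters):
            leaves.add((H, R, C))
            return
        h, r, zeros = cutters[i]
        if h in H and r in R and zeros & C:   # cutter applicable
            visit(H - {h}, R, C, i + 1)       # left son
            visit(H, R - {r}, C, i + 1)       # middle son
            visit(H, R, C - zeros, i + 1)     # right son
        else:                                 # not applicable: next cutter
            visit(H, R, C, i + 1)

    visit(frozenset(range(len(data))),
          frozenset(range(len(data[0]))),
          frozenset(range(len(data[0][0]))), 0)
    return {cube for cube in leaves if is_closed(data, *cube)}

result = cube_miner(DATA, 2, 2, 2)
```

Every leaf is free of zeros because each zero cell belongs to some cutter, and every maximal cube of 1s survives in at least one son at each split, so filtering the leaves by closure gives the closed cubes. On the toy data the result is two cubes, the same two found when enumerating slice combinations.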
Parallelism
RSM:
  Task: mining of each Representative Slice
CubeMiner:
  Task: mining of each branch
Processor:
  Initial: keep a copy of the whole dataset
  Independent and concurrent with little communication cost
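The task decomposition for RSM can be illustrated with a worker pool. This is only a sketch of the idea. It uses Python's `ThreadPoolExecutor` to show independent tasks over a shared copy of the data, which is an illustrative substitute for the separate processors the slide refers to. The dataset, thresholds, and function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations

# Hypothetical binary dataset, indexed DATA[h][r][c]; every worker reads it all.
DATA = [
    [[1, 1, 0], [1, 1, 1], [0, 1, 1]],
    [[1, 1, 0], [1, 1, 1], [1, 1, 1]],
    [[1, 1, 1], [1, 1, 0], [0, 0, 1]],
]
MIN_H = MIN_R = MIN_C = 2

def mine_task(heights):
    """One independent task: mine the representative slice of `heights`."""
    n_h, n_r, n_c = len(DATA), len(DATA[0]), len(DATA[0][0])
    rs = [[all(DATA[h][r][c] for h in heights) for c in range(n_c)]
          for r in range(n_r)]
    others = [h for h in range(n_h) if h not in heights]
    cubes = set()
    for size in range(MIN_R, n_r + 1):
        for rows in combinations(range(n_r), size):
            cols = frozenset(c for c in range(n_c)
                             if all(rs[r][c] for r in rows))
            if len(cols) < MIN_C:
                continue
            closed_rows = frozenset(r for r in range(n_r)
                                    if all(rs[r][c] for c in cols))
            # Post-pruning needs the whole dataset, hence the full local copy.
            if not any(all(DATA[h][r][c] for r in closed_rows for c in cols)
                       for h in others):
                cubes.add((frozenset(heights), closed_rows, cols))
    return cubes

def parallel_rsm(workers):
    tasks = [combo for size in range(MIN_H, len(DATA) + 1)
             for combo in combinations(range(len(DATA)), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_results = list(pool.map(mine_task, tasks))
    # The only communication is merging the per-task results.
    return set().union(*partial_results)

result = parallel_rsm(workers=2)
```

No task depends on another task's output, so the number of workers changes only the elapsed time and not the result.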
Mining FCC: Experiments
Real yeast cell-cycle regulated genes
  Elutriation Experiments: 14*9*7161
  CDC15 Experiments: 19*9*7761
Synthetic Data: IBM data generator
  Synthetic 1: H*R*C=(8~20)*20*1000
  Synthetic 2: H*R*C=100*100*10000
Experiments: Optimize CubeMiner
Elutriation (14*9*7161)
Optimal: sort slices in decreasing order of zeros
Prune off infrequent cubes early
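The slice ordering can be sketched in a few lines. The code assumes that the ordering ranks slices by how many 0 cells they contain, largest count first; the slide does not spell this out. The dataset and function names are hypothetical.

```python
# Hypothetical binary dataset, indexed DATA[h][r][c] (not from the slides).
DATA = [
    [[1, 1, 0], [1, 1, 1], [0, 1, 1]],  # 2 zeros
    [[1, 1, 0], [1, 1, 1], [1, 1, 1]],  # 1 zero
    [[1, 1, 1], [1, 1, 0], [0, 0, 1]],  # 3 zeros
]

def zero_count(slice_):
    """Number of 0 cells in one slice."""
    return sum(cell == 0 for row in slice_ for cell in row)

def order_slices(data):
    """Slice indices sorted by zero count, largest first (assumed reading)."""
    return sorted(range(len(data)),
                  key=lambda h: zero_count(data[h]), reverse=True)

print(order_slices(DATA))  # [2, 0, 1]
```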
Experiments: Optimize RSM
Elutriation (14*9*7161)
Optimal: enumerate slices by the smallest dimension
Slice enumeration takes relatively long processing time
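A quick count shows why the enumerated dimension matters. The sketch below assumes that enumeration covers every combination of at least `min_size` slices, so the number of representative slices grows exponentially with the size of the enumerated dimension. The function name is hypothetical.

```python
from math import comb

def slice_combinations(n, min_size=1):
    """Number of ways to pick at least `min_size` of `n` slices."""
    return sum(comb(n, k) for k in range(min_size, n + 1))

ELUTRIATION = (14, 9, 7161)  # dataset dimensions from the slides
smallest = min(ELUTRIATION)

print(smallest, slice_combinations(smallest))  # 9 511
print(14, slice_combinations(14))              # 14 16383
```

Enumerating over the dimension of size 9 gives 511 combinations, compared with 16383 for the dimension of size 14.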
Experiments: RSM vs. CubeMiner
Synthetic Data (vary size of height dimension)
As the smallest dimension grows, CubeMiner outperforms RSM
Experiments: Parallelism
CDC15 (Vary Number of Processors)
As the degree of parallelism increases, the response time decreases.
Optimal number of processors
Conclusion
Notion of Frequent Closed Cube
RSM: efficient when one of the dimensions is small
CubeMiner: superior for large datasets
Parallel RSM and CubeMiner
Thank You!