ICDM 2004 Business Meeting, 11/4/2004
Data Mining on ICDM Submission Data
Shusaku Tsumoto, Ning Zhong and Xindong Wu

Page 1: Data Mining on ICDM Submission Data

Shusaku Tsumoto

Ning Zhong and Xindong Wu

Page 2: Data Mining on ICDM Submission Data

38 countries, 445 submissions
– Regular Papers: 39 (9%)
– Short Papers: 66 (14.8%)

High Acceptance Ratio (Regular)
– Germany: 4/15 (26.7%)
– Finland: 2/9 (22.2%)
– USA: 20/109 (18.3%)

Page 3: Country

Country   Regular  Short  Total  Ratio
USA            20     28    109  44.0%
China           3      4     55  12.7%
UK              1      6     39  17.9%
Japan           0      5     28  17.9%
Canada          3      3     25  24.0%
Taiwan          0      1     18   5.6%
Australia       2      1     17  17.6%
Germany         4      5     15  60.0%
France          0      2     14  14.3%
India           1      0     14   7.1%
Singapore       0      3     12  25.0%
Brazil          0      1     12   8.3%
Italy           2      1     10  30.0%
Finland         2      1      9  33.3%
Spain           0      1      7  14.3%
Hong Kong       1      1      6  33.3%
Top 15         39     63    390  26.2%
Total          39     66    445  23.8%
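The Ratio column appears to be (Regular + Short) / Total. The following is a minimal Python sketch, assuming that reading, which recomputes each ratio from the counts in the country table and lists the rows whose recomputed value differs from the reported one. The names `ROWS`, `acceptance_ratio` and `mismatched_rows` are illustrative and do not come from the slides.

```python
# Rows copied from the country table: (label, regular, short, total, reported ratio in %).
ROWS = [
    ("USA", 20, 28, 109, 44.0),
    ("China", 3, 4, 55, 12.7),
    ("UK", 1, 6, 39, 17.9),
    ("Japan", 0, 5, 28, 17.9),
    ("Canada", 3, 3, 25, 24.0),
    ("Taiwan", 0, 1, 18, 5.6),
    ("Australia", 2, 1, 17, 17.6),
    ("Germany", 4, 5, 15, 60.0),
    ("France", 0, 2, 14, 14.3),
    ("India", 1, 0, 14, 7.1),
    ("Singapore", 0, 3, 12, 25.0),
    ("Brazil", 0, 1, 12, 8.3),
    ("Italy", 2, 1, 10, 30.0),
    ("Finland", 2, 1, 9, 33.3),
    ("Spain", 0, 1, 7, 14.3),
    ("Hong Kong", 1, 1, 6, 33.3),
    ("Top 15", 39, 63, 390, 26.2),
    ("Total", 39, 66, 445, 23.8),
]


def acceptance_ratio(regular, short, total):
    """Assumed definition: accepted papers (regular + short) over submissions, in percent."""
    return 100.0 * (regular + short) / total


def mismatched_rows(rows):
    """Labels of rows whose recomputed ratio (one decimal) differs from the reported one."""
    return [
        label
        for label, regular, short, total, reported in rows
        if round(acceptance_ratio(regular, short, total), 1) != reported
    ]


if __name__ == "__main__":
    for label, regular, short, total, reported in ROWS:
        ratio = acceptance_ratio(regular, short, total)
        print(f"{label:10s} {ratio:5.1f}%  (reported {reported}%)")
    print("Rows that differ:", mismatched_rows(ROWS))
```

Under this assumption every country row and the Top 15 row reproduce the reported ratio. The Total row recomputes to 105/445, about 23.6%, rather than the 23.8% shown.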

Page 4: Data Mining on ICDM Submission Data

Top 5 Areas of Submissions:
– Data mining applications
– Data mining and machine learning algorithms and methods
– Mining text and semi-structured data, and mining temporal, spatial and multimedia data
– Data pre-processing, data reduction, feature selection and feature transformation
– Soft computing and uncertainty management for data mining

High Acceptance Ratio Areas (Regular+Short)
– Quality assessment and interestingness metrics of data mining results: 5/10 (50.0%)
– Data pre-processing, data reduction, feature selection and feature transformation: 14/35 (40.0%)
– Complexity, efficiency, and scalability issues in data mining: 4/11 (36.4%)

Page 5: Topics

Topic                                                                                   Regular  Short  Total  Ratio
Data mining applications                                                                      4     10     84  16.7%
Data mining and machine learning algorithms and methods                                       9     20     81  35.8%
Mining text and semi-structured data, and mining temporal, spatial and multimedia data        3      8     44  25.0%
Data pre-processing, data reduction, feature selection and feature transformation             7      7     35  40.0%
Soft computing and uncertainty management for data mining                                     0      3     34   8.8%
Foundations of data mining                                                                    2      1     26  11.5%
Mining data streams                                                                           3      4     25  28.0%
Human-machine interaction and visual data mining                                              0      1     16   6.3%
Security, privacy and social impact of data mining                                            2      1     15  20.0%
Data and knowledge representation for data mining                                             1      1     12  16.7%
Pattern recognition and trend analysis                                                        0      1     11   9.1%
Complexity, efficiency, and scalability issues in data mining                                 2      2     11  36.4%
Quality assessment and interestingness metrics of data mining results                         2      3     10  50.0%
Statistics and probability in large-scale data mining                                         1      0      9  11.1%
Integration of data warehousing, OLAP and data mining                                         0      1      9  11.1%
Collaborative filtering/personalization                                                       0      2      7  28.6%
Post-processing of data mining results                                                        1      1      7  28.6%
Others                                                                                        2      0      6  33.3%
High performance and parallel/distributed data mining                                         1      0      2  50.0%
Query languages and user interfaces for mining                                                0      0      1   0.0%
Total                                                                                        39     66    445  23.8%
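The "high acceptance ratio" areas can be recovered from the topics table by ranking on (Regular + Short) / Total. The following is a minimal Python sketch under two assumptions: blank cells in the table are read as 0, and a minimum number of submissions (`min_total`, a hypothetical cutoff the slides do not state) is used to skip very small categories. `TOPICS` and `top_by_acceptance` are illustrative names.

```python
# (topic, regular, short, total) copied from the topics table; blank cells read as 0.
TOPICS = [
    ("Data mining applications", 4, 10, 84),
    ("Data mining and machine learning algorithms and methods", 9, 20, 81),
    ("Mining text and semi-structured data, and mining temporal, "
     "spatial and multimedia data", 3, 8, 44),
    ("Data pre-processing, data reduction, feature selection and "
     "feature transformation", 7, 7, 35),
    ("Soft computing and uncertainty management for data mining", 0, 3, 34),
    ("Foundations of data mining", 2, 1, 26),
    ("Mining data streams", 3, 4, 25),
    ("Human-machine interaction and visual data mining", 0, 1, 16),
    ("Security, privacy and social impact of data mining", 2, 1, 15),
    ("Data and knowledge representation for data mining", 1, 1, 12),
    ("Pattern recognition and trend analysis", 0, 1, 11),
    ("Complexity, efficiency, and scalability issues in data mining", 2, 2, 11),
    ("Quality assessment and interestingness metrics of data mining results", 2, 3, 10),
    ("Statistics and probability in large-scale data mining", 1, 0, 9),
    ("Integration of data warehousing, OLAP and data mining", 0, 1, 9),
    ("Collaborative filtering/personalization", 0, 2, 7),
    ("Post-processing of data mining results", 1, 1, 7),
    ("Others", 2, 0, 6),
    ("High performance and parallel/distributed data mining", 1, 0, 2),
    ("Query languages and user interfaces for mining", 0, 0, 1),
]


def top_by_acceptance(topics, min_total, k):
    """Return the k topics with the highest acceptance ratio (in %, one decimal),
    considering only topics with at least min_total submissions."""
    eligible = [
        (100.0 * (regular + short) / total, name)
        for name, regular, short, total in topics
        if total >= min_total
    ]
    # Python's sort is stable, so ties keep their table order.
    eligible.sort(key=lambda pair: pair[0], reverse=True)
    return [(name, round(ratio, 1)) for ratio, name in eligible[:k]]


if __name__ == "__main__":
    for name, ratio in top_by_acceptance(TOPICS, min_total=10, k=3):
        print(f"{ratio:5.1f}%  {name}")
```

With `min_total=10` and `k=3` the sketch returns the three areas listed on Page 4 (50.0%, 40.0%, 36.4%). Without a cutoff, the two-paper "High performance" category would tie for first place at 50.0%.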

Page 6: Correspondence Analysis (Country vs Final Decision)

[Figure: correspondence analysis plot. Decision points: Reject, Regular, Short. Labeled countries: Slovenia, Japan, Hong Kong, USA, Germany, Italy, India, Finland, UK, France, Canada, Australia. r1=0.378, r2=0.177.]

Page 7: Correspondence Analysis (Topics vs Final Decision)

[Figure: correspondence analysis plot. Decision points: Reject, Short, Regular. Labeled topics: Statistics and probability; Security, privacy; Applications; Post-processing; Preprocessing, Feature Selection; High-performance; Quality-assessment; Collaborative Filtering; Soft-computing; DM Methods. r1=0.280, r2=0.184.]

Page 8: Correspondence Analysis

Country vs Final Decision
– Regular: Germany, USA
– Short: ?
– Reject: Most of the countries are located near this region.

Topics vs Final Decision
– Regular: Quality Assessment, Preprocessing/Feature Selection
– Short: DM/ML Methods, Collaborative Filtering
– Reject: DM Applications
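For readers unfamiliar with the method, the sketch below shows the core computation of correspondence analysis on a country-by-decision table, using only the Python standard library. Several things here are assumptions: the Reject column is derived as Total minus Regular minus Short from the country table; only the countries listed in that table are included, so the result will not match the plotted values; and r1 and r2 on the plots are read as the correlations (singular values) of the first two axes. All function names are illustrative.

```python
import math

# (country, regular, short, total) copied from the country table in these slides.
COUNTRIES = [
    ("USA", 20, 28, 109), ("China", 3, 4, 55), ("UK", 1, 6, 39),
    ("Japan", 0, 5, 28), ("Canada", 3, 3, 25), ("Taiwan", 0, 1, 18),
    ("Australia", 2, 1, 17), ("Germany", 4, 5, 15), ("France", 0, 2, 14),
    ("India", 1, 0, 14), ("Singapore", 0, 3, 12), ("Brazil", 0, 1, 12),
    ("Italy", 2, 1, 10), ("Finland", 2, 1, 9), ("Spain", 0, 1, 7),
    ("Hong Kong", 1, 1, 6),
]


def decision_counts(rows):
    """Country x decision table with columns [regular, short, reject]."""
    return [[regular, short, total - regular - short]
            for _, regular, short, total in rows]


def standardized_residuals(table):
    """S[i][j] = (p_ij - r_i * c_j) / sqrt(r_i * c_j), the matrix that
    correspondence analysis decomposes (p = proportions, r/c = row/column masses)."""
    n = sum(sum(row) for row in table)
    r = [sum(row) / n for row in table]
    c = [sum(row[j] for row in table) / n for j in range(len(table[0]))]
    return [[(table[i][j] / n - r[i] * c[j]) / math.sqrt(r[i] * c[j])
             for j in range(len(c))] for i in range(len(r))]


def axis_correlations(table):
    """Return (r1, r2, total_inertia) for a table with exactly three columns.

    A = S^T S is 3x3 and has one zero eigenvalue, so the other two are the
    roots of lambda^2 - trace(A) * lambda + (sum of principal 2x2 minors) = 0.
    The singular values of S are the square roots of those eigenvalues.
    """
    s = standardized_residuals(table)
    a = [[sum(row[p] * row[q] for row in s) for q in range(3)] for p in range(3)]
    trace = a[0][0] + a[1][1] + a[2][2]
    minors = (a[0][0] * a[1][1] - a[0][1] ** 2
              + a[0][0] * a[2][2] - a[0][2] ** 2
              + a[1][1] * a[2][2] - a[1][2] ** 2)
    root = math.sqrt(max(trace * trace - 4.0 * minors, 0.0))
    lam1 = (trace + root) / 2.0
    lam2 = max((trace - root) / 2.0, 0.0)
    return math.sqrt(lam1), math.sqrt(lam2), trace


if __name__ == "__main__":
    r1, r2, inertia = axis_correlations(decision_counts(COUNTRIES))
    print(f"r1={r1:.3f}  r2={r2:.3f}  total inertia={inertia:.3f}")
```

The total inertia equals the chi-square statistic of the table divided by the number of submissions, and r1 squared plus r2 squared adds up to it. A general implementation would use a singular value decomposition instead of the three-column shortcut used here.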

Page 9: Rule Mining on ICDM Submission Data

Datasets
– Sample Size: 445
– Attributes: 5
  • Paper No.: ordered by submission date
  • # of Authors
  • # of Characters in Title
  • Country
  • Category
– Analyzed by Clementine 7.1 (and SPSS 12.0J)

Page 10: Rule Mining (C5.0) on ICDM Submission Data

C5.0

– [Topic=Mining semi-structured data,…] & [129< Paper No.<=369] => Reject (Confidence 0.87, Support 10)

– [Country=USA] & [Topic=Mining semi-structured data,…] & [Paper No.>369] & [# of Authors <=3] =>Accept (Confidence 0.667, Support 3)

– [Topic=Preprocessing/Feature Selection] & [# of Authors>4] => Accept (Confidence: 1.0, Support 3)

– Topic, Paper No, # of Authors : Important Features

Page 11: Rule Mining (GRI) on ICDM Submission Data

Generalized Rule Induction

– [# of Authors <2] & [Paper No. <120.5] => Rejected (Confidence 96.0%, Support 24)

– [# of Chars in Title< 27] & [Paper No. > 212]=> Accepted (Confidence 100%, Support 5)

Paper No., # of Chars in Title, # of Authors: Important Features
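The two GRI rules above can be written as predicates and scored against a set of records. The sketch below is illustrative only: it assumes "support" means the number of records that match the rule's conditions and "confidence" means the fraction of those records with the predicted decision, which may differ from how Clementine computes them. The `Submission` class, the function names and the six toy records are hypothetical and are not the ICDM submission data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Submission:
    paper_no: int      # ordered by submission date
    n_authors: int
    title_chars: int
    accepted: bool


def evaluate_rule(records, antecedent, predicts_accept):
    """Return (support, confidence) for a rule.

    Assumed definitions: support is the count of records matching the
    antecedent; confidence is the share of those with the predicted decision.
    """
    matched = [r for r in records if antecedent(r)]
    if not matched:
        return 0, None
    hits = sum(1 for r in matched if r.accepted == predicts_accept)
    return len(matched), hits / len(matched)


def early_single_author(r):
    """[# of Authors < 2] & [Paper No. < 120.5]"""
    return r.n_authors < 2 and r.paper_no < 120.5


def short_title_late(r):
    """[# of Chars in Title < 27] & [Paper No. > 212]"""
    return r.title_chars < 27 and r.paper_no > 212


# Made-up records for illustration only; NOT the ICDM submission data.
TOY = [
    Submission(10, 1, 40, False),
    Submission(57, 1, 62, False),
    Submission(101, 1, 35, True),
    Submission(118, 3, 50, False),
    Submission(230, 2, 20, True),
    Submission(300, 4, 25, True),
]

if __name__ == "__main__":
    print("=> Rejected:", evaluate_rule(TOY, early_single_author, predicts_accept=False))
    print("=> Accepted:", evaluate_rule(TOY, short_title_late, predicts_accept=True))
```

On the real data the same function would be called with the 445 submission records in place of `TOY`.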

Page 12: Multidimensional Scaling (2004)

[Figure: two-dimensional MDS plot of the attributes Decision, # of Authors, Review Score, # of Chars in Title, Topics, Paper No., and Country.]

Page 13: Summary (2004) of Mining on ICDM Submission Data

Do not submit a paper too fast!
– Reflection is needed not only on the contents, but also on the titles.

Mining Text/Web/Semi-structured Data is very popular.

The # of Application papers is growing now. (But many were rejected.)

Strong Topics
– Preprocessing/Feature-Selection
– Postprocessing
– Security and Privacy

Several topics are emerging in ICDM2004:
– Mining Data Streams
– Collaborative Filtering
– Quality Assessment

Page 14: Comparison between 02-04: Review Scores (Box-plot)

[Figure: box plots of review score (axis from 0.00 to 5.00) by year (2002, 2003, 2004); two outlier cases are labeled 1,169 and 1,176.]

Page 15: Comparison between 02-04: Countries

Country    Ratio (2002)    Country    Ratio (2003)    Country    Ratio (2004)
Hong Kong  64.7%           Israel     55.0%           Germany    60.0%
USA        47.9%           Hong Kong  50.0%           USA        44.0%
Canada     45.5%           Japan      37.0%           Finland    33.0%
Finland    33.3%           USA        33.0%           Hong Kong  33.0%
France     33.3%           Germany    32.0%           Italy      30.0%

Page 16: Comparison between 02 and 04: Topics

Top 5 in 2002  Ratio    Top 5 in 2003               Ratio    Top 5 in 2004                     Ratio
Graph Mining   75.0%    Process-centric DM          80.0%    Quality Assessment                50.0%
Temporal Data  52.6%    Security, privacy           57.0%    Preprocessing, Feature Selection  40.0%
Theory         42.9%    Statistics and Probability  47.0%    Complexity/Scalability            36.4%
Text Mining    42.1%    Visual Data Mining          38.0%    DM and ML Methods                 35.8%
Rule           41.7%    Post-processing             41.7%    Collaborative Filtering           28.6%
                                                             Post-processing                   28.6%

Page 17: Multidimensional Scaling (2003 and 2004)

[Figure: two MDS plots, one for 2003 and one for 2004, each placing the attributes Decision, # of Authors, Review Score, # of Chars in Title, Topics, Paper No., and Country in two dimensions.]

The topological structure w.r.t. similarities does not seem to have changed between 2003 and 2004.

Page 18: Data Mining on ICDM Submission Data

Acknowledgements
– Many thanks to
  • PC chairs, Vice Chairs and PC members
  • All the authors
  • All the contributors to ICDM2004
– See you again in ICDM2005!

Page 19: Multidimensional Scaling (2004)

[Figure: two-dimensional MDS plot of the attributes Decision, # of Authors, Review Score, # of Chars in Title, Topics, Paper No., and Country.]

Page 20: Multidimensional Scaling (2003)

[Figure: two-dimensional MDS plot of the attributes Decision, # of Authors, Review Score, # of Chars in Title, Topics, Paper No., and Country.]