Fuzzy Final Homework System Implementation Selected paper: Fuzzy integration of structure adaptive SOMs for web content mining, Fuzzy Sets and Systems 148 (2004) 43–60 Lecture: Prof. Hahn-Ming Lee Student: Ching-Hao Mao D9415004@mail.ntust.tw

Mar 15, 2016

Page 1: Lecture: Prof. Hahn-Ming Lee Student: Ching-Hao Mao D9415004@mail.ntust.tw

Fuzzy Final Homework
System Implementation
Selected paper: Fuzzy integration of structure adaptive SOMs for web content mining, Fuzzy Sets and Systems 148 (2004) 43–60

Lecture: Prof. Hahn-Ming Lee
Student: Ching-Hao Mao
D9415004@mail.ntust.tw

Page 2:

Outline
Introduction
Proposed method in selected paper
Implementation
Conclusion
References

Page 3:

Introduction
In this report, we implement the method from Kim and Cho's paper [1], which appeared in Fuzzy Sets and Systems in 2004.
A user profile represents different aspects of a user's characteristics.
The authors propose an ensemble of classifiers that estimates a user's preference from web content the user has labeled as “like” or “dislike”.

Page 4:

Introduction - Previous Studies [2]

Page 5:

Feature Selection Method Properties

Feature selection methods such as information gain, TFIDF, and odds ratio have different properties.

TFIDF does not consider the class labels of documents when calculating the relevance of features, whereas information gain does use them.

Odds ratio also uses the class labels of documents, but it finds features that are useful for classifying only one specific class.
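The contrast between the three scores can be made concrete with a small Python sketch. It is only an illustration under stated assumptions, not the Java code written for this report: the toy corpus, the function names, the corpus-level TFIDF aggregation (total term frequency times log inverse document frequency), and the 0.5 smoothing in the odds ratio are all choices made here for the example.

```python
import math
from collections import Counter

# Hypothetical toy corpus: (tokens, label) pairs, not the Syskill & Webert data.
docs = [
    (["guitar", "singer", "tour"], "hot"),
    (["singer", "album", "tour"], "hot"),
    (["goat", "farm", "tour"], "cold"),
    (["goat", "milk", "tour"], "cold"),
]


def entropy(counts):
    """Shannon entropy (in bits) of a collection of class counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def tfidf_score(term, docs):
    """Corpus-level TFIDF: ignores the class labels entirely."""
    df = sum(1 for tokens, _ in docs if term in tokens)
    if df == 0:
        return 0.0
    tf = sum(tokens.count(term) for tokens, _ in docs)
    return tf * math.log(len(docs) / df)


def information_gain(term, docs):
    """Reduction in class entropy from knowing whether the term occurs."""
    labels = Counter(label for _, label in docs)
    with_term = Counter(label for tokens, label in docs if term in tokens)
    without_term = labels - with_term  # Counter subtraction drops zero counts
    n, n_with = len(docs), sum(with_term.values())
    after = (n_with / n) * entropy(with_term.values()) \
        + ((n - n_with) / n) * entropy(without_term.values())
    return entropy(labels.values()) - after


def odds_ratio(term, docs, target, smooth=0.5):
    """Log odds ratio of the term for one target class (smoothing is an assumption)."""
    pos = [tokens for tokens, label in docs if label == target]
    neg = [tokens for tokens, label in docs if label != target]
    p_pos = (sum(term in t for t in pos) + smooth) / (len(pos) + 2 * smooth)
    p_neg = (sum(term in t for t in neg) + smooth) / (len(neg) + 2 * smooth)
    return math.log(p_pos * (1 - p_neg) / ((1 - p_pos) * p_neg))
```

On this toy corpus, "tour" occurs in every document, so both its TFIDF score and its information gain are zero, while "singer" separates the classes perfectly. The odds ratio is positive for "singer" and negative for "goat" when the target class is "hot", which shows its one-class orientation.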

Page 6:

Overview of the proposed method in [1]

[Figure: overview diagram. Labels: TFIDF, Information Gain, Odds Ratio; Classification]

Page 7:

Structure Adaptive SOM

Page 8:

Training SASOMs using different feature sets

[Figure: diagram. Labels: Fuzzy Integral; output Hot or Cold]

Page 9:

Data Set Description
UCI Syskill & Webert data (http://kdd.ics.uci.edu)
The data contain the HTML source of web pages plus the ratings of a single user on these web pages.
The web pages are on four separate subjects:
  Bands (recording artists) (implemented in this report)
  Goats (implemented in this report)
  Sheep
  BioMedical

Page 10:

Implementation
A Java (J2SE 1.5) program was written for preprocessing, feature selection (TFIDF and odds ratio), and the fuzzy integral mechanism.

Weka was used for feature selection (information gain) and classification.

SASOM was not successfully implemented in this report.

Page 11:

Implementation - Preprocessing

UCI Syskill & Webert data
-> ExtractHTMLContent.java
-> pure text without anchor text (Bands.txt)
-> after stopword removal and Porter stemming (Bands_Stopword.txt, Bands_Porter.txt)
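The preprocessing flow above might look roughly like the following Python sketch. It is not the report's ExtractHTMLContent.java: the class and function names and the tiny stopword list are hypothetical. The standard library has no Porter stemmer, so stemming is left as a pluggable `stem` function that defaults to the identity.

```python
import re
from html.parser import HTMLParser
from typing import Callable, List

# Illustrative stopword list only; a real run would use a fuller list.
STOPWORDS = frozenset({"a", "an", "and", "are", "in", "is", "of", "on", "the", "to"})

# Text inside these tags is dropped: anchor text plus non-content markup.
SKIPPED_TAGS = frozenset({"a", "script", "style"})


class TextWithoutAnchors(HTMLParser):
    """Collects visible text, skipping anything nested in SKIPPED_TAGS."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: List[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def extract_text(html: str) -> str:
    """Return the page text with anchor text removed."""
    parser = TextWithoutAnchors()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def preprocess(html: str, stem: Callable[[str], str] = lambda w: w) -> List[str]:
    """HTML -> lowercase tokens -> stopword removal -> stemming."""
    tokens = re.findall(r"[a-z]+", extract_text(html).lower())
    return [stem(t) for t in tokens if t not in STOPWORDS]
```

A Porter stemmer from a third-party library could be passed as `stem`; the default keeps the sketch dependency-free.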

Page 12:

Implementation - Feature Selection
In Bands (61 documents), for example, the number of attributes is reduced: 5436->32

Information Gain (score, index, term)   TFIDF          Odds Ratio
0.1435  1411  mother                    sea            oss
0.1435  4109  writes                    acid           wild
0.1054    49  places                    programming    cultures
0.1054   855  letter                    innovative     vehemently
0.1054  3883  movement                  letter         smoking
0.1054  1464  stories                   method         define
0.1054  3856  synthesizer               members        book
0.1054  2568  songwriters               bleed          charge
0.0962  4643  singer                    concentrated   library
0.0937    50  america                   mother         hand
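Reducing the attribute set to the highest-scoring terms can be sketched in a few lines of Python. The function names are hypothetical, and the example uses only four of the information gain scores listed above rather than the full 5436 attributes.

```python
import heapq
from typing import Dict, List


def select_top_k(scores: Dict[str, float], k: int) -> List[str]:
    """Return the k terms with the highest scores, best first."""
    best = heapq.nlargest(k, scores.items(), key=lambda item: item[1])
    return [term for term, _ in best]


def to_feature_vector(tokens: List[str], vocabulary: List[str]) -> List[int]:
    """Term counts of one document, restricted to the selected vocabulary."""
    return [tokens.count(term) for term in vocabulary]


# Four information gain scores taken from the slide, used as sample input.
ig_scores = {"mother": 0.1435, "places": 0.1054, "singer": 0.0962, "america": 0.0937}
vocabulary = select_top_k(ig_scores, 2)
```

Each document is then represented only by its counts for the selected terms, which is how 5436 attributes would shrink to 32 when `k` is 32.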

Page 13:

Implementation - Fuzzy Integral
The fuzzy measures of the classifiers are determined subjectively [1].

Bayes classifier outputs (b1, b2, b3) -> FuzzyIntegral.java -> result
Example: b1=0, b2=1, b3=0 -> 0.99

(g1, g2, g3) = (0.99, 0.99, 0.99)     (g1, g2, g3) = (0.01, 0.01, 0.99)
(b1, b2, b3)   Result                 (b1, b2, b3)   Result
(0, 1, 0)      0.99                   (0, 0, 1)      0.01
(1, 1, 1)      0.99                   (0, 1, 1)      0.01
(0, 0, 0)      0.99                   (0, 0, 0)      0.01
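For orientation, here is a minimal Python sketch of the textbook Sugeno fuzzy integral over a lambda-fuzzy measure built from per-classifier densities g1..gn. Using the lambda-measure is an assumption made for this sketch, and all function names are hypothetical. It is not a reconstruction of FuzzyIntegral.java, so its outputs should not be expected to match every row of the table above.

```python
from typing import Sequence


def solve_lambda(densities: Sequence[float]) -> float:
    """Solve prod(1 + lam * g_i) = 1 + lam for lam > -1 by bisection.

    Assumes at least two densities, each strictly between 0 and 1.
    """
    if len(densities) < 2:
        raise ValueError("need at least two densities")
    total = sum(densities)
    if abs(total - 1.0) < 1e-6:
        return 0.0  # densities already additive

    def f(lam: float) -> float:
        prod = 1.0
        for g in densities:
            prod *= 1.0 + lam * g
        return prod - (1.0 + lam)

    if total > 1.0:
        lo, hi = -1.0, -1e-9  # root lies in (-1, 0)
    else:
        lo, hi = 1e-9, 1.0    # root lies in (0, inf); widen until bracketed
        while f(hi) < 0:
            hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if f(lo) * f(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2.0


def sugeno_integral(h: Sequence[float], densities: Sequence[float]) -> float:
    """max over i of min(h_(i), g(A_i)), with h sorted in descending order."""
    lam = solve_lambda(densities)
    order = sorted(range(len(h)), key=lambda i: h[i], reverse=True)
    g_acc = 0.0  # measure of the set of classifiers seen so far
    best = 0.0
    for i in order:
        g_acc = g_acc + densities[i] + lam * g_acc * densities[i]
        best = max(best, min(h[i], g_acc))
    return best
```

Here `h` holds the classifier outputs for one class (the role of b1, b2, b3 on the slide) and `densities` holds the subjectively chosen importances. Sorting by output and accumulating the measure means a confident classifier only dominates the result if its importance is also high.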

Page 14:

Conclusion
The fuzzy integral provides a way to weight the importance of classifiers subjectively, which is especially useful in semi-supervised learning.

A method based on the fuzzy integral can be effectively applied to web content mining to predict a user's preference as a user profile.

The fuzzy integral may also be applicable to my research area, to integrate expert or user knowledge.

Page 15:

References
1. Kyung-Joong Kim, Sung-Bae Cho, Fuzzy integration of structure adaptive SOMs for web content mining, Fuzzy Sets and Systems 148 (2004) 43–60
2. Pazzani, M., Billsus, D., Learning and Revising User Profiles: The identification of interesting web sites, Machine Learning 27 (1997) 313–331
3. http://kdd.ics.uci.edu/databases/SyskillWebert/SyskillWebert.data.html