Intelligent Database Systems Presenter : Chang,Chun-Chih Authors : Youngjoong Ko, Jungyun Seo 2009, IPM Text classification from unlabeled documents with bootstrapping and feature projection techniques

Jan 05, 2016

Intelligent Database Systems Lab

Presenter : Chang,Chun-Chih

Authors : Youngjoong Ko, Jungyun Seo

2009, IPM

Text classification from unlabeled documents with bootstrapping and feature projection techniques

Outline

• Motivation
• Objectives
• Methodology
• Experiments
• Conclusions
• Comments

Motivation

• In supervised learning, a general inductive process automatically builds a text classifier by learning from labeled training documents.

• The most notable problem with supervised methods is that they require a large number of labeled training documents for accurate learning.

Objectives

• The authors propose a new text classification method based on unsupervised or semi-supervised learning.

• The proposed method launches text classification tasks with only unlabeled documents.

Methodology - Framework

Methodology - Creating keyword lists

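Example with the title words "Student" and "traffic":

• Word "is": values 1.0 and 1.0 for the two title words, so 1.0 + (1.0 - 1.0) = 1.0

• Word "book": values 0.6 and 0.05 for the two title words, so 0.6 + (0.6 - 0.05) = 1.15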
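The two worked lines above share one visible pattern: take the larger value and add its margin over the smaller one. Below is a minimal Python sketch of that arithmetic, assuming this pattern is what the slide intends. The function name `margin_boosted_score` is hypothetical and is not taken from the paper.

```python
import math


def margin_boosted_score(values):
    """Return top + (top - runner_up) for a word's per-title-word values.

    Assumption: this mirrors the arithmetic shown on the slide; it is not
    confirmed to be the paper's exact formula.
    """
    if len(values) < 2:
        raise ValueError("need at least two values to compute a margin")
    # Pick the two largest values, largest first.
    top, runner_up = sorted(values, reverse=True)[:2]
    return top + (top - runner_up)


# Values copied from the slide's two worked lines.
print(margin_boosted_score([1.0, 1.0]))
print(math.isclose(margin_boosted_score([0.6, 0.05]), 1.15))
```

Under this reading, a word with equal values for both title words gains no bonus, while a word that clearly favours one title word is pushed above it.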

Methodology - Extracting & verifying centroid-context

Methodology - Creating the context-cluster of each category

[Steps 1 to 3: slide content not available in this transcript]

EX:
1. eat Banana
2. taste Banana
3. eat Apple
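The transcript does not say how these three example contexts are compared. As a rough illustration only, the sketch below assumes plain word overlap (Jaccard similarity), which is a stand-in chosen here and not the paper's similarity measure. The function name `word_overlap` is hypothetical.

```python
from itertools import combinations


def word_overlap(context_a, context_b):
    """Jaccard overlap between the word sets of two contexts.

    Assumption: word overlap stands in for the similarity measure, which
    the transcript does not specify.
    """
    words_a = set(context_a.lower().split())
    words_b = set(context_b.lower().split())
    union = words_a | words_b
    if not union:
        return 0.0
    return len(words_a & words_b) / len(union)


# The three example contexts from the slide.
contexts = ["eat Banana", "taste Banana", "eat Apple"]
for left, right in combinations(contexts, 2):
    print(f"{left!r} vs {right!r}: {word_overlap(left, right):.2f}")
```

With this stand-in, contexts 1 and 2 overlap through "Banana", contexts 1 and 3 overlap through "eat", and contexts 2 and 3 share no word.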

Methodology - The TCFP classifier with robustness from noisy data

Experiments

Conclusions

• The proposed method is useful for low-cost text classification.

• If a text classification task requires high accuracy, the proposed method can be used as an assistant tool for easily creating training data.

Comments

• Advantages
  – Faster
  – Less expensive

• Applications
  – Text classification