Extraction of Comparative Sentences and their Components from BBS Messages

Keita Ozaki, Fuminori Kimura, and Akira Maeda

Proceedings of the World Congress on Engineering and Computer Science 2013 Vol I, WCECS 2013, 23-25 October, 2013, San Francisco, USA. ISBN: 978-988-19252-3-7; ISSN: 2078-0958 (Print); ISSN: 2078-0966 (Online)

Keita Ozaki is with the Graduate School of Information Science and Engineering, Ritsumeikan University, Kusatsu, Shiga 525-8577, Japan (corresponding author; phone: 077-599-4365; fax: 077-599-4365; e-mail: [email protected]).
Fuminori Kimura is with the Kinugasa Research Organization, Ritsumeikan University, Kyoto, Kyoto 603-8577, Japan (e-mail: [email protected]).
Akira Maeda is with the College of Information Science and Engineering, Ritsumeikan University, Kusatsu, Shiga 525-8577, Japan (e-mail: [email protected]).

Abstract—In this paper, we describe a method for extracting comparative sentences and their components from messages posted on bulletin board systems (BBSs) or discussion forums. A comparative sentence is a message that describes the relative merits and demerits of two or more targets. We propose a method for extracting a set of object-attribute-evaluation triples, which are the elements that make up a comparative statement. We first extract sentences that meet our definition of a comparative sentence, using a method devised from observations of actual sentences in a BBS. Then, by analyzing the messages using the differences in the characteristics of each component of a comparative sentence, we identify the object-attribute-evaluation triples. We conducted evaluation experiments with the proposed method. The results show that the method improved both the extraction of comparative sentences and the extraction accuracy of their components.

Index Terms—Comparative Sentence, Forum, Expression of evaluation

I. INTRODUCTION

On the Internet, the number of communities in which many unspecified people can exchange opinions is increasing. One typical example is the bulletin board system (BBS). A BBS consists of many threads, each of which collects comments posted in relation to the topic of the thread. In each thread, users can communicate, talk, and argue with many people about the topic. In some of these threads, users talk about the differences between things related to the topic, for example, the superiority or inferiority of one to the other. Messages in these threads contain information for judging which is superior in a certain aspect. Obtaining this information would make it easier to judge which is superior.

In this paper, we call messages containing this information “comparative sentences,” and we aim to obtain this information from them. We use threads on the bulletin board system “2channel” [1] as the target of our experiments. To evaluate the proposed method, we conducted experiments on the extraction of comparative sentences and of their components.

II. RELATED WORK

In this section, we introduce related research on extracting and classifying comparative sentences and on extracting evaluation aspects.

For the extraction of comparative sentences, Kurashima et al. [2] focused on the superiority or inferiority between comparative objects, e.g., a sentence saying which of restaurants A and B is cheaper or more delicious. They proposed a method for extracting the four kinds of elements that constitute a comparative sentence, i.e., criteria, object, attribute, and evaluation, by using rules devised from observing actual comparison expressions. Jindal and Liu [3] comprehensively collected comparative sentence candidates using a manually created list of clue word clauses that express comparisons in English, and proposed a comparative sentence classifier that uses class sequential rules created from the list as features.

Regarding the extraction of evaluation aspects, Iida et al. [4] extracted groups of two or more evaluation candidates and one attribute, obtained the optimal combination of an attribute and its evaluation by supervised learning, and extracted attribute-evaluation pairs. Suzuki et al. [5] extracted candidates for object-attribute-evaluation triples by combining a naive Bayes classifier and the EM algorithm.

In our proposed method, comparative sentences are extracted independently of specific expressions. In addition, in the extraction of evaluation aspects, our method focuses on the colloquial expressions used mainly on Japanese BBSs, which have not been dealt with in previous studies. To tackle the problem that colloquial Japanese expressions are often grammatically broken, we use a clause, rather than a word, as the unit of the comparative elements, i.e., object, attribute, and evaluation.

III. THE DEFINITION OF A COMPARATIVE SENTENCE

In this section, we define a comparative sentence and explain the comparison components of such sentences. We define comparative sentences as messages that fulfill any of the following three definitions.

Definition 1: A message contains two “objects” (candidates for comparison).
Definition 2: A message contains an “evaluation” for at least one “object.”
Definition 3: A message mentions a relationship between “objects,” such as “superiority or inferiority,” “equivalent,” “the best,” and “the feature.”
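The three definitions in Section III can be sketched as simple predicates over a message whose clauses already carry component labels. This is only an illustrative sketch, not the extraction method of the paper: the `Clause` and `Message` types, the `mentions_relationship` flag (assumed to be supplied by an annotator or an upstream step), and the reading of "two objects" as "at least two" are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List

# Component labels named in the paper; "others" covers every remaining clause.
ROLES = {"object", "attribute", "evaluation", "others"}


@dataclass
class Clause:
    text: str
    role: str  # one of ROLES (hypothetical encoding)


@dataclass
class Message:
    clauses: List[Clause]
    # Assumption: whether the message mentions a relationship between objects
    # (superiority or inferiority, equivalence, "the best", ...) is given.
    mentions_relationship: bool = False


def count_role(message: Message, role: str) -> int:
    """Number of clauses in the message that carry the given label."""
    return sum(1 for clause in message.clauses if clause.role == role)


def definition_1(message: Message) -> bool:
    # Assumption: "two objects" is read as "at least two".
    return count_role(message, "object") >= 2


def definition_2(message: Message) -> bool:
    # An evaluation is present together with at least one object.
    return (count_role(message, "object") >= 1
            and count_role(message, "evaluation") >= 1)


def definition_3(message: Message) -> bool:
    return message.mentions_relationship


def is_comparative(message: Message) -> bool:
    # The text asks for "any of" the three definitions to hold.
    return any(check(message)
               for check in (definition_1, definition_2, definition_3))


example = Message([
    Clause("restaurant A", "object"),
    Clause("restaurant B", "object"),
    Clause("is cheaper", "evaluation"),
])
print(is_comparative(example))
```

Keeping each definition as its own function makes it easy to report which definition a message satisfied, or to switch the combination rule if a stricter reading is wanted.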
高塚の-attribute× / 頃は-others○ / こんなの-others○ / なかったのに、-evaluation○ / ゲーム-attribute○ / よくなっても-others× / 動かなきゃ-others○ / 意味ないじゃんwww-others×

Fig. 7. Example of component extraction from comparative sentences (1)
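The clause-level marks in Fig. 7 can be tallied with a few lines of Python. This is a minimal sketch under stated assumptions: the figure is treated as a list of "clause-label mark" items separated by slashes, and "○" is read as a correctly labeled clause and "×" as an incorrectly labeled one. The names `FIG7`, `ITEM`, and `parse_figure` are introduced here for illustration only.

```python
import re
from collections import Counter

# Clause-label-mark items transcribed from Fig. 7 (spacing normalized).
FIG7 = (
    "高塚の-attribute×/頃は-others○/こんなの-others○/なかったのに、-evaluation○/"
    "ゲーム-attribute○/よくなっても-others×/動かなきゃ-others○/意味ないじゃんwww-others×"
)

# Clause text, a hyphen, the assigned label, then the mark; stray spaces are tolerated.
ITEM = re.compile(r"^(?P<clause>.+?)-\s*(?P<label>[A-Za-z]+)\s*(?P<mark>[○×])$")


def parse_figure(figure):
    """Split the figure text into (clause, label, mark) tuples."""
    rows = []
    for item in figure.split("/"):
        match = ITEM.match(item.strip())
        if match is None:
            raise ValueError(f"unrecognized item: {item!r}")
        rows.append((match["clause"], match["label"].lower(), match["mark"]))
    return rows


rows = parse_figure(FIG7)
label_counts = Counter(label for _, label, _ in rows)

# Assumption: "○" marks a correctly labeled clause and "×" an incorrect one.
correct = sum(1 for _, _, mark in rows if mark == "○")
accuracy = correct / len(rows)

print(label_counts)
print(f"{correct}/{len(rows)} = {accuracy}")  # 5/8 = 0.625
```

Under that reading of the marks, five of the eight clauses in this single example are labeled correctly; the figure is one illustration and says nothing about the overall accuracy of the method.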
Fig. 8 shows comparative sentences that were not correctly
extracted with the “tfidf + part-of-speech + part-of-speech
sequence pattern” setting of Table IV. Among the
incorrectly extracted clauses, one noticeable case is a
clause like “楽しくないんだろうなウイイレw” (ウイイレ
will not be fun). In such cases, the dependency analysis
fails to split the clause, here between “な” and “ウ.”
The extraction would be correct if “楽しくないんだろうな”
(will not be fun) were judged as an “evaluation” and
“ウイイレ” as an “object.” In addition, although the
clause “やってるの” (doing) should have been extracted as an
“attribute” that expresses the meaning of a “user,” it was
extracted as “others.”